Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature]: add ignore missing field to text chunking processors #907

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

IanMenendez
Copy link
Contributor

Description

Adds ignore missing field to text chunking processors

Related Issues

#906

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Copy link
Member

@yuye-aws yuye-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks good to me

DEVELOPER_GUIDE.md Outdated Show resolved Hide resolved
Copy link
Member

@yuye-aws yuye-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just need a few changes to the integration test.

@yuye-aws
Copy link
Member

Thanks for creating this PR @IanMenendez ! Please also update CHANGELOG.md. We are after the 2.17 code freeze date in https://opensearch.org/releases.html, 2.18 will be the first release for you change to take effect.

@yuye-aws
Copy link
Member

@zhichao-aws Can you help attach the backport 2.x label to this PR. Also, after @IanMenendez deal with the review comments, please approve the CI workflow.

Signed-off-by: Ian Menendez <[email protected]>
Signed-off-by: Ian Menendez <[email protected]>
Copy link
Member

@yuye-aws yuye-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good to me. The integration tests also passed. Thanks for your contribution.

@yuye-aws
Copy link
Member

Pinging @zane-neo @zhichao-aws @martin-gaievski for review.

@vibrantvarun
Copy link
Member

vibrantvarun commented Sep 18, 2024

@IanMenendez Please add BWC tests
cc: @yuye-aws

Copy link
Member

@martin-gaievski martin-gaievski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from minor comments my main concerns are:

  • extra integ test, please move test cases to uni tests
  • field name, I suggested one in the issue

List<String> chunkedResult = chunkLeafType(chunkObject, runtimeParameters);
sourceAndMetadataMap.put(String.valueOf(targetKey), chunkedResult);

if (!(ignoreMissing && chunkObject == null)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please refactor condition to a small method with a meaningful name, like shouldProcessChunk. Also please use Objects.isNull for checking if object is null

Copy link
Member

@martin-gaievski martin-gaievski Sep 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this one got addressed completely, I don't see condition being moved to a method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martin-gaievski

In my opinion there is no point in having a method for a 2 variables boolean expression that is used just one time.

It will convolute the code more than help

@yuye-aws
Copy link
Member

yuye-aws commented Sep 18, 2024

@IanMenendez Please add BWC tests cc: @yuye-aws

@IanMenendez The BWC test file for text_chunking processor are located in

  1. qa/restart-upgrade/src/test/java/org/opensearch/neuralsearch/bwc/TextChunkingProcessorIT.java
  2. qa/rolling-upgrade/src/test/java/org/opensearch/neuralsearch/bwc/TextChunkingProcessorIT.java
    You can take a look whether the BWC tests are needed.

@IanMenendez
Copy link
Contributor Author

IanMenendez commented Sep 21, 2024

@IanMenendez Please add BWC tests cc: @yuye-aws

Why would BWC tests be needed for this feature?

@yuye-aws
Copy link
Member

+1 I don't think BWC tests are needed

@yuye-aws
Copy link
Member

@martin-gaievski Can you help review again?

@martin-gaievski
Copy link
Member

@IanMenendez thanks for addressing most comments. Please check one of previous comments, I've unresolved conversation.

In future, can you please reply to each comment:

  • whether you're making changes or not, just ask will be good
  • if you're making the change please state whether you agree with suggestion/request, and if not please put your reasoning

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants