-
Notifications
You must be signed in to change notification settings - Fork 64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature]: add ignore missing field to text chunking processors #907
base: main
Are you sure you want to change the base?
[Feature]: add ignore missing field to text chunking processors #907
Conversation
Signed-off-by: Ian Menendez <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code looks good to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just need a few changes to the integration test.
src/test/java/org/opensearch/neuralsearch/processor/TextChunkingProcessorIT.java
Outdated
Show resolved
Hide resolved
src/test/java/org/opensearch/neuralsearch/processor/TextChunkingProcessorIT.java
Outdated
Show resolved
Hide resolved
src/test/java/org/opensearch/neuralsearch/processor/TextChunkingProcessorIT.java
Outdated
Show resolved
Hide resolved
src/test/java/org/opensearch/neuralsearch/processor/TextChunkingProcessorIT.java
Outdated
Show resolved
Hide resolved
src/test/java/org/opensearch/neuralsearch/processor/TextChunkingProcessorIT.java
Outdated
Show resolved
Hide resolved
src/test/java/org/opensearch/neuralsearch/processor/TextChunkingProcessorIT.java
Outdated
Show resolved
Hide resolved
Thanks for creating this PR @IanMenendez ! Please also update |
@zhichao-aws Can you help attach the backport 2.x label to this PR. Also, after @IanMenendez deal with the review comments, please approve the CI workflow. |
Signed-off-by: Ian Menendez <[email protected]>
src/test/java/org/opensearch/neuralsearch/processor/TextChunkingProcessorIT.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Ian Menendez <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks good to me. The integration tests also passed. Thanks for your contribution.
Pinging @zane-neo @zhichao-aws @martin-gaievski for review. |
@IanMenendez Please add BWC tests |
src/main/java/org/opensearch/neuralsearch/processor/TextChunkingProcessor.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/processor/TextChunkingProcessor.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/processor/TextChunkingProcessor.java
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/processor/TextChunkingProcessor.java
Outdated
Show resolved
Hide resolved
src/test/java/org/opensearch/neuralsearch/processor/TextChunkingProcessorIT.java
Outdated
Show resolved
Hide resolved
src/test/resources/processor/chunker/TextChunkingTestMissingFieldDocument.json
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apart from minor comments my main concerns are:
- extra integ test, please move test cases to uni tests
- field name, I suggested one in the issue
src/main/java/org/opensearch/neuralsearch/processor/TextChunkingProcessor.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/processor/TextChunkingProcessor.java
Show resolved
Hide resolved
List<String> chunkedResult = chunkLeafType(chunkObject, runtimeParameters); | ||
sourceAndMetadataMap.put(String.valueOf(targetKey), chunkedResult); | ||
|
||
if (!(ignoreMissing && chunkObject == null)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please refactor condition to a small method with a meaningful name, like shouldProcessChunk
. Also please use Objects.isNull for checking if object is null
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this one got addressed completely, I don't see condition being moved to a method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion there is no point in having a method for a 2 variables boolean expression that is used just one time.
It will convolute the code more than help
src/main/java/org/opensearch/neuralsearch/processor/TextChunkingProcessor.java
Outdated
Show resolved
Hide resolved
src/test/java/org/opensearch/neuralsearch/processor/TextChunkingProcessorIT.java
Outdated
Show resolved
Hide resolved
src/test/java/org/opensearch/neuralsearch/processor/TextChunkingProcessorIT.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/processor/TextChunkingProcessor.java
Show resolved
Hide resolved
@IanMenendez The BWC test file for text_chunking processor are located in
|
Signed-off-by: Ian Menendez <[email protected]>
Signed-off-by: Ian Menendez <[email protected]>
Why would BWC tests be needed for this feature? |
+1 I don't think BWC tests are needed |
@martin-gaievski Can you help review again? |
@IanMenendez thanks for addressing most comments. Please check one of previous comments, I've unresolved conversation. In future, can you please reply to each comment:
|
Description
Adds ignore missing field to text chunking processors
Related Issues
#906
Check List
--signoff
.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.