Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle case with nested list of objects #477

Merged
merged 16 commits into from
Mar 1, 2024

Conversation

krishy91
Copy link
Contributor

@krishy91 krishy91 commented Oct 28, 2023

Description

These changes add support for processing a list of objects in a nested field. Currently, if the configuration looks like below and if the nested field contains a list of nested objects then the request fails:

curl --location --request PUT 'http://localhost:9200/_ingest/pipeline/embeddings-nested' \
--header 'Content-Type: application/json' \
--data '{
    "processors": [
        {
            "text_embedding": {
                "model_id": "o-bPa4sBu3zYeKjn7baf",
                "field_map": {
                    "nested-field": {
                        "text-field": "vector-field"
                    }
                }
            }
        }
    ]
}'

Example Request with Nested List

curl --location 'http://localhost:9200/_ingest/pipeline/embeddings-nested/_simulate?verbose=true' \
--header 'Content-Type: application/json' \
--data '{
    "docs": [
        {
            "_id": "1",
            "_source": {
                "nested-field": [
                    {
                        "text-field": "This is another test"
                    },
                    {
                        "text-field": "This is a third field"
                    },
                    {
                        "text-field": "This is a fourth field"
                    }
                ]
            }
        }
    ]
}'

Issues Resolved

#469

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@navneet1v
Copy link
Collaborator

@krishy91 is this PR still a draft PR or ready for review?

@krishy91
Copy link
Contributor Author

krishy91 commented Nov 7, 2023

@krishy91 is this PR still a draft PR or ready for review?

Hi! I'm still working on it! I will update here once its ready!

@navneet1v
Copy link
Collaborator

Hi @krishy91 before we can review the changes please do the following:

  1. Remove the draft keyword from title of the PR.
  2. Make sure that all your github actions are successful.

Please let ping on this PR once the above things are done.

Copy link

codecov bot commented Dec 20, 2023

Codecov Report

Attention: Patch coverage is 89.65517% with 3 lines in your changes are missing coverage. Please review.

Project coverage is 84.42%. Comparing base (bd1dfbf) to head (54419ea).
Report is 17 commits behind head on main.

Files Patch % Lines
...rch/neuralsearch/processor/InferenceProcessor.java 89.65% 1 Missing and 2 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main     #477      +/-   ##
============================================
+ Coverage     84.33%   84.42%   +0.09%     
- Complexity      533      609      +76     
============================================
  Files            40       48       +8     
  Lines          1564     1843     +279     
  Branches        244      282      +38     
============================================
+ Hits           1319     1556     +237     
- Misses          133      162      +29     
- Partials        112      125      +13     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Gopala-Krishna.Char and others added 10 commits December 20, 2023 19:55
Signed-off-by: Gopala-Krishna.Char <[email protected]>
Signed-off-by: Gopala-Krishna.Char <[email protected]>
Signed-off-by: Gopala-Krishna.Char <[email protected]>
Co-authored-by: mend-for-github-com[bot] <50673670+mend-for-github-com[bot]@users.noreply.github.com>
Signed-off-by: Gopala-Krishna.Char <[email protected]>
…l_sparse query (opensearch-project#438) (opensearch-project#479)

* [bug fix] Fix async actions are left in neural_sparse query (opensearch-project#438)

* add serialization and deserialization

Signed-off-by: zhichao-aws <[email protected]>

* hash, equals. + UT

Signed-off-by: zhichao-aws <[email protected]>

* tidy

Signed-off-by: zhichao-aws <[email protected]>

* add test

Signed-off-by: zhichao-aws <[email protected]>

---------

Signed-off-by: zhichao-aws <[email protected]>
(cherry picked from commit 51e6c00)

* rm max_token_score

Signed-off-by: zhichao-aws <[email protected]>

* add changelog

Signed-off-by: zhichao-aws <[email protected]>

* tidy

Signed-off-by: zhichao-aws <[email protected]>

---------

Signed-off-by: zhichao-aws <[email protected]>
Signed-off-by: Gopala-Krishna.Char <[email protected]>
…ery (opensearch-project#490)

* Adding null check for case when hybrid query wrapped into bool query

Signed-off-by: Martin Gaievski <[email protected]>
Signed-off-by: Gopala-Krishna.Char <[email protected]>
…eries (opensearch-project#498)

* Fixed nested field case

Signed-off-by: Martin Gaievski <[email protected]>
Signed-off-by: Gopala-Krishna.Char <[email protected]>
* Added support for jdk-21

Signed-off-by: Martin Gaievski <[email protected]>
Signed-off-by: Gopala-Krishna.Char <[email protected]>
@krishy91
Copy link
Contributor Author

Hi @navneet1v ! The Pull Request passed the workflows.

@br3no
Copy link

br3no commented Feb 6, 2024

We would really like to use this feature, is there anything we can do to help in getting this merged?

@manzke
Copy link

manzke commented Feb 6, 2024

Solves: #469 - PR is already in use and we would like to see it included in 2.12. or earlier.

@krishy91
Copy link
Contributor Author

krishy91 commented Feb 8, 2024

Hi, we would really like to have this fix in the next possible 2.x release. Need two more reviews @navneet1v

@navneet1v
Copy link
Collaborator

Will be checking this PR today.

@krishy91
Copy link
Contributor Author

Thanks @navneet1v! Let me know if I can do anything to speed up the merge process

@br3no
Copy link

br3no commented Feb 23, 2024

@navneet1v gently reminder to look into this PR. We would like to move away from using this branch.

@navneet1v
Copy link
Collaborator

Looked into the PR, on high level looks good. Please add Integration tests for this change.

@navneet1v
Copy link
Collaborator

Also code coverage is dropped for new lines which are added please ensure that those lines are covered.

@krishy91
Copy link
Contributor Author

krishy91 commented Feb 26, 2024

Looked into the PR, on high level looks good. Please add Integration tests for this change.

Thanks @navneet1v! Added integration test for the case containing list of nested objects

@krishy91
Copy link
Contributor Author

Also code coverage is dropped for new lines which are added please ensure that those lines are covered.

Added addtional test with objects having nesting depth of 2. The InferenceProcessor has 98% coverage for lines now. How can I rerun the report?

@krishy91
Copy link
Contributor Author

@navneet1v did you get a chance to look at the PR again after I resolved the issues you mentioned in your review?

@zane-neo zane-neo merged commit ea49d3c into opensearch-project:main Mar 1, 2024
48 of 54 checks passed
@zane-neo zane-neo added the backport 2.x Label will add auto workflow to backport PR to 2.x branch label Mar 1, 2024
opensearch-trigger-bot bot pushed a commit that referenced this pull request Mar 1, 2024
* Handle case with nested list of objects

Signed-off-by: Gopala-Krishna.Char <[email protected]>

* fix validateEmbeddingsFieldValues Method

Signed-off-by: Gopala-Krishna.Char <[email protected]>

* spotless formatting

Signed-off-by: Gopala-Krishna.Char <[email protected]>

* Onboard jenkins prod docker images on github actions (#483)

Signed-off-by: Peter Zhu <[email protected]>
Signed-off-by: Gopala-Krishna.Char <[email protected]>

* Update dependency org.json:json to v20231013 (#481)

Co-authored-by: mend-for-github-com[bot] <50673670+mend-for-github-com[bot]@users.noreply.github.com>
Signed-off-by: Gopala-Krishna.Char <[email protected]>

* [Backport main manually][bug fix] Fix async actions are left in neural_sparse query (#438) (#479)

* [bug fix] Fix async actions are left in neural_sparse query (#438)

* add serialization and deserialization

Signed-off-by: zhichao-aws <[email protected]>

* hash, equals. + UT

Signed-off-by: zhichao-aws <[email protected]>

* tidy

Signed-off-by: zhichao-aws <[email protected]>

* add test

Signed-off-by: zhichao-aws <[email protected]>

---------

Signed-off-by: zhichao-aws <[email protected]>
(cherry picked from commit 51e6c00)

* rm max_token_score

Signed-off-by: zhichao-aws <[email protected]>

* add changelog

Signed-off-by: zhichao-aws <[email protected]>

* tidy

Signed-off-by: zhichao-aws <[email protected]>

---------

Signed-off-by: zhichao-aws <[email protected]>
Signed-off-by: Gopala-Krishna.Char <[email protected]>

* Fixed exception for case when Hybrid query being wrapped into bool query (#490)

* Adding null check for case when hybrid query wrapped into bool query

Signed-off-by: Martin Gaievski <[email protected]>
Signed-off-by: Gopala-Krishna.Char <[email protected]>

* Fixed Hybrid query for cases when it's wrapped into other compound queries (#498)

* Fixed nested field case

Signed-off-by: Martin Gaievski <[email protected]>
Signed-off-by: Gopala-Krishna.Char <[email protected]>

* Added the github action to copy the attached issues label to PR. (#504)

Signed-off-by: Navneet Verma <[email protected]>
Signed-off-by: Gopala-Krishna.Char <[email protected]>

* Added support for jdk-21 (#500)

* Added support for jdk-21

Signed-off-by: Martin Gaievski <[email protected]>
Signed-off-by: Gopala-Krishna.Char <[email protected]>

* Add unit tests + small fixes

Signed-off-by: krishy91 <[email protected]>

* fix indentation

Signed-off-by: krishy91 <[email protected]>

* remove unused code + add 2nd level nesting test

Signed-off-by: krishy91 <[email protected]>

* add integration test for list of nested objects

Signed-off-by: krishy91 <[email protected]>

---------

Signed-off-by: Gopala-Krishna.Char <[email protected]>
Signed-off-by: Peter Zhu <[email protected]>
Signed-off-by: zhichao-aws <[email protected]>
Signed-off-by: Martin Gaievski <[email protected]>
Signed-off-by: Navneet Verma <[email protected]>
Signed-off-by: krishy91 <[email protected]>
Co-authored-by: Gopala-Krishna.Char <[email protected]>
Co-authored-by: Peter Zhu <[email protected]>
Co-authored-by: mend-for-github-com[bot] <50673670+mend-for-github-com[bot]@users.noreply.github.com>
Co-authored-by: zhichao-aws <[email protected]>
Co-authored-by: Martin Gaievski <[email protected]>
Co-authored-by: Navneet Verma <[email protected]>
(cherry picked from commit ea49d3c)
vibrantvarun pushed a commit that referenced this pull request Mar 5, 2024
* Handle case with nested list of objects

Signed-off-by: Gopala-Krishna.Char <[email protected]>

* fix validateEmbeddingsFieldValues Method

Signed-off-by: Gopala-Krishna.Char <[email protected]>

* spotless formatting

Signed-off-by: Gopala-Krishna.Char <[email protected]>

* Onboard jenkins prod docker images on github actions (#483)

Signed-off-by: Peter Zhu <[email protected]>
Signed-off-by: Gopala-Krishna.Char <[email protected]>

* Update dependency org.json:json to v20231013 (#481)

Co-authored-by: mend-for-github-com[bot] <50673670+mend-for-github-com[bot]@users.noreply.github.com>
Signed-off-by: Gopala-Krishna.Char <[email protected]>

* [Backport main manually][bug fix] Fix async actions are left in neural_sparse query (#438) (#479)

* [bug fix] Fix async actions are left in neural_sparse query (#438)

* add serialization and deserialization

Signed-off-by: zhichao-aws <[email protected]>

* hash, equals. + UT

Signed-off-by: zhichao-aws <[email protected]>

* tidy

Signed-off-by: zhichao-aws <[email protected]>

* add test

Signed-off-by: zhichao-aws <[email protected]>

---------

Signed-off-by: zhichao-aws <[email protected]>
(cherry picked from commit 51e6c00)

* rm max_token_score

Signed-off-by: zhichao-aws <[email protected]>

* add changelog

Signed-off-by: zhichao-aws <[email protected]>

* tidy

Signed-off-by: zhichao-aws <[email protected]>

---------

Signed-off-by: zhichao-aws <[email protected]>
Signed-off-by: Gopala-Krishna.Char <[email protected]>

* Fixed exception for case when Hybrid query being wrapped into bool query (#490)

* Adding null check for case when hybrid query wrapped into bool query

Signed-off-by: Martin Gaievski <[email protected]>
Signed-off-by: Gopala-Krishna.Char <[email protected]>

* Fixed Hybrid query for cases when it's wrapped into other compound queries (#498)

* Fixed nested field case

Signed-off-by: Martin Gaievski <[email protected]>
Signed-off-by: Gopala-Krishna.Char <[email protected]>

* Added the github action to copy the attached issues label to PR. (#504)

Signed-off-by: Navneet Verma <[email protected]>
Signed-off-by: Gopala-Krishna.Char <[email protected]>

* Added support for jdk-21 (#500)

* Added support for jdk-21

Signed-off-by: Martin Gaievski <[email protected]>
Signed-off-by: Gopala-Krishna.Char <[email protected]>

* Add unit tests + small fixes

Signed-off-by: krishy91 <[email protected]>

* fix indentation

Signed-off-by: krishy91 <[email protected]>

* remove unused code + add 2nd level nesting test

Signed-off-by: krishy91 <[email protected]>

* add integration test for list of nested objects

Signed-off-by: krishy91 <[email protected]>

---------

Signed-off-by: Gopala-Krishna.Char <[email protected]>
Signed-off-by: Peter Zhu <[email protected]>
Signed-off-by: zhichao-aws <[email protected]>
Signed-off-by: Martin Gaievski <[email protected]>
Signed-off-by: Navneet Verma <[email protected]>
Signed-off-by: krishy91 <[email protected]>
Co-authored-by: Gopala-Krishna.Char <[email protected]>
Co-authored-by: Peter Zhu <[email protected]>
Co-authored-by: mend-for-github-com[bot] <50673670+mend-for-github-com[bot]@users.noreply.github.com>
Co-authored-by: zhichao-aws <[email protected]>
Co-authored-by: Martin Gaievski <[email protected]>
Co-authored-by: Navneet Verma <[email protected]>
(cherry picked from commit ea49d3c)

Co-authored-by: Gopala-Krishna Char <[email protected]>
yuye-aws pushed a commit to yuye-aws/neural-search that referenced this pull request Mar 8, 2024
* Handle case with nested list of objects

Signed-off-by: Gopala-Krishna.Char <[email protected]>

* fix validateEmbeddingsFieldValues Method

Signed-off-by: Gopala-Krishna.Char <[email protected]>

* spotless formatting

Signed-off-by: Gopala-Krishna.Char <[email protected]>

* Onboard jenkins prod docker images on github actions (opensearch-project#483)

Signed-off-by: Peter Zhu <[email protected]>
Signed-off-by: Gopala-Krishna.Char <[email protected]>

* Update dependency org.json:json to v20231013 (opensearch-project#481)

Co-authored-by: mend-for-github-com[bot] <50673670+mend-for-github-com[bot]@users.noreply.github.com>
Signed-off-by: Gopala-Krishna.Char <[email protected]>

* [Backport main manually][bug fix] Fix async actions are left in neural_sparse query (opensearch-project#438) (opensearch-project#479)

* [bug fix] Fix async actions are left in neural_sparse query (opensearch-project#438)

* add serialization and deserialization

Signed-off-by: zhichao-aws <[email protected]>

* hash, equals. + UT

Signed-off-by: zhichao-aws <[email protected]>

* tidy

Signed-off-by: zhichao-aws <[email protected]>

* add test

Signed-off-by: zhichao-aws <[email protected]>

---------

Signed-off-by: zhichao-aws <[email protected]>
(cherry picked from commit 51e6c00)

* rm max_token_score

Signed-off-by: zhichao-aws <[email protected]>

* add changelog

Signed-off-by: zhichao-aws <[email protected]>

* tidy

Signed-off-by: zhichao-aws <[email protected]>

---------

Signed-off-by: zhichao-aws <[email protected]>
Signed-off-by: Gopala-Krishna.Char <[email protected]>

* Fixed exception for case when Hybrid query being wrapped into bool query (opensearch-project#490)

* Adding null check for case when hybrid query wrapped into bool query

Signed-off-by: Martin Gaievski <[email protected]>
Signed-off-by: Gopala-Krishna.Char <[email protected]>

* Fixed Hybrid query for cases when it's wrapped into other compound queries (opensearch-project#498)

* Fixed nested field case

Signed-off-by: Martin Gaievski <[email protected]>
Signed-off-by: Gopala-Krishna.Char <[email protected]>

* Added the github action to copy the attached issues label to PR. (opensearch-project#504)

Signed-off-by: Navneet Verma <[email protected]>
Signed-off-by: Gopala-Krishna.Char <[email protected]>

* Added support for jdk-21 (opensearch-project#500)

* Added support for jdk-21

Signed-off-by: Martin Gaievski <[email protected]>
Signed-off-by: Gopala-Krishna.Char <[email protected]>

* Add unit tests + small fixes

Signed-off-by: krishy91 <[email protected]>

* fix indentation

Signed-off-by: krishy91 <[email protected]>

* remove unused code + add 2nd level nesting test

Signed-off-by: krishy91 <[email protected]>

* add integration test for list of nested objects

Signed-off-by: krishy91 <[email protected]>

---------

Signed-off-by: Gopala-Krishna.Char <[email protected]>
Signed-off-by: Peter Zhu <[email protected]>
Signed-off-by: zhichao-aws <[email protected]>
Signed-off-by: Martin Gaievski <[email protected]>
Signed-off-by: Navneet Verma <[email protected]>
Signed-off-by: krishy91 <[email protected]>
Co-authored-by: Gopala-Krishna.Char <[email protected]>
Co-authored-by: Peter Zhu <[email protected]>
Co-authored-by: mend-for-github-com[bot] <50673670+mend-for-github-com[bot]@users.noreply.github.com>
Co-authored-by: zhichao-aws <[email protected]>
Co-authored-by: Martin Gaievski <[email protected]>
Co-authored-by: Navneet Verma <[email protected]>
Signed-off-by: yuye-aws <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Label will add auto workflow to backport PR to 2.x branch skip-changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants