Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add reindex ITs #446

Closed
wants to merge 3 commits into from
Closed

Conversation

zane-neo
Copy link
Collaborator

@zane-neo zane-neo commented Oct 13, 2023

Description

Add reindex ITs in neural search

Issues Resolved

Issue: #386

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: zane-neo <[email protected]>
Signed-off-by: zane-neo <[email protected]>
@codecov
Copy link

codecov bot commented Oct 13, 2023

Codecov Report

Merging #446 (4af404d) into main (04c1e05) will not change coverage.
The diff coverage is n/a.

@@            Coverage Diff            @@
##               main     #446   +/-   ##
=========================================
  Coverage     84.37%   84.37%           
  Complexity      498      498           
=========================================
  Files            40       40           
  Lines          1491     1491           
  Branches        228      228           
=========================================
  Hits           1258     1258           
  Misses          133      133           
  Partials        100      100           

@navneet1v
Copy link
Collaborator

@zane-neo can we add reindex IT for other processors too?

@navneet1v
Copy link
Collaborator

@zane-neo any update on this PR?

@zane-neo
Copy link
Collaborator Author

@zane-neo can we add reindex IT for other processors too?

The issue in ml-commons is not specific to individual processor, it's more like a generic issue(user info not restored after the listener ran), so one or two processor to cover this is enough I thought.

@navneet1v
Copy link
Collaborator

@zane-neo can we add reindex IT for other processors too?

The issue in ml-commons is not specific to individual processor, it's more like a generic issue(user info not restored after the listener ran), so one or two processor to cover this is enough I thought.

I won't agree with this. It will be good if we have reindex ITs for atleast all the local model like splade and text embeddings. Text + image processor works on remote model so we cannot integrate here.

@zane-neo
Copy link
Collaborator Author

@zane-neo can we add reindex IT for other processors too?

The issue in ml-commons is not specific to individual processor, it's more like a generic issue(user info not restored after the listener ran), so one or two processor to cover this is enough I thought.

I won't agree with this. It will be good if we have reindex ITs for atleast all the local model like splade and text embeddings. Text + image processor works on remote model so we cannot integrate here.

This PR already includes splade and text embedding processors.

@martin-gaievski
Copy link
Member

Using Text_image_embeddings processor with only text field should work with local model. Processor uses mlClient to call the model, and model type is abstracted by the client.

@zane-neo
Copy link
Collaborator Author

Using Text_image_embeddings processor with only text field should work with local model. Processor uses mlClient to call the model, and model type is abstracted by the client.

it's a text embedding case right?

@martin-gaievski
Copy link
Member

Using Text_image_embeddings processor with only text field should work with local model. Processor uses mlClient to call the model, and model type is abstracted by the client.

it's a text embedding case right?

I think it's not. Embeddings are only text, that's correct, but the processor is text_image. That processor can work with text, image and text+image. If we want to test that reindexing works with all processors it would not be correct to test it with text_embedding and assume that text_image_embedding processor is also tested.

@zane-neo
Copy link
Collaborator Author

Using Text_image_embeddings processor with only text field should work with local model. Processor uses mlClient to call the model, and model type is abstracted by the client.

it's a text embedding case right?

I think it's not. Embeddings are only text, that's correct, but the processor is text_image. That processor can work with text, image and text+image. If we want to test that reindexing works with all processors it would not be correct to test it with text_embedding and assume that text_image_embedding processor is also tested.

current tests covered local models which can prove the correctness of ml-commons code, no need to add other cases IMO.

@navneet1v
Copy link
Collaborator

current tests covered local models which can prove the correctness of ml-commons code, no need to add other cases IMO.

This is not the sole purpose of these tests. If the purpose is for correctness of ML code, the tests should be in MLCommons.

We want to cover as many use cases as possible in which an ingestion processor can be used. Re-indexing is one of the use-case here. I know we started this PR as part of making sure that bug we got here: #386 can be caught in future, but that is not the sole purpose of the tests.

@zane-neo
Copy link
Collaborator Author

current tests covered local models which can prove the correctness of ml-commons code, no need to add other cases IMO.

This is not the sole purpose of these tests. If the purpose is for correctness of ML code, the tests should be in MLCommons.

We want to cover as many use cases as possible in which an ingestion processor can be used. Re-indexing is one of the use-case here. I know we started this PR as part of making sure that bug we got here: #386 can be caught in future, but that is not the sole purpose of the tests.

There're quite a lot different flows/features in OpenSearch, e.g. shrink index, split index, restore index etc. Do we have a way to cover all these features when implementing a new feature to ensure they're not breaking? Is covering only reindex enough?

@navneet1v
Copy link
Collaborator

Do we have a way to cover all these features when implementing a new feature to ensure they're not breaking? Is covering only reindex enough?

Given the issue in hand we should start with re-index and then add more case which you provided after this. We can cut a github issue to make sure that we are covering the cases you mentioned too.

@zane-neo
Copy link
Collaborator Author

Do we have a way to cover all these features when implementing a new feature to ensure they're not breaking? Is covering only reindex enough?

Given the issue in hand we should start with re-index and then add more case which you provided after this. We can cut a github issue to make sure that we are covering the cases you mentioned too.

Is there already a generic approach to address this? Like which can be used for all plugins so that no need to implement this in individual plugin separately?

@navneet1v
Copy link
Collaborator

Do we have a way to cover all these features when implementing a new feature to ensure they're not breaking? Is covering only reindex enough?

Given the issue in hand we should start with re-index and then add more case which you provided after this. We can cut a github issue to make sure that we are covering the cases you mentioned too.

Is there already a generic approach to address this? Like which can be used for all plugins so that no need to implement this in individual plugin separately?

As per my understanding there is no such thing. Main thing which I was trying to put here was, you are adding ITs only for sparse and text embedding processor. There are other processors too in the plugin. So it will be better if you can add re-index ITs for those processors too.

@zane-neo
Copy link
Collaborator Author

zane-neo commented Feb 6, 2024

Will do more test on open source part to reproduce this issue, will reopen this or create a new PR to this. Closing this now.

@zane-neo zane-neo closed this Feb 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants