[BUG] BWC issue with async http client change #2417

Closed · ylwu-amzn opened this issue May 8, 2024 · 2 comments
Labels: bug

@ylwu-amzn (Collaborator) commented May 8, 2024

What is the bug?
We changed to an async HTTP client in 2.14 (#1839). Testing with 2.14 RC4, we found that the predict result for a text embedding model is not the same as in 2.13.

How can one reproduce the bug?
Steps to reproduce the behavior:

  1. Create two test clusters: 2.13 and 2.14 RC4
  2. Follow the steps in issue [BUG] Neural search: 4xx error ingesting data with Sagemaker external model #2249 to create the model
  3. Run the following request on both the 2.13 and 2.14 test clusters:
POST /_plugins/_ml/_predict/text_embedding/<your_model_id>
{
  "text_docs":[ "today is sunny", "hello world"],
  "return_number": true,
  "target_response": ["sentence_embedding"]
}

2.13 result

{
  "inference_results": [
    {
      "output": [
        {
          "name": "sentence_embedding",
          "data_type": "FLOAT32",
          "shape": [
            768
          ],
          "data": [
            0.009424215,
            -0.008311393,
            0.06740056,
            ...
           ]
        }
      ],
      "status_code": 200
    },
    {
      "output": [
        {
          "name": "sentence_embedding",
          "data_type": "FLOAT32",
          "shape": [
            768
          ],
          "data": [
            0.010724226,
            0.055782758,
            0.027084162,
            ...
          ]
        }
      ],
      "status_code": 200
    }
  ]
}

2.14 result

{
  "inference_results": [
    {
      "output": [
        {
          "name": "sentence_embedding",
          "data_type": "FLOAT32",
          "shape": [
            768
          ],
          "data": [
            0.009424215,
            -0.008311393,
            0.06740056,
            ...
          ]
        }
      ],
      "status_code": 200
    }
  ]
}

What is the expected behavior?
We should not break backwards compatibility (BWC): 2.14 should return the same result as 2.13, i.e. one embedding result per input document, rather than a single result for two documents.

What is your host/environment?

  • OS: [e.g. iOS]
  • Version [e.g. 22]
  • Plugins

Do you have any screenshots?
If applicable, add screenshots to help explain your problem.

Do you have any additional context?
Add any other context about the problem.

@zane-neo (Collaborator) commented May 9, 2024

The issue is caused by the refactor of the HTTP client from sync to async. With the sync HTTP client and a user-defined preprocess function, a list of string inputs is processed sequentially: the user script always picks up the first element of the list, and the list shrinks each time a prediction completes.
With the async HTTP client, we need to calculate the total number of chunks before sending any request to the remote model endpoint, but the code doesn't handle the user-defined-script case well: the request is treated as a single batch request to the remote model. So when a user-defined script is present, we need to calculate the chunks in a different way (see the sketch below). This PR fixes the issue: #2418
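
To make the chunk calculation concrete, here is a minimal sketch of the decision the fix introduces. This is not the actual ml-commons code; the names ChunkCountSketch, calculateChunkCount, and hasUserDefinedScript are hypothetical, chosen only to illustrate the logic described above.

import java.util.List;

// Hypothetical sketch, not the real ml-commons implementation.
final class ChunkCountSketch {

    // With the async client, every outgoing request must be counted before any
    // is dispatched, so the caller knows how many responses to collect before
    // assembling the final inference_results array.
    static int calculateChunkCount(List<String> textDocs, boolean hasUserDefinedScript) {
        if (hasUserDefinedScript) {
            // A user-defined preprocess script consumes one document per call,
            // so each document must become its own request and result entry.
            return textDocs.size();
        }
        // Without a user-defined script, all documents go out as one batch.
        return 1;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("today is sunny", "hello world");
        System.out.println(calculateChunkCount(docs, false)); // batch path: 1 chunk
        System.out.println(calculateChunkCount(docs, true));  // per-document path: 2 chunks
    }
}

Under this assumption, the two-document request from the reproduction steps produces two chunks and therefore two inference_results entries, matching the 2.13 output above.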

@dhrubo-os (Collaborator) commented

PR is merged. Closing this issue.
