
[Enhancement] Enhance ML inference processor for input processing #3193

Open · ylwu-amzn opened this issue Oct 30, 2024 · 6 comments

Labels: enhancement (New feature or request)

ylwu-amzn (Collaborator) commented Oct 30, 2024

What's the problem?

Currently, when using ML inference processors, the user chooses a model and then sets input_map, loading the input map's keys from the model interface. This design works for simple model interfaces where the documents exactly match the format the model interface expects.

However, if an input parameter of the model interface can't be mapped to a document field name via JSON path, users need to change the model interface to match the document fields. Constructing a prompt is one of the more complex use cases: a prompt usually mixes static instructions with content from the documents. Configuring a different model interface for each prompt engineering use case makes the model non-reusable.
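
For reference, this is the shape of the current input_map convention: each key is a model input field taken from the model interface, and each value is a document field name or JSON path. A minimal sketch (using the "query" field from the sample documents below):

"input_map": [
          {
            "inputs": "query"
          }
        ]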

Sample Use Case: Bedrock Claude V3

0. Index

This use case indexes music documents. Every document has two fields, "persona" and "query":


PUT /music/_doc/1
{
  "persona": "financial analyst",
  "query": "who is taylor swift"
}

PUT /music/_doc/2
{
  "persona": "local farmer",
  "query": "who is taylor swift"
}

PUT /music/_doc/3
{
  "persona": "financial analyst",
  "query": "justin bieber"
}

1. Model:

Using the Bedrock Claude V3 model as an example, this is the model API; it requires a messages field:

POST /model/us.anthropic.claude-3-5-sonnet-20240620-v1:0/invoke HTTP/1.1
{
  "anthropic_version": "bedrock-2023-05-31",
  "max_tokens": 1024,
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Hello world"
        }
      ]
    }
  ]
}

The system field is optional; this example (using the Anthropic Python SDK) shows an optional system prompt:

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=2048,
    system="You are a seasoned data scientist at a Fortune 500 company.", # <-- role prompt
    messages=[
        {"role": "user", "content": "Analyze this dataset for anomalies: <dataset>{{DATASET}}</dataset>"}
    ]
)

print(response.content)

2. Blueprint:

In OpenSearch, we have a blueprint for this connector (currently, it's missing the system field). The ideal blueprint would be:

POST /_plugins/_ml/connectors/_create
{
  "name": "Claude V3",
  "description": "Connector for Claude V3",
  "version": 1,
  "protocol": "aws_sigv4",
  "parameters": {
        "region": "us-west-2",
        "service_name": "bedrock",
        "auth": "Sig_V4",
        "response_filter": "$.content[0].text",
        "max_tokens_to_sample": "8000",
        "anthropic_version": "bedrock-2023-05-31",
        "model": "anthropic.claude-3-sonnet-20240229-v1:0",
        "system":""
  },
  "credential": {
    "access_key": "",
    "secret_key": "",
    "session_token": ""
  },
  "actions": [
    {
     "action_type": "PREDICT",
      "method": "POST",
      "url": "https://bedrock-runtime.us-west-2.amazonaws.com/model/anthropic.claude-instant-v1/invoke",
      "headers": {
        "x-amz-content-sha256": "required",
        "content-type": "application/json"
      },
      "request_body": "{\"messages\":[{\"role\":\"${parameters.users}\",\"content\":[{\"type\":\"text\",\"text\":\"${parameters.inputs}\"}]}],\"anthropic_version\":\"${parameters.anthropic_version}\",\"max_tokens\":${parameters.max_tokens_to_sample},\"system\":\"${parameters.system:-null}\"}"
    }
  ]
}

POST /_plugins/_ml/models/_register
{
  "name": "Claude V3 model",
  "version": "1.0.1",
  "function_name": "remote",
  "description": "Claude V3",
  "connector_id": "Pali-ZIBAs32TwoKZ1th"
}

POST /_plugins/_ml/models/Qqli-ZIBAs32TwoKh1tO/_deploy

POST /_plugins/_ml/models/Qqli-ZIBAs32TwoKh1tO/_predict
{
  "parameters": {
    "inputs": "How many moons does Jupiter have?",
    "system": "You are an ${parameters.role}, tell me about ${parameters.inputs}. Ensure that you generate a short answer of less than 10 words.",
    "role": "assistant"
  }
}

Sample predict call with a prompt requesting an answer within 90 words:

POST /_plugins/_ml/models/Qqli-ZIBAs32TwoKh1tO/_predict
{
  "parameters": {
    "inputs": "How many moons does Jupiter have?",
    "role": "assistant",
    "system": "You are an ${parameters.role}, tell me about ${parameters.inputs}. Ensure that you generate a short answer of less than 90 words."
  }
}

Returns:

{
  "inference_results": [
    {
      "output": [
        {
          "name": "response",
          "dataAsMap": {
            "response": "Jupiter has 79 known moons. The four largest moons of Jupiter that were discovered by Galileo Galilei in 1610 are Io, Europa, Ganymede, and Callisto. Io is the innermost of the four and volcanically active due to tidal heating from gravitational tug-of-war with Jupiter and the other large moons. Europa's icy surface likely hides an ocean of liquid water beneath. Ganymede is the largest moon in the Solar System. Callisto is also thought to harbor a subsurface ocean. Many of Jupiter's other moons are much smaller and more irregularly shaped. Several were discovered during the past few decades using ground- and space-based telescopes."
          }
        }
      ],
      "status_code": 200
    }
  ]
}

Sample predict call with a prompt requesting an answer within 10 words:

POST /_plugins/_ml/models/cqkn-ZIBAs32TwoK11ql/_predict
{
  "parameters": {
    "inputs": "How many moons does Jupiter have?",
    "system": "You are an ${parameters.role}, tell me about ${parameters.inputs}. Ensure that you generate a short answer of less than 10 words.",
    "role": "assistant"
  }
}

Returns:

{
  "inference_results": [
    {
      "output": [
        {
          "name": "response",
          "dataAsMap": {
            "response": "79 moons."
          }
        }
      ],
      "status_code": 200
    }
  ]
}

3. Model Interface for Bedrock:

We have a predefined model interface for Bedrock that requires a parameters.inputs field:

GET /_plugins/_ml/models/Qqli-ZIBAs32TwoKh1tO


{
  "name": "Claude V3 model",
  "model_group_id": "b6mh-JIBAs32TwoKZViT",
  "algorithm": "REMOTE",
  "model_version": "8",
  "description": "Claude V3",
  "model_state": "DEPLOYED",
  "created_time": 1730760836942,
  "last_updated_time": 1730760846556,
  "last_deployed_time": 1730760846556,
  "auto_redeploy_retry_times": 0,
  "planning_worker_node_count": 4,
  "current_worker_node_count": 4,
  "planning_worker_nodes": [
    "cBnpgDCdSd-qvzs6U8LT0g",
    "fqaeDeUxQwuE67FM-LpPNQ",
    "YTE9jdiwTfaMhPPUvt5D5A",
    "I1c8plZITK6EVQ5Ah3iekQ"
  ],
  "deploy_to_all_nodes": true,
  "is_hidden": false,
  "connector_id": "Pali-ZIBAs32TwoKZ1th",
  "interface": {
    "output": """{
    "type": "object",
    "properties": {
        "inference_results": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "output": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "name": {
                                    "type": "string"
                                },
                                "dataAsMap": {
                                    "type": "object",
                                    "properties": {
                                        "response": {
                                            "type": "string"
                                        }
                                    },
                                    "required": [
                                        "response"
                                    ]
                                }
                            },
                            "required": [
                                "name",
                                "dataAsMap"
                            ]
                        }
                    },
                    "status_code": {
                        "type": "integer"
                    }
                },
                "required": [
                    "output",
                    "status_code"
                ]
            }
        }
    },
    "required": [
        "inference_results"
    ]
}""",
    "input": """{
    "type": "object",
    "properties": {
        "parameters": {
            "type": "object",
            "properties": {
                "inputs": {
                    "type": "string"
                }
            },
            "required": [
                "inputs"
            ]
        }
    },
    "required": [
        "parameters"
    ]
}"""
  }
}

4. Prompt

Now, when configuring the prompt, we put it in the model_config field:

{"model_config":{"system": "You are an ${parameters.role}, tell me about ${parameters.inputs}, Ensure tha you can generate a short answer less than 10 words."}}

But when we load the model interface, it only requires the inputs field; role and system are optional model input fields, so neither appears in the interface.

5. Input_map

In the current design, we can load the model interface keys as input_map keys, but we also need to map the role field, which isn't in the model interface:

"input_map": [
          {
            "role": "persona",
            "inputs": "query"
          }
        ]
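
Concretely, for document 1 this mapping, combined with the model_config system prompt above, would produce a predict call roughly like this (a hypothetical reconstruction for illustration only):

POST /_plugins/_ml/models/Qqli-ZIBAs32TwoKh1tO/_predict
{
  "parameters": {
    "inputs": "who is taylor swift",
    "role": "financial analyst",
    "system": "You are an ${parameters.role}, tell me about ${parameters.inputs}. Ensure that you generate a short answer of less than 10 words."
  }
}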

Here is the full config of the search pipeline (using the same three documents indexed above):

PUT /_search/pipeline/my_pipeline_claude3
{
  "response_processors": [
    {
      "ml_inference": {
        "tag": "ml_inference",
        "description": "This processor is going to run claude3",
        "model_id": "Qqli-ZIBAs32TwoKh1tO",
        "function_name": "REMOTE",
        "input_map": [
          {
            "role": "persona",
            "inputs": "query"
          }
        ],
        "output_map": [
          {
            "claude_response": "response"
          }
        ],
        "model_config": {"system":"You are an ${parameters.role}, tell me about ${parameters.inputs}, Ensure tha you can generate a short answer less than 50 words."
        },
        "one_to_one":true,
        "ignore_missing": false,
        "ignore_failure": false
      }
    }
  ]
}

Pain point:

The pain point is having to format document fields to match the model interface.

If we want to map model input fields directly to document fields, users currently need to modify the model interface.

Proposal:

Proposal 1: Allow string substitutions in the input_map field
Remove the prompt setting from model_config in the ML inference processor, and configure the prompt (system) field in the input_map, similar to this:

"input_map": [
          {
            "system": "You are an ${persona}, tell me about ${query}, 
                       Ensure tha you can generate a short answer less than 10 words.",
            "inputs": "query"            
          }
        ]

For comparison, this is the equivalent predict API command:

POST /_plugins/_ml/models/Qqli-ZIBAs32TwoKh1tO/_predict
{
  "parameters": {
    "inputs": "How many moons does Jupiter have?",
    "system": "You are an ${parameters.role}, tell me about ${parameters.inputs}. Ensure that you generate a short answer of less than 10 words.",
    "role": "assistant"
  }
}

Pros:

  1. Directly maps document fields into the system field, formatting the prompt with document content.

Cons:

  1. Optional fields don't show in the model interface:
    the model interface requires an inputs field, while the system field is optional, so we can't preload the system field as an input map key.

  2. Regex pattern limitation:
    when we apply string substitution in input_map, the document content must not happen to contain the same ${...} pattern. For example, suppose a user asks: "Can you explain this code to me: String templateString = "The ${animal} jumped over the ${target}."; animal = "quick brown fox"; target = "lazy dog"", but the document also has a field target = "giant lion".

    In this case, the string in the question is interpreted by the input map as
    "The quick brown fox jumped over the giant lion",
    but in the question the string was meant to read:
    "The quick brown fox jumped over the lazy dog".

    A ${...} pattern in the string will always be substituted whenever a document field happens to have the same name.
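
To make the collision concrete, here is a hypothetical document (the index and field names are illustrative only):

PUT /code_questions/_doc/1
{
  "target": "giant lion",
  "query": "Can you explain this code: String templateString = \"The ${animal} jumped over the ${target}.\";"
}

If the input_map value embeds ${query} and substitution is applied to the assembled string, the literal ${target} inside the user's code snippet is replaced with "giant lion", corrupting the prompt.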

Proposal 2: Transform the model input format by applying a setting in the model_input field

In the ML inference processor, there is a model_input config parameter that can help format the model inputs. There is a rerank example in the docs that uses model_input. model_input can bypass the request_body format in the connector setting.

In this example, the model_input field can be constructed like this (shown with line breaks for readability; it is a single escaped JSON string):

{\"messages\":
[{\"role\":\"${parameters.persona}\",
\"content\":[{\"type\":\"text\",\"text\":\"${parameters.inputs}\"}]}],
\"system\":\"You are an ${parameters.persona}, 
            tell me about ${parameters.inputs}, 
            Ensure that you can generate a short answer less than 50 words.\"}

The input map then simply maps the placeholders to document fields:

"input_map": [
          {
            "persona": "persona",
            "inputs": "query"
          }
        ]
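
Putting Proposal 2 together, a hypothetical full pipeline config could look like the following. The model_input value mirrors the template above, except that the message role is set to the literal "user" (Anthropic only accepts user/assistant roles); this is a sketch, not a tested configuration:

PUT /_search/pipeline/my_pipeline_claude3
{
  "response_processors": [
    {
      "ml_inference": {
        "model_id": "Qqli-ZIBAs32TwoKh1tO",
        "function_name": "REMOTE",
        "input_map": [
          {
            "persona": "persona",
            "inputs": "query"
          }
        ],
        "output_map": [
          {
            "claude_response": "response"
          }
        ],
        "model_input": "{\"messages\":[{\"role\":\"user\",\"content\":[{\"type\":\"text\",\"text\":\"${parameters.inputs}\"}]}],\"system\":\"You are an ${parameters.persona}, tell me about ${parameters.inputs}. Ensure that you generate a short answer of less than 50 words.\"}",
        "one_to_one": true
      }
    }
  ]
}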

Pros:

  1. the model_input field is a powerful way to format model input, not just for prompt engineering cases.
  2. we keep input_map, model_input, and model_config separate, so we don't need regex patterns mixed with string substitution; mapping only happens in input_map and output_map.
  3. the model_input field is already used to format local model inputs; it was added so local model formats can be handled with fewer pre-processing functions.

Cons:

  TBD
ylwu-amzn added the enhancement (New feature or request) and untriaged labels on Oct 30, 2024
dylan-tong-aws commented

The highest-level requirement is that a user should never be forced to create a custom model interface to support a specific search pipeline or use case. The user should be able to use the natural interface for a model (e.g., Bedrock Claude V3's natural interface is the messages or converse API) across all possible pipelines and use cases. This allows the user to share the model across any use case or pipeline. Otherwise, there is a scalability problem, operationally and in usability (and likely performance as well): for each model, users would need to manage N models with unique interfaces for N unique pipelines. A proper design should require only a single model with a common (natural) interface to support N unique pipelines. The current design does not satisfy these requirements, as is evident in RAG use cases. This is due to the coupling of functionality and the order of operations in how fields are mapped and processed into an LLM prompt.

A simple design that decouples preprocessing and post-processing logic from the core processor functionality should suffice. The processing flow can simply be: (i) optional preprocess: perform a data transform on the input data; (ii) map the transform output to the model and execute inference; (iii) optional post-process: perform a data transform on the inference output data. This simple design will guarantee the requirements are satisfied.
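
As a purely illustrative sketch of this flow (the pre_process and post_process fields below are hypothetical and do not exist in the processor today):

{
  "ml_inference": {
    "pre_process": "<hypothetical: transform input documents before field mapping>",
    "input_map": [ { "inputs": "query" } ],
    "model_id": "...",
    "output_map": [ { "llm_response": "response" } ],
    "post_process": "<hypothetical: transform inference output before writing back>"
  }
}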

ylwu-amzn (Collaborator, Author) commented Oct 31, 2024

Thanks @dylan-tong-aws. I had an offline discussion with Tyler yesterday. He has another proposal which can also address the concern that "a user should never be forced/constrained to having to create a custom model interface to support a specific search pipeline or use case":

Tyler prefers something like this in the input map part:

"prompt": "Human: you are a helpful assistant. You have such context $.field1.content, please summarize and answer my question $.query.question"

Rather than having this in the input map:

"content": "$.field1.content"

And configuring the prompt in model_config:

"prompt": "Human: you are a helpful assistant. You have such context ${parameters.content}, please summarize and answer my question $.query.question"

Correct me if I'm wrong, @ohltyler

ohltyler (Member) commented Oct 31, 2024

Regarding option 3, "Enhance current ML inference processor input map parsing logic":

> The highest-level requirement is that a user should never be forced/constrained to having to create a custom model interface to support a specific search pipeline or use case.

This implies a flexible and consistent model interface, which in turn implies flexible and consistent keys in the input map / output map configurations regardless of the use case, because the keys are the model interface inputs/outputs. See the ML processor documentation.

LLM inputs typically include a freeform text input as part of the API (see the Anthropic messages API's content field). For prompt building, the prompt would be passed via this freeform text input, so the ideal model interface includes a text input (note that the current preset connector/model is set up this way as well, with an inputs field). Therefore, the key of the input map should be this freeform text input, and the value should be the freeform input itself. For prompt building use cases, the user should then be able to build out this freeform prompt as a value in the input map.
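
For illustration, under this model a prompt-building configuration could hypothetically look like the following, reusing the JSON-path-in-value syntax quoted in the earlier comment:

"input_map": [
          {
            "inputs": "Human: you are a helpful assistant. You have such context $.field1.content, please summarize and answer my question $.query.question"
          }
        ]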

@ylwu-amzn I agree with Option 3 as you laid it out as one option. Options 1/2 also seem reasonable. I am indifferent on a solution and suggest going with the one that makes the most sense from an implementation perspective, given the limitations of the ML processors. If it is not reasonable to expand the ML processor functionality, then it is simply not reasonable, and one of the other solutions should be pursued.

mingshl (Collaborator) commented Oct 31, 2024

> Thanks @dylan-tong-aws. I had an offline discussion with Tyler yesterday. He has another proposal which can also address the concern that "a user should never be forced/constrained to having to create a custom model interface to support a specific search pipeline or use case":
>
> Tyler prefers something like this in the input map part:
>
> "prompt": "Human: you are a helpful assistant. You have such context $.field1.content, please summarize and answer my question $.query.question"
>
> Rather than having this in the input map:
>
> "content": "$.field1.content"
>
> And configuring the prompt in model_config:
>
> "prompt": "Human: you are a helpful assistant. You have such context ${parameters.content}, please summarize and answer my question $.query.question"
>
> Correct me if I'm wrong, @ohltyler

@ohltyler, check out the model_input field in the ML inference processor configs:

model_input (String): Optional for externally hosted models; required for local models. A template that defines the input field format expected by the model. Each local model type might use a different set of inputs. For externally hosted models, the default is "{ \"parameters\": ${ml_inference.parameters} }".

We can load the prompt field in model_input and skip input mapping if you want. That serves the same purpose.
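
For example, a hypothetical sketch where model_input carries the whole prompt template (the prompt text and parameter names are illustrative only):

"model_input": "{\"messages\":[{\"role\":\"user\",\"content\":[{\"type\":\"text\",\"text\":\"Human: you are a helpful assistant. You have such context ${parameters.content}, please summarize and answer my question ${parameters.question}\"}]}]}"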

dylan-tong-aws commented

@mingshl, @ylwu-amzn, I'm fine with whatever approach or syntax that @ohltyler proposes. Tyler represents an end user, so if he approves of the usability, I'm good with it.

I only expect the solution to address the aforementioned requirement.

@mingshl mingshl removed the untriaged label Nov 5, 2024
@mingshl mingshl self-assigned this Nov 5, 2024
mingshl (Collaborator) commented Nov 5, 2024

Updated the issue description with sample use cases to explain the workflow of using the ML inference search response processor, and added the two proposals.
