diff --git a/models/README.md b/models/README.md
index 11ad02d8..5999c3f6 100644
--- a/models/README.md
+++ b/models/README.md
@@ -74,6 +74,40 @@ A prompt format is used to convert a chat completions API input into a prompt to
 The string template should include the `{instruction}` keyword, which will be replaced with message content from the ChatCompletions API.
 
+For example, if a user sends the following message for llama2-7b-chat-hf ([prompt format](continuous_batching/meta-llama--Llama-2-7b-chat-hf.yaml#L27-L33)):
+```json
+{
+  "messages": [
+    {
+      "role": "system",
+      "content": "You are a helpful assistant."
+    },
+    {
+      "role": "user",
+      "content": "What is the capital of France?"
+    },
+    {
+      "role": "assistant",
+      "content": "The capital of France is Paris."
+    },
+    {
+      "role": "user",
+      "content": "What about Germany?"
+    }
+  ]
+}
+```
+The generated prompt that is sent to the LLM engine will be:
+```
+[INST] <<SYS>>
+You are a helpful assistant.
+<</SYS>>
+
+What is the capital of France? [/INST] The capital of France is Paris. [INST] What about Germany? [/INST]
+```
+
+##### Schema
+
 The following keys are supported:
 * `system` - The system message. This is a message inserted at the beginning of the prompt to provide instructions for the LLM.
 * `assistant` - The assistant message. These messages are from the past turns of the assistant as defined in the list of messages provided in the ChatCompletions API.
@@ -87,7 +121,7 @@ In addition, there are some configurations to control the prompt formatting behavior
 * `strip_whitespace` - Whether to automatically strip whitespace from left and right of the content for the messages provided in the ChatCompletions API.
 
-You can see an example in the [Adding a new model](#adding-a-new-model) section below.
+You can see an example config in the [Adding a new model](#adding-a-new-model) section below.
 
 ### Scaling config
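The message-to-prompt conversion this diff documents can be sketched in a few lines of Python. This is a hypothetical illustration of how a ChatCompletions message list could be folded into the Llama-2-chat prompt shown in the README example; `build_llama2_prompt` is not an actual RayLLM function, and the real behavior is driven by the per-model YAML prompt format.

```python
def build_llama2_prompt(messages):
    """Render a list of {"role", "content"} dicts as a Llama-2-chat style prompt.

    Hypothetical sketch only: mirrors the README example, not RayLLM's
    actual template engine.
    """
    system = ""
    prompt = ""
    for msg in messages:
        # strip_whitespace-style trimming of message content
        role, content = msg["role"], msg["content"].strip()
        if role == "system":
            # The system message is wrapped in <<SYS>> markers and carried
            # into the next [INST] block.
            system = f"<<SYS>>\n{content}\n<</SYS>>\n\n"
        elif role == "user":
            prompt += f"[INST] {system}{content} [/INST]"
            system = ""  # only the first user turn carries the system block
        elif role == "assistant":
            # Past assistant turns are interleaved between [INST] blocks.
            prompt += f" {content} "
    return prompt


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "What about Germany?"},
]
print(build_llama2_prompt(messages))
```

Run against the four messages from the README example, this reproduces the generated prompt shown above, with the system block appearing once inside the first `[INST]` turn.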