From 679747496d64833cdaaa9d27ce79c5a2e3a5539b Mon Sep 17 00:00:00 2001
From: Sihan Wang
Date: Mon, 8 Jan 2024 14:18:32 -0800
Subject: [PATCH] Update

Signed-off-by: Sihan Wang
---
 models/README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/models/README.md b/models/README.md
index 02c3abc7..a8fe030a 100644
--- a/models/README.md
+++ b/models/README.md
@@ -45,6 +45,8 @@ RayLLM supports continuous batching, meaning incoming requests are processed as
 * `gcs_mirror_config` is a dictionary that contains configuration for loading the model from Google Cloud Storage instead of Hugging Face Hub. You can use this to speed up downloads.
 
 #### TRTLLM Engine Config
+* `model_id` is the ID that refers to the model in the RayLLM or OpenAI API.
+* `type` is the type of inference engine. `VLLMEngine`, `TRTLLMEngine`, and `EmbeddingEngine` are currently supported.
 * `model_local_path` is the path to the TensorRT-LLM model directory.
 * `s3_mirror_config` is a dictionary that contains configurations for loading the model from S3 instead of Hugging Face Hub. You can use this to speed up downloads.
 * `generation` contains configurations related to default generation parameters such as `prompt_format` and `stopping_sequences`.
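
As a rough sketch of how the fields documented in this patch fit together, a TRTLLM engine config entry might look like the following. The key names come from the bullets above; all values, and the keys inside `s3_mirror_config` and `generation`, are illustrative placeholders rather than a verified RayLLM schema:

```yaml
# Hypothetical TRTLLM engine config sketch; values are placeholders.
engine_config:
  model_id: meta-llama/Llama-2-7b-chat-hf   # ID exposed via the RayLLM / OpenAI API
  type: TRTLLMEngine                        # one of VLLMEngine, TRTLLMEngine, EmbeddingEngine
  model_local_path: /models/llama-2-7b-trtllm  # TensorRT-LLM model directory
  s3_mirror_config:                         # load from S3 instead of Hugging Face Hub
    # keys below are illustrative, not a confirmed schema
    bucket_uri: s3://my-bucket/llama-2-7b-trtllm/
  generation:                               # default generation parameters
    prompt_format:
      system: "<<SYS>>{instruction}<</SYS>>"
    stopping_sequences: ["</s>"]
```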