From 679747496d64833cdaaa9d27ce79c5a2e3a5539b Mon Sep 17 00:00:00 2001
From: Sihan Wang
Date: Mon, 8 Jan 2024 14:18:32 -0800
Subject: [PATCH] Update

Signed-off-by: Sihan Wang
---
 models/README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/models/README.md b/models/README.md
index 02c3abc7..a8fe030a 100644
--- a/models/README.md
+++ b/models/README.md
@@ -45,6 +45,8 @@ RayLLM supports continuous batching, meaning incoming requests are processed as
 * `gcs_mirror_config` is a dictionary that contains configuration for loading the model from Google Cloud Storage instead of Hugging Face Hub. You can use this to speed up downloads.
 
 #### TRTLLM Engine Config
+* `model_id` is the ID that refers to the model in the RayLLM or OpenAI API.
+* `type` is the type of inference engine. `VLLMEngine`, `TRTLLMEngine`, and `EmbeddingEngine` are currently supported.
 * `model_local_path` is the path to the TensorRT-LLM model directory.
 * `s3_mirror_config` is a dictionary that contains configurations for loading the model from S3 instead of Hugging Face Hub. You can use this to speed up downloads.
 * `generation` contains configurations related to default generation parameters such as `prompt_format` and `stopping_sequences`.
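
As a rough sketch of how the fields documented in this patch fit together, a TRTLLM engine config entry might look like the following. The key names come from the bullets above; all values, and the keys inside `s3_mirror_config` and `generation`, are illustrative placeholders rather than a verified RayLLM schema:

```yaml
# Hypothetical TRTLLM engine config sketch; values are placeholders.
engine_config:
  model_id: meta-llama/Llama-2-7b-chat-hf   # ID exposed via the RayLLM / OpenAI API
  type: TRTLLMEngine                        # one of VLLMEngine, TRTLLMEngine, EmbeddingEngine
  model_local_path: /models/llama-2-7b-trtllm  # TensorRT-LLM model directory
  s3_mirror_config:                         # load from S3 instead of Hugging Face Hub
    # keys below are illustrative, not a confirmed schema
    bucket_uri: s3://my-bucket/llama-2-7b-trtllm/
  generation:                               # default generation parameters
    prompt_format:
      system: "<<SYS>>{instruction}<</SYS>>"
    stopping_sequences: ["</s>"]
```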