This repository has been archived by the owner on May 28, 2024. It is now read-only.

Commit

Update
Signed-off-by: Sihan Wang <[email protected]>
sihanwang41 committed Jan 8, 2024
1 parent 7a2db35 commit 6797474
Showing 1 changed file with 2 additions and 0 deletions: models/README.md
@@ -45,6 +45,8 @@ RayLLM supports continuous batching, meaning incoming requests are processed as
* `gcs_mirror_config` is a dictionary that contains configuration for loading the model from Google Cloud Storage instead of Hugging Face Hub. You can use this to speed up downloads.

#### TRTLLM Engine Config
* `model_id` is the ID that refers to the model in the RayLLM or OpenAI API.
* `type` is the type of inference engine. `VLLMEngine`, `TRTLLMEngine`, and `EmbeddingEngine` are currently supported.
* `model_local_path` is the path to the TensorRT-LLM model directory.
* `s3_mirror_config` is a dictionary that contains configuration for loading the model from S3 instead of Hugging Face Hub. You can use this to speed up downloads.
* `generation` contains configurations related to default generation parameters such as `prompt_format` and `stopping_sequences`.
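Taken together, the fields above might be combined into an engine config entry like the following sketch. This is a hedged illustration, not the exact RayLLM schema: the model ID, paths, the `bucket_uri` key name, and the `prompt_format` sub-fields are all assumptions made for the example.

```yaml
# Hypothetical sketch of a TRTLLM engine config. All values, and the
# bucket_uri / prompt_format key names, are illustrative assumptions.
engine_config:
  model_id: meta-llama/Llama-2-7b-chat-hf        # ID exposed through the RayLLM / OpenAI API
  type: TRTLLMEngine                             # or VLLMEngine / EmbeddingEngine
  model_local_path: /models/llama-2-7b-trtllm    # compiled TensorRT-LLM model directory
  s3_mirror_config:
    bucket_uri: s3://my-bucket/llama-2-7b-trtllm/  # S3 mirror to speed up downloads
  generation:
    prompt_format:
      system: "<<SYS>> {instruction} <</SYS>>"   # assumed template field
    stopping_sequences: ["</s>"]                 # default stop strings
```

With a config like this, the engine would load weights from the S3 mirror when present and fall back to Hugging Face Hub otherwise, which is the stated purpose of the mirror configs.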
