[Usage]: Ray + vLLM OpenAI (offline) Batch Inference #8636

Open
1 task done
mbuet2ner opened this issue Sep 19, 2024 · 0 comments
Labels: usage (How to use vllm)


Your current environment

None

How would you like to use vllm

I want to use the OpenAI library to do offline batch inference leveraging Ray (for scaling and scheduling) on top of vLLM.

Context: The plan is to build a FastAPI service that closely mimics OpenAI's batch API and allows processing a large number of prompts (tens of thousands) within 24 h. There are a few ways to achieve this with vLLM, but each one has an important drawback, unless I am missing something:

  • There is an existing guide in the docs that uses the LLM class with Ray. While the LLM class shares OpenAI's sampling parameters, it lacks the important OpenAI prompt (chat) templating (see the sketch after this list for how that gap might be bridged).
  • The run_batch.py entrypoint that was introduced here would be the simplest option, but it does not support Ray out of the box.
  • The third option would be to use the AsyncLLMEngine as done here and bundle it with OpenAIServingChat, as has been done in run_batch.py. But this would entail some (potential) performance degradation from going async even though that is not really needed for offline batch inference.
  • The fourth option could be to use Ray Serve, like in this example from Ray's docs. But this would lack the OpenAI batch format and is, again, async.

Maybe this helps other people as well. Would be super grateful for some feedback. 🙂
And thanks a ton for this very nice piece of software and the great community!

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
mbuet2ner added the usage (How to use vllm) label on Sep 19, 2024