[Question] Slow Speed of vLLM when evaluating MMLU #35

Open
3 tasks done
cby-pku opened this issue Aug 5, 2024 · 1 comment
Labels
question Further information is requested

Comments

@cby-pku
Contributor

cby-pku commented Aug 5, 2024

Required prerequisites

Questions

The codebase supports vLLM inference for MMLU evaluation, but it is slow: a single task takes about 20 minutes. In my experience, 20 minutes is normally enough to run all tasks.

@cby-pku cby-pku added the question Further information is requested label Aug 5, 2024
@Kass123777
Collaborator

Thank you for your question!
This is a known issue. The current architecture implements the BaseInference class for both deepspeed and vllm in the same Python file, and importing the deepspeed-related dependencies prevents vllm from starting properly. As a workaround, I set distributed_executor_backend="ray" when starting vllm, which does significantly reduce efficiency.
In the next version we will modify the framework to fully decouple the two backends, so that vllm can run at its full inference speed.
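
As a rough illustration of what such a decoupling could look like (a minimal sketch, not the project's actual code), each backend can be imported lazily so that selecting vllm never pulls in deepspeed. The helper names `build_vllm_kwargs` and `load_backend` are hypothetical; `distributed_executor_backend` is the vllm engine argument mentioned above. Whether leaving it unset avoids the startup problem is an assumption that would need to be tested.

```python
from typing import Any, Dict, Optional


def build_vllm_kwargs(
    model: str,
    tensor_parallel_size: int = 1,
    executor_backend: Optional[str] = None,
) -> Dict[str, Any]:
    """Collect keyword arguments for vllm.LLM (hypothetical helper).

    The executor backend is only passed when explicitly requested, so
    vllm falls back to its own default otherwise.
    """
    kwargs: Dict[str, Any] = {
        "model": model,
        "tensor_parallel_size": tensor_parallel_size,
    }
    if executor_backend is not None:
        # e.g. "ray", the workaround currently used in this codebase
        kwargs["distributed_executor_backend"] = executor_backend
    return kwargs


def load_backend(name: str, **kwargs: Any) -> Any:
    """Import only the requested backend (hypothetical helper)."""
    if name == "vllm":
        # Lazy import: deepspeed is never imported on this code path.
        from vllm import LLM
        return LLM(**build_vllm_kwargs(**kwargs))
    if name == "deepspeed":
        # Lazy import: vllm is never imported on this code path.
        import deepspeed
        return deepspeed
    raise ValueError(f"unknown backend: {name!r}")
```

With this layout, `load_backend("vllm", model=...)` would start vllm without the deepspeed import, and the "ray" setting could be dropped or kept through `executor_backend` depending on what works in practice.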
