Remove Gunicorn and use Uvicorn only for gateway #530

Open
wants to merge 2 commits into main

Conversation

@yunfeng-scale (Collaborator) commented May 31, 2024

Pull Request Summary

In a Kubernetes environment we don't really need multiple workers in the same pod; it's simpler to just have Kubernetes autoscale the number of pods. Based on some internal benchmarks, Gunicorn has some known load-balancing issues, and removing this layer results in fewer errors and better latency.
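
For illustration, a minimal sketch of what a Uvicorn-only, single-worker launch could look like; the app import path, host, and port here are assumptions, not necessarily the actual gateway entrypoint:

# Hypothetical sketch of launching the gateway with Uvicorn alone (no Gunicorn wrapper).
# The app path, host, and port are assumed for illustration.
import subprocess

command = [
    "uvicorn",
    "model_engine_server.api.app:app",  # assumed ASGI application path
    "--host",
    "0.0.0.0",
    "--port",
    "5000",
    "--workers",
    "1",  # let the Kubernetes Deployment scale the number of pods instead
]
subprocess.run(command, check=True)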

Test Plan and Usage Guide

Will run simple load testing for GET requests with and without Gunicorn.
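
As a sketch of what that load test could look like (the endpoint URL, concurrency, and request count are assumptions for illustration):

# Hypothetical GET load-test sketch; URL, concurrency, and request volume are assumptions.
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:5000/healthcheck"  # assumed gateway endpoint

def timed_get(_):
    start = time.perf_counter()
    requests.get(URL, timeout=10)
    return time.perf_counter() - start

with ThreadPoolExecutor(max_workers=50) as pool:
    latencies = list(pool.map(timed_get, range(5000)))

print(f"p50={statistics.median(latencies) * 1000:.1f} ms, "
      f"p99={statistics.quantiles(latencies, n=100)[98] * 1000:.1f} ms")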

yunfeng-scale requested a review from a team May 31, 2024 17:59
yunfeng-scale changed the title Remove Gunicorn and use Uvicorn only for gateway → MLI-2258 Remove Gunicorn and use Uvicorn only for gateway May 31, 2024
yunfeng-scale changed the title MLI-2258 Remove Gunicorn and use Uvicorn only for gateway → Remove Gunicorn and use Uvicorn only for gateway May 31, 2024
@edgan8 (Collaborator) left a comment

This is a big change, can you include a summary of the test plan? I'm worried about any performance impact

"--workers",
f"{num_workers}",
"1", # Let the Kubernetes deployment handle the number of pods

Collaborator

Will this reduce the amount of traffic we can receive per-pod? Why not keep it at 4?

Collaborator Author

This is to remove load balancing within the pod.

Member

are we increasing the number of pods by 4 to compensate?

Collaborator Author

that's the initial plan

Collaborator Author

tbh I think LLM Engine is overprovisioned

@yixu34 (Member) commented Jun 1, 2024

This is a big change, can you include a summary of the test plan? I'm worried about any performance impact

In addition to:

Will run simple load testing for GET requests with and without Gunicorn.

Let's also monitor post-rollout 👀
