Multinode serving #574
Draft
seanshi-scale wants to merge 91 commits into main from seanshi/20240722-multinode-serving
seanshi-scale commented Jul 25, 2024
seanshi-scale commented Sep 26, 2024
model-engine/model_engine_server/infra/services/live_model_endpoint_service.py (Outdated)
seanshi-scale commented Sep 26, 2024
model-engine/model_engine_server/infra/services/live_model_endpoint_service.py (Outdated)
Pull Request Summary
Multinode serving
API/schema changes:
Bundles/bundle v2:
Fields in metadata to allow for LWS (LeaderWorkerSet); specifically, we need a different command plus some sets of env vars. No code really needed here iiuc, since we're just passing through the existing metadata field.
Use the new fields in the entity for doing multinode.
Endpoints:
nodes_per_worker param controls whether to use an LWS or not
Creating and deleting an LWS endpoint are allowed
Modifying an LWS endpoint into a non-LWS endpoint, or a non-LWS endpoint into an LWS endpoint, is not allowed (similar to how we don't allow switching between sync and async). Changing nodes_per_worker is also not allowed. This is enforced by the API not exposing any way to do it.
Getting an LWS endpoint is allowed
The only available LWS endpoint type will be gpu + streaming. A cpu-only LWS doesn't really make sense imo; async/sync could make sense, but we don't need it for LLM serving at this point
k8s_erd get_resources + get_all_resources need to be LWS compatible
LLM endpoints:
Create a multinode bundle if the situation calls for it, and use the endpoint's API to include this new bundle
TODO: build a new vLLM image that supports multinode
Python client:
TODO test
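The endpoint rules listed above can be sketched as a small validation routine. This is a minimal illustration, not the code in this PR: `EndpointSpec`, `uses_lws`, `validate_create`, and `validate_update` are hypothetical names, and the sketch assumes that `nodes_per_worker > 1` is what selects an LWS. The PR itself enforces the no-modification rule by not exposing the field through the update API; the explicit check below only shows the same rule in code form.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EndpointSpec:
    """Hypothetical, simplified stand-in for an endpoint's configuration."""

    endpoint_type: str  # "sync", "async", or "streaming"
    gpus: int  # GPUs requested per node
    nodes_per_worker: int = 1


def uses_lws(spec: EndpointSpec) -> bool:
    # Assumption: more than one node per worker means the endpoint is LWS-backed.
    return spec.nodes_per_worker > 1


def validate_create(spec: EndpointSpec) -> None:
    """Reject configurations the summary describes as unsupported."""
    if spec.nodes_per_worker < 1:
        raise ValueError("nodes_per_worker must be at least 1")
    if uses_lws(spec):
        # Only gpu + streaming LWS endpoints are supported for now.
        if spec.endpoint_type != "streaming":
            raise ValueError("multinode endpoints must be streaming")
        if spec.gpus < 1:
            raise ValueError("multinode endpoints must request GPUs")


def validate_update(existing: EndpointSpec, new_nodes_per_worker: Optional[int]) -> None:
    """Disallow changing nodes_per_worker, which also blocks LWS <-> non-LWS switches."""
    if new_nodes_per_worker is not None and new_nodes_per_worker != existing.nodes_per_worker:
        raise ValueError("nodes_per_worker cannot be changed after creation")
```

Because any switch between LWS and non-LWS implies a change in `nodes_per_worker` under this assumption, the single check in `validate_update` covers both restrictions.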
Test Plan and Usage Guide
unit tests
TODO: test updating an existing LWS, _delete_lws not finding the LWS to delete, and _get_all returning an LWS
integration tests/e2e tests: these require LWS, so can't really do this
TODO deploy and manually test some things
ie