
Multinode serving #574

Draft · wants to merge 91 commits into base: main
Conversation

@seanshi-scale seanshi-scale commented Jul 23, 2024

Pull Request Summary

Multinode serving

API/schema changes:

Bundles / bundle v2:
- Add fields in `metadata` to support LWS; specifically, we need a different command plus some sets of env vars. No code is really needed here, if I understand correctly, since we're just passing through the existing `metadata` field.
- Use the new fields in the entity when setting up multinode resources.
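As a sketch of the pass-through idea, the extra metadata might look like the dict below. The field names (`leader_command`, `worker_command`, `worker_env_vars`) and env var names are hypothetical, not confirmed by this PR:

```python
# Hypothetical bundle metadata for a multinode (LWS) bundle.
# The PR reuses the existing free-form metadata pass-through, so no new
# schema code is needed; these keys are illustrative only.
multinode_metadata = {
    # Separate commands for the leader pod and the worker pods.
    "leader_command": ["python", "-m", "server", "--role", "leader"],
    "worker_command": ["python", "-m", "server", "--role", "worker"],
    # Env vars worker pods need in order to find the leader (names assumed).
    "worker_env_vars": {
        "LEADER_ADDR": "$(LWS_LEADER_ADDRESS)",
    },
}
```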
Endpoints:
- A new `nodes_per_worker` parameter controls whether an LWS is used.
- Creating and deleting an LWS endpoint are allowed.
- Modifying an LWS endpoint into a non-LWS endpoint, or a non-LWS endpoint into an LWS endpoint, is not allowed (similar to how we don't allow switching between sync and async); changing `nodes_per_worker` is likewise not allowed. This is enforced by the operation simply not being possible through the API.
- Getting an LWS endpoint is allowed.
- The only available LWS endpoint type is GPU + streaming: CPU-only LWS doesn't really make sense IMO, and async/sync could make sense but we don't need them for LLM serving at this point.
- `k8s_erd`'s `get_resources` and `get_all_resources` need to be LWS-compatible.
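A minimal sketch of the immutability rule described above, assuming (hypothetically) that an endpoint is multinode exactly when `nodes_per_worker > 1` and that the update path simply rejects any attempt to touch that field:

```python
# Sketch only: function names and the update-request shape are assumptions,
# not the PR's actual code.

def is_multinode(nodes_per_worker: int) -> bool:
    """An endpoint uses a LeaderWorkerSet when it spans more than one node."""
    return nodes_per_worker > 1

def validate_update(requested_fields: dict) -> None:
    # The update API never accepts nodes_per_worker, which is how
    # LWS <-> non-LWS conversion is prevented (mirroring sync vs. async).
    if "nodes_per_worker" in requested_fields:
        raise ValueError("nodes_per_worker cannot be changed after creation")
```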
LLM endpoints:
- Create a multinode bundle when the situation calls for it, and use the endpoint's API to include this new bundle.
- TODO: build a new vLLM image that supports multinode.
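For context on what `k8s_erd` has to handle, a LeaderWorkerSet pairs a leader pod template with a worker pod template and a group size. A minimal sketch of such a manifest as a Python dict (name, image, and commands are placeholders, not the PR's actual values):

```python
# Illustrative LeaderWorkerSet manifest (API group leaderworkerset.x-k8s.io);
# all concrete values here are placeholders.
lws_manifest = {
    "apiVersion": "leaderworkerset.x-k8s.io/v1",
    "kind": "LeaderWorkerSet",
    "metadata": {"name": "llm-endpoint"},
    "spec": {
        # Number of leader+worker groups, i.e. endpoint workers/replicas.
        "replicas": 1,
        "leaderWorkerTemplate": {
            # size = nodes_per_worker: the leader node plus one worker node.
            "size": 2,
            "leaderTemplate": {
                "spec": {"containers": [{"name": "leader", "image": "vllm:multinode"}]},
            },
            "workerTemplate": {
                "spec": {"containers": [{"name": "worker", "image": "vllm:multinode"}]},
            },
        },
    },
}
```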
Python client:
- TODO: test.

Test Plan and Usage Guide

- Unit tests.
- TODO: add tests for updating an existing LWS, for `_delete_lws` not finding the LWS to delete, and for `_get_all` returning an LWS.
- Integration/e2e tests require LWS to be installed, so we can't really run them here.
- TODO: deploy and manually test a few things, i.e.:

  • create llm multinode works, (yes)
  • update llm multinode replicas works, (yes)
  • get endpoint/llm works, (yes)
  • get all endpoint/llm works, (yes)
  • delete llm multinode works, (yes)
  • request to multinode llm endpoint works

@seanshi-scale seanshi-scale self-assigned this Jul 25, 2024