Support to specify CUDA visible device id for model service #444

Closed
nkwangleiGIT opened this issue Dec 25, 2023 · 4 comments

@nkwangleiGIT (Contributor) commented Dec 25, 2023

CUDA_DEVICE_ORDER=PCI_BUS_ID
CUDA_VISIBLE_DEVICES="0,3" # specify which GPU(s) to be used

In a scenario where there are different types of GPUs on the same node (e.g. T4 and A100), we should support deploying a model with specified device IDs.

bjwswang self-assigned this Dec 26, 2023
@bjwswang (Collaborator) commented Dec 26, 2023

  1. For RunnerFastchat, we can pass the --gpus flag to the model worker command:
     https://github.com/lm-sys/FastChat/blob/main/fastchat/serve/model_worker.py#L311
     But we must experiment to make sure this works with Kubernetes GPU scheduling (a sketch of the command follows this list).

  2. For RunnerFastchatVLLM, there is no such --gpus option. Based on the Ray docs (https://docs.ray.io/en/latest/ray-core/scheduling/accelerators.html#starting-ray-nodes-with-accelerators), it seems we can set CUDA_VISIBLE_DEVICES when starting/configuring the Ray node, so RunnerFastchatVLLM will only detect the devices exposed by that node (see the second sketch below).
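
A minimal sketch of item 1, with a placeholder model path (--gpus and --num-gpus are existing fastchat.serve.model_worker flags):

# run a FastChat model worker pinned to GPUs 0 and 3
python3 -m fastchat.serve.model_worker \
  --model-path lmsys/vicuna-7b-v1.5 \
  --gpus 0,3 \
  --num-gpus 2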
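
And a sketch of item 2, restricting which GPUs a Ray node exposes, per the linked Ray doc (the device ids are examples):

# start a Ray head node that only sees GPUs 0 and 3, so workloads
# scheduled on it (e.g. vLLM) detect just those two devices
CUDA_VISIBLE_DEVICES=0,3 ray start --head --num-gpus=2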

@bjwswang (Collaborator) commented Dec 26, 2023

Furthermore, if we allow users to specify GPUs, we need to give them a way to view the currently available GPUs. That is easy if the user has permission to access the host, but that is not appropriate for all users (a sketch follows).
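
A host-level sketch using a standard nvidia-smi query (in-cluster we would need to surface this through an API instead):

# list GPU index, model name, and total memory on the host
nvidia-smi --query-gpu=index,name,memory.total --format=csv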

This issue can be split into two tasks:

  • RunnerFastchat (single node)

  • RunnerFastchatVLLM (distributed)

nkwangleiGIT added this to the v0.2.0 milestone Dec 30, 2023
@nkwangleiGIT (Contributor, Author) commented:

Let me implement basic support as below:

  • Support configuring nodeSelector and CUDA visible devices when deploying a model via the API/ops-console (a sketch of such a pod spec follows this list).
  1. Without Ray support, it will deploy the model to the node(s) matching the nodeSelector and use the specified GPUs - like the single-node case above.
  2. With Ray support, it works the same but will construct a GPU pool if multiple nodes are involved - like the distributed case above.
     Ray support will be covered in Multiple gpus on different nodes #427.
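
A minimal sketch of what such a deployment could look like, assuming a hypothetical node label (gpu-type: a100) and a placeholder image rather than KubeAGI's actual resource names (note that CUDA_VISIBLE_DEVICES may be overridden when GPUs are requested through a device plugin):

# sketch only: pin a model-serving pod to matching nodes and expose GPUs 0 and 3
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: model-worker-example
spec:
  nodeSelector:
    gpu-type: a100                        # assumed node label
  containers:
  - name: worker
    image: example/fastchat-worker:latest # placeholder image
    env:
    - name: CUDA_DEVICE_ORDER
      value: "PCI_BUS_ID"
    - name: CUDA_VISIBLE_DEVICES
      value: "0,3"
EOF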

@nkwangleiGIT (Contributor, Author) commented:

I think we can support this now by following the doc below:
http://kubeagi.k8s.com.cn/docs/Configuration/gpu-and-node-affinity
