
Implement MPS natively as in linux #807

Open · wants to merge 5 commits into main

Conversation

@thien-lm commented Jul 8, 2024

How are GPU resources shared by MPS on Linux?

  1. The GPU compute mode is set to EXCLUSIVE_PROCESS, which ensures that any process requesting the GPU must go through the MPS control daemon and MPS server
  2. By default, each MPS client process can access up to 100% of the memory and 100% of the available threads of the GPUs
  3. MPS resources can be limited at the MPS control daemon level, the MPS client level, and the CUDA context level: https://docs.nvidia.com/deploy/mps/#performance

refer: https://docs.nvidia.com/deploy/mps
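
The setup described above can be sketched as host-side commands (a sketch assuming a Linux host with the NVIDIA driver installed; it needs root and an actual GPU, so treat it as a setup fragment, not something runnable anywhere):

```shell
# Put the GPU into EXCLUSIVE_PROCESS mode so every CUDA process must
# go through the MPS control daemon (device 0 here; pick the GPU you share).
nvidia-smi -i 0 -c EXCLUSIVE_PROCESS

# Start the MPS control daemon; it spawns the MPS server on demand
# when the first client connects.
nvidia-cuda-mps-control -d

# Clients then connect automatically via the default pipe directory
# (/tmp/nvidia-mps), or the one set in CUDA_MPS_PIPE_DIRECTORY.
```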

Strategies for provisioning resources in MPS

refer: https://docs.nvidia.com/deploy/mps/#performance:

  1. A common provisioning strategy is to uniformly partition the available threads equally among the MPS client processes - this is how the NVDP devs implemented MPS
  2. A more optimal strategy is to uniformly partition the portion by half the number of expected clients (oversubscribing each client's share, since typically not all clients are active at the same time)
  3. A near-optimal strategy is to non-uniformly partition the available threads based on each MPS client's workload (e.g., set the active thread percentage to 30% for client 1 and 70% for client 2 if the ratio of client 1's workload to client 2's workload is 30:70) - this is what I want
  4. The most optimal strategy is to precisely limit the number of SMs used by each MPS client, given knowledge of each client's execution resource requirements
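
The uniform split (strategy 1) and the workload-proportional split (strategy 3) come down to simple arithmetic; a quick sketch, using the illustrative numbers from above (n = 2 clients, a 30:70 workload ratio):

```shell
# Strategy 1: uniform partition - each of n clients gets 100/n percent.
n=2
uniform=$((100 / n))
echo "uniform share per client: ${uniform}%"

# Strategy 3: partition proportionally to the observed workload ratio
# (assumed here to be 30:70 between two clients).
w1=30; w2=70
total=$((w1 + w2))
p1=$((100 * w1 / total))
p2=$((100 * w2 / total))
echo "client 1: ${p1}%  client 2: ${p2}%"
```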

How does the main branch of the NVIDIA device plugin implement MPS?

  • The NVDP devs just set a hard limit at the control daemon level of 100/n for both memory and threads, where n is the number of replicas
  • I think this makes MPS quite inconvenient for us to use
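
For reference, the daemon-level caps the current implementation applies could be expressed through the MPS control interface like this (a sketch for n = 2 replicas; the 4G memory value is illustrative, and the commands require a running MPS control daemon):

```shell
# Daemon-wide defaults: every MPS client inherits these caps.

# 100/n active threads, i.e. 50% for n = 2 replicas:
echo "set_default_active_thread_percentage 50" | nvidia-cuda-mps-control

# Pinned device-memory cap for device 0 (the 4G value is illustrative):
echo "set_default_device_pinned_mem_limit 0 4G" | nvidia-cuda-mps-control
```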

My solution

  1. I will remove the hard limit of 100/n set at the control daemon level
  2. Instead, I will set the resource limit for each container that uses MPS in Kubernetes via two environment variables: CUDA_MPS_ACTIVE_THREAD_PERCENTAGE and CUDA_MPS_PINNED_DEVICE_MEM_LIMIT
  3. This way, resource provisioning for MPS in NVDP becomes very flexible, because each container is given exactly the number of threads and the amount of memory it needs - isn't that nice?
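
The per-container approach in steps 2 and 3 could look like this in a pod spec (a hypothetical example: the pod name, image, and the 30% / 8G values are illustrative, not part of the PR):

```yaml
# Hypothetical pod spec: per-container MPS limits via environment
# variables instead of a daemon-wide 100/n cap.
apiVersion: v1
kind: Pod
metadata:
  name: mps-client-1
spec:
  containers:
    - name: cuda-workload
      image: nvcr.io/nvidia/cuda:12.2.0-base-ubuntu22.04
      env:
        - name: CUDA_MPS_ACTIVE_THREAD_PERCENTAGE
          value: "30"       # strategy 3: this client's workload share
        - name: CUDA_MPS_PINNED_DEVICE_MEM_LIMIT
          value: "0=8G"     # cap pinned memory on device 0 for this client
      resources:
        limits:
          nvidia.com/gpu: 1
```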
