We have node pools with different GPU cards in our cluster and would like to be able to choose a specific one for a task. For non-Flyte workloads, we do this by adding a nodeSelector label gpu_type to the pod and labeling our nodes accordingly. We do the same to schedule some pods on nodes with a faster CPU when we care about single-core performance.
Is there a way to add such a nodeSelector to a Flyte task, or a different approach that achieves what we intend to do?
-Stephan Gref
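For context, the non-Flyte approach described in the question amounts to a `nodeSelector` entry in the pod spec. The minimal sketch below builds such a manifest as a plain Python dict. Only the `gpu_type` label key comes from the question; the helper name, image, and label value `a100` are hypothetical, and the nodes are assumed to already carry the label.

```python
from typing import Any, Dict


def gpu_pod_manifest(name: str, image: str, gpu_type: str, gpus: int = 1) -> Dict[str, Any]:
    """Hypothetical helper: a Pod manifest pinned to nodes labelled gpu_type=<gpu_type>.

    Assumes nodes were labelled beforehand, e.g. `kubectl label nodes <node> gpu_type=a100`.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # The scheduler only considers nodes whose labels match every entry here.
            "nodeSelector": {"gpu_type": gpu_type},
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # GPUs are requested through the extended resource exposed by the device plugin.
                    "resources": {"limits": {"nvidia.com/gpu": str(gpus)}},
                }
            ],
        },
    }


manifest = gpu_pod_manifest("train", "my-registry/train:latest", "a100")
print(manifest["spec"]["nodeSelector"])  # {'gpu_type': 'a100'}
```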
FlytePropeller lets you configure a toleration for the GPU resource: https://github.com/flyteorg/flyteplugins/blob/892f35eb8a0041969039e56b64a0467f17e6809c/go/tasks/pluginmachinery/flytek8s/config/config.go#L99

When you specify a GPU resource in the task's resource requirements, it adds the correct resource requirement and the appropriate toleration to the pod, so that the pod can be scheduled on the right node pool.

Using multiple GPU node pools will likely require a sidecar task for now: specifying a GPU resource without a sidecar task only works with a single node selector. Since you have multiple cards, and presumably multiple task-specific node selectors, you could add a wrapper around the task decorator that applies the correct task type based on an environment variable. That way you can set RUN_IN_LOCAL_MODE=true (or something equivalent) and the wrapper creates a regular Python task instead, which should run locally. Just unset the environment variable during registration and you should be good.

Basically, you define a different task type, inject your GPU node selector info into the custom attribute of the task template, then parse it and apply it in the FlytePropeller plugin when building the pod. This is not included in the standard task because Flyte treats pod tasks as a concept separate from container tasks. The new task type just needs an implementation of the local_execute method: task.py
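The approach above can be sketched in plain Python to show how the pieces fit: a decorator that switches on `RUN_IN_LOCAL_MODE`, a custom task type that stores the node selector in the template's custom field, and the backend step that merges it into the pod spec. This is a hypothetical illustration, not flytekit's API: `TaskTemplate`, `NodeSelectorTask`, `gpu_task`, and `apply_custom_to_pod` are invented names, the bare function returned in local mode stands in for a regular Python task, and the real backend step would be Go code in a FlytePropeller plugin. In flytekit itself, task plugins usually populate the template's custom field by overriding `get_custom`.

```python
import os
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Union


@dataclass
class TaskTemplate:
    """Hypothetical stand-in for a serialized Flyte task template."""
    name: str
    task_type: str
    # Free-form plugin data; the real template's custom attribute plays this role.
    custom: Dict[str, Any] = field(default_factory=dict)


@dataclass
class NodeSelectorTask:
    """Hypothetical task type that carries a node selector in `custom`."""
    fn: Callable[..., Any]
    node_selector: Dict[str, str]

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        # Stands in for local_execute: locally we simply run the wrapped function.
        return self.fn(*args, **kwargs)

    def to_template(self) -> TaskTemplate:
        # Serialization: place the node selector where a backend plugin can read it.
        return TaskTemplate(
            name=self.fn.__name__,
            task_type="node_selector_task",  # hypothetical plugin identifier
            custom={"nodeSelector": dict(self.node_selector)},
        )


TaskLike = Union[Callable[..., Any], NodeSelectorTask]


def gpu_task(gpu_type: str) -> Callable[[Callable[..., Any]], TaskLike]:
    """Decorator factory that picks the task type from RUN_IN_LOCAL_MODE at decoration time."""
    def decorator(fn: Callable[..., Any]) -> TaskLike:
        if os.environ.get("RUN_IN_LOCAL_MODE", "").lower() == "true":
            return fn  # plain function: runs locally, no node selector involved
        return NodeSelectorTask(fn, {"gpu_type": gpu_type})
    return decorator


def apply_custom_to_pod(pod_spec: Dict[str, Any], template: TaskTemplate) -> Dict[str, Any]:
    """Backend step, sketched in Python: merge the template's nodeSelector into the pod spec."""
    merged = dict(pod_spec)
    merged["nodeSelector"] = {
        **pod_spec.get("nodeSelector", {}),
        **template.custom.get("nodeSelector", {}),
    }
    return merged


@gpu_task("a100")  # hypothetical label value
def train(epochs: int) -> int:
    return epochs * 2


if isinstance(train, NodeSelectorTask):
    pod = apply_custom_to_pod({"containers": []}, train.to_template())
    print(pod["nodeSelector"])  # {'gpu_type': 'a100'} when RUN_IN_LOCAL_MODE is unset
else:
    print(train(3))  # 6 in local mode
```

Because the decorator reads the environment variable when the module is imported, it has to be set (or unset) before the workflow module is loaded, which matches the advice to simply unset it during registration.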
See also GitHub issue #1328.
FlytePropeller also adds the appropriate GPU resource to the pod so that it gets scheduled on a node with GPUs: https://github.com/flyteorg/flyteplugins/blob/892f35eb8a0041969039e56b64a0467f17e6809c/go/tasks/pluginmachinery/flytek8s/container_helper.go#L23
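The resource-plus-toleration behaviour described above lives in Go inside flyteplugins; the Python function below only mirrors the idea so it can be read next to the explanation: if a container requests or limits a resource, the tolerations configured for that resource are appended to the pod. The `RESOURCE_TOLERATIONS` mapping, its taint key, and the function name are assumptions for illustration, not Flyte's actual defaults or code.

```python
import copy
from typing import Any, Dict, List

# Hypothetical config: tolerations to add whenever a given resource is requested.
RESOURCE_TOLERATIONS: Dict[str, List[Dict[str, str]]] = {
    "nvidia.com/gpu": [
        {"key": "nvidia.com/gpu", "operator": "Exists", "effect": "NoSchedule"},
    ],
}


def add_resource_tolerations(
    pod_spec: Dict[str, Any],
    tolerations_by_resource: Dict[str, List[Dict[str, str]]],
) -> Dict[str, Any]:
    """Return a copy of pod_spec with the tolerations for every requested or limited resource."""
    spec = copy.deepcopy(pod_spec)  # leave the caller's spec untouched
    tolerations = spec.setdefault("tolerations", [])
    for container in spec.get("containers", []):
        resources = container.get("resources", {})
        requested = set(resources.get("requests", {})) | set(resources.get("limits", {}))
        for name in sorted(requested):
            for tol in tolerations_by_resource.get(name, []):
                if tol not in tolerations:  # avoid duplicates across containers
                    tolerations.append(dict(tol))
    return spec


pod = {"containers": [{"name": "main", "resources": {"limits": {"nvidia.com/gpu": "1"}}}]}
print(add_resource_tolerations(pod, RESOURCE_TOLERATIONS)["tolerations"])
# [{'key': 'nvidia.com/gpu', 'operator': 'Exists', 'effect': 'NoSchedule'}]
```

A toleration only allows a pod onto tainted nodes; it does not pin the pod there. With a single toleration keyed on the GPU resource, every GPU node pool is equally eligible, which is why choosing a specific card still needs a node selector.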