Document deploying DRA to OpenShift #82
Draft
empovit wants to merge 1 commit into NVIDIA:main from empovit:openshift-doc
# Running the NVIDIA DRA Driver on Red Hat OpenShift

This document explains the differences between deploying the NVIDIA DRA driver on Red Hat OpenShift and on upstream Kubernetes or its derivatives.

## Prerequisites

Install OpenShift 4.16 or later. You can use the Assisted Installer to install on bare metal, or download an IPI installer binary (`openshift-install`) from the [OpenShift clients page](https://mirror.openshift.com/pub/openshift-v4/clients/ocp/). Refer to the [OpenShift documentation](https://docs.openshift.com/container-platform/4.16/installing/index.html) for other installation methods.

## Enabling DRA on OpenShift

Enable the `TechPreviewNoUpgrade` feature set as explained in [Enabling features using FeatureGates](https://docs.openshift.com/container-platform/4.16/nodes/clusters/nodes-cluster-enabling-features.html), either during installation or post-install. This feature set includes the `DynamicResourceAllocation` feature gate.

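For reference, enabling the feature set post-install amounts to applying a `FeatureGate` manifest such as the one below (the same manifest is included as a file in this PR):

```yaml
apiVersion: config.openshift.io/v1
kind: FeatureGate
metadata:
  name: cluster
spec:
  featureSet: TechPreviewNoUpgrade
```

Note that enabling `TechPreviewNoUpgrade` cannot be undone and prevents cluster upgrades.
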
Update the cluster scheduler to enable the DRA scheduling plugin:

```console
$ oc patch --type merge -p '{"spec":{"profile": "HighNodeUtilization", "profileCustomizations": {"dynamicResourceAllocation": "Enabled"}}}' scheduler cluster
```

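The `--type merge` option applies a JSON merge patch (RFC 7386) to the `Scheduler` object named `cluster`. As a rough illustration of those merge semantics (a simplified sketch of the patch behavior, not `oc` internals; the starting `Scheduler` spec is hypothetical):

```python
import json

def merge_patch(target, patch):
    """Apply an RFC 7386 JSON merge patch: dicts merge recursively,
    null values delete keys, everything else replaces."""
    if not isinstance(patch, dict):
        return patch
    if not isinstance(target, dict):
        target = {}
    result = dict(target)
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)
        else:
            result[key] = merge_patch(result.get(key), value)
    return result

# The patch passed to `oc patch --type merge ... scheduler cluster`:
patch = json.loads(
    '{"spec":{"profile": "HighNodeUtilization", '
    '"profileCustomizations": {"dynamicResourceAllocation": "Enabled"}}}'
)

# A simplified, hypothetical existing Scheduler object:
scheduler = {
    "apiVersion": "config.openshift.io/v1",
    "kind": "Scheduler",
    "metadata": {"name": "cluster"},
    "spec": {"mastersSchedulable": False},
}

updated = merge_patch(scheduler, patch)
print(updated["spec"])
```

Because it is a merge patch, unrelated fields of `spec` (here `mastersSchedulable`) are preserved while the patched keys are added or replaced.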
## NVIDIA GPU Drivers

The easiest way to install NVIDIA GPU drivers on OpenShift nodes is via the NVIDIA GPU Operator with the device plugin disabled. Follow the installation steps in [NVIDIA GPU Operator on Red Hat OpenShift Container Platform](https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/index.html), and **_be careful to disable the device plugin so that it does not conflict with the DRA plugin_**:

```yaml
devicePlugin:
  enabled: false
```

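If you configure the GPU Operator through its `ClusterPolicy` custom resource rather than Helm values, the same setting looks roughly like this (a fragment only; the resource name below is the usual default and may differ in your cluster):

```yaml
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: gpu-cluster-policy
spec:
  devicePlugin:
    enabled: false
```
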
## NVIDIA Binaries on RHCOS

The location of some NVIDIA binaries on an OpenShift node differs from the defaults. Make sure to pass the following values when installing the Helm chart:

```yaml
nvidiaDriverRoot: /run/nvidia/driver
nvidiaCtkPath: /var/usrlocal/nvidia/toolkit/nvidia-ctk
```

## OpenShift Security

OpenShift generally enforces more stringent security settings than upstream Kubernetes. If you see a warning about security context constraints (SCC) when deploying the DRA plugin, pass the following to the Helm chart, either via an in-line variable or a values file:

```yaml
kubeletPlugin:
  containers:
    plugin:
      securityContext:
        privileged: true
        seccompProfile:
          type: Unconfined
```

If you see security context constraint errors or warnings when deploying a sample workload, update the workload's security settings according to the [OpenShift documentation](https://docs.openshift.com/container-platform/4.16/operators/operator_sdk/osdk-complying-with-psa.html). For non-privileged workloads, applying the following `securityContext` definition at the pod or container level usually works:

```yaml
securityContext:
  runAsNonRoot: true
  seccompProfile:
    type: RuntimeDefault
  allowPrivilegeEscalation: false
  capabilities:
    drop:
    - ALL
```

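This `securityContext` satisfies the main container-level checks of the "restricted" Pod Security Standard. A small sketch of those checks (my own simplified validation, not the actual admission code):

```python
def restricted_violations(sc: dict) -> list:
    """Return violations against a simplified subset of the
    'restricted' Pod Security Standard container checks."""
    violations = []
    if sc.get("runAsNonRoot") is not True:
        violations.append("runAsNonRoot must be true")
    if sc.get("seccompProfile", {}).get("type") not in ("RuntimeDefault", "Localhost"):
        violations.append("seccompProfile.type must be RuntimeDefault or Localhost")
    if sc.get("allowPrivilegeEscalation") is not False:
        violations.append("allowPrivilegeEscalation must be false")
    if "ALL" not in sc.get("capabilities", {}).get("drop", []):
        violations.append("capabilities.drop must include ALL")
    return violations

# The securityContext recommended above:
sc = {
    "runAsNonRoot": True,
    "seccompProfile": {"type": "RuntimeDefault"},
    "allowPrivilegeEscalation": False,
    "capabilities": {"drop": ["ALL"]},
}
print(restricted_violations(sc))  # an empty list: no violations
```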
## Using Multi-Instance GPU (MIG)

Workloads that use the Multi-Instance GPU (MIG) feature require MIG to be enabled on worker nodes with [MIG-supported GPUs](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html#supported-gpus), e.g. the A100.

First, stop any pods that might be using the GPU, to avoid disruption when the new MIG configuration is applied.

Enable MIG via the MIG manager of the NVIDIA GPU Operator. **Do not configure MIG devices manually; the DRA driver will create them automatically on the fly**:

```console
$ oc label node <node> nvidia.com/mig.config=all-enabled --overwrite
```

MIG will be automatically enabled on the labeled nodes. For additional information, see [MIG Support in OpenShift Container Platform](https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/mig-ocp.html).

**Note:**
The `all-enabled` MIG configuration profile is available out of the box in the NVIDIA GPU Operator starting with v24.3. With an earlier version, you may need to [create a custom profile](https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/mig-ocp.html#creating-and-applying-a-custom-mig-configuration).

You can verify the MIG status using the `nvidia-smi` command from a GPU driver pod:

```console
$ oc exec -ti nvidia-driver-daemonset-<suffix> -n nvidia-gpu-operator -- nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14              Driver Version: 550.54.14      CUDA Version: N/A      |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100 80GB PCIe          On  |   00000000:17:00.0 Off |                   On |
| N/A   35C    P0             45W /  300W |       0MiB /  81920MiB |      N/A     Default |
|                                         |                        |              Enabled |
+-----------------------------------------+------------------------+----------------------+
```

**Note:**
Some cloud service providers (CSPs) block GPU reset for GPUs passed through to a VM. In this case, [ensure](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/gpu-operator-mig.html#enabling-mig-during-installation) that the `WITH_REBOOT` environment variable is set to `true`:

```yaml
migManager:
  ...
  env:
  - name: WITH_REBOOT
    value: 'true'
  ...
```

If the MIG settings cannot be fully applied, the MIG status will be marked with an asterisk (i.e. `Enabled*`), and you will need to reboot the nodes manually.

See the [NVIDIA Multi-Instance GPU User Guide](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html) for more information about MIG.
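
Automation that wants to detect this pending-reboot state can look for the trailing asterisk in the `MIG M.` column of the `nvidia-smi` output. A minimal, hypothetical parsing sketch:

```python
import re

def mig_mode(nvidia_smi_output: str):
    """Return (mode, pending_reboot) parsed from the 'MIG M.' column of
    nvidia-smi output; a trailing '*' means a reboot is still required."""
    for line in nvidia_smi_output.splitlines():
        m = re.search(r"\b(Enabled|Disabled)(\*?)\s*\|\s*$", line)
        if m:
            return m.group(1), m.group(2) == "*"
    return None, False

sample = "|                  |                 |             Enabled* |"
print(mig_mode(sample))  # ('Enabled', True) -> reboot still required
```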
apiVersion: config.openshift.io/v1 | ||
kind: FeatureGate | ||
metadata: | ||
name: cluster | ||
spec: | ||
featureSet: TechPreviewNoUpgrade
#!/usr/bin/env bash

set -ex
set -o pipefail

oc patch --type merge -p '{"spec":{"profile": "HighNodeUtilization", "profileCustomizations": {"dynamicResourceAllocation": "Enabled"}}}' scheduler cluster
> out of scope: As a matter of interest, does it make sense to make something like this the default on OpenShift?

We could, but this clause is currently in values.yaml. I'm not sure it's possible to use conditional statements there.