Use OpenOnload® or EnterpriseOnload® stacks to accelerate your workloads in Kubernetes® and OpenShift® clusters.
- OpenOnload (including EnterpriseOnload) 8.1+
- AMD Solarflare hardware (`sfc`)
- OpenShift Container Platform (OCP) 4.10+ with the Kernel Module Management (KMM) Operator
- Either restricted network or internet-connected clusters
Deployment can also be performed on Kubernetes 1.23+ but full implementation details are not currently provided. The Onload Device Plugin is not currently designed for standalone deployment.
Please see Release Notes for further detail on version compatibility and feature availability.
Your terminal requires access to:
- Your cluster via `kubectl` or `oc`
- This repository

This documentation standardises on `kubectl`, but both are compatible: `alias kubectl=oc`.
Most users can benefit from the provided container images along with KMM's in-cluster `onload-module` builds.
A more comprehensive development environment is required for special use cases, namely:

- building bespoke `onload-module` images outside the cluster,
- OpenShift MachineConfig for Day 0/1 `sfc`,
- developing Onload, and/or
- developing Onload Operator or Onload Device Plugin.
Your cluster requires access to the following provided container images:
- `onload-operator`
- `onload-device-plugin`
- `onload-user`
- `onload-source` (if in-cluster builds)
- `sfptpd` (optional)
- `sfnettest` (optional)
- KMM Operator & dependents
- DTK (if in-cluster builds on OpenShift)
  - OpenShift includes a `driver-toolkit` (DTK) image in each release. No action should be required.
The cluster also requires access to the following node-specific kernel module container image(s) which may be provided externally or internally. If using in-cluster builds, push access to an internal registry will be required. Otherwise, only pull access is required if these images are pre-built. Please see Release Notes for further detail on feature availability.
- `onload-module`
When using in-cluster builds, other dependencies may be required depending on the method selected. These may include the `ubi-minimal` container image and UBI RPM repositories.
Nodes require 60MB of root-writable local storage, by default in `/opt`.
This repository's YAML configuration uses the following images by default:
- `docker.io/onload/onload-operator`
- `docker.io/onload/onload-device-plugin`
- `docker.io/onload/onload-source`
- `docker.io/onload/onload-user`
- `docker.io/onload/sfptpd`
- `docker.io/onload/sfnettest`
For restricted networks, these container images can be mirrored.
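For example, a minimal sketch of mirroring one of these images with `skopeo`; the destination registry `registry.example.com` and the tag are illustrative placeholders for your environment:

```sh
# Copy the Onload Operator image from DockerHub into a locally reachable registry.
# registry.example.com and the tag are illustrative placeholders.
skopeo copy \
  docker://docker.io/onload/onload-operator:v3.0 \
  docker://registry.example.com/onload/onload-operator:v3.0
```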
To accelerate a pod:
- Configure the Onload Operator
- Configure an Onload Custom Resource (CR)
- Configure a pod network with AMD Solarflare interfaces, e.g. Multus IPVLAN or MACVLAN
- Configure the out-of-tree `sfc` module
- Configure your pods to use the resource provided by the Onload Device Plugin and the network
Diagrams (not reproduced here) illustrate the Kubernetes objects deployed (simplified) and the pods & devices on Nodes.
The Onload Operator follows the Kubernetes Operator pattern, which links a Kubernetes Controller, implemented here in the `onload-operator` container image, to one or more Custom Resource Definitions (CRDs), implemented here as the `Onload` kind of CRD.
To deploy the Onload Operator, its controller container and CRD, run:
```sh
kubectl apply -k https://github.com/Xilinx-CNS/kubernetes-onload/config/default?ref=v3.0
```
This deploys the following by default:
- In Namespace `onload-operator-system` with prefix `onload-operator-`:
  - Onload CRD
  - Operator version from DockerHub
  - RBAC for these components
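To confirm the Operator's controller is running, a quick check (pod names will vary):

```sh
kubectl get pods -n onload-operator-system
```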
The Onload Operator will not deploy the components necessary for accelerating workload pods without an `Onload` kind of Custom Resource (CR).
For restricted networks, the `onload-operator` and `onload-device-plugin` image locations will require changing from their DockerHub defaults. To run the above command using locally hosted container images, open this repository locally and use the following overlay:
```sh
git clone -b v3.0 https://github.com/Xilinx-CNS/kubernetes-onload && cd kubernetes-onload
cp -r config/samples/default-clusterlocal config/samples/my-operator
$EDITOR config/samples/my-operator/kustomization.yaml
kubectl apply --validate=true -k config/samples/my-operator
```
Tip: Replacing `kubectl apply` with `kubectl kustomize` will output a complete YAML manifest file which can be copied to a network that does not have access to this repository.
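For example (the output filename is arbitrary):

```sh
kubectl kustomize config/samples/my-operator > onload-operator.yaml
# Copy onload-operator.yaml into the restricted network, then:
kubectl apply --validate=true -f onload-operator.yaml
```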
The Onload Device Plugin implements the Kubernetes Device Plugin API to expose a Kubernetes Resource named `amd.com/onload`. It is distributed as the container image `onload-device-plugin`. The image location is configured as an environment variable within the Onload Operator deployment (see above) and its ImagePullPolicy as part of the Onload Custom Resource (CR), along with its other customisation properties.
The Onload Operator manages an Onload Device Plugin DaemonSet which deploys, to each node selected for acceleration, a pod consisting of 3 containers:
- Init (`init` container, `onload-user` image) -- copies Onload files to the host filesystem and the Onload Worker volume.
- Onload Worker (`onload-worker` container, `onload-device-plugin` image) -- provides the Onload Control Plane environment; privileged access to network namespaces.
- Onload Device Plugin (`device-plugin` container, `onload-device-plugin` image) -- serves the Kubernetes Device Plugin API; privileged access to the Kubernetes API.
Instruct the Onload Operator to deploy the components necessary for accelerating workload pods by deploying an `Onload` kind of Custom Resource (CR).
If your cluster is internet-connected OpenShift and you want to use in-cluster builds with the current version of OpenOnload, run:
```sh
kubectl apply -k https://github.com/Xilinx-CNS/kubernetes-onload/config/samples/onload/overlays/in-cluster-build-ocp?ref=v3.0
```
This takes a base `Onload` CR template and adds the appropriate image versions and in-cluster build configuration. To customise this recommended overlay further, see comments in these files and the variant steps below.
The above overlay configures KMM to `modprobe onload` and `modprobe sfc`. Both are required, but the latter may occur outside the Onload Operator. Please see Out-of-tree `sfc` module for options.
For further explanation of the `Onload` CR's available properties, refer to either inline comments in these templates or the built-in explain command, e.g. `kubectl explain onload.spec`.
The schema for the above templates is defined by an `Onload` Custom Resource Definition (CRD) in onload_types.go, which is distributed as part of the Onload Operator's generated YAML bundle.
Important: Due to Kubernetes limitations on label lengths, the combined length of the Name and Namespace of the Onload CR must be less than 32 characters.
In restricted networks or on other versions of Kubernetes, change the container image locations and build method(s) to suit your environment. For example, to adapt the overlay for in-cluster builds on OpenShift in a restricted network:
```sh
git clone -b v3.0 https://github.com/Xilinx-CNS/kubernetes-onload && cd kubernetes-onload
cd config/samples/onload
cp -r overlays/in-cluster-build-ocp-clusterlocal overlays/my-onload
$EDITOR overlays/my-onload/kustomization.yaml
$EDITOR overlays/my-onload/patch-onload.yaml
kubectl apply -k overlays/my-onload
```
Consider configuring:
- Onload Operator & Onload Device Plugin container image tags (recommended to match)
  - In above `kustomization.yaml`
- Onload Source & Onload User container image tags and Onload version (all must match)
  - In above `kustomization.yaml` & `version` attribute in `patch-onload.yaml`
- Onload Module build method and tag (match tag to Onload version for clarity)
  - In above `kustomization.yaml` & `build` section in `patch-onload.yaml`
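As a sketch, image tags in the overlay's `kustomization.yaml` can be pinned with the standard Kustomize `images:` transformer; the tags below are illustrative and must match the Onload `version` in `patch-onload.yaml`:

```yaml
# overlays/my-onload/kustomization.yaml (fragment)
images:
  - name: docker.io/onload/onload-source
    newTag: "8.2.0"  # illustrative; must match spec.onload.version
  - name: docker.io/onload/onload-user
    newTag: "8.2.0"
```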
The Onload Operator supports all of KMM's core methods for providing compiled kernel modules to the nodes.
Some working examples are provided for use with the Onload CR:
- `dtk-ubi` -- currently recommended for OpenShift; depends on DTK & UBI
- `dtk-only` -- for OpenShift in very restricted networks; depends only on the official OpenShift DTK
- `mkdist-direct` -- for consistency with non-containerised Onload deployments (not recommended)
- `ubuntu` -- representative sample for non-OpenShift clusters
Please see Onload Module pre-built images for the alternative to building in-cluster.
The out-of-tree `sfc` kernel module is currently required when using the provided `onload` kernel module with a Solarflare card.
The following methods may be used:
- Configure the Onload Operator to deploy a KMM Module for `sfc`. Please see the example in in-cluster build configuration.
- OpenShift MachineConfig for Day 0/1 `sfc`. This is for when newer driver features are required at boot time while using OpenShift, or when Solarflare NICs are used for OpenShift machine traffic, so as to avoid kernel module reloads disconnecting nodes.
- A user-supported method beyond the scope of this document, such as a custom kernel build or in-house OS image.
Tip: Network interface names can be fixed with UDEV rules. On an RHCOS node within OpenShift, the directory `/etc/udev/rules.d/` can be written to with a `MachineConfig` CR.
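A minimal sketch of such a `MachineConfig`, assuming a `worker` role; the object name, rule filename, and MAC address are illustrative, and the rule contents are URL-encoded in the Ignition `data:` URI:

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-sfc-udev-rules        # illustrative name
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
        - path: /etc/udev/rules.d/99-sfc-ifnames.rules
          mode: 420              # 0644
          contents:
            # Decodes to: SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="00:0f:53:00:00:01", NAME="sfc0"
            source: data:,SUBSYSTEM%3D%3D%22net%22%2C%20ACTION%3D%3D%22add%22%2C%20ATTR%7Baddress%7D%3D%3D%2200%3A0f%3A53%3A00%3A00%3A01%22%2C%20NAME%3D%22sfc0%22
```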
The Solarflare Enhanced PTP Daemon (sfptpd) is not managed by Onload Operator but deployment instructions are included in this repository.
Please see config/samples/sfptpd/ for documentation and examples.
After you have completed the Deployment steps your cluster is configured with the capability to accelerate workloads using Onload.
An easy test to verify everything is correctly configured is the sfnettest example.
To accelerate your workload, configure a pod with an AMD Solarflare network interface and an `amd.com/onload` resource:
```yaml
kind: Pod
metadata:
  annotations:
    k8s.v1.cni.cncf.io/networks: ipvlan-bond0
spec:
  ...
  containers:
  - ...
    resources:
      limits:
        amd.com/onload: 1
```
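Before creating the pod, you can check that a node advertises the resource (the node name is a placeholder):

```sh
kubectl describe node <node-name> | grep -i 'amd.com/onload'
```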
All applications started within the pod environment will be accelerated due to the `LD_PRELOAD` environment variable unless `setPreload: false` is configured in the Onload CR.
This Kubernetes Resource automatically exposes the following to a requesting pod:
- Device mounts:
  - `/dev/onload`
  - `/dev/onload_epoll`
  - `/dev/sfc_char`
- Library mounts (by default in `/opt/onload/usr/lib64/`):
  - `libonload.so`
  - `libonload_ext.so`
- Environment variables (if `setPreload` is true):
  - `LD_PRELOAD=<library-mount>/libonload.so`
- Binary mounts (if `mountOnload` is true, by default in `/opt/onload/usr/bin/`):
  - `onload`
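A quick way to confirm these from inside a running accelerated pod (the pod name is a placeholder):

```sh
kubectl exec <accelerated-pod> -- sh -c 'ls -l /dev/onload* /dev/sfc_char; echo "LD_PRELOAD=$LD_PRELOAD"'
```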
If you wish to customise where files are mounted in the container's filesystem, this can be configured with the fields of `spec.devicePlugin` in an Onload CR.
Important: Kubernetes Device Plugin only affects initial pod scheduling.
Kubernetes Device Plugin is designed to configure pods once only, at creation time. If the Onload CR is re-applied to the cluster with settings that would change the pod environment -- for example, changing the value of `setPreload` -- then running pods must be recreated before these changes take effect.

Additionally, Kubernetes does not evict pods when node resources are removed; pods do not automatically have a formal dependency on the Onload Device Plugin or Onload Module. This has the advantage that minor Onload Operator changes do not disrupt the workloads its components have already configured.
Please see config/samples/sfnettest.
If you want to run your onloaded application with a runtime profile we suggest
using a ConfigMap to set the environment variables in the pod(s).
We have included an example definition for the 'latency' profile in the config/samples/profiles/ directory.
To deploy a ConfigMap named `onload-latency-profile` in the current namespace:
```sh
kubectl apply -k https://github.com/Xilinx-CNS/kubernetes-onload/config/samples/profiles?ref=v3.0
```
To use this in your pod, add the following to the container spec in your pod definition:
```yaml
kind: Pod
...
spec:
  ...
  containers:
  - ...
    envFrom:
    - configMapRef:
        name: onload-latency-profile
```
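Onload profiles consist of `EF_*` environment variables, so one way to confirm the ConfigMap was injected into a running pod is (the pod name is a placeholder):

```sh
kubectl exec <accelerated-pod> -- env | grep '^EF_'
```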
If you have an existing profile defined as a `.opf` file, you can generate a new ConfigMap definition from it using the `scripts/profiles/profile_to_configmap.sh` script.
`profile_to_configmap.sh` takes a comma-separated list of profiles and outputs the text definition of the ConfigMap, which can be saved into a file or sent straight to the cluster. To apply the generated ConfigMap straight away, run:
```sh
./scripts/profiles/profile_to_configmap.sh -p /path/to/profile.opf | kubectl apply -f -
```
Currently the script produces ConfigMaps with a fixed naming structure: for example, a ConfigMap created from a profile called `name.opf` will be named `onload-name-profile`.
Please see dedicated troubleshooting guide.
Developing the Onload Operator does not require building the `onload-module` image, as it can be built in-cluster by KMM.
To build these images outside the cluster, please see ./build/onload-module/ for documentation and examples.
Please see scripts/machineconfig/ for documentation and examples to deploy an out-of-tree `sfc` module at Day 0/1 (on boot).
Using Onload Operator does not require building these images as official images are available.
Please see DEVELOPING documentation.
Developing Onload Operator does not require building these images as official images are available.
If you wish to build these images, please follow 'Distributing as container image' in the Onload repository's DEVELOPING. This includes building debug versions. All Onload images in use must be consistent in exact commit and build parameters. For example, a debug build of `onload-user` must be used with a debug build of `onload-module`. Build parameter specification is provided in the sample Onload CRs for the in-cluster build method.
If your registry is not running with TLS configured, additional configuration may be necessary for accessing and pushing images. For example:
```
$ oc edit image.config cluster
...
spec:
  registrySources:
    insecureRegistries:
    - image-registry.openshift-image-registry.svc:5000
```
The Onload Operator has the capability to upgrade the version of Onload used by a CR. This can be done by updating the definition of the Onload CR once it is in the cluster.
Important: To trigger the start of an upgrade, edit the Onload CR and change the `spec.onload.version` field. This can be done using `kubectl edit`, `kubectl patch`, or by re-applying the edited YAML file with `kubectl apply`.
The fields that the Operator will propagate during an upgrade are:
- `spec.onload.version`
- `spec.onload.userImage`
- `spec.kernelMappings`
Changes to other fields are ignored by the Operator.
For example, using `kubectl patch` (please note that this is just an illustrative example and shouldn't be applied to a resource in your cluster):
```sh
kubectl patch onload onload-sample --type=merge --patch-file=/dev/stdin <<-EOF
{
  "spec": {
    "onload": {
      "kernelMappings": [
        {
          "kernelModuleImage": "docker.io/onload/onload-module:8.2.0",
          "regexp": "^.*\\.x86_64$"
        }
      ],
      "userImage": "docker.io/onload/onload-user:8.2.0",
      "version": "8.2.0"
    }
  }
}
EOF
```
The upgrade procedure occurs node by node: the Operator picks a node to upgrade (next alphabetically) and starts the procedure for that node. Once the upgrade on that node has completed, it moves on to the next node.
Steps during an upgrade:
- Change to `spec.onload.version`.
- Operator picks the next node to upgrade, or stops if all nodes are upgraded. For each node:
  - Operator stops the Onload Device Plugin.
  - Operator evicts pods using the `amd.com/onload` resource.
  - Operator removes the `onload` Module (and, if applicable, the `sfc` Module).
  - Operator adds the new Module(s).
  - Operator re-starts the Onload Device Plugin.
During the upgrade procedure on a node, the Onload Operator will evict all pods that have requested an `amd.com/onload` resource on that node. This is done so that these application pods don't encounter unexpected errors during runtime and so that the upgrade completes as expected. If your application's pods are created by a controller (for example, a Deployment) then they will be re-created once the upgrade has completed and `amd.com/onload` resources are available again; if your pod was created manually, it may have to be re-created manually.
The Operator assumes that all users of either the `sfc` or `onload` kernel modules are in pods that have an `amd.com/onload` resource. If there are pods that are using the sfc interface but do not have a resource registered through the device plugin, please shut them down before starting the upgrade.
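On OpenShift, one way to check for remaining module users from a node debug shell (the node name is a placeholder; the third `lsmod` column is the use count):

```sh
oc debug node/<node-name> -- chroot /host lsmod | grep -e '^onload' -e '^sfc'
```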
The Onload Operator does not interact with the Machine Config Operator, so the updated sfc driver will have to be upgraded separately from Onload. We suggest updating the sfc MachineConfig first, then, when that has finished, triggering the Onload upgrade. This will result in a period of time after the machine has rebooted with the new sfc driver version but an old version of Onload. Onloaded apps are not expected to work during this period, and you should wait until the Onload upgrade has finished before re-starting your workload.
The Onload Operator does not keep a history of previous versions, so it is not possible to "rollback" an upgrade. If you wish to continue using an older version, you can simply re-follow the upgrade procedure but using the earlier version and images.
The Onload Operator does not perform automatic validation of an upgrade. Check the status of the cluster after the upgrade has finished to ensure it is in the expected state.
Once an upgrade has started, the Onload Operator will try to perform the upgrade on all nodes that match its selector. Therefore it is not currently possible to "freeze" a node in place while others are upgraded. If you want to have heterogeneous Onload versions in the same cluster, you should have multiple Onload CRs with non-overlapping node selectors; each of these can then be upgraded separately.
Due to the Onload Operator's dependence on KMM v1, it is not possible to guarantee that a kernel module is actually unloaded when the Module CR is deleted. This is a known issue with KMM v1, but please try to ensure that there are no other users of the `onload` (or `sfc`, if applicable) kernel modules when the upgrade starts.
- The Onload Operator manages KMM resources on behalf of the user but does not provide feature parity with KMM. Examples of features not included are: in-cluster container image build signing, node version freezing during ordered upgrade (Onload Operator manages these labels), miscellaneous DevicePlugin configuration, configuration of registry credentials (beyond existing cluster configuration), customisation of kernel module parameters and soft dependencies, and customisation of Namespace and Service Account for dependent resources (instead inherited from Onload CR). Configuring `PreflightValidation` can be performed independently while the Onload Operator is running.
- Reloading of the kernel modules `onload` (and optionally `sfc`) will occur on first deployment and under certain reconfigurations. When using AMD Solarflare interfaces for Kubernetes control plane traffic, ensure node network interface configuration and workloads will regain correct configuration and cluster connectivity after reload.
- Interface names may change when switching from an in-tree to an out-of-tree `sfc` kernel module. This is due to changes in default interface names between versions 4 and 5. Ensure appropriate measures have been taken for any additional network configurations that depend on this information.
Trademarks are acknowledged as being the property of their respective owners. Kubernetes® is a trademark of The Linux Foundation. OpenShift® is a trademark of Red Hat, Inc.
Copyright (c) 2023-2024 Advanced Micro Devices, Inc.