Support cluster upgrade for day1 scenario (#1076)
The kernel module image's tag should be equal to the kernel version of the
node. The pull-image script gets the current kernel version of the node
and uses this value as the tag of the image to pull.
yevgeny-shnaidman authored Apr 8, 2024
1 parent 41f8040 commit 9591126
Showing 3 changed files with 52 additions and 17 deletions.
46 changes: 39 additions & 7 deletions docs/mkdocs/documentation/day1_limited_option.md
@@ -33,15 +33,23 @@ The loading of OOT kernel module leverages MCO. The flow sequence is as follows:

The Day 1 functionality uses the same DTK based image that Day 2 KMM builds can leverage.
OOT kernel module should be located under `/opt/lib/modules/${kernelVersion}`.
The tag of the kernel module image should be equal to kernel version on the node: for example,
if the kernel version on the node is `5.14.0-284.59.1.el9_2.x86_64`, then the image and tag should be:
`repo/image:5.14.0-284.59.1.el9_2.x86_64`
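The tagging rule above can be sketched in a few lines of bash. This is a minimal illustration, not code from the commit: the helper name `full_image_ref` is hypothetical, and it only mirrors the way `pull-image.sh` joins the untagged image name with the output of `uname -r`.

```shell
#!/bin/bash

# Hypothetical helper (not part of KMM): build the full image reference
# from an untagged image name and a kernel version.
# If no kernel version is given, fall back to the running kernel (uname -r),
# which is what pull-image.sh does on the node.
full_image_ref() {
    local image="$1"
    local kernel_version="${2:-$(uname -r)}"
    printf '%s:%s\n' "$image" "$kernel_version"
}

# Explicit kernel version, matching the example in the text.
full_image_ref "repo/image" "5.14.0-284.59.1.el9_2.x86_64"
# prints: repo/image:5.14.0-284.59.1.el9_2.x86_64

# No kernel version passed: the tag is taken from the machine running the script.
full_image_ref "repo/image"
```

Passing the kernel version explicitly is useful when preparing an image for a node whose kernel differs from the build machine's.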

## In-tree module replacement

The Day 1 functionality always tries to replace the in-tree kernel module with the OOT one.
If the in-tree kernel module is not loaded, the flow is not affected;, the service will proceed and load the OOT kernel module.
The Day 1 functionality will try to replace in-tree kernel module only if requested (see parameter to the MC creation).
If the in-tree kernel module is not loaded, but was requested to be unloaded, the flow is not affected;
the service will proceed and load the OOT kernel module.

## MCO yaml creation

KMM provides an API to create an MCO YAML manifest for the Day 1 functionality:
KMM provides 2 ways to create an MCO YAML manifest for the Day 1 functionality:
1. API to be called from Go code
2. Linux executable that can be called manually with appropriate parameters

### API

```go
ProduceMachineConfig(machineConfigName, machineConfigPoolRef, kernelModuleImage, kernelModuleName string) (string, error)
@@ -54,15 +62,36 @@ The parameters are:

- `machineConfigName`: the name of the MCO YAML manifest. It will be set as the `name` parameter of the metadata of MCO YAML manifest.
- `machineConfigPoolRef`: the `MachineConfigPool` name that will be used in order to identify the targeted nodes
- `kernelModuleImage`: the name of the container image that includes the OOT kernel module.
- `kernelModuleImage`: the name of the container image that includes the OOT kernel module without the tag
- `kernelModuleName`: the name of the OOT kernel module. This parameter will be used both to unload the in-tree kernel module
(if loaded into the kernel) and to load the OOT kernel module.

The API is located under `pkg/mcproducer` package of hte KMM source code.
- `inTreeModuleToRemove`: optional parameter. The name of the in-tree kernel module to unload prior to loading OOT kernel module.
In case this parameter is not passed, day1 functionality will not try to unload any in-tree
module
- `workerImage`: optional parameter. The worker image to use. In case this parameter is not passed, the default worker image
will be used: quay.io/edge-infrastructure/kernel-module-management-worker:latest.


The API is located under `pkg/mcproducer` package of the KMM source code.
The KMM operator does not need to be running in order to use the Day 1 functionality.
Users only need to import the `pkg/mcproducer` package into their operator/utility code, call the API, and apply the produced
MCO YAML to the cluster.

### Utility
`day1-utility` can be called from a shell. The `day1-utility` executable is not part of the KMM GitHub repo.
To build it, run the following command:
`make day1-utility`

Utility uses the following flags:
`-image <string>`: container image that contains kernel module .ko file
`-kernel-module <string>`: name of the OOT module to load
`-machine-config <string>`: name of the machine config to create
`-machine-config-pool <string>`: name of the machine config pool to use
`-in-tree-module-to-remove <string>`: in-tree kernel module that should be removed prior to loading the OOT module
`-worker-image <string>`: kernel-management worker image to use. If not passed, a default value will be used

The first 4 flags are mandatory; the last 2 are optional. The flags correspond to the parameters of the API.
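
As an illustration of how the flags fit together, the sketch below assembles a `day1-utility` command line and prints it instead of running it. The flag names come from the list above; everything else is an assumption made for the example: the helper name `day1_utility_args`, the `./day1-utility` path, and the image, module, MachineConfig and pool values.

```shell
#!/bin/bash

# Hypothetical helper (not part of KMM): assemble the day1-utility flags.
# Arguments: image, kernel module, machine config name, machine config pool,
# and optionally the in-tree module to remove and the worker image.
day1_utility_args() {
    local image="$1" module="$2" mc_name="$3" mc_pool="$4"
    local in_tree="${5:-}" worker_image="${6:-}"

    # The four mandatory flags.
    local args=(-image "$image"
                -kernel-module "$module"
                -machine-config "$mc_name"
                -machine-config-pool "$mc_pool")

    # The two optional flags are added only when a value was supplied.
    if [ -n "$in_tree" ]; then
        args+=(-in-tree-module-to-remove "$in_tree")
    fi
    if [ -n "$worker_image" ]; then
        args+=(-worker-image "$worker_image")
    fi

    printf '%s\n' "${args[*]}"
}

# Dry run: print the command rather than executing it.
# All values below are placeholders.
echo "./day1-utility $(day1_utility_args quay.io/example/kmod my_kmod 99-my-kmod worker)"
```

Note that the image is passed without a tag, since the pull script appends the node's kernel version itself.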

### MachineConfigPool

MachineConfigPool is used to identify a collection of nodes that will be affected by the applied MCO.
@@ -106,5 +135,8 @@ will target the worker MachineConfigPool

A detailed description of MachineConfig and MachineConfigPool can be found in [MachineConfigPool explanation](https://www.redhat.com/en/blog/openshift-container-platform-4-how-does-machine-config-pool-work).


## Cluster Upgrade support
Using the kernel version as the tag of the kernel module image makes it possible to support cluster upgrades. The pull service determines the kernel version of the
node and uses this value as the tag of the kernel module image. This way, all the customer needs to do prior to upgrading the cluster is to create a kernel module image
with the appropriate tag, without any need to update the Day 1 MC. Once the node is rebooted, the pull service will pull the correct image.

21 changes: 12 additions & 9 deletions pkg/mcproducer/scripts/pull-image.sh
@@ -4,10 +4,13 @@
kernel_module_image_filepath="$KERNEL_MODULE_IMAGE_FILEPATH"
worker_image="$WORKER_IMAGE"
kernel_module_image="$KERNEL_MODULE_IMAGE"
kernel_module_image_tag=$(uname -r)
full_kernel_module_image="$kernel_module_image:$kernel_module_image_tag"

if [ -e $kernel_module_image_filepath ]; then
echo "File $kernel_module_image_filepath found.Nothing to do, the file was handled, removing it"
echo "File $kernel_module_image_filepath found. Nothing to do, the file was handled, removing $kernel_module_image_filepath and $kmm_config_file_filepath"
rm -f $kernel_module_image_filepath
rm -f $kmm_config_file_filepath
else
podman pull --authfile /var/lib/kubelet/config.json $worker_image
if [ $? -eq 0 ]; then
@@ -17,20 +20,20 @@ else
exit 1
fi

echo "File $kernel_module_image_filepath is not on the filesystem, pulling image "
podman pull --authfile /var/lib/kubelet/config.json $kernel_module_image
echo "File $kernel_module_image_filepath is not on the filesystem, pulling image $full_kernel_module_image"
podman pull --authfile /var/lib/kubelet/config.json $full_kernel_module_image
if [ $? -eq 0 ]; then
echo "Image $kernel_module_image has been successfully pulled"
echo "Image $full_kernel_module_image has been successfully pulled"
else
echo "Failed to pull image $kernel_module_image"
echo "Failed to pull image $full_kernel_module_image"
exit 1
fi
echo "Saving image $kernel_module_image into a file $kernel_module_image_filepath"
podman save -o $kernel_module_image_filepath $kernel_module_image
echo "Saving image $full_kernel_module_image into a file $kernel_module_image_filepath"
podman save -o $kernel_module_image_filepath $full_kernel_module_image
if [ $? -eq 0 ]; then
echo "Image $kernel_module_image has been successfully save on file $kernel_module_image_filepath, rebooting..."
echo "Image $full_kernel_module_image has been successfully save on file $kernel_module_image_filepath, rebooting..."
reboot
else
echo "Failed to save image $kernel_module_image to file $kernel_module_image_filepath"
echo "Failed to save image $full_kernel_module_image to file $kernel_module_image_filepath"
fi
fi
2 changes: 1 addition & 1 deletion pkg/mcproducer/testdata/machineconfig-test.yaml
@@ -73,7 +73,7 @@ spec:
user:
name: "root"
contents:
source: "data:text/plain;base64,IyEvYmluL2Jhc2gKCgprZXJuZWxfbW9kdWxlX2ltYWdlX2ZpbGVwYXRoPSIkS0VSTkVMX01PRFVMRV9JTUFHRV9GSUxFUEFUSCIKd29ya2VyX2ltYWdlPSIkV09SS0VSX0lNQUdFIgprZXJuZWxfbW9kdWxlX2ltYWdlPSIkS0VSTkVMX01PRFVMRV9JTUFHRSIKCmlmIFsgLWUgJGtlcm5lbF9tb2R1bGVfaW1hZ2VfZmlsZXBhdGggXTsgdGhlbgogICAgZWNobyAiRmlsZSAka2VybmVsX21vZHVsZV9pbWFnZV9maWxlcGF0aCBmb3VuZC5Ob3RoaW5nIHRvIGRvLCB0aGUgZmlsZSB3YXMgaGFuZGxlZCwgcmVtb3ZpbmcgaXQiCiAgICBybSAtZiAka2VybmVsX21vZHVsZV9pbWFnZV9maWxlcGF0aAplbHNlCiAgICBwb2RtYW4gcHVsbCAtLWF1dGhmaWxlIC92YXIvbGliL2t1YmVsZXQvY29uZmlnLmpzb24gJHdvcmtlcl9pbWFnZQogICAgaWYgWyAkPyAtZXEgMCBdOyB0aGVuCiAgICAgICAgZWNobyAiSW1hZ2UgJHdvcmtlcl9pbWFnZSBoYXMgYmVlbiBzdWNjZXNzZnVsbHkgcHVsbGVkIgogICAgZWxzZQogICAgICAgIGVjaG8gIkZhaWxlZCB0byBwdWxsIGltYWdlICR3b3JrZXJfaW1hZ2UiCiAgICAgICAgZXhpdCAxCiAgICBmaQoKICAgIGVjaG8gIkZpbGUgJGtlcm5lbF9tb2R1bGVfaW1hZ2VfZmlsZXBhdGggaXMgbm90IG9uIHRoZSBmaWxlc3lzdGVtLCBwdWxsaW5nIGltYWdlICIKICAgIHBvZG1hbiBwdWxsIC0tYXV0aGZpbGUgL3Zhci9saWIva3ViZWxldC9jb25maWcuanNvbiAka2VybmVsX21vZHVsZV9pbWFnZQogICAgaWYgWyAkPyAtZXEgMCBdOyB0aGVuCiAgICAgICAgZWNobyAiSW1hZ2UgJGtlcm5lbF9tb2R1bGVfaW1hZ2UgaGFzIGJlZW4gc3VjY2Vzc2Z1bGx5IHB1bGxlZCIKICAgIGVsc2UKICAgICAgICBlY2hvICJGYWlsZWQgdG8gcHVsbCBpbWFnZSAka2VybmVsX21vZHVsZV9pbWFnZSIKICAgICAgICBleGl0IDEKICAgIGZpCiAgICBlY2hvICJTYXZpbmcgaW1hZ2UgJGtlcm5lbF9tb2R1bGVfaW1hZ2UgaW50byBhIGZpbGUgJGtlcm5lbF9tb2R1bGVfaW1hZ2VfZmlsZXBhdGgiCiAgICBwb2RtYW4gc2F2ZSAtbyAka2VybmVsX21vZHVsZV9pbWFnZV9maWxlcGF0aCAka2VybmVsX21vZHVsZV9pbWFnZQogICAgaWYgWyAkPyAtZXEgMCBdOyB0aGVuCiAgICAgICAgZWNobyAiSW1hZ2UgJGtlcm5lbF9tb2R1bGVfaW1hZ2UgaGFzIGJlZW4gc3VjY2Vzc2Z1bGx5IHNhdmUgb24gZmlsZSAka2VybmVsX21vZHVsZV9pbWFnZV9maWxlcGF0aCwgcmVib290aW5nLi4uIgogICAgICAgIHJlYm9vdAogICAgZWxzZQogICAgICAgIGVjaG8gIkZhaWxlZCB0byBzYXZlIGltYWdlICRrZXJuZWxfbW9kdWxlX2ltYWdlIHRvIGZpbGUgJGtlcm5lbF9tb2R1bGVfaW1hZ2VfZmlsZXBhdGgiCiAgICBmaQpmaQo="
source: "data:text/plain;base64,IyEvYmluL2Jhc2gKCgprZXJuZWxfbW9kdWxlX2ltYWdlX2ZpbGVwYXRoPSIkS0VSTkVMX01PRFVMRV9JTUFHRV9GSUxFUEFUSCIKd29ya2VyX2ltYWdlPSIkV09SS0VSX0lNQUdFIgprZXJuZWxfbW9kdWxlX2ltYWdlPSIkS0VSTkVMX01PRFVMRV9JTUFHRSIKa2VybmVsX21vZHVsZV9pbWFnZV90YWc9JCh1bmFtZSAtcikKZnVsbF9rZXJuZWxfbW9kdWxlX2ltYWdlPSIka2VybmVsX21vZHVsZV9pbWFnZToka2VybmVsX21vZHVsZV9pbWFnZV90YWciCgppZiBbIC1lICRrZXJuZWxfbW9kdWxlX2ltYWdlX2ZpbGVwYXRoIF07IHRoZW4KICAgIGVjaG8gIkZpbGUgJGtlcm5lbF9tb2R1bGVfaW1hZ2VfZmlsZXBhdGggZm91bmQuIE5vdGhpbmcgdG8gZG8sIHRoZSBmaWxlIHdhcyBoYW5kbGVkLCByZW1vdmluZyAka2VybmVsX21vZHVsZV9pbWFnZV9maWxlcGF0aCBhbmQgJGttbV9jb25maWdfZmlsZV9maWxlcGF0aCIKICAgIHJtIC1mICRrZXJuZWxfbW9kdWxlX2ltYWdlX2ZpbGVwYXRoCiAgICBybSAtZiAka21tX2NvbmZpZ19maWxlX2ZpbGVwYXRoCmVsc2UKICAgIHBvZG1hbiBwdWxsIC0tYXV0aGZpbGUgL3Zhci9saWIva3ViZWxldC9jb25maWcuanNvbiAkd29ya2VyX2ltYWdlCiAgICBpZiBbICQ/IC1lcSAwIF07IHRoZW4KICAgICAgICBlY2hvICJJbWFnZSAkd29ya2VyX2ltYWdlIGhhcyBiZWVuIHN1Y2Nlc3NmdWxseSBwdWxsZWQiCiAgICBlbHNlCiAgICAgICAgZWNobyAiRmFpbGVkIHRvIHB1bGwgaW1hZ2UgJHdvcmtlcl9pbWFnZSIKICAgICAgICBleGl0IDEKICAgIGZpCgogICAgZWNobyAiRmlsZSAka2VybmVsX21vZHVsZV9pbWFnZV9maWxlcGF0aCBpcyBub3Qgb24gdGhlIGZpbGVzeXN0ZW0sIHB1bGxpbmcgaW1hZ2UgJGZ1bGxfa2VybmVsX21vZHVsZV9pbWFnZSIKICAgIHBvZG1hbiBwdWxsIC0tYXV0aGZpbGUgL3Zhci9saWIva3ViZWxldC9jb25maWcuanNvbiAkZnVsbF9rZXJuZWxfbW9kdWxlX2ltYWdlCiAgICBpZiBbICQ/IC1lcSAwIF07IHRoZW4KICAgICAgICBlY2hvICJJbWFnZSAkZnVsbF9rZXJuZWxfbW9kdWxlX2ltYWdlIGhhcyBiZWVuIHN1Y2Nlc3NmdWxseSBwdWxsZWQiCiAgICBlbHNlCiAgICAgICAgZWNobyAiRmFpbGVkIHRvIHB1bGwgaW1hZ2UgJGZ1bGxfa2VybmVsX21vZHVsZV9pbWFnZSIKICAgICAgICBleGl0IDEKICAgIGZpCiAgICBlY2hvICJTYXZpbmcgaW1hZ2UgJGZ1bGxfa2VybmVsX21vZHVsZV9pbWFnZSBpbnRvIGEgZmlsZSAka2VybmVsX21vZHVsZV9pbWFnZV9maWxlcGF0aCIKICAgIHBvZG1hbiBzYXZlIC1vICRrZXJuZWxfbW9kdWxlX2ltYWdlX2ZpbGVwYXRoICRmdWxsX2tlcm5lbF9tb2R1bGVfaW1hZ2UKICAgIGlmIFsgJD8gLWVxIDAgXTsgdGhlbgogICAgICAgIGVjaG8gIkltYWdlICRmdWxsX2tlcm5lbF9tb2R1bGVfaW1hZ2UgaGFzIGJlZW4gc3VjY2Vzc2Z1bGx5IHNhdmUgb24gZmlsZSAka2VybmVsX21vZHVsZV9pbWFnZV9maWxlcGF0
aCwgcmVib290aW5nLi4uIgogICAgICAgIHJlYm9vdAogICAgZWxzZQogICAgICAgIGVjaG8gIkZhaWxlZCB0byBzYXZlIGltYWdlICRmdWxsX2tlcm5lbF9tb2R1bGVfaW1hZ2UgdG8gZmlsZSAka2VybmVsX21vZHVsZV9pbWFnZV9maWxlcGF0aCIKICAgIGZpCmZpCg=="
- path: "/usr/local/bin/wait-for-dispatcher.sh"
mode: 493
overwrite: true