Redesign device plugin reset #747

Open

wants to merge 1 commit into master from device_plugin_redesign
Conversation

Collaborator

@SchSeba SchSeba commented Jul 25, 2024

This commit introduces a redesign of how the operator resets the device plugin:

  • use a general nodeSelector to avoid updating the DaemonSet YAML (see the sketch after this list)
  • remove the config-daemon's deletion of the device plugin pod (better security)
  • make the operator in charge of resetting the device plugin via annotations
  • mark the node as cordoned BEFORE we remove the device plugin (without draining) to avoid scheduling new pods until the device plugin is back up
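
The general idea behind the nodeSelector bullet can be sketched as follows. This is a minimal, hypothetical example using a controller-runtime client: the label key is the one discussed in the review below, while the helper name and the exact label values are assumptions (the PR itself uses utils.LabelNode and the constants.SriovDevicePluginEnabled* constants).

// Hypothetical sketch (not the PR's actual helper): flip a device-plugin
// node label so the DaemonSet's nodeSelector decides whether the plugin
// pod runs on the node, instead of editing the DaemonSet YAML.
package example

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// Assumed label key/values mirroring constants.SriovDevicePluginEnabledLabel
// and friends from the PR.
const (
	devicePluginLabel    = "sriovnetwork.openshift.io/device-plugin"
	devicePluginEnabled  = "Enabled"
	devicePluginDisabled = "Disabled"
)

// setDevicePluginLabel patches the node label that the device plugin
// DaemonSet selects on, starting or stopping the plugin pod on that node.
func setDevicePluginLabel(ctx context.Context, c client.Client, nodeName, value string) error {
	node := &corev1.Node{}
	if err := c.Get(ctx, types.NamespacedName{Name: nodeName}, node); err != nil {
		return err
	}
	if node.Labels[devicePluginLabel] == value {
		return nil // already in the desired state
	}
	patch := client.MergeFrom(node.DeepCopy())
	if node.Labels == nil {
		node.Labels = map[string]string{}
	}
	node.Labels[devicePluginLabel] = value
	return c.Patch(ctx, node, patch)
}

The device plugin DaemonSet then only needs a static nodeSelector on sriovnetwork.openshift.io/device-plugin: Enabled; flipping the label to Disabled makes the DaemonSet controller remove the plugin pod from that node, and flipping it back re-creates it, without ever touching the DaemonSet YAML.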


Thanks for your PR.
To run the vendors' CIs, maintainers can use one of:

  • /test-all: to run all tests for all vendors.
  • /test-e2e-all: to run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: to run all E2E tests for the NVIDIA vendor.

To skip the vendors' CIs, maintainers can use one of:

  • /skip-all: to skip all tests for all vendors.
  • /skip-e2e-all: to skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: to skip all E2E tests for the NVIDIA vendor.

Best regards.


coveralls commented Jul 25, 2024

Pull Request Test Coverage Report for Build 10683523602

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 271 of 434 (62.44%) changed or added relevant lines in 8 files are covered.
  • 18 unchanged lines in 5 files lost coverage.
  • Overall coverage increased (+0.3%) to 45.278%

Changes Missing Coverage                   Covered Lines   Changed/Added Lines   %
pkg/drain/drainer.go                       12              14                    85.71%
pkg/daemon/daemon.go                       11              16                    68.75%
pkg/utils/cluster.go                       27              32                    84.38%
pkg/platforms/openshift/openshift.go       0               9                     0.0%
controllers/drain_controller_helper.go     195             337                   57.86%

Files with Coverage Reduction              New Missed Lines   %
controllers/helper.go                      1                  72.41%
pkg/daemon/daemon.go                       2                  46.9%
pkg/utils/cluster.go                       3                  38.58%
controllers/drain_controller.go            4                  77.68%
api/v1/helper.go                           8                  75.07%

Totals Coverage Status
Change from base Build 10594418576: 0.3%
Covered Lines: 6683
Relevant Lines: 14760

💛 - Coveralls

controllers/drain_controller.go: 3 resolved review threads (outdated)
@SchSeba SchSeba changed the title This commit introduces a new redesign on how the operator resets the … Redesign device plugin reset Jul 25, 2024
@SchSeba SchSeba force-pushed the device_plugin_redesign branch 5 times, most recently from a3e8a58 to ba8d6ef on July 30, 2024 at 09:05
Collaborator Author

SchSeba commented Aug 19, 2024

@zeeke @adrianchiris @ykulazhenkov when you have time please take a look at this :)

Member

@zeeke zeeke left a comment


Left a few comments

Comment on lines 142 to 158
	// in case we have policy there is nothing else to do
	if len(nodeNetworkState.Spec.Interfaces) > 0 {
		reqLogger.Info("node and nodeState are on idle nothing todo")
	} else {
		// if we don't have any policy
		// let's be sure the device plugin label doesn't exist on the node
		reqLogger.Info("remove Device plugin from node nodeState spec is empty")
		err = utils.LabelNode(ctx, node.Name, constants.SriovDevicePluginEnabledLabel, constants.SriovDevicePluginEnabledLabelDisabled, dr.Client)
		if err != nil {
			log.Log.Error(err, "failed to label node for device plugin label",
				"labelKey",
				constants.SriovDevicePluginEnabledLabel,
				"labelValue",
				constants.SriovDevicePluginEnabledLabelDisabled)
			return reconcile.Result{}, err
		}
	}
Member


We've reached the 4th level of nested if statements here. What about extracting:

Suggested change
	// in case we have policy there is nothing else to do
	if len(nodeNetworkState.Spec.Interfaces) > 0 {
		reqLogger.Info("node and nodeState are on idle nothing todo")
	} else {
		// if we don't have any policy
		// let's be sure the device plugin label doesn't exist on the node
		reqLogger.Info("remove Device plugin from node nodeState spec is empty")
		err = utils.LabelNode(ctx, node.Name, constants.SriovDevicePluginEnabledLabel, constants.SriovDevicePluginEnabledLabelDisabled, dr.Client)
		if err != nil {
			log.Log.Error(err, "failed to label node for device plugin label",
				"labelKey",
				constants.SriovDevicePluginEnabledLabel,
				"labelValue",
				constants.SriovDevicePluginEnabledLabelDisabled)
			return reconcile.Result{}, err
		}
	}

func (dr *DrainReconcile) handleIdleIdle(ctx context.Context, node *corev1.Node, nodeNetworkState *sriovnetworkv1.SriovNetworkNodeState) (ctrl.Result, error) {
	reqLogger := log.FromContext(ctx)
	if len(nodeNetworkState.Spec.Interfaces) > 0 {
		reqLogger.Info("node and nodeState are on idle nothing todo")
	} else {
		// if we don't have any policy
		// let's be sure the device plugin label doesn't exist on the node
		err := utils.LabelNode(ctx, node.Name, constants.SriovDevicePluginEnabledLabel, constants.SriovDevicePluginEnabledLabelDisabled, dr.Client)
		if err != nil {
			log.Log.Error(err, "failed to label node for device plugin label",
				"labelKey",
				constants.SriovDevicePluginEnabledLabel,
				"labelValue",
				constants.SriovDevicePluginEnabledLabelDisabled)
			return reconcile.Result{}, err
		}
	}
	return reconcile.Result{}, nil
}


// LabelObject adds label to a kubernetes object
func LabelObject(ctx context.Context, obj client.Object, key, value string, c client.Client) error {
	log.Log.V(2).Info("LabelObject(): Annotate object",
Member


Please log only if the label needs to be updated. Otherwise, we'll get a lot of log lines even when the SR-IOV configuration is stable.
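
For illustration, a minimal sketch of what this could look like, assuming a controller-runtime client and a merge patch; this is not the PR's actual implementation, and the log message fields are only illustrative:

// Sketch only: skip both the log line and the API call when the label is
// already set to the desired value, so a stable configuration stays quiet.
package utils

import (
	"context"

	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/log"
)

// LabelObject adds a label to a kubernetes object, logging only when the
// label actually has to change.
func LabelObject(ctx context.Context, obj client.Object, key, value string, c client.Client) error {
	if obj.GetLabels()[key] == value {
		return nil // nothing to update, nothing to log
	}
	log.Log.V(2).Info("LabelObject(): label object",
		"objectName", obj.GetName(), "labelKey", key, "labelValue", value)

	patch := client.MergeFrom(obj.DeepCopyObject().(client.Object))
	labels := obj.GetLabels()
	if labels == nil {
		labels = map[string]string{}
	}
	labels[key] = value
	obj.SetLabels(labels)
	return c.Patch(ctx, obj, patch)
}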

Comment on lines 296 to 349
	} else if nodeDrainAnnotation == constants.DevicePluginResetRequired {
		// nothing to do here, we need to wait for the node to move back to idle
		if nodeStateDrainAnnotationCurrent == constants.DrainComplete {
			reqLogger.Info("node requested a drain and nodeState is on drain completed nothing todo")
			return ctrl.Result{}, nil
		}

		// if we are on idle state we move it to drain
		if nodeStateDrainAnnotationCurrent == constants.DrainIdle {
			err = utils.AnnotateObject(ctx, nodeNetworkState, constants.NodeStateDrainAnnotationCurrent, constants.Draining, dr.Client)
			if err != nil {
				reqLogger.Error(err, "failed to annotate node with annotation", "annotation", constants.Draining)
				return ctrl.Result{}, err
			}
			return ctrl.Result{}, nil
		}

		// This covers the case where we only need to reset the device plugin.
		// For that we cordon the node, so we don't get new pods allocated
		// to the node while we remove the device plugin.
		err = dr.drainer.RunCordonOrUncordon(ctx, node, true)
		if err != nil {
			log.Log.Error(err, "failed to cordon on node")
			return reconcile.Result{}, err
		}

		// We switch the sriov label to disabled and mark the drain as completed.
		// There is no need to wait for the device plugin to exit here, as we cordon the node,
		// and we want the config-daemon to start the configuration in parallel with the kube-controller removing the pod.
		// We check that the device plugin was removed when the config-daemon moves its desired state to idle.
		reqLogger.Info("disable Device plugin from node")
		err = utils.LabelNode(ctx, node.Name, constants.SriovDevicePluginEnabledLabel, constants.SriovDevicePluginEnabledLabelDisabled, dr.Client)
		if err != nil {
			log.Log.Error(err, "failed to label node for device plugin label",
				"labelKey",
				constants.SriovDevicePluginEnabledLabel,
				"labelValue",
				constants.SriovDevicePluginEnabledLabelDisabled)
			return reconcile.Result{}, err
		}

		// if we managed to cordon, we label the node state with drain completed and finish
		err = utils.AnnotateObject(ctx, nodeNetworkState, constants.NodeStateDrainAnnotationCurrent, constants.DrainComplete, dr.Client)
		if err != nil {
			reqLogger.Error(err, "failed to annotate node with annotation", "annotation", constants.DrainComplete)
			return ctrl.Result{}, err
		}

		reqLogger.Info("node cordoned successfully and device plugin removed")
		dr.recorder.Event(nodeNetworkState,
			corev1.EventTypeWarning,
			"DrainController",
			"node cordoned and device plugin removed completed")
		return ctrl.Result{}, nil
Member


Any chance to add a unit test for this case? It took me a while to understand the whole controller/daemon interaction.

@SchSeba SchSeba force-pushed the device_plugin_redesign branch 3 times, most recently from 7b4de70 to d719a8c on September 2, 2024 at 10:50
…device plugin

* use a general nodeSelector to avoid updating the DaemonSet YAML
* remove the config-daemon's deletion of the device plugin pod (better security)
* make the operator in charge of resetting the device plugin via annotations
* mark the node as cordoned BEFORE we remove the device plugin (without draining) to avoid scheduling new pods until the device plugin is back up

Signed-off-by: Sebastian Sch <[email protected]>
Member

@zeeke zeeke left a comment


Left a few comments. Overall, the design looks good to me, but we must be super sure about all of these annotation states. I'm pretty scared of having a deadlock in some production cluster :)

@@ -214,13 +190,15 @@ func syncPluginDaemonObjs(ctx context.Context,
		return err
	}
	ds.Spec.Template.Spec.NodeSelector = dc.Spec.ConfigDaemonNodeSelector
	// add the special node selector for the device plugin
	ds.Spec.Template.Spec.NodeSelector[constants.SriovDevicePluginEnabledLabel] = constants.SriovDevicePluginEnabledLabelEnabled
Member


can't we put

nodeSelector:
  sriovnetwork.openshift.io/device-plugin: Enabled

in bindata/manifests/plugins/sriov-device-plugin.yaml?
should we take care of the ConfigDaemonNodeSelector?


		// if we are on idle state we move it to drain
		if nodeStateDrainAnnotationCurrent == constants.DrainIdle {
			err := utils.AnnotateObject(ctx, nodeNetworkState, constants.NodeStateDrainAnnotationCurrent, constants.Draining, dr.Client)
Member


It took me a while to understand this. We're setting the nodeState to Draining even though we are not draining, only cordoning. Is it to avoid having too many states?

			createDevicePluginPodOnNode(ctx, node.Name)
			expectNodeStateAnnotation(nodeState, constants.DrainIdle)
			expectNodeLabel(node, constants.SriovDevicePluginEnabledLabel, constants.SriovDevicePluginEnabledLabelEnabled)
		})
Member


While checking whether I understood all the steps, I coded this unit test:

It("should restart the device plugin on multiple nodes if requested", func() {
			simpleNodeStateSpec := sriovnetworkv1.SriovNetworkNodeStateSpec{Interfaces: sriovnetworkv1.Interfaces{
				{
					Name:   "test",
					NumVfs: 10,
					VfGroups: []sriovnetworkv1.VfGroup{
						{
							PolicyName:   "test",
							ResourceName: "test",
							VfRange:      "0-9",
						},
					},
				},
			}}

			node1, nodeState1 := createNode(ctx, "node-a")
			nodeState1.Spec = simpleNodeStateSpec
			node2, nodeState2 := createNode(ctx, "node-b")
			nodeState2.Spec = simpleNodeStateSpec


			simulateDaemonSetAnnotation(node1, constants.DevicePluginResetRequired)
			simulateDaemonSetAnnotation(node2, constants.DevicePluginResetRequired)
			
			expectNodeStateAnnotation(nodeState1, constants.DrainComplete)
			expectNodeStateAnnotation(nodeState2, constants.DrainComplete)
			expectNodeLabel(node1, constants.SriovDevicePluginEnabledLabel, constants.SriovDevicePluginEnabledLabelDisabled)
			expectNodeLabel(node2, constants.SriovDevicePluginEnabledLabel, constants.SriovDevicePluginEnabledLabelDisabled)

			simulateDaemonSetAnnotation(node1, constants.DrainIdle)
			simulateDaemonSetAnnotation(node2, constants.DrainIdle)

			expectNodeStateAnnotation(nodeState1, constants.DrainIdle)
			expectNodeStateAnnotation(nodeState2, constants.DrainIdle)
			expectNodeLabel(node1, constants.SriovDevicePluginEnabledLabel, constants.SriovDevicePluginEnabledLabelEnabled)
			expectNodeLabel(node2, constants.SriovDevicePluginEnabledLabel, constants.SriovDevicePluginEnabledLabelEnabled)
		})

But it fails and the device plugin is not enabled on any node. Is this test case wrong, or is there an underlying condition that isn't handled?

	reqLogger *logr.Logger,
	node *corev1.Node,
	nodeNetworkState *sriovnetworkv1.SriovNetworkNodeState,
	nodeDrainAnnotation,
Member


This variable is not used. Please remove it.
