✨ Add reference to HostUpdatePolicy in Servicing. #1969

Closed

Conversation

rhjanders
Member

@rhjanders rhjanders commented Sep 20, 2024

What this PR does / why we need it:

This PR enables BMO to run Ironic servicing operations on already provisioned nodes (such as applying firmware settings changes or, in the future, firmware updates). Servicing is an opt-in feature, controlled by creating a HostUpdatePolicy for a node with attributes indicating that firmware configuration changes should be applied onReboot.

This is a partial implementation of https://github.com/metal3-io/metal3-docs/blob/main/design/baremetal-operator/host-live-updates.md (please note that only firmware settings changes are currently supported; firmware update support will be added next).

@metal3-io-bot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@metal3-io-bot metal3-io-bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 20, 2024
@metal3-io-bot metal3-io-bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Sep 20, 2024
@rhjanders rhjanders force-pushed the servicing-hostupdatepolicy branch 2 times, most recently from 0d8b518 to 509027a Compare September 24, 2024 04:31
@rhjanders rhjanders changed the title Add reference to HostUpdatePolicy in Servicing. ✨ Add reference to HostUpdatePolicy in Servicing. Oct 15, 2024
@rhjanders rhjanders marked this pull request as ready for review October 15, 2024 11:41
@metal3-io-bot metal3-io-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 15, 2024
@rhjanders rhjanders force-pushed the servicing-hostupdatepolicy branch 3 times, most recently from 0f11a97 to 7f4b773 Compare October 17, 2024 06:55
@rhjanders rhjanders force-pushed the servicing-hostupdatepolicy branch 2 times, most recently from 87cb67e to 95aa70b Compare October 17, 2024 11:15
@iurygregory
Member

LGTM, thanks for working on it @rhjanders

Resolved (outdated) review threads: controllers/metal3.io/baremetalhost_controller.go (7), pkg/provisioner/ironic/servicing.go (1)
dtantsur and others added 3 commits October 18, 2024 22:31
Signed-off-by: Dmitry Tantsur <[email protected]>
Servicing only runs when a host is powered off (either completely or
by rebooting it).

Signed-off-by: Dmitry Tantsur <[email protected]>
Signed-off-by: Jacob Anders <[email protected]>

Removed unused ServicingData fields.
@metal3-io-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign zaneb for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@rhjanders rhjanders force-pushed the servicing-hostupdatepolicy branch 3 times, most recently from cbe1eb9 to 324e111 Compare October 23, 2024 14:25
@iurygregory
Member

LGTM

Resolved (outdated) review threads: controllers/metal3.io/baremetalhost_controller.go (2)

    if provResult.Dirty {
    result := actionContinue{provResult.RequeueAfter}
    if dirty {
Member

The actual thing we want to check here is whether we need to write the BMH. Writes occur on line 1406 and line 1420, but dirty could be true if hfsDirty is true even if nothing is actually updated.
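The concern above can be illustrated with a minimal, self-contained sketch. All names in it (`hostState`, `applyUpdates`) are hypothetical and are not taken from the BMO code base. It only shows the idea that a "needs write" flag should follow from actual changes to the host object, not from the "settings differ" inputs alone.

```go
package main

import "fmt"

// hostState is a hypothetical stand-in for the parts of the host status
// that the controller would persist.
type hostState struct {
	SettingsRecorded bool
	FirmwareRecorded bool
}

// applyUpdates mutates the host only for updates that have actually started
// and reports whether the host object changed and therefore needs a write.
// settingsDirty and firmwareDirty mean "desired differs from recorded";
// on their own they do not imply that anything was modified.
func applyUpdates(h *hostState, started, settingsDirty, firmwareDirty bool) (needsWrite bool) {
	if started && settingsDirty && !h.SettingsRecorded {
		h.SettingsRecorded = true
		needsWrite = true
	}
	if started && firmwareDirty && !h.FirmwareRecorded {
		h.FirmwareRecorded = true
		needsWrite = true
	}
	return needsWrite
}

func main() {
	h := &hostState{}
	// Settings differ, but servicing has not started: nothing to write.
	fmt.Println(applyUpdates(h, false, true, false))
	// Servicing started: the host is modified, so a write is needed.
	fmt.Println(applyUpdates(h, true, true, false))
}
```

In this sketch the flag is set next to each mutation, so a new mutation added later cannot silently skip the write.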

    // update didn't actually happen. This is deemed an acceptable risk for the moment since it is only
    // going to impact a small subset of Firmware Settings implementations.
    currentError := info.host.Status.ErrorType
    if clearErrorWithStatus(info.host, metal3api.OperationalStatusServicing) {
Member

I think this is too early to clear an error. We should do it after the check for provResult.ErrorMessage != "" around line 1418.

It's also arguably too late for an attempt to set the status to Servicing in the non-error case, because we'll still only write it once servicing has already started.

Member Author

I changed the order as per the first line of the comment. Unsure about the second part - will check in with Dmitry.

Member

Remember that writes to the k8s API only happen after we return from this function. So if you want to put it into OperationalStatusServicing before servicing starts then you'd need to do something like:

    if info.host.Status.OperationalStatus != metal3api.OperationalStatusServicing {
        info.host.Status.OperationalStatus = metal3api.OperationalStatusServicing
        return actionUpdate{}
    }

Just changing the order has no effect in that respect.
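As a rough illustration of why the early return matters, here is a self-contained sketch of the two-pass pattern. The types below (`host`, `action`, `handleServicing`) are simplified, hypothetical stand-ins and not the real controller types. The assumption is that the caller persists the host after the handler returns and then reconciles again.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the controller's types.
type operationalStatus string

const (
	statusOK        operationalStatus = "OK"
	statusServicing operationalStatus = "servicing"
)

type host struct {
	OperationalStatus operationalStatus
}

// action tells the (assumed) reconcile loop what to do after the handler returns.
type action string

const (
	actionUpdate   action = "update"   // persist the host, then reconcile again
	actionContinue action = "continue" // operation in progress, requeue
)

// handleServicing returns early on the first pass, so the new status is
// written out before startServicing is ever invoked.
func handleServicing(h *host, startServicing func()) action {
	if h.OperationalStatus != statusServicing {
		h.OperationalStatus = statusServicing
		return actionUpdate
	}
	startServicing()
	return actionContinue
}

func main() {
	h := &host{OperationalStatus: statusOK}
	calls := 0
	start := func() { calls++ }

	first := handleServicing(h, start)
	fmt.Println(first, calls) // status set, servicing not yet started

	second := handleServicing(h, start)
	fmt.Println(second, calls) // servicing started on the second pass
}
```

The first pass only changes the status and asks for a write; servicing begins on the next pass, once the status is already visible to API clients.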

Member

If we do it this way, we'll lose the error information, and it will never be passed to prov.Service.

Member

I stand corrected. If we don't call clearErrorWithStatus, we'll have ErrorType still set. We just need to remember to unset it after the successful call.

Maybe we need to move clearErrorWithStatus back to where it was before, but also set the status to servicing explicitly in the way that Zane suggested?
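The ordering discussed in this thread (keep the previous error visible to the servicing call, clear it only after success) might look roughly like the sketch below. Everything in it is hypothetical: `hostStatus`, `serviceFunc` and `runServicing` are illustrative names, and the signature of the real `prov.Service` call is not reproduced here.

```go
package main

import "fmt"

// hostStatus is a hypothetical, reduced version of the host status.
type hostStatus struct {
	ErrorType    string
	ErrorMessage string
}

// serviceFunc is a hypothetical stand-in for the provisioner's servicing call.
// It is told whether the previous attempt failed and returns an error
// message, or "" on success.
type serviceFunc func(previousAttemptFailed bool) (errorMessage string)

// runServicing leaves the stored error untouched until the call has
// succeeded, so the failure information still reaches the servicing call.
func runServicing(st *hostStatus, service serviceFunc) (ok bool) {
	hadError := st.ErrorType != ""
	if msg := service(hadError); msg != "" {
		st.ErrorType = "servicing error"
		st.ErrorMessage = msg
		return false
	}
	// Only now is it safe to drop the old error.
	st.ErrorType = ""
	st.ErrorMessage = ""
	return true
}

func main() {
	st := &hostStatus{ErrorType: "servicing error", ErrorMessage: "previous failure"}
	ok := runServicing(st, func(previousAttemptFailed bool) string {
		fmt.Println("retrying after failure:", previousAttemptFailed)
		return ""
	})
	fmt.Println(ok, st.ErrorType == "")
}
```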

Resolved (outdated) review thread: controllers/metal3.io/baremetalhost_controller.go
    // succeed before leaving this state (e.g. by deprovisioning) we lose the signal that the
    // update didn't actually happen. This is deemed an acceptable risk for the moment since it is only
    // going to impact a small subset of Firmware Settings implementations.
    currentError := info.host.Status.ErrorType
Member

nit: you no longer need to store this because you no longer clear ErrorType before calling Service.

    }

    if started && fwDirty {
        info.host.Status.Provisioning.Firmware = info.host.Spec.Firmware.DeepCopy()
Member

You probably need dirty = true here

@dtantsur
Member

dtantsur commented Nov 6, 2024

/close

Superseded by #2041

@metal3-io-bot
Contributor

@dtantsur: Closed this PR.

In response to this:

/close

Superseded by #2041

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
