Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: sbd-inquisitor: SBD_SYNC_RESOURCE_STARTUP should consider SBD_PACEMAKER #141

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

isaacpittman-hitachi
Copy link

Currently, if SBD_SYNC_RESOURCE_STARTUP=false and
SBD_PACEMAKER=false, sbd prints a warning every time it runs: "Should
think about enabling SBD_SYNC_RESOURCE_STARTUP." But this warning can
never be addressed, since SBD_SYNC_RESOURCE_STARTUP=true doesn't work
with SBD_PACEMAKER=false.

Additionally, if SBD_PACEMAKER=false but
SBD_SYNC_RESOURCE_STARTUP=true, the SBD_SYNC_RESOURCE_STARTUP=true
will have no effect, since SBD_PACEMAKER controls whether sbd sends
the ping to pacemaker. (If pacemaker is also configured with
SBD_SYNC_RESOURCE_STARTUP=true, it will hang forever waiting for the
ping from sbd that never comes.)

Handle both of these cases by checking the values of
SBD_SYNC_RESOURCE_STARTUP and SBD_PACEMAKER:

  • SBD_SYNC_RESOURCE_STARTUP=true, SBD_PACEMAKER=true: No change.
    Startup syncs as expected.

  • SBD_SYNC_RESOURCE_STARTUP=false, SBD_PACEMAKER=true: No change.
    Sbd warns about enabling SBD_SYNC_RESOURCE_STARTUP as expected.

  • SBD_SYNC_RESOURCE_STARTUP=true, SBD_PACEMAKER=false: Currently,
    pacemaker would hang. With the fix, sbd aborts with an error.

  • SBD_SYNC_RESOURCE_STARTUP=false, SBD_PACEMAKER=false: Currently,
    sbd warns about enabling SBD_SYNC_RESOURCE_STARTUP. With the fix, the
    behavior is the same as it was before SBD_SYNC_RESOURCE_STARTUP was
    introduced: no warning.

…CEMAKER

Currently, if `SBD_SYNC_RESOURCE_STARTUP=false` and
`SBD_PACEMAKER=false`, sbd prints a warning every time it runs: "Should
think about enabling SBD_SYNC_RESOURCE_STARTUP." But this warning can
never be addressed, since `SBD_SYNC_RESOURCE_STARTUP=true` doesn't work
with `SBD_PACEMAKER=false`.

Additionally, if `SBD_PACEMAKER=false` but
`SBD_SYNC_RESOURCE_STARTUP=true`, the `SBD_SYNC_RESOURCE_STARTUP=true`
will have no effect, since `SBD_PACEMAKER` controls whether sbd sends
the ping to pacemaker. (If pacemaker is also configured with
`SBD_SYNC_RESOURCE_STARTUP=true`, it will hang forever waiting for the
ping from sbd that never comes.)

Handle both of these cases by checking the values of
`SBD_SYNC_RESOURCE_STARTUP` and `SBD_PACEMAKER`:

* `SBD_SYNC_RESOURCE_STARTUP=true`, `SBD_PACEMAKER=true`: No change.
Startup syncs as expected.

* `SBD_SYNC_RESOURCE_STARTUP=false`, `SBD_PACEMAKER=true`: No change.
Sbd warns about enabling `SBD_SYNC_RESOURCE_STARTUP` as expected.

* `SBD_SYNC_RESOURCE_STARTUP=true`, `SBD_PACEMAKER=false`: Currently,
pacemaker would hang. With the fix, sbd aborts with an error.

* `SBD_SYNC_RESOURCE_STARTUP=false`, `SBD_PACEMAKER=false`: Currently,
sbd warns about enabling `SBD_SYNC_RESOURCE_STARTUP`. With the fix, the
behavior is the same as it was before `SBD_SYNC_RESOURCE_STARTUP` was
introduced: no warning.
@knet-ci-bot
Copy link

Can one of the admins verify this patch?

@wenningerk
Copy link

test this please

@wenningerk
Copy link

wenningerk commented Jan 12, 2022

Not sure if what you tried is a valid configuration.
With SBD_PACEMAKER==false you probably should disable watchdog-fencing on the pacemaker side and then pacemaker shouldn't hang anymore.
Disabling the warning in case SBD_PACEMAKER==false is probably a good idea.
The other thing might be handled better on the pacemaker-side which should probably exit if SBD_PACEMAKER==false and watchdog-fencing is enabled.
Current behavior is at least save in a way that it prevents split-brain by not starting resources.

@isaacpittman-hitachi
Copy link
Author

Thank you for the feedback! You're probably right that I tested an invalid configuration.

I've removed the check that broke the unit tests but kept the disabled warning in case SBD_PACEMAKER==false.

@wenningerk
Copy link

test this please

@wenningerk
Copy link

Hmm ...
Guess I take back what I said yesterday.
This requires a bit more thinking.
I think we should make a difference if resource-startup-syncing got set coming in as default or explicitly.
Some thoughts but this is probably not final ...

If both resource-startup-syncing = true and pacemaker-check = false are set explicitly this is
definitely a false-configuration and I guess exiting sbd would be the correct thing to do.
If resource-startup-syncing is going for the default we might enforce defaulting to false if
pacemaker-checks are explicitly set to false.
Critical is that pacemaker-checks might be disabled via cmdline and that is something
pacemaker couldn't check for. If we detect critical inconsistency here we might exit as well.
Downside of an automatism as described above is that we'd need a similar implementation
in pacemaker.

In pacemaker we should probably at least catch the case that pacemaker-checks
are disabled and watchdog-fencing is enabled. We can't easily catch that
earlier in sbd-startup unfortunately as with pacemaker-checks disabled we have
no easy way to check for enabled watchdog-fencing.
And if we already need to know how pacemaker-checks are switched on/off we can
introduce the automatism for resource-startup-syncing as well.

We on top should try not to break high-level-tool-compatibility.
For pcs and sbd with just disk-fencing (no pacemaker awareness using
3 disks as otherwise the 1 disk shared disk would become a spof) this
might be broken already - have to check.

Your initial suggestion is probably a good first step in the right direction at least.
Maybe the error-output when it bails out should be more explicit as to
mention that it is an inconsistent configuration.
I'd like to think over a final goal still before we change anything at this point.

@isaacpittman-hitachi
Copy link
Author

Thank you for the in depth feedback! I was hoping to at least get the suppressed warning checked in, since the frequent warnings make it harder to spot other issues in the logs. If that's alright, I could file another PR for the suppressed warning issue and keep this one open to address compatibility in general between resource-startup-syncing and pacemaker-check.

@wenningerk
Copy link

Sorry for not reacting for so long. Got involved in other things and wanted to think it through properly - this time ;-) - before
actual changes to the code-base of sbd or pacemaker.
Should be able to pick up on the issue sometimes early this week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants