Tolerance of pre-existing software-RAID #6

Open

ydirson wants to merge 11 commits into base 10.10.11-8.3 from yann/software-raid
Conversation

@ydirson (Collaborator) commented Dec 9, 2022

The primary goal of this PR is to prevent a pre-existing software RAID configuration from blocking the use of the disks that were part of that RAID. Such disks were at best not shown as available, or caused installation errors when selected (for disks with partitions that were part of a RAID), or even produced successful-looking installations that would not boot (when attempting to install on a RAID whose members were partitions).

The basic idea is to prevent automatic assembly of pre-existing RAID volumes, and to add an explicit option to trigger such assembly, so that upgrade/restore operations on systems installed on RAID remain possible. While RAIDs are not assembled, it also refrains from offering the user upgrade/restore choices for systems installed on RAID.

This filtering-out of restore choices is extended to any backup partition referring to an inaccessible device, which as a side effect makes it impossible to start a "restore" operation of a version too old, which would be doomed to fail because of a dom0 device-id change (as seen e.g. in https://xcp-ng.org/forum/topic/6573/upgrade-xenserver-7-to-xcp-ng-8-2-1).
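A minimal sketch of such a filter, assuming a backup object that exposes the device path it refers to (the names `backup.root_disk`, `is_restorable` and `filter_restore_choices` are illustrative, not the PR's actual code):

```python
import os

def is_restorable(backup):
    """Only offer a "restore" choice when the device the backup refers to
    is still accessible on the running installer."""
    # A stale dom0 device id (e.g. an old /dev/disk/by-id/... path) will
    # simply no longer exist, so the corresponding restore entry is dropped.
    return backup.root_disk is not None and os.path.exists(backup.root_disk)

def filter_restore_choices(backups):
    return [b for b in backups if is_restorable(b)]
```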

Note that this includes PR xenserver#20, to help the user understand exactly which choices are being presented.

Note that some points are not currently addressed:

  • filtering out of disks whose partitions are part of a RAID -- this would only be needed once such a RAID is assembled, and we only offer to assemble pre-existing RAIDs when there are disks with a RAID superblock; those disks would thus only appear when there is both a (supported) disk-level RAID and an (unsupported) partition-level RAID, at which point the user is very likely already working on the supported RAID, so it is not certain there is really a problem left to address here
  • prevention of installation onto a partition-level RAID -- those would only appear to be available in the same situation

TODO:

  • The "Back" button on the page displaying your installation options (install, restore, upgrade...) sends back to scanning for existing products and then to the same page again.

@ydirson force-pushed the yann/software-raid branch 2 times, most recently from af0eee9 to 5f445db on December 12, 2022 12:44
startup/01-installer.rules (outdated review thread, resolved)
@ydirson force-pushed the yann/software-raid branch 2 times, most recently from 946713e to 28401b7 on December 14, 2022 17:17
Adding the disk vendor and model would be useful, but it can take too much
horizontal space, especially at the good old 80x25 resolution, so we use
a shorter string with just the disk size for disambiguation.

Signed-off-by: Yann Dirson <[email protected]>
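A rough sketch of the kind of short label this commit aims for (the helper name and the size formatting are assumptions, not the actual `tui/installer/screens.py` code):

```python
def short_disk_label(device, size_bytes):
    """Device name plus size only: vendor/model strings may not fit in the
    80 columns available on a 80x25 console, so they are left out."""
    size_gb = size_bytes / (1000 ** 3)
    return "%s - %.0f GB" % (device, size_gb)

# e.g. short_disk_label("sda", 256060514304) -> "sda - 256 GB"
```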
tui/installer/screens.py (outdated review thread, resolved)
@ydirson force-pushed the yann/software-raid branch 2 times, most recently from 6474110 to 2f25984 on January 24, 2023 18:04
Reasons to have this here include:
* don't require changes to the srpm when the set of installed files changes
* make installer development faster by deploying directly to a filesystem

This installation process still depends by default on the sm package
being installed, but through the SM_ROOT variable it can now be run on a
dev machine where this is not the case; it just needs an unpacked
version of the package.

Signed-off-by: Yann Dirson <[email protected]>
mdadm output is like "MD_DEVNAME=127", which results in getHumanDiskName
giving a name of "RAID: 127(sda,sdb)".  This fixes the problem and gets
us "RAID: md127(sda,sdb)" instead.

Also slightly improve readability with tuple unpacking.

Signed-off-by: Yann Dirson <[email protected]>
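A sketch of the fix being described, assuming the properties exported by mdadm are available as a dict; only the "RAID: md127(sda,sdb)" output format comes from the commit message, the helper below is illustrative:

```python
def human_raid_name(md_props, members):
    """MD_DEVNAME comes out as e.g. "127"; prefix it with "md" so the label
    reads "RAID: md127(sda,sdb)" rather than "RAID: 127(sda,sdb)"."""
    devname = md_props["MD_DEVNAME"]
    if not devname.startswith("md"):
        devname = "md" + devname
    return "RAID: %s(%s)" % (devname, ",".join(members))
```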
This prevents 65-md-incremental.rules from assembling every RAID it
identifies; we want to have control over that, so we can reuse a
former RAID member for other purposes.

Signed-off-by: Yann Dirson <[email protected]>
This causes the "Back" button in primary-disk selection to trigger a
rescan for products.  This provides support for allowing the user to
enable layers of block-device composition (e.g. "md" software RAID,
not part of this PR) and then scan those RAID volumes for products.

This could even be used for arbitrary manual block-device setup from a shell
on the alternate console.

Signed-off-by: Yann Dirson <[email protected]>
It has value for a user who expects an existing product to be notified that
none was found.  Moreover, this will allow a user to reach the "assemble
RAID" choice even when no product was detected.

Signed-off-by: Yann Dirson <[email protected]>
… RAID

This adds an option to enable *all* RAID devices found.  This is less
flexible than a more selective mechanism would be, but it is really simple
and covers all known use-cases.

It filters out from the choices presented to the user both "update of" and
"restore to" entries referring to a disk that is part of an md RAID
(though in practice only the "update" case is likely to appear).  This
avoids the case where a user gets those entries before explicitly
assembling RAIDs, and thus avoids breaking a RAID by mistake.

Signed-off-by: Yann Dirson <[email protected]>
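A compact sketch of both halves of this commit, built around mdadm calls like the one visible in the review hunks below; `util.runCmd2` is the installer helper already used in this PR, everything else (function names, the `--examine` membership test, the `primary_disk` attribute) is an assumption for illustration:

```python
import util  # installer helper module providing runCmd2

def assemble_all_raids():
    """Assemble every pre-existing md RAID, on explicit user request only."""
    return util.runCmd2(['mdadm', '--assemble', '--scan']) == 0

def disk_is_raid_member(device):
    """True if the disk (e.g. '/dev/sda') carries an md superblock, i.e. it
    belongs to a RAID, whether or not that RAID is currently assembled."""
    # mdadm --examine exits non-zero when no md superblock is found
    return util.runCmd2(['mdadm', '--examine', '--brief', device]) == 0

def filter_product_choices(entries):
    """Drop "update of"/"restore to" entries whose disk is a RAID member, so
    they only appear after the user explicitly assembles RAIDs."""
    return [e for e in entries if not disk_is_raid_member(e.primary_disk)]
```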
…er to assemble RAID

This commit is kept separate until it is decided that this is the way to go,
but it would be better squashed.

Introduce installer feature flags, and hide the RAID-assembling
feature behind the `raid-assemble` flag.
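A minimal sketch of what hiding the feature behind a flag could look like (only the `raid-assemble` flag name comes from the commit message; the file-based mechanism and its path are assumptions):

```python
import os

FEATURE_FLAG_DIR = "/run/installer-features"   # assumed location, for illustration

def feature_enabled(name):
    """A feature flag is considered enabled when a file of that name exists."""
    return os.path.exists(os.path.join(FEATURE_FLAG_DIR, name))

# Only offer the "assemble RAID" choice when the flag is set:
if feature_enabled("raid-assemble"):
    pass  # ... add the RAID-assembly entry to the menu ...
```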
…ound

This will avoid issues where users attempt to upgrade from a very old
version and a dom0 kernel driver evolution has changed the
/dev/disk/by-id/ identifier for the disk, as well as cases where the
backup became invalid after a hardware change.

Signed-off-by: Yann Dirson <[email protected]>
@ydirson (Collaborator, Author) commented May 22, 2023

Repushed with the same contents as the upstream PR, instead of maintaining 2 branches.
Besides, the old branch was using a defunct target branch.

@ydirson changed the base branch from 10.10.x-8.3 to 10.10.5-8.3 on May 22, 2023 14:07
results = {}
nodes = getElementsByTagName(self.top_node, ['assemble-raid'])
if nodes:
    results['assemble-raid'] = True # possibly useless
Member

Why useless? :)

if nodes:
    results['assemble-raid'] = True # possibly useless
logger.log("Assembling any RAID volumes")
rv = util.runCmd2([ 'mdadm', '--assemble', "--scan" ])
Member

Mix of simple and double quotes?

Also, do you get a warning from pycodestyle? It seems to me that spaces next to the brackets can trigger a message.

@ydirson changed the base branch from 10.10.5-8.3 to 10.10.11-8.3 on December 15, 2023 14:23
Labels: None yet
Projects: None yet
4 participants