Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use force unmount and explicitly unmount bad mount points #183

Draft
wants to merge 1 commit into
base: development
Choose a base branch
from

Conversation

dabradley
Copy link
Collaborator

There have been cases where the logic to cleanup a mount point has caused the driver to get into a bad state. This is most obvious when a subdirectory is mounted to a volume and a parent directory of that subdirectory is deleted. The Lustre driver doesn't handle that case in the way that Kubernetes expects and returns invalid data. To avoid this scenario causing our driver to get into a bad state, leak mount points, etc, we must explicitly check that we can read the necessary information about the mount point, and if not, explicitly unmount that mount point before allowing Kubernetes to clean up the directory. To ensure that we don't end up in a bad state, this change enables force unmounting as well. The force unmount will only occur after a timeout has expired, since force unmounts can cause issues with the Lustre driver. However, in this case, it is better if we are in a bad enough situation to be able to eventually return to a good state rather than require manual intervention.

What type of PR is this?

/kind bug

There have been cases where the logic to cleanup a mount point
has caused the driver to get into a bad state. This is most
obvious when a subdirectory is mounted to a volume and a parent
directory of that subdirectory is deleted. The Lustre driver
doesn't handle that case in the way that Kubernetes expects
and returns invalid data. To avoid this scenario causing our
driver to get into a bad state, leak mount points, etc, we
must explicitly check that we can read the necessary information
about the mount point, and if not, explicitly unmount that
mount point before allowing Kubernetes to clean up the directory.
To ensure that we don't end up in a bad state, this change
enables force unmounting as well. The force unmount will only
occur after a timeout has expired, since force unmounts can
cause issues with the Lustre driver. However, in this case, it
is better if we are in a bad enough situation to be able to
eventually return to a good state rather than require manual
intervention.
@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Oct 16, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dabradley

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Oct 16, 2024
@coveralls
Copy link

Coverage Status

coverage: 81.844% (-0.9%) from 82.773%
when pulling 9297cb4 on dabradley:personal/dabradley/cleanupbadmounts
into d583bcd on kubernetes-sigs:development.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/bug Categorizes issue or PR as related to a bug. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants