Delete all PVCs if guest cluster is deleted in Rancher #64

Open · wants to merge 1 commit into master

Conversation

@votdev (Member) commented Nov 5, 2024

If a guest cluster is deleted in Rancher, the PVCs of its workloads are not deleted in Harvester. This is because the node driver does not know why the associated VM has to be removed, e.g. whether its parameters have changed or the whole cluster is being deleted.

To solve the problem, a finalizer on the Machine resource in Rancher adds an annotation to the VM, which is then evaluated by the node driver when it runs the Remove() handler.

See rancher/rancher#47870 for the Rancher part.

Related to: harvester/harvester#2825
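
For illustration, here is a minimal, hedged sketch (not the actual driver code from this PR) of how the Remove() handler could evaluate such an annotation before force-deleting the PVCs. The annotation key and the log message are taken from this PR; the helper function and its wiring are assumptions.

```go
package main

import "fmt"

// Annotation key set by the Rancher finalizer when the whole guest cluster is
// being deleted (see the VirtualMachine example further down).
const removeAllPVCsAnnotation = "harvesterhci.io/removeAllPersistentVolumeClaims"

// shouldRemoveAllPVCs reports whether the VM's annotations request that all
// persistent volume claims are removed together with the VM. This helper is
// only an illustration, not the driver's actual implementation.
func shouldRemoveAllPVCs(annotations map[string]string) bool {
	return annotations[removeAllPVCsAnnotation] == "true"
}

func main() {
	// Example annotations as they would appear on the VM after the Rancher
	// finalizer has run.
	vmAnnotations := map[string]string{
		removeAllPVCsAnnotation: "true",
	}
	if shouldRemoveAllPVCs(vmAnnotations) {
		fmt.Println("Force the removal of all persistent volume claims")
	}
}
```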

When the cluster is deleted in Rancher, a pod is started which will run the docker-machine-driver-harvester binary. The output will look like this:

Downloading driver from https://rancher.192.168.0.141.sslip.io/assets/docker-machine-driver-harvester
Doing /etc/rancher/ssl
docker-machine-driver-harvester
docker-machine-driver-harvester: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, stripped
About to remove harv01-pool1-57832a3a-s8khx
WARNING: This action will delete both local reference and remote instance.
(harv01-pool1-57832a3a-s8khx) Remove node
(harv01-pool1-57832a3a-s8khx) Force the removal of all persistent volume claims
(harv01-pool1-57832a3a-s8khx) Waiting for node removed
Successfully removed harv01-pool1-57832a3a-s8khx
(harv01-pool1-57832a3a-s8khx) Closing plugin on server side
(temp-driver-loader) Closing plugin on server side
Stream closed EOF for fleet-default/harv01-pool1-57832a3a-s8khx-machine-provision-njm6b (machine)

(Screenshot from 2024-11-05 11-41-34)

Testing

Testing can be done via an ipxe test cluster. Note that this test requires a Rancher setup containing the Rancher-side PR (rancher/rancher#47870).

  1. Go to the code directory and build the project by running make build && make package
  2. Run an HTTP server to serve the binary:
$ cd ./dist/artifacts
$ python3 -m http.server 8080
  3. Get the SHA256 checksum of the compressed binary:
$ sha256sum ./dist/artifacts/docker-machine-driver-harvester-amd64.tar.gz
  4. Patch the node driver settings in Rancher. SSH into the Rancher host via ssh [email protected] and run:
# kubectl patch nodedrivers/harvester --type=merge --patch '{"spec":{"builtin":false, "url":"http://<YOUR_HOST_IP>:8080/docker-machine-driver-harvester-amd64.tar.gz","whitelistDomains":["releases.rancher.com","<YOUR_HOST_IP>"],"checksum":"<THE_TAR_GZ_CHECKSUM>"}}'
  5. Your HTTP server should output something like this:
Serving HTTP on 0.0.0.0 port 8080 (http://0.0.0.0:8080/) ...
192.168.xxx.xxx - - [05/Nov/2024 11:35:42] "GET /docker-machine-driver-harvester-amd64.tar.gz HTTP/1.1" 200 -
192.168.xxx.xxx - - [05/Nov/2024 11:35:42] "GET /docker-machine-driver-harvester-amd64.tar.gz HTTP/1.1" 200 -
192.168.xxx.xxx - - [05/Nov/2024 11:35:42] "GET /docker-machine-driver-harvester-amd64.tar.gz HTTP/1.1" 200 -
  6. Create a Harvester downstream cluster in the Rancher cluster management UI.
  7. Create a workload which creates a PVC in the Harvester cluster. To do so, go to the Pod tab and choose Create Persistent Volume Claim after pressing the Add Volume button. Make sure to use the Harvester storage class.
  8. Delete the cluster in the Rancher cluster management UI.
  9. If you are fast enough, you can see the harvesterhci.io/removeAllPersistentVolumeClaims annotation that was added to the VM in Harvester:
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  annotations:
    harvesterhci.io/removeAllPersistentVolumeClaims: 'true'  <-----------
    harvesterhci.io/vmRunStrategy: RerunOnFailure
    harvesterhci.io/volumeClaimTemplates: >-

(Screenshot from 2024-11-05 12-03-02)
10. There is a pod (in the Rancher context) which is running the node driver binary. The log output should look like this:

Downloading driver from https://rancher.192.168.0.141.sslip.io/assets/docker-machine-driver-harvester
Doing /etc/rancher/ssl
docker-machine-driver-harvester
docker-machine-driver-harvester: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, stripped
About to remove harv01-pool1-57832a3a-s8khx
WARNING: This action will delete both local reference and remote instance.
(harv01-pool1-57832a3a-s8khx) Remove node
(harv01-pool1-57832a3a-s8khx) Force the removal of all persistent volume claims
(harv01-pool1-57832a3a-s8khx) Waiting for node removed
Successfully removed harv01-pool1-57832a3a-s8khx
(harv01-pool1-57832a3a-s8khx) Closing plugin on server side
(temp-driver-loader) Closing plugin on server side
Stream closed EOF for fleet-default/harv01-pool1-57832a3a-s8khx-machine-provision-njm6b (machine)
11. The "Force the removal of all persistent volume claims" log line MUST be present.
12. Go to the Volumes page in Harvester. The volume that was created by the workload in Rancher MUST be removed (a programmatic way to check this is sketched below).
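
As an alternative to checking the Volumes page in the Harvester UI, the following minimal Go sketch lists the PVCs that are left over via client-go. It is only an illustration: the kubeconfig path and the namespace are assumptions and need to be adapted to your setup; after the guest cluster has been deleted, no workload PVCs should remain.

```go
package main

import (
	"context"
	"fmt"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Kubeconfig of the Harvester cluster (assumed path).
	config, err := clientcmd.BuildConfigFromFlags("", "/path/to/harvester.kubeconfig")
	if err != nil {
		log.Fatal(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		log.Fatal(err)
	}
	// List the PVCs remaining in the namespace the guest cluster VMs were
	// created in ("default" is an assumption).
	pvcs, err := clientset.CoreV1().PersistentVolumeClaims("default").List(
		context.TODO(), metav1.ListOptions{})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("remaining PVCs: %d\n", len(pvcs.Items))
	for _, pvc := range pvcs.Items {
		fmt.Println(pvc.Namespace + "/" + pvc.Name)
	}
}
```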

@w13915984028 (Member) left a comment

LGTM, thanks.

harvester/harvester.go: outdated review comment (resolved)

Labels: enhancement (New feature or request)