Delete all PVCs if guest cluster is deleted in Rancher #64

Open · wants to merge 1 commit into master

Conversation

@votdev (Member) commented Nov 5, 2024

If a guest cluster is deleted in Rancher, the PVCs of its workloads are not deleted in Harvester. This is because the node driver does not know why the associated VM has to be removed, e.g. whether its parameters have changed or the whole cluster is being deleted.

To solve the problem, a finalizer on the Machine resource in Rancher adds an annotation to the VM, which is then evaluated by the node driver when it runs the Remove() handler.

See rancher/rancher#47870 for the Rancher part.

Related to: harvester/harvester#2825
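
For illustration, here is a minimal, hedged sketch (not the actual driver code from this PR) of how the Remove() handler could evaluate such an annotation before force-deleting the PVCs. The annotation key and the log message are taken from this PR; the helper function and its wiring are assumptions.

```go
package main

import "fmt"

// Annotation key set by the Rancher finalizer when the whole guest cluster is
// being deleted (see the VirtualMachine example further down).
const removeAllPVCsAnnotation = "harvesterhci.io/removeAllPersistentVolumeClaims"

// shouldRemoveAllPVCs reports whether the VM's annotations request that all
// persistent volume claims are removed together with the VM. This helper is
// only an illustration, not the driver's actual implementation.
func shouldRemoveAllPVCs(annotations map[string]string) bool {
	return annotations[removeAllPVCsAnnotation] == "true"
}

func main() {
	// Example annotations as they would appear on the VM after the Rancher
	// finalizer has run.
	vmAnnotations := map[string]string{
		removeAllPVCsAnnotation: "true",
	}
	if shouldRemoveAllPVCs(vmAnnotations) {
		fmt.Println("Force the removal of all persistent volume claims")
	}
}
```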

When the cluster is deleted in Rancher, a pod is started which will run the docker-machine-driver-harvester binary. The output will look like this:

Downloading driver from https://rancher.192.168.0.141.sslip.io/assets/docker-machine-driver-harvester
Doing /etc/rancher/ssl
docker-machine-driver-harvester
docker-machine-driver-harvester: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, stripped
About to remove harv01-pool1-57832a3a-s8khx
WARNING: This action will delete both local reference and remote instance.
(harv01-pool1-57832a3a-s8khx) Remove node
(harv01-pool1-57832a3a-s8khx) Force the removal of all persistent volume claims
(harv01-pool1-57832a3a-s8khx) Waiting for node removed
Successfully removed harv01-pool1-57832a3a-s8khx
(harv01-pool1-57832a3a-s8khx) Closing plugin on server side
(temp-driver-loader) Closing plugin on server side
Stream closed EOF for fleet-default/harv01-pool1-57832a3a-s8khx-machine-provision-njm6b (machine)

(Screenshot from 2024-11-05 11-41-34)

Testing

Testing can be done via an ipxe test cluster. Note that this test requires a Rancher setup containing the Rancher-side PR (rancher/rancher#47870).

  1. Go to the code directory and build the project by running make build && make package
  2. Run an HTTP server to serve the binary:
$ cd ./dist/artifacts
$ python3 -m http.server 8080
  3. Get the SHA256 checksum of the compressed binary:
$ sha256sum ./dist/artifacts/docker-machine-driver-harvester-amd64.tar.gz
  4. Patch the node driver settings in Rancher. SSH into the Rancher host via ssh [email protected] and run:
# kubectl patch nodedrivers/harvester --type=merge --patch '{"spec":{"builtin":false, "url":"http://<YOUR_HOST_IP>:8080/docker-machine-driver-harvester-amd64.tar.gz","whitelistDomains":["releases.rancher.com","<YOUR_HOST_IP>"],"checksum":"<THE_TAR_GZ_CHECKSUM>"}}'
  5. Your HTTP server should output something like this:
Serving HTTP on 0.0.0.0 port 8080 (http://0.0.0.0:8080/) ...
192.168.xxx.xxx - - [05/Nov/2024 11:35:42] "GET /docker-machine-driver-harvester-amd64.tar.gz HTTP/1.1" 200 -
192.168.xxx.xxx - - [05/Nov/2024 11:35:42] "GET /docker-machine-driver-harvester-amd64.tar.gz HTTP/1.1" 200 -
192.168.xxx.xxx - - [05/Nov/2024 11:35:42] "GET /docker-machine-driver-harvester-amd64.tar.gz HTTP/1.1" 200 -
  6. Create a Harvester downstream cluster in the Rancher cluster management UI.
  7. Create a workload which creates a PVC in the Harvester cluster. To do so, go to the Pod tab and choose Create Persistent Volume Claim after pressing the Add Volume button. Make sure to use the Harvester storage class.
  8. Delete the cluster in the Rancher cluster management UI.
  9. If you are fast enough, you can see the harvesterhci.io/removeAllPersistentVolumeClaims annotation that was added to the VM in Harvester:
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  annotations:
    harvesterhci.io/removeAllPersistentVolumeClaims: 'true'  <-----------
    harvesterhci.io/vmRunStrategy: RerunOnFailure
    harvesterhci.io/volumeClaimTemplates: >-

(Screenshot from 2024-11-05 12-03-02)
10. There is a pod (in the Rancher context) which is running the node driver binary. The log output should look like this:

Downloading driver from https://rancher.192.168.0.141.sslip.io/assets/docker-machine-driver-harvester
Doing /etc/rancher/ssl
docker-machine-driver-harvester
docker-machine-driver-harvester: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, stripped
About to remove harv01-pool1-57832a3a-s8khx
WARNING: This action will delete both local reference and remote instance.
(harv01-pool1-57832a3a-s8khx) Remove node
(harv01-pool1-57832a3a-s8khx) Force the removal of all persistent volume claims
(harv01-pool1-57832a3a-s8khx) Waiting for node removed
Successfully removed harv01-pool1-57832a3a-s8khx
(harv01-pool1-57832a3a-s8khx) Closing plugin on server side
(temp-driver-loader) Closing plugin on server side
Stream closed EOF for fleet-default/harv01-pool1-57832a3a-s8khx-machine-provision-njm6b (machine)
11. The "Force the removal of all persistent volume claims" log line MUST be present.
12. Go to the Volumes page in Harvester. The volume that was created by the workload in Rancher MUST be removed (a programmatic way to check this is sketched below).
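
As an alternative to checking the Volumes page in the Harvester UI, the following minimal Go sketch lists the PVCs that are left over via client-go. It is only an illustration: the kubeconfig path and the namespace are assumptions and need to be adapted to your setup; after the guest cluster has been deleted, no workload PVCs should remain.

```go
package main

import (
	"context"
	"fmt"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Kubeconfig of the Harvester cluster (assumed path).
	config, err := clientcmd.BuildConfigFromFlags("", "/path/to/harvester.kubeconfig")
	if err != nil {
		log.Fatal(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		log.Fatal(err)
	}
	// List the PVCs remaining in the namespace the guest cluster VMs were
	// created in ("default" is an assumption).
	pvcs, err := clientset.CoreV1().PersistentVolumeClaims("default").List(
		context.TODO(), metav1.ListOptions{})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("remaining PVCs: %d\n", len(pvcs.Items))
	for _, pvc := range pvcs.Items {
		fmt.Println(pvc.Namespace + "/" + pvc.Name)
	}
}
```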

@w13915984028 (Member) left a comment

LGTM, thanks.

harvester/harvester.go: outdated review comment (resolved)

Labels: enhancement (New feature or request)