Tap interfaces removed shortly after VM deployment #2390

Open
scottyeager opened this issue Aug 6, 2024 · 2 comments

@scottyeager

We tried several deployments on mainnet node 7061 to test out the GPU. On most attempts, Zos deleted the tap interfaces for the VMs shortly after deployment, and it is not clear why. These are the high-level steps:

  1. Reserve the node as dedicated
  2. Deploy a VM with the GPU attached
  3. Try to connect to the VM; the connection fails

Checking the node logs, we see this (newest entries first):

[+] networkd: 2024-08-06T19:52:28Z info Removing tap interface tap-name=F7vrqJbpmDrY3
[+] networkd: 2024-08-06T19:52:28Z info Removing tap interface tap-name=B5PgtDck4vZAm
[+] storaged: 2024-08-06T19:52:28Z warn failed to delete qgroup error="stderr: ERROR: unable to destroy quota group: Device or resource busy\n: exit status 1" group-id=0/2494
[+] storaged: 2024-08-06T19:52:28Z info Deleting volume rootfs:18-601701-vmb6u1n
[+] storaged: 2024-08-06T19:52:28Z warn Could not find filesystem 18-601701-vmb6u1n
[+] storaged: 2024-08-06T19:52:28Z info Deleting volume 18-601701-vmb6u1n
[+] flistd: 2024-08-06T19:51:53Z info request to mount flist storage= url=https://hub.grid.tf/tf-official-vms/ubuntu-24.04-full.flist
[+] flistd: 2024-08-06T19:51:53Z info request to mount flist: {ReadOnly:false Limit:0 Storage: PersistedVolume:/mnt/2f61fd58-c758-4f38-87e3-f3b53fc018db/rootfs:18-601701-vmb6u1n} name=18-601701-vmb6u1n storage= url=https://hub.grid.tf/tf-official-vms/ubuntu-24.04-full.flist
[+] storaged: 2024-08-06T19:51:53Z info Creating new volume with size 107374182400
[+] storaged: 2024-08-06T19:51:53Z warn Could not find filesystem 18-601701-vmb6u1n
[+] storaged: 2024-08-06T19:51:53Z info Deleting volume 18-601701-vmb6u1n
[+] flistd: 2024-08-06T19:51:53Z info request to mount flist: {ReadOnly:true Limit:0 Storage: PersistedVolume:} name=cloud-container:c1f77d34c40c7879a220ba3d20b3535a storage= url=https://hub.grid.tf/tf-autobuilder/cloud-container-9dba60e.flist
[+] flistd: 2024-08-06T19:51:48Z info request to mount flist storage= url=https://hub.grid.tf/tf-official-vms/ubuntu-24.04-full.flist
[+] flistd: 2024-08-06T19:51:48Z info request to mount flist: {ReadOnly:true Limit:0 Storage: PersistedVolume:} name=18-601701-vmb6u1n storage= url=https://hub.grid.tf/tf-official-vms/ubuntu-24.04-full.flist
[+] networkd: 2024-08-06T19:51:48Z info Setting up mycelium tap interface tap-name=F7vrqJbpmDrY3
[+] networkd: 2024-08-06T19:51:48Z info Setting up yggdrasil tap interface tap-name=6i11EuQTj4TDo
[+] networkd: 2024-08-06T19:51:48Z info Setting up tap interface network-id=7aNtVkvidsRRW
[+] networkd: 2024-08-06T19:51:44Z info to remove Set{}
[+] networkd: 2024-08-06T19:51:44Z info to add Set{100.64.20.2/16}
[+] networkd: 2024-08-06T19:51:44Z info current Set{}
[+] networkd: 2024-08-06T19:51:44Z info configure wg device
[+] networkd: 2024-08-06T19:51:44Z info create mycelium bridge bridge=m-7aNtVkvidsRRW
[+] networkd: 2024-08-06T19:51:43Z info set address on macvlan interface addr=10.20.2.1/24
[+] networkd: 2024-08-06T19:51:43Z info Create namespace namespace=n-7aNtVkvidsRRW
[+] networkd: 2024-08-06T19:51:43Z info Create bridge bridge=b-7aNtVkvidsRRW
[+] networkd: 2024-08-06T19:51:43Z info create network resource namespace
[+] networkd: 2024-08-06T19:51:43Z info create network resource network=7aNtVkvidsRRW
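
To make the timing explicit, here is a rough Python sketch that pairs each "Setting up ... tap interface" line with the matching "Removing tap interface" line by tap-name and reports how long the interface existed. The regular expressions and the function name tap_lifetimes are assumptions based only on the line format visible in the excerpt above; none of this is part of Zos.

```python
import re
from datetime import datetime

# The tap-related lines copied from the log excerpt above (newest first).
LOG = """\
[+] networkd: 2024-08-06T19:52:28Z info Removing tap interface tap-name=F7vrqJbpmDrY3
[+] networkd: 2024-08-06T19:52:28Z info Removing tap interface tap-name=B5PgtDck4vZAm
[+] networkd: 2024-08-06T19:51:48Z info Setting up mycelium tap interface tap-name=F7vrqJbpmDrY3
[+] networkd: 2024-08-06T19:51:48Z info Setting up yggdrasil tap interface tap-name=6i11EuQTj4TDo
[+] networkd: 2024-08-06T19:51:48Z info Setting up tap interface network-id=7aNtVkvidsRRW
"""

# Assumed line shape: "[+] <daemon>: <timestamp> <level> <message>"
LINE = re.compile(r"^\[\+\] (?P<daemon>\w+): (?P<ts>\S+) (?P<level>\w+) (?P<msg>.*)$")
TAP_NAME = re.compile(r"tap-name=(\S+)")


def tap_lifetimes(text):
    """Return {tap_name: seconds between setup and removal} for paired taps."""
    created, removed = {}, {}
    for raw in text.splitlines():
        line = LINE.match(raw.strip())
        if line is None:
            continue
        tap = TAP_NAME.search(line["msg"])
        if tap is None:
            continue  # e.g. the setup line that only carries network-id
        ts = datetime.strptime(line["ts"], "%Y-%m-%dT%H:%M:%SZ")
        if line["msg"].startswith("Setting up"):
            created[tap.group(1)] = ts
        elif line["msg"].startswith("Removing tap interface"):
            removed[tap.group(1)] = ts
    # Only taps that have both a setup and a removal line can be measured.
    return {
        name: (removed[name] - created[name]).total_seconds()
        for name in created
        if name in removed
    }


print(tap_lifetimes(LOG))
```

On these lines it reports that F7vrqJbpmDrY3 existed for 40 seconds. B5PgtDck4vZAm has a removal line but no setup line that carries its tap-name, and 6i11EuQTj4TDo has a setup line but no removal line, so neither can be paired from this excerpt alone.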

I don't understand why the tap interfaces are being removed.

@Mik-TF

Mik-TF commented Aug 7, 2024

To add info:

  • Connection over WireGuard didn't work (connection timed out)
  • Mycelium ping failed: destination unreachable: No route

@ramezsaeed ramezsaeed added this to the 3.12 milestone Aug 7, 2024
@rawdaGastan rawdaGastan added the type_bug Something isn't working label Sep 5, 2024
@ashraffouda ashraffouda modified the milestones: 3.12, 3.13 Sep 18, 2024
@iwanbk
Member

iwanbk commented Sep 19, 2024

> I don't understand why the tap interfaces are being removed.

It could happen in two scenarios:

  1. When VM provisioning fails
  2. When the VM is deprovisioned

I guess it is case 1, because you didn't deprovision the VM.

Do you have any logs other than the above, @scottyeager?

The timestamps of your logs are also a bit weird.
At the top:

[+] networkd: 2024-08-06T19:52:28Z info Removing tap interface tap-name=F7vrqJbpmDrY3
[+] networkd: 2024-08-06T19:52:28Z info Removing tap interface tap-name=B5PgtDck4vZAm

Further down, the timestamps are earlier than those at the top:

[+] networkd: 2024-08-06T19:51:48Z info Setting up mycelium tap interface tap-name=F7vrqJbpmDrY3
[+] networkd: 2024-08-06T19:51:48Z info Setting up yggdrasil tap interface tap-name=6i11EuQTj4TDo
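
One possible explanation is that the log was simply captured newest first. A small sketch, assuming every line carries a UTC timestamp in the format shown above, that puts such lines back into chronological order (the function name chronological is made up for illustration):

```python
import re

# Matches timestamps such as 2024-08-06T19:52:28Z.
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")


def chronological(lines):
    """Return the timestamped lines ordered oldest first.

    The input is assumed to be newest first, so it is reversed before a
    stable sort; lines that share a timestamp then keep their emission order.
    Timestamps in this fixed UTC format compare correctly as plain strings.
    """
    stamped = [line for line in lines if TIMESTAMP.search(line)]
    return sorted(reversed(stamped), key=lambda line: TIMESTAMP.search(line).group(0))


lines = [
    "[+] networkd: 2024-08-06T19:52:28Z info Removing tap interface tap-name=F7vrqJbpmDrY3",
    "[+] networkd: 2024-08-06T19:52:28Z info Removing tap interface tap-name=B5PgtDck4vZAm",
    "[+] networkd: 2024-08-06T19:51:48Z info Setting up mycelium tap interface tap-name=F7vrqJbpmDrY3",
    "[+] networkd: 2024-08-06T19:51:48Z info Setting up yggdrasil tap interface tap-name=6i11EuQTj4TDo",
]

ordered = chronological(lines)
for line in ordered:
    print(line)
```

Read in this order, the tap interfaces are set up at 19:51:48 and the removals follow at 19:52:28.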
