
calico vpp failed to install on k8s cluster with dpdk as vpp_uplink_driver #465

Open
dylan2intel opened this issue Dec 2, 2022 · 19 comments


@dylan2intel

Environment

  • Calico/VPP version: v3.24.0
  • Kubernetes version: v1.24.4
  • Deployment type: bare-metal...
  • Network configuration: Calico without VXLAN
  • OS: Ubuntu 22.04

Issue description
Following the official getting-started guide to install Calico/VPP fails to bring up an on-premise k8s cluster.

How is this supposed to work when the k8s node IP's interface is bound to the DPDK driver by Calico/VPP while the coredns pods are still initializing and not ready? coredns reaches the upstream DNS server over UDP through the kernel driver.
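
For reference, a hedged way to check what coredns will actually use as its upstream (assuming the default Corefile, which forwards to the node's resolv.conf; on Ubuntu the real upstream list kept by systemd-resolved lives under /run/systemd/resolve):

# which upstreams coredns forwards to (default Corefile forwards to /etc/resolv.conf)
kubectl -n kube-system get configmap coredns -o yaml | grep -A2 forward
# the actual upstream DNS servers on the node (Ubuntu / systemd-resolved)
cat /run/systemd/resolve/resolv.conf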

To Reproduce
Steps to reproduce the behavior:

Expected behavior
The k8s cluster installs successfully with Calico/VPP as the CNI.

Additional context

[screenshots attached in the original issue]

@dylan2intel
Author

New findings on the root cause of the coredns timeout:

[screenshot attached]

@dylan2intel
Author

Hey guys,

Any workaround suggestions or fixes for this issue?

@onong
Collaborator

onong commented Dec 5, 2022

Hi @dylan2intel, could you try it in ubuntu 20.04?

@dylan2intel
Author

Currently I can work around this issue by configuring a static IP as shown below. However, if DHCP is enabled, it always fails. Why?

My internal network is managed by a physical switch with IP/MAC address mapping.

cat /etc/netplan/00-installer-config.yaml

network:
  ethernets:
    ens259f0:
      addresses:
      - 172.17.29.19/23
      routes:
        - to: default
          via: 172.17.28.1
      nameservers:
        addresses:
        - 10.248.2.5
        - 172.17.6.9
        - 10.239.27.228
        search:
        - cd.intel.com
    ens259f1:
      dhcp4: true
    ens817f0:
      #dhcp4: true ############## Here if dhcp enabled, it will always fail #############
      addresses:
      - 192.168.126.4/24
    ens817f1:
      dhcp4: true
    enx2a24b96f8630:
      dhcp4: true
  version: 2

I think this workaround is a significant limitation; the root cause still needs to be addressed.
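
For completeness, applying and verifying the static configuration above is just the usual netplan/iproute2 flow, for example:

sudo netplan apply           # apply the static addresses above
ip addr show ens817f0        # confirm 192.168.126.4/24 is on the uplink
ip route show dev ens817f0   # confirm the connected route for 192.168.126.0/24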

@dylan2intel
Author

Another new finding: once I deploy successfully using the above workaround and then deploy a pod with a Service exposed via an external IP, the whole Calico/VPP k8s cluster fails again as soon as the external IP is set to the node's IP address (from kubectl get node -o wide). However, if I specify a different IP, or expose the pod's Service without an external IP, it works fine.
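
For illustration, a minimal sketch of the kind of Service described above (the name and selector are hypothetical; the externalIPs entry is the node's INTERNAL-IP from kubectl get node -o wide):

apiVersion: v1
kind: Service
metadata:
  name: example-svc          # hypothetical name
spec:
  selector:
    app: example             # hypothetical pod label
  ports:
  - port: 80
    targetPort: 8080
  externalIPs:
  - 192.168.126.4            # the node IP; setting this value triggers the failure described above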

@onong
Collaborator

onong commented Dec 6, 2022

ens817f0:
#dhcp4: true ############## Here if dhcp enabled, it will always fail #############
addresses:
- 192.168.126.4/24

Enabling DHCP by itself should not cause the deploy failure. For example, in cloud environments DHCP is enabled and instances/nodes obtain the DNS/IP addr etc using DHCP. However, point to note is that the IP address remains the same for the lifetime of the instance/node.

How is DHCP configured in your case? Are the IP addresses handed out by the DHCP server to the BM node/system in question dynamic or static? What's the lease period? What is the network management backend, and how does it handle DHCP leases? Could you check the logs for anything peculiar w.r.t. DHCP?
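
For example, on Ubuntu with netplan/systemd-networkd the lease details can usually be inspected with something like:

networkctl status ens817f0                                  # current address, gateway, DNS, lease info
journalctl -u systemd-networkd --no-pager | grep -i dhcp    # DHCP offers and renewals over time
ls /run/systemd/netif/leases/                               # raw lease files kept by systemd-networkd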

@onong
Collaborator

onong commented Dec 6, 2022

Another new finding: once I deploy successfully using the above workaround and then deploy a pod with a Service exposed via an external IP, the whole Calico/VPP k8s cluster fails again as soon as the external IP is set to the node's IP address (from kubectl get node -o wide). However, if I specify a different IP, or expose the pod's Service without an external IP, it works fine.

Could you share how you are creating the service and could you share the logs when this failure scenario happens?

kubectl -n calico-vpp-dataplane logs calico-vpp-XXX -c vpp
kubectl -n calico-vpp-dataplane logs calico-vpp-XXX -c agent
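
(calico-vpp-XXX above is the per-node pod name; it can be listed with, for example:)

kubectl -n calico-vpp-dataplane get pods -o wide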

@dylan2intel
Author

Here you can see that the coredns and calico-kube-controllers pods never become ready even though they are running.

kubectl get nodes -A -o wide
NAME                        STATUS   ROLES           AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE           KERNEL-VERSION      CONTAINER-RUNTIME
r04u06-yuchuan-nex-spr-cd   Ready    control-plane   12m   v1.24.4   192.168.126.4   <none>        Ubuntu 22.04 LTS   5.15.0-50-generic   containerd://1.5.9-0ubuntu3
Every 2.0s: kubectl get pods -A -o wide                                                                                                                              r04u06-yuchuan-nex-spr-cd: Tue Dec  6 14:18:12 2022

NAMESPACE              NAME                                                READY   STATUS    RESTARTS       AGE     IP               NODE                        NOMINATED NODE   READINESS GATES
calico-system          calico-kube-controllers-7cd8b89887-z5jq2            0/1     Running   4 (55s ago)    4m36s   10.244.174.201   r04u06-yuchuan-nex-spr-cd   <none>           <none>
calico-system          calico-node-bww68                                   1/1     Running   0              4m36s   192.168.126.4    r04u06-yuchuan-nex-spr-cd   <none>           <none>
calico-system          calico-typha-85b7ff95f8-f8qmb                       1/1     Running   0              4m36s   192.168.126.4    r04u06-yuchuan-nex-spr-cd   <none>           <none>
calico-system          csi-node-driver-6n4pb                               2/2     Running   0              4m36s   10.244.174.202   r04u06-yuchuan-nex-spr-cd   <none>           <none>
calico-vpp-dataplane   calico-vpp-node-24tt8                               2/2     Running   0              4m20s   192.168.126.4    r04u06-yuchuan-nex-spr-cd   <none>           <none>
kube-system            coredns-6d4b75cb6d-vjpn4                            0/1     Running   2 (30s ago)    6m36s   10.244.174.203   r04u06-yuchuan-nex-spr-cd   <none>           <none>
kube-system            coredns-6d4b75cb6d-z7p7v                            0/1     Running   2 (17s ago)    6m36s   10.244.174.204   r04u06-yuchuan-nex-spr-cd   <none>           <none>
kube-system            etcd-r04u06-yuchuan-nex-spr-cd                      1/1     Running   22             6m48s   192.168.126.4    r04u06-yuchuan-nex-spr-cd   <none>           <none>
kube-system            kube-apiserver-r04u06-yuchuan-nex-spr-cd            1/1     Running   2              6m50s   192.168.126.4    r04u06-yuchuan-nex-spr-cd   <none>           <none>
kube-system            kube-controller-manager-r04u06-yuchuan-nex-spr-cd   1/1     Running   3              6m48s   192.168.126.4    r04u06-yuchuan-nex-spr-cd   <none>           <none>
kube-system            kube-proxy-vkzz6                                    1/1     Running   0              6m36s   192.168.126.4    r04u06-yuchuan-nex-spr-cd   <none>           <none>
kube-system            kube-scheduler-r04u06-yuchuan-nex-spr-cd            1/1     Running   4              6m48s   192.168.126.4    r04u06-yuchuan-nex-spr-cd   <none>           <none>
tigera-operator        tigera-operator-6c49dc8ddf-n95pf                    1/1     Running   1 (4m9s ago)   4m53s   192.168.126.4    r04u06-yuchuan-nex-spr-cd   <none>           <none>
kubectl logs -n calico-vpp-dataplane calico-vpp-node-24tt8 -c vpp
time="2022-12-06T14:13:52Z" level=info msg="-- Environment --"
time="2022-12-06T14:13:52Z" level=info msg="CorePattern:         /var/lib/vpp/vppcore.%e.%p"
time="2022-12-06T14:13:52Z" level=info msg="ExtraAddrCount:      0"
time="2022-12-06T14:13:52Z" level=info msg="RxMode:              adaptive"
time="2022-12-06T14:13:52Z" level=info msg="TapRxMode:           adaptive"
time="2022-12-06T14:13:52Z" level=info msg="Tap MTU override:    0"
time="2022-12-06T14:13:52Z" level=info msg="Service CIDRs:       [10.96.0.0/12]"
time="2022-12-06T14:13:52Z" level=info msg="Tap Queue Size:      rx:1024 tx:1024"
time="2022-12-06T14:13:52Z" level=info msg="PHY Queue Size:      rx:1024 tx:1024"
time="2022-12-06T14:13:52Z" level=info msg="Hugepages            16"
time="2022-12-06T14:13:52Z" level=info msg="KernelVersion        5.15.0-50"
time="2022-12-06T14:13:52Z" level=info msg="Drivers              map[uio_pci_generic:false vfio-pci:true]"
time="2022-12-06T14:13:52Z" level=info msg="vfio iommu:          false"
time="2022-12-06T14:13:52Z" level=info msg="-- Interface Spec --"
time="2022-12-06T14:13:52Z" level=info msg="Interface Name:      ens817f0"
time="2022-12-06T14:13:52Z" level=info msg="Native Driver:       dpdk"
time="2022-12-06T14:13:52Z" level=info msg="vppIpConfSource:     linux"
time="2022-12-06T14:13:52Z" level=info msg="New Drive Name:      "
time="2022-12-06T14:13:52Z" level=info msg="PHY target #Queues   rx:1 tx:1"
time="2022-12-06T14:13:52Z" level=info msg="-- Interface config --"
time="2022-12-06T14:13:52Z" level=info msg="Node IP4:            192.168.126.4/24"
time="2022-12-06T14:13:52Z" level=info msg="Node IP6:            "
time="2022-12-06T14:13:52Z" level=info msg="PciId:               0000:98:00.0"
time="2022-12-06T14:13:52Z" level=info msg="Driver:              ice"
time="2022-12-06T14:13:52Z" level=info msg="Linux IF was up ?    true"
time="2022-12-06T14:13:52Z" level=info msg="Promisc was on ?     false"
time="2022-12-06T14:13:52Z" level=info msg="DoSwapDriver:        false"
time="2022-12-06T14:13:52Z" level=info msg="Mac:                 40:a6:b7:97:34:b0"
time="2022-12-06T14:13:52Z" level=info msg="Addresses:           [192.168.126.4/24 ens817f0,fe80::42a6:b7ff:fe97:34b0/64]"
time="2022-12-06T14:13:52Z" level=info msg="Routes:              [{Ifindex: 90 Dst: fe80::/64 Src: <nil> Gw: <nil> Flags: [] Table: 254}, {Ifindex: 90 Dst: 192.168.126.0/24 Src: 192.168.126.4 Gw: <nil> Flags: [] Table: 254}]"
time="2022-12-06T14:13:52Z" level=info msg="PHY original #Queues rx:224 tx:224"
time="2022-12-06T14:13:52Z" level=info msg="MTU                  1500"
time="2022-12-06T14:13:52Z" level=info msg="isTunTap             false"
time="2022-12-06T14:13:52Z" level=info msg="isVeth               false"
time="2022-12-06T14:13:52Z" level=info msg="Found previous coredump /var/lib/vpp/vppcore.vpp_wk_0.1304166, missing gdb for stacktrace"
time="2022-12-06T14:13:52Z" level=info msg="Running with uplink dpdk"
time="2022-12-06T14:13:52Z" level=info msg="deleting Route {Ifindex: 90 Dst: fe80::/64 Src: <nil> Gw: <nil> Flags: [] Table: 254}"
time="2022-12-06T14:13:52Z" level=info msg="deleting Route {Ifindex: 90 Dst: 192.168.126.0/24 Src: 192.168.126.4 Gw: <nil> Flags: [] Table: 254}"
time="2022-12-06T14:13:53Z" level=info msg="VPP started [PID 746218]"
time="2022-12-06T14:13:53Z" level=info msg="Waiting for VPP... [0/10]"
/usr/bin/vpp[746218]: perfmon: skipping source 'intel-uncore' - intel_uncore_init: no uncore units found
/usr/bin/vpp[746218]: tls_init_ca_chain:609: Could not initialize TLS CA certificates
/usr/bin/vpp[746218]: tls_mbedtls_init:644: failed to initialize TLS CA chain
/usr/bin/vpp[746218]: tls_init_ca_chain:1005: Could not initialize TLS CA certificates
/usr/bin/vpp[746218]: tls_openssl_init:1079: failed to initialize TLS CA chain
time="2022-12-06T14:13:55Z" level=info msg="Waiting for VPP... [1/10]"
time="2022-12-06T14:13:57Z" level=info msg="Waiting for VPP... [2/10]"
time="2022-12-06T14:13:59Z" level=info msg="Waiting for VPP... [3/10]"
time="2022-12-06T14:14:01Z" level=info msg="Adding address 192.168.126.4/24 ens817f0 to data interface"
time="2022-12-06T14:14:01Z" level=info msg="Adding address fe80::42a6:b7ff:fe97:34b0/64 to data interface"
time="2022-12-06T14:14:01Z" level=error msg="Error adding address to data interface: Adding IP address failed: req &{SwIfIndex:1 IsAdd:true DelAll:false Prefix:fe80::42a6:b7ff:fe97:34b0/64} reply &{Retval:-59}: VPPApiError: Address length mismatch (-59)"
time="2022-12-06T14:14:01Z" level=info msg="Creating Linux side interface"
time="2022-12-06T14:14:01Z" level=info msg="Adding address 192.168.126.4/24 ens817f0 to tap interface"
time="2022-12-06T14:14:01Z" level=info msg="Adding address fe80::42a6:b7ff:fe97:34b0/64 to tap interface"
time="2022-12-06T14:14:01Z" level=info msg="Adding ND proxy for address fe80::42a6:b7ff:fe97:34b0"
time="2022-12-06T14:14:01Z" level=info msg="Adding address 192.168.126.4/24 ens817f0 to tap interface"
time="2022-12-06T14:14:01Z" level=info msg="Adding address fe80::42a6:b7ff:fe97:34b0/64 to tap interface"
time="2022-12-06T14:14:01Z" level=warning msg="add addr fe80::42a6:b7ff:fe97:34b0/64 via vpp EEXIST, file exists"
time="2022-12-06T14:14:01Z" level=info msg="Adding route {Ifindex: 91 Dst: fe80::/64 Src: <nil> Gw: <nil> Flags: [] Table: 254} via VPP"
time="2022-12-06T14:14:01Z" level=info msg="add route via vpp : {Ifindex: 91 Dst: fe80::/64 Src: <nil> Gw: <nil> Flags: [] Table: 254} already exists"
time="2022-12-06T14:14:01Z" level=info msg="Adding route {Ifindex: 91 Dst: 192.168.126.0/24 Src: 192.168.126.4 Gw: <nil> Flags: [] Table: 254} via VPP"
time="2022-12-06T14:14:01Z" level=info msg="add route via vpp : {Ifindex: 91 Dst: 192.168.126.0/24 Src: 192.168.126.4 Gw: <nil> Flags: [] Table: 254} already exists"
time="2022-12-06T14:14:01Z" level=info msg="Using 192.168.126.254 as next hop for cluster IPv4 routes"
time="2022-12-06T14:14:01Z" level=info msg="Setting BGP nodeIP 192.168.126.4/24"
time="2022-12-06T14:14:01Z" level=info msg="Updating node, version = 965, metaversion = 965"
Using systemctl
Using systemd-networkd
time="2022-12-06T14:14:01Z" level=info msg="Received signal child exited, vpp index 1"
time="2022-12-06T14:14:01Z" level=info msg="Ignoring SIGCHLD for pid 0"
time="2022-12-06T14:14:01Z" level=info msg="Done with signal child exited"
time="2022-12-06T14:14:31Z" level=info msg="Received signal broken pipe, vpp index 1"
time="2022-12-06T14:14:31Z" level=info msg="Done with signal broken pipe"
kubectl logs -n calico-vpp-dataplane calico-vpp-node-24tt8 -c agent
time="2022-12-06T14:13:52Z" level=info msg="Config:TapNumRxQueues    1"
time="2022-12-06T14:13:52Z" level=info msg="Config:MultinetEnabled   false"
time="2022-12-06T14:13:52Z" level=info msg="Config:MemifEnabled      false"
time="2022-12-06T14:13:52Z" level=info msg="Config:VCLEnabled        false"
time="2022-12-06T14:13:52Z" level=info msg="Config:PodGSOEnabled     true"
time="2022-12-06T14:13:52Z" level=info msg="Config:EnableServices    true"
time="2022-12-06T14:13:52Z" level=info msg="Config:EnableIPSec       false"
time="2022-12-06T14:13:52Z" level=info msg="Config:CrossIpsecTunnels false"
time="2022-12-06T14:13:52Z" level=info msg="Config:EnablePolicies    true"
time="2022-12-06T14:13:52Z" level=info msg="Config:IpsecAddressCount 1"
time="2022-12-06T14:13:52Z" level=info msg="Config:RxMode            3"
time="2022-12-06T14:13:52Z" level=info msg="Config:LogLevel          4"
time="2022-12-06T14:13:52Z" level=info msg="Config:IpsecNbAsyncCryptoThread  0"
time="2022-12-06T14:13:52Z" level=info msg="Config:EnableSRv6        false"
time="2022-12-06T14:13:52Z" level=info msg="Waiting for VPP... [0/10]" component=vpp-api
time="2022-12-06T14:13:54Z" level=info msg="Waiting for VPP... [1/10]" component=vpp-api
time="2022-12-06T14:13:56Z" level=info msg="Waiting for VPP... [2/10]" component=vpp-api
time="2022-12-06T14:13:58Z" level=info msg="Waiting for VPP... [3/10]" component=vpp-api
time="2022-12-06T14:14:00Z" level=info msg="Connected to VPP version 22.06-rc0~254-g608dd6bd6" component=vpp-api
time="2022-12-06T14:14:01Z" level=info msg="SRv6Provider NewSRv6Provider" subcomponent=connectivity
time="2022-12-06T14:14:01Z" level=info msg="Waiting for our node's BGP spec..."
time="2022-12-06T14:14:01Z" level=info msg="Syncing nodes..." subcomponent=node-watcher
time="2022-12-06T14:14:01Z" level=info msg="Waiting for ipam..."
time="2022-12-06T14:14:01Z" level=info msg="Reconciliating pools..." subcomponent=ipam-cache
time="2022-12-06T14:14:01Z" level=info msg="Main client watcher loop"
time="2022-12-06T14:14:01Z" level=info msg="Adding pool: 10.244.0.0/16, nat:true" subcomponent=ipam-cache
time="2022-12-06T14:14:01Z" level=info msg="Waiting for felix config being present..."
time="2022-12-06T14:14:01Z" level=info msg="Starting policy server" component=policy
time="2022-12-06T14:14:01Z" level=info msg="Creating policy to allow traffic from host with ingress policies" component=policy
time="2022-12-06T14:14:01Z" level=info msg="policy(add) VPP rule=action=allow src==[192.168.126.4/32] id=0"
time="2022-12-06T14:14:01Z" level=info msg="Created ipset 0"
time="2022-12-06T14:14:01Z" level=info msg="Main client watcher loop"
time="2022-12-06T14:14:01Z" level=info msg="policy(add) VPP rule=action=deny sipset==[0] id=1"
time="2022-12-06T14:14:01Z" level=info msg="policy(add) VPP rule=action=allow id=2"
time="2022-12-06T14:14:01Z" level=info msg="Creating policy to allow traffic to host that is applied on uplink" component=policy
time="2022-12-06T14:14:01Z" level=info msg="policy(add) VPP rule=action=allow src==[192.168.126.4/32] id=3"
time="2022-12-06T14:14:01Z" level=info msg="policy(add) VPP rule=action=allow dst==[192.168.126.4/32] id=4"
time="2022-12-06T14:14:01Z" level=info msg="policy(add) VPP rule=action=allowfilters=[proto==6] dport==[22,179,2379,2380,5473,6443,6666,6667] id=5"
time="2022-12-06T14:14:01Z" level=info msg="policy(add) VPP rule=action=allowfilters=[proto==17] dport==[68] id=6"
time="2022-12-06T14:14:01Z" level=info msg="policy(add) VPP rule=action=allowfilters=[proto==6] dport==[179,2379,2380,5473,6443,6666,6667] id=7"
time="2022-12-06T14:14:01Z" level=info msg="policy(add) VPP rule=action=allowfilters=[proto==17] dport==[53,67] id=8"
time="2022-12-06T14:14:02Z" level=info msg="Accepted connection from felix" component=policy
time="2022-12-06T14:14:02Z" level=info msg="Got config from felix: config:<key:\"CalicoVersion\" value:\"v3.24.5\" > config:<key:\"ClusterGUID\" value:\"ae1120ee03fb4134bd94c768e57919cb\" > config:<key:\"ClusterType\" value:\"typha,kdd,k8s,operator,bgp,vpp,kubeadm\" > config:<key:\"DataplaneDriver\" value:\"/usr/local/bin/felix-plugins/felix-api-proxy\" > config:<key:\"DatastoreType\" value:\"kubernetes\" > config:<key:\"DefaultEndpointToHostAction\" value:\"ACCEPT\" > config:<key:\"FelixHostname\" value:\"r04u06-yuchuan-nex-spr-cd\" > config:<key:\"FloatingIPs\" value:\"Disabled\" > config:<key:\"HealthEnabled\" value:\"true\" > config:<key:\"HealthPort\" value:\"9099\" > config:<key:\"IpInIpTunnelAddr\" value:\"10.244.174.192\" > config:<key:\"Ipv6Support\" value:\"false\" > config:<key:\"LogFilePath\" value:\"None\" > config:<key:\"LogSeverityFile\" value:\"None\" > config:<key:\"LogSeverityScreen\" value:\"Info\" > config:<key:\"LogSeveritySys\" value:\"None\" > config:<key:\"MetadataAddr\" value:\"None\" > config:<key:\"ReportingIntervalSecs\" value:\"0\" > config:<key:\"TyphaCAFile\" value:\"/etc/pki/tls/certs/tigera-ca-bundle.crt\" > config:<key:\"TyphaCN\" value:\"typha-server\" > config:<key:\"TyphaCertFile\" value:\"/node-certs/tls.crt\" > config:<key:\"TyphaK8sNamespace\" value:\"calico-system\" > config:<key:\"TyphaK8sServiceName\" value:\"calico-typha\" > config:<key:\"TyphaKeyFile\" value:\"/node-certs/tls.key\" > config:<key:\"UseInternalDataplaneDriver\" value:\"false\" > config:<key:\"XDPEnabled\" value:\"false\" > " component=policy
time="2022-12-06T14:14:02Z" level=info msg="Merging in config from internal override: map[CalicoVersion:v3.24.5 ClusterGUID:ae1120ee03fb4134bd94c768e57919cb ClusterType:typha,kdd,k8s,operator,bgp,vpp,kubeadm DatastoreType:kubernetes DefaultEndpointToHostAction:ACCEPT FelixHostname:r04u06-yuchuan-nex-spr-cd FloatingIPs:Disabled HealthEnabled:true HealthPort:9099 IpInIpTunnelAddr:10.244.174.192 Ipv6Support:false LogSeverityFile:None LogSeverityScreen:Info LogSeveritySys:None MetadataAddr:None ReportingIntervalSecs:0 TyphaCN:typha-server TyphaK8sNamespace:calico-system TyphaK8sServiceName:calico-typha UseInternalDataplaneDriver:false XDPEnabled:false]"
time="2022-12-06T14:14:02Z" level=info msg="Parsing value for TyphaK8sNamespace: calico-system (from internal override)"
time="2022-12-06T14:14:02Z" level=info msg="Parsed value for TyphaK8sNamespace: calico-system (from internal override)"
time="2022-12-06T14:14:02Z" level=info msg="Parsing value for CalicoVersion: v3.24.5 (from internal override)"
time="2022-12-06T14:14:02Z" level=info msg="Parsed value for CalicoVersion: v3.24.5 (from internal override)"
time="2022-12-06T14:14:02Z" level=info msg="Parsing value for ClusterGUID: ae1120ee03fb4134bd94c768e57919cb (from internal override)"
time="2022-12-06T14:14:02Z" level=info msg="Parsed value for ClusterGUID: ae1120ee03fb4134bd94c768e57919cb (from internal override)"
time="2022-12-06T14:14:02Z" level=info msg="Parsing value for ClusterType: typha,kdd,k8s,operator,bgp,vpp,kubeadm (from internal override)"
time="2022-12-06T14:14:02Z" level=info msg="Parsed value for ClusterType: typha,kdd,k8s,operator,bgp,vpp,kubeadm (from internal override)"
time="2022-12-06T14:14:02Z" level=info msg="Parsing value for Ipv6Support: false (from internal override)"
time="2022-12-06T14:14:02Z" level=info msg="Parsed value for Ipv6Support: false (from internal override)"
time="2022-12-06T14:14:02Z" level=info msg="Parsing value for DatastoreType: kubernetes (from internal override)"
time="2022-12-06T14:14:02Z" level=info msg="Parsed value for DatastoreType: kubernetes (from internal override)"
time="2022-12-06T14:14:02Z" level=info msg="Parsing value for UseInternalDataplaneDriver: false (from internal override)"
time="2022-12-06T14:14:02Z" level=info msg="Parsed value for UseInternalDataplaneDriver: false (from internal override)"
time="2022-12-06T14:14:02Z" level=info msg="Parsing value for HealthPort: 9099 (from internal override)"
time="2022-12-06T14:14:02Z" level=info msg="Parsed value for HealthPort: 9099 (from internal override)"
time="2022-12-06T14:14:02Z" level=info msg="Parsing value for LogSeverityFile: None (from internal override)"
time="2022-12-06T14:14:02Z" level=info msg="Value set to 'none', replacing with zero-value: \"\"."
time="2022-12-06T14:14:02Z" level=info msg="Parsed value for LogSeverityFile:  (from internal override)"
time="2022-12-06T14:14:02Z" level=info msg="Parsing value for IpInIpTunnelAddr: 10.244.174.192 (from internal override)"
time="2022-12-06T14:14:02Z" level=info msg="Parsed value for IpInIpTunnelAddr: 10.244.174.192 (from internal override)"
time="2022-12-06T14:14:02Z" level=info msg="Parsing value for XDPEnabled: false (from internal override)"
time="2022-12-06T14:14:02Z" level=info msg="Parsed value for XDPEnabled: false (from internal override)"
time="2022-12-06T14:14:02Z" level=info msg="Parsing value for MetadataAddr: None (from internal override)"
time="2022-12-06T14:14:02Z" level=info msg="Value set to 'none', replacing with zero-value: \"\"."
time="2022-12-06T14:14:02Z" level=info msg="Parsed value for MetadataAddr:  (from internal override)"
time="2022-12-06T14:14:02Z" level=info msg="Parsing value for LogSeveritySys: None (from internal override)"
time="2022-12-06T14:14:02Z" level=info msg="Value set to 'none', replacing with zero-value: \"\"."
time="2022-12-06T14:14:02Z" level=info msg="Parsed value for LogSeveritySys:  (from internal override)"
time="2022-12-06T14:14:02Z" level=info msg="Parsing value for TyphaK8sServiceName: calico-typha (from internal override)"
time="2022-12-06T14:14:02Z" level=info msg="Parsed value for TyphaK8sServiceName: calico-typha (from internal override)"
time="2022-12-06T14:14:02Z" level=info msg="Parsing value for DefaultEndpointToHostAction: ACCEPT (from internal override)"
time="2022-12-06T14:14:02Z" level=info msg="Parsed value for DefaultEndpointToHostAction: ACCEPT (from internal override)"
time="2022-12-06T14:14:02Z" level=info msg="Parsing value for ReportingIntervalSecs: 0 (from internal override)"
time="2022-12-06T14:14:02Z" level=info msg="Parsed value for ReportingIntervalSecs: 0s (from internal override)"
time="2022-12-06T14:14:02Z" level=info msg="Parsing value for TyphaCN: typha-server (from internal override)"
time="2022-12-06T14:14:02Z" level=info msg="Parsed value for TyphaCN: typha-server (from internal override)"
time="2022-12-06T14:14:02Z" level=info msg="Parsing value for FelixHostname: r04u06-yuchuan-nex-spr-cd (from internal override)"
time="2022-12-06T14:14:02Z" level=info msg="Parsed value for FelixHostname: r04u06-yuchuan-nex-spr-cd (from internal override)"
time="2022-12-06T14:14:02Z" level=info msg="Parsing value for FloatingIPs: Disabled (from internal override)"
time="2022-12-06T14:14:02Z" level=info msg="Parsed value for FloatingIPs: Disabled (from internal override)"
time="2022-12-06T14:14:02Z" level=info msg="Parsing value for HealthEnabled: true (from internal override)"
time="2022-12-06T14:14:02Z" level=info msg="Parsed value for HealthEnabled: true (from internal override)"
time="2022-12-06T14:14:02Z" level=info msg="Parsing value for LogSeverityScreen: Info (from internal override)"
time="2022-12-06T14:14:02Z" level=info msg="Parsed value for LogSeverityScreen: INFO (from internal override)"
time="2022-12-06T14:14:02Z" level=info msg="TODO : default endpoint to host action changed" component=policy
time="2022-12-06T14:14:02Z" level=info msg="Agent started"
time="2022-12-06T14:14:02Z" level=info msg="PEER watcher starts" subcomponent=peer-watcher
time="2022-12-06T14:14:02Z" level=info msg="Adding route to service prefix 10.96.0.0/12 through VPP"
time="2022-12-06T14:14:02Z" level=info msg="Routing server started" component=routing
time="2022-12-06T14:14:02Z" level=info msg="Rescanning existing VXLAN tunnels" subcomponent=connectivity
time="2022-12-06T14:14:02Z" level=info msg="Serve() Prometheus exporter" component=prometheus
time="2022-12-06T14:14:02Z" level=info msg="Subscribing to netlink route updates"
time="2022-12-06T14:14:02Z" level=info msg="Added ip pool 10.244.0.0/16"
time="2022-12-06T14:14:02Z" level=info msg="Subscribing to netlink address updates"
time="2022-12-06T14:14:02Z" level=info msg="RescanState: re-creating all interfaces" component=cni
time="2022-12-06T14:14:02Z" level=info msg="Serve() CNI" component=cni
time="2022-12-06T14:14:02Z" level=info msg="Routing server is running " component=routing
time="2022-12-06T14:14:02Z" level=info msg="Rescanning existing routes" subcomponent=connectivity
time="2022-12-06T14:14:02Z" level=info msg="SRv6Provider RescanState" subcomponent=connectivity
time="2022-12-06T14:14:02Z" level=info msg="Rescanning existing tunnels" subcomponent=connectivity
time="2022-12-06T14:14:02Z" level=info msg="Rescanning existing routes" subcomponent=connectivity
time="2022-12-06T14:14:02Z" level=info msg="svc(add) key=TCP#10.96.0.1#443 [TCP real=false lbtyp=1 vip=10.96.0.1;443 rw=[<nil>->192.168.126.4;6443]] vpp-id=0" component=services
time="2022-12-06T14:14:02Z" level=info msg="Updating local prefix set with 10.244.174.192/26" subcomponent=prefix-watcher
time="2022-12-06T14:14:02Z" level=info msg="svc(add) key=UDP#10.96.0.10#53 [UDP real=false lbtyp=1 vip=10.96.0.10;53 rw=] vpp-id=4294967295" component=services
time="2022-12-06T14:14:02Z" level=info msg="svc(add) key=TCP#10.96.0.10#53 [TCP real=false lbtyp=1 vip=10.96.0.10;53 rw=] vpp-id=4294967295" component=services
time="2022-12-06T14:14:02Z" level=info msg="svc(add) key=TCP#10.96.0.10#9153 [TCP real=false lbtyp=1 vip=10.96.0.10;9153 rw=] vpp-id=4294967295" component=services
time="2022-12-06T14:14:02Z" level=info msg="svc(add) key=TCP#10.106.25.159#5473 [TCP real=false lbtyp=1 vip=10.106.25.159;5473 rw=[<nil>->192.168.126.4;5473]] vpp-id=1" component=services
time="2022-12-06T14:14:02Z" level=info msg="Main client watcher loop"
time="2022-12-06T14:14:02Z" level=info msg="Process backend watcher done event in main client"
time="2022-12-06T14:14:02Z" level=info msg="Terminating main client watcher loop"
time="2022-12-06T14:14:02Z" level=info msg="Policies now in sync" component=policy
time="2022-12-06T14:14:02Z" level=info msg="Reconciliating pending policy state with configured state" component=policy
time="2022-12-06T14:14:02Z" level=info msg="Reconciliation done" component=policy
time="2022-12-06T14:14:03Z" level=info msg="pod(add) spec=netns:/var/run/netns/cni-cae70823-6e6d-865e-4e25-ba0f540e4366,if:eth0 [10.244.174.201] network=" component=cni
time="2022-12-06T14:14:03Z" level=info msg="pod(add) checking available buffers, 0 existing pods, request 2048 / 429193" component=cni
time="2022-12-06T14:14:03Z" level=info msg="pod(add) VRF" component=cni
time="2022-12-06T14:14:03Z" level=info msg="pod(add) VRF 3519870697 ip4 default route via VRF 2" component=cni
time="2022-12-06T14:14:03Z" level=info msg="pod(add) VRF 2868466484 ip6 default route via VRF 2" component=cni
time="2022-12-06T14:14:03Z" level=info msg="pod(add) loopback" component=cni
time="2022-12-06T14:14:03Z" level=info msg="pod(add) Enable interface[3] SNAT" component=cni
time="2022-12-06T14:14:03Z" level=info msg="pod(add) tuntap" component=cni
time="2022-12-06T14:14:03Z" level=info msg="pod(add) tun swIfIndex=4" component=cni
time="2022-12-06T14:14:03Z" level=info msg="pod(add) Enable interface[4] SNAT" component=cni
time="2022-12-06T14:14:03Z" level=info msg="pod(add) tun route swIfIndex=4 linux-ifIndex=3 route=0.0.0.0/0" component=cni
time="2022-12-06T14:14:03Z" level=info msg="pod(add) Skipping tun swIfIndex=4 route=::/0" component=cni
time="2022-12-06T14:14:03Z" level=info msg="pod(add) tun address swIfIndex=4 linux-ifIndex=3 address=10.244.174.201/32" component=cni
time="2022-12-06T14:14:03Z" level=info msg="pod(add) Configuring IPv4 forwarding" component=cni
time="2022-12-06T14:14:03Z" level=info msg="pod(add) Done tun swIfIndex=4" component=cni
time="2022-12-06T14:14:03Z" level=info msg="pod(add) Default routes to swIfIndex=4 isL3=true" component=cni
time="2022-12-06T14:14:03Z" level=info msg="pod(add) route [podVRF ->MainIF] v4 10.244.174.201/32 -> [if:4]" component=cni
time="2022-12-06T14:14:03Z" level=info msg="pod(add) announcing pod Addresses" component=cni
time="2022-12-06T14:14:03Z" level=info msg="pod(add) HostPorts" component=cni
time="2022-12-06T14:14:03Z" level=info msg="pod(add) activate strict RPF on interface" component=cni
time="2022-12-06T14:14:03Z" level=info msg="pod(add) create pod RPF VRF" component=cni
time="2022-12-06T14:14:03Z" level=info msg="policy(add) Workload id=k8s:calico-system/calico-kube-controllers-7cd8b89887-z5jq2:eth0: swIfIndex=4" component=policy
time="2022-12-06T14:14:03Z" level=info msg="creating wep in workloadadded?" component=policy
time="2022-12-06T14:14:03Z" level=info msg="not creating wep in workloadadded" component=policy
time="2022-12-06T14:14:03Z" level=info msg="pod(add) add routes for RPF VRF" component=cni
time="2022-12-06T14:14:03Z" level=info msg="pod(add) add route to [10.244.174.201/32] in rpfvrf 1649599747 via tun" component=cni
time="2022-12-06T14:14:03Z" level=info msg="pod(add) set custom-vrf urpf" component=cni
time="2022-12-06T14:14:03Z" level=info msg="pod(add) Done spec=netns:/var/run/netns/cni-cae70823-6e6d-865e-4e25-ba0f540e4366,if:eth0 [10.244.174.201]" component=cni
time="2022-12-06T14:14:03Z" level=info msg="pod(add) spec=netns:/var/run/netns/cni-fde6fdbe-3ee5-2d83-a390-2e6cadfd3af6,if:eth0 [10.244.174.202] network=" component=cni
time="2022-12-06T14:14:03Z" level=info msg="pod(add) checking available buffers, 1 existing pods, request 4096 / 429193" component=cni
time="2022-12-06T14:14:03Z" level=info msg="pod(add) VRF" component=cni
time="2022-12-06T14:14:03Z" level=info msg="pod(add) VRF 1476291629 ip4 default route via VRF 2" component=cni
time="2022-12-06T14:14:03Z" level=info msg="pod(add) VRF 2748932008 ip6 default route via VRF 2" component=cni
time="2022-12-06T14:14:03Z" level=info msg="pod(add) loopback" component=cni
time="2022-12-06T14:14:03Z" level=info msg="pod(add) Enable interface[5] SNAT" component=cni
time="2022-12-06T14:14:03Z" level=info msg="pod(add) tuntap" component=cni
time="2022-12-06T14:14:03Z" level=info msg="pod(add) tun swIfIndex=6" component=cni
time="2022-12-06T14:14:03Z" level=info msg="pod(add) Enable interface[6] SNAT" component=cni
time="2022-12-06T14:14:03Z" level=info msg="policy(add) VPP rule=action=allow id=9"
time="2022-12-06T14:14:03Z" level=info msg="policy(add) VPP rule=action=allow id=10"
time="2022-12-06T14:14:03Z" level=info msg="policy(upd) Handled Profile Update pending=false id=kns.calico-system existing=<nil> new=[id=5 inboundRules=[[vpp-id=9 rid=fw2fB0QZr1DPiXMg]] outboundRules=[[vpp-id=10 rid=D66fMttEtc8Iyw4Z]]]"
time="2022-12-06T14:14:03Z" level=info msg="policy(upd) Handled Profile Update pending=false id=ksa.calico-system.calico-kube-controllers existing=<nil> new=[id=6]"
time="2022-12-06T14:14:03Z" level=info msg="no network status for pod, no multiple networks"
time="2022-12-06T14:14:03Z" level=info msg="policy(add) Workload Endpoint add pending=false id={k8s calico-system/calico-kube-controllers-7cd8b89887-z5jq2 eth0 } new=if=[4] profiles=[kns.calico-system ksa.calico-system.calico-kube-controllers] tiers=[] Profiles=[kns.calico-system,ksa.calico-system.calico-kube-controllers] swIf=map[eth0:4]"
time="2022-12-06T14:14:03Z" level=info msg="pod(add) tun route swIfIndex=6 linux-ifIndex=3 route=0.0.0.0/0" component=cni
time="2022-12-06T14:14:03Z" level=info msg="pod(add) Skipping tun swIfIndex=6 route=::/0" component=cni
time="2022-12-06T14:14:03Z" level=info msg="pod(add) tun address swIfIndex=6 linux-ifIndex=3 address=10.244.174.202/32" component=cni
time="2022-12-06T14:14:03Z" level=info msg="pod(add) Configuring IPv4 forwarding" component=cni
time="2022-12-06T14:14:03Z" level=info msg="pod(add) Done tun swIfIndex=6" component=cni
time="2022-12-06T14:14:03Z" level=info msg="pod(add) Default routes to swIfIndex=6 isL3=true" component=cni
time="2022-12-06T14:14:03Z" level=info msg="pod(add) route [podVRF ->MainIF] v4 10.244.174.202/32 -> [if:6]" component=cni
time="2022-12-06T14:14:03Z" level=info msg="pod(add) announcing pod Addresses" component=cni
time="2022-12-06T14:14:03Z" level=info msg="pod(add) HostPorts" component=cni
time="2022-12-06T14:14:03Z" level=info msg="pod(add) activate strict RPF on interface" component=cni
time="2022-12-06T14:14:03Z" level=info msg="pod(add) create pod RPF VRF" component=cni
time="2022-12-06T14:14:03Z" level=info msg="policy(add) Workload id=k8s:calico-system/csi-node-driver-6n4pb:eth0: swIfIndex=6" component=policy
time="2022-12-06T14:14:03Z" level=info msg="creating wep in workloadadded?" component=policy
time="2022-12-06T14:14:03Z" level=info msg="not creating wep in workloadadded" component=policy
time="2022-12-06T14:14:03Z" level=info msg="pod(add) add routes for RPF VRF" component=cni
time="2022-12-06T14:14:03Z" level=info msg="pod(add) add route to [10.244.174.202/32] in rpfvrf 2180890343 via tun" component=cni
time="2022-12-06T14:14:03Z" level=info msg="pod(add) set custom-vrf urpf" component=cni
time="2022-12-06T14:14:03Z" level=info msg="pod(add) Done spec=netns:/var/run/netns/cni-fde6fdbe-3ee5-2d83-a390-2e6cadfd3af6,if:eth0 [10.244.174.202]" component=cni
time="2022-12-06T14:14:03Z" level=info msg="policy(upd) Handled Profile Update pending=false id=ksa.calico-system.csi-node-driver existing=<nil> new=[id=7]"
time="2022-12-06T14:14:03Z" level=info msg="no network status for pod, no multiple networks"
time="2022-12-06T14:14:03Z" level=info msg="policy(add) Workload Endpoint add pending=false id={k8s calico-system/csi-node-driver-6n4pb eth0 } new=if=[6] profiles=[kns.calico-system ksa.calico-system.csi-node-driver] tiers=[] Profiles=[kns.calico-system,ksa.calico-system.csi-node-driver] swIf=map[eth0:6]"
time="2022-12-06T14:14:04Z" level=info msg="pod(add) spec=netns:/var/run/netns/cni-4dd07589-bc06-eef4-11da-fb28f3adb7a7,if:eth0 [10.244.174.203] network=" component=cni
time="2022-12-06T14:14:04Z" level=info msg="pod(add) checking available buffers, 2 existing pods, request 6144 / 429193" component=cni
time="2022-12-06T14:14:04Z" level=info msg="pod(add) VRF" component=cni
time="2022-12-06T14:14:04Z" level=info msg="pod(add) VRF 3421909937 ip4 default route via VRF 2" component=cni
time="2022-12-06T14:14:04Z" level=info msg="pod(add) VRF 3167820124 ip6 default route via VRF 2" component=cni
time="2022-12-06T14:14:04Z" level=info msg="pod(add) loopback" component=cni
time="2022-12-06T14:14:04Z" level=info msg="pod(add) Enable interface[7] SNAT" component=cni
time="2022-12-06T14:14:04Z" level=info msg="pod(add) tuntap" component=cni
time="2022-12-06T14:14:04Z" level=info msg="pod(add) tun swIfIndex=8" component=cni
time="2022-12-06T14:14:04Z" level=info msg="pod(add) Enable interface[8] SNAT" component=cni
time="2022-12-06T14:14:04Z" level=info msg="pod(add) tun route swIfIndex=8 linux-ifIndex=3 route=0.0.0.0/0" component=cni
time="2022-12-06T14:14:04Z" level=info msg="pod(add) Skipping tun swIfIndex=8 route=::/0" component=cni
time="2022-12-06T14:14:04Z" level=info msg="pod(add) tun address swIfIndex=8 linux-ifIndex=3 address=10.244.174.203/32" component=cni
time="2022-12-06T14:14:04Z" level=info msg="pod(add) Configuring IPv4 forwarding" component=cni
time="2022-12-06T14:14:04Z" level=info msg="pod(add) Done tun swIfIndex=8" component=cni
time="2022-12-06T14:14:04Z" level=info msg="pod(add) Default routes to swIfIndex=8 isL3=true" component=cni
time="2022-12-06T14:14:04Z" level=info msg="pod(add) route [podVRF ->MainIF] v4 10.244.174.203/32 -> [if:8]" component=cni
time="2022-12-06T14:14:04Z" level=info msg="pod(add) announcing pod Addresses" component=cni
time="2022-12-06T14:14:04Z" level=info msg="pod(add) HostPorts" component=cni
time="2022-12-06T14:14:04Z" level=info msg="pod(add) activate strict RPF on interface" component=cni
time="2022-12-06T14:14:04Z" level=info msg="pod(add) create pod RPF VRF" component=cni
time="2022-12-06T14:14:04Z" level=info msg="policy(add) Workload id=k8s:kube-system/coredns-6d4b75cb6d-vjpn4:eth0: swIfIndex=8" component=policy
time="2022-12-06T14:14:04Z" level=info msg="creating wep in workloadadded?" component=policy
time="2022-12-06T14:14:04Z" level=info msg="not creating wep in workloadadded" component=policy
time="2022-12-06T14:14:04Z" level=info msg="pod(add) add routes for RPF VRF" component=cni
time="2022-12-06T14:14:04Z" level=info msg="pod(add) add route to [10.244.174.203/32] in rpfvrf 2636375307 via tun" component=cni
time="2022-12-06T14:14:04Z" level=info msg="pod(add) set custom-vrf urpf" component=cni
time="2022-12-06T14:14:04Z" level=info msg="pod(add) Done spec=netns:/var/run/netns/cni-4dd07589-bc06-eef4-11da-fb28f3adb7a7,if:eth0 [10.244.174.203]" component=cni
time="2022-12-06T14:14:04Z" level=info msg="policy(add) VPP rule=action=allow id=11"
time="2022-12-06T14:14:04Z" level=info msg="policy(add) VPP rule=action=allow id=12"
time="2022-12-06T14:14:04Z" level=info msg="policy(upd) Handled Profile Update pending=false id=kns.kube-system existing=<nil> new=[id=8 inboundRules=[[vpp-id=11 rid=YlN-KJA93B7Fe2ih]] outboundRules=[[vpp-id=12 rid=GUM3x2IgYhIM5cBY]]]"
time="2022-12-06T14:14:04Z" level=info msg="policy(upd) Handled Profile Update pending=false id=ksa.kube-system.coredns existing=<nil> new=[id=9]"
time="2022-12-06T14:14:04Z" level=info msg="no network status for pod, no multiple networks"
time="2022-12-06T14:14:04Z" level=info msg="policy(add) Workload Endpoint add pending=false id={k8s kube-system/coredns-6d4b75cb6d-vjpn4 eth0 } new=if=[8] profiles=[kns.kube-system ksa.kube-system.coredns] tiers=[] Profiles=[kns.kube-system,ksa.kube-system.coredns] swIf=map[eth0:8]"
time="2022-12-06T14:14:04Z" level=info msg="Main client watcher loop"
time="2022-12-06T14:14:04Z" level=info msg="Process backend watcher done event in main client"
time="2022-12-06T14:14:04Z" level=info msg="Terminating main client watcher loop"
time="2022-12-06T14:14:06Z" level=info msg="Main client watcher loop"
time="2022-12-06T14:14:15Z" level=info msg="pod(add) spec=netns:/var/run/netns/cni-de6dbe7f-e7e8-755b-5ab5-dd9cdd50644a,if:eth0 [10.244.174.204] network=" component=cni
time="2022-12-06T14:14:15Z" level=info msg="pod(add) checking available buffers, 3 existing pods, request 8192 / 429193" component=cni
time="2022-12-06T14:14:15Z" level=info msg="pod(add) VRF" component=cni
time="2022-12-06T14:14:15Z" level=info msg="pod(add) VRF 28987765 ip4 default route via VRF 2" component=cni
time="2022-12-06T14:14:15Z" level=info msg="pod(add) VRF 2210837584 ip6 default route via VRF 2" component=cni
time="2022-12-06T14:14:15Z" level=info msg="pod(add) loopback" component=cni
time="2022-12-06T14:14:15Z" level=info msg="pod(add) Enable interface[9] SNAT" component=cni
time="2022-12-06T14:14:15Z" level=info msg="pod(add) tuntap" component=cni
time="2022-12-06T14:14:15Z" level=info msg="pod(add) tun swIfIndex=10" component=cni
time="2022-12-06T14:14:15Z" level=info msg="pod(add) Enable interface[10] SNAT" component=cni
time="2022-12-06T14:14:15Z" level=info msg="pod(add) tun route swIfIndex=10 linux-ifIndex=3 route=0.0.0.0/0" component=cni
time="2022-12-06T14:14:15Z" level=info msg="pod(add) Skipping tun swIfIndex=10 route=::/0" component=cni
time="2022-12-06T14:14:15Z" level=info msg="pod(add) tun address swIfIndex=10 linux-ifIndex=3 address=10.244.174.204/32" component=cni
time="2022-12-06T14:14:15Z" level=info msg="pod(add) Configuring IPv4 forwarding" component=cni
time="2022-12-06T14:14:15Z" level=info msg="pod(add) Done tun swIfIndex=10" component=cni
time="2022-12-06T14:14:15Z" level=info msg="pod(add) Default routes to swIfIndex=10 isL3=true" component=cni
time="2022-12-06T14:14:15Z" level=info msg="pod(add) route [podVRF ->MainIF] v4 10.244.174.204/32 -> [if:10]" component=cni
time="2022-12-06T14:14:15Z" level=info msg="pod(add) announcing pod Addresses" component=cni
time="2022-12-06T14:14:15Z" level=info msg="pod(add) HostPorts" component=cni
time="2022-12-06T14:14:15Z" level=info msg="pod(add) activate strict RPF on interface" component=cni
time="2022-12-06T14:14:15Z" level=info msg="pod(add) create pod RPF VRF" component=cni
time="2022-12-06T14:14:15Z" level=info msg="policy(add) Workload id=k8s:kube-system/coredns-6d4b75cb6d-z7p7v:eth0: swIfIndex=10" component=policy
time="2022-12-06T14:14:15Z" level=info msg="creating wep in workloadadded?" component=policy
time="2022-12-06T14:14:15Z" level=info msg="not creating wep in workloadadded" component=policy
time="2022-12-06T14:14:15Z" level=info msg="pod(add) add routes for RPF VRF" component=cni
time="2022-12-06T14:14:15Z" level=info msg="pod(add) add route to [10.244.174.204/32] in rpfvrf 3039689583 via tun" component=cni
time="2022-12-06T14:14:15Z" level=info msg="pod(add) set custom-vrf urpf" component=cni
time="2022-12-06T14:14:15Z" level=info msg="pod(add) Done spec=netns:/var/run/netns/cni-de6dbe7f-e7e8-755b-5ab5-dd9cdd50644a,if:eth0 [10.244.174.204]" component=cni
time="2022-12-06T14:14:15Z" level=info msg="no network status for pod, no multiple networks"
time="2022-12-06T14:14:15Z" level=info msg="policy(add) Workload Endpoint add pending=false id={k8s kube-system/coredns-6d4b75cb6d-z7p7v eth0 } new=if=[10] profiles=[kns.kube-system ksa.kube-system.coredns] tiers=[] Profiles=[kns.kube-system,ksa.kube-system.coredns] swIf=map[eth0:10]"
time="2022-12-06T14:14:32Z" level=info msg="pod(del) key=netns:/var/run/netns/cni-62afd86c-38fc-60e4-8bd2-c54bca9d6832,if:eth0" component=cni
time="2022-12-06T14:14:32Z" level=warning msg="Unknown pod to delete key=netns:/var/run/netns/cni-62afd86c-38fc-60e4-8bd2-c54bca9d6832,if:eth0" component=cni
time="2022-12-06T14:14:40Z" level=info msg="Process backend watcher done event in main client"
time="2022-12-06T14:14:40Z" level=info msg="Terminating main client watcher loop"
time="2022-12-06T14:14:42Z" level=info msg="Main client watcher loop"
time="2022-12-06T14:16:36Z" level=info msg="Process backend watcher done event in main client"
time="2022-12-06T14:16:36Z" level=info msg="Terminating main client watcher loop"
time="2022-12-06T14:16:38Z" level=info msg="Main client watcher loop"
ifconfig ens817f0
ens817f0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.126.4  netmask 255.255.254.0  broadcast 192.168.127.255
        inet6 fe80::42a6:b7ff:fe97:34b0  prefixlen 64  scopeid 0x20<link>
        ether 76:78:58:e9:ef:80  txqueuelen 1000  (Ethernet)
        RX packets 297  bytes 22098 (22.0 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 9  bytes 786 (786.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
ip route
default via 172.17.28.1 dev ens259f0 proto static
10.96.0.0/12 via 192.168.126.254 dev ens817f1 proto static mtu 1440
10.244.0.0/16 via 192.168.126.254 dev ens817f1 proto static mtu 1440
172.17.28.0/23 dev ens259f0 proto kernel scope link src 172.17.29.19
192.168.126.0/23 dev ens817f1 proto kernel scope link src 192.168.126.5
192.168.126.0/23 dev ens817f0 proto kernel scope link src 192.168.126.4
sudo iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
KUBE-FIREWALL  all  --  anywhere             anywhere
KUBE-NODEPORTS  all  --  anywhere             anywhere             /* kubernetes health check service ports */
KUBE-EXTERNAL-SERVICES  all  --  anywhere             anywhere             ctstate NEW /* kubernetes externally-visible service portals */

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
KUBE-FORWARD  all  --  anywhere             anywhere             /* kubernetes forwarding rules */
KUBE-SERVICES  all  --  anywhere             anywhere             ctstate NEW /* kubernetes service portals */
KUBE-EXTERNAL-SERVICES  all  --  anywhere             anywhere             ctstate NEW /* kubernetes externally-visible service portals */

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
KUBE-FIREWALL  all  --  anywhere             anywhere
KUBE-SERVICES  all  --  anywhere             anywhere             ctstate NEW /* kubernetes service portals */

Chain KUBE-EXTERNAL-SERVICES (2 references)
target     prot opt source               destination

Chain KUBE-FIREWALL (2 references)
target     prot opt source               destination
DROP       all  --  anywhere             anywhere             /* kubernetes firewall for dropping marked packets */ mark match 0x8000/0x8000
DROP       all  -- !localhost/8          localhost/8          /* block incoming localnet connections */ ! ctstate RELATED,ESTABLISHED,DNAT

Chain KUBE-FORWARD (1 references)
target     prot opt source               destination
DROP       all  --  anywhere             anywhere             ctstate INVALID
ACCEPT     all  --  anywhere             anywhere             /* kubernetes forwarding rules */ mark match 0x4000/0x4000
ACCEPT     all  --  anywhere             anywhere             /* kubernetes forwarding conntrack rule */ ctstate RELATED,ESTABLISHED

Chain KUBE-KUBELET-CANARY (0 references)
target     prot opt source               destination

Chain KUBE-NODEPORTS (1 references)
target     prot opt source               destination

Chain KUBE-PROXY-CANARY (0 references)
target     prot opt source               destination

Chain KUBE-SERVICES (2 references)
target     prot opt source               destination
REJECT     udp  --  anywhere             fm62d-c85efs1a--fm62d-edr65_.fm.intel.com  /* kube-system/kube-dns:dns has no endpoints */ udp dpt:domain reject-with icmp-port-unreachable
REJECT     tcp  --  anywhere             fm62d-c85efs1a--fm62d-edr65_.fm.intel.com  /* kube-system/kube-dns:dns-tcp has no endpoints */ tcp dpt:domain reject-with icmp-port-unreachable
REJECT     tcp  --  anywhere             fm62d-c85efs1a--fm62d-edr65_.fm.intel.com  /* kube-system/kube-dns:metrics has no endpoints */ tcp dpt:9153 reject-with icmp-port-unreachable
ethtool -i  ens817f0
driver: tun
version: 1.6
firmware-version:
expansion-rom-version:
bus-info: tap
supports-statistics: no
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

@dylan2intel
Author

New finding from the route table: it seems VPP created a route on the wrong interface.

ip route
default via 172.17.28.1 dev ens259f0 proto static
10.96.0.0/12 via 192.168.126.254 dev ens817f1 proto static mtu 1440
10.244.0.0/16 via 192.168.126.254 dev ens817f1 proto static mtu 1440 // incorrect route found
172.17.28.0/23 dev ens259f0 proto kernel scope link src 172.17.29.19
192.168.126.0/23 dev ens817f1 proto kernel scope link src 192.168.126.5 
192.168.126.0/23 dev ens817f0 proto kernel scope link src 192.168.126.4

10.244.0.0/16 via 192.168.126.254 dev ens817f1 proto static mtu 1440

// This route is incorrect; the expected interface is ens817f0 (the tap).
// Physically, ens817f0 and ens817f1 are two ports of the same NIC.
// The configured vpp_dataplane_interface is ens817f0.
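
As a quick check (not a fix), the route can be pointed at the expected interface by hand with standard iproute2 commands, for example:

sudo ip route replace 10.244.0.0/16 via 192.168.126.254 dev ens817f0 proto static mtu 1440
sudo ip route replace 10.96.0.0/12 via 192.168.126.254 dev ens817f0 proto static mtu 1440
ip route | grep 192.168.126.254   # verify both routes now point at ens817f0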

@dylan2intel
Author

dylan2intel commented Dec 7, 2022

Continuing to update with new findings:

  1. If I disable the other port of the NIC and keep only one port, it works.
  2. As long as I configure the two ports (ens817f0, ens817f1) in different subnets (192.168.126.4/24, 192.168.127.4/24), it works fine (see the netplan sketch below).
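
A sketch of the netplan change for workaround 2, assuming the same interface names as above:

network:
  version: 2
  ethernets:
    ens817f0:
      addresses:
      - 192.168.126.4/24
    ens817f1:
      addresses:
      - 192.168.127.4/24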
Every 2.0s: kubectl get pods -A -o wide                                                                                                            r04u06-yuchuan-nex-spr-cd: Wed Dec  7 04:05:06 2022

NAMESPACE              NAME                                                READY   STATUS    RESTARTS        AGE     IP               NODE                        NOMINATED NODE   READINESS GATES
calico-apiserver       calico-apiserver-b4599fb9-vflf4                     1/1     Running   0               7m57s   10.244.174.213   r04u06-yuchuan-nex-spr-cd   <none>           <none>
calico-apiserver       calico-apiserver-b4599fb9-z2spp                     1/1     Running   0               7m57s   10.244.174.212   r04u06-yuchuan-nex-spr-cd   <none>           <none>
calico-system          calico-kube-controllers-7cd8b89887-898dp            1/1     Running   0               8m59s   10.244.174.209   r04u06-yuchuan-nex-spr-cd   <none>           <none>
calico-system          calico-node-qxsn6                                   1/1     Running   0               6m37s   192.168.126.3    r04u38-yuchuan-nex-spr-cd   <none>           <none>
calico-system          calico-node-tjxrw                                   1/1     Running   0               9m      192.168.126.4    r04u06-yuchuan-nex-spr-cd   <none>           <none>
calico-system          calico-typha-6648455568-h5555                       1/1     Running   0               9m      192.168.126.4    r04u06-yuchuan-nex-spr-cd   <none>           <none>
calico-system          csi-node-driver-5wwrh                               2/2     Running   0               6m37s   10.244.29.194    r04u38-yuchuan-nex-spr-cd   <none>           <none>
calico-system          csi-node-driver-t2k89                               2/2     Running   0               8m59s   10.244.174.210   r04u06-yuchuan-nex-spr-cd   <none>           <none>
calico-vpp-dataplane   calico-vpp-node-7sb9x                               2/2     Running   0               6m37s   192.168.126.3    r04u38-yuchuan-nex-spr-cd   <none>           <none>
calico-vpp-dataplane   calico-vpp-node-d8xzr                               2/2     Running   0               8m46s   192.168.126.4    r04u06-yuchuan-nex-spr-cd   <none>           <none>
kube-system            coredns-6d4b75cb6d-jj4r2                            1/1     Running   0               9m33s   10.244.174.211   r04u06-yuchuan-nex-spr-cd   <none>           <none>
kube-system            coredns-6d4b75cb6d-zr7td                            1/1     Running   0               9m33s   10.244.174.208   r04u06-yuchuan-nex-spr-cd   <none>           <none>
kube-system            etcd-r04u06-yuchuan-nex-spr-cd                      1/1     Running   38 (10m ago)    9m47s   192.168.126.4    r04u06-yuchuan-nex-spr-cd   <none>           <none>
kube-system            kube-apiserver-r04u06-yuchuan-nex-spr-cd            1/1     Running   1               9m49s   192.168.126.4    r04u06-yuchuan-nex-spr-cd   <none>           <none>
kube-system            kube-controller-manager-r04u06-yuchuan-nex-spr-cd   1/1     Running   2 (8m34s ago)   9m47s   192.168.126.4    r04u06-yuchuan-nex-spr-cd   <none>           <none>
kube-system            kube-proxy-c922b                                    1/1     Running   0               6m37s   192.168.126.3    r04u38-yuchuan-nex-spr-cd   <none>           <none>
kube-system            kube-proxy-gzkrp                                    1/1     Running   0               9m33s   192.168.126.4    r04u06-yuchuan-nex-spr-cd   <none>           <none>
kube-system            kube-scheduler-r04u06-yuchuan-nex-spr-cd            1/1     Running   3 (8m34s ago)   9m47s   192.168.126.4    r04u06-yuchuan-nex-spr-cd   <none>           <none>
tigera-operator        tigera-operator-6c49dc8ddf-c6hwq                    1/1     Running   1 (8m34s ago)   9m10s   192.168.126.4    r04u06-yuchuan-nex-spr-cd   <none>           <none>
Every 2.0s: ip route      r04u06-yuchuan-nex-spr-cd: Wed Dec  7 03:57:27 2022

default via 172.17.28.1 dev ens259f0 proto static
10.96.0.0/12 via 192.168.126.254 dev ens817f0 proto static mtu 1440
10.244.0.0/16 via 192.168.126.254 dev ens817f0 proto static mtu 1440
172.17.28.0/23 dev ens259f0 proto kernel scope link src 172.17.29.19
192.168.126.0/24 dev ens817f0 proto kernel scope link src 192.168.126.4
192.168.127.0/24 dev ens817f1 proto kernel scope link src 192.168.127.4

@onong Could you please double-check whether this is a bug? Thanks.

@onong
Collaborator

onong commented Dec 7, 2022

Hi @dylan2intel, thank you for sharing the detailed logs and for all the digging!

Looking more closely, the following might explain some of the anomalies:

192.168.126.0/23 dev ens817f1 proto kernel scope link src 192.168.126.5
192.168.126.0/23 dev ens817f0 proto kernel scope link src 192.168.126.4

These are conflicting routes. Which of the two interfaces, ens817f0/ens817f1, do packets destined for 192.168.126.0/23 go to? This would also explain why things work when only one port is used or when different subnets are used.

ip route
default via 172.17.28.1 dev ens259f0 proto static
10.96.0.0/12 via 192.168.126.254 dev ens817f1 proto static mtu 1440
10.244.0.0/16 via 192.168.126.254 dev ens817f1 proto static mtu 1440 // incorrect route found
172.17.28.0/23 dev ens259f0 proto kernel scope link src 172.17.29.19
192.168.126.0/23 dev ens817f1 proto kernel scope link src 192.168.126.5
192.168.126.0/23 dev ens817f0 proto kernel scope link src 192.168.126.4

The route in question is just configuration magic that we add so that the pods are reachable from Linux as well as from VPP. The interface chosen, ens817f1, is incorrect, but I believe this is because the next hop 192.168.126.254 is reachable via both ens817f0 and ens817f1 and the kernel chose ens817f1.
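
One way to see which of the two conflicting routes the kernel actually uses for that next hop is plain iproute2, for illustration:

ip route get 192.168.126.254
# with two equal-prefix connected routes the kernel returns whichever one it
# prefers (route order/metric), which in this case happened to be ens817f1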

@dylan2intel
Author

BTW, with continued observation I found that even when the ip route is correct, the deployment may still get stuck at coredns pod creation, but if I re-run it, it succeeds. Very strange; the good news is that the failure frequency has become lower.

The next-hop selection really is magic: suppose the network switched to DHCP and the two ports were allocated IPs in the same subnet, how could that work?

@onong
Collaborator

onong commented Dec 8, 2022

BTW, with continued observation I found that even when the ip route is correct, the deployment may still get stuck at coredns pod creation, but if I re-run it, it succeeds. Very strange; the good news is that the failure frequency has become lower.

This is strange. Could you share the kubectl describe and logs for the coredns pods when they are stuck in pod creating? Also, could you share the show cnat translation and show ip fib output?

kubectl exec -it -n calico-vpp-dataplane calico-vpp-XYZ -c vpp -- bash
vppctl show cnat translation
vppctl sh ip fib

@onong
Collaborator

onong commented Dec 8, 2022

The next-hop selection really is magic: suppose the network switched to DHCP and the two ports were allocated IPs in the same subnet, how could that work?

Not sure I understood. Does your use case require both ports to be in the same subnet? You may run into ARP issues at the very least, AFAIK.
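
As a general Linux note (not Calico/VPP-specific), two ports in the same L2 subnet usually run into the well-known "ARP flux" behaviour; if that layout is really required, the usual mitigation is the arp_ignore/arp_announce sysctls, roughly:

sudo sysctl -w net.ipv4.conf.all.arp_ignore=1     # reply to ARP only on the interface that owns the target IP
sudo sysctl -w net.ipv4.conf.all.arp_announce=2   # pick the best local address on the outgoing interface for ARP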

@dylan2intel
Author

Hi @onong

Bad luck: it reproduced today even with the correct routes. Here is more information to help address the root cause.

kubectl get pods -A -o wide

NAMESPACE              NAME                                                READY   STATUS             RESTARTS      AGE   IP               NODE                        NOMINATED NODE   READINESS GATES
calico-system          calico-kube-controllers-7cd8b89887-24prt            0/1     CrashLoopBackOff   6 (97s ago)   12m   10.244.174.202   r04u06-yuchuan-nex-spr-cd   <none>           <none>
calico-system          calico-node-pbp5n                                   1/1     Running            0             12m   192.168.126.4    r04u06-yuchuan-nex-spr-cd   <none>           <none>
calico-system          calico-typha-5b8865688d-cxv2p                       1/1     Running            0             12m   192.168.126.4    r04u06-yuchuan-nex-spr-cd   <none>           <none>
calico-system          csi-node-driver-rh8ck                               2/2     Running            0             11m   10.244.174.203   r04u06-yuchuan-nex-spr-cd   <none>           <none>
calico-vpp-dataplane   calico-vpp-node-lgklc                               2/2     Running            0             11m   192.168.126.4    r04u06-yuchuan-nex-spr-cd   <none>           <none>
kube-system            coredns-6d4b75cb6d-7zfxp                            0/1     Running            6 (17s ago)   12m   10.244.174.204   r04u06-yuchuan-nex-spr-cd   <none>           <none>
kube-system            coredns-6d4b75cb6d-t6qnv                            0/1     Running            6 (27s ago)   12m   10.244.174.201   r04u06-yuchuan-nex-spr-cd   <none>           <none>
kube-system            etcd-r04u06-yuchuan-nex-spr-cd                      1/1     Running            20            12m   192.168.126.4    r04u06-yuchuan-nex-spr-cd   <none>           <none>
kube-system            kube-apiserver-r04u06-yuchuan-nex-spr-cd            1/1     Running            20            12m   192.168.126.4    r04u06-yuchuan-nex-spr-cd   <none>           <none>
kube-system            kube-controller-manager-r04u06-yuchuan-nex-spr-cd   1/1     Running            22            12m   192.168.126.4    r04u06-yuchuan-nex-spr-cd   <none>           <none>
kube-system            kube-proxy-5nczq                                    1/1     Running            0             12m   192.168.126.4    r04u06-yuchuan-nex-spr-cd   <none>           <none>
kube-system            kube-scheduler-r04u06-yuchuan-nex-spr-cd            1/1     Running            24            12m   192.168.126.4    r04u06-yuchuan-nex-spr-cd   <none>           <none>
tigera-operator        tigera-operator-6c49dc8ddf-52r6w                    1/1     Running            1 (11m ago)   12m   192.168.126.4    r04u06-yuchuan-nex-spr-cd   <none>           <none>

ip route

default via 172.17.28.1 dev ens259f0 proto static
10.96.0.0/16 via 192.168.126.254 dev ens817f0 proto static mtu 1440
10.244.0.0/16 via 192.168.126.254 dev ens817f0 proto static mtu 1440
172.17.28.0/23 dev ens259f0 proto kernel scope link src 172.17.29.19
192.168.126.0/24 dev ens817f0 proto kernel scope link src 192.168.126.4
192.168.127.0/24 dev ens817f1 proto kernel scope link src 192.168.127.4
raspadmin@r04u06-yuchuan-nex-spr-cd:~$ dpdk-devbind.py --status-dev net

Network devices using DPDK-compatible driver
============================================
0000:98:00.0 'Ethernet Controller E810-C for QSFP 1592' drv=vfio-pci unused=ice

Network devices using kernel driver
===================================
0000:16:00.0 'Ethernet Controller X710 for 10GbE SFP+ 1572' if=ens259f0 drv=i40e unused=vfio-pci *Active*
0000:16:00.1 'Ethernet Controller X710 for 10GbE SFP+ 1572' if=ens259f1 drv=i40e unused=vfio-pci
0000:98:00.1 'Ethernet Controller E810-C for QSFP 1592' if=ens817f1 drv=ice unused=vfio-pci *Active*
raspadmin@r04u06-yuchuan-nex-spr-cd:~$ cat /etc/vpp-network-mapping.conf
192.168.126.4/24 ens817f0 0000:98:00.0

kubectl exec -it -n calico-vpp-dataplane calico-vpp-node-lgklc -c vpp -- bash

vppctl show cnat translation

[0] 10.96.0.1;443 TCP lb:maglev
::;0->192.168.126.4;6443
  fib-entry:27
  [@0]: dpo-load-balance: [proto:ip4 index:29 buckets:1 uRPF:34 to:[1401:83880]]
        [0] [@15]: dpo-receive: 192.168.126.4 on HundredGigabitEthernet98/0/0
 via:
  [@2]: dpo-load-balance: [proto:ip4 index:44 buckets:1 uRPF:-1 to:[0:0]]
    [0] [@18]: dpo-load-balance: [proto:ip4 index:29 buckets:1 uRPF:34 to:[1401:83880]]
          [0] [@15]: dpo-receive: 192.168.126.4 on HundredGigabitEthernet98/0/0
maglev backends map
  backend#0: 1ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
[1] 10.96.93.51;5473 TCP lb:maglev
::;0->192.168.126.4;5473
  fib-entry:27
  [@0]: dpo-load-balance: [proto:ip4 index:29 buckets:1 uRPF:34 to:[1401:83880]]
        [0] [@15]: dpo-receive: 192.168.126.4 on HundredGigabitEthernet98/0/0
 via:
  [@2]: dpo-load-balance: [proto:ip4 index:46 buckets:1 uRPF:-1 to:[0:0]]
    [0] [@18]: dpo-load-balance: [proto:ip4 index:29 buckets:1 uRPF:34 to:[1401:83880]]
          [0] [@15]: dpo-receive: 192.168.126.4 on HundredGigabitEthernet98/0/0
maglev backends map
  backend#0: 1ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff

vppctl sh ip fib

ipv4-VRF:0, fib_index:0, flow hash:[src dst sport dport proto flowlabel ] epoch:0 flags:none locks:[recursive-resolution:1, default-route:1, ]
0.0.0.0/0
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:1 buckets:1 uRPF:0 to:[2417:119067]]
    [0] [@0]: dpo-drop ip4
0.0.0.0/32
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:2 buckets:1 uRPF:1 to:[0:0]]
    [0] [@0]: dpo-drop ip4
10.96.0.1/32
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:43 buckets:1 uRPF:62 to:[0:0]]
    [0] [@17]: [0] cnat-client:[10.96.0.1] tr:1 sess:3 exclusive
10.96.93.51/32
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:45 buckets:1 uRPF:64 to:[0:0]]
    [0] [@17]: [1] cnat-client:[10.96.93.51] tr:1 sess:0 exclusive
10.244.174.201/32
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:55 buckets:1 uRPF:77 to:[170:10200]]
    [0] [@5]: ipv4 via 0.0.0.0 tun1: mtu:9216 next:8 flags:[features ]
10.244.174.202/32
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:63 buckets:1 uRPF:87 to:[0:0]]
    [0] [@5]: ipv4 via 0.0.0.0 tun2: mtu:9216 next:9 flags:[features ]
10.244.174.203/32
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:71 buckets:1 uRPF:97 to:[0:0]]
    [0] [@5]: ipv4 via 0.0.0.0 tun3: mtu:9216 next:10 flags:[features ]
10.244.174.204/32
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:79 buckets:1 uRPF:107 to:[168:10080]]
    [0] [@5]: ipv4 via 0.0.0.0 tun4: mtu:9216 next:11 flags:[features ]
192.168.126.0/32
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:27 buckets:1 uRPF:30 to:[0:0]]
    [0] [@0]: dpo-drop ip4
192.168.126.0/24
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:26 buckets:1 uRPF:29 to:[0:0]]
    [0] [@4]: ipv4-glean: [src:192.168.126.0/24] HundredGigabitEthernet98/0/0: mtu:1500 next:1 flags:[] ffffffffffff40a6b79734b00806
192.168.126.4/32
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:29 buckets:1 uRPF:34 to:[1558:93280]]
    [0] [@15]: dpo-receive: 192.168.126.4 on HundredGigabitEthernet98/0/0
192.168.126.255/32
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:28 buckets:1 uRPF:32 to:[0:0]]
    [0] [@0]: dpo-drop ip4
224.0.0.0/4
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:4 buckets:1 uRPF:3 to:[0:0]]
    [0] [@0]: dpo-drop ip4
240.0.0.0/4
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:3 buckets:1 uRPF:2 to:[0:0]]
    [0] [@0]: dpo-drop ip4
255.255.255.255/32
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:5 buckets:1 uRPF:4 to:[0:0]]
    [0] [@0]: dpo-drop ip4
punt-table-ip4, fib_index:1, flow hash:[src dst sport dport proto flowlabel ] epoch:0 flags:none locks:[API:1, recursive-resolution:1, ]
0.0.0.0/0
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:9 buckets:1 uRPF:61 to:[1558:93280]]
    [0] [@5]: ipv4 via 169.254.0.1 tap0: mtu:9216 next:6 flags:[features ] 40a6b79734b002ca11c0fd100800
0.0.0.0/32
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:10 buckets:1 uRPF:8 to:[0:0]]
    [0] [@0]: dpo-drop ip4
224.0.0.0/4
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:12 buckets:1 uRPF:10 to:[0:0]]
    [0] [@0]: dpo-drop ip4
240.0.0.0/4
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:11 buckets:1 uRPF:9 to:[0:0]]
    [0] [@0]: dpo-drop ip4
255.255.255.255/32
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:13 buckets:1 uRPF:11 to:[0:0]]
    [0] [@0]: dpo-drop ip4
calico-pods-ip4, fib_index:2, flow hash:[src dst sport dport proto flowlabel ] epoch:0 flags:none locks:[API:1, recursive-resolution:1, ]
0.0.0.0/0
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:14 buckets:1 uRPF:18 to:[3975:212347]]
    [0] [@12]: dst-address,unicast lookup in ipv4-VRF:0
0.0.0.0/32
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:15 buckets:1 uRPF:13 to:[0:0]]
    [0] [@0]: dpo-drop ip4
224.0.0.0/4
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:17 buckets:1 uRPF:15 to:[0:0]]
    [0] [@0]: dpo-drop ip4
240.0.0.0/4
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:16 buckets:1 uRPF:14 to:[0:0]]
    [0] [@0]: dpo-drop ip4
255.255.255.255/32
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:18 buckets:1 uRPF:16 to:[0:0]]
    [0] [@0]: dpo-drop ip4
host-tap-ens817f0-ip4, fib_index:3, flow hash:[src dst sport dport proto flowlabel ] epoch:0 flags:none locks:[interface:1, API:1, adjacency:1, ]
0.0.0.0/0
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:31 buckets:1 uRPF:42 to:[338:20280]]
    [0] [@12]: dst-address,unicast lookup in ipv4-VRF:0
0.0.0.0/32
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:32 buckets:1 uRPF:37 to:[0:0]]
    [0] [@0]: dpo-drop ip4
169.254.0.1/32
  UNRESOLVED
192.168.126.4/32
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:42 buckets:1 uRPF:60 to:[0:0]]
    [0] [@15]: dpo-receive: 192.168.126.4 on tap0
224.0.0.0/4
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:34 buckets:1 uRPF:39 to:[0:0]]
    [0] [@0]: dpo-drop ip4
240.0.0.0/4
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:33 buckets:1 uRPF:38 to:[0:0]]
    [0] [@0]: dpo-drop ip4
255.255.255.255/32
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:35 buckets:1 uRPF:41 to:[0:0]]
    [0] [@5]: ipv4 via ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff HundredGigabitEthernet98/0/0: mtu:1500 next:5 flags:[features ] ffffffffffff40a6b79734b00800
1br07seQ-4-eth0-cni-9d568483-645f-c06c-18fc-c071890f8d4a, fib_index:4, flow hash:[src dst sport dport proto flowlabel ] epoch:0 flags:none locks:[interface:2, API:1, ]
0.0.0.0/0
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:48 buckets:1 uRPF:74 to:[1969:105028]]
    [0] [@12]: dst-address,unicast lookup in calico-pods-ip4
0.0.0.0/32
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:49 buckets:1 uRPF:67 to:[0:0]]
    [0] [@0]: dpo-drop ip4
224.0.0.0/4
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:51 buckets:1 uRPF:69 to:[0:0]]
    [0] [@0]: dpo-drop ip4
240.0.0.0/4
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:50 buckets:1 uRPF:68 to:[0:0]]
    [0] [@0]: dpo-drop ip4
255.255.255.255/32
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:52 buckets:1 uRPF:70 to:[0:0]]
    [0] [@0]: dpo-drop ip4
i+DQRJaF-4-eth0-cni-0492c2dc-d73e-ebb3-8d8b-d1d232f8ac3a, fib_index:5, flow hash:[src dst sport dport proto flowlabel ] epoch:0 flags:none locks:[interface:2, API:1, ]
0.0.0.0/0
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:56 buckets:1 uRPF:85 to:[59:3540]]
    [0] [@12]: dst-address,unicast lookup in calico-pods-ip4
0.0.0.0/32
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:57 buckets:1 uRPF:79 to:[0:0]]
    [0] [@0]: dpo-drop ip4
224.0.0.0/4
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:59 buckets:1 uRPF:81 to:[0:0]]
    [0] [@0]: dpo-drop ip4
240.0.0.0/4
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:58 buckets:1 uRPF:80 to:[0:0]]
    [0] [@0]: dpo-drop ip4
255.255.255.255/32
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:60 buckets:1 uRPF:82 to:[0:0]]
    [0] [@0]: dpo-drop ip4
AOv/EEyA-4-eth0-cni-db6b5df2-bf86-7c18-edda-015dde417e34, fib_index:6, flow hash:[src dst sport dport proto flowlabel ] epoch:0 flags:none locks:[interface:2, API:1, ]
0.0.0.0/0
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:64 buckets:1 uRPF:95 to:[0:0]]
    [0] [@12]: dst-address,unicast lookup in calico-pods-ip4
0.0.0.0/32
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:65 buckets:1 uRPF:89 to:[0:0]]
    [0] [@0]: dpo-drop ip4
224.0.0.0/4
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:67 buckets:1 uRPF:91 to:[0:0]]
    [0] [@0]: dpo-drop ip4
240.0.0.0/4
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:66 buckets:1 uRPF:90 to:[0:0]]
    [0] [@0]: dpo-drop ip4
255.255.255.255/32
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:68 buckets:1 uRPF:92 to:[0:0]]
    [0] [@0]: dpo-drop ip4
kvXwL2H0-4-eth0-cni-da411dd9-33c9-6966-f879-6749b628140d, fib_index:7, flow hash:[src dst sport dport proto flowlabel ] epoch:0 flags:none locks:[interface:2, API:1, ]
0.0.0.0/0
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:72 buckets:1 uRPF:105 to:[1947:103779]]
    [0] [@12]: dst-address,unicast lookup in calico-pods-ip4
0.0.0.0/32
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:73 buckets:1 uRPF:99 to:[0:0]]
    [0] [@0]: dpo-drop ip4
224.0.0.0/4
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:75 buckets:1 uRPF:101 to:[0:0]]
    [0] [@0]: dpo-drop ip4
240.0.0.0/4
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:74 buckets:1 uRPF:100 to:[0:0]]
    [0] [@0]: dpo-drop ip4
255.255.255.255/32
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:76 buckets:1 uRPF:102 to:[0:0]]
    [0] [@0]: dpo-drop ip4

@dylan2intel
Author

The next hop really does look like magic. Suppose the network switched to DHCP and the two ports were allocated IPs in the same subnet, how would it work then?

Not sure I understood. Does your use case require both ports to be in the same subnet? You may run into ARP issues at the very least, afaik.

It's not that my use case requires both ports; the Intel NIC simply has two physical ports.
And it's not that it requires the same subnet either: if my network setup were running on DHCP, the two ports could well be allocated two IPs in the same subnet.
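
In that case it helps to detect the condition up front. A small check (just a sketch, nothing Calico-specific) that flags any subnet reachable through more than one interface after DHCP has run:

# Print any connected (on-link) prefix that appears on more than one interface
ip -4 route show proto kernel scope link | awk '{print $1}' | sort | uniq -d

If this prints anything, either move one port to a different subnet or disable/override DHCP on it before installing the dataplane.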

@dylan2intel
Author

New finding: it always happens in the scenario where the MAC address of the created tap interface has changed.

Here is a workaround: when I change the MAC address back to the physical NIC's MAC address, the pods go back to Ready.

raspadmin@r04u06-yuchuan-nex-spr-cd:~$ ethtool -i ens817f0
driver: tun
version: 1.6
firmware-version:
expansion-rom-version:
bus-info: tap
supports-statistics: no
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

raspadmin@r04u06-yuchuan-nex-spr-cd:~$ ip a
...
130: ens817f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN group default qlen 1000
    link/ether 76:78:58:e9:ef:80 brd ff:ff:ff:ff:ff:ff
    inet 192.168.126.4/24 brd 192.168.126.255 scope global ens817f0
       valid_lft forever preferred_lft forever
    inet6 fe80::42a6:b7ff:fe97:34b0/64 scope link
       valid_lft forever preferred_lft forever

raspadmin@r04u06-yuchuan-nex-spr-cd:~$ sudo macchanger -m 40:a6:b7:97:34:b0 ens817f0
Current MAC:   76:78:58:e9:ef:80 (unknown)
Permanent MAC: 00:00:00:00:00:00 (XEROX CORPORATION)
New MAC:       40:a6:b7:97:34:b0 (unknown)

Can we fix this by always using the physical MAC address when creating the tap interface?
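
Until that is addressed, the workaround above can be scripted: record the uplink's MAC before Calico/VPP takes the interface over, and re-apply it to the tap if it drifts. A rough sketch (interface name and file path are just examples):

# Before deploying Calico/VPP: save the physical MAC of the uplink
cat /sys/class/net/ens817f0/address | sudo tee /root/ens817f0.mac

# After deploying (or from a small systemd unit/cron job): restore it if it changed
SAVED_MAC=$(cat /root/ens817f0.mac)
CURRENT_MAC=$(cat /sys/class/net/ens817f0/address)
if [ "$CURRENT_MAC" != "$SAVED_MAC" ]; then
    sudo ip link set dev ens817f0 address "$SAVED_MAC"
fi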

@dylan2intel
Author

Another new finding:

A Kubernetes worker failed to join the cluster because the MAC address of the created tap device also changed.


@ThomasJonathanJackson

ThomasJonathanJackson commented Aug 24, 2024

I kinda had the same problems with coredns and the calico-kube-controllers pod, but doing the following saved my life!

  • installing flannel after creating the Kubernetes cluster and before installing the calico/vpp dataplane. It is fine as stated here
  • installing istio as stated here
  • having the node without any system VPP installation, to prevent conflicts between the host's VPP and the calico-vpp pod (a quick check for this is sketched below)

Finally, I'm glad that your problem got resolved, though I don't understand the details much, being a newbie and all :)
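
On the last point, a quick way to check whether a node already has a host-level VPP installed that could conflict with the calico-vpp pod (the package check assumes a Debian/Ubuntu node; adjust for other distros):

# Any vpp service or process already running on the host?
systemctl list-units 'vpp*' --no-legend
pgrep -a vpp || echo "no vpp process running"

# Any vpp packages installed?
dpkg -l | grep -i vpp || echo "no vpp packages installed"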
