Flannel fails to watch subnet leases for other nodes in host gateway backend #83

Closed
xie-qianyue opened this issue May 9, 2020 · 10 comments

@xie-qianyue

I used the latest Kubernetes binaries and the PrepareNodes script.

But I still see errors such as Failed to list *v1.Node: Get https://10.246.44.131:6443/api/v1/nodes?resourceVersion=0: http2: no cached connection was available in the flannel pod's logs. The complete log is below:

[root@xxx-centos-7 hzxieqianyue]# kubectl logs kube-flannel-ds-windows-amd64-482m2 -n kube-system


    Directory: C:\host\etc\cni


Mode                LastWriteTime         Length Name                          
----                -------------         ------ ----                          
d-----         5/6/2020   4:25 PM                net.d                         


    Directory: C:\host\etc


Mode                LastWriteTime         Length Name                          
----                -------------         ------ ----                          
d-----         5/9/2020   2:23 PM                kube-flannel                  


    Directory: C:\host\opt\cni


Mode                LastWriteTime         Length Name                          
----                -------------         ------ ----                          
d-----         5/6/2020   4:25 PM                bin                           


    Directory: C:\host\k


Mode                LastWriteTime         Length Name                          
----                -------------         ------ ----                          
d-----         5/9/2020  11:42 AM                flannel                       


    Directory: C:\host\k\flannel\var\run\secrets\kubernetes.io


Mode                LastWriteTime         Length Name                          
----                -------------         ------ ----                          
d-----         5/9/2020  11:36 AM                serviceaccount                
WARNING: The names of some imported commands from the module 'hns' include unapproved verbs that might make them less 
discoverable. To find the commands with unapproved verbs, run the Import-Module command again with the Verbose 
parameter. For a list of approved verbs, type Get-Verb.
Invoke-HnsRequest : @{Error=提供的策略配置无效或缺少参数。 ; ErrorCode=2151350285; Success=False}
At C:\k\flannel\hns.psm1:233 char:16
+ ...      return Invoke-HnsRequest -Method POST -Type networks -Data $Json ...
+                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (:) [Write-Error], WriteErrorException
    + FullyQualifiedErrorId : Microsoft.PowerShell.Commands.WriteErrorException,Invoke-HNSRequest
 
I0509 14:24:07.064568    7864 main.go:518] Determining IP address of default interface
I0509 14:24:09.032716    7864 main.go:531] Using interface with name Ethernet and address 10.246.44.130
I0509 14:24:09.032716    7864 main.go:548] Defaulting external address to interface address (10.246.44.130)
I0509 14:24:09.065974    7864 kube.go:119] Waiting 10m0s for node controller to sync
I0509 14:24:09.065974    7864 kube.go:306] Starting kube subnet manager
I0509 14:24:10.066040    7864 kube.go:126] Node controller sync successful
I0509 14:24:10.066040    7864 main.go:246] Created subnet manager: Kubernetes Subnet Manager - win-3fjh9ve50cq
I0509 14:24:10.066040    7864 main.go:249] Installing signal handlers
I0509 14:24:10.066956    7864 main.go:390] Found network config - Backend type: host-gw
I0509 14:24:10.066956    7864 hostgw_windows.go:73] HOST-GW config: {Name:cbr0 DNSServerList:}
I0509 14:24:10.102921    7864 hostgw_windows.go:157] Attempting to create HNSNetwork {"Name":"cbr0","Type":"L2Bridge","Subnets":[{"AddressPrefix":"10.244.5.0/24","GatewayAddress":"10.244.5.1"}]}
E0509 14:24:11.044412    7864 streamwatcher.go:109] Unable to decode an event from the watch stream: read tcp 10.246.44.130:51336->10.246.44.131:6443: wsarecv: An established connection was aborted by the software in your host machine.
E0509 14:24:11.044412    7864 reflector.go:304] github.com/coreos/flannel/subnet/kube/kube.go:307: Failed to watch *v1.Node: Get https://10.246.44.131:6443/api/v1/nodes?resourceVersion=2342170&timeoutSeconds=582&watch=true: http2: no cached connection was available
E0509 14:24:12.053470    7864 reflector.go:201] github.com/coreos/flannel/subnet/kube/kube.go:307: Failed to list *v1.Node: Get https://10.246.44.131:6443/api/v1/nodes?resourceVersion=0: http2: no cached connection was available
I0509 14:24:12.221469    7864 hostgw_windows.go:164] Waiting to get ManagementIP from HNSNetwork cbr0
I0509 14:24:12.736052    7864 hostgw_windows.go:174] Waiting to get net interface for HNSNetwork cbr0 (10.246.44.130)
E0509 14:24:13.064714    7864 reflector.go:201] github.com/coreos/flannel/subnet/kube/kube.go:307: Failed to list *v1.Node: Get https://10.246.44.131:6443/api/v1/nodes?resourceVersion=0: http2: no cached connection was available
E0509 14:24:14.073672    7864 reflector.go:201] github.com/coreos/flannel/subnet/kube/kube.go:307: Failed to list *v1.Node: Get https://10.246.44.131:6443/api/v1/nodes?resourceVersion=0: http2: no cached connection was available
E0509 14:24:15.073734    7864 reflector.go:201] github.com/coreos/flannel/subnet/kube/kube.go:307: Failed to list *v1.Node: Get https://10.246.44.131:6443/api/v1/nodes?resourceVersion=0: http2: no cached connection was available
I0509 14:24:15.444317    7864 hostgw_windows.go:183] Created HNSNetwork cbr0
I0509 14:24:15.448222    7864 hostgw_windows.go:212] Attempting to create bridge HNSEndpoint &{Id: Name:cbr0_ep VirtualNetwork:8D90C0B3-1A58-4C7D-BDE5-73DE0B955173 VirtualNetworkName: Policies:[] MacAddress: IPAddress:10.244.5.2 DNSSuffix: DNSServerList: GatewayAddress: EnableInternalDNS:false DisableICC:false PrefixLength:0 IsRemoteEndpoint:false EnableLowMetric:false Namespace:<nil> EncapOverhead:0}
I0509 14:24:15.466900    7864 hostgw_windows.go:217] Created bridge HNSEndpoint cbr0_ep
I0509 14:24:15.466900    7864 hostgw_windows.go:221] Waiting to attach bridge endpoint cbr0_ep to host
E0509 14:24:16.074559    7864 reflector.go:201] github.com/coreos/flannel/subnet/kube/kube.go:307: Failed to list *v1.Node: Get https://10.246.44.131:6443/api/v1/nodes?resourceVersion=0: http2: no cached connection was available
E0509 14:24:17.074680    7864 reflector.go:201] github.com/coreos/flannel/subnet/kube/kube.go:307: Failed to list *v1.Node: Get https://10.246.44.131:6443/api/v1/nodes?resourceVersion=0: http2: no cached connection was available
E0509 14:24:18.075379    7864 reflector.go:201] github.com/coreos/flannel/subnet/kube/kube.go:307: Failed to list *v1.Node: Get https://10.246.44.131:6443/api/v1/nodes?resourceVersion=0: http2: no cached connection was available
I0509 14:24:18.542661    7864 hostgw_windows.go:229] Attached bridge endpoint cbr0_ep to host successfully
I0509 14:24:18.907815    7864 hostgw_windows.go:237] Found {Idx:20 Name:vEthernet (Ethernet) 2 InterfaceMetric:25 DhcpEnabled:false IpAddress:10.246.44.130 SubnetPrefix:24 GatewayMetric:256 DefaultGatewayAddress:10.246.44.1} interface with IP 10.246.44.130
I0509 14:24:19.042572    7864 hostgw_windows.go:249] Enabled forwarding on vEthernet (Ethernet) 2 index 20
E0509 14:24:19.075967    7864 reflector.go:201] github.com/coreos/flannel/subnet/kube/kube.go:307: Failed to list *v1.Node: Get https://10.246.44.131:6443/api/v1/nodes?resourceVersion=0: http2: no cached connection was available
I0509 14:24:19.244242    7864 hostgw_windows.go:237] Found {Idx:45 Name:vEthernet (cbr0_ep) InterfaceMetric:25 DhcpEnabled:false IpAddress:10.244.5.2 SubnetPrefix:24 GatewayMetric:256 DefaultGatewayAddress:10.244.5.1} interface with IP 10.244.5.2
I0509 14:24:19.355554    7864 hostgw_windows.go:249] Enabled forwarding on vEthernet (cbr0_ep) index 45
I0509 14:24:19.355554    7864 main.go:313] Changing default FORWARD chain policy to ACCEPT
I0509 14:24:19.363308    7864 main.go:321] Wrote subnet file to /run/flannel/subnet.env
I0509 14:24:19.363308    7864 main.go:325] Running backend.
I0509 14:24:19.363308    7864 main.go:343] Waiting for all goroutines to exit
I0509 14:24:19.363308    7864 route_network_windows.go:51] Watching for new subnet leases
I0509 14:24:19.372155    7864 route_network_windows.go:94] Subnet added: 10.244.0.0/24 via 10.246.44.131
E0509 14:24:20.076134    7864 reflector.go:201] github.com/coreos/flannel/subnet/kube/kube.go:307: Failed to list *v1.Node: Get https://10.246.44.131:6443/api/v1/nodes?resourceVersion=0: http2: no cached connection was available
E0509 14:24:21.076326    7864 reflector.go:201] github.com/coreos/flannel/subnet/kube/kube.go:307: Failed to list *v1.Node: Get https://10.246.44.131:6443/api/v1/nodes?resourceVersion=0: http2: no cached connection was available
E0509 14:24:22.077035    7864 reflector.go:201] github.com/coreos/flannel/subnet/kube/kube.go:307: Failed to list *v1.Node: Get https://10.246.44.131:6443/api/v1/nodes?resourceVersion=0: http2: no cached connection was available

I posted a comment about this under issue #37.

I also submitted a review on #38. I have a question about the code below:

func setupL2bridge(interfaceName string) {
	run(fmt.Sprintf(`ipmo C:\k\flannel\hns.psm1; New-HNSNetwork -Type Overlay -AddressPrefix "192.168.255.0/30"`+

Shouldn't -Type Overlay here be -Type l2bridge?

@Tehacz

Tehacz commented May 28, 2020

Any fix or workaround?

@ArchiFleKs

I have the same issue on Windows Server 2019.

@masaeedu

masaeedu commented Aug 9, 2020

@xie-qianyue Do you think the:

Unable to decode an event from the watch stream: read ... -> ... wsarecv: An established connection was aborted by the software in your host machine.

message is relevant at all? I'm getting the same problem, and that seems to be the first problematic message in the logs.
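To answer "which message comes first" without reading the whole log by eye, the flannel output can be scanned for the first line whose klog severity is E or F. The following is a minimal Go sketch, assuming the log uses the standard klog header (severity letter, mmdd, time, thread id, file:line); the names Entry and FirstError are hypothetical helpers, not part of flannel.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// klogHeader matches the standard klog line prefix:
// severity letter, mmdd, hh:mm:ss.uuuuuu, thread id, file:line, "] message".
var klogHeader = regexp.MustCompile(
	`^([IWEF])(\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6})\s+(\d+) ([^:\s]+):(\d+)\] (.*)$`)

// Entry is a hypothetical container for the parsed fields of one log line.
type Entry struct {
	Severity string // "I", "W", "E" or "F"
	Time     string // hh:mm:ss.uuuuuu as printed in the log
	Source   string // file:line that emitted the message
	Message  string
}

// FirstError returns the first klog line with severity E or F.
// Lines that do not carry a klog header (PowerShell output, blanks) are skipped.
func FirstError(log string) (Entry, bool) {
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		m := klogHeader.FindStringSubmatch(strings.TrimSpace(sc.Text()))
		if m == nil {
			continue
		}
		if m[1] == "E" || m[1] == "F" {
			return Entry{Severity: m[1], Time: m[3], Source: m[5] + ":" + m[6], Message: m[7]}, true
		}
	}
	return Entry{}, false
}

func main() {
	// Three lines copied from the log in the original post.
	sample := `I0509 14:24:10.066956    7864 main.go:390] Found network config - Backend type: host-gw
E0509 14:24:11.044412    7864 streamwatcher.go:109] Unable to decode an event from the watch stream: read tcp 10.246.44.130:51336->10.246.44.131:6443: wsarecv: An established connection was aborted by the software in your host machine.
E0509 14:24:12.053470    7864 reflector.go:201] github.com/coreos/flannel/subnet/kube/kube.go:307: Failed to list *v1.Node: Get https://10.246.44.131:6443/api/v1/nodes?resourceVersion=0: http2: no cached connection was available`

	if e, ok := FirstError(sample); ok {
		fmt.Println(e.Time, e.Source)
	}
}
```

On the sample lines above, the first match is the streamwatcher.go:109 entry, which lines up with the observation that the aborted connection precedes the repeated "Failed to list" errors.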

@masaeedu

masaeedu commented Aug 9, 2020

@ArchiFleKs Sorry to ping you directly, but I noticed that you also opened this issue: #96 where you say that connectivity between Linux and Windows pods was working. Does this mean you managed to figure out a workaround?

@ArchiFleKs

ArchiFleKs commented Aug 9, 2020

@masaeedu no worries. I have put the Windows part on hold. I tried two setups: one with a single network card, where everything worked, and another with two network cards on Windows (one for the external network and one for the internal network), where the result depends on which card is used. If I use the external network for flannel, I get the lease issue and internet access but no inter-pod connectivity. If I use the internal card, route exchange and inter-pod connectivity are OK, but I get no internet access.

My goal was to get flannel working with two distinct networks, one for external access and one for internal traffic. For now I have not succeeded, and I have put that on standby.

If I get back to it at some point, I will report here.
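In a two-card setup like this, one way to make the choice of card explicit is to select the address by subnet and pass it to flanneld through its --iface option, which accepts an interface name or IP. The following is a minimal Go sketch under that assumption; Candidate, PickInterface, and the addresses are hypothetical values for illustration. It only selects an interface and does not solve the missing internet access described above.

```go
package main

import (
	"fmt"
	"net"
)

// Candidate is a hypothetical pairing of an adapter name with its IPv4 address.
type Candidate struct {
	Name string
	IP   string
}

// PickInterface returns the first candidate whose address lies inside cidr.
func PickInterface(cands []Candidate, cidr string) (Candidate, error) {
	_, network, err := net.ParseCIDR(cidr)
	if err != nil {
		return Candidate{}, fmt.Errorf("parsing %q: %w", cidr, err)
	}
	for _, c := range cands {
		ip := net.ParseIP(c.IP)
		if ip != nil && network.Contains(ip) {
			return c, nil
		}
	}
	return Candidate{}, fmt.Errorf("no interface with an address in %s", cidr)
}

func main() {
	// Illustrative values only: one external and one internal adapter.
	cands := []Candidate{
		{Name: "Ethernet", IP: "203.0.113.10"},
		{Name: "Ethernet 2", IP: "10.246.44.130"},
	}
	c, err := PickInterface(cands, "10.246.44.0/24")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// The selected address would then be passed to flanneld.
	fmt.Printf("--iface=%s\n", c.IP)
}
```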

@xunchangguo

FATA[2020-08-26T02:35:10-07:00] rpc error: code = Unknown desc = panic Failed to find Process32FirstW procedure in kernel32.dll: The specified procedure could not be found.
FATA[2020-08-26T02:35:10-07:00] rpc error: code = Internal desc = could not create IP forward entry: The object already exists.
I0826 02:35:11.829545    8856 main.go:518] Determining IP address of default interface
I0826 02:35:13.121579    8856 main.go:531] Using interface with name Ethernet and address 192.168.12.236
I0826 02:35:13.121579    8856 main.go:548] Defaulting external address to interface address (192.168.12.236)
I0826 02:35:13.808944    8856 kube.go:306] Starting kube subnet manager
I0826 02:35:13.808944    8856 kube.go:119] Waiting 10m0s for node controller to sync
I0826 02:35:14.824027    8856 kube.go:126] Node controller sync successful
I0826 02:35:14.824293    8856 main.go:246] Created subnet manager: Kubernetes Subnet Manager - win-puhtrerui39
I0826 02:35:14.824293    8856 main.go:249] Installing signal handlers
I0826 02:35:14.824293    8856 main.go:390] Found network config - Backend type: vxlan
I0826 02:35:14.824293    8856 vxlan_windows.go:127] VXLAN config: Name=flannel.4096 MacPrefix=0E-2A VNI=4096 Port=4789 GBP=false DirectRouting=false
I0826 02:35:14.950255    8856 device_windows.go:116] Attempting to create HostComputeNetwork &{ flannel.4096 Overlay [] {[]} { [] [] []} [{Static [{172.20.7.0/24 [[123 34 84 121 112 101 34 58 34 86 83 73 68 34 44 34 83 101 116 116 105 110 103 115 34 58 123 34 73 115 111 108 97 116 105 111 110 73 100 34 58 52 48 57 54 125 125]] [{172.20.7.1 0.0.0.0/0 0}]}]}] 8 {2 0}}
E0826 02:35:16.242875    8856 streamwatcher.go:109] Unable to decode an event from the watch stream: read tcp 192.168.12.236:50348->192.168.12.229:6443: wsarecv: An established connection was aborted by the software in your host machine.
E0826 02:35:16.244023    8856 reflector.go:304] github.com/coreos/flannel/subnet/kube/kube.go:307: Failed to watch *v1.Node: Get https://192.168.12.229:6443/api/v1/nodes?resourceVersion=260198&timeoutSeconds=582&watch=true: http2: no cached connection was available
E0826 02:35:17.274584    8856 reflector.go:201] github.com/coreos/flannel/subnet/kube/kube.go:307: Failed to list *v1.Node: Get https://192.168.12.229:6443/api/v1/nodes?resourceVersion=0: http2: no cached connection was available
E0826 02:35:18.296513    8856 reflector.go:201] github.com/coreos/flannel/subnet/kube/kube.go:307: Failed to list *v1.Node: Get https://192.168.12.229:6443/api/v1/nodes?resourceVersion=0: http2: no cached connection was available
I0826 02:35:18.989606    8856 device_windows.go:124] Waiting to get ManagementIP from HostComputeNetwork flannel.4096
E0826 02:35:19.298995    8856 reflector.go:201] github.com/coreos/flannel/subnet/kube/kube.go:307: Failed to list *v1.Node: Get https://192.168.12.229:6443/api/v1/nodes?resourceVersion=0: http2: no cached connection was available
I0826 02:35:19.492467    8856 device_windows.go:136] Waiting to get net interface for HostComputeNetwork flannel.4096 (192.168.12.236)
E0826 02:35:20.299735    8856 reflector.go:201] github.com/coreos/flannel/subnet/kube/kube.go:307: Failed to list *v1.Node: Get https://192.168.12.229:6443/api/v1/nodes?resourceVersion=0: http2: no cached connection was available
E0826 02:35:21.301093    8856 reflector.go:201] github.com/coreos/flannel/subnet/kube/kube.go:307: Failed to list *v1.Node: Get https://192.168.12.229:6443/api/v1/nodes?resourceVersion=0: http2: no cached connection was available
I0826 02:35:21.312188    8856 device_windows.go:145] Created HostComputeNetwork flannel.4096
I0826 02:35:21.316094    8856 main.go:313] Changing default FORWARD chain policy to ACCEPT
I0826 02:35:21.324592    8856 main.go:321] Wrote subnet file to /run/flannel/subnet.env
I0826 02:35:21.324592    8856 main.go:325] Running backend.
I0826 02:35:21.324592    8856 main.go:343] Waiting for all goroutines to exit
I0826 02:35:21.324592    8856 vxlan_network_windows.go:63] Watching for new subnet leases
E0826 02:35:22.301932    8856 reflector.go:201] github.com/coreos/flannel/subnet/kube/kube.go:307: Failed to list *v1.Node: Get https://192.168.12.229:6443/api/v1/nodes?resourceVersion=0: http2: no cached connection was available
E0826 02:35:23.334636    8856 reflector.go:201] github.com/coreos/flannel/subnet/kube/kube.go:307: Failed to list *v1.Node: Get https://192.168.12.229:6443/api/v1/nodes?resourceVersion=0: http2: no cached connection was available

I have the same error. Is something wrong?

@llyons

llyons commented Oct 13, 2020

I have the same issue as above, on a hybrid cluster with a Linux master, a Linux worker node, and a Windows node.
It seems like this is a common issue.

#103

@jsturtevant
Contributor

Although the original post is about the host-gw config, this looks like the same error described in #103 (comment): flannel creates the network but starts communicating with the API server before network setup has finished. The workaround should be to make sure you create the External network before starting flannel. Your Invoke-HnsRequest : @{Error=提供的策略配置无效或缺少参数。 ; ErrorCode=2151350285; Success=False} error (in English: "The provided policy configuration is invalid or missing parameters.") is part of the issue described in the linked comment.

There is also an open flannel issue for the network reset, flannel-io/flannel#1272, and the route issue is in the Golang SDK: kubernetes/client-go#374
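The ordering behind this workaround can be sketched in Go as follows. This is only an illustration of "create the network first, then start flannel": Steps, its three function fields, and StartWithExternalNetwork are hypothetical hooks that a real node script would back with HNS calls and the flanneld launch, neither of which is reproduced here.

```go
package main

import "fmt"

// Steps groups the operations the workaround depends on. All three are
// hypothetical hooks standing in for real HNS and flanneld interactions.
type Steps struct {
	NetworkExists func(name string) (bool, error)
	CreateNetwork func(name string) error
	StartFlannel  func() error
}

// StartWithExternalNetwork makes sure the named network exists before
// flannel is started, so flannel does not have to create it while it is
// already talking to the API server.
func StartWithExternalNetwork(s Steps, name string) error {
	exists, err := s.NetworkExists(name)
	if err != nil {
		return fmt.Errorf("checking network %q: %w", name, err)
	}
	if !exists {
		if err := s.CreateNetwork(name); err != nil {
			return fmt.Errorf("creating network %q: %w", name, err)
		}
	}
	return s.StartFlannel()
}

func main() {
	// Fakes that only record the order in which the steps run.
	var order []string
	s := Steps{
		NetworkExists: func(string) (bool, error) { return false, nil },
		CreateNetwork: func(string) error { order = append(order, "create"); return nil },
		StartFlannel:  func() error { order = append(order, "start"); return nil },
	}
	if err := StartWithExternalNetwork(s, "External"); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(order)
}
```

Keeping the three steps behind function fields means the ordering can be exercised with fakes on any platform, while the Windows-specific parts stay in the node setup script.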

@jsturtevant
Contributor

This is tracked in the core flannel issue flannel-io/flannel#1359; closing here.

/close

@k8s-ci-robot
Contributor

@jsturtevant: Closing this issue.

