Failing to attach containers to encrypted overlay network since linux 5.15.17 #2653

Open
arnegroskurth opened this issue Feb 23, 2022 · 4 comments

Comments

@arnegroskurth

(I phrased this issue for moby/moby before realizing that this is a separate component, so sorry for the Docker-based description.)

Description

It's currently not possible to communicate over encrypted overlay networks with kernel 5.15.17 due to an unset interface ID when configuring the IPsec tunnel.

Downstream issue: coreos/fedora-coreos-tracker#1111

Steps to reproduce the issue:

With two linux 5.15.17 hosts: Create an encrypted overlay network in a swarm and try to communicate between two containers on different nodes attached to that overlay network.

Additional information you deem important (e.g. issue happens only occasionally):

related linux change: torvalds/linux@68ac0f3810e7
potential workaround in netlink library: vishvananda/netlink#727

Missing Ifid in the netlink.XfrmPolicy struct (there may be more occurrences):

fPol := &netlink.XfrmPolicy{

@arnegroskurth

Also: Does it really make sense to log the failure to create the xfrm policies only as a warning? It seems the network (attachment) is not usable without that policy, so I would much rather expect an error to appear in the Docker client when creating or starting a container.
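To illustrate the suggestion, here is a minimal sketch of returning the failure to the caller instead of only logging it. All names (`addPolicy`, `programEncryption`, `errPolicyRejected`) are hypothetical stand-ins and are not taken from libnetwork; the real code path may look quite different.

```go
package main

import (
	"errors"
	"fmt"
)

// errPolicyRejected simulates the kernel rejecting an xfrm policy (hypothetical).
var errPolicyRejected = errors.New("xfrm policy rejected")

// addPolicy is a hypothetical stand-in for the call that installs an xfrm policy.
// It fails when kernelAccepts is false.
func addPolicy(kernelAccepts bool) error {
	if !kernelAccepts {
		return errPolicyRejected
	}
	return nil
}

// programEncryption wraps and returns the policy failure so that the caller
// (for example, the container attach path) can abort instead of continuing
// with a network that cannot carry traffic.
func programEncryption(kernelAccepts bool) error {
	if err := addPolicy(kernelAccepts); err != nil {
		return fmt.Errorf("programming xfrm policy: %w", err)
	}
	return nil
}

func main() {
	if err := programEncryption(false); err != nil {
		fmt.Println("attach failed:", err)
	}
}
```

With this shape, the failure would reach whoever started the attachment rather than ending up only in the daemon log.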

@jsmouret

Similar issue on Debian Buster, with the same logs as in coreos/fedora-coreos-tracker#1111 (comment)

Working with linux-image-4.19.0-18-amd64
Broken with linux-image-4.19.0-19-amd64

@Nowheresly

Related to this issue:

moby/moby#43359 (comment)

@smin

smin commented May 16, 2022

The Ubuntu kernels don't seem to have reverted the validation that the XFRM IF_ID must be > 0. Corrections to the original patch have been included in the latest linux-aws-5.13, which could be read as an indication that the validation is staying (https://launchpad.net/ubuntu/+source/linux-aws-5.13/5.13.0-1023.25~20.04.1, https://launchpad.net/bugs/1968591).

What's the appropriate change in Moby or libnetwork?

  1. Pass a non-zero Ifid to the netlink call?
  2. Patch the netlink library to include the changes from "Only set XFRMA_IF_ID if not 0" (vishvananda/netlink#727)?
  3. Update the netlink library in vendor.conf to a newer release that includes the PR above?
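For reference, the idea behind option 2 ("only set XFRMA_IF_ID if not 0") can be sketched roughly as below. This is a simplified model, not the actual code from vishvananda/netlink#727: the `policy` and `attr` types and the `policyAttrs` function are hypothetical stand-ins, and real netlink attributes are serialized bytes rather than name/value pairs.

```go
package main

import "fmt"

// attr is a hypothetical stand-in for a netlink attribute (name plus value).
type attr struct {
	Name  string
	Value uint32
}

// policy is a hypothetical, reduced stand-in for netlink.XfrmPolicy.
type policy struct {
	Ifid uint32
}

// policyAttrs returns the optional attributes to send for p.
// XFRMA_IF_ID is appended only when Ifid is non-zero, so a policy without
// an interface ID never carries an explicit zero value that a kernel
// validating IF_ID > 0 would reject.
func policyAttrs(p policy) []attr {
	var attrs []attr
	if p.Ifid != 0 {
		attrs = append(attrs, attr{Name: "XFRMA_IF_ID", Value: p.Ifid})
	}
	return attrs
}

func main() {
	fmt.Println(len(policyAttrs(policy{Ifid: 0}))) // attribute omitted
	fmt.Println(len(policyAttrs(policy{Ifid: 7}))) // attribute included
}
```

Under this approach callers that never set Ifid would not need to change, which is presumably why it is attractive compared with option 1.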
