Encrypted docker overlay network breaks on update to 35.20220116.3.0 #1111

Closed
arnegroskurth opened this issue Feb 23, 2022 · 13 comments

@arnegroskurth

Describe the bug

Updating CoreOS from 35.20220103.3.0 to 35.20220116.3.0 breaks encrypted docker overlay networking. No communication between containers on an encrypted overlay network is possible.

I've tested the following CoreOS versions, which leads me to believe that the breakage might be related to the kernel update from 5.15.7 to 5.15.17, but nothing in the kernel changelogs of the related releases stood out to me (combined changelog for all related versions for faster grepping: https://gist.github.com/arnegroskurth/ff5d0fdf9c66e0a3fc7b7f3eb4a92156).

35.20220131.3.0 -> broken
35.20220116.3.0 -> broken
35.20220103.3.0 -> works
35.20211215.3.0 -> works
35.20211203.3.0 -> works
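
The list above brackets the regression between two consecutive tested releases. A minimal Python sketch of that bisection bookkeeping follows. It assumes the version layout `<major>.<date>.<stream>.<revision>`; the names `FcosVersion`, `parse_version` and `bracket_regression` are made up for illustration.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class FcosVersion(NamedTuple):
    major: int   # Fedora major release, e.g. 35
    date: int    # snapshot date as YYYYMMDD
    stream: int  # stream digit (3 for every version listed above)
    rev: int     # revision within that snapshot


def parse_version(text: str) -> FcosVersion:
    """Split a version string into comparable integer fields (assumed layout)."""
    major, date, stream, rev = (int(part) for part in text.split("."))
    return FcosVersion(major, date, stream, rev)


# Test results copied from the list above.
RESULTS = {
    "35.20220131.3.0": "broken",
    "35.20220116.3.0": "broken",
    "35.20220103.3.0": "works",
    "35.20211215.3.0": "works",
    "35.20211203.3.0": "works",
}


def bracket_regression(results: Dict[str, str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (last working, first broken) when walking releases oldest first."""
    last_good = None
    for version in sorted(results, key=parse_version):
        if results[version] != "works":
            return last_good, version
        last_good = version
    return last_good, None


print(bracket_regression(RESULTS))
```

With the results above this reports 35.20220103.3.0 as the last working and 35.20220116.3.0 as the first broken release. Any change that shipped between those two builds is a candidate, not only the kernel.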

My most promising insight:
Using tcpdump, I can see the pings, as expected, as ESP packets (correlated in time) on the VM network interface for those versions where the communication works. However, when the VMs are created with e.g. version 35.20220116.3.0, I can see the pings as udp/vxlan/icmp packets on both the source and target VM, indicating that the ESP encapsulation does not actually happen. (Yes, docker inspect confirms that the encrypted option is set on the overlay network.)

Observed traffic on source VM over encrypted network with broken version:

# tcpdump -i any -vvnn host 192.168.14.45
...
18:00:36.053451 net0  Out IP (tos 0x0, ttl 64, id 542, offset 0, flags [none], proto UDP (17), length 134)
    192.168.14.46.45320 > 192.168.14.45.4789: [bad udp cksum 0x9e2f -> 0x6cf8!] VXLAN, flags [I] (0x08), vni 4100
IP (tos 0x0, ttl 64, id 50734, offset 0, flags [DF], proto ICMP (1), length 84)
    10.0.2.9 > 10.0.2.11: ICMP echo request, id 419, seq 827, length 64

Observed traffic on target VM over encrypted network with broken version:

# tcpdump -i any -nnvv "host 192.168.14.46"
...
17:56:36.228997 net0  In  IP (tos 0x0, ttl 64, id 10361, offset 0, flags [DF], proto UDP (17), length 77)
    192.168.14.46.7946 > 192.168.14.45.7946: [udp sum ok] UDP, length 49
17:56:36.437736 net0  In  IP (tos 0x0, ttl 64, id 10460, offset 0, flags [none], proto UDP (17), length 134)
    192.168.14.46.45320 > 192.168.14.45.4789: [udp sum ok] VXLAN, flags [I] (0x08), vni 4100
IP (tos 0x0, ttl 64, id 8557, offset 0, flags [DF], proto ICMP (1), length 84)
    10.0.2.9 > 10.0.2.11: ICMP echo request, id 419, seq 593, length 64
17:56:36.970162 net0  In  IP (tos 0x0, ttl 64, id 10642, offset 0, flags [DF], proto UDP (17), length 86)
    192.168.14.46.7946 > 192.168.14.45.7946: [udp sum ok] UDP, length 58
17:56:36.970457 net0  Out IP (tos 0x0, ttl 64, id 61257, offset 0, flags [DF], proto UDP (17), length 77)
    192.168.14.45.7946 > 192.168.14.46.7946: [bad udp cksum 0x9df6 -> 0x59b4!] UDP, length 49

For comparison, the observed traffic on the target VM over an encrypted network with a working version:

16:26:55.546208 net0  In  IP (tos 0x0, ttl 64, id 7381, offset 0, flags [none], proto ESP (50), length 160)
    192.168.14.43 > 192.168.14.42: ESP(spi=0xe70760a8,seq=0x1cf), length 140
16:26:56.546122 net0  Out IP (tos 0x0, ttl 64, id 51741, offset 0, flags [none], proto ESP (50), length 160)
    192.168.14.42 > 192.168.14.43: ESP(spi=0x1266a4b0,seq=0x1d7), length 140
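
The difference between the captures can also be checked mechanically. The following is a rough Python sketch, not a robust tcpdump parser: it only looks for the two markers visible in the output above, VXLAN to UDP port 4789 in the clear versus IP protocol 50 (ESP). The function name `classify_capture` is hypothetical.

```python
import re

# Excerpt of the capture from the broken version (copied from above).
BROKEN = """\
18:00:36.053451 net0  Out IP (tos 0x0, ttl 64, id 542, offset 0, flags [none], proto UDP (17), length 134)
    192.168.14.46.45320 > 192.168.14.45.4789: [bad udp cksum 0x9e2f -> 0x6cf8!] VXLAN, flags [I] (0x08), vni 4100
IP (tos 0x0, ttl 64, id 50734, offset 0, flags [DF], proto ICMP (1), length 84)
    10.0.2.9 > 10.0.2.11: ICMP echo request, id 419, seq 827, length 64
"""

# Excerpt of the capture from the working version (copied from above).
WORKING = """\
16:26:55.546208 net0  In  IP (tos 0x0, ttl 64, id 7381, offset 0, flags [none], proto ESP (50), length 160)
    192.168.14.43 > 192.168.14.42: ESP(spi=0xe70760a8,seq=0x1cf), length 140
"""

# A VXLAN payload addressed to UDP port 4789 that tcpdump was able to decode.
CLEAR_VXLAN_RE = re.compile(r"\.4789: .*VXLAN")


def classify_capture(capture: str) -> str:
    """Label verbose tcpdump output as plaintext VXLAN, ESP, or unknown."""
    if CLEAR_VXLAN_RE.search(capture):
        return "plaintext-vxlan"
    if "proto ESP (50)" in capture:
        return "esp"
    return "unknown"


print("broken version: ", classify_capture(BROKEN))
print("working version:", classify_capture(WORKING))
```

Plaintext VXLAN is checked first on purpose: a single unencrypted overlay packet is enough to flag the host, even if other ESP traffic is present.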

Further insight:

  • Communication over an unencrypted overlay network works as expected
  • Communication over an encrypted overlay network but to a container on the same host (=swarm node) works
  • I've mostly tested with icmp/ping, but all tests I've done over tcp failed as well

Reproduction steps

Steps to reproduce the behavior:

  1. Create two CoreOS VMs from the 35.20220116.3.0 ova image (obviously sharing a vlan/portgroup)
  2. Create a docker swarm among the two hosts
  3. Create an overlay network with --opt encrypted and --attachable
  4. Create two containers in the overlay network, one on each node
  5. Try to ping one container from the other
  6. Confirm that communication between two containers on an unencrypted overlay network does work fine
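
The steps above can be sketched as a list of Docker CLI invocations. This is a dry run that only prints the commands. The network name `enc-net`, the container names `c1`/`c2`, the `alpine` image, the node labels and the IP address are placeholders and not taken from the report; the worker token would come from `docker swarm join-token -q worker` on the first node.

```python
import shlex
from typing import List, Tuple

NETWORK = "enc-net"  # hypothetical network name
IMAGE = "alpine"     # assumption: any small image that ships ping


def reproduction_steps(manager_ip: str,
                       token: str = "<worker-token>",
                       encrypted: bool = True) -> List[Tuple[str, List[str]]]:
    """Return (node, command) pairs in the order they would be run."""
    create = ["docker", "network", "create", "--driver", "overlay", "--attachable"]
    if encrypted:
        create += ["--opt", "encrypted"]  # omit for the unencrypted control (step 6)
    create.append(NETWORK)
    run = ["docker", "run", "-d", "--network", NETWORK]
    return [
        ("node-a", ["docker", "swarm", "init", "--advertise-addr", manager_ip]),
        ("node-b", ["docker", "swarm", "join", "--token", token, manager_ip + ":2377"]),
        ("node-a", create),
        ("node-a", run + ["--name", "c1", IMAGE, "sleep", "3600"]),
        ("node-b", run + ["--name", "c2", IMAGE, "sleep", "3600"]),
        ("node-b", ["docker", "exec", "c2", "ping", "-c", "3", "c1"]),
    ]


# Dry run: print the commands instead of executing them.
for node, command in reproduction_steps("192.0.2.10"):
    print(node + "$ " + shlex.join(command))
```

Calling `reproduction_steps(..., encrypted=False)` yields the control case from step 6, which differs only in the `--opt encrypted` flag.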

Expected behavior

Communication across encrypted overlay networks works.

Actual behavior

No IP packets reach a container on another host over an encrypted overlay network.

Ignition config

It is generated by our deployment solution, from which I cannot extract it in a straightforward way. But since I can reliably reproduce the problem by changing only the CoreOS version, I'm led to believe that it is not related to the Ignition config.

@lucab
Contributor

lucab commented Feb 23, 2022

Thanks for the report and all the details!
The kernel suspicion sounds like a good one. If you are in a position where you can easily try older FCOS releases, the testing stream had an intermediate kernel bump (5.15.7 to 5.15.10 to 5.15.17, in 35.20220116.2.0 and 35.20220116.2.1) which may be interesting to try.

Also, I see there has been a SELinux policy update at the same time. Did you check if you have any new AVC or similar permission issues in the logs?

@arnegroskurth
Author

@lucab Thank you for the quick response!

I could not find any denials in the logs, but I've now noticed a difference in the loaded kernel modules: the esp4 module is loaded on 35.20220103.3.0 but not on 35.20220116.3.0. I could not find a related commit in the Linux kernel releases, so has there been a change in CoreOS related to that?
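
For comparing the two hosts, the module check can be scripted. The sketch below reads `/proc/modules`, whose first whitespace-separated field on each line is the module name; the helper names are hypothetical. Note that a driver compiled into the kernel would not show up there, so an absent entry alone is not conclusive.

```python
from pathlib import Path
from typing import Set


def loaded_modules(proc_modules_text: str) -> Set[str]:
    """Collect module names from text in /proc/modules format."""
    return {line.split()[0]
            for line in proc_modules_text.splitlines()
            if line.strip()}


def is_module_loaded(name: str, proc_modules: str = "/proc/modules") -> bool:
    """Check whether a loadable module is currently listed by the kernel."""
    return name in loaded_modules(Path(proc_modules).read_text())


try:
    print("esp4 loaded:", is_module_loaded("esp4"))
except OSError:
    # Not on Linux, or /proc is not readable in this environment.
    print("could not read /proc/modules")
```

Running it on one node from each release gives a quick yes/no answer to diff between the working and the broken version.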

@arnegroskurth
Author

Have now noticed a lot of errors like this in the docker daemon logs when the containers attached to the overlay network are started:

Feb 23 21:07:46 BK-8bfc39dad3f909c0e35d599b8f0b4c155e89924a-zone-a dockerd[968]: time="2022-02-23T21:07:46.669498335+01:00" level=warning msg="Failed Adding rSA{Dst: 192.168.14.145, Src: 192.168.14.147, Proto: esp, Mode: transport, SPI: 0x1f279ba7, ReqID: 0xd0c4e3, ReplayWindow: 0, Mark: <nil>, OutputMark: 0, Ifid: 0, Auth: <nil>, Crypt: <nil>, Aead: {Name: rfc4106(gcm(aes)), Key: 0x43821b91e82886d2f3c6bfe1afcd9bdc1f279ba7, ICV length: 64}, Encap: <nil>, ESN: false}: invalid argument"
Feb 23 21:07:46 BK-8bfc39dad3f909c0e35d599b8f0b4c155e89924a-zone-a dockerd[968]: time="2022-02-23T21:07:46.669663525+01:00" level=warning msg="Failed Adding fSA{Dst: 192.168.14.147, Src: 192.168.14.145, Proto: esp, Mode: transport, SPI: 0x755dee5b, ReqID: 0xd0c4e3, ReplayWindow: 0, Mark: <nil>, OutputMark: 0, Ifid: 0, Auth: <nil>, Crypt: <nil>, Aead: {Name: rfc4106(gcm(aes)), Key: 0x43821b91e82886d2f3c6bfe1afcd9bdc755dee5b, ICV length: 64}, Encap: <nil>, ESN: false}: invalid argument"
Feb 23 21:07:46 BK-8bfc39dad3f909c0e35d599b8f0b4c155e89924a-zone-a dockerd[968]: time="2022-02-23T21:07:46.669779362+01:00" level=warning msg="Adding fSP{{Dst: 192.168.14.147/32, Src: 192.168.14.145/32, Proto: 17, DstPort: 4789, SrcPort: 0, Dir: dir out, Priority: 0, Index: 0, Action: allow, Ifindex: 0, Ifid: 0, Mark: (0xd0c4e3,0xffffffff), Tmpls: [{Dst: 192.168.14.147, Src: 192.168.14.145, Proto: esp, Mode: transport, Spi: 0x755dee5b, Reqid: 0xd0c4e3}]}}: invalid argument"

Those errors do not appear in the daemon logs of the working version.
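
To check other nodes for the same symptom, the warnings can be counted per object kind. This is a small Python sketch that matches only the message shape seen in the log lines above; the sample lines are shortened versions of those lines, and the names `parse_xfrm_warning` and `summarize` are made up for illustration.

```python
import re
from collections import Counter
from typing import Iterable, Optional, Tuple

# Matches msg="Failed Adding rSA{...}: <error>" as well as msg="Adding fSP{{...}}: <error>".
XFRM_WARNING_RE = re.compile(
    r'msg="(?:Failed )?Adding (?P<kind>\w+)\{.*\}: (?P<error>[^"]*)"'
)


def parse_xfrm_warning(line: str) -> Optional[Tuple[str, str]]:
    """Return (kind, error) for a matching dockerd warning, else None."""
    match = XFRM_WARNING_RE.search(line)
    if match is None:
        return None
    # Tolerate trailing punctuation after the error text.
    return match.group("kind"), match.group("error").rstrip(". ")


def summarize(lines: Iterable[str]) -> Counter:
    """Count warnings per (kind, error) pair."""
    parsed = (parse_xfrm_warning(line) for line in lines)
    return Counter(item for item in parsed if item is not None)


# Shortened from the log lines above; most fields inside the braces are dropped.
SAMPLE = [
    'level=warning msg="Failed Adding rSA{Dst: 192.168.14.145, Src: 192.168.14.147, Proto: esp, Mode: transport}: invalid argument"',
    'level=warning msg="Adding fSP{{Dst: 192.168.14.147/32, Src: 192.168.14.145/32, Proto: 17, DstPort: 4789}}: invalid argument"',
]

for (kind, error), count in summarize(SAMPLE).items():
    print(kind, error, count)
```

A non-empty summary on a node points at the same failure: the daemon could not install the IPsec state or policy, so overlay traffic leaves the host unencapsulated.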

@arnegroskurth
Author

Traced the problem back to docker: moby/libnetwork#2653

@dustymabe
Member

Hey @arnegroskurth - From your investigation you are saying this is a problem introduced with kernel 5.15.17? Do you know if there is an upstream fix (i.e. any newer kernel with a fix)?

@arnegroskurth
Author

arnegroskurth commented Feb 24, 2022

> Hey @arnegroskurth - From your investigation you are saying this is a problem introduced with kernel 5.15.17? Do you know if there is an upstream fix (i.e. any newer kernel with a fix)?

I don't know if this has been rolled back in a newer kernel, but as far as I understand it, it was a conscious decision by the kernel maintainers to break the API in this way (see the linked commit in the upstream issue), probably because the old behavior could otherwise lead to undefined/unexpected results.
So from my point of view, this indeed has to be fixed in moby/libnetwork.

Edit: And yes, it has been released with 5.15.17.
Edit2: See also https://lore.kernel.org/lkml/[email protected]/T/

@Nowheresly

From what I can see, a rollback seems to be planned in the kernel. See: torvalds/linux@a3d9001

@dustymabe
Member

Thanks @Nowheresly - IIUC that should land when 5.17 hits. Hopefully we'll have a next release out soon with it.

@dustymabe dustymabe added the status/pending-next-release Fixed upstream. Waiting on a next release. label Mar 22, 2022
@dustymabe
Member

kernel-5.17.0-300.fc36 landing in next-devel as part of coreos/fedora-coreos-config#1630

This is based on Fedora 36. Development artifacts can be found soon in the unofficial builds browser.

@dustymabe
Member

The fix for this went into next stream release 36.20220325.1.0. Please try out the new release and report issues.

@dustymabe dustymabe added status/pending-testing-release Fixed upstream. Waiting on a testing release. and removed status/pending-next-release Fixed upstream. Waiting on a next release. labels Mar 30, 2022
@dustymabe
Member

kernel-5.17.4-200.fc35 is now in Fedora 35 in our testing stream.

@dustymabe
Member

The fix for this went into testing stream release 35.20220424.2.0. Please try out the new release and report issues.

@dustymabe dustymabe added status/pending-stable-release Fixed upstream and in testing. Waiting on stable release. and removed status/pending-testing-release Fixed upstream. Waiting on a testing release. labels Apr 28, 2022
@dustymabe
Member

The fix for this went into stable stream release 35.20220424.3.0.
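
The three release announcements in this thread can be turned into a quick check of whether a given node is at or past the fixed release of its stream. This is a minimal sketch; it assumes the third version field identifies the stream (1 = next, 2 = testing, 3 = stable, consistent with the releases named in this thread), and the helper names are hypothetical.

```python
from typing import Tuple

# Assumed mapping of the third version field to the stream name.
STREAM_NAMES = {1: "next", 2: "testing", 3: "stable"}

# First release per stream announced in this thread as containing the fix.
FIRST_FIXED = {
    "next": "36.20220325.1.0",
    "testing": "35.20220424.2.0",
    "stable": "35.20220424.3.0",
}


def split_version(text: str) -> Tuple[str, Tuple[int, int, int]]:
    """Return (stream name, sortable key) for a version string."""
    major, date, stream, rev = (int(part) for part in text.split("."))
    return STREAM_NAMES[stream], (major, date, rev)


def at_or_after_fix(version: str) -> bool:
    """True if the version is the fixed release of its stream or a later one."""
    stream, key = split_version(version)
    _, fixed_key = split_version(FIRST_FIXED[stream])
    return key >= fixed_key


for candidate in ("35.20220131.3.0", "35.20220424.3.0", "36.20220325.1.0"):
    print(candidate, at_or_after_fix(candidate))
```

A `False` result does not by itself mean a release is affected: stable releases up to 35.20220103.3.0 predate the regression and worked in the tests reported at the top of the issue.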

@dustymabe dustymabe removed the status/pending-stable-release Fixed upstream and in testing. Waiting on stable release. label May 11, 2022