Encrypted docker overlay network breaks on update to 35.20220116.3.0 #1111
Comments
Thanks for the report and all the details! Also, I see there has been an SELinux policy update at the same time. Did you check if you have any new AVC or similar permission issues in the logs?
@lucab Thank you for the quick response! I could not find any denials in the logs but I've now noticed a difference in the loaded kernel modules: The |
Have now noticed a lot of errors like this in the docker daemon logs when the containers attached to the overlay network are started:
Those errors do not appear in the daemon logs of the working version.
Traced the problem back to docker: moby/libnetwork#2653
Hey @arnegroskurth - From your investigation, are you saying this is a problem introduced with kernel 5.15.17? Do you know if there is an upstream fix (i.e. any newer kernel with a fix)?
I don't know if this has been rolled back in a newer kernel, but as far as I understand it, it was a conscious decision by the kernel maintainers to break the API in this way (see the linked commit in the upstream issue), probably because the previous behavior could otherwise lead to undefined/unexpected results. Edit: And yes, it has been released with 5.15.17.
From what I can see, it seems a rollback is planned in the kernel. See: torvalds/linux@a3d9001
Thanks @Nowheresly - IIUC that should land when 5.17 hits. Hopefully we'll have a |
This is based on Fedora 36. Development artifacts can be found soon in the unofficial builds browser.
The fix for this went into |
Describe the bug
Updating CoreOS from 35.20220103.3.0 to 35.20220116.3.0 breaks encrypted docker overlay networking. No communication between containers on an encrypted overlay network is possible.
I've tested the following CoreOS versions, which leads me to believe that the breakage might be related to the kernel update from 5.15.7 to 5.15.17, but nothing in the kernel changelogs of the related releases stood out to me (combined changelog for all related versions for faster grepping: https://gist.github.com/arnegroskurth/ff5d0fdf9c66e0a3fc7b7f3eb4a92156).
35.20220131.3.0 -> broken
35.20220116.3.0 -> broken
35.20220103.3.0 -> works
35.20211215.3.0 -> works
35.20211203.3.0 -> works
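The test matrix above narrows the regression to a single update step. A minimal Python sketch of that reasoning follows; the function names are illustrative and not part of any CoreOS or Docker tooling, and the data is copied from the list above.

```python
# Test matrix copied from the report: CoreOS version -> observed result.
RESULTS = {
    "35.20220131.3.0": "broken",
    "35.20220116.3.0": "broken",
    "35.20220103.3.0": "works",
    "35.20211215.3.0": "works",
    "35.20211203.3.0": "works",
}


def version_key(version):
    """Turn '35.20220116.3.0' into a tuple of ints so versions sort correctly."""
    return tuple(int(part) for part in version.split("."))


def regression_window(results):
    """Return (last working version, first broken version) in release order.

    Assumes the results flip from 'works' to 'broken' only once.
    """
    last_good = None
    for version in sorted(results, key=version_key):
        if results[version] == "works":
            last_good = version
        else:
            return last_good, version
    return last_good, None


if __name__ == "__main__":
    print(regression_window(RESULTS))
```

The window it reports is the 35.20220103.3.0 to 35.20220116.3.0 update, which is the step that carried the kernel change mentioned above.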
My most promising insight:
Using tcpdump, I can see (as expected) the pings as ESP packets, correlated in time, on the VM network interface for those versions where the communication works. However, when the VMs are created e.g. with version 35.20220116.3.0, I can see the pings as udp/vxlan/icmp packets on both the source and target VM, indicating that the ESP encapsulation does not actually happen. (Yes, docker inspect confirms that the option encrypted is set on the overlay network.)
Observed traffic on source VM over encrypted network with broken version:
Observed traffic on target VM over encrypted network with broken version:
For comparison: The observed traffic on the target VM over an encrypted network with a working version:
Further insight:
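For readers who want to repeat the tcpdump comparison described under "My most promising insight", the ESP versus plain VXLAN distinction can be checked mechanically. The sketch below is not from the report: it is a minimal Python classifier for raw IPv4 packets, assuming the overlay data path uses the standard VXLAN UDP port 4789 (Docker's default; adjust it if the swarm was set up with a different data path port). The helper make_ipv4_header is hypothetical and exists only to build sample input.

```python
import struct

IPPROTO_UDP = 17   # IP protocol number for UDP
IPPROTO_ESP = 50   # IP protocol number for IPsec ESP
VXLAN_PORT = 4789  # assumed overlay data path port (IANA VXLAN port)


def classify_ipv4(packet: bytes) -> str:
    """Classify a raw IPv4 packet as ESP, plaintext VXLAN, other UDP, or other."""
    if len(packet) < 20 or packet[0] >> 4 != 4:
        return "not-ipv4"
    header_len = (packet[0] & 0x0F) * 4  # IHL field is in 32-bit words
    protocol = packet[9]
    if protocol == IPPROTO_ESP:
        return "esp"
    if protocol == IPPROTO_UDP and len(packet) >= header_len + 4:
        _src_port, dst_port = struct.unpack("!HH", packet[header_len:header_len + 4])
        return "vxlan-plaintext" if dst_port == VXLAN_PORT else "udp"
    return "other"


def make_ipv4_header(protocol: int) -> bytes:
    """Hypothetical helper: minimal 20-byte IPv4 header (checksum left at zero)."""
    return bytes([0x45, 0, 0, 0, 0, 0, 0, 0, 64, protocol, 0, 0]) + bytes(8)


if __name__ == "__main__":
    esp = make_ipv4_header(IPPROTO_ESP)
    vxlan = make_ipv4_header(IPPROTO_UDP) + struct.pack("!HH", 40000, VXLAN_PORT) + bytes(4)
    print(classify_ipv4(esp), classify_ipv4(vxlan))
```

On a working host, traffic between the overlay peers should classify as esp; on the broken versions described here it would show up as vxlan-plaintext.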
Reproduction steps
Steps to reproduce the behavior:
35.20220116.3.0 ova image (obviously sharing a vlan/portgroup)
--opt encrypted and --attachable
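Part of the step list did not survive in this copy of the report, so the following is only a sketch of the network-creation step the surviving fragments point to. It builds the argument list for docker network create with the overlay driver and the two options named above. The network name enc-test is a placeholder, and the sketch assumes a swarm has already been initialized across the VMs.

```python
import subprocess


def overlay_create_argv(name: str) -> list:
    """Build the docker CLI arguments for an encrypted, attachable overlay network."""
    return [
        "docker", "network", "create",
        "--driver", "overlay",
        "--opt", "encrypted",  # request IPsec (ESP) encryption of the VXLAN traffic
        "--attachable",        # let standalone containers join the network
        name,
    ]


def create_overlay(name: str) -> None:
    """Run the command; requires docker and an initialized swarm on this host."""
    subprocess.run(overlay_create_argv(name), check=True)


if __name__ == "__main__":
    # "enc-test" is a placeholder network name, not taken from the report.
    print(" ".join(overlay_create_argv("enc-test")))
```

Printing the command instead of running it keeps the sketch safe to execute on a machine without docker; call create_overlay on a swarm manager to actually create the network.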
Expected behavior
Communication across encrypted overlay networks works.
Actual behavior
No IP packets reach a container on another host over an encrypted overlay network.
Ignition config
Is generated by our deployment solution from which I can not extract it in a straight-forward way. But as I can reliably reproduce the problem by only changing the CoreOS version, I'm led to believe that it is not related to the ignition config.