-
Notifications
You must be signed in to change notification settings - Fork 362
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tetragon: improve how we handle cgroupv1 and cgroupv2 #3053
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
tixxdz
added
the
release-note/minor
This PR introduces a minor user-visible change
label
Oct 27, 2024
✅ Deploy Preview for tetragon ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
tixxdz
force-pushed
the
pr/tixxdz/2024-10-cgroup-fixes
branch
3 times, most recently
from
October 28, 2024 15:25
69fe615
to
17cd7bd
Compare
tixxdz
changed the title
cgroup: fixes for bpf-next
tetragon: improve how we handle cgroupv1 and cgroupv2
Oct 28, 2024
This was referenced Oct 29, 2024
olsajiri
approved these changes
Oct 29, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, and perhaps we could add test for parseCgroupv2RootControllers
kkourt
reviewed
Oct 29, 2024
tixxdz
force-pushed
the
pr/tixxdz/2024-10-cgroup-fixes
branch
from
October 30, 2024 11:20
17cd7bd
to
e3e6991
Compare
Thank you, yes I added it. |
kkourt
added
release-note/bug
This PR fixes an issue in a previous release of Tetragon.
and removed
release-note/minor
This PR introduces a minor user-visible change
labels
Oct 30, 2024
tixxdz
force-pushed
the
pr/tixxdz/2024-10-cgroup-fixes
branch
from
October 30, 2024 16:03
e3e6991
to
370c3c1
Compare
kkourt
approved these changes
Oct 31, 2024
kernel v6.11 introduced new CONFIG_MEMCG_V1 and CONFIG_CPUSETS_V1 [1] that changed how userspace detects the compiled cgroup controllers, if the new CONFIG_MEMCG_V1=n is not set then the memory and same for cpuset controller will not be exported in /proc/cgroups. Adapt our bpf infrastructure so we pass the cgroup filesystem magic number from userspace into bpf, then use it in our bpf helpers to fetch the appropriate cgroups, this allows to handle the cgroupv2 case independently from those new configs. We used to track per controller which works for both cgroupv1 and cgroupv2, and allowed to support old kernels even before 4.19, but now since upstream is adding new configs, let use the default cgroup of the cgroup subsystems state objects which is properly set under cgroupv2 unified hierarchy. This allows: * Work in case of cgroupv2 if CONFIG_MEMCG_V1='y' or 'n' both cases. * Work in case of cgroupv1 since distributions that still want to keep cgroupv1 usage must compile now with all CONFIG_CPUSETS=y, CONFIG_CPUSETS_V1=y, CONFIG_MEMCG=y and CONFIG_MEMCG_V1='y' set. [1] "af000ce85293b8e60" "cgroup: Do not report unavailable v1 controllers in /proc/cgroups" Signed-off-by: Djalal Harouni <[email protected]>
kernel v6.11 introduced new CONFIG_MEMCG_V1 and CONFIG_CPUSETS_V1 [1] that changed how userspace detects the compiled cgroup controllers, if the new CONFIG_MEMCG_V1=n is not set then the memory and same for cpuset controller will not be exported in /proc/cgroups. To better handle all these changes, from userspace if in cgroupv2 we do not pass anymore the cgroup subsystems state index of the controller we will use, but just use the default cgroup hierarchy ID from bpf side. If in cgroupv1 we continue to pass the css_set index and use it from bpf helpers. Also reduce changes by keeping somehow same logic as we are fixing a bug in TestCgroupv2ExecK8sHierarchyInUnified that was trigged by that upstream change. So we fix the bug but also keep cgroupv2 controllers discovery and we log it, this allows to know more about the environment in case of errors or during debugging. We also had some race conditions during spawning/executing a process inside its corresponding cgroup and the cgroup being reported, having as much information helps bisecting. On another note we have the test TestCgroupv1ExecK8sHierarchyInHybridMemory that tests the cgroupv1 memory controller to ensure in cgroupv1 we report appropriate cgroups and container names. [1] "af000ce85293b8e60" "cgroup: Do not report unavailable v1 controllers in /proc/cgroups" Signed-off-by: Djalal Harouni <[email protected]>
Signed-off-by: Djalal Harouni <[email protected]>
The CgroupIDFromPID() function parses the process cgroup hierarchies obtained from /proc/pid/cgroup , it was not tested on cgroupv1 however it contains a bug, let's fix this. We want the cgroup hierarchy ID since that's the one that is exposed by the kernel not the cgroup subsystem index which is used in bpf to read cgroupv1 css indexes. Signed-off-by: Djalal Harouni <[email protected]>
kernel v6.11 introduced new CONFIG_MEMCG_V1 and CONFIG_CPUSETS_V1 [1] that changed how userspace detects the compiled cgroup controllers, if the new CONFIG_MEMCG_V1=n is not set then the memory and same for cpuset controller will not be exported in /proc/cgroups. Update parseCgroupv1SubSysIds() that parses cgroup controllers to ensure that we have the memory and cpuset controllers compiled and exported, and in case of failures report that the user needs kernel configs: CONFIG_MEMCG=y and CONFIG_MEMCG_V1=y CONFIG_CPUSETS=y and CONFIG_CPUSETS_V1=y We need them to be compiled and exported in /proc/cgroups since we use the line order and number as an indication for their index in the css_set to fetch the related subsystem and its cgroup. They are compiled in configs, same for /proc/cgroups since its content is compile time generated. That index allows us to work and track the right cgroup hierarchy in cgroupv1 otherwise we will operate on the wrong hierarchy that probably does not have proper cgroup tracking. The css_set index is saved in the bpf map 'tg_conf_map' in field tg_cgrpv1_subsys_idx from user space during startup, and used later by bpf code to track processes and associate related cgroups. [1] "af000ce85293b8e60" "cgroup: Do not report unavailable v1 controllers in /proc/cgroups" Signed-off-by: Djalal Harouni <[email protected]>
tixxdz
force-pushed
the
pr/tixxdz/2024-10-cgroup-fixes
branch
from
October 31, 2024 14:49
370c3c1
to
54e27b6
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
kernel v6.11 introduced new CONFIG_MEMCG_V1 and CONFIG_CPUSETS_V1 [1]
that changed how userspace detects the compiled cgroup controllers,
if the new CONFIG_MEMCG_V1=n then memory and same for cpuset controller
will not be exported in /proc/cgroups.
In Cgroupv2 we automatically handle this with these changes. If your distribution
runs in Cgroupv1 then these changes also handle it assuming you have
both CONFIG_MEMCG=y and CONFIG_MEMCG_V1=y are set.
[1] "af000ce85293b8e60" "cgroup: Do not report unavailable v1 controllers in /proc/cgroups"
Kernel v6.11 introduced a new CONFIG_MEMCG_V1 for legacy cgroup v1 memory controller which may affect Tetragon cgroup environment detection, process and container/pod association. Improve Tetragon so we handle the new configuration and allow process container association to properly work if the distribution boots in default unified hierarchy Cgroupv2.
If the distribution boots in Hybrid Cgroupv1 mode then ensure the appropriate controllers are exported otherwise report an error and print the kernel config that are needed, probably: CONFIG_MEMCG=y, CONFIG_MEMCG_V1=y, CONFIG_CPUSETS=y and CONFIG_CPUSETS_V1=y all must be set to have proper process and container/pod association.