Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tetragon: improve how we handle cgroupv1 and cgroupv2 #3053

Merged
merged 5 commits into from
Oct 31, 2024

Conversation

tixxdz
Copy link
Member

@tixxdz tixxdz commented Oct 27, 2024

kernel v6.11 introduced new CONFIG_MEMCG_V1 and CONFIG_CPUSETS_V1 [1]
that changed how userspace detects the compiled cgroup controllers,
if the new CONFIG_MEMCG_V1=n then memory and same for cpuset controller
will not be exported in /proc/cgroups.

In Cgroupv2 we automatically handle this with these changes. If your distribution
runs in Cgroupv1 then these changes also handle it assuming you have
both CONFIG_MEMCG=y and CONFIG_MEMCG_V1=y are set.

[1] "af000ce85293b8e60" "cgroup: Do not report unavailable v1 controllers in /proc/cgroups"

Kernel v6.11 introduced a new CONFIG_MEMCG_V1 for legacy cgroup v1 memory controller which may affect Tetragon cgroup environment detection, process and container/pod association. Improve Tetragon so we handle the new configuration and allow process container association to properly work if the distribution boots in default unified hierarchy Cgroupv2.
If the distribution boots in Hybrid Cgroupv1 mode then ensure the appropriate controllers are exported otherwise report an error and print the kernel config that are needed, probably: CONFIG_MEMCG=y, CONFIG_MEMCG_V1=y, CONFIG_CPUSETS=y and CONFIG_CPUSETS_V1=y all must be set to have proper process and container/pod association.

cgroups: fix kernel  config issue introduced in 6.11. For cgroup v1, setting CONFIG_MEMCG=y, CONFIG_MEMCG_V1=y, CONFIG_CPUSETS=y and CONFIG_CPUSETS_V1=y is required for Tetragon to work.

@tixxdz tixxdz added the release-note/minor This PR introduces a minor user-visible change label Oct 27, 2024
@tixxdz tixxdz requested a review from a team as a code owner October 27, 2024 20:00
@tixxdz tixxdz requested a review from tpapagian October 27, 2024 20:00
@tixxdz tixxdz marked this pull request as draft October 27, 2024 20:00
Copy link

netlify bot commented Oct 27, 2024

Deploy Preview for tetragon ready!

Name Link
🔨 Latest commit 54e27b6
🔍 Latest deploy log https://app.netlify.com/sites/tetragon/deploys/672398f01b3b430008b25531
😎 Deploy Preview https://deploy-preview-3053--tetragon.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@tixxdz tixxdz force-pushed the pr/tixxdz/2024-10-cgroup-fixes branch 3 times, most recently from 69fe615 to 17cd7bd Compare October 28, 2024 15:25
@tixxdz tixxdz marked this pull request as ready for review October 28, 2024 15:25
@tixxdz tixxdz changed the title cgroup: fixes for bpf-next tetragon: improve how we handle cgroupv1 and cgroupv2 Oct 28, 2024
Copy link
Contributor

@olsajiri olsajiri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, and perhaps we could add test for parseCgroupv2RootControllers

pkg/cgroups/cgroups.go Outdated Show resolved Hide resolved
@tixxdz tixxdz force-pushed the pr/tixxdz/2024-10-cgroup-fixes branch from 17cd7bd to e3e6991 Compare October 30, 2024 11:20
@tixxdz
Copy link
Member Author

tixxdz commented Oct 30, 2024

lgtm, and perhaps we could add test for parseCgroupv2RootControllers

Thank you, yes I added it.

@kkourt kkourt added release-note/bug This PR fixes an issue in a previous release of Tetragon. and removed release-note/minor This PR introduces a minor user-visible change labels Oct 30, 2024
@tixxdz tixxdz force-pushed the pr/tixxdz/2024-10-cgroup-fixes branch from e3e6991 to 370c3c1 Compare October 30, 2024 16:03
kernel v6.11 introduced new CONFIG_MEMCG_V1 and CONFIG_CPUSETS_V1 [1]
that changed how userspace detects the compiled cgroup controllers,
if the new CONFIG_MEMCG_V1=n is not set then the memory and same for
cpuset controller will not be exported in /proc/cgroups.

Adapt our bpf infrastructure so we pass the cgroup filesystem magic
number from userspace into bpf, then use it in our bpf helpers to
fetch the appropriate cgroups, this allows to handle the cgroupv2
case independently from those new configs. We used to track per
controller which works for both cgroupv1 and cgroupv2, and allowed
to support old kernels even before 4.19, but now since upstream is
adding new configs, let use the default cgroup of the cgroup
subsystems state objects which is properly set under cgroupv2 unified
hierarchy.

This allows:

* Work in case of cgroupv2 if CONFIG_MEMCG_V1='y' or 'n' both cases.
* Work in case of cgroupv1 since distributions that still want to
  keep cgroupv1 usage must compile now with all
  CONFIG_CPUSETS=y, CONFIG_CPUSETS_V1=y, CONFIG_MEMCG=y and
  CONFIG_MEMCG_V1='y' set.

[1] "af000ce85293b8e60"
"cgroup: Do not report unavailable v1 controllers in /proc/cgroups"

Signed-off-by: Djalal Harouni <[email protected]>
kernel v6.11 introduced new CONFIG_MEMCG_V1 and CONFIG_CPUSETS_V1 [1]
that changed how userspace detects the compiled cgroup controllers,
if the new CONFIG_MEMCG_V1=n is not set then the memory and same for
cpuset controller will not be exported in /proc/cgroups.

To better handle all these changes, from userspace if in cgroupv2
we do not pass anymore the cgroup subsystems state index of the
controller we will use, but just use the default cgroup hierarchy
ID from bpf side.

If in cgroupv1 we continue to pass the css_set index and use it from
bpf helpers.

Also reduce changes by keeping somehow same logic as we are fixing
a bug in TestCgroupv2ExecK8sHierarchyInUnified that was trigged by
that upstream change. So we fix the bug but also keep cgroupv2
controllers discovery and we log it, this allows to know more about
the environment in case of errors or during debugging.

We also had some race conditions during spawning/executing a process
inside its corresponding cgroup and the cgroup being reported, having
as much information helps bisecting.

On another note we have the test
TestCgroupv1ExecK8sHierarchyInHybridMemory that tests the cgroupv1
memory controller to ensure in cgroupv1 we report appropriate cgroups
and container names.

[1] "af000ce85293b8e60"
"cgroup: Do not report unavailable v1 controllers in /proc/cgroups"

Signed-off-by: Djalal Harouni <[email protected]>
The CgroupIDFromPID() function parses the process cgroup hierarchies
obtained from /proc/pid/cgroup , it was not tested on cgroupv1 however
it contains a bug, let's fix this. We want the cgroup hierarchy ID
since that's the one that is exposed by the kernel not the cgroup
subsystem index which is used in bpf to read cgroupv1 css indexes.

Signed-off-by: Djalal Harouni <[email protected]>
kernel v6.11 introduced new CONFIG_MEMCG_V1 and CONFIG_CPUSETS_V1 [1]
that changed how userspace detects the compiled cgroup controllers,
if the new CONFIG_MEMCG_V1=n is not set then the memory and same for
cpuset controller will not be exported in /proc/cgroups.

Update parseCgroupv1SubSysIds() that parses cgroup controllers to
ensure that we have the memory and cpuset controllers compiled
and exported, and in case of failures report that the user needs
kernel configs:
CONFIG_MEMCG=y and CONFIG_MEMCG_V1=y
CONFIG_CPUSETS=y and CONFIG_CPUSETS_V1=y

We need them to be compiled and exported in /proc/cgroups since we
use the line order and number as an indication for their index in the
css_set to fetch the related subsystem and its cgroup. They are
compiled in configs, same for /proc/cgroups since its content is
compile time generated.

That index allows us to work and track the right cgroup hierarchy in
cgroupv1 otherwise we will operate on the wrong hierarchy that
probably does not have proper cgroup tracking.

The css_set index is saved in the bpf map 'tg_conf_map' in field
tg_cgrpv1_subsys_idx from user space during startup, and used later
by bpf code to track processes and associate related cgroups.

[1] "af000ce85293b8e60"
"cgroup: Do not report unavailable v1 controllers in /proc/cgroups"

Signed-off-by: Djalal Harouni <[email protected]>
@tixxdz tixxdz force-pushed the pr/tixxdz/2024-10-cgroup-fixes branch from 370c3c1 to 54e27b6 Compare October 31, 2024 14:49
@tixxdz tixxdz merged commit 70a9d19 into main Oct 31, 2024
48 checks passed
@tixxdz tixxdz deleted the pr/tixxdz/2024-10-cgroup-fixes branch October 31, 2024 16:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-note/bug This PR fixes an issue in a previous release of Tetragon.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants