
policyfilter: Apply Policy Only to Specific Containers in a Pod #2231

Merged
merged 8 commits into from
Apr 9, 2024

Conversation

BonySmoke
Contributor

@BonySmoke BonySmoke commented Mar 17, 2024

This PR proposes a solution for this feature request: #1879

In general, it performs the following changes:

  • Add containerSelector to the tracing policy CRD. This section accepts the same arguments as podSelector and reuses the same label-filtering logic: we construct a map of container fields and their values and treat it exactly like a map of labels. For now, the only field in this map is name, but this can be extended (a short sketch of the idea follows this list).
  • Add e2e and unit tests to cover the new changes.
  • Add documentation to describe how this feature can be used.
  • Minor changes to the Makefile and code for spinning up the cluster. Maybe this should be performed in a different PR, but I bumped the default image version for e2e tests to v1.29.2 because I couldn't spin up a cluster with the current image version and kind v0.22.0.
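
As a rough illustration of the container-fields-as-labels idea, here is a minimal sketch. It uses the upstream k8s.io/apimachinery selector types rather than the slim k8s types Tetragon's CRD actually uses, and the containerFields helper and container names are made up for the example:

```go
package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
)

// containerFields builds a label-like map from container fields.
// Only "name" is included for now, mirroring the PR description.
func containerFields(containerName string) labels.Set {
	return labels.Set{"name": containerName}
}

func main() {
	// A containerSelector that only matches the "sidecar" container.
	sel := &metav1.LabelSelector{
		MatchLabels: map[string]string{"name": "sidecar"},
	}

	s, err := metav1.LabelSelectorAsSelector(sel)
	if err != nil {
		panic(err)
	}

	for _, name := range []string{"app", "sidecar"} {
		fmt.Printf("container %q matches: %v\n", name, s.Matches(containerFields(name)))
	}
}
```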

This PR also includes files generated with make crds.
Documentation has been tested with make docs.

However, while running e2e tests and unit tests I faced a couple of issues (likely not related to these changes):

  • e2e tests couldn't always create a cluster. For example, for tests/e2e/tests/labels, it looks like the repository is not available and returns 403. I didn't investigate these issues but targeted only tests/e2e/tests/policyfilter since this is what I changed.
I0317 13:32:30.031871  189507 helm.go:241] "Running Helm Operation" command="helm repo  add isovalent https://helm.isovalent.com --kubeconfig /tmp/kind-cluser-tetragon-ci-1988-kubecfg1161878701"
  • unit tests were taking too long in my case. However, I tested all the changes in pkg/policyfilter and pkg/sensors.
policyfilter: add a containerSelector that allows filtering policies by container name

@BonySmoke BonySmoke requested review from mtardy and a team as code owners March 17, 2024 11:48

netlify bot commented Mar 17, 2024

Deploy Preview for tetragon ready!

🔨 Latest commit: 3f1959e
🔍 Latest deploy log: https://app.netlify.com/sites/tetragon/deploys/661116789ecb0d000808848f
😎 Deploy Preview: https://deploy-preview-2231--tetragon.netlify.app

@lambdanis lambdanis added the release-note/major This PR introduces major new functionality label Mar 17, 2024
Contributor

@lambdanis lambdanis left a comment


Thank you for the PR @BonySmoke. I only did a quick pass, but it looks good overall.

The CI failures pointed out some required changes:

  • When updating CRDs, the version here has to be bumped too
  • It seems (*podInfo).cgroupIDs is unused

Two other failures are known CI issues: #2010 and #2233, so ignore them.

@BonySmoke
Contributor Author

@lambdanis thank you for this quick review!
I would like to clarify something before pushing the changes:

  1. Can I bump the CRD version to 1.2.0? I see that previously people mostly bumped the patch version even when the change wasn't a bug fix.
  2. You are totally right: after my changes it's no longer used. However, I don't want to delete this code because it could be useful in the future :)
    Do you think I should delete this function, or can I somehow tell the linter to ignore it?

@lambdanis
Contributor

  1. Can I bump the CRD version to 1.2.0? I see that previously people mostly bumped the patch version even when the change wasn't a bug fix.

The idea is that the CRD minor version matches the Tetragon minor version. So yeah, it looks a bit weird that only the patch version is bumped for new features, but these version bumps are mostly for development/testing purposes. Users update CRDs when upgrading Tetragon, so they will always move to the next minor version anyway.

  1. You are totally right: after my changes it's no longer used. However, I don't want to delete this code because it could be useful in the future :)
    Do you think I should delete this function, or can I somehow tell the linter to ignore it?

Do you have an example of what it could be useful for? In general we try to avoid unused code, because hypothesizing about what will be useful in the future often ends at hypothesizing, not actual usage. One option is to add the code in one commit and delete it in another; then the code stays in the git history for future reference. If the code could really be useful in other packages, you can just export it, as the linter won't complain about exported symbols. But again, we can't predict what will be useful in the future, so I'd avoid doing that.

@kkourt kkourt self-assigned this Mar 18, 2024
@BonySmoke
Contributor Author

@lambdanis Hello! Thank you for clarifying these points!

  1. I bumped the CRD version to 1.1.9.
  2. Your explanation makes perfect sense. I don't have a good example where this unused code could be useful; therefore, I deleted it.

Contributor

@kkourt kkourt left a comment


Thanks!

I had a quick look and the overall approach makes sense to me. Please find some comments below.

Resolved review threads: Makefile, pkg/policyfilter/k8s_test.go, pkg/policyfilter/state.go
@BonySmoke BonySmoke force-pushed the pr/BonySmoke/add-container-selector branch from 7d3fdda to 32b2e84 Compare March 19, 2024 17:49
@kkourt
Contributor

kkourt commented Mar 20, 2024

Thanks! Started the CI!

@kkourt
Contributor

kkourt commented Mar 20, 2024

The Check CRD version test doesn't seem to work properly for external branches:

fatal: bad revision 'origin/pr/BonySmoke/add-container-selector'

We need to fix this, but it shouldn't block this PR.

@BonySmoke
Contributor Author

@kkourt Hello! Thank you for this update!
I would like to ask you something as part of the review. I probably should've included it in the PR description but I didn't; sorry for that 😞
There is this pkg/policyfilter/rthooks/rthooks.go file that registers new containers. You added it in this commit.

The purpose of the callback is to set up the policyfilter state before the container starts.

While adding container filtering, I needed to get container information too (to extract the name). Therefore, after finding the pod, I added the logic for finding the container.
I wanted to confirm with you that this is the correct approach. There was no existing test for this function, so I tried different scenarios in a live cluster and everything looked fine, but I'd still like your confirmation (mainly because I don't fully understand the concept of rthooks 🥲).

@kkourt
Contributor

kkourt commented Mar 20, 2024

@kkourt Hello! Thank you for this update! I would like to ask you something as part of the review. I probably should've included it in the PR description but I didn't; sorry for that 😞 There is this pkg/policyfilter/rthooks/rthooks.go file that registers new containers. You added it in this commit.

The purpose of the callback is to set up the policyfilter state before the container starts.

While adding container filtering, I needed to get container information too (to extract the name). Therefore, after finding the pod, I added the logic for finding the container. I wanted to confirm with you that this is the correct approach. There was no existing test for this function, so I tried different scenarios in a live cluster and everything looked fine, but I'd still like your confirmation (mainly because I don't fully understand the concept of rthooks 🥲).

Thanks for the heads-up!
I'll take a look.

In the meantime, you might want to have a look at https://tetragon.io/docs/concepts/tracing-policy/k8s-filtering/. You can also use cri-o (it's actually a bit easier): https://github.com/cilium/tetragon/blob/main/contrib/rthooks/tetragon-oci-hook/docs/demo.md.

Resolved review thread (outdated): pkg/policyfilter/rthooks/rthooks.go
@BonySmoke BonySmoke force-pushed the pr/BonySmoke/add-container-selector branch from 405531f to 2a2f64d Compare April 2, 2024 07:04
@BonySmoke BonySmoke requested a review from kkourt April 2, 2024 07:12
@BonySmoke BonySmoke force-pushed the pr/BonySmoke/add-container-selector branch 2 times, most recently from 9a695c3 to 85b8663 Compare April 3, 2024 14:56
@BonySmoke BonySmoke force-pushed the pr/BonySmoke/add-container-selector branch 2 times, most recently from 3904aea to 82e5a59 Compare April 4, 2024 10:21
@BonySmoke BonySmoke requested a review from lambdanis April 4, 2024 10:25
@BonySmoke BonySmoke force-pushed the pr/BonySmoke/add-container-selector branch from 82e5a59 to 2a7bd68 Compare April 4, 2024 15:07
Contributor

@lambdanis lambdanis left a comment


Looks good to me now, thanks!

I see there is a conflict in rthooks, hopefully not painful. Could you rebase and resolve it?

// A map of container fields will be constructed in the same way as a map of labels.
// The name of the field represents the label "key", and the value of the field - label "value".
// Currently, only the "name" field is supported.
ContainerSelector *slimv1.LabelSelector `json:"containerSelector,omitempty"`
Contributor

@lambdanis lambdanis Apr 5, 2024


I was confused at first by LabelSelector, as there are no labels involved, but I guess by reusing it we get MatchExpressions in addition to exact name match essentially for free? This seems like a good enough reason to reuse LabelSelector, just checking my understanding.

Contributor Author


Thank you for the review!

I see there is a conflict in rthooks, hopefully not painful. Could you rebase and resolve it?

Sure, it was caused by the addition of the error label to the policyfilter_metrics_total metric. I resolved it :)


I was confused at first by LabelSelector, as there are no labels involved, but I guess by reusing it we get MatchExpressions in addition to exact name match essentially for free?

You are totally right! I modeled container fields the same way as labels (a map of key-value pairs) and therefore decided to reuse the existing filtering logic. As you said, essentially for free 😅
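
To make the "matchExpressions for free" point concrete, here is a minimal sketch. It again uses the upstream k8s.io/apimachinery types rather than the slim ones the CRD actually uses, and the container names are invented for the example:

```go
package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
)

func main() {
	// Match any container whose name is in a set: this comes "for free"
	// from the label-selector machinery via matchExpressions.
	sel := &metav1.LabelSelector{
		MatchExpressions: []metav1.LabelSelectorRequirement{
			{
				Key:      "name",
				Operator: metav1.LabelSelectorOpIn,
				Values:   []string{"app", "metrics-exporter"},
			},
		},
	}

	s, err := metav1.LabelSelectorAsSelector(sel)
	if err != nil {
		panic(err)
	}

	for _, name := range []string{"app", "sidecar"} {
		// The container "fields" map contains just the name, treated as a label.
		fmt.Printf("container %q matches: %v\n", name, s.Matches(labels.Set{"name": name}))
	}
}
```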

When running Tetragon in Kubernetes, it's possible to filter the pods
that a policy applies to by pod labels and namespaces.
This change adds support for filtering by container name inside the pod,
and potentially by other container fields in the future.

The filtering happens in the "containerMatches" method. We construct a
map of key-value pairs that represent different fields of the container.
Then, we apply the same label filtering as in the "podMatches" method.
At the moment, only the "name" field is supported.

Since we are dealing with multiple containers inside a pod and we only
need their cgroup IDs to add to the policyfilter map, the
"matchingContainersCgroupIDs" method was added. It iterates over a slice
of containers, finds matching containers using "containerMatches", and
returns their cgroup IDs. This method is used for all operations where
we need to change cgroup IDs in the policyfilter map, including applying
a policy diff, adding a new policy, etc.

This patch makes the following changes:

1. Adds the containerSelector field to the policyfilter package.
2. Updates CRD schema for tracing policies with containerSelector.
3. Bumps the CRD version.

Fixes: cilium#1879

Signed-off-by: Oleh Neichev <[email protected]>
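
A minimal sketch of the matching flow described in the commit message above. The containerInfo type, the function signatures, and the use of upstream apimachinery selectors are illustrative assumptions, not Tetragon's actual policyfilter code:

```go
package policyfilterexample

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
)

// containerInfo is a hypothetical stand-in for the per-container data the
// policyfilter state keeps (name plus cgroup ID).
type containerInfo struct {
	name     string
	cgroupID uint64
}

// containerMatches mirrors the idea in the commit message: build a
// label-like map from container fields and reuse label matching.
// Only the "name" field is included.
func containerMatches(sel labels.Selector, c containerInfo) bool {
	return sel.Matches(labels.Set{"name": c.name})
}

// matchingContainersCgroupIDs returns the cgroup IDs of the containers that
// match the selector; these are the IDs that would go into the policyfilter map.
func matchingContainersCgroupIDs(ls *metav1.LabelSelector, containers []containerInfo) ([]uint64, error) {
	// A nil selector matches everything, like an absent containerSelector.
	if ls == nil {
		ids := make([]uint64, 0, len(containers))
		for _, c := range containers {
			ids = append(ids, c.cgroupID)
		}
		return ids, nil
	}

	sel, err := metav1.LabelSelectorAsSelector(ls)
	if err != nil {
		return nil, err
	}

	var ids []uint64
	for _, c := range containers {
		if containerMatches(sel, c) {
			ids = append(ids, c.cgroupID)
		}
	}
	return ids, nil
}
```
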
This change adds multiple unit tests to cover the addition of containerSelector in the
policyfilter.

Fixes: cilium#1879

Signed-off-by: Oleh Neichev <[email protected]>
This patch generates the CRDs with support for containerSelector.

Fixes: cilium#1879

Signed-off-by: Oleh Neichev <[email protected]>
This patch performs the following changes:

1. Bump the default image version for e2e tests to v1.29.2. The current
image doesn't work with kind v0.22.0.
2. Remove extra whitespace from the Makefile.

Signed-off-by: Oleh Neichev <[email protected]>
This change adds an integration test for the containerSelector section
in the tracing policy.

Fixes: cilium#1879

Signed-off-by: Oleh Neichev <[email protected]>
This change describes how to use the container selector in tracing
policies. Also, this change renames the "K8s namespace and pod label
filtering" page to "K8s Policy Filtering" to make the name more generic.

Fixes: cilium#1879

Signed-off-by: Oleh Neichev <[email protected]>
We cannot use arg.Watcher.FindContainer() because it uses the k8s API,
where the container is not yet available.

Instead, we extract the name of the container from arg.Req.ContainerName.
If the name is not found, we do not abort the hook, because we can still
do other types of filtering, e.g. by pod labels.

Fixes: cilium#1879

Signed-off-by: Oleh Neichev <[email protected]>
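
A sketch of the fallback described in this commit; the request type and field names below are placeholders, not Tetragon's real rthooks API:

```go
package rthooksexample

import "log"

// createContainerRequest is a placeholder for the data the OCI hook receives;
// Tetragon's real request type differs.
type createContainerRequest struct {
	ContainerName string
	CgroupID      uint64
}

// containerName takes the name straight from the hook request. Looking the
// container up through the k8s API at this point would fail, because the
// container is not registered there yet.
func containerName(req createContainerRequest) string {
	return req.ContainerName
}

// createContainerHook sketches the behaviour described in the commit: a
// missing container name is logged but does not abort the hook, so that
// filtering by pod labels or namespace keeps working.
func createContainerHook(req createContainerRequest) error {
	name := containerName(req)
	if name == "" {
		log.Printf("container name not found in hook request, continuing with other filters")
	}

	// ... register the container (name + cgroup ID) with the policyfilter state ...
	return nil
}
```
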
After adding support for filtering policies by container name, we
decided not to abort the OCI hook when the container name is missing for
some reason, so as not to break other filtering methods like pod labels.
However, we still want to monitor the cases where the container name is
missing.

This patch does that by adding a new "policyfilter_hook_container_name_missing_total" metric.
The counter is increased when the container name cannot be found in
the "createContainerHook" function.

Besides, this patch adds a missing return statement for the case when
adding a container to a pod from the OCI hook fails and we inform the
user that we are aborting the hook.
In order to still increase the counter upon error, we run the counter
logic before checking the error.

Fixes: cilium#1879

Signed-off-by: Oleh Neichev <[email protected]>
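
A sketch of the counter-plus-return behaviour described above, using the standard Prometheus Go client. Only the metric name is taken from the commit message; the registration and function shapes are simplified and illustrative:

```go
package rthooksmetricsexample

import (
	"errors"
	"log"

	"github.com/prometheus/client_golang/prometheus"
)

// The metric name comes from the commit message; registration is simplified
// compared to Tetragon's metrics packages.
var containerNameMissing = prometheus.NewCounter(prometheus.CounterOpts{
	Name: "policyfilter_hook_container_name_missing_total",
	Help: "Number of OCI hook calls where the container name was missing.",
})

func init() {
	prometheus.MustRegister(containerNameMissing)
}

// addContainerToPod is a placeholder for adding the container to the
// policyfilter state; it can fail.
func addContainerToPod(name string, cgroupID uint64) error {
	if cgroupID == 0 {
		return errors.New("invalid cgroup id")
	}
	return nil
}

// createContainerHook bumps the missing-name counter before doing anything
// else, then adds the container and, unlike before this patch, actually
// returns when that fails so the hook is aborted.
func createContainerHook(name string, cgroupID uint64) error {
	if name == "" {
		containerNameMissing.Inc()
	}

	err := addContainerToPod(name, cgroupID)
	// Any bookkeeping that must happen even on failure (e.g. operation
	// counters) runs here, before the error check.
	if err != nil {
		log.Printf("failed to add container to pod, aborting hook: %v", err)
		return err
	}
	return nil
}
```
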
@BonySmoke BonySmoke force-pushed the pr/BonySmoke/add-container-selector branch from 2a7bd68 to 3f1959e Compare April 6, 2024 09:31
@BonySmoke BonySmoke requested a review from lambdanis April 6, 2024 09:38
Contributor

@lambdanis lambdanis left a comment


Looks good to me, thanks for the PR and all the fixes!

Contributor

@kkourt kkourt left a comment


This looks great to me!

Thanks for all your work, this is a great PR!

Resolved review thread (outdated): pkg/policyfilter/rthooks/rthooks.go
@kkourt kkourt merged commit 078e2de into cilium:main Apr 9, 2024
42 of 43 checks passed