This repository has been archived by the owner on Jan 12, 2023. It is now read-only.

SPIKE: Operator to evict tainted pods - issue 18 #47

Closed

Conversation

alpe
Contributor

@alpe alpe commented Jan 21, 2020

🚧 SPIKE - do not merge

See #18 for discussions.

This is a demo of a standalone k8s operator that runs within the cluster to find and evict tainted pods.
The code is based on the workqueue example.

The workqueue design comes with a learning curve, but once you get used to it, it is very nice. I kept most comments and functions from the original example.

A good place to start reading the code is controller.evictPod, then main.go.

Disclaimer: It compiles but I have not tested or run it locally.
Update:
I have started to implement this.

Todo:

  • Namespaces other than default (all)
  • Remove tainted label and annotations when evicted
    Happens automatically on restart
  • Makefile and Dockerfile
  • Leader lock for running multiple instances
  • Investigate why "First Timestamp" is empty on minikube
  • Update helm charts
  • Unit test as far as possible

@dustin-decker dustin-decker left a comment

This looks like a great start, thanks for spiking on this. We may find other uses for the operator down the road as well. I left a few suggestions.

Review threads on cmd/evicter/main.go (resolved)

const (
	annotationPreventEviction = "k-rails/tainted-prevent-eviction"
	annotationTimestamp       = "k-rails/tainted-timestamp"
)
Contributor

I think we might want to add an eviction reason annotation too
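The suggested reason annotation might look like the sketch below. The key name k-rails/tainted-reason and the taintPod helper are hypothetical, chosen to mirror the two annotation keys already in the diff; only those two existing keys come from the PR.

```go
package main

import "fmt"

// The two annotation keys from this PR, plus a hypothetical "reason"
// key as suggested in this review comment.
const (
	annotationPreventEviction = "k-rails/tainted-prevent-eviction"
	annotationTimestamp       = "k-rails/tainted-timestamp"
	annotationReason          = "k-rails/tainted-reason" // hypothetical name
)

// taintPod records why a pod was tainted alongside the timestamp, so a
// later eviction event could carry a human-readable reason.
func taintPod(annotations map[string]string, timestamp, reason string) {
	annotations[annotationTimestamp] = timestamp
	annotations[annotationReason] = reason
}

func main() {
	ann := map[string]string{}
	taintPod(ann, "1580000000", "exec into container")
	fmt.Println(ann[annotationReason]) // exec into container
}
```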

Contributor Author

What workflows other than #9 do you have in mind for the tainted flag?

Contributor

Nothing particular in mind right now, but there may be some use cases in the future.

Contributor

A use case came to mind: an image's package or dependency scan results contain CVE-2019-XXXXX.

Contributor Author

Thanks, this is a good scenario to discuss. If we flag the pod as tainted, we could keep an 👀 on it in monitoring. Having the operator redeploy the pod would not fix the vulnerability, though it does clear transient state, which is not a bad thing.
This reminds me that this spike does not remove the label + annotations before calling evict; I will add that to the not-implemented list.

Contributor

As long as the label + annotation is added only to the Pod, I think it should be cleared after the Pod is evicted and a new Pod replaces it.
Should the taint reason be a label as well, so it's easy to filter?

Contributor Author

The "reason" as a label could create a lot of noise in upstream systems like monitoring. Although reasons are currently very limited, somebody may extend them to arbitrary text in the future. If you are thinking about scaling, we can run multiple instances of the operator with some sharding.
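The sharding mentioned here could be as simple as the coordination-free sketch below: each replica hashes the pod's UID and only processes pods that map to its own index. This is purely illustrative and not part of this PR; the function name and scheme are assumptions.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor sketches coordination-free sharding across operator replicas:
// hash the pod UID and take it modulo the replica count. Each replica is
// configured with its own index and skips pods assigned to other shards.
func shardFor(podUID string, replicas int) int {
	h := fnv.New32a()
	h.Write([]byte(podUID))
	return int(h.Sum32()) % replicas
}

func main() {
	for _, uid := range []string{"pod-a", "pod-b", "pod-c"} {
		fmt.Printf("%s -> replica %d\n", uid, shardFor(uid, 3))
	}
}
```

Note this splits load but does not provide HA by itself; a crashed replica's shard goes unprocessed until it restarts, which is why a leader lock is also on the todo list.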

@alpe
Contributor Author

alpe commented Jan 27, 2020

How do you want to proceed with this task? Go this way with an operator or try something else?

In terms of priorities, I assume issue #9 should be implemented first.

@dustin-decker
Contributor

I think you've shown that the operator approach is a fine way to go and worth continuing on. From the user perspective, it's an implementation detail and deployment will be just as easy as it was before.

I don't think order is too important for #9, but it should be a quick and easy policy. There is already a PodExec resource extractor used by the No Exec policy.

@alpe
Contributor Author

alpe commented Jan 28, 2020

it's an implementation detail and deployment will be just as easy as it was before.

I had a standalone operator in mind to separate concerns: a different binary, Dockerfile, and deployment, plus a separate service account so permissions aren't mixed. This adds some complexity to the system and build process, though.
🤔 An embedded version would be doable too, but since the admission controller requires HA, we would have to ship it with a leader lock and accept some complexity there.
My personal preference is standalone.

@dustin-decker
Contributor

I think standalone is the right choice; I just meant that the included helm chart could roll it out as a separate Deployment in the namespace. If we do that, the installation won't be any more difficult.
To keep builds simple, it might be best to include the k-rail and k-rail-operator binaries in the same container image and just change the entrypoint for each Deployment. I don't think Docker Hub automated builds support multiple Dockerfiles.
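The single-image idea could look roughly like this. A sketch under stated assumptions: cmd/evicter appears in this PR, but the cmd/k-rail build path, the binary names, and the base images are guesses for illustration.

```dockerfile
# Hypothetical multi-binary image: both binaries ship in one image and
# each Deployment overrides the command to pick the one it needs.
FROM golang:1.13 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /out/k-rail ./cmd/k-rail && \
    CGO_ENABLED=0 go build -o /out/k-rail-evicter ./cmd/evicter

FROM gcr.io/distroless/static
COPY --from=build /out/ /
# Default to the admission controller; the operator Deployment would set
# command: ["/k-rail-evicter"] in its pod spec instead.
ENTRYPOINT ["/k-rail"]
```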

@alpe alpe force-pushed the alex/spike_evict_operator_18 branch from c57f770 to e7c0cac Compare January 31, 2020 12:54
@alpe alpe force-pushed the alex/spike_evict_operator_18 branch from e7c0cac to 87f2bf8 Compare January 31, 2020 12:55
@alpe alpe mentioned this pull request Feb 3, 2020
@alpe
Contributor Author

alpe commented Feb 3, 2020

Closing this in favour of #54

@alpe alpe closed this Feb 3, 2020