This repository has been archived by the owner on Jan 12, 2023. It is now read-only.

SPIKE: Operator to evict tainted pods - issue 18 #47

Closed

Conversation

alpe
Contributor

@alpe alpe commented Jan 21, 2020

🚧 SPIKE - do not merge

See #18 for discussions.

This is a demo of a standalone k8s operator that runs within the cluster to find and evict tainted pods.
The code is based on the workqueue example.

The workqueue design comes with a learning curve, but once you get used to it, it is very nice. I kept most comments and functions from the original example.

A good place to start reading the code is controller.evictPod, then main.go.

Disclaimer: It compiles but I have not tested or run it locally.
Update:
I have started to implement this.

Todo:

  • Namespaces other than default (all)
  • Remove tainted label and annotations when evicted
    Happens automatically on restart
  • Makefile and Dockerfile
  • Leader lock for running multiple instances
  • Investigate why "First Timestamp" is empty on minikube
  • Update helm charts
  • Unit test as far as possible

@dustin-decker dustin-decker left a comment

This looks like a great start, thanks for spiking on this. We may find other uses for the operator down the road as well. I left a few suggestions.

Review threads on cmd/evicter/main.go (resolved)

const (
	annotationPreventEviction = "k-rails/tainted-prevent-eviction"
	annotationTimestamp       = "k-rails/tainted-timestamp"
)
Contributor

I think we might want to add an eviction reason annotation too
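The suggested reason annotation might look like the sketch below. The key name k-rails/tainted-reason and the taintPod helper are hypothetical, chosen to mirror the two annotation keys already in the diff; only those two existing keys come from the PR.

```go
package main

import "fmt"

// The two annotation keys from this PR, plus a hypothetical "reason"
// key as suggested in this review comment.
const (
	annotationPreventEviction = "k-rails/tainted-prevent-eviction"
	annotationTimestamp       = "k-rails/tainted-timestamp"
	annotationReason          = "k-rails/tainted-reason" // hypothetical name
)

// taintPod records why a pod was tainted alongside the timestamp, so a
// later eviction event could carry a human-readable reason.
func taintPod(annotations map[string]string, timestamp, reason string) {
	annotations[annotationTimestamp] = timestamp
	annotations[annotationReason] = reason
}

func main() {
	ann := map[string]string{}
	taintPod(ann, "1580000000", "exec into container")
	fmt.Println(ann[annotationReason]) // exec into container
}
```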

Contributor Author

What workflows other than #9 do you have in mind for the tainted flag?

Contributor

Nothing particular in mind right now, but there may be some use cases in the future.

Contributor

A use case came to mind: an image's package or dependency scan results contain CVE-2019-XXXXX.

Contributor Author

Thanks, this is a good scenario to discuss. If we flag the pod as tainted, we could keep an 👀 on it in monitoring. Having the operator redeploy the pod would not fix the vulnerability, though it does clear transient state, which is not a bad thing.
This reminds me that this spike does not remove the label + annotations before calling evict; I will add that to the not-implemented list.

Contributor

As long as the label + annotation is added only to the Pod, I think it should be cleared after the Pod is evicted and a new Pod replaces it.
Should the taint reason be a label as well, so it's easy to filter?

Contributor Author

The "reason" as a label could create a lot of noise in upstream systems like monitoring. Although reasons are currently very limited, somebody may extend them to arbitrary text in the future. If you are thinking about scaling, we can run multiple instances of the operator with some sharding.
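The sharding mentioned here could be as simple as the coordination-free sketch below: each replica hashes the pod's UID and only processes pods that map to its own index. This is purely illustrative and not part of this PR; the function name and scheme are assumptions.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor sketches coordination-free sharding across operator replicas:
// hash the pod UID and take it modulo the replica count. Each replica is
// configured with its own index and skips pods assigned to other shards.
func shardFor(podUID string, replicas int) int {
	h := fnv.New32a()
	h.Write([]byte(podUID))
	return int(h.Sum32()) % replicas
}

func main() {
	for _, uid := range []string{"pod-a", "pod-b", "pod-c"} {
		fmt.Printf("%s -> replica %d\n", uid, shardFor(uid, 3))
	}
}
```

Note this splits load but does not provide HA by itself; a crashed replica's shard goes unprocessed until it restarts, which is why a leader lock is also on the todo list.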

@alpe
Contributor Author

alpe commented Jan 27, 2020

How do you want to proceed with this task? Go this way with an operator or try something else?

In terms of priorities, I assume issue #9 should be implemented first.

@dustin-decker
Contributor

I think you've shown that the operator approach is a fine way to go and worth continuing on. From the user perspective, it's an implementation detail and deployment will be just as easy as it was before.

I don't think order is too important for #9, but it should be a quick and easy policy. There is already a PodExec resource extractor used by the No Exec policy.

@alpe
Contributor Author

alpe commented Jan 28, 2020

it's an implementation detail and deployment will be just as easy as it was before.

I had a standalone operator in mind to separate concerns: a different binary, Dockerfile, and deployment, plus a separate service account so permissions aren't mixed. This adds some complexity to the system and build process, though.
🤔 An embedded version would be doable too, but since the admission controller requires HA, we would have to ship it with a leader lock and accept some complexity there.
My personal preference is standalone.

@dustin-decker
Contributor

I think standalone is the right choice; I just meant that the included helm chart could roll it out as a separate Deployment in the namespace. If we do that, the installation won't be any more difficult.
To keep builds simple, it might be best to include the k-rail and k-rail-operator binaries in the same container image and just change the entrypoint for each Deployment. I don't think Docker Hub automated builds support multiple Dockerfiles.
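The single-image idea could look roughly like this. A sketch under stated assumptions: cmd/evicter appears in this PR, but the cmd/k-rail build path, the binary names, and the base images are guesses for illustration.

```dockerfile
# Hypothetical multi-binary image: both binaries ship in one image and
# each Deployment overrides the command to pick the one it needs.
FROM golang:1.13 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /out/k-rail ./cmd/k-rail && \
    CGO_ENABLED=0 go build -o /out/k-rail-evicter ./cmd/evicter

FROM gcr.io/distroless/static
COPY --from=build /out/ /
# Default to the admission controller; the operator Deployment would set
# command: ["/k-rail-evicter"] in its pod spec instead.
ENTRYPOINT ["/k-rail"]
```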

@alpe alpe force-pushed the alex/spike_evict_operator_18 branch from c57f770 to e7c0cac Compare January 31, 2020 12:54
@alpe alpe force-pushed the alex/spike_evict_operator_18 branch from e7c0cac to 87f2bf8 Compare January 31, 2020 12:55
@alpe alpe mentioned this pull request Feb 3, 2020
@alpe
Contributor Author

alpe commented Feb 3, 2020

Closing this in favour of #54

@alpe alpe closed this Feb 3, 2020