Feature: Natively Support IRSA on EKS Properly #200

Closed
callum-tait-pbx opened this issue Nov 17, 2020 · 5 comments · Fixed by #226

callum-tait-pbx (Contributor) commented Nov 17, 2020

This has come up in a few issues, so I thought it would be good to create an issue specifically for tracking support. Having this explicitly called out will also help people who want to run the project on EKS in the meantime.

I'm classing this as a feature rather than a bug, since there is a workaround and the project doesn't claim to have been created specifically for EKS.

Problem
EKS IAM Roles for Service Accounts (IRSA) is how AWS has implemented role-based authentication for its managed Kubernetes service, EKS. The implementation mutates pods automatically, injecting a volume, a volume mount and some environment variables. Because of this automagic, the controller ends up with a pod that doesn't match what it expects, so it gets stuck in a terminate-create loop.

Workaround
The workaround is to explicitly set the YAML that normally gets injected, so that the pod already matches what the controller expects and it doesn't freak out and get stuck in a terminate-create loop.
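
As a rough sketch (assuming the RunnerDeployment spec accepts the usual pod-level env, volumes and volumeMounts fields; the repository, service account name and role ARN below are placeholders), this amounts to declaring the normally injected values, shown under Expected Behaviour below, in the runner definition yourself:

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: example-runnerdeploy
spec:
  template:
    spec:
      repository: my-org/my-repo
      serviceAccountName: runner   # a service account annotated with eks.amazonaws.com/role-arn
      env:
      - name: AWS_ROLE_ARN
        value: arn:aws:iam::xxxxxx:role/xxxxx
      - name: AWS_WEB_IDENTITY_TOKEN_FILE
        value: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
      volumeMounts:
      - mountPath: /var/run/secrets/eks.amazonaws.com/serviceaccount
        name: aws-iam-token
        readOnly: true
      volumes:
      - name: aws-iam-token
        projected:
          defaultMode: 420
          sources:
          - serviceAccountToken:
              audience: sts.amazonaws.com
              expirationSeconds: 86400
              path: token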

Expected Behaviour
Only IRSA needs to be configured (https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html); the values should get injected into the runner definitions automatically, and the controller should not freak out as a result.

Normally injected YAML once IRSA is configured on your cluster:

volumes:
- name: aws-iam-token
  projected:
    defaultMode: 420
    sources:
    - serviceAccountToken:
        audience: sts.amazonaws.com
        expirationSeconds: 86400
        path: token
volumeMounts:
- mountPath: /var/run/secrets/eks.amazonaws.com/serviceaccount
  name: aws-iam-token
  readOnly: true
env:
- name: AWS_ROLE_ARN
  value: arn:aws:iam::xxxxxx:role/xxxxx
- name: AWS_WEB_IDENTITY_TOKEN_FILE
  value: /var/run/secrets/eks.amazonaws.com/serviceaccount/token

Existing Issues That Reference this Bug/Feature
#16
#37

mumoshu (Collaborator) commented Nov 19, 2020

@callum-tait-pbx Thank you so much for writing this up 🙏

I wasn't really sure what was happening in the mentioned issues until now. But after reading this, I think it might be due to how we mark a runner pod to be restarted:

https://github.com/summerwind/actions-runner-controller/blob/master/controllers/runner_controller.go#L201-L203

Perhaps we'd need to use some other criterion for marking runner pods to be restarted, so that it won't interfere with mutating webhooks: perhaps inject a pod spec hash into the runner pod annotations and compare that against the desired pod spec hash instead?
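
To illustrate the idea with a purely hypothetical sketch (the annotation key and hash value here are made up, not something the controller sets today): the controller would record a hash of the runner pod spec it generated as an annotation at creation time, then on each reconcile recompute the hash from the current desired spec and only recreate the pod when the two differ, so fields injected afterwards by the IRSA webhook never enter the comparison. Something like:

apiVersion: v1
kind: Pod
metadata:
  name: example-runner
  annotations:
    # hypothetical annotation holding the hash of the controller-generated spec
    actions-runner/runner-spec-hash: "9f86d081884c7d65"
spec:
  containers:
  - name: runner
    image: summerwind/actions-runner:latest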

callum-tait-pbx (Contributor, Author) commented

Apologies for the late response; busy times at the moment! That would make a lot of sense to me.

perhaps inject a pod spec hash into the runner pod annotations and compare that against the desired pod spec hash instead?

I quite like the idea.

mumoshu self-assigned this Nov 30, 2020
mumoshu (Collaborator) commented Nov 30, 2020

Thanks for confirming! I'm going to work on this in the next few days.

mumoshu pushed a commit that referenced this issue Dec 7, 2020
One of the pod recreation conditions has been modified to use a hash of the runner spec, so that the controller does not keep restarting pods mutated by admission webhooks. This naturally allows us, for example, to use IRSA for EKS, which requires its admission webhook to mutate the runner pod with additional, IRSA-related volumes, volume mounts and env vars.

Resolves #200
mumoshu (Collaborator) commented Dec 7, 2020

@callum-tait-pbx I'm working on this at #226

mumoshu pushed a commit that referenced this issue Dec 8, 2020
One of the pod recreation conditions has been modified to use a hash of the runner spec, so that the controller does not keep restarting pods mutated by admission webhooks. This naturally allows us, for example, to use IRSA for EKS, which requires its admission webhook to mutate the runner pod with additional, IRSA-related volumes, volume mounts and env vars.

Resolves #200
mumoshu added a commit that referenced this issue Dec 8, 2020
One of the pod recreation conditions has been modified to use a hash of the runner spec, so that the controller does not keep restarting pods mutated by admission webhooks. This naturally allows us, for example, to use IRSA for EKS, which requires its admission webhook to mutate the runner pod with additional, IRSA-related volumes, volume mounts and env vars.

Resolves #200
mumoshu (Collaborator) commented Dec 8, 2020

@callum-tait-pbx Could you try v0.15.0? There's also a short piece of documentation about this feature at https://github.com/summerwind/actions-runner-controller#using-eks-iam-role-for-service-accounts for your reference.
