Runner Pod can't start up with ServiceAccount on EKS 1.15.10 #16

Closed
missedone opened this issue Mar 21, 2020 · 17 comments
Labels
question Further information is requested

Comments

@missedone

I tried to deploy a runner with IAM Roles for Service Accounts (IRSA) on EKS 1.15.10, but the runner pod can't start up at all:

kubectl get pod
NAME                          READY   STATUS        RESTARTS   AGE
my-runner-lmzjc-znh8c         0/1     Terminating   0          10s

The manifest I used:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: actions-runner-system-sa
  namespace: actions-runner-system
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::<AWS_ACCOUNT_ID>:role/<IAM_ROLE>
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: my-runner
  namespace: actions-runner-system
spec:
  replicas: 1
  template:
    spec:
      repository: example/repo
      serviceAccountName: actions-runner-system-sa
      automountServiceAccountToken: true
      containers:
        - name: runner
          image: example/action-runner:latest
          imagePullPolicy: Always

Looking at the events, I saw:

2s          Warning   FailedMount                pod/my-runner-zsvsd-vfdpz                Unable to mount volumes for pod "my-runner-zsvsd-vfdpz_actions-runner-system(1b78d4fb-25e6-4457-bd47-04ea06ae7d14)": timeout expired waiting for volumes to attach or mount for pod "actions-runner-system"/"my-runner-zsvsd-vfdpz". list of unmounted volumes=[aws-iam-token actions-runner-system-sa-token-8gmq9]. list of unattached volumes=[aws-iam-token actions-runner-system-sa-token-8gmq9]
@mumoshu
Collaborator

mumoshu commented Mar 21, 2020

To clarify, containers[] is not the only option here. The workaround is to use spec.image to specify a custom runner image, so that serviceAccountName and automountServiceAccountToken work 😃

The fix for containers[] would be to add the required volume mounts to the runner container when containers[] is specified, which would look very similar to #14

@missedone
Author

It doesn't matter whether I use containers[] or image; I get the same error.

@mumoshu
Collaborator

mumoshu commented Mar 21, 2020

Odd. Could you dump your runner pod by running kubectl get po -o yaml my-runner-lmzjc-znh8c?

@mumoshu
Collaborator

mumoshu commented Mar 21, 2020

Does it work if you try to use the pod IAM role from a regular pod, not a pod managed by the actions-runner-controller?
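
A minimal sketch of such a test, assuming the ServiceAccount from the original report and the public amazon/aws-cli image (the pod name irsa-test is hypothetical and not from this thread):

```yaml
# Plain Pod, not managed by actions-runner-controller, that uses the
# IRSA-annotated ServiceAccount from the report above.
apiVersion: v1
kind: Pod
metadata:
  name: irsa-test                # hypothetical name
  namespace: actions-runner-system
spec:
  serviceAccountName: actions-runner-system-sa
  restartPolicy: Never
  containers:
    - name: aws-cli
      image: amazon/aws-cli      # assumption: any image with a recent AWS CLI would do
      # The image's entrypoint is `aws`, so args hold only the subcommand.
      args: ["sts", "get-caller-identity"]
```

If IRSA works for regular pods, kubectl logs irsa-test should show an assumed-role ARN for the annotated IAM role rather than the node instance role.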

mumoshu added the question label Mar 22, 2020
@alexandrst88
Contributor

I tested this on a vanilla Kubernetes setup, and the service account was mounted into the runner container without any issues.

@nachomillangarcia

I think it's related to EKS IRSA. It works for me when I remove the annotation eks.amazonaws.com/role-arn: arn:aws:iam::<AWS_ACCOUNT_ID>:role/<IAM_ROLE>.

But with that annotation present, the controller keeps creating and destroying the runner pod in a loop.

EKS IRSA automatically injects environment variables and volumes into the pod:

  # Pod level
  volumes:
  - name: aws-iam-token
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          audience: sts.amazonaws.com
          expirationSeconds: 86400
          path: token
  # Container level
  volumeMounts:
  - mountPath: /var/run/secrets/eks.amazonaws.com/serviceaccount
    name: aws-iam-token
    readOnly: true
  env:
  - name: AWS_ROLE_ARN
    value: arn:aws:iam::xxxxxx:role/xxxxx
  - name: AWS_WEB_IDENTITY_TOKEN_FILE
    value: /var/run/secrets/eks.amazonaws.com/serviceaccount/token

Maybe those collide with some validation in the controller.

I've checked the pod definitions when using EKS IRSA and they're fine. It's the controller that destroys the pod.

@callum-tait-pbx
Contributor

callum-tait-pbx commented Aug 26, 2020

@summerwind @mumoshu I'm running into the same problem here and have the same symptoms and experience as @nachomillangarcia. This prevents me from using the solution, as role-based auth is a requirement and we only use EKS for our k8s needs. We are running EKS version 1.17 at the moment.

@gregorygtseng

It's possible to specify the environment variables and mount the volumes manually in your Runner or RunnerDeployment spec. Not ideal but it works.

@callum-tait-pbx
Contributor

callum-tait-pbx commented Sep 3, 2020

@gregorygtseng I've been able to do this, but my runner is still using the underlying node role and failing to assume the target role. If I manually apply what the annotation normally injects and then try to assume a role with the GitHub action (see below), I get a permission-denied error: User: arn:aws:sts::***:assumed-role/eksctl-eks-test-nodegroup-NodeInstanceRole-EKV6S90QD0F3/i-0c5ccd732adce9984 is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::***:role/example-role

with:
    aws-region: us-west-1
    role-to-assume: arn:aws:iam::$AWS_ACCOUNT:role/example-role
    role-skip-session-tagging: true

Have you been able to assume into roles successfully?

@gregorygtseng

Yes, I am able to log in to ECR and push our Docker images from a GitHub Actions workflow.
I'm not using the action you're using with role-to-assume, however.

Does your user have the permission for sts:AssumeRole?

Here's my RunnerDeployment, if that helps. Note that I had to add securityContext to allow reading of the token.

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: runner-deployment
spec:
  replicas: 2
  template:
    spec:
      ...
      env:
        - name: AWS_ROLE_ARN
          value: arn:aws:iam::xxxxxx:role/xxxxx
        - name: AWS_WEB_IDENTITY_TOKEN_FILE
          value: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
      serviceAccountName: actions-runner-sa
      automountServiceAccountToken: true
      volumeMounts:
        - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
          name: actions-runner-token-xxx
          readOnly: true
        - mountPath: /var/run/secrets/eks.amazonaws.com/serviceaccount
          name: aws-iam-token
          readOnly: true
      securityContext:
        fsGroup: 65534 # fix reading service token
      volumes:
      - name: aws-iam-token
        projected:
          defaultMode: 420
          sources:
          - serviceAccountToken:
              audience: sts.amazonaws.com
              expirationSeconds: 86400
              path: token

I would try a debug job where you sleep in the workflow and then exec into the runner to check the identity:

$ aws sts get-caller-identity
{
    "UserId": "ABCDEFGHIJ:botocore-session-12345",
    "Account": "1234567890",
    "Arn": "arn:aws:sts::1234567890:assumed-role/example-role/botocore-session-12345"
}
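
A debug job along those lines could be sketched as follows. This is only an illustration: the workflow name, the manual trigger, and the 30-minute sleep are assumptions, not something taken from this thread.

```yaml
# Hypothetical debug workflow: prints what the runner sees, then keeps the
# pod alive long enough to `kubectl exec` into it.
name: debug-irsa
on: workflow_dispatch            # assumption: triggered manually
jobs:
  debug:
    runs-on: self-hosted
    steps:
      - name: Show identity-related settings
        run: |
          id
          echo "AWS_ROLE_ARN=$AWS_ROLE_ARN"
          echo "AWS_WEB_IDENTITY_TOKEN_FILE=$AWS_WEB_IDENTITY_TOKEN_FILE"
          # -L follows the symlinks that projected volumes use
          ls -lL "$AWS_WEB_IDENTITY_TOKEN_FILE"
      - name: Keep the runner pod alive
        run: sleep 1800
```

While the job sleeps, you can exec into the runner pod and run aws sts get-caller-identity, provided the AWS CLI is available in your runner image.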

@callum-tait-pbx
Contributor

callum-tait-pbx commented Sep 3, 2020

Got it working, it was the magic combo of:

      - name: AWS_ROLE_ARN
        value: arn:aws:iam::$AWS_ACCOUNT_ID:role/eksctl-sre-eks-test-addon-iamserviceaccount-Role1-1244USHBUKLEY
      - name: AWS_WEB_IDENTITY_TOKEN_FILE
        value: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
      volumeMounts:
      - mountPath: /var/run/secrets/eks.amazonaws.com/serviceaccount
        name: aws-iam-token 
        readOnly: true
      volumes:
      - name: aws-iam-token
        projected:
          defaultMode: 420
          sources:
          - serviceAccountToken:
              audience: sts.amazonaws.com
              expirationSeconds: 86400
              path: token
      securityContext:
        fsGroup: 27

and the annotation on the service account. I used 27 because, for some reason, I couldn't sudo as runner when I used the nobody group.

@gregorygtseng

gregorygtseng commented Sep 3, 2020

That's great! Did you mean removing the annotation on the service account? That's what I had to do so that it doesn't interfere, since we're doing the mounting manually.

@callum-tait-pbx
Contributor

callum-tait-pbx commented Sep 4, 2020

No, without the annotation I was not able to get IAM working. I needed to:

  • Add the annotation pointing to the base IAM role on my service account; without it, auth wouldn't work.
  • Add the envs, volumeMounts and volume that the annotation automatically injects. The apps manager seemed to kill the pod if you let the annotation do this, I presume because it fails some sort of validation.
  • Add the fsGroup so the token is readable. Both 65534 and 27 seemed to work, but 65534 broke sudo whereas 27 didn't.

My final YAML (with AWS account IDs replaced with $AWS_ACCOUNT_ID):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: sre-actions-runner-system-sa-ajh
  namespace: actions-runner-system
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::$AWS_ACCOUNT_ID:role/eksctl-sre-eks-test-addon-iamserviceaccount-Role1-1244USHBUKLEY 
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: runner-ajh
  namespace: actions-runner-system
spec:
  replicas: 1
  template:
    spec:
      organization: org
      serviceAccountName: sre-actions-runner-system-sa-ajh
      env:
      - name: AWS_ROLE_ARN
        value: arn:aws:iam::$AWS_ACCOUNT_ID:role/eksctl-sre-eks-test-addon-iamserviceaccount-Role1-1244USHBUKLEY
      - name: AWS_WEB_IDENTITY_TOKEN_FILE
        value: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
      volumeMounts:
      - mountPath: /var/run/secrets/eks.amazonaws.com/serviceaccount
        name: aws-iam-token 
        readOnly: true
      volumes:
      - name: aws-iam-token
        projected:
          defaultMode: 420
          sources:
          - serviceAccountToken:
              audience: sts.amazonaws.com
              expirationSeconds: 86400
              path: token
      securityContext:
        fsGroup: 27
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: hra-1-ajh
  namespace: actions-runner-system
spec:
  scaleDownDelaySecondsAfterScaleUp: 10
  scaleTargetRef:
    name: runner-ajh
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: TotalNumberOfQueuedAndInProgressWorkflowRuns
    repositoryNames:
    - test-repo

After that it was just the normal AWS assume-role setup.

@gregorygtseng

@callum-tait-pbx Oh I see, you didn't mount the service token yourself which is why you need the annotation. Glad that combination worked out as well. Check the running pod spec to confirm.

@Nuru
Contributor

Nuru commented Oct 21, 2020

I believe this is similar to jenkinsci/kubernetes-operator#361

When you add the eks.amazonaws.com/role-arn annotation to the service account, EKS automatically adds things to the Pod, which makes the Pod not look like what the controller is expecting, so the controller tries to fix it by deleting and recreating the Pod. The workaround is to tell the Controller to expect exactly what EKS is going to add.

I believe the better solution is for the Controller to deploy Deployments, not Pods (#133).

@droidpl
Contributor

droidpl commented Nov 13, 2020

Thanks for all your guidance with this, @callum-tait-pbx and @gregorygtseng. I also got all of this working, sharing account credentials through the AssumeRoleWithWebIdentity procedure.

My deployment file is almost a clone of yours 😸

@mumoshu
Collaborator

mumoshu commented Apr 25, 2021

We already have the documentation for this at https://github.com/summerwind/actions-runner-controller#using-eks-iam-role-for-service-accounts so it should be straightforward today. Closing as resolved!

8 participants