GKE workload Identity #13081

Open
userbradley opened this issue May 23, 2022 · 13 comments
Labels
area/platform (issues related to the platform), community, frozen (not being actively worked on), team/deployments, type/enhancement (new feature or request)

Comments

@userbradley

userbradley commented May 23, 2022

Tell us about the problem you're trying to solve

I am trying to set up Airbyte in a secure manner on a GKE cluster running on Google Cloud.

As it stands, you need to create a service account and a key, then base64-encode the key and store it as a secret in the cluster.

apiVersion: v1
kind: Secret
metadata:
  name: gcs-log-creds
  namespace: default
data:
  gcp.json: ""

Describe the solution you’d like

Ideally I would like to use Workload Identity, where we specify a Kubernetes service account that Airbyte uses on the cluster, which then impersonates a GCP service account for calls leaving the cluster.

Describe the alternative you’ve considered or used

Simply not using the logging, as it goes against our organizational policy on creating and exporting service account keys.

Additional context

No

Are you willing to submit a PR?

Yes! I'm not 100% sure where I can help, perhaps with the KB writing!

Discourse post

https://discuss.airbyte.io/t/airbyte-using-fleet-workload-identity-overwrites-google-application-credentials-inside-connector/2277/1

@userbradley
Author

Thanks @Santhin. The links you've provided (well, at least this one and this one) still use the key.json file.

Can you share the modifications you needed to make to get Airbyte working with Workload Identity instead of an SA key?

I am pretty familiar with Kubernetes Workload Identity on GCP, and we have a few deployments using it, but I'm unsure whether Airbyte will work with it, as it seems to expect the key file.

Thoughts?

@Santhin

Santhin commented Aug 11, 2022

This was exactly the wall I hit: how to use Workload Identity when Airbyte expects a key.json. The solution was to use Fleet Workload Identity, which lets you generate an access token from a Kubernetes service account.

First, you need to create the service account (SA):

resource "google_service_account" "sa_airbyte" {
  account_id = "airbyte-admin"
}
resource "google_project_iam_member" "sa_airbyte" {
  project = var.project
  role    = google_project_iam_custom_role.cr_airbyte.name
  member  = "serviceAccount:${google_service_account.sa_airbyte.email}"
}
resource "google_service_account_iam_member" "sa_airbyte" {
  service_account_id = google_service_account.sa_airbyte.id
  role               = "roles/iam.workloadIdentityUser"
  member             = "serviceAccount:${var.project}.svc.id.goog[airbyte/airbyte-admin]"
}

I tested with a different name: account_id must match the service account name used inside the Helm chart, which is airbyte-admin.

Now we need to create a JSON file with the impersonation credentials.
I encourage you to follow these docs: https://cloud.google.com/anthos/fleet-management/docs/use-workload-identity#use_fleet_workload_identity
The variable var.airbyte_gcs_log_creds_payload contains this JSON file. The credential_source file is the location where the token gets mounted in this example (see the screenshots below):

{
  "type": "external_account",
  "audience": "identitynamespace:WORKLOAD_IDENTITY_POOL:IDENTITY_PROVIDER",
  "service_account_impersonation_url": "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/GSA_NAME@GSA_PROJECT_ID.iam.gserviceaccount.com:generateAccessToken",
  "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
  "token_url": "https://sts.googleapis.com/v1/token",
  "credential_source": {
    "file": "/secrets/tokens/gcp-ksa/token"
  }
}
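A minimal Python sketch of how the credential configuration above could be generated instead of edited by hand. The function name and its parameters are hypothetical; the field names and URLs are the ones from the JSON above, and the values passed in the example are the same placeholders.

```python
import json


def build_external_account_config(pool, provider, gsa_email, token_path):
    """Return the external_account credential config as a JSON string.

    All four arguments are supplied by the caller; nothing is looked up
    from GCP here.
    """
    config = {
        "type": "external_account",
        "audience": f"identitynamespace:{pool}:{provider}",
        "service_account_impersonation_url": (
            "https://iamcredentials.googleapis.com/v1/projects/-/"
            f"serviceAccounts/{gsa_email}:generateAccessToken"
        ),
        "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
        "token_url": "https://sts.googleapis.com/v1/token",
        # Path inside the pod where the projected KSA token is mounted.
        "credential_source": {"file": token_path},
    }
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    print(build_external_account_config(
        pool="WORKLOAD_IDENTITY_POOL",
        provider="IDENTITY_PROVIDER",
        gsa_email="GSA_NAME@GSA_PROJECT_ID.iam.gserviceaccount.com",
        token_path="/secrets/tokens/gcp-ksa/token",
    ))
```

The resulting string is what would be passed in as var.airbyte_gcs_log_creds_payload.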

With this JSON file we need to create a Kubernetes secret. In my example it looked like this:

resource "kubernetes_manifest" "airbyte_gcs_log_creds" {
  manifest = {
    "apiVersion" = "v1"
    "data" = {
      "gcp.json" = base64encode(var.airbyte_gcs_log_creds_payload)
    }
    "kind" = "Secret"
    "metadata" = {
      "name" = "airbyte-airbyte-gcs-log-creds"
      "namespace" = "airbyte"
    }
  }
}
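For readers not using Terraform, here is a rough Python equivalent of the base64encode step above. It is only a sketch: the helper name is hypothetical, and the Secret name and namespace are copied from the manifest above.

```python
import base64
import json


def secret_manifest(name, namespace, payload):
    """Build a v1 Secret whose gcp.json key holds the base64-encoded payload."""
    encoded = base64.b64encode(payload.encode("utf-8")).decode("ascii")
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "data": {"gcp.json": encoded},
    }


if __name__ == "__main__":
    # Stand-in payload; in practice this is the full external_account JSON.
    payload = json.dumps({"type": "external_account"})
    manifest = secret_manifest("airbyte-airbyte-gcs-log-creds", "airbyte", payload)
    print(json.dumps(manifest, indent=2))
```

Unlike a service account key, this payload holds no private key material, only pointers to the token file and the account to impersonate.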

Now we create the Kubernetes service account (KSA) and annotate it with our GCP service account.

Note the automountServiceAccountToken flag: we want to mount our token in a different location, so setting it to false is a must.

resource "kubernetes_manifest" "ksa_airbyte_admin" {
  manifest = {
    "apiVersion" = "v1"
    "automountServiceAccountToken" = false
    "kind" = "ServiceAccount"
    "metadata" = {
      "annotations" = {
        "iam.gke.io/gcp-service-account" = var.sa_airbyte
      }
      "name" = "airbyte-admin"
      "namespace" = "airbyte"
    }
  }
}

In my values for the Helm chart:

serviceAccount:
  create: false # airbyte-admin is created by the Kubernetes manifest above, not by Helm
global:
  logs:
    gcs:
      credentials: "/secrets/tokens/gcp-ksa/gcp.json" # a different path from the default; explained below
    minio:
      enabled: true

server:
  extraVolumeMounts:
    - name: gcp-ksa
      mountPath: /secrets/tokens/gcp-ksa
      readOnly: true
  extraVolumes: 
    - name: gcp-ksa
      projected:
        defaultMode: 420
        sources:
        - serviceAccountToken:
            path: token
            audience: playground-357914.svc.id.goog
            expirationSeconds: 172800
        - secret:
            name: airbyte-airbyte-gcs-log-creds

worker:
  extraVolumeMounts:
    - name: gcp-ksa
      mountPath: /secrets/tokens/gcp-ksa
      readOnly: true
  extraVolumes: 
    - name: gcp-ksa
      projected:
        defaultMode: 420
        sources:
        - serviceAccountToken:
            path: token
            audience: playground-357914.svc.id.goog
            expirationSeconds: 172800
        - secret:
            name: airbyte-airbyte-gcs-log-creds
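The two paths used in this setup both fall out of the projected volume. A small Python sketch of that mapping, assuming the Secret has the single key gcp.json (as in the manifest earlier in this comment) and that each Secret key is projected as a file named after the key; the helper name is hypothetical.

```python
import posixpath


def projected_paths(mount_path, token_path, secret_keys):
    """List the in-pod file paths produced by the projected volume."""
    files = [token_path, *secret_keys]
    return [posixpath.join(mount_path, name) for name in files]


if __name__ == "__main__":
    for path in projected_paths("/secrets/tokens/gcp-ksa", "token", ["gcp.json"]):
        print(path)
```

The first path is what credential_source.file in the JSON points at; the second is the value of global.logs.gcs.credentials.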

Here is an example of the mounted files:
[image]
Here you can see that my secret is mounted twice:
gcs-log-creds <- created by the Helm chart
token <- my override

[image]

@userbradley If you have more questions about this implementation, feel free to ask. I will try to create a simple example in a public repo, because I've seen tons of threads about this.

Additional note: I didn't test this with a GCP connector such as BigQuery. If the same method lets us use an impersonation JSON file rather than a service account private key, it would be huge :D.

@userbradley
Author

@Santhin thanks for the comment, I'll try to make some time to look into it.

Thought I'd just reply so you don't think I've ignored it - the team and I greatly appreciate your input and help!

@Santhin

Santhin commented Aug 17, 2022

This solution has some drawbacks, or some extra benefits, depending on how you look at it.

Fleet Workload Identity sets GOOGLE_APPLICATION_CREDENTIALS on the worker pod. When you try to create a connection or destination using BigQuery, you will hit a weird error while uploading the credentials JSON.
Something like this:
[image]

At first I was confused about why the type was external_account when I was trying to enter normal credentials with type service_account.

I connected the dots: the BigQuery connector is trying to use the GOOGLE_APPLICATION_CREDENTIALS from my worker.
So here is the question: would a small rewrite inside the BigQuery connector make it possible to enter impersonation credentials rather than normal ones?
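A quick diagnostic sketch in Python for confirming which kind of credential file the environment variable points at. It assumes it is run somewhere the variable is set the same way as in the worker pod; the function name is hypothetical.

```python
import json
import os


def adc_credential_type(environ=None):
    """Return the "type" field of the file GOOGLE_APPLICATION_CREDENTIALS
    points at, or None if the variable is unset or empty."""
    environ = os.environ if environ is None else environ
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return None
    with open(path, encoding="utf-8") as handle:
        return json.load(handle).get("type")


if __name__ == "__main__":
    print(adc_credential_type())
```

With the setup described in this thread, the expectation is that this reports external_account rather than service_account, which would match the error in the screenshot.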

After a little digging I found:
https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/destination-bigquery/src/main/java/io/airbyte/integrations/destination/bigquery/BigQueryDestination.java#L163
https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/destination-bigquery/sample_secret/credentials.json

@userbradley here is the link to the issue: https://discuss.airbyte.io/t/airbyte-using-fleet-workload-identity-overwrites-google-application-credentials-inside-connector/2277

@franviera92
Contributor

I need Airbyte to work with Workload Identity. Please add this feature.

@yuriolive

yuriolive commented Feb 13, 2023

{
  "type": "external_account",
  "audience": "identitynamespace:WORKLOAD_IDENTITY_POOL:IDENTITY_PROVIDER",
  "service_account_impersonation_url": "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/GSA_NAME@GSA_PROJECT_ID.iam.gserviceaccount.com:generateAccessToken",
  "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
  "token_url": "https://sts.googleapis.com/v1/token",
  "credential_source": {
    "file": "/secrets/tokens/gcp-ksa/token"
  }
}

@Santhin What should IDENTITY_PROVIDER be for a GKE cluster? I couldn't find it in the links.

@Santhin

Santhin commented Feb 13, 2023

@yuriolive To retrieve the values, you can use gcloud container fleet memberships describe MEMBERSHIP, where MEMBERSHIP is your cluster's unique membership name in the fleet (source).
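As a sketch of how those values map onto the audience field: the Python below parses the describe output, assuming it was requested with --format=json and that the membership carries an authority block with workloadIdentityPool and identityProvider fields, as in the GKE Hub Membership resource. The function name and the sample values are hypothetical.

```python
import json


def audience_from_membership(describe_json):
    """Build the external_account audience from a membership description."""
    authority = json.loads(describe_json).get("authority") or {}
    pool = authority.get("workloadIdentityPool")
    provider = authority.get("identityProvider")
    if not pool or not provider:
        raise ValueError("membership has no workload identity authority configured")
    return f"identitynamespace:{pool}:{provider}"


if __name__ == "__main__":
    # Hypothetical sample; real values come from your own fleet membership.
    sample = json.dumps({
        "authority": {
            "workloadIdentityPool": "my-project.svc.id.goog",
            "identityProvider": "https://example.invalid/my-provider",
        }
    })
    print(audience_from_membership(sample))
```

If the membership has no authority block, the function raises instead of producing a half-filled audience string.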

@yuriolive

@yuriolive To retrieve the values, you can use gcloud container fleet memberships describe MEMBERSHIP, where MEMBERSHIP is your cluster's unique membership name in the fleet (source).

gcloud container fleet memberships list

The command doesn't return any memberships. Are you using GKE too? Do you have to enable Anthos? Anthos has some cost involved, so I would avoid it if I could.

@igrankova igrankova added the area/platform issues related to the platform label Jun 6, 2023
@bleonard bleonard added the frozen Not being actively worked on label Mar 22, 2024
@sandeshhegde1

sandeshhegde1 commented Oct 27, 2024

Hi, any update on this? I believe the change should be simple. I am new to this, but it looks like
you are using "com.google.auth.oauth2.ServiceAccountCredentials" instead of the generic "com.google.auth.oauth2.GoogleCredentials", which should work for all credential types, including Workload Identity Federation.

I am passing gcp.json as

{
  type                              = "external_account"
  audience                          = "//iam.googleapis.com/projects/${project_number}/locations/global/workloadIdentityPools/${project_id}.svc.id.goog/subject/ns/${namespace}/sa/${kubernetes_service_account}"
  subject_token_type                = "urn:ietf:params:oauth:token-type:jwt"
  token_url                         = "https://sts.googleapis.com/v1/token"
  service_account_impersonation_url = "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/${google_service_account.airbyte_infra_sa.email}:generateAccessToken"
  credential_source = {
    file = "/var/run/secrets/kubernetes.io/serviceaccount/token"
  }
}

instead of GCP service account private keys.

I am getting this error from worker pods when deployed via helm charts.

Caused by: java.lang.ClassCastException: class com.google.auth.oauth2.IdentityPoolCredentials cannot be cast to class com.google.auth.oauth2.ServiceAccountCredentials (com.google.auth.oauth2.IdentityPoolCredentials and com.google.auth.oauth2.ServiceAccountCredentials are in unnamed module of loader 'app')
    at com.google.auth.oauth2.ServiceAccountCredentials.fromStream(ServiceAccountCredentials.java:469) ~[google-auth-library-oauth2-http-1.23.0.jar:1.23.0]
    at com.google.auth.oauth2.ServiceAccountCredentials.fromStream(ServiceAccountCredentials.java:452) ~[google-auth-library-oauth2-http-1.23.0.jar:1.23.0]
    at io.airbyte.workers.storage.StorageClientKt.gcsClient(StorageClient.kt:292) ~[io.airbyte-airbyte-commons-worker-0.63.14.jar:?]
    at io.airbyte.workers.storage.GcsStorageClient.<init>(StorageClient.kt:125) ~[io.airbyte-airbyte-commons-worker-0.63.14.jar:?]
    at io.airbyte.workers.storage.$GcsStorageClient$Definition.doInstantiate(Unknown Source) ~[io.airbyte-airbyte-commons-worker-0.63.14.jar:?]
    at io.micronaut.context.AbstractInitializableBeanDefinition.instantiate(AbstractInitializableBeanDefinition.java:770) ~[micronaut-inject-4.5.4.jar:4.5.4]
    at io.micronaut.context.DefaultBeanContext.resolveByBeanFactory(DefaultBeanContext.java:2326) ~[micronaut-inject-4.5.4.jar:4.5.4]
    ... 106 more

This shows that IdentityPoolCredentials cannot be cast to ServiceAccountCredentials.

Looking into this further, I found that you are using
com.google.auth.oauth2.ServiceAccountCredentials instead of the generic com.google.auth.oauth2.GoogleCredentials:

https://github.com/airbytehq/airbyte-platform/blob/main/airbyte-commons-storage/src/main/kotlin/io/airbyte/commons/storage/StorageClient.kt#L521

git grep "google.auth.oauth2"
airbyte-api/server-api/src/main/kotlin/io/airbyte/api/client/config/InternalApiAuthenticationFactory.kt:import com.google.auth.oauth2.ServiceAccountCredentials
airbyte-commons-storage/src/main/kotlin/io/airbyte/commons/storage/StorageClient.kt:import com.google.auth.oauth2.ServiceAccountCredentials
airbyte-config/config-secrets/src/main/kotlin/secrets/persistence/GoogleSecretManagerPersistence.kt:import com.google.auth.oauth2.ServiceAccountCredentials
airbyte-test-utils/src/main/java/io/airbyte/test/utils/CloudSqlDatabaseProvisioner.java:import com.google.auth.oauth2.GoogleCredentials;

In the repo https://github.com/airbytehq/airbyte-platform/tree/main, it looks like you are using GoogleCredentials in test code,
i.e. airbyte-test-utils/src/main/java/io/airbyte/test/utils/CloudSqlDatabaseProvisioner.java,
but in other places, such as
airbyte-api/server-api/src/main/kotlin/io/airbyte/api/client/config/InternalApiAuthenticationFactory.kt
airbyte-commons-storage/src/main/kotlin/io/airbyte/commons/storage/StorageClient.kt
airbyte-config/config-secrets/src/main/kotlin/secrets/persistence/GoogleSecretManagerPersistence.kt

you are using com.google.auth.oauth2.ServiceAccountCredentials.
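To illustrate the difference being described, here is a plain-Python sketch contrasting a loader that accepts only service account keys with one that dispatches on the "type" field. It is an analogy only: it is not Airbyte's code and does not use the Google auth library, and both function names are hypothetical.

```python
import json


def load_service_account_only(raw_json):
    """Strict loader: rejects anything that is not a service account key."""
    info = json.loads(raw_json)
    if info.get("type") != "service_account":
        raise TypeError(f"expected service_account, got {info.get('type')!r}")
    return info


def load_any_credentials(raw_json):
    """Generic loader: dispatches on the "type" field instead of assuming one."""
    info = json.loads(raw_json)
    handlers = {
        "service_account": lambda i: ("key file", i.get("client_email")),
        "external_account": lambda i: ("federated", i.get("audience")),
    }
    kind = info.get("type")
    if kind not in handlers:
        raise ValueError(f"unsupported credential type: {kind!r}")
    return handlers[kind](info)


if __name__ == "__main__":
    external = json.dumps({"type": "external_account", "audience": "AUDIENCE"})
    print(load_any_credentials(external))
    try:
        load_service_account_only(external)
    except TypeError as err:
        print("strict loader failed:", err)
```

The strict loader fails on the external_account file in the same spirit as the ClassCastException above, while the generic one accepts both kinds.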

@userbradley
Author

@sandeshhegde1 can you pls format your comment a little better?

@sandeshhegde1

@userbradley updated. Sorry, are you part of the Airbyte team?

@userbradley
Author

Sorry, are you part of the Airbyte team?

No, I am the original poster of this issue.

10 participants