
Karpenter conversion webhooks do not work on chart version 1.0.2 #6982

Open
dcherniv opened this issue Sep 11, 2024 · 8 comments
Labels
bug (Something isn't working) · triage/needs-investigation (Issues that need to be investigated before triaging)

Comments

@dcherniv

Description

Observed Behavior:
karpenter Chart version: 1.0.2
karpenter-crd Chart version: 1.0.2
with the webhook values configured as follows:

karpenter-crd:
  webhook:
    enabled: false
karpenter:
  enabled: true
  webhook:
    enabled: false

The webhooks fail with the error below, and there is no indication as to why:

{"level":"ERROR","time":"2024-09-11T20:30:53.969Z","logger":"controller","message":"Reconciler error","commit":"b897114","controller":"nodeclaim.podevents","controllerGroup":"","controllerKind":"Pod","Pod":{"name":"istio-gateway-internal-secret-init-84df655d8b-sdxn4","namespace":"istio-system"},"namespace":"istio-system","name":"istio-gateway-internal-secret-init-84df655d8b-sdxn4","reconcileID":"86d25014-bd4b-4b5d-852f-a5c4d50fe7c4","error":"Internal error occurred: failed calling webhook \"validation.webhook.karpenter.sh\": failed to call webhook: the server rejected our request for an unknown reason"}

Expected Behavior:
Webhook to work in 1.0.2

Reproduction Steps (Please include YAML):
#6847 (comment)

Versions:

  • Chart Version: 1.0.2
  • Kubernetes Version (kubectl version): 1.29+ eks
@jonathan-innis
Contributor

Can you share whether there is anything in the logs that indicates why the webhook rejected the request? I'm also curious whether the request even made it to the webhook, or whether something is getting in the way of the network traffic to the service/pod.

@jonathan-innis added the triage/needs-investigation label and removed the needs-triage label on Sep 12, 2024
@dcherniv
Author

@jonathan-innis nothing beyond this.
{"level":"ERROR","time":"2024-09-11T20:30:53.969Z","logger":"controller","message":"Reconciler error","commit":"b897114","controller":"nodeclaim.podevents","controllerGroup":"","controllerKind":"Pod","Pod":{"name":"istio-gateway-internal-secret-init-84df655d8b-sdxn4","namespace":"istio-system"},"namespace":"istio-system","name":"istio-gateway-internal-secret-init-84df655d8b-sdxn4","reconcileID":"86d25014-bd4b-4b5d-852f-a5c4d50fe7c4","error":"Internal error occurred: failed calling webhook "validation.webhook.karpenter.sh": failed to call webhook: the server rejected our request for an unknown reason"}
This is from Karpenter logs.
I'm assuming this was trying to validate a nodeclaim which didn't have the group field set.
Reading the docs for the migration to v1. Can you confirm that 0.37.3 doesn't mutate resources?
It's worth noting that, at the point I got this error, all the resources I have defined for node pools etc. in my chart had already been upgraded to the v1 spec and deployed prior to deploying the Karpenter 1.0.2 chart.
My path was:

  1. Upgrade to 0.37.3
  2. Upgrade all manifests to v1 spec on 0.37.3
  3. Deploy Karpenter 1.0.2
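For concreteness, that path looks roughly like the following. This is a sketch only: release names, the namespace, and the values file are placeholders, and the actual v1 migration guide has additional steps.

# Step 1: upgrade both charts to the last pre-v1 patch release
helm upgrade --install karpenter-crd oci://public.ecr.aws/karpenter/karpenter-crd --version 0.37.3 -n kube-system
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter --version 0.37.3 -n kube-system -f values.yaml
# Step 2: re-apply NodePool / EC2NodeClass manifests converted to the v1 spec (hypothetical file name)
kubectl apply -f nodepools-v1.yaml
# Step 3: move to the 1.0.2 charts
helm upgrade --install karpenter-crd oci://public.ecr.aws/karpenter/karpenter-crd --version 1.0.2 -n kube-system
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter --version 1.0.2 -n kube-system -f values.yaml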

@jonathan-innis added the triage/needs-information label and removed the triage/needs-investigation label on Sep 12, 2024
@jonathan-innis
Contributor

Upgrade all manifests to v1 spec on 0.37.3

When you ran through this part of your upgrade, did the CRDs also have the v1 storage version? Even with the resources on the v1 spec, if they aren't stored under the correct storage version, there can be issues during the upgrade.
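For anyone checking this on their own cluster, the stored versions are recorded in CRD status. A sketch against one of the Karpenter CRDs:

# Lists every API version under which objects of this CRD have been persisted in etcd
kubectl get crd nodepools.karpenter.sh -o jsonpath='{.status.storedVersions}'

If v1beta1 still appears there after the migration, some objects may still be stored under the old version.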

@jonathan-innis
Contributor

Also, from what it looks like, there is something on the network path blocking this traffic from getting through. If there are no errors on the webhook side, that would indicate to me that something is preventing the call from the apiserver from reaching the pod's service endpoint.
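One quick way to rule the service path in or out, as a sketch: the namespace and label here are assumptions, adjust them to the actual install.

# A webhook service with no endpoints means the apiserver has nothing to reach
kubectl get svc,endpoints -n kube-system -l app.kubernetes.io/name=karpenter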

@dcherniv
Author

@jonathan-innis Found a couple of folks who encountered the same thing: #6847 (comment)
and #6879 (comment)
Deleting the validating webhooks altogether resolves this. I think they are carried over from 0.37.3 and are not needed on 1.0.2? Can you confirm? It's weird that they are not cleaned up automatically upon upgrade.
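The workaround from those threads boils down to removing the leftover configurations by hand. A sketch; the exact names vary by version, so list before deleting:

# List any Karpenter webhook configurations still registered with the apiserver
kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations | grep karpenter
# Remove the leftover one named in the error log above
kubectl delete validatingwebhookconfiguration validation.webhook.karpenter.sh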

@jonathan-innis
Contributor

Agreed, this looks like an issue with the interaction that Karpenter has with Argo. We definitely need to look at this since we shouldn't be leaving behind MutatingWebhookConfigurations and ValidatingWebhookConfigurations after the upgrade.

@jonathan-innis added the triage/needs-investigation label and removed the triage/needs-information label on Sep 16, 2024
@jonathan-innis
Contributor

So, I think we've tracked it down. There's an open issue that specifically describes this interaction between Knative and ArgoCD. What it comes down to is that ArgoCD refuses to delete objects with ownerReferences -- even if it was the original creator of the object.

Because Knative injects an owner reference onto the ValidatingWebhookConfiguration and MutatingWebhookConfiguration as part of its reconciliation flow, ArgoCD ignores these objects when pruning and leaves them behind on the cluster. I'm trying out what happens when we don't have these owner references by patching the knative package that we are using here: #7042 to see if that resolves the issue (I suspect it should).
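The injected owner reference is visible directly on the leftover object. A sketch:

# A non-empty result here is what makes ArgoCD's pruning skip the object
kubectl get validatingwebhookconfiguration validation.webhook.karpenter.sh -o jsonpath='{.metadata.ownerReferences}'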

Code link to where ArgoCD does this check: https://github.com/argoproj/gitops-engine/blob/master/pkg/cache/cluster.go#L1167

@neoakris

neoakris commented Sep 20, 2024

Could this be part of the issue?
https://github.com/aws/karpenter-provider-aws/blob/main/charts/karpenter/Chart.yaml
https://github.com/aws/karpenter-provider-aws/blob/v1.0.2/charts/karpenter/Chart.yaml
^-- the v1.0.2 Helm chart should use v1 CRDs,
but from what I can tell it's referencing v1beta1 CRDs.
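Whether the installed CRDs actually serve and store v1 can be checked directly; a sketch against one of them:

# Prints each version a CRD defines, whether it is served, and which one is the storage version
kubectl get crd nodepools.karpenter.sh -o jsonpath='{range .spec.versions[*]}{.name}{" served="}{.served}{" storage="}{.storage}{"\n"}{end}'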

Also, in the future, once these upstream issues get resolved, it'd be nice if the EKS Blueprints Karpenter addon could get some love.
aws-quickstart/cdk-eks-blueprints#1078
I tried doing a fresh install of the Karpenter Helm chart 1.0.2 (via EKS Blueprints for CDK), where there should be no need to convert CRDs or migrate old versions of the app. Even a fresh install doesn't work in that case: from what I can tell it installs older beta versions of the CRDs, and I think that might be the fault of the upstream Helm chart, based on the references in the link above.
