Namespace deletion hangs when containing vault-config-operator custom resources #133
Comments
Thanks for the in-depth analysis. This is a race condition that, from the operator's point of view, is indistinguishable from a misconfiguration (for example, the wrong service account being configured).
I would have the controller indicate to Kubernetes that the See https://kubernetes.io/docs/concepts/overview/working-with-objects/owners-dependents/#ownership-and-finalizers for more details. I'm attempting to prototype this right now. Will keep you posted.
Update: not working. First, the ownership stanza does not prevent the service account from being deleted for some reason. Maybe I failed to do it properly, even though I verified it multiple times. However, this would not solve the issue anyway: once the namespace enters the This situation would actually require an alternate, non-Kubernetes-based means of authenticating to Vault. So all that's left is the last-resort outcome that you mentioned: since there is no way the cleanup can be performed on Vault for CR objects, there is no point stalling the removal of the CRs and the namespace. Thoughts?
A webhook on the DELETE operation for the namespace resource would probably help.
@eye0fra how would you see this working (assuming that generating the JWT still works during the hook processing)? |
A webhook on the DELETE operation would not work. But a webhook on the UPDATE operation that sets the deletion time on the namespace might allow you to execute some logic while the namespace is still healthy. But what exactly? One might delete all of the vault config resources in the namespace and then return. |
I don't have any smarter option here since obtaining a JWT from Kube within a namespace being deleted is not allowed (which makes sense, right?). |
I think so. It would not be a huge change as we centralize that logic.
On Wed, Mar 15, 2023 at 10:59 AM Pascal Davoust wrote:
> I don't have any smarter option here since obtaining a JWT from Kube
> within a namespace being deleted is not allowed (which makes sense, right?).
> So what's the best option to avoid hanging? Ensure that we take the
> namespace-being-deleted condition into account in each controller and skip
> the Vault cleanup in this case?
Hi,
After #126 was fixed, attempting to delete an entire namespace containing custom resources owned by `vault-config-operator` still hangs, leaving the namespace in the `Terminating` state.
The obvious cause is again that finalizers are not removed from the custom resources, but for a different reason: the `vault-config-operator` fails to handle the removal and therefore never clears the finalizers from the custom resources.
Removing the custom resource's finalizer manually unblocks garbage collection and the namespace is cleared, leaving the relevant content in Vault in the process.
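Removing a finalizer, whether by hand or from a controller, amounts to rewriting the resource's `metadata.finalizers` list without the blocking entry. A minimal sketch of that filtering step follows; `removeFinalizer` and the finalizer names are hypothetical and are not taken from the operator.

```go
package main

import "fmt"

// removeFinalizer returns a copy of finalizers without any entry equal to
// name. The input slice is left untouched.
func removeFinalizer(finalizers []string, name string) []string {
	kept := make([]string, 0, len(finalizers))
	for _, f := range finalizers {
		if f != name {
			kept = append(kept, f)
		}
	}
	return kept
}

func main() {
	// Hypothetical finalizer names, for illustration only.
	current := []string{"example.com/vault-cleanup", "example.com/other"}
	fmt.Println(removeFinalizer(current, "example.com/vault-cleanup"))
}
```

Once the list no longer contains the operator's entry, Kubernetes can finish deleting the object, which is why the namespace is then cleared.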
Environment
Kubernetes cluster 1.24.8 (GKE)
vault 1.12.1 (deployed using helm) + fix for #126
vault-config-operator version 0.8.10 (deployed using Helm)
How to reproduce
RandomSecret
After the resources have been created, attempt to remove the entire namespace:
=> the operator immediately exhibits errors as shown below (backtraces stripped to ease reading):
Full log available on demand if needed.
Analysis
It seems that it's not possible to acquire a Kubernetes JWT using the service account while the namespace is being deleted. The token is acquired in the `prepareContext` function (see below) and is needed to run the proper cleanup logic (potentially cleaning content from Vault) and to clear the finalizer.
See vault-config-operator/api/v1alpha1/utils/commons.go, line 171 in 18c909d
Unless I'm mistaken, the `ServiceAccount` resource has most probably already been removed from the namespace at this point, as it is a resource that is not managed by the operator. As a result, when the operator attempts to generate a token using the service account, the API server's `ServiceAccount` admission controller attempts to create it automatically in the namespace, which cannot happen because of the `Terminating` state, hence the error.