Ingester fails on DigitalOcean Kubernetes due to permission issues #996

Open
ThomasVitale opened this issue Jul 30, 2024 · 5 comments
Labels
bug Something isn't working

ThomasVitale commented Jul 30, 2024

Tempo 2.5.0 (supported in Tempo Operator 0.11.0+) switched the container user from root to tempo (10001:10001) and changed the ownership of /var/tempo to match. In some Kubernetes distributions such as DigitalOcean, persistent volumes are always created with a filesystem owned by root:root, so the non-root tempo user cannot write to the volume and the Ingester Pod fails with the following error.

level=error ts=2024-07-30T08:55:08.620090627Z caller=main.go:121 msg="error running Tempo" err="failed to init module services: error initialising module: store: failed to create store: mkdir /var/tempo/wal: permission denied"

Unfortunately, DigitalOcean Kubernetes doesn't support the mountOptions settings to change the file permissions in the mounted volume, so Tempo Operator cannot be used in that environment as-is.

If I understand correctly, Tempo Operator 0.11.0 introduced a Job that runs when upgrading the Operator from previous versions to change the permissions on the volume (see https://github.com/grafana/tempo-operator/releases/tag/v0.11.0). Would it be an idea to allow running that same Job even on new installations, in environments like DigitalOcean Kubernetes? (perhaps, behind a configuration flag)

If that's not possible, I see two other options to make Tempo Operator support DigitalOcean Kubernetes:

  • Allow the configuration of an "initContainer" for the Ingester component, responsible for changing the permissions explicitly (the solution described in the DigitalOcean documentation)
  • Allow the configuration of the SecurityContext for the Ingester component to run the container as 1000:1000 instead of 10001:10001, though not really a production-ready solution. This is what the tempo-distributed Helm chart does.
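For illustration, the initContainer option could look roughly like the fragment below. This is only a sketch of the general Kubernetes pattern, not something the operator exposes today: the container name, the busybox image, and the volume name `data` are assumptions, and the init container has to run as root to be able to change ownership.

```yaml
# Hypothetical fragment of the ingester Pod template (not an operator setting).
spec:
  template:
    spec:
      initContainers:
        - name: fix-permissions          # hypothetical name
          image: busybox:1.36            # assumed image; any image with chown works
          # Hand the volume over to the tempo user (10001:10001).
          command: ["sh", "-c", "chown -R 10001:10001 /var/tempo"]
          securityContext:
            runAsUser: 0                 # root is required to chown a root-owned volume
          volumeMounts:
            - name: data                 # assumed name of the ingester volume
              mountPath: /var/tempo
```

If the Pod enforces `runAsNonRoot` at the Pod level, the init container would need its own override for this to work.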
@pavolloffay
Collaborator

Thanks for reporting this.

Would setting securityContext.fsGroup help you resolve the issue on DigitalOcean? See grafana/helm-charts#3161 (comment)

Perhaps the operator could always set it.

@ThomasVitale
Author

ThomasVitale commented Jul 31, 2024

@pavolloffay thanks so much for the quick answer. I've just verified on a DigitalOcean cluster that adding the following to the Pod spec in the StatefulSet for the ingester component solves the problem.

securityContext:
  fsGroup: 10001

It would be great if the Operator would allow that configuration. Even better if it's by default.
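For anyone looking for where this goes: `fsGroup` is a Pod-level setting, so in a StatefulSet it sits under `spec.template.spec`. A minimal sketch, assuming a hypothetical StatefulSet name (the real name depends on your TempoStack):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: tempo-example-ingester   # hypothetical name
spec:
  template:
    spec:
      # Pod-level security context. With fsGroup set, the kubelet makes
      # supported volumes group-owned by GID 10001 when mounting them,
      # which lets the tempo user create /var/tempo/wal.
      securityContext:
        fsGroup: 10001
```

Note that the operator may revert manual changes to objects it manages, so this only sticks if the operator sets the field itself or is not reconciling the object.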

@siegenthalerroger

I'm facing the same issue on AKS with a clean deployment of a TempoStack. Is there a workaround that currently works, or is a downgrade the only option?

@pavolloffay
Collaborator

You can switch it to unmanaged mode and fix the permissions manually: https://github.com/grafana/tempo-operator/blob/main/apis/tempo/v1alpha1/tempostack_types.go#L35.

The unmanaged mode allows you to directly modify k8s objects managed by the operator.
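A rough sketch of what that could look like on the TempoStack resource. The field name `managementState` and the value `Unmanaged` are my reading of the linked types file, so treat them as assumptions and check them against your operator version; the resource name is hypothetical.

```yaml
apiVersion: tempo.grafana.com/v1alpha1
kind: TempoStack
metadata:
  name: example                  # hypothetical name
spec:
  # Assumed field and value: stops the operator from reconciling the
  # generated objects, so manual edits (e.g. adding fsGroup to the
  # ingester StatefulSet) are not reverted.
  managementState: Unmanaged
```

While the stack is unmanaged, the operator will not apply its own updates to those objects, so it is worth switching back once a fix is available.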

@HendrikLevering

Facing the same issue. I solved it for now by using operator version 0.10.0.
