S3ng: S3 health /ready check #6774

Open
wkloucek opened this issue Jul 11, 2023 · 7 comments
Labels
Category:Enhancement Add new functionality

Comments

@wkloucek
Contributor

Is your feature request related to a problem? Please describe.

To gain confidence that the system is up and running, I would love the storageusers service to do an S3 health check.

Describe the solution you'd like

The storageusers service could do a regular HEAD request on the configured bucket if S3ng is used. This would help build confidence that the storageusers service has the right S3 credentials and is able to connect.
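A minimal sketch of such a bucket check, not the oCIS implementation. `BucketChecker`, `CheckBucket`, `fakeS3`, `defaultTimeout` and `errDenied` are hypothetical names made up for this example. The interface is modelled on the `BucketExists` method of minio-go v7, which to my knowledge issues a HEAD request on the bucket, so a real `*minio.Client` should be usable in place of the fake.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// BucketChecker is a hypothetical, minimal interface for this sketch.
// Assumption: a minio-go v7 client satisfies it via its BucketExists method.
type BucketChecker interface {
	BucketExists(ctx context.Context, bucket string) (bool, error)
}

// defaultTimeout is an arbitrary value chosen for the example.
const defaultTimeout = 2 * time.Second

// errDenied simulates an S3 error caused by invalid credentials.
var errDenied = errors.New("access denied")

// CheckBucket returns nil if the bucket is reachable with the configured
// credentials, and a descriptive error otherwise.
func CheckBucket(c BucketChecker, bucket string, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	ok, err := c.BucketExists(ctx, bucket)
	if err != nil {
		return fmt.Errorf("s3 check for bucket %q failed: %w", bucket, err)
	}
	if !ok {
		return fmt.Errorf("s3 bucket %q does not exist", bucket)
	}
	return nil
}

// fakeS3 stands in for a real S3 client so the sketch runs without a network.
type fakeS3 struct {
	exists bool
	err    error
}

func (f fakeS3) BucketExists(ctx context.Context, bucket string) (bool, error) {
	return f.exists, f.err
}

func main() {
	fmt.Println(CheckBucket(fakeS3{exists: true}, "ocis", defaultTimeout))
	fmt.Println(CheckBucket(fakeS3{err: errDenied}, "ocis", defaultTimeout))
}
```

Keeping the check behind a small interface means the same function could back a CLI command, a periodic job, or a readiness handler.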

This does not need to be scheduled by oCIS itself; a command that I can invoke from the CLI would also be fine for me.

Describe alternatives you've considered

Wait for a user to upload a file and see the upload fail.

Additional context

This would greatly improve maintainability.

@wkloucek wkloucek added the Category:Enhancement Add new functionality label Jul 11, 2023
@wkloucek wkloucek changed the title S3ng: S3 health check S3ng: S3 health /ready check Aug 4, 2023
@micbar
Contributor

micbar commented Sep 1, 2023

@wkloucek @d7oc We should prioritize this.

Would that make sense to do via a readiness probe?

@wkloucek
Contributor Author

wkloucek commented Sep 1, 2023

> @wkloucek @d7oc We should prioritize this.
>
> Would that make sense to do via a readiness probe?

This should probably be reflected in both the readiness and liveness probes.

See e.g.: https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-best-practices-setting-up-health-checks-with-readiness-and-liveness-probes?hl=en

The storage-users service can neither be ready nor healthy if the S3 bucket is not accessible. At least for the current implementation of the s3ng driver.

@dragotin
Contributor

In my understanding, storage-users has to be able to deal properly with a non-working S3 backend, without any data loss.
The check whether S3 is up and running, and the overall health check for the entire system, should IMHO be done by a monitoring system that puts the entire system into maintenance mode if one of the critical components is missing.

@wkloucek
Contributor Author

> The check whether S3 is up and running, and the overall health check for the entire system, should IMHO be done by a monitoring system that puts the entire system into maintenance mode if one of the critical components is missing.

What does this maintenance mode look like? What are the indicators for putting oCIS into maintenance mode?

@wkloucek
Contributor Author

One important thing to consider:

Storageusers starts without complaining even if the bucket does not exist or the credentials are not valid. It presumably only fails on the first user request.

@dragotin
Contributor

Maintenance mode: if switched on (compare oC10), the instance answers every client with a specific HTTP status code (IIRC it was 503 in oC10), so that all clients know the server is in a non-functional state and behave correctly by not deleting things, or by informing the user gracefully.
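To illustrate the idea, here is a rough sketch of a maintenance switch as HTTP middleware. The `Maintenance` type, `okHandler`, `statusFor` and the `Retry-After` value of 30 seconds are assumptions made up for this example; nothing in this thread indicates that oCIS ships such a type.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// Maintenance is a hypothetical switch that can be flipped at runtime.
type Maintenance struct {
	on atomic.Bool // requires Go 1.19 or newer
}

// Set turns maintenance mode on or off.
func (m *Maintenance) Set(on bool) { m.on.Store(on) }

// Wrap returns a handler that answers 503 while maintenance mode is on
// and passes requests through to next otherwise.
func (m *Maintenance) Wrap(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if m.on.Load() {
			// Retry-After hints to clients when to try again (value is arbitrary).
			w.Header().Set("Retry-After", "30")
			http.Error(w, "service is in maintenance mode", http.StatusServiceUnavailable)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// okHandler stands in for the regular application routes.
var okHandler = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintln(w, "ok")
})

// statusFor sends one in-memory GET request through h and returns the status code.
func statusFor(h http.Handler) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/", nil))
	return rec.Code
}

func main() {
	m := &Maintenance{}
	app := m.Wrap(okHandler)
	fmt.Println(statusFor(app)) // prints 200
	m.Set(true)
	fmt.Println(statusFor(app)) // prints 503
}
```

An external monitoring system could flip the switch, which keeps the decision outside the service as suggested above.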

On startup, the storage users service should check a connection to the backend once and not start if something goes wrong. Imho.
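A startup check along these lines could look roughly like the following sketch. `verifyOnStartup`, `checkFunc` and `errBackend` are hypothetical names. The optional retry loop is an assumption added for illustration; with `attempts` set to 1 it checks the backend exactly once.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// checkFunc is any backend probe, for example a HEAD request on the S3 bucket.
type checkFunc func() error

// errBackend simulates a backend that cannot be reached.
var errBackend = errors.New("invalid credentials")

// verifyOnStartup runs check up to attempts times, sleeping delay between
// tries. It returns nil on the first success and the last error otherwise.
func verifyOnStartup(check checkFunc, attempts int, delay time.Duration) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 1; i <= attempts; i++ {
		if err = check(); err == nil {
			return nil
		}
		if i < attempts {
			time.Sleep(delay)
		}
	}
	return fmt.Errorf("storage backend not reachable after %d attempt(s): %w", attempts, err)
}

func main() {
	err := verifyOnStartup(func() error { return errBackend }, 1, 0)
	if err != nil {
		// A real service would log this and exit with a non-zero code here.
		fmt.Println("refusing to start:", err)
	}
}
```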

@wkloucek
Contributor Author

> On startup, the storage users service should check a connection to the backend once and not start if something goes wrong. Imho.

In Kubernetes, the storage-users service wouldn't even need to exit with an error code. It would be sufficient if the /readyz endpoint (mux.HandleFunc("/readyz", dopts.Ready)) didn't signal that oCIS is ready. Currently /readyz always signals "I'm ready, please give me work!" 😉.
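As a sketch of what a backend-aware /readyz could look like: the handler below answers 503 whenever a supplied check fails. `readyHandler`, `probe` and `errBucket` are hypothetical names, and how `dopts.Ready` is actually wired up in oCIS is not shown in this thread, so treat this as an illustration under those assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// errBucket simulates a failing S3 bucket check.
var errBucket = errors.New("bucket not accessible")

// readyHandler returns a /readyz-style handler that runs check on every
// request and reports 503 if the check fails, 200 otherwise.
func readyHandler(check func() error) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if err := check(); err != nil {
			http.Error(w, "not ready: "+err.Error(), http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
		fmt.Fprintln(w, "ready")
	}
}

// probe sends one in-memory GET /readyz to h and returns the status code.
func probe(h http.Handler) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/readyz", nil))
	return rec.Code
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/readyz", readyHandler(func() error { return errBucket }))
	fmt.Println(probe(mux)) // prints 503
}
```

Running the check on every probe request could put load on S3 if probes are frequent, so caching the last result for a few seconds may be worth considering.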

> Maintenance mode: if switched on (compare oC10), the instance answers every client with a specific HTTP status code (IIRC it was 503 in oC10), so that all clients know the server is in a non-functional state and behave correctly by not deleting things, or by informing the user gracefully.

Let's put that in a different ticket. Where should this go? Will this be an oCIS product feature? Or an oCIS Helm chart feature (then we could reuse owncloud/ocis-charts#339)?

For a SaaS project we already have a maintenance page during updates that returns 503 for every route / path / ... One downside is still that it's plain HTML, so there is no auto-refresh that automatically brings the user back to the service once the maintenance page is taken down.

But for a general unavailable page, oCIS doesn't handle /health endpoints well enough. From what I know, they just return "I'm happy and alive" all the time 😉.

3 participants