S3ng: S3 health /ready check #6774

wkloucek · 2023-07-11T16:06:40Z

Is your feature request related to a problem? Please describe.

To gain confidence that the system is up and running, I would like the storageusers service to perform an S3 health check.

Describe the solution you'd like

If S3ng is used, the storageusers service could regularly send a HEAD request to the configured bucket. This would help confirm that the storageusers service has the right S3 credentials and is able to connect.

This does not need to be scheduled by oCIS itself; a command that I can invoke from the CLI would also be fine for me.

Describe alternatives you've considered

Waiting for a user to upload a file and seeing the upload fail.

Additional context

This would greatly improve maintainability.

micbar · 2023-09-01T09:45:40Z

@wkloucek @d7oc We should prioritize this.

Would that make sense to do via a readiness probe?

wkloucek · 2023-09-01T09:49:51Z

> @wkloucek @d7oc We should prioritize this.
>
> Would that make sense to do via a readiness probe?

This should probably be reflected in both the readiness and the liveness probes.

See, e.g.: https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-best-practices-setting-up-health-checks-with-readiness-and-liveness-probes?hl=en

The storage-users service can be neither ready nor healthy if the S3 bucket is not accessible, at least with the current implementation of the s3ng driver.

dragotin · 2023-11-23T15:58:16Z

In my understanding, the storage-users service has to handle a non-working S3 backend properly, without any data loss.
IMHO, the check whether S3 is up and running, and the overall health check for the entire system, should be done by a monitoring system that simply puts the entire system into maintenance mode if one of the critical components is missing.

wkloucek · 2023-11-23T16:03:47Z

> The check if S3 is up and running and the overall health check for the entire system should be imho done via a monitoring system that just sets the entire system in maintenance mode in case one of the critical components is missing.

What does this maintenance mode look like? What are the indicators for putting oCIS into maintenance mode?

wkloucek · 2023-11-23T16:08:53Z

One important thing to consider:

The storageusers service starts without complaining even if the bucket does not exist or the credentials are invalid. It probably only fails on the first user request.

dragotin · 2023-11-23T16:21:24Z

Maintenance mode: If switched on (compare oC10), the instance answers every client request with a certain HTTP status code (IIRC it was 503 in oC10), so that all clients know the server is in a non-functional state and behave correctly, by not deleting things or by informing the user gracefully.

IMHO, on startup the storage-users service should check the connection to the backend once and refuse to start if something goes wrong.

wkloucek · 2023-11-24T07:29:22Z

> On startup, the storage users service should check a connection to the backend once and not start if something goes wrong. Imho.

In Kubernetes, the storage-users service wouldn't even need to exit with an error code. It would be sufficient if the /readyz endpoint (registered in ocis/ocis-pkg/service/debug/service.go, line 32 in bd213bb, via mux.HandleFunc("/readyz", dopts.Ready)) didn't signal that oCIS is ready. Currently, /readyz always signals "I'm ready, please give me work!" 😉

> Maintenance mode: If switched on (compare oC10) the instance answers to every client with a certain http reply code (IIRC it was 503 in oC10) so that all clients know that the server is in a non functional state and behave correctly by not deleting things or by informing the user gracefully.

Let's put that in a different ticket. Where should this go? Will this be an oCIS product feature? Will this be an oCIS Helm Chart feature (then we could reuse owncloud/ocis-charts#339)?

For a SaaS project, we already have a maintenance page during updates that returns 503 for every route / path / ... One downside is still that it's plain HTML with no auto-refresh, so users are not automatically taken back to the service once the maintenance page is removed.

But for a general "unavailable" page, oCIS doesn't handle /health endpoints well enough. From what I know, they just return "I'm happy and alive" all the time 😉

wkloucek added the Category:Enhancement label Jul 11, 2023

wkloucek mentioned this issue Aug 3, 2023: No error when S3 upload fails #6962 (closed)

wkloucek changed the title from "S3ng: S3 health check" to "S3ng: S3 health /ready check" Aug 4, 2023

This was referenced Sep 30, 2024:

Bugfix: Improve Ready and Health Checks #10163 (merged)

healthcheck not failing even when service registration is not successful #8783 (closed)
