This repository has been archived by the owner on Apr 27, 2023. It is now read-only.

Check Readiness, Liveness and graceful shutdown of submission, client and continuum-adaptor #1306

Open
2 of 11 tasks
erkannt opened this issue Aug 13, 2020 · 1 comment

@erkannt
Contributor

erkannt commented Aug 13, 2020

Currently all our deployments query /health to determine Readiness and Liveness of the container.

  • Readiness signals that the app is ready to receive requests
  • a failing Liveness check triggers a restart of the container

It is not recommended to use the same check for readiness and liveness.

From the official docs:

If the process in your container is able to crash on its own whenever it encounters an issue or becomes unhealthy, you do not necessarily need a liveness probe; the kubelet will automatically perform the correct action in accordance with the Pod's restartPolicy.
If you'd like your container to be killed and restarted if a probe fails, then specify a liveness probe, and specify a restartPolicy of Always or OnFailure.
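One way to split the two checks is sketched below. This is only an illustration, not how client, submission or continuum-adaptor currently work: the paths /live and /ready, the ProbeState flags and both function names are hypothetical, and what counts as "ready" would have to be decided per component.

```typescript
import * as http from "http";

// Hypothetical flags; a real service would set these from its own
// startup sequence and dependency checks.
interface ProbeState {
  started: boolean;        // the process has finished booting
  dependenciesUp: boolean; // whatever the service needs to parse and serve requests
  shuttingDown: boolean;   // set once SIGTERM has been received
}

// Maps a probe path to an HTTP status code.
function probeStatus(path: string, state: ProbeState): number {
  switch (path) {
    case "/live":
      // Liveness: answering at all shows the event loop is not stuck.
      // It deliberately ignores dependencies, so an outage elsewhere
      // does not cause the container to be restarted.
      return 200;
    case "/ready":
      // Readiness: only accept traffic when requests can actually be handled.
      return state.started && state.dependenciesUp && !state.shuttingDown
        ? 200
        : 503;
    default:
      return 404;
  }
}

// Wraps probeStatus in a plain Node HTTP server (assumed setup).
function createProbeServer(state: ProbeState): http.Server {
  return http.createServer((req, res) => {
    res.statusCode = probeStatus(req.url ?? "", state);
    res.end();
  });
}
```

With this split, a failing /ready only takes the pod out of rotation, while a restart happens only when /live stops answering.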

I'm creating this ticket as I don't have sufficient understanding of our components to assess the questions raised by the DoD.

DoD:

  • only return 200 on /health when deployment can receive and parse requests
    • client
    • submission
    • continuum-adaptor
  • deployment crashes and terminates itself, or has a Liveness Probe that can detect crashes/hangs/deadlocks etc.
    • client
    • submission
    • continuum-adaptor
  • submission and continuum-adaptor run a single process (no nodejs cluster-mode etc)
  • submission handles SIGTERM gracefully (see Properly handle sigterm reviewer-submission#223)
  • default terminationGracePeriodSeconds is set to a value in the Chart that avoids submission receiving a SIGKILL (see terminationGracePeriodSeconds: 120)
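
For the SIGTERM and grace-period items, a minimal sketch of graceful shutdown for a Node HTTP server could look like the following. The helper names shutdownGracefully and installSigtermHandler are made up for illustration and this is not the code from reviewer-submission#223; the only assumption is that the service exposes a plain http.Server.

```typescript
import * as http from "http";

type ShutdownOutcome = "drained" | "timed-out";

// Stops accepting new connections and waits for in-flight requests,
// giving up after timeoutMs so the process can exit on its own terms.
function shutdownGracefully(
  server: http.Server,
  timeoutMs: number
): Promise<ShutdownOutcome> {
  return new Promise<ShutdownOutcome>((resolve) => {
    const timer = setTimeout(() => resolve("timed-out"), timeoutMs);
    server.close(() => {
      clearTimeout(timer);
      resolve("drained");
    });
  });
}

// Registers a one-shot SIGTERM handler that drains the server and then exits.
function installSigtermHandler(server: http.Server, timeoutMs: number): void {
  process.once("SIGTERM", async () => {
    const outcome = await shutdownGracefully(server, timeoutMs);
    // Non-zero exit code when connections had to be abandoned.
    process.exit(outcome === "drained" ? 0 : 1);
  });
}
```

The timeout passed to installSigtermHandler would need to stay below terminationGracePeriodSeconds (for example 100000 ms against a grace period of 120 seconds) so that the process exits before the kubelet sends SIGKILL. Connections that stay open, such as keep-alive connections, are covered by the same timeout.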
@erkannt erkannt added this to the Release milestone Aug 13, 2020
@hdrury1 hdrury1 modified the milestones: Release, Rollout Aug 24, 2020
@erkannt erkannt changed the title Check Readiness, Liveness and graceful shutdown of submission Check Readiness, Liveness and graceful shutdown of submission, client and continuum-adaptor Oct 6, 2020
@hdrury1

hdrury1 commented Oct 7, 2020

This is still relevant, as our deployments currently go against recommended practice regarding liveness/readiness probes. IMHO the important thing about this ticket is that whoever owns the deployment understands the issue and can then make up their mind if/how to address it. The current setup shouldn't break anything but could lead to weird side effects/issues later on. It might be fully sufficient to just remove the liveness probe if our deployments are expected to be well-behaved and crash if they run into any problem.
