
Don't set limits for Autoscaler Buffer #593

Merged
merged 1 commit into codeready-toolchain:master from the limit branch on Aug 12, 2024

Conversation

@alexeykazakov (Contributor) commented on Aug 10, 2024

It's actually not a good idea to set both limits and requests to the same value for our autoscaler buffer.

There are three Quality of Service (QoS) classes in Kubernetes:

  • Guaranteed
  • Burstable
  • BestEffort

A Pod is assigned the Guaranteed QoS class if:

  • Every Container in the Pod must have a memory limit and a memory request.
  • For every Container in the Pod, the memory limit must equal the memory request.
  • Every Container in the Pod must have a CPU limit and a CPU request.
  • For every Container in the Pod, the CPU limit must equal the CPU request.

Guaranteed pods are evicted last when there is memory pressure on the node. By removing the limits and keeping only the requests, we move the Autoscaler Buffer to the Burstable QoS class, the same class as most user pods (though some user pods can be Guaranteed). So the Autoscaler Buffer pods should be evicted first due to their lower Priority Class.

To be clear: this only affects the eviction case when there is memory pressure on the node. It doesn't affect pod scheduling; we should already be fine there thanks to the lower Priority Class of the Autoscaler Buffer pods.
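As a rough illustration of what the change amounts to (a minimal sketch, not the actual host-operator code; the package, function, container name, image, and values are assumptions), building the buffer container with requests only and no limits looks roughly like this:

```go
// Minimal sketch: a buffer container spec with requests only.
// Setting limits equal to requests would make the pod Guaranteed (evicted last);
// leaving limits unset keeps it Burstable, so together with its low priority
// class it is evicted first under memory pressure.
package buffer

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// bufferContainer reserves capacity via requests but deliberately sets no limits.
// The name and image are illustrative assumptions.
func bufferContainer(memory, cpu string) corev1.Container {
	return corev1.Container{
		Name:  "autoscaler-buffer",
		Image: "registry.k8s.io/pause:3.9", // placeholder image (assumption)
		Resources: corev1.ResourceRequirements{
			// Requests "hold" the compute resources for the buffer.
			Requests: corev1.ResourceList{
				corev1.ResourceMemory: resource.MustParse(memory),
				corev1.ResourceCPU:    resource.MustParse(cpu),
			},
			// No Limits: leaving them unset keeps the pod in the Burstable
			// QoS class instead of Guaranteed.
		},
	}
}
```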

For more details see:

Paired with codeready-toolchain/toolchain-e2e#1029

sonarcloud bot commented Aug 10, 2024

codecov bot commented Aug 10, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 83.56%. Comparing base (a0cdbff) to head (d179a60).

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #593   +/-   ##
=======================================
  Coverage   83.56%   83.56%           
=======================================
  Files          28       28           
  Lines        2604     2604           
=======================================
  Hits         2176     2176           
  Misses        288      288           
  Partials      140      140           

@MatousJobanek (Contributor) left a comment


Interesting, thanks for all the description and the links. I wasn't aware of the different levels of QoS, but it makes sense. I always thought that the kubelet ranks the pods based on the priority class first and then based on their consumption compared to their requests.

openshift-ci bot commented Aug 12, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alexeykazakov, MatousJobanek

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [MatousJobanek,alexeykazakov]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

alexeykazakov merged commit d1a683d into codeready-toolchain:master on Aug 12, 2024
11 of 12 checks passed
alexeykazakov deleted the limit branch on August 12, 2024 at 15:51
@mfrancisc (Contributor) commented:

Thanks for explaining this and for the links!

Maybe a dumb question, but why do we want the autoscaler pods to be evicted first in case of memory pressure on the node? I guess it's because we can make room and avoid evicting user pods first, but maybe that's not the (only) reason.

@alexeykazakov (Contributor, Author) replied:

@mfrancisc the whole purpose of the Autoscaler Buffer is to "hold" some compute resources, so that the cluster autoscaler does not shrink the cluster too far in case other pods (like user workloads) suddenly need more room.

  • So the buffer reserves some compute resources.
  • If other pods need more resources than our nodes currently have, the buffer is evicted and there are immediately enough resources for the other pods, without waiting for the autoscaler to kick in and create more nodes (which takes minutes).
  • As soon as the buffer is evicted, the autoscaler will still start provisioning a node, because it needs somewhere to run the now-homeless buffer.

So yes, we want the buffer to be the most likely candidate for eviction, before anything else.
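To illustrate the other half of the pattern (a hypothetical sketch, not the operator's actual configuration; the PriorityClass name and value are assumptions), the "evict the buffer first" behaviour comes from giving the buffer pods a very low Priority Class:

```go
// Hypothetical sketch: a very low PriorityClass for the buffer pods so the
// scheduler preempts them before any regular workload when capacity is needed.
package buffer

import (
	schedulingv1 "k8s.io/api/scheduling/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// bufferPriorityClass ranks buffer pods below default-priority (0) pods,
// making them the first candidates for preemption and eviction.
func bufferPriorityClass() *schedulingv1.PriorityClass {
	return &schedulingv1.PriorityClass{
		ObjectMeta: metav1.ObjectMeta{Name: "autoscaler-buffer-low-priority"}, // name is an assumption
		// A negative value ranks the buffer below every default-priority pod.
		Value:         -100,
		GlobalDefault: false,
		Description:   "Low priority for cluster autoscaler buffer pods (illustrative)",
	}
}
```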

@mfrancisc (Contributor) replied:

Thanks a lot for the context and nice job with configuring this buffer mechanism 👍
