
Don't set limits for Autoscaler Buffer #593

Merged
merged 1 commit into codeready-toolchain:master from the limit branch on Aug 12, 2024

Conversation

@alexeykazakov (Contributor) commented on Aug 10, 2024

It's actually not a good idea to set both limits and requests to the same value for our autoscaler buffer.

There are three Quality of Service (QoS) classes in Kubernetes:

  • Guaranteed
  • Burstable
  • BestEffort

A Pod is assigned the Guaranteed QoS class if:

  • Every Container in the Pod must have a memory limit and a memory request.
  • For every Container in the Pod, the memory limit must equal the memory request.
  • Every Container in the Pod must have a CPU limit and a CPU request.
  • For every Container in the Pod, the CPU limit must equal the CPU request.

Guaranteed pods are evicted last when there is memory pressure on the node. By removing the limits and keeping only the requests, we move the Autoscaler Buffer to the Burstable QoS class, the same class as most user pods (though some user pods can be Guaranteed). So the Autoscaler Buffer pods should be evicted first due to their lower Priority Class.

To be clear: this only affects the eviction case when there is memory pressure on the node. It doesn't affect pod scheduling; we should already be fine there thanks to the lower Priority Class of the Autoscaler Buffer pods.
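As a rough illustration of what the change amounts to (a minimal sketch, not the actual host-operator code; the package, function, container name, image, and values are assumptions), building the buffer container with requests only and no limits looks roughly like this:

```go
// Minimal sketch: a buffer container spec with requests only.
// Setting limits equal to requests would make the pod Guaranteed (evicted last);
// leaving limits unset keeps it Burstable, so together with its low priority
// class it is evicted first under memory pressure.
package buffer

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// bufferContainer reserves capacity via requests but deliberately sets no limits.
// The name and image are illustrative assumptions.
func bufferContainer(memory, cpu string) corev1.Container {
	return corev1.Container{
		Name:  "autoscaler-buffer",
		Image: "registry.k8s.io/pause:3.9", // placeholder image (assumption)
		Resources: corev1.ResourceRequirements{
			// Requests "hold" the compute resources for the buffer.
			Requests: corev1.ResourceList{
				corev1.ResourceMemory: resource.MustParse(memory),
				corev1.ResourceCPU:    resource.MustParse(cpu),
			},
			// No Limits: leaving them unset keeps the pod in the Burstable
			// QoS class instead of Guaranteed.
		},
	}
}
```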

For more details see:

Paired with codeready-toolchain/toolchain-e2e#1029

sonarcloud bot commented Aug 10, 2024

codecov bot commented Aug 10, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 83.56%. Comparing base (a0cdbff) to head (d179a60).

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #593   +/-   ##
=======================================
  Coverage   83.56%   83.56%           
=======================================
  Files          28       28           
  Lines        2604     2604           
=======================================
  Hits         2176     2176           
  Misses        288      288           
  Partials      140      140           

@MatousJobanek (Contributor) left a comment


Interesting, thanks for all the description and the links. I wasn't aware of the different levels of QoS, but it makes sense. I always thought that the kubelet ranks the pods based on the priority class first and then based on their consumption compared to their requests.

openshift-ci bot commented Aug 12, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alexeykazakov, MatousJobanek

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [MatousJobanek,alexeykazakov]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

alexeykazakov merged commit d1a683d into codeready-toolchain:master on Aug 12, 2024
11 of 12 checks passed
alexeykazakov deleted the limit branch on August 12, 2024 at 15:51
@mfrancisc (Contributor) commented:

Thanks for explaining this and for the links!

Maybe a dumb question, but why do we want the autoscaler pods to be evicted first in case of memory pressure on the node? I guess it's because we can make room and avoid evicting user pods first, but maybe that's not the (only) reason.

@alexeykazakov (Contributor, Author) replied:

@mfrancisc the whole purpose of the Autoscaler Buffer is to "hold" some compute resources, so that the cluster autoscaler does not shrink the cluster too far in case other pods (like user workloads) suddenly need more room.

  • So the buffer reserves some compute resources.
  • If other pods need more resources than our nodes currently have, the buffer is evicted and there are immediately enough resources for the other pods, without waiting for the autoscaler to kick in and create more nodes (which takes minutes).
  • As soon as the buffer is evicted, the autoscaler will still start provisioning a node, because it needs somewhere to run the now-homeless buffer.

So yes, we want the buffer to be the most likely candidate for eviction, before anything else.
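To illustrate the other half of the pattern (a hypothetical sketch, not the operator's actual configuration; the PriorityClass name and value are assumptions), the "evict the buffer first" behaviour comes from giving the buffer pods a very low Priority Class:

```go
// Hypothetical sketch: a very low PriorityClass for the buffer pods so the
// scheduler preempts them before any regular workload when capacity is needed.
package buffer

import (
	schedulingv1 "k8s.io/api/scheduling/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// bufferPriorityClass ranks buffer pods below default-priority (0) pods,
// making them the first candidates for preemption and eviction.
func bufferPriorityClass() *schedulingv1.PriorityClass {
	return &schedulingv1.PriorityClass{
		ObjectMeta: metav1.ObjectMeta{Name: "autoscaler-buffer-low-priority"}, // name is an assumption
		// A negative value ranks the buffer below every default-priority pod.
		Value:         -100,
		GlobalDefault: false,
		Description:   "Low priority for cluster autoscaler buffer pods (illustrative)",
	}
}
```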

@mfrancisc (Contributor) replied:

Thanks a lot for the context and nice job with configuring this buffer mechanism 👍
