Allow autoscaler to run concurrently #84

Open
iterion opened this issue Jan 21, 2021 · 1 comment

iterion commented Jan 21, 2021

We sometimes struggle to update clusters because circumstances can change mid-update and require the cluster to scale up its node count. If cluster-autoscaler weren't shut down, the cluster could still scale up to meet increased demand during the process.

I'm not sure of the best way to handle this, but it'd be fantastic if we could.

I think the primary issue with leaving the autoscaler on is that it prefers to shut down nodes that have nothing scheduled on them. This means the nodes that get spun up before rotating would be shut down prematurely. To combat this, we could annotate those nodes with "cluster-autoscaler.kubernetes.io/scale-down-disabled": "true". It might be hard to determine which nodes are "new", but disabling scale-down on all nodes that match the new launch configuration should be a good heuristic. Any nodes that join later would also need to be annotated, however. I believe it's also possible to disable scale-down entirely, but that would require modifying the autoscaler deployment, which makes it less attractive.
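
As a rough illustration of that heuristic (not something eks-rolling-update does today), a sketch like the following could annotate every node whose backing instance uses the new launch configuration. `NEW_LAUNCH_CONFIG` and the client setup are assumptions, and it only covers ASGs using launch configurations, not launch templates:

```python
# Sketch: pin nodes from the new launch configuration so cluster-autoscaler
# won't scale them down mid-rotation. NEW_LAUNCH_CONFIG is illustrative.
import boto3
from kubernetes import client, config

NEW_LAUNCH_CONFIG = "my-asg-lc-v2"  # hypothetical: the LC the rotation moves to
ANNOTATION = "cluster-autoscaler.kubernetes.io/scale-down-disabled"

config.load_kube_config()
v1 = client.CoreV1Api()
autoscaling = boto3.client("autoscaling")

for node in v1.list_node().items:
    if not node.spec.provider_id:
        continue
    # providerID looks like aws:///us-west-2a/i-0123456789abcdef0
    instance_id = node.spec.provider_id.rsplit("/", 1)[-1]
    resp = autoscaling.describe_auto_scaling_instances(InstanceIds=[instance_id])
    for inst in resp["AutoScalingInstances"]:
        if inst.get("LaunchConfigurationName") == NEW_LAUNCH_CONFIG:
            # Strategic merge patch adding the scale-down-disabled annotation.
            patch = {"metadata": {"annotations": {ANNOTATION: "true"}}}
            v1.patch_node(node.metadata.name, patch)
```

The same loop would need to be re-run (or driven by a watch) as new nodes join, per the caveat above.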

The next issue would be the desired count increasing while nodes are being rotated. This would confuse eks-rolling-update, since the actual count would have diverged from the original count it recorded. eks-rolling-update could perhaps tolerate increases to this number and update the ASG tags to match. If the number went down unexpectedly, that would still be an issue that causes the tool to abort.
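
A minimal sketch of that tolerance check, assuming the original capacity was captured before the rotation started; the ASG name and the tag key are hypothetical, not eks-rolling-update's actual tag scheme:

```python
# Sketch: tolerate an increased DesiredCapacity mid-rotation, abort on a
# decrease. ASG_NAME, original_desired, and the tag key are illustrative.
import boto3

asg = boto3.client("autoscaling")
ASG_NAME = "my-eks-asg"   # hypothetical
original_desired = 5      # captured before the rotation started

group = asg.describe_auto_scaling_groups(
    AutoScalingGroupNames=[ASG_NAME]
)["AutoScalingGroups"][0]
current = group["DesiredCapacity"]

if current > original_desired:
    # Autoscaler scaled up mid-rotation: adopt the new count and keep the
    # bookkeeping tag in sync so the final scale-back targets the new value.
    original_desired = current
    asg.create_or_update_tags(Tags=[{
        "ResourceId": ASG_NAME,
        "ResourceType": "auto-scaling-group",
        "Key": "eks-rolling-update:original_capacity",  # hypothetical key
        "Value": str(current),
        "PropagateAtLaunch": False,
    }])
elif current < original_desired:
    raise SystemExit("Desired capacity decreased unexpectedly; aborting.")
```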

I searched for related issues, but didn't see anything, so I apologize if this has already been noted.

derbauer97 commented Jan 25, 2021

We also ran into this issue and would be happy if someone has the time to contribute the feature. Meanwhile, we are using pods with a low priority (-1) that get terminated when another pod needs the node. We have a Deployment for each ASG that starts these pods. Before running the eks-rolling-update script we scale these Deployments out to the number of nodes we think we need, and afterwards we scale them back in. Maybe that's a solution for you as well.

Explanation: https://medium.com/scout24-engineering/cluster-overprovisiong-in-kubernetes-79433cb3ed0e
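
For reference, the overprovisioning pattern described above usually looks something like this minimal sketch; the names, replica count, and resource requests are illustrative, and the pause image is the usual placeholder:

```yaml
# Sketch: a negative-priority placeholder Deployment that reserves capacity
# and gets preempted as soon as a real workload needs the node.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -1
globalDefault: false
description: "Placeholder pods that any real workload may preempt."
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning-my-asg   # hypothetical: one Deployment per ASG
spec:
  replicas: 0                     # scaled out before eks-rolling-update runs
  selector:
    matchLabels:
      app: overprovisioning-my-asg
  template:
    metadata:
      labels:
        app: overprovisioning-my-asg
    spec:
      priorityClassName: overprovisioning
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              cpu: "1"            # sized so one replica roughly fills one node
              memory: 2Gi
```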
