Allow autoscaler to run concurrently #84

Open
iterion opened this issue Jan 21, 2021 · 1 comment

iterion commented Jan 21, 2021

We sometimes struggle to update clusters because circumstances can change mid-update and require the cluster to scale up its node count. If cluster-autoscaler weren't shut down, the cluster could still scale up to meet increased demand during the process.

I'm not sure of the best way to handle this, but it'd be fantastic if we could.

I think the primary issue with leaving the autoscaler on is that it prefers to shut down nodes that have nothing scheduled on them. This means the nodes that get spun up before rotating would be shut down prematurely. To combat this, we could annotate those nodes with "cluster-autoscaler.kubernetes.io/scale-down-disabled": "true". It might be hard to determine which nodes are "new", but disabling scale-down on all nodes that match the new launch configuration should be a good heuristic. Any nodes that join later would also need to be annotated, however. I believe it's also possible to disable scale-down entirely, but that would require modifying the autoscaler deployment, which makes it less attractive.
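
As a rough illustration of that heuristic (not something eks-rolling-update does today), a sketch like the following could annotate every node whose backing instance uses the new launch configuration. `NEW_LAUNCH_CONFIG` and the client setup are assumptions, and it only covers ASGs using launch configurations, not launch templates:

```python
# Sketch: pin nodes from the new launch configuration so cluster-autoscaler
# won't scale them down mid-rotation. NEW_LAUNCH_CONFIG is illustrative.
import boto3
from kubernetes import client, config

NEW_LAUNCH_CONFIG = "my-asg-lc-v2"  # hypothetical: the LC the rotation moves to
ANNOTATION = "cluster-autoscaler.kubernetes.io/scale-down-disabled"

config.load_kube_config()
v1 = client.CoreV1Api()
autoscaling = boto3.client("autoscaling")

for node in v1.list_node().items:
    if not node.spec.provider_id:
        continue
    # providerID looks like aws:///us-west-2a/i-0123456789abcdef0
    instance_id = node.spec.provider_id.rsplit("/", 1)[-1]
    resp = autoscaling.describe_auto_scaling_instances(InstanceIds=[instance_id])
    for inst in resp["AutoScalingInstances"]:
        if inst.get("LaunchConfigurationName") == NEW_LAUNCH_CONFIG:
            # Strategic merge patch adding the scale-down-disabled annotation.
            patch = {"metadata": {"annotations": {ANNOTATION: "true"}}}
            v1.patch_node(node.metadata.name, patch)
```

The same loop would need to be re-run (or driven by a watch) as new nodes join, per the caveat above.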

The next issue would be the desired count increasing while nodes are being rotated. This would confuse eks-rolling-update, since the actual count would have diverged from the original count it recorded. eks-rolling-update could perhaps tolerate increases to this number and update the ASG tags to match. If the number went down unexpectedly, that would still be an issue that causes the tool to abort.
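
A minimal sketch of that tolerance check, assuming the original capacity was captured before the rotation started; the ASG name and the tag key are hypothetical, not eks-rolling-update's actual tag scheme:

```python
# Sketch: tolerate an increased DesiredCapacity mid-rotation, abort on a
# decrease. ASG_NAME, original_desired, and the tag key are illustrative.
import boto3

asg = boto3.client("autoscaling")
ASG_NAME = "my-eks-asg"   # hypothetical
original_desired = 5      # captured before the rotation started

group = asg.describe_auto_scaling_groups(
    AutoScalingGroupNames=[ASG_NAME]
)["AutoScalingGroups"][0]
current = group["DesiredCapacity"]

if current > original_desired:
    # Autoscaler scaled up mid-rotation: adopt the new count and keep the
    # bookkeeping tag in sync so the final scale-back targets the new value.
    original_desired = current
    asg.create_or_update_tags(Tags=[{
        "ResourceId": ASG_NAME,
        "ResourceType": "auto-scaling-group",
        "Key": "eks-rolling-update:original_capacity",  # hypothetical key
        "Value": str(current),
        "PropagateAtLaunch": False,
    }])
elif current < original_desired:
    raise SystemExit("Desired capacity decreased unexpectedly; aborting.")
```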

I searched for related issues, but didn't see anything, so I apologize if this has already been noted.

derbauer97 commented Jan 25, 2021

We also ran into this issue and would be happy if someone has the time to contribute the feature. Meanwhile, we are using pods with a low priority (-1) that get terminated when another pod needs the node. We have a Deployment for each ASG that starts these pods. Before running the eks-rolling-update script we scale these Deployments out to the number of nodes we think we need, and afterwards we scale them back in. Maybe that's a solution for you as well.

Explanation: https://medium.com/scout24-engineering/cluster-overprovisiong-in-kubernetes-79433cb3ed0e
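
For reference, the overprovisioning pattern described above usually looks something like this minimal sketch; the names, replica count, and resource requests are illustrative, and the pause image is the usual placeholder:

```yaml
# Sketch: a negative-priority placeholder Deployment that reserves capacity
# and gets preempted as soon as a real workload needs the node.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -1
globalDefault: false
description: "Placeholder pods that any real workload may preempt."
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning-my-asg   # hypothetical: one Deployment per ASG
spec:
  replicas: 0                     # scaled out before eks-rolling-update runs
  selector:
    matchLabels:
      app: overprovisioning-my-asg
  template:
    metadata:
      labels:
        app: overprovisioning-my-asg
    spec:
      priorityClassName: overprovisioning
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              cpu: "1"            # sized so one replica roughly fills one node
              memory: 2Gi
```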
