Scale up logic #173
This was referenced Apr 19, 2024
Hi @suhlrich. Following this comment, I want to suggest how to implement the "scale up logic", which is in fact metric submission logic.
This was referenced May 3, 2024
Celery will check the queue length every 20 seconds and update the desired_asg_gpu_instances variable in CloudWatch accordingly.

For counting stopped trials, we can use some of the logic at the beginning of this (opencap-api/mcserver/views.py, line 1401 in a937c73).

Config variables somewhere when celery starts?
- autoscale_gpus_on: some way to toggle whether we are using autoscaling
- queue_length_before_scaling_start: how long will we let the queue be before starting asg machines
- queue_length_before_new_machine: how many jobs in the queue per asg machine

Logic that gets executed on celery. This is functional python code:
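A minimal sketch of how the three config variables could combine into a value for desired_asg_gpu_instances. This is an illustration, not the project's actual code. The `AutoscaleConfig` class, the `compute_desired_asg_gpu_instances` function, and the rule itself are assumptions: no machines while the queue is at or below the start threshold, then one machine per `queue_length_before_new_machine` jobs above it, rounded up. Counting the queue and publishing the result to CloudWatch are left out.

```python
import math
from dataclasses import dataclass


@dataclass
class AutoscaleConfig:
    """Hypothetical container for the config variables named in this issue."""
    autoscale_gpus_on: bool
    queue_length_before_scaling_start: int
    queue_length_before_new_machine: int


def compute_desired_asg_gpu_instances(queue_length: int, config: AutoscaleConfig) -> int:
    """Return how many ASG GPU machines to request for a given queue length.

    Assumed rule (not confirmed by the project): scale only on the jobs
    that exceed the start threshold, one machine per
    `queue_length_before_new_machine` excess jobs, rounded up.
    """
    if config.queue_length_before_new_machine <= 0:
        raise ValueError("queue_length_before_new_machine must be positive")

    # Autoscaling switched off: never request ASG machines.
    if not config.autoscale_gpus_on:
        return 0

    # Jobs beyond what the always-on machines are expected to absorb.
    excess_jobs = queue_length - config.queue_length_before_scaling_start
    if excess_jobs <= 0:
        return 0

    return math.ceil(excess_jobs / config.queue_length_before_new_machine)


if __name__ == "__main__":
    # Illustrative values only; real thresholds would come from configuration.
    config = AutoscaleConfig(
        autoscale_gpus_on=True,
        queue_length_before_scaling_start=5,
        queue_length_before_new_machine=5,
    )
    for queue_length in (0, 5, 6, 10, 11):
        print(queue_length, compute_desired_asg_gpu_instances(queue_length, config))
```

A periodic Celery task running every 20 seconds would call a function like this with the current queue length and submit the result as the metric value. Keeping the computation as a pure function makes it easy to unit test without AWS access.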