Adding mmpose #17
base: feature/ecs-on-ec2-auto-scaling
re. !17, see aws/amazon-ecs-agent#1514 (comment) for fix details
Hi @antoinefalisse!
Is this still an issue? Later messages show that the worker actually started and processed work. In general, I've seen this behavior when the container instance didn't have enough memory to place the task. In any case, I've improved this in e802650 to make the behavior more predictable and adjustable.
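To check the "not enough memory to place the task" explanation, one option is to compare the task's memory requirement with the unreserved memory ECS reports for each container instance. This is a minimal sketch, assuming the `remainingResources` list shape returned by boto3's `ecs.describe_container_instances`; the helper names, ARNs, and MiB values are hypothetical.

```python
from typing import Any, Dict, List


def remaining_memory_mib(container_instance: Dict[str, Any]) -> int:
    """Unreserved MEMORY (MiB) that ECS reports for one container instance."""
    # Entries look like {"name": "MEMORY", "type": "INTEGER", "integerValue": 1234}.
    for resource in container_instance.get("remainingResources", []):
        if resource.get("name") == "MEMORY":
            return int(resource.get("integerValue", 0))
    return 0


def instances_that_fit(
    container_instances: List[Dict[str, Any]], task_memory_mib: int
) -> List[str]:
    """ARNs of instances with enough unreserved memory to place the task."""
    return [
        ci["containerInstanceArn"]
        for ci in container_instances
        if remaining_memory_mib(ci) >= task_memory_mib
    ]


if __name__ == "__main__":
    # Hypothetical sample shaped like describe_container_instances(...)["containerInstances"].
    sample = [
        {"containerInstanceArn": "arn:example/a",
         "remainingResources": [{"name": "MEMORY", "type": "INTEGER", "integerValue": 2048}]},
        {"containerInstanceArn": "arn:example/b",
         "remainingResources": [{"name": "MEMORY", "type": "INTEGER", "integerValue": 8192}]},
    ]
    print(instances_that_fit(sample, 4096))  # ['arn:example/b']
```

An empty result means no current instance can take the task, which is the situation where scaling out (rather than placement) has to kick in.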
My bad :( Fixed in 0859046; you can revert the hotfix in opencap-core.
Must be because of
I saw this, which suggests you tried to scale the instances manually. I can't explain how, but my gut feeling is that this might have caused three instances. Let's run another test with the above fixes, especially the unprotect-on-crash changes. Another extremely odd thing is that I never see AlarmLow being triggered. I'll look into this instance more, but we'll definitely need another test.
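The unprotect-on-crash idea can be sketched as a try/finally around the worker's processing. This is an illustration under assumptions, not the change made in this PR: the helper name and how the instance ID and ASG name are obtained are hypothetical, while `set_instance_protection` is the boto3 Auto Scaling client call for toggling scale-in protection.

```python
from typing import Any, Callable


def run_with_scale_in_protection(
    autoscaling: Any,
    instance_id: str,
    asg_name: str,
    work: Callable[[], None],
) -> None:
    """Protect the instance while `work` runs, and always unprotect it afterwards.

    `autoscaling` is expected to behave like boto3.client("autoscaling").
    """
    autoscaling.set_instance_protection(
        InstanceIds=[instance_id],
        AutoScalingGroupName=asg_name,
        ProtectedFromScaleIn=True,
    )
    try:
        work()
    finally:
        # Runs on normal exit and when `work` raises, so an exception in the
        # worker no longer leaves the instance protected indefinitely.
        autoscaling.set_instance_protection(
            InstanceIds=[instance_id],
            AutoScalingGroupName=asg_name,
            ProtectedFromScaleIn=False,
        )
```

A `finally` block only covers Python-level failures; if the container is killed outright (for example by the OOM killer), something outside the process would still need to remove the protection.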
…2-auto-scaling-updates
We need to share a single GPU between openpose & mmpose re. !17, aws/containers-roadmap#327 (comment)
Try a modern approach for configuring Docker + set a special env variable to make all GPUs visible re. !17
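A sketch of the GPU-sharing setup, assuming it follows the workaround commonly cited in aws/containers-roadmap#327: make `nvidia` Docker's default runtime on the instance and set `NVIDIA_VISIBLE_DEVICES=all` in each container, instead of reserving the GPU through ECS `resourceRequirements` (which pins a GPU to one container). That `NVIDIA_VISIBLE_DEVICES` is the "special env variable" in the commit above is an assumption; the image names and memory values are placeholders.

```python
import json
from typing import Any, Dict


def docker_daemon_config() -> Dict[str, Any]:
    """/etc/docker/daemon.json content making the NVIDIA runtime Docker's default."""
    return {
        "default-runtime": "nvidia",
        "runtimes": {
            "nvidia": {"path": "nvidia-container-runtime", "runtimeArgs": []}
        },
    }


def gpu_sharing_container(name: str, image: str, memory_reservation_mib: int) -> Dict[str, Any]:
    """ECS container definition that sees all GPUs without reserving one.

    No GPU `resourceRequirements` entry is set, so ECS does not assign the
    GPU exclusively to this container.
    """
    return {
        "name": name,
        "image": image,
        "memoryReservation": memory_reservation_mib,
        "environment": [{"name": "NVIDIA_VISIBLE_DEVICES", "value": "all"}],
    }


if __name__ == "__main__":
    print(json.dumps(docker_daemon_config(), indent=2))
    # Placeholder images; both containers share the same physical GPU.
    containers = [
        gpu_sharing_container("openpose", "example/openpose:latest", 4096),
        gpu_sharing_container("mmpose", "example/mmpose:latest", 4096),
    ]
    print(json.dumps(containers, indent=2))
```

The trade-off is that ECS no longer accounts for GPU usage, so nothing stops both models from exhausting GPU memory at the same time.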
@sashasimkin @suhlrich @olehkorkh-planeks I got the ASG working, at least partly. Here are the TODOs/bugs:
- MMPOSE is not set in the task definition.
- ECS_CONTAINER_METADATA_FILE is not set. ChatGPT suggested setting it in the task definition, but that did not help. Any thoughts on what is going wrong?
- Two tasks start at once; shouldn't they start one at a time?
- When two tasks are active, they never stop, even after the instances are unprotected.
- Need to unprotect the instance if something makes core crash (Antoine).
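On the `ECS_CONTAINER_METADATA_FILE` item: as far as we know, the ECS agent only injects this variable (and writes the file) when container metadata is enabled on the agent itself, i.e. `ECS_ENABLE_CONTAINER_METADATA=true` in `/etc/ecs/ecs.config` on the instance (for example via the launch template's user data). That would explain why a task-definition change did not help. The sketch below assumes that setup; the helper name and timeout values are hypothetical. It waits for the agent to mark the file `READY`, because the file is populated in stages.

```python
import json
import os
import time
from typing import Mapping, Optional


def read_container_metadata(
    env: Mapping[str, str] = os.environ,
    timeout_s: float = 30.0,
    poll_s: float = 1.0,
) -> Optional[dict]:
    """Return the ECS container metadata file contents, or None if unavailable.

    Assumes the agent runs with ECS_ENABLE_CONTAINER_METADATA=true; otherwise
    ECS_CONTAINER_METADATA_FILE is absent and this returns None immediately.
    """
    path = env.get("ECS_CONTAINER_METADATA_FILE")
    if not path:
        return None
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with open(path) as f:
                data = json.load(f)
            # The agent rewrites the file as information becomes available;
            # only trust it once MetadataFileStatus reports READY.
            if data.get("MetadataFileStatus") == "READY":
                return data
        except (FileNotFoundError, json.JSONDecodeError):
            pass  # file not written yet, or caught mid-write
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_s)
```

Returning `None` instead of raising lets core fall back to other sources (such as the task metadata endpoint) when the file is missing.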