Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

set k8s registry QPS to match MAX_PODS #1801

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions templates/al2/runtime/bootstrap.sh
Original file line number Diff line number Diff line change
Expand Up @@ -524,6 +524,8 @@ echo "$(jq --arg mebibytes_to_reserve "${mebibytes_to_reserve}Mi" --arg cpu_mill

if [[ "$USE_MAX_PODS" = "true" ]]; then
echo "$(jq ".maxPods=$MAX_PODS" $KUBELET_CONFIG)" > $KUBELET_CONFIG
#set registryPullQPS to match MAX_PODS to prevent startup problems when nodes failover
echo "$(jq --arg MAX_PODS $MAX_PODS '.+= {"registryPullQPS":$MAX_PODS}' $KUBELET_CONFIG)" > $KUBELET_CONFIG
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to increase this beyond 5; but I don't think we want to go all the way to MAX_PODS -- that's a very large value on many instance types. Have you tested a more moderate increase, something like 10+15burst (up from 5+10burst)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually just set it to 0 (disabled). containerd times out commonly for me starting up 50-ish pods at once so I've also started to bump runtimeRequestTimeout.

I definitely think this number should be dynamic and I was suggesting max_pods as a proxy since, if a pod is "full" and it fails, the next one to boot up will potentially get all of those pods assigned.

fi

KUBELET_ARGS="--node-ip=$INTERNAL_IP --pod-infra-container-image=$PAUSE_CONTAINER --v=2"
Expand Down