fix: affinity priority #1548
base: main
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: helen-frank
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
This PR has been inactive for 14 days. StaleBot will close this stale PR after 14 more days of inactivity.
Current test results:
❯ kubectl get nodeclaims
NAME TYPE CAPACITY ZONE NODE READY AGE
default-8wq87 c-8x-amd64-linux spot test-zone-d blissful-goldwasser-3014441860 True 67s
default-chvld c-4x-amd64-linux spot test-zone-b exciting-wescoff-4170611030 True 67s
default-kbr7n c-2x-amd64-linux spot test-zone-d vibrant-aryabhata-969189106 True 67s
❯ kubectl get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx1-67877d4f4d-nbmj7 1/1 Running 0 77s 10.244.1.0 vibrant-aryabhata-969189106 <none> <none>
nginx10-6685645984-sjftg 1/1 Running 0 76s 10.244.2.2 exciting-wescoff-4170611030 <none> <none>
nginx2-5f45bfcb5b-flrlw 1/1 Running 0 77s 10.244.2.0 exciting-wescoff-4170611030 <none> <none>
nginx3-6b5495bfff-xt7d9 1/1 Running 0 77s 10.244.2.1 exciting-wescoff-4170611030 <none> <none>
nginx4-7bdd687bb6-nzc8f 1/1 Running 0 77s 10.244.3.5 blissful-goldwasser-3014441860 <none> <none>
nginx5-6b5d886fc7-6m57l 1/1 Running 0 77s 10.244.3.0 blissful-goldwasser-3014441860 <none> <none>
nginx6-bd5d6b9fb-x6lkq 1/1 Running 0 77s 10.244.3.2 blissful-goldwasser-3014441860 <none> <none>
nginx7-5559545b9f-xs5sm 1/1 Running 0 77s 10.244.3.4 blissful-goldwasser-3014441860 <none> <none>
nginx8-66bb679c4-zndwz 1/1 Running 0 76s 10.244.3.1 blissful-goldwasser-3014441860 <none> <none>
nginx9-6c47b869dd-nfds6 1/1 Running 0 76s 10.244.3.3 blissful-goldwasser-3014441860 <none> <none>
Force-pushed from edadb85 to 2581408
Pull Request Test Coverage Report for Build 11357525644 (Details)
💛 - Coveralls
This isn't necessarily as clear-cut a change to me. Is there data you've generated that gives you confidence this doesn't have any adverse effects?
@@ -96,6 +97,15 @@ func byCPUAndMemoryDescending(pods []*v1.Pod) func(i int, j int) bool {
		return true
	}

	// anti-affinity pods should be sorted before normal pods
	if affinityCmp := pod.PodAffinityCmp(lhsPod, rhsPod); affinityCmp != 0 {
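The hunk above only shows the call site; the body of pod.PodAffinityCmp is not part of this diff, so the following is a minimal sketch of what such a comparison could look like, assuming it simply prefers pods that carry required anti-affinity or topology spread constraints (the hasSpreadingConstraints helper is hypothetical, not the project's API):

```go
// Sketch only, not the code from this PR.
package pod

import v1 "k8s.io/api/core/v1"

// hasSpreadingConstraints reports whether a pod declares required pod
// anti-affinity or any topology spread constraints.
func hasSpreadingConstraints(p *v1.Pod) bool {
	if len(p.Spec.TopologySpreadConstraints) > 0 {
		return true
	}
	a := p.Spec.Affinity
	return a != nil && a.PodAntiAffinity != nil &&
		len(a.PodAntiAffinity.RequiredDuringSchedulingIgnoredDuringExecution) > 0
}

// PodAffinityCmp returns a negative value when lhs should sort before rhs,
// a positive value when rhs should sort before lhs, and 0 when neither is
// preferred, so constrained pods are considered ahead of unconstrained ones.
func PodAffinityCmp(lhs, rhs *v1.Pod) int {
	l, r := hasSpreadingConstraints(lhs), hasSpreadingConstraints(rhs)
	switch {
	case l && !r:
		return -1
	case r && !l:
		return 1
	default:
		return 0
	}
}
```

Returning an int rather than a bool keeps the check composable with the existing CPU/memory tiebreaks in the comparator.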
This seems like the right move, but I'm not sure how this breaks down in our bin-packing algorithm. From what I understand, this just sorts pods with affinity + TSC ahead of others that have the exact same pod requests.
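For illustration only, and not the project's actual code: because the sort is stable, a comparison like this only pulls constrained pods to the front, while pods within each group keep their existing CPU/memory-descending order. A self-contained sketch with a hypothetical hasConstraints predicate:

```go
package main

import (
	"fmt"
	"sort"

	v1 "k8s.io/api/core/v1"
)

// hasConstraints is a hypothetical predicate: true when a pod declares
// required anti-affinity or any topology spread constraints.
func hasConstraints(p *v1.Pod) bool {
	if len(p.Spec.TopologySpreadConstraints) > 0 {
		return true
	}
	a := p.Spec.Affinity
	return a != nil && a.PodAntiAffinity != nil &&
		len(a.PodAntiAffinity.RequiredDuringSchedulingIgnoredDuringExecution) > 0
}

func main() {
	// Assume pods is already in the existing CPU/memory-descending order.
	var pods []*v1.Pod

	// Stable sort: constrained pods move to the front, everything else keeps
	// its previous relative order.
	sort.SliceStable(pods, func(i, j int) bool {
		return hasConstraints(pods[i]) && !hasConstraints(pods[j])
	})
	fmt.Println("pods with anti-affinity/TSC now sort first:", len(pods))
}
```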
Yes. After testing this approach (there is a small test case in the previous section), scheduling the mutually exclusive pods earlier helps produce a more balanced scheduling result.
With this approach, the cluster will be more stable (e.g., draining one node will not cause most pods to be rescheduled). I observed that Karpenter attempts to distribute the pods across all nodes:
Scheduler Code
cc @njtran @jonathan-innis, please take a look.
Force-pushed from 2581408 to 6806f12
This PR has been inactive for 14 days. StaleBot will close this stale PR after 14 more days of inactivity.
Force-pushed from 379b0c6 to 46e7949
Signed-off-by: helen <[email protected]>
Force-pushed from 46e7949 to ea438bc
Can you share the data that led you to this conclusion? Without going in and testing it myself, it's not clear to me how you arrived at it.
@njtran This is the actual scheduling result I got by using kwok as the provider and creating 10 deployments (where pod1, pod9, and pod10 are mutually exclusive). You can see that the instance types selected are now more balanced compared to the previous behavior.
This PR has been inactive for 14 days. StaleBot will close this stale PR after 14 more days of inactivity.
Fixes #1418
Description
Prioritize the scheduling of pods with anti-affinity or topologySpreadConstraints.
How was this change tested?
I have 10 pending pods (a sketch of the anti-affinity configuration for the mutually exclusive ones follows below):
pod1: 1c1g requests, with anti-affinity; cannot be scheduled on the same node as pod10 and pod9.
pod2 ~ pod8: 1c1g requests; no anti-affinity is configured.
pod9: 1c1g requests, with anti-affinity; cannot be scheduled on the same node as pod1 and pod10.
pod10: 1c1g requests, with anti-affinity; cannot be scheduled on the same node as pod1 and pod9.
I want the resources of the three nodes to be evenly distributed, like:
node1: c7a.4xlarge, 8c16g (4Pod)
node2: c7a.xlarge, 4c8g (3Pod)
node3: c7a.xlarge, 4c8g (3Pod)
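As referenced above, this is a sketch of how the mutual exclusion between pod1, pod9, and pod10 could be expressed; the group=exclusive label and the exclusivePod helper are illustrative assumptions, not part of the PR:

```go
package main

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// exclusivePod builds a 1c1g pod that refuses to share a node with any other
// pod carrying the group=exclusive label (i.e. pod1, pod9, and pod10).
func exclusivePod(name string) *v1.Pod {
	return &v1.Pod{
		ObjectMeta: metav1.ObjectMeta{
			Name:   name,
			Labels: map[string]string{"group": "exclusive"},
		},
		Spec: v1.PodSpec{
			Containers: []v1.Container{{
				Name:  "nginx",
				Image: "nginx",
				Resources: v1.ResourceRequirements{
					Requests: v1.ResourceList{
						v1.ResourceCPU:    resource.MustParse("1"),
						v1.ResourceMemory: resource.MustParse("1Gi"),
					},
				},
			}},
			Affinity: &v1.Affinity{
				PodAntiAffinity: &v1.PodAntiAffinity{
					RequiredDuringSchedulingIgnoredDuringExecution: []v1.PodAffinityTerm{{
						LabelSelector: &metav1.LabelSelector{
							MatchLabels: map[string]string{"group": "exclusive"},
						},
						TopologyKey: "kubernetes.io/hostname",
					}},
				},
			},
		},
	}
}

func main() {
	_ = exclusivePod("pod1") // pod9 and pod10 would be built the same way
}
```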
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.