Introduce RequestEviction feature for evicting pods in background (KEP-1397) #1466
Conversation
Force-pushed from 1c05dac to d022bf4.
Compared to the virt-handler logs produced by my local kind cluster, these are the logs produced by the upstream CI:
It could be that kubevirt does not like the combination of running kind in a pod with dind enabled.
Force-pushed from 3210921 to 90a6b55.
/retest-required
/retest-required
Force-pushed from 8203821 to edb0100.
@@ -210,13 +569,36 @@ func evictPod(ctx context.Context, client clientset.Interface, pod *v1.Pod, poli
	},
	DeleteOptions: deleteOptions,
}
-	err := client.PolicyV1().Evictions(eviction.Namespace).Evict(ctx, eviction)
+	err := pe.client.PolicyV1().Evictions(eviction.Namespace).Evict(ctx, eviction)
Have you considered adding a similar retry mechanism?
// Assuming apierrors = k8s.io/apimachinery/pkg/api/errors and retry = k8s.io/client-go/util/retry.
retriable := func(err error) bool {
	return apierrors.IsTooManyRequests(err) || apierrors.IsInternalError(err) || apierrors.IsServerTimeout(err) || apierrors.IsUnexpectedServerError(err)
}
err = retry.OnError(retry.DefaultRetry, retriable, func() error {
	return evictPod(ctx, pe.client, pod, pe.directEvict)
})
I believe we discussed this in the past, though I have not seen anyone report this as annoying. We can always update the code. Nevertheless, if these errors keep being returned for some period of time, retrying will not help. If only a few evictions return these error codes, the descheduler will try to evict another pod. All the pods are still treated as cattle, so the impact is usually negligible.
Yes, the impact is not significant. For example, the rescheduling policy uses podutil.SortPodsBasedOnPriorityLowToHigh(removablePods) to sort the pods. Lower-priority pods at the front may fail to be evicted due to network issues, resulting in higher-priority pods being evicted later, even though all of these pods are evictable.
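For illustration only, here is a minimal, self-contained sketch of the ordering effect described above (the pod type, priorities, and the transient failure are made up, not descheduler code): pods are sorted low-to-high by priority, and a failed eviction simply moves on to the next, higher-priority candidate.

```go
package main

import (
	"fmt"
	"sort"
)

type pod struct {
	name     string
	priority int32
}

func main() {
	removablePods := []pod{{"high", 1000}, {"low", 10}, {"mid", 100}}

	// Stand-in for podutil.SortPodsBasedOnPriorityLowToHigh: lowest priority first.
	sort.Slice(removablePods, func(i, j int) bool {
		return removablePods[i].priority < removablePods[j].priority
	})

	// Hypothetical transient failure while evicting the lowest-priority pod.
	evict := func(p pod) error {
		if p.name == "low" {
			return fmt.Errorf("transient network error")
		}
		return nil
	}

	needed := 1 // only one eviction is actually wanted
	for _, p := range removablePods {
		if needed == 0 {
			break
		}
		if err := evict(p); err != nil {
			// The descheduler moves on to the next candidate, so a higher-priority
			// pod ends up being evicted instead of the lower-priority one.
			fmt.Printf("skipping %s (priority %d): %v\n", p.name, p.priority, err)
			continue
		}
		fmt.Printf("evicted %s (priority %d)\n", p.name, p.priority)
		needed--
	}
}
```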
You are right. This could be a suboptimal case: too many higher-priority pods getting evicted instead of lower-priority ones. Nevertheless, this deserves a separate PR/change. Retries are outside the scope of this one.
No obvious bugs were found; everything except the commented code looks good to me.
Force-pushed from edb0100 to 83d8ff9.
Force-pushed from 79eebf1 to 9b5eae3.
/retest-required
Force-pushed from 9b5eae3 to a2b5211.
[APPROVALNOTIFIER] This PR is APPROVED. Approval requirements bypassed by manually added approval. This pull-request has been approved by:
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
@@ -45,6 +50,10 @@ type DeschedulerServer struct {
	SecureServing *apiserveroptions.SecureServingOptionsWithLoopback
	DisableMetrics bool
	EnableHTTP2 bool
+	// FeatureGates enabled by the user
Btw, the approach used by upstream is via ComponentGlobalsRegistry. Might be good to consider if it makes sense to introduce that.
@@ -115,7 +116,13 @@ func NewDeschedulerCommand(out io.Writer) *cobra.Command {
}

func Run(ctx context.Context, rs *options.DeschedulerServer) error {
-	err := tracing.NewTracerProvider(ctx, rs.Tracing.CollectorEndpoint, rs.Tracing.TransportCert, rs.Tracing.ServiceName, rs.Tracing.ServiceNamespace, rs.Tracing.SampleRate, rs.Tracing.FallbackToNoOpProviderOnError)
+	err := features.DefaultMutableFeatureGate.SetFromMap(rs.FeatureGates)
The initializations would be better placed in the apply logic.
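For context, a minimal sketch of what such a features package might look like, assuming k8s.io/component-base/featuregate. DefaultMutableFeatureGate and EvictionsInBackground are taken from the diff and the TODO list; ApplyFeatureGates and the alpha/off default are assumptions made here for illustration, roughly the kind of apply-side call the comment above suggests.

```go
package features

import (
	utilruntime "k8s.io/apimachinery/pkg/util/runtime"
	"k8s.io/component-base/featuregate"
)

const (
	// EvictionsInBackground is the gate name mentioned in this PR's TODO list.
	EvictionsInBackground featuregate.Feature = "EvictionsInBackground"
)

// DefaultMutableFeatureGate mirrors the variable referenced in the diff above.
var DefaultMutableFeatureGate featuregate.MutableFeatureGate = featuregate.NewFeatureGate()

func init() {
	// Register known gates with their default state (assumed alpha/off here).
	utilruntime.Must(DefaultMutableFeatureGate.Add(map[featuregate.Feature]featuregate.FeatureSpec{
		EvictionsInBackground: {Default: false, PreRelease: featuregate.Alpha},
	}))
}

// ApplyFeatureGates copies the user-supplied map (rs.FeatureGates) onto the gate;
// the suggestion above is to call something like this from the apply logic
// rather than from Run().
func ApplyFeatureGates(gates map[string]bool) error {
	return DefaultMutableFeatureGate.SetFromMap(gates)
}
```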
	eventRecorder,
	podInformer,
	rs.DefaultFeatureGates,
Does it make sense to pass the feature gates as an argument everywhere, compared to using the global variable?
I was keeping in mind the case where the descheduling framework is run as a simulator, e.g. when two or more framework instances are created in the same run/binary.
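A small sketch of that design choice, under the assumption that gates are plumbed through as arguments: two instances in the same binary can then run with different feature-gate settings, which a single global gate cannot express. All names below are illustrative, not the PR's actual API.

```go
package simulator

import (
	utilruntime "k8s.io/apimachinery/pkg/util/runtime"
	"k8s.io/component-base/featuregate"
)

const EvictionsInBackground featuregate.Feature = "EvictionsInBackground"

// newGates builds an isolated feature gate; each framework instance gets its own.
func newGates(enabled bool) featuregate.FeatureGate {
	fg := featuregate.NewFeatureGate()
	utilruntime.Must(fg.Add(map[featuregate.Feature]featuregate.FeatureSpec{
		EvictionsInBackground: {Default: false, PreRelease: featuregate.Alpha},
	}))
	utilruntime.Must(fg.SetFromMap(map[string]bool{string(EvictionsInBackground): enabled}))
	return fg
}

// frameworkInstance stands in for a descheduling framework that receives its gates
// explicitly instead of reading a package-level variable.
type frameworkInstance struct {
	gates featuregate.FeatureGate
}

// NewInstances returns two instances in one binary with different gate settings,
// e.g. a live run and a simulated run.
func NewInstances() (live, simulated frameworkInstance) {
	return frameworkInstance{gates: newGates(true)}, frameworkInstance{gates: newGates(false)}
}
```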
		return
	}
	// Ignore any pod that does not have eviction in progress
	if _, exists := newPod.Annotations[EvictionInProgressAnnotationKey]; !exists {
how can we get to this line and delete pods that lost their annotation, when we do
if newPod.Annotations == nil {
return
}
// Ignore pod's that are not subject to an eviction in background
if _, exists := newPod.Annotations[EvictionRequestAnnotationKey]; !exists {
return
}
just above?
If a pod loses its annotation, there's no way of knowing whether the annotation was removed by accident or intentionally. E.g. a live migration is no longer needed because some scoring mechanism decided it's more costly to perform a live migration than to drop the VM's state and re-create it.
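Purely for illustration, a sketch of the situation being discussed: detecting that the in-progress annotation disappeared between updates while the pod still opts in to background eviction. The helper names and the eviction-in-progress annotation value are assumptions; only the request-evict-only key appears in this PR's tests.

```go
package eviction

import (
	v1 "k8s.io/api/core/v1"
)

// Annotation keys as referenced in this PR; the in-progress value is assumed here.
const (
	EvictionRequestAnnotationKey    = "descheduler.alpha.kubernetes.io/request-evict-only"
	EvictionInProgressAnnotationKey = "descheduler.alpha.kubernetes.io/eviction-in-progress"
)

func hasAnnotation(pod *v1.Pod, key string) bool {
	if pod == nil || pod.Annotations == nil {
		return false
	}
	_, ok := pod.Annotations[key]
	return ok
}

// inProgressAnnotationRemoved reports whether the eviction-in-progress marker
// disappeared in this update while the pod still opts in to background eviction.
// As noted above, an accidental removal cannot be distinguished from an intentional
// cancellation, so any reaction to this condition has to be conservative.
func inProgressAnnotationRemoved(oldPod, newPod *v1.Pod) bool {
	return hasAnnotation(newPod, EvictionRequestAnnotationKey) &&
		hasAnnotation(oldPod, EvictionInProgressAnnotationKey) &&
		!hasAnnotation(newPod, EvictionInProgressAnnotationKey)
}
```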
type evictionRequestItem struct {
	pod     *v1.Pod
	assumed bool
can we please make the name of the field more verbose?
Renamed to evictionAssumed
@@ -314,7 +315,7 @@ func (d profileImpl) RunDeschedulePlugins(ctx context.Context, nodes []*v1.Node)
		span.AddEvent("Plugin Execution Failed", trace.WithAttributes(attribute.String("err", status.Err.Error())))
		errs = append(errs, fmt.Errorf("plugin %q finished with error: %v", pl.Name(), status.Err))
	}
-	klog.V(1).InfoS("Total number of pods evicted", "extension point", "Deschedule", "evictedPods", d.podEvictor.TotalEvicted()-evicted)
+	klog.V(1).InfoS("Total number of evictions/requests", "extension point", "Deschedule", "evictedPods", d.podEvictor.TotalEvicted()-evicted, "evictionRequests", d.podEvictor.TotalEvictionRequests()-evictionRequests)
nit: it might be good to unify the logging and use keysAndValues everywhere
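For reference, a small sketch of the keysAndValues style the nit refers to: structured klog calls with a constant message and alternating key/value pairs, using consistent (camelCase, space-free) key names. The wrapper function name is made up.

```go
package profile

import "k8s.io/klog/v2"

// logEvictionTotals shows the keysAndValues convention applied to the log line
// changed in the diff above.
func logEvictionTotals(evictedPods, evictionRequests uint) {
	klog.V(1).InfoS("Total number of evictions/requests",
		"extensionPoint", "Deschedule",
		"evictedPods", evictedPods,
		"evictionRequests", evictionRequests,
	)
}
```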
@@ -108,6 +423,46 @@ func (pe *PodEvictor) SetClient(client clientset.Interface) {
	pe.client = client
}

func (pe *PodEvictor) evictionRequestsTotal() uint {
I forgot what the consensus was last time, but do we want to introduce new metrics for this feature? https://github.com/kubernetes-sigs/descheduler/blob/master/keps/1397-evictions-in-background/README.md#metrics
We do, yet not as part of this PR. I will have a follow-up for all the other changes.
ObjectMeta: metav1.ObjectMeta{
	Name: fmt.Sprintf("kubevirtvmi-%v", idx),
	Annotations: map[string]string{
		"descheduler.alpha.kubernetes.io/request-evict-only": "",
Just to confirm, the feature depends only on the key, and not on the value? It would be good to add a little bit of documentation above EvictionRequestAnnotationKey and EvictionInProgressAnnotationKey.
That is correct. +1 for documentation. I still have a blog post to write.
When the feature is enabled, each pod with the descheduler.alpha.kubernetes.io/request-evict-only annotation will have its eviction API error examined for a specific error code/reason and message. If matched, the eviction of such a pod will be interpreted as the initiation of an eviction in background.
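A rough sketch of that matching step. The concrete status code/reason/message the PR actually matches on is not spelled out in this thread, so both the 429 check and the message substring below are placeholders for illustration, not the real criteria.

```go
package eviction

import (
	"strings"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
)

const EvictionRequestAnnotationKey = "descheduler.alpha.kubernetes.io/request-evict-only"

// isEvictionInBackground inspects the error returned by the eviction API for a pod
// carrying the request-evict-only annotation and decides whether the error should be
// read as "an eviction was started in the background" rather than as a failure.
func isEvictionInBackground(err error, podAnnotations map[string]string) bool {
	if err == nil {
		return false
	}
	if _, optedIn := podAnnotations[EvictionRequestAnnotationKey]; !optedIn {
		return false
	}
	// Placeholder match: a TooManyRequests status whose message indicates that the
	// eviction is being handled asynchronously by another component.
	return apierrors.IsTooManyRequests(err) &&
		strings.Contains(err.Error(), "eviction is in progress")
}
```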
Force-pushed from a2b5211 to 7d50a3b.
@ingvagabund: The following tests failed:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Implementation of #1354.
When the feature is enabled, each pod with the descheduler.alpha.kubernetes.io/request-evict-only annotation will have its eviction API error examined for a specific error code/reason and message. If matched, the eviction of such a pod will be interpreted as the initiation of an eviction in background.
The first version of the feature does not provide any way to configure a timeout for pending evictions in background, nor an interval for how often pending evictions in background are checked for timeouts. Both are set to 10 minutes by default. The default values can be adjusted based on provided feedback.
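A minimal sketch of the timeout handling described above; the tracker structure and all names are illustrative rather than the PR's implementation, with both durations set to the 10-minute defaults mentioned in the description.

```go
package eviction

import (
	"context"
	"sync"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

const (
	defaultPendingTimeout = 10 * time.Minute // how long a background eviction may stay pending
	defaultCheckInterval  = 10 * time.Minute // how often pending evictions are re-checked
)

type pendingEviction struct {
	requestedAt time.Time
}

type evictionTracker struct {
	mu      sync.Mutex
	pending map[string]pendingEviction // keyed by namespace/name
}

// runTimeoutChecks periodically drops background evictions that have been pending
// longer than the timeout, so they can be retried or reported.
func (t *evictionTracker) runTimeoutChecks(ctx context.Context) {
	wait.UntilWithContext(ctx, func(ctx context.Context) {
		t.mu.Lock()
		defer t.mu.Unlock()
		for key, p := range t.pending {
			if time.Since(p.requestedAt) > defaultPendingTimeout {
				delete(t.pending, key)
			}
		}
	}, defaultCheckInterval)
}
```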
TODO:
- EvictionsInBackground as a new feature gate

Refactorings before this PR can be merged: