
Introduce RequestEviction feature for evicting pods in background (KEP-1397) #1466

Open
ingvagabund wants to merge 4 commits into master from eviction-in-background-code

Conversation

@ingvagabund (Contributor) commented Jul 14, 2024

Implementation of #1354.

When the feature is enabled, each pod with the descheduler.alpha.kubernetes.io/request-evict-only annotation will have the eviction API error examined for a specific error code/reason and message. If matched, the eviction of such a pod will be interpreted as the initiation of an eviction in background.

The first version of the feature does not provide any way to configure a timeout for pending evictions in background, nor the interval at which pending evictions in background are checked for timeouts. Both are set to 10 minutes by default. The default values can be adjusted based on provided feedback.
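For illustration, a minimal Go sketch of how such an error match could look. The annotation key is taken from this description; the concrete status matched below (429/TooManyRequests) and the function name are assumptions, not necessarily what the PR implements:

// Sketch only: turning an eviction API error into an "eviction in background" signal.
package evictions

import (
	v1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
)

const evictionRequestAnnotationKey = "descheduler.alpha.kubernetes.io/request-evict-only"

// assumedEvictionInBackground reports whether an eviction error for an annotated
// pod should be treated as the start of an eviction in background instead of a failure.
func assumedEvictionInBackground(pod *v1.Pod, err error) bool {
	if err == nil {
		return false
	}
	// Only pods that explicitly request it are eligible.
	if _, requested := pod.Annotations[evictionRequestAnnotationKey]; !requested {
		return false
	}
	// Assumption: the eviction webhook rejects the request with a TooManyRequests
	// status whose message identifies a pending background eviction.
	return apierrors.IsTooManyRequests(err)
}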

TODO:

  • evictions in background are respected
  • unit tests
  • feature gating implemented (EvictionsInBackground as a new feature gate)
  • e2e integration with kubevirt
  • feature configuration (currently none)

Refactorings before this PR can be merged:

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jul 14, 2024
@ingvagabund ingvagabund force-pushed the eviction-in-background-code branch 17 times, most recently from 1c05dac to d022bf4 Compare July 17, 2024 14:54
@ingvagabund (Contributor, Author)

Compared to the virt-handler logs produced by my local kind cluster, these are the logs produced by the upstream CI:

{"component":"virt-handler","level":"info","msg":"re-enqueuing VirtualMachineInstance default/vmi-masquerade-3","pos":"vm.go:1712","reason":"server error. command SyncVMI failed: \"LibvirtError(Code=38, Domain=0, Message='Unable to create tap device tap0: Permission denied')\"","timestamp":"2024-07-17T15:01:56.211181Z"}

It could be that kubevirt does not like the combination of running kind in a pod with dind enabled.

@ingvagabund ingvagabund force-pushed the eviction-in-background-code branch 6 times, most recently from 3210921 to 90a6b55 Compare July 19, 2024 07:18
@ingvagabund (Contributor, Author)

/retest-required

@ingvagabund (Contributor, Author)

/retest-required

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 9, 2024
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 17, 2024
@@ -210,13 +569,36 @@ func evictPod(ctx context.Context, client clientset.Interface, pod *v1.Pod, poli
},
DeleteOptions: deleteOptions,
}
err := client.PolicyV1().Evictions(eviction.Namespace).Evict(ctx, eviction)
err := pe.client.PolicyV1().Evictions(eviction.Namespace).Evict(ctx, eviction)
Contributor:

Have you considered adding a similar retry mechanism?

retriable := func(err error) bool {
	return apierrors.IsTooManyRequests(err) || apierrors.IsInternalError(err) ||
		apierrors.IsServerTimeout(err) || apierrors.IsUnexpectedServerError(err)
}
err = retry.OnError(retry.DefaultRetry, retriable, func() error {
	return evictPod(ctx, pe.client, pod, pe.directEvict)
})

Contributor (Author):

I believe we discussed this in the past, though I have not seen anyone reporting this as annoying. We can always update the code. Nevertheless, if the same errors keep being returned for some period of time, retrying will not help. If only a few evictions return these error codes, the descheduler will try to evict another pod. All the pods are still treated as cattle, so the impact is usually negligible.

Contributor:

Yes, the impact is not significant. For example, the rescheduling policy uses the method podutil.SortPodsBasedOnPriorityLowToHigh(removablePods) to sort the pods. Lower-priority pods at the front may fail to be evicted due to network issues, resulting in the eviction of higher-priority pods later, even though all these pods are evictable.

Contributor (Author):

You are right. This could be a suboptimal case: too many higher-priority pods getting evicted instead of lower-priority ones. Nevertheless, this deserves a separate PR/change. Retries are outside the scope of this one.

@fanhaouu (Contributor)

No obvious bugs were found; everything except the commented code looks good to me.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 1, 2024
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 7, 2024
@ingvagabund ingvagabund force-pushed the eviction-in-background-code branch 2 times, most recently from 79eebf1 to 9b5eae3 Compare October 7, 2024 11:21
@ingvagabund (Contributor, Author)

/retest-required

@ingvagabund ingvagabund added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 7, 2024
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

Approval requirements bypassed by manually added approval.

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@@ -45,6 +50,10 @@ type DeschedulerServer struct {
SecureServing *apiserveroptions.SecureServingOptionsWithLoopback
DisableMetrics bool
EnableHTTP2 bool
// FeatureGates enabled by the user


Btw, the approach used upstream is via ComponentGlobalsRegistry. It might be good to consider whether it makes sense to introduce that.

@@ -115,7 +116,13 @@ func NewDeschedulerCommand(out io.Writer) *cobra.Command {
}

func Run(ctx context.Context, rs *options.DeschedulerServer) error {
err := tracing.NewTracerProvider(ctx, rs.Tracing.CollectorEndpoint, rs.Tracing.TransportCert, rs.Tracing.ServiceName, rs.Tracing.ServiceNamespace, rs.Tracing.SampleRate, rs.Tracing.FallbackToNoOpProviderOnError)
err := features.DefaultMutableFeatureGate.SetFromMap(rs.FeatureGates)


the initializations would be better placed in the apply logic
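For context, a minimal sketch of how such a gate could be declared and later applied via SetFromMap, using k8s.io/component-base/featuregate. The gate name comes from the PR's TODO list; the rest of the wiring is an assumption and may differ from the PR:

package features

import (
	"k8s.io/apimachinery/pkg/util/runtime"
	"k8s.io/component-base/featuregate"
)

// EvictionsInBackground gates the request-evict-only handling.
const EvictionsInBackground featuregate.Feature = "EvictionsInBackground"

// DefaultMutableFeatureGate holds the descheduler feature gates; Run (or the
// apply logic suggested above) would call SetFromMap with the user-provided
// map, e.g. {"EvictionsInBackground": true}.
var DefaultMutableFeatureGate featuregate.MutableFeatureGate = featuregate.NewFeatureGate()

func init() {
	runtime.Must(DefaultMutableFeatureGate.Add(map[featuregate.Feature]featuregate.FeatureSpec{
		EvictionsInBackground: {Default: false, PreRelease: featuregate.Alpha},
	}))
}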

eventRecorder,
podInformer,
rs.DefaultFeatureGates,


Does it make sense to pass the feature gates as an argument everywhere, compared to using the global variable?

Contributor (Author):

I was keeping in mind the case where the descheduling framework is run as a simulator, e.g. when two or more framework instances are created in the same run/binary.

pkg/descheduler/descheduler.go (outdated review thread, resolved)
return
}
// Ignore any pod that does not have eviction in progress
if _, exists := newPod.Annotations[EvictionInProgressAnnotationKey]; !exists {


how can we get to this line and delete pods that lost their annotation, when we do

    if newPod.Annotations == nil {
        return
    }
    // Ignore pods that are not subject to an eviction in background
    if _, exists := newPod.Annotations[EvictionRequestAnnotationKey]; !exists {
        return
    }

just above?

Contributor (Author):

If a pod loses its annotation, there's no way of knowing whether the annotation was removed by accident or intentionally. E.g. a live migration is no longer needed because some scoring mechanism decided it's more costly to perform a live migration than to drop the VM's state and re-create it.


type evictionRequestItem struct {
	pod     *v1.Pod
	assumed bool


can we please make the name of the field more verbose?

Contributor (Author):

Renamed to evictionAssumed
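For clarity, the struct after the rename might read roughly like this (doc comment wording assumed):

package evictions

import v1 "k8s.io/api/core/v1"

type evictionRequestItem struct {
	pod *v1.Pod
	// evictionAssumed records that the eviction was only assumed to have been
	// initiated in background, based on the matched eviction API error.
	evictionAssumed bool
}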

pkg/descheduler/evictions/evictions.go (review thread, resolved)
@@ -314,7 +315,7 @@ func (d profileImpl) RunDeschedulePlugins(ctx context.Context, nodes []*v1.Node)
span.AddEvent("Plugin Execution Failed", trace.WithAttributes(attribute.String("err", status.Err.Error())))
errs = append(errs, fmt.Errorf("plugin %q finished with error: %v", pl.Name(), status.Err))
}
klog.V(1).InfoS("Total number of pods evicted", "extension point", "Deschedule", "evictedPods", d.podEvictor.TotalEvicted()-evicted)
klog.V(1).InfoS("Total number of evictions/requests", "extension point", "Deschedule", "evictedPods", d.podEvictor.TotalEvicted()-evicted, "evictionRequests", d.podEvictor.TotalEvictionRequests()-evictionRequests)


nit: it might be good to unify the logging and use keysAndValues everywhere

@@ -108,6 +423,46 @@ func (pe *PodEvictor) SetClient(client clientset.Interface) {
pe.client = client
}

func (pe *PodEvictor) evictionRequestsTotal() uint {


I forgot what was the consensus last time, but do we want to introduce new metrics for this feature? https://github.com/kubernetes-sigs/descheduler/blob/master/keps/1397-evictions-in-background/README.md#metrics

Contributor (Author):

We do, yet not as part of this PR. I will have a follow-up for all the other changes.
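For a rough sense of what such a follow-up could add, a sketch using k8s.io/component-base/metrics; the metric name, subsystem, and labels below are assumptions, not taken from the KEP:

package metrics

import (
	basemetrics "k8s.io/component-base/metrics"
	"k8s.io/component-base/metrics/legacyregistry"
)

// evictionRequestsTotal is a hypothetical counter for evictions requested in
// background, split by strategy and result.
var evictionRequestsTotal = basemetrics.NewCounterVec(
	&basemetrics.CounterOpts{
		Subsystem:      "descheduler",
		Name:           "eviction_requests_total",
		Help:           "Number of evictions requested in background.",
		StabilityLevel: basemetrics.ALPHA,
	},
	[]string{"strategy", "result"},
)

func init() {
	legacyregistry.MustRegister(evictionRequestsTotal)
}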

ObjectMeta: metav1.ObjectMeta{
Name: fmt.Sprintf("kubevirtvmi-%v", idx),
Annotations: map[string]string{
"descheduler.alpha.kubernetes.io/request-evict-only": "",


Just to confirm, the feature depends only on the key, and not on the value? It would be good to add a little bit of documentation above EvictionRequestAnnotationKey and EvictionInProgressAnnotationKey.

Contributor (Author):

That is correct. +1 for the documentation. I still have a blog post to write.
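A small sketch of the requested documentation (wording assumed; the request-evict-only key appears in this PR, while the value of the in-progress key is an assumption):

package evictions

// EvictionRequestAnnotationKey marks a pod whose eviction should only be
// requested: a matching eviction API error is then interpreted as an eviction
// initiated in background rather than a failure. Only the presence of the key
// matters; the value is ignored.
const EvictionRequestAnnotationKey = "descheduler.alpha.kubernetes.io/request-evict-only"

// EvictionInProgressAnnotationKey is expected to be set by the component
// carrying out the background eviction (e.g. a live-migration controller)
// while the eviction is still in progress. Only the presence of the key
// matters; the value is ignored. (Key value assumed here.)
const EvictionInProgressAnnotationKey = "descheduler.alpha.kubernetes.io/eviction-in-progress"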

@k8s-ci-robot (Contributor)

@ingvagabund: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-descheduler-verify-master 7d50a3b link true /test pull-descheduler-verify-master
pull-descheduler-test-e2e-k8s-master-1-29 7d50a3b link true /test pull-descheduler-test-e2e-k8s-master-1-29
pull-descheduler-test-e2e-k8s-master-1-30 7d50a3b link true /test pull-descheduler-test-e2e-k8s-master-1-30
pull-descheduler-test-e2e-k8s-master-1-31 7d50a3b link true /test pull-descheduler-test-e2e-k8s-master-1-31

