Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Descheduler Framework Proposal #753

Closed
damemi opened this issue Mar 2, 2022 · 37 comments
Closed

Descheduler Framework Proposal #753

damemi opened this issue Mar 2, 2022 · 37 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@damemi
Copy link
Contributor

damemi commented Mar 2, 2022

Hi all,

I'm creating this issue to gauge interest in our longstanding idea of creating a "Descheduler Framework". The general idea is to abstract the Descheduler internals into a library that third-party developers can use to create their own custom Deschedulers.

I think this is relevant now because we have a lot of big, ongoing tasks that I think would fit well under the umbrella of this effort. For example, each of these features presents a potential "extension point" for future custom descheduler developers (so, if we choose to take on a framework, it would be worth considering the implementation of these features with that extensibility in mind):

  • Supporting out-of-tree strategies (#679, #557)
  • Allowing custom options for pod evictions / custom PodEvictor implementations (#690, #725)
  • Defining an implementable interface for Strategies (#586, #706)
  • Refactoring the API to be more flexible and extensible (#587, #486)
  • Allowing event-triggered strategy runs (#488, #696)

In addition, the Descheduler has seen a lot of growth and requests for unique features to support custom use cases. Enabling users to build their own Deschedulers in a preferred way will free up our maintainers to focus on larger tasks like those above, while also steering the Descheduler toward a stable project.

I would like to propose an outline for this effort, but first I want to know if this is something people think is worth tackling as a project. I think this also will be a good chance for a lot of contributors to make a significant impact on the project as a whole.

So the main question is: do folks think it would be worth the effort to re-focus some of the above tasks under the umbrella goal of the Descheduler Framework?


Update: Based on feedback I put together some more organized thoughts into a doc: https://docs.google.com/document/d/1fr5tfSaCkqGZVXUMJA9pUkIy6GGRwJumPN5MeZ8oewo/edit?usp=sharing

Please comment there or here to help flesh out the design goals for if/when we pursue this

@damemi damemi added the kind/feature Categorizes issue or PR as related to a new feature. label Mar 2, 2022
@JaneLiuL
Copy link
Member

JaneLiuL commented Mar 3, 2022

This is a great proposal ~~ i think it worth~~

@binacs
Copy link
Member

binacs commented Mar 7, 2022

/cc

@damemi
Copy link
Contributor Author

damemi commented Mar 9, 2022

Based on the reactions it sounds like this is something people are interested in. I'd like to bring it up at tomorrow's sig-scheduling meeting if anyone would like to discuss it there.

I also put together some more detail about the goals of this into a doc: https://docs.google.com/document/d/1fr5tfSaCkqGZVXUMJA9pUkIy6GGRwJumPN5MeZ8oewo/edit?usp=sharing

Please comment there (or in this issue), specifically about the "open questions" at the bottom of the doc. If there are any other questions you think should be brought up, those are welcome too.

@ingvagabund
Copy link
Contributor

The first version of the design proposal can be found at https://docs.google.com/document/d/1tWpXoZ6YF3ksnmueM2POwASdxoiLW8y7TXuWFxT8NSo/edit?usp=sharing. I have just shared the link at https://groups.google.com/g/kubernetes-sig-scheduling/c/VdwizKUMNgM as well to include broader audience.

Any feedback welcome.

@ingvagabund
Copy link
Contributor

Thanks everyone for sharing thoughts. Given there have not been any new comments in the proposal, I think we can start discussing the implementation. We could split the implementation into several steps:

  • define framework types:
    • descheduler configuration with profiles (internal/v1alpha2) with conversion/validation/defaulting
    • handle for injecting clientset, evictor, shared informer factory, etc.
    • plugin extension point interfaces
    • example strategy arguments + defaulting/validation
  • framework internals: creating plugin registry, plugin initialization from configuration, framework initialization, collecting metrics, descheduling loop, running fleet of extension points (e.g. RunAllDeschedulePlugins, RunAllBalancePlugins), ...
  • evictor plugin: migrate the PodEvictor's Evictable into a plugin to separate the pod eviction mechanism (framework internals) from the "is eligible for eviction" logic (evictor plugin), with examples of how to configure it (e.g. evicting system critical pods, evicting pods with a local storage, etc.)
  • migrate strategies: migration of all existing strategies into plugins, versioned strategy arguments+defaulting
  • leftovers: anything left

Every step needs to be properly tested. Before the descheduler code base is flipped into the framework, some refactoring will be required. E.g. to run the plugins alongside the strategies while transitioning. New documentation with diagrams will be needed. Including a guide for creating a new plugin.

@ingvagabund
Copy link
Contributor

ingvagabund commented May 25, 2022

@damemi @eminaktas @jklaw90 @JaneLiuL @Dentrax @pravarag and others, any volunteers? This will take some time to implement. The reviews will help to crystallize the design further. Feel free to ask any questions.

Those 4 steps can be further broken up into more steps if needed. Also feel free to suggest other/new steps if you have some in mind.

@pravarag
Copy link
Contributor

@damemi @eminaktas @jklaw90 @JaneLiuL @Dentrax @pravarag and others, any volunteers? This will take some time to implement. The reviews will help to crystallize the design further. Feel free to ask any questions.

Those 4 steps can be further broken up into more steps if needed. Also feel free to suggest other/new steps if you have some in mind.

I would love to work on it but I wanted to check, will this be having some sub-tasks for starters (I'm guessing the 4 steps could be formed into 4 different sub issues)? or this needs to be approached as one whole issue?

@binacs
Copy link
Member

binacs commented May 25, 2022

Hi, everyone! I also hope I can contribute to this ^ ^

@damemi
Copy link
Contributor Author

damemi commented May 25, 2022

or this needs to be approached as one whole issue?

We don't want to do this as one whole issue, but rather with atomic changes. Additionally, each change should be seamless in that it doesn't change any current functionality or break due to relying on another change. This is how the Scheduler framework was implemented, so we want to take a similar approach piece-by-piece. There should be plenty of work to go around

@ingvagabund
Copy link
Contributor

As @damemi says. We might open individual issues where we go more into the details of each step. The steps provided are high-level steps. I'd like to wait for more opinions so we can join the effort and make sure we cover as much of the important aspects as we can. I did some preliminary implementation in #781 to see how complex the implementation can get. It's not a referential implementation. Though, it can serve as an idea of what to keep in mind. I'd like to start over so we can discuss individual pieces one by one while still heading in the right direction.

@ingvagabund
Copy link
Contributor

The first issue can be found at #837. Let's move the discussion about the framework data types there.

@pravarag
Copy link
Contributor

pravarag commented Jun 8, 2022

@ingvagabund @damemi thanks for sharing the detailed info about framework data types. I wanted to check do we have any plan to introduce exposing more metrics as part of Framework in any of the above mentioned steps? I was thinking if this pending work #503 can be included as part of Framework level change as well 🤔

@ingvagabund
Copy link
Contributor

Adding #793 for awareness for defaulting the plugin arguments

@ingvagabund
Copy link
Contributor

ingvagabund commented Aug 4, 2022

@pravarag
Copy link
Contributor

pravarag commented Aug 4, 2022

  • LowNodeUtilization + HighNodeUtilization

@pravarag @Dentrax @BinacsLee @jklaw90 @eminaktas would you be interested in converting some of the strategies into plugins?

Hi @ingvagabund, I'm already working on migrating LowNodeUtilization + HighNodeUtilization. Will share the PR soon, apologies for the delay.

@Dentrax
Copy link
Contributor

Dentrax commented Aug 4, 2022

Hey! Thanks for pinging us! I'm not making any promises (due to limited free time), but maybe we (@eminaktas) can get into PodLifeTime to convert into plugin.

@jklaw90
Copy link
Contributor

jklaw90 commented Aug 4, 2022

Yep, i can finish off RemovePodsViolatingTopologySpreadConstraint. i just need to see what the latest changes are to the framework and update what i got.

@binacs
Copy link
Member

binacs commented Aug 4, 2022

Hi there, thanks for ping us. I will take up the migration of the RemovePodsHavingTooManyRestarts plugin. :)

@ingvagabund
Copy link
Contributor

ingvagabund commented Aug 17, 2022

All strategies have been migrated!!! Thank you everyone involved in the code changes and reviews. I very much appreciate all the discussions we have during the reviews. Some of them helped to discover weak spots and improve the code quality. Good work everyone. Thank you.

@ingvagabund
Copy link
Contributor

ingvagabund commented Aug 25, 2022

Time to move forward :) I am proposing the following next steps:

  1. providing a plugin example + readme with instructions: Descheduling plugin examples with guides #923
  2. converting current evictor filter into a plugin + introducing new preEvictFilter extension point for filtering right before a pod eviction is performed (e.g. for LowNodeUtilization plugin): Convert current evictor filter into a plugin #924
  3. plugin arguments defaulting + moving plugin arguments to each corresponding plugin: Plugin arguments defaulting + moving plugin arguments to each corresponding plugin #925
  4. introduction of new v1alpha2 configuration + configuration conversion from v1alpha1 to v1alpha2 (the fact a plugin arguments are mapped to a specific configuration version will be determined by importing a plugin arguments in corresponding conversion code): New v1alpha2 configuration + conversion #926
  5. introduce sort and preFilterSort extension points: Introduce sort and preEvictionSort extension points #927

Step 1 is independent and can be a good exercise for everyone. There can be more examples. Step 2 is mostly self-standing and can be performed right way. Benefit of moving the evictor filter under a plugin will allow framework users to build their own eviction plugins and either replace or extend the default filter with new filters. Step 3 is prerequisite for step 4 as all the plugin arguments need to be defaulted before the conversion to v1alpha2 takes place. Argument defaulting will gives chance to discuss which values are the most suitable candidates for setting by default. One PR for each plugin arguments. We can discuss arguments renaming as well during the reviews. Lastly, step 4 will allow the framework to convert both v1alpha1 and v1alpha2 into an internal configuration on which the framework data types and mechanics will be built. Step 5 will extend ability to change sorting algorithm of processing (sort) and evicting (preEvictSort) pods (e.g. by oldest or by priority class). Worth making some measurements or discussing which default sorting algorithms the framework provide. Integration of sorting algorithms is another self-standing task.

Once all bits are in place we can start extending the framework.

@damemi @a7i step 3 is aligned with the discussion we had about making the plugins more modular

@damemi
Copy link
Contributor Author

damemi commented Aug 25, 2022

@ingvagabund sounds good! thanks for working out the next steps.

For step 4, the v1alpha2 configuration should be backward-compatible with v1alpha1 configs, so no plugins should be dependent on v1alpha1. Instead, we can do the conversion in the framework from any v1alpha1->v1alpha2 configs.

Also, what's the plan for getting rid of the migration wrapper?

Thanks for laying all of this out and keeping track of the plan

@ingvagabund
Copy link
Contributor

ingvagabund commented Aug 25, 2022

For step 4, the v1alpha2 configuration should be backward-compatible with v1alpha1 configs, so no plugins should be dependent on v1alpha1

Conversion from v1alpha2 to v1alpha1 will be possible only when there's a single profile. There's no unambiguous mapping otherwise. I wonder whether it makes sense to converse from v1alpha2 to v1alpha1. Can you think of any use cases where this might be needed?

Also, what's the plan for getting rid of the migration wrapper?

After the conversion from v1alpha1 to v1alpha2 is implemented. We will still need to implement the plugin registry so all the plugins can be initialized systematically while injecting defaulted plugin arguments in the constructors.

@damemi
Copy link
Contributor Author

damemi commented Aug 25, 2022

Conversion from v1alpha2 to v1alpha1 will be possible only when there's a single profile

the framework code only needs to convert v1alpha1 -> v1alpha2, such as when the user is still using a v1alpha1 config. if the config is already given in v1alpha2, then we can just use it as is.

this way, all plugins can be written assuming v1alpha2, and let the framework do the conversion work. I don't want to deal with versioned plugins 😬

@ingvagabund
Copy link
Contributor

So I just misunderstood :) I would like to avoid any conversion from v1alpha2 to v1alpha1.

@damemi
Copy link
Contributor Author

damemi commented Aug 25, 2022

Yeah, there's no need to convert v1alpha2 to v1alpha1 (and like you said, it's probably not even possible). Just 1->2 to handle legacy configs, which should be possible

@JaneLiuL
Copy link
Member

is there any task need to be take for this proposal now?

@ingvagabund
Copy link
Contributor

@JaneLiuL any issue in #753 (comment) that has not been assigned yet

@ingvagabund
Copy link
Contributor

ingvagabund commented Sep 28, 2022

#926 is a pre-requisite for implementing a plugin registry. Once v1alpha1 -> v1alpha2 conversion (with plugin argument defaulting and validation) and plugin registry are in place, the descheduler will get updated to initialize plugins based on each profile configuration (plugin configuration with list of enabled/disabled extension points). Followed by implementing the framework methods for running a batch of extension points (i.e. runDescheduleExtensionPoints and runBalanceExtensionPoints).

Right now it is important to complete the following issues:

@rentseen
Copy link

Is this excellent work released in v0.25.0?

@ingvagabund
Copy link
Contributor

We do not target any specific release. Though, the goal is to finish the framework as soon as possible.

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 16, 2023
@ingvagabund ingvagabund removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 16, 2023
@ingvagabund
Copy link
Contributor

The conversion from v1alpha1 -> v1alpha2 is implemented. Thank you everyone who participated.

The next item on the list is to discuss the descheduling framework in #979. Once implemented, all the remaining descheduling framework types and mechanism can be put in place.

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 6, 2023
@ingvagabund ingvagabund removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 9, 2023
@ingvagabund
Copy link
Contributor

Remaining work left listed in #1187

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 24, 2024
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 23, 2024
@ingvagabund ingvagabund removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Feb 23, 2024
@damemi damemi unpinned this issue Apr 23, 2024
@damemi
Copy link
Contributor Author

damemi commented Apr 23, 2024

Closing this as we are merging a formal descheduler KEP in #1372

@damemi damemi closed this as completed Apr 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature.
Projects
None yet
Development

No branches or pull requests

10 participants