
Node utilization util snapshot #1533

Open · wants to merge 7 commits into master
Conversation


@ingvagabund ingvagabund commented Oct 11, 2024

Built on top of #1532.
Prep work for load-aware descheduling: #225.

Later on, usageSnapshot can be promoted to an interface, with an actualUsageSnapshot implementation pulling the usage from metrics.

Notes (towards integrating plugins with metrics):

  • utilize https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/#metrics-api
    • kubectl get podmetrics + kubectl get nodemetrics
  • implement an exponentially weighted moving average (EWMA) over the metrics above (introduce an initial delay so at least 5 samples are collected before the descheduling plugins run, with the default window set to 5s and both the window and the initial delay configurable)
  • extend the descheduling framework with a metrics collector (with the EWMA computed) and make it available to each plugin (the collector starts collecting only if at least one of the plugins accesses the metrics)

TODO:

  • define initialDelay (and pullInterval for the Kubernetes metrics) in case a user wants to let the metrics collector smooth out the resource changes.

For testing purposes:

```yaml
apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
# metricsCollector:
#   enabled: true
profiles:
  - name: ProfileName
    pluginConfig:
    - name: "LowNodeUtilization"
      args:
        thresholds:
          "memory": 20
        targetThresholds:
          "memory": 70
        metricsUtilization:
          metricsServer: true
          prometheusURL: https://prometheus-k8s-openshift-monitoring.apps.jchaloup-20241106.group-b.devcluster.openshift.com
          prometheusAuthToken: XXXXX
          promQuery: instance:node_cpu:rate:sum
    plugins:
      balance:
        enabled:
          - "LowNodeUtilization"
```

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from ingvagabund. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 11, 2024
@fanhaouu
Contributor

Hello, master. Due to a busy schedule at my company, I previously only managed to complete half of the related KEP. I'm glad to see that you're working on this. It looks like you're aiming to reuse the current Node utilization logic. I have a few suggestions:

  • It should support different data sources, similar to PayPal's load-watcher.
  • It should support various real-time data processing algorithms. For instance, real-time calculations, using rate averages, or predictions based on EWMA + P95, similar to the approach used by autoscaler.
  • If the goal is to address real-time CPU hotspots, perhaps there’s no need to calculate the number of nodes below or above a certain threshold. Of course, you could also provide a switch to control this behavior.

Hope these suggestions help!

@ingvagabund
Contributor Author

Hello sir :)

thank you for taking part in composing the out-of-tree descheduling plugin KEP.

> It should support different data sources, similar to PayPal's load-watcher.

You are on the right track here. I'd like to get in touch with the load-watcher maintainers and extend the codebase to provide a generic interface for accessing pod utilization metrics as well; currently, only actual node utilization is collected. In the meantime, I am shaping the code here to integrate better with other utilization sources such as metrics.

> It should support various real-time data processing algorithms. For instance, real-time calculations, using rate averages, or predictions based on EWMA + P95, similar to the approach used by autoscaler.

This is where we can debate more. Thank you for sharing the specifics. There's an open issue for the pod autoscaler suggesting introducing an EMA: kubernetes/kubernetes#62235. Are you aware of a similar issue or discussion for the cluster autoscaler? I'd love to learn more about how it's implemented there. Ultimately, the current plugin just needs to know which pod, when evicted and properly re-scheduled, will improve the overall node/workload utilization. I can see the utilization snapshot being produced through various methods.

> If the goal is to address real-time CPU hotspots, perhaps there’s no need to calculate the number of nodes below or above a certain threshold. Of course, you could also provide a switch to control this behavior.

I can see how evicting hotspot pods relates to consuming metrics/real-time node utilization. In the current plugin's context this is better suited for a new/different plugin. I can also see how RemoveDuplicates could be extended to evict based on overall node utilization instead of the current counting approach. Not every plugin will need to consume metrics, though common pieces can be shared across them through the descheduling framework.

@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 5, 2024
@ingvagabund ingvagabund force-pushed the node-utilization-util-snapshot branch 3 times, most recently from d744a96 to 800c92c Compare November 6, 2024 18:34
@ingvagabund
Contributor Author

kubernetes/kubernetes#128663 to address the discrepancy in the fake metrics client's node/pod metrics resource names.

…xtraction into a dedicated usage client

Turning the usage client into an interface allows implementing other kinds of usage clients, such as actual-usage or Prometheus-based resource collection.
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 7, 2024
@ingvagabund
Contributor Author

/test pull-descheduler-verify-master

3 participants