The current load-watcher Prometheus pkg uses the `instance:node_cpu:ratio` metric to calculate node utilization.

However, while this value was still below 60%, I found that another metric, `instance:node_cpu_utilisation:rate1m`, was much larger, around 90%. Apparently Prometheus applies some smoothing to these recorded metrics, and the one we use may already be smoothed over a large time window, possibly larger than 1m; my guess is 5m.

We are not sure which Prometheus metric is consistent with the value obtained directly from the metrics server, so more testing is needed.
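To see the discrepancy directly, one option is to query both metrics through the Prometheus HTTP API and print them side by side. This is only a testing sketch, not load-watcher code; the Prometheus address `http://localhost:9090` is an assumption, and it uses the official `prometheus/client_golang` API client.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/prometheus/client_golang/api"
	v1 "github.com/prometheus/client_golang/api/prometheus/v1"
)

func main() {
	// Prometheus address is an assumption; point it at your own endpoint.
	client, err := api.NewClient(api.Config{Address: "http://localhost:9090"})
	if err != nil {
		log.Fatalf("creating Prometheus client: %v", err)
	}
	promAPI := v1.NewAPI(client)

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// The metric load-watcher currently uses, plus the 1m variant it was compared against.
	for _, query := range []string{
		"instance:node_cpu:ratio",
		"instance:node_cpu_utilisation:rate1m",
	} {
		result, warnings, err := promAPI.Query(ctx, query, time.Now())
		if err != nil {
			log.Fatalf("querying %q: %v", query, err)
		}
		if len(warnings) > 0 {
			log.Printf("warnings for %q: %v", query, warnings)
		}
		fmt.Printf("%s:\n%v\n\n", query, result)
	}
}
```

Running this on a node under bursty load should show whether the 5m-smoothed ratio lags the 1m rate as described above.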
The `instance:node_cpu:ratio` metric's time window is 5m. Its recording rule is:

sum(rate(node_cpu_seconds_total{mode!="idle",mode!="iowait",mode!="steal"}[5m])) WITHOUT (cpu, mode) / ON(instance) GROUP_LEFT() count(sum(node_cpu_seconds_total) BY (instance, cpu)) BY (instance)
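Given that `instance:node_cpu:ratio` is averaged over 5m, the remaining open question from the issue is how it lines up with what the metrics server reports. A rough way to pull that number for comparison is sketched below; it is not load-watcher code, and it assumes a kubeconfig at the default `~/.kube/config` location, the metrics-server addon installed, and the `k8s.io/metrics` client.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"path/filepath"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/util/homedir"
	metricsclient "k8s.io/metrics/pkg/client/clientset/versioned"
)

func main() {
	// Kubeconfig path is an assumption; adjust for your environment.
	kubeconfig := filepath.Join(homedir.HomeDir(), ".kube", "config")
	config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		log.Fatalf("loading kubeconfig: %v", err)
	}

	coreClient, err := kubernetes.NewForConfig(config)
	if err != nil {
		log.Fatalf("creating core client: %v", err)
	}
	metricsClient, err := metricsclient.NewForConfig(config)
	if err != nil {
		log.Fatalf("creating metrics client: %v", err)
	}

	ctx := context.Background()

	// Instantaneous node CPU usage as reported by metrics-server.
	nodeMetrics, err := metricsClient.MetricsV1beta1().NodeMetricses().List(ctx, metav1.ListOptions{})
	if err != nil {
		log.Fatalf("listing node metrics: %v", err)
	}
	// Node capacity, needed to turn usage into a utilization ratio.
	nodes, err := coreClient.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
	if err != nil {
		log.Fatalf("listing nodes: %v", err)
	}
	capacityMilli := map[string]int64{}
	for _, n := range nodes.Items {
		capacityMilli[n.Name] = n.Status.Capacity.Cpu().MilliValue()
	}

	for _, nm := range nodeMetrics.Items {
		usage := nm.Usage.Cpu().MilliValue()
		if c, ok := capacityMilli[nm.Name]; ok && c > 0 {
			fmt.Printf("%s: %.2f%% CPU (metrics-server)\n", nm.Name, 100*float64(usage)/float64(c))
		}
	}
}
```

Comparing this percentage against both Prometheus metrics over a few sampling intervals should make it clearer which recording rule tracks the metrics-server value more closely.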