
Prometheus Scraping with Secure Port #1477

Open
blakeromano opened this issue Aug 2, 2024 · 4 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@blakeromano

What version of descheduler are you using?

descheduler version: 0.29.0

Does this issue reproduce with the latest release?

Yes

Which descheduler CLI options are you using?

--policy-config-file=/policy-dir/policy.yaml
--descheduling-interval=5m
--v=3

Please provide a copy of your descheduler policy config file
N/A

What k8s version are you using (kubectl version)?

kubectl version Output
$ kubectl version
Server Version: version.Info{Major:"1", Minor:"29+", GitVersion:"v1.29.4-eks-036c24b", GitCommit:"9c0e57823b31865d0ee095997d9e7e721ffdc77f", GitTreeState:"clean", BuildDate:"2024-04-30T23:53:58Z", GoVersion:"go1.21.9", Compiler:"gc", Platform:"linux/amd64"}

What did you do?

I am trying to scrape descheduler metrics with an OpenTelemetry Collector running as a DaemonSet. However, because there is no option to serve the metrics on an insecure port, there is no way, as far as I can tell, to scrape the metrics off the pod.

helm install with the following values:

kind: Deployment

deschedulingInterval: 5m

Running a curl against the pod, like

curl POD_IP:10258/metrics

fails, and the OpenTelemetry Collector's Prometheus scraper can't connect either.
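For what it's worth, one possible workaround (a sketch, not verified against this exact setup) is to scrape the secure port over HTTPS instead, since 10258 serves TLS with a self-signed certificate and expects delegated authentication. In an OpenTelemetry Collector `prometheus` receiver that would look roughly like the following; the job name, token path, and target are assumptions/placeholders:

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: descheduler            # hypothetical job name
          scheme: https                    # the secure port only speaks TLS
          tls_config:
            insecure_skip_verify: true     # self-signed serving cert
          authorization:
            credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          static_configs:
            - targets: ["POD_IP:10258"]    # placeholder pod IP
```

The service account doing the scraping would also need RBAC permission on the `/metrics` non-resource URL for the delegated authorization to pass.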

What did you expect to see?

I expected to be able to scrape Prometheus metrics. I'd love to just have an insecure port that can be used.

What did you see instead?

The underlying problem seems to be that the descheduler decided to reuse the same HTTPS server machinery as the API server, which also introduces extraneous Prometheus metrics like the ones below, adding noise and confusion.

# HELP aggregator_discovery_aggregation_count_total [ALPHA] Counter of number of times discovery was aggregated
# TYPE aggregator_discovery_aggregation_count_total counter
# HELP apiserver_audit_event_total [ALPHA] Counter of audit events generated and sent to the audit backend.
# TYPE apiserver_audit_event_total counter
# HELP apiserver_audit_requests_rejected_total [ALPHA] Counter of apiserver requests rejected due to an error in audit logging backend.
# TYPE apiserver_audit_requests_rejected_total counter
# HELP apiserver_client_certificate_expiration_seconds [ALPHA] Distribution of the remaining lifetime on the certificate used to authenticate a request.
# TYPE apiserver_client_certificate_expiration_seconds histogram
@blakeromano blakeromano added the kind/bug Categorizes issue or PR as related to a bug. label Aug 2, 2024
@blakeromano
Author

Related issues #1102 #1095 #842

@Athishpranav2003

@blakeromano when I took a look at the code, I saw that it just registers the new metrics in the registry, which I presume is the same one you mentioned (the one the API server exposes). I guess for dependent services it's correct to have a central store. Why would two separate servers be needed?

@blakeromano
Author

My suggestion is that we move away from using https://github.com/kubernetes-sigs/descheduler/blob/master/cmd/descheduler/app/server.go#L36 as the server for the descheduler, and instead stand up a separate HTTP server rather than depending on the k8s apiserver HTTP server code.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 4, 2024