This repository has been archived by the owner on Nov 2, 2021. It is now read-only.

Why do duplicate metrics occur when a job is scheduled to this server? #174

Open
WYmindsky opened this issue Apr 1, 2021 · 3 comments

Comments

@WYmindsky

yaml:pod-gpu-exporter-daemonset.yaml
docker image:pod-gpu-metrics-exporter:1.0.0-alpha
dcgm:dcgm-exporter:1.4.6

Duplicate metrics occurred after a job had been scheduled to this server for a long time.

@JulesBelveze

Hey @WYmindsky I'm experiencing the same behaviour. Did you find out why this occurs?

@WYmindsky

> Hey @WYmindsky I'm experiencing the same behaviour. Did you find out why this occurs?

It's still there

@nikkon-dev

Hi,

Could you provide the logs from the dcgm-exporter itself?
It looks like there are two dcgm-exporter instances: one that is aware of the k8s environment (it was able to connect to the pod API) and another that is not. The container_name, pod_namespace, and pod_name labels are gathered from the k8s infrastructure. If those labels are missing, the connection from the dcgm-exporter to k8s failed, and that should be reflected in the dcgm-exporter logs.

WBR,
Nik
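
While waiting for the logs, one way to check the hypothesis above is to look at the scraped metrics text and see whether the same series shows up twice, once with the k8s labels and once without. The following is only a minimal sketch in plain Python, not part of dcgm-exporter: the function names, the sample metric name `dcgm_gpu_utilization`, and the `gpu`/`uuid` values are assumptions for illustration. Only the three label names (container_name, pod_namespace, pod_name) come from the comment above.

```python
import re
from collections import defaultdict

# Labels that, per the comment above, are only present when the
# exporter could reach the k8s pod API.
K8S_LABELS = {"container_name", "pod_namespace", "pod_name"}

# Simplified matcher for one sample line in Prometheus text format:
#   metric_name{label="value",...} 123
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Return a list of (metric_name, labels_dict), skipping comments."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels))
    return samples


def find_k8s_label_duplicates(text):
    """Group series by name and non-k8s labels, and report groups that
    contain both a series with k8s labels and one without."""
    groups = defaultdict(list)
    for name, labels in parse_samples(text):
        base = tuple(sorted(
            (k, v) for k, v in labels.items() if k not in K8S_LABELS
        ))
        groups[(name, base)].append(labels)

    duplicates = {}
    for key, series in groups.items():
        with_k8s = [s for s in series if any(k in s for k in K8S_LABELS)]
        without_k8s = [s for s in series if not any(k in s for k in K8S_LABELS)]
        if with_k8s and without_k8s:
            duplicates[key] = (with_k8s, without_k8s)
    return duplicates


# Hypothetical scrape output, invented for illustration only.
SAMPLE = '''
# TYPE dcgm_gpu_utilization gauge
dcgm_gpu_utilization{gpu="0",uuid="GPU-aaaa"} 87
dcgm_gpu_utilization{gpu="0",uuid="GPU-aaaa",container_name="train",pod_namespace="default",pod_name="job-1"} 87
dcgm_gpu_utilization{gpu="1",uuid="GPU-bbbb"} 0
'''

if __name__ == "__main__":
    for (name, base), (with_k8s, without_k8s) in find_k8s_label_duplicates(SAMPLE).items():
        print(name, dict(base), "appears with and without k8s labels")
```

If this reports pairs, it would be consistent with two exporters (or two collection paths) serving the same GPU, one of which failed to reach k8s. It does not say which instance is at fault, so the dcgm-exporter logs are still needed.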
