cortexadmin GetRule : context deadline exceeded when evaluating alerting status. #1747

Open
alexandreLamarre opened this issue Sep 29, 2023 · 1 comment
Labels: alerting, bug, metrics

Comments

@alexandreLamarre
Contributor

Cortexadmin GetRule

2023-09-28T19:51:17Z ERROR plugin.metrics.cortex-admin cortex/admin.go:810 failed with Get "https://cortex-ruler:8080/prometheus/api/v1/rules": context canceled {"request": "https://cortex-ruler:8080/prometheus/api/v1/rules"}

The root cause is likely duplicate metric registration, which causes a loaded rule to fail evaluation. The two kubelet_node_name series in the error below differ only in their service label:

{"caller":"manager.go:677","err":"found duplicate series for the match group {instance=\"xx.yyy.zz.92:10250\"} on the right hand-side of the operation: [{__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"xx.yyy.zz.92:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"ip-xx-yyy-zz-92.mydomain.com\", prometheus=\"opni/opni-prometheus-agent\", prometheus_replica=\"prom-agent-opni-prometheus-agent-0\", service=\"rancher-mon-me-cluster-k8s-kubelet\"}, {__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"xx.yyy.zz.92:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"ip-xx-yyy-zz-92.mydomain.com\", prometheus=\"opni/opni-prometheus-agent\", prometheus_replica=\"prom-agent-opni-prometheus-agent-0\", service=\"opni-kube-prometheus-stack-kubelet\"}];many-to-many matching not allowed: matching labels must be unique on one side","file":"/rules/f8985de4-c040-40e9-9df6-9814c5582185/synced","group":"kubelet.rules","index":0,"level":"warn","msg":"Evaluating rule failed","name":"node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile","rule":"record: node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile\nexpr: histogram_quantile(0.99, sum by (cluster, instance, le) (rate(kubelet_pleg_relist_duration_seconds_bucket{job=\"kubelet\",metrics_path=\"/metrics\"}[5m]))\n  * on (cluster, instance) group_left (node) kubelet_node_name{job=\"kubelet\",metrics_path=\"/metrics\"})\nlabels:\n  quantile: \"0.99\"\n","ts":"2023-09-28T17:40:34.900657379Z","user":"f8985de4-c040-40e9-9df6-9814c5582185"}
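The ruler error above comes down to the right-hand side of the `on (cluster, instance)` join having two series with the same join key. Here is a rough Go sketch of that check, not Prometheus' actual implementation: `Series`, `matchKey` and `findDuplicates` are hypothetical names, and the label sets are trimmed versions of the ones in the log.

```go
package main

import (
	"fmt"
	"strings"
)

// Series is a simplified stand-in for a Prometheus series: only its label set.
// (Hypothetical type, used for illustration.)
type Series map[string]string

// matchKey builds the join key from the labels named in the on(...) clause.
// A label missing from the series contributes an empty value.
func matchKey(s Series, on []string) string {
	parts := make([]string, 0, len(on))
	for _, name := range on {
		parts = append(parts, fmt.Sprintf("%s=%q", name, s[name]))
	}
	return strings.Join(parts, ",")
}

// findDuplicates groups the right-hand side by join key and returns only
// the keys that map to more than one series.
func findDuplicates(rhs []Series, on []string) map[string][]Series {
	groups := map[string][]Series{}
	for _, s := range rhs {
		k := matchKey(s, on)
		groups[k] = append(groups[k], s)
	}
	for k, g := range groups {
		if len(g) < 2 {
			delete(groups, k) // unique key, safe to join on
		}
	}
	return groups
}

func main() {
	// Trimmed copies of the two kubelet_node_name series from the log:
	// identical except for the service label.
	rhs := []Series{
		{"instance": "xx.yyy.zz.92:10250", "job": "kubelet", "service": "rancher-mon-me-cluster-k8s-kubelet"},
		{"instance": "xx.yyy.zz.92:10250", "job": "kubelet", "service": "opni-kube-prometheus-stack-kubelet"},
	}
	for key, group := range findDuplicates(rhs, []string{"cluster", "instance"}) {
		fmt.Printf("duplicate series for match group {%s}:\n", key)
		for _, s := range group {
			fmt.Printf("  service=%q\n", s["service"])
		}
	}
}
```

Since `service` is not part of the join key, both series land in the same group. Removing one of the two kubelet scrape sources, or aggregating the right-hand side so the key is unique, are general ways to avoid this; the issue does not say which applies here.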

This propagates to the alerting sync task run:

2023-09-28T19:51:26Z ERROR plugin.alerting alerting/admin.go:541  ran 3/4 tasks successfully context deadline exceeded {"action": "runSyncTasks"}
2023-09-28T19:51:26Z ERROR plugin.alerting alerting/admin.go:565 failed to successfully run all alerting sync tasks : context deadline exceeded

This in turn could be causing #1719.

@alexandreLamarre alexandreLamarre self-assigned this Sep 29, 2023
@alexandreLamarre
Contributor Author

This seems to be resolved by #1563, but I'll keep this open in case it pops up again.
