[metrics 4/x] Metrics exporter rules #732

Merged

Member

@zeeke zeeke commented Jul 10, 2024

PrometheusRules allow recording pre-defined queries. Use the
sriov_kubepoddevice metric to add the pod|namespace pair
to the sriov metrics.

Here is an example of the raw exported metrics:

sriov_kubepoddevice{container="testpmd",dev_type="openshift.io/inteldpdk",namespace="cnf-4916",pciAddr="0000:17:01.4",pod="dpdk-intel-client"} 1

sriov_vf_rx_broadcast{numa_node="0",pciAddr="0000:17:01.4",pf="ens2f0",vf="4"} 1.0926018e+07
sriov_vf_rx_bytes{numa_node="0",pciAddr="0000:17:01.4",pf="ens2f0",vf="4"} 7.83952134e+08
sriov_vf_rx_dropped{numa_node="0",pciAddr="0000:17:01.4",pf="ens2f0",vf="4"} 0
sriov_vf_rx_multicast{numa_node="0",pciAddr="0000:17:01.4",pf="ens2f0",vf="4"} 0
sriov_vf_rx_packets{numa_node="0",pciAddr="0000:17:01.4",pf="ens2f0",vf="4"} 1.0926018e+07
sriov_vf_tx_bytes{numa_node="0",pciAddr="0000:17:01.4",pf="ens2f0",vf="4"} 0
sriov_vf_tx_dropped{numa_node="0",pciAddr="0000:17:01.4",pf="ens2f0",vf="4"} 1.0926018e+07
sriov_vf_tx_packets{numa_node="0",pciAddr="0000:17:01.4",pf="ens2f0",vf="4"} 0

The proposed Prometheus rules make the following new metrics available for querying:

  • network:sriov_vf_tx_packets
  • network:sriov_vf_rx_packets
  • network:sriov_vf_tx_bytes
  • network:sriov_vf_rx_bytes
  • network:sriov_vf_tx_dropped
  • network:sriov_vf_rx_dropped
  • network:sriov_vf_rx_broadcast
  • network:sriov_vf_rx_multicast
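
As a rough illustration of how one of the recorded metrics listed above could be defined, here is a minimal PrometheusRule sketch. It is not copied from this PR: the object name, the group name, and the exact expression are assumptions. The idea is that `sriov_kubepoddevice` always has the value 1, so multiplying a VF counter by it on the shared `pciAddr` label keeps the counter value while copying the `pod`, `namespace` and `dev_type` labels onto the result.

```yaml
# Hypothetical sketch; names and the expression are assumptions, not the PR's manifest.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: sriov-vf-rules                          # assumed object name
  namespace: openshift-sriov-network-operator   # namespace seen in the operator logs
spec:
  groups:
    - name: sriov-vf.rules                      # assumed group name
      rules:
        - record: network:sriov_vf_rx_packets
          # Many-to-one join: each VF series matches the single
          # sriov_kubepoddevice series with the same pciAddr, and
          # group_left copies pod, namespace and dev_type from it.
          expr: |
            sriov_vf_rx_packets
              * on (pciAddr) group_left (pod, namespace, dev_type)
            sriov_kubepoddevice
```

Matching on `pciAddr` alone assumes the address is unique among the scraped series. On a multi-node cluster the match would probably also need a node-identifying label, which is not shown in the raw metrics above.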

Thanks for your PR,
To run the vendor CIs, use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for the NVIDIA vendor.

To skip the vendor CIs, use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for the NVIDIA vendor.

Best regards.

@zeeke zeeke force-pushed the metrics-exporter-rules branch 4 times, most recently from 3eced4d to 7b25cc6 Compare July 19, 2024 13:58
@zeeke zeeke marked this pull request as ready for review August 5, 2024 11:17
coveralls commented Aug 5, 2024

Pull Request Test Coverage Report for Build 10903994186

Details

  • 1 of 1 (100.0%) changed or added relevant line in 1 file is covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.004%) to 45.052%

Totals Coverage Status
Change from base Build 10903544901: 0.004%
Covered Lines: 6628
Relevant Lines: 14712

💛 - Coveralls

@zeeke zeeke force-pushed the metrics-exporter-rules branch 3 times, most recently from d8ce5ef to 9493e50 Compare August 20, 2024 11:36
Collaborator

@SchSeba SchSeba left a comment

LGTM

@@ -29,6 +29,7 @@ rules:
- monitoring.coreos.com
resources:
- servicemonitors
- prometheusrules
Collaborator

Do we support deletion of Prometheus objects?

Collaborator

For example, when sno is redeployed with Prometheus disabled (e.g., in the Helm chart).

Member Author

Good point. We have to support object deletion as well. I handle it in

Member Author

zeeke commented Sep 12, 2024

@adrianchiris can we move this forward?

Collaborator

@adrianchiris adrianchiris left a comment

LGTM

PrometheusRules allow recording pre-defined queries. Use the
`sriov_kubepoddevice` metric to add the `pod|namespace` pair
to the sriov metrics.

The feature is enabled via the `METRICS_EXPORTER_PROMETHEUS_DEPLOY_RULE`
environment variable.

Signed-off-by: Andrea Panattoni <[email protected]>
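
For illustration, the environment variable named in the commit message above could be set on the operator container roughly as follows. This is a sketch under assumptions: the container name and the accepted value (`"true"`) are not stated in this conversation.

```yaml
# Hypothetical fragment of the operator Deployment's pod template.
# Container name and value are assumptions; only the variable name comes from the commit message.
spec:
  template:
    spec:
      containers:
        - name: sriov-network-operator        # assumed container name
          env:
            - name: METRICS_EXPORTER_PROMETHEUS_DEPLOY_RULE
              value: "true"                   # assumed value that enables rule deployment
```
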
When the `metricsExporter` feature is turned off, deployed resources
should be removed. These changes fix the error:

```
2024-08-28T14:07:57.699760017Z    ERROR    controller/controller.go:266    Reconciler error    {"controller": "sriovoperatorconfig", "controllerGroup": "sriovnetwork.openshift.io", "controllerKind": "SriovOperatorConfig", "SriovOperatorConfig": {"name":"default","namespace":"openshift-sriov-network-operator"}, "namespace": "openshift-sriov-network-operator", "name": "default", "reconcileID": "fa841c50-dbb8-4c4c-9ddd-b98624fd2a24", "error": "failed to delete object &{map[apiVersion:monitoring.coreos.com/v1 kind:ServiceMonitor metadata:map[name:sriov-network-metrics-exporter namespace:openshift-sriov-network-operator] spec:map[endpoints:[map[bearerTokenFile:/var/run/secrets/kubernetes.io/serviceaccount/token honorLabels:true interval:30s port:sriov-network-metrics scheme:https tlsConfig:map[caFile:/etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt insecureSkipVerify:false serverName:sriov-network-metrics-exporter-service.openshift-sriov-network-operator.svc]]] namespaceSelector:map[matchNames:[openshift-sriov-network-operator]] selector:map[matchLabels:map[name:sriov-network-metrics-exporter-service]]]]} with err: could not delete object (monitoring.coreos.com/v1, Kind=ServiceMonitor) openshift-sriov-network-operator/sriov-network-metrics-exporter: servicemonitors.monitoring.coreos.com \"sriov-network-metrics-exporter\" is forbidden: User \"system:serviceaccount:openshift-sriov-network-operator:sriov-network-operator\" cannot delete resource \"servicemonitors\" in API group \"monitoring.coreos.com\" in the namespace \"openshift-sriov-network-operator\""}
```

Signed-off-by: Andrea Panattoni <[email protected]>
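
The "forbidden" error in the log above is an RBAC denial: the operator's service account is not allowed to use the `delete` verb on `servicemonitors`. A minimal sketch of a Role rule that would permit it is shown here. The Role name and the other verbs are assumptions; only the API group, the two resources, and the need for `delete` come from this conversation.

```yaml
# Hypothetical Role sketch; metadata and the non-delete verbs are assumptions.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: sriov-network-operator                  # assumed Role name
  namespace: openshift-sriov-network-operator
rules:
  - apiGroups:
      - monitoring.coreos.com
    resources:
      - servicemonitors
      - prometheusrules
    verbs:
      - get
      - create
      - update
      - delete    # needed so resources can be removed when the feature is turned off
```
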
@adrianchiris adrianchiris merged commit aecb4bb into k8snetworkplumbingwg:master Sep 19, 2024
13 checks passed
zeeke added a commit to zeeke/sriov-network-operator that referenced this pull request Sep 20, 2024
Make the operator create PrometheusRules so that metrics can be
browsed in the Developer Console.

refs:
- k8snetworkplumbingwg/sriov-network-operator#732

Signed-off-by: Andrea Panattoni <[email protected]>