Fix OTEL metric renames (#1713)
* Fix OTEL metric renames

* fix deltatocumulative docs

* Make the changelog entry a bugfix

* missed one component in changelog entry
thampiotr authored Sep 19, 2024
1 parent b4b8834 commit 13874b5
Showing 9 changed files with 39 additions and 37 deletions.
4 changes: 3 additions & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -12,10 +12,12 @@ Main (unreleased)

### Bugfixes


- Update yet-another-cloudwatch-exporter from v0.60.0 to v0.61.0: (@morreymeyer)
- Fixes a bug where cloudwatch S3 metrics are reported as `0`

- Fixed incorrect debug metric names in `otelcol.exporter.awss3`, `otelcol.exporter.otlp`, `otelcol.processor.batch`, `otelcol.processor.deltatocumulative` and `otelcol.processor.otlp`
which have changed due to an upstream breaking change. The dashboards and alerts in the mixin have also been fixed. (@thampiotr)


v1.4.0-rc.2
-----------------
@@ -142,10 +142,10 @@ information.

## Debug metrics

* `exporter_sent_spans_ratio_total` (counter): Number of spans successfully sent to destination.
* `exporter_send_failed_spans_ratio_total` (counter): Number of spans in failed attempts to send to destination.
* `exporter_queue_capacity_ratio` (gauge): Fixed capacity of the retry queue (in batches).
* `exporter_queue_size_ratio` (gauge): Current size of the retry queue (in batches).
* `otelcol_exporter_sent_spans_total` (counter): Number of spans successfully sent to destination.
* `otelcol_exporter_send_failed_spans_total` (counter): Number of spans in failed attempts to send to destination.
* `otelcol_exporter_queue_capacity` (gauge): Fixed capacity of the retry queue (in batches).
* `otelcol_exporter_queue_size` (gauge): Current size of the retry queue (in batches).
* `rpc_client_duration_milliseconds` (histogram): Measures the duration of inbound RPC.
* `rpc_client_request_size_bytes` (histogram): Measures size of RPC request messages (uncompressed).
* `rpc_client_requests_per_rpc` (histogram): Measures the number of messages received per RPC. Should be 1 for all non-streaming RPCs.
@@ -172,10 +172,10 @@ information.

## Debug metrics

* `exporter_sent_spans_ratio_total` (counter): Number of spans successfully sent to destination.
* `exporter_send_failed_spans_ratio_total` (counter): Number of spans in failed attempts to send to destination.
* `exporter_queue_capacity_ratio` (gauge): Fixed capacity of the retry queue (in batches)
* `exporter_queue_size_ratio` (gauge): Current size of the retry queue (in batches)
* `otelcol_exporter_sent_spans_total` (counter): Number of spans successfully sent to destination.
* `otelcol_exporter_send_failed_spans_total` (counter): Number of spans in failed attempts to send to destination.
* `otelcol_exporter_queue_capacity` (gauge): Fixed capacity of the retry queue (in batches)
* `otelcol_exporter_queue_size` (gauge): Current size of the retry queue (in batches)
* `rpc_client_duration_milliseconds` (histogram): Measures the duration of inbound RPC.
* `rpc_client_request_size_bytes` (histogram): Measures size of RPC request messages (uncompressed).
* `rpc_client_requests_per_rpc` (histogram): Measures the number of messages received per RPC. Should be 1 for all non-streaming RPCs.
@@ -136,11 +136,11 @@ information.

## Debug metrics

* `processor_batch_batch_send_size_bytes` (histogram): Number of bytes in batch that was sent.
* `processor_batch_batch_send_size_ratio` (histogram): Number of units in the batch.
* `processor_batch_metadata_cardinality_ratio` (gauge): Number of distinct metadata value combinations being processed.
* `processor_batch_timeout_trigger_send_ratio_total` (counter): Number of times the batch was sent due to a timeout trigger.
* `processor_batch_batch_size_trigger_send_ratio_total` (counter): Number of times the batch was sent due to a size trigger.
* `otelcol_processor_batch_batch_send_size_bytes` (histogram): Number of bytes in batch that was sent.
* `otelcol_processor_batch_batch_send_size` (histogram): Number of units in the batch.
* `otelcol_processor_batch_metadata_cardinality` (gauge): Number of distinct metadata value combinations being processed.
* `otelcol_processor_batch_timeout_trigger_send_total` (counter): Number of times the batch was sent due to a timeout trigger.
* `otelcol_processor_batch_batch_size_trigger_send_total` (counter): Number of times the batch was sent due to a size trigger.

## Examples

@@ -89,13 +89,13 @@ Name | Type | Description

## Debug metrics

* `processor_deltatocumulative_streams_tracked` (gauge): Number of streams currently tracked by the aggregation state.
* `processor_deltatocumulative_streams_limit` (gauge): Upper limit of tracked streams.
* `processor_deltatocumulative_streams_evicted` (counter): Total number of streams removed from tracking to ingest newer streams.
* `processor_deltatocumulative_streams_max_stale` (gauge): Duration without new samples after which streams are dropped.
* `processor_deltatocumulative_datapoints_processed` (counter): Total number of datapoints processed (successfully or unsuccessfully).
* `processor_deltatocumulative_datapoints_dropped` (counter): Faulty datapoints that were dropped due to the reason given in the `reason` label.
* `processor_deltatocumulative_gaps_length` (counter): Total length of all gaps in the streams, for example gaps caused by datapoints lost in transit.
* `otelcol_deltatocumulative_streams_tracked` (gauge): Number of streams currently tracked by the aggregation state.
* `otelcol_deltatocumulative_streams_limit` (gauge): Upper limit of tracked streams.
* `otelcol_deltatocumulative_streams_evicted` (counter): Total number of streams removed from tracking to ingest newer streams.
* `otelcol_deltatocumulative_streams_max_stale_seconds` (gauge): Duration without new samples after which streams are dropped.
* `otelcol_deltatocumulative_datapoints_processed` (counter): Total number of datapoints processed (successfully or unsuccessfully).
* `otelcol_deltatocumulative_datapoints_dropped` (counter): Faulty datapoints that were dropped due to the reason given in the `reason` label.
* `otelcol_deltatocumulative_gaps_length` (counter): Total length of all gaps in the streams, for example gaps caused by datapoints lost in transit.

## Examples

@@ -197,8 +197,8 @@ information.

## Debug metrics

* `receiver_accepted_spans_ratio_total` (counter): Number of spans successfully pushed into the pipeline.
* `receiver_refused_spans_ratio_total` (counter): Number of spans that could not be pushed into the pipeline.
* `otelcol_receiver_accepted_spans_total` (counter): Number of spans successfully pushed into the pipeline.
* `otelcol_receiver_refused_spans_total` (counter): Number of spans that could not be pushed into the pipeline.
* `rpc_server_duration_milliseconds` (histogram): Duration of RPC requests from a gRPC server.
* `rpc_server_request_size_bytes` (histogram): Measures size of RPC request messages (uncompressed).
* `rpc_server_requests_per_rpc` (histogram): Measures the number of messages received per RPC. Should be 1 for all non-streaming RPCs.
6 changes: 3 additions & 3 deletions docs/sources/set-up/deploy.md
@@ -146,9 +146,9 @@ This similarity is because most {{< param "PRODUCT_NAME" >}} components used for
#### When to scale

To decide whether scaling is necessary, check metrics such as:
* `receiver_refused_spans_ratio_total` from receivers such as `otelcol.receiver.otlp`.
* `processor_refused_spans_ratio_total` from processors such as `otelcol.processor.batch`.
* `exporter_send_failed_spans_ratio_total` from exporters such as `otelcol.exporter.otlp` and `otelcol.exporter.loadbalancing`.
* `otelcol_receiver_refused_spans_total` from receivers such as `otelcol.receiver.otlp`.
* `otelcol_processor_refused_spans_total` from processors such as `otelcol.processor.batch`.
* `otelcol_exporter_send_failed_spans_total` from exporters such as `otelcol.exporter.otlp` and `otelcol.exporter.loadbalancing`.
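
As a rough illustration of the check described in the list above, the following sketch flags which of the renamed metrics show a non-zero rate. It assumes the per-second rates have already been obtained elsewhere, for example by evaluating `sum(rate(<metric>[1m]))` in Prometheus; the function name `scaling_signals`, the `rates` mapping, and the `threshold` parameter are hypothetical and not part of the docs or the commit.

```python
from typing import List, Mapping

# Two of the metrics named in the list above (post-rename names).
SCALING_SIGNALS = (
    "otelcol_receiver_refused_spans_total",
    "otelcol_exporter_send_failed_spans_total",
)


def scaling_signals(rates: Mapping[str, float], threshold: float = 0.0) -> List[str]:
    """Return the watched metric names whose rate exceeds `threshold`.

    `rates` maps a metric name to its observed per-second rate; metrics
    missing from the mapping are treated as having a rate of 0.
    """
    return [name for name in SCALING_SIGNALS if rates.get(name, 0.0) > threshold]
```

A non-empty result only suggests that spans are being refused or failing to send; whether scaling is the right response still depends on the cause.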

#### Stateful and stateless components

8 changes: 4 additions & 4 deletions operations/alloy-mixin/alerts/opentelemetry.libsonnet
@@ -11,9 +11,9 @@ local alert = import './utils/alert.jsonnet';
alert.newRule(
'OtelcolReceiverRefusedSpans',
if enableK8sCluster then
'sum by (cluster, namespace, job) (rate(receiver_refused_spans_ratio_total{}[1m])) > 0'
'sum by (cluster, namespace, job) (rate(otelcol_receiver_refused_spans_total{}[1m])) > 0'
else
'sum by (job) (rate(receiver_refused_spans_ratio_total{}[1m])) > 0'
'sum by (job) (rate(otelcol_receiver_refused_spans_total{}[1m])) > 0'
,
'The receiver could not push some spans to the pipeline.',
'The receiver could not push some spans to the pipeline under job {{ $labels.job }}. This could be due to reaching a limit such as the ones imposed by otelcol.processor.memory_limiter.',
@@ -25,9 +25,9 @@ local alert = import './utils/alert.jsonnet';
alert.newRule(
'OtelcolExporterFailedSpans',
if enableK8sCluster then
'sum by (cluster, namespace, job) (rate(exporter_send_failed_spans_ratio_total{}[1m])) > 0'
'sum by (cluster, namespace, job) (rate(otelcol_exporter_send_failed_spans_total{}[1m])) > 0'
else
'sum by (job) (rate(exporter_send_failed_spans_ratio_total{}[1m])) > 0'
'sum by (job) (rate(otelcol_exporter_send_failed_spans_total{}[1m])) > 0'
,
'The exporter failed to send spans to their destination.',
'The exporter failed to send spans to their destination under job {{ $labels.job }}. There could be an issue with the payload or with the destination endpoint.',
14 changes: 7 additions & 7 deletions operations/alloy-mixin/dashboards/opentelemetry.libsonnet
@@ -44,7 +44,7 @@ local stackedPanelMixin = {
panel.withQueries([
panel.newQuery(
expr= |||
rate(receiver_accepted_spans_ratio_total{%(instanceSelector)s}[$__rate_interval])
rate(otelcol_receiver_accepted_spans_total{%(instanceSelector)s}[$__rate_interval])
||| % $._config,
//TODO: How will the dashboard look if there is more than one receiver component? The legend is not unique enough?
legendFormat='{{ pod }} / {{ transport }}',
@@ -62,7 +62,7 @@ local stackedPanelMixin = {
panel.withQueries([
panel.newQuery(
expr= |||
rate(receiver_refused_spans_ratio_total{%(instanceSelector)s}[$__rate_interval])
rate(otelcol_receiver_refused_spans_total{%(instanceSelector)s}[$__rate_interval])
||| % $._config,
legendFormat='{{ pod }} / {{ transport }}',
),
@@ -100,7 +100,7 @@ local stackedPanelMixin = {
panel.withQueries([
panel.newQuery(
expr= |||
sum by (le) (increase(processor_batch_batch_send_size_ratio_bucket{%(instanceSelector)s}[$__rate_interval]))
sum by (le) (increase(otelcol_processor_batch_batch_send_size_bucket{%(instanceSelector)s}[$__rate_interval]))
||| % $._config,
format='heatmap',
legendFormat='{{le}}',
@@ -119,7 +119,7 @@ local stackedPanelMixin = {
panel.withQueries([
panel.newQuery(
expr= |||
processor_batch_metadata_cardinality_ratio{%(instanceSelector)s}
otelcol_processor_batch_metadata_cardinality{%(instanceSelector)s}
||| % $._config,
legendFormat='{{ pod }}',
),
@@ -134,7 +134,7 @@ local stackedPanelMixin = {
panel.withQueries([
panel.newQuery(
expr= |||
rate(processor_batch_timeout_trigger_send_ratio_total{%(instanceSelector)s}[$__rate_interval])
rate(otelcol_processor_batch_timeout_trigger_send_total{%(instanceSelector)s}[$__rate_interval])
||| % $._config,
legendFormat='{{ pod }}',
),
@@ -156,7 +156,7 @@ local stackedPanelMixin = {
panel.withQueries([
panel.newQuery(
expr= |||
rate(exporter_sent_spans_ratio_total{%(instanceSelector)s}[$__rate_interval])
rate(otelcol_exporter_sent_spans_total{%(instanceSelector)s}[$__rate_interval])
||| % $._config,
legendFormat='{{ pod }}',
),
@@ -172,7 +172,7 @@ local stackedPanelMixin = {
panel.withQueries([
panel.newQuery(
expr= |||
rate(exporter_send_failed_spans_ratio_total{%(instanceSelector)s}[$__rate_interval])
rate(otelcol_exporter_send_failed_spans_total{%(instanceSelector)s}[$__rate_interval])
||| % $._config,
legendFormat='{{ pod }}',
),
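
The renames in this commit mostly add an `otelcol_` prefix and drop the `_ratio` unit suffix. For custom dashboards or alerts still built on the old names, a minimal sketch of a helper that rewrites PromQL expressions could look like the following. The mapping is copied from the renames shown in the diff above; the function name `rename_metrics` and the handling of the histogram `_bucket`, `_sum`, and `_count` suffixes are assumptions made for illustration and are not part of the commit.

```python
import re

# Old -> new debug metric names, copied from the renames in this commit.
RENAMES = {
    "receiver_accepted_spans_ratio_total": "otelcol_receiver_accepted_spans_total",
    "receiver_refused_spans_ratio_total": "otelcol_receiver_refused_spans_total",
    "exporter_sent_spans_ratio_total": "otelcol_exporter_sent_spans_total",
    "exporter_send_failed_spans_ratio_total": "otelcol_exporter_send_failed_spans_total",
    "exporter_queue_capacity_ratio": "otelcol_exporter_queue_capacity",
    "exporter_queue_size_ratio": "otelcol_exporter_queue_size",
    "processor_batch_batch_send_size_bytes": "otelcol_processor_batch_batch_send_size_bytes",
    "processor_batch_batch_send_size_ratio": "otelcol_processor_batch_batch_send_size",
    "processor_batch_metadata_cardinality_ratio": "otelcol_processor_batch_metadata_cardinality",
    "processor_batch_timeout_trigger_send_ratio_total": "otelcol_processor_batch_timeout_trigger_send_total",
    "processor_batch_batch_size_trigger_send_ratio_total": "otelcol_processor_batch_batch_size_trigger_send_total",
    "processor_deltatocumulative_streams_tracked": "otelcol_deltatocumulative_streams_tracked",
    "processor_deltatocumulative_streams_limit": "otelcol_deltatocumulative_streams_limit",
    "processor_deltatocumulative_streams_evicted": "otelcol_deltatocumulative_streams_evicted",
    "processor_deltatocumulative_streams_max_stale": "otelcol_deltatocumulative_streams_max_stale_seconds",
    "processor_deltatocumulative_datapoints_processed": "otelcol_deltatocumulative_datapoints_processed",
    "processor_deltatocumulative_datapoints_dropped": "otelcol_deltatocumulative_datapoints_dropped",
    "processor_deltatocumulative_gaps_length": "otelcol_deltatocumulative_gaps_length",
}

# Characters that may appear in a Prometheus metric name. The lookarounds
# keep the pattern from matching inside a longer identifier, so names that
# already carry the otelcol_ prefix are left untouched.
_IDENT = r"[A-Za-z0-9_:]"
_NAMES = "|".join(sorted(map(re.escape, RENAMES), key=len, reverse=True))
_PATTERN = re.compile(
    r"(?<!%s)(%s)(_bucket|_sum|_count)?(?!%s)" % (_IDENT, _NAMES, _IDENT)
)


def rename_metrics(expr: str) -> str:
    """Replace old metric names in a PromQL expression with the new ones."""

    def _swap(match):
        # Preserve a histogram series suffix (assumed convention) if present.
        return RENAMES[match.group(1)] + (match.group(2) or "")

    return _PATTERN.sub(_swap, expr)
```

For example, `rename_metrics("sum by (job) (rate(receiver_refused_spans_ratio_total{}[1m])) > 0")` yields the updated alert expression used in `opentelemetry.libsonnet`. The rewrite is purely textual, so a label that happens to share an old metric name would also be rewritten; review the output before applying it.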
