add info about custom latency buckets #4236

Merged
merged 38 commits into from
Jul 10, 2024
Changes from 35 commits
84bc33c
doc: http metrics path normalization
nelson-parente May 21, 2024
0882e0f
doc: code review & path matching rename
nelson-parente Jun 5, 2024
bb2e4c2
doc: add configuration examples
nelson-parente Jun 14, 2024
69f50d6
update: update docs based on last proposal changes
nelson-parente Jun 19, 2024
db60dab
feat: more updates based on the ingress/egress merge
nelson-parente Jun 19, 2024
f125100
doc: code review comments
nelson-parente Jun 25, 2024
71e818f
doc: code review comments
nelson-parente Jun 25, 2024
e801370
feat: add excludeVerbs
nelson-parente Jun 25, 2024
41aa984
feat: new line
nelson-parente Jun 25, 2024
b8eaf5e
feat: add review meeting changes
nelson-parente Jun 25, 2024
749fff2
v1.14 - cherry pick path normalization
filintod Jun 26, 2024
118bbab
add additional changes
filintod Jun 26, 2024
6619664
add additional changes
filintod Jun 26, 2024
1c289bc
add additional changes
filintod Jun 26, 2024
961a690
format table
filintod Jun 26, 2024
ef87997
Update daprdocs/content/en/operations/observability/metrics/metrics-o…
filintod Jun 26, 2024
50e6c70
explain buckets
filintod Jun 27, 2024
206ed67
Apply suggestions from code review
msfussell Jul 2, 2024
70ffef8
Merge branch 'v1.14' into filinto/custom-latency-buckets
msfussell Jul 2, 2024
64f01cc
Merge branch 'v1.14' into filinto/custom-latency-buckets
msfussell Jul 3, 2024
8c7b6b1
Merge branch 'v1.14' into filinto/custom-latency-buckets
filintod Jul 5, 2024
8ff8970
Merge branch 'v1.14' into filinto/custom-latency-buckets
hhunter-ms Jul 8, 2024
85bba8f
Update daprdocs/content/en/operations/observability/metrics/metrics-o…
filintod Jul 9, 2024
04adf9f
Update daprdocs/content/en/operations/observability/metrics/metrics-o…
filintod Jul 9, 2024
125cc16
Update daprdocs/content/en/operations/observability/metrics/metrics-o…
filintod Jul 9, 2024
3e35bc5
Update daprdocs/content/en/operations/observability/metrics/metrics-o…
filintod Jul 9, 2024
e8d856e
Update daprdocs/content/en/operations/observability/metrics/metrics-o…
filintod Jul 9, 2024
5997427
Update daprdocs/content/en/operations/observability/metrics/metrics-o…
filintod Jul 9, 2024
a63e422
Update daprdocs/content/en/operations/observability/metrics/metrics-o…
filintod Jul 9, 2024
833ef60
Update daprdocs/content/en/operations/observability/metrics/metrics-o…
filintod Jul 9, 2024
4215e02
Update daprdocs/content/en/operations/observability/metrics/metrics-o…
filintod Jul 9, 2024
a7ed609
Update daprdocs/content/en/operations/observability/metrics/metrics-o…
filintod Jul 9, 2024
800ad40
Update daprdocs/content/en/operations/observability/metrics/metrics-o…
filintod Jul 9, 2024
27d997f
Update daprdocs/content/en/operations/observability/metrics/metrics-o…
filintod Jul 9, 2024
9be1451
Merge branch 'v1.14' into filinto/custom-latency-buckets
filintod Jul 9, 2024
0a06955
Update daprdocs/content/en/operations/observability/metrics/metrics-o…
filintod Jul 9, 2024
a9b65f8
Update daprdocs/content/en/operations/observability/metrics/metrics-o…
filintod Jul 9, 2024
fdcc233
Merge branch 'v1.14' into filinto/custom-latency-buckets
filintod Jul 9, 2024
Expand Up @@ -108,6 +108,7 @@ The `metrics` section under the `Configuration` spec contains the following prop
metrics:
  enabled: true
  rules: []
  latencyDistributionBuckets: []
  http:
    increasedCardinality: true
    pathMatching:
Expand All @@ -121,17 +122,18 @@ metrics:
    excludeVerbs: false
```

In the examples above, the path filter `/orders/{orderID}/items/{itemID}` would return a single metric count matching all the `orderIDs` and all the `itemIDs`, rather than multiple metrics for each `itemID`. For more information, see [HTTP metrics path matching]({{< ref "metrics-overview.md#http-metrics-path-matching" >}}).
In the examples above, this path filter `/orders/{orderID}/items/{itemID}` would return a single metric count matching all the orderIDs and all the itemIDs, rather than multiple metrics for each itemID. For more information, see [HTTP metrics path matching]({{< ref "metrics-overview.md#http-metrics-path-matching" >}}).

The following table lists the properties for metrics:

| Property | Type | Description |
|--------------|--------|-------------|
| `enabled` | boolean | When set to true, the default, enables metrics collection and the metrics endpoint. |
| `rules` | array | Named rule to filter metrics. Each rule contains a set of `labels` to filter on and a `regex` expression to apply to the metrics path. |
| `http.increasedCardinality` | boolean | When set to `true` (default), in the Dapr HTTP server, each request path causes the creation of a new "bucket" of metrics. This can cause issues, including excessive memory consumption when there are many different requested endpoints (such as when interacting with RESTful APIs).<br> To mitigate high memory usage and egress costs associated with [high cardinality metrics]({{< ref "metrics-overview.md#high-cardinality-metrics" >}}) with the HTTP server, you should set the `metrics.http.increasedCardinality` property to `false`.|
| `http.pathMatching` | array | Paths used for path matching, allowing users to define matching paths in order to manage cardinality. |
| `http.excludeVerbs` | boolean | When set to `true` (default is `false`), the Dapr HTTP server ignores each request HTTP verb when building the method metric label. |
| Property | Type | Description |
|------------------------------|---------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `enabled` | boolean | When set to true, the default, enables metrics collection and the metrics endpoint. |
| `rules` | array | Named rule to filter metrics. Each rule contains a set of `labels` to filter on and a `regex` expression to apply to the metrics path. |
| `latencyDistributionBuckets` | array | Array of latency distribution buckets in milliseconds for latency metrics histograms. |
| `http.increasedCardinality` | boolean | When set to `true` (default), in the Dapr HTTP server each request path causes the creation of a new "bucket" of metrics. This can cause issues, including excessive memory consumption, when there are many different requested endpoints (such as when interacting with RESTful APIs).<br> To mitigate high memory usage and egress costs associated with [high cardinality metrics]({{< ref "metrics-overview.md#high-cardinality-metrics" >}}) with the HTTP server, you should set the `metrics.http.increasedCardinality` property to `false`. |
| `http.pathMatching` | array | Array of paths for path matching, allowing users to define matching paths to manage cardinality. |
| `http.excludeVerbs` | boolean | When set to true (default is false), the Dapr HTTP server ignores each request HTTP verb when building the method metric label. |

To further help manage cardinality, path matching lets you match request paths against defined patterns, reducing the number of unique metrics paths and thus controlling metric cardinality. This feature is particularly useful for applications with dynamic URLs, ensuring that metrics remain meaningful and manageable without excessive memory consumption.
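
To make the effect of path matching on cardinality concrete, here is a minimal Python sketch. It is not Dapr's implementation: the helper names `compile_template` and `normalize_path` are hypothetical, and the fallback for unmatched paths (returning the path unchanged) is a simplification, since what Dapr does with unmatched paths depends on other settings that are not modeled here.

```python
import re

def compile_template(template: str) -> re.Pattern:
    """Turn '/orders/{orderID}' into a regex where each {name} matches one path segment."""
    # Split into literal parts and {placeholder} parts, keeping the placeholders.
    parts = re.split(r"(\{[^/{}]+\})", template)
    pattern = "".join(
        "[^/]+" if part.startswith("{") else re.escape(part)
        for part in parts
    )
    return re.compile(f"^{pattern}$")

def normalize_path(path: str, templates: list[str]) -> str:
    """Return the first matching template, or the path unchanged if none match (assumption)."""
    for template in templates:
        if compile_template(template).match(path):
            return template
    return path

templates = ["/orders/{orderID}/items/{itemID}", "/orders/{orderID}"]
for path in ["/orders/42/items/7", "/orders/43/items/9", "/orders/42", "/health"]:
    print(path, "->", normalize_path(path, templates))
```

Both `/orders/42/items/7` and `/orders/43/items/9` map to the same template, so they would share one metric label value instead of producing one per ID.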

Expand Up @@ -198,7 +198,57 @@ dapr_http_server_request_count{app_id="order-service",method="",path="/orders",s

In this example, the HTTP method is excluded from the metrics, resulting in a single metric for all requests to the `/orders` endpoint.

### Configuring Custom Latency Histogram Buckets

Dapr uses cumulative histogram metrics to group latency values into buckets, where each bucket counts:
- The requests whose latency falls in that bucket
- All the requests with lower latency

### Using the default latency bucket configurations

By default, Dapr groups request latency metrics into the following buckets:

```
1, 2, 3, 4, 5, 6, 8, 10, 13, 16, 20, 25, 30, 40, 50, 65, 80, 100, 130, 160, 200, 250, 300, 400, 500, 650, 800, 1000, 2000, 5000, 10000, 20000, 50000, 100000
```

Grouping latency values in a cumulative fashion allows buckets to be used or dropped as needed for increased or decreased granularity of data.
For example, if a request takes 3ms, it's counted in the 3ms bucket, the 4ms bucket, the 5ms bucket, and so on.
Similarly, if a request takes 10ms, it's counted in the 10ms bucket, the 13ms bucket, the 16ms bucket, and so on.
After these two requests have completed, the 3ms bucket has a count of 1 and the 10ms bucket has a count of 2, since both the 3ms and 10ms requests are included here.

This shows up as follows:

|1|2|3|4|5|6|8|10|13|16|20|25|30|40|50|65|80|100|130|160| ..... | 100000 |
|-|-|-|-|-|-|-|--|--|--|--|--|--|--|--|--|--|---|---|---|-------|--------|
|0|0|1|1|1|1|1| 2| 2| 2| 2| 2| 2| 2| 2| 2| 2| 2 | 2 | 2 | ..... | 2 |
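
The cumulative counting described above can be reproduced with a short Python sketch. This is only an illustration of the arithmetic, not Dapr's code; the function name `cumulative_counts` is hypothetical, and the sketch assumes a latency equal to a bucket value is counted in that bucket.

```python
from bisect import bisect_left

DEFAULT_BUCKETS_MS = [
    1, 2, 3, 4, 5, 6, 8, 10, 13, 16, 20, 25, 30, 40, 50, 65, 80,
    100, 130, 160, 200, 250, 300, 400, 500, 650, 800, 1000,
    2000, 5000, 10000, 20000, 50000, 100000,
]

def cumulative_counts(latencies_ms, buckets_ms):
    """Count each latency in its own bucket and in every larger bucket."""
    counts = [0] * len(buckets_ms)
    for latency in latencies_ms:
        # Index of the first bucket whose value is >= the latency.
        first = bisect_left(buckets_ms, latency)
        # Latencies above the largest bucket fall outside every bucket in this sketch.
        for i in range(first, len(buckets_ms)):
            counts[i] += 1
    return counts

counts = cumulative_counts([3, 10], DEFAULT_BUCKETS_MS)
by_bucket = dict(zip(DEFAULT_BUCKETS_MS, counts))
print(by_bucket[2], by_bucket[3], by_bucket[8], by_bucket[10], by_bucket[100000])
# → 0 1 1 2 2
```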


The default number of buckets works well for most use cases, but can be adjusted as needed. With the default configuration, each latency measurement is tracked across 34 different bucket metrics, so the total can grow considerably with a large number of applications.
More accurate latency percentiles can be achieved by increasing the number of buckets. However, a higher number of buckets increases the amount of memory used to store the metrics, potentially negatively impacting your monitoring system.
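
The trade-off between bucket count and percentile accuracy can be illustrated with a small Python sketch. It assumes the monitoring backend estimates a percentile by linear interpolation inside the bucket that contains it, which is a common approach but an assumption here, not something this document specifies. The function name `estimate_percentile` and the sample counts are hypothetical.

```python
def estimate_percentile(q, buckets_ms, cumulative):
    """Estimate the q-quantile (0 < q <= 1) from cumulative bucket counts."""
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = cumulative[-1]
    if total == 0:
        raise ValueError("no observations")
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, count in zip(buckets_ms, cumulative):
        if count >= rank:
            # Interpolate linearly between the bucket's lower and upper bounds.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return float(buckets_ms[-1])

# Same 100 hypothetical requests: 10 at or under 100ms, 80 in (100, 200], 10 in (400, 500].
coarse_buckets, coarse_counts = [100, 500], [10, 100]
fine_buckets, fine_counts = [100, 200, 300, 400, 500], [10, 90, 90, 90, 100]

p50_coarse = estimate_percentile(0.5, coarse_buckets, coarse_counts)
p50_fine = estimate_percentile(0.5, fine_buckets, fine_counts)
print(f"coarse p50 ~ {p50_coarse:.1f} ms, fine p50 ~ {p50_fine:.1f} ms")
```

With only two buckets the median estimate lands far above 200ms, even though 90 of the 100 requests finished within 200ms. The finer buckets place it inside the 100-200ms range, at the cost of more stored series.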

It is recommended to keep the number of latency buckets set to the default value, unless you are seeing unwanted memory pressure in your monitoring system. Configuring the number of buckets allows you to choose, per application, whether:
- You want to see more detail, with a higher number of buckets
- Broader values are sufficient, with fewer buckets

Take note of the default latency values your applications are producing before configuring the number of buckets.

### Customizing latency buckets to your scenario

Tailor the latency buckets to your needs by modifying the `spec.metrics.latencyDistributionBuckets` field in the [Dapr configuration spec]({{< ref configuration-schema.md >}}) for your application(s).

For example, if you aren't interested in extremely low latency values (1-10ms), you can group them in a single 10ms bucket. Similarly, you can group the high values in a single bucket (1000-5000ms), while keeping more detail in the middle range of values that you are most interested in.

The following Configuration spec example replaces the default 34 buckets with 11 buckets, giving a higher level of granularity in the middle range of values:

```yaml
apiVersion: dapr.io/v1alpha1
kind: Configuration
metadata:
  name: custom-metrics
spec:
  metrics:
    enabled: true
    latencyDistributionBuckets: [10, 25, 40, 50, 70, 100, 150, 200, 500, 1000, 5000]
```
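
To see how the custom buckets from the example regroup latencies, the following Python sketch maps a few made-up sample latencies to the smallest bucket that contains each one. The helper `bucket_for` and the sample values are hypothetical and only illustrate the grouping; they are not part of Dapr.

```python
from bisect import bisect_left

CUSTOM_BUCKETS_MS = [10, 25, 40, 50, 70, 100, 150, 200, 500, 1000, 5000]

def bucket_for(latency_ms, buckets_ms):
    """Return the smallest bucket value >= latency, or None if it exceeds every bucket."""
    i = bisect_left(buckets_ms, latency_ms)
    return buckets_ms[i] if i < len(buckets_ms) else None

samples_ms = [3, 10, 45, 60, 120, 3000, 7000]
placement = {sample: bucket_for(sample, CUSTOM_BUCKETS_MS) for sample in samples_ms}
for sample, bucket in placement.items():
    print(f"{sample}ms -> {bucket}")
```

The 3ms and 10ms requests now share the single 10ms bucket, the mid-range values keep separate buckets, and a value such as 7000ms is larger than every configured bucket.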

## Transform metrics with regular expressions

Expand Up @@ -36,6 +36,9 @@ spec:
labels:
- name: <LABEL-NAME>
regex: {}
latencyDistributionBuckets:
  - <BUCKET-VALUE-MS-0>
  - <BUCKET-VALUE-MS-1>
http:
  increasedCardinality: <TRUE-OR-FALSE>
  pathMatching: