As reported several times on the forum (1, 2, 3), the percentile values users can visualize in Datadog are often wrong when compared to the end-of-test summary shown by k6.
As explained here, this is likely caused by the additional aggregation done by the DogStatsD agent, which by default generates only the 95percentile metric; that pre-aggregated metric is then aggregated again by the query used in the Datadog graph.
Besides producing wrong values, this leaves users unable to use the raw metric data to generate any percentile value they need.
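The mismatch is easy to reproduce in isolation: a percentile of per-interval percentiles is not the percentile of the raw data. A minimal, self-contained sketch (the sample values are made up, and the nearest-rank percentile here only stands in for whatever estimators DogStatsD and k6 actually use):

```go
package main

import (
	"fmt"
	"sort"
)

// p95 returns the 95th percentile of a sample set (nearest-rank method).
func p95(samples []float64) float64 {
	s := append([]float64(nil), samples...)
	sort.Float64s(s)
	idx := int(0.95 * float64(len(s))) // nearest-rank, 0-based
	if idx >= len(s) {
		idx = len(s) - 1
	}
	return s[idx]
}

func main() {
	// Two flush intervals with very different latency profiles
	// (hypothetical response times in ms).
	interval1 := []float64{10, 11, 12, 13, 14, 15, 16, 17, 18, 500}
	interval2 := []float64{10, 10, 10, 10, 10, 10, 10, 10, 10, 10}

	// What the graph query effectively does: average the per-flush-interval
	// 95percentile values produced by the agent.
	averaged := (p95(interval1) + p95(interval2)) / 2

	// What k6's end-of-test summary does: one percentile over all raw samples.
	all := append(append([]float64(nil), interval1...), interval2...)

	fmt.Printf("avg of per-interval p95s: %.0f ms\n", averaged) // 255 ms
	fmt.Printf("true p95 over raw data:   %.0f ms\n", p95(all)) // 500 ms
}
```

With these samples the re-aggregated value comes out to 255 ms while the true p95 over all raw samples is 500 ms, which is exactly the kind of divergence users see between the Datadog graph and the k6 summary.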
k6 version
0.41.0
OS
any
Steps to reproduce the problem
Follow the steps to set up the Datadog Agent, and run a test as explained here.
Go to the Datadog web UI and visualize the k6.http_req_duration.max and k6.http_req_duration.95percentile metrics, and compare them to the end-of-test summary shown by k6. Notice that they don't match.
Expected behavior
The percentiles shown in Datadog should match the end-of-test summary shown by k6.
The user should be able to generate any percentile over raw metric data sent to Datadog. Ideally, the Datadog Agent (DogStatsD) shouldn't do any aggregation at all.
Actual behavior
The percentiles are different, and the Datadog 95percentile metric is confusing.
The user can only work with a limited number of metrics, and most pre-aggregated ones show wrong values.
Suggested solution
After going through the Datadog documentation, it seems possible to send raw data that won't be aggregated by the Datadog Agent by using the distribution metric type (1, 2). This would not only avoid the double aggregation, but also allow users to generate any percentile value they need over the raw data.
A possible drawback is overloading the ingest pipeline (of either the Datadog Agent or Datadog itself, e.g. by hitting API limits), so this needs to be thoroughly tested.
In addition to evaluating whether this change works for some of our metric types, we should also ensure that we don't break support for any other StatsD backends.
Currently we send Count, Gauge, and other metric types that will be aggregated, but the datadog-go/statsd client we use also supports the Distribution metric.
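On the wire, a distribution sample is just a DogStatsD datagram with the |d type, which the agent forwards without local aggregation (unlike the |ms/|h types it summarizes per flush interval). A minimal stdlib sketch of that format, with a hypothetical metric name and tag; an actual implementation would call Distribution on the datadog-go/statsd client rather than hand-rolling datagrams:

```go
package main

import (
	"fmt"
	"net"
)

// distributionDatagram formats a DogStatsD distribution datagram.
// The "|d" type is aggregated server-side by Datadog, not by the
// local agent, so every raw value stays available for percentile queries.
func distributionDatagram(name string, value float64, tags string) string {
	if tags == "" {
		return fmt.Sprintf("%s:%g|d", name, value)
	}
	return fmt.Sprintf("%s:%g|d|#%s", name, value, tags)
}

func main() {
	payload := distributionDatagram("k6.http_req_duration", 123.45, "scenario:default")
	fmt.Println(payload) // k6.http_req_duration:123.45|d|#scenario:default

	// Fire-and-forget over UDP to a local agent; a UDP send does not
	// fail even if nothing is listening on the port.
	if conn, err := net.Dial("udp", "127.0.0.1:8125"); err == nil {
		defer conn.Close()
		conn.Write([]byte(payload))
	}
}
```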
The issue was ported from grafana/k6#2819.