-
Notifications
You must be signed in to change notification settings - Fork 888
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set attribute on exemplar spans #3723
Set attribute on exemplar spans #3723
Conversation
specification/metrics/sdk.md
Outdated
If a span is to be used as an exemplar, the ExemplarReservoir MUST set the | ||
span attribute `"otel.exemplar"` to the value `true`, otherwise it MUST | ||
not change the attribute. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will only work if we have a static set of exemplars. If an exemplar is sampled and later it is dropped from the reservoir due to other exemplars being sampled (simple-fixed-size-reservoir) there isn't a way to return the span to its original state currently because there is no API to drop an attribute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The attribute could be changed to indicate that the span was considered/initially-stored in an ExemplarReservoir, and the can be used by tail-sampler as an additional input when making sampling decision by prioritizing spans with this attribute. Not perfect, but still an improvement.
specification/metrics/sdk.md
Outdated
@@ -1028,7 +1032,7 @@ algorithm](https://en.wikipedia.org/wiki/Reservoir_sampling) can be used: | |||
|
|||
``` | |||
bucket = random_integer(0, num_measurements_seen) | |||
if bucket < num_buckets then | |||
if bucket < num_buckets && reservoir[bucket] == null then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will drastically change the default sampling algorithm. It would no longer be a "uniformly-weighted sampling algorithm based on the number of samples the reservoir has seen so far to determine if the offered measurements should be sampled."
I don't think this is an appropriate change for all use cases. It seems like a good motivation for a custom reservoir to be supported instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good point - I hadn't realised that. Being able to provide my own (or switch to a provided "tail-sampling friendly") reservoir would be great.
Alternatively, we could keep the algorithm as-is but still add the otel.exemplar=true
attribute to all spans that made it into a bucket, even if they are later discarded. The number of spans with the attribute that didn't actually become exemplars would be small (the chance of the nth
span making it into a bucket is 1/n
, so for N
spans the total that make it into a bucket is log(N)
- please correct my maths if that's wrong!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Being able to provide my own (or switch to a provided "tail-sampling friendly") reservoir would be great.
That is already allowed in the spec!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is already allowed in the spec!
I'll go give it a try!
I've updated the PR to undo the changes to the fixed-size reservoir. I think this approach would be better long-term, even if it means keeping more spans than is needed. We could add some disclaimer explaining the trade-off between this and the histogram reservoir in terms of tail-sampling?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed the new rule in the spec to align with this approach. A little more maths suggests the ratio between the number of exemplars (b
) and the number of spans that get the attribute (b ln(n)
, where n
is the number of spans) is 1/ln(n)
. So for b=2
(for some reason, the default size in java is the number of processor cores 🤯) :
spans | annotated spans | annotated spans that are exemplars |
---|---|---|
100 | 9.2 | 22% |
1000 | 13.8 | 14% |
10000 | 18.4 | 11% |
I'd personally be fine keeping that many spans unnecessarily. If it became a problem I'd switch to the histogram-based reservoir. I've updated the java PR with this approach.
specification/metrics/sdk.md
Outdated
@@ -1028,7 +1032,7 @@ algorithm](https://en.wikipedia.org/wiki/Reservoir_sampling) can be used: | |||
|
|||
``` | |||
bucket = random_integer(0, num_measurements_seen) | |||
if bucket < num_buckets then | |||
if bucket < num_buckets && reservoir[bucket] == null then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably contradicting the "uniformely-weigted sampling algorithm based on the number of samples seen so far", as once the buckets are filled, all subsequent measurements will be never be stored into reservoir.!
(nit perf, probably an implementation optizmiation: the cpu cycles spent on generating random is not required as well once the buckets are filled)...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just saw this is already called out in #3723 (comment)
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm opposed to this specific change, but I think it calls out the need for allowing custom exemplar reservoir implementations.
@@ -1044,14 +1052,15 @@ SHOULD be used. | |||
#### AlignedHistogramBucketExemplarReservoir | |||
|
|||
This Exemplar reservoir MUST take a configuration parameter that is the | |||
configuration of a Histogram. This implementation MUST keep the last seen | |||
configuration of a Histogram. This implementation MUST keep the first seen |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This reservoir is designed to match Prometheus behavior of today.
Rather than changing this specification, I'd prefer instead to expose a mechanism for users to specify their own custom Exemplar Reservoirs. This is the right way going forward to allow users to better sample the exemplars for their own use case or do more intelligent per-application sampling (similarly to how we expose Samplers for tracing).
@@ -993,6 +993,13 @@ span context and baggage can be inspected at this point. | |||
The "offer" method does not need to store all measurements it is given and | |||
MAY further sample beyond the `ExemplarFilter`. | |||
|
|||
If a span might be used as an exemplar, the ExemplarReservoir MUST set the | |||
span attribute `otel.exemplar` to the value `true`, otherwise it MUST |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This only makes sense if you put a restriction to guarantee that an exemplar is preserved once added to a reservoir (which is not true at all today).
This could have poor interaction w/ cumulative streams.
I'd rather see this function exposed via a custom Exemplar Reservoir hook. Imagine this:
- As Instrument Advice or as View configuration, you can provide an ExemplarReservoir factory for your metric.
- You should be able to provide an ExemplarReservoir implementation that guarantees sampling once something hits the reservoir and would be able to add this flag.
- All of that should be able to be done without official specification.
The only thing you're missing is the ability to provide custom reservoirs, which is something we should absoutely be adding to the exemplar specification.
Closed as inactive. Feel free to reopen if this PR is still being worked on. |
Proof of Concept for a way to support metric exemplars and tail-based sampling at the same time. Hopefully this will spur discussion on how we want to support this use-case.
Motivation
(from #2922)
Changes
HistogramExemplarReservoir
andRandomFixedSizeExemplarReservoir
now collect thefirst
instead of thelast
exemplar sample they see. This lets us know whether or not a span will be used as an exemplar while it is still current.otel.exemplar
as a potential new attribute that, when set totrue
means the sample is being used as an exemplar.