RFC - Pipeline Component Telemetry #11406
Talk about the "scope" as well.
I've added language about the instrumentation scope.
Challenge for this: Do we really need both? If every component records this, the outgoing metric of component X will be equal to the incoming metric of the next component Y.
I was also considering this. Are there situations where the `outgoing` metric of component A is not equal to the `incoming` metric of component B? I think this is the same information. Does splitting this into two different metrics help in usage?

I was considering a "one metric per graph edge" approach, as described in the examples below.

The advantage of the "one metric per edge" approach is that with sufficiently elaborate pipelines, the number of metrics is cut in half (asymptotically).

Is there information that cannot be expressed with this "one metric per edge" approach? Are there other disadvantages, e.g. usability?
Example 1: Simple pipeline with Filter processor
Metrics according to this proposal:
One metric per graph edge:
Example 2: Pipeline with connector
This proposal:
One metric per edge:
Small note that the OTLP/JSON connector's behavior is currently actually different, see open-telemetry/opentelemetry-collector-contrib#35738. I've described the metrics as they would look once that issue is resolved (the connector does not emit empty batches).
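Not a reproduction of the elided example listings above, but a hedged sketch of how the series counts compare under the two schemes, assuming a linear `receiver → filter → exporter` pipeline and purely illustrative naming:

```python
# Illustrative only: compare the number of metric series emitted for a
# linear pipeline (receiver -> filter -> exporter) under the two schemes.
components = ["receiver", "filter", "exporter"]

# "This proposal": each component records incoming and outgoing item counts.
per_component = [(c, d) for c in components for d in ("incoming", "outgoing")]

# "One metric per edge": a single series per connection between components.
per_edge = list(zip(components, components[1:]))

print(len(per_component))  # 6 series
print(len(per_edge))       # 2 series
```

For n components in a chain this is 2n series versus n-1, which is the asymptotic halving mentioned above.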
This is only true for linear pipelines. Any fanout or merge of data streams breaks this. e.g. Receiver A emits 10 items, receiver B emits 20, processor C consumes 30.
Philosophically, I think users should reasonably expect that they can observe an object in isolation to get a relatively complete picture of how it is behaving. Offloading half of the picture for the sake of reducing data volume is a premature optimization in my opinion. In any case, if we support in some way the ability to disable individual metrics, then users can just disable one or the other to opt into exactly this same optimization.
More concretely, say we measure only incoming values. Several things immediately become difficult for users. The same problems exist if you only measure outgoing. The problems are even worse if we actually pin metrics to edges by including attributes from both components.
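The fan-in case from the comment above can be sketched with its hypothetical numbers (illustrative code, not collector API):

```python
# Illustrative fan-in: receivers A and B both feed processor C. If only
# C's incoming count is recorded, the merged count matches neither source,
# and the per-source breakdown is lost.
emitted = {"receiver_a": 10, "receiver_b": 20}

processor_incoming = sum(emitted.values())

print(processor_incoming)                      # 30
print(processor_incoming in emitted.values())  # False
```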
Which function call?
For incoming, it is the call from the previous component to the described component's `ConsumeX` function. For outgoing, it is the call from the described component to the next component's `ConsumeX` function.

I'll clarify in the doc too.
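A minimal sketch of those two instrumentation points, assuming a simplified `consume` call standing in for the collector's `ConsumeX` functions (the class, counter fields, and filtering logic are all hypothetical):

```python
# Illustrative only: per-component "incoming"/"outgoing" counters placed
# around a consume() call standing in for the ConsumeX functions.
class InstrumentedComponent:
    def __init__(self, name, next_component=None):
        self.name = name
        self.next = next_component
        self.incoming = 0
        self.outgoing = 0

    def consume(self, items):
        # "incoming": the previous component is calling our consume function
        self.incoming += len(items)
        kept = [i for i in items if i % 2 == 0]  # stand-in for a filter
        if self.next is not None:
            # "outgoing": we are calling the next component's consume function
            self.outgoing += len(kept)
            self.next.consume(kept)

exporter = InstrumentedComponent("exporter")
flt = InstrumentedComponent("filter", next_component=exporter)
flt.consume([1, 2, 3, 4])
print(flt.incoming, flt.outgoing, exporter.incoming)  # 4 2 2
```

In this linear case `flt.outgoing` equals `exporter.incoming`, which is exactly the equivalence debated earlier in the thread.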
I've added clarification in the doc.
I would call them "consumer", since the consumer is about moving data; components only start/stop.
`otelcol_component_consumed_items` and `otelcol_component_produced_items`?
I like these names
There is an advantage to having `otel.component.kind` in the name of the metric vs. as a label:

I think it may not be bad to have per-component-kind metrics. I would not necessarily go to the per-component-kind-and-type level, but that can be another option.
I am on the fence about this. I agree the case is weak but aggregation along this dimension might at least give a sense of scale.
What I mean is that if you consider each component as doing some non-trivial amount of work for an amount of data, you can aggregate to understand, relatively, how much work the collector is doing internally. e.g. say you are considering some changes to a configuration and compare v(n) to v(n+1). If you aggregate across all dimensions of "incoming" you may see that even with the same inputs the collector has 10x as many items passed between components.
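As a hypothetical illustration of that comparison (the numbers are invented, not from the RFC):

```python
# Illustrative: aggregate "incoming" items across all components to compare
# internal work between two configurations fed the same input.
v1_incoming = {"receiver": 100, "processor": 100, "exporter": 100}
v2_incoming = {"receiver": 100, "connector": 1000,
               "processor": 1000, "exporter": 1000}

total_v1 = sum(v1_incoming.values())  # 300
total_v2 = sum(v2_incoming.values())  # 3100

print(round(total_v2 / total_v1))  # roughly 10x the internal item traffic
```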
Either way, I'm not necessarily against having metrics per kind. Some consequences though: the metrics would still need the `otel.signal` attribute, even though this is redundant with the metric name.

I'm not sure I understand. Are you pointing out that users would be forced to enable/disable the metric for all kinds, but maybe they should be able to control it for each kind independently? e.g. only record items produced by receivers and consumed by exporters?
This triggers another question for me:
I am not up to speed on the latest design, but wouldn't it be intuitive that `component.TelemetrySettings` is able to answer the question of whether or not a metric is enabled?
This makes me think that the components will need to be able to return views (we've talked about this a few times).
The alternative here would be to only support enabling/disabling of metrics via view configuration.
Yes, that would be great. I think it would be Factories though, since we need this before pipelines are created.
Would it be useful to capture details about logs that automatically report the status (i.e. started) of components in this RFC? I'm wondering if we should ensure that those logs include the attributes listed earlier in this document.
I'm not sure the mechanism I've described is able to do this. Also, since component status is still unstable, I'd prefer not to take that as a dependency here. My goal with this RFC is not to describe all standard telemetry for pipeline components, but to describe what can be done with the proposed mechanism.
Agreed that this can be out of scope for now