-
Notifications
You must be signed in to change notification settings - Fork 162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat!: Update GraphQL span name convention #1389
base: main
Are you sure you want to change the base?
feat!: Update GraphQL span name convention #1389
Conversation
Due to concerns about high cardinality with the current span name format use only the <graphql.operation.type> attribute for the span name. The higher cardinality attribute, <graphql.operation.name> can still be used in queries, as it is an attribute on the span. The original fallback, "GraphQL Operation" is preserved.
docs/graphql/graphql-spans.md
Outdated
`graphql.operation.type` and `graphql.operation.name` are available. If `graphql.operation.name` is not available, the | ||
span SHOULD be named `<graphql.operation.type>`. When `<graphql.operation.type>` is not available, `GraphQL Operation` | ||
MAY be used as span name. | ||
The **span name** MUST be of the format `<graphql.operation.type>` provided that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems that the main reason for this change is avoiding high cardinality.
What's the problem with operation name cardinality? Is it theoretically high or high in practice?
E.g. for HTTP server spans, you could have 100s of http.route
values, but we use it in the span name given that usually cardinality is much lower and that route is such a useful grouping key.
Or for databases we say the following:
semantic-conventions/docs/database/database-spans.md
Lines 59 to 63 in 1507791
The **span name** SHOULD be `{db.operation.name} {target}` if there is a | |
(low-cardinality) `{db.operation.name}` available (see below for the exact definition of the [`{target}`](#target-placeholder) placeholder). | |
If there is no (low-cardinality) `db.operation.name` available, database span names | |
SHOULD be [`{target}`](#target-placeholder). |
would it be reasonable to follow the same pattern here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand it right, the problem is not so much of cardinality but because it could be unbounded, given it's user specified? https://graphql.org/learn/queries/#operation-name and I'm not sure how an instrumentation can know if a operation name is "low cardinality".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems query name is something user optionally provides
Specifically the second answer https://stackoverflow.com/a/52542928/3239947
We use named queries so that they can be monitored consistently, and so that we can do persistent storage of a query. The duplication is there for query variables to fill the gaps.
The advantage is that the same query each time, not a different string because the query variables are the bit that differs. This means you can build tools on top of those queries because you can treat them as immutable.
If that's accurate, then it's intended for exactly this purpose and intended to have lowish/user-controlled cardinality. Seems like a perfect thing for the span name - for databases we actually would want if user provided us something like this so we could provide better diagnostics.
It doesn't even make sense to make opt-in, because users already provided name as such.
We should keep it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If that's accurate, then it's intended for exactly this purpose and intended to have lowish/user-controlled cardinality.
Unfortunately the intended use and reality are not aligned. If an API client decides that it wants to generate an operation name based off of an incrementing id, there isn't much the server can do as it is technically a valid label.
Recording that operation name as an attribute on the span can be quite useful. Setting your span name to something with unbounded cardinality is not.
There seems to be an incorrect assumption here the systems being instrumented here control the client callers.
This is not a hypothetical concern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If an API client decides that it wants to generate an operation name based off of an incrementing id,
Is it a client library or a an application that's using this client library?
If application user decides to use dynamic operation names, they would experience consequences of it in their telemetry systems. Primarily grouping by span name will be useless. It's unlikely that anything will break.
If they configured span->metrics, these metrics will reach cardinality limit and will be useless.
If my assumptions are correct, then removing operation name would be bad for everyone who uses operation name properly in order to protect those who use it incorrectly - this is a weird design choice to make.
Please help me understand where (if) my assumptions are wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it a client library or a an application that's using this client library?
The instrumentation in question here is for server side spans generated from a graphql query/mutation/subscription, it receives client requests.
If application user decides to use dynamic operation names, they would experience consequences of it in their telemetry systems.
The server and client can be different entities. Imagine a public graphql api, anyone can issue queries, and you have no guarantee that operation name will conform to any pattern or bounded cardinality.
Primarily grouping by span name will be useless.
I completely agree, the current specification is directing instrumentation authors to make the graphql execute span useless for any grouping functions.
If they configured span->metrics, these metrics will reach cardinality limit and will be useless
Agree here as well, it means that this span is no longer a viable option metric generation.
In general it needlessly complicates any aggregate analysis of this span.
If my assumptions are correct, then removing operation name would be bad for everyone who uses operation name properly in order to protect those who use it incorrectly - this is a weird design choice to make.
- Operation name can be of unbounded cardinality (unless you control all of your client requests)
- Operation name is recorded as an attribute
- Operation name is also denormalized into the span name in the form
"#{operation_type} #{operation_name}"
By injecting the operation name into the span name it means for any graphql api that doesn't control all of the client consumers it is vulnerable to the issues that come with unbounded cardinality on a span name.
By not injecting the operation name into the span name, you are no longer vulnerable to those issue, and you can still optionally do any grouping or analysis based off of the span name and graphql.operation.name
attribute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can reach a compromise here, that by default, operation name is not added, but maybe can be configured/added in a optional manner? This then solves the problem for both categories of grapql servers - those who control the clients and those who don't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The instrumentation in question here is for server side spans generated from a graphql query/mutation/subscription, it receives client requests.
that's the part that I missed, I assumed these semantic conventions are for client apps. Thanks for pointing this out!
I agree that operation name is problematic then, it would also be problematic on metrics (if metrics are introduced).
The suggestion @joaopgrassi made
Maybe we can reach a compromise here, that by default, operation name is not added, but maybe can be configured/added in a optional manner?
would be useful for metrics as well.
I.e. instrumentation MAY provide configuration option to add operation name to span name (and populate it on the future hypothetical metrics). Otherwise, if option is not provided or not enabled, the instrumentation would only use operation-type in the span name.
Fixes #1361
Changes
Due to concerns about high cardinality with the current span name format, use only the
graphql.operation.type
attribute for the span name. The higher cardinality attribute,graphql.operation.name
can still be used in queries, as it is an attribute on the span. The original fallback span name,GraphQL Operation
, is preserved.Merge requirement checklist
[chore]