feat!: Update GraphQL span name convention #1389

kaylareopelle · 2024-09-04T18:19:34Z

Changes

Due to concerns about high cardinality with the current span name format, use only the graphql.operation.type attribute for the span name. The higher cardinality attribute, graphql.operation.name can still be used in queries, as it is an attribute on the span. The original fallback span name, GraphQL Operation, is preserved.

Merge requirement checklist

CONTRIBUTING.md guidelines followed.
Change log entry added, according to the guidelines in When to add a changelog entry.
- If your PR does not need a change log, start the PR title with [chore]
schema-next.yaml updated with changes to existing conventions. (I don't think this needs to be updated because it's a span name and not an attribute)

Due to concerns about high cardinality with the current span name format use only the <graphql.operation.type> attribute for the span name. The higher cardinality attribute, <graphql.operation.name> can still be used in queries, as it is an attribute on the span. The original fallback, "GraphQL Operation" is preserved.

.chloggen/update-graphql-span-name.yaml

docs/graphql/graphql-spans.md

.chloggen/update-graphql-span-name.yaml

docs/graphql/graphql-spans.md

lmolkova · 2024-09-04T19:27:29Z

docs/graphql/graphql-spans.md

-`graphql.operation.type` and `graphql.operation.name` are available. If `graphql.operation.name` is not available, the
-span SHOULD be named `<graphql.operation.type>`. When `<graphql.operation.type>` is not available, `GraphQL Operation`
-MAY be used as span name.
+The **span name** MUST be of the format `<graphql.operation.type>` provided that


it seems that the main reason for this change is avoiding high cardinality.

What's the problem with operation name cardinality? Is it theoretically high or high in practice?

E.g. for HTTP server spans, you could have 100s of http.route values, but we use it in the span name given that usually cardinality is much lower and that route is such a useful grouping key.

Or for databases we say the following:

semantic-conventions/docs/database/database-spans.md

Lines 59 to 63 in 1507791

The **span name** SHOULD be `{db.operation.name} {target}` if there is a

(low-cardinality) `{db.operation.name}` available (see below for the exact definition of the [`{target}`](#target-placeholder) placeholder).

If there is no (low-cardinality) `db.operation.name` available, database span names

SHOULD be [`{target}`](#target-placeholder).

would it be reasonable to follow the same pattern here?

If I understand it right, the problem is not so much of cardinality but because it could be unbounded, given it's user specified? https://graphql.org/learn/queries/#operation-name and I'm not sure how an instrumentation can know if a operation name is "low cardinality".

it seems query name is something user optionally provides

https://stackoverflow.com/questions/34011964/what-is-the-point-of-naming-queries-and-mutations-in-graphql

Specifically the second answer https://stackoverflow.com/a/52542928/3239947

We use named queries so that they can be monitored consistently, and so that we can do persistent storage of a query. The duplication is there for query variables to fill the gaps.

The advantage is that the same query each time, not a different string because the query variables are the bit that differs. This means you can build tools on top of those queries because you can treat them as immutable.

If that's accurate, then it's intended for exactly this purpose and intended to have lowish/user-controlled cardinality. Seems like a perfect thing for the span name - for databases we actually would want if user provided us something like this so we could provide better diagnostics.

It doesn't even make sense to make opt-in, because users already provided name as such.

We should keep it.

If that's accurate, then it's intended for exactly this purpose and intended to have lowish/user-controlled cardinality.

Unfortunately the intended use and reality are not aligned. If an API client decides that it wants to generate an operation name based off of an incrementing id, there isn't much the server can do as it is technically a valid label.

Recording that operation name as an attribute on the span can be quite useful. Setting your span name to something with unbounded cardinality is not.

There seems to be an incorrect assumption here the systems being instrumented here control the client callers.

This is not a hypothetical concern.

If an API client decides that it wants to generate an operation name based off of an incrementing id,

Is it a client library or a an application that's using this client library?

If application user decides to use dynamic operation names, they would experience consequences of it in their telemetry systems. Primarily grouping by span name will be useless. It's unlikely that anything will break.

If they configured span->metrics, these metrics will reach cardinality limit and will be useless.

If my assumptions are correct, then removing operation name would be bad for everyone who uses operation name properly in order to protect those who use it incorrectly - this is a weird design choice to make.

Please help me understand where (if) my assumptions are wrong.

Is it a client library or a an application that's using this client library?

The instrumentation in question here is for server side spans generated from a graphql query/mutation/subscription, it receives client requests.

If application user decides to use dynamic operation names, they would experience consequences of it in their telemetry systems.

The server and client can be different entities. Imagine a public graphql api, anyone can issue queries, and you have no guarantee that operation name will conform to any pattern or bounded cardinality.

Primarily grouping by span name will be useless.

I completely agree, the current specification is directing instrumentation authors to make the graphql execute span useless for any grouping functions.

If they configured span->metrics, these metrics will reach cardinality limit and will be useless

Agree here as well, it means that this span is no longer a viable option metric generation.

In general it needlessly complicates any aggregate analysis of this span.

If my assumptions are correct, then removing operation name would be bad for everyone who uses operation name properly in order to protect those who use it incorrectly - this is a weird design choice to make.

Operation name can be of unbounded cardinality (unless you control all of your client requests)

Operation name is recorded as an attribute

Operation name is also denormalized into the span name in the form "#{operation_type} #{operation_name}"

By injecting the operation name into the span name it means for any graphql api that doesn't control all of the client consumers it is vulnerable to the issues that come with unbounded cardinality on a span name.

By not injecting the operation name into the span name, you are no longer vulnerable to those issue, and you can still optionally do any grouping or analysis based off of the span name and graphql.operation.name attribute.

Maybe we can reach a compromise here, that by default, operation name is not added, but maybe can be configured/added in a optional manner? This then solves the problem for both categories of grapql servers - those who control the clients and those who don't.

The instrumentation in question here is for server side spans generated from a graphql query/mutation/subscription, it receives client requests.

that's the part that I missed, I assumed these semantic conventions are for client apps. Thanks for pointing this out!

I agree that operation name is problematic then, it would also be problematic on metrics (if metrics are introduced).

The suggestion @joaopgrassi made

Maybe we can reach a compromise here, that by default, operation name is not added, but maybe can be configured/added in a optional manner?

would be useful for metrics as well.

I.e. instrumentation MAY provide configuration option to add operation name to span name (and populate it on the future hypothetical metrics). Otherwise, if option is not provided or not enabled, the instrumentation would only use operation-type in the span name.

kaylareopelle requested review from a team September 4, 2024 18:19

github-actions bot assigned reyang Sep 4, 2024

kaylareopelle commented Sep 4, 2024

View reviewed changes

.chloggen/update-graphql-span-name.yaml Show resolved Hide resolved

docs/graphql/graphql-spans.md Outdated Show resolved Hide resolved

This was referenced Sep 4, 2024

Concerns about high cardinality in GraphQL span names #1361

Open

fix: update GraphQL span name to follow semantic conventions open-telemetry/opentelemetry-ruby-contrib#966

Open

lmolkova reviewed Sep 4, 2024

View reviewed changes

feat: Update GraphQL span name spec to "SHOULD"

3806796

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat!: Update GraphQL span name convention #1389

feat!: Update GraphQL span name convention #1389

kaylareopelle commented Sep 4, 2024

lmolkova Sep 4, 2024 •

edited

Loading

joaopgrassi Sep 6, 2024

lmolkova Sep 6, 2024 •

edited

Loading

robertlaurin Sep 9, 2024 •

edited

Loading

lmolkova Sep 9, 2024 •

edited

Loading

robertlaurin Sep 10, 2024

joaopgrassi Sep 10, 2024

lmolkova Sep 10, 2024 •

edited

Loading

	The span name SHOULD be `{db.operation.name} {target}` if there is a
	(low-cardinality) `{db.operation.name}` available (see below for the exact definition of the [`{target}`](#target-placeholder) placeholder).

	If there is no (low-cardinality) `db.operation.name` available, database span names
	SHOULD be [`{target}`](#target-placeholder).

feat!: Update GraphQL span name convention #1389

Are you sure you want to change the base?

feat!: Update GraphQL span name convention #1389

Conversation

kaylareopelle commented Sep 4, 2024

Changes

Merge requirement checklist

lmolkova Sep 4, 2024 • edited Loading

Choose a reason for hiding this comment

joaopgrassi Sep 6, 2024

Choose a reason for hiding this comment

lmolkova Sep 6, 2024 • edited Loading

Choose a reason for hiding this comment

robertlaurin Sep 9, 2024 • edited Loading

Choose a reason for hiding this comment

lmolkova Sep 9, 2024 • edited Loading

Choose a reason for hiding this comment

robertlaurin Sep 10, 2024

Choose a reason for hiding this comment

joaopgrassi Sep 10, 2024

Choose a reason for hiding this comment

lmolkova Sep 10, 2024 • edited Loading

Choose a reason for hiding this comment

lmolkova Sep 4, 2024 •

edited

Loading

lmolkova Sep 6, 2024 •

edited

Loading

robertlaurin Sep 9, 2024 •

edited

Loading

lmolkova Sep 9, 2024 •

edited

Loading

lmolkova Sep 10, 2024 •

edited

Loading