Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changes to the profiles protocol based on implementation experience. #587

Closed
wants to merge 5 commits into from

Conversation

jhalliday
Copy link
Contributor

Context:

The Profiling SIG and TC representatives have agreed that the protocol will no longer aim for wire compatibility with pprof, but rather will be accompanied by a canonical method of mapping data from pprof to OpenTelemetry formats. This provides us greater flexibility in adapting pprof to OpenTelemetry use cases.

Early implementation experience has exposed areas in which the protocol could benefit from breaking changes. The SIG has agreed that some of these are worthwhile and better made sooner rather than later, such that the minimum number of existing implementations are impacted.

See slack #otel-profiles and the profiling SIG meeting minutes google doc for further background on the change motivation.

In light of these points, this change:

  • Alters the message definitions without using a version field or package change, e.g. not 'v1experimental' -> 'v2experiemental' or 'v1experimental2'. This is a process design choice. It will cause client/server interop problems during the transition. For the most part OpenTelemetry protos don't change once fully defined, so there exists little precedent here and the early-stage evolution process is open to debate. Early implementers on whom this will have an impact are invited to comment, as are those with an eye on how we may want to handle breaking changes further down the line when the blast zone is larger.

  • Combines profiles.proto and pprofextended.proto to a single file, merging ProfileContainer and Profile into a single new entity, Profile. The fields now in Profile are the union of those in the two messages, with some largely cosmetic reordering to provide improved clarity on the relationship between pprof and OpenTelemetry. A design choice is made to reserve a small number of field ids to provide for potential future alignment with any additions that pprof may make in later versions.

  • Fields which are array indexes into lookup tables are modified to consistently use int32 data type, so as to better support languages which don't feature 64bit array sizes. pprof was targeting golang only and was used as a storage file format, whereas OpenTelemetry has polyglot requirements and is intended as a wire message in which very large arrays are not expected. This change prioritizes implementation ease over ease of transforming pprof data to OpenTelemetry format, which as a result may require splitting any very large pprof file to multiple Otel messages.

  • Fields which are array indexes into lookup tables are renamed to consistently use a suffix indicating their purpose, which reduces confusion in the resulting code-generated APIs, where e.g. int fields represent String data types. Design choice: naming is hard! '_strindex' is used for indexes into the string table, whist indexes into all other lookup tables use '_index' and rely on the prefix for context indicating which table is referenced e.g. 'function_index' is clearly a reference into the function table without requiring the more verbose 'function_functionindex'.

There remain other breaking changes under consideration. These may follow separately. This PR limits the scope to just these few to keep the discussion focussed.

… experience.

- Combine profiles.proto and pprofextended.proto to a single file.
- Change array index data types and field names for improved code generation.
@jhalliday jhalliday requested review from a team September 11, 2024 14:50
@jhalliday
Copy link
Contributor Author

Thanks for the review, Christos. I've pushed changes for the things that are in the scope of this PR and kept a note to myself for the others. There will be at least one more separate PR before the dust settles I think, but I'm still working on the 0-index headache.

… experience.

Futher improvment to field name consistency.

// In the event that future pprof versions add additional fields, then
// future OpenTelementry versions MAY use these for corresponding purposes.
reserved 16,17,18,19;
Copy link

@aalexand aalexand Sep 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure this is needed. Trying to keep the same physical IDs between the pprof format and the OTel profiling format is going to be hard and the conversion process will just have to remap things as needed. Even with this change, the reserved IDs are only added in the Profile message, but there are many other below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it may well be futile, but OTOH it's essentially zero cost to try. Whilst there are several other messages where we could do this, only a couple are ones where we already add additional fields without leaving a gap. The biggest mess is Sample, which has gaps for no reason and without using 'reserved' for them.
There is going to be at least one more PR for changes soon, I'm inclined to leave this as is for now and change it in that later one if needed, after a policy discussion with the SIG to get some global consistency on our approach. Unless you feel really strongly about it I'd prefer to get this PR merged without waiting on that, as the loss of change history is going to make re-syncing other changes a pain if it goes on too long.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can discuss this in a later PR, yes. My understanding was that we are going to remove fields that act as "the old way of doing things" (such as labels) and so at the physical representation level the OTel proto simply completely diverges from the pprof's proto. Plus we'd re-number and re-order the OTel's proto fields and so the numbers even for fields that have the same semantical meaning as in pprof would diverge. With that in mind having field number gaps to allow pprof to add its new fields without breaking the representation compatibility seems like not achievable but maybe I'm missing something.

… experience.

Modifications addressing review comments.
// A globally unique identifier for a profile. The ID is a 16-byte array. An ID with
// all zeroes is considered invalid.
//
// This field is required.
bytes profile_id = 1;
bytes profile_id = 20;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have mixed feelings about this field being required. I suspect this is modeled after other OTel signals, but I'm thinking of something like, say, a particular language runtime decides to generate its profiles in the OTel format. Say, Python. Requiring a low-level language runtime to take a dependency on a UUID generator seems excessive and unnecessary. Maybe it's a lesser deal than I imagine.

BTW this also means that pprof profiles cannot be converted into OTel profiles with just the information that is in the pprof profile - this ID will have to be "invented" at the conversion time. It's probably OK, but this is the first field of this kind, so I wanted to call this out.

// Lookup table for attributes.
repeated opentelemetry.proto.common.v1.KeyValue attribute_table = 22;
// Represents a mapping between Attribute Keys and Units.
repeated AttributeUnit attribute_units = 23;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was probably discussed in the past, but I wonder now why this is a list and not map<int32, int32> to express better the uniqueness of the keys.

@aalexand
Copy link

+1 to submit this PR soon since it involves moving files which causes conflicts with other changes.

@tigrannajaryan
Copy link
Member

@open-telemetry/profiling-maintainers please use spec:profiles for profiling issues/PR. I filter on the label to see if there is anything pending merging/etc.

@trask
Copy link
Member

trask commented Sep 26, 2024

@open-telemetry/profiling-maintainers please use spec:profiles for profiling issues/PR. I filter on the label to see if there is anything pending merging/etc.

we probably need to give them triage permission on this repo

@tigrannajaryan
Copy link
Member

@open-telemetry/profiling-maintainers please use spec:profiles for profiling issues/PR. I filter on the label to see if there is anything pending merging/etc.

we probably need to give them triage permission on this repo

@trask profiling-maintainers have "Write" permission to this repo. Isn't that a superset of permissions that "Triage" have? If it is not, I agree let's add triage permission.

@trask
Copy link
Member

trask commented Sep 26, 2024

@trask profiling-maintainers have "Write" permission to this repo. Isn't that a superset of permissions that "Triage" have? If it is not, I agree let's add triage permission.

oh yes, I missed that, ignore my comment 👍

@tigrannajaryan
Copy link
Member

Hmm, @florianl seems to be unable to add labels. Florian, is the labels gear button disabled for you?
image

@trask
Copy link
Member

trask commented Sep 26, 2024

I don't see @florianl in @open-telemetry/profiling-maintainers

@tigrannajaryan
Copy link
Member

I don't see @florianl in @open-telemetry/profiling-maintainers

Of course, sorry for the confusion.

tigrannajaryan pushed a commit that referenced this pull request Oct 15, 2024
… ProfileContainer. (#590)

This change removes the now unnecessary split of the profiling format between two files, combining them into one as with other protocols. In the process, the likewise unnecessary ProfileContainer message type is removed and its fields merged into Profile, saving one level of indirection.

Note this as discussed on slack otel-profiles, this is effectively a minimal subset of the now defunct #587, remaining parts of which will be resubmitted in subsequent PRs.  This PR effectively blocks any concurrent changes to pprofextended.proto e.g. #588 as git won't track such changes across the file 'merge'.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants