Unicode logs/attributes support #2636

tobervenec · 2024-04-15T14:19:39Z

Hi!
In most cases simple string_view or const char * is sufficient for needs of our project. But I wonder if there are some plans to extend logs/attributes with unicode support (Neither Logger functions nor AttributeValue seems to support wchar_t)?

Is your feature request related to a problem?
We are using logs/metrics (+traces in the future) quite extensively and recently a question was risen whether we can have at least unicode attributes (for example somebody wants to extend his log message with an attribute that represents some company name, which cannot be represented by ascii characters (chinese for example)).

Describe the solution you'd like
Would be nice to have a possibility to log messages and/or add attributes that can represent wider range of characters

Describe alternatives you've considered
One of alternatives I was thinking about is to have a possibility to send an array of bytes and corresponding encoding so that consumer knows how to interpret those bytes.

Additional context
Add any other context about the feature request here.

github-actions · 2024-06-19T01:49:47Z

This issue was marked as stale due to lack of activity.

tobervenec · 2024-08-07T06:23:44Z

Hi!
Is anything planned for this one?

marcalff · 2024-08-07T06:46:02Z

This request is valid and makes sense.
Planning is constrained by resources.
This sounds like a good task to get into the opentelemetry-cpp code, adding help needed label.

github-actions · 2024-08-07T06:46:23Z

This issue is available for anyone to work on. Make sure to reference this issue in your pull request.
✨ Thank you for your contribution! ✨

lalitb · 2024-08-07T07:19:26Z

I believe const char * and nostd::string_view are sufficient for handling UTF-8 encoded strings, which can represent any Unicode character of sequences of 1 to 4 bytes. The use of wchar_t can be problematic due to its differing size and encoding between platforms. Or am I missing something?

marcalff · 2024-08-07T07:40:22Z

A const char* string can represent unicode.

What needs to be clarified is whether the information "this is an ascii string" or "this is encoded in UTF8" or "this is encoded in XYZ character set" needs to be represented somewhere.

This will need a lot of testing too.

lalitb · 2024-08-07T16:03:47Z

supporting (or allowing to send/detect) different unicode formats eg - UTF-8, UTF-16 etc? I don't think this should be done. The string encoding should be supported as UTF-8 . Even protobuf format takes string as valid UTF-8 format.

string     := valid UTF-8 string (e.g. ASCII);
                max 2GB of bytes

punya · 2024-08-28T12:30:48Z

This seems related to open-telemetry/opentelemetry-specification#3421.

Personally I think that supporting arbitrary encodings (+ conversions to UTF-8 or raw bytes at OTLP boundaries) would be a large increase in the complexity of the API and SDK without a large enough payoff.

In the issue description, @tobervenec seems to have assumed that nostd::string_view or const char * is meant to be used only for ASCII text. Instead of adding support for arbitrary encodings, could we start by documenting that we expect those to contain UTF-8 encoded strings?

We may also want to implement validation/sanitization for invalid UTF-8 strings (currently, IIUC, we pass them through blindly). That should be a separate issue, and should probably be blocked on the resolution of the spec issue.

github-actions bot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Apr 15, 2024

marcalff added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Apr 17, 2024

github-actions bot added the Stale label Jun 19, 2024

marcalff added help wanted Good for taking. Extra help will be provided by maintainers good first issue Good for newcomers labels Aug 7, 2024

github-actions bot removed the Stale label Aug 9, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unicode logs/attributes support #2636

Unicode logs/attributes support #2636

tobervenec commented Apr 15, 2024

github-actions bot commented Jun 19, 2024

tobervenec commented Aug 7, 2024

marcalff commented Aug 7, 2024

github-actions bot commented Aug 7, 2024

lalitb commented Aug 7, 2024 •

edited

Loading

marcalff commented Aug 7, 2024

lalitb commented Aug 7, 2024 •

edited

Loading

punya commented Aug 28, 2024

Unicode logs/attributes support #2636

Unicode logs/attributes support #2636

Comments

tobervenec commented Apr 15, 2024

github-actions bot commented Jun 19, 2024

tobervenec commented Aug 7, 2024

marcalff commented Aug 7, 2024

github-actions bot commented Aug 7, 2024

lalitb commented Aug 7, 2024 • edited Loading

marcalff commented Aug 7, 2024

lalitb commented Aug 7, 2024 • edited Loading

punya commented Aug 28, 2024

lalitb commented Aug 7, 2024 •

edited

Loading

lalitb commented Aug 7, 2024 •

edited

Loading