Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unicode logs/attributes support #2636

Open
tobervenec opened this issue Apr 15, 2024 · 8 comments
Open

Unicode logs/attributes support #2636

tobervenec opened this issue Apr 15, 2024 · 8 comments
Labels
good first issue Good for newcomers help wanted Good for taking. Extra help will be provided by maintainers triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@tobervenec
Copy link

Hi!
In most cases simple string_view or const char * is sufficient for needs of our project. But I wonder if there are some plans to extend logs/attributes with unicode support (Neither Logger functions nor AttributeValue seems to support wchar_t)?

Is your feature request related to a problem?
We are using logs/metrics (+traces in the future) quite extensively and recently a question was risen whether we can have at least unicode attributes (for example somebody wants to extend his log message with an attribute that represents some company name, which cannot be represented by ascii characters (chinese for example)).

Describe the solution you'd like
Would be nice to have a possibility to log messages and/or add attributes that can represent wider range of characters

Describe alternatives you've considered
One of alternatives I was thinking about is to have a possibility to send an array of bytes and corresponding encoding so that consumer knows how to interpret those bytes.

Additional context
Add any other context about the feature request here.

@github-actions github-actions bot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Apr 15, 2024
@marcalff marcalff added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Apr 17, 2024
Copy link

This issue was marked as stale due to lack of activity.

@github-actions github-actions bot added the Stale label Jun 19, 2024
@tobervenec
Copy link
Author

Hi!
Is anything planned for this one?

@marcalff
Copy link
Member

marcalff commented Aug 7, 2024

This request is valid and makes sense.
Planning is constrained by resources.
This sounds like a good task to get into the opentelemetry-cpp code, adding help needed label.

@marcalff marcalff added help wanted Good for taking. Extra help will be provided by maintainers good first issue Good for newcomers labels Aug 7, 2024
Copy link

github-actions bot commented Aug 7, 2024

This issue is available for anyone to work on. Make sure to reference this issue in your pull request.
✨ Thank you for your contribution! ✨

@lalitb
Copy link
Member

lalitb commented Aug 7, 2024

I believe const char * and nostd::string_view are sufficient for handling UTF-8 encoded strings, which can represent any Unicode character of sequences of 1 to 4 bytes. The use of wchar_t can be problematic due to its differing size and encoding between platforms. Or am I missing something?

@marcalff
Copy link
Member

marcalff commented Aug 7, 2024

A const char* string can represent unicode.

What needs to be clarified is whether the information "this is an ascii string" or "this is encoded in UTF8" or "this is encoded in XYZ character set" needs to be represented somewhere.

This will need a lot of testing too.

@lalitb
Copy link
Member

lalitb commented Aug 7, 2024

supporting (or allowing to send/detect) different unicode formats eg - UTF-8, UTF-16 etc? I don't think this should be done. The string encoding should be supported as UTF-8 . Even protobuf format takes string as valid UTF-8 format.

string     := valid UTF-8 string (e.g. ASCII);
                max 2GB of bytes

@github-actions github-actions bot removed the Stale label Aug 9, 2024
@punya
Copy link
Member

punya commented Aug 28, 2024

This seems related to open-telemetry/opentelemetry-specification#3421.

Personally I think that supporting arbitrary encodings (+ conversions to UTF-8 or raw bytes at OTLP boundaries) would be a large increase in the complexity of the API and SDK without a large enough payoff.

In the issue description, @tobervenec seems to have assumed that nostd::string_view or const char * is meant to be used only for ASCII text. Instead of adding support for arbitrary encodings, could we start by documenting that we expect those to contain UTF-8 encoded strings?

We may also want to implement validation/sanitization for invalid UTF-8 strings (currently, IIUC, we pass them through blindly). That should be a separate issue, and should probably be blocked on the resolution of the spec issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers help wanted Good for taking. Extra help will be provided by maintainers triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

No branches or pull requests

4 participants