Why do some glyphs have a four-digit hex code while others have five digits? #1681
-
I'm troubleshooting an issue where the VS Code integrated terminal inserts unnecessary line breaks when certain glyphs are present in the output. I've noticed that glyphs with four-digit hex codes don't have the issue, while glyphs with five-digit hex codes do. I plan to file an issue with vscode, but first I want to understand better how glyphs work. Why do the hex codes vary in length?

EDIT: I've also noticed that when I enter the hex code literal into an Oh My Posh segment template string, rather than pasting the icon itself into the template string, the 4-digit one renders correctly while the 5-digit one doesn't. I know this is neither the VS Code nor the Oh My Posh repo, but these issues seem conceptually similar.
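For what it's worth, here is my guess at what is going on with the escaped form, assuming the template string passes through a JSON-style parser at some point (I have not verified that this is what Oh My Posh does, and U+E0B0 / U+1F600 below are stand-in codepoints rather than the actual glyphs from my prompt). In JSON, `\u` takes exactly four hex digits, so anything above U+FFFF has to be written as a UTF-16 surrogate pair:

```go
package main

import (
	"encoding/json"
	"fmt"
)

func decode(raw string) string {
	var s string
	// Error ignored for brevity; all three inputs below are valid JSON.
	_ = json.Unmarshal([]byte(raw), &s)
	return s
}

func main() {
	// A 4-hex-digit escape decodes to the single codepoint you expect.
	fmt.Printf("%U\n", []rune(decode(`"\ue0b0"`))) // [U+E0B0]

	// Writing a 5-digit codepoint "literally" does not work: JSON reads only
	// four hex digits, so this is U+1F60 followed by the digit '0'.
	fmt.Printf("%U\n", []rune(decode(`"\u1f600"`))) // [U+1F60 U+0030]

	// A codepoint above U+FFFF has to be written as a surrogate pair.
	fmt.Printf("%U\n", []rune(decode(`"\ud83d\ude00"`))) // [U+1F600]
}
```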
-
"In the beginning" there was ASCII, where each character can be expressed as 7 bit number - later expanded to 8 bit. That all was too limited and finally unicode came up. Version 2 had 40'000 characters, nowadays we have 150'000. We are not free to use any codepoint we like, there are certain ranges that need to be used. And when the useful-for-Nerdfonts 4 digit range filled up we needed to allocate new glyphs a 5 digit number. Well, in principle. That much for history. A long time 4 digit code were all what was needed and this lead to some pitfalls. Here you see LOL, with codepoint I have no clue what VS Code uses or expects. For the unnecessary line breaks, I guess the line breaks because the terminal calculates the length wrong? But the lines seem to be not full. On the other hand the break is not after the special glyph but later on. You need to know that unicode has the concept of different width or characters; they can be single, double, or ambiguous width. Maybe here is a problem. I hope this explains at least some things and helps your research. There are for more peculiarities, especially regarding Windows and how it handles stuff. I am not sure if the same holds for VS Code on other platforms and you do not specify your platform. Windows specifically was late in joining the other OSes that all supported 5 digit unicode characters and still struggles (I believe, I can be wrong here). See also |
"In the beginning" there was ASCII, where each character can be expressed as 7 bit number - later expanded to 8 bit.
These 8 bits can be expressed as 2 digit hexadecimal number.
But 256 characters is a bit low, considering all these strange letters with appendices like ä and Ó.
Then we got codepages. Etc.
That all was too limited and finally unicode came up. Version 2 had 40'000 characters, nowadays we have 150'000.
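One way to see those width classes in practice is a wcwidth-style library; the third-party go-runewidth package below is just a stand-in for whatever width tables VS Code and your terminal actually use, so the numbers are illustrative:

```go
package main

import (
	"fmt"

	"github.com/mattn/go-runewidth"
)

func main() {
	// Display width in terminal cells, as one common wcwidth-style library
	// computes it. Private Use Area glyphs are a grey zone: different
	// terminals and libraries can disagree, which is one way a terminal
	// ends up miscounting line lengths.
	for _, s := range []string{"a", "ä", "漢", "\ue0b0", "\U0001F600"} {
		fmt.Printf("%q -> %d cell(s)\n", s, runewidth.StringWidth(s))
	}
}
```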
I hope this explains at least some things and helps your research. There are far more peculiarities, especially regarding Windows and how it handles this stuff. I am not sure whether the same holds for VS Code on other platforms, and you do not specify your platform. Windows specifically was late in joining the other OSes, which all supported 5-digit Unicode characters, and it still struggles (I believe; I may be wrong here). See also
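The reason the 5-digit codepoints are the ones that tend to hurt on UTF-16-centric platforms is that they no longer fit in a single 16-bit unit and have to be split into a surrogate pair. A small Go illustration of that split (a sketch only, not a claim about what VS Code or Windows does internally):

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

func main() {
	// A BMP codepoint (4 hex digits) is a single UTF-16 unit.
	fmt.Printf("%U -> UTF-16 units %X\n", rune(0xE0B0), utf16.Encode([]rune{0xE0B0}))

	// A supplementary-plane codepoint (5 hex digits) becomes two surrogate
	// units, which UTF-16-centric code paths sometimes miscount or mishandle.
	fmt.Printf("%U -> UTF-16 units %X\n", rune(0x1F600), utf16.Encode([]rune{0x1F600}))
}
```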