Skip to content

Commit

Permalink
keysyms: Fix inconsistent case-insensitive name lookup
Browse files Browse the repository at this point in the history
`xkb_keysym_from_name` has inconsistent behavior when used with the flag
`XKB_KEYSYM_CASE_INSENSITIVE`:

```c
xkb_keysym_from_name("a", XKB_KEYSYM_CASE_INSENSITIVE) == XKB_KEY_a;
xkb_keysym_from_name("A", XKB_KEYSYM_CASE_INSENSITIVE) == XKB_KEY_a;
xkb_keysym_from_name("dead_a", XKB_KEYSYM_CASE_INSENSITIVE) == XKB_KEY_dead_A;
xkb_keysym_from_name("dead_A", XKB_KEYSYM_CASE_INSENSITIVE) == XKB_KEY_dead_A;
xkb_keysym_from_name("dead_o", XKB_KEYSYM_CASE_INSENSITIVE) == XKB_KEY_dead_o;
xkb_keysym_from_name("dead_O", XKB_KEYSYM_CASE_INSENSITIVE) == XKB_KEY_dead_o;
xkb_keysym_from_name("KANA_tsu", XKB_KEYSYM_CASE_INSENSITIVE) == XKB_KEY_kana_tsu;
xkb_keysym_from_name("KANA_TSU", XKB_KEYSYM_CASE_INSENSITIVE) == XKB_KEY_kana_tsu;
xkb_keysym_from_name("KANA_ya", XKB_KEYSYM_CASE_INSENSITIVE) == XKB_KEY_kana_YA;
xkb_keysym_from_name("KANA_YA", XKB_KEYSYM_CASE_INSENSITIVE) == XKB_KEY_kana_YA;
xkb_keysym_from_name("XF86Screensaver", XKB_KEYSYM_CASE_INSENSITIVE) == XKB_KEY_XF86ScreenSaver;
xkb_keysym_from_name("XF86ScreenSaver", XKB_KEYSYM_CASE_INSENSITIVE) == XKB_KEY_XF86ScreenSaver;
```

So currently, if two keysym names differ only by case, then the
lower-case *keysym* is returned, not the keysym corresponding to the
lower-case keysym *name*. Indeed, `xkb_keysym_from_name` uses
`xkb_keysym_is_lower` to test if a keysym is a lower-case keysym.

Let’s look at the example for keysyms `a` and `A`: we get the keysym `a`
not because its name is lower case, but because `xkb_keysym_is_lower(XKB_KEY_a)`
returns true and `xkb_keysym_is_lower(XKB_KEY_A)` returns false.

So the results are correct according to the doc:

- Katakana is not a bicameral script, so e.g. `kana_ya` is *not* the lower
  case of `kana_YA`.
- As for the `dead_*` keysyms, they are not cased either because they do
  not correspond to characters.
- `XF86ScreenSaver` and `XF86Screensaver` are two different keysyms.

But this is also very counter-intuitive: `xkb_keysym_is_lower` is not
the right function to use in this case, because one would expect to check
only the name, not the corresponding character case:

```c
xkb_keysym_from_name("KANA_YA", XKB_KEYSYM_CASE_INSENSITIVE) == XKB_KEY_kana_ya;
xkb_keysym_from_name("XF86ScreenSaver", XKB_KEYSYM_CASE_INSENSITIVE) == XKB_KEY_XF86Screensaver;
```

Fixed by making the order of the keysyms names consistent in `src/ks_tables.h`:

1. Sort by the casefolded name: e.g. `kana_ya` < `kana_YO`.
2. If same casefolded name, then sort by cased name, i.e for
   ASCII: upper before lower: e.g `kana_YA` < `kana_ya`.

Thus we now have e.g. `kana_YA` < `kana_ya` < `kana_YO` < `kana_yo`.
The lookup logic has also been simplified.

Added exhaustive test for ambiguous case-insensitive names.
  • Loading branch information
wismill committed Mar 7, 2024
1 parent 0d875ef commit ced1916
Show file tree
Hide file tree
Showing 9 changed files with 1,140 additions and 692 deletions.
2 changes: 1 addition & 1 deletion include/xkbcommon/xkbcommon.h
Original file line number Diff line number Diff line change
Expand Up @@ -472,7 +472,7 @@ enum xkb_keysym_flags {
* invalid flags are passed, this will fail with XKB_KEY_NoSymbol.
*
* If you use the XKB_KEYSYM_CASE_INSENSITIVE flag and two keysym names
* differ only by case, then the lower-case keysym is returned. For
* differ only by case, then the lower-case keysym name is returned. For
* instance, for KEY_a and KEY_A, this function would return KEY_a for the
* case-insensitive search. If this functionality is needed, it is
* recommended to first call this function without this flag; and if that
Expand Down
6 changes: 4 additions & 2 deletions meson.build
Original file line number Diff line number Diff line change
Expand Up @@ -665,13 +665,15 @@ if icu_dep.found()
endif
test(
'keysym',
executable('test-keysym', 'test/keysym.c', dependencies: keysyms_test_dep,
executable('test-keysym', 'test/keysym.c', 'test/keysym.h',
dependencies: keysyms_test_dep,
c_args: keysyms_test_c_args),
env: test_env,
)
test(
'keymap',
executable('test-keymap', 'test/keymap.c', dependencies: test_dep),
executable('test-keymap', 'test/keymap.c', 'test/keysym.h',
dependencies: test_dep),
env: test_env,
)
test(
Expand Down
11 changes: 10 additions & 1 deletion scripts/makekeys
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,16 @@ pattern = re.compile(r"^#define\s+XKB_KEY_(?P<name>\w+)\s+(?P<value>0x[0-9a-fA-F
matches = [pattern.match(line) for line in open(sys.argv[1])]
entries = [(m.group("name"), int(m.group("value"), 16)) for m in matches if m]

entries_isorted = sorted(entries, key=lambda e: e[0].lower())
# Sort based on the keysym name:
# 1. Sort by the casefolded name: e.g. kana_ya < kana_YO.
# 2. If same casefolded name, then sort by cased name, i.e for
# ASCII: upper before lower: e.g kana_YA < kana_ya.
# E.g. kana_YA < kana_ya < kana_YO < kana_yo
# WARNING: this sort must not be changed, as some functions e.g.
# xkb_keysym_from_name rely on upper case variant occuring first.
entries_isorted = sorted(entries, key=lambda e: (e[0].casefold(), e[0]))
# Sort based on keysym value. Sort is stable so in case of duplicate, the first
# keysym occurence stays first.
entries_kssorted = sorted(entries, key=lambda e: e[1])

print(
Expand Down
72 changes: 55 additions & 17 deletions scripts/update-headers.py
Original file line number Diff line number Diff line change
@@ -1,47 +1,72 @@
#!/usr/bin/env python3

import argparse
from collections import defaultdict
from pathlib import Path
import re
import sys
from typing import Any
from typing import Any, TypeAlias

import jinja2

KEYSYM_PATTERN = re.compile(
r"^#define\s+XKB_KEY_(?P<name>\w+)\s+(?P<value>0x[0-9a-fA-F]+)\s"
)
MAX_AMBIGUOUS_NAMES = 3

KeysymsBounds: TypeAlias = dict[str, int]
KeysymsCaseFoldedNames: TypeAlias = dict[str, list[str]]

def load_keysyms(path: Path) -> dict[str, int]:

def load_keysyms(path: Path) -> tuple[KeysymsBounds, KeysymsCaseFoldedNames]:
# Load the keysyms header
keysym_min = sys.maxsize
keysym_max = 0
min_unicode_keysym = 0x01000100
max_unicode_keysym = 0x0110FFFF
canonical_names: dict[int, str] = {}
casefolded_names: dict[str, list[str]] = defaultdict(list)
max_unicode_name = "U10FFFF"
max_keysym_name = "0x1fffffff" # XKB_KEYSYM_MAX

with path.open("rt", encoding="utf-8") as fd:
for line in fd:
if m := KEYSYM_PATTERN.match(line):
value = int(m.group("value"), 16)
keysym_min = min(keysym_min, value)
keysym_max = max(keysym_max, value)
name = m.group("name")
casefolded_names[name.casefold()].append(name)
if value not in canonical_names:
canonical_names[value] = m.group("name")
return {
"XKB_KEYSYM_MIN_ASSIGNED": min(keysym_min, min_unicode_keysym),
"XKB_KEYSYM_MAX_ASSIGNED": max(keysym_max, max_unicode_keysym),
"XKB_KEYSYM_MIN_EXPLICIT": keysym_min,
"XKB_KEYSYM_MAX_EXPLICIT": keysym_max,
"XKB_KEYSYM_COUNT_EXPLICIT": len(canonical_names),
"XKB_KEYSYM_NAME_MAX_SIZE": max(
max(len(name) for name in canonical_names.values()),
len(max_unicode_name),
len(max_keysym_name),
),
}
canonical_names[value] = name

# Keep only ambiguous case-insensitive names and sort them
for name in tuple(casefolded_names.keys()):
count = len(casefolded_names[name])
if count < 2:
del casefolded_names[name]
elif count > MAX_AMBIGUOUS_NAMES:
raise ValueError(
f"""Expected max {MAX_AMBIGUOUS_NAMES} keysyms for "{name}", got: {count}"""
)
else:
casefolded_names[name].sort()

return (
{
"XKB_KEYSYM_MIN_ASSIGNED": min(keysym_min, min_unicode_keysym),
"XKB_KEYSYM_MAX_ASSIGNED": max(keysym_max, max_unicode_keysym),
"XKB_KEYSYM_MIN_EXPLICIT": keysym_min,
"XKB_KEYSYM_MAX_EXPLICIT": keysym_max,
"XKB_KEYSYM_COUNT_EXPLICIT": len(canonical_names),
"XKB_KEYSYM_NAME_MAX_SIZE": max(
max(len(name) for name in canonical_names.values()),
len(max_unicode_name),
len(max_keysym_name),
),
},
casefolded_names,
)


def generate(
Expand Down Expand Up @@ -86,12 +111,25 @@ def generate(
jinja_env.filters["keysym"] = lambda ks: f"0x{ks:0>8x}"

# Load keysyms
keysyms_data = load_keysyms(args.root / "include/xkbcommon/xkbcommon-keysyms.h")
keysyms_bounds, keysyms_ambiguous_case_insensitive_names = load_keysyms(
args.root / "include/xkbcommon/xkbcommon-keysyms.h"
)

# Generate the files
generate(
jinja_env,
keysyms_data,
keysyms_bounds,
args.root,
Path("src/keysym.h"),
)

generate(
jinja_env,
dict(
keysyms_bounds,
ambiguous_case_insensitive_names=keysyms_ambiguous_case_insensitive_names,
MAX_AMBIGUOUS_NAMES=MAX_AMBIGUOUS_NAMES,
),
args.root,
Path("test/keysym.h"),
)
46 changes: 18 additions & 28 deletions src/keysym.c
Original file line number Diff line number Diff line change
Expand Up @@ -270,16 +270,18 @@ xkb_keysym_from_name(const char *name, enum xkb_keysym_flags flags)
* Find the correct keysym for case-insensitive match.
*
* The name_to_keysym table is sorted by istrcmp(). So the binary
* search may return _any_ of all possible case-insensitive duplicates. This
* code searches the entry, all previous and all next entries that match by
* case-insensitive comparison and returns the "best" case-insensitive
* match.
* search may return _any_ of all possible case-insensitive duplicates. The
* duplicates are sorted so that the "best" case-insensitive match comes
* last. So the code searches the entry and all next entries that match by
* case-insensitive comparison and returns the "best" case-insensitive match.
*
* The "best" case-insensitive match is the lower-case keysym which we find
* with the help of xkb_keysym_is_lower(). The only keysyms that only differ
* by letter-case are keysyms that are available as lower-case and
* upper-case variant (like KEY_a and KEY_A). So returning the first
* lower-case match is enough in this case.
* The "best" case-insensitive match is the lower-case keysym name. Most
* keysyms names that only differ by letter-case are keysyms that are
* available as “small” and “big” variants. For example:
*
* - Bicameral scripts: Lower-case and upper-case variants,
* e.g. KEY_a and KEY_A.
* - Non-bicameral scripts: e.g. KEY_kana_a and KEY_kana_A.
*/
else {
int32_t lo = 0, hi = ARRAY_SIZE(name_to_keysym) - 1;
Expand All @@ -296,26 +298,14 @@ xkb_keysym_from_name(const char *name, enum xkb_keysym_flags flags)
}
}
if (entry) {
const struct name_keysym *iter, *last;

if (icase && xkb_keysym_is_lower(entry->keysym))
return entry->keysym;

for (iter = entry - 1; iter >= name_to_keysym; --iter) {
if (istrcmp(get_name(iter), get_name(entry)) != 0)
break;
if (xkb_keysym_is_lower(iter->keysym))
return iter->keysym;
const struct name_keysym *last;
last = name_to_keysym + ARRAY_SIZE(name_to_keysym) - 1;
/* Keep going until we reach end of array
* or non case-insensitve match */
while (entry < last &&
istrcmp(get_name(entry + 1), get_name(entry)) == 0) {
entry++;
}

last = name_to_keysym + ARRAY_SIZE(name_to_keysym);
for (iter = entry + 1; iter < last; ++iter) {
if (istrcmp(get_name(iter), get_name(entry)) != 0)
break;
if (xkb_keysym_is_lower(iter->keysym))
return iter->keysym;
}

return entry->keysym;
}
}
Expand Down
Loading

0 comments on commit ced1916

Please sign in to comment.