Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support of local numbering systems for number symbols #1036

Merged
merged 3 commits into from
Nov 28, 2023

Conversation

kajte
Copy link
Contributor

@kajte kajte commented Oct 23, 2023

Changes:

  • Load number symbols for multiple number systems from cldr data
  • Add numbering_systems and default_numbering_system properties for Locale
  • Use default numbering system of the locale for formatting number symbols

Example:
Before:

>>> numbers.format_decimal(123456789.01234, locale="fa_IR")
'123,456,789.012'

Now:

numbers.format_decimal(123456789.01234, locale="fa_IR")
'123٬456٬789٫012'

Fixes partially issue:#446
I took some inspiration from the closed PR: #470

Couple questions:

  • How much do we have to care about backward compatibility? Is it okay that this changes the numbering system used for formatting symbols from latn to the default numbering system for locales which default system isn't latn? Eg. for before the change fa_IR used latn but now it starts using arabext.
  • Do we necessarily need a way to force usage of other numbering system than a default one for a locale? I'm not sure how common need it is to use other than the default numbering system of a locale but I feel it's not that useful.

I would eventually like to fix other # TODO: Support other number systems TODOs from the code and take default numbering system to use in percentage, currency etc. formatting and add a function for formatting digits to the local numbering system but I decided to create a pull request at this point to check the direction I'm taking is ok for maintainers. I feel even features added in this PR are an enhancement as such.

@akx akx self-requested a review October 23, 2023 12:35
Copy link
Member

@akx akx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kiitos Thank you for the PR! 💪

Some comments, requests, and discussion within. Re the defaults, if you happen to know speakers of the affected locales (see comment), please tag them in too :)

babel/core.py Show resolved Hide resolved
babel/numbers.py Show resolved Hide resolved
Comment on lines 759 to 760
if default_number_system_node is not None:
numbering_systems['default'] = str(default_number_system_node.text)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be better to just have default_numbering_system in the data by itself, because otherwise numbering_systems's type will be a silly dict[str, str | list[str]].

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's good point and it made me notice there's a bug. I followed #470 and the this comment (#470 (comment)) without thinking too much. There's no list so the correct type is actually just dict[str, str]. So we can keep it as it is but just removed the list. Or would you like to still have default_numbering_system separately?

Reference: https://github.com/unicode-org/cldr/blob/d17bf3c2aff7cfad24e39ae75c5b5630bf984450/common/main/el.xml#L5736-L5741

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be worth to have a separate default_numbering_system since I foresee that to be a pretty hot path, and saving one dict lookup might compound :)

@codecov
Copy link

codecov bot commented Oct 23, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (5dff607) 91.34% compared to head (abe6bda) 91.43%.
Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1036      +/-   ##
==========================================
+ Coverage   91.34%   91.43%   +0.08%     
==========================================
  Files          26       26              
  Lines        4415     4436      +21     
==========================================
+ Hits         4033     4056      +23     
+ Misses        382      380       -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor Author

@kajte kajte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick response and good comments!

How do you see the end goal how function like format_decimal should work? Should it also convert the digits like this numbers.format_decimal(1.2, locale="ar", numbering_system='default) == '٢٫٣'?

Comment on lines 759 to 760
if default_number_system_node is not None:
numbering_systems['default'] = str(default_number_system_node.text)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's good point and it made me notice there's a bug. I followed #470 and the this comment (#470 (comment)) without thinking too much. There's no list so the correct type is actually just dict[str, str]. So we can keep it as it is but just removed the list. Or would you like to still have default_numbering_system separately?

Reference: https://github.com/unicode-org/cldr/blob/d17bf3c2aff7cfad24e39ae75c5b5630bf984450/common/main/el.xml#L5736-L5741

babel/numbers.py Show resolved Hide resolved
babel/numbers.py Show resolved Hide resolved
babel/numbers.py Show resolved Hide resolved
babel/numbers.py Show resolved Hide resolved
@kajte kajte force-pushed the number-symbols-non-latin-support branch from 9216b12 to c9f7ac8 Compare October 23, 2023 15:52
@akx
Copy link
Member

akx commented Oct 23, 2023

Should it also convert the digits like this numbers.format_decimal(1.2, locale="ar", numbering_system='default') == '٢٫٣'?

I mean, if that's a sensible formatting, I think so, yeah. I do wonder if we should do one version where the high-level formatting functions still use "latn" and emit e.g. a FutureWarning when the default for the locale (e.g. "ar") would not be Latin, like

FutureWarning: the default numbering system for the `"ar"` locale is ...; to keep Latin formatting, pass `numbering_system='latn'`

? I'm just kind of worried about putting people through the trouble of their formatting suddenly changing with a "routine" Babel upgrade.

@kajte kajte force-pushed the number-symbols-non-latin-support branch 2 times, most recently from f0375f3 to b0d8b60 Compare October 24, 2023 16:27
@kajte
Copy link
Contributor Author

kajte commented Oct 24, 2023

Stats about numbering systems regarding symbols
I did some statistics regarding availability of number symbols (I'll paste code used for calculations to the end of message):

  • All 825 supported locales have number symbols data for latn
  • All locales have number symbols data for their default numbering system
  • All locales have native numbering system and symbols for it
  • 41/825 locales have traditional numbering system. 0/41 traditional numbering system have symbols.
  • There's 69 locales where if native isn't latn then native isnt' same as default.

Conclusion: There's no need to have traditional alias. Also I feel native alias isn't that useful and we could skip it if not having it makes code simpler but it's your call.


Next steps on this PR

Could we preliminarily agree on sensible signature for functions so I can't start implementing it so there's less risk I make a lot of code we don't agree on?

One option could be to add something like numbering_system_config: Literal["default"] | str = "latn" to all relevant functions defaulting to latn. On later versions the FutureWarning can be added as you suggested.

It would be nicer if it don't have to call _determine_numbering_system on every stage and for lower level functions the argument could be only numbering_system: str = "latn". However, as far as I understand the library structure, most of functions in numbers module (and all on the example below) are public and documented so someone might be using them directly and then the ability to pass "default" is probably useful. Correct me if I'm wrong on this.

What do you think? Do you have more suitable suggestions?

# new function
def _determine_numbering_system(locale: Locale, numbering_system: Literal["default"] | str = "latn") -> str:
    if numbering_system == "default":
        return locale.default_numbering_system
    else:
        return numbering_system
 
#existing but numbering_system_config argument added
def format_decimal(
    number: float | decimal.Decimal | str,
    format: str | NumberPattern | None = None,
    locale: Locale | str | None = LC_NUMERIC,
    decimal_quantization: bool = True,
    group_separator: bool = True,
    *,
    numbering_system_config: Literal["default"] | str = "latn"
) -> str:
    locale = Locale.parse(locale)
    numbering_system = _determine_numbering_system(locale, numbering_system_config)
    ...
    NumberPattern.apply(..., numbering_system_config=numbering_system)
    
    
NumberPattern():
#existing but numbering_system_config argument added
     def apply(
        self,
        value: float | decimal.Decimal | str,
        locale: Locale | str | None,
        currency: str | None = None,
        currency_digits: bool = True,
        decimal_quantization: bool = True,
        force_frac: tuple[int, int] | None = None,
        group_separator: bool = True,
        *,
        numbering_system_config: Literal["default"] | str = "latn"
    ):
         numbering_system = _determine_numbering_system(locale, numbering_system_config)
         ...
          symbol = get_decimal_symbol(locale, numbering_system_config=numbering_system)
          ....

#existing but numbering_system_config argument and modified to use number system for getting correct symbol
def get_decimal_symbol(
        locale: Locale | str | None = LC_NUMERIC,
        *,
        numbering_system_config: Literal["default"] | str = "latn"
) -> str:
    parsed_locale = Locale.parse(locale)
    numbering_system = _determine_numbering_system(locale, numbering_system_config)
    return parsed_locale.number_symbols[numbering_system].get('decimal', '.')

Other thoughts
I was chatting with my friend who is native speaker of bn. For that language it seems eg. decimal delimiters are same for bo latn and beng numering systems so it didn't help for the question whether mixing local symbols and latn numbers looks weird. However, I learned from him that converting digits from latn to beng is just doing 1-to-1 mapping from 0123456789 to ০১২৩৪৫৬৭৮৯ (https://github.com/unicode-org/cldr/blob/main/common/supplemental/numberingSystems.xml#L18). Also, he said they use quite often also latn numbers.

I think it would be quite easy to validate digit formatting against eg. the java/kotlin number formatter from the standard library to get more confidence (or to findout there's more nuances) that converting digits is as simple as 1-to-1 mapping for all number systems having 10 digits.


Here's code that I used to get stats on the beginning of post. Basically calculating number of failures or successes depending on the case. Just for the reference.

def test_latn_available(locale):
    assert Locale.parse(locale).number_symbols["latn"]


@pytest.mark.all_locales
def test_default_available(locale):
    locale = Locale.parse(locale)
    assert locale.number_symbols[locale.default_numbering_system]


@pytest.mark.all_locales
def test_locale_have_native(locale):
    assert Locale.parse(locale).other_numbering_systems["native"]


@pytest.mark.all_locales
def test_native_locale_have_symbols(locale):
    locale = Locale.parse(locale)
    assert locale.number_symbols[locale.other_numbering_systems["native"]]


@pytest.mark.all_locales
def test_locale_have_traditional(locale):
    assert Locale.parse(locale).other_numbering_systems["traditional"]


@pytest.mark.all_locales
def test_traditional_locale_have_symbols(locale):
    locale = Locale.parse(locale)
    assert locale.number_symbols[locale.other_numbering_systems["traditional"]]


@pytest.mark.all_locales
def test_native_and_default_same_if_native_isnt_latin(locale):
    locale = Locale.parse(locale)
    native = locale.other_numbering_systems["native"]
    default = locale.default_numbering_system

    if native != "latn":
        assert native == default, f"{default} {native} {locale}"

@kajte kajte requested a review from akx October 24, 2023 20:28
@akx
Copy link
Member

akx commented Nov 1, 2023

Sorry for the delay, had a bunch of other things to attend to...

One option could be to add something like numbering_system_config: Literal["default"] | str = "latn" to all relevant functions defaulting to latn. On later versions the FutureWarning can be added as you suggested.

Sounds good, but let's call the argument numbering_system instead, since it's kwarg-only so numbering_system_config= would be littering user code with an extra _config word :)

It would be nicer if it don't have to call _determine_numbering_system on every stage and for lower level functions the argument could be only numbering_system: str = "latn". However, as far as I understand the library structure, most of functions in numbers module (and all on the example below) are public and documented so someone might be using them directly and then the ability to pass "default" is probably useful.

That is the case, yep. That's somewhat analogous to how you can pass in a locale identifier to almost anything, and we call Locale.parse on it.

The API surface looks good to me at first glance aside from that naming change :)

@akx
Copy link
Member

akx commented Nov 22, 2023

@kajte I'm planning on trying to get a new Babel out maybe this week or early next week, and I think we could at least include the numbering-system loading code from this PR, even if there'd be no surface changes that'd allow to use them just yet, if you have the time to crop this PR to only that? IOW, pretty much just change the get_*_symbol() code to always use "latn" for the time being.

@kajte
Copy link
Contributor Author

kajte commented Nov 22, 2023

@kajte I'm planning on trying to get a new Babel out maybe this week or early next week, and I think we could at least include the numbering-system loading code from this PR, even if there'd be no surface changes that'd allow to use them just yet, if you have the time to crop this PR to only that? IOW, pretty much just change the get_*_symbol() code to always use "latn" for the time being.

I wrote some code last week that I didn't have time to complete yet. I can check today or tomorrow whether it's close to done. If not, your suggestions sounds good.

@kajte kajte force-pushed the number-symbols-non-latin-support branch from b0d8b60 to 0bfc336 Compare November 23, 2023 22:33
@kajte
Copy link
Contributor Author

kajte commented Nov 23, 2023

Okay, I got something done now.

  1. Loads number symbols for available numbering systems from cldr data
  2. All number formatting functions using number symbol data should now take argument numbering_system: Literal["default"] | str = "latn" which is used to select from which numbering system symbols are used.
    I hope I didn't miss any function or forgot to test any function where I added the numbering_system argument

How does it look? If needed I can also change it what you suggested on Wednesday so that we only want to include that part to the next release.

@akx
Copy link
Member

akx commented Nov 27, 2023

@kajte Thanks! Can you rebase this on master and take care of the lint errors? Beyond the ones currently flagged by Ruff, there will be new ones since #1045 was just merged. I'll give the rest a review too now.

Copy link
Member

@akx akx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking pretty good!

babel/core.py Outdated
"""
return self._data['number_symbols']

@property
def other_numbering_systems(self) -> localedata.LocaleDataDict:
"""Mapping of othern numbering systems.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""Mapping of othern numbering systems.
"""Mapping of other numbering systems.

What does "other" mean here? Could we elucidate that some?

Copy link
Contributor Author

@kajte kajte Nov 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cldr spec calls them Other numbering systems: https://www.unicode.org/reports/tr35/tr35-numbers.html#otherNumberingSystems But I agree it's not very clear

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it now to:

Mapping of other numbering systems available for the locale.
See: https://www.unicode.org/reports/tr35/tr35-numbers.html#otherNumberingSystems

babel/core.py Outdated Show resolved Hide resolved
babel/numbers.py Outdated Show resolved Hide resolved
babel/numbers.py Outdated
Comment on lines 343 to 344
def __init__(self, numbering_system: str, locale: Locale | str | None) -> None:
super().__init__(f"Unknown numbering system {numbering_system} for Locale {locale}.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just format this error message at the call site since you're not putting the numbering_system and locale into fields in the exception anyway.

Alternately, put them in there :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I formatted it at the call

babel/numbers.py Outdated
class UnsupportedNumberingSystemError(Exception):
"""Exception thrown when an unsupported numbering system is requested for the given Locale."""

def __init__(self, numbering_system: str, locale: Locale | str | None) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does need to be (and currently is) constructed with an actual Locale.

Suggested change
def __init__(self, numbering_system: str, locale: Locale | str | None) -> None:
def __init__(self, numbering_system: str, locale: Locale) -> None:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whole __init__ function based on the comment above

babel/numbers.py Outdated
Comment on lines 362 to 363
:param numbering_system: The numbering system used for fetching the symbol. Defaults to "latn", special
value "default" will use the default numbering system of the locale.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
:param numbering_system: The numbering system used for fetching the symbol. Defaults to "latn", special
value "default" will use the default numbering system of the locale.
:param numbering_system: The numbering system used for fetching the symbol. Defaults to "latn".
The special value "default" will use the default numbering system of the locale.

(Consider this repeated for all param numbering_systems :) )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

tests/test_numbers.py Outdated Show resolved Hide resolved
tests/test_support.py Show resolved Hide resolved
@kajte kajte force-pushed the number-symbols-non-latin-support branch 3 times, most recently from 572a890 to 5113888 Compare November 27, 2023 21:44
- Import number symbols for available  numbering systems from cldr data
- Add default_numbering_system  and other_numbering_systems properties for Locale
- Add numbering_system argument to relevant number formatting fuctions and use number symbols based on the given numbering system

Fixes partially issue python-babel#446
@kajte kajte force-pushed the number-symbols-non-latin-support branch from 5113888 to 0509ae4 Compare November 27, 2023 21:57
@kajte
Copy link
Contributor Author

kajte commented Nov 27, 2023

@kajte Thanks! Can you rebase this on master and take care of the lint errors? Beyond the ones currently flagged by Ruff, there will be new ones since #1045 was just merged. I'll give the rest a review too now.

Rebased and at least pre-commit hook succeed locally now.

Copy link
Member

@akx akx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Just applying some minor formatting fixes via review.)

babel/numbers.py Outdated Show resolved Hide resolved
babel/numbers.py Outdated Show resolved Hide resolved
babel/numbers.py Outdated Show resolved Hide resolved
babel/support.py Show resolved Hide resolved
babel/support.py Outdated Show resolved Hide resolved
babel/numbers.py Outdated Show resolved Hide resolved
babel/units.py Outdated Show resolved Hide resolved
Comment on lines +636 to +637
assert numbers.format_percent(134.5, locale='ar_EG', numbering_system="default") == '13٬450%'

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert numbers.format_percent(134.5, locale='ar_EG', numbering_system="default") == '13٬450%'
assert numbers.format_percent(134.5, locale='ar_EG', numbering_system="default") == '13٬450%'

tests/test_numbers.py Outdated Show resolved Hide resolved
tests/test_numbers.py Show resolved Hide resolved
@akx akx force-pushed the number-symbols-non-latin-support branch from 9032583 to abe6bda Compare November 28, 2023 09:13
@akx akx merged commit aca7663 into python-babel:master Nov 28, 2023
26 checks passed
@akx
Copy link
Member

akx commented Nov 28, 2023

@kajte Thanks! 🎉

@kajte
Copy link
Contributor Author

kajte commented Nov 28, 2023

How should I run linters locally to catch these? I installed the pre-commit hook and it passed for my commit.

@kajte
Copy link
Contributor Author

kajte commented Nov 28, 2023

@kajte Thanks! 🎉

Thanks for your support! I have couple busy weeks ahead but I'll try continue with formatting actual numerals after that.

@akx
Copy link
Member

akx commented Nov 28, 2023

How should I run linters locally to catch these? I installed the pre-commit hook and it passed for my commit.

I think you hadn't rebased on the master that had the augmented Ruff configuration :)

Thanks for your support! I have couple busy weeks ahead but I'll try continue with formatting actual numerals after that.

Thank you for the contribution! Same here TBH, and then it'll be kinkunpaistoaika and – – 😁

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants