[pt] Improve degree sign and ordinal indicator detection rules #9975

Merged: 2 commits, Dec 21, 2023

p-goulart (Collaborator) commented Dec 21, 2023

Probably still not perfect, but first let's get rid of the low-hanging-fruit false positives (FPs):

  • make a systematic distinction between temperature scales (which require a space before the degree sign) and cardinal points (which require a space between the degree sign and the abbreviation):

    • 90 °F but 90° N, 'noventa graus Fahrenheit' but 'noventa graus ao norte'.
  • somewhat improve the recognition of coordinates like 90° 45′ 22″, which use the prime/double prime characters for angular minutes/seconds; this may require some more tinkering (and potentially also different tokenisation, which I'm not keen on...);

  • crucially, I've added lowercase o to the ordinal/degree detection rule: the nightly corpus reveals many instances of 100o clearly indicating degrees. Once this is working, the ORDINAL_ABBREVIATION rules should be easier to work with, as we will have eliminated many FPs (and these rules are the last ones of this group to run!).
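
To make the spacing conventions in the list above concrete, here is a minimal Python sketch. It is not the actual rule implementation from this PR (the rule code is not shown here); the pattern names, the `classify` helper, and the set of accepted abbreviations (C/F for scales, N/S/L/E/O/W for cardinal points) are assumptions made purely for illustration.

```python
import re

# Hypothetical patterns for illustration only; not the rules from this PR.
NUMBER = r"\d+(?:[.,]\d+)?"

# Temperature: number, space, then degree sign glued to the scale (90 °F).
TEMPERATURE = re.compile(NUMBER + r" °[CF]")
# Cardinal point: degree sign glued to the number, then space (90° N).
CARDINAL = re.compile(NUMBER + r"° [NSLEOW]")
# Coordinates: degrees, minutes (prime), optional seconds (double prime).
COORDINATE = re.compile(r"\d+° \d+′(?: \d+″)?")


def classify(text: str) -> str:
    """Return which spacing convention the whole string follows, if any."""
    if TEMPERATURE.fullmatch(text):
        return "temperature"
    if CARDINAL.fullmatch(text):
        return "cardinal"
    if COORDINATE.fullmatch(text):
        return "coordinate"
    return "unknown"


for sample in ["90 °F", "90° N", "90° 45′ 22″", "90°F"]:
    print(sample, "->", classify(sample))
```

Under these assumptions, a string such as 90°F falls through to "unknown", which is the kind of input a spacing rule would want to flag.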


As a small aside, users' habit of replacing the degree sign with lowercase o may come from some kind of automatic MS Word rule that converts an o after a digit (\d) to the ordinal indicator, since they, uh, look similar. It would be lovely to eliminate this habit somehow, and the first results on the live degree/ordinal rules show very high acceptance rates.
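
For illustration, the digit-plus-lowercase-o pattern discussed above could be detected roughly as follows. This is a hedged sketch in Python, not the rule added in this PR: the `DIGIT_O` pattern and the `suggestions` helper are hypothetical, and offering both the degree sign and the ordinal indicator as candidates is an assumption, since the PR does not say how the real rule chooses between them.

```python
import re

# Hypothetical detector: a run of digits immediately followed by a lowercase
# "o", with word boundaries on both sides so that "100ok" is ignored.
DIGIT_O = re.compile(r"\b(\d+)o\b")


def suggestions(text: str) -> list[tuple[str, list[str]]]:
    """Return (matched text, candidate replacements) for each digit+o hit."""
    hits = []
    for match in DIGIT_O.finditer(text):
        number = match.group(1)
        # Assumption: propose both the degree sign (U+00B0) and the
        # masculine ordinal indicator (U+00BA); context would decide.
        hits.append((match.group(0), [number + "°", number + "º"]))
    return hits


print(suggestions("Um ângulo de 100o."))
print(suggestions("Moro no 2o andar."))
```

A real rule would need surrounding context (for example, a following noun such as "andar" versus a temperature scale) to pick one candidate; this sketch only shows the detection step.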

p-goulart merged commit 28ced48 into master on Dec 21, 2023 (2 checks passed)
p-goulart deleted the pt/grammar/degree_abbrev_fix branch on December 21, 2023 14:35