[pt] Enable multi-token spell-checking #10052

Merged on Jan 9, 2024 (9 commits)

Commits on Jan 5, 2024

  1. 6df212c
  2. Add smart titlecase method to StringTools

    p-goulart committed Jan 5, 2024
    ba0e768
  3. [pt] Fix multiword prepositions

    p-goulart committed Jan 5, 2024
    916e7a6
  4. [pt] Update PT tokeniser

     - improve handling of percent signs (before: [50%OFF]; after: [50%,
       OFF]);
     - add tests required by the latest dictionary version.
    p-goulart committed Jan 5, 2024
    a8c4f4f
  5. [pt] Add speller tests due to latest dictionary

    p-goulart committed Jan 5, 2024
    a77bd8d
  6. Add titlecasing step to MultiWordChunker class

     - multi-token suggestions were failing because the dictionary lookup
       only tried uppercasing their first letter;
     - this missed titlecased forms (naive or slightly smarter
       titlecasing), which are relatively frequent;
     - cf. titles such as "The Lord of the Rings".
    p-goulart committed Jan 5, 2024
    34d261b
  7. [pt] Bump up dict to v0.12

    p-goulart committed Jan 5, 2024
    d14d5b1
  8. Improve titlecase logic in MultiWordChunker

    p-goulart committed Jan 5, 2024
    57a9ef0
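
The tokeniser change in commit 4 above (splitting [50%OFF] into [50%, OFF]) can be pictured with a small sketch. It is written in Python purely for illustration and is not the project's tokeniser: the regular expression and the function name `split_percent` are assumptions, and the real change may handle more cases.

```python
import re

# Hypothetical pattern: a number ending in '%' that is glued to a letter.
# The lookahead checks for a letter without consuming it.
_PERCENT_GLUED = re.compile(r"(\d+(?:[.,]\d+)?%)(?=[^\W\d_])")


def split_percent(token: str) -> list:
    """Split a token such as '50%OFF' right after the percent sign.

    Tokens that do not start with a glued percentage are returned unchanged.
    """
    match = _PERCENT_GLUED.match(token)
    if match is None:
        return [token]
    return [match.group(1), token[match.end():]]
```

Under these assumptions, `split_percent("50%OFF")` yields two tokens, while a bare `"50%"` stays as a single token because nothing follows the percent sign.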

Commits on Jan 8, 2024

  1. Add titlecasing option to multi-word chunker

     - only Portuguese has it *on*; all other locales have it set to false;
     - add a simple StringTools method to check if all words in a
       multi-token string are lowercase (and tests).
    p-goulart committed Jan 8, 2024
    1a0dbab
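
The titlecasing commits above describe a dictionary lookup that tries more case variants of a multi-token string, with the extra step switched on for Portuguese only. A rough Python sketch of that idea follows. It is not the project's MultiWordChunker or StringTools code: every function name, the `MINOR_WORDS` list, and the `TITLECASE_ENABLED` map are hypothetical, and using the all-lowercase check as a gate for the titlecase step is an assumption about how the pieces fit together.

```python
from typing import Optional

# Hypothetical list of Portuguese function words kept lowercase inside a title.
MINOR_WORDS = {"a", "o", "as", "os", "de", "do", "da", "dos", "das", "e", "em"}

# Hypothetical per-locale switch: only Portuguese enables titlecasing.
TITLECASE_ENABLED = {"pt": True}


def upcase_first(text: str) -> str:
    """Uppercase only the first character of the string."""
    return text[:1].upper() + text[1:]


def naive_titlecase(text: str) -> str:
    """Uppercase the first letter of every space-separated word."""
    return " ".join(upcase_first(word) for word in text.split(" "))


def smart_titlecase(text: str) -> str:
    """Like naive_titlecase, but keep non-initial minor words lowercase."""
    words = text.split(" ")
    return " ".join(
        word.lower() if i > 0 and word.lower() in MINOR_WORDS else upcase_first(word)
        for i, word in enumerate(words)
    )


def all_words_lowercase(text: str) -> bool:
    """True if there is at least one word and no word has an uppercase character."""
    words = text.split()
    return bool(words) and all(word == word.lower() for word in words)


def find_in_dictionary(text: str, dictionary: set, locale: str) -> Optional[str]:
    """Return the first case variant of `text` present in `dictionary`, else None."""
    variants = [text, upcase_first(text)]
    # Assumed gating: titlecase variants are tried only when the locale opts in
    # and the input is entirely lowercase.
    if TITLECASE_ENABLED.get(locale, False) and all_words_lowercase(text):
        variants += [naive_titlecase(text), smart_titlecase(text)]
    for variant in variants:
        if variant in dictionary:
            return variant
    return None
```

With a dictionary containing "O Senhor dos Anéis", looking up "o senhor dos anéis" under `"pt"` succeeds through the smart titlecase variant, whereas the same lookup under a locale that leaves the option off only tries the original string and its first-letter-uppercased form.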