[pt] Improve compounding rules #9213

Merged
merged 8 commits into from
Aug 31, 2023

@p-goulart (Collaborator) commented Aug 29, 2023

  • move a few commonly disabled entries from the compounding file to the grammar XML, so we can apply more granular logic;
  • add a new rule for colour-name hyphenation, since that's something people commonly disable. That way, users will have more control over LT's behaviour when it comes to hyphenating colour names, without us losing any coverage (and the performance of the compounding rule should improve, to boot).

Blocked by #9216 ⚠️

@p-goulart p-goulart marked this pull request as draft August 30, 2023 12:48
@p-goulart p-goulart marked this pull request as ready for review August 30, 2023 14:09
@p-goulart (Collaborator Author)
@susanaboatto have you had a moment to review this? I'm afraid this PR might go stale.

@@ -38791,6 +38791,71 @@ Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.


<category id='COMPOUNDING' name="Palavras Compostas" type="misspelling">
<rule id="SEM_ABRIGO" name="Composição de sem-abrigo">
<pattern>
<token postag_regexp="yes" postag="[ADP].+M[SP].+|Z.+|SENT_START"></token>
@susanaboatto (Collaborator) commented Aug 31, 2023


I'm a bit wary of including adjectives here; I suspect some false positives (FPs) will happen. Hopefully few enough to tackle with antipatterns. For example: Eu com casa e o bonito sem abrigo (roughly, "me with a home, and the pretty one without shelter").

The same goes for DP* (possessive determiners):

O meu em casa e o seu sem abrigo (roughly, "mine at home, and yours without shelter")

Maybe adjectives and possessives should be dealt with separately.

@p-goulart (Collaborator Author)


Hm, you're not wrong. I'll remove adjectives from here and see what happens.
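A minimal sketch of what the first `SEM_ABRIGO` token might look like with adjectives dropped. It assumes the `[ADP]` character class in the POS regex stands for adjective (A), determiner (D) and pronoun (P) tags, that `M[SP]` means masculine singular/plural, and that `Z.+` covers numerals; these tagset readings are assumptions, not confirmed in this thread. This is illustrative only and not necessarily the change that was merged. Note that removing `A` still leaves possessive determiners matching, which the review suggests handling separately (for example with an antipattern).

```xml
<!-- Hypothetical sketch: first token of SEM_ABRIGO with adjective tags (A...) removed. -->
<!-- Still matches determiners/pronouns (including possessives), numerals (Z...) and sentence start. -->
<token postag_regexp="yes" postag="[DP].+M[SP].+|Z.+|SENT_START"></token>
```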

@p-goulart p-goulart merged commit 78ed25a into master Aug 31, 2023
1 check passed
@p-goulart p-goulart deleted the pt/grammar/iffy_compounds branch August 31, 2023 14:10

2 participants