[pt] Added/improved APs in rule ID:MORRER_PERECER_FALECER #10621

Merged
merged 1 commit into from
May 28, 2024

marcoagpinto
Member

Heya, @susanaboatto and @p-goulart ,

My improvements to this rule yesterday removed the following false positives (FPs) from the pt-BR regression-test results:
https://internal1.languagetool.org/regression-tests/via-http/2024-05-27/pt-BR/
MORRER_PERECER_FALECER[1]: 76 FPs
MORRER_PERECER_FALECER[2]: 3 FPs

With this PR it should remove many more.

I guess I will look at tonight's results and remove the "temp_off" tomorrow.

Thanks!

Collaborator

@p-goulart p-goulart left a comment

LGTM, but I think this rule is a great example of why we need wordnet-style semantic categories.

Where are you getting these exceptions from? Are you using a specific corpus?

@marcoagpinto
Member Author

> LGTM, but I think this rule is a great example of why we need wordnet-style semantic categories.
>
> Where are you getting these exceptions from? Are you using a specific corpus?

@p-goulart

I used the trick of adding "12 34 56." in front of each line of the wordlist, because otherwise most lines were being skipped:
Portuguese (Portugal): 11546 input lines ignored (e.g. not between 10 and 300 chars or at least 4 tokens)

Now I have 40 000 or 50 000 more valid sentences to check, far more than the ones in the nightly results.
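
For readers who want to reproduce the padding trick, here is a minimal sketch. It assumes the wordlist is plain text with one entry per line, and it reads the message above as "a line is kept only if it has 10 to 300 characters and at least 4 tokens". The function names are hypothetical, and `passes_filter` only approximates the checker with a whitespace split; it is not LanguageTool's real tokenizer.

```python
PREFIX = "12 34 56. "  # padding text quoted in the comment above


def pad_line(line: str, prefix: str = PREFIX) -> str:
    """Prefix one wordlist entry so that it reaches the minimum length and token count."""
    return prefix + line.strip()


def passes_filter(line: str) -> bool:
    """Rough stand-in (an assumption) for the limits quoted in the log message."""
    return 10 <= len(line) <= 300 and len(line.split()) >= 4


def pad_wordlist(lines):
    """Pad every non-empty entry; blank lines are dropped."""
    return [pad_line(line) for line in lines if line.strip()]


if __name__ == "__main__":
    for padded in pad_wordlist(["gato", "", " cachorro \n"]):
        print(padded, passes_filter(padded))
```

A single word such as "gato" fails the approximate filter on its own (4 characters, 1 token), while the padded form "12 34 56. gato" has 14 characters and 4 whitespace-separated tokens.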

Now I can finally do deep testing of the rules, although it takes a very long time.

If I am able to buy an i9 computer next year, I will merge the PT-BR and PT-PT wordlists into one and have hundreds of thousands more sentences to check.

Anyway, the short answer: for example, if adding "gato" removed FPs, then I also added "cão" and "cachorro". That is one of the tricks I used in this rule.
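
The "gato" to "cão"/"cachorro" step can be sketched as a lookup in a small table of semantic categories, which is roughly what the wordnet-style suggestion above would automate. Everything here is an assumption for illustration: the `CATEGORIES` table is hand-made, `expand_exceptions` is a hypothetical helper, and neither comes from the rule file or from a real lexicon.

```python
# Hypothetical, hand-made categories (not taken from the rule or from any real lexicon).
CATEGORIES = {
    "domestic_animal": {"gato", "cão", "cachorro"},
}


def expand_exceptions(seed_words, categories=CATEGORIES):
    """Return the seeds plus every word sharing a category with at least one seed."""
    seeds = set(seed_words)
    expanded = set(seeds)
    for members in categories.values():
        if seeds & members:  # a seed belongs to this category
            expanded |= members
    return sorted(expanded)


if __name__ == "__main__":
    print(expand_exceptions(["gato"]))
```

Words that match no category are returned unchanged, so the helper never drops an exception that was added by hand.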

@marcoagpinto marcoagpinto merged commit 3d5e318 into master May 28, 2024
2 checks passed
@marcoagpinto marcoagpinto deleted the lt_marcoagpinto_20240528_0848 branch May 28, 2024 08:29