[pt] Fixed FPs in rule ID:TOMAR_ASSUMIR #9247
```diff
@@ -8316,18 +8316,21 @@ USA

<rule id='TOMAR_ASSUMIR' name="[Universitário][Científico] V. Tomar → V. Assumir" tone_tags="academic" default="temp_off">
  <pattern>
    <token postag='AQ.+|NC.+|NP.+|CS|CC' postag_regexp='yes'>
      <exception postag_regexp='no' postag='RG'/> <!-- Add more exceptions here as they are found -->
    </token>
```
It appears to me that the problem is the double meaning of “tomar”: the rule doesn't work when the verb is used literally (e.g., for drinks, medication, etc.). So the question is: does this exception fix that issue? If not, what does it exclude?
We'd need quite a long list of exceptions in `<token postag='AQ.+|NC.+' postag_regexp='yes'/>` to avoid the false alarms caused by the literal sense. Some I can think of: `drinks|bebidas|cafés|su[mc]os|vinhos|cervejas|sorvetes|gelados|remédios|medicações|medicamentos|shots|chás|táxis|carros|ônibus|autocarros|trens|comboios|voos`, etc.
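A minimal sketch (not part of the PR) of how the literal-use nouns listed above could be excluded, assuming they are attached as a regexp `<exception>` on the noun/adjective token that follows "tomar". The rest of the rule's pattern is omitted, and where exactly this token sits in the final rule is an assumption:

```xml
<!-- Sketch only: the noun/adjective token from the suggestion above,
     with nouns that signal the literal sense of "tomar" excluded. -->
<token postag='AQ.+|NC.+' postag_regexp='yes'>
  <exception regexp='yes'>drinks|bebidas|cafés|su[mc]os|vinhos|cervejas|sorvetes|gelados|remédios|medicações|medicamentos|shots|chás|táxis|carros|ônibus|autocarros|trens|comboios|voos</exception>
</token>
```

Because the list uses plural surface forms, singular forms such as "café" or "táxi" would still match unless they are added too, or unless the exception is rewritten to list lemmas with `inflected='yes'`.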
The RG exception fixes this sentence ("já" is tagged RG):
All other FPs in the link above were fixed by restricting (i.e., shortening) the list of possible words and by adding the following as the start token:
This is the full rule:
Could you try one of the drink examples with the rule? Thanks! ❤️ ❤️ ❤️ ❤️ ❤️ ❤️
Ahhhh... and if needed, we can add the most likely exceptions.
I see matches where "tomar" was used correctly, such as:
doesn't address the root cause, which seems to be contextual to me. For example, this still matches cases like:
And it fails to detect:
It appears only exceptions to
What shall I do? 😋
This is easy to improve: in the first token I can have
Give me some tips, and tonight I will improve the rule and create another pull request.
Thanks!
I'd say delete
...unless this restriction removes false positives that I'm not seeing, other than those related to the verb. Then add those words as an exception to the last token.
Heya, Susana, I have improved the rule:
I had to add the first line to remove false positives, such as:
So, here is what the rule looks like now:
Thanks! 😋 😛 ❤️ 🤗
I still don't understand the restriction on the first token. Can you give examples of the sentences it excludes?
I created the rule two or three weeks ago and can't remember the reason for the restriction, but it is better to have a stricter rule than false positives.
Heya, @susanaboatto and @p-goulart
I was looking at the nightly results and there were several false positives, so I have fixed them.
Here is the fix.
In the next nightly results, I will see if the FPs are all gone and if so, I will remove the temp_off.
Thanks!