Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pt] Fix hexadecimal number bug #10601

Merged
merged 3 commits into from
May 20, 2024
Merged

Conversation

p-goulart
Copy link
Collaborator

@p-goulart p-goulart commented May 20, 2024

This PR fixes a few FPs with valid hexadecimals like 0x5A stemming from the multiplication rule1.

To achieve that, we:

  • fix a logic bug in RegexAntiPatternFilter (@jaumeortola);
  • add a RegexAntiPatternFilter to MULTIPLICATION_SIGN;
  • add a couple of hexadecimal examples to MULTIPLICATION_SIGN;
  • add HEXADECIMAL_NOTATION and OCTAL_NOTATION rules to the disambiguator so we can reliably ignore spelling for valid hexadecimal and octal numerals2;
  • add tests to the speller rule to make sure we're ignoring what we should be ignoring.

1: this is not a new issue, but the matches appear as new because of the recent improvements to the disambiguator.

2: I decided to only ignore spelling of C-style hexadecimal and C-inspired octal numerals (i.e. 0x... and 0o..., respectively). Most other notations seemed like they'd conflict with legitimate spelling issues, and I didn't want to be too aggressive.

 - by doing `if (matcher.find())` we were only ever checking the first
   match within a single text sample;

 - changing it from an `if` block to a `while` loop means that, so long
   as there are instances to find, we will continue comparing them
   with those in the actual rule match object;

 - this bug affected any usage of RegexAntiPatternFilter where the rule
   regexp pattern matched multiple times in the same sentence.
 - the tailing letters (usually measurement unit abbreviations) must be
   present in the regexp if we want the regexp antipattern to function
   properly;

 - add a couple of examples testing hex numbers;

 - add a couple of speller rule tests to drive the point home.
@@ -3466,6 +3466,20 @@
<disambig action="ignore_spelling"/>
</rule>

<rule id="HEXADECIMAL_NOTATION">
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 'name' attribute is missing. I think it is required in some tests.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a rule within a rulegroup, though, and the rulegroup has both a name and an id attribute. Should be fine..?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably yes. If the tests pass, it is fine.

@p-goulart p-goulart merged commit 2bd167a into master May 20, 2024
2 checks passed
@p-goulart p-goulart deleted the pt/grammar/octal_multiplication branch May 20, 2024 11:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants