[pt-PT] Fixed FPs in rule ID:VERBO_PARA_PRONOME_PESSOAL_V2 #10925
Walkthrough: The pull request includes extensive modifications to the …
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/pt-PT/style.xml (1)
Line range hint 1-4995: Comprehensive enhancements to Portuguese style checking. The changes in this file significantly expand the capabilities of the Portuguese language module in LanguageTool. New categories and rules have been added to address various aspects of formal, academic, and concise writing. The additions are well-structured and consistent with the existing format.
While the changes appear to be valuable improvements, I recommend the following:
- Conduct comprehensive testing of the new rules to ensure they work as intended across various contexts and don't produce unexpected results.
- Update the documentation to reflect these new capabilities and provide guidance on their usage.
- Consider the performance implications of adding so many new rules and optimize if necessary.
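For the testing recommendation, here is a minimal Bash sketch of per-rule positive and negative checks. It assumes a built `languagetool-commandline.jar` (path read from `LT_JAR`, a hypothetical variable), that `-` reads the text from standard input as in the scripts later in this review, and that the plain-text output reports each match with a `Rule ID: …` field. `lt_run`, `rule_ids`, `rule_fires`, `expect_rule` and `expect_no_rule` are illustrative names, not LanguageTool APIs. The rule under test is enabled with `-e` so that style rules that are off by default still run.

```bash
#!/usr/bin/env bash
# Minimal positive/negative checks for a single LanguageTool rule (sketch).
# Assumption: LT_JAR points at a built languagetool-commandline.jar.
LT_JAR="${LT_JAR:-languagetool-commandline.jar}"

# Check the text on stdin for pt-PT; extra arguments are passed to LanguageTool.
lt_run() {
  java -jar "$LT_JAR" -l pt-PT "$@" -
}

# Print the base rule ID of every match, dropping the "[n]" sub-rule suffix.
rule_ids() {
  sed -n 's/.*Rule ID: \([^][ ]*\).*/\1/p'
}

# True if RULE (explicitly enabled, in case it is off by default) fires on SENTENCE.
rule_fires() {
  local sentence="$1" rule="$2"
  printf '%s\n' "$sentence" | lt_run -e "$rule" | rule_ids | grep -qxF "$rule"
}

# Expect a match (catches false negatives).
expect_rule() {
  if rule_fires "$1" "$2"; then
    echo "OK    $2 fires: $1"
  else
    echo "FAIL  $2 missing: $1"
    return 1
  fi
}

# Expect no match (catches false positives).
expect_no_rule() {
  if rule_fires "$1" "$2"; then
    echo "FAIL  $2 false positive: $1"
    return 1
  else
    echo "OK    $2 silent: $1"
  fi
}
```

For example, if CICLO_VICIOSO behaves as described below, `expect_rule "É um ciclo vicioso." CICLO_VICIOSO` and `expect_no_rule "É um círculo vicioso." CICLO_VICIOSO` should both print `OK`. Pairing each positive case with a corrected or antipattern case is what catches false positives like the ones this PR fixes.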
Given the extensive additions, it might be worth considering a modular approach for future expansions. This could involve separating rules into smaller, more manageable files based on categories or linguistic features. This approach could improve maintainability and make it easier to enable or disable specific sets of rules.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
- languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/pt-PT/style.xml (2 hunks)
🔇 Additional comments (22)
languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/pt-PT/style.xml (22)
Line range hint 1-25: XML structure and copyright information look good. The file starts with proper XML declarations and includes comprehensive copyright information. This is a good practice for maintaining proper documentation and licensing.
Line range hint 27-63: Entity declarations are well-organized and comprehensive. The entity declarations provide a good foundation for the rules that follow. They include various linguistic elements specific to Portuguese, which will be useful in the rule definitions.
Line range hint 66-69: New rule 'CONFUSÃO_CAIXA_EMBALAGEM' added. This rule addresses the confusion between "caixa" (box) and "embalagem" (package) in the context of pharmaceuticals. It's a good addition for improving clarity in medical texts.
Line range hint 82-96: New rule 'PRAZER_GOSTO' added. This rule suggests using "gosto" instead of "prazer" in certain contexts, which can help achieve a more formal tone. The rule seems well-constructed, with appropriate exceptions.
Line range hint 99-108: New rule 'CICLO_VICIOSO' added. This rule corrects the common mistake of using "ciclo vicioso" instead of the correct "círculo vicioso". The rule includes a helpful URL for further information.
Line range hint 203-215: New rule 'AVOID_GERUND_ALL' added. This rule is a more aggressive version of the AVOID_GERUND rulegroup defined just before it. It is set to 'off' by default, which is a good choice given its potentially broad impact: users who want this stricter style can opt in.
Line range hint 218-232: New rule 'EM_DETALHE_DETALHADAMENTE_PT_PT' added. This rule suggests using "detalhadamente" instead of "em detalhe", which can help achieve a more concise writing style. The rule seems well-constructed, with appropriate pattern matching.
Line range hint 235-285: New rulegroup 'EM_RELAÇÃO_RELATIVAMENTE_PT_PT' added. This rulegroup simplifies phrasing by suggesting "relativamente a" in place of "em relação a". The rules are well-structured, with separate handling for different prepositions, and the antipatterns help prevent false positives.
Line range hint 287-305: New rule 'ENSINO_SUPERIOR' added. This rule suggests using "ensino" instead of "educação" when referring to higher education institutions. It's a good addition for maintaining consistency in academic contexts.
Line range hint 308-316: New rule 'FAZER_DESPORTO' added. This rule suggests using "praticar" instead of "fazer" when referring to sports activities. It's a good stylistic improvement for more formal contexts.
Line range hint 1025-1043: New rule 'VELHA_ESCOLA_VELHA_GUARDA' added. This rule suggests using "velha guarda" instead of "velha escola" in idiomatic expressions. It's a good addition for maintaining consistency in formal writing.
Line range hint 1045-1099: New rulegroup 'PORTUGUESE_ADDRESSES' added. This rulegroup contains rules for formatting Portuguese addresses correctly. The rules seem well-constructed and include appropriate antipatterns to prevent false positives.
Line range hint 1102-1113: New rule 'ALUGAR_CASA' added. This rule suggests using "arrendar" instead of "alugar" for renting properties. It's a good stylistic improvement for more formal contexts.
Line range hint 1116-1138: New rule 'CONCLUIR_UM_CURSO' added. This rule suggests using "concluir" instead of "acabar/terminar" when referring to completing academic courses. It's a good addition for maintaining a formal tone in academic contexts.
Line range hint 1310-1425: New rulegroup 'ORDINAL_ABREVIATION' added. This rulegroup deals with the correct abbreviation of ordinal numbers in Portuguese. The rules are well-structured and include appropriate antipatterns to prevent false positives.
Line range hint 1433-2022: New category 'FORMAL_SPEECH_PT_PT' added with multiple rules. This category contains several rules aimed at improving formal speech in Portuguese. The rules cover various aspects such as word choice, phrase structure, and idiomatic expressions. The rules seem well-constructed and include appropriate examples and antipatterns.
Line range hint 2025-2134: New category 'ACADEMIC_SCIENTIFIC_PT_PT' added with multiple rules. This category contains rules specifically tailored for academic and scientific writing in Portuguese. The rules suggest more formal or precise alternatives for certain terms and phrases. The rules are well-structured and include appropriate examples.
Line range hint 2137-2393: New category 'REDUNDANCY_PT_PT' added with multiple rules. This category contains rules for identifying and eliminating redundant expressions in Portuguese. The rules cover a wide range of common pleonasms and redundancies. The rules are well-structured and include appropriate examples.
Line range hint 319-1022: Extensive rulegroup 'INFORMALITIES' added. This large rulegroup contains numerous rules for detecting and suggesting alternatives to informal language. The rules cover a wide range of colloquialisms and informal expressions. While comprehensive, the size and complexity of this rulegroup might make it challenging to maintain and update in the future.
To ensure the effectiveness of these informality rules, it would be beneficial to run some tests. Here's a script to help with that:
    #!/bin/bash
    # Description: Test the INFORMALITIES rules with various informal expressions

    # Test cases
    expressions=(
      "anhar" "bater papo" "bicho geográfico" "boca do estômago" "bola pra frente"
      "coisa e tal" "dar uma de" "de borla" "e aí" "encher o saco" "fala aí"
      "fazer nas coxas" "mamão com açúcar" "morrer de rir" "no geral" "numa boa"
      "pois é" "querer por querer"
    )

    # Run each expression through LanguageTool
    for expr in "${expressions[@]}"; do
      echo "Testing: $expr"
      echo "Ele estava $expr ontem." | java -jar languagetool-commandline.jar -l pt-PT -
      echo "---"
    done

This script will help verify that the informality rules are catching these expressions and suggesting appropriate alternatives.
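The script above prints the raw output for every expression, which is hard to scan. A variation, as a sketch: it prints one line per expression saying whether any rule fired and which IDs, so missed expressions stand out. It keeps the review script's assumptions (JAR name, `-` for standard input, the "Ele estava … ontem." frame) and additionally assumes the plain-text `Rule ID: …` output format; `lt_run`, `rule_ids` and `report` are illustrative names.

```bash
#!/usr/bin/env bash
# Sketch: summarise which informal expressions produce at least one match.
LT_JAR="${LT_JAR:-languagetool-commandline.jar}"

# Check stdin with the default pt-PT rules (same call as the script above).
lt_run() { java -jar "$LT_JAR" -l pt-PT -; }

# Unique base rule IDs of all matches, comma-separated on one line.
rule_ids() {
  sed -n 's/.*Rule ID: \([^][ ]*\).*/\1/p' | sort -u | paste -sd, -
}

# One tab-separated line per expression:
#   FLAGGED<TAB>ids<TAB>expr   or   MISSED<TAB>-<TAB>expr
report() {
  local expr ids
  for expr in "$@"; do
    ids=$(printf 'Ele estava %s ontem.\n' "$expr" | lt_run | rule_ids)
    if [ -n "$ids" ]; then
      printf 'FLAGGED\t%s\t%s\n' "$ids" "$expr"
    else
      printf 'MISSED\t-\t%s\n' "$expr"
    fi
  done
}
```

Usage: `report "de borla" "numa boa" "bater papo"`. Listing the IDs matters because a spelling or unrelated rule can also fire on these inputs; only an INFORMALITIES ID confirms that the rulegroup itself caught the expression.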
Line range hint 1141-1307: New rule 'BARBARISMS_PT_PT_V3' added. This extensive rule deals with foreign terms (barbarisms) in Portuguese text. It suggests putting these terms in quotes or italics. The rule includes a comprehensive list of foreign terms and appropriate antipatterns to prevent false positives.
To ensure the effectiveness of this barbarism rule, it would be helpful to run some tests. Here's a script to assist with that:
    #!/bin/bash
    # Description: Test the BARBARISMS_PT_PT_V3 rule with various foreign terms

    # Test cases (sample from the extensive list)
    terms=(
      "software" "hardware" "online" "offline" "email" "smartphone" "tablet"
      "blog" "chat" "design" "marketing" "feedback" "workshop" "startup" "freelancer"
    )

    # Run each term through LanguageTool
    for term in "${terms[@]}"; do
      echo "Testing: $term"
      echo "O $term é muito útil." | java -jar languagetool-commandline.jar -l pt-PT -
      echo "---"
    done

This script will help verify that the barbarism rule is correctly identifying these foreign terms and suggesting appropriate formatting.
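Because the rule asks for foreign terms to be put in quotes or italics, a term that is already quoted should presumably not be flagged again. A sketch that checks both directions for each term, assuming that the rule's antipatterns accept guillemets (« ») and straight double quotes; the review does not say which quote styles are covered, so the list should be adjusted to what `style.xml` actually exempts. Italics cannot be expressed in plain text and are not tested. `lt_run`, `fires` and `check_term` are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Sketch: BARBARISMS_PT_PT_V3 should flag a bare foreign term and
# (assumption) leave the same term alone once it is quoted.
LT_JAR="${LT_JAR:-languagetool-commandline.jar}"
RULE=BARBARISMS_PT_PT_V3

# Check stdin for pt-PT with the rule explicitly enabled.
lt_run() { java -jar "$LT_JAR" -l pt-PT -e "$RULE" -; }

# Succeed if RULE fires on the sentence given as $1.
fires() {
  printf '%s\n' "$1" | lt_run \
    | sed -n 's/.*Rule ID: \([^][ ]*\).*/\1/p' | grep -qxF "$RULE"
}

# Report MISSED / FALSE POS. / OK for one term; non-zero on any problem.
check_term() {
  local term="$1" quoted status=0
  fires "O $term é muito útil." || { echo "MISSED      $term"; status=1; }
  for quoted in "«$term»" "\"$term\""; do
    if fires "O $quoted é muito útil."; then
      echo "FALSE POS.  $quoted"
      status=1
    fi
  done
  [ "$status" -eq 0 ] && echo "OK          $term"
  return "$status"
}
```

Usage: `for t in software online startup; do check_term "$t"; done`.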
Line range hint 2396-4995: New category 'SHORTEN_IT_PT_PT' added with multiple rules. This extensive category contains rules aimed at simplifying and shortening various expressions in Portuguese. The rules cover a wide range of constructions and suggest more concise alternatives. The rules are well-structured and include appropriate examples and antipatterns. However, the complexity and number of rules in this category might warrant additional testing to ensure they don't produce unexpected results in edge cases.
To ensure the effectiveness of these simplification rules, it would be beneficial to run some tests. Here's a script to help with that:
    #!/bin/bash
    # Description: Test the SHORTEN_IT_PT_PT rules with various expressions

    # Test cases (sample from the category)
    expressions=(
      "À procura de" "A opinião de vocês" "Avião de combate" "Chegar ao fim"
      "De você" "Depois é que" "É necessário ter" "E que" "O que me foi dito foi"
      "Ser igual a" "Para mim" "Verbo Pronominal de" "Ser útil para"
    )

    # Run each expression through LanguageTool
    for expr in "${expressions[@]}"; do
      echo "Testing: $expr"
      echo "Ele estava $expr ontem." | java -jar languagetool-commandline.jar -l pt-PT -
      echo "---"
    done

This script will help verify that the simplification rules are correctly identifying these expressions and suggesting more concise alternatives.
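The script above embeds every expression in the frame "Ele estava … ontem.", which gives ungrammatical input for most of them (for example "Ele estava Ser útil para ontem."), so a missing or spurious match may reflect the frame rather than the rule. A table-driven sketch with one natural sentence per case, and with cases that must not match alongside those that must, could look like this. The rule IDs are placeholders to be taken from `style.xml` (the review names only the category `SHORTEN_IT_PT_PT`), and `lt_run`, `rule_ids` and `run_table` are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Sketch: table-driven checks with one natural sentence per case.
# Input lines (tab-separated):  +RULE_ID<TAB>sentence  -> RULE_ID must fire
#                               -RULE_ID<TAB>sentence  -> RULE_ID must not fire
LT_JAR="${LT_JAR:-languagetool-commandline.jar}"

# Check stdin for pt-PT; extra args (e.g. -e RULE) are passed through.
lt_run() { java -jar "$LT_JAR" -l pt-PT "$@" -; }

# Base rule IDs of all matches ("[n]" sub-rule suffix removed).
rule_ids() { sed -n 's/.*Rule ID: \([^][ ]*\).*/\1/p'; }

# Read the table from stdin, print one FAIL line per unmet expectation,
# and return non-zero if any case failed.
run_table() {
  local expect sentence rule hit failures=0
  while IFS=$'\t' read -r expect sentence; do
    [ -n "$sentence" ] || continue        # skip blank or malformed lines
    rule=${expect#[+-]}                   # strip the leading + or -
    if printf '%s\n' "$sentence" | lt_run -e "$rule" | rule_ids | grep -qxF "$rule"; then
      hit=1
    else
      hit=0
    fi
    case "$expect" in
      +*) [ "$hit" -eq 1 ] && continue ;;
      -*) [ "$hit" -eq 0 ] && continue ;;
    esac
    printf 'FAIL\t%s\t%s\n' "$expect" "$sentence"
    failures=$((failures + 1))
  done
  [ "$failures" -eq 0 ]
}
```

Usage: `printf '+%s\t%s\n' SOME_RULE_ID "Ele anda à procura de emprego." | run_table`, where `SOME_RULE_ID` stands for a real ID from the category. Keeping the cases in a file also makes them reusable as regression checks after later edits.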
Line range hint 111-201: New rulegroup 'AVOID_GERUND' added with multiple subrules. This comprehensive rulegroup aims to reduce the use of gerunds in favor of more formal constructions. The rules are well-structured with multiple antipatterns to avoid false positives. However, the complexity of these rules might warrant additional testing to ensure they don't produce unexpected results in edge cases.
To ensure the effectiveness and accuracy of these gerund-related rules, it would be beneficial to run additional tests. Here's a script to help with that:
This script will help verify that the rules are working as intended and not producing false positives or negatives.
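A complementary way to look for unexpected results in edge cases is to compare the matches that two builds produce on the same corpus, for example before and after a false-positive fix. A sketch, assuming the plain-text output line format `N.) Line L, column C, Rule ID: ID`, the `--enabledonly`/`-e` options, a corpus file passed by path, and two JAR paths you supply; it is not LanguageTool's own nightly-diff tooling, and `matches` and `rule_diff` are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch: compare one rule's matches between two LanguageTool builds
# (e.g. before and after a false-positive fix) on the same corpus.
# Assumptions: corpus is a plain-text file; JAR paths are supplied by you.

# Print "line:column RULE_ID" for every match of RULE in CORPUS, sorted.
matches() {
  local jar="$1" rule="$2" corpus="$3"
  java -jar "$jar" -l pt-PT --enabledonly -e "$rule" "$corpus" \
    | sed -n 's/^[0-9]*\.) Line \([0-9]*\), column \([0-9]*\), Rule ID: \([^][ ]*\).*/\1:\2 \3/p' \
    | sort
}

# "<" lines: only the old build matched (e.g. removed false positives).
# ">" lines: only the new build matches (new hits to review).
rule_diff() {
  local old_jar="$1" new_jar="$2" rule="$3" corpus="$4"
  diff <(matches "$old_jar" "$rule" "$corpus") <(matches "$new_jar" "$rule" "$corpus")
}
```

Usage: `rule_diff old/languagetool-commandline.jar new/languagetool-commandline.jar AVOID_GERUND corpus-pt-PT.txt` (paths hypothetical). Reviewing only the `>` lines keeps the effort proportional to what actually changed.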
Just fixed false positives from the nightly diff.