[pt-PT] Fixed FPs in rule ID:VERBO_PARA_PRONOME_PESSOAL_V2 #10925

Merged
marcoagpinto merged 1 commit into master from lt_marcoagpinto_20241013_0932
Oct 13, 2024

Conversation

marcoagpinto
Member

@marcoagpinto marcoagpinto commented Oct 13, 2024

Just fixed false positives from the nightly diff.

Summary by CodeRabbit

  • New Features
    • Introduced new rules for improved grammatical correctness and stylistic quality in Portuguese text.
    • Added rules to address common linguistic issues and enhance formal language use.
    • Implemented rules to eliminate redundancy and simplify expressions.
    • Included antipatterns to catch informal language usage and pleonasms.

Contributor

coderabbitai bot commented Oct 13, 2024

Walkthrough

The pull request includes extensive modifications to the style.xml file for the Portuguese language module in LanguageTool. It introduces and refines various language rules to enhance grammatical correctness and stylistic quality. Key changes involve the addition of new rules addressing common linguistic issues, the enhancement of existing rules for clarity in formal language use, and the implementation of rules to eliminate redundancy and simplify expressions. Overall, the changes expand the range of linguistic phenomena covered, maintaining the XML structure while significantly enhancing the content.

Changes

File Path Change Summary
languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/pt-PT/style.xml Added and modified numerous rules and rule groups focusing on grammatical correctness, stylistic quality, and simplification of Portuguese language expressions.

Possibly related PRs

Suggested reviewers

  • p-goulart
  • susanaboatto

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/pt-PT/style.xml (1)

Line range hint 1-4995: Comprehensive enhancements to Portuguese style checking.

The changes in this file significantly expand the capabilities of the Portuguese language module in LanguageTool. New categories and rules have been added to address various aspects of formal, academic, and concise writing. The additions are well-structured and consistent with the existing format.

While the changes appear to be valuable improvements, I recommend the following:

  1. Conduct comprehensive testing of the new rules to ensure they work as intended across various contexts and don't produce unexpected results.
  2. Update the documentation to reflect these new capabilities and provide guidance on their usage.
  3. Consider the performance implications of adding so many new rules and optimize if necessary.

Given the extensive additions, it might be worth considering a modular approach for future expansions. This could involve separating rules into smaller, more manageable files based on categories or linguistic features. This approach could improve maintainability and make it easier to enable or disable specific sets of rules.
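
Before splitting the file, it may help to measure how large it is. Here's a rough sketch that counts the opening tags for categories, rulegroups, and rules. It assumes each opening tag appears as literal text such as `<rule ` or `<rule>`; `count_elements` and `inventory` are hypothetical helpers, not part of LanguageTool.

```shell
#!/bin/bash
# Rough inventory of a LanguageTool rule file (hypothetical helpers).
# Counts opening tags by text match, so commented-out rules are counted too.

count_elements() {
    local tag="$1" file="$2"
    # -o prints every match on its own line, so several tags on one line all count.
    # The trailing [ >] keeps "<rule" from also matching "<rulegroup" or "<rules".
    grep -o "<${tag}[ >]" "$file" | wc -l | tr -d ' '
}

inventory() {
    local file="$1"
    echo "categories: $(count_elements category "$file")"
    echo "rulegroups: $(count_elements rulegroup "$file")"
    echo "rules: $(count_elements rule "$file")"
}
```

Running `inventory` on style.xml before and after a change gives a quick sense of how much the file has grown.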

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 0fd427c and 738ce79.

📒 Files selected for processing (1)
  • languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/pt-PT/style.xml (2 hunks)
🧰 Additional context used
🔇 Additional comments (22)
languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/pt-PT/style.xml (22)

Line range hint 1-25: XML structure and copyright information look good.

The file starts with proper XML declarations and includes comprehensive copyright information. This is a good practice for maintaining proper documentation and licensing.


Line range hint 27-63: Entity declarations are well-organized and comprehensive.

The entity declarations provide a good foundation for the rules that follow. They include various linguistic elements specific to Portuguese, which will be useful in the rule definitions.


Line range hint 66-69: New rule 'CONFUSÃO_CAIXA_EMBALAGEM' added.

This rule addresses the confusion between "caixa" (box) and "embalagem" (package) in the context of pharmaceuticals. It's a good addition for improving clarity in medical-related text.


Line range hint 82-96: New rule 'PRAZER_GOSTO' added.

This rule suggests using "gosto" instead of "prazer" in certain contexts, which can help in achieving a more formal tone. The rule seems well-constructed with appropriate exceptions.


Line range hint 99-108: New rule 'CICLO_VICIOSO' added.

This rule corrects the common mistake of using "ciclo vicioso" instead of the correct "círculo vicioso". The rule includes a helpful URL for further information.


Line range hint 203-215: New rule 'AVOID_GERUND_ALL' added.

This rule is a more aggressive version of the previous gerund-avoiding rules. It's set to 'off' by default, which is a good choice given its potentially broad impact. This allows users to opt-in to this stricter style if desired.


Line range hint 218-232: New rule 'EM_DETALHE_DETALHADAMENTE_PT_PT' added.

This rule suggests using "detalhadamente" instead of "em detalhe", which can help in achieving a more concise writing style. The rule seems well-constructed with appropriate pattern matching.


Line range hint 235-285: New rulegroup 'EM_RELAÇÃO_RELATIVAMENTE_PT_PT' added.

This rulegroup aims to simplify phrases using "em relação a" to "relativamente a". The rules are well-structured with separate handling for different prepositions. The antipatterns help prevent false positives.


Line range hint 287-305: New rule 'ENSINO_SUPERIOR' added.

This rule suggests using "ensino" instead of "educação" when referring to higher education institutions. It's a good addition for maintaining consistency in academic contexts.


Line range hint 308-316: New rule 'FAZER_DESPORTO' added.

This rule suggests using "praticar" instead of "fazer" when referring to sports activities. It's a good stylistic improvement for more formal contexts.


Line range hint 1025-1043: New rule 'VELHA_ESCOLA_VELHA_GUARDA' added.

This rule suggests using "velha guarda" instead of "velha escola" in idiomatic expressions. It's a good addition for maintaining consistency in formal writing.


Line range hint 1045-1099: New rulegroup 'PORTUGUESE_ADDRESSES' added.

This rulegroup contains rules for formatting Portuguese addresses correctly. The rules seem well-constructed and include appropriate antipatterns to prevent false positives.


Line range hint 1102-1113: New rule 'ALUGAR_CASA' added.

This rule suggests using "arrendar" instead of "alugar" for renting properties. It's a good stylistic improvement for more formal contexts.


Line range hint 1116-1138: New rule 'CONCLUIR_UM_CURSO' added.

This rule suggests using "concluir" instead of "acabar/terminar" when referring to completing academic courses. It's a good addition for maintaining a formal tone in academic contexts.


Line range hint 1310-1425: New rulegroup 'ORDINAL_ABREVIATION' added.

This rulegroup deals with the correct abbreviation of ordinal numbers in Portuguese. The rules are well-structured and include appropriate antipatterns to prevent false positives.


Line range hint 1433-2022: New category 'FORMAL_SPEECH_PT_PT' added with multiple rules.

This category contains several rules aimed at improving formal speech in Portuguese. The rules cover various aspects such as word choice, phrase structure, and idiomatic expressions. The rules seem well-constructed and include appropriate examples and antipatterns.


Line range hint 2025-2134: New category 'ACADEMIC_SCIENTIFIC_PT_PT' added with multiple rules.

This category contains rules specifically tailored for academic and scientific writing in Portuguese. The rules suggest more formal or precise alternatives for certain terms and phrases. The rules are well-structured and include appropriate examples.


Line range hint 2137-2393: New category 'REDUNDANCY_PT_PT' added with multiple rules.

This category contains rules for identifying and eliminating redundant expressions in Portuguese. The rules cover a wide range of common pleonasms and redundancies. The rules are well-structured and include appropriate examples.


Line range hint 319-1022: Extensive rulegroup 'INFORMALITIES' added.

This large rulegroup contains numerous rules for detecting and suggesting alternatives to informal language. The rules cover a wide range of colloquialisms and informal expressions. While comprehensive, the size and complexity of this rulegroup might make it challenging to maintain and update in the future.

To ensure the effectiveness of these informality rules, it would be beneficial to run some tests. Here's a script to help with that:

#!/bin/bash
# Description: Test the INFORMALITIES rules with various informal expressions

# Test cases
expressions=(
    "anhar"
    "bater papo"
    "bicho geográfico"
    "boca do estômago"
    "bola pra frente"
    "coisa e tal"
    "dar uma de"
    "de borla"
    "e aí"
    "encher o saco"
    "fala aí"
    "fazer nas coxas"
    "mamão com açúcar"
    "morrer de rir"
    "no geral"
    "numa boa"
    "pois é"
    "querer por querer"
)

# Run each expression through LanguageTool
for expr in "${expressions[@]}"; do
    echo "Testing: $expr"
    echo "Ele estava $expr ontem." | java -jar languagetool-commandline.jar -l pt-PT -
    echo "---"
done

This script will help verify that the informality rules are catching these expressions and suggesting appropriate alternatives.
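
The loop above prints the raw output for every phrase, so matches have to be spotted by eye. Here's a minimal sketch of a variant that reports only the phrases a given rule did not flag. It assumes the command-line output contains the text `Rule ID: <ID>` for each match; `report_unflagged` and the `LT_CHECK` variable are hypothetical names introduced for this example.

```shell
#!/bin/bash
# Report phrases that a given rule does NOT flag (hypothetical helper).
# LT_CHECK is the checker command; it must read the text to check on stdin.
# Assumption: each match in the checker output includes "Rule ID: <ID>".

LT_CHECK=${LT_CHECK:-"java -jar languagetool-commandline.jar -l pt-PT -"}

report_unflagged() {
    local rule="$1"; shift
    local phrase missed=0
    for phrase in "$@"; do
        # $LT_CHECK is intentionally unquoted so it splits into command and arguments.
        if ! echo "Ele estava $phrase ontem." | $LT_CHECK 2>/dev/null | grep -q "Rule ID: $rule"; then
            echo "not flagged: $phrase"
            missed=$((missed + 1))
        fi
    done
    echo "$missed of $# phrases not flagged by $rule"
}
```

For example, `report_unflagged INFORMALITIES "de borla" "numa boa"` lists any of the two phrases that the rulegroup missed and ends with a one-line tally.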


Line range hint 1141-1307: New rule 'BARBARISMS_PT_PT_V3' added.

This extensive rule deals with foreign terms (barbarisms) in Portuguese text. It suggests putting these terms in quotes or italics. The rule includes a comprehensive list of foreign terms and appropriate antipatterns to prevent false positives.

To ensure the effectiveness of this barbarism rule, it would be helpful to run some tests. Here's a script to assist with that:

#!/bin/bash
# Description: Test the BARBARISMS_PT_PT_V3 rule with various foreign terms

# Test cases (sample from the extensive list)
terms=(
    "software"
    "hardware"
    "online"
    "offline"
    "email"
    "smartphone"
    "tablet"
    "blog"
    "chat"
    "design"
    "marketing"
    "feedback"
    "workshop"
    "startup"
    "freelancer"
)

# Run each term through LanguageTool
for term in "${terms[@]}"; do
    echo "Testing: $term"
    echo "O $term é muito útil." | java -jar languagetool-commandline.jar -l pt-PT -
    echo "---"
done

This script will help verify that the barbarism rule is correctly identifying these foreign terms and suggesting appropriate formatting.


Line range hint 2396-4995: New category 'SHORTEN_IT_PT_PT' added with multiple rules.

This extensive category contains rules aimed at simplifying and shortening various expressions in Portuguese. The rules cover a wide range of constructions and suggest more concise alternatives. The rules are well-structured and include appropriate examples and antipatterns. However, the complexity and number of rules in this category might warrant additional testing to ensure they don't produce unexpected results in edge cases.

To ensure the effectiveness of these simplification rules, it would be beneficial to run some tests. Here's a script to help with that:

#!/bin/bash
# Description: Test the SHORTEN_IT_PT_PT rules with various expressions

# Test cases (sample from the category)
expressions=(
    "À procura de"
    "A opinião de vocês"
    "Avião de combate"
    "Chegar ao fim"
    "De você"
    "Depois é que"
    "É necessário ter"
    "E que"
    "O que me foi dito foi"
    "Ser igual a"
    "Para mim"
    "Verbo Pronominal de"
    "Ser útil para"
)

# Run each expression through LanguageTool
for expr in "${expressions[@]}"; do
    echo "Testing: $expr"
    echo "Ele estava $expr ontem." | java -jar languagetool-commandline.jar -l pt-PT -
    echo "---"
done

This script will help verify that the simplification rules are correctly identifying these expressions and suggesting more concise alternatives.
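
The fixed template sentence will not suit every expression in a category this large. An alternative is to reuse the `<example>` sentences that the rules themselves carry. Here's a sketch that extracts them as plain text. It assumes each example sits on a single line; `extract_examples` is a hypothetical helper, not part of LanguageTool.

```shell
#!/bin/bash
# Pull example sentences out of a LanguageTool rule file (hypothetical helper).
# Assumptions: each <example>...</example> is on one line, and XML entities
# such as &amp; are left undecoded.

extract_examples() {
    local file="$1"
    # Keep only the <example> elements, then strip every tag (including
    # <marker>) so that only the plain sentence remains.
    grep -o '<example[^>]*>.*</example>' "$file" | sed -e 's/<[^>]*>//g'
}
```

The output mixes sentences that should trigger a rule with sentences that should not, so it is best used to drive a check whose results are then compared against each rule's expectations. Each line can be piped to `java -jar languagetool-commandline.jar -l pt-PT -` in the same way as the loop above.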


Line range hint 111-201: New rulegroup 'AVOID_GERUND' added with multiple subrules.

This comprehensive rulegroup aims to reduce the use of gerunds in favor of more formal constructions. The rules are well-structured with multiple antipatterns to avoid false positives. However, the complexity of these rules might warrant additional testing to ensure they don't produce unexpected results in edge cases.

To ensure the effectiveness and accuracy of these gerund-related rules, it would be beneficial to run additional tests that confirm the rules work as intended and produce neither false positives nor false negatives.
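
A minimal sketch of such a test, in the same shape as the other loops in this review, is given below. The two sample sentences are illustrative assumptions and are not taken from style.xml: one uses a gerund, the other the "estar a" plus infinitive construction that is standard in European Portuguese. Whether the first is actually flagged depends on the rule's patterns. `gerund_sentences` and `run_gerund_checks` are hypothetical helpers.

```shell
#!/bin/bash
# Paired sentences for the gerund rules (illustrative; not taken from style.xml).
# Field 1 is the assumed expectation, field 2 the sentence, separated by a tab.

gerund_sentences() {
    printf 'expect-match\t%s\n' "Ele está trabalhando no projeto."
    printf 'expect-clean\t%s\n' "Ele está a trabalhar no projeto."
}

run_gerund_checks() {
    local tab expectation sentence
    tab=$(printf '\t')
    # Split each line on the tab only, so the sentence keeps its spaces.
    gerund_sentences | while IFS="$tab" read -r expectation sentence; do
        echo "Testing ($expectation): $sentence"
        echo "$sentence" | java -jar languagetool-commandline.jar -l pt-PT -
        echo "---"
    done
}
```

Keeping the expectation next to each sentence makes it easy to see a false positive (a match on an `expect-clean` line) or a false negative (no match on an `expect-match` line).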

@marcoagpinto marcoagpinto merged commit a3205ac into master Oct 13, 2024
5 checks passed
@marcoagpinto marcoagpinto deleted the lt_marcoagpinto_20241013_0932 branch October 13, 2024 10:32