Cb changes 20240830 #10856
Walkthrough
The pull request introduces multiple updates across various files in the LanguageTool project. Key changes include the addition of proper nouns and contemporary terms to spelling and ignore lists, enhancements to grammar-checking capabilities, and updates to the handling of specific linguistic patterns in both German and English. Together, these modifications expand the vocabulary and improve the accuracy of language processing.
Actionable comments posted: 0
Outside diff range and nitpick comments (1)
languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/hunspell/spelling.txt (1)
463-494: Approve the new entries with a minor suggestion.
The addition of these new entries, including contemporary terms, legal and financial terminology, and compound forms, improves the comprehensiveness of the spell checker. The variations of the words ensure that different forms are recognized as valid.
However, please consider the following suggestion:
Remove "bio-degradable" and "bio-degradables" from the list, as "biodegradable" (spelled as one word) is the more common and accepted spelling. The one-word form is already included in the list.
Tools
LanguageTool
[misspelling] ~490-~490: This word is normally spelled as one.
Context: ...demnifiers UV-degradable oxy-degradable bio-degradable bio-degradables biodegradable biodegrad...(EN_COMPOUNDS_BIO_DEGRADABLE)
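A check of this kind could be automated. The following is a minimal sketch, not part of LanguageTool: it assumes a spelling list with one entry per line, and the helper name `hyphenated_with_closed_form` is hypothetical. It reports hyphenated entries whose one-word spelling is also listed, using only entries visible in the context excerpt above.

```python
def hyphenated_with_closed_form(entries):
    """Return hyphenated entries whose one-word spelling is also in the list."""
    words = set()
    for raw in entries:
        entry = raw.strip()
        # Assumption: blank lines and lines starting with '#' carry no entry.
        if entry and not entry.startswith("#"):
            words.add(entry)
    flagged = []
    for word in sorted(words):
        if "-" in word and word.replace("-", "") in words:
            flagged.append(word)
    return flagged


# Entries taken from the context excerpt of the flagged line.
sample = [
    "UV-degradable",
    "oxy-degradable",
    "bio-degradable",
    "bio-degradables",
    "biodegradable",
]
print(hyphenated_with_closed_form(sample))
```

With this sample only "bio-degradable" is reported, because it is the only hyphenated entry whose closed form appears in the sample.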
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (11)
- languagetool-core/src/main/resources/org/languagetool/resource/spelling_global.txt (2 hunks)
- languagetool-language-modules/de/src/main/java/org/languagetool/rules/de/GermanUnpairedQuotesRule.java (1 hunks)
- languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/compounds.txt (1 hunks)
- languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/hunspell/ignore.txt (2 hunks)
- languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/hunspell/spelling.txt (2 hunks)
- languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/multitoken-suggest.txt (1 hunks)
- languagetool-language-modules/de/src/main/resources/org/languagetool/rules/de/remote-rule-filters.xml (1 hunks)
- languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/added.txt (1 hunks)
- languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/hunspell/ignore.txt (1 hunks)
- languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/hunspell/spelling.txt (2 hunks)
- languagetool-language-modules/en/src/main/resources/org/languagetool/rules/en/grammar.xml (10 hunks)
Files skipped from review due to trivial changes (5)
- languagetool-core/src/main/resources/org/languagetool/resource/spelling_global.txt
- languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/compounds.txt
- languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/hunspell/ignore.txt
- languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/multitoken-suggest.txt
- languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/hunspell/ignore.txt
Additional context used
LanguageTool
languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/added.txt
[duplication] ~1082-~1082: Possible typo: you repeated a word
Context: ...utershare NNP pax pax NNS torr torr NNS plankton plankton NNS nanoplankton nanoplankton NNS Choct...(ENGLISH_WORD_REPEAT_RULE)
[duplication] ~1083-~1083: Possible typo: you repeated a word
Context: ...NNS torr torr NNS plankton plankton NNS nanoplankton nanoplankton NNS Choctaw Choctaw JJ Brangelina Brang...(ENGLISH_WORD_REPEAT_RULE)
[duplication] ~1084-~1084: Possible typo: you repeated a word
Context: ...nkton NNS nanoplankton nanoplankton NNS Choctaw Choctaw JJ Brangelina Brangelina NNP Seminole S...(ENGLISH_WORD_REPEAT_RULE)
[duplication] ~1085-~1085: Possible typo: you repeated a word
Context: ...ton nanoplankton NNS Choctaw Choctaw JJ Brangelina Brangelina NNP Seminole Seminole NNPS counsel coun...(ENGLISH_WORD_REPEAT_RULE)
[duplication] ~1086-~1086: Possible typo: you repeated a word
Context: ...aw Choctaw JJ Brangelina Brangelina NNP Seminole Seminole NNPS counsel counsel NNS Hydrapple Hydr...(ENGLISH_WORD_REPEAT_RULE)
Additional comments not posted (27)
languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/hunspell/spelling.txt (2)
64659-64669: LGTM!
The additions to the spelling list look good. They expand the vocabulary with relevant terms related to finance, technology, and cultural references, reflecting contemporary usage. The format and content of the entries are consistent with the existing list.
64772-64807: Looks good!
The additions to the spelling list in this hunk are also appropriate and beneficial. They include more terms related to paraphrasing techniques, cultural references, and compound words that capture specific meanings and usages. The expanded vocabulary will improve the tool's language processing capabilities for German. The format and content of the entries remain consistent with the existing list.
languagetool-language-modules/en/src/main/resources/org/languagetool/rules/en/grammar.xml (12)
105-105: LGTM!
The change replaces "chicken" with "counsel" in the list of uncommon verbs. This is an appropriate update as "counsel" is a more fitting uncommon verb compared to "chicken".
73599-73617: Looks good!
The new antipatterns handle specific cases of incorrect grammar constructs. The token combinations and regex usage seem appropriate.
74014-74027: The new antipatterns look good.
The antipatterns for "have noun all of the time" and "have noun the next" use appropriate POS tags and token sequences to match the desired cases of incorrect or uncommon usage of "have".
74029-74031: Is the "access to" antipattern necessary?
"access to" seems to be a common and valid phrase. It may not necessarily indicate incorrect grammar. Please verify if this antipattern is really needed.
74060-74060: The added exceptions look good.
The terms added to the list of exceptions for the verb token are common nouns or other valid exceptions. The use of "?" for optional plural forms is also appropriate.
74288-74297: Are the "write" and "read" access/permissions/privileges antipatterns necessary?
The antipatterns seem to be handling common phrases related to "write" and "read" access/permissions/privileges. These phrases are not necessarily grammar errors. Please verify if these antipatterns are really needed.
74298-74300: Please provide more context or examples for the "the|this" antipattern.
The antipattern for "the|this" followed by up to 2 tokens with chunk_re "I-NP.*" is a bit unclear without more context. It might be too broad and match valid phrases. Can you please provide some examples of the types of phrases this antipattern is intended to match? That would help verify its validity.
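When judging whether an antipattern is necessary or too broad, it may help to recall its role: it marks token sequences where the rule's pattern should not fire. The sketch below is a simplified model in Python, not LanguageTool's XML rule engine. It assumes that a pattern match is dropped whenever it overlaps tokens covered by an antipattern; the toy rule, the token regexes, and the function names `find_spans` and `matches_after_antipatterns` are all invented for illustration.

```python
import re


def find_spans(tokens, token_regexes):
    """Return (start, end) index pairs where consecutive tokens match the regexes."""
    n = len(token_regexes)
    spans = []
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        if all(re.fullmatch(rx, tok, re.IGNORECASE)
               for rx, tok in zip(token_regexes, window)):
            spans.append((i, i + n))
    return spans


def matches_after_antipatterns(tokens, pattern, antipatterns):
    """Keep pattern matches that do not overlap any antipattern match."""
    immune = set()  # token indexes covered by some antipattern
    for anti in antipatterns:
        for start, end in find_spans(tokens, anti):
            immune.update(range(start, end))
    return [
        (start, end)
        for start, end in find_spans(tokens, pattern)
        if not immune.intersection(range(start, end))
    ]


# Invented toy rule: flag "read"/"write" directly followed by "access" or "files".
pattern = [r"read|write", r"access|files"]
# Invented antipattern: "read"/"write" followed by access/permission(s)/privilege(s).
antipatterns = [[r"read|write", r"access|permissions?|privileges?"]]

tokens = "You need write access to read files".split()
print(matches_after_antipatterns(tokens, pattern, []))            # no antipattern
print(matches_after_antipatterns(tokens, pattern, antipatterns))  # with antipattern
```

In this toy setup the antipattern removes the match on "write access" but leaves the one on "read files". Running candidate sentences through the real rule with and without the antipattern gives the same kind of evidence for the cases questioned above.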
78211-78211: The added exceptions for the plural noun token look good.
Allowing proper nouns (NNP), nouns (NN), adverbs (RB), uppercase words ([A-Z].*), and the specific word "knows" as exceptions to the plural noun token makes sense. These are reasonable additions to the exceptions list.
96287-96315: The new antipatterns for "many", "several", and "few" look good.
The added antipatterns cover a good range of common phrases and constructs where "many", "several", or "few" are used correctly. The patterns use appropriate POS tags, token sequences, and regex to match the desired cases. These antipatterns should help avoid flagging correct usages as errors.
96327-96333: The new antipattern looks good.
The antipattern handles cases where "many", "several", "few", or "various" are followed by a gerund (VBG) that is part of a noun phrase. The exceptions for "amazing" and "interesting" are appropriate, as they are common adjectives that can validly appear in this context. The pattern uses suitable POS tags and chunk_re to match the desired case.
96344-96348: The added exceptions look good.
The new word exceptions are common words that can validly follow "many", "several", "few", or "various", even if they are not singular nouns. The regex exceptions for apostrophe, slash, dash, uppercase words, and units of measurement abbreviations also make sense in this context. These are appropriate additions to the exceptions list.
96485-96485: The added examples are useful.
The new example phrases provide helpful context and test cases for the pattern and antipatterns related to "many", "several", and "few". They cover both valid and invalid usages, and are appropriately annotated with comments and "type" attributes to indicate if they should trigger an error or not. These examples will be valuable for testing and understanding the behavior of the pattern and antipatterns.
languagetool-language-modules/de/src/main/java/org/languagetool/rules/de/GermanUnpairedQuotesRule.java (2)
32-32: Verify the removal of "‚" from the start symbols.
The symbol "‚" has been removed from the list of starting symbols used for detecting unpaired quotes in German text. Please confirm if this removal is intentional and aligns with the desired behavior of the quote detection logic.
33-33: Verify the removal of "'" from the end symbols.
The symbol "'" has been removed from the list of ending symbols used for detecting unpaired quotes in German text. Please confirm if this removal is intentional and aligns with the desired behavior of the quote detection logic.
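The practical effect of dropping a symbol pair can be illustrated with a simplified stack-based checker. This is only a sketch under stated assumptions: it is not the implementation of GermanUnpairedQuotesRule, the function name `unpaired_symbols` is hypothetical, and the symbol lists passed in below are illustrative rather than copied from the Java source.

```python
def unpaired_symbols(text, start_symbols, end_symbols):
    """Return sorted (index, symbol) pairs for quotes left unmatched.

    Assumes start and end symbols are distinct characters and that the
    two lists are parallel (start_symbols[i] is closed by end_symbols[i]).
    """
    closing_for = dict(zip(start_symbols, end_symbols))
    opening_for = {end: start for start, end in closing_for.items()}
    stack = []     # (index, symbol) of start symbols still open
    unpaired = []  # end symbols that had no matching open start symbol
    for i, ch in enumerate(text):
        if ch in closing_for:
            stack.append((i, ch))
        elif ch in opening_for:
            if stack and stack[-1][1] == opening_for[ch]:
                stack.pop()
            else:
                unpaired.append((i, ch))
    unpaired.extend(stack)
    return sorted(unpaired)


LOW_DOUBLE, HIGH_DOUBLE = "\u201e", "\u201c"  # German double quotes
LOW_SINGLE, HIGH_SINGLE = "\u201a", "\u2018"  # German single quotes (illustrative)

text = "Das ist \u201agut."  # opening single quote that is never closed

# With the single-quote pair configured, the open quote is reported.
print(unpaired_symbols(text, [LOW_DOUBLE, LOW_SINGLE], [HIGH_DOUBLE, HIGH_SINGLE]))
# Without that pair, the same text passes silently.
print(unpaired_symbols(text, [LOW_DOUBLE], [HIGH_DOUBLE]))
```

In this model, removing a pair from the lists means unclosed quotes of that kind are no longer reported at all, which is the behavior change the author is asked to confirm.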
languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/hunspell/spelling.txt (2)
420-420: LGTM!
Adding "Brangelina" to the list is appropriate as it is a well-known contemporary term and a proper noun. This will ensure that the spell checker recognizes it as a valid word.
459-462: LGTM!
Adding both the hyphenated and non-hyphenated forms of these compound words is a good practice. It ensures that the spell checker recognizes both variations as valid, providing flexibility in usage.
languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/added.txt (7)
1082-1082: LGTM!
The addition of `torr` as a plural noun (NNS) is appropriate and enhances the lexicon.
1083-1083: Looks good!
Adding `plankton` as a plural noun (NNS) is suitable and improves the vocabulary coverage.
1084-1084: Approved!
The inclusion of `nanoplankton` as a plural noun (NNS) is beneficial and adds more specificity to the lexicon.
1085-1085: Looks good to me!
Adding `Choctaw` as both a plural noun (NNPS) and an adjective (JJ) is suitable and enhances the language processing capabilities.
1086-1086: Approved!
Including `Brangelina` as a proper noun (NNP) is appropriate and allows the system to handle this contemporary term effectively.
1082-1086: Ignore the duplication hints from static analysis.
The flagged duplications are false positives in this context. As the context excerpts show, each entry lists a word form, its lemma, and a part-of-speech tag, so the form and the lemma coincide whenever the entry is its own base form. This repetition is necessary and expected in a part-of-speech dictionary.
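To make the false positive concrete, the sketch below parses a few entries from the context excerpts. It assumes three whitespace-separated fields per entry (word form, lemma, tag), which is what the excerpts suggest; the actual file may use a different separator, and `parse_entry` is a hypothetical helper, not a LanguageTool API.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    form: str   # inflected word form
    lemma: str  # base form
    tag: str    # part-of-speech tag


def parse_entry(line):
    """Split one dictionary line into form, lemma, and tag.

    Assumes exactly three whitespace-separated fields.
    """
    form, lemma, tag = line.split()
    return Entry(form, lemma, tag)


# Entries as they appear in the context excerpts of the hints.
lines = [
    "torr torr NNS",
    "plankton plankton NNS",
    "nanoplankton nanoplankton NNS",
    "Choctaw Choctaw JJ",
    "Brangelina Brangelina NNP",
]
entries = [parse_entry(line) for line in lines]

# A word-repeat check sees "torr torr"; structurally it is form == lemma.
repeated = [e.form for e in entries if e.form == e.lemma]
print(repeated)
```

Every sampled entry has identical form and lemma fields, which is exactly what a repeated-word rule picks up when the file is read as running text.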
1082-1086: Overall, the changes look great!
The addition of new terms to the part-of-speech dictionary is a valuable improvement. The included nouns and adjectives enhance the lexicon and strengthen the system's language processing capabilities. The changes are appropriate, consistent, and have a positive impact on the system's ability to handle contemporary terms and named entities effectively.
languagetool-language-modules/de/src/main/resources/org/languagetool/rules/de/remote-rule-filters.xml (2)
30-48: Excellent addition to handle punctuation issues with double quotes in German! The rules look solid.
The new rule group `AI_DE_GGEC_MISSING_PUNCTUATION_L_DOUBLE_QUOT` introduces two well-defined rules to identify potential missing punctuation related to double quotes in German sentences. The regular expressions correctly capture the intended patterns. The included examples clearly demonstrate the corrections.
Line range hint 49-105: Skipping review as no changes were made to the `AI_DE_HYDRA_LEO_CASE_UPPER` rule group.