Crimean Tatar (crh) #10026

Merged
merged 10 commits into from
Jan 10, 2024

Conversation

arysin
Contributor

@arysin arysin commented Jan 2, 2024

Add Crimean Tatar Language (crh)

@danielnaber
Member

common_words.txt should probably be limited to ~10,000 lines. The lang detection algorithm doesn't normalize for the length of the common_words file, I think, so all of them should roughly have the same number of words.

@arysin
Contributor Author

arysin commented Jan 3, 2024

For now, at least, the crh dictionary contains each word in both Latin and Cyrillic script, so we have 10k + 10k words. If we need to trim it to 10k total, I can cut it to 5k + 5k.

@arysin
Contributor Author

arysin commented Jan 5, 2024

@danielnaber based on what you said about common_words, in this (unique) case with two alphabets it could make sense to keep 10k+10k: a text would be in either Latin or Cyrillic, so in each case only the 10k words of one set would be in play. So this would work if the algorithm is based on absolute values. Unless I am missing something.
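
The argument can be illustrated with a minimal sketch. It assumes the detector scores a language by the raw count of text tokens found in its common-words list, with no normalization by list length. That scoring rule is an assumption here, not the confirmed LanguageTool implementation. The function name `raw_score` and the tiny word sets are hypothetical placeholders standing in for the real 10k-entry lists.

```python
def raw_score(tokens, common_words):
    """Count tokens found in the word list, with no normalization by list size.

    Assumed scoring rule for illustration only.
    """
    return sum(1 for t in tokens if t in common_words)


# Hypothetical toy lists; the real files would hold ~10,000 entries per script.
crh_latin = {"men", "sen", "bar", "yoq"}
crh_cyrillic = {"мен", "сен", "бар", "ёкъ"}
crh_combined = crh_latin | crh_cyrillic

# A text written entirely in Latin script.
latin_text = "men bar sen".split()

score_latin_only = raw_score(latin_text, crh_latin)
score_combined = raw_score(latin_text, crh_combined)
score_cyrillic_only = raw_score(latin_text, crh_cyrillic)
```

Under this assumption, a Latin-script text gets the same score from the combined list as from the Latin list alone, because the Cyrillic entries never match. The extra 10k entries therefore would not inflate the crh score relative to languages with a single 10k list.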

@danielnaber
Member

I guess 10k+10k would be okay

@danielnaber
Member

I guess this is ready to be merged now? Just as a heads-up: while I can merge this, please don't expect the language to show up in the UI. Adapting the UIs is a lot of manual work, and adding one more language makes the usability of the long drop-down even worse.

@arysin
Contributor Author

arysin commented Jan 10, 2024

Yes, thank you. The team understands it'll take some time to reach the UI.

@danielnaber danielnaber merged commit 27ef025 into languagetool-org:master Jan 10, 2024
3 checks passed