The README of this crate, https://lib.rs/crates/dcli, contains Simplified Chinese text with code examples in English. If I feed the Markdown of this file to whatlang, I get Lang::Fra with 0.52 confidence.
I think language detection could be strongly biased towards the presence of CJK characters, because speakers of these languages are much more likely to use some Latin letters than speakers of European languages are to use a substantial number of CJK characters.
At the moment, the algorithm for detecting a script is based on counting the chars that belong to each script. The winner is the script with the highest count.
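To make the discussion concrete, here is a minimal sketch of count-based script detection with an optional bias towards CJK. It is not whatlang's actual code: the `Script` enum, the `script_of` and `detect_script` functions, the `cjk_weight` parameter, and the handful of Unicode ranges are all assumptions made for illustration, and the ranges are deliberately incomplete.

```rust
/// Coarse script classes; a simplified stand-in for a real script table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Script {
    Latin,
    Cjk,
}

/// Classify one char using a few well-known Unicode blocks.
/// The ranges are approximate and intentionally incomplete.
fn script_of(c: char) -> Option<Script> {
    match c {
        // Basic Latin letters plus (roughly) Latin-1 Supplement and Latin Extended-A/B
        'a'..='z' | 'A'..='Z' | '\u{00C0}'..='\u{024F}' => Some(Script::Latin),
        // CJK Unified Ideographs, Extension A, Hiragana and Katakana
        '\u{4E00}'..='\u{9FFF}' | '\u{3400}'..='\u{4DBF}' | '\u{3040}'..='\u{30FF}' => {
            Some(Script::Cjk)
        }
        // Digits, punctuation, whitespace and everything else are ignored
        _ => None,
    }
}

/// Count chars per script, scale the CJK count by `cjk_weight` (hypothetical),
/// and return the script with the higher score. A weight of 1.0 is plain counting.
pub fn detect_script(text: &str, cjk_weight: f64) -> Option<Script> {
    let (mut latin, mut cjk) = (0usize, 0usize);
    for c in text.chars() {
        match script_of(c) {
            Some(Script::Latin) => latin += 1,
            Some(Script::Cjk) => cjk += 1,
            None => {}
        }
    }
    if latin == 0 && cjk == 0 {
        return None; // nothing classifiable in the input
    }
    let cjk_score = cjk as f64 * cjk_weight;
    if cjk_score > latin as f64 {
        Some(Script::Cjk)
    } else {
        Some(Script::Latin)
    }
}
```

With plain counting (`cjk_weight = 1.0`), a line such as "安装 dcli: cargo install dcli" is classified as Latin, because the 20 Latin letters outnumber the 2 CJK characters. A sufficiently large weight flips the result to CJK, which is the kind of bias suggested in the report. What weight would be appropriate is an open question that this sketch does not answer.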