Language.detect
We found that Languages.detect fails to detect Japanese text that contains kanji characters.
    julia> Languages.detect("組織が人材を募集する際")
    (Languages.Mandarin(), Languages.MandarinScript(), 1.0)

    julia> Languages.detect("職場で危険な状況を見つけた場合、")
    (Languages.Mandarin(), Languages.MandarinScript(), 1.0)

    julia> Languages.detect("業務で血液やその他の感染性物質に触れる機会がある場合")
    (Languages.Mandarin(), Languages.MandarinScript(), 1.0)

    julia> Languages.detect("情報を紛失や盗難から保護することは、評判を維持し、ビジネスを成長させ続けるために不可欠です。")
    (nothing, nothing, 0)
This doesn't happen with whatlang-rs or whatlang-pyo3:
    use whatlang::{detect, Lang, Script};

    fn main() {
        let text1 = "業務で血液やその他の感染性物質に触れる機会がある場合";
        let text2 = "組織が人材を募集する際、採用候補者に対して求める最大のスキルの 1 つに、コラボレーション能力があります";

        let info1 = detect(text1).unwrap();
        let info2 = detect(text2).unwrap();

        dbg!(info1);
        dbg!(info2);
    }

    [src/main.rs:10] info1 = Info {
        script: Mandarin,
        lang: Jpn,
        confidence: 1.0,
    }
    [src/main.rs:11] info2 = Info {
        script: Hiragana,
        lang: Jpn,
        confidence: 1.0,
    }
    >>> from whatlang import detect
    >>> detect("組織が人材を募集する際")
    Language: jpn - Script: Mandarin - Confidence: 1 - Is reliable: true
    >>> detect("職場で危険な状況を見つけた場合、")
    Language: jpn - Script: Mandarin - Confidence: 1 - Is reliable: true
    >>> detect("業務で血液やその他の感染性物質に触れる機会がある場合")
    Language: jpn - Script: Mandarin - Confidence: 1 - Is reliable: true
    >>> from whatlang import detect_lang
    >>> detect_lang("業務で血液やその他の感染性物質に触れる機会がある場合")
    Language: jpn
This happens because detect_lang_based_on_script only returns "Cmn" for MandarinScript:
Languages.jl/src/whatlang.jl
Line 235 in 3ea5534
We might need either a more sophisticated detect_lang_based_on_script for MandarinScript, or to use whatlang-ffi directly.
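To illustrate the first option, here is a minimal Python sketch of one possible disambiguation rule: when the dominant script is Han, treat the text as Japanese if it also contains kana. This is an assumption for discussion only. It is not the logic used by whatlang-rs or Languages.jl, and the names `is_kana`, `is_han`, `guess_han_language`, and the `kana_threshold` parameter are hypothetical.

```python
from typing import Optional


def is_kana(ch: str) -> bool:
    # Hiragana (U+3040..U+309F) and Katakana (U+30A0..U+30FF) blocks.
    return "\u3040" <= ch <= "\u30ff"


def is_han(ch: str) -> bool:
    # Basic CJK Unified Ideographs block only (U+4E00..U+9FFF);
    # extension blocks are ignored in this sketch.
    return "\u4e00" <= ch <= "\u9fff"


def guess_han_language(text: str, kana_threshold: float = 0.0) -> Optional[str]:
    """Return "jpn", "cmn", or None when the text has no Han or kana characters.

    Hypothetical rule: if the share of kana among Han + kana characters
    exceeds kana_threshold, guess Japanese; otherwise guess Mandarin.
    """
    kana = sum(1 for ch in text if is_kana(ch))
    han = sum(1 for ch in text if is_han(ch))
    total = kana + han
    if total == 0:
        return None
    return "jpn" if kana / total > kana_threshold else "cmn"
```

Under this rule the first three examples above would come out as "jpn", because each of them mixes kanji with hiragana. Japanese text written entirely in kanji would still be classified as "cmn", so a real fix would likely need more than this.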
cc @rssdev10
Fixed with the help of this script: https://github.com/greyblake/whatlang-rs/blob/0c03d281a8d327558ab89632d2d997e644a8c7dd/src/core/detect.rs#L70-L100
Thank you @chengchingwen and @rssdev10
See #50