add chi_tra_vert and chi_sim_vert to supported languages #1
Hi!
I am looking for a way for the OCR to recognize vertically written Chinese before I upload a large number of old books in Traditional Chinese to the IA, as I mentioned in my email.
This PR simply adds `chi_tra_vert`/`HantT_vert` and `chi_sim_vert`/`HanS_vert` for this purpose. It does not change the existing language code mapping of `Chinese`, `Chinese (Simplified)`, etc., so there won't be any breaking changes.

I plan to set `language = Chinese (Traditional)` and `ocr_default_parameters = ocr_additional_languages:chi_tra_vert` on my books so that `-l chi_tra+chi_tra_vert` is passed to Tesseract. So far this appears to be the only feasible way to recognize Traditional Chinese in either vertical or horizontal writing, even though it is less accurate than applying `-l chi_tra` or `-l chi_tra_vert` independently.

It might not be the right way to resolve the problem, but I think it is an acceptable workaround. I have also opened an issue upstream: tesseract-ocr/tessdata_best#72.
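
For illustration, here is a minimal Python sketch of how the two settings described above could combine into the `-l` argument. The mapping table, the function names, and the comma-separated parsing of `ocr_additional_languages` are assumptions made for this example; they are not the actual code of this repository. The only part taken from Tesseract itself is that multiple languages are joined with `+`.

```python
# Hypothetical mapping from item language names to Tesseract codes.
# The real table lives in the project; only these entries are assumed here.
LANGUAGE_TO_TESSERACT = {
    "Chinese (Traditional)": "chi_tra",
    "Chinese (Simplified)": "chi_sim",
}


def parse_additional_languages(ocr_default_parameters):
    """Extract extra language codes from a 'key:value' parameter string.

    Assumes the form 'ocr_additional_languages:code[,code...]'.
    Returns an empty list if the key is absent.
    """
    key, sep, value = ocr_default_parameters.partition(":")
    if not sep or key.strip() != "ocr_additional_languages":
        return []
    return [code.strip() for code in value.split(",") if code.strip()]


def build_lang_argument(language, ocr_default_parameters=""):
    """Build the value passed to Tesseract's -l option."""
    codes = [LANGUAGE_TO_TESSERACT[language]]
    for extra in parse_additional_languages(ocr_default_parameters):
        if extra not in codes:  # avoid listing the same model twice
            codes.append(extra)
    return "+".join(codes)


if __name__ == "__main__":
    print(build_lang_argument(
        "Chinese (Traditional)",
        "ocr_additional_languages:chi_tra_vert",
    ))
```

With the settings from this PR, the sketch yields `chi_tra+chi_tra_vert`, which is the combination described above.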