add chi_tra_vert and chi_sim_vert to supported languages #1

Gowee · 2023-03-01T07:41:06Z

Hi!

I am looking for a way to have the OCR recognize Chinese in vertical writing before I upload a large number of old books in Traditional Chinese to the IA, as I mentioned in my email.

This PR simply adds chi_tra_vert/HanT_vert and chi_sim_vert/HanS_vert for this purpose. It does not change the existing language code mappings for Chinese, Chinese (Simplified), etc., so there won't be any breaking changes.

I plan to set language = Chinese (Traditional) and ocr_default_parameters = ocr_additional_languages:chi_tra_vert on my books, so that -l chi_tra+chi_tra_vert is passed to Tesseract. So far this appears to be the only feasible way to recognize Traditional Chinese in both vertical and horizontal writing, even though it is less accurate than applying -l chi_tra or -l chi_tra_vert on its own.
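As a rough sketch of how I expect the two settings to combine into the -l argument: the helper names and the small name-to-code table below are illustrative assumptions, not the actual code of this repository, and I assume a single key:value pair in ocr_default_parameters.

```python
# Hypothetical mapping from item language names to Tesseract codes.
# The real mapping lives in the repository; these entries are illustrative.
LANGUAGE_CODES = {
    "Chinese (Traditional)": "chi_tra",
    "Chinese (Simplified)": "chi_sim",
}


def build_lang_argument(language, ocr_default_parameters=""):
    """Combine the primary language with ocr_additional_languages.

    Assumes ocr_default_parameters holds one "key:value" pair and that
    several additional languages, if any, are joined with "+".
    """
    langs = [LANGUAGE_CODES[language]]
    key, sep, value = ocr_default_parameters.partition(":")
    if sep and key.strip() == "ocr_additional_languages":
        for code in value.split("+"):
            code = code.strip()
            # Skip empty entries and avoid listing a language twice.
            if code and code not in langs:
                langs.append(code)
    return "+".join(langs)


def tesseract_command(image_path, output_base, lang_argument):
    """Build the argument list for the tesseract CLI."""
    return ["tesseract", image_path, output_base, "-l", lang_argument]


if __name__ == "__main__":
    lang = build_lang_argument(
        "Chinese (Traditional)",
        "ocr_additional_languages:chi_tra_vert",
    )
    print(" ".join(tesseract_command("page.png", "page", lang)))
```

Tesseract accepts several languages joined with "+" in -l, which is what lets one run cover both horizontal and vertical pages.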

This might not be the right way to solve the problem, but I think it is an acceptable workaround. I have also opened an issue upstream at tesseract-ocr/tessdata_best#72.

MerlijnWajer · 2023-03-01T09:55:32Z

Thank you for the request. I think this looks good as is. I would just like to clarify that ocr_default_parameters is not read from any archive.org item metadata; it is read either from a task argument or from a collection that the items are part of.

If you could let me know if you have a collection for your books, I could help you (create and) upload to a collection that has the right metadata set.

Alternatively, in addition to this pull request, you could add additional languages to your items, which will then automatically trigger using the right language packs.

MerlijnWajer · 2023-03-14T22:14:49Z

Sorry that it's taking a while to get back to you on this -- I'll get it done this week!

Gowee · 2023-03-16T12:25:00Z

No problem at all. Actually, I have been busy with other things lately, so I have not had time to finish my upload scripts.

Gowee · 2023-04-25T07:00:48Z

I just wanted to follow up on this. Is there any progress? :-)

MerlijnWajer · 2023-05-15T23:32:18Z

Sorry, I was mostly gone last week. Really trying to fit this in ASAP. :)

language: add vertical Chinese

8de980b
