Segfault with certain language packs on certain images #4146
Comments
Similar to #4148 (but the crash occurs in a different place):
@cchadowitz, please provide sample_012631.jpg, mrz.traineddata and ocrb_int.traineddata. Those files are needed to reproduce the issue.
@stweil: it seems that ocrb_int is @Shreeshrii's training from https://github.com/Shreeshrii/tessdata_ocrb and mrz is from https://github.com/DoubangoTelecom/tesseractMRZ
I can reproduce one of the segmentation faults:
The segmentation fault does not occur with eng and ara from tessdata_best or tessdata. However, I ran into a different issue when the second language model does not exist.
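A missing second model like this can be caught before Tesseract is started. The following is a minimal sketch, assuming that each language in a `+`-joined `-l` value maps to a `<lang>.traineddata` file directly inside the tessdata directory; the helper name `missing_models` is hypothetical and not part of Tesseract.

```python
from pathlib import Path


def missing_models(langs, tessdata_dir):
    """Return the language codes in a '+'-joined spec (e.g. 'eng+ara')
    that have no matching <lang>.traineddata file in tessdata_dir.

    Hypothetical helper for illustration; it only checks that the files
    exist, not that they are valid models.
    """
    root = Path(tessdata_dir)
    return [
        lang
        for lang in langs.split("+")
        if lang and not (root / f"{lang}.traineddata").is_file()
    ]


if __name__ == "__main__":
    import tempfile

    # Demonstration with a throwaway directory that contains only eng.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "eng.traineddata").touch()
        print(missing_models("eng+ara", tmp))
```

Running such a check first separates "model file not found" from genuine crashes inside the recognizer.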
Thanks for taking a look! @stweil, it looks like you were able to reproduce, but just FYI the images I referenced are the same as the attached ones (GitHub just changed the filenames). I shared the example run after the image for each example, so each image is the one referenced in the run that follows it. Anyway, let me know if I can provide anything else in the meantime.
I am getting a lot of reports of crashes in
@stweil, are you able to reproduce the segfaults from this and from #4148 against 5.3.4 or current code? We tested against the Ubuntu 24.04-distributed 5.3.4 and it doesn't seem like we can anymore. However, I cannot figure out what in the branch diffs would have changed this. If you have a moment to look it over, I'd love to know your thoughts.
This bug still occurs with the latest code from the main branch. It should be fixed before the next release is tagged.
Call stack with debug code (which stops earlier):
Current Behavior
When running the tesseract binary with specific language packs enabled, the binary segfaults (in different places for different combinations):
chi_sim and the mrz language packs together for this input image:
eng, ara, and ocrb_int language packs together for this input image:
Expected Behavior
No segfaults.
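To check the language combinations above in a scripted way, a small wrapper can run the tesseract CLI and report whether the process was killed by a signal. This is only a sketch: the helper names `build_command` and `describe_exit` are hypothetical, and it assumes a POSIX system where Python's `subprocess` reports death by signal as a negative return code (shells and some wrappers report 128 + signal number instead).

```python
import signal
import subprocess
import sys


def build_command(image, langs, output_base="stdout"):
    """Build a tesseract invocation; multiple languages are joined with '+'."""
    return ["tesseract", str(image), output_base, "-l", "+".join(langs)]


def describe_exit(returncode):
    """Classify a return code as ok, killed by a signal, or a plain error.

    -N (subprocess convention) and 128+N (shell convention) are both
    treated as signal N when N is a known signal number.
    """
    if returncode == 0:
        return "ok"
    signum = -returncode if returncode < 0 else returncode - 128
    try:
        return f"killed by {signal.Signals(signum).name}"
    except ValueError:
        return f"error (exit code {returncode})"


if __name__ == "__main__":
    if len(sys.argv) >= 3:
        # Usage: script.py IMAGE LANG [LANG ...]
        image, langs = sys.argv[1], sys.argv[2:]
        result = subprocess.run(build_command(image, langs), capture_output=True)
        print("+".join(langs), "->", describe_exit(result.returncode))
    else:
        # Without arguments, only show the commands for the two reported combinations.
        for langs in (["chi_sim", "mrz"], ["eng", "ara", "ocrb_int"]):
            print(" ".join(build_command("IMAGE", langs)))
```

A result of "killed by SIGSEGV" for either combination would correspond to the behavior reported here; "ok" would match the expected behavior.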
Suggested Fix
No response
tesseract -v
Operating System
No response
Other Operating System
Ubuntu 23.10
uname -a
Compiler
GCC 13.2.0
CPU
Intel Xeon CPU E5-2609 v4 @ 1.70GHz x16
Virtualization / Containers
I'm building and running tesseract inside a Docker (v24.0.2) container, where the container is running the OS and compiler versions listed above.
Other Information
No response