Assert fail in src/ccstruct/pageres.cpp, line 1502 with specific image and language combination #4148
Comments
I see several problems here:
I can reproduce the issue with ara from tessdata_fast, ocrb_int from https://github.com/Shreeshrii/tessdata_ocrb/raw/master/ocrb_int.traineddata (thanks @zdenop) and the image above. It does not crash when I use ara from tessdata or tessdata_best.
…opy-paste of assert.
@GerHobbelt Is the fix you made in your fork something which will be merged into mainline at some point? I'm just trying to figure out if it's worth building a custom package to include that one-line change. Thanks for your time and attention!
Issue #4270 is a duplicate of this bug report and contains additional information.
Removing the assertion as in commit 407c165 is not a valid solution because this causes a later heap-use-after-free bug.
@stweil Thanks for the activity on these tickets. I agree - this did sidestep the assertion issue (we're using it in a local fork) but there are still segfaults remaining - I assume because of this change. Any idea on what to look at to fix the original issue?
No, I still have no idea. I am afraid that the planned next release won't fix this issue unless someone finds a solution fast.
The issue does not occur with all language pairs. It is even sufficient to replace tessdata_fast/ara by tessdata_best/ara. So there exist workarounds which can be used to avoid the crash.
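The workaround mentioned above (using ara from tessdata_best instead of tessdata_fast) could be sketched as a small Python helper that builds a tesseract command line with an explicit tessdata directory. `-l` and `--tessdata-dir` are regular tesseract command-line options; the function name, the image name `frame.png`, and the directory `/opt/tessdata_best` are assumptions made up for illustration, and ocrb_int.traineddata would have to be present in that directory as well.

```python
def build_tesseract_command(image, languages, tessdata_dir):
    """Return an argv list for tesseract with an explicit tessdata directory.

    Hypothetical helper: the name and all paths are illustrative only.
    """
    return [
        "tesseract",
        str(image),
        "stdout",                       # write recognized text to stdout
        "--tessdata-dir", str(tessdata_dir),
        "-l", "+".join(languages),      # e.g. "ara+ocrb_int"
    ]


if __name__ == "__main__":
    # Assumed layout: /opt/tessdata_best holds ara and ocrb_int traineddata.
    cmd = build_tesseract_command(
        "frame.png", ["ara", "ocrb_int"], "/opt/tessdata_best"
    )
    print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where tesseract is installed. This only avoids the crash for the reported pair; it does not address the underlying assertion.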
Current Behavior
When running this command line:
The following occurs:
Estimating resolution as 303
!w_it.cycled_list():Error:Assert failed:in file src/ccstruct/pageres.cpp, line 1502
Aborted
This was first detected with API usage from an internal app, but is reproducible with the tesseract command line.
Backtrace:
Expected Behavior
The expectation is that the frame would be analyzed for OCR data without aborting. Other language combinations which have run without crash are:
eng+ara+fra+spa+mrz+chi_sim+chi_sim_vert+chi_tra+chi_tra_vert+rus
ocrb_int+eng+fra+spa+mrz+chi_sim+chi_sim_vert+chi_tra+chi_tra_vert+rus
ocrb_int+eng
The combination of ara and ocrb_int (whether on their own or included in a larger list such as above) triggers the abort each time.
Suggested Fix
No known suggested fixes at this time.
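Until a fix exists, a caller could at least detect the language combination reported above before invoking OCR. The following is a minimal sketch, not part of tesseract: the function name and the `TRIGGER_PAIR` constant are hypothetical, and the check looks only at language names, not at which tessdata variant (fast, best) is installed.

```python
# Pair of languages reported in this issue to trigger the assert together.
TRIGGER_PAIR = frozenset({"ara", "ocrb_int"})


def contains_trigger_pair(lang_spec):
    """Return True if a '+'-separated language string contains both languages."""
    langs = {part.strip() for part in lang_spec.split("+") if part.strip()}
    return TRIGGER_PAIR <= langs


if __name__ == "__main__":
    for spec in ("ocrb_int+eng", "ara+ocrb_int", "ocrb_int+eng+ara+fra"):
        print(spec, contains_trigger_pair(spec))
```

An application could use such a check to log a warning or to switch to a different ara model for the affected requests.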
tesseract -v
This has also been confirmed with the latest git main.
Operating System
No response
Other Operating System
Ubuntu 23.10-based docker image, running under CentOS 7 host (all amd64). This has been reproduced in development setups however, including Ubuntu 20.04, 22.04, and WSL2.
uname -a
Linux e04873eb47b5 3.10.0-1160.95.1.el7.x86_64 #1 SMP Mon Jul 24 13:59:37 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
(container view - host view is the same aside from the hostname)
Compiler
CPU
Virtualization / Containers
Docker 24.0.6
Other Information
No response