Detect if no text was extracted / if there are grave inconsistencies #99

Open
4 tasks
mikegerber opened this issue Oct 27, 2023 · 0 comments
Assignees
Labels
enhancement New feature or request

Comments

Member

mikegerber commented Oct 27, 2023

A user reported that he was not getting good results. It turned out he had used --text-equiv-level line although the input contained no line-level text. Ways to improve dinglehopper's behavior here:

  • Warn if no text is extracted (ideally in a smart way, since "no text" can be valid on empty pages)
  • Warn if there are grave inconsistencies between levels (harder, because line and region text can legitimately differ in small ways)
  • Warn if there are grave differences between GT and OCR (e.g. no GT text but lots of OCR text; need to think about this more)
  • Check if I could use OCR-D libs here (I'm somewhat skeptical about changing anything, because the text extraction code here works and OCR-D changes a lot by comparison)
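
A minimal sketch of what the first and third checks might look like, operating only on the already extracted GT and OCR strings. Nothing here is existing dinglehopper code: the function names `text_warnings` and `warn_about_texts` and the `min_chars` threshold are assumptions made up for illustration, and the threshold value is arbitrary.

```python
import warnings


def text_warnings(gt_text: str, ocr_text: str, min_chars: int = 20) -> list[str]:
    """Return warning messages for suspicious GT/OCR text pairs.

    Hypothetical helper, not part of dinglehopper. `min_chars` is an
    arbitrary cut-off for what counts as "lots of text".
    """
    gt_len = len(gt_text.strip())
    ocr_len = len(ocr_text.strip())
    found = []
    if gt_len == 0 and ocr_len == 0:
        # Both sides empty can be valid (empty page), so only hint at the cause.
        found.append(
            "No text extracted from GT or OCR. Is the page empty, "
            "or does the chosen text level contain no text?"
        )
    elif gt_len == 0 and ocr_len >= min_chars:
        found.append(f"No GT text, but {ocr_len} characters of OCR text.")
    elif ocr_len == 0 and gt_len >= min_chars:
        found.append(f"No OCR text, but {gt_len} characters of GT text.")
    return found


def warn_about_texts(gt_text: str, ocr_text: str) -> None:
    """Emit each finding through the standard warnings machinery."""
    for message in text_warnings(gt_text, ocr_text):
        warnings.warn(message)
```

Returning the messages as a list keeps the check testable and leaves it to the caller whether to warn, log, or abort. The level-consistency check from the second bullet is left out on purpose, since it needs a tolerance for the small legitimate differences between line and region text.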
@mikegerber mikegerber self-assigned this Oct 27, 2023
@mikegerber mikegerber added the enhancement New feature or request label Oct 27, 2023