ocrd-segment-extract-lines - Lines are not extracted, in case they are in an area of other lines #61
Comments
According to these coords, |
@bertsky:
Agreed. Currently, there is only
So I could add an info message if the text is shorter than min-line-length or the size is smaller than min-line-height / min-line-width.
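A minimal sketch of what such an info message could look like, assuming the three thresholds are passed as plain numbers. The function names `skip_reason` and `should_extract`, the logger name, and the default values are hypothetical and not taken from the actual processor:

```python
import logging

# Hypothetical logger name, for illustration only
logger = logging.getLogger("extract_lines_sketch")


def skip_reason(text, width, height,
                min_line_length=0, min_line_width=1, min_line_height=1):
    """Return a human-readable reason for skipping a line, or None to keep it.

    The parameter names mirror min-line-length / min-line-width /
    min-line-height from the discussion; the defaults are assumptions.
    """
    if len(text) < min_line_length:
        return f"text length {len(text)} < min-line-length {min_line_length}"
    if width < min_line_width:
        return f"width {width} < min-line-width {min_line_width}"
    if height < min_line_height:
        return f"height {height} < min-line-height {min_line_height}"
    return None


def should_extract(line_id, text, width, height, **limits):
    """Log at INFO level why a line is skipped instead of dropping it silently."""
    reason = skip_reason(text, width, height, **limits)
    if reason is not None:
        logger.info("skipping line '%s': %s", line_id, reason)
        return False
    return True
```

With a message like this in the log, a missing line image can be traced back to the threshold that filtered it.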
That's how PRImA sees and implements it (though they fail to communicate it in their standards), but not how OCR-D does (so far). All coordinate/polygon/image handling libraries I know use the pixel-below-right convention, not the path-refers-to-inside-of-polygon interpretation. There's been a short discussion with PRImA on this, but as you can see, their neglect of this important detail is striking.
Yes, this would be very good :-)
Well, if I have an image with a height of 1000, its pixel rows are numbered from y=0 to y=999 (or y=1 to y=1000, depending on the library used).
Yes, and in array terms you express that via a slice/interval
For an implementation, the pixel-below-right coordinate semantics are by far easier than the path-refers-to-inside-of-polygon semantics (which need a notion of path and directionality; the former is undefined for baselines and subsegments, the latter is not agreed upon).
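To illustrate the slice/interval view, here is a small sketch in plain Python. It assumes that "pixel-below-right" means a coordinate names the top-left corner of a pixel, so a coordinate range y_min..y_max covers the half-open row interval [y_min, y_max). The helper `rows_covered` is hypothetical:

```python
HEIGHT = 1000
rows = list(range(HEIGHT))  # pixel rows y=0 .. y=999


def rows_covered(y_min, y_max):
    """Rows covered by the coordinate range y_min..y_max.

    Assumption: each coordinate refers to the pixel below (and right of) it,
    so the range maps directly onto a half-open slice.
    """
    return rows[y_min:y_max]


full = rows_covered(0, 1000)
print(len(full), full[0], full[-1])  # 1000 rows, first is 0, last is 999
```

Under this reading, a region spanning the whole image has coordinates 0 and 1000, although the last pixel row is 999.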
Hi,
I think I have found a bug in ocrd-segment-extract-lines: I cannot prove it 100%, but in my environment I see that lines are not extracted (no images are created) when a line lies graphically (in terms of its coordinates) within another line of the same region.
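One way to check this suspicion independently of the processor is to compare the bounding boxes of two lines. The sketch below assumes the coordinates are given as PAGE-XML style `points` strings ("x1,y1 x2,y2 ..."); the helper names and the sample values in the usage note are made up for illustration:

```python
def parse_points(points):
    """Turn a 'x1,y1 x2,y2 ...' string into a list of (x, y) integer tuples."""
    return [tuple(int(v) for v in pair.split(",")) for pair in points.split()]


def bbox(points):
    """Axis-aligned bounding box (x_min, y_min, x_max, y_max) of a polygon."""
    xs, ys = zip(*parse_points(points))
    return min(xs), min(ys), max(xs), max(ys)


def contains(outer, inner):
    """True if the bounding box of `inner` lies within that of `outer`."""
    ox0, oy0, ox1, oy1 = bbox(outer)
    ix0, iy0, ix1, iy1 = bbox(inner)
    return ox0 <= ix0 and oy0 <= iy0 and ix1 <= ox1 and iy1 <= oy1
```

For example, `contains("0,0 100,0 100,50 0,50", "10,10 40,10 40,30 10,30")` is true for these invented coordinates, while swapping the arguments gives false. This only compares bounding boxes, not the exact polygons.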
I extract only images in this case using this command:
Page-Extract: Here the line TR-15_line0002 was not extracted:
Logfile content for this case: