Issue in machine_based_reading_order_integration branch #125
Comments
Thank you for taking the time to test the new model. As you're aware, machine-based reading order detection, unlike heuristic methods, relies on ground truth (GT) for training. As you correctly pointed out, the machine-based approach doesn't perform well on certain document layouts because these layouts aren't represented in the training ground truth. Other issues may also arise. For instance, newspapers with multiple articles, where the articles can be read in any order (although the text regions within each article have a unique reading order), are not covered in our ground truth dataset. This tool is our first attempt at a machine-based model for reading order, and we aim to improve both its ground truth data and its model structure.
Hi,
I was looking at your branch that computes region ordering with a neural network, and I think I noticed a possible issue that I want to report to you. Specifically, I tried the code on a page with the following layout: a large image spans the center from left to right, and it splits two columns of text. So there are 4 paragraphs and one image.
The new sorting does not read the paragraphs column by column; instead, it first reads the paragraphs above the image and then the ones below it, from left to right. The previous sorting, on the other hand, read them correctly. This seems to me to be an issue whenever an image is present.
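To make the difference concrete, here is a minimal sketch of the two reading orders on a layout like the one described. The bounding boxes, region names, and both sort functions are hypothetical and only illustrate the symptom; they are neither the neural model nor the heuristic implemented in the repository.

```python
# Hypothetical bounding boxes as (x, y, width, height); not taken from the real page.
regions = {
    "P1": (0, 0, 100, 200),      # top-left paragraph
    "P2": (120, 0, 100, 200),    # top-right paragraph
    "IMG": (0, 220, 220, 150),   # full-width image between the paragraph pairs
    "P3": (0, 390, 100, 200),    # bottom-left paragraph
    "P4": (120, 390, 100, 200),  # bottom-right paragraph
}


def text_regions(regions):
    """Keep only the paragraphs; the image itself is not part of the text order."""
    return {name: box for name, box in regions.items() if name != "IMG"}


def row_major(regions):
    """Order observed with the new sorting: above the image first, then below."""
    text = text_regions(regions)
    return sorted(text, key=lambda name: (text[name][1], text[name][0]))


def column_major(regions):
    """Expected order: finish the left column before starting the right one."""
    text = text_regions(regions)
    return sorted(text, key=lambda name: (text[name][0], text[name][1]))


print(row_major(regions))     # ['P1', 'P2', 'P3', 'P4']
print(column_major(regions))  # ['P1', 'P3', 'P2', 'P4']
```

In this toy layout the image only interrupts the columns visually, so the column-by-column order is the one a reader would follow.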
If you want, I can share the image in question, although the text is in Italian, so you might not be able to tell from the text alone why it continues down the column.
I assume that the new network does not take the presence of images into account. Seeing paragraphs far apart, it does not understand that they should be read vertically, since the gap is only an aesthetic break; instead it reads the upper cluster first and then the lower cluster, as if there were a contextual break.
I don't know whether I misunderstood something or whether this is intended behavior, but in case it is not, I wanted to flag it for correction.
Apart from that, the new ordering is fantastic and when there are no pictures it is clearly superior to the previous one. Well done! 💯