Issue in machine_based_reading_order_integration branch #125

Open
LucPol98 opened this issue Apr 30, 2024 · 1 comment

Comments

LucPol98 commented Apr 30, 2024

Hi,
I was looking at your branch for computing region ordering with a neural network, and I think I noticed a possible issue that I want to report to you. Specifically, I tried the code on a page with the following layout: a large image spans the center of the page from left to right and interrupts two columns of text. So there are 4 paragraphs and one image.

The new sorting you want to implement does not read the paragraphs column by column. Instead, it first reads the ones above the image and then the ones below it, each from left to right. The previous sorting, on the other hand, read them correctly. This seems to me to be an issue whenever an image is present.
If you want, I can share the image in question with you, although the text is in Italian, so looking at the text might still not make it clear why it continues this way.
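To make the two orderings concrete, here is a minimal sketch of the layout described above. The coordinates and region names are made up for illustration, and the two sort functions are simple stand-ins: they are neither the code from the branch nor the previous heuristic.

```python
from typing import NamedTuple


class Region(NamedTuple):
    name: str
    kind: str  # "text" or "image"
    x: int     # left edge
    y: int     # top edge
    w: int
    h: int


# Hypothetical page: two text columns interrupted by a full-width image.
regions = [
    Region("P1", "text", 0, 0, 100, 100),      # left column, above the image
    Region("P2", "text", 120, 0, 100, 100),    # right column, above the image
    Region("IMG", "image", 0, 110, 220, 80),   # full-width image in the center
    Region("P3", "text", 0, 200, 100, 100),    # left column, below the image
    Region("P4", "text", 120, 200, 100, 100),  # right column, below the image
]


def row_first(regions):
    """Top cluster first, then bottom cluster, each left to right."""
    text = [r for r in regions if r.kind == "text"]
    return [r.name for r in sorted(text, key=lambda r: (r.y, r.x))]


def column_first(regions):
    """Left column top to bottom, then right column top to bottom."""
    text = [r for r in regions if r.kind == "text"]
    return [r.name for r in sorted(text, key=lambda r: (r.x, r.y))]


print(row_first(regions))     # order I get from the new sorting
print(column_first(regions))  # order I would expect
```

With these made-up boxes, `row_first` gives P1, P2, P3, P4, which matches what I see from the new sorting, while `column_first` gives P1, P3, P2, P4, which is the order the text actually follows.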

I assume that the new network does not take the presence of images into account. When it sees paragraphs that are far apart, it does not understand that they should be read vertically, because the gap is only an aesthetic break. Instead, it reads the upper cluster first and then the lower cluster, as if the gap were a contextual break.

I don't know if I misunderstood something or if this is intended behavior, but in case it is not, I wanted to flag the problem so it can be corrected.

Apart from that, the new ordering is fantastic and when there are no pictures it is clearly superior to the previous one. Well done! 💯

vahidrezanezhad (Member) commented

Thank you for taking the time to test the new model. Unlike heuristic methods, machine-based reading order detection is trained on ground truth (GT) data. As you correctly pointed out, the machine-based approach performs poorly on certain document layouts because those layouts are not represented in the training GT. Other issues may also arise. For instance, our GT dataset does not cover newspapers with multiple articles, where the articles themselves can be read in any order even though the text regions within each article have a unique reading order. This tool is our first attempt at a machine-based model for reading order, and we aim to improve both the ground truth data and the model structure.
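
The newspaper case can be pictured as a partial order: regions inside one article must keep their relative order, while the articles themselves may come in any sequence. Below is a minimal sketch of such a check. The function name, the region IDs, and the list-of-lists representation are assumptions made for illustration, not the format of our ground truth or any code in the repository.

```python
def respects_ground_truth(predicted, articles):
    """Return True if `predicted` keeps the region order inside every article.

    `predicted` is a flat list of region IDs in predicted reading order.
    `articles` is a list of lists, each giving one article's regions in their
    unique reading order. The order of the articles themselves is not
    constrained, and contiguity of an article's regions is not checked.
    Assumes every region in `articles` also appears in `predicted`.
    """
    position = {region: i for i, region in enumerate(predicted)}
    for article in articles:
        indices = [position[region] for region in article]
        if indices != sorted(indices):
            return False
    return True


# Hypothetical newspaper page with two independent articles.
articles = [["A1", "A2", "A3"], ["B1", "B2"]]

# Articles swapped, but each one is internally in order: acceptable.
print(respects_ground_truth(["B1", "B2", "A1", "A2", "A3"], articles))

# A2 and A3 are swapped inside the first article: not acceptable.
print(respects_ground_truth(["A1", "A3", "A2", "B1", "B2"], articles))
```

A single flat reading order as ground truth would count the first prediction as wrong, which is one reason such layouts are hard to cover with the current kind of GT.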

@LucPol98 LucPol98 closed this as completed May 8, 2024
@LucPol98 LucPol98 reopened this May 8, 2024