enhancement: entire page OCR output included with hi_res (#1263)
Bumps unstructured-inference==0.5.19 to bring in @christinestraub's enhancement Unstructured-IO/unstructured-inference#186. This is a **massive** improvement: previously, text was omitted from `hi_res` output if the layout model had not put a bounding box around it, whereas now the OCR output for the entire page is included. In addition, the xycut sorting algorithm generally does a good job of ordering the merged bboxes (OCR text not covered by the layout model, together with the layout-model bboxes) into "natural reading order." More details in Unstructured-IO/unstructured-inference#186 (comment). Bonus: changelog fix.
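To illustrate the idea, here is a minimal sketch of merging OCR text that the layout model missed into the layout output. It is not the code from unstructured-inference#186: the `Box` type, the `merge_uncovered_ocr` helper, and the 0.5 coverage threshold are all hypothetical, and a plain top-to-bottom, left-to-right sort stands in for the xycut algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical bounding box with text; not a type from the library."""
    x1: float
    y1: float
    x2: float
    y2: float
    text: str
    source: str  # "layout" or "ocr"

    @property
    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def intersection_area(a: Box, b: Box) -> float:
    """Area of overlap between two boxes, 0.0 if they do not overlap."""
    width = min(a.x2, b.x2) - max(a.x1, b.x1)
    height = min(a.y2, b.y2) - max(a.y1, b.y1)
    return max(0.0, width) * max(0.0, height)


def is_covered(ocr_box: Box, layout_boxes: list[Box], threshold: float) -> bool:
    """True if some layout box overlaps at least `threshold` of the OCR box."""
    if ocr_box.area == 0:
        return True  # degenerate boxes carry no usable region
    return any(
        intersection_area(ocr_box, layout_box) / ocr_box.area >= threshold
        for layout_box in layout_boxes
    )


def merge_uncovered_ocr(
    layout_boxes: list[Box],
    ocr_boxes: list[Box],
    threshold: float = 0.5,  # assumed value, chosen only for illustration
) -> list[Box]:
    """Keep all layout boxes, add OCR boxes the layout model missed, then sort.

    The sort key is a simplified stand-in for xycut: top edge first, then
    left edge. It does not handle multi-column pages the way xycut would.
    """
    extras = [b for b in ocr_boxes if not is_covered(b, layout_boxes, threshold)]
    return sorted(list(layout_boxes) + extras, key=lambda b: (b.y1, b.x1))


layout = [
    Box(0, 0, 100, 20, "Title", "layout"),
    Box(0, 60, 100, 80, "Body", "layout"),
]
ocr = [
    Box(0, 0, 100, 20, "Title", "ocr"),  # already covered by the layout model
    Box(0, 30, 100, 50, "Caption missed by layout", "ocr"),
]
result = merge_uncovered_ocr(layout, ocr)
print([b.text for b in result])
```

In this toy example the caption has no layout box around it, so it is the only OCR box added, and it lands between the title and the body.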
Showing 14 changed files with 2,618 additions and 664 deletions.