-
-
Notifications
You must be signed in to change notification settings - Fork 327
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
This is nearly a full rewrite of the alignment code. Position is now based on the line baseline (provided by Tesseract) and the font size is smarter (defaulting to Tesseract's provided value with various adjustments). The goals were: - Have Ctrl+F highlight the word as accurately as possible. - Have Ctrl+A/Ctrl+C end up with text that matches the original as closely as possible. - Have PdfSharp and Pdfium produce consistent output. On my test cases all goals are fully met. #236
- Loading branch information
Showing
9 changed files
with
254 additions
and
133 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,15 @@ | ||
namespace NAPS2.Ocr; | ||
using System.Collections.Immutable; | ||
|
||
namespace NAPS2.Ocr; | ||
|
||
/// <summary> | ||
/// A element in the result of an OCR request that represents a text segment. | ||
/// </summary> | ||
public record OcrResultElement(string Text, string LanguageCode, bool RightToLeft, (int x, int y, int w, int h) Bounds); | ||
public record OcrResultElement( | ||
string Text, | ||
string LanguageCode, | ||
bool RightToLeft, | ||
(int x, int y, int w, int h) Bounds, | ||
int Baseline, | ||
int FontSize, | ||
ImmutableList<OcrResultElement> Children); |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.