Port to ocrd core version 3.0.0 #5

MehmedGIT · 2024-08-13T08:43:04Z

Already migrated processors:

OcropyBinarize
OcropyClip
OcropyDenoise
OcropyDeskew
OcropyDewarp
OcropyRecognize
OcropyResegment
OcropySegment
OcropyTrain
PostCorrector
CISAligner

kba · 2024-08-14T08:12:56Z

ocrd_cis/ocropy/binarize.py

+        if level == 'page':
+            try:
+                ret.append(self.process_page(page, page_image, page_xywh, zoom, page_id, output_file_id))
+            except ValueError as e:


@bertsky Do we even want to catch these or should we let them explode and let core do the error handling? For page-wide binarization, I think we probably should because that means the page failed. But for region and line, we might have rogue instances of zero size but all the other regions/lines might be fine.

bertsky · 2024-08-14

We will soon catch all exceptions on the page level in core. So this should not be handled here.

Regarding lower-level error handling: We have discussed this before, but partial failures across a page in general mean we also must be able to cope with partial annotation (incremental processors). We have no real solution ATM.

But since this PR just preserves the current behaviour (skipping partial failures regardless of level), and there are also other possible causes to catch in core, let's keep it like this.
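The distinction raised in this thread, letting a page-level failure propagate to the caller while skipping individual failing regions or lines, can be sketched roughly as follows. This is not the code in this PR (which keeps the existing skip behaviour at every level); `process_segments`, `process_page_level`, and `binarize_stub` are hypothetical names used only for illustration.

```python
import logging
from typing import Callable, List

logger = logging.getLogger("binarize_sketch")


def process_segments(segments: List[dict],
                     process_one: Callable[[dict], str]) -> List[str]:
    """Process each region/line; skip the ones that raise ValueError."""
    results = []
    for segment in segments:
        try:
            results.append(process_one(segment))
        except ValueError as err:
            # Partial failure: log it and keep going with the other segments.
            logger.warning("skipping segment %s: %s", segment["id"], err)
    return results


def process_page_level(page: dict, process_one: Callable[[dict], str]) -> str:
    """No try/except here: a page-level failure is left to the caller."""
    return process_one(page)


def binarize_stub(segment: dict) -> str:
    # Hypothetical stand-in for the real binarization step.
    if segment["width"] == 0 or segment["height"] == 0:
        raise ValueError("segment has zero size")
    return segment["id"] + ".bin"
```

With this split, one zero-size region does not discard the results for its siblings, whereas a failing page surfaces as an exception for whatever layer calls `process_page_level`.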

kba

ocrd-cis-ocropy-binarize with new API LGTM!

MehmedGIT · 2024-08-27T12:03:01Z

Would you be willing to switch from CircleCI to GitHub Actions?

Sure, let's try that!

I will push a simple GitHub Actions workflow to this PR to execute the tests.

MehmedGIT · 2024-08-27T13:52:14Z

@bertsky, I think the GitHub Actions workflow is not triggered until you have that in your fork's master/main?

bertsky · 2024-08-27T13:55:26Z

> I think the GitHub Actions workflow is not triggered until you have that in your fork's master/main?

No, AFAIK GH Actions must be activated for each GH repo/fork individually. So in this case, it would be your fork. Then, once I have merged here, I will have to enable it on my fork. And once fix-alpha-shape gets merged upstream, GHA would need to be activated there.

bertsky · 2024-08-27T14:18:40Z

Wow. So with that we now know that we will get more problems on Ocrolib starting with Python 3.9:

NameError: name 'NaN' is not defined

But at least 3.8 runs through (and fast!)

MehmedGIT · 2024-08-27T14:21:37Z

> No, AFAIK GH Actions must be activated for each GH repo/fork individually. So in this case, it would be your fork. Then once I merged here, I have to enable on my fork. And once fix-alpha-shape gets merged upstream, GHA would need to be activated there.

It was probably me who messed up a bit by creating the folder workflow instead of workflows ... Renaming the folder may have been enough to trigger GH Actions on this PR, without merging branches into master.
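GitHub Actions only picks up workflow files placed directly in .github/workflows, so a file under .github/workflow is silently ignored. A minimal sketch of a path check that would catch this kind of slip; `is_workflow_path` is a hypothetical helper, not part of this repository:

```python
from pathlib import PurePosixPath


def is_workflow_path(path: str) -> bool:
    """Return True if a repo-relative path is where GitHub Actions looks for workflows."""
    parts = PurePosixPath(path).parts
    return (
        len(parts) == 3
        and parts[0] == ".github"
        and parts[1] == "workflows"  # plural; "workflow" is not recognised
        and parts[2].endswith((".yml", ".yaml"))
    )
```

Such a check could run in a pre-commit hook or a local script over the list of changed files.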

bertsky · 2024-08-27T14:24:56Z

Ok, so what do we do next? Debugging CircleCI seems tiresome, perhaps we should just deactivate that (keeping the config file). But then we should also add a CD to GHA. (The credentials for ocrd on Dockerhub must be added as a Secret in project settings IIRC.)

MehmedGIT · 2024-08-27T14:25:37Z

> Wow. So with that we now know that we will get more problems on Ocrolib starting with Python 3.9

I will see if I can find a quick fix for that. But I will have to modify ocrolib slightly to make NaN work with newer Python versions.

EDIT: All tests pass now after 224e86f and a397531. Not sure if a397531 was needed since tests pass regardless.
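The actual changes in 224e86f and a397531 are not shown in this thread. As a rough sketch of one way to avoid this kind of NameError, the name can be defined explicitly in the module that uses it instead of being expected from elsewhere. The cause assumed here (NaN previously arriving implicitly, e.g. through a wildcard import) is a guess, and `safe_ratio` is a hypothetical helper used only to exercise the name:

```python
import math

# Define the name explicitly so it no longer depends on what other
# imports happen to inject into the namespace (assumed cause, unconfirmed).
NaN = float("nan")


def safe_ratio(numerator: float, denominator: float) -> float:
    """Return numerator/denominator, or NaN when the ratio is undefined."""
    if denominator == 0:
        return NaN
    return numerator / denominator


def is_missing(value: float) -> bool:
    """NaN never compares equal to itself, so test with math.isnan instead of ==."""
    return math.isnan(value)
```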

MehmedGIT · 2024-08-27T14:33:13Z

> But then we should also add a CD to GHA. (The credentials for ocrd on Dockerhub must be added as a Secret in project settings IIRC.)

Yes.

MehmedGIT added 3 commits August 13, 2024 10:41

add executable property

2ed2c4f

add setup method if missing

61e6caf

add self.logger wherever missing

a0965c2

kba reviewed Aug 13, 2024

kba and others added 11 commits August 13, 2024 14:57

require core >= 3.0.0a1

dbccae5

port part of binarize to core v3

8557a26

Merge pull request #1 from kba/port-to-v3

911a4c1

Port to v3

move: determine_zoom to common.py

278b706

move: logger init to setup()

6beec17

refactor: log -> logger

1b2fea3

remove: unused imports

fe33494

remove: file grp cardinality checks inside process()

3368a53

remove: constructors, adapt setup()

ae97768

completed: OcropyBinarize

60d02d2

remove file grp cardinality asserts

dcaccd4

kba reviewed Aug 14, 2024

MehmedGIT and others added 8 commits August 14, 2024 10:51

Update ocrd_cis/ocropy/binarize.py

b178227

Co-authored-by: Konstantin Baierer <[email protected]>

Update ocrd_cis/ocropy/binarize.py

67b6107

Co-authored-by: Konstantin Baierer <[email protected]>

Update ocrd_cis/ocropy/binarize.py

06a98b1

Co-authored-by: Konstantin Baierer <[email protected]>

Update ocrd_cis/ocropy/binarize.py

1e6cd7b

Co-authored-by: Konstantin Baierer <[email protected]>

fix: potentially wrong dpi in logs

71bb26d

binarize: don't conflate region/lines seg, pass output_file_id

64f02a3

Update binarize.py

d7c15c7

Merge pull request #2 from kba/fix-binarize-v3

156d79f

binarize: don't conflate region/lines seg, pass output_file_id

add: simple github actions workflow

f6e437f

bertsky requested changes Aug 27, 2024

MehmedGIT and others added 2 commits August 27, 2024 15:30

Update .github/workflow/tests.yml

403781a

Co-authored-by: Robert Sachunsky <[email protected]>

Update .github/workflow/tests.yml

97083bb

Co-authored-by: Robert Sachunsky <[email protected]>

bertsky self-requested a review August 27, 2024 13:36

fix: checkout ref

2b20e0c

MehmedGIT marked this pull request as ready for review August 27, 2024 13:53

bertsky approved these changes Aug 27, 2024

MehmedGIT added 2 commits August 27, 2024 16:08

Create GH Actions workflow: test.yml

86a08eb

Merge branch 'master' into port-to-v3

231edf2

delete: wrong path for workflows

1d7e9a0

MehmedGIT added 2 commits August 27, 2024 16:27

fix: NaN error for python3.9+

224e86f

fix: NaN in reading_order in morph.py

a397531

bertsky and others added 9 commits September 1, 2024 11:26

fix type hints

9cf8305

dewarp: make thread-safe

a0c734d

recognize: disallow multithreading (impossible with current lstm implementation)

66baaf0

postcorrect: make work under METS Server

32ce656

tests: use METS Server if OCRD_MAX_PARALLEL_PAGES>1

c4a5999

make test: run serially and parallel, show times

ae7dc67

require ocrd>=3.0.0b4

e540b10

segment: adapt to numpy deprecation

99b3489

eval/stats: Levenshtein -> rapidfuzz.distance.Levenshtein

dee1abf
