Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Review tests #72

Open
4 of 11 tasks
mikegerber opened this issue Feb 23, 2022 · 16 comments
Open
4 of 11 tasks

Review tests #72

mikegerber opened this issue Feb 23, 2022 · 16 comments
Assignees

Comments

@mikegerber
Copy link
Collaborator

mikegerber commented Feb 23, 2022

  • Run on all targeted Python versions (to avoid surprises with TensorFlow versions)
  • We're working with GT. Make the GT text removal more robust and implement safeguards ("No TextLine in the GT!")
  • I also noticed an issue where I had an old checkout of repo/assets which would NOT trigger test fails but a fresh checkout would
  • test/assets are copied once and are then reused
  • We reuse the temporary workspace which is also potentially a problem
    • Don't reuse the workspace directory name
  • Run scheduled tests (There have been subtle changes, e.g. in OCR-D, that changed the filenames of created files)
  • Make pytest work - currently only make test works
  • Why is the CircleCI result not accessible/private?
  • Consider dropping Python 3.6 support: It's EOL,
    • and tests now spend most time on compiling OpenCV for Python 3.6...
@mikegerber
Copy link
Collaborator Author

* [ ]  I also noticed an issue where I had an old checkout of `repo/assets` which would NOT trigger test fails but a fresh checkout would
* [ ]  `test/assets` are copied once and are then reused

@bertsky Reading https://github.com/OCR-D/ocrd_tesserocr/blob/master/Makefile#L124-L135, I think we should discuss how to handle {repo,test}/assets better?

@bertsky
Copy link
Contributor

bertsky commented Feb 24, 2022

Reading https://github.com/OCR-D/ocrd_tesserocr/blob/master/Makefile#L124-L135, I think we should discuss how to handle {repo,test}/assets better?

Yes, I have always argued against that pattern – precisely because of old checkouts. The proper way to handle this is via subrepos, which can be updated. See for example repo/assets in core.

@bertsky
Copy link
Contributor

bertsky commented Feb 24, 2022

  • test/assets are copied once and are then reused

Yes, but the dependency to repo/assets ensures that it is in sync with upstream. (The problem is the rule for the latter, see above.)

  • We reuse the temporary workspace which is also potentially a problem

That's not true. Each test case is independent from the others – the fixture makes a fresh copy of test/assets under /tmp.

@mikegerber
Copy link
Collaborator Author

  • We reuse the temporary workspace which is also potentially a problem
    That's not true. Each test case is independent from the others – the fixture makes a fresh copy of test/assets under /tmp.

Yeah, I didn't remember writing it correctly :) My fixture code leaves the temporary directory though, and I think I'm going to rewrite it to use mktemp and only leaving it only if a fail occurs or the user wants it.

@mikegerber
Copy link
Collaborator Author

@mikegerber
Copy link
Collaborator Author

We now install an older opencv-python-headless using a wheel (pip install --prefer-binary opencv-headless) for Python 3.6. This avoids the major pain of having to compile OpenCV for this version and so cuts down test time from >30 minutes to <5 minutes. The price paid for this is that we don't test with the latest OpenCV, but as Python 3.6 is EOL, I can live with it.

@bertsky
Copy link
Contributor

bertsky commented Mar 13, 2023

That's a nice recipe. Should adopt that in all the other places where we still use Python 3.6

@mikegerber
Copy link
Collaborator Author

* [ ]  Why is the CircleCI result not accessible/private?

@kba Do you have an idea how to resolve this? I checked the project settings but couldn't find anything. To reproduce open a private browser window (= not logged in to CircleCI) and try:

😡 https://app.circleci.com/pipelines/github/OCR-D/ocrd_calamari gives 404
https://app.circleci.com/pipelines/github/OCR-D/core - for comparison - gives the build results

@mikegerber
Copy link
Collaborator Author

mikegerber commented Mar 14, 2023

That's a nice recipe. Should adopt that in all the other places where we still use Python 3.6

Keep me posted if you adopt this. I currently use this in CircleCI config but could image moving it to the Makefile if that's the convention:

      - run: if python3 --version | grep -q "Python 3.6"; then pip install --prefer-binary -U opencv-python-headless numpy; fi  # to avoid compilation

I believe there would also be line in requirements.txt that could do the same, but I didn't test this:

opencv-python-headless, opencv-python-headless == 4.6.0.66 ; python_version < "3.7"

But then I would really like if everyone would move away from Ubuntu 18.04, if possible.

@mikegerber
Copy link
Collaborator Author

That's a nice recipe. Should adopt that in all the other places where we still use Python 3.6

It gets pulled in by ocrd so affects every processor - just had the same issue with dinglehopper (resolved with the same workaround).

@bertsky
Copy link
Contributor

bertsky commented Mar 14, 2023

Agreed, we should add this to core.

@bertsky
Copy link
Contributor

bertsky commented Mar 14, 2023

But then I would really like if everyone would move away from Ubuntu 18.04, if possible.

we are not Ubuntu 20 / Python 3.7 now in core/ocrd_all Docker BTW. (There's been no notification unfortunately.)

@kba
Copy link
Member

kba commented Mar 14, 2023

(There's been no notification unfortunately.

Notification in the chat coming once I get around to fixing the base image issue and do next release.

@kba Do you have an idea how to resolve this? I checked the project settings but couldn't find anything. To reproduce open a private browser window (= not logged in to CircleCI) and try:

😡 https://app.circleci.com/pipelines/github/OCR-D/ocrd_calamari gives 404
https://app.circleci.com/pipelines/github/OCR-D/core - for comparison - gives the build results

I've compared the project settings for both projects and could spot no difference. I'll try rotating the GitHub keys, perhaps there are stale permission settings linked to that.

@mikegerber
Copy link
Collaborator Author

Run scheduled tests (There have been subtle changes, e.g. in OCR-D, that changed the filenames of created files)

I set up a monthly schedule. (Needs to be done by "hand" unfortunately)

@bertsky
Copy link
Contributor

bertsky commented Mar 14, 2023

pip install --prefer-binary -U opencv-python-headless numpy

I'm going to add a line

    if $(PYTHON) -V | fgrep -e 3.5 -e 3.6; then $(PIP) install --prefer-binary -U opencv-python-headless numpy; fi

to the recipe of make install in core – so that it will end up in all sub-venv, Docker and CI targets.

@kba Is it ok to add this to OCR-D/core#986?

@kba
Copy link
Member

kba commented Mar 14, 2023

@kba Is it ok to add this to OCR-D/core#986?

Yes, please. The opencv build time is enormous for fresh python 3.6- installations, great if we can reduce that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants