This repository has been archived by the owner on Feb 19, 2021. It is now read-only.

Upgrade Docker image to Alpine 3.11 #612

Merged


languitar
Contributor

This makes Tesseract 4.1 available, which fixes several issues, such as infinite
processing loops on some documents:
tesseract-ocr/tesseract#2288 (comment)

Some dependencies had to be bumped to be compatible with the new Alpine
libraries.

@languitar
Contributor Author

The Travis failure seems to be unrelated to my changes. It originates from the Sphinx documentation :/

src/paperless/settings.py (outdated)
bauerj previously approved these changes Feb 21, 2020
src/paperless/settings.py (outdated)
@Tooa

Tooa commented Feb 23, 2020

> The Travis failure seems to be unrelated to my changes. It originates from the Sphinx documentation :/

@languitar @bauerj Merge PR #601 first to fix this issue.

> How is this related to upgrading the operating system?

I had to add the protocol to CORS_ORIGIN_WHITELIST too, so it seems like a required fix. See #600, Open Questions 1.
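
As a rough illustration of the whitelist change discussed here: assuming the CORS package in question is django-cors-headers, its newer releases expect every entry in CORS_ORIGIN_WHITELIST to be a full origin including the scheme. The sketch below is not the code from this PR; the helper `with_scheme`, the `raw_hosts` value, and the default scheme are assumptions made for the example.

```python
def with_scheme(origin, default_scheme="http"):
    """Return the origin unchanged if it has a scheme, else prefix one.

    Hypothetical helper for illustration; not part of any library.
    """
    origin = origin.strip()
    if "://" in origin:
        return origin
    return f"{default_scheme}://{origin}"


# Assumed example input: a comma-separated list as it might come from an
# environment variable, mixing bare hosts and full origins.
raw_hosts = "localhost:8080, https://paperless.example.com"

# Every entry ends up as a full origin with a scheme.
CORS_ORIGIN_WHITELIST = tuple(
    with_scheme(host) for host in raw_hosts.split(",") if host.strip()
)
```

Defaulting to `http` is only a guess at a sensible fallback; a deployment behind TLS would want `https` instead.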

Tooa previously approved these changes Feb 23, 2020

Tooa left a comment

Thanks for taking the time to upgrade Alpine in order to fix issues with tesseract. I have no objections to merging your changes. In general, it would be nice to put changes that are not part of the PR's main feature into their own commit.

  • Conducted tests
    • docker-compose build works
    • All unit-tests still succeed
    • PDF documents are processed
    • OCR output looks fine

Maybe I found something:

  • After the first docker-compose up, I added PAPERLESS_OCR_LANGUAGES=deu and PAPERLESS_OCR_LANGUAGE=deu to both containers. The log of the consumer shows me:
sudo: setrlimit(RLIMIT_CORE): Operation not permitted
Operations to perform:
  Apply all migrations: admin, auth, contenttypes, documents, reminders, sessions
Running migrations:
  No migrations to apply.
fetch http://dl-cdn.alpinelinux.org/alpine/v3.11/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.11/community/x86_64/APKINDEX.tar.gz
(1/1) Installing tesseract-ocr-data-deu (4.1.0-r0)
OK: 308 MiB in 128 packages
sudo: setrlimit(RLIMIT_CORE): Operation not permitted

@pitkley
Member

pitkley commented Feb 23, 2020

@languitar thanks for the PR, dependency updates are always nice! #601 is just missing another review from @the-paperless-project/reviewers, then you can rebase on master to fix the Travis issue.

Regarding the setrlimit(RLIMIT_CORE) error/warning @Tooa mentioned: a simple fix is to echo 'Set disable_coredump false' >> /etc/sudo.conf, see https://ask.fedoraproject.org/t/sudo-setrlimit-rlimit-core-operation-not-permitted/4223 and https://gitlab.alpinelinux.org/alpine/aports/issues/11122. Maybe the suggested doas-change in the latter link is a more correct fix, but I think that would be too big for what this PR should be doing.

Feel free to ping me directly for a review once the Travis build is fixed. 👍
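
The sudo.conf workaround suggested above appends a line with `>>`, which adds a duplicate entry each time it runs. A minimal Python sketch of an idempotent variant follows; the helper name `ensure_line` is hypothetical, and this is not what the PR itself does.

```python
from pathlib import Path

# The setting quoted in the comment above.
SUDO_CONF_LINE = "Set disable_coredump false"


def ensure_line(path, line=SUDO_CONF_LINE):
    """Append `line` to the file at `path` unless it is already present.

    Returns True if the file was modified, False otherwise.
    """
    conf = Path(path)
    text = conf.read_text() if conf.exists() else ""
    if line in text.splitlines():
        return False
    # Start on a fresh line if the file does not end with a newline.
    separator = "" if text == "" or text.endswith("\n") else "\n"
    with conf.open("a") as fh:
        fh.write(f"{separator}{line}\n")
    return True
```

Calling `ensure_line("/etc/sudo.conf")` would need root privileges inside the container; in an image build, the one-line shell command is likely simpler.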

@languitar
Contributor Author

I have rebased the PR, added the setrlimit fix, and split the PR into two distinct commits with further clarifications.

pitkley previously approved these changes Feb 29, 2020
Member

pitkley left a comment

I checked the major dependency updates as best I could and didn't find any incompatibilities that should affect us (besides the already fixed CORS whitelist).

Not all dependencies work well on Alpine 3.11. Thus, bump dependencies and lock
again.

Due to also updating the CORS packages while dependency locking, the
CORS_ORIGIN_WHITELIST had to be changed to valid URIs, which are now required
by the respective packages.
This makes Tesseract 4.1 available, which fixes several issues, such as infinite
processing loops on some documents: tesseract-ocr/tesseract#2288
MasterofJOKers merged commit 222acb8 into the-paperless-project:master on Mar 1, 2020