OCR REST API using Tesseract OCR Engine (via Tess4J)
Docker image available: https://hub.docker.com/r/lgdd/tess4j-rest
Try and run:
docker run -it --rm -p 8000:8000 lgdd/tess4j-rest
Run docker-compose up --build (also available as make dev).
Note: You can also run ./mvnw quarkus:dev (or quarkus dev). For this method to work, the environment variable TESSDATA_PREFIX must be set to the absolute path of this project resource: src/test/resources/test-tessdata/eng.traineddata
You can navigate to http://localhost:8000/q/swagger-ui and test uploading an image.
Or you can quickly test the endpoint with curl (from this project root):
curl -X 'POST' \
'http://localhost:8000/detect-text' \
-H 'accept: text/plain' \
-H 'Content-Type: multipart/form-data' \
-F 'file=@src/test/resources/test-data/eurotext.png'
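For repeated calls, the request above could be wrapped in a small shell function. This is only a sketch: the function name ocr_image and the OCR_URL variable are hypothetical and not part of this project. It assumes the service is reachable at the local address used in this README.

```shell
# Hypothetical helper: send one image to the /detect-text endpoint.
# OCR_URL is an assumed variable; it defaults to the address used above.
OCR_URL="${OCR_URL:-http://localhost:8000}"

ocr_image() {
  local image="$1"
  # Fail early with a clear message instead of letting curl report it.
  if [ ! -f "$image" ]; then
    echo "File not found: $image" >&2
    return 1
  fi
  # --fail makes curl exit non-zero on HTTP 4xx/5xx responses.
  curl --silent --show-error --fail \
    -X POST "$OCR_URL/detect-text" \
    -H 'accept: text/plain' \
    -F "file=@$image"
}
```

Usage, from the project root: ocr_image src/test/resources/test-data/eurotext.png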
# Parent folder path for tesseract data files
ENV TESSDATA_PREFIX="/opt/tesseract/tessdata"
# Suffix for the data repository to use.
# Either "best", "fast" or "".
# See https://github.com/tesseract-ocr/tessdata#readme
ENV TESSERACT_DATA_SUFFIX="best"
# Version of the data repository.
# See https://github.com/tesseract-ocr/tessdata#readme
ENV TESSERACT_DATA_VERSION="4.1.0"
# Additional languages to download on the application startup.
# For the possible values, see https://github.com/tesseract-ocr/tessdata
ENV TESSERACT_DATA_LANGS="fra,spa,deu"
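These defaults can presumably be overridden when starting the container, since the languages are downloaded on application startup. The sketch below assumes the image reads TESSERACT_DATA_LANGS and TESSERACT_DATA_SUFFIX at runtime. The function name run_ocr and the DOCKER variable are hypothetical; DOCKER exists only so the command can be dry-run with echo.

```shell
# Assumed: the container reads these variables on startup.
# DOCKER is a hypothetical switch; set DOCKER=echo to print the command
# instead of running it.
DOCKER="${DOCKER:-docker}"

run_ocr() {
  local langs="$1"          # comma-separated, e.g. "fra,ita"
  local suffix="${2:-best}" # "best", "fast" or ""
  "$DOCKER" run --rm -p 8000:8000 \
    -e TESSERACT_DATA_LANGS="$langs" \
    -e TESSERACT_DATA_SUFFIX="$suffix" \
    lgdd/tess4j-rest
}
```

Usage: run_ocr fra,ita fast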
Readiness: /q/health/ready
Liveness: /q/health/live
The application is ready and live once all additional languages have been downloaded.
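Because the language downloads can take a while, a script that starts the container may want to wait before sending requests. A minimal polling sketch follows. The function name wait_until_ready and its defaults (30 attempts, 2 seconds apart) are hypothetical, and it assumes the standard Quarkus readiness path /q/health/ready on port 8000.

```shell
# Hypothetical helper: poll the readiness endpoint until it answers
# with a success status, or give up after a number of attempts.
wait_until_ready() {
  local url="${1:-http://localhost:8000/q/health/ready}"
  local attempts="${2:-30}"
  local delay="${3:-2}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # --fail: non-zero exit on HTTP errors; --max-time: never hang.
    if curl --silent --fail --max-time 5 --output /dev/null "$url"; then
      return 0
    fi
    # Pause between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "Not ready after $attempts attempts: $url" >&2
  return 1
}
```

Usage: wait_until_ready && echo "OCR service is ready"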