[OCR] Convert jupyter notebook (with bash script) to Python script #254
Comments
Hello @cuducos and @jtemporal, I took a quick look at the problem and it seems we're pending on the
Exactly, a bit of background just in case ;)
I actually did the OCR with Python, and it even runs things in parallel 😄 :mindblown: Just check the stuff I linked on #188 (comment) and LMK if you need any help with that!
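For anyone picking this up, here is a rough sketch of what running OCR over many receipts in parallel could look like using only the standard library. It is not the code linked from #188: `ocr_in_parallel` and the `ocr_one` callable are hypothetical names, and `ocr_one` stands in for whatever function actually turns one file into text.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def ocr_in_parallel(
    paths: Iterable[str],
    ocr_one: Callable[[str], str],  # hypothetical: takes a file path, returns its text
    max_workers: int = 4,
) -> Dict[str, str]:
    """Apply ocr_one to every path concurrently and map each path to its text."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input paths.
        texts = pool.map(ocr_one, paths)
        return dict(zip(paths, texts))
```

Threads fit when `ocr_one` mostly waits on a remote API or a subprocess; for CPU-bound pure-Python work, `ProcessPoolExecutor` from the same module may be a better choice.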
Have you seen http://ocrmypdf.readthedocs.io/ ? I use that in my research. It's an interface to Tesseract OCR, and the results seem good.
Heads up: I've been hacking away on a better approach for OCRing receipts at https://github.com/fgrehm/serenata-ocr and one of the ideas is that it will have support for a "pluggable provider interface", meaning people can choose between Google Cloud Vision, https://ocr.space/, Microsoft Azure and maybe even some self-hosted Tesseract infra.
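To illustrate the "pluggable provider" idea, here is a minimal sketch in Python. It is not taken from serenata-ocr: `OCRProvider`, `EchoProvider`, `PROVIDERS`, `get_provider` and `ocr_receipt` are all hypothetical names, and the only provider included is a stand-in that performs no real OCR.

```python
from typing import Callable, Dict, Protocol


class OCRProvider(Protocol):
    """Hypothetical interface: any object with a recognize() method qualifies."""

    def recognize(self, pdf_bytes: bytes) -> str: ...


class EchoProvider:
    """Stand-in provider for illustration only; it does no real OCR."""

    def recognize(self, pdf_bytes: bytes) -> str:
        return f"<{len(pdf_bytes)} bytes received>"


# Registry mapping a provider name to a factory. Real backends (a cloud API,
# a self-hosted Tesseract service) would be registered here the same way.
PROVIDERS: Dict[str, Callable[[], OCRProvider]] = {"echo": EchoProvider}


def get_provider(name: str) -> OCRProvider:
    """Look up a provider by name and build an instance of it."""
    try:
        return PROVIDERS[name]()
    except KeyError:
        raise ValueError(f"unknown OCR provider: {name!r}") from None


def ocr_receipt(pdf_bytes: bytes, provider_name: str = "echo") -> str:
    """Run one receipt through whichever provider the caller selected."""
    return get_provider(provider_name).recognize(pdf_bytes)
```

With this shape, switching backends is a matter of passing a different `provider_name`, and the calling code never needs to know which service does the actual recognition.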
See #298
Currently the code that generates a dataset with the text from CEAP receipts is in this notebook.
As it uses shell scripts here and there, it would be great to have this as a standard `src/` Python file, without shell scripts, to automate this data collection without so many dependencies, such as binaries available in one's `$PATH`.