[OCR] Convert jupyter notebook (with bash script) to Python script #254

cuducos · 2017-06-19T14:19:53Z

Currently, the code that generates a dataset with the text extracted from CEAP receipts lives in this notebook.

As it uses shell scripts here and there, it would be great to turn it into a standard Python file under src/, with no shell scripts, so this data collection can be automated without so many dependencies, such as binaries that must be available in one's $PATH.
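A minimal sketch of how the dataset step could look as a plain Python module. It assumes the OCR step is exposed as a function that takes a receipt's bytes and returns its text. The name build_dataset, the (document_id, content) input pairs and the document_id/text CSV columns are hypothetical and do not reflect the notebook's actual schema. Only the standard library is used, so this part needs no shell or $PATH binaries.

```python
import csv
from typing import Callable, Iterable, TextIO, Tuple

# Hypothetical signature: any function that turns receipt bytes into text.
OCRFunction = Callable[[bytes], str]


def build_dataset(
    receipts: Iterable[Tuple[str, bytes]],
    ocr: OCRFunction,
    output: TextIO,
) -> int:
    """Write one CSV row per receipt and return the number of rows written.

    `receipts` yields (document_id, file_content) pairs; the column names
    are illustrative only, not the real dataset schema.
    """
    writer = csv.DictWriter(output, fieldnames=["document_id", "text"])
    writer.writeheader()
    count = 0
    for document_id, content in receipts:
        writer.writerow({"document_id": document_id, "text": ocr(content)})
        count += 1
    return count
```

Passing the OCR function as a parameter lets the same script run with a fake OCR in tests and with a real provider in production.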


jtemporal · 2017-06-19T19:34:27Z

#207 documents how to generate the dataset that is already on our S3, but there's no Python script for it. Thanks @fgrehm for the data, btw ;)

tuliocasagrande · 2017-06-20T17:46:54Z

Hello @cuducos and @jtemporal

I took a quick look at the problem and it seems we're pending on the ReimbursementOCR to be implemented. Is that correct?
The google-cloud-vision Python library could also be a good option.
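For reference, a rough sketch of a google-cloud-vision call. It assumes google-cloud-vision 2.x or later (older releases used vision.types.Image instead of vision.Image), credentials configured via GOOGLE_APPLICATION_CREDENTIALS, and receipts already converted to image bytes (PNG/JPEG) rather than PDFs. The class name GoogleVisionOCR is hypothetical and is not the project's ReimbursementOCR.

```python
class GoogleVisionOCR:
    """Hypothetical wrapper around google-cloud-vision's document text detection.

    Assumes google-cloud-vision >= 2.0 and that `content` holds image bytes
    (PNG/JPEG), not a PDF.
    """

    def __init__(self):
        self._client = None  # created lazily so importing this module stays cheap

    def extract_text(self, content: bytes) -> str:
        # Imported here so the rest of the pipeline runs without the library.
        from google.cloud import vision

        if self._client is None:
            self._client = vision.ImageAnnotatorClient()
        response = self._client.document_text_detection(
            image=vision.Image(content=content)
        )
        # The API reports per-image failures in the response, not as exceptions.
        if response.error.message:
            raise RuntimeError(response.error.message)
        return response.full_text_annotation.text
```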

cuducos · 2017-06-20T18:06:51Z

"I took a quick look at the problem and it seems we're pending on the ReimbursementOCR to be implemented. Is that correct?"

Exactly. A bit of background, just in case ;)

fgrehm · 2017-06-20T18:28:13Z

I actually did the OCR with Python, and it even runs things in parallel 😄 :mindblown:

Just check the stuff I linked on #188 (comment) and LMK if u need any help with that!
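A rough sketch of running OCR in parallel with the standard library's concurrent.futures. This is not the code linked from #188. The helper name ocr_in_parallel and its max_workers default are assumptions. Threads are a reasonable fit when OCR means calling a remote API, since most of the time is spent waiting on the network.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Tuple


def ocr_in_parallel(
    receipts: Iterable[Tuple[str, bytes]],
    ocr: Callable[[bytes], str],
    max_workers: int = 8,
) -> Dict[str, str]:
    """Run `ocr` over every receipt concurrently and return {document_id: text}.

    For CPU-bound local OCR, ProcessPoolExecutor (with a picklable `ocr`
    function) would be the closer fit than threads.
    """
    items = list(receipts)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so they line up with the ids.
        texts = pool.map(ocr, (content for _, content in items))
        return {document_id: text for (document_id, _), text in zip(items, texts)}
```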

jandersoncoelho · 2017-07-16T16:27:19Z

Have you seen ocrmypdf (http://ocrmypdf.readthedocs.io/)? I use it in my research. It's an interface to Tesseract OCR, and the results seem good.
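If ocrmypdf were driven from Python instead of the shell, a sketch could call its ocrmypdf.ocr() API and read the plain text from the sidecar file. The helper name ocrmypdf_text and the Portuguese ("por") default are assumptions. ocrmypdf still needs Tesseract and Ghostscript installed on the system, which is the kind of $PATH dependency this issue wants to avoid.

```python
import tempfile
from pathlib import Path


def ocrmypdf_text(pdf_path: str, language: str = "por") -> str:
    """Hypothetical helper: OCR a PDF with ocrmypdf and return the sidecar text.

    Assumes ocrmypdf is installed along with Tesseract, Ghostscript and the
    Tesseract language data for `language`. By default ocrmypdf refuses PDFs
    that already contain text (options like force_ocr/skip_text change that).
    """
    import ocrmypdf  # imported lazily; requires external binaries at runtime

    with tempfile.TemporaryDirectory() as tmp:
        output_pdf = Path(tmp) / "output.pdf"
        sidecar = Path(tmp) / "output.txt"
        ocrmypdf.ocr(pdf_path, str(output_pdf), language=[language], sidecar=str(sidecar))
        return sidecar.read_text(encoding="utf-8")
```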

fgrehm · 2017-11-27T16:00:12Z

Heads up: I've been hacking away at a better approach for OCRing receipts in https://github.com/fgrehm/serenata-ocr, and one of the ideas is to support a "pluggable provider interface", meaning people can choose between Google Cloud Vision, https://ocr.space/, Microsoft Azure and maybe even some self-hosted Tesseract infra.
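A minimal sketch of what a "pluggable provider interface" could look like in Python, using typing.Protocol plus a name-to-factory registry. This is not the serenata-ocr design. OCRProvider, register, get_provider and the toy EchoProvider are hypothetical names. Real backends (Google Cloud Vision, ocr.space, Azure, Tesseract) would be registered the same way.

```python
from typing import Callable, Dict, Protocol


class OCRProvider(Protocol):
    """Hypothetical interface every OCR backend would implement."""

    def extract_text(self, content: bytes) -> str: ...


# Maps a provider name (e.g. from a CLI flag or env var) to a zero-arg factory.
PROVIDERS: Dict[str, Callable[[], OCRProvider]] = {}


def register(name: str):
    """Class decorator that makes a provider selectable by name."""
    def decorator(cls):
        PROVIDERS[name] = cls
        return cls
    return decorator


def get_provider(name: str) -> OCRProvider:
    if name not in PROVIDERS:
        raise ValueError(f"unknown OCR provider: {name!r}")
    return PROVIDERS[name]()


@register("echo")
class EchoProvider:
    """Toy provider for tests: pretends the bytes are already text."""

    def extract_text(self, content: bytes) -> str:
        return content.decode("utf-8")
```

The rest of the pipeline only depends on extract_text(), so switching providers becomes a configuration change rather than a code change.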

fgrehm · 2017-11-29T14:54:50Z

See #298

cuducos added the "data collection" and "enhancement" labels on Jun 19, 2017

jtemporal mentioned this issue on Jun 21, 2017 in "Document reimbursements OCR dataset #207" (merged)
