Web app for OCR/NER process #26

richpomfret · 2019-01-16T15:01:07Z

Develop a tool based on the process created in the last Summer of Code (SoC).

benwbrum · 2019-01-16T15:20:57Z

In the 2018 Summer of Code project, Saumya Shah developed a workflow that:

- started with scanned images on the local filesystem
- ran OCR software on them to identify the text of individual entries
- ran NER software on that text (based on a trained dataset) to extract semantically relevant fields

This code is currently functional from the command line, for someone who has access to the images on their local filesystem. However, we anticipate the functionality being used by volunteers who do not have such access or skills, so we need a web-based application (built in Ruby on Rails to be consistent with the rest of our tech stack) that can manage the process.

It should:

- Allow the end user to upload image files and store them on the filesystem
- Present the existing image files on the filesystem for the user to browse
- Allow the user to launch the OCR/NER process from the user interface
- Actually run the scripts mentioned above (which rely on Tesseract and the Python spaCy library), logging any output and errors
- Allow the user to view progress and review the quality of the results
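
As a rough illustration of the "run the scripts, logging any output and errors" requirement, here is a minimal sketch using Ruby's standard `Open3` and `Logger` libraries. The helper name `run_step` is hypothetical, and the command shown is only a placeholder: the actual OCR/NER script names and arguments are not given in this issue, so they would need to be substituted in.

```ruby
require "open3"
require "logger"
require "rbconfig"

# Hypothetical helper (not part of the existing code): runs one external
# command, logs its stdout at INFO and its stderr at ERROR, and returns
# true only if the command exited successfully.
def run_step(command, logger)
  stdout, stderr, status = Open3.capture3(*command)
  stdout.each_line { |line| logger.info(line.chomp) }
  stderr.each_line { |line| logger.error(line.chomp) }
  unless status.success?
    logger.error("#{command.first} exited with status #{status.exitstatus}")
  end
  status.success?
rescue Errno::ENOENT => e
  # Raised when the executable cannot be found at all.
  logger.error("could not start #{command.first}: #{e.message}")
  false
end

logger = Logger.new($stdout)
# Placeholder command standing in for the real OCR/NER scripts (assumption).
run_step([RbConfig.ruby, "-e", "puts 'page 1 processed'"], logger)
```

In a Rails app this would probably be called from a background job rather than a controller action, so that a long OCR run does not block the request; pointing the logger at a per-run file would also give the user something to review for progress and errors.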

PatReynolds added the SoC label Feb 1, 2019
