Web app for OCR/NER process #26

Open
richpomfret opened this issue Jan 16, 2019 · 1 comment

@richpomfret
Contributor

To develop a tool based on the process created in the last Summer of Code (SoC).

@benwbrum
Member

In the 2018 Summer of Code project, Saumya Shah developed a workflow that

  • started with scanned images on the local filesystem
  • ran OCR software on them to identify the text of individual entries
  • ran NER software (using a trained model) on the OCR output to extract semantically relevant fields.
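The three steps above compose into a simple per-directory loop. The following is a minimal sketch, assuming the OCR step returns one string per entry and the NER step returns a mapping from field name to value. `process_directory`, `EntryRecord`, the callable signatures and the accepted image extensions are all hypothetical; the demo uses stub functions rather than Tesseract or spaCy, and the 2018 scripts may divide the work differently.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical interfaces: the 2018 SoC scripts may split the work differently.
OcrFn = Callable[[Path], List[str]]       # scanned image -> text of each entry on it
NerFn = Callable[[str], Dict[str, str]]   # one entry's text -> extracted fields

# Assumed set of scan formats; adjust to whatever the archive actually holds.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


@dataclass
class EntryRecord:
    image: str                                            # source image file name
    text: str                                             # OCR output for one entry
    fields: Dict[str, str] = field(default_factory=dict)  # NER output


def process_directory(image_dir: Path, ocr: OcrFn, ner: NerFn) -> List[EntryRecord]:
    """Run OCR, then NER, over every scanned image in image_dir (in name order)."""
    records: List[EntryRecord] = []
    for image in sorted(image_dir.iterdir()):
        if not image.is_file() or image.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # ignore sub-directories and non-image files
        for entry_text in ocr(image):      # OCR: image -> individual entries
            fields = ner(entry_text)       # NER: entry text -> semantic fields
            records.append(EntryRecord(image.name, entry_text, fields))
    return records


def stub_ocr(image: Path) -> List[str]:
    """Stand-in for Tesseract: pretends every image holds two entries."""
    return ["John Smith 1841", "Mary Jones 1850"]


def stub_ner(text: str) -> Dict[str, str]:
    """Stand-in for a spaCy model: treats the first two tokens as a name."""
    return {"name": " ".join(text.split()[:2])}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "page1.png").write_bytes(b"")
        (root / "readme.txt").write_text("not an image")
        for rec in process_directory(root, stub_ocr, stub_ner):
            print(rec)
```

Passing the OCR and NER steps in as functions keeps the loop independent of any particular engine, so the same driver could wrap the existing scripts or be exercised with stubs as above.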

This code currently works from the command line for someone who has access to the images on their local filesystem. However, we expect the functionality to be used by volunteers who have neither that access nor those skills, so we need a web application (built in Ruby on Rails, to be consistent with the rest of our tech stack) that can manage the process.

It should:

  • Allow the end user to upload image files and store them on the filesystem
  • Present the existing image files on the filesystem for the user to browse
  • Allow the user to launch the OCR/NER process from the user interface
  • Actually run the scripts described above (which rely on Tesseract and the Python spaCy library), logging any output and errors
  • Allow the user to view progress and review the quality of the results.
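The issue does not specify how the Rails app should invoke the scripts. One common pattern is to run each step as a child process and capture its stdout and stderr in a per-job log file that a progress page can display. The sketch below shows that pattern in Python, the language of the existing scripts; `run_job`, `JobResult`, the log file naming scheme and the one-hour default timeout are hypothetical choices, not part of the project. A Rails background job could do the equivalent with Ruby's `Open3` module.

```python
import subprocess
import sys
import tempfile
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path
from typing import List


@dataclass
class JobResult:
    command: List[str]
    returncode: int   # -1 if the process could not start or timed out
    log_path: Path    # combined stdout/stderr, for a progress/review page

    @property
    def succeeded(self) -> bool:
        return self.returncode == 0


def run_job(command: List[str], log_dir: Path, timeout: float = 3600.0) -> JobResult:
    """Run one pipeline step and capture its stdout and stderr in a fresh log file."""
    log_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    log_path = log_dir / f"job-{stamp}-{uuid.uuid4().hex[:8]}.log"
    with log_path.open("w", encoding="utf-8") as log:
        log.write("$ " + " ".join(command) + "\n")
        log.flush()  # write the header before the child starts appending
        try:
            # The child inherits the log file's descriptor; stderr is merged into stdout.
            proc = subprocess.run(command, stdout=log, stderr=subprocess.STDOUT,
                                  timeout=timeout, check=False)
            returncode = proc.returncode
        except subprocess.TimeoutExpired:
            # subprocess.run has already killed the child at this point.
            log.write(f"\n[killed after {timeout} s timeout]\n")
            returncode = -1
        except OSError as exc:  # e.g. the interpreter or script path does not exist
            log.write(f"\n[could not start: {exc}]\n")
            returncode = -1
    return JobResult(command, returncode, log_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # sys.executable stands in for the real OCR/NER script invocation.
        result = run_job([sys.executable, "-c", "print('processing page1.png')"], Path(tmp))
        print(result.succeeded, result.log_path.read_text(encoding="utf-8"))
```

Merging stderr into the same log keeps warnings next to the output around them, at the cost of not being able to separate the two streams afterwards; a fresh file per job lets the UI list past runs for review.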

PatReynolds added the SoC label on Feb 1, 2019