ChujieChen/PyOCR

PyOCR

About

PyOCR is a Python-based optical character recognition (OCR) tool with a GUI. The GUI is built with Python's tkinter, and the OCR is handled by pytesseract, a wrapper for Google's Tesseract-OCR Engine. The tool also supports manual correction, backed by python-datamuse, by suggesting homophones of a target word.

Functionality

  • Import one image at a time from your computer and run OCR on it
  • Select part of the image and run OCR on that region only
  • Edit, copy, and paste the result
  • Get homographs and homophones as alternative candidates for a word

Installing PyOCR

Once you have downloaded all the files, install the following dependencies: Tesseract, Pillow, NumPy, pytesseract, and python-datamuse.

For example, using Homebrew, Anaconda, and pip, the dependencies above can be installed with the following commands:

  • brew install tesseract
  • conda install numpy pillow
  • pip install pytesseract python-datamuse
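After installing, a quick check along these lines can confirm that everything is visible to Python. This is only a sketch and not part of PyOCR: the function name `missing_dependencies` is hypothetical, and the import names (`PIL` for Pillow, `datamuse` for python-datamuse) are the ones those packages are assumed to install under.

```python
import importlib.util
import shutil

# Import names assumed for each package in the dependency list
# (Pillow is imported as PIL, python-datamuse as datamuse).
PYTHON_MODULES = {
    "Pillow": "PIL",
    "NumPy": "numpy",
    "pytesseract": "pytesseract",
    "python-datamuse": "datamuse",
}


def missing_dependencies():
    """Return the names of dependencies that cannot be found."""
    missing = []
    # The Tesseract engine is a separate executable, not a Python package,
    # so it is looked up on PATH instead of being imported.
    if shutil.which("tesseract") is None:
        missing.append("Tesseract")
    for package, module in PYTHON_MODULES.items():
        if importlib.util.find_spec(module) is None:
            missing.append(package)
    return missing


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All dependencies found.")
```

An empty list means that the Tesseract executable is on your PATH and all four Python packages can be imported.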

Running PyOCR

In your console, from the directory /PyOCR/pyocr/, run the following command:

  • ./PyOCR.py

Or

  • python3 PyOCR.py

to launch the GUI. You can then select an image to run OCR on, and you can also get a list of homographs and homophones of a word as candidates.

Using PyOCR

Once you launch the GUI, you will see an interface like this:

main_interface

Click Browse to select the image you want to run OCR on. In the example below, a scan of a business card is imported, and the OCR results are shown in the middle.

all_business

If you only need the information about Dan Porter, drag a rectangular box on the image and click Update OCR. The tool will then run OCR only on the text inside that box.

some_business
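The region-only OCR described above can be sketched roughly as follows. This is not PyOCR's actual code: `normalize_box` and `ocr_image` are hypothetical names, and the sketch assumes the drag corners are already expressed in image pixels (a GUI that displays a scaled image would have to convert them first). It relies only on Pillow's `Image.crop` and pytesseract's `image_to_string`.

```python
def normalize_box(start, end, image_size):
    """Turn two drag corners into a (left, upper, right, lower) crop box.

    The drag may go in any direction, so the corners are sorted and then
    clamped to the image bounds.
    """
    width, height = image_size
    left, right = sorted((start[0], end[0]))
    upper, lower = sorted((start[1], end[1]))
    left, right = max(0, left), min(width, right)
    upper, lower = max(0, upper), min(height, lower)
    if left >= right or upper >= lower:
        raise ValueError("selection is empty")
    return (left, upper, right, lower)


def ocr_image(path, start=None, end=None):
    """OCR the whole image, or only the dragged region if corners are given."""
    # Imported here so normalize_box stays usable without the OCR stack.
    from PIL import Image
    import pytesseract

    with Image.open(path) as image:
        if start is not None and end is not None:
            region = image.crop(normalize_box(start, end, image.size))
        else:
            region = image
        return pytesseract.image_to_string(region)
```

For instance, `ocr_image("card.png", (120, 80), (20, 10))` would OCR only the rectangle between those two corners, where `card.png` is a placeholder file name.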

Here is an example showing the result for a photo, along with the fuzzy search on the word "fast".

photo

Another example, with a screenshot of an SMS message.

simtxt
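The word lookup used in the "fast" example can be approximated by querying the public Datamuse API, which python-datamuse wraps. The sketch below is an assumption about how such a lookup might work, not PyOCR's own code: it calls the REST endpoint directly with the standard library, and `build_query` and `candidates` are hypothetical helper names. The query parameters `rel_hom` (homophones), `sl` (sounds like), and `sp` (spelled like) belong to Datamuse. Which of them PyOCR actually uses is not stated here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

DATAMUSE_WORDS_URL = "https://api.datamuse.com/words"


def build_query(word, relation="rel_hom", limit=10):
    """Build a Datamuse query URL.

    relation is a Datamuse query parameter: "rel_hom" asks for homophones,
    "sl" for words that sound similar, "sp" for words spelled similarly.
    """
    return DATAMUSE_WORDS_URL + "?" + urlencode({relation: word, "max": limit})


def candidates(word, relation="rel_hom", limit=10):
    """Fetch candidate words from Datamuse (needs network access)."""
    with urlopen(build_query(word, relation, limit), timeout=10) as response:
        results = json.load(response)
    # Each result is a JSON object that carries the suggestion in "word".
    return [entry["word"] for entry in results]
```

Calling `candidates("fast")` returns a list of suggested words, and passing `relation="sl"` gives a looser, sound-alike search.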
