To recognize text in an image well, image preprocessing is almost essential. I preprocessed the images through the following steps (a code sketch follows the list):
- Convert to grayscale
- Resize the image
- Remove noise
- Apply adaptive thresholding
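A minimal sketch of this pipeline with OpenCV might look like the following; the parameter values (scale factor, denoising strength, block size) are illustrative choices, not the exact ones I used.

```python
import cv2

def preprocess(image_path):
    """Grayscale -> resize -> denoise -> adaptive threshold."""
    img = cv2.imread(image_path)
    # Convert to grayscale: color carries little information for OCR here
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Resize: upscaling helps the recognizer with small text
    gray = cv2.resize(gray, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)
    # Remove noise with non-local means denoising
    gray = cv2.fastNlMeansDenoising(gray, h=10)
    # Adaptive thresholding copes with uneven lighting in photos
    return cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                 cv2.THRESH_BINARY, 31, 2)
```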
With preprocessing in place, I moved on to extracting the text.
At first I simply used the pytesseract wrapper to extract the text, but it did not perform well on Korean. Here is the call (since I did not want to extract numbers from the image, I blacklisted the digits):
```python
import pytesseract
from PIL import Image

# 'result' is the preprocessed image from the step above.
# The blacklist drops the digits 0-9; --psm 3 is fully automatic page segmentation.
print(pytesseract.image_to_string(
    Image.open(result),
    config=r'-c tessedit_char_blacklist=0123456789 --psm 3',
    lang='kor',
))
```
Tesseract itself ships trained data per language, so I next tried extraction with the Korean data that shows the best performance. The recognition rate was higher than before, but still not good enough, so I finally decided to train a Korean model myself. The picture below shows the additional data I put in to better recognize the cafe menu items.
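For context, swapping in the higher-accuracy Korean data usually means replacing kor.traineddata in the tessdata directory. The URL and install path below are assumptions and vary by system:

```bash
# Assumption: using the Korean model from the tesseract-ocr/tessdata_best repo
wget https://github.com/tesseract-ocr/tessdata_best/raw/main/kor.traineddata
# The tessdata location differs per install; this is a common Linux path
sudo mv kor.traineddata /usr/share/tesseract-ocr/4.00/tessdata/
```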
To better recognize Korean, it seemed best to use a model made by Koreans, and it had to be one I could train without much difficulty. So I decided to train a model with deep-text-recognition-benchmark, created by Clova AI, using Korean font images from AI Hub as training data. This dataset consists of images covering the 11,172 modern Korean characters in 50 fonts, image files produced by gender and age group, and about 100,000 real-world images including signs, trademarks, and traffic signs.
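The create_lmdb_dataset.py script used below expects a ground-truth file in which each line pairs an image path (relative to --inputPath) with its label, separated by a tab. The file names and labels here are made-up examples:

```
images/word_00001.png	한글
images/word_00002.png	표지판
```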
- Create the lmdb data
```bash
python3 ./deep-text-recognition-benchmark/create_lmdb_dataset.py \
    --inputPath ./deep-text-recognition-benchmark/ocr_data/ \
    --gtFile ./deep-text-recognition-benchmark/ocr_data/gt_train.txt \
    --outputPath ./deep-text-recognition-benchmark/ocr_data_lmdb/train

python3 ./deep-text-recognition-benchmark/create_lmdb_dataset.py \
    --inputPath ./deep-text-recognition-benchmark/ocr_data/ \
    --gtFile ./deep-text-recognition-benchmark/ocr_data/gt_validation.txt \
    --outputPath ./deep-text-recognition-benchmark/ocr_data_lmdb/validation
```
- Train the model
```bash
CUDA_VISIBLE_DEVICES=0 python3 ./deep-text-recognition-benchmark/train.py \
    --train_data ./deep-text-recognition-benchmark/ocr_data_lmdb/train \
    --valid_data ./deep-text-recognition-benchmark/ocr_data_lmdb/validation \
    --Transformation TPS --FeatureExtraction ResNet --SequenceModeling BiLSTM --Prediction CTC \
    --batch_size 512 --batch_max_length 200 --data_filtering_off --workers 0 \
    --num_iter 100000 --valInterval 100
```
This produced a model with very high performance, but I wanted one that recognizes cafe menu items even better. So I created my own dataset of cafe menu items and fine-tuned the existing model on that data; the fine-tuned model is the final one. A sketch of the fine-tuning step follows.
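The repo's train.py accepts --saved_model to continue from a checkpoint and --FT to fine-tune, assuming the cafe menu images were converted to lmdb the same way as above. The menu_data_lmdb and checkpoint paths here are illustrative, not the actual ones:

```bash
CUDA_VISIBLE_DEVICES=0 python3 ./deep-text-recognition-benchmark/train.py \
    --train_data ./deep-text-recognition-benchmark/menu_data_lmdb/train \
    --valid_data ./deep-text-recognition-benchmark/menu_data_lmdb/validation \
    --Transformation TPS --FeatureExtraction ResNet --SequenceModeling BiLSTM --Prediction CTC \
    --saved_model ./saved_models/TPS-ResNet-BiLSTM-CTC-Seed1111/best_accuracy.pth \
    --FT --batch_size 192 --num_iter 10000 --valInterval 100
```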