@inproceedings{maslowa2019receipt,
author = {Maslova, Olga and Klein, Louis and Dabernat, Damien and Benoit, Alexandre and Lambert, Patrick},
year = {2019},
month = {09},
pages = {1--6},
booktitle = {2019 International Conference on Content-Based Multimedia Indexing (CBMI)},
title = {Receipt automatic reader},
doi = {10.1109/CBMI.2019.8877407}
}
Receipt detection | Receipt localization | Receipt normalization | Text line segmentation | Optical character recognition | Semantic analysis |
---|---|---|---|---|---|
✔️ | ✔️ | ✔️ | ✔️ | ❗ | ❌ |

Mask R-CNN detection and segmentation method, used to detect and segment the receipt:
- ResNet-101 as backbone

Only the final layers were adjusted: the Region Proposal Network (RPN) and the segmentation mask head were fine-tuned, and the bounding-box classifier head was modified for the two-class problem (receipt / non-receipt).
- Dice loss
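The notes only name the Dice loss. As a minimal sketch, assuming the soft (differentiable) variant computed on flattened masks and a smoothing term `eps = 1.0` (a hypothetical choice, not from the paper), the loss is one minus the Dice coefficient `2 |P ∩ G| / (|P| + |G|)` between the predicted mask P and the ground-truth mask G:

```python
def soft_dice_loss(pred, target, eps=1.0):
    """Soft Dice loss between a predicted mask and a binary ground-truth mask.

    pred:   flat sequence of predicted probabilities in [0, 1]
    target: flat sequence of ground-truth labels in {0, 1}
    eps:    smoothing term (hypothetical value) that keeps empty masks defined
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    # Soft intersection: sum of element-wise products.
    intersection = sum(p * t for p, t in zip(pred, target))
    # Soft "sizes" of both masks.
    total = sum(pred) + sum(target)
    dice = (2.0 * intersection + eps) / (total + eps)
    return 1.0 - dice
```

Unlike per-pixel cross-entropy, this loss is driven by the overlap ratio, so it is less sensitive to the imbalance between receipt and background pixels.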

Receipt normalization: first, the polygonal boundary of the predicted receipt mask is extracted. This polygon is then approximated by a quadrilateral. Finally, a homography is computed to remove the perspective effect: it maps the receipt quadrilateral onto the closest upright (vertical or horizontal) rectangle and crops the receipt out of the initial image.
- OpenCV
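The homography part of the normalization can be sketched in pure Python, assuming the four quadrilateral corners are already available (the polygon-to-quadrilateral approximation itself is not shown). The helper names, the corner-ordering heuristic and the output size (longer of each pair of opposite sides) are assumptions for illustration; with OpenCV the same chain would typically rely on `cv2.findContours`, `cv2.approxPolyDP`, `cv2.getPerspectiveTransform` and `cv2.warpPerspective`.

```python
import math


def order_corners(quad):
    """Order four (x, y) corners as top-left, top-right, bottom-right, bottom-left.

    Heuristic (assumption): top-left minimises x + y, bottom-right maximises it,
    top-right minimises y - x and bottom-left maximises it. This holds for
    receipts tilted by less than 45 degrees.
    """
    by_sum = sorted(quad, key=lambda p: p[0] + p[1])
    by_diff = sorted(quad, key=lambda p: p[1] - p[0])
    return [by_sum[0], by_diff[0], by_sum[-1], by_diff[-1]]


def solve(a, b):
    """Solve the square linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate quadrilateral")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def homography(src, dst):
    """3x3 homography H (with H[2][2] = 1) mapping four src points onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), and likewise for v.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def apply_homography(h, point):
    """Map one (x, y) point through h, including the perspective division."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def rectify(quad):
    """Homography sending the receipt quadrilateral onto an upright rectangle."""
    tl, tr, br, bl = order_corners(quad)
    # Output size from the longer of each pair of opposite sides (assumption).
    width = round(max(math.dist(tl, tr), math.dist(bl, br)))
    height = round(max(math.dist(tl, bl), math.dist(tr, br)))
    dst = [(0, 0), (width - 1, 0), (width - 1, height - 1), (0, height - 1)]
    return homography([tl, tr, br, bl], dst), (width, height)
```

The sketch only maps individual points; cropping the receipt amounts to applying the same matrix to every output pixel (inverse mapping plus interpolation), which is what a warping routine does.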

A VGG-16 classifier then predicts the receipt orientation (0, 90, 180 or 270 degrees), and the image is rotated to correct it.
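Once the VGG-16 classifier has predicted one of the four classes, the correction itself is a rotation by a multiple of 90 degrees. The sketch below assumes the predicted angle is the clockwise rotation the receipt has undergone, so it is undone by as many counter-clockwise quarter turns; the class-to-angle mapping `ANGLES` and the function names are hypothetical. With NumPy, `np.rot90(img, k)` performs the same counter-clockwise rotation.

```python
ANGLES = (0, 90, 180, 270)  # hypothetical mapping: class index -> angle in degrees


def rot90_ccw(img):
    """Rotate a 2D grid (list of rows) by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*img)][::-1]


def correct_orientation(img, predicted_class):
    """Undo a clockwise rotation of ANGLES[predicted_class] degrees (assumed convention)."""
    quarter_turns = ANGLES[predicted_class] // 90
    for _ in range(quarter_turns):
        img = rot90_ccw(img)
    return img
```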

The cropped images are resized to a width of 512 pixels.
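A minimal sketch of the resizing step, assuming the aspect ratio is preserved (the notes only fix the width) and using nearest-neighbour sampling for simplicity; a real pipeline would more likely call an interpolating resize such as OpenCV's `cv2.resize`. The function names are hypothetical.

```python
TARGET_WIDTH = 512


def target_size(width, height, target_width=TARGET_WIDTH):
    """New (width, height) with a fixed width, keeping the aspect ratio (assumption)."""
    return target_width, max(1, round(height * target_width / width))


def resize_nearest(img, target_width=TARGET_WIDTH):
    """Nearest-neighbour resize of a grid (list of rows) to the target width."""
    h, w = len(img), len(img[0])
    new_w, new_h = target_size(w, h, target_width)
    # Each output pixel copies the input pixel its position falls into.
    return [
        [img[min(h - 1, y * h // new_h)][min(w - 1, x * w // new_w)] for x in range(new_w)]
        for y in range(new_h)
    ]
```

Fixing the width rather than both dimensions keeps long receipts long, so text lines keep a roughly constant character height regardless of receipt length.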

Text line segmentation: the aim is to highlight the text-free areas that are likely to separate distinct text blocks. Sobel filters are first applied to highlight character boundaries.
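The Sobel step can be sketched on a grayscale image stored as a list of rows. The 3x3 kernels are the standard Sobel kernels; combining them as `|Gx| + |Gy|` (a common approximation of the gradient magnitude) and setting border pixels to 0 are assumptions, not details from the paper.

```python
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))   # responds to vertical edges
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))   # responds to horizontal edges


def sobel_magnitude(img):
    """Approximate gradient magnitude |Gx| + |Gy|; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for j in range(3):
                for i in range(3):
                    v = img[y + j - 1][x + i - 1]
                    gx += SOBEL_X[j][i] * v
                    gy += SOBEL_Y[j][i] * v
            out[y][x] = abs(gx) + abs(gy)
    return out
```

Character strokes produce strong responses while the blank gaps between text blocks stay near zero, which is what the following thresholding exploits.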

Next, an automatic thresholding based on the classical Otsu method is applied to obtain a binary image mask.
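Otsu's method picks the threshold `t` that maximises the between-class variance `w_b * w_f * (mu_b - mu_f)^2` of the two pixel classes it creates (weights `w` and means `mu` of the background and foreground). A minimal sketch, assuming non-negative integer pixel values (for example the Sobel magnitudes) and that pixels strictly above the threshold form the text mask:

```python
def otsu_threshold(pixels):
    """Threshold t maximising the between-class variance of {p <= t} and {p > t}.

    pixels: flat sequence of non-negative integers (e.g. Sobel magnitudes).
    """
    levels = max(pixels) + 1
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * count for i, count in enumerate(hist))
    w_bg, sum_bg = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w_bg += hist[t]          # pixels at or below t (background)
        sum_bg += t * hist[t]
        w_fg = total - w_bg      # pixels above t (foreground)
        if w_bg == 0:
            continue
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance up to the constant factor 1 / total**2.
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(img):
    """Binary mask (list of rows of 0/1) using Otsu's threshold on the whole image."""
    t = otsu_threshold([p for row in img for p in row])
    return [[1 if p > t else 0 for p in row] for row in img]
```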

Finally, a morphological closing is applied to merge neighboring text pixels together.
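Morphological closing is a dilation followed by an erosion with the same structuring element, which fills small gaps between neighboring foreground pixels without growing isolated blobs. The sketch assumes a binary mask of 0/1 values and a flat rectangular kernel much wider than tall (3 x 15 by default, a hypothetical size) so that characters on the same line merge; OpenCV offers the same operation through `cv2.morphologyEx` with `cv2.MORPH_CLOSE`.

```python
def _morph(mask, kh, kw, reduce_fn):
    """Apply reduce_fn (max: dilation, min: erosion) over a kh x kw window.

    Windows are clipped at the image border.
    """
    h, w = len(mask), len(mask[0])
    ry, rx = kh // 2, kw // 2
    return [
        [
            reduce_fn(
                mask[yy][xx]
                for yy in range(max(0, y - ry), min(h, y + ry + 1))
                for xx in range(max(0, x - rx), min(w, x + rx + 1))
            )
            for x in range(w)
        ]
        for y in range(h)
    ]


def dilate(mask, kh, kw):
    return _morph(mask, kh, kw, max)


def erode(mask, kh, kw):
    return _morph(mask, kh, kw, min)


def closing(mask, kh=3, kw=15):
    """Dilation followed by erosion; the default kernel size is a hypothetical choice."""
    return erode(dilate(mask, kh, kw), kh, kw)
```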

Tesseract does not compete with commercial-grade OCR systems from cloud providers, so a cloud OCR service is used instead:
- Google Vision OCR

Receipts are hard to read because, among other things, they are not standardized, are often damaged before being captured (crumples, tears, etc.), and are captured in poor acquisition conditions (no vertical alignment, perspective effect, poor lighting, etc.).

Data augmentation: horizontal flipping, scaling, small rotations around the vertical alignment of the receipt, shearing and translation.
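All of these augmentations are affine, so one way to sketch them is to compose a single 3x3 matrix per training sample, applied around the image centre. The parameter ranges, the composition order and the function names are hypothetical defaults, not values reported in the paper; the top two rows of the matrix are what a call such as OpenCV's `cv2.warpAffine` would take.

```python
import math
import random


def matmul(a, b):
    """Product of two 3x3 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def random_affine(width, height, rng, max_angle_deg=5.0, scale_range=(0.9, 1.1),
                  max_shear=0.1, max_shift=0.05):
    """Random flip, scale, small rotation, shear and translation (hypothetical ranges)."""
    cx, cy = width / 2.0, height / 2.0
    flip = -1.0 if rng.random() < 0.5 else 1.0          # horizontal flip
    s = rng.uniform(*scale_range)                        # isotropic scaling
    a = math.radians(rng.uniform(-max_angle_deg, max_angle_deg))  # small rotation
    sh = rng.uniform(-max_shear, max_shear)              # horizontal shear
    tx = rng.uniform(-max_shift, max_shift) * width      # translation
    ty = rng.uniform(-max_shift, max_shift) * height

    to_origin = [[1, 0, -cx], [0, 1, -cy], [0, 0, 1]]
    steps = (
        [[flip, 0, 0], [0, 1, 0], [0, 0, 1]],
        [[s, 0, 0], [0, s, 0], [0, 0, 1]],
        [[math.cos(a), -math.sin(a), 0], [math.sin(a), math.cos(a), 0], [0, 0, 1]],
        [[1, sh, 0], [0, 1, 0], [0, 0, 1]],
        [[1, 0, cx + tx], [0, 1, cy + ty], [0, 0, 1]],   # back to the centre, shifted
    )
    m = to_origin
    for step in steps:
        m = matmul(step, m)   # left-multiply: later steps act after earlier ones
    return m


def apply_affine(m, x, y):
    """Transform one (x, y) point with the affine matrix m."""
    return (m[0][0] * x + m[0][1] * y + m[0][2], m[1][0] * x + m[1][1] * y + m[1][2])
```

Because every transform is expressed around the centre, the receipt stays roughly in frame, and the same matrix can be applied to the ground-truth mask so that image and label remain aligned.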