Nhat Pham (https://github.com/nhatsmrt) & Hoang Phan (https://github.com/petrpan26)
This project is based on a Kaggle competition: https://www.kaggle.com/c/denoising-dirty-documents
The challenge is to remove different types of synthetic noise from scanned text.
NOTE: This project is written in TensorFlow 1.9.
Small windows (e.g of size ) of the scanned texts are passed through an autoencoder-like neural network.
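The windowing step can be illustrated with a minimal pure-Python sketch. This is not the project's actual preprocessing code: the function name `extract_windows`, the window size, and the stride are all hypothetical, chosen only to show how a scanned page can be cut into small patches before being fed to a network.

```python
def extract_windows(image, window, stride):
    """Cut a 2-D image (list of rows) into square patches.

    Returns a list of (top, left, patch) tuples. `window` and `stride`
    are illustrative parameters, not values taken from the project.
    """
    height, width = len(image), len(image[0])
    patches = []
    # Slide the window over every position where it fits entirely.
    for top in range(0, height - window + 1, stride):
        for left in range(0, width - window + 1, stride):
            patch = [row[left:left + window] for row in image[top:top + window]]
            patches.append((top, left, patch))
    return patches


# Toy 4x6 "scan" whose pixel values are simply 0..23.
image = [[r * 6 + c for c in range(6)] for r in range(4)]
patches = extract_windows(image, window=2, stride=2)
print(len(patches), patches[0])
```

With a stride equal to the window size the patches do not overlap; a smaller stride would produce overlapping windows, whose predictions would then need to be averaged when reassembling the page.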
The network has a convolutional encoder with residual connections. For the decoder, a simple feedforward layer would be sufficient; however, a deconvolutional (transposed convolution) layer is used instead because it has fewer parameters, which speeds up training.
The detailed architecture can be found in the code and the project report.
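The parameter argument can be checked with a rough back-of-the-envelope count. The shapes below (a 7x7x32 encoder output decoded into a 28x28 single-channel window with a 4x4 kernel) are assumptions made purely for illustration; the project's real layer sizes are in the code and report.

```python
def dense_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


def conv_transpose_params(kernel_h, kernel_w, in_channels, out_channels):
    """Weights plus biases of a transposed convolution layer.

    The count depends on kernel size and channel counts only,
    not on the spatial size of the feature map.
    """
    return kernel_h * kernel_w * in_channels * out_channels + out_channels


# Hypothetical shapes: 7x7x32 feature map -> 28x28x1 output window.
dense = dense_params(7 * 7 * 32, 28 * 28 * 1)
deconv = conv_transpose_params(4, 4, 32, 1)
print(dense, deconv)
```

Under these assumed shapes the dense decoder needs over a million parameters while the transposed convolution needs a few hundred, because the latter shares one small kernel across all spatial positions.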