Document Enhancement System Using Auto-encoders

The conversion of scanned documents to digital forms is performed using an Optical Character Recognition (OCR) software. This work focuses on improving the quality of scanned documents in order to improve the OCR output. We create an end-to-end document enhancement pipeline which takes in a set of noisy documents and produces clean ones. Deep neural network based denoising auto-encoders are trained to improve the OCR quality. We train a blind model that works on different noise levels of scanned text documents. Results are shown for blurring and watermark noise removal from noisy scanned documents.

PDF Abstract

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here