Optical Character Recognition (OCR) for Text Recognition and its Post-Processing Method: A Literature Review

DOI: 10.1109/ICTIIA54654.2022.9935961

10 Citations • 2022

Ridvy Avyodri, Samuel Lukas, H. Tjahyadi

2022 1st International Conference on Technology Innovation and Its Applications (ICTIIA)

This research reviews OCR-related works and the methods they use, to support further research. The review is organized by the stages of the OCR process: Image Pre-processing, Text Segmentation/Localization, Feature Extraction, Text Recognition, and Post-Processing.

Abstract

Most organizations worldwide still rely on paper-based documents, which makes it hard to extract the data those documents contain. Heavy paper usage also hurts cost and time efficiency, and it has an environmental impact through the deforestation needed to produce paper. These are some of the reasons that motivate the digitization of paper-based documents. The move from paper-based to paperless documents cannot happen in an instant. During the transition, paper-based documents are usually scanned into image format to reduce paper usage. This creates a need for technology that can recognize and extract data from scanned images of paper-based documents. Optical Character Recognition (OCR) makes it possible to recognize text that appears in images. However, despite its long history of development, OCR for text recognition has yet to achieve 100% accuracy. In general, the OCR process is divided into Image Pre-processing, Text Segmentation/Localization, Feature Extraction, Text Recognition, and Post-Processing. This research therefore reviews OCR-related works and the methods used within this framework to support further research.
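As a rough illustration of the last stage named above, Post-Processing, the sketch below corrects recognized tokens against a small lexicon using Python's standard `difflib` module. The abstract does not name specific methods, so lexicon-based correction is an assumption chosen for illustration, and the names `VOCABULARY`, `correct_token`, and `post_process` are hypothetical.

```python
import difflib

# Hypothetical lexicon; a real system would likely use a domain-specific word list.
VOCABULARY = ["invoice", "total", "amount", "date", "payment"]


def correct_token(token, vocabulary=VOCABULARY, cutoff=0.75):
    """Return the closest lexicon word, or the token unchanged if none is close enough."""
    matches = difflib.get_close_matches(token.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else token


def post_process(text):
    """Correct each whitespace-separated token of the recognizer's raw output."""
    return " ".join(correct_token(token) for token in text.split())


if __name__ == "__main__":
    # Typical OCR confusions: "l" read for "i", "1" read for "l".
    print(post_process("lnvoice tota1 amount"))
```

The `cutoff` value controls how aggressive the correction is: a lower value fixes more errors but may also replace valid words that are missing from the lexicon, so it would need tuning on real recognizer output.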