This work attempts to improve the accuracy of Sinhala character recognition through deep learning, using Tesseract, an open-source, deep-learning-based OCR engine originally developed at Hewlett-Packard and later sponsored by Google.
With the advancement of computer technology over the last few years, researchers have applied machine learning and deep learning techniques to analyse text in digital documents. As a result, Optical Character Recognition (OCR) technology is now widely used to convert printed text into machine-readable text for many different character sets. Sinhala is an abugida script with its own writing system, used to write the Sinhala and Pali languages. The complexity of the Sinhala script makes it hard to develop an OCR system for it. In recent literature, most research groups have tried to handle this complexity using computational techniques and neural networks [1], [2]. Tesseract is an open-source, deep-learning-based OCR engine originally developed at Hewlett-Packard and later sponsored by Google [3]. Despite decades of research on the engineering aspects of OCR, Sinhala recognition accuracy still leaves room for improvement, and this work attempts to improve it using deep learning techniques.
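To make the abugida property concrete, the following minimal sketch uses only Python's standard `unicodedata` module to show that a single written Sinhala syllable is encoded as a base consonant followed by a dependent vowel sign. The helper name `describe_syllable` and the example syllable are illustrative assumptions, not part of the method described here.

```python
import unicodedata


def describe_syllable(syllable):
    """Return (code point, Unicode name, general category) for each code point."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
        for ch in syllable
    ]


# Illustrative syllable: consonant U+0D9A followed by dependent vowel sign U+0DD2.
# The pair is drawn as one visual unit, yet it is stored as two code points.
example = "\u0d9a\u0dd2"

for code_point, name, category in describe_syllable(example):
    print(code_point, category, name)
```

Because the vowel sign attaches to the consonant and changes its printed shape, a recognizer has to deal with many more visual forms than there are base letters, which is one plausible reading of the complexity mentioned above.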