The functionality of OCR opus was extended to work with any language by adding support for UTF-8 character encoding, and a character and language model was created for Hungarian.
This work aims to understand, use, and improve the open-source optical character recognition (OCR) software OCR opus so that it better handles some of the more complex recognition problems, such as language-specific alphabets and special characters like mathematical symbols. We extended OCR opus to work with any language by adding support for UTF-8 character encoding. We also created a character and language model for Hungarian. This allows other users of the software to perform character recognition on Hungarian input without having to train a completely new character model.
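To illustrate why UTF-8 support matters for an alphabet such as Hungarian, the following minimal sketch compares byte counts with character counts for a short Hungarian phrase. It is not code from OCR opus; the sample phrase and the helper name `codepoints` are assumptions chosen for illustration only.

```python
import unicodedata

# Illustrative sample phrase (an assumption, not taken from the OCR opus data).
# NFC normalization keeps each accented letter as a single code point.
text = unicodedata.normalize("NFC", "árvíztűrő tükörfúrógép")

# UTF-8 stores each accented Hungarian letter in more than one byte.
encoded = text.encode("utf-8")


def codepoints(data: bytes) -> list:
    """Hypothetical helper: decode UTF-8 bytes into one string per character."""
    return list(data.decode("utf-8"))


chars = codepoints(encoded)

# Letters outside ASCII that a character model would need as separate classes.
non_ascii = sorted({c for c in chars if ord(c) > 127})

print("bytes:", len(encoded))
print("characters:", len(chars))
print("non-ASCII letters:", " ".join(non_ascii))
```

Because the byte count exceeds the character count, a recognizer that treats each byte as one character would split letters such as "ő" and "ű" into meaningless halves. Working on decoded code points avoids this.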