Optical Character Recognition

1 Citation · 1992
Anne Permaloff, C. Grafton
PS: Political Science & Politics

Optical character recognition is a process by which printed text is detected and transformed into a computer text file that is readable by word processor, statistics, and/or database software supported by the OCR program used.

Abstract

Optical character recognition (OCR) is a process by which printed text is detected and transformed into a computer text file. OCR consists of two basic processes: scanning and recognition. Scanning, performed with a device called a scanner, digitizes the printed page, creating a coded graphics version of the text that may be stored on disk. That coded version transforms the scanned image into pixels, and it is readable by graphics programs. The separate recognition process translates the picture of an “A” into the letter “A.” A new file is created in a format determined by user instructions. That file is readable by word processor, statistics, and/or database software supported by the OCR program used. OCR is a technique that can be useful to political scientists. For example, research notes taken from printed sources, rather than being laboriously typed, could be scanned, processed, and saved as a file readable by a word processing package. Content analysis might be almost completely mechanized. Numerical data from government reports could be scanned rather than entered by hand and then made readable by a spreadsheet, database management program, or statistics package.
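The recognition step described above, translating the picture of an "A" into the letter "A", can be illustrated with a toy sketch. It assumes the scanner has already produced small black-and-white pixel grids, one per character, and it uses simple template matching: each grid is compared against stored reference grids and the closest match wins. This technique is an assumption chosen for illustration, not the method of any particular OCR program, and the names `TEMPLATES`, `pixel_distance`, `recognize`, and `recognize_line` are hypothetical.

```python
# Toy illustration only: real OCR programs use far more robust methods.
# The 5x5 glyph templates and all function names here are hypothetical.

TEMPLATES = {
    "A": [
        ".###.",
        "#...#",
        "#####",
        "#...#",
        "#...#",
    ],
    "L": [
        "#....",
        "#....",
        "#....",
        "#....",
        "#####",
    ],
    "T": [
        "#####",
        "..#..",
        "..#..",
        "..#..",
        "..#..",
    ],
}


def pixel_distance(glyph, template):
    """Count the pixels at which two equally sized bitmaps differ."""
    return sum(
        g != t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )


def recognize(glyph):
    """Return the template character whose bitmap is closest to the glyph."""
    return min(TEMPLATES, key=lambda ch: pixel_distance(glyph, TEMPLATES[ch]))


def recognize_line(glyphs):
    """Translate a sequence of glyph bitmaps into a text string."""
    return "".join(recognize(g) for g in glyphs)


# A "scanned" A with one noisy pixel (top-left corner set by mistake).
noisy_a = [
    "####.",
    "#...#",
    "#####",
    "#...#",
    "#...#",
]

text = recognize_line([noisy_a, TEMPLATES["L"], TEMPLATES["T"]])
print(text)
```

The resulting string could then be written out in whatever file format the downstream word processor, spreadsheet, or statistics package expects, which corresponds to the "format determined by user instructions" mentioned in the abstract.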