Optical Character Recognition Research: Index
Index
A. Issues Addressed
1. Current Technology
1.1. Tesseract
1.2. SimpleOCR
1.3. ABBYY
2. Text Detection Algorithms in Images
3. Solution
3.1. Using Tesseract API
3.2.
3.3.
4.
5. References
A. Issues Addressed
Current technology for developing optical character recognition (OCR).
Popular algorithms for detecting text in images.
A solution for the project requirements.
Feasible options for the project requirements.
The references used in this report.
1. Current Technology
Many applications for various operating systems are built on an optical character recognition engine. Most of them use Google's libraries (such as Tesseract), the SimpleOCR SDK, or the ABBYY SDK.
1.1. Tesseract
Tesseract is an optical character recognition engine for various operating systems. It is free software, released under the Apache License. Tesseract is considered one of the most accurate open-source OCR engines currently available, and in 1995 it was among the top three OCR engines in terms of character accuracy. It is available for Android, Windows, Ubuntu, and Mac OS X.
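"Character accuracy" in comparisons like this is commonly defined as (n - errors) / n, where n is the number of characters in the ground-truth text and errors is the minimum number of character insertions, deletions, and substitutions (the Levenshtein edit distance) needed to turn the OCR output into the ground truth. The sketch below assumes that definition; the function names `levenshtein` and `character_accuracy` are hypothetical and are not part of any OCR engine's API.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def character_accuracy(ground_truth: str, ocr_output: str) -> float:
    """Assumed metric: (n - errors) / n with n = len(ground_truth).
    Can drop below 0 if the OCR output contains many spurious characters."""
    n = len(ground_truth)
    if n == 0:
        raise ValueError("ground truth must be non-empty")
    errors = levenshtein(ocr_output, ground_truth)
    return (n - errors) / n


if __name__ == "__main__":
    truth = "Optical Character Recognition"
    ocr = "0ptical Charactor Recognitlon"  # three substituted characters
    print(f"{character_accuracy(truth, ocr):.2%}")
```

Under this definition, a reported accuracy of 99% means roughly one character error per hundred ground-truth characters.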
1.2. SimpleOCR
SimpleOCR is a proprietary optical character recognition application originally developed by Cyril Cambien of France under the title WOCAR. Version 3.1 was reviewed in PC Magazine in 2004.
Reported accuracy: 99% (vendor figure, http://www.simpleocr.com/Info.asp)
1.3. ABBYY
ABBYY is an international software company that provides optical character recognition products, such as FineReader. In January 2007 the FineReader Engine (an OCR SDK) was selected for use in Ricoh's DocumentMall document management system.
[Figure: Processing architecture]
2. Text Detection Algorithms in Images
3. Solution
3.1. Using Tesseract API
Tesseract 3.0 can handle any Unicode characters. To recognize different shapes of the same character, Tesseract must be trained with the different fonts separated explicitly. The number of fonts used to be limited to 32, but the limit has been raised to 64. Regarding accuracy, the reported results were 90% for OCR of handwritten digits and 93.22% for OCR of English alphabets. OpenCV runs on Android, iOS, and Windows, and it is free.
[Figure: Architecture using OpenCV and Tesseract API]
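In an architecture that combines OpenCV and Tesseract, OpenCV typically prepares the image (for example grayscale conversion and binarization) before it is passed to Tesseract for recognition. The report does not state which preprocessing steps are used, so the sketch below assumes a global Otsu threshold, the method OpenCV exposes through `cv2.threshold` with the `THRESH_OTSU` flag. It is reimplemented in plain Python on a grayscale image stored as a list of rows of 0-255 values so it runs without OpenCV installed; the names `otsu_threshold` and `binarize` are hypothetical.

```python
def otsu_threshold(pixels: list[list[int]]) -> int:
    """Return the 8-bit threshold t that maximizes the between-class
    variance when pixels <= t form one class and pixels > t the other."""
    hist = [0] * 256
    for row in pixels:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of intensities of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so this split is meaningless
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to the constant factor 1 / total**2
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels: list[list[int]], t: int) -> list[list[int]]:
    """Same rule as OpenCV's THRESH_BINARY: 255 if value > t, else 0."""
    return [[255 if p > t else 0 for p in row] for row in pixels]


if __name__ == "__main__":
    # Dark "text" pixels on a light background
    image = [[30, 40, 200], [35, 210, 220]]
    t = otsu_threshold(image)
    print(t, binarize(image, t))
```

With OpenCV available, the equivalent call would be `cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)`, and the resulting black-and-white image would then be handed to Tesseract.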