Optical Character Recognition Research: Index

This document discusses optical character recognition technology and potential solutions for an OCR project. It reviews popular OCR engines like Tesseract, SimpleOCR, and ABBYY. Tesseract used with OpenCV library is proposed as a feasible option, combining Tesseract's accuracy with OpenCV's image processing capabilities. Previous attempts using just the Tesseract API did not work well for business card text detection. The document provides accuracy results and processing architectures for the different technologies.

Uploaded by

Phi Thiện Hồ

Optical Character Recognition Research

Index

A. Issues Addressed
1. Current Technology
1.1. Tesseract
1.2. SimpleOCR
1.3. ABBYY
2. Text detection algorithms in images
3. Solution
3.1. Using Tesseract API
3.2. Using Tesseract API and OpenCV Library
3.3. Using Matlab API
4. Feasible option
5. References

A. Issues Addressed
Current technology for developing optical character recognition (OCR) applications.
Popular algorithms for detecting text in images.
Solutions for the project requirements.
The feasible option for the project requirements.
References used for this report.
1. Current Technology
Many optical character recognition applications exist for various operating systems. Most of
them are built on Google libraries such as Tesseract, the SimpleOCR SDK, or the ABBYY SDK.
1.1. Tesseract
Tesseract is an optical character recognition engine for various operating systems, including
mobile ones. It is free software, released under the Apache License. Tesseract is considered
one of the most accurate open source OCR engines currently available, and it was among the top
three OCR engines in terms of character accuracy in 1995. It is available for Android, Windows,
Ubuntu, and Mac OS X.

Tesseract OCR Architecture:

Accuracy Results in 1995
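
The report does not say how character accuracy was measured in the 1995 comparison. As a minimal sketch, assuming the common definition (one minus the edit distance between the reference text and the OCR output, divided by the reference length), it could be computed as follows. The function names are illustrative and do not come from any OCR library.

```python
def edit_distance(reference: str, recognized: str) -> int:
    """Levenshtein distance: fewest insertions, deletions and substitutions."""
    previous = list(range(len(recognized) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, rec_char in enumerate(recognized, start=1):
            cost = 0 if ref_char == rec_char else 1
            current.append(min(
                previous[j] + 1,         # reference character missing in output
                current[j - 1] + 1,      # extra character in output
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference: str, recognized: str) -> float:
    """Assumed metric: 1 - errors / reference length, clamped to [0, 1]."""
    if not reference:
        return 1.0 if not recognized else 0.0
    errors = edit_distance(reference, recognized)
    return max(0.0, 1.0 - errors / len(reference))
```

For example, if the engine reads "OpenCV 2014" as "0penCV 2O14", two of the eleven reference characters are wrong, so the accuracy under this definition is about 81.8%.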

1.2. SimpleOCR
SimpleOCR is a proprietary optical character recognition application originally developed
by Cyril Cambien of France under the title WOCAR. Version 3.1 was reviewed in PC Magazine in
2004.
Accuracy Results: 99%, as reported at http://www.simpleocr.com/Info.asp
1.3. ABBYY
ABBYY is an international software company that provides optical character recognition
products such as FineReader. In January 2007 the FineReader Engine (an OCR SDK) was
selected for use in Ricoh's DocumentMall document management system.

Accuracy Results: 99.8%

Processing architecture

2. Text detection algorithms in images

3. Solution
3.1. Using Tesseract API
Tesseract 3.0 can handle any Unicode characters. Tesseract needs to know about the different
shapes of the same character, so the different fonts must be separated explicitly. This used to
be limited to 32 fonts, but the limit has been raised to 64. The architecture and accuracy
results are shown in section 1.1 above.
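
The report does not include the code of the Tesseract demo. As a rough sketch of what a single recognition call looks like, the example below drives the `tesseract` command-line program from Python instead of the Android API used in the demo. It assumes that `tesseract` is installed and on the PATH; the helper names, the default output base name and the sample file names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def build_tesseract_command(image_path: str, output_base: str,
                            lang: str = "eng") -> List[str]:
    """Build the classic CLI call: tesseract <image> <outputbase> -l <lang>."""
    return ["tesseract", image_path, output_base, "-l", lang]


def recognize_text(image_path: str, output_base: str = "ocr_result",
                   lang: str = "eng") -> str:
    """Run Tesseract on one image and return the recognized text.

    Assumes the `tesseract` executable is installed; it writes its
    result to <output_base>.txt, which is read back here.
    """
    command = build_tesseract_command(image_path, output_base, lang)
    subprocess.run(command, check=True)
    return Path(output_base + ".txt").read_text(encoding="utf-8")
```

A call such as `recognize_text("business_card.png")` would then return the raw text, which can be compared with the expected text to judge the result.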


3.2. Using Tesseract API and OpenCV Library
OpenCV is written in C++ and its primary interface is in C++, but it still retains a less
comprehensive though extensive older C interface. The API for these interfaces can be found
in the online documentation. On the device, the application takes only 1.1 MB, and the accuracy
results are 90% for OCR of handwritten digits and 93.22% for OCR of English alphabets. OpenCV
runs on Android, iOS, and Windows, and it is free.

Architecture using OpenCV and Tesseract API.
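
The report does not state which OpenCV operations run before Tesseract. One typical preprocessing step, assumed here only for illustration, is binarization with Otsu's method, which OpenCV offers through its threshold function with the Otsu flag. The sketch below implements the same idea in plain Python on a grayscale image stored as a list of rows, so the logic is visible without the library. The function names are illustrative.

```python
from typing import List


def otsu_threshold(pixels: List[int]) -> int:
    """Pick the gray level (0-255) that maximizes between-class variance."""
    histogram = [0] * 256
    for value in pixels:
        histogram[value] += 1

    total = len(pixels)
    total_sum = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    dark_count, dark_sum = 0, 0
    for level in range(256):
        dark_count += histogram[level]
        dark_sum += level * histogram[level]
        light_count = total - dark_count
        if dark_count == 0:
            continue  # no pixels at or below this level yet
        if light_count == 0:
            break     # every pixel is already in the dark class
        mean_dark = dark_sum / dark_count
        mean_light = (total_sum - dark_sum) / light_count
        variance = dark_count * light_count * (mean_dark - mean_light) ** 2
        if variance > best_variance:
            best_variance, best_threshold = variance, level
    return best_threshold


def binarize(image: List[List[int]]) -> List[List[int]]:
    """Map pixels above the Otsu threshold to 255 and the rest to 0."""
    flat = [value for row in image for value in row]
    threshold = otsu_threshold(flat)
    return [[255 if value > threshold else 0 for value in row] for row in image]
```

Passing a cleaned black-and-white image to the OCR engine, rather than the raw photo, is the kind of step where OpenCV's image processing could help with inputs such as business cards.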

3.3. Using Matlab API


MATLAB allows matrix manipulations, plotting of functions and data, implementation of
algorithms, creation of user interfaces, and interfacing with programs written in other
languages, including C, C++, Java, and Fortran.
We could generate C code or build a library and add it to our Android project, but
MATLAB Coder alone is not enough to do this: we would also need to buy the Tier 1 package,
Embedded Coder. This option therefore does not meet the project requirements.
4. Feasible option
From 24/3/2014 to 28/3/2014, I built a demo application for Android using the Tesseract API,
but the results for detecting text on business cards were not good.
After researching optical character recognition in several articles, tutorials, and reports, I
think we can use OpenCV together with the Tesseract API to build the Android application.
5. References
http://www.mathworks.com/help/vision/examples/automatically-detect-and-recognize-text-in-natural-images.html#zmw57dd0e728
http://antoniogarrote.wordpress.com/2011/01/30/ocr-with-clojure-tesseract-and-opencv/
http://tesseract-ocr.googlecode.com/files/TesseractOSCON.pdf
http://www.abbyy.com/mobileocr/android/
