History of OCR
The earliest optical recognition technologies were developed to help people with visual
impairments. Tauschek’s reading machine and Fournier d'Albe's Optophone are two of the earliest
devices, developed between 1870 and 1931 to help the blind read [3].
Later, in 1950, the Gismo was invented; it was capable of translating printed text into machine code.
These devices were sold by the Intelligent Machines Research (IMR) Corporation. After 1950, David H.
Shepard developed an OCR system for reading credit card imprints for an oil company in California.
Between 1954 and 1974, the Optacon reached the market and stood out for its portability. In the 1980s,
OCR systems developed rapidly, and price-tag and passport scanners were built.
Caere Corporation, Kurzweil Computer Products Inc., and ABBYY are some of today's well-known
companies that were established in the late 1980s and early 1990s.
Over the past 19 years (2000 to 2019), OCR technology has improved massively: online OCR web
services, offline applications, and real-time translation tools for smartphones have been
developed.
Research on Urdu OCR began decades after research on OCR for Latin script.
The writing style that Urdu OCR must handle is called “Nastalique”. It emerged in South Asia when
Persian was the official language of the Mughal Empire.
Nastalique was widely used in 1971 in different regions of South Asia and is still widely used in
India and Pakistan; it is the standard calligraphic style in Pakistan [2]. Naskh is the most common
writing style used for the Arabic, Persian, and Pashto scripts.
The Arabic, Persian, Urdu, and Pashto alphabets are more or less the same; the only difference is
the total number of characters.
The Urdu alphabet has a total of 38 characters [12]. In Urdu, text lines are read from top to
bottom, while the characters within a line are read from right to left. The characters can be
clustered into classes based on the similarity of their base forms; characters in the same class
differ only in their dots or retroflex mark.
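
The clustering by base form described above can be illustrated with a small lookup table. The following is a minimal sketch, not a method taken from the cited works: the table is deliberately partial, and the class labels ("bay", "jeem", and so on) and function names are hypothetical names chosen for this example.

```python
# Illustrative, partial table: a few Urdu letters grouped by shared base form.
# The class labels are hypothetical names chosen for this sketch only.
BASE_FORM_CLASSES = {
    "bay": ["ب", "پ", "ت", "ٹ", "ث"],
    "jeem": ["ج", "چ", "ح", "خ"],
    "dal": ["د", "ڈ", "ذ"],
    "ray": ["ر", "ڑ", "ز", "ژ"],
    "seen": ["س", "ش"],
}

# Invert the table so that each letter maps to its class label.
LETTER_TO_CLASS = {
    letter: label
    for label, letters in BASE_FORM_CLASSES.items()
    for letter in letters
}


def base_class(letter: str) -> str:
    """Return the base-form class of a letter, or the letter itself if unlisted."""
    return LETTER_TO_CLASS.get(letter, letter)


def same_base_form(a: str, b: str) -> bool:
    """True when two letters differ only by dots or the retroflex mark."""
    return base_class(a) == base_class(b)
```

Under this assumed table, `same_base_form("ب", "پ")` is true because the two letters differ only in their dots, while a letter that is not listed forms a class of its own. A recognizer could use such a mapping to classify the base shape first and resolve the dots or retroflex mark in a second step.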
No handwritten Nastalique dataset is publicly available to the research community, although the
character set is almost the same for both scripts (i.e., Naskh and Nastalique). Efforts are being
made to standardize Urdu datasets so that the available state-of-the-art techniques can be
compared. One such effort, by CENPARMI (Centre for Pattern Recognition and Machine Intelligence)
[24], is developing a handwritten database from different sources. Another effort, reported by the
Image Understanding and Pattern Recognition Group at the Technical University of Kaiserslautern,
Germany, generates synthetic Urdu data whose content is taken from Jang, a leading Urdu newspaper
in Pakistan [25].
25. A. Ul-Hasan, S. S. Bukhari, S. F. Rashid, F. Shafait, and T. M. Breuel, ‘‘Semi-automated OCR
database generation for Nabataean scripts,’’ in Proc. ICPR, 2012, p. 1667.
2. S. Naz, K. Hayat, M. I. Razzak, M. W. Anwar, S. A. Madani, and S. U. Khan, ‘‘The optical character
recognition of Urdu-like cursive scripts,’’ Pattern Recognit., vol. 47, no. 3, pp. 1229–1248, 2014.
3. H. F. Schantz, History of OCR, Optical Character Recognition. Manchester, VT, USA: Recognition
Technologies Users Association, 1982.
4. W. J. Bijleveld and A. J. Van De Toorn, ‘‘Process and apparatus for producing and reading Arabic
numbers on a record sheet,’’ U.S. Patent 3 527 927, Sep. 8, 1970.