Generic Text Recognition using Long Short-Term Memory ... - KLUEDO

Generic Text Recognition using Long Short-Term Memory Networks

by Adnan Ul-Hasan

Thesis approved by the Department of Computer Science, University of Kaiserslautern, for the award of the Doctoral Degree Doctor of Engineering (Dr.-Ing.)

Date of PhD Defense: 11.01.2016
Dean of the Department: Prof. Dr. Klaus Schneider
Chairperson of the PhD Committee: Prof. Dr. Paul Lukowicz
Thesis Reviewers:
Prof. Dr. Andreas Dengel, DFKI Kaiserslautern
Associate Prof. Dr. Faisal Shafait, SEECS, NUST Pakistan
apl. Prof. Dr. Marcus Liwicki, University of Kaiserslautern

D 386

It always seems impossible until it is done.

Nelson Mandela

Abstract

The task of printed Optical Character Recognition (OCR) is considered a “solved” problem by many Pattern Recognition (PR) researchers. This notion, although partially true, does not represent the whole picture. State-of-the-art OCR systems do exist for many scripts, for example, Latin, Greek, Han (Chinese), and Kana (Japanese), but there is still a need for exhaustive research on many other challenging modern scripts. Examples of such scripts are the cursive Nabataean scripts, which include Arabic, Persian, and Urdu, and the Brahmic family of scripts, which contains Devanagari, Sanskrit, and its derivatives. These scripts present many challenges for OCR, for example, changes in the shape of a character depending on its position within a word, kerning, and a huge number of ligatures. Moreover, OCR research for historical documents still requires much probing; efforts are therefore required to develop robust OCR systems to preserve literary heritage.

Likewise, there is a need to address the OCR of multilingual documents. Plenty of multilingual documents exist in the current era of globalization, which has increased the influence of different languages on each other. Foreign words and phrases are increasingly used in articles, newspapers, and books, which generates a large body of multilingual literature every day. Another effect is seen in the products we use in our daily lives. From the packaging of imported food items to sophisticated electronics, the demand of international customers to access information about these products in their native language is ever increasing. The use of multilingual operational manuals, books, and dictionaries motivates the need for multilingual OCR systems for their digitization.

The aim of this thesis is to find answers to some of these challenges using contemporary machine learning methodologies, especially Recurrent Neural Networks (RNN).
Specifically, a recent architecture of these networks, referred to as Long Short-Term Memory (LSTM) networks, has been employed to OCR modern as well as historical documents. The excellent OCR results obtained on these documents encourage us to extend their application to the field of multilingual OCR. The LSTM networks are first evaluated on standard English datasets to benchmark their performance. They yield better recognition results than any other contemporary OCR technique, without using sophisticated features or language modeling. Therefore, their application is further extended to more complex scripts, including Urdu Nastaleeq and Devanagari. For the Urdu Nastaleeq script, LSTM networks achieve the best reported OCR results (2.55% Character Error Rate (CER)) on a publicly available dataset, while for the Devanagari script, a new freely available database has been introduced on which a CER of 9% is achieved.

The LSTM-based methodology is further extended to the OCR of historical documents. In this regard, this thesis focuses on the Old German Fraktur script, the medieval Latin script of the 15th century, and the Polytonic Greek script. LSTM-based systems outperform contemporary OCR systems on all of these scripts. For old documents, it is usually very hard to prepare a transcribed dataset for training a neural network in the supervised learning paradigm. A novel methodology has been proposed that combines segmentation-based and segmentation-free approaches to OCR scripts for which no transcribed training data is available. For German Fraktur and Polytonic Greek scripts, artificially generated data from existing text corpora yield highly promising results (CER of
