Large Distributed Arabic Handwriting Recognition System based on ...

Available online at www.sciencedirect.com

ScienceDirect AASRI Procedia 5 (2013) 156 – 163

2013 AASRI Conference on Parallel and Distributed Computing and Systems

Large Distributed Arabic Handwriting Recognition System Based on the Combination of FastDTW Algorithm and MapReduce Programming Model via Cloud Computing Technologies

Hamdi Hassen a,b,*, Maher Khemakhem a

a Mir@cl Lab, FSEGS University of Sfax, BP 1088, 3018 Sfax, Tunisia
b Computer Science Department, College of Science and Arts at Al-Ola, Taibah University, KSA

Abstract

This paper proposes a robust, efficient and scalable distributed Arabic handwriting OCR system based on a parallel FastDTW algorithm via cloud computing technologies. The three techniques Hadoop, MapReduce and Cascading are used to implement the parallel FastDTW algorithm. The experiments were deployed on Amazon EC2 Elastic MapReduce and Amazon Simple Storage Service (S3) using a large-scale dataset built from the IFN/ENIT database.

© 2013 The Authors. Published by Elsevier B.V. Selection and/or peer review under responsibility of American Applied Science Research Institute

Keywords: Large OCR system, FastDTW, MapReduce, Cloud computing

1. Introduction

Today there are many OCR systems in use, based on different approaches and algorithms. All of the popular OCR systems offer high accuracy, and most offer high speed, especially those dedicated to printed characters and high-quality documents. Unfortunately, this is not the case for Arabic handwritten characters,

* Corresponding author. Tel: +966-563598400; fax: +216 74 278 777. E-mail address: [email protected]

2212-6716 © 2013 The Authors. Published by Elsevier B.V. Selection and/or peer review under responsibility of American Applied Science Research Institute doi:10.1016/j.aasri.2013.10.072

Hamdi Hassen and Maher Khemakhem / AASRI Procedia 5 (2013) 156 – 163

where OCR systems are limited to recognizing small, and rarely medium, quantities of documents for specific purposes. OCR systems that process large amounts of documents are rare and not powerful enough; examples include the Australian Newspaper Digitization Project [1], OCRGrid [2], Kirtas [3] and OCRopus [4].

Experiments and evaluations conducted on several Arabic handwriting OCR systems show two things. On the one hand, the Euclidean distance technique is commonly used for classification, but it is less robust and more fragile [5]. On the other hand, the Dynamic Time Warping (DTW) algorithm stands among the best techniques for this task [6]. The major problem of DTW is its slow response time, caused by the enormous amount of computation it requires [7].

Distributed systems, such as cloud computing technologies, provide a viable framework for speeding up an OCR system based on the DTW algorithm. Cloud computing is primarily used to deliver services such as Infrastructure (I), Platform (P) and Software (S) as services. All these services are available to consumers as registration-based services in a pay-as-you-consume model [8].

This paper is organized as follows. Section 2 gives an overview of the DTW algorithm, especially FastDTW, and its use in Arabic character recognition. Section 3 presents the Hadoop, MapReduce and Cascading models. Section 4 explains the proposed approach. Section 5 presents and discusses the experimental results. The last section presents the conclusion and future work.

2. Dynamic Time Warp (DTW)

2.1. Dynamic Time Warping Algorithm

Dynamic Time Warping (DTW) is a technique for computing the similarity between two different sequences of patterns, even when they are not aligned in time or in space [9]. Let A and B be two different sequences, where n is the number of feature vectors in sequence A and m is the number of feature vectors in sequence B:

$$A = a_1, a_2, a_3, \ldots, a_n \qquad (1)$$

$$B = b_1, b_2, b_3, \ldots, b_m \qquad (2)$$

D[n, m] is the distance matrix: cell (i, j) holds the distance between the i-th element of sequence A and the j-th element of sequence B (Fig. 1).
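As an illustration of the distance matrix described above, the following minimal sketch fills D for two short sequences of feature vectors. The paper does not state which local distance is used between feature vectors; the Euclidean distance here is an assumption, and the function name `distance_matrix` and the toy sequences are hypothetical.

```python
import math

def distance_matrix(a, b):
    """Return D where D[i][j] is the local distance between a[i] and b[j].

    Assumption: the local distance is Euclidean (not specified in the paper).
    """
    return [[math.dist(x, y) for y in b] for x in a]

# Toy sequences of 2-dimensional feature vectors (n = 3, m = 2).
A = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
B = [(0.0, 0.0), (2.0, 0.0)]

D = distance_matrix(A, B)
for row in D:
    print(row)
```

The matrix has n rows and m columns, so its size grows with the product of the two sequence lengths.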


[Figure: warping grid with Time Series A and Time Series B on its axes; the indices i_s and j_s mark a point of the warping path.]

Fig 1. The DTW mechanism.

To find the best alignment between A and B, we need to find the path through the grid

$$P = p_1, \ldots, p_s, \ldots, p_k \qquad (3)$$

$$p_s = (i_s, j_s) \qquad (4)$$

that minimizes the total distance between them. This P is called a warping function. To calculate the length of the path, we simply sum all the cells that were visited along this path [10]:

$$D(A, B) = \frac{\sum_{s=1}^{k} d(p_s)\, w_s}{\sum_{s=1}^{k} w_s} \qquad (5)$$

where d(p_s) is the distance between i_s and j_s, and w_s > 0 is a weighting coefficient. P_0 is the best alignment path between A and B:

$$P_0 = \arg\min_{P} \big( D(A, B) \big) \qquad (6)$$
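The minimization in equation (6) is usually carried out by dynamic programming over the grid. The sketch below is one possible instance, not the authors' implementation. It assumes a Euclidean local distance and the symmetric weighting w_s = (i_s - i_{s-1}) + (j_s - j_{s-1}), so that the denominator of equation (5) equals n + m for every path and minimizing the numerator also minimizes D(A, B). The function name `dtw` and the sample data are hypothetical.

```python
import math

def dtw(a, b):
    """Return (D(A, B), warping path) for two sequences of feature vectors.

    Assumptions (not from the paper): Euclidean local distance and
    symmetric weights, i.e. weight 2 for a diagonal step and weight 1
    for a horizontal or vertical step, so the weights sum to n + m.
    """
    n, m = len(a), len(b)
    d = [[math.dist(x, y) for y in b] for x in a]  # local costs d(p_s)
    inf = float("inf")
    # g[i][j]: minimal weighted cost of any path from cell (0, 0) to (i, j).
    g = [[inf] * m for _ in range(n)]
    back = [[None] * m for _ in range(n)]
    g[0][0] = 2 * d[0][0]  # the first cell carries weight 2
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            candidates = []
            if i > 0:
                candidates.append((g[i - 1][j] + d[i][j], (i - 1, j)))
            if j > 0:
                candidates.append((g[i][j - 1] + d[i][j], (i, j - 1)))
            if i > 0 and j > 0:
                candidates.append((g[i - 1][j - 1] + 2 * d[i][j], (i - 1, j - 1)))
            g[i][j], back[i][j] = min(candidates)
    # Follow the back-pointers from the last cell to recover the path P_0.
    path = []
    cell = (n - 1, m - 1)
    while cell is not None:
        path.append(cell)
        cell = back[cell[0]][cell[1]]
    path.reverse()
    return g[n - 1][m - 1] / (n + m), path

# Sequence b repeats one element of a; the warping absorbs the repetition.
a = [(0.0,), (1.0,), (2.0,)]
b = [(0.0,), (1.0,), (1.0,), (2.0,)]
distance, path = dtw(a, b)
print(distance, path)
```

The nested loops visit every cell of the n-by-m grid once, which is the quadratic cost that motivates faster variants.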

2.2. Fast DTW

DTW has significant drawbacks, notably its time and space complexity, which are quadratic in the sequence length. It is practical only for small and medium data sets (
