Multilingual Number Image Interpreter Vinod Upadhye Research Student G. H. Raisoni College of Engineering, Nagpur (MS), India
Urmila Shrawankar Department of Computer Sci. and Engineering G. H. Raisoni College of Engineering, Nagpur (MS), India
Abstract— India is a multilingual country: people speak different languages and also write number text in different ways. Without knowledge of a language, people are not able to read number text written in it. The proposed method addresses the problem of reading number text from one language in another language. It is part of Natural Language Processing (NLP). Optical Character Recognition (OCR) is used to separate the number text from an image. A rule based approach then translates the number text from one regional language to another regional language, and speech synthesis gives the number text in voice form.
Index Terms—NLP; Rule based approach; Speech Synthesis

I. INTRODUCTION Numbers play a very important role in day-to-day life, and people write number text in different ways depending on the regional language. While travelling from one place to another, many problems arise in interpreting number text written in an unfamiliar regional language. To overcome this problem, the proposed method tries to narrow the gap between people so that they can easily read number text from one language in another language. The method converts number text from one regional language to another regional language. For example, a number written in Gujarati is converted into English text, as shown in Fig. 1. The English number text is then converted into the required regional language, here Marathi.

Fig. 1. Scenario of project (flow: image containing number text in one regional language; separate out number text from image; convert number text into English number text and pronounce it in English, e.g. "one nine"; convert English number text into the required regional language and pronounce it in that language)

978-1-4799-3972-5/14/$31.00 ©2014 IEEE

II. NON-STANDARD WORD (NSW) Non-standard words (NSW) may include numeric values, abbreviations, acronyms, phone numbers, money, dates, times, symbols, etc. [19].

A. Cardinal numbers: A number (such as 1, 2, or 3) used in counting to indicate amount but not order, e.g. zero (0), one (1), two (2), one hundred (100), one thousand (1000) [3][20].

B. Ordinal numbers: A number that specifies position or order in relation to other numbers: first, second, third, and so on. Ordinal numbers that consist of one word are spelled out, e.g. seventh prize, third in line, fifteenth anniversary, sixtieth birthday. Numerals are used for the others: the 62nd state, the 31st Amendment [3][20].

C. Date: A number value present in date format. A date represents the day, month and year, for example 20/08/1988 [3].

D. Figures: This group includes numbers such as telephone numbers, room numbers, train numbers, etc. For example, the train number 12106 has to be pronounced as one two one zero six rather than twelve thousand one hundred six [3][20].

III. EXISTING WORK

A. Text separation technique from image: Text separation can be done using Optical Character Recognition. The method consists of binarization, skew correction, noise removal and correction, and font normalization; a Support Vector Machine is used to translate the number image into ASCII characters, i.e. machine readable form [1].

B. Text normalization techniques: The input text in any text to speech system can be in any format. To obtain proper output speech, the text must first be brought into a standard format, so the initial step for any number translation is text normalization. Text normalization includes tokenization, classification of the number, its expansion, etc. [2][7][9][17][19][21].

C. Approaches for multilingual number expansion

i) Rule based expansion with lookup dictionary: Handwritten rules are used for expansion of the text, and its conversion is done with the help of a lookup dictionary.
The rules require small memory and are conventionally stored as
2014 IEEE International Conference on Computational Intelligence and Computing Research
software-independent language data. The same rule framework can be used for processing context-dependent abbreviations and for interpreting formatted text such as cardinal numbers [22].

ii) Generation of number-name grammar and use of database from web: A database of several million spelled-out number names is collected from the web and mapped to digit strings using an overgenerating number-name grammar [23]. An n-gram model can be used for generating the number-name grammar. The number name is handled by a language model that selects the contextually most appropriate form.

iii) Letter language model and decision trees for classification and number expansion: The model filters the input text and expands it to full words, producing refined output text in which NSWs are tagged with classification information and with pronounceable words for each NSW [21]. This model [21] is language specific and was designed for only one language. The goal of a language model is to produce an accurate probability for a word [26]. A language model contains the structural constraints available in the language and uses them to generate the probabilities [15]. It specifies which words are valid in the language and in what sequence they can occur [26][29].

iv) A hybrid word sense disambiguation method with word net: This approach uses a minimal speech database containing all the diphones occurring in a given language. A suffix stripping approach is used along with a rule engine that generates all the possible suffix sequences [24]. Word sense disambiguation methods draw on sense tagged corpora, dictionaries and word nets. The supervised learning approach relies on sense tagged corpora, and the unsupervised approach uses dictionaries and word nets for resolving ambiguity [23][27].
v) Context-aware mapping method with neural network: In this approach a system is designed that is able to translate number segments into the intended words [25][28]. The system learns the correspondence of number segments with Japanese words through an artificial neural network (ANN). The ANN maps between numbers and their number words depending on the context of the input number. This method has been tested on the Japanese language [25].

D. Approaches for Language Translation: The general language translation approaches described below can be used for translation of normal text from one language to another [14].

i) Statistical Machine Translation Technique (SMT): Statistical machine translation is a machine translation approach in which translations are produced on the basis of statistical models whose parameters are derived from the analysis of bilingual text corpora [10][30]. SMT is a corpus based approach: parallel corpora are maintained, and translation is performed on the basis of these corpora. The main advantage of this
approach is that the corpora can be maintained without any specific training [10].

ii) Example-based Machine Translation Technique (EBMT): Example based machine translation (EBMT) is a simple but accurate approach to machine translation. The basic units used in this approach are sequences of words or phrases. Its basic techniques are matching the input sentence against the source examples, matching phrases from the database and extracting the corresponding translated phrases, and aggregating the extracted phrases into correct translated sentences [11].

iii) Rule Based Machine Translation Technique (RBMT): The rule based model translates a given input sentence into the target sentence using handwritten rules [2][17]. For example, a rule based model generates the Marathi or Hindi translation of an English input sentence using rules that generate verbs and nouns for Marathi or Hindi [16][17]. The main advantage of the rule based approach is its easy implementation and small memory requirement [10].

E. Methods for Rule based Machine Translation: RBMT can follow several approaches. The Interlingua method, the transfer method and the direct method are the three main methods used in RBMT [13].

i) Direct Method: The words in the source language are translated directly, without passing through any intermediary representation.

ii) Transfer based Method: The source language is transformed into an abstract, less language-specific representation. An equivalent representation is then generated for the target language using bilingual dictionaries and grammar rules.

iii) Interlingua Method: Interlingua is a combination of two words, Inter and Lingua, which mean between/intermediary and language respectively. In the Interlingua method the source language is transformed into an intermediary language which is independent of any of the languages involved in the translation.
The translated text for the target language is then derived from the intermediary language.

IV. SYSTEM MODEL This part describes the system implementation technique. Fig. 2 shows the system diagram of this work.

Fig. 2. System model (flow: scanned number text image; binarization; segmentation; get single number text after segmentation; apply Freeman chain code; set feature to neural network; extract number text from scanned image; apply rule based approach; perform speech synthesis)

The input is an image containing number text in one of multiple regional languages, i.e. English, Marathi, Hindi, Tamil, Gujarati, etc. Several steps are required to implement multilingual Optical Character Recognition:
1. Number text image scanning (digitization)
2. Preprocessing
3. Segmentation
4. Feature extraction
5. Character recognition

A. Pre-processing: This is the first step of the image processing method. The input is a scanned, digitized 24-bit bitmap RGB color image containing number text. Pre-processing converts the scanned RGB color image to a gray scale image (a 2-dimensional array of uint8), and then converts the gray scale image to a binary image (a 2-dimensional array of logical).

• Binarization: The Otsu algorithm performs binarization. It computes the histogram and the probability of each intensity level, and selects the threshold that minimizes the intra-class variance. In this way it removes the degraded background of the number text image. The weighted sum of the variances of the two classes is given by:

$$\sigma_w^2(t) = \omega_1(t)\,\sigma_1^2(t) + \omega_2(t)\,\sigma_2^2(t) \qquad (1)$$

where the weights $\omega_1$ and $\omega_2$ are the class probabilities of the two classes separated by the threshold $t$, and $\sigma_1^2$ and $\sigma_2^2$ are the variances of these classes. Minimizing the intra-class variance is the same as maximizing the inter-class variance, as shown in the equation below:

$$\sigma_b^2(t) = \sigma^2 - \sigma_w^2(t) = \omega_1(t)\,\omega_2(t)\,[\mu_1(t) - \mu_2(t)]^2 \qquad (2)$$

It is expressed in terms of the class probabilities $\omega_i$ and the class means $\mu_i$. The class probability $\omega_1(t)$ is computed from the histogram as:

$$\omega_1(t) = \sum_{i=0}^{t} p(i) \qquad (3)\ [33]$$

and the class mean $\mu_1(t)$ is:

$$\mu_1(t) = \Big[\sum_{i=0}^{t} p(i)\,x(i)\Big] \Big/ \omega_1(t) \qquad (4)\ [33]$$

where $x(i)$ is the value at the center of the $i$th histogram bin. Similarly, $\omega_2(t)$ and $\mu_2(t)$ are computed on the right-hand side of the histogram, for bins greater than $t$. Binarization thus produces a binarized image from the degraded image containing regional number text.

B. Segmentation: Thresholding is probably the most frequently used technique for segmentation. The thresholding procedure is a grey value remapping operation defined by

$$g(v) = \begin{cases} 0 & \text{if } v < t \\ 1 & \text{if } v \ge t \end{cases} \qquad (5)$$

where $v$ = grey value and $t$ = threshold value.

Segmentation algorithm: The basic idea of segmentation is the projection of the image containing number text onto the horizontal or the vertical axis.
Algorithm
1. Calculate the horizontal projection of the entire image.
2. Examine the projection profile to take out the lines.
3. For each line, calculate the vertical projection profile.
4. Analyze the projection profile obtained in step 3 to take out the words.
5. For each word, calculate the vertical projection profile.
6. Analyze the projection profile obtained in step 5 to take out the characters.

The segmentation process includes line, word and character segmentation.

• Line segmentation: Lines are taken out by collecting the frequency of black pixels in the horizontal direction. Fig. 3 shows the extraction of a line from an image containing number text.
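As a minimal sketch of the Otsu binarization step described under Pre-processing, the following pure-Python code searches for the threshold $t$ that maximizes the inter-class variance $\omega_1(t)\,\omega_2(t)\,[\mu_1(t)-\mu_2(t)]^2$. The function names `otsu_threshold` and `binarize` are hypothetical, the image is assumed to be a list of rows of 8-bit grey values, and the bin center $x(i)$ is taken to be the grey value $i$ itself.

```python
def otsu_threshold(image):
    """Return the grey level t (0..255) that maximizes the inter-class variance.

    image: list of rows, each a list of integer grey values in 0..255 (assumed).
    Class 1 holds bins 0..t, class 2 holds bins greater than t.
    """
    hist = [0] * 256
    for row in image:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    p = [h / total for h in hist]                 # p(i): probability of level i
    total_mean = sum(i * p[i] for i in range(256))

    best_t, best_var = 0, -1.0
    omega1 = 0.0      # running class probability of class 1
    cum_mean = 0.0    # running sum of p(i) * x(i), with x(i) = i
    for t in range(256):
        omega1 += p[t]
        cum_mean += t * p[t]
        omega2 = 1.0 - omega1
        if omega1 <= 0.0 or omega2 <= 0.0:
            continue  # one class is empty, variance undefined
        mu1 = cum_mean / omega1
        mu2 = (total_mean - cum_mean) / omega2
        var_between = omega1 * omega2 * (mu1 - mu2) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(image, t):
    """Map grey values above t to 1 and all others to 0."""
    return [[1 if v > t else 0 for v in row] for row in image]


if __name__ == "__main__":
    sample = [[50, 50, 200], [200, 50, 200]]   # toy two-level image
    t = otsu_threshold(sample)
    print(t, binarize(sample, t))
```

Because class 1 is defined as the bins up to and including $t$, this sketch marks pixels strictly greater than $t$ as 1. Maximizing the inter-class variance directly avoids computing the two class variances at every candidate threshold.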
Fig. 3. Line segmentation (sample image with four lines of number text: Line 1 = 451655811, Line 2 = 46498976, Line 3 = 01242032, Line 4 = 12345678; extraction of line 1 yields 451655811)
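The projection-profile procedure used for segmentation can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the names `runs` and `segment` are hypothetical, and the input is assumed to be a binary matrix in which 1 marks a black (text) pixel. Lines are found from the horizontal projection, and within each line the vertical projection gives the column spans that contain ink.

```python
def runs(profile):
    """Return (start, end) index pairs of consecutive non-zero profile entries."""
    spans, start = [], None
    for i, count in enumerate(profile):
        if count and start is None:
            start = i                      # a run of ink begins
        elif not count and start is not None:
            spans.append((start, i))       # the run ended at the previous index
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment(ink):
    """Split a binary image into lines, then into ink runs within each line.

    ink: list of rows, 1 = black (text) pixel, 0 = background (assumed).
    Returns a list of ((top, bottom), [(left, right), ...]) entries.
    """
    result = []
    horizontal = [sum(row) for row in ink]             # black pixels per row
    for top, bottom in runs(horizontal):
        line = ink[top:bottom]
        vertical = [sum(col) for col in zip(*line)]    # black pixels per column
        result.append(((top, bottom), runs(vertical)))
    return result


if __name__ == "__main__":
    page = [
        [0, 0, 0, 0, 0, 0],
        [1, 1, 0, 0, 1, 0],
        [1, 1, 0, 0, 1, 0],
        [0, 0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0, 0],
    ]
    for line_span, column_spans in segment(page):
        print(line_span, column_spans)
```

The sketch returns every ink run in a line. Telling word gaps apart from character gaps would need an additional gap-width threshold, which the source does not specify.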
• Word segmentation: The image contains number text, and the numbers are separated by considering the white space between them. This is done via a vertical scan that collects the frequency of black pixels of the number text image.

Fig 4: Word segmentation

C. Freeman chain code: To extract features from the number image, the Freeman chain code approach is implemented. It does not require certain preprocessing steps such as smoothing, filtering and slant removal for efficient classification and recognition. There are two chain code approaches:
1. 4-directional chain code
2. 8-directional chain code

Fig 5: Chain code

The shape number uniquely identifies the object independent of rotation (by 90°). It is normally chosen as the difference code of smallest order. In the encoding scheme, each of the eight possible directions is assigned an integer between 0 and 7, which gives the 8-directional chain code. For a chain code with n chains, Freeman suggested an estimate of its length that is defined in terms of the unbiased estimate of perimeter length, the number of even chain codes, the number of odd chain codes, and the number of corners.

D. Recognition: This is the final step for the number text image. The number text is recognized from the image by a feed-forward back-propagation neural network.

1. Neural network: The structure of the neural network contains an input layer, a hidden layer and an output layer of neurons, with the 'logsig' activation function transforming values from one layer to the next. A feed-forward back-propagation neural network requires that its inputs be normalized; for 8-connectivity the inputs lie in the range 0-7. The output neurons (10, with respect to the indices array) produce values in response to the input neurons. When a character is presented to the network, one single output neuron is selected as the specific character (number text).

2. Feed-forward back-propagation neural network: This OCR considers number text in multiple regional languages, i.e. Marathi, Hindi, English, Tamil, Gujarati, etc. Each language has 10 characters, and the network is trained in a supervised form against a target array. After training, the network performs recognition on the number image. The input layer has 10 neurons and the vector length is 88, which gives a 10 × 88 array. The network takes the 10 × 88 array as input and has an output layer of 10 neurons. The example given here is the Tamil number text recognition model.

Fig 6: Neural Network Design for character Recognition

Fig 7: Neural Network view for Multilingual OCR
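The feature extraction and recognition steps can be sketched together as below. This is a sketch under stated assumptions rather than the authors' network: the boundary is assumed to be already traced as an ordered list of (row, col) pixels, code 0 is assumed to point east with codes increasing counter-clockwise, the chain code is zero-padded to the length of 88 mentioned in the text, the hidden layer size of 20 is hypothetical, and the weights are random placeholders standing in for weights that back-propagation training would produce.

```python
import math
import random

# Freeman 8-direction codes keyed by (d_row, d_col); rows grow downward.
DIRECTIONS = {(0, 1): 0, (-1, 1): 1, (-1, 0): 2, (-1, -1): 3,
              (0, -1): 4, (1, -1): 5, (1, 0): 6, (1, 1): 7}


def chain_code(boundary):
    """8-directional chain code of a closed boundary of ordered (row, col) pixels."""
    following = boundary[1:] + boundary[:1]        # wrap around to close the contour
    return [DIRECTIONS[(r1 - r0, c1 - c0)]
            for (r0, c0), (r1, c1) in zip(boundary, following)]


def make_features(codes, length=88):
    """Pad or truncate to a fixed length and scale codes 0..7 into 0..1."""
    padded = (codes + [0] * length)[:length]       # zero padding is an assumption
    return [c / 7.0 for c in padded]


def logsig(x):
    """Log-sigmoid activation, 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """One fully connected layer followed by logsig."""
    return [logsig(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def random_network(n_in=88, n_hidden=20, n_out=10, seed=0):
    """Untrained placeholder weights; training would replace these."""
    rng = random.Random(seed)
    def matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return matrix(n_hidden, n_in), [0.0] * n_hidden, matrix(n_out, n_hidden), [0.0] * n_out


def classify(features, network):
    """Forward pass; the most active of the 10 output neurons is the digit."""
    w1, b1, w2, b2 = network
    outputs = layer(layer(features, w1, b1), w2, b2)
    return max(range(len(outputs)), key=outputs.__getitem__), outputs


if __name__ == "__main__":
    square = [(0, 0), (0, 1), (1, 1), (1, 0)]      # toy closed boundary
    codes = chain_code(square)
    digit, outputs = classify(make_features(codes), random_network())
    print(codes, digit)
```

With untrained weights the selected digit is arbitrary. The sketch only shows how a chain code becomes a fixed-length input vector and how one output neuron is selected.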
E. Rule Based Approach: This step operates only on multilingual number text and does not consider any grammatical framework for processing the task. For example, if 41465 is the text extracted from the image, it is first translated into English text, i.e. Four One Four Six Five. After this operation it is converted into the regional language, for example Tamil. If the regional language is Hindi, the result is given in Hindi.

• Algorithm for translator:

Algorithm 1: LEXEMEGEN(ITEXT, BASEW, SUFF)
The input is stored as input text ITEXT; after the rules are applied, the lexemes are copied into the base word BASEW and the suffix SUFF.
1. Start
2. Take ITEXT (input as text)
   2.1 Read text till white space.
   2.2 Apply rules to generate morphemes (LEXEME).
   2.3 Store base word and suffix in variables BASEW and SUFF.
3. Keep BASEW and SUFF for future use.
4. EXIT. [32]

Algorithm 2: COMPARE(NOUNIND, VERBIND, PRNIND, IND, SUFF)
The index position IND is compared with the noun index NOUNIND, the verb index VERBIND and the pronoun index PRNIND in turn, and on that basis the appropriate result is placed in the translation result.
1. Start
2. If (IND = NOUNIND) then
      Print: the word for the indexed element at the appropriate place in the translation result.
   Else if (IND = VERBIND) then
      Print: the word for the indexed element at the appropriate place in the translation result.
   Else if (IND = PRNIND) then
      Print: the word for the indexed element at the appropriate place in the translation result.
3. Check the condition for SUFF and add it to the result.
4. EXIT. [32]

Algorithm 3: LANTRANS(NOUN, PRONOUN, VERB, SUFF, NOUNSUF, VERBSUF, PRNSUF)
Language translation uses both algorithms above and compares the base word BASEW with noun, verb and pronoun respectively. In the same way, the suffix SUFF is compared with the noun suffix NOUNSUF and the verb suffix VERBSUF respectively.
1. START
2. CALL LEXEMEGEN();
3. If BASEW = NULL && SUFF = NULL then
      Return and exit.
4. If BASEW = NOUN && SUFF = NOUNSUF then
   {
      Print details in relations and copy the index position of the related word in IND.
      CALL COMPARE();
   }
   Else if BASEW = VERB && SUFF = VERBSUF then
   {
      Print details in relations and copy the index position of the related word in IND.
      CALL COMPARE();
   }
   Else if BASEW = PRONOUN then
   {
      Print details in relations and copy the index position of the related word in IND.
      CALL COMPARE();
   }
5. CALL LANTRANS();
6. EXIT [32]

F. Speech Synthesis: After the translated text is obtained, the number text is pronounced in the respective language by applying a speech synthesis method. The text is converted to phonemic form and then into a speech waveform, which is made audible to the user as speech output. This conversion takes care of the accuracy of text normalization and pronunciation. In the above example, the English number text is rendered in the Tamil and Hindi regional languages in voice form.

i) Concatenative synthesis: This method joins segments of recorded speech data. It has two types, i.e. unit selection synthesis and domain-specific synthesis.

ii) Working of text to speech synthesis
Input: Number text in one regional language.
Text normalization: Performs tokenization, token identification and token-to-word conversion.
Output: Applying speech synthesis gives the number text as voice. For example, if the number text is in Telugu and the required regional language is Marathi, the Telugu number text is converted into Marathi voice.
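The digit-by-digit translation that precedes synthesis (for example 41465 read as "four one four six five", or Telugu number text rendered in Marathi) can be sketched as a two-stage lookup through English. The word tables below are illustrative assumptions and are not taken from the paper, and the function names are hypothetical. The only library call used is Python's `unicodedata.digit`, which returns the digit value of a character in any script.

```python
import unicodedata

ENGLISH = ["zero", "one", "two", "three", "four",
           "five", "six", "seven", "eight", "nine"]

# Illustrative lookup dictionaries for two target languages (assumed spellings).
REGIONAL = {
    "hindi":   ["शून्य", "एक", "दो", "तीन", "चार", "पाँच", "छह", "सात", "आठ", "नौ"],
    "marathi": ["शून्य", "एक", "दोन", "तीन", "चार", "पाच", "सहा", "सात", "आठ", "नऊ"],
}


def to_english(number_text):
    """Stage 1: read each digit, in any script, as an English number word."""
    return [ENGLISH[unicodedata.digit(ch)]
            for ch in number_text if not ch.isspace()]


def from_english(words, target):
    """Stage 2: map English number words to the target language's words."""
    table = REGIONAL[target]
    return [table[ENGLISH.index(word)] for word in words]


def translate(number_text, target):
    """Translate number text digit by digit, using English as the pivot."""
    return " ".join(from_english(to_english(number_text), target))


if __name__ == "__main__":
    print(" ".join(to_english("41465")))
    gujarati_19 = "\u0ae7\u0aef"           # Gujarati digits one and nine
    print(" ".join(to_english(gujarati_19)))
    telugu_41 = "\u0c6a\u0c67"             # Telugu digits four and one
    print(translate(telugu_41, "marathi"))
```

Using English as the pivot means each additional language needs only one ten-entry table, which is in line with the small memory requirement claimed for the rule based approach.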
Fig 8: Speech Synthesis
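The text normalization stage (tokenization, token identification, token-to-word) can be sketched for English as below. This is an illustrative sketch only: the cue words used to decide that a number is a "figure" to be read digit by digit, the date pattern, and all function names are assumptions, and punctuation attached to tokens is not handled.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
FIGURE_CUES = {"train", "room", "telephone", "phone", "number"}  # assumed cues


def cardinal(n):
    """Spell out a cardinal number 0 <= n < 1,000,000."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        return ONES[hundreds] + " hundred" + (" " + cardinal(rest) if rest else "")
    thousands, rest = divmod(n, 1000)
    return cardinal(thousands) + " thousand" + (" " + cardinal(rest) if rest else "")


def identify(token, previous):
    """Token identification: date, figure, cardinal, or ordinary word."""
    if re.fullmatch(r"[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}", token):
        return "date"
    if re.fullmatch(r"[0-9]+", token):
        cued = any(word.lower() in FIGURE_CUES for word in previous)
        return "figure" if cued or token.startswith("0") else "cardinal"
    return "word"


def to_words(token, kind):
    """Token-to-word conversion for each identified class."""
    if kind == "figure":
        return " ".join(ONES[int(ch)] for ch in token)
    if kind == "cardinal":
        return cardinal(int(token))
    if kind == "date":
        day, month, year = (int(part) for part in token.split("/"))
        return f"{cardinal(day)} {MONTHS[month - 1]} {cardinal(year)}"
    return token


def normalize(text):
    """Tokenize on white space, identify each token, expand it to words."""
    tokens = text.split()
    return " ".join(to_words(tok, identify(tok, tokens[max(0, i - 2):i]))
                    for i, tok in enumerate(tokens))


if __name__ == "__main__":
    print(normalize("train number 12106"))
    print(normalize("paid 1000 rupees"))
```

Looking at the two preceding tokens is a simple stand-in for the context analysis that the surveyed approaches perform with language models or neural networks.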
CONCLUSION After a literature survey of several approaches, the rule based approach has been selected for number text translation from English to other regional languages. The results of English text to English speech conversion, and of English speech to regional language speech conversion, can show the effective outcome. The proposed work presents an algorithm for conversion of text to speech for different regional languages. The number interpreter can have many applications in day-to-day life. The main advantage of the rule based approach is its easy implementation and small memory requirement, which makes it suitable for developing the required application. The approach translates number text from one regional language to another regional language and also pronounces both the English number text and the regional language number text.

REFERENCES
[1] M. Usman Akram, Zabeel Bashir, Anam Tariq and Shoab A Khan, “Geometric Feature Points Based Optical Character Recognition”, Symposium on Industrial Electronics & Applications (ISIEA2013), September 22-25, 2013, Kuching, Malaysia, IEEE, 2013.
[2] Richard Gillam, Advisory Software Engineer, Center for Java Technology–Silicon Valley, IBM Corp, “A Rule-Based Approach to Number Spellout”, 12th International Unicode/ISO 10646 Conference, 1998.
[3] Mei Tu, Yu Zhou and Chengqing Zong, “A Universal Approach to Translating Numerical and Time Expressions”, Proceedings IWSLT, 2012.
[4] Hugo Brandt Corstius, “Automatic translation between number names”, Grammars for number names, Foundations of Language Supplementary Series, vol. 7, p. 103, 1968.
[5] Agni Dik, Adnan Maxhuni and Avni Rexhepi, “The principles of designing of algorithm for speech synthesis from texts written in Albanian language”, IJCSI International Journal of Computer Science Issues, Vol. 9, Issue 3, No 3, May 2012.
[6] Sunil R, Jayan V and Bhadran V K, “Preprocessors in NLP Applications: In the Context of English to Malayalam Machine Translation”, India Conference (INDICON), 2012 Annual IEEE, 2012.
[7] Corstius, H.B., “Automatic translation of numbers into Dutch”, Foundations of Language, vol. 1, pp. 59-62, 1965.
[8] Zhai, Feifei, Xia, Rui and Zong, Chengqing, “An Approach to Recognizing and Translating Chinese & English Time and Number Named Entities”, in the 7th China Workshop on Machine Translation (CWMT 2009), Nanjing, 2009.
[9] Terumasa, E., “Rule based machine translation combined with statistical post editor for japanese to english patent translation”, pp. 13-18, 2007.
[10] Bassam Jabaian, Laurent Besacier and Fabrice Lefèvre, “Comparison and Combination of Lightly Supervised Approaches for Language Portability of a Spoken Language Understanding System”, IEEE Transactions on Audio, Speech, and Language Processing, Vol. 21, No. 3, March 2013.
[11] Vimal Mishra and R. B. Mishra, “Study of Example Based English to Sanskrit Machine Translation”, IEEE, 2011.
[12] Jain, Renu, Sinha, R.M.K. and Ajai Jain, “ANUBHARTI: Using Hybrid Example-Based Approach for Machine Translation”, in Proceedings Symposium on Translation Support Systems (STRANS2001), Kanpur, India, February 15-17, 2001.
[13] Sneha Tripathi and Juran Krishna Sarkhet, “Approaches to machine translation”, Annals of Library and Information Studies, vol. 57, Dec 2010.
[14] Prof. Deepak Mane and Aniket Hirve, “Study of Various Approaches in Machine Translation for Sanskrit Language”, International Journal of Advancements in Research & Technology, Volume 2, Issue 4, April 2013.
[15] Och, Franz Josef, Proceedings of the 2007 Joint Conference on “Empirical Methods in Natural Language Processing and Computational Natural Language Learning”, Prague, Association for Computational Linguistics, pp. 858-867, June 2007.
[16] Abhay Adapanawar, Anita Garje, Paurnima Thakare, Prajakta Gundawar and Priyanka Kulkarni, “Rule Based English To Marathi Translation Of Assertive Sentence”, International Journal of Scientific & Engineering Research, Volume 4, Issue 5, May 2013.
[17] Monika Gaule, Dr. Gurpreet Singh and Josan, “Machine Translation of Idioms from English to Hindi”, International Journal of Computational Engineering Research, Vol. 2, Issue 6, Oct 2012.
[18] Anil Kumar Singh, “Extraction and Translation of Multi-Word Number Expressions”, 2010.
[19] Jagadish S Kallimani, Srinivasa K G and Eswara Reddy B, “Normalization of Non Standard Words for Kannada Speech Synthesis”, International Journal of Advances in Computer Science and Technology, Volume 1, No. 1, November-December 2012.
[20] Jari Alhonen, “Multilingual Number Expansion for TTS”, IEEE International Conference on Speech Database and Assessments, 2009.
[21] Thu-Trang Thi Nguyen, Thanh Thi Pham and Do-Dat Tran, “A method for Vietnamese Text Normalization to improve the quality of speech synthesis”, ACM Symposium on Information and Communication Technology, 2010.
[22] Marko Moberg and Kimmo Parssinen, “Multilingual rule-based approach to number expansion: Framework, extensions and application”, Int J Speech Technology, Springer, 2007.
[23] Sproat, R., “Lightly Supervised Learning of Text Normalization: Russian Number Names”, IEEE Workshop on Spoken Language Technology, Berkeley, CA, 2010.
[24] Minho Kim, Youngim Jung and Hyuk-Chul Kwon, “Hybrid Word Sense Disambiguation Using Language Resources for Transliteration of Arabic Numerals in Korean”, ACM (ICHIT) International Conference on Convergence and Hybrid Information Technology, 2009.
[25] Matsuhara M and Suzuki S, “An efficient context-aware character input algorithm for mobile phone based on artificial neural network”, Applied Computational Intelligence and Soft Computing Journal on Awareness Science and Technology (ICAST), January 2012.
[26] Neema Mishra, Urmila Shrawankar and Dr. V. M. Thakare, “An Overview Of Hindi Speech Recognition”, Proceedings of the International Conference “Computational Systems and Communication Technology”, 5th May 2010.
[27] Priti Saktel and Urmila Shrawankar, “Context Based Meaning Extraction for HCI Using WSD Algorithm: A Review”, IEEE International Conference on Advances in Engineering, Science and Management (ICAESM-2012), March 2012.
[28] Priti Saktel and Urmila Shrawankar, “Context Based Domain Identification for Resolving Ambiguity”, ICCCNT, Coimbatore, India, July 2012.
[29] Neema Mishra, Urmila Shrawankar and V. M. Thakare, “Automatic Speech Recognition using Template Model for Man-Machine Interface”, Proceedings of Emerging Trends in Computing Technologies, SRM University, Chennai, India, June 21-24, 2010, pp. 39-42.
[30] Rina Damdoo and Urmila Shrawankar, “Probabilistic Language Models for Template Messaging based on Bi-Gram”, IEEE International Conference on Advances in Engineering, Science and Management (ICAESM-2012), March 2012.
[31] Urmila Shrawankar and Vilas Thakare, “Noise Estimation and Noise Removal Techniques for Speech Recognition in Adverse Environment”, Intelligent Information Processing V, IFIP Advances in Information and Communication Technology, Volume 340, 2010, pp. 336-342.
[32] Ved Kumar Gupta, Prof. Namrata Tapaswi and Dr. Suresh Jain, “Knowledge Representation of Grammatical Constructs of Sanskrit Language Using Rule Based Sanskrit Language to English Language Machine Translation”, Advances in Technology and Engineering (ICATE), IEEE, 2013.
[33] http://en.wikipedia.org/wiki/Otsu%27s_method