Recent Advances in Computer Engineering, Communications and Information Technology
Machine Translation-Indian Regional Languages

NAKUL SHARMA
Information Technology
University of Pune (Affiliated)
Fl-02B, Radiant Hillview Housing Society, Opp. H. P. Petrol Station, Kondhwa, Pune-48
INDIA
[email protected]

Abstract: - Natural Language Processing (NLP) is an emerging field of Machine Learning. Among other tasks, NLP systems use machines to translate text or speech. Machine Translation (MT) systems can be classified according to the approach followed for translation. In this paper, existing MT systems are analyzed with respect to the regional languages of India.
Key-Words: - Machine Translation (MT), Natural Language Processing (NLP), Indo-Aryan Languages, Dravidian Languages.
1 Introduction
MT is a branch of NLP. It strives to convert text or speech in one natural language (such as Hindi or English) into another natural language by means of machines. MT systems can be trained on multilingual data.
Table I gives the list of major MT types and the techniques on which they are based.

2 Problem Formulation
Fig. 1 shows the general processing carried out by MT systems. Text in the source language is fed into the MT system, and the system generates output in the target language. MT systems may be text-to-text or text-to-speech. Text-to-text systems convert source text into target text. Text-to-speech systems convert source text into the spoken form of the target language. The reverse conversion, speech-to-text, is also possible, depending on various factors.

Fig. 1. Working of MT systems

A lot of work is currently being undertaken on Machine Translation systems in India. This work is an endeavor to address the following research questions:
Question-1: What are the various spoken and written languages in India?
Question-2: What are the regions in which the regional languages of India are used?

TABLE I. MACHINE TRANSLATION SYSTEMS [1]
Type of MT System | Based On
Direct | Dictionary lookup
Statistical | Corpus and statistical models
Interlingua | Developing a Universal Natural Language
Example | Corpus and previous translations
Transfer | Translation rules
Knowledge | Uses Artificial Intelligence
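To make the Direct entry of Table I concrete, the following is a minimal sketch of word-by-word dictionary lookup for a text-to-text system. The dictionary entries (romanized Hindi) and the function name translate_direct are illustrative assumptions and are not taken from any system surveyed here.

```python
# Hypothetical toy English-to-Hindi dictionary (romanized); for illustration only.
TOY_DICTIONARY = {
    "water": "paani",
    "house": "ghar",
    "book": "kitaab",
}


def translate_direct(sentence, dictionary):
    """Translate word by word; words missing from the dictionary pass through unchanged."""
    words = sentence.lower().split()
    return " ".join(dictionary.get(word, word) for word in words)


print(translate_direct("Book house water", TOY_DICTIONARY))  # kitaab ghar paani
```

Because each word is looked up in isolation, a sketch like this cannot reorder words or resolve ambiguity, which is one reason the other approaches in Table I exist.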
ISBN: 978-960-474-361-2
3 Problem Solution

3.1 Indian Regional Languages
Natural languages can be categorized as:
-Written
-Spoken
-Both written and spoken

Languages such as Hindi, Punjabi, and Marathi are both spoken and written. Languages such as Dogri are only spoken and not written. India hosts many regional languages. Depending on their history, they are spoken and/or written in many scripts. Scripts in which these languages are written include Devanagari and Gurmukhi.

Fig. 2. Division of Regional Languages of India

Table II gives the major languages of North India along with the states in which each is an official language.

TABLE II. LANGUAGES OF NORTH INDIA
Language | Official Language of State | Written | Spoken
Hindi | Uttar Pradesh, Uttaranchal, Bihar, Rajasthan, Haryana, Delhi | Yes | Yes
Urdu | Jammu and Kashmir, Uttar Pradesh, Delhi | Yes | Yes
Dogri | Himachal Pradesh, Jammu and Kashmir | No | Yes
Haryanvi | Haryana | No | Yes
Rajasthani | Rajasthan | No | Yes
Bihari | Bihar | No | Yes

Table III gives the major languages of central India along with the states in which each is an official language.

TABLE III. LANGUAGES OF CENTRAL INDIA
Language | Official Language of State | Written | Spoken
Hindi | Madhya Pradesh, Jharkhand, Chhattisgarh, Delhi, Uttaranchal, Uttar Pradesh | Yes | Yes
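The two research questions can be treated as simple queries over the language tables. The sketch below encodes the rows of Table II as records and filters them; the names LanguageRecord, spoken_only, and languages_of are hypothetical helpers introduced only for this illustration.

```python
from typing import List, NamedTuple, Tuple


class LanguageRecord(NamedTuple):
    name: str
    states: Tuple[str, ...]
    written: bool
    spoken: bool


# Rows of Table II (languages of North India) as listed in this paper.
NORTH_INDIA = [
    LanguageRecord("Hindi", ("Uttar Pradesh", "Uttaranchal", "Bihar",
                             "Rajasthan", "Haryana", "Delhi"), True, True),
    LanguageRecord("Urdu", ("Jammu and Kashmir", "Uttar Pradesh", "Delhi"), True, True),
    LanguageRecord("Dogri", ("Himachal Pradesh", "Jammu and Kashmir"), False, True),
    LanguageRecord("Haryanvi", ("Haryana",), False, True),
    LanguageRecord("Rajasthani", ("Rajasthan",), False, True),
    LanguageRecord("Bihari", ("Bihar",), False, True),
]


def spoken_only(records: List[LanguageRecord]) -> List[str]:
    """Question-1: languages marked as spoken but not written."""
    return [r.name for r in records if r.spoken and not r.written]


def languages_of(state: str, records: List[LanguageRecord]) -> List[str]:
    """Question-2: languages listed against a given state."""
    return [r.name for r in records if state in r.states]


print(spoken_only(NORTH_INDIA))
print(languages_of("Delhi", NORTH_INDIA))
```

The same record type could hold the rows of the other regional tables, so that a single query covers all regions.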
Table IV gives the major languages of the southern states along with the states in which each is an official language.

TABLE IV. LANGUAGES OF SOUTH INDIA
Language | Official Language of State | Written | Spoken
Malayalam | Kerala | Yes | Yes
Tamil | Tamil Nadu, Andaman and Nicobar Islands, Puducherry | Yes | Yes
Telugu | Andhra Pradesh | Yes | Yes
Kannada | Karnataka | Yes | Yes
Tulu | - | Yes | Yes
* These languages are part of the Dravidian group of languages. They are spoken and written in the south Indian states.

Table V gives the major languages of the western and south western states along with the states in which each is an official language.

TABLE V. LANGUAGES OF WESTERN AND SOUTH WESTERN STATES
Language | Official Languages of States/Union Territories | Written | Spoken
Gujarati | Gujarat | Yes | Yes
Marathi | Maharashtra | Yes | Yes
Portuguese | Daman and Diu, Goa | Yes | Yes

Table VI gives the major languages of the eastern states along with the states in which each is an official language.

TABLE VI. LANGUAGES OF EASTERN STATES
Language | Official Language of States | Written | Spoken
Bengali | West Bengal, Tripura | Yes | Yes
Assamese | Assam, Nagaland, Arunachal Pradesh | Yes | Yes
Mizo | Mizoram | Yes | Yes
Borak (Kokborok) | Tripura | Yes | Yes
Hindi and English | Arunachal Pradesh | Yes | Yes
Oriya | Odisha | Yes | Yes

I. On-line Machine Translation Tools

A. Web-Based Hindi to Punjabi MT system
This system makes use of the Direct Machine Translation technique. It can convert web pages and web documents from Hindi to Punjabi [2]. Punjabi University, Patiala, has developed this web-based system, which is available at
http://h2p.learningpunjabi.org

B. Bing Translator
A service offered by Microsoft, Bing Translator can translate between languages and also provides various ways of viewing the translated content [2]. This tool can be accessed at:
http://bing.com/translator

C. Babylon
Translation by Babylon is a free online version of the Babylon translation software [7]. This tool can be accessed online at:
http://translation.babylon.com/

D. PROMT Translation
This online tool undertakes translation by passing the text to be translated to the Google, Bing, and Babylon translation systems [8]. This tool can be accessed online at:
http://imtranslator.net/translator.asp

E. Google Translator
This is a service offered by Google Inc. Google Translator provides a side-by-side view while translating content. Google also provides the feature of translating web links into English [2]. This tool can be accessed online at:
http://translate.google.com

II. Off-line Translation Tools

A. Systran
This system was developed by a company of the same name. The system offers translation for 35 languages. It provides the technology for Yahoo! Babel Fish and was used by Google Inc. until 2007 [6].

B. METAL
METAL is an MT system developed at the University of Texas. Using the concept of controlled language, this system achieves high-quality translations. METAL is now known as LANT-MARK, and its marketing has been taken over by a Belgian company [6].

C. English to Bangla Phrase-Based Machine Translation
The system uses log-linear translation models as the basis for translation. The source language is English and the target language is Bengali [10].

D. Anglabharati
This tool uses a rule-based technique for translation. It uses a context-free grammar to generate a pseudo-target applicable to a group of Indian languages [11].

E. Anuvadaksh
The tool is being developed by CDAC-Pune. The system aims to translate English text into regional languages of India. The target languages include Hindi, Marathi, Urdu, Tamil and Oriya.

F. UNL Based Enconverter-Deconverter
This technique is based on the Universal Networking Language (UNL). An enconverter converts English sentences to Punjabi sentences. A deconverter converts Punjabi sentences back to English [3].

G. Anusaaraka
This is an expert-system-based Machine Translation system. It deals with creating a joint architectural model for Machine Translation from English to various regional languages [9].

H. Sampark
Sampark makes use of a transfer-based approach to translation. The system consists of source analysis, transfer, and target generation as its main processes [10].
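The three processes named for the transfer-based approach used by Sampark (source analysis, transfer, and target generation) can be illustrated with a toy pipeline. This is only a sketch under stated assumptions: the lexicon, the part-of-speech tags, and the single reordering rule (moving verbs to the end, since Hindi is broadly verb-final) are hypothetical and do not describe how Sampark itself is implemented.

```python
# Hypothetical toy English-to-Hindi lexicon (romanized): word -> (target word, tag).
LEXICON = {
    "ram": ("ram", "NOUN"),
    "eats": ("khata hai", "VERB"),
    "mango": ("aam", "NOUN"),
}


def analyze(sentence):
    """Source analysis: tokenize and tag each word (raises KeyError for unknown words)."""
    return [(word, LEXICON[word][1]) for word in sentence.lower().split()]


def transfer(tagged):
    """Transfer: substitute target words, then move verbs to the end of the sentence."""
    target = [(LEXICON[word][0], tag) for word, tag in tagged]
    non_verbs = [item for item in target if item[1] != "VERB"]
    verbs = [item for item in target if item[1] == "VERB"]
    return non_verbs + verbs


def generate(target):
    """Target generation: join the reordered target words into a surface string."""
    return " ".join(word for word, _ in target)


def translate(sentence):
    return generate(transfer(analyze(sentence)))


print(translate("Ram eats mango"))  # ram aam khata hai
```

Keeping the three stages as separate functions mirrors the description above: the analysis and generation stages depend on one language each, while only the transfer stage depends on the language pair.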
4. English Language Machine Translation Systems
The following table shows some of the machine translation systems in which the source or target language is English.

Name of System | Source/Target Language | MT Approach | Developed By (Organization/authors)
Portage [10] | English | Phrase-Based | National Research Council, Canada
PANGLOSS [11] | English-Spanish, English-Korean | Knowledge-Based | Kevin Knight, Steve K. Luk
Pan-Lite [12] | English | Example-Based, Transfer-Based, Knowledge-Based | Centre of Machine Translation, Carnegie Mellon University
ALT-J/E [13] | English-Japanese | Experimental-Based | NTT Communications Science Laboratory
Pan EBMT [14] | Spanish-English | Example-Based Machine Translation | Centre of Machine Translation, Carnegie Mellon University
English to American Sign Language [15] | English-American Sign Language | Visual & spatial | University of Pennsylvania
Candide [16] | French-English | Statistical (Probability based) | IBM Thomas J. Watson Research Center
MaTra [17] | English-Hindi | Automatic, Parsing | Ananthakrishnan, C-DAC Mumbai
Shiraz [18] | Persian-English | Ontology Based, Direct MT | Computing Research Laboratory, New Mexico State University
ETAP [19] | Russian-English | Direct MT | Computational Linguistics Laboratory, Russian Academy of Sciences
English to Japanese MT system [20] | English-Japanese | Example-Based | ATR Interpreting Telephony Research Laboratories, Kyoto, Japan
MaTrEx [21] | English-Italian | Example Based, Statistical Based | Dublin City University, Dublin
Transonics [22] | English-Persian | Speech to speech translator | University of Southern California, HRL Laboratories, LLC, CA
LOGON [23] | English-Norwegian | Semantic Based, Transfer Based | Various universities in Norway
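The log-linear translation models mentioned earlier for the English to Bangla phrase-based system score each candidate translation as a weighted sum of feature values, score = sum over i of lambda_i * h_i, and select the highest-scoring candidate. The sketch below shows only this scoring step. The feature names, weights, and candidate values are hypothetical and are not taken from any system in the table above.

```python
# Hypothetical feature weights (the lambdas); real systems tune these on held-out data.
WEIGHTS = {"log_tm": 1.0, "log_lm": 0.5, "word_count": -0.1}

# Hypothetical candidates with made-up feature values:
# log translation-model probability, log language-model probability, output length.
CANDIDATES = [
    {"text": "candidate A",
     "features": {"log_tm": -2.0, "log_lm": -4.0, "word_count": 3}},
    {"text": "candidate B",
     "features": {"log_tm": -1.5, "log_lm": -6.0, "word_count": 4}},
]


def log_linear_score(features, weights):
    """Weighted sum of feature values: sum_i lambda_i * h_i."""
    return sum(weights[name] * value for name, value in features.items())


def best_candidate(candidates, weights):
    """Return the candidate with the highest log-linear score."""
    return max(candidates, key=lambda c: log_linear_score(c["features"], weights))


print(best_candidate(CANDIDATES, WEIGHTS)["text"])  # candidate A
```

With these weights, candidate A scores about -4.3 and candidate B about -4.9, so A is selected even though B has the better translation-model value.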
5. Acknowledgement
I would like to thank my M.E. thesis guide, Dr. Parteek Bhatia (CSED, Thapar University), for being a constant source of inspiration. This work was also inspired by my mother and father, who have helped in all my efforts. It is difficult to put into words the efforts they have all undertaken.

6. Conclusion
Machine Translation has come a long way toward its objective of translating text and speech. However, much work remains to be done to achieve satisfactory output in terms of quality and software flexibility. There has been a surge in the number of Machine Translation professionals as well as organizations focused on Machine Translation. The Indian government is also making constant efforts to bridge the language barrier.

REFERENCES

[1] Nakul Sharma, Parteek Bhatia, “English to Hindi Statistical Machine Translation System”, Master of Engineering Thesis submitted to Thapar University, July 2011, accessed at http://dspace.thapar.edu:8080/dspace/handle/10266/1449.
[2] Nakul Sharma, Parteek Bhatia, “Statistical Machine Translation for Indian Languages”, IEEE’s International Conference in Computer Engineering and Technology (ICCET-2010), ISBN: 978-81-920748-1-8.
[3] Parteek Kumar, R.K. Sharma, “UNL Based Machine Translation System for Punjabi Language”, PhD thesis submitted to Thapar University, Feb 2012, accessible at http://dspace.thapar.edu:8080/dspace/handle/10266/1729.
[4] “Natural Language Processing activities at CDAC Kolkata”, Annual Report-2011, CDAC Kolkata.
[5] Sitendar, Seema Bawa, “Survey of Indian Machine Translation Systems”, International Journal of Computer Science and Technology (IJCST), Vol-3 Issue-1, Jan-March 2012.
[6] Sanjay Kumar Dwivedi, Pramod Premdas Sukhadeve, “Machine Translation System in Indian Perspectives”, Journal of Computer Science, Vol 6 Issue 10, ISSN-1549-3636, 2010.
[7] “Free Online Translation”, [Online] http://translation.babylon.com accessed on 19 March 2013.
[8] “Online Translation Tools”, [Online] http://imtranslator.net/translator.asp accessed on 19 March 2013.
[9] Sriram Choudhury, Ankitha Rao, Dipti M Sharma, “Anusaaraka: An Expert System Based Machine Translation System”, IEEE.
[10] Gary Anthes, “Automated Translation of Indian Languages”, Communications of the ACM, Technology News, January 2010, Vol 53, No. 1.
[11] Sadat, F., Johnson, H., Agbago, A., Foster, G., Kuhn, R., Martin, J. and Tikuisis, A., “Portage: A Phrase-based Machine Translation System”, In Proc. Association for Computational Linguistics, Workshop on Building and Using Parallel Texts: Data-Driven Machine Translation, Ann Arbor, Michigan, USA, June 29-30 2005, pp. 133-136. NRC 48525.
[12] Ralf D. Brown, “Example-Based Machine Translation in the Pangloss System”, COLING 1996, pp. 169-174.
[13] R E Frederking, R D Brown, “The Pangloss-Lite Machine Translation System”, MT Horizons, Proceedings of the Second Conference of the Association for Machine Translation in the Americas (AMTA-96).
[14] Ralf D Brown, “Example-based machine translation in the Pangloss system”, Proceedings of the 16th Conference on Computational Linguistics, Volume 1, Association for Computational Linguistics, pp. 169-174.
[15] Liwei Zhao, Karin Kipper, William Schuler, Christian Vogler, Martha Palmer, and Norman I. Badler, “A Machine Translation System from English to American Sign Language”, Lecture Notes in Computer Science, Volume 1934, Envisioning Machine Translation in the Information Future, 4th Conference of the Association for Machine Translation in the Americas, 2000, pp. 54-67.
[16] Adam L. Berger, Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, John R. Gillett, John D. Lafferty, Robert L. Mercer, Harry Printz, Luboš Ureš, “The Candide System for Machine Translation”, IBM Thomas J. Watson Research Center, P.O. Box 704, Yorktown Heights, NY 10598.
[17] Ananthakrishnan R, Kavitha M, Jayprasad J Hegde, Chandra Shekhar, Ritesh Shah, Sawani Bade, Sasikumar M, “MaTra: A Practical Approach to Fully-Automatic Indicative English-Hindi Machine Translation”.
[18] Jan W. Amtrup, Hamid Mansouri Rad, Karine Megerdoomian and Rémi Zajac, “Persian-English Machine Translation: An Overview of the Shiraz Project”, NMSU, CRL, Memoranda in Computer and Cognitive Science (MCCS-00-319).
[19] Makoto Nagao, “A Framework of a Mechanical Translation between Japanese and English by Analogy Principle”, Artificial and Human Intelligence, Elsevier Science Publishers B.V., NATO, 1984.
[20] Nicolas Stroppa, Andy Way, “MaTrEx: DCU Machine Translation System for IWSLT 2006”.
[21] Emil Ettelaie, Sudeep Gandhe, Panayiotis Georgiou, Kevin Knight, Daniel Marcu, Shrikanth Narayanan, David Traum, Robert Belvin, “Transonics: A Practical Speech-to-Speech Translator for English-Farsi Medical Dialogues”, Proceedings of the ACL Interactive Poster and Demonstration Sessions, pp. 89-92, Ann Arbor, June 2005.