Recent Advances in Computer Engineering, Communications and Information Technology

Machine Translation-Indian Regional Languages

NAKUL SHARMA
Information Technology
University of Pune (Affiliated)
Fl-02B, Radiant Hillview Housing Society, Opp. H. P. Petrol Station, Kondhwa, Pune-48
INDIA
[email protected]

Abstract: - Natural Language Processing (NLP) is an emerging field of Machine Learning. Among other tasks, NLP systems use machines to translate text or speech. Machine Translation (MT) systems can be classified according to the approach they follow for translation. In this paper, existing MT systems are analyzed according to the regional languages of India.

Key-Words: - Machine Translation (MT), Natural Language Processing (NLP), Indo-Aryan Languages, Dravidian Languages.

1 Introduction
MT is a branch of NLP. It strives to convert text in one natural language (such as Hindi or English) into another natural language by means of machines. MT systems can be trained on multilingual corpora.

Table I lists the major types of MT system and the techniques on which they are based.

Fig. 1. Working of MT systems

2 Problem Formulation
Fig. 1 shows the general processing of an MT system. Text in the source language is fed into the MT system, which generates output in the target language. MT systems may be text-to-text or text-to-speech. Text-to-text systems convert source text into target text. Text-to-speech systems convert source text into speech in the target language. The reverse conversion, speech-to-text, is also possible, depending on various factors.

A lot of work on Machine Translation systems is currently being undertaken in India. This work is an endeavor to address the following research questions:
Question-1: What are the various spoken and written languages in India?
Question-2: In which regions are the regional languages of India used?

TABLE I. MACHINE TRANSLATION SYSTEMS [1]

| Type of MT System | Based On |
|---|---|
| Direct | Dictionary lookup |
| Statistical | Corpus and statistical models |
| Interlingua | Developing a Universal Networking Language |
| Example | Corpus and previous translations |
| Transfer | Translation rules |
| Knowledge | Uses Artificial Intelligence |
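Table I describes direct MT as being based on dictionary lookup. A minimal sketch of that idea is given below. The lexicon, the romanized Hindi words and the function name `direct_translate` are assumptions made for illustration only; they are not taken from any system surveyed here.

```python
# Toy word-by-word (direct) translation via dictionary lookup.
# The lexicon below is a hypothetical romanized Hindi -> English sample.
LEXICON = {
    "main": "I",
    "ghar": "home",
    "jata": "go",
    "hoon": "am",
}


def direct_translate(sentence, lexicon):
    """Replace each source word with its dictionary entry.

    Words missing from the lexicon are passed through unchanged,
    and the source word order is kept as it is.
    """
    words = sentence.lower().split()
    return " ".join(lexicon.get(word, word) for word in words)


if __name__ == "__main__":
    print(direct_translate("Main ghar jata hoon", LEXICON))
```

Because the lookup keeps the source word order, the sample sentence comes out as "I home go am" rather than fluent English. This is the kind of limitation that the rule-based and corpus-based approaches in Table I try to address.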

ISBN: 978-960-474-361-2


3 Problem Solution

3.1 Indian Regional Languages
Natural languages can be categorized as:
- Written
- Spoken
- Both written and spoken

Languages such as Hindi, Punjabi, and Marathi are both spoken and written. Languages such as Dogri are spoken but not written. India hosts many regional languages. Depending on their history, they are spoken and/or written in many scripts, such as Devanagari and Gurmukhi.

Fig. 2. Division of Regional Languages of India

Table II gives the major languages of North India along with the states in which they are official languages.

TABLE II. LANGUAGES OF NORTH INDIA

| Language | Official Language of State | Written | Spoken |
|---|---|---|---|
| Hindi | Uttar Pradesh, Uttaranchal, Bihar, Rajasthan, Haryana, Delhi | Yes | Yes |
| Urdu | Jammu and Kashmir, Uttar Pradesh, Delhi | Yes | Yes |
| Dogri | Himachal Pradesh, Jammu and Kashmir | No | Yes |
| Haryanvi | Haryana | No | Yes |
| Rajasthani | Rajasthan | No | Yes |
| Bihari | Bihar | No | Yes |

Table III gives the major languages of central India along with the states in which they are official languages.

TABLE III. LANGUAGES OF CENTRAL INDIA

| Language | Official Language of State | Written | Spoken |
|---|---|---|---|
| Hindi | Madhya Pradesh, Jharkhand, Chhattisgarh, Delhi, Uttaranchal, Uttar Pradesh | Yes | Yes |

Table IV gives the major languages of the southern states along with the states in which they are official languages.

TABLE IV. LANGUAGES OF SOUTH INDIA

| Language | Official Language of State | Written | Spoken |
|---|---|---|---|
| Malayalam | Kerala | Yes | Yes |
| Tamil | Tamil Nadu, Andaman and Nicobar Islands, Puducherry | Yes | Yes |
| Telugu | Andhra Pradesh | Yes | Yes |
| Kannada | Karnataka | Yes | Yes |
| Tulu | - | Yes | Yes |

* These languages are part of the Dravidian group of languages. They are spoken and written in the south Indian states.

Table V gives the major languages of the western and south western states along with the states in which they are official languages.

TABLE V. LANGUAGES OF WESTERN AND SOUTH WESTERN STATES

| Language | Official Languages of States/Union Territories | Written | Spoken |
|---|---|---|---|
| Gujarati | Gujarat | Yes | Yes |
| Marathi | Maharashtra | Yes | Yes |
| Portuguese | Daman and Diu, Goa | Yes | Yes |

Table VI gives the major languages of the eastern states along with the states in which they are official languages.

TABLE VI. LANGUAGES OF EASTERN STATES

| Language | Official Language of States | Written | Spoken |
|---|---|---|---|
| Bengali | West Bengal, Tripura | Yes | Yes |
| Assamese | Assam, Nagaland, Arunachal Pradesh | Yes | Yes |
| Mizo | Mizoram | Yes | Yes |
| Borak (Kokborok) | Tripura | Yes | Yes |
| Hindi and English | Arunachal Pradesh | Yes | Yes |
| Oriya | Odisha | Yes | Yes |

I. On-line Machine Translation Tools

A. Web-Based Hindi to Punjabi MT system
This system uses the Direct Machine Translation technique. It can translate web pages and web documents from Hindi to Punjabi [2]. The system was developed at Punjabi University, Patiala, and is available at:
http://h2p.learningpunjabi.org

B. Bing Translator
A service offered by Microsoft, Bing Translator can translate between languages and also provides various ways of viewing the translated content [2]. This tool can be accessed at:
http://bing.com/translator

C. Babylon
Translation by Babylon is a free online version of the Babylon translation software [7]. This tool can be accessed online at:
http://translation.babylon.com/

D. PROMT Translation
This online tool performs translation by passing the text to be translated to the Google, Bing, and Babylon translation systems [8]. This tool can be accessed online at:
http://imtranslator.net/translator.asp

E. Google Translator
This is a service offered by Google Inc. Google Translator provides a side-by-side view while translating content. Google also provides a feature for translating web links into English [2]. This tool can be accessed online at:
http://translate.google.com

II. Off-line Translation Tools

A. Systran
This system was developed by a company of the same name. The system offers translation for 35 languages. It provides the technology behind Yahoo! Babel Fish and was used by Google Inc. until 2007 [6].

B. METAL
METAL is an MT system developed at the University of Texas. Using the concept of controlled language, this system achieves high-quality translations. METAL is now known as LANT-MARK, and its marketing has been taken over by a Belgian company [6].

C. English to Bangla Phrase-Based Machine Translation
The system uses log-linear translation models as its basis for translation. The source language is English and the target language is Bengali [10].

D. Anglabharati
This tool uses a rule-based technique for translation. It uses a context-free grammar to generate a pseudo-target that is applicable to a group of Indian languages [11].

E. Anuvadaksh
The tool is being developed by CDAC-Pune. The system aims to translate English text into regional languages of India. The target languages include Hindi, Marathi, Urdu, Tamil and Oriya.

F. UNL Based Enconverter-Deconverter
This technique is based on the Universal Networking Language (UNL). An enconverter converts English sentences to Punjabi sentences. A deconverter converts Punjabi sentences back to English [3].

G. Anusaaraka
This is an expert-system-based Machine Translation system. It aims to create a joint architectural model for Machine Translation from English to various regional languages [9].

H. Sampark
Sampark uses a transfer-based approach to translation. The system consists of three main processes: source analysis, transfer, and target generation [10].
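The three processes named for Sampark (source analysis, transfer, and target generation) can be sketched as a small pipeline. The code below is only an illustrative toy under strong assumptions: it handles three-word subject-verb-object English sentences, uses a hypothetical romanized Hindi lexicon, and does not reflect Sampark's actual implementation.

```python
# Toy transfer-based pipeline: analysis -> transfer -> generation.
# All rules, names and lexicon entries are illustrative assumptions.

# Hypothetical English -> romanized Hindi lexicon.
LEXICON = {"ram": "Ram", "eats": "khata hai", "mango": "aam"}


def analyze(sentence):
    """Source analysis: label the words of a 3-word S-V-O sentence."""
    subject, verb, obj = sentence.lower().split()
    return {"subject": subject, "verb": verb, "object": obj}


def transfer(parse, lexicon):
    """Transfer: map each source word to a target word, keeping roles."""
    return {role: lexicon.get(word, word) for role, word in parse.items()}


def generate(parse):
    """Target generation: emit the roles in subject-object-verb order."""
    return " ".join(parse[role] for role in ("subject", "object", "verb"))


def translate(sentence):
    """Run the three stages in sequence."""
    return generate(transfer(analyze(sentence), LEXICON))


if __name__ == "__main__":
    print(translate("Ram eats mango"))
```

For the sample input, the pipeline reorders the sentence into the subject-object-verb order typical of Hindi and prints "Ram aam khata hai". Keeping the stages separate means the analysis and generation steps could in principle be reused across language pairs, with only the transfer rules changing.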

4 English Language Machine Translation Systems
The following table lists some machine translation systems in which the source or target language is English.

| Name of System | Source/Target Language | MT System | Developed By (Organization/authors) |
|---|---|---|---|
| Portage [10] | English | Phrase-Based | National Research Council, Canada |
| PANGLOSS [11] | English-Spanish, English-Korean | Knowledge-Based | Kevin Knight, Steve K. Luk |
| Pan-Lite [12] | English | Example-Based, Transfer-Based, Knowledge-Based | Centre of Machine Translation, Carnegie Mellon University |
| ALT-J/E [13] | English-Japanese | Experimental-Based | NTT Communications Science Laboratory |
| Pan EBMT [14] | Spanish-English | Example-Based Machine Translation | Centre of Machine Translation, Carnegie Mellon University |
| English to American Sign Language [15] | English-American Sign Language | Visual & spatial | University of Pennsylvania |
| Candide [16] | French-English | Statistical (Probability based) | IBM Thomas J. Watson Research Center |
| MaTra [17] | English-Hindi | Automatic, Parsing | Ananthakrishnan, C-DAC Mumbai |
| Shiraz [18] | Persian-English | Ontology Based, Direct MT | Computing Research Laboratory, New Mexico State University |
| ETAP [19] | Russian-English | Direct MT | Computational Linguistics Laboratory, Russian Academy of Sciences |
| English to Japanese MT system [20] | English-Japanese | Example-Based | ATR Interpreting Telephony Research Laboratories, Kyoto, Japan |
| MaTrEx [21] | English-Italian | Example Based, Statistical Based | Dublin City University, Dublin |
| Transonics [22] | English-Persian | Speech to speech translator | University of Southern California; HRL Laboratories, LLC, CA |
| LOGON [23] | English-Norwegian | Semantic Based, Transfer Based | Various universities in Norway |
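Several of the systems listed here are phrase-based or statistical (for example Portage, Candide and MaTrEx). As a rough sketch, the log-linear formulation commonly used by phrase-based systems selects, for a source sentence $f$, the target sentence $\hat{e}$ with the highest weighted feature score. The symbols below ($h_m$ for feature functions, $\lambda_m$ for their weights, $M$ for the number of features) are notation assumed here for illustration and are not taken from the surveyed papers.

```latex
\hat{e} \;=\; \arg\max_{e} \; \sum_{m=1}^{M} \lambda_m \, h_m(e, f)
```

Typical feature functions include translation-model and language-model log-probabilities, and the weights are usually tuned on held-out parallel data.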


5 Acknowledgement
I would like to thank my M.E. thesis guide, Dr. Parteek Bhatia (CSED, Thapar University), for being a constant source of inspiration. This work is also inspired by my mother and father, who have helped in all efforts. It is difficult to put into words the efforts they have all undertaken.

6 Conclusion
Machine Translation has come a long way toward its objective of translating text and speech. However, much work remains to be done to achieve satisfactory output quality and software flexibility. There has been a spurt in the number of Machine Translation professionals as well as organizations focused on Machine Translation. The Indian government is also making constant efforts to bridge the language barrier.

III. REFERENCES

[1] Nakul Sharma, Parteek Bhatia, “English to Hindi Statistical Machine Translation System”, Master of Engineering thesis submitted to Thapar University, July 2011, accessed at http://dspace.thapar.edu:8080/dspace/handle/10266/1449.
[2] Nakul Sharma, Parteek Bhatia, “Statistical Machine Translation for Indian Languages”, IEEE’s International Conference in Computer Engineering and Technology (ICCET-2010), ISBN: 978-81-920748-1-8.
[3] Parteek Kumar, R.K. Sharma, “UNL Based Machine Translation System for Punjabi Language”, PhD thesis submitted to Thapar University, Feb 2012, accessible at http://dspace.thapar.edu:8080/dspace/handle/10266/1729.
[4] “Natural Language Processing activities at CDAC Kolkata”, Annual Report-2011, CDAC Kolkata.
[5] Sitendar, Seema Bawa, “Survey of Indian Machine Translation Systems”, In Proc. International Journal of Computer Science and Technology (IJCST), Vol-3 Issue-1, Jan-March 2012.
[6] Sanjay Kumar Dwivedi and Pramod Premdas Sukhadeve, “Machine Translation System in Indian Perspectives”, In Proc. of Journal of Computer Science, Vol 6 Issue 10, ISSN-1549-3636, 2010.
[7] “Free Online Translation”, [Online] http://translation.babylon.com accessed on 19 March 2013.
[8] “Online Translation Tools”, [Online] http://imtranslator.net/translator.asp accessed on 19 March 2013.
[9] Sriram Choudhury, Ankitha Rao, Dipti M Sharma, “Anusaaraka: An Expert System Based Machine Translation System”, IEEE.
[10] Gary Anthes, “Automated Translation of Indian Languages”, Communications of the ACM, Technology News, January 2010, Vol 53, No. 1.
[11] Sadat, F., Johnson, H., Agbago, A., Foster, G., Kuhn, R., Martin, J. and Tikuisis, A., “Portage: A Phrase-based Machine Translation System”, In Proc. Association for Computational Linguistics, Workshop on Building and Using Parallel Texts: Data-Driven Machine Translation, Ann Arbor, Michigan, USA, June 29-30 2005, pp. 133-136. NRC 48525.
[12] Ralf D. Brown, “Example-Based Machine Translation in the Pangloss System”, COLING 1996, pp. 169-174.
[13] R E Frederking, R D Brown, “The Pangloss-Lite Machine Translation System”, MT Horizons, Proceedings of the Second Conference of the Association for Machine Translation in the Americas (AMTA-96).
[14] Ralf D Brown, “Example-based machine translation in the pangloss system”, Proceedings of the 16th Conference on Computational Linguistics, Volume 1, Association for Computational Linguistics, pp. 169-174.
[15] Liwei Zhao, Karin Kipper, William Schuler, Christian Vogler, Martha Palmer, and Norman I. Badler, “A Machine Translation System from English to American Sign Language”, Lecture Notes in Computer Science, Volume 1934, Envisioning Machine Translation in the Information Future, 4th Conference of the Association for Machine Translation in the Americas, 2000, pages 54-67.
[16] Adam L. Berger, Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, John R. Gillett, John D. Lafferty, Robert L. Mercer, Harry Printz, Luboš Ureš, “The Candide System for Machine Translation”, IBM Thomas J. Watson Research Center, P.O. Box 704, Yorktown Heights, NY 10598.
[17] Ananthakrishnan R, Kavitha M, Jayprasad J Hegde, Chandra Shekhar, Ritesh Shah, Sawani Bade, Sasikumar M, “MaTra: A Practical Approach to Fully-Automatic Indicative English-Hindi Machine Translation”.
[18] Jan W. Amtrup, Hamid Mansouri Rad, Karine Megerdoomian and Rémi Zajac, “Persian-English Machine Translation: An Overview of Shiraz Project”, NMSU, CRL, Memoranda in Computer and Cognitive Science (MCCS-00-319).
[19] Makoto Nagao, “A Framework of a Mechanical Translation between Japanese and English by Analogy Principle”, Artificial and Human Intelligence, Elsevier Science Publishers B.V., NATO, 1984.
[20] Nicolas Stroppa, Andy Way, “MaTrEx: DCU Machine Translation System for IWSLT 2006”.
[21] Emil Ettelaie, Sudeep Gandhe, Panayiotis Georgiou, Kevin Knight, Daniel Marcu, Shrikanth Narayanan, David Traum, Robert Belvin, “Transonics: A Practical Speech-to-Speech Translator for English-Farsi Medical Dialogues”, Proceedings of the ACL Interactive Poster and Demonstration Sessions, pages 89–92, Ann Arbor, June 2005.