English-Telugu Rule Based Machine Translation System
A Thesis submitted for the degree of
Master of Science (by research) in the School of Engineering
By R.SRIBADRI NARAYANAN
Centre for Excellence in Computational Engineering Amrita School of Engineering Amrita Vishwa Vidyapeetham University Coimbatore – 641105
March, 2012
Amrita School of Engineering Amrita Vishwa Vidyapeetham, Coimbatore – 641105
BONAFIDE CERTIFICATE
This is to certify that the thesis entitled “ENGLISH-TELUGU RULE BASED MACHINE TRANSLATION SYSTEM” submitted by R.SRIBADRI NARAYANAN (Reg. No.: CB.EN.M*CEN09009) for the award of the degree of Master of Science (by research) in the School of Engineering, is a bonafide record of the research work carried out by him under my guidance. He has satisfied all the requirements put forth for the project and has completed all the formalities regarding the same to the fullest of my satisfaction.
Ettimadai, Coimbatore. Date:
DR. K P SOMAN RESEARCH GUIDE AND HEAD, CEN.
Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore – 641105
Centre for Excellence in Computational Engineering
DECLARATION
I, R.SRIBADRI NARAYANAN (REG. NO.: CB.EN.M*CEN09009), hereby declare that this thesis entitled “ENGLISH-TELUGU RULE BASED MACHINE TRANSLATION SYSTEM” is the record of original work done by me under the guidance of Dr. K P Soman, Head, Centre for Excellence in Computational Engineering, Amrita School of Engineering, Coimbatore. To the best of my knowledge, this work has not formed the basis for the award of any degree / diploma / associateship / fellowship or similar award to any candidate in any university.
Place: Ettimadai
Date:
Signature of the Student
Countersigned by
K P SOMAN PROFESSOR AND HEAD, CEN, AMRITA VISHWA VIDYAPEETHAM, COIMBATORE.
ACKNOWLEDGEMENTS
First and foremost, I would like to thank my guide, Dr. K.P. Soman, for his support, valuable suggestions and constant encouragement throughout the project. I would like to thank Dr. S. Rajendran, who spent an enormous amount of time guiding us and resolving our problems whenever necessary. I would like to thank Ms. Mallika V, Research Associate, Computational Engineering and Networking, for sharing her linguistic knowledge and for the many ideas she contributed to scaling up the system. I am very grateful to my friend Mr. Saravanan S., whose considerable experience helped to shape my work on the project. I extend my gratitude to Mr. Sankara Narayanan for his valuable suggestions and ideas. I would also like to thank Mr. Senthil for supporting my research work and for his valuable ideas. I extend my cordial thanks to all the teaching and non-teaching staff of the Department of Computational Engineering and Networking for the help rendered during various phases of the project work. I express my thanks to my parents and friends, who always stood by me with their valuable suggestions and help.
ABSTRACT
Translation from one language to another plays a vital role in sharing information across languages. For example, Indian languages have epics such as the Ramayana and the Mahabharata. These are life-transforming stories that should be made available in all other languages. Similarly, many advanced and recent technological topics should be translated into Indian languages. For this purpose, we have developed an English to Telugu machine translation system. The system takes an English sentence as input and produces a Telugu sentence as output.

Before the Telugu output is produced, the English sentence goes through several stages: parsing, reordering, lexicalization, transliteration and morphological processing.
- Parsing: the parser gives the grammatical tree structure of the English sentence. For this we use the Stanford parser, which gives better results than the other parsers it was compared with.
- Reordering: this stage rearranges the English sentence into Telugu word order. English sentences follow Subject-Verb-Object (SVO) order, whereas Telugu follows Subject-Object-Verb (SOV) order, so reordering rules are applied.
- Lexicalization: this is the process of replacing English words with Telugu words. Each English word is looked up in an English-Telugu bilingual dictionary and replaced with its Telugu equivalent.
- Transliteration: this is done using a Support Vector Machine (SVM) based approach developed at CEN, Amrita University. It is mainly used for named entities and for words that are not available in the bilingual dictionary.
- Morphology: morphological processing is applied to the grammatical words. It plays a vital role in Telugu because the language is highly inflectional and agglutinative. We have used SVMTool for the morphological analyzer and a data-driven approach for the morphological generator.

The final step integrates these tools on a single platform to produce the Telugu output.
CONTENTS
Abstract
Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 ISSUES IN MACHINE TRANSLATION
Chapter 2 Literature Survey
2.1 MACHINE TRANSLATION
2.2 THE NECESSITY OF MACHINE TRANSLATION
2.3 DIFFERENT CATEGORIES OF MACHINE TRANSLATION SYSTEMS
2.4 VARIOUS APPROACHES TO MACHINE TRANSLATION
2.4.1 LINGUISTICS OR RULE BASED APPROACH
2.4.2 NON-LINGUISTIC APPROACHES
2.4.3 HYBRID APPROACH
2.5 PARSER
2.6 MORPHOLOGICAL ANALYZER AND GENERATOR
Chapter 3 Overview of Telugu Language
3.1 DEMOGRAPHIC INFORMATION
3.2 GENERIC AFFILIATION AND HISTORY
3.3 THE TELUGU SCRIPT
3.3.1 ORIGIN AND DEVELOPMENT
3.3.2 TELUGU ALPHABET
3.4 COMPUTATIONAL GRAMMAR OF TELUGU
3.4.1 NOUNS
3.4.2 VERBS
Chapter 4 Overview of English-Telugu Machine Translation System
4.1 PARSER
4.2 REORDERING
4.3 DICTIONARY
4.4 TRANSLITERATION
4.5 MORPHOLOGICAL ANALYZER
4.5.1 INTRODUCTION
4.5.2 DATA CREATION FOR SUPERVISED LEARNING
4.5.3 IMPLEMENTATION OF MORPHOLOGICAL ANALYZER MODULE
4.6 MORPHOLOGICAL GENERATOR
4.6.1 INTRODUCTION
4.6.2 MORPHOLOGICAL GENERATOR FOR TELUGU
4.6.3 DIFFICULTIES IN MORPHOLOGICAL GENERATION FOR TELUGU
4.6.4 FORMATION OF INFLECTIONAL TABLE
4.6.5 METHODOLOGY
Chapter 5 Results
5.1 TESTING AND RESULTS
5.2 DISCUSSION
5.3 SCREEN SHOT OF MORPHOLOGICAL ANALYZER
5.4 TESTING AND RESULTS
5.5 DISCUSSION
5.6 SCREEN SHOT OF MORPHOLOGICAL GENERATOR
5.7 TESTING AND RESULTS
5.8 DISCUSSION
5.9 SCREEN SHOT OF ENGLISH-TELUGU MACHINE TRANSLATION SYSTEM
Chapter 6 Conclusion
References
Publication
LIST OF FIGURES
Fig. 2.1. Different approaches to machine translation
Fig. 2.2. DAWG
Fig. 4.1. General block diagram of the English-Telugu machine translation system
Fig. 4.2. Example to illustrate the morphological analyzer
Fig. 4.3. Formation of paradigm
Fig. 4.4. Steps involved in preprocessing data for the SVM model
Fig. 4.5. SVM model for the morphological analyzer
Fig. 4.6. Illustration of training modules 1 and 2 in SVM
Fig. 4.7. Overview of the morphological generator system
Fig. 4.8. Grammatical tree structure
Fig. 4.9. Reordering of “She is writing a letter”
Fig. 4.10. Lexicalization
Fig. 5.1. GUI for morphological analyzer-verb
Fig. 5.2. GUI for morphological analyzer-noun
Fig. 5.3. GUI for morphological generator-verb
Fig. 5.4. GUI for morphological generator-noun
Fig. 5.5. GUI for English-Telugu machine translation system
Fig. 5.6. GUI for English-Telugu machine translation system
LIST OF TABLES
Table 2.1. An example to illustrate the direct approach to machine translation system
Table 2.2. An example to illustrate the interlingua representation
Table 2.3. An example to illustrate the transfer approach
Table 4.1. Database information
Table 4.2. Grouping of words in 'ADU' paradigm
Table 4.3. Sample input for SVM model
Table 4.4. Verb paradigm
Table 4.5. Noun paradigm
Table 4.6. Morpho-lexical forms
Table 5.1. Testing results of morphological analyzer-noun
Table 5.2. Testing results of morphological analyzer-verb
Table 5.3. Testing results of morphological generator-noun
Table 5.4. Testing results of morphological generator-verb
Table 5.5. Testing results of translation system
ABBREVIATIONS
CV: Consonant-Vowel
DAWG: Directed Acyclic Word Graph
FAMT: Fully Automatic Machine Translation
FAHQMT: Fully Automatic High Quality Machine Translation
FST: Finite State Transducer
HAMT: Human Aided Machine Translation
MAHT: Machine Aided Human Translation
MT: Machine Translation
NLP: Natural Language Processing
PCFG: Probabilistic Context Free Grammar
POS: Parts of Speech
SOV: Subject-Object-Verb
SVO: Subject-Verb-Object
SVM: Support Vector Machine
XML: Extensible Markup Language
CHAPTER 1 INTRODUCTION
Machine translation is the task of automatically translating text in a source language into a target language. It is an area of applied research that draws ideas and techniques from linguistics, computer science, artificial intelligence, translation theory and statistics. Machine translation was envisioned as a computer application in the 1950s and has been researched for about 60 years, yet it is still considered an open problem. In a linguistically diverse country like India, machine translation is an important and highly appropriate technology for localization.

Human translation in India dates back to ancient times. This is evident from the many works of philosophy, arts, mythology, religion and science that have been translated among ancient and modern Indian languages. Numerous classic works, ancient, medieval and modern, have also been translated between European and Indian languages since the 18th century. Today, human translation in India finds application mainly in administration, media and education. To a lesser extent, it is also used in business, the arts, and science and technology.

India has 18 constitutional languages, written in 10 different scripts. Hindi is the official language of the Union, while English is the language most widely used in the media, commerce, science and technology, and education. Many states have their own regional language, which is either Hindi or one of the other constitutional languages. Only about 5% of the population speaks English. In such a situation, there is a big market for translation between English and the various Indian languages. Currently this translation is done manually, and the use of automation is largely restricted to word processing.
Two specific examples of high-volume manual translation are:
- the translation of news from English into local languages;
- the translation of annual reports of government departments and public sector units among English, Hindi and the local language.

Many English resources, such as news, weather reports and books, are being manually translated into Indian languages. Of these, news and weather reports from around the world are the most frequently translated by human translators. Human translation is slow and costly compared to machine translation. The main reason for choosing machine translation over human translation is that it is faster and cheaper.
1.1 ISSUES IN MACHINE TRANSLATION
Natural language processing has many challenges, the biggest of which is the inherent ambiguity of natural language. Machine translation systems have to deal with ambiguity and various other natural language phenomena. In addition, the linguistic divergence between the source and target languages makes machine translation an even bigger challenge. This is particularly true for widely divergent language pairs such as English and the Indian languages.

The major structural differences between English and Indian languages can be summarized as follows:
- English is a highly positional language with rudimentary morphology and a default sentence structure of SVO.
- Indian languages are highly inflectional, with rich morphology, relatively free word order and a default sentence structure of SOV [3].

There are also many stylistic differences. For example, it is common to see very long sentences in English that use abstract concepts as subjects and string several clauses together. Such constructions are not natural in Indian languages, which makes it difficult to produce good translations. Telugu is also morphologically richer than Hindi and is an agglutinative language.

It is widely recognized that, with the current state of the art in machine translation, it is not possible to have fully automatic, high-quality, general-purpose machine translation. Practical systems need to handle ambiguity and the other complexities of natural language processing by relaxing one or more of these three dimensions.
CHAPTER 2 LITERATURE SURVEY
2.1 MACHINE TRANSLATION
Machine translation is one of the oldest and most active areas of natural language processing, and it fulfils one of humankind's oldest dreams. It became a reality in the twentieth century in the form of computer programs capable of translating a wide variety of texts from one natural language into another. Yet there are still no 'translating machines' that can take text in any language and produce a perfect translation in any other language without human intervention or assistance. So far, programs have been developed that produce 'raw' translations of texts in relatively well-defined subject domains. These raw translations can either be revised manually to give good-quality translations at an economically feasible rate, or be read unrevised by subject experts for information purposes.
2.2 THE NECESSITY OF MACHINE TRANSLATION
Machine translation is an important technology for localization and is particularly relevant in a linguistically diverse country like India, which has 18 constitutional languages written in 10 different scripts. Translation among these languages is therefore very important, and it is not possible to translate all the required resources among them manually. Only a small percentage of people in India speak English, and although Hindi is the official language of the Union, not everyone in the country knows Hindi. This creates the need for machine translation.

Resources such as textbooks, literature and other valuable material may also be available only in a specific language. For example, consider a literary work that is written in a language known to only a few people but is needed by people who do not know that language. The language then acts as a communication barrier, and those people cannot use the resource. In this situation, we seek the help of human translators. However, it is tedious to find a translator who knows both the language of the work and the reader's language. Human translation is also time-consuming and expensive. If the resource is large, it is practically impossible to translate all of it manually in a short span of time. A practical solution to this problem is a machine that can perform the translation automatically.
2.3 DIFFERENT CATEGORIES OF MACHINE TRANSLATION SYSTEMS
The three categories of machine translation systems are as follows [1].

MACHINE AIDED HUMAN TRANSLATION (MAHT) systems range from automatic look-up programs to systems that are practically fully automatic but require the translator to approve each sentence. Some of the more successful software of this type includes the Translator's Workbench of Trados and INK Tools.

HUMAN AIDED MACHINE TRANSLATION (HAMT) also covers a broad range of systems, with three kinds of human intervention:
- Pre-editing: a person skilled in using the machine translation system edits the source language (SL) text to make it easier for the computer to understand.
- Interactive intervention: the computer asks the translator questions about the meaning of the SL text. Some of the most cumbersome MT systems have used this approach, requiring the translator to sit at the computer terminal and answer questions such as: "The word 'pen' means: (a) a writing pen, (b) a play pen, (c) a pig pen?"
- Post-editing: a person checks the translation and fixes mistakes made by the computer.

The pre-editing and glossary compilation required for HAMT typically call for a trained linguist who can parse the syntax of the sentence. A translator who understands the foreign language and can express it in his or her own language is not enough. The most primitive systems are those that require pre-editing, since the computer cannot handle the text unless a human first converts the natural language (NL) into a semi-artificial language that is easier for the computer to understand. The ideal is automatic translation so good that all that is necessary is to check the translation and change a few details. Interactive intervention lies somewhere in between.

FULLY AUTOMATED MACHINE TRANSLATION (FAMT) systems may suit people who have to search through large amounts of information and only need a very general idea of a document's contents. A good example is the low-quality translation needs of the military and intelligence agencies. However, truly fully automatic, high-quality translation of natural language hardly exists. Fully Automatic High Quality Machine Translation (FAHQMT) systems either require extensive glossaries or are restricted to specific subdomains or sublanguages, or both.
2.4 VARIOUS APPROACHES TO MACHINE TRANSLATION
Since the idea of using machines for language translation was first proposed, many different approaches to machine translation have been proposed, implemented and put into use. The main approaches are shown in Figure 2.1.
FIGURE 2.1 DIFFERENT APPROACHES TO MACHINE TRANSLATION
2.4.1 LINGUISTICS OR RULE BASED APPROACH
Rule based approaches require linguistic knowledge during translation. They use grammar rules and computer programs to analyse the source language text and determine the grammatical information and features of each word. They then translate the text by replacing each word with the target language word that has the same meaning in that context. The rule based approach was one of the principal methodologies developed in machine translation. Linguistic knowledge is required to write the rules for this type of approach, and these rules play a vital role at the different levels of translation.
2.4.1.1 DIRECT APPROACH
The direct translation approach can be considered the first approach to machine translation. It involves the following steps:
1. Analyse the morphological information of the source language words.
2. Identify the constituents.
3. Reorder the words according to the word order pattern of the target language.
4. Replace the source language words with target language words, using a bilingual dictionary for that language pair.
5. As a last step, inflect the words appropriately to produce the translation.
TABLE 2.1 AN EXAMPLE TO ILLUSTRATE THE DIRECT APPROACH TO MACHINE TRANSLATION
Input sentence in English: He came late to school yesterday
Morphological analysis: He come PAST late to school yesterday
After constituent identification:
Dictionary lookup: అతను తునన పాఠశాల చివరలో వచిి
Inflect (the final translated sentence): అతను తునన పాఠశాల చివరిలో వచిింది
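The steps of Table 2.1 can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the system described in this thesis. The following are all toy assumptions:
- the irregular-verb table;
- the romanized Telugu dictionary entries;
- the rule that only tensed verbs move to the end;
- the decision to drop "to".

Constituent identification is skipped.

```python
from typing import List, Optional, Tuple

# Toy resources (hypothetical, for illustration only; romanized Telugu).
IRREGULAR_PAST = {"came": "come", "went": "go"}
DICTIONARY = {"he": "atanu", "late": "aalasyamgaa",
              "school": "paathashaala", "yesterday": "ninna"}
VERB_FORMS = {("come", "PAST"): "vachchaadu"}
DROPPED = {"to"}  # toy simplification: the preposition is simply omitted

Analysed = Tuple[str, Optional[str]]  # (lemma, tense or None)


def morph_analyse(tokens: List[str]) -> List[Analysed]:
    """Step 1: reduce each word to a lemma, tagging known past-tense verbs."""
    result = []
    for token in tokens:
        word = token.lower()
        if word in IRREGULAR_PAST:
            result.append((IRREGULAR_PAST[word], "PAST"))
        else:
            result.append((word, None))
    return result


def reorder_sov(analysed: List[Analysed]) -> List[Analysed]:
    """Step 2: move tensed verbs to the end (SVO -> SOV); other words keep their order."""
    verbs = [item for item in analysed if item[1] is not None]
    others = [item for item in analysed if item[1] is None]
    return others + verbs


def lookup_and_inflect(analysed: List[Analysed]) -> List[str]:
    """Steps 3-4: word-by-word dictionary lookup, then inflect verbs from a form table.
    Unknown words are kept in angle brackets as candidates for transliteration."""
    output = []
    for lemma, tense in analysed:
        if tense is not None:
            output.append(VERB_FORMS.get((lemma, tense), f"<{lemma}+{tense}>"))
        elif lemma not in DROPPED:
            output.append(DICTIONARY.get(lemma, f"<{lemma}>"))
    return output


def direct_translate(sentence: str) -> str:
    return " ".join(lookup_and_inflect(reorder_sov(morph_analyse(sentence.split()))))


print(direct_translate("He came late to school yesterday"))
```

This prints "atanu aalasyamgaa paathashaala ninna vachchaadu". The verb is simply moved to the end, so the adverbs keep their English order. This is the kind of weakness that makes word-level direct translation fragile.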
2.4.1.2 INTERLINGUA APPROACH
The interlingua approach to machine translation transforms source language text into a common representation applicable to many languages. The text in the target language is then generated from this representation. The interlingua approach therefore sees machine translation as a two-stage process:
1. Analyse the source language text and transform it into a common, language-independent representation.
2. Generate the text in the target language from this language-independent representation.
TABLE 2.2 AN EXAMPLE TO ILLUSTRATE THE INTERLINGUA REPRESENTATION
Predicate: Reach
Agent: Boy (Number: Singular)
Theme: Hospital (Number: Singular)
Instrument: Ambulance (Number: Singular)
Tense: FUTURE
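The representation in Table 2.2 can be written down as a small data structure. The sketch below is hypothetical:
- the class names are our own;
- the analysis stage is skipped, and the frame is built by hand;
- only a toy English generator is shown.

The point is that one such generator would be written for each target language, and all of them would read the same language-independent frame.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Concept:
    lemma: str
    number: str = "Singular"  # stored as in Table 2.2; unused by the toy generator


@dataclass
class Interlingua:
    """Language-independent frame: a predicate, a tense and thematic roles."""
    predicate: str
    tense: str
    roles: Dict[str, Concept] = field(default_factory=dict)


def generate_english(frame: Interlingua) -> str:
    """Toy generator for one target language; only FUTURE tense is handled."""
    if frame.tense != "FUTURE":
        raise NotImplementedError("this sketch only handles FUTURE tense")
    words = [f"The {frame.roles['Agent'].lemma.lower()}",
             f"will {frame.predicate.lower()}",
             f"the {frame.roles['Theme'].lemma.lower()}"]
    if "Instrument" in frame.roles:
        words.append(f"by {frame.roles['Instrument'].lemma.lower()}")
    return " ".join(words)


frame = Interlingua(
    predicate="Reach",
    tense="FUTURE",
    roles={"Agent": Concept("Boy"), "Theme": Concept("Hospital"),
           "Instrument": Concept("Ambulance")},
)
print(generate_english(frame))
```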
2.4.1.3 TRANSFER APPROACH
The transfer model involves three stages: analysis, transfer and generation.
1. Analysis: the source language sentence is parsed, and the sentence structure and its constituents are identified.
2. Transfer: transformations are applied to the source language parse tree to convert its structure into that of the target language.
3. Generation: the words are translated, and tense, number, gender, etc. are expressed.
TABLE 2.3 AN EXAMPLE TO ILLUSTRATE THE TRANSFER APPROACH
Input sentence: He will come to school in bus
Analysis: <to school> <in bus>
Transfer:
Generation (output): అతను బస్ుు లో పాఠశాల వస్ాాయి
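The transfer stage of Table 2.3 can be sketched as a recursive transformation of a parse tree. The tree below is hand-written in the Penn Treebank style produced by parsers such as the Stanford parser. The single rule used here is an illustrative assumption, not the reordering rule set of this thesis: reverse the children of every VP and PP node, so that verbs become clause-final and prepositions become postpositions.

```python
from typing import List, Tuple, Union

Tree = Union[str, Tuple[str, list]]  # a leaf word, or (label, children)

# Toy transfer rule (an assumption): VP and PP nodes become head-final.
HEAD_FINAL = {"VP", "PP"}


def transfer(tree: Tree) -> Tree:
    """Rebuild the tree bottom-up, reversing the children of head-final nodes."""
    if isinstance(tree, str):
        return tree
    label, children = tree
    new_children = [transfer(child) for child in children]
    if label in HEAD_FINAL:
        new_children.reverse()
    return (label, new_children)


def leaves(tree: Tree) -> List[str]:
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree[1] for word in leaves(child)]


# Hand-written parse of "He will come to school in bus".
parse = ("S", [
    ("NP", [("PRP", ["He"])]),
    ("VP", [("MD", ["will"]),
            ("VP", [("VB", ["come"]),
                    ("PP", [("TO", ["to"]), ("NP", [("NN", ["school"])])]),
                    ("PP", [("IN", ["in"]), ("NP", [("NN", ["bus"])])])])]),
])

print(" ".join(leaves(transfer(parse))))
```

The output, "He bus in school to come will", has the same constituent order as the Telugu output in Table 2.3. A generation stage would then translate each word and realize "will" as a tense suffix on the verb.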
RELATED WORKS
The rule based approach is widely used in developing machine translation systems for Indian languages.
2.4.2 NON-LINGUISTIC APPROACHES
Non-linguistic approaches do not explicitly require any linguistic knowledge to translate texts from the source language into the target language. The only resource they require is data: dictionaries for the dictionary based approach, or bilingual and monolingual corpora for the empirical or corpus based approaches.
2.4.2.1 DICTIONARY BASED APPROACH
The dictionary based approach to machine translation uses a dictionary for the language pair to translate texts in the source language into the target language. Translation is done at the word level. It can optionally be preceded by pre-processing stages that analyse the morphological information and lemmatize each word before it is retrieved from the dictionary. This kind of approach can be used to translate the phrases in a sentence, but it is of little use for translating full sentences. It is, however, very useful for accelerating human translation. It provides meaningful word translations and limits the human's work to correcting the syntax and grammar of the sentence.
2.4.2.2 EMPIRICAL OR CORPUS BASED APPROACHES
Corpus based approaches do not require any explicit linguistic knowledge to translate a sentence. Instead, a bilingual corpus of the language pair and a monolingual corpus of the target language are required to train the system. These approaches have attracted a great deal of interest worldwide from the late 1980s until now.
2.4.2.2.1 EXAMPLE BASED APPROACH
This approach to machine translation is based on how human beings interpret and solve problems. Humans normally split a problem into sub-problems and solve each one using their experience of how similar problems were solved in the past. They then integrate these solutions to solve the whole problem. This approach needs a huge bilingual corpus of the language pair between which translation is to be performed.
2.4.2.2.2 STATISTICAL BASED APPROACH
The statistical approach to machine translation generates translations using statistical methods whose parameters are derived by analysing bilingual corpora. Designing a statistical system for a particular language pair is a rapid process, but most of the work lies in creating the bilingual corpus for that pair, since the approach depends entirely on such data. To obtain good translations, a domain-specific system needs a bilingual corpus of at least two million words, and a general-purpose system needs more.
RELATED WORKS
Google has recently released an alpha version of an English to Telugu machine translation system, developed using the statistical approach [15]. The system is available online. Its output is good for simple and frequently used sentences. Since Google has a huge bilingual corpus, the output is acceptable.
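To make "deriving the parameters by analysing bilingual corpora" concrete, the sketch below estimates word-translation probabilities with IBM Model 1 and expectation maximization (EM). This is a classic textbook model chosen for illustration, not the model behind Google's system. The three-sentence romanized English-Telugu corpus is a toy assumption, and the NULL word of the full model is omitted.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def ibm_model1(pairs: List[Tuple[str, str]], iterations: int = 10) -> Dict[Tuple[str, str], float]:
    """Return t[(f, e)] = P(target word f | source word e), estimated by EM."""
    target_vocab = {f for _, tgt in pairs for f in tgt.split()}
    uniform = 1.0 / len(target_vocab)
    t: Dict[Tuple[str, str], float] = defaultdict(lambda: uniform)
    for _ in range(iterations):
        count: Dict[Tuple[str, str], float] = defaultdict(float)
        total: Dict[str, float] = defaultdict(float)
        # E-step: share each target word among the source words it may align to.
        for src, tgt in pairs:
            e_words, f_words = src.split(), tgt.split()
            for f in f_words:
                norm = sum(t[(f, e)] for e in e_words)
                for e in e_words:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: renormalize the expected counts into probabilities.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


def best_translation(t: Dict[Tuple[str, str], float], e: str) -> str:
    candidates = {f: p for (f, src), p in t.items() if src == e}
    return max(candidates, key=candidates.get)


corpus = [("big house", "pedda illu"), ("house", "illu"), ("big book", "pedda pustakam")]
t = ibm_model1(corpus)
for word in ("big", "house", "book"):
    print(word, "->", best_translation(t, word))
```

No word alignments are given to the model. "house" co-occurs with "illu" in two sentence pairs, and this is enough for EM to pull the probability mass towards the consistent pairings.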
2.4.3 HYBRID APPROACH
The hybrid machine translation approach combines the advantages of the statistical and rule based translation methodologies. Commercial translation systems such as Asia Online and Systran provide systems implemented using this approach. Hybrid approaches differ in a number of aspects, for example:
1. A rule based system with post-processing by a statistical approach.
2. A statistical translation system with pre-processing by a rule based approach.
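The first kind of hybrid can be illustrated as follows. A rule based system proposes several candidate word orders, and a statistical bigram language model of the target language picks the most probable one. The following are illustrative assumptions, not a description of any commercial system:
- the candidates;
- the three-sentence romanized Telugu training corpus;
- add-one smoothing.

```python
import math
from collections import Counter
from typing import List, Tuple

BigramModel = Tuple[Counter, Counter, int]


def train_bigram_model(corpus: List[str]) -> BigramModel:
    context_counts, bigram_counts = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        context_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))
    vocab_size = len({w for s in corpus for w in s.split()}) + 1  # +1 for </s>
    return context_counts, bigram_counts, vocab_size


def log_prob(sentence: str, model: BigramModel) -> float:
    """Add-one smoothed bigram log-probability of a sentence."""
    context_counts, bigram_counts, vocab_size = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigram_counts[(a, b)] + 1) / (context_counts[a] + vocab_size))
        for a, b in zip(tokens[:-1], tokens[1:])
    )


def rerank(candidates: List[str], model: BigramModel) -> str:
    """Statistical post-processing: keep the candidate the language model prefers."""
    return max(candidates, key=lambda c: log_prob(c, model))


target_corpus = ["atanu ninna vachchaadu", "aame ninna vachchindi", "atanu ippudu vastaadu"]
model = train_bigram_model(target_corpus)
rule_based_candidates = ["ninna atanu vachchaadu", "atanu ninna vachchaadu", "atanu vachchaadu ninna"]
print(rerank(rule_based_candidates, model))
```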
2.5 PARSER
Parsing is the process of analysing a text, made up of a sequence of tokens, to determine its grammatical structure with respect to a given formal grammar. There are two approaches to developing parsers: the top-down approach and the bottom-up approach. Examples of parsers include XML parsers, the Stanford parser, and LL and LR parsers. Several of these are available as open-source software.
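The top-down approach can be illustrated with a small recursive-descent parser that backtracks over alternative rules. The toy grammar and lexicon below are assumptions for illustration. The Stanford parser used in this work is a statistical parser and far more capable. A naive recursive-descent parser also loops forever on left-recursive rules such as NP → NP PP, which is one reason practical parsers use other strategies.

```python
from typing import Dict, Iterator, List, Optional, Tuple

GRAMMAR: Dict[str, List[List[str]]] = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["Pron"]],
    "VP": [["V", "NP"], ["V"]],
}
LEXICON: Dict[str, str] = {
    "she": "Pron", "he": "Pron", "a": "Det", "the": "Det",
    "letter": "N", "book": "N", "wrote": "V", "read": "V",
}


def parse(symbol: str, tokens: List[str], pos: int) -> Iterator[Tuple[tuple, int]]:
    """Yield (tree, next_position) for every way `symbol` can start at `pos`."""
    if symbol not in GRAMMAR:  # pre-terminal: match one word against the lexicon
        if pos < len(tokens) and LEXICON.get(tokens[pos]) == symbol:
            yield (symbol, tokens[pos]), pos + 1
        return
    for rhs in GRAMMAR[symbol]:  # top-down: try each expansion in turn
        yield from expand(symbol, rhs, 0, tokens, pos, [])


def expand(symbol: str, rhs: List[str], i: int, tokens: List[str], pos: int,
           children: list) -> Iterator[Tuple[tuple, int]]:
    if i == len(rhs):
        yield (symbol, children), pos
        return
    for child, next_pos in parse(rhs[i], tokens, pos):  # backtrack over alternatives
        yield from expand(symbol, rhs, i + 1, tokens, next_pos, children + [child])


def parse_sentence(sentence: str) -> Optional[tuple]:
    tokens = sentence.lower().split()
    for tree, end in parse("S", tokens, 0):
        if end == len(tokens):  # accept only parses that consume every word
            return tree
    return None


print(parse_sentence("She wrote a letter"))
```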
2.6 MORPHOLOGICAL ANALYZER AND GENERATOR
Various NLP research groups have developed different methods and algorithms for morphological analysis. Some of these algorithms are language dependent and some are language independent. The methods used for morphological analysis include the following:
1. Finite State Transducer (FST)
2. Stemmer algorithm
3. Corpus based approach
4. DAWG (Directed Acyclic Word Graph)
5. Paradigm based approach
2.6.1 FINITE STATE TRANSDUCERS
FST based morphological analyzers and generators have been widely implemented for many languages [4]. FSTs are also used in speech recognition and speech processing when building language models. A morphological analyzer and generator can be built in a bidirectional manner using an FST [2].
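The bidirectional use of an FST can be shown with a toy transducer for English noun number. It is a hypothetical example, not one of the analyzers developed at AMRITA-CEN. Each arc reads an input symbol and writes an output symbol, with the empty string standing for epsilon. Generation is obtained simply by swapping the input and output labels of every arc.

```python
from typing import List, Set, Tuple

Arc = Tuple[str, str, str, str]  # (source state, input symbol, output symbol, target state)
EPSILON = ""


def build_noun_fst(lexicon: List[str]) -> Tuple[List[Arc], Set[str]]:
    """Analyzer direction: surface letters in, 'stem+N+SG' or 'stem+N+PL' out."""
    arcs: Set[Arc] = set()
    for word in lexicon:
        for i, ch in enumerate(word):  # copy the stem letter by letter
            arcs.add((word[:i], ch, ch, word[: i + 1]))
        arcs.add((word, EPSILON, "+N", word + "#N"))
        arcs.add((word + "#N", EPSILON, "+SG", "FINAL"))
        arcs.add((word + "#N", "s", "+PL", "FINAL"))
    return sorted(arcs), {"FINAL"}


def invert(arcs: List[Arc]) -> List[Arc]:
    """Swap input and output labels: the analyzer becomes a generator."""
    return [(src, out, inp, dst) for src, inp, out, dst in arcs]


def transduce(arcs: List[Arc], finals: Set[str], symbols: List[str], start: str = "") -> List[str]:
    """Depth-first search over all paths that consume every input symbol."""
    results = []
    stack = [(start, 0, "")]
    while stack:
        state, pos, output = stack.pop()
        if pos == len(symbols) and state in finals:
            results.append(output)
        for src, inp, out, dst in arcs:
            if src != state:
                continue
            if inp == EPSILON:
                stack.append((dst, pos, output + out))
            elif pos < len(symbols) and symbols[pos] == inp:
                stack.append((dst, pos + 1, output + out))
    return results


arcs, finals = build_noun_fst(["cat", "dog"])
print(transduce(arcs, finals, list("cats")))                         # analysis
print(transduce(invert(arcs), finals, ["d", "o", "g", "+N", "+SG"]))  # generation
```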
RELATED WORKS The FST-based approach is one of the popular methods for developing morphological generators and analyzers. Using FSTs, morphological analyzers and generators have been developed for Tamil, Malayalam and Kannada at AMRITA-CEN.
2.6.2 STEMMER A stemmer [6] strips affixes from words. It uses a set of rules containing a list of stems and replacement rules, e.g. writing → write + ing. For a stemmer program we have to specify all possible affixes together with their replacement rules, e.g. ational → ate (relational → relate) and tional → tion (conditional → condition). The most widely used stemmer algorithm is the Porter algorithm, which is available free of cost at http://www.tartarus.org/martin/PotterStemmer/.
RELATED WORKS There have also been attempts to develop stemmers for Indian languages. IIT Bombay and NCST Bombay have developed a stemmer for Hindi [Manish, Anantha].
2.6.3 CORPUS BASED APPROACH A corpus is a large collection of written text belonging to a particular language. A raw corpus can be used for morphological analysis: the method takes the raw corpus as input and produces a segmentation of the word forms observed in the text, and such a segmentation resembles a morphological segmentation.
RELATED WORKS Morfessor 1.0, developed at Helsinki University, is a corpus-based, language-independent morphological segmentation program. LTRC Hyderabad has successfully developed a corpus-based morphological analyzer that combines the paradigm-based approach with the corpus-based approach.
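The suffix-replacement idea behind the stemmer of Section 2.6.2 can be sketched in a few lines. The rule list below is hypothetical: it contains the two replacement rules quoted in that section plus a bare -ing rule, applies the longest matching suffix once, and keeps a minimum stem length. It is not the Porter algorithm, which additionally checks conditions on the shape of the remaining stem.

```python
# Hypothetical rule set: the two replacement rules quoted in the text plus -ing removal.
RULES = [("ational", "ate"), ("tional", "tion"), ("ing", "")]


def stem(word: str, rules=RULES, min_stem: int = 2) -> str:
    """Apply the longest matching suffix rule once, keeping at least `min_stem` characters."""
    for suffix, replacement in sorted(rules, key=lambda r: len(r[0]), reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: len(word) - len(suffix)] + replacement
    return word


for w in ["relational", "conditional", "walking", "sing"]:
    print(w, "->", stem(w))
```

Note that this toy rule turns writing into writ; restoring the final e (writing → write) needs the kind of stem condition that Porter's algorithm adds.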
2.6.4 DAWG (DIRECTED ACYCLIC WORD GRAPH) A DAWG is a very efficient data structure for lexicon representation and fast string matching, with a great variety of applications. This method has been successfully implemented for Greek by the University of Patras, Greece. The DAWG data structure can be used for both morphological analysis and generation. The approach is language independent: it does not use any morphological rules or other special linguistic information, so it can also be tested for Indian languages. Figure 2.2 shows an example of a DAWG, in which A, B, C, U, L, T and S label the transitions from one node to another.
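One common way to obtain a DAWG is to build a trie and then merge every pair of nodes that accept exactly the same set of suffixes. The sketch below does this bottom-up with a signature table. The four-word lexicon is hypothetical (it is not the word set of Figure 2.2) and only shows how shared suffixes collapse into shared nodes while lookup works exactly as in a trie.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    final: bool = False
    edges: dict[str, Node] = field(default_factory=dict)


def build_trie(words: list[str]) -> Node:
    root = Node()
    for word in words:
        node = root
        for ch in word:
            node = node.edges.setdefault(ch, Node())
        node.final = True
    return root


def minimize(node: Node, registry: dict) -> Node:
    """Bottom-up: replace each node by the first registered node with the same finality and edges."""
    for ch, child in list(node.edges.items()):
        node.edges[ch] = minimize(child, registry)
    signature = (node.final, tuple(sorted((ch, id(child)) for ch, child in node.edges.items())))
    return registry.setdefault(signature, node)


def count_nodes(root: Node) -> int:
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        if id(node) not in seen:
            seen.add(id(node))
            stack.extend(node.edges.values())
    return len(seen)


def contains(root: Node, word: str) -> bool:
    node = root
    for ch in word:
        node = node.edges.get(ch)
        if node is None:
            return False
    return node.final


trie = build_trie(["TAB", "TABS", "CAB", "CABS"])  # hypothetical lexicon
print(count_nodes(trie))                            # 9 trie nodes
dawg = minimize(trie, {})
print(count_nodes(dawg))                            # 5 nodes after merging
print(contains(dawg, "CABS"), contains(dawg, "CA"))  # True False
```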
FIGURE 2.2 AN EXAMPLE FOR DAWG
2.6.5 PARADIGM APPROACH A paradigm defines all the word forms of a given stem and also provides a feature structure for every word form. The paradigm-based approach is efficient for inflectionally rich languages, and this scheme or a variant of it has been widely used in NLP. A linguist or language expert is asked to provide tables of word forms covering the words of a language. Each word-form table covers a set of roots, meaning that these roots follow the pattern (or paradigm) implicit in the table when generating their word forms. Almost all morphological analyzers for Indian languages are developed using this method. Based on the paradigms, the program generates add-delete strings for analysis. The paradigm approach relies on the finding that words can be grouped into paradigm types according to their morphological behavior.
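The add-delete idea can be sketched as follows: each paradigm stores, for every word form, a string to delete from the root and a string to add. Generation applies the pair; analysis undoes it and keeps only roots found in the lexicon. The two entries reproduce Table 4.6 (column P-1, rows ML-2 and ML-4) for the paradigm ADu, but the table layout, the function names and the extra member root pADu (sing) are hypothetical, chosen to show that all roots of a paradigm share the same add-delete pairs.

```python
# Hypothetical add-delete table: paradigm -> feature -> (string to delete, string to add).
PARADIGMS = {"ADu": {"ML-2": ("u", "utunnAnu"), "ML-4": ("u", "Anu")}}
# Hypothetical lexicon: root -> paradigm name (pADu 'sing' is assumed to follow ADu).
LEXICON = {"ADu": "ADu", "pADu": "ADu"}


def generate(root: str, feature: str) -> str:
    delete, add = PARADIGMS[LEXICON[root]][feature]
    if not root.endswith(delete):
        raise ValueError(f"{root!r} does not fit its paradigm")
    return root[: len(root) - len(delete)] + add


def analyze(word: str) -> list[tuple[str, str]]:
    """Undo each add-delete pair and keep candidates whose root is in the lexicon."""
    results = []
    for paradigm, table in PARADIGMS.items():
        for feature, (delete, add) in table.items():
            if word.endswith(add):
                root = word[: len(word) - len(add)] + delete
                if LEXICON.get(root) == paradigm:
                    results.append((root, feature))
    return results


print(generate("pADu", "ML-4"))   # pADAnu
print(analyze("ADutunnAnu"))      # [('ADu', 'ML-2')]
```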
RELATED WORKS The ANUSAARAKA research group has developed a language-independent, paradigm-based morphological compiler program for Indian languages. Words are categorized as nouns, verbs, adjectives, adverbs and postpositions, and each category is further classified into paradigm types based on morphophonemic behavior. For example, the noun Uru (village) belongs to a different paradigm class from Abbayi (boy), as the two behave differently morphophonemically. In the present work, machine learning with SVMTool is used to implement the morphological analyzer, and the paradigm approach is used to implement the morphological generator.
CHAPTER 3 OVERVIEW OF TELUGU LANGUAGE Historically, Telugu has also been known by the names amdhram, tenu(m)gu and gentoo [8].
3.1 DEMOGRAPHIC INFORMATION Telugu is one of the major Scheduled Languages of India and the second most widely spoken language in India [10]. Its speakers are concentrated mainly in South India. It is the official language of Andhra Pradesh and the second most widely spoken language in Tamil Nadu and Karnataka. Considerable Telugu-speaking minorities live in Maharashtra, Orissa, Madhya Pradesh and West Bengal, and considerable numbers of Telugu speakers have migrated to Mauritius, South Africa and, more recently, the USA, the UK and Australia.
3.2 GENETIC AFFILIATION AND HISTORY Telugu [10] belongs to the South-Central branch of the Dravidian family of languages. It is the most widely spoken Dravidian language and the only literary language outside the South Dravidian branch. Its literature goes back to the 11th century A.D., and its ancient forms are attested in inscriptions dating back to 200 A.D.
3.3 THE TELUGU SCRIPT 3.3.1 ORIGIN AND DEVELOPMENT Telugu is written in the Telugu script, which is derived from the Ashokan Brahmi [8] used in South India circa the 2nd century A.D. Southern Brahmi, also known as Dravidian Brahmi, gave rise to the Vengi-Calukyan script, also known as the Telugu-Kannada script. By the end of the 13th century A.D., the Telugu and Kannada scripts had separated. In the early Telugu-Kannada script, no orthographic distinction was made between the short mid vowels (e, o) and the long mid vowels (E, O).
3.3.2 TELUGU ALPHABET The primary units of the Telugu [10] alphabet are syllables; it is therefore rightly called a syllabary, and most appropriately a mixed alphabetic-syllabic script. Unlike the Roman alphabet used for English, the Telugu alphabet shows a more or less exact correspondence between symbols (graphemes) and sounds (phonemes), although some differences remain between the alphabet and the phonemic inventory of Telugu. The overall pattern consists of 60 symbols: 16 vowels, 3 vowel modifiers and 41 consonants.
NOTABLE FEATURES Type of writing system: a syllabic alphabet in which every consonant has an inherent vowel. Diacritics, which can appear above, below, before or after the consonant they belong to, are used to change the inherent vowel. At the beginning of a syllable, vowels are written as independent letters. When certain consonants occur together, special conjunct symbols are used that combine the essential parts of each letter. Direction of writing: left to right in horizontal lines.
VOWELS
CONSONANTS
CONJUNCT CONSONANTS
VOWEL MODIFIERS
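The inherent-vowel, diacritic and conjunct behavior described above is visible directly in Unicode, where a syllable is stored as a sequence of code points. The short sketch below uses only Python's standard `unicodedata` module to list the code points behind కి (ki), క్ష (kSa) and the three vowel modifiers; the example strings are chosen for illustration.

```python
import unicodedata


def describe(text: str) -> list[tuple[str, str]]:
    """Return (code point, Unicode character name) for each character of a Telugu string."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


ki = "\u0c15\u0c3f"               # KA whose inherent vowel is replaced by the vowel sign I
ksha = "\u0c15\u0c4d\u0c37"       # KA + VIRAMA (cancels the inherent vowel) + SSA -> conjunct
modifiers = "\u0c01\u0c02\u0c03"  # candrabindu, anusvara and visarga

for label, text in [("ki", ki), ("ksha", ksha), ("vowel modifiers", modifiers)]:
    print(label, describe(text))
```

How the conjunct is drawn is left to the font; the stored sequence is always consonant, virama, consonant.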
3.4 COMPUTATIONAL GRAMMAR OF TELUGU In Telugu [9], morphology plays a crucial role not only in generating numerous word forms from nouns and verbs but also in determining their shapes. As heads of noun phrases, nouns carry distinct morphological inflections indicating the various syntactic and semantic functions expressed in a proposition. Unlike in English, word order does not determine the syntactic relations between a noun and its governing verb.
3.4.1 NOUNS A noun [9] in Telugu is inflected in a complex way. Telugu nouns characteristically carry markings of gender, number, person and case, and many nouns change their form before these markings. Systematic changes occur in the base particularly when it is inflected for non-nominative cases such as the accusative, dative, instrumental, ablative and locative. Conventionally, the non-nominative base of a noun is also known as its oblique base or oblique form. It should be noted, however, that such a base is neither unique nor common. GENDER MARKING ON NOUNS Although the inflection classes are insensitive to gender, gender distinctions are discernible from the morphology of agreement on verbs, adjectives, possessives, predicate nominals, numerals and deictic categories. It is necessary to identify four gender distinctions, viz. nouns indicating:
in the singular: (i) human males and (ii) other than human males;
in the plural: (iii) humans and (iv) non-humans.
This distinction is necessitated by the distribution of nouns indicating human females, which are grouped with neuter nouns in the singular but with human males in the plural.
However, many nouns denoting human males end in -du, and many denoting human females end in -di. NUMBER MARKING IN TELUGU NOUNS Telugu nouns occur in two numbers, singular and plural, but only plural nouns are explicitly marked. For most nouns the plural suffix is -lu, while for some nouns of the human male category the plural suffix alternant is -ru. GENDER-NUMBER-PERSON MARKING ON NOUNS When Telugu nouns function as nominal predicates, they agree with the gender, number and person of the surface subject of the clause. Pronominalized possessive nouns (possessors) agree in gender, number and person with the nouns of possession and function as heads of possessive phrases. In these two cases nouns are marked by pronominal suffixes of the relevant gender-number-person. Person marking on nouns, however, is explicit only in the 1st and 2nd persons, both singular and plural; in the 3rd person only number is marked explicitly, not person.
CASES: CASE MARKERS AND POSTPOSITIONS Nouns are inflected by case markers and postpositions to indicate their semantic-syntactic function in clausal predication. The terms case markers and postpositions roughly correspond to the Type-1 and Type-2 postpositions of Krishnamurti and Gwynn, who use the term postposition for the counterpart of the English preposition. They distinguish the two types by criteria such as freedom of distribution (bound versus free) and composition (Type-1 postpositions are attached to Type-2 postpositions and not vice versa). Telugu uses a wide variety of case markers, postpositions and their combinations to indicate relations between nouns and verbs or between nouns. Case suffixes and postpositions fall into two types, viz. "grammatical" and "semantic" (locational and directional). Grammatical case suffixes express grammatical case relations such as nominative, accusative, dative, instrumental, genitive, comitative, vocative and causal. The semantic cases include nouns inflected for location in time and space. Nouns combined with various adverbial nouns and case markers or postpositions express many more such relations.
3.4.2 VERBS A verb [16] denotes the state of, or an action by, a substance. A Telugu verb may be finite or non-finite. All finite verbs and some non-finite verbs can occur, depending on the situation, before the utterance-final juncture /#/, which is characterized by one of the following terminal contours: rising pitch, meaning question; level pitch, falling pitch, meaning command. A finite verb does not occur before any of the non-final junctures. At the morphological level, no non-finite verb contains a morpheme indicating person; this should not, however, be taken to mean that all finite verbs necessarily contain such a morpheme. Since any verb, finite or non-finite, occurs only after some marked juncture, by the definition of these junctures all verbs have phonetic stress or prominence on their first syllable, which is invariably part of the root. Almost every Telugu verb has a finite and a non-finite form. A finite form can stand as the main verb of a sentence and occur before a final pause (full stop); a non-finite form cannot stand as a main verb and rarely occurs before a final pause. FINITE VERBS The eight finite forms of the modern Telugu verb may be arranged in three structural types, set up according to differences in the grouping of three substitution classes:
Stem or inflection root
Tense-mode suffix
Personal suffix(es)
The paradigms of the finite forms of a simple verbal base are given below under the three structural types, using ammu (to sell), which has two allomorphs, ammu- and amm-, the latter occurring before a vowel.
Type 1: stem + personal suffix
1. Imperative: singular -u: amm-u (sell); plural -andi: amm-andi (sell)
Type 2: stem + tense-mode suffix
2. Admonitive or abusive: owing to semantic restrictions, many verbs cannot occur in this mood; only a few bases such as kAlu (to burn), kUlu (to fall), cAvu (to die) and pagulu (to break) occur in it, e.g. nIyilli kUlu 'may your house fall'.
3. Obligative (in all persons) -Ali: amm-Ali (I, we, you (sg., pl.), he, she, it) must sell
Type 3: stem + tense-mode suffix + personal suffix
4. Habitual-future or non-past -t-: ammu-t-Anu I shall sell; ammu-t-Am we shall sell; ammu-t-Ava you (sg.) shall sell; ammu-t-Aru you (pl.) shall sell; ammu-t-Adu he shall sell; ammu-tun-di she/it will sell; ammu-t-Ay they shall sell
5. Past tense -i-: ammu-i-Anu* I sold; ammu-i-Am we sold; ammu-i-Ava you (sg.) sold; ammu-i-Aru you (pl.) sold; ammu-i-Adu he sold; ammu-in-di she/it sold; ammu-i-Aru they sold
6. Hortative -d-: ammu-d-Am let us sell, or we shall sell
7. Negative tense -a-: ammu-a-nu I (do, did, shall) not sell; ammu-a-m we (do, did, shall) not sell; ammu-a-va you (do, did, shall) not sell; ammu-a-Du he (does, did, shall) not sell; ammu-a-du she/it (does, did, shall) not sell; ammu-a-ru they (do, did, shall) not sell
8. Negative imperative or prohibitive -Ak-: ammu-Ak-a you (sg.) don't sell; ammu-Ak-andi you (pl.) don't sell
NON-FINITE VERBS There are ten non-finite verbs, which may be arranged into two structural types:
Unbound
Bound
Type 1:
1. Present participle -tu: ammu-tU selling
2. Past participle -i: ammu-i having sold
3. Concessive -inA: ammu-inA even though sold
4. Conditional -itE: ammu-itE if sold
5. Infinitive -a: ammu-a to sell
6. Negative participle -aka: amm-aka not selling
7. Habitual adjective -E: amm-E that sells
8. Past adjective -ina: amm-ina that sold
9. Negative adjective -ani: ammu-ani not selling
Type 2: Bound present -t-: ammu-t- occurs with any finite form of the verb un- 'to be' and also with a few non-finite forms. Examples: ammu-t-unnAnu I am selling; ammu-t-un-nA even selling (now); ammu-t-un-tE if selling; ammu-t-un-na that selling
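The three structural types of finite verbs differ only in which of the three substitution slots are filled. A minimal sketch of that slot structure follows, using morphs from the ammu paradigm above. The class and its names are hypothetical, and the output is the morpheme-segmented form, not the surface spelling, because the sandhi rules that join the morphs are not modelled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FiniteVerb:
    """One finite verb form: stem + optional tense-mode suffix + optional personal suffix."""
    stem: str
    tense_mode: str = ""
    personal: str = ""

    @property
    def structural_type(self) -> int:
        # Type 1: stem + personal; Type 2: stem + tense-mode; Type 3: all three slots
        if self.tense_mode and self.personal:
            return 3
        return 2 if self.tense_mode else 1

    def segmented(self) -> str:
        return "-".join(p for p in (self.stem, self.tense_mode, self.personal) if p)


examples = [
    FiniteVerb("amm", personal="u"),                      # imperative (sg.): sell
    FiniteVerb("amm", tense_mode="Ali"),                  # obligative: must sell
    FiniteVerb("ammu", tense_mode="i", personal="Anu"),   # past: I sold
]
for form in examples:
    print(form.structural_type, form.segmented())
```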
CHAPTER 4 OVERVIEW OF ENGLISH-TELUGU MACHINE TRANSLATION SYSTEM The English-Telugu machine translation system is built by integrating five modules, namely the parser, reordering, lexicalization, transliteration and morphological generator.
ENGLISH SENTENCE (INPUT TEXT) → STANFORD PARSER → REORDERING → LEXICALIZATION → TRANSLITERATION → MORPHOLOGICAL GENERATION → TELUGU SENTENCE (OUTPUT TEXT)
FIGURE 4.1 GENERAL BLOCK DIAGRAM FOR ENGLISH-TELUGU MACHINE TRANSLATION
4.1 PARSER The Stanford parser is used to generate the grammatical tree structure and the parts-of-speech (POS) categories for the given English sentence. The Stanford parser is a lexicalized PCFG parser; because it gave better results than the other existing parsers it was compared with, it is integrated into the present system.
4.2 REORDERING Reordering plays a vital role in overcoming the structural differences between English and Telugu. English sentences follow Subject-Verb-Object (SVO) order, whereas Telugu follows SOV order. To overcome this difference, reordering rules are applied at the source-language level. The set of reordering rules for Telugu has been adopted from the reordering rules developed for Tamil.
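Tree-level reordering can be illustrated with a minimal sketch. The two rules below (verbs follow their complements inside a VP; prepositions follow their objects inside a PP, becoming postpositions) are simplified, hypothetical stand-ins for the adopted Tamil rule set, which is not listed here. Trees are nested tuples with Penn Treebank labels, the label set the Stanford parser produces for English.

```python
# A parse tree is a nested tuple (label, child, child, ...); a leaf is (POS tag, word).
Tree = tuple


def is_leaf(node: Tree) -> bool:
    return len(node) == 2 and isinstance(node[1], str)


def is_verb(node: Tree) -> bool:
    return is_leaf(node) and (node[0].startswith("VB") or node[0] == "MD")


def is_preposition(node: Tree) -> bool:
    return is_leaf(node) and node[0] in ("IN", "TO")


def reorder(node: Tree) -> Tree:
    """Recursively apply two illustrative SVO -> SOV rules to a parse tree."""
    if is_leaf(node):
        return node
    label, *children = node
    children = [reorder(child) for child in children]
    if label == "VP":    # rule 1: verbs and auxiliaries follow their complements
        children = [c for c in children if not is_verb(c)] + [c for c in children if is_verb(c)]
    elif label == "PP":  # rule 2: prepositions become postpositions
        children = [c for c in children if not is_preposition(c)] + [c for c in children if is_preposition(c)]
    return (label, *children)


def words(node: Tree) -> list[str]:
    if is_leaf(node):
        return [node[1]]
    return [w for child in node[1:] for w in words(child)]


tree = ("S", ("NP", ("PRP", "She")),
             ("VP", ("VBZ", "is"),
                    ("VP", ("VBG", "writing"), ("NP", ("DT", "a"), ("NN", "letter")))))
print(" ".join(words(reorder(tree))))  # She a letter writing is
```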
4.3 DICTIONARY A well-maintained, comprehensive bilingual dictionary built specifically for translation is an essential component of a translation system. A prototype of such a dictionary has been created for the present English-Telugu machine translation system, compiled from various resources such as the internet and books. At present the dictionary contains 26,000 words belonging to different grammatical categories.
TABLE 4.1 DATABASE INFORMATION
4.4 TRANSLITERATION An SVM-based English-to-Telugu transliterator is used for transliteration, mainly for words that are not available in the bilingual dictionary.
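Dictionary lookup with a transliteration fallback can be sketched as follows. The toy dictionary holds only entries from the worked example in this chapter, `mark_for_transliteration` is a hypothetical placeholder for the SVM-based transliterator (whose interface is not described here), and lowercasing the lookup key is an assumption.

```python
from typing import Callable


def lexicalize(words: list[str], dictionary: dict[str, str],
               transliterate: Callable[[str], str]) -> list[str]:
    """Translate each word via the bilingual dictionary; transliterate words it does not contain."""
    output = []
    for word in words:
        entry = dictionary.get(word.lower())
        output.append(entry if entry is not None else transliterate(word))
    return output


# Toy entries from the worked example; the real dictionary has about 26,000 words.
toy_dictionary = {"she": "Ame", "a": "oka", "letter": "aksharamu", "write": "vrAYU"}


def mark_for_transliteration(word: str) -> str:
    """Hypothetical placeholder for the SVM transliterator: just marks the word."""
    return f"<{word}>"


print(lexicalize(["She", "a", "letter", "write"], toy_dictionary, mark_for_transliteration))
print(lexicalize(["Ravi", "a", "letter", "write"], toy_dictionary, mark_for_transliteration))
```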
4.5 MORPHOLOGICAL ANALYZER 4.5.1 INTRODUCTION A morphological analyzer takes a word as input and produces the analysis of that word as output. In the present system, the morphological analyzer is a module whose input is a Telugu word and whose output is the analysis of that word.
FIGURE 4.2 EXAMPLE TO ILLUSTRATE MORPHOLOGICAL ANALYZER Before explaining the module, let us first look at the inflections to be considered. Telugu verbs inflect for tense, person, gender and number, while nouns inflect for plural, oblique, case and postpositions. The structure of the verbal complex is unique, and capturing this complexity in a machine-analyzable and machine-generatable format is a challenging task. Inflections of Telugu verbs include finite, non-finite, adjectival, adverbial and conditional markers. Verbs are classified into a number of paradigms based on their inflections: for computational purposes we have 37 verb paradigms, each with 160 inflections, and 67 paradigms for Telugu nouns, each with 117 sets of inflected forms. Based on the nature of their inflections, root words are classified into groups. A corpus with full morphological information has been prepared, so the machine learns the morphological rules from the data by itself. Morphological analysis of nouns is less complex than that of verbs. The detailed list of paradigms and the possible inflections of verbs and nouns are given in the Appendix.
A Support Vector Machine (SVM) is used for the classification task, through the SVMTool toolkit, which has three components [13]: 1. SVMTlearn 2. SVMTagger 3. SVMTeval. SVMTlearn trains the system on the manually created corpus, SVMTagger tags a sequence of words using the previously learned SVM model, and SVMTeval evaluates the final output.
4.5.2 DATA CREATION FOR SUPERVISED LEARNING 1. The first step is data creation (corpus development) for the morphological analyzer, which involves classifying verbs and nouns into paradigm types. Each root word inflects for different grammatical features, but the nature of these inflections is the same for every word of a given paradigm type. Verbs inflect for grammatical features such as tense, person, number, gender, non-finite forms, infinitive, conditional, negation, emphasis and interrogation. Nouns inflect for plural number and case, and postpositions may follow the case either immediately or after a space. Figure 4.3 illustrates the formation of paradigms.
FIGURE 4.3 FORMATION OF PARADIGM 2. The second step is to collect the list of words that fall under each verb and noun paradigm. Table 4.2 illustrates some of the words and their inflections under the paradigm ADu.
TABLE 4.2 GROUPING OF WORDS IN 'ADU' PARADIGM
PARADIGM 1: ADU
LIST OF WORDS: 1. ATADU 2. IdADu 3. KoniyADu 4. koTTADu ...
INFLECTIONS: 1. tunnAnu 2. tunnAmu 3. Anu 4. Amu 5. tAnu ...
3. The third step is pre-processing the corpus for the morphological analyzer [12]. The steps involved in pre-processing are shown in Figure 4.4.
FIGURE 4.4 STEPS INVOLVED IN PRE-PROCESSING DATA FOR SVM MODEL The pre-processing steps are romanization, segmentation, alignment and mapping, and the handling of mismatches.
ROMANIZATION: The most commonly used noun and verb forms are generated manually as the input data, and the corresponding output data are prepared in the same way. These data are converted to romanized form using a Unicode-to-Roman mapping file.
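A minimal sketch of Unicode-to-Roman conversion follows, assuming a partial mapping in the capital-letter style used throughout this thesis (A for the long vowel, D for the retroflex stop, and so on). The mapping table is illustrative and incomplete, not the project's actual mapping file. The one rule it implements beyond table lookup is that a consonant keeps its inherent vowel a unless a vowel sign or the virama follows it.

```python
import unicodedata


def _telugu(name: str) -> str:
    return unicodedata.lookup("TELUGU " + name)


# Partial, illustrative Unicode-to-Roman map (capitals mark long vowels and retroflexes).
VOWELS = {_telugu("LETTER " + n): r for n, r in [
    ("A", "a"), ("AA", "A"), ("I", "i"), ("II", "I"), ("U", "u"), ("UU", "U"),
    ("E", "e"), ("EE", "E"), ("O", "o"), ("OO", "O")]}
VOWEL_SIGNS = {_telugu("VOWEL SIGN " + n): r for n, r in [
    ("AA", "A"), ("I", "i"), ("II", "I"), ("U", "u"), ("UU", "U"),
    ("E", "e"), ("EE", "E"), ("O", "o"), ("OO", "O")]}
CONSONANTS = {_telugu("LETTER " + n): r for n, r in [
    ("KA", "k"), ("GA", "g"), ("CA", "c"), ("JA", "j"), ("TTA", "T"), ("DDA", "D"),
    ("NNA", "N"), ("TA", "t"), ("DA", "d"), ("NA", "n"), ("PA", "p"), ("BA", "b"),
    ("MA", "m"), ("YA", "y"), ("RA", "r"), ("LA", "l"), ("LLA", "L"), ("VA", "v"),
    ("SA", "s"), ("HA", "h")]}
VIRAMA = _telugu("SIGN VIRAMA")


def romanize(text: str) -> str:
    out = []
    for i, ch in enumerate(text):
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            if nxt not in VOWEL_SIGNS and nxt != VIRAMA:
                out.append("a")  # inherent vowel, not overridden by a vowel sign or virama
        elif ch in VOWEL_SIGNS:
            out.append(VOWEL_SIGNS[ch])
        elif ch in VOWELS:
            out.append(VOWELS[ch])
        elif ch != VIRAMA:
            out.append(ch)  # characters outside the partial map pass through unchanged
    return "".join(out)


print(romanize("ఆడు"), romanize("ఆడాడు"), romanize("ఆమె"))  # ADu ADADu Ame
```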
SEGMENTATION: After romanization, every word in the corpus is segmented into Telugu graphemes, and each syllabic grapheme is further segmented into consonants and vowels. Consonants are marked with "C" and vowels with "V"; this is called the C-V (consonant-vowel) representation. Morpheme boundaries (the end of each morpheme) are marked with the "*" symbol in the output data.
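The C-V labels in the training sequences of this section (for example A|D-C|a-V|n-C|u-V| for ADanu) appear to follow a simple rule, sketched below as an assumption: a consonant directly followed by a vowel is tagged -C, a vowel directly preceded by a consonant is tagged -V, and every other symbol (a syllable-initial vowel, or a consonant inside a cluster or at the end of a word) is left untagged. The sketch also assumes one Roman letter per sound, so digraphs and diphthongs are not handled.

```python
ROMAN_VOWELS = set("aAiIuUeEoO")


def cv_segment(word: str) -> str:
    """Label each symbol of a romanized word with -C / -V in the style of the training data."""
    segments = []
    for i, ch in enumerate(word):
        prev_ch = word[i - 1] if i > 0 else ""
        next_ch = word[i + 1] if i + 1 < len(word) else ""
        if ch in ROMAN_VOWELS:
            # a vowel after a consonant is a vowel sign; a syllable-initial vowel stays untagged
            tag = "-V" if prev_ch and prev_ch not in ROMAN_VOWELS else ""
        else:
            # a consonant before a vowel opens a syllable; one inside a cluster stays untagged
            tag = "-C" if next_ch in ROMAN_VOWELS else ""
        segments.append(ch + tag)
    return "|".join(segments) + "|"


print(cv_segment("lEstunnAnu"))  # l-C|E-V|s|t-C|u-V|n|n-C|A-V|n-C|u-V|
print(cv_segment("ADanu"))       # A|D-C|a-V|n-C|u-V|
```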
ALIGNMENT AND MAPPING: The segmented syllables are aligned vertically as shown in Table 4.3, and each input segment is then mapped to its labeled output segment. The first column holds the input data in C-V representation and the second column holds the output labels, in which "*" marks morpheme boundaries.
TABLE 4.3 SAMPLE INPUT FOR SVM MODEL
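Once both sequences have the same number of units, alignment and mapping reduce to pairing them position by position. The sketch below writes one "segment label" line per unit, a column layout of the kind sequence taggers such as SVMTool are trained on (the exact SVMTool file format should be checked against its documentation), and refuses to pair sequences of different lengths, which is the mismatching problem. The function name is hypothetical.

```python
def to_training_rows(input_seq: str, output_seq: str) -> list[str]:
    """Pair each C-V input segment with its output label as one 'segment label' line."""
    inputs = [s for s in input_seq.split("|") if s]
    outputs = [s for s in output_seq.split("|") if s]
    if len(inputs) != len(outputs):
        # Unequal lengths are the mismatching problem; they must be corrected before training.
        raise ValueError(f"{len(inputs)} input segments but {len(outputs)} output labels")
    return [f"{i} {o}" for i, o in zip(inputs, outputs)]


print("\n".join(to_training_rows("A|D-C|a-V|n-C|u-V|", "A|Du*|a*|n|u*|")))
```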
MISMATCHING: Mismatching is the key problem in mapping the input data to the output data. It occurs in two cases: the input units are either more or fewer than the output units. The problem is solved by inserting the null symbol "$" into the output data or by combining two output units, based on morpho-syntactic rules, so that the input segments can be mapped one-to-one onto the output segments. After mapping, the machine-learning tool is used to train on the data. The same problem sometimes occurs with nouns as well.
Case 1:
Input sequence: l-C|E-V|s|t-C|u-V|n|n-C|A-V|n-C|u-V| (10 segments)
Output sequence (mismatched): l|E*|t|u|n|n|A*|n|u*| (9 segments)
Corrected output sequence: l|E*|$|t|u|n|n|A*|n|u*| (10 segments)
Case 1 shows an input sequence with more segments than the output sequence. The Telugu verb lEstunnAnu has 10 segments in the input sequence but only 9 in the output, because the "s" of the input becomes null under a morpho-syntactic rule, leaving no output segment to map it to. For this reason, in the training data "s" is mapped to "$" ($ indicates null). As the corrected output sequence shows, the number of input units then equals the number of output units.
Case 2:
(A) Input sequence: A|D-C|a-V|n-C|u-V| (5 segments)
Output sequence (mismatched): A|D|u*|a*|n|u*| (6 segments)
Corrected output sequence: A|Du*|a*|n|u*| (5 segments)
(B) Input sequence: A|v-C|A-V|m-C|e-V| (5 segments)
Output sequence (mismatched): A|v|u*|A|m|e*| (6 segments)
Corrected output sequence: A|vu*|A|m|e*| (5 segments)
Case 2 shows an input sequence with fewer units than the output sequence; (A) and (B) are examples for a verb and a noun respectively. The Telugu verb ADanu has 5 units in the input sequence but 6 in the output. Because of a morpho-syntactic change, the input unit "D-C" corresponds to two output segments, "D" and "u*", so in training "D-C" is mapped to "Du*", as shown in the corrected output sequence. The input and output sequences then have the same number of units, and the mismatch is resolved. The noun in (B) is handled in the same way. In some words case 1 and case 2 occur together; such mismatches are solved by applying the rules of both cases, as in the Telugu noun Urikeduru:
Input sequence: U|r-C|i-V|K-C|e-V|d-C|u-V|r-C|u-V| (9 segments)
Output sequence (mismatched): U|r|u*|i*|K|e|d|u|r|u*| (10 segments)
Corrected output sequence: U|ru*|i*|$|e|d|u|r|u*| (9 segments)
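The two corrections can be written as list operations on the output labels. In this illustrative sketch the position of each correction is supplied by hand, because the text decides it from morpho-syntactic rules rather than by an algorithm; the examples reproduce the corrected sequences of case 1 and of case 2 (A) and (B). The helper names are hypothetical.

```python
def split_labels(sequence: str) -> list[str]:
    """Split a '|'-separated sequence such as 'A|D|u*|' into its units."""
    return [unit.strip() for unit in sequence.split("|") if unit.strip()]


def insert_null(labels: list[str], index: int) -> list[str]:
    """Case 1: insert '$' so an input segment with no surface counterpart still gets a label."""
    return labels[:index] + ["$"] + labels[index:]


def merge(labels: list[str], index: int) -> list[str]:
    """Case 2: fuse output units `index` and `index + 1` so one input segment maps to both."""
    return labels[:index] + [labels[index] + labels[index + 1]] + labels[index + 2:]


# Positions are chosen by hand from the morpho-syntactic rule that applies.
case1 = insert_null(split_labels("l|E*|t|u|n|n|A*|n|u*|"), 2)
case2a = merge(split_labels("A|D|u*|a*|n|u*|"), 1)
case2b = merge(split_labels("A|v|u*|A|m|e*|"), 1)
print("|".join(case1) + "|")   # l|E*|$|t|u|n|n|A*|n|u*|
print("|".join(case2a) + "|")  # A|Du*|a*|n|u*|
```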
4.5.3 IMPLEMENTATION OF MORPHOLOGICAL ANALYZER MODULE The morphological analyzer for Telugu is developed using a machine-learning approach, with separate engines for nouns and verbs. Morphological analysis is recast as a classification task. The morphological analyzer involves three phases: 1. Pre-processing 2. Segmentation of morphemes 3. Identification of morphemes
FIGURE 4.5 SVM MODEL FOR MORPHOLOGICAL ANALYZER Figure 4.5 gives an overview of the morphological analyzer system. In this machine-learning approach, two training modules, module-I and module-II, are created for the morphological analyzer. Module-I is trained on sequences of input characters and their corresponding output labels, and is used to identify morpheme boundaries. For example, the noun form abbAYilu (boys) consists of two morphemes, 'abbaYi' (boy) and 'lu' (plural), and the boundary between them is learned in module-I. The system is trained in the same way on a large corpus.
In module-II, sequences of morphemes and their grammatical categories are used for training, so that a grammatical class is assigned to each morpheme. For example, abbaYilu is assigned two grammatical categories: abbaYi as the root and lu as the plural suffix. This morpheme information is trained in module-II. Figure 4.6 depicts training with SVM module-I and module-II.
PRE-PROCESSING:
In pre-processing, the given word is first romanized. The romanized word is then segmented into syllables according to Telugu grapheme segmentation, and these syllables are further split into the C-V representation.
SEGMENTATION OF MORPHEMES:
Pre-processed words are segmented into morphemes according to the morpheme boundaries. The input sequence is given to training module-I, which predicts an output label for each input segment.
IDENTIFYING MORPHEMES:
The segmented morphemes are given to training module-II, which predicts a grammatical category for each segmented morpheme.
FIGURE 4.6 AN ILLUSTRATION FOR TRAINING MODULE 1 AND 2 IN SVM Once the system has been trained on the word 'abbAYilu', the SVM modules can give the correct morphological interpretation when the system comes across a similar word such as 'AvUlu'.
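The hand-off between the two modules can be sketched as follows, assuming module-I's predicted labels are already available in the "*"-marked format used in this section: the labels are decoded into morphemes (dropping the null symbol "$"), and each morpheme is then given a category. Here a small dictionary, hypothetical and far simpler than the SVM of module-II, stands in for that second classifier.

```python
def decode_boundaries(labels: str) -> list[str]:
    """Turn module-I output labels into morphemes: '$' is dropped and '*' closes a morpheme."""
    morphemes, current = [], ""
    for label in filter(None, labels.split("|")):
        if label == "$":
            continue
        current += label.rstrip("*")
        if label.endswith("*"):
            morphemes.append(current)
            current = ""
    if current:  # trailing material without a boundary mark
        morphemes.append(current)
    return morphemes


# Hypothetical stand-in for module-II: a lookup from morpheme to grammatical category.
CATEGORY = {"abbAYi": "root", "lu": "plural"}


def analyze_with_lookup(labels: str) -> list[tuple[str, str]]:
    return [(m, CATEGORY.get(m, "unknown")) for m in decode_boundaries(labels)]


print(analyze_with_lookup("a|b|b|A|Y|i*|l|u*|"))  # [('abbAYi', 'root'), ('lu', 'plural')]
```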
4.6 MORPHOLOGICAL GENERATOR 4.6.1 INTRODUCTION The morphological generator is developed using a data-driven approach with three modules. The first module takes the lemma and its POS category as input and outputs the lemma's paradigm number and the word's stem. The second module takes the morpho-lexical information as input and outputs its index number. The third module uses a suffix table to generate the word from the information produced by the first two modules.
4.6.2 MORPHOLOGICAL GENERATOR FOR TELUGU Various methods are available for morphological generation; the most familiar is the rule-based morphological generator, which requires linguistic knowledge in the form of morpho-phonemic rules and a morpheme dictionary. The present approach needs neither rules nor dictionaries: it requires only a suffix table and code for paradigm classification. The morphological generator takes three inputs: 1. lemma, 2. word_class and 3. morpho-lexical information. The lemma is the word whose form is to be generated, the word class specifies its grammatical category, and the morpho-lexical information specifies the inflection to be generated. The input is given in the form lemma + word_class + morpho-lexical information, where the morpho-lexical information is extracted from the morphological analyzer tool for Telugu. An example is given below.
Input: ADu (ఆడు) + verb + past + 3SM
Output: ADADu (ఆడాడు)
3SM = third person singular masculine.
4.6.3 DIFFICULTIES IN MORPHOLOGICAL GENERATION FOR TELUGU Developing a morphological generator is a tedious job because every Telugu word has multiple inflections, including auxiliaries, clitics, and adjectival, adverbial, finite, non-finite and conditional forms of verbs, and the number of inflected forms varies from word to word. To handle this, Telugu verbs are classified according to their tense markers and inflections into thirty-six paradigms, which are listed in Table 4.4.
TABLE 4.4 VERB PARADIGM
ADu
aruvu
avvu
Cavu
Ceppu
Ceyi
Cudu
Cupimcu
eduvu
Ivvu
Kaavu
Kadulcu
Kalu
Kaluvu
Konu
Koyyi
Kudurcu
Kurco
Kuriyu
kuTTi
Lee
moopu
Padu
Pannu
pettu
Piluvu
Pogudu
Poo
Puyyi
Rayyi
Tannu
Tee
Tiyuu
Umdu
valayu
vellu
Nouns are classified into sixty-five paradigms, which are listed in Table 4.5.
TABLE 4.5 NOUN PARADIGM
Abbayi
Baludu
Bandi
Bendu
Bonu
Buddi
Cenu
Cillu
Dari
Enimidi
Gadi
Goru
Goyyi
Guddu
Gudi
Gumdu
Guudu
Illu
Iamtuvu
Kalcar
Kalu
Kannu
Kilu
Kota
Koti
koTTu
Kotu
Kundelu
Mamdi
Manishi
Medadu
Menu
Meku
Metuku
Nokaru
Nuru
Okati
Palu
Pamdiri
Pandem
Pani
Papam
Pelli
Pennu
Pette
Pillavadu
Pimdi
Puli
Pustakam
Putti
Puvu
Raatri
Rayyi
Rani
Remdu
Riksha
Saari
Samdadi
Snehitudu
Taragati
Tennu
Tiragali
Uru
Velu
veyyi
4.6.4 FORMATION OF INFLECTIONAL TABLE The initial work is the collection of Telugu words, gathered manually from books and from the internet using information retrieval. The collected words are classified into separate groups on the basis of similarity between the words; for example, the root word 'ADu' inflects as Adikadu, Adanu, AdtunnAnu, etc., and all these inflected words are tabulated. Paradigm and inflection tables are formed from the collected data, separately for nouns and verbs: there are 36 paradigms for verbs and 65 for nouns. The most frequently used morpho-lexical forms of verbs and nouns are selected, and they are created in an order that is followed for all the paradigms. The morpho-lexical information list is created from these morpho-lexical forms. In the inflection table, each row corresponds to an item of morpho-lexical information and each column to a paradigm number. The inflection table for verbs is given in Table 4.6.
TABLE 4.6 MORPHO-LEXICAL FORMS
ML / PARADIGM | P-1 | P-2 | P-3 | P-4 | P-5
ML-1 | u | vu | pu | nu | yi
ML-2 | utunnAnu | ustunnAnu | utunnAnu | uTunAnnu | ustunnAnu
ML-3 | utunnAmu | stunnAmu | tunnAmu | TunAmu | stunnAmu
ML-4 | Anu | sAnu | pAnu | nnanu | sAnu
ML-5 | Amu | sAmu | pAmu | nnamu | sAmu
4.6.5 METHODOLOGY
FIGURE 4.7 OVERVIEW OF MORPHOLOGICAL GENERATOR SYSTEM The block diagram of the Telugu morphological generator is shown in Figure 4.7. Unlike other morphological generators, the present system uses no morpho-phonemic rules; it relies on paradigm classification and suffix tables. The inputs to the morphological generator are the lemma or root word, the word class and the morpho-lexical information. The lemma with its POS tag information is romanized, and the paradigm number of the romanized root word is found; this paradigm number is the column index of the inflection table. The morpho-lexical information for the required word class is given by the user as input, and its index number in the morpho-lexical information list is identified; this is the row index. The row and column indices thus obtained are sent to the noun or verb suffix table, the table being selected according to the input word class. The root word is stemmed, and the entry selected from the inflection table is concatenated with the stem. The process is explained below with an example. STEP 1 Let the input to the system be ఆడు (ADu) + verb + Present Tense, where 1. ఆడు is the lemma, 2. verb is the word_class and 3. Present Tense is the morpho-lexical information. STEP 2 ఆడు is romanized, giving ADu. STEP 3 The romanized ADu is looked up in the verb paradigm table, which gives its paradigm number, 1. This is the column index for Table 4.6 (Morpho-Lexical forms).
STEP 4 The lemma 'ADu' is sent for stemming, and the output is 'AD'.
STEP 5 From the morpho-lexical information we find the morpho-lexical index number; in this case, for the present tense (first person singular), it is ML-2. This is the row index for Table 4.6 (Morpho-Lexical forms). STEP 6 Using the row and column indices, we find the morpho-lexical form 'utunnAnu'. STEP 7 Finally, the stem 'AD' and the morpho-lexical form 'utunnAnu' are concatenated to produce the output ADutunnAnu.
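The seven steps above amount to a two-index table lookup followed by concatenation. In the sketch below the suffix strings are copied from Table 4.6; the lemma-to-paradigm map contains only ADu (paradigm 1, as stated in STEP 3), and the stemming rule, which strips the paradigm's ML-1 ending, is an assumption that reproduces STEP 4 (ADu → AD). The morpho-lexical row label, such as ML-2 (whose P-1 entry is utunnAnu), is passed in directly rather than derived from a feature description.

```python
# Table 4.6 (morpho-lexical forms): rows ML-1..ML-5, columns P-1..P-5.
SUFFIX_TABLE = {
    "ML-1": ["u", "vu", "pu", "nu", "yi"],
    "ML-2": ["utunnAnu", "ustunnAnu", "utunnAnu", "uTunAnnu", "ustunnAnu"],
    "ML-3": ["utunnAmu", "stunnAmu", "tunnAmu", "TunAmu", "stunnAmu"],
    "ML-4": ["Anu", "sAnu", "pAnu", "nnanu", "sAnu"],
    "ML-5": ["Amu", "sAmu", "pAmu", "nnamu", "sAmu"],
}
# Lemma -> paradigm number (column); only ADu is given in the text.
VERB_PARADIGM = {"ADu": 1}


def stem_of(lemma: str, paradigm: int) -> str:
    """Assumed stemming rule: strip the paradigm's ML-1 ending (ADu -> AD)."""
    ending = SUFFIX_TABLE["ML-1"][paradigm - 1]
    if not lemma.endswith(ending):
        raise ValueError(f"{lemma!r} does not end in {ending!r}")
    return lemma[: len(lemma) - len(ending)]


def generate(lemma: str, ml_index: str) -> str:
    paradigm = VERB_PARADIGM[lemma]                # column index
    stem = stem_of(lemma, paradigm)                # stemming
    suffix = SUFFIX_TABLE[ml_index][paradigm - 1]  # row index + table lookup
    return stem + suffix                           # concatenation


print(generate("ADu", "ML-2"))  # ADutunnAnu
```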
The working of the English-Telugu machine translation system is illustrated below with a simple example.
STEP 1 Consider the input sentence 'She is writing a letter'. STEP 2 The input sentence is given to the parser to obtain the grammatical tree structure and the parts-of-speech categories. The grammatical tree structure is shown in Figure 4.8.
FIGURE 4.8 GRAMMATICAL TREE STRUCTURE
STEP 3 Reordering rules are applied to the English sentence.
FIGURE 4.9 REORDERING OF 'SHE IS WRITING A LETTER' STEP 4 The Telugu equivalents of the English words are found in the bilingual dictionary.
FIGURE 4.10 LEXICALIZATION
STEP 5 The next step is morphological generation for the verb.
VERB MORPHOLOGICAL GENERATION
Input: vrAYU (write) + V + present + 3SF
Output: vrAstundi
STEP 6 FINAL OUTPUT
English: She is writing a letter
Transliteration: Ame oka aksharamu vrAstundi
Telugu:
ఆమె ఒక అక్షరము రాస్త ా ఉంది
CHAPTER 5 RESULTS 5.1 TESTING AND RESULTS The morphological analyzer for Telugu nouns and verbs is tested separately, and the results are given in Tables 5.1 and 5.2.
TABLE 5.1 TESTING RESULTS OF MORPHOLOGICAL ANALYZER FOR NOUN
NUMBER OF NOUNS TESTED | 150
NUMBER OF CORRECT OUTPUTS | 94
NUMBER OF INCORRECT OUTPUTS | 56
ACCURACY (%) | 62.6
TABLE 5.2 TESTING RESULTS OF MORPHOLOGICAL ANALYZER FOR VERB
NUMBER OF VERBS TESTED | 200
NUMBER OF CORRECT OUTPUTS | 117
NUMBER OF INCORRECT OUTPUTS | 83
ACCURACY (%) | 58.5
5.2 DISCUSSION The morphological analyzer is tested separately for nouns and verbs, with 150 nouns and 200 verbs, and achieves accuracies of 62.6 percent and 58.5 percent respectively. Incorrect output occurs mainly for words that do not fall under any of the classified paradigms.
5.3 SCREEN SHOT OF MORPHOLOGICAL ANALYZER Screen shots of the morphological analyzer for verbs and nouns are given below.
FIGURE 5.1 GUI FOR MORPHOLOGICAL ANALYZER-VERB
FIGURE 5.2 GUI FOR MORPHOLOGICAL ANALYZER-NOUN
5.4 TESTING AND RESULTS
The morphological generators for nouns and verbs are tested separately, and the results are given in Tables 5.3 and 5.4.
TABLE 5.3 TESTING RESULTS OF MORPHOLOGICAL GENERATOR FOR NOUNS
Measure | Value
Number of nouns tested | 300
Number of correct outputs | 174
Number of incorrect outputs | 126
Accuracy (%) | 58
TABLE 5.4 TESTING RESULTS OF MORPHOLOGICAL GENERATOR FOR VERBS
Measure | Value
Number of verbs tested | 200
Number of correct outputs | 107
Number of incorrect outputs | 93
Accuracy (%) | 53.5
5.5 DISCUSSION
The morphological generators for nouns and verbs are tested separately, with 300 nouns and 200 verbs. The accuracy is 58 percent for nouns and 53.5 percent for verbs. Incorrect output occurs mainly for words that do not fall under any of the classified paradigms. The accuracy of the system can be improved by handling more special cases, clitics and negative forms.
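The accuracy figures in Tables 5.1 to 5.4 are the share of correct outputs among the words tested. The short check below recomputes them from the tested and correct counts alone, deriving the incorrect count as tested minus correct; the dictionary keys are only illustrative labels.

```python
# (words tested, correct outputs) from Tables 5.1-5.4
RESULTS = {
    "analyzer, nouns (Table 5.1)": (150, 94),
    "analyzer, verbs (Table 5.2)": (200, 117),
    "generator, nouns (Table 5.3)": (300, 174),
    "generator, verbs (Table 5.4)": (200, 107),
}


def accuracy(correct: int, tested: int) -> float:
    """Percentage of tested words for which the output was correct."""
    return 100.0 * correct / tested


for name, (tested, correct) in RESULTS.items():
    incorrect = tested - correct
    print(f"{name}: {correct}/{tested} correct, {incorrect} incorrect, "
          f"{accuracy(correct, tested):.2f}%")
```

For the noun analyzer this gives 62.67 percent, which Table 5.1 reports truncated as 62.6; for noun generation the derived incorrect count is 126 (300 minus 174).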
5.6 SCREEN SHOTS OF MORPHOLOGICAL GENERATOR
Screen shots of the morphological generator for verbs and nouns are given below.
FIGURE 5.3 GUI FOR MORPHOLOGICAL GENERATOR-VERB
FIGURE 5.4 GUI FOR MORPHOLOGICAL GENERATOR-NOUN
5.7 TESTING AND RESULTS
The translation system is tested with simple sentences, and each output is classified into one of three categories: 1. Good, 2. Understandable and 3. Bad.
TABLE 5.5 TESTING RESULTS OF TRANSLATION SYSTEM
Testing results | Number of sentences | Percentage (%)
Tested sentences | 450 | -
Good translations | 128 | 28.44
Understandable translations | 227 | 50.44
Bad translations | 95 | 21.11
5.8 DISCUSSION
The English to Telugu machine translation system is tested with 450 simple sentences. The output is categorized into three types, namely good, understandable and bad. Bad translations occur mainly for the following reasons:
1. The word is not available in the bilingual dictionary.
2. The reordering output is incorrect (for example, exclamatory sentences, questions and negative sentences).
3. Morphological inflection is limited.
A set of tested sentences is attached as an Excel file, and the output is compared with the Google Translate system. Since morphological generation is not available in Google Translate, the outputs of our translation system are morphologically better formed, so the translations are more meaningful and understandable. However, Google Translate has a much larger lexicon than our system, so in terms of vocabulary coverage it performs better. The online system is available at http://nlp.amrita.edu:8080/Eng2Tel/.
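Because every tested sentence falls into exactly one of the three categories, the three counts in Table 5.5 should add up to 450 and the three percentages to about 100. A quick check in Python:

```python
# Sentence counts per output category, from Table 5.5
counts = {"good": 128, "understandable": 227, "bad": 95}

total = sum(counts.values())  # 450 sentences tested
shares = {category: round(100 * n / total, 2) for category, n in counts.items()}

print(total)   # 450
print(shares)  # {'good': 28.44, 'understandable': 50.44, 'bad': 21.11}
```

The rounded shares sum to 99.99 rather than exactly 100 because each is rounded to two decimals.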
5.9 SCREEN SHOTS OF ENGLISH-TELUGU MACHINE TRANSLATION SYSTEM
FIGURE 5.5 GUI FOR ENGLISH-TELUGU MACHINE TRANSLATION SYSTEM
FIGURE 5.6 GUI FOR ENGLISH-TELUGU MACHINE TRANSLATION SYSTEM
CHAPTER 6
CONCLUSION
Machine translation plays a key role in breaking language barriers. This is particularly true in India, where different states use different languages and it is difficult to follow a single language across the country. Much research is still needed in this field to handle these difficulties. Since Telugu is one of the most widely spoken languages in India, it is important to have a translation system for it. The morphological analyzer is based on the Support Vector Machine (SVM), a state-of-the-art machine learning approach. We have demonstrated a new methodology for preparing the data used by the machine learning approaches. We have not used any morpheme dictionary; instead, the trained model identifies the morpheme boundaries. The accuracies obtained from different machine learning tools show that the SVM-based tool gives better results than the others. The morphological analyzer and generator have been developed with limited linguistic resources. In the future, people with good knowledge of Telugu can use the system and help improve its output.
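Learning morpheme boundaries without a morpheme dictionary can be posed as a sequence-labelling task in which each character is tagged as the start (B) or continuation (I) of a morpheme. The sketch below shows only that data conversion and its inverse; the B/I tag set and the '+' separator are assumptions for illustration and not necessarily the format used in this work, and no SVM is trained here. The segmentation AD + utunnAnu is taken from the generation example earlier in this document.

```python
def to_char_labels(segmented: str, sep: str = "+") -> list[tuple[str, str]]:
    """'AD+utunnAnu' -> [('A', 'B'), ('D', 'I'), ('u', 'B'), ...]."""
    pairs = []
    for morpheme in segmented.split(sep):
        for i, char in enumerate(morpheme):
            # B marks the first character of a morpheme, I the rest.
            pairs.append((char, "B" if i == 0 else "I"))
    return pairs


def from_char_labels(pairs: list[tuple[str, str]]) -> list[str]:
    """Rebuild morphemes from (character, tag) pairs, e.g. a model's predictions."""
    morphemes: list[str] = []
    for char, tag in pairs:
        if tag == "B" or not morphemes:
            morphemes.append(char)
        else:
            morphemes[-1] += char
    return morphemes


labels = to_char_labels("AD+utunnAnu")
for char, tag in labels[:4]:
    print(char, tag)            # A B / D I / u B / t I
print(from_char_labels(labels))  # ['AD', 'utunnAnu']
```

A trained classifier would predict the tag column from character-context features; `from_char_labels` then turns its predictions back into morphemes.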
REFERENCES
1. W. John Hutchins and Harold L. Somers, "An Introduction to Machine Translation", Academic Press Ltd., 1992, pp. 1-9.
2. Daniel Jurafsky and James H. Martin, "Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition", 2002.
3. Manish Shrivastava, "Morphology Based Natural Language Processing Tools for Indian Languages", Department of Computer Science and Engineering, Indian Institute of Technology, Powai, Mumbai, 2005.
4. K. R. Beesley and L. Karttunen, "Finite State Morphology", Stanford: CSLI Publications, 2003.
5. http://unicode.org/standard/WhatIsUnicode.html
6. M. F. Porter, "An Algorithm for Suffix Stripping", 2001.
7. C. P. Brown, "The Grammar of the Telugu Language", New Delhi: Laurier Books Ltd., 2001.
8. D. Kostić, A. Mitter and Bh. Krishnamurti, "A Short Outline of Telugu Phonetics", on phone frequencies, 1979, pp. 202-204.
9. Bh. Krishnamurti and J. P. L. Gwynn, "A Grammar of Modern Telugu", Chapter 5: The Structure of Telugu Orthography, 1985.
10. G. Uma Maheshwar Rao, Rajeev Sangal, P. V. H. M. L. Narasimham, S. C. Babu and J. Satyanarayana, Subcommittee report on "Standards for the Implementation of Telugu in Information Technology", 2001.
11. J. P. L. Gwynn and Bh. Krishnamurti, "A Grammar of Modern Telugu", volume 11, Oxford University Press, Delhi, 1987.
12. K. P. Soman, R. Loganathan and V. Ajay, "Support Vector Machines and Other Kernel Methods", PHI Learning Private Ltd., 2009, pp. 115-155.
13. Jesus Gimenez and Lluis Marquez, "SVMTool Technical Manual v1.3", TALP Research Center, LSI Department, Salgado, Barcelona, 2006.
14. M. Anand Kumar, V. Dhanalakshmi, S. Rajendran and K. P. Soman, A Novel Approach to Morphological "Hörsaalgebäude" of the University of Koeln Köln, Universitätsstrasse 35, Albertus-Magnus-Platz 1, Germany, 2009.
15. http://en.wikipedia.org/wiki/Google_Translate
PUBLICATIONS
INTERNATIONAL JOURNAL
[1] R. SriBadri Narayanan, S. Saravanan and Dr. K. P. Soman, Amrita University, Coimbatore, India, "Data Driven Suffix List and Concatenation Algorithm for Telugu Morphological Generator", International Journal of Engineering Science and Technology, vol. 3, no. 8, pp. 6712-6717, August 2011.
NATIONAL CONFERENCE
[1] Ramasamy Veerappan, R. SriBadri Narayanan and Dr. K. P. Soman, Amrita University, Coimbatore, India, "Translation Based Support System for Smart Education", in Proceedings of NCILC, 2011.
APPENDIX
MARKERS
Given below are the inflections considered for Telugu verbs.
1. PRESENT TENSE MARKERS: tunnA, TunnA, tunTE, TunTE, Tum~m, tU, TU, to~m, To~m.
2. PAST TENSE MARKERS: nnA, sunnA, A, sA, DA, cA, ppA, lcA, slA, tA, LLA, TTA, ccA, kunnA, kua~m, ia~m, ia~mcA, se, de, ce, ppe, te, ue, rce, nne, ye.
3. FUTURE TENSE MARKERS: TA, ddA, A, tA, tua~m, ia~mcu, su, u, cu, ccu, dcu.
4. CLITICS: vO, nO, rO, dO, lO, lA, kO, sai, si, stu, akA, nnA, lE.
5. AUXILIARY VERBS: nivvu, vaccu, valayu, pO, ua~mdu, cUdu, peTTu, pArEyi, veyyi, avvu, mugia~mcu, cUpu, daluvu, manu, cupia~mcu, veLLu, goTTu, beTTu, sAgu, tIru.
6. NEGATIVE MARKERS: aka, akua~mDA, akpoyinA, akapotE, a, akpotEnE, akunnA.
7. PRONOUNS: vanni, aTTua~mdi, naTTua~mdi.
8. NOUNS: ammA, ayyA, nakkara, annamATa, nEkkara.
9. ADJECTIVE: anavasara~m.
10. ADVERBIAL ADJECTIVES: a~mduku, a~mduvalana, a~mduna, aTuva~mTi, aTlu, aTlugA.
11. POSTPOSITIONS: lOga, lOpuna, dAkA, koddi, kadA, gAni, kanuka, kadu, gUDA, kAbOlu, kAni, gAdA, annA, kUDA, mua~mdu, ni, a~mTA, a~mTE, aMTu, mAku, baTTi, kUDa, mAllE, mari, gala, bO, lA, sariki, dagu nua~mDu, galugu, joccu, jAlu, baDuvu, tappa, pATiki, varaku, ka~mTE.
12. IMPERATIVE SUFFIXES: a~mDi, lEa~mDi.
13. IMPERATIVE NEGATIVE SUFFIX: aka~mDi.
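The marker lists above can be used to locate a tense marker inside an inflected verb. The sketch below searches a word for the longest present-tense marker from list 1, preferring the rightmost occurrence when two markers have the same length. The longest-match rule and the function name are assumptions for illustration, not the procedure documented in this thesis. Note that the marker found is only part of the full suffix 'utunnAnu' used in the generation example.

```python
from typing import Optional

# Present tense markers, copied from list 1 above.
PRESENT_TENSE_MARKERS = ["tunnA", "TunnA", "tunTE", "TunTE", "Tum~m",
                         "tU", "TU", "to~m", "To~m"]


def split_on_marker(word: str, markers: list[str]) -> Optional[tuple[str, str, str]]:
    """Return (before, marker, after) for the longest marker found in word.

    Ties in length are broken by taking the rightmost occurrence.
    Returns None when no marker occurs in the word.
    """
    best: Optional[tuple[int, int, str]] = None  # (length, position, marker)
    for marker in markers:
        position = word.rfind(marker)
        if position == -1:
            continue
        candidate = (len(marker), position, marker)
        if best is None or candidate[:2] > best[:2]:
            best = candidate
    if best is None:
        return None
    _, position, marker = best
    return word[:position], marker, word[position + len(marker):]


print(split_on_marker("ADutunnAnu", PRESENT_TENSE_MARKERS))  # ('ADu', 'tunnA', 'nu')
```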
Given below are the inflections considered for Telugu nouns.
1. POSTPOSITIONS: a~mTE, O, gAni, gUDA, kAkua~mDA, gA, lEkua~mDA, vu, ki, ni, runibaTTi, lA, lAa~mTi, aDuduna, aDugunua~mci, aDuguki, eDuTaki, bataTa, bayaTanua~mDi, badulegA, cEta, cOTiki, cOTO, cOTOnua~mDi, cOTinua~mci, gua~mDa, guria~mci, gADa, ka~mTE, kedurugA, kosa~m, kOraku, malle, lO, lOgUDA, lOki, nua~mDi, lOpala, lOpali, lOpalanua~mDi, mIda, mIdaku, mIdanua~mDi, madya, madyaki, madyalOnua~mDi, madyalOki, medalukoni, mua~mdu, naDuma, naDumaki, ni, nua~mDi, pai, paiki, painua~mDi, pakka, painua~mdi, pakkaku, pakkalO, pakkanua~mDi, prakAra~m, stAnAniki, stAna~m, stAna~mlO, stAna~mlOnua~mDi, valana, vadd, vaddaku, vaddanua~mDi, venukanua~mDi, venuka, venukaku, taravAta, taravAtanua~mDi, tO, gUDA, tOpATu, gAka, daggara, daggaralO, daggaraku, daggaranua~mDi, dRushTilO, yOkka, dvArA.
2. PRONOUNS: Ayana, Ame, atanu, gAru, di, vi, taravAta, vADu, vAru, vaipu.
3. ADJECTIVES: ayinA, ayina.
PARADIGM LIST
For example, the verb ఆడు has the following paradigms: ఆడేస్ా ాను ఆడేస్ా ున్ానను ఆడేదా ాం ఆడేస్ాను ఆడేసింది
TELUGU PARADIGM LIST – VERB
Paradigm 1
అయిాకనుక
ఆడడం
అయిాకదు
ఆడుతూ
అయిాగూడ
ఆడేయ్యాలి
Paradigm 4
ఆడకూడదు
చావాలి
ఆడేసకాతు లా
చావకూడదు
Paradigm 2
చసలా కాతు
అరవను
చచిినట్ల ే
అరవలేను
చచిినట్ల ే గా
అరవలేదు
Paradigm 5
అరవక
చెప్పక
అరిచేయ్కుండా
చెప్పకుండ
Paradigm 3
చెప్ిపన
అయిాకదా
చెప్ా తనన
అయిాగాతు
చెప్పకపోయిన్ా
Paradigm 6
Paradigm 11
చేసలా
కాస్ుాంట్ే
చేస్ా ుంట్ే
కాయ్కపోతే
చెయ్ాకపోతే
కాస్ుాన్ాన
చేస్ా ున్ాన
కాసై
చేసై
కాయ్డం
Paradigm 7 చతడవదుా
Paradigm 12 కదలిను
చతడన్ేవదుా
కదలిలేను
చతడమయకు
కదలిడంలేదు
చతడకూడదు
కదలిక
చతడన్ేకూడదు
కదలికుండ
Paradigm 8
Paradigm 13 కాలక
చతప్ిస్ా ాను చతప్ిస్ా ున్ానను
కాలకుండ
చతప్ిదా ాం
కాలిిన
చతప్ించాను
కాలుస్ుానన
చతప్ించింది
కాలకపోయిన్ా
Paradigm 9
Paradigm 14
ఏడుస్ాాను
కలిసింది
ఏడుస్ుాన్ానను
కలవను
ఏడుదాాం
కలవలేను
ఏడాిను
కలవడంలేదు
ఏడచింది
కలవక
Paradigm 10
Paradigm 15
ఇస్ాానననమయట్
కొన్ేస్ా ుననప్పట్ికి
ఇస్త ా కాతు
కొన్ేస్ా ుననందుకు
ఇదత ా లే
కొన్ేస్ా ుననందువలన
ఇస్త ా కనుక
కొన్ేస్ా ుననట్లవంట్ి
ఇస్త ా కదా
కొన్ేస్ా ుననందున
కుడుతుననట్ల ే గా
Paradigm 16 కోస్ాాను
Paradigm 21 లేస్ా ాను
కోస్ుాన్ానను
లేస్ా ున్ానను
కోదాాం
లేదా ాం
కోస్ాను
లేచాను
కోసింది
లేచింది
Paradigm 17
Paradigm 22
కుదురచిసింది
మోప్తతాను
కుదరిను
మోప్తతున్ానను
కుదరిలేను
మోప్తదాం
కుదరిలేదు
మోపాను
కుదరిక
మోప్ింది
Paradigm 18
Paradigm 23
కూరచిను
ప్డేసలా
కూరచిలేను
ప్డేస్ా ుంట్ే
కూరచిడంలేదు
ప్డేయ్కపొతే
కూరచిక
ప్డేస్ా ున్ాన
కూరచికుండ
ప్డేయి
Paradigm 19
Paradigm 24
కురిసలా
ప్ననకకదు
కురుస్ుాంట్ే
ప్ననకకదా
కురియ్కపోతే
ప్ననకననమయట్ా
కురుస్ుాన్ాన
ప్న్ేనయ్కతు
కురిసై
ప్ననకట్లింది
Paradigm 20
Paradigm 25
కుట్ిినకొదిా
ప్రిచడమంట్ృ
కుడుతుననట్ల ే
ప్రచడమన్ాన
కుట్ిినచో
ప్రచడమననమయట్
కుడుతుననచో
ప్రచడమంట్ే
ప్ూయ్కపోయిన్ా
ప్రచడంకాతు
ప్ూసలా
Paradigm 26
ప్ూస్ుాంట్ే
ప్ట్టియ్యాలి
Paradigm 31
ప్ట్ి కూడదు
రాయ్ను
ప్ట్ేిసకాతు లా
రాయ్లేను
ప్ట్ేిసినట్ల ే
రాయ్డంలేదు
ప్ట్ేిసినట్ల ే గా
రాయ్క
Paradigm 27
రాయ్కుండ
ప్ిలుస్ుానన
Paradigm 32
ప్ిలవకపోయిన్ా
తననలేను
ప్ిలిసలా
తననడంలేదు
ప్ిలుస్ుాంట్ే
తననక
ప్ిలవకపోతే
తననకుండ
Paradigm 28
తన్ేనసిన
పొగుడాాను
Paradigm 33
పొగుడుాన్ానను
తేకుండ
పొగుడుదాం
తెచిిన
పొగిడాను
తేస్ా ునన
పొగిడచంది
తేకపోయిన్ా
Paradigm 29
తెసలా
పోతాను
Paradigm 34
పోతున్ానను
తీస్ుాననప్పట్ి
పోదాం
తీస్ుాననప్పట్ినుంచి
పోయ్యను
తీస్ుాననప్పట్ికీ
పోయింది
తీస్ుాననందుకు
Paradigm 30
తీస్ుాననందువలన
ప్ూసిన
Paradigm 35
ప్ూస్ుానన
ఉండలేదు
ఉండక
వలచలేదు
ఉండేయ్కుండా ఉండేసిన
Paradigm 37
ఉండేస్ా ునన
వచేియ్కుండా
Paradigm 36
వచేిసిన
వలచేస్ాను
వచేిస్ుానన
వలచేసింది
రాకపోయిన్ా
వలయ్ను
వచేిస్
వలచలేను
For example, the noun ఊరు has the following paradigms:
ఊరాయ్న
బెండతను
బెండయిన్ా బెండయిన
ఊరామె
Paradigm 5
ఊరతను
బోన్ాయిన
ఊరయిన్ా
బోన్ామె
ఊరయిన
బోనతను బోనయిన్ా
Paradigm 1
బోనయిన
అబాాయ్యయ్న
Paradigm 6
అబాాయ్యమె
బుడాాయిన
అబాాయ్తను
బుడాామె
అబాాయ్యిన్ా
బుడా తను
అబాాయ్యిన
బుడా యిన్ా బుడా యిన
Paradigm 2 బాలుడాయిన
Paradigm 7
బాలుడామె
చేన్ాయ్న
బాలుడతను
చేన్ామె
బాలుడయిన్ా
చేనతను
బాలుడయిన
చేనయిన్ా చేనయిన
Paradigm 3 బండాయిన
Paradigm 8
బండామె
చిలయేయిన
బండతను
చిలయేమె
బండయిన్ా
చిలే తను
బండయిన
చిలే యిన్ా
Paradigm 4
చిలే యిన
బెండాయిన
Paradigm 9
బెండామె
దారాయ్న
దారామె
గుడాాయిన
దారతను
గుడాామె
దారయిన్ా
గుడా తను గుడా యిన్ా
దారయిన
గుడా యిన
Paradigm 10
Paradigm 15
ఎతుమిదాయ్న
గుడాయిన
ఎతుమిదామె
గుడామె
ఎతుమిదతను
గుడతను
ఎతుమిదయిన్ా
గుడయిన్ా
ఎతుమిదయిన
గుడయిన
Paradigm 11
Paradigm 16
గదాయ్న
గుండాయిన
గదామె
గుండామె
గదతను
గుండతను
గదయిన్ా
గుండయిన్ా
గదయిన
గుండయిన
Paradigm 12
Paradigm 17
గచరాయ్న
గూడాయిన
గచరామె
గూడామె
గచరతను
గూడతను
గచరయిన్ా
గూడయిన్ా
గచరయిన
గూడయిన
Paradigm 13
Paradigm 18
గొయ్యాయిన
ఇలయేయిన
గొయ్యామె
ఇలయేమె
గొయ్ాతను
ఇలే తను
గొయ్ాయిన్ా
ఇలే యిన్ా
గొయ్ాయిన
ఇలే యిన
Paradigm 14
Paradigm 19
జంతువాయిన
కోట్ాయిన
జంతువామె
కోట్ామె
జంతువతను
కోట్తను
జంతువయిన్ా
కోట్యిన్ా
జంతువయిన
కోట్యిన
Paradigm 20
Paradigm 25
కలిరాయిన
కోట్ాయ్న
కలిరామె
కోట్ామె
కలిరతను
కోట్తను
కలిరయిన్ా
కోట్యిన్ా
కలిరయిన
కోట్యిన
Paradigm 21
Paradigm 26
కాలయయిన
కొట్ాియ్న
కాలయమె
కొట్ాిమె
కాలతను
కొట్ి తను
కాలయిన్ా
కొట్ి యిన్ా
కాలయిన
కొట్ి యిన
Paradigm 22
Paradigm 27
కన్ానయిన
కోట్ాయిన
కన్ానమె
కోట్ామె
కననతను
కోట్తను
కననయిన్ా
కోట్యిన్ా
కననయిన
కోట్యిన
Paradigm 23
Paradigm 28
కీలయయ్న
కుందేలయయ్న
కీలయమె
కుందేలయమె
కీలతను
కుందేలతను
కీలయిన్ా
కుందేలయిన్ా
కీలయిన
కుందేలయిన
Paradigm 24
Paradigm 29
మేకయిన
మందాయ్న
Paradigm 34
మందామె
మెతుకాయిన
మందతను
మెతుకామె
మందయిన్ా
మెతుకతను
మందయిన
మెతుకయిన్ా
Paradigm 30
మెతుకయిన
మతుషాయిన
Paradigm 35
మతుషామె
న్ౌకరాయ్న
మతుషతను
న్ౌకరామె
మతుషయిన్ా
న్ౌకరతను
మతుషయిన
న్ౌకరయిన్ా
Paradigm 31
న్ౌకరయిన
మెదడాయిన
Paradigm 36
మెదడామె
నతరాయ్న
మెదడతను
నతరామె
మెదడయిన్ా
నతరతను
మెదడయిన
నతరయిన్ా
Paradigm 32
నతరయిన
మేన్ాయిన మేన్ామె
Paradigm 37
మేనతను
ఒకట్ాయిన
మేనయిన్ా
ఒకట్ామె
మేనయిన
ఒకట్తను
Paradigm 33
ఒకట్యిన్ా
మేకాయ్న
ఒకట్యిన
మేకామె
Paradigm 38
మేకతను
పాలయయ్న
మేకయిన్ా
పాలయమె పాలతను
పాలయిన్ా
ప్ళ్ాతను
పాలయిన
ప్ళ్ాయిన్ా
Paradigm 39
ప్ళ్ాయిన
ప్ందిరాయ్న
Paradigm 44
ప్ందిరామె
ప్న్ానయ్న
ప్ందిరతను
ప్న్ానమె
ప్ందిరయిన్ా
ప్ననతను
ప్ందిరయిన
ప్ననయిన్ా
Paradigm 40
ప్ననయిన
ప్ందెమయయ్న
Paradigm 45
ప్ందెమయమె
ప్ట్ాియ్న
ప్ందెమతను
ప్ట్ాిమె
ప్ందయిన్ా
ప్ట్ి తను
ప్ందయిన
ప్ట్ి యిన్ా
Paradigm 41
ప్ట్ి యిన
ప్న్ాయ్న
Paradigm 46
ప్న్ామె
ప్ిలేవాడాయిన
ప్నతను
ప్ిలేవాడామె
ప్నయిన్ా
ప్ిలేవాడతను
ప్నయిన
ప్ిలేవాడయిన్ా
Paradigm 42
ప్ిలేవాడయిన
పాప్మయయిన
Paradigm 47
పాప్మయమె
ప్ిండాయిన
పాప్మతను
ప్ిండామె
పాప్మయిన్ా
ప్ిండతను
పాప్మయిన
ప్ిండయిన్ా
Paradigm 43
ప్ిండయిన
ప్ళ్ళాయిన
Paradigm 48
ప్ళ్ళామె
ప్తలయయ్న
ప్తలయమె
రాణాయ్న
ప్తలతను
రాణామె
ప్తలయిన్ా
రాణతను
ప్తలయిన
రాణి అయిన్ా
Paradigm 49
రాణి అయిన
ప్తస్ా కమయయిన
Paradigm 54
ప్తస్ా కమయమె
రాయ్యయిన
ప్తస్ా కమతను
రాయ్యమె
ప్తస్ా కమయిన్ా
రాయ్తను
ప్తస్ా కమయిన
రాయ్యిన్ా
Paradigm 50
రాయ్యిన
ప్తట్ాియిన ప్తట్ాిమె
Paradigm 55
ప్తట్ి తను
రెండాయిన
ప్తట్ి యిన్ా
రెండామె
ప్తట్ి యిన
రెండతను
Paradigm 51
రెండయిన్ా
ప్ూలయయ్న
రెండయిన
ప్ూలయమె
Paradigm 56
ప్ూలతను
రిక్షా అయ్న
ప్ూవయిన్ా
రిక్షా ఆమె
ప్ూవయిన
రిక్షా అతను
Paradigm 52
రిక్షా అయిన్ా
రాతాాయ్న
రిక్షా అయిన
రాతాామె
Paradigm 57
రాతాతను
స్ారాయిన
రాతాయిన్ా
స్ారామె
రాతాయిన
స్ారతను
Paradigm 53
స్ారయిన్ా
స్ారయిన
తిరగలయిన్ా
Paradigm 58
తిరగలయిన
స్ందడాయిన
Paradigm 63
స్ందడామె
ఊరాయ్న
స్ందడతను
ఊరామె
స్ందడయిన్ా
ఊరతను
స్ందడయిన
ఊరయిన్ా
Paradigm 59
ఊరయిన
సలనహితుడాయిన
Paradigm 64
సలనహితుడామె
వేలయయ్న
సలనహితుడతను
వేలయమె
సలనహితుడయిన్ా
వేలతను
సలనహితుడయిన
వేలయిన్ా
Paradigm 60
వేలయిన
తరగతాయ్న
Paradigm 65
తరగతామె
వెయ్యాయిన
తరగతతను
వెయ్యామె
తరగతయిన్ా
వెయ్ాతను
తరగతయిన
వెయ్యాయిన్ా
Paradigm 61
వెయ్యాయ్
తెన్ానయ్న తెన్ానమె తెననతను తెననయిన్ా తెననయిన Paradigm 62 తిరగలయయిన తిరగలయమె తిరగలతను