TECHNOLOGICAL DEVELOPMENT OF TAMIL
Rajendran Sankarvelayuthan

Abstract
Tamil began its technological development early. It has made use of every opportunity to make itself suitable for digitization and computerization. The references listed below document its efforts to meet the need of the day, i.e. technological development. Governments, both state and central, have funded the technological development of Tamil liberally. This has helped Tamil develop MT systems, a wordNet and other NLP systems. Private organizations have also contributed to this mission. Many individuals, both in India and abroad, have worked tirelessly for Tamil computing. Organizations such as CIIL, AUKBCRC, Anna University, Amrita University, Tamil Virtual Academy, Tamil University and Madras University deserve appreciation for their efforts in uplifting Tamil in the era of information technology. Tamil has switched over to Unicode, abandoning other systems. Tamil has comparatively commendable resources and tools for NLP applications. A substantial amount of text corpora, speech corpora and parallel corpora is available for Tamil. A good number of speech recognition systems and text-to-speech systems have been developed for Tamil. Reliable morphological analyzers, morphological generators, syntactic parsers, chunkers, shallow parsers, named entity recognition systems and optical character recognition systems are available for Tamil. Computational semantics has also improved for Tamil: there are attempts to develop word sense disambiguation, question answering, relationship extraction, sentiment analysis, automatic summarization and coreference resolution systems. Efforts have also been made to develop text generation systems for Tamil. Overall, Tamil shows only positive signs in its technological development.

Keywords: analysis, generation, parsing, chunking, tokenization, morphological analyzer, POS tagger, chunker, parser, morphological generator, tokenizer, syntactic parser, semantic analysis, grammatical analysis, named entity recognizer, speech synthesizer, speech recognizer, machine translation, Unicode font, Unicode converter, word sense disambiguation, co-reference, coreference resolution, anaphora resolution, text analysis, text generation

1. Introduction
Tamil Nadu, which covers an area of 130,058 square kilometres (50,216 sq mi), is the eleventh largest state in India. Kerala borders it to the west, Karnataka to the northwest, Andhra Pradesh to the north, the Bay of Bengal to the east and the Indian Ocean to the south. According to the 2011 India census, Tamil Nadu has a population of 72,147,030. The sex ratio of the state is 995, with 36,137,975 males and 36,009,055 females. A total of 14,438,445 people, constituting 20.01 per cent of the total population, belonged to Scheduled Castes (SC), and 794,697 people, constituting 1.10 per cent of the population, belonged to Scheduled Tribes (ST).

Tamil is one of the longest-surviving classical languages in the world. It has been described as the only language of contemporary India that is recognizably continuous with a classical past. The variety and quality of classical Tamil literature have led to it being described as one of the great classical traditions and literatures of the world. Tamil literature has been documented for over 2,000 years. The earliest period of Tamil literature, Sangam literature, is dated from ca. 300 BC–AD 300. Tamil has the oldest extant literature among the Dravidian languages. The earliest epigraphic records found on rock edicts and 'hero stones' date from around the 3rd century BC. More than 55% of the epigraphical inscriptions (about 55,000) found by the Archaeological Survey of India are in Tamil. Tamil inscriptions written in Brahmi script have been discovered in Sri Lanka and on trade goods in Thailand and Egypt. The two earliest manuscripts from India acknowledged and registered in the UNESCO Memory of the World register, in 1997 and 2005, were written in Tamil.

The major funding for the technological development of Tamil comes from the Department of Electronics and Information Technology (DeitY), Govt. of India.
The Central Institute of Indian Languages (CIIL) has contributed a great deal to the technological development of Tamil. The "Resource Centre for Indian Language Technology Solutions - Tamil" project, established at Anna University, contributed substantially to the technological development of Tamil. The project was funded by the Ministry of Information Technology, Govt. of India.

From the regional side, Tamil Virtual Academy has sponsored certain projects aimed at the technological development of Tamil. Industries rarely fund projects promoting the technological development of Tamil, and consultancy in this area is also rare. Institutions and individuals abroad, especially in Singapore, Sri Lanka, Malaysia and Pennsylvania, also work for the technological development of Tamil (Ranganathan 2016f). The International Forum for Information Technology in Tamil (INFITT, உத்தமம்) conducts the Tamil Internet Conference every year and encourages the technological development of Tamil.

2. Challenges and risks
The challenges for the technological development of Tamil include digitization, switching over to Unicode fonts, converting already digitized material into Unicode format with font converters, creating linguistic and language-technology resources for Tamil computing, and other related efforts. On the risk side, Tamil faces various kinds of distortion in digital media, such as code-mixing, corrupt and ungrammatical expressions, and the use of slang. This practice may lead to the loss of the language, its culture and various other unique features of the language.

Digitization is the process of converting information into a digital format. In this format, information is organized into discrete units of data (called bits) that can be separately addressed (usually in multiple-bit groups called bytes). This is the binary data that computers and many devices with computing capacity (such as digital cameras and digital hearing aids) can process. The Digital India initiative has launched several digitization projects. Text and images can be digitized similarly: a scanner captures an image (which may be an image of text) and converts it to an image file, such as a bitmap.
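To make the bits-and-bytes description concrete for Tamil text, the minimal sketch below encodes a Tamil word as bytes and then as a string of bits. It assumes Python 3 and UTF-8 as the storage encoding; the helper names to_bytes and to_bits are illustrative only.

```python
def to_bytes(text: str) -> bytes:
    """Encode text as UTF-8, the byte form typically stored on disk or sent over a network."""
    return text.encode("utf-8")


def to_bits(data: bytes) -> str:
    """Render each byte as 8 binary digits, i.e. the discrete bits described above."""
    return "".join(f"{byte:08b}" for byte in data)


word = "தமிழ்"           # "Tamil": 5 Unicode code points
data = to_bytes(word)    # each code point in the Tamil block takes 3 bytes in UTF-8
bits = to_bits(data)

print(len(word), len(data), len(bits))  # 5 15 120
print(data[:3].hex())                   # e0aea4 (the letter த, U+0BA4)
assert data.decode("utf-8") == word     # decoding recovers the original text exactly
```

The round trip in the last line is what makes digitized text lossless: the same bytes always decode back to the same characters.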
An optical character recognition (OCR) program analyzes a text image for light and dark areas in order to identify each letter or numeric digit, and converts each character into a character code (e.g. ASCII for English text, Unicode for Tamil). Audio and video digitization uses one of many analog-to-digital conversion processes, in which a continuously variable (analog) signal is changed, without altering its essential content, into a multi-level (digital) signal. Sampling measures the amplitude (signal strength) of an analog waveform at evenly spaced time intervals and represents the samples as numerical values for input as digital data. Digitizing information makes it easier to preserve, access and share. For example, an original historical document may be accessible only to people who visit its physical location, but if its content is digitized, it can be made available to people worldwide. There is a growing trend towards digitizing historically and culturally significant data. Tamil Virtual Academy, Project Madurai and other institutions have digitized Tamil literary works, including Sangam literature, medieval and modern Tamil literary works, and other important works in Tamil.

Tamil input methods are the systems developed to type Tamil characters using a typewriter or a computer keyboard. Several programs, such as Azhagi and NHM Writer, provide both fixed and phonetic keyboard layouts for typing.

There are many issues with the present Unicode encoding for Tamil. The present Unicode standard for Tamil is considered inadequate for efficient and effective use of Tamil on computers, for the following reasons:

  

 

- Only 31 of the 247 Tamil characters have Unicode code positions. These 31 characters comprise 12 vowels (uyirkaL), 18 vowel-consonant combinations (uyir-meykaL) and one aytham. The five Grantha vowel-consonant combinations are not given code space in Unicode Tamil. The other Tamil characters have to be rendered using separate software.
- In the present Unicode Tamil, only 10% of the Tamil characters are given code space. No code space is given to the remaining 90% of the Tamil characters used in general text interchange.
- The vowel-consonant combinations left out of the present Unicode Tamil are simple characters, just as A, B, C and D are characters to English. Vowel-consonant combinations are not glyphs, ligatures or conjunct characters, as Unicode assumes: ka, kA, ki, kI, etc., are characters to Tamil.
- In plain Tamil text, vowel-consonants form 64 to 70% of the characters, vowels 5 to 6% and consonants 25 to 30%. Breaking such high-frequency letters into glyphs is highly inefficient: this type of encoding requires a rendering engine to realize a character, and it consumes extra time and space.
- This type of computing is not suitable for applications such as system software development in Tamil, searching and sorting, and natural language processing (NLP) in Tamil. Such applications require a Level-1 implementation, in which every character of the language has its own code position, as in English.
- As this encoding is based on ISCII (1988), the characters are not in their natural order of sequence, so a complex collation algorithm is required to arrange them in the natural order.
- A single character is rendered using multiple code points. Multiple code points lead to security vulnerabilities and ambiguous combinations, and they require the use of normalization.
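The multiple-code-point, normalization and ordering points above can be observed directly with Python's standard unicodedata module. In this minimal sketch, the traditional consonant order is written out by hand for illustration and the helper names are hypothetical; real Tamil collation (for example, a tailoring of the Unicode Collation Algorithm) has to handle far more than bare consonants.

```python
import unicodedata

# Traditional order of the 18 Tamil consonant letters, written out by hand for illustration.
TRADITIONAL_CONSONANTS = "கஙசஞடணதநபமயரலவழளறன"


def code_points(text: str) -> list:
    """List each code point in the text with its official Unicode name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


def traditional_key(consonant: str) -> int:
    """Sort key for a bare consonant letter, based on the traditional alphabet order."""
    return TRADITIONAL_CONSONANTS.index(consonant)


# The single character 'ki' is stored as two code points.
print(code_points("கி"))  # ['U+0B95 TAMIL LETTER KA', 'U+0BBF TAMIL VOWEL SIGN I']

# 'ko' can be stored in two ways that look identical on screen.
ko_precomposed = "\u0B95\u0BCA"     # KA + VOWEL SIGN O
ko_two_part = "\u0B95\u0BC6\u0BBE"  # KA + VOWEL SIGN E + VOWEL SIGN AA
print(ko_precomposed == ko_two_part)                                # False
print(unicodedata.normalize("NFC", ko_two_part) == ko_precomposed)  # True

# Code-point order differs from the traditional alphabet order.
letters = ["வ", "ன", "ற", "ழ"]
print(sorted(letters))                       # ['ன', 'ற', 'ழ', 'வ']
print(sorted(letters, key=traditional_key))  # ['வ', 'ழ', 'ற', 'ன']
```

The two 'ko' strings compare unequal until both are normalized, which is why search and comparison over Tamil Unicode text normally normalize first.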

3. Resources and Tools

3.1. Language Resources: Resources, Data and Knowledge Base
In the context of NLP, a corpus is a large collection of text. There are different types of corpora, such as text corpora, speech corpora and image corpora.

3.1.1. Text corpora
The Central Institute of Indian Languages has coordinated the development of corpora totalling more than 45 million words in the Scheduled Languages under the Technology Development for Indian Languages (TDIL) scheme of the Ministry of Communication and Information Technology. These corpora were created following sampling methodologies and hence are balanced corpora. They are available in Indian Script Code for Information Interchange (ISCII) format. The Institute also intends to enlarge these corpora to twenty million words in each language. In collaboration with Lancaster University, CIIL has converted them into Unicode format. Corpora in this format, along with the Lancaster University corpora, are available to users at: http://www.emille.lancs.ac.uk/home.htm Tamil Virtual Academy aims to build a Tamil corpus bank to global standards. Currently, corpus extraction is being done from Sangam, medieval and modern Tamil texts, with a corpus size of 150 million words. Under the Indian Languages Corpora Initiative Phase II (ILCI Phase-II) project, initiated by MeitY, Govt. of India, Jawaharlal Nehru University, New Delhi, collected a monolingual corpus in Tamil. This corpus, the final outcome of the project, contains approximately 30,000 sentences from the general domain. The translated sentences have been POS tagged according to the BIS (Bureau of Indian Standards) tagset. The corpus has the following features: a unique ID, UTF-8 encoding and text file format. ILTPDC has a downloadable Tamil-Telugu General Text Corpus. Under the Indian Language to Indian Language Machine Translation project, a Tamil-Telugu parallel corpus was produced by human translators.
The corpus is drawn from the web and covers tourism/travel and other domains in a 3:1 ratio. It contains 1,000 sentences in total, comprising very simple, simple, complex and compound sentence structures, including relative clauses, complement clauses, and finite and non-finite conjunctions. The text encoding is UTF-8 (as reported on the ILTPDC website). A large manually annotated POS-tagged corpus (515K tokens) has been made available for Tamil by AUKBC, Chennai. The corpus is based on the famous 20th-century Tamil novel "Ponniyin Selvan" by Kalki Krishnamoorthy. It is annotated with the BIS tagset, a hierarchical tagset approved by the Bureau of Indian Standards and Tamil Virtual Academy. Corpus statistics: total number of sentences, 50,876; number of words, 5,15,283.

3.1.2. Speech corpora
ILTPDC reports the availability of a downloadable Tamil Speech Data - ASR corpus. It contains more than 62,000 Tamil audio files from 1,000 speakers, a .dic file listing each word with its phonetic representation, and a transcription text file listing the transcription of each audio file. This data was prepared for the agricultural commodity domain, and the corpus is 5.7 GB in size. A Tamil speech recognition database collected in Tamil Nadu contains the voices of 450 different native speakers, selected according to age group (16-20, 21-50, 51+), gender, dialect region and environment (home, office and public place). Each speaker recorded a news text in a noisy environment using a recorder with a built-in microphone. The recordings are in stereo, and the extracted channels are also included in separate files. The database includes audio files, text files and NIST files, saved as .ZIP files. All the speech data are transcribed and labelled at the sentence level. Statistics: form and function words, command and control words, a phonetically balanced vocabulary, proper names, and the 1,000 most frequent words.

3.1.3. Parallel corpora
Under the scheme "EnTam: An English-Tamil Parallel Corpus", English-Tamil bilingual data has been collected from publicly available websites for NLP research involving Tamil. The standard set of processing steps was applied to the raw web data to produce a sentence-aligned English-Tamil parallel corpus suitable for various NLP tasks. The parallel corpus covers texts from the Bible, cinema and news domains. The following are the details of the Tamil text and speech corpora at CIIL, as reported by Ramamoorthy:

Text corpus
  Tamil CIIL corpus, word count: 8068759
  Tamil magazine corpus, word count: 1805164
  Tamil newspaper corpus (web-crawled), word count: 1059561
  Total raw text, word count: 10933484

Annotated corpus (POS tagged)
  BIS tagset tagged: 1881646
  LDC-IL tagset tagged: 58557

Chunking
  Sentence count: 20972
  Chunk count: 160386

Speech corpus
  Total speakers: 453
  Total hours: 213:37:27

Note: The total hours in the speech corpus will change after the quality check is completed.

3.1.4. Lexical resources
Ideally speaking, a lexical resource (LR) is a database consisting of one or several dictionaries. Depending on the type of languages addressed, an LR may be qualified as monolingual, bilingual or multilingual. In bilingual and multilingual LRs, the words may or may not be connected from one language to another. When they are connected, the equivalence between languages is expressed through a bilingual link (for bilingual LRs) or through multilingual notations (for multilingual LRs). It is also possible to build and manage a lexical resource consisting of different lexicons of the same language, for instance one dictionary for general words and one or several dictionaries for different specialized domains. A lexical database is a lexical resource with an associated software environment that permits access to its contents. The database may be custom-designed for the lexical information, or it may be a general-purpose database into which lexical information has been entered. Information typically stored in a lexical database includes the lexical category and synonyms of words, as well as semantic and phonological relations between different words or sets of words. Tamil wordNet has been constructed based on Hindi wordNet under the Dravidian WordNet project funded by DeitY. Nearly 30,000 synsets have been created and linked across the four major Dravidian languages (Rajendran). The Dravidian wordNet is part of the IndoWordNet project. WordNets are online lexical databases built to be used across languages for information extraction, machine translation and other NLP applications. A Tamil visual onto-thesaurus consisting of 50,000 words has been constructed by Amrita Vishwa Vidyapeetham, Coimbatore, with funding from Tamil Virtual Academy, Chennai. In it, Tamil vocabulary is taxonomically classified and linked by various kinds of semantic or lexical relations.
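The kind of information a wordNet-style lexical database stores (synonym sets plus "is-a" relations between them) can be sketched with a small in-memory structure. The synsets, glosses and identifiers below are illustrative assumptions, not entries taken from the Tamil wordNet or IndoWordNet, and the helper functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Synset:
    synset_id: str
    lemmas: List[str]               # synonymous Tamil words
    gloss: str                      # short English definition
    hypernym: Optional[str] = None  # id of the more general synset ("is-a" relation)


# Illustrative entries only; not taken from any published wordNet.
SYNSETS: Dict[str, Synset] = {
    "organism": Synset("organism", ["உயிரினம்"], "a living thing"),
    "animal": Synset("animal", ["விலங்கு", "மிருகம்"], "a living creature that can move", "organism"),
    "dog": Synset("dog", ["நாய்"], "a domesticated canine", "animal"),
    "cat": Synset("cat", ["பூனை"], "a small domesticated feline", "animal"),
}

# Reverse index: word -> ids of the synsets containing it (a word may have several senses).
INDEX: Dict[str, List[str]] = {}
for synset in SYNSETS.values():
    for lemma in synset.lemmas:
        INDEX.setdefault(lemma, []).append(synset.synset_id)


def synonyms(word: str) -> List[str]:
    """Other lemmas that share a synset with the word."""
    result = []
    for sid in INDEX.get(word, []):
        result.extend(lemma for lemma in SYNSETS[sid].lemmas if lemma != word)
    return result


def hypernym_chain(word: str) -> List[str]:
    """Follow is-a links upward from the word's first sense."""
    senses = INDEX.get(word, [])
    chain: List[str] = []
    sid = senses[0] if senses else None
    while sid is not None:
        chain.append(sid)
        sid = SYNSETS[sid].hypernym
    return chain


print(synonyms("விலங்கு"))    # ['மிருகம்']
print(hypernym_chain("நாய்"))  # ['dog', 'animal', 'organism']
```

Keeping relations as synset ids rather than as words is what lets several languages share one structure, the idea behind linking wordNets across languages.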
3.1.5. Grammars
Tamil has a rich resource of grammars, ranging from those based on traditional principles to those based on modern principles, including Chomskyan transformational generative grammar. Formalisms based on Phrase Structure Grammar (PSG), Context Free Grammar (CFG), Context Sensitive Grammar (CSG), Tree Adjoining Grammar (TAG), Lexical Functional Grammar (LFG), Dependency Grammar, Paninian grammar and Tolkappiyam have been used for computing Tamil. The Paninian formalism has been used in the ILMT project, and the TAG formalism in the EILMT project promoted by CDAC, Pune. Amrita University has used Dependency Grammar to develop its own MT system between English and Tamil and a shallow parser for Tamil.
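To show what a context-free grammar (CFG), one of the formalisms listed above, does in practice, here is a minimal CYK recognizer. The toy grammar in Chomsky normal form, its lexicon and its category names (S, NP, VP, V) are illustrative assumptions that encode only the canonical subject-object-verb order; a realistic Tamil grammar would also have to license the scrambled orders that Tamil's relatively free word order allows.

```python
from itertools import product

# Toy lexicon and binary rules (Chomsky normal form); illustrative assumptions only.
LEXICON = {
    "அவன்": {"NP"},       # avan 'he'
    "புத்தகம்": {"NP"},    # puttakam 'book'
    "படித்தான்": {"V"},    # paTittAn 'read (past, 3rd person masculine singular)'
}
RULES = {
    ("NP", "VP"): {"S"},  # S  -> NP VP  (subject + predicate)
    ("NP", "V"): {"VP"},  # VP -> NP V   (object precedes the verb)
}


def cyk_recognize(words):
    """Return True if the toy grammar derives the word sequence from S."""
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds the categories that can span words[i:j].
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        table[i][i + 1] = set(LEXICON.get(word, set()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # every split point of words[i:j]
                for left, right in product(table[i][k], table[k][j]):
                    table[i][j] |= RULES.get((left, right), set())
    return "S" in table[0][n]


print(cyk_recognize(["அவன்", "புத்தகம்", "படித்தான்"]))  # True  (subject-object-verb)
print(cyk_recognize(["அவன்", "படித்தான்", "புத்தகம்"]))  # False (verb not final)
```

The chart stores every category for every span, so the same table would also reveal multiple analyses when a richer, ambiguous grammar is used.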

3.2. Language Technology: Tools, Technologies and Applications

3.2.1. Speech recognition
Speech to text (STT), i.e. converting speech to text, is a challenging task in NLP. Yegnanarayana (Nayeemulla Khan and Yegnanarayana 2001), Hema A. Murthy, A.G. Ramakrishnan and T. Nagarajan laid the foundation for converting speech into text. A.G. Ramakrishnan developed an automatic speech recognition system for Tamil using neural networks and finite state transducers to recognize Tamil utterances covering a large vocabulary. T. Nagarajan developed a "Speech-Input Speech-Output Communication Aid (SISOCA) for Speakers with Cerebral Palsy" with financial support from DST-TIDE. Google AdMob has software called "Speech to Text Tamil", which recognizes speech spoken into a microphone and types it automatically as Tamil text. The typed text can be saved and used anywhere. This software works only in the Google Chrome browser (version 25 or higher). Google Play offers a Tamil speech-to-text converter. The Azhagi Android App - Voice Input provides STT (voice input, or speech recognition) in over 100 languages: apart from typing, users can speak to create their texts in multiple languages, including Tamil. Thangarajan of Kongu College of Engineering, Coimbatore, has also contributed to the development of STT systems for Tamil (Thangarajan et al 2008a, 2008b).

3.2.2. Speech synthesis
Though speech recognition, i.e. speech-to-text conversion, is yet to reach an appreciable level, speech synthesis, i.e. text to speech (TTS), has reached a commendable position. IIT Madras has made a major contribution to this line of technological development of Tamil. IISc, Bangalore, has also contributed to speech analysis. The initiative taken by CIIL, Mysore, to develop a speech corpus for Tamil deserves appreciation in this context. There are many attempts to convert written text into speech.
The pioneering institutions involved in developing TTS synthesis are IIT Madras, the International School of Dravidian Linguistics (ISDL), Trivandrum, and IISc, Bangalore. ISDL developed a very efficient TTS synthesis system for Tamil (Subramoniam 2001). Research on Tamil TTS synthesis was initiated by Yegnanarayana, who was at IIT Madras at that time. His student Hema A. Murthy followed him and contributed a great deal to this line of research; she has developed a commendable text-to-speech system for Tamil. The input to a TTS system is not always pure text: it may contain acronyms, abbreviations and non-standard words, which must first be converted into the corresponding Tamil graphemic form. A text normalization module has been developed for this task. T. Nagarajan has undertaken a project entitled 'Development of Text-to-Speech Synthesis Systems for Indian Languages - High Quality TTS and Small Footprint TTS Integrated with Disability Aids'. He has developed "Speech Assistive Aids for Visually-Challenged People" with financial assistance from Tamil Virtual Academy (TVA), Chennai. He also developed an HMM-based Text-to-Speech Synthesis System for Malaysian Tamil with funding from Murasu Systems Sdn Bhd, Malaysia. Many types of speech recognizers have also been developed, such as a syllable-based continuous speech recognizer, a syllable-based isolated word recognizer, and a hybrid modeling algorithm for continuous Tamil speech recognition. A.G. Ramakrishnan of IISc, Bangalore, developed a text-to-speech synthesizer for Tamil (Ramakrishnan 2007). TTS software able to render machine-readable text into human voice was developed under a consortium-mode project led by IIT Madras. The TTS software allows people with visual impairments or reading disabilities to listen to written works on a computer or a mobile device.
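The text normalization step mentioned above can be sketched as a token-by-token rewrite that runs before synthesis. In this minimal sketch, the abbreviation table (ரூ. for ரூபாய் 'rupees', கி.மீ. for கிலோமீட்டர் 'kilometre') and the digit-by-digit reading of numbers are simplifying assumptions; a production normalizer would read 25 as a full number word and would also handle dates, times, Tamil numerals and mixed-script tokens.

```python
# Spoken Tamil names of the ASCII digits.
DIGITS = {
    "0": "பூஜ்ஜியம்", "1": "ஒன்று", "2": "இரண்டு", "3": "மூன்று", "4": "நான்கு",
    "5": "ஐந்து", "6": "ஆறு", "7": "ஏழு", "8": "எட்டு", "9": "ஒன்பது",
}

# Hypothetical abbreviation table; a real system would use a much larger, curated list.
ABBREVIATIONS = {
    "ரூ.": "ரூபாய்",          # rupees
    "கி.மீ.": "கிலோமீட்டர்",   # kilometre
}


def normalize(text: str) -> str:
    """Rewrite non-standard tokens into Tamil words a synthesizer can pronounce."""
    out = []
    for token in text.split():
        if token in ABBREVIATIONS:
            out.append(ABBREVIATIONS[token])
        elif all(ch in DIGITS for ch in token):
            # Simplification: read the number digit by digit.
            out.extend(DIGITS[ch] for ch in token)
        else:
            out.append(token)
    return " ".join(out)


print(normalize("ரூ. 25"))     # ரூபாய் இரண்டு ஐந்து
print(normalize("5 கி.மீ."))   # ஐந்து கிலோமீட்டர்
```

Keeping the rules in lookup tables lets the normalizer grow without changing the synthesis code that consumes its output.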
A TTS system integrated with a screen reader would enable visually challenged users to interpret and perform computer operations through an audio interface. Thus, a TTS system integrated with a screen reader would be a potential assistive technology that helps visually challenged and learning-disabled people to use the benefits of ICT and knowledge sharing.

3.2.3. Grammatical analysis
Grammatical analysis of natural language mainly involves phonological, morphological and syntactic analysis. Tamil is a relatively free word order, verb-final, inflectional language. Third person pronouns show singular-plural and masculine-feminine distinctions. In Tamil, the participants of a sentence that depend on the verb (such as subject, object, indirect object, instrument and location) are identified by the case markers they take. As Tamil is an agglutinative language, words are formed by combining several morphemes. A Tamil word consists of a root combined with other grammatical accretions. Irrespective of the length, complexity and type of a Tamil word, its root can be traced to the monosyllabic level by carefully removing successive accretions. Traditionally, a Tamil word is divided into a maximum of six parts, namely pakuthy (prime-stem), sandhi (junction), vihaaram (variation), iTainilai (middle part), saariyai (enunciator) and vikuti (terminator), in that order. For example, the word naTantanan, meaning '(He) walked', is made up of the morphemes naTa + t(n) + t + an + an, where the sandhi t changes to n (vihaaram). The middle part and the terminator are grammatical additions to the prime-stem: the middle part marks tense, and the terminator marks gender (along with person and number). Usually, the prime-stem is the main part of the word responsible for its meaning.

3.2.3.1. Word segmentation or tokenization
Tokenization is the process of breaking up a given text into units called tokens. Tokens may be words, numbers or punctuation marks. Tokenization does this by locating word boundaries, the points where one word ends and the next begins. Tokenization is also known as word segmentation: separating a chunk of continuous text into separate words. For a language like English, this is fairly trivial, since words are usually separated by spaces. In agglutinative languages like Tamil, however, a single space-delimited word can combine several morphemes, and sandhi can fuse adjacent words, so text segmentation is a significant task requiring knowledge of the vocabulary and morphology of the language. Tokenization is also used in tasks such as bag-of-words (BOW) creation in data mining and word sense disambiguation (WSD). All MT systems oriented towards Tamil have to start with word segmentation or tokenization.

3.2.3.2. Stemming
In linguistic morphology and information retrieval, stemming is the process of reducing inflected (or sometimes derived) words to their word stem, base or root form, generally a written word form. The stem need not be identical to the morphological root of the word; it is usually sufficient that related words map to the same stem, even if this stem is not in itself a valid root. Algorithms for stemming have been studied in computer science since the 1960s. Many search engines treat words with the same stem as synonyms, as a kind of query expansion called conflation. A computer program or subroutine that stems words may be called a stemming program, stemming algorithm or stemmer. A number of stemmers have been built for Tamil (Thnkarasu and Manavalan 2013).
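The tokenization and stemming steps described above can be sketched together: a regular-expression tokenizer followed by a longest-suffix-stripping stemmer. The suffix list (plural கள் and two case forms built on it) is a hypothetical sample, and a stemmer this simple ignores sandhi changes at the stem boundary. The explicit Tamil block range (U+0B80 to U+0BFF) is used instead of \w so that vowel signs and the pulli stay inside the token, since Python's \w may not treat those combining marks as word characters.

```python
import re

# Runs of Tamil-block characters (letters, vowel signs, pulli), ASCII digits,
# Latin letters, or any single other non-space character (punctuation).
TOKEN_RE = re.compile(r"[\u0B80-\u0BFF]+|[0-9]+|[A-Za-z]+|\S")

# Hypothetical suffix sample: plural -kaL and two case forms built on it.
SUFFIXES = ["களுக்கு", "களை", "கள்"]


def tokenize(text):
    """Split text into word, number and punctuation tokens."""
    return TOKEN_RE.findall(text)


def stem(word, min_stem=2):
    """Strip the longest matching suffix, keeping at least min_stem code points."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


tokens = tokenize("மாணவர்கள் வந்தார்கள்.")
print(tokens)                                  # ['மாணவர்கள்', 'வந்தார்கள்', '.']
print([stem(t) for t in tokens])               # ['மாணவர்', 'வந்தார்', '.']
print(stem("மாணவர்களுக்கு"), stem("மாணவர்களை"))  # மாணவர் மாணவர்
```

Trying longer suffixes first keeps களுக்கு from being cut at the shorter கள், so the three inflected forms of மாணவர் 'student' all conflate to one stem.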
These stemmers work on corpora or texts that have been segmented or tokenized into words.

3.2.3.3. Lemmatization
Lemmatisation (or lemmatization) in linguistics is the process of grouping together the inflected forms of a word so they can be analysed as a single item, identified by the word's lemma, or dictionary form. In computational linguistics, lemmatisation is the algorithmic process of determining the lemma of a word based on its intended meaning. Unlike stemming, lemmatisation depends on correctly identifying the intended part of speech and meaning of a word in a sentence, as well as within the larger context surrounding that sentence, such as neighboring sentences or even an entire document. As a result, developing efficient lemmatisation algorithms is an open area of research. A computer system performing lemmatization is called a lemmatizer. Amrita University, AUKBC and Anna University have developed lemmatizers for Tamil.

3.2.3.4. Morphological Analysis
Morphological analysis is the process of separating words into individual morphemes and identifying the class of the morphemes. The difficulty of this task depends greatly on the complexity of the morphology (i.e. the structure of words) of the language being considered. English has fairly simple morphology, especially inflectional morphology, so it is often possible to ignore this task entirely and simply model all possible forms of a word (e.g. "open, opens, opened, and opening") as separate words. In a highly agglutinative Indian language such as Tamil, however, such an approach is not possible, as each dictionary entry has thousands of possible word forms. A computer system performing morphological analysis is called a morphological analyzer. Developing a morphological analyzer is crucial for many NLP applications, including machine translation (MT).
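As a concrete illustration of segmentation-based morphological analysis, the sketch below splits a romanized Tamil verb into root, tense marker and person-number-gender (PNG) suffix and returns every split its tables allow. It uses the document's transliteration style, where capitals mark long vowels and some consonants (T = ட, R = ற, L = ள). The two-verb lexicon and the suffix tables are hypothetical, and the sketch deliberately does no filtering, so it shows the candidate over-generation that rule-based post-processing has to remove.

```python
# Hypothetical toy tables in a romanized transliteration.
ROOTS = {"naTa": "walk", "paTi": "read"}
TENSE = {"nt": "PAST", "tt": "PAST", "kkiR": "PRES", "pp": "FUT"}
PNG = {"An": "3SG.M", "AL": "3SG.F", "En": "1SG"}


def analyze(word):
    """Return every (root, tense, PNG) segmentation that the tables allow."""
    candidates = []
    for i in range(1, len(word)):
        root, rest = word[:i], word[i:]
        if root not in ROOTS:
            continue
        for j in range(1, len(rest)):
            tense, png = rest[:j], rest[j:]
            if tense in TENSE and png in PNG:
                candidates.append((root, TENSE[tense], PNG[png]))
    return candidates


print(analyze("naTantAn"))    # [('naTa', 'PAST', '3SG.M')]  நடந்தான் 'he walked'
print(analyze("paTikkiREn"))  # [('paTi', 'PRES', '1SG')]    படிக்கிறேன் 'I read'
print(analyze("naTattAn"))    # [('naTa', 'PAST', '3SG.M')]  over-generated: not a real word
```

The last call accepts a non-word because the tables do not record which tense marker each verb class takes; that is the kind of error grammar rules must filter out.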
Most current morphological analyzers for Tamil use segmentation to deconstruct a word and generate all possible candidates, and then apply either grammar rules or tag mismatches during post-processing to select the best candidate. The first morphological analyzer for Tamil was developed under the Anusaraka project undertaken at IIT Kanpur. Rajendran made the first morphological analyzer for Tamil based on the framework formed for the module in the Anusaraka project. It was later improved by AUKBCRC. The major contributions to developing morphological analyzers for Tamil came from CIIL (Ganesan 1994; Ganesan and Fransis Ekka 1994), AUKBCRC (Viswanathan 2000; Viwanathan et al 2003; Vijay Sundar Ram et al 2010; Marimuthu et al 2013), Anna University (Anandan et al 2002), Amrita University, Coimbatore (Menon 2009; Anand Kumar et al 2009; Anand Kumar et al 2010a; Anand Kumar et al 2010b; Dhanalakshmi et al 2009e; Manone et al 2009; Anand Kumar et al 2014), Tamil University (Rajendran 1999; Rajendran et al 2003; Rajendran 2006), Central University, Hyderabad (Arulmozi 1998; Ramaswamy 2003; Winston Cruz 2002), Madras University (Deivasundram and Gopal 2003) and Annamalai University (Ganesan 2003). There are some individual efforts too, for example Deyvasundram's Tamil Word Processor and Ganesan's morphological analyzer. Ahilan et al have developed a rule-based morphological analyzer for classical texts (Ahilan et al 2015). Different types of morphological analyzers have been developed for Tamil using different approaches, such as rule-based, sequence labelling and machine learning approaches. There are a few notable contributions from abroad also (Vasu 1997; Lokanathan et al Lushanthan 2014; Mokanarangan et al 2016; Ranganathan 2016b).

3.2.3.5. Morphological generation
Morphological generation is the inverse of morphological analysis: the process of converting the internal representation of a word into its surface form. A computer system performing morphological generation is called a morphological generator. In most machine translation systems, morphological generation is the final stage, performed by the morphological generator. To generate a word form for a specific grammatical category, the corresponding suffixes have to be concatenated with the root word. The morphological generator takes its input from the previous module: the root word along with its grammatical features. The generator then inflects the root word according to the morphology of the language and outputs the target language word form. The words thus generated are concatenated to form the complete target language sentence. The machine translation systems targeting Tamil have developed morphological generators or word generators. The IL-ILMT consortium project and the EILMT consortium project contributed to the development of morphological generators for Tamil.
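The generation step just described (root plus grammatical features in, surface form out, then words joined into a sentence) can be sketched with lookup tables. This minimal sketch uses a romanized transliteration and hypothetical tables for two verbs, naTa 'walk' and paTi 'read'. It stores a separate stem per tense for each verb because Tamil verbs differ in which tense marker they take, so a single global tense suffix would produce wrong forms.

```python
# Hypothetical per-verb tense stems (romanized); the tense marker depends on the verb class.
TENSE_STEMS = {
    "naTa": {"PAST": "naTant", "PRES": "naTakkiR", "FUT": "naTapp"},
    "paTi": {"PAST": "paTitt", "PRES": "paTikkiR", "FUT": "paTipp"},
}
# Person-number-gender endings.
PNG_SUFFIX = {"3SG.M": "An", "3SG.F": "AL", "1SG": "En"}


def generate(root, tense, png):
    """Inflect a root from its grammatical features into a romanized surface form."""
    return TENSE_STEMS[root][tense] + PNG_SUFFIX[png]


print(generate("naTa", "PAST", "3SG.M"))  # naTantAn    (நடந்தான் 'he walked')
print(generate("paTi", "FUT", "1SG"))     # paTippEn    (படிப்பேன் 'I will read')
print(generate("naTa", "PRES", "3SG.F"))  # naTakkiRAL  (நடக்கிறாள் 'she walks')

# The generated words are then joined into the target sentence.
print(" ".join(["avan", generate("naTa", "PAST", "3SG.M")]))  # avan naTantAn 'he walked'
```

An unknown root or feature raises KeyError here; a real generator would need a fallback, such as copying the root unchanged.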
CIIL, Central University, Hyderabad (Winston Cruz 2002; Ramaswamy 2000), AUKBCRC (Menaka et al 2018), Amrita University (Menon 2009; Anand Kumar et al 2010) and Anna University (Anandan et al 2001) have developed morphological generators for Tamil to meet the requirements of their NLP projects. Duraipandi of Ultimate Software Solutions, Dindigul, has developed a 'Morphological Generator and Parsing Engine for Tamil Verb Forms' (Duraipandi 2006).

3.2.3.6. Part-of-speech tagging
Part-of-speech (POS) tagging determines the part of speech of each word in a given sentence. Many words, especially common ones, can serve as multiple parts of speech. Some languages have more such ambiguity than others. Languages with little inflectional morphology, such as English, are particularly prone to such ambiguity. Chinese is prone to such ambiguity because it is a tonal language. Tamil is an inflectionally rich language. POS tagging can remove categorial ambiguity (ambiguity due to the presence of a homophonous lexical item belonging to two parts of speech). A number of tagsets have been developed for POS tagging Indian languages, including Tamil. Microsoft India attempted to develop 'A Common Parts-of-Speech Tagset Framework for Indian Languages' (Sankaran Baskaran et al 2008), which is a hierarchical tagset. The other kind of tagset available for Tamil is a flat tagset. TDIL developed the BIS tagset for Indian languages, which has been followed in all DeitY-funded projects, including the Indian Language Corpora Initiative (ILCI) project. A system that determines the part of speech of each word is called a POS tagger. POS taggers have been developed for Tamil by many institutions and individuals. TDIL has developed a POS tagger for Indian languages including Tamil, which uses the BIS tagset.
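A simple machine-learning baseline for POS tagging is to give each known word its most frequent training tag, and to back off to the tags of words sharing the same ending for unknown words, a reasonable cue in an agglutinative language like Tamil. In this minimal sketch the tag labels follow the BIS naming style (PR_PRP, N_NN, V_VM, RD_PUNC), but the two-sentence training set, the suffix length and the default tag are illustrative assumptions; real Tamil taggers are trained on large annotated corpora with statistical or neural models.

```python
from collections import Counter, defaultdict

# Two-sentence toy training set with BIS-style tags (illustrative only).
TRAIN = [
    [("அவன்", "PR_PRP"), ("வீட்டுக்கு", "N_NN"), ("வந்தான்", "V_VM"), (".", "RD_PUNC")],
    [("அவள்", "PR_PRP"), ("புத்தகம்", "N_NN"), ("படித்தாள்", "V_VM"), (".", "RD_PUNC")],
]
MAX_SUFFIX = 3  # longest word-final code-point sequence used for back-off


def train(sentences):
    """Count tags per word and per word-final suffix (1..MAX_SUFFIX code points)."""
    word_tags, suffix_tags = defaultdict(Counter), defaultdict(Counter)
    for sentence in sentences:
        for word, tag in sentence:
            word_tags[word][tag] += 1
            for n in range(1, MAX_SUFFIX + 1):
                if len(word) > n:
                    suffix_tags[word[-n:]][tag] += 1
    return word_tags, suffix_tags


def tag(words, word_tags, suffix_tags, default="N_NN"):
    """Most-frequent-tag lookup, backing off to the longest known suffix, then a default."""
    tagged = []
    for word in words:
        if word in word_tags:
            best = word_tags[word].most_common(1)[0][0]
        else:
            best = default
            for n in range(MAX_SUFFIX, 0, -1):
                if word[-n:] in suffix_tags:
                    best = suffix_tags[word[-n:]].most_common(1)[0][0]
                    break
        tagged.append((word, best))
    return tagged


word_tags, suffix_tags = train(TRAIN)
print(tag(["அவன்", "புத்தகம்", "படித்தான்", "."], word_tags, suffix_tags))
# [('அவன்', 'PR_PRP'), ('புத்தகம்', 'N_NN'), ('படித்தான்', 'V_VM'), ('.', 'RD_PUNC')]
```

The unseen verb படித்தான் is tagged correctly only because it shares its ending with the training word வந்தான்; this is the sense in which rich inflection helps a Tamil tagger.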
CIIL, AUKBCRC (Arulmozhi et al 2004; Pattabhi 2013), Amrita University (Anand Kumar et al; Dhanalakshmi et al 2008; Chandrakanth et al 2016), Anna University (Pandian et al 2009; Ganesh et al 2014) and a few individuals have developed POS taggers for Tamil. There are also a few notable contributions from abroad (Ranganathan 2001; Thyaparan 2018), and there have been attempts to improve the existing POS tagging systems (Selvam et al 2009). Two main types of approach are used for POS tagging: the rule-based approach and the machine learning approach (Anand Kumar et al; Dhanalakshmi et al). Rajendran (2007) discusses the complexity of POS tagging in Tamil.

3.2.3.7. Syntactic parsing

Parsing determines the parse tree (grammatical analysis) of a given sentence. The grammar of a natural language is ambiguous, and typical sentences have multiple possible analyses; in fact, a typical sentence may have thousands of potential parses, most of which will seem completely nonsensical to a human. There are two primary types of parsing: dependency parsing and constituency parsing. Dependency parsing focuses on the relationships between words in a sentence (marking things like primary objects and predicates), whereas constituency parsing focuses on building the parse tree, often using a Probabilistic Context-Free Grammar (PCFG).

AUKBCRC developed a parse representation for Tamil (Kumara Shanmugam 2004). Various types of syntactic parsers have been attempted for Tamil: lexical and statistical parsing, dependency parsing, TAG-formalism-based parsing (Manone et al), shallow parsing (Ariaratnam et al 2014), structural parsing using a phrase structure hybrid language model (Selvam et al 2008, 2009), semantic parsing (Balaji et al 2012), Penn Treebank-based syntactic parsing (Antony et al 2010), and dependency parsing using rule-based and corpus-based approaches (Ramasamy et al 2011).

3.2.3.8. Chunking

Chunking is adopted in place of full-fledged syntactic parsing, which is time-consuming and uneconomical; for NLP activities such as machine translation, chunking sentences is often enough. A chunker divides a sentence into its major non-overlapping phrases and attaches a label to each chunk. Chunkers differ in their precise output and in the way a chunk is defined: many do more than simple chunking, while others find only noun phrases (NPs). Chunking falls between tagging and full parsing. The structure of individual chunks is fairly easy to describe, while the relations between chunks are harder to describe and depend more on individual lexical properties. A chunker first tokenizes and tags the sentence. A typical chunk consists of a single content word surrounded by a constellation of function words, and chunks are normally taken to be non-recursive groups of correlated words. Tamil, being an agglutinative language, has a complex morphological and syntactic structure. It is a relatively free word order language, but within phrases and clauses it behaves like a fixed word order language. Chunking in Tamil is therefore less complex than POS tagging.
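The rule-based view of chunking (a non-recursive group of words around one content word, identified from POS tags) can be sketched as a greedy left-to-right matcher. The tag names (DET, ADJ, NOUN, VERB, AUX), the two patterns and the romanized example 'raaman oru puttakam paTittuk koNTiruntaan' ('Raman was reading a book') are assumptions for illustration only; a practical rule-based chunker uses a proper tagset and far more patterns.

```python
def chunk(tagged):
    """Greedy left-to-right chunking over (word, tag) pairs."""
    chunks = []
    i, n = 0, len(tagged)
    while i < n:
        # NP pattern: optional DET, any number of ADJ, then one or more NOUN.
        j = i
        if tagged[j][1] == "DET":
            j += 1
        while j < n and tagged[j][1] == "ADJ":
            j += 1
        k = j
        while k < n and tagged[k][1] == "NOUN":
            k += 1
        if k > j:
            chunks.append(("NP", [w for w, _ in tagged[i:k]]))
            i = k
            continue
        # VP pattern: a main verb followed by any auxiliaries.
        if tagged[i][1] == "VERB":
            k = i + 1
            while k < n and tagged[k][1] == "AUX":
                k += 1
            chunks.append(("VP", [w for w, _ in tagged[i:k]]))
            i = k
            continue
        # Anything else becomes a one-word chunk labelled "O".
        chunks.append(("O", [tagged[i][0]]))
        i += 1
    return chunks


sentence = [("raaman", "NOUN"), ("oru", "DET"), ("puttakam", "NOUN"),
            ("paTittuk", "VERB"), ("koNTiruntaan", "AUX")]
print(chunk(sentence))
```

Because each pattern only looks at a flat tag sequence and never nests, the output is a list of non-overlapping, non-recursive chunks, matching the definition above.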
AUKBCRC developed a noun phrase chunker for Tamil (Sobha and Vijay Sundar Ram 2006). An elaborate work on chunking in Tamil was undertaken by Dhanalakshmi (2010; Dhanalakshmi et al 2019).

3.2.3.9. Shallow parsing

A pioneering research work on shallow parsing in Tamil was undertaken by Dhanalakshmi (2010), whose main aim was to develop a shallow parser for Tamil. The shallow parser consists of the following four modules: (i) a part-of-speech tagger, which tags the tokenized sentences with POS tags; (ii) a morphological analyzer, which analyzes the words into morphemes with English glosses; (iii) a chunker, which divides a sentence into its major non-overlapping phrases (noun phrase, verb phrase, etc.) and attaches a label to each chunk; and (iv) a dependency parser for relation finding, which, given the POS tags and chunks in a sentence, decides which relation each has with the main verb (subject, object, location, etc.). The shallow parser and its modules, developed using a machine learning approach, show reliable results. The morphological analyzer gives an accuracy of 95% when tested on a real-time corpus of 10,000 words. The Tamil POS tagger built with SVMTool gives an overall accuracy of 94.12%. The dependency parsing tools MaltParser and MSTParser were trained on a corpus of 25,000 words; the testing accuracy was 90.19% for MaltParser and 81.13% for MSTParser. The results of POS tagging and chunking affect the overall accuracy of the dependency parser.

Ariaratnam and others (Ariaratnam et al 2014) have attempted to build a shallow parser that assigns a partial structure to natural language sentences in order to recover useful syntactic information from Sri Lankan Tamil sentences.
Their parser combines a maximum entropy based part-of-speech (POS) tagger, which automatically labels each word in a sentence with the appropriate POS tag, and a rule-based chunker, which segments sentences into syntactically correlated word groups without the need for a large annotated corpus. To do this, they developed a tagset of 20 POS tags using expert input, manually annotated a corpus of approximately 12,500 words, and identified 390 chunk patterns for extracting chunks. Their POS tagger and chunker achieved promising F-measures of 81.72% and 78.3% respectively, while the combined shallow parser achieved an F-measure of 66.6% owing to error propagation.

3.2.3.10. Clause boundary identification

A clause is the minimal grammatical unit that can express a proposition. It is a sequential group of words containing a verb or a verb group (a verb and its auxiliaries) and its arguments, which may be explicit or implicit (Vijay Sundar Ram and Sobha 2008). This makes the clause an important unit in grammar and underlines the need to identify and classify clauses in linguistic analysis. Analysing and processing complex sentences is far more challenging than processing simple sentences, and NLP applications often perform poorly as sentence complexity increases. A complex sentence cannot be processed correctly unless its clauses are properly identified and classified according to their syntactic function in the sentence. The performance of many NLP systems, such as machine translation, parallel corpus alignment, information extraction, syntactic parsing, automatic summarization and speech applications, improves when clause boundaries are marked in a sentence. AUKBCRC has undertaken pioneering work on clause boundary identification in Tamil (Vijay Sundar Ram and Sobha 2008; Vijay Sundar Ram et al 2012). At Amrita University, Dhivya et al (2011) attempted clause boundary identification for Tamil using dependency parsing.

3.2.3.11. Optical character recognition (OCR)

Given an image of printed text, OCR determines the corresponding text. Many OCR systems have been developed for Tamil. Krishnamoorthy is known for developing OCR for Tamil. Amrita University has developed an OCR system for Tamil of average quality. Google offers a comparatively efficient online OCR system for Tamil that works fairly well. The Indian Language Technology Proliferation and Deployment Centre (ILTPDC) offers a downloadable OCR named e-Aksharayan (Tamil OCR). e-Aksharayan is desktop software for converting scanned printed Indian-language documents into fully editable text in Unicode encoding; it works on Windows 7, 8 and 10 (TDIL website). IISc Bangalore developed a complete OCR system for Tamil (Aparna and Ramakrishnan). The system is font-independent and size-independent. A precise skew detection approach estimates the skew angle to within ±0.06° of the actual value. Skew correction is performed on the gray-level image to avoid quantization effects and to reduce distortion of the characters. Geometric moments and discrete cosine transforms are used as features, and the classifier is a nearest-neighbour classifier. The overall recognition accuracy is around 98%.

3.2.4. Semantic analysis

Semantics, the study of meaning, covers some of the most complex tasks, such as finding synonyms, word sense disambiguation, building question-answering systems, translating from one natural language to another, and populating knowledge bases. In general, morphological and syntactic analysis must be completed before any semantic problem can be tackled.

3.2.4.1. Word Sense Disambiguation

Many words have more than one meaning. A word sense disambiguation (WSD) system chooses the correct sense of a word in a given context. WSD is important for applications such as machine translation, information retrieval and question answering. For example, in English-to-Tamil machine translation, a word like 'bank' has to be disambiguated so that it is translated correctly, that is, so that the correct Tamil word is chosen. Many have attempted to develop WSD systems for Tamil. A full-fledged work on WSD in Tamil was undertaken by Baskaran (2002). Rajendran (2014) and Rajendran and Anand Kumar (2014) also attempted WSD in Tamil with limited scope. Anand Kumar et al (2014) propose WSD in Tamil using support vector machines with rich features. An interesting and useful WSD system with a narrowly defined aim (cross-lingual preposition disambiguation for machine translation) is presented in Anand Kumar et al (2014, 2015). Santosh Kumar (2016) proposed WSD using the Semantic Web for Tamil-to-English statistical machine translation, a technique that improves existing statistical machine translation methods by making use of semantic web technology.

3.2.4.2. Question answering

Given a human-language question, a question-answering system determines its answer. Typical questions have a specific right answer (such as "What is the capital of India?"), but open-ended questions (such as "What is the meaning of life?") are sometimes also considered. Recent work has looked at even more complex questions. A few question-answering systems have been developed for Tamil, and Sowmaya (2013) developed an automatic question generator system for Tamil.
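A common component of retrieval-based question answering is an answer-selection step that scores candidate sentences against the question. The sketch below uses plain word overlap on an English passage; the stop-word list, the passage and the scoring function are illustrative assumptions, not the method of any Tamil system cited here. A Tamil system would need a Tamil-aware tokenizer and morphological analysis so that inflected forms of the same word can match.

```python
import re

# Illustrative stop-word list; a real system would use a curated one.
STOPWORDS = {"what", "is", "the", "of", "a", "an", "which", "who", "in"}


def content_words(text):
    """Lower-cased word set with stop words removed."""
    return set(re.findall(r"\w+", text.lower())) - STOPWORDS


def best_answer_sentence(question, passage):
    """Return the passage sentence sharing the most content words with the question."""
    q = content_words(question)
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", passage) if s.strip()]
    return max(sentences, key=lambda s: len(q & content_words(s)))


PASSAGE = ("Tamil is a Dravidian language. New Delhi is the capital of India. "
           "Chennai is the capital of Tamil Nadu.")
print(best_answer_sentence("What is the capital of India?", PASSAGE))
# New Delhi is the capital of India.
```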

3.2.4.3. Relationship extraction

Given a chunk of text, relationship extraction identifies the relationships among named entities (e.g. who is married to whom). Menaka et al worked on 'Automatic identification of cause-effect relations in Tamil using CRFs'.

3.2.4.4. Sentiment analysis

Sentiment analysis is the extraction of subjective information, usually from a set of documents, often using online reviews to determine the "polarity" towards specific objects. It is especially useful for identifying trends of public opinion in social media, for example for marketing. Sentiment analysis is widely undertaken for Tamil, with different targets and approaches. Arun et al (2015) worked on 'Sentiment analysis of Tamil movie reviews via feature frequency count'. Padmamala et al (2017) worked on 'Sentiment analysis of online Tamil contents using recursive neural network models approach for Tamil language'. Kausikaa N. and V. Uma (2016) worked on 'Sentiment Analysis of English and Tamil Tweets'.

3.2.4.5. Automatic text summarization

An automatic text summarization algorithm produces a readable summary of a text. It is often used to summarize texts of a known type, such as articles in the financial section of a newspaper. A few systems have been developed for automatic summarization of different types of Tamil texts using different methods (Banu et al 2007; Pattabhi and Sobha 2017).

3.2.4.6. Coreference resolution

A coreference resolution algorithm determines which words ("mentions") in a sentence or larger text refer to the same objects ("entities"). Anaphora resolution is a specific case of this task, concerned with matching pronouns with the nouns or names to which they refer. The more general task of coreference resolution also includes identifying so-called "bridging relationships" involving referring expressions. Sobha is the torch bearer of coreference resolution in Tamil (Sobha 2003; Sobha et al 2014a, 2014b, 2015). She has worked on coreference resolution in Tamil in various dimensions, such as anaphora resolution and pronominal resolution.
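The anaphora-resolution subtask can be illustrated with a simple recency-plus-agreement baseline: link each pronoun to the nearest preceding noun mention that agrees with it in gender and number. This is a toy sketch, not the CRF-based or rule-based methods of the works cited in this section. The Mention fields, feature values and the hand-built mention list for a short romanized Tamil discourse (avan 'he', aval 'she') are assumptions; a real system would obtain them from a morphological analyzer and would also weigh syntactic and salience cues.

```python
from dataclasses import dataclass


@dataclass
class Mention:
    text: str
    position: int           # illustrative token index in the discourse
    gender: str             # "m", "f" or "n" (neuter)
    number: str             # "sg" or "pl"
    is_pronoun: bool = False


def resolve_pronouns(mentions):
    """Link each pronoun to the nearest preceding agreeing non-pronoun mention."""
    links = {}
    for i, m in enumerate(mentions):
        if not m.is_pronoun:
            continue
        for cand in reversed(mentions[:i]):
            if (not cand.is_pronoun and cand.gender == m.gender
                    and cand.number == m.number):
                links[m.position] = cand.text
                break
    return links


# Toy discourse: Raman saw Sita. avan ('he') was happy. aval ('she') laughed.
mentions = [
    Mention("raaman", 0, "m", "sg"),
    Mention("siitaa", 1, "f", "sg"),
    Mention("avan", 3, "m", "sg", is_pronoun=True),
    Mention("aval", 5, "f", "sg", is_pronoun=True),
]
print(resolve_pronouns(mentions))  # {3: 'raaman', 5: 'siitaa'}
```

Tamil pronouns mark gender and number overtly, which is why even this crude agreement filter skips 'siitaa' when resolving 'avan'.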
A number of research works have been undertaken at AUKBCRC under Sobha's supervision (Akilandeswari and Sobha 2013, 2014 and 2015; Vijay Sundar Ram and Sobha 2012, 2013 and 2016). Pattabhi and Sobha (2008) tried to identify similar and co-referring documents across languages.

3.2.4.7. Named entity recognition (NER)

Named entity recognition (NER) is a subtask of information extraction (IE) that locates and categorizes specified entities in one or more texts. NER is also known as entity identification, entity chunking and entity extraction. It is used in many fields of artificial intelligence (AI), including natural language processing (NLP) and machine learning. An NER system determines which items in a stream of text map to proper names, such as people or places, and what the type of each such name is (e.g. person, location, organization). There are a few attempts at building generic NER systems for Tamil, based on machine-learning approaches such as Hidden Markov Models (HMM), Maximum Entropy Markov Models (MEMM), Support Vector Machines (SVM) and Conditional Random Fields (CRF). Pioneering work on named entity recognition in Tamil has been undertaken by AUKBCRC (Vijayakrishna and Sobha 2008; Malarkodi et al 2016; Malarkodi and Sobha 2018; Vijashankar and Sobha). Amrita University has also contributed to developing an NER system for Tamil (Abinaya et al 2014).

3.2.5. Text Generation

Converting information from computer databases or semantic intents into readable human language is called text generation, and a system or algorithm that generates text is called a text generator. A typical text generation system aims to produce text that satisfies a set of pre-stated goals. Such a system is given a knowledge base, which contains the information to be expressed, and a set of goals.
The system then organises this information into sentence-length chunks, realises these chunks as sentences, and prints or speaks the text. Ananth Ramakrishnan (Ananth Ramakrishnan et al 2009; Ananth Ramakrishnan and Sobha 2010) worked on the automatic generation of Tamil lyrics for melodies. Kohilavani et al (2009) developed an automatic Tamil content generation system, which aims at an intelligent tutoring system in Tamil that delivers personalized Tamil content to individual users according to their learning abilities and interests.
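The generation pipeline described above (select information from a knowledge base according to a goal, organise it into sentence-length chunks, realise each chunk as a sentence) can be sketched in a few lines. Everything here is hypothetical: the triple format, the relation names, the English templates and the facts are illustrative, and a Tamil realiser would additionally need a morphological generator of the kind described in section 3.2.3.5.

```python
# Hypothetical knowledge base of (subject, relation, value) triples.
KB = [
    ("Chennai", "capital_of", "Tamil Nadu"),
    ("Chennai", "located_on", "the Coromandel Coast"),
    ("Madurai", "located_on", "the Vaigai river"),
]

# One English template per relation (illustrative only).
TEMPLATES = {
    "capital_of": "is the capital of {}",
    "located_on": "lies on {}",
}


def plan(kb, goal_subjects):
    """Content selection and chunking: group the selected facts by subject."""
    chunks = {}
    for subject, relation, value in kb:
        if subject in goal_subjects:
            chunks.setdefault(subject, []).append((relation, value))
    return chunks


def realise(subject, facts):
    """Turn one chunk of facts into a single sentence."""
    phrases = [TEMPLATES[rel].format(val) for rel, val in facts]
    if len(phrases) == 1:
        body = phrases[0]
    else:
        body = ", ".join(phrases[:-1]) + " and " + phrases[-1]
    return f"{subject} {body}."


def generate_text(kb, goal_subjects):
    return " ".join(realise(s, f) for s, f in plan(kb, goal_subjects).items())


print(generate_text(KB, {"Chennai"}))
# Chennai is the capital of Tamil Nadu and lies on the Coromandel Coast.
```

Separating planning from realisation means the same plan could later be realised in Tamil by swapping the templates and adding inflection.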

Rajeswari et al (2017) propose a 'Language Relationship Model for Automatic Generation of Tamil Stories from Hints'.

3.2.6. Machine Translation

Machine translation automatically translates text from one human language to another. This is one of the most difficult problems and belongs to a class of problems colloquially termed "AI-complete", i.e. requiring all the different types of knowledge that humans possess (grammar, semantics, facts about the real world, etc.) in order to be solved properly.

3.2.6.1. Anusaraka for Tamil

The first MT system for Tamil, named Anusaraka, was developed by Rajendran at IIT Kanpur for Hindi-Tamil translation, based on the Anusaraka framework. The work was later taken up and completed by AUKBCRC.

3.2.6.2. ANUVADAKSH – An Expert English to Indian Language Machine Translation System

This is a collaborative effort of the consortium institutes, integrating four machine translation technologies: TAG (Tree-Adjoining Grammar based MT), SMT (statistical MT), AnalGen (rule-based MT) and EBMT (example-based MT). An Expert English to Indian Languages Machine Translation System (EILMT) is a state-of-the-art solution for translating text from English into Indian languages. ANUVADAKSH has been developed by a consortium of 10 Indian academic and research institutes, led by CDAC-Pune, with the support of the "Technology Development for Indian Languages" (TDIL) Programme of the Department of Electronics & Information Technology (DeitY), Government of India. The system is being developed for six language pairs: English to Hindi, Bengali, Marathi, Urdu, Tamil and Oriya. It is intended to serve the multilingual community, initially for domain-specific expressions in tourism, and subsequently in other domains in a phased manner.

3.2.6.3. Indian Language to Indian Language Machine Translation Systems (ILILMT systems)

The consortium project funded by DeitY has developed MT systems for Indian languages. Under this project a few Tamil-oriented MT systems have been developed: Hindi-Tamil, Telugu-Tamil and Malayalam-Tamil MT systems, each in both directions.

3.2.6.4. Amrita University's contributions to developing MT systems

Amrita University, Coimbatore, has initiated a number of projects for developing Tamil-oriented MT systems. It undertook English-to-Tamil MT under the EILMT consortium project; this work is based on Tree-Adjoining Grammar (TAG). Anand Kumar (2015), as part of his PhD work, developed a statistical machine translation system for English-Tamil transfer based on computational linguistic tools. Anand Kumar et al (2010) proposed a factored statistical machine translation system for English to Tamil. Under the project "Computational Tools for Teaching Tamil", funded by Tamil Virtual Academy, Amrita University has developed a rule-based English-Tamil machine translation system (Rajendran and Anand Kumar 2018). The English dependency parse produced by the Stanford parser is converted into a Tamil dependency structure, and the Tamil text is generated using a morpho-syntactic generator. The system is at an early stage and needs further development.

3.2.6.5. AUKBCRC's contribution to developing MT systems

AUKBCRC completed the Anusaraka project for developing the Anusaraka system for Hindi-Tamil translation. Apart from the consortium projects, AUKBCRC has developed a rule-based MT system for English-Tamil translation.

3.2.6.6. Tamil University's contribution to developing MT systems

Tamil University undertook Russian-Tamil translation quite early. Rajendran tried to prepare a Machine Aid to Translate Linguistics (Rajendran; Rajendran and Kamakshi 2004).

The table below consolidates the information discussed above.

Component                             Quantity  Availability  Quality        Coverage       Maturity       Substitutability  Adaptability
Language Technology: Tools, technologies and applications
Speech recognition                    6         4             Average        Average        Average        Average           Average
Speech synthesis                      7         6             Good           Good           Good           Good              Good
Grammatical analysis
  1. Tokenization                     6         4             Good           Good           Good           Good              Good
  2. Stemming                         2         2             Good           Good           Good           Good              Good
  3. Lemmatization                    3         2             Average        Average        Average        Average           Average
  4. Morphological analysis           10        7             Good           Good           Good           Good              Good
  5. Morphological generation         6         3             Good           Good           Good           Good              Good
  6. POS tagging                      10        7             Good           Good           Good           Good              Good
  7. Syntactic parsing                4         2             Average        Average        Average        Average           Average
  8. Chunking                         4         3             Average        Average        Average        Average           Average
  9. Shallow parsing                  2         1             Average        Average        Average        Average           Average
  10. Clause boundary identification  2         1             Average        Average        Average        Average           Average
  11. OCR                             7         3             Average        Average        Average        Average           Average
Semantic analysis
  1. WSD                              4         2             Below average  Below average  Below average  Below average     Below average
  2. Question answering system        2         1             Below average  Below average  Below average  Below average     Below average
  3. Relationship extraction          3         2             Below average  Below average  Below average  Below average     Below average
  4. Sentiment analysis               5         3             Average        Average        Average        Average           Average
  5. Automatic text summarization     3         1             Average        Average        Average        Average           Average
  6. Coreference resolution           4         2             Average        Average        Average        Average           Average
Text generation                       3         1             Below average  Below average  Below average  Below average     Below average
Machine Translation                   10        5             Average        Average        Average        Average           Average
Language Resources: Data and Knowledge bases
Text corpora                          4         4             Average        Average        Average        Average           Average
Speech corpora                        2         2             Below average  Below average  Below average  Below average     Below average
Parallel corpora                      4         3             Below average  Below average  Below average  Below average     Below average
Lexical resources                     10        6             Average        Average        Average        Average           Average
Grammars                              6         6             Good           Good           Good           Good              Good

4. Strategies for future R&D

Tamil is now prepared for full-fledged digitalization as envisioned by the central government. Of course, there is still much unfinished work. For example, the MT systems have yet to be modified to make them suitable for general users, and Tamil WordNet has to be expanded or augmented to bring it on par with the wordnets of European languages. There are many problems with the Unicode slots allotted to Tamil: Tamil needs more slots for proper grammatical analysis in Unicode format, and scholars still convert Tamil into Roman script for grammatical analysis and then back into Tamil script. The WX transliteration scheme adopted for inputting Tamil is notoriously bad, and this problem has to be solved. Many of the computational tools developed so far have to be made available as open source. Language resources such as text corpora, speech corpora and parallel corpora have to be shared and made available to general users. Crowdsourcing can be encouraged to minimize effort. Duplication of work needs to be avoided: resources available to one group have to be shared with others, and there should be combined efforts for the technological development of Tamil. What has been done for the technological development of Tamil is comparatively little when compared to European languages, and we still have a long way to go to achieve the desired goal. All institutions, including private-sector institutions, have to work towards this. Computer scientists and linguists (including language experts) have to work hand in hand towards a consolidated, time-scheduled target. Nowadays Google offers a few NLP systems for Tamil, for example translators and OCR. But Google relies on statistical methods, which are not always reliable, and depending on Google may sometimes be risky. So we have to develop our own NLP tools and resources for the technological development of our languages, such as Tamil. Students are our strength, and they have to be encouraged to work on Tamil NLP.
We have to motivate them towards Tamil NLP. Both the central and state governments have to fund the technological development of Tamil, and tax-payers' money should be used with great care.

3.2.4.6.5. Other contributions

There are many other individual and institutional efforts in building Tamil-oriented MT systems.

References

Abinaya N., John N., Ganesh B. H., Kumar A. M. and Soman K. (2014). "Amrita CEN@FIRE-2014: Named entity recognition for Indian languages using rich features." In: Proceedings of the Forum for Information Retrieval Evaluation, pp. 103–111. ACM.
Akilan R. and Naganathan E. R. (2015). "Morphological Analyzer for Classical Tamil Text: A Rule-Based Approach." ARPN Journal of Engineering and Applied Sciences, Vol. 10, No. 20, November 2015, ISSN 1819-6608, p. 9325. Asian Research Publishing Network (ARPN). www.arpnjournals.com
Akilandeswari A. and Sobha Lalitha Devi (2015). "Tamil Pronominal Resolution Boosted By Sentence Transformation." Aust. J. Basic & Appl. Sci., 9(23): 566-572.
Akilandeswari A. and Sobha Lalitha Devi (2014). "Anaphora Resolution in Tamil Novels." In Rajendra Prasath, Philip O'Reilly and T. Kathirvalavakumar (Eds), Mining Intelligence and Knowledge Exploration, Springer LNAI Vol. 8891, pp. 268-277.
Akilandeswari A. and Sobha Lalitha Devi (2013). "Conditional Random Fields Based Pronominal Resolution in Tamil." International Journal on Computer Science and Engineering, Vol. 5, Issue 6, pp. 601-610.
Anandan P., Rajani Parthasarathy and Geetha T.V. (2001). "Morphological analyzer for Tamil." ICON 2002, RCILTS-Tamil, Anna University, Chennai.
Anandan P., Ranjani Parthasarathi and Geetha T.V. (2001). "Morphological Generator for Tamil." In Tamil Internet 2001 (Tamil Inayam) Conference Proceedings, Malaysia.
Ananth Ramakrishnan A., Sankar Kuppan and Sobha Lalitha Devi (2009). "Automatic Generation of Tamil Lyrics for Melodies." In: Proceedings of the Workshop on Computational Approaches to Linguistic Creativity, Boulder, Colorado: Association for Computational Linguistics, pp. 40–46.
Ananth Ramakrishnan A. and Sobha Lalitha (2010). "An alternate approach towards meaningful lyric generation in Tamil." Proceedings of the Second Workshop on Computational Approaches to Linguistic Creativity, June 5, 2010, Los Angeles, California, pp. 31-39.
Anand Kumar M., Dhanalakshmi V., Soman K.P. and Rajendran S. (2009). "A Novel Approach for Tamil Morphological Analyzer." In: Proceedings of Tamil Internet Conference 2009, Cologne, Germany, pp. 23–35 (October 2009).
Anand Kumar M., Dhanalakshmi V., Rekha R.U., Soman K.P. and Rajendran S. (2010). "A Novel data driven algorithm for Tamil morphological generator." IJCA, Vol. 6, No. 12, pp. 52-56.
Anand Kumar M., Dhanalakshmi V., Soman K.P. and Rajendran S. (2010). "A Sequence Labeling Approach to Morphological Analyzer for Tamil Language." International Journal on Computer Science and Engineering (IJCSE), Vol. 02, No. 06, pp. 1944-1951.
Anand Kumar M., Dhanalakshmi V., Soman K.P. and Sharmiladevi V. (2013). "Improving the Performance of English-Tamil Statistical Machine Translation System using Source-Side Pre-Processing." Proceedings of International Conference on Advances in Computer Science, AETACS, Elsevier.
Anand Kumar M., Rajendran S. and Soman K.P. (2014). "AMRITA@FIRE-2014: Morpheme Extraction for Tamil using Machine Learning (Working notes)." International Workshop "MET shared Task", Forum for Information Retrieval Evaluation (FIRE-2014), Bengaluru.
Anand Kumar M., Rajendran S. and Soman K.P. (2014). "Tamil Word Sense Disambiguation using Support Vector Machines with rich features." International Journal of Applied Engineering Research, ISSN 0973-4562, Volume X, Number X (2014), pp. xxx-xxx. Research India Publications. http://www.ripublication.com/ijaer.htm
Anand Kumar M., Rajendran S. and Soman K.P. (2014). "Supervised Cross-lingual Preposition Disambiguation for Machine Translation." iDravidian 2014: Symposium on Natural Language Processing for Dravidian Languages, ICON 2014.
Anand Kumar M., Rajendran S. and Soman K.P. (2015). "Cross-Lingual Preposition Disambiguation for Machine Translation." Eleventh International Multi-Conference on Information Processing-2015 (IMCIP-2015). Available online at www.sciencedirect.com
Anand Kumar M., Dhanalakshmi V., Soman K.P. and Rajendran S. (2014). "Factored Statistical Machine Translation System for English to Tamil Language." Pertanika Journal of Social Sciences & Humanities, 22(4): 1045-1061. http://www.pertanika.upm.edu.my/
Anand Kumar M. (2015). Tamil Linguistic Tools and English-Tamil Machine Translation System. United States: LAP Lambert Academic Publishing.
Anand Kumar M., Shriya S. and Soman K.P. (2015). "AMRITA-CEN@FIRE 2015: Extracting entities for social media texts in Indian languages." CEUR Workshop Proceedings, 1587: 85–88.
Anand Kumar M., Shivkaran Singh, Kavirajan B. and Soman K.P. (2016). "Shared Task on Detecting Paraphrases in Indian Languages (DPIL): An Overview."
Anand Kumar M., Premjith B., Shivkaran Singh, Rajendran S. and Soman K.P. (2017). "An Overview of the Shared Task on Machine Translation in Indian Languages (MTIL) - 2017." Journal of Intelligent Systems (Special issue on MTIL 2017), pp. 1–16. De Gruyter. DOI 10.1515/jisys--
Anandan P., Saravanan K., Parthasarathy R. and Geetha T.V. (2002). "Morphological Analyzer for Tamil." In the proceedings of ICON 2002, Chennai.
ANUVADAKSH – An Expert English to Indian Language Machine Translation System.
Antony P.J., Warrier N.J. and Soman K.P. (2010). "Penn Treebank-Based Syntactic Parsers for South Dravidian Languages using a Machine Learning Approach." International Journal of Computer Applications (0975-8887), Volume 7, No. 8, October 2010.
Aparna K.G. and Ramakrishnan A.G. "A Complete Tamil Optical Character Recognition System."
Ariaratnam I., Weerasinghe A.R. and Liyanage C. (2014). "A shallow parser for Tamil." 14th International Conference on Advances in ICT for Emerging Regions (ICTer), 10-13 Dec. 2014.
Arulmozhi S. (1998). Aspects of Inflectional Morphophonology: A Computational Approach. Unpublished Ph.D. Thesis. Hyderabad: University of Hyderabad.
Arulmozhi P., Sobha L. and Kumara Shanmugam B. (2004). "Parts of Speech Tagger for Tamil." Symposium on Indian Morphology, Phonology & Language Engineering, March 19-21, IIT Kharagpur, pp. 55-57.
Arun Srivatsa R., Shriya P. and Adithya Srinivasan (2015). "Interpretation of Tamil Script using Optical Character Recognition in Android." International Journal of Computer Science and Information Technologies (IJCSIT), Vol. 6 (5), pp. 4528-4530.
Arunselvan S.J., Anand Kumar M. and Soman K.P. (2015). "Sentiment analysis of Tamil movie reviews via feature frequency count." International Journal of Applied Engineering Research, Vol. 10, No. 20, pp. 17934-17939.
Balaji J., Geetha T.V. and Ranjani Parthasarathi (2012). "Semantic Parsing of Tamil Sentences." Proceedings of the Workshop on Machine Translation and Parsing in Indian Languages (MTPIL-2012), pp. 15–22, COLING 2012, Mumbai, December 2012.
Banu M., Karthika C., Sundaramani P. and Geetha T.V. (2007). "Tamil Document Summarization using Semantic Graph Method." International Conference on Computational Intelligence and Multimedia Applications, IEEE, pp. 125-134.
Baskaran S. (2002). Semantic Analyser for Word Sense Disambiguation. MS Thesis, Anna University, Chennai.
Baskaran Sankaran and Vaidehi V. (2002). "Role of Collocations and Case-Markers in Word Sense Disambiguation: A Clustering-Based Approach." In Proceedings of IEEE International Symposium on Natural Language Processing and Knowledge Engineering 2002, Vol. I, pp. 625-630. Hammamet, Tunisia.
Baskaran S. and Vaidehi V. (2003). "Collocation based Word Sense Disambiguation using Clustering for Tamil." Communicated to International Journal of
Sankaran Baskaran, Kalika Bali, Tanmoy Bhattacharya, Pushpak Bhattacharyya, Girish Nath Jha, Rajendran S., Saravanan K., Sobha L. and Subbarao K.V. (2008). "A Common Part of Speech Tagset Framework for Indian Languages." LREC 2008, 6th Language Resources and Evaluation Conference, Marrakech, Morocco.
Brochures on 'Language Technology Products' of the Resource Center for Indian Language Technology Solutions – Tamil, Chennai.
Bharati A., Chaitanya V., Kulkarni A.P. and Sangal R. (1997). "ANUSAARAKA: Machine Translation in Stages." A Quarterly in Artificial Intelligence, 10(3), 22-25.
Barathi Ganesh H.B., Anand Kumar M. and Soman K.P. "Conditional Random Fields for Code Mixed Entity Recognition."
Chandrakanth D., Anand Kumar M. and Gunasekaran S. (2012). "Parts-of-Speech tagging for Tamil language." IJCSE, Vol. 6, No. 6, pp. 88-93.
Chandrasekaran S. and Chithra S. (2016). "Big Data Methods for Computational Tamil Linguistics with Emotion Oriented Semantic Pattern Mining." Discovery, 52(245), 979-984.
Deivasundaram N. and Gopal A. (2003). "Computational Morphology of Tamil." In B. Ramakrishna Reddy (ed.), Word Structure in Dravidian, Kuppam: Dravidian University, pp. 406-410.
Dhanalakshmi V, Anand Kumar M, Vijaya M.S, Loganathan R, Soman K.P. and Rajendran S. (2008). "Tamil Part-of-Speech tagger based on SVMTool". In: Proceedings of the COLIPS International Conference on Natural Language Processing (IALP), Chiang Mai, Thailand, 2008: 59-64.
Dhanalakshmi V, Anand Kumar M, Shivapratap G, Soman K.P. and Rajendran S. (2009). "Tamil POS Tagging using Linear Programming". International Journal of Recent Trends in Engineering, Vol. 1, No. 2, pp. 166-169.
Dhanalakshmi V, Anand Kumar M, Rajendran S and Soman K.P. (2009). "POS Tagger and Chunker for Tamil Language". Proceedings of the 8th Tamil Internet Conference, Cologne, Germany, 2009.
Dhanalakshmi V., Padmavathy P., Anand Kumar M., Soman K.P. and Rajendran S. (2009). "Chunker for Tamil Using Machine Learning". 7th International Conference on Natural Language Processing 2009 (ICON 2009), IIIT Hyderabad, India, December 2009.
Dhanalakshmi V., Padmavathy P., Anand Kumar M., Soman K.P. and Rajendran S. (2009). "Chunker for Tamil". 2009 International Conference on Advances in Recent Technologies in Communication and Computing (ARTCom), pp. 436-438, IEEE Press, doi: 10.1109/ARTCom.2009.191.
Dhanalakshmi V. and Anand Kumar M. "Hierarchal POS tagging for Tamil language using Machine learning approach". Unpublished. Uploaded to academia.edu.
Dhanalakshmi V., Anand Kumar M., Rekha R.U., Arun Kumar C., Soman K.P. and Rajendran S. (2009). "Morphological Analyzer for Agglutinative Languages Using Machine Learning Approaches". 2009 International Conference on Advances in Recent Technologies in Communication and Computing (ARTCom), pp. 433-435, IEEE Press, doi: 10.1109/ARTCom.2009.184.
Dhanalakshmi V., Anand Kumar M., Rekha R.U., Soman K.P. and Rajendran S. (2010). "Grammar Teaching Tools for Tamil Language". In: Technology for Education Conference (T4E 2010), pp. 85-88, India, 2010.
Dhanalakshmi V. (2010). Shallow Parser for Tamil. Thesis submitted to Tamil University for the award of the Degree of Doctor of Philosophy in Linguistics, Department of Linguistics, Tamil University, Thanjavur.
Dhanalakshmi V and Anand Kumar M (2011). "Tamil shallow parser using machine learning approach". Tamil Internet.
Dhivya R, Dhanalakshmi V, Anand Kumar M and Soman K.P. (2011). "Clause Boundary Identification for Tamil Language Using Dependency Parsing." SPIT/IPC 2011: 195-197.
Duraipandi, R. "The Morphological Generator and Parsing Engines of Tamil Verb forms". In Tamil Internet 2006. Chennai: Asian Printers.
Ganesan, M. (1994). "Functions of Morphological Analyzer Developed at CIIL, Mysore". In Harikumar Basi (ed.) Automatic Translation (seminar proceedings), Thiruvananthapuram: ISDL.

Ganesan, M. (2003). "Computational Morphology of Tamil". In B. Ramakrishna Reddy (ed.) Word Structure in Dravidian, Kuppam: Dravidian University, 399-405.
Ganesan, M and Francis Ekka (1994). Morphological Analyzer for Indian Languages. In Agarwarl Pani (eds.) Information Technology Applications in Language, Script and Speech. New Delhi: BPB Publication.
Ganesh, J., Parthasarathi, R., Geetha, T. V. and Balaji, J. (2014). Pattern based bootstrapping technique for Tamil POS tagging. In: Mining Intelligence and Knowledge Exploration, Lecture Notes in Computer Science, pages 256-267. Springer, Cham.
Girish Nath Jha (2010). The TDIL Program and the Indian Language Corpora Initiative (ILCI). In: LREC 2010.
Gowri Mani G. Latent Semantic Analysis and Applications for Tamil Documents. MCA Project, Anna University, Chennai.
Jalin A.F. and Jayakumari J. (2017). "Text to speech synthesis system for Tamil using HMM". IEEE International Conference on Circuits and Systems (ICCS), Thiruvananthapuram, 2017, pp. 447-451. doi: 10.1109/ICCS1.2017.8326040.
Jayavardhana Rama G.L., Ramakrishnan A.G., Muralishankar R. and Prathibha P. (2002). "A Complete Text-to-Speech Synthesis System in Tamil". Proc. IEEE 2002 Workshop on Speech Synthesis, Sep. 11-13, Santa Monica, CA, USA, 2002.
Kamakshi S and Rajendran S. (2004). Preliminaries to the Preparation of a Machine Aid to Translate Linguistics Texts in English into Tamil. Thiruvananthapuram: Dravidian Linguistics Association, Publication 86, 2004.
Kavirajan B., Kumar M., Soman K.P., Sankaravelayuthan Rajendran and Vaithehi (2017). "Improving the rule based machine translation system using sentence simplification (English to Tamil)", pp. 957-963. doi: 10.1109/ICACCI.2017.8125965.
Keerthana S., Dhanalakshmi V., Anand Kumar M., Ajith V.P. and Soman K.P. (2012). Tamil to Hindi Machine Transliteration Using Support Vector Machines. In: Das V.V., Ariwa E., Rahayu S.B. (eds.) Signal Processing and Information Technology. SPIT 2011. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol. 62. Springer, Berlin, Heidelberg.
Kiran R., Nivedha K., Pavithra Devi S. and Subha T. (2017). Voice and speech recognition in Tamil language. 2017 2nd International Conference on Computing and Communications Technologies (ICCCT).
Kohilavani S, Mala T and Geetha T. V. (2009). "Automatic Tamil Content Generation." 2009 International Conference on Intelligent Agent & Multi-Agent Systems.
Kumar C.S. and Foo Say Wei (2003). "A Bilingual Speech Recognition System for English and Tamil". Proceedings of Joint Conference of the Fourth International Conference on Information, Communications and Signal Processing, 2003 and the Fourth Pacific Rim Conference on Multimedia, Vol. 3, pp. 1641-1644.
Kumara Shanmugam, P. (2004). Parse Representation of Tamil syntax. MS Thesis, Anna University, Chennai.
Janarthanam, S., Nallasamy, U., Ramasamy, L. and Santhoshkumar, C. (2007). Robust Dependency Parser for Natural Language Dialog Systems in Tamil. In Proceedings of 5th Workshop on Knowledge and Reasoning in Practical Dialogue Systems (IJCAI KRPDS-2007), pp. 1-6, Hyderabad, India, 2007.
Lakshmana Pandian S. and Geetha T. V. Morpheme based Language Model for Tamil Part-of-Speech Tagging.
Kausikaa N. and Uma V. (2016). "Sentiment Analysis of English and Tamil Tweets using Path Length Similarity based Word Sense Disambiguation." IOSR Journal of Computer Engineering (IOSR-JCE), e-ISSN: 2278-0661, p-ISSN: 2278-8727, Volume 18, Issue 3, Ver. I (May-Jun. 2016), pp. 82-89.
Kumar A M, Rajendran S and Soman K P (2014). "Tamil word sense disambiguation using support vector machines with rich features". International Journal of Applied Engineering Research, vol. 9, pp. 7609-7620, 2014.
Lakshmi S, Sindhuja Gopalan and Sobha Lalitha Devi (2014). "Cross Linguistic Variations in Discourse Relations among Indian Languages." In Proceedings of International Conference on Asian Language Processing 2014, Kuching, Sarawak, Malaysia.
Lakshmi S. and Sobha Lalitha Devi (2014). "Rule-based Case Transfer in Tamil Malayalam MT". In Proceedings of CICLing 2014.
Loganathan, R. (2010). English-Tamil Machine Translation System. Master of Science by Research Thesis, Amrita Vishwa Vidyapeetham, Coimbatore.
Loganathan Ramasamy and Zdeněk Žabokrtský (2011). "Tamil Dependency Parsing: Results Using Rule Based and Corpus Based Approaches". Computational Linguistics and Intelligent Text Processing, Lecture Notes in Computer Science Volume 6608, pp. 82-95.
Loganathan Ramasamy, Ondřej Bojar and Zdeněk Žabokrtský (2012). Morphological Processing for English-Tamil Statistical Machine Translation. In 24th International Conference on Computational Linguistics (p. 113).
Lushanthan S, Weerasinghe A. R. and Herath D.L. (2014). "Morphological Analyzer and Generator for Tamil Language". 2014 International Conference on Advances in ICT for Emerging Regions (ICTer).
Madhupriya U. Enhanced Version of Morphological Analyzer and Parser for Tamil Language. MCA Project, Anna University, Chennai.
Mahalakshmi, S., Anand Kumar, M. and Soman, K.P. (2015). Paraphrase detection for Tamil language using Deep learning algorithm. International Journal of Applied Engineering Research, 10 (17), pp. 13929-13934.

Malarkodi C.S., Elisabeth Lex and Sobha Lalitha Devi (2016). "Named Entity Recognition for the Agricultural Domain". In Proceedings of 17th International Conference on Intelligent Text Processing and Computational Linguistics, Konya, Turkey. Special issue: Research in Computing Science.
Malarkodi C.S. and Sobha Lalitha Devi (2018). "Twitter Named Entity Recognition for Indian Languages." In Proceedings of 18th International Conference on Computational Linguistics and Intelligent Text Processing, March 18 to 24, 2018, Hanoi, Vietnam.
Manone, V.K., Rajendran S. and Soman, K.P. (2015). "A Synchronous Syntax for English-Tamil language pair for Machine Translation". Fourth International Symposium on Natural Language Processing (NLP'15), Kochi, Kerala. Co-affiliated with Fourth International Conference in Computing, Communications and Informatics (ICACCI-2015).
Marimuthu K., Amudha K., Bakiyavathi T. and Sobha Lalitha Devi (2013). "Word Boundary Identifier as a Catalyzer and Performance Booster for Tamil Morphological Analyzer". In Proceedings of 6th Language and Technology Conference, Human Language Technologies as a Challenge for Computer Science and Linguistics 2013, Poznan, Poland.
Meenakshi Narayanaswamy, Ravikumar K.E. and Vijay-Shanker K. (2003). "A Biological Named Entity Recognizer". In Proceedings of the Pacific Symposium on Biocomputing '03 (PSB'03), Hawaii, January 2003.
Menaka S, Vijay Sundar Ram and Sobha Lalitha Devi (2018). Morphological Generator for Tamil. https://www.researchgate.net/publication/265866650.
Menaka S., Pattabhi R. K. Rao and Sobha Lalitha Devi (2011). "Automatic Identification of Cause-Effect Relations in Tamil Using CRFs". In A. Gelbukh (ed.), Computational Linguistics and Intelligent Text Processing, Springer LNCS Vol. 6608/2011, pp. 316-327.
Menaka Sankarlingam, Malarkodi C S and Sobha Lalitha Devi (2014). "A Deep Study on Causal Relations and its Automatic Identification in Tamil". In Proceedings of 2nd Workshop on Indian Language Data: Resources and Evaluation, organized under LREC 2014, Reykjavik, Iceland.
Menon, A. G., Saravanan S., Loganathan R. and Soman K. P. (2009). "Amrita Morph Analyzer and Generator for Tamil: A Rule-Based Approach." Proceedings of Tamil Internet Conference 2009, Cologne, Germany, October 2009.
Menon, V.K., Rajendran S., Anand Kumar M. and Soman K.P. A new TAG Formalism for Tamil and Parser Analytics. Uploaded to academia.edu and ResearchGate.
Mohan Raj, S.N. and Rajendran S. (2016). "Tamil Oriented Machine Translation under Indian Language to Indian Language Machine Translation (ILILMT) consortium." In: Proceedings of 15th World Tamil Internet Conference 2016, held at Gandhi Gram Rural University, Tamil Nadu, September 8-11, 2016, pages 393-402.
Mokanarangan T, Pranavan T, Megala U, Nilusija N, Gihan Dias, Sanath Jayasena and Surangika Ranathunga (2016). "Tamil morphological analyzer using support vector machines." International Conference on Applications of Natural Language to Information Systems, pp. 15-23, Springer, Cham.
Muralidharan V. and Sharma D. M. (2016). Construction Grammar based Annotation Framework for Parsing Tamil. In 17th International Conference on Intelligent Text Processing and Computational Linguistics, Konya, Turkey, Report No: IIIT/TR/2016/-1.
Nagarajan T. and Murthy H.A. (2004). "Group Delay based Segmentation of Spontaneous Speech into Syllable-like units". EURASIP Journal of Applied Signal Processing, Vol. 17, pp. 2614-2625, 2004.
Narayana Murthi K.N, Sobha L and Muthukumari B (2007). "Pronominal Resolution in Tamil Using Machine Learning Approach". The First Workshop on Anaphora Resolution (WAR I), Ed. Christer Johansson, Cambridge Scholars Publishing, 15 Angerton Gardens, Newcastle, NE5 2JA, UK, pp. 39-50.
Nathan Green, Loganathan Ramasamy and Zdeněk Žabokrtský (2012). "Using an SVM Ensemble System for Improved Tamil Dependency Parsing". Association for Computational Linguistics, pages 72-77.
Nayeemulla Khan A. and Yegnanarayana B. (2001). "Development of Speech Recognition System for Tamil for Small Restricted Task". Proceedings of National Conference on Communication, India.
Padmamala R. and Prema V. (2017). "Sentiment analysis of online Tamil contents using recursive neural network models approach for Tamil language." 2017 IEEE International Conference on Smart Technologies and Management for Computing, Communication, Controls, Energy and Materials (ICSTM).
Pandian, S. L. and Geetha, T. V. (2009). CRF models for Tamil part of speech tagging and chunking. In: Proceedings of the 22nd International Conference on Computer Processing of Oriental Languages. Language Technology for the Knowledge-based Economy, ICCPOL '09, pages 11-22, Berlin, Heidelberg. Springer-Verlag.
Pattabhi R K Rao T and Sobha L (2008). "Identifying Similar and Co-referring Documents Across Languages". In Proceedings of 3rd International Joint Conference on Natural Language Processing Workshop on Cross Lingual Information Access (CLIA-2008), Hyderabad, India, pp. 10-17.
Pattabhi R K Rao T, Vijay Sundar Ram R, Vijayakrishna R and Sobha L. (2013). "A Text Chunker and Hybrid POS Tagger for Indian Languages." In: Proceedings of the IJCAI, 2013.
Pattabhi R K Rao and Sobha Lalitha Devi (2015). "Automatic Identification of Conceptual Structures Using Deep Boltzmann Machines". In Proceedings of Forum for Information Retrieval and Evaluation 2015, ACM DL, ISBN 978-1-4503-4004-5.
Pattabhi RK Rao and Sobha Lalitha Devi (2017). "Patent Document Summarization Using Conceptual Graphs". International Journal on Natural Language Computing (IJNLC), vol. 6(3).

Pattabhi RK Rao and Sobha Lalitha Devi (2018). "Enhancing Multi-Document Summarization using Concepts". Sādhanā - Academy Proceedings in Engineering Sciences, vol. 43:2 (27). https://doi.org/10.1007/s12046-018-0789-y.
Rajendran, S. (1995). "Machine bridge for Tamil and English." In: Proceedings of World Tamil Conference, Tamil University, Thanjavur, 1995.
Rajendran, S. (1997). Grammatical Formalism and Computational Analysis of Nominal Compounds in Tamil. South Asian Language Review, vol. 7, no. 1, 1997, 27-46.
Rajendran, S. (1999). "Spell and grammar checker for Tamil". Paper read in 27th All India Conference of Dravidian Linguists held in ISDL, Thiruvananthapuram, 17th-19th July, 1999.
Rajendran, S. (2002). Preliminaries to the preparation of Wordnet for Tamil. Language in India 2:1, March 2002, www.languageinindia.com.
Rajendran, S. and Baskaran, S. (2002). Electronic Thesaurus for Tamil. In: Proceedings of the International Conference on Natural Language Processing. NCST, Mumbai, 2002.
Rajendran S., Arulmozi S., Kumara Shanmugam B., Baskaran S. and Thiagarajan S. (2002). Tamil WordNet. In: Proceedings of the First Global WordNet Conference, CIIL, Mysore, 2002, 271-274.
Rajendran, S. (2003). Prerequisite for the Preparation of an Electronic Thesaurus for a Text Processor in Indian Languages. Language in India 3:1, January 2003, www.languageinindia.com.
Rajendran, S. (2003). Creating Generative Lexicon from MRDs: Tamil Experience. In: Rajeev Sangal et al. (eds.), Recent Advances in Natural Language Processing: Proceedings of the International Conference on Natural Language Processing (ICON 2003), 2003, 83-91.
Rajendran, S. (2003). "Ontology and Knowledge Representation." Uploaded to academia.edu and ResearchGate.
Rajendran S, Arulmozi S, Ramesh Kumar and Viswanathan S. (2003). "Computational morphology of verbal complex". In B. Ramakrishna Reddy (ed.) Word Structure in Dravidian, Dravidian University, Kuppam, 2003, 376-398.
Rajendran S. and Kamakshi S. (2003). Preliminaries to the Preparation of a Machine Aid to Translate Linguistic Text Books in English into Tamil. Language in India 3:9, September 2003, www.languageinindia.com.
Rajendran, S. (2006). "Morphological parsing on Tolkappiyam's perspective." Uploaded to academia.edu and ResearchGate.
Rajendran, S. (2006). Parsing in Tamil: Present state of art. Language in India 6:8, August 2006, www.languageinindia.com.
Rajendran, S. (2006). Parsing in Tamil: Present state of Art. Indian Linguistics vol. 67, 2006, 159-67.
Rajendran, S. (2006). A Survey of the State of the Art in Tamil Language Technology. Language in India 6:10, October 2006, www.languageinindia.com.
Rajendran, S. (2007). Complexity of Tamil in POS Tagging. Language in India 7:1, January 2007, www.languageinindia.com.
Rajendran, S. (2009). Dravidian WordNet. In: Proceedings of Tamil Internet Conference 2009, Cologne, Germany, October 2009.
Rajendran, S. (2010). Tamil WordNet. In: Proceedings of the Global WordNet Conference (GWC 10), 2010, IIT Bombay.
Rajendran S., Shivapratap G., Dhanalakshmi V. and Soman K.P. (2010). Building a WordNet for Dravidian Languages. In: Proceedings of the Global WordNet Conference (GWC 10), 2010, IIT Bombay.
Rajendran, S. (2012). "Preliminaries to the Preparation of a Spell and Grammar Checker for Tamil". Uploaded to academia.edu and ResearchGate.
Rajendran S., Anand Kumar M. and Soman K.P. (2013). Computational approach to Word Sense Disambiguation in Tamil. In: 12th International Tamil Internet Conference 2013, University of Malaya, Kuala Lumpur, Malaysia.
Rajendran S and Vasuki G. (2013). English to Tamil Machine Translation System Using Parallel Corpus. Book uploaded to academia.edu and ResearchGate.
Rajendran S. (2014). Resolution of Lexical Ambiguity in Tamil. Language in India (e-journal), 14:1, January 2014.
Rajendran S and Anand Kumar M. (2014). "Corpus based approach for resolving verbal polysemy in Tamil". In: Proceedings of 13th International Conference on Tamil Computing and Tamil Internet (Tamil Internet 2014), 19th to 21st September 2014, Pondicherry University.
Rajendran S., Anand Kumar M. and Soman K.P. (2015). "Building hierarchies and networks from MRDs of Tamil". In: Proceedings of 14th International Conference on Tamil Computing and Tamil Internet (Tamil Internet 2015), Singapore (SIM University, Singapore campus), 30th May to 1st June 2015.

Rajendran, S. (2016). "Tamil Thesaurus to Tamil wordNet." In: Proceedings of 15th World Tamil Internet Conference 2016, held at Gandhi Gram Rural University, Tamil Nadu, September 8-11, 2016, pages 1-9.
Rajendran, S. and Anand Kumar, M. (2017). "Visual Onto-Thesaurus for Tamil." Language in India www.languageinindia.com, ISSN 1930-2940, vol. 17:5, May 2017.
Rajendran S. (2018). "Lexical Resource Tool for Tamil Computing." In: Conference Papers, 17th Tamil Internet Conference, Tamil Agricultural University, Coimbatore.
Rajendran S and Anand Kumar (2018). "Computing tools for Tamil Language Teaching and Learning." In: Conference Papers, 17th Tamil Internet Conference, Tamil Agricultural University, Coimbatore.
Ramakrishnan, A.G., Lakshmish N Kaushik and Laxmi Narayana M. (2007). Natural Language Processing for Tamil TTS. 3rd Language and Technology Conference (LTC), Poznan, Poland, October 2007.
Ramakrishnan, A.G. (2014). Speech Technology and Tamil. National Conference on Tamil Internet, Chennai, India, January 2014. DOI: 10.13140/RG.2.1.2287.4725.
Ramasamy, L. (2011). "TamilTB: An Effort Towards Building a Dependency Treebank for Tamil." WDS'11 Proceedings of Contributed Papers, Part I, 143-148, 2011. ISBN 978-80-7378-184-2, MATFYZPRESS.
Ramasamy, L. and Žabokrtský Z. (2011). Tamil Dependency Parsing: Results Using Rule Based and Corpus Based Approaches. International Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2011: Computational Linguistics and Intelligent Text Processing, pp. 82-95.
Ramaswamy, Vaishnavi (2000). A Morphological Generator for Tamil. Unpublished M.Phil. Dissertation. Hyderabad: University of Hyderabad.
Ramaswamy, Vaishnavi (2003). A Morphological Analyzer for Tamil. Unpublished Ph.D. Dissertation submitted to University of Hyderabad.
Ranganathan, Vasu (1997). "A Lexical Phonological Approach to Processing Tamil Words by Computer." International Journal of Dravidian Linguistics 26.1: 57-70.
Rekha R U, Anand Kumar M, Dhanalakshmi V, Soman K P and Rajendran S. (2012). A Novel Approach to Morphological Generator for Tamil. Data Engineering and Management, Second International Conference, ICDME, Springer-Verlag, Berlin, Heidelberg, 2012, 249-251.
Ranganathan, Vasu (2001). "Development of Part-of-Speech Tagger for Tamil." Tamil Internet Conference, 2001 (http://infitt.org/drupal7a/TIconferencepapers/TI2001/vasur.pdf).
Ranganathan, Vasu (2002). "An interactive approach to development of English to Tamil Translation machine Translation system on the web." Tamil Internet Conference, 2002 (http://infitt.org/drupal7a/TIconferencepapers/TI2002/15VASUR.PDF).
Ranganathan, Vasu (2010). Tamil Language in Context: A Comprehensive Approach to Learning Tamil. Department of South Asia Studies, University of Pennsylvania, http://www.thetamillangauge.com.
Renganathan, Vasu (2014). "Computational Phonology and the Development of Text-to-Speech Application for Tamil." Tamil Internet Conference, 2014, Pondicherry University, Pondicherry.
Ranganathan, Vasu (2016a). Chapter 2: Computational Phonology. In: Computational Approaches to Tamil Linguistics. Chennai: Cre-A Publications.
Ranganathan, Vasu (2016b). Chapter 3: Computational Morphology. In: Computational Approaches to Tamil Linguistics. Chennai: Cre-A Publications.
Ranganathan, Vasu (2016c). Chapter 4: Computational Syntax. In: Computational Approaches to Tamil Linguistics. Chennai: Cre-A Publications.
Ranganathan, Vasu (2016d). Chapter 5: Computational Semantics. In: Computational Approaches to Tamil Linguistics. Chennai: Cre-A Publications.
Ranganathan, Vasu (2016e). Chapter 6: Applications of Tamil NLP Systems. In: Computational Approaches to Tamil Linguistics. Chennai: Cre-A Publications.
Ranganathan, Vasu (2016f). Computational Approaches to Tamil Linguistics. Chennai: Cre-A Publications.
Remmiya Devi G, Veena P V, Anand Kumar M and Soman K P. AMRITA_CEN@FIRE 2016: Code-Mix Entity Extraction for Hindi-English and Tamil-English Tweets.
Rajeswari Sridhar, Rasiga Gowrisankar and Monica G. (2017). "Language Relationship Model for Automatic Generation of Tamil Stories from Hints." International Journal of Intelligent Information Technologies (IJIIT) 13(2).
Samuel Thomas, Venugopalakrishna Y R. and Murthy H.A. (2007). "Speech Synthesis using Syllable-like units". Workshop on Speech Synthesis, CDAC, Noida, April 13-14, 2007.
Sanjay S, Anand Kumar M. and Soman K. P. (2015). AMRITA-CEN-NLP@FIRE 2015: CRF based named entity extraction for Twitter microposts. CEUR Workshop Proceedings, 1587: 96-99, 2015.

Sankaralingam C, Rajendran S., Kavirajan B., Anand Kumar M. and Soman K. P. (2017). "Onto-thesaurus for Tamil language: Ontology based intelligent system for information retrieval." ICACCI 2017: 2396.
Sarada G.L., Lakshmi A, Murthy H.A. and Nagarajan T. (2009). "Automatic Transcription of Continuous Speech into Syllable-like units for Indian Languages". Sadhana, Indian Academy of Sciences, Vol. 34, Part 2, April 2009, pp. 221-233.
Saraswathi S. and Geetha T. V. (2007). "Comparison of Performance of Enhanced Morpheme-based Language Model with Different Word-based Language Models for Improving the Performance of Tamil Speech Recognition System". ACM Transactions on Asian Language Information Processing, Vol. 6, No. 3, Article 9.
Saraswathi S., Kanivadhana P., Anusiya M. and Sathiya S. (2011). Bilingual Translation System (For English and Tamil). International Journal on Computer Science and Engineering (IJCSE), 3(3).
Saravanan, S., Menon, A. G. and Soman, K. P. (2010). English to Tamil Machine Translation System. Proceedings of INFITT-2010, Coimbatore.
Schiffman, H.F. and Vasu Renganathan. "Tamil Script Reform and Glyph Rendering Approach in Unicode: Past and Present Attempts to Simplify Tamil Writing System". In: Peri Bhaskararao (ed.), Working Papers of International Symposium on Indic Scripts: Past and Present. ILCAA: Tokyo.
Selvam M., Natarajan A. M. and Thangarajan R. (2008). "Structural Parsing of Natural Language Text in Tamil Using Phrase Structure Hybrid Language Model". World Academy of Science, Engineering and Technology, International Journal of Computer and Information Engineering, Vol. 2, No. 3, 2008.
Selvam M., Natarajan A. M. and Thangarajan R. (2009). "Structural Parsing of Natural Language Text in Tamil Language Using Dependency Model". International Journal of Computer Processing of Languages, Vol. 22, No. 02n03, pp. 237-256 (2009). https://doi.org/10.1142/S1793840609002093.
Selvam M. and Natarajan A.M. (2009). Improvement of Rule Based Morphological Analysis and POS Tagging in Tamil Language via Projection and Induction Techniques. International Journal of Computers, Issue 4, Volume 3, 2009.
Shanmugam, C. (2001). "Computer Analysis of Simple Sentence in Tamil". Paper read in UGC-SAP National Seminar on Computational Linguistics and Dravidian Languages, 22-24 February 2001, CAS in Linguistics, Annamalai University, Annamalainagar.
Shanmugam, C. (2002). "Grammar and Parser: A Program for Syntactic Parsing in Tamil". International Seminar on Tamil Computing, 27-28 February and March 1, 2002, University of Madras, Chennai.
Shanmugam, C. "Minimalist Program for Tamil Parsing".
Shriya S., Vinayakumar R., Anand Kumar M. and Soman K. P. (2015). AMRITA-CEN@SAIL2015: Sentiment Analysis in Indian Languages. In International Conference on Mining Intelligence and Knowledge Exploration, pages 703-710. Springer, 2015.
Shriya Se, Vinayakumar R., Anand Kumar M. and Soman K. P. (2016). "Predicting the Sentimental Reviews in Tamil Movie using Machine Learning Algorithms." Indian Journal of Science and Technology, Vol. 9(45), DOI: 10.17485/ijst/2016/v9i45/106482, December 2016.
Sindhuja Gopalan, Bakyavathi T, Vijay Sundar Ram and Sobha L. (2012). "Analysis of 'genitive drop' in Tamil-Hindi machine translation". Presented at the 40th All India Conference for Dravidian Linguistics, held at CALTS, School of Humanities, University of Hyderabad (received Best Paper Award).
Shivkaran Singh, Anand Kumar M. and Soman K. P. (2018). Attention based English to Punjabi neural machine translation. Journal of Intelligent and Fuzzy Systems 34(3): 1551-1559 (2018).
Sivashanmugam, C. (2000). "A Model for Computer Analysis of Verbs in Tamil". In Working Papers in Linguistics, Department of Linguistics, Bharathiyar University, Coimbatore.
Sivashanmugam, C. (2002). "Morphological Processor for Negative Constructions in Tamil". Indian Conference on Natural Language Processing, Anna University, Chennai.
Sobha, L. (2003). "Pronominal Resolution in South Dravidian languages". 23rd South Asian Language Analysis, University of Texas, Austin, 10-12 October 2003.
Sobha L and Vijay Sundar Ram R. (2007). "Multilingual Place Name Tagger for Indian Languages". In the Proceedings of IJCAI Workshop on Cross Lingual Information Access (CLIA): Addressing the Information Need of Multilingual Societies, Hyderabad, pp. 34-39.
Sobha Lalitha Devi, Sindhuja Gopalan and Vijay Sundar Ram R. (2013). "Transfer Grammar in Tamil-Hindi MT System". In Proceedings of International Conference on Asian Language Processing 2013, Urumqi, China, pp. 79-82.
Sobha Lalitha Devi, Vijay Sundar Ram and Pattabhi RK Rao (2014). "A Generic Anaphora Resolution Engine for Indian Languages". In Proceedings of the 25th International Conference on Computational Linguistics (COLING 2014).
Sobha Lalitha Devi, Vijay Sundar Ram R. and Pattabhi RK Rao (2014). "Anaphora Resolution System for Indian Languages". In Proceedings of 2nd Workshop on Indian Language Data: Resources and Evaluation (WILDRE), organized under LREC 2014, Reykjavik, Iceland.

Sobha Lalitha Devi, Sindhuja Gopalan and Lakshmi S (2014). "Automatic Identification of Discourse Relations in Indian Languages". In Proceedings of 2nd Workshop on Indian Language Data: Resources and Evaluation, organized under LREC 2014, Reykjavik, Iceland.
Sobha L and Vijay Sundar Ram R. (2006). "Noun Phrase Chunking in Tamil." In: Proceedings of the Symposium on Modeling and Shallow Parsing of Indian Languages, Indian Institute of Technology, Bombay, pp. 194-198.
Sobha Lalitha Devi, Lakshmi S and Sindhuja Gopalan (2014). "Discourse Tagging for Indian Languages". In A. Gelbukh (ed.), Computational Linguistics and Intelligent Text Processing, Springer LNCS Vol. 8403, pp. 469-480.
Sobha Lalitha Devi and Pattabhi RK Rao (2016). "Mining of Social Networks from Literary Texts of Resource Poor Languages". In Proceedings of CICLing 2016, 17th International Conference on Intelligent Text Processing and Computational Linguistics, Konya, Turkey, Springer LNCS.
Sobha Lalitha Devi and Pattabhi RK Rao (2016). "Semantic Representation of Tamil Texts using Conceptual Graphs". In Proceedings of 3rd Workshop on Indian Language Data: Resources & Evaluation (WILDRE-3) under LREC 2016, Portoroz, Slovenia.
Sobha Lalitha Devi, Vijay Sundar Ram, Sindhuja Gopalan, Pattabhi RK Rao and Lakshmi S. (2016). "A Demo Proposal: Tamil-Hindi Automatic Machine Translation: A Detailed Description". In Proceedings of 3rd Workshop on Indian Language Data: Resources & Evaluation (WILDRE-3) under LREC 2016, Portoroz, Slovenia.
Sobha Lalitha Devi, Pattabhi RK Rao, Vijay Sundar Ram and Malarkodi C.S. (2016). "A Demo Proposal: Tamil-English Cross Lingual Information Access (CLIA) System". In Proceedings of 3rd Workshop on Indian Language Data: Resources & Evaluation (WILDRE-3) under LREC 2016, Portoroz, Slovenia.
Subramonim V.I., Mahidas Bhattacharya, Lohy A. and Tarai S. (2001). Speech Synthesis (Tamil & Oriya): an application for the blind. (III,5(35)/2001-ET), Department of Science and Technology, Govt. of India.
Sureka K, Srinivasagan K.G. and Suganth S. An Efficient Dependency Parser Using Hybrid Approach for Tamil Language.
Thangarajan R. Speech Recognition for Agglutinative Languages. http://dx.doi.org/10.5772/50140
Thangarajan R., Natarajan A. M. and Selvam M. (2008a). "Word and Triphone based Approaches in Continuous Speech Recognition for Tamil Language". WSEAS Transactions on Signal Processing, Issue 3, Vol. 4, 2008, pp. 76-85.
Thangarajan R. and Natarajan A. M. (2008b). "Syllable Based Continuous Speech Recognition for Tamil Language". South Asian Language Review (SALR), Vol. XVIII, No. 1, pp. 71-85.
Thangarasu M. and Manavalan R. (2013). "Stemmers for Tamil Language: Performance Analysis". International Journal of Computer Science & Engineering Technology (IJCSET), ISSN: 2229-3345, Vol. 4, No. 07, Jul 2013, 902-908.
Thayaparan M., Ranathunga S. and Thayasivam U. "Graph Based Semi-Supervised Learning for Tamil POS Tagging".
Theivendiram P, Megala Uthayakumar, Nilusija Nadarasamoorthy, Mokanarangan Thayaparan, Sanath Jayasena, Gihan Dias and Surangika Ranathunga (2016). "Named-Entity-Recognition (NER) for Tamil Language Using Margin-Infused Relaxed Algorithm (MIRA)." International Conference on Intelligent Text Processing and Computational Linguistics, pp. 465-476, Springer, Cham.
Vaishnavi T. and Roxanna Samuel (2016). "Individual Document Keyword Extraction for Tamil". International Journal of Computer Technology & Applications, Vol. 7(3), 448-452, IJCTA May-June 2016.
Vijay Sundar Ram R, Menaka S and Sobha Lalitha Devi (2010). "Tamil Morphological Analyser". In "Morphological Analysers and Generators", (ed.) Mona Parakh, LDC-IL, Mysore, pp. 1-18.

Vijay Sundar Ram R. and Sobha Lalitha Devi (2010), "Noun Phrase Chunker Using Finite State Automata for an Agglutinative Language", Proceedings of the Tamil Internet - 2010 at Coimbatore, India, June 23 -27, pp. 218 - 224 Vijay Sundar Ram R, Bakiyavathi T, Sindhujagopalan R, Amudha K and Sobha Lalitha Devi, (2012), "Tamil Clause Boundary Identification: Annotation and Evaluation" In Proceedings of Workshop on Indian Language Data: Resources and Evaluation, LREC 2012, Istanbul Vijay Sundar Ram, R. and Sobha Lalitha Devi. (2013). ”Pronominal Resolution in Tamil Using Tree CRFs”, In Proceedings of 6th Language and Technology Conference, Human Language Technologies as a challenge for Computer Science and Linguistics - 2013, Poznan, Poland Vijay Sundar Ram and Sobha Lalitha Devi, (2016). Two Layer Machine Learning Approach for Mining Referential Entities for a Morphologically Rich Language. Asian Journal of Information Technology, 15: 28312838. Vijay Sundar Ram and Sobha Lalitha Devi, (2012), "Coreference Resolution using Tree-CRF", In A. Gelbukh (ed), Computational Linguistics and Intelligent Text Processing, Springer LNCS Vol. 7181/2012 pp 285 - 296. Vijay Sundar Ram and Sobha Lalitha Devi. (2016). "How to Handle Split Antecedents in Tamil?", In proceedings of Coreference Resolution Beyond OntoNotes co-located with NAACL 2016, San Diego, California. Vijay Sundar Ram and Sobha Lalitha Devi. (2017). "A Robust Coreference chain Builder for Tamil", In proceedings of 18th International Conference on Computational Linguistics and Intelligent Text Processing, April 17 to 23, 2017, Budapest, Hungary Vijay Sundar Ram and Sobha Lalitha Devi. (2018). "A Semi-automated Annotation of Co-reference Chains in Tamil", In proceedings of 18th International Conference on Computational Linguistics and Intelligent Text Processing, March 18 to 24, 2018, Hanoi, Vietnam, Vijayakrishna R and Sobha L. Domain Focused Named Entity Recognizer for Tamil Using Conditional Random Fields. 
Viswanathan S. (2000) Tamil Morphological Analyser. MS Thesis. Anna University, Chennai. Viswanathan, S. Ramesh Kumar, S. Kumara Shanmugham, B. Arulmozi, S. and Vijay Shanker K. (2003). A Tamil Morphological Analyser, ICON-2003, 31-39. Sobha Lalitha Devi and Pattabhi RK Rao.(2016). "Semantic Representation of Tamil Texts using Conceptual Graphs", In Proceedings of 3rd Workshop on Indian Language Data: Resources & Evaluation (WILDRE-3) under LREC 2016 in Portoroz, Slovenia Sobha Lalitha Devi, Vijay Sundar Ram, Sindhuja Gopalan, Pattabhi RK Rao and Lakshmi S. (2016). "A Demo Proposal: Tamil – Hindi Automatic Machine Translation – A detailed Description", In Proceedings of 3rd Workshop on Indian Language Data: Resources & Evaluation (WILDRE-3) under LREC 2016 in Portoroz, Slovenia Sobha Lalitha Devi, Pattabhi RK Rao, Vijay Sundar Ram and Malarkodi C.S.(2016). "A Demo Proposal: Tamil – English Cross Lingual Information Access (CLIA) System", In Proceedings of 3rd Workshop on Indian Language Data: Resources & Evaluation (WILDRE-3) under LREC 2016 in Portoroz, Slovenia. Subramoniam V. I. (PI), Mahidas Bhattacharya (Associate PI), A. Lohy, S. Tarai. (2001) Speech Synthesis (Tamil & Oriya): an application for the Blind, (III.5(35)/2001–ET), Department of Science and Technology , Govt. of India. Sureka K, Srinivasagan K.G. and Suganth S. An Efficient Dependency Parser Using Hybrid Approach for Tamil Language. Thangarajan R. Speech Recognition for Agglutinative Languages. http://dx.doi.org/10.5772/50140 Thangarajan R., Natarajan A. M. and Selvam M. (2008a), ‘Word and Triphone based Approaches in Continuous Speech Recognition for Tamil Language’, WSEAS Transactions on Signal Processing, Issue 3, Vol.4, 2008, pp. 76-85. Thangarajan R. and Natarajan A. M. (2008b), ‘Syllable Based Continuous Speech Recognition for Tamil Language’, South Asian Language Review (SALR), Vol. XVIII, No. 1, pp. 71-85. Thangarasu M. and R.Manavalan. 2013. 
"Stemmers for Tamil Language: Performance Analysis," International Journal of Computer Science & Engineering Technology (IJCSET)ISSN : 2229-3345 Vol. 4 No. 07 Jul 2013, 902-908. Thayaparan M., Ranathunga S. and Thayasivam U. "Graph Based Semi-Supervised Learning for Tamil POS Tagging", LREC 2018. Thilagavathi R and Krishnakumari K. (2016) “Tamil English Languge Sentiment Analysis.” International Journal of Engineering Research and Technology (IJERT) ICETET-2016. Vasu, R. (1993). “A Logical approach to development of natural language understanding system for Tamil.” PJDS 3:1, 53-64. Vasu, R. (1997a) “A Lexical Phonological Approach to Processing Tamil Word by Computer.” IJDL, 26:1, January 1997. Vasu, R. (1997b). “Significance of Creation and Use of Corpus of Modern Tamil Prose Text through the Web.” International Symposium for Tamil Information Processing and Resources on the Internet, National University of Singapore, May 1997.

Vasu, R. (1988). “Human Aided Machine Translation (Problems and Perspectives). In: Karunakaran and Jayakumar (eds.) Vaishnavi, T. and Roxanna Samuel. (2016) "Individual Document Keyword Extraction for Tamil". International Journal of Computer Technology & Applications,Vol 7(3), 448-452 IJCTA May-June 2016 Available [email protected]. Vijayakrishna R and Sobha L, (2008), "Domain focused Named Entity Recognizer for Tamil using Conditional Random Fields", NERSSEAL 08, IJCNLP2008 workshop, Vignesh N and S.Sowmya. “Automatic Question Generator in Tamil.” International Journal of Engineering Research & Technology (IJERT) Vol. 2 Issue 10, October - 2013 IJERTIJERT ISSN: 2278-0181 Vijay Sundar Ram R., Chandra Mouli N., Bhuvaneswari P., Ananda Priya J. and Kumara Shanmugam B. (2005), “Hybrid Approach for Developing a Tamil Spell Checker.” In the Proceedings of International Conference on Natural Language, Indian Institute of Technology, Kanpur, pp. 111-115. Vijay Sundar Ram and Sobha L, (2007), "Automatic Identification of Semantic Arguments" Fifth Malaysia International Conference on Languages, Literatures, and Cultures (MICOLLAC '07), Malaysia. Vijay Sundar Ram R. and Sobha L., (2008). "Clause Boundary Identification Using Conditional Random Fields", In Proceedings of 9th International Conference on Intelligent Text Processing and Computational Linguistics, Haifa, Israel. Vijay Sundar Ram R, Bakiyavathi T, Sindhujagopalan R, Amudha K and Sobha Lalitha Devi, (2012), "Tamil Clause Boundary Identification: Annotation and Evaluation" In Proceedings of Workshop on Indian Language Data: Resources and Evaluation, LREC 2012, Istanbul Vijay Sundar Ram and Sobha Lalitha Devi, (2012), "Coreference Resolution using Tree-CRF", In A. Gelbukh (ed), Computational Linguistics and Intelligent Text Processing, Springer LNCS Vol. 7181/2012 pp 285 - 296. Vijay Sundar Ram, R. and Sobha Lalitha Devi (2013). 
“Pronominal Resolution in Tamil Using Tree CRFs”, In Proceedings of International Conference on Asian Language Processing 2013, Urumqi, China. pp197-200 Vijay Sundar Ram and Sobha Lalitha Devi, (2016). Two Layer Machine Learning Approach for Mining Referential Entities for a Morphologically Rich Language. Asian Journal of Information Technology, 15: 28312838. Vijay Sundar Ram and Sobha Lalitha Devi. (2016). "How to Handle Split Antecedents in Tamil?", In proceedings of Coreference Resolution Beyond OntoNotes co-located with NAACL 2016, San Diego, California. Vijay Sundar Ram and Sobha Lalitha Devi. (2017). "A Robust Coreference chain Builder for Tamil", In proceedings of 18th International Conference on Computational Linguistics and Intelligent Text Processing, April 17 to 23, 2017, Budapest, Hungary Vijay Sundar Ram and Sobha Lalitha Devi. (2018). "A Semi-automated Annotation of Co-reference Chains in Tamil", In proceedings of 18th International Conference on Computational Linguistics and Intelligent Text Processing, March 18 to 24, 2018, Hanoi, Vietnam, Vijayakrishna R and Sobha L. Domain Focused Named Entity Recognizer for Tamil Using Conditional Random Fields. Viswanathan S. (2000) Tamil Morphological Analyser. MS Thesis. Anna University, Chennai. Viswanathan, S. Ramesh Kumar, S. Kumara Shanmugham, B. Arulmozi, S. and Vijay Shanker K. (2003). A Tamil Morphological Analyser, ICON-2003, 31-39. Weerasinghe, R. (2011). A SMT Approach to Sinhala-Tamil Language Translation. citeseerx.ist.psu.edu /viewdoc/summary?doi= 10.1.1.78.7481, 2011. Winston Cruz, S. 2002. Parsing and Generation of Tamil Verbs in GSMorph. M.Phil. dissertation submitted to the University of Hyderabad.