
Automatic Generation of Context Sensitive Dictionary Using Computational Linguistics Approach

V. Lakshmi Narasimhan (1,2), V. Bhargavi (1) and Ravi Prakash (1)
(1) RGM College of Engineering & Technology, (2) Srikar & Associates International, India
Email: [email protected],[email protected],[email protected]

Abstract: Teaching, reading, writing and documentation have moved from the traditional hand-to-print mode to a digital setting, in which enthusiasts can share and disseminate their intellectual output through various audio, visual and other virtual mechanisms. The Phonograph was invented in 1877, the first talking book became operational during the 1930s, and Audio Books followed during the 1970s. The advent of the Internet has made vast volumes of educational material available and accessible to anyone who is on the right side of the digital divide. These volumes are so massive that automated tools are needed to comprehend them in terms of their organization, structure and content. Besides the (automatic) transcription of audio and video educational resources, a primary tool to be developed is an automatic Context Sensitive Dictionary (CSD) generator. This paper describes a preliminary study on automatically generating a CSD using a computational linguistics approach, and reports its performance. Specifically, the study considers educational videos available on YouTube, the most widely used website for videos, with the ultimate aim of automatically cataloging all of the material on YouTube.

KEYWORDS: Computational Linguistics, Automated Context Sensitive Dictionary (CSD) generation, Automated cataloging.

1. INTRODUCTION

In the present digital era, modes of teaching, reading, writing and documentation have moved from the traditional hand-to-print mode to a digital setting, in which enthusiasts share and disseminate their intellectual output through various audio-visual software systems. Since the development of computer systems in the 1960s and 1970s, writing, speaking, documentation and presentation have gone through a sea change. Recording of spoken words began in 1877 with the invention of the Phonograph. In the 1930s, the first talking book was made to hold short stories and book chapters. In the 1970s the term Audio Book became associated with cassette tapes, followed later by videos and now eBooks over DVDs, the Internet and other means. Over the last decade, the Internet has changed the game completely by offering a large amount of unordered but useful material, along with a seemingly unlimited number of related videos. For example, the YouTube website allows users to upload, view, and share educational and other videos; it makes use of the Adobe Flash player to display a wide variety of user-generated and corporate media video. Presently available content includes many videos that are social in nature, such as video clips, TV clips and music videos, as well as other content such as video blogs, short original videos, and other useful videos. Most of the content on YouTube has been uploaded by individuals, but media corporations and other organizations, including CBS, the BBC, Vevo, Hulu and the British Council, also offer some of their material via YouTube as part of the YouTube partnership program [22].

Websites such as YouTube are home to high-quality videos from around the world, including a significant amount of educational material. These websites are generally the place to go for a search, whether a user is a teacher by profession, a gifted storyteller looking to inspire an audience with their passion for a particular topic, or simply curious about the world around them. In addition, MOOCs, such as the MIT Courseware [23] and EdX [24], and websites such as TED.com, offer excellent quality materials on many educational aspects. The volume of such educationally useful data is so massive and ever-increasing that there is a growing urge to process these gigantic datasets by automatic methods, so that one can glean information and knowledge from them; such knowledge could become useful for teaching and lecturing, self-learning, peer-based learning and case-based (or experience-based) learning. Traditional techniques for video mining and information retrieval show their limits when they are applied to such huge collections of video data.
One can collate such educational materials from all such sites into a ‘computable catalog’ using a defined set of CSD terms so that, depending on the ‘taste and flavor of learning’, one can create a refined choice of materials and learning techniques for a given educational topic. Such a process, if available, would be useful in that different users can choose different ways of learning depending on their style and available timeframe. Finding out whether such a process can be built is the primary motivation for this paper. The rest of the paper is organized as follows: Section 2 deals with computational techniques for automatic generation of CSD terms, while Sections 3 and 4 outline, respectively, the need for automatic domain-specific phraseological characterization and automatic generation of a Z39.50 compatible ontology. Section 5 provides a list of possible tools and technologies applicable to this area of work, while Section 6 details related research work. The conclusion summarizes the paper and provides pointers for further work in this area.

2. COMPUTATIONAL TECHNIQUES FOR AUTOMATIC GENERATION OF CSD TERMS

A CSD, as the name suggests, contains context-specific words and their highly regulated, strict meaning/s. In a CSD, each word has only one specific meaning, which may or may not be the meaning available in a standard dictionary. The process of generating the CSD is still semi-automatic in our present system; whether it can be fully automated is still an open question. Furthermore, unlike a text-based word search approach, our algorithms need to be tuned in order to differentiate videos contextually: in a pure text document, spelling and grammar are expected to be good, while this may not be so in a text document transcribed from audio or video. Such algorithms need to work with the terms that build out the theme or concept, called ‘supporting terms’ [25], for the automatic generation of a CSD. The latter process will advance the development of CSDs and related word-sense semantics for (nearly) all engineering domains. Since the search for information over heterogeneous websites is always a non-trivial task, mainly due to the sheer volume of information, there is a clear need to categorize a large quantity of meta-information [15]. We define a CSD as ‘more than a dictionary look-up engine’, since it tailors dictionary entries to the context of the domain. CSD generation would ultimately lead to the ability to categorize all YouTube videos based on their underlying contextual data/metadata and other parameters. Using several publicly available instant comprehension tools, we tried to develop a ‘Limited CSD’ for the domain of Engineering Ethics.

[Figure 1 components: YouTube; Voice; Video Voice Stripper; Context Sensitive Word Stripper; Stemming Remover; Concept mapped CSW; Concept Map Maker; Text; Text Stripper; Concept Map Editor; Human intervened CSW; CSW Vocabulizer]

Fig 1: Operational Architecture of Automatic CSD Generator

Figure 1 shows the operational architecture of the automatic CSD generator, in which voice and text strippers are used to strip the voice and text parts of the videos automatically. Two further entities then strip out the context-sensitive words (in this context, words relevant to Engineering Ethics) and remove the stemming parts of each word in order to create a unique word list. A Concept Map Maker (such as XXXX []) is used to map the words and their relationships over a (typically) multidimensional Concept Map. A Concept Map Editor is used to edit the Concept Map. A Context Sensitive Word (CSW) Vocabulizer is built to generate meanings, mostly by manual means, in order to produce a CSD for a given domain. In this paper, though, we limit our discussion to the automatic generation of the CSD only; the algorithm is given in Table-1, using a sample of five videos downloaded from YouTube on the topic of Engineering Ethics.

Table-1: Algorithm for CSD Generation for Engineering Ethics
Step-1: Download five video files from YouTube.
Step-2: Extract the voice data from the video files using a voice web tool.
Step-3: Convert the voice data into text data using a voice-to-text conversion tool.
Step-4: Generate the word frequency table for all five transcribed files.
Step-5: Extract five or more of the most frequently used Engineering Ethics related terms from the first transcribed file.
Step-6: Repeat this for the remaining four files, extracting five or more frequently used Engineering Ethics related terms from each.
Step-7: Match the collected terms with the Reference CSD using tools.
Step-8: Evaluate the accuracy of the procedure, e.g., Efficiency (%) = (matched terms / extracted terms) * 100.
Step-9: Repeat Steps 1 to 8 four times with different sets of videos. Each time, the denominator is enriched by the newly extracted and matched terms.

We illustrate the above algorithm using a set of five video files taken from YouTube, as noted below.

Step 1: Select five video files on Engineering Ethics from YouTube, as described in Table-2.

Table-2: Description of Selected Video Files
Video | Nature | Duration | Size (MB)
Video 1 | Can You Keep a Secret? Confidentiality and Engineering. | 5.24 | 6.33
Video 2 | Conflicts of Interest for Engineers | 7.55 | 6.46
Video 3 | Engineering Ethics 101: Professionalism | 8.31 | 6.89
Video 4 | Ethical Engineering Decision Making | 9.40 | 7.63

Step 2: Extract the voice only from all the above videos [34]. The audio file corresponding to each video file is described in Table-3.

Table-3: Description of Audio Files
Audio | Duration (Min) | Size (MB), 480p
Audio 1 | 5.24 | 26
Audio 2 | 7.56 | 37
Audio 3 | 8.30 | 43
Audio 4 | 9.39 | 45
Audio 5 | 5.15 | 26

Step 3: Convert all audio files into transcribed text using an online tool [24] (a free online tool with a limited number of hours). The authors also manually checked the accuracy of the conversion by listening to the audio files and comparing them with the converted text. The accuracy of this tool is around 98%. The transcribed files are described in Table-4.

Table-4: Description of Text Files
Text File | Size (KB) | Word Count
Transcribed 1 | 35.1 | 765
Transcribed 2 | 23.1 | 1215
Transcribed 3 | 23.7 | 1164
Transcribed 4 | 24.8 | 1524
Transcribed 5 | 22.3 | 757

Step 4: The transcribed files are processed by filtering out articles, conjunctions, prepositions, verbs, etc. After filtering, the authors generate the word frequency count for each file, using free online tools [29].

Step 5: From each transcribed file, five or more of the most frequent words related to Engineering Ethics are extracted. Since five files were taken initially, 40 words are extracted altogether. A separate file, called the Context Sensitive Word List file, is then created for all the extracted words.

Step 6: The extracted words from all five files are represented in tabular form, as shown in Table-5.

Table-5: Context Sensitive Word List file
File Name | Extracted Terms
Transcribed 1 | Divulge trade secrets, former employer obligation, trade secrets, former employer, trade divulge(-ing), obligation (on) confidentiality, Minimum probation period, Hard to prove, trade
Transcribed 2 | Conflict of Interest, Justice, Work, Engineer, Professional, Problem, Relationship, Trust, Professor, Committee, Family
Transcribed 3 | Client, Benefit, Code, Ethics
Transcribed 4 | Test, Spam, Law, Consequences
Transcribed 5 | Responsible, Safety,

Step 7: The words in the above table are matched with the reference CSD [5] [6] [7]. The accuracy of this procedure was evaluated using the formula: Efficiency (%) = (matched terms/30)*100. In our preliminary results, we found the accuracy to be around 80%. Finally, the Context Sensitive Dictionary (CSD) in the domain of Engineering Ethics (as taken from five videos from YouTube) takes shape as shown in Table-6.

Table-6: Generated CSD in the domain of Engineering Ethics Using Five Videos from YouTube
CSD Term | Meaning
Divulge Trade Secrets | To open up issues related to any business/trade
Former Employer Obligation | A previous employee who wishes to keep his information confidential puts restrictions on accessing his personal information
Trade Secrets | The formula or secret of any company product and other issues related to its marketing and charisma. Trade secrets are often under obligation of confidentiality.
Former Employer | Employer who is associated with ongoing projects/process of a business
Trade Divulge(-ing) | Some of the issues of a business, which cannot be revealed or opened to all
Obligation (On) Confidentiality | The restriction put by the company/employer related to any issue of the company
Minimum Probation Period | The minimum prohibition duration of an engineer who violates the obligation of confidentiality
Hard To Prove | Employee violations related to a trade secret are a criminal issue which is very hard to prove
Legal Recourse | Legal action against the violation of rules while disclosing the trade secrets
Trade | Managerial job where engineers prepare designs
Conflict Of Interest | Clash between professional responsibility and personal responsibilities
Justice | Expected capability of deciding between two issues; professional vs. personal
Work | 1. Career choice for engineers, 2. Role according to the requirement of the company
Engineer | 1. A designer, 2. A responsible professional for the safety of the public and society.
Professional | 1. A person who is responsible for the safety of the public and society. 2. Who is capable of running conflict of interest.
Problem | An ethical issue.
Relationship | 1. Professional relationship, 2. Trustworthiness. 3. Relationship between professional and people
Trust | Rely on engineers when ethical conflict arises
Professor | 1. A person who teaches at a University level, 2. Subject exponent. 3. Teaching expert.
Committee | The board of a leading company, who are responsible for any issues related to the company.
Family | Personal interest of the employee, for whose sake he may run conflict of interest
Client | Victim of some social issue who seeks help from the societal authorities to resolve his problem.
Benefit | 1. The profession which benefits humanity and society, 2. Works for the interest of maximum number of people. 3. Benefit of it of the professional organizations
Code | 1. Code of conduct, 2. Code related to the profession, 3. Codes of ethics
Ethics | Devoted to public good, codes of ethics
Test | The Tiger reversibility Test; a reversibility test to divulge the trade secrets
Spam | Unwanted mail; sometimes an employee may be tested on how interested he is in mails from somebody
Law | Spam law which applies in the US, which differentiates law and ethics
Consequences | Engineers should be creative and think of alternatives and the possible consequences related to them.
Responsible | Moral responsibility of an engineer, responsibility of future
Safety | Responsible for safety of the society, prioritizing the safety of the public
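Steps 4 to 8 of the procedure above (word frequency counting, term extraction, matching against a reference CSD, and the efficiency formula) can be sketched in a few lines of Python. This is only an illustrative sketch and not the online tools the authors used: the stop-word list, the sample transcript, the reference term set and the function names `word_frequencies` and `efficiency` are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the paper filters such words with online tools.
STOP_WORDS = {
    "a", "an", "the", "of", "to", "in", "on", "and", "or", "is", "are",
    "for", "with", "that", "this", "it", "be", "as", "by", "at",
}

def word_frequencies(text):
    """Lower-case a transcript, drop stop words, and count the remaining words."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)

def efficiency(extracted_terms, reference_csd):
    """Efficiency (%) = (matched terms / extracted terms) * 100."""
    extracted = {t.lower() for t in extracted_terms}
    reference = {t.lower() for t in reference_csd}
    if not extracted:
        return 0.0
    matched = extracted & reference
    return 100.0 * len(matched) / len(extracted)

# Made-up transcript fragment and reference terms, for illustration only.
sample = ("The engineer must keep the trade secret. "
          "A trade secret is an obligation of the engineer.")
freq = word_frequencies(sample)
top_terms = [word for word, _ in freq.most_common(3)]
reference = {"engineer", "trade", "confidentiality", "obligation"}
score = efficiency(top_terms, reference)
print(top_terms, round(score, 1))
```

In this toy run, two of the three extracted terms appear in the reference set, so the efficiency is about 66.7%. A real pipeline would also need the domain filter of Step 5, which in the paper still involves human judgment.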

3. AUTOMATIC DOMAIN-SPECIFIC PHRASEOLOGICAL CHARACTERIZATION

Characteristic phrases or constructs that relate to specific domains can be automatically extracted from a given text, a process called phraseological analysis. Several linguists [2], [3], [4] have stated that it is these phraseological terms that quite often define the domain area. Phraseological units (PhUs), which are used in specific contexts, are non-variable, or fixed. The terms used to designate a PhU that have received most attention in the linguistic literature are phrases and idioms (see footnote 1). Sometimes there is not even a clear distinction between these two terms, and their parallel use with the same meaning is common practice [20] [1]. In the literature dealing with phraseology, different terms, such as idiom, phraseme or word-group, have often been used to refer to the same category. Each of them is defined according to different criteria and, for this reason, each term leads to broader or narrower definitions and views.

Footnote 1: PhUs are classified as: i) non-motivated word-groups, ii) cannot be freely made up in speech, iii) reproduced as ready-made units, iv) structurally stable, v) possess stability of lexical components, and vi) reproduced as single unchangeable collocations.
Footnote 2: Indeed Z39.50 is so comprehensive that it would take three trees of paper to print it out completely!

N. N. Amosova (1963) [17] deserves a special mention for the formulation of the “phraseologically bound meaning”. In fact, the notion of “contextual determination of meaning” is a core concept for modern English phraseology. PhUs are also investigated from the contextual perspective; under this approach, the distinction is that free word-groups build a variable context, whereas the essential feature of PhUs is a non-variable or “fixed” context. Unlike free word-groups, which have variable components, PhUs allow only partial or no substitution. Burger et al. (2007) [25] claim that “PhUs have the characteristics of the sentence” and hence also include collocations, proverbs and formulae as PhUs. For instance, in examples such as i) to lock the stable door after the horse is stolen, ii) to wash one’s dirty linen in public, and iii) meet the demand/the requirements, the semantic change may affect only one of the components of a word-group (“partially transferred meaning”), as in small hours, small (weak) beer. Another example, “red tape”, meaning bureaucratic methods, conveys a single concept and is reproduced as a single unchangeable collocation. As a result, these PhUs can be identified based on ‘denotational and connotational’ meanings. Alternatively, semantic change may affect the whole word-group (‘completely transferred meaning’), e.g., to skate on thin ice (to take risks), to have one’s heart in one’s boots (to be anxious about something). J. Casares (1950) and N. N. Amosova (1963) [17] think that if multi-word units do not constitute part of the sentence, it is wrong to include them in the system of the language, because they are independent units of communication.
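One simple way to surface candidate phraseological units of the kind discussed here is to count word n-grams that recur verbatim in a transcript, since fixed expressions tend to repeat unchanged while free word-groups vary. The sketch below is an illustration under stated assumptions and not the authors' implementation: the function name `recurring_ngrams`, the sample text and the threshold `min_count` are hypothetical.

```python
import re
from collections import Counter

def recurring_ngrams(text, n=2, min_count=2):
    """Return word n-grams that occur at least min_count times in text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Slide a window of n tokens over the token list.
    # Note: this naive version lets windows cross sentence boundaries.
    grams = zip(*(tokens[i:] for i in range(n)))
    counts = Counter(" ".join(gram) for gram in grams)
    return {gram: c for gram, c in counts.items() if c >= min_count}

# Made-up sample text, for illustration only.
sample = ("Trade secrets are protected. "
          "An engineer must not divulge trade secrets. "
          "Red tape slows reporting, and red tape frustrates engineers.")
phrases = recurring_ngrams(sample, n=2, min_count=2)
print(phrases)
```

On this sample only "trade secrets" and "red tape" recur, which matches the intuition that such units are reproduced as ready-made collocations. Raw frequency is a crude filter, so a practical system would likely add an association measure or a domain word list on top of it.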
The above distinction was recognized and followed by other specialists in the linguistic field, such as Cowie (1988) [11], Melcuk [35] [21], Glaser (1988) and Burger (1988), and later by Granger & Paquot (2008), etc. [20] [21]. We intend to use a confluence of tools and techniques, such as n-grams [27], Latent Semantic Indexing [28], etc., for the purpose of extracting domain phraseology. Our process will also detect inconsistency in phraseology, which might lead to the detection of cross-disciplinary, multi-disciplinary and trans-disciplinary areas of work. While the work in [10], [1] is directed only at detecting inconsistency, our work will extend it towards application domains using advanced algorithms based on Artificial Neural Networks (ANNs).

4. AUTOMATIC GENERATION OF Z39.50 COMPATIBLE ONTOLOGY

Cataloging has been a long-standing process and the earliest catalog has been the Dublin Core [26]. Even though the Library of Congress Catalog, aka Z39.50, is a comprehensive catalog (see footnote 2), it is still insufficient to represent all possible catalog-able items in the world (more on this issue later). Is it possible to automatically catalog an item into Z39.50? The process we wish to adopt is to develop a set of synergistic words for each of the entries in the catalog, initially for one domain, and to match the phraseological terms of a given document against the various entries, so that the document can be cataloged automatically. However, the issue becomes complicated when the given document’s characteristics match more than one catalog entry. This typically happens when fields or disciplines become interrelated. New algorithms are required that can give weights to catalog entries or seed a new catalog entry for the given document. A considerable amount of algorithmic work needs to be carried out in this area. Further, Z39.50 is a very broad, so-called horizontal, catalog and still cannot cover all areas of knowledge. Engineering ethics specific catalogs are available, and one needs to automatically annotate a given document entry on them. In this process, the CSD needs to be focused on engineering ethics specific words, and their meanings should have precise definitions.

5. TOOLS AND METRICS

A list of potential tools that can be used in this area includes the following:
• Latent Semantic Indexing [30]
• Htk [31]
• Ontology Generator [32]
• PHP/YAZ toolkit [33]

The list of potential metrics that could be used in this area includes the following:
• Comparison to other dictionaries
• Babylon
• Word Point [36]
• Clever learn
• I Finger [37]
• Technocraft’s RoboWord

6. RELATED RESEARCH WORK

Aist and Mostow [8] describe automatic generation of all possible questions from a topic of interest, and Allen et al. [9] state that they have been exploring the feasibility of building extensive knowledge bases by reading definitional sources such as online dictionaries and encyclopedias, e.g., Wikipedia. To date, their work has focused on WordNet [13], a widely used lexical resource in Computational Linguistics.
The context-sensitive electronic dictionary developed by Feldweg and Breidt [10] gives an entire online dictionary entry based on the context, focusing on the implementation of the lexicon and context-sensitive features. Cowie et al. [11] regard the dictionary as a codified record of phraseological norms and as an indispensable aid to language learning and teaching, and so dedicated their work purely to drawing upon a standard bilingual dictionary in machine-readable form. They extract its very varied collocational material and then enrich a database with a system of 'lexical functions' (derived from the work of Melcuk [35]), which specify the lexical-semantic relations of collocations. The CoNLL-2010 Shared Task [12] was dedicated to the detection of uncertainty cues and their linguistic scope in natural language texts. The motivation behind this task was that distinguishing factual from uncertain information in texts is of essential importance in information extraction. Fellbaum [13] describes the grouping of verbs, adjectives and adverbs into sets of cognitive synonyms (synsets), each expressing a distinct concept. WordNet [13] is also freely and publicly available for download, and its structure makes it a useful tool for computational linguistics and natural language processing. Papineni et al. [14] proposed a method for the automatic evaluation of machine translation that is quick, inexpensive and language-independent; it correlates highly with human evaluation and has little marginal cost per run. They presented the method, BLEU (Bilingual Evaluation Understudy) [14], as an automated understudy to skilled human judges, which provides quick results when frequent evaluations are called for.

7. CONCLUSIONS

We have attempted a computational linguistics based approach to CSD production, and our initial results indicate a good efficiency of around 80%. However, our process is semi-automatic and we are currently working on advanced algorithms to automate various processes therein. Clearly, there is a need for new ways of indexing and cataloging videos. In addition to extracting and analyzing keywords and phrases contained in a video, our approach will, in the longer term, also examine the collection as a whole, in order to perform such operations as automatic ontology management, automatic catalog entry generation, query-by-example, vertical and fine-grain catalogs, and automatic Context Sensitive Dictionary (CSD) generation, leading to a Federated Domain Catalog (FDC). Specifically, our future work in this area includes, but is not limited to, the following:
• Automatically extract topic/s and catalog all YouTube videos, initially using available algorithms (and some developed by the PIs) based on computational linguistic techniques.
• Develop algorithms for real application scenarios such as TED.com and other such systems.
• Develop higher level toolkit/s for a higher level of information integration, and build intelligent mediators rather than wrappers.
The most important aspect will be to develop algorithms for automatic video cataloguing, indexing and retrieval.

8. REFERENCES

1. D. Chiang. (2005). A hierarchical phrase-based model for statistical machine translation. In Proc. of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05), pages 263–270.
2. P. Koehn, F.J. Och, and D. Marcu. (2003). Statistical phrase-based translation. In Proceedings of ACL 2003, pages 48–54.
3. Van de Cruys, T. (2010). "Mining for Meaning. The Extraction of Lexico-Semantic Knowledge from Text" (PDF).
4. Lin, D. (1998). Automatic retrieval and clustering of similar words (PDF). 17th International Conference on Computational Linguistics (COLING). Montreal, Canada. pp. 768–774.
5. Fatheringham, H. (2007), ‘Glossary for Engineers’, Engineering Ethics Case Studies, University of Leeds, viewed (20/12/2015) http://www.engsc.ac.uk/downloads/scholarart/ethics/glossary.pdf
6. https://www.niehs.nih.gov/research/resources/bioethics/glossary/index.cfm
7. Charles B. Fleddermann. (2012). Engineering Ethics, Fourth Edition, University of New Mexico.
8. Aist, G. & J. Mostow, (2009). “Predictable and educational spoken dialogues: Pilot results,” in Proceedings of the 2009 ISCA Workshop on Speech and Language Technology in Education (SLaTE 2009). Birmingham, UK: University of Birmingham. [Aist & Mostow 2009 available online (pdf)]
9. Allen J., W. de Beaumont, L. Galescu, J. Orfan, M. Swift, and C.M. Teng, (2013). “Automatically deriving event ontologies for a commonsense knowledge base,” in Proceedings of the 10th International Conference on Computational Semantics (IWCS 2013), University of Potsdam, Germany, March 19–22. Stroudsburg, PA: Association for Computational Linguistics (ACL). [Allen et al. 2013 available online (pdf)]
10. Feldweg and Breidt (1996); or Sharp, see Poznanski et al, 1998.
11. Cowie, A.P. (1998). Phraseology: Theory, Analysis, and Applications. Oxford: Oxford University Press.
12. Farkas, Richárd, Veronika Vincze, György Móra, János Csirik, and György Szarvas. (2010). “The CoNLL-2010 shared task: learning to detect hedges and their scope in natural language text.” In Proceedings of the 14th Conference on Computational Natural Language Learning, Shared Task, pp. 1-12. Association for Computational Linguistics.
13. Fellbaum, Christiane. “WordNet: An electronic lexical database. 1998.” WordNet is available from http://www. cogsci. princeton. edu/wn (2010).
14. Papineni, Kishore, Salim Roukos, Todd Ward, and Wei-Jing Zhu. “BLEU: a method for automatic evaluation of machine translation.” In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pp. 311-318. Association for Computational Linguistics, 2002.
15. Grossman and Frieder 2004; Information Retrieval: Algorithms and Heuristics, The Kluwer International Series on Information Retrieval, Vol. 15, pp. 1-11, http://informationretrieval.org.
16. Hayes, P. (ed) (2002): RDF Semantics. http://www.w3.org/TR/rdfmt/ Beckett, DJ (Author); Brickley, D (Author); Miller, E (Author). 2002. Expressing Simple Dublin Core in RDF/XML.
17. Amosova, N.N. 1963. Osnovi angliyskoy frazeologii. Leningrad.
18. Cowie, A. P. (1998). Phraseology: Theory, Analysis, and Applications (Oxford Studies in Lexicography and Lexicology). Oxford: Oxford University Press.
19. Colson, J.-P. (2008). Cross-linguistic phraseological studies: An overview. Granger, S. & Meunier, F. Phraseology: An interdisciplinary perspective. John Benjamins Publishing Company, Amsterdam / Philadelphia.
20. Burger, H. (1998) Phraseologie: Eine Einführung am Beispiel des Deutschen, Berlin: Erich Schmidt.
21. Sinclair, J. (2008). The phrase, the whole phrase, and nothing but the phrase. Granger, S. & Meunier, F. Phraseology: An interdisciplinary perspective. John Benjamins Publishing Company, Amsterdam / Philadelphia.
22. https://www.youtube.com/user/BritishCouncilLE Accessed on 20/12/2015
23. https://www.edx.org/school/mitx Accessed on 20/12/2015
24. https://transcribe.wreally.com Accessed on 30/12/2015
25. Burger, H., D. Dobrovol’skij, P. Kühn & N. R. Norrick (eds.) (2007) Phraseology: An International Handbook of Contemporary Research, vols 1 & 2. Berlin: Mouton de Gruyter.
26. https://www.yumpu.com/en/document/view/5483800/a-documentontology-and-agent-based-rdf-metadata-retrieval, Miller and Brickly 2002. Accessed on 2/01/2016
27. Christopher D. Manning, Hinrich Schutze, (1999). Foundations of Statistical Natural Language Processing, MIT Press. ISBN 0-262-13360-1.
28. http://classifier.rubyforge.org/ Accessed on 30/12/2015
29. http://www.online-utility.org/ and http://www.textfixer.com/ Accessed on 2/1/2016
30. http://lsa.colorado.edu/ Accessed on 31/12/2015
31. http://htk.eng.cam.ac.uk Accessed on 30/12/2015
32. Class InferredOntologyGenerator, java.lang.Object; org.semanticweb.owlapi.util.InferredOntologyGenerator. Accessed on 30/12/2015
33. http://www.indexdata.com/services Accessed on 28/12/2015
34. http://videodownloader.ummy.net/ video to audio converter. 25/12/2015
35. http://olst.ling.umontreal.ca/pdf/yop_2012_0003.final.pdf Accessed on 30/12/2015
36. http://www.babylon.com/ Accessed on 31/12/2015
37. http://ifinger.com/ Accessed on 31/12/2015
