A Workbench for Collaborative Ontological Knowledge Construction and Maintenance with Authoring Tools

Daoyos Noikongka1, Dussadee Thamvijit1, Aurawan Imsombut2, Mukda Suktarachan1, Sachit Rajbhandari1, Frederic Andres3, Asanee Kawtrakul1

1 Department of Computer Engineering, Kasetsart University, Bangkok, Thailand
2 The Faculty of Information Technology, Dhurakij Pundit University, Bangkok, Thailand
3 National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan
E-mail: {dusadee, daoyos, mukda, ak}@naist.cpe.ku.ac.th, [email protected], [email protected], [email protected]

Abstract. Knowledge management systems have often been considered a means of sharing knowledge among communities of users. The key to successful knowledge sharing in the field of agriculture is the use of shared, agreed terminologies such as ontological knowledge, especially in multilingual settings. This paper proposes a workbench for collaborative ontological knowledge construction and maintenance with authoring tools, from an engineering point of view, in the field of food and agriculture. The framework consists of ontological knowledge management, user management and three authoring tools: an ontology acquisition support tool, a lexical acquisition support tool and an ontology integration tool.

Keywords: Agriculture Ontology, Collaborative Multilingual Ontological Workbench, Ontological Authoring Tool

1 Introduction

The Web, blogs and knowledge management systems [1] have often been considered a means of sharing knowledge among a community of users. In specific domains such as agriculture or food, experts need considerable time to build and share a common understanding, because they rely on several ad hoc processes (databases, office file sharing, web sites, ...) such as document databases (agnic, www.agnic.org; faostat, faostat.fao.org) or traditional thesauri (the NAL Agricultural Thesaurus, agclass.nal.usda.gov/agt/about.shtml). Agreed terminologies emerge from collaborative communities, where terms are created through human interaction with computers and through human-to-human interaction via computers. Until now, ontological knowledge building has progressed through craft approaches rather than real engineering methodologies.

Recently, the need for monolingual and multilingual ontologies [2] has increased as agricultural knowledge is applied in many applications [3], [4], such as search engines (agfind, www.agfind.com; agrisurf, www.agrisurf.com), query & answer systems (www.agriculture.gov.bb/default.asp?V_DOC_ID=1618), and Web applications such as RSS feeders and query expansion web services. General web search services (e.g. Google, Yahoo) also use ontology-based approaches to find and organize content on the Web. As knowledge and communities scale with the information explosion, automatic ontology construction has to be combined with collaborative ontology development in order to integrate ontologies [5] that are accessible over the Internet to external applications. Ontology construction has been an active research topic [6], [7], [8] in the past few years. This research covers various types of sources, methodologies and application domains (e.g. agriculture, justice), and tackles issues related to ontology specification, ontology domain, conceptualization and integration.

In this paper, we propose a workbench platform that tackles two important research issues: automatic ontology extraction from multiple resources (e.g. textual documents, dictionaries and thesauri) and collaborative ontology updates by the end-user community. Ontology extraction is performed on two types of corpora: unstructured textual resources, and structured or semi-structured textual resources.

For unstructured resources, our ontology learning technique applies a hybrid approach. We observe three common approaches to ontology learning: pattern-based, statistical and hybrid. Each has its own strengths. The pattern-based technique [9], [10] is efficient for general patterns that can be used across several domains. However, it requires many pre-defined extraction patterns, and it suffers from cue word ambiguity and data sparseness. The statistical approach [11], [12] is scalable: it can process large amounts of data and many features, but its drawbacks are data sparseness and the manual effort needed to label the cluster nodes. This approach is appropriate for extracting taxonomic relations. The hybrid approach [13], [14] can extract both taxonomic and non-taxonomic relations; however, it needs pre-defined extraction patterns for its pattern-based component and many training examples for its machine learning component.

Few works construct ontologies from structured or semi-structured textual resources such as thesauri and dictionaries, because of copyright complications and a lack of resources. In our processing, we applied a hybrid approach. Among other work, [15], [16] applied a rule-based approach for converting thesauri, and [17], [18] used existing dictionaries for constructing ontologies. This approach is not complex, but it needs experts to define heuristic rules. Jannink [19] used a statistical approach to convert dictionaries into a thesaurus. The benefit of this approach is the amount of data that can be processed, but it cannot define relation types.

The rest of this paper is organized as follows. Section 2 gives an architecture overview of the ontological knowledge construction and maintenance workbench, including the authoring tools. In Section 3, we present our approach to ontological knowledge building and maintenance management. In Section 4, we present our research results on the ontological knowledge authoring tools. Finally, in Section 5, we conclude this work and outline future work.

2 Overview of Ontology Management Architecture

Figure 1 shows the generic workbench platform for multilingual ontological knowledge construction and maintenance, obtained by extending the AOS (Agricultural Ontology Server) construction workbench with authoring tools.

[Figure 1: architecture diagram. Labelled components: Communities; GUI for Ontology Acquisition & Maintenance; Ontological Knowledge Management (concept management, relationship management, scheme management, search, export, consistency check, import, validation); User Management (user management, group management, system statistic report, system preference); Authoring Tools (printed text, MRD, dictionaries, morphological analysis and phrase chunking, task-oriented parsing, ontology extraction, lexicon information integration, filtering & correcting, ontology integration); JDBC API with SQL to the System Data Repository (MySQL); Sesame API with SeRQL to the Ontology Repository in OWL format (MySQL).]

Fig. 1. Overall System architecture

2.1 What is the Agricultural Ontology Server Workbench?

The AOS (Agricultural Ontology Server) Workbench, originated by FAO, is a web-service Java tool for collaboratively building and structuring multilingual ontology and terminology systems in the area of agriculture in a distributed environment. For this workbench, we moved away from the centralized development of AOS towards a Web 2.0-inspired model of networked and distributed contributions, in order to create a system with richer semantics that will greatly enhance resource indexing and the related search, as well as information organization in the agricultural domain.

2.2 Moving to a Generic Workbench and Authoring Tools

As shown in Fig. 1, the extended workbench consists of three main parts: the ontological knowledge management component, the user management component and the authoring tools (see Sections 3 and 4 for more details).

3 Ontological Knowledge Construction and Maintenance

Since the workbench supports collaborative ontological knowledge construction and maintenance, good ontological knowledge management and user management are needed.

3.1 Ontological Knowledge Management

The ontology is kept in OWL (Web Ontology Language) format, with MySQL as the persistent repository. Sesame is used as the OWL framework for operations on the OWL data, such as querying with the SeRQL query language (www.openrdf.org/doc/sesame/users/ch06.html), adding graphs, deleting graphs and exporting data. Users manage the ontology through the following functions.

Concept Management Function. This module provides concept navigation. End users can create or delete concepts in the concept hierarchy. After adding a new concept, the user can add, edit or delete further information in each of the following components:
• Basic information such as create-date = 2006-10-03, update-date = 2006-10-03, status = published.
• History of changes, for tracking the versions of concepts and their terms in any language.
• Scope note, for recording important information to share with the other users in the community.
• Terms related to the concept in any language, to support multilingual use. For example, a user browsing the concept “public administration” can also see its terms in other languages, such as “public administration (en)” and “Administration publique (fr)”.
• Definition of the concept in any language, to clarify the meaning of the concept, especially for technical terms. For example, the definition of the concept Cycadaceae (en) is “ancient palmlike plants closely related to ferns in that fertilization is by means of spermatozoids (en)”.
• Relationships between the user's selected concept and other concepts.
• Image associated with the concept.

With this information, collaborative ontology construction can be managed more consistently and efficiently. This function also allows administrators to manage permissions for ontology editors, validators, etc.

Search Function. This function consists of basic search and advanced search.
• Basic search. The user searches for concepts using a term as the query, and the results are the concepts that have that term. Further options for refining the results are the regular-expression match mode (contain, exact match or start with), case sensitivity and whether descriptions are included.
• Advanced search. With the advanced search, the user can make the results more accurate by filtering concepts by concept relationship, sub-vocabulary (geographic, scientific term, etc.), term code, concept status or classification scheme.

Relationship Management Function. The data model of this system is an ontological one kept in OWL format, which is basically a triple pattern (subject-predicate-object). The user can use the relationship management module to add, edit or delete the predicates used in the system. The relationship hierarchy consists of two types of relationship properties (i.e. object properties and datatype properties). When adding a new relationship, users can also add related information to it, and they can edit or delete the related information components listed below.
• Label: label of the relationship in any language, such as “has category”.
• Definition: definition of the relationship in any language. For example, for the relationship “belong to category”, the definition is “to map any domain concept to any category”.
• Properties: properties of the relationship, such as symmetric, transitive, functional and inverse functional.
• Domain & Range: boundary of the subject and object of the relationship. For example, “has image” has “domain concept” as domain and “image” as range.

Consistency Check Function. This function checks whether parts of the ontology are inconsistent with respect to the consistency conditions. It returns the inconsistent parts together with a solution for each issue.

Validation Function. People may construct ontologies in their own way, or they may have different background knowledge. As shown in Fig. 1, every action that changes data in the ontology needs to be approved by two types of user group, “validator” and “publisher” (ontology expert). The validation function enforces this approval before the updates are released to the public.

Import Function. This function imports an external ontology in OWL format that has the same schema as the system. In case of duplication, the system alerts the user.

Export Function. This function exports the ontology from OWL format to RDF, XML, TBX, SKOS, OWL (simple format) and RDBMS (SQL, UTF-8) formats.

Scheme Management. This module is used for grouping concepts into user-defined categories.
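The basic search options described above (match mode, case sensitivity, and optional inclusion of descriptions) can be illustrated with a minimal Python sketch. It is not the workbench's implementation, which runs on Java and Sesame; the `Concept` class, the `basic_search` function and the mode names `"contain"`, `"exact"` and `"start"` are hypothetical names chosen for this illustration.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Concept:
    """Hypothetical in-memory stand-in for a concept stored in the ontology."""
    concept_id: str
    terms: Dict[str, str]      # language code -> term label
    description: str = ""


def basic_search(concepts: List[Concept], query: str, mode: str = "contain",
                 case_sensitive: bool = False,
                 include_description: bool = False) -> List[Concept]:
    """Return the concepts having a term that matches the query."""
    escaped = re.escape(query)  # treat the query as literal text
    patterns = {
        "contain": escaped,
        "exact": r"\A" + escaped + r"\Z",
        "start": r"\A" + escaped,
    }
    if mode not in patterns:
        raise ValueError(f"unknown match mode: {mode}")
    flags = 0 if case_sensitive else re.IGNORECASE
    regex = re.compile(patterns[mode], flags)

    hits = []
    for concept in concepts:
        texts = list(concept.terms.values())
        if include_description:
            texts.append(concept.description)
        if any(regex.search(text) for text in texts):
            hits.append(concept)
    return hits


concepts = [
    Concept("c1", {"en": "public administration", "fr": "Administration publique"}),
    Concept("c2", {"en": "Cycadaceae"},
            "ancient palmlike plants closely related to ferns"),
]
matches = basic_search(concepts, "administration", mode="start")
print([c.concept_id for c in matches])
```

Escaping the query keeps user input from being interpreted as a pattern; the three modes then differ only in how the escaped text is anchored.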

3.2 User Management

The main task of this part is user and group management: it defines the permissions users need to access and register for each module of the workbench. It can also broadcast news updates to the community. This part additionally collects statistical data (event logs for the system), which is kept in a MySQL database. Its four functions manage only user and system data; this part has no effect on the ontology model.

4 Ontological Knowledge Authoring Tools

One necessary part of this workbench is the ontological knowledge authoring tools, a (semi-)automatic ontology acquisition component that supports users in acquiring a complete and up-to-date ontology. This component extracts ontological terms, their lexicon information and their relations from different resources, i.e. texts and dictionaries, and integrates them into the core ontology. It is divided into three sub-processes: ontology acquisition, lexicon information acquisition and ontology integration.

4.1 Ontology Acquisition Process

The process of (semi-)automatic ontology acquisition from texts is composed of two main processes: morphological analysis and phrase chunking, followed by ontology learning.

Morphological Analysis and Phrase Chunking. These are the preprocessing modules. Their execution is language dependent, so the grammatical rules are changed for each language processed. The first step (if needed) is to scan printed books in order to convert them into electronic text. After that, a shallow parser, based on grammatical rules and a statistical approach, identifies word boundaries and morphological information, e.g. part of speech. Next, the output is chunked into phrases using grammatical rules.

Multi-Algorithms for Ontology Learning. Multiple algorithms are applied to extract complete ontological terms and relationships: concept acquisition, NP analysis-based taxonomic relation acquisition and cue-based taxonomic relation acquisition.

Concept acquisition module. Concepts are acquired by using term frequencies in texts. Terms that are used more frequently in a domain-specific corpus than in a general corpus are identified as ontological concepts and proposed to the user for verification.

NP analysis-based taxonomic relation acquisition. The noun phrase analysis technique analyzes the surface form of a compound term's head word. If the head word of a term has the same surface form as another term, the system applies the IS-A relationship between them. For example, the head word of cow milk is milk, which has the same surface form as the term milk, so the system identifies cow milk as a subclass of milk.

Cue-based taxonomic relation acquisition. To identify the intended relationships between ontological terms, we use explicit cues, i.e. lexico-syntactic patterns (e.g. NP such as NP1, NP2, …) [9] and item lists (i.e. bullet lists and numbered lists). The main advantage of this approach is that it simplifies the task of concept and relation labeling, since the cues can be used to identify the ontological concepts and to hint at their relations. However, this technique poses certain problems: cue word ambiguity, item list identification ambiguity and ambiguity among numerous candidate terms. The last problem is especially important for sentences in which the head word has several modifiers. A methodology for solving these problems is proposed in [20], using lexicon and co-occurrence features and a weighting technique based on information gain. The system calculates the most likely hypernym (MLH) value of all candidate terms and selects the term with the maximum MLH value in each candidate set as the ontological term of the related terms.

The corpus used to test these methodologies is in the domain of agriculture: 302,640 words of plain Thai text from 90 documents. On these documents, the system extracted about 2,228 concepts and 2,325 taxonomic relations using the multi-algorithm techniques. The system achieved a precision of 0.74, a recall of 0.78 and an F-measure of 0.76. The main errors of the pattern approach are caused by ambiguities of the cue words.

4.2 Lexicon Information Acquisition Process

KULEX [21], a semi-automatically constructed computational lexicon, originated from integrating word information from multiple Thai dictionaries: Klangkam (Nawawan Pantumata: Klangkam Dictionary. Amarin P&P, Thailand, 2004), the Royal Institute Dictionary (RID; Royal Institute: Royal Institute Dictionary. Aksorn Jarurntad printing, Thailand, 1988) and the Matichon Dictionary (Matichon Public Co., Ltd.: Matichon Dictionary. Pickanes printing center, Thailand, 2004). These dictionaries are regarded as very good and reliable resources. KULEX is organized as a concept hierarchy with the necessary information. It greatly reduces the labor and time required, and it contains a variety of word information. Almost all of the dictionaries that serve as resources for our system are in printed form, so optical character recognition is applied to convert the document images into electronic text. The optical character recognition system applied is ArnThai (http://arnthai.links.nectec.or.th), which has 90%-95% correctness. Lexicon construction by this process consists of two main steps.

Task-Oriented Parsing. Each dictionary has a different content structure; for example, one is organized as a hierarchical concept structure while another is ordered alphabetically. The word information extraction tools therefore differ from dictionary to dictionary. Fig. 2 and Fig. 3 show examples of word information extraction from each dictionary.

[Figure 2: excerpts from the Klangkam dictionary, including entries for ข้าว (rice) and ปลูก (to plant), annotated with the extracted fields Concept Hierarchy, Word, Classifier, Word POS, Definition and Word Usage Example.]

Fig. 2. Parsing of Klangkam dictionary

[Figure 3: two Royal Institute Dictionary entries for the verb ปลูก (ก.), each annotated with its extracted fields, including the Word Usage Example.]

Fig. 3. Parsing of Royal Institute dictionary
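Task-oriented parsing, with one extraction routine per dictionary, can be sketched as a small dispatch table. This is an illustration only: the two entry layouts below are simplified assumptions (an alphabetical entry written as "word POS definition เช่น example." and a hierarchical entry written as "concept | word | definition"), and the names `WordInfo`, `parse_alphabetical_entry`, `parse_hierarchical_entry` and `PARSERS` are hypothetical, not taken from the KULEX tools.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class WordInfo:
    """Word information extracted from one dictionary entry."""
    word: str
    pos: Optional[str] = None
    definition: Optional[str] = None
    usage_example: Optional[str] = None
    concept: Optional[str] = None


EXAMPLE_CUE = "เช่น"  # Thai cue word introducing a usage example ("for example")


def parse_alphabetical_entry(entry: str) -> WordInfo:
    """Assumed layout: '<word> <POS> <definition> เช่น <example>.'"""
    word, pos, rest = entry.strip().split(" ", 2)
    definition, _, example = rest.partition(EXAMPLE_CUE)
    return WordInfo(
        word=word,
        pos=pos,
        definition=definition.strip(" ,."),
        usage_example=example.strip(" .") or None,
    )


def parse_hierarchical_entry(entry: str) -> WordInfo:
    """Assumed layout: '<concept> | <word> | <definition>'"""
    concept, word, definition = (part.strip() for part in entry.split("|"))
    return WordInfo(word=word, definition=definition, concept=concept)


# One parser per dictionary, because each dictionary has its own structure.
PARSERS: Dict[str, Callable[[str], WordInfo]] = {
    "alphabetical": parse_alphabetical_entry,
    "hierarchical": parse_hierarchical_entry,
}


def parse_entry(dictionary_type: str, entry: str) -> WordInfo:
    return PARSERS[dictionary_type](entry)


word, pos, definition, example = "ปลูก", "ก.", "ทำให้งอกงาม", "ปลูกไมตรี"
alphabetical = parse_entry("alphabetical", f"{word} {pos} {definition} {EXAMPLE_CUE} {example}.")
hierarchical = parse_entry("hierarchical", "food plants | rice | plant whose seeds are a staple food")
print(alphabetical)
print(hierarchical)
```

Both parsers return the same `WordInfo` record, so the later integration step can combine fields (concept from one source, part of speech and definition from another) without caring which dictionary supplied them.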

Lexicon Information Integration. We integrate the word information from the various dictionaries step by step. First, with task-oriented parsing, the system acquires word entries and their information from Klangkam (e.g. concept hierarchy) and from RID (e.g. part of speech, word definition). Next, the system integrates this information into the concepts by using word definition-based classification. This semi-automatic classification is based on two approaches: head word matching, and matching against the definitions of the words in the same concept (see [21] for more details).

The first approach is applied where the surface forms from RID are consistent with the words in Klangkam. It uses the Lesk algorithm [22] to find the similarity of word senses, based on the assumption that “words which have a similar surface form (head word) and sense in each dictionary should have similar word definitions and should be in the same concept”. Its correctness is 91.50%.

The second approach is applied to the remaining words, which have different head words, and to the remaining senses. It uses term weighting [23] to integrate these words and their information with the appropriate concept, based on the assumption that “words related to the other words in the same concept should have similar word definitions”. Its correctness is 65% within the top ten ranked concepts.

4.3 Ontology Integration and Reorganization

At this step, the related word/phrase pairs collected from the two types of sources, texts and dictionaries, are integrated into the existing core ontology by applying two heuristic techniques:
• If separate ontological trees have nodes with the same label, then merge them.
• If the terms' head words match partially, then merge them. For example, Fruit matches the head word of Tropical Fruit.

Currently, two operations are involved in this process:
• Addition: a child node is added to the core tree if its parent node has the same label as a node in the core tree.
• Insertion: if the child nodes have the same label and the head word of the new parent node matches the existing parent node, the new, more specific term is inserted between the two existing ontological terms.

Fig. 4 shows an example of the insertion operation, in which a new ontological tree (right-hand-side tree) is inserted into a core tree (left-hand-side tree). The remaining terms that cannot be integrated are kept for an expert to add manually later.

[Figure 4: the core tree Fruit → Durian plus the new tree Tropical Fruit → Durian yields the integrated tree Fruit → Tropical Fruit → Durian.]

Fig. 4. Example of the insertion operations for ontology integration
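The addition and insertion operations, including the Fruit / Tropical Fruit / Durian case of Fig. 4, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the system's code: the `CoreTree` class and `head_matches` helper are hypothetical, the tree is stored as a child-to-parent map, and partial head word matching is approximated for English-style terms whose head word comes last (Thai noun phrases would need the opposite direction).

```python
from typing import Dict, List, Optional, Tuple


def head_matches(specific: str, general: str) -> bool:
    """True if `specific` ends with `general` as whole words,
    e.g. 'Tropical Fruit' vs 'Fruit' (assumes head-final terms)."""
    s = specific.lower().split()
    g = general.lower().split()
    return len(s) > len(g) and s[-len(g):] == g


class CoreTree:
    """Core ontology tree stored as a child -> parent map."""

    def __init__(self, root: str) -> None:
        self.parent: Dict[str, Optional[str]] = {root: None}
        self.pending: List[Tuple[str, str]] = []  # left for the expert

    def integrate(self, new_parent: str, new_child: str) -> str:
        # Addition: the parent label already exists in the core tree.
        if new_parent in self.parent and new_child not in self.parent:
            self.parent[new_child] = new_parent
            return "addition"
        # Insertion: the child exists and the new parent is a more
        # specific term whose head word matches the existing parent.
        old_parent = self.parent.get(new_child)
        if (new_parent not in self.parent and old_parent is not None
                and head_matches(new_parent, old_parent)):
            self.parent[new_parent] = old_parent
            self.parent[new_child] = new_parent
            return "insertion"
        # Anything else is kept for manual integration.
        self.pending.append((new_parent, new_child))
        return "pending"

    def path(self, node: str) -> List[str]:
        """Labels from the root down to `node`."""
        chain: List[str] = []
        current: Optional[str] = node
        while current is not None:
            chain.append(current)
            current = self.parent[current]
        return chain[::-1]


tree = CoreTree("Fruit")
first = tree.integrate("Fruit", "Durian")
second = tree.integrate("Tropical Fruit", "Durian")
third = tree.integrate("Vegetable", "Cabbage")
print(first, second, third, tree.path("Durian"))
```

Pairs that satisfy neither condition are collected in `pending`, mirroring the statement that terms which cannot be integrated are left for an expert.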

Ontology integration is performed iteratively as the system adds each extracted concept and relationship to the core tree. The system integrated 1,544 relationships extracted from the corpus into the core tree with the term matching technique, and 595 relationships with the partial head word matching technique. The accuracies of these techniques are 0.82 and 0.91, respectively.

5 Conclusion

The workbench was originated by FAO and has been developed on Web 2.0 principles together with Kasetsart University (NAiST lab and Thai AGRIS Center). To help experts both construct and maintain ontological knowledge, we plug in authoring tools built on our previous research in semi-automatic ontology construction and maintenance from unstructured text, and in lexicon knowledge acquisition from existing sources, i.e. dictionaries and thesauri. The workbench can support experts in the agricultural area in building and structuring multilingual ontology and terminology systems in a distributed environment that allows users to edit content themselves and provides services for collaborative work. The workbench has also been tested by FAO via an e-conference lasting four weeks, with 170 participants from 43 countries, including Canada, Chile, Czech Republic, Finland, France, Germany, Hungary, India, Italy, Korea, Netherlands, Norway, Peru, Philippines, Portugal, Serbia, Spain, Sudan, Thailand, Turkey and the USA. The test showed that users in different countries and languages can access the system and work at any time. The next step is to find strategies for promoting this workbench and to gather feedback for tuning the system.

Acknowledgments. This work is supported by NECTEC, FAO and NII. We would like to give special thanks to Margherita Sini, Boris Lauser, Johannes Keizer and the team from FAO for the first design of the distributed environment and for their valuable comments and cooperation regarding the AOS project. We also thank Mrs. Aree Thunkijjanukit for her suggestions and for kindly providing the resources of Thai Agrovoc.

References

1. Linger H., Fisher J., Wojtkowski W. G., Zupanci J., Vigo K., and Arnold J.: Constructing the Infrastructure for the Knowledge Economy: Methods and Tools, Theory and Practice. Plenum Pub Corp, 716 pp, ISBN 0306485540. (2004)
2. Basili, R., Pazienza, M. T., Zanzotto, F. M.: Web-Based Information Access: Multilingual Automatic Authoring. In: Intl. Conf. on IT: Coding and Computing (ITCC), p. 0548. (2002)
3. Reddy, P. K., Ramaraju, G. V., and Reddy, G. S.: eSagu™: a data warehouse enabled personalized agricultural advisory system. In: Proceedings of the 2007 ACM SIGMOD (Beijing, China, June 11 - 14, 2007). SIGMOD '07. ACM Press, New York. (2007)
4. Kemp, Z., Tan, L., and Whalley, J.: Interoperability for geospatial analysis: a semantics and ontology-based approach. In: Proc. of the 18th Conf. on Australasian Database, Volume 63, Victoria, Australia. (2007)
5. Ahmad, M. N. and Colomb, R. M.: Managing ontologies: a comparative study of ontology servers. In: Proc. of the 18th Conf. on Australasian Database, Volume 63, Victoria, Australia. (2007)
6. Liu, C., Chen, W., and Han, Y.: DODO: a mechanism helping to dynamically construct domain ontologies for services integration. In: Proc. of the Intl. Workshop on SOSE '06. ACM Press, New York, NY, 13-18. (2006)
7. Alani, H.: Position paper: ontology construction from online ontologies. In: Proc. of the 15th Intl. Conf. on WWW. ACM Press, New York, NY, 491-495. (2006)
8. Casellas, N., Casanovas, P., Vallbé, J., Poblet, M., Blázquez, M., Contreras, J., López-Cobo, J., and Benjamins, V. R.: Semantic enhancement for legal information retrieval: Iuriservice performance. In: Proc. of the 11th Intl. Conf. on Artificial Intelligence and Law, ICAIL '07. ACM Press, New York, NY, 49-57. (2007)
9. Hearst, M.: Automatic acquisition of hyponyms from large text corpora. In: Proceedings of the 14th International Conference on Computational Linguistics. (1992)
10. Landau M. F., E. Morin.: Extracting semantic relationships between terms: supervised vs. unsupervised methods, pp. 71-80. In: Proc. of International Workshop on Ontological Engineering on the Global Information Infrastructure, Dagstuhl Castle, Germany. (1999)
11. Bisson, G., C. Nedellec, D. Cañamero: Designing Clustering Methods for Ontology Building. The Mo’K Workbench. In: Proc. of the Workshop on ECAI’00, Germany. (2000)
12. Agirre, E., O. Ansa, E. Hovy, D. Martinez.: Enriching very large ontologies using the WWW. In: Proceedings of the Workshop on ECAI. (2000)
13. Navigli, R., P. Velardi and A. Gangemi.: Ontology learning and its application to automated terminology translation. IEEE Intelligent Systems 18(1). (2003)
14. Shamsfard, M. and A. A., Barforoush.: Learning ontologies from natural language texts. In: International Journal of Human-Computer Studies 60(1): 17-63. (2004)
15. Soergel, D., B. Lauser, A. Liang, and F. Fisseha.: Reengineering thesauri for new applications: The AGROVOC example. In: Journal of Digital Information 4(4). (2004)
16. Wielinga, B., A. Th, S. Wielemaker and J. Sandberg.: From thesaurus to ontology, pp. 194-201. In: Proc. of the Intl. Conf. on Knowledge Capture. ACM Press, Canada. (2001)
17. Kang S. J. and J.H. Lee.: Semi-automatic practical ontology construction by using a thesaurus, computational dictionaries, and large corpora, pp. 45-52. In: Proc. of ACL 2001 Workshop, Toulouse, France. (2001)
18. Kietz, J.U., A. Maedche and R. Volz.: A method for semi-automatic ontology acquisition from a corporate intranet. In: Proc. of Workshop Ontologies and Text, co-located with the 12th Intl. Workshop EKAW'2000, Juan-Les-Pins, France. (2000)
19. Jannink, J.: Thesaurus entry extraction from an on-line dictionary. In: Proceedings of Fusion '99, Sunnyvale CA. (1999)
20. Imsombut, A., A., Kawtrakul.: Automatic building of an ontology on the basis of text corpora in Thai. To appear in Language Resources and Evaluation Journal, special issue on Asian Language Technology, Springer. (2007)
21. Noikongka, D., Suktarachan M., A., Kawtrakul.: Semi-Automatic Thai Computational Lexicon Construction: KULEX. In: The 7th International Symposium on Natural Language Processing, SNLP2007. Pattaya, Thailand. (2007)
22. M., Lesk.: Automatic Sense Disambiguation using Machine Readable Dictionaries: How to Tell a Pine Cone from an Ice Cream Cone. In: 5th International Conference on Systems Documentation (ACM SIGDOC). Toronto. (1986)
23. Christopher, D. M., S., Hinrich.: Foundation of Statistical Natural Language Processing. The MIT Press, Fifth printing, England. (2002)