A TOOL SUPPORTING MINING BASED APPROACH SELECTION TO AUTOMATIC ONTOLOGY CONSTRUCTION
Agnieszka Konys
West Pomeranian University of Technology in Szczecin, Faculty of Computer Science and Information Technology
Żołnierska 49, 71-210 Szczecin, Poland
ABSTRACT
In recent years the Semantic Web community has been very active and productive. One of its main purposes is to provide a meaningful representation of machine-readable data on the Web. Owing to the dynamic development of new technologies and the increasing amount of data available on the Web, efficient data collection, analysis and processing play an ever greater role. Users increasingly expect search engines to understand natural language and to perceive the intent behind the words they type in, and search engine algorithms are rising to this challenge. In this research area a great amount of work is dedicated to improving ontology engineering. However, ontology construction for frequently changing domains is time-consuming and very often requires an expert's participation. This paper presents the problem of automatic ontology construction, with particular attention to mining based approaches. The choice of a proper mining approach depends heavily on the type of input source and the offered functionalities. A tool supporting mining based approach selection to automatic ontology construction is proposed. It may help users select a relevant mining based approach to automatic ontology construction, and it systematizes knowledge in this area. The presented work is part of a more complex procedure for automatic ontology construction; owing to the limited space of this paper and the complexity of the problem, only this part is presented.
KEYWORDS
Mining based approach, ontology, automatic ontology construction, Semantic Web.
1. INTRODUCTION
Search has changed dramatically over recent years, and it still seems to be early days for this rapidly changing environment. The World Wide Web (WWW) allows people to share information from large repositories globally, and the amount of information is still growing. Finding relevant content among Web resources can be time-consuming. The huge amount of information available on the Web and increasing user expectations pose new challenges for search engines. Users increasingly expect search engines to understand natural language and to perceive the intent behind the words they type in, and search engine algorithms are rising to this challenge (Hogan et al., 2011). This evolution in search has a great impact on both users and technology developers. The results provided by traditional search engines are often inconvenient: they typically do not produce direct answers to queries, but instead recommend a selection of related documents from the Web. In general, traditional search engines may not be suitable for complex information gathering tasks that require aggregation from multiple indexed documents (Madhu, 2011). Most search engines search for keywords to answer user queries. Given the huge amount of content available on the Web, it is important to understand how data are related, both within the same page and throughout the Web. Consequently, the most significant change should be a progression from ubiquitous keywords to increasingly important entities. Words become concepts and search engines evolve into genuine learning machines (Redondo, 2014). The Semantic Web is described as structured information on the Internet designed to be read by software agents rather than by humans (Berners-Lee, 2001).
It requires a vocabulary and rules of the form provided by ontologies. The general aim is to provide a meaningful representation of machine-readable data on the Web, which means that machines are capable of correctly interpreting the data (Ramprakash et al., 2008). In this research area a great amount of work is dedicated to improving ontology engineering. This includes techniques to discover correspondences and to match similar concepts automatically (Bedini et al., 2010). The role of efficient application of ontologies on the Semantic Web is steadily increasing. Ontology construction for frequently changing domains is time-consuming and very often requires an expert's participation. The problem is simpler when the ontology constructor concentrates on a single specified domain and the data set is relatively small. Automating ontology construction may mitigate some research problems (e.g. the lack of knowledge systematization, the lack of tools capable of extracting and acquiring information, the complexity of aligning and merging two or more knowledge sources, and the difficulty of validation based on background knowledge that is hard to produce and maintain (Bedini et al., 2010)). The general aim of automatic ontology construction is to make it possible to create ontologies from different types of input sources. It is assumed that this limits human intervention in the process, saves time, and thereby reduces the need for an expert's participation in domain ontology construction. In this paper, an analysis of selected mining based approaches to automatic ontology construction is presented, and a comparative analysis is built on it. To simplify the selection of a proper approach, a tool supporting mining based approach selection to automatic ontology construction is proposed. The tool is based on an ontology and is built using the OWL language.
The general aim of the proposed tool is to help in classification processes and to provide relevant answers in a simple way. Moreover, it systematizes knowledge in this area. Searching for a proper mining based approach to automatic ontology construction is part of a more complex procedure. Owing to the limited space of this paper and the size of the considered problem, only this small part is presented. The whole procedure is composed of several steps (selected steps: knowledge extraction from structured and unstructured sources of knowledge, using a tool supporting mining based approach selection, parsing the data to automate ontology construction, pattern matching, using lexicons, adaptation to pose questions in natural language, validation). It is worth noting that this procedure is still being developed and enhanced.
2. AUTOMATIC ONTOLOGY CONSTRUCTION
An ontology provides a common understanding of a specific domain that can be communicated between people and application systems. The best known definition was proposed by Gruber (Gruber, 1993), who defines an ontology as an explicit specification of a conceptualization. An ontology is also described as a formal representation of knowledge by a set of concepts within a domain and the relationships between these concepts. Constructing an ontology still remains a hard human task. The process is sometimes assisted by software tools that facilitate information extraction from a textual corpus. Manual ontology construction with the help of tools is still practiced to acquire knowledge of many domains. However, this is a difficult and time-consuming task that involves domain experts and knowledge engineers (Navigli et al., 2003). It pays off when the domain is relatively small and the ontology constructor has a complete data set; in this case it is better and faster to create the ontology manually. Nonetheless, in the vast majority of cases automatic ontology construction is the preferable way. The size, complexity and dynamic development of a given domain make it necessary to build ontologies automatically (Subhashini, Akilandeswari, 2011). During the last decade, many efforts have been made to automate the ontology acquisition process, but automatic domain ontology construction remains a challenging task. Automated generation is a fundamentally different approach to ontology creation than manual construction by a designer. The general aim of automatic ontology construction is to create an ontology from different input sources, both structured (e.g. HTML, XML/XML Schema, RDF/OWL, relational data) and unstructured (e.g. text, documents, images), and to allow its further development in different ways: merging and alignment with another ontology, or use in a question answering system and the creation of a natural language interface (Hellmann, Auer, 2013). The resulting knowledge needs to be in a machine-readable and machine-interpretable format and must represent knowledge in a manner that unambiguously defines its meaning and facilitates inferencing (Unbehauen et al., 2012).
2.1 A general approach to automatic ontology generation life-cycle
In this paper an adaptation of the general approach to automatic ontology construction proposed by Bedini is used (Bedini et al., 2010). It consists of the following phases: (1) Information Extraction, (2) Analysis, (3) Generation, (4) Validation, and (5) Evolution. If necessary, some of the steps are repeated until a satisfactory result is achieved. Sometimes the individual steps should be supported by automated validation techniques. Based on this, the most important factors are indicated and presented in table 1.
Table 1. The phases of the general approach to automatic ontology generation life-cycle
Name: Description
EXTRACTION: Selection process; Information acquisition (concepts, attributes, relationships and axioms); Different types of knowledge sources (structured, semi-structured and unstructured); Different techniques to process the information (Natural Language Processing - NLP, machine learning, clustering, semantics etc.); Formalization process of information extraction.
ANALYSIS: Matching results; Semantics analysis; Matching and alignment between knowledge sources (relationships, hierarchy of concepts); Terms generalization.
GENERATION: Merging and integration problem; Formalization process using a proper language (e.g. OWL); Inference, coherence; Using heuristics and rules.
EVOLUTION: Adding additional elements (concepts, relationships, object properties).
VALIDATION: A validation task of the final result; Input correctness verification; Consistency verification.
These steps are considered in the analysis of mining based approaches to automatic ontology construction in section 3. It is worth noting that the set of considered aspects may be changed and extended with other important factors.
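The life-cycle described above, including the repetition of steps until a satisfactory result is reached, can be sketched as a small loop. This is only an illustrative sketch and not the implementation of Bedini et al. or of the tool proposed here: the function names, the co-occurrence heuristic used for analysis, the length filter used for extraction and the concept-count acceptance test are all assumptions made for the example.

```python
from typing import Dict, List, Set, Tuple

Ontology = Dict[str, Set[str]]  # concept -> related concepts (illustrative shape)


def extract(texts: List[str], min_len: int) -> List[List[str]]:
    """Extraction: keep candidate terms with at least min_len letters, per text."""
    cleaned = [[w.strip(".,;").lower() for w in t.split()] for t in texts]
    return [[w for w in words if len(w) >= min_len] for words in cleaned]


def analyse(term_lists: List[List[str]]) -> Set[Tuple[str, str]]:
    """Analysis: relate every pair of distinct terms that co-occur in one text."""
    pairs: Set[Tuple[str, str]] = set()
    for terms in term_lists:
        unique = sorted(set(terms))
        pairs.update((a, b) for i, a in enumerate(unique) for b in unique[i + 1:])
    return pairs


def generate(pairs: Set[Tuple[str, str]]) -> Ontology:
    """Generation: turn related pairs into a symmetric concept map."""
    ontology: Ontology = {}
    for a, b in pairs:
        ontology.setdefault(a, set()).add(b)
        ontology.setdefault(b, set()).add(a)
    return ontology


def validate(ontology: Ontology, min_concepts: int) -> bool:
    """Validation: a toy acceptance test on the number of concepts (assumption)."""
    return len(ontology) >= min_concepts


def evolve(ontology: Ontology, concept: str, related: str) -> Ontology:
    """Evolution: add a concept and a relationship to an accepted ontology."""
    ontology.setdefault(concept, set()).add(related)
    ontology.setdefault(related, set()).add(concept)
    return ontology


def build(texts: List[str], min_concepts: int, max_rounds: int = 3) -> Tuple[Ontology, int]:
    """Repeat extraction, analysis and generation until validation succeeds."""
    min_len = 8
    for round_no in range(1, max_rounds + 1):
        ontology = generate(analyse(extract(texts, min_len)))
        if validate(ontology, min_concepts):
            return ontology, round_no
        min_len -= 2  # relax the extraction filter and try again
    raise ValueError("no satisfactory ontology within max_rounds")


texts = ["Ontology construction needs experts.", "Experts validate ontology concepts."]
ontology, rounds = build(texts, min_concepts=5)
ontology = evolve(ontology, "validation", "experts")
```

In this toy run the first round yields too few concepts, so the extraction filter is relaxed and the phases are repeated once before validation succeeds; evolution then adds one further concept.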
3. MINING BASED APPROACHES TO AUTOMATIC ONTOLOGY CONSTRUCTION INTRODUCTION
The process of building a domain ontology using external knowledge resources is defined in the literature as a mining based approach. The general aim of a mining based approach is to retrieve seeds and to interpret queries on the Web. The seeds are defined either manually or automatically from the input source, and the external resource is queried in order to derive new knowledge. This approach encompasses the integration of external dictionaries, existing ontologies, or a more general knowledge resource such as WordNet (Miller, 1995) or the Web. Mining based techniques apply automatic keyword extraction to the given text documents, and the retrieved keywords are used to construct the ontology.
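The seed-and-query idea can be illustrated with a minimal sketch: seeds are extracted automatically as the most frequent non-stopword terms of a text, and an external resource is then queried for each seed. The stopword list, the frequency heuristic and the tiny `TOY_LEXICON` dictionary are assumptions made for the example; the dictionary merely stands in for a resource such as WordNet and does not reproduce its content or API.

```python
from collections import Counter
from typing import Dict, List

STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "for"}

# Hypothetical stand-in for an external resource: term -> broader concept.
TOY_LEXICON: Dict[str, str] = {"dog": "animal", "cat": "animal", "oak": "tree"}


def extract_seeds(text: str, top_n: int = 3) -> List[str]:
    """Pick the top_n most frequent non-stopword terms as seeds."""
    words = [w.strip(".,;:").lower() for w in text.split()]
    counts = Counter(w for w in words if w and w not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]


def derive_knowledge(seeds: List[str], lexicon: Dict[str, str]) -> Dict[str, str]:
    """Query the external resource for each seed; unknown seeds are skipped."""
    return {seed: lexicon[seed] for seed in seeds if seed in lexicon}


text = "The dog chased the cat. The dog barked and the cat ran to an oak."
seeds = extract_seeds(text)
concepts = derive_knowledge(seeds, TOY_LEXICON)
```

Seeds that the external resource does not know are simply dropped here; a real approach would have to decide whether to keep them as new domain concepts or to ask a human expert.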
3.1 Related works
An analysis of the literature makes it possible to identify several mining based approaches to automatic ontology construction. Some of them are still being developed and enhanced. According to the general approach to automatic ontology construction (see: table 1), they can be considered in four main aspects: extraction, analysis, generation and validation. A fifth aspect, evolution, can additionally be pointed out. In addition, a criterion called offered support is added to the comparative analysis. It covers the offered tool and methodological aspects. Most of the analyzed approaches offer semi-automatic tools that require human intervention in several steps, especially at the beginning and at the end of the process. Table 2 below presents short characteristics of 5 selected mining based approaches to automatic ontology construction.
Table 2. The characteristics of selected mining based approaches to automatic ontology construction
Name: Short description
TERMINAE (Biébow, Szulman, 1999) (Bourigault, 1994): The general aim of this approach is to support the process of building an ontology, both from scratch and from texts, without control by any task. It offers a computer-aided knowledge engineering tool written in Java. TERMINAE is composed of two tools: a linguistic engineering tool and a knowledge engineering tool. The first module extracts the terminological forms (keywords) from a given text file. Its aim is to help represent a notion as a concept, which is called a terminological concept. It uses a term extractor called LEXTER.
LEARNING OWL ONTOLOGY FROM FREE TEXT (Da-You Liu, 2004): It is based on an analysis of a set of texts followed by the use of WordNet. The keywords of the text are analyzed. As a next step, selected words are searched in WordNet to find the concepts associated with these words. The ontology generation is characterized by a high level of automation. The approach does not provide any information on how terms are extracted from the body text.
A METHOD FOR SEMI-AUTOMATIC ONTOLOGY ACQUISITION FROM A CORPORATE INTRANET (Kietz et al., 2000): It describes a generic approach for the creation of an ontology for a domain based on a source with multiple entries (including a generic ontology to generate the main structure, a dictionary containing generic terms close to the domain, and a textual corpus specific to the area to clean the ontology from wrong concepts). This approach makes it possible to combine several input sources. A user must manually check the ontology at the end of the generation process.
SALT (Lonsdale et al., 2002): It generates domain ontologies from text documents. This approach assumes the availability of 3 types of knowledge sources: a more general and well defined ontology for the domain; a dictionary or any external source, like WordNet, to discover lexical and structural relationships; and a consistent set of training text documents. Based on these elements it automates the creation of a new sub-ontology of the more general ontology. User intervention is required at the end of the process, because it can generate more concepts than required.
ONTOLOGY CONSTRUCTION FOR INFORMATION SELECTION (Khan, Luo, 2002): It enables automatic construction of an ontology from a set of text documents. It proceeds in the following steps: terms are extracted from documents with text mining techniques; the documents are grouped hierarchically according to their similarities using a modified version of the SOTA algorithm. Concepts are then assigned to the tree nodes, starting from the leaf nodes, with a method based on the Rocchio algorithm. The process of concept assignment is based on WordNet. A bottom-up approach is used for ontology generation. Human intervention is strongly limited.
3.2 A comparative analysis of selected mining based approaches to automatic ontology construction
The analysis of the selected mining approaches to automatic ontology construction makes it possible to compare these solutions. Following the general approach (see: section 2.1, table 1), the analysis encompasses 4 steps (Extraction, Analysis, Generation, Validation). The comparative analysis identifies a set of sub-criteria for each of them: Extraction (NLP techniques, Text sources, Optional human intervention, High automation level, Semi-automatic approach, Multi entries source, Keyword searching using term extractor, Keyword searching using WordNet); Analysis (Concept relationship analysis, Semi-automatic approach); Generation (Semi-automatic approach, Human intervention optional, OWL format); Validation (By human, Limited human intervention, Semi-automatic approach). Additionally, a 5th step was added: Offered support (Tool, Methodological approach). The selected mining based approaches to automatic ontology construction were surveyed according to these characteristics. Because the available information on evolution was imprecise, this step was omitted and offered support was considered instead. Table 3 summarizes the results of the survey. Owing to the length of the names and the limited space, the presented approaches are referred to by numbers as follows: (1) TERMINAE, (2) SALT, (3) Learning OWL ontology from free text, (4) Ontology construction for information selection, (5) A method for semi-automatic ontology acquisition from a corporate Intranet. The "+" symbol means that a given mining based approach fulfills a requirement. The comparative analysis shows that the approaches differ in their degree of automation, which in many cases cannot be measured precisely.
Table 3. The comparative analysis of selected mining based approaches to automatic ontology construction
Columns: Criterion; Sub-criterion; approaches 1, 2, 3, 4, 5
EXTRACTION: NLP techniques; Text sources; Optional human intervention; High automation level; Semi-automatic approach; Multi entries source; Keyword searching using term extractor; Keyword searching using WordNet
ANALYSIS: Concept relationship analysis; Semi-automatic approach
GENERATION: Semi-automatic approach; Human intervention optional; OWL format
VALIDATION: By human; Limited human intervention; Semi-automatic approach
OFFERED SUPPORT: Tool; Methodological approach
[Table 3 body: the "+" marks assigning sub-criteria to approaches 1 to 5 are not recoverable from the flattened layout.]
4. A TOOL SUPPORTING MINING BASED APPROACH SELECTION TO AUTOMATIC ONTOLOGY CONSTRUCTION
To summarize the results of the comparative analysis of mining based approaches to automatic ontology construction, a tool supporting mining based approach selection to automatic ontology construction is proposed. The proposed tool should help classify mining based approaches. Furthermore, it should support the selection process and systematize knowledge in this area. It was implemented in the OWL language.
The general aim of the proposed tool is to provide a classification of these approaches and to allow a user to find a relevant mining technique that meets his or her needs. Because of the high complexity and the large amount of information, this step is presented in an ontological way. The main advantage is that a user does not need to check many websites to find relevant information about mining based approaches and their functionalities. Moreover, the ontology provides a description of each analyzed mining based approach. Furthermore, it enables information classification, which is important for future work. Currently, the Protégé tool is used to present the results, but a user interface will be provided in the near future.
Fig. 1. A part of class hierarchy of mining based approaches to automatic ontology construction
Figure 1 presents a part of the classification criteria (including the class hierarchy) of mining based approaches to automatic ontology construction. A short description is provided for each of them. Figure 2 depicts the relations between the mining based approaches to automatic ontology construction.
Fig. 2. The relations between the mining based approaches to automatic ontology construction
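To give a rough idea of what a class hierarchy of criteria could look like in OWL, the sketch below emits a small Turtle serialization from a Python dictionary. It is an assumption-laden illustration, not the ontology built for this paper: the `http://example.org/mining-approaches#` namespace, the class names and the choice to model sub-criteria as subclasses of their criterion are all hypothetical, and only a few of the sub-criteria are listed.

```python
from typing import Dict, List

PREFIXES = (
    "@prefix : <http://example.org/mining-approaches#> .\n"
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
)

# Hypothetical excerpt of a criteria hierarchy: parent class -> subclasses.
HIERARCHY: Dict[str, List[str]] = {
    "Criterion": ["Extraction", "Analysis", "Generation", "Validation", "OfferedSupport"],
    "Extraction": ["MultiEntriesSource", "KeywordSearchingUsingWordNet"],
    "OfferedSupport": ["Tool", "MethodologicalApproach"],
}


def to_turtle(hierarchy: Dict[str, List[str]]) -> str:
    """Declare every name as an owl:Class and link children to their parents."""
    names: List[str] = []
    for parent, children in hierarchy.items():
        for name in [parent, *children]:
            if name not in names:
                names.append(name)
    lines = [PREFIXES]
    lines += [f":{name} a owl:Class ." for name in names]
    lines += [
        f":{child} rdfs:subClassOf :{parent} ."
        for parent, children in hierarchy.items()
        for child in children
    ]
    return "\n".join(lines) + "\n"


turtle = to_turtle(HIERARCHY)
print(turtle)
```

A file produced this way can be opened in an ontology editor such as Protégé, where the subclass links appear as a class tree.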
4.1 Case studies: the tool supporting mining based approach selection to automatic ontology construction
The case studies present practical examples of the tool supporting mining based approach selection to automatic ontology construction. A relevant mining based approach can be found according to a specification defined by the user. In the first case study it is supposed that a user is looking for a mining based approach that fulfills a set of pre-defined requirements: Analysis: Concept relationship analysis, or Extraction: Multi entries source, and Extraction: Semi-automatic approach. The reasoning mechanism returns a set of results matching the pre-defined requirements. In this case 2 mining based approaches to automatic ontology construction (TERMINAE and SALT) fulfill the defined set of criteria. Figure 3 depicts the results.
Fig. 3. A practical example of tool's application supporting selection of mining based approaches to automatic ontology construction
In the second case study the set of criteria has been changed. It is supposed that a user is looking for a mining based approach that fulfills the following criteria: Analysis: Semi-automatic approach, or Extraction: Keyword searching using WordNet, and Extraction: Using NLP techniques, and Offered support: Tool. The results provided by the reasoning mechanism contain three solutions: the method for semi-automatic ontology acquisition from a corporate Intranet, SALT, and TERMINAE. Figure 4 shows the results.
Fig. 4. A practical example of tool's application supporting selection of mining based approaches to automatic ontology construction
It is possible to specify an unlimited set of queries for the proposed tool. Moreover, the user does not have to have a broad knowledge of mining based approaches to automatic ontology construction, but can still make a reasonable choice.
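The kind of query used in the case studies can be sketched as a simple filter over sets of fulfilled sub-criteria. The sketch below is not the reasoning mechanism of the tool, which relies on an OWL ontology. It assumes that the first query reads as "(A or B) and C", and the approach names `approach_A` to `approach_C` together with their feature sets are purely hypothetical and are not taken from Table 3.

```python
from typing import Dict, List, Set

# Hypothetical data: which sub-criteria each (made-up) approach fulfills.
APPROACHES: Dict[str, Set[str]] = {
    "approach_A": {
        "Analysis: Concept relationship analysis",
        "Extraction: Semi-automatic approach",
    },
    "approach_B": {
        "Extraction: Multi entries source",
        "Extraction: Semi-automatic approach",
    },
    "approach_C": {
        "Extraction: Multi entries source",
        "Extraction: High automation level",
    },
}


def select(approaches: Dict[str, Set[str]], any_of: Set[str], all_of: Set[str]) -> List[str]:
    """Return approaches with at least one any_of criterion and every all_of criterion."""
    return sorted(
        name
        for name, features in approaches.items()
        if (not any_of or features & any_of) and all_of <= features
    )


matches = select(
    APPROACHES,
    any_of={"Analysis: Concept relationship analysis", "Extraction: Multi entries source"},
    all_of={"Extraction: Semi-automatic approach"},
)
```

With this made-up data the query keeps `approach_A` and `approach_B` and rejects `approach_C`, which lacks the mandatory semi-automatic extraction criterion.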
5. CONCLUSION
This paper presents a tool supporting mining based approach selection to automatic ontology construction. Practical examples of its application were included. Its general aim is to provide a systematic and repeatable way of selecting a mining based approach for a given decision problem. Short characteristics of selected mining based approaches to automatic ontology construction were presented. Based on this, a comparative analysis was created. The results of the comparative analysis were used to build the tool supporting mining based approach selection to automatic ontology construction. The tool is based on an ontology described in the OWL language. It is worth noting that a user can specify an unlimited set of queries. Moreover, the user does not have to have a broad knowledge of the available mining based approaches to automatic ontology construction, but can still make a reasonable choice. It is worth emphasizing that building the tool to support mining based approach selection is part of a more complex work on automatic ontology construction (selected steps: knowledge extraction from structured and unstructured sources of knowledge, using a tool supporting mining based approach selection, parsing the data to automate ontology construction, pattern matching, using lexicons, adaptation to pose questions in natural language, validation). Future research encompasses the development of the complex approach to automatic ontology construction, especially including Question Answering Systems and Natural Language Processing techniques to query ontologies.
REFERENCES
Bedini I. et al., 2010, Janus: Automatic Ontology Builder from XSD files, Proceedings of the World Wide Web Conference (WWW), Beijing, China.
Berners-Lee T. et al., 2001, The Semantic Web, Scientific American.
Biébow B., Szulman S., 1999, TERMINAE: A Linguistics-Based Tool for the Building of a Domain Ontology, In Knowledge Acquisition, Modeling and Management, Lecture Notes in Computer Science, Vol. 1621, pp 49-66.
Bourigault D., 1994, LEXTER, un Logiciel d’EXtraction de TERminologie. Application à l’acquisition des connaissances à partir de textes, Ph.D. thesis, EHESS Paris, France.
Da-You Liu, He Hu, 2004, Learning owl ontologies from free texts, In Machine Learning and Cybernetics, Vol. 2, pp 1233-1237.
Gruber T., 1993, A translation approach to portable ontology specifications, In Knowledge Acquisition, Vol. 5, No. 2, pp 199-220.
Hellmann S., Auer S., 2013, Towards Web-Scale Collaborative Knowledge Extraction, In Gurevych I., Kim J., The People’s Web Meets NLP, Theory and Applications of Natural Language Processing, Springer-Verlag, pp 287-313.
Hogan A. et al., 2011, Searching And Browsing Linked Data With SWSE: The Semantic Web Search Engine, In Web Semantics: Science, Services and Agents on the World Wide Web, Vol. 9, Issue 4, pp 365-401.
Khan L., Luo F., 2002, Ontology construction for information selection, In Proceedings of the 14th IEEE International Conference on Tools with Artificial Intelligence, ICTAI ’02, IEEE Computer Society, Washington, DC, pp 122-130.
Kietz J. et al., 2000, A Method for Semi-Automatic Ontology Acquisition from a Corporate Intranet, In Proceedings of EKAW-2000 Workshop Ontologies and Text, Juan-Les-Pins, France.
Lonsdale D. et al., 2002, Peppering knowledge sources with salt: Boosting conceptual content for ontology generation, In AAAI Workshop for Semantic Web Meets Language Resources, The Eighteenth National Conference on Artificial Intelligence, AAAI Press, pp 30-36.
Madhu G. et al., 2011, Intelligent Semantic Web Search Engines: A Brief Survey, In International Journal of Web & Semantic Technology (IJWesT), Vol. 2, No. 1.
Miller G.A., 1995, WordNet: A lexical database for English, In Communications of the ACM, Vol. 38, No. 11, pp 39-41.
Navigli R. et al., 2003, Ontology learning and its application to automated terminology translation, In Intelligent Systems, IEEE, Vol. 18, No. 1, pp 22-31.
Ramprakash et al., 2008, Role of Search Engines in Intelligent Information Retrieval on Web, Proceedings of the 2nd National Conference, India.
Redondo S., 2014, SEO 101: What is Semantic Search and Why Should I Care?, In Search Engine Journal, http://www.searchenginejournal.com/seo-101-semantic-search-care/119760/
Subhashini R., Akilandeswari J., 2011, A Survey On Ontology Construction Methodologies, In International Journal of Enterprise Computing and Business Systems, Vol. 1, Issue 1.
Unbehauen J. et al., 2012, Knowledge Extraction from Structured Sources, In Search Computing, Lecture Notes in Computer Science, Vol. 7538, pp 34-52.