Proceedings of the 2013 IEEE Second International Conference on Image Information Processing (ICIIP-2013)

Ontology Development using Hozo and Semantic Analysis for Information Retrieval in Semantic Web

Gagandeep Singh1, Vishal Jain2 and Dr. Mayank Singh3

1 B.Tech, Guru Tegh Bahadur Institute of Technology, GGS Indraprastha University, Delhi
2 Research Scholar, Computer Science and Engineering Department, Lingaya's University, Faridabad
3 Associate Professor, Computer Science and Engineering Department, Krishna Engineering College, Ghaziabad, U.P.

Abstract- We are living in the world of computers, and this era is defined by a vast network of information on the web. The huge number of documents on the web has increased the need for support in exchanging information and knowledge, and users must be given relevant information about a given domain. Traditional information extraction techniques, such as Knowledge Management solutions, were not advanced enough to extract precise information from text documents. This led to the concept of the Semantic Web, which depends on the creation and integration of semantic data. Semantic data, in turn, depends on building ontologies. An ontology is considered the backbone of a software system: it improves the shared understanding of the concepts used in the Semantic Web. There is therefore a need to build ontologies with a well-defined methodology, and this process is called Ontology Development.

Keywords - Information Retrieval (IR), Ontology, Semantic Web (SW), Software Development Life Cycle (SDLC), Hozo

I. INTRODUCTION
Information Retrieval (IR) technology is a major factor in handling annotations in the Semantic Web (SW) [1]. Traditional text search engines are not optimal for finding relevant documents, a gap addressed by various approaches based on ontologies and semantic data. Purely text-based search engines fail for the following reasons:
• Improper style of natural language: the syntax of the language used in a document may not be appropriate.
• High-level, unclear concepts: some concepts used in a document cannot be found by present search engines.
• Timeliness: keyword matching cannot find documents that are relevant to a specific time period.
The ability to translate knowledge between different languages is considered a major factor in building powerful Artificial Intelligence (AI) systems, and it is studied by various AI research communities such as Natural Language Processing (NLP). Ontology has changed the present web, making it more expressive and full of knowledge representation documents.

This paper is divided into five sections. Section 2 defines IR technology: it describes the IR process and its architecture, and the types of documents present on the web. Section 3 gives a brief overview of the Semantic Web (SW), including its challenges, its technologies and its comparison with the World Wide Web (www). It also gives a proposed methodology for building ontologies with the help of ontology editors that use knowledge representation languages such as OWL, RDF and DAML+OIL. In Section 4, we describe one ontology editor, Hozo, and use it to develop an ontology on “Computer Appreciation”. Section 5 gives information about semantic analysis in ontology-based information retrieval and search systems; there we have also applied a semantic search engine named SenseBot.

II. INFORMATION RETRIEVAL
Definition: Information Retrieval (IR) is the process of identifying and retrieving unstructured documents that contain the specific information a user needs. IR mainly focuses on the retrieval of natural language text.
A. Types of Documents
Documents may be structured, unstructured, semi-structured, or a combination of these.
(a) Structured documents: A document is said to be structured if it is written in a well-defined syntax and has identifiable components. A structured database is a table in which each user record has multiple attributes, as shown below:

978-1-4673-6101-9/13/$31.00 ©2013 IEEE

TABLE1. STRUCTURED DATABASE

S.No | Name | Address | Id
1. | Gagan | Canada | 129
2. | Vishal | USA | 128

IR engines can easily locate the components of a structured document because those components are unique and well defined.
(b) Unstructured documents: These documents are written in natural language. They have no well-defined syntax and no fixed positions where IR engines could find records satisfying a user's query. Unstructured documents are free-form documents on any topic.
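To make the contrast concrete, here is a minimal Python sketch (an illustration, not part of the system described in this paper): Table 1 is held as structured records that can be queried by attribute, while two hypothetical free-text documents can only be searched by keyword matching. The function names and the sample sentences are assumptions made for this example.

```python
import re

# Table 1 as structured records: every row has the same named attributes.
structured_db = [
    {"s_no": 1, "name": "Gagan", "address": "Canada", "id": 129},
    {"s_no": 2, "name": "Vishal", "address": "USA", "id": 128},
]

# Hypothetical unstructured documents: free natural-language text, no fields.
unstructured_docs = {
    "doc1": "Gagan moved to Canada last year and now studies computer science.",
    "doc2": "The USA hosts many conferences on the Semantic Web.",
}


def find_by_attribute(records, attribute, value):
    """Structured lookup: the engine knows exactly which field to inspect."""
    return [r for r in records if r.get(attribute) == value]


def find_by_keyword(docs, keyword):
    """Unstructured lookup: match the word anywhere in the text (case-insensitive)."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return sorted(doc_id for doc_id, text in docs.items() if pattern.search(text))


if __name__ == "__main__":
    print(find_by_attribute(structured_db, "address", "Canada"))
    print(find_by_keyword(unstructured_docs, "canada"))
```

The keyword search for "canada" cannot tell whether the word denotes an address or something else, which is exactly the ambiguity the attribute lookup on the structured table avoids.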


(c) Semi-structured documents: These documents share a common structure and meaning across a collection of textual documents. They differ from structured tables in that they do not have the same columns for every row.

B. IR Process and Architecture
The procedure for retrieving information is as follows. Background knowledge is stored in the form of an ontology that can be used at any step. The text documents are indexed into a suitable representation, which yields a ranked list of documents. These ranked results are given to the admin, who solves the user query; this in turn leads to transformation of the user query.

Figure1: Information Retrieval Process [2] (components: text documents, ranked list of documents, result, admin, user query).

The architecture of an Information Retrieval engine is based on an ONTOLOGY BASED MODEL, which represents the content of a resource in terms of a given ontology. It has the following parts:
• OMC (Ontology Manager Component): used by the Indexer, the Search Engine and the GUI.
• INDEXER: indexes documents and creates metadata.
• SEARCH ENGINE.
• GUI: supports the user in query formulation.

Figure2: IR Architecture (components: INDEXER, SEARCH ENGINE, GUI, OMC).

III. CONCEPT OF SEMANTIC WEB AND ONTOLOGY
The idea of the Semantic Web (SW) [3], as envisioned by Tim Berners-Lee, came into existence in 1996 with the aim of translating information into machine-understandable form. The Semantic Web is an extension of the current www in which documents are filled with annotations in a machine-understandable markup language. The SW uses Semantic Web documents (SWDs) written in SW languages such as OWL and DAML+OIL.

A. Challenges and Aspects of Semantic Web (SW)
In spite of various efforts by researchers, the SW has remained a future concept or technology for the following reasons:
• Not all parts of the SW have been developed yet, and the developed parts are too immature to be used in the real world.
• No optimal software or hardware is available.
The following are aspects of the Semantic Web (SW):
• The SW leads to an environment where information and services can be interpreted semantically and processed in machine-understandable form.
• The SW relies on ontology as a tool for modeling an abstract view of the real world and for the semantic analysis of documents.
• The SW is an XML (Extensible Markup Language) application.

B. Semantic Web (SW) vs. World Wide Web (www)
The Semantic Web (SW) and the World Wide Web (www) differ in various aspects, which are summarized in the following table:

TABLE2. SEMANTIC WEB VS WORLD WIDE WEB

No. | Semantic Web (SW) | World Wide Web (www)
1 | An extension of the www that manipulates the contents of information automatically, without human involvement. | A human-focused web.
2 | Discovers documents for gathering relevant information. | Discovers documents for people.
3 | Deals with resources such as pages, images, photos and people. | Deals only with media resources such as web pages, photos and images.
4 | Holds different kinds of relations showing associations among different kinds of resources. | Holds only hyperlinks between resources.
5 | Uses ontologies that allow users to organize information into concepts. | Does not use ontologies.
6 | Has formal semantics of context, i.e. it uses web ontology languages for generating data. | Has no formal semantics of context; its contents are machine-readable but not machine-understandable.
7 | Complete information is accessible to semantic search engines such as Hakia. | Only a few pages of information are accessible to traditional search engines such as Google.

C. Semantic Web (SW) Technology
SW technologies are listed below:
• XML: XML is an extensible language that allows users to create their own tags for documents. It provides syntax for structuring the content within documents.
• XML Schema: a language for defining the structure of XML documents. An XML document is a tree.
• RDF: Resource Description Framework. It is a simple language for expressing data models, called RDF models, which refer to objects and their relationships.
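As a minimal sketch of the two XML points above (authors define their own tags, and an XML document is a tree), the Python snippet below parses a small document with the standard library's `xml.etree.ElementTree` module and walks it depth-first. The tag names (`computer`, `hardware`, `inputDevice`, ...) are hypothetical, loosely borrowed from the case study later in this paper.

```python
import xml.etree.ElementTree as ET

# A small XML document whose tags are invented for this example:
# XML lets authors define their own vocabulary of tags.
DOC = """
<computer>
  <hardware>
    <inputDevice>Keyboard</inputDevice>
    <inputDevice>Scanner</inputDevice>
    <outputDevice>Printer</outputDevice>
  </hardware>
  <software>Operating System</software>
</computer>
"""


def walk(element, depth=0, out=None):
    """Depth-first traversal recording (depth, tag, text) for every element."""
    if out is None:
        out = []
    text = (element.text or "").strip()
    out.append((depth, element.tag, text))
    for child in element:
        walk(child, depth + 1, out)
    return out


if __name__ == "__main__":
    root = ET.fromstring(DOC.strip())
    for depth, tag, text in walk(root):
        print("  " * depth + tag + (": " + text if text else ""))
```

The nesting depth reported by `walk` is simply the element's distance from the root, which is what "an XML document is a tree" means in practice.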

RDF models consist of three parts: Resource, Property and Statement. A resource may be a web page or an individual element of an XML document. A property is a named characteristic of a resource. A statement combines a resource and a property with the property's value. E.g. in “Vishal plays Guitar”, Vishal is the resource, plays is the property and Guitar is the value (object).

Figure3: RDF Model (Resource, Property, Object, Statement).

D. Ontology
The term ontology [4] can be defined in different ways. One definition is summarized by the acronym FESC, a Formal, Explicit Specification of a Shared Conceptualization, where:
• Formal: the ontology should be machine-understandable.
• Explicit: it defines the types of constraints used in the model.
• Shared: it is shared by a group and is not restricted to individuals.
• Conceptualization: it is a model of some phenomenon that identifies the relevant concepts of that phenomenon.
An ontology is also defined as a set of concepts and relationships arranged in a hierarchical fashion.

E. Ontology Development
Ontology development [5] needs a well-defined methodology that follows certain guidelines:
• The ontology being developed should follow software engineering standards.
• The ontology development strategy should be simple and practical.
The phases used in developing an ontology also satisfy software engineering principles and are therefore called Software Development Life Cycle (SDLC) phases. They are described below.
(a) Specification Phase: This phase has the following activities.
• Domain vocabulary definition: defines common names and attributes for domain concepts.
• Identifying resources: a resource is anything that has a URI. If some concepts have a number of instances, they can be grouped into a class.
• Identifying axioms: axioms are structures that represent the behavior of concepts.
• Identifying relationships: relations are defined between resources.
• Identifying data characteristics: defines the features of the types of resources and of their relationships.
• Applying constraints: constraints represent named relationships between a domain class and a range class.
• Verification: after the preliminary web ontology model is designed, it should be tested for correctness.
(b) Design Phase: This phase is the backbone of the Semantic Web. The physical structure of the designed ontology is based on the RDF model, which expresses information as triples of Subject, Predicate and Object.
• Predicate: all characteristics of resources and relationships are taken as predicates. E.g. each student is assigned a unique roll number through the predicate ‘HasRollNo’.
• Subject: all domain classes of the characteristics and relationships of resources are taken as subjects. E.g. there are various average students, each having a unique URI, so they are grouped in ‘AvgStudentsGroup’.
• Object: refers to the range class of a relationship. E.g. HasRollNo has the range class ‘NUMBER’, which is a literal.
(c) Formalization Phase: This phase formalizes the ontology obtained in the design phase with the help of some tools.
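The Subject-Predicate-Object structure of the design phase, together with the statement “Vishal plays Guitar”, can be sketched as plain Python triples with a wildcard pattern match and a check that a predicate whose range is NUMBER only has numeric objects. This is a hedged illustration only: the individual `student42`, its roll number, and the use of the strings `rdf:type` and `rdfs:range` as predicate labels are assumptions, and a real system would use an RDF store rather than a Python list.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str  # a resource/class name or a literal value


# Illustrative triples; "student42" and its roll number are hypothetical.
TRIPLES: List[Triple] = [
    Triple("Vishal", "plays", "Guitar"),
    Triple("student42", "rdf:type", "AvgStudentsGroup"),  # subject grouped into a class
    Triple("student42", "HasRollNo", "1024"),             # predicate with a literal object
    Triple("HasRollNo", "rdfs:range", "NUMBER"),          # range class of the predicate
]


def match(triples: List[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t.subject == s)
            and (p is None or t.predicate == p)
            and (o is None or t.obj == o)]


def check_numeric_range(triples: List[Triple], predicate: str) -> bool:
    """If `predicate` declares the range NUMBER, check every object is a digit string."""
    declared = match(triples, s=predicate, p="rdfs:range", o="NUMBER")
    if not declared:
        return True  # no NUMBER range declared, nothing to check
    return all(t.obj.isdigit() for t in match(triples, p=predicate))


if __name__ == "__main__":
    print(match(TRIPLES, p="HasRollNo"))
    print(check_numeric_range(TRIPLES, "HasRollNo"))
```

Treating `None` as a wildcard mirrors how triple-pattern queries leave a position unbound; the numeric check is a simplified stand-in for the verification step of the specification phase.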

Figure4: Ontology development phases [6] (Specification Phase: domain vocabulary definition, resource identification, identifying axioms, identifying relationships, identifying data characteristics, applying constraints, verification; followed by the Design Phase and the Formalization Phase).

IV. HOZO- AN ONTOLOGY EDITOR
Version used: 5.2.36 beta. Developed by: Mizoguchi Lab and ENEGATE Co. Ltd.
Hozo differs from other ontology editors in the following aspects:
• Its user-friendly environment lets users work on it easily.
• Hozo has an API, HozoAPI ver. 1.15, that gives access to existing ontologies.
• A slot definition option is available.
• Inheritance information is clear and easily accessible in two ways: from super classes through the is-a link, and from class constraints.
Hozo also helps correct errors at the time of validating the ontology.
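Hozo's internal inheritance mechanism is not described in this paper; as a hedged illustration of what inheriting slot information from super classes along is-a links means, the sketch below resolves a class's effective slots by walking up its is-a chain and rejects cyclic is-a links. The class and slot names (`Component`, `Input device`, `Keyboard`, `keyCount`) are hypothetical, and this is not HozoAPI.

```python
from typing import Dict, List, Optional

# Hypothetical class definitions: "is_a" names the super class and "slots"
# maps locally defined slot names to their value classes. Not HozoAPI.
CLASSES: Dict[str, dict] = {
    "Component": {"is_a": None, "slots": {"HasName": "String"}},
    "Input device": {"is_a": "Component", "slots": {"Hastype": "String"}},
    "Keyboard": {"is_a": "Input device", "slots": {"keyCount": "Number"}},
}


def ancestors(cls: str) -> List[str]:
    """Follow is-a links from `cls` up to its root class, rejecting cycles."""
    chain: List[str] = []
    current: Optional[str] = cls
    while current is not None:
        if current in chain:
            raise ValueError(f"is-a cycle detected at {current!r}")
        chain.append(current)
        current = CLASSES[current]["is_a"]
    return chain


def effective_slots(cls: str) -> Dict[str, str]:
    """Collect slots from the root down, so a subclass can redefine an inherited slot."""
    slots: Dict[str, str] = {}
    for name in reversed(ancestors(cls)):
        slots.update(CLASSES[name]["slots"])
    return slots


if __name__ == "__main__":
    print(ancestors("Keyboard"))
    print(effective_slots("Keyboard"))
```

Merging from the root downward is a design choice: a slot redefined in a subclass overrides the inherited definition, which is the usual reading of specialization along is-a links.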

Figure5: Hozo Components [7] (Ontology Editor, Ontology Server, Ontology Manager (Dependencies), Developer).

Figure6: Ontology editing screen

4.1 Case Study
We present a case study that implements all the phases involved in the ontology development methodology. It illustrates building an ontology on ‘Computer Appreciation’ with the ontology editor HOZO.
(a) Specification Phase
• Domain definition and resource identification: There are a number of computers, classified on the basis of speed. Each has a unique URI, so they are grouped into a subclass called ‘CLASSIFICATION’ under the super class ‘COMPUTER’.

TABLE3. DEFINING CLASSES AND INSTANCES

Concepts (Nodes) | Instances | Features of Resources | Predicate
Classification | Home computer, PC etc. | Purpose, examples | Name, Hasname etc.
Generation | First, Second etc. | Purpose, examples | Name, Hasname etc.
Components | H/w system, S/w system | Types, examples, purpose | name, Hastype, Hasname etc.
Input devices | Scanner, CPU, keyboard | Types, purpose | Hastype and purpose
Output devices | Monitor, Printer, speakers | Types, examples, purpose | Hastype and examples

• Defining axioms about resources: A computer has a hardware system and a software system. The hardware system is categorized into input devices and output devices.
• Identifying and naming relationships: The relation between the ‘H/w system’ class and the ‘Input devices’ class is named Haskeyboard.
• Identifying data characteristics:

TABLE4. DATA CHARACTERISTICS

Name | Domain class | Range class
HasName | Computer, Generation, Components, Input and Output devices | String
Hastype | Generation, Components | String
Hasyear | Generation | Number

• Applying constraints:

TABLE5. RESOURCES RELATIONS ALONG WITH CONSTRAINTS

Name | Domain class | Range class
Has H/w system | Computer | Components
Has S/w system | Computer | Components
Haskeyboard | H/w system | Input devices
HasPrinter | H/w system | Output devices
HasRing | Computer | Network System
HasBus | Computer | Network System

• Validating: Hozo has an Ontology Consistency Check feature that uses Hozo's inference structure to verify whether the ontology has been developed properly.
(b) Design Phase
In the context of Hozo, the output of the specification phase is an ontology file, which is taken as the developed ontology. It is available in different formats:
• Text/HTML
• XML
• RDF
• OWL

Figure7: Sample slice of ontology using RDF

Figure8: Sample slice of ontology using OWL

(c) Formalization Phase
This phase describes the developed ontology pictorially; its source code was developed using RDF syntax.
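Hozo's Ontology Consistency Check works inside the tool; the sketch below is only a hypothetical, much simpler stand-in showing how the domain and range constraints of Table 5 could be checked against relation assertions. The individuals (`myPC`, `myHardware`, `myKeyboard`, `myMonitor`) are invented for the example, and class membership is an exact match: subclass reasoning is deliberately left out.

```python
from typing import Dict, List, Tuple

# Relations from Table 5: name -> (domain class, range class).
RELATIONS: Dict[str, Tuple[str, str]] = {
    "Has H/w system": ("Computer", "Components"),
    "Has S/w system": ("Computer", "Components"),
    "Haskeyboard": ("H/w system", "Input devices"),
    "HasPrinter": ("H/w system", "Output devices"),
    "HasRing": ("Computer", "Network System"),
    "HasBus": ("Computer", "Network System"),
}

# Hypothetical individuals and the class each one belongs to.
INSTANCE_OF: Dict[str, str] = {
    "myPC": "Computer",
    "myHardware": "H/w system",
    "myKeyboard": "Input devices",
    "myMonitor": "Output devices",
}


def violations(assertions: List[Tuple[str, str, str]]) -> List[str]:
    """List domain/range violations for (subject, relation, object) assertions."""
    problems: List[str] = []
    for subj, rel, obj in assertions:
        if rel not in RELATIONS:
            problems.append(f"unknown relation {rel!r}")
            continue
        domain, rng = RELATIONS[rel]
        if INSTANCE_OF.get(subj) != domain:
            problems.append(f"{subj!r} is not in domain class {domain!r} of {rel}")
        if INSTANCE_OF.get(obj) != rng:
            problems.append(f"{obj!r} is not in range class {rng!r} of {rel}")
    return problems


if __name__ == "__main__":
    print(violations([("myHardware", "Haskeyboard", "myKeyboard")]))
    print(violations([("myHardware", "HasPrinter", "myKeyboard")]))
```

A consistent assertion yields an empty list, while linking a hardware system to a keyboard through HasPrinter is reported because its range class is Output devices.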


The Hozo user interface displaying the ontology hierarchy is shown below.

Figure9.1: Hozo user-interface

Figure9.2: Hozo user-interface

Another interesting feature of Hozo is that it produces a map layout view of the developed ontology through its ‘Generate Map’ function.

Figure10: Map Generation using Hozo

V. SEMANTIC ANALYSIS
Semantic association analysis means discovering complex and meaningful relationships between objects; these relationships are called semantic associations. The main aspects of semantic analysis are:
• It generates knowledge-driven information from available data resources.
• It uses a semantic query framework to analyze relationships with semantic query languages such as SPARQL, RQL and SeRQL.
• Semantic search engines analyze relationships and create associations between resources; examples include Swoogle and Weet-IT.
A. Components of Semantic Analysis
(a) Ontology development: Developing an ontology has become easier with free, open-source editors such as Hozo that use ontology languages like DAML+OIL and RDF. E.g. to create an ontology on a travelling process, we can import concepts from an existing ontology instead of developing it from the root node.
(b) Dataset construction: A dataset, also called a test bed or knowledge base, is a collection of instances used for creating the ontology.
(c) Semantic association discovery: It uses a graph traversal algorithm to determine semantic associations, searching all possible paths between any two nodes in the semantic graph. Associations among all possible paths are found with a path association algorithm, whose steps are:
• Find the possible paths between two classes at the schema level.
• Compare each path with the other paths.
• If two paths intersect, i.e. they meet at the same node in the schema, record the intersection.
• Use the result to perform a search at the data level that determines the associations between nodes.
(d) Displaying results: This refers to how the semantic associations are displayed.

Figure11: Components of Semantic enhanced ontology based search engine (user interface; ranking algorithm; semantic search engine using languages such as SPARQL and RQL; ontology; data conversion; data sources; data sets).
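The path-association steps listed under (c) can be sketched as follows. This is one possible reading of those steps, not the authors' implementation: an iterative depth-first search enumerates every cycle-free path between two schema classes, and each pair of paths is then compared for a shared intermediate node. The toy schema graph with classes A, B, C and D is hypothetical, and the final data-level search step is not shown.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

Graph = Dict[str, Set[str]]


def build_graph(edges: List[Tuple[str, str]]) -> Graph:
    """Build an undirected schema graph from (class, class) edges."""
    graph: Graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def all_simple_paths(graph: Graph, start: str, goal: str) -> List[List[str]]:
    """Iterative depth-first search returning every cycle-free path from start to goal."""
    paths: List[List[str]] = []
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == goal:
            paths.append(path)
            continue
        for nxt in graph.get(node, ()):
            if nxt not in path:  # never revisit a node on the current path
                stack.append((nxt, path + [nxt]))
    return sorted(paths, key=lambda p: (len(p), p))  # deterministic order


def intersecting_pairs(paths: List[List[str]]) -> List[Tuple[int, int, List[str]]]:
    """Compare each path with every other one; report pairs meeting at an intermediate node."""
    result = []
    for i, j in combinations(range(len(paths)), 2):
        shared = set(paths[i][1:-1]) & set(paths[j][1:-1])
        if shared:
            result.append((i, j, sorted(shared)))
    return result


if __name__ == "__main__":
    g = build_graph([("A", "B"), ("B", "D"), ("A", "C"), ("C", "D"), ("B", "C")])
    paths = all_simple_paths(g, "A", "D")
    print(paths)
    print(intersecting_pairs(paths))
```

Enumerating all simple paths grows exponentially with graph density, so a practical system would bound the path length or restrict the search to the schema level, as the steps above suggest.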

B. Semantic Search Engine Structure
The system uses a search engine called SenseBot, which is designed to produce a summary in response to the keywords a user searches for. About SenseBot: it understands the meaning of the search query and uses relevant results to generate a summary of the valid results. The figure below illustrates the results of a query from semantic association analysis.

Figure12: Results of Query

CONCLUSION
This paper highlights a common problem for users: retrieving relevant information for their queries. It emphasizes the concept of Information Retrieval (IR) and various IR approaches for extracting knowledge-driven documents from the cluster of interlinked web documents.

It also introduces the concept of ontology and its role in the Semantic Web. Since an ontology is considered the backbone of a software system, it should be well designed and free of ambiguities. This paper also presents a proposed methodology for ontology development using SDLC phases. The concept of the Semantic Web has transformed emerging technology by extracting information from various web documents and integrating it in machine-understandable form. We have developed an ontology on Computer Appreciation using the ontology editor Hozo. This paper also describes research issues in semantic analysis, which plays a vital role in the Semantic Web: it establishes meaningful relations among a set of entities and finds all possible paths by using a graph traversal algorithm. Finally, it describes the architecture of a semantic enhanced ontology-based search engine.

REFERENCES
[1]. Urvi Shah, James Mayfield, “Information Retrieval on the Semantic Web”, ACM CIKM International Conference on Information Management, Nov 2002.
[2]. Gagandeep Singh, Vishal Jain, “Information Retrieval (IR) through Semantic Web (SW): An Overview”, In Proceedings of CONFLUENCE-The Next Generation Information Technology Summit, 27-28 September 2012, pp 23-27.
[3]. T. Berners Lee, “The Semantic Web”, Scientific American, May 2007.
[4]. Berners Lee, J. Lassila, “Ontologies in Semantic Web”, Scientific American, May 2001, pp 34-43.
[5]. Helena Sofia Pinto, Joao P. Martins, “Ontologies: How can they be built?”, Knowledge and Information Systems, pp 441-464, 2004.
[6]. Amjad Farooq and Abad Shah, “Ontology Development Methodology for Semantic Web System”, Pakistan Journal of Life and Social Sciences, Vol.6 No.1, May 2008, pp 50-58.
[7]. Kozaki K, “Hozo: An Environment for Building Ontologies”, In Proceedings of the 13th International Conference on Knowledge Engineering and Knowledge Management (EKAW), pp 213-218, October 2002.
[8]. Deligiannidis L, Sheth A, “Semantic Analytics Visualization”, Intelligence and Security Informatics, Proc. ISI-2006, pp 48-59, 2006.
[9]. J. Mayfield, “Ontologies and text retrieval”, Knowledge Engineering Review, 2007.
[10]. Cristani, R. Cuel, “A Survey on Ontology Creation Methodologies”, International Journal on Semantic Web and Information Systems, Vol.1 No.2, 2005.
[11]. Uschold, M. and King, “Towards A Methodology for Building Ontologies”, IJCAI-95 Workshop on Basic Ontological Issues in Knowledge Sharing, Montreal, Canada, 2006.
[12]. Uschold, M. and Grüninger, “Ontologies: Principles, Methods and Applications”, Knowledge Engineering Review, Vol.11 No.2, pp 93-137.
[13]. Updegrove A, “The Semantic Web: An interview with Tim Berners-Lee”, 2005.
[14]. S. Staab, R. Studer and Y. Sure, “Knowledge Processes and Ontologies”, IEEE Intelligent Systems, Vol. 16, No.1, pp 2-9, 2001.
[15]. Kozaki K, R. Mizoguchi, “An Environment for Distributed Ontology Development Based on Dependency Management”, In Proceedings of the 2nd International Semantic Web Conference (ISWC), pp 453-468, 2003.
[16]. L. Stojanovic, “Migrating data intensive web sites into the Semantic web”, In Proceedings of the 17th ACM Symposium on Applied Computing (SAC), ACM Press, pp 1100-1107, 2002.
[17]. Aleman-Meza B, Arpinar I.B, “A Context aware Semantic association Ranking”, Technical Report 03-010, LSDIS Lab, Computer Science, Univ. of Georgia, 2003.
[18]. Dayal U, Kuno H, “Making the Semantic Web Real”, IEEE Data Engineering Bulletin, Vol.26, No.4, pp 4-7, 2003.
[19]. Kaushal Giri, “Role of Ontology in Semantic Web”, DESIDOC Journal of Library & Information Technology, Vol.31 No.2, March 2011, pp 116-120.
[20]. Urvi Shah, Tim Finin and Anupam Joshi, “Information Retrieval on the Semantic Web”, Scientific American, pp 35-45.

About the Authors

Gagandeep Singh Narula has completed his B.Tech from Guru Tegh Bahadur Institute of Technology (GTBIT), affiliated to Guru Gobind Singh Indraprastha University (GGSIPU), New Delhi. His research areas include Web Technology, Semantic Web and Information Retrieval.

Vishal Jain has completed his M.Tech (CSE) from USIT, Guru Gobind Singh Indraprastha University, Delhi, and is pursuing a PhD in the Computer Science and Engineering Department, Lingaya's University, Faridabad. Presently, he is working as Assistant Professor at Bharati Vidyapeeth's Institute of Computer Applications and Management (BVICAM), New Delhi. His research areas include Web Technology, Semantic Web and Information Retrieval. He is also associated with CSI and ISTE.

Dr. Mayank Singh has done his M.E. in Software Engineering from Thapar University and his PhD from Uttarakhand Technical University. His research areas are Software Engineering, Software Testing, Wireless Sensor Networks and Data Mining. Presently, he is working as Associate Professor at Krishna Engineering College, Ghaziabad. He is associated with CSI, IE (I), IEEE Computer Society India and ACM.