Creating Knowledge Representation and Semantic Web Application with RDF

Dr. Abdul Monem S Rahma, Computer Science Department, University of Technology, Iraq. Email: [email protected]

Jamal F. Tawfeq, Al-Rashed College of Engineering & Science, University of Technology, Iraq. Email: [email protected]

ABSTRACT
The World Wide Web is the greatest repository of information ever assembled by man. Web information, however, was not designed to be processed by machines: it is meaningless to a computer, which makes it very hard to find what we are looking for. New challenges must be met to build a Semantic Web infrastructure in which documents are understandable by both humans and computers. Knowledge creation and the development of Semantic Web applications are still not easy. In this paper, steps towards building a Semantic Web application are introduced using the Resource Description Framework (RDF).
Key Words: Semantic Web, Knowledge Representation, RDF

1. Introduction
The Web has influenced the way people communicate and collaborate. We can publish information on the Web, making it accessible to anyone with access to the Web, or we can use the Web as a source of information from which to derive new knowledge. The amount of information accessible on the Web has increased enormously in a short period of time. This increase is a desirable evolution, but it has also made the problems with the Web more evident. Everyone who has used the Web to search for information knows that it is not as easy or as fast as one would like it to be. Because Web information was not designed to be processed by machines, an intelligent tool is required to integrate the information extracted from pages, including the special information that tells a computer how to display a particular piece of text or where to go when a link is clicked. Such a tool helps the machine determine what the text means.

Berners-Lee, Hendler and Lassila provide the following definition of the Semantic Web [1]: "The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation." This definition highlights two main features of the Semantic Web:
1. The Semantic Web is not a separate Web, but an extension of the current Web. That is, there are two names for the same thing: the Semantic Web exists in the Web and is part of the Web at the same time, which makes them inseparable at the URI level.
2. The name Semantic Web comes from the fact that it is represented as a set of semantically and formally interlinked data units, thereby creating a Semantic Web inside the Web.

The word semantic, as WordNet defines it, means “relating to the study of meaning and changes of meaning”. For the Semantic Web, semantic indicates that the meaning of data on the Web can be discovered not just by people, but also by computers. In contrast, most meaning on the Web today is inferred by people who read web pages and the labels of hyperlinks, and by other people who write specialized software to work with the data. The phrase "the Semantic Web" stands for a vision in which computers (that is, software) as well as people can find, read, understand, and use data over the World Wide Web to accomplish useful goals for users [2]. Web users need new ways to exploit all this available information and these possibilities. The problem is that Web information is meaningless to a computer, so it is very hard to find what we are looking for. In this context arises the need for a new vision of the Web: the Semantic Web. The Semantic Web is a Web of data. It is supposed to make data located anywhere on the Web accessible and understandable, both to people and to machines.

The World Wide Web Consortium (W3C) provides the following definition, which depends on the language that is used in creating the Semantic Web [3]: The Semantic Web is the abstract representation of data on the World Wide Web, based on the RDF standards and other standards to be defined. It is being developed by the W3C, in collaboration with a large number of researchers and industrial partners.

From these definitions we can see that the Semantic Web is a “web of meaning”, as opposed to the “web of links” that the Web is today. This “web of meaning” will enable computers with specialized programs to help us not only to find information but also to derive information that did not exist before. What we need is to make information “meaningfully processable” by computers so that they can use the information present on the Semantic Web as if they had created it [4]. The reason for using “implicit” is that the Semantic Web is the Web, but can conceptually be considered to be at another abstract level and aimed at computer consumption. This is illustrated in Figure (1) [4].

Tim Berners-Lee, known as the inventor of the World Wide Web (WWW), has a vision for the future of the World Wide Web, which he calls “The Semantic Web” [1]. In this Semantic Web, information will be presented in machine-readable form. At present, most information on the WWW is presented in natural language and can only be understood by humans. Although there have been some advances in the field of text recognition, many issues remain to be resolved before natural language can be understood by computers [4].

Figure (1) An Abstract and Conceptual View of Semantic Web [diagram labels: The MetaWeb (a machine-processable Web about the Web); The Web; Human's understanding; Computer understanding]

2. Semantic Web Purpose and Definition
The Semantic Web is all about creating a Web that is understandable by both man and computer. Computer users will still have the information presented in the way they are used to, but for computers the Semantic Web is a breakthrough. Computers no longer have to reason on the basis of grammar and mark-up languages, because the semantic structure of the text is already included. With the Semantic Web it will be much easier to find what you are looking for, since everything is already placed in context. The Semantic Web [1] [5]:
1. allows effective combination of the independent work of diverse communities.
2. supports the ability to add new information without insisting that the old be modified.
3. provides communities the ability to resolve ambiguities and clarify inconsistencies.
4. uses descriptive conventions that can be expanded as human understanding expands.
The purpose of the Semantic Web is to benefit humans, not computers. The original idea was that instead of waiting for computers to become smart enough to solve all the problems of understanding human language, we should focus on the slightly less difficult problem of making human data more understandable to computers [6].

3. Resource Description Framework (RDF)
Many languages are used to publish text documents on the Web. One of these is the Hyper Text Markup Language (HTML), for which the W3C has established a standard. Web pages contain information as free and structured text, images, and possibly audio and video sequences. It is important to note that the markup provided by HTML does not refer to the content of the information provided, but only covers the way it should be structured and presented on the page. The W3C has therefore proposed several standard languages, based on knowledge representation, for designing a Semantic Web. The first of these languages is XML (eXtensible Markup Language). XML and XML Schema describe semi-structured data and give machine-accessible meaning to a piece of information by defining a schema. Another language that has become a W3C standard for creating the Semantic Web is RDF. RDF offers developers a powerful toolkit for making statements and connecting those statements to derive meaning. RDF offers a different, and in some ways more powerful, framework for data representation than XML or relational databases. RDF has many properties [7]:
1. RDF's foundations are built on a very simple model, but the basic logic can support large-scale information management and processing in a variety of different contexts.
2. RDF is a language designed to support the Semantic Web, in much the same way that HTML is the language that helped initiate the original Web. RDF is a framework for supporting resource description, or metadata (data about data), for the Web.
3. RDF provides common structures that can be used for interoperable XML data exchange.
4. RDF is a foundation for processing metadata; it provides interoperability between applications that exchange machine-understandable information on the Web.
5. RDF emphasizes facilities to enable automated processing of Web resources.
6. RDF can be used in a variety of application areas, for example: in resource discovery, to provide better search engine capabilities; in cataloging, for describing the content and content relationships available at a particular Web site, page, or digital library; by intelligent software agents, to facilitate knowledge sharing and exchange; in content rating; in describing collections of pages that represent a single logical "document"; for describing intellectual property rights of Web pages; and for expressing the privacy preferences of a user as well as the privacy policies of a Web site.
7. RDF with digital signatures will be key to building the "Web of Trust" for electronic commerce, collaboration, and other applications.

4. Creating Semantic Web
Several steps are involved in building a Semantic Web application, Figure (2).

Figure (2) Building Semantic Web Steps [diagram labels: Creating Semantic Web contents; Validating the Semantic contents; Using the semantic contents]

4.1. Creating Semantic Web Contents
This step gathers knowledge from existing Web resources. It involves marking up content with semantic tags using Semantic Web languages. The RDF language is used to represent information about resources in the World Wide Web. It is particularly intended for representing metadata about web resources. RDF is based on the idea of identifying things using web identifiers called Uniform Resource Identifiers (URIs) and describing resources in terms of simple properties and property values. This enables RDF to represent simple statements about resources as a graph of nodes and arcs. RDF URIs can refer to any identifiable thing, including things that may not be directly retrievable on the web. In addition, RDF properties themselves have URIs, to precisely identify the relationships that exist between the linked items [8] [9] [10]. Several things must be done to complete this step, Figure (3).

Figure (3) Creating Semantic Web Contents [diagram labels: Document; RDF Metadata model; RDF/XML document]

4.2. The Document
The document is knowledge in a specific domain. It is gathered from different web pages, Figure (4).

Figure (4) Metadata and Relationship Gathered from Web Page

This document is metadata about resources in a specific domain. Semantic metadata is data about data that is machine processable. The Semantic Web is based on relations between terms. Each term represents a concept, and the semantic relations between terms capture their semantics.

5. Suggested RDF Semantic Metadata Model
The document in step (1) is metadata about a specific domain. It is written as English statements, and these statements are metadata about web resources. We can then translate these English statements into an RDF/XML document. The RDF data model provides an abstract, conceptual framework for defining and using metadata. RDF provides a mechanism for recording statements about Web resources, e.g., a Web page, so that machines can easily interpret the statements. That is, RDF gives you a way to make statements that are machine-processable. The computer cannot actually "understand" what you said, but it can deal with it in a way that makes it seem as if it does.

Within the RDF specification, any RDF statement has three pieces of information [7] [8]:
1. Subject.
2. Property type.
3. An Object or property value.
The RDF triple is then: {Subject, Property, Object}. This allows both human and machine consumption of the same data. A basic rule of English grammar is that a complete sentence (or statement) contains both a subject and a predicate: the subject is the who or what of the sentence, and the predicate provides information about the subject [7]. For example, suppose we have a sentence about the production of hardware companies:

Computer hardware companies have production of "motherboard".

This is a complete statement about the computer hardware companies. The subject is computer hardware companies, and the predicate is production of, with a matching value of "motherboard". Combined, the three separate pieces of information make a completely unique piece of knowledge. In RDF, this English statement translates to an RDF triple. In RDF, the Subject is a resource identified by a URI, and the Predicate is a property type of the resource, such as an attribute, a relationship, or a characteristic. In addition to the subject and predicate, the specification also introduces a third component, the Object. Within RDF, the object is equivalent to the value of the resource property type for the specific subject [7]. In the sentence stating that the product of hardware companies is "motherboard", the generic reference to hardware companies is replaced by the company's URI, forming a new and more precise sentence:

The hardware companies at http://hardware.net have production of "motherboard".

With this change, there is no confusion about which hardware company producing "motherboard" we are talking about, i.e. the one with the URI http://hardware.net. The individual components of the statement we are interested in can be further highlighted, with each of the three components specifically broken out into the following format [7] [8]:

<subject> HAS <predicate> <object>

This is a representation of a statement whereby the three components of the statement can be replaced by instances of the components to generate a specific statement. The example statement is converted to this format as follows:

http://hardware.net has a production of "motherboard"

In RDF, this new statement, redefined as an RDF triple, can be considered a complete RDF graph because it consists of a complete fact that can be recorded using RDF methodology and that can then be documented using shorthand techniques. The following shorthand is used to represent a triple: {Subject, Predicate, Object}. The previous example then becomes:

{http://hardware.net, production, "motherboard"}
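The shorthand above can be illustrated with a minimal Python sketch. The names Triple and to_shorthand are hypothetical helpers introduced here for illustration only; they are not part of RDF or of any RDF library, and the predicate is kept as the bare word "production", as in the shorthand.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement in the form {Subject, Predicate, Object}."""
    subject: str    # URI of the resource being described
    predicate: str  # property type (a bare name here, as in the shorthand)
    obj: str        # property value (a literal in this example)


def to_shorthand(t: Triple) -> str:
    """Render a triple in the {Subject, Predicate, Object} shorthand."""
    return '{%s, %s, "%s"}' % (t.subject, t.predicate, t.obj)


# The example statement: http://hardware.net has a production of "motherboard"
statement = Triple("http://hardware.net", "production", "motherboard")
print(to_shorthand(statement))
```

Keeping the three components as separate fields, rather than as one sentence string, is what allows a program to compare, store, or query statements component by component.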

5.1. The RDF Semantic Graph Model
Section 5 has introduced RDF's basic statement concepts, the idea of using URI references to identify the things referred to in RDF statements, and RDF/XML as a machine-processable way to represent RDF statements. RDF is based on the idea of expressing simple statements about resources, where each statement consists of a subject, a predicate, and an object.

From the previous example, the English statement

http://hardware.net has a production of "motherboard".

could be represented by an RDF statement having:
1. a subject, http://hardware.net
2. a predicate, production
3. an object, motherboard
Note that a URIref identifies the subject of the statement. Strictly, RDF also identifies the predicate by a URIref, so the word "production" stands here for the URIref of that property, while the object "motherboard" is a literal value. RDF model statements are represented as nodes and arcs in a graph. A statement is represented by:
1. a node for the subject
2. a node for the object
3. an arc for the predicate, directed from the subject node to the object node.
So, the RDF statement above would be represented by the graph shown in Figure (5).

(http://hardware.net) --production--> [motherboard]

Figure (5) Simple RDF Statement Representation by the Graph

In drawing RDF graphs, nodes that are URIrefs are shown as ellipses, while nodes that are literals are shown as boxes. Groups of statements are represented by corresponding groups of nodes and arcs.
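The node-and-arc view can be sketched in a few lines of Python. This is only an illustration under stated assumptions: build_graph, describe, and is_uriref are hypothetical helpers, and the test for a URIref (a value starting with http:// or https://) is a simplification chosen for this example.

```python
from collections import defaultdict


def is_uriref(value: str) -> bool:
    """Simplifying assumption: anything that looks like an http(s) URI is a URIref."""
    return value.startswith(("http://", "https://"))


def build_graph(triples):
    """Map each subject node to its outgoing (arc label, object node) pairs."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)


def describe(graph):
    """List each arc, noting whether the object node is drawn as an ellipse or a box."""
    lines = []
    for subject, arcs in graph.items():
        for predicate, obj in arcs:
            shape = "ellipse" if is_uriref(obj) else "box"
            lines.append(f"({subject}) --{predicate}--> {obj} [{shape}]")
    return lines


triples = [("http://hardware.net", "production", "motherboard")]
graph = build_graph(triples)
for line in describe(graph):
    print(line)
```

A group of statements simply adds more entries to the same mapping, which mirrors the remark that groups of statements correspond to groups of nodes and arcs.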

5.2. RDF/XML Document
The basic principle of the RDF/XML syntax can be illustrated using the example already presented for the English statement:

http://hardware.net has a production of "motherboard".

The RDF graph for this single statement is shown in Figure (5), with the triple:

{http://hardware.net, production, "motherboard"}

An RDF/XML document corresponding to this graph is:

[RDF/XML listing: markup not recoverable; surviving element content: motherboard]

This example illustrates the basic ideas used by an RDF/XML document to encode an RDF graph as XML elements, attributes, element content and attribute values.
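As a hedged illustration of how such a document can be produced, the sketch below serializes the example triple with Python's standard xml.etree.ElementTree module. The property namespace http://hardware.net/terms# and the prefix ex are assumptions introduced for this example; the paper does not state which vocabulary was used, so the output should not be read as the authors' original listing. The function triple_to_rdfxml is likewise a hypothetical helper.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX_NS = "http://hardware.net/terms#"  # hypothetical vocabulary namespace (assumption)

# Ask ElementTree to emit readable prefixes instead of ns0, ns1, ...
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("ex", EX_NS)


def triple_to_rdfxml(subject_uri: str, property_name: str, literal: str) -> str:
    """Encode one (subject, property, literal) triple as an RDF/XML string."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    # The subject becomes an rdf:Description element with an rdf:about attribute.
    description = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": subject_uri}
    )
    # The predicate becomes a child element; the object is its text content.
    prop = ET.SubElement(description, f"{{{EX_NS}}}{property_name}")
    prop.text = literal
    return ET.tostring(root, encoding="unicode")


xml_text = triple_to_rdfxml("http://hardware.net", "production", "motherboard")
print(xml_text)
```

The mapping follows the description in the text: the subject appears as an attribute value, the predicate as an XML element, and the object as element content.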

6. Validating and Parsing the Semantic Contents
The next step is to validate the RDF/XML data. Valid data will enable search engines and other software agents to find, use and reuse these contents. Parsing the RDF/XML document ensures that it is valid and generates a view of the data model as RDF triples. The W3C operates an online resource for validating RDF syntax at http://www.w3.org/RDF/Validator/. This tool is used to validate the RDF/XML document and to obtain its triples and its graph. RDF parsers provide basic support for parsing different RDF serializations, for accessing RDF triples via programming interfaces or queries, and for basic operations on the triples. The RDF parser is the bridge between the RDF/XML document and the RDF database (RDFDB). That is, the data cannot be persisted until the RDF/XML document has passed the RDF parser and is valid according to the W3C definition.

Figure (6) RDF Parsing [diagram labels: RDF/XML Document; RDF Parser; RDFDB]

The input of the RDF parser is an RDF/XML document and the output is a set of RDF triples: (Subject, Predicate, Object). All these triples are stored in a MySQL database as the RDFDB. It can contain all the data, and the application can pick and choose what it needs. The parsing tools listed in Section 6.1 check whether both the RDF schemata and the related metadata instances satisfy the semantic constraints implied by the RDF Schema specifications.
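The path from RDF/XML document through parser to RDFDB can be sketched as follows. Several points are assumptions made for illustration: the input document and its ex namespace are hypothetical, the parser handles only literal-valued properties inside rdf:Description elements and performs no schema validation, and Python's built-in sqlite3 module stands in for the MySQL database used in the paper so that the example is self-contained.

```python
import sqlite3
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical input document; the ex: namespace is an assumption.
RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://hardware.net/terms#">
  <rdf:Description rdf:about="http://hardware.net">
    <ex:production>motherboard</ex:production>
  </rdf:Description>
</rdf:RDF>"""


def parse_triples(rdf_xml: str):
    """Extract (subject, predicate, object) triples from simple RDF/XML.

    Raises xml.etree.ElementTree.ParseError if the input is not well-formed
    XML, which is the only check this sketch performs.
    """
    root = ET.fromstring(rdf_xml)
    triples = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree tags look like '{namespace}local'; join them into one URI.
            predicate = prop.tag[1:].replace("}", "", 1)
            triples.append((subject, predicate, (prop.text or "").strip()))
    return triples


def store_triples(conn: sqlite3.Connection, triples) -> None:
    """Persist triples in a three-column table playing the role of the RDFDB."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS triples (subject TEXT, predicate TEXT, object TEXT)"
    )
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory stand-in for the MySQL RDFDB
triples = parse_triples(RDF_XML)
store_triples(conn, triples)
rows = conn.execute("SELECT subject, predicate, object FROM triples").fetchall()
print(rows)
```

Because nothing is written to the table unless parse_triples returns without error, the sketch reflects the idea that data is persisted only after the document has passed the parser.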

6.1. RDF Parsing Tools
Many tools are used for validating and parsing RDF/XML documents. These tools include [11]:
1. The SiRPAC RDF parser. SiRPAC is a set of Java classes that can parse RDF/XML documents into the three-tuples of the corresponding RDF data model. The parser has evolved through several versions and has become a de facto standard in Java-based RDF development.
2. The Profium tool. It targets the same objective as SiRPAC and provides similar functionality. It is available as a Perl script in addition to its Java implementation.
3. The Perllib W3C library. It was born of a need to implement an RDF infrastructure at W3C. It is currently used for access control and annotations, but will be used for a more diverse group of applications as needs evolve. The library is implemented in Perl and is now under prerelease preparation.
4. The ICS-FORTH Validating RDF Parser. It is a tool for parsing RDF statements and validating them against an RDF Schema. The parser syntactically analyzes the statements of a given RDF/XML document according to the RDF Model and Syntax specifications.

7. User Application Interface
The third step is to develop semantic services that utilize the semantic content marked up with RDF triples. Software agents will parse this content once and deliver several services to the user. The most important use of a Semantic Web agent is to search the semantic contents and answer user queries. Current Semantic Web activities focus mainly on making the web machine understandable, and end-user needs are often neglected [12]. The design of the user interface is nevertheless very important, because the user does not receive training in the use of these applications. To bring the advantages of the Semantic Web to the user interface, one clearly needs an interface that allows the user to explore, browse, and query the content he is interested in [12]. We need not only to present RDF data to the user, but also to give him intuitive tools with which to interact with such data, allowing the user to manipulate resources with direct manipulation techniques such as query, sitemap, and drag and drop [13]. The Jena toolkit provides the RDQL query language for querying RDF content. Any search agent can use this query language to provide search facilities for the Semantic Web. RDQL has a syntax similar to standard SQL, making it attractive to search agent developers [14]. Another tool, called Repat, is used to parse and query the RDF database. It is a PHP version, and it also uses RDQL to query the RDF database [15]. The MySQL database management system is used to manage and query the RDFDB using the RDF Data Query Language (RDQL).
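The kind of search a query agent performs over stored triples can be sketched as simple pattern matching. The code below is not RDQL and does not use Jena or Repat; it only illustrates the underlying idea of matching a triple pattern in which some positions are left open. The function match is a hypothetical helper, and the second and third triples are invented sample data added so that the query has something to filter.

```python
def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit a pattern; None leaves a position open."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


triples = [
    ("http://hardware.net", "production", "motherboard"),
    ("http://hardware.net", "production", "graphics card"),     # invented sample data
    ("http://example.org/other", "production", "motherboard"),  # invented sample data
]

# Query: which resources have a production of "motherboard"?
producers = [s for s, _, _ in match(triples, predicate="production", obj="motherboard")]
print(producers)
```

A user interface for browsing and querying could sit on top of such a function, turning the fields of a search form into the fixed positions of the pattern.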

8. Conclusion
The web is a collection of data and its expansion is very wide. It has become the greatest repository of information ever assembled by man. Web information is not designed to be processed by machines: the information is meaningless to a computer, and it is very hard to find what we are looking for. For this reason the Semantic Web calls for a well-defined tool to process the Web based on knowledge representation. The Semantic Web is about creating a Web that is understood by both humans and computers. The RDF language is a W3C standard for describing metadata by means of RDF triples. The steps suggested to create a knowledge representation and Semantic Web application using RDF are:
1. creating Semantic Web contents.
2. validating the Semantic contents.
3. using the Semantic contents.
In order not to "reinvent the wheel", we can apply the knowledge representation method to represent web resource metadata. These steps were applied to create a Semantic Web for computer hardware companies as a case study, with a good result in simplifying the way of creating a Semantic Web.

References
[1] T. Berners-Lee, J. Hendler, and O. Lassila, "The Semantic Web", Scientific American, May 2001.
[2] Thomas B. Passin, Explorer’s Guide to the Semantic Web, Manning Publications Co., 2004.
[3] Klaas Naaijkens, Semantic Web, 20 Nov 2002.
[4] B. Gustavsson, On the Semantic Web language, Växjö University, School of Mathematics and Systems Engineering, 2001.
[5] Leendert W.M. Wienhofen, Using Graphical Ontologies for Searching The (Semantic) Web, University College of Ostfold, August 2003.
[6] Boris Katz and Jimmy Lin, Annotating the Semantic Web Using Natural Language, Proceedings of the 2nd Workshop on NLP and XML, September 2002, Taipei, Taiwan.
[7] Shelley Powers, "Practical RDF", O'Reilly, July 2003.
[8] Ora Lassila and Ralph R. Swick, RDF Model and Syntax Specification, W3C, 1999.
[9] Patrick Hayes, RDF Model Theory, W3C, 25 Sep 2001.
[10] Frank Manola and Eric Miller, RDF Primer, W3C, 10 Feb 2004.
[11] Ying Ding, Dieter Fensel, Michel Klein, and Borys Omelayenko, The Semantic Web: Yet Another Hip?, Division of Mathematics & Computer Science, Vrije Universiteit Amsterdam, De Boelelaan, Amsterdam, NL, 2002.
[12] Richard Vdovjak, Peter Barna, Geert-Jan Houben and Flavius Frasincar, Bringing the Semantic Web Closer to the User, Technische Universiteit Eindhoven, 2003.
[13] Dennis Quan, David Huynh and David R. Karger, Haystack: A Platform for Authoring End User Semantic Web Applications, MIT Artificial Intelligence Laboratory, USA, 2003.
[14] B. McBride, Jena: A Semantic Web Toolkit, IEEE Internet Computing, Vol. 6, No. 6, Nov/Dec 2002, pp. 55-59.
[15] Luis Argerich, Parsing RDF documents using PHP, 2002, [email protected]