
Republic of Iraq
Ministry of Higher Education and Scientific Research
University of Technology
Department of Computer Sciences

Development of Semantic Web Site Using Knowledge Representation

A Thesis Submitted to the Department of Computer Sciences of the University of Technology in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Computer Science

By Jamal Fadthel Tawfeq

Supervised by Asst. Prof. Dr. Abdul Monem S. Rahma

April 2006

In the name of God, the Most Gracious, the Most Merciful.

"We will show them Our signs in the horizons and within themselves until it becomes clear to them that it is the truth. Is it not sufficient concerning your Lord that He is, over all things, a Witness?"

God the Almighty has spoken the truth. (Surat Fussilat, verse 53)

Supervisor's Certification

I certify that this thesis was prepared under my supervision at the Department of Computer Sciences of the University of Technology in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science.

Asst. Prof. Dr. Abdul-Monem S. Rahma

Committee Certification

We certify that we have read this thesis, entitled “Development of Semantic Web Site Using Knowledge Representation”, and, as an examining committee, have examined the student, Jamal Fadthel Tawfeq, on its contents and on related matters, and that in our opinion it meets the standard of a thesis for the degree of Doctor of Philosophy in Computer Science.

Signature: Name: Prof. Dr. Hilal H. Saleh Date:

(Chairman)

Signature: Name: Asst. Prof. Dr. Aladdin J. Alnaji Date:

(Member)

Signature: Name: Asst. Prof. Dr. Kais J. Al Jumaily Date:

(Member)

Signature: Name: Asst. Prof. Dr. Jane J. Stephan Date:

(Member)

Signature: Name: Asst. Prof. Dr. Ahmed T. Sadiq Date:

(Member)

Signature: Name: Asst. Prof. Dr. Abdul-Monem S. Rahma Date:

(Supervisor)

Approved by the Department of Computer Science, University of Technology. Signature: Name: Prof. Dr. Hilal H. Saleh Date:

(Head of the Department)

Dedication

To my dear wife,

who planted hope in my soul and revived it, and watered it by her hand with pure water, so that it grew, flourished, and bore ripe fruit.

To her . . . . . . . .

I dedicate this ripe fruit, in gratitude and thanks.

Acknowledgements

I would like to deeply thank Asst. Prof. Dr. Abdul Monem S. Rahma for his great help and assistance, encouragement and support, and his continuous commitment and goodwill, which put me on the right path. He has been a teacher and a brother to me and to everyone. I also acknowledge the contributions of Asst. Prof. Dr. Moad A. Fadhil and Asst. Prof. Dr. Rashed A. Al-Zubaidy for their advice and support. Thanks are due to all my friends in the Department of Computer Sciences of the University of Technology. My great thanks and regards go to my close friends Dr. Hala Bahjet and Dr. Abdul Karem Merheg for their true friendship; they have always been there when I needed someone to talk to. Finally, I must thank my family, especially my wife, who have provided love and support through these years and always offered words of encouragement when I was feeling down.

I hope that I have not disappointed any of them.

Jamal 2006


Abstract

The Semantic Web is commonly regarded as the next generation of the Web. It brings together knowledge representation methods and the Web communities. New challenges have arisen in building a Semantic Web infrastructure in which documents are understandable by both humans and computers. This dissertation suggests steps for building a Semantic Web application using the Resource Description Framework (RDF). The methodology is intended to make information more accessible to both humans and computers. Building a Semantic Web application involves three steps. The first step is creating the Semantic Web contents using one of the knowledge representation methods; in this dissertation, a semantic net is used to represent the metadata for two case studies. The first case is about computer hardware companies, and the second is an Arabic document about Arabic poets. The second step is validating the semantic contents: it provides basic support for parsing RDF/XML documents, for accessing RDF triples through a programming interface or queries, and for basic operations on triples. The third step is using the semantic contents, that is, developing semantic services that utilize them. This methodology was applied to create Semantic Web applications and gave good results in simplifying the way Semantic Web applications are created. This dissertation also presents the design and implementation of a general user application interface for RDF/XML documents, which is applied to three case studies: the first is a document about Arabic poets, the second is a document taken from the book Practical RDF by S. Powers, and the third is about computer hardware companies, with images uploaded along with the document.


Contents

Acknowledgements
Abstract
Contents
List of Figures
List of Tables
List of Abbreviations

Chapter One: Introduction
1.1 Introduction
1.2 Review of Previous Work
1.3 The Goal of the Thesis
1.4 Thesis Structure

Chapter Two: Theoretical Background
2.1 Introduction
2.2 The Current Web
2.2.1 The Weaknesses of the Web
2.2.2 The Weaknesses of Search Engines
2.3 Semantic Web
2.3.1 Semantic Web Metadata
2.3.2 Metadata Meaning
2.3.3 Web and Semantic Web Difference
2.3.4 The Difference in Use
2.4 Semantic Web Architecture
2.5 Semantic Web Principles

Chapter Three: Knowledge Representation
3.1 Introduction
3.2 Knowledge Representation Techniques
3.3 Principles of Knowledge Representation
3.4 Knowledge Representation Properties
3.5 Knowledge Representation on the Web
3.5.1 HTML: Visualizing Information
3.5.2 XML: Exchange Information
3.5.3 RDF: A Data Model for Meta-information
3.6 Parsing RDF/XML Documents
3.6.1 RDF Parsing Tools
3.7 RDF Data Query Language (RDQL)
3.7.1 RDQL Syntax
3.8 User Interface

Chapter Four: Creating Knowledge Representation for the Semantic Web
4.1 Introduction
4.2 Creating the Semantic Web
4.3 Creating Semantic Web Content
4.3.1 The Document
4.3.2 Suggested RDF Semantic Metadata Model
4.3.2.1 RDF Semantic Graph Model
4.3.2.2 Structured Statement
4.4 Semantic RDF/XML Document
4.5 Arabic Document
4.6 RDF Validating and Parsing the Semantic Contents

Chapter Five: Using the Semantic Contents
5.1 Introduction
5.2 MySQL and RDF Database
5.3 RDF Data Query Language (RDQL)
5.4 User Interface
5.5 General User Interface
5.6 Example of Using the General User Interface

Chapter Six: Conclusions and Suggestions for Future Work
6.1 Conclusions
6.2 RDF and Semantic Web Application
6.3 Suggestions for Future Work

References
Appendix A: RDF Graph for Computer Hardware Companies
Appendix B: RDF Triples for Computer Hardware Companies
Appendix C: RDF/XML Document for Computer Hardware Companies

List of Figures

Fig. no.  Figure Name
2.1   An Abstract and Conceptual View for Semantic Web
2.2   Communication Process Between Two Individuals
2.3   Two Applications Running on Computers
2.4   How Humans Will Use the Semantic Web Indirectly
2.5   The Semantic Web Architecture
3.1   ISA Semantic Graph
3.2   Instance-Of Semantic Graph
3.3   ISPART Semantic Graph
3.4   The Representation of the Sentence
3.5   A Hierarchy
3.6   An Example of an HTML Document
3.7   An Example of an XML Document
3.8   Simple Node and Arc Diagram
3.9   Complex Sentences with Identifiers
3.10  An Example of an RDF Document
4.1   Creating Semantic Web Steps
4.2   Creating Semantic Web Contents
4.3   Metadata and Relationship Gathering from a Web Page
4.4   Computer Hardware Companies Document
4.5   The Document with URIrefs
4.6   Simple RDF Statement Represented by a Graph
4.7   RDF Graph for a Group of Statements
4.8   RDF Graph for a Complex Statement
4.9   Arabic Poets Document
4.10  Arabic Poets Document with Identifiers
4.11  Simple Node and Arc Diagram for an Arabic Sentence
4.12  RDF Graph for a Structured Value
4.13  RDF Graph for the Arabic Document
4.14  RDF/XML Document for Arabic Poets
4.15  RDF Parsing
5.1   Semantic Web Site Home Page
5.2   The Query Web Page
5.3   The Sitemap Interface
5.4   The Web Page Interface
5.5   Cases.php Page
5.6   Cases.php Page to Upload the Arabic Document
5.7   Query Arabic Document
5.8   Site Map Page for Arabic Document
5.9   Web Page for Arabic Document
5.10  RDF/XML Document
5.11  Cases.php Page to Upload the RDF/XML Document
5.12  Query RDF/XML Document
5.13  Site Map Page for RDF/XML Document
5.14  Web Page for RDF/XML Document
5.15  Cases.php Page to Upload the Pictures
5.16  Display the Picture in the Web Page

List of Tables

Table no.  Table Name
4.1   Triples of the Data Model for a Group of Statements
4.2   Triples of the Data Model for a Group of Statements Using Prefix Notation
4.3   Triples of the Data Model for a Structured Statement
4.4   Triples of the Arabic Sentence Data Model
4.5   Triples of the Data Model for the Arabic Document

List of Abbreviations

AI          Artificial Intelligence
DAML        DARPA Agent Markup Language
DTD         Document Type Definition
e-Commerce  Electronic Commerce
e-Learning  Electronic Learning
EPE         Electronic Publishing Environment
FOL         First Order Logic
HTML        HyperText Markup Language
KR          Knowledge Representation
OIL         Ontology Inference Layer
OWL         Web Ontology Language
PHP         PHP: Hypertext Preprocessor (originally Personal Home Page)
QName       Qualified Name
RDF         Resource Description Framework
RDFDB       Resource Description Framework Database
RDFS        Resource Description Framework Schema
RDQL        RDF Data Query Language
SGML        Standard Generalized Markup Language
SHOE        Simple HTML Ontology Extensions
SQL         Structured Query Language
URI         Uniform Resource Identifier
URL         Uniform Resource Locator
URN         Uniform Resource Name
W3C         World Wide Web Consortium
WWW         World Wide Web
XML         eXtensible Markup Language

Chapter One Introduction


1.1 Introduction

The World Wide Web is the greatest repository of information ever assembled by man. It contains documents and multimedia resources concerning almost every imaginable subject, and all of this data is instantly available to anyone with an Internet connection. The Web’s success is largely due to its decentralized design: numerous computers host web pages, and each document can point to other documents, either on the same computer or on different ones. As a result, individuals all over the world can provide content on the Web, allowing it to grow exponentially as more and more people learn how to use it. However, the Web’s size has also become its curse. Because of the sheer volume of available information, it is becoming increasingly difficult to locate useful information. Although directories (such as Yahoo!) and search engines (such as Google and AltaVista) can provide some assistance, they are far from perfect. For many users, locating the “right” document is still like trying to find a needle in a haystack [Hef 01].

The Web has influenced the way people communicate and collaborate. However, it is essentially an information-publishing medium directed at human consumption, which people use as a source of information from which to derive new knowledge. The amount of information accessible on the Web has increased enormously in a short period of time. This increase is a desirable evolution, but it has also made the problems with the Web more evident: everyone who has used the Web to search for information knows that it is not as easy or as fast as one would like it to be.


Currently, only humans can grasp enough of the meaning of information to decide how to process it. Programmers have tools to automate some of these decisions, but the tools are not expressive enough to provide an automatic framework, so programmers remain continuously involved in low-level development issues. Furthermore, users often want to use the Web to do more than just locate a document; they want to perform some task. For example, a user might want to find the best price for a desktop computer or for computer books. Completing such tasks often involves visiting a series of pages, integrating their content, and reasoning about them in some way. The main obstacle is that the Web was not designed to be processed by machines. Although web pages include special information that tells a computer how to display a particular piece of text or where to go when a link is clicked, they do not provide any information that helps the machine determine what the text means. Thus, to process a web page intelligently, a computer must understand the text, but natural language understanding is known to be an extremely difficult and unsolved problem. The alternative is to give web content a well-defined meaning that machines can process directly. Tim Berners-Lee, the inventor of the Web, coined the term Semantic Web to describe this approach. Berners-Lee, Hendler and Lassila [Lee et al 01] provide the following definition: “The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation.”


1.2 Review of Previous Work

Several theses and papers have been published in recent years. Some of them address how to develop a Semantic Web language, and others address how to represent knowledge on the Web.
1. Jeffrey Douglas Heflin's PhD thesis "Towards the Semantic Web: Knowledge Representation in a Dynamic, Distributed Environment" [Hef 01]. He provides a new formal definition of ontologies for use in dynamic, distributed environments and develops a new method for integrating distributed data sources. He uses SHOE, a web-based knowledge representation language that allows machines to automatically process and integrate web data, and demonstrates the feasibility and potential of SHOE as a Semantic Web language.
2. Wang Hai's PhD thesis "Semantic Web and Formal Design Methods" [Hai 04]. This thesis addresses the Web and software engineering. It is centered on two main issues: how software engineering techniques can facilitate Web applications, and how Web technology can assist software design and development.
3. Heiner Stuckenschmidt's PhD thesis "Ontology-Based Information Sharing in Weakly Structured Environments" [Hei 03]. He prepares the ground for a framework that covers the complete process of using ontologies for information sharing, from the representation of ontologies and their deployment in a heterogeneous environment to their use for information filtering and exchange.
4. Benny Gustavsson's PhD thesis "On the Semantic Web Language" [Gus 01]. He describes the computer-processable web of semantic data, focusing on the underlying theories and language for its creation and on the language architecture of that future system.
5. Tim Berners-Lee's paper "Conceptual Graphs and the Semantic Web" [Lee 01]. He considers conceptual graphs (CGs) as a logic language for describing closed worlds of logic, and notes that CG syntax would have to be modified so that each concept and each relation can be a first-class object with its own URI (Uniform Resource Identifier).
6. Frank van Harmelen and Dieter Fensel's paper "Practical Knowledge Representation for the Web" [Har and Fen 99]. They survey and analyze traditional, new, and emerging Web standards and show how they can be used to represent machine-processable semantics of Web sources.
7. Boris Katz and Jimmy Lin's paper "Annotating the Semantic Web Using Natural Language" [Kat and Lin 02]. They exploit the synergistic opportunities between the Semantic Web and natural language techniques, proposing three mechanisms for seamlessly integrating natural language technology into RDF. The first involves augmenting RDF property definitions. The second involves creating information access schemata to bridge the gap between language and RDF. The third proposes further extensions that attempt to mirror human question-answering behavior in the form of natural language “query plans”.


1.3 The Goal of the Thesis

This work describes the computer-processable web of "semantic data" that follows from the definition in Section 1.1, and develops a Semantic Web application based on knowledge representation methods. The main goals are:
1. To suggest a methodology for creating Semantic Web applications based on the semantic net method, using the RDF language.
2. To develop a general user interface through which any RDF document can be uploaded to the Internet environment.

1.4 Thesis Structure

Chapter 2 introduces the current Web and its weaknesses and explains the meaning of metadata. It then describes the Semantic Web and how it differs from the current Web. Chapter 3 discusses knowledge representation and its importance to the WWW for the Semantic Web. It then describes some languages used to represent information on the Web, namely HTML, XML, and RDF, as knowledge representation languages. Chapter 4 describes the suggested methodology for creating the Semantic Web and uses it to represent metadata for two case studies: the first is about computer hardware companies, and the second is an Arabic document about Arabic poets. Chapter 5 discusses the user application interface for RDF/XML documents using three case studies: the first is an RDF/XML document about Arabic poets, the second is an RDF/XML document taken from the book Practical RDF by S. Powers, and the third is about computer hardware companies, with images uploaded along with the document. Chapter 6 presents the conclusions and suggests future directions for the Semantic Web.


Chapter Two Theoretical Background


2.1 Introduction

An alternative approach to the current Web is to represent Web content in a form that is more easily machine-processable and to use intelligent techniques to take advantage of these representations [Ant and Har 04]. This chapter introduces the weaknesses of the current Web, explains what the Semantic Web is and how it differs from the current Web, and surveys traditional and new Web standards, showing how they can be used to represent machine-processable semantics of Web sources. Finally, it discusses the Semantic Web architecture and its principles.

2.2 The Current Web

The Web has influenced the way people communicate and collaborate. People publish information on the Web, making it accessible to anyone with Web access, or use the Web as a source of information from which to derive new knowledge. The amount of information accessible on the Web has increased enormously in a short period of time. This increase is a desirable evolution, but it has also made the problems with the Web more evident: everyone who has used the Web to search for information knows that it is not as easy or as fast as one would like it to be. The Internet holds a vast and rapidly growing collection of data. It contains heterogeneous sources of data, such as documents and multimedia resources concerning almost every imaginable subject, and all of this data is instantly available to anyone with an Internet connection. Because the Web was not designed to be processed by machines, intelligent tools must be developed to integrate the information extracted from pages. Web pages include special information that tells a computer how to display a particular piece of text or where to go when a link is clicked, but this information does not help the machine determine what the text means.

The concept of machine-understandable documents does not imply some magical artificial intelligence that allows machines to comprehend human mumbling. It only indicates a machine's ability to solve a well-defined problem by performing well-defined operations on existing well-defined data. Instead of asking machines to understand people's language, it involves asking people to make the extra effort [Baa et al 03]. Furthermore, users often want to use the Web to do more than just locate a document; for example, they may want to find the best price for a desktop computer, or to plan and book a romantic vacation. Completing these tasks often involves visiting a series of pages.
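To illustrate what "a well-defined operation on existing well-defined data" can mean, the following Python sketch finds the best price for a desktop computer. It is a minimal illustration under a stated assumption: that several shops have already published their offers as structured records with explicit product, price, and currency fields. The `Offer` type, shop names, products, and prices are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """A hypothetical structured offer, as a shop might publish it."""
    shop: str
    product: str
    price: float
    currency: str


def best_offer(offers, product, currency="USD"):
    """Return the cheapest offer for `product` in `currency`, or None.

    Every field has an explicit meaning, so this is a well-defined
    operation: no natural-language understanding is required.
    """
    matching = [o for o in offers if o.product == product and o.currency == currency]
    return min(matching, key=lambda o: o.price, default=None)


# Hypothetical offers gathered from three shops.
offers = [
    Offer("shop-a.example", "desktop-computer", 799.0, "USD"),
    Offer("shop-b.example", "desktop-computer", 749.5, "USD"),
    Offer("shop-c.example", "computer-book", 35.0, "USD"),
]

cheapest = best_offer(offers, "desktop-computer")
print(cheapest.shop, cheapest.price)  # shop-b.example 749.5
```

With ordinary HTML pages, the same task requires a person to visit each page, locate the price in the text, and compare the results by hand.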

2.2.1 The Weaknesses of the Web

The WWW has drastically changed the availability of electronic information. Currently, there are more than eight billion documents on the WWW [Google 05], used by people around the world, and that number is growing fast. However, this success and exponential growth make it increasingly difficult to find, access, present, and maintain the information required by a wide variety of users. Document management systems currently on the market have severe weaknesses:
1. Searching information: Existing keyword-based searches can retrieve irrelevant information that uses certain terms with different meanings. They also miss information when different terms with the same meaning as the desired content are used.
2. Extracting information: Currently, human browsing and reading are required to extract relevant information from information sources. This is because automatic agents do not possess the common-sense knowledge required to extract such information from textual representations, and they fail to integrate information distributed over different sources.
3. Maintaining: Maintaining weakly structured text sources is a difficult and time-consuming activity when such sources become large. Keeping such collections consistent, correct, and up to date requires mechanized representations of semantics that help to detect anomalies.
4. Automatic document generation: Automatic document generation would enable adaptive web sites that are dynamically reconfigured according to user profiles or other aspects of relevance. However, generating semi-structured information presentations from semi-structured data requires a machine-accessible representation of the semantics of these information sources. [Baa et al 01] [Ben 04] [Bru 03] [Din et al 04] [Din et al 02]
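Point 3 can be made concrete: once the meaning of each property is written down in machine-processable form, a program can check a collection for anomalies instead of a person reading it. The following Python sketch assumes a hypothetical mini-schema in which each property declares the type its value must have; the property names, record identifiers, and values are invented for illustration and do not come from any standard vocabulary.

```python
# Hypothetical mini-schema: property name -> required type of its value.
SCHEMA = {
    "price": float,
    "manufacturer": str,
    "capacityGB": int,
}

# Hypothetical records; disk-2 and disk-3 each contain one anomaly.
records = {
    "disk-1": {"manufacturer": "Seagate", "capacityGB": 40, "price": 59.0},
    "disk-2": {"manufacturer": "Seagate", "capacityGB": "forty"},   # wrong value type
    "disk-3": {"manufacturer": "Seagate", "colour": "black"},       # undeclared property
}


def find_anomalies(records, schema):
    """Return (record, property, problem) tuples for values that break the schema."""
    problems = []
    for rid, props in sorted(records.items()):
        for prop, value in sorted(props.items()):
            expected = schema.get(prop)
            if expected is None:
                problems.append((rid, prop, "undeclared property"))
            elif not isinstance(value, expected):
                problems.append((rid, prop, f"expected {expected.__name__}"))
    return problems


# Reports one problem for disk-2 (capacityGB) and one for disk-3 (colour).
for problem in find_anomalies(records, SCHEMA):
    print(problem)
```

The check works only because the schema states explicitly what each property means. Free text offers no equivalent hook for automatic validation.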

2.2.2 The Weaknesses of Search Engines

Users have two main tools to help them locate relevant resources on the Web: catalogs and search engines. Catalogs are constructed by human experts, so they tend to be highly accurate but can be difficult to maintain as the Web grows. To keep up with this growth, search engines were designed to eliminate human effort in cataloging web sites. A search engine consists of a mechanism that “crawls” the Web looking for new or changed pages, an indexing mechanism, and a query interface [Hef 01]. Typically, the indices store information on the frequency of words and some limited positional information. Users query the system by entering a few keywords, and the system computes its response by matching the entries against the index. Although many contemporary search engines now also use link analysis to some degree, this only helps to identify the most popular pages, and popularity may or may not be related to the relevance of a page for a particular query.

Keyword-based search engines, such as AltaVista, Yahoo, and Google, are the main tools for using today’s Web. The Web would clearly not have been the huge success it was without search engines. However, there are serious problems associated with their use [Ant and Har 04]:
1. High recall, low precision. Even if the main relevant pages are retrieved, they are of little use if many mildly relevant or irrelevant documents are also retrieved. Too much can easily become as bad as too little.
2. Low or no recall. Sometimes we get no answer to our request, or important and relevant pages are not retrieved. Although low recall is a less frequent problem with current search engines, it does occur.
3. Results are highly sensitive to vocabulary. Often our initial keywords do not get the results we want because the relevant documents use different terminology from the original query. This is unsatisfactory, because semantically similar queries should return similar results.
4. Results are single Web pages. If we need information that is spread over various documents, we must issue several queries to collect the relevant documents, and then manually extract the partial information and put it together.
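The indexing and matching steps described above, and problems 1 and 3 in particular, can be reproduced with a toy keyword search engine. This Python sketch is not how any real search engine is implemented. The index stores only word frequencies per document, with no positional information or link analysis, and the document identifiers and texts are hypothetical.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index: word -> {doc_id: frequency of the word in doc}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for word, freq in Counter(tokenize(text)).items():
            index[word][doc_id] = freq
    return index


def search(index, query):
    """Return ids of documents containing every query word, ranked by
    total frequency of the query words (ties broken by document id)."""
    words = tokenize(query)
    if not words:
        return []
    postings = [index.get(w, {}) for w in words]
    hits = set(postings[0]).intersection(*postings[1:])
    return sorted(hits, key=lambda d: (-sum(p[d] for p in postings), d))


# Hypothetical documents.
docs = {
    "d1": "Seagate sells the ST340014A hard disk.",
    "d2": "A fast hard drive for desktop computers.",     # same concept, other words
    "d3": "The exam was hard; the disk of the sun set.",  # same words, other meaning
}
index = build_index(docs)
print(search(index, "hard disk"))  # ['d1', 'd3']
```

The query "hard disk" misses d2, which is relevant but uses different terminology (problem 3), and returns d3, which matches the words but not the meaning (low precision, problem 1).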

2.3 Semantic Web

The word semantic implies meaning or, as WordNet defines it, “of or relating to the study of meaning and changes of meaning.” For the Semantic Web, semantic indicates that the meaning of data on the Web can be discovered not just by people, but also by computers. In contrast, most meaning on the Web today is inferred by people who read web pages and the labels of hyperlinks, and by other people who write specialized software to work with the data. The phrase the Semantic Web stands for a vision in which computers (software) as well as people can find, read, understand, and use data over the World Wide Web to accomplish useful goals for users [Pas 04].

Web users need new ways to exploit all this available information and these possibilities. The problem is that Web information is meaningless to computers, so it is very hard for them to find what users are looking for. In this context, the need arises for a new vision of the Web: the Semantic Web. The Semantic Web is a Web of data. It is intended to make data located anywhere on the Web accessible and understandable, both to people and to machines. The World Wide Web Consortium (W3C) provides the following definition: “The Semantic Web is the abstract representation of data on the World Wide Web, based on the RDF standards and other standards to be defined. It is being developed by the W3C, in collaboration with a large number of researchers and industrial partners” [Naa 02].

From this definition and the one in Section 1.1, the Semantic Web can be seen as a “web of meaning”, as opposed to the “web of links” that the Web is today. This “web of meaning” will enable computers with specialized programs to help us not only to find information but also to derive information that did not exist before. What is needed is to make information “meaningfully processable” by computers, so that they can use the information on the Semantic Web as if they had created it [Gus 01]. The Semantic Web consists of two implicit parts:
1. A computer-processable metaweb that describes the resources on the Web and contains directly computer-oriented information, and
2. The Web itself.
The word “implicit” is used because the Semantic Web is the Web, but it can conceptually be considered to be at another, more abstract level aimed at computer consumption. This is illustrated in Figure (2.1) [Gus 01].

Tim Berners-Lee, known as the inventor of the WWW, has a vision for the future of the World Wide Web, which he calls “The Semantic Web” [Lee et al 01]. In this Semantic Web, information will be presented in machine-readable form. Right now, most information on the WWW is presented in natural language and can only be understood by humans. Although there have been some advances in the field of text recognition, many issues remain to be resolved before natural language can be understood by computers [Gus 01]. The Semantic Web is about creating a Web that is understandable by both people and computers. Users will still have information presented in the way they are used to, but for computers the Semantic Web is a breakthrough: they no longer have to infer meaning from grammar and markup alone, because the semantic structure of the text is already included. With the Semantic Web it will be much easier to find what you are looking for, since everything is already placed in context.
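To make "information already placed in context" more concrete, the following Python sketch stores facts as subject-predicate-object statements, the triple model on which RDF is built, and answers a question by pattern matching rather than by reading text. The URIs under http://example.org/, the property names, and the identifier `FloppyDrive1` are hypothetical and chosen only for illustration; a real application would use an RDF toolkit and established vocabularies.

```python
EX = "http://example.org/"  # hypothetical namespace for this sketch

# Each fact is a (subject, predicate, object) triple.
triples = {
    (EX + "Seagate", EX + "type", EX + "Company"),
    (EX + "Seagate", EX + "hasProduct", EX + "ST340014A"),
    (EX + "ST340014A", EX + "type", EX + "HardDiskDrive"),
    (EX + "Panasonic", EX + "type", EX + "Company"),
    (EX + "Panasonic", EX + "hasProduct", EX + "FloppyDrive1"),  # hypothetical id
    (EX + "FloppyDrive1", EX + "type", EX + "FloppyDiskDrive"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def makers_of(triples, product_type):
    """Which subjects have a product whose type is `product_type`?"""
    products = {s for s, _, _ in match(triples, p=EX + "type", o=product_type)}
    return sorted(s for s, _, o in match(triples, p=EX + "hasProduct") if o in products)


print(makers_of(triples, EX + "HardDiskDrive"))  # ['http://example.org/Seagate']
```

The answer follows from explicit links between concepts (company, product, product type), not from the words that happen to appear on a page.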


[Figure (2.1): An Abstract and Conceptual View for Semantic Web. The figure contrasts the Web (human's understanding) with the MetaWeb, a machine-processable Web about the Web (computer processable).]

The Semantic Web [Wie 03] [Lee et al 01]:
1. allows effective combination of the independent work of diverse communities;
2. supports the ability to add new information without insisting that the old be modified;
3. provides communities with the ability to resolve ambiguities and clarify inconsistencies; and
4. uses descriptive conventions that can expand as human understanding expands.
The purpose of the Semantic Web is to benefit humans, not computers. The original idea was that, instead of waiting for computers to become smart enough to solve all the problems of understanding human language, people should make the meaning of Web content explicit so that computers can process it [Kat and Lin 02].


2.3.1 Semantic Web Metadata

With the Semantic Web, semantically rich descriptive information can be associated with any resource, for instance by adding metadata about a document's creation. The Semantic Web provides URIs not only for documents but also for people, concepts, and relationships. For example, descriptive information about a person from different sites could be combined to learn more about that person in differing contexts: in his roles as an author, as a manager, as a developer, and so on [Koi and Mil 01]. In other words, the resources on the World Wide Web, when retrieved, do not stand alone without explanation; there is information about each resource. Information about information is generally known as metadata, particularly in web design. The World Wide Web was originally built for human consumption, and although everything on it is machine-readable, this data is not machine-understandable. It is very hard to automate anything on the Web, and because of the volume of information the Web contains, it is not possible to manage it manually. The solution proposed here is to use metadata to describe the data contained on the Web [Las and Swi 99].
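Descriptions from different sites can be combined because they use the same URI for the same person. The following Python sketch models this as a plain set union of statements. The two "sites", the person URI, and the property names are all hypothetical. It also reflects the principle that new information can be added without modifying the old: neither source is changed, and statements about the same URI simply accumulate.

```python
PERSON = "http://example.org/people/alice"  # hypothetical URI for one person

# Statements published independently by two hypothetical sites.
publisher_site = {
    (PERSON, "name", "Alice"),
    (PERSON, "role", "author"),
    (PERSON, "wrote", "http://example.org/books/rdf-primer"),
}
company_site = {
    (PERSON, "role", "manager"),
    (PERSON, "role", "developer"),
    (PERSON, "worksFor", "http://example.org/org/acme"),
}


def describe(graph, subject):
    """Group everything known about `subject` by property."""
    description = {}
    for s, p, o in graph:
        if s == subject:
            description.setdefault(p, set()).add(o)
    return description


# Merging is a set union: neither source set is modified.
merged = publisher_site | company_site
roles = sorted(describe(merged, PERSON)["role"])
print(roles)  # ['author', 'developer', 'manager']
```

Merging is this simple only because both sites agreed on the URI. If each site had used its own name for the person, a further step would be needed to decide that the two descriptions refer to the same individual.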

2.3.2 Metadata Meaning

Metadata is structured documentation about documents and objects, or structured information about information. When properly implemented, metadata can crisply and unambiguously describe information resources, enhancing information retrieval and enabling accurate matches, while being totally transparent and invisible to the user. Search specificity is increased and search sensitivity is boosted [Bou et al 01].


Metadata is structured information that describes, explains, or locates an information resource, or otherwise makes it easier to retrieve, use, or manage. [NISO 04] Under this definition, metadata provide a mechanism for describing things on the Web more precisely, which could elevate the Web from machine-readable to machine-understandable. T. B. Lee gives a well-formed definition of metadata [Lee 97] [Gus 01]:

Definition: Metadata is machine-understandable information about web resources or other things.

The phrase "machine-understandable" is the key. It refers to information that software agents can use to make our lives easier, to ensure that our principles and the law are obeyed, to check that what we are doing can be trusted, and to make everything work more smoothly and rapidly. Metadata has well-defined semantics and structure. Metadata is called "metadata" because it started life, and is currently still chiefly, information about web resources, that is, data about data. It can, however, be information about anything: people, things, concepts, and ideas. Even so, the first step is to build a system for information about information.

The first axiom is: Metadata is data. That is to say, information about information is to be counted in all respects as information. This has several consequences. One is that metadata can be stored as data, that is, in a resource. So one resource may contain information about itself or about another resource. Metadata about one document can occur within the
document, or within a separate document, or it may be transferred accompanying the document. The second part of the axiom is: Metadata can describe metadata. That is, metadata itself may have attributes such as ownership and an expiry date, so there is meta-metadata; however, many levels are not distinguished. It is simply said that metadata is data, and it follows that metadata can have other data about itself. This gives the Web a certain consistency. [Lee 97] [Gus 01]
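
The two axioms can be illustrated with a small sketch. It assumes a toy in-memory store of (subject, property, value) statements; the document URI, the statement identifiers such as stmt:1, and the property names are all hypothetical. Because each statement is stored as ordinary data and receives its own identifier, another statement can describe it, which is the "meta-metadata" case described above.

```python
from itertools import count

# Hypothetical toy store: every statement is ordinary data with its own id.
_ids = count(1)
store = {}  # statement id -> (subject, property, value)


def say(subject, prop, value):
    """Record a statement and return its id, so the statement can itself be described."""
    sid = f"stmt:{next(_ids)}"
    store[sid] = (subject, prop, value)
    return sid


def about(subject):
    """Return the (property, value) pairs recorded about a subject, in insertion order."""
    return [(p, v) for (s, p, v) in store.values() if s == subject]


doc = "http://hardware.net/products.html"    # hypothetical document URI
s1 = say(doc, "creator", "Hardware company")  # metadata about the document
say(s1, "owner", "site administrator")        # metadata about that metadata
say(s1, "expires", "2006-12-31")

print(about(doc))  # [('creator', 'Hardware company')]
print(about(s1))   # [('owner', 'site administrator'), ('expires', '2006-12-31')]
```

No separate mechanism is needed for the second level: the same say function records both kinds of statement, which is the consistency the axiom describes.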

2.3.3 Web and Semantic Web Difference

The biggest difference is that the Web is aimed at human consumption through rendering software, whereas only machines will use the Semantic Web. The debate over whether the Semantic Web is the "new" Web takes place at the wrong level of abstraction: at the URI level they are the same thing, but at higher levels they differ substantially. [Lee 98] The definition of the Semantic Web in section 1.1 has two parts:
1. The Semantic Web and the Web are basically two names for the same thing. The Semantic Web exists in the Web and is part of the Web at the same time. This makes them inseparable at the URI level; consequently, that level is not useful for explaining the Semantic Web or relating it to the Web.
2. Basically, the name Semantic Web comes from the fact that it "represents" a set of semantically and formally interlinked data units, thereby creating a semantic web inside the Web. [Lee 98]

But this should also indicate that there are important conceptual differences between them. Roughly, there are two conceptual differences between the Semantic Web and the Web [Gus 01]:
1. The Semantic Web is an information space in which information is expressed in a special machine-targeted language, whereas the Web is an information space containing information targeted at human consumption, expressed in a wide range of natural languages.
2. The Semantic Web is a web of formally and semantically interlinked data, whereas the Web is a set of informally interlinked information.

Examining these conceptual differences also reveals a similarity: the use of links and their importance. On the Web it is easy to separate links from human-oriented textual information; a text document means basically the same thing to its author with or without links. But links can greatly increase the understanding and precision of the information for a viewer. Instead of only writing "Seagate Company has the product ST340014A hard disk drive", the semantics and precision would increase for a viewer if "Seagate Company" were a link to the company's homepage and "product" were a link to the product type. Links are used not only to navigate the space but also to share concepts. On the Semantic Web, data expressed in the machine-targeted language is much harder to separate from its semantic links, and there is a reason for this: the "meaning" of information expressed in the machine-targeted language is defined by making semantic links between different concepts. The semantics of the data therefore depends heavily on how its parts, or the descriptions of its parts, are semantically linked, i.e. how one concept relates to another concept. This is very similar to how humans
communicate. Human languages express meaning by referring to a set of shared concepts that ground the understanding of the communication, and this sharing and building of concepts has taken thousands of years to establish. Two applications developed separately from each other have no shared language, but may have shared concepts. By creating universal concepts identified by URIs, we can create the set of shared concepts that machines need if they are to comprehend the machine-targeted language. If a language is to become universal, the concepts it uses have to be universal, and by using URIs they are. Hence, semantic links on the Semantic Web serve to share concepts. [Gus 01]
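
The role of URIs as shared concepts can be sketched as follows. Two independently written applications use different local labels, and each label is tied to a URI; the applications can tell that they mean the same concept exactly when the URIs are equal. All labels and URIs here are hypothetical.

```python
# Hypothetical vocabularies of two separately developed applications:
# local label -> URI of the concept the label stands for.
APP_A_VOCAB = {
    "maker": "http://hardware.net/schema#producedBy",
    "kind": "http://hardware.net/schema#productType",
}
APP_B_VOCAB = {
    "manufacturer": "http://hardware.net/schema#producedBy",
    "category": "http://example.org/terms#category",
}


def shared_concepts(vocab_a, vocab_b):
    """Return (label_a, label_b, uri) for every concept both vocabularies identify by the same URI."""
    by_uri_b = {uri: label for label, uri in vocab_b.items()}
    return [(label_a, by_uri_b[uri], uri)
            for label_a, uri in vocab_a.items() if uri in by_uri_b]


print(shared_concepts(APP_A_VOCAB, APP_B_VOCAB))
# [('maker', 'manufacturer', 'http://hardware.net/schema#producedBy')]
```

The labels "maker" and "manufacturer" never have to match; agreement comes entirely from the shared URI.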

2.3.4 The Difference in Use

To see how applications will be able to communicate on the Semantic Web, it is useful first to look at how humans communicate on the Web. Figure (2.2) shows a very simplified communication process between two individuals. What the figure emphasizes is that, first of all, people need a set of shared concepts to be able to communicate. Different languages can then be used to encode the concepts being communicated. The receiver of the message has to decode it and "rebuild" its meaning using the shared concepts. To understand the meaning of information, in some sense, one needs to handle both the language and the concepts it encodes. Another important point is that the encoding declares only informally which concepts are used: a person who cannot handle a language cannot deduce which concepts a message encoded in that language uses. Reading a document is basically a heuristic process of guessing concepts.

Figure (2.2): Communication Process between Two Individuals

Humans who understand the language in which information is encoded, and the concepts it uses, can use the Web as a source of knowledge. The Semantic Web has the same role: it is a giant knowledge base expressed in a way that machines can use. Figure (2.3) shows how two applications running on computers could communicate on the Semantic Web.

Figure (2.3): Two Applications Running on Computers

The similarities with human communication are obvious. The key thing to consider is the difference between the two figures: applications that communicate on the Semantic Web explicitly declare the concepts used in the information encoding. The arrows from, e.g., the space of shared concepts
to the encoding of the information illustrate this. Also, the space of shared concepts is the Web itself. This means that an application that receives an information encoding can extract, without guessing, which concepts are being used, and since the concepts are identified by URIs, they are universal. This does not mean that the receiving application automatically understands (can process) those concepts, but it does know which concepts are being used. Then, if the application is "smart" enough, it can try to relate them to its own hand-coded concepts so that it can process the information.

As stated previously, the purpose of creating the Semantic Web is to help humans; there is no point in building it if it is of no use to them. Humans will never use the Semantic Web directly: they will use applications that use the Semantic Web, or they will use the Web to access information produced by computers that access the Semantic Web. Figure (2.4) shows how humans will use the Semantic Web indirectly: Web browsers are used to instruct machines to help us and to see the results, and machines will also create information that is published on Web pages.

Figure (2.4): How Humans will Use Semantic Web Indirectly

To make the Semantic Web happen, many things need to be considered and many problems solved. The easiest problem to solve is the precision problem present on the Web, which is explained elsewhere. But how can information be created in a way that makes it semantically processable by computers? This is the hardest problem, but it is not a new one: the Artificial Intelligence community, especially the Knowledge Representation community, has faced it for a long time. To avoid "reinventing the wheel", their accomplishments have to be considered and reused as much as possible in finding a suitable solution. Basically, what is needed is a language: a Semantic Web language.
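
The receiving side described above can be sketched as follows, assuming the information encoding is a list of triples whose predicates are URIs (all URIs and data here are hypothetical). The receiver lists the declared concepts without guessing, then separates those it has hand-coded support for from those it can only record.

```python
# Hypothetical received message: (subject, predicate, object) triples with URI predicates.
received = [
    ("http://hardware.net/Seagate", "http://hardware.net/schema#produces",
     "http://hardware.net/ST340014A"),
    ("http://hardware.net/ST340014A", "http://hardware.net/schema#productType",
     "hard disk drive"),
    ("http://hardware.net/ST340014A", "http://example.org/terms#reviewedBy",
     "http://example.org/people/alice"),
]

# Concepts this (hypothetical) application was programmed to process.
KNOWN_CONCEPTS = {
    "http://hardware.net/schema#produces",
    "http://hardware.net/schema#productType",
}


def split_by_understanding(triples, known):
    """Return (understood, unknown) concept URIs, each as a sorted list."""
    used = {predicate for _, predicate, _ in triples}  # concepts declared in the message
    return sorted(used & known), sorted(used - known)


understood, unknown = split_by_understanding(received, KNOWN_CONCEPTS)
print("can process:", understood)
print("declared but not understood:", unknown)
```

The application still cannot process reviewedBy, but, unlike a human reader guessing from prose, it knows exactly which concept it is missing.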

2.3.5 Semantic Web Architecture

The Semantic Web is a stack of languages, each of which adds a bit to the Semantic Web, Figure (2.5). Ultimately, all layers of this stack will be used to ensure the best possible level of security and information value; until then, the layers can be adopted one after another. A short explanation of each layer follows.

Figure (2.5): The Semantic Web Architecture

Layer 1. Unicode is a standard way of representing text on a computer. URI stands for Uniform Resource Identifier, the generic set of all names and addresses that refer to resources (a resource can be literally anything). A URI is a compact string of characters for identifying an abstract or physical resource, and URIs can be classified as locators, names, or both. The term URL (Uniform Resource Locator) refers to the subset of URIs that identify resources via a representation of their primary access mechanism (e.g., their network "location"), rather than by name or by some other attribute of the resource. The term URN (Uniform Resource Name) refers to the subset of URIs that are required to remain globally unique and persistent even when the resource ceases to exist or becomes unavailable. [RFC 2396]

Layer 2. XML is a common syntax for the exchange and processing of metadata. The XML syntax is a subset of the international text-processing
standard SGML, specifically intended for use on the Web. The XML syntax provides vendor independence, user extensibility, validation, human readability, and the ability to represent complex structures. [Mil 98] [Don et al 02] XML and XML Schema describe semi-structured data and give machine-accessible meaning to a piece of information by defining a schema. For example, a CV can be divided into main parts such as "name", "education", "experience" and "private". In this way the machine already knows the context of the information, which makes it easier to process the text entered in these parts. XML was designed to allow anyone to design their own document format and then write documents in that format. These formats can include markup that enhances the meaning of the document's content. This markup is "machine-readable": programs can read it and understand the corresponding structure. [Mil 01]

Layer 3. RDF, developed under the auspices of the W3C, is an infrastructure that enables the encoding, exchange, and reuse of structured metadata. It enables metadata interoperability through mechanisms that support common conventions of semantics, syntax, and structure. [Mil 98] Layer 2 lacks, among other things, the possibility of creating a domain-specific ontological vocabulary. The Resource Description Framework (RDF) and RDF Schema (RDFS) layer provides a metadata layer and a domain-specific library. They may be used to make simple assertions about web resources or any other entity that can be named. A simple assertion is a statement that an entity has a property with a particular value. RDF Schema extends RDF with class and property hierarchies that enable the creation of simple ontologies. [Obe et al 05]
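
The CV example can be made concrete with Python's standard xml.etree.ElementTree module. The element names below follow the hypothetical schema from the text (name, education, experience, private); the contents are invented placeholders. The point is that a program locates each part by its tag rather than by guessing from free text.

```python
import xml.etree.ElementTree as ET

# Hypothetical CV: the element names act as the schema that gives the text its context.
CV_XML = """
<cv>
  <name>Example Person</name>
  <education>
    <degree year="2006">PhD in Computer Science</degree>
  </education>
  <experience>
    <job>Lecturer</job>
  </experience>
  <private>
    <phone>000-0000</phone>
  </private>
</cv>
"""

root = ET.fromstring(CV_XML.strip())
name = root.findtext("name")                                     # text of the <name> element
degrees = [(d.get("year"), d.text) for d in root.iter("degree")]  # attribute plus element text
sections = [child.tag for child in root]                          # the main parts of the CV

print(name)      # Example Person
print(degrees)   # [('2006', 'PhD in Computer Science')]
print(sections)  # ['name', 'education', 'experience', 'private']
```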

RDF can be seen as the first layer that is part of the Semantic Web proper. According to the W3C recommendation, RDF "is a foundation for processing metadata; it provides interoperability between applications that exchange machine understandable information on the Web." RDF descriptions consist of three types of entities: resources, properties, and statements. Resources may be web pages, parts or collections of web pages, or any (real-world) objects that are not directly part of the WWW. In RDF, resources are always addressed by URIs. Properties are specific attributes, characteristics, or relations describing resources. A resource, together with a property having a value for that resource, forms an RDF statement. A value is a literal, a resource, or another statement. Statements can thus be considered object-attribute-value triples. [Stu et al 03] RDF Schema is a simple data-typing model for RDF with which groups of related resources, and the relationships among them, can be described. For example, it can state that "pupil" is a type of "student" and that "student" is a subclass of "people". [Wan 03]

Layer 4. The word "ontology" is borrowed from philosophy. Its original meaning is "the branch of metaphysics that deals with the nature of being" (The American Heritage Dictionary of the English Language, Fourth Edition, 2000). In AI, T. R. Gruber defined the term as a specification of a conceptualization [Gru 93], i.e. a description of concepts and relationships using a set of representational vocabulary. The aim of building ontologies is to share and reuse knowledge. The W3C describes ontologies as follows: "Ontologies figure prominently in the emerging Semantic Web as a way of representing the semantics of documents and enabling the semantics to be used by web applications and intelligent agents. Ontologies can prove very useful for a community as a way of structuring and defining the meaning of the metadata terms that are
currently being collected and standardized. Using ontologies, tomorrow's applications can be "intelligent," in the sense that they can more accurately work at the human conceptual level." The ontology vocabulary can be defined in DAML+OIL (or OWL), which extends RDFS with logical expressions, data typing, cardinality, and quantifiers. This enables wide interoperability.

Layer 5. The logic layer will provide an interoperable language for describing the sets of deductions one can make from a collection of data, i.e. how, given an ontology-based information base, one can derive new information from existing data. [Obe et al 05] This layer can include any system that can validate proofs. It does not assume one standard engine, which means that inference capabilities differ.

Layer 6. The resulting proofs can then be passed around and verified, providing shortcuts to new facts in the system without each node having to conduct the deductions itself. [Obe et al 05] Proof can be interpreted in two different ways. The first is proving that a person is who he says he is, i.e. deciding who or what to trust on the Semantic Web: anybody can claim to be the ruler of the universe and therefore to have access rights to everything, but without proving this somehow, they will not be believed. The second interpretation is proving that an information source provides correct information (part of this is explained under trust). [Wie 03]

Layer 7. Which person or which resource can you trust on the Web? It may be possible to prove that a person or resource actually is what they claim to be, yet the information they provide is not necessarily true. If a person writes that a whale is a fish, he is giving false information, since a whale is a mammal. If two people say different things about the same subject, for example a whale being a fish or a mammal,
whom should we trust, and why? A possible solution is a so-called "web of trust", in which you define whom you trust. Since each of those people is likely to trust others in turn, their trusted contacts can also serve as a basis of trust, hence the web of trust. [Wie 03]

Layer 8. Digital signatures are not a layer but a supporting part of the architecture. They ensure that the information presented in the adjacent layers can be verified as coming from the institution or person it claims to come from, and that it was not tampered with in transfer. Existing technology can be used for this. For example, one may say: "If an organization is a member of W3C according to a document signed with this key, then that organization is indeed a member." That is a trust statement which connects the key to the world of meaning of documents. [Wie 03] [Lee 02]
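
The web-of-trust idea amounts to following "who trusts whom" statements transitively. The sketch below does this with a breadth-first walk; the people, the trust edges, and the max_depth cut-off (which limits how far trust is propagated) are assumptions for illustration, not part of any standard.

```python
from collections import deque

# Hypothetical "who trusts whom" statements.
TRUSTS = {
    "me": ["alice"],
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": [],
    "dave": ["mallory"],
}


def trusted_sources(start, graph, max_depth=2):
    """Breadth-first walk of the trust graph; returns everyone within max_depth hops of start."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if depth[person] == max_depth:
            continue  # do not propagate trust beyond the chosen distance
        for other in graph.get(person, []):
            if other not in depth:
                depth[other] = depth[person] + 1
                queue.append(other)
    return {p for p, d in depth.items() if d > 0}


print(sorted(trusted_sources("me", TRUSTS)))               # ['alice', 'bob', 'carol']
print(sorted(trusted_sources("me", TRUSTS, max_depth=4)))  # ['alice', 'bob', 'carol', 'dave', 'mallory']
```

The depth limit is a design choice: the further trust is propagated, the more likely it is to reach sources, such as mallory here, that the user would not trust directly.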

2.3.6 Semantic Web Principles

Marja-Riitta Koivunen and Eric Miller suggested six principles for the Semantic Web [Mar and Mil 01]:

Principle 1: Everything can be identified by URIs. People, places, and things in the physical world can be referred to in the Semantic Web by using a variety of identifiers.

Principle 2: Resources and links can have types. The current Web consists of resources and links. The resources are Web documents targeted at human consumption; they do not commonly contain metadata explaining what they are used for or how they relate to other Web documents. The Semantic Web also consists of resources and links, but the resources and links can have types that define concepts which tell machines a bit more. For instance, some links may tell that a
resource is a version of another resource, that it was written by a person described by another resource, or that it contains software that depends on some other software.

Principle 3: Partial information is tolerated. The current Web is unbounded: it sacrificed link integrity for scalability. The Semantic Web is also unbounded: anyone can say anything about anything and create different types of links between resources.

Principle 4: There is no need for absolute truth. Not everything found on the Web is true, and the Semantic Web does not change that in any way. Truth, or more pragmatically trustworthiness, is evaluated by each application that processes the information on the Web. Applications decide what they trust by using the context of the statements, e.g. who said what and when, and what credentials they had to say it.

Principle 5: Evolution is supported. Similar concepts are often defined by different groups of people in different places, or even by the same group at different times, and it would often be beneficial to combine the data on the Web that uses these concepts. The Semantic Web uses descriptive conventions that can expand as human understanding expands. In addition, these conventions allow the independent work of diverse communities to be combined effectively even when they use different vocabularies.

Principle 6: Minimalist design. The Semantic Web makes simple things simple and complex things possible. The aim of the W3C activity is to standardize no more than is necessary. This approach enables the implementation of simple
applications now, based on already standardized technologies. When Semantic Web technologies are used together, the result should offer many more possibilities than the sum of the parts.
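
Principle 4 can be sketched as a filter that each application applies for itself. The statements below reuse the whale example from Layer 7 and carry their context (who said it and when); the source URIs and dates are hypothetical, and the trust policy is the application's own choice rather than anything the Semantic Web imposes.

```python
# Hypothetical statements, each carrying its context: who said it and when.
statements = [
    {"s": "whale", "p": "isA", "o": "mammal",
     "source": "http://example.org/zoology-dept", "date": "2005-01-10"},
    {"s": "whale", "p": "isA", "o": "fish",
     "source": "http://example.org/anonymous-page", "date": "2005-03-02"},
]


def believe(stmts, trusted_sources, not_before=None):
    """Keep the (s, p, o) part of statements whose source this application trusts,
    optionally ignoring statements made before an ISO date (YYYY-MM-DD)."""
    return [(x["s"], x["p"], x["o"]) for x in stmts
            if x["source"] in trusted_sources
            and (not_before is None or x["date"] >= not_before)]  # ISO dates compare as strings


print(believe(statements, {"http://example.org/zoology-dept"}))
# [('whale', 'isA', 'mammal')]
```

A different application that trusted both sources would receive both statements and would have to resolve the conflict itself, which is exactly the situation Principle 4 accepts.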

Chapter Three Knowledge Representation

3.1 Introduction

Knowledge Representation (KR) is an important subfield of AI. To solve the complex problems encountered in AI, one needs both a large amount of knowledge and some mechanism for manipulating that knowledge [Ric 83]. A variety of ways of representing knowledge have been exploited in AI. There are two different kinds of entities for representing knowledge:
1. Simple metadata, as discussed in section 2.3.1.
2. Representation metadata in a specific structure that can be manipulated by machine.
The goal of KR is to create schemes that allow information to be efficiently stored, modified, and reasoned with. [Ric 83] The objective of knowledge representation is to make knowledge explicit. Knowledge can be shared less ambiguously in its explicit form, and this has become especially important since machines started to be applied to facilitate knowledge management.

3.2 Knowledge Representation Techniques

Knowledge is always more than the sum of its parts. Knowledge Representation provides the tools needed to manage accumulations of knowledge, and the World Wide Web is becoming the biggest accumulation of knowledge humanity has ever faced. There are two questions about representing knowledge of the world:
1. How can objects (resources) be represented?
2. How can the representation of the objects (resources) be derived from the structure of the world?

These questions are referred to as the problems of knowledge representation, and many techniques have been developed to solve them:

1. Semantic network
The semantic network is one of the oldest KR formalisms. In a semantic net, many complex objects can be decomposed into simpler ones, and many complex classes of objects can be decomposed into smaller ones [Ric 83]. A complex concept can be described as a collection of attributes and associated values. To do this, they are stored in ordered triples, which are usually thought of as having the form:

(Object, Attribute, Value)

Information can be retrieved by specifying values for any two of the three fields of a triple. Such triples can be represented as a set of nodes connected to each other by a set of labeled arcs, which represent relationships among the nodes. [Fad 05] A graph is used to represent the structure of the concepts in an English sentence; it represents concept hierarchies and the inheritance of properties. Each concept is represented by a node in the graph, and concepts that are semantically related are connected by labeled arcs. In such a representation, meaning is implied by the way a concept is connected to other concepts. Many common and useful properties of objects are relationships of objects to other objects:
a. ISA relationship: indicates that one concept is a subclass of another.

Example: Computer ISA machine.

[computer] --ISA--> [machine]

Figure (3.1): ISA Semantic Graph

b. Instance-Of relationship: indicates that a concept is an example of another concept.
Example: Desktop Instance-Of computer.

[Desktop] --Instance-Of--> [computer]

Figure (3.2): Instance-Of Semantic Graph

c. ISPART relationship: indicates that a concept is part of another concept.
Example: Monitor ISPART Computer.

[Monitor] --ISPART--> [Computer]

Figure (3.3): ISPART Semantic Graph

These arcs have correlates in basic set theory: ISA and ISPART are like the subset relation, and Instance-Of is like the element-of relation. The power of the semantic net comes from the definition of links and associated inference rules that define a specific inference, such as inheritance. [Lug 02]
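
The triple form, retrieval by any two fields, and inheritance along ISA and Instance-Of links can be combined in one small sketch. The first three triples mirror Figures (3.1) to (3.3); the usedFor triple is a hypothetical addition used only to show a property being inherited. Treating Instance-Of and ISA alike when climbing the hierarchy is a simplifying assumption.

```python
# A minimal semantic network stored as (object, attribute, value) triples.
TRIPLES = {
    ("computer", "ISA", "machine"),
    ("desktop", "Instance-Of", "computer"),
    ("monitor", "ISPART", "computer"),
    ("machine", "usedFor", "work"),  # hypothetical property, to show inheritance
}


def query(obj=None, attr=None, value=None):
    """Return the triples matching the given fields; None acts as a wildcard."""
    return sorted(t for t in TRIPLES
                  if (obj is None or t[0] == obj)
                  and (attr is None or t[1] == attr)
                  and (value is None or t[2] == value))


def ancestors(node):
    """Follow Instance-Of and ISA links upward, transitively."""
    result, frontier = [], [node]
    while frontier:
        current = frontier.pop()
        for _, _, parent in query(current, "Instance-Of") + query(current, "ISA"):
            if parent not in result:
                result.append(parent)
                frontier.append(parent)
    return result


def inherited_value(node, attr):
    """Look up attr on the node itself, then on its ancestors (property inheritance)."""
    for n in [node] + ancestors(node):
        hits = query(n, attr)
        if hits:
            return hits[0][2]
    return None


print(query(attr="ISPART", value="computer"))  # [('monitor', 'ISPART', 'computer')]
print(ancestors("desktop"))                     # ['computer', 'machine']
print(inherited_value("desktop", "usedFor"))    # work
```

ISPART is deliberately not followed by ancestors: a monitor is part of a computer, not a kind of computer, so it does not inherit the computer's properties.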

2. Frame system
Frames, like semantic nets, are general-purpose structures in which particular sets of domain-specific knowledge can be embedded. The details of the operation of a frame-based system vary with the specific kinds of knowledge. [Ric 83] In the terminology of such systems, a frame is a named data object that has a set of slots, where each slot represents a property or attribute of the object. Slots can have one or more values (called fillers), some of which may be pointers to other frames. Since each frame has a set of slots that represent its properties, frame systems are usually considered more structured than semantic networks; however, it has been shown that frame systems are isomorphic to semantic networks. [Hef 01] To analyze a new experience, people evoke appropriate stored structures and then fill them in with the details of the current event. A general mechanism designed for the computer representation of such common knowledge is the frame. The word frame has been applied to a variety of slot-and-filler representation structures. Frames are useful to the extent that they make it easy to infer as-yet-unobserved facts about new situations. They facilitate this in a variety of ways [Ric 83]:
1. Frames contain information about many aspects of the objects or situations they describe. This information can be used as though it had been explicitly observed.
2. Frames contain attributes that must be true of objects that will be used to fill individual slots.
3. Frames describe typical instances of the concepts they represent.
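
A minimal frame sketch in Python, assuming a frame is a name plus a dictionary of slots, with an optional pointer to a more general frame from which typical (default) values are inherited. The frame names follow the hardware domain used in Chapter Four; the category slot and its value are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Frame:
    """A named object with slots; a slot filler may be a plain value or another Frame."""
    name: str
    slots: dict = field(default_factory=dict)
    parent: Optional["Frame"] = None  # more general frame supplying typical values

    def get(self, slot):
        # Use the local filler if present; otherwise fall back to the typical
        # value stored in a more general frame, as if it had been observed.
        frame = self
        while frame is not None:
            if slot in frame.slots:
                return frame.slots[slot]
            frame = frame.parent
        return None


hard_disk = Frame("hard disk drive", {"category": "storage"})  # generic frame (hypothetical slot)
seagate = Frame("Seagate", {"type": "company"})
st340014a = Frame("ST340014A", {"producer": seagate}, parent=hard_disk)

print(st340014a.get("producer").name)  # Seagate  (filler that points to another frame)
print(st340014a.get("category"))       # storage  (typical value inherited from the generic frame)
print(st340014a.get("capacity"))       # None     (neither observed nor defaulted)
```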

3. Conceptual Graphs
Conceptual graphs are formally defined as an abstract syntax that is independent of any notation, but the formalism can be represented in several different concrete notations. A conceptual graph is a finite, connected, bipartite graph. The nodes of the graph are either concepts or conceptual relations. Conceptual graphs do not use labeled arcs; instead, the conceptual relation nodes represent relations between concepts. [Lug 02] Consider, for example, the conceptual graph for the English sentence "A person is between a rock and a hard place". The between relation (Betw) is a triadic relation whose first two arcs are linked to concepts of the entities that occur on either side of the entity represented by the concept linked to the third arc, Figure (3.4).
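
The between example can be encoded directly as a bipartite structure. This sketch keeps concept nodes and relation nodes apart and stores each relation's arcs in order, so that arcs 1 and 2 of Betw point to the entities on either side and arc 3 to the entity between them. The node ids are hypothetical, and modelling "hard place" as a Place concept with an Attr relation to Hard is an assumption about the missing figure.

```python
# Concept nodes (hypothetical ids) for "A person is between a rock and a hard place".
concepts = {"c1": "Person", "c2": "Rock", "c3": "Place", "c4": "Hard"}

# Relation nodes: relation name plus an ordered list of arcs to concept nodes.
relations = {
    "r1": ("Betw", ["c2", "c3", "c1"]),  # arcs 1, 2: either side; arc 3: the entity between
    "r2": ("Attr", ["c3", "c4"]),        # assumed modelling: the place has attribute Hard
}


def is_bipartite_cg(concepts, relations):
    """True if every arc of every relation node ends at a concept node, never at a relation."""
    return all(arg in concepts for _, args in relations.values() for arg in args)


def between_facts(concepts, relations):
    """Read each Betw node back as (middle, side1, side2)."""
    return [(concepts[args[2]], concepts[args[0]], concepts[args[1]])
            for name, args in relations.values() if name == "Betw"]


print(is_bipartite_cg(concepts, relations))  # True
print(between_facts(concepts, relations))    # [('Person', 'Rock', 'Place')]
```

Because arcs are unlabeled, the meaning of each argument comes only from its position in the relation node's arc list.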

Figure (3.4): The Representation of the Sentence

In linear form, Figure (3.4) may be represented in the following form: [Person]100 USING comp for

Syntax: USING
The USING section is used to declare all the namespaces that will be used for RDF properties. Declarations are separated by commas and use the prefix notation. Example: USING comp for http://hardware.net
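
A sketch of what a USING declaration achieves: mapping a short prefix to a namespace URI so that prefixed names can be expanded to full URIs before matching. The parsing rules (a prefix, the keyword for, an optional angle-bracketed URI, commas between declarations) and the trailing # on the namespace are assumptions for illustration; without a separator, comp:production would expand to the awkward http://hardware.netproduction.

```python
import re


def parse_using(clause):
    """Map each declared prefix to its namespace, e.g.
    'USING comp for http://hardware.net#' -> {'comp': 'http://hardware.net#'}."""
    body = re.sub(r"^\s*USING\s+", "", clause, flags=re.IGNORECASE)
    pairs = re.findall(r"(\w+)\s+for\s+<?([^\s,>]+)>?", body, flags=re.IGNORECASE)
    return dict(pairs)


def expand(name, namespaces):
    """Expand 'prefix:local' to the full URI; leave names with undeclared prefixes unchanged."""
    prefix, sep, local = name.partition(":")
    if sep and prefix in namespaces:
        return namespaces[prefix] + local
    return name


ns = parse_using("USING comp for http://hardware.net#")
print(ns)                                  # {'comp': 'http://hardware.net#'}
print(expand("comp:production", ns))       # http://hardware.net#production
print(expand("http://hardware.net", ns))   # http://hardware.net ('http' is not a declared prefix)
```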

3.8 User Interface

Current Semantic Web activities focus mainly on making the Web machine-processable, and end-user needs are often neglected. [Ric et al 03] Yet the design of the user interface is very important, because users do not receive training in the use of these applications.

To bring the advantages of the Semantic Web to the user interface, one clearly needs an interface that allows users to explore, browse, and query the content they are interested in. [Ric et al 03] It is not enough to present RDF data to the user; we also need to give him intuitive tools with which to interact with such data, allowing him to manipulate resources with direct manipulation techniques such as querying, site maps, and drag and drop. [Den et al 05]

Chapter Four Creating Knowledge Representation for Semantic Web

4.1 Introduction

From the definition of the Semantic Web in section 1.1, the Semantic Web has two main features:
1. The Semantic Web is not a separate Web but an extension of the current one; the two are names for the same thing. The Semantic Web exists in the Web and is part of the Web at the same time, which makes them inseparable at the URI level.
2. The name Semantic Web comes from the fact that it represents a set of semantically and formally interlinked data units, thereby creating a Semantic Web inside the Web.
The objective of this research is therefore to apply the observed potential of knowledge representation in the Web context. More concretely, the idea is to apply Semantic Web technologies to the design and implementation of a Web site for a collection of computer hardware companies and their products, based on T. B. Lee's definition of the Semantic Web.

4.2 Creating Semantic Web

Building a Semantic Web application involves several steps, Figure (4.1):

Creating Semantic Web contents -> Validating the semantic contents -> Using the semantic contents

Figure (4.1): Creating Semantic Web Steps

4.3 Creating Semantic Web Contents

This step gathers knowledge from existing Web resources and marks up the content with semantic tags using Semantic Web languages. The RDF language is used to represent information about resources in the World Wide Web; it is particularly intended for representing metadata about web resources. RDF, as described in section 3.5.3, is based on the idea of identifying things using Web identifiers called URIs and describing resources in terms of simple properties and property values. This enables RDF to represent simple statements about resources as a graph of nodes and arcs. RDF URIs can refer to any identifiable thing, including things that may not be directly retrievable on the Web. In addition, RDF properties themselves have URIs, which precisely identify the relationships that exist between the linked items. Completing this step involves several stages, Figure (4.2).

Document -> RDF Metadata model -> RDF/XML document

Figure (4.2): Create Semantic Web Contents
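
The last stage in Figure (4.2), producing an RDF/XML document, can be sketched with the standard library. The rdf namespace is the standard RDF syntax namespace; the comp vocabulary namespace http://hardware.net/schema# and the property name production are hypothetical. Statements about the same subject are grouped into one rdf:Description element.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"  # standard RDF namespace
COMP_NS = "http://hardware.net/schema#"                 # hypothetical vocabulary namespace

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("comp", COMP_NS)


def to_rdf_xml(triples):
    """Serialize (subject URI, property local name, literal) triples as RDF/XML,
    grouping statements about the same subject into one rdf:Description."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    descriptions = {}
    for subject, prop, literal in triples:
        if subject not in descriptions:
            descriptions[subject] = ET.SubElement(
                root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": subject})
        ET.SubElement(descriptions[subject], f"{{{COMP_NS}}}{prop}").text = literal
    return ET.tostring(root, encoding="unicode")


xml_text = to_rdf_xml([
    ("http://hardware.net", "production", "motherboard"),
    ("http://hardware.net", "production", "cpu"),
])
print(xml_text)
```

The output contains a single rdf:Description with rdf:about="http://hardware.net" and two comp:production elements; a full RDF toolkit would also handle resource-valued objects, which this sketch omits.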

4.3.1 The Document

The document is knowledge in a specific domain, gathered from different web pages, Figure (4.3).

Web page -> Semantic Network

Figure (4.3): Metadata and Relationship Gathering from Web Page

This document is metadata about resources in a specific domain. Semantic metadata is data about data that is machine-processable. The Semantic Web is based on relations between terms; each term represents a concept, and the semantic relations between terms capture their semantics. In this dissertation, the document selected from a specific domain is about computer hardware companies that produce computer items with specific features. The document is shown in Figure (4.4).

"Computer hardware companies produce motherboards, cpus, hard disk drives, floppy disk drives, video cards, monitors, ram, keyboards, cases, and sound cards.
The motherboard is produced by the Asus, Gigabyte, and Abit companies. Asus Company produces the P5P800SE and P5lD2 motherboards, and each one has specific features. Gigabyte Company produces the GA-8I645VM-RZ and GA-7VM400MRZ motherboards, and each one has specific features. Abit Company produces the A17 and IC7-G motherboards, and each one has specific features.
The cpu is produced by the Celeron and Intel companies. Celeron Company produces the cpu-cel4-1.7 and cpu-cel4-2.0 cpus, and each one has specific features. Intel Company produces the cpu-p4-1 and cpu-p4-3.06 cpus, and each one has specific features.
The hard disk drive is produced by the Western Digital, Seagate, and IBM companies. Western Digital Company produces the WD2500BB_Caviar, WD4000KD, and WD400BB_Caviar hard disk drives, and each one has specific features. Seagate Company has produced the ST340014A hard disk drive, and it has specific features. IBM Company has produced the IC35L080AVVA07 and IC35L120AVVA07 hard disk drives, and each one has specific features.
The floppy disk drive is produced by the Panasonic, Sony, and Mitsumi companies. Panasonic Company has produced the JU-257A-907P and JU-257A827P floppy disk drives, and each one has specific features. Sony Company has produced the MPF920C floppy disk drive, and it has specific features. Mitsumi Company has produced the D359M3 floppy disk drive, and it has specific features.
The sound card is produced by the Creative and Aopen companies. Creative Company has produced the 70SB022200000 sound card, and it has specific features. Aopen Company has produced the AW840 and AW850 sound cards, and each one has specific features.
The computer case is produced by the Foxconn, Translucent, and Chieftech companies. Foxconn Company has produced the ATX3400-P4 computer case, and it has specific features. Translucent Company has produced the case-24-100 computer case, and it has specific features. Chieftech Company has produced the CASE-40-300S computer case, and it has specific features.
The video card is produced by the ATI, Apollo, and Creative companies. ATI Company has produced the av400 video card, and it has specific features. Apollo Company has produced the GeForceFX5200 and GeForceFX5300 video cards, and each one has specific features. Creative Company has produced the 3dlabs wildcat vp990 pro video card, and it has specific features.
The monitor is produced by ViewSonic Company. ViewSonic Company has produced the VG510B and VG510B monitors, and each one has specific features.
The computer mouse is produced by IBM Company. IBM Company has produced the IBM Mouse, and it has specific features.
The ram is produced by Spactic Company. Spactic Company has produced the ram003 ram, and it has specific features."

Figure (4.4): Computer Hardware Companies Document
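
To show what the document gains once it is in triple form, the sketch below encodes a few facts from Figure (4.4) and answers a simple question by joining triples. The predicate names produces and productType are hypothetical, and only part of the document is encoded.

```python
# A few facts from Figure (4.4) as (subject, predicate, object) triples.
FACTS = [
    ("Asus", "produces", "P5P800SE"),
    ("Asus", "produces", "P5lD2"),
    ("Gigabyte", "produces", "GA-8I645VM-RZ"),
    ("Gigabyte", "produces", "GA-7VM400MRZ"),
    ("Seagate", "produces", "ST340014A"),
]
PRODUCT_TYPE = {
    "P5P800SE": "motherboard", "P5lD2": "motherboard",
    "GA-8I645VM-RZ": "motherboard", "GA-7VM400MRZ": "motherboard",
    "ST340014A": "hard disk drive",
}

triples = FACTS + [(product, "productType", kind) for product, kind in PRODUCT_TYPE.items()]


def producers_of(kind):
    """Answer 'which companies produce a <kind>?' by joining produces and productType triples."""
    return sorted({s for s, p, o in triples
                   if p == "produces" and (o, "productType", kind) in triples})


print(producers_of("motherboard"))      # ['Asus', 'Gigabyte']
print(producers_of("hard disk drive"))  # ['Seagate']
```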

4.3.2 Suggested RDF Semantic Metadata Model

The RDF data model provides an abstract, conceptual framework for defining and using metadata. RDF provides a mechanism for recording statements about Web resources (e.g., a Web page) so that machines can easily interpret them. In other words, RDF gives a way to make statements that are machine-processable. A computer cannot actually "understand" such a statement, but it can process it in a way that makes it seem as if it does. Within the RDF specification (see Section 3.5.3), every RDF statement has three terms of information:
1. Subject.
2. Property type.
3. Object, or property value.
This allows both humans and machines to consume the same data. A basic rule of English grammar (see Section 3.5.3) is that a complete sentence (or statement) contains both a subject and a predicate: the subject is the who or what of the sentence, and the predicate provides information about the subject. A sentence about the production of hardware companies is:

Hardware company has a production of "motherboard".

This is a complete statement about the hardware company. The subject is "hardware company", and the predicate is "production", with a matching value of "motherboard"; combined, these three separate pieces of information form a unique piece of knowledge. In RDF, this English statement translates to an RDF triple. The subject is a resource identified by a URI reference, and the predicate is a property type of the resource, such as an attribute, a relationship, or a characteristic. In addition to the subject and predicate, the specification introduces a third component, the object, which is the value of the property type for the specific subject. Returning to the sentence given earlier, the product of the hardware company is "motherboard". When the generic reference to the hardware company is replaced by the company's URI, a new and more precise sentence is formed:

Hardware company at http://hardware.net has a production of "motherboard".

With this change, there is no confusion about which hardware company produces "motherboard": it is the one with the URI http://hardware.net. The individual components of the statement can be further highlighted by breaking each of the three components out into the following format:

subject HAS predicate object

This is a representation of a statement in which the three components can be replaced by instances of the components to generate a specific statement. The example statement is converted to this format as follows:

http://hardware.net has a production of "motherboard"

In RDF, this statement, redefined as an RDF triple, can be considered a complete RDF graph, because it records one complete fact using the RDF methodology and can then be documented using shorthand techniques. A triple is written as:

{subject, predicate, object}

so the English statement above becomes:

{http://hardware.net, production, "motherboard"}

The document in Figure (4.4) contains many resources. To make it more precise, URIrefs are assigned to these resources, and the document becomes the one shown in Figure (4.5):

http://hardware.net has a production of "motherboard".
http://hardware.net has a production of "cpu".
http://hardware.net has a production of "harddisk".
http://hardware.net has a production of "ram".
http://hardware.net has a production of "videocard".
http://hardware.net has a production of "keyboard".
http://hardware.net has a production of "mouse".
http://hardware.net has a production of "case".
http://hardware.net has a production of "monitor".
http://hardware.net has a production of "floppydisk".
http://hardware.net has a production of "soundcard".

http://hwcomp.net/motherboard is produced by Asus Company. http://hwcomp.net/motherboard is produced by Gigabyte Company. http://hwcomp.net/motherboard is produced by Abit Company. http://hwcomp.net/asus has produced http://hwcomp.net/P5P800SE and http://hwcomp.net/P5LD2 motherboards and each one has specific features. http://hwcomp.net/gigabyte has produced http://hwcomp.net/GA-8I645VMRZ and http://hwcomp.net/GA-7VM400M-RZ motherboards and each one has specific features. http://hwcomp.net/abit has produced http://hwcomp.net/A17 and http://hwcomp.net/IC7-G motherboards and each one has specific features.

http://hwcomp.net/cpu is produced by Celeron Company. http://hwcomp.net/cpu is produced by Intel Company. http://hwcomp.net/celeron has produced http://hwcomp.net/cpu-cel4-1.7 and http://hwcomp.net/cpu-cel4-2.0 CPUs and each one has specific features. http://hwcomp.net/intel has produced http://hwcomp.net/cpu-p4-1 and http://hwcomp.net/cpu-p4-3.06 CPUs and each one has specific features.

http://hwcomp.net/harddisk is produced by Western Digital Company. http://hwcomp.net/harddisk is produced by Seagate Company. http://hwcomp.net/harddisk is produced by IBM Company. http://hwcomp.net/western_digital has produced http://hwcomp.net/WD2500BB_Caviar, http://hwcomp.net/WD4000KD and http://hwcomp.net/WD400BB_CAVIAR hard disk drives and each one has specific features. http://hwcomp.net/Seagate has produced the http://hwcomp.net/ST340014A hard disk drive and it has specific features. http://hwcomp.net/ibm has produced http://hwcomp.net/IC35L080AVVA07 and http://hwcomp.net/IC35L120AVVA07 hard disk drives and each one has specific features.

http://hwcomp.net/fdd is produced by Panasonic Company. http://hwcomp.net/fdd is produced by Sony Company. http://hwcomp.net/fdd is produced by Mitsumi Company. http://hwcomp.net/panasonic has produced http://hwcomp.net/JU-257A907P and http://hwcomp.net/JU-257A-827P floppy disk drives and each one has specific features. http://hwcomp.net/sony has produced the http://hwcomp.net/MPF920C floppy disk drive and it has specific features. http://hwcomp.net/mitsumi has produced the http://hwcomp.net/D359M3 floppy disk drive and it has specific features.

http://hwcomp.net/soundcard is produced by Creative Company. http://hwcomp.net/soundcard is produced by Aopen Company. http://hwcomp.net/Creative has produced the http://hwcomp.net/70SB022200000 sound card and it has specific features. http://hwcomp.net/Aopen has produced http://hwcomp.net/AW840 and http://hwcomp.net/AW850 sound cards and each one has specific features.

http://hwcomp.net/case is produced by Foxconn Company. http://hwcomp.net/case is produced by Translucent Company. http://hwcomp.net/case is produced by Chieftech Company. http://hwcomp.net/Foxconn has produced the http://hwcomp.net/ATX3400-P4 case and it has specific features. http://hwcomp.net/Translucent has produced the http://hwcomp.net/case-24100 case and it has specific features. http://hwcomp.net/Chieftech has produced the http://hwcomp.net/CASE-40300S case and it has specific features.

http://hwcomp.net/videocard is produced by Ati Company. http://hwcomp.net/videocard is produced by Apollo Company. http://hwcomp.net/videocard is produced by Creative Company. http://hwcomp.net/ati has produced the http://hwcomp.net/av400 video card and it has specific features. http://hwcomp.net/apollo has produced http://hwcomp.net/GeForceFX5200 and http://hwcomp.net/GeForceFX5300 video cards and each one has specific features. http://hwcomp.net/Creative has produced the http://hwcomp.net/3dlabswildcat-vp990-pro video card and it has specific features.

http://hwcomp.net/monitor is produced by ViewSonic Company. http://hwcomp.net/ViewSonic has produced http://hwcomp.net/VG510B and http://hwcomp.net/VG710B monitors and each one has specific features.

http://hwcomp.net/mouse is produced by IBM Company. http://hwcomp.net/IBM has produced the http://hwcomp.net/IBM_Mouse mouse and it has specific features.

http://hwcomp.net/ram is produced by Spactic Company. http://hwcomp.net/Spactic has produced the http://hwcomp.net/ram003 RAM and it has specific features.

Figure (4.5): The Document with URIrefs

4.3.2.1 The RDF Semantic Graph Model

Section 3.2.1.2 introduced RDF's basic statement concepts, the idea of using URI references to identify the things referred to in RDF statements, and RDF/XML as a machine-processable way to represent RDF statements. RDF is based on the idea of expressing simple statements about resources, where each statement consists of a subject, a predicate, and an object.

Example (1) Simple Statement

The simple English statement:

http://hardware.net has a production of "motherboard".

could be represented by an RDF statement having:
1. a subject http://hardware.net
2. a predicate http://hwcomp.net/production/1.1/motherboard
3. an object http://hardware.net/motherboard

Note how URIrefs are used to identify not only the subject of the original statement, but also the predicate and object, instead of the words "production" and "motherboard" respectively. RDF models statements (see Section 3.5.3) as nodes and arcs in a graph. A statement is represented by:
1. a node for the subject,
2. a node for the object,
3. an arc for the predicate, directed from the subject node to the object node.
Therefore, the RDF statement above is represented by the graph shown in Figure (4.6).

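The statement model of Example (1) can be sketched in a few lines of Python. A minimal sketch, assuming only the three URIrefs listed above: a triple holds the subject, predicate and object, and its graph view is a subject node, an object node and a directed arc labelled with the predicate. The names `Triple`, `as_graph_parts` and `braces` are hypothetical illustrations, not part of any RDF library or of the thesis implementation.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement in {subject, predicate, object} form (hypothetical helper)."""
    subject: str
    predicate: str
    obj: str


def as_graph_parts(t: Triple) -> tuple[set[str], tuple[str, str, str]]:
    """Return the statement's two nodes and its directed arc (from, label, to)."""
    nodes = {t.subject, t.obj}             # one node for the subject, one for the object
    arc = (t.subject, t.predicate, t.obj)  # the arc runs from subject to object
    return nodes, arc


def braces(t: Triple) -> str:
    """Render the triple in the {subject, predicate, object} shorthand of the text."""
    return "{" + ", ".join(t) + "}"


# Example (1): the simple statement about http://hardware.net
example1 = Triple(
    subject="http://hardware.net",
    predicate="http://hwcomp.net/production/1.1/motherboard",
    obj="http://hardware.net/motherboard",
)

nodes, arc = as_graph_parts(example1)
print(braces(example1))
print(sorted(nodes))
print(arc)
```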

http://hardware.net --[http://hwcomp.net/production/1.1/motherboard]--> http://hardware.net/motherboard
Figure (4.6): Simple RDF Statement Representation by the Graph

In drawing RDF graphs, nodes that are URIrefs are shown as ellipses, while nodes that are literals are shown as boxes. Groups of statements are represented by corresponding groups of nodes and arcs.

Example (2) Compound Statement

Hardware companies produce many components, such as the CPU, hard disk, monitor and case. The new statement becomes:

http://hardware.net has a production of "motherboard".
http://hardware.net has a production of "cpu".
http://hardware.net has a production of "harddisk".
http://hardware.net has a production of "ram".
http://hardware.net has a production of "videocard".
http://hardware.net has a production of "keyboard".
http://hardware.net has a production of "mouse".
http://hardware.net has a production of "case".
http://hardware.net has a production of "monitor".
http://hardware.net has a production of "floppydisk".
http://hardware.net has a production of "soundcard".

The compound statement in Example (2) is written in RDF triple notation in Table (4.1), and its RDF graph is shown in Figure (4.7). Each triple corresponds to a single node-arc-node path in the graph: the arc begins at the subject node and ends at the object node of the statement. Unlike the drawn graph, the triple notation requires a node to be identified separately in every statement it appears in. For example, http://hardware.net appears eleven times in the triple representation of the graph, but only once in the drawn graph. Nevertheless, the triples represent exactly the same information as the drawn graph, and this is a key point.

Table (4.1): Triples of the Data Model for Group of Statements

| No. | Subject | Predicate | Object |
|---|---|---|---|
| 1 | http://hardware.net | http://hwcomp.net/production/1.1/motherboard | http://hwcomp.net/motherboard |
| 2 | http://hardware.net | http://hwcomp.net/production/1.1/cpu | http://hwcomp.net/cpu |
| 3 | http://hardware.net | http://hwcomp.net/production/1.1/harddisk | http://hwcomp.net/harddisk |
| 4 | http://hardware.net | http://hwcomp.net/production/1.1/ram | http://hwcomp.net/ram |
| 5 | http://hardware.net | http://hwcomp.net/production/1.1/videocard | http://hwcomp.net/videocard |
| 6 | http://hardware.net | http://hwcomp.net/production/1.1/keyboard | http://hwcomp.net/keyboard |
| 7 | http://hardware.net | http://hwcomp.net/production/1.1/mouse | http://hwcomp.net/mouse |
| 8 | http://hardware.net | http://hwcomp.net/production/1.1/case | http://hwcomp.net/case |
| 9 | http://hardware.net | http://hwcomp.net/production/1.1/monitor | http://hwcomp.net/monitor |
| 10 | http://hardware.net | http://hwcomp.net/production/1.1/fdd | http://hwcomp.net/fdd |
| 11 | http://hardware.net | http://hwcomp.net/production/1.1/soundcard | http://hwcomp.net/soundcard |

Figure (4.7): RDF Graph for Group of Statements
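The claim that the triples and the drawn graph carry the same information can be checked mechanically. This sketch rebuilds the eleven triples of Table (4.1) from the component names and collapses them into an adjacency map, in which http://hardware.net becomes a single node with eleven outgoing arcs. The helper `to_graph` is a hypothetical illustration; the namespaces are those used in Table (4.1).

```python
from collections import Counter, defaultdict

SUBJECT = "http://hardware.net"
PRED_NS = "http://hwcomp.net/production/1.1/"  # predicate namespace in Table (4.1)
OBJ_NS = "http://hwcomp.net/"                   # object namespace in Table (4.1)
COMPONENTS = ["motherboard", "cpu", "harddisk", "ram", "videocard", "keyboard",
              "mouse", "case", "monitor", "fdd", "soundcard"]

# Triple notation: the subject is written out again in every statement.
triples = [(SUBJECT, PRED_NS + c, OBJ_NS + c) for c in COMPONENTS]


def to_graph(ts):
    """Hypothetical helper: collapse triples into a graph where each distinct
    node appears once, with its outgoing arcs as (predicate, object) pairs."""
    graph = defaultdict(list)
    for s, p, o in ts:
        graph[s].append((p, o))
        graph.setdefault(o, [])  # object nodes exist even with no outgoing arcs
    return dict(graph)


graph = to_graph(triples)
mentions = Counter(s for s, _, _ in triples)

print(mentions[SUBJECT])    # 11: times the subject is written in the triples
print(len(graph[SUBJECT]))  # 11: arcs leaving the single subject node
print(len(graph))           # 12: distinct nodes in the drawn graph
```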

The full triple notation requires URI references (see Section 3.5.3) to be written out completely, which, as in the example above, can produce very long lines on a page. For this reason, the shorthand of an XML qualified name (QName) can be used instead. A QName consists of a prefix that has been assigned to a namespace URI, followed by a colon, and then a local name. Several well-known QName prefixes are defined as follows:

prefix rdf:, namespace URI: http://www.w3.org/1999/02/22-rdf-syntax-ns#
prefix rdfs:, namespace URI: http://www.w3.org/2000/01/rdf-schema#
prefix dc:, namespace URI: http://purl.org/dc/elements/1.1/
prefix owl:, namespace URI: http://www.w3.org/2002/07/owl#
prefix xsd:, namespace URI: http://www.w3.org/2001/XMLSchema#

In the previous example, the prefixes are:

prefix hw:, namespace URI: http://hardware.net
prefix base:, namespace URI: http://hwcomp.net/
prefix comp:, namespace URI: http://hwcomp.net/production/1.1/

The triples in Table (4.1) then become those shown in Table (4.2).

Table (4.2): Triples of the Data Model for Group of Statements Using Prefix Notation

| No. | Subject | Predicate | Object |
|---|---|---|---|
| 1 | hw: | comp:motherboard | base:motherboard |
| 2 | hw: | comp:cpu | base:cpu |
| 3 | hw: | comp:harddisk | base:harddisk |
| 4 | hw: | comp:ram | base:ram |
| 5 | hw: | comp:videocard | base:videocard |
| 6 | hw: | comp:keyboard | base:keyboard |
| 7 | hw: | comp:mouse | base:mouse |
| 8 | hw: | comp:case | base:case |
| 9 | hw: | comp:monitor | base:monitor |
| 10 | hw: | comp:fdd | base:fdd |
| 11 | hw: | comp:soundcard | base:soundcard |

4.3.2.2 Structured Statement

The example above is a simple RDF statement. However, most real-world data involves structures that are more complicated than that. Structured information is represented in RDF (see Section 3.5.3) by considering the aggregate thing to be described as a resource and then making statements about that new resource.

Example (3) Complex Statement

For example, consider the complex statement: "The motherboard is produced by Asus Company. Asus Company has the web address http://asus.com and has produced the P5P800SE and P5LD2 motherboards, and each one has specific features."

Made more precise by using URIrefs, this paragraph becomes: "http://hwcomp.net/motherboard is produced by Asus Company. http://hwcomp.net/asus has the web address http://asus.com, and it has produced http://hwcomp.net/P5P800SE and http://hwcomp.net/P5LD2 motherboards, and each one has specific features." RDF statements can then be written with these nodes as subjects to represent the additional information, producing the graph shown in Figure (4.8).

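[Figure (4.8) graph: base:motherboard --prod:asus--> base:Asus; base:Asus --company:webaddress--> http://asus.com; base:Asus --rdf:_1--> base:P5P800SE; base:Asus --rdf:_2--> base:P5LD2; prodtyp: arcs lead from each motherboard to its features]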

Figure (4.8): RDF Graph for Complex Statement

The following namespace prefixes are used:

prefix prod:, namespace URI: http://hwcomp.net/productby/1.2/
prefix company:, namespace URI: http://hwcomp.net/companies/1.3/
prefix prodtyp:, namespace URI: http://hwcomp.net/productype/1.4/
prefix base:, namespace URI: http://hwcomp.net/

The triples of this example are listed in Table (4.3).

Table (4.3): Triples of the Data Model for Structured Statement

| No. | Subject | Predicate | Object |
|---|---|---|---|
| 1 | base:motherboard | prod:asus | base:ASUS |
| 2 | base:ASUS | company:webaddress | "http://asus.com" |
| 3 | base:P5P800SE | prodtyp:bioschipset | "Intel 82801DB ICH4" |
| 4 | base:P5P800SE | prodtyp:chipset | "Intel 865PE" |
| 5 | base:P5P800SE | prodtyp:processor | "Intel Pentium 4" |
| 6 | base:P5P800SE | prodtyp:processor | "Intel Dual_Core" |
| 7 | base:P5P800SE | prodtyp:Lan | "Intel Gigabit" |
| 8 | base:P5P800SE | prodtyp:memorytype | "Dual_Channel DDR400" |
| 9 | base:P5P800SE | prodtyp:maxmemory | "UPTO 2GB" |
| 10 | base:ASUS | rdf:_1 | base:P5P800SE |
| 11 | base:P5LD2 | prodtyp:chipset | "Intel 945P chipset" |
| 12 | base:P5LD2 | prodtyp:processor | "Intel LGA775 Pentium 4" |
| 13 | base:P5LD2 | prodtyp:processor | "Dual-Core CPU Ready" |
| 14 | base:P5LD2 | prodtyp:maxmemory | "UPTO 3GB" |
| 15 | base:P5LD2 | prodtyp:memorytype | "DDR2 400" |
| 16 | base:P5LD2 | prodtyp:memorytype | "DDR2 667" |
| 17 | base:P5LD2 | prodtyp:memorytype | "DDR2 533" |
| 18 | base:ASUS | rdf:_2 | base:P5LD2 |
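Because the QNames in Table (4.3) are only shorthand, an application has to expand them back into full URI references before it can compare or merge triples. A minimal sketch, assuming the prefix-to-namespace mapping listed above plus the standard rdf: namespace; quoted cells are treated as literals and left unchanged. The function `expand` is a hypothetical illustration.

```python
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "prod": "http://hwcomp.net/productby/1.2/",
    "company": "http://hwcomp.net/companies/1.3/",
    "prodtyp": "http://hwcomp.net/productype/1.4/",
    "base": "http://hwcomp.net/",
}


def expand(term: str) -> str:
    """Hypothetical helper: expand 'prefix:local' to a full URI;
    quoted literals such as '"Intel 865PE"' are returned unchanged."""
    if term.startswith('"'):
        return term
    prefix, sep, local = term.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown QName prefix in {term!r}")
    return PREFIXES[prefix] + local


# A few rows of Table (4.3) in prefix notation.
rows = [
    ("base:motherboard", "prod:asus", "base:ASUS"),
    ("base:ASUS", "company:webaddress", '"http://asus.com"'),
    ("base:P5P800SE", "prodtyp:chipset", '"Intel 865PE"'),
    ("base:ASUS", "rdf:_1", "base:P5P800SE"),
]

for row in rows:
    print(tuple(expand(t) for t in row))
```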

This way of representing structured information in RDF can involve generating numerous "intermediate" URIrefs, such as base:ASUS, to represent aggregate concepts. In the document in Figure (4.4), the products of the hardware companies are computer components, and the hardware companies are referred to generically; the document in Figure (4.5) makes these references more precise. An RDF graph can then be drawn with those nodes as subjects to represent the additional information, producing the graph shown in Appendix (A). The following namespace declarations are used:

xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:comp="http://hwcomp.net/production/1.1/"
xmlns:prod="http://hwcomp.net/productby/1.2/"
xmlns:company="http://hwcomp.net/companies/1.3/"
xmlns:prodtyp="http://hwcomp.net/productype/1.4/"
xml:base="http://hwcomp.net/"

The triples of the document are listed in the table in Appendix (B).
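The role of the intermediate node can be illustrated by walking the arcs of Table (4.3): from base:motherboard through prod:asus to base:ASUS, then through the numbered rdf:_1 and rdf:_2 arcs to each product, and finally to each product's prodtyp: features. A minimal sketch, assuming a subset of the rows of Table (4.3) in prefix form, with literals stored as plain strings; the functions `objects`, `products_of` and `features` are hypothetical, not the thesis's implementation.

```python
from collections import defaultdict

# Subset of Table (4.3); literals are stored as plain strings.
TRIPLES = [
    ("base:motherboard", "prod:asus", "base:ASUS"),
    ("base:ASUS", "company:webaddress", "http://asus.com"),
    ("base:ASUS", "rdf:_1", "base:P5P800SE"),
    ("base:ASUS", "rdf:_2", "base:P5LD2"),
    ("base:P5P800SE", "prodtyp:chipset", "Intel 865PE"),
    ("base:P5P800SE", "prodtyp:maxmemory", "UPTO 2GB"),
    ("base:P5LD2", "prodtyp:chipset", "Intel 945P chipset"),
    ("base:P5LD2", "prodtyp:memorytype", "DDR2 400"),
    ("base:P5LD2", "prodtyp:memorytype", "DDR2 667"),
    ("base:P5LD2", "prodtyp:maxmemory", "UPTO 3GB"),
]


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is a triple."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def products_of(company):
    """Products linked by rdf:_1, rdf:_2, ..., in membership-number order."""
    members = [(p, o) for s, p, o in TRIPLES if s == company and p.startswith("rdf:_")]
    members.sort(key=lambda m: int(m[0][len("rdf:_"):]))
    return [o for _, o in members]


def features(product):
    """Group a product's prodtyp: values by property local name."""
    out = defaultdict(list)
    for s, p, o in TRIPLES:
        if s == product and p.startswith("prodtyp:"):
            out[p.split(":", 1)[1]].append(o)
    return dict(out)


for company in objects("base:motherboard", "prod:asus"):
    print(company, objects(company, "company:webaddress"))
    for product in products_of(company):
        print("  ", product, features(product))
```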

4.4 Semantic RDF/XML Document

RDF is a declarative language that provides a standard way of using XML to represent metadata in the form of statements about the properties and relationships of items on the Web. RDF's conceptual model is a graph. RDF provides an XML syntax for writing down and exchanging RDF graphs (see Section 3.5.3), called RDF/XML. Unlike triples, which are intended as a shorthand notation, RDF/XML is the normative syntax for writing RDF. The basic principles of the RDF/XML syntax can be illustrated using the examples already presented for English statements.

Example (4) RDF/XML Document for Simple Statement

For the simple statement example:

http://hardware.net has a production of "motherboard".

the RDF graph for this single statement is shown in Figure (4.6), with the triple:

{http://hardware.net, http://hwcomp.net/production/1.1/motherboard, http://hardware.net/motherboard}

An RDF/XML document corresponding to this graph is:

[RDF/XML listing (7 numbered lines) not recoverable from the source]

This example illustrates the basic ideas RDF/XML uses to encode an RDF graph as XML elements, attributes, element content, and attribute values.

Example (5) RDF/XML Document for Group of Statements

For a group of statements such as those in Table (4.1), the RDF/XML document corresponding to the graph in Figure (4.7) is:

[RDF/XML listing (17 numbered lines) not recoverable from the source]
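One way to generate RDF/XML for the group of statements in Table (4.1) with Python's standard `xml.etree.ElementTree` module is sketched below. It assumes the comp: prefix for the predicate namespace http://hwcomp.net/production/1.1/, writes a single rdf:Description for http://hardware.net, and puts each object URIref in an rdf:resource attribute; the first property element alone corresponds to the single statement of Example (4). The function `to_rdfxml` is a hypothetical illustration, not the thesis's own listing.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
COMP = "http://hwcomp.net/production/1.1/"  # assumed predicate namespace (comp:)
ET.register_namespace("rdf", RDF)
ET.register_namespace("comp", COMP)

SUBJECT = "http://hardware.net"
COMPONENTS = ["motherboard", "cpu", "harddisk", "ram", "videocard", "keyboard",
              "mouse", "case", "monitor", "fdd", "soundcard"]


def to_rdfxml(subject, components):
    """Hypothetical helper: one rdf:Description for the subject and one
    property element per triple, with the object URIref in rdf:resource."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": subject})
    for c in components:
        ET.SubElement(desc, f"{{{COMP}}}{c}",
                      {f"{{{RDF}}}resource": f"http://hwcomp.net/{c}"})
    return ET.tostring(root, encoding="unicode")


xml_text = to_rdfxml(SUBJECT, COMPONENTS)
print(xml_text)
```

Describing all eleven arcs inside one rdf:Description mirrors the drawn graph, where http://hardware.net is a single node.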

Example (7) RDF/XML Document for Structured Statement

RDF/XML can also represent the structured property values of Example (3). Each statement of Example (3), written separately, describes exactly the same RDF graph as Figure (4.8), and the corresponding RDF/XML document is:

[RDF/XML listing (35 numbered lines) not recoverable from the source; its surviving element text matches the literal values in Table (4.3)]

For the hardware computer component companies document in Figure (4.5), RDF/XML can likewise represent the structured property values. Each statement, written separately, describes exactly the same RDF graph as the one in Appendix (A), and the corresponding RDF/XML document is given in Appendix (C). A single RDF file can contain all of this information, and an application can pick and choose what it needs.
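Picking and choosing from an RDF file can be illustrated with the standard library alone. The sketch parses a small, hand-written RDF/XML fragment for the P5LD2 motherboard, using property names and values from Table (4.3), and extracts only the memory types. It assumes the simple pattern of rdf:Description elements with literal-valued property elements; it is not a general RDF/XML parser, and the fragment is a hypothetical example rather than the document in Appendix (C).

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PRODTYP = "http://hwcomp.net/productype/1.4/"

# Hypothetical fragment; property names and values follow Table (4.3).
DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:prodtyp="{PRODTYP}"
         xml:base="http://hwcomp.net/">
  <rdf:Description rdf:about="P5LD2">
    <prodtyp:chipset>Intel 945P chipset</prodtyp:chipset>
    <prodtyp:maxmemory>UPTO 3GB</prodtyp:maxmemory>
    <prodtyp:memorytype>DDR2 400</prodtyp:memorytype>
    <prodtyp:memorytype>DDR2 667</prodtyp:memorytype>
    <prodtyp:memorytype>DDR2 533</prodtyp:memorytype>
  </rdf:Description>
</rdf:RDF>"""


def literal_triples(xml_text, base="http://hwcomp.net/"):
    """Yield (subject, predicate, literal) for each property element of each
    rdf:Description; a relative rdf:about is resolved against the base URI."""
    root = ET.fromstring(xml_text)
    for desc in root.iter(f"{{{RDF}}}Description"):
        about = desc.get(f"{{{RDF}}}about")
        if about is None:
            continue  # blank nodes are out of scope for this sketch
        subject = urljoin(base, about)
        for prop in desc:
            ns, local = prop.tag[1:].split("}")  # '{ns}local' -> ns, local
            yield subject, ns + local, (prop.text or "").strip()


memory = [o for _, p, o in literal_triples(DOC) if p == PRODTYP + "memorytype"]
print(memory)
```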

4.5 Arabic Document

This dissertation includes a second case study: a document selected from the Arabic poetry domain, shown in Figure (4.9).

Arabic poetry has a number of poets, who have several poems. Abu Nuwas is an Arab poet who has the poem "Al-Qadah Al-Mudar" (The Circulated Cup). It is in the Wafir meter, and its rhyme letter is Ra'. It is a poem in praise of wine.
Abu Tammam is an Arab poet who has the poem "Al-Sayf Asdaq" (The Sword Is Truer). It is in the Basit meter, and its rhyme letter is Ba'. It was composed in praise of al-Mu'tasim, when he was told that a Hashemite woman held captive by the Byzantines had cried out "O Mu'tasim!", and he answered her, "Here I am."
He has another poem named "Midha" (Eulogy). It is in the Kamil meter, and its rhyme letter is Dal. It praises Ali ibn al-Jahm on bidding him farewell before his journey.
He has another poem named "Midha Thaniya" (Second Eulogy). It is in the second Basit meter, and its rhyme letter is Nun. It praises Muhammad ibn Hassan al-Dabbi.
Al-Buhturi is an Arab poet who has the poem "Maqtal Dhi'b fi al-Bayda'" (The Killing of a Wolf in the Desert). It is in the Tawil meter, and its rhyme letter is Dal. It was composed while facing a wolf in the desert.

Figure (4.9): Arabic Poets Document

The document contains many poems, each with specific features. It is made more precise by using identifiers for the poets, as shown in Figure (4.10).

Abu Nuwas is an Arab poet.
Abu Tammam is an Arab poet.
Al-Buhturi is an Arab poet.
The poet Abu Nuwas has the poem "Al-Qadah Al-Mudar". It is in the Wafir meter, its rhyme letter is Ra', and it describes wine.
The poet Abu Tammam has the poem "Al-Sayf Asdaq". It is in the Basit meter, its rhyme letter is Ba', and it praises al-Mu'tasim, who answered the call of a Hashemite woman held captive by the Byzantines when she cried out "O Mu'tasim!", answering her, "Here I am."
He has the poem "Midha". It is in the Kamil meter, its rhyme letter is Dal, and it praises Ali ibn Jahm on bidding him farewell before his journey.
He has the poem "Midha Thaniya". It is in the second Basit meter, its rhyme letter is Nun, and it praises Muhammad ibn Hassan al-Dabbi.
The poet Al-Buhturi has the poem "Maqtal Dhi'b fi al-Bayda'". It is in the Tawil meter, its rhyme letter is Dal, and the poem is about facing a wolf in the desert.

Figure (4.10): Arabic Poet Document with Identifiers

Consider, as a simple example, the Arabic sentence:

نَظَمَ ابو نواس قصيدة القدح المدار
(Abu Nuwas composed the poem Al-Qadah Al-Mudar.)

That means the subject of this sentence is (ابو نواس, Abu Nuwas), the attribute (property) is (نَظَمَ, "composed"), and the object is (قصيدة القدح المدار, the poem Al-Qadah Al-Mudar). In RDF, this statement, redefined as an RDF triple, can be considered a complete RDF graph, because it records one complete fact using the RDF methodology and can then be documented using shorthand techniques. A triple is written as:

{subject, predicate, object}

so the Arabic statement above becomes:

{ابو نواس, نَظَمَ, قصيدة القدح المدار}

This triple has the following parts:

Subject (Literal): ابو نواس (Abu Nuwas)
Predicate (Property): نَظَمَ ("composed")
Object (Literal): قصيدة القدح المدار (the poem Al-Qadah Al-Mudar)

The diagram of this sentence is shown in Figure (4.11):

[Figure (4.11) diagram: node ابو نواس, arc نَظَمَ, node قصيدة القدح المدار]
Figure (4.11): Simple Node and Arc Diagram for Arabic Sentence
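The Arabic triple can be written out in a line-based RDF syntax as a quick check that Arabic text survives serialization. In RDF a literal cannot occupy the subject position, so this sketch assigns hypothetical IRIs (under the made-up namespace http://example.org/poetry/) to the poet and to the property, attaches the Arabic names as rdfs:label literals tagged @ar, and keeps the poem title as an Arabic literal object. It follows RDF 1.1 N-Triples conventions (UTF-8 text, escaped quotes and backslashes); the helper names are hypothetical.

```python
# Hypothetical namespace used only for this illustration.
EX = "http://example.org/poetry/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def nt_literal(text: str, lang: str) -> str:
    """N-Triples literal with a language tag; escapes backslash, quote, LF and CR."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"@{lang}'


def nt_line(s: str, p: str, o: str) -> str:
    """One N-Triples statement: <subject> <predicate> object ."""
    return f"<{s}> <{p}> {o} ."


poet = EX + "AbuNuwas"    # hypothetical IRI for the poet ابو نواس
composed = EX + "nazama"  # hypothetical IRI for the property نَظَمَ ("composed")

lines = [
    nt_line(poet, RDFS_LABEL, nt_literal("ابو نواس", "ar")),
    nt_line(composed, RDFS_LABEL, nt_literal("نَظَمَ", "ar")),
    nt_line(poet, composed, nt_literal("قصيدة القدح المدار", "ar")),
]
document = "\n".join(lines) + "\n"
data = document.encode("utf-8")  # RDF 1.1 N-Triples documents are UTF-8
print(document)
```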

The RDF document for Figure (4.11) is:
