2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Workshops

Ontology-based Intelligent Web Mining Agent for Taiwan Travel

Yung-Chun Chang

Pei-Ching Yang

Jung-Hsien Chiang

Academia Sinica Grid Computing, Academia Sinica, Taipei, Taiwan

Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan

Abstract- With the gradual increase in travel, the travel agent plays an important role in providing travel information that conforms to tourists’ requirements. Taiwan, also known as Formosa, is known throughout the world for its sincere hospitality and diverse cuisine, and it has been one of the top tourist destinations in East Asia for years. In this paper, we propose an ontology-based intelligent web mining agent for Taiwan travel. The core technologies of the agent are the ontology model, a fuzzy inference mechanism, particle swarm optimization, and ant colony optimization. The proposed agent helps tourists collect Taiwan travel information from the World Wide Web automatically, using the tourist’s natural-language description or documents. In this way, it reduces the travel agency’s workload and speeds up the tourist’s access to travel information.

Keywords- Ontology; Agent; Fuzzy Inference; Particle Swarm Optimization; Ant Colony Optimization

I. INTRODUCTION

With the huge amount of information available online, it is difficult for users to find the information they need quickly and accurately. Although many portals offer keyword-based search engines, a search engine still cannot understand the meaning of a semantic query. As a result, it returns a large amount of irrelevant information. Moreover, many companies exploit the ranking rules to provide search engine optimization that changes the rank of search results, so the search engine loses accuracy and credibility.

In recent years, owing to the growing number of travelers, a wide range of information, such as tourist attractions and local gourmet food, has been posted on the Internet to appeal to tourists. However, it is not easy for a tourist to find exactly the information he or she wants among the large amount available on the Internet. An agent that collects exactly the travel information the tourist wants would therefore both reduce the travel agency’s workload and speed up the tourist’s access to travel information. The motivation of our work is to provide tourists with an agent that collects Taiwan travel information from the Internet automatically.

Taiwan is located off the southeastern coast of China, at the western edge of the Pacific Ocean, between Japan and the Philippines. Taiwan is an intensely traditional place, with Chinese and aboriginal festivals, performing arts, and religious beliefs preserving a legacy that goes back millennia. Taiwan's hinterland offers more surprises: towering mountains, including Northeast Asia's tallest, six national parks, a selection of alluring offshore islands, and numerous hot-spring resorts. It has also long been famous for its traditional and delicious local foods [8].

The remainder of this paper is structured as follows. Section II describes the structure of the Taiwan travel ontology. Section III presents the ontology-based intelligent web mining agent for Taiwan travel. Finally, conclusions are drawn in Section IV.

978-0-7695-3801-3/09 $26.00 © 2009 IEEE. DOI 10.1109/WI-IAT.2009.316

II. THE STRUCTURE OF TAIWAN TRAVEL ONTOLOGY

An ontology is a computational model of some portion of the world. Recently, there has been an explosion of interest in ontologies as artifacts to represent human knowledge and as critical components in knowledge management, the Semantic Web [4], business-to-business applications, and several other application areas [7][9]. In this section, we explain the structure of the Taiwan travel ontology, which is built on a novel domain ontology structure.

[Figure 1: three-layer diagram. Concept Layer: Taipei, Nantou, Kaohsiung, Healien, Tainan, Taichung. Relation Layer: Locate, A Part of Gourmet, A Part of Specialty Goods, A Part of History Site, A Part of Scenic Area. Instance Layer: Taroko National Park, Mochi, Fort San Domingo, Oyster Omelet, Sun Moon Lake, Sun Cake, Anping Fort, Heart of Love River. Edge types: Association, Instance of.]

Figure 1. Architecture of Taiwan travel ontology.


Figure 1 depicts the architecture of the Taiwan travel ontology, which includes a concept layer, a relation layer, and an instance layer [2][10]. The concepts in the concept layer include “Healien,” “Taipei,” “Taichung,” “Nantou,” “Tainan,” “Kaohsiung,” and so on. Five main kinds of relationship are defined in the relation layer: “Locate,” “A Part of History Site,” “A Part of Gourmet,” “A Part of Scenic Area,” and “A Part of Specialty Goods.” The instance layer represents all kinds of instances, which are related to concepts via the relationships in the relation layer. For example, the relations between the concept “Healien” and the instance “Taroko National Park” are “Locate” and “A Part of Scenic Area.” Given these features, the domain knowledge of Taiwan travel for the proposed agent can be represented using the structure of the domain ontology.
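The three-layer structure described above can be pictured as a small triple store. The following is only a minimal sketch under our own assumptions: the class name `TravelOntology`, its methods, and the storage as (concept, relation, instance) triples are hypothetical and are not taken from the paper. The Healien entries follow the example in the text; the Nantou entry is an illustrative assumption.

```python
from dataclasses import dataclass, field

# The five relation types named in the relation layer.
RELATIONS = {
    "Locate",
    "A Part of History Site",
    "A Part of Gourmet",
    "A Part of Scenic Area",
    "A Part of Specialty Goods",
}


@dataclass
class TravelOntology:
    """Hypothetical container: concepts, plus (concept, relation, instance) triples."""
    concepts: set = field(default_factory=set)
    triples: set = field(default_factory=set)

    def add(self, concept, relation, instance):
        # Reject relations that are not part of the relation layer.
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        self.concepts.add(concept)
        self.triples.add((concept, relation, instance))

    def instances_of(self, concept, relation=None):
        # Instances linked to a concept, optionally restricted to one relation.
        return sorted({
            inst for c, r, inst in self.triples
            if c == concept and (relation is None or r == relation)
        })


ontology = TravelOntology()
# From the text: Healien and Taroko National Park are linked by two relations.
ontology.add("Healien", "Locate", "Taroko National Park")
ontology.add("Healien", "A Part of Scenic Area", "Taroko National Park")
# Illustrative assumption, not stated in the text.
ontology.add("Nantou", "A Part of Scenic Area", "Sun Moon Lake")
```

Keeping the relation on each link, rather than on the instance, allows one instance to be attached to the same concept through several relations, as in the Taroko National Park example.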

III. ONTOLOGY-BASED INTELLIGENT WEB MINING AGENT FOR TAIWAN TRAVEL

[Figure 2: block diagram. Inputs: World Wide Web, Domain Expert, Taiwan Travel Ontology. Web Information Pre-processing Agent: Chinese Dictionary (CKIP), Web Information and Term Filter, Web Attribute Statistics Mechanism, Ant Colony Optimization for Semantic Relation Mechanism. Semantic Analysis Agent: Coordinate Transfer Service, Particle Swarm Optimization Engine, Fuzzy Inference Engine with Fuzzy Rule Base, Ontology Term Set (O1, O2, ..., On), Query Term Set (Q1, Q2, ..., Qn), Sorting Mechanism. Output: Semantic Similarity Repository.]

Figure 2. Architecture of ontology-based intelligent web mining agent for Taiwan travel.

The ontology-based intelligent web mining agent (OIWMA) for Taiwan travel collects Taiwan travel information based on the tourist’s requirements, which both reduces the travel agency’s workload and speeds up the tourist’s access to travel information. Figure 2 shows the architecture of OIWMA, which includes two agents: the Web Information Pre-processing Agent and the Semantic Analysis Agent [9]. The processes of each agent are described below.

A. Web Information Pre-process Agent

CKIP processes the webpage data retrieved from the World Wide Web, and the Web Information and Term Filter extracts the meaningful concept terms of the Taiwan travel ontology and the defined web tags [1]. The result is integrated into the Context Information Strength (CIS). The CIS is then passed to both the Web Attribute Statistics Mechanism and the Ant Colony Optimization [6] for Semantic Relation Mechanism. The Web Attribute Statistics Mechanism calculates the Context Relation Similarity (CRS), and the Ant Colony Optimization for Semantic Relation Mechanism calculates the Semantic Relation Similarity (SRS). These two values are essential inputs to OIWMA.

B. Semantic Analysis Agent

In this agent, the Coordinate Transfer Service converts each webpage to a point in a 3D space based on the Context Information Strength, Context Relation Similarity, and Semantic Relation Similarity. Figure 3 shows an example of converting several Taiwan gourmet websites to the 3D space. The X-axis is the value of the semantic relation, calculated by Eq. (1); the Y-axis is the value of the context relation, given by Eq. (2); and the Z-axis is the value of the context information, given by the term frequency.

$$SR = \frac{SD}{TF} \tag{1}$$

where SD is the Semantic Distance calculated by ACO, and TF is the Term Frequency.

$$CR = \sum_{i=1}^{n} HLF_i + \sum_{j=1}^{m} HTF_j \tag{2}$$

where HLF is the term frequency of each hyperlink, and HTF is the term frequency of each HTML head tag.

After each website has been transferred to the 3D space, the agent chooses the appropriate webpage by using particle swarm optimization [5]. The algorithm for the particle swarm optimization is given as follows:

Algorithm for PSO of OIWMA
BEGIN
  Initialize the particle swarm optimization environment.
    1.1: Generate each particle's initial location (x_id) randomly. /* x_id denotes the location of the ith particle in the dth dimension of the search space */
    1.2: Generate each particle's initial velocity (v_id) depending on its location. /* v_id denotes the velocity of the ith particle in the dth dimension */
    1.3: Set the social optimal fitness value.
  Optimization
    2.1: For i ← 1 to I /* I denotes the number of cycles */
    2.2: For j ← 1 to J /* J denotes the number of particles */
      2.2.1: Every particle executes the fuzzy inference engine to calculate the semantic strength of each webpage, which serves as the fitness value.
      2.2.2: Modify the velocity and location of each particle in every dimension by using the following equations:

$$v_{id} = v_{id} + C_1 \times rand() \times (p_{id} - x_{id}) + C_2 \times Rand() \times (p_{gd} - x_{id}) \tag{3}$$

$$x_{id} = x_{id} + v_{id} \tag{4}$$

      where C1 and C2 are two positive constants, rand() and Rand() are two random functions in the range [0, 1], p_id is the best location found by the individual particle, and p_gd is the best location found by the whole swarm.
END
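Returning to the coordinate transfer step, the mapping of a webpage to a 3D point can be sketched as follows. This is a minimal illustration under stated assumptions: it reads Eq. (1) as the ratio of the semantic distance SD to the term frequency TF and Eq. (2) as the sum of the hyperlink and head-tag term frequencies, and it takes SD as a given number (the paper obtains it from ACO, which is not reproduced here). The names `PageStats` and `to_coordinates` and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    """Hypothetical per-page statistics produced by the pre-processing agent."""
    url: str
    semantic_distance: float  # SD, assumed to be supplied by the ACO mechanism
    term_frequency: float     # TF
    hyperlink_tfs: list       # HLF_i, one term frequency per hyperlink
    head_tag_tfs: list        # HTF_j, one term frequency per HTML head tag


def to_coordinates(page):
    """Return (x, y, z) = (semantic relation, context relation, context information)."""
    if page.term_frequency == 0:
        raise ValueError("term frequency must be non-zero for Eq. (1)")
    x = page.semantic_distance / page.term_frequency       # Eq. (1), assumed ratio
    y = sum(page.hyperlink_tfs) + sum(page.head_tag_tfs)   # Eq. (2)
    z = page.term_frequency                                # context information
    return (x, y, z)


# Made-up numbers for illustration only.
example = PageStats("http://example.invalid/food", 120.0, 4.0, [1, 2], [3])
point = to_coordinates(example)
```

Any normalization of the Z value (for example, scaling term frequency into the interval [0, 1]) is left out because the text does not specify it.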

In this paper, we use the fuzzy inference engine as the fitness function of PSO. The results of the fuzzy inference engine are used to infer the semantic similarity between each webpage and each instance stored in the ontology. Table 1 shows the fuzzy rules predefined by domain experts.
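The velocity and location updates of Eqs. (3) and (4) can be sketched as a short routine with a pluggable fitness function. This is a sketch, not the authors' implementation: the function name `pso`, the parameter values, the random velocity initialization, the clamping of locations to the search bounds, and the toy fitness used in the example are all assumptions. In OIWMA the fitness would be the output of the fuzzy inference engine.

```python
import random


def pso(fitness, bounds, n_particles=20, n_cycles=50, c1=2.0, c2=2.0, seed=0):
    """Maximize `fitness` over a box given as [(low, high), ...] per dimension."""
    rng = random.Random(seed)
    dims = len(bounds)
    # 1.1: random initial locations; 1.2: small random initial velocities (assumed).
    x = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    v = [[0.1 * rng.uniform(lo - hi, hi - lo) for lo, hi in bounds]
         for _ in range(n_particles)]
    p_best = [xi[:] for xi in x]
    p_fit = [fitness(xi) for xi in x]
    # 1.3: the social (swarm-wide) best so far.
    g = max(range(n_particles), key=p_fit.__getitem__)
    g_best, g_fit = p_best[g][:], p_fit[g]

    for _ in range(n_cycles):                 # 2.1: cycles
        for i in range(n_particles):          # 2.2: particles
            for d in range(dims):
                # Eq. (3): velocity update with individual and social terms.
                v[i][d] += (c1 * rng.random() * (p_best[i][d] - x[i][d])
                            + c2 * rng.random() * (g_best[d] - x[i][d]))
                # Eq. (4): location update, clamped to the bounds (assumption).
                lo, hi = bounds[d]
                x[i][d] = min(max(x[i][d] + v[i][d], lo), hi)
            f = fitness(x[i])                 # 2.2.1: evaluate fitness
            if f > p_fit[i]:
                p_best[i], p_fit[i] = x[i][:], f
                if f > g_fit:
                    g_best, g_fit = x[i][:], f
    return g_best, g_fit


# Toy fitness: closeness to a made-up target point in the unit cube.
TARGET = (0.3, 0.5, 0.7)


def toy_fitness(p):
    return -sum((a - b) ** 2 for a, b in zip(p, TARGET))


UNIT_CUBE = [(0.0, 1.0)] * 3
best, best_fit = pso(toy_fitness, UNIT_CUBE)
```

Because the best location is only replaced when the fitness improves, running more cycles with the same seed can never return a worse result than the initial swarm.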


[Figure 3: 3D scatter plot with axes X, Y, and Z. Plotted webpages: http://taiwan.net.tw, http://www.tw-food.com.tw, http://www.tw-food.com.tw/action/index.php?type_id=3, http://blog.mjjq.com/archives/1931.html, http://blog.roodo.com/subing/archives/239360.html, http://taiwan.net.tw/m1.aspx?sNO=0000106, http://www.taiwanfun.com/north/taipei/dining/indexTW.htm, http://www.tcff.com.tw/, http://www.always-free.com/5252food/]

Figure 3. Applying the coordinate transfer service to convert webpages to coordinates.

Table 1. Fuzzy rules of the fuzzy inference engine.

Rule No   Semantic Relation    Context Relation     Context Information
          Similarity (SRS)     Similarity (CRS)     Strength (CIS)
1         VeryHigh             VeryHigh             VeryHigh
2         VeryHigh             High                 VeryHigh
3         VeryHigh             Medium               VeryHigh
4         VeryHigh             Low                  High
5         VeryHigh             VeryLow              High
6         High                 VeryHigh             VeryHigh
7         High                 High                 VeryHigh
8         High                 Medium               High
9         High                 Low                  Medium
10        High                 VeryLow              Medium
11        Medium               VeryHigh             VeryHigh
12        Medium               High                 High
13        Medium               Medium               High
14        Medium               Low                  Medium
15        Medium               VeryLow              Low
16        Low                  VeryHigh             Medium
17        Low                  High                 Medium
18        Low                  Medium               Low
19        Low                  Low                  Low
20        Low                  VeryLow              VeryLow
21        VeryLow              VeryHigh             Medium
22        VeryLow              High                 Low
23        VeryLow              Medium               Low
24        VeryLow              Low                  VeryLow
25        VeryLow              VeryLow              VeryLow

The linguistic terms of the input fuzzy variable SRS are VeryHigh, High, Medium, Low, and VeryLow. Fig. 4(a) shows the membership functions for fuzzy sets SRS_VeryHigh(x: 0, 0, Mean-100, Mean-50), SRS_High(x: Mean-100, Mean-50, Mean-50, Mean), SRS_Medium(x: Mean-50, Mean, Mean, Mean+50), SRS_Low(x: Mean, Mean+50, Mean+50, Mean+100), and SRS_VeryLow(x: Mean+50, Mean+100, Mean+100, Mean+100). The semantic relation similarity represents the semantic relation between each webpage and the instances stored in the ontology. In Fig. 4(a), the value of Mean is selected as the threshold for determining the membership functions of the fuzzy variable SRS; Mean is the average location distance of all instances stored in the ontology. Fig. 4(b) shows the membership functions for fuzzy sets CRS_Low(x: 0, 0, 1, 2), CRS_Medium(x: 1, 2, 2, 3), and CRS_High(x: 2, 3, 3, 3). Fig. 4(c) displays the membership functions for fuzzy sets CIS_VeryLow(x: 0, 0, 0, 0.4), CIS_Low(x: 0.2, 0.4, 0.4, 0.6), CIS_Medium(x: 0.4, 0.6, 0.6, 0.8), CIS_High(x: 0.6, 0.8, 0.8, 1), and CIS_VeryHigh(x: 0.8, 1, 1, 1). If the value of CIS is high, then the membership degree for the context information similarity is high [3].

[Figure 4: three membership-function plots, each with vertical axis Membership Degree (0 to 1). SRS sets (VeryHigh, High, Medium, Low, VeryLow) over Location Distance, ticks 0, Mean-100, Mean-50, Mean, Mean+50, Mean+100. CIS sets (VeryLow, Low, Medium, High, VeryHigh) over Context Information, ticks 0, 0.2, 0.4, 0.6, 0.8, 1. CRS sets (Low, Medium, High) over Context Relation, ticks 0, 1, 2, 3.]

Figure 4. Membership functions of the fuzzy variables: (a) Semantic Relation Similarity (SRS), (b) Context Relation Similarity (CRS), and (c) Context Information Strength (CIS).
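The four-parameter notation used for the fuzzy sets, such as CIS_Low(x: 0.2, 0.4, 0.4, 0.6), can be read as a trapezoidal membership function with corner points (a, b, c, d). The sketch below fuzzifies a crisp value against the sets listed in the text. Reading the parameters as trapezoid corners is our interpretation, and the helper names `trapezoid`, `srs_sets`, and `fuzzify`, as well as the sample inputs, are hypothetical.

```python
def trapezoid(x, a, b, c, d):
    """Membership degree of x in a trapezoid rising on [a, b] and falling on [c, d]."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


# Parameters copied from the text for CRS and CIS.
CRS_SETS = {
    "Low": (0, 0, 1, 2),
    "Medium": (1, 2, 2, 3),
    "High": (2, 3, 3, 3),
}
CIS_SETS = {
    "VeryLow": (0.0, 0.0, 0.0, 0.4),
    "Low": (0.2, 0.4, 0.4, 0.6),
    "Medium": (0.4, 0.6, 0.6, 0.8),
    "High": (0.6, 0.8, 0.8, 1.0),
    "VeryHigh": (0.8, 1.0, 1.0, 1.0),
}


def srs_sets(mean):
    """SRS sets are positioned relative to Mean, the average location distance."""
    return {
        "VeryHigh": (0, 0, mean - 100, mean - 50),
        "High": (mean - 100, mean - 50, mean - 50, mean),
        "Medium": (mean - 50, mean, mean, mean + 50),
        "Low": (mean, mean + 50, mean + 50, mean + 100),
        "VeryLow": (mean + 50, mean + 100, mean + 100, mean + 100),
    }


def fuzzify(x, sets):
    """Map a crisp value to its membership degree in every set of one variable."""
    return {name: trapezoid(x, *params) for name, params in sets.items()}


# Sample inputs, chosen for illustration only.
cis_degrees = fuzzify(0.5, CIS_SETS)
crs_degrees = fuzzify(1.5, CRS_SETS)
srs_degrees = fuzzify(300.0, srs_sets(300.0))
```

Note that a small location distance falls in SRS_VeryHigh: the sets are ordered so that pages closer to the ontology instances receive a higher semantic relation similarity.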

When the fuzzy inference engine has finished, its results are passed to the sorting mechanism, which sorts the semantic similarity between each webpage and the ontology in descending order. All sorting results are stored in the semantic similarity repository, from which OIWMA provides the appropriate travel information to the tourist. This also reduces the travel agency’s workload and speeds up the tourist’s access to travel information.

IV. CONCLUSION

This paper has presented an ontology-based intelligent web mining agent for Taiwan travel, consisting of a web information pre-process agent and a semantic analysis agent. The proposed agent collects Taiwan travel information from the World Wide Web according to the tourist’s requirements. With this information, the tourist can learn about all kinds of Taiwan travel information conveniently. Some problems remain for future study. For example, genetic learning or a neural network could be added to the fuzzy inference to enhance the proposed method. Moreover, the domain ontology for Taiwan travel can be applied to other fields.

REFERENCES

[1] C. S. Lee, M. H. Wang, J. J. Chen, and C. Y. Hsu, “Ontology-based Intelligent Decision Support Agent for CMMI Project Monitoring and Control,” International Journal of Approximate Reasoning, vol. 48, no. 1, pp. 62-76, 2008.
[2] C. S. Lee, M. H. Wang, W. C. Sun, and Y. C. Chang, “Intelligent Healthcare Agent for Food Recommendation at Tainan City,” IEEE International Conference on Systems, Man, and Cybernetics (SMC 2008), pp. 1465-1470, 2008.
[3] C. S. Lee, Y. C. Chang, and M. H. Wang, “Ontological recommendation multi-agent for Tainan City travel,” Expert Systems with Applications, vol. 36, no. 3, part 2, pp. 6740-6753, 2009.
[4] G. Fenza, V. Loia, and S. Senatore, “A hybrid approach to semantic web services matchmaking,” International Journal of Approximate Reasoning, vol. 48, no. 3, pp. 808-828, 2008.
[5] J. Kennedy and R. Eberhart, “Particle Swarm Optimization,” Proceedings of IEEE International Conference on Neural Networks, vol. 4, pp. 1942-1948, 1995.
[6] M. Dorigo, M. Birattari, and T. Stutzle, “Ant Colony Optimization,” IEEE Computational Intelligence Magazine, vol. 1, no. 4, pp. 28-39, 2006.
[7] T. B. Lee, “The Semantic Web Revisited,” IEEE Intelligent Systems, vol. 21, no. 3, pp. 96-101, 2006.
[8] Taiwan Tourism Bureau. Retrieved Mar. 5, 2009, from the World Wide Web: http://www.taiwan.net.tw/
[9] W. S. Lo, T. P. Hong, and R. Jeng, “A framework of E-SCM multi-agent systems in the fashion industry,” International Journal of Production Economics, vol. 114, no. 2, pp. 594-614, 2008.
[10] S. Y. Yang, “OntoPortal: An ontology-supported portal architecture with linguistically enhanced and focused crawler technologies,” Expert Systems with Applications, vol. 36, no. 6, pp. 10148-10157, 2009.
