2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Workshops
Ontology-based Intelligent Web Mining Agent for Taiwan Travel
Yung-Chun Chang
Pei-Ching Yang
Academia Sinica Grid Computing Academia Sinica Taipei, Taiwan
[email protected]
Department of Computer Science and Information Engineering National Cheng Kung University Tainan, Taiwan
[email protected] [email protected]
978-0-7695-3801-3/09 $26.00 © 2009 IEEE DOI 10.1109/WI-IAT.2009.316
Abstract- As travel steadily increases, the travel agent plays an important role in providing travel information that conforms to tourists' requirements. Taiwan, also known as Formosa, is known throughout the world for its sincere hospitality and culturally diverse cuisine, and it has been one of the top tourist destinations in East Asia for years. In this paper, we propose an ontology-based intelligent web mining agent for Taiwan travel. The core technologies of the agent are the ontology model, a fuzzy inference mechanism, particle swarm optimization, and ant colony optimization. The proposed agent helps tourists collect Taiwan travel information from the World Wide Web automatically, using the tourist's natural language description or documents. In this way, it reduces the travel agency's workload and speeds up the tourist's access to travel information.
Keywords- Ontology; Agent; Fuzzy Inference; Particle Swarm Optimization; Ant Colony Optimization
I. INTRODUCTION
With the huge amount of information available online, it is difficult for users to find the information they need quickly and correctly. Although many portals provide keyword-based search engine services, search engines still cannot understand the semantic meaning of a query. As a result, they return a large amount of irrelevant information. Moreover, many companies offer search engine optimization that exploits the ranking rules to change the order of search results, so search engines lose accuracy and credibility. In recent years, owing to the growing number of travelers, a wide range of information, such as tourist attractions and local gourmet food, has been posted on the Internet to appeal to tourists. However, it is not easy for a tourist to pick out exactly the information he or she wants from the large amount available on the Internet. An agent that collects exactly the travel information a tourist wants would therefore both reduce the travel agency's workload and speed up the tourist's access to travel information. The motivation of our work is to provide tourists with an agent that can collect Taiwan travel information from the Internet automatically.
Taiwan is located off the southeastern coast of China, at the western edge of the Pacific Ocean, between Japan and the Philippines. Taiwan is an intensely traditional place, with Chinese and aboriginal festivals, performing arts, and religious beliefs preserving a legacy that goes back millennia. Taiwan's hinterland offers more surprises: towering mountains, including Northeast Asia's tallest, six national parks, a selection of alluring offshore islands, and numerous hot-spring resorts. It has also always been famous for its traditional, delicious local foods [8].
The remainder of this paper is structured as follows: Section 2 describes the structure of the Taiwan travel ontology. Section 3 presents an ontology-based intelligent web mining agent for Taiwan travel. Finally, some conclusions are drawn in Section 4.
II. THE STRUCTURE OF TAIWAN TRAVEL ONTOLOGY
An ontology is a computational model of some portion of the world. Recently, there has been an explosion of interest in ontologies as artifacts for representing human knowledge and as critical components in knowledge management, the Semantic Web [4], business-to-business applications, and several other application areas [7][9]. In this section, we explain the structure of the Taiwan travel ontology, which adopts a novel domain ontology structure.
Jung-Hsien Chiang
Figure 1. Architecture of Taiwan travel ontology. [Diagram with three layers. Concept Layer: Taipei, Nantou, Kaohsiung, Healien, Tainan, Taichung. Relation Layer: Locate, A Part of Gourmet, A Part of Specialty Goods, A Part of History Site, A Part of Scenic Area. Instance Layer: Taroko National Park, Mochi, Fort San Domingo, Oyster Omelet, Sun Moon Lake, Sun Cake, Anping Fort, Heart of Love River. Edge types: Association, Instance of.]
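The three layers shown in Figure 1 can be sketched as a small data structure. This is a minimal illustration, not the authors' implementation: the class `TravelOntology` and its methods are hypothetical names, the concept and relation labels are taken from Figure 1, the Taroko National Park links follow the example given in the text, and the pairings for Sun Moon Lake and Anping Fort are assumptions added only to populate the example.

```python
from collections import defaultdict

# Labels taken from Figure 1 (spelling as in the source).
CONCEPTS = {"Taipei", "Nantou", "Kaohsiung", "Healien", "Tainan", "Taichung"}
RELATIONS = {
    "Locate",
    "A Part of Gourmet",
    "A Part of Specialty Goods",
    "A Part of History Site",
    "A Part of Scenic Area",
}


class TravelOntology:
    """Hypothetical three-layer ontology: concepts, relations, instances."""

    def __init__(self, concepts, relations):
        self.concepts = set(concepts)      # concept layer
        self.relations = set(relations)    # relation layer
        # instance layer: instance name -> set of (relation, concept) links
        self._links = defaultdict(set)

    def link(self, instance, relation, concept):
        """Attach an instance to a concept through a named relation."""
        if concept not in self.concepts:
            raise ValueError(f"unknown concept: {concept}")
        if relation not in self.relations:
            raise ValueError(f"unknown relation: {relation}")
        self._links[instance].add((relation, concept))

    def instances_of(self, concept, relation=None):
        """Return the instances linked to a concept, optionally by relation."""
        return sorted(
            inst
            for inst, links in self._links.items()
            if any(c == concept and (relation is None or r == relation)
                   for r, c in links)
        )


def build_example():
    onto = TravelOntology(CONCEPTS, RELATIONS)
    # Example stated in the text.
    onto.link("Taroko National Park", "Locate", "Healien")
    onto.link("Taroko National Park", "A Part of Scenic Area", "Healien")
    # Assumed pairings (the figure's edges are not recoverable).
    onto.link("Sun Moon Lake", "Locate", "Nantou")
    onto.link("Sun Moon Lake", "A Part of Scenic Area", "Nantou")
    onto.link("Anping Fort", "Locate", "Tainan")
    onto.link("Anping Fort", "A Part of History Site", "Tainan")
    return onto
```

Keeping the relation as part of each link lets one instance relate to the same concept in more than one way, as the Taroko National Park example requires.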
Figure 1 depicts the architecture of the Taiwan travel ontology, which includes a concept layer, a relation layer, and an instance layer [2][10]. The concepts in the concept layer include “Healien,” “Taipei,” “Taichung,” “Nantou,” “Tainan,” “Kaohsiung,” and so on. Five main kinds of relationship are defined in the relation layer: “Locate,” “A Part of History Site,” “A Part of Gourmet,” “A Part of Scenic Area,” and “A Part of Specialty Goods.” The instance layer holds all instances, each of which is linked to concepts through the relationships in the relation layer. For example, the instance-of relations between the concept “Healien” and the instance “Taroko National Park” are “Locate” and “A Part of Scenic Area.” Given these features, the domain knowledge of Taiwan travel needed by the proposed agent can be represented using this domain ontology structure.
III. ONTOLOGY-BASED INTELLIGENT WEB MINING AGENT FOR TAIWAN TRAVEL
Figure 2. Architecture of ontology-based intelligent web mining agent for Taiwan travel. [Diagram components: World Wide Web; Chinese Dictionary (CKIP); Domain Expert; Taiwan Travel Ontology; Web Information Pre-processing Agent, with Web Information and Term Filter, Web Attribute Statistics Mechanism, and Ant Colony Optimization for Semantic Relation Mechanism; Semantic Analysis Agent, with Coordinate Transfer Service, Particle Swarm Optimization Engine, Fuzzy Inference Engine, Fuzzy Rule Base, Query Term Set (Q1, Q2, ..., Qn), Ontology Term Set (O1, O2, ..., On), and Sorting Mechanism; Semantic Similarity Repository.]
The ontology-based intelligent web mining agent (OIWMA) for Taiwan travel collects Taiwan travel information based on the tourist's requirements, which both reduces the travel agency's workload and speeds up the tourist's access to travel information. Figure 2 shows the architecture of OIWMA, which comprises two agents: the Web Information Pre-processing Agent and the Semantic Analysis Agent [9]. The detailed processes of each agent are described below:
A. Web Information Pre-processing Agent
The CKIP Chinese dictionary processes the webpage data retrieved from the World Wide Web, and the Web Information and Term Filter extracts the meaningful concept terms of the Taiwan travel ontology and the predefined web tags [1]. The result is integrated into the Context Information Strength (CIS). The CIS is then passed to both the Web Attribute Statistics Mechanism and the Ant Colony Optimization [6] for Semantic Relation Mechanism. The Web Attribute Statistics Mechanism calculates the Context Relation Similarity (CRS), and the Ant Colony Optimization for Semantic Relation Mechanism calculates the Semantic Relation Similarity (SRS). Together with the CIS, these two values are the inputs on which the Semantic Analysis Agent operates.
B. Semantic Analysis Agent
In this agent, the Coordinate Transfer Service converts each webpage to a point in a 3D space based on the Context Information Strength, Context Relation Similarity, and Semantic Relation Similarity. Figure 3 shows an example of converting several Taiwan gourmet websites to this 3D space. The X-axis is the semantic relation value calculated by Eq. (1), the Y-axis is the context relation value given by Eq. (2), and the Z-axis is the context information value given by the term frequency.
$SR = SD \circ TF$ (1)
$CR = \sum_{i=1}^{n} HLF_i + \sum_{j=1}^{m} HTF_j$ (2)
where SD is the Semantic Distance calculated by ACO, TF is the Term Frequency, $HLF_i$ is the term frequency of the i-th hyperlink, and $HTF_j$ is the term frequency of the j-th HTML head tag (the operator joining SD and TF in Eq. (1), written here as $\circ$, is not legible in the source).
After each website has been transferred to the 3D space, the appropriate webpages are chosen using particle swarm optimization [5]. The algorithm for the particle swarm optimization is as follows:
Algorithm for PSO of OIWMA
BEGIN
1: Initialize the particle swarm optimization environment.
1.1: Generate each particle's initial location ($x_{id}$) randomly. /* $x_{id}$ denotes the location of the i-th particle in the d-th dimension of the search space */
1.2: Generate each particle's initial velocity ($v_{id}$) depending on its location. /* $v_{id}$ denotes the velocity of the i-th particle in the d-th dimension */
1.3: Set the social optimal fitness value.
2: Optimization
2.1: For i ← 1 to i /* i denotes the number of cycles */
2.2: For j ← 1 to j /* j denotes the number of particles */
2.2.1: Every particle executes the fuzzy inference engine to calculate the semantic strength of each webpage, which serves as the fitness value.
2.2.2: Modify the velocity and location of each particle in each dimension using the following equations:
$v_{id} = v_{id} + C_1 \times rand() \times (p_{id} - x_{id}) + C_2 \times Rand() \times (p_{gd} - x_{id})$ (3)
$x_{id} = x_{id} + v_{id}$ (4)
where $C_1$ and $C_2$ are two positive constants, rand() and Rand() are two random functions with values in the range [0, 1], $p_{id}$ is the best location found by the individual particle, and $p_{gd}$ is the best location found by the whole swarm.
END
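The velocity and location updates of the algorithm above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function name `pso`, the default parameter values, the random initial velocities, and the clamping of velocity and location (`v_max`, `bounds`) are all assumptions, and the `fitness` argument is a placeholder for the fuzzy inference engine that the paper uses as the fitness function.

```python
import random


def pso(fitness, dim=3, n_particles=10, n_cycles=50,
        c1=2.0, c2=2.0, bounds=(0.0, 1.0), v_max=0.2, seed=0):
    """Maximize `fitness` with the update rules of Eqs. (3) and (4)."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Step 1.1: random initial locations x_id.
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    # Step 1.2: initial velocities v_id (assumption: small random values).
    v = [[rng.uniform(-v_max, v_max) for _ in range(dim)]
         for _ in range(n_particles)]
    # Individual best locations p_id and their fitness values.
    p = [xi[:] for xi in x]
    p_fit = [float("-inf")] * n_particles
    # Step 1.3: social best location p_gd and its fitness value.
    p_g, g_fit = x[0][:], float("-inf")

    for _ in range(n_cycles):                 # Step 2.1
        for i in range(n_particles):          # Step 2.2
            f = fitness(x[i])                 # Step 2.2.1
            if f > p_fit[i]:
                p[i], p_fit[i] = x[i][:], f
            if f > g_fit:
                p_g, g_fit = x[i][:], f
            for d in range(dim):              # Step 2.2.2
                # Eq. (3): two independent random draws, rand() and Rand().
                v[i][d] += (c1 * rng.random() * (p[i][d] - x[i][d])
                            + c2 * rng.random() * (p_g[d] - x[i][d]))
                # Assumption: clamp velocity so particles do not diverge.
                v[i][d] = max(-v_max, min(v_max, v[i][d]))
                # Eq. (4), with the location kept inside the search bounds.
                x[i][d] = max(lo, min(hi, x[i][d] + v[i][d]))
    return p_g, g_fit
```

For example, `pso(lambda pos: -sum((a - 0.5) ** 2 for a in pos))` searches the unit cube for the point closest to (0.5, 0.5, 0.5). Eq. (3) as given has no inertia weight, so the clamp is the only thing limiting step size in this sketch.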
In this paper, the fuzzy inference engine serves as the fitness function of PSO. The results of the fuzzy inference engine are used to infer the semantic similarity between each webpage and each instance stored in the ontology. Table 1 shows the fuzzy rules predefined by domain experts.
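The three inputs to the fuzzy inference engine come from the Coordinate Transfer Service of Section III.B, which can be sketched as below. The sketch rests on several assumptions: Eq. (2) is read as the sum of the hyperlink term frequencies plus the sum of the HTML head tag term frequencies, the semantic relation value of Eq. (1) is supplied by the caller because it depends on the ACO-derived semantic distance, and the names `PageStats`, `context_relation`, and `to_coordinate` as well as the numbers in the usage example are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PageStats:
    """Per-webpage statistics (hypothetical container)."""
    url: str
    semantic_relation: float   # X-axis: value of Eq. (1), assumed precomputed
    hyperlink_tf: List[int]    # HLF_i: term frequency of each hyperlink
    head_tag_tf: List[int]     # HTF_j: term frequency of each HTML head tag
    term_frequency: float      # Z-axis: context information value


def context_relation(page: PageStats) -> int:
    """Eq. (2), read as the sum over HLF_i plus the sum over HTF_j."""
    return sum(page.hyperlink_tf) + sum(page.head_tag_tf)


def to_coordinate(page: PageStats) -> Tuple[float, float, float]:
    """Map a webpage to its (X, Y, Z) point in the 3D space."""
    return (
        page.semantic_relation,
        float(context_relation(page)),
        page.term_frequency,
    )


# Usage with invented numbers for one of the sites plotted in Figure 3.
example = PageStats(
    url="http://taiwan.net.tw",
    semantic_relation=120.0,
    hyperlink_tf=[1, 0, 2],
    head_tag_tf=[1],
    term_frequency=0.6,
)
point = to_coordinate(example)
```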
Figure 3. Apply coordinate transfer service to convert webpage to coordinate. [3D plot with axes X, Y, and Z. Plotted webpages: http://taiwan.net.tw, http://www.tw-food.com.tw, http://www.tw-food.com.tw/action/index.php?type_id=3, http://blog.mjjq.com/archives/1931.html, http://blog.roodo.com/subing/archives/239360.html, http://taiwan.net.tw/m1.aspx?sNO=0000106, http://www.taiwanfun.com/north/taipei/dining/indexTW.htm, http://www.tcff.com.tw/, http://www.always-free.com/5252food/]
Table 1. Fuzzy rules of the fuzzy inference engine.
Rule No | Semantic Relation Similarity (SRS) | Context Relation Similarity (CRS) | Context Information Strength (CIS)
1 | VeryHigh | VeryHigh | VeryHigh
2 | VeryHigh | High | VeryHigh
3 | VeryHigh | Medium | VeryHigh
4 | VeryHigh | Low | High
5 | VeryHigh | VeryLow | High
6 | High | VeryHigh | VeryHigh
7 | High | High | VeryHigh
8 | High | Medium | High
9 | High | Low | Medium
10 | High | VeryLow | Medium
11 | Medium | VeryHigh | VeryHigh
12 | Medium | High | High
13 | Medium | Medium | High
14 | Medium | Low | Medium
15 | Medium | VeryLow | Low
16 | Low | VeryHigh | Medium
17 | Low | High | Medium
18 | Low | Medium | Low
19 | Low | Low | Low
20 | Low | VeryLow | VeryLow
21 | VeryLow | VeryHigh | Medium
22 | VeryLow | High | Low
23 | VeryLow | Medium | Low
24 | VeryLow | Low | VeryLow
25 | VeryLow | VeryLow | VeryLow
Figure 4. Membership functions of the fuzzy variables: (a) Semantic Relation Similarity (SRS), (b) Context Relation Similarity (CRS), and (c) Context Information Strength (CIS). [Each panel plots Membership Degree (0 to 1) against the variable: Location Distance with ticks 0, Mean-100, Mean-50, Mean, Mean+50, Mean+100 for SRS; Context Relation with ticks 0, 1, 2, 3 for CRS; Context Information with ticks 0, 0.2, 0.4, 0.6, 0.8, 1 for CIS.]
The linguistic terms of the input fuzzy variable SRS are VeryHigh, High, Medium, Low, and VeryLow. Fig. 4(a) shows the membership functions for fuzzy sets SRS_VeryHigh(x: 0, 0, Mean-100, Mean-50), SRS_High(x: Mean-100, Mean-50, Mean-50, Mean), SRS_Medium(x: Mean-50, Mean, Mean, Mean+50), SRS_Low(x: Mean, Mean+50, Mean+50, Mean+100), and SRS_VeryLow(x: Mean+50, Mean+100, Mean+100, Mean+100). The semantic relation similarity represents the semantic relation between each webpage and the instances stored in the ontology. In Fig. 4(a), the value of Mean is selected as the threshold value for determining the membership functions of the fuzzy variable SRS; Mean is the average location distance of all instances stored in the ontology. Fig. 4(b) shows the membership functions for fuzzy sets CRS_Low(x: 0, 0, 1, 2), CRS_Medium(x: 1, 2, 2, 3), and CRS_High(x: 2, 3, 3, 3). Fig. 4(c) displays the membership functions for fuzzy sets CIS_VeryLow(x: 0, 0, 0, 0.4), CIS_Low(x: 0.2, 0.4, 0.4, 0.6), CIS_Medium(x: 0.4, 0.6, 0.6, 0.8), CIS_High(x: 0.6, 0.8, 0.8, 1), and CIS_VeryHigh(x: 0.8, 1, 1, 1). If the value of CIS is high, then the membership degree for the CI similarity is high [3].
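The fuzzy sets quoted for Figure 4 can be evaluated with a small trapezoidal membership function. The sketch below assumes that each four-number tuple (a, b, c, d) lists the trapezoid's left foot, left shoulder, right shoulder, and right foot, and that Mean is at least 100 so that the SRS breakpoints stay ordered. The names `trapezoid`, `srs_sets`, and `fuzzify` are hypothetical, and only the fuzzification step is shown, not the rule evaluation of Table 1.

```python
def trapezoid(x, a, b, c, d):
    """Membership degree of x in the trapezoidal fuzzy set (a, b, c, d)."""
    if b <= x <= c:
        return 1.0                      # plateau (a single point when b == c)
    if a < x < b:
        return (x - a) / (b - a)        # rising edge
    if c < x < d:
        return (d - x) / (d - c)        # falling edge
    return 0.0


# Fuzzy sets for CRS and CIS, with the parameters quoted in the text.
CRS_SETS = {
    "Low": (0, 0, 1, 2),
    "Medium": (1, 2, 2, 3),
    "High": (2, 3, 3, 3),
}
CIS_SETS = {
    "VeryLow": (0, 0, 0, 0.4),
    "Low": (0.2, 0.4, 0.4, 0.6),
    "Medium": (0.4, 0.6, 0.6, 0.8),
    "High": (0.6, 0.8, 0.8, 1),
    "VeryHigh": (0.8, 1, 1, 1),
}


def srs_sets(mean):
    """SRS fuzzy sets; `mean` is the average location distance (>= 100)."""
    return {
        "VeryHigh": (0, 0, mean - 100, mean - 50),
        "High": (mean - 100, mean - 50, mean - 50, mean),
        "Medium": (mean - 50, mean, mean, mean + 50),
        "Low": (mean, mean + 50, mean + 50, mean + 100),
        "VeryLow": (mean + 50, mean + 100, mean + 100, mean + 100),
    }


def fuzzify(x, sets):
    """Return the membership degree of x in every fuzzy set of a variable."""
    return {label: trapezoid(x, *params) for label, params in sets.items()}
```

For instance, `fuzzify(2.5, CRS_SETS)` places a context relation value of 2.5 halfway between the Medium and High sets.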
Once the fuzzy inference engine has finished, its results are passed to the sorting mechanism, which sorts the semantic similarity between each webpage and the ontology in descending order. All sorted results are stored in the semantic similarity repository, from which OIWMA provides the appropriate travel information to the tourist. This also reduces the travel agency's workload and speeds up the tourist's access to travel information.
IV. CONCLUSION
This paper has presented the ontology-based intelligent web mining agent for Taiwan travel, which comprises a web information pre-processing agent and a semantic analysis agent. The proposed agent collects Taiwan travel information from the World Wide Web according to the tourist's requirements. With this information, the tourist can learn about all kinds of Taiwan travel information conveniently and easily. Some problems remain for future study. For example, genetic learning or a neural network will be added to the fuzzy inference to enhance the proposed method. Moreover, the domain ontology for Taiwan travel can be applied to other fields.
REFERENCES
[1] C. S. Lee, M. H. Wang, J. J. Chen, and C. Y. Hsu, “Ontology-based Intelligent Decision Support Agent for CMMI Project Monitoring and Control,” International Journal of Approximate Reasoning, vol. 48, no. 1, pp. 62-76, 2008.
[2] C. S. Lee, M. H. Wang, W. C. Sun, and Y. C. Chang, “Intelligent Healthcare Agent for Food Recommendation at Tainan City,” IEEE International Conference on Systems, Man, and Cybernetics (SMC 2008), pp. 1465-1470, 2008.
[3] C. S. Lee, Y. C. Chang, and M. H. Wang, “Ontological recommendation multi-agent for Tainan City travel,” Expert Systems with Applications, vol. 36, no. 3, part 2, pp. 6740-6753, 2009.
[4] G. Fenza, V. Loia, and S. Senatore, “A hybrid approach to semantic web services matchmaking,” International Journal of Approximate Reasoning, vol. 48, no. 3, pp. 808-828, 2008.
[5] J. Kennedy and R. Eberhart, “Particle Swarm Optimization,” Proceedings of IEEE International Conference on Neural Networks, vol. 4, pp. 1942-1948, 1995.
[6] M. Dorigo, M. Birattari, and T. Stutzle, “Ant Colony Optimization,” IEEE Computational Intelligence Magazine, vol. 1, no. 4, pp. 28-39, 2006.
[7] T. B. Lee, “The Semantic Web Revisited,” IEEE Intelligent Systems, vol. 21, no. 3, pp. 96-101, 2006.
[8] Taiwan Tourism Bureau. Retrieved Mar. 5, 2009, from the World Wide Web: http://www.taiwan.net.tw/
[9] W. S. Lo, T. P. Hong, and R. Jeng, “A framework of E-SCM multi-agent systems in the fashion industry,” International Journal of Production Economics, vol. 114, no. 2, pp. 594-614, 2008.
[10] S. Y. Yang, “OntoPortal: An ontology-supported portal architecture with linguistically enhanced and focused crawler technologies,” Expert Systems with Applications, vol. 36, no. 6, pp. 10148-10157, 2009.