Mining web data to create online navigation recommendations

Juan D. Velásquez1,3, Alejandro Bassi2,4, Hiroshi Yasuda1 and Terumasa Aoki1

1 Research Center for Advanced Science and Technology, University of Tokyo, {jvelasqu,yasuda,aoki}@mpeg.rcast.u-tokyo.ac.jp
2 Center for Collaborative Research, University of Tokyo, [email protected]
3 Department of Industrial Engineering, University of Chile
4 Department of Computer Science, University of Chile
Abstract

Depending on the complexity of a web site, searching for information can be difficult for a large number of users. Some web sites implement systems that provide online navigation recommendations to their users. In many cases, these recommendations are based on personal user information. In this paper, a system that provides navigation recommendations for web visitors is introduced. We call a visitor an anonymous user, i.e., only data about his/her browsing behavior (web logs) are available. We apply a clustering technique over the web data and, from the significant patterns discovered, a set of rules about how to use them is created. Then, by comparing the current web visitor session with the patterns, an online navigation recommendation is created using these rules. The system was tested with web data from a real web site, showing its effectiveness.
1 Introduction

From the beginning of the Web, web site designers have tried to create sites whose structure and content make it easy to find the information a user is looking for. However, reality is often quite different. In many cases the site structure hides the desired information, even though it is contained in the web site. In fact, some web sites incorporate too much content in a single page. This, aggravated by a bad structure, makes users feel "lost in hyperspace", unable to find the specific content they want. This depends on the kind of user: sometimes a web site is incomprehensible for some users but totally clear for others. Nevertheless, the structure of a web site should be clear for the majority of users.

In order to improve the relationship between user and web site, some sites have incorporated systems that provide online navigation recommendations. This is the case of Amazon1, where the user receives recommendations about other items related to her/his query.

This paper introduces online navigation recommendations for web site visitors, based on their past navigation behavior. Here the word visitor refers to the occasional user of a web site, about whom no personal information is available.

This paper is organized as follows. Section 2 presents related work. Section 3 describes the knowledge base proposed to create online navigation recommendations. The method to create and validate the recommendations is explained in section 4. A real-world application is shown in section 5, and the conclusions are presented in section 6.

2 Related work
The ideal structure of a web site should help visitors find the information they are looking for. However, sometimes the site structure does not serve this intention [14]. Some research initiatives aim to understand the visitor browsing behavior [13, 22] in order to create more attractive web sites, personalizing their structure and content. These initiatives aim at facilitating web site navigation and, in the case of commercial sites, at increasing market share, transforming visitors into customers, increasing customer loyalty and predicting their preferences.
1 http://www.amazon.com
2.1 Web personalization

Web personalization is the process whereby the web server and the related applications, mainly CGI programs2, dynamically customize the content (pages, items, browsing recommendations, etc.) shown to the user, based on information about his/her behavior in a web site [11, 12]. This is different from a related concept called "customization", where the visitor interacts with the web server using an interface to create her/his own web site, e.g., "My Banking Page" [20].

The key to web personalization is to understand the visitor's desires and needs. This allows designing and constructing the information repositories from the data of the visitor transactions, in order to predict the correct supply of products and services [1]. Personalization requires recognizing patterns in the visitor behavior, comparing them with the patterns of new visitors, and making suggestions. A specific model of the visitor behavior and a measure that allows comparing two behaviors are therefore required.
2.2 Web recommendations

There are two categories of changes in a web site: structural and content changes [14]. The structural ones include the addition and elimination of links. The content ones are mainly modifications of the free text; variations of other objects, like colors, pictures, etc., can be considered as well. Due to the risk of directly applying the changes proposed by an automatic system, these are performed through recommendations for web users.

Recommendations can be grouped in two categories: online and offline. Online recommendations mainly consist of navigation suggestions shown at the bottom of the web page [2]. This is a non-invasive scheme which gives the user the possibility of following the suggestion or not. Offline recommendations are targeted at the web master. They concern the addition or elimination of links and changes in the web content. This is also a non-invasive scheme, because the web master can accept or reject the recommendations. In this work, we focus on online navigation recommendations.
2.3 Mining web data

Web mining techniques are the application of data mining theory to discover patterns from WWW data [4, 10, 17]. This is a non-trivial task, considering that the web is a huge collection of heterogeneous, unlabelled, distributed, time-variant, semi-structured and high-dimensional data [15]. Web mining comprises three important steps: preprocessing, pattern discovery and pattern analysis [18]. For the different kinds of web data, a common terminology is defined as follows:

Content. The web page content, i.e., pictures, free text, sounds, etc.

Structure. Data that describes the internal web page structure. In general these are HTML or XML tags, some of which contain information about hyperlink connections to other web pages.

Usage. Data that describes the visitor preferences while browsing a web site. They can be found in web log files.

User profile. A collection of information about a user, combining demographic information (e.g. name, age), usage information (e.g. pages visited) and interests.

According to the above definitions, web mining techniques can be grouped in three areas: Web Structure Mining (WSM), Web Content Mining (WCM) and Web Usage Mining (WUM). In the next section, the model of the visitor browsing behavior used in this work is described.

2 Common Gateway Interface, http://www.msg.net/tutorial/cgi/
2.4 Modeling the visitor browsing behavior

Our model of the visitor behavior uses three variables: the sequence of visited pages, their contents and the time spent on each page. The model is based on an n-dimensional visitor behavior vector, defined as follows.

Definition 1 (Visitor Behavior Vector) υ = [(p1, t1), ..., (pn, tn)], where the pair (pi, ti) represents the i-th page visited (pi) and the percentage of time spent on it within the session (ti), respectively.

Let α and β be two visitor behavior vectors of dimension Cα and Cβ, respectively. A similarity measure has been proposed elsewhere to compare visitor sessions [22]:

sm(α, β) = sg(Gα, Gβ) · (1/η) · Σ_{k=1}^{η} st(tα,k, tβ,k) · sp(pα,k, pβ,k)    (1)

where η = min{Cα, Cβ} and sp(pα,k, pβ,k) is the similarity between the k-th page of vector α and the k-th page of vector β. We calculate this similarity by comparing the text content of the web pages. This comparison needs a previous step: the representation of the web page text content as a feature vector. Using the vector space model, the web site is represented by a matrix A = (aij) of dimension R×Q, for i = 1, ..., R and j = 1, ..., Q, where aij is the weight of the i-th word in the j-th page. To calculate these weights, we use a variant of the tf*idf weighting [21], defined as

aij = fij · (1 + sw(i)) · log(Q / ni)    (2)

where fij is the number of occurrences of the i-th word in the j-th page, sw(i) is a factor that increases the importance of special words, and ni is the number of documents containing the i-th word. A page pj is then represented by the j-th column of A, i.e., pj → (a1j, ..., aRj), and the similarity between pages is

sp(pi, pj) = Σ_{k=1}^{R} aki·akj / ( sqrt(Σ_{k=1}^{R} (aki)²) · sqrt(Σ_{k=1}^{R} (akj)²) )    (3)

The term st(tα,k, tβ,k) = min{ tα,k/tβ,k , tβ,k/tα,k } is the similarity between the times spent on the k-th page by each of the two visitors. The term sg is the similarity between the sequences of pages visited by the two visitors [16]. As an example, let us assume that the respective navigation graphs of the two visitors are Gα = {1 → 2, 2 → 6, 2 → 5, 5 → 8} and Gβ = {1 → 3, 3 → 6, 3 → 7}. The sets of pages (nodes) in the graphs are E(Gα) = {1, 2, 5, 6, 8} and E(Gβ) = {1, 3, 6, 7}, with cardinalities ‖E(Gα)‖ = 5 and ‖E(Gβ)‖ = 4, respectively. From Gα and Gβ, the visitor navigation sequences Sα = S(Gα) = (1, 2, 6, 5, 8) and Sβ = S(Gβ) = (1, 3, 6, 7) are derived. The graph similarity is defined as

sg(Gα, Gβ) = 1 − 2 · Ł(Sα, Sβ) / ( ‖E(Gα)‖ + ‖E(Gβ)‖ )    (4)
where Ł(Sα, Sβ) is the Levenshtein distance, also known as edit distance [9]. It is the minimum number of transformations necessary to convert Sα into Sβ. If both graphs are equal, then Ł(Sα, Sβ) = 0 and sg(Gα, Gβ) = 1; in the case of disjoint graphs, sg(Gα, Gβ) = 0. In (1), sm(α, β) = 1 if and only if the visitors view pages with the same content, in the same sequence order, and spend the same percentage of time on each k-th page within their sessions.
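To make the measure concrete, the following is a minimal, self-contained Python sketch of equations (1)-(4). It assumes the page feature vectors of equation (2) have already been computed; the helper names and the approximation of S(G) by the order of distinct pages in a session are ours, not part of the original system.

import math

def sp(a, b):
    """Cosine similarity between two page feature vectors (eq. 3)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def st(ta, tb):
    """Similarity between the time percentages spent on the k-th page."""
    return min(ta / tb, tb / ta) if ta > 0 and tb > 0 else 0.0

def levenshtein(s, t):
    """Edit distance between two page sequences [9]."""
    prev = list(range(len(t) + 1))
    for i, si in enumerate(s, 1):
        curr = [i]
        for j, tj in enumerate(t, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (si != tj)))
        prev = curr
    return prev[-1]

def navigation_sequence(session):
    """Order of distinct pages in a session; approximates S(G) in the text."""
    seen, seq = set(), []
    for page, _ in session:
        if page not in seen:
            seen.add(page)
            seq.append(page)
    return seq

def sg(seq_a, seq_b):
    """Sequence similarity between two navigation graphs (eq. 4), clamped at 0."""
    return max(0.0, 1 - 2 * levenshtein(seq_a, seq_b) / (len(set(seq_a)) + len(set(seq_b))))

def sm(alpha, beta, vectors):
    """Visitor session similarity (eq. 1).
    alpha, beta: lists of (page_id, time_percentage) pairs.
    vectors: dict page_id -> feature vector from eq. (2)."""
    eta = min(len(alpha), len(beta))
    s = sum(st(alpha[k][1], beta[k][1]) * sp(vectors[alpha[k][0]], vectors[beta[k][0]])
            for k in range(eta))
    return sg(navigation_sequence(alpha), navigation_sequence(beta)) * s / eta

For the example above, sg applied to the sequences (1, 2, 6, 5, 8) and (1, 3, 6, 7) gives 1 − 2·3/9 = 1/3, since the edit distance between the two sequences is 3.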
3 A knowledge base for navigation recommendations

A web mining tool developed for a particular web site allows discovering significant patterns about the visitor behavior and her/his preferences. However, the collaboration of an expert is required to validate the patterns and give a short description of how to use them.

This approach has pros and cons; for instance, what happens if the expert decides to leave the institution? Usually the expert takes the knowledge with him. This scenario is a motivation to find a method for knowledge maintenance [5], i.e., for storing and retrieving the knowledge discovered about a specific subject. In this sense, we need a Knowledge Base (KB) as a repository of the discovered patterns and of how to use them in the knowledge representation for a user.

3.1 General structure

Web mining techniques have the potential to reveal interesting patterns about the visitor behavior in a given web site by analyzing web data [22]. Using his/her expertise, an expert in the respective domain can interpret these patterns and build rules for a given task, in our case for online navigation suggestions. The Knowledge Base [3] represents this knowledge through "if-then-else" rules based on the discovered patterns. Figure 1 shows the general structure of the proposed Knowledge Base. It is composed of a Pattern Repository, where the discovered patterns are stored, and a Rule Repository, where the general rules about how to use the patterns are stored.
Figure 1. A Knowledge Base for visitor browsing behavior (the Rule Repository and the Pattern Repository, with its Browsing_Behavior, Time and Wmt dimension tables, matched against the current visitor's browsing to produce an online navigation recommendation)
In order to use the KB, when a pattern is presented, a matching process is performed to find the most similar pattern in the Pattern Repository. With this information, the set of rules that will create the suggestion is selected from the Rule Repository. The potential users of the Knowledge Base are human beings and automated inference engines that use expert knowledge in their inference process. In this work, the knowledge is used to suggest online navigation steps during a visit.
3.2 Pattern repository

It stores the patterns revealed from the web data by applying web mining techniques. Figure 1 shows a generic model of the Pattern Repository, which is based on data mart technology over a Relational Data Base Management System (RDBMS). In the fact table, the measures are navigation and statistics. Both are non-additive [8] and contain the web page navigation suggestions and related statistics, such as the percentage of visits in the period of study. The dimension table Time contains the date when the web mining technique was applied to the information repository. The Browsing_Behavior table contains the comparison patterns and the formula of the specific web mining technique, explained in the description column. The period column is the period in which the analyzed data were generated, e.g. "01-Jan-2003 to 31-Mar-2003". Finally, the table Wmt (web mining technique) stores the applied mining technique, e.g., "classical SOFM, SOFM with toroidal architecture, k-means, etc."

When the KB is consulted, the Pattern Repository returns a set of possible web pages to be suggested. Based on this set and additional information, such as statistics of web page accesses, the Rule Repository makes the final recommendations.
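As an illustration only (not the authors' implementation), the star schema of Figure 1 could be laid out on a relational engine as in the following sketch; table and column names follow Figure 1, while the data types and the use of SQLite are assumptions.

import sqlite3

conn = sqlite3.connect("pattern_repository.db")
conn.executescript("""
-- Dimension tables (names follow Figure 1; column types are illustrative).
CREATE TABLE time (
    id_time INTEGER PRIMARY KEY,
    year INTEGER, month INTEGER, week INTEGER, day INTEGER, hour INTEGER
);
CREATE TABLE browsing_behavior (
    id_bb INTEGER PRIMARY KEY,
    pattern TEXT,      -- e.g. a serialized cluster centroid
    formula TEXT,      -- similarity measure used, e.g. 'sm, equation (1)'
    period TEXT,       -- e.g. '01-Jan-2003 to 31-Mar-2003'
    description TEXT
);
CREATE TABLE wmt (
    id_wmt INTEGER PRIMARY KEY,
    technique TEXT,    -- e.g. 'SOFM with toroidal architecture, 32x32 neurons'
    description TEXT
);
-- Fact table holding the two non-additive measures.
CREATE TABLE pr_fact (
    id_time INTEGER REFERENCES time(id_time),
    id_wmt  INTEGER REFERENCES wmt(id_wmt),
    id_bb   INTEGER REFERENCES browsing_behavior(id_bb),
    navigation TEXT,   -- suggested web page
    statistics REAL    -- e.g. percentage of visits in the study period
);
""")
conn.commit()

A production deployment would normally live in the institution's own RDBMS rather than SQLite; the point of the sketch is only the star layout queried by the "star join" of Figure 2.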
3.3 Rule Repository

The goal of the Rule Repository is to recommend pages from the current web site, i.e., to make navigation suggestions. Using an online detection mechanism such as a cookie, the visited pages and the time spent on each of them during a session can be obtained. Using these data, we first match the current visit against the visit patterns stored in the Pattern Repository. This requires a minimum number of visited pages to understand the current visitor's behavior. Then the Rule Repository is applied to the matching results to give an online navigation suggestion about the next page to be visited, based on the history of previous navigation patterns.

If the suggested page is not directly connected to the current page, but the suggestion is accepted by the visitor, then new knowledge can be generated about his/her preferences. This can be used to reconfigure the web site by reorganizing the links among the pages. Figure 2 shows part of the rule set of the Rule Repository.
select navigation, statistics into S
from pr_fact, time, browsing_behavior, wmt
where "star join" and "fix technique" and "fix time"
and formula(pattern, n) > ε;
...
if S is empty then
  send("no suggestion");
...
while S not empty loop
  if S.navigation ∉ actual web site then
    S.navigation = compare_page(ws, S.navigation);
  ...
  if S.navigation ≠ last page visited and S.statistics > γ then
    send(S.navigation);
  ...
end loop
Figure 2. Rule Repository's pseudo-code

The parameter ε identifies those patterns that are "close enough" to the current visit. The parameter γ retains only recommendations whose statistics are above a given threshold, e.g., the page should account for a minimum percentage of visits. Since the Pattern Repository contains historical information, a suggested page may no longer appear in the current web site. In that case, the function "compare_page" determines the page of the current web site whose content is most similar to that of the suggested page (by using equation 3).
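For illustration, the logic of Figure 2 can be re-expressed in Python as below. This is a hedged sketch, not the deployed code: the SQL retrieval step is replaced by an in-memory list of (page, statistic) candidates already filtered by ε, and compare_page reuses the page similarity sp sketched in section 2.4.

def compare_page(current_site_vectors, suggested_vector, sp):
    """Map a historical page to the most similar page of the current site (eq. 3)."""
    return max(current_site_vectors,
               key=lambda p: sp(current_site_vectors[p], suggested_vector))

def online_suggestions(candidates, current_site_vectors, historical_vectors,
                       last_page, sp, gamma):
    """candidates: (page_id, statistic) pairs retrieved from the Pattern Repository
    for patterns whose similarity to the current visit exceeds epsilon."""
    suggestions = []
    for page, stat in candidates:
        if page not in current_site_vectors:
            # The historical page no longer exists in the current site:
            # replace it by the most similar current page.
            page = compare_page(current_site_vectors, historical_vectors[page], sp)
        if page != last_page and stat > gamma:
            suggestions.append(page)
    return suggestions or ["no suggestion"]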
4 Creating the navigation recommendation

An online navigation recommendation is a set of web pages belonging to the web site, shown to the visitor as a suggestion based on her/his browsing behavior.
4.1 Extracting significant patterns

A simple analysis of the visitors' browsing behavior in a web site shows that in a session only a subset of the total number of pages is visited. It is therefore feasible to group comparable visitor sessions [12, 23]. Cluster identification is sometimes a difficult process, because it depends on a subjective definition, i.e., on what we understand by a cluster [19]. However, two approaches give an estimation of when a group of points can be considered a cluster [6]. In the kernel function approach, the influence of a data point on its neighborhood is considered; in the density function approach, the sum of the influences of all data points is computed. In both cases, cluster identification depends on a parameter whose value depends on the specific problem studied. On the other hand, cluster interpretation is a subjective task: while for some persons a cluster makes no sense, others discover new knowledge in it. It is therefore convenient to have the assistance of a business expert to obtain a reasonably good interpretation.
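For intuition only, a minimal sketch of the two estimations mentioned above, using a Gaussian kernel; the width sigma and the density threshold are the problem-dependent parameters referred to in the text.

import math

def influence(x, y, sigma=1.0):
    """Gaussian influence (kernel function) of data point y on point x."""
    d2 = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-d2 / (2 * sigma ** 2))

def density(x, data, sigma=1.0):
    """Density at x: the sum of the influences of all data points."""
    return sum(influence(x, y, sigma) for y in data)

# Points whose density exceeds a problem-dependent threshold can be regarded
# as belonging to a cluster (density-based view of [6]).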
Considering the above difficulties, clustering techniques have been used in several studies [7, 13, 23], underlining their benefits for visitor behavior analysis. In this work, the clusters are extracted with a Self-Organizing Feature Map, as described in section 5.1.
4.2 How to use the patterns?

Given the discovered clusters, we can classify the current visitor browsing behavior into one of them by comparing the cluster centroids with the current navigation, using the similarity measure introduced in (1). Let α = [(p1, t1), ..., (pm, tm)] be the current visitor session and Cα = [(pα1, tα1), ..., (pαH, tαH)] the centroid that maximizes sm(α, Ci), with {Ci} the set of discovered centroids. The recommendations are created as a set of pages whose text content is related to pα_{m+1}. These pages are selected with the collaboration of the expert.

Let R_{m+1}(α) be the online navigation recommendation for the (m+1)-th page to be visited by visitor α, where δ ≤ m < H and δ is the minimum number of pages that must be visited before preparing a suggestion. Then we can write R_{m+1}(α) = {lα_{m+1,0}, ..., lα_{m+1,j}, ..., lα_{m+1,k}}, with lα_{m+1,j} the j-th page suggested for the (m+1)-th page to be visited by visitor α and k the maximum number of pages per suggestion. In this notation, lα_{m+1,0} represents the "no suggestion" state.
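A minimal sketch of this step, reusing the sm function sketched in section 2.4; the centroid dictionary and the expert-curated mapping from (cluster, position) to candidate pages are assumptions introduced here for illustration.

def closest_centroid(current_session, centroids, sm, vectors):
    """Centroid C maximizing sm(current_session, C) (section 4.2)."""
    return max(centroids, key=lambda c: sm(current_session, centroids[c], vectors))

def recommendation(current_session, centroids, related_pages, sm, vectors,
                   delta=3, k=3):
    """Return R_{m+1}: up to k pages related to the next page of the best matching
    centroid, or the 'no suggestion' state l_{m+1,0}.
    related_pages: expert-curated dict (cluster_id, position) -> candidate pages."""
    m = len(current_session)
    if m < delta:
        return ["no suggestion"]
    best = closest_centroid(current_session, centroids, sm, vectors)
    candidates = related_pages.get((best, m + 1), [])
    return candidates[:k] or ["no suggestion"]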
4.3 Recommendation effectiveness

Usually the application of any recommendation requires the approval of the web site owner. This is a complicated task, since the web site is often the core business of the institution and any change could mean a loss of market participation. For this reason, we propose a method to test the effectiveness of the recommendations based on the same web data used in the pattern discovery stage. We use a percentage of the complete web data to extract the significant patterns and to define the set of rules for them, and then we test the effectiveness on the remaining percentage of the web data.

Let ws = {p1, ..., pn} be the web site and the pages that compose it. Using the similarity introduced in (3) and with the collaboration of a web site content expert, we can define equivalence classes of pages, where pages belonging to the same class contain similar information. The classes partition the web site into disjoint subsets of pages. Let Clx be the x-th equivalence class of the web site; it satisfies ∀pz ∈ Clx, pz ∉ Cly for x ≠ y, and ∪_{x=1}^{w} Clx = ws, where w is the number of equivalence classes.

Let α = [(p1, t1), ..., (pH, tH)] be a visitor behavior vector from the test set. Based on the first m pages actually visited, the proposed system recommends several possibilities for the following page m+1, i.e., possible pages to be visited. We test the effectiveness of the suggestions made for the (m+1)-th page to be visited by visitor α with the following procedure. Let Clq be the equivalence class of p_{m+1}; if there exists lα_{m+1,j} ∈ R_{m+1}(α) such that lα_{m+1,j} ∈ Clq, with j > 0, then we assume the suggestion was successful.

By construction, the recommendation set could be large, confusing the visitor about which page to follow. We set k as the maximum number of pages per recommendation. Using the page similarity introduced in (3), we can extract the k pages closest to p_{m+1} from the recommendation:

E^k_{m+1}(α) = { lα_{m+1,j} ∈ sortk( sp(p_{m+1}, lα_{m+1,j}) ) }    (5)

where sp is the page similarity introduced in (3) and the "sortk" function sorts the results of sp in descending order and extracts the k linked pages whose similarity to p_{m+1} is highest. A particular case is E_{m+1}(α) = {lα_{m+1,0}}, i.e., no suggestion is proposed.
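The effectiveness test of section 4.3 can be sketched as follows, assuming the equivalence classes are given as a page-to-class mapping and that recommend() is a function like the one sketched in section 4.2; all names here are illustrative.

def top_k(recommended, next_page, vectors, sp, k=3):
    """E^k_{m+1}: the k recommended pages most similar to the page actually
    visited next (eq. 5), ranked by the page similarity sp of eq. (3)."""
    return sorted(recommended,
                  key=lambda l: sp(vectors[l], vectors[next_page]),
                  reverse=True)[:k]

def acceptance_rate(test_sessions, recommend, page_class, m=3):
    """Fraction of test sessions whose (m+1)-th page falls in the same
    equivalence class as one of the suggested pages."""
    hits = total = 0
    for session in test_sessions:        # sessions are lists of (page, time) pairs
        if len(session) <= m:
            continue
        next_page = session[m][0]        # the page actually visited in position m+1
        suggested = recommend(session[:m])
        if suggested == ["no suggestion"]:
            continue
        total += 1
        if any(page_class.get(s) == page_class.get(next_page) for s in suggested):
            hits += 1
    return hits / total if total else 0.0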
5 A practical application

We applied our methodology to web data originating from a real web site. It belongs to a financial institution, the Chilean virtual bank Tbanc3 (Technological Bank). It is the first of its kind, i.e., it does not have physical branches and all transactions are made through electronic means, like e-mails, portals, etc. The site is written in Spanish, has 217 static web pages and approximately eight million web log registers, corresponding to the period January to March, 2003. In the creation of the visitor behavior vectors, we only considered sessions with three or more visited pages; sessions with ten or more pages were considered not to be real sessions, being caused by the action of firewalls, crawlers, etc.

3 http://www.tbanc.cl/
5.1 Significant patterns extraction

We applied a Self-Organizing Feature Map (SOFM), assuming that the maximum number of visitor behavior vector components is H = 6. Vectors with fewer components (at least three) were completed with zeros. The SOFM receives the 6 components as input and has 32x32 output neurons. In the training task, 80% of the total visitor behavior vectors were used, approximately 220,000. Figure 3 shows the result, where four main clusters can be identified.
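The paper's tool applies a toroidal SOFM over the 6-component visitor behavior vectors. The following is only a rough NumPy sketch of toroidal SOFM training, written to show how the wrap-around neighborhood can be handled; the Euclidean matching and the learning-rate and neighborhood schedules are simplifying assumptions (the actual tool relies on the similarity measure of section 2.4).

import numpy as np

def train_toroidal_sofm(data, grid=32, epochs=10, lr0=0.5, sigma0=8.0, seed=0):
    """data: array of shape (n_samples, n_features), e.g. the visitor behavior
    vectors padded with zeros when fewer than six pages were visited."""
    rng = np.random.default_rng(seed)
    weights = rng.random((grid, grid, data.shape[1]))
    ii, jj = np.meshgrid(np.arange(grid), np.arange(grid), indexing="ij")
    n_steps = epochs * len(data)
    for step in range(n_steps):
        x = data[rng.integers(len(data))]
        # Best matching unit (winner neuron).
        d = np.linalg.norm(weights - x, axis=2)
        bi, bj = np.unravel_index(np.argmin(d), d.shape)
        # Toroidal grid distance: wrap around both neuron axes.
        di = np.minimum(np.abs(ii - bi), grid - np.abs(ii - bi))
        dj = np.minimum(np.abs(jj - bj), grid - np.abs(jj - bj))
        dist2 = di ** 2 + dj ** 2
        # Linearly decaying learning rate and neighborhood radius (assumed schedule).
        frac = step / n_steps
        lr = lr0 * (1 - frac)
        sigma = sigma0 * (1 - frac) + 1e-3
        h = np.exp(-dist2 / (2 * sigma ** 2))[:, :, None]
        weights += lr * h * (x - weights)
    return weights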
Figure 3. Clusters among important page vectors (winner-neuron frequency over the 32x32 map, neuron axes i and j)
The toroidal topology of the SOFM maintains the continuity of the clusters, which allows studying the transitions of visitor preferences from one cluster to another. The data mining tool interacts with the data mart through a stream created by the previous execution of a Perl script, which receives as input the time range to be analyzed.

The clusters shown in figure 3 are presented in more detail in table 1. The second and third columns of this table contain the center neurons (winner neurons) of each cluster, representing the visited pages and the time spent on each of them.

Table 1. Visitor behavior clusters
Cluster   Pages visited               Time spent in seconds
A         (78,81,150,193,138,81)      (7,61,43,18,82,93)
B         (157,178,187,102,115,1)     (12,87,98,105,42,3)
C         (4,18,29,41,141,205)        (8,39,113,138,149,58)
D         (127,116,129,130,47,65)     (37,51,41,63,105,23)
The pages in the web site were labeled with numbers to facilitate the analysis. Table 2 shows the main content of each group of pages.

Table 2. Web pages and main topics
Pages           Content
1               Home page
2, ..., 65      Products and services
66, ..., 98     Institutional agreements
99, ..., 115    Remote services
116, ..., 130   Credit cards
131, ..., 155   Promotions
156, ..., 184   Investments
185, ..., 217   Supply of credits

A detailed cluster analysis, together with the interpretation of the particular web pages visited, revealed the following results:

• Cluster A. Visitors are interested in agreements between the bank and other institutions.
• Cluster B. Visitors are interested in investments and remote services offered by the bank.
• Cluster C. Visitors are interested in products and services offered by the bank.
• Cluster D. Visitors search for information about credit cards.
5.2 Storing the patterns

Figure 1 presents the Pattern Repository's general structure. In the case of the described application, it contains the following specific information:

Time. (2003, Sep, 3, 20, 16), i.e. "16:00 hours, September 20th, third week, year 2003".

Browsing Behavior. The clusters' centroids discovered by the web mining process and shown in table 1, as well as the formula shown in equation 1.

WMT. Self-Organizing Feature Map with toroidal architecture, 32x32 neurons.
5.3 Creating rules for navigation suggestions

First, we need to identify the current visitor session. Since the selected web site uses cookies, we could use this mechanism for online session identification. In order to prepare the online suggestion for the (m+1)-th page to visit, we compare the current session with the patterns in the Pattern Repository. The comparison needs a minimum of three visited pages (δ = 3) to determine the cluster centroid most similar to the current visit, allowing us to prepare the suggestion for the fourth page to be visited. This process can be repeated after the visitor has visited more pages, i.e., for the suggestion for the fifth page, we use the four pages visited in the current session.

The final online suggestion is made using the rule base developed together with the domain expert. For the four clusters found, sixteen rules were created. We suggest at most three pages for the fourth, fifth, and sixth page to visit, i.e., k = 3. A rule example is shown in table 3. It belongs to cluster "A", for the fourth page recommendation. In the example, "S" is a stack implemented using a simple linked list; the function Pop(S) extracts the next element of "S". This data structure contains the result of the star query shown in table 3. From the final set of pages in "L", Extracted_Three_Links implements expression (5) and extracts a subset with a maximum of three links, based on the associated statistic. The default is the "no suggestion" state.
Table 3. An example of rule operation
select navigation, statistics into S
from pr_fact, time, browsing_behavior, wmt
where "star join" and "fix technique" and "fix time"
and sm(pattern, current_visitor) > ε;
...
CA → [(78,7),(81,61),(150,43),(193,18),(138,82),(81,93)]
α → [(60,12),(92,70),(142,22)]   % current visitor
ws → {p1, ..., p217}   % current web site pages
S.navigation → {p200, p121, p187, p128, p212, p112}
S.statistic → {0.2, 2.2, 1.7, 0.7, 1.2, 0.9}
Case CA and SuggestionPage = 4:
  Prepare_suggestion(p0, 0, L);   % default "no suggestion"
  % L: link page suggestions, 0: associated statistic
  while S not null loop
    if (S.navigation not in ws) then
      S.navigation = compare_page(ws, S.navigation);
    elsif ((S.navigation ∉ {αp1, ..., αp3}) and (S.statistic > γ)) then
      Prepare_suggestion(S.navigation, S.statistic, L);
    end if;
    Pop(S);   % next element in S
  end loop;
  send(Extracted_Three_Links(L));   % L → {p121, p187, p212}

5.4 Testing the online navigation effectiveness

From the 20% of the visitor behavior vectors that belong to the test set, we selected only those with six real components, i.e., those for which it was not necessary to pad the vector with zeros to reach six components. Given this selection, we have 9,751 vectors to test the effectiveness of the online navigation suggestions.

Figure 4 shows a histogram with the percentage of accepted suggestions obtained with our validation method.

Figure 4. Percentage of acceptance of online navigation recommendations (acceptance versus the number of page links per suggestion, for the fourth, fifth and sixth page visited)

As can be seen, acceptance increases when more pages are suggested for each page visit. If, using the proposed methodology, just one page had been suggested, it would have been accepted in slightly more than 60% of the cases. This has been considered a very successful result by the business expert, since we are dealing with a complex web site with many pages, many links between pages, and a high rate of visitors that leave the site after a few clicks. Furthermore, it should be mentioned that the percentage of acceptance would probably have been even higher if we had actually suggested the respective page during the session. Since we are comparing past visits stored in log files, we could only analyze the behavior of visitors that did not actually receive the suggestions we propose.
6 Conclusions and future work

In this work, a method to create online navigation recommendations was introduced. It considers the extraction of significant navigation patterns from web data and the creation of rules about how to use them in the creation of the recommendations. In order to store the discovered knowledge, we defined a Knowledge Base composed of a Pattern Repository and a Rule Repository. The first is implemented using data mart technology, and the second as code written in a high-level language. Using the KB, the online navigation recommendations are created and tested against real web log data. Since the KB contains the knowledge discovered using 80% of the complete web data, it was possible to test the effectiveness on the remaining 20%. The results show that visitors would follow the recommendations in a high percentage of cases. As future work, we want to apply the online recommendations during the normal operation of a web site.
References

[1] B. Berendt and M. Spiliopoulou. Analysis of navigation behavior in web sites integrating multiple information systems. The VLDB Journal, 9:56–75, 2001.
[2] P. Brusilovsky. Adaptive web-based system: Technologies and examples. Tutorial, IEEE Web Intelligence Int. Conference, October 2003.
[3] M. Cadoli and F. M. Donini. A survey on knowledge compilation. AI Communications, 10(3-4):137–150, 1997.
[4] G. Chang, M. Healey, J. McHugh, and J. Wang. Mining the World Wide Web. Kluwer Academic Publishers, 2003.
[5] V. Devedzic. Knowledge discovery and data mining in databases. Technical report, School of Business Administration, University of Belgrade, Yugoslavia, 2002.
[6] A. Hinneburg and D. Keim. Advances in clustering and applications. Tutorial, ICDM Int. Conf., Melbourne, Florida, USA, November 2003.
[7] A. Joshi and R. Krishnapuram. On mining web access logs. pages 63–69, 2000.
[8] R. Kimball and R. Merx. The Data Webhouse Toolkit. Wiley Computer Publisher, New York, 2000.
[9] V. Levenshtein. Binary codes capable of correcting deletions, insertions and reversals. Sov. Phys. Dokl., pages 705–710, 1966.
[10] G. Linoff and M. Berry. Mining the Web. John Wiley & Sons, New York, 2001.
[11] Z. Lu, Y. Yao, and N. Zhong. Web Intelligence. Springer-Verlag, Berlin, 2003.
[12] B. Mobasher, R. Cooley, and J. Srivastava. Creating adaptive web sites through usage-based clustering of URLs. In Procs. IEEE Knowledge and Data Engineering Exchange, November 1999.
[13] B. Mobasher, T. Luo, Y. Sung, and J. Zhu. Integrating web usage and content mining for more effective personalization. In Procs. of the Int. Conf. on E-Commerce and Web Technologies, pages 165–176, Greenwich, UK, September 2000.
[14] J. Nielsen. User interface directions for the web. Communications of the ACM, 42(1):65–72, 1999.
[15] S. K. Pal, V. Talwar, and P. Mitra. Web mining in soft computing framework: Relevance, state of the art and future directions. IEEE Transactions on Neural Networks, 13(5):1163–1177, September 2002.
[16] T. A. Runkler and J. Bezdek. Web mining with relational clustering. International Journal of Approximate Reasoning, 32(2-3):217–236, February 2003.
[17] M. Spiliopoulou. Data mining for the web. In Principles of Data Mining and Knowledge Discovery, pages 588–589, 1999.
[18] J. Srivastava, R. Cooley, M. Deshpande, and P. Tan. Web usage mining: Discovery and applications of usage patterns from web data. SIGKDD Explorations, 1(2):12–23, 2000.
[19] S. Theodoridis and K. Koutroumbas. Pattern Recognition. Academic Press, 1999.
[20] C. Vassiliou, D. Stamoulis, and D. Martakos. The process of personalizing web content: techniques, workflow and evaluation. In Procs. Int. Conf. on Advances in Infrastructure for Electronic Business, Science and Education on the Internet, 2002.
[21] J. D. Velásquez, R. Weber, H. Yasuda, and T. Aoki. A methodology to find web site keywords. In Procs. IEEE Int. Conf. on e-Technology, e-Commerce and e-Service, pages 285–292, Taipei, Taiwan, March 2004.
[22] J. D. Velásquez, H. Yasuda, T. Aoki, and R. Weber. A new similarity measure to understand visitor behavior in a web site. IEICE Transactions on Information and Systems, Special Issue on Information Processing Technology for Web Utilization, E87-D(2):389–396, February 2004.
[23] J. D. Velásquez, H. Yasuda, T. Aoki, R. Weber, and E. Vera. Using self organizing feature maps to acquire knowledge about visitor behavior in a web site. Lecture Notes in Artificial Intelligence, 2773(1):951–958, September 2003.