International Journal of Computational Science 1992-6669 (Print) 1992-6677 (Online) © Global Information Publisher 2007, Vol. 1, No. 1, 27-39

A neuro-fuzzy collaborative filtering approach for Web recommendation

G. Castellano, A. M. Fanelli, and M. A. Torsello*

Department of Informatics, University of Bari, Via Orabona, 4 – 70126 Bari, Italy
{castellano,fanelli,torsello}@di.uniba.it

Abstract. Due to the growing variety and quantity of information available on the Web, there is an urgent need to develop web-based applications capable of adapting their services to the needs of the users. This is the main rationale behind the flourishing area of Web recommendation, which finds in Soft Computing techniques a valid tool to handle uncertainty in web usage data and to develop web-based applications tailored to users' preferences. In this context, we propose a neuro-fuzzy strategy that combines soft computing techniques to develop a Web recommendation system that dynamically suggests interesting URLs to the current user. As a preliminary step, user access logs are analyzed to identify user sessions. Then, groups of users that exhibit a common browsing behavior (i.e. user profiles) are discovered by applying a fuzzy clustering algorithm to the user sessions. Finally, a knowledge extraction process is carried out to derive associations between user profiles and relevant Web pages to be suggested to users. In particular, a hybrid approach based on the combination of fuzzy reasoning and the connectionist paradigm is proposed in order to derive knowledge from session data and represent it in the comprehensible form of fuzzy rules. The derived knowledge is ultimately used to dynamically suggest links to Web pages judged interesting for the current user.

Keywords: Web Personalization, Web usage mining, Web recommendation, Clustering, Neuro-fuzzy.

* Corresponding author. Tel.: +390805442456, Fax: +390805442476, Email: [email protected].


1 Introduction

The growing quantity of information and applications available on the World Wide Web has made some form of personalization of the Web information space necessary [1]. Web personalization may be useful in several contexts to offer a variety of functionalities, such as customization, task performance support and personalized guidance. In particular, personalized guidance helps the user quickly find the information he is seeking in a site, without asking for it explicitly. One main example of guidance functions is link recommendation, which consists in suggesting a set of links to pages of a site according to the preferences and needs of the users [2]. Generally speaking, Web recommendation may be understood as the process of meeting or predicting the interests of users by providing them with the information or services that they need [3], [4]. The scientific literature includes a variety of techniques that have been employed to perform Web recommendation [3]. In particular, three main approaches can be identified: content-based filtering, rule-based filtering and collaborative filtering. Among these, the most widely adopted is collaborative filtering [4], [5], [6], [7], [8], which is based on the idea of matching the preferences of the current user about specific objects (e.g. web pages) with those of similar users, in order to produce recommendations on other objects not yet rated by the current user.

In a Web recommendation system, two principal tasks can be distinguished: the discovery of knowledge about the user's preferences by analyzing Web data, and the actual recommendation process. Typically, the knowledge discovery task is carried out through a Web Usage Mining (WUM) methodology. WUM exploits statistical and data mining techniques to derive patterns of user navigational behavior from Web usage data [9], [10]. In the recommendation process, the extracted knowledge is used to provide recommendations to the users, for example by adding hyperlinks to the last Web page requested by the user, depending on the type of user. Web usage data are characterized by vagueness and imprecision. Traditional machine learning techniques applied to WUM often prove inefficient and inadequate to handle the uncertainty underlying data about Web interactions. As a consequence, Soft Computing techniques (e.g. fuzzy logic, neural networks, neuro-fuzzy systems) can be properly applied to cope with the imprecision and partial truths characterizing the recommendation process [11]. In this paper, we propose a Web recommendation approach for dynamic link suggestion that exploits Soft Computing techniques as tools for WUM. Specifically, we investigate the use of a neuro-fuzzy strategy to develop a collaborative filtering approach. A fuzzy clustering algorithm is applied to determine user profiles by grouping preprocessed Web usage data into session categories. Then, a hybrid approach based on the combination of fuzzy reasoning with a neural network is employed to derive fuzzy rules that provide dynamic predictions about the Web pages to be suggested to the current user, according to the user profiles previously identified.

The rest of the paper is organized as follows. In Section 2 the methodology underlying the proposed recommendation approach is formulated and the involved tasks are described. In Section 3 preliminary simulation results on a simple Web site are reported and analyzed. Section 4 concludes the paper.

2 The Web recommendation approach

The proposed Web recommendation approach involves a number of tasks, as illustrated in Fig. 1. Specifically, two main tasks are performed: User Profiling and Recommendation. User profiling is the task of discovering a number of user categories starting from session data derived by preprocessing log files. Precisely, the identified user sessions are used to create models of visitor behavior that are subsequently grouped into user profiles by a fuzzy clustering strategy. Starting from the extracted user profiles and the available session data, a knowledge base expressed in the form of fuzzy rules is extracted via a neuro-fuzzy learning strategy. This knowledge base is exploited during the recommendation task to dynamically suggest links to Web pages judged interesting for the current user. In the following subsections, we describe in more detail all the tasks involved in the proposed approach.

Fig. 1. Working scheme of the proposed Web recommendation approach (blocks: Log; Preprocessing log data; Modeling visitor behavior; Extracting user profiles by fuzzy clustering; Creating recommendation rules by neuro-fuzzy learning; Recommending links; grouped under User Profiling and Recommendation)


2.1 Modeling the visitor behavior from preprocessed log data

The first step of the user profiling task aims to derive a model of the visitor behavior. To achieve this, we preprocess log data representing all the requests made by the visitors of a Web site. Log data preprocessing identifies a number of significant user sessions that are useful for modeling the user navigational behavior. Preprocessing of access log files is performed by means of LODAP (LOg DAta Preprocessor), a software tool that we have presented in [12]. The tool analyzes usage data stored in log files to produce statistics about the browsing behavior of the users visiting the Web site. In particular, LODAP structures the requests made by the connected users into user sessions by identifying the sequence of pages accessed by each visitor. LODAP preprocesses log data in three steps: data cleaning, data structuration and data filtering.

During data cleaning, useless information is removed from Web log data in order to retain only records corresponding to the explicit requests of the users, which can be effectively exploited to derive models of the user navigational behavior. Precisely, LODAP removes records related to failed and corrupted requests, records of requests for multimedia objects (such as images, videos, sounds, etc.) and records corresponding to visits made by Web robots. After data cleaning, only information concerning accesses to pages of the Web site is retained. We formally define the set of such pages as $P = \{p_1, p_2, \ldots, p_{n_P}\}$.

Data structuration identifies user sessions by grouping the unstructured requests made by the same user for different pages. To extract user sessions, the problem of identifying distinct users has to be addressed. For Web sites requiring user registration, different users can be identified by exploiting the information concerning the user login contained in log files. When the user login is not available, LODAP simply considers each IP address as a different user (being aware that an IP address might be used by several users). The set of all users (IP) is defined by $U = \{u_1, u_2, \ldots, u_{n_U}\}$ and a user session is defined as the set of accesses originating from the same user (IP) within a predefined time period. Formally, a user session is represented as a triple $s_i = (u_i, t_i, \mathbf{p}_i)$ where $u_i \in U$ represents the user identifier, $t_i$ is the total access time of the i-th session and $\mathbf{p}_i$ is the set of all pages requested during the i-th session. In more detail,

$$\mathbf{p}_i = \{(p_{i1}, t_{i1}, N_{i1}), (p_{i2}, t_{i2}, N_{i2}), \ldots, (p_{in_i}, t_{in_i}, N_{in_i})\}$$

where $p_{ik}$ is the k-th URL visited during the i-th session, $t_{ik}$ is the total access time to page $p_{ik}$ and $N_{ik}$ represents the number of accesses to page $p_{ik}$ during the i-th session. Summarizing, after data structuration, a collection $S = \{s_1, s_2, \ldots, s_{n_S}\}$ of $n_S$ sessions is identified from the log data.

Finally, LODAP executes a page filtering and a session filtering process in order to retain only the most visited pages and the most significant user sessions. Page filtering eliminates two kinds of requests: requests for very low support URLs, i.e. requests to pages which do not appear in a sufficient number of sessions, and requests for very high support URLs, i.e. requests to pages which appear in nearly all sessions. Session filtering removes all user sessions that include a very low number of visited URLs. Hence, after data filtering, only m pages (with $m \le n_P$) and only n sessions (with $n \le n_S$) are retained.
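The data structuration step can be illustrated with a minimal sketch. It is not LODAP's implementation: the record layout `(user, timestamp_seconds, page)`, the function name `build_sessions`, the 25-minute default window and the rule used to estimate the time spent on a page (gap to the next request, zero for the last page of a session) are all assumptions made for illustration.

```python
from collections import defaultdict

def build_sessions(records, window=25 * 60):
    """Group cleaned (user, timestamp_seconds, page) records into sessions.

    A new session starts when a request falls more than `window` seconds
    after the first request of the user's current session.
    Returns a list of triples (user, total_time, pages), where `pages`
    maps each page to [total_access_time, number_of_accesses].
    """
    by_user = defaultdict(list)
    for user, ts, page in records:
        by_user[user].append((ts, page))

    sessions = []
    for user, hits in by_user.items():
        hits.sort()
        chunks, start = [], None
        for ts, page in hits:
            if start is None or ts - start > window:
                chunks.append([])
                start = ts
            chunks[-1].append((ts, page))
        for chunk in chunks:
            pages = {}
            for idx, (ts, page) in enumerate(chunk):
                # Assumption: time on a page is the gap to the next request
                # in the same session; the last page gets 0.
                spent = chunk[idx + 1][0] - ts if idx + 1 < len(chunk) else 0
                entry = pages.setdefault(page, [0, 0])
                entry[0] += spent
                entry[1] += 1
            total = chunk[-1][0] - chunk[0][0]
            sessions.append((user, total, pages))
    return sessions
```

Each returned triple mirrors the session representation $s_i = (u_i, t_i, \mathbf{p}_i)$, with the per-page pairs playing the role of $(t_{ik}, N_{ik})$.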


Once user sessions have been identified by LODAP, visitor behavior models are created by defining a measure expressing the interest degree of the users for each page visited during a session. In the literature, several measures have been considered to estimate how much the user is interested in a page of the Web site [13], [1], [14]. In our approach, we measure the interest degree for a page as the average access time on that page. Precisely, the interest degree for the j-th page in the i-th user session is defined as:

$$ID_{ij} = \frac{t_{ij}}{N_{ij}}$$

where $t_{ij}$ is the overall time spent by the user on the j-th page and $N_{ij}$ is the number of accesses to that page during the i-th session. Hence, we model the visitor behavior of each user through a pattern of interest degrees for all pages visited by that user. Since the number of pages visited by different users may vary, visitor behavior patterns may have different dimensions. To obtain a homogeneous behavior model for all users, we translate behavior patterns into vectors having the same dimension, equal to the number m of pages retained by LODAP after page filtering. In particular, the behavior of the i-th user ($i = 1, \ldots, n$) is modeled by a vector $\mathbf{b}_i = (b_{i1}, b_{i2}, \ldots, b_{im})$ where

$$b_{ij} = \begin{cases} ID_{ij} & \text{if page } p_j \text{ is accessed in session } s_i \\ 0 & \text{otherwise} \end{cases}$$

Summarizing, we model the visitor behaviors by an $n \times m$ matrix $B = [b_{ij}]$ where each entry represents the interest degree of the i-th user for the j-th page. Based on this matrix, visitors with similar preferences can subsequently be clustered together into user profiles, as described in the following subsection.
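As a small illustration of the definitions above, the following sketch builds the behavior matrix from per-session page statistics. The input layout (one dictionary per session mapping a page to a `(total_time, n_accesses)` pair) and the function name `behavior_matrix` are hypothetical choices made for this example.

```python
def behavior_matrix(sessions, pages):
    """Build the n x m matrix B = [b_ij] of interest degrees.

    sessions: list of dicts mapping page -> (total_time, n_accesses)
    pages:    ordered list of the m pages retained after page filtering
    """
    matrix = []
    for visited in sessions:
        row = []
        for page in pages:
            if page in visited and visited[page][1] > 0:
                total_time, n_accesses = visited[page]
                # Interest degree ID_ij = t_ij / N_ij (average access time).
                row.append(total_time / n_accesses)
            else:
                # Page not accessed in this session.
                row.append(0.0)
        matrix.append(row)
    return matrix
```

For instance, a session with 30 seconds spread over 2 accesses to page A yields an interest degree of 15 for that page, while pages never requested in the session contribute a zero entry.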

2.2 Extracting profiles of visitor behavior

The second step of the user profiling task concerns the extraction of a number of user profiles by applying a clustering process to the matrix of interest degrees previously derived. Precisely, visitors exhibiting a common browsing behavior are grouped together into the same cluster (i.e. user profile). Since user profiles are rarely well separated (a user can exhibit interests characterizing different user profiles), traditional clustering algorithms often prove inadequate to extract user profiles expressing the actual user behavior. Conversely, fuzzy clustering algorithms seem to be particularly suited to this context since they enable the creation of overlapping clusters, so that users with different interests may belong to several profiles to different extents. In our approach, the well-known Fuzzy C-Means (FCM) clustering algorithm [15] is applied in order to group behavior vectors $\mathbf{b}_i$ into overlapping clusters representing user profiles. Briefly, the FCM algorithm finds C clusters based on the minimization of the following objective function:

$$F_\alpha = \sum_{i=1}^{n} \sum_{c=1}^{C} u_{ic}^{\alpha} \left\| \mathbf{b}_i - \mathbf{v}_c \right\|^2, \quad 1 < \alpha < \infty$$


where $\alpha$ is any real number greater than 1, $u_{ic}$ is the degree of membership of the behavior vector $\mathbf{b}_i$ to the c-th cluster and $\mathbf{v}_c$ is the center of the c-th cluster. The FCM algorithm works as follows:

1. Initialize the matrix $U = [u_{ic}]$, $i = 1, \ldots, n$, $c = 1, \ldots, C$, as $U^{(0)}$.

2. At the $\tau$-th step, calculate the center vectors $V^{(\tau)} = (\mathbf{v}_c)_{c=1,\ldots,C}$ as

$$\mathbf{v}_c = \frac{\sum_{i=1}^{n} u_{ic}^{\alpha} \mathbf{b}_i}{\sum_{i=1}^{n} u_{ic}^{\alpha}}$$

3. Update $U^{(\tau)}$ according to:

$$u_{ic} = \frac{1}{\sum_{k=1}^{C} \left( \dfrac{\left\| \mathbf{b}_i - \mathbf{v}_c \right\|}{\left\| \mathbf{b}_i - \mathbf{v}_k \right\|} \right)^{\frac{2}{\alpha - 1}}}$$

4. If $\left\| U^{(\tau)} - U^{(\tau-1)} \right\| < \varepsilon$, with $0 < \varepsilon < 1$, STOP; otherwise return to step 2.

As a result, the FCM algorithm provides:

• A fuzzy partition matrix $U = [u_{ic}]$, $i = 1, \ldots, n$, $c = 1, \ldots, C$, where $u_{ic}$ represents the membership degree of the behavior vector $\mathbf{b}_i$ to the c-th cluster.

• C clusters with prototype vectors $\mathbf{v}_c$, $c = 1, \ldots, C$. Each cluster prototype $\mathbf{v}_c = (v_{c1}, v_{c2}, \ldots, v_{cm})$ represents a user profile describing the typical navigational behavior of a group of users with similar interests.
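The four steps of the FCM algorithm can be sketched in plain Python as follows. This is an illustrative sketch rather than the code used in the experiments: the random initialization, the maximum absolute change used as the norm in the stopping test, the iteration cap `max_iter` and the handling of a point that coincides with a center are assumptions of this example.

```python
import math
import random

def fcm(data, n_clusters, alpha=2.0, eps=1e-4, max_iter=200, seed=0):
    """Fuzzy C-Means on a list of equal-length vectors.

    Returns (U, V): U[i][c] is the membership of vector i in cluster c,
    V[c] is the prototype (center) of cluster c.
    """
    rng = random.Random(seed)
    n, m = len(data), len(data[0])
    power = 2.0 / (alpha - 1.0)

    # Step 1: random membership matrix whose rows sum to 1.
    U = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        U.append([u / total for u in row])

    V = []
    for _ in range(max_iter):
        # Step 2: centers as membership-weighted means of the data.
        V = []
        for c in range(n_clusters):
            w = [U[i][c] ** alpha for i in range(n)]
            denom = sum(w)
            V.append([sum(w[i] * data[i][j] for i in range(n)) / denom
                      for j in range(m)])

        # Step 3: update memberships from the distances to the centers.
        new_U = []
        for i in range(n):
            d = [math.dist(data[i], V[c]) for c in range(n_clusters)]
            if any(x == 0.0 for x in d):
                # Assumption: a point lying on a center belongs fully to it.
                hits = [1.0 if x == 0.0 else 0.0 for x in d]
                new_U.append([h / sum(hits) for h in hits])
            else:
                new_U.append([
                    1.0 / sum((d[c] / d[k]) ** power for k in range(n_clusters))
                    for c in range(n_clusters)
                ])

        # Step 4: stop when the membership matrix no longer changes.
        change = max(abs(new_U[i][c] - U[i][c])
                     for i in range(n) for c in range(n_clusters))
        U = new_U
        if change < eps:
            break
    return U, V
```

Calling `fcm(B, 6)` on a behavior matrix `B` would return the fuzzy partition matrix and six prototype vectors; each row of `U` sums to 1, so a session can belong to several profiles to different extents.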

2.3 Creating recommendation rules

After user profiling, the proposed Web personalization approach involves a recommendation process that employs the extracted user profiles to create recommendation rules associating relevance degrees of URLs with each visitor profile. These rules represent the knowledge base to be used in the ultimate online process of link recommendation. Each recommendation rule expresses a fuzzy relation between a behavior vector $\mathbf{b} = (b_1, b_2, \ldots, b_m)$ and the relevance of URLs in the following form:

IF ($b_1$ is $A_{1k}$) AND … AND ($b_m$ is $A_{mk}$)
THEN (relevance of URL$_1$ is $y_{1k}$) AND … AND (relevance of URL$_m$ is $y_{mk}$)

for $k = 1, \ldots, K$, where K is the number of rules, $A_{jk}$ ($j = 1, \ldots, m$) are fuzzy sets with Gaussian membership functions defined over the input variables $b_j$ and $y_{jk}$ are fuzzy singletons expressing the relevance degree of the j-th URL.

The main advantage of using a fuzzy knowledge base for recommendation is the readability of the extracted knowledge. Fuzzy rules can be easily understood by human users since they can be expressed in a linguistic fashion by labelling fuzzy sets with linguistic terms such as LOW, MEDIUM, HIGH. Hence, a fuzzy rule for recommendation can assume the following linguistic form:

IF (the degree of interest for URL$_1$ is LOW) AND … AND (the degree of interest for URL$_m$ is HIGH)
THEN (recommend URL$_1$ with relevance 0.3) AND … AND (recommend URL$_m$ with relevance 0.8)

In our approach, the creation of recommendation rules is performed through a hybrid strategy based on the combination of fuzzy reasoning with a specific neural network that encodes in its structure the discovered knowledge in the form of fuzzy rules. The architecture of the network (depicted in Fig. 2) is composed of three layers computing respectively:

• the membership degree to fuzzy sets;
• the fulfillment degree for each fuzzy rule;
• the inferred output.

Units in the first layer L1 receive a behavior vector $(b_1, b_2, \ldots, b_m)$ and evaluate the Gaussian membership functions representing fuzzy sets. In this layer, units are arranged in K groups, one for each fuzzy rule. The k-th group contains m units corresponding to the fuzzy sets which define the premise part of the k-th rule. In detail, each unit in L1 receives the interest degree $b_j$ for the j-th page, $j = 1, \ldots, m$, and computes its membership value to fuzzy set $A_{jk}$ as follows:

$$O_{jk}^{(1)} = \exp\left( -\frac{\left( b_j - x_{jk} \right)^2}{\sigma_{jk}^2} \right), \quad j = 1, \ldots, m, \quad k = 1, \ldots, K$$

where $x_{jk}$ and $\sigma_{jk}$ are the center and the width of the Gaussian function, representing the adjustable parameters of the unit.

The second layer L2 contains K units that compute the fulfillment degree of each rule. In this layer, no modifiable parameter is associated with the units. The output is derived by computing the rule activation strength as the product of the m membership values of the corresponding group in L1:

$$O_k^{(2)} = \prod_{j=1}^{m} O_{jk}^{(1)}$$

The third layer L3 provides the outputs of the network, i.e. the relevance values of the m web pages. Each output results from the inference of rules, according to the following formula:

$$O_j^{(3)} = \frac{\sum_{k=1}^{K} O_k^{(2)} y_{jk}}{\sum_{k=1}^{K} O_k^{(2)}}, \quad j = 1, \ldots, m$$

Connections between layers L2 and L3 are weighted by the fuzzy singletons $y_{jk}$, which represent a set of free parameters for the neural network.
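The computation carried out by the three layers can be sketched as a single forward pass. The parameter layout (`centers[k][j]`, `widths[k][j]` and `singletons[k][j]` for $x_{jk}$, $\sigma_{jk}$ and $y_{jk}$) and the function name `infer` are assumptions of this example, and the sketch assumes that at least one rule has a non-zero activation strength.

```python
import math

def infer(b, centers, widths, singletons):
    """Forward pass of the three-layer neuro-fuzzy network.

    b:          behavior vector (b_1, ..., b_m)
    centers:    centers[k][j] = x_jk, center of the Gaussian set A_jk
    widths:     widths[k][j] = sigma_jk, width of the Gaussian set A_jk
    singletons: singletons[k][j] = y_jk, relevance of URL j in rule k
    Returns the inferred relevance value of each URL.
    """
    n_rules, m = len(centers), len(b)

    strengths = []
    for k in range(n_rules):
        strength = 1.0
        for j in range(m):
            # Layer L1: Gaussian membership of b_j in A_jk.
            membership = math.exp(-((b[j] - centers[k][j]) ** 2)
                                  / widths[k][j] ** 2)
            # Layer L2: rule strength as the product of the memberships.
            strength *= membership
        strengths.append(strength)

    # Layer L3: weighted average of the singletons (assumes total > 0).
    total = sum(strengths)
    return [sum(strengths[k] * singletons[k][j] for k in range(n_rules)) / total
            for j in range(len(singletons[0]))]
```

When the input lies exactly halfway between the centers of two rules with equal widths, both rules fire with the same strength and each output is the plain average of the two singletons.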

Fig. 2. Architecture of the neuro-fuzzy network (inputs $b_1, \ldots, b_m$; K rule units; outputs $O_1, \ldots, O_m$)

In order to learn significant recommendation rules, the network is trained on a set of input-output samples describing the association between user sessions and preferred URLs. Precisely, the training set is a collection of n input-output vectors:

$$T = \{ (\mathbf{b}_i, \mathbf{r}_i) \}_{i=1,\ldots,n}$$

where the input vector $\mathbf{b}_i$ represents the behavior vector of the i-th user, and the desired output vector $\mathbf{r}_i$ expresses the relevance degrees associated with the m URLs for the i-th visitor. To compute such relevance degrees, we exploit the information embedded in the profiles extracted through fuzzy clustering. Precisely, for each behavior vector $\mathbf{b}_i$ we consider its membership values $\{u_{ic}\}_{c=1,\ldots,C}$ in the fuzzy partition matrix U. Then, we identify the two top matching profiles $c_1, c_2 \in \{1, \ldots, C\}$ as those with the highest membership values. The relevance degrees in the output vector $\mathbf{r}_i = (r_{i1}, r_{i2}, \ldots, r_{im})$ are hence calculated as follows:

$$r_{ij} = u_{ic_1} v_{c_1 j} + u_{ic_2} v_{c_2 j}$$

for $j = 1, \ldots, m$ and $i = 1, \ldots, n$, where $v_{cj}$ is the j-th component of the prototype $\mathbf{v}_c$.

Once the training set has been constructed, the neural network can enter the learning phase to extract the knowledge embedded in the training set and represent it as a collection of fuzzy rules. The learning is articulated in two steps. The first step is an unsupervised learning phase, based on a rival penalized mechanism, which provides a clustering of the behavior vectors and the definition of an initial fuzzy rule base. In this step, the structure and the parameters of the fuzzy rules are identified. Subsequently, the obtained knowledge base is refined by a supervised learning process, in which the fuzzy rule parameters are tuned to improve the accuracy of the derived knowledge. Further details on the algorithms underlying the learning strategy can be found in [16].
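The construction of the desired output vectors can be illustrated with a short sketch, assuming the fuzzy partition matrix and the cluster prototypes are given as nested lists (`U[i][c]` and `V[c][j]`). The function name `relevance_targets` is hypothetical, and reading the j-th component of each prototype as the page-level term of the sum is an interpretation of the formula above.

```python
def relevance_targets(U, V):
    """Compute the desired outputs r_i from the two top matching profiles.

    U: fuzzy partition matrix, U[i][c] = membership of session i in profile c
    V: profile prototypes, V[c][j] = interest degree of profile c for page j
    Returns R with R[i][j] = u_ic1 * v_c1j + u_ic2 * v_c2j.
    """
    targets = []
    for memberships in U:
        # Rank the profiles by decreasing membership and keep the top two.
        ranked = sorted(range(len(memberships)),
                        key=lambda c: memberships[c], reverse=True)
        c1, c2 = ranked[0], ranked[1]
        targets.append([
            memberships[c1] * V[c1][j] + memberships[c2] * V[c2][j]
            for j in range(len(V[0]))
        ])
    return targets
```

Pairing each row of the behavior matrix with the corresponding row of the returned list gives the training samples $(\mathbf{b}_i, \mathbf{r}_i)$.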


2.4 Recommending links

The ultimate task of personalization is the online recommendation of links to Web pages judged interesting for the current user of the Web site. Specifically, when a new user accesses the Web site, an online module matches his current partial session against the fuzzy rules currently available in the knowledge base and derives a vector of relevance degrees by means of a fuzzy inference process. Formally, when a new user accesses the Web site, an active session is created for that user in the form of a vector $\mathbf{b}^0$. Each time the user requests a new page, the vector is updated. To maintain the active session, a sliding window is used to capture the most recent behavior of the user. Thus the partial active session of the current user is represented as a vector $\mathbf{b}^0 = (b_1^0, \ldots, b_m^0)$ where some values are equal to zero, corresponding to unexplored pages. Based on the set of K rules generated through the neural learning described above, the recommendation module provides URL relevance degrees by means of the following fuzzy reasoning procedure:

(1) Calculate the matching degree of the current behavior vector $\mathbf{b}^0$ to the k-th rule, for $k = 1, \ldots, K$, by means of the product operator:

$$\mu_k(\mathbf{b}^0) = \prod_{j=1}^{m} \mu_{jk}(b_j^0)$$

where $\mu_{jk}$ is the membership function of the fuzzy set $A_{jk}$.

(2) Calculate the relevance degree $r_j^0$ for the j-th URL as:

$$r_j^0 = \frac{\sum_{k=1}^{K} y_{jk} \, \mu_k(\mathbf{b}^0)}{\sum_{k=1}^{K} \mu_k(\mathbf{b}^0)}, \quad j = 1, \ldots, m$$

This inference process provides the relevance degree for all the m pages considered, independently of the actual navigation of the current user. In order to perform dynamic link suggestion, the recommendation module first identifies the URLs that have not been visited by the current user, i.e. all pages such that $b_j^0 = 0$. Then, among the unexplored pages, only those having a relevance degree $r_j^0$ greater than a properly defined threshold $\alpha$ are recommended to the user. In practice, a list of links is dynamically included in the page currently visited by the user.
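The final selection step can be sketched as follows, assuming the relevance degrees $r_j^0$ have already been inferred for the active session. The function name `recommend`, the default threshold and the ordering of the suggested links by decreasing relevance are illustrative choices, not details stated in the text.

```python
def recommend(b0, relevance, pages, threshold=0.7):
    """Select unexplored pages whose relevance exceeds the threshold.

    b0:        active session vector (0 marks an unexplored page)
    relevance: inferred relevance degree of each page
    pages:     page identifiers, in the same order as b0 and relevance
    """
    picks = [
        (pages[j], relevance[j])
        for j in range(len(pages))
        if b0[j] == 0 and relevance[j] > threshold
    ]
    # Illustrative choice: list the most relevant links first.
    picks.sort(key=lambda item: item[1], reverse=True)
    return [page for page, _ in picks]
```

A page already visited in the active session is never suggested, however high its relevance, because only entries with $b_j^0 = 0$ are considered.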

3 Simulation results and analysis

To test the proposed Web recommendation approach, a preliminary experimental session has been carried out by considering the log files of a sample Web site. The log files contain user requests covering a time period of two weeks, for a total of 13100 requests. First of all, LODAP was applied to identify user sessions by preprocessing the available log data. In the data cleaning step, LODAP removed all the useless requests, such as accesses to multimedia objects, robot requests, and failed and corrupted requests, leading to 9115 significant requests. In the data structuration step, LODAP identified user sessions by grouping such requests. Specifically, the requests originating from the same IP address during an established time period of 25 minutes were grouped into a session. A collection of 2510 sessions was identified, including requests for 15 distinct pages.

Then, page and session filtering were applied to select the most visited pages and the most significant user sessions. In particular, to perform page filtering, LODAP counts for each page $p_j$ the number $NS_j$ of different sessions that include a request for $p_j$. LODAP performed very low support filtering by removing all pages that satisfied $NS_j < \varepsilon$, where $\varepsilon$ is equal to 10% of the quantity $NS = \max_{j=1,\ldots,15} NS_j$. Very high support filtering was executed by deleting all pages such that $NS_j > NS - \varepsilon$. In the session filtering step, all user sessions containing a low number of visited URLs were removed. Precisely, a threshold $\eta = 4$ was fixed, which represents the minimum number of distinct pages that a user session should contain to be considered significant. Hence, session filtering removed all user sessions $s_i$ which satisfied the condition $NP_i < \eta$, where $NP_i$ is the number of distinct URLs visited in that session. At the end of preprocessing, 2000 user sessions were identified and 10 distinct pages were retained. For the sake of brevity, we indicate the selected pages by the letters A, B, C, D, E, F, G, H, I and J.

Once user sessions were identified, visitor behavior models were derived by calculating the interest degrees of each user for each page, leading to a 2000 × 10 behavior matrix. Next, the FCM algorithm was applied to the behavior matrix in order to obtain clusters of users with similar navigational behavior, corresponding to the user profiles. To evaluate the validity of the clustering process, two different indexes widely used in the literature were adopted: the Dunn index and the Davies-Bouldin index [17].
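The page and session filtering criteria described above can be sketched as follows. Sessions are represented here simply as sets of page identifiers; the function name `filter_pages_and_sessions` and the choice to count the distinct pages of a session after page filtering are assumptions of this example, since the order of the two checks in LODAP is not specified.

```python
def filter_pages_and_sessions(sessions, low_frac=0.10, min_pages=4):
    """Apply support-based page filtering, then session filtering.

    sessions: list of sets of page identifiers
    Keeps pages with eps <= NS_j <= NS - eps, where NS is the largest
    support and eps = low_frac * NS, then keeps sessions that still
    contain at least `min_pages` distinct retained pages.
    """
    # NS_j: number of sessions that include a request for page j.
    support = {}
    for pages in sessions:
        for page in pages:
            support[page] = support.get(page, 0) + 1

    ns_max = max(support.values())
    eps = low_frac * ns_max
    kept_pages = {page for page, ns in support.items()
                  if eps <= ns <= ns_max - eps}

    kept_sessions = []
    for pages in sessions:
        # Assumption: NP_i is counted over the pages that survive filtering.
        retained = set(pages) & kept_pages
        if len(retained) >= min_pages:
            kept_sessions.append(retained)
    return kept_pages, kept_sessions
```

Note that, with these criteria, the page with the largest support is always discarded by the very high support filter, since $NS > NS - \varepsilon$ whenever $\varepsilon > 0$.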
The Dunn index D is defined as:

$$D = \min_{1 \le i \le C} \left\{ \min_{\substack{1 \le j \le C \\ j \ne i}} \left\{ \frac{\delta(X_i, X_j)}{\max_{1 \le k \le C} \Delta(X_k)} \right\} \right\}$$

where $\delta(X_i, X_j)$ represents the intercluster distance between clusters $X_i$ and $X_j$, $\Delta(X_k)$ represents the intracluster distance of cluster $X_k$ and C is the number of clusters. The goal is to maximize intercluster distances whilst minimizing intracluster distances. Hence, large values of the Dunn index correspond to a good cluster partition. The Davies-Bouldin validation index DB is defined as:

$$DB = \frac{1}{C} \sum_{i=1}^{C} \max_{i \ne j} \left\{ \frac{\Delta(X_i) + \Delta(X_j)}{\delta(X_i, X_j)} \right\}$$

where $\delta(X_i, X_j)$, $\Delta(X_i)$ and $\Delta(X_j)$ are defined as above. In this case, small index values correspond to good clusters.

We carried out several runs of the FCM by setting different values of the number of clusters (C = 3, …, 20). To obtain more reliable results, each run was repeated 10 times and the average values of the validity measures were considered. Figures 3 and 4 depict the average values of the Dunn and the Davies-Bouldin indexes for different numbers of clusters. It can be seen that the best partition was obtained with C = 6, because it provides the best values for both indexes.
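Both validity indexes can be computed with a few lines of code once each session has been assigned to a cluster. The text does not specify how $\delta$ and $\Delta$ were measured, so the sketch below makes two explicit assumptions: the intercluster distance is the Euclidean distance between cluster centroids, and the intracluster distance is the mean distance of the members from their centroid. It also assumes crisp clusters (for fuzzy partitions, each vector could for example be assigned to its highest-membership cluster) with at least one cluster of non-zero scatter.

```python
import math

def _centroid(points):
    """Component-wise mean of a list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]

def _scatter(points, center):
    """Assumed intracluster distance: mean distance to the centroid."""
    return sum(math.dist(p, center) for p in points) / len(points)

def dunn_index(clusters):
    """Smallest centroid-to-centroid distance over the largest scatter."""
    centers = [_centroid(c) for c in clusters]
    max_intra = max(_scatter(c, v) for c, v in zip(clusters, centers))
    min_inter = min(math.dist(centers[i], centers[j])
                    for i in range(len(clusters))
                    for j in range(len(clusters)) if i != j)
    return min_inter / max_intra

def davies_bouldin_index(clusters):
    """Average over clusters of the worst scatter-to-separation ratio."""
    centers = [_centroid(c) for c in clusters]
    scatter = [_scatter(c, v) for c, v in zip(clusters, centers)]
    n = len(clusters)
    total = 0.0
    for i in range(n):
        total += max((scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
                     for j in range(n) if j != i)
    return total / n
```

With these definitions, compact and well separated clusters give a large Dunn value and a small Davies-Bouldin value, which matches the selection criterion used for the number of clusters.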


Based on the prototypes of the six clusters, a collection of six user profiles was derived. In Table 1, the pages with the highest interest degree are indicated for each user profile. It can be noted that some pages (e.g. pages I and D) characterize more than one profile, thus showing the importance of using fuzzy clustering for user profiling.

The next step was the creation of recommendation rules starting from the extracted user profiles. A neural network with 10 inputs (corresponding to the components of the behavior vector) and 10 outputs (corresponding to the relevance values of the Web pages) was considered. The internal layer of the network contains 6 units that compute the fulfillment degree of each rule. The network was trained on a training set of 1400 input-output samples derived from the available 2000 behavior patterns and from the 6 user profiles, as described in Section 2.3. The remaining 600 samples were used for testing. The training of the network was stopped when the error on the training set dropped below 0.01, corresponding to a testing error of 0.03.

The derived fuzzy rule base was integrated into the online recommendation module to infer the relevance degree of each URL for the current user. These relevance degrees were ultimately used to suggest a list of links to unexplored pages deemed interesting for the current user. To perform link recommendation, the navigational behavior of the current user was observed during a sliding window of 3 minutes in order to derive the behavior pattern corresponding to his partial visit. This behavior pattern was used as input to the fuzzy rule inference process that computes the relevance degrees for all the 10 pages considered. Then, among the unexplored pages, only those having a relevance degree greater than $\alpha = 0.7$ were included in the list of links to be suggested. How to dynamically present links to the interesting pages within the currently visited page is an aspect still under investigation.

Fig. 3. Dunn's index values (vertical axis: D values; horizontal axis: cluster number, from 3 to 20)


Fig. 4. Davies-Bouldin's index values (vertical axis: DB values; horizontal axis: cluster number, from 3 to 20)

Table 1. The extracted user profiles by setting C=6

User profile    Pages characterizing the profile
1               A(0.82), D(0.48), I(0.80)
2               C(0.86), F(0.83), I(0.75)
3               B(0.86), I(0.80)
4               D(0.88), G(0.85), J(0.82)
5               E(0.88), J(0.84)
6               A(0.84), D(0.45), H(0.82)

4 Conclusions

A Web recommendation approach based on the combination of Soft Computing techniques has been presented. We investigated the use of a hybrid approach joining the advantages of neural networks and fuzzy reasoning in order to develop a recommendation system that dynamically suggests interesting links to the current user on the basis of fuzzy rules. The first task of our Web recommendation approach is the creation of user profiles that synthesize the interests of users with similar browsing behavior. We perform this task with a fuzzy clustering algorithm that enables the creation of overlapping clusters, so that a user, according to his browsing behavior, can belong to more than one profile with different membership degrees. The suitable number of profiles is determined by using cluster validity measures. The second task is the creation of a set of fuzzy rules that associate relevance degrees of URLs with each visitor profile. Experimental results on the session data of a simple Web site showed the effectiveness of the proposed approach and encourage its application to more complex Web domains. Currently, we are testing the approach on high-dimensional session data to evaluate how it scales with the number of pages.


References

1. Nasraoui, O.: World Wide Web Personalization. In J. Wang (ed), Encyclopedia of Data Mining and Data Warehousing, Idea Group (2005)
2. Pierrakos, D. G., Paliouras, G., Papatheodorou, C., and Spyropoulos, C. D.: Web usage mining as a tool for personalization: A survey. User Modeling and User-Adapted Interaction 13 (2003) 311–372
3. Eirinaki, M., Vazirgiannis, M.: Web mining for web personalization. ACM TOIT 3 (2003) 2-27
4. Mulvenna, M., Anand, S., and Buchner, A.: Personalization on the net using web mining. CACM 43 (2000) 123-125
5. Suryavanshi, B.S., Shiri, N., Mudur, S.P.: An efficient technique for mining usage profiles using relational fuzzy subtractive clustering. Proc. of the 2005 Int. Workshop on Challenges in Web Information Retrieval and Integration (WIRI'05) (2005) 23-29
6. Herlocker, J., Borchers, A., and Riedl, J.: An algorithmic framework for performing collaborative filtering. In Proceedings of the 1999 Conference on Research and Development in Information Retrieval (1999)
7. Konstan, J., Miller, B., Maltz, D., Herlocker, J., Gordon, L., and Riedl, J.: GroupLens: applying collaborative filtering to usenet news. Communications of the ACM 3 (1997)
8. Shardanand, U., and Maes, P.: Social information filtering: algorithms for automating word of mouth. In Proc. of the ACM CHI Conference (1995)
9. Sarwar, B.M., Karypis, G., Konstan, J.A., and Riedl, J.: Analysis of recommender algorithms for e-commerce. In Proc. of the 2nd ACM E-commerce Conference. Minnesota, USA (2000)
10. Srivastava, J., Cooley, R., Deshpande, M., and Tan, P.-T.: Web usage mining: Discovery and applications of usage patterns from Web data. SIGKDD Explorations, 1:2 (2000)
11. Frias-Martinez, E., Magoulas, G., Chen, S., and Macredie, R.: Modeling human behavior in user-adaptive systems: Recent advances using soft computing techniques. Expert Systems with Applications 29 (2005) 320-329
12. Castellano, G., Fanelli, A. M., Torsello, M. A.: LODAP: a LOg DAta Preprocessor for mining Web browsing patterns. In Proc. of the 6th WSEAS International Conference on Artificial Intelligence, Knowledge Engineering and Data Base (AIKED 2007), Corfu, Greece, February 16-19 (2007)
13. Sihun, L., Jee-Hyong, L., Keon-Myung, L., Youn, H. Y.: Fuzzy category and fuzzy interest for web user understanding. In Proc. of the International Conference on Computational Science and its Applications. Singapore (2005)
14. Hofgesang, P. I.: Relevance of time spent on Web pages. In Proc. of the 11th Web Knowledge Discovery and Data Mining (2005)
15. Bezdek, J.C.: Pattern recognition with fuzzy objective function algorithms. Plenum Press, New York (1981)
16. Castellano, G., Castiello, C., Fanelli, A.M., and Mencar, C.: Knowledge discovering by a neuro-fuzzy modelling framework. Fuzzy Sets and Systems 149 (2005) 187-207
17. Halkidi, M., Batistakis, Y., Vazirgiannis, M.: Cluster Validity Methods: Part II. SIGMOD Record, September (2002)
