ISBN: 978-972-8924-44-7 © 2007 IADIS
PWUM: A WEB USAGE MINING MULTI-AGENT ARCHITECTURE FOR WEB PERSONALIZATION
Fadoua Ouamani1, Zeina Jrad2, Marie-Aude Aufaure2,3, Hager Baazaoui Zghal1, Henda Ben Ghezala1
1 Laboratoire Riadi GDL-ENSI-Campus universitaire de la Mannouba 2010-Mannouba, Tunisie
2 INRIA Paris-Rocquencourt, Domaine de Voluceau 78 153 Le Chesnay Cedex, France
3 Supélec - Plateau du Moulon - Service Informatique 91 192 Gif-sur-Yvette Cedex, France
ABSTRACT
Web usage mining is a tool for web personalization, since it captures and models the behaviors and profiles of users interacting with a web site. Personalization systems can use these models to better understand the behavioral characteristics of visitors, to improve the organization, content and structure of web sites, and to provide dynamic recommendations to visitors. This paper describes the design of a web usage mining architecture for web personalization (PWUM) implemented on a multi-agent platform. The platform is composed of a set of autonomous agents that interact in order to fulfill the main goal of the system. Agents are divided into modules that have well-defined tasks and that are further divided into two working groups: offline and online. The data mining module is the main module of the architecture, since it provides the personalization module with patterns, the key input of the whole system. The multi-agent system in PWUM has been implemented using the JADE platform.
KEYWORDS
Web usage mining, web personalization, multi-agent system, clustering, online classification.
1. INTRODUCTION
With the tremendous growth of the amount of information available online, the World Wide Web has rapidly become the main source of information for the majority of users in many domains. This growth also increases the number of inexperienced users. Together, these factors make it necessary to analyze and understand the behavior and interests of web users in order to personalize the responses to their requests, which may in turn create a friendlier relationship between the Web and its users.
Our research focuses on combining multiple web usage mining techniques in a multi-agent architecture to provide a personalization solution for web sites. We aim to improve the output of the web usage mining process by using agent technology, and thereby to provide dynamic, personalized guidance to web site visitors. Agent technology has been used in various types of Web applications, such as information retrieval, web services, e-learning, e-commerce, the semantic web, web mining, personalization and web adaptation. The agent-based computing paradigm is therefore particularly suitable for distributed, adaptive and personalized web environments.
In this paper, we present a novel approach for personalizing the Web. We first present our system and describe the mechanism of interactions between agents. This mechanism is necessary for WUM (Web Usage Mining) and personalization tasks. Then, section 3 gives details on main algorithms of the personalization solution. Finally, we discuss the main issues of our approach and possible future directions.
IADIS International Conference WWW/Internet 2007
2. PWUM: A MULTI-AGENT WEB USAGE MINING ARCHITECTURE FOR WEB PERSONALIZATION
In this section, we present our web usage mining methodology and personalization solution. We describe the main agents of our architecture along with their roles and the interactions among them.
2.1 The Web Usage Mining Methodology
Our methodology combines several web usage mining techniques. For user modeling, we used a dynamic clustering algorithm based on the FM (Feature Matrices) model to construct clusters. This model also allows the online classification of newly captured sessions [9]. A scalable model of the clusters is needed: if a cluster is modeled only as a set of session models, the cost of any analysis on the cluster depends on the number of sessions it contains, which is not a scalable solution. The FM model, in contrast, is suitable for real-time matching of a session to pre-generated clusters and offers scalable cluster models.
In order to construct groups of sessions, the feature values of all clustered sessions are aggregated into the corresponding feature values of a virtual session, called the cluster centroid or cluster model. The final result of the aggregation is a set of aggregated feature matrices that constitute the FM model of the cluster:
$$C_{fm} = \{M_{F_1}, M_{F_2}, \ldots, M_{F_m}\}$$
The aggregated feature matrix for every feature $F$ of the cluster is computed as follows:
$$M_F = \frac{1}{N} \sum_{i=1}^{N} M^{i}_{F}$$
where $N$ is the cardinality of the cluster $C$ and $M^{i}_{F}$ is the feature matrix of the $i$-th session. The same function is applied incrementally to update the cluster model when a new session $A_j$ joins a previously created cluster, which is why we speak of dynamic clustering:
$$M_F \leftarrow \frac{1}{N+1} \left( N \cdot M_F + M^{j}_{F} \right)$$
We also adopted the similarity measure defined by Shahabi et al. on top of the FM model [9]. This measure is a variant of PED (Pure Euclidean Distance) called PPED (Projected Pure Euclidean Distance). It alleviates the overestimation problem, reduces the time complexity of the similarity measurement, and is specifically designed for accurate matching of partial navigation patterns in real time.
Let $\vec{A}$ and $\vec{B}$ be two feature vectors of the same type belonging respectively to a session and a cluster model. Each vector is composed of $N$ components.
The dissimilarity between the two vectors is estimated as the pure Euclidean distance between the first vector and the projection of the second vector on the coordinate planes at which the first vector has non-zero components:
$$PPED(\vec{A}, \vec{B}) = \left( \sum_{i=1,\ a_i \neq 0}^{N} (a_i - b_i)^2 \right)^{1/2}$$
where $PPED \in [0, \infty)$ and PPED is not commutative. By contrasting the first vector with the projected vector, the session and the cluster are compared only on the segments that exist in the session, not on the entire basis. The part of the cluster that the session does not cover is thus excluded from the comparison, which avoids the overestimation.
Furthermore, for each group of sessions, we want to determine the frequent episodes by using a sequential pattern technique. In our methodology, the PSP algorithm has been used [13]. It is based on the same general algorithm as GSP [10] but uses an improved tree-like data structure for storing candidate sequences.
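The cluster model aggregation, its incremental update, and PPED-based matching described in this section can be sketched in a few lines of Python. This is only an illustrative sketch, not the PWUM implementation: it assumes each feature matrix has been flattened into a plain list of numbers, and the function names (`centroid`, `update_centroid`, `pped`, `classify`) and the sample clusters are hypothetical.

```python
import math


def centroid(vectors):
    """Batch cluster model: component-wise mean of the session vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def update_centroid(model, n, new_vector):
    """Fold one new session into a model that was built from n sessions."""
    return [(n * m + x) / (n + 1) for m, x in zip(model, new_vector)]


def pped(session, cluster):
    """Projected Pure Euclidean Distance.

    Only the components where the session vector is non-zero are compared,
    so the result depends on the argument order.
    """
    return math.sqrt(
        sum((a - b) ** 2 for a, b in zip(session, cluster) if a != 0)
    )


def classify(session, cluster_models):
    """Return the name of the cluster model closest to the session."""
    return min(cluster_models, key=lambda name: pped(session, cluster_models[name]))


# Hypothetical flattened feature vectors for two clustered sessions.
model = centroid([[2, 0, 1], [4, 2, 1]])
# A third session joins the cluster: the incremental update matches
# the batch mean over all three sessions.
updated = update_centroid(model, 2, [0, 4, 1])

# Hypothetical cluster models, and a partial session covering one component.
clusters = {"news": [3, 4, 0], "sports": [0, 1, 5]}
partial_session = [3, 0, 0]
best = classify(partial_session, clusters)
```

In this toy data, the partial session is at distance 0 from the "news" model because the components it has not visited are ignored, whereas swapping the arguments gives a distance of 4; this illustrates why PPED is not commutative.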
2.2 Presentation and Components Description
Our architecture (Figure 1) is composed of two types of agents, online and offline:
1- The interface agent is a mobile agent that handles the interaction between the user and the web. It observes the user’s behavior to learn the user’s navigational interests. Once the web usage mining is completed and the personalization tasks are chosen, it displays the recommended and requested results.
2- The user data agent is also a mobile agent. It collects the user’s data from cookies, identifies the user, keeps the data of each user separate and saves them in the database.
3- The filtering agent cleans the noisy web data by removing redundant and irrelevant data. We applied the Cooley method [6] and considered images, sound and video files, other web sites, CGI scripts and image map files as irrelevant data. We also removed requests that were not satisfied, as well as records created by spiders or crawlers, which are recognized through the user agent field of the log file. Session identification has to identify both the user and the session. Users can be identified through cookies [4] and the (IP, user agent) pair, whereas session identification uses a time-based method: a set of pages visited by a specific user is considered a single user session if it falls within a time interval no longer than 30 minutes [6].
[Figure 1 components: interface agent, usage database, filtering agent, sessions identification agent, user data agent, missing data agent, evaluation agent, clustering agent, decision maker agent, classification agent (online), personalization agent, knowledge base, sequential pattern agent; the components are grouped into offline and online parts.]
Figure 1. Functional structure of the PWUM architecture
4- The missing data agent detects missing data (due to cache and proxy problems) in the filtered web data by executing a detection algorithm called Logparser [8].
5- The classification agent classifies the user’s active session into one of the user groups already found. To do so, it executes the online classification algorithm presented in section 3.1.
6- The clustering agent gathers users with similar navigational behaviors into clusters. We applied the dynamic clustering algorithm presented in detail in section 3.1.
7- The sequential patterns agent discovers frequent episodes among the identified user sessions. It facilitates the identification of linked pages and generates patterns in the form of navigational rules.
8- The evaluation agent has two roles: validating the rules discovered by the web usage mining process and visualizing all of the results. For each discovered rule, the evaluation agent calculates a level of relevance to judge the accuracy of the rule. The validation task [5] consists in eliminating useless and uninteresting rules and extracting significant ones. These rules are used later for personalization tasks.
9- The decision maker agent decides which personalization function is necessary to satisfy the user’s needs. Once the personalization function is chosen, it activates the personalization agent, which does the work requested by this function.
10- The personalization agent receives the discovered rules and defines the personalized tasks adapted to the current user, based on the user group found by the classification agent.
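The filtering and time-based session identification steps described for the filtering agent (item 3) can be sketched as follows. This is a minimal Python sketch under several assumptions that are not in the paper: log records are dictionaries with `time`, `ip`, `user_agent`, `url` and `status` keys, irrelevant resources are recognized by a hypothetical list of file suffixes, unsatisfied requests are taken to be those with an HTTP status of 400 or above, and robots are recognized by hypothetical substrings of the user agent. The 30-minute limit is applied to the time elapsed since the first request of the session.

```python
from datetime import datetime, timedelta

# Assumed suffixes standing in for images, sound, video, CGI scripts, image maps.
IRRELEVANT_SUFFIXES = (".gif", ".jpg", ".png", ".mp3", ".wav",
                       ".avi", ".mpg", ".cgi", ".map")
ROBOT_MARKERS = ("bot", "crawler", "spider")  # assumed user-agent substrings
SESSION_WINDOW = timedelta(minutes=30)


def is_relevant(record):
    """Keep only satisfied page requests that were not issued by a robot."""
    url = record["url"].lower()
    agent = record["user_agent"].lower()
    if url.endswith(IRRELEVANT_SUFFIXES):
        return False
    if record["status"] >= 400:  # assumption: 4xx/5xx means "not satisfied"
        return False
    return not any(marker in agent for marker in ROBOT_MARKERS)


def sessionize(records):
    """Group filtered records into sessions per (IP, user agent) pair."""
    sessions = []
    current = {}  # user key -> index of that user's latest session
    for rec in sorted(filter(is_relevant, records), key=lambda r: r["time"]):
        user = (rec["ip"], rec["user_agent"])
        idx = current.get(user)
        # Open a new session for a new user or once the window is exceeded.
        if idx is None or rec["time"] - sessions[idx]["start"] > SESSION_WINDOW:
            sessions.append({"user": user, "start": rec["time"], "pages": []})
            idx = len(sessions) - 1
            current[user] = idx
        sessions[idx]["pages"].append(rec["url"])
    return sessions


# Hypothetical log excerpt used only to exercise the sketch.
t0 = datetime(2007, 1, 1, 10, 0)


def hit(minutes, url, status=200, agent="Mozilla/5.0", ip="10.0.0.1"):
    return {"time": t0 + timedelta(minutes=minutes), "url": url,
            "status": status, "user_agent": agent, "ip": ip}


log = [
    hit(0, "/index.html"),
    hit(1, "/logo.gif"),                                          # image
    hit(2, "/index.html", agent="Googlebot/2.1", ip="10.0.0.9"),  # robot
    hit(10, "/rooms.html"),
    hit(11, "/missing.html", status=404),                         # unsatisfied
    hit(45, "/booking.html"),                                     # past the window
]
sessions = sessionize(log)
```

With this sample log, the image, robot and failed requests are dropped, and the remaining three page views of the same visitor are split into two sessions because the last request arrives 45 minutes after the first.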
2.3 Interactions Among Agents
The purpose of our system is to discover operational knowledge in order to automate the personalization process. To achieve this goal, agents cooperate and coordinate their work by exchanging messages. These are informative messages that agents send either to activate other agents or to pass on pieces of information needed for the work to progress.
The interface agent captures the navigational behavior of the user and informs the user data agent and the online classification agent about it (A1 and A2 in figure 2). The user data agent identifies the current user, then creates and records a new navigational session for that user. Otherwise, it asks (A3) the classification agent to classify the active session into one of the groups previously discovered by the clustering agent, using the new information about the user embedded in his or her session.
Figure 2. AUML sequence diagram representing the interactions among agents in online work
Once the current session is classified, the classification agent informs (A4) the evaluation agent about the resulting group. The evaluation agent notifies (A5) the decision maker agent of the rules to apply in that case. Finally, the decision maker agent informs (A6) the personalization agent about the personalization functions to execute according to the adaptation rules associated with that group of users. Once the personalization functions are executed, the personalization agent sends (A7, the only data message) the results to the interface agent, which displays them to the user in response to his or her request.
The interactions between agents are represented in figure 2 using AUML (Agent-based Unified Modeling Language), an extended version of UML for modeling agent interactions [12]. AUML provides an efficient way to model agent-based systems because multi-agent systems are social communities of interdependent members that act individually [1]. First, agents are active: they can take the initiative and have control over whether and how they process external requests [2]. Second, agents do not only act in isolation but also in cooperation and coordination with each other.
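The online message sequence A1 to A7 described above can be written down as plain data and checked for consistency. The following Python sketch is only an illustration and does not use JADE or any agent framework; the `Message` class, the agent names and the message kinds ("inform", "request", "data") are assumptions chosen to mirror the verbs used in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    label: str
    sender: str
    receiver: str
    kind: str  # assumed vocabulary: "inform", "request" or "data"


# Online interaction sequence as described in section 2.3.
ONLINE_FLOW = [
    Message("A1", "interface", "user_data", "inform"),
    Message("A2", "interface", "classification", "inform"),
    Message("A3", "user_data", "classification", "request"),
    Message("A4", "classification", "evaluation", "inform"),
    Message("A5", "evaluation", "decision_maker", "inform"),
    Message("A6", "decision_maker", "personalization", "inform"),
    Message("A7", "personalization", "interface", "data"),
]


def is_causally_ordered(flow, initiator):
    """Check that every sender is the initiator or was contacted earlier."""
    active = {initiator}
    for message in flow:
        if message.sender not in active:
            return False
        active.add(message.receiver)
    return True


def labels_of_kind(flow, kind):
    """Return the labels of all messages of the given kind, in order."""
    return [message.label for message in flow if message.kind == kind]


ordered = is_causally_ordered(ONLINE_FLOW, "interface")
data_labels = labels_of_kind(ONLINE_FLOW, "data")
```

The check confirms that each agent only sends a message after it has been activated by an earlier one, and that A7 is the only data message, which returns the results to the interface agent that started the exchange.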
2.4 The Personalization Solution
The personalization agent uses the user model knowledge along with the previously discovered sequential patterns and applies a set of personalization rules in order to deliver the following types of personalization tasks or functions: the memorization of personal information; the user’s salutation; the recommendation of links related to what users in the same group previously chose, or of links that the same user usually views; and object differentiation, by presenting different features of each object.
However, the way these functions are combined to provide a complete personalization solution depends on the personalization policy the site owner wishes to follow. We used aggregate models, so we follow a multi-user personalization policy. Since our personalization functions are applied once, at the beginning of each user session, our personalization policy is static. Moreover, our personalization functions are adjusted to the browsing context of the user, so the policy is context-sensitive. Finally, personalization tasks focus on a certain topic and no explanation is available for each performed personalization task; we can therefore describe our personalization policy as converging and non-explanatory.
3. CONCLUSION
This work described PWUM, a web usage mining multi-agent system for web personalization. We presented our system and described the mechanism necessary for WUM (Web Usage Mining) and personalization tasks. Combining more than one WUM technique enhances the quality of the discovered models, which in turn improves the personalization process. Furthermore, the use of the multi-agent paradigm reduces the time complexity and increases the efficiency of the system. The software agents of PWUM have been implemented using a multi-agent platform called JADE [3]. The results we obtained by using both multi-agent systems and WUM techniques are very encouraging. We look forward to testing our approach on tourism web sites as part of national research projects.
REFERENCES
[1] Bauer B., 1999. Extending UML for the specification of interaction protocols. Submission for the 6th call for proposal of FIPA and revised version part of FIPA 99.
[2] Bauer B. et al., 2000. Agent UML: a formalism for specifying multiagent interactions. AOSE, pp. 91-104.
[3] Bellifemine F. et al., 2004. JADE Basic Documentation: Programmer’s Guide.
[4] Cooley R. et al., 1999a. Data preparation for mining World Wide Web browsing patterns. Journal of Knowledge and Information Systems, pp. 5-32.
[5] Jiang Q., 2003. Web Usage Mining: Processes and Application. CSE 8331.
[6] Kamdar T. and Joshi A., 2000. On Creating Adaptive Web Sites using WebLog Mining. Technical Report TR-CS-00-05, Department of Computer Sciences and Electrical Engineering, University of Maryland, Baltimore County.
[7] Masseglia F. et al., 1999. An efficient algorithm for web usage mining. Networking and Information System Journal.
[8] Murgue T., 2006. De l'importance du prétraitement des données pour l'utilisation de l'inférence grammaticale en Web Usage Mining. Labo. Hubert Curien, Jean Monnet University, Saint-Etienne.
[9] Shahabi C. and Banaei-Kashani F., 2003. Efficient and Anonymous Web Usage Mining for Web Personalization. INFORMS Journal on Computing, Special Issue on Data Mining, Vol. 15, No. 2.
[10] Srikant R. and Agrawal R., 1996. Mining sequential patterns: Generalization and performance improvements. In Proceedings of the 5th International Conference on Extending Database Technology, Avignon, France, pp. 3-17.
[11] Symeonidis A.L. and Mitkas P.A., 2006. Agent Intelligence through data mining. The 17th European Conference on Machine Learning and the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, Berlin, Germany.
[12] http://www.auml.org/
[13] FIPA Request Interaction Protocol Specification, 2004. Available at: http://www.fipa.org/specs/fipa00026/sc00026H.html/, accessed 22 January 2005.