Exploiting multidimensional data for web site automation


Marcos Aurélio Domingues
INESC Porto - Institute for Systems and Computer Engineering of Porto, Porto, Portugal
E-mail: [email protected], [email protected]

The continuous growth in size and usage of the World Wide Web poses a number of challenging research problems. This thesis investigates the use of multidimensional data to automate the personalization and management activities of web sites. We pay particular attention to the use of additional dimensions (e.g., contextual or background information) in traditional user-item based top-N recommender systems and propose a multidimensional approach for this purpose. To support our research, we also propose a data warehouse to collect and compile information regarding the activity on a web site in terms of usage, content and structure. We show that, by exploiting multidimensional data, efficient web automation methods can be developed and used to improve the personalization and management of web sites.

Keywords: Web site automation, recommender systems, multidimensional recommender systems, multidimensional data, data warehouse

1. Introduction

The personalization and management of web sites impose a constant demand for new information and timely updates, due to the increase in services and content that site owners wish to make available to their users. This constant, labor-intensive effort implies very high financial and personnel costs. Web site automation has emerged as a way to automate several personalization and management activities of a web site. One goal of automation is to reduce the editors' effort and, consequently, the costs for the owner. The other goal is for the site to adapt in a more timely manner to the behavior of the user, improving the browsing experience and helping the user achieve his/her own goals.

The overall goal of this thesis [1] is to exploit multidimensional data (i.e., data involving several dimensions or aspects) for web site automation. In the first part of our work, we propose a data warehouse designed as a repository of information to support different web site automation and monitoring activities. In the second part, we exploit the use of multidimensional data in top-N recommender systems.

AI Communications ISSN 0921-7126, IOS Press. All rights reserved

2. A data warehouse for web site automation

In the literature, most data warehouses for web site automation are developed for specific activities and are therefore designed to store only the information those activities need. Our proposal consists of star schemas (or a fact constellation schema) involving three fact tables and six dimensions. Fact table Usage is connected to the dimension tables Session, User, Referrer, Time, Date and Page. This fact table is loaded with data about accesses/requests to the web site, extracted from web access logs. Fact tables Structure and Content (which share the dimensions Time, Date and Page) are loaded, respectively, with data about every hyperlink in the web site and representations of web page content, both extracted from web pages. The two data sources, web access logs and web pages, are site-independent and can be used to support several web site automation and monitoring activities [3].

We implemented our data warehouse and used it as a repository of information in three different case studies [1]. The first is a simple application that illustrates how the data warehouse can be used to compute a set of well-known metrics to assess the effectiveness of an e-commerce web site. The second is a more complex application, where the data warehouse is used as a repository of information to feed recommender systems in an e-learning web site, as well as tools to evaluate and monitor their performance. Finally, we have a much more complex application, which is concerned with a tool to monitor the quality of the meta-data describing content in an e-news web portal. In conclusion, our case studies demonstrate how simple the data warehouse is to use, which makes us believe that it has potential for many other applications.

3. A multidimensional approach to enhance top-N recommender systems

A top-N recommender system is an information filtering technology which can be used to output a set of N recommendations that are likely to be of interest to the user. Here, we propose a multidimensional approach, called DaVI (Dimensions as Virtual Items), that consists of representing additional dimensions as virtual items together with the regular items in the access data, enabling the use of traditional user-item top-N recommender algorithms to generate recommendations from multidimensional data [2].

We instantiated our approach in three different algorithms. The first one, called DaVI-BEST, evaluates and selects the best dimension in a data set to build the multidimensional recommendation model. The second algorithm, called DaVI-FS, combines DaVI-BEST with a sequential forward selection algorithm in order to select the best combination of dimensions to build the multidimensional model. The last algorithm, called DaVI-ALL, consists of the simple idea of applying the DaVI approach to all existing dimensions in a data set, at the same time, to build the model.

In order to evaluate the effectiveness of the DaVI algorithms, we combined them with two different recommendation techniques, Item-based Collaborative Filtering (CF) and Association Rules based (AR), and ran an extensive set of experiments on three different real world data sets (Listener, Playlist and Entree) [1]. The F1 measure for top-1 recommendations is summarized in Table 1.
The values which are statistically significant are indicated in boldface. The character “*” indicates the highest value for each technique and data set, while “-” indicates an algorithm that timed out. In conclusion, the empirical evaluation of our multidimensional approach DaVI shows that we can use it to enhance top-N recommender systems: every DaVI value reported in Table 1 is higher than that of the corresponding user-item baseline.
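
The core DaVI idea, representing dimension values as virtual items, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation evaluated in the thesis: the record layout, the helper names (to_virtual_items, build_cooccurrence, recommend_top_n), the "dimension=value" encoding of virtual items and the toy data are all hypothetical, and the co-occurrence scoring is only a crude stand-in for an item-based recommender. Excluding virtual items from the recommendation list is likewise an assumption made here.

```python
from collections import Counter, defaultdict
from itertools import permutations


def to_virtual_items(records, dimensions):
    """Turn (user, item, context) records into per-user item sets.

    Each selected dimension value is added as a virtual item encoded as
    "dimension=value" (an encoding assumed here for illustration).
    """
    baskets = defaultdict(set)
    for user, item, context in records:
        baskets[user].add(item)
        for dim in dimensions:
            if dim in context:
                baskets[user].add(f"{dim}={context[dim]}")
    return dict(baskets)


def is_virtual(item):
    # Assumes regular item identifiers never contain "=".
    return "=" in item


def build_cooccurrence(baskets):
    """Count how often two (regular or virtual) items share a basket."""
    counts = defaultdict(Counter)
    for items in baskets.values():
        for a, b in permutations(items, 2):
            counts[a][b] += 1
    return counts


def recommend_top_n(counts, observed, n=1):
    """Return the n regular items that co-occur most with the observed ones."""
    scores = Counter()
    for seen in observed:
        for candidate, c in counts.get(seen, {}).items():
            if candidate not in observed and not is_virtual(candidate):
                scores[candidate] += c
    # Sort by score (descending), then by name, so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]


# Hypothetical access data with one additional dimension ("day").
records = [
    ("u1", "song_a", {"day": "sat"}),
    ("u1", "song_b", {"day": "sat"}),
    ("u2", "song_a", {"day": "mon"}),
    ("u2", "song_c", {"day": "mon"}),
    ("u3", "song_b", {"day": "sat"}),
]
baskets = to_virtual_items(records, dimensions=["day"])
model = build_cooccurrence(baskets)
```

With this toy data, passing the current context as a virtual item changes the result: recommend_top_n(model, {"song_a", "day=mon"}) favours song_c, while recommend_top_n(model, {"song_a", "day=sat"}) favours song_b. In this reading, DaVI-ALL corresponds to passing every available dimension in the dimensions argument, whereas DaVI-BEST and DaVI-FS would first choose which dimensions to pass.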

Table 1. F1 measure for top-1 recommendations on the Listener, Playlist and Entree data sets

Technique  Algorithm   Listener  Playlist  Entree
CF         user-item   0.231     0.342     0.214
CF         DaVI-BEST   0.309*    0.429*    0.22
CF         DaVI-FS     0.307     0.426     0.22
CF         DaVI-ALL              0.426     0.221*
AR         user-item   0.175     0.225     0.322
AR         DaVI-BEST   0.207     0.255*    0.348*
AR         DaVI-FS     0.208*    0.255*    0.345
AR         DaVI-ALL    -         0.255*    0.342
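
To make the metric in Table 1 concrete, the sketch below computes an F1 score from lists of recommendations and sets of hidden (held-out) items. The evaluation protocol used in the thesis is not described in this summary, so the micro-averaging over test cases, the function name f1_top_n and the toy data are assumptions made purely for illustration.

```python
def f1_top_n(recommended, relevant):
    """Micro-averaged F1 over parallel lists of test cases.

    recommended[i] is the list of items recommended in test case i;
    relevant[i] is the set of hidden items the user actually accessed.
    """
    hits = sum(len(set(r) & set(h)) for r, h in zip(recommended, relevant))
    n_recommended = sum(len(r) for r in recommended)
    n_relevant = sum(len(h) for h in relevant)
    if hits == 0:
        return 0.0
    precision = hits / n_recommended  # fraction of recommendations that hit
    recall = hits / n_relevant        # fraction of hidden items recovered
    return 2 * precision * recall / (precision + recall)


# Four hypothetical top-1 test cases: two hits out of four recommendations,
# with five hidden items in total.
recs = [["song_b"], ["song_c"], ["song_a"], ["song_d"]]
hidden = [{"song_b"}, {"song_a"}, {"song_a", "song_e"}, {"song_f"}]
score = f1_top_n(recs, hidden)
```

Here precision is 2/4 and recall is 2/5, so the score is 4/9, roughly 0.444.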

4. Final remarks

There is a wide range of opportunities to use multidimensional data in web site automation. In this thesis, we proposed a data warehouse as a storage infrastructure for such data and exploited additional dimensions in top-N recommender systems. In the future, we intend to extend the data warehouse to store site-dependent web data (e.g., products for sale) and to address other applications. We will also try the DaVI approach with other recommender algorithms, such as Markov models.

Acknowledgements

Marcos Aurélio Domingues was supervised by Alípio Jorge and Carlos Soares and supported by FCT PhD grant SFRH/BD/22516/2005, FCT project Rank! (PTDC/EIA/81178/2006) and QREN-AdI Palco3.0/3121 PONORTE.

References

[1] M. A. Domingues. Exploiting multidimensional data for web site automation. PhD thesis, University of Porto, Porto, Portugal, 2010. Available at http://www.liaad.up.pt/~marcos/downloads/PhDThesisMarcosDomingues.pdf.
[2] M. A. Domingues, A. Jorge, and C. Soares. Using contextual information as virtual items on top-n recommender systems. In ACM RecSys’09 Workshop on Context-Aware Recommender Systems (CARS-2009), New York, USA, 2009.
[3] M. A. Domingues, A. Jorge, C. Soares, J. P. Leal, and P. Machado. A data warehouse for web intelligence. In Proceedings of the Thirteenth Portuguese Conference on Artificial Intelligence, pages 487–499, Guimarães, Portugal, 2007.