An Ontological User Modeling Approach Using Wikipedia’s Content for Cross-System Personalization

Pei-Chia Chang
Information & Computer Sciences, University of Hawaii at Manoa, 1680 East-West Road, Honolulu, HI 96822, USA
[email protected]

Luz M. Quiroga
Information & Computer Sciences and Library & Information Science, University of Hawaii at Manoa, 1680 East-West Road, Honolulu, HI 96822, USA
[email protected]

ABSTRACT

In this work, we are building a personalization engine for web pages across systems by using Wikipedia’s content as an ontological model. The model captures the domain topics of any web page for user modeling purposes. Our system formulates a semantic user profile by analyzing the user’s browsing behavior, assuming that frequent browsing in a topical area implies the user’s interest in that area. We also address semantic interoperability by using Wikipedia as a shared platform on which to construct a protocol for profiling and data exchange. Preliminary tests of our system in the computer science domain suggest that the proposed ontological model captures a user’s topical interests fairly well, with minor errors.

Author Keywords

User modeling, personalization, ontology, information filtering, recommender systems, web page categorization.

ACM Classification Keywords

H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval -- Information filtering.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. UDISW 2010, February 7, 2010, Hong Kong, China. Copyright 2010 ACM 978-1-60558-998-5/10/02…$5.00

INTRODUCTION

Cross-System Personalization (CSP) aims to provide personalization based on shared profiles and protocols among different service systems [9]. Common approaches to CSP include generic user modeling systems, unified user contexts, multi-agent systems, and ontologies. Existing approaches have two identified deficiencies. One is model replacement from one user-adaptive system to another. The other is the lack of agreement on a shared communication protocol among agents or personalization systems [15].

Regarding the first deficiency, we propose having a unified model at the client side. As long as there is a shared protocol for model exchange, service providers can utilize the user model for product recommendations. Client-side modeling has three benefits. First, users have more control over preferences and privacy if their profiles are generated locally [8]. Second, client-side browsing behavior offers an alternative source for integrating profiles from commercial websites, which are otherwise difficult to acquire because of conflicts of interest. Third, as pointed out by Padmanabhan et al., modeling usage data from multiple websites allows user behavior to be predicted with greater accuracy [13].

As for the second deficiency, the use of ontologies is a key technique that characterizes the semantics of information exchange [5] and thus addresses semantic interoperability. Nevertheless, constructing or mapping an ontology that accommodates various topics from different systems is labor intensive and hard to maintain. To automate the process, we have selected Wikipedia, one of the world’s largest collaborative knowledge bases, as the shared platform for ontology derivation. We speculate that Wikipedia’s content, its vocabulary, and its categorization system may cover recent and popular topical areas that people are generally interested in. Therefore, our ontological model may be more dynamic and up to date. However, our method of ontology generation is a shortcut for cases where authoritative ontologies are unavailable or expensive; it is by no means a replacement for them. Additionally, our focus is user modeling rather than the semantic accuracy of the ontology.

Our research question is “Does the ontological model based on Wikipedia’s content semantically capture a user’s interests?” This question sits within the broader context of “Does the model provide effective CSP?” In our earlier work [1], we investigated the question “Can the model correctly identify the topics of a web page?” The results encouraged us to move forward and investigate the model’s capacity to capture a user’s interests, which is the main focus of this paper. At the next stage, we will use the model to recommend web pages across various websites to the user. Eventually, we will evaluate the question “How effective are page recommendations based on the model?” in order to address the research question of the broader context.

RELATED WORK

Greaves et al. view interoperability from two aspects: syntactic interoperability and semantic interoperability [6]. In our work, we focus on the latter. Ontological modeling is a common method of addressing semantic interoperability. Examples include Oberle et al. [12], who use semantic content to track users at a conceptual level, and Dai and Mobasher [3], who utilize usage metadata to formulate domain-level profiles. Most efforts on ontological modeling manage personalization for a single website, and only a few studies address CSP.

One of these studies is Context Passport [11], which employs a unified user context model (UUCM) [10]. As opposed to UUCM, which captures multidimensional user facets, we emphasize only the semantic dimension of topical interests. Other personalization systems, such as WebWatcher [7], Syskill & Webert [14], and WebMate [2], recommend pages from different websites based on content similarity, but they fail to provide a semantic user model that can be exchanged for CSP. Therefore, our work focuses on deriving a user’s topical interests in a semantic ontology.

METHOD DESCRIPTION

Our system captures user interests in a profile via categories from Wikipedia. This is based on WikiBase, the ontology we derived from Wikipedia [1]. With the ontology, our system automatically associates a user with the categorical topics that the user browses frequently. The following subsection briefly describes the system structure introduced in [1].

[...] the sensor. Utilizing the sensor, the crawler generates a corresponding content model for each newly fetched page.

Every component in the system uses WikiBase, which stores ontologies, keywords, content models, and the user model. We construct WikiBase by representing each Wikipedia category as a collection of keywords, obtained by applying heuristic information extraction to the pages belonging to that category. The heuristics draw keywords from page titles, category labels, anchor texts, italic and bold terms, and terms with a high TF-IDF score. Each category thus has a keyword collection that records a significance weight for each word, which the sensor uses.

The sensor maps usage pages into content models and manages the user model. For the mapping, the system calculates a usage page’s topical relevance by detecting any categorical keywords from the collection that appear in the page and associating the page with the corresponding category according to each keyword’s weight. The user model is a vector over the categories available in WikiBase. Whenever the sensor maps a usage page into a content model, it increases the scores of the page’s relevant categories in the user model. If a user accesses a specific categorical topic multiple times or through multiple pages, the user will score higher in the corresponding category of the user model. Thus, the user model constantly evolves.

The matcher computes the cosine similarity between the user model and the content model of each crawler-retrieved page across websites, and then selects the pages with high similarity as recommendations. In addition to cosine similarity, the matcher relies on the ontological structure (hierarchical and associative) of WikiBase to reveal topical associations among web pages. The structure also helps to identify whether or not a user is interested in particular domains.
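To make the idea of per-category keyword weights concrete, the following minimal sketch computes TF-IDF weights for a toy set of categories. It is an illustration under stated assumptions, not the authors' implementation: the function name `tfidf_keyword_weights`, the input format, the whitespace tokenizer, and the choice to pool each category's pages into a single document are all hypothetical, and the other heuristics (titles, anchor texts, bold and italic terms) are omitted.

```python
import math
from collections import Counter


def tfidf_keyword_weights(category_pages):
    """Map each category to {term: TF-IDF weight}.

    category_pages: {category: [page text, ...]} (hypothetical input format).
    All pages of a category are pooled into one "document" (an assumption).
    """
    # Term counts per category, pooling all pages of that category.
    counts = {
        cat: Counter(word for page in pages for word in page.lower().split())
        for cat, pages in category_pages.items()
    }
    n_categories = len(counts)
    # Number of categories in which each term occurs at least once.
    doc_freq = Counter(term for c in counts.values() for term in c)

    weights = {}
    for cat, c in counts.items():
        total = sum(c.values())
        weights[cat] = {
            term: (n / total) * math.log(n_categories / doc_freq[term])
            for term, n in c.items()
        }
    return weights


# Toy data, invented for illustration only.
pages = {
    "Databases": ["sql index query", "query planner index"],
    "Networking": ["tcp packet routing", "packet query"],
}
weights = tfidf_keyword_weights(pages)
```

In this toy example, a term that occurs in every category (here "query") receives weight zero, while terms specific to one category receive positive weights, which is the property that makes such weights useful for telling categories apart.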
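The sensor and matcher steps described in this paper can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the system itself: the function names, the whitespace tokenizer, the keyword weights, the page texts, and the similarity threshold of 0.5 are all hypothetical, and the use of WikiBase's ontological structure by the matcher is left out.

```python
import math


def content_model(page_text, keyword_weights):
    """Sensor mapping: score a page against every category by summing the
    weights of the category keywords that occur in the page."""
    words = set(page_text.lower().split())
    return {
        cat: sum(w for term, w in kw.items() if term in words)
        for cat, kw in keyword_weights.items()
    }


def update_user_model(user_model, page_model):
    """Add a visited page's category scores to the running user model."""
    for cat, score in page_model.items():
        user_model[cat] = user_model.get(cat, 0.0) + score
    return user_model


def cosine(a, b):
    """Cosine similarity of two sparse category vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(user_model, candidates, keyword_weights, threshold=0.5):
    """Matcher: return (url, similarity) pairs at or above the
    (assumed) threshold, most similar first."""
    scored = [
        (url, cosine(user_model, content_model(text, keyword_weights)))
        for url, text in candidates.items()
    ]
    return sorted((s for s in scored if s[1] >= threshold), key=lambda s: -s[1])


# Hypothetical keyword collections for two categories.
keyword_weights = {
    "Databases": {"sql": 0.9, "index": 0.6, "query": 0.5},
    "Networking": {"tcp": 0.9, "packet": 0.7, "routing": 0.6},
}

# Browsing history: repeated visits to one topic raise that category's score.
user_model = {}
for visited in ["sql query tuning", "index and query plans", "tcp basics"]:
    update_user_model(user_model, content_model(visited, keyword_weights))

# Crawler-retrieved pages from different (made-up) sites.
candidates = {
    "site-a/page1": "sql index design",
    "site-b/page2": "packet routing with tcp",
}
recommendations = recommend(user_model, candidates, keyword_weights)
```

Because two of the three visited pages are about databases, the user model scores higher in that category, and only the database-related candidate clears the assumed threshold. Storing the vectors as dictionaries keeps them sparse, which matters when the category vocabulary is as large as Wikipedia's.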

Figure 1. System Architecture

Evaluation Method

The pilot evaluation includes recruiting a few (