Personalized Access Model: Concepts and Services for Content Delivery Platforms

Sofiane Abbar1, Mokrane Bouzeghoub1, Stéphane Lopes1, Dimitre Kostadinov1, Armen Aghasaryan2, Stéphane Betge-Brezetz2

1 PRiSM Laboratory, University of Versailles, 45, av. des Etats-Unis, 78035, Versailles, {firstname.lastname}@prism.uvsq.fr
2 Alcatel-Lucent Bell Labs Villarceaux, Route de Villejust, 91620 Nozay, France, {firstname.lastname}@alcatel-lucent.fr

ABSTRACT
Access to relevant information, adapted to the user's needs, preferences and environment, is a challenge in many applications running on content delivery platforms, such as IPTV, VoD and mobile video. In order to provide users with personalized content, applications use various techniques such as content recommendation, content filtering, preference-driven queries, etc. These techniques exploit different knowledge organized into profiles and contexts. However, there is no common understanding of these two concepts, and there is no clear foundation of what a personalized access model should be. This paper contributes to this concern by providing, through a meta model, a clear distinction between profile and context, and by providing a set of services which constitutes a foundation for the definition of a personalized access model (PAM). Our PAM definition allows applications to interoperate in multiple personalization scenarios, including preference-based recommendation, context-aware content delivery, personalized access to multiple contents, etc. The proposed concepts and services are tightly defined with respect to the requirements of real applications provided by Alcatel-Lucent.

Keywords: personalization, context, contextual preferences, user profile modeling, personalized access.

1. INTRODUCTION
The personalization paradigm aims at adapting applications as much as possible to the user's preferences and context. Adaptation may concern several aspects, such as system reconfiguration, communication protocols, data source selection, query reformulation, data layout, or user feedback handling. Data personalization refers to the set of techniques which make it possible to provide the user with the most relevant data, depending on his domain of interest, his data quality requirements, his location at querying time, the time at which the data is required, the media used to supply this data, or any other constraint related to data pricing, user privacy or business policy.

Works on personalization techniques generally concern only a few of these dimensions, and the proposed solutions are encapsulated as part of system and application features, thereby reducing their capacity for extension and evolution. In practice, no effort has been devoted to the definition and production of personalization services which are generic enough to be used in many applications. As a result, the personalization process is approached in different ways, depending on the applications and on the technologies used. In information retrieval systems, personalization is considered as a machine learning process based on user feedback [8, 46].

The user is fully involved in the query evaluation, which is conducted as a stepwise refinement process where the user can decide at each step which data he likes and which data he dislikes. From the log of this behaviour, user profiles are elaborated and introduced as new filtering rules refining the results of further queries. In database systems, personalization is considered from two viewpoints: query language extension and user query expansion. Query language extensions, such as SQL/f [35] or PreferenceSQL [23], enable user preferences to be expressed within each query. Query expansion consists in enriching the user query with preferences given in the user profile [27, 25]. Both approaches deal with user preferences characterizing the desired data. They are not concerned with adaptation to the user's environment.

However, the need to consider user mobility and omnipresence [6] has imposed new considerations such as the user's location, the media used for the interaction and many other features, grouped in the concept of user context. A context is a set of features which describe the environment in which the user interacts with the information system, while a profile is a set of features characterizing the user's needs in terms of data and the quality of this data. Many context-aware applications, such as smart homes [45] or context-sensitive search engines [19], are able to adapt their processing and their services to the user's context. Even if all applications agree on the importance of having a profile and a context, there is a lack of consensus on the definitions of these concepts. There are as many profile and context definitions as application domains and technologies. Thus, classifying, organizing and structuring the knowledge describing the user and the context is a key element for a global vision of data personalization.
The goal of this paper is to provide a formal definition of a personalized access model (PAM) with its underlying notions of profile, context, query and session. The

definition of such a PAM is driven by the following requirements:
– Define profile and context meta models which are generic enough to be adapted to a wide range of applications and which are open to integrate specific knowledge not included in the initial modeling;
– Define a set of services which can be used to personalize existing applications or to build new personalized applications;
– Allow partial or full usage of this process, such that each application can use part of or the entire knowledge defined by the meta model.

This paper is organized as follows. Section 2 gives a global view of our personalized access model. Section 3 describes the profile and context meta models, and presents a model management platform. Section 4 presents the personalization services offered by the PAM. Section 5 describes how the PAM can be deployed over several architectures. Section 6 positions the paper with respect to related work. Section 7 concludes the paper with further research.

2. PERSONALIZED ACCESS MODEL
Our definition of a personalized access model (PAM) aims to provide a generic set of concepts and techniques which can be deployed over a given system architecture to make the target applications adaptable to users' profiles and contexts. Figure 1 gives an overview of the main components of a PAM. The PAM is composed of three layers: (i) a persistency layer, (ii) a functional layer and (iii) a communication layer. The persistency layer deals with the storage of, and the access to, the profiles and contexts. It includes the profile and context catalogs. The functional layer is composed of the profile management, context management and personalized access services. These services are: instantiation, update, profile contextualization, profile matching, profile-context binding and query reformulation. A more detailed description of the services offered by the platform can be found later in the paper. Finally, the communication layer provides a communication interface between the PAM and users or applications. The role of this layer is, on the one hand, to give access to the profile and context bases and, on the other hand, to enable calling the PAM services.

Figure 1. Personalized Access Model Architecture. [The figure shows the three layers: a communication layer (Services Access API; Profile and Context Access API), a functional layer (personalized access services: profile and context matching, query reformulation, profile-context binding; profile and context management services: instantiation of profiles and contexts, update of profiles and contexts, profile knowledge contextualisation; profile and context meta models) and a persistency layer (profile catalog; context catalog).]

The components of the three layers of the PAM are built around profile and context meta models which are generic enough to be adapted to a wide range of applications and which are open to integrate specific knowledge not included in the initial modeling. All PAM components are developed in conformance with the meta models. Thus, all messages addressed to the PAM should respect the meta model format. More details about the meta models can be found in the next section.

3. META MODELS AND META DATA MANAGEMENT PLATFORM
We propose to organize profile and context meta models into dimensions and sub-dimensions which are described by (attribute, value) couples. Sub-dimensions can be considered as complex attributes. An example of a possible sub-dimension is the address, which is composed of a street name, street number, postal code, city and country. The generic profile and context meta model is given in Figure 2. The meta model can be used to create a large variety of profiles and contexts. Their content may vary from one domain to another. The next subsections present the main profile and context dimensions we have identified.

Figure 2. Profile and context meta model. [The figure shows that a profile or context is composed of 1..n Dimensions (#Dimension, Name), each containing 0..n SubDimensions (#SubDimension, Name); dimensions and sub-dimensions carry 1..n Attributes (#Attribute, Name, ValueType) with associated AttributeValues (#Value, Value); Preferences (#Preference, PrefValueExpr) are defined on dimensions, attributes and values.]

3.1 Profile Modeling
We have identified five dimensions through which a user profile can be defined: personal data, domain of interest, data quality, data delivery, and security and privacy. A brief description of each dimension is given below.

Personal Data groups attributes and preferences related to the user himself, that is, everything that concerns his identity, demographic data, professional data, health care data, and so on. In some information systems, personal data will be used to filter query results with respect to the age of the user, his gender or his area of work. In others, personal data is useless for information filtering itself but is still useful as an 'exchange currency' between the user and the information provider. This is the case in many e-commerce applications and web-based systems which collect personal data for statistical purposes or for advertising.

Domain of Interest is the central dimension of the user profile. It groups all attributes and preferences related to the general needs of a given user. The domain of interest may describe the user's expertise or qualification in a specific field as well as the main object types he is interested in. It can be defined in different ways depending on the application needs (Figure 3). For instance, in Information Retrieval, the Domain of Interest is usually described by a set of possibly weighted keywords [37] or ontology graphs [15], while in Databases, it is commonly described by a set of predicates [27] or expressions in a given formalism such as Horn clauses [14] or utility functions [11]. In some applications the domain of interest is represented by the history of the user's interactions with the system. This includes examples of elements associated with the actions the user performed on them. Knowledge defined in the domain of interest is mainly used to reformulate user queries, either by term substitution, by complementing the queries with new selection predicates, or by introducing orders between predicates.
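As an illustration of the keyword-based representation, the following Python sketch (with hypothetical data and function names, not taken from the paper) shows how a weighted keyword profile can drive a simple query reformulation by term addition:

```python
# Illustrative sketch: a Domain of Interest represented as a weighted
# keyword vector, used to complement a keyword query with the user's
# strongest interests.

def expand_query(query_terms, domain_of_interest, k=2):
    """Append the top-k weighted interest keywords not already in the query."""
    candidates = {t: w for t, w in domain_of_interest.items()
                  if t not in query_terms}
    top_k = sorted(candidates, key=candidates.get, reverse=True)[:k]
    return list(query_terms) + top_k

# Hypothetical profile: weights express the strength of each interest.
interests = {"databases": 0.9, "personalization": 0.8, "cooking": 0.2}
print(expand_query(["query", "reformulation"], interests, k=2))
# ['query', 'reformulation', 'databases', 'personalization']
```

The low-weighted "cooking" interest is ignored, so only the strongest interests influence the reformulated query.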

Figure 3. Domain of Interest Model. [The figure shows that a DomainOfInterest may be represented as a KeywordsVector, as DisjFormulas (disjunctions of ConjPredicates, each Predicate relating an Attribute, an operator and a Value), as an Ontology (a ConceptsGraph of Concepts linked by SemLinks and Axioms), or as a History (an ExamplesList of Examples associating Contents with the Actions performed on them).]

Data quality is one of the most important issues in data personalization. Most user preferences relate to data accuracy, data freshness, data consistency, etc. Data quality does not only concern data values but also data sources (e.g. confidence, update frequency, completeness) and the data derivation process (e.g. response time, reliability). Attributes and preferences of the quality dimension can be used by the query processor, or more generally by the data management system, to filter the accessed data sources when their number is significantly high [30] or to find a good balance between data freshness and response time [10].

The Data Delivery dimension describes different modalities related to the user interface (e.g. media type, presentation style, results size), user mobility (e.g. geographical location), temporal accessibility (e.g. the moment at which queries are issued and results notified, login duration), etc. Most web-based search engines propose such modalities. Delivery modalities may depend on the media used: the same query results will not be presented in the same way depending on whether the media is a laptop, a PDA or a mobile phone. Delivery preferences can be used to define such modalities and to decide in which context they are used.

Security and privacy describe security rules and constraints that can be applied either to the data resulting from queries, to the queries themselves, to the user identity, or to the whole profile as sensitive knowledge. The security dimension mainly refers to privacy policies as described in different standards such as P3P [13] and PAPI [31]. The next subsection describes a complementary meta model which concerns the context.

3.2 Context Modeling
Since users' preferences may change with regard to their interaction environment (place, moment, availability, etc.), we have defined a context as the set of information that characterizes these users' environments. Further, the advent of mobile and pervasive computing needs [6] has pushed multimedia services and content providers to take the context of users into account in the personalization and delivery process. This consideration gave rise to context-aware applications able to adapt their treatments and services to users' contexts, such as smart home intelligent agents [45] and context-sensitive information retrieval [19]. For these applications, context usually refers to spatial and temporal information about the interaction session as well as the users' recent activities. As for the profile, we have identified five dimensions for the context meta model: temporal, spatial, equipment, user state and environment (Figure 4).

Figure 4. Context meta model

The temporal dimension groups attributes related to the temporal aspect of the user interaction. This information allows personalizing an application with respect to the moment of the user interaction. For example, one user may be interested in reading news in the morning, listening to music in the afternoon, and watching movies in the evening. Attributes of this dimension are organized into a hierarchy (year, quarter, month, etc.) to deal with the various granularity needs of applications.

The spatial dimension constitutes one of the most important context characteristics. It encompasses all information and parameters that characterize the place from which users interact with applications. Since users may have different interaction behaviors with regard to their geospatial situation, we have enumerated two generic spatial situations: static and on the move. Both of these situations may be described with a simple coordinate (e.g. GPS, address, etc.) and a locality label in the case of a known place (e.g. principal home, office, in a train, etc.).

The equipment dimension characterizes the media used by the user to interact with the application. Three aspects are described in this dimension: details about the device used (e.g. type, autonomy, memory storage and computing power), the software used (e.g. operating system, mail client, etc.), and characteristics of the connection (e.g. type: WiFi, 3G; rate; services: FTP, MMS, etc.). The equipment dimension is very important for personalizing both the application layout (HMI) and the delivery (format and number of delivered contents).

The environment dimension concerns elementary sensors that may inform applications about some external characteristics of users' interaction environments. These sensors may be used to capture temperature, moisture, luminosity, noise volume, etc. Smart home applications [45], for example, aim to automate the management of these sensors in order to satisfy the user preferences.

The user-state dimension provides information about the user's availability and emotions. Some applications, such as automated hotline agents, deal directly with users; therefore they need to infer the user's emotions in order to adapt their speech. Further, interactions have to be adapted to the user's availability: for example, when a user is in a meeting, the personalized application can choose to contact him by email instead of by SMS.

3.3 Profile and Context Management Platform
The models described above have been implemented and stored in a model management platform (Figure 5) which provides developers and users with the necessary functionalities for profile and context management. This platform provides services which enable profile and context meta model instantiation and update. Providers may use our platform in order to instantiate models of profiles and contexts. Profile models may correspond to particular application domains, i.e. relevant profile knowledge may vary from one domain to another (e.g. multimedia, scientific). Context models correspond to context situations handled by the application (e.g. home context, holiday context). Each model has its own structure, but is always instantiated from the meta model. Another instantiation level consists in creating profile and context instances from models. These instances are related to a particular user in terms of the values they contain.
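The two instantiation levels can be sketched as follows (a minimal Python sketch with illustrative class and attribute names, not the platform's actual implementation): the meta model is first instantiated into a domain model, and the model is then instantiated into a per-user instance.

```python
# Sketch of the two instantiation levels: meta model -> model -> instance.

from dataclasses import dataclass, field

@dataclass
class Attribute:            # meta-model element: (name, value type)
    name: str
    value_type: type

@dataclass
class Dimension:            # meta-model element: named group of attributes
    name: str
    attributes: list = field(default_factory=list)

# Level 1: a profile *model* dimension for a hypothetical domain.
personal_data = Dimension("PersonalData",
                          [Attribute("age", int), Attribute("city", str)])

# Level 2: a profile *instance*, holding values for a particular user.
def instantiate(dimension, values):
    """Bind concrete values to a dimension, checking declared value types."""
    instance = {}
    for attr in dimension.attributes:
        value = values[attr.name]
        assert isinstance(value, attr.value_type), f"bad type for {attr.name}"
        instance[attr.name] = value
    return instance

alice = instantiate(personal_data, {"age": 30, "city": "Versailles"})
print(alice)
```

Each model keeps its own structure, but every model and instance conforms to the same meta-model shape, which is what lets all PAM components exchange profiles and contexts uniformly.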

Figure 5. Model Management Platform: context management

Further, the platform enables both the provider/developer and the user to perform update operations such as adding elements, deleting elements, and updating element values. However, users can only update the values of their profile and context instances, whereas developers may operate on both the meta models and their instances.

4. PERSONALIZED ACCESS SERVICES
This section presents the personalized access services offered by the PAM. We distinguish between offline and online services. Offline services, also called modeling services, are used to complete the meta model instantiation. They are applied at design time and are not involved in the process of query evaluation. Online services, called personalization services, are used to adapt the execution of a given query to the user profile and to the user context. Usually, this type of service is applied in real time as part of the query execution process. This section proposes one modeling service (profile contextualization) and three personalization services (profile-context binding, profile matching and query reformulation). In the following, we focus on the profile dimension "Domain of interest". We consider a profile as a set of preferences (Pu = {p1, ..., pn}). Each preference is composed of a predicate and a weight (pi = (pri, wi)). The weight wi is a real number expressing the importance of the predicate pri for the user. The predicate is a triplet (attribute, operator, value) which characterizes a subset of all instances of the concept.

4.1 Profile Contextualization

The user profile usually describes the global information need of a given user. That is, it contains all known information about the user and his preferences, independently of the contexts in which the user can interact with a given application. Taking the whole profile content into account during the personalization process can be a time-consuming task. In addition, some user preferences may be useless in some contexts, and taking such preferences into account can provide the user with less relevant results. To avoid these problems, it is necessary to capture the exact user needs with respect to the context in which he interacts with a given application. This is done by the profile contextualization service, which enables expressing dependencies between user profiles and one or several contexts in which these profiles can be exploited. This service has been developed with respect to the following assumptions:
– the set C = {c1, ..., cm} of possible contexts, in which the user can interact with the application, is predefined;
– the application maintains a history of the user's behavior on the results of each of his queries. This history is of the form H = {feedback(Pu, ci, Qu)}, where feedback(Pu, ci, Qu) is the behavior of the owner of profile Pu over the results of the query Qu, evaluated with respect to Pu, when the user is in the context ci.

For a given user whose profile is Pu, the profile contextualization consists in specifying a set M of mappings which relates the user profile preferences to the contexts C: M (Pu, C) = {m(pi, cj, δ) | pi ∈ Pu, cj ∈ C, δ ∈ [-1, 1]}, where δ is the mapping score of pi with respect to the context cj. According to the value of δ, there are three types of mappings:

– positive mappings (δ > 0), indicating profile predicates which have to be satisfied in the given context;
– negative mappings (δ < 0), pointing out profile predicates which should not be satisfied in the given context; and
– neutral mappings (δ = 0), corresponding to profile predicates which are irrelevant in the context.
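The preference and mapping definitions above can be captured with small data structures. The following Python sketch (predicate strings and names are illustrative) encodes a preference as a weighted predicate and a mapping as a (predicate, context, δ) triple classified by the sign of δ:

```python
# Minimal data structures for the definitions above: a preference is a
# weighted predicate; a mapping relates a predicate to a context with a
# score delta in [-1, 1].

from dataclasses import dataclass

@dataclass(frozen=True)
class Preference:
    predicate: str      # e.g. "genre = 'drama'"
    weight: float       # importance of the predicate for the user

@dataclass(frozen=True)
class Mapping:
    predicate: str
    context: str
    delta: float        # mapping score in [-1, 1]

    def kind(self):
        """Classify the mapping by the sign of its score."""
        if self.delta > 0:
            return "positive"   # predicate has to be satisfied in this context
        if self.delta < 0:
            return "negative"   # predicate should not be satisfied
        return "neutral"        # predicate irrelevant in this context

print(Mapping("genre = 'drama'", "evening", 0.7).kind())   # positive
```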

One of the main problems of profile contextualization is the mapping construction process. Building these mappings manually for each (profile, context) couple is time consuming, and even impossible when the two concepts are defined separately. Thus, automating the discovery of these mappings is an appealing approach, although the discovery process is not trivial. Mappings are discovered by analyzing the user history H. The main idea is to check whether there are correlations between the user profile preferences and the user feedback in a given context. Thus, the profile contextualization process takes as input a non-contextualized user profile Pu, the user history H and the contexts C of a given application. It returns the set of mappings M(Pu, C). We propose a specific profile contextualization service, called F-Contextualize, which proceeds in three steps (Figure 6).

Figure 6. F-Profile contextualization. [The figure shows the user profile Pu, the user history H and the contexts C feeding three successive steps: results partitioning (producing partitioned results), mappings initialization (producing candidate mappings) and context specificity check (producing the mappings M(Pu, C)).]

The first step (results partitioning) consists in identifying which results the user liked and which he disliked. For example, if a user spent a considerable amount of time looking at a given result, this result is considered as relevant for the user. The output of this step is two sets containing respectively the relevant and irrelevant results for each context. In the following, POS(Pu, ci) and NEG(Pu, ci) denote respectively the sets of relevant and irrelevant results in the context ci for a user profile Pu.

The second step (mappings initialization) consists in initializing the possible mappings between the user profile predicates and the application contexts. This task is achieved by evaluating the score δ of each profile preference pi in each context cj. This is done by computing the occurrence frequencies δ+ and δ- of each profile predicate in POS(Pu, ci) and NEG(Pu, ci) respectively. Then, candidate mappings are created. The value of δ depends on whether the frequency δ+ (respectively δ-) satisfies a given condition γ(δ+) (respectively γ(δ-)), for instance a threshold γ+ (respectively γ-). The different cases which may occur are summarized in Table 1.

γ(δ+) holds | γ(δ-) holds | Value of δ
yes         | no          | δ+
no          | yes         | -δ-
no          | no          | 0
yes         | yes         | freq(pi, POS(Pu, ci) ∪ NEG(Pu, ci))

Table 1. Mapping score

If only γ(δ+) holds, that is, a sufficient part of the relevant results satisfy the predicate, a positive mapping m(pi, cj, δ) is created with δ = δ+. Similarly, if only γ(δ-) holds, a negative mapping m(pi, cj, δ) is created with δ = -δ-. If neither γ(δ+) nor γ(δ-) is satisfied, the predicate does not enable distinguishing between relevant and irrelevant results; consequently, a neutral mapping m(pi, cj, 0) is created. Finally, if both γ(δ+) and γ(δ-) hold, the predicate corresponds to a user preference which is always valid. Thus, a positive mapping m(pi, cj, δ) is created with δ equal to the frequency of pi in the union of the relevant and irrelevant results (δ = freq(pi, POS(Pu, ci) ∪ NEG(Pu, ci))). The latter case can be seen as part of a user view definition. For example, if the user watches only movies in English, then all relevant results will satisfy the predicate "language='English'", but the same will be true for all irrelevant results too. This kind of predicate has to be taken into account, as it corresponds to general user preferences which have to be satisfied within the actual context.
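The mappings-initialization decision can be sketched as follows (a Python sketch; the threshold values, the frequency function and the representation of results as dictionaries are illustrative assumptions, not the paper's implementation):

```python
# Sketch of the mappings-initialization step: compute the occurrence
# frequency of a predicate in the relevant (POS) and irrelevant (NEG)
# result sets, then derive the mapping score delta as in Table 1.

def freq(predicate, results):
    """Fraction of results satisfying the predicate (results: list of dicts)."""
    if not results:
        return 0.0
    attr, value = predicate
    return sum(1 for r in results if r.get(attr) == value) / len(results)

def mapping_score(predicate, pos, neg, gamma_pos=0.5, gamma_neg=0.5):
    d_plus, d_minus = freq(predicate, pos), freq(predicate, neg)
    if d_plus >= gamma_pos and d_minus >= gamma_neg:
        return freq(predicate, pos + neg)   # always-valid preference
    if d_plus >= gamma_pos:
        return d_plus                       # positive mapping
    if d_minus >= gamma_neg:
        return -d_minus                     # negative mapping
    return 0.0                              # neutral mapping

pos = [{"language": "English", "genre": "drama"},
       {"language": "English", "genre": "comedy"}]
neg = [{"language": "English", "genre": "horror"}]
print(mapping_score(("genre", "drama"), pos, neg))      # 0.5  (positive)
print(mapping_score(("language", "English"), pos, neg)) # 1.0  (always valid)
```

In this toy history, "language='English'" holds in both the relevant and irrelevant sets, so it is scored as an always-valid preference, exactly the view-definition case discussed above.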

Finally, the context specificity check phase prunes mappings which relate the same profile predicate to all contexts. Indeed, these mappings concern profile predicates which do not have to be contextualized, as they have to be considered in all contexts. The result of this step is a set of valid mappings representing links between the user profile and the application contexts.

In the approach proposed in this section, we make several assumptions, such as the existence of user feedback, the possibility to distinguish between relevant and irrelevant results, etc. These assumptions lead to interesting research problems which are not discussed in this paper.

4.2 Profile-Context Binding
The profile-context binding consists in identifying the parts of the user profile which are related to a given context. While it is easy to get the user profile, we assume that there exists a context server which provides the id of the current context. The profile-context binding takes as input the user profile Pu, the set of mappings M issued from the contextualization and the current user context ci. It returns a contextualized profile P'u which contains only the profile predicates which have to be considered by the application in the context ci. The main problem to consider when developing a binding service is how to take into account the mapping scores (values of δ). Several approaches may be envisioned. We propose a specific service, called α-Binding, where this problem is solved by transforming the profile preferences in order to take into account the mapping scores. It proceeds in two steps (Figure 7).

Figure 7. α-Binding. [The figure shows the user profile Pu, the mappings M(Pu, C) and the actual user context ci feeding two successive steps: profile content selection (producing the contextual profile content) and mapping score consideration (producing the contextualized user profile P'u).]
The first step (profile content selection) consists in selecting the profile preferences which have to be taken into account in the actual context. This corresponds to the union of the non-contextualized preferences, the preferences which appear in positive mappings and those which appear in negative mappings. In other words, only the profile predicates involved in neutral mappings are pruned. This step preserves the information about the mappings' validity by assigning to each contextualized profile preference the corresponding value of δ in the actual context. Thus, the output of this step is composed of two sets of profile preferences: a set of triplets (pi, wi, δi) for the contextualized preferences and a set of couples (pi, wi) for the non-contextualized profile preferences.

The second step (mapping score consideration) transforms the contextualized user preferences into simple preferences. In other words, it transforms the triplets (pi, wi, δi) produced by the previous step into couples (p'i, w'i). The goal of this step is to normalize the user profile format. Firstly, the negative mappings are transformed: we assume that when it is irrelevant for the user to satisfy pi (δ < 0), then it is relevant for him to satisfy the opposite predicate ¬pi. The associated mapping score is equal to -δ. For example, if the predicate "genre='drama'" appears in a negative mapping, this predicate is transformed into a positive "genre≠'drama'" user preference.

Secondly, the weight wi of each contextualized predicate is updated according to the mapping score δ. Among various possibilities, we choose to replace the weight wi by |δ|. The next two sections present services for exploiting a user profile.
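The two α-Binding steps can be sketched as follows (a Python sketch; the dictionary representation and the "NOT " negation syntax are illustrative assumptions):

```python
# Sketch of the alpha-Binding transformation: prune neutral mappings,
# flip negatively mapped predicates, and replace each contextualized
# weight by |delta|.

def alpha_binding(preferences, mappings, context):
    """preferences: {predicate: weight}; mappings: {(predicate, context): delta}.
    Returns the contextualized profile as {predicate: weight}."""
    contextualized = {}
    for predicate, weight in preferences.items():
        if (predicate, context) not in mappings:
            contextualized[predicate] = weight      # non-contextualized: kept as-is
            continue
        delta = mappings[(predicate, context)]
        if delta == 0:
            continue                                # neutral mapping: pruned
        if delta < 0:
            predicate = "NOT " + predicate          # negative: opposite predicate
        contextualized[predicate] = abs(delta)      # weight replaced by |delta|
    return contextualized

prefs = {"genre='drama'": 0.8, "language='English'": 0.9, "genre='sport'": 0.5}
maps = {("genre='drama'", "evening"): 0.7, ("genre='sport'", "evening"): -0.6}
print(alpha_binding(prefs, maps, "evening"))
# drama is kept with weight 0.7, English is kept unchanged,
# and sport is negated with weight 0.6.
```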

4.3 Query reformulation
Query reformulation is a personalization service which enables adapting query processing to a given user profile [24]. This service relies on the following assumptions:
– the users interact with a mediation system where mappings between the virtual schema and the source schemas are defined in a Local As View (LAV) manner;
– user queries are conjunctive queries of type Select-Project-Join; and
– terminological problems between terms (attribute and table names) are supposed to be resolved.

The goal of query reformulation is to answer the following questions: (i) how to take into account the multiple data sources? and (ii) how to consider the user preferences? To answer these questions, two types of techniques are used: (i) query rewriting, which is used to select data sources and to substitute the query variables, and (ii) query enrichment, which exploits the user profile. The query rewriting process consists in transforming the user query, expressed on the virtual schema, into expressions defined on the data sources [18]. The query enrichment process exploits the user profile to enhance the expressiveness of a given query by integrating specific knowledge taken from the user profile [27]. The query reformulation takes as input an initial user query Qu, a user profile Pu, the virtual schema and the LAV source definitions. It produces a set of enriched query rewritings which integrate the profile content.

Let R and E be respectively the rewriting and the enrichment process. We have implemented three query reformulation services. Two of them are compositions of the previously mentioned processes, while the third one is a query rewriting service driven by the user profile. The first two services depend on the order in which the two processes are applied: it is possible to first select data sources and then enrich the resulting rewritings (the E(R)-Reformulation service), or to first enrich the user query and then rewrite the enriched query (the R(E)-Reformulation service). More details about the query reformulation techniques can be found in [24]. In this paper we only detail the main principles of the third reformulation service, called R/P-Reformulation. It proceeds in four steps (Figure 8).

Figure 8. R/P-query reformulation. [The figure shows the user query, the user profile, the virtual schema and the source definitions feeding four successive steps: query expansion (producing an expanded query), relevant sources identification (producing the relevant sources), relevant sources combination (producing rewritings) and final enrichment (producing the reformulated user query).]

The first step (query expansion) consists in expanding the initial query with relevant virtual relations. The set of selected virtual relations is chosen according to two criteria: (i) the profile predicates which can be expressed on these relations, and (ii) the size of the set. The first criterion has to be maximized to allow a good personalization, whereas the second has to be minimized so as not to penalize the rewriting process too much. The selected relations are then integrated into the initial query. The result of this step is an expanded query on which a sufficient number of profile predicates can be expressed. The second step (relevant sources identification) consists in identifying contributive data sources for

rewriting the query, as well as filtering irrelevant data sources according to the user profile. The result of this step is a set of relevant data sources which can contribute to the query rewriting. The goal of the third step (relevant sources combination) is to produce query rewritings by combining the sources selected in the previous phase. In this step, knowledge from the profile is used to prune irrelevant combinations. Finally, the fourth step (final enrichment) expands the candidate rewritings with profile predicates. This step is quite simple because the concerned profile predicates have been identified during the previous stages of the reformulation process. All personalization services have been implemented and evaluated on a significant benchmark [24]. Integrating virtual relations into the user query during query expansion is a difficult task, similar to the Steiner Tree Problem [22]. Due to its complexity, this task is solved using the Minimum Cost Path Heuristic [42]. Relevant data sources are identified using the first part of the MiniCon algorithm [18] with an additional pruning rule. Relevant sources are then combined using an adaptation of the Apriori algorithm [3], which allows pruning irrelevant combinations as soon as possible. It is important to mention that the query reformulation service can be partially used to perform only query rewriting without personalization, or to perform only query enrichment according to a user profile. Services provided by the PAM can be used in various manners to personalize applications. The next section presents how the PAM can be deployed in several architectures.
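Before moving on, the two composition orders E(R) and R(E) introduced at the start of this section can be illustrated with a toy Python sketch. The set-based query and source model below is a deliberately crude stand-in for LAV rewriting; all names and data are hypothetical:

```python
# Toy sketch of the two composition orders: E(R) enriches each rewriting,
# while R(E) rewrites the already-enriched query.

def enrich(query, profile, k=1):
    """E: add the k highest-weighted profile predicates to the query."""
    extra = sorted(profile, key=profile.get, reverse=True)[:k]
    return query | set(extra)

def rewrite(query, sources):
    """R: keep, per source, the query predicates it can answer (toy model)."""
    return {name: preds & query for name, preds in sources.items()
            if preds & query}

profile = {"language='English'": 0.9, "genre='drama'": 0.4}
sources = {"S1": {"genre='drama'", "year>2000"},
           "S2": {"language='English'"}}
query = {"year>2000"}

# R(E): enrich first, then rewrite -- the added predicate can select sources.
print(rewrite(enrich(query, profile), sources))

# E(R): rewrite first, then enrich each rewriting.
print({s: enrich(q, profile) for s, q in rewrite(query, sources).items()})
```

Even in this toy model the order matters: R(E) selects source S2 because the enriched query mentions the user's preferred language, whereas E(R) never considers S2, since the original query alone does not touch it.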

4.4 Profile matching
Many web applications such as e-commerce, VoD and IPTV implement recommender systems in order to provide users with relevant multimedia content. Matching algorithms are the key components of recommender systems. These algorithms answer important commercial questions such as: Which users are similar to a given user? Which contents may be of interest to a given user? Which users are interested in a given content/product? Which contents are similar to a given content? Answering these questions may be done by implementing four algorithms. The first one takes as input a user profile and returns a ranked list of the most similar profiles. The second takes as input a user profile and a set of content descriptors, and returns a list of contents ranked with regard to their potential score of interest. The third takes as input one content descriptor and a set of user profiles, and returns the set of user profiles that may have some interest in the given content. The last one takes as input one content descriptor and returns a ranked list of the most similar contents. Given that both user and content profiles may be instantiated from the same profile meta model (section 3.1), the four algorithms operate on slightly different inputs. The general pattern of a matcher is an algorithm Matcher(e, K) that takes as input an element e and a set of elements K, and returns a list L of elements of K ranked with respect to their similarity to e. Note that e may be either a user profile or a content descriptor instance, K may be a set of user profiles or a set of content descriptors, and L is a list of elements of K. In the following, we detail only the first algorithm and discuss the modifications needed for the others.
Profile-to-Profile Matcher
A similarity measure is necessary to estimate similarities between user profiles. The measure we propose takes into account three aspects: (i) the structure of the two compared profiles, (ii) the values associated with the attributes of the two profiles, and (iii) the preferences associated with the elements of the two profiles [1].
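The generic Matcher(e, K) pattern can be sketched as follows; the cosine similarity over weighted-concept dictionaries is one possible plug-in measure, and the representations here are assumptions for illustration only.

```python
# Illustrative sketch of the generic Matcher(e, K) pattern with a
# pluggable similarity measure; representations are assumptions.
import math

def cosine(u, v):
    """Cosine similarity of two weighted-concept dictionaries."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def matcher(e, K, similarity=cosine):
    """Rank the elements of K by decreasing similarity to e."""
    return sorted(K, key=lambda k: similarity(e, k), reverse=True)
```

The four concrete algorithms then differ only in the nature of e and K (user profiles or content descriptors).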
Structural similarity corresponds to the similarity of the models from which the profiles are instantiated. Value similarity is quantified in two ways. On the one hand, predefined functions can be applied to basic attributes; for example, comparing two zip codes may result in the arithmetic difference between them, or in the real distance (km) separating their corresponding regions. On the other hand, semantic similarities over domain ontologies can be used. For example, the similarity of two movie genres (Action and Adventure) may be obtained from an ontology or a taxonomy such as TV-Anytime [21]. Preference similarity concerns the preferences that are expressed on profile elements. These preferences may be quantitative [4] or qualitative [23]. In the remainder of the paper, we consider only quantitative ones. For an adequate definition of the problem of preference matching, we need to measure the similarity between the preferences of two user profiles using one of the well-known similarity measures, such as the cosine similarity, the Pearson correlation, or the Spearman rank correlation. For this to be defined, we first need to form the vector representation of a profile's preferences.
Vector representation of profiles
Consider the set D of all distinct <predicate, weight> pairs appearing in the profile. The cardinality N of this set is finite and equal to |D|. Let OD be an arbitrary but fixed order on the pairs appearing in D. We refer to the i-th element of D under the ordering OD by D[i]. A vector representation of a profile P is a real vector V of size N, whose i-th element is the weight of the pair D[i].
Definition. Two profile vectors U and V are said to be homogeneous iff:
- |U| = |V| = N, and
- under a common order OD, ∀i, 1 ≤ i ≤ N, predicate(Du[i]) = predicate(Dv[i]).
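The vector representation and the homogeneity test can be sketched as follows; profiles are assumed to be lists of (predicate, weight) pairs, and for simplicity the shared order OD is taken over predicates rather than over the pairs themselves.

```python
# Minimal sketch of the vector representation and homogeneity test;
# profiles are assumed to be lists of (predicate, weight) pairs.

def vectorize(profile, order):
    """Vector V of size N whose i-th element is the weight of order[i]."""
    weights = dict(profile)
    return [weights[p] for p in order]  # KeyError if a predicate is absent

def homogeneous(p1, p2):
    """True iff both profiles bear exactly the same predicates."""
    return sorted(p for p, _ in p1) == sorted(p for p, _ in p2)
```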

In practice, however, user profile vectors are usually non-homogeneous. Indeed, one user may be interested in Drama, Action and War movies whereas another may be interested in Drama, Adventure, War and Social movies. To overcome the non-homogeneity of profile vectors, we propose three solutions:
- take into account the common concepts only;
- add the absent concepts to each vector with a weight of zero;
- add the absent concepts and predict their weights using the weights of the present concepts and an ontology.
Let us focus on the third solution. Consider two users, Bill and Jean, with their corresponding profiles. The homogenization of Bill's and Jean's vectors is done by adding the pair <Action, w1> to Jean's vector, and the pair <Social, w2> to Bill's vector. The weights w1 and w2 are set using the preference propagation algorithm. Adding a concept c to a vector U and estimating its expected weight leads to searching, in a given ontology O, for the nearest concept of c in U using semantic similarity measures [36, 38]. Let c' be this nearest concept, and sim(c, c') be the semantic similarity of c and c'. The expected weight of c is then given by multiplying the weight that the user gave to c' by the similarity sim(c, c').
Algorithm: Preference propagation

Input:
- Vector of weighted concepts, U
- New concept c, c ∉ Concept(U)
- Ontology of the concept domain, O
Output:
- The expected weight for c
Algorithm:
1. Weights abstraction: transform the input vector into a set of concepts, S1 = Concept(U)
2. Look for the nearest neighbor of c in S1 (in the ontology):
   nearest_neighbor(c, S1) = {c' | c' ∈ S1 ∧ ∀a ∈ S1, a ≠ c' : sim(a, c) ≤ sim(c', c)}
3. Infer the expected weight of c:
   score(c) = score(U.c') × sim(c', c)
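The algorithm above can be sketched as follows; profile vectors are assumed to be dictionaries mapping concepts to weights, and sim stands in for the ontology-based semantic similarity, supplied here as a plain lookup table for illustration.

```python
# Sketch of the preference propagation algorithm; the similarity table
# is an illustrative stand-in for an ontology-based measure sim(., .).

SIMS = {frozenset({"War", "Action"}): 0.7,
        frozenset({"Drama", "Action"}): 0.3}

def sim(a, b):
    return SIMS.get(frozenset({a, b}), 0.0)

def propagate(vector, c, sim=sim):
    """Expected weight of a new concept c absent from `vector`."""
    # Step 1: weights abstraction -- the concept set S1 = Concept(U).
    s1 = set(vector)
    # Step 2: nearest neighbor c' of c in S1 w.r.t. the ontology similarity.
    nearest = max(s1, key=lambda a: sim(a, c))
    # Step 3: infer the expected weight, score(U.c') * sim(c', c).
    return vector[nearest] * sim(nearest, c)
```

With a weight of 0.4 on War and sim(War, Action) = 0.7, the expected weight of Action is 0.4 × 0.7, as in the Jean example in the text.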

An interesting problem is the choice of the most adequate semantic similarity measure between concepts. Following the sample ontology given in Figure 9, the nearest neighbor of the Action concept in Jean's vector is War, and the similarity between them is 0.7. The concept Action is therefore added to Jean's profile vector with an expected weight of 0.4 × 0.7, where 0.4 is the weight associated with the War concept. Similarly, the concept Social is added to Bill's profile vector with an expected weight of 0.9 × 0.2. Once the two user profile vectors are homogenized, the similarity between them is evaluated using one of the well-known similarity measures given above.

Figure 9. Sample of movie genre ontology

This matching service constitutes one of the main online components of the PAM. Other matching services with different semantics may be defined following the same methodology.
Toward a definition of other matchers
Unlike user profiles, which are modeled through a fairly complex meta model because of their richness, contents are usually modeled as a set of weighted concepts. Therefore, contrary to the profile-to-profile matching algorithm, which takes into account the structure, the values and the preferences of the matched profiles, the content-to-profile and content-to-content matchers have fewer aspects to consider. Indeed, the content-to-content matcher may be implemented through a traditional binary vector representation in the case of non-weighted concepts: a concept c is assigned the value 1 in the vector if it appears in the description of the content, and 0 otherwise. The similarity of two content descriptors may then be quantified using one of the well-known similarity measures (e.g. cosine similarity). The similarity of non-weighted content descriptors can also be quantified using a domain ontology and a semantic similarity measure defined on this ontology [1]. In the case of weighted concepts, content descriptors are translated into vectors using the vector representation given above. The homogenization of the resulting vectors can be done by taking into account only the common concepts of the two contents, or by filling the missing weights with zero.

Finally, the profile-to-content and content-to-profile matchers are implemented in the same way as the content-to-content matcher, i.e. by inferring the vector representations of both the user profile and the content descriptor, and measuring the similarity between these two vectors.
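The binary-vector content-to-content matcher can be sketched as follows: each content descriptor is a set of non-weighted concepts, implicitly encoded as a 0/1 vector and compared with cosine similarity. The representation is an illustrative assumption.

```python
# Hedged sketch of the binary-vector content-to-content matcher.
import math

def binary_cosine(d1, d2):
    """Cosine similarity of two concept sets under a 0/1 encoding."""
    if not d1 or not d2:
        return 0.0
    return len(d1 & d2) / math.sqrt(len(d1) * len(d2))
```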

5. PAM DEPLOYMENT The PAM can be used to build new personalized applications or to provide personalization features to existing applications (legacy applications). Even if the PAM is application independent, its deployment depends on the application needs and the architecture. We have identified three use cases of PAM usage: mediation-improvement PAM, application-improvement PAM and service-providing PAM. This section presents the deployment and the usage of the PAM in each architecture.

5.1 PAM-M: Introducing Personalization in Mediators
The PAM-M is used to enhance a mediation system with personalization services. A mediation system gives transparent access to a set of applications (sources). Each application communicates with the mediator through a particular wrapper. When a user sends a query to the mediation system, it is reformulated in order to be evaluated on the real applications.

Figure 10. Introducing Personalization in Mediators

Figure 11. Introducing Personalization in Applications

The PAM's deployment on a mediation system is shown in Figure 10. The role of the PAM-M is to adapt the evaluation of the user query according to the user's preferences and context. In this case, applications do not have to be aware of the personalization or of the PAM-M. The knowledge about users (profiles) and contexts is stored at the PAM-M level and personalization is achieved on the mediator side. For example, when a user issues a query, the mediator can call the profile-context binding service in order to get the contextualized user profile. This profile and the initial query can then be sent to the query reformulation service in order to get a set of personalized query rewritings. The enriched rewritings can then be sent to the real applications in order to get results that are relevant according to the preferences stored in the user profile.
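The mediator-side flow can be sketched as follows: the mediator binds the profile to the context, reformulates the query into personalized rewritings, and evaluates each rewriting on the corresponding wrapped source. All names and signatures are illustrative, not the actual PAM interfaces.

```python
# Hypothetical sketch of the PAM-M flow; binding and reformulation are
# passed in as opaque services, and wrappers map source names to callables.

def personalize_and_dispatch(query, profile, context, wrappers,
                             bind, reformulate):
    """Return the merged results of the personalized rewritings."""
    bound_profile = bind(profile, context)          # profile-context binding
    rewritings = reformulate(query, bound_profile)  # personalized rewritings
    results = []
    for rw in rewritings:                           # evaluate on each source
        wrapper = wrappers.get(rw["source"])
        if wrapper is not None:
            results.extend(wrapper(rw["query"]))
    return results
```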

5.2 PAM-A: Introducing Personalization in Applications
The PAM-A provides applications with personalization services and a persistence layer for profile and context management. The PAM-A is considered as part of the application (Figure 11). Each application has its own PAM-A and thus its own container of profile and context instances. However, in order for applications to interoperate, all PAM-As should share the same profile and context meta models.

The PAM-A can be used in different manners according to the involvement of the application in the personalization process. In the first scenario, the application delegates all personalization tasks to the PAM-A. This scenario behaves similarly to the PAM-M: when a user submits a query, the application calls the query reformulation service before evaluating the enriched query. Another personalization step can be performed after the query evaluation by using the profile-content matching service in order to filter out the non-relevant results. In the second scenario, the application aims to have more control over personalization. The PAM-A is thus more strongly coupled with the application, and services are used during different steps of the query life cycle. For example, the user query can be enriched with profile preferences during the query compilation step, and then the query execution can be done using the profile-content matching service. In the last scenario, the application performs all personalization tasks itself and uses only the PAM-A profile and context management components. For instance, the application may query the PAM-A to get a user profile adapted to a given context.

5.3 PAM-S: Providing Personalization as an Autonomous Service
The PAM-S can be seen as a toolbox of reusable services which can be called by the applications (Figure 12). The PAM-S only receives messages (service invocations), executes the services and sends the results back to the applications. Every application can call the PAM-S services, provided its messages conform to the meta models. Indeed, the PAM's services can only be applied to profiles, contexts and contents which are instances of the profile or context meta model. Consequently, applications have to be aware of the meta models.

Figure 12. Providing Personalization as an Autonomous Service

In this use case, the PAM-S does not store profile and context instances and is considered to be stateless. Consequently, each application manages its own user profiles and contexts, and supervises the service calls. For example, to personalize a query according to a given user profile in a particular context, the application sends the profile and the context to the PAM-S and calls the profile-context binding service; it then sends the bound profile and calls the query reformulation service to get a personalized query. In this example, the PAM-S deals with the two service calls independently. It is the application's responsibility to call the services with the appropriate inputs. The PAM-S can also be deployed for a set of applications which belong to the same information system. In this case, the applications will probably share the same user profiles and contexts. Thus, profile and context management can be performed in the PAM-S layer. However, a protocol has to be developed to deal with concurrent accesses.
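The stateless interaction can be sketched as follows: the application owns the profiles and contexts, and chains two independent service calls. Service names and signatures are assumptions, not the actual PAM-S API.

```python
# Illustrative sketch of stateless PAM-S invocation; the application
# supplies the profile and context on every call.

class PamS:
    """Stateless service facade: nothing is kept between calls."""

    def bind_profile_to_context(self, profile, context):
        # Keep only the preferences that hold in the given context.
        return {name: pref for name, pref in profile.items()
                if context in pref["contexts"]}

    def reformulate(self, query, bound_profile):
        # Enrich the query with the predicates of the bound profile.
        preds = [pref["pred"] for pref in bound_profile.values()]
        return query + " AND " + " AND ".join(preds) if preds else query
```

Because the PAM-S retains no state, the application must pass the bound profile from the first call into the second itself.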

In this section, different scenarios of PAM deployment were presented. We have shown how the PAM can be used according to the application needs and the type of personalization to be performed.

6. RELATED WORK
User profiles. Different attempts have been made to collect and classify the knowledge about users and contexts. For example, P3P [13], as a standard for profile privacy, has identified three categories of profile knowledge: demographic attributes (e.g. identity, age, revenue), professional attributes (e.g. employer, job category, expertise) and behavior attributes (e.g. trace of previous queries, time spent at each navigation link). Another categorization attempt was made by [2] for the digital libraries field. They identified four categories of knowledge: personal data (identity), collected data (content, structure and origin of accessed documents), delivery data (time and support of delivery), and behavioral data (trace of user-system interactions). These attempts to structure profile knowledge are valuable but insufficient to cover the field of personalization, as they are hardly extensible. Context. Context-aware applications emerged with the development of mobile and ubiquitous applications [6], where context was mainly related to spatial and temporal information. A. Dey [16] defined a context as any information that can be used to characterize the situation of an entity, an entity being a person, place, or object that is considered relevant to the interaction between a user and an application, including the user and the application themselves. Based on this definition, many applications aim to adapt their services with regard to the user context. Smart home applications [45], for example, use the context in order to personalize the reactions of future home control systems (luminosity, TV volume) with respect to the user's context. Context-sensitive search [19] exploits the user context in order to provide the user with more relevant answers. Some

authors consider the user's recent activities as being the context. Context is modeled in the literature with a large panel of formalisms. The work closest to our proposition is [20], where a situation (context) is modeled using an entity-relationship schema. In [39], context is modeled through a set of predicates. [41] uses the same formalism as the previous one, but considers that attributes are organized into hierarchies. Other formalisms exist, such as XML-based [34], object-oriented [7], and logic-based [29] ones. Contextual preferences. Recently, the contextualization of user profiles and preferences has attracted attention. In [20], the authors propose a framework for contextual preferences, called situated preferences. In their approach, both user profiles and situations (contexts) are modeled in an entity-relationship model. The contextualization of preferences is modeled as an M:N relationship (pid, sid) expressing that the preference pid holds in the situation sid. However, no details are given about the construction process of these relationships. Stefanidis et al. [41] address the problem of introducing the notion of context into the database field. The context is represented by a set of contextual attributes (e.g. age, weather, accompanying people). It is used to rank database tuples according to contextual preferences of the form (contextState, preferencePredicate, score), which are explicitly specified by users. An example of contextual preference is ((cloudy, alone), movie.genre = Action, 0.8). It expresses the fact that the interest score of Action movies is 0.8 when the weather is cloudy and the user is alone. The difference between this work and our proposition is that the former considers the user profile as part of the context, while we clearly separate these two concepts. We argue that whether the user profile and the context have to be separated depends on the point of view.
On one hand, from the query point of view, the user who issued this query can be seen as being part of

the query context. Thus, no distinction between profile and context is needed. On the other hand, from the user point of view, the context describes the user's environment and thus has to be considered separately from the user profile, which describes the user himself. An approach for the automatic discovery of contextual preferences was initiated by Agrawal et al. [5]. A massive collaboration is used to discover contextual preferences of the form i1 ≻ i2 | X, where the item i1 is preferred to the item i2 in the context X. In [43], contextual preferences are also automatically generated, by observing and analyzing user behavior and histories. The authors define a description-logic-based framework for preference modeling. Each contextual preference is a rule of the form: Rule = ruleName: Context = contextInstance ∧ Preference = preferencePredicate : Score = probabilityScore. In this expression, Score is the probability that an element satisfying preferencePredicate is the ideal one for the user in the context contextInstance. This work is similar to ours as it automatically discovers contextual preferences. However, it computes the probability that an element (e.g. document, content) is the best one in a given context, while we evaluate the relevance degree of the mappings which relate profile elements to contexts. Query reformulation. As we have seen in section 4.3, the reformulation service covers two main tasks: rewriting and enrichment. Various works have addressed the two tasks separately, for different purposes. With respect to the rewriting task, many rewriting algorithms exist [17, 18, 28, 33]. Compared to our proposition, they do not deal with user profiles. Some other approaches in the query rewriting area exploit user preferences during the source selection [30, 44]. User preferences in these approaches concern data

quality. In [30], preferences are used to select the most relevant data sources for a given query. The selected sources are used to generate correct execution plans for the target query. In [44], source selection consists in finding the set of top-k navigational paths from an origin source to a target source. The top-k paths are computed using three quality preferences: cardinality of the target objects, explicit user preferences on data sources or joins, and query execution cost. In contrast to these approaches, which exploit preferences on containers, our reformulation service deals with preferences on content. These approaches can be seen as complementary to our service, because we can use such extrinsic preferences for source filtering. With respect to the enrichment task, as far as we know, the only significant work done in the database area is that of Koutrika and Ioannidis [26, 27]. Similarly to this work, our reformulation service exploits user profiles by expanding the user query with complementary relations derived from the profile or from the virtual schema. However, our proposition goes further and deals with multiple data sources. Actually, the two approaches are complementary: the enrichment phase of our reformulation service is straightforward and could be improved by using the more elaborate ideas of [26]. Matching. The matching paradigm is key to many personalized applications. It refers to the quantification of the similarity between two entities; usually, these entities are either users or contents. In the recommendation field, recommender systems provide users with relevant contents. Relevancy is measured by the similarity between users and contents, which is obtained by matching user profiles with content descriptors. This technique is called content-based filtering [40]. Another way to perform recommendation is collaborative filtering [9]. This technique is based on the assumption that users with the same behavior have the same interests.

7. CONCLUSION
In this paper, we have proposed a personalized access model which provides a generic set of concepts and services for personalizing a wide range of applications. The PAM encompasses meta models for user profiles and for contexts which serve as a foundation for the interoperability between services and applications. We claim that these services constitute fundamental building blocks of a personalized application. Finally, we have shown how the PAM can be deployed in various use cases. Currently, the main services of the PAM and the meta model management platform are implemented. It remains to put these services into practice within specific PAM deployments in real-life applications, and to evaluate the added value of the resulting personalized applications.

References
1. Abbar, S., Bouzeghoub, M., Kostadinov, D., Lopes, S.: Profile Matching: Approaches, Techniques, and Algorithms. Technical report, Alcatel-Lucent, France (2007)
2. Amato, G., Straccia, U.: User Profile Modeling and Applications to Digital Libraries. In: Proceedings of the Third European Conference on Research and Advanced Technology for Digital Libraries, pp. 184-197, Paris, France (1999)
3. Agrawal, R., Srikant, R.: Fast Algorithms for Mining Association Rules. In: Proceedings of the International Conference on Very Large Data Bases (VLDB), pp. 487-499, Santiago, Chile (1994)
4. Agrawal, R., Wimmers, E. L.: A Framework for Expressing and Combining Preferences. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 297-306, ACM, New York (2000)
5. Agrawal, R., Rantzau, R., Terzi, E.: Context-Sensitive Ranking. In: SIGMOD 2006 (2006)
6. Belotti, R., Decurtins, C., Grossniklaus, M., Norrie, M. C., Palinginis, A.: Modelling Context for Information Environments. In: Workshop on Ubiquitous Mobile Information and Collaboration Systems (UMICS), CAiSE 2004, Riga, Latvia (2004)
7. Bouzy, B., Cazenave, T.: Using the Object-Oriented Paradigm to Model Context in Computer Go. In: Proceedings of Context'97, Rio, Brazil (1997)
8. Bradley, K., Rafter, R., Smyth, B.: Case-Based User Profiling for Content Personalization. In: Proceedings of the International Conference on Adaptive Hypermedia and Adaptive Web-based Systems, Trento, Italy (2000)
9. Breese, J. S., Heckerman, D., Kadie, C.: Empirical Analysis of Predictive Algorithms for Collaborative Filtering. Technical report MSR-TR-98-12, Microsoft Research, Redmond, WA, USA (1998)
10. Bright, L., Raschid, L.: Using Latency-Recency Profiles for Data Delivery on the Web. In: Proceedings of the 28th Conference on Very Large Data Bases, pp. 550-561, China (2002)
11. Cherniack, M., Galvez, E., Franklin, M., Zdonik, S.: Profile-Driven Cache Management. In: Proceedings of the 19th International Conference on Data Engineering, pp. 645-656, Bangalore, India (2003)
12. Santos, C., Vieira, N.: Use Reformulated Profile in Information Filtering. In: Proceedings of the AAAI Workshop on Semantic Web Personalization, San Jose, California (2004)
13. Cranor, L., Dobbs, B., Egelman, S., Hogben, G., Humphrey, J., Langheinrich, M., Marchiori, M., Presler-Marshall, M., Reagle, J., Schunter, M.: The Platform for Privacy Preferences 1.1 (P3P1.1) Specification. W3C Working Draft (2005)
14. Dell'Acqua, P., Moniz Pereira, L., Vitoria, A.: User Preference Information in Query Answering. In: Proceedings of the 5th International Conference on Flexible Query Answering Systems, pp. 163-173, Copenhagen, Denmark (2002)
15. Dempski, K. L.: Real Time Television Content Platform: Personalized Programming Over Existing Broadcast Infrastructures. In: Proceedings of the Second International Conference on Adaptive Hypermedia and Adaptive Web-based Systems, USA (2002)
16. Dey, A. K.: Understanding and Using Context. Personal and Ubiquitous Computing (2001)
17. Duschka, O., Genesereth, M.: Answering Recursive Queries Using Views. In: Proceedings of the 16th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS), pp. 109-116, Tucson, AZ (1997)
18. Halevy, A., Pottinger, R.: MiniCon: A Scalable Algorithm for Answering Queries Using Views. Very Large Data Bases Journal 10, 182-198 (2001)
19. Hasan, O., Atwood, M. E., Waters, J., Char, B. W.: A Context-Sensitive Search Mechanism. In: Proceedings of the 8th International Multitopic Conference (INMIC), pp. 368-374 (2004)
20. Holland, S., Kießling, W.: Situated Preferences and Preference Repositories for Personalized Database Applications. In: ER 2004 (2004)
21. http://www.tv-anytime.org/
22. Hwang, F. K., Richards, D. S., Winter, P.: The Steiner Tree Problem. Elsevier, North-Holland (1992)
23. Kießling, W.: Foundations of Preferences in Database Systems. In: Proceedings of the International Conference on Very Large Data Bases (2002)
24. Kostadinov, D.: Data Personalization: An Approach for Profile Management and Query Reformulation. PhD thesis, Université de Versailles St-Quentin-en-Yvelines, France (2007)
25. Kostadinov, D., Bouzeghoub, M., Lopes, S.: Query Rewriting Based on User's Profile Knowledge. In: Actes des 23èmes Journées Bases de Données Avancées, Marseille, France (2007)
26. Koutrika, G., Ioannidis, Y. E.: Personalization of Queries in Database Systems. In: Proceedings of the 20th International Conference on Data Engineering, pp. 597-608, Boston, USA (2004)
27. Koutrika, G., Ioannidis, Y. E.: Personalized Queries under a Generalized Preference Model. In: Proceedings of the 21st International Conference on Data Engineering, pp. 841-852, Tokyo (2005)
28. Levy, A. Y., Rajaraman, A., Ordille, J. J.: Querying Heterogeneous Information Sources Using Source Descriptions. In: Proceedings of the 22nd Very Large Data Bases Conference, pp. 251-262, Bombay, India (1996)
29. McCarthy, J., Buvač, S.: Formalizing Context (Expanded Notes). In: Working Papers of the AAAI Fall Symposium on Context in Knowledge Representation and Natural Language, pp. 99-135, Menlo Park, California (1997)
30. Naumann, F., Freytag, J. C., Spiliopoulou, M.: Quality Driven Source Selection Using Data Envelope Analysis. In: Proceedings of the MIT Conference on Information Quality, pp. 137-152, Cambridge, USA (1998)
31. PAPI: http://icl.cs.utk.edu/papi/pubs/index.html
32. Pottinger, R., Halevy, A. Y.: MiniCon: A Scalable Algorithm for Answering Queries Using Views. Very Large Data Bases Journal 10(2-3), 182-198 (2001)
33. Qian, X.: Query Folding. In: Proceedings of the 12th International Conference on Data Engineering (ICDE), New Orleans, Louisiana (1996)
34. Ryan, N.: ConteXtML: Exchanging Contextual Information between a Mobile Client and the FieldNote Server. Computing Laboratory, University of Kent at Canterbury, http://www.cs.kent.ac.uk/projects/mobicomp/fnc/ConteXtML.html (1999)
35. Rocacher, D., Liétard, L.: Préférences et quantités dans le cadre de l'interrogation flexible: sur la prise en compte d'expressions quantifiées. In: Actes des 22èmes Journées Bases de Données Avancées (BDA), Lille, France (2006)
36. Resnik, P.: Semantic Similarity in a Taxonomy: An Information-Based Measure and its Application to Problems of Ambiguity in Natural Language. Journal of Artificial Intelligence Research (1999)
37. Santos, C., Vieira, N.: Use Reformulated Profile in Information Filtering. In: Proceedings of the AAAI Workshop on Semantic Web Personalization, San Jose, California (2004)
38. Schickel-Zuber, V., Faltings, B.: Inferring User's Preferences Using Ontologies. In: Proceedings of AAAI 2006, pp. 1413-1418 (2006)
39. Schilit, B. N., Adams, N. L., Want, R.: Context-Aware Computing Applications. In: IEEE Workshop on Mobile Computing Systems and Applications, Santa Cruz, CA, USA (1994)
40. Shoval, P., Maidel, V., Shapira, B.: An Ontology-Content-Based Filtering Method. In: I.Tech-2007 - Information Research and Applications (2007)
41. Stefanidis, K., Pitoura, E.: Fast Contextual Preference Scoring of Database Tuples. In: EDBT 2008 (2008)
42. Takahashi, H., Matsuyama, A.: An Approximate Solution for the Steiner Problem in Graphs. Mathematica Japonica, 573-577 (1980)
43. Van Bunningen, A. H., Fokkinga, M. M., Apers, P. M. G., Feng, L.: Ranking Query Results Using Context-Aware Preferences. In: ICDEW 2007 (2007)
44. Vidal, M. E., Raschid, L., Marquez, N., Cardenas, M., Wu, Y.: Query Rewriting in the Semantic Web. In: Proceedings of the 22nd International Conference on Data Engineering Workshops (ICDEW), Atlanta, GA, USA (2006)
45. Vildjiounaite, E., Kocsis, O., Kyllönen, V., Kladis, B.: Context-Dependent User Modelling for Smart Homes. In: Proceedings of the 11th International Conference on User Modeling (UM 2007), pp. 345-349, Corfu, Greece (2007)
46. Zemirli, N., Tamine-Lechani, L., Boughanem, M.: Présentation et évaluation d'un modèle d'accès personnalisé à l'information basé sur les diagrammes d'influence. In: XXVème congrès INFORSID, pp. 89-104, Perros-Guirec, France (2007)