Applying Recommendation Technology in OLAP Systems Houssem Jerbi, Franck Ravat, Olivier Teste, and Gilles Zurfluh IRIT, Institut de Recherche en Informatique de Toulouse 118 route de Narbonne, F-31062 Toulouse, France {jerbi,ravat,teste,zurfluh}@irit.fr
Abstract. OLAP systems expose a large multidimensional information space, so they cannot rely solely on standard navigation; they need recommendations that ease the analysis process and help users quickly find data relevant for decision-making. In this paper, we propose a recommendation methodology for assisting the user during his decision-support analysis. The system helps the user query multidimensional data and exposes him to the most interesting patterns, i.e., it provides the user with anticipatory as well as alternative decision-support data. We provide a preference-based approach to apply this methodology.
Keywords: Decision-support analysis, OLAP, Recommendations, Preferences.
J. Filipe and J. Cordeiro (Eds.): ICEIS 2009, LNBIP 24, pp. 220–233, 2009. © Springer-Verlag Berlin Heidelberg 2009

1 Introduction

OLAP (On-Line Analytical Processing) systems are the predominant front-end tools for decision-support systems. They provide a multidimensional view of the data, which is arguably the most natural way to analyze businesses and organizations. Data are organised according to subjects of analysis, called facts, which are associated with axes of analysis, called dimensions. A decision-support analysis is an interactive exploration of Multidimensional DataBases (MDB), which allows users to see data from different perspectives.

1.1 Context and Motivations

Decision-support systems intend to help knowledge workers (executives, managers, etc.) make strategic business decisions. As enterprises face competitive pressure to increase the speed of decision making, decision-support systems must evolve to support new initiatives, such as providing more personalized information access and helping users quickly find relevant data. Recommender systems are one way to meet this need. Recommender systems are best known for their use on e-commerce Web sites (Amazon [15], MovieLens [17]), where they use input about a customer’s interests to provide advice on movies, travels, and leisure activities. OLAP provides an interactive analysis of multidimensional data based on a set of navigational operations. In most cases, the analyst is expected to use these operations
intuitively to find interesting patterns [6,19]. The analysis process thus becomes a laborious and complex task, owing to the large size and the high dimensionality of OLAP data [5]. We argue that the manual effort and the time spent in analysis could be reduced by anticipating the user’s strategy and recommending relevant data for decision-making. Furthermore, the OLAP process brings the user into a world of endless possibilities when applied to a high-dimensional and hierarchical dataset. Analysts are frequently confronted with several adjoining patterns of multidimensional data with various perspectives and different granularity levels, e.g., an analysis of sales amounts by customer may be performed according to cities, zones, departments, regions, and states. Providing advice on relevant patterns effectively reduces user hesitation when exploring multidimensional data. To meet the challenges of more user-centered decision-support systems, OLAP tools should be extended with recommendation techniques that ease the analysis process.

1.2 Related Work

Recommendation approaches have been studied in many research communities, such as information retrieval [2], World Wide Web [3], and databases [13,20]. Existing recommendation approaches are usually classified into the following categories, based on how recommendations are generated:
− content-based methods [14,16] recommend to the user items similar to the ones the user preferred in the past,
− collaborative filtering [12,20] recommends to the user items that people with similar preferences liked in the past, and
− hybrid approaches combine collaborative and content-based methods [3].

In OLAP, Giacometti et al. [7] propose to recommend to the user the next query based on the OLAP server query log. Recommendations are provided irrespective of user preferences, although such preferences play an important role in the success of recommender systems [3].
Besides, this approach recommends full queries only and does not consider flexible recommendations that deal with different levels of user involvement. Recommender systems assume that the target of the recommendations is the current user. Therefore, user modelling plays a central role in the success of these systems [3]. User modelling in OLAP has been studied in two main works. In [10], a context-aware preference model is proposed. This model deals with user interests that vary according to different contexts of OLAP analysis. Sapia [19] proposes to model user behavior in order to improve caching algorithms of OLAP systems. This approach exploits information about characteristic patterns in the user’s data access.

1.3 Aims and Contributions

In order to make the analysis process easy, we intend to define a recommendation methodology for assisting the analyst. This methodology must be adapted to the OLAP analysis pattern. The main contributions of the paper are the following:
− We provide a graph-based model of OLAP analysis: a user analysis consists of a succession of analysis contexts. We model an analysis context through an internal view, irrespective of the data visualization form.
− Motivated by recommendation techniques in the web field, we define a flexible recommendation paradigm according to the details provided by the user.
− We introduce a model of user preferences in OLAP that depend on the analysis context, and we discuss a preference-based approach that applies recommendations.

The remainder of the paper is organized as follows: Section 2 sets the stage by providing an overview of the decision-support analysis; Section 3 introduces promising recommendations in OLAP; Section 4 presents a preference-based recommendation approach. Finally, Section 5 concludes the paper with directions for future research.
2 Decision-Support Analysis

The analytical power of OLAP technology comes from its underlying multidimensional data model, called constellation [11, 18].

2.1 Multidimensional Data Model

A constellation groups several facts, which are studied according to several analysis axes (dimensions) possibly shared between facts. It extends star schemas [11], which are commonly used in the multidimensional context.

Definition. A constellation is defined as (NC, FC, DC, StarC) where NC is a constellation name, FC is a set of facts, DC is a set of dimensions, and StarC: FC → 2^DC associates each fact with its linked dimensions.

Definition. A dimension, noted Di∈DC, is defined as (NDi, ADi, HDi) where NDi is a dimension name, ADi = {aDi1,…, aDiu} is a set of dimension attributes, and HDi = {HDi1,…, HDiv} is a set of hierarchies.

Within a dimension, attribute values represent several data granularities according to which measures can be analyzed. Within the same dimension, attributes may be organised according to one or several hierarchies.

Definition. A hierarchy, noted HDij∈HDi, is defined as (NHj, PHj, WeakHj) where NHj is a hierarchy name and PHj = ⟨pHj1,…, pHjvj⟩ is an ordered set of attributes, called parameters, which represent useful graduations along the dimension, ∀k∈[1..vj], pHjk∈ADi. The function WeakHj: PHj → 2^(ADi − PHj) associates each parameter with a set of weak attributes that add semantic information to the parameter.

A fact reflects information that has to be analyzed through one or several indicators, called measures.

Definition. A fact, noted Fi∈FC, is defined as (NFi, MFi) where NFi is the fact name and MFi = {f1(mFi1),…, fw(mFiw)} is a set of measures associated with aggregation functions f1,…, fw.
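As an illustration, the definitions above can be transcribed into a few Python data classes. This is only a sketch under assumptions: the class and field names, the `is_valid` checks, and the simplified CUSTOMER and DATES dimensions are illustrative choices and are not part of the formal model.

```python
from dataclasses import dataclass, field


@dataclass
class Hierarchy:
    name: str
    parameters: list[str]  # ordered parameters (assumed finest level first)
    weak: dict[str, set[str]] = field(default_factory=dict)  # parameter -> weak attributes


@dataclass
class Dimension:
    name: str
    attributes: set[str]
    hierarchies: list[Hierarchy]

    def is_valid(self) -> bool:
        for h in self.hierarchies:
            params = set(h.parameters)
            # every parameter must be an attribute of the dimension
            if not params <= self.attributes:
                return False
            # weak attributes hang off a parameter and are non-parameter attributes
            for param, weak_attrs in h.weak.items():
                if param not in params or not weak_attrs <= self.attributes - params:
                    return False
        return True


@dataclass
class Fact:
    name: str
    measures: dict[str, str]  # measure name -> aggregation function name


@dataclass
class Constellation:
    name: str
    facts: dict[str, Fact]
    dimensions: dict[str, Dimension]
    star: dict[str, set[str]]  # fact name -> names of its linked dimensions

    def is_valid(self) -> bool:
        links_ok = all(
            fact in self.facts and dims <= set(self.dimensions)
            for fact, dims in self.star.items()
        )
        return links_ok and all(d.is_valid() for d in self.dimensions.values())


# Simplified, hypothetical excerpt of a sales constellation.
customer = Dimension(
    "CUSTOMER",
    {"IDC", "City", "Region", "Country", "First_Name", "Last_Name"},
    [Hierarchy("HFr", ["IDC", "City", "Region", "Country"],
               {"IDC": {"First_Name", "Last_Name"}})],
)
dates = Dimension("DATES", {"IDD", "Month", "Year"},
                  [Hierarchy("HMonth", ["IDD", "Month", "Year"])])
sales = Fact("SALES", {"Revenue": "SUM", "Quantity": "SUM", "Margin": "AVG"})
constellation = Constellation(
    "Distributor",
    {"SALES": sales},
    {"CUSTOMER": customer, "DATES": dates},
    {"SALES": {"CUSTOMER", "DATES"}},
)
```

Keeping `star` as a mapping from fact names to sets of dimension names mirrors the function StarC; a dimension shared by several facts simply appears in several of these sets.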
The following figure shows an example of a constellation that supports the analysis of online sales as well as of the purchase activity of a worldwide distributor (graphical notations are inspired by [8]).

[Figure 1: constellation schema with two facts, SALES (measures Revenue, Quantity, Margin) and PURCHASES (measures Amount, Quantity), and the dimensions DATES, STORE, PRODUCT, CUSTOMER, EMPLOYEE and SUPPLIER with their hierarchies.]

Fig. 1. Example of constellation schema
2.2 OLAP Analysis Modelling OLAP systems offer capabilities to interactively analyse the data by applying a set of specialized operations, such as drill-down, roll-up and slice-and-dice [1,9,18]. It has been recognized [6,19] that the workload of an OLAP application can be characterized by the user’s navigational data analysis task: the user defines a first query then successively manipulates the results applying OLAP operations. Thus, the typical interaction between the user and the system consists of a sequence of queries. Henceforth, the set of queries necessary to answer a business question (e.g. “which products are selling abnormally in low quantities?”) is referred to as analysis. Each query result within a given analysis represents an analysis context. 2.2.1 Analysis Context Within an OLAP analysis both structures and data are displayed. In the context of our research, the term analysis context refers to all the items (structures as well as data) that are displayed in a given instant of the analysis. We model the analysis context through a set of multidimensional structures and values of displayed parameters and measures, called context components [10]. We distinguish two categories of context components: 1) components that are related to the fact context CF: fact (F), measure (m), a value of a measure (valm) and aggregate function (fAgreg); and 2) components that are related to the dimension context CD: dimension (D), parameter (p) and a value of a parameter (valp). Note that each analysis context consists of one fact context and at least two dimension contexts.
Definition. An OLAP analysis context is defined as {CF, CD1, …, CDn} where
− CF = F (/ fAgreg (mj) ∈ {valm})+ is a fact context, where fAgreg (AVG, SUM, …) is an aggregate function, mj ∈ MF, and valm ∈ Dom(mj),
− CDi = Di (/ pk ∈ {valp})+ is a dimension context, where pk ∈ ADi and valp ∈ Dom(pk).

Note that the attributes of a dimension context must belong to the same hierarchy H ∈ HDi. Although decisional data are usually displayed within visualization structures that support interpretation and decision making, such as Multidimensional Tables (MT) and charts, the internal view of data is a tree structure.

Analysis Context Tree. An analysis context is expressed by means of a tree T(V, E) (where V is the set of nodes and E is the set of edges) that reflects the nature of the relationship between the components of an OLAP analysis. There are two types of nodes in V:
− structure nodes: one for the analysed fact (the root of the tree), one for each analysis indicator (a measure associated with an aggregate function), one for each analysis axis (a displayed dimension), and one for each displayed attribute,
− value nodes: one for each value of an attribute or a measure.

Example 1. Fig. 2 depicts an example of a 2-dimensional analysis context, which displays sales revenue by year according to the countries and cities of customers: C = {CF, CD1, CD2}, where
CF = SALES / Sum (Revenue) ∈ {14, 13, 8, 9, 16, 15, 12, 9},
CD1 = CUSTOMER.HFr / Country ∈ {France, USA} / City ∈ {Paris, Toulouse, N-Y, Washington}, and
CD2 = DATES.HMonth / Year ∈ {2007, 2008}.
The internal view of this analysis context, represented in Fig. 2 (a), is displayed to the user as an MT (cf. Fig. 2 (b)) and as a chart (cf. Fig. 2 (c)).
[Figure 2: (a) internal view of the analysis context as a tree rooted at SALES, (b) visualization as a multidimensional table, (c) visualization as a chart of revenue per city (Paris, Toulouse, N-Y, Washington) for 2007 and 2008. Notation: dimension with current hierarchy, measure, parameter, measure or parameter value.]

Fig. 2. Example of a context of analysis of sales revenue
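The analysis context tree of Example 1 can be sketched in Python as follows. The `Node` class and its methods are hypothetical names introduced for illustration, and attaching both parameters directly under the dimension node is an assumption about the tree layout.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    kind: str = "structure"  # "structure" or "value"
    children: list["Node"] = field(default_factory=list)

    def add(self, label, kind="structure"):
        """Attach a child node and return it."""
        child = Node(label, kind)
        self.children.append(child)
        return child

    def count(self, kind):
        """Number of nodes of the given kind in the subtree rooted here."""
        own = 1 if self.kind == kind else 0
        return own + sum(child.count(kind) for child in self.children)


# Fact context: SALES / Sum(Revenue) with its displayed values
root = Node("SALES")
indicator = root.add("SUM(Revenue)")
for value in [14, 13, 8, 9, 16, 15, 12, 9]:
    indicator.add(str(value), "value")

# Dimension context CD1: CUSTOMER.HFr / Country / City
customer = root.add("CUSTOMER.HFr")
country = customer.add("Country")
for value in ["France", "USA"]:
    country.add(value, "value")
city = customer.add("City")
for value in ["Paris", "Toulouse", "N-Y", "Washington"]:
    city.add(value, "value")

# Dimension context CD2: DATES.HMonth / Year
dates = root.add("DATES.HMonth")
year = dates.add("Year")
for value in ["2007", "2008"]:
    year.add(value, "value")
```

With this layout the tree holds 7 structure nodes (the fact, one indicator, two dimensions, three parameters) and 16 value nodes.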
2.2.2 OLAP Analysis Graph

An analysis context represents a given state of the OLAP analysis. Therefore, we consider an OLAP analysis as a succession of analysis contexts. The user performs OLAP operations to move from one context to another. This navigational pattern is best described by a graph representation, where each analysis context corresponds to a node and the edges represent transitions between analysis contexts (see Fig. 3). Notation: CAi: analysis context; Opi: OLAP operation (Drill-down, Roll-up, …).
[Figure 3: a sequence of analysis contexts CA1, CA2, …, CAm, …, CAn, each shown as a context tree rooted at SALES, linked by the OLAP operations Op1, …, Opk, …, Opn-1.]

Fig. 3. OLAP Analysis graph
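A minimal sketch of such an analysis graph is given below. The `AnalysisGraph` class is a hypothetical helper, contexts are reduced to plain labels, and the two operations are chosen only for illustration.

```python
class AnalysisGraph:
    """Trace of an OLAP analysis: contexts are nodes, operations label the edges."""

    def __init__(self, start):
        self.start = start
        self.current = start
        self.edges = []  # (source context, OLAP operation, target context)

    def apply(self, operation, target):
        """Record that `operation` moved the analysis from the current context to `target`."""
        self.edges.append((self.current, operation, target))
        self.current = target

    def path(self):
        """Contexts visited so far, in order."""
        return [self.start] + [target for _, _, target in self.edges]

    def operations_from(self, context):
        """Operations applied on `context`, with the contexts they led to."""
        return [(op, tgt) for src, op, tgt in self.edges if src == context]


analysis = AnalysisGraph("CA1")
analysis.apply("Drill-down", "CA2")
analysis.apply("Roll-up", "CA3")
```

Storing the edges explicitly, rather than only the list of visited contexts, keeps the operation that produced each transition, which is the information a recommender would need in order to learn navigation habits.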
3 Flexible Recommendations in OLAP

In this section, we discuss what the system should recommend to the analyst in order to assist him in navigating through multidimensional data. A common scenario for existing recommendation systems is a Web application with which a user interacts. The system helps the user select items while he explores an online catalog. The user may specify only some details about products, in which case the system displays items that match these details and are close to his profile. Even if the user submits a full request (with all the characteristics of the products he is searching for), the system displays the products that correspond to his request and, alongside them, suggests alternative items that may interest him. Existing web recommendation techniques can be categorized according to the level of involvement of the user in the data seeking process (see Table 1). We adapt these existing techniques to OLAP in order to make the navigation process handy and to help users quickly find relevant data.

Table 1. Recommendations in web applications vs. recommendations in OLAP

Application features | E-commerce website | OLAP
Data structure | Transactional databases | Multidimensional databases
Data view | Detailed, flat, relational | Summarized, multidimensional
Query process | Isolated queries | Navigational analysis process

User input | Recommendation output (e-commerce website) | Recommendation output (OLAP)
Full query | Additional products close to user profile | Anticipatory analysis node; alternative analysis node
Partial query | Products with features stated by user and that are near to his profile | Analysis nodes close to user request and in accordance with his profile
We adopt the graph-based representation of OLAP analyses as the basis for our approach to apply recommendations in OLAP. We define three categories of recommendations according to the details provided by the analyst.

(1) Interactive Assistance in Querying Multidimensional Data. End-users analyse multidimensional data using a textual or a graphical language [4, 18]. In the latter case, query specification is done implicitly by dragging elements from a navigation zone into the visualization structure (e.g., the MT) and incrementally refining the view. With either a textual or a graphical language, the user must state several details for each query, i.e., the analysis axes, the analysis perspectives, etc. In order to make MDB querying easier and faster, the user should be guided along the query specification process: the system incrementally expands the query according to user manipulations. For example, when the user specifies the analysis axis, the system proposes appropriate granularity levels within a drop-down list, i.e., the system generates recommended items to assist users through their interactions. Moreover, the system can cope with any request regardless of its conciseness. As a consequence, the user can rely upon the system and define queries that lack details in order to perform his analysis faster. The system answers partial user queries: it displays decisional data that are related to the user request and that are of particular interest to him. Such an assistance paradigm reduces user uncertainty in the discovery of relevant information when navigating the data of a constellation.

(2) Anticipatory Recommendations. This category of recommendations reduces the user's manual effort and the time spent in analysis by anticipating the user's navigation strategy. Let us consider a user who is interested in detailed data according to days when weekly revenue exceeds 10k Euro.
According to the basic “philosophy” of OLAP technology, the user starts by asking for sales according to weeks. Then the user focuses on weeks where sales revenue exceeds 10k Euro. After that, he/she performs a drilling operation along the temporal dimension to see data by days. By keeping a repository of user analysis habits, the system will be able to anticipate the user analysis strategy by displaying data by days as a result to his first query, i.e., the system avoids intermediate states among user analysis and displays directly the relevant analysis node, called anticipatory node. (3) Alternatives Recommendation. The installation of recommender systems in OLAP guides analysts by offering them helpful alternatives that may be interesting for their decision-making process. This type of recommendation provides an alternative node according to the user navigation graph form. The recommended alternative nodes are provided in addition to the classic result of a user query. They can be subdivided into three major classes: 1) elaborated analysis nodes, which contain more detailed information comparing to the classic node, 2) missed analysis nodes, the system reminds the user of nodes he should ask for, and 3) other analysis nodes, that represent additional nodes the user did not ask for but that may be interesting for him (e.g., the recommender system may provide useful patterns based on user behavior in similar analysis contexts). The additional analysis nodes are useful for data interpretation and ease users to better understand the classic result. In summary, taking into account the interactive and navigational nature of the user query behaviour, applying recommendations in OLAP consists in:
− helping users build an analysis node: the system provides advice on relevant components (dimensions, parameters, …) (Fig. 4, step (1)), and
− suggesting relevant analysis nodes: the system guides users toward relevant patterns by proposing anticipatory nodes (Fig. 4, step (2)) and alternative nodes (Fig. 4, step (3)).

[Figure 4: classic navigation through the analysis contexts CA1, CA2, CA3, with recommendations marked (1), (2) and (3); the alternative analysis contexts are CAL1 and CAL2. Notation: CAi: analysis context; CALj: alternative analysis context.]

Fig. 4. Applying recommendations upon an OLAP analysis graph
4 Preference-Based Recommendation Framework Content-based recommenders build on the intuition “find me things like I have liked in the past”. Following a content-based approach in OLAP could consider that each analysis node is represented by the set of multidimensional data that it displays (analysis context), and each user is represented by a list of analysis preferences. In the following subsections we describe our user preference model and show how such a model can be used to generate recommendations. 4.1 User Preferences Modelling Analysts have various preferences determined upon different analysis contexts [10]. We consider two main categories of user preferences: preferences relating to the analysis axes and preferences concerning the analysis precision. 4.1.1 Preference Context The user may have preferences that depend on more or less general contexts, e.g. a user preference can be associated with the context of analysis of sales or with a more detailed context such as the analysis of sales of a given product category. A preference context CP is a fragment of the analysis context tree. Actually, the context of a user preference does not necessarily contain all the analysis context components. This can be expressed by assigning the value all to the corresponding context components. For example, a user preference that is associated with the context of analysis of sales can be applied in every analysis of sales data irrespective of the analysis axes and parameters. The more the preference context is detailed, the more the related user interest is specific.
4.1.2 Contextual Preference Model

A preference between dimensions defines relevant dimensions for the fact analysis in a specific context.

Definition. Given a constellation C, a preference between dimensions, noted PCk = (≻P, CP), is a strict partial order over the subset of the constellation dimensions that are connected to the same fact, where ≻P ⊆ DC×DC and CP is a preference context.

A preference within a dimension provides priority parameters (dimension attributes) for data analysis in a given context.

Definition. A preference within dimension D, noted PHk, is defined as (≻P, CP) where
− ≻P is a strict partial order over AH ⊂ AD, where AH is a set of parameters and weak attributes situated on the hierarchy H of dimension D and ≻P ⊆ AH × AH,
− CP is a preference context.
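To make the strict partial order requirement concrete, the following sketch checks whether a set of stated preference pairs generates a strict partial order, i.e., whether its transitive closure is irreflexive. The function names and the `Preference` class are hypothetical, and the sample pairs over parameters of the CUSTOMER dimension are chosen for illustration only.

```python
from dataclasses import dataclass


def transitive_closure(pairs):
    """Smallest transitive relation that contains the given pairs."""
    closure = set(pairs)
    while True:
        derived = {(a, d) for a, b in closure for c, d in closure if b == c}
        if derived <= closure:
            return closure
        closure |= derived


def generates_strict_partial_order(pairs):
    """True if the closure is irreflexive, i.e. the stated pairs contain no cycle."""
    return all(a != b for a, b in transitive_closure(pairs))


@dataclass(frozen=True)
class Preference:
    name: str
    pairs: frozenset    # stated pairs (preferred, less preferred)
    context: frozenset  # preference context, here a set of component labels

    def prefers(self, a, b):
        """True if a is preferred to b, directly or by transitivity."""
        return (a, b) in transitive_closure(self.pairs)


# Hypothetical preference: Country before Region, Region before City.
pref = Preference(
    "P1",
    frozenset({("Country", "Region"), ("Region", "City")}),
    frozenset({"SALES/SUM(Revenue)"}),
)
cyclic = {("City", "Country"), ("Country", "City")}
```

Here `pref.prefers("Country", "City")` holds by transitivity, whereas the `cyclic` pairs are rejected because their closure would relate City to itself.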
Example 2. The decision-maker prefers to analyze sales revenue first by country and then by region, but in the context of analysis of yearly sales revenue he/she may also wish to see more detailed data by city and then by country. Such analysis preferences within dimension Customer are defined as follows:
− PHFr1: Country ≻P Region, CP1 = {Sales/ Sum(Revenue)}
− PHFr2: City ≻P Country, CP2 = {Sales/ Sum(Revenue), DATES.HMonth / Year}

This model supports several user preferences within a dimension (respectively between dimensions) that relate to parameters of the same hierarchy (respectively to the same fact), provided that they depend on different contexts. For a given analysis context, if there is a single preference on parameters of a dimension D, the hierarchy of these parameters is used by default to explore D. If there are several, a conflict between hierarchies arises: which hierarchy should be considered to explore D? Hence, it is necessary to define a priority order between hierarchies (preference between hierarchies) to solve such conflicts.

Definition. Given a dimension D, a preference between hierarchies is a strict partial order PDk = (D, ≻P), where ≻P ⊆ HD × HD.

We call profile P the set of contextual preferences that hold for an MDB. By CP(P), we denote the set of preference contexts CP that are associated with at least one preference in P. We assume that such profiles are available. In practice, users may express their preferences explicitly; preferences may also be mined from the users' previous behavior.

4.2 Recommendation Generation

Contextual preferences are used to retrieve relevant analysis elements, which are then used to generate recommendations for the user.

4.2.1 User Preference Selection

User preferences are used to enhance the current analysis context or to build additional contexts that are near to the displayed context.
Now given the current analysis context CA, we would like (1) to identify the set PCand ⊆ P of preferences (P, CP) for which CP = CA, and then (2) to use them to enhance CA or to generate recommended analysis contexts according to CA. For a given context CA, there may be no preference (P, CP) in the profile P with CP = CA, that is, CA ∉ CP(P). Indeed, the profile contains preferences that do not necessarily depend on all analysis context components. To address this, we use those preferences in P that depend on CA, i.e., preferences whose contexts are included in CA. The problem of preference selection is a tree matching problem [10]: a preference whose context tree is included in the tree of the current analysis context (all its edges and nodes belong to it) is a candidate preference. If there are several candidate preferences, the selected preference is the most relevant one, i.e., the one whose context covers the largest part of the current analysis context, that is, whose context tree has the largest number of nodes. Depending on the type of the preference context CP, we distinguish two cases:
− CP concerns a value of a measure or a parameter: integrating the underlying preference leads to moving on to a next analysis node. For example, a user prefers to see detailed data according to days when analysing the sales revenue in Italy (CP = {Sales/ Sum(Revenue), Customer.HFr / Country = ‘Italy’}). When focusing on revenue in Italy, the user needs to turn on the analysis of data by months.
− CP does not contain values: integrating the related preference allows enhancing the analysis context.

4.2.2 Computing Recommendations

A recommender system maintains a repository of user preferences that are used to suggest relevant patterns. Hence a key question becomes: how does the recommender system use these preferences to compute recommendations? An OLAP recommender system allows users to perform full or partial queries and to ask for help to build their analysis report.

User Partial Query. A partial query generates an incomplete analysis context, which cannot be displayed to the user. For each query, the system builds a recommendation in an ascending way by enhancing the analysis context resulting from the user query until it becomes complete: (i) filling out favourite dimensions for the current fact analysis, (ii) specifying the relevant granularity levels of each dimension (dimension parameters), and (iii) aggregating fact data according to the specified parameters.

Example 3. The marketing manager analyses yearly sales revenue according to products’ categories and classes (see CA1 in Fig. 5). He/she intends to replace the Product dimension with the Customer axis. Although he/she does not specify the granularity level within the Customer axis, the system generates a complete analysis context which is close to his/her preferences. The system takes into account the user preference PHFr2 (see Example 2) to enhance the intermediate context (see CAinterm in Fig. 5). Indeed, both PHFr1 and PHFr2 are candidate preferences, but CP2 covers more of the current context CAinterm.
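The candidate selection rule of Section 4.2.1, applied to the situation of Example 3, can be sketched as follows. This is an illustration under assumptions: a context tree is encoded as the set of its root-to-node paths, so that tree inclusion becomes set inclusion; the helper names and the preference name `PItaly` are hypothetical; and the value all for unspecified components is not modelled.

```python
def nodes(*branches):
    """Expand 'A/B/C' branches into the set of all root-to-node paths of the tree."""
    paths = set()
    for branch in branches:
        parts = tuple(branch.split("/"))
        paths.update(parts[:i] for i in range(1, len(parts) + 1))
    return paths


def candidates(profile, current):
    """Names of the preferences whose context tree is included in the current context."""
    return [name for name, context in profile.items() if context <= current]


def select_preference(profile, current):
    """Candidate whose context tree has the most nodes, or None (ties: first stated)."""
    names = candidates(profile, current)
    return max(names, key=lambda name: len(profile[name])) if names else None


profile = {
    "PHFr1": nodes("SALES/SUM(Revenue)"),
    "PHFr2": nodes("SALES/SUM(Revenue)", "SALES/DATES.HMonth/Year"),
    # hypothetical name for the value-dependent preference on Italy
    "PItaly": nodes("SALES/SUM(Revenue)", "SALES/CUSTOMER.HFr/Country/Italy"),
}
# Intermediate context after the pivot: Customer axis without any parameter yet
ca_interm = nodes("SALES/SUM(Revenue)", "SALES/DATES.HMonth/Year", "SALES/CUSTOMER.HFr")

selected = select_preference(profile, ca_interm)
```

In this encoding PHFr1 and PHFr2 are both candidates, and PHFr2 is selected because its context tree has four nodes against two for PHFr1; the Italy preference is not a candidate since its Country node does not belong to the current context.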
[Figure 5: the initial analysis context CA1 (SALES, SUM(REVENUE) by PRODUCT.HCateg and DATES.HMonth), the current context CAinterm obtained by the Pivot operation (PRODUCT.HCateg replaced by CUSTOMER.HFr), and the recommended analysis context CA2 (revenue by City and Year) produced by recommendation computing.]

Fig. 5. Partial analysis context expansion
User Full Query. When the user performs a full query, the system computes the query result, which is considered as the user's current analysis context. Then it looks for extra data patterns (i.e., extra analysis nodes) that are interesting to the user in this current analysis context. Following a preference-based paradigm, such analysis nodes are dynamically built according to user preferences. The basic idea is to gradually construct analysis contexts by altering the current analysis context through the integration of preferences. Preferences that are related to the current context are integrated in decreasing order of their degrees of hierarchy:
− The system searches for preferences between dimensions in order to replace a current dimension with another relevant one. Then, parameters are specified through preferences within the selected dimension.
− The system changes current parameters according to preferences within current dimensions. The generated analysis context differs from the classic one by its granularity levels.

4.3 Recommendation Display

The system determines recommended analysis contexts (internal view), then displays them to the user according to the visualisation structure he uses. Recommendations are provided to the user according to their types:
− An anticipatory recommendation is displayed instead of a classic result. The user can customize the system by stating whether he wants to authorize such recommendations. An explanation of the anticipatory recommendation is displayed alongside it in order to establish trust in the recommender system.
− Recommended alternatives are displayed in a separate part of the visualization interface. Only alternative dataset prototypes (data structure, i.e., fact, measures, dimensions, parameters and restriction predicates) are displayed to the user. The system loads dimension data (parameter values) as well as fact data (measure values) when the user selects a recommended prototype.
4.4 Example Let us consider a decision-maker who has the following preferences that are deduced from his previous interactions with the system:
− PGeoUSA3: State ≻P City, CP3 = {Sales/AVG(Margin), Dates.HMonth / Year}
− PGeoZON4: Zone ≻P Country, CP4 = {Sales/AVG(Margin)}
− PStore5: GeoZON ≻P GeoUSA

Suppose that the user intends to analyse the profit margin in the USA according to dimensions Store and Dates, more precisely by City (from hierarchy GeoUSA) and by Year. The system displays the classic query result (see Fig. 6 (a)). Furthermore, it provides two alternative recommendations in order to help the user quickly find relevant information and discover interesting patterns. The first alternative (profit margin by year according to cities and states, see Fig. 6 (b)) is generated since the user is interested in data by state in the context of analysis of yearly profit margin (PGeoUSA3). This analysis node is richer in information than the classic node since it provides more details on cities (the state of each city). It also relates cities to one another, i.e., the user can assess the values of each city by observing the share of its margin in the total margin of its state. This may help the user evaluate data. The second alternative (profit margin by year and by Zone, see Fig. 6 (c)) provides another perspective to analyse the profit margin. The user is interested in data according to the geographical perspective (GeoZON according to PStore5) and more precisely according to the level Zone in the context of analysis of profit margin (PGeoZON4).
[Figure 6 components: USER INTERFACE, QUERY ENGINE, MULTIDIMENSIONAL DATABASE. Flows: Data analysis, Selection of data prototype, Data loading, Alternatives prototypes.]

Alternatives prototypes:
− SALES/AVG(Margin), DATES/Year, STORE.GeoUSA/State/City
− SALES/AVG(Margin), DATES/Year, STORE.GeoZON/Zone

(a) SALES, AVG(MARGIN) %, by STORE.GeoUSA City and DATES.HMonth Year (STORE.Country = 'USA')

City           2006    2007    2008
Asheville      15,50   16,80   14,00
Chesapeake     18,00   16,50   18,00
Dallas         12,00   10,40   12,00
Durham         14,40   17,20   14,00
Houston        12,00   12,60   14,00
Los Angeles    11,00   16,00   18,00
Norfolk        18,00   15,50   14,00
San Diego       9,00   20,00   18,00

(b) MT related to (b): SALES, AVG(MARGIN) %, by STORE.GeoUSA State/City and DATES.HMonth Year (STORE.Country = 'USA')

State            City           2006    2007    2008
California       Los Angeles    11,00   16,00   18,00
                 San Diego       9,00   20,00   18,00
                 Total          10,00   18,00   18,00
North Carolina   Durham         14,40   17,20   14,00
                 Asheville      15,50   16,80   14,00
                 Total          14,95   17,00   14,00
Texas            Dallas         12,00   10,40   12,00
                 Houston        12,00   12,60   14,00
                 Total          12,00   11,50   13,00
Virginia         Norfolk        18,00   15,50   14,00
                 Chesapeake     18,00   16,50   18,00
                 Total          18,00   16,00   16,00

(c) MT related to (c): SALES, AVG(MARGIN) %, by STORE.GeoZON Zone and DATES.HMonth Year (STORE.Country = 'USA')

Zone       2006    2007    2008
North      14,00   14,00   18,00
South      13,50   13,00   14,00
Central    13,75   15,00   13,00
East       15,00   16,50   15,50
West       12,50   14,00   15,75

Fig. 6. Framework for alternatives recommendation
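The Total rows of the table related to alternative (b) can be checked against the city values with a short Python sketch. It assumes that each state total is the unweighted mean of the values of its cities, which is what the figures of the example happen to satisfy; a real OLAP engine would normally compute AVG(Margin) from the base sales facts, and the two results coincide only when the cities contribute comparable numbers of facts. The function name `state_totals` is hypothetical.

```python
from statistics import mean

# City values (AVG(Margin) in %) taken from Fig. 6, keyed by (state, city).
CITY_MARGIN = {
    ("California", "Los Angeles"): {2006: 11.00, 2007: 16.00, 2008: 18.00},
    ("California", "San Diego"): {2006: 9.00, 2007: 20.00, 2008: 18.00},
    ("North Carolina", "Durham"): {2006: 14.40, 2007: 17.20, 2008: 14.00},
    ("North Carolina", "Asheville"): {2006: 15.50, 2007: 16.80, 2008: 14.00},
    ("Texas", "Dallas"): {2006: 12.00, 2007: 10.40, 2008: 12.00},
    ("Texas", "Houston"): {2006: 12.00, 2007: 12.60, 2008: 14.00},
    ("Virginia", "Norfolk"): {2006: 18.00, 2007: 15.50, 2008: 14.00},
    ("Virginia", "Chesapeake"): {2006: 18.00, 2007: 16.50, 2008: 18.00},
}


def state_totals(city_margin):
    """Roll city values up to the State level, per year.

    Assumption: the total of a state is the unweighted mean of its cities.
    """
    values = {}
    for (state, _city), per_year in city_margin.items():
        for year, value in per_year.items():
            values.setdefault(state, {}).setdefault(year, []).append(value)
    return {
        state: {year: round(mean(vals), 2) for year, vals in per_year.items()}
        for state, per_year in values.items()
    }


totals = state_totals(CITY_MARGIN)
```

Comparing a city value with the total of its state, for instance San Diego in 2006 against the California total for that year, is the kind of check that alternative (b) makes possible and that the classic result (a) does not.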
5 Conclusions

We proposed to apply recommendations in OLAP systems in order to assist the user during his decision-support analysis. This assistance is both implicit, in the form of anticipatory recommendations, and explicit, by providing alternative patterns of data or by helping the user build his analysis reports. Our approach recommends an analysis context, which represents a state of an OLAP analysis. It is independent of the structure used to visualise the data: the system determines the recommended analysis contexts, then displays them to the users according to their data visualization structure (MT, charts, diagrams, …). We defined three categories of recommendations in OLAP according to the details provided by the user, and we discussed how recommendations are generated with regard to the user's preferences.

As future work, we intend to specify preference mining techniques for detecting strict partial order preferences in user log data. These techniques must: 1) elicit user preferences; and 2) discover mappings that associate the user preferences with their related analysis contexts. We also intend to investigate how to improve the recommendations progressively as a user makes increasing use of the system. This leads to conversational recommenders, which engage in an interactive dialog with the user by asking him to give feedback or to answer questions.
References

1. Abelló, A., Samos, J., Saltor, F.: Implementing operations to navigate semantic star schemas. In: International Workshop on Data Warehousing and OLAP, pp. 56–62. ACM, New York (2003)
2. Baeza-Yates, R., Ribeiro-Neto, B.: Modern Information Retrieval. Addison-Wesley, Reading (1999)
3. Balabanovic, M., Shoham, Y.: Fab: Content-based, collaborative recommendation. Communications of the ACM 40(3), 66–72 (1997)
4. Cabibbo, L., Torlone, R.: From a procedural to a visual query language for OLAP. In: International Conference on Scientific and Statistical Database Management, pp. 74–83. IEEE Computer Society, Washington (1998)
5. Choong, Y.W., Laurent, D., Marcel, P.: Computing Appropriate Representations for Multidimensional Data. Data & Knowledge Engineering Journal 45(2), 181–203 (2003)
6. Dittrich, J.P., Kossmann, D., Kreutz, A.: Bridging the gap between OLAP and SQL. In: International Conference on Very Large Data Bases, pp. 1031–1042 (2005)
7. Giacometti, A., Marcel, P., Negre, E.: A Framework for Recommending OLAP Queries. In: International Workshop on Data Warehousing and OLAP, pp. 73–80. ACM, New York (2008)
8. Golfarelli, M., Maio, D., Rizzi, S.: Conceptual Design of Data Warehouses from E/R Schemes. In: Annual Hawaii International Conference on System Sciences (1998)
9. Gyssens, M., Lakshmanan, L.: A foundation for multi-dimensional databases. In: International Conference on Very Large Data Bases, pp. 106–115 (1997)
10. Jerbi, H., Ravat, F., Teste, O., Zurfluh, G.: Management of context-aware preferences in Multidimensional Databases. In: International Conference on Digital Information Management, pp. 669–675 (2008)
11. Kimball, R.: The Data Warehouse Toolkit, 1996, 2nd edn. John Wiley and Sons, Chichester (2003)
12. Konstan, J.A., Miller, B.N., Maltz, D., Herlocker, J.L., Gordon, L.R., Riedl, J.: GroupLens: Applying Collaborative Filtering to Usenet News. Communications of the ACM 40(3), 77–87 (1997)
13. Koutrika, G., Ikeda, R., Bercovitz, B., Garcia-Molina, H.: Flexible Recommendations over Rich Data. In: ACM Conference on Recommender Systems, pp. 203–210. ACM, New York (2008)
14. Lieberman, H.: Autonomous Interface Agents. In: SIGCHI Conference on Human Factors in Computing Systems, pp. 67–74. ACM, New York (1997)
15. Linden, G., Smith, B., York, J.: Amazon.com recommendations: Item-to-item collaborative filtering. IEEE Internet Computing 7(1), 76–80 (2003)
16. Maes, P.: Agents That Reduce Work and Information Overload. Communications of the ACM 37(7), 31–40 (1994)
17. Miller, B.N., Albert, I., Lam, S.K., Konstan, J.A., Riedl, J.: MovieLens unplugged: Experiences with an occasionally connected recommender system. In: ACM International Conference on Intelligent User Interfaces, pp. 263–266 (2003)
18. Ravat, F., Teste, O., Tournier, R., Zurfluh, G.: Algebraic and graphic languages for OLAP manipulations. International Journal of Data Warehousing and Mining 4(1), 17–46 (2008)
19. Sapia, C.: PROMISE: Predicting Query Behavior to Enable Predictive Caching Strategies for OLAP Systems. In: Kambayashi, Y., Mohania, M., Tjoa, A.M. (eds.) DaWaK 2000. LNCS, vol. 1874, pp. 224–233. Springer, Heidelberg (2000)
20. Satzger, B., Endres, M., Kießling, W.: A Preference-Based Recommender System. In: Bauknecht, K., Pröll, B., Werthner, H. (eds.) EC-Web 2006. LNCS, vol. 4082, pp. 31–40. Springer, Heidelberg (2006)