Knowl Inf Syst DOI 10.1007/s10115-011-0475-4 REGULAR PAPER

Preference-oriented mining techniques for location-based store search Jess Soo-Fong Tan · Eric Hsueh-Chan Lu · Vincent S. Tseng

Received: 12 October 2010 / Revised: 11 October 2011 / Accepted: 15 October 2011 © Springer-Verlag London Limited 2012

Abstract With the development of wireless telecommunication technologies, a number of studies have been done on location-based services due to their wide applications. Among them, one of the active topics is location-based search. Most previous studies focused on the search for nearby stores, such as restaurants, hotels, or shopping malls, based on the user's location. However, such search results may not match users' preferences well. In this paper, we propose a novel data mining-based approach, named preference-oriented location-based search (POLS), to efficiently search for the k nearby stores most preferred by the user based on the user's location, preference, and query time. In POLS, we propose two preference learning algorithms to automatically learn the user's preferences. In addition, we propose a ranking algorithm to rank the nearby stores based on the user's location, preference, and query time. To the best of our knowledge, this is the first work to take temporal location-based search and automatic user preference learning into account simultaneously. Through experimental evaluations on a real dataset, the proposed approach is shown to deliver excellent performance.

Keywords Data mining · Location-based search · Preference learning · Feedback · Collaborative filtering

J. S.-F. Tan · E. H.-C. Lu · V. S. Tseng (B), Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan, R.O.C. e-mail: [email protected]; J. S.-F. Tan e-mail: [email protected]; E. H.-C. Lu e-mail: [email protected]

1 Introduction

The advancement of wireless communication techniques and the popularity of mobile devices, such as mobile phones, PDAs, and GPS-enabled devices, have contributed to a number
of location-based services (LBSs). One of the popular topics is the location-based search [31]. Mobile users can search for anything through their mobile devices from anywhere at any time. A number of search platforms, such as PAPAGO [33], Google maps [32], iPeen [14], and UrMap [36], provide the capability to search for nearby stores; the search results depend on the user's current location and query time. In addition, owing to the rapid advancement of Web 2.0 technology [35], many stores have made their information, for example, business hours, location, and features, available on-line, for example, via mapping services such as Google maps. Based on this information, more advanced location-based search engines can be developed, for example, to search for chophouses in a particular city, for the nearest gas station that is open, or for shopping malls that are running promotions. Consider a scenario: John is a traveler visiting an unfamiliar city. He wants to search for a nearby restaurant that matches his preferences at dinner time, but he has no idea what choices he has. Although there are a number of ways to meet this need, for example, Google search or restaurant recommendation Web sites, they have the following drawbacks. (1) John must input his location into the search engine; however, it is difficult to identify his location when there is no obvious landmark. (2) Even if the search engine recommends John a nearby restaurant by automatic location detection, the quality of the restaurant is not guaranteed. (3) Even if restaurant quality can be found on some restaurant recommendation Web sites, the restaurants favored by the public may not match John's taste. Therefore, a location-based search engine based on both user preference and location is pressingly needed. Hence, the main problem we address in this paper is formulated as follows.
Given a user location and a query time, our goal is to develop a system framework that, when queried, provides a nearby store ranking list that the user likes, based on the user location and query time. We expect the framework to efficiently and accurately return an increasingly accurate ranking list after learning for a short span of time. Although existing search systems are convenient for finding stores, traditional recommenders still face three main problems. (1) The lack of location information: the recommender cannot recommend nearby stores because it does not take the user's location into account; for example, a user in Kaohsiung receives a recommendation for a store in the Taipei 101 shopping center. (2) The lack of temporal information: the recommender cannot tell whether a store is open because it does not take the query time into account; for example, a user receives a recommendation for a breakfast store that is closed in the evening. (3) The lack of preference information: the recommender cannot predict the user's preferred stores because it does not take user preferences into account; for example, a user who prefers high-class stores receives a cheap store in the results. In addition, a number of location-based search approaches [13,31] have been proposed, whose main concern is how to rank nearby stores based on the user location. Many search platforms [32,33] provide a distance-based ranking strategy, which ranks stores by the distance between the store and the user. Ranking by public evaluation, for example, from restaurant recommendation Web sites [14], is another strategy, called the evaluation-based ranking strategy, which ranks stores based on public favor.
In addition, the studies [2,21,23] have discussed the issues of ranking strategies. However, none of them takes personal user preference into account; an automatic preference learning mechanism is therefore needed. Moreover, to increase the precision of ranking results, two mechanisms, namely relevance feedback [27] and collaborative filtering (CF) [16], have been applied to search systems. However, no existing work integrates location-based search with preference learning, relevance feedback, and collaborative filtering simultaneously.


Since existing data mining techniques are not sufficient to solve the issues addressed in our research, we created a framework based on data mining concepts together with several new techniques to reach our goal. In this paper, we propose a novel data mining-based approach named preference-oriented location-based search (POLS) to efficiently search for the k nearby stores most preferred by the user based on the user's current location. In POLS, a personal preference database, which contains positive and negative preferences, is designed to maintain the preferences of each user. Based on this database, we propose a novel ranking algorithm for ranking nearby stores based on the user's location. For a store ranking list, the user may rate its entries, and the rating results are used to update the personal database by the proposed feedback-based learning algorithm. Besides, all users' individual preference databases are assembled into a global preference database. A collaborative filtering (CF)-based learning algorithm, which learns preferences from similar users in the global preference database, is proposed to update the user's personal preference database. To the best of our knowledge, this is the first work to integrate location-based search and user preference learning simultaneously. Finally, through experimental evaluation under various system conditions, the proposed methods are shown to deliver excellent performance in terms of ranking accuracy and search latency. Our contributions in this research study are fourfold:

• We propose the POLS framework, a new approach to search for nearby stores based on the user's preference and location. The problems and ideas in POLS have not been well explored in the research community.
• We propose a ranking algorithm, a new technique for ranking stores based on not only personal preferences but also global preferences provided by similar users.
• We propose two automatic learning algorithms, the feedback-based and CF-based learning algorithms, for efficiently learning user preferences.
• Through a comprehensive empirical evaluation and sensitivity analysis, we show that POLS produces excellent performance under various system conditions.

The remainder of this paper is organized as follows. We review related work in Sect. 2, and state the problem and the framework in Sect. 3. In Sect. 4, we describe the proposed data mining approach, POLS. The empirical performance evaluation is presented in Sect. 5. Conclusions and future work are presented in Sect. 6.

2 Related work

In this section, we review previous studies in four categories: (1) global positioning system and Web 2.0, (2) location-based search, (3) ranking algorithms, and (4) recommendation systems.

2.1 Global positioning system

The global positioning system (GPS) [8] has become very common today. With GPS technology in cell phones, location-based services and applications have become very popular, since they can easily provide useful information to mobile users while they are moving outdoors. Well-known location-based applications such as Google maps [32] use the GPS exchange format (GPX) for the interchange of GPS data (waypoints, routes, and tracks) between applications and Web services on the Internet. This standard allows people to chart their routes and waypoints on a Google map, so they can easily share them with others.


Google maps released their application programming interface (API) in the middle of 2005, which led MapQuest [22] to do the same. It also led to a revolution in Web-based applications that has been dubbed "Web 2.0" [35]. Web 2.0 refers to a new approach to coding for the Web that makes use of techniques that do away with the old requirement for a Web page to refresh between operations, instead maintaining a steady connection between the server and the Web browser on your computer. This method had been around for a while but was little used until Google and the Google spin-off Gmail began using it a few years ago. It is the key feature that allows previously unheard-of behavior such as the ability to drag maps around inside a browser window: the map is continuously refreshed through this communication with the server, so the page does not need to be reloaded for changes to appear. This has given rise to many new applications on the Web. Flickr [9], a Web site that allows users to store, share, and use their photos interactively on the Web, was one of the first "early adopters" of the technology. This has led to a raft of off-shoots such as Frappr, Flickrmap, Facebook, Myspace, Digg, and Del.icio.us, all parts of the new trend called the "Social Web." In addition to the Social Web, this also led to other entertainment Web applications such as Taiwan Gourmet Food [28] and iPeen [14]. Taiwan Gourmet Food is a Web site that provides introductions to project cuisine, top store and cuisine lists, cuisine introductions, overseas cuisine, cuisine videos, and a cuisine blog. iPeen is a Web site that allows users to store and share their experiences interactively on the Web in the categories of cuisines, points of interest, movies, books, clothes, and ornaments.
There are many such Web applications on the Internet with copious store content information, such as store location, business hours, public evaluation scores, price levels, and user commentary, which are very useful for a search engine to mine in order to obtain optimal search results.

2.2 Location-based search

A location-based service (LBS) is an information and entertainment service, accessible with mobile devices through the mobile network, that makes use of the geographical position of the mobile device [17,18,26,29]. LBSs can be used in a variety of contexts, such as health, work, and personal life. LBSs include services to identify the location of a person or object, such as discovering the nearest banking cash machine or the whereabouts of a friend or employee, as well as parcel tracking and vehicle tracking services. LBSs can include mobile commerce, taking the form of coupons or advertising directed at customers based on their current location, and they include personalized weather services and even location-based games. In recent years, many studies have addressed location-based search and recommendation. In [6,10], Chow et al. and Ghinita et al. proposed using a cloaked region instead of an exact location point to protect user privacy. In [30], Zhang et al. proposed reusing query results within the same region for mobile and stationary users. In [31], Zheng et al. proposed a location-based social networking service called GeoLife 2.0, in which users can share their GPS trajectory data with others. Such trajectories contain a great deal of valuable geographic information, such as popular and frequently visited locations. This research proposed a similarity measure between users and, based on it, a friend recommendation approach to find potential friends for each user. In [34], Takeuchi et al. proposed a shop recommendation system based on individual user preferences and needs; the system recommends frequently visited shops to the user using a custom estimation algorithm. In [2], Belkin et al. proposed a recommendation system that helps users find the correct words for a successful search.
For popular location-based search services, Google maps [32] and PAPAGO [33] provide the search capability for nearby targets. However, the above studies do not take user preference learning into account while searching nearby targets.

2.3 Ranking algorithm

In information retrieval, a ranking function is a function used by search engines to rank matching items according to their relevance to a given search query. Once a search engine has identified a set of potentially relevant items, it faces the task of determining which items are most relevant, so that they can be displayed at the top of the list. This is typically done by assigning a numerical score to each item based on a ranking function, which incorporates features of the item, the query, and the overall item collection. Ranking functions are evaluated by a variety of means, one of the simplest being to determine the precision of the first k top-ranked results for some fixed k, for example, the proportion of the top-10 results that are relevant, averaged over many queries. Frequently, the computation of ranking functions can be simplified by observing that only the relative order of scores matters, not their absolute values. Hence, terms or factors that are independent of the item may be removed, and terms or factors that are independent of the query may be pre-computed and stored with the item. In the information age, ranking algorithms are necessary to help people choose their preferred items. In [21], Liu et al. proposed an item ranking approach named EigenRank, which ranks items based on the preferences of similar users; the similarity between users is measured by the correlation of rated items rather than rating values. To increase the precision of ranking results, two mechanisms named relevance feedback [4,24,27] and collaborative filtering (CF) [12,16] have been applied to search systems. There are many studies on relevance feedback. In [27], Rui et al.
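The precision-of-the-top-k evaluation described above can be sketched in a few lines. This is a minimal illustration of the metric, not the paper's evaluation code; the function name and the sample item ids are our own assumptions:

```python
def precision_at_k(ranked_items, relevant_items, k):
    """Fraction of the top-k ranked items that are relevant."""
    top_k = ranked_items[:k]
    hits = sum(1 for item in top_k if item in relevant_items)
    return hits / k

# Example: 3 of the top-10 results (items 2, 5, 9) are relevant.
ranked = list(range(1, 21))       # item ids, best first
relevant = {2, 5, 9, 13, 17}
print(precision_at_k(ranked, relevant, 10))  # 0.3
```

Averaging this value over many queries gives the evaluation measure mentioned in the text.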
proposed an interactive relevance feedback approach for content-based image retrieval (CBIR). For each ranked item list, users can rate each item; the idea of relevance feedback is to take the rating results as input knowledge, which is used to improve the ranking quality of a new query. In [24], Moghaddam et al. proposed a FeedbackTrust model that uses feedback effects in trust-based recommendation systems; the system makes recommendations based on three types of trust, namely dispositional, interpersonal, and impersonal. In [4], Buckley et al. proposed an approach named dynamic feedback optimization, which starts with a good weighting scheme based on Rocchio feedback and then improves those weights dynamically by testing possible changes of query weights on the learning-set documents. A number of CF-based search systems have been proposed in the literature. In [16], Jin et al. proposed a new model for collaborative filtering. The idea of collaborative filtering is to measure the similarity between two users based on their rating results; the preferences of users similar to the active user are used to rank an item list for that user. In [12], Herlocker et al. proposed an algorithmic framework for performing collaborative filtering, along with new algorithmic elements that increase the accuracy of collaborative prediction algorithms.

2.4 Recommendation system

Recommender systems [3], or recommendation engines, are a specific type of information filtering system that attempts to present information items, such as films, television, video on demand, music, books, news, images, and Web pages, which are likely to be of interest to the user [25]. Typically, a recommender system compares the user's profile to some reference characteristics and seeks to predict the "rating" that a user would give to an item they have not yet considered.
These characteristics may come from the information item (the content-based approach) or from the user's social environment (the collaborative filtering (CF)-based approach). Many recommendation systems are based on the collaborative filtering approach. In [13], Horozov et al. investigated the issues of a recommender system for location-based points of interest (POI); they proposed a location-based restaurant recommender system built on an enhanced CF-based solution for generating recommendations. In [23], Massa et al. discussed the problem that finding similar users often fails when the rating data are too sparse, and proposed a trust metric that replaces this step in a CF-based recommendation system. In [18], Li et al. proposed a CF-based user similarity measure that assesses the similarity of two users from their GPS trajectory databases. Among content-based recommendation systems, Cano et al. presented a metadata-free system, called MusicSurfer [5], for interacting with massive collections of music; MusicSurfer automatically extracts descriptions related to instrumentation, rhythm, and harmony from music audio signals. Many recommendation systems combine the content-based and CF-based approaches. In [1], Balabanovic et al. proposed a recommendation system named Fab to help users sift through the enormous amount of information available on the World Wide Web; this system combines the content-based and collaborative methods of recommendation in a way that exploits the advantages of both approaches while avoiding their shortcomings. In [7], Debnath et al. proposed a hybrid of content-based and CF-based recommendation, in which the attributes used for content-based recommendations are assigned weights depending on their importance to users. In [19], Kayaalp et al. proposed an application combining the content-based and CF-based approaches that has been integrated with several Web sites; the system collects event data from several related Web sites either by using Web services or by Web scraping.
It also allows users to rate events they have attended or plan to attend. Given the social network between users, the system tries to recommend upcoming events to users.

3 Problem statement

In this section, we first define some terms used in the discussion of our research work and then specify our research goal. Table 1 summarizes the notations used in the paper.

Definition 1 (Preferences) P = {p1, p2, ..., p|P|} denotes the collection of preferences, also called features in this paper.

Definition 2 (Users) U = {u1, u2, ..., u|U|} denotes the collection of users. For all ui ∈ U, each user has a learned preference set Pui = {p1, p2, ..., p|Pui|}, where pj ∈ P, ∀ 1 ≤ j ≤ |Pui|.

Table 1 Notation table

Notation   Description
P          Collection of preferences, also called features
U          Collection of users
L          Two-dimensional coordinates of a location point
Dist       Collection of the distances between stores and users
T          Collection of non-overlapping time slots in a day
S          Collection of stores
RV         Range of rating values a user may give a store
Date       Collection of rating dates

Definition 3 (Similar users) CF = {u1, u2, ..., u|CF|} denotes the collection of similar users used in the collaborative filtering (CF)-based preference learning algorithm in this paper.

Definition 4 (Location) L = (x, y) denotes the two-dimensional coordinates of a location point with longitude x and latitude y.

Definition 5 (Distances) Dist = {dist1, dist2, ..., dist|Dist|} denotes the collection of distances between stores and users, defined by geographical distance.

Definition 6 (Time slots) T = {t1, t2, ..., t|T|} denotes the collection of non-overlapping time slots in a day, since a user's preferences may differ across time slots. The system collects and stores user preferences based on the predefined time slots and uses them when performing lookups. For example, five time slots may be defined, where t1 represents 04:00:00–10:59:59, t2 represents 11:00:00–13:59:59, t3 represents 14:00:00–15:59:59, t4 represents 16:00:00–19:59:59, and t5 represents 20:00:00–03:59:59.

Definition 7 (Stores) S = {s1, s2, ..., s|S|} denotes the collection of stores. For all si ∈ S, each store has a location, business hours, a public store evaluation, a food category, and a set of store properties (also called features in this paper) Fsi = {f1, f2, ..., f|Fsi|}, where fj ∈ P, ∀ 1 ≤ j ≤ |Fsi|.

Definition 8 (Rating values) RV = {0, 1, 2, 3, 4, 5} denotes the range of rating values a user may give a store, where 1 to 5 represent preference levels from not preferred to preferred, and 0 represents an unknown rating. Ratings 1 and 2 represent not preferred, 3 represents ordinary, and 4 and 5 represent preferred; a rating of 1 expresses stronger dislike than 2, and a rating of 5 expresses stronger preference than 4.

Definition 9 (Rating dates) Date = {date1, date2, ..., date|Date|} denotes the collection of rating dates.
For all datei ∈ Date, the rating date may differ from the query date, since the user may rate a store a few days after the query.

3.1 Problem formulation

With the above definitions, the main problem we address in this paper is formulated as follows. Given a user location and a query time, our goal is to develop a system framework that, when queried, provides a nearby store ranking list that the user likes, based on the user location and query time. We expect the framework to efficiently and accurately return an increasingly accurate ranking list after learning for a short span of time. In this paper, we propose the POLS approach, which includes a ranking algorithm and two preference learning algorithms, to solve this problem.
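The time-slot lookup of Definition 6 can be sketched as follows. This is a minimal sketch under the five example slots given above; the function name and the boundary representation are our own assumptions, and slot t5 wraps past midnight:

```python
from datetime import time

# Boundaries follow the five example slots in Definition 6.
SLOTS = [
    ("t1", time(4, 0),  time(10, 59, 59)),
    ("t2", time(11, 0), time(13, 59, 59)),
    ("t3", time(14, 0), time(15, 59, 59)),
    ("t4", time(16, 0), time(19, 59, 59)),
]

def time_slot(t):
    """Map a query time to its non-overlapping time slot."""
    for name, start, end in SLOTS:
        if start <= t <= end:
            return name
    return "t5"  # 20:00:00-03:59:59, wrapping around midnight

print(time_slot(time(12, 30)))  # t2
print(time_slot(time(2, 15)))   # t5
```

Because the slots partition the day, any query time falls into exactly one slot, which is then used as the key for storing and looking up preferences.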

4 Proposed method

In this section, we describe our system framework, namely preference-oriented location-based search (POLS), for a location-based service that searches for the k nearby stores most preferred by the user based on the user's current location. In the POLS framework, three important research issues need to be addressed: (1) recommendation of the k nearby stores for the user, (2) learning of the user's preferences from his/her feedback, and (3) learning of similar users' preferences. Correspondingly, we first propose a ranking algorithm for store ranking based on the user's location, preference, and query time to recommend the k nearby stores that

Fig. 1 System framework POLS (the active user submits a query with his/her current location and k; the ranking algorithm consults the personal preference database to return a ranked store list with ratings; the feedback learning module updates the personal preference database from the user's ratings, and the CF-based learning module draws on the global preference database collected from other users)

are most preferred by the user. Next, we learn the user's preferences from his/her feedback with the feedback-based learning algorithm. Finally, we propose a CF-based learning algorithm to learn similar users' preferences.

4.1 The POLS system framework

Figure 1 shows the proposed POLS system framework. When an active user queries the system, the user's location is detected by any GPS-enabled mobile device. In POLS, a ranking algorithm is designed to rank nearby stores based on the user's personal preference database, which is automatically built by the learning algorithms. The ranking list is presented to the user by POLS, and the user can rate any store in the list. The rating result is then processed by the feedback-based learning algorithm, and the extracted personal preferences are used to update the personal preference database. Besides, other users' preferences are learned as they use POLS and are stored in their own personal preference databases; together, these databases form the global preference database. A CF-based learning algorithm is designed to learn the active user's potential preferences from the global preference database based on similar users. The POLS framework searches for preferred stores based on location; its main purpose is to provide a precise and efficient mobile search system to mobile users.

4.2 Ranking algorithm

The main purpose of the ranking algorithm is to provide a precise store ranking list that prioritizes stores according to public evaluation, user preference, and location. Existing studies consider only one or two of these three factors individually, especially the distance factor; although there are many studies on ranking, none of them takes the user's personal preference into account to identify preferred stores.
Before the ranking algorithm runs, stores that are closed are filtered out by comparing their business hours with the query time, and the remaining stores are scored by the scoring function of the ranking algorithm; the higher the score, the more the user is expected to like the store. For a user u and a store s, the scoring function S(u, s) is defined as (1), which combines three aspects: the public evaluation score SEval(s), the distance score SDist(u, s), and the preference matching score SPref(u, s).

S(u, s) = α × SEval(s) + β × SDist(u, s) + γ × SPref(u, s)    (1)

where α, β, and γ are predefined weights for fusing the three sub-scores, with α + β + γ = 1. SEval(s) is a public evaluation retrieved from Web sites such as restaurant recommendation Web sites; its value is normalized to the range 0 to 1. The distance equations are based on user behavior. First, according to the user's historical rating log, we compute the average distance the user appears to prefer from the actual distances between stores and the user, as in Eq. (2):

distPref(u, s) = (1/n) × Σ_{j=1}^{n} distAct_j(u, s)    (2)

where distAct(u, s) is the actual distance between the locations of user u and store s in kilometers, and n is the number of past transactions rated greater than 3 under the query time slot. Second, we design the distance score SDist(u, s) on the premise that the user is unlikely to prefer a farther store when a closer store with the same features (e.g., a branch) exists. Hence, Eq. (3) assigns a lower distance score to a farther store whose distance exceeds the average preference distance of Eq. (2), while stores close enough to lie within the average preference distance receive the highest distance score of 1.

SDist(u, s) = 1, if distAct(u, s) ≤ distPref(u, s); 1 − distAct(u, s) / R, otherwise    (3)

where distAct(u, s) is the actual distance between the locations of user u and store s in kilometers, distPref(u, s) is the user preference distance defined in (2), and R is a tolerable distance range around the user's location, predefined by POLS for each user in kilometers, for example, 2 km. R can be updated by queries with a distance keyword, for example, 0.5 or 3 km. The user preference distance may differ across time slots; for example, a user may prefer closer stores in the morning but a high-class store in the evening even though it is far away.
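Eqs. (1)–(3) can be sketched as follows. This is a minimal sketch, not the paper's implementation; the function names, the example weights (α = 0.3, β = 0.3, γ = 0.4), and all sample values are our own assumptions:

```python
def pref_distance(past_distances):
    # Eq. (2): mean actual distance of past stores the user rated above 3
    # in the query time slot (all distances in km).
    return sum(past_distances) / len(past_distances)

def dist_score(dist_act, dist_pref, R):
    # Eq. (3): full score inside the preferred distance; otherwise the
    # score drops as the distance approaches the tolerable range R.
    if dist_act <= dist_pref:
        return 1.0
    return 1.0 - dist_act / R

def score(s_eval, dist_act, dist_pref, R, s_pref,
          alpha=0.3, beta=0.3, gamma=0.4):
    # Eq. (1): weighted fusion of the three sub-scores (alpha+beta+gamma = 1).
    return (alpha * s_eval
            + beta * dist_score(dist_act, dist_pref, R)
            + gamma * s_pref)

# A store 0.8 km away, within the user's 1.2 km preferred distance,
# gets the full distance score of 1.0.
print(round(score(s_eval=0.9, dist_act=0.8, dist_pref=1.2, R=2.0, s_pref=0.8), 2))  # 0.89
```

Stores are then sorted by this score and the top k are returned to the user.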
SPref(u, s) is the preference matching score between the preferences of user u and the features of store s. For each mobile user, the personal preference database in POLS contains a positive preference database and a negative preference database. Every entry in the positive or negative preference database contains a preference item and a strength value in the range 0–1. In Fig. 2, there are three positive preferences and two negative preferences in the user's personal preference database; for example, the entry {cheap, 0.9} among the positive preferences indicates that the user strongly prefers cheaper stores. The preference matching score is defined as (4), where Fs is the set of features of store s, PPu and NPu are the positive and negative preferences of user u, respectively, and StrPPu and StrNPu are the strength values of the positive and negative preferences, respectively.

SPref(u, s) = Σ_{∀PPu ∈ Fs} StrPPu − Σ_{∀NPu ∈ Fs} StrNPu    (4)

In Fig. 2, there are two stores, store A and store B, each with three features; their preference matching scores are 0.8 and −0.4, respectively.


Fig. 2 An example for the preference matching score. Mobile user: positive preferences {Cheap: 0.9, Parking: 0.3, Near: 0.2}, negative preferences {Spicy: 0.9, Buffet: 0.4}. Store A features: {Cheap, Parking, Buffet}; store B features: {Parking, Near, Spicy}. Hence SPref(u, sA) = 0.9 + 0.3 − 0.4 = 0.8 and SPref(u, sB) = 0.3 + 0.2 − 0.9 = −0.4.
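Eq. (4) and the Fig. 2 numbers can be reproduced with a short sketch. Representing the preference databases as dictionaries is our own assumption:

```python
def pref_match_score(features, positive, negative):
    # Eq. (4): sum the strengths of matched positive preferences and
    # subtract the strengths of matched negative preferences.
    return (sum(s for f, s in positive.items() if f in features)
            - sum(s for f, s in negative.items() if f in features))

positive = {"Cheap": 0.9, "Parking": 0.3, "Near": 0.2}
negative = {"Spicy": 0.9, "Buffet": 0.4}

print(round(pref_match_score({"Cheap", "Parking", "Buffet"}, positive, negative), 2))  # 0.8 (store A)
print(round(pref_match_score({"Parking", "Near", "Spicy"}, positive, negative), 2))    # -0.4 (store B)
```

A negative score, as for store B, indicates that the store's features match the user's dislikes more strongly than his/her likes.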

For the personal preference database, we propose two preference learning algorithms, namely a feedback-based [27] learning algorithm and an enhanced collaborative filtering (CF)-based [16] learning algorithm, to learn the user's positive and negative preferences and their corresponding strengths.

4.3 Feedback-based algorithm

For a recommended list, users rate stores according to their preferences. Suppose there are five rating values from 1 to 5 indicating the preference level; the higher the rating value, the more the user likes the store. POLS automatically learns positive and negative preferences from the user's feedback and then updates the personal preference database. For the rating values {1, 2, 3, 4, 5}, we classify whether the user likes or dislikes the store: the user likes the store when the rating value is 4 or 5, dislikes it when the rating value is 1 or 2, and regards it as ordinary when the rating is 3. According to the rating values, we define the corresponding preference strength values as {1.0, 0.5, 0, 0.5, 1.0}. For a store rated above or below 3, the store features and the corresponding preference strength values are stored into the positive or negative preference database, respectively. Take Fig. 3 as an example: the user rates store A with 5 on April 15. The preference strength value corresponding to a rating of 5 is 1.0, so all features of store A, namely cheap, parking, and buffet, are stored into the positive preference database with strength value 1.0 and date April 15, since a rating of 5 indicates that the user likes the store. A store feature may be rated more than once; in that case, the preference strengths need to be combined into a single strength value, which we do based on their rating days. In Fig. 3, suppose that today is April 30 and the item "cheap" occurs twice in the positive preference database, with strength values 0.5 and 1.0 rated 30 and 15 days ago, respectively. The combined strength value StrN of preference "cheap" is normalized as

StrN = 0.5 × (30^-1 / (30^-1 + 15^-1)) + 1.0 × (15^-1 / (30^-1 + 15^-1)) = 0.83

In line with human intuition, user preferences change over time, and the preference level decays as days pass; this can be depicted by a parabolic curve, as shown in Fig. 4. The more times a preference has been rated, the more the user prefers it, and the weight of such an active preference decays more slowly than others; in other words, a preference the user has rated as preferred many times stays preferred for a long time. A preference is set to inactive if the


Preference-oriented mining techniques

Fig. 3 The process of feedback learning. Query date: Apr. 15.

Search result:
Rank | Store   | Rating | Strength
1    | Store A | 5      | 1.0
2    | Store B | 2      | 0.5
3    | Store C | 3      | 0

Store features: Store A: cheap, parking, buffet; Store B: parking, near, spicy; Store C: parking, near.

Preference database: positive: cheap 0.5 (Apr. 1), buffet 0.5 (Apr. 1), cheap 1.0 (Apr. 15), parking 1.0 (Apr. 15), buffet 1.0 (Apr. 15); negative: parking 0.5, near 0.5, spicy 0.5.
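The feedback-learning step illustrated in Fig. 3 can be sketched as follows. This is a minimal illustration, not the authors' implementation (which was written in PHP); the rating-to-strength mapping and the positive/negative split are taken from this section, while the function and variable names are our own.

```python
# Rating-to-strength mapping from Sect. 4.3: ratings 4/5 are positive,
# 1/2 are negative, and a rating of 3 ("ordinary") is ignored.
STRENGTH = {1: 1.0, 2: 0.5, 4: 0.5, 5: 1.0}

def learn_feedback(positive_db, negative_db, store_features, rating, date):
    """Append every feature of the rated store, with the strength implied
    by the rating and the rating date, to the matching preference database."""
    if rating == 3:
        return  # ordinary rating: no preference is learned
    target = positive_db if rating >= 4 else negative_db
    for feature in store_features:
        target.append((feature, STRENGTH[rating], date))
```

For example, a rating of 5 for store A on April 15 appends ("cheap", 1.0, "Apr. 15"), ("parking", 1.0, "Apr. 15"), and ("buffet", 1.0, "Apr. 15") to the positive database, matching Fig. 3.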

Fig. 4 The preference level decays with the number of days that have passed (x-axis: number of days passed; y-axis: weight of preference; one curve per rating count, Time(i) = 1, 2, 3)
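The recency-weighted normalization and the decay behavior of Fig. 4 can be sketched as below. This is a hedged reading of the section's definitions, assuming the decay form D(Str_N, d_N, i) = Str_N − e^{g(i)×(d_N − c·2^i)} with g(i) = 1/(1 + c·log₁₀ i) and c = 4; the function names are ours.

```python
import math

def normalized_strength(records):
    """Combine repeated strengths of one feature, weighting each record by
    the inverse of the number of days since it was rated.
    records: list of (strength, days_passed) pairs."""
    weights = [1.0 / days for _, days in records]
    total = sum(weights)
    return sum(s * w for (s, _), w in zip(records, weights)) / total

def decayed_strength(str_n, d_n, i, c=4):
    """Decay a normalized strength str_n by elapsed days d_n; a feature
    rated i >= 1 times decays more slowly (larger offset c * 2**i)."""
    g = 1.0 / (1.0 + c * math.log10(i))  # gradient rate; g(1) = 1
    return str_n - math.exp(g * (d_n - c * 2 ** i))
```

Here normalized_strength([(0.5, 30), (1.0, 15)]) reproduces the Str_N ≈ 0.83 example of this section, and a preference whose decayed strength is no longer positive would be set inactive.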

decayed weight is less than or equal to 0, since such preferences have been unused for longer than their lifetime. After normalizing all the preferences, we decay each active preference strength value based on the number of days that have passed, counted from the rating day. The decay function D(Str_N, d_N, i) is defined as

D(Str_N, d_N, i) = Str_N − e^{g(i)×(d_N − c·2^i)}    (5)

where

g(i) = 1/(1 + c·log₁₀ i)

and Str_N is an active preference strength value after normalization. The term d_N − c·2^i controls the distance (in days passed) between the parabolic curve and the y-axis (weight of preference): d_N is the number of days passed on the x-axis, used to read the weight of preference on the y-axis from the parabolic curve, and the larger the offset c·2^i, the farther the curve lies from the y-axis. In other words, the more times (i) the user has rated a preference as preferred, the more slowly its weight decays. c is a predefined gradient coefficient, set to 4, and the base of 2^i is set to 2 to tune the distance between the number of days passed and the weight of user preference. g(i) is the gradient rate; the active preference strength decays more sharply when the gradient coefficient is larger. All of the preferences are taken into consideration. After strength decaying,


J. S.-F. Tan et al.

Fig. 5 An example for measuring the similarity between two users.

Mobile user U, positive preferences: cheap 0.5 (Apr. 1), buffet 0.5 (Apr. 1), cheap 1.0 (Apr. 15), parking 1.0 (Apr. 15), buffet 1.0 (Apr. 15); counted positive preference set R_u: cheap 2, buffet 2, parking 1 (N_u = 3).

Mobile user V, positive preferences: spicy 0.5 (Apr. 4), parking 0.5 (Apr. 7), cheap 1.0 (Apr. 12), parking 1.0 (Apr. 16), spicy 1.0 (Apr. 16); counted positive preference set R_v: spicy 2, parking 2, cheap 1 (N_v = 3).

Sim(u, v) = (Min(2, 1) + Min(1, 2)) / (1 + log_e(3 × 3)) = 0.626

any active preference with non-positive strength indicates that it has been unused for a long time. Hence, an updating process sets such preferences as inactive in the preference database.

4.4 Collaborative filtering (CF)-based algorithm

For global preference learning, we propose an enhanced CF-based learning algorithm that learns the potential preferences of a user from other, similar users [20]. Consequently, POLS can recommend stores based on both the personal and the potential preferences of the user. To achieve this learning scheme, the rating information of all users is stored in a rating log whenever a rating is made. From the rating log, we analyze the similarity between the preference sets of the active user and those of the other users. The positive preferences of users with highly similar preference sets are transferred to the active user's personal preference database and become the user's potential preferences. To analyze this similarity, a positive preference set is built from each user's positive preferences, maintaining the preference items and their counts. In Fig. 5, R_u and R_v are the positive preference sets of users u and v, respectively. The similarity between two users is then measured by the proposed similarity measurement, defined as

Sim(u, v) = Σ_{r ∈ R_u ∩ R_v} Min(cnt_{r,u}, cnt_{r,v}) / σ    (6)

where r denotes a positive preference item that appears in both R_u and R_v, cnt_{r,u} and cnt_{r,v} are the counts of preference item r for users u and v, respectively, and σ is a similarity controller defined as

σ = 1 + log_e(N_u × N_v)    (7)

where N_u and N_v are the numbers of positive preference items in the positive preference sets of users u and v, respectively.
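Equations (6) and (7) can be sketched directly. The following hypothetical helper takes each user's positive preference list (with repeats, so that counts can be derived) and reproduces the Sim(u, v) = 0.626 example of Fig. 5; the function name is our own.

```python
import math
from collections import Counter

def similarity(prefs_u, prefs_v):
    """Sim(u, v) per Eqs. (6)-(7): summed minimum counts of the common
    positive preference items, divided by sigma = 1 + ln(N_u * N_v),
    where N_u and N_v are the numbers of distinct positive items."""
    ru, rv = Counter(prefs_u), Counter(prefs_v)
    overlap = sum(min(ru[r], rv[r]) for r in ru.keys() & rv.keys())
    sigma = 1.0 + math.log(len(ru) * len(rv))
    return overlap / sigma
```

With user U's counted set {cheap: 2, buffet: 2, parking: 1} and user V's {spicy: 2, parking: 2, cheap: 1}, the function returns (Min(2, 1) + Min(1, 2)) / (1 + ln 9) ≈ 0.626, as in Fig. 5.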



Dividing the similarity by a factor of N_u × N_v is motivated by the problem of unbalanced data among users. Intuitively, users who join a Web community at different times generate different amounts of logs. If we did not consider the scale of the data, individuals owning a large amount of data would be more likely to be similar to others than users having less data. Conversely, taking the logarithm of N_u × N_v avoids the opposite problem, in which individuals owning a large amount of data would be unlikely to be similar to anyone, even when user v is in fact similar to user u. Take Fig. 5 as an example. In the two personal positive preference databases, user U prefers "cheap," "buffet," and "parking," and user V prefers "spicy," "parking," and "cheap." POLS first counts these preference items to form the positive preference sets. The common preferences of users U and V are "cheap" and "parking," and their similarity is measured as 0.626 by applying (6) over all the common preferences.

4.5 Complexity analysis

In this section, we analyze the time complexity of POLS in the best, worst, and average cases. Three factors determine the time complexity: the number of users (U_N) of POLS, the number of stores (S_N) in the POLS database, and the number of similar users (CF_N) considered by the CF-based preference learning algorithm. The time complexity of POLS is analyzed as follows.

Best case: O(U_N × S_N). The latency increases as U_N or S_N increases. The more users of POLS, the more calculation is performed, and hence the higher the time overhead. Likewise, the more stores, the more comparisons between store properties and user preferences must be performed, which also increases the time overhead of POLS. Thus, U_N and S_N are the main factors affecting the time complexity of POLS in the best case.

Worst case: O(U_N × S_N + CF_N). In addition to the costs above, the CF-based preference learning algorithm compares the active user against the CF_N similar users, which adds the CF_N term to the time complexity.

Average case: O(U_N × S_N + CF_N). Since the operation executes in a time comparable to the majority of possible cases, the Big-O notation of the average case equals that of the worst case: both include the CF_N cost of the CF-based preference learning algorithm.



Fig. 6 An example of search result list

5 Experimental evaluation

In this section, we conduct a series of experiments to evaluate the performance of the proposed POLS under various system conditions. All experiments were implemented in PHP on a 2.40 GHz machine with 3 GB of memory running Windows Vista.

5.1 Experimental model

For the store dataset, we collected information on 598 restaurants in Tainan city from the iPeen restaurant recommendation Web site [14]. Each restaurant record contains features, location, public evaluation, and business hours. The average number of features per restaurant is 3. For the user set, 16 laboratory members participated in our experiment. At the beginning of the experiments, we assigned each participant a location and a query time from six predefined distinct locations and three predefined distinct query time slots, respectively. The participants queried POLS at breakfast, lunch, and dinner time, twice each, among the six assigned locations; in other words, every participant queried POLS six times per day. The participants were asked to imagine being at the assigned time slot and location while making a query to POLS. For every result list, the participants rated the top-10 stores based on their personal preference for each store under the assigned time slot and location. An example search result list is shown in Fig. 6, where store Saiki is predicted as the store most preferred by the user, followed by stores Garden, Starbucks, and Shojikiya. The distance shown after each store name is the distance between the store and the user, and the store features matched to the user's preferences are shown after the distance. The food categories on the left-hand side of the screen can be multi-selected by the user to view only the stores providing the selected kinds of food. The duration of the real experiment is 1 month.

During the experiment, we observed the rating behavior of the participants, who rated stores based on the public store evaluation, the store distance, and their own preferences. From this behavior, we constructed a regular rating scheme for the simulation experiments. To evaluate the performance of POLS at scale, we enlarged the numbers of stores and users in the simulation experiments. Based on the real store dataset, we simulated restaurant information including features, location, public evaluation, and business hours. The average number of


Table 2 Experimental models

Experimental model | Data       | User
iPeen_Lab          | iPeen      | Lab
iPeen_Sim          | iPeen      | Simulation
Sim_Sim            | Simulation | Simulation

restaurant features is 5. For each simulation user, we designed an action model. Every user has predefined ground-truth preferences, and a simulated query time and location are assigned to each user at the beginning of each experimental transaction. A user may query with or without a keyword according to a probability: a keyword is sent to POLS if a random draw falls under that probability; otherwise, the user searches without a keyword. When a user sends a search query to POLS, a store search result is recommended by the ranking algorithm, which takes location, temporal, and preference information into account. Every simulation user rates the top-10 stores in the list with a rating value from 1 to 5, where a greater value indicates that the user prefers the store more. The user preference is learned by the learning algorithm on each rating action. The rating value may be random or regular, according to a noise rating rate: a random rating value is simulated if a random draw falls under the noise rating rate, and such rating records reflect the noise in POLS. Otherwise, a regular rating scheme is adopted, whose conditions are defined from the observed rating behavior of the laboratory members in the "iPeen_Lab" model. The query location of a simulation user is randomly chosen within the city, and the query time is breakfast, lunch, or dinner time. To evaluate the accuracy of preference learning, we randomly assigned some preferences to each simulation user as the ground truth; the average number of preferences per simulation user is 5. Each user is simulated to search for nearby restaurants and to rate the restaurants in the ranking list recommended by POLS. The rating behavior of the simulation users follows the rating scheme of the laboratory users, observed and constructed in the real experiment.

Every simulation user queries POLS 30 times in the experiment, and all rating results are fed back for the next search iteration. The rating value of a restaurant is based on the restaurant's public evaluation, the distance between the restaurant and the user, and the matching degree between the user's preferences and the restaurant's features. Owing to the different kinds of data and users, there are three experimental models for our experiments, as shown in Table 2. Each model is composed of a dataset and a user set. The dataset is either real data crawled from the iPeen restaurant recommendation Web site, a restaurant search system, or simulated data designed to test POLS; the user set is either the real laboratory members or simulation users designed to verify POLS. The main measurements for the experimental evaluation are as follows. We use the area under the receiver operating characteristic (ROC) curve [11], denoted ROC Area, and the normalized discounted cumulative gain (NDCG) [15] as the accuracy measurements for the ranking list. Both are popular measurements for evaluating the effectiveness of a search engine algorithm. NDCG uses a graded relevance scale over the stores in a result set to measure the usefulness, or gain, of a store based on its position in the result list; the gain is accumulated from the top of the result list to the bottom, with the gain of each result discounted at lower ranks. The ROC curve is a graphical plot of the true positive rate (TPR) against the false positive rate (FPR). It is also known as a relative operating characteristic curve, since it compares the two operating characteristics (TPR and FPR) as the criterion changes.
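NDCG@k can be computed as in the sketch below, using the common log₂ rank discount; the paper cites [15] for the exact formulation, so the discount base here should be read as an assumption.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: graded relevance discounted by the
    log2 of the (1-based) rank, so gains lower in the list count less."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))

def ndcg_at_k(relevances, k):
    """DCG of the top-k results, normalized by the DCG of the ideal
    (descending-relevance) ordering of the same judgments."""
    ideal = dcg(sorted(relevances, reverse=True)[:k])
    return dcg(relevances[:k]) / ideal if ideal > 0 else 0.0
```

A list already in ideal order scores 1.0; moving a highly rated store down the list lowers the score.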



Fig. 7 The impact of α, β, γ on the POLS framework (x-axis: α:β:γ proportions 1:0:0, 0:1:0, 0:0:1, 1:1:0, 1:0:1, 0:1:1, 1:1:1, 2:1:1, 1:2:1, 1:1:2, 2:2:1, 2:1:2, 1:2:2; y-axis: NDCG@10, 0.66 to 0.86)

TPR measures how well a classifier or diagnostic test classifies positive instances correctly among all positive samples available during the test. FPR, on the other hand, measures how many incorrect positive results occur among all negative samples available during the test.

5.2 Impact of various weights for POLS

To find the optimal weights of the three important ranking factors, that is, the weight of public store evaluation (α), the weight of store distance (β), and the weight of user preference (γ) in Eq. (1) of Sect. 4.2, we varied the proportions of α, β, and γ and ran the "iPeen_Sim" model with simulated users rating the top-10 stores of each result list; this experiment is measured by NDCG in Fig. 7. In Fig. 7, we observe that ranking by public store evaluation alone (1:0:0) yields the least accurate ranking list, and store distance alone (0:1:0) or user preference alone (0:0:1) also yields ranking accuracy below 0.78. Considering two of the three ranking factors gives a better result, particularly store distance together with user preference (0:1:1). Moreover, NDCG reaches a higher accuracy of 0.838 when all three ranking factors are considered simultaneously (1:1:1). Finally, we found the optimal proportion to be 1:2:2, which yields the highest accuracy, 0.847. This result fits our expectation, since a store with a high public evaluation might not match the personal preference of the active user; by human sense, store distance and user preference are the more important ranking factors. According to human sense, the evaluation of a store should be normalized to a value larger than 0. In the following experiments, α is set to 0.2, β to 0.4, and γ to 0.4, since α : β : γ = 1:2:2 and α + β + γ = 1.
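Since Eq. (1) of Sect. 4.2 is not reproduced in this excerpt, the weighted combination can only be sketched under assumptions: the convex-combination form and the distance normalization (1 − d/d_max, with a hypothetical default d_max) below are ours, while the weights α = 0.2, β = 0.4, γ = 0.4 follow the 1:2:2 proportion found above.

```python
def ranking_score(public_eval, dist_km, pref_score,
                  alpha=0.2, beta=0.4, gamma=0.4, max_dist_km=5.0):
    """Weighted combination of public evaluation, proximity, and preference
    match; all three inputs are assumed pre-scaled to [0, 1]."""
    proximity = max(0.0, 1.0 - dist_km / max_dist_km)  # nearer is better
    return alpha * public_eval + beta * proximity + gamma * pref_score
```

With the 1:2:2 weights, a nearby, well-matched store can outrank a distant store with a higher public evaluation, which is the behavior Fig. 7 rewards.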
5.3 Impact of various noise rating rates on ranking accuracy

This experiment is performed in the "iPeen_Sim" model to verify POLS while varying the rate of rating noise. We simulate a noise rating rate for the simulation users to reflect the noise



Fig. 8 The impact of noise rating rate on ranking accuracy (x-axis: noise rating rate from 0.5 down to 0.1; y-axis: accuracy, 0 to 0.9, for NDCG@10 and ROC Area@10)

Fig. 9 The impact of top-k measurement on ranking accuracy in "iPeen_Sim" (left) and "iPeen_Lab" (right) (ROC curves, recall vs. false positive rate, for k = 3, 5, 7, 9)

rating of POLS. The accuracy measurements of this experiment are NDCG and ROC Area. The noise rating rate is varied from 0.1 to 0.5 to test the accuracy of POLS, as shown in Fig. 8. In Fig. 8, NDCG and ROC Area increase as the rate of rating noise decreases. We observe that NDCG and ROC Area reach 0.86 and 0.55, respectively, when the noise rating rate is 0.1, and are 0.71 and 0.45, respectively, when the noise rating rate is 0.5. The accuracy increases rapidly once the noise rating rate decreases to 0.3, since the preferences are then learned precisely enough to rank the store list well. Rating noise influences the accuracy of preference learning: the lower the noise rating rate, the higher the ranking accuracy.

5.4 Impact of various top-k measurements on ranking accuracy

This experiment evaluates the accuracy of the top-k stores on the ranking list. The "iPeen_Sim" and "iPeen_Lab" models are used, and the accuracy measurement for both models is the ROC curve. The simulation users and the laboratory members rated the top-10 stores of the result list in each experimental model. For each rated result, the ROC curve measures the accuracy while varying the top-k stores on the result list for both models, as shown in Fig. 9. In both figures, the area under the ROC curve of top-3 is greater than the others, followed by top-5, top-7, and top-9, since POLS learns the user preference accurately and recommends an accurate list of preferred stores.
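ROC Area can also be computed without plotting the curve, via the equivalent rank statistic: the probability that a randomly chosen relevant store is scored above a randomly chosen irrelevant one. A small sketch, with function and variable names of our own:

```python
def roc_area(scores, labels):
    """Area under the ROC curve via the Mann-Whitney identity: count the
    positive/negative score pairs the ranking orders correctly, with ties
    worth half a point."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A ranking that scores every preferred store above every non-preferred one attains 1.0, while random scoring hovers near 0.5.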


Fig. 10 The impact of the number of queries on NDCG (left) and ROC Area (right) (x-axis: the Nth query, 1 to 26; curves: Eval, Dist, Eval + Dist, POLS; left y-axis: NDCG@10, 0.66 to 0.86; right y-axis: ROC Area@10, 0 to 0.7)

Fig. 11 The impact of the number of rating stores on ranking accuracy (x-axis: number of rating stores, 1, 3, 5, 7, 9; y-axis: accuracy, 0 to 1, for NDCG@10 and ROC Area@10)

5.5 Impact of various numbers of queries on ranking accuracy

We implemented several ranking algorithms to compare with our proposed POLS approach while varying the number of queries, in terms of accuracy: the public store evaluation ranking algorithm "Eval," the distance ranking algorithm "Dist," and the combination of both, "Eval + Dist." The "iPeen_Sim" model is used in this experiment, and the measurements are NDCG and ROC Area, as shown in Fig. 10. In both figures, POLS outperforms the other ranking algorithms, and its accuracy increases along a parabolic curve as the number of queries increases. No trend is shown by the other ranking algorithms in either figure, since those algorithms do not include user preference learning, which is an important factor for ranking approaches.

5.6 Impact of various numbers of rating stores on ranking accuracy

This experiment is performed in the "iPeen_Sim" model to verify the impact of the number of rating stores on ranking accuracy. The users are simulated to rate the top-10 stores on the result list, but only the top-1, top-3, top-5, top-7, or top-9 stores are used to learn the user preference. The NDCG and ROC Area measurements are used, measuring the top-10 stores as shown in Fig. 11. Both NDCG and ROC Area increase with the number of rating stores, since more rating stores mean more user feedback and faster preference



Fig. 12 The impact of the number of users on latency (left), space (right) and ranking accuracy (top) (x-axis: number of users, 5 to 50,000; accuracy: NDCG@10 and ROC Area@10, 0 to 1; latency in seconds, 0 to 0.3; space in MB, 0 to 40,000)

learning by POLS. Consequently, the more stores a user rates, the faster POLS learns the user preference and the more accurate the ranking list becomes.

5.7 Impact of various numbers of users on latency, space, and ranking accuracy

This experiment is performed in the "Sim_Sim" model, enlarging the number of users to verify the latency, space, and ranking accuracy of POLS. The accuracy measurements are NDCG and ROC Area. We enlarge the number of users by factors of 10, from 5 to 50,000, and measure latency, space, and ranking accuracy as shown in Fig. 12. In Fig. 12, the ranking accuracy increases with the number of users: with more users, the user preference can be learned more precisely by our proposed CF-based preference learning algorithm. Consequently, the more users POLS has, the more accurate the ranking result list becomes. The latency also increases with the number of users and is about 0.25 s per query transaction with 50,000 users; POLS can thus run in real time, since the ranking result list is returned with a low response time even when the number of users is huge. The size of the database space increases with the number of users. The space increases markedly at 50,000 users, since the increase in users there (50,000 − 5,000 = 45,000) is much larger than at the previous step (5,000 − 500 = 4,500). From Fig. 12, we find that the average space used per user is 40,000/50,000 < 1 MB when the number of users is 50,000.

5.8 Impact of various numbers of stores on latency and ranking accuracy

This experiment is performed in the "Sim_Sim" model, enlarging the number of stores to verify the latency and ranking accuracy of POLS. The accuracy measurements are NDCG and ROC Area. We enlarge the number of stores from 250 to 1,250 and measure both latency and ranking accuracy, as shown in Fig. 13.



Fig. 13 The impact of the number of stores on latency (left) and ranking accuracy (right) (x-axis: number of stores, 250 to 1,250; accuracy: NDCG@10 and ROC Area@10, 0 to 1; latency in seconds, 0 to 0.3; space in MB, 35 to 45)

In Fig. 13, the ranking accuracy shows no obvious variation as the number of stores increases, since the set of store properties is fixed and POLS learns the user preference from this set via our proposed feedback-based preference learning algorithm. Consequently, the number of stores does not influence the accuracy of the ranking result list in POLS. The latency increases with the number of stores and is about 0.23 s per query transaction with 1,250 stores; POLS can thus run in real time, returning a ranking result list with a low response time even when the number of stores is huge. The database space also increases with the number of stores: it grows slowly from 39 to 41 MB as the number of stores increases from 750 to 1,250. Hence, the average database space per store is (41 − 39)/(1,250 − 750) = 0.004 MB ≈ 4.1 KB.

6 Conclusions and future work

In this paper, we have presented a novel store search approach, named preference-oriented location-based search (POLS), that integrates user location, query time, and user preference. To the best of our knowledge, this is the first work to take location-based search and user preference learning into account simultaneously. The list of recommended nearby stores is accurately produced by matching store features to the learned user preferences. The experimental results show that POLS outperforms other existing approaches under different environmental conditions. By considering location and temporal information, we precisely learn the user preferences under different location and temporal conditions with the proposed feedback-based and CF-based preference learning algorithms. Finally, we rank the store list based on user location, query time, and user preference with the proposed ranking algorithm. Moreover, we use the Popular Store Ranking Strategy to provide a popular and nearby store list for the user when



POLS faces the cold-start problem. To solve the new-item problem, we use the store properties as features to match against user preferences. Consequently, the traditional cold-start and new-item problems that often occur in search systems are solved by POLS. In the experiments, we have shown that POLS outperforms ranking approaches based on public store evaluation, store distance, and their combination in terms of accuracy. Overall, POLS reaches the best performance with low computation cost and achieves high-quality ubiquitous store search. In future work, we will try to design a more accurate and efficient preference-based location-based search engine that lets users search for any object from anywhere at any time. Furthermore, we will try to apply POLS to other location-based search applications, such as searching for nearby public transportation or friends.

Acknowledgments This research was supported by the National Science Council, Taiwan, ROC, under grants no. NSC100-2631-H-006-002 and NSC100-2218-E-006-001.

References

1. Balabanovic M, Shoham Y (1997) Fab: content-based, collaborative recommendation. Commun ACM 40(3):66–72
2. Belkin NJ (2000) Helping people find what they don't know. Commun ACM 43(8):58–61
3. Bezerra B, Carvalho F (2011) Symbolic data analysis tools for recommendation systems. Knowl Inf Syst 26(3):385–418
4. Buckley C, Salton G (1995) Optimization of relevance feedback weights. In: Proceedings of the 18th international ACM SIGIR conference on research and development in information retrieval, pp 351–357
5. Cano P, Koppenberger M, Wack N (2005) An industrial-strength content-based music recommendation system. In: Proceedings of the 28th international ACM SIGIR conference on research and development in information retrieval, pp 673–673
6. Chow CY, Mokbel MF, Liu X (2006) A peer-to-peer spatial cloaking algorithm for anonymous location-based services. In: Proceedings of the 14th ACM international symposium on geographic information systems, pp 171–178
7. Debnath S, Ganguly N, Mitra P (2008) Feature weighting in content based recommendation system using social network analysis. In: Proceedings of the 17th international conference on world wide web, pp 1041–1042
8. Dixon T (1991) An introduction to the global positioning system and some tectonic applications. Rev Geophys 29(2):249–276
9. Flickr. http://www.flickr.com/
10. Ghinita G, Azarmi M, Bertino E (2010) Privacy-aware location-aided routing in mobile ad hoc networks. In: Proceedings of the 11th international conference on mobile data management, pp 65–74
11. Hanley JA, McNeil BJ (1982) The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143(1):29–36
12. Herlocker JL, Konstan JA, Brochers A et al (1999) An algorithm framework for performing collaborative filtering. In: Proceedings of the 22nd international ACM SIGIR conference on research and development in information retrieval, pp 230–237
13. Horozov T, Narasimhan N, Vasudevan V (2006) Using location for personalized POI recommendations in mobile environments. In: Proceedings of the 6th international symposium on applications and the internet, pp 124–129
14. iPeen. http://www.ipeen.com.tw/
15. Jarvelin K, Kekalainen J (2002) Cumulated gain-based evaluation of IR techniques. ACM Trans Inf Syst 20(4):422–446
16. Jin R, Si L, Zhai C et al (2003) Collaborative filtering with decoupled models for preferences and ratings. In: Proceedings of the 12th international conference on information and knowledge management, pp 309–106
17. Jose R, Davies N (1999) Scalable and flexible location-based services for ubiquitous information access. In: Proceedings of the 1st international symposium on handheld and ubiquitous computing, pp 52–66
18. Kaasinen E (2003) User needs for location-aware mobile services. Pers Ubiquitous Comput 7(1):70–79
19. Kayaalp M, Özyer T, Özyer ST (2009) A collaborative and content based event recommendation system integrated with data collection scrapers and services at a social networking site. In: Proceedings of the 1st international conference on advances in social network analysis and mining, pp 113–118
20. Li Q, Zheng Y, Xie X et al (2008) Mining user similarity based on location history. In: Proceedings of the 16th ACM SIGSPATIAL international conference on advances in geographic information systems, pp 1–10
21. Liu NN, Yang Q (2008) EigenRank: a ranking-oriented approach to collaborative filtering. In: Proceedings of the 31st international ACM SIGIR conference on research and development in information retrieval, pp 83–90
22. MapQuest. http://www.mapquest.com/
23. Massa P, Avesani P (2007) Trust-aware recommender systems. In: Proceedings of the 1st ACM conference on recommender systems, pp 17–24
24. Moghaddam S, Jamali M, Ester M et al (2009) FeedbackTrust: using feedback effects in trust-based recommendation systems. In: Proceedings of the 3rd ACM conference on recommender systems, pp 269–272
25. Muhlestein D, Lim S (2011) Online learning with social computing based interest sharing. Knowl Inf Syst 26(1):31–58
26. Reitmayr G, Schmalstieg D (2003) Location based applications for mobile augmented reality. In: Proceedings of the 4th Australasian user interface conference, pp 65–73
27. Rui Y, Huang TS, Ortega M (1998) Relevance feedback: a power tool for interactive content-based image retrieval. IEEE Trans Circuits Syst Video Technol 8(5):644–655
28. Taiwan Gourmet Food. http://gcis.nat.gov.tw/tw-food/link.php
29. Taha K, Elmasri R (2010) BusSEngine: a business search engine. Knowl Inf Syst 23(2):153–197
30. Zhang J, Zhu M, Papadias D et al (2003) Location-based spatial queries. In: Proceedings of the 8th ACM SIGMOD international conference on management of data, pp 443–454
31. Zheng Y, Chen Y, Xie X et al (2009) GeoLife2.0: a location-based social networking service. In: Proceedings of the 10th international conference on mobile data management, pp 357–358
32. Google Maps. http://maps.google.com/
33. PAPAGO. http://www.papago.com.tw/
34. Takeuchi Y, Sugimoto M (2006) CityVoyager: an outdoor recommendation system based on user location history. In: Proceedings of the 3rd international conference on ubiquitous intelligence and computing, pp 625–636
35. Toma I, Ding Y, Chalermsook K et al (2009) Utilizing Web2.0 in web service ranking. In: Proceedings of the 3rd international conference on digital society, pp 174–179
36. UrMap. http://www.urmap.com

Author Biographies

Jess Soo-Fong Tan received the BS and MS degrees in Computer Science and Information Engineering from National Cheng Kung University (NCKU), Taiwan, ROC, in 2008 and 2010, respectively. Her research interests include data mining, mobile computing, mobility behavior discovery with prediction, and intelligent recommendation systems.


Eric Hsueh-Chan Lu received the BS degree in Computer Science and Information Engineering from National Taiwan University of Science and Technology (NTUST), Taiwan, ROC, in 2003, and the PhD degree in Computer Science and Information Engineering from National Cheng Kung University (NCKU), Taiwan, ROC, in 2010. His research interests include data mining, mobile computing, social networking, object tracking, Location-Based Services (LBSs), and intelligent transport systems.

Vincent S. Tseng is a professor at the Department of Computer Science and Information Engineering at National Cheng Kung University (NCKU), Taiwan, ROC. Before this, he was a postdoctoral research fellow at the University of California, Berkeley, from January 1998 to July 1999. He is also the president of the Taiwanese Association for Artificial Intelligence and served as the director of the Institute of Medical Informatics of NCKU from 2008 to 2011. Dr. Tseng has a wide variety of research interests covering data mining, biomedical informatics, multimedia databases, and mobile and Web technologies. He has published more than 200 research papers in refereed journals and international conferences, and has held/filed more than 15 patents in the USA and ROC. He is on the editorial board of several international journals and has also served as chair or program committee member for a number of premier conferences related to data mining and database systems.

