Indirect Location Recommendation

André Sabino

Armanda Rodrigues

CITI, Departamento de Informática, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa Quinta da Torre, 2829–516 Caparica, Portugal

CITI, Departamento de Informática, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa Quinta da Torre, 2829–516 Caparica, Portugal

[email protected]

[email protected]

ABSTRACT Recommending interesting locations to users is a challenge for social and productive networks. The evidence of the content produced by users must be considered in this task, which may be simplified by the use of the meta-data associated with the content, i.e., the categorization supported by the network – descriptive keywords and geographic coordinates. In this paper we present an extension to a productive network representation model, originally designed to discover indirect keywords. Our extension adds a spatial dimension to the information that represents the user production, enabling indirect location discovery methods through the interpretation of the network as a graph, solely relying on keywords and locations that categorize or describe productive items. The model and indirect location discovery methods presented in this paper avoid content analysis, and are a new step towards a generic approach to the identification of relevant information, otherwise hidden from the users. The evaluation of the model extension and methods is accomplished by an experiment that performs a classification analysis over the Twitter network. The results show that we can efficiently recommend locations to users.

Categories and Subject Descriptors
H.4 [Information Systems Applications]: Miscellaneous

General Terms
Application, Experimentation

Keywords
Location Recommendation, Social Networks, Productive Networks

1.

INTRODUCTION

Location recommendation services are not exclusive to location-based social networks. Much like Foursquare¹, the user experience on other social networks can be enriched with the recommendation of locations, even when the user has opted out of the geo-tracking services of the network. In [14], we present our previous work with an information model for the discovery of indirect keywords, where the focus is on the recommendation of keywords that can be interesting to the user, but were never directly used by the user herself or by someone from the user’s contact list. The discovery method for indirect keywords focuses on the relationships enabled by the users’ production, particularly the implicit keyword graph present in the network. The focus on the structure of user production is so relevant for the model that the networks are referred to as productive networks. The approach is focused on the structure of information, and deliberately does not analyze the production’s content, avoiding the computational costs of media analysis and loss of generalization.

We are interested in applying the same principle that enables the discovery of indirect keywords to discover relevant indirect locations. To apply the idea, the user only needs to produce annotated content; no explicit location associated with the user’s production items is required. The keywords used in annotation should enable the discovery of locations that are associated with indirect keywords and items. In this context, the contributions of this paper are:

• An extension of our information model (relating users, items, keywords) with the notion of locations;

• A location recommendation strategy that uses our information model to extract features suitable to train a classifier;

• An evaluation of our model extension, which includes a comparison with results from related work.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. SIGSPATIAL ’14, November 04-07 2014, Dallas/Fort Worth, TX, USA Copyright 2014 ACM 978-1-4503-3135-7/14/11 ...$15.00 http://dx.doi.org/10.1145/2675354.2675697

The remainder of this paper is organized as follows. Section 2 discusses the related work. Section 3 presents the problem of finding interesting locations to recommend. Section 4 describes the model and the potential relationship extraction methods. Section 5 presents the case study used to evaluate our methods and details the evaluation procedure implementation. Section 6 presents a discussion of the results and the outline of future experiments. Section 7 draws some conclusions.

¹ http://www.foursquare.com

2.

RELATED WORK

In our previous work [14], we present an information model to enable the discovery of indirect keywords. We propose to extend this model in order to discover indirect locations. Gouveia et al. [4, 5] present an interaction and visualization framework for the productive networks model.

In [15, 3, 10], several authors present approaches for the design of keyword recommendation systems. The authors focus on strategies to recommend keywords that enhance the description of items, and usually those keywords already annotate items of the user’s contacts. Lappas et al. [9] discuss how social endorsement techniques can be used for keyword recommendation and ranking. Stefanidis et al. [17] discuss the use of preference contexts for group recommendation systems. All these authors provide context to the recommendation problem.

Liu et al. [11] discuss keyword ranking using a probabilistic approach, using Flickr as a case study. Wang et al. [19] present a machine learning approach for keyword ranking. These authors do not focus on discovery; instead, they discuss the ranking problem, which is essential to deliver recommendation results to the user, and thus influenced our ranking approach.

Zhou et al. [21] discuss several methods to recommend users in social annotation systems (social tagging). The case study used is del.icio.us², and the approach is based on the proximity network of users.

Laere et al. [8] use learning methods to automatically assign geographic coordinates to Flickr photos. The authors also use a clustering approach to obtain regions of interest, and present a method that successfully predicts the location of a previously unseen photo. They also discuss the effects of spatial granularity on the meaning of the location recommendation for a particular photo.

Peregrino et al. [13] present a method to infer location from Twitter posts. It is based on text analysis and cross-referencing with Wikipedia³ entries.
Our model and this work can be integrated in a solution that first infers the geographic location of a Twitter post, and then discovers related indirect locations for recommendation.

Ozdikis et al. [12] use evidential reasoning techniques over Twitter data to estimate locations, with the ultimate goal of event detection. Using the Dempster-Shafer Theory, the authors use the Twitter post location, text, and user profile declared location to construct belief intervals for sets of locations where certain events might have happened. This approach enables the discovery of locations that may be relevant to a user interested in a particular event. The evaluation is presented using belief percentage as the effectiveness metric, and cannot be compared with our results.

In [6, 7, 16], several authors present methods for mobile user profiling and event prediction, based on the cross-reference between the user production and location. These authors are mainly focused on predicting the next location where the user will be, or where an event will occur, from the evidence of past production. We are interested in suggesting locations that the user has not yet considered.

In [20, 2, 18], the authors present methods for location recommendation in location-based social networks. The methods are usually dependent on the type of post available on these networks, but provide good insight into the location recommendation problem.

² https://delicious.com/
³ http://www.wikipedia.com

3.

DISCOVERING INDIRECT LOCATIONS

In this paper, we address the problem of recommending locations to a user in a productive network, suggesting locations related to the user’s production. A productive network is a network where users share items of any type of media, e.g., photographs, videos or text. Items are annotated with text keywords. Keywords may describe, classify or categorize items. An example of an item annotated with a keyword is a photograph annotated with the word “vacation”. In this paper we narrow the scope of application to productive networks which enable a spatial dimension for items. The goal is to promote the discovery of places, or to raise awareness about a possible semantic relationship between the user’s interests and a particular location. It is not about enriching a particular item’s description. We present an example to provide context to the following sections.

3.1

Motivating Example

Our goal is to recommend locations which are currently not related to the user. We will consider the context of Twitter, which is not a location-based social network, although it enables users to associate geographic coordinates with posts. In our example, we consider three Twitter users: Alice, Bob, and Carol. Alice and Bob included the keyword (i.e., hashtag) “#TheRollingStones” in some of their posts. Additionally, Bob uses the keyword “#Lisbon” in some other posts, and enables his mobile phone to post coordinates on Twitter, associating both tags with his location coordinates. Carol’s posts include the keywords “#Lisbon” and “#RockInRio”, also with coordinate information. Following the approach of indirect keyword recommendation, the keyword “#RockInRio” is a suitable indirect keyword for Alice. If we consider a short time frame of the Twitter feed, the location coordinates used by Carol are also interesting for Alice, because they indicate a location that is potentially temporally related with her interests. This idea applies to events like summer music festivals where, for a few days, certain locations become interesting for the users. Finally, we expect that several locations meet the same conditions as the ones used by Carol, so the method of indirect location discovery must rank the list of suggestions according to some criteria.

3.2

Problem Statement

The problem of recommending an indirect location is defined as follows: For a particular user, based on a set of user defined keywords and locations for her production, there is a set of candidate locations for recommendation, which are not used by the user and which can be ranked according to some strategy. In our example, the user we want to recommend locations to is Alice, and the task would be to find all the locations in the same conditions as the one associated with the posts by Carol, with the keyword “#RockInRio”, in a particular time frame. There are two distinct tasks:

1. The first task is the definition of the set of indirect locations that are candidates for recommendation;

2. The second task is the construction of the ranked list from the candidate location set.

To address both tasks, we represent the information in the productive network, such as the Twitter network of our example, through an extension of our indirect keyword discovery information model (the productive network model). The extended model relates users, items, keywords and locations. From a user’s perspective, we propose to use the implicit graph of keywords to define a graph of keywords and locations. From the set of locations, we build a set of candidate locations that are not related with the user’s production, but which are related with the keywords that are. The location recommendations expected from this strategy are particularly useful to expose relationships between keywords and locations of the user, which can be explored by the system to offer suggestions. In a scenario like our example’s, location recommendations such as the locations associated with Carol’s posts would enhance content discovery and awareness for Alice.

Ultimately, the model is used to train classifiers that are able to predict the user’s interest in a particular indirect location. Classifiers are machine learning constructs that approximate difficult functions, namely, the interest pattern of a user. Each classifier is trained to predict the behavior of variables according to a set of features. We use features extracted from the model to train a classifier for each user. The classifier used in the experiments is a support vector classifier (see section 4.3). The classifier enables the identification of a set of candidate locations, which can be suggested to the user, and enables the suggestion of relevant users related with the locations.

Task 2 deals with the presentation of the information. In order to be useful, the candidate locations need to be ranked. Finally, depending on the system, the top n elements of the list of ranked candidates are shown to the user. In our example, the Twitter interface would deliver a list of top candidates to Alice, which would be constrained by usability guidelines.

4.

EXTENSION TO THE PRODUCTIVE NETWORK INFORMATION MODEL

This section presents our extension to the productive network model. To the set of basic concepts of user, item, and keyword, we add the concept of location. A location is a geographic coordinate, which can be associated with several items. Relationships between these concepts are assumed from evidence taken from the data available in the network. A user is directly related to all of her items and, as items are associated with keywords, users are automatically associated with the keywords they have used to annotate and describe their items. Locations are related with the user’s items, and are therefore also associated with the user. The implicit graph defined by these concepts exhibits several types of relationships between users: co-authoring, direct relationship, and indirect relationship. These relationships are defined by:

• Co-authoring: when two users share the same item. This is not present in our example, but is possible on some productive networks;

• Direct relationship: when two users are not co-authors but their items are related with a common set of keywords (i.e., users share keywords, like Alice and Bob share “#TheRollingStones”);

• Indirect relationship: when there is no direct relationship between two users but some of the keywords used to categorize their items are related by other users’ items. For instance, the keywords “#TheRollingStones” and “#Lisbon” are related through Bob’s posts, creating an indirect relationship between Alice and Carol.

The concept behind identifying indirect relationships is defined by the value, to the user, of the frequent association of two keywords with common items, e.g., for Alice, there is value in the association between “#TheRollingStones” and “#Lisbon” in Bob’s posts. Similarly, locations that are frequently associated with a particular set of keywords have potential value to users with an interest in those keywords. The remainder of this section describes the details of the model, which include the formal representation of the concepts and relationships, and the experiments designed to test the methods.
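The three relationship types can be illustrated with a minimal Python sketch over the Alice, Bob and Carol example. This is only an illustration under stated assumptions: the item identifiers, the dictionary layout and the function names (`user_items`, `user_keywords`, `relationship`) are hypothetical and not part of the model, and the indirect case follows the reading that a third user must share a keyword with each of the two users.

```python
# Hypothetical toy data mirroring the Alice/Bob/Carol example.
# Item identifiers ("a1", "b1", ...) are invented for illustration.
owners = {"a1": {"Alice"}, "b1": {"Bob"}, "b2": {"Bob"}, "c1": {"Carol"}}
keywords = {
    "a1": {"#TheRollingStones"},
    "b1": {"#TheRollingStones"},
    "b2": {"#Lisbon"},
    "c1": {"#Lisbon", "#RockInRio"},
}

def user_items(user):
    """Items owned (or co-owned) by the user."""
    return {item for item, who in owners.items() if user in who}

def user_keywords(user):
    """All keywords annotating the user's items."""
    found = set()
    for item in user_items(user):
        found |= keywords[item]
    return found

def relationship(u, v):
    """Classify the relationship between two users (assumed reading of the text)."""
    if user_items(u) & user_items(v):
        return "co-authoring"
    if user_keywords(u) & user_keywords(v):
        return "direct"
    everyone = set().union(*owners.values())
    for w in everyone - {u, v}:
        # A third user shares a keyword with each of u and v.
        if user_keywords(w) & user_keywords(u) and user_keywords(w) & user_keywords(v):
            return "indirect"
    return "none"

print(relationship("Alice", "Bob"), relationship("Alice", "Carol"))
```

In this toy network, Alice and Bob are directly related through “#TheRollingStones”, while Alice and Carol are only related through Bob.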

4.1

Formalization

This section provides an extended version of the productive network model. We are interested in including a spatial dimension in the model. Originally, the basic elements of the model are users, $U$, items, $I$, and keywords, $K$. Items are owned by users, and annotated with keywords. We now introduce the notion of locations, $L$. Each location is associated with several items, but not every item is located. The sets are defined as follows:

$U = \{U_1, \cdots, U_n\}$ is a finite set of users, $n \geq 1$
$I = \{I_1, \cdots, I_m\}$ is a finite set of items, $m \geq 1$
$K = \{K_1, \cdots, K_o\}$ is a finite set of keywords, $o \geq 1$
$L = \{L_1, \cdots, L_u\}$ is a finite set of locations, $u \geq 1$

Note: The subscripts used in the definitions serve to distinguish between elements of the same set. We use i, j, k for users, p, q, r for keywords, t, u, v for items, l, g, h for locations and m, n, o, u for set dimensions.

Definition 1 presents the notions of ownership and annotation.

Definition 1. The ownership by a user, $U_i$, of an item, $I_t$, is defined by:

$O(U_i) = \{I_t \mid I_t \text{ is owned by } U_i,\ I_t \in I,\ U_i \in U\}$
$Own(U_i, I_t) = \{I_t \in O(U_i)\}$

The annotation of an item, $I_t$, by a keyword, $K_p$, is defined by:

$T(I_t) = \{K_p \mid K_p \text{ is associated with } I_t,\ K_p \in K,\ I_t \in I\}$
$Annotate(I_t, K_p) = \{K_p \in T(I_t)\}$

The notion of geographically referencing an item is presented in definition 2.

Definition 2. The geographic referencing of an item, $I_t$, by a location, $L_l$, is defined by:

$G(I_t) = \{L_l \mid L_l \text{ is associated with } I_t,\ L_l \in L,\ I_t \in I\}$
$GeoRef(I_t, L_l) = \{L_l \in G(I_t)\}$

The extended model verifies all the properties of the original model, particularly the indirect relationships between two users, summarized in definition 3.

Definition 3. For a user, $U_i$, the set of all direct keywords of all of the user’s items is defined by:

$UK(U_i) = \{K_p \mid \forall I_t \in O(U_i),\ \forall K_p \in T(I_t)\}$

Therefore, a direct relationship, $DR$, between two users, $U_i$ and $U_j$, is defined by:

$DR(U_i, U_j) = \{K_p \mid I_t \in I,\ I_u \in I,\ \exists K_p \in K : Own(U_i, I_t),\ Own(U_j, I_u),\ Annotate(I_t, K_p),\ Annotate(I_u, K_p)\}$

Finally, an indirect relationship, $IR$, between two users, $U_i$ and $U_j$, is defined by:

if $DR(U_i, U_j) = \emptyset$, $\exists U_k \in U$, $\exists K_p, K_q \in K$ : $K_p \in DR(U_k, U_i)$, $K_q \in DR(U_k, U_j)$ then
$IR(U_i, U_j) = \{K_q \mid K_q \in DR(U_k, U_j)\}$
$IR(U_j, U_i) = \{K_p \mid K_p \in DR(U_k, U_i)\}$

We are also able to conclude that the set of indirect keywords, $IK$, of a user, $U_i$, is determined by:

$IK(U_i) = \{K_p \mid K_p \in IR(U_i, U_j),\ \forall U_j \in U,\ U_j \neq U_i\}$

Definition 3 shows the construction of the indirect relationship set of a user, based on the set of keywords that are associated with her ($UK$). It also provides a simple mechanism to evaluate indirect relationships, i.e., the set of indirect keywords. It is this set of indirect keywords that enables the evaluation of the model (see section 4.3 for details).

What we propose is to define the same set of the user’s indirect relationships, but through a set of indirect locations instead. We begin by redefining the direct relationship, now based on locations instead of keywords. Definition 4 presents the new set of direct relationships.

Definition 4. For a user, $U_i$, the set of all direct locations of all of the user’s items is defined by:

$UL(U_i) = \{L_l \mid \forall I_t \in O(U_i),\ \forall L_l : L_l \in G(I_t)\}$

We are now able to define direct relationships based on locations, $DRL$, between two users, $U_i$ and $U_j$, as:

$DRL(U_i, U_j) = \{L_l \mid I_t \in I,\ I_u \in I,\ \exists L_l \in L : Own(U_i, I_t),\ Own(U_j, I_u),\ GeoRef(I_t, L_l),\ GeoRef(I_u, L_l)\}$

We are now able to define indirect relationships from the location-based direct relationships. Definition 5 presents these indirect relationships, and the set of indirect locations.

Definition 5. An indirect relationship, $IRL$, between two users, $U_i$ and $U_j$, based on locations, is defined by:

if $DRL(U_i, U_j) = \emptyset$, $\exists U_k \in U$, $\exists L_l, L_g \in L$ : $L_l \in DRL(U_k, U_i)$, $L_g \in DRL(U_k, U_j)$ then
$IRL(U_i, U_j) = \{L_g \mid L_g \in DRL(U_k, U_j)\}$
$IRL(U_j, U_i) = \{L_l \mid L_l \in DRL(U_k, U_i)\}$

Finally, the set of indirect locations, $IL$, of a user, $U_i$, is determined by:

$IL(U_i) = \{L_l \mid L_l \in IRL(U_i, U_j),\ \forall U_j \in U,\ U_j \neq U_i\}$

The original experimental design is based on the notion that if a user annotated items with a keyword, that keyword would be a valid suggestion in a scenario where the network was modified so that the keyword became indirect to the user. Therefore, the strategy of the experiment is to remove one keyword from the set of keywords associated with the user. This method creates a new set of keywords that annotate the user’s items, and the experiment outputs the ranked list of indirect keywords.

To evaluate our extension, we modified the experiment in order to focus on locations. Figure 1 presents the outline of the experiments for a particular user, $U_i$. In the outline, the experiment (step 5) is any procedure that tries to recover a location removed (step 4) from the user’s locations. An example of an experiment is the classification task described in section 4.3.

Require: $O(U_i) \neq \emptyset$
 1: for all $I_t \in O(U_i)$ do
 2:   if $G(I_t) \neq \emptyset$ then
 3:     for all $L_l \in G(I_t)$ do
 4:       $G'(I_t) = \{L_g \mid L_g \in G(I_t),\ L_g \neq L_l\}$
 5:       $IL_r = experiment(G'(I_t))$
 6:       print $L_l \in IL_r$ ?
 7:       print $rank(L_l, IL_r)$
 8:     end for
 9:   end if
10: end for

Figure 1: Experiment outline. It shows the removal of the association between user and location, which the experiment is designed to recover. The rank function returns the position of $L_l$ in $IL_r$.

Given that the user explicitly associated items with the removed location, we are sure that it is relevant to the user. Therefore, as presented in figure 1, the experiment’s goal is twofold: first, recover it as an indirect location; second, attribute a high rank value. Success in this setup implies that the model is capable of discovering relevant locations.
The ultimate goal is to find users to recommend, i.e., for user $U_i$, find users $U_j$ such that $IRL(U_i, U_j) \neq \emptyset$. The actual procedure is outlined in figure 2.

Require: $O(U_i) \neq \emptyset$
result = list()
if $IL(U_i) \neq \emptyset$ then
  for all $L_l \in IL(U_i)$ do
    result.append($\{U_j \mid L_l \in IL(U_j)\}$)
  end for
end if
return rank(result)

Figure 2: Outline of the procedure used to build lists of indirect relationships, candidate for recommendation.

The ranking strategy is highly dependent on the characteristics of the data. We experimented with the same approach used with indirect keywords, and obtained similar results. The next section presents the details of the ranking process.
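The location-based sets of definitions 4 and 5 can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the experiments: the dictionary layout (`owned` mapping a user to her items, `geo` mapping an item to its locations), the function names and the toy location labels are all assumptions made for the example.

```python
def direct_locations(user, owned, geo):
    """UL(U_i): every location referenced by any of the user's items."""
    locs = set()
    for item in owned.get(user, ()):
        locs |= geo.get(item, set())
    return locs

def drl(u, v, owned, geo):
    """DRL(U_i, U_j): locations referenced by items of both users."""
    return direct_locations(u, owned, geo) & direct_locations(v, owned, geo)

def indirect_locations(u, owned, geo):
    """IL(U_i): union of IRL(U_i, U_j) over every other user U_j."""
    users = set(owned)
    result = set()
    for v in users - {u}:
        if drl(u, v, owned, geo):
            continue  # IRL is only defined when there is no direct relationship
        for k in users - {u, v}:
            bridge_to_u = drl(k, u, owned, geo)
            bridge_to_v = drl(k, v, owned, geo)
            if bridge_to_u and bridge_to_v:
                result |= bridge_to_v  # IRL(U_i, U_j) = DRL(U_k, U_j)
    return result

# Hypothetical toy network: Bob bridges Alice and Carol.
owned = {"Alice": {"a1"}, "Bob": {"b1", "b2"}, "Carol": {"c1", "c2"}}
geo = {"a1": {"L1"}, "b1": {"L1"}, "b2": {"L2"}, "c1": {"L2"}, "c2": {"L3"}}
print(indirect_locations("Alice", owned, geo))
```

Under these assumptions only L2 is indirect for Alice: L3 is used by Carol alone, so no third user links it back to Alice.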

4.2

Ranking Results

To sort the output of the classifier, we consider two ranking values. The first, $R_{L_l}$, for every location, $L_l$, with a positive match, is defined by:

$$R_{K_p} = \sum_{K_r \in UK(U_i)} |\{K_r \mid \exists I_t \in I : K_r \in T(I_t),\ K_p \in T(I_t)\}|$$

$$R_{L_l} = \sum_{L_g \in UL(U_i)} |\{L_g \mid \exists I_t \in I : L_g \in G(I_t),\ L_l \in G(I_t)\}|$$

$R_{L_l}$ calculates the sum of the number of co-occurrences between $L_l$ and the user’s locations. The second ranking strategy, $R'_{L_l}$, is the normalization of $R_{L_l}$ by the frequency of $L_l$, $F_{L_l}$, and is defined by:

$$\text{with } F_{L_l} = \frac{|\{I_t \mid L_l \in G(I_t)\}|}{|I|}, \text{ then } R'_{L_l} = \frac{R_{L_l}}{F_{L_l}}$$

Section 6 shows results with the best ranking strategy, $R'_{L_l}$.
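As a worked illustration of the two ranking values, the following Python sketch computes $R_{L_l}$ and $R'_{L_l}$ for one candidate location. It follows the formula literally, so each of the user’s locations contributes 1 when it shares at least one item with the candidate; the function name, the `geo` dictionary (item to locations, covering every item in $I$) and the toy data are assumptions for the example.

```python
def rank_values(candidate, user_locations, geo):
    """Return (R, R') for a candidate location.

    geo maps every item in I to its set of locations (possibly empty).
    """
    items_with_candidate = [locs for locs in geo.values() if candidate in locs]
    # R: number of the user's locations that co-occur with the candidate.
    r = sum(
        1
        for lg in user_locations
        if any(lg in locs for locs in items_with_candidate)
    )
    # F: fraction of all items that reference the candidate.
    f = len(items_with_candidate) / len(geo) if geo else 0.0
    r_norm = r / f if f else 0.0
    return r, r_norm

# Hypothetical data: four items, the user already uses L1 and L3.
geo = {"i1": {"L1", "L2"}, "i2": {"L2", "L3"}, "i3": {"L3"}, "i4": set()}
print(rank_values("L2", {"L1", "L3"}, geo))
```

Here L2 co-occurs with both of the user’s locations (R = 2) and appears in 2 of 4 items (F = 0.5), so R' = 4.0. Dividing by F favors candidates that are strongly tied to the user relative to how common they are in the network.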

4.3

Experiments

We replicate the evaluation method used in our previous work. The idea is to train a classifier, which will be able to decide if a location is relevant to the user. The classifier is a support vector machine (support vector classifier - SVC). The ground truth in our experiment is defined by the set of locations that the user actually identified as relevant, i.e., that the user associated with her production. The success or failure in a classification task is determined by the ability to identify a true positive (i.e., a location that is actually associated with the production) with the classifier. The algorithm in figure 1 describes the outline of the experimental procedure. For each location, we remove the association between the location and the user’s production (step 4), execute the classification task experiment (step 5), and then we check if the location is in the user’s location recommendation list (step 6). Finally, if the location is in the list, we request its rank (step 7). To speed up the process we train the classifier with half of the user’s locations, and validate it with the remaining half.

The SVC is able to determine if a particular location belongs to the user. The success of the classification is determined by the training conditions, i.e., the set of features used to infer data patterns and the training set. The challenge is in determining if the training set of locations accurately represents the user’s interests, and in selecting a robust set of location features. We designed a set of features for locations, based on the best approach tested in our previous work. For a location, $L_l$, of the direct locations of the user $U_i$, the features are represented by the pair $F$, determined by the cardinalities of the feature sets $A$ and $B$, such that $F = \langle |A|, |B| \rangle$. We propose one pair of feature sets, $F_a$, in which each location is represented by its absolute number of items ($A$) and its absolute number of users ($B$):

$A = \{I_t \mid \forall I_t \in I : L_l \in G(I_t)\}$
$B = \{U_j \mid \forall U_j \in U : L_l \in UL(U_j)\}$

To sort the output of the classifier, we build the $R_{L_l}$ ranked list. It is computed by the sum of the number of co-occurrences between $L_l$ and the user’s locations. The ranking values are then normalized by the frequency of $L_l$, $F_{L_l}$, yielding $R'_{L_l}$, as described in section 4.2.
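The feature pair can be computed directly from the ownership and geo-referencing relations. The sketch below is a hedged illustration: `location_features`, `owned` (user to items) and `geo` (item to locations) are hypothetical names, and the data is invented.

```python
def location_features(location, owned, geo):
    """Return F = (|A|, |B|) for one location.

    A: items geo-referenced by the location.
    B: users with at least one item geo-referenced by the location.
    """
    a = {item for item, locs in geo.items() if location in locs}
    b = {
        user
        for user, items in owned.items()
        if any(location in geo.get(item, set()) for item in items)
    }
    return (len(a), len(b))

# Hypothetical data: L1 is used by two users, L2 by one.
owned = {"Alice": {"i1"}, "Bob": {"i2", "i3"}}
geo = {"i1": {"L1"}, "i2": {"L1"}, "i3": {"L2"}}
print(location_features("L1", owned, geo), location_features("L2", owned, geo))
```

Pairs of this form could then be passed as feature vectors to any support vector classifier implementation (for instance scikit-learn’s `sklearn.svm.SVC`); which library was used in the experiments is not stated in the text.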

5.

EXPERIMENTAL SETUP

Our experiments use Twitter datasets to evaluate the model and the procedures detailed in section 4. The first step in the evaluation is to replicate the evaluation method for indirect keywords, to determine if the model extension remains compatible with the original goal (find indirect keywords). This initial step requires the computation of indirect keywords, ignoring the spatial dimension of the model. See section 6.1 for results. After we establish that the extension to the model does not create incompatibilities with previous operations, the evaluation turns its focus to the spatial dimension. There are some issues related with the nature of spatial data that require some tuning of the datasets, as discussed in section 5.2. Section 6.2 discusses the results of this extended experiment.

5.1

Datasets

The evaluation of the model and procedure uses 6 datasets built with Twitter data. Table 1 summarizes the datasets. We designed a live Twitter feed capture tool that collects and organizes the information according to our information model. All datasets originated from a particular event that we were able to monitor (live music summer events, in Portugal). The information collected contains users, items, keywords, locations, and places. Twitter provides both locations and places, where places are locations augmented with semantics. However, for the datasets available, the number of places is relatively low. We consistently obtained a very low percentage of geo-referenced information. Section 6.2 provides a discussion on the topic, and on the impact of the low (or absent) number of located tweets on indirect location recommendation. Table 2 describes the datasets, with counts of the several dimensions available. All datasets contain the complete set of tweets associated with the events, starting 72 hours before the event begins, and ending 72 hours after it closes. However, only the first two, E1 and E2, contain enough spatial data to enable indirect location identification.

Table 1: Twitter datasets available for evaluation. Each dataset was obtained by collecting the live feed resulting from filtering the Twitter stream with the given queries.

ID   Event Description
E1   Rock in Rio Lisboa music festival
E2   Lisbon summer holidays (“Santos Populares”)
E3   Lisbon Mega Picnic
E4   Paredes de Coura music festival
E5   Paredes de Coura music festival V2
E6   Sol da Caparica music festival

Query: #rirlx #santospopulares #megapicnic #rirlx #paredesdecoura #vodafoneparedesdecoura #soldacaparica

Table 2: Description of the datasets. The number of places is indicated in the locations’ column, in parentheses.

ID   Items   Users   Keywords   Locations (places)
E1   47114   26750   1820       743 (287)
E2   558     356     546        79 (14)
E3   16      14      5          2 (2)
E4   375     177     203        0 (7)
E5   908     325     365        0 (13)
E6   303     188     168        0 (6)

5.2

Location Clusters

The datasets contain a low percentage of geo-referenced items (as expected) and, in addition, the relationship pattern between item, keyword and location proved to be insufficient to enable our evaluation. The method requires a set of keywords to be associated with locations, through items. However, in most cases, each location corresponds to only one item. This is caused by the granularity of the GPS sensor on the mobile devices used to create the item. The same user, posting twice from the same location, a few minutes apart, is likely to produce two different coordinate pairs.

The solution to the problem is clustering locations. Instead of running the evaluation directly on locations, we compute a set of location clusters, using the DBSCAN [1] algorithm. Figure 3 shows the clustering procedure outline, including the information cross-referencing computation between single locations and the respective clusters.

Require: $UL(U_i) \neq \emptyset$
clusters = DBSCAN($UL(U_i)$, eps, min_pts)
{clusters is a collection of location clusters.}
{Each cluster contains a set of locations.}
for all cluster c in clusters do
  c.items = $\{I_t \mid \forall L_l \in$ c.locations$,\ G(L_l, I_t)\}$
  c.users = $\{U_i \mid \forall I_t \in$ c.items$,\ O(I_t, U_i)\}$
  c.keywords = $\{K_p \mid \forall I_t \in$ c.items$,\ T(K_p, I_t)\}$
end for

Figure 3: Procedure used to build location clusters and associate the information needed for the classification analysis with clusters (instead of single locations).

The choice of parameter values for DBSCAN was not focused on optimal behavior in terms of clustering. The problem lies with the large number of locations, most with only one associated keyword, so we are mainly looking to significantly reduce the number of locations. The informal heuristic we followed was to obtain a number of clusters equal to around 10% of the number of locations in the datasets, merging the item, user, and keyword sets of cluster members, thus producing clusters with more than 1 keyword on average. Figure 4 shows the clustering results (and parameters used) on the E1 dataset, reducing from 743 locations to 79 location clusters.

[Figure 4 plot: 79 clusters - eps: 0.025, min. pts.: 2]

Figure 4: Clustering results for dataset E1. DBSCAN parameters are set to produce around 10% of the initial number of locations. Circles represent location clusters, dots represent data marked as noise.

Section 6.2 discusses the classification results between the clustering approach and the original set of locations.
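The cross-referencing loop of figure 3 can be sketched in Python once cluster labels are available. The sketch assumes the labels come from some DBSCAN implementation that marks noise with -1 (the convention used by scikit-learn’s `DBSCAN`); the function name, the dictionary layout and the toy data are hypothetical.

```python
from collections import defaultdict

def aggregate_clusters(labels, geo, owned_by, keywords):
    """Merge the items, users and keywords of every location in a cluster.

    labels:   location -> cluster id (-1 means noise and is skipped)
    geo:      item -> set of locations
    owned_by: item -> set of users
    keywords: item -> set of keywords
    """
    clusters = defaultdict(
        lambda: {"locations": set(), "items": set(), "users": set(), "keywords": set()}
    )
    for location, label in labels.items():
        if label != -1:
            clusters[label]["locations"].add(location)
    for cluster in clusters.values():
        for item, locs in geo.items():
            if locs & cluster["locations"]:
                cluster["items"].add(item)
                cluster["users"] |= owned_by.get(item, set())
                cluster["keywords"] |= keywords.get(item, set())
    return dict(clusters)

# Hypothetical data: p1 and p2 are nearby points, p3 is noise.
labels = {"p1": 0, "p2": 0, "p3": -1}
geo = {"i1": {"p1"}, "i2": {"p2"}, "i3": {"p3"}}
owned_by = {"i1": {"Alice"}, "i2": {"Bob"}, "i3": {"Carol"}}
keywords = {"i1": {"#rirlx"}, "i2": {"#Lisbon"}, "i3": {"#rirlx"}}
clusters = aggregate_clusters(labels, geo, owned_by, keywords)
print(clusters[0]["keywords"])
```

Each single-item location carried one keyword, while the merged cluster carries two, which is the effect the clustering step is meant to achieve.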

5.3

Evaluation Metrics

To evaluate the classification analysis results, we use two standard metrics: the Mean Reciprocal Rank and Precision. Both metrics are statistics suitable to describe a list with a ranking of queries results.

Mean reciprocal rank (MRR) The Mean Reciprocal Rank informs where the first relevant keyword occurs in the ranking, averaged over all queries. It is calculated with equation 1.

$$MRR = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{rank_i} \qquad (1)$$

Precision (P) Precision is the proportion of retrieved keywords that is relevant, averaged over all queries. The results of any retrieval method can be divided into the relevant results and the non-relevant, and the precision ($P$) is determined by equation 2.

$$P = \frac{|relevant \cap retrieved|}{|retrieved|} \qquad (2)$$


Precision at rank K (P@K) is the value of precision, considering only a subset of the results. This metric is interesting because only the top results are ultimately returned to the user, and is used to compare the preliminary results of our evaluation with the work in [14].

Recall (R) Recall is the proportion of relevant keywords that is retrieved, averaged over all queries. The recall ($R$) is determined by equation 3.

$$R = \frac{|relevant \cap retrieved|}{|relevant|} \qquad (3)$$

We are not trying to exhaustively retrieve all the keywords and locations that are relevant to the user, but instead want to make sure that the ones retrieved are indeed relevant. We include the recall computation in the results for the sake of completeness.
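The trade-off described above can be seen in a small sketch of recall at a rank cutoff. The function name and the cutoff parameter `k` are illustrative assumptions.

```python
def recall(retrieved, relevant, k=None):
    """Share of the relevant items found in the (top k) retrieved results."""
    if not relevant:
        return 0.0  # assumed convention when nothing is relevant
    top = retrieved if k is None else retrieved[:k]
    return len(set(top) & set(relevant)) / len(relevant)


# Hypothetical query: two relevant items, one of them ranked first.
retrieved = ["lisbon", "porto", "faro", "braga"]
relevant = {"lisbon", "faro"}

r_at_1 = recall(retrieved, relevant, 1)  # one of the two relevant items
r_all = recall(retrieved, relevant)      # both relevant items retrieved
```

Returning only the top result here yields a relevant suggestion while recovering just half of the relevant set, which matches the stated priority of relevance over exhaustive retrieval.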

6. RESULTS

In this section we present the results of our experiments: the indirect keyword discovery experiment, similar to our previous work in [14]; and the indirect location discovery experiment, detailed in section 4.3.

6.1 Indirect Keywords

Our first set of results comes from the indirect keyword analysis. These are relevant to establish a comparison between our model and the original productive network model, and to determine if there is a substantial difference between the datasets extracted from Twitter and the Flickr dataset used by the original model. Figure 5 shows the distribution of the results for the Mean Reciprocal Rank and Precision for our 6 datasets, contextualized by results from [14]. There is a consistent difference in Precision, which is explained by the comparatively smaller size of the datasets. The Mean Reciprocal Rank results are similar to the original model's. We conclude that the original model was successfully replicated and that our extension to the model is compatible with the indirect keyword discovery approach.

Figure 5: Boxplots with the Precision (P) and Mean Reciprocal Rank (MRR) results for each case study (E1 to E6). Both metrics are computed for each user of each case and this figure shows the average distribution of the metrics for each dataset. The horizontal dotted lines represent the results obtained by the original model, in [14], for context (MRR=0.3978, P@1=0.2667).

6.2 Indirect Locations

We established that our model is correctly finding indirect keywords. We further execute the classification analysis for the location clusters. Table 3 shows the results. Because of the low percentage of keyword-location associations, only the first (E1) dataset allows results without the clustering approach. These are: MRR=0.3785, P=0.6259, R=0.4118. Clustering significantly improves the Mean Reciprocal Rank results, and allows indirect location discovery on smaller datasets.

Table 3: Results of the indirect locations classification analysis.

ID   MRR     P@1     R@1
E1   0.5390  0.6415  0.4351
E2   0.1365  0.7371  0.5804
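As a quick check on the size of the improvement, the relative MRR gain for E1 can be worked out from the two values reported in this section (0.3785 without clustering, 0.5390 with clustering). The subscripts are labels introduced here for illustration.

```latex
\[
\frac{\mathrm{MRR}_{\text{clustered}} - \mathrm{MRR}_{\text{unclustered}}}
     {\mathrm{MRR}_{\text{unclustered}}}
= \frac{0.5390 - 0.3785}{0.3785}
\approx 0.424
\]
```

That is, clustering raises the Mean Reciprocal Rank on E1 by roughly 42% relative to the unclustered run.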

7. CONCLUSIONS

This work presented an extension to the information model by Sabino et al. [14]. The model addresses the problem of identifying meaningful location suggestions in a productive community, based on the structure of the content generation network. To address the challenge, this paper presented two contributions:

1. An extension to the productive network model that includes geographic locations – see section 4;
2. An indirect location discovery method to build ranked recommendation lists – see sections 5 and 6.

We propose an extension to the productive network model. Our extension adds the concept of location to the original set of concepts – user, item and keyword. Our ultimate goal was to present a method to find potentially interesting locations. The new model was evaluated by a set of experiments using 6 datasets built from the Twitter network. We reproduced the original experiment of indirect keyword discovery, and extended the experiment design to discover indirect locations. The first experiment showed that we maintained consistency with the original model, while the second showed that the new model successfully enables the discovery of indirect locations.

Although all datasets completely capture tweets about their particular events, only 2 out of 6 contained enough spatial information to enable indirect location identification. This is a constraint of the data on Twitter, and must be taken into account when designing applications that use this approach to discover locations that may be relevant to users.

Our evaluation focuses on datasets that captured Twitter posts over a short time period – generally between two days and two weeks. The recommendation of locations that are, at a particular moment in time, related to a user's interests is relevant under these conditions. However, it does depend on the temporal co-occurrence of posts by the user declaring those interests and posts by other users with location information. To address this issue, the model can be paired with a user profile model that keeps track of the evolution of the user's interests.

In the future, we plan to experiment further with the new model, using datasets from other networks that also enable geographically located production items.

Acknowledgments This work was partly funded by doctoral grant FCT/MEC SFRH/BD/47403/2008, research grant FCT/MEC - PTDC/AAC-AMB/120702/2010 (Hidralerta Project), and research grant CITI/FCT/UNL - PEst-OE/EEI/UI0527/2011.

8. REFERENCES

[1] M. Ester, H. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of 2nd International Conference on Knowledge Discovery and Data Mining, 1996.
[2] G. Ference, M. Ye, and W.-C. Lee. Location Recommendation for Out-of-Town Users in Location-Based Social Networks. In Proceedings of the 22nd ACM international conference on information & knowledge management, pages 721–726, 2013.
[3] N. Garg and I. Weber. Personalized, interactive tag suggestion for flickr. Proceedings of the 2008 ACM conference on Recommender systems - RecSys '08, page 67, Oct. 2008.
[4] J. Gouveia, A. Sabino, and A. Rodrigues. Visualizing Productive Network Relationships. In Proceedings of the 2014 IEEE/WIC/ACM International Conference on Web Intelligence, Warsaw, Poland, 2014.
[5] J. Gouveia, A. Sabino, and A. Rodrigues. Visualizing productive networks relationships. In Proceedings of the 13th International Conference WWW/INTERNET, Porto, Portugal, 2014.
[6] S. Ho, M. Lieberman, P. Wang, and H. Samet. Mining future spatiotemporal events and their sentiment from online news articles for location-aware recommendation system. In Proceedings of the First ACM SIGSPATIAL International Workshop on Mobile Geographic Information Systems, pages 25–32, 2012.
[7] B. Hu and M. Ester. Spatial topic modeling in online social media for location recommendation. In Proceedings of the 7th ACM conference on Recommender systems, pages 25–32, 2013.
[8] O. V. Laere, S. Schockaert, and B. Dhoedt. Towards automated georeferencing of flickr photos. In Proceedings of the 6th Workshop on Geographic Information Retrieval, 2010.
[9] T. Lappas, K. Punera, and T. Sarlos. Mining tags using social endorsement networks. Proceedings of the 34th international ACM SIGIR conference on Research and development in Information - SIGIR '11, page 195, 2011.
[10] H. Liang, Y. Xu, Y. Li, R. Nayak, and X. Tao. Connecting users and items with weighted tags for personalized item recommendations. In Proceedings of the 21st ACM conference on Hypertext and hypermedia - HT '10, page 51, New York, New York, USA, June 2010. ACM Press.
[11] D. Liu, X.-S. Hua, L. Yang, M. Wang, and H.-J. Zhang. Tag ranking. Proceedings of the 18th international conference on World wide web - WWW '09, page 351, 2009.
[12] O. Ozdikis, H. Oguztuzun, and P. Karagoz. Evidential location estimation for events detected in twitter. In Proceedings of the 7th Workshop on Geographic Information Retrieval, pages 9–16, 2013.
[13] F. Peregrino, D. Tomás, and F. Llopis. Every move you make I'll be watching you: geographical focus detection on Twitter. In Proceedings of the 7th Workshop on Geographic Information Retrieval, pages 1–8, 2013.
[14] A. Sabino, A. Rodrigues, J. Gouveia, and M. Goulão. Indirect Keyword Recommendation. In Proceedings of the 2014 IEEE/WIC/ACM International Conference on Web Intelligence, Warsaw, Poland, 2014.
[15] B. Sigurbjörnsson and R. van Zwol. Flickr tag recommendation based on collective knowledge. Proceedings of the 17th international conference on World Wide Web - WWW '08, page 327, Apr. 2008.
[16] J. Son, A. Kim, and S. Park. A location-based news article recommendation with explicit localized semantic analysis. In Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval, pages 293–302, 2013.
[17] K. Stefanidis, N. Shabib, K. Nørvåg, and J. Krogstie. Contextual recommendations for groups. Advances in Conceptual Modeling: ER 2012 Workshops, 2012.
[18] H. Wang, M. Terrovitis, and N. Mamoulis. Location Recommendation in Location-based Social Networks using User Check-in Data. In Proceedings of the 21st ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, 2013.
[19] Z. Wang, J. Feng, C. Zhang, and S. Yan. Learning to rank tags. Proceedings of the ACM International Conference on Image and Video Retrieval - CIVR '10, page 42, 2010.
[20] M. Ye and P. Yin. Location Recommendation for Location-based Social Networks. In Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, pages 458–461, 2010.
[21] T. C. Zhou, H. Ma, M. R. Lyu, and I. King. UserRec: A User Recommendation Framework in Social Tagging Systems. In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, pages 1486–1491, 2010.