Cross-Tagging for Personalized Open Social Networking

Avaré Stewart, Ernesto Diaz-Aviles, Wolfgang Nejdl
L3S Research Center / University of Hannover, Appelstr. 9A, 30167 Hannover, Germany
{stewart, diaz, nejdl}@L3S.de

Leandro Balby Marinho, Alexandros Nanopoulos, Lars Schmidt-Thieme
University of Hildesheim, Marienburger Platz 22, 31141 Hildesheim, Germany
{marinho, nanopoulos, schmidtthieme}@ismll.uni-hildesheim.de
ABSTRACT

The Social Web is successfully established and poised for continued growth. Web 2.0 applications such as blogs, bookmarking, music, photo and video sharing systems are among the most popular, and all of them incorporate a social aspect, i.e., users can easily share information with other users. But due to the diversity of these applications, which serve different aims, the Social Web is ironically divided. Blog users who write about music, for example, could benefit from users registered in other social systems operating within the same domain, such as a social radio station. Although these sites are two different and disconnected systems offering distinct services to their users, the fact that their domains are compatible could provide users of both systems with interesting and multi-faceted information. In this paper we propose to automatically establish social links between distinct social systems through cross-tagging, i.e., enriching a social system with the tags of other similar social system(s). Since tags are known to increase the prediction quality of recommender systems (RS), we propose to quantitatively evaluate the extent to which users can benefit from cross-tagging by measuring the impact of different cross-tagging approaches on tag-aware RS for personalized resource recommendations. We conduct experiments on real-world data sets and empirically show the effectiveness of our approaches.

Categories and Subject Descriptors: H.3 [Information Storage and Retrieval]: Information Search and Retrieval – Information Filtering; K.4 [Computer and Society]: General
General Terms: Algorithms, Performance, Experimentation
Keywords: Social Media, Recommender Systems, Web 2.0, Tags

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. HT’09, June 29–July 1, 2009, Torino, Italy. Copyright 2009 ACM 978-1-60558-486-7/09/06 ...$5.00.

1. INTRODUCTION

The social networking phenomenon has attracted many millions of users and has resulted in a proliferation of sites. These sites intentionally seek to distinguish themselves by a set of community practices (social activities) and by what they offer members. However, given the sheer number of sites, there is often redundancy or overlap with respect to the type of media, resources or topics to which the sites are devoted. Although this overlap exists, it remains untapped for the benefit of those who actually constitute the social networking ecosystem: the result is a Social Networking Divide.

The momentum is swinging in favor of truly Open Social Networking (OSN), where data can be ported across various sites: Google¹, MySpace [12] and Facebook [15]. These sites seek to establish de facto standards to handle issues related to the portability and interoperability of data, personal identities, and social graphs. Recent advances toward a more open social networking paradigm are also prevalent in the Semantic Web community and in cross-folksonomy platforms where the user’s multiple identities are consolidated [22]. These efforts support, but do not address, how the social practices in one community may be exploited to support those in another, comparable social community when the underlying resources, but not necessarily the users, in the different social systems are the same.

Consider the scenario of an emerging OSN platform targeted towards linking open data in order to facilitate the finding of tagged resources in the music domain [18]. In such a system, there are different types of (distinct) users. Bloggers, for example, write text about artists, tracks, albums or music videos, while taggers tag audio tracks. A registered tagger can greatly benefit from tags for improving browsing, searching and personalized recommendations, in contrast, for example, to registered non-tagging bloggers.

For such environments, we propose Cross-Tagging: an approach by which the experience of a non-folksonomy user, such as a music blogger, is personalized by exploiting the tag assertions made by users of folksonomies (Figure 1). The user-tag-resource relations in the tag community are exploited by mapping common resources and inferring similarities between different users in the blog social community. By considering an Open Social Network from this perspective, the social activities in one site are exploited to support the discovery of new interrelationships within another.

¹ http://www.google.com/friendconnect/
Figure 1: Cross-Tagging applied to distributed Web 2.0 applications operating over the same domain.

The contributions of this work lie in devising an open social networking framework based on two tag-aware recommender components:

1. First, we cast the Cross-Tagging problem as a tag recommendation problem, where tags from one social system are recommended in order to automatically annotate resources in another social system.
2. Second, state-of-the-art tag-aware resource recommenders exploit these annotations in order to recommend high-quality personalized resources to the users. This component is also used as an evaluator for measuring the quality of the tags generated by the first component, i.e., the higher the quality of the recommended tags, the better the performance of a tag-aware resource recommender that is based on them.

This paper is organized as follows: in Section 2, we present the related work on unified tag and user profile paradigms. In Section 3, we present our cross-tagging approach, beginning with the terminology. We then lay the foundation for our approach by discussing alternative tag recommendation algorithms: first an unpersonalized one and then a personalized one that is based on collaborative filtering (CF). The section concludes by describing the tag-aware recommendation algorithm we use. Section 4 describes a recommendation-based evaluation of cross-tagging. Finally, in Section 5, we present conclusions and future work.

2. RELATED WORK

In this section we present related work along two axes: (i) Unified Tag Spaces and (ii) Unified Profiles for Recommendations.

Unified Tag Spaces
Tagging has proved to be an intuitive and flexible Web 2.0 mechanism to facilitate search [1, 26], navigation (e.g., tag clouds) and recommendations [24], but syntactic and semantic differences across tagging systems make it difficult to exploit their latent information in a more versatile manner. For this reason, tags have been exploited to create a unified tag space where multiple information sources from comparable, but different domains are combined. The “cross-media” system [2] offers a personalized search across disparate media types including video, images and social bookmarks. By relying upon external APIs, this system creates ranked lists of thumbnail items having tags that match the user’s input. Personalization is achieved by either restricting the search space to the resources uploaded by the user, or ranking search results based on resources the user picks from result sets. In addition, View Completion [20] uses collaborative tags to heuristically complete missing or inadequate feature sets (or views). The basic premise underlying view completion is that for many tasks, combining multiple information sources yields significantly better results than using a single one alone. Views are used in this case since blogs are not typically available on collaborative tagging websites, and as such the tags provided by bloggers suffer from the vocabulary problem and cannot be adequately used as a shared index. Finally, in [17] the goal is to provide seamless navigation between tag spaces, while the work presented in [6] merges the areas of formal concept analysis and association rule mining to discover shared conceptualizations that are hidden in folksonomies. These tag unification systems focus on media resources of various types, but in our case we are also interested in a unification paradigm that includes blogs. Bloggers can tag blog posts, but typically cannot directly associate a tag with a specific resource (i.e., a song, album or artist) that is being written about within the blog. Furthermore, the tags associated with the blog post may not be appropriate for all instances of the entities appearing in the entire post.

Unified Profiles for Recommendations
In recommender systems, Cross-System Personalization [25, 14, 13, 16] is a body of work which enables personal information to be shared across different systems. The focus has been on adequate representations of dependencies between the user’s profiles, to support a unified representation in the different systems. These approaches are ego-centric in that they assume the same user to exist across different systems, and that the user is interested in an aggregated view of their profile or social networking information. This is not the case in our view of an open social networking environment, where users are assumed to be similar (in some way), but have distinct digital identities. Tags have been viewed as an explicit, personalized (tag-aware) annotation of a person [23, 24, 11, 7]. Conversely, tags have also been viewed as unpersonalized, i.e., as properties of a resource [5, 1]: in such cases, tag assignments are associated with a resource and are considered the same for all users. Moreover, within a single social system, tags have been integrated into the recommendation task for the purpose of recommending resources other than the tags themselves [4, 11]. Our cross-tagging approach exploits tag-aware recommender systems in two ways: first, to automatically annotate resources across multiple social systems; second, to exploit these annotations to provide users with personalized recommendations of resources of interest to them.
3. CROSS-TAGGING IN MUSIC DOMAIN

In this section we present our approach to Cross-Tagging in the Music Domain in the context of a plausible application scenario to improve the user experience: an Open Social Network Recommender System. For the purpose of this work, we use Blogger.com, a blog site, and Last.fm, a social music site, and present a uni-directional recommendation process: first we establish a mapping between the sites, and then we provide an enrichment via the mapping. We discuss each aspect in turn.

3.1 Terminology and Problem Definition

Before explaining our approach for crossing tags between different social systems, we first present some terminology for the concepts discussed in this paper. Similarly to [6], we define a folksonomy as a four-tuple $F := (U, T, R, Y)$, where:

• $U$, $T$ and $R$ are finite sets, whose elements are called users, tags and resources, respectively, and
• $Y$ is a ternary relation between them, i.e., $Y \subseteq U \times T \times R$, whose elements are called tag assignments.

For convenience we define the Blogger.com system as a folksonomy $F_b$ where the set of tags initially contains only one element, i.e., $T_b := \{\text{default}\}$. We will use the subscripts $b$ and $l$ to distinguish the corresponding users, tags, resources and tag assignments from Blogger.com and Last.fm, respectively. Assuming that all resources present in Blogger.com are also present in Last.fm, a trivial way to cross tags between these two systems is to make a join with respect to the overlapping resources, i.e.,

$$Y_b^{new} := \pi_{u_b, t_l, r_b}\left(\sigma_{r_b = r_l}(Y_b \times Y_l)\right) \tag{1}$$

where $\sigma$ and $\pi$ are the relational algebra operators for selection and projection. First, from the Cartesian product $Y_b \times Y_l$, the tuples with equal resources in both sites are selected, and then the projection is taken over the Blogger.com users, the Last.fm tags, and the common resource elements. Given this mapping, the question is which of these tags are useful, if at all, to the Blogger.com users. The problem can then be described as: (i) how to find the most appropriate Last.fm tags for a given $(u_b, r_b)$ pair, and (ii) how to measure to what extent these tags are indeed appropriate. We discuss our approaches to (i) and (ii) in subsections 3.2 and 3.3, respectively.

3.2 Cross-Tagging Approaches

Note that subproblem (i) can be easily cast as a tag recommendation problem, i.e., given a particular (user, resource) pair, one wants to suggest a certain number of tags that respect some previously defined criterion [7]. We therefore identified and selected two tag recommendation algorithms, introduced in [11, 7], that can be easily applied to the problem at hand. The first one corresponds to an unpersonalized approach and the second one is a personalized tag recommender based on collaborative filtering (CF). We discuss both approaches in the following subsections.

Unpersonalized Cross-Tagging
In this approach, the most popular tags used to annotate the active resource are recommended, i.e.:

$$\tilde{T}(u_b, r_b) := \operatorname*{argmax}^{n}_{t_l \in T_l} \left(|Y_{r_b, t_l}|\right) \tag{2}$$

where $Y_{r_b, t_l} := Y_l \cap (U_l \times \{t_l\} \times \{r_b\})$. Note, however, that unlike in [11, 7], where the recommender algorithms operate on single data sets, here the tags assigned to the resource in question belong to users that are not necessarily in the Blogger.com system. The assumption behind this strategy is that collective knowledge about a given domain should hold across different systems, as long as these systems operate over the same domain. This method will serve as a baseline for our experiments.

Personalized Cross-Tagging
For this approach, a personalized tag recommender based on collaborative filtering (CF) is used. The idea is to first compute a neighborhood of Last.fm users based on how similar their profiles are to the Blogger.com user profiles. Next, the tags of the neighborhood that were used for the active resource are weighted, aggregated and sorted by decreasing weight (see Equation 4). The assumption behind this strategy is that users who share similar resources also share similar tags. Our experiments (see Section 4) show that this strategy is indeed more beneficial to the users than the unpersonalized one. Note that in CF, for $m$ users and $n$ resources (in our case music tracks), the user profiles are represented in a user-resource matrix $X \in \{0, 1\}^{m \times n}$. Each user profile is in turn represented as a row vector of $X$:

$$X := [\vec{x}_1, \ldots, \vec{x}_m]^T \quad \text{with} \quad \vec{x}_u := [x_{u,1}, \ldots, x_{u,n}], \quad \text{for } u := 1, \ldots, m,$$

where $x_{u,r} := 1$ indicates that user $u$ co-occurred with resource $r$. Since we do not have explicit feedback in the form of numerical ratings, the user-resource matrix is binary: 1 denotes that a certain user co-occurred with a particular resource, and 0 otherwise. In our case, we have two user-resource matrices, one for Blogger.com and one for Last.fm. The well-known cosine similarity measure was used for computing the $k$ most similar Last.fm users for a particular Blogger.com user, i.e.,

$$\operatorname{sim}(\vec{x}_l, \vec{x}_b) := \frac{\langle \vec{x}_l, \vec{x}_b \rangle}{\|\vec{x}_l\| \, \|\vec{x}_b\|}.$$
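As an illustration only, the two cross-tagging strategies of Section 3.2 (the most-popular baseline of Equation 2, and the cosine-based neighborhood and weighted aggregation of Equations 3 and 4) could be sketched in Python as follows. The function names, the set-based profile representation, the alphabetical tie-breaking and the toy Last.fm triples are assumptions made for this sketch; they are not taken from the authors' implementation or data.

```python
from collections import Counter, defaultdict
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two binary profiles given as sets of resources."""
    if not a or not b:
        return 0.0
    return len(a & b) / (sqrt(len(a)) * sqrt(len(b)))


def most_popular_tags(y_l, resource, n):
    """Unpersonalized cross-tagging: the n tags most often assigned to `resource`."""
    counts = Counter(t for (_, t, r) in y_l if r == resource)
    # Ties are broken alphabetically (an assumption of this sketch).
    return sorted(counts, key=lambda t: (-counts[t], t))[:n]


def personalized_tags(y_l, blog_profile, resource, k, n):
    """Personalized cross-tagging: similarity-weighted tags of the k nearest users."""
    profiles = defaultdict(set)  # Last.fm user -> set of resources
    for u, _, r in y_l:
        profiles[u].add(r)
    sims = {u: cosine(p, blog_profile) for u, p in profiles.items()}
    neighbors = set(sorted(sims, key=lambda u: (-sims[u], u))[:k])
    weights = defaultdict(float)
    for u, t, r in y_l:
        if r == resource and u in neighbors:
            weights[t] += sims[u]
    return sorted(weights, key=lambda t: (-weights[t], t))[:n]


# Hypothetical (user, tag, resource) triples, loosely modeled on the Paul example.
Y_L = [
    ("jack", "guns n' roses", "Don't Cry"), ("jack", "hard rock", "Don't Cry"),
    ("jack", "metal", "Z.I.TO"),
    ("john", "guns n' roses", "Don't Cry"), ("john", "ballad", "Z.I.TO"),
    ("mary", "rock", "Don't Cry"), ("mary", "hard rock", "Don't Cry"),
    ("mary", "pop", "Song A"), ("mary", "pop", "Song B"),
    ("ann", "rock", "Don't Cry"), ("ann", "hard rock", "Don't Cry"),
    ("ann", "pop", "Song A"), ("ann", "indie", "Song C"),
    ("bob", "rock", "Don't Cry"), ("bob", "pop", "Song B"),
    ("bob", "indie", "Song C"),
    ("carl", "rock", "Don't Cry"), ("carl", "pop", "Song A"),
]
PAUL = {"Don't Cry", "Z.I.TO"}  # blog user profile
```

On this toy data, `most_popular_tags(Y_L, "Don't Cry", 2)` yields `rock` and `hard rock` for every user, whereas `personalized_tags(Y_L, PAUL, "Don't Cry", k=2, n=2)` yields `guns n' roses` and `hard rock`, because only the two users whose profiles match Paul's contribute to the weights.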
The best $k$ neighbors of $u_b$ in Last.fm are then computed as follows:

$$N^{k}_{u_b} := \operatorname*{argmax}^{k}_{u_l \in U_l} \operatorname{sim}(\vec{x}_l, \vec{x}_b) \tag{3}$$

After that, the set $\tilde{T}(u_b, r_b)$ of $n$ recommended tags for a given $(u_b, r_b)$ pair, and some $n \in \mathbb{N}$, is computed as follows:

$$\tilde{T}(u_b, r_b) := \operatorname*{argmax}^{n}_{t_l \in T_l} \sum_{u_l \in N^{k}_{u_b}} \operatorname{sim}(\vec{x}_b, \vec{x}_l) \, \delta(u_l, t_l, r_b) \tag{4}$$

where $\delta(u_l, t_l, r_b) := 1$ if $(u_l, t_l, r_b) \in Y_l$ and 0 otherwise.

Example. Consider the blog user Paul shown in Figure 2. The profile of this user is composed of two songs, i.e., Don’t Cry and Z.I.TO. Suppose we want to recommend tags for the pair (Paul, Don’t Cry) through the cross-tagging approaches introduced in Section 3.2. For the unpersonalized one, we just need to count which tags were used most often for this song (see Equation 2). If we restrict the number of tags to be recommended to 2, we would obtain the tags Rock and Hard Rock (see upper part of Figure 2b). Note that with this approach, each user would receive the same tag recommendations independently of his or her profile. For the personalized approach, we first find the users having the most similar profiles to Paul (see Equation 3), in this case Jack and John (see Figure 2a), and aggregate the tags used by these “best neighbors” for the resource for which we want to recommend tags (see Equation 4). This would lead, in this particular case, to Guns N’ Roses and Hard Rock (see bottom part of Figure 2b). Note that in this case, different users may receive different tag recommendations for the same resource, since the recommendations are based on the individual profiles of the users and thus reflect their personal interests.

Figure 2: (a) A blog user profile (top) and Last.fm users (bottom). (b) Output of unpersonalized cross-tagging (top) and output of personalized cross-tagging (bottom).

3.3 Personalized Recommendations based on Tensor Approximation

The tags that are generated by the cross-tagging process can be exploited for providing personalized recommendations of resources. This section describes the tag-aware recommendation algorithm that we use for this purpose. The motivation for using a recommendation algorithm is twofold: (a) Recommended resources significantly improve the blog users’ everyday experience, by allowing them to easily locate resources and address the “information overload” problem that emerges in large blogs. (b) The task of recommendation provides a suitable evaluation framework for cross-tagging strategies: the higher the quality of the generated tags (i.e., tags that better reflect the personalized aspect of users for the corresponding resources), the better the performance of a tag-aware recommendation algorithm that is based on them. A similar idea is used in [10], where the authors used ontology-aware recommender systems to indirectly measure the quality of a given set of ontologies.

As pointed out in Section 3.1, cross-tagging results in a set of triples of the form $\langle u, t, r \rangle$, denoting that user $u$ tagged resource $r$ with the tag $t$. We model the set of all triples with a 3-order tensor (3-dimensional array) $A = (a_{u,t,r}) \in \mathbb{R}^{N_U \times N_T \times N_R}$, where $N_U$, $N_T$, $N_R$ are the total numbers of users (first mode of $A$), tags (second mode of $A$), and resources (third mode of $A$), respectively. For each $u, t, r$ ($1 \le u \le N_U$, $1 \le t \le N_T$, $1 \le r \le N_R$) for which there exists a triple $\langle u, t, r \rangle$, we set the corresponding element $a_{u,t,r}$ equal to 1, whereas all other elements of $A$ are set to 0. Thus, $A$ is, in general, a sparse tensor. By modeling the triples with a tensor, we are able to exploit the underlying latent semantic structure in $A$ formed by multi-way correlations between users, tags, and resources. This can be attained using a recommendation algorithm that is based on tensor reduction, which has been proposed in [21].

With this algorithm we can effectively detect multi-way correlations, leading to improved performance, which is empirically confirmed by our experimental results. First, we decompose $A$ using the Tucker decomposition, which is the multi-dimensional analog of SVD for tensors [8]. The decomposition of $A$ is expressed in Equation 5. $U \in \mathbb{R}^{N_U \times N_U}$, $T \in \mathbb{R}^{N_T \times N_T}$, $R \in \mathbb{R}^{N_R \times N_R}$ are orthonormal matrices corresponding to the dominant singular vectors per mode. $S$ is the core tensor that contains the singular values; thus it has the same number of dimensions as $A$ and the property of all-orthogonality.² The symbol $\times_i$ denotes the $i$-mode multiplication between a tensor and a matrix.

$$A = S \times_1 U \times_2 T \times_3 R \tag{5}$$

² Unlike the SVD of 2-order tensors, i.e., matrices, however, $S$ is not diagonal.

After decomposing $A$, we truncate the matrices $U$, $T$, $R$, and the core tensor $S$ by maintaining only the highest $D$ singular values and the corresponding singular vectors per mode (henceforth, $D$ is reported as a fraction, e.g., 0.7, of the maintained values divided by the original number of values). This produces the truncated matrices $U_D \in \mathbb{R}^{N_U \times D}$, $T_D \in \mathbb{R}^{N_T \times D}$, $R_D \in \mathbb{R}^{N_R \times D}$, and the truncated core tensor $S_D \in \mathbb{R}^{D \times D \times D}$.
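The decomposition and truncation steps can be sketched with NumPy as a truncated higher-order SVD (HOSVD), one common way to compute a Tucker decomposition. This is a minimal sketch under stated assumptions, not the algorithm of [21] or the sparse method of [9]: it uses dense arrays, the helper names (`unfold`, `mode_multiply`, `truncated_hosvd`, `reconstruct`) are hypothetical, and the number of retained components per mode is passed as an absolute count rather than as the fraction $D$.

```python
import numpy as np


def unfold(a, mode):
    """Mode-`mode` unfolding: rows are indexed by that mode, columns by the rest."""
    return np.moveaxis(a, mode, 0).reshape(a.shape[mode], -1)


def mode_multiply(a, m, mode):
    """i-mode product of tensor `a` with matrix `m` of shape (J, a.shape[mode])."""
    moved = np.moveaxis(a, mode, 0)
    out = np.tensordot(m, moved, axes=(1, 0))
    return np.moveaxis(out, 0, mode)


def truncated_hosvd(a, ranks):
    """Return the truncated core tensor and one truncated factor matrix per mode."""
    factors = []
    for mode, rank in enumerate(ranks):
        # Left singular vectors of the unfolding, ordered by singular value.
        u, _, _ = np.linalg.svd(unfold(a, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = a
    for mode, f in enumerate(factors):
        core = mode_multiply(core, f.T, mode)  # project onto the kept vectors
    return core, factors


def reconstruct(core, factors):
    """Multiply the core tensor back with each factor to obtain the approximation."""
    a_hat = core
    for mode, f in enumerate(factors):
        a_hat = mode_multiply(a_hat, f, mode)
    return a_hat
```

For a binary users × tags × resources array `a`, `core, factors = truncated_hosvd(a, (d_u, d_t, d_r))` followed by `reconstruct(core, factors)` gives a dense approximation with the same shape as `a`; keeping all components per mode reproduces `a` up to floating-point error.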
Using truncation we can approximate $A$ with the reconstructed tensor $\hat{A} \in \mathbb{R}^{N_U \times N_T \times N_R}$ as expressed in Equation 6 and illustrated in Figure 3.³

$$\hat{A} := S_D \times_1 U_D \times_2 T_D \times_3 R_D \tag{6}$$

³ Due to the sparsity of $A$, its decomposition and approximation can be performed efficiently following the approach of Sun and Kolda [9].

Figure 3: Visualization of the Tensor Approximation method.

The reconstructed tensor $\hat{A}$ is not sparse. The value of each element $\hat{a}_{u,t,r}$ in $\hat{A}$ predicts the association among user $u$, tag $t$, and resource $r$ (the higher the value, the stronger the association). In particular, all the non-zero elements of $\hat{A}$ represent quadruplets of the form $\langle u, t, r, p \rangle$, with $p$ expressing the likelihood that $u$ will tag $r$ as $t$. Therefore, resources can be recommended to a user $u$ for a particular tag $t$ according to the weights associated with the quadruplets that contain the $(u, t)$ pair. If we want to recommend $N$ resources to $u$ for $t$, then we select the $N$ resources with the highest corresponding $p$ values.

4. EXPERIMENTS

This section first describes the test data sets and evaluation metric, and then reports the experimental results.

4.1 Data Set

For Cross-Tagging, we used two data sets: one data set consists of personal music blogs from Blogger.com, one of the most popular blog sites, whereas the second data set consists of tagged tracks from Last.fm, a radio and music community website and one of the largest social music platforms. The details of each data set are presented in this section and summarized in Table 1.

Table 1: Data Sets Summary

                                  Data Collected              Data Used in the Experiments†
                                  Blogger.com    Last.fm      Enriched Blog
|U|, Users                        6,620          44,143       3,827
|R|, Resources (i.e., tracks)     17,372         17,372       1,323
|T|, Tags                         0              4,903        422
|Y|, Tag Assignments              0              254,388      32,900

† For our experiments, we only considered the $(u_b, r_b)$ pairs for which a personalized top-10 tag recommendation could be generated.

Blogger.com Data
The raw music blogs were collected by experimentally selecting seed bloggers using several music directories⁴ and limiting the bloggers selected to the genre of pop and rock music. The blogroll for each seed was traversed, fanning out in breadth-first order, going three levels deep in the blogroll hierarchy. For each of the pages, the HTML was stripped; no stemming or stop word removal was done. The total number of bloggers was $|U_{Blogger.com}| = 6{,}620$. Once the bloggers’ pages were collected, profiles were built by parsing the tracks in the blogs by relying upon a dictionary of tracks gathered from the Bill Board Music Dictionary⁵.

⁴ http://www.musicblogscatalog.com/, http://yocheckthisjam.com/music-blog-directory/, http://www.blogged.com/directory/entertainment/music/rock, http://www.blogcatalog.com/directory/music/rock
⁵ http://www.billboard.com/bbcom/index.jsp

Bill Board Music Dictionary
To extract the blogger profiles, a dictionary was constructed from Billboard, a magazine devoted to the music industry which monitors the most popular songs and albums in various categories on a weekly basis. The data was obtained via license for a specified period of time. The artist and track dictionary was constructed from the Billboard Hot 100, and the Billboard 200 survey was used for constructing the album dictionary. Some filtering was required for the dictionary entries, particularly for concatenating tokens within track names such as “and”, “featuring”, dashes or parentheses. Concatenating tokens were removed and each artist was then made a separate dictionary entry. This process led to duplicate dictionary entries, so an additional filter to remove duplicates was subsequently applied. Some forty-two domain-specific stop words that describe variations of a track or album, such as “Radio Edit”, “Main Version”, or “Original Version”, were also applied. All dictionary entries were converted to lower case. The approximate dictionary matching implementation of the Aho-Corasick algorithm available in LingPipe⁶ was used with the tolerance set to zero. It was chosen for its performance with respect to the size of the dictionary, which in our case consisted of 51,226 distinct tracks.

⁶ http://alias-i.com/lingpipe/

Last.fm Data
A total of 17,372 unique tracks for both Blogger.com and Last.fm were collected. In addition, for Last.fm, a total of 44,143 users, 4,903 tags, and 254,388 user-resource-tag triples were obtained.

Enriched Blog Data
For our experiments, we only considered the $(u_b, r_b)$ pairs for which a personalized top-10 tag recommendation could be generated. Note that a given neighborhood will not always contain the resource for which we want to recommend tags. This yielded 3,827 users, 1,323 resources, 422 distinct tags and 32,900 triples.
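The dictionary clean-up and zero-tolerance matching described under “Bill Board Music Dictionary” might look roughly like the following Python sketch. It is an assumption-laden illustration: the function names are hypothetical, only the three stop phrases quoted in the text are included (the full list of forty-two is not given), the exact filtering rules are a guess, and a plain regular-expression search stands in for the LingPipe Aho-Corasick matcher that was actually used.

```python
import re

# Only the stop phrases quoted in the text; the full list is not available.
VERSION_STOP_PHRASES = ("radio edit", "main version", "original version")


def normalize_title(title):
    """Lower-case a title and drop version phrases, parentheses and dashes."""
    text = title.lower()
    for phrase in VERSION_STOP_PHRASES:
        text = text.replace(phrase, " ")
    text = re.sub(r"[()\-]", " ", text)
    return " ".join(text.split())


def split_artists(credit):
    """Split a combined artist credit on 'and' / 'featuring' into separate entries."""
    parts = re.split(r"\s+(?:and|featuring)\s+", credit.lower())
    return [" ".join(p.split()) for p in parts if p.strip()]


def build_dictionary(titles):
    """Normalize all titles; using a set removes the resulting duplicates."""
    return {t for t in (normalize_title(x) for x in titles) if t}


def find_tracks(post_text, dictionary):
    """Exact matching of dictionary entries in a blog post (no edit tolerance)."""
    text = " ".join(post_text.lower().split())
    return sorted(
        entry for entry in dictionary
        if re.search(r"(?<!\w)" + re.escape(entry) + r"(?!\w)", text)
    )
```

Scanning every entry against every post is linear in the dictionary size per post, which is why a multi-pattern automaton such as Aho-Corasick is the more sensible choice for a dictionary of tens of thousands of tracks.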
Figure 4: Experimental Results. (a) Recall versus the number of recommended resources; (b) Recall@5 versus n, the number of tags, for P and MP (NN=30, D=0.5); (c) Recall@5 versus NN for P and MP (n=5, D=0.5); (d) Recall@5 versus D.
4.2 Experimental Results

Protocol and Evaluation Metric
For measuring the recommendation quality, we used the recall⁷ measure. We examine the Tensor Approximation algorithm having as input either a folksonomy created through “Personalized” Cross-Tagging (for convenience denoted as P) or through “Unpersonalized” Cross-Tagging (denoted as MP for “most popular”). In order to investigate the overall impact of Cross-Tagging, we also compare the tag-aware recommender with classic plain CF without tags, denoted here as UB for user-based CF [19]. The evaluation was performed with the AllBut1 [3] protocol, i.e., for each user one resource was randomly hidden and used for testing, while the remaining ones were used for training.

⁷ With a fixed number of recommendations, precision is the same as recall up to a multiplicative constant, and thus there is no need to evaluate precision.

Parameters
The parameters we considered in our experiments are the following: n is the number of suggested tags (default value n = 5), NN is the neighborhood size used in the personalized Cross-Tagging approach (default value NN = 30), and D is the fraction of maintained dimensions per mode during the tensor reduction (default value D = 0.5). For UB we examined various user-neighborhood sizes and report results for the best value, which is 50.

Results
Figure 4a depicts the recall of P, MP, and UB for a varying number of recommended resources. As expected, the recall of all methods increases with an increasing number of recommendations. UB is clearly outperformed by P and MP, which confirms our assumption that tags can indeed carry valuable information. Moreover, P attains better recall than MP, especially for numbers of recommended resources that are reasonable for real-world applications. The reason lies in the better personalization during the process of Cross-Tagging. To simplify the presentation, we henceforth omit results for UB, as it is consistently outperformed by the other two methods.

To gain further insight into the performance of P and MP, we proceed to examine the impact of the parameters. Figure 4b depicts recall@5 (i.e., recall when 5 resources are recommended) while varying n, the number of tags. When n is low, the performance of both P and MP drops, as there is not enough margin left to personalize the Cross-Tagging process. However, after a point, the consideration of more tags does not add much to the performance. This indicates that a reasonable number of tags per post is enough both to attain good recommendation quality and to speed up the Cross-Tagging process. Note that in all cases, P performs better than MP.

Next, we examine the impact of NN on P (note that MP is independent of NN). Figure 4c depicts recall@5 for varying NN. Evidently, when NN is high (e.g., 100), the personalization process is not effective, as very “poor” neighbors are also considered. Thus, the performance of P decreases. On the other hand, when the value of NN allows for the identification of more similar and consistent neighbors, Cross-Tagging proves to be effective.

Finally, we examine the impact of D on P. Figure 4d depicts recall@5 for varying D. When D is high, noise in the data is not effectively filtered out. In contrast, when D is low, information is lost. Therefore, for a reasonable range of D values, P performs well and has the desirable property of not being too sensitive to D.
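The AllBut1 protocol and the recall@N measure described under “Protocol and Evaluation Metric” can be sketched as follows. This is an illustrative sketch only: the function names, the dictionary-of-sets profile format, the fixed random seed and the `recommend(user, train, n)` callback signature are assumptions, not the evaluation code used for the reported results.

```python
import random


def all_but_one_split(profiles, seed=0):
    """Hide one randomly chosen resource per user; the rest is training data."""
    rng = random.Random(seed)
    train, hidden = {}, {}
    for user, resources in profiles.items():
        items = sorted(resources)
        if len(items) >= 2:  # users with a single resource leave nothing to train on
            held_out = rng.choice(items)
            hidden[user] = held_out
            train[user] = set(items) - {held_out}
    return train, hidden


def recall_at_n(recommend, train, hidden, n):
    """Fraction of test users whose hidden resource is in their top-n list."""
    if not hidden:
        return 0.0
    hits = sum(
        1 for user, held_out in hidden.items()
        if held_out in recommend(user, train, n)
    )
    return hits / len(hidden)


def precision_at_n(recall, n):
    """With one hidden resource and n recommendations per user, precision = recall / n."""
    return recall / n
```

Because exactly one resource is hidden per user, per-user recall is either 0 or 1, so recall@N reduces to a hit rate, and precision differs from it only by the constant factor 1/N, as stated in footnote 7.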
In future work, we plan to explore the use of more robust profiles, which applies confidence measures for disambiguating the named entities extracted from the music blog text. We also plan to extend the approach to a bi-directional recommendation to support both, mutual and dual enrichment for each social site in the Cross-Tagging ecosystem.
Acknowledgments This work was funded in part by the European Project PHAROS (IST Contract No.045035), by the Programme Alβan, the European Union Programme of High Level Scholarships for Latin America, scholarship no. (E07D400591SV), CNPq an institution of Brazilian Government for scientific and technologic development and X-Media project (www. xmedia-project.org) sponsored by the European Commission as part of the Information Society Technologies (IST) programme under EC grant number IST-FP6-026978.
6. REFERENCES
[1] K. Bischoff, C. S. Firan, W. Nejdl, and R. Paiu. Can all tags be used for search? In CIKM ’08: Proceedings of the 17th Conference on Information and Knowledge Management. To Appear. ACM, 2008. [2] M. Braun, K. Dellschaft, T. Franz, D. Hering, P. Jungen, H. Metzler, E. M¨ uller, A. Rostilov, and C. Saathoff. Personalized search and exploration with mytag. In WWW ’08: Proceeding of the 17th international conference on World Wide Web, pages 1031–1032, New York, NY, USA, 2008. ACM. [3] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence (UAI-98), pages 43–52. Morgan Kaufmann, 1998. [4] C. S. Firan, W. Nejdl, and R. Paiu. The benefit of using tag-based profiles. In LA-WEB ’07: Proceedings of the 2007 Latin American Web Conference, pages 32–41, Washington, DC, USA, 2007. IEEE Computer Society. [5] C. Hanser and B. Berendt. Tags are not metadata, but ”just more content” - to some people. In Proceedings of the International Conference on Weblogs and Social Media (ICWSM 2007), 2007. [6] R. J¨ aschke, A. Hotho, C. Schmitz, B. Ganter, and G. Stumme. Discovering shared conceptualizations in folksonomies. Web Semant., 6(1):38–53, 2008. [7] R. J¨ aschke, L. Marinho, A. Hotho, L. Schmidt-Thieme, and G. Stumme. Tag recommendations in social bookmarking systems. AI Communications, pages 231–247, 2008. [8] T. G. Kolda and B. W. Bader. Tensor decompositions and applications. SIAM Review. to appear (accepted June 2008). [9] T. G. Kolda and J. Sun. Scalable tensor decompositions for multi-aspect data mining. In Proceedings of the 8th IEEE International Conference on Data Mining (ICDM 2008), December 2008. [10] L. B. Marinho, K. Buza, and L. Schmidt-Thieme. Folksonomy-based collabulary learning. In International Semantic Web Conference (ISWC 08). Springer, 2008.
4.3 Discussion Simplifying assumptions were made when constructing profiles. A more robust approach which applies confidence measure to profiles based on a combination of statistical, semantic and syntactic approaches would be needed to disambiguate the named entities extracted from the music blog text. For example Light-weight “semantic association networks” derived from collocated terms in music blogs or the mining of frequent items sets could be useful in identifying prominent pairs of resources. Profiles which exhibit less prominent patterns could be weighted with lower confidence. Syntactic approaches could involved part-of-speech analysis applied to the music blog text. Terms that have an unlikely part of speech as a music entity can be weighted with a lower confidence.
5. CONCLUSION AND FUTURE WORK
In this work we introduced Cross-Tagging in the music domain, an approach to Open Social Networking in which the experience of a non-folksonomy music blogger is personalized by exploiting the tag assertions made by folksonomy users in Last.fm. Our Cross-Tagging approach exploits tag-aware recommender systems by first automatically annotating resources across multiple social systems and then modeling resource triples with a tri-dimensional array (i.e., a tensor), to exploit the underlying latent semantic structure formed by the multi-way correlations between them. We evaluated the approach with a recommendation algorithm based on tensor reduction, which can effectively detect multi-way correlations. Compared to classic collaborative filtering without tags and to an unpersonalized cross-tagging system, personalized cross-tagging achieved better personalization as measured by recall. These results suggest that the social practices in one community can be exploited to support those in another, comparable social community when the underlying resources are the same but the users are not. This brings the open initiative community one step closer to closing the gap in the social network divide.
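To make the tensor view concrete, the following is a minimal sketch rather than the algorithm evaluated in this paper. It assumes triples of the form (user, tag, resource), fills a binary three-way array, and uses a plain truncated higher-order SVD as a stand-in for the tensor reduction step. The function names, the example triples, and the ranks are hypothetical.

```python
import numpy as np


def build_tensor(triples):
    """Build a binary user x tag x resource array from (user, tag, resource) triples."""
    axes = [sorted({triple[i] for triple in triples}) for i in range(3)]
    index = [{name: pos for pos, name in enumerate(axis)} for axis in axes]
    tensor = np.zeros(tuple(len(axis) for axis in axes))
    for user, tag, resource in triples:
        tensor[index[0][user], index[1][tag], index[2][resource]] = 1.0
    return tensor, axes


def unfold(tensor, mode):
    """Matricize the tensor so that `mode` indexes the rows."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def truncated_hosvd(tensor, ranks):
    """Return a low-rank approximation of a three-way tensor (truncated HOSVD)."""
    factors = []
    for mode, rank in enumerate(ranks):
        # Leading left singular vectors of each unfolding span the kept subspace.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    # Project onto the kept subspaces (core tensor), then map back.
    core = np.einsum("ijk,ia,jb,kc->abc", tensor, *factors)
    return np.einsum("abc,ia,jb,kc->ijk", core, *factors)


# Hypothetical triples; in the setting above they would come from cross-tagging.
triples = [
    ("alice", "rock", "Radiohead"),
    ("alice", "indie", "Radiohead"),
    ("bob", "rock", "Muse"),
    ("bob", "rock", "Radiohead"),
    ("carol", "electronic", "Air"),
]
tensor, (users, tags, resources) = build_tensor(triples)
approx = truncated_hosvd(tensor, ranks=(2, 2, 2))
full = truncated_hosvd(tensor, ranks=tensor.shape)
```

Entries of `approx` that are zero in `tensor` but clearly positive after reduction are the kind of latent association a recommender could rank; with full ranks the reconstruction reproduces the input.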
[11] L. B. Marinho and L. Schmidt-Thieme. Collaborative tag recommendations. In Proceedings of 31st Annual Conference of the Gesellschaft für Klassifikation (GfKl), Freiburg. Springer, 2007.
[12] C. McCarthy. Myspace announces “Data Availability” project with yahoo, ebay, photobucket, twitter. http://news.cnet.com/8301-13577_3-9939286-36.html, 2008.
[13] B. Mehta and T. Hofmann. Cross system personalization by learning manifold alignments. In KI 2006: Advances in Artificial Intelligence, volume 4314/2007, pages 244–259. Springer Berlin / Heidelberg, 2006.
[14] B. Mehta, T. Hofmann, and P. Fankhauser. Cross system personalization by factor analysis. In ITWP Workshop at AAAI 2006. AAAI Press, 2006.
[15] D. Morin. Announcing facebook connect. http://developers.facebook.com/news.php?blog=1&story=108, 2008.
[16] C. Niederée, A. Stewart, B. Mehta, and M. Hemmje. A multi-dimensional, unified user model for cross-system personalization. In Proceedings of Workshop On Environments For Personalized Information Access at Advanced Visual Interfaces, May 2004.
[17] S. Oldenburg. Comparative studies of social classification systems using rss feeds. In J. Cordeiro, J. Filipe, and S. Hammoudi, editors, WEBIST (2), pages 394–403. INSTICC Press, 2008.
[18] R. Paiu, L. Chen, C. S. Firan, and W. Nejdl. Pharos personalizing users’ experience in audio-visual online spaces. In PersDB, pages 40–47, 2008.
[19] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl. Grouplens: An open architecture for collaborative filtering of netnews. In Proc. of ACM 1994 Conference on Computer Supported Cooperative Work, pages 175–186, Chapel Hill, North Carolina, 1994. ACM.
[20] S. B. Subramanya. View Completation And Collaborative Tagging In Blogosphere. Master’s thesis, Arizona State University, July 2008.
[21] P. Symeonidis, A. Nanopoulos, and Y. Manolopoulos. Tag recommendations based on tensor dimensionality reduction. In 2nd ACM Conference in Recommender Systems (RecSys 08), pages 43–50, Lausanne, Switzerland, 2008.
[22] M. Szomszor, H. Alani, I. Cantador, K. O’Hara, and N. Shadbolt. Semantic modelling of user interests based on cross-folksonomy analysis. In International Semantic Web Conference, volume 5318 of Lecture Notes in Computer Science, pages 632–648. Springer, 2008.
[23] M. Szomszor, H. Alani, I. Cantador, K. O’Hara, and N. Shadbolt. Semantic modelling of user interests based on cross-folksonomy analysis. In International Semantic Web Conference, pages 632–648, 2008.
[24] K. H. L. Tso-Sutter, L. B. Marinho, and L. Schmidt-Thieme. Tag-aware recommender systems by fusion of collaborative filtering algorithms. In SAC ’08: Proceedings of the 2008 ACM symposium on Applied computing, pages 1995–1999, New York, NY, USA, 2008. ACM.
[25] C. Wang, Y. Zhang, and F. Zhang. User modeling for cross system personalization in digital libraries. Information Technologies and Applications in Education, 2007. ISITAE ’07. First IEEE International Symposium on, pages 238–243, Nov. 2007.
[26] J. Wang and B. D. Davison. Explorations in tag suggestion and query expansion. In SSM ’08: Proceeding of the 2008 ACM workshop on Search in social media, pages 43–50, New York, NY, USA, 2008. ACM.