Aggregating Music Recommendation Web APIs by Artist

Brandeis Marshall
Computer and Information Technology, Purdue University
[email protected]

Abstract

Through user accounts, music recommendations are refined by user-supplied genre and artist preferences. Music recommendation is further complicated by multiple-genre artists, artist collaborations and artist similarity identification. We focus primarily on artist similarity, for which we propose a rank fusion solution. We aggregate the most-similar-artist rankings from Idiomag, Last.fm and Echo Nest. Through an experimental evaluation of 300 artist queries, we compare five rank fusion algorithms and examine how each fusion method could impact the retrieval of established, new or cross-genre music artists.

Keywords: rank aggregation methods, artist similarity, music information retrieval

1. Introduction

Online radio can be accessed in one of two ways: by subscription, such as Sirius Satellite Radio and XM Satellite Radio, or for free, such as AOL Radio, Pandora, Last.fm, Idiomag and Echo Nest. Each online radio portal allows music listeners to create a user account in order to track their music genre and artist preferences. In most cases, the user chooses a radio station with a programmed playlist. In contrast, Pandora needs only a single music artist to begin a customized user playlist. If the user would like to listen to different music genres, she must provide a sample music artist for Pandora to generate an appropriate playlist.

Some challenges facing music recommendation are multiple-genre artists, music artist collaborations and artist similarity identification. Many music artists can be classified in more than one genre, either because an artist decides to alter her sound or because of the influence of one genre on another. Music artist collaborations have become popular, occurring both within and across music genres, and may extend over more than one song or album. For a two-artist collaboration, a music listener may like the collaborative song but only enjoy the music of one of the artists. Artist similarity is primarily user-driven, since likeness is highly subjective. These challenges, therefore, make capturing artist similarity difficult.

In this paper, we are concerned with identifying similar artists, which serves as a precursor to how music recommendation can handle the more complex issues of multiple-genre artists and artist collaborations. We consider the individual most-similar-artist rankings from three public-use Web APIs (Idiomag, Last.fm and Echo Nest) as different perspectives on artist similarity. We examine the level of overlap amongst these Web APIs through rank fusion methods. By understanding this overlap, we can more easily isolate the multiple-genre artists and artist collaborations. The specific contributions of this paper are: (1) examining rank fusion as a solution to artist similarity and (2) performing a quantitative study of artist similarity using five fusion algorithms: Average, Condorcet-fuse, CombMNZ, PageRank and Median.

2. Related Work

Recommendation systems have been developed for niche domains such as books, movies and music [4, 9, 13]. Collaborative filtering has become the accepted approach for providing user-specific results using information from many users. However, the prior work of [4, 13] concentrates on the user's playlist through song properties including pitch, duration and loudness. The music genre, on the other hand, is more complex than the user's playlist. We focus on music genre because of its song and artist diversity, while avoiding the preprocessing of song properties conducted in prior research.

When recommending music using text, the labels or tags of music artists are used to classify artists into predefined or user-generated categories. Music recommendation differs from other Web 2.0 tools, such as Web page tagging (Del.icio.us) and image tagging (Flickr), due to its reliance on the Type and Opinions classes of the tag classification taxonomy [1]. Bischoff et al. [1] emphasize that music listeners tend to label music and artists according to genre and style, and enjoy providing personalized opinions of the music, while Del.icio.us and Flickr users classify by Topic, Time and Location.

Last.fm is one of the online music portals that has been used in music information retrieval in recent years [1, 5, 8, 11, 14]. The user-based tagging of Last.fm reveals that the tags applied most frequently by different users tend to be the most accurate in characterizing an artist's style. The authors also propose a method for handling semantically similar tags, e.g. "hiphop", "hip hop" and "hip-hop", to remove duplicate tags caused by spelling variations; since we apply an aggregation approach, such tag discrepancies are minimized. Magno and Sable [11] show the similarity of human recommendation to the automated music recommendation services provided by Last.fm and Pandora. Nevertheless, their results also note the limitations of human recommendation, as some dependencies of an individual's musical taste are not captured. The landscape for finding new music has been vastly transformed by the Internet [5], which has helped new artists such as Taylor Swift (country) and Sean Kingston (reggae) find new listeners. The work of Turnbull et al. [14] is the most similar to ours: five tagging approaches are compared, including surveying, social tagging, gaming, web documentation and autotagging. The autotagging approach is the only method that does not require at least one song to be annotated by humans. Their rank-based interleaving (RBI) approach determines appropriate tags through the fusion of the social tagging, gaming, web documentation and autotagging approaches, resulting in higher precision. Hence, aggregation of imperfect methods can motivate the generation of more effective ground truth. We apply this methodology to determining artist similarity, rather than song genre classification.
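As a concrete illustration of that kind of tag normalization (this sketch is ours, not the cited authors' method; the normalization rule is an assumption), spelling variants can be collapsed into one canonical key before tag counting:

```python
import re

def normalize_tag(tag: str) -> str:
    """Collapse spelling variants such as "hip hop", "Hip-Hop" and "hiphop"
    into one canonical key by lowercasing and dropping non-alphanumerics."""
    return re.sub(r"[^a-z0-9]", "", tag.lower())

tags = ["hiphop", "hip hop", "Hip-Hop", "Rock", "rock"]
print(sorted({normalize_tag(t) for t in tags}))  # ['hiphop', 'rock']
```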

3. Artist Aggregation

To address the problem of inconsistent results among individual music recommendation Web APIs, we aggregate the artist query results from three music recommendation portals with Web APIs: Idiomag, Last.fm and Echo Nest. These three music recommendation Web APIs were selected because each is readily available for public use.

3.1. Music Recommendation Web APIs

1. Idiomag (http://www.idiomag.com/api/). Idiomag labels, or tags, a given artist with weighted genre names drawn from a preset list of 144 acceptable genre names maintained by the company's staff. A manual weight is then applied to each of the tags by Idiomag's expert musicians and music lovers. Lastly, artists are ordered according to their labels' values.

2. Last.fm (http://www.lastfm.com/api/). Last.fm is highly user-centric, allowing any user to create self-defined tags. In contrast to Idiomag, Last.fm supports user tagging, which has led to a number of issues, including duplicate tags due to grammatical errors and maliciously false tags applied to artists. Last.fm combats this challenge by counting multiple occurrences of a single tag for a single artist as votes for that artist's tag; a tag's votes are reflected in how the tag is weighted. To determine musical similarity for an artist, Last.fm compares the tags of all artists in its database to those of the target artist.

3. The Echo Nest (http://developer.echonest.com/). Due to intellectual property rights, Echo Nest has not unveiled its process of relating artists. Nevertheless, the company has revealed that artist information is generated in a number of ways, such as analysis of the raw audio, blogs, song lyrics and message board postings. While the exact method of rating artist similarity is unknown, test queries have shown that their artist database is multi-faceted, covering many musicians and genres.
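As an illustration of how a similar-artist list is obtained from one of these services, the sketch below queries Last.fm's artist.getSimilar method. The endpoint, parameters and response structure follow our reading of the public Last.fm API, but the API key is a placeholder and error handling is minimal, so treat this as an assumption to verify against the current documentation rather than a definitive client:

```python
import requests

LASTFM_ENDPOINT = "http://ws.audioscrobbler.com/2.0/"
API_KEY = "YOUR_LASTFM_API_KEY"  # placeholder: obtain a key from the Last.fm developer site

def lastfm_similar_artists(artist, limit=10):
    """Return up to `limit` artist names Last.fm reports as most similar."""
    params = {
        "method": "artist.getsimilar",
        "artist": artist,
        "api_key": API_KEY,
        "format": "json",
        "limit": limit,
    }
    response = requests.get(LASTFM_ENDPOINT, params=params, timeout=10)
    response.raise_for_status()
    payload = response.json()
    return [entry["name"] for entry in payload["similarartists"]["artist"]]

# Example: lastfm_similar_artists("Usher") -> ["Ne-Yo", "Mario", "Chris Brown", ...]
```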

3.2. Rank Fusion Methods

Rank fusion, or rank aggregation, has been applied in disciplines such as sensor networks and the Web. In these disciplines, however, a correct answer is presumed to exist, so the relevancy of returned results can be measured with a general consensus. Music, on the other hand, is highly personal, as music listeners may enjoy songs and/or artists across music genres. In addition, music genres may overlap in style: one genre may be a descendant of multiple genres, and artists may bridge more than one genre. We chose five existing rank fusion methods and discuss each below; a small illustrative sketch of all five follows the list.



• Average (Av). Av [2] computes the average rank of each artist across the three Web APIs and sorts these values to obtain the aggregate ranking. Average considers all rank information, which may not be desired in the presence of large amounts of incorrect information.


• Median (Me). Me [7] computes the median rank of each artist instead of the average rank. Median ignores all but one piece of rank information per artist, which may be problematic when the median ranks are highly similar.


• CombMNZ (MNZ). MNZ [10] orders artists using a combination of the frequency of appearances and the ranks. MNZ relies on multiple appearances of the same artist to provide supporting evidence of similarity.


• PageRank (Pg). Pg [3] is the most popular rank fusion algorithm. It is an approximation of the Markov chain aggregator MC4 [6], which computes the steady-state probability of each artist through either link navigation or random jumps.


• Condorcet-fuse (Cfuse). Cfuse [12] uses an unweighted directed graph in which an edge indicates that one artist is ranked higher than another by a given Web API. The magnitude of the rank difference between two artists is omitted, resulting in some information loss.
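To make the behavior of the five methods concrete, the following minimal sketches operate on the top-k lists returned by the three Web APIs. They follow our reading of the cited algorithms rather than the exact experimental settings: the penalty rank for unseen artists, the PageRank damping factor, the Copeland-style win count used here in place of the Condorcet-fuse sorting procedure of [12], and the random tie-breaking are all illustrative assumptions.

```python
from collections import defaultdict
from statistics import median
import random

def _rank_map(ranking):
    """Map artist -> 1-based rank for one Web API's top-k list."""
    return {artist: i + 1 for i, artist in enumerate(ranking)}

def _all_artists(rankings):
    return set().union(*(set(r) for r in rankings))

def average_fuse(rankings, missing_rank=11):
    """Av: sort artists by average rank; unseen artists get a penalty rank (assumption)."""
    maps = [_rank_map(r) for r in rankings]
    score = {a: sum(m.get(a, missing_rank) for m in maps) / len(maps)
             for a in _all_artists(rankings)}
    return sorted(score, key=score.get)

def median_fuse(rankings, missing_rank=11):
    """Me: sort artists by median rank, breaking ties randomly."""
    maps = [_rank_map(r) for r in rankings]
    score = {a: median(m.get(a, missing_rank) for m in maps)
             for a in _all_artists(rankings)}
    return sorted(score, key=lambda a: (score[a], random.random()))

def combmnz_fuse(rankings, k=10):
    """MNZ: (number of lists containing the artist) * (sum of rank-based scores)."""
    maps = [_rank_map(r) for r in rankings]
    score = {}
    for a in _all_artists(rankings):
        hits = [m[a] for m in maps if a in m]
        score[a] = len(hits) * sum(k + 1 - r for r in hits)  # higher is better
    return sorted(score, key=score.get, reverse=True)

def pagerank_fuse(rankings, damping=0.85, iters=50):
    """Pg: random walk on a preference graph; an edge a -> b means b outranked a in some list."""
    artists = list(_all_artists(rankings))
    out = defaultdict(list)
    for r in rankings:
        for i, low in enumerate(r):
            for high in r[:i]:
                out[low].append(high)
    pr = {a: 1.0 / len(artists) for a in artists}
    for _ in range(iters):
        nxt = {a: (1 - damping) / len(artists) for a in artists}
        for a in artists:
            targets = out[a] or artists            # dangling node: jump anywhere
            share = damping * pr[a] / len(targets)
            for b in targets:
                nxt[b] += share
        pr = nxt
    return sorted(pr, key=pr.get, reverse=True)

def condorcet_fuse(rankings):
    """Cfuse (approximated): order artists by how many pairwise 'duels' they win."""
    maps = [_rank_map(r) for r in rankings]
    artists = list(_all_artists(rankings))
    wins = defaultdict(int)
    for a in artists:
        for b in artists:
            if a == b:
                continue
            votes = sum(1 for m in maps
                        if m.get(a, float("inf")) < m.get(b, float("inf")))
            if votes > len(maps) / 2:
                wins[a] += 1
    return sorted(artists, key=lambda a: (-wins[a], random.random()))
```

For the Usher query in Table 1, calling, e.g., average_fuse([idiomag_top10, lastfm_top10, echonest_top10]) on the three top-10 lists yields an Av-style ordering, although the exact output depends on the assumed penalty rank and tie handling.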

4. Experimental Study

To test artist similarity amongst the three Web APIs, we ran 300 artist queries over 10 popular music genres. We manually selected the query artists so as to guarantee that each music recommendation application returns at least 10 similar artists. A sample of the queried artists, by genre, is presented in Table 2. For each artist query, we examine the performance of each rank fusion method by calculating precision. The precision prec takes as input two ranked lists m_i and m_j and computes the number of common elements relative to the number of returned elements k. In our experiments, each Web API returns a ranked list of 10 to 15 similar artists; we chose k = 10. Formally, precision is defined as

    prec(m_i, m_j) = |m_i ∩ m_j| / k

Genre        Sample Artists
Alternative  Ben Harper, Counting Crows, John Mayer, Phish, Radiohead
Blues        B.B. King, Buddy Guy, Muddy Waters, Robert Johnson, Stevie Ray Vaughan
Country      Garth Brooks, Johnny Cash, Toby Keith, Waylon Jennings, Willie Nelson
Electronic   Chemical Brothers, Daft Punk, Kraftwerk, Thievery Corporation, Zero 7
Funk         Curtis Mayfield, James Brown, Red Hot Chili Peppers, Stevie Wonder, The Meters
Jazz         Herbie Hancock, John Coltrane, John Scofield, Miles Davis, Wes Montgomery
R&B          Al Green, Jagged Edge, Marvin Gaye, Otis Redding, Usher
Rap          Cassidy, Dr. Dre, Eminem, Snoop Dogg, Young Jeezy
Reggae       Bob Marley, Matisyahu, Steel Pulse, Sublime, Toots & The Maytals
Rock         Audioslave, Incubus, Nine Inch Nails, Pearl Jam, Tool

Table 2. Sample Artists

We investigate how each rank fusion method performs as the ground truth, creating a symmetric matrix in which prec(m_i, m_j) = prec(m_j, m_i). If Pandora Radio (http://www.pandora.com/) made its similar-artist method available through a Web API, we would have used Pandora as the ground truth. SortMusic (http://www.sortmusic.com/) does provide a similar-artist tab; however, similar artists must be gathered manually and not all artists are catalogued on the website. We report the precision in an upper triangular matrix by genre and over all queries. For example, we display in Table 1 the output of the artist query for Usher in terms of the Web API rankings (Table 1a) and the rank fusion method results (Table 1b). Since Ne-Yo appears highly ranked in two of the rankings, each rank fusion method gave it a rank of 1. Each Web API provides a different ordering of artists, but in general, we notice a higher overlap in artists between Last.fm and Echo Nest.
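A minimal sketch of the precision measure and of the pairwise comparison matrices reported in Tables 3 and 4 (the function and variable names are ours):

```python
def precision(m_i, m_j, k=10):
    """prec(m_i, m_j): fraction of the top-k artists shared by two ranked lists."""
    return len(set(m_i[:k]) & set(m_j[:k])) / k

def precision_matrix(method_outputs, k=10):
    """Upper-triangular comparison of fusion outputs, keyed by method-name pairs."""
    names = list(method_outputs)
    return {(a, b): precision(method_outputs[a], method_outputs[b], k)
            for i, a in enumerate(names) for b in names[i:]}
```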


Rank  Idiomag               Last.fm           Echo Nest
1     Craig David           Ne-Yo             Mary J. Blige
2     Eamon                 Mario             Ne-Yo
3     Sisqo                 Chris Brown       Jagged Edge
4     Mario Winans          Bobby Valentino   Toni Braxton
5     R. Kelly              Trey Songz        Bobby Valentino
6     Baby Bash             Omarion           Trey Songz
7     Ciara                 Marques Houston   Sammie
8     The Black Eyed Peas   Lloyd             Frankie J
9     Missy Elliott         Joe               Mariah Carey
10    Puff Daddy            Jagged Edge       Chris Brown

(a) Web API Rankings

Rank  Av               MNZ              Pg               Me               Cfuse
1     Ne-Yo            Ne-Yo            Ne-Yo            Ne-Yo            Ne-Yo
2     Bobby Valentino  Bobby Valentino  Bobby Valentino  Bobby Valentino  Toni Braxton
3     Trey Songz       Trey Songz       Trey Songz       Trey Songz       Bobby Valentino
4     Craig David      Chris Brown      Jagged Edge      Chris Brown      Trey Songz
5     Mary J. Blige    Jagged Edge      Chris Brown      Jagged Edge      Sisqo
6     Eamon            Craig David      Craig David      Craig David      Sammie
7     Mario            Mary J. Blige    Mary J. Blige    Eamon            R. Kelly
8     Chris Brown      Mario            Eamon            Sisqo            Baby Bash
9     Jagged Edge      Eamon            Mario            Mario Winans     Ciara
10    Sisqo            Sisqo            Sisqo            R. Kelly         Jagged Edge

(b) Aggregate Results

Table 1. Artist Query: Usher

Overall Performance. For each row and column pair (r_i, c_j), we present the precision value; if i = j, then the precision is 100%. Table 3 displays the average precision across the 10 genres.

       Av      Me      MNZ     Cfuse   Pg
Av     100.0   62.53   84.80   60.89   83.33
Me     0.0     100.0   70.40   91.33   66.03
MNZ    0.0     0.0     100.0   65.43   86.10
Cfuse  0.0     0.0     0.0     100.0   62.89

Table 3. Average precision

The comparison of Av with the other rank fusion algorithms reveals a high precision of around 84% for (Av, MNZ) and (Av, Pg). We also observe that (MNZ, Pg) are highly similar in ranking artists, with a precision of 86.1%. Given the high correlation amongst Av, MNZ and Pg, these rank fusion methods tend to operate in a similar manner.

These rank fusion algorithms are strongly driven by the frequency of appearance. For Av, an artist ranked in the top 10 by all three APIs has a higher probability of obtaining an average rank in the top 10. MNZ directly uses the number of appearances together with a weight measurement in its ranking scheme. In Pg, artists appearing in the top 10 of each API have more incoming and outgoing links than artists appearing in only one or two of the APIs; these links make navigating the graph more effective, without cycles or the need to randomly jump to an unseen artist.

On the contrary, a lower precision of about 61% between the method pairs (Av, Me) and (Av, Cfuse) highlights the differences in the methods and increases the likelihood of finding (unconventional) partial matches such as new or cross-genre artists. Interestingly, (Me, Cfuse) are highly similar, with precision results of 91%. Both Me and Cfuse focus primarily on the strict ordering of one artist over another. Since Me only considers the middle rank, artists are ranked in a localized environment. In Cfuse, the dominance of one artist over another in the ranking is based on all three APIs. When ties occur for either Me or Cfuse, the sort scheme is random, which can contribute to the performance variation between (Av, MNZ, Pg) and (Me, Cfuse).

As observed in Table 1b, the contrast in artist ordering between (Av, MNZ, Pg) and (Me, Cfuse) is demonstrated by the inclusion of Toni Braxton, Mario Winans, R. Kelly, Baby Bash and Ciara in the Me and Cfuse rankings. The artist Mary J. Blige appears in the Av, MNZ and Pg rankings but does not appear in the top-10 results for either Me or Cfuse. The rank fusion methods produce very similar rankings only when there is high overlap amongst the Web APIs; when a large pool of similar artists is returned, each method's behavior is emphasized, producing different results.

Table 4 shows the average precision for two genres: Alternative and R&B. In these genre-specific results, we again observe the contrast between (Av, MNZ, Pg) and (Me, Cfuse).

(a) Alternative

       Av      Me      MNZ     Cfuse   Pg
Av     100.0   67.99   88.66   68.00   87.66
Me     0.0     100.0   73.99   84.33   71.99
MNZ    0.0     0.0     100.0   71.00   91.33
Cfuse  0.0     0.0     0.0     100.0   70.33

(b) R&B

       Av      Me      MNZ     Cfuse   Pg
Av     100.0   67.33   92.66   60.66   91.99
Me     0.0     100.0   76.33   95.66   73.33
MNZ    0.0     0.0     100.0   65.66   95.33
Cfuse  0.0     0.0     0.0     100.0   63.66

Table 4. Genre Performance

In Table 4a (Alternative), (Av, MNZ, Pg) have precision above 87% and (Me, Cfuse) obtain a precision of 84%, whereas the pairwise comparisons across the two method groups range from 67.99% (Av, Me) to 73.99% (Me, MNZ). In the case of the R&B genre (Table 4b), the precision among (Av, MNZ, Pg) is above 90%, and Me and Cfuse are also highly correlated, with a precision of 95%. Once again, we observe a larger discrepancy between the two method groups, with precision between 60% (Av, Cfuse) and 76% (Me, MNZ). The discovery of two ranking groups, (Av, MNZ, Pg) and (Me, Cfuse), can provide music listeners a choice of aggregate music recommendation. The need for strongly linked similarity amongst the top artist results can be met by applying Av, MNZ or Pg. To include partial matches and looser similarity, Me or Cfuse can be used for aggregation, as illustrated in the sketch below.
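This practical choice between the two groups can be summarized in a small dispatcher. The "strict"/"loose" labels and the default method chosen within each group are our own illustrative mapping of the result above, and the sketch reuses the fusion functions from the Section 3.2 sketch:

```python
def recommend_similar(rankings, mode="strict"):
    """Fuse the Web API rankings according to the listener's preference:
    'strict' -> strongly linked similarity (the Av / MNZ / Pg group),
    'loose'  -> partial matches such as new or cross-genre artists (the Me / Cfuse group)."""
    if mode == "strict":
        return combmnz_fuse(rankings)    # average_fuse or pagerank_fuse behave similarly
    return condorcet_fuse(rankings)      # median_fuse is the other loose option
```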

5. Conclusion

We address the problem of inconsistent similar-artist results through the aggregation of three online music portals (Idiomag, Last.fm and Echo Nest) that provide music recommendation APIs. Using five rank fusion methods, we test the relevancy of the results of 300 artist queries. We show that the Average, CombMNZ and PageRank rank fusion algorithms produce very similar precision results. Median and Condorcet-fuse are similar to each other but vary greatly from the other three ranking schemes. We observe that highly similar artists can be retrieved using Average, CombMNZ or PageRank, while a more relaxed similarity can be captured using either the Median or Condorcet-fuse rank fusion method.

In future work, we would like to address semi-duplicate results, in which music artists collaborate on more than one album and/or song. A music data repository of artists with collaborations will assist in reducing semi-duplicate noise. A music collaboration structure can also prove beneficial for assessing artist similarity quality, as collaborating with other like-minded artists is becoming more commonplace.

References

[1] K. Bischoff, C. Firan, W. Nejdl, and T. Paiu. Can all tags be used for search? In Proceedings of ACM CIKM, pages 203–212, 2008.

[2] J. C. Borda. Mémoire sur les élections au scrutin. In Histoire de l'Académie Royale des Sciences, 1781.
[3] S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. In Proceedings of ACM WWW, pages 107–117, 1998.
[4] H.-C. Chen and A. L. P. Chen. A music recommendation system based on music data grouping and user interests. In Proceedings of ACM CIKM, pages 231–238, 2001.
[5] S. Cunningham, D. Bainbridge, and D. McKay. Finding new music: a diary study of everyday encounters with novel songs. In International Conference on Music Information Retrieval (ISMIR), pages 83–88, 2007.
[6] C. Dwork, R. Kumar, M. Naor, and D. Sivakumar. Rank aggregation methods for the web. In Proceedings of ACM WWW, pages 613–622, 2001.
[7] R. Fagin, R. Kumar, and D. Sivakumar. Efficient similarity search and classification via rank aggregation. In Proceedings of ACM SIGMOD, pages 301–312, 2003.
[8] G. Geleijnse, M. Schedl, and P. Knees. The quest for ground truth in musical artist tagging in the social web era. In International Conference on Music Information Retrieval (ISMIR), pages 525–530, 2007.
[9] I. Im and A. Hars. Does a one-size recommendation system fit all? The effectiveness of collaborative filtering based recommendation systems across different domains and search modes. ACM Transactions on Information Systems, 26(1), 2007.
[10] J. H. Lee. Analyses of multiple evidence combination. In Proceedings of ACM SIGIR, pages 267–276, 1997.
[11] T. Magno and C. Sable. A comparison of signal-based music recommendation to genre labels, collaborative filtering, musicological analysis, human recommendation, and random baseline. In International Conference on Music Information Retrieval (ISMIR), pages 161–166, 2008.
[12] M. Montague and J. A. Aslam. Condorcet fusion for improved retrieval. In Proceedings of ACM CIKM, pages 538–548, 2002.
[13] M. Slaney and W. White. Measuring playlist diversity for recommendation systems. In Proceedings of the ACM Workshop on Audio and Music Computing Multimedia, pages 77–82, 2006.
[14] D. Turnbull, L. Barrington, and G. Lanckriet. Five approaches to collecting tags for music. In International Conference on Music Information Retrieval (ISMIR), pages 225–230, 2008.
