Experience Discovery: Hybrid Recommendation of Student Activities using Social Network Data

Robin Burke, Yong Zheng, Scott Riley
Center for Web Intelligence, School of Computing, DePaul University
Chicago, Illinois, USA
[email protected], [email protected], [email protected]

ABSTRACT

The aim of the Experience Discovery project is to recommend extracurricular activities to high school and middle school students in urban areas. In implementing this system, we have been able to make use of both usage data and data drawn from a social networking site. Using pilot data, we are able to show that very simple aggregation techniques applied to the social network can improve recommendation accuracy.

Categories and Subject Descriptors

H.2 [Database Management]: H.2.8 Database applications—Data mining; H.3 [Information Storage and Retrieval]: H.3.3 Information Search and Retrieval—Search process

General Terms

Experimentation, Performance

Keywords

Hybrid Recommenders, Social Networks

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. HetRec'11, October 27, 2011, Chicago, IL, USA. Copyright 2011 ACM 978-1-4503-1027-7/11/10 ...$10.00.

1. INTRODUCTION

Inner-city youth are a primary target for many non-profit organizations offering workshops, after-school courses, and other educational opportunities in urban settings. However, these organizations often find it difficult to market their offerings to eligible and interested students. The Experience Discovery (xDisc) project aims to provide a personalized service that connects students to such offerings. Educational experiences such as those presented by xDisc have properties that make them unusual and challenging for recommendation.

• Students participate in activities multiple times, and so recommendations do not exhibit a portfolio effect. For example, suppose Alice attends a book discussion at her local public library. The system might be justified in recommending that she attend the next book club meeting.

• Students attend activities for a variety of reasons, and interest in the subject matter is not necessarily the most important one. For example, the involvement of friends is often a key motivational factor. This can make profiles based on content characteristics of activities somewhat unreliable.

• The goal of the recommender system is to both broaden and deepen students' knowledge and interests. Different activities lend themselves to different levels of engagement. Researchers in new media literacy refer to the modes of "hanging out", "messing around", and "geeking out" [5]. It is often beneficial to encourage a student to engage more deeply with a particular topic in which they have shown an interest, but a diversity of options is also desirable.

• Finally, there is an element of persuasion inherent in activity recommendation. Teenagers often need some extra motivation to get them out of customary situations and into new and challenging ones.

Recommendation in the Experience Discovery context is therefore an inter-disciplinary problem requiring insights from sociology, educational theory, digital media, and human-computer interaction, as well as computer science. The many facets of the problem also suggest that it is appropriate for the use of hybrid recommendation models and the integration of different types of knowledge sources. This paper describes our implementation of a platform designed to support both experimentation with recommendation algorithms and online delivery of recommendations to students. We also show some experimental results based on preliminary data that support our hypotheses about the synergy between collaborative and social networking data.

2. ARCHITECTURE

Figure 1 gives an overview of the architecture of the system. The system draws from a number of data sources, including descriptions of activities and their attributes, attendance information for students, and data from the social networking site, including students' posts and media uploads, their links to friends, and browsing logs. To prevent the recommender from placing excessive load on these systems, we opted to cache our extracts of these databases. In its simplest form, the role of the recommendation engine is to compute a recommendation list, given a user id. As discussed above, there are a variety of requirements for recommendation, not all of which are fully compatible with each other. For example, content is often a good predictor of user interests, but not always, as in the case of activities attended by friends. So, one key requirement for the design of the system was that the platform be capable of producing recommendations from multiple algorithms. The recommendations can be presented in different tabs in the xDisc client application.

[Figure 1: Architecture Overview — Activity Data and Attendance Data feed an Input Cache; an Algorithm Library and Experimental Configurations feed the Recommendation Engine, which populates a Result Cache; the Operational Interface serves the Client Application, and the Experimental interface produces Experimental Results; a Social Network Interface supplies social data.]

The system has two primary interfaces: one for experimentation and one for the delivery of recommendations to users. The experimental interface allows researchers to define and select data sets, experimental protocols, and algorithms, and to run experiments and log their outcomes. The operational interface relies on a results cache to ensure scalability. The system periodically populates the results cache with recommendation lists for each user, and the operational interface merely delivers the pre-computed results. This design allows the system to respond very quickly to client requests and to scale easily, but at the cost that many recommendation lists will be computed and stored for users who will likely never see them: a standard space/time tradeoff.
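The precompute-and-deliver pattern described above can be sketched in a few lines. This is a minimal illustration, not the xDisc implementation; all function names and the toy recommender are invented for the example.

```python
# Sketch of the results-cache design: a batch job periodically precomputes
# top-k lists, and the operational interface only reads from the cache.

def populate_cache(users, recommender, cache, k=5):
    """Periodically run: precompute a top-k list for every known user."""
    for user in users:
        cache[user] = recommender(user)[:k]

def serve(user, cache):
    """Operational interface: deliver precomputed results, never recompute."""
    return cache.get(user, [])

# Toy recommender: suggest catalog activities the user has not yet attended.
attended = {"alice": {"book_club"}, "bob": {"book_club", "photo_lab"}}
catalog = ["book_club", "photo_lab", "robotics"]

def toy_recommender(user):
    seen = attended.get(user, set())
    return [a for a in catalog if a not in seen]

cache = {}
populate_cache(attended, toy_recommender, cache)
print(serve("alice", cache))  # -> ['photo_lab', 'robotics']
print(serve("carol", cache))  # unknown user: -> []
```

The space/time tradeoff is visible here: every user in `attended` gets a stored list regardless of whether they ever request one, but `serve` is a single dictionary lookup.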

3. ALGORITHMS AND METHODOLOGY

The Experience Discovery task lends itself to a variety of recommendation approaches. For example, the features associated with each activity can be used to drive content-based recommendation. Student attendance information can be used to create profiles for user- and item-based collaborative recommendation. In the future, we anticipate collecting and using rating information from students to supplement the attendance data in these profiles. For our initial experiments with this data, we opted to compare the simple, well-known technique of user-based collaborative recommendation [3] with hybrids incorporating additional content and social-network-derived features.

3.1 Data Sources

Our educational partners are in the midst of a transition from an earlier system version, and we were only able to obtain a small data sample on which to perform our initial experiments. In addition, only a subset of the users could be matched between the attendance records and the social network. The next release of the system will include more detailed content features for activities, closer integration between the attendance and social network systems, and a much larger dataset in terms of both activities and students. The activity and attendance information is hosted by CitySpan YouthServices.net¹, a platform for tracking attendance and participation in youth programs. Our data extract was limited to students who were also participants in programs run by YouMedia², a new media organization for youth, which includes an online social network.

¹http://cityspan.com/solutions/ys.asp
²http://www.youmedia.org/

Our database included 1,916 participants and 55 activities. However, data analysis showed that there were a large number of students who attended only a single activity, all at the same time, and discussion with our partners revealed that these were school visits or other compulsory activities. Since attendance at these events was not evidence of student interest, we eliminated them from the data. In the interest of making use of the social media data, we also eliminated students whom we could not match (using name, sex, and date of birth) among the YouMedia users. This filtering step left us with 226 students, 32 activities, and 3,806 attendance records. For cross-validation purposes, the dataset was divided into five folds by partitioning the entries for each user, so that there were approximately 3,000 records in each training set. This is a very small data set, but it was sufficient for testing purposes. Complete integration with the new version of the online systems is expected in the next few months and will yield a much larger (and growing) data set that is also more diverse in the types of activities represented. The newer system also links identifiers between the two systems, removing the need for heuristic matching of accounts by name.
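The per-user fold partition described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the project's actual code; in particular, the shuffling and fold-assignment details are invented.

```python
# Sketch of a five-fold split that partitions each user's attendance records
# across the folds, so every user contributes records to every training set.
import random

def per_user_folds(records_by_user, n_folds=5, seed=0):
    """records_by_user: {user: [attendance records]} -> list of n_folds folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for user, records in records_by_user.items():
        shuffled = records[:]
        rng.shuffle(shuffled)           # randomize before dealing out
        for i, rec in enumerate(shuffled):
            folds[i % n_folds].append((user, rec))  # round-robin assignment
    return folds

# Toy data: two users with 10 and 7 attendance records respectively.
folds = per_user_folds({"alice": list(range(10)), "bob": list(range(7))})
print([len(f) for f in folds])  # 17 records dealt across the five folds
```

Because the split is per user rather than per record, no user is entirely absent from any training fold, which matters for user-based collaborative filtering on a dataset this small.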

3.2 Algorithms

The simplest way to handle our attendance data is as binary values, reflecting whether a student has attended an activity or not. To find peer users for collaborative recommendation, we can treat these binary vectors as sets and use the Jaccard coefficient, which is simply the ratio between the size of the intersection and the size of the union of two sets. However, treating attendance as binary gives all activities the same weight, and as we have noted, students often attend an activity over and over again. To capture this aspect of the attendance data, we construct a pseudo-rating for each activity, computed using a form of tf-idf weighting. Here tf is how often a user attended an activity, and df is how many users in total attended that activity. Dividing the two gives a weight for each attended activity. The vector of these weights is then normalized for each user so that it sums to 1. This gives a vector of real-valued pseudo-ratings that roughly reflects users' repeat attendance behavior.

On top of this behavior-based data, we sought to add two additional data sources: content features related to each activity, and user characteristics based on behavior in the social network. Each activity is described by relevant descriptors (academic, career, health, music, etc.) intended to represent the general type of skills being practiced or developed in the activity. There are 13 descriptors, and an activity may be labeled with more than one. The vector of descriptors is therefore a way to represent "content" so that activities may be compared.

To make use of this data, we built a content-collaborative meta-level hybrid [1]. We built user profiles by counting the descriptors associated with the activities for each user. A user profile therefore consists of a vector of size 13 (one entry for each content descriptor) holding these activity descriptor counts. This is the content-based part of the hybrid. We then compare users collaboratively using these descriptor vectors to form the peer neighborhoods.

Our final algorithm was built by layering on top of this hybrid an additional data source drawn from the social network. There has been very little study of using social network information to guide recommendations for items outside of the social network itself, and we believe that the YouMedia data provides many excellent opportunities to explore possible approaches. For our initial attempt, we limited ourselves to data about students' behavior, especially uploading, creating, and sharing media on the site. In all, we computed 10 features over this activity data. Six features represented user contributions to the site: blog postings, photos, music, etc.; four additional features were added to capture a measure of the user's general level of activity, including the number of outbound friend links and the number of comments placed on others' pages. Our experiments indicated that using all 10 of these dimensions was superior to the contribution data alone. To build these features into the hybrid, we used a simple feature combination approach. The 10 user behavior features were added to the existing activity-descriptor-based profiles. So, in this three-way hybrid, a user profile is built as shown in Figure 2.
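The Jaccard similarity and pseudo-rating computations described above can be sketched directly from their definitions. This is a minimal illustration; the toy activity names and counts are invented.

```python
# Jaccard coefficient over binary attendance profiles, and the tf/df
# pseudo-rating: attendance count divided by total attendees, then
# normalized per user so the weights sum to 1.

def jaccard(a, b):
    """Similarity of two attendance sets: |intersection| / |union|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def pseudo_ratings(counts, attendees_per_activity):
    """counts: {activity: times this user attended it};
    attendees_per_activity: {activity: number of users who attended it}."""
    weights = {act: n / attendees_per_activity[act] for act, n in counts.items()}
    total = sum(weights.values())
    return {act: w / total for act, w in weights.items()}

# Toy example: one user who attended book_club twice and photo_lab once.
attendees = {"book_club": 4, "photo_lab": 2}
ratings = pseudo_ratings({"book_club": 2, "photo_lab": 1}, attendees)
# book_club weight 2/4 = 0.5, photo_lab 1/2 = 0.5 -> both normalize to 0.5
print(jaccard({"book_club"}, {"book_club", "photo_lab"}))  # -> 0.5
```

Note how the pseudo-rating rewards repeat attendance relative to an activity's overall popularity, which the binary representation cannot express.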

[Figure 2: Design of Hybrid User Profile — Activity Data Descriptors produce a descriptor-based profile, Social Network Behavior data produces a behavior-based profile, and the two are combined into a single user profile.]

So, for the results below, we are comparing four algorithms, all variants on user-based collaborative filtering:

Binary: User profiles represented as binary attendance vectors.

Rating: User profiles represented as pseudo-ratings – normalized tf-idf weights over activities.

Hybrid1: Content-collaborative meta-level hybrid with users represented as vectors of activity descriptors.

Hybrid2: Three-way content-collaborative hybrid combined with social network behavior data.
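The Hybrid2 profile construction (Figure 2) amounts to concatenating descriptor counts with behavior features. The sketch below shrinks the paper's 13 descriptors and 10 behavior features to 4 and 3 for readability; all names and values are illustrative.

```python
# Feature-combination hybrid profile: counts of activity descriptors
# concatenated with social-network behavior features.

DESCRIPTORS = ["academic", "career", "health", "music"]  # the paper uses 13

def descriptor_profile(attended_activities, activity_descriptors):
    """Count how often each descriptor occurs among the user's activities."""
    counts = dict.fromkeys(DESCRIPTORS, 0)
    for act in attended_activities:
        for d in activity_descriptors[act]:
            counts[d] += 1
    return [counts[d] for d in DESCRIPTORS]

def hybrid_profile(attended, activity_descriptors, behavior_features):
    """Hybrid2 profile: descriptor counts plus behavior features."""
    return descriptor_profile(attended, activity_descriptors) + behavior_features

# Toy data: two activities, each labeled with one or two descriptors.
acts = {"book_club": ["academic"], "band": ["music", "career"]}
behavior = [3, 0, 1]  # e.g. blog posts, photos, friend links (10 in the paper)
print(hybrid_profile(["book_club", "band"], acts, behavior))
# -> [1, 1, 0, 1, 3, 0, 1]
```

Peer neighborhoods are then formed by comparing these concatenated vectors between users, exactly as with the descriptor-only profiles of Hybrid1.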

4. EXPERIMENTAL RESULTS

Some basic experimental results are shown in Table 1. As described above, the profiles were split into five folds and cross-validation performed to compute MAE and precision in the top 5. A number of different variants (with different thresholds, neighborhood sizes, etc.) were tried, and those with the best MAE retained. Several points can be noted about these results. The overall precision is fairly low, which can be attributed to the limited quantity of data available and the limited profile size – although the average profile size is 17, many users had much smaller profiles (fewer than 10 attendances). The simple binary algorithm had the best performance in terms of precision. However, we see that this comes at a cost in terms of coverage.³ The sparsity of the data means that many users do not have neighbors. Converting to pseudo-ratings improved MAE at a high level of coverage, but precision was poorer. By representing the similarity between activities, the two hybrids offer improved coverage and improved precision, although MAE is not as strong. It should be noted, however, that with MAE we are comparing against the computed pseudo-ratings, which makes the interpretation of error results difficult. The most common recommendation scenario for our application will be the provision of short recommendation lists for users, so precision among the top items is our preferred metric [4].

Algorithm   MAE        Coverage   P@5
Binary      0.109710   57.9%      15.7%
Rating      0.093066   72.0%      11.3%
Hybrid1     0.128494   97.1%      10.7%
Hybrid2     0.125255   100%       11.6%

Table 1: Performance measured by MAE, Coverage, and P@5

We also performed a dynamic evaluation of recommendation performance [2]. This technique is intended to capture the evolution of system performance over time as user profiles are built. Let an attendance record be a three-tuple <u, a, t>, where u is a user, a is an activity, and t is the time. All attendance records are sorted in temporal order, based on time stamp. Each record is evaluated as follows. To test the system's ability to predict <u, a, t>, we create a training set of all prior attendances at times < t. We then generate a recommendation list for user u and look at the rank of a in that list. If a appears in the top 5 items, it is considered a hit. This temporal leave-one-out approach gives a view of the evolution of a recommender system over time as it is (or would be) experienced by users. Following [2], we can get a more personal view of system evolution by looking at the results of the evaluation as a function of the length of a user profile. The question here is how much improvement users see as they spend more time with the system and their profiles grow longer. To compute this profile-based result, we take note of the length of user u's profile at the time the prediction is made, and average over all users with profiles of the same length. When all the users were considered together, we found no significant differences between the algorithms, so we performed an analysis of user sub-groups based on profile size and profile diversity. We looked at three different levels of activity and two categories of diversity, giving the six groups shown in Table 2. Of these groups, U1 and U2 had poor performance due to the small number of activities in their profiles, and so are excluded here. The remaining groups are quite small, so our results cannot be considered anything more than suggestive at this stage. Figure 3 shows the results of this experiment for the other subgroups.
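The temporal leave-one-out protocol described above can be sketched as follows. The recommender used here is a toy stand-in (repeat the user's most frequent past activities), not one of the paper's algorithms; the records are invented.

```python
# Sketch of dynamic evaluation: each record <u, a, t> is predicted from all
# records strictly earlier than t, and counts as a hit if a is in the top 5.
from collections import Counter

def dynamic_hit_ratio(records, recommend, k=5):
    """records: list of (user, activity, time) tuples; returns hits / evaluated."""
    records = sorted(records, key=lambda r: r[2])  # temporal order
    hits = evaluated = 0
    for (u, a, t) in records:
        training = [r for r in records if r[2] < t]  # strictly prior attendances
        top_k = recommend(u, training)[:k]
        evaluated += 1
        hits += a in top_k
    return hits / evaluated if evaluated else 0.0

def repeat_recommender(user, training):
    """Toy algorithm: the user's past activities, most frequent first."""
    counts = Counter(a for (u, a, _) in training if u == user)
    return [act for act, _ in counts.most_common()]

records = [("alice", "book_club", 1), ("alice", "book_club", 2),
           ("alice", "photo_lab", 3), ("alice", "book_club", 4)]
print(dynamic_hit_ratio(records, repeat_recommender))  # 2 hits of 4 -> 0.5
```

The profile-based variant in the text simply buckets each evaluated record by the size of u's training profile at time t before averaging, rather than pooling all records.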
As we can see, there are few performance differences between algorithms for groups U3 and U4. All algorithms performed very poorly on the high diversity group (U4). Hit ratio shows strong results for the group with low diversity, most likely reflecting recommendations for repeated instances of a popular activity. For the larger profiles U5 and U6, the results are more mixed and

³Coverage differs from that of the Rating algorithm because the minimal similarity threshold and neighborhood size were tuned to minimize MAE for each algorithm.

[Figure 3: Dynamic evaluation of subgroups U3–U6 — four panels plotting HIT Ratio (Top 5) against Profile Size, one panel per subgroup, with one line per algorithm (Binary, Rating, Hybrid1, Hybrid2). Note that the U6 panel uses a smaller y-axis scale than the others.]

Group   Profile Size   Profile Diversity   No. of Users
U1      Small          Low                 126
U2      Small          High                10
U3      Medium         Low                 35
U4      Medium         High                12
U5      Large          Low                 12
U6      Large          High                31

Table 2: User Classifications

we begin to see differentiation between the algorithms. The low diversity users (U5) are something of a mixed bag, with no algorithm a clear winner but the pseudo-rating-based algorithm clearly inferior. Interestingly, the high diversity users (U6) are best served by the three-way hybrid, while the two-way hybrid shows the worst performance. This indicates that behavior in the social network may be a good indicator of similarity between users when profiles are hard to compare due to a diversity of activity choices. (Note, however, the difference in the y-axis scale between this and the other figures.)

5. CONCLUSIONS

Our work with this system is just beginning. However, several conclusions can be drawn from the preliminary work shown here. One is that the domain of recommending youth service activities offers many opportunities to explore issues of heterogeneity in recommender systems. We have a wide variety of data types to draw from, both within the attendance data itself and in the auxiliary social networking system. In extracting user upload statistics and overall activity indicators, we have really only scratched the surface of the opportunities offered by the integration of social network data into activity recommendation. But even our preliminary deployment of these diverse data sources suggests that there is value in their fusion. In future work, we will be making use of a more complete dataset with a larger student body and a larger social network. We will begin providing live recommendations to Experience Discovery users in the Fall of 2011, and at that point we will have the opportunity to perform A/B testing and to bring user rating data, tags, and other data sources into our recommendation algorithms.

6. ACKNOWLEDGMENTS

The research reported in this paper was supported in part by the Bill and Melinda Gates Foundation and the MacArthur Foundation. Thanks to Nichole Pinkard for her support, and to Akili Lee and the Experience Discovery team for technical assistance.

7. REFERENCES

[1] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002.

[2] R. Burke. Evaluating the dynamic properties of recommendation algorithms. In Proceedings of the 4th ACM International Conference on Recommender Systems, pages 225–228, Barcelona, Spain, 2010.

[3] J. Herlocker, J. Konstan, A. Borchers, and J. Riedl. An algorithmic framework for performing collaborative filtering. In 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Berkeley, California, 1999. ACM.

[4] J. Herlocker, J. Konstan, L. G. Terveen, and J. Riedl. Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems, 22(1):5–53, 2004.

[5] M. Itō. Hanging Out, Messing Around, and Geeking Out: Kids Living and Learning with New Media. MacArthur Foundation Series on Digital Media and Learning. MIT Press, 2010.