
CatStream: Categorising Tweets for User Profiling and Stream Filtering

Sandra Garcia Esparza, Michael P. O’Mahony, Barry Smyth
CLARITY: Centre for Sensor Web Technologies, School of Computer Science and Informatics, University College Dublin (UCD), Belfield, Dublin 4, Ireland.
{sandra.garcia-esparza, michael.omahony, barry.smyth}@ucd.ie

ABSTRACT

Real-time information streams such as Twitter have become a common way for users to discover new information. For most users this means curating a set of other users to follow. However, at the moment the following granularity of Twitter is restricted to the level of individual users. Our research has highlighted that many following relationships are motivated by a subset of interests that are shared by the users in question. For example, user A might follow user B because of their technology-related tweets, but share little or no interest in their other tweets. As a result, this all-or-nothing following relationship can quickly overwhelm users’ timelines with extraneous information. To improve this situation we propose a user profiling approach based on the topical categorisation of users’ posted URLs. These topics can then be used to filter information streams so that users see more relevant information from the people they follow, based on their core interests. In particular, we present a system called CatStream that provides a more fine-grained way to follow users on specific topics and to filter timelines accordingly. We present the results of a live-user study which shows that filtered timelines offer users a better way to organise and filter their information streams. Most importantly, users are generally satisfied with the categories predicted for their profiles and tweets.

Author Keywords

Real-time Web, User Profiling, Information Filtering, Classification

ACM Classification Keywords

H.3.3 Information Search and Retrieval: Information filtering; H.5.2 User Interfaces: Evaluation/methodology

INTRODUCTION

Twitter has had an incredible impact on the world’s information flow and communication landscape. Today Twitter’s 140 million users generate about 340 million tweets a day (http://blog.twitter.com/2012/03/twitter-turns-six.html), 25% of which contain URLs (http://techcrunch.com/2010/09/14/twitter-seeing-90-million-tweets-per-day/). However, notwithstanding its success to date, we believe that there are a number of key challenges that quickly become apparent to new users, and that ultimately may limit the long-term efficacy of Twitter as an information sharing and discovery platform.

Twitter allows users to consume a stream of information generated by other users they have opted to follow. Such streams consist of short text messages (tweets) which can contain raw text, tags (hashtags) and URLs. Following a user in Twitter is an all-or-nothing affair, in the sense that all of their friends’ tweets are shown in the user’s information stream (i.e. their timeline). If the user has been judicious about the people and topics that they follow, then the hope is that this stream of information will contain relevant content. This hope is often far from the reality, however, because as Twitter users accumulate more and more friends, their streams quickly become polluted by a variety of content, not all of which is relevant to their core interests. We note that this all-or-nothing following relationship is not unique to Twitter; it is also present in many other information streaming services such as Tumblr or Pinterest.

To highlight this problem, we performed a live user study in which we asked 45 participants to specify the categories on which they followed their friends (we selected 10 friends at random) out of a set of 18 broad categories. After analysing their friends’ tweets for those categories we found that, on average, only 53% of their friends’ tweets (with URLs) were actually about the categories they were interested in. In other words, when users follow other users on Twitter, nearly half of their tweets are on topics that are not of interest to them. This result shows a clear need for a system that facilitates following users only in relation to certain topics or categories.

This is the challenge we wish to address, and it contains two key elements. First, we need a way to automatically profile the topics of interest for Twitter users so that users can follow other users based on a subset of these interests. Second, we need a way to classify new tweets and filter those that are off topic. In this way, for example, we can recognize that user @Dogbert’s interests include technology, research and music, and when user @Alice considers following @Dogbert she can do so with respect to technology only. Thus @Alice’s timeline integrity will be preserved by filtering out @Dogbert’s non-technology related tweets.

In this paper we describe a system called CatStream that incorporates these profiling and filtering capabilities in order to provide a more topic-centric Twitter client that can deliver more relevant streams of information to end-users. CatStream’s core component is a multi-category URL classifier that enables the system to categorise users and their tweets based on the URLs contained in them. Focusing on URLs is important because it provides a richer source of content for the purpose of topic classification. In order to evaluate CatStream we have performed an online study with 45 participants. The results of this evaluation show that most users would prefer a system that allowed them to categorise and filter their streams over a standard Twitter client. Further, users are generally satisfied with the categories predicted for their profiles and tweets.

This paper is organised as follows. In the next section we describe related work in the area of user profiling and information stream filtering. A description of the system, including the system architecture and the user interface, is then presented. The process of selecting the URL classifier is explained and, finally, a live user study to evaluate CatStream is described and its results are presented.

RELATED WORK

Existing research has investigated different ways of profiling Twitter users. Some of this work has focused on using the content of tweets to represent the interests of users [5, 15, 23], while other researchers have also looked at hashtag information [1, 7]. However, analyzing tweets poses significant challenges due to their short length, their noisy nature (i.e. spelling and grammatical errors, use of abbreviations, etc.) and the heterogeneous nature of their content. To deal with these challenges, researchers have also used external sources of information to enrich and disambiguate tweet content. For instance, richer user profiles can be built by extracting entity information (i.e. persons, events or products) using tools like OpenCalais [2, 17]. Further, in [14] the authors use Wikipedia to disambiguate tweets and extract categories of interest, while in [3] search engines are used as an external knowledge source to expand the content of tweets.

In this work we analyze the content of URLs, which are present in about 25% of tweets. While many tweets might not reveal much about a user’s topical interests due to a lack of useful information (e.g. a large percentage of tweets consist of daily chatter), posted URLs provide a much richer source of information in terms of identifying the preferences of users for particular kinds of content (i.e. topics). In particular, our solution to user profiling consists of extracting categories of interest (i.e. high-level topics such as movies, sports, politics, etc.) by using a multi-label text classifier to categorise URLs.

One of the typical approaches to text categorisation uses machine learning techniques [10, 18, 24]. For example, naïve Bayes methods are commonly used in web page classification due to their high accuracy and simplicity [13, 22]. Support Vector Machines (SVMs) have also been used to classify web pages, and some work reports higher accuracy than naïve Bayes classifiers [8, 20].

Currently, some information stream services also perform user categorisation, where a user is assigned a single category on which they are considered to be influential. This is the case with Twitter and Pinterest, where such categorisation is used to suggest people to follow. This approach works well for famous personalities (e.g. @BarackObama mostly tweets about politics) and other specialised users (@BBCSport just tweets about sports). However, for most users this single-category assignment is rather limited, since users tend to have more diverse interests and post about many different topics. Thus, our approach offers a richer representation of users’ interests, which allows users to follow other users on a subset of the topics they post about.

Filtering real-time streams has also been explored recently due to the increasing amount of information in these services. One of the solutions proposed for filtering information streams is recommender systems. For instance, the authors in [7] build user profiles using hashtags contained in the users’ tweets and then use those profiles to recommend tweets. Similarly, user profiles have also been used to recommend news URLs to users [2, 5]. Recommending credible information in Twitter has also been investigated by [11], where a credibility score can be computed for a tweet about a particular topic using topic-specific credibility models. We treat the filtering of information streams as a categorisation problem rather than as a recommendation problem. In this sense, categorisation techniques can be used to determine the category of each URL contained in a tweet and then, using category-based profiles, we can select the tweets that are interesting to the user. Categorisation for filtering tweets has also been applied in [19] to classify Twitter messages into five categories (news, events, opinions, deals and private messages). Here, we also wish to filter micro-blogs but we apply filtering based on the category of the message rather than on the particular communication intention of the author.

This work focuses on the evaluation of CatStream, which incorporates the user profiling and filtering capabilities mentioned above. Our solution to both profiling and filtering is based on a multi-label URL classifier. For this reason, in order to evaluate the quality of the user profiles and of the filtering of tweets, we must first evaluate the performance of the categoriser on its own, which is described in the section Choosing the Classifier. Further, we also performed a live user study to evaluate CatStream, which is described in the section Online User Study.

SYSTEM OVERVIEW

CatStream is a system which categorises and filters information streams (currently on Twitter) based on category-based user profiles. The system is currently live at http://catstream.org.

Figure 1. CatStream’s System Architecture.

Its main goal is to help users better organise their Twitter streams in order to facilitate a more effective discovery platform. Currently CatStream focuses on tweets which contain URLs, since we believe they tend to contain more relevant content, and ignores those which are considered daily chatter. In fact, during our user study we asked users why they used Twitter, and the most popular answer was to find news (82% of users). This supports the view that Twitter is more of an information discovery network than the social network some consider it to be.

System Architecture

The architecture of CatStream (Figure 1) consists of three main components:

1. URL Classifier. The core of CatStream is a URL classifier which automatically assigns one or more categories to a tweet based on its URL content. This classification is performed in real-time using a naïve Bayes multinomial classifier [12]. Currently, our classifier works with a set of 18 categories which correspond to topics such as music, sports or health.

2. User Profiler. The profiling component is responsible for constructing a category-based profile of a Twitter user. It does this by categorising the user’s posted tweets (and retweets and replies) using the URL classifier, producing a weighted set of profile categories that reflect the topics that the user tends to post about. The purpose of the profile is two-fold. On the one hand, a target user’s profile is used as the basis for filtering the user’s timeline, prioritising those tweets that conform to the user’s own interests. On the other hand, it is also used to provide the ability to follow users on certain topics; so, for example, a target user Ut might follow user Ui on topic Tj by selecting this topic from Ui’s profile during the following process.

3. Stream Filter. The stream filter is responsible for filtering a target user’s timeline. As mentioned above, CatStream uniquely provides Twitter users with the ability to partially follow other users, by indicating topics of interest during the following process. This means that instead of presenting the target user with a complete timeline of all of the tweets from the people they follow, CatStream filters these tweets based on the user’s chosen topics as the basis for following.

Classifying Tweets

CatStream’s profiling and filtering approach is fundamentally based on the ability to classify tweets containing URLs; that is, the ability to associate tweets with one or more topical categories based on an analysis of URL content. This classifier component, as shown in Figure 1, is implemented using a conventional machine learning approach, by training for a set of M categories using a set of labeled tweets to produce a classifier that is able to associate a new tweet Tw with a set of one or more category labels Ci, as seen in Equation 1.

Classify(Tw) = {c1, ..., cm}    (1)

In this work, categories correspond to broad topics such as music, sports or health. For the purpose of obtaining labeled training data we source URLs from Twitter (using the Streaming API) that have been annotated with corresponding category hashtags (e.g. #music, #sports, #health). Thus, each training instance is a tweet containing a URL and an associated class label, and the text of the page pointed to by the URL provides the basic training data. In our work we have experimented with single-label and multi-label classifiers [18]. The former assigns a single label to a tweet, while the latter has the option of assigning multiple labels. We will discuss the utility of both approaches in a later section.

From Tweets to Profiles

Given a classifier that can associate a tweet with one or more categories, we can now adopt a straightforward approach to profile a given user. In CatStream, each user Ui is associated with a set of URLs (see Equation 2), which are the URLs that they have shared on Twitter (given the constraints of the Twitter API, for the purpose of our work we select URLs contained in the user’s 200 most recent tweets).

Ui = {URL1, ..., URLn}    (2)

Each of these URLs is then associated with one or more categories (using the classifier) and a count is maintained of the number of times that a particular category has been assigned to URLs for the target user. In this way, each user is associated with a weighted set of categories, with a higher weight indicating that the user has posted more tweets on the corresponding topic (see Equation 3).

Profile(Ui) = {(c1, w1), ..., (ck, wk)}    (3)

For multi-label classifiers, where each tweet can be associated with a ranked list of categories, we use a straightforward positional weighting scheme so that primary categories count for more than secondary or tertiary categories. For example, if a tweet is associated with a ranked list of categories C = {c1, ..., cm}, then the weight wr assigned to a category at rank r is calculated as per Equation 4.

wr = 1 − (r − 1) / |C|, where r = 1, ..., |C|    (4)
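To make the profiling step concrete, the following Python sketch shows one way the weighted profile of Equation 3 could be accumulated using the positional weighting of Equation 4. The classify_url function is a stand-in for the multi-label classifier described above, and all names here are illustrative rather than taken from CatStream’s implementation.

```python
from collections import defaultdict

def rank_weight(rank, num_labels):
    """Positional weight from Equation 4: w_r = 1 - (r - 1) / |C|."""
    return 1.0 - (rank - 1) / num_labels

def build_profile(urls, classify_url):
    """Accumulate a weighted category profile (Equation 3) from a user's URLs.

    `classify_url` stands in for the multi-label classifier: it is assumed to
    return a ranked list of category labels for a single URL.
    """
    profile = defaultdict(float)
    for url in urls:
        ranked = classify_url(url)
        for rank, category in enumerate(ranked, start=1):
            profile[category] += rank_weight(rank, len(ranked))
    return dict(profile)

# Illustration with a dummy classifier: each URL's primary category receives
# weight 1.0 and its secondary category 0.5.
dummy = lambda url: ["technology & science", "music"]
print(build_profile(["http://example.com/a", "http://example.com/b"], dummy))
```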

From Profiles to Timelines

As mentioned above, a key purpose of the user profile is to provide other users with a perspective on a particular user with respect to their main topics of interest. This means that the all-or-nothing model of Twitter can be improved to allow for following relationships that are based on a combination of users and topics. Moreover, as a target user (Ut) follows other users based on subsets of their topics, CatStream can then filter Ut’s timeline by eliminating posts from these users that are off-topic.

This can be achieved in a straightforward manner. Each (target user, followed user Uf) combination is associated with a set of topical categories based on the topics chosen by Ut when they chose to follow Uf; see Equation 5. To compute Ut’s timeline (that is, the tweets they should see), CatStream requests the current tweets for Ut from Twitter; these are all of the tweets from the users that Ut follows, since Twitter knows nothing of CatStream’s categories or profiles.

FollowTopics(Ut, Uf) = {c1, ..., cs}    (5)

Each of these timeline tweets is classified in real-time by CatStream and filtered out unless its assigned categories have some minimal overlap with the categories chosen by Ut for the originator of the tweet. In other words, CatStream will only keep (that is, display) a tweet Tw from user Uf for Ut if and only if there is some overlap between the tweet’s categories and FollowTopics(Ut, Uf); see Equation 6.

Keep(Ut, Uf, Tw) = {FollowTopics(Ut, Uf) ∩ Classify(Tw)} ≠ ∅    (6)
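The filtering rule of Equation 6 amounts to a simple set-intersection test. The sketch below is a minimal illustration of that rule, assuming a dict mapping each followed user to the topics chosen for them and a classify_tweet function returning a tweet’s predicted categories; these data structures are our own simplification of the system described here.

```python
def keep_tweet(follow_topics, author, tweet_categories):
    """Equation 6: keep a tweet iff its predicted categories overlap with the
    topics the target user chose when following the tweet's author."""
    chosen = follow_topics.get(author, set())
    return bool(chosen & set(tweet_categories))

def filter_timeline(timeline, follow_topics, classify_tweet):
    """Drop off-topic tweets from a timeline. `timeline` is assumed to be a
    list of dicts with an 'author' key; `classify_tweet` returns a tweet's
    predicted categories."""
    return [t for t in timeline
            if keep_tweet(follow_topics, t["author"], classify_tweet(t))]
```

The plain set intersection implements the simple overlap test used here; the weighted variants discussed next would replace this check.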

Obviously this approach to filtering has been chosen for its simplicity. The choice of overlap threshold will be further explored as part of our future work. For example, an obvious extension of the above is to utilise category weights more directly during filtering by, for example, calculating a cosine similarity between the FollowTopics and tweet categories so as to better reflect the relative priority of overlapping categories.

The User Interface

The CatStream system has been designed to emulate a conventional Twitter client and provides all the functionality one might expect in terms of allowing users to tweet, view/manage their profiles, view and follow other users, access timelines, tag tweets, etc. Obviously, CatStream also introduces additional features based around its topic-based profiling and filtering components. We briefly introduce these novel interface elements in what follows.

To begin with, Figure 2 shows a target user’s profile. In this case user @Alice is looking at her own profile page, which contains a range of categories that have been chosen based on an analysis of her tweets. By default these categories are active, but a user can choose to deactivate a category if they wish. For example, we can see that @Alice has active categories such as art & photography and travel & leisure but has deactivated categories such as health and music. These may be temporary deactivations (for example, @Alice may only want to read about sports the day after her football team has played) or they may be more long-term, perhaps indicating that @Alice is generally not interested in that particular category.

Figure 2. A typical CatStream profile.

Currently in CatStream, a user’s active categories are used to override the filtering functionality, in the sense that only tweets that match on active categories are considered by the filtering component. Thus, if @Alice follows another user @Dogbert on the topic of music then normally she will see all of @Dogbert’s music tweets. However, if @Alice subsequently deactivates music in her profile then she will no longer see @Dogbert’s music tweets.

Moving on, in Figure 3 we see @Alice’s view of @Dogbert’s profile page. In this case @Alice is already following @Dogbert and has chosen to do so based on topics such as books & literature and music. @Alice can choose to modify this relationship by deleting existing topics or adding new ones as appropriate. Again, the purpose of this is to support a more fine-grained following relationship between @Alice and @Dogbert, thereby ensuring that @Alice’s timeline stays relevant with respect to the types of topics that matter to her and comes from the people she trusts with respect to these topics. Currently, the main filtering is performed using the topics from the user’s profile page. Thus, by default, a user follows their friends on the set of topics that are active on their own profile page, so that they are not obliged to select topics on an individual friend-by-friend basis. This filtering behaviour also means that when a user activates a new topic of interest from their profile page for the first time, they will see the tweets on this topic from all of their friends.

Finally, in Figure 4 we see CatStream’s filtered timeline for user @Alice. Filtering can be switched on or off by the user, as shown. In this case, filtering is on and so @Alice sees the tweets in her timeline that are filtered based on her active categories and the categories by which she has chosen to follow users. Each tweet is shown in the normal way but augmented with real-time category information. A number of feedback options are provided so that users can, for example, delete or add categories; in the future this feedback can be used as part of an active learning approach to improving the core classifier.

CHOOSING THE CLASSIFIER

At this point we have motivated and described a novel approach to profiling and following users and filtering their stream content. The classifier approach at the core of CatStream has not yet been discussed in detail, however. Before we discuss this element of CatStream it is worth highlighting that our main focus, at least at this stage of our work, is not to optimise the classifier to deliver a significant leap forward in content/text classification. Rather, our core objective is to explore this idea of a more personalised, topic-centric Twitter client by using a reasonably effective classifier. For this reason our approach to classification has largely been about harnessing existing classification technologies.

Single-label Classifier

Our starting point was to use a naïve Bayes multinomial classifier, which has been found to work well in the past for related classification problems [9, 12, 13]. By default this approach returns only a single class/category per classified tweet/URL. We used a bag-of-words approach to represent each URL, where the attributes of the classifier correspond to the terms appearing in the page pointed to by the URL, weighted according to the standard TF-IDF weighting function [21]. We also performed some basic preprocessing on the text, such as removing stop-words, special symbols and digits. To evaluate this classifier we collected and downloaded the content of URLs for 18 different categories for tweets posted between September 2011 and January 2012. We removed duplicate URLs and selected 100 URLs at random for each category, producing a total of 1800 URLs. To begin with, we performed a standard 10-fold cross-validation evaluation over this labeled data. Across the 18 categories, the average accuracy achieved by the classifier was 83.6% (σ = 6.3%).
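As an indication of how such a pipeline can be assembled from standard components, the sketch below trains a multinomial naïve Bayes model on TF-IDF bag-of-words features and estimates accuracy with 10-fold cross-validation. scikit-learn is used here purely for illustration (the paper does not state which implementation was used), and the sketch assumes the page text of each labeled URL has already been downloaded into page_texts with matching labels.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

def evaluate_single_label(page_texts, labels):
    """10-fold cross-validation of a TF-IDF bag-of-words naive Bayes classifier.

    `page_texts` is a list of strings (the downloaded text of each labeled
    URL's page) and `labels` holds the corresponding category names.
    """
    model = make_pipeline(
        TfidfVectorizer(stop_words="english"),  # bag-of-words, TF-IDF weighted
        MultinomialNB(),
    )
    scores = cross_val_score(model, page_texts, labels, cv=10, scoring="accuracy")
    return scores.mean(), scores.std()
```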

Figure 3. Following friends based on a subset of their profile topics.


Figure 4. CatStream’s categorised timeline.

We also performed a small online evaluation in an effort to benchmark the above offline results. We asked a group of 7 users to provide positive/negative feedback on the categories predicted for 50 random tweets; users were also invited to assign other categories to tweets as appropriate. The average prediction accuracy for these tweets was found to be 78.29% (σ = 7%), which, although lower than the equivalent cross-validation accuracy, is nevertheless in broad agreement.

Multi-label Classifiers

During the above experiment it became clear that one of the obvious shortcomings of our single-label approach was that tweeted URLs are often naturally associated with multiple categories, particularly given the set of categories that we use in CatStream. For this reason we chose to explore a multi-label classification approach.

To transform our single-label naïve Bayes classifier into a multi-label version, we use the posterior probabilities assigned to each label (i.e. category), which are returned by the single-label classifier. We produce a label ranking by ordering the labels by their probabilities and pick the top L labels (typically L = 2 or 3). We also implemented a variety of other standard classifiers (J48 [16], JRip [6], SVM [10] and Random Forest (RF) [4] with 500 trees) and re-evaluated their prediction accuracy across the 387 manually labeled tweets arising out of the small user study described above. The accuracy results of this multi-label classifier evaluation are presented in Figure 5 as a graph of the percentage of tweets that were correctly labeled by each of the different classifiers at L = 3. We only show the performance for 16 categories, as users did not provide feedback on tweets from the remaining 2 (pictures and checkins).
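A minimal sketch of this single-label to multi-label conversion is shown below, assuming a fitted probabilistic classifier such as the scikit-learn pipeline sketched earlier; the function name and the default L = 3 are illustrative.

```python
import numpy as np

def top_l_labels(model, page_text, l=3):
    """Return the L most probable categories for one page, ranked by the
    classifier's posterior probabilities (multi-label via label ranking)."""
    probs = model.predict_proba([page_text])[0]
    top = np.argsort(probs)[::-1][:l]
    return [model.classes_[i] for i in top]
```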



Figure 5. Comparison of Multi-label Classifiers (L = 3).

The multi-label naïve Bayes classifier delivered superior overall accuracy. It also significantly outperformed the second-placed Random Forest on classification time; an important consideration given our need to perform real-time classification over a user’s timeline of tens of tweets. Figure 6 shows a more detailed analysis of the naïve Bayes approach on a category-by-category basis, highlighting the change in accuracy for L = 1, 2, 3. For the majority of categories classification accuracy is well in excess of 80%. In the next section we describe a live-user study of CatStream with a naïve Bayes multi-label classifier as the core of the classification engine. The classifier was trained on the 1800 tweet/URL dataset described above.

ONLINE USER STUDY

To test CatStream in context we created a web application that allowed users to connect via their Twitter credentials, and evaluated it with 45 users ranging in age from 18 to 55. All of the participants were active Twitter users to varying degrees; 77% used Twitter at least once per week, while 14% used it on a monthly basis and 9% rarely used it. 62% of participants described themselves mainly as consumers of Twitter content, while 38% described themselves as both producers and consumers. The purpose of this study was to better understand how well CatStream’s classifier was performing by getting user feedback on the accuracy of their profile categories and the relevance of their filtered timelines. In addition, the study also provided an opportunity to garner more general feedback on the perceived usefulness of the approach and the system as a whole.

Part 1 - Profile Accuracy

During the first part of this study users were presented with their profile page, complete with categories assigned to them based on their 200 most recent tweets, provided these contained a minimum of 10 URLs. Unfortunately, not all of the participants had sufficient tweets with URLs to use as the basis for their profile and so the following results are based on a subset of 32 of the 45 users.

For the 32 users who did have sufficient tweets with URLs, their tweets contained a mean of 90 (σ = 61) URLs and CatStream predicted a mean of 4.35 (σ = 2.09) categories per user. Participants were asked to delete categories that did not apply to them and to add categories that they felt were missing. On average users deleted only 0.6 (σ = 0.74) categories and added 2.65 (σ = 2.93).

We can use this information to calculate versions of precision and recall for the predicted categories by treating deletions as irrelevant categories and additions as missing relevant categories. The precision of a set of predicted profile categories is the number of predicted categories minus deletions, divided by the number of predicted categories; recall is the number of predicted categories minus deletions, divided by the number of predicted categories minus deletions plus additions. For example, a profile with 5 predicted categories, 1 deletion and 2 additions would have a precision of 4/5 = 0.8 and a recall of 4/(4 + 2) ≈ 0.67. Overall this corresponds to a mean profile precision score of 0.9 (σ = 0.14) and a recall score of 0.68 (σ = 0.27). The mean F1 was 0.73 (σ = 0.21). In other words, CatStream was performing very well when it came to predicting the primary categories that applied to users, rarely making incorrect category suggestions that were deleted by users. At the same time it was sometimes missing categories that users felt should be applied to their interests.

These results are very encouraging, especially in relation to the precision of the predictions. As to the missing categories, one explanation is that users were interested in categories that they would not usually post about. Another explanation is of course that CatStream only has access to each user’s most recent tweets, which must represent an incomplete picture of their interests, and so it is entirely likely that the very categories that users tended to add as missing were not reflected in the subset of tweets that CatStream used for profiling. In a more open deployment setting it is likely that CatStream would be able to build a more complete picture of a user’s interests. Separately, it is worth highlighting that when users were asked to rate the overall accuracy of their profile pages on a 5-point scale from 1 (poor) to 5 (very good), the median rating was 4, again providing some additional encouragement that CatStream’s profiling component is operating effectively.

Part 2 - Tweet Classification for Timeline Filtering

For the second part of the study participants were asked to provide feedback on the accuracy of the categories assigned to their tweets in their timeline view (i.e. tweets which contain URLs from the people they follow). Figure 7 shows an example of the timeline view that was used for the purpose of this part of the study. It shows a fairly traditional Twitter timeline of tweets but, as previously discussed, each of the tweets is labeled with a set of predicted categories (computed in real-time by CatStream). Each tweet also has a pair of feedback icons. The ‘tick’ icon allows the user to confirm that they are happy with the categories assigned to the tweet as they are. The ‘plus’ icon allows them to add additional categories (either by choosing from one of the default categories or by providing a free-form category of their own).


Figure 6. Percentage of correctly classified tweets per category for the top L labels achieved by the classifier.

Figure 7. Timeline view showing tweets and annotated categories.

Each user was asked to provide feedback on a minimum of 30 URLs posted in tweets from the people they follow in their timeline; all 45 participants provided this feedback. In addition, the more active users (those who had at least 30 of their own tweets with URLs) were encouraged to provide similar feedback for their own posted tweets; only 25 participants were able to provide this feedback. In total, users provided feedback for 2650 categorised tweets. Participants confirmed that the predicted categories were correct (without change) for 64% of the tweets. They removed a category for only 8% of the tweets, added additional categories for 17% of the tweets and both removed and added categories for 11% of tweets. This demonstrates a similar feedback pattern to that found for profile pages in part 1 of this user study. Most of the time participants seemed happy with the categories assigned to tweets and tended to add categories more often than they deleted predicted categories.

For a given tweet we can compute precision and recall scores for its predicted categories in a manner similar to that used in part 1 of the study. When we do this we obtain a mean precision and recall of 0.86 (σ = 0.31) and 0.81 (σ = 0.33) respectively. Mean F1 was 0.82 (σ = 0.31). Interestingly, when we compare the accuracy of the predictions for a user’s own tweets to the accuracy of their friends’ (timeline) tweets, we notice a mean F1 score for the former of 0.76 (σ = 0.34) versus 0.84 (σ = 0.28) for the latter. It is a matter for future work to determine whether this difference is due to actual poorer classification performance over each user’s own tweets or whether it is based on users having a higher degree of familiarity with the tweets they posted versus those their friends posted. Interestingly, when participants were asked (as in part 1) to rate the accuracy of the classifier over their own tweets and the tweets of their friends on a 5-point scale from 1 (very inaccurate) to 5 (very accurate), the difference in the users’ perceived quality was not so evident. In particular, the median
