Tree-Based Mining for Discovering Patterns of Reposting Behavior in Microblog

Huilei He1, Zhiwen Yu1, Bin Guo1, Xinjiang Lu1, and Jilei Tian2

1 School of Computer Science, Northwestern Polytechnical University, Xi’an, China
{nwpuhhl,ramber1836}@gmail.com, {zhiwenyu,guobin.keio}@nwpu.edu.cn
2 Nokia, China
[email protected]
Abstract. Discovering behavior patterns is important for understanding online human interaction (e.g., how information is shared through reposting, what roles people play in a conversation). As reposting has become the key mechanism for information propagation in social media (e.g., microblog) and contributes a lot to users’ participation in online events, it is important to explore how reposting works. Different from previous studies, we make two contributions in this work: firstly, we analyze the patterns of reposting behavior from the perspective of the microblog user and employ a dedicated mining method, which successfully finds interesting results; secondly, our analysis is based on Sina Weibo, which has different characteristics from Twitter. Specifically, the information flow for a certain message in Weibo is represented as a tree. A tree-based pattern mining algorithm is presented to extract a number of interesting patterns that are useful for understanding information diffusion in the Weibo network.

Keywords: Information propagation, Reposting behavior, Microblog, Tree-based pattern mining.
1 Introduction
H. Motoda et al. (Eds.): ADMA 2013, Part I, LNAI 8346, pp. 372–384, 2013. © Springer-Verlag Berlin Heidelberg 2013

Along with the development of Web 2.0 applications, social network services have gained ever-increasing popularity as a result of people’s growing demand for communication as well as the Internet’s permeation into everyone’s daily life. These services have profoundly changed the way people acquire knowledge, share information, and interact with one another on a societal scale. Microblog is one of the most important types of social media services and has become a popular communication tool among Internet users. As microblog services have gained wide popularity, users have applied them for many purposes, such as sharing news, promoting political views, showing off, marketing, and tracking real-time events [1,3]. In microblog, users post short messages of at most 140 characters, which are called “tweets” or “posts”. A user can retweet or repost a message posted by others. Users may “follow” one another to receive all up-to-date
messages published by the users they are interested in, and get “followed” by other users to spread their own messages. Those who follow a user are called his or her followers and those whom the user follows are called his or her followees. Since following someone does not necessarily mean that they will follow you back, the followee-follower network is a directed graph [5]. One can also use “@” to mention others and address messages to them directly. The ease of use of tweets has made possible the swift propagation of news and messages in the Twitter network [3].

Among the various microblog systems, Twitter was the first and has been well studied. In contrast, Chinese social media has not been well studied. Although Chinese social media services are much younger than Twitter, their number of users, real-time content distribution, and influence are tremendous. China's unique cultural and social environment also suggests that individuals’ online behaviors might differ from those of their Western counterparts. Yu et al. [12] found that the effect of reposting is much larger in Sina Weibo, where users are more likely to learn about a particular topic through reposts.

In this paper, we examine in detail a fascinating online environment: Sina Weibo [10]. Reposting, which relays a post written by another user, is one of the most important features in Weibo. When a user finds an interesting post published by someone and wants to share it, s/he can simply repost the message. Reposting can be considered an efficient way of propagating information, since the original tweet is propagated to a new set of consumers, namely the followers of the reposter. With its rapid growth, Sina Weibo plays an important role in the social networking world, and thus studying the patterns of reposting behavior in Weibo is significant.
Since reposting has become the key mechanism for information propagation in microblog and contributes a lot to users’ participation in online events, several studies have explored reposting behaviors. While some works [1, 2, 7] have studied retweet behavior, they mainly focused on global factors, i.e., factors related to retweeting behaviors, or took the perspective of information spreading [4, 8]. However, none of them has answered the question of what the patterns of reposting behavior on the microblog network are (e.g., is it true that a celebrity is more likely to be reposted, and if so, why). It has also been shown that many features of Sina Weibo differ from those of Twitter [9]. In our work, we conduct an in-depth study of the patterns of reposting behavior in Sina Weibo.

Discovering the frequent patterns of reposting behavior in Weibo is non-trivial. The challenges are twofold: i) How to accurately represent the information diffusion process for a certain original post. After an original post is sent out on the microblog network, it spreads in a complex cascade, and the mutual relationships between the involved users are intricate. It is difficult to obtain the complete structure of a reposting process (e.g., how a piece of information reaches a user from her/his followees and is then reposted to her/his followers), especially for famous posts that have been reposted thousands or even tens of thousands of times. ii) What data mining techniques can be employed to solve this problem. Unlike association mining or sequence mining, we focus on mining frequent patterns of reposting behavior in Weibo, which is represented as a heterogeneous collection of ill-structured data such as a forest or graph, and thus is more
difficult to analyze. Some algorithms for discovering tree-like patterns adopt a straightforward generate-and-test strategy [15,16], which is not applicable here because the mining algorithm must scale with the size of the dataset.

In our work, the information flow for a certain message in Weibo is represented as a repost tree. We treat repost trees as representations of the information diffusion structure. To find the most frequent patterns of reposting behavior, we label the nodes in the repost tree according to the corresponding users’ numbers of followers and formulate the problem as mining subtrees in a forest of rooted, labeled, and ordered trees. We investigate data mining techniques to detect and analyze frequent reposting behavior patterns and hope to discover various types of new knowledge on information propagation.

The rest of this paper is organized as follows: In Section 2, we discuss previous studies related to our work. Section 3 introduces the repost behavior modeling. The pattern mining method is presented in Section 4. Section 5 presents the dataset and experimental results. Finally, we conclude in Section 6.
2 Related Work
Microblog has attracted much attention in the research community since it became an important social network service. Twitter has been well studied. Kwak et al. [4] conducted a large-scale study to analyze the topological characteristics of Twitter and its power as a new medium of information sharing. Java et al. [3] provided an initial analysis of the topological and geographical properties of the Twitter social graph, along with observations on what type of content people tweet. Cha et al. [5] developed a framework to measure and model an individual’s influence on Twitter, and found that a high follower count does not necessarily lead to many retweets. In general, the above-mentioned studies aim at analyzing the basic properties of the Twitter network, while our work focuses on higher-level knowledge of the microblog to discover frequent patterns of reposting behavior.

As retweeting has become the key mechanism for spreading information in the Twitter network, a number of studies on retweeting have explored how information is diffused. The propagation graph and statistics are studied in [4]. Boyd et al. [1] investigated retweeting as a conversational practice, such as how authorship, attribution, and communicative fidelity are negotiated in various ways. Suh et al. [2] investigated a number of tweet features that have a potential relationship with the retweetability of tweets. Yang et al. [7] found that almost 25.5% of the tweets posted by users are actually retweeted from friends’ blog spaces and proposed a factor graph model to predict users’ retweeting behaviors. Yang et al. [8] studied the underlying mechanism of retweeting behaviors. These studies mainly focus on analyzing the factors related to retweeting from the information spreading perspective.
However, since the information flows in a social network carry rich information about user behaviors, two questions remain open: what are the patterns of reposting behavior from the users’ perspective (e.g., how does a message propagate between users with different influence), and why does information spread in that way on the microblog network?
Until now, only a few studies on Sina Weibo have been conducted. The differences between Sina Weibo and Twitter were studied in [9], which gives an in-depth understanding of what kinds of users are more active and what kinds of content are most favored for reposting. Qu et al. [11] studied the roles played by microblog systems in response to major disasters. Our work mainly focuses on finding the patterns of reposting behavior to reveal how information spreads between different users in the Weibo microblog.
3 Weibo Repost Behavior Modeling
A common practice on Sina Weibo is reposting, or rebroadcasting someone else's messages to one's followers. In this section, we present how repost behavior is modeled.

3.1 Weibo Repost Tree Definition
When a piece of information is generated on the microblog network, it spreads in a complex cascade, which forms an information flow tree. Here, we represent the information flow for a certain post as a repost tree. The root node of the repost tree stands for the author of the original post, and the other nodes stand for users who have reposted the original post onward. A node can have many child nodes, which means that the post has been reposted from that user by many other users. All repost trees are subgraphs of the Weibo network.

Definition 1 (Weibo Repost Tree). A tree is used to represent an information flow in Weibo. In this paper, trees are rooted, directed, ordered (sequential), and labelled. A tree is denoted as T = (V, E), where V = {0, 1, ..., n} is the set of vertices representing the retweeting Weibo users, and E = {(vi, vj) | vi, vj ∈ V, vi ≠ vj} is the set of edges connecting the retweeting users. One distinguished vertex vr ∈ V is designated the root. If (vi, vj) ∈ E, then vi is the parent of vj and vj is a child of vi, denoting that vj retweeted from vi. Each child has one and only one parent, but a parent may have multiple children. Further, l: V → L is a labelling function mapping vertices to a set of labels L = {l1, l2, ...}; for any node vi ∈ V, l(vi) is the label of vi. Edges are not labelled. The children of each vertex are ordered, i.e., a vertex on the left represents a retweet that occurred earlier than those to its right.

Ma et al. [17] conducted a study on modeling the popularity of microblogs that considers the number of followers. Intuitively, a tweet from a user with millions of followers is much more influential than one from a user with tens of followers. Furthermore, a retweet by a popular user, who has many followers, may increase the popularity of the original tweet.
Specifically, we adopt the number of followers as the labelling criterion to describe a user’s popularity, where L = {a, b, c, d, e} indicates that a Weibo user’s number of followers is larger than 100K, between 10K and 100K, between 1K and 10K, between 100 and 1K, and between 1 and 100, respectively.
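The labelling function and the tree structure of Definition 1 can be sketched in a few lines of Python. This is only an illustration: the names `label_for`, `RepostNode`, and `preorder_labels` are hypothetical, and the handling of exact boundary values (e.g., a user with exactly 100K followers, or with zero followers) is an assumption, since the ranges above do not state which side is inclusive.

```python
from dataclasses import dataclass, field
from typing import List


def label_for(followers: int) -> str:
    """Map a follower count to one of the labels a-e.

    Assumption: upper bounds are inclusive in the lower class,
    so exactly 100K followers is labelled b, not a.
    """
    if followers > 100_000:
        return "a"
    if followers > 10_000:
        return "b"
    if followers > 1_000:
        return "c"
    if followers > 100:
        return "d"
    return "e"  # 100 followers or fewer (the paper's range is 1 to 100)


@dataclass
class RepostNode:
    """One vertex of a repost tree; children are kept in chronological order."""
    user: str
    label: str
    children: List["RepostNode"] = field(default_factory=list)


def preorder_labels(node: RepostNode) -> List[str]:
    """Return the labels in preorder (parent first, children left to right)."""
    labels = [node.label]
    for child in node.children:
        labels.extend(preorder_labels(child))
    return labels


# Hypothetical example: a popular author reposted by two ordinary users.
root = RepostNode("author", label_for(250_000))
root.children.append(RepostNode("u1", label_for(640)))  # earlier repost (left)
root.children.append(RepostNode("u2", label_for(35)))   # later repost (right)
print(preorder_labels(root))
```

Appending children in repost-time order is what keeps the tree ordered in the sense of Definition 1.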
3.2 Weibo Repost Tree Construction
Users may “follow” one another to receive all up-to-date messages published by the users they are interested in, and get “followed” by other users to spread their own messages. Consequently, users in the microblog sphere receive all messages published by their followees, aggregated in a single reverse-chronological list. We therefore construct the repost tree based on the follower/followee relationships of the users involved in the ordered repost list. The original post is represented as the root node, and subsequent reposts are placed as children or deeper descendants of the root. The child nodes at each depth of the tree are ranked in chronological order. The relationships between Weibo users change dynamically, as it is common for users to add or remove followees; together with the limitations of the Sina Weibo APIs, this makes it difficult to obtain the complete structure of a reposting process.

We formulate the Weibo repost tree construction in Algorithm 1 to obtain the best available approximation of the information diffusion path. Given a repost list RL, the algorithm first labels each user with the corresponding number of followers, as described above, and sorts the repost list in chronological order (Steps 1-2). Initialization follows (Steps 3-5). For each user u in RL, we iterate to select the appropriate parent node p from Pi and then add u to the tree (Steps 6-11). We start the construction of the next tree level by adding 1 to i (Step 12), until the loop terminates. If RL is not empty, the remaining users in RL are added as child nodes of the root (Steps 13-14). These users are not followers of any user in the repost tree; they may have reposted through other channels, such as trending topics or search engines. Finally, we transform the generated repost tree into an XML file for storage (Step 15).

Algorithm 1. RTC (RL) (Repost Tree Construction)
Input: repost list for a given original post, RL
Output: repost tree in XML format
Procedure:
(1) for each repost user u in RL, do label u using the labelling function l
(2) sort RL by repost time, from earliest to latest
(3) init root node of the tree as r using the author of the post
(4) i ← 1
(5) add r to the parent node set Pi
(6) while RL ≠ ∅ and Pi ≠ ∅
(7)     for each repost user u in RL, do
(8)         for each candidate parent node p in Pi, do
(9)             if p is a followee of u && u reposted from p
(10)                add u as the rightmost child of p
(11)                Pi+1 ← Pi+1 ∪ {u}, then remove u from RL
(12)    i ← i + 1
(13) if RL ≠ ∅
(14)     add the remaining users in RL as child nodes of r, ranked by repost time
(15) transform the repost tree to an XML format file
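A minimal Python sketch of Steps 2-14 may help make the level-by-level construction concrete. It is not the authors' implementation: the function name `build_repost_tree`, the `Node` class, and the input shapes are hypothetical, labels are assumed to have been assigned already (Step 1), the "u reposted from p" test is modelled by a hypothetical `reposted_from` mapping, and the XML export (Step 15) is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Node:
    user: str
    label: str
    children: List["Node"] = field(default_factory=list)


def build_repost_tree(
    author: Tuple[str, str],              # (user id, label) of the original poster
    reposts: List[Tuple[str, str, int]],  # (user id, label, repost time)
    followees: Dict[str, Set[str]],       # user id -> ids of the users they follow
    reposted_from: Dict[str, str],        # user id -> id they reposted from (assumed known)
) -> Node:
    pending = sorted(reposts, key=lambda r: r[2])  # Step 2: chronological order
    root = Node(*author)                           # Step 3
    parents = [root]                               # Steps 4-5: P_1 = {root}
    while pending and parents:                     # Step 6
        next_parents: List[Node] = []              # P_{i+1}
        unplaced = []
        for user, label, when in pending:          # Step 7
            # Steps 8-9: find a candidate parent that u follows and reposted from.
            parent = next(
                (p for p in parents
                 if p.user in followees.get(user, set())
                 and reposted_from.get(user) == p.user),
                None,
            )
            if parent is None:
                unplaced.append((user, label, when))
                continue
            node = Node(user, label)
            parent.children.append(node)           # Step 10: rightmost child
            next_parents.append(node)              # Step 11
        pending, parents = unplaced, next_parents  # Step 12: next level
    for user, label, _ in pending:                 # Steps 13-14: attach leftovers to root
        root.children.append(Node(user, label))
    return root


# Hypothetical cascade: B follows A, C follows B, D follows nobody in the tree.
tree = build_repost_tree(
    author=("A", "a"),
    reposts=[("C", "e", 2), ("B", "d", 1), ("D", "c", 3)],
    followees={"B": {"A"}, "C": {"B"}},
    reposted_from={"B": "A", "C": "B"},
)
print([c.user for c in tree.children], [c.user for c in tree.children[0].children])
```

In this sketch the leftover users are appended after the children already attached to the root, so they are in chronological order among themselves but not necessarily relative to the earlier children; a stricter reading of Step 14 would re-sort the root's children by repost time.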
3.3 An Example of Repost Tree
To illustrate the reposting function on Sina Weibo and the process of repost tree construction, we present an example: a piece of news about “The disappearance of iPhone5’s magic”, as shown in Fig. 1. Sina Weibo provides convenient retweet buttons so that users can retweet a tweet easily. A repost on Sina Weibo is shown as two amalgamated entries: the original entry, and the current user's own entry, which is a commentary on the original. In this example, the original author (referred to as A) was reposted 65 times in total, while the current user (referred to as B) was reposted once. There are a total of 66 nodes in this repost tree, and the depth of the tree is 4. Most of the repost users appear at depth 2, and five of them are further reposted by their followers.
Fig. 1. An example of a repost on Sina Weibo (A: reposted 65 times; B: reposted once)
4 Tree-Based Pattern Mining
In this section, based on the repost tree structure, we present the frequent pattern mining algorithm. With a tree constructed for each repost list in our dataset, we build the tree sets consisting of the generated repost trees. To mine the frequent patterns, we first give the definitions of patterns, tree set size, and support for determining patterns. Afterwards, we present the mining algorithm. Table 1 shows the symbols we use.

Table 1. Symbols

Notation    Description
TS          A tree set consisting of repost trees
t           A tree
tk          A subtree with k nodes, i.e., a k-subtree
Ck          A set of candidates with k nodes
Fk          A set of frequent k-subtrees
σ           A support threshold (minsupp)
Definition 2 (Patterns). Patterns are defined as frequent subtrees in the tree set.

Definition 3 (Tree Set Size). We use |TS| to denote the number of nodes in a tree set TS.

Definition 4 (Support). Given a subtree T and a tree set TS, the support of T is defined as:

\[ supp(T) = \frac{\text{number of occurrences of } T}{\text{total number of vertices in } TS} \]
If the value of supp(T) is more than a threshold value minsupp (e.g., 1%), T is called a “frequent subtree”. Given a minimum support threshold σ, we aim to find all the subtrees that appear at least σ × |TS| times in the set. Based on the frequent pattern mining algorithm of [6], we employ tree mining techniques to discover all frequent tree-like patterns within the tree set, which is a large collection of labeled ordered repost trees. The key to the method is the concept of rightmost expansion. To obtain all frequent subtrees with respect to minsupp, the support of each subtree is calculated. The mining procedure is presented in Algorithm 2.

Algorithm 2. FRSPM (TS, σ) (Frequent Repost SubTree Pattern Mining)
Input: a tree set consisting of repost trees, TS; support threshold, σ
Output: all frequent tree patterns with respect to σ
Procedure:
(1) i ← 1
(2) scan TS, calculate the support for each labelled node
(3) select the nodes whose support is larger than σ to form Fi
(4) while Fi ≠ ∅
(5)     for each tree ti in Fi, do
(6)         expandResult ← Right_Most_Expand (ti)
(7)         Ci+1 ← expandResult ∪ Ci+1
(8)     for each pattern T ∈ Ci+1, do
(9)         if supp(T) > σ, then Fi+1 ← Fi+1 ∪ {T}
(10)    i ← i + 1
(11) output all frequent subtrees in Fk (1 ≤ k < i) whose support values are larger than σ
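The level-wise loop of Algorithm 2 can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' code: trees are encoded as nested tuples `(label, (children...))`; an "occurrence" is counted once per data node at which the pattern matches (a root occurrence); and a pattern's children must match an ordered subsequence of the data node's children. The paper does not spell out these matching details, and the function names are hypothetical.

```python
def nodes(tree):
    """Yield every node of a data tree (as the subtree rooted there), in preorder."""
    yield tree
    for child in tree[1]:
        yield from nodes(child)


def matches_at(pattern, node):
    """True if the pattern can be embedded with its root at this data node.

    Assumption: pattern children must match, in order, a subsequence of the
    data node's children; a greedy left-to-right scan is sufficient for that.
    """
    if pattern[0] != node[0]:
        return False
    wanted = pattern[1]
    i = 0
    for child in node[1]:
        if i < len(wanted) and matches_at(wanted[i], child):
            i += 1
    return i == len(wanted)


def support(pattern, tree_set):
    """supp(T) = occurrences of T / total number of vertices in TS."""
    all_nodes = [n for t in tree_set for n in nodes(t)]
    return sum(matches_at(pattern, n) for n in all_nodes) / len(all_nodes)


def right_most_expand(tree, labels):
    """Attach one new node, for every label, to each node on the rightmost branch."""
    label, children = tree
    results = [(label, children + ((l, ()),)) for l in labels]
    if children:  # continue down the rightmost branch
        for expanded in right_most_expand(children[-1], labels):
            results.append((label, children[:-1] + (expanded,)))
    return results


def frspm(tree_set, labels, sigma):
    """Return all patterns whose support is larger than sigma (Steps 1-11)."""
    frequent = []
    level = [(l, ()) for l in labels if support((l, ()), tree_set) > sigma]  # F_1
    while level:                                                             # Step 4
        frequent.extend(level)
        candidates = {c for t in level for c in right_most_expand(t, labels)}
        level = sorted(c for c in candidates if support(c, tree_set) > sigma)
    return frequent


# Hypothetical tree set: two small repost trees with 6 nodes in total.
tree_set = [
    ("a", (("e", ()), ("d", ()), ("e", ()))),
    ("a", (("e", ()),)),
]
patterns = frspm(tree_set, ["a", "d", "e"], sigma=0.3)
print(patterns)
```

With this toy input the single nodes a and e and the two-node pattern a → e exceed the threshold, while d (support 1/6) does not. Recomputing the support by rescanning every tree is quadratic and only suitable for illustration; the algorithm of [6] avoids it by tracking occurrence lists.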
The algorithm first calculates the support of the nodes for each distinct label in L, and then selects the nodes whose support is larger than σ to form the set of frequent nodes, F1 (Steps 1-3). It then calls Right_Most_Expand for each existing frequent i-subtree in Fi to generate the set of candidate subtrees with i + 1 nodes, Ci+1 (Steps 5-7). If there are any trees whose support is larger than σ, it selects them to form Fi+1 (Steps 8-9). The procedure is then repeated, adding 1 to i (Step 10), until the loop terminates. Finally, it outputs all the subtrees whose support is larger than σ
from F1 to Fi-1 (Step 11). Step 6 calls a subprocedure (Right_Most_Expand) to obtain the expansion results of ti.

The subprocedure Right_Most_Expand performs the rightmost expansion of a pattern, which grows a tree by attaching new nodes only on the rightmost branch of the tree. The subprocedure is presented below. It first obtains the rightmost branch of tk (Step 1). Then, for each node r in the branch and each label in L, it adds a new node with that label as the rightmost child of r (Steps 2-4). In Step 5, each newly generated pattern ck+1 is added to Sk+1 (the set of all expansion results of tk). Finally, it returns the candidate patterns (Step 6).

Subprocedure. Right_Most_Expand (tk)
Input: a tree pattern of k nodes, tk
Return: candidate frequent subtree patterns, each including k+1 nodes
(1) rmb ← the rightmost branch of tk
(2) for each node r in rmb, do
(3)     for each l in L, do
(4)         create a new node n labelled as l, then ck+1 ← add n as the rightmost child of r
(5)         Sk+1 ← ck+1 ∪ Sk+1
(6) return all the generated candidate patterns Sk+1

We define the number of labels in L as |L| and the depth of the rightmost branch of T as |D|; there are then |L| × |D| rightmost expansion results for T. We give an example and illustrate the expansion process in Fig. 2. Suppose that L = {a, b}. As we can see, there are 4 different expansion results for the given pattern tree. There are two nodes in the rightmost branch, which are the gray nodes in Fig. 2(a) and Fig. 2(b). In Fig. 2(a), new nodes labelled a and b are each added as the rightmost child of the gray node a. The same process is shown in Fig. 2(b) for the gray node b.
Fig. 2. Example of the Rightmost Expansion for a Pattern Tree: (a) expansion at the gray node a; (b) expansion at the gray node b
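The |L| × |D| count can be checked with a short Python sketch of the rightmost expansion. The nested-tuple encoding `(label, (children...))` and the function name are assumptions made for illustration, and the two-node pattern below (root a with a single child b) is a hypothetical stand-in for the pattern tree drawn in Fig. 2, whose full shape is not recoverable from the text.

```python
def right_most_expand(tree, labels):
    """Return every tree obtained by adding one new rightmost child,
    for each label, to each node on the rightmost branch of `tree`."""
    label, children = tree
    # Attach a new rightmost child to the current node, one per label.
    results = [(label, children + ((new_label, ()),)) for new_label in labels]
    # Then descend into the rightmost child, if any, and expand there.
    if children:
        for expanded in right_most_expand(children[-1], labels):
            results.append((label, children[:-1] + (expanded,)))
    return results


# Rightmost branch a -> b has |D| = 2 nodes; with |L| = 2 labels we expect 4 results.
pattern = ("a", (("b", ()),))
expansions = right_most_expand(pattern, ("a", "b"))
for tree in expansions:
    print(tree)
```

Two of the four results add a second child to the root a (making the pattern wider), and the other two add a child under b (making it deeper), which mirrors panels (a) and (b) of the figure.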
5 Experiments and Discussion
In this section, we firstly describe the dataset used in this paper, and then present the experiments and findings of the popular repost and pattern mining results.
5.1 Dataset
To trace the dissemination of posts, we implemented a crawler and collected 241 message cascades (more than 1 million reposts in total) through the Sina Weibo open API (footnote 1). We also collected the repost lists of these messages and the corresponding follower/followee relationships of the related Weibo users with Sina Weibo’s open API (footnotes 2 and 3). To ensure that our social graph contains most of the follower/followee relationships, we collected the followee IDs of the users involved in the retweeting process.

We analyze the repost tree depth distribution. As shown in Fig. 3, the tree depth varies from 2 to 11, and the number of repost trees decreases progressively as the depth increases. A deeper repost tree means that the information spreads more widely in the Weibo sphere and may be reposted by more users. In other words, there is a high correlation between the depth of a repost tree and the total number of retweets (with a correlation coefficient of 0.81).

Fig. 3. Repost tree level distribution (x-axis: Tree Depth, 2 to 11; y-axis: Number)

Fig. 4. Tree set nodes distribution (x-axis: TreeSets C1 to C5; y-axis: Percentage; series: labels a, b, c, d, e)
Table 2. TreeSet

TreeSet                   Number of Trees   Number of Nodes   Reposts Number Range
C1                        111               1907              1-100
C2                        22                6494              100-1k
C3                        50                149069            1k-10k
C4                        58                865376            More than 10k
C5 (C1 + C2 + C3 + C4)    241               1022846           More than 1

1 http://open.weibo.com/wiki/2/statuses/user_timeline
2 http://open.weibo.com/wiki/2/statuses/repost_timeline
3 http://open.weibo.com/wiki/2/friendships/friends/ids
5.2 Pattern Mining Results and Findings
To study the frequent patterns of the repost trees, we first generate five different tree sets according to the number of reposts, as shown in Table 2. As nodes in the tree sets are labelled (see Section 3.1), we analyze the nodes’ label distribution. From Fig. 4, we can see that nodes labelled d and e account for a greater proportion than the others, which means that there are many users with a small number of followers in our tree sets. With the frequent repost subtree pattern mining algorithm described in Section 4, we set the support threshold σ to 0.3 percent and discover the frequent patterns of C5. As shown in Fig. 5, we select the top 10 frequent patterns, ranked by their support values. Fig. 5(1) and Fig. 5(2) show that the most frequent patterns contain only one node (e and d), which indicates that the users on Sina Weibo who express their attitudes by reposting are often those with fewer followers, and that even their neighbors seldom respond to them. Therefore, most information propagation processes are interrupted and fragmented. In Fig. 5(3), one user labelled a is reposted by another user labelled e. The pattern shown in Fig. 5(4) indicates that one user labelled a is reposted by two users, labelled d and e respectively, and that the left node (d) reposted before the right node (e).
Fig. 5. The top 10 frequent patterns for TreeSet C5
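To give a feel for what this threshold means in absolute terms, the following worked calculation applies the rule from Section 4 (a subtree must appear at least σ × |TS| times) to TreeSet C5, using σ = 0.3% and the node count from Table 2. Rounding up to a whole number of occurrences is our assumption.

```latex
\[
\sigma \times |TS| = 0.003 \times 1{,}022{,}846 \approx 3068.5
\quad\Longrightarrow\quad
\text{a pattern needs at least } 3069 \text{ occurrences in } C_5 .
\]
```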
For the different TreeSets, we ignore the patterns with only one node and compare the generated frequent patterns in Fig. 6, which depicts the top three patterns for each number of nodes (from 2 to 5), ranked by support. Comparing the generated patterns of the different TreeSets, it can be observed that information flows all start with users labelled a and that messages tend to propagate along certain paths. Some of these paths are similar to those of traditional media such as newspapers, while others are inherent in the online social network structure. Several interesting findings are presented as follows:
Information tends to propagate from advanced users to low-level users. This indicates that the advanced users get a lot of reposts and promote the spreading of information to a deeper level in the Weibo sphere. For patterns with two nodes, we find that patterns 2-1, 2-2, and 2-3 in TreeSet C1 also exist in the other TreeSets C2, C3, and C4, except for pattern 2-3 (a → a) in TreeSet C3. This pattern may indicate that the user labelled a reposts him/herself a lot during the information propagation process. Further study is needed to explain this pattern precisely.
Fig. 6. Top three frequent patterns with two, three, four, and five nodes for the different TreeSets, panels (a) to (e) (The numbers under each pattern indicate the number of nodes and the rank, e.g., pattern 3-2 is the second-ranked three-node pattern)
Frequent patterns tend to be wider rather than deeper. This means that the probability of reposting gets lower as the information spreads to a deeper level. All patterns in Fig. 6 are trees of depth two, and as the number of nodes increases they become wider, not deeper.

The information flows a → e and a → d are the most frequent in all TreeSets. This means that users labelled e and d repost a lot of messages from influential users in the Weibo sphere. For example, there are a total of 30 edges involved in each TreeSet, and the number of a → e and a → d flows in TreeSets C1, C2, C3, C4, and C5 is 26, 24, 30, 27, and 28 respectively.

Users labelled b do not appear in any of the extracted frequent patterns. We find that all advanced users are labelled a, while the most frequent low-level users are, in descending order, e, d, and c, which indicates that the information flows a → b and b → d (or e, c) are less frequent.

For TreeSet C5, which consists of all repost trees in our dataset, the most frequent patterns with two nodes are a → e, a → d, and a → c. Users at the end of an arrow who have fewer followers are more likely to repost, which is in accordance with the percentage distribution of the three labels, e > d > c, as shown in Fig. 4.

5.3 Discussion
As social media has revolutionized the nature of influence and the role of influential people, our findings have provided evidence to support the influence topology theory.
Based on Edelman’s topology of influence (TOI) [13], Tinati et al. [14] developed a model based on Twitter message exchange that makes it possible to analyze conversations around specific topics and identify different communicator roles within a Twitter conversation. As shown in Fig. 6, most users in Sina Weibo have few followers, and they tend to follow and repost from influential individuals with a large number of followers, such as movie stars, grassroots celebrities, and public news media. Users with fewer followers are more likely to be information viewers, while users with many followers are the idea starters or information amplifiers. Idea starters and information amplifiers all have a large network of follower/followee connections, and most information viewers repost from them.
6 Conclusion
In this study, we use data collected from Sina Weibo to study the patterns of reposting behavior in microblog services. To gain insight into the patterns of reposting behavior in the microblog network, each message cascade is represented as a repost tree based on its reposting process. The pattern mining results indicate that messages tend to propagate in certain ways. Advanced users have higher influence and can attract many reposts, promoting the spreading of information to a deeper level in the Weibo sphere. For example, information flows from a (users with more than 100K followers) to e (users with 1-100 followers) or d (users with 100-1K followers) are the most frequent in all patterns. In the future, we plan to study the relationship between the patterns of reposting behavior and the media modalities (i.e., picture, video, and URL) embodied in the content.

Acknowledgments. This work was partially supported by the National Basic Research Program of China (No. 2012CB316400), the National Natural Science Foundation of China (No. 61222209, 61103063), the Program for New Century Excellent Talents in University (No. NCET-12-0466), the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20126102110043), and the Natural Science Basic Research Plan in Shaanxi Province of China (No. 2012JQ8028).
References

[1] Boyd, D., Golder, S., Lotan, G.: Tweet, tweet, retweet: Conversational aspects of retweeting on twitter. In: Proc. of HICSS 2010 (2010)
[2] Suh, B., Hong, L., Pirolli, P., Chi, E.H.: Want to be retweeted? Large scale analytics on factors impacting retweet in Twitter network. In: Proc. of SocialCom 2010 (2010)
[3] Java, A., Song, X., Finin, T., Tseng, B.: Why we twitter: An analysis of a microblogging community. In: Zhang, H., Spiliopoulou, M., Mobasher, B., Giles, C.L., McCallum, A., Nasraoui, O., Srivastava, J., Yen, J. (eds.) WebKDD 2007. LNCS, vol. 5439, pp. 118–138. Springer, Heidelberg (2009)
[4] Kwak, H., Lee, C., Park, H., Moon, S.: What is twitter, a social network or a news media? In: Proc. of WWW 2010 (2010)
[5] Zhou, Z., Bandari, R., Kong, J.S., Qian, H., Roychowdhury, V.: Information resonance on twitter: Watching Iran. In: Proc. of SOMA 2010 (2010)
[6] Asai, T., Abe, K., Kawasoe, S., Arimura, H., Sakamoto, H., Arikawa, S.: Efficient substructure discovery from large semi-structured data. In: Proc. of SIAM 2002 (2002)
[7] Yang, Z., Guo, J., Cai, K., Tang, J., Li, J., Zhang, L., Su, Z.: Understanding retweeting behaviors in social networks. In: Proc. of CIKM 2010 (2010)
[8] Yang, J., Counts, S.: Predicting the speed, scale, and range of information diffusion in twitter. In: ICWSM 2010 (2010)
[9] Wang, C., Guan, X., Qin, T., Li, W.: Who are active? An in-depth measurement on user activity characteristics in sina microblogging. In: GLOBECOM (2012)
[10] Sina weibo, http://en.wikipedia.org/wiki/SinaWeibo
[11] Qu, Y., Huang, C., Zhang, P., Zhang, J.: Microblogging after a major disaster in China: a case study of the 2010 Yushu earthquake. In: Proc. of CSCW 2011 (2011)
[12] Yu, L.L., Asur, S., Huberman, B.A.: Artificial Inflation: The True Story of Trends in Sina Weibo. arXiv preprint arXiv:1202.0327 (2012)
[13] Bentwood, J.: Distributed influence: Quantifying the impact of social media. Edelman (2008)
[14] Tinati, R., Carr, L., Hall, W., Bentwood, J.: Identifying communicator roles in twitter. In: Proc. of MSND 2012 (2012)
[15] Miyahara, T., Shoudai, T., Uchida, T., Takahashi, K., Ueda, H.: Discovery of frequent tree structured patterns in semistructured web documents. In: Cheung, D., Williams, G.J., Li, Q. (eds.) PAKDD 2001. LNCS (LNAI), vol. 2035, p. 47. Springer, Heidelberg (2001)
[16] Wang, J.T.L., Shapiro, B.A., Shasha, D., Zhang, K., Chang, C.Y.: Automated discovery of active motifs in multiple RNA secondary structures. In: Proc. KDD 1996 (1996)
[17] Ma, H., Qian, W., Xia, F., et al.: Towards modeling popularity of microblogs. J. Frontiers of Computer Science (2013)