Robust Features for Detecting Evasive Spammers in Twitter

Muhammad Rezaul Karim¹ and Sandra Zilles²

¹ Department of Computer Science, University of Calgary, Canada
[email protected]
² Department of Computer Science, University of Regina, Canada
[email protected]
Abstract. With its growing popularity as a microblogging service, Twitter has been attracting an increasing number of spammers. Researchers have designed features of Twitter accounts that help machine learning algorithms to detect spammers. Evasive spammers, in turn, try to evade detection by manipulating such features. This has led to research on the design of robust features, i.e., features that are hard to manipulate. In this paper, we propose five new robust features to detect evasive spammers. An empirical analysis shows that our proposed features can improve the performance of some machine learning classifiers when used along with some existing robust features.
1 Introduction
Twitter is one of the most popular social networking and microblogging services. It allows its users to send and read tweets, which are short text messages containing up to 140 characters. With around 250 million monthly active users, Twitter has revolutionized the way people share news and express opinions. Spammers have begun to use Twitter to spread malware, to post phishing links, and for advertising. Over the last few years, researchers have applied various machine learning algorithms to detect spam accounts in Twitter [1, 5, 8, 11, 13, 14]. These approaches mostly rely on features of the spam account. Even though most existing features are capable of distinguishing current spammers from legitimate users, they might become less effective as spammers' techniques evolve. For example, evasive spammers can easily manipulate features like the number of followers [13] or the F-F ratio (i.e., the ratio of the number of followings to the number of followers) [5] by purchasing followers from, or exchanging followers via, third-party websites [14]. The cost per 1,000 accounts ranges from $10 to $200 [12]. Evasive spammers can also manipulate features like tweet similarity [5], which rely on the similarity of tweets: they can post multiple tweets with very similar meaning but with different content [14, 11], or they can mix normal tweets with spam tweets. Empirical results indicate that 30% of the spammers may remain undetected due to their dual behavior [1].
In other words, most of the features studied so far and ranked highly by various feature selection algorithms are not robust [14]. A feature is considered robust if the manipulation of its value is expensive in terms of some cost measures (money, time, effort etc.). Yang et al. proposed several features to detect evasive spammers [14]. They showed that graph-based features that represent the position of a user as a node in the Twitter graph (in which nodes are accounts and edges reflect the follower relation) are robust and effective. In this paper, we propose new robust graph-based features (concerning the Twitter graph), content-based features (concerning the content of the tweets) and neighbour-based features (concerning the followings of an account). The graph-based features are based on the concept of eigenvector centrality [2] and Laplacian centrality [9] of the users in the Twitter graph, while the proposed content-based features are based on the optimal matching [10] and LSA-based semantic similarity of tweets [7]. The neighbour-based feature that we propose is based on the reputation of the accounts a user is following. We conduct experiments with several machine learning classifiers on a data set containing 750 users to determine the effectiveness of the proposed features. Our results show that the proposed features, in particular the graph-based ones, can improve the performance of machine learning algorithms when used with some existing robust features. Furthermore, three standard feature selection algorithms rank our proposed graph-based features fairly high when compared with other (robust or non-robust) features, which suggests that eigenvector centrality and Laplacian centrality can be very effective for spam account classification.
2 Background
In this section, we provide background on previously studied Twitter account features that we use in our empirical study. We further discuss basic approaches to computing semantic similarity between phrases as well as centrality in graphs, which are essential to the new account features we propose.

2.1 Twitter Social Network
In Twitter, users can post short text messages called tweets, of length up to 140 characters. Following an account means subscribing to the tweets posted by that account. If user A follows user B, then A is a follower of B and, as a subscriber, user A can see a tweet posted by user B as soon as it is posted. B is then called a following account of A. Researchers model the Twitter network as a graph, where each user is a node and edges are based on user relationships. In this paper, we follow the directed graph model adopted by Wang [13]. In this model, an edge from A to B means that A follows B (see Figure 1). If a tweet contains a text fragment with the syntax @username, it is referred to as a mention of the user username. In this case, the tweet is automatically shown to the user username, even if the receiver is not a follower of the sender. This feature is exploited by spammers to send spam to normal users, because normal users rarely follow spammers. A reply is a special case of a mention.
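As a minimal illustration of this graph model, the following sketch encodes the example of Figure 1, where an edge from A to B means that A follows B. The use of the networkx library is an assumption made for illustration only; it is not part of the paper's toolchain.

```python
# Minimal sketch of the directed Twitter graph model from Section 2.1,
# using networkx (an assumption for illustration, not the paper's tooling).
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([
    ("A", "B"),  # A follows B
    ("B", "A"),  # B follows A
    ("A", "C"),  # A follows C
    ("C", "B"),  # C follows B
])

# Followers of B are the nodes with an edge pointing to B;
# followings of A are the nodes A points to.
followers_of_B = list(G.predecessors("B"))   # ['A', 'C']
followings_of_A = list(G.successors("A"))    # ['B', 'C']
print(followers_of_B, followings_of_A)
```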
Fig. 1: A Twitter graph in which users A and B follow each other, A follows C, and C follows B
Fig. 2: Example of optimal semantic matching for two sentences (tweets). The matching shown on the right is optimal
2.2 Features for Spam Detection
A feature of a Twitter account is considered robust if it is difficult or expensive to evade [14]. A feature is difficult to evade if manipulating its value requires a fundamental change in the way spammers usually perform their malicious deeds. A feature is expensive to evade if manipulating its value is costly in terms of some cost measure (money, time, effort, etc.). Existing approaches to spam account detection can be classified into two categories: standard spammer detection approaches and evasive spammer detection approaches. Approaches belonging to the first category, such as those reported in [1, 5, 8, 13], ignore the importance of the robustness of the features and assume that spammers do not manipulate their account features. Each approach in the literature uses a combination of features of different types: profile-based features [1, 5, 8, 13, 14], content-based features [1, 5, 8, 13, 14], graph-based features [5, 14], neighbourhood-based features [14], automation-based features [14] and timing-based features [1, 5, 8, 14]. Profile-based features are extracted from the user profile information, e.g., the number of followers or the age of the account. Content-based features rely on the content of the tweets posted by the users, for instance, on the similarity between different tweets posted by the same user, the number of malicious URLs in the tweets, etc. Graph-based features are extracted from the Twitter graph. Automation-based features are targeted at catching spammers who use customized tools or software to automatically post tweets. Descriptions of the existing features we experiment with (see Section 4) can be found in [1, 5, 8, 11, 13, 14].

2.3 Semantic Similarity
Measuring semantic similarity between words has important applications in artificial intelligence, natural language processing and information retrieval. Existing approaches rely on either statistics from a large corpus or on thesauri like WordNet [6, 7]. WordNet is a lexical database for the English language and groups
English words into sets of synonyms. In WordNet, measures of semantic similarity quantify how much two words are alike, based on information contained in an is-a hierarchy. LIN is one of the existing approaches that relies on WordNet to compute word-to-word similarity [6]. Word-to-word semantic similarity can also be computed using Latent Semantic Analysis (LSA) [7]. This statistical method derives a vectorial representation for each word based on the concept of word co-occurrences. The co-occurrence information is usually derived from large collections of text documents. To compute the similarity between any two words, the cosine (normalized dot product) of the LSA vector representations of the two words is computed. Several methods have been proposed in the natural language processing literature to compute sentence-to-sentence semantic similarity [7, 10] using word-to-word similarity; we focus on the following two:

Optimal Matching. The optimal matching method [10] is based on the optimal assignment problem, a fundamental combinatorial optimization problem. In this approach, the words from one sentence form the set $X = \{x_1, x_2, \ldots, x_n\}$, while the words from the other sentence form the set $Y = \{y_1, y_2, \ldots, y_n\}$ (preprocessing is needed to make the sentences have equal numbers of words). The goal of the assignment problem is to find a permutation $\pi$ of $\{1, 2, \ldots, n\}$ that yields the maximum value of the objective function $\sum_{i=1}^{n} w(x_i, y_{\pi(i)})$, see Figure 2. Here the weight $w(x, y)$ is the word-to-word similarity between $x \in X$ and $y \in Y$. This problem has a polynomial-time solution [3].

LSA-Based Similarity. LSA-based word-to-word similarity can easily be extended to computing the similarity of two sentences. In this case, for each of the sentences, vector algebra is used to generate a single vector by adding up the LSA vectors of the individual words. Alternatively, the individual word vectors can be combined through weighted sums [7]. Then the cosine between the resulting vectors is computed.
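To make the two sentence-level similarity measures concrete, the sketch below computes both of them for a pair of toy "tweets". The word vectors, the cosine helper and the example words are hypothetical stand-ins (the paper derives word-to-word similarities from WordNet/LIN or from LSA vectors trained on a large corpus); the assignment step uses SciPy's linear_sum_assignment.

```python
# Sketch of optimal matching similarity and LSA-based sentence similarity
# (Section 2.3). Word vectors here are made-up stand-ins for LSA vectors.
import numpy as np
from scipy.optimize import linear_sum_assignment

word_vec = {  # hypothetical word vectors
    "cheap": np.array([0.9, 0.1, 0.0]), "pills": np.array([0.1, 0.8, 0.2]),
    "online": np.array([0.2, 0.1, 0.9]), "discount": np.array([0.8, 0.2, 0.1]),
    "meds": np.array([0.2, 0.7, 0.3]), "web": np.array([0.1, 0.2, 0.8]),
}

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def optimal_matching_similarity(words_x, words_y):
    # Build the word-to-word similarity matrix w(x_i, y_j) and solve the
    # assignment problem, i.e. maximize sum_i w(x_i, y_pi(i)) over permutations pi.
    W = np.array([[cos(word_vec[x], word_vec[y]) for y in words_y] for x in words_x])
    rows, cols = linear_sum_assignment(W, maximize=True)
    return W[rows, cols].sum() / len(words_x)   # one possible normalization

def lsa_similarity(words_x, words_y):
    # Add up the word vectors of each sentence and take the cosine of the sums.
    vx = np.sum([word_vec[w] for w in words_x], axis=0)
    vy = np.sum([word_vec[w] for w in words_y], axis=0)
    return cos(vx, vy)

t1 = ["cheap", "pills", "online"]
t2 = ["discount", "meds", "web"]
print(optimal_matching_similarity(t1, t2), lsa_similarity(t1, t2))
```

The division by sentence length in optimal_matching_similarity is only one possible normalization; the description above leaves the exact normalization open.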
2.4 Centrality in Graphs
Centrality measures quantify the importance of individuals in a network. Different centrality measures capture different aspects of the positioning of the nodes in a network. Betweenness centrality, which gives high weight to nodes that lie on many shortest paths between other nodes, is an example of a centrality measure that has already been used as a feature in the detection of evasive spammers [14]. One disadvantage of this centrality measure is that any two nodes that do not lie on a shortest path between any two other nodes receive the same score of zero, irrespective of their differences in other properties. In this section, we describe two more centrality measures, which we use to extract features in our approach. Eigenvector Centrality. Bonacich [2] suggested that the eigenvector associated with the largest eigenvalue of an adjacency matrix can make a good network centrality measure. Unlike degree centrality, which weighs every connection equally, the eigenvector centrality weighs connections based on their centralities. It is a
weighted sum of not only direct connections but also indirect connections of every length, and thus takes the full network structure into account. Let $A = [a_{ij}]_{n \times n}$ be the adjacency matrix of a graph $G$ over $n$ nodes, where $a_{ij} = 1$ if there is an edge between nodes $i$ and $j$, and $a_{ij} = 0$ otherwise. The eigenvector centrality $C_E(v_i)$ of a node $v_i$ is defined as follows (see [2]). One first determines the largest eigenvalue $\lambda$ of $A$, then takes $c$ to be the corresponding eigenvector, i.e., $Ac = \lambda c$, and $c(v_j)$ to be the $j$th component of $c$, i.e., $c = (c(v_1), c(v_2), \ldots, c(v_n))$. Then

$$C_E(v_i) = \frac{1}{\lambda} \sum_{j=1}^{n} a_{ij}\, c(v_j). \qquad (1)$$
According to this centrality measure, important Twitter users are being paid attention to by many other users, who in turn are being paid attention to by many others, and so on. Even if a user has few connections, it can have a very high eigenvector centrality, provided that those connections themselves are important.

Laplacian Centrality. The Laplacian centrality method [9] is a recently proposed centrality measure, originally used to analyze terrorist networks, and is based on the Laplacian energy of a graph. The latter is a quantity that reflects internal connectivity in a graph [4]. The Laplacian energy $E_L(G)$ of a directed graph $G$ is defined as

$$E_L(G) = \sum_{i=1}^{n} d_i^2, \qquad (2)$$

where $d_i$ is the out-degree of node $v_i$ in $G$. The Laplacian centrality of a node is defined as the decrease of the Laplacian energy of the graph when removing this node. It measures how much "damage" is done to the graph structure by the removal of the node. Let $H$ be the graph obtained by removing node $v_i$ from $G$. The Laplacian centrality $C_L(v_i)$ of node $v_i$ is defined as follows:

$$C_L(v_i) = E_L(G) - E_L(H). \qquad (3)$$
CL (vi ) is related to the number of 2-walks in which vi participates [9]. The more 2-walks a node participates in, the more important it is. Laplacian centrality is expected to yield a useful node categorization when many subgroups exist in a graph, which is very common in social networks. Unlike the centrality methods that aim to find the center of the whole network, Laplacian centrality targets centers of sub-communities in the network.
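As a concrete illustration, the sketch below evaluates Eqs. (1)–(3) directly with numpy on a small hypothetical follow matrix. It is not the pipeline used in the paper (which extracts graph-based features with PAJEK and GEPHI, see Section 4), and the convention of computing eigenvector centrality from incoming edges is an assumption chosen to match the intuition that importance flows from one's followers.

```python
# Sketch of eigenvector centrality (Eq. 1) and Laplacian centrality
# (Eqs. 2-3) on a hypothetical toy Twitter graph.
import numpy as np

# F[i, j] = 1 means user i follows user j.
# User 3 follows users 0 and 1 but is followed by nobody (spammer-like pattern).
F = np.array([
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
], dtype=float)

def eigenvector_centrality(F):
    # Eq. (1): C_E(v_i) = (1/lambda) * sum_j a_ij * c(v_j). We take a_ij = 1
    # iff j follows i (assumed convention), i.e. A is the transpose of F.
    A = F.T
    eigvals, eigvecs = np.linalg.eig(A)
    k = int(np.argmax(eigvals.real))
    lam = eigvals[k].real
    c = np.abs(eigvecs[:, k].real)
    return (A @ c) / lam

def laplacian_energy(F):
    # Eq. (2): E_L(G) = sum_i d_i^2, where d_i is the out-degree of node v_i.
    return float((F.sum(axis=1) ** 2).sum())

def laplacian_centrality(F, i):
    # Eq. (3): C_L(v_i) = E_L(G) - E_L(H), with H the graph without node v_i.
    keep = [j for j in range(F.shape[0]) if j != i]
    H = F[np.ix_(keep, keep)]
    return laplacian_energy(F) - laplacian_energy(H)

print(eigenvector_centrality(F))                                 # per-user scores
print([laplacian_centrality(F, i) for i in range(F.shape[0])])   # per-user scores
```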
3 Proposed Features
We propose five features for detecting evasive spammers. Two of them are graph-based, two are content-based and one is neighbour-based.
Eigenvector centrality (graph-based). We propose eigenvector centrality (see Section 2.4) as a feature to detect spammers. We conjecture that legitimate users typically have friends, family members, relatives and colleagues as both followers and followings. We expect that some of the followers of a legitimate user have a fair number of connections and have high centrality values. Having neighbours with high centrality values, in turn, contributes to the high eigenvector centrality of a legitimate user. Spammers, on the other hand, blindly follow a large number of legitimate users without being followed back and thus without positively affecting their own eigenvector centrality. Even if a spammer buys a large number of followers, those followers are expected to be relatively unimportant or to have few followers themselves, again not contributing to the spammer's eigenvector centrality. We argue that eigenvector centrality is robust: it is expensive for a spammer to change the value of this feature, as doing so requires buying new followers as well as improving the centrality of direct and indirect connections.

Laplacian centrality (graph-based). Laplacian centrality (see Section 2.4) is another graph-based feature that we propose for spam account detection. Considering the type of followers spammers usually have, as well as the accounts they usually follow, we conjecture that spammers are likely to have relatively low Laplacian centrality. As spammers probably do not have well-connected neighbours and usually form networks without community structure, the removal of a spammer from the Twitter graph should have little effect on its Laplacian energy. Legitimate users are likely to have relatively high Laplacian centrality, so the deactivation of a legitimate user is more likely to affect the Laplacian energy of the graph to a noticeable extent. Laplacian centrality is also a robust feature, for similar reasons as in the case of eigenvector centrality.

Optimal matching similarity (content-based). Evasive spammers can post tweets with the same semantics but different words so as to evade features like tweet similarity and duplicate tweets, which are based on syntactic similarity. We propose to compute the average semantic similarity between any two tweets posted by a Twitter user, based on the optimal matching method described in Section 2.3. Spammers who post tweets with different words but the same meaning will have a high value for this feature. Legitimate users usually post tweets with different contents and different semantics and are thus expected to have a relatively low optimal matching similarity score. This feature is more robust than the syntax-based feature tweet similarity, as it takes semantic similarity into account.

LSA similarity (content-based). This feature also targets the semantic similarity of tweets and relies on the LSA-based sentence-to-sentence semantic similarity discussed in Section 2.3. It is included alongside the optimal matching method in order to investigate the effectiveness of a method that uses a large corpus to compute semantic similarity. For the same reasons as in the case of optimal matching similarity, LSA similarity is also a robust feature.

Average neighbours' reputation (neighbour-based). Spammers can control their own behavior, but they usually do not have control over their following accounts.
Inspired by [14], we propose a neighbour-based feature called average neighbours' reputation. It refers to the average reputation of the account's followings:

$$\frac{1}{|\mathit{followings}(v)|} \sum_{i \in \mathit{followings}(v)} \frac{|\mathit{followers}(i)|}{|\mathit{followers}(i)| + |\mathit{followings}(i)|} \qquad (4)$$
where followers(i) and followings(i) denote the set of follower accounts of i and the set of accounts i is following, respectively. This feature reflects the quality of the choice of friends. Spammers can try to manipulate this feature value by buying new accounts and following those new accounts. Then they have to change the number of followings and the number of followers of those purchased accounts to make their reputation values look like those of legitimate users. Considering the money and effort required to do so, average neighbours' reputation is a robust feature.
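As a sketch of how this feature could be computed from a follow graph, the snippet below implements Eq. (4) with networkx on a hypothetical toy graph. The account names and edges are made up for illustration; this is not the paper's extraction pipeline.

```python
# Sketch of the average neighbours' reputation feature (Eq. 4),
# using the convention "edge u -> v means u follows v".
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([
    ("spammer", "alice"), ("spammer", "bob"),   # spammer follows legitimate users
    ("alice", "bob"), ("bob", "alice"),          # mutual follower relationship
    ("carol", "alice"), ("carol", "bob"),
])

def reputation(G, u):
    # |followers(u)| / (|followers(u)| + |followings(u)|)
    f_in, f_out = G.in_degree(u), G.out_degree(u)
    return f_in / (f_in + f_out) if f_in + f_out else 0.0

def avg_neighbours_reputation(G, v):
    followings = list(G.successors(v))
    if not followings:
        return 0.0
    return sum(reputation(G, i) for i in followings) / len(followings)

print(avg_neighbours_reputation(G, "spammer"))  # average over alice and bob
print(avg_neighbours_reputation(G, "carol"))
```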
4 Experimental Setup
We use a random selection of 350 spammers and 400 legitimate users from the labeled data on 1,000 spammers and 10,000 legitimate users collected by Yang et al. [14]. For most users, the neighbour information required to compute the graph-based feature values in our experiments is provided in this data set. Since a few neighbour entries might be missing in the data, the resulting graph-based feature values are only approximations of the actual values; however, like Yang et al., we assume that not too much information is missing from the data. We conduct our experiments using five different machine learning classifiers: Sequential Minimal Optimization (SMO), Random Forest, J48, Naive Bayes and Decorate, as implemented in WEKA (http://www.cs.waikato.ac.nz/ml/weka/). For each machine learning classifier, we use 10-fold cross validation to compute several performance metrics. PAJEK (http://pajek.imfm.si/doku.php?id=pajek) and GEPHI (https://gephi.org) are used to extract graph-based features, while for computing semantic similarity, we use the SEMILAR semantic similarity toolkit (http://deeptutor2.memphis.edu/Semilar-Web). To compute content-based features, we used the 40 most recent tweets posted by each user. To compute tweet similarity, LSA similarity and optimal matching similarity, pairwise similarities were averaged over all pairs formed from these 40 tweets. To analyze the effectiveness of the five features we propose, we evaluate the chosen machine learning algorithms on various feature sets comprised of some of the 18 existing features and some of our 5 features. Table 1 lists all the features we experimented with, sorted by category; for the existing features, the robustness value is adopted from Yang et al. [14]. We consider our proposed features as highly robust, with the exception of the average neighbours' reputation, which we rate as medium robust.
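For readers who want to reproduce the evaluation protocol outside WEKA, the sketch below runs 10-fold cross validation over a placeholder feature matrix with rough scikit-learn analogues of the classifiers listed above. This is an assumption-laden stand-in: the paper used WEKA's implementations, Decorate has no direct scikit-learn counterpart and is omitted, and the random data merely stands in for the 750-user feature matrix.

```python
# Sketch of 10-fold cross validation with loose scikit-learn analogues
# of the WEKA classifiers used in the paper. X and y are placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.random((750, 23))            # placeholder: 23 feature values per user
y = rng.integers(0, 2, size=750)     # placeholder: 1 = spammer, 0 = legitimate

classifiers = {
    "SVM (SMO analogue)": SVC(),
    "Random Forest": RandomForestClassifier(),
    "Decision Tree (J48 analogue)": DecisionTreeClassifier(),
    "Naive Bayes": GaussianNB(),
}
for name, clf in classifiers.items():
    scores = cross_validate(clf, X, y, cv=10, scoring=["accuracy", "f1"])
    print(name, scores["test_accuracy"].mean(), scores["test_f1"].mean())
```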
Table 1: 18 existing and 5 proposed features; the latter are highlighted in bold

Feature | Category | Robustness | Studied by
number of followers | Profile | Low | [1, 13]
number of followings | Profile | Low | [1, 13, 14]
F-F ratio | Profile | Low | [1, 5, 14]
reputation | Profile | Low | [13]
tweets per day | Profile | Low | [5]
age | Profile | High | [5, 14]
URL ratio | Content | Low | [1, 5, 13, 14]
unique URL ratio | Content | Low | [5, 14]
tweet similarity | Content | Low | [5, 14]
hashtag ratio | Content | Low | [1, 13]
reply ratio | Content | Low | [1, 5, 13]
retweet ratio | Content | Low | [1, 8]
**optimal matching similarity** | Content | High | –
**LSA similarity** | Content | High | –
local clustering coefficient | Graph | High | [14]
betweenness centrality | Graph | High | [14]
**eigenvector centrality** | Graph | High | –
**Laplacian centrality** | Graph | High | –
bidirectional links | Graph | Low | [5]
bidirectional links ratio | Graph | Medium | [14]
average neighbours' followers | Neighbour | Low | [14]
average neighbours' tweets | Neighbour | Low | [14]
**average neighbours' reputation** | Neighbour | Medium | –
We divide our machine learning experiments into two groups. In group (A), we combine both robust and non-robust features. The first feature set (A-1) in this group contains only the 18 existing features, while the second feature set (A-2) contains all 23 features. In group (B), we consider only the robust features. Here feature set (B-1) consists of the 4 existing features of high or medium robustness: age, local clustering coefficient, betweenness centrality and bidirectional links ratio, while feature set (B-2) consists of the same 4 existing robust features plus the 5 newly designed features. The last feature set (B-3) in group (B) contains only the newly proposed robust features. Finally, we also ran some experiments with standard feature ranking algorithms implemented in WEKA, namely Chi-Square, Information Gain and ReliefF. These algorithms rank feature f1 higher than feature f2 (i.e., assign f1 a lower number than f2) if they consider f1 more effective than f2 for distinguishing spammers from legitimate users. Based on the obtained ranking, we tested the machine learning algorithms on four further feature sets described below.
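A comparable ranking step can be sketched with scikit-learn: chi2 corresponds to the Chi-Square ranker and mutual_info_classif approximates Information Gain, while ReliefF would require an external package and is omitted. As before, the feature matrix, labels and feature names are placeholders, not the paper's data.

```python
# Sketch of the feature-ranking step with scikit-learn analogues of the
# WEKA rankers (Chi-Square, Information Gain). Placeholder data only.
import numpy as np
from sklearn.feature_selection import chi2, mutual_info_classif

rng = np.random.default_rng(0)
feature_names = [f"feature_{i}" for i in range(23)]   # placeholder names
X = rng.random((750, 23))                             # chi2 needs non-negative values
y = rng.integers(0, 2, size=750)

chi2_scores, _ = chi2(X, y)
ig_scores = mutual_info_classif(X, y)

def ranks(scores):
    # Rank 1 = most useful feature (highest score).
    order = np.argsort(-scores)
    return {feature_names[j]: r + 1 for r, j in enumerate(order)}

print(ranks(chi2_scores)["feature_0"], ranks(ig_scores)["feature_0"])
```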
5 Experimental Results
Table 2 shows the results for our experiments with the two feature sets (A-1) and (A-2) in group (A), where we combine both robust and non-robust features. While SMO is slightly better when ignoring our features, Decorate is slightly
Table 2: Classification results with feature sets containing robust and non-robust features, based on accuracy, F-measure and False Positive (FP) Rate. Highlighted in bold are the best results in the experiment set when the difference is at least 0.005 (i.e., 0.5% for accuracy)

Classifier | (A-1) Accuracy (%) | (A-1) F-measure | (A-1) FP | (A-2) Accuracy (%) | (A-2) F-measure | (A-2) FP
Random Forest | 93.467 | 0.935 | 0.065 | 93.333 | 0.933 | 0.066
SMO | 93.867 | 0.939 | 0.059 | 92.400 | 0.924 | 0.074
J48 | 90.400 | 0.904 | 0.095 | 90.400 | 0.904 | 0.096
Decorate | 92.267 | 0.923 | 0.075 | 93.067 | 0.931 | 0.068
Naive Bayes | 87.866 | 0.879 | 0.124 | 87.600 | 0.876 | 0.125

Here (A-1) denotes the feature set without our features and (A-2) the feature set with our features.
Table 3: Classification results with feature sets containing robust features only, based on accuracy, F-measure and False Positive (FP) Rate. Highlighted in bold are the best results in the experiment set when the difference is at least 0.005 (i.e., 0.5% for accuracy)

Classifier | (B-1) Accuracy (%) | (B-1) F-measure | (B-1) FP | (B-2) Accuracy (%) | (B-2) F-measure | (B-2) FP | (B-3) Accuracy (%) | (B-3) F-measure | (B-3) FP
Random Forest | 92.400 | 0.924 | 0.077 | 94.400 | 0.944 | 0.055 | 87.600 | 0.876 | 0.125
SMO | 69.733 | 0.697 | 0.293 | 83.733 | 0.837 | 0.152 | 68.530 | 0.671 | 0.284
J48 | 91.467 | 0.915 | 0.084 | 92.000 | 0.920 | 0.080 | 84.530 | 0.845 | 0.156
Decorate | 91.467 | 0.915 | 0.081 | 93.067 | 0.931 | 0.069 | 89.200 | 0.892 | 0.107
Naive Bayes | 76.000 | 0.755 | 0.257 | 80.533 | 0.805 | 0.189 | 69.860 | 0.694 | 0.282

Here (B-1) denotes the existing robust features, (B-2) all robust features and (B-3) the new robust features only.
better with our features. Overall it seems that the addition of our features does not affect the classifier performance too much. However, we argue that the existing data still contain relatively few evasive spammers. As spammers start to use more sophisticated methods in order to avoid detection, we will likely need to rely more on robust features. Therefore the results of our experiments in group (B), which focus only on robust features, are of higher importance to our study; see Table 3. Here, all tested classifiers improve their performance when our newly proposed robust features are added to the existing robust features (most notably for SMO and Naive Bayes, where accuracy increases from less than 70% for (B-1) and (B-3) to over 83% for (B-2), and from 76% for (B-1) and roughly 70% for (B-3) to over 80% for (B-2), respectively). Similarly substantial improvements are seen in the F-measure and FP rate in these cases. These results clearly suggest that, collectively, our newly proposed features can help to improve classifier performance when combined with the previously studied features. It is conceivable that some of the features we propose contribute more to the success of the combined feature set (B-2) than others. We hence conducted experiments with three feature ranking methods, as explained in Section 4, in order to determine which of our newly proposed features are most promising. The results are shown in Table 4, which ranks all features, and Table 5, which ranks only the robust features. The proposed graph-based features are ranked highly by all the feature ranking algorithms and are among the top 8 features of all 23 and among the top 4 of
Table 4: Ranking of all features, sorted by average rank obtained from the three ranking methods. The newly proposed features are highlighted in bold

Feature | Information Gain | ReliefF | Chi-Square | Average Rank
bidirectional links ratio | 1 | 2 | 1 | 1.33
bidirectional links | 2 | 10 | 2 | 1.66
F-F ratio | 3 | 3 | 3 | 3.00
reputation | 4 | 1 | 4 | 3.00
**eigenvector centrality** | 7 | 4 | 7 | 6.00
number of followers | 5 | 9 | 5 | 6.33
reply ratio | 12 | 5 | 10 | 9.00
**Laplacian centrality** | 9 | 7 | 13 | 9.66
betweenness centrality | 6 | 19 | 8 | 11.00
age | 11 | 11 | 11 | 11.00
local clustering coefficient | 8 | 22 | 6 | 12.00
URL ratio | 10 | 18 | 9 | 12.33
unique URL ratio | 13 | 13 | 12 | 12.66
number of followings | 14 | 14 | 14 | 14.00
**average neighbours' reputation** | 18 | 6 | 19 | 14.33
average neighbours' followers | 15 | 17 | 15 | 15.66
average neighbours' tweets | 20 | 8 | 20 | 16.00
**optimal matching similarity** | 19 | 15 | 18 | 17.33
tweets per day | 17 | 20 | 16 | 17.66
tweet similarity | 16 | 21 | 17 | 18.00
hashtag ratio | 22 | 12 | 22 | 18.66
**LSA similarity** | 21 | 16 | 21 | 19.33
retweet ratio | 23 | 23 | 23 | 23.00
the 9 robust features. The proposed semantic-similarity-based features, however, seem not as effective as the proposed graph-based features. Average neighbours' reputation does not rank very highly either. In order to determine whether or not the addition of optimal matching similarity, LSA similarity, and average neighbours' reputation is useful, we ran experiments on four further feature sets: one containing all previously studied features plus our two graph-based features (A-1 graph), one containing all previously studied robust features plus our two graph-based features (B-1 graph), one containing the top 8 features in the ranked list of all features in Table 4 (Top 8), and one containing the top 4 features in the ranked list of all robust features in Table 5 (Top 4R). The results are shown in Table 6, which compares all these feature sets to (A-2) and (B-2), containing all (all robust, resp.) features. We present only accuracy values here; F-measure and FP rate showed similar relationships. Table 6 suggests that adding only our proposed graph-based features to the set of all existing features yields a performance comparable to that of using all features. When focusing only on robust features, though, adding optimal matching similarity, LSA similarity, and average neighbours' reputation slightly improves the results over (B-1 graph) for four out of the five tested machine learning algorithms. The results worsen slightly again when using only the top-ranked features from our feature selection experiments. This suggests that semantic similarity features might gain importance on data sets in which spammers invest more effort into evading detection.
Table 5: Ranking of robust features, sorted by average rank obtained from the three ranking methods. The newly proposed features are highlighted in bold

Feature | Information Gain | ReliefF | Chi-Square | Average Rank
bidirectional links ratio | 1 | 2 | 1 | 1.33
**eigenvector centrality** | 3 | 1 | 3 | 2.33
betweenness centrality | 2 | 6 | 4 | 4
**Laplacian centrality** | 5 | 3 | 6 | 4.66
age | 6 | 4 | 5 | 5
local clustering coefficient | 4 | 9 | 2 | 5
**average neighbours' reputation** | 7 | 5 | 8 | 6.66
**optimal matching similarity** | 8 | 7 | 7 | 7.33
**LSA similarity** | 9 | 8 | 9 | 8.66
Table 6: Comparison of various feature sets excluding optimal matching similarity, LSA similarity, and average neighbours' reputation – accuracy values

Classifier | (A-2) | (A-1 graph) | (Top 8) | (B-2) | (B-1 graph) | (Top 4R)
Random Forest | 93.300 | 92.800 | 89.460 | 94.400 | 92.930 | 92.130
SMO | 92.400 | 92.800 | 90.133 | 83.733 | 82.000 | 79.860
J48 | 90.400 | 90.800 | 90.800 | 92.000 | 91.330 | 90.266
Decorate | 93.067 | 93.333 | 90.930 | 93.067 | 93.460 | 90.400
Naive Bayes | 87.600 | 88.000 | 87.460 | 80.533 | 79.333 | 75.730

The first three feature sets include non-robust features; the last three contain only robust features.
That our semantic similarity features are not ranked highly by the feature selection algorithms can be explained by the fact that the data set we used contains users who post tweets in different languages. As the proposed semantic feature values are computed based on English words, these features are not effective on this data set. To further test the effectiveness of these features, we use a reduced data set of 400 users posting English tweets only. In this case, the rank of the feature optimal matching similarity improves for all the feature ranking algorithms; for Information Gain, it improves from 19 to 14. Still, the semantic similarity features seem less effective than our graph-based features. Another difficulty with the data set is that advertisers are not considered spammers unless they post malicious links. We can thus expect that some users labeled 'legitimate' post tweets with similar contents, which also reduces the effectiveness of similarity-based features like tweet similarity or optimal matching similarity in distinguishing spammers from legitimate users. It is worth noting that some non-robust features like F-F ratio, reputation and number of followers are ranked highly by all feature selection algorithms. However, these features can be easily manipulated by evasive spammers and thus may become less important than robust features over time.
6 Conclusion
We proposed several robust features to detect (evasive) Twitter spammers and evaluated their effectiveness for distinguishing spammers from legitimate users.
Our empirical results show that our proposed robust features can improve the performance of classifiers when used along with the four existing robust features. Our further experiments on feature ranking also suggest that the two proposed graph-based features are highly effective in discovering spammers. It will be interesting to experiment with data sets that contain a large proportion of evasive spammers, once such data becomes available. We suspect that all five of our proposed features will become more effective as spammers start to use more sophisticated methods for evading detection.
References

1. Benevenuto, F., Magno, G., Rodrigues, T., Almeida, V.: Detecting spammers on Twitter. In: 7th Annual Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference (2010)
2. Bonacich, P.: Factoring and weighting approaches to status scores and clique identification. Journal of Mathematical Sociology 2(1), 113–120 (1972)
3. Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Research Logistics 52, 7–21 (2005)
4. Lazic, M.: On the Laplacian energy of a graph. Czechoslovak Mathematical Journal 56(4), 1207–1213
5. Lee, K., Caverlee, J., Webb, S.: Uncovering social spammers: social honeypots + machine learning. In: Proc. 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval. pp. 435–442. ACM (2010)
6. Lin, D.: An information-theoretic definition of similarity. In: Proc. 15th International Conference on Machine Learning. pp. 296–304 (1998)
7. Lintean, M.C., Moldovan, C., Rus, V., McNamara, D.S.: The role of local and global weighting in assessing the semantic similarity of texts using latent semantic analysis. In: Proc. 23rd FLAIRS Conference (2010)
8. McCord, M., Chuah, M.: Spam detection on Twitter using traditional classifiers. In: Proc. 8th International Conference on Autonomic and Trusted Computing. pp. 175–186. Springer (2011)
9. Qi, X., et al.: Terrorist networks, network energy and node removal: A new measure of centrality based on Laplacian energy. Social Networking 2, 19–31 (2013)
10. Rus, V., Lintean, M., Moldovan, C., Baggett, W., Niraula, N., Morgan, B.: The SIMILAR corpus: A resource to foster the qualitative understanding of semantic similarity of texts. In: Semantic Relations II: Enhancing Resources and Applications, 8th Language Resources and Evaluation Conference. pp. 50–59 (2012)
11. Song, J., Lee, S., Kim, J.: Spam filtering in Twitter using sender-receiver relationship. Lecture Notes in Computer Science, vol. 6961, pp. 301–317. Springer Berlin Heidelberg (2011)
12. Thomas, K., McCoy, D., Grier, C., Kolcz, A.: Trafficking fraudulent accounts: The role of the underground market in Twitter spam and abuse. In: Proc. USENIX Security Symposium. pp. 196–210 (2013)
13. Wang, A.H.: Don't follow me: Spam detection in Twitter. In: Proc. 2010 International Conference on Security and Cryptography. pp. 1–10 (2010)
14. Yang, C., Harkreader, R., Gu, G.: Empirical evaluation and new design for fighting evolving Twitter spammers. IEEE Transactions on Information Forensics and Security 8(8), 1280–1293 (2013)