Anomaly Detection in Q & A based Social Networks
Neda Soltani1, Elham Hormizi2 and S. Alireza Hashemi Golpayegani3
1 Computer and IT Engineering Department, Amirkabir University of Technology, Tehran, Iran
2 Computer and IT Engineering Department, University of Science and Technology, Babol, Mazandaran
3 Computer and IT Engineering Department, Amirkabir University of Technology, Tehran, Iran
[email protected] [email protected] [email protected]
Abstract. Detection of anomalies in question/answer based social networks is important in terms of finding the best answers and removing unrelated posts. These networks are usually based on users' posts and comments, and the best answer is selected based on the ratings by the users. The problem with the scoring systems is that users might collude in rating unrelated posts or boost their reputation. Also, some malicious users might spam the discussion. In this paper, we propose a network analysis method based on network structure and node property for exploring and detecting these anomalies.
Keywords: Anomaly Detection; Social Networks; Q & A; Reputation boosting; Spam Detection.
1 Introduction
Widespread participation in question and answer sites, where users answer specialized questions, has led to the creation of massive data collections that are growing rapidly. At the same time, it is hard to detect related, correct, and non-spam responses. In order to identify spam, misleading, or irrelevant answers posted in reply to a question or discussion, it is necessary to analyze these responses. Besides natural-language analysis methods, which involve many complexities, some of these anomalies can be identified based on the structure of communication between individuals and the content of the posts. For instance, the authors of [1] state that spammers create star-like sub-networks.
An anomaly is a deviation from expected behavior; that is, there exist patterns in the observed data that do not match the definition of normal behavior. In social networks, anomalies are interaction patterns that differ significantly from those of the network as a whole. The definition of an anomaly depends on the nature of the problem, and various types of anomalies can be defined in social network environments, depending on the network in question. For example, spam emails are known as an anomaly. In a network-based trust system, collusion is identified as another type of anomaly. These are just examples of anomaly types in network structures. Considering the total amount of resources, time, and cost spent on these anomalies, it is necessary to develop solutions to this issue. According to statistics, 67% of email traffic within the period of January to June 2014 was spam. Also, in 82% of cases, social networks were used for online abuse. These examples indicate the importance of the issue.
These anomalies appear as abrupt changes in interactions, or as interactions that are completely different from the usual form in a particular network. For instance, subnets that are created for collusion have certain forms of interaction. Another symptom of anomalies is highly interconnected subnets or star-like structures. Solutions that have been proposed to detect anomalies in social networks fall into two categories: 1) checking and comparing the network model with a normal interaction model, and 2) checking network attributes. Therefore, detection of anomalies in social networks involves the selection and calculation of network characteristics, followed by classification and observation in the characteristics space.
The first challenge is the definition of normal behavior. Because of the diversity of individuals and available nodes, social networks do not have a fixed and balanced structure in all components, so a normal structure cannot be defined for such networks. Another issue is that the distribution of node degrees and the network structure of communities change over time. The scenarios presented for a normal structure are not necessarily real-time, and a network may change before its structure is extracted. Anomaly detection includes the following steps [1]: 1) determining the smallest unit affected by the behavior, 2) identifying characteristics that differ from normal states, 3) determining the context, 4) calculating the characteristics and extracting a characteristic space, and 5) calculating the distance between observations.
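Steps 4 and 5 above can be illustrated with a minimal sketch. It assumes that each observation is a plain list of numeric characteristics, standardizes every characteristic, and measures each observation's Euclidean distance from the center of the characteristic space. The function names and the toy data are hypothetical and are not taken from [1].

```python
from math import sqrt
from statistics import mean, pstdev


def standardize(rows):
    """Rescale every column to zero mean and unit (population) standard deviation."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # A constant column has zero deviation; fall back to 1.0 to avoid dividing by zero.
    deviations = [pstdev(col) or 1.0 for col in columns]
    return [
        [(x - m) / s for x, m, s in zip(row, means, deviations)]
        for row in rows
    ]


def distance_from_center(rows):
    """Euclidean distance of each standardized observation from the origin."""
    return [sqrt(sum(v * v for v in row)) for row in standardize(rows)]


# Toy characteristic space: three similar observations and one that deviates.
observations = [[1, 1], [1, 1], [1, 1], [10, 1]]
distances = distance_from_center(observations)
most_distant = distances.index(max(distances))
```

Here the observation with the largest distance is the candidate anomaly; which distance measure is appropriate depends on the context chosen in step 3.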
The difference between anomaly detection in social networks and in other areas is that in social networks we have individuals, who carry characteristics, and the relationships between them, which are relevant to those characteristics. Networks may be static or dynamic, labeled or unlabeled, and local or global; all of these affect the definitions used in the network, including the definition of anomalies. Therefore, a method used for anomaly detection in a friendship social network does not necessarily give optimal results in an authors' network. In this paper, we use social network analysis methods to detect anomalies in content sent by users in a question and answer based social network. To achieve this goal, we first have to define the anomaly type and second, present a detection method based on the network and anomaly properties. Then, we use network analysis methods to apply the presented method to the selected network. The main contribution of this paper is using node properties along with graph structure for detecting anomalies. The remainder of the paper is organized as follows: in the next section, we review recent work in this area. In Section 3, the problem statement is presented in detail, followed by our proposed solution methodology. Section 4 covers the experiments and the results of our tests, and finally, in Section 5, we conclude our work and discuss future work.
2 Related work
The types of anomalies, from the perspective of anomaly detection, fall into the following categories [1]: static unlabeled anomalies, static labeled anomalies, dynamic unlabeled anomalies, and dynamic labeled anomalies. Detection of anomalies is critical in preventing malicious activities such as bullying, planning terrorist attacks, and disseminating counterfeit information. The authors of [2] examine the work that has been done to detect anomalies in social networks, focusing on the effects of new anomalies in social media and on the most recent techniques for identifying specific types of anomalies. There are also a variety of studies on the detection of anomalies with respect to the data types and data attributes of the social network. In [3,4,5,8], anomalies are detected in network data, with a focus on graph data, including edge weights. The "OddBall" approach builds an "ego-net" for each node, i.e. the sub-graph formed by the node of interest and its neighboring nodes, and then designs a small list of numerical features for it. Detection of anomalies in temporal data has been addressed by [7, 9, 10]. The key idea is to learn a Granger graphical model on a reference data set and then, assuming that the time dependence is the same as in the reference data, to test the given dataset under a series of restrictions on the learned model; detection is also sped up by several randomized and parallel optimization algorithms. The methods proposed in these papers are reported to improve accuracy and stability. Reference [11] discusses advances in detecting fraud and anomalies in social network data, including point anomaly detection, and presents a taxi driving fraud detection system. To implement the system, a large number of GPS traces of 500 taxi drivers were used to systematically investigate fraudulent activities of taxi drivers.
The work in [12] uses an algorithm called WSARE 3.0, which can detect anomalies in simulated data with the earliest possible detection time and a low number of false positives. Some articles also address the detection of group anomalies in social networks, applications, and systems. In [13], a framework is used to identify implicit social relations and closely connected entities in a dataset, with the aim of finding unusually similar users in real-world datasets. This approach requires a model of how closely linked users behave, a model for independent users, and a method for distinguishing between them. In [14], a graphical model called GLAD is proposed, which is able to discover the group structure of social networks and to detect group anomalies; the required tests are performed on real and synthetic datasets with injected anomalies. The method in [15] automatically checks the nodes of a multi-layer network based on the degree of similarity of the nodes to stars in different layers. By parallelizing the feature extraction and anomaly detection operations across the layers of the multi-layer network, and by distributing the inputs over different machine cores, the computation is sped up significantly. The paper [16] analyzes the distribution of event arrival times and the volume of events, such as comments and online reviews, in order to rank and detect suspicious users such as spammers, bots, and Internet fraudsters. In that paper, a model called VOLTIME is presented that captures the distribution of arrival times for real users.
Another study [17] is based on the idea that the more a user's behavior diverges from what can be considered 'normal behavior', the more risk it carries. Because similar users follow similar rules on social networks, this assessment is organized in two phases: similar users are first grouped together; then, for each identified group, one or more models of normal behavior are constructed. The work in [18] uses recorded sessions to decide whether each session is abnormal and to determine the degree of anomaly of each session. Performing robust statistical analysis on such data is very challenging, as the number of observed sessions is much smaller than the number of network users. The method put forward in that paper detects anomalies in very high dimensions based on hyper-graphs, an important extension of graphs in which an edge can connect more than two vertices simultaneously. Table 1 shows a comparison between the abovementioned studies.
Table 1. Comparison between recent researches on Social Networks Anomaly Detection.
Reference | Anomaly Type | Target Network | Method | Node/Edge Property Included
[3,4,5,8] | Anomalies Nodes | Weight Graph | OddBall, ego-net Patterns, Hybrid Method for Outlier Node Detection | Density, Weights, Ranks and Eigenvalues, use Node and Edge
[7,9,10] | Time-Series Anomaly Detection | Weight Graph | Granger graphical model | Edge, Weight
[11] | Point Anomaly Detection | Weight Graph | Taxi Driving Fraud Detection System | Edge, Weight
[12] | Bayesian Network Anomaly Detection | Bayesian Network | WSARE 3.0 Algorithm, Simulation | Edge, Time
[13] | Intrusion Detection | Graph Network | Tribes algorithm | Node
[14] | Group anomaly Detection | Graph Network | Group Latent Anomaly Detection (GLAD) model, dGLAD | Node, weight
[15] | Multilayer Networks | Unsupervised, Parameter-Free, and Network | ADOMS (Anomaly Detection On Multilayer Social networks) | Node, Edge, Weight
[16] | Suspicious Users | Unsupervised Anomaly Detection | VOLTIME Model | Time, Node
[17] | User Anomalous Behaviors | Online Social Networks | Two-Phase Risk Assessment Approach | Time
[18] | Anomaly Detection | Weighted graphs | OddBall Algorithm | Node, Density, Weights, Ranks
3 Problem Statement and Solution Methodology
As mentioned in the introduction, we are looking for anomalies in this dataset. We limit the anomaly types to spam and reputation-boosting sub-networks. Therefore, the following questions are to be answered on the dataset:
1. Which users submit answers that are irrelevant to the question, are spam, or aim at misleading the discussion?
2. Which users boost their reputation on a mendacious basis?
We have ignored comments for several reasons. First, we want to keep track of the discussion, which is mainly contained in the posts, not the comments. Second, it would be a time-consuming task to merge the comments with the posts, as the dataset provides comments separately. Furthermore, comments are written in response to a single post and mostly contain details about that post, not the whole question. Finally, ratings and badges are based on posts, not comments. So, the specific types of anomaly we are looking for would be found in posts.
3.1 Methodology
In this section, we present the analyses performed on the proposed network. The analyses aim at detecting spammer accounts and, as a result, the spam answers. Based on [4, 6], spammers create a star-like network. So, we first detect star-like sub-networks. To do so, we have to create the ego-net of each individual node and then study its neighbor nodes. A star-like sub-network is detected if few of the neighbors connect directly to one another. The node at the center of a star-like sub-network is a spammer with high probability. The other question mentioned in the previous section concerns detecting the nodes that try to falsely boost their reputation. This is done by detecting communities whose internal connections are unusually tight [19].
Finding star-like structure. In order to detect star-like structures, we have to detect cliques of size 3, i.e. triads, in the ego network of each node.
In order to study ego networks, we choose the nodes with the highest betweenness; as these nodes connect components of the network to each other, they are likely to create star-like structures. Fig. 1 shows the pseudo code of the algorithm proposed in this paper for detecting star-like ego-networks.
Given graph G containing n nodes:
foreach node i of nodes
    betweenness[i] = calculateBetweenness(i, G)
end foreach
sort nodes based on betweenness
foreach node j of sorted nodes
    egonet[j] = findEgoNetwork(j, G)
    neighbornet[j] = findNeighborNetwork(j, G, egonet[j])
    c = numberOfConnectedNodes(neighbornet[j])
    m = numberOfNodes(neighbornet[j])
    if (c/m < 0.5)    // structure is star like
        if (reputation < reputationThreshold && upvotes < downvotes)    // possibly spammer
            if (averagevalues < totalaveragevalues)
                spammer detected
            end if
        end if
    end if
end foreach
Fig. 1. Pseudo code for the proposed algorithm
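The checks in Fig. 1 can be sketched in Python as follows. This is an illustrative reading of the pseudo code, not the authors' implementation: it assumes an undirected graph stored as a dictionary of neighbor sets, interprets c as the number of neighbors that are linked to at least one other neighbor, and uses a single hypothetical `avg_score` property in place of the unspecified `averagevalues`. The property names and `reputation_threshold` are likewise assumptions.

```python
from itertools import combinations


def neighbor_connectivity(adj, node):
    """Return (c, m): neighbors linked to another neighbor, and all neighbors."""
    neighbors = adj[node]
    connected = set()
    for u, v in combinations(neighbors, 2):
        if v in adj[u]:  # u and v close a triad with `node`
            connected.update((u, v))
    return len(connected), len(neighbors)


def is_star_like(adj, node, ratio_threshold=0.5):
    """Star-like if fewer than half of the neighbors connect to each other."""
    c, m = neighbor_connectivity(adj, node)
    return m > 0 and c / m < ratio_threshold


def classify(adj, node, props, overall_avg, reputation_threshold):
    """Apply the three nested checks of Fig. 1 to a single node."""
    if not is_star_like(adj, node):
        return "normal"
    p = props[node]
    if not (p["reputation"] < reputation_threshold and p["upvotes"] < p["downvotes"]):
        return "star-like"
    if p["avg_score"] < overall_avg:
        return "spammer"
    return "possible spammer"


# Toy graph: node 0 is the hub of a star; only neighbors 1 and 2 know each other.
adj = {0: {1, 2, 3, 4, 5}, 1: {0, 2}, 2: {0, 1}, 3: {0}, 4: {0}, 5: {0}}
props = {0: {"reputation": 5, "upvotes": 1, "downvotes": 4, "avg_score": 0.2}}
label = classify(adj, 0, props, overall_avg=1.75, reputation_threshold=100)
```

In this toy graph 2 of the 5 neighbors of node 0 are connected, so c/m = 0.4 and the ego network counts as star-like. In the full procedure the nodes would be visited in decreasing order of betweenness.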
Detecting highly interconnected communities. Another type of anomaly considered in this paper is collusion in order to boost reputation. Based on [1], [19], this type of anomaly is detected by finding highly interconnected communities. Communities with this property are almost isolated from the rest of the network and have a large number of internal edges. When looking for this type of community, edge weights become important. In the first scenario we used to create the network, we did not consider edge weights. In order to add weights to edges in such a way that they show the level of connectivity between two nodes, we use the number of times one node answers another node's questions as the weight of the edge between those nodes. Considering the nature of the anomaly we want to detect, we can omit edge directions, as we are looking for high interconnectedness. We assume that these sub-networks contain malicious users who try to boost their own reputation by asking or answering one another's questions. Communities are detected by identifying isolated components of the network (Fig. 2).
Given graph G containing n nodes:
communities = detectCommunities(G)
foreach community i in communities
    structure = communityStructure(community[i])
    if (structure is not star like)
        clique = findBiggestClique(community[i])
        foreach node j in clique
            if (reputation[j] >> averageReputation && upvotes >> downvotes)
                possible anomaly
            end if
        end foreach
    end if
end foreach
Fig. 2. Pseudo code for the algorithm we presented for detecting anomalous communities
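The building blocks used in this part of the method can be sketched as follows: an undirected graph weighted by the number of answers exchanged between two users, communities taken as connected components, and the biggest clique inside a community. This is a minimal illustration under stated assumptions; the function names are hypothetical, self-answers are dropped by assumption, and the clique search is a plain Bron-Kerbosch enumeration that is only practical for small communities.

```python
from collections import Counter


def build_weighted_undirected(answers):
    """answers: (answerer, asker) pairs; weight = answers exchanged by the pair."""
    weights = Counter()
    for a, b in answers:
        if a != b:  # assumption: ignore users answering their own questions
            weights[frozenset((a, b))] += 1
    adj = {}
    for pair in weights:
        u, v = tuple(pair)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj, weights


def connected_components(adj):
    """Isolated components found by depth-first search."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        components.append(component)
    return components


def largest_clique(adj, nodes):
    """Bron-Kerbosch without pivoting, restricted to `nodes` (worst case exponential)."""
    best = set()

    def expand(r, p, x):
        nonlocal best
        if not p and not x:  # r is a maximal clique
            if len(r) > len(best):
                best = set(r)
            return
        for v in list(p):
            nbrs = adj[v] & nodes
            expand(r | {v}, p & nbrs, x & nbrs)
            p = p - {v}
            x = x | {v}

    expand(set(), set(nodes), set())
    return best


answers = [(1, 2), (2, 1), (1, 3), (2, 3), (4, 5)]
adj, weights = build_weighted_undirected(answers)
communities = sorted(connected_components(adj), key=len, reverse=True)
clique = largest_clique(adj, communities[0])
```

The members of the returned clique are the nodes whose reputation and vote counts would then be compared against the network averages.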
4 Experiments and results
4.1 Dataset Specifications
The dataset has been downloaded from the Stack Exchange site and includes the questions of the "Android" category on this site. This dataset contains user information, badges, comments posted below posts, questions and answers, the history of post changes, post links, and the votes registered for each post. Each kind of information is stored in a separate XML file [18]. The Stack Exchange site applies a control mechanism to posts and users: each post gets a negative or positive rating from users, and badges are awarded to users based on their posts. Also, a user's reputation is based on their posts, the number of their answers marked as correct by other users, and so on. To work with this dataset, we first import the information into Excel and save the sections in CSV format. Then, in order to load the data into the Pajek software, a Java program reads the files and saves the nodes and edges in separate files.
Network Creation Scenarios. One method to detect spam is detecting spammer accounts. Therefore, if we create a network of users and analyze it in order to find the spammer accounts, we can simply flag the posts by those accounts as spam. Obviously, we will not be able to detect spam sent by normal users. In the aforementioned network, nodes are users. Each edge represents a reply by one user to another user's post. Therefore, an edge connecting user u1 to user u2 shows that user u1 has answered one of user u2's questions. Edges are directed (from u1 towards u2). Therefore, a user with a high in-degree is one whose questions have been answered by many users, and a user with a high out-degree is one who has answered the questions of many users. The latter users are more important to us now, as we consider spam answers. Nodes have properties including id, reputation, account creation date, name, age, positive votes count, negative votes count, and badges. We use these properties to detect spammer users.
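The network creation scenario can be sketched in Python as below. This is an illustration rather than the Excel, Java, and Pajek pipeline described above. It assumes post records carrying the `Id`, `PostTypeId`, `ParentId`, and `OwnerUserId` attributes of the data dump's post file, with `PostTypeId` "1" for questions and "2" for answers; the function names and the decision to skip self-answers are assumptions.

```python
from collections import Counter


def answer_edges(rows):
    """Directed edges (answerer, asker) with the number of answers as weight.

    rows: a list of dicts holding post attributes as strings.
    """
    question_owner = {
        r["Id"]: r.get("OwnerUserId") for r in rows if r["PostTypeId"] == "1"
    }
    edges = Counter()
    for r in rows:
        if r["PostTypeId"] != "2":
            continue
        answerer = r.get("OwnerUserId")
        asker = question_owner.get(r.get("ParentId"))
        # Skip posts with a missing owner and (by assumption) self-answers.
        if answerer and asker and answerer != asker:
            edges[(answerer, asker)] += 1
    return edges


def degrees(edges):
    """In-degree: distinct users who answered you. Out-degree: distinct users you answered."""
    in_deg, out_deg = Counter(), Counter()
    for answerer, asker in edges:
        out_deg[answerer] += 1
        in_deg[asker] += 1
    return in_deg, out_deg


rows = [
    {"Id": "1", "PostTypeId": "1", "OwnerUserId": "10"},
    {"Id": "2", "PostTypeId": "2", "ParentId": "1", "OwnerUserId": "20"},
    {"Id": "3", "PostTypeId": "2", "ParentId": "1", "OwnerUserId": "30"},
    {"Id": "4", "PostTypeId": "1", "OwnerUserId": "20"},
    {"Id": "5", "PostTypeId": "2", "ParentId": "4", "OwnerUserId": "30"},
]
edges = answer_edges(rows)
in_deg, out_deg = degrees(edges)
```

Users who never asked or answered a question never appear in `edges`, so solitary nodes are left out of the network by construction.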
A large number of users are solitary, i.e. they have neither asked a question nor answered anyone else's question. We remove these solitary nodes, which results in the network illustrated in Fig. 3. The network is created from users, based on the answers of each user to other users' questions. The network has several separate components. In many cases a user has asked only one question, which was answered by only one user, and neither of them interacts with the rest of the users. In the following section, we explain the implementation of our proposed solution. It includes several visualizations of the resulting network, which represent nodes as small circles (each of which represents a user who either answers a question or asks one). A connection between two nodes shows an answer from one user to the other's question.
Fig. 3. Network created based on scenario
4.2 Implementation
Detecting Star-like Ego-net. In order to find the possible spammer accounts, we order the nodes by betweenness and examine the nodes with the highest values first. The first experiment is done on user 137, who has the highest betweenness. Fig. 4 shows the neighbor network of node 137, Fig. 5 shows its ego-net, and Fig. 6 shows the triads of the network in Fig. 4.
Fig. 4. Neighbor network of user 137
Fig. 5. Ego network of node 137
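The betweenness ordering used to pick candidate nodes can be sketched as follows. This is a plain-Python version of Brandes' algorithm for an unweighted, undirected graph stored as a dictionary of neighbor sets; it is an illustration and not the Pajek computation used in the experiments. It returns unnormalized values, so its scale is not meant to match the Cb values listed in the tables.

```python
from collections import deque


def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes' algorithm, BFS based)."""
    cb = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:  # w is discovered for the first time
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies in order of decreasing distance
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                cb[w] += delta[w]
    # In an undirected graph every pair of endpoints is counted twice.
    return {v: value / 2 for v, value in cb.items()}


def rank_by_betweenness(adj):
    """Nodes sorted from highest to lowest betweenness."""
    cb = betweenness(adj)
    return sorted(cb, key=cb.get, reverse=True)


path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
```

On the three-node path only the middle node lies between another pair, and in the star every pair of leaves is connected through the hub, which is why hubs of star-like structures rank first.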
Of the 105 nodes in the neighborhood of node 137, 50 form a neighbor network with one another. Therefore, the ego network of 137 is not a star-like structure, as more than 70% of its neighbors are connected to each other. Table 2 shows the properties of node 137, which are used to decide whether anything abnormal exists about this node. The next node in decreasing order of betweenness is 16575. Fig. 7 and Fig. 8 show the ego-net and the neighbor network of this node, respectively. There are 502 nodes in the neighborhood of 16575, but only 135 of them are connected to each other. In order to analyze this node further, we check its properties (Table 3). Considering the upvote count of this node compared to its downvote count, its high reputation, and its 79 badges, it is unlikely for this node to be a spammer, although the ego network of this user is quite close to a star structure.
Fig. 6. Triads of neighbor network of node 137
Table 2. Node 137 properties
ID | Reputation | CreationDate | DisplayName | UpVotes | DownVotes | Age | Cb
137 | 14905 | 2010-09-14T02:48:38.087 | Matt | 1236 | 18 | - | 0.0040
Fig. 7. Ego network of 16575
Fig. 8. Neighbor network of 16575
Table 3. Properties of node 16575
ID | Reputation | CreationDate | DisplayName | UpVotes | DownVotes | Cb
16575 | 45479 | 2012-07-02T20:06:13.047 | Izzy | 1452 | 213 | 0.0034
The third experiment is done on user 1465. 110 nodes out of the 272 nodes in 1465's neighborhood are connected to each other (45%). Considering this node's properties, we can see that it has a high reputation, but its downvotes outnumber its upvotes. It is therefore possible that 1465 is a spammer (Fig. 9, 10). Considering the other properties of this node, we can see that this user has had 1012 posts with an average rating of 3.32, an average view count of 20500, an average answer count of 1.33, and an average comment count of 1.42. We compare these numbers to the overall average values (Table 4). The average values for user 1465 are above, or almost equal to, the overall values, based on which we conclude that user 1465 is not a spammer, despite the prior guess. Other nodes with a high betweenness are studied the same way.
Fig. 9. Ego network of 1465
Fig. 10. Neighbor network of 1465
Table 4. Properties of node 1465 compared to the overall average
Average | Score | ViewCount | AnswerCount | CommentCount | FavoriteCount
All data | 1.75 | 2937.04 | 1.175 | 1.226 | 1.655
1465 | 3.32 | 20500.61 | 1.33 | 1.42 | 5.762
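The comparison in Table 4 can be written as a small check. The sketch below uses the values of Table 4 and an assumed decision rule, namely that a user is flagged only when every averaged metric falls below the overall average; the paper does not state the rule in this exact form, and the function names are hypothetical.

```python
# Average values taken from Table 4.
overall = {
    "Score": 1.75,
    "ViewCount": 2937.04,
    "AnswerCount": 1.175,
    "CommentCount": 1.226,
    "FavoriteCount": 1.655,
}
user_1465 = {
    "Score": 3.32,
    "ViewCount": 20500.61,
    "AnswerCount": 1.33,
    "CommentCount": 1.42,
    "FavoriteCount": 5.762,
}


def below_average_metrics(user, reference):
    """Names of the metrics on which the user falls below the reference."""
    return [name for name in reference if user[name] < reference[name]]


def looks_like_spammer(user, reference):
    """Assumed rule: flag only if every metric is below the overall average."""
    return len(below_average_metrics(user, reference)) == len(reference)


flagged = looks_like_spammer(user_1465, overall)
```

For user 1465 no metric is below the overall average, so the user is not flagged, in line with the conclusion drawn in the text.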
Detecting communities. Communities are detected by identifying isolated components of the network. We omit components having fewer than 4 nodes. The result is shown in Fig. 11. We then take the biggest component, detect the communities inside it, and remove the edges that connect these communities to each other (Fig. 12).
Fig. 11. Communities in the network
Fig. 12. Communities inside the biggest component of the network after removing components having fewer than 4 nodes and edges between components.
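The two filtering operations described above can be sketched as follows. The sketch assumes that components are given as sets of nodes and that a community label per node is already available from some community detection step, which the text does not specify; the names and the toy labels are hypothetical.

```python
def drop_small_components(components, min_size=4):
    """Keep only the components with at least `min_size` nodes."""
    return [c for c in components if len(c) >= min_size]


def drop_inter_community_edges(edges, membership):
    """Keep an edge only if both endpoints carry the same community label."""
    return [(u, v) for u, v in edges if membership[u] == membership[v]]


components = [{1, 2, 3, 4, 5}, {6, 7}, {8, 9, 10, 11}]
kept = drop_small_components(components)

edges = [(1, 2), (2, 3), (3, 4), (4, 5)]
membership = {1: "a", 2: "a", 3: "a", 4: "b", 5: "b"}  # hypothetical labels
inner_edges = drop_inter_community_edges(edges, membership)
```

After this step each community can be inspected on its own, since no remaining edge crosses a community boundary.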
In order to detect highly interconnected communities, each community is studied separately. For each community, we study the degree distribution, the most central node, and the average reputation of the community. As seen in Fig. 13, the sub-network has a star-like structure and is not highly interconnected. The most central node has the following properties (Table 5).
Fig. 13. Community with the highest number of nodes
Table 5. Properties of node 40036
ID | Reputation | CreationDate | DisplayName | UpVotes | DownVotes | Age
40036 | 3705 | 2013-08-25T09:42:20.677 | RossC | 913 | 885 | -
This user's reputation is higher than the total average reputation. Nothing is anomalous about this node, so we move on to the next community. One of the communities does not have a star-like structure, which makes it possible for it to be highly interconnected (Fig. 14). The biggest clique in it is shown in Fig. 15.
Fig. 14. A community, which is not star like
Fig. 15. Biggest clique
Table 6. 10 highest degree centrality nodes are as follows:
ID | Reputation | CreationDate | DisplayName | UpVotes | DownVotes | Age | Degree
137 | 14905 | 2010-09-14 | Matt | 1236 | 18 | - | 70
10 | 18945 | 2010-09-13 | Bryan Denny | 1481 | 30 | 29 | 65
482 | 15609 | 2010-09-27 | Lie Ryan | 3591 | 141 | - | 56
15 | 4856 | 2010-09-13 | gary | 1498 | 44 | - | 31
594 | 3820 | 2010-10-02 | Edelcom | 376 | 2 | 54 | 23
366 | 915 | 2010-09-22 | Casebash | 154 | 1 | 28 | 18
86 | 2168 | 2010-09-13 | FoleyIsGood | 165 | 1 | 33 | 17
382 | 1804 | 2010-09-22 | BrianCooksey | 119 | 0 | 49 | 16
7 | 1687 | 2010-09-13 | Jonas | 78 | 17 | - | 16
280 | 520 | 2010-09-21 | Radek | 159 | 0 | - | 15
All the nodes in Table 6 were created within three weeks of one another (between 2010-09-13 and 2010-10-02). Most of them have a high reputation, and their upvotes far outnumber their downvotes. The clique found in the aforementioned community is possibly an anomaly because it resembles a highly interconnected sub-network. Given that most other communities share a star-like structure, this structure is abnormal.
The reason most communities have a star-like structure is that experts in each field answer questions within their own expertise and rarely answer questions in all fields. Therefore, most users have asked few questions, and these questions have been answered by a small number of experts in that specific field, who are at the centers of the stars. For the community that has a different structure, there are two possible hypotheses: either there are a number of experts who communicate with one another and rarely answer the questions of other users, or there are users in it who have joined the network in order to get badges and boost their reputation. Considering the creation times of the users in this clique, the second hypothesis is further reinforced. Other communities also exist whose structures differ from the star-like sub-network; Fig. 16 shows them.
Fig. 16. Other communities with star structure
5 Conclusion & Discussion
In this paper, we presented a solution to detect anomalies in social networks. We focused on a well-known Q & A network; therefore, the anomalies were defined as inappropriate answers (e.g. spam) and false reputation boosting. In order to detect these two types of anomalies, we suggested and applied two different approaches. For detecting spammers, we used a methodology that detects star-like ego networks, and for detecting false reputation boosting, we detected highly interconnected networks. As another contribution of this paper, we considered network structure and node properties at the same time, which helps to obtain more accurate results. Detecting anomalies in social networks depends highly on the type, structure, and content of the network. Different network scenarios exist depending on the type of anomaly to be detected, and the solution differs depending on the network creation scenario. All of this makes it impossible to present a general-purpose anomaly detection method. The limitations of the research include the challenges of combining network analysis results with mining results on node properties. As seen in this paper, we analyzed node properties after finding the most probable abnormal nodes using network solutions. Yet, there is no unique systematic solution to this.
As future paths for this research, one can consider the following: 1) analysis and detection of other possible types of anomalies in a typical Q & A social network, such as spurious expertise, irrelevant answers, offensive comments, etc.; 2) extension of the research to user feedback-based areas such as product reviews, discussion forums, and social groups, each of which is a potential target for spam and reputation boosting; 3) implementation of different network generation scenarios, e.g. a weighted graph of users based on the number of interactions between two users, or a second-layer network generated from the keywords of users and questions. These scenarios might help to better detect abnormal behavior within the current context.
References
1. Savage, D., Zhang, X., Yu, X., Chou, P., Wang, Q.: Anomaly detection in online social networks. Social Networks 39, 62-70 (2014).
2. Liu, Y., Chawla, S.: Social media anomaly detection: Challenges and solutions. In: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, pp. 817-818. ACM, Cambridge, United Kingdom (2017).
3. Akoglu, L., McGlohon, M.: Anomaly detection in large graphs. CMU-CS-09-173 Technical Report (2009).
4. Akoglu, L., McGlohon, M., Faloutsos, C.: oddball: Spotting Anomalies in Weighted Graphs. In: Zaki, M.J., Yu, J.X., Ravindran, B., Pudi, V. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2010. LNCS, vol 6119. Springer, Berlin, Heidelberg (2010).
5. Chandola, V., Banerjee, A., Kumar, V.: Anomaly detection for discrete sequences: A survey. IEEE Transactions on Knowledge and Data Engineering 24(5), 823-839 (2012).
6. Sun, J., Qu, H., Chakrabarti, D., Faloutsos, C.: Neighborhood formation and anomaly detection in bipartite graphs. In: Fifth IEEE International Conference on Data Mining, pp. 418-425. IEEE Computer Society, Washington, DC, USA (2005).
7. Cheng, H., Tan, P. N., Potter, C., Klooster, S.: Detection and characterization of anomalies in multivariate time series. In: Proceedings
8. Tong, H., Lin, C.-Y.: Non-Negative Residual Matrix Factorization with Application to Graph Anomaly Detection. In: Proceedings of the 2011 SIAM International Conference on Data Mining, pp. 143-153. Society for Industrial and Applied Mathematics (2011).
9. Qiu, H., Liu, Y., Subrahmanya, N.A., Li, W.: Granger causality for time-series anomaly detection. In: IEEE 12th International Conference on Data Mining (ICDM), pp. 1074-1079. IEEE (2012).
10. Sun, P., Chawla, S., Arunasalam, B.: Mining for outliers in sequential databases. In: Proceedings of the 2006 SIAM International Conference on Data Mining, pp. 94-105. Society for Industrial and Applied Mathematics (2006).
11. Ge, Y., Xiong, H., Liu, C., Zhou, Z.H.: A taxi driving fraud detection system. In: 2011 IEEE 11th International Conference on Data Mining (ICDM), pp. 181-190. IEEE (2011).
12. Wong, W.K., Moore, A.W., Cooper, G.F., Wagner, M.M.: Bayesian network anomaly pattern detection for disease outbreaks. In: Proceedings of the 20th International Conference on Machine Learning (ICML-03), pp. 808-815. IEEE (2003).
13. Friedland, L., Jensen, D.: Finding tribes: identifying close-knit individuals from employment patterns. In: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 290-299. ACM, Vancouver (2007).
14. Yu, R., He, X., Liu, Y.: Glad: group anomaly detection in social media analysis. ACM Transactions on Knowledge Discovery from Data (TKDD) 10(2), p.18 (2015).
15. Bindu, P.V., Thilagam, P.S., Ahuja, D.: Discovering suspicious behavior in multilayer social networks. Computers in Human Behavior 73, 568-582 (2017).
16. Chino, D.Y., Costa, A.F., Traina, A.J., Faloutsos, C.: VolTime: Unsupervised Anomaly Detection on Users' Online Activity Volume. In: Proceedings of the 2017 SIAM International Conference on Data Mining, pp. 108-116. Society for Industrial and Applied Mathematics (2017).
17. Laleh, N., Carminati, B., Ferrari, E.: Risk Assessment in Social Networks based on User Anomalous Behaviour. IEEE Transactions on Dependable and Secure Computing (2016).
18. Stack Exchange Data Dump, https://archive.org/details/stackexchange, last accessed (2017/11/09).
19. Pandit, S., Chau, D.H., Wang, S., Faloutsos, C.: Netprobe: a fast and scalable system for fraud detection in online auction networks. In: Proceedings of the 16th international conference on World Wide Web, pp. 201-210. ACM (2007).