Anomaly Detection in Dynamic Networks of Varying Size

11 downloads 5380 Views 2MB Size Report
Nov 13, 2014 - network statistics calculated on the graph examples. The ac- ..... the probability of the two nodes having the same degree ap- proaches zero. ..... steps 88-89, and (c) Degree Shift: Facebook time steps 249-250. Each plot ...
Anomaly Detection in Dynamic Networks of Varying Size 1

Timothy La Fond1 , Jennifer Neville1 , Brian Gallagher2 Purdue University, 2 Lawrence Livermore National Laboratory {tlafond,neville}@purdue.edu, [email protected]

arXiv:1411.3749v1 [cs.SI] 13 Nov 2014

ABSTRACT Dynamic networks, also called network streams, are an important data representation that applies to many real-world domains. Many sets of network data such as e-mail networks, social networks, or internet traffic networks are best represented by a dynamic network due to the temporal component of the data. One important application in the domain of dynamic network analysis is anomaly detection. Here the task is to identify points in time where the network exhibits behavior radically different from a typical time, either due to some event (like the failure of machines in a computer network) or a shift in the network properties. This problem is made more difficult by the fluid nature of what is considered ”normal” network behavior. The volume of traffic on a network, for example, can change over the course of a month or even vary based on the time of the day without being considered unusual. Anomaly detection tests using traditional network statistics have difficulty in these scenarios due to their Density Dependence: as the volume of edges changes the value of the statistics changes as well making it difficult to determine if the change in signal is due to the traffic volume or due to some fundamental shift in the behavior of the network. To more accurately detect anomalies in dynamic networks, we introduce the concept of DensityConsistent network statistics. These statistics are designed to produce results that reflect the state of the network independent of the volume of edges. On synthetically generated graphs anomaly detectors using these statistics show a a 20400% improvement in the recall when distinguishing graphs drawn from different distributions. When applied to several real datasets Density-Consistent statistics recover multiple network events which standard statistics failed to find, and the times flagged as anomalies by Density-Consistent statistics have subgraphs with radically different structure from normal time steps.

1.

INTRODUCTION

Network analysis is a broad field but one of the more important applications is in the detection of anomalous or critical

events. These anomalies could be a machine failure on a computer network, an example of malicious activity, or the repercussions of some event on a social network’s behavior [16, 12]. In this paper, we will focus on the task of anomaly detection in a dynamic network where the structure of the network is changing over time. For example, each time step could represent one day’s worth of activity on an e-mail network. The goal is then to identify any time steps where the pattern of those communications seems abnormal compared to those of other time steps. As comparing the communication pattern of two network examples directly is complex, one simple approach is to summarize each network using a network statistic then compare the statistics. A number of anomaly detection methods rely on these statistics [15, 2, 8]. Another method is to use a network model such as ERGM [13]. However, both these methods often encounter difficulties when the properties of the network are not static. A typical real-world network experiences many changes in the course of its natural behavior, changes which are not examples of anomalous events. The most common of these is variation in the volume of edges. In the case of an email network where the edges represent messages, the total number of messages could vary based on the time or there could be random variance in the number of messages sent each day. The statistics used to measure the network properties are usually intended to capture some other effect of the network than simply the volume of edges: for example, the clustering coefficient is usually treated as a measure of the transitivity. However, the common clustering coefficient measure is statistically inconsistent as the density of the network changes. Even on an Erdos-Renyi network, which does not explicitly capture transitive relationships, the clustering coefficient will be greater as the density of the network increases. When statistics vary with the number of edges in the network, it is not valid to compare different network time steps using those statistics unless the number of edges is constant in each time step. A similar effect occurs with network models that employ features or statistics which are size sensitive: [11] show that ERGM models learn different parameters given subsets of the same graph, so even if the network properties are identical observing a smaller portion of the network leads to learning a different set of parameters.

The purpose of this work is to analytically characterize statistics by their sensitivity to network density, and offer principled alternatives that are consistent estimators, which empirically give more accurate results on networks with varying densities. The major contributions of this paper are: • We prove that several commonly used network statistics are Density Dependent and poorly reflect the network behavior if the network size is not constant. • We offer alternative statistics that are Density Consistent which measure changes to the distribution of edges regardless of the total number of observed edges. • We demonstrate through theory and synthetic trials that anomaly detection tests using Density Consistent statistics are better at identifying when the distribution of edges in a network has changed. • We apply anomaly detection tests using both types of statistics to real data to show that Density Consistent statistics recover more major events of the network stream while Density Dependent statistics flag many time steps due to a change in the total edge count rather than an identifiable anomaly. • We analyze the subgraphs that changed the most in the anomalous time steps and demonstrate that Density Consistent statistics are better at finding local features which changed radically during the anomaly.

2.

STATISTIC-BASED ANOMALY DETECTION

A statistic-based anomaly detection method is any method which makes its determination of anomalous behavior using network statistics calculated on the graph examples. The actual anomaly detection process can be characterized in the form of a hypothesis test. The network statistics calculated on examples demonstrating normal network behavior form the null distribution, while the statistic on the network being tested for anomalies forms the test statistic. If the test statistic is not likely to have been drawn from the null distribution, we can reject the null hypothesis (that the test network reflects normal behavior) and conclude that it is anomalous. Let Gt = {V, Et } be a multigraph that represents a dynamic network, where V is the node set and Et is the set of edges at time t, with eij,t the number of edges between nodes i and j at time t. The edges represent the number of interactions that occurred between the nodes observed within a discrete window of time. As the number of participating nodes is relatively static compared to the number of communications, we will assume that the node set is a constant; in time steps where a node has no communications we will treat it as being part of the network but having zero edges. Let us define some network statistic Sk (Gt ) designed to measure network property k which we will use as the test statistic (e.g., clustering coefficient). Given some set of time steps tLx ∈ {tL1 , ...tLmax } such that the set of graphs in those

times {GtLx } are all examples of normal network behavior, Sk is calculated on each of these learning set examples to estimate an empirical null distribution. If ttest is the time step we are testing for abnormality, then the value Sk (Gttest ) is the test statistic. Given a specified p-value (referred to as α), we can find threshold(s) that reject p percentage of the null distribution, then draw our conclusion about whether to reject the null hypothesis (conclude an anomaly is present) if the test statistic Sk (Gttest ) falls out of those bounds. For this work we will use a two-tailed Z-test with a p-value of α = 0.05 with the thresholds φlower and φupper . Anomalous test cases where the null hypothesis is rejected correspond to true positives; normal cases where the null hypothesis is rejected correspond to false positives. Likewise anomalous cases where the null is not rejected correspond to false negatives and normal cases where the null is not rejected correspond to true negatives. Deltacon [5] is an example of the statistic-based approach, as are many others [7, 8, 9, 14]. As these models also often incorporate network statistics, we will focus on the statistics themselves in this paper. Moreover, not all methods rely solely on statistics calculated from single networks: there are also delta statistics which measure the difference between two network examples. Netsimile is an example of such a network comparison statistic [1]. For a dynamic network, these delta statistics are usually calculated between the networks in two consecutive time steps. In practical use for hypothesis testing, these delta statistics function the same as their single network counterparts. If the network properties being tested are not static with respect to the time t this natural evolution may cause Sk (Gt ) to change over time regardless of anomalies which makes the null distribution invalid. It is useful in these instances to replace the statistic with a detrended version S ∗k (Gt ) = Sk (Gt ) − f (t) where the function f (t) is some fit to the original statistic values. The paper by [6] describes how to do dynamic anomaly detection using a linear detrending function, but other functions can be used for the detrending. This detrending operation does not change the overall properties of the statistic so for the remainder of the paper assume Sk (Gt ) refers to a detrended version, if appropriate.

2.1

Common Network Statistics

Listed here are some of the more commonly used network statistics for the anomaly detection. Graph Edit Distance: The graph edit distance (GED) [4] is defined as the number of edits that one network needs to undergo to transform into another network and is a measure of how different the network topologies are. In the context of a dynamic network the GED is applied as a delta measure to report how quickly the network is changing. GED(Gt ) = |Vt | + |Vt−1 | − 2 ∗ |Vt ∩ Vt−1 | + |Et | + |Et−1 | − 2 ∗ |Et ∩ Et−1 |

(1)

Degree Distribution Difference: P Define Di,t = j6=i eij,t to be the degree of i, the total number of messages to or from the node. The degree distribution is then a histogram of the number of nodes with a particular degree. Typically real-world network exhibit a power-law

degree distribution [3] but others are possible. To compare the degree distributions of t and t − 1, one option is to take the squared difference between the degree counts for each possible degree. We will call this the degree distribution difference (DD): maxi (Di,t ,Di,t−1 )

X

DD(Gt ) =

k=1

(

X

I(Di,t = k) − I(Di,t−1 = k))2

i

(2)

Other measures can be used to compare the two distributions but the statistical properties of the squared distance described later extend to other distance measures. Weighted Clustering Coefficient: Clustering coefficient is intended as a measure of the transitivity of the network. It is a critical property of many social networks as the odds of interacting with friends of your friends often lead to these triangular relationships. As the standard clustering coefficient is not designed for weighted graphs we will be analyzing a weighted clustering coefficient, specifically the Barrat weighted clustering coefficient (CB)[10]: CB(Gt ) =

X i

X ei,j + ei,k 1 ai,j ∗ ai,k ∗ aj,k N ∗ (ki − 1) ∗ si j,k 2 (3)

P

A second problem occurs when the edge counts in the learning set have high variance. If the statistic is dependent on the number of edges, noise in the edge counts translates to noise in the statistic values which lowers the statistical power of the test. Theorem 3.2 False Negatives for Density Dependent Statistics For any Sk (Gttest ) calculated on a network that is anomalous with respect to property k, if Sk (Gt ) is dependent on |Et | there is some value of the variance of the learning network edge counts such that Sk (Gttest ) is not detected as an anomaly.

P

Where ai,j = I[ei,j > 0], si = j ei,j , and ki = j ai,j . Other weighted clustering coefficients exists but they behave similarly to the Barrat coefficient.

3.

differs sufficiently in its number of edges compared to the learning examples the null hypothesis will be rejected regardless of the other network properties. As for why it is not sufficient to simply label these times as anomalies (due to unusual edge volume) the hypothesis test is designed to test for abnormality in a specific network property. If edge count anomalies also flag anomalies on other network properties, we cannot disambiguate the case where both are anomalous or just the message volume. If just the message volume is unusual, this might simply be an example of an exceptionally busy day where the pattern of communication is roughly the same just elevated. This is a very different case from where both the volume and the distribution of edges are unusual.

DENSITY DEPENDENCE

To illustrate why dependency of the network statistic on the edge count affects the conclusion of hypothesis tests, we will us first investigate statistics that are density dependent. Definition 3.1 A statistic S is density dependent if the value of S(Gt ) is dependent on the density of Gt (i.e., |Et |). Theorem 3.1 False Positives for Density Dependent Statistics Let L = {GtLx } be a learning set of graphs and Gttest be the test graph. If Sk (Gt ) is monotonically dependent on |Et | and {|EtLx |} is bounded by finite Elower and Eupper , there is some |Ettest | that will cause the test case to be rejected regardless of whether the network is an example of an anomaly with respect to property k. Proof. Let Sk (G) be a network statistic that is a monotonic and divergent function with respect to the number of edges in G. Given a set of learning graphs {GtL } the values of Sk ({GtL }) are bounded by max(Sk ({GtL })) and min(Sk ({GtL })), so the critical points φlower and φupper of a hypothesis test using this learning set will be within these bounds. Since an increasing |Ettest | implies Sk (Gttest ) increases or decreases, then there exists a |Ettest | such that Sk (Gttest ) is not within φlower and φupper and will be rejected by the test. If changing the number of edges in an observed network changes the output of the statistic, then if the test network

Proof. Let Sk (Gttest ) be the test statistic and {GtL } be the set of learning graphs where the edge count of any learning graph |EtLx | be drawn according to distribution ΦE . If Sk (Gt ) is a monotonic divergent function of |Et | then as the variance of ΦE increases the variance of Sk (GtLx ) increases as well. For a given α, the hypothesis test thresholds φlower and φupper will widen as the variance increases to incorporate 1 − α learning set instances. Therefore, for a given Sk (Gttest ) there is some value of var(ΦE ) such that φlower < Sk (Gttest ) < φupper . With a sufficient amount of edge count noise, the statistical power of the anomaly detector drops to zero. These theorems have been defined using a statistic calculated on a single network, but some statistics are delta measures which are measured on two networks. In these cases, the edge counts of either or both of the networks can cause edge dependency issues. Lemma 3.3 If a delta statistic Sk (Gt , Gt−1 ) is dependent on either Emin (Gt ) = min(|Et |, |Et−1 |) or E∆ (Gt ) = abs(|Et |− |Et−1 |) then Thms 3.1-3.2 apply to the delta statistic. Proof. For some delta statistic Sk (Gt , Gt−1 ) if the statistic is dependent on |Et |, |Et−1 |, or both, Theorems 3.1 and 3.2 apply to any edge count which influences the statistic. If Sk (Gt , Gt−1 ) depends on E∆ = abs(|Et | − |Et−1 |) then as either |Et | or |Et−1 | change the statistic produced changes leading to the problems described in theorems 3.1 and 3.2. If Sk (Gt , Gt−1 ) depends on Emin = min(|Et |, |Et−1 |) then if both |Et | and |Et−1 | increase or decrease the statistic is affected leading to the same types of errors.

These theorems show that dependency on edge counts can lead to both false positives and false negatives from anomaly detectors that look for unusual network properties other than edge count. In order to distinguish between the observed edge counts in each time step and the other network properties, we need a more specific data model to represent the network generation process with components for each.

4.

DENSITY INDEPENDENCE

Now that we have established the problems with density dependence, we need to define the properties that we would prefer our network statistics to have. To do this we need a more detailed model of how the graph examples were generated.

4.1

Data Model

Let the number of edges in any time step |Et | be a random variable drawn from distribution MEn (t) in times where there is a normal message volume and distribution MEa (t) in times where there is anomalous message volume. Now let the distribution of edges amongst the nodes of the graph be represented by a N ×N matrix Pt where the value of any cell pij,t is the probability of observing a message between two particular node pairs at a particular time. This is a probability distribution so the total matrix mass sums to 1. Like edge count, treat this matrix as drawn from distribution MPn (t) a in normal times and MP k (t) in anomalous times where k is the network parameter that is anomalous (for example, an atypical degree distribution). Any observed network slice can be treated as having been generated by a multinomial sampling process where |Et | edges are selected independently from N xN with probabilities Pt . Denote the sampling procedure for a graph Gt with F (|Et |, Pt ). In the next section, we will detail how this decomposition into the count of edges and their distribution allows us to define statistics which are not sensitive to the number of edges in the network.

4.2

Density Consistency and Unbiasedness

In the above data model, Pt is the distribution of edges in the network, thus any property of the network aside from the volume of edges is encapsulated by the Pt matrix. Therefore, a network statistic designed to capture some network property other than edge count should be a function of Pt alone. Let Sk (Pt ) be some test statistic designed to capture a network property k of Pt , that is independent of the density of Gt . Since Pt is not directly observable we can estimate the statistic with the empirical statistic Sˆk (Gt ) where Gt is used eij,t . to estimate Pt , with pˆij,t = |E t| ˆ t) Definition 4.1 A statistic S is density consistent if S(G is a consistent estimator of S(Pt ). If Sˆk (Gt ) is a consistent estimator of the true value of Sk (Pt ), then observing more edge examples should cause the estimated statistics to converge to the true value given Pt . More specifically it is asymptotically consistent with respect to the true value as the number of observed edges increases. Another way to describe this property is that Sˆk (Gt ) has some bias term dependent on the edge count |Et |, but the bias converges to zero as the edge count increases: lim|Et |→∞ Sˆk (Gt )− Sk (Pt ) = 0. Density consistent statistics allow us to perform

accurate hypothesis tests as long as a sufficient number of edges are observed in the networks. To begin, we will prove that the rate of false positives does not exceed the selected p-value α. Theorem 4.1 False Positive Rate for Density Consistent Statistics As |Et | → ∞ if Sˆk (Gt = F (|Et |, Pt ∼ MPn (t)) converges to Sk (Pt ), the probability of a false positive when testing a time with an edge count anomaly Gttest = F (|Et |, Pt ∼ MPn (t)) approaches α. Proof. Let all learning set graphs GtLx = F (|EtLx | ∼ MEn (tLx ), PtLx ∼ MPn (tLx )) be drawn from non-anomalous |E| and P distributions and the test instance Gttest = F (|Ettest | ∼ MEa (ttest ), Pttest ∼ MPn (ttest )) be drawn from an anomalous |E| distribution but a non-anomalous P distribution. If Sˆk (Gt ) is a consistent estimator of Sk (Pt ), lim|Et |→∞ Sˆk (Gt ) = Sk (Pt ). Then as both |EtLx | and |Ettest | increase, all learning set instances and the test set instance approach the distribution Sk (Pt ∼ MPn (t)). As any threshold is as likely to reject a learning set instance as the test instance, the false positive rate approaches α. As the bias converges to zero, graphs created with the same underlying properties will produce statistic values within the same distribution, making the test case come from the same distribution as the null. Even if the test case has an unusual number of edges, as long as the number of edges is not too small there will not be a false positive. Density consistency is also beneficial in the case of false negatives. Theorem 4.2 False Negative Rate for Density Consistent Statistics As |Et | → ∞ let Sˆk (Gt = F (|Et |, Ptn ∼ MPn (t))) converge to Sk (Ptn ) and Sˆk (Gt = F (|Et |, Pta ∼ MPa (t)) converge to Sk (Pta ). If Sk (Ptn ) and Sk (Pta ) are separable then the probability of a false negative → 0 as |Et | increases. Proof. Let all learning set graphs GtLx = F (|EtLx | ∼ MEn (tLx ), PtLx ∼ MPn (tLx )) be drawn from non-anomalous |E| and P distributions and the test instance Gttest = F (|Ettest | ∼ a MEn (ttest ), Pttest ∼ MP k (ttest )) be drawn from a non-anomalous |E| distribution but a P distribution that is anomalous on the network property being tested. Let Sˆk (Gt ) be a consisa tent estimator of Sk (Pt ). If Sk (Pt ∼ MP k (t)) and Sk (Pt ∼ n MP (t)) are separable, then lim

Sˆk (Gttest )

lim

Sˆk (GtLx )

|Ettest |→∞

and |EtLx |→∞

converge to two non-overlapping distributions and the probability of rejection approaches 1 for any α. The statistical power of a density consistent statistic depends only on whether the P matrices of normal and anomalous graphs are separable using the true statistic value: as long as |Et | is sufficiently large the bias is small enough that it is not a factor in the rate of false negatives.

A special case of density consistency is density consistent and unbiased, which refers to statistics where in addition to consistency the statistic is also an unbiased statistic of the true Sk (Pt ). ˆ t) Definition 4.2 A statistic S is density unbiased if S(G is a unbiased estimator of S(Pt ).

Graph edit distance Degree distribution Barrat clustering Mass shift Degree shift Triangle probability

Dependent X X

Consistent

Unbiased

X X X X

X

Figure 1: Statistical properties of previous network statistics and our proposed alternatives.

Unbiasedness is a desirable property because a density consistent statistic without it may produce errors due to bias when the number of observed edges is low.

5.1

4.3

Claim 5.1 GED is a density dependent statistic.

Proposed Density-Consistent Statistics

We will now define a set of Density-Consistent statistics designed to measure network properties similar to the previously described dependent statistics, but without the sensitivity to total network edge count. Probability Mass Shift: The probability mass shift (MS) is a parallel to GED as a measure of how much change has occurred between the two networks examined. Mass Shift, however, attempts to measure the change in the underlying P distributions and avoids being directly dependent on the edge counts. The probability mass shift between time steps t and t − 1 is M S(Pt ) =

X

(pij,t − pij,t−1 )2

The MS can be thought of as the total edge probability that changes between the edge distributions in t and t − 1. Probabilistic Degree Shift: We will now propose a counterpart to the degree distribution which is density consistent. P Define the probabilistic degree of a node to be P D(vi ) = j∈Vt ,i6=j pij,t . Then, let the probabilistic degree shift of a particular node in Gt be defined as the squared difference of the probabilistic degree of that node in times t and t − 1. The total probabilistic degree shift (DS) of Gt is then: X X X ( pij,t − pij,t−1 )2 i

j6=i

(5)

j6=i

This is a measure of how much the total probability mass of nodes in the graph change across a single time step. If the shape of the degree distribution is changing, the probabilistic degree of nodes will be changing as well. Triangle Probability: As the name suggests, the triangle probability (TP) statistics is an approach to capturing the transitivity of the network and an alternative to traditional clustering coefficient measures. Define the triangle probability as: T P (Pt ) =

X

pij,t ∗ pik,t ∗ pjk,t

(6)

i,j,k∈V,i6=j6=k

5.

When the edge counts of the two time steps are the same, the GED (Eq. 1) can be thought of as the difference in the distribution of edges in the network. However, the GED is sensitive to |Et | in two ways: the change in the number of edges from t−1 to t: E∆ = abs(|Et | − |Et−1 |), and the minimum number of edges in each time step Emin = min(|Et−1 |, |Et |). In both cases the statistic is density dependent, and in fact it diverges as the number of edges increases. The first case is discussed in Theorem 5.1, the second in Theorem 5.2. Theorem 5.1 As E∆ → ∞, GED(Gt ) → ∞, regardless of Pt and Pt−1 .

(4)

ij

DS(Pt ) =

Graph Edit Distance

PROPERTIES OF NETWORK STATISTICS

Now that we have described the different categories of network statistics and their relationship to the network density we will characterize several common network statistics as density dependent or consistent, comparing them to our proposed alternatives. Table 1 summarizes our findings.

Proof. Let E∆ be abs(|Et | − |Et−1 |). Since the GED corresponds to |Et | + |Et−1 | − 2 ∗ |Et ∩ Et−1 |, the minimum edit distance between Gt and Gt−1 occurs when their edge sets overlap maximally and is equal to E∆ . Therefore, as E∆ increases even the minimum (i.e., best case) GED(Gt , Gt−1 ) also increases. Theorem 5.2 As Emin → ∞, GED(Gt ) → ∞ if Pt 6= Pt−1 . b Proof. Let Gt = F (|Et |, Pta ) and Gt−1 = F (|Et−1 |, Pt−1 ), a b a with Pt 6= Pt−1 . Select two nodes i, j such that pij,t 6= pbij,t−1 . The edit distance contributed by those two nodes is abs(eij,t − eij,t−1 ). Let Emin increase but E∆ remain constant. As Emin increases the edit distance of the two nodes converges to abs(Emin ∗paij,t − Emin ∗pbij,t−1 ) = Emin ∗ abs(paij,t − pbij,t−1 ). Since every pair of nodes with differing edge probabilities in the two time steps will have increasing edit distance as Emin increases, the global edit distance will also increase.

Since the GED measure is the literal count of edges and nodes that differ in each graph, the statistic is dependent on the difference in size between the two graphs. Even if the graphs are the same size, comparing two large graphs is likely to produce more differences than two very small graphs due to random chance. In addition, even when small nonanomalous differences occur between the probability distributions of edges in two time steps, variation in the edge count can result in large differences in GED.

5.2

Degree Distribution Difference

Claim 5.2 DD is a density dependent statistic.

The degree distribution is naturally very dependent on the total degree of a network: the average degree of nodes is larger in networks with many edges. The DD measure (Eq. 2) is again sensitive to |Et | via E∆ and Emin . In both cases the statistic is density dependent. The first case, in Theorem 5.3, shows that as E∆ increases, the DD measure will also increase even if the graphs were generated with the same P probabilities. The second case, in Theorem 5.4, shows that as Emin increases, small variations in P will increase the DD measure.

Let CB(Gt , i) refer to the clustering coefficent of node i. Then, in the limit of |Et | it converges to: P

j,k

lim CB(Gt , i) = ∗(

lim I[eij,t > 0] ∗ I[eik,t > 0] ∗ I[ejk,t > 0])

|Et |→∞

P =

|Et |pij,t + |Et |pik,t P j6=i |Et |pij,t

2 ∗ ki,lim ∗

|Et |→∞

j,k

pij,t + pik,t P j6=i pij,t

2 ∗ ki,lim ∗

∗ I[pij,t > 0] ∗ I[pik,t > 0] ∗ I[pjk,t > 0]

Theorem 5.3 As E∆ ↑, DD(Gt ) ↑, regardless of Pt and Pt−1 . Proof. Pick any node i in Gt and j in Gt−1 . If |Et | increases, and |Et−1 | stays the same, the expected degree of i increases, while j stays the same; likewise the inverse is true if |Et−1 | increases and |Et | stays the same. Thus, as E∆ increases the probability of any two nodes having the same degree approaches zero, so the degree distribution difference of the two networks increases with greater E∆ . Theorem 5.4 As EM in ↑, DD(Gt ) ↑ if Pt 6= Pt−1 . P Proof. Let P D(i, Gt ) = k6=i pik,t be the probabilistic degree of node i for Gt = F (|Et |, Pt ). Pick any node i in Gt and j in Gt−1 such that P D(i, Gt ) 6= P D(j, Gt−1 ). For a constant E∆ , as Emin increases, thePexpected degrees of i and j converge to Di,t = Emin ∗ k6=i pik,t and P Dj,t−1 = Emin ∗ k6=j pjk,t−1 respectively. This means that the probability of the two nodes having the same degree approaches zero. Since every pair of nodes with differing edge probabilities in the two time steps will have unique degrees, the degree distribution difference will also increase.

P where ki,lim = j6=i I[pij,t > 0]. Since this limit can be expressed in terms of Pt alone, and CB is a sum of the clustering over all nodes, the Barrat clustering coefficient is a density consistent statistic. If we calculate the expectation of the Barrat clustering, we obtain: E[CB(Gt , i)] = E[

= E[

X ei,j + ei,k 1 ai,j ∗ ai,k ∗ aj,k ] (ki − 1) ∗ si j,k 2 X E[ei,j ] + E[ei,k ] 1 ] (ki − 1) ∗ si j,k 2

∗ E[ai,j ] ∗ E[ai,k ] ∗ E[aj,k ] X pij,t + pik,t 1 = E[ ] (ki − 1) ∗ si j,k 2 ∗ E[ai,j ] ∗ E[ai,k ] ∗ E[aj,k ] X pij,t + pik,t 1 = E[ ] (ki − 1) ∗ si j,k 2 |E |

If a node has very similar edge probabilities in matrices Pt and Pt−1 , when few edges are sampled it is likely to have the same degree in both time steps, and thus the impact on the degree distribution difference will be low. However, as the number of edges increases, even small non-anomalous difference in the P matrices will become more apparent (i.e., the node is likely to be placed in different bins in the degree distribution difference calculation), and the impact on the measure will be larger.

5.3

Weighted Clustering Coefficient

Claim 5.3 CB is a density consistent statistic. As shown in Theorem 5.5 below, the weighted Barrat clustering coefficient (CB, Eq. 3) is in fact density consistent. However, we will show later that the triangle probability statistic is also density unbiased, which gives more robust results, even on very sparse networks. Theorem 5.5 CB(Gt ) is a consistent estimator of CB(Pt ), with a bias that converges to 0. Proof. For Gt = F (|Et |, Pt ) the number of edges observed any pair of nodes can be represented using a multie eyz,t t |! . As |Et | → ∞, nomial distribution eij,t|E p ij,t ...pyz,t !...eyz,t ! ij,t the rate of sampling a particular node pair i, j is pij,t , so: lim

|Et |→∞

eij,t = |Et | ∗ pij,t

|E |

|E |

t t ∗ (1 − pij,tt ) ∗ (1 − pik,t ) ∗ (1 − pjk,t )

As this does not simplify to the limit of CB, it is not an unbiased estimator, and is thus density consistent but not unbiased. Other weighted clustering coefficients are also available but they have the same properties as the Barrat statistic.

5.4

Probability Mass Shift

Claim 5.4 MS is a density consistent statistic. As the true P is unobserved, we cannot calculate the Mass Shift (MS, Eq. 4) statistic exactly and must use the empiriP d pij,t − pˆij,t−1 )2 . cal Probability Mass Shift: M S(Gt ) = ij (ˆ As shown in Theorem 5.6 below, the bias of this estimator approaches 0 as |Et | and |Et−1 | increase, making this a density consistent statistic. d Theorem 5.6 M S(Gt ) is a consistent estimator of M S(Pt ), with a bias that converges to 0. Proof. The expectation of the empirical Mass Shift can be calculated with E[

X

(ˆ pij,t − pˆij,t−1 )2 ]

ij

= E[

X eij,t eij,t−1 2 − ) ] ( |E | |Et−1 | t ij

=

X

E[

ij

e2ij,t |Et

|2

e2ij,t

As the expectation of ten as:

+

e2ij,t−1 |Et−1 |2

−2∗

eij,t eij,t−1 ] |Et | |Et−1 |

d t ) is a consistent estimator of DS(Pt ), Theorem 5.7 DS(G with a bias that converges to 0.

for any node pair i, j can be writ-

Proof. The expectation of the empirical degree shift can be calculated with d t )] = E E[DS(G

E[e2ij,t ] = V ar(eij,t ) + E[eij,t ]2

i

= V ar(Bin(|Et |, pij,t )) + E[Bin(|Et |, pij,t )]2

The expected empirical mass shift can then be written as: E[

ij

e2ij,t |Et |2

+

e2ij,t−1 |Et−1 |2

−2∗

eij,t eij,t−1 ] |Et | |Et−1 |

j,k6=i

X |Et | ∗ pij,t ∗ (1 − pij,t ) |Et |2 ∗ p2ij,t = + 2 |E | |Et |2 t ij + + =

=

|Et−1 |2 X

X

i

j6=i

X pij,t (1 − pij,t ) X p2ij,t−1 + |Et | j6=i j6=i

p2ij,t +

!

|Et | ∗ pij,t |Et−1 | ∗ pij,t−1 −2∗ |Et | |Et−1 |

+2∗

X

p2ij,t − 2 ∗ pij,t ∗ pij,t−1 + p2ij,t−1

=

X

pij,t pik,t−1

j,k6=i

XX i

pij,t (1 − pij,t ) pij,t−1 (1 − pij,t−1 ) + |Et | |Et−1 | X pij,t (1 − pij,t ) 2 (pij,t − pij,t−1 ) + = |Et | ij

pij,t−1 ∗ pik,t−1 − 2 ∗

j,k6=i

ij

pˆij,t −

X

j6=i

pˆij,t−1

2

j6=i

+

X pij,t (1 − pij,t ) |Et | j6=i

X pij,t−1 (1 − pij,t−1 ) + |Et | j6=i

+

+

X

X pij,t−1 (1 − pij,t−1 ) X + pij,t ∗ pˆik,t +2∗ |Et | j6=i j,k6=i

|Et−1 | ∗ pij,t−1 ∗ (1 − pij,t−1 ) |Et−1 |2 |Et−1 |2 ∗ p2ij,t−1

j6=i

j6=i

X e2ij,t−1 X X e2ij,t =E ( + 2 |Et | |Et−1 |2 i j6=i j6=i X eij,t eik,t X eij,t−1 eik,t−1 +2∗ ∗ +2∗ ∗ |Et | |Et | |Et−1 | |Et−1 | j,k6=i j,k6=i # X pˆij,t pˆik,t−1 −2∗ "

= |Et | ∗ pij,t ∗ (1 − pij,t ) + |Et |2 ∗ p2ij,t

X

i hX X X pˆij,t−1 )2 pˆij,t − (

Which is density-consistent because the two additional bias terms converge to 0 as |Et | increases.

pij,t−1 (1 − pij,t−1 ) |Et−1 |

As the two additional bias terms converge to 0 as |Et | and |Et−1 | increase, the empirical mass shift is a consistent estimator of the true mass shift, and is density consistent.

Since the bias converges to 0, the statistic is density consistent. By subtracting out the empirical estimate of this bias term we can hasten the convergence. We use the following bias-corrected empirical degree shift in our experiments: 

We can improve the rate of convergence as well by using our empirical estimates of the probabilities to subtract an estimate of the bias from the statistic. We use the following bias-corrected version of the empirical statistic in all experiments: ∗ d M S (Gt ) =

X

5.5

d (G ˆt) = DS

X i

(

X j6=i

pˆij,t −

X

pˆij,t−1 )2

j6=i

 X pˆij,t (1 − pˆij,t ) X pˆij,t−1 (1 − pˆij,t−1 )  (8) + − |Et | |Et | j6=i j6=i

(ˆ pij,t − pˆij,t−1 )2

ij





pˆij,t (1 − pˆij,t ) pˆij,t−1 (1 − pˆij,t−1 ) − |Et | |Et−1 |

5.6

Triangle Probability

(7)

Probabilistic Degree Shift

Claim 5.5 DS is a density consistent statistic. Again, since the true P is unobserved, we cannot calculate the Degree Shift (DS, Eq. 5) statistic exactly and must use the empirical Probability Degree Shift: P d G ˆ t ) = P (P DS( ˆij,t − j6=i pˆij,t−1 )2 . As shown in Thei j6=i p orem 5.7 below, the bias of this estimator approaches 0, making this a density consistent statistic.

Claim 5.6 TP is a density consistent and density unbiased statistic. Again, since the true P is unobserved, we cannot calculate the Triangle Probability (TP, Eq. 6) statistic exactly and must use the empirical Triangle Probability X eij,t ∗ eik,t ∗ ejk,t d T P (Gt ) = |Et |3 i,j,k∈V,i6=j6=k

which is an unbiased estimator of the true statistic (shown below in Theorem 5.8). This means that there is no minimum number of edges necessary to attain an unbiased estimate of the true triangle probability.

d Theorem 5.8 T P (Gt ) is an consistent and unbiased estimator of T P (Pt ). Proof. The expectation of the empirical Triangle Probability can be written as

ijk

eij,t eik,t ejk,t E[ ] |Et | |Et | |Et |

7000 5000

6000

1e-04

4000

0e+00

X pij,t ∗ |Et | pik,t ∗ |Et | pjk,t ∗ |Et | |Et | |Et | |Et | ijk X = pi,j pi,k pj,k =

8000

2e-04

As the number of edges on any node pair i, j can be represented with a multinomial, the expectation of each is |Et | ∗ pij,t . This lets us rewrite the triangle probability as

Figure 3: Recall when applying each statistic to flag synthetically generated anomalies. Each cell is an average over all model parameter pairs.

10000

=

Therefore the empirical triangle probability is an unbiased estimator of the true triangle probability and is a density consistent statistic.

EXPERIMENTS

Now that we have established the properties of densityconsistent and -dependent statistics we will show the tangible effects of these properties using both synthetic datasets as well as data from real networks. The purpose of the synthetic data experiments is to show the ability of hypothesis tests using various statistics to distinguish networks that have different distributions of edges but also a random number of observed edges. The real data experiments demonstrate the types of events that generate anomalies as well as the characteristics of the anomalies that hypothesis tests using each statistic are most likely to find.

6.1

Synthetic Data Experiments

To validate the ability of Density-Consistent statistics to more accurately detect anomalies than traditional statistics we evaluated their performance on sets of synthetically generated graphs. Rather than create a dynamic network we generated independent sets of graphs using differing model parameters and a random number of edges. For every combination of model parameters, statistics calculated from graphs of one set of parameters (or pairs of graphs with the same parameters in the case of delta statistics like mass shift) became the null distribution, and statistics calculated on graphs from the other set of parameters (or two graphs, one from each model parameter in the case of delta statistics) became the test examples. Treating the null distribution as normally distributed, the critical points corresponding to a p-value of 0.05 are used to accept or reject each of the test examples. The percentage of test examples that are rejected averaged over all combinations of model parameters becomes the recall for that statistic. To generate the synthetic graphs, we first sampled a uniform random variable representing the number of edges to insert into the graph. For each statistic we did three sets of trials with 1k-2k, 3k-5k, and 7k-10k edges respectively. For

3000

-1e-04

ijk

6.

7k-10k edges 0.21 ± 0.02 0.64 ± 0.06 0.78 ± 0.04 0.88 ± 0.04 0.67 ± 0.08 0.97 ± 0.03

E[ˆ pij,t pˆik,t pˆjk,t ]

ijk

X

3k-5k edges 0.05 ± 0.01 0.26 ± 0.05 0.78 ± 0.05 0.77 ± 0.05 0.63 ± 0.08 0.94 ± 0.04

9000

X

3e-04

E[PˆT (Gt )] =

1k-2k edges 0.01 ± 0.001 0.15 ± 0.03 0.44 ± 0.05 0.51 ± 0.05 0.62 ± 0.07 0.77 ± 0.04

Edit Distance Degree Dist. Clust. Coef. Mass Shift Degree Shift Triangle Prob.

0

20

40

(a)

60

80

100

0

20

40

60

80

100

(b)

Figure 4: Comparison of Mass Shifts (a) and Edit Distances (b) produced by synthetic dataset pairs. The variance of the Edit Distances is due to the variable edge counts of the graphs, and leads to errors when distinguishing the two distributions.

mass shift, edit distance, triangle probability, and clustering coefficient each synthetic graph was generated according to the edge probabilities of a stochastic blockmodel using accept/reject sampling to place the selected number of edges amongst the nodes. Each set of models was designed to produce graphs varying a certain property, i.e. triangle probability and clustering coefficient were applied to models with varying transitivity while mass shift and edit distance skew the class probabilities by a certain amount. For the degree distribution and degree shift statistics, rather than using a stochastic blockmodel we assigned degrees to each node by sampling from a power-law distribution with varying parameters then using a fast Chung-Lu graph generation process to construct the network. Unlike a standard Chung-Lu process we allow multiple edges between nodes and we continue the process until a random number of edges are inserted rather than inserting edges equal to the sum of the degrees. The recalls are calculated the same way as with the stochastic blockmodels, using pairs of graphs with the same power-law degree distribution as the null set and differing degree distributions as the test set. Figure 6.1 shows the average recall of each statistic when applied to all models of a certain range of edges. In general the more edges that are observed the more reliable the statistics are; however, the statistics we have proposed enjoy an advantage over the traditional statistics for all ranges of network sizes, and in some cases this improvement is as large as 200% or more.

Edge Count Mass Shift Degree Shift Triangle Probability Degree Distribution Edit Distance Clustering Coefficient

Edge Count Mass Shift Degree Shift Triangle Probability Degree Distribution Edit Distance Clustering Coefficient

September 2001: Enron stock begins to falter

May 2001: Mintz sends memorandum to Skilling on LJM paperwork

May 2000: Price manipulation strategy "Death Star" implemented

February 2002: Skilling testifies before Congress

Edge Count Mass Shift Degree Shift Triangle Probability Degree Distribution Edit Distance Clustering Coefficient

December 25: Christmas

November 14-16: Severe tornado warnings

February 14: Valentine's Day

January 16: Martin Luther King day

August 29: Semester begins

March 12-17: Spring Break

August 27: Classes begin

September 5: Labor day

December 2000: Skilling takes over as CEO

November 1: Basketball season begins

0

50

100

January 14: Classes resume

150

0

50

Time (weeks)

(a) Enron email network

100

150

November 1: Basketball season begins

200

Time (days)

(b) University e-mail network 2011-2

0

100

200

300

400

Time (days)

(c) Facebook Univ. subnetwork 2007-8

Figure 2: Timelines showing reported anomalies using each statistic for three real world networks. Figure 4 demonstrates the effect of density consistency using one pair of graph models and the mass shift and edit distance statistics. Each black point represents a pair of datasets drawn from the same distribution while each green point is the statistic value calculated from a pair of points from differing distributions; the number of edges sampled from each distribution was 7k-10k uniformly at random. Clearly the distribution of statistic values when calculated on graph pairs from the same distribution is different from the distribution of cross-model pairs for both statistics. However, the randomness of the size of the graphs translates to variance in the edit distance statistics calculated leading to two distributions which are not easily separable leading to reduced recall. Mass shift, on the other hand, is nearly unaffected by the edge variance leading to two very distinct statistic distributions.

6.2

Real Data Experiments

We will now demonstrate how using Density-Consistent statistics as the test statistics for anomaly detectors improves the ability of detectors to find novel anomalies in real-world networks. We analyzed three dynamic network, one composed of e-mail communication from the Enron corporation during its operation and collapse, one from e-mail communications from university students, and one from the Facebook wall postings of the subnetwork composed of students at a university. Solid points represent statistics that we have proposed in this paper while open bubbles represent classic network statistics. Ideally the major events marked with vertical lines will be found as unusual by the detectors. A hypothesis test using the statistic in question is applied to every time step to determine its category as anomalous or normal. The null distribution used is the set of statistics calculated on all other time steps of the stream. A normal distribution is fit to the statistic values and critical points selected set according to a p-value of 0.05; any time step with a statistic that exceeds the critical points is flagged as anomalous.

statistics are able to recover the critical events of the timeline including events that standard statistics are not able to find. In particular the price manipulation strategy that was implemented in the summer of 2000 and the CEO transition in December 2000 are not found by edge dependent statistics; unlike some of the later events this strategy was not accompanied with an abnormal number of edges so edge dependent statistics produced less of a signal on these points. In addition, the time steps that are flagged by traditional statistics cluster around a set of edge count anomalies just after the Enron stock begins to crash. This period has an elevated number of messages in general, so the statistics are responding to the number of edges in these time steps rather than changes in other properties of the network. Figure 2(b) is taken from the e-mail communication of students from the summer of 2011 to February 2012. In general the density-consistent statistics flag time steps where certain events are taking place such as the start of basketball season, Martin Luther King Day, and Valentine’s day. The density-dependent statistics tend to flag more randomly and often coincide with times of unusual total edge count. Figure 2(c) is constructed from the set of wall postings between members of a university Facebook group. As with the Enron data, there is a large set of time steps with an usual total edge count that are also flagged by the density-dependent statistics. The density-consistent statistics recover the major events such as the beginning of each semester and the start and end of spring break. Interestingly, they also find unusual activity before and after spring break; project deadlines often fall before or after the break which may explain this activity.

Let us take a look at the network structures found in the anomalous time steps. As the mass shift, degree shift, and triangle probability statistics are sums over the edges or nodes of the graph, by decomposing the total statistic into the values generated by each edge/node we can select a subgraph which contributes the most to the statistic. As these statistics are designed to measure changes in the probabilFigure 2(a) shows the the timeline of events that occurred during the Enron scandal and breakdown. The Density-Consistent ity distribution of edges, this subgraph can be considered as

s81

s81

s30570

s30570

s12495

s40348

s12495

s3977

s2136

s47636

s114

s96

s114

s359652

s2136

s50631

s41197

s7319

s27439

s50726

s18197

s32422

s7383

s6716

s41197

s50319

s21967

s38901

s359653

s359652

s359651

s21413

s11636

s14727

s21967

s7319

s99882

s20296

s20296

s51868

s14828

s18197

s32422

s14828

s7383

s6716

s38901

s11636

s11080

s968

s11080

s22553

s8051 s1714

s21413

s99882

s51868

s27439

s50726

s22553

s183

s359651

s5516

s968

s183

s8051

s184

s1714

s184

s5896

s63

s359653

s47636

s30541

s5516

s14727

s96

s40348

s3977

s50631

s30541

s50319

s5896

s63 s18569

Prior  

Anomaly  

(a)  Mass  Shi2:  Enron  

Prior  

Anomaly  

s18569

Prior  

(b)  Triangle  Probability:  E-­‐mail  

Anomaly  

(c)  Degree  Shi2:  Facebook  

Figure 5: Anomalies detected by (a) Mass Shift: Enron time steps 31-32, (b) Triangle Probability: Email time steps 88-89, and (c) Degree Shift: Facebook time steps 249-250. Each plot shows most unusual subgraph for the prior time step and the anomalous one. s26

s26

s10

s41

s10

s41

s69

s1

s21

s1

s5310 s1880 s2062 s12542

s21

s13216s8189s11120s7024 s7198s25853s1335s29895 s5284s18785 s1162s1153s1166 s20377s1959 s1161s1152 s1167s9294 s1060s1481s2117 s31579 s9096 s6097s7649 s16177 s25417 s3775 s7130 s1420s1683 s9097 s4276 s42402 s6499 s49539 s3809 s80 s17373 s1609 s6864 s10135 s1687 s13496 s1686 s6825s601 s11461 s1960 s6541 s1737 s403 s1394 s2658 s1417 s2090 s1483 s9262 s2092 s4123 s2091 s879 s4275

s1165

s13216s8189s11120s7024s5284 s7198s25853s1335s29895 s1162s1153s1166 s18785 s20377s1959 s1161s1152 s1167s9294 s31579 s9096 s16177 s3775 s7130s9097 s42402 s49539

s6097s7649s1060s1481s2117 s25417 s1683 s1420 s4276 s6499 s3809 s17373

s61

s40

s56

s128

s61

s40

s56

s38

s71

s122

s25

s433612 s414280 s343058 s433613 s272821 s8559 s184673 s433614 s373895 s61956 s433611 s59006 s90482 s91701 s433610 s433615 s433609 s7304 s104653 s433616 s364021 s375234 s184672 s433617 s4817 s33033 s8556 s12348 s8561 s391730 s226910 s8565 s1352 s1354 s4816 s434336 s207244 s433635 s147477 s33165 s25877 s433636 s8558 s433637 s383584 s234833 s204321 s433638 s14762 s312254 s230085 s31132 s433608 s219818 s433607 s40849 s8554 s12728 s433606 s213517 s1290 s19202 s214644 s312253 s433605 s148507 s13990 s136231 s8563 s316141 s78350 s4558 s373411 s99585 s433604 s4745 s31050 s106731 s433603 s352505 s433602 s433639 s386337 s16664 s26798 s10652 s156721 s307432 s433601 s433640 s373410 s67116 s433600 s113855 s232772 s94603 s319005 s163962 s433599 s48004 s8557 s48005 s195636 s344137 s18044 s294996 s64422 s433641 s8519 s25006 s17357 s2867 s152120 s197733 s6163 s2856 s134649 s202920 s181248 s4073 s14862 s205700 s8527 s4043s125045 s76514 s434109 s1186 s29780 s433220 s1193 s46546 s27948 s15443 s23563 s8369 s273511 s8372 s66119s8351 s44449

s1609 s10135 s13496 s6825s601 s6541 s403 s2658 s2090 s9262 s2092 s4123 s2091s879s4275s19

s2295 s18062s5019 s19363 s5391 s14790 s91687 s14425 s3136 s74694 s433127 s1188 s15106 s76500 s323 s710 s117695 s433126 s26435 s96359 s86709 s14589 s49414 s315407 s10921 s80490 s10406 s21724 s27445 s36487 s46558 s67851 s52112 s78705 s3516 s5513 s13547 s275080

s1165 s4191

s1243

s2745

s4191

s3509

s33320

s2745

s1356

s1735

s33320

s1736

s7899

s1735

s1423

s4795

s7899

s2664

s996

s4795

s1418

s2821

s996

s1357

s4633

s5056

s2821

s35210

s2823

s2799

s4633

s5056

s2183

s4312

s2823

s2799

s2822

s1169

s2183

s4312

s738

s31337

s2822

s1169

s12813

s19534

s738

s31337

s11074

s11182

s12813

s19534

s5160

s81

s5160

s81

s17013

s8251

s8427

s45719

s17013

s9746

s2223

s45719

s28692

s44228s1269

s2223

s2094

s5468

s44228

s20015

s5468

s1767

s21606

s19167

s3694

s1269

s21606

s2313

s3161

s19167

s3694

s1859

s3160

s2313

s3161

s37618

s5134

s1859

s3160

s2354

s3433

s37618

s5134

s2151

s3168

s2354

s3433

s2152

s8081

s2151

s3168

s2153

s7012

s433612 s414280 s343058 s433613 s272821 s8559 s184673 s433614 s373895 s61956 s433611 s59006 s90482 s91701 s433610 s433615 s433609 s7304 s104653 s433616 s364021 s375234s33033 s184672 s433617s12348 s4817 s8556 s8561 s391730 s226910 s8565 s1352 s1354 s4816 s434336 s207244 s433635 s147477 s33165 s433636 s8558 s433637 s383584s25877 s234833 s204321 s433638 s14762 s312254 s230085 s31132 s433608 s219818 s433607 s40849 s8554 s12728 s433606 s213517 s1290 s19202 s214644 s312253 s433605 s148507 s13990 s136231 s8563 s316141 s78350 s4558 s373411 s99585 s433604 s4745 s31050 s106731 s433603 s352505 s433602 s433639 s386337 s16664 s26798 s10652 s156721 s307432 s433601 s433640 s373410 s67116 s433600 s113855 s232772 s94603 s319005 s163962 s433599 s48004 s8557 s48005 s195636 s344137 s18044 s294996 s64422 s433641 s8519 s25006 s17357 s2867 s152120 s197733 s6163 s2856 s134649 s202920 s181248 s4073 s14862 s205700 s8527 s4043 s76514 s434109 s1186 s125045 s29780 s433220 s1193 s46546 s27948 s15443 s5019 s23563 s2295 s8369 s18062 s273511 s19363 s8372 s5391 s66119s8351 s14790 s44449 s91687 s14425 s310974 s3136 s62511 s74694 s433579 s433127 s208374 s1188 s8363 s15106 s8338 s76500 s8334 s323 s433580 s710 s168573 s117695 s8355 s433126 s18816 s26435 s163739 s96359 s8376 s86709 s433581 s14589 s8373 s49414 s8360 s315407 s905 s10921 s8342 s80490 s8356 s10406 s8347 s21724 s433582 s27445 s433583 s36487 s8332 s46558 s8367 s67851 s6152 s52112 s8361 s78705 s74217 s3516 s38957 s5513 s187116 s13547 s433584 s275080 s117390 s251299 s29481 s38304 s8346 s97839 s63365 s99726 s2910 s74513 s412044 s358957 s2941 s45502 s12936 s13629 s24825 s50932 s4798 s23979 s433221 s283309 s433222 s15883 s433251 s17917 s284474 s274042 s4099 s136426 s282190 s403208 s27729 s4044 s84198 s379443 s6017 s122397 s433252 s15184 s318015 s267269 s433253 s4113 s4911 s433254 s220434 s124398 s270859 s433255

s8361 s74217 s38957 s187116 s433584 s117390 s29481 s8346 s63365 s2910 s412044 s2941 s12936 s24825 s4798 s433221 s433222 s433251 s284474 s4099 s282190 s403208 s27729 s4044 s84198 s379443 s6017 s122397 s433252 s15184 s318015 s267269 s433253 s4113 s4911 s433254 s220434 s124398 s270859 s433255

s23046 s130333 s97307s22364 s8528 s24803 s13673 s26807 s13348 s211968 s425432 s17682 s385739 s16223 s241808 s47755 s9668 s8870 s253631

s433256

s43375

s2154

s7025

s2154

s8062

s1291

s795

s5561

s7022

s1454

s49428 s7021 s780

s1297

s7533

s147 s3162

s1628

s7018

s13429

s3159 s49512

s995

s47860

s995

s118

s15489 s9304 s9046 s38585

s40212

s33

s5653

s10660

s8831

s40212

s6246

s5541 s36377 s1137

s9046 s38585 s2145 s8831 s6246

s412221 s82929

s1419 s1136

s69906

s9105 s2221

s6168

s2007

s10891 s8523

s433264

s4077

s4105 s433267 s433268 s1334 s433269 s433270 s433271 s4062 s413174 s296304 s4111 s271052 s4061 s433272 s4074

s2866

s385498 s5357 s434286 s424395

s3245

s702

s3245

s29905

s702

s45818

s432034

s29905

s25412

s2268

s760

s11308

s270

s173822

s2190

s5706

s5192

s96

s8917

s19

s30

s19

s30

s3528

s699

s3128 s41347 s8966 s16404 s5231 s46491 s777 s12631 s886

s141

s66

s31847

s21900

s41085

s593

s3639s41479 s8323 s998

s699

s3128 s41347 s8966 s16404 s5231 s46491 s777 s12631 s886

s704 s3184 s23937 s3325 s47707 s5543 s2155 s8242 s2156 s1961 s3244 s38109 s8249 s3238 s2189 s43712 s11479 s35336 s33303 s24619 s39263 s3294 s36993s8032 s11394 s2708s7440 s17268 s2121s21145s1760s22102 s4083s17287 s5764s2904 s98 s8394 s146 s2097

s16947

s7398

s29774

s700 s703

s8026 s22174 s17266 s14583 s50807 s36618 s11899 s19563 s2462 s46989 s15955 s38489 s1376

s13123

s2057

s38061

s20739

s40605

s20190

s41085

s593

s3639s41479 s8323 s998

s704 s3184 s23937 s3325 s47707 s5543 s2155 s8242 s2156 s1961 s3244 s38109 s8249 s3238 s2189 s43712 s11479 s35336 s33303 s24619 s39263 s3294 s36993s8032 s11394 s2708s7440s2121 s17268 s21145s1760s22102s5764s2904 s98 s8394 s146 s2097s4083s17287

s16947

s7398

s29774

s1972

s40605

s4179

s39082 s4304

s3308

s4304 s6630

s12007

s12007 s6909

s41003

s2018

s10813

s24790

s12094 s3196

s6600

s42239

s10617

s1972 s42924

s10813

s24790

s29338

s28136

s2606

s32483 s2514

s18388

s3308

s42924

s32

s13123

s2057

s38061

s20739

s10617 s20190

s4179

s6909

s3

s31847

s21900

s2606

s32483 s2514

s18388 s39082

s6630

s12094

s32

s763

s2018

s42239

s28136 s3120

s3196

s6600

s29338

s42168

s44766

s68

s763

s3120 s6315

s45563

s4059

s9054

s10604

s244283

s7535

s16629

s45563

s4059

s9054

s20823 s52

s68599

s6126 s16210

s78878

s371088

s11546 s68603

s52

s3042

s20502

s39681

s14608

s10366

s49339

s3633

s29694

s7041

s15834

s10145 s30270

s4600

s33343

s10145

s13767

s32121

s26148

s10239

s30270

s7150

s12109

s497

s14757

s14124

s13767

s32121

s26148

s10239

s6447

s4601

s13565

s35188

s7170

s44

s14757

s607

s39956

s2880

s433112

(a)  Edit  Distance:  Enron  

s178859

s7757

s211853

s186500

s323530

s32865

s93502 s13162

s242547 s112607

s508

s285945 s3512

s135101

s13121

s433114 s1031

s629

s4254

s2476

s4252 s6095

s14124

s433174

s1041 s1036

s110812

s179559

s45872

s607

s61456

s433117

s433112

s616

s433095 s1010

s2050

s433115

s2049

s142625

s66855

s1029

s349903

s433115

s350835

s13780

s2062

s431712

s634 s2073

s66855

s621

s124396

s1028

s1034 s620 s2052

s1028

s639

s395097

s433111 s433173

s344579

s433172

s92834

s433177

s610

s433116

s31824

s121144

s109470

s1008

s2093

s314397

s38654

s2738

s22091

s316236 s133589

Anomaly  

(b)  Barrat  Clustering:  E-­‐mail  

s1007

s314236

s2030

s198237

s132983

s593

Prior  

s433118 s1007 s240653

s625

s433178 s13740 s272071

s27460

s8663

s2066

s116068

s395626 s213290

s35743 s637

s2102

s24837 s297182

s1008

s125480 s314397

s240653

s625

s433178

s2096 s433171

s36536

s433118 s35743

s8663

s2066

s116068

s11162 s25340

s2093

s36536

s198237 s637

s18681

s92834 s2084

s22091

s125480

s11634

s272071

Prior  

s497

s640

s433117

s109470

s13740

Anomaly  

s69712

s13123

s12109

s433094

s433116

s31824

s121144

s132983

Prior  

s14143

s345914

s245383

s7159

s7150

s596 s433113

s639

s610

s39956

s28634

s4326 s2102

s11162 s25340

s13178

s9130

s247623 s4180

s138367

s35279

s231487

s621

s433111

s31556

s38190 s562

s38654

s2738

s19628

s58854

s34604 s502

s98593

s67441

s95852

s1029

s431712

s634 s124396

s344579

s433177

s563

s28634

s18681

s363005

s27362

s421907 s71436

s2072

s13780

s2062

s395097

s433173

s2084

s4252 s6095

s83996

s13135

s14328 s132817

s612

s328559

s192197

s5441

s18305

s563

s4254

s2476

s127728

s14703 s8744

s65571

s23275

s285464

s319522

s433095

s2049

s30927

s11739

s11712 s31556

s38190 s4326

s11634

s13200

s68101

s35628

s15189

s134147

s433086

s14411

s1013

s349903

s2050

s142625

s33719 s14942

s5441

s18305

s562

s19629

s322

s105488

s6124

s85086

s19931

s432914

s7532

s50872

s433176

s244564

s14363

s30927

s11739

s11712 s2880

s79069

s432493

s85095

s103858

s10145

s2079

s14412

s45872

s61456

s69

s251559

s13169

s6433

s6125

s433087

s9449 s82241

s95854 s331649 s433175 s318356 s110812

s179559

s629

s1034

s54

s131146

s37538 s506

s34117

s154024 s34833

s13121

s1036

s640

s35279

s231487

s620

s23

s13159

s13198 s433088

s134230

s23276

s135101

s1041

s433094

s67441

s95852

s433172

s69

s69201

s13174 s11975

s68608

s68035

s156069 s7534

s285945 s3512

s433114 s1031 s596 s433113

s2072

s2052

s54

s205395

s4029

s9376

s17631 s43592

s22162

s29126 s99083

s242547 s112607

s508

s138367

s328559

s192197

s2073

s23

s35627

s4829 s4561

s6434

s269927

s14927

s93502 s13162

s247623 s4180

s98593

s285464

s319522

s350835

s7170

s44

s213903

s507

s2760 s13344

s3801

s55272

s16853

s323530

s32865

s34604 s502

s612

s14411

s433174

s14988 s4600

s33719 s4601

s35188

s18100

s13194

s25852

s153092 s7672

s15263

s186500

s421907 s71436

s84092

s211853 s324803

s14328 s132817

s433176

s47311

s32279

s9835

s6447

s14363

s14942

s13565

s8036

s13199 s55601

s405790 s27329

s1378

s7757

s65571

s23275

s432914

s7532

s50872

s15617

s49339 s433086

s269927

s10145

s14412

s28801

s1010

s46

s9756

s13196

s17760

s221832 s59113

s38152

s616

s73

s371088

s240591 s20511

s15393

s44661

s32279

s9835

s14988

s46

s78878

s139244 s1759

s30097

s44089

s40088 s811

s244564

s73

s68599

s6126 s11546 s84136 s99212 s23256 s20861

s191014

s5650

s47311

s44661

s110439

s8302

s1013

s12345 s10779

s33343

s12054

s429369 s14143

s345914

s19628 s7671

s9449

s95854 s331649 s433175 s318356

s44089 s28801

s22164

s14676

s363005

s433525

s156069

s49881

s2645

s2079

s15834

s68603

s83996

s13178

s31434 s35960

s7041 s5650 s40088

s86923

s6636 s127728

s245383

s69712

s178859

s46994 s13254

s49881

s2645

s16210

s1104

s13200

s20506

s154024 s34833 s82241

s29694

s7535

s20502 s19629

s7159

s29126 s99083 s7534

s3633

s244283

s7384 s251559

s9130

s134147

s84092

s324803

s21741 s18688

s31434 s35960

s145996 s413 s258736 s130326 s434 s42891 s464 s420 s16162 s433085 s12746 s36306 s27956 s448 s194738 s454 s22316 s91683 s6416 s471 s422 s400 s444 s467 s358322 s189042 s452 s460 s163408 s153336 s469 s197018 s241131 s27668 s426 s10435 s84945 s440 s435 s52585 s385 s437 s42110 s135360 s439 s196358 s451 s438 s46427 s5893 s150006 s10979 s125685 s240698 s10955 s277755 s277754 s307635 s10992s10943 s151250 s3470 s3389 s347483 s10882s10906 s5926 s147042 s128210 s103136 s433792 s1247 s21732 s10981 s10968 s107868 s199495s10964 s5894 s58068 s2s10838 s101483s3240 s309178 s10944 s2191 s12520 s2216 s18821 s356618 s432588 s2202 s10990 s28019 s22979 s2220 s81228 s13662 s46673 s2207 s1232 s18169 s10937 s1892 s433791 s168299 s10909 s57150 s10913 s2193 s123664 s2208 s22995 s58067 s10902 s43371 s400385 s144119 s433790 s2184 s10911 s2190 s18481 s157432 s58193 s64896 s10945 s433652 s18824 s400887 s6322 s302913 s118848 s17556 s379167 s8940 s1148 s5286 s3079 s83995 s10958 s92391 s22410 s6560 s340620 s10950 s10892 s18494 s66025s3s10971

s10604

s26647 s16629 s17044 s20510 s20503

s131146

s58854

s6125

s85086

s19931

s20621 s6675

s46994 s13254

s41427 s102749 s311383 s49587 s99883 s303301 s14265 s14281 s1782 s28915 s197250 s173355 s124190 s63929 s212855 s98152 s434046 s5732 s39324 s14266 s59923 s256755 s17852 s20263 s434047 s434048 s22650 s177145 s129977 s14267 s5734 s254533 s147772 s110233 s11041 s137992 s346427 s183278 s14279 s100639 s92937 s9629 s91236 s131402 s261658 s306495 s149570 s265663 s173 s200395 s121271 s278133 s275658 s203122 s434049 s285860 s75943 s119737 s160576 s192409 s20264 s434050 s84534 s344448 s160577 s1781 s10531 s195022 s10150 s295910 s115594 s39243 s253669 s36093 s280373 s309789 s24745 s49346 s71443 s260233 s81518 s18144 s24442 s107067 s165456 s18129 s18131 s10149 s18130 s180297 s433651 s2187s241008 s2230 s112742 s134301 s345408s2201 s189852 s64198 s60536 s61404

s85095

s103858

s48422 s3632

s21741 s18688

s122119

s89330 s194368 s67458 s405596 s21308 s8964 s5263 s74725 s363802 s433648 s349997 s160870 s60994 s80588 s433647 s351912 s114006 s96849 s280347 s37745 s8942 s10991s186438 s474 s382

s15617

s10366

s17778

s20621 s6675

s12345

s169726

s268571

s13159

s27362

s432493

s23276 s55272

s16853

s39681

s44650

s48422 s3632

s57472

s74882

s3801

s30097

s14927

s17778

s10779

s125758

s258970

s69201

s13135

s6124

s6434

s8302

s2833

s13123

s44650

s18135

s265766

s8744

s134230

s6936

s433087

s14608

s14277

s214571

s15189

s22162

s7671

s34117

s811

s105085

s354566

s941

s29650

s27599

s3938

s322

s68101

s14703 s1378

s15263

s127

s16005

s255144

s35628

s68608

s68035

s433525

s2127

s20366 s3309

s2833

s38152

s51

s124365

s434282

s105488

s429369

s677

s45092

s2836

s40538

s15393

s127

s119782

s99425

s220876 s79069

s17631 s43592

s191014

s51

s12971

s434283

s205395

s4029

s13198

s37538

s13169

s6433

s4561

s110439

s3333

s6445 s6984

s51704

s941

s29650

s53594

s8649

s506

s4829 s14676

s20506

s3521

s3608 s2646

s41435

s6936 s3309

s75944

s320381

s433088

s153092

s13344

s3755

s2137

s26676

s37282

s2127

s20366

s2866

s434045

s8434

s7672 s2760

s15268

s2834

s25071

s677

s45092

s2847

s8804 s8255 s4347

s103557

s294943

s11975

s27329

s3681

s220876

s31676

s4682

s3333

s6445

s27599

s3938

s4041

s89476

s84783

s244332

s294942

s9376

s405790

s1080

s41629

s3521

s3608

s2836

s40538

s4074 s40966

s146012 s433642

s30499

s325278

s59113

s86923

s3755

s2137

s6984

s51704

s4061 s433272

s12705

s1814

s434044

s398682

s37954 s6636

s15268

s2646

s41435

s4111 s271052

s348515

s35095

s31046 s33769

s171

s387060

s10862

s35627

s240591

s1104

s2001

s3681 s19203

s31676

s2834

s26676

s37282

s4062 s296304

s260623

s170873 s327561 s202699 s14810

s118

s280948

s20511 s221832

s2711

s1080 s37954

s25071

s4105 s433267

s51862

s24061

s286704

s18100

s213903

s507

s139244 s1759

s8036

s13199 s13194

s25852

s23256 s20861

s13174

s3042

s2001

s4682

s4121

s40910

s77342

s55601

s99212 s12054

s14037

s9756

s13196

s17760

s84136 s22164

s20503 s10862

s14037

s2711

s19203

s41629

s107

s4077 s433266

s434387 s144050 s65327 s18319

s376640

s340788

s43847

s2002

s44766

s7384

s20823

s2

s36

s92

s420684

s24711

s333444

s42168

s43847

s2002

s17044

s29

s107

s64255

s225460 s18143

s69081

s267683

s18135

s20510

s68 s2

s36

s363150

s86042

s278233 s89716

s173822

s122119

s26647

s6315

s29

s433265

s146380

s89330 s41427 s194368 s102749 s67458 s311383 s405596 s49587 s21308 s99883 s8964 s303301 s5263 s14265 s74725 s14281 s363802 s1782 s433648 s28915 s349997 s197250 s160870 s173355 s60994 s124190 s80588 s63929 s433647 s212855 s351912 s98152 s114006 s434046 s96849 s5732 s280347 s39324 s37745 s14266 s8942 s59923 s10991 s256755 s474 s17852 s382 s20263 s186438 s434047 s145996 s434048 s413 s22650 s258736 s177145 s130326 s129977 s434 s14267 s42891 s5734 s464 s254533 s420 s147772 s16162 s110233 s433085 s11041 s12746 s137992 s36306 s346427 s27956 s183278 s448 s14279 s194738 s100639 s454 s92937 s22316 s9629 s91683 s91236 s6416 s131402 s471 s261658 s422 s306495 s400 s149570 s444 s265663 s467 s173 s358322 s200395 s189042 s121271 s452 s278133 s460 s275658 s163408 s203122 s153336 s434049 s469 s285860 s197018 s75943 s241131 s119737 s27668 s160576 s426 s192409 s10435 s20264 s84945 s434050 s440 s84534 s435 s344448 s52585 s160577 s385 s1781 s437 s10531 s42110 s195022 s135360 s10150 s439 s295910 s196358 s115594 s451 s39243 s438 s253669 s46427 s36093 s5893 s280373 s150006 s309789 s10979 s24745 s125685 s49346 s240698 s71443 s10955 s260233 s277755 s81518 s277754 s18144 s307635 s24442 s10992s10943 s107067 s151250 s165456 s18129 s3470 s18131 s3389 s10149 s347483 s18130 s10882 s180297 s5926 s241008 s10906 s433651 s147042 s2187 s128210 s2230 s103136 s112742 s433792 s2201 s1247 s134301 s21732 s345408 s10981 s189852 s10968 s107868 s60536 s199495 s61404s64198 s5894 s3240 s10964s2 s58068 s101483 s10838 s309178 s10944 s2191 s12520 s2216 s18821 s356618 s432588s22979 s2202 s10990 s28019 s2220 s81228 s13662 s46673 s2207 s1232 s18169 s10937 s1892 s433791 s168299 s10909 s57150 s10913 s2193 s123664 s2208 s22995s400385 s58067 s10902s433790 s43371 s144119 s2184 s10911 s2190 s18481 s157432 s58193 s64896 s10945 s433652 s18824 s400887 s6322 s302913 s118848 s17556 s379167 s8940 s1148 s5286 s3079 s83995 s10958 s92391 s22410 s6560 s340620 s10950 s10892 s18494 s66025s3s10971

s141

s41003

s3

s92

s192639

s157119

s197911

s51567

s169726

s268571

s34903

s3528

s700 s703

s8026 s22174 s17266 s14583 s50807 s36618 s11899 s19563 s2462 s46989 s15955 s38489 s1376

s66

s433264

s132431

s15774

s168653

s57472

s74882

s36288

s12203

s347719

s134596

s125758

s258970

s8

s20493

s34903

s892

s434284

s14277

s214571

s2191

s34286

s36288

s12203

s164652

s142964

s105085

s354566

s875 s21

s9324

s8

s20493

s38821

s205082

s16005

s255144

s265766

s2191s21

s34286

s64571

s426861

s124365

s434282

s49640

s381 s23932

s9324

s27707

s192052

s119782

s99425

s876

s8156

s875

s23932

s24746

s17076

s12971

s434283

s4062

s7251

s49640

s381

s207422

s115461

s53594

s8649

s97s577

s11199

s876

s8156

s195409

s434285

s75944

s320381

s984

s29262

s4062

s7251

s237845 s80301

s269611

s434045

s4070

s97

s11199

s297893

s103557

s294943

s1065

s1562

s577

s29262

s309272

s27935

s244332

s294942

s22

s33918

s984

s4070

s150041

s22285

s30499

s325278

s20525

s31296

s1065

s1562

s432034

s237845 s80301

s278233 s89716

s376640

s434044

s398682

s208

s1085

s22

s33918

s198812

s171

s387060

s4125

s8959

s20525

s31296

s274137

s118

s280948

s96

s8917

s208

s1085

s150551

s24061

s286704

s2190

s5706

s4125

s8959

s223232

s40910

s77342

s23664

s8434

s5192

s4056 s433263

s424395

s24711

s333444 s340788

s137

s5357 s434286

s69081

s267683

s18

s385498

s197911

s51567 s146380

s137

s141823

s15774

s168653

s18

s310930

s347719

s134596

s6004

s1049

s280230

s892

s434284

s9372

s2814

s23664

s232738

s164652

s142964

s4481

s16829

s6004

s1049

s2295

s32399

s9372

s2814

s1334

s372351

s38821

s205082

s1738

s52441

s4481

s16829

s433271

s434287

s64571

s426861

s2780

s26645

s2295

s32399

s433268

s7946 s391698

s27707

s192052

s3944

s269

s1738

s52441

s433270

s277664

s24746

s17076

s8522

s3693

s2780

s26645

s433269

s339466

s207422

s115461

s7023

s2288

s3944

s269

s413174

s73274 s160520

s195409

s434285

s1267

s3450

s8522

s3693

s83964 s142963

s150041

s269611

s50846

s1788

s7023

s2288

s11308

s270

s1267

s3450

s25412

s2268

s50846

s1788

s106411

s309272

s27935 s297893

s760

s4

s7945

s168360

s1814

s157119 s132431

s22285

s45818

s70

s8977

s8947

s414247

s12705 s35095

s31046 s33769

s18143 s86042

s274137 s198812

s4

s8968 s13209

s8951

s5830

s8971 s3224

s8957 s8973

s91949

s8804 s8255

s348515 s260623

s170873 s327561 s202699 s14810 s51862 s434387 s144050 s65327 s18319 s225460

s223232 s150551

s70

s433650 s144177

s8978 s8955 s8946 s41221 s8967 s183876

s330658

s2847

s141823

s134754

s184731

s4041

s89476

s84783 s4347

s310930

s433649

s40966

s146012 s433642

s280230

s8958 s260302

s4121 s433266

s232738

s4101

s282933

s420684

s372351

s701

s4135

s57576

s64255

s7946 s434287

s1163

s19227

s4064

s15803

s363150

s339466

s8163

s7415

s433262

s7840

s433265

s8958

s8968 s13209

s8971

s8957 s8947

s391698

s6437

s103

s433261

s181836

s192639

s15803 s282933 s260302 s433650 s144177

s8978 s8946 s8951 s183876

s277664

s16332

s1208

s701

s202029

s168313

s4056 s433263

s134754

s83964

s11950

s1156

s1163

s19227

s433260

s273770

s4101

s91949

s142963

s72

s1154

s8163

s7415

s168955

s118175

s4135

s168360

s624

s3512

s1164

s6437

s103

s4119

s317792

s4064

s184731

s30592

s8325

s1209

s16332

s1208

s433082

s297187

s433262

s160520

s1155

s11950

s1156

s69906

s240465

s433261

s73274

s5952

s72

s1154

s433259

s198958

s202029

s7945

s18073

s4281

s3512

s1164

s352222

s213438

s433260

s8973

s5316

s7963

s8325

s1209

s82929

s9087 s302394

s168955

s3224

s1505 s3100

s5969

s624

s1155

s412221

s9365 s30370

s4119 s433082

s106411

s30592

s5952

s4133

s276684

s433259

s414247

s6168

s2007

s4100

s265916 s7482 s159135

s352222

s41221

s12317 s17551

s1139 s32316

s18073

s4281

s144898

s288885

s4133

s57576

s9304

s9105

s5316

s7963

s47760

s143652

s4100

s433649

s5969

s433258

s23414

s144898

s330658

s1136

s20

s4106

s238343

s47760

s7840

s15489

s4512 s1471 s2149

s2145

s12317 s17551

s2221 s1505 s3100

s1419

s55 s72

s4312

s60937

s433258

s5830

s1139 s32316

s17 s126

s20

s71871

s20122

s4106

s8967

s1137

s96

s55

s433257

s112792

s4312

s8955

s5541 s36377

s17 s126

s4088

s96105

s71871

s8977

s33

s96

s72

s205701

s169307

s433257

s10891 s8523 s181836

s5653 s4512 s1471 s2149 s10660

s118

s413170

s269965

s4088

s168313

s38975

s18256

s284475

s66164

s205701

s118175

s16327

s273770

s38975

s4030

s242152

s413170

s297187

s46544

s317792

s16327

s4070

s99432

s284475

s198958 s240465

s46544

s4069

s43817

s4030

s9365 s30370 s302394 s213438

s13429

s4103

s195670

s4070

s276684

s6784

s9087

s1628

s4083

s3594

s4069

s60937 s23414 s288885 s265916 s7482 s159135

s6783

s1926

s6784

s143652

s17523

s7648

s7533

s238343

s1297

s3166

s6783

s20122

s1295

s779

s17523

s4104

s26311

s4103

s96105

s1293

s112792

s1295

s4093

s14863

s4083

s269965

s1292

s169307

s1293

s4084

s193900

s4104

s242152 s66164

s1292

s4068

s46377

s4093

s43817 s99432

s1454

s4128

s22827

s4084

s3594 s195670

s5561

s311120

s87587

s4068

s14863 s26311

s1291

s49428 s7021 s780 s779 s3166 s7648 s1926 s147 s3162 s7018 s3159 s49512 s47860 s18256

s311124

s7839

s4128

s46377 s193900

s7025 s8062 s795 s7022

s71

s433256

s43375

s311120

s87587 s22827

s477

s245042

s311124

s7839

s2153

s7012

s310974 s62511 s433579 s208374 s8363 s8338 s8334 s433580 s168573 s8355 s18816 s163739 s8376 s433581 s8373 s8360 s905 s8342 s8356 s8347 s433582 s433583 s8332s6152 s8367

s245042

s2152

s8081

s477

s38304 s97839 s99726s251299 s74513 s358957 s45502 s13629 s50932 s23979 s283309 s15883 s17917 s274042 s136426 s22364 s23046 s130333 s97307 s8528 s24803 s13673 s26807 s13348 s211968 s425432 s17682 s385739 s16223 s241808 s47755 s9668 s8870 s253631

s11074

s11182

s8251

s8427 s9746 s28692 s2094 s20015 s1767

s25

s80

s1687s6864 s1686 s11461 s1960 s1737 s1394 s1417 s1483 s69 s5310 s1880 s2062 s12542 s19

s1243 s3509 s1356 s1736 s1423 s2664 s1418 s1357 s35210

s128

s38

s122

s24837

s2096 s433171

s316236 s133589

s314236

s2030

s297182

s395626 s213290

s27460

s593

Anomaly  

(c)  Degree  Distribu

Suggest Documents