Deployment of DNIDS in Social Networks

Meytal Tubi, Rami Puzis, and Yuval Elovici
Deutsche Telekom Laboratories at Ben-Gurion University, Be'er Sheva 84105, Israel

{tubim,puzis,elovici}@bgu.ac.il

Abstract—Internet users form social networks as they communicate with each other. Computer worms and viruses exploit these social networks in order to propagate to other users. In this paper we present a new framework aimed at slowing down or even preventing the propagation of computer worms and viruses in social networks. In the first part of the framework, a social network is derived for a given community of users. In the second part, the group of users with the highest influence on the communication in the social network is located; the Group Betweenness Centrality measure is used to evaluate the influence of each candidate group. In the third part, threat propagation in the social network is analyzed under the assumption that a Distributed Network Intrusion Detection System (DNIDS) monitors the traffic of the group. The analysis is performed using a network simulator developed for this purpose. In the fourth part, a DNIDS is deployed on a range of ISPs in order to monitor and clean the traffic of the users belonging to the central group. We applied the new framework by deriving the social network of 1000 students, finding the most influential group of users, and analyzing the influence of the DNIDS deployment using a simulation tool. The simulation results demonstrate the framework's ability to slow down or even prevent the propagation of threats by cleaning the traffic of the central group of users.

Index Terms—Computer viruses, distributed detection, Group Betweenness Centrality, social networks.

I. INTRODUCTION

Users that are connected to the Internet exchange information with other users. The communication between users is transmitted in various ways, such as Email, Instant Messaging, etc. Users may unintentionally copy contaminated files to their computers and spread them to other users with whom they communicate [1]. The contaminated files may contain computer viruses and computer worms [2], [3] (hereafter we use the term threats instead of computer viruses and computer worms). Recent studies suggest that users acquire these threats mainly from the Internet [4].

Intrusion Prevention Systems [5] filter Internet traffic relying on predefined attack signatures or source address blacklists. Both approaches are commonly implemented in software and hardware security solutions. Signatures of existing threats can be obtained by careful study in antivirus laboratories. To automate the process of detecting new threats, Intrusion Detection Systems produce blacklists and signatures of potential attacks using honey-pots [6] or anomaly detection techniques [7]. Weaver et al. [2] state that an effective Intrusion Detection System should be widely distributed across the network. In this study we assume that a Distributed Network Intrusion Detection System (DNIDS) is available. We also assume that this system is capable of detecting new threats, generating signatures, and cleaning the Internet traffic. Examples of systems that detect malicious activity using a coordinated effort of many detection units can be found in [8].

Incoming and outgoing Internet traffic of users can be inspected by a DNIDS deployed on a range of Internet Service Providers. It is not realistic to inspect the traffic generated by all Internet users. In [9], for example, it is proposed that only the traffic of users that have recently obtained an IP address should be inspected and cleaned. Theoretical models of epidemic propagation suggest immunizing the most significant users in order to slow down the threat propagation in the entire network [10].

In this paper we present a new framework aimed at reducing the spread of threats between users belonging to a social network. The framework is based on four parts. In the first part, we build a social network by analyzing the communication between users. In the second part, we identify a group of users that have the highest influence on the communication between users that belong to the social network. In the third part, we analyze the threat propagation in the social network assuming that a DNIDS is monitoring the traffic of the central group of users; various deployments are examined to determine the optimal deployment size. In the fourth part, Internet Service Providers should deploy the DNIDS in order to inspect and clean the traffic of the group of users found in the third part.

(This work was supported by Deutsche Telekom Co.)
The rest of the paper is structured as follows. In section 2 we present the work related to this study. In section 3 we present the new framework. In section 4 we present the framework evaluation. In section 5 we conclude the paper with a summary and suggestions for future work.

II. RELATED WORK

A. Epidemiological models

Common models of epidemic propagation categorize the population into three defined states: Susceptible (S) individuals do not have the disease in question but can catch it if exposed to someone who does; Infective (I) individuals have the disease and can pass it on; and Removed (R) individuals have been disabled by the disease and cannot be infected or infect others. Different epidemic propagation models define different transitions between these states. The SIR model assumes that any susceptible individual has a probability β of catching the disease in a unit of time from any infective entity. Infected entities are removed with probability γ in a unit of time [11]. The SIS model is a model of endemic disease: carriers that are cured become susceptible again. Since carriers can be infected many times, the disease may persist indefinitely, circulating around the population without ever dying out [12]. In this study we use the SIRS model, in which computers can either crash (be temporarily removed) or be cured (become susceptible).

Models of this type assume that the population is fully mixed. As a result, the fractions s, i, and r of entities in the states S, I, and R can be expressed analytically by differential equations. In reality, diseases can only spread between entities that have actual physical contact, and the nature of the contacts influences the disease propagation [12]. Researchers usually study epidemic propagation in networks using stochastic simulations. Most epidemic models define λ = β/γ to be the effective spreading rate of the infection. The epidemic threshold is a value λc such that for any λ > λc the infection keeps spreading in the population until it reaches an equilibrium state, whereas for λ < λc the infection dies out exponentially fast. In this study we reduce the level of infection in the equilibrium state.

B. Threat propagation in social networks

Computer threats are similar to biological viruses in their spreading model. Email worms can send themselves to randomly generated Email addresses. To increase the number of successful infections, some threats (such as ILoveYou, Klez, etc.) exploit address books or Emails kept on the hosting computer. A threat can search for stored Emails in order to retrieve addresses of potential victims. Sent and received Emails form a social network where two users are connected if an Email was sent from one to the other. This network can be extracted from logs of Email servers.

Recent work has highlighted the effect of a network's degree distribution on the behavior of epidemic spread. Particular attention was paid to scale-free networks, in which the probability that a vertex has degree k decays as k^(-α) for some constant α [13], [14]. Infections spreading over scale-free networks are known to be highly resilient to random vaccination strategies. However, it was shown that targeted vaccination (of highly connected vertices, for example) can be very effective [10].
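The fully mixed SIR dynamics and the threshold behaviour of λ = β/γ described above can be illustrated with a short numerical integration. The sketch below (plain Python, Euler steps; the rates are illustrative and not taken from the paper) shows the infection taking off when λ is well above 1 and dying out when it is below:

```python
def sir(beta, gamma, i0=0.01, dt=0.01, steps=100_000):
    """Euler integration of the fully mixed SIR equations:
    ds/dt = -beta*s*i,  di/dt = beta*s*i - gamma*i,  dr/dt = gamma*i."""
    s, i, r = 1.0 - i0, i0, 0.0
    peak = i
    for _ in range(steps):
        new_inf = beta * s * i * dt   # susceptibles catching the disease
        new_rem = gamma * i * dt      # infectives being removed
        s, i, r = s - new_inf, i + new_inf - new_rem, r + new_rem
        peak = max(peak, i)
    return peak, r                    # peak infective fraction, final removed fraction

# lambda = 4: the epidemic sweeps through almost the whole population
peak_hi, removed_hi = sir(beta=0.8, gamma=0.2)
# lambda = 0.25: the infection declines from the start and dies out
peak_lo, removed_lo = sir(beta=0.1, gamma=0.4)
```

In a network setting the same transitions are applied per contact rather than per population fraction, which is what the stochastic simulation of Section III-C does.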

The degree distribution of networks based on sent and received Emails is continuous but does not follow a power law [1]. Tyler et al. [15] found that 80% of the vertices in such networks must be protected in order to prevent epidemics, even if targeted vaccination is used. We show that protecting 10% of the vertices decreases the average contamination level in the network by a factor of more than seven.

C. Group Betweenness Centrality

In order to analyze and understand the roles played by vertices in complex networks, many network-analytic studies in recent years have relied on the evaluation of centrality measures defined on the vertices of the network [16]. These measures are used to rank the prominence of individual vertices according to their position in the network [17]. An important centrality measure on which we focus in this study is the Shortest Path Betweenness Centrality. Betweenness Centrality (BC) measures the extent to which a vertex has control over information flowing between others [18], [19]. The Betweenness Centrality of a vertex is defined as the total fraction of shortest paths between all pairs of vertices in a network in which the vertex takes part. A high Betweenness Centrality score indicates that the ratio of shortest paths on which the vertex lies is high [20]. The shortest path between two vertices can be determined by an algorithm such as Dijkstra's algorithm or Breadth-First Search (BFS); these algorithms devise paths in which the distance between the two end vertices is minimal [20], [21]. For a given graph G = (V, E), let σ_{s,t} be the number of shortest paths between s and t, and let σ_{s,t}(v) be the number of shortest paths between s and t that traverse v. The Betweenness Centrality of vertex v is:

    BC(v) = Σ_{s,t ∈ V, s ≠ v ≠ t} σ_{s,t}(v) / σ_{s,t}    (1)

Betweenness Centrality of individual vertices can be naturally extended to Betweenness Centrality of groups of vertices. Let C ⊆ V be a group of vertices.
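On a small graph, equation (1) can be evaluated directly with BFS-based shortest-path counting, using the fact that σ_{s,t}(v) = σ_{s,v}·σ_{v,t} whenever d(s,v) + d(v,t) = d(s,t). The brute-force Python sketch below illustrates the definition; it is not the fast algorithm of [22]:

```python
from collections import deque

def bfs_counts(adj, s):
    """Distances and shortest-path counts from s in an unweighted graph."""
    dist, sigma, queue = {s: 0}, {s: 1}, deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:            # first time w is reached
                dist[w], sigma[w] = dist[u] + 1, 0
                queue.append(w)
            if dist[w] == dist[u] + 1:   # u precedes w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma

def betweenness(adj, v):
    """BC(v) of equation (1): sum over ordered pairs s != v != t of
    sigma_st(v) / sigma_st."""
    info = {u: bfs_counts(adj, u) for u in adj}
    dist_v, sigma_v = info[v]
    bc = 0.0
    for s in adj:
        if s == v:
            continue
        dist_s, sigma_s = info[s]
        for t in adj:
            if t in (s, v) or t not in dist_s:
                continue
            # v lies on a shortest s-t path iff the distances add up
            if v in dist_s and t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                bc += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return bc
```

On a star with three leaves, for example, the centre lies on the shortest path of every one of the six ordered leaf pairs, so its BC is 6, while each leaf scores 0.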
Group Betweenness Centrality (GBC) of C stands for the total fraction of shortest paths between all pairs of vertices that pass through at least one member of the group C [18]. Let σ_{s,t}(C) be the number of shortest paths between s and t that traverse at least one member of the group C. GBC(C) is defined as follows:

    GBC(C) = Σ_{s,t ∈ V, s ≠ t} σ_{s,t}(C) / σ_{s,t}    (2)

A fast algorithm for the computation of GBC was recently developed [22]. Using this algorithm and standard methods of combinatorial optimization (such as simulated annealing, genetic algorithms, etc.), a central group of vertices can be found. In order to find the group of users that should be monitored, we use a greedy algorithm for GBC maximization. This algorithm incrementally constructs a group by adding the vertex with the highest contribution to the GBC of the current group.

III. THE NEW FRAMEWORK

In this study we describe a new framework for detecting and eliminating threats while analyzing only part of the social network traffic. The framework is based on four parts. In the first part, we derive a social network represented by a graph; the social network is constructed by analyzing the Email communication between users. In the second part, we pinpoint the optimal places to intercept the social network traffic based on the network topology: we locate the group of users having the highest influence on the network communication using the GBC measure and a new greedy algorithm. In the third part, we analyze the threat propagation in the social network for various deployment sizes, using a network simulation tool we have developed. In the fourth part, Internet Service Providers should deploy the DNIDS to inspect the traffic of the group of users found in the third part. The framework is described in detail in the following subsections.

A. Extracting the Social Network

In the first part of our framework, the social network is extracted from logs of Email servers. Email logs contain the sender and recipient addresses of all outgoing and incoming Emails. We create the social network based on the Email addresses appearing in these logs. Each vertex in the graph represents a user (Email address) and each edge represents a social connection between two users (an Email sent from one user to the other or vice versa).

The social network was constructed from one week's log of Email traffic at Ben-Gurion University. Every Email sent from one university member to another was recorded. Every record in the log contains the exact date and time when the Email message was sent. Email addresses of senders and receivers were hashed in order to preserve users' privacy. The log also contains the volume of every Email in bytes. The last field indicates whether the Email was successfully received by the addressee.
Threats propagate via Email by sending themselves as an attachment. We assume that an infected attachment is larger than 2 MB; thus, we remove records of Emails smaller than 2 MB in volume. We also remove the records of Emails that did not reach their destination. Next, we construct the social network formed by the remaining records. Every Email address that appears in the filtered log is mapped to a vertex, and two vertices are connected if there is at least one recorded Email from one to the other. The derived network is unweighted and undirected. Other properties of the network are described in Table I. We used Pajek (a tool for network analysis and visualization) to lay out the social network for the simulation.
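The filtering and graph-construction steps above can be sketched as follows. The column names (sender_hash, recipient_hash, bytes, delivered) and the CSV format are assumptions of this sketch; the actual format of the university's log is not specified in the paper:

```python
import csv

MIN_BYTES = 2 * 1024 * 1024   # the 2 MB attachment-size cutoff from the text

def build_network(log_path):
    """Build the unweighted, undirected social graph from an Email log,
    dropping undelivered Emails and Emails too small to carry an
    infected attachment."""
    vertices, edges = set(), set()
    with open(log_path, newline="") as f:
        for rec in csv.DictReader(f):
            if not int(rec["delivered"]):       # Email never reached the addressee
                continue
            if int(rec["bytes"]) < MIN_BYTES:   # too small for an infected attachment
                continue
            a, b = rec["sender_hash"], rec["recipient_hash"]
            if a != b:
                vertices.update((a, b))
                edges.add(frozenset((a, b)))    # undirected: {a,b} == {b,a}
    return vertices, edges
```

Because edges are stored as unordered pairs, an Email in either direction produces the same single edge, matching the undirected network described above.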

TABLE I
DERIVED NETWORK STATISTICAL PROPERTIES

Number of vertices       942
Mean degree              3.44
Total number of edges    1476

B. Pinpointing Central Users

In the second part of our framework, the suggested places to intercept threats in the social network are found. To pinpoint the central group of users, we analyzed the graph derived in the previous subsection. We locate the group of users that have the highest influence on the communication between all users using a greedy algorithm based on the GBC measure. It can be proven, by a straightforward reduction from the Vertex Cover problem, that finding a group of vertices with maximal GBC is NP-hard: every set of vertices that has the maximum possible GBC (as defined by equation 2) also has a member attached to every edge in the graph, and vice versa. In light of this fact, we aim for an algorithm fast enough to be used in this framework.

The fast algorithm for GBC computation described in [22] can be modified to construct a group of vertices with high GBC. The algorithm requires preprocessing that can be performed in O(n^3) time, where n is the number of vertices in our network. The original algorithm is aimed at rapid computation of the GBC of various groups. Assuming we want to compute the GBC of a group of vertices C of size k, a set M is maintained during the algorithm as the subset of C whose joint contribution has already been accounted for. One of the most important features of this algorithm is that we always know the contribution of any vertex to the GBC of M. In this study we developed an efficient greedy maximization of GBC by starting the algorithm with C equal to the set of all users and each time choosing the next vertex having maximal contribution. The following roughly describes the process used to find the possible groups of influential users:

1. Run the preprocessing described in [22]. The Betweenness Centrality of all vertices is calculated as part of this preprocessing.
2. Initialize M to the empty set and initialize the contribution of each vertex to its Betweenness Centrality.
3. Choose the vertex with the highest contribution, add it to M, and update the contributions of all vertices [22]. The chosen vertex is also appended to a list.
4. Repeat step 3 for all vertices in the network.

As mentioned above, the preprocessing (step 1) is performed in O(n^3) time. Steps 2-4 are performed k times (k being the size of the group of vertices), and the third step takes O(n^2) time to complete; the execution time of steps 2-4 is therefore O(k·n^2). Thus, the total execution time of the second part of our framework scales as the cube of the network size, O(n^3).

This part of our framework was implemented in C++. We executed it on a PIV Core-Duo 3GHz computer with 2GB memory; the analysis of the social network extracted in the first part of the framework took a little less than three hours. During the third step of the above process, we saved the index of the chosen vertex and the GBC of M in a list, which contains the users arranged in the order in which they should be added to the group. A deployment of size k should include the top k vertices in this ordered list. The size of the optimal deployment is determined in the third part of our framework, in which every suggested deployment is evaluated using a network simulator.
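For illustration, the greedy construction above can be sketched with a naive recomputation of GBC at every iteration standing in for the fast incremental updates of [22]. Here σ_{s,t}(C) is obtained by counting the shortest paths that survive when the group is removed from the graph; endpoints inside the group are excluded, which is one common reading of equation (2). This is far slower than the O(n^3) scheme described above, but it produces a greedy ordering of the same kind:

```python
from collections import deque

def path_counts(adj, s, blocked=frozenset()):
    """BFS distances and shortest-path counts from s, skipping blocked vertices."""
    dist, sigma, queue = {s: 0}, {s: 1}, deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w in blocked:
                continue
            if w not in dist:
                dist[w], sigma[w] = dist[u] + 1, 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
    return dist, sigma

def gbc(adj, group):
    """Equation (2), brute force: a shortest s-t path passes through the
    group unless a path of the same length survives with the group removed."""
    group = set(group)
    total = 0.0
    for s in adj:
        if s in group:
            continue
        dist, sigma = path_counts(adj, s)
        dist_a, sigma_a = path_counts(adj, s, blocked=group)
        for t in adj:
            if t == s or t in group or t not in dist:
                continue
            avoiding = sigma_a[t] if dist_a.get(t) == dist[t] else 0
            total += (sigma[t] - avoiding) / sigma[t]
    return total

def greedy_order(adj):
    """Greedy GBC maximization (cf. steps 2-4 above): repeatedly add the
    vertex whose inclusion raises the group's GBC the most."""
    chosen, rest = [], set(adj)
    while rest:
        best = max(rest, key=lambda v: gbc(adj, chosen + [v]))
        chosen.append(best)
        rest.remove(best)
    return chosen
```

On a five-vertex path a-b-c-d-e, the middle vertex c is chosen first: it intercepts all eight ordered pairs whose shortest path crosses it, more than any other single vertex.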

Fig. 1 – A screenshot of the simulation. In the main frame the social network is visualized. At the bottom statistical information is presented.

C. Simulation Tool

In the third part of the framework, we analyze the threat propagation in the presence of a DNIDS. We have developed a simulation tool which uses the social network created in the first part of the framework. This tool simulates threat propagation in a social network and the operation of a DNIDS deployed on the groups of users identified in the second part of the framework. The output of this simulation tool is the average contamination level of the network and the time it takes to detect new threats.

In our simulation we used the SIRS model of epidemic disease propagation. A computer can be in one of the following states: a) Susceptible – the computer can become infected with a threat; b) Infective – the computer is infected with a threat and can infect others; c) Removed – the computer has crashed and is temporarily disabled. These states are presented in Fig. 2.

A susceptible computer can also be infected with a threat when removable media are attached. Since the origin of such an infection is not another user in our network, we call it "spontaneous infection." Any susceptible individual has a constant probability of becoming infected in every time unit of the simulation. In the course of this study we investigated the propagation of threats in a closed environment formed by university members; thus, threats originating outside the network are considered part of the spontaneous infections.
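One time unit of the simulation might be sketched as below. The state transitions follow the SIRS description above, while the parameter names, the DNIDS model (traffic to or from a monitored user is cleaned with some detection probability), and the exact order of checks are assumptions of this sketch rather than the paper's implementation:

```python
import random

S, I, R = "S", "I", "R"   # Susceptible, Infective, Removed (crashed)

def step(adj, state, crash_timer, monitored, p_spont, p_send, p_crash,
         p_detect, recover_steps, rng):
    """Advance the SIRS simulation by one time unit."""
    new_state = dict(state)
    for v in adj:
        if state[v] == R:                        # crashed: count down to recovery
            crash_timer[v] -= 1
            if crash_timer[v] <= 0:
                new_state[v] = S                 # recovered computers are susceptible again
        elif state[v] == S:
            if rng.random() < p_spont:           # spontaneous infection (removable media)
                new_state[v] = I
        else:                                    # Infective
            if rng.random() < p_crash:           # the infected computer crashes
                new_state[v] = R
                crash_timer[v] = recover_steps
                continue
            for w in adj[v]:                     # try to infect every neighbour
                if state[w] != S or rng.random() >= p_send:
                    continue
                # the DNIDS may clean traffic touching a monitored user
                if (v in monitored or w in monitored) and rng.random() < p_detect:
                    continue
                new_state[w] = I
    state.update(new_state)
```

With a perfect detector (p_detect = 1) on a monitored cut vertex, an infection cannot cross it; with no monitored users the same infection spreads freely, which is exactly the contrast the deployment-size experiments measure.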

…centrality to the greedy GBC algorithm. Spontaneous infection can be modeled on the basis of empirical data on computer infections via removable media. Samples of the outgoing traffic of infected computers can help us simulate more realistic accidental propagation. It seems that we could make the simulation more accurate by taking into account global traces of threat propagation. However, these traces reflect traffic that does not account for the deployment of a DNIDS; since the traffic changes once a threat is detected and removed from the network, such traces are not suitable for simulating threat traffic in the presence of a DNIDS.

APPENDIX A

Contamination Experiment Parameters

Parameter                                                            Value
Initial number of threat classes                                     10
Threat activation probability                                        1
Appearance of spontaneous new threats in each computer
  every time unit                                                    0.0003
The probability that a computer will send a threat in each
  time unit                                                          1
The probability that a computer will crash in each time unit         0.01
The number of time units it takes a computer to recover
  from crashing                                                      5-50
DNIDS detection probability of threat class                          0.5
Initial number of threats in the network                             0

Detection Time Experiment Parameters

Parameter                                                            Value
Initial number of threat classes                                     1
Threat activation probability                                        1
Appearance of spontaneous new threats in each computer
  every time unit                                                    0
The probability that a computer will send a threat in each
  time unit                                                          1
The probability that a computer will crash in each time unit         0
DNIDS detection probability of threat class                          0.001
Initial number of threats in the network                             1.00E-07

REFERENCES

[1] J. Balthrop, S. Forrest, M. E. J. Newman, and M. M. Williamson, "Technological networks and the spread of computer viruses", Science, vol. 304, p. 527, 2004.
[2] N. Weaver, V. Paxson, S. Staniford, and R. Cunningham, "A taxonomy of computer worms", in Proceedings of the First ACM Workshop on Rapid Malcode (WORM), 2003.
[3] S. Staniford, V. Paxson, and N. Weaver, "How to 0wn the Internet in your spare time", in Proceedings of the 11th USENIX Security Symposium, August 2002.
[4] Symantec Internet Security Threat Report, January-June 2004.
[5] X. Zhang, C. Li, and W. Zheng, "Intrusion prevention system design", in Proceedings of the Fourth International Conference on Computer and Information Technology, pp. 386-390, 2004.
[6] Y. Tang and S. Chen, "Defending against Internet worms: a signature-based approach", in IEEE INFOCOM, 2005.
[7] A. Gupta and R. Sekar, "An approach for detecting self-propagating Email using anomaly detection", in Proceedings of the International Symposium on Recent Advances in Intrusion Detection, September 2003.
[8] V. Yegneswaran, P. Barford, and S. Jha, "Global intrusion detection in the DOMINO overlay system", in Proceedings of NDSS, San Diego, CA, 2004.
[9] P. Blackburn, "Quarantining DHCP clients to reduce worm infection risk", http://www.giac.org/certified_professionals/practicals/gsec/3472.php.
[10] R. Pastor-Satorras and A. Vespignani, "Epidemics and immunization in scale-free networks", http://arxiv.org/abs/cond-mat/0205260, 2002.
[11] R. Huerta and L. S. Tsimring, "Contact tracing and epidemics control in social networks", Physical Review E, vol. 66, 056115, 2002.
[12] M. E. J. Newman, "The structure and function of complex networks", SIAM Review, vol. 45, no. 2, pp. 167-256, 2003.
[13] A. L. Barabasi, R. Albert, and H. Jeong, "Scale-free characteristics of random networks: the topology of the world-wide web", Physica A, vol. 281, pp. 69-77, 2000.
[14] A. L. Barabasi and R. Albert, "Emergence of scaling in random networks", Science, vol. 286, p. 509, 1999.
[15] J. R. Tyler, D. M. Wilkinson, and B. A. Huberman, "Email as spectroscopy: automated discovery of community structure within organizations", in Communities and Technologies, M. Huysman, E. Wenger, and V. Wulf, Eds., pp. 81-95, 2003.
[16] M. E. J. Newman, "A measure of betweenness centrality based on random walks", http://arXiv.org/abs/cond-mat/0309045, 2003.
[17] S. Wasserman and K. Faust, Social Network Analysis, Cambridge University Press, Cambridge, 1994.
[18] M. G. Everett and S. P. Borgatti, "The centrality of groups and classes", Journal of Mathematical Sociology, vol. 23, no. 3, pp. 181-201, 1999.
[19] L. C. Freeman, "A set of measures of centrality based on betweenness", Sociometry, vol. 40, pp. 35-41, 1977.
[20] M. E. J. Newman, "Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality", Physical Review E, vol. 64, 016132, 2001.
[21] U. Brandes, "A faster algorithm for betweenness centrality", Journal of Mathematical Sociology, vol. 25, no. 2, pp. 163-177, 2001.
[22] "Fast algorithm for successive group betweenness centrality computation", submitted for publication.