2015 IEEE 9th International Conference on Self-Adaptive and Self-Organizing Systems Workshops
Network Attack Detection and Mitigation

Ashok Singh Sairam
Department of Computer Science and Engineering
Indian Institute of Technology Patna
Patna, India
Email: [email protected]

Sangita Roy
Department of Computer Science and Engineering
Indian Institute of Technology Patna
Patna, India
Email: r[email protected]
positives is negligible. We propose an attack response algorithm in which routers are supplied with a signature (path id) of the attack packet, which is then used to filter the malicious packets. However, we show that in the midst of an attack, sending the signature directly to the edge router connected to the attacker may require a large number of transmissions. Our proposed attack response algorithm therefore follows a hop-by-hop approach: the attack signature is sent one hop at a time to the upstream router until the edge router is reached. The expected number of transmissions in this approach is significantly lower than in the end-to-end approach. Finally, the schemes were implemented and demonstrated in an emulated environment. Real attack traffic was injected, and it was shown that our proposed scheme correctly traces the attacker and filters the attack traffic. In the next phase we developed a distributed star coloring algorithm and compared it with the hash-based IP traceback technique. Because hashing is an online approach, our star coloring must also operate in a distributed manner; the experimental results show that it produces fewer false positives than the hash technique. Our current simulation work targets a dynamic environment in which routers and links are added and deleted. Coloring the network in a dynamic environment poses many challenges, and the most challenging part is traceback. We are currently concentrating on tracing the attacker in such a dynamic environment.
Abstract—Resource exhaustion attacks, or denial of service (DoS) attacks, have emerged as a major way to compromise the availability of servers and interrupt legitimate online services. IP traceback refers to the problem of identifying the source of such attacks. Packet marking is a general technique to trace back attackers. The main idea in packet marking is to insert some traceback data in each packet. The usual approach is to encode the IP address of the edge router into each incoming packet and store it in the 16-bit ID field of the IP packet header. Since the information of a 32-bit field is compressed into a 16-bit field, collisions occur irrespective of the hash function used. This means there will be false positives (that is, a legitimate user is incorrectly identified as an attacker), and the problem will escalate as the size of the network increases. To avoid such collisions, we propose to explore the feasibility of using packet marks that are not directly dependent on the IP address of the packet.

Keywords-star coloring; packet marking; distributed star coloring; attack path construction.
I. PROBLEM

When the marking is independent of the IP address, we need to assign a value to each router explicitly. In this work we propose a variant of the star coloring scheme, called color balanced star coloring, which assigns a color to each router. The coloring scheme ensures that no three consecutive routers have the same color. The first advantage of star coloring is that it reduces ambiguity during traceback. Secondly, the colors can be reused, which reduces the bit space required to represent them. Thirdly, the proposed scheme ensures that the graph is colored with its star chromatic number, guaranteeing that the bit space required is minimum. Next we propose an algorithm to trace back the attacker using these colors. We introduce the concept of a path id, which is a signature of the attack path. The use of the path id provides an elegant way of collecting packets from the attacker and constructing the attack path. We also introduce the concept of a precise termination condition, that is, the number of packets required to reconstruct the attack graph. Intermediate routers marking en route packets set the TTL field to a global constant. This ensures that the traceback algorithm is robust to TTL spoofing. Extensive mathematical analysis was done to compute the false positives that can occur due to path id collisions. Our results show that the percentage of such false

978-1-4673-8439-1/15 $31.00 © 2015 IEEE
DOI 10.1109/SASOW.2015.33
II. MOTIVATING SCENARIO AND RESEARCH CHALLENGES

Most of the existing packet marking techniques require a large number of packets to converge on the attack path(s) [1] [3]. This is because the existing solutions use the IP address of the router as a mark and encode it in the 16-bit identification field of the IP packet header. Thus a minimum of two packets is required at the victim to collect information about a single router, and a complex mechanism is required to integrate the packet mark information. To reduce this overhead on the IP header, as well as to simplify the traceback mechanism, a number of packet marking schemes have been proposed [6] [7]. However, it has been observed that reducing the overhead leads to false positives. In this work, our aim is to assign a mark to routers such that it can be used to identify an attacker as well as to mitigate DoS attacks. The objectives of our work can be summarized as follows:

• Our first goal is to devise a mechanism to associate a mark with each router which is independent of the router's IP address. Initially, we assume that the entire network map is known and assign the marks so that attackers (routers) can be unambiguously traced back. We also ensure that the marks are reused so as to minimize the bit space required to encode them.
• In order to make our proposed scheme of assigning marks to routers scalable, our second goal is to allow routers to choose their marks in a distributed manner without compromising the traceback capability.
• Our third objective is to use the packet marks to construct the attack path with fewer packets and to reduce false positives.
• In the last part of our work, we use the packet marks to filter attacks and thereby mitigate denial of service attacks. In view of the increasing number of application-based DoS attacks, we also aim to propose a lightweight scheme to mitigate application-layer Distributed Denial of Service (DDoS) attacks.
a graph with its chromatic number. Graph coloring is an NP-complete problem [9], and the coloring is not fully guaranteed to converge or to color the graph successfully. Even so, the combined failure probability of convergence and traceback is much lower than the failure or false positive probability of other techniques, e.g., RIM [10] and hashing [4]. In our work we discussed how convergence depends on the number of available colors and also compared the failure probability of star coloring the graph. We designed two algorithms that attempt color balanced star coloring. The algorithms start with a color palette whose size equals the star chromatic number ($\chi_s$) of the graph. The input to the algorithm is an uncolored graph G. Each node maintains an available list that contains the colors which can be assigned to the node without violating the star template. A global occurrence list, which counts the number of times each color has been used, is also maintained. The algorithms select the node with the highest degree, color it, update the lists, and then from among its neighbours select the one with the highest degree. The process continues recursively until all nodes are colored or we arrive at a deadlock. We say that a deadlock has occurred if no colors are available to color a node; the node is then said to be blocked.

1) Coloring without backtrack: In the first algorithm, we simply increase the color palette by 1 whenever a deadlock occurs. This is the simplest algorithm we can design to star color a graph [12]. The algorithm has a complexity of $O(V)$, where V is the number of nodes in the graph. Since the palette is incremented on each deadlock, in the worst case it can grow by up to (V − C). In reality, however, the number of nodes in the Internet graph is much higher than the total number of available colors, so it is not worthwhile to assign each node a unique color. We give a bound on the minimum, average and maximum number of color increments.
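The coloring-without-backtrack procedure can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the graph is an adjacency dictionary, the star template is taken to mean that nodes within two hops must differ in color, the "balanced" choice is approximated by picking the least-used available color, and the names `two_hop_neighbors`, `balanced_star_coloring` and `respects_template` are hypothetical.

```python
from collections import Counter


def two_hop_neighbors(adj, v):
    """Return all nodes at distance 1 or 2 from v."""
    near = set(adj[v])
    for u in adj[v]:
        near.update(adj[u])
    near.discard(v)
    return near


def balanced_star_coloring(adj, palette_size):
    """Greedy coloring; the palette grows by one on every deadlock."""
    color = {}
    usage = Counter()          # global occurrence list: color -> times used
    palette = palette_size
    increments = 0
    by_degree = sorted(adj, key=lambda v: len(adj[v]), reverse=True)
    for root in by_degree:     # restart here only for disconnected parts
        stack = [root]
        while stack:
            v = stack.pop()
            if v in color:
                continue
            forbidden = {color[u] for u in two_hop_neighbors(adj, v) if u in color}
            available = [c for c in range(palette) if c not in forbidden]
            if not available:  # deadlock: v is blocked, add one new color
                available = [palette]
                palette += 1
                increments += 1
            # balance: prefer the color used least often so far
            chosen = min(available, key=lambda c: (usage[c], c))
            color[v] = chosen
            usage[chosen] += 1
            # ascending sort, so the highest-degree neighbor is popped next
            stack.extend(sorted((u for u in adj[v] if u not in color),
                                key=lambda u: len(adj[u])))
    return color, palette, increments


def respects_template(adj, color):
    """True if no two nodes within two hops share a color."""
    return all(color[v] != color[u]
               for v in adj for u in two_hop_neighbors(adj, v))
```

Starting a five-node line graph with a palette of two colors forces exactly one deadlock in this sketch, after which the palette holds three colors and the template is satisfied.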
2) Coloring with color reassignment: In the next algorithm, we try to enforce that the graph is colored with its star chromatic number [13]. Since we start with the star chromatic number, each node should be able to receive a color; a deadlock therefore arises because the traversal order was not optimal. Thus whenever the algorithm runs into a deadlock, it first examines the first and second hop neighbours of the blocked node. For each node in this set, it computes the current available list. From this set it chooses a pair of nodes which have a common available color and are at a distance greater than two from each other, as candidates for reassignment. This process continues until a color becomes available for the blocked node. The algorithm has a complexity of $O(V^2)$.
III. STATE OF THE ART

IP traceback schemes which require global coordination can be classified into four broad categories: a) link testing, b) packet logging, c) iTrace and d) packet marking. In link testing [3], an attack signature is constructed based on some common feature of the attack packets and is then communicated to the victim's egress router. The egress router in turn communicates the signature to its upstream router, and the process continues recursively until the attack origin or the ISP's boundary is reached. In logging-based schemes [4], digests of packets are stored in en route routers. This information is later used to identify the source of a malicious packet by recursively querying connected routers. In iTrace [2] [5], routers probabilistically generate ICMP messages for en route packets directed towards the victim. The ICMP packets collected at the victim are used to reconstruct the attack path. However, these schemes either impose a high computational overhead on the en route routers or generate additional traffic overhead.

IV. METHODOLOGY

A. Color balanced star coloring of Internet graph in centralized approach

Color balanced star coloring is a star coloring of a graph such that the occurrence of the colors is balanced. A star coloring of a graph G is a function $\varphi : V(G) \to \mathbb{N}$ such that for any two distinct vertices $x, y \in V$, $\varphi(x) \neq \varphi(y)$ if the distance between x and y is less than or equal to 2. We show that color balanced star coloring is a necessary condition to star color

B. Star coloring using distributed approach
Although the centralized approach of star coloring minimizes the bits to encode the marks, it requires that the graph structure be known a priori. This requirement makes the approach practically infeasible for two reasons. Firstly, the Internet topology is dynamic as nodes come up or
go down instantaneously. Secondly, the Internet topology cannot to date be accurately abstracted with any known mathematical model. In this work, we propose a distributed star coloring algorithm suited to online coloring of Internet graphs. The proposed technique does not require prior information about the graph structure, hence it can be used as an online approach by Internet routers to assign themselves colors. The working principle is similar to that of the offline approach [13] [12]. In the first step, a node (v) assigns itself a temporary color from a given color template. The node then announces its provisional color selection to its first and second hop neighbors. In response, the node also receives updates from its two-hop neighbors about their color selections. The node then checks whether the star color template is satisfied. If the assignment is not successful, the node's color i belongs to $C_f(v)$, where $C_f(v)$ is the set of colors being used by the node's first and second hop neighbors. In such a case, the node can select a color from three sets of colors: its assigned color i, which has collided with one or more of its two-hop neighbors; the forbidden colors $C_f(v)$; and the unused colors $C_u(v)$, where $C_u(v)$ is the set of unused or available colors of node v. The idea behind dividing the color palette into three subsets is that we select a color from the first set with the least probability, from the second set with a higher probability and from the third set with the highest probability.

• Case 1 (Probability of selecting color i): For the collided color i,

$$p_i = (1 - b)\,p_i \tag{1}$$
Figure 2. Construction of path id.
C. Attack path construction

In the traditional packet marking approach, the marks of the routers are assumed to be unique. Thus, an attack path will be composed of unique marks sorted in decreasing order with respect to their distance from the victim. In star coloring, because the colors are reused, the same color can occur more than once in an attack path, and thus directly sorting the packets based on the count of their colors will result in an erroneous path. In Figure 1, if we consider the attack path 1-2-3-1-2, the victim will receive packets from node a and node d, as well as from node b and node e, bearing the same color information. Constructing the attack path by sorting the count of packet colors will yield the path 1-2-3, which is erroneous. In this work, we propose to construct a path id to be embedded by en route routers. Typically, the 16-bit ID field in the IP header is overloaded. A k-bit color field is used by nodes to mark their color. The remaining bits are used to store the fingerprint or path id. Ideally, the path id is a summation of the colors used on the path. However, such a proposition would require a large bit space to store the path id. In order to restrict the size of the path id to that of the color field, we propose to use XOR or one's complement arithmetic to compute the sum. The first router, or the source, stores its color in the path id field. Each intermediate router XORs its color with the existing value in the path id field to obtain the new path id. Finally, when the packet arrives at the victim, the mark is the XOR sum of the colors of all routers on the path. This process is shown in Figure 2. The attack path construction algorithm assumes that the IDS flags a packet as an attack packet. The algorithm
where b [11] is a design parameter whose value ranges from 0 to 1.

• Case 2 (Probability of selecting a color from the forbidden colors $C_f(v)$):

$$p_j = \begin{cases} (1-b)\,p_j + \dfrac{b}{(|C'|-1)(|C_f|-1)}, & \text{if } j > 1 \\[2ex] (1-b)\,p_j + \dfrac{b}{|C|}, & \text{if } j = 1 \end{cases} \tag{2}$$

where $|C|$ is the total number of colors used for coloring the network, $|C_f|$ is the total number of collided colors and $|C'|$ is the total number of colors as seen by the node, excluding those colors which have been permanently assigned to nodes.

• Case 3 (Probability of selecting a color from the unused colors $C_u(v)$):

$$p_k = \begin{cases} (1-b)\,p_k + \dfrac{b}{|C'|-1} + \dfrac{b\,(|C'|-2)}{(|C'|-1)\,|C_u|}, & \text{if } j > 1 \\[2ex] (1-b)\,p_k + \dfrac{b}{|C|} + \dfrac{b}{|C|-|C_u|}, & \text{if } j = 1 \end{cases} \tag{3}$$

Figure 1. Star coloring of graph.

In our algorithm, each node needs to maintain $C_f$, the forbidden list of colors. We propose to use a data structure, the forbidden batch map, to construct this list. Each node maintains a copy of the forbidden batch map, which stores the self-assigned colors of its neighbors. After provisionally choosing a color, a node broadcasts its choice to all its first and second hop neighbors. Likewise, a node also receives the color updates of all its two-hop neighbors. Using this information, a node creates its forbidden batch map.
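The forbidden batch map and the three-way split of the palette could be represented along the following lines. This is only a sketch under assumptions: the class name `ColorNode`, its methods, and the fixed weights used in `reselect` are hypothetical, and the weights are a simplified stand-in that merely preserves the ordering "collided color least likely, forbidden colors more likely, unused colors most likely"; they are not the probability updates of Equations (1)-(3).

```python
import random


class ColorNode:
    """One router taking part in distributed star coloring (illustrative)."""

    def __init__(self, node_id, palette):
        self.node_id = node_id
        self.palette = set(palette)
        self.color = None
        # forbidden batch map: neighbor id -> color announced by that
        # first or second hop neighbor
        self.forbidden_map = {}

    def receive_update(self, neighbor_id, color):
        """Record a color announcement from a one- or two-hop neighbor."""
        self.forbidden_map[neighbor_id] = color

    def forbidden_colors(self):
        """C_f(v): colors currently used within two hops."""
        return set(self.forbidden_map.values())

    def unused_colors(self):
        """C_u(v): palette colors nobody within two hops has announced."""
        return self.palette - self.forbidden_colors()

    def has_collision(self):
        """True if the node's own color clashes with a two-hop neighbor."""
        return self.color in self.forbidden_colors()

    def reselect(self, rng, weights=(1, 2, 4)):
        """Draw a new color; weights are assumed, not taken from the paper."""
        w_own, w_forbidden, w_unused = weights
        forbidden = self.forbidden_colors()
        candidates = sorted(self.palette)
        w = []
        for c in candidates:
            if c == self.color:
                w.append(w_own)        # collided color: least likely
            elif c in forbidden:
                w.append(w_forbidden)  # forbidden color: more likely
            else:
                w.append(w_unused)     # unused color: most likely
        self.color = rng.choices(candidates, weights=w, k=1)[0]
        return self.color
```

A node would call `receive_update` for every announcement it hears, test `has_collision`, and call `reselect` only when its provisional color clashes.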
Figure 3. End-to-end filtering approach.
extracts the path id from the malicious packet. It checks each incoming packet for this value, collects the matching packets, and finally sorts them on the basis of their color field and TTL value. The main advantage of our path construction algorithm is that the IDS only needs to flag a packet as malicious. The victim can extract the path id from the ID field, collect all subsequent packets bearing that id, and sort them to obtain the attack path. An additional advantage of using the path id is in filtering: the victim can use the path id to drop all subsequent packets. However, the path id of an entire path is not guaranteed to be globally unique, because two paths can carry exactly the same color information.
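The XOR-based path id can be illustrated with a short sketch. The function names, the 4-bit color width `k`, and the layout of the 16-bit ID field (color in the high bits, path id in the low bits) are assumptions made for illustration; the text fixes only that a k-bit color field and a path id share the ID field. Because XOR is its own inverse, a router can also recover the path id seen by its upstream neighbor by XORing its own color back out, which is the property the hop-by-hop filtering scheme relies on.

```python
from itertools import accumulate
from operator import xor


def mark_along_path(colors):
    """Path id carried by the packet after each router, source first.

    The first router stores its color; every later router XORs its own.
    """
    return list(accumulate(colors, xor))


def upstream_path_id(path_id, own_color):
    """Path id of packets as they arrived, before this router marked them."""
    return path_id ^ own_color


def pack_id_field(color, path_id, k=4):
    """Assumed layout: k-bit color in the high bits, path id in the rest."""
    assert 0 <= color < (1 << k)
    assert 0 <= path_id < (1 << (16 - k))
    return (color << (16 - k)) | path_id


def unpack_id_field(field, k=4):
    """Split a 16-bit ID field back into (color, path id)."""
    return field >> (16 - k), field & ((1 << (16 - k)) - 1)
```

For the attack path 1-2-3-1-2 used in the text, `mark_along_path([1, 2, 3, 1, 2])` gives `[1, 3, 0, 1, 3]`, so the victim sees path id 3, and the last router (color 2) derives `upstream_path_id(3, 2) == 1` as the value carried by packets when they reached it.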
Figure 4. Hop-by-hop filtering approach.
D. Attack mitigation using packet mark

1) Network layer attack response using packet mark: An attack response plan consists of the procedure to identify, filter, and trace back the attack traffic. Here, we show how attack packets can be filtered using packet marks. Packet filtering is a mechanism for controlling what data can flow through routers and firewalls to and from a network. In order to mitigate the effect of an attack, the victim needs to filter attack packets. However, dropping packets at the victim will not help, because the bandwidth will already have been consumed. A more effective approach is to drop attack packets at the edge router, which in our case is the router nearest to the attacker on the attack path. After the traceback is complete, our aim is to send a signature of the attack packet to the edge router, which it can use to filter the attack packets. There are two ways of sending the filtering information: end-to-end and hop-by-hop.

• End-to-end filtering: In an end-to-end approach, the victim sends the filtering information directly to the edge router. The advantage is that attack packets are dropped at the point of entry and the problem is addressed immediately. Figure 3 shows the end-to-end filtering technique.
• Hop-by-hop filtering: In a hop-by-hop approach, the attack signature is forwarded one hop at a time along the attack path. The victim computes the attack signature of its first upstream router on the attack path and forwards the information. This router immediately starts filtering the packets, computes the path id of the next upstream router, and then forwards it. This process continues until the edge router is reached. A router computes the path id of its upstream router by XORing its color with the path id. Figure 4 shows the functioning of the hop-by-hop filtering technique.

2) Application layer attack response using clustering: App-DDoS attacks, unlike lower layer DDoS attacks, do not necessarily flood the victim with packets. They consume the resources of the server by sending intensive requests, or requests for large downloads, while simulating the browsing pattern of normal users. These attacks work on top of successful transport layer connections, hence the standard DDoS defense schemes based on filtering traffic from spoofed IP addresses are rendered useless. In this work, we mainly target the HTTP flood, HTTP repeated request and HTTP recursive attacks. In these types of attack, legitimate or invalid HTTP packets are sent to the server at a high rate with the aim of overwhelming its session resources. We make the following assumptions.
• Attackers visit semantically related pages. Although botnets can dynamically change the URLs to visit, for a given duration all bots under a single C&C will visit pages of a server in the same sequence, or at least from the same pool of semantically related web pages.
• Attackers will attempt to drain resources of the server. In order for an attack to be successful, the attacker needs to exert sufficient workload on the server in terms of either communication or computational resources. An HTTP flood attack will primarily drain the network bandwidth, while attacks like HTTP recursive request will exhaust the computational resources.
Figure 5 shows the architecture of our attack detection model. It consists of four modules: feature extractor, clustering unit, workload calculator and filter. The feature extractor module mines the navigation log to extract browsing behavior attributes. The clustering unit samples the user traffic once every interval T. The value of T greatly influences how the clusters are formed. If we choose a very high value, we may get dense clusters, but the detection delay will render the framework useless in the event of an attack. The workload calculator uses the
Figure 5. Architecture of Proposed Solution.

Figure 6. Relation of false positive/false negative with number of clients.
clustering report and computes the size and density of each cluster. Dense and large clusters are further subjected to workload calculation. The load is calculated in terms of both the computational overhead and the network bandwidth usage. Finally, the filtering unit examines each cluster and, if its workload exceeds a threshold, provides a mechanism to filter traffic from the sources in that cluster. Ideally, the threshold should be set proportional to the capacity of the web server.
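The decision made by the workload calculator and the filter can be sketched as below. Everything concrete here is an assumption for illustration: the `Cluster` fields, the size and density cut-offs, and in particular the choice to combine computational and bandwidth load by simple addition are hypothetical, since the text specifies only that large, dense clusters are checked against a workload threshold.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    sources: list      # client identifiers grouped by browsing behavior
    density: float     # cluster density reported by the clustering unit
    cpu_cost: float    # estimated computational load of the cluster
    bandwidth: float   # estimated network bandwidth used by the cluster


def clusters_to_filter(clusters, min_size, min_density, threshold):
    """Return clusters that are large, dense and above the load threshold."""
    flagged = []
    for cluster in clusters:
        if len(cluster.sources) < min_size or cluster.density < min_density:
            continue   # small or sparse clusters skip workload calculation
        workload = cluster.cpu_cost + cluster.bandwidth  # assumed combination
        if workload > threshold:
            flagged.append(cluster)
    return flagged
```

In such a sketch the `threshold` argument would be chosen in proportion to the capacity of the web server, as suggested in the text.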
V. CURRENT STATUS

A. False positives and false negatives in traceback
In order to compute the false positives and negatives, we use the skitter project map distributed by the Cooperative Association for Internet Data Analysis [14] to build the network map. The number of nodes considered was 10000. All these nodes were assigned colors according to the balanced star color template. To model a DDoS attack, one node was chosen as the victim, and other nodes were chosen at random to act as the attackers. The routing protocol used was Bellman-Ford. The experiment was repeated while gradually increasing the number of attackers. In order to trigger the traceback process, an IDS needs to identify the attack packets. This requires examining the traffic content, which is beyond the scope of this paper. As described in the previous section, an existing IDS [15] is used to identify attack packets. The victim records the path followed by each packet and computes the path id. The maximum and average path lengths considered in our experiment were 10 and 5, respectively. In our approach, we only collect packets from attackers to construct the attack path. Assuming that the IDS correctly identifies attack packets, no legitimate packets will be identified at the victim's end. However, the path id of one attack path can coincide with that of another attack path. In such a case, packets from both attackers will be merged. The victim will not be able to trace back to either of the attackers, and the constructed path may lead to a legitimate user. Thus, in the case of a path id collision, both false positives and false negatives are possible. The relation between the false positive/false negative ratio and the number of clients, for both our approach and RIM [10], is plotted in Figure 6.
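How path id collisions merge attackers can be made concrete with a small sketch. It assumes each attack path is given as the list of router colors from the attacker to the victim; the function names and the definition of the ratio (attackers sharing a path id over all attackers) are illustrative assumptions and do not reproduce the evaluation behind Figure 6.

```python
from collections import defaultdict
from functools import reduce
from operator import xor


def colliding_paths(paths):
    """Group attackers by path id and keep only ids shared by several."""
    groups = defaultdict(list)
    for attacker, colors in paths.items():
        groups[reduce(xor, colors, 0)].append(attacker)
    return {pid: who for pid, who in groups.items() if len(who) > 1}


def collision_ratio(paths):
    """Fraction of attackers whose path id coincides with another one."""
    if not paths:
        return 0.0
    collided = sum(len(who) for who in colliding_paths(paths).values())
    return collided / len(paths)
```

For example, the paths 1-2-3 and 3-2-1 both XOR to 0, so the victim cannot separate the packets of those two attackers, while a third attacker on path 1-2 (path id 3) remains traceable.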
Figure 7. Scale free graph.
In the case of RIM, false positives occur because the unique interface identifiers (IIDs) of the en route routers and their XOR values may coincide. As can be seen from the figure, in the average case the false positive ratio of RIM is double that of our approach. The difference arises because the bit space for the path id in our approach is nearly twice as large as the bit space for the IID, so collisions are fewer in our case. Another observation is that, in general, the false positive ratio increases with the number of attackers. However, there are also a number of instances where the false positive ratio actually drops as the number of attackers increases. These cases occur when the number of path id collisions remains constant while the number of attackers increases.

B. Hash collision vs color collision in distributed coloring

We compare our distributed star coloring algorithm with existing hash-based techniques with respect to collision probability. The graph in Figure 7 needs at least 6 colors for successful coloring. At distance 1, for 5 nodes, 5 distinct colors will be used to follow the star template, but 5 distinct hashes will not necessarily be generated. To obtain the hash values, we ran the hash generation program 1000 times for all nodes and calculated the mean number of times a single hash value was generated. At distance 2, there are 12 nodes, where any one color can be used at most 4 times (this differs from the line graph because the positions of the nodes are changed) and a single hash value can in principle be generated up to 12 times, but in our experiment the
ACKNOWLEDGMENT

We thank IIT Patna for providing the research platform and TCS RSP for financial support.

REFERENCES

[1] H. Burch and B. Cheswick, Tracing anonymous packets to their approximate source, in Proc. 2000 USENIX LISA Conf., Dec. 2000, pp. 319-327.
[2] S. M. Bellovin, ICMP traceback messages, Draft: draft-ietf-itrace-04.txt, February 2000.
[3] S. Savage, D. Wetherall, A. Karlin, and T. Anderson, Network support for IP traceback, IEEE/ACM Trans. Networking, vol. 9, June 2001, pp. 226-237.
[4] A. Snoeren, C. Partridge, L. Sanchez, C. Jones, F. Tchakountio, S. Kent, and W. Strayer, Hash-based IP traceback, in Proc. ACM SIGCOMM, San Diego, CA, 2001, pp. 3-14.

Figure 8. Hash collision and color collision probability over number of nodes.
mean number of times a single hash value was generated is 7. So, the color and hash collision probabilities at distance 2 are 0.33 and 0.58, respectively. The collision probability is calculated as the number of identical hashes/colors generated divided by the total number of generated hashes/colors. At distance 3 the collision probabilities are 0.57 and 0.76, and so on. As hash values are generated randomly, the hash collision probability grows faster than the color collision probability as the number of nodes at each distance increases.
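The distance-2 figures can be checked with a few lines of arithmetic. The sketch below assumes the collision probability is the count of the most frequently generated value divided by the total number of values, which matches the 4-out-of-12 and 7-out-of-12 cases quoted above; the function name and the sample lists are illustrative, not data from the experiment.

```python
from collections import Counter


def collision_probability(values):
    """Share of the most frequently generated value among all values."""
    counts = Counter(values)
    return max(counts.values()) / len(values)


# 12 nodes at distance 2: one color may repeat at most 4 times
colors = [0] * 4 + [1] * 4 + [2] * 4
# 12 nodes at distance 2: one hash value generated 7 times on average
hashes = ["h0"] * 7 + ["h1", "h2", "h3", "h4", "h5"]

color_p = collision_probability(colors)   # 4 / 12
hash_p = collision_probability(hashes)    # 7 / 12
```

Rounded to two decimals these evaluate to 0.33 and 0.58, the values reported for distance 2.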
VI. RESEARCH PLAN

In this research, we proposed different centralized star coloring algorithms to color the Internet graph. To overcome the problems of the centralized approach, we designed distributed star coloring in the next phase of our research. To minimize collisions, a uniform probability is initially distributed over all the colors, and after every iteration the color probabilities are changed if a color collision occurs at any node. For any collided node, the probability mass of its newly generated color list is divided in three different ways: the probability of the collided node's own color, the probability of its forbidden colors (those not permanently assigned to its neighbors) and the probability of its unused colors differ according to the probability distribution. All failed nodes maintain separate color lists for selecting an appropriate color at the next iteration after a collision. The colors which are permanently assigned are removed from the available color lists of all nodes to speed up convergence. In the next phase we compared our approach with existing hash-based techniques and showed that color collisions are fewer than hash collisions. Our distributed star coloring algorithm is also able to handle the dynamic nature of the Internet graph. The future direction of this work is to develop a star coloring algorithm for dynamic networks, as well as traceback in a dynamic environment, taking node failures and link failures into account.

[5] H. Lee, V. Thing, Y. Xu, and M. Ma, ICMP Traceback with Cumulative Path, An Efficient Solution for IP Traceback, Information and Communications Security, LNCS, 2003, pp. 124-135.
[6] A. Belenky and N. Ansari, IP traceback with deterministic packet marking, IEEE Communications Letters, 7(4), 2003, pp. 162-164.
[7] S. K. Rayanchu and G. Barua, Tracing attackers with deterministic edge router marking (DERM), in Proc. 1st Int. Conf. Distrib. Comput. Internet Technol., 2004, pp. 49-52.
[8] T. K. T. Law, You Can Run, But You Can't Hide: An Effective Statistical Methodology to Trace Back DDoS Attackers, 2005.
[9] M. Muthuprasanna, G. Manimaran, M. Manzor, and V. Kumar, Coloring the Internet: IP traceback, in Proc. 12th International Conference on Parallel and Distributed Systems (ICPADS), Minneapolis, MN, 2006.
[10] R. Chen, J. M. Park, and R. Marchany, RIM: Router Interface Marking for IP traceback, IEEE Global Telecommunications Conference (GLOBECOM '06), 2006, pp. 1-5.
[11] D. J. Leith and P. Clifford, A self-managed distributed channel selection algorithm for WLANs, 4th International Symposium on Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks, IEEE, 2006, pp. 1-9.
[12] S. Roy, A. Singh, and A. S. Sairam, IP traceback in star colored networks, Fifth International Conference on Communication Systems and Networks (COMSNETS), 2013.
[13] A. S. Sairam, S. Roy, and R. Sahay, Coloring Networks for Attacker Identification and Response, Security and Communication Networks, Wiley, doi: 10.1002/sec.1022, 2014.
[14] CAIDA's Router Level Topology Measurements, http://www.caida.org/tools/measurement/skitter/router topology/
[15] M. Roesch, Snort: Lightweight intrusion detection for networks, in 13th USENIX Systems Administration Conference (LISA '99), Seattle, WA, Nov. 1999.