NXG01-4: Scalable Hierarchical Traceback

Scalable Hierarchical Traceback

Arjan Durresi, Vamsi Paruchuri
Department of Computer Science, Louisiana State University, Baton Rouge, LA, USA
[email protected]

Abstract: Distributed Denial of Service attacks have recently emerged as one of the most potent, if not the greatest, weaknesses of the Internet. Previous solutions to this problem try to trace back to the exact origin of the attack by requiring the participation of all routers. For many reasons this requirement is impractical. In the presence of non-participating routers, most of the proposed schemes either fail to reconstruct the attack path or end up with only an approximate location of the attacker. We propose Hierarchical IP Traceback (HIT), a hierarchical approach to address this issue. HIT improves significantly on other works in several dimensions: (1) with just a few tens of packets, HIT enables the victim to reconstruct the attack graph, an improvement of 2-3 orders of magnitude compared to previous schemes; (2) HIT scales to large distributed attacks with thousands of attackers; (3) owing to its hierarchical nature, the reconstruction takes only tens of seconds.

Keywords: Traceback; Distributed Denial of Service; Security

I. INTRODUCTION

Distributed Denial of Service (DDoS) attacks are a potent form of attack on the availability of Internet services and resources. A denial-of-service attack is, by definition, any act intended to cause a service to become unavailable or unusable. In a DDoS attack there is no inherent limit on the number of machines that can be used to launch the attack. A DDoS attack exploits the distributed nature of the Internet, with hosts owned by disparate entities around the world. These unsuspecting computers are used to wage a coordinated, mass-scale attack against a particular system or site. In addition, since the attack traffic comes from a wide range of IP addresses, it is much more difficult to detect and block at the firewall level. Such attacks are an annoyance at a minimum, and against a critical system they can be severely damaging.

The negative effects [1] of a DDoS attack require that solutions and security measures be developed to prevent these types of attacks. Attack packets usually carry spoofed source addresses, and due to the stateless nature of the Internet it is difficult to determine their true source; this is called the IP traceback problem. Traceback mechanisms are also a first step toward automated packet filtering at routers in order to block an attack at its origin.

To effectively provide the above benefits and be applicable in an Internet environment, a traceback mechanism must have the following general properties: (1) compatibility with existing network protocols, routers and infrastructure, (2) insignificant network traffic overhead, (3) scalability to a large number of attackers while maintaining accuracy, and (4) ability to cope with non-participating routers. In this paper, we present HIT to address the needs of both victims and network operators by exploiting the inherent Autonomous System structure of the Internet. The contributions of this paper are summarized as follows:

- HIT is very scalable, able to deal with thousands of attackers, and able to perform reconstruction in seconds. HIT needs fewer than 150 packets, a reduction of 1-2 orders of magnitude compared to previous schemes, and still produces less than one tenth of the false positives of previous techniques.
- While most previous traceback mechanisms either completely fail or misinterpret router locations in the presence of legacy routers, HIT is able to reconstruct accurate attack graphs.
- HIT exploits the inherent Autonomous System topology of the Internet, thus achieving higher scalability, fewer false positives and lower computational overhead than existing works.

The rest of this paper is organized as follows: Section II discusses related work, Section III deals with the motivation and background, Section IV presents the Hierarchical IP Traceback scheme, Section V presents the performance evaluation of HIT, and Section VI concludes.

II. RELATED WORK

The most obvious countermeasure against DDoS attacks is ingress filtering [2] based on source address. Another technique is victim pushback, where a site that believes itself to be under attack can send back messages installing filters at upstream routers [9]. Due to the current lack of incentives for ISPs to provide such a service, it is not expected to become widely deployed anytime soon.
One promising solution is to let routers probabilistically mark packets with partial path information

1-4244-0357-X/06/$20.00 ©2006 IEEE

This full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts for publication in the IEEE GLOBECOM 2006 proceedings.

during packet forwarding [5, 11]. The victim then reconstructs the complete paths after receiving a modest number of packets that contain the marking. But, as shown in [6], this approach imposes a very high computation overhead on the victim when reconstructing the attack paths, and gives a large number of false positives when the denial-of-service attack originates from multiple attackers. Song et al. [6] improve on the Savage scheme [5] by predetermining the network topology. Dean et al. [10] provide another avenue to improve CEFS. Instead of using a hash function as a verifier, the routers algebraically encode the path or edge information iteratively using Horner's rule. This scheme is susceptible to a GOSSIB attack [17].

An Internet Engineering Task Force (IETF) working group proposed that each router would periodically (every few hundred or thousand packets) select a packet and "append" authenticated traceback information to it [3], by creating a second packet tailgating the original packet. In [8, 13] the authors propose storing a hash of each packet along with information about where it arrived from. FIT [12] uses a fragment marking scheme. Combined with a novel technique that calculates the distance of the marking router using just 1 bit in the IP header, this scheme lets FIT reconstruct the attack graph with fewer packets and scale to large attacks.

An architecture for AS traceback is proposed in [16]. It is based on a passive detection packet method similar to the iTrace packet in ICMP traceback. The objective is to construct the AS attack path. We have proposed an AS traceback technique [14] that is able to reconstruct the AS attack graph based on the assumption that all AS border routers are traceback enabled.

III. MOTIVATION AND BACKGROUND

In this section we first describe the AS topology of the Internet. We then present the performance metrics we consider and propose a new metric, Total Time to Traceback.

A. AS Topology

An Autonomous System (AS) is a group of IP networks operated by one or more network operator(s), which has a single and clearly defined external routing policy. An Autonomous System Number (ASN) is a globally unique 16-bit number that identifies an AS in the Internet. The ASN is used in exchanging exterior routing information between ASes, and as an identifier of the AS itself in the global Internet. Autonomous System Border Routers (ASBRs) are connected to more than one AS and exchange routing information with routers in another AS. ASBRs advertise the exchanged external routing information throughout their AS. The traffic to and from an AS is controlled by its ASBRs: any traffic between a node in the AS and a node outside the AS has to pass through an ASBR of the AS. ASes have been well studied in [21-26].

AS traceback has numerous advantages over IP traceback, including:

1. The number of ASes is far smaller than the number of routers. The Internet consists of around 14,000 ASes as compared to 1.6 x 10^8 hosts. Hence, obtaining an AS map of the Internet is feasible, while obtaining an accurate dynamic Internet map itself is very difficult if not impossible.
2. In more than 99.5% of the cases, a packet passes through fewer than seven ASes before reaching its destination.
3. Privately owned ASes may not always like to disclose their network details. If each router in the AS participates in the marking scheme, then one can easily infer the network architecture by observing the markings.
4. An AS number is 16 bits long, while an IPv4 address is 32 bits long (128 bits with IPv6). Thus, encoding an AS number needs less header space than encoding an IP address, and AS path construction needs far fewer packets than IP path construction.
5. There is no scope for AS false positives, as a packet can carry the whole AS number and the victim can reconstruct the attack path without any uncertainty.

B. Performance Metrics

Traditional metrics used to evaluate the performance of a traceback mechanism are the number of false positives (nodes in the reconstructed attack graph but not in the real attack graph), the number of packets required for reconstruction, the computational overhead and the reconstruction time. We present a new metric, Total Time to Traceback (TTT), that incorporates both the number of packets and the reconstruction time.

Total Time to Traceback (TTT): This is the time taken by the victim to construct the attack graph and trace back to the attacker upon detection of an attack. Thus, TTT = T_Pkt + T_R, where T_Pkt is the time taken to receive enough packets and T_R is the reconstruction time. The network would therefore be under attack for a duration TTT. T_R depends on the traceback mechanism.
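The TTT metric of Section III.B can be illustrated with a short calculation. The sketch below assumes that T_Pkt can be approximated as the number of packets needed divided by a constant per-attacker sending rate; the function and parameter names are hypothetical, and the numbers in the usage note are illustrative rather than measured results.

```python
def total_time_to_traceback(packets_needed: int,
                            packets_per_second: float,
                            reconstruction_seconds: float) -> float:
    """Return TTT = T_Pkt + T_R in seconds.

    Assumption: T_Pkt is the time for one attacker, sending at a constant
    rate, to deliver the packets the victim needs for reconstruction.
    """
    if packets_per_second <= 0:
        raise ValueError("packets_per_second must be positive")
    t_pkt = packets_needed / packets_per_second   # T_Pkt
    return t_pkt + reconstruction_seconds         # T_Pkt + T_R
```

For instance, with 150 packets needed, a rate of 10 packets per second and a hypothetical reconstruction time of 20 seconds, the function returns 35 seconds.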

IV. HIERARCHICAL IP TRACEBACK - HIT

HIT uses a hierarchical approach to construct the attack graph. This enables HIT to achieve better scalability and to reduce the reconstruction complexity. In HIT, routers overwrite 17 bits of IP header space for marking, as shown in Fig. 1. We use the following IP header fields for this purpose: the 16-bit ID field and the unused fragment flag bit.

A. Marking

A HIT router marks a forwarded packet with a certain probability p, which is a global constant. A router can mark a packet in two different ways. With probability q, it performs AS marking, in which case it overwrites the 16-bit IP ID field with its ASN and sets the flag bit to '0'. With probability (1-q), the router performs IP marking and sets the flag bit to '1'. As in FIT [12], only 1 bit of the IP ID field is used to encode the distance from the victim at which the packet was marked. HIT uses node sampling, as this technique is effective even in the presence of legacy routers. In IP marking, the router marks the packet with a fragment of a hash of its IP address. The complete marking algorithm is presented in Fig. 2.

B. Attack Graph Reconstruction

The purpose of an IP traceback mechanism is to reconstruct the IP addresses of the routers on the path from the attacker to the victim. As in all previous IP traceback mechanisms, we assume that the victim has a mechanism to identify malicious packets, so that traceback can be performed.
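The two marking modes of Section IV.A and the 17-bit field layout of Fig. 1 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: SHA-256 stands in for the unspecified hash H, the names `mark`, `decode` and `hash_fragment` are hypothetical, and the TTL rewriting and forced-marking condition of Fig. 2 are omitted (only the distance bit P.b = TTL[5] is kept).

```python
import hashlib
import random
from typing import Optional, Tuple

FRAG_BITS = 13        # b_frag: hash bits carried by one IP marking (Fig. 1b)
FLAG_BIT = 1 << 16    # 17th bit: 0 = AS marking, 1 = IP marking


def hash_fragment(ip: str, frag_num: int) -> int:
    """Return the frag_num-th 13-bit slice of a hash of the router address.

    SHA-256 is an assumption; the paper only names the hash function H.
    """
    digest = int.from_bytes(hashlib.sha256(ip.encode()).digest(), "big")
    return (digest >> (frag_num * FRAG_BITS)) & ((1 << FRAG_BITS) - 1)


def mark(asn: int, ip: str, ttl: int, p: float, q: float,
         n_frags: int, rng: random.Random) -> Optional[int]:
    """Return a 17-bit marking, or None if the router does not mark."""
    assert 1 <= n_frags <= 4, "the frag# field is only 2 bits wide"
    if rng.random() > p:                  # mark with probability p
        return None
    if rng.random() < q:                  # AS marking: flag = 0, 16-bit ASN
        return asn & 0xFFFF
    frag_num = rng.randrange(n_frags)     # IP marking: flag = 1
    dist_bit = (ttl >> 5) & 1             # P.b <- TTL[5]
    return (FLAG_BIT | (dist_bit << 15) | (frag_num << FRAG_BITS)
            | hash_fragment(ip, frag_num))


def decode(marking: int) -> Tuple:
    """Split a marking into ("AS", asn) or ("IP", b, frag_num, fragment)."""
    if not marking & FLAG_BIT:
        return ("AS", marking & 0xFFFF)
    return ("IP", (marking >> 15) & 1, (marking >> FRAG_BITS) & 0b11,
            marking & ((1 << FRAG_BITS) - 1))
```

A victim-side decoder only needs the flag bit to tell the two marking types apart, which is what lets AS path reconstruction proceed independently of IP path reconstruction.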

Consider that we have r_da routers on the attack path at distance d, and assume that the victim has received all fragments from all routers in the attack graph. Then the probability that a specific fragment of a router not in the attack graph matches the fragment of a router on the attack path is

$$p_{frag} = 1 - \left(1 - \frac{1}{2^{b_{frag}}}\right)^{r_{da}} \qquad (1)$$

Table 1 summarizes p_frag for different numbers of routers on the attack path. The hash fragment size b_frag is 13 bits. Since we require at least n_path markings per router to add it to the reconstructed attack graph, the probability that a router will be a false positive is

$$p_f = \sum_{j=n_{path}}^{n} \binom{n}{j}\, p_{frag}^{\,j}\, (1 - p_{frag})^{n-j} \qquad (2)$$

The estimate for the number of false positive routers at distance d is then p_f (r_d - r_da), where r_d is the number of candidate routers at distance d. HIT first constructs the AS attack graph and then proceeds to construct the IP attack graph. As a result, a smaller number of routers, only those belonging to the ASes in the AS attack graph, have to be verified.

Fig. 3 plots p_f for varying numbers of routers on the attack graph at distance d, for different fractions f of routers to be verified. For instance, f = 0.5 represents the scenario in which only half the routers in the original IP graph belong to the AS attack graph. We note that just reducing the actual number of routers by half reduces the false positives by a factor of eight.

[Figure 3: p(router is false positive), from 0 to 0.08, versus the number of routers on the attack graph at distance d, from 0 to 3000; curves for f = 1, f = 0.5 and f = 0.2.]

Figure 3. Probability that a router is a false positive for varying number of attackers

The probability of receiving j distinct hash fragments from a set of k total fragments after receiving y randomly selected fragments is [27]:

$$P_f[j, k, y] = \binom{k}{k-j} \sum_{l=0}^{j} (-1)^{l} \binom{j}{l} \left(1 - \frac{k-j+l}{k}\right)^{y} \qquad (3)$$

With probability pq, a router marks a packet with its AS number, and with probability p(1-q) the router performs IP marking. To receive a marking from a router at distance d, that router must mark a packet and all subsequent routers must not mark that packet. Thus, the probability of receiving a packet with a marking from a router at distance d hops is

$$P_m[d, p(1-q)] = p(1-q) \cdot \left(1 - p(1-q)\right)^{d-1} \qquad (4)$$

Thus, if the victim receives x packets that pass through a router at distance d, then on average x · P_m[d, p(1-q)] packets would carry IP markings from that router. The probability of receiving at least j distinct markings out of n_f markings from a router at distance d after receiving x packets is:

$$P_{ip}[d, x] = \sum_{l=j}^{n_f} P_f\left[l,\, n_f,\, x \cdot P_m[d, p(1-q)]\right] \qquad (5)$$

j and n_f are determined by the desired false positive rate and the desired speed of reconstruction. As verified by experimental results, HIT (n_f = 2; j = 2) is able to reconstruct more than 95% of the attack graph with fewer than 141 packets, while keeping the false positive rate as low as 0.04 even in the presence of 1000 attackers.

TABLE 1. VALUES OF p_frag FOR DIFFERENT NUMBERS OF ROUTERS ON THE ATTACK GRAPH AT DISTANCE d

r_da      250        500        1000       2000
p_frag    0.03006    0.05921    0.11492    0.21663

(a) [ 1-bit flag = 0 | 16-bit ASN ]
(b) [ 1-bit flag = 1 | 1-bit b | 2-bit frag# | 13-bit IP hash fragment ]

Figure 1. HIT marking field diagrams. (a) For AS marking, the flag bit is set to '0'. (b) For IP marking, the flag bit is set to '1'. The distance field b is 1 bit.

for each packet P
    α ← random value in [0, 1)
    IF (α ≤ p) OR ((b/c - TTL[5..0]) mod 64) > 32
        β ← random value in [0, 1)
        IF β < q
            P.flag ← 0
            P.[b/frag_num/fragment] ← AS number
        ELSE
            P.flag ← 1
            γ ← random value in [0, n)
            P.frag_num ← γ
            P.fragment ← H(IP)[(γ+1)·b_frag - 1 .. γ·b_frag]
            P.b ← TTL[5]
            TTL[4..0] ← c
    ELSE
        TTL ← TTL - 1

Figure 2. HIT marking algorithm

Computational complexity

The complexity of AMS and FIT is

$$O\left(\sum_{d} |S_d| \cdot |\Psi_{d+1}|\right),$$

where S_d is the set of routers at distance d in the reconstructed attack graph and |Ψ_d| is the number of unique fragments received by the victim from routers at distance d. Though the computational complexity of HIT is given by the same expression, HIT takes less time to reconstruct the attack graph. This is because FIT needs to consider all the routers in the IP graph, while HIT needs to consider only the routers that belong to the ASes present in the reconstructed AS graph. Asymptotically, when the number of attackers is large enough that all ASes are present in the attack graph, the computational complexity of FIT and HIT would be similar.

Selection of Marking Probabilities

A careful selection of the packet marking probability (p) and the AS marking probability (q) is needed for efficient performance of HIT. Following the same reasoning presented in section IV, we chose p = 0.004. We observe that for high values of q more packets would be marked with AS numbers, and hence AS path reconstruction would be very fast. At the same time, since fewer packets carry IP markings, IP path reconstruction would be slower. Again, as explained in section IV.B, on average 4 to 7 routers mark with the same AS number, except in the origin and destination ASes. In the origin AS, at least two routers mark with the same AS number. We choose an AS marking probability of q = 0.2. To see the intuition behind this, consider the case when the victim has received 150 packets. Then around 30 packets would be carrying AS markings, with which the victim could construct the AS path and then proceed to reconstruct the IP path.

V. EXPERIMENTAL EVALUATION

To test the behavior of these marking schemes in realistic settings, we conduct an experiment on simulated attacks using a real traceroute dataset obtained from CAIDA [19], containing 127,634 distinct traceroutes from a single source with an average path length of 25. We use Cisco Border Gateway Protocol (BGP) RIBs collected from routeviews [18] to convert IP addresses to AS numbers and obtain the AS topology. We obtained 7247 different ASes, with an average of 3.6 ASes per path. In all the tests, we used the single source of the traceroutes as the victim, and the whole traceroute dataset as the map of upstream routers from the victim. In each test, we randomly selected a given number of routers as attackers. We then simulated the routers marking the attack packets and the victim reconstructing the attack graph from the markings in the packets. In the path reconstruction experiments each attacker sends x packets to the victim. For each path reconstruction experiment, we assume that the tracing end host has a complete map of the upstream router tree with no false positives. We use the notation HIT x/y to mean that the IP address hash is divided into x fragments and the victim needs to receive at least y fragments from a router to consider it to be in the attack graph. The same holds for FIT x/y.

Figs. 4 and 5 present the performance of HIT in the presence of 1000 and 5000 attackers, respectively. The false positive (negative) rates are computed as the ratio of the number of occurrences of a false positive (negative) to the number of routers in the upstream paths. As expected, HIT 2/2 needs fewer packets than HIT 4/3, because it needs just two fragments from each router while HIT 4/3 needs three. On the other hand, HIT 4/3 is more scalable and produces fewer false positives than HIT 2/2. HIT 2/2 is preferred for small DDoS attacks (< 1000 attackers) and when false positives can be tolerated. When the DDoS attack is expected to be highly distributed or an accurate attack graph reconstruction is needed, HIT 4/3 is preferable. Figs. 4 and 5 also compare the performance of HIT with that of FIT.

[Figure 4: (a) false negatives, from 0 to 0.8, versus number of packets, from 0 to 1000; (b) false positives, from 0.001 to 1, versus false negatives, from 0 to 0.6; curves for HIT 2/2, HIT 4/3, FIT 4/3 and FIT 8/5.]

Fig. 4. Performance of HIT in reconstructing attack graph, 1000 attackers. (a) Number of packets needed. (b) False positive rate

The time needed to reconstruct the attack graph for both HIT and FIT is presented in Fig. 6. Our timing information is based on a Perl implementation running on a 3 GHz Pentium IV Linux workstation. HIT 2/2 is faster than HIT 4/3 because it has fewer fragments to go through, the difference being noticeable in the presence of a high number of attackers. Because of its hierarchical nature, HIT needs much less time (around half) than FIT.

We also study the performance of HIT and FIT in terms of total time to traceback. For this we assume that each attacker sends packets at a rate of 10 packets per second. Though an attacker can send packets at a much higher rate, doing so is only detrimental to the attacker, as it would make the reconstruction faster, thus reducing TTT. Fig. 7 shows the performance of HIT and FIT. While HIT 2/2 is able to reconstruct the attack graph in less than 2 minutes once the attack is detected, FIT takes more than 5 minutes. It should be noted that TTT directly translates to the duration for which the attacker is able to deprive users not only of the services offered by the victim, but also of the services provided by the Internet.
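The analytical expressions of Section IV.B are straightforward to evaluate numerically, which can help when choosing n_f, j, p and q. The sketch below is one possible transcription of Eqs. (1) to (5); the function names are hypothetical, and Eq. (3) is implemented in the classical occupancy form from [27] as understood here.

```python
from math import comb


def p_frag(r_da: int, b_frag: int = 13) -> float:
    """Eq. (1): chance that a fragment matches one of r_da on-path routers."""
    return 1.0 - (1.0 - 2.0 ** -b_frag) ** r_da


def p_false_positive(n: int, n_path: int, pfrag: float) -> float:
    """Eq. (2): chance that at least n_path of n fragments match by accident."""
    return sum(comb(n, j) * pfrag ** j * (1.0 - pfrag) ** (n - j)
               for j in range(n_path, n + 1))


def p_mark(d: int, p: float, q: float) -> float:
    """Eq. (4): chance a packet arrives with an IP marking from distance d."""
    m = p * (1.0 - q)
    return m * (1.0 - m) ** (d - 1)


def p_distinct(j: int, k: int, y: float) -> float:
    """Eq. (3): chance of exactly j distinct fragments out of k after y draws."""
    return comb(k, k - j) * sum(
        (-1) ** l * comb(j, l) * (1.0 - (k - j + l) / k) ** y
        for l in range(j + 1))


def p_ip(d: int, x: int, j: int, n_f: int, p: float, q: float) -> float:
    """Eq. (5): chance of at least j distinct markings after x packets."""
    y = x * p_mark(d, p, q)   # expected number of markings from that router
    return sum(p_distinct(l, n_f, y) for l in range(j, n_f + 1))
```

Evaluating `p_frag` for 250, 500, 1000 and 2000 routers gives values that agree with Table 1 to the precision printed there.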

[Figure 5: (a) false negatives, from 0 to 0.8, versus number of packets, from 0 to 1000; (b) false positives, from 0.001 to 10, versus false negatives, from 0 to 0.6; curves for HIT 2/2, HIT 4/3, FIT 4/3 and FIT 8/5.]

Fig. 5. Performance of HIT in reconstructing attack graph, 5000 attackers. (a) Number of packets needed. (b) False positive rate

[Figure 6: reconstruction time, from 0 to 300, versus number of attackers, from 0 to 3000; curves for HIT 2/2, HIT 4/3 and FIT 4/3.]

Figure 6. Reconstruction time of HIT and FIT

[Figure 7: total time to traceback in seconds, from 0 to 350, versus number of attackers, from 0 to 3000; curves for HIT 2/2, HIT 4/3 and FIT 4/3.]

Figure 7. Total time to reconstruct for HIT and FIT

VI. CONCLUSIONS

In this paper, we present HIT, a hierarchical approach to improve packet-marking traceback. HIT exploits the inherent Autonomous System topology of the Internet. A HIT-enabled router marks packets either with its AS number or with a fragment of a hash of its IP address, thus achieving higher scalability, fewer false positives, and lower computational overhead than existing works. To be precise, HIT can reconstruct the attack graph after receiving just tens of packets, a reduction of 2-3 orders of magnitude compared to previous marking schemes. HIT also scales to large distributed attacks with thousands of attackers, and is still able to reconstruct the attack graph in seconds.

REFERENCES

1. CERT Advisory CA-97.28. "IP Denial-of-Service Attacks". www.cert.org/advisories/CA-97.28.smurf.html, Dec. 1997.
2. P. Ferguson and D. Senie. "Network Ingress Filtering: Defeating Denial of Service Attacks which employ IP Source Address Spoofing". RFC 2267, Jan. 1998.
3. A. Mankin et al. "On Design and Evaluation of Intention-Driven ICMP Traceback". Proc. of the IEEE ICCCN, Oct. 2001.
4. J. Mogul and S. Deering. "Path MTU Discovery". RFC 1191, Nov. 1990.
5. Stefan Savage et al. "Practical Network Support for IP Traceback". Proc. of the 2000 ACM SIGCOMM Conference.
6. Dawn Song and Adrian Perrig. "Advanced and Authenticated Marking Schemes for IP Traceback". IEEE INFOCOM 2001.
7. E. Albright and X.-H. Dang. "An Implementation of IP Traceback in IPv6 Using Probabilistic Packet Marking". Proc. of Intl. Conf. on Internet Computing (ICOMP'05), Jun. 2005.
8. Alex Snoeren et al. "Hash-Based IP Traceback". Proc. of ACM SIGCOMM 2001.
9. John Ioannidis and Steven M. Bellovin. "Implementing Pushback: Router-based Defense against DDoS Attacks". Proc. of Network and Distributed System Security Symposium, Feb. 2002.
10. Drew Dean, Matt Franklin, and Adam Stubblefield. "An Algebraic Approach to IP Traceback". Proc. of the Network and Distributed System Security Symposium, Feb. 2001.
11. Stefan Savage et al. "Network Support for IP Traceback". IEEE/ACM Transactions on Networking, Jun. 2001.
12. A. Yaar, A. Perrig, and D. Song. "FIT: Fast Internet Traceback". Proc. of IEEE INFOCOM, Miami, USA, Mar. 2005.
13. Chao Gong et al. "Single Packet IP Traceback in AS-level Partial Deployment Scenario". IEEE GLOBECOM, Dec. 2005.
14. Vamsi Paruchuri, A. Durresi, R. Kannan, and S.S. Iyengar. "Authenticated Autonomous System Traceback". Proc. of IEEE AINA, Mar. 2004.
15. A. Yaar, A. Perrig, and D. Song. "Pi: A Path Identification Mechanism to Defend against DDoS Attacks". IEEE Symposium on Security and Privacy, May 2003.
16. Masafumi Oe et al. "An Implementation and Verification of a Hierarchical Architecture for IP Traceback". Electronics and Communications in Japan, Vol. 87-11, 2004.
17. Marcel Waldvogel. "GOSSIB vs. IP Traceback Rumors". 18th Annual Computer Security Applications Conference, Dec. 2002.
18. David Meyer. "University of Oregon Route Views Archive Project". http://archive.routeviews.org/, Jun. 2004.
19. The Cooperative Association for Internet Data Analysis (CAIDA). MapNet tool. http://www.caida.org/tools/visualization/mapnet/.
20. K. Nichols et al. "Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers". RFC 2474, Dec. 1998.
21. W. Theilmann and K. Rothermel. "Dynamic Distance Maps of the Internet". Proc. of IEEE INFOCOM 2000.
22. H. Tangmunarunkit et al. "Does AS Size Determine Degree in AS Topology?". ACM Computer Communication Review, Oct. 2001.
23. R. Govindan and P. Radoslavov. "An Analysis of the Internal Structure of Large Autonomous Systems". Unpublished manuscript. http://citeseer.ist.psu.edu/613626.html
24. H. Chang, S. Jamin, and W. Willinger. "Internet Connectivity at the AS-level: An Optimization-Driven Modeling Approach". Proc. of ACM SIGCOMM Workshop on MoMeTools, Aug. 2003.
25. D. Magoni and J. Pansiot. "Analysis of the Autonomous System Network Topology". ACM Computer Communication Review, Jul. 2001.
26. M. Fayed et al. "On the Size Distribution of Autonomous Systems". Technical Report, Boston University, Jan. 2003.
27. William Feller. "An Introduction to Probability Theory and Its Applications", Vol. I. Wiley Series in Probability and Mathematical Statistics. John Wiley & Sons, New York, third edition, 1968.
