IEEE TRANSACTIONS ON COMPUTERS,
Preventing Session Table Explosion in Packet Inspection Computers
Hyogon Kim, Member, IEEE, Jin-Ho Kim, Inhye Kang, Member, IEEE, and Saewoong Bahk, Member, IEEE
Abstract—In this paper, we first show that various network attacks can cause fatal inflation of dynamic memory usage on packet processing computers. Considering Transmission Control Protocol (TCP) is utilized by most of these attacks as well as legitimate traffic, we propose a parsimonious memory management guideline based on the design of the TCP and the analysis of real-life Internet traces. In particular, we demonstrate that, for all practical purposes, one should not allocate memory for an embryonic TCP connection with roughly more than 10 seconds of inactivity.
Index Terms—Network monitoring, memory management, TCP, timeout, packet inspection.
1 INTRODUCTION
Today, there are many types of computers specialized in packet processing. In particular, there are so-called stateful packet inspection computers or devices for various applications such as firewall [1], virtual private network (VPN), network intrusion detection, traffic monitoring [2], accounting and charging [3], and traffic load balancing [4], among others. In stateful inspection, some state needs to be maintained across packets within the same packet stream (a.k.a. “flow”). As flows are initiated and terminated, a corresponding entry is created and destroyed in the packet inspection computer. The treatment of a packet may be determined by its individual values as well as by the states left by prior packets in the same flow.
For space and lookup efficiency, most stateful inspection computers employ some form of timeout mechanism to purge invalid states. Typically, some proprietary timeout values (frequently 60 or 120 seconds [5]) are used. Unfortunately, there is a lack of engineering guidelines here, which we argue can lead to inefficient management of states. Too short a timeout will cause excessive deletion and creation of entries, with undesirable ramifications, e.g., a firewall blocking legitimate packets due to the untimely deletion of the corresponding entry. Too long a timeout, on the other hand, can inflate the required memory by a large factor (e.g., see [6]). More importantly, it can even lead to memory overflow in the face of network attacks (even if the attacks are not targeting the packet inspection computer itself), as we will demonstrate later.
In this paper, we attempt to provide a technically solid guideline on configuring the timeout for efficient dynamic memory management. The guideline is designed to help configure the stateful inspection computers in an informed manner and enable them to continue functioning even in the face of strong network attacks.
• H. Kim is with the Department of Computer Science and Engineering, Korea University, Anam-dong 5-ga 1, Seongbug-gu, Seoul 136-701, Korea. E-mail: [email protected].
• J.-H. Kim is with Samsung Electronics Co., Ltd., Dong Suwon PO Box 105, Maetan3-dong 416, Yeongtong-gu, Suwon, Gyeonggi-do 442-600, Korea. E-mail: [email protected].
• I. Kang is with the Department of Mechanical and Information Engineering, University of Seoul, 90 Jeonnong-dong, Dongdaemun-gu, Seoul 130-743, Korea. E-mail: [email protected].
• S. Bahk is with the School of Electrical Engineering, Seoul National University, Shinlim-dong, Gwanag-gu, Seoul 151-742, Korea. E-mail: [email protected].
Manuscript received 12 Oct. 2003; revised 9 June 2004; accepted 20 Sept. 2004; published online 15 Dec. 2004.
For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TC-0178-1003.
0018-9340/05/$20.00 © 2005 IEEE
Published by the IEEE Computer Society
VOL. 54, NO. 2, FEBRUARY 2005
2 DYNAMIC STATE MANAGEMENT
State in packet inspection is associated with a flow and essentially refers to the flow information. The flow information is typically a 5-tuple:
$$\langle \text{protocol};\ \text{source IP};\ \text{source port};\ \text{destination IP};\ \text{destination port} \rangle = \langle \varphi;\ I_s;\ p_s;\ I_d;\ p_d \rangle.$$
Depending on the situation, though, this flow definition can be made coarser (e.g., $\langle \varphi;\ I_s \rangle$) or finer (e.g., by including additional values such as TCP sequence numbers [1]). When a packet is captured by a packet inspection computer, its flow information is extracted and compared against the entries in the session table, which holds a list of presently tracked flows. If a match is found, an action will be performed on the packet, such as “pass” for a stateful firewall, or “increment the byte count” for usage-based charging. If a match is not found, however, a new flow entry can be created in the session table. And, when the flow is finished, the entry is purged.
Determining the beginning and the end of a flow depends on the specific protocol. For connectionless protocols like UDP, the end of a flow can only be guessed, typically by means of an inactivity timeout. But, for a connection-oriented, state-based protocol such as TCP, the packet inspection computer can explicitly identify the packets that signal the beginning and the end.
Under normal operating conditions, each session table entry represents a valid flow. With the flow definition above, the memory requirement is not unbearable to a modern computer. A million entries with 40 bytes each would only take 40 megabytes. However, if network attacks are under way, the size of the session table can drastically change, as we will see in the next section.
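The lookup, create, and purge cycle described above can be illustrated with a minimal sketch. This is not the implementation of any particular product: the names `FlowKey`, `SessionTable`, `on_packet`, `on_flow_end`, `expire_idle`, and `table_bytes` are hypothetical, the per-entry fields are assumptions chosen for illustration, and the 40-byte entry size is simply the figure quoted in the text.

```python
from typing import Dict, NamedTuple


class FlowKey(NamedTuple):
    """Hypothetical 5-tuple flow identifier (protocol, addresses, ports)."""
    protocol: int
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class SessionTable:
    """Toy session table keyed by the 5-tuple; a sketch, not a product design."""

    def __init__(self) -> None:
        self.entries: Dict[FlowKey, Dict[str, float]] = {}

    def on_packet(self, key: FlowKey, length: int, now: float) -> Dict[str, float]:
        """Look the flow up; create an entry on a miss; update the byte count."""
        entry = self.entries.get(key)
        if entry is None:
            entry = {"bytes": 0, "last_seen": now}
            self.entries[key] = entry
        entry["bytes"] += length
        entry["last_seen"] = now
        return entry

    def on_flow_end(self, key: FlowKey) -> None:
        """Purge the entry when the protocol signals the end of the flow."""
        self.entries.pop(key, None)

    def expire_idle(self, now: float, idle_timeout: float) -> int:
        """Purge entries idle for longer than idle_timeout; return how many."""
        stale = [k for k, e in self.entries.items()
                 if now - e["last_seen"] > idle_timeout]
        for k in stale:
            del self.entries[k]
        return len(stale)


def table_bytes(num_entries: int, bytes_per_entry: int = 40) -> int:
    """Memory needed for the table, using the per-entry size from the text."""
    return num_entries * bytes_per_entry
```

With these assumptions, `table_bytes(1_000_000)` gives 40,000,000 bytes, matching the 40-megabyte figure above. The point of the sketch is that every unmatched packet creates an entry, so the table grows with the number of distinct flow identifiers seen, not with the number of genuine connections.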
3 IMPLICATION OF NETWORK ATTACKS
In this paper, we focus on those attacks that can cause memory overflow in packet inspection computers: denial-of-service (DoS) attacks and scanning. The most problematic aspect of these attacks from the perspective of the stateful packet inspection computer is that each single packet constitutes a separate flow. This is because the attacks change at least one element of the flow identifier for each packet: $I_s$ for DoS, $I_d$ for hostscan, and $p_d$ for portscan. Since the DoS attacker is not really attempting to set up a connection with the victim ($I_d$), the attacker typically fills the fields other than $I_d$ with random numbers. In particular, $I_s$ is randomized in order to maximally confound traceback efforts [7]. As for hostscan, it is $I_d$ that changes from packet to packet. Since the address list of vulnerable nodes is not known a priori, these attackers need to scan the IP address space in a rather random manner. In real-life traces, this behavior is easily observable [8].
Since no two packets in the same attack flow share the same flow identifier, the packet inspection computer ends up creating a flow entry for each attack packet. Unfortunately, the packet rate of these attacks is very high (e.g., 100,000-200,000 pkts/s in DoS [9] and up to 26,000 pkts/s per infected host [10] in a worm epidemic). So, for instance, if a stateful inspection computer lies at the boundary of an enterprise network with 10 such infected hosts, the session table of a high-end firewall [5] would be overwhelmed just 10 seconds into the attack.
Worse yet, these entries can enjoy a maximum lifespan in the session table. No DoS packet is intended for a legitimate connection and most of the hostscan packets fail to find targets to make a connection with (i.e., failed scanning). So, most bogus entries with one or more forged fields will remain in the table as long as they are allowed to. This is in contrast to legitimate TCP flows, whose entries are cleared as soon as they are finished.
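The fill-rate arithmetic behind the enterprise example above can be made explicit with a short calculation. The function names are hypothetical, and the table capacity passed to `seconds_to_fill` is an illustrative assumption, since the text does not state the capacity of the firewall in [5]. The sketch also assumes that every attack packet creates one entry and that no entry is purged during the interval, which is the situation described above.

```python
def bogus_entries(infected_hosts: int, pkts_per_sec_per_host: int, seconds: float) -> float:
    """Entries created if every attack packet opens a new flow entry
    and nothing is purged during the interval."""
    return infected_hosts * pkts_per_sec_per_host * seconds


def seconds_to_fill(capacity: int, infected_hosts: int, pkts_per_sec_per_host: int) -> float:
    """Time until a table of the given (assumed) capacity is full."""
    return capacity / (infected_hosts * pkts_per_sec_per_host)


# 10 infected hosts at 26,000 pkts/s each, 10 seconds into the attack.
entries_after_10s = bogus_entries(10, 26_000, 10)  # 2,600,000 entries

# Hypothetical capacity of one million entries, for illustration only.
fill_time = seconds_to_fill(1_000_000, 10, 26_000)
```

Under these assumptions, 2.6 million bogus entries accumulate in 10 seconds, and a table sized for one million entries would fill in under 4 seconds.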
In essence, the entries created by attacks are many, generated at a fast pace, and left abandoned as long as the packet inspection computer allows. Since the session table lookup is performed for every packet and, typically, the memory access is the bottleneck in packet processing machines, it is crucial to save on session table lookups to maintain the packet throughput. In the next section, we set a memory management guideline that maximally suppresses the creation of bogus entries that arise from network attacks while
minimizing the adverse effects on legitimate flows. Since TCP is the protocol of choice for most attacks [7] and for the dominant majority of the Internet traffic [11], we focus on the bogus entries created by TCP flows.
4 DERIVING TIMEOUT GUIDELINES FOR TCP CONNECTIONS
To derive the guideline on adequate session timeouts for stateful inspection devices, we take the following approach:
1. Obtain the distribution of TCP connection setup delays from a large number of connections from the Internet.
2. Based on the distribution, choose the connection setup timeout period so that it allows a “sufficient” fraction of normal flows to proceed normally, i.e., set up the connection before timeout.
3. Purge the connections that remain incomplete beyond this timeout as they are likely attacks.
The TCP connection setup delay is defined to be the time elapsed between the transmission of the first SYN packet and the ACK. (Multiple SYN transmissions can occur due to packet loss.) To make the numbers in the delay distribution robust, we use an 8+ million-connection trace collected in 2001 on two trans-Pacific T3 lines connecting a Korean Internet Exchange (IX) to North America. Note that the trans-Pacific nature of the trace serves an important purpose for our study. Long-haul connections have high delay and loss rate, helping us to choose conservative delay estimates in Step 2, thus covering a dominant majority of normal TCP connections.
In order to ensure that the distribution from the backbone trace is general enough, we used a separate set of traces collected from a regional ISP in Korea in 2003, providing connectivity to five universities. Unlike the trans-Pacific trace, these contain domestic (i.e., short-haul) connections. We filtered them out and extracted the distribution from only international connections. The number of connections in the 2003 trace is on the order of 100,000, far fewer than in the 2001 trace.
Fig. 1 plots the CDF of the connection setup delays. The X-axis represents the connection setup delay in milliseconds and the Y-axis is the cumulative probability. The “total” curves represent $t_{total}$, the interval between the first SYN and the ACK, including the delay caused by any SYN retransmissions.
The “total-last” curves are the distributions of $(t_{total} - t_{last})$, where $t_{last}$ is the interval between the last SYN (which could be a retransmission) and the ACK. So, it is the distribution of time spent in SYN retransmissions, if any. First, the 2003 trace generally shows a better total setup delay. On the other hand, in terms of the time spent in SYN retransmissions, the 2003 trace slightly lags behind. All in all, however, these two traces, completely different in time and locale, exhibit similar qualitative characteristics. Therefore, we will base our discussion on the 2001 trace because it yields sounder numbers in the distribution.
We see in Fig. 1 that, for $t_{total}$, there is a steep increase at around 3 seconds. A closer examination reveals there are other (yet smaller) ones at around 6 and 9 seconds. In essence, these are due to TCP retransmission timeouts (RTO). The increase at 3 seconds is dictated by RFC 2988 [12]: the standard says that, in case the SYN (or SYN/ACK) is lost, the $k$th retransmission should be tried $3 \cdot 2^{(k-1)}$ seconds after the previous transmission, until success. The gap between successive SYN retransmissions thus becomes 3, 6, 12, and so forth, by which the connection setup delay is elongated. So, the minor increase at 9 seconds implies the second retransmission ($9 = 3 + 6$). The less popular BSD-derived implementations, on the other hand, send the first retransmission 6 seconds after the original SYN [13], resulting in the marginal increase at around 6 seconds.
But, the most important observation from Fig. 1 is that the predominant fraction of connections (97 percent in 2001, 94.5 percent in 2003) does not go through any SYN retransmission, about 2 percent retransmit only once, and a small fraction retransmits more than once, according to $t_{total} - t_{last}$.
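The retransmission schedule described above can be worked through in a few lines. This is a sketch of the arithmetic only, assuming the 3-second initial gap and doubling quoted from RFC 2988 in the text; the function names are hypothetical, and the BSD behavior is not modeled beyond what the text states.

```python
from typing import List


def syn_retransmission_offsets(retransmissions: int, initial_gap: float = 3.0) -> List[float]:
    """Seconds after the first SYN at which each retransmission is sent,
    with the k-th gap equal to initial_gap * 2**(k-1)."""
    offsets = []
    elapsed = 0.0
    for k in range(1, retransmissions + 1):
        elapsed += initial_gap * 2 ** (k - 1)
        offsets.append(elapsed)
    return offsets


def total_setup_delay(retransmissions: int, t_last: float) -> float:
    """t_total for a connection that needed the given number of SYN
    retransmissions and then took t_last seconds from the last SYN to the ACK."""
    offsets = syn_retransmission_offsets(retransmissions)
    time_in_retransmissions = offsets[-1] if offsets else 0.0
    return time_in_retransmissions + t_last


offsets = syn_retransmission_offsets(3)  # [3.0, 9.0, 21.0]
```

The cumulative offsets of 3 and 9 seconds correspond to the steps visible in the $t_{total}$ curve. For example, a connection that needs two retransmissions and then completes in $t_{last} = 1$ second has $t_{total} = 3 + 6 + 1 = 10$ seconds.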
Fig. 1. The distribution of connection setup delays in 2001 and 2003 traces.
Fig. 2. The delay distributions excluding SY N retransmission times.
Fig. 2 shows the CDF of the “pure” setup delay $t_{last}$ from the 2001 and 2003 traces. We observe in the figure that the overwhelming majority of the connections take less than 2 seconds to complete the round-trip for the SYN-ACK exchange. Moreover, considering these are all trans-Pacific (and international) connections, the percentage should go higher if we include local connections in the statistics.
Looking at the results from a different angle, Table 1 summarizes the impact of different timeout values. We now know that RFC 2988 SYN retransmission behavior dictates the connection setup delays, thus the table is indexed by the number of retransmissions. Based on the observations from Figs. 1 and 2, we consider only up to three SYN retransmissions and we set $t_{last} = 1.0$. We could also use 2 seconds for $t_{last}$, but it would only marginally improve the completion rate. Note that the limitation of two SYN retransmissions in BSD is due to the connection establishment timer that expires at 75 seconds [13].
The table tells us that having the timeout value $\tau > 10$ seconds is technically unwarranted (with $t_{last} = 1$). Whether with RFC 2988 or BSD, the connection completion rate is close to 1 beyond $\tau = 10$. On the other hand, $\tau$ should not be lower than 1 because it would adversely affect a significant fraction of connections (see also Fig. 1). The discussion so far lets us draw the guideline that the timeout configuration must accommodate the RFC 2988 dynamics and that it should be $1.0 \le \tau \le 11.0$. The boundary values are for (zero retransmissions allowed, $t_{last} = 1$) and (two retransmissions allowed,
$t_{last} = 2$), respectively. Subject to the guideline, a stateful inspection computer should modulate $\tau$ until it achieves a target completion rate or a target memory utilization level. In particular, the timeout should not be kept to a low value in the absence of strong attack since this can adversely affect the connection completion ratio.
TABLE 1 Connection Timeouts and Completion Rates
5 REAL-LIFE IMPACTS
With our 2001 trace, we demonstrate the impacts of different timeout settings on the session table size. At time $t$, the total number of entries $c_t$ can be modeled as:
$$c_t(\tau) = x_t + \lambda\tau, \tag{1}$$
where $x_t$ is the number of legitimate connection entries and $\lambda$ is the attack rate. Note that $x_t$ is hardly a function of $\tau$ because legitimate connections are established mostly before the timeout, thus $\tau$ does not affect their numbers. Using emulation (as in [14], [15]), we tracked the state of each TCP connection and measured the number of session table entries and of purged entries given different timeout values. In Fig. 3, we observe that $c_t \propto \tau$, implying the existence of attack (positive $\lambda$ in (1)). Indeed, from manual inspection, we know that the sharp spikes of Fig. 3 are DoS attack attempts; others are largely due to scan traffic [8]. From (1) and Fig. 3, we can estimate $\lambda \approx 5{,}000$ pkts/s at peak, whose derivation [8] we omit here due to the space constraint. See how inflated the session table can be with large $\tau$s, even with this low attack rate. Recall that a full-fledged DoS attack can easily mobilize more than 100,000 pkts/s [9] and a global worm epidemic can generate more than 20,000 pkts/s per infected host [10]. On the other hand, lower $\tau$s are more resistant to attacks, as the less pronounced DoS spikes show.
Fig. 3. Session table size with different timeout values.
Fig. 4 shows the number of purged entries under different $\tau$s. Surprisingly, the difference between $\tau = 1$ and $\tau = 31$ is small compared to the absolute number of purged connections. It bears out the observation of Table 1 that a longer timeout beyond 1 second reaps only a marginal increment of connection completion rate. Namely, most purged embryonic connections would not grow into full-fledged ones no matter how large $\tau$ grows.
Fig. 4. Number of entries to be purged.
In summary, keeping the TCP embryonic states more than a few seconds in the hope that they grow into a full, valid connection is not only technically groundless but also dangerous. Some well-known systems, including Linux netfilter [5], [16], use 60 to 120 seconds timeout, which renders them vulnerable to network attacks. Once an embryonic state exceeds a short, recommended timeout value within the guideline developed in this paper, it should be promptly purged.
ACKNOWLEDGMENTS
This work was supported by Korea Telecom.
REFERENCES
[1] “Stateful-Inspection Firewalls: The Netscreen Way,” white paper, http://www.netscreen.com/products/firewall_wpaper.html, 2000.
[2] G. Iannaccone, C. Diot, I. Graham, and N. McKeown, “Dealing with High Speed Links and Other Measurement Challenges,” Proc. ACM SIGCOMM Internet Measurement Workshop, 2001.
[3] H.-W. Braun, K. Claffy, and G. Polyzos, “A Framework for Flow-Based Accounting on the Internet,” Proc. IEEE Singapore Int’l Conf. Information Eng., pp. 847-851, 1993.
[4] V. Srinivasan, G. Varghese, S. Suri, and M. Waldvogel, “Fast Scalable Algorithms for Level Four Switching,” Proc. ACM SIGCOMM, 1998.
[5] S. Gill, “Maximizing Firewall Availability,” http://www.qorbit.net/documents/maximizing-firewall-availability.htm, June 2002.
[6] IP Monitoring Project at Sprint, http://ipmon.sprint.com/ipmon.php, 2004.
[7] K. Houle and G. Weaver, “Trends in Denial of Service Attack Technology,” a CERT paper, http://www.cert.org/archive/pdf/DoS_trends.pdf, Oct. 2001.
[8] H. Kim, “Dynamic Memory Management for Packet Inspection Computers,” techreport, http://net.korea.ac.kr/lifetime.html, 2004.
[9] P. Vixie, G. Sneeringer, and M. Schleifer, “Events of 21-Oct-2002,” http://f.root-servers.org/october21.txt, 24 Nov. 2002.
[10] D. Moore et al., “The Spread of Sapphire Worm,” techreport, http://www.caida.org/outreach/papers/2003/sapphire/sapphire.html, Feb. 2003.
[11] NLANR, “NLANR Network Traffic Packet Header Traces,” http://pma.nlanr.net/Traces/, 2004.
[12] V. Paxson and M. Allman, “Computing TCP’s Retransmission Timer,” RFC 2988, Nov. 2000.
[13] R. Stevens, TCP/IP Illustrated Vol. 1. Addison-Wesley, 1994.
[14] K. Bhargavan, S. Chandra, P.J. McCann, and C.A. Gunter, “What Packets May Come: Automata for Network Monitoring,” Proc. 28th ACM SIGPLAN-SIGACT Symp. Principles of Programming Languages, pp. 206-219, 2001.
[15] D.V. Schuehler, J. Moscola, and J.W. Lockwood, “Architecture for a Hardware-Based, TCP/IP Content-Processing System,” IEEE Micro, vol. 24, no. 1, pp. 62-69, Jan.-Feb. 2004.
[16] Netfilter, http://www.netfilter.org, 2004.