Determining Embryonic Connection Timeout in Stateful Inspection

Inhye Kang, University of Seoul
Hyogon Kim, Korea University

Abstract—Purging embryonic connection states after an appropriate time interval is essential for connection-level monitoring devices such as stateful firewalls, both to minimize security holes and to improve state lookup performance. This paper investigates which timeout intervals are adequate, based on the analysis of real-life Internet traces. It reveals that timeouts of (R+T) seconds are useful, where R = 0, 3, 9 and 1 ≤ T ≤ 2, and that wide implementation of RFC 2988 is behind this phenomenon.

Keywords—stateful inspection, session state purge, TCP, retransmission timeout

I. INTRODUCTION

"Stateful" inspection refers to an extension of the packet-by-packet filtering process that tracks individual flows, enabling policy checks that extend across a series of packets [1]. For example, a stateful firewall can block a TCP ACK packet that is not preceded by a TCP SYN packet with a correct sequence number. This per-flow or per-connection tracking and inspection is not limited to stateful firewalls; it is also used in network intrusion detection systems (NIDS), accounting and charging [2], and traffic monitoring [3], among others. Stateful inspection requires a session¹ table whose entries typically record the source and destination IP addresses and port numbers and, for TCP, the current sequence number. For each arriving packet, the session table is looked up for a match. A session entry is created when the first packet of a previously untracked flow appears; for TCP, this is the SYN packet. The entry is purged when connection termination is signaled; again for TCP, this is the final ACK for the second FIN, unless the connection is aborted by a RST packet. UDP, however, has no connection establishment or teardown, so some timeout mechanism is typically used to purge inactive sessions.

Under normal operating conditions, all session table entries represent valid flows. However, abnormal events can create an excessive number of invalid entries in the table. A representative example is the denial-of-service (DoS) attack using TCP SYN packets. Suppose an avalanche of TCP SYN packets, typically with spoofed source addresses, goes out through a stateful firewall, triggering the creation of corresponding session entries in the table. (Egress filtering is one way to deter such attacks, but its deployment in today's Internet is minimal, not to mention that its overall effectiveness is controversial [8].) When the session table is inundated with invalid entries, both performance and security become concerns.

Each session entry represents a "hole" that may admit incoming attack packets. And since session table lookup is in the per-packet processing path, an unnecessary increase in the number of entries directly lowers packet throughput. Specifically, hashing is typically used for session table lookup (there is no range or prefix matching, unlike in "static" rule² tables, where such matching is prevalent and forces more sophisticated methods, e.g., [5]), so an increase in the total number of entries increases the average number of entries in each hash bucket.

Although session entries for non-TCP protocols exist, the current Internet traffic mix is strongly biased towards TCP [6]. In this paper, therefore, we limit the scope of our investigation to TCP session entries. In particular, we focus on purging invalid embryonic TCP sessions; we do not address the inactivity timeout of TCP sessions that have already been established. This is because most attacks, such as DoS (and in some cases scanning), result in invalid embryonic sessions.

Most stateful inspection devices employ some form of timeout mechanism for purging states. However, its configuration is usually left to the discretion of users, who rarely have clear guidelines. To the best of our knowledge, this paper is the first attempt to provide such guidelines. The paper is organized as follows. Section II discusses our methodology and the difficulties involved in it, and briefly introduces the Internet traces we used in this study. Section III first presents the basic facts gathered from real-life Internet traces, then explains them and, on that basis, establishes the aforementioned guidelines. Section IV evaluates the performance impact of the different timeout values suggested in Section III. Section V provides some practical considerations necessary for an efficient implementation of the purge mechanism. Section VI summarizes the results and lessons.
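The introduction's argument that every extra entry slows the per-packet lookup can be made concrete with a chained hash table. The sketch below is a hypothetical, minimal session table keyed by the address/port 4-tuple (the class name, the `"EMBRYONIC"` state label, and the bucket count are illustrative assumptions, not the paper's design); its `load_factor()` is the average chain length n / B that a lookup must scan.

```python
from typing import Dict, List, Optional, Tuple

# (source IP, source port, destination IP, destination port); protocol omitted
FlowKey = Tuple[str, int, str, int]
Entry = Dict[str, str]


class SessionTable:
    """Chained hash table of session entries keyed by a flow's 4-tuple.

    Hypothetical and minimal, for illustration only: each bucket is a Python
    list, so a lookup scans every entry that hashed to the same bucket.
    """

    def __init__(self, num_buckets: int = 1024) -> None:
        self.buckets: List[List[Tuple[FlowKey, Entry]]] = [
            [] for _ in range(num_buckets)
        ]
        self.size = 0

    def _bucket(self, key: FlowKey) -> List[Tuple[FlowKey, Entry]]:
        return self.buckets[hash(key) % len(self.buckets)]

    def lookup(self, key: FlowKey) -> Optional[Entry]:
        """Per-packet lookup; cost is proportional to the bucket's chain length."""
        for stored_key, entry in self._bucket(key):
            if stored_key == key:
                return entry
        return None

    def insert(self, key: FlowKey, entry: Entry) -> None:
        """Create an entry (e.g., on a flow's first SYN) or overwrite an existing one."""
        bucket = self._bucket(key)
        for i, (stored_key, _) in enumerate(bucket):
            if stored_key == key:
                bucket[i] = (key, entry)
                return
        bucket.append((key, entry))
        self.size += 1

    def remove(self, key: FlowKey) -> bool:
        """Purge an entry (e.g., on RST, final ACK, or timeout); True if it existed."""
        bucket = self._bucket(key)
        for i, (stored_key, _) in enumerate(bucket):
            if stored_key == key:
                del bucket[i]
                self.size -= 1
                return True
        return False

    def load_factor(self) -> float:
        """Average entries per bucket, n / B; grows linearly with the table size."""
        return self.size / len(self.buckets)
```

For instance, inserting 5,000 spoofed-SYN entries into 1,024 buckets gives an average chain of about 4.9 entries, every one of which costs a comparison for legitimate packets that hash to those buckets.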

¹ The terms "flow", "connection", and "session" are used interchangeably in this paper. But when we need to include connectionless flows such as UDP, we use "session".

² Session table lookup and static rule table lookup go hand in hand. A packet is rejected if either of the two rejects it. Which table comes first is an implementation decision.

0-7803-7802-4/03/$17.00 © 2003 IEEE

II. METHODOLOGY

In this section, we discuss the methodology we used to determine adequate session timeouts. It can be summarized succinctly as follows. First, we obtain the distribution of TCP connection setup delays from a large number of TCP connections observed on the Internet. Second, based on this distribution, we choose the connection setup timeout so that it covers a "sufficient" percentage of normal flows. Connections that remain incomplete beyond this timeout interval are considered to constitute a potential attack, so they are purged.

In order to track the state of connections, we run a pseudo-state machine for each one, emulating a real TCP state machine. The TCP connection setup delay is defined as the time elapsed between the transmission of the first SYN packet and the reception of the corresponding acknowledgment (carried by the SYN/ACK) at the active open side. Note that this is not the total time between the first SYN transmission by the active open side and the reception of the ACK by the passive open side. This is because TCP is full-duplex, and the channels in each direction are established separately. The TCP state machine on each side transitions into the ESTABLISHED state as soon as it receives the acknowledgment for the SYN it transmitted earlier [7].

The traces we use in this paper were collected over 10 days in December 2001. Specifically, we collected only the packet headers³ during work hours, usually from 9 a.m. to 5 p.m. Each day, more than 600 million packets were captured, enough to fill an 80 GB hard disk. Almost every day we observe scanning and DoS traffic on multiple occasions. Other than that, there is little abnormality, and the traces exhibit similar traffic compositions. In this paper, we present the analysis of a single day's trace; the other traces show qualitatively similar results, so we do not reiterate them.
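The second step of the methodology, choosing a timeout that covers a "sufficient" percentage of normal flows, amounts to reading a quantile off the empirical setup-delay distribution. A minimal Python sketch, assuming the setup delays have already been extracted as a list of seconds; the function names and the sample values below are illustrative assumptions, not data from the trace.

```python
import math
from typing import Sequence


def timeout_for_coverage(setup_delays_s: Sequence[float], coverage: float) -> float:
    """Smallest observed delay d such that at least `coverage` of the sampled
    connections completed their handshake within d seconds (empirical quantile)."""
    if not setup_delays_s:
        raise ValueError("need at least one delay sample")
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    ordered = sorted(setup_delays_s)
    k = math.ceil(coverage * len(ordered))  # number of samples that must be covered
    return ordered[k - 1]


def completion_rate(setup_delays_s: Sequence[float], timeout_s: float) -> float:
    """Fraction of sampled connections established within timeout_s seconds."""
    if not setup_delays_s:
        raise ValueError("need at least one delay sample")
    done = sum(1 for d in setup_delays_s if d <= timeout_s)
    return done / len(setup_delays_s)
```

With the made-up delays `[0.12, 0.3, 0.45, 0.8, 0.95, 3.4, 0.2, 0.6, 9.7, 0.5]`, `timeout_for_coverage(delays, 0.5)` returns `0.5` and `completion_rate(delays, 1.0)` returns `0.8`; the two functions are inverse views of the same CDF.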

Fig. 1. Estimating the connection setup delay from a backbone trace. (Figure: SYN, SYN/ACK and ACK exchanged between the active open side A and the passive open side B, observed at O; labeled intervals Dc, D'c, Dsa, Dp(SYN), Dp(SYN/ACK).)

There are a few difficulties imposed by our traces. The traces were gathered at an Internet Exchange (IX) in Korea. Four IXs, including ours, constitute the backbone of the Korean Internet. Our traces in particular record the traffic exchanged between the IX and the U.S. over two trans-pacific T3 links, so they give us a "middle-of-the-network" view of the trans-pacific traffic. As a consequence, the time difference between the SYN and the corresponding SYN/ACK (Dsa in Fig. 1) appears much smaller than the full connection establishment time Dc. (This problem is aggravated as the observation locale O moves towards B, the passive open side, in Fig. 1.) In the figure, we want to measure Dc, which is the time difference between the SYN transmission and the ACK transmission. It also accounts for the internal processing of the SYN/ACK (Dp(SYN/ACK)), since the active open side transmits the ACK when it finishes processing the SYN/ACK. We can only approximate Dc by calculating D'c. It is an approximation because it includes error terms in the observed times of both the SYN and the ACK, caused by the varying Internet delay from A to O. A good property of this approximation, though, is that it works under asymmetric routing. Suppose the packets from A to B go through O, but those in the other direction do not. Since both the SYN and the ACK travel in the same direction, we can always observe them and compute the time difference. We could instead measure the connection setup delay in the other direction, i.e., between the SYN/ACK and whatever packet follows the final ACK. But since the packet after the final ACK can come at any time, this could falsely inflate the connection setup time estimate.

Our investigation indeed reveals that asymmetric routing exists in our traces. Specifically, for the connections that go through O to reach the U.S., the traffic returning to Korea through other paths outweighs the traffic returning through O by a ratio of 14:1. But since this is not the focus of our study, and our method is not affected by it, we do not elaborate further on the issue. Another problem with our traces is that our packet capture program occasionally loses packets. This is not prevalent across the entire trace, but it is noticeable. One impact is that when the capture program misses some of the packets that tear down an established connection, the connection is considered merely inactive, whereas in reality it no longer exists. Such a "zombie" connection exists only in our session table, permanently occupying space unless it is explicitly identified and purged. Note that this problem is specific to our imperfect packet capture program. In reality, should a stateful inspection device lose packets, the resulting zombies can easily be removed using inactivity timers.

III. TRACE ANALYSIS

In this section, we present the analysis of the TCP connection setup delay observed in a large, real-life Internet packet trace. Fig. 2 plots the CDF of the connection setup delay for 8.4 million TCP connections observed on a typical day at our observation locale (the IX). The X-axis represents the connection setup delay in milliseconds, and the Y-axis the cumulative probability.

The bottom curve shows the total delay, regardless of how many SYN retransmissions occurred: the time difference between the first SYN transmission and the ACK transmission that completes the 3-way handshake. The top curve, which rises quickly, is the distribution of (total - last), where last is the time difference between the last SYN transmission (which could be a retransmission) and the ACK transmission. This curve therefore mostly represents the distribution of the time spent retransmitting SYN packets, if any. According to the figure, 97% of connections do not go through SYN retransmission, more than 2% retransmit once, and only a small fraction retransmit more than once. Note that these percentages can vary with the packet loss probability of the individual paths of the connections passing a particular observation locale.
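The two quantities plotted in Fig. 2 can be extracted from the A-to-B half of a trace seen at O, which is exactly why D'c survives asymmetric routing. The sketch below is a simplification under stated assumptions: packets are already parsed and time-ordered, flags are plain strings such as `"S"` or `"A"`, and the first pure ACK after a SYN is taken to complete the handshake (no sequence-number checks, unlike the paper's pseudo-state machine). All names are hypothetical.

```python
from typing import Dict, Iterable, Tuple

FlowKey = Tuple[str, int, str, int]              # (src IP, src port, dst IP, dst port), A->B
Packet = Tuple[float, str, int, str, int, str]   # (ts, src, sport, dst, dport, flags)


def setup_delays(packets: Iterable[Packet]) -> Dict[FlowKey, Tuple[float, float]]:
    """Return {flow: (total, last)} for each flow whose handshake ACK was seen.

    total: time from the first SYN to the ACK (the estimate D'c of Fig. 1)
    last:  time from the last, possibly retransmitted, SYN to the ACK
    so total - last is the time spent on SYN retransmissions.
    """
    first_syn: Dict[FlowKey, float] = {}
    last_syn: Dict[FlowKey, float] = {}
    result: Dict[FlowKey, Tuple[float, float]] = {}
    for ts, src, sport, dst, dport, flags in packets:
        key = (src, sport, dst, dport)
        if key in result:
            continue                        # handshake already complete
        if "S" in flags and "A" not in flags:
            first_syn.setdefault(key, ts)   # original SYN
            last_syn[key] = ts              # latest (re)transmission
        elif flags == "A" and key in first_syn:
            # first pure ACK after a SYN: assumed to complete the handshake
            result[key] = (ts - first_syn[key], ts - last_syn[key])
    return result
```

A flow whose SYN at 0 s was lost, retransmitted at 3 s and acknowledged at 3.25 s yields `(3.25, 0.25)`: a total of 3.25 s, of which 3 s is retransmission time. Flows that never send the ACK, like the spoofed SYNs discussed later, simply never appear in the result.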

³ The header field values have been sanitized to protect privacy [4].


Fig. 2. The distribution of connection setup delays. (Figure: CDFs of total and total-last; x-axis: time (msec); y-axis: cumulative probability.)

Another thing we notice from the figure is that the connection setup delay is usually much less than 1 second: at 1 second, more than 92% of the connections are established. But this does not account for the delay caused by SYN retransmissions. For instance, if the total connection setup delay is 3.9 seconds, it is probably because the first SYN was lost and a retransmission occurred, so it is more reasonable to take the "pure" setup delay to be 0.9 seconds. After accommodating this consideration, we obtain the CDF in Fig. 3 (the last distribution in Fig. 2). At 0.5 second, the connection completion rate is 84.58%. At 1 second, it is 96.71%, and at 1.5 and 2 seconds it goes up to 98.59% and 99.33%, respectively. Considering that these are all trans-pacific connections, the percentile should be significantly higher if we observe both local and long-distance connections.

Fig. 3. The delay distribution excluding SYN retransmission times. (Figure: CDF; x-axis: time (msec); y-axis: cumulative probability.)

A second observation is that for total there is a steep increase at around 3 seconds; the fraction of connections established after this increase exceeds 99%. A closer examination reveals another, smaller, relatively fast increase at around 9 seconds, after which the fraction of established connections exceeds 99.5%. Albeit almost invisible, a similar increase also occurs at around 6 seconds. These observations reveal an interesting fact. First of all, these quick increases are due to TCP retransmission timeouts. When a SYN packet is lost, TCP retransmits the SYN, but implementations disagree on how soon the retransmission should be sent. For instance, BSD-derived implementations send the first retransmission 6 seconds after the original SYN. This is a bug; it should rather be 12 seconds [6]. The almost invisible bump around 6 seconds tells us that a comparatively small number of BSD-derived hosts exist. The majority, on the other hand, appears to follow the standard recommendation of RFC 2988 [7]. The specification dictates that before a round-trip time measurement is made, the retransmission timeout (RTO) must be set to 3 seconds, and that the RTO is subject to exponential backoff on each retransmission. (The RFC specifies other rules as well, but these two suffice for the purpose of our investigation.) The first steep increase at 3 seconds is explained by the initial RTO of RFC 2988. The minor increase at 9 seconds is then the second retransmission (9 = 3 + 6): the RTO is backed off when there is no SYN/ACK in response to the first retransmitted SYN, so the second retransmission goes out 9 seconds after the original SYN. This observation reveals that most TCP implementations are RFC 2988-conformant.

Based on the observations that most TCP connections conform to RFC 2988 and that most connections are established within 1 second, we use 1 second as the "pure" setup delay in this paper. The resulting connection completion rates are given in Table I. One might allow a longer time for a higher completion rate, but the results should be comparable, since doubling the current value (i.e., to 2 seconds) improves the connection completion rate by only 2.5%. In practice, this value should be determined based on the load imposed on the stateful inspection device.

IV. RECOMMENDED TIMEOUT VALUES

We saw in Section III that the TCP connection establishment time distribution is strongly affected by TCP retransmission behavior. In this section, we pick a few useful timeout values and investigate their impact. In our experiments, we create a session entry for each new TCP SYN packet and run a pseudo TCP state machine on the entry. An arriving packet for the entry may or may not change its state. Each entry has a state variable that shows how far the connection has proceeded towards the ESTABLISHED state. Embryonic states must be purged after a prescribed time period to avoid possible abuse and to keep session table lookup efficient.
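The timeout values in Table I follow the (R+T) rule from the abstract: R is the send time of the last SYN retransmission one is willing to allow, and T is the "pure" setup delay (1 s here). A small Python sketch reproducing them, under stated assumptions: the RFC 2988 schedule is computed from the 3 s initial RTO with doubling, while the BSD send times (0, 6, 30 s) are inferred from Table I's 7 s and 31 s entries and the 75 s establishment timer, not taken from BSD source code. The function names are hypothetical.

```python
from typing import List


def rfc2988_syn_times(max_rexmits: int, initial_rto: float = 3.0) -> List[float]:
    """Send times (s, relative to the original SYN) of the SYN and up to
    max_rexmits retransmissions: initial RTO of 3 s, doubled on each timeout."""
    times, t, rto = [0.0], 0.0, initial_rto
    for _ in range(max_rexmits):
        t += rto
        times.append(t)
        rto *= 2  # exponential backoff
    return times


# BSD-derived send times inferred from Table I (7 s and 31 s timeouts) and the
# 75 s establishment timer; the 4th SYN (due at 78 s) is never sent.
BSD_SYN_TIMES: List[float] = [0.0, 6.0, 30.0]


def embryonic_timeouts(syn_times: List[float], pure_delay: float = 1.0) -> List[float]:
    """Timeout R + T for allowing 0, 1, 2, ... retransmissions, where R is the
    send time of the last allowed SYN and T is the 'pure' setup delay."""
    return [r + pure_delay for r in syn_times]
```

`embryonic_timeouts(rfc2988_syn_times(3))` gives `[1.0, 4.0, 10.0, 22.0]` and `embryonic_timeouts(BSD_SYN_TIMES)` gives `[1.0, 7.0, 31.0]`, the two timeout rows of Table I. Any timeout strictly between two of these values waits longer without admitting an additional retransmission, which is why the paper recommends shifting the timeout in discrete steps.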

Note when reading Table I that BSD-derived implementations allow a total of only 3 SYN transmissions. This is because the connection establishment timer in BSD-derived systems goes off at 75 seconds, 3 seconds before the 4th SYN transmission [6]. The table tells us that setting the timeout at 7 seconds instead of 4 seconds, possibly to allow one SYN retransmission for BSD-derived systems, is of marginal use: the connection completion rate increases by just over 0.5% for the additional 3 seconds. This again confirms that there are not many BSD-derived TCP implementations in use today. Finally, a timeout larger than 10 s buys us little in terms of connection completion rate.

TABLE I. CONNECTION TIMEOUTS AND COMPLETION RATES

Maximum allowed SYN rex's      0         1         2         3
RFC 2988-conformant
  Timeout                      1s        4s        10s       22s
  Completion rate              93.07%    98.92%    99.86%    99.99%
BSD-derived
  Timeout                      1s        7s        31s       -
  Completion rate              93.07%    99.55%    100%      -

When a stateful inspection device is under pressure, the timeout period should be shortened. If at all possible, though, it should not be lower than 1 s, because it would then begin to affect the majority of connections (Fig. 1). As we mentioned earlier, stateful inspection devices can come under pressure when an abnormal surge in the number of sessions takes place. Fortunately, our trace contains some mild DoS attack traffic that can create such a situation. In the following, we measure, for different timeout values τ, the number of session entries that must be maintained in the session table.

Fig. 4. Number of session entries under a mild DoS attack. (Figure: registered connections vs. time (10 s), one curve per timeout τ = 1s, 4s, 7s, 10s, 22s, 31s.)

We must note that DoS attacks are active almost all day in our trace (Fig. 5). In particular, the sharp spikes represent the onsets of more intensive attack traffic. The bottom curve in Fig. 4 is for τ = 1 s, and the top curve is for τ = 31 s. We can clearly see that lower τ's are more resistant to the attacks. In the worst case, the number of entries to be kept with τ = 31 s is almost 14 times that with τ = 1 s, or more than 5 times that with τ = 4 s. The striking fact is that keeping these entries for longer earns us little. Fig. 5 shows that the numbers of embryonic connections eventually purged by the timeout are comparable across different τ's (contrast this with the big difference τ makes in memory and search complexity, as shown in Fig. 4). The figure eloquently says that these purged embryonic connections do not grow into full-fledged ones anyway, since they are part of the (DoS) attack traffic.

Fig. 5. Number of purged entries with different τ's. (Figure: killed embryonic connections vs. time (10 s), one curve per timeout τ = 1s to 31s.)

Back in Fig. 4, it is easy to see that the number of entries increases proportionally with τ and with the intensity of the attack. Namely, the total number of entries c(t) in the session table is

$$ c(t) = x(t) + \lambda \tau $$

where x(t) is the number of legitimate connection entries (i.e., embryonic plus synchronized) at time t, and λ is the attack intensity in packets per second. Note that x(t) is hardly a function of τ: legitimate connections are established before the timeout, so the timeout interval does not affect their number. The second term is positive only when there is an attack. What concerns us is that the attack intensity λ is relatively low in our trace. The most intense attack period records an attack rate below 5 Kp/s, while the total traffic at that time exceeds 25 Kp/s (so the attack accounts for less than 20% of the traffic even at the peak). If a full-fledged, distributed attack came along, the increase in flow states might become unbearable. So the important point is that keeping embryonic TCP states for more than a few seconds, in the hope that they will eventually (within the timeout period) become full, valid connections, is very likely futile. Once an embryonic state exceeds a short recommended timeout of 4 s or 10 s, it should be purged immediately.

In summary, three parameters should be taken into account when determining appropriate timeout values for embryonic TCP connection states:
1) R, the maximum number of SYN retransmissions allowed for each connection;
2) T, the maximum time to wait for the ACK after a successful SYN transmission;
3) E, the number of entries in the session table.
The real-life traces tell us that R can be small, typically 1 or 2 at most, and that T can also be small: 1 second is fine, but for a somewhat higher connection completion rate, up to 2 seconds is acceptable. R and T should be modulated according to E, by choosing the occupancy levels (as percentages of E) at which R and T are changed.
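The model c(t) = x(t) + λτ is Little's law applied to the attack entries: each spoofed SYN occupies the table for exactly τ seconds, so λ arrivals per second leave λτ entries resident. A toy Python simulation under deliberately simplified, hypothetical assumptions (constant λ, x(t) held constant, whole-second time steps, attack entries never progress past the embryonic state). Purging uses per-second expiry buckets in the spirit of the time-indexed queue described in Section V; names and parameters are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def simulate_entries(attack_rate: int, tau: int, duration: int,
                     legit_entries: int = 0) -> List[int]:
    """Session-table size at the end of each second under a constant SYN flood.

    attack_rate:   spoofed SYNs per second (lambda); none ever completes
    tau:           embryonic timeout in whole seconds
    legit_entries: x(t), held constant and never purged in this toy model
    """
    table: Dict[Tuple[str, int], str] = {
        ("legit", i): "ESTABLISHED" for i in range(legit_entries)
    }
    # time-indexed queue: purge time (s) -> keys created tau seconds earlier
    expiry: Dict[int, List[Tuple[str, int]]] = defaultdict(list)
    sizes: List[int] = []
    next_id = 0
    for t in range(duration):
        # purge one whole time slot; no walk over the full session table
        for key in expiry.pop(t, []):
            if table.get(key) == "EMBRYONIC":
                del table[key]
        for _ in range(attack_rate):
            key = ("attack", next_id)
            next_id += 1
            table[key] = "EMBRYONIC"
            expiry[t + tau].append(key)
        sizes.append(len(table))
    return sizes
```

With λ = 100 SYN/s and 50 legitimate entries, the table settles at 150 entries for τ = 1 s and 450 for τ = 4 s, while in steady state the number of entries purged per second is λ for every τ. In this toy model, a longer timeout therefore only raises memory and lookup cost without completing any extra connections, in line with the contrast the paper draws between Fig. 4 and Fig. 5.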


V. IMPLEMENTATION

If done by brute force, the purge complexity is O(n), where n is the number of session entries: for each entry in the session table, we must compare its timestamp with the current time. The session entry should therefore record its state and the time the entry was last accessed in that state. If the elapsed time since the last access exceeds τ, the entry is TCP, and its state is still short of ESTABLISHED (i.e., embryonic), then the entry should be eliminated. Obviously, this cost is pure overhead if there are few invalid entries. For instance, in the results shown above, if the DoS attack were not present, attempting to purge invalid entries more frequently would yield no gain. To avoid this problem, one could trigger the purge operation only when the number of session entries exceeds some threshold, as mentioned in the previous section.

A better way to perform the purge operation is to use a time-indexed queue. When a session entry is created at time t, it is embryonic. A corresponding entry is added to a queue that is scheduled to be purged at time t+τ; this queue is separate from the session table itself. If the session entry proceeds to the ESTABLISHED state, its entry in the time-indexed queue is removed. Otherwise, when a queue entry is purged, its corresponding entry in the session table is also purged (Fig. 5). Each purge operation purges one entire queue of the time-indexed queue. For instance, at time t, only the entries in the dashed ellipse and their corresponding session entries are purged. This way, the entire session table does not have to be walked for every purge operation. The frequency of the purge operation is the inverse of the time gap between two adjacent time queues. It is a design decision, but obviously it should be larger than 1/τ.

Fig. 5. Using time indexed queue to facilitate purge operation. (Figure: session table entries with IP addrs, ports and state, e.g., SYN_RCVD, linked to a time-indexed queue with slots from t to t+τ.)

VI. CONCLUSION

In this paper, we investigated what the appropriate timeout intervals are for embryonic TCP connection entries in stateful inspection devices. We found that most TCP implementations are conformant to RFC 2988, and only a small fraction follow the BSD-style TCP retransmission timeout mechanism. As a result, SYN packet retransmissions, among others, take place at 3, 6, and 9 seconds after the original transmission. The connection establishment rate does not increase significantly between these discrete points in time. In addition, we found that as long as the SYN packet is not lost in the network, the connection setup delay is mostly below 1 second. Also, the probability of SYN retransmission is quite low, although it depends on the individual paths of the connections passing the observation locale.

A general guideline on the embryonic connection timeout is to set it to a small value, say 4 seconds. Any value longer than 10 seconds makes little sense. This is especially true because applications should retry after a failed connection attempt anyway, and most human users who wait more than 10 seconds for a connection to be established will be annoyed and start again. But the timeout should not be less than 1 second, because below that the connection completion rate begins to drop sharply.

If the inspection device is under severe attack and the number of session entries is increasing fast, we can adaptively change the timeout to 1 second. Our second guideline regarding timeout adaptation is to shift it discretely. This is because the TCP retransmission timeout calculation for SYN packets dictates discrete jumps between successive transmissions. As a result, the connection completion rate does not visibly change between these discrete timeout values.

ACKNOWLEDGMENT

We thank the Korea Computerization Agency (KCA), which provided us with the packet traces.

REFERENCES

[1] Stateful-inspection firewalls: The Netscreen way, white paper, http://www.netscreen.com/products/firewall_wpaper.html.
[2] H.-W. Braun, K. Claffy, and G. Polyzos, "A framework for flow-based accounting on the Internet," Proceedings of IEEE Singapore International Conference on Information Engineering, 1993, pp. 847-851.
[3] K. Claffy, G. Polyzos, and H.-W. Braun, "A parametrizable methodology for Internet traffic flow monitoring," JSAC 8(13), Oct. 1995, pp. 1481-1494.
[4] J. Xu, J. Fan, M. Ammar, and S. Moon, "On the design and performance of prefix-preserving IP traffic trace anonymization," extended abstract, ACM SIGCOMM Internet Measurement Workshop, San Francisco, November 2001.
[5] P. Gupta and N. McKeown, "Packet classification on multiple fields," Proceedings of ACM SIGCOMM, 1999.
[6] R. Stevens, TCP/IP Illustrated, Vol. 1. Addison-Wesley, 1994.
[7] V. Paxson and M. Allman, "Computing TCP's retransmission timer," RFC 2988, Nov. 2000.
[8] K. Park and H. Lee, "On the effectiveness of route-based packet filtering for distributed DoS attack prevention in power-law Internets," Proceedings of ACM SIGCOMM, 2001.
