Congestion Control for Small Buffer High Speed Networks

Yu Gu∗, Don Towsley∗, Chris V. Hollot† and Honggang Zhang‡
∗ Department of Computer Science, University of Massachusetts, Amherst, Amherst, MA, 01002. Email: {yugu, towsley}@cs.umass.edu
† Department of Electrical and Computer Engineering, University of Massachusetts, Amherst, Amherst, MA, 01002. Email: [email protected]
‡ Math & Computer Science Department, Suffolk University, Boston, MA, 02108. Email: [email protected]
Abstract— There is growing interest in designing high speed routers with small buffers that store only tens of packets. Recent studies suggest that TCP NewReno, with the addition of a pacing mechanism, can interact with such routers without sacrificing link utilization. Unfortunately, as we show in this paper, as workload requirements grow and connection bandwidths increase, the interaction between the congestion control protocol and small buffer routers produces link utilizations that tend to zero. This is a simple consequence of the inverse square root dependence of TCP throughput on loss probability. In this paper we present a new congestion controller that avoids this problem by allowing a TCP connection to achieve arbitrarily large bandwidths without demanding that the loss probability go to zero. We show that this controller produces stable behavior and, through simulation, we show its performance to be superior to TCP NewReno in a variety of environments. Lastly, because of its advantages in high bandwidth environments, we compare our controller’s performance to some of the recently proposed high performance versions of TCP, including HSTCP, STCP, and FAST. Simulations illustrate the superior performance of the proposed controller in a small buffer environment.
I. INTRODUCTION

Driven primarily by technology and cost considerations, there has been growing interest in designing routers with buffers that store only tens of packets. For example, Appenzeller et al. [1] have argued that it is extremely difficult to build packet buffers beyond 40Gb/s and that large buffers result in a bulky design, large power consumption, and expense. Also, buffering in an all-optical router is a technological challenge. Only recently have advances (Enachescu et al. [6]) given promise for buffering even a few optical packets. Small buffers are also favorable for end system applications. In [8], Gorinsky et al. advocate a buffer size of 2L packets, where L is the number of input links. These buffers allow one to handle simultaneous packet arrivals. The idea is that while end system applications have many options for dealing with link under-utilization, they cannot compensate for the queuing delay created by applications.

Recent studies indicate that TCP, the predominant transport
protocol, can coexist with such small buffers. Using fluid and queuing models of TCP and packet loss, Raina and Wischik [22] and Raina et al. [21] argue that small buffers provide greater network stability. Enachescu et al. [6] demonstrate that buffers can even be chosen with size Θ(log W), where W is the congestion window size, provided that TCP sessions use pacing. In this case they show that links can operate at approximately 75% utilization.

The goal of our paper is twofold. First, we present a model where per connection throughput increases, link speed increases, and router buffers are fixed in size. We refer to this as an “evolutionary (workload and technology) model”. We will use this model to illustrate a serious performance degradation problem in bottleneck link utilization. Second, motivated by these observations, we specify requirements for congestion controllers that yield robust performance in the face of increasing per connection throughput and higher link speeds.

One solution to the above problem is the log W rule proposed in [6]. We explore a different approach appropriate for fixed-size router buffers. Our analysis leads to a new end-to-end congestion controller that allows buffers to remain constant in size regardless of the increasing sending rate of a source. As this new congestion controller results from our analysis of the evolution model, we name it E-TCP.

The key idea behind E-TCP is to prevent the packet loss probability from going to zero as sending rates increase. E-TCP achieves high link utilization by operating at an equilibrium where the sending rate forces the bottleneck packet loss probability to be larger than a predefined constant p0. So instead of “avoiding congestion”, E-TCP drives the bottleneck to an equilibrium state of moderate congestion. This equilibrium is achieved using a generalized additive-increase, multiplicative-decrease algorithm on the congestion window.

The equilibrium of E-TCP requires a higher packet loss rate than TCP. Therefore, we have also developed congestion signalling and reliable data delivery mechanisms that accommodate it. The E-TCP congestion control is
completely decoupled from reliable data delivery mechanisms, which makes it applicable to both unreliable and reliable data transmission.

While this research is primarily targeted towards future router technologies and workloads, our controller may also be useful for high speed, high volume data transfers in the current Internet. This is of growing importance in the high performance science community. Our simulations show that the new controller is competitive with several proposed controllers, including HSTCP [7], STCP [14] and FAST [11], in a small buffer environment.

The rest of the paper is organized as follows. In Section II, we propose a simple evolutionary network model of the Internet for the case where buffer sizes remain constant. Section III introduces the new congestion controller, E-TCP, and discusses its related properties. We present a reliable data delivery mechanism that builds upon E-TCP in Section IV. Section V presents experimental results obtained during the development of E-TCP. The last two sections discuss related research and provide a short summary of our work.
II. AN EVOLUTIONARY NETWORK MODEL

The evolution of the Internet has seen a rapid growth in both users and applications. In addition, users are also experiencing higher bandwidth. This speed-up is primarily due to higher-bandwidth routers, switches and pipes. In less than forty years, Internet line speeds have increased from Kb/s to Gb/s, and access line speeds have evolved from Kb/s to Mb/s. In some parts of the world, optical fiber connects end users on a per household basis, providing tens of Mb/s. It is thus reasonable to expect this trend to continue, with both the number of connections and the bandwidth per user increasing.

To model this state of affairs we consider a network with a single bottleneck router having capacity c and carrying g connections. With n = 1, 2, . . . denoting the technology generation, we define c(n) to be the capacity of the n-th router generation and g(n) to be the number of connections such a router expects to carry. Based on our prior observations on Internet evolution, we model g(n) as nondecreasing and

c(n) → ∞,  c(n)/g(n) → ∞

as n → ∞. This states that as link speed increases, per connection throughput also increases. Our analysis also considers the case when the bottleneck’s buffer size is a constant B, independent of the technology generation. The steady-state packet loss probability p is taken as a function of the link utilization ρ,

p = P_B(ρ),  (1)

where P_B is a one-to-one mapping from R+ to [0, 1] and P_B(x) → 0 as x → 0; i.e., p → 0 as ρ → 0. This is true for an M/G/1/B queue, which, for large g(n), can be a suitable loss model due to the Poisson nature of aggregate traffic; see [4] and [3].

To model the evolution of TCP, we first recall the adjustment of the congestion window W on the receipt of an acknowledgement, W ← W + 1/W, and on detecting a packet loss during a round trip time, when the window is halved, W ← W − W/2. Let τr be the round trip time. This additive-increase, multiplicative-decrease (AIMD) behavior can be approximated by the differential equation

dW/dt = (1/W)(W/τr) − (W/2)(W/τr) p(t),

which yields the steady state congestion window size W∗ = √(2/p) and the steady-state sending rate

T = W∗/τr = (1/τr) √(2/p),  (2)

the well-known square-root formula for TCP throughput [20]. On the other hand, if we assume that g(n) homogeneous TCP connections equally share the bottleneck capacity, then

T(n) = ρ(n)c(n)/g(n),  (3)

where T(n) and ρ(n) are the per connection sending rate and the link utilization in an n-th generation network. Combining (1)–(3) yields

ρ²(n) P_B(ρ(n)) = 2 (g(n)/(τr c(n)))².  (4)

Since c(n)/g(n) → ∞, the right-hand side of (4) converges to zero as technology evolves. Because P_B is one-to-one and P_B(x) → 0 as x → 0, (4) implies that ρ(n) → 0. Note that individual connection throughput becomes unbounded, even though ρ → 0. Note also that we have made no assumptions on the explicit dependence of c(n) and g(n) on n, other than c(n)/g(n) → ∞. One possible realization is c(n) → ∞ and g(n) constant, corresponding to the scenario found in the high performance science community, where higher bandwidth devices are desired for data-intensive applications.

One possible solution to this link under-utilization problem is to increase the router buffer size. Suppose our goal is to maintain a constant link utilization ρ, independent of the technology generation n. Approximating the bottleneck packet loss probability function P_B(ρ) in (1) by P_B(ρ) ≈ ρ^B and substituting into (4) yields

B + 2 = log_ρ [2 (g(n)/(τr c(n)))²].  (5)

Here c(n)/g(n) is proportional to W under a fixed round trip time and constant link utilization. Thus, to maintain constant nonzero link utilization, the buffer size B should be on the order of log W. This is consistent with [6]. However, our evolution model further implies that buffer sizes must grow as technology evolves, since c(n)/g(n) on the right hand side of (5) goes to ∞ as n increases.

The key factor behind TCP’s under-utilization problem is that evolutionary increases in TCP throughput require the corresponding evolutionary loss probability to go to zero. This follows immediately from (2) and, as we have shown, p → 0 with a constant-sized buffer necessarily implies that ρ → 0. We will show that this problem also plagues recently proposed high performance versions of TCP: HSTCP [7], STCP [14] and BIC [25]. This observation leads to a new congestion control algorithm which we will discuss in the next section.
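To make the effect of (4) concrete, the short sketch below numerically solves ρ²(n)P_B(ρ(n)) = 2(g(n)/(τr c(n)))² for the utilization as the per-connection share c(n)/g(n) grows. It assumes an M/M/1/B blocking formula as one concrete instance of the loss mapping P_B (the text only requires a generic one-to-one mapping), with buffer size, round trip time and packet size roughly matching the assumptions behind Figure 1; the helper names are ours.

```python
# Numerical illustration of equation (4), assuming an M/M/1/B blocking
# probability as one concrete instance of the loss mapping P_B in (1).
# B = 20 packets, tau_r = 100 ms and 1000-byte packets are illustrative choices.

B = 20              # bottleneck buffer size (packets)
tau_r = 0.1         # round trip time (seconds)
pkt_bits = 1000 * 8

def P_B(rho):
    """Assumed loss mapping: M/M/1/B blocking probability."""
    if abs(rho - 1.0) < 1e-12:
        return 1.0 / (B + 1)
    return (1.0 - rho) * rho**B / (1.0 - rho**(B + 1))

def utilization(per_conn_bps):
    """Solve rho^2 * P_B(rho) = 2 * (g/(tau_r*c))^2 for rho by bisection."""
    pkts_per_rtt = per_conn_bps * tau_r / pkt_bits   # c/g in packets per RTT
    rhs = 2.0 / pkts_per_rtt**2
    lo, hi = 1e-9, 1.0 - 1e-9
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if mid**2 * P_B(mid) < rhs:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# As the per-connection share c(n)/g(n) grows, the utilization solving (4) shrinks.
for share in [1e6, 10e6, 100e6, 1e9, 10e9]:
    print(f"c/g = {share:>12,.0f} bps -> rho = {utilization(share):.3f}")
```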
III. CONGESTION CONTROL

In this section we present a congestion control algorithm that maintains a high link utilization in an evolving network where the router buffers are set to a constant size. The basic idea is to design a congestion control algorithm that does not require the loss probability to go to zero as throughput increases. In particular, we introduce a congestion controller that, under heavy loads, forces the connection to operate at an equilibrium where the packet loss probability p satisfies p > p0, where p0 is a constant chosen to be greater than zero. The controller is characterized by a throughput-loss equation where the throughput increases to ∞ as the loss probability p decreases to p0. As it comes from our evolutionary network model, we name the congestion controller E-TCP.

A. E-TCP congestion control

One option in developing the controller is to have it behave similarly to current TCP. Such a controller would then have the following steady state window-loss tradeoff curve, which is a variant of the TCP throughput formula (2):

W∗ = √(2/(p − p0)),

where p0 > 0 is the predefined packet loss probability. Instead, we opt for a controller that more closely resembles STCP, namely

W∗ = 2/(p − p0).  (6)

The above equations show that the steady-state packet loss probability p will always be larger than p0, and that, as the bottleneck capacity increases to infinity, p∗ decreases to p0 in both cases. The reason for choosing the latter is that STCP exhibits a more stable and scalable behavior at steady state [14]. These nice properties are carried over to the new controller and contribute to a higher link utilization than the one based on TCP.

The equilibrium in (6) is achieved by using a generalized AIMD protocol with general increment and decrement that are functions of the congestion window size W, namely i(W) and d(W). On receiving a successful acknowledgement, the congestion window is increased by i(W), and on receiving a loss indication, the congestion window is decreased by d(W). The behavior of such an algorithm is described by

dW/dt = i(W)(W/τr) − d(W)(W/τr) p,  (7)

and at equilibrium, i(W∗)/d(W∗) = p. Combining the above with (6) gives

i(W)/d(W) = (2 + p0 W)/W.  (8)
Fig. 1. TCP’s link utilization drops under 50% as its throughput increases, while E-TCP’s link utilization converges to 88%.
Let i(W) = 1/b for some parameter b > 0. This leads to

d(W) = W/(b(2 + p0 W)).  (9)
Setting p0: It is important to understand that if a link is lightly utilized (i.e., it is not the bottleneck for any session), its loss probability will be zero regardless of the value of p0. Losses only occur at congested links. Hence, p0 should be selected keeping in mind that the loss probability is bounded from below by p0 only in overload conditions. It seems reasonable then to choose p0 on the order of 0.01 or 0.001. Loss probabilities in this range can easily be dealt with through either retransmission, as in TCP, or through simple forward error correction codes such as [18]. In this work, we set p0 in E-TCP to 0.01.

Setting b: The form of i(W) comes from Vinnicombe’s work [24]. It shows that, in the limiting regime where queuing delays and queue emptying times are small relative to propagation delays, a sufficient stability condition is to set b larger than the buffer size B. Note that this holds in our setting, where we are concerned with large capacities and constant size buffers. Since we consider a buffer size B = 20, b is set to 25 in our work.

Therefore the E-TCP congestion control algorithm adjusts the congestion window W such that with every successful acknowledgement

W ← W + 1/25,

and with every packet loss event

W ← W − W/(25 × (2 + 0.01W)).

The algorithm converges to an equilibrium with steady-state congestion window W∗ = 2/(p − 0.01).
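The per-acknowledgement and per-loss adjustments above translate directly into code. The following is a minimal sketch with the parameter choices b = 25 and p0 = 0.01; the class and method names are ours, not taken from any E-TCP implementation.

```python
# Minimal sketch of the E-TCP congestion window update (b = 25, p0 = 0.01).
# Names and structure are illustrative, not from an actual implementation.

class ETCPWindow:
    def __init__(self, b: float = 25.0, p0: float = 0.01, w_init: float = 2.0):
        self.b = b        # increment parameter, chosen larger than the buffer size B
        self.p0 = p0      # target lower bound on the bottleneck loss probability
        self.W = w_init   # congestion window (packets)

    def on_ack(self):
        # generalized additive increase: i(W) = 1/b per acknowledged packet
        self.W += 1.0 / self.b

    def on_loss(self):
        # generalized decrease: d(W) = W / (b * (2 + p0 * W)) per loss indication
        self.W -= self.W / (self.b * (2.0 + self.p0 * self.W))
        self.W = max(self.W, 1.0)   # keep at least one packet in flight

# At equilibrium i(W*)/d(W*) = p, i.e. W* = 2 / (p - p0); for example a loss
# rate p = 0.0102 corresponds to W* = 2 / 0.0002 = 10000 packets.
```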
In summary, Figure 1 illustrates our key idea in designing the E-TCP congestion controller. The figure graphs both TCP and E-TCP’s link utilization as throughput increases. It assumes a packet size of 1000 bytes, a round trip time of 100ms, and B = 20. Observe that TCP’s link utilization quickly drops under 50% as its throughput increases, whereas, as the throughput of E-TCP increases, the link utilization converges to 88%.

B. Congestion signaling

Let N denote the number of packet losses incurred by E-TCP during a round trip time. Its expectation satisfies E[N] ≥ p0 W∗. In particular, E[N] → ∞ as W∗ → ∞. This presents a challenge in congestion-signaling and loss recovery if reliable data delivery is required. First, whereas current TCP reacts only to one loss event per round trip time, E-TCP needs to react to every loss event in order to reach the expected stable state. Second, whereas current TCP can only recover a bounded number of losses per round trip time, any reliability mechanism coupled to E-TCP must handle an unbounded number of losses per round trip time. We address the congestion-signaling issue in this subsection and the reliability issue in the next section.

The selective acknowledgement technique is a natural choice to handle multiple packet losses in a round trip time. An E-TCP sender maintains two numbers:
• congestion control sequence number (cc seq), the unique sequence number of the packet currently being sent, which increases by one as each packet is sent; and
• congestion control acknowledged number (cc ack), which records the highest sequence number that the sender has responded to.
When the receiver receives a data packet, it generates an acknowledgment that contains the following two fields:
• highest sequence number (h seq), the highest sequence number the receiver has ever received; and
• bitmap, a 32 bit bitmap corresponding to sequence numbers from h seq−1 to h seq−32. If a sequence number has been received, the corresponding bit is set to 1; otherwise it is set to 0.
The 32 bit bitmap provides redundancy in case acknowledgements are lost on the reverse path. Assuming that both forward and reverse paths have independent Bernoulli packet loss processes with mean rates less than 0.05, and that no reordering occurs, the probability that a received sequence number is never acknowledged at the sender is reduced to less than 10^−33 by the 32 bit bitmap.

cc ack ensures that the sender responds to each sequence number once and only once. If the sender receives an acknowledgement with h seq smaller than cc ack, the acknowledgement is discarded. Otherwise, the sender responds to all sequence numbers between cc ack and h seq. Acknowledged packets increase the congestion window and packets that are reported as lost decrease the congestion window. In order to handle packet reordering in the network, the sender does not respond to a missing sequence number s until it receives an h seq larger than s + 2. Note that the sequence numbers here identify the packets only and are independent of the data in the packets.
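The sender-side response to an acknowledgement can be summarized by the sketch below, which follows the h seq / bitmap / cc ack description above; the bit layout, function structure and reordering constant are our own illustrative choices, not a specification.

```python
# Minimal sketch of E-TCP sender-side acknowledgement processing, following the
# h_seq / 32-bit bitmap / cc_ack description in the text.  Field names mirror
# the paper's terminology; the surrounding structure is an assumption of ours.

REORDER_SLACK = 2   # a missing sequence number s is judged lost only once an
                    # h_seq larger than s + 2 has been seen

def process_ack(cc_ack: int, h_seq: int, bitmap: int, window):
    """Respond once to every congestion-control sequence number in (cc_ack, h_seq].

    Assumed encoding: bitmap bit k (k = 0..31) is 1 if sequence number
    h_seq - 1 - k was received.  `window` exposes on_ack()/on_loss(), e.g.
    the ETCPWindow sketch above.  Returns the updated cc_ack.
    """
    if h_seq < cc_ack:
        return cc_ack                       # stale acknowledgement: discard

    for s in range(cc_ack + 1, h_seq + 1):
        if s == h_seq:
            window.on_ack()                 # h_seq itself was received
        elif (bitmap >> (h_seq - 1 - s)) & 1:
            window.on_ack()                 # reported as received in the bitmap
        elif h_seq > s + REORDER_SLACK:
            window.on_loss()                # far enough behind h_seq: treat as lost
        else:
            return s - 1                    # too recent to judge; revisit later
    return h_seq
```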
Any reliable transmission mechanism that builds upon E-TCP will have its own data labeling and loss recovery mechanism, and will be decoupled from the congestion control mechanism.

C. Packet pacing

E-TCP is a rate-based congestion control protocol. The time interval between two consecutively sent packets is governed by a rate-control timer. The interval of the rate-control timer is set according to an exponential distribution with mean τr/W, where τr is the estimated round trip time and W is the congestion window. This is motivated by fairness considerations. We observe that if the timer interval is set directly to τr/W, different senders may experience different loss rates, which leads to unfairness. Exponential pacing avoids this problem by adding randomness to the packet arrival process at the bottleneck. It does not affect the throughput of the congestion controller, as we will demonstrate in our experimental results.
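A minimal sketch of the exponential pacing rule follows, assuming the sender simply redraws its rate-control timer after each transmission; the function name and the example values are ours.

```python
# Sketch of E-TCP's exponential packet pacing: inter-send gaps are drawn from
# an exponential distribution with mean tau_r / W.  Illustrative only.

import random

def next_send_gap(tau_r: float, W: float) -> float:
    """Return the delay before the next packet is sent (seconds)."""
    return random.expovariate(W / tau_r)   # exponential with mean tau_r / W

# Example: with tau_r = 100 ms and W = 12020 packets, the mean gap is about
# 8.3 microseconds, i.e. a mean sending rate of W / tau_r packets per second.
gaps = [next_send_gap(0.1, 12020.0) for _ in range(100000)]
print(sum(gaps) / len(gaps))   # close to 0.1 / 12020 ≈ 8.3e-6 s
```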
IV. RELIABILITY

We present a reliable data delivery mechanism that can be coupled with E-TCP. The purpose of building a reliable data delivery mechanism in this work is threefold. First, we show that with a simple packet retransmission technique, lost packets in an E-TCP connection can be recovered effectively. Second, we demonstrate that the reliable data delivery mechanism is completely decoupled from the congestion control behavior. Last, coupled with E-TCP, it provides a fair basis for comparison with other existing congestion control protocols that provide reliability.

In an E-TCP connection, the expected number of packet losses during a round trip time satisfies E[N] ≥ p0 W∗, where W∗ is the steady state congestion window. As bandwidth increases, W∗ → ∞, and E[N] → ∞ as well. This represents a scenario where the reliability mechanism employed by TCP cannot perform well. TCP retransmits packets on receiving three duplicate acknowledgements or partial acknowledgements. As a consequence, it retransmits only one packet per round trip time. Even if selective acknowledgement (SACK) [19] is used, TCP still faces the fact that lost retransmitted packets have to wait for timeouts in order to be retransmitted again. This would be problematic in a reliable E-TCP connection since, as E[N] goes to infinity, the number of lost retransmitted packets per round trip time also goes to infinity.

Our reliable transmission mechanism employs an extension of the selective acknowledgement technique. The receiver maintains a receiving buffer where data that arrive out of order are stored. On the arrival of a packet, the receiver generates an acknowledgement and returns it to the sender. In addition to the information required by the congestion controller, the acknowledgement includes the following three extra fields:
• Cumulative acknowledgement (CACK), the highest data sequence number that the receiver has received in order;
• Triggering sequence number (TSN), the data sequence number of the packet that triggered the acknowledgement; and
• Left edge of TSN’s block (LE), the smallest sequence number such that the sequence numbers from LE to TSN have all been received.

The sender maintains a retransmission queue that contains unacknowledged data. The retransmission queue is divided into segments according to the number of times the data have been retransmitted. On the arrival of an acknowledgement, the sender first updates the retransmission queue with the information carried in the packet. Then it infers packet losses according to the positions of CACK, TSN and LE in the retransmission queue. As the retransmission queue is divided into segments that retain the data retransmission information, lost retransmitted packets can also be inferred. The data inferred as lost are then retransmitted, and the retransmission queue segments are updated accordingly. A retransmission timer is also maintained for the smallest sequence number in the retransmission queue. Whenever the timer goes off, the data with the smallest sequence number are retransmitted and the timer is reset to the estimated round trip time.

In the reliable data delivery mechanism, the set of sequence numbers labeling the data is independent of the congestion control sequence numbers that label the packets. In addition, the sending times of all packets, including retransmitted packets, are governed by the congestion control mechanism. As a consequence, the reliability mechanism operates completely independently of the congestion control algorithm and does not change the sending rate of the protocol.
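As an illustration only, the sketch below codes one plausible reading of the sender-side bookkeeping described above. The exact loss-inference rule is not spelled out in the text, so the rule used here (unacknowledged data older than LE are declared lost) is an assumption, and all names are ours.

```python
# Illustrative sketch of the sender-side retransmission bookkeeping.  The loss
# inference coded here is an assumed concrete reading of "infers packet losses
# according to the positions of CACK, TSN and LE", not the paper's exact rule.

class RetransmissionQueue:
    def __init__(self):
        # data sequence number -> number of times this data has been (re)sent;
        # grouping by this count plays the role of the paper's queue "segments"
        self.unacked = {}

    def on_send(self, seq: int):
        self.unacked[seq] = self.unacked.get(seq, 0) + 1

    def on_ack(self, cack: int, tsn: int, le: int):
        """Update the queue from one acknowledgement; return data to retransmit."""
        # everything at or below CACK, and the contiguous block [LE, TSN], arrived
        for seq in list(self.unacked):
            if seq <= cack or le <= seq <= tsn:
                del self.unacked[seq]
        # assumed inference rule: unacknowledged data older than LE are lost; the
        # caller retransmits them (paced by the congestion controller) and records
        # each retransmission via on_send()
        return sorted(seq for seq in self.unacked if seq < le)
```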
Fig. 2. The dumbbell topology.

V. EXPERIMENTAL RESULTS

We performed numerous experiments to study the behavior and performance of E-TCP. Our experimental results emphasize the following three conclusions:
• E-TCP maintains high utilization of the link bandwidth and outperforms other protocols in high delay bandwidth product networks. In all our experiments, E-TCP maintains almost constant high link utilization independent of the increasing bottleneck bandwidth.
• E-TCP is stable. E-TCP reaches the predefined equilibrium states and exhibits stable behavior in all the network settings used in our experiments.
• E-TCP preserves fairness among multiple connections. Multiple E-TCP connections sharing a bottleneck converge to a fair state. The convergence rate depends on the network bandwidth and the round trip delay.
We obtained experimental results that study E-TCP’s performance under various network topologies and parameter settings. In this section, we present only a subset of the results due to space limitations. More results can be found in our technical report [9].

A. Experimental setup

The experiments are performed using the packet level network simulator ns-2 extended with the E-TCP module. Please refer to our technical report [9] for implementation details of E-TCP in ns-2.
Most of the experiments use the dumbbell topology shown in Figure 2. This topology has a single bottleneck. The bottleneck buffer is fixed at 20 packets and drop-tail queue management is applied. Each source or sink connects to the bottleneck through a unique edge link. The edge links are assigned a bandwidth of 100Gbps, a propagation delay of 5ms and a buffer size of 1000 packets. Unless specified otherwise, the reader should assume this default setting. Our experiments cover bottleneck capacities ranging from 100Kbps to 5Gbps and round trip propagation delays ranging from 50ms to 200ms. All E-TCP connections provide reliable data delivery.

B. Single connection single bottleneck

This set of experiments simulates cases where a single E-TCP connection goes through the bottleneck in a dumbbell topology. We vary the bottleneck bandwidth from 100Kbps to 5Gbps and the round trip propagation delay from 50ms to 200ms. Each experiment simulates a duration of 300 seconds.

System dynamics: As a case study, we first look at the system dynamics when the bottleneck bandwidth is 1Gbps and the round trip propagation delay is 100ms. The packet size is set to 1040 bytes, yielding a delay bandwidth product of 12020 packets. We expect the congestion window to stabilize around 12020 and the bottleneck loss rate to stabilize at p = p0 + 2/W∗ ≈ 0.0102. Figure 3 graphs the sample path of the congestion window and the bottleneck loss rate averaged over 10ms intervals. In Figure 3 (b), we also plot the cumulative packet loss rate, which is the ratio of the total number of lost packets to the total number of incoming packets. The figure is exactly what we expected: the congestion window of E-TCP stabilizes at an equilibrium close to 12020 and the bottleneck packet loss rate a little above the target loss rate p0 = 0.01.

Link utilization: In all experimental settings, E-TCP maintains high link utilization. We also studied the performance of seven other TCP variants in the same settings: TCP NewReno, HSTCP, STCP, FAST, BIC, CUBIC, and TCP NewReno with packet pacing. Please refer to our technical report for the detailed settings of these protocols. Figure 4 shows the link utilization of the different protocols with increasing bottleneck capacities when the round trip propagation delay is set to 50ms and 200ms respectively.¹ In the figure, the link utilization is calculated as the ratio of the goodput over the bottleneck bandwidth. While E-TCP maintains high link utilization, the link utilizations of the other protocols decrease due to packet losses caused by small buffers. We also perform simulations of E-TCP without exponential pacing and observe that its removal does not have any obvious effect on the throughput.

¹ Generally CUBIC shows a performance improvement over BIC, and we only show the curves of CUBIC for the sake of clarity.
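The expected operating point quoted above can be reproduced with a few lines of arithmetic: W∗ is the delay-bandwidth product and p = p0 + 2/W∗ follows from (6). The helper below simply evaluates these two expressions; its name and structure are ours.

```python
# Worked check of the expected E-TCP equilibrium for the 1 Gbps / 100 ms case
# discussed above, using W* = 2/(p - p0), i.e. p = p0 + 2/W*.

def etcp_equilibrium(link_bps: float, rtt_s: float, pkt_bytes: int, p0: float = 0.01):
    w_star = link_bps * rtt_s / (pkt_bytes * 8)   # delay-bandwidth product in packets
    p = p0 + 2.0 / w_star                         # steady-state bottleneck loss rate
    return w_star, p

w_star, p = etcp_equilibrium(1e9, 0.1, 1040)
print(round(w_star), round(p, 4))   # about 12019 packets and a loss rate of 0.0102
```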
Fig. 3. The congestion window of E-TCP stabilizes at an equilibrium close to 12020 and the bottleneck packet loss rate a little above the target loss rate p0 = 0.01. (Panels: (a) congestion window, (b) packet loss rate.)

Fig. 4. E-TCP maintains high link utilization as the bottleneck bandwidth increases while the link utilizations of other protocols decrease. (Panels: (a) RTT=50ms, (b) RTT=200ms; x-axis: bottleneck capacity, y-axis: link utilization.)

Fig. 5. E-TCP remains stable in the case of 50 connections sharing a single bottleneck. (Panels: (a) congestion window, (b) packet loss rate.)

Fig. 6. E-TCP obtains highest link utilization when multiple connections share a bottleneck. Here, the two curves of NewReno and HSTCP overlap. (x-axis: bottleneck capacity/round trip propagation delay, y-axis: link utilization.)
C. Multiple connections single bottleneck

We now look at the stability, link utilization, and fairness of E-TCP when multiple E-TCP connections share a single bottleneck. The bottleneck bandwidth is varied from 1Gbps to 5Gbps and the round trip propagation delay from 50ms to 200ms. We show experimental results for the case of 50 connections. The start-up times of the flows are distributed uniformly in the first 100ms.

System dynamics: Again, we first study the system dynamics when the bottleneck link bandwidth is 1Gbps and the round trip propagation delay is 100ms. The average delay bandwidth product for each connection is approximately 240 packets and the loss rate at the bottleneck is expected to be 0.018. Figure 5 graphs the sample paths of the congestion windows of three randomly selected E-TCP connections and the bottleneck loss rate.
The figure confirms our expectation: the system becomes stable at the expected states.

Link utilization: Figure 6 presents the link utilization of the protocols in some of the experiments. The x-axis indicates the experimental settings and the y-axis shows the link utilization. E-TCP obtains a higher link utilization than the other protocols in these high bandwidth settings. In other, smaller bandwidth settings, E-TCP shows link utilization comparable to the other protocols.

Fairness: E-TCP presents good fairness properties in these experiments. We take the goodput of each connection and study the fairness among the connections by calculating Jain’s fairness index [10], which is defined as

J(x) = (Σ_{i=1}^{n} x_i)² / (n Σ_{i=1}^{n} x_i²),

where x = (x1, . . . , xn) is the vector of the connections’ throughputs and n is the number of connections. An index value of 1 implies perfect fairness. Figure 7 demonstrates one of the experimental results when the bottleneck is 1Gbps and the round trip propagation delay 50ms. The fairness index is calculated using the average throughput over one second. We observe that E-TCP preserves good fairness among all connections. In the experiments, we also observe that, as the delay bandwidth product increases, the convergence time to the fair state becomes larger. This is determined in part by the per-packet response nature of end-to-end congestion control protocols.
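For reference, Jain's index as defined above is a one-line computation; the helper name and the example throughput vectors are ours.

```python
# Jain's fairness index as defined above; returns 1.0 for perfectly equal shares.
def jain_index(throughputs):
    n = len(throughputs)
    total = sum(throughputs)
    return total * total / (n * sum(x * x for x in throughputs))

print(jain_index([100.0, 100.0, 100.0]))           # 1.0: perfect fairness
print(round(jain_index([150.0, 100.0, 50.0]), 3))  # about 0.857
```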
Fig. 7. E-TCP converges to a fair state among all the connections. ((a) 1Gbps/50ms; y-axis: Jain’s fairness index.)

Fig. 9. E-TCP responds to newly created connections and converges to an equilibrium where fairness is preserved. ((a) 100Mbps; y-axis: throughput (pkts/s).)
D. Heterogeneous RTTs

In this set of experiments, each connection is assigned a different round trip propagation delay. The bottleneck bandwidth is set to 500Mbps. 50 E-TCP connections share the bottleneck with round trip propagation delays ranging from 50ms to 295ms (RTT_{i+1} = RTT_i + 5ms). Figure 8 graphs the sample paths of the congestion windows of three selected connections and the bottleneck loss rate. The round trip propagation delays of the three selected connections are 50ms, 100ms and 200ms respectively. In this case: 1) the system remains in a stable state; and 2) the congestion windows of all connections converge to the same size, determined by the bottleneck packet loss rate.

Fig. 8. 50 E-TCP connections with different round trip times maintain the system in a stable state and the congestion windows of all connections converge to the same size. (Panels: (a) congestion window, (b) packet loss rate.)

E. Response to new connections

We study the responsiveness and fairness of E-TCP in a dynamic environment where new E-TCP connections start up and compete with existing E-TCP connections. The bottleneck bandwidth varies from 10Mbps to 1Gbps and the round trip propagation delay from 50ms to 100ms. In each experiment, three E-TCP connections are set to share the bottleneck. The first connection starts at time 0s and stops at time 300s, the second starts at 60s and stops at 240s, and the third starts at 120s and stops at 180s. Figure 9 shows one of the experimental results when the round trip propagation delay is 50ms. It illustrates that when new E-TCP connections start up, existing E-TCP connections respond by decreasing their throughputs, and all E-TCP connections converge to an equilibrium where fairness is preserved.

F. Multiple bottlenecks

We study the performance of E-TCP in a multiple bottleneck environment using the setting described in Figure 10. There are 5 bottlenecks and 8 E-TCP connections with different paths. Three of the connections are labeled. Each of the bottlenecks has a bandwidth of 600Mbps and a propagation delay of 20ms. Figure 11 illustrates the evolution of the congestion window for E-TCP connections 1 and 2 and the packet loss rate of the second bottleneck from the left. These figures illustrate that when E-TCP connections go through multiple bottlenecks, they still converge to a stable state.

Fig. 10. The multi-branch topology.

Fig. 11. E-TCP is stable when connections go through multiple bottlenecks. (Panels: (a) congestion window, (b) packet loss rate.)

G. Two way traffic

We also consider connections in both directions in the dumbbell topology. In each experiment, 25 connections send data from the nodes on the right to the nodes on the left and another M connections send data from the left to the right. M is set to 1, 25, and 50. We keep the round trip propagation time at 50ms and vary the bottleneck bandwidth from 50Mbps to 1Gbps. We then study the goodput of the latter M connections. For comparison, we perform the same experiments on TCP NewReno with packet pacing. Figure 12 demonstrates the link utilization under different bottleneck capacities. In this two way traffic setting, the throughputs of all protocols are lower than they were in the one way traffic situation. However, the impact on E-TCP is the smallest. When M = 1, the single E-TCP connection maintains 80% link utilization. When M = 25 or M = 50, the throughput of the forward E-TCP connections increases to more than 90% of the bottleneck bandwidth.

Fig. 12. The throughput of E-TCP in a two way traffic setting. (x-axis: bottleneck capacity, y-axis: link utilization.)

H. E-TCP for short-lived flows

We use an empirical web traffic model introduced in [23] to evaluate the performance of E-TCP when transferring short-lived flows. Specifically, we simulate an inter-connection peering link between two networks. As in [15], we use web-like traffic going through this link from sources on one side to destinations on the other side. We run simulations for 500, 1000, and 1500 users and for 155Mbps and 622Mbps links respectively. The sampled flow sizes are scaled by a factor η to achieve reasonable link utilizations. For the 155Mbps link, η is set to 15, and for the 622Mbps link, η is set to 60 and 120. Note that we do not model future HTTP workloads by this multiplication, but instead expect it to approximately model future, growing TCP workloads. A complete description of the experiments can be found in our technical report [9]. Some of the simulation results are presented in Figure 13.

Fig. 13. Comparison of throughputs of E-TCP, TCP Newreno, and TCP Newreno with pacing. Experiment settings are represented as “link bandwidth in Mbps/number of users/η”.

For finite size flows, E-TCP exhibits superior performance to TCP Newreno and TCP Newreno with pacing. When there are 1000 users, the link is 622Mbps, and η is 60, the throughput of E-TCP is 18% and 9% higher than the throughputs of TCP Newreno and TCP Newreno with pacing respectively. If we further increase η to 120, the throughput of E-TCP is 41% and 37% higher. When there are 1500 users, the link is 622Mbps, and η is 60, the throughput improvements of E-TCP exceed 20%. We also note that when the link is 155Mbps, there are 1500 users, and η is 15, E-TCP improves throughput by more than 10%. In all other experiments, E-TCP also shows significant improvement over the other variants of TCP. Since E-TCP exhibits much higher throughput, the expected finish time for finite-sized flows using E-TCP is shorter than those of the other variants of TCP. This implies that E-TCP also improves user-perceived performance by decreasing the response time of HTTP requests.

VI. RELATED WORK
This paper has focused on the evolution of networks to high-bandwidth technologies. For completeness’ sake, we now discuss some of the recent research on designing protocols for such high-speed networks; e.g., see [7], [14], [11], [12], [25], [16] and [2]. The interested reader can also refer to [17] and [5] for more information on these protocols.

Floyd’s HighSpeed TCP (HSTCP) [7], Kelly’s Scalable TCP (STCP) [14], Xu et al.’s Binary Increase Congestion Control (BIC) [25], and Bhandarkar et al.’s LTCP [2] are variants that strive to improve on TCP’s throughput-loss curve (2) in high-bandwidth networks. In HSTCP, the congestion window algorithm achieves the response function W∗ = P^S (1/p^S) L, where S, P and L are predefined parameters. For one parameter set shown in [7], the steady state sending rate is T = W∗/τr = 0.12/(p^0.835 τr). In STCP,

T = W∗/τr = 0.08/(τr p).

In BIC, the sending rate T is proportional to 1/p^r, with 1/2 < r < 1, and in LTCP, the sending rate T is proportional to 1/(τr p^{3/5}). Following the derivation in Section II, we then obtain, for HSTCP:

P_B(ρ) ρ^{1/0.835} = (0.12 g/(τr c))^{1/0.835};

for STCP:

P_B(ρ) ρ = 0.08 g/(τr c);

for BIC:

P_B(ρ) ρ^{1/r} ∝ (g/c)^{1/r};

and for LTCP:

P_B(ρ) ρ^{5/3} ∝ (g/(τr c))^{5/3}.

In all cases the link utilization ρ decreases to 0 as c/g increases to ∞. This problem persists as long as the throughput T is proportional to p^−α for some α > 0, and this happens for
HSTCP when α = 0.835, for STCP when α = 1, for BIC when α = r, and for LTCP when α = 3/5.

Jin et al.’s FAST TCP [11] uses queuing delay as a measure of network congestion and achieves high utilization for high bandwidth, long latency networks by adjusting the congestion window based on the estimated round trip delay. This scheme would be challenged in networks with small buffers, since congestion would be hard to detect from round trip time variations. Consequently, in high-speed small-buffer networks, FAST TCP will not achieve high utilization. Katabi et al.’s eXplicit Control Protocol (XCP) [12] achieves efficient link utilization by using feedback signals generated by routers along the path to modify sending rates. In contrast, our E-TCP protocol is an end-to-end approach requiring no router assistance. Kunniyur modified the original TCP dynamics in a fashion similar to our proposal to tolerate fiber errors in high bandwidth optical connections [13]. In our work, lower bounding the packet loss rate is demonstrated, through our evolution model, to be an effective approach for maintaining high link utilization in networks with small constant-size buffers. Our work also suggests that a rate-based variant of STCP is preferable, and we consider related issues such as congestion signaling and reliable data delivery mechanisms, which are not studied in [13].

VII. SUMMARY

We focused on future networks characterized by high-speed links and routers with small buffers. Our evolutionary network model indicates that TCP will experience serious performance problems as per connection throughput increases. This analysis leads us to a new congestion controller, E-TCP, which maintains high link utilizations in these environments. E-TCP features a novel target equilibrium that forces the bottleneck to a stable state of moderate congestion. In addition, the design of E-TCP completely decouples congestion control and reliable data delivery, which allows for a straightforward design and implementation of both the congestion control and the reliability mechanism.

ACKNOWLEDGMENT

This work was supported in part by the National Science Foundation under award numbers EIA-0080119 and CNS-0519922. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect those of the National Science Foundation.

REFERENCES

[1] G. Appenzeller, I. Keslassy, and N. McKeown. Sizing router buffers. In ACM SIGCOMM 2004, Portland, OR, Aug. 2004.
[2] S. Bhandarkar, S. Jain, and A. L. N. Reddy. LTCP: improving the performance of TCP in highspeed networks. SIGCOMM Comput. Commun. Rev., 36(1):41–50, 2006.
[3] J. Cao, W. S. Cleveland, D. Lin, and D. X. Sun. Internet traffic tends toward Poisson and independent as the load increases. In Nonlinear Estimation and Classification, eds. C. Holmes, D. Denison, M. Hansen, B. Yu, and B. Mallick, Dec. 2002.
[4] J. Cao and K. Ramanan. A Poisson limit for buffer overflow probabilities. In IEEE Infocom 2002, New York, NY, June 2002.
[5] L. Cottrell. TCP stack measurements on lightly loaded testbeds. http://www-iepm.slac.stanford.edu/monitoring/bulk/fast/.
[6] M. Enachescu, Y. Ganjali, A. Goel, N. McKeown, and T. Roughgarden. Part III: routers with very small buffers. SIGCOMM Comput. Commun. Rev., 35(3):83–90, 2005.
[7] S. Floyd. HighSpeed TCP for large congestion windows. RFC 3649, Dec. 2003.
[8] S. Gorinsky, A. Kantawala, and J. S. Turner. Link buffer sizing: A new look at the old problem. In IEEE Symposium on Computers and Communications (ISCC 2005), Murcia, Cartagena, Spain, June 2005.
[9] Y. Gu, D. Towsley, C. V. Hollot, and H. Zhang. Congestion control for small buffer high speed networks. Technical report.
[10] R. Jain, D. Chiu, and W. Hawe. A quantitative measure of fairness and discrimination for resource allocation in shared computer systems. Technical report, Sept. 1984.
[11] C. Jin, D. X. Wei, and S. H. Low. FAST TCP: Motivation, architecture, algorithms, performance. In IEEE Infocom 2004, Hong Kong, Mar. 2004.
[12] D. Katabi, M. Handley, and C. Rohrs. Congestion control for high bandwidth-delay product networks. In ACM SIGCOMM 2002, Pittsburgh, PA, Aug. 2002.
[13] E. Kavak and S. Kunniyur. Congestion controllers for high bandwidth connections with fiber error rates. In 2004 IEEE International Conference on Communications, volume 2, 20–24 June 2004.
[14] T. Kelly. Scalable TCP: improving performance in highspeed wide area networks. ACM SIGCOMM Computer Communication Review, 33(2):83–91, 2003.
[15] L. Le, J. Aikat, K. Jeffay, and F. D. Smith. The effects of active queue management on web performance. In ACM SIGCOMM/Performance, 2003.
[16] D. J. Leith and R. N. Shorten. H-TCP protocol for high-speed long-distance networks. In 2nd Workshop on Protocols for Fast Long Distance Networks, Argonne, Illinois, USA, Feb. 2004.
[17] Y.-T. Li, D. Leith, and R. N. Shorten. Experimental evaluation of TCP protocols for high-speed networks. Technical report, Hamilton Institute, 2005.
[18] M. Luby. LT codes. In The 43rd Annual IEEE Symposium on Foundations of Computer Science, pages 271–282, Nov. 2002.
[19] M. Mathis, J. Mahdavi, S. Floyd, and A. Romanow. TCP selective acknowledgement options. RFC 2018, Oct. 1996.
[20] T. J. Ott, J. Kemperman, and M. Mathis. The stationary distribution of ideal TCP congestion avoidance. In DIMACS Workshop on Performance of Realtime Applications on the Internet, Nov. 1996.
[21] G. Raina, D. Towsley, and D. Wischik. Part II: control theory for buffer sizing. SIGCOMM Comput. Commun. Rev., 35(3):79–82, 2005.
[22] G. Raina and D. Wischik. Buffer sizes for large multiplexers: TCP queueing theory and instability analysis. In EuroNGI 2005. Extended version to appear in Queueing Systems, 2005.
[23] F. D. Smith, F. Hernandez-Campos, K. Jeffay, and D. Ott. What TCP/IP protocol headers can tell us about the web. In SIGMETRICS/Performance, pages 245–256, 2001.
[24] G. Vinnicombe. On the stability of networks operating TCP-like congestion control. In Proc. of the 15th IFAC World Congress on Automatic Control, Barcelona, Spain, July 2002.
[25] L. Xu, K. Harfoush, and I. Rhee. Binary increase congestion control (BIC) for fast long-distance networks. In IEEE Infocom 2004, Hong Kong, Mar. 2004.