Stability Criteria of RED with TCP Traffic
Technical Report

Thomas Ziegler †‡, Serge Fdida †, Christof Brandauer ‡
{Thomas.Ziegler, Christof.Brandauer}@salzburgresearch.at, [email protected]

† Université Pierre et Marie Curie, Laboratoire Paris 6, Rue Capitaine Scott 8, 75015 Paris, France
‡ Salzburg Research; University of Applied Sciences and Technologies, Jakob Haringerstr. 5, 5020 Salzburg, Austria*

Document Status: Draft
Date: May 2000
Abstract

This report investigates the dynamics of RED queue management in the presence of FTP and Web-like TCP Reno flows, for scenarios with homogeneous and heterogeneous round-trip times. We systematically derive quantitative guidelines and steady-state models for setting the RED parameters as a function of the scenario parameters (bottleneck bandwidth, round-trip time, number of TCP flows) in order to avoid severe oscillation of the queue size. We show that properly configuring RED is feasible for one-way bulk-data TCP traffic when the scenario parameters are known, although the buffer requirements are significant. Additionally, we investigate in detail why RED with two-way bulk-data TCP traffic necessarily oscillates, raising concerns about the deployment of RED in the TCP-dominated Internet.

* This work is partially sponsored by the EU IST project Aquila.
1 Introduction
The RED (Random Early Detection) queue management algorithm [1][2] for congestion avoidance in cooperation with end-to-end transport protocols has been widely discussed in the literature and is implemented in commercially available routers. Recent simulations [3][4][5] have shown that achieving RED's control goal - convergence of the queue size between minth and maxth - is sensitive to parameter setting. However, quantitative analysis giving guidelines on how to set the RED parameters depending on the network scenario is still in a very premature state, which motivates this work.

Our observation in [6] is that setting the RED parameters properly in order to avoid oscillation of the average queue size is far from trivial. Oscillations are harmful as they cause periods of link underutilization, when the instantaneous queue size equals zero, followed by periods of frequent "forced packet drops" [7], when the average queue size exceeds maxth or the instantaneous queue approaches the total buffer size. Forced packet drops contradict the goal of early congestion detection and decrease the performance of ECN [8], which attempts to avoid packet loss and to provide increased throughput for low-demand flows by decoupling congestion notification from packet dropping. In case of WRED [9] or RIO [10], oscillations may cause poor discrimination between in-profile and out-of-profile packets: when the average queue size decreases below the maximum queue-size threshold for out-of-profile packets, out-packets may enter the queue; subsequently, the average queue size increases again and in-packets may be dropped with high probability.

However, the primary intention of our analysis is not to investigate RED's performance degradation due to queue-size oscillations, but to find the necessary and sufficient criteria to avoid these oscillations by setting the RED parameters properly. We focus on modelling RED with bulk-data TCP Reno flows in steady state, i.e. we consider the convergence of the average queue after a phase of starting TCP flows performing their initial slow start. During our time period of interest the number of TCP flows stays constant. Our arguments for investigating RED with TCP in steady state are, first, the need to determine the operational bounds of any interacting set of control mechanisms under the "simpler" steady-state assumption before investigating its behavior in transient state and, second, the dominance of TCP traffic in the Internet. In section 8.2 the models for RED parameter setting are verified against realistic, Web-like traffic patterns. Contrary to other papers modelling TCP with RED ([11][12]), our analysis is not restricted to the assumption of homogeneous round-trip times and gives quantitative, not just qualitative, guidelines on how to set the RED parameters. The problem of parametrizing RED for steady-state TCP flows is definitely challenging, as four inter-dependent RED parameters (minth, maxth, maxp and wq) have to be set properly as a function of three scenario parameters (bottleneck bandwidth, number of TCP flows,
round-trip time). Regarding the kind of convergence we may hope to achieve with RED and TCP, it is important to mention that we do not mean convergence to a certain queue-size value in the mathematical sense. On the contrary, we define convergence very loosely as "achieving a state of bounded oscillation of the queue size around a value between minth and maxth, so that the amplitude of the oscillation of the average queue size is significantly smaller than maxth - minth, and the instantaneous queue size remains greater than zero and smaller than the total buffer size". This state should be achieved within a few round-trip times (see the sketch after the abbreviation list).

Abbreviations used throughout this paper:
• B: buffer size at the bottleneck in packets
• C: capacity of the bottleneck link in Mbps
• L: bottleneck capacity in packets (of mean size) per second
• D: delay of the bottleneck link in ms
• d: total propagation delay in ms
• di: delay of access link i
• N: number of flows
• ni: number of flows at access link i
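To make the loose convergence criterion above operational, the following is a minimal sketch (not part of the original report) of how a measured queue trace could be tested against it; the 0.5 amplitude tolerance and the use of plain sample lists are illustrative assumptions.

    # Hedged sketch: test the "bounded oscillation" convergence criterion
    # defined above against traces of average and instantaneous queue size.
    def has_converged(avg_queue, inst_queue, minth, maxth, buffer_size,
                      amplitude_tolerance=0.5):
        """avg_queue, inst_queue: lists of queue-size samples taken after the
        start-up phase of the TCP flows (initial slow start excluded)."""
        amplitude = max(avg_queue) - min(avg_queue)    # oscillation of the average queue
        mean_avg = sum(avg_queue) / len(avg_queue)
        centered = minth < mean_avg < maxth            # queue settles between the thresholds
        bounded = amplitude < amplitude_tolerance * (maxth - minth)
        no_underflow = min(inst_queue) > 0             # link never idles
        no_overflow = max(inst_queue) < buffer_size    # no tail drops at the buffer limit
        return centered and bounded and no_underflow and no_overflow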
2 Queue Management Overview
Traditional Internet routers employ FIFO packet scheduling at output ports, aggregating traffic in a single queue. In case of congestion, "drop-tail" packet dropping is performed. The principle of drop-tail is to discard all arriving packets once the queue size has reached the total buffer size. Although simple, drop-tail has several drawbacks:
• Drop-tail packet dropping exhibits a bias against bursty traffic. In case of congestion in the presence of TCP flows, the queue size converges to the buffer size as TCP increases the congestion window until drops happen. Once the queue size is at the buffer size there is no buffer space left to absorb bursts. Additionally, drop-tail is incapable of detecting the early stages of congestion, as it starts dropping packets only when the buffer is already completely filled.
• Drop-tail packet dropping in combination with a single FIFO queue at the output port shared by all flows causes long delays for delay-sensitive interactive traffic (e.g. telnet) and real-time traffic due to the large steady-state queue size.
• Due to traffic phase effects, drop-tail packet-drop algorithms sometimes allow a few connections to monopolize buffer space while locking out other connections, causing considerable unfairness. Another undesirable phase effect of drop-tail packet dropping is its tendency to synchronize multiple TCP conversations to be in phase, causing periods of heavy load followed by periods of link underutilization. This effect is called "global synchronization" [1].
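For reference, the drop-tail behavior described above can be summarized in a few lines of code. This is a hedged sketch for illustration only; the queue representation and method names are assumptions, not part of the original report.

    # Hedged sketch of drop-tail queue management: a packet is discarded
    # if and only if the instantaneous queue has reached the buffer size,
    # so congestion is detected only when the buffer is already full.
    from collections import deque

    class DropTailQueue:
        def __init__(self, buffer_size_pkts):
            self.buffer_size = buffer_size_pkts
            self.queue = deque()

        def enqueue(self, packet):
            if len(self.queue) >= self.buffer_size:
                return False          # tail drop
            self.queue.append(packet)
            return True

        def dequeue(self):
            return self.queue.popleft() if self.queue else None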
2.1 RED Gateways

The Random Early Detection (RED) algorithm [1] has been proposed in order to alleviate the problems of simple drop-tail queue management. RED monitors the average queue size (avg) of a shared FIFO queue at the router output port. Based on a comparison of the average queue size against a minimum threshold (minth) and a maximum threshold (maxth), a packet dropping probability (pa) is computed. The RED algorithm is given in figure 1:

    For each packet arrival:
        calculate the average queue size avg:
            if q > 0:
                avg = (1 - wq) * avg + wq * q
            else:
                avg = (1 - wq)^m * avg
        if minth <= avg < maxth:
            calculate the dropping probability pa and drop the arriving packet with probability pa
        else if avg >= maxth:
            drop the arriving packet

Here q denotes the instantaneous queue size, wq the weight of the exponentially weighted moving average, and m the number of packets that could have been transmitted while the queue was idle.
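The following is a minimal executable sketch of the dropping decisions in figure 1, loosely following the description in [1]; it is an illustration rather than the report's reference implementation, and per-byte dropping mode, ECN marking and the exact handling of the packet counter are simplified.

    # Hedged sketch of RED: EWMA of the queue size plus the two-threshold
    # dropping decision; the counter spreads early drops more evenly in time.
    import random

    class Red:
        def __init__(self, minth, maxth, maxp, wq):
            self.minth, self.maxth, self.maxp, self.wq = minth, maxth, maxp, wq
            self.avg = 0.0      # average queue size (EWMA)
            self.count = 0      # packets since the last drop

        def update_avg(self, q, idle_slots=0):
            """q: instantaneous queue size; idle_slots: packets that could have
            been sent while the queue was empty (the exponent m in figure 1)."""
            if q > 0:
                self.avg = (1 - self.wq) * self.avg + self.wq * q
            else:
                self.avg = (1 - self.wq) ** idle_slots * self.avg

        def drop(self, q, idle_slots=0):
            """Return True if the arriving packet should be dropped."""
            self.update_avg(q, idle_slots)
            if self.avg < self.minth:
                self.count = 0
                return False
            if self.avg >= self.maxth:
                self.count = 0
                return True                  # forced drop
            pb = self.maxp * (self.avg - self.minth) / (self.maxth - self.minth)
            pa = pb / max(1e-9, 1 - self.count * pb)
            self.count += 1
            if random.random() < pa:
                self.count = 0
                return True                  # early (unforced) drop
            return False

The counter-based increase of pa, as described in [1], avoids clusters of closely spaced drops and makes the drop process more uniform over time.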
Measurement 1: Traffic: TCP flows … host1, host4 -> host2 (12 in total)
RED parameters: minth = 7, maxth = 40, buffer size = 60, wq = 0.015, maxp = 1/10
Figure 43: Queue size (packets) over time (seconds) in forward and backward direction.

Figure 43 shows that the queue size oscillates heavily between zero and maxth. The oscillations at the two RED gateways are phase-shifted by exactly 180 degrees. The measurement thus confirms the simulation results.

Measurement 2: Traffic: 7 TCP flows host1 -> host4, host2 -> host5, host5 -> host1, host4 -> host2 (28 in total)
RED parameters: minth = 10, maxth = 60, buffer size = 100, wq = 0.002, maxp = 1/10
Figure 44: Queue size (packets) over time (seconds) in forward and backward direction.

Measurements 1 and 2 show that the frequency and amplitude of the oscillations are independent of the number of TCP flows.
11.4 Measurements with one-way TCP traffic

Common parameters for all measurements: bottleneck bandwidth 500 kbps; switching delay in the routers 2 ms; propagation delay 0 ms. All TCP flows start at the beginning of the measurement. The MTU size (configured at the bottleneck link) equals 512 bytes.
Measurement 1: Traffic: 3 TCP flows host1 -> host4, host2 -> host5 (6 in total)
RED parameters: minth = 10, maxth = 70, buffer size = 100, wq = 0.125, maxp = 1/40
Figure 45: Instantaneous queue size (packets) over time (seconds).

Measurement 2: Traffic: 6 TCP flows host1 -> host4, host2 -> host5 (12 in total)
RED parameters: minth = 10, maxth = 70, buffer size = 100, wq = 0.015, maxp = 1/10
Figure 46: Instantaneous queue size (packets) over time (seconds).

Measurement 3: Traffic: 14 TCP flows host1 -> host4, host2 -> host5 (28 in total)
RED parameters: minth = 10, maxth = 70, buffer size = 100, wq = 0.125, maxp = 1/6
Figure 47: Instantaneous queue size (packets) over time (seconds).
Measurement 4: Traffic: 28 TCP flows host1 -> host4, host2 -> host5 (48 in total)
RED parameters: minth = 10, maxth = 70, buffer size = 100, wq = 0.125, maxp = 1/4
Figure 48: Instantaneous queue size (packets) over time (seconds).
The measurements in this chapter show that the maxp, wq, minth and maxth values proposed by the model (see section 6) make the queue size converge between minth and maxth independently of the number of flows. Hence the simulation results for RED parameterization in case of one-way TCP traffic are confirmed for the investigated scenario. In figures 6 and 7 many packet losses and retransmissions happen. For a dropped packet the one-way delay cannot be measured, causing time periods with an insufficient number of samples (e.g. figure 7 between 20 and 60 seconds).
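The parameter model referenced above is derived in section 6, which is not reproduced in this excerpt. Purely as a hedged illustration of how a maxp value can be obtained from the scenario parameters, the following sketch inverts the well-known square-root TCP throughput approximation, throughput ≈ 1.22 * MSS / (RTT * sqrt(p)) (see e.g. [15] for TCP throughput modelling), and assumes the average queue settles midway between minth and maxth, where RED's drop probability equals maxp/2; the report's actual formulas and constants may differ.

    # Hedged illustration only: derive a candidate maxp from bottleneck
    # capacity C, number of flows N and RTT, assuming the classical
    # TCP-Reno relation throughput = 1.22*MSS/(RTT*sqrt(p)).
    # This is NOT the report's model from section 6.
    from math import sqrt

    def steady_state_drop_prob(capacity_bps, n_flows, rtt_s, mss_bytes=512):
        """Drop probability p at which N Reno flows just fill the bottleneck."""
        per_flow_bps = capacity_bps / n_flows
        mss_bits = mss_bytes * 8
        # invert  per_flow_bps = 1.22 * mss_bits / (rtt_s * sqrt(p))
        return (1.22 * mss_bits / (per_flow_bps * rtt_s)) ** 2

    def candidate_maxp(capacity_bps, n_flows, rtt_s, mss_bytes=512):
        """Assume avg settles midway between minth and maxth, where RED
        drops with probability maxp/2, hence maxp = 2*p."""
        return 2 * steady_state_drop_prob(capacity_bps, n_flows, rtt_s, mss_bytes)

    # example: 10 Mbps bottleneck, 50 flows, 100 ms RTT, 512-byte packets
    print(candidate_maxp(10_000_000, 50, 0.1))

Under this approximation maxp grows quadratically with the number of flows and decreases quadratically with the bandwidth-delay product, which is consistent with the qualitative observation that a fixed maxp cannot fit all scenarios.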
12 Summary and Conclusions

This report aims at systematically investigating the stability criteria of RED and at deriving quantitative guidelines for the setting of the RED parameters. As the stability of the system does not solely depend on the RED parameters but also on the behavior of end-to-end congestion control and the time scales of the arrival process, we focus on bulk-data TCP traffic for the derivation of the models. For the assumed traffic pattern, stability (i.e. bounded oscillation of the queue size between minth and maxth) can be achieved provided the scenario parameters (bottleneck bandwidth, number of flows, round-trip-time distribution) are known.

We show that the queue oscillates heavily if the difference between minth and maxth is set too small, and that the queue converges to maxth or oscillates around minth if maxp is not set correctly. Subsequently, based on a model for the wq parameter [11], we analytically derive a model for setting maxp as a function of the bottleneck bandwidth, the number of flows, and the round-trip-time distribution. This model is suitable for heterogeneous RTTs. As a next step, lacking an analytical model to obtain quantitative values for maxth - minth, we perform simulations over the maximum possible parameter range (bandwidth, number of flows) the NS simulator allows and determine the proper settings for maxth - minth by trial and error. These simulations result in a cloud of points which is approximated by a linear least-squares fit, giving a quantitative formula for setting maxth - minth as a function of the bottleneck bandwidth, the number of flows and the RTT. Subsequently, we extend our model for maxth - minth to the heterogeneous-RTT case and explain why the settings of maxp and maxth - minth exhibit interdependencies. These interdependencies demand combining the two models into a system of non-linear equations. Finally, this system of equations is solved numerically, and we obtain the complete model for setting the RED parameters as a function of the bottleneck bandwidth, the number of flows and the round-trip-time distribution. The model is evaluated by measurements and simulations with bulk-data and Web-like TCP traffic. The model shows that the buffer requirements of RED are significant in case of bulk-data TCP traffic, making the usability of RED in networks with high bandwidth-delay products questionable.

Additionally, we examine the reasons for the existence of heavy oscillations with RED and two-way bulk-data TCP-Reno traffic. We find that these oscillations are inevitable in case of tight coupling between forward and reverse TCP traffic, caused by interactions of queue management and TCP Reno's window-based congestion-control algorithm, and can
be avoided if the transport protocol's congestion-control algorithm or the queue-management algorithm converges to a certain buffer utilization. Finally, we investigate the decrease in service differentiation and link utilization caused by WRED in case its parameters are not set properly and in case of two-way TCP traffic.

All in all, we conclude that it is questionable whether appropriate setting of the RED parameters is possible in a realistic Internet environment. Although the quantitative models proposed in this report make an important step ahead, it is questionable whether ISPs will be able to retrieve sufficiently accurate information about the RTTs of flows and the number of flows traversing a router. Additionally, the models proposed in this report, although sophisticated to derive, can only be applied to simple topologies with one-way TCP traffic. For more complex topologies with two-way TCP and cross-traffic, possibly consisting of flows not congestion-controlled by TCP, an extension of the model is required. The complexity of the model for TCP with RED indicates that future queue-management algorithms should employ a smaller parameter set than RED does.
13 References

[1] S. Floyd, V. Jacobson, "Random Early Detection Gateways for Congestion Avoidance", IEEE/ACM Transactions on Networking, August 1993
[2] B. Braden, D. Clark et al., "Recommendations on Queue Management and Congestion Avoidance in the Internet", RFC 2309, April 1998
[3] W. Feng, D. Kandlur, D. Saha, K. G. Shin, "A Self-Configuring RED Gateway", Proc. of IEEE INFOCOM, 1999
[4] W. Feng et al., "Techniques for Eliminating Packet Loss in Congested TCP/IP Networks", University of Michigan CSE-TR-349-97, November 1997
[5] T. J. Ott, T. V. Lakshman, L. H. Wong, "SRED: Stabilized RED", Proc. of IEEE INFOCOM, 1999
[6] T. Ziegler, U. Hofmann, S. Fdida, "RED+ Gateways for Detection and Discrimination of Unresponsive Flows", Technical Report, December 1998, http://www.newmedia.at/~tziegler/papers.html
[7] S. Floyd, K. Fall, K. Tieu, "Estimating Arrival Rates from the RED Packet Drop History", March 1998, unpublished, http://www.aciri.org/floyd/end2end-paper.html
[8] K. K. Ramakrishnan, S. Floyd, "A Proposal to Add Explicit Congestion Notification (ECN) to IP", RFC 2481, January 1999
[9] Cisco Web pages, http://www.cisco.com/warp/public/732/netflow/qos_ds.html
[10] D. Clark, "Explicit Allocation of Best Effort Packet Delivery Service", http://www.ietf.org/html.charters/diffserv-charter.html
[11] V. Firoiu, M. Borden, "A Study of Active Queue Management for Congestion Control", Proc. of IEEE INFOCOM, Tel Aviv, 2000
[12] V. Jacobson, K. Nichols, K. Poduri, "RED in a Different Light", Draft Technical Report, Cisco Systems, September 1999
[13] S. Floyd, "Discussions on Setting RED Parameters", http://www.aciri.org/floyd/red.html, November 1997
[14] C. Villamizar, C. Song, "High Performance TCP in ANSNET", Computer Communication Review, Vol. 24, No. 5, October 1994, pp. 45-60
[15] J. Padhye et al., "Modeling TCP Throughput: A Simple Model and its Empirical Validation", Proceedings of ACM SIGCOMM, August 1998
[16] NS Simulator Homepage, http://www-mash.cs.berkeley.edu/ns/
[17] Maple home page, http://www.mit.edu/afs/athena.mit.edu/software/maple/www/home.html
[18] B. Mah, "An Empirical Model of HTTP Network Traffic", Proceedings of INFOCOM '97, Kobe, Japan, April 1997
[19] L. S. Brakmo, L. L. Peterson, "TCP Vegas: End to End Congestion Avoidance on a Global Internet", IEEE Journal on Selected Areas in Communications (JSAC), 13(8):1465-1480, October 1995
[20] M. May et al., "Reasons Not to Deploy RED", Technical Report, February 1999
14 Appendix: Generic Measurement Environment (GME)

The GME enables flexible and convenient performance evaluation by measurement in the Internet. The main program (a TCL script) runs at the GME server and controls the entire measurement process. This script in turn creates and distributes scripts for flow generation and traffic monitoring, starts flows and traffic monitors on remote hosts, and finally gathers the measurement results at the GME server after the measurement has terminated.

For the generation of flows the "ttcp" tool is used:
• First, a ttcp server (receiver) has to be started at the receiving host with "ttcp -rs -p x". The option "-rs" tells ttcp to operate as a TCP data receiver and to discard all incoming data; "-p x" specifies the port number to listen at.
• Then the ttcp sender is started with "ttcp -ts -p x receiving-host". The "-ts" option tells ttcp to operate as transmitter and to source arbitrary data into the net towards the host running the ttcp receiver with port number "x". The ttcp sender is aware of several other TCP-specific options and additionally allows the creation of UDP flows.

Traffic monitoring is performed with tcpdump. Tcpdump switches a host's Ethernet interface into promiscuous mode and dumps header information of all packets broadcast on the Ethernet segment that match a filter condition. For instance, in order to filter all packets of one TCP flow from host A to host B with source port number 4711, the call to tcpdump is as follows:
"tcpdump src host A and dst host B and tcp and src port 4711"
For remote procedure calls and file copying the standard Unix commands rsh and rcp are used. Note that for proper operation of the rsh and rcp commands, ".rhosts" files are required at the remote hosts, enabling access from the GME server.

The main TCL script running at the GME server can be outlined as follows:
1. Create traffic files and send them to the hosts: there are two types of traffic files: receiver files (shell scripts starting ttcp receivers) and sender files (shell scripts starting ttcp senders). Between a sender file on a host A and a receiver file on a host B there exists a one-to-one relation determining the traffic flows: for each ttcp receiver process on host B listening on port C, a ttcp sender is created (with adjustable options) sending to host B, port C. In the current version of the GME, all ttcp senders in one sender file start immediately. However, in future versions the individual ttcp sender processes could be scheduled at arbitrary points in time (this would be facilitated by running TCL not only at the GME server but also on the remote hosts running the traffic files). After creation of the sender and receiver files for hosts A and B, the sender file is copied to host A and the receiver file is copied to host B with the Unix command "rcp".
2. Create tcpdump files and send them to the hosts: tcpdump files are shell scripts containing one line of code, starting tcpdump with a filter function (see above).
3. Start tcpdump files: the tcpdump files are started with the "rsh" command as background processes on the remote hosts.
4. Start ttcp receivers: the ttcp receiver files are started with the "rsh" command as background processes. Note that it is important to start the ttcp receivers earlier than the ttcp senders, as ttcp receivers and senders have a client-server relationship.
5. Start ttcp senders: the ttcp sender files are started with the "rsh" command as background processes. As the remote hosts are not time-synchronized, it may happen that two sender files at two distinct remote hosts are not started at the same point in time. Assume that the delay from the GME server to a host A is longer than the delay between the GME server and a host B, or assume that some rsh packets are lost on the way to host A while all rsh packets to host B arrive. In both cases the sender file at host B (and thereby the ttcp flows at host B) will be started before the sender file at host A. It is the responsibility of the engineer to estimate whether the connectivity between the GME server and the remote hosts is sufficient to satisfy the time accuracy required for a specific measurement. For future versions of the tool an intelligent agent would be desirable, detecting the delayed start of ttcp flows and terminating the measurement in case the accuracy requirements are not met.
6. Wait for the estimated duration of the measurement.
7. Gather ttcp results: each ttcp result file contains the measurement results for a single TCP (or UDP) flow. All ttcp result files are copied from the remote hosts to the GME server by rcp.
8. Gather tcpdump results: the tcpdump result files are copied from the remote hosts to the GME server by rcp.
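The outline above can be expressed compactly in code. The following is a minimal sketch (not part of the GME itself, which is written in TCL) of steps 3 to 8 using Python's subprocess module; the host names, port number, durations and file paths are purely illustrative assumptions.

    # Hedged sketch of the GME control flow (steps 3-8 above) using rsh/rcp.
    # Hostnames, ports and paths are hypothetical; the real GME is a TCL script.
    import subprocess, time

    SENDER_HOST, RECEIVER_HOST, MONITOR_HOST = "host1", "host4", "router1"
    PORT = 4711          # hypothetical ttcp port
    DURATION_S = 120     # estimated duration of the measurement

    def rsh_background(host, command):
        """Start a command on a remote host via rsh without waiting for completion."""
        return subprocess.Popen(["rsh", host, command])

    def rcp_fetch(host, remote_path, local_path):
        """Copy a result file from a remote host back to the GME server."""
        subprocess.run(["rcp", f"{host}:{remote_path}", local_path], check=True)

    # step 3: start tcpdump with a filter for the flow of interest
    rsh_background(MONITOR_HOST,
                   f"tcpdump src host {SENDER_HOST} and dst host {RECEIVER_HOST} "
                   f"and tcp and src port {PORT} > /tmp/dump.txt")
    # step 4: start the ttcp receiver first (client-server relationship)
    rsh_background(RECEIVER_HOST, f"ttcp -rs -p {PORT} > /tmp/ttcp_recv.txt")
    # step 5: start the ttcp sender
    rsh_background(SENDER_HOST, f"ttcp -ts -p {PORT} {RECEIVER_HOST} > /tmp/ttcp_send.txt")
    # step 6: wait for the estimated duration of the measurement
    time.sleep(DURATION_S)
    # steps 7 and 8: gather ttcp and tcpdump results at the GME server
    rcp_fetch(SENDER_HOST, "/tmp/ttcp_send.txt", "ttcp_send.txt")
    rcp_fetch(RECEIVER_HOST, "/tmp/ttcp_recv.txt", "ttcp_recv.txt")
    rcp_fetch(MONITOR_HOST, "/tmp/dump.txt", "dump.txt")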