Scheduling Latency-Critical Traffic: A Measurement Study ... - CiteSeerX

Scheduling Latency-Critical Traffic: A Measurement Study of DRR+ and DRR++

C. Zhang, M.H. MacGregor
Dept. of Computing Science, University of Alberta
221 Athabasca Hall, Edmonton, Alberta, Canada T6G 2E1
[email protected]

ABSTRACT

Deficit Round-Robin (DRR), proposed by Shreedhar and Varghese [9] for efficient fair queuing, is a low-complexity packet scheduler that has several commercial implementations. DRR has also been extended as DRR+ to accommodate latency-critical flows. DRR+, however, assumes that a latency-critical flow exhibits very smooth arrivals, whereas most network flows are very bursty in nature, either as the result of source bursts or as a result of the dynamics of multihop network paths. When DRR+ encounters a burst, it reverts to the behavior of DRR, providing no preference or latency bound for latency-critical traffic. This is a fatal flaw that prevents DRR+ from being useful in scheduling bursty latency-critical flows. In this paper we present a different extension to DRR that has much lower delay and delay jitter than DRR+ and is capable of handling bursty latency-critical flows.

1. INTRODUCTION

With the prolific growth of high-bandwidth applications, the method used to schedule packets for transmission at a router interface has become a critical issue. Scheduling algorithms perform the function of distributing available network resources such as bandwidth and buffer space amongst the sessions in progress. They decide the service order of packets in different sessions and determine the impact of one session on another.

In this paper, we present some of the results of a measurement study of the performance of two related schedulers: DRR+ and DRR++. We demonstrate that DRR++ has much better delay and delay jitter performance than DRR+ in an environment of bursty, latency-critical traffic. This is an important observation as there are several implementations of DRR / DRR+ currently in use, and the predominant network environment is one of unpredictable bursts rather than smooth arrivals. Further, we demonstrate that the delay of a latency-critical flow in DRR++ is not affected by the total number of flows, which is another difficulty affecting DRR.

A large number of schedulers based on a variety of techniques have been proposed in the literature [4], [10], [15], [17]. Scheduling algorithms can be characterized as either work-conserving or non-work-conserving. The former type always schedules a transmission provided that there is at least one packet queued, while the latter may defer the transmission of waiting packets. Non-work-conserving schedulers may help to control flow distortions within a network [16].

We can also distinguish between rate-based (time-based) and round-robin based schedulers. Rate-based schedulers forward packets according to their deadline time. The deadline time is based on the packet’s arrival time plus its service time. Since queued packets are served in order of increasing deadline, implementation of this algorithm requires the use of a sorted priority queue mechanism, implying O(log(n)) complexity, where n is the number of concurrently active sessions. Round-robin based schedulers such as Weighted Round Robin (WRR) [5] and Deficit Round Robin (DRR) [9] do not keep a timestamp for each packet. Instead they use arrival order and a service order to define transmission order.

Several general models for schedulers have been introduced previously. In [11] a latency-rate (LR) server is presented as a general scheduler with two parameters, the latency and the allocated rate. A delay bound for an LR server is derived based on these two parameters, and the latency values of several scheduling algorithms are presented. The Rate-Controlled Static-Priority (RCSP) algorithm [14] consists of a regulator and a scheduler to decouple the allocation of delay and bandwidth. RCSP employs various algorithms as the regulator to satisfy specific requirements, such as regulating delay-jitter or rate-jitter.
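The sorted-queue cost of rate-based scheduling described in this section can be illustrated with a small sketch. It is not taken from any of the cited schedulers: the function name `deadline_order`, the tuple layout and the `link_rate` parameter are assumptions made for illustration, and all packets are assumed to be backlogged at once. Each packet's deadline is its arrival time plus its service time, and a binary heap provides the O(log(n)) insert and extract operations.

```python
import heapq

def deadline_order(packets, link_rate):
    """Return flow ids in order of increasing deadline.

    packets: list of (arrival_time_s, length_bytes, flow_id) tuples
    link_rate: link speed in bytes per second (hypothetical parameter)
    """
    heap = []
    for seq, (arrival, length, flow) in enumerate(packets):
        # Deadline = arrival time + service time of the packet on the link.
        deadline = arrival + length / link_rate
        # seq breaks ties so that equal deadlines keep arrival order.
        heapq.heappush(heap, (deadline, seq, flow))   # O(log n) per insert
    order = []
    while heap:
        order.append(heapq.heappop(heap)[2])          # O(log n) per extract
    return order

# Deadlines are a: 1.0 s, b: 0.2 s, c: 0.5 s on a 1000 byte/s link.
print(deadline_order([(0.0, 1000, "a"), (0.1, 100, "b"), (0.0, 500, "c")], 1000))
# prints ['b', 'c', 'a']
```

A round-robin scheduler avoids this per-packet heap operation entirely, which is the source of its lower complexity.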

1.1. DRR

In DRR [9] flows are serviced in a round-robin manner. Each flow is associated with a queue, and each queue is allocated a fixed quantum of service in each round. When a queue is serviced, the length of each packet sent is deducted from the current balance until the next packet would result in a deficit. The final balance is preserved if the queue is not emptied during the current round. Before the queue is serviced in the next round, the previously preserved balance is added to the fixed quantum for that queue. Preserving the balance from the previous round ensures fairness and allows packets greater than the quantum to eventually be sent.

DRR has several desirable features: it provides fairness amongst the connections, allows varying packet sizes and has O(1) time complexity. Only two arrays are used in DRR, containing the quantum and deficit for each flow.

A modification called DRR+ is proposed in [9] to provide bounded delay for real-time traffic. DRR+ uses a policy that the source should not send more than x bytes of data in some period T. Otherwise the flow is said to have violated its contract, and it is treated as a best-effort flow. In DRR+, once a latency-critical flow has been moved into the best-effort class, there is no provision for it to graduate back to priority treatment.
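The DRR service rule described above can be sketched as follows. This is a minimal illustration rather than the implementation from [9]: the function name `drr_schedule` and its arguments are hypothetical, and for brevity the loop visits every queue in each round instead of maintaining the list of active flows that gives DRR its O(1) per-packet cost.

```python
from collections import deque

def drr_schedule(queues, quanta, rounds):
    """Serve per-flow queues of packet lengths (bytes) with DRR.

    queues: list of deques, one per flow, holding packet lengths
    quanta: per-flow quantum in bytes
    Returns the transmission order as (flow_index, packet_length) tuples.
    """
    deficits = [0] * len(queues)      # one balance per flow
    sent = []
    for _ in range(rounds):
        for i, queue in enumerate(queues):
            if not queue:
                continue              # nothing queued, nothing to serve
            deficits[i] += quanta[i]  # preserved balance plus fixed quantum
            # Send packets until the next one would result in a deficit.
            while queue and queue[0] <= deficits[i]:
                length = queue.popleft()
                deficits[i] -= length
                sent.append((i, length))
            if not queue:
                deficits[i] = 0       # balance is kept only while backlogged
    return sent

flows = [deque([600, 300]), deque([1200])]
print(drr_schedule(flows, [500, 500], rounds=3))
# prints [(0, 600), (0, 300), (1, 1200)]
```

In this example the 1200-byte packet exceeds the 500-byte quantum, so it waits until the third round, when the accumulated balance reaches 1500 bytes. This is the mechanism that allows packets greater than the quantum to eventually be sent.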

1.2. DRR++

DRR++ [6] uses the deficit approach of DRR for servicing latency-critical flows, along with priority transmit queuing. This ensures that the available bandwidth is shared between flows so that the best-effort flows are not starved, while giving priority to latency-critical flows. Most importantly, this approach allows latency-critical flows to burst occasionally. A latency-critical flow scheduled by DRR++ is constrained to send less than one quantum between best-effort service intervals. If the latency-critical flow exceeds this rate, the excess traffic simply remains backlogged. This overcomes the weakness of DRR+ while still preserving fairness.

DRR++ is a non-work-conserving scheduler that can be used to provide robust service for latency-critical traffic. DRR++ is suitable for packet scheduling in the network core because, as long as a stream obeys its contract at the admission point, it will not be penalized for burst behaviour impressed on the stream as it transits the network. DRR++ is also suitable for use in the core because of its extremely low per-packet complexity inherited from DRR.

DRR++ is very effective in isolating latency-critical (LC) and best-effort (BE) traffic from each other so that misbehaving sources of one type cannot affect traffic from sources of the other type. DRR++ also exhibits fairness for BE traffic with variable-length packets, while normal Round Robin provides fairness only with fixed-size packets.
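One possible reading of the DRR++ behaviour described above is sketched below. It is an interpretation for illustration only, not the algorithm as specified in [6]: the function `drrpp_round`, the `state` dictionary and the choice to give the LC queue one service opportunity before each BE queue are all assumptions. The sketch shows the key property from the text, namely that an LC burst beyond one quantum is not demoted but simply stays backlogged until the next LC service opportunity.

```python
from collections import deque

def drrpp_round(lc_queue, be_queues, q_lc, q_be, state):
    """Run one round of a simplified DRR++-style scheduler (assumed design).

    state: {"lc": LC deficit, "be": list of BE deficits}, updated in place.
    Returns the packets sent as (label, length) tuples.
    """
    sent = []
    for i, be_queue in enumerate(be_queues):
        # LC service opportunity between best-effort service intervals.
        if lc_queue:
            state["lc"] += q_lc
            while lc_queue and lc_queue[0] <= state["lc"]:
                length = lc_queue.popleft()
                state["lc"] -= length
                sent.append(("LC", length))
        if not lc_queue:
            state["lc"] = 0           # no credit is kept by an empty queue
        # Ordinary DRR service for best-effort flow i.
        if be_queue:
            state["be"][i] += q_be[i]
            while be_queue and be_queue[0] <= state["be"][i]:
                length = be_queue.popleft()
                state["be"][i] -= length
                sent.append(("BE%d" % i, length))
        if not be_queue:
            state["be"][i] = 0
    return sent

state = {"lc": 0, "be": [0, 0]}
lc = deque([400, 400, 400])           # a burst of three LC packets
be = [deque([500]), deque([500])]
print(drrpp_round(lc, be, 800, [500, 500], state))
# prints [('LC', 400), ('LC', 400), ('BE0', 500), ('LC', 400), ('BE1', 500)]
```

The third LC packet exceeds the 800-byte LC quantum, so it waits behind one BE service interval and is then sent ahead of the next BE flow.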

2. MEASUREMENT METHODOLOGY

Measurements were made using clients hosted on two PCs connected by a 100 Mbps Ethernet to a gateway node. The gateway was then connected by a WAN link to the destination node. The link between the gateway and the destination was the bottleneck link. The scheduling algorithm resides at the gateway node G1, on the upstream side of the bottleneck link, and so takes effect after the packets arrive at node G1 and are waiting to be forwarded to D1. We define $T_s$ and $T_a$ as the timestamps just before a packet reaches node G1 and just after it leaves node G1, respectively. The metric we use to evaluate the algorithms is the mean delay of the packets for each flow j:

$D_j = (T_a - T_s)$

The parameters used in the study are the packet arrival rate, λ, and the quantum value, q, for each flow.

We use exponentially distributed traffic as the source type for BE traffic under both DRR+ and DRR++. The BE traffic is intended mostly to provide a background load for our studies of the service received by the LC flow. It is well known that interarrival times for requests for Web pages are exponentially distributed, so as well as being simple to generate, this is a meaningful background traffic type. The BE traffic had a packet length with a mean of 410 bytes, consisting of a 40-byte header plus a data portion with a length uniformly distributed in the range [40,700] bytes.

The LC flow is meant to represent a bursty, latency-critical flow. In [3], two categories of burstiness are distinguished: burstiness of interarrival times, and burstiness of packet lengths. For the former we used an exponential distribution because it exhibits significant short-term variance. For the latter we used MPEG traffic [1] because it exhibits significant variation in packet length. Only the results for the MPEG traffic are presented here.
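The background traffic model and the delay metric described in this section can be sketched as follows. This is not the generator used in the study: the function names, the fixed seed and the use of integer payload lengths are assumptions. The mean packet length follows from the stated distribution, since 40 + (40 + 700)/2 = 410 bytes.

```python
import random

HEADER_BYTES = 40  # header length stated in the text

def be_packets(rate, n, seed=0):
    """Generate n (arrival_time_s, length_bytes) pairs for one BE flow.

    rate: mean arrival rate in packets per second
    """
    rng = random.Random(seed)
    t = 0.0
    packets = []
    for _ in range(n):
        t += rng.expovariate(rate)        # exponential interarrival time
        payload = rng.randint(40, 700)    # data portion, uniform on [40, 700]
        packets.append((t, HEADER_BYTES + payload))
    return packets

def mean_delay(timestamps):
    """Mean of (T_a - T_s) over one flow; timestamps holds (T_s, T_a) pairs."""
    return sum(t_a - t_s for t_s, t_a in timestamps) / len(timestamps)

pkts = be_packets(rate=13, n=20000, seed=1)
avg_len = sum(length for _, length in pkts) / len(pkts)
print(round(avg_len))                     # close to the 410-byte mean in the text
print(mean_delay([(0.0, 0.5), (1.0, 2.0)]))   # prints 0.75
```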

3. RESULTS

We gathered data for both exponential and MPEG LC sources. For both types of source, we looked at the effects of varying the intensity of the background best-effort flows, and the effects of changing the quantum allocated to the LC traffic.

3.1. Burstiness in Length: MPEG LC Traffic

MPEG traffic is challenging for the scheduler because of the large variations in frame length. In this section, we compare the behaviour of DRR+ and DRR++ while changing the arrival rate of best-effort traffic, λbe, and the quantum allocated to the latency-critical flow, Qlc. In DRR+, the contract $C_{lc}$ for the MPEG LC flow is set as $C_{lc} = 1.35\,\mathrm{MPEG}$, where $\mathrm{MPEG} = (I + 3P + 6B)/10$.
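As a worked example of the contract formula above, the sketch below assumes that I, P and B denote representative lengths in bytes of the three MPEG frame types, so that the weights 1, 3 and 6 give a weighted mean frame length over ten frames. That reading, the function name `mpeg_contract` and the frame sizes are assumptions for illustration and are not values from the study.

```python
def mpeg_contract(i_len, p_len, b_len, factor=1.35):
    """Return the DRR+ contract for an MPEG flow, in bytes.

    Computes factor * (I + 3P + 6B) / 10 as given in the text.
    """
    mean_frame = (i_len + 3 * p_len + 6 * b_len) / 10
    return factor * mean_frame

# Hypothetical frame lengths: I = 10000, P = 5000, B = 2000 bytes.
# Weighted mean = (10000 + 15000 + 12000) / 10 = 3700 bytes.
print(round(mpeg_contract(10000, 5000, 2000)))   # prints 4995
```

With these hypothetical sizes a single I frame is about twice the contract, which illustrates how packet-length bursts alone can cause a flow to violate a contract set this way.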

3.1.1 Effect of background intensity

In Figures 1 and 2 we show the effects of varying the rates of the three BE flows from 0 to 30 packets per second. Figure 1 shows that under DRR++ the delay of the LC flow approaches a maximum. However, the delay of the three BE flows increases dramatically once their arrival rates are larger than 13 packets per second.

Figure 2 shows the results for DRR+, where the LC flow has a larger delay than the BE flows because it has been demoted to best-effort service and is being served last in every round. This is an illustration of the flaw in DRR+, and is a behavior that will occur at all arrival rates for the MPEG flow because it is the packet length bursts, not arrival time burstiness, that cause the demotion of the flow. At the same time, the large variance in the packet length of the MPEG LC flow results in even larger delay than for the BE flows because the larger frames must wait multiple rounds for transmission. The delays for all flows reach more than one hundred times their value under DRR++.

3.1.2 Effect of quantum, Qlc

We also tested the effect of Qlc on the MPEG LC flow for two different values of utilization of the bottleneck link.

• Low Utilization: Figure 3 presents the delay of the MPEG LC flow in both DRR+ and DRR++ with low utilization of the bottleneck link (56%). Neither value changes much when Qlc changes because at low utilization not many BE packets arrive during one round. So the best-effort quantum isn’t fully used, which results in a bounded vacation time for servicing the LC flow.

• High Utilization: Figures 4 and 5 show the two totally different delay trends for the MPEG LC and BE flows in DRR+ and DRR++. In DRR++, the MPEG LC flow reaches very low delay when Qlc equals 9000 bytes, which is 60% of the total quantum. In DRR+ the delay of the MPEG LC flow decreases from a very large value until the point at which Qlc equals 21000 bytes, which is 78% of the total quantum. When Qlc is 27000 bytes (82% of the total quantum), the delay is still much larger than in DRR++.

4. CONCLUSIONS

In this paper we report the results of a measurement-based evaluation of two packet scheduling algorithms: DRR+ and DRR++. The performance of DRR relative to other schedulers has been reported elsewhere; our aim is to document the relative performance of DRR+ and the improvement suggested as DRR++. That is the scope of this investigation; further improvements to the scheduler are still required in order to apply it in an environment of large numbers of latency-critical flows.

Our experiments used two types of traffic: one with bursty interarrival times, and one with bursty packet lengths. Only the latter results are presented in this paper. We investigated the impact of background traffic, and the sensitivity of both schedulers to their parameter settings. Our measurements demonstrate that DRR++ has the following advantages over DRR+:

• DRR++ shares bandwidth between traffic classes in a predictable manner over a wide range of conditions, giving priority to latency-critical traffic without starving best-effort traffic.

• DRR++ is very effective in isolating the behaviour of one class of flows from another, so that misbehaving sources do not affect flows in another class.

• In particular, as the intensity of best-effort traffic increases in DRR++, the delay of latency-critical traffic approaches an upper limit, while in DRR+ it simply continues to increase.

• Delay of latency-critical traffic is much lower in DRR++ than in DRR+, and is much less sensitive to the parameter settings of the scheduler.

• Delay of latency-critical traffic is up to two orders of magnitude less for DRR++ than for DRR+ under conditions of high utilization.

• Delay for MPEG latency-critical traffic in DRR+ can actually be worse than that of the best-effort traffic; this is not the case in DRR++.

• Delay variance (delay jitter) of latency-critical traffic is much better (lower) in DRR++ than in DRR+.

• DRR++ preserves the within-class fairness of DRR+.

The change to DRR+ required to obtain the benefits documented here is fairly straightforward, but we think that the magnitude and scope of the benefits are certainly worth stressing, particularly in view of the current wide use of DRR.

5. REFERENCES

[1] S. Bhattacharjee, K. L. Calvert, E. W. Zegura, Network Support for Multicast Video Distribution, Technical Report 98-16, Georgia Inst. of Tech., 1998.

[2] J. Bruno, E. Gabber, B. Özden, A. Silberschatz, “Move-To-The-Rear List Scheduling: A New Scheduling Algorithm for Providing QoS Guarantees”, Proc. ACM Multimedia ’97, pages 63-73.

[3] K. C. Claffy, Internet Traffic Characterization, Ph.D. thesis, University of California, San Diego, 1994.

[4] S. Golestani, “A self-clocked fair queueing scheme for broadband applications”, Proc. IEEE Infocom ’94, June 1994, pages 636-646.

[5] B. Kim, B.-Y. Kim, “Simulation Study of Weighted Round-Robin Queueing Policy”, Proc. Tech. Conf. on Telecommunications R&D, Massachusetts, 1994.

[6] M.H. MacGregor, W. Shi, “Deficits for Bursty Latency-critical Flows: DRR++”, Proc. IEEE ICON ’00, Singapore, 2000, pages 287-293.

[7] V. Paxson, G. Almes, J. Mahdavi, M. Mathis, “RFC 2330: Framework for IP Performance Metrics”, May 1998.

[8] V. Paxson, “On Calibrating Measurements of Packet Transit Times”, Proc. ACM SIGMETRICS ’98, 1998, pages 11-21.

[9] M. Shreedhar, G. Varghese, “Efficient Fair Queueing Using Deficit Round-Robin”, IEEE/ACM Transactions on Networking, Vol. 4, No. 3, July 1996, pages 375-385.

[10] D. Stiliadis, A. Varma, “Rate Proportional Servers: A Design Methodology for Fair Queueing Algorithms”, IEEE/ACM Transactions on Networking, April 1998, pages 164-174.

[11] D. Stiliadis, A. Varma, “Latency-Rate Servers: A General Model For Analysis of Traffic Scheduling Algorithms”, IEEE/ACM Transactions on Networking, Vol. 6, No. 5, October 1998, pages 611-624.

[12] F. Tsou, H. Chiou, Z. Tsai, “Design and Simulation of an Efficient Real-Time Traffic Scheduler with Jitter and Delay Guarantees”, IEEE/ACM Transactions on Multimedia, Vol. 2, No. 4, December 2000, pages 255-266.

[13] D. K. Y. Yau, S. S. Lam, “Adaptive Rate-Controlled Scheduling for Multimedia Applications”, Proc. ACM Multimedia ’96, pages 129-140.

[14] H. Zhang, D. Ferrari, “Rate-Controlled Static-Priority Queueing”, Proc. IEEE Infocom ’93, April 1993, pages 227-236.

[15] H. Zhang, D. Ferrari, “Rate-Controlled Service Disciplines”, J. High Speed Networks, 3(4):389-412, 1994.

[16] H. Zhang, “Service Disciplines For Guaranteed Performance Service in Packet-Switching Networks”, Proc. IEEE, Vol. 83, pages 1374-1396, Oct. 1995.

[17] L. Zhang, “Virtual Clock: A new traffic control algorithm for packet switching networks”, Proc. ACM SIGCOMM ’90, Sept. 1990, pages 19-29.

Figure 1. DRR++ carrying MPEG, varying background load

Figure 2. DRR+ carrying MPEG, varying background load

Figure 3. DRR++ carrying MPEG, low bottleneck utilization, varying Qlc

Figure 4. DRR++ carrying MPEG, high bottleneck utilization, varying Qlc

Figure 5. DRR+ carrying MPEG, high bottleneck utilization, varying Qlc
