Dynamic Quality of Service Control in Packet Switch Scheduling

Kevin Ross
Information Systems and Technology Management, UCSC School of Engineering, 1156 High Street, Santa Cruz, CA 95064
[email protected]

Nicholas Bambos
Departments of MS&E and EE, Stanford University, Terman Engineering Center, Stanford, CA 94305-4026
[email protected]
Abstract— Recent research in packet switch scheduling algorithms has moved beyond throughput maximization to quality of service (QoS) control. Several classes of algorithms have been shown to achieve maximal throughput under certain system conditions. Between classes and within each class, QoS performance varies based on arrival traffic and properties of the scheduling algorithm being utilized. Here we compare two classes of throughput-maximizing algorithms and their performance with respect to buffer sizes. These classes are randomized algorithms, which can be characterized as offline algorithms, and projective cone scheduling algorithms, which are online since they respond to the current workload in the system. In each class, parameters can be fine-tuned to reflect the priorities of individual switch ports. We show how the online algorithms lead to significantly better quality of service performance.
I. INTRODUCTION

Analysis of high-performance packet switches has traditionally focused on finding throughput maximizing algorithms. Recent results [6], [7], [10] have proved the throughput maximization (stability) of several classes of algorithms. Whereas throughput maximization deals with the long term flow balance of input streams, quality of service (QoS) involves measurements and guarantees of performance metrics. The required buffer capacity, drop rates of packets, and expected delay are examples of quality of service quantities which are of great importance to network managers and customers. One is interested not only in ensuring that every packet is correctly forwarded, but also in how long it will remain in the system.

For many throughput maximizing algorithms, explicit representation of QoS performance is intractable. Even if average buffer and delay characteristics can be shown to exist, the calculation required for numerical representation is often prohibitive. In this paper we explore two classes of algorithms, and how QoS can be shaped within each class. The first is a class of randomized algorithms introduced in [10], which are able to provide closed-form expressions for QoS under certain arrival conditions and with prior knowledge of long term average rates. The second is the class of projective cone scheduling (PCS) algorithms described in [8], which are known to adapt well even under minimal arrival assumptions.
These two important classes of algorithms each guarantee maximal throughput in a crossbar packet switch. Randomized algorithms can be characterized as offline algorithms in the sense that the only information they require is the long term average arrival rate to each input-output pair. The PCS algorithms are online algorithms which select their service configuration based on the waiting workload in each queue at each timeslot, without prior knowledge of the arrival rates.

In another QoS driven approach to packet switch scheduling, Leonardi et al. [5] derived bounds on the mean value and variance of packet delay and queue sizes under uniform traffic loads using the popular maximum weight matching algorithm first introduced by McKeown et al. [6]. Cruz [3] and Chang [1], [2] described deterministic bounds for QoS measures under tight traffic admissibility conditions on single queue networks. Other work, such as that by Weller and Hajek [11], considers how to restrict throughput in order to make QoS guarantees.

The organization of this paper is as follows. In Section II we develop the model, and in Section III we describe the randomized algorithms. In Section III-A we present analytical QoS results for randomized algorithms and address switch optimization. In Section IV we describe PCS algorithms and their performance. In Section V we compare the performance of PCS algorithms with that of randomized algorithms. We conclude in Section VI. Given space constraints, we limit this presentation to essential elements of our proposed approach.

II. MODEL AND ASSUMPTIONS

We consider a traditional N-by-N crossbar packet switch. Incoming packets are stored in virtual output queues (VOQs) in order to avoid the well known effect of head-of-line blocking [4]. Each VOQ corresponds to a pair of input and output ports in the switch. Fig. 1 illustrates a four-by-four crossbar packet switch (with four input and four output ports).
At any time, the switch can open the connections between each input port and exactly one output port. The model and assumptions in this work are similar to those in [6] and [10]. Time is slotted, with a timeslot corresponding to the transmission time of a single cell across the switching fabric. Packets are assumed to be divisible into equal-sized cells.

Fig. 1. A four-by-four packet switch can connect each input and output port to exactly one other port. The switch configuration is defined by the set of established connections. In this example packets would be forwarded between the input-output pairs (1,2), (2,3), (3,1) and (4,4), numbered from top to bottom and left to right.

Fig. 2. The workload evolution equation 2 can be viewed geometrically in the (X1, X2) plane: the workload at each timeslot is the sum of the previous workload and the arrivals, minus the service.

Each timeslot, a cell from input i to output j arrives to each queue according to an independent Bernoulli distribution with rate λij. The arrival streams to the various (i, j) pairs are assumed to be independent but may have different arrival rates. The set {λij} of arrival rates is considered to be admissible if

    Σi λij < 1 for all j, and Σj λij < 1 for all i.    (1)

Under these conditions no input or output port is over-subscribed.

To accommodate variations in the models used to describe the algorithms under consideration, we will use a vector formulation with (i, j) labelling. That is, each element of the vector corresponds to an (i, j) pair. For example, we denote by λ the N² × 1 vector of arrival rates λij. The first N elements of λ correspond to the first input port, the next N to the second port, and similarly for all N² elements. The same labelling applies to the workload, arrival and service vectors.

Arrivals to each queue in timeslot t are represented in the arrival vector A(t), with zero and one entries identifying arrivals. According to the Bernoulli arrival assumption, Aij(t) = 1 with probability λij, and zero otherwise. At the beginning of each timeslot the scheduling algorithm being used selects a service configuration S(t). A service configuration corresponds to the set of connections which are established in the switch. If Sij(t) = 1 then a single cell may be transferred from input i to output j in timeslot t.¹
¹ If we were using matrix notation, matrices [λij] which satisfy equation 1 are known as doubly substochastic matrices.
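As an illustration (not part of the paper), the admissibility condition (1) is a straightforward check on the rate matrix; the function name and list-of-lists representation below are our own choices for this sketch.

```python
def is_admissible(lam, N):
    """Check condition (1): every row and column sum of the N-by-N rate
    matrix [lam_ij] is strictly below 1, i.e. it is doubly substochastic."""
    rows_ok = all(sum(lam[i][j] for j in range(N)) < 1 for i in range(N))
    cols_ok = all(sum(lam[i][j] for i in range(N)) < 1 for j in range(N))
    return rows_ok and cols_ok

# Row sums 0.5 and 0.5, column sums 0.6 and 0.4: admissible.
print(is_admissible([[0.2, 0.3], [0.4, 0.1]], 2))
```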
Because of the crossbar switch design, the set S from which S(t) is chosen is the set of permutations of input-output pairs, corresponding to the matchings of size N. That is,

    S = {S : Sij ∈ {0, 1}, Σi Sij = 1 for all j, Σj Sij = 1 for all i}.

There are M = N! such matchings, and for notational simplicity we use an arbitrary but fixed order of the set S = {S^m}, m = 1, ..., M. A scheduling algorithm selects S(t) from S at timeslot t. We assume service happens in the middle of a timeslot, and packets arrive at the end of the timeslot. Letting Xij(t) denote the number of packets from input port i waiting to be sent to output port j, the system evolves according to the equation

    X(t + 1) = [X(t) − S(t)]^+ + A(t + 1)    (2)
where the [·]^+ operator applies to each (i, j) term. The vector notation can be viewed from a geometric perspective: in Fig. 2 the workload equation 2 is illustrated for the case where there are two queues in the switch.

We say that the workload is rate stable if the long term arrival rate is equal to the long term departure rate for every queue. A scheduling algorithm is throughput maximizing if it guarantees workload rate stability for any admissible rate vector λ. The primary interest in this work is the behavior of the buffer size vector X(t) under various scheduling algorithms. We focus on two distinct classes of algorithms and highlight both the differences between these classes and the control within each class.

For all of the algorithms presented here, the selection S(t) is either fixed by X(t) or randomly selected independent of X(t) and of t. Additionally, we have asserted that arrivals follow a Bernoulli distribution. Consequently, the transition probabilities of the workload from time t to time t + 1 depend only on the state X(t) at time t and not on t itself. Therefore X(t) is a Markov chain in N² dimensions. This is an important observation since it establishes the existence of a steady state for the system (when stable), even if that steady state is prohibitively difficult to calculate.
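The evolution equation (2) can be sketched in a few lines of Python. This is an illustrative toy, not code from the paper: the dictionary keyed by (i, j) pairs mirrors the vector labelling above, and the fixed identity matching used in the toy run is a placeholder for a real scheduling decision.

```python
import random

N = 4  # ports; queues are indexed by (i, j) input-output pairs

def step(X, S, lam):
    """One timeslot of equation (2): X(t+1) = [X(t) - S(t)]^+ + A(t+1)."""
    X_next = {}
    for (i, j), x in X.items():
        served = max(x - S[(i, j)], 0)  # mid-slot service, with the [.]^+ operator
        arrival = 1 if random.random() < lam[(i, j)] else 0  # end-of-slot Bernoulli arrival
        X_next[(i, j)] = served + arrival
    return X_next

# A toy run: uniform admissible rates and one fixed matching (input i to output i).
lam = {(i, j): 0.2 for i in range(N) for j in range(N)}
X = {(i, j): 0 for i in range(N) for j in range(N)}
S = {(i, j): 1 if i == j else 0 for i in range(N) for j in range(N)}
for _ in range(100):
    X = step(X, S, lam)
print(all(x >= 0 for x in X.values()))  # workloads stay nonnegative
```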
III. RANDOMIZED ALGORITHMS

Randomized algorithms and their performance were the focus of [10]. They select the service configuration at each timeslot according to a fixed probability distribution. Specifically, the policy at each timeslot is to

    choose configuration S^m with probability φm.

The values of the probabilities {φm}, m = 1, ..., M, can be calculated offline, or updated at various time steps, depending on the system requirements. For implementation, one can see that each timeslot does not require complex calculation. Randomized algorithms simply choose a configuration from a predetermined distribution without taking the workload X(t) into account. This is equivalent to a 'coin flip' operation with a multi-sided coin.

Theorem 3.1: For any admissible λ, there exists a set {φm}, m = 1, ..., M, such that the randomized algorithm has a long term service rate to each queue larger than the long term arrival rate to that queue.

Theorem 3.1 is a statement of the throughput maximization of randomized algorithms, and was proved in [10].

A. QoS for Randomized Algorithms

As was observed earlier, randomized algorithms cause the workload to follow a Markov chain. A helpful analytical observation in [10] reveals that, considering each input-output pair individually, the model decouples nicely into separate dimensions. This makes calculating the distribution of workloads tractable even for large switches. Consider a fixed (i, j) pair alone; the corresponding queue is served whenever the chosen service configuration includes (i, j). Since at each timeslot the service is chosen randomly and independently, this service is governed by a Bernoulli distribution with probability

    S̄ij = Σm φm S^m_ij > λij.    (3)

With arrivals also according to a Bernoulli distribution, the arrivals and departures to queue (i, j) form a one-dimensional random walk, or birth-death chain. The probability distribution for the queue length of such a sequence is well defined. Based on this limiting distribution, the expected queue length for each queue can be found:

    E[Xij] = λij / (S̄ij − λij).    (4)

Since λij is fixed and S̄ij depends on {φm}, network designers can choose {φm} to prioritize service. Consider the case where each queue incurs a cost proportional to its average queue length. These costs need not be the same for each queue, allowing different priorities for each queue. Let cij denote the cost per cell for the average queue length of queue (i, j). In [10] we show how this leads to a convex optimization problem with linear constraints. We formulate the optimization problem (P):

    min over φ ∈ P of Σij cij E[Xij]    (5)

where P is the set of feasible {φm} sets and E[Xij] is the vector of expected buffer sizes derived from equation 4, which depends on φ and λ. Since randomized algorithms are the only class of algorithms for which such a formulation has been demonstrated, they provide an excellent base case for analyzing the performance of other, more complex scheduling algorithms. They also provide valuable intuition into the type of performance shaping that can be utilized for crossbar packet switches.

Fig. 3. Workload traces for randomized algorithms. The plots show the workload traces (queue (1,1) and the average queue) of a simulated four-by-four packet switch operating under randomized algorithms, over 1000 timeslots. The same arrival trace is used for each of the two plots, with slightly different probability distributions. In the upper plot, the same probability φm = 1/(N!) is assigned to each service configuration, so no priority is given to any individual queue. In the lower plot, the optimal probability distribution is used under the assumption that queue one has twice the value of any other queue (c11 = 2, cij = 1 otherwise), giving higher priority to the queue connecting input one to output one. Improved performance for that queue is seen in its lower workload relative to the other queues.

Fig. 3 illustrates the QoS control using randomized algorithms. The same arrival trace is used under two probability sets {φm}. The first distribution assigns equal probability to all configurations, while the second applies a higher probability to those configurations which serve the input-output pair (1, 1). The trace of the workload in queue (1, 1) and the average workload over all queues is included in both cases. It is seen that the higher priority case leads to a lower average workload for that buffer, and that this priority also increases the average overall buffer size.
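A hedged sketch of the randomized policy and the per-queue QoS quantities may help make this concrete. It is our own illustration, not code from [10]: the uniform distribution {φm} below is just an example (not a solution of problem (P)), and the expected-backlog function uses the birth-death expression λij/(S̄ij − λij) from equation (4).

```python
import random
from itertools import permutations

N = 3
# Enumerate all M = N! matchings S^m, each stored as the set of (i, j) pairs it serves.
matchings = [{(i, perm[i]) for i in range(N)} for perm in permutations(range(N))]

def service_rate(phi, i, j):
    """Equation (3): S_bar_ij = sum over m of phi_m * S^m_ij."""
    return sum(p for p, S in zip(phi, matchings) if (i, j) in S)

def expected_backlog(phi, lam_ij, i, j):
    """Equation (4): E[X_ij] = lam_ij / (S_bar_ij - lam_ij),
    valid when the service rate exceeds the arrival rate."""
    return lam_ij / (service_rate(phi, i, j) - lam_ij)

def choose_configuration(phi):
    """The randomized policy: a 'coin flip' over the fixed distribution {phi_m},
    ignoring the current workload X(t)."""
    return random.choices(matchings, weights=phi, k=1)[0]

phi = [1 / len(matchings)] * len(matchings)  # uniform over all matchings
print(round(service_rate(phi, 0, 0), 4))     # uniform phi serves each pair at rate 1/N
```

Skewing probability mass toward matchings containing (1, 1), as in the lower plot of Fig. 3, raises S̄11 and hence lowers E[X11] at the expense of the other queues.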
[Figure: geometric representation of PCS in the (X1, X2) workload plane, showing cones C2, C3 and their corresponding service vectors S2, S3.]
IV. PROJECTIVE CONE SCHEDULING (PCS) ALGORITHMS

Projective cone scheduling (PCS) algorithms are a rich class of algorithms for packet switch scheduling which have been shown to be very robust in their performance. Unlike randomized algorithms, PCS algorithms do not rely on knowledge of the long term arrival rate λ, but adapt the scheduling of the switch according to the cells waiting in each timeslot. These were presented in [8] and variations appear in [7], [9]. Good intuition for understanding PCS algorithms comes from the geometric representation of the vectors in the system.

Definition 4.1: Given a fixed N² × N² matrix B, the projective cone scheduling (PCS) algorithm is the scheduling algorithm which selects and activates a service vector S(t) ∈ S such that

    ⟨S(t), BX(t)⟩ = max over S ∈ S of ⟨S, BX(t)⟩    (6)

when the backlog vector is X(t) ∈ R₊^{N²}.
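The selection rule (6) can be sketched as follows. This is our own illustration under a simplifying assumption: B is taken to be diagonal with positive entries bij, so that ⟨S, BX⟩ reduces to Σij Sij bij Xij and the rule becomes a weighted maximum weight matching; the brute-force search over all N! matchings is for clarity only, not an efficient implementation.

```python
from itertools import permutations

N = 3

def pcs_schedule(X, b):
    """Equation (6) with diagonal B: pick the matching S maximizing
    sum over (i, j) of S_ij * b_ij * X_ij, scored by brute force."""
    best_perm, best_score = None, float("-inf")
    for perm in permutations(range(N)):  # all N! matchings: input i -> output perm[i]
        score = sum(b[i][perm[i]] * X[i][perm[i]] for i in range(N))
        if score > best_score:
            best_perm, best_score = perm, score
    # Return the chosen matching as an N-by-N 0/1 service configuration.
    return [[1 if best_perm[i] == j else 0 for j in range(N)] for i in range(N)]

# Backlogs: queues (0, 1) and (1, 0) are long, so PCS serves them.
X = [[0, 9, 1], [8, 0, 1], [1, 1, 2]]
b = [[1] * N for _ in range(N)]  # uniform weights: plain maximum weight matching
print(pcs_schedule(X, b))        # → [[0, 1, 0], [1, 0, 0], [0, 0, 1]]
```

Raising a single weight bij biases the schedule toward serving queue (i, j), giving the same kind of per-queue priority control as the cost vector cij does for randomized algorithms, but reacting online to the actual backlog X(t).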