Journal of High Speed Networks 12 (2002) 87–109 IOS Press
A network architecture for providing per-flow delay guarantees with scalable core

Prasanna Chaporkar^a and Joy Kuri^b

^a Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, PA 19104, USA. Tel.: 1-267-257-3086; E-mail: [email protected]
^b Center for Electronics Design and Technology, Indian Institute of Science, Bangalore 560 012, India. E-mail: [email protected]

Abstract. Many real-time applications demand delay guarantees from the network. A network architecture designed to support these applications should be robust and scalable. The IntServ architecture provides per-flow QoS, but at the cost of robustness and scalability. The DiffServ architecture is robust and scalable, but can provide QoS only at a class level and not at a flow level. In this paper, our aim is to design architectures that are scalable and robust like DiffServ and at the same time able to provide per-flow QoS like IntServ. We propose a non work-conserving and a work-conserving architecture to achieve this goal. The guaranteeable delay regions of these architectures are the same as those of GPS based policies with rate proportional resource allocation. We also propose a scheme to provide meaningful throughput and responsiveness to best effort traffic even in the presence of heavy QoS load.

Keywords: QoS, differentiated services, scalable core, delay guarantee, schedulability
1. Introduction

1.1. Motivation

Today's Internet provides a single-class best effort service and no admission control. Routers in such an architecture do not maintain any state, except routing state that is highly aggregated. This allows the Internet to scale with the size of the network and with heterogeneous applications and technologies. Given the huge infrastructure and scope of the Internet, it is desirable to extend the class of provided services to include real-time services that need specific Quality of Service (QoS) guarantees from the network; providing such guarantees is one of the limitations of the current architecture. The Internet Engineering Task Force (IETF) proposed two architectures to accomplish this: the Integrated Services (IntServ) [3] and Differentiated Services (DiffServ) [10] architectures. Both have the following common features:

• The users specify a traffic profile and required QoS.
• The network promises a certain service profile that will be provided to a flow as long as the user adheres to his promised traffic profile.
• Inside the network, routers implement different packet scheduling and buffer management schemes.
• The packet header carries information about the treatment that the packet should receive.

In spite of the same underlying architecture in principle, the key difference between IntServ and DiffServ is that IntServ provides per-flow QoS while DiffServ provides QoS to traffic aggregates or classes. IntServ achieves this fine granularity of operation by (1) processing per-flow messages and maintaining per-flow data forwarding and QoS state on the control path, and (2) performing per-flow classification, scheduling and buffer management on the data path.
Fig. 1. Arbitrary network topology showing edge routers and core routers.
The complexity of implementing these functionalities affects the scalability and robustness of the architecture. This can be illustrated as follows. Figure 1 shows a network with an arbitrary topology. The squares indicate the network users. Users establish connections among themselves using the network resources. The filled circles on the elliptical line indicate edge routers and the hexagons indicate core routers. By convention, the edge of a network is a collection of routers that are connected to users directly. The general assumptions are that the access links (connecting the users to the edge) are thin, i.e., they have small capacity, and that the edge routers may not operate at very high speeds. In the core of a network (inside the elliptical boundary), the bit pipes are thick and the routers are high speed. At each core router, the number of flows is very large [36]. The complexity of per-flow operations usually increases as a function of the number of flows. Hence, performing per-flow operations in the core at high speed is not a viable alternative. Also, maintaining consistency in dynamic and replicated per-flow state in a distributed network environment is an arduous task and results in a less robust architecture [5,34].

The DiffServ architecture provides a more scalable solution. DiffServ achieves scalability by pushing complex per-flow operations like policing (making sure that a user's actual traffic adheres to the specified characterization), per-flow state management and stateful scheduling to the edge. Core routers do not perform per-flow operations, but process each packet independently based on a small number of Per Hop Behaviors (PHBs) encoded in the packet header. This makes the data plane of core routers very simple and hence scalable. But the scalability is achieved at the cost of coarse granularity of operation: DiffServ can only provide class-based QoS, and per-flow guarantees are not possible without extra machinery. Such machinery should maintain the scalability of the architecture. Our aim is to devise one such solution.

In this paper, we propose two architectures for providing per-flow guarantees without maintaining per-flow state or performing per-flow operations in the core. Both architectures use the concept of Dynamic Packet State [34]. These architectures are:

(1) A non work-conserving architecture (Architecture A) using the Rate Controlled Service (RCS) discipline [43] (in our case, a shaper followed by an Earliest Deadline First (EDF) scheduler) that outperforms GPS based schedulers [18].
(2) A work-conserving architecture (Architecture B) using VirtualClock (VC) [44] at ingress edge routers and EDF at core and egress edge routers.

In the next subsection we survey the related work and state our contributions.

1.2. Related work and our contributions

The idea of the DiffServ architecture to support class-based QoS was first proposed in [5] and [27]. The different approaches proposed for providing service differentiation can be found in [9,11,12,33,38,39]. An approximation to fair queuing scheduling with a stateless core (referred to as "SCORE") is given in [32]. The concept of the SCORE
architecture and DPS that we have used is introduced in that paper. None of these solutions is capable of providing per-flow delay guarantees.

The work closest to ours is that of [34], where an architecture to provide per-flow delay guarantees with a SCORE is also obtained. [34] uses Core-Jitter VirtualClock (CJVC) to achieve per-flow delay guarantees in a SCORE. CJVC eliminates the requirement of maintaining the finish time of the previous packet of a flow for calculating the finish time of the present packet by using a "slack value". This value has to be calculated for every packet at the edge router. The DPS consists of this slack value, the time ahead and the rate reserved for the flow. Further, each router needs a Jitter Clock (JC). A comparison between the existing architectures in the literature and the ones in this paper rests on three major issues: the amount of per-packet overhead, the work-conserving or non-work-conserving nature of the schedulers, and the scheduling policies implemented.

• Some advantages of the architectures given in this paper are obvious, e.g., we save the cost of calculating the slack value for every packet. The DPS has one less variable, which reduces the overhead per packet and thereby increases efficiency and throughput.
• Work-conserving disciplines provide better throughput and smaller average delays than non-work-conserving disciplines, and we save the cost of implementing Jitter Clocks (JCs) at each router. However, non-work-conserving disciplines achieve more efficient usage of the buffer space in the network; this is their most important advantage over work-conserving disciplines. Usually, scheduling disciplines are implemented using a priority queue: when a new packet arrives at the scheduler, it is assigned a finish time and then inserted at its appropriate place in the priority queue. So, the smaller the buffer occupancy, the smaller the ordering overhead in the priority queue. In the core, minimizing this overhead is very important. Also, non-work-conserving disciplines provide jitter guarantees [37]. However, in our work-conserving architecture (Architecture B), the same jitter guarantees can be provided using a JC at the egress edge router on the path.
• In CJVC, with VC in the core, excellent protection from misbehaving sources is achieved. In our preliminary architecture (Architecture A), the shaper at the edge acts as a policing device. This prevents the degradation of service because of misbehaving sources, even though EDF does not provide such protection explicitly. In our Architecture B, the VC at the edge shapes the traffic for the EDF in the core. Thus, all three architectures provide excellent isolation.

As discussed in [18], RCS disciplines (shapers + EDF scheduler) can provide the same end-to-end delay guaranteeable region as Packet Generalised Processor Sharing (PGPS). EDF scheduling is appealing since, for a single router, EDF is known to provide the largest guaranteeable region among all scheduling policies [18]. Furthermore, very efficient implementations of EDF are available in hardware and software [26]; this is a vital point in high-speed cores. Unfortunately, since we cannot maintain per-flow state in the core, the "large guaranteeable region" property of EDF cannot be utilized to accommodate more flows; doing so would require complicated admission control criteria that need per-flow state.

The remainder of the paper is organized as follows.
The system model and required notation are introduced in Section 2. In Section 3, we derive and discuss some results and concepts that are utilised in later sections. The main result given here is the "Schedulability Theorem", and this is followed by a discussion of the concepts of "Priority Chains" and "Departure Chains". Since our architectures use EDF and RCS disciplines based on EDF, Section 4 contains a brief review of these policies and some results based on schedulability theory. In Section 5, we give the preliminary non work-conserving architecture for providing per-flow QoS with a scalable core. In Section 6, we obtain an equivalent work-conserving architecture. A scheme for providing meaningful throughput and responsiveness to best effort traffic is obtained in Section 7. Finally, we conclude in Section 8.

2. System model and notation

The sequence of data packets going from a certain source $SO_i$ to a certain destination $DE_i$ is called a flow $i$ [44]. The collection of all the active flows in a network is called the flow vector and is denoted by $F$. We assume that each
flow follows a path predefined at the time of connection establishment. This means that at every router, there exists a function that maps a packet from a given flow to an output link. This function does not change with the state of the network once the flow is set up. We assume that any flow $i$ has $n_i$ hops in its path, i.e., it passes through $n_i$ routers before ultimately reaching $DE_i$. The $m$th hop in the path of flow $i$ is denoted by $P_{i,m}$. The path of flow $i$ is denoted by the ordered set $P_i = \{P_{i,m} : m \in \{1, 2, \ldots, n_i\}\}$.

The sources assumed here are packet sources, i.e., data arrives at a router in chunks that have to be treated as single entities. The packet length from any source can be arbitrarily small but is bounded above. Some notation is given next: $p_k^{(i)}$ is the $k$th packet from flow $i \in F$. $a_k^{(i)}$ is the arrival time of $p_k^{(i)}$ at a router. (Subsequently, we will include additional notation to specify which router the packet arrives to.) $l_k^{(i)}$ is the length of $p_k^{(i)}$, and $l_{max}^{(i)}$ is the upper bound on the length of any packet from flow $i$. $\mathbb{N}$ is the set of natural numbers. $\mathbb{R}$ and $\mathbb{R}^+$ are the set of real numbers and the set of non-negative real numbers, respectively.

In general, we define $\mathcal{A}_i$ to be the set of all possible instances of the arrival process from a flow $i \in F$. The traffic characterization of a flow $i$ defines $\mathcal{A}_i$. Any $A_i \in \mathcal{A}_i$ can be thought of as a sequence of doublets $\{(a_k^{(i)}, l_k^{(i)}), k \in \mathbb{N}\}$. To illustrate, we take the example of leaky bucket constrained sources [6]. These sources are characterized by two parameters: the bucket depth ($\sigma > 0$, in bytes) and the token replenishment rate ($\rho > 0$). In this case, the sequences that satisfy the following constraint are valid arrival instances, and their collection is the arrival process space $\mathcal{A}_i$:

$$\sum_{u=k}^{k+j} l_u^{(i)} \le \sigma + l_{max}^{(i)} + \rho \cdot \big(a_{k+j}^{(i)} - a_k^{(i)}\big) \quad \forall k, j \in \mathbb{N}.$$
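To make this characterization concrete, the following sketch (ours, not from the paper; the function name and trace format are illustrative) checks a finite arrival trace against the constraint above:

```python
def conforms_to_leaky_bucket(arrivals, sigma, rho, l_max):
    """Check a finite arrival trace against the leaky bucket constraint.

    `arrivals` is a list of (a_k, l_k) doublets (arrival time, packet
    length) with non-decreasing arrival times.  The trace is a valid
    instance of A_i iff, for every pair k <= k+j, the lengths arriving
    in [a_k, a_{k+j}] sum to at most
    sigma + l_max + rho * (a_{k+j} - a_k).
    """
    n = len(arrivals)
    for k in range(n):
        total = 0.0
        for j in range(k, n):
            total += arrivals[j][1]                  # accumulate l_u for u = k..j
            gap = arrivals[j][0] - arrivals[k][0]
            if total > sigma + l_max + rho * gap:
                return False
    return True

# Example: a 3-packet trace against (sigma = 2000 B, rho = 1000 B/s,
# l_max = 1500 B).  Prints True.
print(conforms_to_leaky_bucket([(0.0, 1500), (0.5, 1500), (2.0, 1000)],
                               sigma=2000, rho=1000, l_max=1500))
```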
We observe that the space $\mathcal{A}_i$ is very large. Consider a single packet arrival with the length of the packet being less than $l_{max}^{(i)}$. This instance is valid irrespective of the arrival time of the packet. Since the arrival can occur at any $t \in \mathbb{R}$, it is clear that the space is at least uncountably infinite.

Let $\mathcal{A}$ be the set of all possible instances of the aggregate arrival process from all the flows as seen by the network. Each element $\vec{A}$ of $\mathcal{A}$ can be thought of as an $N$-tuple, where each component is a sequence: $\vec{A} = (A_1, A_2, \ldots, A_N)$, where $A_i \in \mathcal{A}_i$ $\forall i \in F$. Here $N$ is the cardinality of $F$ (denoted as $|F|$).

Consider any router $m$ in the network. The router employs non-blocking output queuing. We will assume store and forward switching, i.e., a packet arrives when its last bit arrives. An input buffer stores an incoming packet and, as soon as its last bit arrives, passes the packet to the switching fabric. The switching fabric identifies the flow to which the packet belongs and then switches it to an appropriate output link instantaneously. A scheduling policy at the output link stamps a finish time on each packet and passes it to an output buffer. This buffer arranges packets in increasing order of their finish times for a given link, and they are then transmitted in that order. The output link is modeled as a server with a capacity equal to the capacity of the link, which is denoted by $C$. The server is non-idling, i.e., the server is never idle when there are packets to serve. A packet is said to have departed when its last bit leaves the router.

To indicate the quantities corresponding to a router $m$, we put $m$ in brackets. For example, $a_k^{(i)}(m)$ indicates the arrival time of $p_k^{(i)}$ at router $m$. The collection of flows passing through a router $m$ is the flow vector at $m$ and is denoted by $F(m)$, i.e., $F(m) = \{i \in F : m \in P_i\}$. $C(m)$ denotes the server capacity at router $m$.

A router implements some scheduling policy at each output link (a comprehensive summary of scheduling policies can be found in [22,42]). The scheduling policies considered here are dynamic [25], i.e., any scheduling policy $\pi$ is associated with a function $f_{\pi(\phi)}$ that assigns a real number to every incoming packet. Here $\phi$ is a set of parameters that the scheduling policy $\pi$ takes as input; e.g., VirtualClock (VC) [44] takes a rate reserved for each $i \in F$, Earliest Deadline First (EDF) [25] takes a deadline assigned to each $i \in F$, etc. This assigned real number acts as a priority indicator that is stamped dynamically when a packet arrives. So, without loss of generality, it can be thought of as the time at which a packet ought to finish its service. We will call this real number the "finish time". Under any scheduling policy $\pi(\phi)$, $f_k^{(i)}(\pi(\phi), m)$ is the finish time of $p_k^{(i)}$ at router $m$. For better readability we shall omit terms in brackets whenever there is no chance of ambiguity.
$s_k^{(i)}(\pi(\phi), m)$ is the time at which $p_k^{(i)}$ goes into service for the first time at router $m$, and $d_k^{(i)}(\pi(\phi), m)$ is the departure time of $p_k^{(i)}$ from router $m$. We assume that a scheduling policy provides in-order delivery for packets from the same flow.

At router $m$, we define two more processes for a given flow $i$, namely the Priority Process and the Departure Process. An instance of the priority process is the sequence $\{(f_k^{(i)}(m), l_k^{(i)}), k \in \mathbb{N}\}$ and an instance of the departure process is the sequence $\{(d_k^{(i)}(m), l_k^{(i)}), k \in \mathbb{N}\}$. Here $f_k^{(i)}(m)$ and $d_k^{(i)}(m)$ are the finish time and the departure time of $p_k^{(i)}$ at router $m$, respectively. In general, these processes depend upon the aggregate arrival process into the network, because the scheduling policy at a router can consider all the flows arriving at the router in order to compute the finish time of a packet. If we assume the propagation delay to be zero, then the arrival process of flow $i$ at the router $P_{i,m}$ (say) is the departure process at the immediate upstream router $P_{i,m-1}$.
3. Preliminaries

In this section we introduce some definitions and concepts required for further discussion. Here and in subsequent sections we analyze preemptive scheduling policies first and then extend the results to non-preemptive scheduling policies, as in [21]. We view a preemptive policy $\pi_P$ and a non-preemptive policy $\pi_{NP}$ as two versions of the same scheduling policy $\pi$, i.e., $f_\pi$ is the function used to calculate finish times for both versions and hence

$$f_k^{(i)}(\pi) = f_k^{(i)}(\pi_P) = f_k^{(i)}(\pi_{NP}) \quad \forall k \in \mathbb{N} \text{ and } \forall i \in F.$$
Fix $\vec{A} = (A_1, A_2, \ldots, A_i, \ldots, A_N)$. Let $w_{i,\vec{A}}^{\pi(\phi)}[t, t+\tau](m)$ be the work brought in by a flow $i$ in an interval $[t, t+\tau]$ at router $m$ that has finish time $\le (t+\tau)$ under scheduling policy $\pi(\phi, m)$, for any instance $\vec{A} \in \mathcal{A}$. Then,

$$w_{i,\vec{A}}^{\pi(\phi)}[t, t+\tau](m) = \sum_{\{k \,:\, a_k^{(i)}(m),\, f_k^{(i)}(\pi(\phi),m) \,\in\, [t, t+\tau]\}} l_k^{(i)}.$$

We call this quantity the "live work" in the interval $[t, t+\tau]$ under arrival instance $\vec{A} \in \mathcal{A}$ at a router $m$. Further, let

$$w_{\vec{A}}^{\pi(\phi)}[t, t+\tau](m) = \sum_{\{i \,:\, i \in F(m)\}} w_{i,\vec{A}}^{\pi(\phi)}[t, t+\tau](m),$$

and

$$W^{\pi(\phi)}[t, t+\tau](m) = \sup_{\vec{A} \in \mathcal{A}} \big\{ w_{\vec{A}}^{\pi(\phi)}[t, t+\tau](m) \big\}.$$
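As an illustration of these definitions, the sketch below (ours; a finite-trace spot check only, since the theorem in the next subsection quantifies over all intervals and all arrival instances) computes the live work of a packet trace:

```python
def live_work(packets, t, tau):
    """Live work w[t, t+tau]: total length of packets whose arrival time
    AND finish time both fall inside [t, t+tau]."""
    return sum(l for (a, f, l) in packets
               if t <= a <= t + tau and t <= f <= t + tau)

def schedulable_on_trace(packets, capacity, checkpoints):
    """Spot-check the condition w[t, t+tau] <= tau * C over a finite
    grid of (t, tau) pairs.  This is only a necessary check for one
    trace; the Schedulability Theorem requires the supremum over the
    whole arrival process space."""
    return all(live_work(packets, t, tau) <= tau * capacity
               for (t, tau) in checkpoints)
```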
It is clear that the function $w_{i,\vec{A}}^{\pi(\phi)}[t, t+\tau](m)$ depends on the scheduling policy $\pi(\phi, m)$, an aggregate arrival instance $\vec{A} \in \mathcal{A}$, an epoch $t$ and a period $\tau$. Using these definitions, we now discuss the concept of schedulability.

3.1. Schedulability of arrival process

A scheduling policy assigns a finish time to each packet. But in case of overbooking of resources and/or misbehaviour of sources (misbehaving sources are those that send more data than is allowed under their declared characterization), packets may get delayed beyond their respective finish times.
Definition 1. A packet is schedulable at a router under a policy $\pi_P(\phi)$ if it leaves the router before or at, but never after, its finish time under every arrival pattern $\vec{A} \in \mathcal{A}$. If all packets arriving at a router leave before their respective finish times under any $\vec{A} \in \mathcal{A}$, then we say that the arrival process $\mathcal{A}$ is schedulable under $\pi_P(\phi)$.

Next we obtain a necessary and sufficient condition for schedulability under preemptive scheduling policies that satisfy the ordering property. This property is also discussed in [29] and [18]. For the sake of completeness, we state the definition here. But before that, we simplify some notation for better readability: $p_k(m)$ without a superscript indicating a flow means the $k$th packet at router $m$, and it can be from any flow in $F(m)$. So, for a given router, $a_k$, $l_k$, $f_k$ and $d_k$ mean the arrival time, length, finish time and departure time of the packet $p_k$, respectively.

Definition 2. Let two packets $p_k$ and $p_u$ be backlogged in a system at any time instant $t$. A dynamic scheduling policy $\pi$ is said to have the ordering property if the ordering of the finish times corresponding to packets $p_k$ and $p_u$ is preserved irrespective of future arrivals. That is, if $f_k \le f_u$ at $t$, then $f_k \le f_u$ for any sequence of future arrivals.

A trivial observation is that for a packet to be schedulable, its finish time should be greater than or equal to its arrival time. This is because a packet cannot leave before it arrives at the router.

Theorem 1 (Schedulability Theorem). An arrival process $\mathcal{A}$ is schedulable under a preemptive policy $\pi_P(\phi)$ having the ordering property iff

$$W^{\pi(\phi)}[t, t+\tau] \le \tau \cdot C \quad \forall t \text{ and } \tau \in \mathbb{R}^+. \tag{1}$$
Proof. (⇒) Since all the packets are schedulable under any instance of the aggregate arrival process, the packets depart before or at their respective finish times. So, for arbitrary $t$ and $\tau$,

$$w_{\vec{A}}^{\pi(\phi)}[t, t+\tau] \le \tau \cdot C \quad \forall \vec{A} \in \mathcal{A},$$
$$\Rightarrow \quad \sup_{\vec{A} \in \mathcal{A}} w_{\vec{A}}^{\pi(\phi)}[t, t+\tau] \le \tau \cdot C,$$
$$\Rightarrow \quad W^{\pi(\phi)}[t, t+\tau] \le \tau \cdot C.$$
(⇐) The proof is by contradiction. If the condition is not sufficient for schedulability, then there exists an arrival instance $\vec{A} \in \mathcal{A}$ and a packet $p_k$ such that $p_k$ is not schedulable. Since $\vec{A}$ is fixed, we will not write it explicitly. Define, for this arrival instance $\vec{A}$, $T = \{T_i\}$, where $T_i$ denotes the instant when the $i$th system busy period starts. We assume that the system was empty at $t = 0$, and we know that the packet $p_k$ arrives at $a_k$; so the set $T$ is not empty. Define

$$T_k = \max_i \{T_i \in T : T_i \le a_k\}.$$

That is, we find $T_k$ such that the packet $p_k$ belongs to the busy period starting from the epoch $T_k$.
Fig. 2. Example indicating the positions of $T_k$ and $\hat{T}_k$ on the time axis.
Since the server is work-conserving, it is possible that some packets $\hat{p}_1, \hat{p}_2, \ldots, \hat{p}_n$ with $\hat{f}_i > f_k$ $\forall i \in \{1, 2, \ldots, n\}$ are scheduled in the interval $[T_k, f_k]$. We define $\hat{T}_k$ to be the smallest time instant such that

• $\hat{T}_k \in [T_k, a_k]$;
• no packet with finish time $> f_k$ is scheduled in the interval $[\hat{T}_k, d_k]$.

Such a $\hat{T}_k$ always exists because $a_k$ is one such epoch if a packet with finish time $> f_k$ is in service at $a_k$. Since $p_k$ is not schedulable, $d_k > f_k$. So, in the interval $[\hat{T}_k, f_k]$, packets with finish time $\le f_k$ are the only ones that are served. $\hat{T}_k$ is defined precisely and will not change with new arrivals, as the scheduling policy satisfies the ordering property.

Claim. All the packets served in the interval $[\hat{T}_k, f_k]$ contribute to $w_{\vec{A}}^{\pi(\phi)}[\hat{T}_k, f_k]$.

Assume the claim to be true; for better readability, we prove it below as Lemma 1. Then we note that service of the live work $w_{\vec{A}}^{\pi(\phi)}[\hat{T}_k, f_k]$ starts at $\hat{T}_k$. Since $p_k$ is not schedulable under $\pi_P(\phi)$,

$$W^{\pi(\phi)}[\hat{T}_k, f_k] \ge w_{\vec{A}}^{\pi(\phi)}[\hat{T}_k, f_k] > (f_k - \hat{T}_k) \cdot C. \tag{2}$$

Equation (2) holds because:
(1) From $\hat{T}_k$, no packet with finish time $> f_k$ is scheduled until $p_k$ departs. That is, the packets scheduled after $\hat{T}_k$ until $p_k$ departs are only those that contribute to $w_{\vec{A}}^{\pi(\phi)}[\hat{T}_k, f_k]$.
(2) The service of $w_{\vec{A}}^{\pi(\phi)}[\hat{T}_k, f_k]$ starts from $\hat{T}_k$ and continues beyond $f_k$, as $p_k$ contributes to $w_{\vec{A}}^{\pi(\phi)}[\hat{T}_k, f_k]$ and $p_k$ is not schedulable.

But (2) contradicts (1), so no such $p_k$ exists under any $\vec{A} \in \mathcal{A}$. □
Lemma 1. All the packets served in the interval $[\hat{T}_k, f_k]$ contribute to $w_{\vec{A}}^{\pi(\phi)}[\hat{T}_k, f_k]$.

Proof. We use the notation defined in the proof of Theorem 1 above. To prove Lemma 1, we need to show that (1) the arrival times of all such packets lie in the interval $[\hat{T}_k, f_k]$, and (2) the finish times of all such packets lie in the interval $[\hat{T}_k, f_k]$. The second condition is a direct consequence of the definition of $\hat{T}_k$. The first condition can be seen as follows. Since the finish times of all such packets lie in $[\hat{T}_k, f_k]$, their arrival times cannot be greater than $f_k$. Hence the arrival time of any such packet is either $< \hat{T}_k$ or in $[\hat{T}_k, f_k]$. We show by contradiction that the arrival time of such packets cannot be less than $\hat{T}_k$.

Consider a packet which was served (i.e., finished service) in $[\hat{T}_k, f_k]$, but which arrived before $\hat{T}_k$ (if possible). Let $p_u$ be this packet and suppose it arrived at $a_u < \hat{T}_k$. Now, the definition of $\hat{T}_k$ says that it is the earliest instant such that every packet that begins service in $[\hat{T}_k, f_k]$ has a finish time $\le f_k$. So, if we take an instant $x < \hat{T}_k$, we can claim that there is at least one packet (say $\tilde{p}$) among those beginning service in $[x, f_k]$ that has a finish time $> f_k$.
We can assert this for any $x \in [a_u, \hat{T}_k]$. Moreover, since packets begin service in the order of their finish times, it is clear that $p_u$ must have finished service before $\tilde{p}$, because $p_u$ has a finish time $\le f_k$. Thus, we are forced to conclude that $p_u$ finished service before $\hat{T}_k$, and this contradicts the statement that we started with. This proves Lemma 1. □

Next, we examine the schedulability of arrival processes when scheduling policies are non-preemptive. Even though the busy periods of $\pi_P$ and $\pi_{NP}$ coincide for a work-conserving server (a system busy period is the maximal continuous time interval for which the server is never idle), i.e., the work in the system at any epoch $t$ is the same for both, the sequences in which packets are scheduled might be very different. So, the same definition of schedulability cannot be used for non-preemptive scheduling policies.

Definition 3. A packet $p_k$ is schedulable under $\pi_{NP}(\phi)$ if it is schedulable under $\pi_P(\phi)$ and, for any $\vec{A} \in \mathcal{A}$ and for some fixed $\beta \in [0, \infty)$,

$$d_k(\pi_{NP}(\phi)) \le f_k(\pi_P(\phi)) + \beta \quad \forall k \in \mathbb{N}. \tag{3}$$

In Theorem 1 of [21], it has been shown that $d_k(\pi_{NP}(\phi)) - f_k(\pi_P(\phi)) \le L_{max}/C$ if the arrival process is schedulable under $\pi_P(\phi)$. Here $L_{max} = \max_{i \in F(m)} \{l_{max}^{(i)}\}$. So, if all the packets are schedulable under $\pi_P(\phi)$, then they are always schedulable under $\pi_{NP}(\phi)$. Thus we have the following result.

Lemma 2. An arrival process is schedulable under $\pi_P(\phi)$ iff it is schedulable under $\pi_{NP}(\phi)$.

The proof follows directly from the above discussion. Lemma 2 establishes the equivalence of preemptive and non-preemptive versions of scheduling policies with regard to schedulability.

3.2. Departure chains and priority chains

In general, the departure process of an upstream router is the arrival process at a downstream router. With respect to this arrival process, the priority process and the departure process at the downstream router are obtained. But characterizing the departure process is not simple (in many cases it is not tractable [7,41]). Priority processes, on the other hand, are governed by the rule used to calculate finish times and are simpler to analyze. This is the motivation behind defining the concept of priority chains.

Denote by $A_i^p(m)$ and $A_i^d(m)$ the arrival processes at router $m$ of flow $i$. The superscript "p" indicates that $A_i^p(m)$ is obtained by "pushing" the priority process of flow $i$ along the path. Pushing the priority process means that the priority process of the first router is the arrival process at the second router; with respect to this arrival process the priority process at the second router is calculated, and so on. In other words, the priority process at a router $m$ is always calculated by taking the priority processes of the routers upstream of $m$ as the arrival process. This is why it is called a "priority chain". Similarly, the superscript "d" indicates that the arrival process $A_i^d(m)$ is obtained by pushing the departure process. In practice, of course, $A_i^d(P_{i,m})$ represents the actual arrival process at a router. The idea is explained in Fig. 3 using an example. The aim is to analyze the network with the more tractable priority chains.

The terms corresponding to priority chains are indicated by a hat on top; e.g., $\hat{f}_k^{(i)}(\pi(\phi), m)$ means the finish time of $p_k^{(i)}$ under scheduling policy $\pi(\phi)$ at router $m$ when the priority chain is the arrival process at router $m$.
Fig. 3. The figure represents the path of some flow passing through four routers. The root A of the tree indicates the first router, which sees the arrivals from the source. The child nodes P and D indicate the priority and departure processes of the first router. If we trace a path A–P, it means that the priority process of router 1 is taken as the arrival process of router 2. Similarly, a path A–P–P–D indicates that the priority process of the first router is the arrival process of the second router; with this arrival process we obtain the priority process at router 2, and this acts as the arrival process at router 3. The final D indicates that the departure process of router 3 is the arrival process at the last router. Thus, A–P–P–P indicates the priority chain and A–D–D–D indicates the departure chain.
3.3. Virtual Clock (VC) scheduling policy

The VC algorithm was first proposed in [44]. This scheduling algorithm tries to achieve the equivalent of time division multiplexing in a packet switched network. The rule for computing the finish time of a newly arrived packet is as follows: at any router $m$, let $\vec{R} = \{r_1, r_2, \ldots, r_N\}$ be a vector of real numbers associated with the flow vector $F(m)$. Then the finish time of $p_k^{(i)}$ is given by

$$f_k^{(i)}(VC(\vec{R})) = \max\big\{a_k^{(i)},\, f_{k-1}^{(i)}(VC(\vec{R}))\big\} + \frac{l_k^{(i)}}{r_i},$$

$\forall i \in F(m)$ and $\forall k \in \mathbb{N}$, with $f_0^{(i)}(VC(\vec{R})) = 0$. The value $r_i$ can be thought of as the rate promised to, or reserved by, session $i$. So, for VC, the parameter set $\phi$ is this rate vector $\vec{R}$. Next, we state a simple result characterizing the priority process of VC; this will be used later in deriving our proposed architectures.

Lemma 3. Under the VC scheduling algorithm, the priority process of a flow $i$ conforms to leaky bucket shaped traffic with bucket depth $l_{max}^{(i)}$ and token replenishment rate $r_i$.

To prove Lemma 3, it suffices to show that for any $k, j \in \{1, 2, \ldots\}$ and $i \in F(m)$,

$$\sum_{u=k}^{k+j} l_u^{(i)} \le l_{max}^{(i)} + r_i \cdot \big(f_{k+j}^{(i)} - f_k^{(i)}\big).$$

This follows by simple algebraic computations; the detailed proof is available in [4]. Further, in [14,40], the authors have shown that when the server is not overbooked, i.e., $\sum_{i \in F(m)} r_i \le C(m)$, any arrival process is schedulable. That is, if the sum of the rates assigned to all the sessions is smaller than the server capacity, then every packet departs before its finish time.
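The update rule and Lemma 3 are easy to see in operation. A minimal sketch (ours; the doublet format follows Section 2):

```python
def vc_finish_times(arrivals, rate):
    """VirtualClock finish times for one flow.

    `arrivals` is a list of (a_k, l_k) doublets in arrival order; `rate`
    is the rate r_i reserved for the flow.  Implements
        f_k = max(a_k, f_{k-1}) + l_k / r_i,   with f_0 = 0.
    """
    finish_times = []
    f_prev = 0.0
    for a_k, l_k in arrivals:
        f_k = max(a_k, f_prev) + l_k / rate
        finish_times.append(f_k)
        f_prev = f_k
    return finish_times

# Three back-to-back 1500-byte packets at rate 1000 B/s give
# f = [1.5, 3.0, 4.5]: finish times spaced by l/r even though all packets
# arrive at t = 0, which is why the priority process is leaky bucket
# shaped with parameters (l_max, r_i).
print(vc_finish_times([(0.0, 1500), (0.0, 1500), (0.0, 1500)], rate=1000))
```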
4. EDF and EDF based RCS disciplines In this section, we provide a brief review of the Earliest Deadline First (EDF) scheduling policy and Rate Controlled Service (RCS) disciplines based on EDF. We also present some results on schedulability with regard to these scheduling policies that will be used in later sections.
4.1. Earliest Deadline First (EDF) scheduling policy

EDF is one of the oldest scheduling policies proposed [25]. It is a delay based scheduling policy. The salient feature of the policy is its simplicity of implementation, which makes it a very good choice for high speed operation [26].

Fix an arrival instance $\vec{A} \in \mathcal{A}$. The general rule for calculating the finish time of $p_k^{(i)}$ at the hop $P_{i,m}$ where $EDF(\vec{D}(P_{i,m}))$ is implemented is:

$$f_k^{(i)}(EDF(\vec{D}), P_{i,m}) = \max\big\{a_k^{(i)}(P_{i,m}),\, f_k^{(i)}(\pi(\phi), P_{i,m-1})\big\} + D_i(P_{i,m}). \tag{4}$$

$\vec{D}(P_{i,m})$ is a real valued vector taken as a parameter set by the EDF scheduling policy at the $m$th router in the path. In (4), $\pi(\phi)$ can be any dynamic scheduling policy implemented at $P_{i,m-1}$. Equation (4) shows the need for passing the finish times at upstream routers to the downstream router where EDF is implemented. This requires clock synchronization among different routers in a network and hence is undesirable. Instead, the popular approach is to pass the value $f_k^{(i)}(\pi(\phi), P_{i,m-1}) - d_k^{(i)}(\pi(\phi), P_{i,m-1})$ [13,46]. This value is called the "time ahead" and is denoted by $TA_k^{(i)}(P_{i,m-1})$. So, assuming zero propagation delay, (4) is modified to

$$f_k^{(i)}(EDF(\vec{D}), P_{i,m}) = \max\big\{a_k^{(i)}(P_{i,m}),\, a_k^{(i)}(P_{i,m}) + TA_k^{(i)}(P_{i,m})\big\} + D_i(P_{i,m}). \tag{5}$$
When all packets are schedulable at upstream routers, this simply reduces to $f_k^{(i)}(EDF(\vec{D}), P_{i,m}) = a_k^{(i)}(P_{i,m}) + TA_k^{(i)}(P_{i,m}) + D_i(P_{i,m})$. We note that providing the time ahead value $TA_k^{(i)}(P_{i,m})$ eliminates the need for clock synchronization among the various routers.

In a network, the departure processes of upstream routers are the arrival processes at the respective downstream routers. As mentioned earlier, analysis with priority processes is more tractable. Next, we prove that schedulability with respect to the priority processes of upstream routers implies schedulability with respect to their departure processes as well, when the $EDF_P$ scheduling policy is implemented. The result is non-trivial because, in general, schedulability with respect to priority processes does not guarantee schedulability with respect to departure processes; see [4] for examples.

Lemma 4. Let any router $m$ be such that $m = P_{i,m_i}$ for any $i \in F(m)$, and suppose that it implements the $EDF_P$ scheduling policy. If the arrival processes are schedulable at $P_{i,m_i-1}$ for all $i \in F(m)$ and the priority processes of the flows in $F(m)$ are schedulable at router $m$, then their departure processes are schedulable at router $m$.

Proof. Fix any $\vec{A} \in \mathcal{A}$ (recall that $\vec{A}$ is an aggregate arrival process instance at the ingress of the network). For any flow $i \in F(m)$, let $a_k^{(i)}(P_{i,m_i})$ denote the arrival time at router $m$ when the arrival process is the departure process of the router $P_{i,m_i-1}$. Similarly, let $\hat{a}_k^{(i)}(P_{i,m_i})$ denote the arrival time at router $m$ when the arrival process is the priority process of the router $P_{i,m_i-1}$. Since the arrival processes are schedulable at the routers $P_{i,m_i-1}$ for all $i \in F(m)$, for any $\vec{A} \in \mathcal{A}$,

$$a_k^{(i)}(P_{i,m_i}) \le \hat{a}_k^{(i)}(P_{i,m_i}). \tag{6}$$
From (6), the rule for calculating the finish time at router $m$ (given in (4)) can be modified as

$$f_k^{(i)}(EDF(\vec{D}), m) = \hat{f}_k^{(i)}(EDF(\vec{D}), m) = \hat{a}_k^{(i)}(m) + D_i(m), \tag{7}$$
when either the departure process or the priority process at router $P_{i,m_i-1}$ is considered to be the arrival process at router $P_{i,m_i}$.

Now fix an interval $[t, t+\tau]$. Assume that a packet $p_k^{(i)}$ contributes to $w_{i,\vec{A}}^{EDF(\vec{D})}[t, t+\tau](m)$. This implies that $a_k^{(i)}(m) \in [t, t+\tau]$ and $f_k^{(i)}(EDF(\vec{D}), m) \in [t, t+\tau]$. Now, the finish time $f_k^{(i)}(EDF(\vec{D}), m)$ is the same in both cases, i.e., whether the arrival process at $m$ is the departure process of the upstream router or the priority process of the upstream router. This implies

$$t \le a_k^{(i)}(m) \le \hat{a}_k^{(i)}(m) \le f_k^{(i)}(EDF(\vec{D}), m) \le t + \tau. \tag{8}$$

The above inequality (8) clearly shows that $p_k^{(i)}$ also belongs to $\hat{w}_{i,\vec{A}}^{EDF(\vec{D})}[t, t+\tau](m)$. Since the arrival instance $\vec{A}$, the time interval $[t, t+\tau]$ and the packet $p_k^{(i)}$ were arbitrary, we conclude that

$$w_{i,\vec{A}}^{EDF(\vec{D})}[t, t+\tau](m) \le \hat{w}_{i,\vec{A}}^{EDF(\vec{D})}[t, t+\tau](m),$$

for every interval $[t, t+\tau]$, for all $i \in F(m)$ and for all $\vec{A} \in \mathcal{A}$. This implies that

$$W^{EDF(\vec{D})}[t, t+\tau](m) \le \widehat{W}^{EDF(\vec{D})}[t, t+\tau](m).$$

Since the arrival process (as the priority processes of the upstream routers) is schedulable, using the Schedulability Theorem we conclude that

$$W^{EDF(\vec{D})}[t, t+\tau](m) \le \widehat{W}^{EDF(\vec{D})}[t, t+\tau](m) \le \tau \cdot C.$$

From the above equation, and again using the Schedulability Theorem (the "only if" part), Lemma 4 follows. □

In Fig. 4, consider the flows $i$, $j$ and $k$, which belong to $F(m)$. Lemma 4 says that if the arrival processes at routers 1, 2 and 3 are schedulable, and the priority processes of the flows $i$, $j$ and $k$ at these routers are schedulable at router $m$, then the departure processes of these flows are also schedulable at router $m$. So, for the $EDF_P$ scheduling policy, it is sufficient to ensure schedulability with respect to the priority processes of upstream routers in order to ensure schedulability with respect to their departure processes. For a network, we obtain the following result.

Theorem 2. Given a network of $EDF_P$ schedulers, schedulability with respect to priority chains implies schedulability with respect to departure chains.

The proof of Theorem 2 follows by induction on the number of flows in the network and then on the number of hops that a given flow traverses. At each induction step we use the result obtained in Lemma 4. The complete proof is available in [4].
Fig. 4. For EDFP scheduling policy, schedulability with respect to the priority processes of flows i, j and k at router m implies schedulability with respect to their departure processes, whenever the input processes are schedulable at routers 1, 2 and 3.
Theorem 2 suggests using the more analyzable priority chains in place of departure chains. Further, when schedulability is ensured at all the routers $m \in P_i$, the worst case end-to-end (or network) delay for any packet $p_k^{(i)}$ can be given as

$$ND_i = f_k^{(i)}(P_{i,n_i}) - a_k^{(i)}(P_{i,1}) = \sum_{m=1}^{n_i} D_i(m).$$

Since the right hand side does not depend on $k$, we conclude that the guaranteeable delay, when schedulability is ensured, is simply the sum of the deadlines at all the hops in the path of a flow. Further, we observe that ensuring schedulability is equivalent to ensuring delay guaranteeability in the case of EDF schedulers, but this is not the case in general [4].

Until now we have considered only preemptive scheduling policies. The extension of these results to non-preemptive scheduling policies can be carried out as follows. We observe that at a router, if the finish times of all the arriving packets are modified to $f_k^{(i)}(\pi(\phi)) + K$, where $K$ is any constant, then the ordering among finish times remains the same and hence the order of service also remains unchanged. Further, if a source vector is schedulable with respect to the original priority process, then it remains schedulable with respect to the modified priority process as well, for every $K \ge 0$. These observations are true for both $\pi_P$ and $\pi_{NP}$. Under $\pi_{NP}$, if $K \ge L_{max}/C$ at the given router, then the packets depart before their respective finish times. Based on these observations, we modify the priority process to $f_k^{(i)}(\pi(\phi)) + L_{max}/C$ at every router that implements the non-preemptive scheduling policy.

To see how this helps, let us assume that routers 1, 2 and 3 in Fig. 4 implement non-preemptive schedulers. It is interesting to observe that all the arguments in Lemma 4 remain valid when the priority processes are replaced by the modified priority processes at these routers. Hence Lemma 4 holds even when non-preemptive scheduling policies are implemented at the upstream routers. Also, since schedulability is ensured with $EDF_P$, the departure times at router $m$ are less than or equal to the respective finish times (obtained using the modified priority processes). So, even if this scheduler is changed to a non-preemptive one, its departure process will be less than or equal to its modified priority process. With these observations, we note that the arguments in Theorem 2 are valid for a network of $EDF_{NP}$ schedulers with modified priority chains. We state the following result without explicit proof.

Corollary 1. Given a network of $EDF_{NP}$ schedulers, schedulability with respect to the modified priority chains implies schedulability with respect to the departure chains.

As argued before, the guaranteeable network delay for flow $i$ in a network of $EDF_{NP}$ schedulers is equal to

$$ND_i = \sum_{m=1}^{n_i} \left( D_i(m) + \frac{L_{max}}{C(m)} \right).$$
4.2. RCS disciplines that outperform service disciplines based on GPS

In [18], the authors have shown that RCS disciplines (shapers + EDF) outperform GPS based schedulers. "GPS based schedulers" refers to the packet schedulers that try to approximate GPS. It is shown that RCS disciplines have at least the same network delay guaranteeable region as that of GPS based schedulers. We will discuss only the Rate Proportional Processor Sharing (RPPS) assignment [31], as the details are available in [18].

Let the arrival process $A_i$ be leaky bucket constrained with bucket depth $\sigma_i$ and token replenishment rate $\rho_i$. Further, assume that the traffic of every flow $i \in F(m)$ is shaped at the ingress of router $m$ with a leaky bucket having bucket depth $l_{max}^{(i)}$ and token rate $r_i(m)$. Then, in [18], the authors have proved that any delay greater than or equal to $l_{max}^{(i)}/r_i'(m)$ is guaranteeable with the $EDF_P$ scheduler, where $r_i'(m) \ge r_i(m)$ and $\sum_{j \in F(m)} r_j'(m) \le C(m)$. This result is very important for our purposes and hence we state it as Theorem 3. An alternative proof based on schedulability theory can be found in [4].
Fig. 5. Path for a tagged flow i in a network of RCS schedulers.
Theorem 3. Consider any router $m$ and let every source $i \in F(m)$ be shaped at the ingress of the router by a leaky bucket controller with parameters $(l_{max}^{(i)}, r_i)$. Let the output process of the leaky bucket controller be the input process to the router. Then the arrival process is always schedulable under $EDF_P(\vec{D})$ whenever $D_i \ge l_{max}^{(i)}/r_i'$ $\forall i$. Here $r_i'$ satisfies (1) $\sum_{i \in F(m)} r_i' \le C(m)$ and (2) $r_i' \ge r_i$, $\forall i \in F(m)$.

So, if at the ingress of each router the traffic from flow $i$ is shaped with a leaky bucket $(l_{max}^{(i)}, r_i(\min))$, then at every router the $EDF_P$ scheduler can guarantee the delay $l_{max}^{(i)}/r_i(m)$. Here, $r_i(\min) = \min_{m \in P_i} r_i(m)$. Thus the guaranteeable end-to-end delay in this scenario is $\sum_{m \in P_i} l_{max}^{(i)}/r_i(m)$ plus the delays experienced in the shapers. It has been shown in [18] that if the shaper characteristics are the same at each hop, then the total worst case delay experienced in the shapers is $\sigma_i/r_i(\min)$. So the guaranteeable end-to-end delay is

$$ND_i = \frac{\sigma_i}{r_i(\min)} + \sum_{m \in P_i} \frac{l_{max}^{(i)}}{r_i(m)}. \tag{9}$$

Hence, if we assume that $r_i(m)$ is the rate reserved for session $i$ at router $m$ under a GPS scheduler, it can clearly be seen that the RCS scheduling policies constructed here have the same end-to-end guaranteeable region as GPS.

The practical implementation of this approach is shown in Fig. 5. At the first hop on $P_i$, the traffic is shaped using a leaky bucket with parameters $l_{max}^{(i)}$ and $r_i(\min)$. The output of the shaper is given to the EDF scheduler. This scheduler stamps a finish time equal to the arrival time at the scheduler plus the value $l_{max}^{(i)}/r_i(1)$. As discussed above, schedulability is taken care of by the condition $\sum_{k \in F(1)} r_k(1) \le C(1)$. Before departing, the packet is stamped with its time ahead value. The time ahead value is always non-negative since all the packets are schedulable. This packet arrives at the second router in the path. Here the packet is held in the Delay Jitter Controller (DJC) for a duration equal to the time ahead stamp it carries. This ensures that the output stream from the delay-jitter controlling regulators (DJCs) is again leaky bucket constrained with parameters $l_{max}^{(i)}$ and $r_i(\min)$ (see [43]). After this, the packets are given to an EDF scheduler that again schedules them as described for the first router. This procedure is repeated at each router in the path. This architecture is the starting point of our schemes.
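A minimal sketch (ours; function and field names are illustrative, not from the paper) of the per-hop operations just described, i.e., the DJC hold followed by the EDF stamp, and the time-ahead update on departure:

```python
def stamp_at_core_hop(arrival_time, time_ahead_in, deadline):
    """One core hop of the RCS architecture, assuming zero propagation delay.

    The packet is held in the delay-jitter controller (DJC) for the
    time-ahead value carried in its header; the EDF scheduler then stamps
    a finish time equal to the post-DJC arrival time plus the per-hop
    deadline D_i = l_max / r_i.
    """
    eligible_at = arrival_time + time_ahead_in     # DJC release instant
    finish_time = eligible_at + deadline           # EDF priority stamp
    return eligible_at, finish_time

def stamps_on_departure(finish_time, departure_time, deadline):
    """Values written into the header before the packet leaves: the fixed
    per-hop deadline plus a fresh time-ahead = finish time minus actual
    departure time (non-negative whenever the packet is schedulable)."""
    return deadline, finish_time - departure_time
```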
5. Preliminary architecture (A)

As in [32], we consider a Scalable Core (SCORE) architecture that is similar to DiffServ. In SCORE, the edge routers can perform per-flow operations, but core routers treat every packet as an independent entity. Each packet carries some information in its header, and core routers use this information to decide how they should treat the packet. Using the terminology of [34], this information carried by packets is called Dynamic Packet State (DPS). DPS is initialized by the ingress edge router. Core routers process each incoming packet based on the state carried in the packet header. Before forwarding a packet to the next downstream router, the DPS is updated by the
current router. Even though DPS is similar to PHB in DiffServ, there are differences, as pointed out in [34]. The state coded in DPS is highly dynamic and reflects the current state of the flows, while the PHB remains the same over the complete path of the flow and indicates class based behavior (like dropping or scheduling priority among classes). The detailed architecture and the admission control criteria are given in the following subsections.

5.1. Implementation of architecture: data path

The architecture shown in Fig. 5 can be directly implemented in SCORE with some suitable modifications. In the original architecture, each router in the path has to keep track of the deadline for a given flow. In addition, at the first router, shapers with different parameters are needed for different flows. On the data path, this indicates the need for identifying the flow to which a packet belongs and maintaining per-flow deadlines. To get rid of these operational complexities, we can put the deadlines at each hop in the packet header, along with the time ahead value at the current router. Downstream routers can use this information to calculate the finish times. This eliminates the need for classifying flows in the core, as now each packet can be treated as an independent entity depending on its DPS.

In the case of different deadlines at every router, the DPS should consist of the deadlines at every hop and the time ahead at the current hop. Since the number of hops is not the same for all flows, the header length would vary from flow to flow. This scenario is undesirable, as the length of the packet header would then depend on the path length, and managing packets with variable header lengths is not an easy job. There are two possible solutions to this problem.

(1) The maximum allowable path length (in terms of hops) is predefined, say $h$. Hence, a packet from any flow cannot have more than $h$ hops on its path. We reserve $h$ places for deadlines in the packet header. If the path length is equal to $\hat{h} < h$, then $h - \hat{h}$ places are filled with dummy values. The ingress edge router identifies the flow to which a packet belongs and initializes the DPS. Each core router strips off the first value from the record and appends a dummy value at the end, before the time ahead stamp. Recall that the time ahead stamp is updated at each router in the path. A typical scenario is shown in Fig. 6(a), with $h = 5$ and $\hat{h} = 3$.
(2) We can have the same deadline at every hop in the path. This approach requires only one place for the deadline. As in the previous approach, the ingress edge router identifies the flow and initializes the DPS. Each core router reads the deadline and time ahead values. At the time of departure, the time ahead value is updated by the current router. A typical scenario is shown in Fig. 6(b).

Approach 1 is flexible in the sense that we can have different deadlines at different routers in the path, which can allow us to provide delay guarantees to a larger number of flows. But if the value of $h$ is large, the overhead per packet is much higher and can adversely affect throughput. Approach 2 is less flexible but considerably reduces the packet overhead. This approach also requires fewer changes to the packet header (only the time ahead value is updated) and is hence less dynamic. In this paper we opt for the second approach; a sketch of the resulting fixed-length DPS follows Fig. 6.
Fig. 6. Approaches for having fixed packet header length.
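A sketch (ours) of the fixed-length DPS of Approach 2; the two 32-bit microsecond fields are illustrative assumptions, not a format specified in the paper:

```python
import struct

def pack_dps(deadline_us: int, time_ahead_us: int) -> bytes:
    """Encode the two-field DPS of Approach 2 as two 32-bit unsigned
    values in network byte order (microseconds).  The deadline is fixed
    over the whole path; only the time-ahead changes hop by hop."""
    return struct.pack("!II", deadline_us, time_ahead_us)

def update_time_ahead(dps: bytes, new_time_ahead_us: int) -> bytes:
    """Rewrite the time-ahead field on departure; the deadline is untouched."""
    deadline_us, _ = struct.unpack("!II", dps)
    return struct.pack("!II", deadline_us, new_time_ahead_us)
```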
To summarize the approach: the ingress edge router shapes the traffic from flow $i$ using a leaky bucket shaper with some pre-defined parameters $(l_{max}^{(i)}, r_i)$, where flow $i$ is leaky bucket constrained with parameters $(\sigma_i, \rho_i)$ such that $r_i \ge \rho_i$. The deadline assigned to flow $i$ at any router in the path is $l_{max}^{(i)}/r_i$. The deadline and the time ahead are carried as DPS. The core routers (second router onwards) read this information from the packet header and make the packet wait in the DJC for the time ahead stamp it carries. Once the packet comes out of the DJC, the finish time is calculated as the arrival time at the scheduler plus the deadline it carries. The packets are arranged in increasing order of their finish times and transmitted in that order; this can be implemented using a single priority queue. Before transmission, the time ahead for the packet is calculated and, along with the deadline, is coded into the packet header. At every router in the path, the time ahead value changes but the deadline remains the same. Time ahead values enable the framework to work without any need for time synchronization among routers.

5.2. Implementation of architecture: control path

In Section 5.1, we obtained data path scalability in core routers by eliminating any need for per-flow state management and per-flow operations. In this section, we propose an approach to obtain control path scalability in core routers. For a guaranteeable delay service, the most important control path functionality is admission control. Considering our architecture, we need to address the following two specific questions in order to simplify the admission control operation on the control path.

(1) Given a source's leaky bucket parameters $(\sigma_i, \rho_i)$, its end-to-end delay requirement and its path, how do we obtain the token replenishment rate $r_i$ for the leaky bucket shaper at the ingress edge router?
(2) How do we ensure that the arrival process is schedulable at each router in the path without maintaining per-flow state?

The second question is particularly important, as the general admission control criteria obtained for EDF schedulers [13,15,17,20,24,46] are complex and need per-flow state. Our admission control criterion is based on Theorem 3. Fix any router $m \in P_i$. The Theorem says that if $\sum_{i \in F(m)} r_i \le C(m)$, then the arrival process is schedulable. So, whenever $r_i \ge \rho_i$ for every $i \in F$, the guaranteeable end-to-end delay is given by $\big(\sigma_i + \sum_{m \in P_i} l_{max}^{(i)}\big)/r_i$ (see Eq. (9)). Thus, if the required end-to-end delay guarantee is $ND_i$, then

$$r_i = \max\left\{ \frac{\sigma_i + \sum_{m \in P_i} l_{max}^{(i)}}{ND_i},\; \rho_i \right\}. \tag{10}$$

In the above equation, the source $i$ specifies all the parameters present on the right hand side. Equation (10) gives a solution to the first question. The deadline at each router in the core is simply $l_{max}^{(i)}/r_i$. Also, the admission control criterion is given precisely by

$$\sum_{i \in F(m)} r_i \le C(m).$$
If this criterion is violated at any router, then the flow is not set up. So, routers do not have to maintain per-flow state on the control path; they just have to keep the single quantity $\sum_{i \in F(m)} r_i$. Whenever a new flow seeks admission, its $r_i$ value is added to this quantity. If the sum is less than or equal to the server capacity, then the flow is allowed. If the flow is allowed at every hop in the path, then at every hop the aggregate reserved rate is updated to the original aggregate plus the $r_i$ value. Whenever a flow completes its transmission and no longer needs network resources, a tear down message can be sent. This message carries the reservations obtained by the flow; on receiving such a message, this value is subtracted from the stored aggregate reserved rate. Now, it may happen that a flow dies without sending a tear down message (the authors thank the anonymous reviewers for pointing out this issue). For such cases, the edge router can shoulder the
responsibility of sending the tear down message to free the reserved resources. We note that the edge router maintains per-flow state, so it knows the resources reserved by each flow. Furthermore, it also classifies the arriving packets according to flow and initializes the DPS field in the packet header. So, the edge router can maintain a counter for each flow to keep track of the time elapsed since the last packet from the flow arrived. If the elapsed time exceeds a certain amount, the edge router can initiate a tear down message. This additional feature makes the framework more robust.

We observe that the proposed approach simplifies the data and control paths in the core considerably. More specifically, core routers need not maintain any per-flow state and need not perform any per-flow operation. Hence the proposed architecture is much more scalable and robust than the IntServ solution. However, we note that though the proposed architecture has many desirable features to improve scalability and robustness while providing per-flow delay guarantees, core routers still need to process resource reservation requests and connection tear-down requests for every flow. A resource reservation protocol like RSVP [45] can be used for this purpose. A very large number of such requests can affect the scalability of the architecture. Some possible approaches to overcome this excessive processing need are mentioned below; a sketch of the per-link admission bookkeeping follows this list.

(1) Using a centralized bandwidth broker. Every reservation request goes to a centralized bandwidth broker, which maintains the network topology and the current status of resource availability at each router. Using this information, it decides whether to admit the new call and updates resource availability if necessary. This solution is efficient if the flows are long-lived. We note that many real-time applications are long-lived, e.g., video-on-demand, tele-conferencing and real-time medical applications.
(2) Aggregating reservation requests. In this approach, edge routers aggregate resource reservation requests, which can potentially reduce the total number of requests a core router has to process. The edge routers can also predict future requests and reserve additional resources in advance. Detailed aggregation protocols are outside the scope of this paper; some useful references on this specific research area are [23,28]. For route pinning and aggregation, Multi-Protocol Label Switching (MPLS) can be used.
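The control path thus reduces to Eq. (10) at the edge and a single counter per output link in the core. The following sketch (ours; names are illustrative) combines the two:

```python
def shaper_rate(sigma_i, rho_i, l_max_i, n_hops, nd_i):
    """Eq. (10): token rate for flow i from its leaky bucket parameters,
    maximum packet length, hop count and end-to-end delay target ND_i.
    Since l_max is the same at every hop, the sum over the path is
    n_hops * l_max_i."""
    return max((sigma_i + n_hops * l_max_i) / nd_i, rho_i)

class LinkAdmission:
    """Per-link admission control: only the aggregate reserved rate is kept."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.reserved = 0.0

    def admit(self, r_i):
        if self.reserved + r_i <= self.capacity:
            self.reserved += r_i       # flow accepted at this hop
            return True
        return False                    # would overbook the link: reject

    def tear_down(self, r_i):
        self.reserved -= r_i            # r_i is carried in the tear down message
```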
6. Work-conserving architecture (B)

In this section, we start from the preliminary architecture (Architecture A) described in Section 5 and make step-wise modifications to obtain an architecture that is work-conserving, needs no shapers at any router, and yet provides the same guaranteeable region.

6.1. Work-conserving core

In Architecture A, core routers implement a DJC to hold the packets for an interval equal to the time ahead stamp carried in the DPS. This implies that the server may be idle even when there are packets at the router (all the packets being held in the DJC). Thus Architecture A has a non work-conserving core. We now use the results obtained in Section 4.1 to suggest modifications that make the core work-conserving.

In [43] it is noted that the priority process at the EDF scheduler at any router is just a shifted version of the arrival process at the EDF scheduler of the first router, provided the link propagation delays are constant over all the packets of the given flow. Moreover, from the definition of the time ahead value, it is clear that the arrival time at the EDF scheduler of router $P_{i,m}$ is equal to the finish time at router $P_{i,m-1}$. So, when we ensure schedulability at the EDF scheduler at router $P_{i,m}$, we ensure schedulability with respect to the priority process of the router $P_{i,m-1}$. Using Lemma 4, we conclude that the departure processes of the immediate upstream routers for the flows passing through $P_{i,m}$ are also schedulable at router $P_{i,m}$, where finish times are calculated as per (4). This implies that the DJC can be done away with at core routers without affecting schedulability. This yields a work-conserving core and completes our first step of modifications.
6.2. Work-conserving edge

We note that the input to the second EDF scheduler in the original architecture is leaky bucket constrained with parameters (l_max^(i), r_i). The maximum delay seen at the first hop is equal to (σ_i + l_max^(i))/r_i. So, if we can find a scheduling discipline that
(a) ensures schedulability,
(b) guarantees delay less than or equal to (σ_i + l_max^(i))/r_i, and
(c) has a leaky bucket constrained priority process with parameters (l_max^(i), r_i),
then we can replace the shaper plus EDF scheduler by this scheduling policy.

Consider such a policy at ingress edge routers. Since we have already ensured schedulability at the next-hop EDF scheduler for leaky bucket constrained sources with parameters (l_max^(i), r_i), the priority chains are schedulable. By condition (a), the scheduling policy implemented at each ingress router also ensures schedulability. As we have noted earlier, Theorem 2 can be relaxed to include scheduling policies other than EDF_P if they guarantee schedulability. This implies that the departure chains are schedulable in such a network. Thus, with a scheduling policy that satisfies (a), (b) and (c) at the first hop in the path and EDF schedulers in the work-conserving core, we can guarantee the same end-to-end delay. Such a policy is VC with r_i as the rate reserved for flow i (see Lemma 3). In this architecture, shapers are not required at all. A typical scenario is shown in Fig. 7. There exists a session from user X to user Y. At the ingress edge router we have VC, and at all the remaining routers we have EDF schedulers. In the next Theorem we formally prove the delay bound for this architecture.

Fig. 7. New architecture for per-flow end-to-end delay guarantees in a SCORE network.

Theorem 4. Let all the flows be leaky bucket constrained with parameters (σ_i, ρ_i) for any i ∈ S. At the ingress edge router, the VC_P scheduler is implemented with r_i ≥ ρ_i as the reserved rate for flow i. At each core router and egress edge router, the EDF_P scheduler is implemented. For flow i, the deadline at any EDF_P scheduler is given by l_max^(i)/r_i. At any router m in the system,

    ∑_{j∈F(m)} r_j ≤ C(m),                                            (11)

where C(m) is the link capacity. Then the delay (σ_i + ∑_{m∈P_i} l_max^(i))/r_i is guaranteeable for flow i.
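As a concrete illustration of condition (11) and the resulting bound, the following sketch (function and variable names are ours, not from the paper) checks admissibility at each router on a flow's path and, if the flow is admitted, returns the Theorem 4 delay bound.

    # Sketch of the Theorem 4 admission test: admit flow i with reserved
    # rate r_i only if the reserved rates stay within C(m) at every router
    # m on its path; the guaranteed delay is then
    # (sigma_i + n_i * l_max_i) / r_i. Names are illustrative only.

    def admit_flow(path, reserved, capacity, r_i, sigma_i, l_max_i):
        """path: router ids P_{i,1}..P_{i,n_i}; reserved[m]: rate already
        reserved at router m; capacity[m]: link capacity C(m).
        Returns the guaranteed end-to-end delay, or None if rejected."""
        for m in path:
            if reserved[m] + r_i > capacity[m]:   # condition (11) violated
                return None
        for m in path:                            # commit the reservation
            reserved[m] += r_i
        # Theorem 4: delay (sigma_i + sum over the path of l_max_i) / r_i
        return (sigma_i + len(path) * l_max_i) / r_i

With this check, a flow's reservation is committed only when every router on its path can absorb r_i without violating (11).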
Proof. We prove Theorem 4 in two steps. In the first step, we prove that the departure chains are schedulable. Recall that the departure chains at a router are the actual arrival process at the router. When schedulability is ensured, the guaranteeable end-to-end delay is less than or equal to the maximum, over all packets and arrival instances, of the difference between the finish time (with priority chains) at the last hop and the arrival time at the first hop in the path. This is because, under EDF scheduling, finish times with departure chains are equal to finish times with priority chains.

STEP 1: In [14,40], the authors have shown that any arrival process is schedulable under VC if the server is not overbooked. Now, for any arrival process, the priority process of VC with reserved rate r_i is leaky bucket constrained with bucket depth l_max^(i) and token replenishment rate r_i (Lemma 3). This (l_max^(i), r_i) leaky bucket constrained process is the input to the core EDF_P scheduler. This EDF_P scheduler has a deadline l_max^(i)/r_i. By Theorem 3, this arrival process is schedulable if the sum of all these r_i's does not exceed the server capacity. We observe that the characterization of the priority process of the EDF_P scheduler is the same as the characterization of the arrival process. So condition (11) ensures schedulability at every router in the network with respect to priority chains. The above discussion and Theorem 2 prove that at every router the departure chains are schedulable.

STEP 2: Consider any packet p_k^(i). As shown in [14,40],

    f_k^(i)(VC, A)(P_{i,1}) ≤ a_k^(i) + (σ_i + l_max^(i))/r_i,

whenever ρ_i ≤ r_i. Since at every other router we have EDF_P schedulers with the deadline equal to l_max^(i)/r_i, we can conclude that

    f_k^(i)(EDF, A)(P_{i,n_i}) = f_k^(i)(VC, A)(P_{i,1}) + (n_i − 1) · l_max^(i)/r_i.

Hence the total delay seen by any packet of flow i is

    f_k^(i)(EDF, A)(P_{i,n_i}) − a_k^(i) ≤ (σ_i + ∑_{m∈P_i} l_max^(i))/r_i.

Since packet p_k^(i) and the arrival instance A are arbitrary, we conclude that the end-to-end delay ND_i for flow i satisfies

    ND_i ≤ (σ_i + ∑_{m∈P_i} l_max^(i))/r_i.
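For intuition, consider a hypothetical instance (numbers ours, purely illustrative): let σ_i = 10,000 bits, l_max^(i) = 12,000 bits (a 1500-byte maximum packet), r_i = 10^6 bits/s and a path of n_i = 3 hops. Theorem 4 then guarantees an end-to-end delay of (10,000 + 3 × 12,000)/10^6 s = 46 ms, provided condition (11) holds at all three routers.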
The implementation of this architecture with a SCORE is the same as discussed in Section 5. Each packet carries two values as DPS. The first is l_max^(i)/r_i and the other is the time ahead stamp. Each router calculates a finish time equal to the sum of the arrival time, the time ahead stamp and the deadline coded in the packet header. The packets are served in increasing order of their respective finish times. The time ahead stamp is updated by each router on the path while the other value remains constant. The calculation for r_i is as given in (10) and the admission control criterion also remains the same as discussed in Section 5.2. A sketch of the per-packet processing follows.
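The following minimal sketch (our own, with illustrative names) shows the per-packet processing just described. We assume, following the Architecture A description, that the outgoing time ahead stamp records how much earlier than its finish time the packet actually departed; this update rule is our reading of Section 5 and should be treated as an assumption.

    # Per-packet processing at a core router under Architecture B.
    # Each packet header carries two DPS values: 'deadline' (l_max^(i)/r_i,
    # constant along the path) and 'time_ahead' (rewritten at every hop).
    # Names and the time_ahead update rule are illustrative assumptions.

    class Packet:
        def __init__(self, deadline: float, time_ahead: float = 0.0):
            self.deadline = deadline      # l_max^(i)/r_i, never rewritten
            self.time_ahead = time_ahead  # updated at every hop
            self.finish_time = 0.0

    def on_arrival(pkt: Packet, arrival_time: float) -> float:
        """Finish time = arrival time + time ahead stamp + deadline."""
        pkt.finish_time = arrival_time + pkt.time_ahead + pkt.deadline
        return pkt.finish_time            # packets are served in this order

    def on_departure(pkt: Packet, departure_time: float) -> None:
        """Assumed update rule: stamp how much earlier than its finish
        time the packet actually left this router."""
        pkt.time_ahead = max(0.0, pkt.finish_time - departure_time)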
6.3. Extension to non-preemptive scheduling policies

The basic idea is the same as the one discussed in Section 4. At each router m in the network and for each flow i ∈ F, we construct a modified priority process by inflating the finish time of each packet by L_max/C(m). Under this modified priority process, the priority structure remains the same and all packets depart before their finish times even under non-preemptive versions of scheduling policies. The following Corollary gives the end-to-end delay bound in such a scenario.

Corollary 2. Let all the flows be leaky bucket constrained with parameters (σ_i, ρ_i) for any i ∈ S. At the ingress edge router, the VC_NP scheduler is implemented with r_i ≥ ρ_i as the reserved rate for flow i. At each core router and egress edge router, the EDF_NP scheduler is implemented. For flow i, the deadline at any EDF_NP scheduler is given by l_max^(i)/r_i. At each router m, the time ahead value is calculated with respect to the modified priority chain and

    ∑_{j∈F(m)} r_j ≤ C(m),                                            (12)

where C(m) is the link capacity. Then the delay (σ_i/r_i) + ∑_{m∈P_i} ((l_max^(i)/r_i) + (L_max/C(m))) is guaranteeable for flow i.
The proof is available in [4]. Furthermore, since the delay bound is different in the case of non-preemptive schedulers, the calculation of r_i has to be modified as follows:

    r_i = max( (σ_i + ∑_{m∈P_i} l_max^(i)) / (ND_i − ∑_{m∈P_i} L_max/C(m)), ρ_i ).

The admission control criterion remains the same as given in (12).
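A direct transcription of this rate calculation (function and constant names are ours; ND_i denotes the flow's target end-to-end delay, as above):

    # Reserved rate for flow i under non-preemptive scheduling.
    # nd_i is the target end-to-end delay ND_i; capacities lists C(m) for
    # each router m on path P_i. Illustrative transcription only.

    L_MAX = 12_000  # assumed network-wide maximum packet size L_max, bits

    def reserved_rate(sigma_i, l_max_i, rho_i, nd_i, capacities):
        """r_i = max((sigma_i + sum l_max_i)/(ND_i - sum L_max/C(m)), rho_i)."""
        numerator = sigma_i + len(capacities) * l_max_i
        denominator = nd_i - sum(L_MAX / c for c in capacities)
        if denominator <= 0:
            raise ValueError("target delay ND_i is infeasible on this path")
        return max(numerator / denominator, rho_i)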
7. Supporting best-effort traffic

QoS and best-effort traffic coexist in most real-world networks. In such a scenario, apart from guaranteeing the required performance to QoS traffic, it is important to provide meaningful throughput and responsiveness to best-effort traffic. Most present approaches give higher priority to QoS traffic, and best-effort traffic receives service only when no packet from a QoS flow is present at a router. In such cases, the performance seen by best-effort traffic is highly sensitive to the QoS load at the router; under heavy QoS traffic, it can degrade below acceptable levels. This suggests a need for isolation between the two traffic classes.

To provide a minimum level of service to best-effort flows independent of the QoS load, one possible solution is to set aside some bandwidth specifically for best-effort traffic. The typical scenario is shown in Fig. 8(a). A server of capacity C is split between QoS and best-effort traffic at rates r_1 and r_2. The exact values of these allotted capacities are a policy issue and depend on the service provider. Rate guarantees can be achieved by using any rate-based service policy [2,8,19,29,30,44]. As a result, we can view the complete system as a hierarchical system, as shown in Fig. 8(b). Hierarchical scheduling policies are studied in [1,16,35]. These policies need per-flow state management and a different analysis for guaranteeable delay. That is, the guaranteeable delay in the hierarchical system is not equal to the guaranteeable delay in a system with scheduling policy π(φ) and a server of rate r_1.

In our specific case, we overcome the above difficulties using the observation in the following Theorem. But before that we explain the basic idea in brief. Consider the system shown in Fig. 8(c). System 1 is a collection of N subsystems. The set of sources feeding data into subsystem i is represented by "Class i". Each subsystem i represents a separate router where the link capacity is r_i bits per unit time and the scheduling policy implemented is π_i(φ_i). In System 2, all the classes are served by a single non-idling server of capacity C bits per unit time, but the finish time calculation for each class is still done as per the scheduling policy π_i(φ_i) for class i. Packets are served in increasing order of their finish times across all classes. Observe that the overall scheduling policy is still a dynamic scheduling policy, as priorities are assigned to packets as and when they arrive.

Theorem 5. If ∑_{k=1}^{N} r_k ≤ C, then the schedulability of all classes in System 1 implies their schedulability in System 2, if the ordering property is satisfied across all classes.
Fig. 8. Figure (a) shows a scenario where the total server capacity is split between QoS traffic and best-effort traffic. Figure (b) shows a hierarchical view of the system, and figure (c) shows the schedulability condition in the case of rate assignment for multiple classes at a router in a network.
Proof. Consider any arbitrary subsystem i of System 1. Since class C_i is schedulable, from Theorem 1 we know that

    W_1^{π_i(φ_i)}[t, t + τ] ≤ r_i · τ   ∀t and τ ∈ R+.

The subscript 1 indicates that the quantity refers to System 1. Observe that under both systems the arrival process is the same for every class, and each class is served as per the same scheduling policy. Thus the finish time of every packet is the same under both systems; that is, for any class, W_1^{π_i(φ_i)}[t, t + τ] = W_2^{π_i(φ_i)}[t, t + τ]. Now, to prove schedulability for all the classes taken collectively under System 2, by Theorem 1 it is needed to show that

    ∑_{i=1}^{N} W_2^{π_i(φ_i)}[t, t + τ] ≤ C · τ   ∀t and τ ∈ R+.     (13)

(13) can be seen as follows:

    ∑_{i=1}^{N} W_2^{π_i(φ_i)}[t, t + τ] = ∑_{i=1}^{N} W_1^{π_i(φ_i)}[t, t + τ] ≤ ∑_{i=1}^{N} r_i · τ ≤ C · τ.
This result can be directly used in our case, where we have two classes, namely a QoS class (QC) and a best-effort class (BEC). Assume that the policy is to reserve rate R_b(m) for BEC at router m. Then the capacity available for QC is R_q(m) = C(m) − R_b(m). Based on the above Theorem, we make the following modifications for providing a reserved rate R_b(m) to BEC:
(a) Change the admission control criterion (for QoS traffic) at a router to ∑_{i∈F(m)} r_i ≤ R_q(m).
(b) Calculate the finish time for an incoming packet p_k of BEC as

    f_k = max(f_{k−1}, a_k) + l_k/R_b(m).

This finish time calculation is identical to the finish time calculation under a VC scheduler. Observe that with modification (a), we ensure schedulability for QC at the router with rate R_q(m). Also, if the packets that belong to BEC depart before or at their respective finish times as given in (b), the rate R_b(m) is given to BEC. We note that with the finish times calculated as in (b), any arrival process is schedulable at a router with a server of capacity R_b(m). Then Theorem 5 implies that both classes are schedulable at a router with server capacity C(m). Note that a core router has to maintain only two values, namely R_b(m) and the finish time of the last packet of BEC. The sketch below illustrates this best-effort handling.
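A minimal sketch (names ours) of the two values a core router keeps for BEC and the VC-style stamping of modification (b); the closing comment notes how Theorem 5's System 2 view applies with two classes.

    # Best-effort class (BEC) handling at a core router: the router keeps
    # only the reserved rate R_b(m) and the finish time of the last BEC
    # packet, stamping each incoming BEC packet VC-style. Illustrative.

    class BestEffortClass:
        def __init__(self, rate_bec: float):
            self.rate_bec = rate_bec    # R_b(m), bits per unit time
            self.last_finish = 0.0      # finish time of the last BEC packet

        def finish_time(self, arrival: float, length_bits: float) -> float:
            """f_k = max(f_{k-1}, a_k) + l_k / R_b(m)."""
            f = max(self.last_finish, arrival) + length_bits / self.rate_bec
            self.last_finish = f
            return f

    # QoS packets keep their EDF finish times; the server then transmits
    # all queued packets, QC and BEC alike, in increasing finish-time
    # order, which is System 2 of Theorem 5 with two classes.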
8. Conclusion and future work

DiffServ is a more scalable but less flexible architecture. It is less flexible in the sense that it cannot implicitly provide per-flow QoS. In this paper we addressed the problem of providing per-flow delay guarantees without maintaining per-flow state and without performing per-flow operations at the core routers. The results obtained in the paper are summarized as follows:
• A preliminary architecture to provide per-flow delay guarantees with SCORE was obtained. This architecture uses the RCS disciplines (EDF + Shapers) [18] as the basis. The modifications made are at the implementation level and in the flow admission control criteria. The guaranteeable region of this scheme is the same as that of GPS based scheduling policies with rate proportional resource allocation.
• The preliminary architecture is non-work-conserving and requires shapers or DJC at every hop. We modified this architecture to obtain a work-conserving architecture that does not need any shapers, without sacrificing either scalability or the guaranteeable region.
• The architectures proposed also accommodate best-effort traffic gracefully. A minimum rate is guaranteed to traffic of this class, irrespective of the QoS load.
However, the present architectures do not exploit the complete guaranteeable region of the EDF policy. Also, fair sharing of bandwidth between best-effort traffic and QoS traffic could not be achieved. These are areas for future research.
References

[1] J.C.R. Bennett and H. Zhang, Hierarchical packet fair queueing algorithms, IEEE/ACM Trans. on Networking 5(5) (1997), 675–689.
[2] J.C.R. Bennett and H. Zhang, WF2Q: Worst-case fair weighted fair queueing, in: INFOCOM '96, 1996, pp. 120–127.
[3] R. Braden, D. Clark and S. Shenker, Integrated services in the Internet architecture: An overview, Internet RFC 1633, Jun. 1994.
[4] P. Chaporkar, An approach to the analysis of scheduling policies for guaranteeing delay with arbitrary arrivals, Master's thesis, Indian Institute of Science, Bangalore, India, 2000. http://shravana.cedt.iisc.ernet.in/~scprasan.
[5] D. Clark and J. Wroclawski, An approach to service allocation in the Internet, Internet Draft, Jul. 1997.
[6] R.L. Cruz, A calculus for network delay, part I: Network elements in isolation, IEEE Trans. on Information Theory 37(1) (1991), 114–131.
[7] R.L. Cruz, A calculus for network delay, part II: Network analysis, IEEE Trans. on Information Theory 37(1) (1991), 132–141.
[8] A. Demers, S. Keshav and S. Shenker, Analysis and simulation of a fair queueing algorithm, Internetworking: Research and Experience (1) (1990).
[9] C. Dovrolis, D. Stiliadis and P. Ramanathan, Proportional differentiated services: Delay differentiation and packet scheduling, in: SIGCOMM '99, 1999, pp. 109–120.
[10] Y. Bernet et al., A framework for differentiated services, Internet Draft, draft-ietf-diffserv-framework-01.txt, Nov. 1998.
[11] W.-C. Feng, D.D. Kandlur, D. Saha and K.G. Shin, Adaptive packet marking for maintaining end-to-end throughput in a differentiated-services Internet, IEEE/ACM Trans. on Networking 7(5) (1999), 685–697.
[12] D.F. Ferguson, C. Nikolaou and Y. Yemini, An economy for flow control in computer networks, in: INFOCOM '90, 1990.
[13] D. Ferrari and D.C. Verma, A scheme for real-time channel establishment in wide-area networks, IEEE J. on Selected Areas in Communications 8(3) (1990), 368–379.
[14] N.R. Figueira and J. Pasquale, An upper bound on delay for the VirtualClock service discipline, IEEE/ACM Trans. on Networking 4(3) (1995).
[15] V. Firoiu, D. Towsley and J. Kurose, Efficient admission control for EDF schedulers, in: INFOCOM '97, 1997.
[16] S. Floyd and V. Jacobson, Link-sharing and resource management models for packet networks, IEEE/ACM Trans. on Networking, 1993.
[17] L. Georgiadis, R. Guérin and A. Parekh, Optimal multiplexing on a single link: Delay and buffer requirements, IEEE Trans. on Information Theory 43(5) (1997), 1518–1535.
[18] L. Georgiadis, R. Guérin, V. Peris and K.N. Sivarajan, Efficient network QoS provisioning based on per node traffic shaping, IEEE/ACM Trans. on Networking 4(4) (1996), 482–501.
[19] S.J. Golestani, A self-clocked fair queueing scheme for broadband applications, in: INFOCOM '94, 1994, pp. 636–646.
[20] S. Gorinsky, S. Baruah, T.J. Marlowe and A.D. Stoyenko, Exact and efficient analysis of schedulability in fixed-packet networks: A generic approach, in: INFOCOM '97, 1997.
[21] P. Goyal and H.M. Vin, Generalized guaranteed rate scheduling algorithms: A framework, IEEE/ACM Trans. on Networking 5(4) (1997), 561–571.
[22] R. Guérin and V. Peris, Quality of service in packet networks: Basic mechanisms and directions, Computer Networks 31 (1999), 169–189.
[23] R. Guérin, S. Herzog and S. Blake, Aggregating RSVP-based reservation requests, Internet Draft, Sept. 1997.
[24] J. Liebeherr, D.E. Wrege and D. Ferrari, Exact admission control for networks with bounded delay service, IEEE/ACM Trans. on Networking 4(6) (1996), 885–901.
[25] C.L. Liu and J.W. Layland, Scheduling algorithms for multiprogramming in a hard real-time environment, J. of ACM 20(1) (1973), 46–61.
[26] A. Mok, Task management techniques for enforcing ED scheduling on a periodic task set, in: 5th IEEE Workshop on Real-Time Software and Operating Systems, 1988.
[27] K. Nichols, V. Jacobson and L. Zhang, An approach to service allocation in the Internet, Internet Draft, Nov. 1997.
[28] P. Pan, E. Hahne and H. Schulzrinne, The border gateway reservation protocol (BGRP) for tree-based aggregation of inter-domain reservations, Journal of Communications and Networks (June) (2000).
[29] A.K. Parekh and R.G. Gallager, A generalized processor sharing approach to flow control in integrated services networks: The single-node case, IEEE/ACM Trans. on Networking 1(3) (1993), 344–357.
[30] D. Stiliadis and A. Varma, Efficient fair queueing algorithms for packet-switched networks, IEEE/ACM Trans. on Networking 6(2) (1998), 175–185.
[31] D. Stiliadis and A. Varma, Rate proportional servers: A design methodology for fair queueing algorithms, IEEE/ACM Trans. on Networking 6(2) (1998), 164–174.
[32] I. Stoica, S. Shenker and H. Zhang, Core-stateless fair queueing: A scalable architecture to approximate fair bandwidth allocations in high speed networks, in: ACM SIGCOMM '98, Vancouver, CA, 1998.
[33] I. Stoica and H. Zhang, LIRA: A model for service differentiation in the Internet, in: NOSSDAV '98, London, UK, 1998.
[34] I. Stoica and H. Zhang, Providing guaranteed service without per-flow management, in: ACM SIGCOMM '99, 1999.
[35] I. Stoica, H. Zhang and T.S. Eugene Ng, A hierarchical fair service curve algorithm for link-sharing, real-time and priority services, in: SIGCOMM '97, 1997.
[36] K. Thompson, G.J. Miller and R. Wilder, Wide-area traffic patterns and characteristics, IEEE Network (Dec.) (1997).
[37] D.C. Verma, H. Zhang and D. Ferrari, Delay jitter control for real-time communication in a packet switching network, in: TRICOMM '91, 1991, pp. 35–46.
[38] C.A. Waldspurger, Lottery and stride scheduling: Flexible proportional-share resource management, PhD thesis, 1995.
[39] Z. Wang, User-share differentiation (USD): Scalable bandwidth allocation for differentiated services, Internet Draft, May 1998.
[40] G.G. Xie and S.S. Lam, Delay guarantees of virtual clock server, IEEE/ACM Trans. on Networking 6(3) (1995), 683–689.
[41] H. Zhang, Providing end-to-end performance guarantees using non-work-conserving disciplines, Computer Communications 18(10) (1995).
[42] H. Zhang, Service disciplines for guaranteed performance service in packet-switching networks, Proceedings of the IEEE 83(10) (1995), 1374–1396.
[43] H. Zhang and D. Ferrari, Rate-controlled service disciplines, Journal of High Speed Networks 3(4) (1994), 389–412.
[44] L. Zhang, VirtualClock: A new traffic control algorithm for packet switching networks, in: Proceedings of ACM SIGCOMM '90, 1990, pp. 19–29.
[45] L. Zhang, S. Deering, D. Estrin, S. Shenker and D. Zappala, RSVP: A new resource reservation protocol, IEEE Communications Magazine 31(9) (1993), 8–18.
[46] Q. Zheng and K.G. Shin, On the ability of establishing real-time channels in point-to-point packet switched networks, IEEE Trans. on Communications 42(2/3/4) (1994), 1096–1105.