Fair Scheduling on Parallel Bonded Channels with Intersecting Bonding Groups

Brian Dean, James Martin, Scott Moser, and James Westall
School of Computing, Clemson University, Clemson, South Carolina 29634-0974
Email:
[email protected]
Abstract— We describe an efficient scheduling technique for providing weighted sharing of aggregate capacity in networks having parallel bonded channels in which a single channel may simultaneously be a member of multiple bonding groups. Our work is motivated by the introduction of this capability into version 3 of the Data over Cable Service Interface Specification (DOCSIS). Our technique extends Golestani's self-clocked fair queuing (SCFQ) algorithm. We illustrate its weighted fair-sharing properties via simulation and provide some analytic results that establish fairness under certain conditions. We also demonstrate that round-robin-based techniques such as weighted deficit round robin do not extend as easily or as effectively to this environment.
keywords: DOCSIS, channel bonding, weighted fair scheduling
I. INTRODUCTION
We describe an efficient scheduling technique for providing weighted sharing of aggregate capacity in networks having parallel bonded channels in which a single channel may simultaneously be a member of multiple bonding groups. Our work is motivated by the introduction of this capability into version 3 of the Data over Cable Service Interface Specification (DOCSIS) [1]. Our technique extends Golestani's self-clocked fair queuing (SCFQ) algorithm [2]. We illustrate its weighted fair-sharing properties via simulation and provide some analytic results that establish fairness under certain conditions. We also demonstrate that round-robin-based techniques such as weighted deficit round robin are not equally effective in this environment.
A. Channel bonding
Channel bonding is a technique in which multiple physical channels or multiplexed subchannels of a broadband channel are logically aggregated to provide increased throughput. The use of bonded channels dates back at least to the early 1970s, when they were known as transmission groups in IBM's Systems Network Architecture (SNA) [3]. In SNA a transmission group consists of one or more parallel point-to-point links. The transmission group becomes active as soon as its first constituent link is activated and remains active as long as any link remains operational. Because of the connection-oriented nature of the SNA network
Fig. 1. Channel bonding. Type 0: channels 0 and 1 form a single bonding group from S to D. Type 1: two disjoint bonding groups (channels 0-1 and channels 2-3) connect S to D. Type 2: two bonding groups (channels 0-1 and channels 1-2) intersect at channel 1.
layer, the failure of a transmission group aborted all sessions that it carried. Therefore, the transmission group design was motivated not only by the need for increased capacity but also by the significantly increased reliability that it provided in the presence of the unreliable links and modems of the day.
Channel bonding is currently used in parallel link form in Ethernet and ADSL networks and in FDM form in 802.11n WiFi networks. It is most typically employed in the form shown at the top of figure 1. We will call this type 0 bonding. Here, two (or more) channels comprise a single bonding group connecting node S to node D. When there is a single sender on a broadcast medium, as is the case on the WiFi and DOCSIS downstream channels, the node D can be viewed as representing the set of all end user stations that can receive channel 0 and channel 1.
It is also possible to employ channel bonding in the form we call type 1 in figure 1. Here there are parallel bonding groups connecting nodes S and D, but the channel sets underlying the bonding groups do not intersect. In this configuration flows with real time QoS requirements might be mapped to the bonding group consisting of channels 0 and 1 while best effort flows might be assigned to the other bonding group.
Beginning with DOCSIS 3.0, channel bonding is supported in all three configurations shown in figure 1. Type 2 bonding
is shown in the bottom part of the figure. In this configuration channel 0 is reserved for the flows mapped to the upper bonding group, channel 2 is reserved for the flows mapped to the lower bonding group, and all flows have access to channel 1. We identify any configuration in which at least one channel appears in multiple bonding groups as a type 2 configuration, and we refer to bonding groups that share a channel as intersecting bonding groups. Assuming equal channel capacity, the type 2 configuration shown reserves 1/3 of the total bandwidth for the flows of both the top and bottom bonding groups, but permits the scheduler to allocate the remaining 1/3 to flows of either bonding group based upon demand and priority.
There are no defined constraints on the size or number of bonding groups, and a channel may be a member of bonding groups of different size. Hence, for a set of N channels, the number of bonding groups that could theoretically be configured is Σ_{k=1}^{N} (N choose k) = 2^N − 1. Identifying optimal configurations for specific workload characteristics is an interesting and difficult problem. However, we are not aware of any scientific research that addresses it, and we will not pursue it in this paper.
B. Weighted fair-share scheduling on bonded channels
Weighted fair-share schedulers, also called fair queuing schedulers, attempt to provide weighted allocation of channel capacity to competing packet flows. A weighted fair-share system must be able to classify packets by flow and dynamically recognize the presence of new flows and the termination of existing ones. A unique FCFS packet queue is maintained for each active flow. When the channel being scheduled becomes idle, the scheduling decision is choosing the next flow to receive service. For type 0 and type 1 channel bonding, scheduling algorithms developed for single channel use extend to scheduling on the bonding group in the obvious way.
When any channel in the group becomes idle, the scheduler identifies the next flow to be serviced using the same algorithm used for a single channel. When this is done, each flow will achieve its weighted share of the aggregate capacity of the channels comprising the bonding group. Nevertheless, precise characterizations of delay and jitter fairness developed for single channels may not extend to bonded channels, especially when the channel capacities are heterogeneous. In this paper we also do not pursue the issue of delay and jitter fairness.
It should be noted that in type 1 bonding systems the mapping of flows to bonding groups does have a significant impact on the fraction of overall system capacity available to each flow. The scheduler will ensure that all flows within a bonding group are treated fairly, but no load balancing among bonding groups is possible under the assumption that flows or channels do not dynamically change groups.
For type 2 channel bonding, load balancing across bonding groups can be achieved (within limits) via scheduling decisions made on the shared channel(s). For example, in the type 2 configuration shown in figure 1, the fraction of total system capacity allocated to either bonding group can be varied from
1/3 to 2/3 depending upon how flows are scheduled onto channel 1. Nevertheless, not all weighted fair-share scheduling algorithms extend to the type 2 domain in straightforward and effective manners. We will show that round robin based approaches are problematic, but that virtual clock based approaches can be extended in an efficient and useful way.
The remainder of the paper is organized as follows. We present background material on DOCSIS and weighted fair share scheduling in section II. Determining the feasibility of weighted share allocation of aggregate capacity for a given system configuration is considered in section III. Approaches for extending scheduling algorithms to type 2 bonding group configurations are described and evaluated in section IV.
II. BACKGROUND
A. DOCSIS
The Data-Over-Cable Service Interface Specifications (DOCSIS) define the protocols and standards that support delivery of Internet services over hybrid fiber-coax (HFC) plants to cable customers. The DOCSIS standards are developed by Cable Television Laboratories (CableLabs), a not-for-profit research and development corporation founded in 1988 by a consortium of cable television system operators.
The physical layer of a DOCSIS system is a shared medium cable. Access to the medium is controlled by a head-end device called the Cable Modem Termination System (CMTS). All packet flow is between the CMTS and the Cable Modems (CMs). Downstream bandwidth is shared in an asynchronous TDM fashion under the control of the CMTS. In versions 1 and 2 of the DOCSIS protocols a collection of CMs shared a single downstream 6 MHz channel that provides a raw aggregate bit rate of 42.88 Mbps under QAM-256 modulation. Packets sent over the downstream channel are broken into 188 byte MPEG frames, each with 4 bytes of header and a 184 byte payload.
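The framing overhead described above can be quantified with a short calculation. The sketch below is illustrative only: it assumes, for simplicity, that each packet starts in a fresh MPEG frame, and the 1500-byte packet size is our example, not a figure from the text.

```python
import math

MPEG_FRAME = 188      # bytes per downstream MPEG frame
MPEG_PAYLOAD = 184    # bytes available after the 4-byte frame header
RAW_RATE_MBPS = 42.88 # raw QAM-256 downstream bit rate

def frames_needed(packet_bytes: int) -> int:
    """Frames consumed by one packet, assuming (for illustration only)
    that each packet begins in a fresh frame."""
    return math.ceil(packet_bytes / MPEG_PAYLOAD)

# A 1500-byte packet occupies 9 frames (1692 bytes on the wire), and the
# 4-byte headers alone cap usable throughput at 184/188 of the raw rate.
print(frames_needed(1500))                                   # 9
print(round(RAW_RATE_MBPS * MPEG_PAYLOAD / MPEG_FRAME, 2))   # 41.97
```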
Since the CMTS is the only transmitter on the downstream channel, deficit round robin scheduling offers an efficient way to provide fair sharing of available capacity to best effort traffic in this environment under type 0 and type 1 channel bonding. A variant of DRR was developed and used by Cisco Systems [4] in their DOCSIS 1.1 product line.
Type 2 channel bonding is supported in the DOCSIS 3.0 downstream. The CMTS functions as the S node in figure 1. Since the downstream channels operate as a broadcast medium, the CMs attached to the CMTS may be collectively considered the D node for the purposes of scheduling, and the methods we present in this paper are therefore directly applicable to the DOCSIS downstream.
Since every CM tuned to an upstream channel is a potential sender, controlling access to the upstream channels is considerably more complex. The upstream channel is time division multiplexed with transmission slots referred to as mini-slots. Permission to transmit data in a block of one or more mini-slots must be granted to a CM by the CMTS. The CMTS grants mini-slot ownership by periodically transmitting a frame called the MAP on the downstream channel. In addition to
ownership grants, the MAP also typically identifies some mini-slots as contention slots in which CMs may bid for quantities of future mini-slots. To minimize collisions in the contention slots, a non-greedy backoff procedure is employed. When a CM has a backlog of upstream packets it may also "piggyback" a request for mini-slots for the next packet at the tail of the current packet. Because of the complexity of the upstream side, virtually all of the published academic research on DOCSIS has pertained to management of the upstream channel. The work of Kuo, Kumar, and Kuo [5] provides additional detail on the upstream and is illustrative of research in the area of DOCSIS upstream scheduling.
In DOCSIS a packet flow is a unidirectional entity called a service flow. A service flow consists of one or more TCP/IP connections terminating at a specific CM. For example, a configuration that supports toll quality IP telephone service and best effort data typically consists of four service flows: one each for the upstream and downstream VoIP traffic and one each for the upstream and downstream aggregates consisting of all the best effort traffic. Each CM maintains queues for each of its upstream flows. The CMTS maintains individual queues for the downstream traffic of the service flows of the set of CMs that it controls. When a service flow contains multiple TCP/IP connections, packets arriving to the service flow queue are served in FCFS order.
The DOCSIS protocol intentionally does not address the specifics of scheduling. Hence, development of specific scheduling algorithms is left to the CMTS vendor. The algorithms developed by the vendor typically provide tuning mechanisms accessible to the cable system operator. Consequently, details of the scheduling algorithms are proprietary. The research upon which this paper is based was supported by a Cisco Research Award and involved periodic discussion with Cisco's DOCSIS development group.
Our findings were made available to both Cisco and the general public during the course of the project. A recent Web document available from Cisco advertises a "DOCSIS WFQ scheduling engine" [6] for the DOCSIS 3.0 CMTS but provides no details on the underlying scheduling algorithms. The authors of this paper had no access to any proprietary Cisco information nor direct involvement in the development of any Cisco products. Therefore, we also have no knowledge regarding the extent (if any) to which our work might have been used by Cisco in the development of its DOCSIS offerings.
B. Weighted fair queuing and scheduling
We now present a brief review of weighted fair allocation of capacity and weighted fair share scheduling. Weighted fairness in capacity allocation has been widely studied in the context of computer networks [7] and theoretical graph flow problems [8]. The most commonly accepted measure of weighted fair allocation of capacity is called weighted max-min fair allocation. It is characterized by Keshav [7] as satisfying the following constraints:
• Resources are allocated in order of increasing demand, normalized by weight.
• No source gets a resource share larger than its demand.
• Sources with unsatisfied demands get resource shares in proportion to their weights.
Scheduling algorithms that provide weighted max-min fair sharing of capacity among competing flows have also been widely studied. One such class of algorithms is based upon generalized processor sharing (GPS) [9]. Although GPS is itself a theoretical and unrealizable algorithm, it provides precise weighted max-min fair sharing and is commonly used as the benchmark against which realizable algorithms are compared. In GPS all backlogged flows continually receive infinitesimal service quanta in proportion to their weights. Assume three flows are weighted (1/2, 1/3, 1/6) and packets requiring two units of service simultaneously arrive. Under GPS scheduling flow 0's packet will complete service at time 4. At that time flow 1's packet will have consumed 4/3 units of service and flow 2's packet 4/6. In the next unit of real time flow 1's packet will consume its remaining 2/3 units of service and will complete at time 5. Flow 2's packet will complete at time 6.
Advantages of GPS scheduling include precise delivery of service by flow weight and avoidance of the head-of-line blocking problem that occurs in a packet-based scheduler when transmission of a long low priority packet is initiated shortly before a short high priority packet arrives on a flow with an empty queue.
The study of GPS scheduling motivated the development of packet schedulers whose objective was to provide service comparable to GPS. The weighted fair queuing (WFQ) algorithm [9] requires that the packet scheduler carry out a parallel simulation of a GPS scheduler; when the channel becomes idle, the packet that would have finished first under GPS is selected for scheduling. It was discovered by Bennett and Zhang [10] that fairness could be improved by limiting scheduler choice to packets that had already commenced service under GPS. This refinement to WFQ was called worst-case fair weighted fair queuing (WF2Q).
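The fluid GPS dynamics in the three-flow example can be reproduced with a short event-driven computation; a minimal sketch, with the weights, demands, and unit-rate server taken from the example (the function name is ours):

```python
def gps_completion_times(weights, demands, rate=1.0):
    """Fluid GPS: at every instant, backlogged flows receive service in
    proportion to their weights; returns each packet's completion time."""
    remaining = list(demands)
    done = [None] * len(demands)
    t = 0.0
    while any(f is None for f in done):
        active = [i for i, f in enumerate(done) if f is None]
        wsum = sum(weights[i] for i in active)
        # time until the next backlogged flow drains
        dt = min(remaining[i] * wsum / (weights[i] * rate) for i in active)
        for i in active:
            remaining[i] -= dt * weights[i] * rate / wsum
            if remaining[i] <= 1e-12:
                done[i] = t + dt
        t += dt
    return done

# Weights (1/2, 1/3, 1/6), each packet needing 2 units of service:
print([round(x, 6) for x in gps_completion_times([1/2, 1/3, 1/6], [2, 2, 2])])
# [4.0, 5.0, 6.0]
```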
The WF2Q algorithm still required the computationally costly parallel simulation of GPS. Zhang [11] proposed the virtual clock scheduler, which was claimed to provide fairness comparable to WFQ but without the cost of simulating GPS. Although the algorithm was flawed [2], the concept of the virtual clock was useful and is an important component of Golestani's self-clocked fair queuing (SCFQ) algorithm [2]. Our work extends SCFQ, and both the original and extended versions of SCFQ are described in detail in section IV.
Another class of fair share scheduling algorithms is based upon simple round robin scheduling. The most general of these, weighted deficit round robin (WDRR) [12], is very computationally efficient and can also provide weighted max-min fair sharing of capacity. It will also be described in more detail in section IV.
The WFQ, WF2Q, SCFQ, and DRR algorithms all ensure asymptotically weighted max-min fair sharing of channel capacity. They differ in the bounds they guarantee between the completion time of a real packet and its theoretical counterpart in a GPS system.
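Since SCFQ is central to what follows, a minimal single-channel sketch may help fix ideas (class and variable names are ours; the algorithm itself is detailed in section IV). Each arriving packet is stamped with a finish tag computed from the virtual time, which SCFQ takes to be the tag of the packet currently in service, and packets are transmitted in tag order:

```python
import heapq
from collections import defaultdict

class SCFQ:
    """Self-clocked fair queuing on a single channel (sketch)."""
    def __init__(self):
        self.vtime = 0.0                    # tag of the packet in service
        self.last_tag = defaultdict(float)  # last finish tag per flow
        self.queue = []                     # (tag, seq, flow, length)
        self.seq = 0                        # FCFS tie-breaker

    def enqueue(self, flow, length, weight):
        # Finish tag: this flow's previous tag, or the current virtual
        # time if the flow was idle, plus weight-normalized service time.
        tag = max(self.vtime, self.last_tag[flow]) + length / weight
        self.last_tag[flow] = tag
        heapq.heappush(self.queue, (tag, self.seq, flow, length))
        self.seq += 1

    def dequeue(self):
        tag, _, flow, _ = heapq.heappop(self.queue)
        self.vtime = tag   # the virtual clock advances to the served tag
        return flow

# Flow A has twice flow B's weight; while both are backlogged,
# A is served twice as often:
s = SCFQ()
for _ in range(4):
    s.enqueue("A", 1, 2.0)
for _ in range(2):
    s.enqueue("B", 1, 1.0)
print([s.dequeue() for _ in range(6)])  # ['A', 'A', 'B', 'A', 'A', 'B']
```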
III. WEIGHTED FAIR SHARING OF TYPE 2 BONDED CHANNELS
In the introduction, we showed that type 2 channel bonding could reduce, but did not necessarily eliminate, constraints on fractional allocation of aggregate capacity that were imposed by the mapping of flows and channels to bonding groups. In this section we address the issue of determining the feasibility of weighted fair sharing of aggregate capacity of type 2 bonded channels given a specific configuration.
Suppose a source and destination are connected by m channels. Each of these channels has a capacity Cj measured in Mbps, and the aggregate capacity of the system is C = Σ_{j=0}^{m−1} Cj. Assume that the workload consists of n flows, Fi, where each flow has a target fractional allocation of the aggregate capacity, ri, and Σ_{i=0}^{n−1} ri = 1. Without loss of generality, we replace the explicit mapping of flows and channels to bonding groups by an n × m binary-valued mapping matrix, δi,j. Flow i is allowed to use channel j iff δi,j = 1. A system configuration is thus completely defined by the vectors Cj and ri and the matrix δi,j.
An allocation strategy is defined by a matrix ai,j that gives the portion of the capacity of channel j that is allocated to flow i. We will use the notation ai = Σ_{j=0}^{m−1} ai,j to refer to the total allocation of flow i. A feasible allocation strategy must satisfy three constraints:
• ai,j > 0 only if δi,j = 1. Flows may use only eligible channels.
• Σ_{i=0}^{n−1} ai,j ≤ Cj ∀j. No channel is overloaded.
• ai = ri C ∀i. Each flow receives its targeted share of aggregate system capacity.
It is easy to see that not all configurations have feasible allocation strategies. The configuration Cj = (10, 10), ri = (0.4, 0.6), and
δi,j = | 1 1 |
       | 0 1 |
has no feasible allocation because flow 1 has access to only 50% of aggregate capacity. However, if ri = (0.6, 0.4), then
ai,j = | 10 2 |
       |  0 8 |
is a feasible allocation. In this particular case, the allocation strategy happens to be unique, but in general this is not true.
A. Computing feasibility
Given a possible configuration (Cj, ri, δi,j), a linear optimization approach can be used to determine whether there exists an allocation ai,j that satisfies the three constraints. If there is a feasible allocation, the solution to the linear optimization will find it. Each ai,j for which δi,j is non-zero must be represented. The program is
maximize P = Σ_{δi,j = 1} ai,j
subject to, for each channel j,
Σ_{i : δi,j = 1} ai,j ≤ Cj,
and for each flow i,
Σ_{j : δi,j = 1} ai,j ≤ ri C.
The first set of inequalities ensures that no channel is overcommitted. The second set ensures that no flow obtains more than its weighted share. If the solution yields P = C, then the resulting values ai,j define a feasible allocation: the aggregate capacity of the network is fully allocated, and since Σ ri = 1 and P = C, every flow must have received exactly its weighted share.
B. Weighted max-min fair sharing
Weighted max-min fairness is a commonly used criterion for characterizing fairness. Allocation of capacity is weighted max-min fair iff the two following constraints are satisfied:
• the aggregate flow through the network is maximized; and
• it is not possible to increase the share, ai, of any flow, fi, without decreasing the share, ak, of another flow fk for which ak/rk ≤ ai/ri.
A feasible allocation strategy, as described above, does represent a weighted max-min fair sharing of aggregate capacity with respect to the weights {ri}. Since Σ_q aq = C, it is not possible to increase the share of flow i without decreasing the share of flow k. If the strategy is feasible, then ak = rk C, and so ak/rk = C ∀k. Therefore, ak/rk = ai/ri ∀i, k.
C. Maintaining fairness under varying demand
It is well known that, on a single channel system, GPS based scheduling systems such as WFQ and SCFQ produce allocations that converge to weighted max-min fairness regardless of offered load. In contrast, it is easy to show that for type 2 bonded channels there are configurations with feasible allocation strategies when demand is unbounded for which it is not possible to preserve the target allocation ratios, ri, under a work conserving scheduler if some flows do not have unbounded demand. Suppose ri = 1/3 ∀i and the following mapping is used:
δi,j = | 1 0 0 |
       | 1 1 0 |
       | 0 1 1 |
Under the unbounded demand assumption, there are an uncountable number of feasible allocation strategies. An obvious one is
ai,j = | 10 0 0 |
       | 0 10 0 |
       | 0 0 10 |
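The feasibility test of section III-A can also be answered without a linear-programming solver: the constraints form a bipartite transportation problem, so a feasible allocation exists iff a maximum flow from a source (an edge of capacity ri·C to each flow node) through the eligible channels (an edge of capacity Cj from each channel node to a sink) saturates C. The sketch below uses a textbook Edmonds-Karp max-flow; the function names are ours and this construction is our illustration, not a method from the paper.

```python
from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp max flow on a dense residual-capacity matrix."""
    n = len(cap)
    flow = 0.0
    while True:
        # BFS for a shortest augmenting path in the residual graph
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] > 1e-9:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:
            return flow
        # find the bottleneck along the path, then augment
        b, v = float("inf"), t
        while v != s:
            b = min(b, cap[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            cap[parent[v]][v] -= b
            cap[v][parent[v]] += b
            v = parent[v]
        flow += b

def feasible(C, r, delta):
    """True iff every flow i can receive exactly r[i] * sum(C)
    using only channels j with delta[i][j] == 1."""
    n, m = len(r), len(C)
    total = sum(C)
    N = n + m + 2                # node 0 = source, node N-1 = sink
    cap = [[0.0] * N for _ in range(N)]
    for i in range(n):
        cap[0][1 + i] = r[i] * total           # source -> flow i
        for j in range(m):
            if delta[i][j]:
                cap[1 + i][1 + n + j] = total  # flow i -> eligible channel j
    for j in range(m):
        cap[1 + n + j][N - 1] = C[j]           # channel j -> sink
    return abs(max_flow(cap, 0, N - 1) - total) < 1e-6

# The examples from this section:
print(feasible([10, 10], [0.6, 0.4], [[1, 1], [0, 1]]))  # True
print(feasible([10, 10], [0.4, 0.6], [[1, 1], [0, 1]]))  # False
```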
Let di represent the fractional demand of flow i and assume d0 and d2 are unbounded. On a single channel system with
a GPS scheduler, for all values of d1