Aggregate Flow Control: Improving Assurances for Differentiated Services Network

Biswajit Nandy, Jeremy Ethridge, Abderrahmane Lakas, Alan Chapman
Nortel Networks, Ottawa, Canada
Email: {bnandy, jethridg, alakas, achapman}@nortelnetworks.com

Abstract: The Differentiated Services Architecture is a simple, but novel, approach for providing service differentiation in an IP network. However, various issues must be addressed before any sophisticated end-to-end services can be offered. This work proposes an Aggregate Flow Control (AFC) technique, combined with a Diffserv traffic conditioner, to improve the bandwidth and delay assurance of differentiated services. A prototype has been developed to study the end-to-end behavior of customer aggregates. In particular, the new approach improves performance in the following ways: (1) it addresses fairness issues among aggregated customer traffic arising from differing numbers of micro-flows per aggregate, the interaction of non-responsive (UDP) and responsive (TCP) traffic, and differing packet sizes within aggregates; (2) it improves transactions per second for short TCP flows; and (3) it reduces inter-packet delay variation for streaming UDP traffic. Experiments are also performed in a topology with multiple congestion points to show an improved treatment of conformant aggregates, and the ability of AFC to handle multiple aggregates and differing target rates.

Keywords: Aggregate Flow Control, Congestion Management, Diffserv, TCP-friendly.
I. INTRODUCTION
The Differentiated Services (Diffserv) architecture [2] has recently become the preferred method for addressing QoS issues in IP networks. An end-to-end differentiated service is obtained through the concatenation of per-domain services and Service Level Agreements (SLAs) between adjoining domains along the traffic path, from source to destination. Per-domain services are realized by traffic conditioning at the edge and simple differentiated forwarding mechanisms at the core of the network. One of the forwarding mechanisms recently standardized by the IETF is the Assured Forwarding (AF) Per Hop Behavior (PHB) [4]. The basis of the AF PHB is differentiated dropping of packets during congestion at a router. To build an end-to-end service with AF, subscribed traffic profiles for customers are maintained at the traffic conditioning nodes at the edge of the network. The aggregated traffic is monitored, and packets are marked at the traffic conditioner. When an aggregate's measured traffic is within its committed information rate (CIR), its packets are marked with the lowest drop precedence, dp0. When traffic exceeds its CIR but falls below its peak information rate (PIR), packets are marked with a higher drop precedence, dp1. If the measured traffic exceeds its PIR, packets are marked with the highest drop precedence, dp2. At the core of the network, during congestion, packets with a dp1 marking have a higher probability of being dropped than packets with a dp0 marking. Similarly, packets with a dp2 marking have a higher probability of being dropped than packets with a dp0 or dp1 marking. The different drop probabilities are achieved with a "RED-like" [1] Active Queue Management (AQM) technique, with three different sets of RED parameters, one for each of the drop precedence markings.

Although the IETF Diffserv Working Group has finalized the basic building blocks for Diffserv, the possible end-to-end services that can be created for an end user of the AF PHB are still under development. Various issues with bandwidth and delay assurance in an AF PHB Diffserv network have been reported in recent research papers [5][6]. These issues need to be resolved before quantitative assurances of some form can be specified in SLA contracts. Recent papers have addressed the issues using intelligent traffic conditioning at the network edge [7][8][9] to improve bandwidth assurances. The intelligent traffic conditioning approach mitigates the impact of various factors that affect the distribution of available bandwidth, such as UDP/TCP interaction, RTT disparities, and target rate. A second approach is to address the very reason for the congestion at the network core by enforcing flow control on the customer aggregates at the edges of a domain. The architecture discussed in this paper, Aggregate Flow Control (AFC), follows this congestion management approach. AFC manages congestion in a manner that is fair to competing customer aggregates and that reduces the queues at the core of the network.

AFC is an edge-to-edge control mechanism that has been combined with Diffserv traffic conditioning to address assurance issues for AF-based services. This new Diffserv overlay promotes treatment of aggregates that is independent of user data: AFC performs congestion management on an aggregate, regardless of what user data it carries. Such data transparency (a) improves fairness in TCP/UDP interactions, and it renders the achieved bandwidth of an aggregate insensitive to (b) the number of microflows it contains and (c) its packet sizes. Additionally, AFC pushes congestion from the shared core of a Diffserv network onto the edges whose ingress traffic is causing congestion. This transition (a) improves the treatment of conformant aggregates, (b) protects short-lived TCP flows, and (c) reduces the jitter of streaming traffic.

Section II discusses related studies of end-to-end bandwidth assurance issues in Diffserv networks and proposed improvements. Section III describes the architecture of the integrated edge router, which combines a Diffserv traffic conditioner with AFC. Section IV details the implementation and Section V reports the experimental results. Finally, the discussion and conclusions are captured in Sections VI and VII, respectively.
II. RELATED WORK
Ibanez and Nichols [5], via simulation studies, show that RTT, target rate, and TCP/UDP interactions are key factors in the throughput of flows that obtain an Assured Service using a RIO-like (RED with In/Out) [3] scheme. Their main conclusion is that such an Assured Service "cannot offer a quantifiable service to TCP traffic." Seddigh, Nandy, and Pieda [6] confirm with a detailed experimental study that those factors are critical in biasing the distribution of excess bandwidth in an over-provisioned network.

Lin, Zheng, and Hou [7] propose an enhanced TSW profiler and two enhanced RIO queue management algorithms. Simulation results show that the combination of enhanced algorithms improves the throughput and fairness among aggregated customer flows with different target rates, RTTs, and co-existing TCP and UDP flows. However, the proposed solutions may not be scalable, due to their dependence on state information at the core of the network. Yeom and Reddy [8] suggest an algorithm that improves fairness for the case of individual flows in an aggregate that have different RTTs. The proposed algorithm maintains per-flow information at the edge of the network. Nandy et al. [9] use intelligent traffic conditioning at the edge of the network to address throughput issues in AF-based Diffserv networks. An improved policer algorithm is suggested to mitigate the impact of RTT disparities on the distribution of excess bandwidth. TCP/UDP interaction is addressed by mapping TCP and UDP flows to different drop precedences.

Harrison and Kalyanaraman [12] propose an overlay architecture and algorithms for edge-to-edge traffic control. A core router indicates congestion by marking a bit at the IP layer. On receipt of a congestion notification, the edge node enforces rate control on incoming traffic aggregates. This edge-to-edge rate control approach could be integrated with a Diffserv traffic conditioner to address the bandwidth and delay related issues in a Diffserv network; however, the paper does not discuss this extension.
III. ARCHITECTURE
Many of the above-mentioned bandwidth and delay assurance issues in a Diffserv network can be addressed if the customer traffic aggregates are managed in a controlled, TCP-friendly [13] manner. The Aggregate Flow Control overlay regulates the flow of the aggregated customer traffic into the core of the network. The control mechanism is based on the feedback from the core—i.e., packet drop due to congestion at the core.
Figure 1. Diffserv Network with AFC (customer traffic passes through an AFC traffic conditioner and a Diffserv traffic conditioner at the edge router, then crosses the core routers)
Traffic aggregates that are exceeding their committed rate and causing congestion at the core are throttled at the edge of the network. Thus, the queues at the core of the network remain small, due to congestion control at the network edges. The queues caused by misbehaving aggregate flows are pushed back to the ingress point of the corresponding edge routers. As shown in Figure 1, the building blocks for the TCP-friendly Aggregate Flow Control mechanism reside with the Diffserv traffic conditioners at the edge routers. This approach does not assume any special congestion notification at the core routers.

The edge-to-edge AFC architecture is an extension to a scheme called "TCP Trunking," proposed by Chapman and Kung [10]. The design, implementation, and performance of TCP Trunking in a best-effort network are reported by Kung and Wang [11]. AFC extends and improves TCP Trunking and integrates it with a standard Diffserv traffic conditioner.

AFC works in the following manner: (a) control TCP connections are associated with a customer traffic aggregate between two network edges; (b) control TCP packets are injected into the network to detect congestion along the path of the aggregated data; and (c) based on the packet drops of the control flow at the core routers, congestion control is enforced on the aggregated customer traffic at the ingress edge router. The control TCP connections govern the behavior of the aggregates between the edges. Although the aggregated customer traffic and its associated control TCP packets travel in separate flows, the traffic aggregate can be viewed virtually as the payload of the control TCP flows, while the control packets are the headers. It is assumed that control packets and data packets follow the same path between the edges. Thus, it can be said that, at the core, there are a few special TCP flows carrying customer aggregates between two edges.
Figure 2. Block Diagram of a Traffic Conditioner (classifier, Flow Control Unit, credit meter, and Diffserv meter, policer, and marker)
Figure 2 depicts the functional block diagram for an edge node. A classifier segregates the traffic belonging to various customer aggregates. Each customer aggregate between two edges is controlled by a virtual TCP connection. Customer data is queued at the Flow Control Unit. The data is forwarded to the Diffserv traffic conditioner only if credit is available for that customer aggregate. Credit is decremented whenever customer data is forwarded to the Diffserv traffic conditioner.

AFC uses the notion of a virtual maximum segment size (vmss) for the control TCP flow. As explained earlier, the customer data is conceptually the payload for the control TCP. The virtual maximum segment size for the control flow is the amount of customer data allowed for each credit (e.g., 1514 bytes of customer data). The Flow Control Unit generates a control packet (i.e., injects a header packet) after every vmss bytes of data that it transmits. Credit generation is determined by the state (or the congestion window) of the virtual control TCP connection, as detailed in Section IV. The Credit Meter accumulates credit upon the generation of a control packet by the Flow Control Unit. Thus, customer data arrives at the input of the Diffserv traffic conditioner only if the Flow Control Unit allows the data to proceed. The user data and its associated control packets are metered, policed, and marked in the same manner as in any standard Diffserv traffic conditioner. An underlying assumption of Aggregate Flow Control is that control packets follow the same path as their associated data packets.

Each aggregate data stream is associated with a control flow. The control flow allows the customer traffic to grow during light to no congestion, while throttling the traffic if
congestion is detected. This technique pushes the queues from the core routers to the edge of the network, because it limits the sending rate of a congestion-causing aggregate at its ingress edge node. In the AFC implementation used for this paper's experimentation, each edge node employs RED to manage its queues. However, it is worth noting that once congestion is pushed onto an edge node, any intelligent traffic management scheme may be applied.

Aggregate Flow Control is applicable to services with double-ended SLAs, i.e., cases for which both the source and destination of traffic aggregates are known. One example of this scenario is a VPN service. The double-ended restriction arises because a control TCP connection must be established between two edges.

The Aggregate Flow Control mechanism in a Diffserv network can be further explained with an example, using the experimental topology illustrated in Figure 4 (see Section V). Consider the case of two traffic aggregates, labeled A and B. Aggregate A runs between edge nodes E1 and E3, while Aggregate B runs between E2 and E3. Thus, each aggregate passes through the same core router, C1. Assume that E1 receives Aggregate A traffic below its CIR and that E2 receives Aggregate B traffic above its PIR. The bottleneck link is between the core router and edge node E3; suppose that it can only support the committed information rates of the two customers. Aggregate A's traffic is marked with drop precedence dp0, while Aggregate B's is marked with dp0, dp1, and dp2. Each aggregate's control TCP packets are marked in the same manner as its data packets. The congestion at the core causes dp2 traffic to be dropped with a higher probability. Due to the drop of dp2 control traffic, the control TCP throttles Aggregate B at E2. This congestion control reduces the queue length at the shared core router and pushes the queue onto E2 only. Aggregate A, which is conforming to its profile, therefore receives better service.

At the core of the network, when there is no congestion, no data or control packets are dropped. The Flow Control Unit at the edge node allows increased data traffic by following TCP's Additive Increase, Multiplicative Decrease (AIMD) congestion control mechanism. As soon as a control packet is dropped, the virtual control TCP throttles the flow of customer data (following the AIMD mechanism) by limiting the availability of credit at the credit meter.
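To make the marking behaviour in this example concrete, here is a minimal sketch of the CIR/PIR-based drop-precedence decision described in the Introduction. The names (mark_packet, measured_rate_bps) are hypothetical, and the rate measurement itself (the testbed in Section V uses a TSW tagger [3]) is abstracted away; this is an illustration of the decision logic only, not the paper's implementation.

# Minimal sketch of CIR/PIR-based drop-precedence marking (names hypothetical).
# Rate estimation (e.g., a time-sliding-window estimator) is abstracted behind
# measured_rate_bps.

DP0, DP1, DP2 = 0, 1, 2   # lowest, middle, and highest drop precedence

def mark_packet(measured_rate_bps, cir_bps, pir_bps):
    """Return the drop precedence for a packet of an aggregate whose measured
    sending rate is measured_rate_bps."""
    if measured_rate_bps <= cir_bps:
        return DP0        # within the committed information rate
    if measured_rate_bps <= pir_bps:
        return DP1        # between CIR and PIR
    return DP2            # above the peak information rate

# Example: an aggregate with CIR = 1 Mbps and PIR = 2 Mbps currently measured at
# 1.5 Mbps would have its packets marked dp1.
print(mark_packet(1.5e6, cir_bps=1e6, pir_bps=2e6))   # -> 1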
IV. IMPLEMENTATION DETAIL
Figure 1 treats an AFC edge node as two functional blocks; data passes through an AFC Traffic Conditioner into a Diffserv Traffic Conditioner. Figure 2 expands on that view by denoting the six process blocks that comprise those two conditioners. Figure 3 provides greater detail of the AFC
blocks, while representing all of the Diffserv functionality in a single block.

Figure 3. Aggregate Flow Control Data Path

The AFC Traffic Conditioner employs a token bucket scheme to regulate data output from the customer network into the core network, as illustrated in Figure 3. Incoming packets are intercepted by the Classifier (Cl.), classified as belonging to a particular aggregate, and queued according to that classification. The Flow Control Unit (FCU) schedules from the customer data queues, based on the TCP credit bucket. The FCU can only forward user packets when TCP credit is available in the bucket. When the FCU sends a packet for output, the bucket is drained by the size of that packet. For every vmss worth of user data that is sent, a control packet is generated and sent to the TCP stack. All data and control packets are handed to the Diffserv conditioner for metering, policing, and marking. The AFC architecture does not require any changes to the Diffserv functionality; it is simply an overlay that controls input to the Diffserv block.

An Output Handler (OH) monitors all outgoing packets. When the OH detects a control packet, it increments the TCP credit bucket for the associated aggregate by vmss bytes. Control packets, like any TCP packets, may be held in the stack if their congestion window does not allow a transmission. The loss of a control packet in the network will slow the transmission rate of control packets; and, because control packets regulate data packets, the drop will slow the transmission rate of user data.

Figure 3 and its explanation so far include two important simplifications. Firstly, the figure depicts only one TCP credit bucket. In fact, each aggregate has its own bucket.
The second simplification concerns the TCP control flows. The loss of a TCP control flow's packet causes that flow to halve its congestion window. If each aggregate were regulated by a single control flow, a packet loss would cause the entire aggregate to halve its sending rate. When dealing with large traffic aggregates, such congestion control is too harsh. Instead, multiple control TCP flows are used for each aggregate. Consider an aggregate that is managed by four control flows. If one of those flows loses a packet, that flow halves its congestion window, but the other flows are unaffected. Since one of the four congestion windows that control the aggregate is halved, the aggregate's effective control window is cut by one-eighth. The experiments detailed in this paper use four control flows per aggregate, which results in smoother behaviour than a single flow gives. There is a practical limit to how many control flows should be used: given a fixed queue size at the core routers, an arbitrarily high number of control flows implies that many of those flows will be in the slow-start state. In the experimental testbed, four control flows produce smooth behaviour, and fixing the number of control flows allows the core RED parameters to be chosen. To produce the fair treatment that is one of the goals of AFC, all aggregates are governed by the same number of control flows.

At initialization time, the TCP credit is set to (n+1) vmss, where n is the number of control flows, to avoid a potential deadlock. Assume that vmss is 1514 bytes and that the user sends 1024-byte packets. If the credit is initialized to 1 vmss, the first packet will drain 1024 bytes from the bucket. The second user packet cannot be sent, since the remaining credit is 490 bytes. No control packet is generated, because vmss bytes of user data have not been sent. This lack of credit update causes a deadlock. If the initial credit is (n+1) vmss, the second user packet and a control packet will be transmitted, and vmss worth of TCP credit will be added. Since there is no loss of credit in the system, there is no potential for deadlock. The only requirement is that the minimum allowable vmss is one MTU.

The AFC mechanism schedules from the multiple control flows in a round-robin manner. However, the scheduler also detects whether a control flow has written a packet to the stack that has not yet reached the Output Handler (see Figure 3). The scheduler skips any flow with a packet pending in the stack. This step prevents the shared TCP credit from being consumed on a control packet that is guaranteed to wait in the stack behind another pending control packet. Thus, a packet drop in one control flow does not block the other control flows. In the unusual case that all control flows have packets pending output, the scheduler uses the flow whose turn it was in the round robin.

If a customer aggregate is throttled, the customer data queue will grow. In the experiments, a single-level RED algorithm is applied to these queues for packet dropping.
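The following sketch pulls together the mechanics described in this section: a per-aggregate credit bucket drained by forwarded user data, one control packet handed to the TCP stack for every vmss bytes of user data sent, credit replenished by the Output Handler when a control packet leaves the node, round-robin selection over the control flows that skips any flow with a packet still pending in the stack, and an initial credit of (n+1) vmss. All names are hypothetical, and the control TCP connections (ordinary NewReno flows in the prototype) are reduced to a pending flag so that only the credit and scheduling logic is visible; this is a simplified sketch, not the VxWorks implementation.

# Simplified sketch of the AFC edge data path (names hypothetical). The control
# flows are ordinary TCP connections in the prototype; here each is reduced to a
# "packet pending in the stack" flag so the credit and scheduling logic stands out.

from collections import deque

VMSS = 1514                 # virtual maximum segment size (bytes)
CONTROL_PKT_SIZE = 40       # header-only control packet

class ControlFlow:
    def __init__(self):
        self.pending = False            # control packet written to the stack, not yet output

class Aggregate:
    def __init__(self, n_control_flows=4):
        self.queue = deque()            # queued customer packet sizes (bytes)
        self.flows = [ControlFlow() for _ in range(n_control_flows)]
        self.rr = 0                     # round-robin pointer over control flows
        self.credit = (n_control_flows + 1) * VMSS   # (n + 1) * vmss avoids deadlock
        self.bytes_since_control = 0    # user bytes sent since the last control packet

    def pick_control_flow(self):
        """Round robin over control flows, skipping flows with a packet pending
        in the stack; fall back to the flow whose turn it was if all are pending."""
        n = len(self.flows)
        for i in range(n):
            flow = self.flows[(self.rr + i) % n]
            if not flow.pending:
                self.rr = (self.rr + i + 1) % n
                return flow
        flow = self.flows[self.rr]
        self.rr = (self.rr + 1) % n
        return flow

    def forward(self, output):
        """Flow Control Unit: forward queued user packets while credit allows,
        generating one control packet per vmss bytes of user data."""
        while self.queue and self.credit >= self.queue[0]:
            size = self.queue.popleft()
            self.credit -= size                       # bucket drained by packet size
            output(("data", size, None))
            self.bytes_since_control += size
            if self.bytes_since_control >= VMSS:
                self.bytes_since_control -= VMSS
                flow = self.pick_control_flow()
                flow.pending = True                   # handed to the TCP stack
                output(("control", CONTROL_PKT_SIZE, flow))

    def output_handler(self, packet):
        """Output Handler: a departing control packet replenishes vmss bytes of
        credit for its aggregate and frees its control flow."""
        kind, _, flow = packet
        if kind == "control":
            flow.pending = False
            self.credit += VMSS

# Usage, mirroring the deadlock example above: two 1024-byte user packets cause
# one control packet to be generated after the second packet.
agg = Aggregate()
sent = []
agg.queue.extend([1024, 1024])
agg.forward(sent.append)
for pkt in sent:
    agg.output_handler(pkt)            # pretend every packet left the node immediately
print([kind for kind, _, _ in sent])   # -> ['data', 'data', 'control']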
V. EXPERIMENTAL RESULTS
Studies were performed using an experimental testbed that includes Diffserv and AFC building blocks. The devices run on a Pentium platform with VxWorks as the RTOS. Figure 4 shows the basic experimental topology. The setup consists of four router elements: E1, E2, E3, and C1. Each edge device is connected to an end host or traffic source. The netperf [14] tool is used as the TCP traffic generator and udpblast is used to create UDP streams. Each link in the topology has a 5 Mbps capacity.

Figure 4. Single Congestion Point Topology (edge devices E1, E2, E3; core device C1; clients 1, 2, 3)
The experiments detailed in this section refer to Aggregate 1 and Aggregate 2. Aggregate 1 runs between Client 1 and Client 3, and Aggregate 2 runs between Clients 2 and 3. Each therefore passes through a single congested core device. The bottleneck link is between nodes C1 and E3. The edge devices in the testbed classify packets based on source and destination IP addresses. The policer utilizes the Time Sliding Window (TSW) tagger [3]. The experiments compare conventional Diffserv, referred to as the Standard Traffic Conditioner (TC), to the AFC scheme.

The core device implements the AF PHB using the three-level version of RED. Three sets of RED thresholds are maintained in the core device, one for each drop precedence. A coupled queue accounting scheme is used, meaning that: (a) the probability of dropping a dp0 packet depends on the number of dp0 packets in the queue; (b) the probability of dropping a dp1 packet depends on the number of dp0 and dp1 packets in the queue; and (c) the probability of dropping a dp2 packet depends on the number of dp0, dp1, and dp2 packets in the queue. In all cases, decisions are made using a weighted running average of the queue size.
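A minimal sketch of this coupled accounting is given below, with hypothetical names, using the thresholds of Table 1 below and a linear RED drop profile between minth and maxth. The actual queue management code of the testbed is not given in the paper, so this is only an illustration of the coupling rule.

# Sketch of the coupled three-level RED accounting described above (names hypothetical).
# avg_by_dp[i] is the weighted running average number of queued packets of drop
# precedence i; an arriving dp_k packet is judged against the sum over dp0..dp_k.

RED_PARAMS = {0: (200, 240, 0.02),   # dp0: (minth, maxth, maxp), per Table 1
              1: (160, 200, 0.06),   # dp1
              2: (40, 160, 0.12)}    # dp2

def drop_probability(dp, avg_by_dp):
    """Linear RED drop probability for an arriving packet of precedence dp,
    computed from the coupled average occupancy."""
    min_th, max_th, max_p = RED_PARAMS[dp]
    coupled_avg = sum(avg_by_dp[i] for i in range(dp + 1))
    if coupled_avg < min_th:
        return 0.0
    if coupled_avg >= max_th:
        return 1.0
    return max_p * (coupled_avg - min_th) / (max_th - min_th)

# Example: with average occupancies of 30 dp0, 20 dp1, and 15 dp2 packets, a dp0
# arrival sees 30 (< 200) and a dp1 arrival sees 50 (< 160), so neither is dropped;
# a dp2 arrival sees 65, between 40 and 160, and is dropped with probability
# 0.12 * (65 - 40) / (160 - 40), about 0.025.
print(drop_probability(2, {0: 30, 1: 20, 2: 15}))   # ~0.025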
The following RED parameters are used in all experiments:

TABLE 1 CORE RED PARAMETERS

Prec.   minth (pkts)   maxth (pkts)   maxp
dp0     200            240            0.02
dp1     160            200            0.06
dp2     40             160            0.12
The physical queue size is capped at 250 packets and wq equals 0.002. It is important to note that the thresholds are not as high as they might appear. A vmss of 1514 bytes is used throughout the tests, meaning that the AFC mechanism will insert a 40-byte control packet for every 1514 bytes of user data. Thus, a 40-packet threshold allows 20 full-sized user packets and 20 control packets. Packet-count thresholds are used because the core RED implementation is packet-based, rather than byte-based. Each edge node in the testbed also uses a single-level RED queue, with thresholds of 30 and 60 packets, a maxp of 0.02, and a wq value of 0.002.

The first six experiments study the impact of various factors on throughput and delay. The fairness issues examined are the effect of a differing number of microflows in an aggregate, the impact of non-responsive flows, the effect of differing packet sizes, and the treatment of short-lived TCP flows. Additional experiments measure the interpacket delay of streaming data and the impact of multiple congested nodes. Finally, the applicability of AFC is demonstrated through experiments involving different numbers of aggregates and aggregates with differing profiles.

1st. Experiment: Impact of Number of Microflows

Service agreements in a Diffserv network are made on aggregated traffic, and various businesses will contract a target rate with a service provider. It is likely that different customers will have different numbers of microflows in a target aggregate. In an over-provisioned network, the aggregate with a larger number of TCP flows obtains a greater share of the excess bandwidth. This experiment shows that the sharing of excess bandwidth can be made insensitive to the number of microflows.

In this scenario, there are two sets of aggregated TCP flows. Each has the same Diffserv target: a 0.5 Mbps CIR and a 1.0 Mbps PIR. Aggregate 1 contains 10 TCP flows, while the number of TCP flows in Aggregate 2 varies from 5 to 25. With the Standard Diffserv Traffic Conditioner, the bandwidth obtained by each aggregate is directly proportional to its number of microflows. The experiment was repeated with Aggregate Flow Control enabled at the edge routers. The bandwidth achieved under AFC is independent of the number of microflows in the aggregate; each aggregate obtains an equal amount of bandwidth. The improved bandwidth distribution results because the control TCP flows transform the scenario from one in which 15 to 35 TCP flows vie for bandwidth into a case in which two aggregates compete for bandwidth. Sharing occurs on an aggregate level, which is independent of the data within an aggregate.
Figure 5. Number of Microflows (throughput vs. number of microflows in Aggregate 2; Aggregate 1: 10 flows)

Figure 6. TCP/UDP Interaction (throughput vs. Aggregate 2 UDP rate; Aggregate 1: 10 TCP flows)
2nd. Experiment: TCP/UDP Interactions
A paying Diffserv customer will inject both TCP and UDP traffic into the network. The interaction between TCP and UDP may cause the unresponsive UDP traffic to impact the TCP traffic in an adverse manner. Clearly, there is a need to protect responsive TCP flows from non-responsive UDP flows, while still protecting those UDP flows, such as multimedia streams, that require the same fair treatment as TCP. The Diffserv customer should decide the importance of the payload, assuming that the network is capable of handling both TCP and UDP traffic in a fair manner.

In this scenario, there are two sets of aggregated flows, each with a 1 Mbps CIR and a 2 Mbps PIR. Aggregate 1 comprises 10 TCP flows and Aggregate 2 has a UDP flow with a sending rate increasing from 1 Mbps to 5 Mbps. Each aggregate has an identical target rate, so after the UDP rate reaches 3 Mbps and causes congestion, the two aggregates should share the bandwidth equally; this is the data plotted in Figure 6 as the Expected Aggregate values. With the Standard Diffserv Traffic Conditioner, as the UDP streaming rate increases, the amount of bandwidth obtained by the TCP aggregate decreases. However, with Aggregate Flow Control, the UDP throughput is restricted by the control TCP flows. The aggregates share the bottleneck link bandwidth in a TCP-friendly manner, and the throughputs obtained by the aggregates are closer to the expected throughput. Additionally, it is worth noting that AFC does not restrict the TCP aggregate from consuming all of the excess bandwidth when the UDP streaming rate is 1 Mbps or 2 Mbps. This shows that the AFC scheme allows elastic growth of traffic aggregates when there is no congestion.
3rd. Experiment: Impact of Packet Size

When two aggregates send different sized TCP packets, the effect is similar to the case of two aggregates with differing numbers of microflows. With all other factors being equal, the aggregate that is sending larger packets will consume more of the available bandwidth, because its TCP flows' individual congestion windows grow more quickly. However, Aggregate Flow Control effectively applies a congestion window to each aggregate that is independent of packet size. A control packet is generated for every vmss bytes of user data, regardless of how many user packets it takes to reach vmss. Thus, AFC removes packet size as a factor in aggregate congestion control.

Figure 7 illustrates the difference between standard Diffserv and AFC. Aggregate 1 transmits 256-byte packets, while Aggregate 2's packets increase from 256 bytes to 1500 bytes. Each has a 1 Mbps CIR and a 2 Mbps PIR. Under Diffserv, as the disparity in packet sizes grows, so does the disparity in achieved bandwidth. With AFC, though, differing packet sizes are not a factor in aggregate bandwidth distribution.

4th. Experiment: Protection of Short TCP Flows

The objective of this experiment is to study whether short-lived TCP flows can be protected under congestion by enabling Aggregate Flow Control. A traffic mix of TCP and UDP is chosen to create congestion. Each aggregate has a 1 Mbps CIR and a 2 Mbps PIR. Aggregate 2 includes a base traffic of long-lived flows plus a series of short TCP flows. Each short flow consists of a single packet request from Client 3 to Client 2, followed by a 16-Kbyte response (a transaction). Transactions per second is the metric used to indicate the disparity between standard Diffserv and AFC.
Figure 7. Impact of Packet Size (throughput vs. Aggregate 2 packet size; Aggregate 1: 256-byte packets)

TABLE 2 SHORT-LIVED FLOWS WITH DIFFSERV

Test   Agg. 1 Total Traffic (E1-E3)   Agg. 2 Base Traffic (E2-E3)   Trans/sec   Avg. RTT (ms)   Avg. Q Core   Avg. Q E1   Avg. Q E2
1      4M UDP                         2M UDP                        0.58        775             160           1           1
2      4M UDP                         10 TCP                        0.55        775             159           1           1
3      10 TCP                         2M UDP                        0.94        605             123           1           1
4      10 TCP                         10 TCP                        0.31        393             83            1           1
5      10 TCP                         none                          1.33        188             53            1           1

TABLE 3 SHORT-LIVED FLOWS WITH AFC

Test   Agg. 1 Total Traffic (E1-E3)   Agg. 2 Base Traffic (E2-E3)   Trans/sec   Avg. RTT (ms)   Avg. Q Core   Avg. Q E1   Avg. Q E2
1      4M UDP                         2M UDP                        2.47        199             94 (47)       51          3
2      4M UDP                         10 TCP                        0.73        530             94 (47)       44          37
3      10 TCP                         2M UDP                        2.29        236             95 (43)       36          7
4      10 TCP                         10 TCP                        0.75        518             95 (43)       34          36
5      10 TCP                         none                          6.37        144             61 (31)       34          1

Table 2 lists the results of five tests conducted over a standard Diffserv network, while Table 3 covers the results of the same tests performed with Aggregate Flow Control. Note that the core queue in Table 3 includes data and control packets; the number inside the brackets shows the number of data packets in the core queue. The majority of data packets are of maximum segment size.

Test 1 studies the short TCP flows in the presence of competing UDP flows. With Aggregate Flow Control, transactions per second improve by a factor of four. This increase is due to two factors. One is that AFC promotes the fair sharing of bandwidth between TCP and UDP, as explained earlier. The second factor is that AFC pushes congestion from the core of a network onto the edge that is exceeding its profile. With standard Diffserv, Edge 1 forwards all of Aggregate 1's incoming traffic into the core of the network, causing a large queue at the core. The AFC mechanism throttles the sending rate from the edge of the network into its core. Under the Diffserv case, no queue develops at the edge, but the queue at the core averages 160 packets. With AFC, the average core queue is reduced to 94 packets. The average queue at Edge 1, whose aggregate is causing most of the congestion, swells to 51 packets, whereas the average queue at Edge 2 grows only to 3 packets. The AFC improvements are also reflected in the RTT differences, as measured along the transactional traffic path.

The same factors that explain the results of Test 1 apply to the other test cases, and one can examine the same metrics to make comparisons. Test 2 combines the short TCP flow with 10 long-lived TCP flows. Again, there is an improvement in transactions per second. The disparity is not as great as in Test 1, because the 10 TCP flows will ramp up to consume a higher share of the link than the fixed 2 Mbps UDP traffic of Test 1. Thus, the 10 long-lived TCP flows of Aggregate 2 leave less room for
the transactional traffic. However, even in this scenario, AFC outperforms standard Diffserv, in both transactions per second and RTT. In this scenario, the queue at the core caused by Aggregate 2's 10 TCP flows is pushed back to edge E2. If a separate queue were allocated at the edge for transactional traffic, the transactions per second could be greatly increased. It is up to a network administrator to decide whether to protect short-lived flows further, by either allocating a separate aggregate for them or by favouring them when managing edge queues.

Test 3 emphasizes the difference between Tests 1 and 2. Again, the additional traffic on Aggregate 2 is a UDP stream whose sending rate is less than that aggregate's fair share of the link. The AFC scheme enables the transactional traffic to consume that remaining fair share, while keeping the RTT low. Test 4 shows an improvement in transactions per second when only TCP traffic is sent. Even when the competing traffic is equal, the AFC architecture improves the treatment of short-lived flows. The transactions per second in Table 3 for Test 2 and Test 4 are almost equal. This similarity arises because, in each case, the short TCP flow encounters equal total queues (edge plus core) in the data path. Test 5 shows that even if a customer aggregate is sending only one short TCP flow (and is therefore well within its specified profile), in a Diffserv network, the transactions per second can suffer: it is difficult for the aggregate to ramp up to a significant share of the link. Aggregate Flow Control does a much better job of securing bandwidth for the short-flow aggregate, resulting in a transaction rate that is 4.5 times that of the standard Diffserv case.
5th. Experiment: Improving Interpacket Delay Characteristics for Streaming Traffic
This experiment demonstrates that Aggregate Flow Control can improve the interpacket delay of streaming UDP traffic in the presence of competing traffic from other customers. Ten TCP flows are sent along Aggregate 1, while 1 Mbps of UDP is streamed along Aggregate 2. Each aggregate has a 1 Mbps CIR and a 2 Mbps PIR. The expected interpacket arrival time for the UDP stream is 12 milliseconds. Figure 8 shows that the spread in interpacket arrival time is greater for the standard Diffserv scenario. The reason for the improved delay characteristic of AFC can be explained by the results of Test 3 of Experiment 4: Aggregate Flow Control significantly reduces the queue at the core in that scenario. The congestion caused by Aggregate 1's 10 TCP flows is pushed back to its corresponding edge node. AFC produces a smaller, less bursty core queue.
Figure 8. Interpacket Arrival Times (distribution of interpacket arrival gaps, in milliseconds, for DS and AFC)

6th. Experiment: Effect of Multiple Congested Nodes

The single congestion point topology serves to illustrate the advantages of Aggregate Flow Control. For the final experiments, though, a more complex topology is used, which adds two more congestion points, as illustrated in Figure 9. For Experiment 6, the throughput of an aggregate between Clients 1 and 5 is measured. That aggregate contains 10 TCP flows. Aggregates are enabled and disabled along the remaining clients in such a way as to produce congestion at one, two, or all three core routers. Specifically, the competing aggregates run from Host 2 to 3, from 3 to 4, and from 4 to 5; thus, each faces only one congested node. Each of these competing customer aggregates is a UDP flow streaming at 3 Mbps. Each aggregate uses a Diffserv profile of 1 Mbps CIR and 2 Mbps PIR.

Figure 9. Multiple Congestion Points Topology (clients 1-5 attached to edge devices E1-E5, connected through core devices C1, C2, and C3)

Figure 10 shows the effect of multiple congested nodes. In this scenario, the throughput drops very quickly under Diffserv. With AFC, the impact of multiple congestion points is mitigated. With three congested nodes, for example, the Diffserv throughput has dropped to 0.8 Mbps, whereas the AFC throughput is 1.4 Mbps. The reason that a long flow suffers when it passes through multiple congested nodes is twofold. Firstly, each packet has a chance of being dropped at each core node; therefore, the total probability of a packet drop increases with multiple nodes. Secondly, it is a known fact that for aggregates with RTT differences, the aggregate with the shorter RTT ramps up more quickly and consumes a greater share of excess bandwidth. This second point does not apply to the UDP traffic used in this experiment, but it is a factor with TCP traffic.

Figure 10. Effect of Multiple Congested Nodes (Aggregate 1-5 throughput vs. number of congested nodes; Aggregate 1: 10 TCP flows, other aggregates: 3 Mbps UDP)
The reason that AFC lessens the impact of multiple nodes is, again, that it pushes the queues at the congestion points onto the edges. Thus, compared with Diffserv, the drop probability and queueing delay increase less quickly as the number of congested nodes increases.
7th. Experiment: Varying Numbers of Aggregates

The first six experiments illustrate the advantages of AFC over a standard Diffserv network. The final two experiments demonstrate the applicability of AFC to more complex scenarios. These tests make no comparison to Diffserv; they simply illustrate that AFC extends beyond what has been shown so far. The Multiple Congested Nodes experiment uses a network with up to four competing aggregates. However, due to the symmetry of that experiment, no congested link is shared by more than two aggregates. Experiment 7 sends aggregates from Hosts 1, 2, 3, and 4 to Host 5. Thus, the various links are shared by between two and four aggregates. Each aggregate comprises 10 TCP flows, with an equal profile of a 1 Mbps CIR and a 2 Mbps PIR. Two, three, and four aggregates are sent at a time, and the bandwidth is measured between Edge 5 and Host 5. Table 4 details the results, which clearly show that AFC is able to divide the bandwidth equally among the active aggregates.

TABLE 4 BANDWIDTH DISTRIBUTION BETWEEN A VARIABLE NUMBER OF AGGREGATES (Mbps)

Active aggregates   Agg. 1-5   Agg. 2-5   Agg. 3-5   Agg. 4-5
2                   2.4        2.4        Off        Off
3                   1.4        1.7        1.7        Off
4                   1.1        1.2        1.2        1.1

8th. Experiment: Varying Target Rates

Although the other experiments in this paper illustrate equal bandwidth sharing for aggregates with equal profiles, AFC in fact distributes bandwidth according to profiles. This final experiment captures the results of two competing aggregates with different profiles. Each aggregate is composed of 10 TCP flows. Aggregate 1 has a fixed profile of a 1 Mbps CIR and a 2 Mbps PIR, while the profile for Aggregate 2 varies, as noted in Table 5.

TABLE 5 BANDWIDTH DISTRIBUTION WITH VARYING PROFILES

Profile 2 (CIR, PIR)   Agg. 1 BW (Mbps)   Agg. 2 BW (Mbps)
(0.0, 1.0)             2.8                1.9
(1.0, 2.0)             2.4                2.4
(2.0, 3.0)             1.8                3.0
(3.0, 4.0)             1.5                3.2
(4.0, 5.0)             1.1                3.6

The results illustrate that AFC respects different traffic profiles. Furthermore, excess bandwidth appears to be divided equally between aggregates. In every case, each aggregate has a PIR 1 Mbps higher than its CIR; thus, in this comparison, all bandwidth beyond the CIR can be considered excess. For example, consider the case in which Aggregate 2 has a profile of (2 Mbps, 3 Mbps). In this scenario, the total committed bandwidth is 3 Mbps (1 Mbps for Aggregate 1 and 2 Mbps for Aggregate 2). If the remaining 2 Mbps of bandwidth is divided equally, Aggregate 1 should expect a throughput of 2 Mbps, while Aggregate 2 should expect 3 Mbps. These figures match the measured results (1.8 Mbps and 3.0 Mbps) to within experimental error. The other trials also closely match the expected results, although as the total committed rate approaches the link speed, the bandwidth division falls further from the theoretical values.
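As a rough sanity check on these figures (not a model from the paper), the expected throughputs can be computed by giving each aggregate its CIR and splitting the remaining 5 Mbps of link capacity equally:

# Expected per-aggregate throughput if each aggregate receives its CIR plus an
# equal share of the remaining link capacity (a simple check against Table 5).
LINK_MBPS = 5.0

def expected_shares(cir1, cir2):
    excess = max(LINK_MBPS - (cir1 + cir2), 0.0)
    return (cir1 + excess / 2, cir2 + excess / 2)

for cir2 in (0.0, 1.0, 2.0, 3.0, 4.0):            # Aggregate 2 CIRs from Table 5
    print(cir2, expected_shares(1.0, cir2))
# Expected: (3.0, 2.0), (2.5, 2.5), (2.0, 3.0), (1.5, 3.5), (1.0, 4.0), versus the
# measured (2.8, 1.9), (2.4, 2.4), (1.8, 3.0), (1.5, 3.2), (1.1, 3.6) in Table 5.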
VI. DISCUSSION
This section discusses various issues related to Aggregate Flow Control in a Diffserv network. As demonstrated by the experiments of Section V, the AFC scheme improves the bandwidth and delay assurances in an AF PHB based Differentiated Services network. AFC promotes elastic sharing of available bandwidth among customer aggregates, introduces fairness among customer aggregates, and reduces queues at the core of the network. At times of no congestion, the customer aggregates are allowed to use the available bandwidth, while the aggregates are throttled as soon as congestion at the core starts dropping packets.

Why Aggregates? In a Diffserv network, performance commitments for customer traffic will be specified at aggregate levels, rather than on the level of individual flows. For example, it is likely that VPN applications in a Diffserv network will specify throughput, latency, and loss rate in terms of aggregates. Thus, for scalability reasons, the congestion control scheme for aggregates needs to operate without monitoring individual flows.

How good is the scheme? The AFC mechanism uses the well-understood AIMD congestion management approach of TCP. In fact, the simplicity of its implementation is one of the strengths of AFC. A standard TCP NewReno stack is used for the development of the prototype. The self-clocking nature of TCP adds to the robustness of the AFC scheme. The control loop is well defined, in the sense that new control packets can be injected into the network only if old ones leave the network.

Deployment Issues: The AFC approach requires an incremental modification at the Diffserv edge routers. As explained earlier, the AFC building blocks reside with the traffic conditioner block of a Diffserv edge router. AFC does not assume or require any special networking support at the core of the network. However, AFC traffic cannot be mixed with other, uncontrolled Diffserv traffic. In a mixed-traffic case, AFC traffic would start competing with individual Diffserv flows, and this competition would lead to unfair treatment of the (well-behaved) AFC traffic. This unfairness can be avoided by allocating a special queue with a specified bandwidth at the core nodes; all Aggregate Flow Controlled Diffserv traffic would use this queue for an improved end-to-end service. This solution would expedite a partial deployment of an AFC-enabled Diffserv network.

Improved treatment of customer aggregates: The AFC scheme pushes the queues for the respective customers to the edges. This transfer allows an individual customer aggregate or a particular traffic type to be treated in a special fashion. For example, a specialized algorithm at the edge becomes feasible for controlling loss and latency for certain customer aggregates. Another example is assigning a separate queue at the edge for transactional traffic (as discussed in Experiment 4, Test 2) for improved transactions per second.

Further Investigation: There are a number of issues requiring further investigation. (1) The number of control flows needed per customer aggregate is to be determined. (2) The impact of a higher vmss is to be studied; this information is needed to reduce the overhead due to control packets. The experiments of this paper inject a 40-byte control packet for every 1514 bytes (vmss) of user data; thus, the bandwidth overhead is 2.6%. If vmss is increased to 5 x 1514 bytes, the bandwidth overhead is reduced to 0.52%. The packet overhead for the reported experiments is the worst-case scenario, i.e., all packets are full-sized and vmss is 1514 bytes. However, the typical distribution of packet sizes [15] gives a mean packet size of 420 bytes; thus, a reasonable packet overhead is 5.56% (a short calculation below works through these figures). (3) The current AFC approach is applicable to one-to-one and one-to-many network topologies. Extending the AFC scheme to one-to-any topologies requires further investigation.
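The overhead figures in item (2) can be reproduced, to within rounding, with the small calculation below. It assumes that byte overhead is 40 control bytes per vmss of user data and that packet overhead is one control packet per vmss/(mean packet size) data packets; under that reading, the 5.56% figure corresponds to a vmss of 5 x 1514 bytes with a 420-byte mean packet size. This is one plausible reading of the quoted numbers, not a formula stated in the paper.

# Reproducing the control-packet overhead figures quoted above (one possible reading).
CONTROL_BYTES = 40

def byte_overhead(vmss):
    """Control bytes injected per byte of user data."""
    return CONTROL_BYTES / vmss

def packet_overhead(vmss, mean_packet_bytes):
    """Control packets injected per user data packet of the given mean size,
    i.e., one control packet per vmss / mean_packet_bytes data packets."""
    return mean_packet_bytes / vmss

print(byte_overhead(1514))              # ~0.026  -> the 2.6% bandwidth overhead
print(byte_overhead(5 * 1514))          # ~0.0053 -> the quoted 0.52%, to within rounding
print(packet_overhead(5 * 1514, 420))   # ~0.055  -> the quoted 5.56%, to within rounding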
VII. CONCLUSIONS

The major contribution of this paper is the proposed novel scheme for Aggregate Flow Control in a Diffserv network. A prototype is developed to demonstrate the following capabilities of the proposed scheme: (1) the aggregate congestion management is performed in a TCP-friendly manner; (2) the scheme allows elastic growth of customer traffic aggregates at times of no congestion; and (3) the shared queue at the core is distributed to the respective edges whose ingress traffic is causing congestion. These capabilities of the AFC mechanism lead to improved end-to-end bandwidth and delay assurances in a Diffserv network.

The various experiments illustrate the specific benefits of AFC. AFC improves fairness in the interaction of TCP and UDP traffic in a Diffserv network. The scheme makes the bandwidth sharing among aggregates insensitive to the number of microflows in an aggregate and to its packet sizes. The AFC approach improves transactions per second for short-lived TCP flows under congestion, and it makes the interpacket delay characteristics for streaming traffic more predictable. The AFC scheme also improves bandwidth assurance in a network with multiple congested nodes by protecting all conformant traffic. Aggregate Flow Control is an easily deployable overlay mechanism for Diffserv networks that significantly improves aggregate fairness and network behaviour.

VIII. ACKNOWLEDGMENT

The authors would like to thank Nabil Seddigh, Steve Jaworski, and Peter Pieda for discussion and support at various stages of this work.

IX. REFERENCES

[1] S. Floyd and V. Jacobson, "Random Early Detection Gateways for Congestion Avoidance," IEEE/ACM Transactions on Networking, V.1 N.4, August 1993, pp. 397-413.
[2] S. Blake et al., "An Architecture for Differentiated Services," RFC 2475, December 1998.
[3] D. Clark and W. Fang, "Explicit Allocation of Best Effort Packet Delivery Service," IEEE/ACM Transactions on Networking, V.6 N.4, August 1998.
[4] J. Heinanen, F. Baker, W. Weiss, and J. Wroclawski, "Assured Forwarding PHB Group," RFC 2597, June 1999.
[5] J. Ibanez and K. Nichols, "Preliminary Simulation Evaluation of an Assured Service," Internet Draft, draft-ibanez-diffserv-assured-eval-00.txt, August 1998.
[6] N. Seddigh, B. Nandy, and P. Pieda, "Bandwidth Assurance Issues for TCP Flows in a Differentiated Services Network," in Proceedings of Globecom'99, Rio de Janeiro, December 1999.
[7] W. Lin, R. Zheng, and J. Hou, "How to Make Assured Services More Assured," in Proceedings of ICNP, Toronto, Canada, October 1999.
[8] I. Yeom and N. Reddy, "Realizing Throughput Guarantees in a Differentiated Services Network," in Proceedings of ICMCS, Florence, Italy, June 1999.
[9] B. Nandy, N. Seddigh, P. Pieda, and J. Ethridge, "Intelligent Traffic Conditioners for Assured Forwarding Based Differentiated Services Networks," in Proceedings of Networking 2000, Paris, France, May 2000.
[10] A. Chapman and H.T. Kung, "Traffic Management for Aggregate IP Streams," in Proceedings of CCBR, Ottawa, November 1999.
[11] H.T. Kung and S.Y. Wang, "TCP Trunking: Design, Implementation and Performance," in Proceedings of ICNP, Toronto, Canada, October 1999.
[12] D. Harrison and S. Kalyanaraman, "Edge-to-Edge Traffic Control for the Internet," RPI ECSE Networks Laboratory Technical Report ECSE-NET-2000-I, January 2000.
[13] S. Floyd and K. Fall, "Promoting the Use of End-to-End Congestion Control in the Internet," IEEE/ACM Transactions on Networking, August 1999.
[14] Netperf: http://www.netperf.org/netperf/NetperfPage.html
[15] S. McCreary and K. Claffy, "Trends in Wide Area IP Traffic Patterns," ITC Specialist Seminar on IP Traffic Modeling, Measurement and Management, Monterey, September 2000.