TCP-ROME: performance and fairness in parallel downloads for Web and real-time multimedia streaming applications

Roger P. Karrer
Deutsche Telekom Laboratories, TU Berlin, D-10587 Berlin, Germany
[email protected]

Ju-Won Park and JongWon Kim
Networked Media Laboratories, Gwangju Institute of Science and Technology, Gwangju, Korea
{jwpark,jongwon}@gist.ac.kr
Abstract— Parallel download protocols that establish multiple TCP connections to distributed replica servers have the potential to reduce file download time and to improve the quality of real-time multimedia downloads. Unfortunately, parallel download protocols are also inherently unfair towards single-flow downloads. This paper presents TCP-ROME, a parallel download protocol that allows a dynamic trade-off between throughput and fairness. A receiver-based framework allows a dynamic adjustment of the congestion and rate control of each subconnection. TCP-ROME offers two usage modes: a binary mode, where the congestion control of each subconnection can be switched between a TCP-fair rate (high priority) and a TCP-LP-fair rate (low priority), and a more complex range mode, where the aggregated throughput aims at meeting a specified target rate. This paper describes the TCP-ROME protocol and shows the suitability of the range mode for real-time streaming applications. For the binary mode, we develop novel analytical throughput models for TCP-LP and for TCP-ROME. The models are validated via simulations. Extensive simulation scenarios show the flexibility of TCP-ROME in trading performance for fairness.
I. INTRODUCTION

Multimedia content is increasingly used in "traditional" Web pages as well as on the novel Web 2.0, where sites such as MySpace, Yahoo! and YouTube allow users to upload and share videos. Commercial sites offering video on demand are flourishing all over the world. In addition, home-made videos are flooding the Internet. In August 2006, a total of 6.9 billion video downloads were initiated from the top U.S. streaming video properties, among them 1.4 billion video downloads from Fox Interactive (MySpace), 823 million from Yahoo! and 688 million from YouTube (source: www.comscore.com), and the recent acquisition of YouTube by Google emphasizes the market potential companies see in home-made videos. Apart from the user demand and service providers, telecom operators are tapping into new revenue potentials by streaming video over UMTS to next-generation cell phones. To scale these systems, data is frequently replicated, e.g. using mirrors, Content Distribution Networks (CDNs) and
peer-to-peer (p2p) systems. Parallel download protocols that establish multiple TCP connections from one client to multiple distributed servers can then enhance the throughput via the bandwidth aggregation of multiple connections and improve end-to-end latency. The drawback of parallel download is that the performance of concurrent connections that share the same bottleneck may suffer from the more aggressive bandwidth usage of the parallel download. Depending on the connectivity and the user preferences, this unfairness may or may not be desirable. If the user downloads multiple video streams, he may want to give preference to the one he is watching. Yet, the others may not need to be closed, because the primary download may have a remote bottleneck. Then, the other connections should be able to use the available bandwidth. Existing parallel download protocols lack the ability to dynamically adjust this aggressiveness. For parallel connections between the same client-server pair, Hacker et al. [8] propose a protocol where the congestion window is changed based on a virtual, higher round-trip time rather than the real RTT. Unfortunately, throughput and RTT are only loosely coupled, and the model is not readily extended to distributed replica servers. Fairness in peer-to-peer networks is based on a tit-for-tat model [18]. This fairness definition aims at preventing free-riding and is not related to a fair resource allocation of a shared bottleneck resource.

This paper presents TCP-ROME, a novel parallel download protocol that allows a trade-off between throughput and fairness. TCP-ROME relies on a previously developed protocol, TCP-PARIS [9], that establishes parallel connections to n distributed replica servers and coordinates the download of disjoint segments, thereby providing the same reliability as TCP and a reduced download time. However, TCP-ROME extends TCP-PARIS by a novel Congestion Control Coordinator (CCC). The CCC allows a dynamic adjustment of the instantaneous transmission rate. Each subconnection must not exceed the TCP-fair rate, but the CCC may limit the update of the congestion window to reduce the aggressiveness of the aggregated throughput and
therefore increase fairness towards competing flows.

The contributions of this paper are as follows. First, we develop the TCP-ROME protocol. In particular, we describe the concept and the design of the Congestion Control Coordinator. The CCC can operate in two modes: a simple-to-use binary mode and a range mode. In binary mode, the subconnections can download data at two distinct "priority levels". At high priority, a subconnection adheres to the congestion control of a TCP-friendly protocol (TCP Reno, Vegas, etc.), whereas at low priority it adheres to the congestion control of TCP-LP [11] or TCP-Nice [21]. In range mode, the CCC aims at meeting a specified target rate, e.g. that of a real-time stream.

Second, we show the suitability of TCP-ROME to better support real-time streaming multimedia applications. Ensuring a satisfactory video display for the user requires a small startup delay and a smooth playback rate. While UDP has traditionally been used as the underlying protocol for real-time streaming, real deployments of UDP-based systems have become impractical because firewalls block UDP traffic. Therefore, real-time content must be streamed over TCP. Unfortunately, TCP is far from efficient for real-time streaming. Reactions to packet losses or packet reordering result in drastic throughput degradations within TCP and thus in playback stops of the video. Recent studies show that the available bandwidth along the end-to-end path must be at least twice the playback rate of the video [13], [22] - a condition that is often not met in today's Internet. Moreover, segment retransmissions are required to arrive before the playback time of the enclosed frame. If the frame is important, e.g. an I-frame in an MPEG movie, a late arrival may significantly distort the movie. To avoid playback interruptions, client-side buffering is used. But how should the buffer sizes be set? Large buffers lead to an increased startup delay. Significant startup delays beyond the user's patience threshold lead to frequent download aborts. Moreover, while abundant buffer space may be available in today's PCs, next-generation mobile phones capable of real-time streaming may have limited buffer space. TCP-ROME provides the basic mechanism to support real-time streaming. TCP-ROME additionally provides a rate adaptation mechanism that aims at adapting the aggregated download rate to the requested streaming rate of the video. This rate adaptation ensures that the finite client buffer neither overflows nor starves. Initial experimental results via simulations show that TCP-ROME is able to provide a sustained throughput and therefore significantly increases the video quality.

Third, we derive novel analytical models for the throughput of TCP-LP and TCP-ROME. We apply well-known TCP modeling methodologies to derive a model for TCP-LP. Then, the models for TCP and TCP-LP are combined into a model for TCP-ROME. Therefore, we are able to predict the expected throughput and the expected download time of TCP-ROME. We validate the models with ns-2 simulations.

Fourth, we evaluate the performance-fairness tradeoff of TCP-ROME with extensive simulations.
Fig. 1. Parallel download of replicated content
We show with simple baseline scenarios, where a TCP-ROME download shares a bottleneck with single-flow TCP downloads, that TCP-ROME allows a user to vary the download time of a 10 MB file between 18 min and 2.9 min, subject to the number of high-priority connections it uses. At the same time, we show that the download time of concurrent single TCP downloads increases from 11 min to 21.3 min. Moreover, in a typical peer-to-peer scenario with multiple parallel and single-flow downloads sharing a network, TCP-ROME offers download times between 6.5 min and 3.2 min, whereby the single-flow download time increases from 6.5 min to 10.4 min. These results show that the adjustment of parallel download fairness has a significant impact on single-flow performance.

This paper is organized as follows. In Section II, we provide background on parallel downloads and the performance-fairness tradeoff. Section III describes the design of the TCP-ROME protocol, its building blocks and the range and binary modes. Section IV provides details on the feasibility of streaming content in real time from multiple distributed sources to one destination using TCP-ROME. Sections V and VI derive and verify throughput models for TCP-LP and TCP-ROME. Section VII presents baseline simulations and experimental results, including real-time streaming and the system-wide performance-fairness tradeoff in a typical peer-to-peer setup. After discussing related work in Section VIII, we conclude in Section IX.

II. BACKGROUND

A. Scenario

We consider the scenario of a client that wants to download a movie and watch it in real time. The movie file is available on multiple distributed replica servers, as depicted in Figure 1. We assume that the client knows the addresses of the replica servers that host the file, and that the file is consistent on all servers. Such a scenario is frequently encountered today, as replica systems are a well-known technique to build scalable systems, e.g. mirrors or content distribution networks, and can also be applied to Web 2.0. A parallel download protocol establishes n TCP connections to the distributed servers and downloads data in parallel.
Fig. 2. Challenge: how can the two control loops interact?
In parallel here means that (i) the data is received simultaneously from the different servers, and (ii) the data downloaded from the different servers is disjoint, i.e. no two servers deliver the same content, but the sum of the data downloaded from the different servers is the entire video. In such a scenario, parallel download protocols have the potential to
• aggregate the bandwidth of multiple connections. If the connections use disjoint paths, they can multiply their bandwidth, and if they share the same bottleneck, they can obtain an n-times higher throughput than their fair share;
• compensate the throughput degradation of one connection by increasing the throughput of another connection;
• ensure a timely delivery of the data: if one connection is not able to deliver the data in time, e.g. because of TCP congestion control reactions, the data can be downloaded from alternative servers;
• achieve a faster startup because multiple streams initiate their slow-start phase simultaneously.
Thus, a protocol that is able to fulfill this potential will significantly improve the viewer's experience, with the obvious exception that a parallel download protocol cannot achieve a higher throughput than the bottleneck capacity. That is, if the client-side connection is the common bottleneck (e.g. a slow modem or dial-up connection), the parallel download will not be able to achieve a higher throughput than the capacity of this bottleneck. Unfortunately, current parallel download protocols from peer-to-peer systems or Grids are not able to fulfill the above potential, as they either rely on UDP [14], work between a single client-server pair only [23], [8], or focus on overlay multicast via peer-to-peer streaming [4], [17] but lack the real-time feature.

B. Challenges for real-time streaming

The challenges in real-time streaming via TCP from multiple distributed servers arise because two independent and uncoordinated control loops must be integrated. The two control loops are the TCP control loop and the content coordination, as depicted in Figure 2. The TCP control loop dynamically adjusts to congestion in the network and ensures that the segments are delivered reliably. The content coordination controls the data delivery of each server to avoid overlaps and
Fig. 3. Unfairness scenarios in parallel downloads
to ensure that the entire video is delivered. These two loops are traditionally independent, separated by the socket interface. A separation is tolerable if the delivery of data is not bound to time constraints, as the simplicity and popularity of peer-to-peer protocols such as BitTorrent [6] shows. However, a separation of the control loops has severe drawbacks for real-time streaming: if the data delivery is blocked on one connection, e.g. due to congestion, the video is stopped until that part of the video has been transmitted. With a minimum TCP retransmission timeout of 1 sec, the streaming interruption cannot be hidden from the user. If the congestion is persistent, the viewing experience may be intolerable. Therefore, to ensure a pleasant viewing experience, we argue that a joint congestion and content control is necessary.

C. Fairness challenges

Parallel download protocols have an intrinsic unfairness towards concurrent single-flow downloads that share the same resources. Figure 3 illustrates two cases of unfairness. First, on the left, we consider the case of a client-side bottleneck. The bottleneck network may range from the DSL connection of a single user to an entire university network. In both cases, an excessive use of parallel downloads may severely reduce the throughput of concurrent flows, including Web access and VoIP. On the right, Figure 3 depicts server-side unfairness. Since each parallel download establishes multiple connections, the number of connections at the servers can increase beyond the capacity. Therefore, a large deployment of parallel downloads may lead to (temporary) shortages of server-side resources, up to the point where the service becomes unavailable. In practice, users often do not know where the bottleneck is. Bottlenecks may be on the client or server side, or both. It is beyond the scope of this paper to try to identify where the bottleneck lies. Instead, we want to develop a mechanism to trade off performance and unfairness subject to the application and user preferences.

III. TCP-ROME CONCEPT
TCP-ROME is a multipoint-to-point protocol that provides reliable file delivery by establishing n parallel connections to n replica servers, as depicted in Figure 4. We assume that the
Fig. 4. Parallel download using TCP-ROME
client has previously detected that the same file is consistently replicated on these servers. TCP-ROME has two goals: the coordinated download of disjoint parts of the original file from the different replicas, and the trade-off between fairness and performance. For the former, TCP-ROME builds on TCP-PARIS [9], which splits the file into smaller units and downloads disjoint units from the different replicas. However, in contrast to BitTorrent, TCP-PARIS coordinates the download at the granularity of individual TCP segments. Each subconnection behaves like a standard TCP connection in terms of congestion control and reliable delivery, except that every subconnection only delivers a subset of the original segments. For this purpose, the receiver assigns individual, non-overlapping TCP segments to the subconnections, e.g. subconnection 1 is assigned segment 1, subconnection 2 segment 2, etc. Because of the reliable data delivery, the entire file is downloaded in parallel. The receiver assembles the segments into the original stream before passing it to the client. TCP-PARIS has been shown to have a near-to-optimal download time. TCP-ROME extends TCP-PARIS by providing a framework that allows a dynamic adjustment of the transmission bandwidth of each subconnection to trade off performance and fairness. The key novelty of TCP-ROME is the congestion control coordinator (CCC) at the TCP-ROME receiver, which mediates between performance and fairness. The CCC coordinates the throughput of the subconnections during the download. In particular, the CCC is allowed to modify the opening of the congestion window of each subconnection to meet an overall performance-fairness tradeoff of the parallel download. For the CCC, we distinguish two modes of operation that adhere to different goals: binary mode and range mode. These modes are described further below.

A. Rate coordinator

The rate coordinator (RC) coordinates the congestion control behavior of the subconnections. A rate coordinator is needed if, for some reason, the streaming rate has to be limited. One example is the ability to rate-limit the throughput for fairness reasons. Consider a home user connected over a DSL connection who runs multiple networked applications in parallel. The use of n parallel connections may reduce the throughput of concurrent connections or even starve them. If the capacity of the DSL connection exceeds the required video streaming rate, the parallel download may be dynamically
adjusted to approximately match the video display rate. This adjustment does not reduce the video experience of the user, as just less data is pre-fetched, but concurrent flows are able to achieve a higher throughput. Another reason is limited buffer space at the receiver. In particular, next-generation mobile streaming devices (phones) will be limited in their ability to cache data. Therefore, an adaptation of the streaming bandwidth is desirable.

Denote R_v as the video playback rate and r_i(t) as the instantaneous throughput of a TCP-ROME subconnection, defined by

r_i(t) = (w_i · s_size) / rtt_i    (1)

where w_i and rtt_i denote the congestion window and the round-trip time of the i-th subconnection at time t, and s_size the segment size, which we consider the same for all subconnections. Then, R(t) = Σ_i r_i(t) is the aggregate throughput of TCP-ROME. For real-time streaming, the rate coordinator adjusts R to match the video display rate, i.e. R ≈ R_v.

R is reduced or increased by modifying the congestion control of one or multiple subconnections. TCP uses by definition an AIMD algorithm to adjust the congestion window w. It increases w linearly by w ← w + α with α = 1/w during congestion avoidance. In times of congestion, TCP decreases w by w ← βw with β = 1/2. Moreover, TCP may even trigger a timeout that incurs a sending stop and a reset of the congestion window. Since these reactions cause a sharp drop in the throughput of a single connection and therefore also in the aggregated throughput, the use of a parallel download protocol allows the compensation of this reduction via the other subconnections.

The RC dynamically decides how to influence the increase or decrease of the throughput of the subconnections. We propose a simple threshold-based mechanism. Denote R+ and R− as the upper and lower threshold of the desired rate such that R− < R_v < R+. If R(t) > R+, the parallel download rate exceeds the necessary streaming rate. Therefore, the RC signals the subconnections to omit throughput increases. In particular, if a subconnection receives an event to open its congestion window but the RC has signaled that a rate increase is not necessary, the subconnection ignores the event and maintains the congestion window at its actual size. Alternatively, if R(t) < R−, the streaming rate is no longer maintained. Here, the RC has two options that can be set by the user. The first option is to strictly adhere to TCP fairness and accept that the streaming rate may not be sufficient. An alternative is a "compensation" model. In times of high available bandwidth, when subconnections ignore events to open the congestion window, the RC may count the number of ignored events per subconnection. In times of low bandwidth, the RC may then compensate the ignored events by modifying the TCP behavior - either by increasing the throughput of the subconnections even though no event is triggered, or by reducing the impact of a window decrease (e.g., reducing the window by 1/4 rather than 1/2).
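To make the threshold mechanism concrete, the following minimal sketch illustrates how a receiver-side rate coordinator could suppress and compensate window-increase events around the thresholds R+ and R−. It is an illustration only, not the TCP-ROME implementation; the class and method names (Subconnection, RateCoordinator, on_increase_event, on_rate_check) are hypothetical.

```python
# Minimal sketch of the threshold-based rate coordinator (RC) described above.
# Assumption: the kernel delivers "window increase" events per subconnection and
# lets the receiver veto them; names and units here are illustrative only.

class Subconnection:
    def __init__(self, cwnd, rtt, ssize=1460):
        self.cwnd = cwnd          # congestion window [segments]
        self.rtt = rtt            # round-trip time [s]
        self.ssize = ssize        # segment size [bytes]
        self.ignored_events = 0   # counter for the "compensation" model

    def rate(self):
        # Equation (1): r_i(t) = w_i * s_size / rtt_i  [bytes/s]
        return self.cwnd * self.ssize / self.rtt


class RateCoordinator:
    def __init__(self, subconns, r_minus, r_plus, compensate=True):
        self.subconns = subconns
        self.r_minus = r_minus        # lower rate threshold R- [bytes/s]
        self.r_plus = r_plus          # upper rate threshold R+ [bytes/s]
        self.compensate = compensate  # enable the "compensation" model

    def aggregate_rate(self):
        # R(t) = sum_i r_i(t)
        return sum(s.rate() for s in self.subconns)

    def on_increase_event(self, sub):
        """Called when subconnection `sub` would open its congestion window."""
        if self.aggregate_rate() > self.r_plus:
            # Download rate already exceeds the streaming needs: ignore the event.
            sub.ignored_events += 1
            return
        sub.cwnd += 1  # normal TCP-style linear increase

    def on_rate_check(self):
        """Periodic check; compensates earlier ignored events when R(t) < R-."""
        if self.compensate and self.aggregate_rate() < self.r_minus:
            for sub in self.subconns:
                if sub.ignored_events > 0:
                    sub.cwnd += 1            # replay one previously ignored increase
                    sub.ignored_events -= 1


if __name__ == "__main__":
    subs = [Subconnection(cwnd=10, rtt=0.05), Subconnection(cwnd=20, rtt=0.08)]
    rc = RateCoordinator(subs, r_minus=500_000, r_plus=700_000)
    rc.on_increase_event(subs[0])
    print("aggregate rate [bytes/s]:", int(rc.aggregate_rate()))
```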
Fig. 5. Buffer management at the receiver

Fig. 6. Download coordination at the receiver: partition rule
These options are available within TCP-ROME, but their use is subject to the viewer's preferences. Moreover, they do not guarantee that the streaming rate is maintained.

How does TCP-ROME know the video display rate R_v? There are two options. First, the user knows the rate and sets it as a parameter. If the rate is unknown, TCP-ROME can dynamically monitor the receiver buffer and infer the display rate. In particular, denote r_v(t) as the actual frame rate required by the video at the client at time t. The rate is time-dependent, as the video content may vary even if the frame rate remains constant. Data arrives from the network at a streaming rate r_s(t). The ratio between r_v(t) and r_s(t) determines the fill degree of the client-side buffer, depicted in Figure 5. Denote b(t) as the actual degree to which the buffer is filled. Again, a simple strategy to adjust the streaming rate of the subconnections is to define two thresholds B+ and B−. The objective is to maintain B− < b(t) < B+. If b(t) < B−, the buffer dries out and the movie will be halted. Therefore, the RC must increase the download rate of its subconnections. In contrast, if b(t) > B+, the buffer is filling up, and the RC may reduce the download rate.

When the RC modifies the congestion control of its subconnections, the question arises: which subconnection should be considered? For example, consider a simple case of a parallel download with just 2 subconnections, each able to stream at 1 Mbps, and a required video streaming rate of 1 Mbps. Should the RC decide to download all data from 1 subconnection only? Or rather, should the load be balanced over both subconnections at equal rates? But what happens when the rates are not equal? How does the RC deal with short-term vs. long-term fluctuations in the bandwidth? In the literature, similar problems have been discussed at length. Basically, the strategies can be grouped into three categories: random, round robin and sophisticated strategies. The first two strategies are simple and quite efficient. Sophisticated strategies usually require a non-negligible implementation overhead, are difficult to tune and show low two-digit improvements at most. Therefore, we opt for a simple random strategy, with the option to bias changes towards high-bandwidth subconnections: we divide the interval [0, 1) in proportion to the subconnections' bandwidths (a high-bandwidth subconnection gets a larger interval), draw a random number and adjust the subconnection into whose interval the random number falls. We argue that this strategy is still simple but takes into account that the TCP behavior of potentially congested links is less likely to be modified.

B. Content coordinator

The content coordinator (CC) coordinates the download of the segments. In the basic TCP-ROME framework, it ensures that each segment is successfully received only once. While the CC ensures that each segment is eventually delivered, for real-time streaming it is also necessary to ensure a timely delivery of the segments, i.e. before their play time. Therefore, the CC must jointly monitor the partition rule (Figure 6) for missing segments within the sliding window as well as the buffer depicted in Figure 5 to identify segments that are about to be displayed but have not been delivered yet. Denote s_i(t) as the location of the i-th segment in the client-side buffer at a given time t. The 0-th segment is the segment that will be fetched next by the video player. Over time, the index i decreases as the video plays. The expected play time of a segment s_i, measured from a given time t, can be estimated as

t_p(s_i) = i · s_size / R_v    (2)

with R_v as the average video play rate and s_size as the segment size. We define that a segment is missing if t_p(s_i) < rtt_j, i → j, where rtt_j denotes the round-trip time of the j-th subconnection and i → j denotes that segment i is assigned to subconnection j. The notion of "missing" here reflects the view of the content coordinator, which ignores the status of segment i, i.e. whether it is in transit or lost. Given, though, that the playback time of a missing segment is only a round-trip time away and that it takes at least one RTT to re-request and deliver a segment, the CC must enforce a retransmission. Thus, the content coordinator constantly checks for missing segments in the buffer and ensures that

s_i ∈ B  ∀i : t_p(s_i) < rtt_j, i → j    (3)
where B denotes the buffer. Of course, there is no guarantee that the segment is not lost again even after re-requesting it, but this argument could be repeated over and over again, leading to the conclusion that the entire video must be downloaded before streaming - which is clearly undesirable. Therefore, to trade off segment loss and buffer space, we focus on re-requesting a segment once in the above case.
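As an illustration of Equations 2 and 3, the sketch below shows how a content coordinator could scan the not-yet-received segments, flag those whose expected play time is closer than the RTT of their assigned subconnection, and re-assign them. It is a simplified illustration under assumed data structures (the function and field names are hypothetical), not the TCP-ROME code.

```python
# Sketch of the content coordinator's missing-segment check (Equations 2 and 3).
# Assumptions: segment indices count down towards playback (index 0 plays next),
# `assignment` maps a segment index to its subconnection, and `rtt` holds the
# smoothed RTT of each subconnection in seconds. All names are illustrative.

def play_time(i, ssize, rv):
    """Equation (2): expected play time of segment i, measured from now [s]."""
    return i * ssize / rv

def find_missing(pending, assignment, rtt, ssize, rv):
    """Equation (3): segments not yet in the buffer whose play time is closer
    than the RTT of the subconnection they are assigned to."""
    missing = []
    for i in pending:                       # pending = indices not yet received
        j = assignment[i]                   # i -> j: segment i assigned to subconn j
        if play_time(i, ssize, rv) < rtt[j]:
            missing.append(i)
    return missing

def rerequest(missing, assignment, rtt):
    """Re-assign each flagged segment once, to the subconnection with the
    smallest RTT other than the one that failed to deliver it in time."""
    for i in missing:
        failed = assignment[i]
        candidates = [j for j in rtt if j != failed]
        if candidates:
            assignment[i] = min(candidates, key=lambda j: rtt[j])

if __name__ == "__main__":
    rtt = {1: 0.080, 2: 0.040, 3: 0.200}        # per-subconnection RTT [s]
    assignment = {5: 3, 9: 1, 12: 2}            # segment index -> subconnection
    pending = [5, 9, 12]                        # segments not yet received
    late = find_missing(pending, assignment, rtt, ssize=1460, rv=125_000)
    rerequest(late, assignment, rtt)
    print("late segments:", late, "new assignment:", assignment)
```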
Fig. 7. Two alternatives to design the interface between the application and the transport layer
Equation 3 can be simplified and modified in various ways. A simplification is to replace rtt_j by max_j(rtt_j). This simplification ensures that a missing segment can be re-requested over any subconnection. Moreover, the threshold in the definition of a "missing segment" may also be set to any value larger than rtt_j. To re-request a missing segment, the content coordinator flags the corresponding segment for immediate delivery. If subconnection j is assumed to be unable to deliver the segment in time, the content coordinator removes the segment from the assignment list of subconnection j and assigns it to an alternative subconnection k. Re-requesting a missing segment may violate the original concept of TCP-ROME that every segment is delivered over exactly one subconnection. By re-assigning a segment to an alternative subconnection, it is possible that the same segment eventually arrives over both subconnections. However, this tradeoff is necessary to maintain the real-time streaming properties. Should a subconnection be responsible for a significant number of missing segments, TCP-ROME can consider abandoning this subconnection.

C. Discussion: towards a new network interface

The current Internet protocol suite contains two types of transport-layer protocols: UDP is message-oriented and TCP is stream-oriented. A real-time download clearly has the characteristics of a stream. However, the distributed nature of the servers requires a messaging protocol that coordinates the download. How and where can this contradiction be reconciled in the current Internet protocol design, and what are the implications for a future Internet architecture? In TCP-PARIS and TCP-ROME, the contradiction is solved at the transport layer by separating what is downloaded from where. At the sender, the protocol itself splits the stream into two types of segments, as depicted in Figure 7: segments that are assigned to the connection and are therefore delivered, and segments that are received from the application, temporarily stored at the sender but eventually discarded when the receiver signals that the segment has been received from another server. There are a number of advantages and disadvantages in this design decision. A first advantage is that the "unstreaming" is independent of the application. Thus, the same application can be used for single and parallel downloads without changes. The flip side, of course, is that the underlying operating
system kernel must be extended with the parallel download functionality. A second advantage is that each server can always deliver all segments if needed. Consider, e.g., the case that during a parallel download one or even multiple connections go down and are thus no longer able to deliver their assigned segments. In the current design, each server has all segments available that have not yet been delivered successfully over any connection. In an alternative design, where the "unstreaming" is done within the application context such that only assigned segments are given to the transport layer, such a re-assignment requires an interaction between the transport and the application layer. The drawback is, of course, that the server requires larger buffers to maintain all segments within the sliding window. Thus, we note that neither the transport-layer approach nor the application-layer approach is without drawbacks given the current Internet protocol architecture.

Given the importance of real-time streaming and the current openness for a clean-slate Internet design, we asked ourselves whether we could come up with any recommendations for a future architecture. We came up with three. First, clearly separate the control and data planes. A lot of the challenges we faced were due to the fact that data and control were not sufficiently separated, neither within a layer nor among the layers. Second, a more transparent signaling between the layers is needed. Even though layering is often heavily criticised, we recognize that the layering paradigm has contributed to the evolution of the Internet and can therefore be expected to prevail in some form also in the future Internet. However, we need new ways to allow interactions among the layers, even if the complexity increases. Third, end systems should be reconsidered in terms of memory. Currently, end system memory is divided into kernel and user memory - the former is highly limited, the latter almost abundant. We believe that having only two groups is no longer suitable. Similarly, it is no longer suitable to have an upper bound on per-connection memory in the kernel. Enhanced flexibility is required for next-generation networks because of the increasing network heterogeneity (wired and wireless) and the diversity of end systems.

D. Binary mode

In the binary mode, TCP-ROME allows a variation of each subconnection's aggressiveness between 2 priority levels: high and low priority. High priority corresponds to a subconnection using TCP Reno, Vegas or a related TCP-friendly protocol, whereas a low-priority subconnection uses TCP-LP [11]. The latter TCP variant is transparent to TCP traffic and utilizes only the excess bandwidth in the network. Given a fixed number of n parallel connections, a user can adjust the performance by determining how many connections run at high priority h and low priority l respectively, with h + l = n. The larger the number of high-priority connections, the higher the throughput and the larger the unfairness. The binary mode is particularly appealing because of its simplicity. A single parameter (either h or l) is sufficient to trade off fairness and performance. In practice, a simple GUI
gives a user the possibility to adjust the download speed of the parallel download. A particular configuration of the binary mode is the setting h = 1, where TCP-ROME uses only one subconnection at TCP fairness. By setting h = 1, TCP-ROME basically behaves like a single TCP connection, but with 2 differences. First, the remaining n − 1 subconnections can contribute to the download and thus reduce the download time. Second, the CCC is free to choose which subconnection should be at high priority. An opportunistic selection of the fastest connection at any time during the download can significantly reduce the download time. We discuss these selection strategies in Section III-F.

E. Range mode

The range mode targets applications with known throughput requirements, such as multimedia applications or real-time streaming. While UDP has traditionally been used for real-time streaming, recent work advocates the use of TCP as an underlying protocol [22]. Wang et al. also show the feasibility of supporting real-time streaming using TCP if the TCP throughput is approximately twice the bandwidth requirement of the stream. We argue that TCP-ROME is well suited to support these applications. First, by using n connections, TCP-ROME has the potential to achieve a higher goodput via parallel streams. Second, it can dynamically adapt the download rate to ensure a fair resource usage.

In range mode, the throughput of each subconnection is varied between high and low priority. Denote B*(t) as the target bandwidth of the real-time stream at time t during the download. Denote B_ROME^i(t) as the throughput of the i-th TCP-ROME subconnection at time t, and B_ROME = Σ_i B_ROME^i(t) as the total throughput of the parallel download. The objective of the TCP-ROME range mode is to coordinate the congestion control such that the aggregate bandwidth of TCP-ROME matches the target rate, i.e., B_ROME(t) ≥ B*(t) at each time during the download. Since each TCP-ROME subconnection i must adhere to TCP fairness, i.e. B_ROME^i ≤ B_H, it is obvious that TCP-ROME can match the target rate only if the aggregated bandwidth of all n TCP subconnections running at high priority exceeds the target bandwidth.

The range mode works as follows. For each subconnection, the congestion control coordinator calculates the actual throughput of each subconnection as B_ROME^i(t) = cwnd_i · MSS_i / RTT_i, and the total throughput as B_ROME. Moreover, instead of having a single value for the target rate B*, we introduce B*_+ and B*_− as upper and lower bounds of the target bandwidth. When an event that modifies the congestion control (reception of an ACK, ECI or packet loss) occurs on subconnection i, its congestion window is adjusted according to the following rule:

cwnd_i = { cwnd_i / 2   if i detects a packet loss
           cwnd_i + 1   if an ACK is received ∧ B_ROME(t+1) ≤ B*_+
           cwnd_i / 2   if i detects a single ECI ∧ B_ROME(t+1) ≥ B*_−
           0            if i detects an ECI during IT ∧ B_ROME(t+1) ≥ B*_−    (4)
In Equation 4, the first case ensures that the subconnection behaves TCP-friendly in the case of congestion. Condition 2 allows a rate increase according to TCP if the total throughput does not exceed the upper threshold of the target rate. Cases 3 and 4 allow an adjustment of the congestion window in the style of TCP-LP if the total throughput does not fall below the lower target rate threshold.
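The following minimal sketch restates the window update rule of Equation 4 in code form, to make the order of the checks explicit. It is illustrative only: the event names and the helper arguments are assumptions, not part of TCP-ROME's specification, and the rule is applied here to plain integers rather than to a kernel congestion-control state.

```python
# Sketch of the range-mode congestion window update (Equation 4).
# `event` is one of "loss", "ack", "eci", "eci_during_it" (ECI while the
# inference timer IT is running); b_total is the predicted aggregate
# throughput B_ROME(t+1); b_plus/b_minus are the target-rate bounds B*_+/B*_-.
# Names are illustrative; the real protocol operates inside the TCP stack.

def update_cwnd(cwnd, event, b_total, b_plus, b_minus):
    if event == "loss":
        return max(cwnd // 2, 1)                 # case 1: TCP-friendly backoff
    if event == "ack" and b_total <= b_plus:
        return cwnd + 1                          # case 2: open the window
    if event == "eci" and b_total >= b_minus:
        return max(cwnd // 2, 1)                 # case 3: TCP-LP-style reduction
    if event == "eci_during_it" and b_total >= b_minus:
        return 0                                 # case 4: congestion confirmed
    return cwnd                                  # otherwise: freeze the window


if __name__ == "__main__":
    # Toy walk-through: 2.0 Mbps target with +/- 10% bounds, throughputs in bps.
    b_plus, b_minus = 2.2e6, 1.8e6
    cwnd = 10
    for ev, b in [("ack", 1.9e6), ("ack", 2.3e6), ("eci", 2.1e6), ("loss", 2.0e6)]:
        cwnd = update_cwnd(cwnd, ev, b, b_plus, b_minus)
        print(ev, "->", cwnd)
```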
Fig. 8. Simulation setup
F. Connection setting strategy

Besides the number of high- and low-priority connections or the target bandwidth, TCP-ROME offers a fourth parameter to influence the performance-fairness tradeoff: the priority selection strategy s. The priority selection strategy s defines which subconnections should run at high priority and which at low priority. The CCC can dynamically adjust priorities among the flows at any time during the download without overhead. Since the available rates of the subconnections are not known a priori and are a difficult-to-predict function of network conditions and server load, the ability to dynamically adjust priorities is a key feature of TCP-ROME to maintain a high throughput under dynamic conditions. Among the wide variety of strategies, we discuss here 4 strategies that will also be evaluated in Section VII-D.3. Given n and h, strategies 1 and 2 select the h connections with the smallest smoothed RTT (s_rtt) and the largest congestion window (cwnd), respectively. These strategies promise a higher throughput compared to a random or a static strategy because a short RTT has the potential to increase the throughput faster, and a large cwnd corresponds to a high throughput. Note that TCP-ROME exchanges cwnd and rtt information between the sender and the receiver. The remaining 2 strategies combine the RTT and the congestion window. Strategy 3 selects the h subconnections with the largest ratio bw := cwnd/s_rtt, and strategy 4 chooses the h subconnections with the largest "smoothed" bandwidth ratio sbw = α · cwnd/s_rtt + (1 − α) · bw, with α = 0.9.
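To illustrate the four selection strategies, the sketch below ranks subconnections by smoothed RTT, congestion window, the ratio bw = cwnd/s_rtt, and the smoothed ratio sbw, and picks the top h as high priority. The data layout and function names are assumptions made for this example; only the ranking criteria come from the text.

```python
# Sketch of the four priority selection strategies of Section III-F.
# Each subconnection is a dict with smoothed RTT (s_rtt, seconds), congestion
# window (cwnd, segments) and the previous smoothed bandwidth ratio sbw.

ALPHA = 0.9  # smoothing factor from the text

def select_high_priority(subconns, h, strategy):
    """Return the indices of the h subconnections to run at high priority."""
    def bw(c):
        return c["cwnd"] / c["s_rtt"]

    if strategy == 1:      # strategy 1: smallest smoothed RTT
        ranked = sorted(range(len(subconns)), key=lambda i: subconns[i]["s_rtt"])
    elif strategy == 2:    # strategy 2: largest congestion window
        ranked = sorted(range(len(subconns)), key=lambda i: -subconns[i]["cwnd"])
    elif strategy == 3:    # strategy 3: largest bw = cwnd / s_rtt
        ranked = sorted(range(len(subconns)), key=lambda i: -bw(subconns[i]))
    else:                  # strategy 4: largest smoothed bandwidth ratio sbw
        for c in subconns:
            c["sbw"] = ALPHA * bw(c) + (1 - ALPHA) * c.get("sbw", bw(c))
        ranked = sorted(range(len(subconns)), key=lambda i: -subconns[i]["sbw"])
    return ranked[:h]


if __name__ == "__main__":
    subs = [{"s_rtt": 0.05, "cwnd": 12}, {"s_rtt": 0.12, "cwnd": 40},
            {"s_rtt": 0.08, "cwnd": 20}]
    for s in (1, 2, 3, 4):
        print("strategy", s, "-> high priority:", select_high_priority(subs, h=1, strategy=s))
```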
IV. EVALUATION OF THE REAL-TIME STREAMING SUPPORT
This section evaluates the ability of TCP-ROME to support and enhance the streaming performance of real-time downloads using ns-2 simulations. First, we assess the overall benefit of TCP-ROME in a distributed system with 50 clients and servers. Then, we provide details on the rate coordinator and the content coordinator.

A. Streaming performance

Parallel download protocols have the potential to increase the aggregated throughput and therefore to improve the video
Fig. 9. Streaming performance (throughput [kbps] vs. number of subconnections)

Fig. 10. Simulation setup (access links of 512 kbps, 1 Mbps and 2 Mbps; 10 Mbps core network)
experience of the user. To demonstrate this ability, we simulated in ns-2 the distributed system depicted in Figure 8. The system consists of a core network with 20 backbone routers that are connected via 100 Mbps links. 100 end systems are randomly attached to this core. The access link bandwidth between an end system and its core router is uniformly distributed among 56, 128, 256, 512 kbps and 1 Mbps. Given this network topology, we randomly placed 50 clients on the end systems. These clients download files that vary in size from 10-100 MB using TCP-ROME with n parallel subconnections. Similarly, n · 50 servers are placed randomly on the end systems. Therefore, clients and servers may share connections, which increases the dynamics in the network. Figure 9 shows the average download rate in kbps as a function of n. The results show that the rate is doubled from n = 1 subconnection to n = 8. For this particular simulation, the biggest improvement is from n = 1 to n = 2: for n = 2, each end system statistically hosts a server or a client and therefore uses the bandwidth of its link. For n > 2, improvements are only due to temporary usage of link bandwidths and therefore add less. We argue that the general ability to significantly improve the streaming performance is valid for many situations in real networks; however, the exact performance improvement depends on a plethora of factors, such as the location of the bottleneck or the number of flows sharing a bottleneck. The improvements in the download performance have a visible impact on the video experience of a user. However, this impact is only ensured when the content is controlled such that the segments are delivered in time at the receiver. We provide evidence of this below.

B. Rate coordination

To evaluate the rate coordinator, we revert to the very simple topology depicted in Figure 10. Data is streamed from n servers (senders) to a single receiver. The server-side access
Fig. 11. Bandwidth aggregation without cross traffic (throughput of TCP-ROME, n = 3, and of subconnections 1-3 with access link capacities of 512 kbps, 1 Mbps and 2 Mbps)
network has a lower bandwidth than the core and the receiver-side network. This is obviously the best case for parallel download because the different connections have different bottlenecks. If they share the same bottleneck, e.g. at the receiver, the benefits of parallel download are less pronounced. The server-side networks have different bandwidths, allowing different strategies to be investigated. First, we consider the case where only a parallel download is present. Figure 11 shows the obtained bandwidth of the different subconnections as a function of the download time. In this experiment, we assume that the parallel download application requires a bandwidth of 3 Mbps. This value is chosen because (i) no single connection is able to support this bandwidth but (ii) the aggregated bandwidth is larger than the required rate. Therefore, the rate coordinator must dynamically adjust the bandwidth to avoid a buffer overflow as well. The figure shows the obtained throughput of each individual subconnection as well as the aggregated throughput (termed TCP-ROME). We note that all subconnections initially contribute to the download at their maximal, TCP-bound speed. After 8 seconds, though, the RC notes that the available bandwidth exceeds the target and that the buffer has filled up to the threshold. Therefore, it can reduce the bandwidth of at least one subconnection. In our simulation, we decided to
Fig. 12. Bandwidth aggregation with cross traffic (total throughput and throughput of the 1st and 2nd subconnections)
reduce the bandwidth of the subconnection that contributes the least rate. Therefore, the throughput of subconnection 1 is reduced, whereas the other subconnections continue to stream at their allowed TCP-fair rate. Note that the bandwidth varies slightly during the download as a function of TCP fluctuations. These small changes do not have an impact on the video quality, as they can be handled by the client-side buffer.

Next, we consider the case where cross traffic is injected into the above topology. In particular, we inject cross traffic from the sender to additional receivers in the same access network. The cross traffic interferes with the TCP-ROME flows and forces the rate coordinator to adjust its rates. Cross traffic is injected after 30 sec. Therefore, for the first 30 seconds, both subconnections have an equal throughput of 0.53 Mbps. After 30 sec, the cross traffic interferes with subconnection 1. As a result, the throughput of subconnection 1 degrades. The rate coordinator compensates this degradation by allowing subconnection 2 to increase its throughput. The average aggregated throughput therefore remains at roughly 1 Mbps. Only a small drop is visible when the cross traffic starts because the rate coordinator needs time to ramp up the throughput of subconnection 2.

Next, we consider the more realistic case where concurrent TCP flows cause contention on the access links and impact the parallel download. In particular, we use the same simulation setup, but with 3 subconnections and additional cross-traffic TCP flows. The legend in Figure 13 denotes the activation times of the cross flows. We note that the rate coordinator initially stabilizes the aggregated throughput at 3 Mbps. With the first cross traffic, however, the sustained throughput degrades. After 40 seconds, the cross traffic even reduces the aggregated throughput to less than 2 Mbps before the RC is again able to level it at 2.5 Mbps. The results shown with this simulation setup are coarse-grained in the sense that the rate changes are significant.
Fig. 13. Bandwidth aggregation with cross traffic (throughput of TCP-ROME with n = 3, of subconnections 1-3, and of three TCP cross flows active during 20-50 sec, 30-60 sec and 40-80 sec)
TABLE I
THROUGHPUT ADJUSTMENTS WITH TCP-ROME

                                no cross traffic      cross traffic
aggregate throughput [Mbps]          1.06             1.06     1.17
per-link throughput [Mbps]           0.53             0.17     0.23
                                     0.53             0.53     0.11
                                     0                0.36     0.83
Therefore, it is difficult for the RC to find an equilibrium. In more realistic setups, where a large number of TCP flows interact and compete, it can be expected that rate changes are more frequent and potentially less drastic. Second, this simulation also shows that the parallel download does not guarantee a sustained bandwidth. With all cross traffic active, the long-term aggregated fair throughput is roughly 1.75 Mbps between 40 and 50 seconds. However, the rate is still far above the throughput obtained from a single-flow TCP download.

Table I shows the average throughput. Three observations are important. First, the results show the flexibility of TCP-ROME to maintain a stable throughput. Depending on the temporary congestion on a certain link, the rate coordinator is able to compensate for it with the remaining subconnections. Second, the target rate R_v serves as an upper bound on the aggregated throughput. Establishing a large number of subconnections (say 100) does not imply that other traffic is starved. Without the rate coordination, 100 TCP flows would grab a large share of a common bottleneck bandwidth, e.g. of a DSL connection. Third, we emphasize that TCP-ROME has a limited ability to sustain the target rate if the available bandwidth in the network is not sufficient. In the last experiment, TCP-ROME uses subconnection 3 to compensate for throughput degradations. If no such backup is available, TCP-ROME is not able to maintain the target rate.
Fig. 14. Data delivery without Content Coordinator: a timely delivery is not guaranteed
Fig. 15. Data delivery with Content Coordinator: segments are delivered within their playback time
In such a case, the throughput can only be maintained if the subconnections no longer adhere to AIMD. Considerations of severely violating TCP congestion control are beyond the scope of this work.
C. Content coordination

The ability to ensure that the segments are delivered in time to be displayed is vital for real-time streaming. To emphasize the importance, we conduct the following experiment: given the topology in Figure 10, a client downloads a movie in parallel with n = 3 subconnections. At different times, TCP cross traffic is added, as in the previous section. The cross traffic leads to packet drops, retransmissions, and even timeouts. Figure 14 shows the acknowledged sequence number on the y-axis as a function of time in the case where no content coordinator is present. What happens is that one subconnection loses packets and retransmits them while the others continue to send segments. In the figure, this continuation is visible as a steadily increasing line. In contrast, packet loss and retransmissions slow down the delivery of the segments, visible as the almost horizontal line in the first half of the figure. The effects on the real-time streaming application are disastrous: since TCP only delivers data that has been received in order to the application, the video streaming rate is limited by the lower line in Figure 14. Therefore, during the first 13 seconds, the video stops frequently and the viewing experience is poor. During this period, a large number of frames are delivered that have to be buffered at the client. After 13 seconds, the assignment and the delivery are synchronized and the video starts playing smoothly. However, towards the end of the shown interval, the subconnections diverge again, as shown by the zigzag pattern in the trace.

With the content coordinator, instead, the video experience is quite different, as Figure 15 shows. Here, the delivery of the segments creates an almost smooth line. The reason is that the content coordinator re-assigns segments that are missing in the sequence. Therefore, the streaming rate of the video can be maintained. Moreover, the CC prevents the subconnections from becoming as unaligned in their segment delivery as in the previous experiments. There are, of course, small variations in the delivery pattern, but at the presented (relevant) resolution, the arrival looks like a smooth line. Therefore, we conclude that the CC is a necessary component of parallel download protocols and that TCP-ROME significantly improves the viewing experience compared to other parallel download protocols.
V. TCP-LP THROUGHPUT MODEL
In this section, we develop a throughput model for TCP-LP [11]. We leverage the methodology used by Padhye et al. [16] to model TCP throughput. The congestion avoidance of TCP-LP differs in two ways from that of TCP. First, TCP reacts to packet loss via duplicate acknowledgments or timeouts, whereas TCP-LP reacts to early congestion indications (ECIs). Therefore, the model for TCP-LP is a function of the ECI probability, which we denote as q. The ECI probability replaces the loss probability p of TCP models. An ECI can be triggered in two ways: by receiving an early congestion notification (ECN) from a router with active queue management, or by measuring one-way delays from the sender to the receiver. In the latter case, if the difference between the minimal measured delay and the actually measured one-way delay exceeds a threshold δ, TCP-LP assumes that the difference is caused by the queuing delay of competing traffic. Second, TCP-LP reacts differently to an ECI. TCP-LP halves the congestion window after the first ECI and starts an inference timer. If another ECI is received before the timer expires, TCP-LP assumes that the congestion is confirmed and therefore reduces the congestion window to one. The combination of these two mechanisms ensures that TCP-LP is transparent to TCP cross traffic. To derive a model for TCP-LP throughput, we follow the derivation of the TCP throughput model by Padhye et al. [16], but modify the congestion avoidance for TCP-LP as described above. The details of the derivation are described in the Appendix.
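As a side illustration of the congestion-avoidance behavior the model captures, the sketch below mimics the delay-based ECI detection and TCP-LP's two-step reaction (halve on a first ECI, drop to one segment if a second ECI arrives while the inference timer runs). It is a toy reconstruction from the description above, not TCP-LP's actual implementation; the fixed-offset form of the delay test is an assumption.

```python
# Toy sketch of TCP-LP's ECI detection and reaction, as described above.
# Assumption: an ECI is raised when the measured one-way delay exceeds the
# minimum observed delay by more than a threshold delta (here a fixed offset
# in seconds; the real protocol uses a relative smoothing of delays).

class TcpLpState:
    def __init__(self, cwnd=20, delta=0.015, inference_timeout=1.0):
        self.cwnd = cwnd
        self.delta = delta                      # delay threshold for an ECI
        self.inference_timeout = inference_timeout
        self.min_owd = float("inf")             # minimum one-way delay seen
        self.it_expires = None                  # inference timer expiry time

    def on_delay_sample(self, owd, now):
        """Process a one-way delay measurement taken at time `now` [s]."""
        self.min_owd = min(self.min_owd, owd)
        if owd - self.min_owd > self.delta:     # early congestion indication
            self._on_eci(now)
        else:
            self.cwnd += 1                      # additive increase otherwise

    def _on_eci(self, now):
        if self.it_expires is not None and now < self.it_expires:
            self.cwnd = 1                       # second ECI during IT: confirmed
            self.it_expires = None
        else:
            self.cwnd = max(self.cwnd // 2, 1)  # first ECI: halve, start timer
            self.it_expires = now + self.inference_timeout


if __name__ == "__main__":
    lp = TcpLpState()
    for t, owd in [(0.0, 0.030), (0.1, 0.031), (0.2, 0.060), (0.5, 0.070)]:
        lp.on_delay_sample(owd, t)
        print(f"t={t:.1f}s owd={owd*1000:.0f}ms -> cwnd={lp.cwnd}")
```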
The final equation for the throughput of a single TCP-LP connection, B_LP(q, χ), is

B_LP(q, χ) = min( W_max / RTT ,
    [ (1−q)/q + (1/(1−χ)) · (1 − (1−q)^(X_T·(2+b−4X_T ± √...)/(6b))) · X_T ] /
    [ RTT · ( (2+b−4X_T ± √...)/6 + X_T + 1 ) + (1/(1−χ)) · (1 − (1−q)^(X_T·(2+b−4X_T ± √...)/(6b))) · T_IT ] )    (5)

with

√... := sqrt( 24b/q + b(b + 16X_T − 20) + 4(2X_T − 1)² )    (6)

Fig. 16. Throughput of TCP-LP as a function of the ECI probability q and the ECI correlation probability χ

Fig. 17. Throughput of TCP-LP as a function of the RTT for different ECI correlation probabilities χ
where q denotes the ECI probability and χ is the correlation probability of ECIs, i.e. the probability that an ECI arrives while the inference timer is active, T_IT is the inference timeout, and X_T is the ratio of this timeout to the round-trip time, i.e. X_T = T_IT / RTT. The intuition behind this equation is as follows. The first term is an upper bound on the throughput set by the maximum congestion window and is only important if there is no loss at all. Since TCP-LP reacts to ECIs, which occur with a higher probability than packet loss, it is unlikely that this term is relevant for the throughput. In the second part, two terms are dominant: (1−q)/q and the factor 1/(1−χ)·(1 − ...). The latter term is relevant only for large values of χ (χ > 0.9), as we show below. For moderate values, the throughput follows the term (1−q)/q. Thus, a simplified approximation of Equation 5 can be written as

B_LP(q) = (1/RTT) · ((1−q)/q) · o(1/√q)    (7)

The details of the influence of χ, q and the RTT on the throughput of a TCP-LP connection are shown in Figures 16 to 18. First, Figure 16 plots the throughput of a TCP-LP connection as a function of q and χ for a fixed RTT of 10 msec. The figure shows that q dominates over χ for "high" and "low" probabilities of ECIs. For example, for a high ECI probability of q = 0.1, the throughput is below 30 kb/sec, independent of χ. For low ECI probabilities of q = 0.0001, the throughput changes between 17 and 14 Mb/sec for 0.1 ≤ χ ≤ 0.9. The influence of χ is only significant for χ = 0.99, where the throughput drops to 4 Mb/sec.
Fig. 18. Throughput of TCP-LP as a function of the RTT for different ECI probabilities q
In contrast, χ has a significant impact for ECI probabilities between q = 0.001 and q = 0.01. For q = 0.001, e.g., the throughput varies from 4.3 Mb/sec for χ = 0.1 to 1.5 Mb/sec (χ = 0.9) and even 0.17 Mb/sec for χ = 0.99. This result is also highlighted in Figure 17, which shows the throughput of a TCP-LP connection as a function of the RTT and the correlation probability χ for a fixed value of q = 0.001. The influence of the RTT increases with lower values of χ. In particular, for χ = 0.1, the throughput varies between 7 Mb/sec and 2.3 Mb/sec for RTT values between 10 and 30 ms. Finally, Figure 18 shows the throughput of a TCP-LP connection as a function of the RTT and the ECI probability q for a fixed value of χ = 0.5. For an RTT of 30 ms, the throughput varies between 10 and 0.5 Mb/sec for different values of q. With increasing RTTs, the throughput of TCP-LP decreases. The decrease is non-linear for low values of q, whereas the difference is marginal for high values of q.

A. Model validation

Here, we validate the TCP-LP model using ns-2 simulations. We simulate a single-bottleneck scenario where a TCP-LP flow interacts with a varying number of TCP connections to create different ECI probabilities. The link capacity is either 1.5 or 10 Mb/sec, the delay 30 msec. The size of the RED queue is 2.5 times the delay-bandwidth product. The inference timeout
is 1 sec, and the simulation duration is 1000 sec. The experiment is repeated 100 times with varying numbers of TCP flows and start times. Figure 19 shows the throughput in Mb/sec as a function of q. The different lines denote different values of χ. The figure shows that the model is accurate for low values of q, as all points lie within the upper and lower boundaries of χ. For q < 0.01, the individual data points also match the model within 5%. For values of q > 0.01, the throughput increasingly differs from the plotted model. The reason for the difference, though, is that the increasing value of q correlates with an increasing average RTT value. By adjusting the RTT in the model, the simulation measurements are again within 5% of the model. Thus, the simulations show a close match with the model.

Moreover, the throughput models of TCP-LP(q) and TCP(p) (plots not shown in this paper) show identical trends. The correspondence of the trends reflects the fact that the congestion control of TCP and TCP-LP relies on the same concepts, and only their throughput differs. The throughput of TCP-LP is strongly influenced by its sensitivity to ECIs and its fast and rigorous reaction to them. In particular, the correlation factor χ plays an important role for the throughput for moderate values of q. Thus, for the delay-based version of TCP-LP, where the sensitivity to ECIs χ is a function of the delay threshold parameter δ, δ becomes a modeling parameter that can be adjusted for the performance-fairness tradeoff. The original TCP-LP paper empirically derives a value of δ = 0.15. Our results show that δ can be actively chosen to trade off aggressiveness and fairness. For ECN-based TCP-LP, the throughput will differ for different AQM schemes. The correlation of RED, which uses a random strategy to mark packets, is expected to be lower than that of FRED or RED-PD, which mark packets of a given flow with higher probability.

Equation 5 explicitly models the binary mode of TCP-ROME, but implicitly also the range mode. The congestion control coordination of the range mode described by Equation 4 modifies the parameters q and χ. Case 3 in Equation 4 reduces the probability that TCP-ROME reacts to an ECI. Thus, relevant for the model is not the number of received ECIs but the number of ECIs TCP-ROME reacted to. As a result, the value of q relevant for the model will be smaller than suggested by the number of measured ECIs. Similarly, in case 4 of Equation 4, if TCP-ROME does not react to a second ECI, it modifies χ. Therefore, the above model holds for both versions of TCP-ROME.

Fig. 19. Model validation with ns-2 simulations

VI. MODELING TCP-ROME
In this section, we derive an expression for the expected throughput and the expected download time of a parallel download with TCP-ROME, based on the throughput models of TCP Reno and TCP-LP.

A. TCP-ROME steady-state throughput

Denote B_H(p) as the steady-state throughput of a TCP connection developed in [16] and B_L(q, χ) as the throughput of TCP-LP described by Equation 5. Then, in the simplest case where priorities are fixed among the subconnections, the TCP-ROME steady-state throughput is simply the sum of the throughputs of the individual subconnections, i.e.

B_{ROME}(p, q, \chi) = \sum_{i=1}^{h} B_H^i(p) + \sum_{j=1}^{l} B_L^j(q, \chi)   (8)
with n as the total number of subconnections and h and l as the numbers of high and low priority subconnections, respectively, such that n = h + l. The model does not need to distinguish whether the subconnections share the same (client-side) bottleneck or whether they have individual bottlenecks in the network or at the server, as described in Section II. If all subconnections share the same bottleneck, the probabilities p and q will increase for all subconnections, whereas the values will differ for server-side bottlenecks. In the case that the priorities are changed among the subconnections during the download, we expect that the model can be split into phases, where each phase starts and ends with a change of the priority. However, we do not verify this expectation here because of the complexity of the validation.

B. TCP-ROME expected download time

The above model can be used to derive a model for the expected download time of a TCP-ROME connection. Cardwell et al. [3] developed a complete model for TCP latency as

E[T] = E[T_{ss}] + E[T_{loss}] + E[T_{ca}] + E[T_{delack}]   (9)
where expressions for the expected time in slow start (E[T_{ss}]), the expected time in congestion avoidance (E[T_{ca}]) and the times accounted for the first loss (E[T_{loss}]) and delayed acknowledgments (E[T_{delack}]) are given in [3]. TCP-ROME only alters the congestion avoidance term. The original term is E[T_{ca}] = E[d_{ca}]/B(p), where B(p) is the steady-state throughput of the TCP connection. Thus, we simply replace this throughput by the throughput of TCP-ROME:

E[T_{ca}] = \frac{E[d_{ca}]}{B_{ROME}(p, q, \chi)}   (10)

where B_{ROME}(p, q, \chi) is given by Equation 8.
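To make the structure of Equations 8 and 10 concrete, the following minimal Python sketch aggregates per-subconnection throughputs. All names are ours, not part of TCP-ROME; the TCP-LP lambda is only a placeholder standing in for Equation 5, which is not repeated here.

```python
import math
from typing import Callable, Sequence

def rome_throughput(p: Sequence[float], q: Sequence[float], chi: Sequence[float],
                    b_high: Callable[[float], float],
                    b_low: Callable[[float, float], float]) -> float:
    """Equation 8: aggregate steady-state throughput of a TCP-ROME download.

    p      -- loss probabilities of the h high-priority subconnections
    q, chi -- ECI and correlation probabilities of the l low-priority subconnections
    b_high -- TCP throughput model B_H(p), e.g. the model of [16]
    b_low  -- TCP-LP throughput model B_L(q, chi), e.g. Equation 5
    """
    high = sum(b_high(p_i) for p_i in p)                    # sum over i = 1..h
    low = sum(b_low(q_j, c_j) for q_j, c_j in zip(q, chi))  # sum over j = 1..l
    return high + low

# Illustration with the simplified TCP model B_H(p) ~ (1/RTT) * sqrt(3/(2bp));
# the TCP-LP placeholder below is NOT Equation 5, it only stands in for B_L.
RTT, b = 0.05, 2.0
tcp_model = lambda p: math.sqrt(3.0 / (2.0 * b * p)) / RTT   # packets/sec
tcplp_model = lambda q, chi: 0.5 * tcp_model(q)              # placeholder only
print(rome_throughput([0.001], [0.01, 0.01], [0.5, 0.5], tcp_model, tcplp_model))
```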
[Fig. 20. Download time for TCP-ROME with and without low-priority connections, as a function of h (number of high priority connections).]

[Fig. 21. Common bottleneck setup: TCP-ROME sources 1..n and TCP sources 1..n share a common bottleneck link of capacity C between two routers towards the TCP-ROME client and the TCP sinks.]
Moreover, if all subconnections start at the same time, the expression can be simplified to

E[T] = \frac{E[T_{ss}] + E[T_{loss}] + E[T_{delack}] + E[T_{ca}]}{n}   (11)

Figure 20 shows the download time E[T] as defined in Equation 11 as a function of the number of high-priority flows in a parallel download with n = 8 parallel subconnections. p is set to 0.001, q to 0.1 and RTT = 50 msec for a file download of 12 MB. The lower curve shows the expected download time of TCP-ROME, the upper curve the expected download time if the low-priority flows were omitted. The figure allows two conclusions. First, TCP-ROME allows a mitigation of fairness and performance that leads to differences in the download time between 4.2 min and 0.5 min, a factor of 8. This mitigation can be adjusted at any time during the download and is subject to user or application preferences. Second, the contributions of the low-priority flows significantly reduce the download time. For example, by using a single high-priority connection and 7 low-priority connections, the download time is reduced from 4.2 min to 2.2 min, a reduction of 48%.
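As a back-of-the-envelope illustration of Equations 9-11, the sketch below composes the latency terms; it assumes the slow-start, first-loss and delayed-ACK terms of [3] are supplied externally, and the variable names and example numbers are ours (they are not the parameters behind Figure 20).

```python
def expected_download_time(e_tss: float, e_tloss: float, e_tdelack: float,
                           e_dca: float, b_rome: float, n: int = 1) -> float:
    """Equations 9-11: expected download time of a TCP-ROME transfer.

    e_tss, e_tloss, e_tdelack -- expected slow-start, first-loss and delayed-ACK
                                 times from the latency model of [3] (seconds)
    e_dca  -- expected amount of data transferred in congestion avoidance (bits)
    b_rome -- aggregate TCP-ROME throughput from Equation 8 (bits/sec)
    n      -- number of subconnections; n > 1 applies the simplification of
              Equation 11 (all subconnections start at the same time)
    """
    e_tca = e_dca / b_rome                        # Equation 10
    total = e_tss + e_tloss + e_tdelack + e_tca   # Equation 9
    return total / n                              # Equation 11

# Illustrative call with placeholder values: 12 MB file at 2 Mb/sec aggregate rate
print(expected_download_time(0.5, 0.1, 0.05, e_dca=12 * 8e6, b_rome=2e6, n=8) / 60.0, "min")
```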
VII. EVALUATION

In this section, we assess the performance and fairness tradeoff of TCP-ROME with extensive simulations. First, we study the tradeoff for a single TCP-ROME download running in binary mode in a baseline scenario where TCP-ROME interacts with single-flow downloads. Second, we assess TCP-ROME's ability to meet the bandwidth requirements in range mode. Third, we assess the system-wide tradeoff of multiple TCP-ROME and TCP flows in a typical peer-to-peer scenario.

A. Single TCP-ROME flow over a common bottleneck

As a baseline topology, we consider a single TCP-ROME download running in binary mode where all its subconnections share a common bottleneck, as shown in Figure 21. The bottleneck link is simultaneously used by a varying number of TCP flows. The bottleneck link capacity is C = 1 Mb/sec, the delay 20 msec. The access links have a capacity of 100 Mb/sec and a delay of 2 msec.
[Fig. 22. Download time of TCP-ROME [min] for a single bottleneck as a function of h (number of high priority connections), for 0, 1, 2, 4 and 8 cross TCP flows.]
Figure 22 shows the download time in minutes of a 10 MB file for TCP-ROME with n = 8 parallel connections. The x-axis denotes the number of high priority subconnections h. The different lines denote different numbers of TCP connections sharing the same bottleneck, varying from 0 to 8 TCP flows. The lowest line denotes the case where no TCP traffic interferes with TCP-ROME. Without cross traffic, the download time varies between 2.3 min (h = 0) and 1.45 min (h = 8): TCP-ROME efficiently uses all available bandwidth for itself. The difference between h = 0 and h = 8 occurs because the modified congestion avoidance of TCP-LP implies that low priority connections require more time to reach their equilibrium. By increasing the number of concurrent TCP flows, the bottleneck bandwidth is shared among more flows that adhere to TCP fairness, which decreases the excess capacity at the bottleneck link. Therefore, if TCP-ROME uses only low priority subconnections, its download time increases from 1.8 min (1 cross flow) up to 18 minutes (8 cross flows). The download time of TCP-ROME decreases when h is increased. In the cases where TCP-ROME uses as many high priority subconnections as there are TCP flows (e.g., h = 2 with 2 TCP flows), the download time is close to 2.5 minutes.
[Fig. 23. Contributions of low-priority flows in TCP-ROME: low priority contribution [%] as a function of h (number of high priority connections), for 0, 1, 2, 4 and 8 cross TCP flows.]

[Fig. 24. Download time [min] for single flow TCP downloads as a function of h, for 1, 2, 4 and 8 cross TCP flows.]

[Fig. 25. Distributed bottleneck setup: each TCP-ROME source shares a separate bottleneck link of capacity Ci with one TCP source on the path to the TCP-ROME client and the TCP sinks.]

[Fig. 26. Download time of TCP-ROME [min] for distributed bottlenecks as a function of h, for 0 cross TCP, 1 cross (shared/not shared), 4 cross (shared/not shared) and 8 cross TCP flows.]
The increase in the number of high priority connections and concurrent TCP flows decreases the benefit of the low priority connections, as the excess capacity on the bottleneck decreases rapidly. Figure 23 depicts the contribution of the low-priority flows of TCP-ROME to the downloaded volume. Obviously, for h = 0 and h = n, the contributions are 100% and 0%, respectively. Between these two extremes, for no cross traffic, the aggregated contribution of the low-priority flows can be as high as 56% (with 7 low-priority subconnections). Even with a total of 4 high priority flows, the contribution is still between 15% and 33%. These contributions are opportunistic in the sense that they decrease the download time of the parallel download without adding to the unfairness towards other downloads. Unfortunately, these contributions decrease rapidly with the number of TCP flows sharing the same bottleneck. The level of aggressiveness of the TCP-ROME connection impacts the download time of the TCP flows sharing the
same bottleneck. Figure 24 shows the download time of a single TCP flow as a function of the number of high priority connections of TCP-ROME and the number of TCP flows. Obviously, the TCP flows are adversely affected by the mitigation of fairness and throughput of TCP-ROME. For a given number of TCP flows sharing the bottleneck, the download time of a single TCP flow increases almost linearly with the number of high priority subconnections. For 1 TCP flow, e.g., the download time increases from 1.4 minutes to 10.6 minutes when TCP-ROME changes the number of high priority flows from h = 0 to h = 8. Moreover, the download time increases when the number of TCP flows increases as well. Thus, with h = 8 and 8 cross TCP flows, the download time of a single TCP flow reaches 21.3 minutes.

B. Independent bottlenecks

Here we assess the fairness-performance tradeoff of TCP-ROME when the different subconnections have independent bottlenecks, as depicted in Figure 25. To compare the results with the previous experiment, we set all link capacities to Ci = 128 kb/sec such that their aggregated capacity is 1 Mb/sec. Again we consider only TCP-ROME flows with n = 8 subconnections, and the number of high priority subconnections is varied from h = 0 to h = 8. The total number of TCP flows is varied from 0 to 8, i.e., each bottleneck link has either 0 or 1 cross TCP flow.
Figure 26 shows the download time in minutes of a 10 MB file with TCP-ROME as a function of the number of high priority subconnections h on the x-axis. The different lines denote different numbers of concurrent TCP connections. For 1 and 4 single-flow TCP connections, we distinguish two strategies for TCP-ROME: "shared" denotes that TCP-ROME uses the high priority subconnections on the bottlenecks that are shared with a TCP connection; "not shared" denotes that TCP-ROME uses high priority subconnections on bottlenecks that are not shared. For h = 0 and h = 8 we achieve similar results as in Figure 22 because the aggregated link capacities are identical. For the cases with 1 and 4 cross flows, we note differences in the download time between 1.4 and 1.7 minutes as a function of the strategy and h. The differences arise because a low priority subconnection fully uses the capacity of a link that is not shared with a TCP connection, whereas it receives only a small fraction of the excess capacity if it shares the link with a TCP connection. Overall, the difference between the fastest download (h = 8) and the least aggressive one (h = 0) in the common bottleneck scenario is approximately a factor of 8. The scenario of independent bottlenecks emphasizes the ability to achieve load balancing. In contrast to a common bottleneck, where TCP-ROME competes with all cross flows, only a subset of the cross flows is affected. Alternative approaches in the literature that target a performance-fairness tradeoff between a single client-server pair lack this property. Finally, looking from a different perspective, this scenario shows that TCP-ROME offers a simple solution to the server selection problem. By setting only 1 subconnection to high priority and n - 1 to low priority, TCP-ROME has the same level of fairness as a single TCP connection. Besides the fact that low-priority flows contribute to the download, TCP-ROME can dynamically determine which of the n subconnections is set to high priority. As we will show in Section VII-D.3, an opportunistic selection of the fastest subconnection at any time during the download can significantly reduce the download time.

C. Range mode evaluation

Here, we evaluate the range mode of TCP-ROME. The focus of this evaluation is to highlight the impact of a parallel streaming application with a specified target bandwidth B* on competing flows. An implementation of TCP-ROME that focuses on the interaction between parallel download and real-time streaming will be assessed in future work. We consider a distributed bottleneck topology as depicted in Figure 25. The bottleneck links have a capacity of 2 Mb/sec and 20 msec delay each, the access links 100 Mb/sec and 5 msec. TCP-ROME shares the bottleneck links with concurrent TCP flows that start and stop at random times, but there are always between 5 and 10 TCP flows active per bottleneck link. TCP-ROME uses n = 8 parallel connections and coordinates the download such that disjoint segments are downloaded from each replica server. The target stream rate B* is 1 Mb/sec; we set B*_+ to 1.2 Mb/sec and B*_- to 1 Mb/sec.
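The range-mode behavior in this experiment can be pictured with a simple hysteresis rule: if the aggregate rate falls below B*_-, promote one subconnection to high priority; if it exceeds B*_+, demote one. The sketch below illustrates this idea only; it is our simplification and not the coordination rule of Equation 4, and all names are ours.

```python
def adjust_priorities(rate: float, h: int, n: int,
                      b_minus: float = 1.0e6, b_plus: float = 1.2e6) -> int:
    """Illustrative hysteresis controller for the range mode (not Equation 4).

    rate    -- measured aggregate throughput of the parallel download (bits/sec)
    h       -- current number of high-priority subconnections
    n       -- total number of subconnections
    b_minus -- lower target rate B*_- (here 1 Mb/sec, as in the experiment)
    b_plus  -- upper target rate B*_+ (here 1.2 Mb/sec)
    Returns the new number of high-priority subconnections.
    """
    if rate < b_minus and h < n:
        return h + 1   # below the stream target: make one more subconnection aggressive
    if rate > b_plus and h > 0:
        return h - 1   # comfortably above target: back off towards TCP-LP fairness
    return h           # within [B*_-, B*_+]: keep the current setting

# Example: react to a sequence of measured aggregate rates (bits/sec)
h = 2
for measured in (0.8e6, 0.9e6, 1.3e6, 1.1e6):
    h = adjust_priorities(measured, h, n=8)
    print(measured, "->", h, "high-priority subconnections")
```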
TABLE II
RANGE MODE EVALUATION

                          range mode   h = 8   h = 4   h = 2
  Received frames [%]        100        100      67      22
  TCP throughput [kbps]      164        152     163     171
[Fig. 27. Peer-to-peer setup: end systems attached to a network core; two example downloads (download 1 and download 2) are indicated.]
Table II assesses the performance-fairness tradeoff for the TCP-ROME range mode and compares it to three other TCP-ROME setups with a fixed number of high and low priority subconnections (h = 8, h = 4 and h = 2). The first row shows the streaming quality, measured as the percentage of frames that can be displayed in real time with a client buffer of 2 seconds. Using the range mode, TCP-ROME achieves the same video quality as TCP-ROME with h = 8 high priority subconnections. Using only 4 or 2 subconnections at high priority, the quality of the video drops drastically to 67% and 22%, respectively. The second row shows the impact on fairness towards the concurrent TCP flows. With the range mode, a TCP flow achieves 164 kb/sec on average, as much as with h = 4. Using h = 8 high priority subconnections, the single flow throughput is reduced by 8% to 152 kb/sec, whereas with h = 2 it increases to 171 kb/sec, but at the cost of lower streaming quality.

D. System-wide tradeoff

1) Evaluation setup: Here we evaluate TCP-ROME in a peer-to-peer-like scenario where multiple applications (parallel and single-flow downloads) share network and server resources in a distributed system. The simulation setup is depicted in Figure 27. The network core consists of links with 100 Mb/sec and 10 msec delay. Attached to the core are 50 end systems. The access link bandwidth has a uniform distribution among 56, 128, 256, 512 kb/sec and 1 Mb/sec, which approximates measurements from [19]. 50 single flow and 50 parallel download applications are randomly placed onto the end systems. Each end system can be sender and receiver (but not for the same application). Application start times are randomly distributed between 0 and 10 sec, and applications terminate when the download is finished. The file size distribution approximates a peer-to-peer system [7]: 90% of the files have a uniform distribution of 10-100 MB, 3% of 1-10 MB and 7% of 100 MB-1 GB. We measure the average download time of the TCP-ROME flows as well as the average download time of single flow downloads as a function of the number of parallel subconnections n and different numbers of high priority flows h, averaged over 10 runs.
[Fig. 28. Download time [min] of TCP-ROME applications in a p2p setup as a function of n (number of parallel connections), for h = n down to h = n - 7.]

[Fig. 29. Download time [min] of single flow applications in a p2p setup as a function of n (number of parallel connections), for h = n down to h = n - 7.]

[Fig. 30. TCP-ROME download times [min] for different priority setting strategies (cwnd, bw, sbw, latency) and the settings n = 2, h = 1; n = 4, h = 1; n = 6, h = 1; n = 8, h = 4.]
2) Performance fairness tradeoff: Figure 28 depicts the download time of the TCP-ROME flows in minutes as a function of the number of parallel subconnections n on the x-axis. The different curves show a different number of high-priority flows h. We omit the case of h = 0 for readability. For n = 1, there is only one way to set the priority, whereas the number of possible settings increases with n. For h = n, i.e., if TCP-ROME always uses all of its subconnections at high priority, TCP-ROME achieves the fastest download time for a given n. The download time is reduced from 6.5 min (n = 1) to 3.17 min (n = 8), a relative difference of 52%. Similarly, if TCP-ROME uses only a single high-priority flow (h = 1) and varies n (the uppermost points of all lines), the download time is reduced from 6.5 min to 5.66 min, a difference of 13%. This reduction is due to the low-priority flows and is therefore not subject to unfairness towards single-flow downloads. Finally, for a given n and by modifying h, TCP-ROME shows a wide range of fairness-performance tradeoffs. For n = 8, e.g., the two extremes are 3.17 min vs. 5.66 min, a net difference of 45%. Figure 29 shows the effect of different TCP-ROME settings on the single-flow TCP downloads as a function of the number
of parallel subconnections n on the x-axis. The different curves show different values of h. Comparing the curves, the download time increases significantly with h. For example, the download time increases from 6.6 min to 10.4 min when h is changed from 1 to 8, a net increase of 39%. In contrast, when TCP-ROME maintains a fairness level, the impact of the low priority subconnections on single flow downloads is as low as 5% (and most probably due to the randomness of the simulations rather than to the TCP-LP connections).

3) Influence of priority selection strategy: Finally, we assess the influence of the priority selection strategies described in Section III-D. With the same simulation setup, we modify the priority selection strategy and measure the download time for each application. Figure 30 shows the download time as a function of n and h. The figure shows that the strategy based on the congestion window yields the best results. The other three strategies differ from it by 5% (bandwidth) up to 30% (smoothed bandwidth). We do not pursue a detailed analysis of these results. The important message is that the priority selection strategy is another parameter that can be used to modify the tradeoff between performance and fairness, adding to the flexibility of TCP-ROME.

VIII. RELATED WORK

Related work can be grouped into four groups. First, parallel download protocols have been developed for (i) a single client-server pair and (ii) parallel downloads from distributed replicas. For each of these two categories, efforts have first focused on improving performance and then on mitigating performance and fairness. Between the same client-server pairs, parallel download protocols, such as PSockets [20], have initially been developed to overcome congestion window limitations in high-speed networks. Allcock et al. [1] evaluate the performance in a Grid context. Fairness is addressed by Hacker et al. [8]. Fairness is mitigated by increasing the congestion window as a function of
a virtual round-trip time that is larger than the actual round-trip time. While mitigation is possible, the analytically provided upper bounds on the throughput model are weak. In particular, determining the point of congestion in an end-to-end path is difficult, if not impossible, such that mitigation in real networks is hard to achieve. Lu et al. [12] address these shortcomings by predicting the throughput and taming a parallel download. Cho et al. [5] also look at coordinating the throughput of parallel flows between the same client-server pair. Our work differs in two ways. First, TCP-ROME addresses replica systems that are distributed, i.e., bandwidth and round trip times are different for each subconnection. Second, we derive and validate a comprehensive throughput model for a parallel download.

The concept of a congestion control coordinator (CCC) is not new by itself. The Congestion Manager (CM) [2] aims at controlling the performance of multiple interacting protocols on a single client. The CM re-implements a TCP-friendly congestion control algorithm. Our work contrasts in that we do not re-implement TCP, but instead allow the CCC to directly modify the congestion control of the subconnections. Moreover, while the CM ensures a fair (i.e., TCP-fair) resource usage among multiple protocols, TCP-ROME is specific to parallel downloads and focuses on the potential to mitigate performance and fairness. The Coordination Protocol (CP) [15] aims at controlling the performance of aggregated flows in a cluster setting. While the CP is designed specifically for two clusters (similar to a single client-server pair), TCP-ROME addresses distributed replicas. Moreover, TCP-ROME is a parallel download protocol, whereas the CP "only" addresses performance and fairness. Finally, Key et al. [10] also advocate joint congestion control and routing, yet in the context of multi-path routing and multi-homing, and more from a theoretical perspective.

Parallel downloads from replicas have been exploited in peer-to-peer systems. BitTorrent [6], e.g., coordinates the download from multiple, distributed servers to decrease the download time. A study of the performance characteristics of a peer-to-peer network is presented by Qiu et al. [18]. This study highlights that fairness in peer-to-peer networks is measured by the willingness to cooperate (participation vs. free-riding). This metric is unrelated to and independent of performance-oriented metrics such as throughput. TCP-ROME, in contrast, targets performance and throughput fairness as its metrics.

IX. CONCLUSIONS

This paper presents TCP-ROME, a novel protocol for parallel downloads. TCP-ROME coordinates the data download from multiple replicas to a single client. Moreover, it coordinates the end-to-end throughput of its subconnections to dynamically mitigate performance and fairness towards other concurrent flows by allowing each subconnection to adjust its congestion control between TCP fairness and TCP-LP fairness. As a result, TCP-ROME offers a protocol framework that can be customized to achieve different levels of performance and fairness. In particular, in binary mode, TCP-ROME is intuitive and easy to use by adjusting a single variable, resulting in
a wide range of throughputs for the parallel and single flow downloads.

This paper first advocates the use of parallel downloads for real-time streaming of multimedia content. We have developed TCP-ROME, a parallel download protocol that embodies multiple mechanisms to support real-time streaming: the download coordination at the granularity of TCP segments, the content coordinator that ensures a timely delivery of the segments, and the rate coordinator that allows a dynamic tradeoff between streaming performance and fairness. The sum of these mechanisms allows TCP-ROME to significantly improve the streaming performance and the viewing experience of the user.

To analyze the throughput and fairness tradeoff, this paper develops a new analytical model for the TCP-LP throughput. The paper discusses the influence of the main variables and validates the model with simulations. Then, by combining this model with the TCP model, expressions for the expected throughput and the expected download time of a parallel download are derived. These models allow a mathematical understanding of the throughput of parallel downloads and of their impact on concurrent flows sharing the network and server resources.

Finally, protocol simulations in typical peer-to-peer networks show that the download times of TCP-ROME with the most and the least aggressive settings differ by a factor of 8. In a typical peer-to-peer setup with multiple interacting parallel and single flow downloads, the average download time for parallel downloads varies between 3.17 min and 6.5 min, a difference of 52%, whereas it varies between 6.5 min and 10.5 min for single flow TCP downloads for different performance-fairness tradeoffs. These results highlight the potential of TCP-ROME to trade off throughput and fairness. Thus, TCP-ROME is a comprehensive, well-founded and flexible protocol framework to mitigate performance and fairness. TCP-ROME's ability to opportunistically profit from high-speed environments while abating its performance under resource shortages is a fundamental property suited to address the increasing heterogeneity in today's and future networks.

REFERENCES

[1] B. Allcock, J. Bester, J. Bresnahan, A. Chervenak, I. Foster, C. Kesselman, S. Meder, V. Nefedova, D. Quesnel, and S. Tuecke. Data management and transfer in high-performance computational grid environments. Parallel Computing, 28(5):749-771, May 2002.
[2] H. Balakrishnan, H. Rahul, and S. Seshan. An integrated congestion management architecture for Internet hosts. In Proceedings of ACM SIGCOMM'99, Cambridge, MA, September 1999.
[3] N. Cardwell, S. Savage, and T. Anderson. Modeling TCP latency. In Proceedings of IEEE INFOCOM'00, Tel Aviv, Israel, March 2000.
[4] M. Castro, P. Druschel, A. Kermarrec, A. Nandi, A. Rowstron, and A. Singh. SplitStream: High-bandwidth multicast in a cooperative environment. In Proceedings of SOSP'03, Bolton Landing, NY, October 2003.
[5] S. Cho and R. Bettati. Collaborative congestion control in parallel TCP flows. In Proceedings of ICC, Seoul, Korea, May 2005.
[6] B. Cohen. Incentives build robustness in BitTorrent, May 2003. http://bittorrent.com/bittorrentcon.pdf.
[7] K. Gummadi, R. Dunn, S. Saroiu, S. Gribble, H. Levy, and J. Zahorjan. Measurement, modeling and analysis of a peer-to-peer file-sharing workload. In Proceedings of SOSP'03, Bolton Landing, NY, October 2003.
[8] T. Hacker, B. Noble, and B. Athey. Improving throughput and maintaining fairness using parallel TCP. In Proceedings of IEEE INFOCOM'04, Hong Kong, March 2004.
[9] R. Karrer and E. Knightly. TCP-PARIS: a parallel download protocol for replicas. In Proceedings of the IEEE International Workshop on Web Content Caching and Distribution (WCW'05), Sophia Antipolis, France, September 2005.
[10] P. Key, L. Massoulie, and D. Towsley. Combining multipath routing and congestion control for robustness. In Proceedings of CISS, Princeton, NJ, March 2006.
[11] A. Kuzmanovic and E. Knightly. TCP-LP: a distributed algorithm for low priority data transfer. In Proceedings of IEEE INFOCOM'03, San Francisco, CA, April 2003.
[12] D. Lu, Y. Qiao, P. Dinda, and F. Bustamante. Modeling and taming parallel TCP on the wide area network. In Proceedings of IPDPS'05, Denver, CO, April 2005.
[13] P. Mehra, C. de Vleeschouwer, and A. Zakhor. Receiver-driven bandwidth sharing for TCP and its application to video streaming. IEEE Transactions on Multimedia, 7(4), August 2005.
[14] T. Nguyen and A. Zakhor. Multiple sender distributed video streaming. IEEE Transactions on Multimedia, 6(2), April 2005.
[15] D. Ott, T. Sparks, and K. Mayer-Patel. Aggregate congestion control for distributed multimedia applications. In Proceedings of IEEE INFOCOM'04, Hong Kong, March 2004.
[16] J. Padhye, V. Firoiu, D. Towsley, and J. Kurose. Modeling TCP throughput: A simple model and its empirical validation. In Proceedings of ACM SIGCOMM'98, pages 303-314, Vancouver, CA, 1998.
[17] V. Padmanabhan, H. Wang, P. Chou, and K. Sripanidkulchai. Distributing streaming media content using cooperative networking. In Proceedings of ACM NOSSDAV'02, Miami, FL, May 2002.
[18] D. Qiu and R. Srikant. Modeling and performance analysis of BitTorrent-like peer-to-peer networks. In Proceedings of ACM SIGCOMM'04, Portland, OR, August 2004.
[19] S. Sen and J. Wang. Analyzing peer-to-peer traffic across large networks. In Proceedings of the Second SIGCOMM Internet Measurement Workshop (IMW 2002), Marseille, France, November 2002.
[20] R. Sivakumar, S. Bailey, and R. Grossman. PSockets: the case for application-level network striping for data intensive applications using high speed wide area networks. In Proceedings of the 2000 ACM/IEEE Conference on Supercomputing, Dallas, TX, March 2000.
[21] A. Venkataramani, R. Kokku, and M. Dahlin. TCP-NICE: A mechanism for background transfers. In Proceedings of OSDI'02, Boston, MA, December 2002.
[22] B. Wang, J. Kurose, P. Shenoy, and D. Towsley. Multimedia streaming via TCP: An analytic performance study. In Proceedings of ACM Multimedia, New York, NY, October 2004.
[23] B. Wang, W. Wei, Z. Guo, and D. Towsley. Multipath live streaming via TCP: Performance and benefits. Technical Report BECAT/CSE-TR-06-7, University of Connecticut, 2006.
APPENDIX

Here, we develop a throughput model for TCP-LP based on the methodology for TCP by Padhye et al. [16]. References to equations followed by an asterisk (*) refer to equations in [16].

A. Single ECIs only

First, we model the throughput of TCP-LP in the absence of timeouts, i.e., the throughput is influenced only by duplicate ACKs. Denote p as the probability of a loss (for TCP) and q as the probability of an ECI (for TCP-LP). Then, the TCP throughput is given by Equation 4*:

B(p) = \frac{1}{RTT} \sqrt{\frac{3}{2bp}} + o(1/\sqrt{p})   (12)

TCP-LP differs from TCP in two ways. First, TCP-LP reacts to a single ECI, whereas TCP reacts only to the third duplicate ACK; the trigger mechanism is therefore different. Second, in addition to halving the congestion window, TCP-LP starts an inference timer. While this timer is active, TCP-LP does not increase the congestion window. Since we consider a single ECI only in this part of the analysis, we can assume that the timer expires before another ECI arrives. These differences influence the model as follows. In TCP, the congestion window at the end of a TD period is given by Equation 7* as

W_i = \frac{W_{i-1}}{2} + \frac{X_i}{b}, \quad i = 1, 2, \ldots   (13)
where the latter term denotes the linear increase of the congestion window. This increase is delayed in TCP-LP until the inference timer expires. Denote T_{IT} as the length of the inference timer, and X_T = T_{IT}/RTT as the number of rounds spent until the timer expires. Then, the number of rounds in which the congestion window actually increases in TCP-LP is X_i - X_T. Since we consider only single ECIs, X_i > X_T. With this modification, Equation 7* becomes for TCP-LP

W_i = \frac{W_{i-1}}{2} + \frac{X_i - X_T}{b}, \quad i = 1, 2, \ldots   (14)
Further, in Equations 8*-10*, which determine the number of packets transmitted, Y_i becomes

Y_i = \sum_{k=0}^{X_T/b - 1} \frac{W_{i-1}}{2} b + \sum_{k=X_T/b}^{X_i/b - 1} \left(\frac{W_{i-1}}{2} + k - \frac{X_T}{b}\right) b + \beta_i   (15)
    = \sum_{k=0}^{X_i/b - 1} \frac{W_{i-1}}{2} b + \sum_{k=0}^{(X_i - X_T)/b - 1} k b + \beta_i   (16)
    = \frac{X_i W_{i-1}}{2} + \frac{X_i - X_T}{2}\left(\frac{X_i - X_T}{b} - 1\right) + \beta_i   (17)
    = \frac{X_i}{2}\left(\frac{W_{i-1}}{2} + W_i - 1\right) + \frac{X_T}{2}\left(W_i - \frac{W_{i-1}}{2} - 1\right) + \beta_i   (18)
In Equation 17, the first term denotes the amount of data sent during the inference timeout, and the second term denotes the amount of data sent while the congestion window is increased. Equation 18 finally resembles the first term in Equation 10*, except that the amount of data during X_T must be deducted. With

E[W] = \frac{2}{b}\left(E[X] - X_T\right)   (19)

and using Equations 16 to 18, we get

\frac{1-q}{q} + E[W] = \frac{E[X]}{2}\left(\frac{E[W]}{2} + E[W] - 1\right) + \frac{X_T}{2}\left(E[W] - \frac{E[W]}{2} - 1\right) + \frac{E[W]}{2}   (20)
With the same further assumptions, we can express E[W], E[X] and E[A], leading to the counterpart of Equation 20*:

E[W] = \frac{2 + b - 4X_T \pm \sqrt{\frac{24b}{q} + b(b + 16X_T - 20) + 4(2X_T - 1)^2}}{3b}   (21)
Observe that, as in Equation 17*,

E[W] = \sqrt{\frac{8}{3bq}} + o(1/\sqrt{q})   (22)

for small values of q, since the absence of losses (TCP) and ECIs (TCP-LP) decreases the differences between TCP and TCP-LP. Combining Equations 21 and 19, we get

E[X] = \frac{2 + b - 4X_T \pm \sqrt{\ldots}}{6} + X_T   (23)

where \sqrt{\ldots} denotes the root term of Equation 21. Then, according to Equation 6*, E[A] = (E[X] + 1) \cdot RTT, we can calculate E[A]:

E[A] = RTT \left( \frac{2 + b - 4X_T \pm \sqrt{\ldots}}{6} + X_T + 1 \right)   (24)

These equations lead to the final expression for the TCP-LP throughput if only single ECIs are received:

B(q) = \frac{\frac{1-q}{q} + \frac{2 + b - 4X_T \pm \sqrt{\ldots}}{3b}}{RTT \left( \frac{2 + b - 4X_T \pm \sqrt{\ldots}}{6} + X_T + 1 \right)}   (25)
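The single-ECI model of Equations 21 and 23-25 can be evaluated numerically as in the sketch below. This is an illustration under our assumptions (positive root taken, throughput expressed in packets per second); the function and variable names are ours.

```python
import math

def tcplp_single_eci_throughput(q: float, rtt: float, t_it: float, b: float = 2.0) -> float:
    """TCP-LP throughput under single ECIs, Equations 21 and 23-25 (packets/sec).

    q    -- ECI probability
    rtt  -- round-trip time (seconds)
    t_it -- inference timeout T_IT (seconds)
    b    -- packets acknowledged per ACK (delayed ACKs: b = 2)
    """
    x_t = t_it / rtt                                   # rounds spent in the inference timeout
    root = math.sqrt(24.0 * b / q + b * (b + 16.0 * x_t - 20.0)
                     + 4.0 * (2.0 * x_t - 1.0) ** 2)
    e_w = (2.0 + b - 4.0 * x_t + root) / (3.0 * b)     # Equation 21 (positive root)
    e_x = b * e_w / 2.0 + x_t                          # Equation 23 (= Equation 19 rearranged)
    e_a = rtt * (e_x + 1.0)                            # Equation 24
    return ((1.0 - q) / q + e_w) / e_a                 # Equation 25

# Example: q = 0.001, RTT = 30 ms, inference timeout 1 s
print(tcplp_single_eci_throughput(0.001, 0.03, 1.0))
```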
B. Double ECIs

Here, we analyze the throughput for the case that a second ECI arrives while the inference timer is still active from a previously received ECI. Following the analysis in Section 2.2 of [16], we first derive an expression for the throughput under two ECIs, and then the probability that a second ECI is received while the inference timer is active. The length of an inference timeout is given by

Z_i^{TO} = E[Z^{TO}] = T_{IT}   (26)
If a second ECI arrives during the inference time, the congestion window is set to 1. Therefore, the number of packets sent during the remainder of the timeout is

R_i = E[R] = \frac{T_{IT}}{RTT} =: X_T   (27)

and Equation 21* becomes

B = \frac{E[Y] + Q \cdot X_T}{E[A] + Q \cdot T_{IT}}   (28)
Y_{ij} and A_{ij} do not depend on double ECIs. Therefore, we only need to determine an expression for Q, i.e., the probability that a second ECI is received while the inference timer is still active. Denote α as the arrival of an ECI. In general, the probability that a set of packets creates k ECIs is, according to Equation 3*,

P[\alpha = k] = (1-q)^{k-1} q, \quad k = 1, 2, \ldots   (29)
The probability of a second ECI given the first ECI is given by 1 minus the probability that no loss occurs while the inference timer is active. Thus, the probability that two ECIs occur is a function of the number of packets sent during the inference timeout. Therefore, we calculate the probability that one of these packets generates the second ECI. Denote l_i as the number of packets transmitted between the first ECI and the timer expiration. Since the congestion window does not increase during the timeout, l_i is a function of the congestion window that is halved by the first ECI:

l_i = \frac{W_i}{2} \cdot \frac{T_{IT}}{RTT}   (30)

Then, the probability that a second ECI is received, given that the first ECI arrived at time t, is

P[\alpha(t, t+T_{IT}) = 2 \mid \alpha(t) = 1] = \frac{P[\alpha(t, t+T_{IT}) = 2 \wedge \alpha(t) = 1]}{P[\alpha(t) = 1]}   (31)
In TCP-LP, ECIs are generated either by receiving ECN marks or via one-way delay measurements. In the former case, the probability that a flow experiences two ECNs is a complex function of the marking strategy of the AQM router and the behavior of the other flows sharing the same router. In the latter case, it is a function of the one-way delay and a delay threshold parameter δ. To maintain the generality of our solution, we introduce a parameter χ with 0 ≤ χ ≤ 1 that denotes the correlation probability of two ECIs. Then, we can derive the probability Q(q, χ) that a second ECI occurs during the inference timeout as

Q(q, \chi) = \frac{1}{1-\chi}\left(1 - (1-q)^{l}\right) = \frac{1}{1-\chi}\left(1 - (1-q)^{\frac{E[W]}{2} \cdot X_T}\right)   (32)

Combining this probability of Equation 32 with the expected throughput during the inference timeout of Equation 28, we get the TCP-LP throughput in the case of double ECIs:

B(q, \chi) = \frac{\frac{1-q}{q} + E[W] + \frac{1}{1-\chi}\left(1 - (1-q)^{\frac{E[W]}{2} \cdot X_T}\right) \cdot X_T}{RTT \cdot (E[X] + 1) + \frac{1}{1-\chi}\left(1 - (1-q)^{\frac{E[W]}{2} \cdot X_T}\right) \cdot T_{IT}}   (33)
and by substituting E[W] from Equation 21:

B(q, \chi) = \min\left( \frac{W_{max}}{RTT}, \; \frac{\frac{1-q}{q} + \frac{2 + b - 4X_T \pm \sqrt{\ldots}}{3b} + \frac{1}{1-\chi}\left(1 - (1-q)^{\frac{E[W]}{2} \cdot X_T}\right) \cdot X_T}{RTT \left( \frac{2 + b - 4X_T \pm \sqrt{\ldots}}{6} + X_T + 1 \right) + \frac{1}{1-\chi}\left(1 - (1-q)^{\frac{E[W]}{2} \cdot X_T}\right) \cdot T_{IT}} \right)   (34)

where E[W] in the exponents is given by Equation 21.
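Analogously, the double-ECI throughput of Equations 32-34 can be sketched on top of the single-ECI quantities. Again this is an illustration under our assumptions (positive root, W_max expressed in packets, χ < 1), with our own helper names.

```python
import math

def tcplp_double_eci_throughput(q: float, chi: float, rtt: float, t_it: float,
                                b: float = 2.0, w_max: float = 64.0) -> float:
    """TCP-LP throughput with correlated (double) ECIs, Equations 21 and 32-34.

    q, chi -- ECI probability and ECI correlation probability (0 <= chi < 1)
    rtt    -- round-trip time (seconds); t_it -- inference timeout T_IT (seconds)
    b      -- packets acknowledged per ACK; w_max -- receiver window limit (packets)
    Returns throughput in packets/sec.
    """
    x_t = t_it / rtt
    root = math.sqrt(24.0 * b / q + b * (b + 16.0 * x_t - 20.0)
                     + 4.0 * (2.0 * x_t - 1.0) ** 2)
    e_w = (2.0 + b - 4.0 * x_t + root) / (3.0 * b)                 # Equation 21
    e_x = b * e_w / 2.0 + x_t                                      # Equation 23
    e_a = rtt * (e_x + 1.0)                                        # Equation 24
    big_q = (1.0 - (1.0 - q) ** (e_w / 2.0 * x_t)) / (1.0 - chi)   # Equation 32
    e_y = (1.0 - q) / q + e_w                                      # expected packets per ECI period
    rate = (e_y + big_q * x_t) / (e_a + big_q * t_it)              # Equations 28 and 33
    return min(w_max / rtt, rate)                                  # Equation 34, window limitation

# Example: moderate correlation (chi = 0.5), q = 0.001, RTT = 30 ms, T_IT = 1 s
print(tcplp_double_eci_throughput(0.001, 0.5, 0.03, 1.0))
```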
C. Window limitation

Finally, Padhye et al. [16] consider the effect of the window size limitation advertised by the receiver buffer. TCP-LP does not differ from TCP in this respect; therefore, we directly use the approximation of Equation 30*. Thus, the final steady-state TCP-LP throughput model is given by Equation 34 above.