SVEET! A Scalable Virtualized Evaluation Environment for TCP Miguel A. Erazo, Yue Li, and Jason Liu School of Computer and Information Sciences Florida International University, Miami, FL 33199 Emails: {meraz001,yueli,liux}@cis.fiu.edu
Abstract—The ability to establish an objective comparison between high-performance TCP variants under diverse networking conditions and to obtain a quantitative assessment of their impact on the global network traffic is essential to a communitywide understanding of various design approaches. Small-scale experiments are insufficient for a comprehensive study of these TCP variants. We propose a TCP performance evaluation testbed, called SVEET, on which real implementations of the TCP variants can be accurately evaluated under diverse network configurations and workloads in large-scale network settings. This testbed combines real-time immersive simulation, emulation, machine and time virtualization techniques. We validate the testbed via extensive experiments and assess its capabilities through case studies involving real web services. Index Terms—TCP Performance, Network Simulation, Emulation, Machine Virtualization, Time Dilation
I. INTRODUCTION
The congestion control mechanism of TCP, which limits the rate of data entering the network, is essential to the overall stability of the network under traffic congestion and important to the protocol's performance. It has been widely documented that the traditional TCP congestion control algorithms (such as TCP Reno and TCP SACK) have serious problems preventing TCP from reaching high data throughput over high-speed long-latency links. Consequently, quite a number of TCP variants, including H-TCP [1], Scalable TCP [2], FAST [3], BIC [4], CUBIC [5], just to mention a few, have been proposed to directly tackle these problems. Compared with the traditional methods, these TCP variants typically adopt more aggressive congestion control methods in order to address the under-utilization problem of TCP over networks with a large bandwidth-delay product. In recent years, a significant amount of effort has been invested in evaluating these TCP variants (e.g., [6]–[8]). However, there is still no comprehensive study to date on the performance of these TCP variants beyond small-scale experiments. This problem is compounded by the lack of standard performance metrics and network scenarios to benchmark the existing algorithms. More recently, several test suites have been developed in an effort to standardize network configurations, workloads, and metrics to evaluate these TCP variants (e.g., [9], [10]). These test suites, however, support only small-scale network scenarios with limited complexity for testing and benchmarking the TCP variants. The
community urgently needs a standard evaluation environment for an objective comparison of the TCP variants under a wide spectrum of networking conditions in large network scenarios. We believe such an effort will be essential to a community-wide understanding of different design approaches to TCP congestion control, as well as their impact on the overall network traffic.
Three principal methods can be used to test TCP performance: live experiments, simulation, and emulation. Live experiments on existing research testbeds, such as PlanetLab [11] and VINI [12], provide protocol designers with realistic distributed environments and traffic conditions that resemble the target system on which the protocols are deployed. These testbeds, however, do not provide the level of reproducibility, controllability, and flexibility necessary for testing and benchmarking the TCP variants under diverse network conditions. In contrast, network simulation, such as ns-2 [13], SSFNet [14], and GTNetS [15], offers complete control over the testing environment. In simulation, the network topologies and workloads are easy to configure, and events can be readily generated to test the protocols under various circumstances. Nevertheless, simulation lacks realism. Protocols cannot be implemented in simulation without significant effort and, above all, simulation models must be subjected to rigorous verification and validation tests, without which they may not reflect the behavior of the real network stack. Alternatively, emulation testbeds, such as EmuLab [16] and ModelNet [17], allow flexible network experiments that directly involve real protocol implementations. However, the cost of setting up a large-scale network emulation environment for testing and benchmarking the TCP variants can be substantially higher than that of simulation.
Our goal is to provide researchers with a flexible, easy-to-use, and easy-to-maintain solution: a testbed where real implementations of the TCP variants can be accurately evaluated with diverse network configurations and workloads in large-scale network settings. We present the Scalable Virtualized Evaluation Environment for TCP (SVEET), a testbed that combines network simulation, emulation, and machine virtualization techniques. We apply real-time immersive network simulation techniques, which allow the simulated network to interact with real implementations of network applications, protocols, and services [18]. We extend the real-time
simulation capabilities of our network simulator such that the simulation time can be made to advance either in real time or proportionally to real time. We use machine virtualization technologies to provide the exact execution environment of the target system. Applications run in separate virtual machine compartments and communicate using the real network stacks implemented with the TCP variants under investigation. These virtual machines are connected through the network simulator, which calculates the traffic delays and losses according to the simulated conditions of the virtual network. Our approach combines the advantages of both simulation and emulation: we achieve the high-level controllability, repeatability, and flexibility of experiments from simulation, and the real protocol execution capabilities from emulation.
To enable network experiments with large-capacity links, we apply a technique called time dilation on the virtual machines [19]. Time dilation changes a virtual machine's notion of how time progresses by controlling its system clock and timer interrupts. In a time-dilated system, applications running on the virtual machines experience a slower passage of time and consequently believe the system has been upgraded with a faster CPU and I/O speed. In order for SVEET to accommodate data communications with multi-gigabit throughput, we can apply time dilation, proportionally slowing down the virtual machines and the network simulator. Time dilation allows us to provide much higher bandwidths than what can be supported by the physical system and the network simulator, at the cost of increased experiment time.
We designed and implemented SVEET based on our real-time immersive network simulator, PRIME, which supports parallel and distributed simulation of large-scale networks [20]. PRIME provides a scalable emulation infrastructure that allows automatic and almost transparent interaction between multiple virtual machines and the simulator [21]. In the meantime, we ported several TCP congestion control algorithms to PRIME. These TCP variants were previously implemented in the ns-2 simulator following the Linux TCP implementation, and consist of thirteen TCP variants: BIC, CUBIC, HS-TCP, H-TCP, TCP Hybla, TCP NewReno, Scalable TCP, TCP Vegas, TCP Westwood, TCP Veno, TCP-LP, TCP Illinois, and C-TCP [22]. This enables us to conduct large-scale experiments using simulated traffic generated by these TCP variants. We also customized the Linux kernel on the virtual machines to include the TCP variants so that we can test them using real applications running on the virtual machines, communicating with remote applications via the TCP/IP stack.
The rest of the paper is organized as follows. In Section II, we review related work. Section III describes the details of our design and implementation of SVEET. We conducted extensive experiments to validate our testbed; the results are shown in Section IV. Section V provides some interesting case studies enabled by the testbed involving real web traffic. Finally, we conclude this paper and outline our future work in Section VI.
II. RELATED WORK
There have been significant efforts in the performance evaluation of different TCP congestion control algorithms. Li et al. [6] presented experimental results of several major TCP congestion control algorithms. Although the authors recognized the importance of diverse network conditions for TCP performance evaluations, their studies mainly focused on long-lived flows (with realistic round-trip time distributions) over a single bottleneck link. Their studies obtained several important (and somewhat surprising) results about TCP performance. For example, it was shown that some TCP variants can exhibit substantial intra-protocol unfairness and unfairness toward flows with larger round-trip times. Ha et al. [8] identified several important elements in constructing realistic test scenarios for TCP performance evaluation, such as bottleneck bandwidths, round-trip times, topology, network queue size, and background traffic. In a later study, Ha et al. [7] provided a systematic study of the impact of background traffic on the performance of high-speed TCP variants. They extended the traditional network scenarios to include as many as five types of background traffic. They found that background traffic plays an important role in determining TCP performance. For example, the presence of background traffic has been shown to improve fairness for nearly all protocols, mainly due to the increased randomness in packet losses.
There has been a genuine push from the community for a standard evaluation environment for prototyping, testing, and benchmarking existing and new TCP algorithms. For example, Wei et al. [10] proposed a benchmark suite that consists of a set of network configurations, workloads, and standard metrics. Specifically, they proposed 15 network scenarios to exercise different aspects of the TCP design. The benchmark suite has two modes: a simulation mode (using ns-2) and a live experiment mode (directly on hardware). Similarly, Shimonishi et al. [23] proposed a TCP evaluation suite for TCP simulation studies using ns-2. Their goal was to promote comparable studies of different congestion control schemes. The TCP evaluation suite uses a set of TCL scripts to automatically generate experiment scenarios (including topology, flows, and workloads), run experiments, and analyze results. Separately, Andrew et al. [9] proposed a specification of an evaluation suite for TCP congestion control algorithms, which considers different traffic scenarios (such as traffic load, traffic constitution, flow durations, and packet size distribution) and different network configurations (such as round-trip times and link types), and defines a set of performance metrics and scenarios of common interest (such as throughput, delay, convergence time, and fairness). SVEET is inspired by these studies and aims to provide researchers with an accurate, scalable, and flexible testbed to evaluate TCP variants with realistic network configurations and workloads.
Most of the aforementioned studies were based on either simulation or emulation, but not both. For example, the testbed used by Ha et al. [7] to evaluate TCP performance
with background traffic was developed upon FreeBSD Dummynet [24]. There are several prominent emulation testbeds well received by the research community for conducting network experiments in general. They include, for example, EmuLab and ModelNet. EmuLab [16] is an experimentation facility that consists of a large number of networked computers that can be configured to present a virtual network environment. EmuLab supports resource sharing through machine virtualization so that researchers can simultaneously run multiple experiments using the same facility. EmuLab currently supports hundreds of projects with thousands of users running over 18,000 experiments per year. ModelNet [17] is another emulation environment that supports large-scale network experiments. ModelNet runs real network applications at the periphery of the system, on so-called "edge nodes", and directs traffic through an emulation core consisting of parallel computers. Network operations, specifically packet forwarding, are carried out with the proper delays and losses at the emulation core based on a time-stepped (or interrupt-driven) approach. These emulation testbeds are general-purpose environments for conducting experiments with network and distributed applications; SVEET focuses on TCP performance evaluation and benchmarking. SVEET also has a different design philosophy: one of its main goals is to support commodity execution environments, rather than promoting resource sharing, to increase the accessibility of the testbed to network researchers at large.
Researchers have proposed to use machine virtualization for building network emulation testbeds (e.g., [12], [25]). Virtual machine solutions come in many variations, ranging from full-scale machine virtualization (such as VMWare Workstation [26] and User-Mode Linux [27]), to light-weight machine virtualization (such as Xen [28] and VServer [29]), to virtualized network stacks (such as OpenVZ [30]). Recently, Caini et al. [31] designed and developed a virtual integrated TCP testbed, called VITT. The main idea is to exploit advanced virtualization technologies to realize real network components within a single physical machine. VITT aims to support high-fidelity TCP performance evaluation with a high level of timing accuracy. However, since all network components must be emulated on the same physical host, the scalability of VITT is questionable.
Virtual time management has been a central theme of parallel discrete-event simulation. However, the concept is relatively new for network emulation. SVEET is developed based on the time dilation technique proposed by Gupta et al. [19]. Bergstrom et al. [32] proposed a different virtual time management mechanism enabled by their binary executable modification scheme, which includes the ability to dynamically dilate or contract time to improve resource utilization. Grau et al. [33] proposed a low-overhead conservative synchronization mechanism to regulate the time dilation factors across virtual machines. SVEET currently does not support dynamic time dilation; we plan to incorporate dynamic time management following these approaches. All the above methods, however, have been applied only to network emulation; SVEET needs to coordinate time dilation across the entire system, including
both virtual machines and the network simulator.
III. THE SVEET APPROACH
SVEET is expected to be a standard TCP evaluation testbed for the community. Such a testbed must satisfy the following requirements:
• It must be able to generate reproducible results. Reproducibility is essential to protocol development and evaluation; the users should be able to use the testbed and follow a set procedure for regression testing, documenting, and benchmarking existing TCP variants.
• It must be able to accommodate a diverse set of networking scenarios, ranging from small-scale topologies to large-scale configurations. Not only should one be able to use the testbed for inspecting the details of protocol behavior in small, tightly controlled, choreographed conditions, but one should also be able to perform studies that assess large-scale impact, e.g., how much a particular TCP variant can affect and be affected by other network traffic in realistic large-scale network settings.
• It must be able to incorporate existing protocol implementations in real systems rather than developing its own version of the TCP variants simply for testing purposes. The protocol development process is complicated and error prone: maintaining a separate code base would require a costly procedure for verification and validation to avoid jeopardizing the credibility of the studies. Further, existing implementations sometimes involve changes to other parts of the supporting operating system, which are difficult to replicate in simulation.
In addition to satisfying the above requirements, among the multiplicity of design decisions, we focus on four important factors that determine the utility of our TCP evaluation testbed: accuracy, scalability, flexibility, and accessibility. The testbed shall maintain a certain degree of accuracy in projecting the performance of TCP under diverse traffic conditions as well as its impact on the global network traffic as a whole. The testbed shall be scalable so that users will be able to test TCP variants in large-scale network scenarios. The testbed shall provide mechanisms for flexible configuration of network experiments with which users can explore and evaluate different design alternatives. The testbed shall also run on commodity execution environments so that researchers can easily conduct TCP experiments.
Fig. 1 provides a schematic view of the SVEET architecture. Distributed applications (such as Web servers and clients) are executed directly on end-hosts configured as separate virtual machines with their own network stacks implemented with the TCP variants under investigation. We call these end-hosts emulated hosts; two such emulated hosts are shown in the figure. Traffic generated by the applications on these emulated hosts is captured by the virtual network interfaces (NICs), which forward the packets to the PRIME network simulator via the emulation infrastructure provided by PRIME. Once inside the network simulator these packets are treated simply
as simulation events. PRIME then carries out packet forwarding on the virtual network according to the virtual network configuration, regardless of whether the packets are simulated or emulated. That is, packet delays and packet losses are assigned by the network simulator as both emulated traffic (i.e., packets generated by the real applications) and simulated traffic (i.e., virtual packets generated by the network simulator) compete for the resources of the virtual network. Once the traffic reaches a destination that is an emulated host, the packets are exported to that host via the emulation infrastructure provided by PRIME. Packets arrive at the virtual network interfaces of the emulated host as if received from a real network. In the following subsections, we present more details for each component of the SVEET architecture.

Fig. 1. The SVEET architecture.

A. PRIME Network Simulator
PRIME stands for Parallel Real-Time Immersive network Modeling Environment [20]. PRIME features a parallel discrete-event simulation engine based on the Scalable Simulation Framework (SSF), which is a standard API for parallel large-scale simulation [14]. PRIME can run on most parallel platforms, including shared-memory multiprocessors (as a multi-threaded program), distributed-memory machines (via the message-passing interface), or a combination of both. Support for parallel and distributed simulation allows PRIME to run extremely large network simulations.
PRIME also supports real-time simulation, in which case unmodified implementations of real applications can run along with the network simulator operating in real time (i.e., the simulation time advances at the same speed as the wall-clock time). Traffic between the real applications (on virtual machines) is captured and forwarded implicitly to PRIME so that it can be "carried" on the virtual network with packet delays and losses calculated in accordance with the topology and congestion level of the simulated network.
Note that, to guarantee real-time performance, the network simulator must process events at a rate no slower than the wall-clock time. In cases where the simulation speed cannot keep up with real time (i.e., if the number of simulation events generated becomes larger than the maximum event execution rate supported by PRIME), or when the emulation infrastructure is overwhelmed by the emulation traffic (i.e., when the I/O interface can no longer support the large traffic volume between the network simulator and the
real applications on the virtual machines), PRIME can throttle the simulation speed to only a fraction of the speed of the wall-clock time. Similar to the time dilation technique for the virtual machines (described in more detail in the following section), we also call the slowdown rate the Time Dilation Factor (TDF). For example, if the simulator's TDF is 10, the simulation clock advances at a rate 10 times slower than the wall-clock time. PRIME provides methods to change the TDF either at the start of the simulation or during run time. In the current SVEET implementation, we allow only a static TDF throughout the whole experiment.
PRIME provides a rich set of network elements (such as routers, links, and network queues) and protocol implementations, including TCP (Tahoe, Reno, NewReno, and SACK), UDP, and various application-layer protocols (such as FTP and HTTP). These network elements and protocols can be used to construct various network experiment scenarios. In addition, we ported all thirteen Linux TCP variants to PRIME to allow more flexibility for conducting TCP experiments on the testbed. For example, one can use these TCP variants to generate background traffic directly in simulation, rather than consuming virtual machine resources to generate similar emulated traffic. Background traffic is needed in order to test its impact on the foreground real applications, such as web services and multimedia streaming (e.g., see our case studies in Section V).
We followed the same design principle as in [22] for porting the Linux TCP variants to PRIME. In fact, we reused as many data structures as possible from the Linux TCP port to ns-2. Fig. 2 shows the code structure. In PRIME, the protocols on each virtual node are organized as a list of protocol sessions, each represented as a ProtocolSession object. We created one protocol session, LinuxTcpMaster, to manage all active Linux TCP connections, and another protocol session, LinuxTcpSimpleSocket, to support a simple interface for applications to send or receive data over Linux TCP. Both are derived from the ProtocolSession class. A TCP connection is structured in the same fashion as in ns-2: we used LinuxTcpAgent to represent the TCP sender-side logic and SinkTcpAgent to represent the receiver-side logic. In this way we achieved maximum reuse of the existing code from the Linux TCP implementation in ns-2. The congestion control mechanisms of the TCP variants were transplanted directly from the Linux TCP implementation. ns-linux-util is a set of facilities created by the ns-2 port as an interface between the Linux TCP functions and the ns-2 simulator. We refurbished these facilities to run in PRIME.

Fig. 2. Implementing Linux TCP in PRIME.
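To make the code structure in Fig. 2 more concrete, the following is a minimal C++ sketch of how the sender-side agent might hook a ported Linux congestion-control routine into a ProtocolSession-style class hierarchy. The class names mirror those in Fig. 2, but the base-class interface, the method signatures, and the placeholder congestion-avoidance body are illustrative assumptions rather than PRIME's actual API.

```cpp
// Illustrative sketch only: the real PRIME/SSF interfaces differ; push()/pop()
// and the flow-id keying are assumptions made for this example.
#include <cstdint>
#include <map>
#include <memory>

struct Packet;  // stand-in for a simulated packet

// Stand-in for PRIME's ProtocolSession base class.
class ProtocolSession {
public:
  virtual ~ProtocolSession() = default;
  virtual void push(Packet* pkt) = 0;  // data moving down the protocol stack
  virtual void pop(Packet* pkt) = 0;   // data moving up the protocol stack
};

// Sender-side TCP logic; the congestion window update is delegated to a
// routine transplanted from the Linux sources (e.g., CUBIC, BIC, H-TCP).
class LinuxTcpAgent {
public:
  void onAck(uint32_t acked_segments) {
    if (cwnd_ < ssthresh_) cwnd_ += acked_segments;  // slow start
    else cong_avoid(acked_segments);                 // variant-specific growth
  }
  void onLoss() { ssthresh_ = cwnd_ / 2; cwnd_ = ssthresh_; }
private:
  // Placeholder: Reno-style additive increase; a ported variant would
  // substitute its own congestion-avoidance routine here.
  void cong_avoid(uint32_t acked) { cwnd_ += acked / cwnd_; }
  double cwnd_ = 2, ssthresh_ = 64;
};

// Manages all active Linux TCP connections on one simulated host (cf. Fig. 2).
class LinuxTcpMaster : public ProtocolSession {
public:
  void push(Packet*) override { /* hand the segment to the IP session below */ }
  void pop(Packet*) override  { /* demultiplex to the owning LinuxTcpAgent */ }
private:
  std::map<uint64_t, std::unique_ptr<LinuxTcpAgent>> connections_;  // keyed by flow id
};
```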
B. Xen Virtual Machines
We chose Xen [28] for the virtual machines in our implementation. Xen is a high-performance open-source virtual machine solution. On most machines, Xen uses a technique called "para-virtualization" to achieve high performance by modifying the guest operating systems to obtain certain architectural features not provided by the host machines to
support virtualization. Xen supports a wide range of guest operating systems, including Windows, Linux, Solaris, and several BSD variants. Our current SVEET implementation only supports Linux (a.k.a. XenoLinux) as the guest operating system, which features the TCP implementations we need for experimentation. Specifically, we chose Xen 3.0.4 and Linux kernel 2.6.16.33. This Linux kernel comes with 9 TCP variants, which we include in our experiments.¹ We also instrumented the Linux kernel with Web100 [34], so that researchers can easily monitor and change various TCP variables during the experiments.
In order to fully test the variety of TCP congestion control algorithms in various environments, especially on high-bandwidth long-latency networks for which some of these TCP variants were particularly designed, plentiful bandwidth must be supported in the test network topology. There are two issues that could limit SVEET's ability to conduct experiments involving high-capacity network links. First, high-bandwidth links can transmit more packets per unit of time than slower links, which means more simulation events need to be processed in the network simulator; this could eventually surpass the maximum event execution rate supported by PRIME on the particular platform. Second, high-capacity links may cause more traffic to go through the emulation infrastructure situated between the virtual machines that generate the traffic and the network simulator that carries the traffic. The physical bandwidth depends on the underlying network fabric chosen by the emulation infrastructure, which could be substantially smaller than the bandwidth used in the testing scenarios. In all these cases, one must be able to slow down the progression of time (i.e., to dilate time), both in the network simulator and in the virtual machines, in order to satisfy the computation and communication requirements of the testbed.
We adopt the time dilation technique developed for Xen by Gupta et al. [19], which can uniformly slow the passage of time from the perspective of the guest operating system (XenoLinux). This is achieved, among other things, by enlarging the interval between timer interrupts delivered to the virtual machines from the Xen hypervisor by a specified factor, called the Time Dilation Factor (TDF). Time dilation scales the perceived I/O rate as well as the perceived processing power of the virtual machines by the same factor. For instance, if a virtual machine has a TDF of 10, the time perceived by the applications running on the virtual machine advances at a pace 10 times slower than true wall-clock time. Correspondingly, the applications experience a tenfold increase in both network capacity and CPU cycles. In the current implementation of SVEET, we set the same TDF for all virtual machines and the network simulator at the start of the experiment, according to the maximum projected simulation event processing rate and emulation bandwidth. This approach may be overly conservative and cause significant resource under-utilization. We are investigating alternative ways of adapting TDF assignments during the experiments and will report the results in another paper.
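The bookkeeping behind time dilation reduces to simple arithmetic: with a dilation factor k, a link that the physical infrastructure can drive at B bit/s appears to the dilated guests as a k·B bit/s link, while an experiment lasting T virtual seconds occupies k·T seconds of wall-clock time. The small worked example below illustrates this; the 40 Mb/s figure anticipates the emulation-path measurement reported in the next subsection, and the other numbers are purely illustrative.

```cpp
#include <cstdio>

int main() {
  const double tdf = 10.0;                // time dilation factor (same for VMs and simulator)
  const double phys_emul_mbps = 40.0;     // sustainable emulation-path throughput (Sec. III-C)
  const double virtual_exp_secs = 100.0;  // experiment length in dilated (virtual) time

  // A guest whose clock runs tdf times slower perceives tdf times more
  // bandwidth and CPU per unit of its own time...
  const double perceived_mbps = tdf * phys_emul_mbps;
  // ...at the price of a tdf-fold longer wall-clock run.
  const double wallclock_secs = tdf * virtual_exp_secs;

  std::printf("TDF=%.0f: emulated links up to ~%.0f Mb/s, %g virtual s -> %g wall-clock s\n",
              tdf, perceived_mbps, virtual_exp_secs, wallclock_secs);
  return 0;
}
```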
C. PRIME Emulation Infrastructure
SVEET uses the PRIME emulation infrastructure to capture packets generated by the emulated hosts (i.e., the Xen virtual machines). The emulation infrastructure forwards the packets to PRIME to be simulated. From the simulator's point of view, these packets appear to have been generated directly by the corresponding end-hosts on the virtual network. The particular emulation infrastructure we used for the experiments is built upon OpenVPN, which has been customized to tunnel emulation traffic between the virtual machines and the network simulator [21].
Fig. 3 illustrates a simple example of SVEET using the emulation infrastructure to connect the PRIME network simulator with two virtual machines: one running a web server sending HTTP traffic to the other virtual machine running a web client. To set up the experiment, the two virtual machines first establish separate VPN connections with a designated VPN server, which we call the emulation gateway. OpenVPN is an open-source VPN solution that uses TUN/TAP devices. The OpenVPN client on each virtual machine creates a virtual network interface (e.g., the tunnel device tun0), which is assigned the same IP address as the corresponding end-host on the virtual network. The forwarding table of each virtual machine is automatically configured to forward traffic destined to the virtual network IP space via the VPN connection. In this case, data generated by the web server is sent down to tun0 via the TCP/IP stack and then handed to the OpenVPN client. The OpenVPN client uses IP over UDP to transport the packets to the OpenVPN server at the emulation gateway. Upon receiving the packets, the emulation gateway forwards them via a dedicated TCP connection to the simulator.
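For readers unfamiliar with TUN/TAP tunneling, the sketch below shows the standard Linux system calls that an OpenVPN-style process uses to attach to a tun device and read raw IP packets from it. It is a generic illustration of the mechanism, not code from PRIME's emulation infrastructure, and error handling is kept minimal.

```cpp
#include <cstdio>
#include <cstring>
#include <fcntl.h>
#include <net/if.h>
#include <linux/if_tun.h>
#include <sys/ioctl.h>
#include <unistd.h>

// Open (or attach to) a tun device and return its file descriptor.
static int open_tun(char* dev_name) {
  int fd = open("/dev/net/tun", O_RDWR);
  if (fd < 0) return -1;
  struct ifreq ifr;
  std::memset(&ifr, 0, sizeof(ifr));
  ifr.ifr_flags = IFF_TUN | IFF_NO_PI;          // raw IP packets, no extra header
  std::strncpy(ifr.ifr_name, dev_name, IFNAMSIZ - 1);
  if (ioctl(fd, TUNSETIFF, &ifr) < 0) { close(fd); return -1; }
  std::strcpy(dev_name, ifr.ifr_name);           // kernel may adjust the name
  return fd;
}

int main() {
  char dev[IFNAMSIZ] = "tun0";
  int fd = open_tun(dev);
  if (fd < 0) { std::perror("tun"); return 1; }
  char buf[2048];
  // Each read() returns one IP packet written to the tun device by the guest's
  // stack; a gateway-style process would forward it toward the simulator here.
  for (;;) {
    ssize_t n = read(fd, buf, sizeof(buf));
    if (n <= 0) break;
    std::printf("captured %zd-byte packet from %s\n", n, dev);
  }
  close(fd);
  return 0;
}
```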
1 We are currently in the process of upgrading the Linux kernel to the latest version that contains all the Linux TCP variants described in this paper.
Fig. 3. PRIME emulation infrastructure.
Fig. 4. Measured throughput and delay of the emulation infrastructure.
The reader thread at the simulator side receives the packets from the emulation gateway and then generates simulation events representing the packets sent from the corresponding end-hosts on the virtual network. PRIME simulates the network transactions as the packets are forwarded on the virtual network. Upon reaching their destinations, the packets are exported from the simulator and the writer thread sends them to the emulation gateway via another dedicated TCP connection. The OpenVPN server at the emulation gateway subsequently forwards the packets to the corresponding emulated hosts via the VPN connections. The OpenVPN client at the target virtual machine receives the packets and then writes them out to tun0. The web client receives these packets as if they arrived directly from a physical network.
We measured the overhead of the emulation infrastructure through a simple experiment. We set up the Xen machine on a Dell Optiplex 745 workstation with an Intel Core 2 Duo 2.4 GHz processor and 2 GB of memory. We created two user domains, each containing an emulated host, where we used iperf to set up a TCP connection and generate traffic between the two emulated hosts, and used ping to measure the round-trip times. We set up both the emulation gateway and the network simulator on another machine, a Dell PowerEdge 1900 configured with an Intel Quad Core 2.0 GHz processor and 8 GB of memory. The two physical machines were connected through a gigabit switch. The network model contained only the two emulated hosts connected through a link of zero delay and huge capacity, so that we could measure the data throughput and latency imposed by the emulation infrastructure and the physical hardware.
Fig. 4 shows the distributions of throughput and end-to-end delay between the two emulated hosts. For the throughput experiment, we selected four TCP variants: Scalable TCP, BIC, CUBIC, and TCP Reno. We collected throughput measurements from 30 trials for each TCP variant and show their cumulative distributions. The majority of the measured throughput fell between 40 and 46 Mb/s, although TCP Reno seemed to have a larger variance compared to the others. For the round-trip time tests, we varied the size of the ICMP packets among 56 bytes (the default), 512 bytes, and 1 KB. We collected 100 ping responses for each payload size and plot the cumulative
distribution. The delays ranged from 0.6 ms to 0.75 ms in most cases; larger payloads resulted in larger delays, but not by a large margin. From the results, we can conclude that the emulation infrastructure is able to sustain close to 40 Mb/s of throughput while introducing only sub-millisecond delay. Although this is sufficient for testing normal network links, we clearly have to enable time dilation when the emulation traffic demand exceeds 40 Mb/s, which is especially common for large network experiments. Our emulation infrastructure also allows client machines to choose randomly among multiple emulation gateways (i.e., a VPN server farm) to connect to the network simulator, both for load balancing and for increased capacity. Although this technique leads to higher throughput for the emulated traffic, time dilation is still needed in cases where the throughput can no longer satisfy the requirements of the experiment scenarios.
IV. TESTBED VALIDATION
In this section we describe in detail the three sets of experiments we conducted to evaluate the SVEET testbed.
A. TCP Congestion Window Trajectories
Our first set of experiments aims to provide a baseline comparison between the two network simulators (ns-2 and PRIME) and the SVEET approach, in this case without time dilation. We use a simple network with two end-hosts connected by two routers, as shown in Fig. 5, which is similar to the one used in a previous study [22]. The connection between the two routers forms a bottleneck link, configured with 10 Mb/s bandwidth and 64 ms delay. The network interfaces at both ends of the bottleneck link each have a drop-tail queue with a buffer size of around 66 KB (about 50 packets). The links connecting the routers and the end-hosts each have 1 Gb/s bandwidth and zero delay. We conducted three tests for each TCP variant: the first with the ns-2 simulator, the second with the PRIME simulator (with emulation disabled), and the third on SVEET (without
time dilation). Emulation was conducted on the same platform as was used in the experiments in the previous section. Both end-hosts were emulated in separate Xen domains (i.e., virtual machines) located on the same physical machine. Both PRIME and the emulation gateway were run on another machine, and the two machines were connected through a gigabit switch. During each test, we directed one TCP flow from one end-host (H1) to the other (H2) and measured the changes in the TCP congestion window size over time at the sender (H1): for both ns-2 and PRIME, we used a script to analyze the trace output from the simulators; for SVEET, we used Web100 to collect the congestion window size at the virtual machines.
Fig. 6 shows the results. The results from ns-2 and PRIME match well, with only small differences that can be attributed to the differences between the two simulators in the calculation of transmission delays as packets traverse the routers. SVEET produced results similar to those from the simulators; the differences are typically more pronounced at the beginning of the data transfer, which results in a slight phase shift in the congestion window trajectory from then on. SVEET predicted larger congestion windows for TCP Vegas than those from the two simulators. We speculate that there is a discrepancy in the parameters used by this protocol between the simulators and the real implementation. Nevertheless, the results from these tests show conclusively that SVEET (without time dilation) can accurately represent the behavior of the TCP variants.

Fig. 5. A simple network scenario.
Fig. 6. Congestion window trajectories of Linux TCP variants.
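On SVEET the congestion window was collected with Web100; as a rough illustration of this kind of sender-side instrumentation, the sketch below samples the congestion window of a connected Linux TCP socket through the standard TCP_INFO socket option. This is an alternative mechanism mentioned here only for illustration, not the one used in the paper.

```cpp
#include <cstdio>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

// Print the current congestion window (in segments) of a connected TCP socket.
// 'fd' is assumed to be the sender-side socket of the measured flow.
void print_cwnd(int fd) {
  struct tcp_info info;
  socklen_t len = sizeof(info);
  if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &info, &len) == 0) {
    std::printf("cwnd=%u segments, ssthresh=%u, rtt=%u us\n",
                info.tcpi_snd_cwnd, info.tcpi_snd_ssthresh, info.tcpi_rtt);
  }
}
```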
B. Throughput versus Packet Loss
We use the second set of experiments to study the accuracy of the time dilation technique in SVEET for evaluating TCP performance. In these experiments, we reused the same network scenario as in the previous set of experiments, but applied random packet drops according to a specified probability. We varied the packet drop probability between 10⁻⁶ and 10⁻¹, and measured the aggregate throughput of downloading a large data file over TCP for 100 seconds. Fig. 7 shows the results from both PRIME and SVEET for three TCP variants, TCP Reno, CUBIC, and Scalable TCP, which were selected for their drastically different congestion control behaviors. In all cases, SVEET (with a TDF of 1) produced very similar results to those from simulation.
In a separate experiment, we increased the bandwidth of the bottleneck link from 10 Mb/s to 100 Mb/s and adjusted the NIC's buffer size from 66 KB to 291 KB (around 220 packets). In Fig. 8, we compare the results from PRIME, from SVEET using a TDF of 10 (i.e., with a slowdown factor of 10), and from SVEET with a TDF of 1 (i.e., real-time simulation with time dilation disabled). The results from the first two scenarios are almost indistinguishable from each other, while SVEET with a TDF of 1 achieved significantly lower throughput once the emulation traffic exceeded the maximum capacity supported by the underlying emulation infrastructure. The results also show that the throughput of TCP Reno never reached the physical capacity, even with no data loss. In contrast, the throughput of Scalable TCP remained very close to the full bandwidth in low data loss situations due to its
aggressive congestion control scheme, while CUBIC stayed between the other two TCP variants.

Fig. 7. Throughput achieved under random packet loss (10 Mb/s bottleneck link bandwidth).
Fig. 8. Throughput achieved under random packet loss (100 Mb/s bottleneck link bandwidth).
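A useful back-of-the-envelope check on these throughput-versus-loss curves is the well-known response function for Reno-style congestion control, throughput ≈ (MSS/RTT) · C/√p with C ≈ 1.22. The snippet below evaluates it across the drop probabilities used in the experiment; the MSS and round-trip time are assumed values chosen to roughly match the 100 Mb/s scenario, so the numbers are indicative only.

```cpp
#include <cmath>
#include <cstdio>

int main() {
  // Mathis-style approximation for Reno: B ~ (MSS/RTT) * C / sqrt(p).
  // MSS and RTT below are assumed values for illustration.
  const double mss_bits = 1448.0 * 8.0;  // bits per segment
  const double rtt = 0.128;              // seconds (64 ms bottleneck delay each way)
  const double C = 1.22;

  for (double p = 1e-6; p < 0.2; p *= 10.0) {
    double bps = (mss_bits / rtt) * C / std::sqrt(p);
    std::printf("p=%.0e  Reno bound ~ %.2f Mb/s\n", p, bps / 1e6);
  }
  return 0;
}
```

At very low loss rates the bound exceeds the link capacity, so the shortfall observed for Reno likely comes from elsewhere; one plausible explanation is that the 291 KB bottleneck buffer is smaller than the bandwidth-delay product of the 100 Mb/s path, so each window reduction leaves the pipe temporarily underutilized.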
C. Intra-Protocol Fairness
In the third set of experiments we examine fairness between homogeneous TCP flows (i.e., flows using the same TCP variant), to further establish the accuracy of SVEET under time dilation. We created a dumbbell topology (similar to the one used in a recent TCP performance study by Li et al. [6]) by adding another pair of end-hosts and attaching them separately to the two routers in our simple network model. We set the bandwidth of the bottleneck link to 100 Mb/s and the delay to 50 ms. At the start of each experiment, we selected one of the end-hosts on the left to send data over TCP to one of the end-hosts on the right across the bottleneck link. At 50 seconds, the other end-host on the left established a separate TCP flow to the other end-host on the right. In SVEET, all end-hosts are emulated, and we set the TDF of both the virtual machines and the simulator to 10. We measured the changes to the congestion window size over time at the senders of both flows.
Fig. 9 compares the results from SVEET and PRIME for TCP Reno, CUBIC, and Scalable TCP. In all cases, the emulation results match well with the corresponding simulation results. The slow convergence of Scalable TCP indicates that this protocol does not score well in intra-protocol fairness. This is mainly due to its aggressive congestion control mechanism, a multiplicative-increase and multiplicative-decrease (MIMD) algorithm. Such observations have been confirmed by earlier studies.
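The fairness comparison above is qualitative; a common way to quantify it, used in several of the TCP evaluation suites discussed in Section II, is Jain's fairness index, J = (Σ xᵢ)² / (n · Σ xᵢ²), computed over the per-flow throughputs. The helper below is a generic sketch, not part of SVEET.

```cpp
#include <vector>

// Jain's fairness index over per-flow throughputs: 1.0 means a perfectly
// equal share, 1/n means one flow takes everything.
double jain_index(const std::vector<double>& throughput) {
  double sum = 0.0, sum_sq = 0.0;
  for (double x : throughput) { sum += x; sum_sq += x * x; }
  if (throughput.empty() || sum_sq == 0.0) return 0.0;
  return (sum * sum) / (throughput.size() * sum_sq);
}
```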
Fig. 9. Congestion window sizes of two competing TCP flows.
V. CASE STUDIES
Background traffic has been shown to have a significant impact on the behavior of network applications and protocols. Floyd and Kohler have strongly advocated the use of better models for network research through careful examination of unrealistic assumptions in modeling and simulation studies [35]. Ha et al. conducted a systematic study of high-speed TCP protocols and demonstrated conclusively that the stability, fairness, and convergence speed of several TCP variants are clearly affected by the intensity and variability of background traffic [7]. Recently, Vishwanath and Vahdat investigated the impact of background traffic on distributed systems [36]. They concluded that even small differences in the burstiness of background traffic can lead to drastic changes in overall application behavior.
In this section, we describe the case studies that we performed to assess the global effect of background traffic generated by the TCP variants on real applications. We used a synthetic network topology called the campus network (shown in Fig. 10), which consists of 508 end-hosts and 30 routers. This network is a scaled-down version of the baseline network model that has been used for large-scale simulation studies. The network has four subnets; within net2 and net3, there are 12 local area networks (LANs), each configured with a gateway router and 42 end-hosts. The LANs are 10 Mb/s
networks. For links connecting routers within net1 and net2, we set the bandwidth to 100 Mb/s and the link delay to 10 ms. For the other links connecting routers, we set the bandwidth to 1 Gb/s and the link delay to 10 ms.

Fig. 10. A campus network model.

In this experiment, each end-host acts as an on-off traffic source: the node stays idle for a period of time exponentially distributed with a mean of 1 second, before it sends data over TCP to a randomly selected end-host in net1 for a duration drawn from a Pareto distribution with a mean of 1 second. We enabled time dilation and set the TDF to 10 for both the simulation and the virtual machines.
We chose web applications for this study. We placed an Apache web server on one of the emulated end-hosts in net2. We also selected another end-host in net1 as an emulated host and programmed an httperf client on that host to fetch objects of different sizes from the web server. We chose three TCP variants, TCP Reno, CUBIC, and Scalable TCP, and set the simulated background traffic and the foreground web traffic to use the same TCP variant in each experiment. We also varied the size of the objects retrieved from the web server: 10 KB, 100 KB, or 1 MB. For each of the nine TCP variant and object size combinations, we collected measurements from 30 independent trials. Fig. 11 shows the empirical cumulative distributions of the response time, which is the time between the client sending the HTTP request and finally receiving the entire object.

Fig. 11. HTTP download.
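The on-off background-traffic model described above (exponentially distributed idle periods and Pareto-distributed transfer durations, both with a mean of 1 second) can be sketched as follows. The Pareto shape parameter is an assumption made for illustration, since the text only specifies the mean.

```cpp
#include <cmath>
#include <cstdio>
#include <random>

int main() {
  std::mt19937 rng(42);
  std::exponential_distribution<double> idle(1.0);   // mean idle time = 1 s
  // A Pareto variate with shape a and scale xm has mean a*xm/(a-1); a = 1.5 is
  // assumed, with xm chosen so that the mean transfer duration is 1 s.
  const double a = 1.5, xm = (a - 1.0) / a;
  std::uniform_real_distribution<double> u(0.0, 1.0);

  for (int i = 0; i < 5; ++i) {
    double off = idle(rng);                               // idle (off) period
    double on  = xm / std::pow(1.0 - u(rng), 1.0 / a);    // Pareto via inverse CDF
    std::printf("host idles %.2f s, then sends over TCP for %.2f s\n", off, on);
  }
  return 0;
}
```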
From the figure, it is obvious that the different TCP variants produced drastically different results. TCP Reno achieved the best response time among the three TCP variants that we chose for this study. We speculate this may be partially due to its better intra-protocol fairness; in this case, the foreground traffic could retain a larger share of the link bandwidth for downloading the objects. Surprisingly, Scalable TCP performed better than CUBIC. The results suggest that we need to further investigate the details of the background traffic used in this experiment, as well as its impact on link utilization. Here we use this example only to show that SVEET now enables us to begin studying these global-scale behaviors of the TCP variants in complex large-scale network settings.
VI. CONCLUSIONS
SVEET is a scalable, flexible, and accurate testbed on which real TCP implementations can be evaluated in diverse network scenarios. We accomplish our goal through real-time simulation, emulation, and machine and time virtualization techniques. The testbed's accuracy was tested extensively by comparing the performance of the TCP variants, in terms of congestion window history, mean throughput, and response function, against what is projected by pure simulation. The results confirm that SVEET can capture the expected TCP behavior.
In the near future, we plan to test TCP performance using more realistic network topologies and traffic in much larger networks, and to involve other distributed applications, such as multimedia streaming. We will also investigate and implement adaptive TDF schemes as a way to minimize resource under-utilization. Additionally, we will investigate new techniques to reduce the overhead of the emulation infrastructure. For example, we are currently investigating more efficient inter-VM communication schemes to facilitate high-performance and scalable interaction between the simulator and potentially a large number of virtual machines running network-intensive applications. We would also like to improve the interoperability between the simulated TCP variants and those on the virtual machines so that real applications can directly communicate with simulated entities over TCP. In the future, we expect SVEET to be a highly flexible and highly efficient testbed with built-in capabilities to generate and run standardized TCP tests with minimum human intervention.
REFERENCES
[18] J. Liu, “A primer for real-time simulation of large-scale networks,” in Proceedings of the 41st Annual Simulation Symposium (ANSS), 2008, pp. 85–94. [19] D. Gupta, K. Yocum, M. McNett, A. Snoeren, A. Vahdat, and G. Voelker, “To infinity and beyond: time-warped network emulation,” in Proceedings of the 3rd USENIX Symposium on Networked Systems Design and Implementation (NSDI’06), 2006. [20] J. Liu, “The PRIME research,” http://www.cis.fiu.edu/prime/. [21] J. Liu, S. Mann, N. V. Vorst, and K. Hellman, “An open and scalable emulation infrastructure for large-scale real-time network simulations,” in Proceedings of IEEE INFOCOM MiniSymposium, 2007, pp. 2476– 2480. [22] D. X. Wei and P. Cao, “NS-2 TCP-Linux: An NS-2 TCP implementation with congestion control algorithms from Linux,” in Proceedings of the 2nd International Workshop on NS-2 (WNS2), 2006. [23] H. Shimonishi, M. Sanadidi, and T. Murase, “Assessing interactions among legacy and high-speed TCP protocols,” in Proceedings of International Workshop on Protocols for Fast Long-Distance Networks (PFLDnet), 2007, pp. 91–96. [24] L. Rizzo, “Dummynet: a simple approach to the evaulation of network protocols,” ACM SIGCOMM Computer Communication Review, vol. 27, no. 1, pp. 31–41, 1997. [25] S. Bhatia, M. Motiwala, W. Muhlbauer, V. Valancius, A. Bavier, N. Feamster, L. Peterson, and J. Rexford, “Hosting virtual networks on commodity hardware,” Georgia Tech, Tech. Rep. GT-CS-07-10, 2008. [26] “VMWare Workstation,” http://www.vmware.com/products/desktop/ workstation.html. [27] J. Dike, “A user-mode port of the Linux kernel,” in Proceedings of the 4th Annual Linux Showcase & Conference, 2000. [28] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield, “Xen and the art of virtualization,” in Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP’03), 2003. [29] “Linux VServer,” http://linux-vserver.org/. [30] “OpenVZ,” http://openvz.org/. [31] C. Caini, R. Firrincieli, R. Davoli, and D. Lacamera, “Virtual integrated TCP testbed (VITT),” in Proceedings of the 4th International Conference on Testbeds and Research Infrastructures for the Development of Networks & Communities (TRIDENTCOM’08), 2008, pp. 1–6. [32] C. Bergstrom, S. Varadarajan, and G. Back, “The distributed open network emulator: Using relativistic time for distributed scalable simulation,” in Proceedings of the 20th Workshop on Principles of Advanced and Distributed Simulation (PADS’06), 2006, pp. 19–28. [33] A. Grau, S. Maier, K. Herrmann, and K. Rothermel, “Time jails: A hybrid approach to scalable network emulation,” in Proceedings of the 22nd Workshop on Principles of Advanced and Distributed Simulation (PADS’08), 2008, pp. 7–14. [34] M. Mathis, J. Heffner, and R. Reddy, “Web100: Extended TCP instrumentation for research, education and diagnosis,” ACM Comput. Commun., vol. 33, no. 3, pp. 69–79, 2003. [35] S. Floyd and E. Kohler, “Internet research needs better models,” Computer Communication Review, vol. 33, no. 1, pp. 29–34, 2003. [36] K. V. Vishwanath and A. Vahdat, “Evaluating distributed systems: Does background traffic matter,” in Proceedings of the 2008 USENIX Technical Conference, 2008, pp. 227–240.
[1] D. Leith and R. Shorten, “H-TCP protocol for high-speed long distance networks,” in Proceedings of International Workshop on Protocols for Fast Long-Distance Networks (PFLDnet), 2004. [2] T. Kelly, “Scalable TCP: improving performance on highspeed wide area networks,” ACM SIGCOMM Computer Communication Review, 2003. [3] D. X. Wei, C. Jin, S. H. Low, and S. Hegde, “FAST TCP: motivation, architecture, algorithms, performance,” vol. 14, no. 6, 2006, pp. 1246– 1259. [4] L. Xu, K. Harfoush, and I. Rhee, “Binary increase congestion control (bic) for fast long-distance networks,” in Proceedings of IEEE INFOCOM, vol. 4, 2004, pp. 2514–2524. [5] I. Rhee and L. Xu, “CUBIC: A new TCP-friendly high-speed TCP variant,” in Proceedings of International Workshop on Protocols for Fast Long-Distance Networks (PFLDnet), 2005. [6] Y.-T. Li, D. Leith, and R. N. Shorten, “Experimental evaluation of TCP protocols for high-speed networks,” IEEE/ACM Transactions on Networking, vol. 15, no. 5, pp. 1109–1122, 2007. [7] S. Ha, L. Le, I. Rhee, and L. Xu, “Impact of background traffic on performance of high-speed TCP variant protocols,” Computer Networks, vol. 51, no. 7, pp. 1748–1762, 2007. [8] S. Ha, Y. Kim, L. Le, I. Rhee, and L. Xu, “A step toward realistic performance evaluation of high-speed TCP variants,” in Proceedings of International Workshop on Protocols for Fast Long-Distance Networks (PFLDnet), 2006. [9] L. Andrew, C. Marcondes, S. Floyd, L. Dunn, R. Guillier, W. Gang, L. Eggert, S. Ha, and I. Rhee, “Towards a common TCP evauation suite,” in Proceedings of International Workshop on Protocols for Fast Long-Distance Networks (PFLDnet), 2008. [10] D. X. Wei, P. Cao, and S. H. Low, “Time for a TCP benchmark suite?” CalTech, Tech. Rep., 2005. [11] L. Peterson, T. Anderson, D. Culler, and T. Roscoe, “A blueprint for introducing disruptive technology into the Internet,” in Proceedings of the 1st Workshop on Hot Topics in Networking (HotNets-I), 2002. [12] A. Bavier, N. Feamster, M. Huang, L. Peterson, and J. Rexford, “In VINI veritas: realistic and controlled network experimentation,” in SIGCOMM, 2006, pp. 3–14. [13] L. Breslau, D. Estrin, K. Fall, S. Floyd, J. Heidemann, A. Helmy, P. Huang, S. McCanne, K. Varadhan, Y. Xu, and H. Yu, “Advances in network simulation,” IEEE Computer, vol. 33, no. 5, pp. 59–67, 2000. [14] J. Cowie, D. Nicol, and A. Ogielski, “Modeling the global Internet,” Computing in Science and Engineering, vol. 1, no. 1, pp. 42–50, January 1999. [15] G. F. Riley, “The Georgia Tech network simulator,” in Proceedings of the ACM SIGCOMM Workshop on Models, Methods and Tools for Reproducible Network Research (MoMeTools’03), 2003, pp. 5–12. [16] B. White, J. Lepreau, L. Stoller, R. Ricci, S. Guruprasad, M. Newbold, M. Hibler, C. Barb, and A. Joglekar, “An integrated experimental environment for distributed systems and networks,” in Proceedings of the 5th Symposium on Operating Systems Design and Implementation (OSDI’02), 2002, pp. 255–270. [17] A. Vahdat, K. Yocum, K. Walsh, P. Mahadevan, D. Kostic, J. Chase, and D. Becker, “Scalability and accuracy in a large scale network emulator,” in Proceedings of the 5th Symposium on Operating Systems Design and Implementation (OSDI’02), 2002, pp. 271–284.