
Network Integrated Transparent TCP Accelerator

Jeongkeun Lee, Puneet Sharma, Jean Tourrilhes, Rick McGeer, Jack Brassil
Hewlett-Packard Laboratories, Palo Alto, US
Email: {jklee,puneet.sharma,jean.tourrilhes,rick.mcgeer,jack.brassil}@hp.com

Andy Bavier
Princeton University, Princeton, US
Email: [email protected]

Abstract—Network device vendors have recently opened up the processing capabilities of their hardware platforms to support third-party applications. In this paper, we explore the requirements and overheads associated with co-locating middlebox functionality on such computing resources within networking hardware. In particular, we use the example of a TCP acceleration proxy (CHART) that improves throughput over networks with delay and loss. The CHART system, developed by HP and its partners, provides enhanced TCP/IP performance and service quality guarantees by deploying performance-accelerating proxies, which enable legacy clients to benefit from high-performance network service. Use of the TCP proxy, however, requires manual configuration on the clients, changing HTTP proxy and/or routing table settings. Can we remove the need to configure end-hosts by inserting a transparent TCP proxy in the path, without losing performance? To address this question, we implement the accelerator on HP's x86-based processing blade, designed to integrate network applications within the switch architecture, as well as on low-end home routers running OpenWRT. We describe implementation details such as flow redirection for transparency and the new mechanisms required for easy insertion of proxies in the network path. We also evaluate its performance on HP's experimental testbed in terms of throughput and additional processing overhead.

Keywords—TCP acceleration, in-network processing, flow routing

I. INTRODUCTION

Over time the Internet has grown to support a plethora of applications over an increasingly heterogeneous underlying network with widely varying characteristics. The same one-size-fits-all paradigm that aided the rapid and widespread adoption of the Internet is now a hindrance to the deployment of newer protocols and technologies that match this growing heterogeneity. Such new functionality may be needed to address new environments and requirements, such as high-loss wireless links, long-latency satellite channels, or QoS support for real-time applications. The lack of co-located processing, together with protocol inflexibility, has led to a mushrooming of middleboxes, also referred to as proxies, as the primary approach to deploying any new network functionality.

An interesting recent trend has been to make the computing/processing co-located with network devices available to third-party software developers [1]–[3].

This work was supported in part by DARPA Contract N66001-05-9-8904 (Internet Control Plane).


Various in-network processing functions, such as caching, transcoding, and firewalling, can now be implemented directly on switching platforms. Co-locating processing with switching thus helps eliminate separate middleboxes and reduces overall complexity. Additionally, the co-located processing can be leveraged to enable and experiment with new features that are not available in current networking hardware. For instance, software-based solutions emulating an absent hardware feature, e.g., header rewriting, can be implemented on these computing resources.

As an example to explore the requirements and overheads of implementing middlebox functionality co-located on networking hardware, we chose to design and build a TCP acceleration service. We implemented the TCP acceleration service on two network device platforms with co-located processing capabilities: one for data-center deployment using HP's ONE (Open Network Ecosystem) services blade [1], which can be added to the switch chassis, and a second on a wireless OpenWRT platform [4] for low-cost home routers.

In this paper, we used an explicit-rate-signaling-based TCP accelerator called CHART [5] and implemented it on the ONE blade. The original CHART accelerator required either end-host OS modification or route configuration changes. Our implementation supports transparent TCP acceleration for unmodified multi-platform clients (e.g., Windows, Mac OS, etc.) without any change to their network and/or web-browser settings. The transparency support is provided using redirection and header rewriting on OpenFlow-enabled HP switches. Similarly, we integrated the CHART proxy into a home router and access point via OpenWRT. By providing the CHART TCP accelerator proxy as an OpenWRT package, we allow (home) users to benefit from the CHART service on their home routers, wireless access points, and any other embedded network gear supported by OpenWRT. Our solution is also transparent to the proxy itself, in the sense that we can reuse any non-transparent proxy already deployed without modifying the implementation and/or configuration of that proxy.

We evaluated the performance of our transparent TCP acceleration system on the HP testbed. The TCP accelerator improves throughput over standard end-to-end TCP by a factor of up to 30X.

The transparent proxy throughput is within 93∼100% of that of the non-transparent original CHART proxy system.

The remainder of the paper is organized as follows. Section II introduces the CHART TCP acceleration system developed by HP and its partners under a DARPA contract. The key components used to turn the original CHART system into a transparent proxy system are described in Section III, and the implementation details are given in Section IV. In Section V, we present the throughput performance of the transparent proxy system in various settings. Section VI discusses related work and Section VII contains our conclusions.

II. CHART SYSTEM

TCP implementations typically determine the transmission rate by inferring network congestion from measurements taken at the end-points, notably latency variations and loss rates. The TCP rate equation, shown in the Appendix, indicates that TCP's congestion control algorithm breaks down in a regime with either high RTT or high loss. This is well known, of course, and a variety of solutions have been proposed. Mostly, these involve tweaking the window size computation to adjust for the known properties of a link.

A second approach is explicit signaling. In this approach, the network elements keep track of how much bandwidth is requested and used along each output port, and allocate bandwidth among existing flows. The available bandwidth is signaled to each flow by annotating the IP packet headers. A conformant transport implementation then adjusts its transmission rate to conform to the signal. An early form of explicit signaling is Explicit Congestion Notification [6], which effectively behaves as a virtual dropped packet. This has the advantage of taming congestion without packet drops, and thus signals congestion earlier. However, it does not signal the rate at which the endpoint should transmit. More importantly, the absence of a set ECN bit does not signal an absence of congestion; therefore, in the absence of any ECN signals, the TCP transmitter must rely on the standard TCP congestion control mechanism, meaning that its rate is still governed by the classic rate equations.

In order to transmit at the rate that the link and the actual congestion level can support, a richer primitive is needed. This was recognized by the Telecommunications Industry Association, which standardized a protocol for explicit-rate transmission control [7]. Under the TIA 1039 protocol, an explicit rate is requested by an endpoint as an option in the IP header, and the network elements explicitly grant the rate by modifying the appropriate field in the packet header. The first such signaling packet is the SYN packet of a flow; subsequent signaling packets are inserted every 128 packets or every second, whichever comes first. The annotated fields are returned to the sender via the acknowledgment packets.

Figure 1. ProCurve ONE zl Services blade plugged in the switch at the bottom.

Unlike standard TCP and ECN, the sender then has a certificate from the network that it can transmit at the signaled rate, and it can do so. This removes the bandwidth caps of the TCP rate equations and permits full use of the available bandwidth.

Over the past four years, we have implemented a TCP driver and proxy based on the TIA-1039 protocol [5], [8]. In DARPA testing, the modified TCP driver and proxy were found to increase TCP throughput by a mean of 40x on high-latency, high-loss links, and were further found not to induce congestion in the network, nor to be unfair to legacy (non-1039) flows. The proxy was initially deployed on x86 servers running the Fedora Core 9 Linux operating system, and this was the configuration used in DARPA testing.

A. Need For Transparency Support

Use of the TCP proxy, however, still required manual configuration on the end-hosts, using a command similar to the commands used to configure HTTP (web) proxies. In this project, we set out to answer the following question:
• Could we remove the need to configure end-hosts by inserting a transparent TCP proxy in the path, without losing performance?

III. BUILDING BLOCKS

In this section, we describe the building blocks that compose the transparent TCP acceleration system integrated into the network elements.

A. Hardware Layer 2 Switch

One of our goals is to add the CHART TCP proxy transparently to a hardware layer 2 switch. For this work, we selected a set of HP ProCurve 5406zl switches. The 5406zl is a high-performance modular switch. The switch chassis can accommodate 6 or 12 linecards; each linecard has 24 1 Gb/s Ethernet ports or four 10 Gb/s ports.

The 5406zl switch has two features that were essential for this work. First, the 5406zl can accommodate a processing blade in one of its slots. Second, we have developed an implementation of the OpenFlow protocol for the 5406zl.

B. Switch-Integrated Processing Blade

The Open Network Ecosystem (ONE) [1] is a program run by HP ProCurve which enables network functionality to be integrated directly within selected HP ProCurve switches. Common edge functionality, such as monitoring, firewalls, NAT, and proxies, can be run in the switch instead of in a separate box. The core part of the ONE program is the ONE Services zl module, an x86-based server blade that fits into any linecard slot of an ONE-compatible HP switch. Fig. 1 shows an ONE Services blade plugged into one of the HP ProCurve 5406zl switches in our lab. The ONE Services blade provides two 10-GbE network links directly into the switch backplane. It has a dual-core Intel CPU, 4 GB of RAM, and a hard drive. In this paper, we implement a transparent TCP acceleration proxy as an in-network processing application running on the ONE Services blade.

C. OpenFlow Protocol

OpenFlow [9] is an open specification for network devices that enables running experimental protocols in production networks. The OpenFlow protocol provides a rich API that enables an application to control the flow of packets through a network device. We leveraged OpenFlow to steer selected flows transparently to the ONE blade, which runs the CHART proxy. While it is possible to use other protocols to perform packet steering, OpenFlow has the advantage of being open and easy to implement. An OpenFlow switch maintains a flow table; each entry defines a flow as a set of packets, matched on nine packet header fields plus an ingress port number.

OpenFlow was initiated by Stanford University and is managed by the OpenFlow community, a loose association of network researchers and vendors. HP Labs is an active part of the OpenFlow community. HP Labs has developed an implementation of OpenFlow in the firmware of selected HP ProCurve switches, including the 5406zl [10]. This OpenFlow firmware is a research prototype, not a product; it has been provided to Stanford and some other universities and installed on the ProCurve switches in their campus networks. The OpenFlow firmware can implement common OpenFlow rules in hardware (at line rate); more complex rules are implemented in software (which is slow).

D. OpenWRT Software Router

In addition to the powerful layer-2 switch designed for enterprise and campus networks, we also implemented the transparent TCP accelerator on an OpenWRT software router as a ‘personal’ proxy, so that ordinary home network users can use the high-performance TCP acceleration service.

Figure 2. Soekris net5501 board running OpenWRT firmware.

OpenWRT [4] is a Linux-based open-source firmware for home routers and wireless access points. OpenWRT supports highly modular firmware components, so users can trim the firmware image down to a size small enough to run on low-end embedded devices. OpenWRT supports more than 300 different hardware models with different capabilities. For this work, we chose the Soekris net5501 [11] as the hardware platform on which to run OpenWRT. It is an x86-based embedded computer equipped with a 500 MHz CPU and 512 MB of memory, shown in Fig. 2.

E. Testbed

Our testbed was designed to explore different ways to implement transparent CHART proxy operation. The testbed has two main subnets, the client subnet and the server subnet. The two subnets are connected by a Linux router, which mimics the characteristics of wide-area networks (WANs) by inserting delays and dropping packets using the netem network emulator module (an example appears below). The server subnet contains the server serving the actual data. It is connected to the Linux router via a 5406zl switch. The 5406zl contains a ONE zl blade, which runs the CHART server proxy (S-proxy).

The client subnet has two different configurations. The first configuration uses a hardware switch and is similar to the server subnet: it contains the client, connected to the router via a 5406zl switch. The 5406zl contains a ONE Services blade, which runs the CHART client proxy (C-proxy). The second configuration of the client subnet uses the OpenWRT software router instead of the 5406zl switch. The subnet is actually split into two subnets by the OpenWRT router. The CHART client proxy (C-proxy) is implemented in the OpenWRT router.
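As an illustration of how such WAN emulation is typically done with netem, the following commands add delay and loss on the router's two interfaces; the interface names and the specific delay and loss values are placeholders, not the settings used in the paper.

    # Emulate WAN impairment on the Linux router (illustrative values).
    tc qdisc add dev eth0 root netem delay 50ms loss 1%    # towards the server subnet
    tc qdisc add dev eth1 root netem delay 50ms loss 1%    # towards the client subnet
    # Remove the emulation when done:
    tc qdisc del dev eth0 root
    tc qdisc del dev eth1 root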

Figure 3. Transparent TCP acceleration proxy system: client proxy built into the OpenWRT software router. The client's default gateway is the OpenWRT router, where Linux iptables steers traffic from the networking stack to the TCP accelerator client proxy daemon; the server proxy runs on the ONE Services module of an HP ProCurve switch on the server side. The client-to-server TCP connection is split into Client ↔ client proxy, client proxy ↔ server proxy (across the WAN with delay and loss), and server proxy ↔ Server, while other flows follow the normal forwarding path.

IV. IMPLEMENTATION OF TRANSPARENT TCP ACCELERATOR

The CHART TCP acceleration system splits a single TCP connection between client and server into three connections (i.e., client to client proxy, client proxy to server proxy, and server proxy to server). The client proxy (C-proxy) intercepts the first SYN packet from the client and replies with a SYN-ACK to the client on behalf of the server. The C-proxy initiates another TCP connection to the server proxy (S-proxy), which makes the final TCP connection to the server. The proxy provides accelerated TCP service on the middle connection via modifications to the TCP congestion control mechanism and explicit signaling of available bandwidth information. Because the S-proxy initiates the connection to the server, no configuration change is needed at the server.

In the original CHART implementation, the C-proxy is not transparent to the client, as the client has to explicitly direct its packets to the C-proxy by changing its routing table or web-proxy configuration. Our aim is to make the C-proxy transparent, so that no special configuration is needed on the client. We also want to be transparent to the C-proxy itself, so that it does not need any modification. We present two approaches to providing transparency, one applicable to software routers and the other to hardware switches.

A. Router Integrated Proxy

Implementation on a software router is easier, because a generic OS and CPU are involved in processing every packet. Our goal here is a harmonious integration of the CHART TCP proxy into the router operating system, with no additional custom configuration. We selected OpenWRT as representative of a common software router. Fig. 3 shows the system architecture, with the client proxy running inside the OpenWRT router. We ported the CHART proxy software to the OpenWRT platform as a set of OpenWRT packages. The total size of the set of CHART packages is 43 Kbytes, which is small enough to fit into low-end commodity home routers.

Because the OpenWRT box works as an L3 router for the client in this setting, all client traffic goes up to layer 3 in the OpenWRT router's networking stack. The original CHART proxy configuration uses the Linux iptables packet filter to hijack the client-to-server TCP packets of interest from the internal forwarding path and redirect them to a C-proxy daemon process; the same configuration and iptables filter rules can be used in the OpenWRT router without modification (a sketch of such a rule follows below). This is possible because the C-proxy sits directly in the packet forwarding path. This is also similar to the way current network appliances are deployed in an inline two-port configuration: traffic from one side of the appliance is received and processed, and then forwarded out the other side towards the destination. In our implementation, the OpenWRT box acts not only as a router but also as a proxy appliance, which is likewise transparent to the client.
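The paper does not list the exact filter rules, but an iptables redirection of the kind described above can be sketched as follows; the LAN interface name, the matched destination port, and the C-proxy listening port are hypothetical placeholders.

    # Steal client-to-server TCP traffic entering on the LAN interface and hand it
    # to the local C-proxy daemon (assumed here to listen on port 3128).
    iptables -t nat -A PREROUTING -i br-lan -p tcp --dport 80 -j REDIRECT --to-ports 3128
    # Traffic that does not match the rule follows the normal forwarding path.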

B. Transparent Flow Redirection for Switch-Integrated Proxy

Implementation in a hardware switch is more complex. Packets are only processed by the ASICs in the switch, and it is not practical for us to implement CHART in those ASICs. Therefore, packets need to be diverted to a server where CHART can run. This packet diversion presents another difficulty: a normal server will only accept packets destined to it and will drop packets destined elsewhere, such as the diverted packets. It is possible to modify the server to accept those packets, for example by turning it into an L2 switch; however, we did not want to make any change to the CHART proxy or the server configuration, so that the CHART server could still be used normally in non-transparent mode. Using OpenFlow on the hardware switch enables us to run the C-proxy transparently to the client and to the C-proxy itself.

Figure 4. Transparent TCP acceleration proxy system: client proxy integrated in the hardware switch. The flow entries used in the testbed are:
  hardware switch: in_port=port1,src_IP=Client,dst_IP=Server,protocol=TCP,action=output:port2
  hardware switch: in_port=port2,src_IP=Client,dst_IP=Server,protocol=TCP,action=output:port3
  software OpenFlow switch (ONE Services module): src_IP=Client,dst_IP=Server,protocol=TCP,action=mod_dst_mac:C-proxy-NIC;output:IN-PORT

With OpenFlow, we need to perform three operations: 1) divert the packets to the C-proxy server, 2) modify the packets so that they are properly processed by the C-proxy server, and 3) re-insert packets coming out of the C-proxy into the normal flow of data. OpenFlow classifies packets based on a 10-tuple, which allows us to precisely select the TCP flows that require processing by the CHART proxy. The basic action of OpenFlow is to direct a flow to a specific port, enabling us to do 1) and 3). OpenFlow can also rewrite specific header fields; in our case we only need to rewrite the destination MAC address to enable 2). In our particular example of the TCP proxy application, the C-proxy initiates another TCP connection to the S-proxy, so 3) is not needed. An exemplary OpenFlow rule enabling 1) and 2) looks like this:

  in_port=client-port, src_IP=Client, dst_IP=Server, protocol=TCP, actions=mod_dst_mac:C-proxy-NIC; output:port-to-C-proxy

The switch we are using was not designed for OpenFlow, and its OpenFlow implementation has some limitations. One limitation is that it cannot rewrite the destination MAC address. At first glance, this would prevent us from implementing the desired packet diversion. We could obviously relax our constraints on transparency and modify the CHART implementation. We were more interested in seeing how we could use OpenFlow itself to work around the limitation of our OpenFlow implementation.

To implement MAC address rewriting, we use the limited OpenFlow implementation on the hardware switch to redirect packets to an OpenFlow software switch that has the full OpenFlow capability. The software OpenFlow switch rewrites the packet's destination MAC address and then re-injects the packet onto the network.
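As an illustration of this workaround, the MAC-rewriting rule could be installed on a full-featured software OpenFlow switch such as Open vSwitch with a command like the one below; the bridge name, IP addresses, and MAC address are hypothetical, and the software switch used in the paper is not necessarily Open vSwitch.

    # On the software OpenFlow switch: match the client-to-server TCP flow,
    # rewrite the destination MAC to the C-proxy NIC, and send the packet
    # back out the port it arrived on (the hardware switch then delivers it).
    ovs-ofctl add-flow br0 "dl_type=0x0800,nw_proto=6,nw_src=10.0.1.10,nw_dst=10.0.2.20,actions=mod_dl_dst:00:11:22:33:44:55,output:IN_PORT"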

Those packets are then forwarded to the C-proxy server. As many hardware implementations of OpenFlow are limited and do not support the full specification, this technique can be used generically to provide the full set of OpenFlow actions on any hardware OpenFlow switch (as long as it supports OpenFlow forwarding).

One last thing to note is that the OpenFlow software switch and the C-proxy server can be co-located on the same hardware. In our experiment, we run the OpenFlow software switch and the C-proxy on the same ONE blade, as shown in Fig. 4. We use two separate physical NICs, one for OpenFlow and the other for CHART. We could also use a single NIC, with different VLANs and virtual interfaces separating the two traffic streams. Fig. 4 shows the three OpenFlow rules (flow entries) used in our testbed setting: two forwarding rules on the hardware switch and one MAC rewriting rule on the software OpenFlow switch.

V. PERFORMANCE EVALUATION

The throughput and fairness performance of the CHART system have been presented in previous papers [5], [8]; in this paper, we therefore focus on the comparative performance of the transparent TCP proxy system against the non-transparent original CHART system. Here, the non-transparent (non-TR) proxy means that the client must manually set its routing table or web-browser proxy configuration so that packets sent to the server traverse the client proxy (C-proxy). In our case, we set the client's default gateway address to the IP address of the ONE blade network interface on which the C-proxy operates (a one-line example follows below).

When the transparent proxy is built into the OpenWRT software router, the ‘transparency’ does not change the packet forwarding path, and its performance should be the same as, or slightly better than, the non-TR proxy's.²
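For the non-TR baseline described above, pointing the client at the C-proxy amounts to a one-line routing change on the client; the address below is a hypothetical placeholder for the ONE blade interface on which the C-proxy listens.

    # Non-transparent baseline only: route all client traffic via the C-proxy interface.
    ip route replace default via 192.168.1.50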

Figure 5. Standard end-to-end TCP throughput with various RTT and loss rates (10 ms, 20 ms, and 100 ms RTT; link loss rate from 0.01% to 10%).

Figure 6. Transparent TCP accelerator performance with various RTT and loss rates (10 ms, 20 ms, and 100 ms RTT; link loss rate from 0.01% to 10%).

Thus, we compare the performance of the transparent system and the non-TR system on the hardware switch platform, where the MAC address rewriting incurs additional delay.

In the testbed topology shown in Fig. 4, the end-to-end throughput is capped by the 100 Mb/s link between the laptop client and the switch it is attached to. The Iperf [12] network testing tool is used to measure the throughput of client-to-server TCP traffic. The measurements are taken at the server, and each TCP session lasts 60 seconds. Loss and delay are inserted at the middle router to test various WAN conditions. To emulate an RTT (round-trip time) of x ms, we inserted x/2 ms of delay in each direction at the middle router.

We first measure the throughput of a standard end-to-end TCP connection, in which the client and the server communicate directly using the standard Linux TCP/IP stack without any proxying in the middle.³ The problem of standard TCP under loss and delay is well illustrated in Fig. 5: standard TCP throughput degrades drastically with increasing delay and loss rate. Even with a small RTT, the throughput drops to around 1 Mb/s when the loss rate is high. In contrast, our transparent TCP acceleration system shows far more graceful throughput degradation in Fig. 6. When the delay is small, the throughput stays close to the maximum even at a substantial loss rate of 1%. Fig. 7 compares the throughput of standard TCP, the non-TR proxy, and the transparent proxy when the RTT is 20 ms.
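The measurement setup described above can be sketched as follows; the host address and interface names are hypothetical, and only the tool (Iperf), the 60-second duration, and the x/2-per-direction delay split are taken from the paper.

    # Middle router: half of the target RTT in each direction (20 ms RTT shown).
    tc qdisc add dev eth0 root netem delay 10ms
    tc qdisc add dev eth1 root netem delay 10ms
    # Server side:
    iperf -s
    # Client side: one 60-second TCP session, throughput reported at the server.
    iperf -c 192.0.2.10 -t 60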

² The performance of the router-integrated proxy system can be better if the flow redirection and proxying time in the software router is shorter than the time for the flow to traverse a non-TR proxy system residing on a separate machine.
³ The CUBIC TCP protocol [13] is employed as the default congestion control protocol in the Linux kernel.

Figure 7. Comparison between standard TCP, the conventional non-transparent TCP proxy, and the transparent TCP proxy, when the RTT is 20 ms (link loss rate from 0.01% to 10%).

It demonstrates that the transparent proxy performs very close to the non-TR proxy. The (small) difference between the transparent and non-TR proxies is caused by the additional MAC address rewrite time. We first measured the switching delay of the software OpenFlow switch, without destination MAC address rewriting, using ping tests. The switching delay depends on frame size. With a 1500-byte frame, which is the default MTU size that Iperf also uses, the software OpenFlow switching delay was 130 µs. With the default ping packet size of 56 bytes, the switching delay was less than 10 µs, which is fairly short. Note that the ONE Services blade runs a fast CPU and is directly connected to the switch backplane via 10 Gb Ethernet. Another report in the literature also shows low Linux bridge latency [14].
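A ping-based delay measurement of this kind can be reproduced along the following lines; the target address is a placeholder, and the 1472-byte payload is chosen so that the resulting IP packet is 1500 bytes once the 8-byte ICMP and 20-byte IP headers are added.

    # Default 56-byte payload (small frames):
    ping -c 100 10.0.2.20
    # 1500-byte frames: 1472-byte payload + 8-byte ICMP header + 20-byte IP header.
    ping -c 100 -s 1472 10.0.2.20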

Figure 8. Performance ratios of the conventional proxy and the transparent proxy over standard TCP, for 10 ms, 20 ms, and 100 ms RTT (link loss rate from 0.01% to 10%).

When we compared the pure switching delay to the switching-plus-MAC-rewriting time at the software switch, we could not see a noticeable difference between them. In conclusion, the total time for the MAC address rewrite in the software OpenFlow switch is about 130 µs, and it causes only a small dip in throughput. Obviously, if MAC rewriting were available in hardware, this additional delay would disappear entirely.

In Fig. 8, we compare the performance ratios of the transparent proxy (against standard TCP) and the non-TR proxy (against standard TCP) under various delay and loss settings. In general, the performance ratios of both proxy systems increase as the link loss rate and RTT increase. Again, the transparent proxy performs very close to the non-TR proxy in all regimes. The throughput of the transparent proxy is mostly within 97%∼100% of the throughput of the non-TR proxy. The transparent proxy throughput falls to 93% of the non-TR proxy's at the highest loss (10%) and delay (100 ms RTT) setting, but the absolute difference is very small: 7.69 Mb/s versus 7.12 Mb/s.

VI. RELATED WORK

Web Cache Communication Protocol (WCCP) [15] is a proprietary Cisco protocol implemented on Cisco routers and some web-cache/proxy/security appliances for transparent redirection of web traffic. It also employs MAC address rewriting for L2 packet forwarding between Cisco routers and proxy/cache appliances; for L3 forwarding, GRE tunneling is used. In this paper, we showed that transparent packet redirection, previously supported only by the expensive, proprietary, impenetrable WCCP protocol, can be implemented with just three OpenFlow rules (two forwarding rules on the hardware switch and one MAC rewriting rule on the ONE blade). Furthermore, in our model, no change is required on the client proxy appliance, whereas WCCP requires vendors to implement the WCCP protocol in their cache/proxy appliances.

The capability of adding processing modules into the path of packet flows is gaining attention from industry and academia. There are two main directions in the efforts to realize in-network processing. First, networking vendors such as Arista Networks [2] and Juniper [3] (selectively) open their hardened Linux- or Unix-based switch operating systems so that third-party applications can run on their switches or on switch-integrated hardware modules, while performance, fault, and security isolation are guaranteed by the operating system. However, this approach limits the portability of the in-network processing applications: the third-party solutions must be (re-)implemented using the vendor's programming SDK and are closely coupled with the networking vendor's operating system platform.

Second, there are efforts to use external x86 machines and/or NetFPGA hardware boards as processing modules, using OpenFlow or a similar flow management capability implemented on hardware switches to redirect flows from various network entities to multiple processing modules. Our approach falls into this category. Such approaches require coordination between the various switches on the flow path, and possibly also between the switches and the processing modules. OpenPipes [16] and FlowStream [17] are examples of the OpenFlow-based approach, using virtual machine and/or Click modular software technologies for the processing hosts. IBM and Blade Network Technology released the Open Service Framework (OSF) [18], which is based on their own implementation of flow control functionality on Broadcom-based rack-mount switches. OSF software and a Virtual Ethernet Bridge (VEB) enable IBM BladeCenter servers to work as in-network processing modules.

Our approach also uses OpenFlow. The main advantage of using OpenFlow is that many different OpenFlow switches are now available, for example from NEC, Juniper, HP, Quanta, and potentially other Broadcom-based switches. Our approach differs from the other OpenFlow-based solutions [16], [17] and OSF [18] in that our solution is also transparent to the proxy itself: it can reuse any non-transparent proxy already deployed, and does not force one to re-implement or reinstall the proxy into a specific framework. In addition, we provide implementation details of the transparent TCP acceleration system with a real proxy and evaluate its performance relative to the non-transparent implementation, whereas the previous papers briefly describe case studies without experimental results.

VII. CONCLUSION

The capability of adding in-network processing modules into the packet path is an important feature for providing an integrated, cost-effective, customizable, and manageable network infrastructure that meets the continually evolving requirements of IT/service providers in very different networking environments.

Using the particular example of a TCP acceleration proxy as an in-network processing appliance, we showed in this paper how to co-locate the processing module with network devices in a transparent way, so that no configuration changes are needed on client end-hosts or on the in-network processing appliance itself.

The specific use of the proxy application has significant merit in its own right. Proxies have long been known to play a fundamental role in the Internet architecture; in particular, they bridge discontinuities between Internet domains. This was the fundamental insight of [19]. However, as currently constructed, the network architecture requires that end-hosts specifically address proxies to use their services, effectively requiring end-host knowledge of discontinuities in the network. Seamless operation (and the end-to-end principle) demands that this requirement be removed. The issue is to demonstrate that this can be done without significant performance penalties. The testbed measurements showed that our transparent proxy implementation performs very close to a conventional non-transparent proxy, and that the transparency support adds only marginal processing delay on the end-to-end flow path. Our ongoing work is devising an efficient in-network processing framework that orchestrates multiple in-network processing modules across the network.

APPENDIX

The classic TCP rate equation from [20], [21] is

\[ \text{rate} \le \frac{1.3\,m}{r\sqrt{l}} \qquad (1) \]

where m is the maximum transmission unit, r is the round-trip time, and l is the loss rate. Note that the actual available bandwidth appears nowhere in this expression; thus, for sufficiently high available bandwidth, the maximum transmission rate is independent of the available bandwidth. Put another way, given a loss rate and latency, increasing bandwidth beyond the limit given by (1) offers no increase in transmission performance. Equation (1) is clearly an approximation, and not entirely valid. Mahdavi and Floyd [22] note that it overestimates the transmission rate for loss rates above 5%. It further assumes that loss is only due to congestion and, thus, that \( l \approx \frac{8}{3W^2} \), where W is the mean window size. Nonetheless, its rough guidance is valid. We have found that with 100 ms RTT and no loss, Linux 2.6 gives a maximum TCP performance, measured with Iperf, of 3.5 Mb/s with an MTU of 1500 bytes.
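To make the cap in (1) concrete, here is an illustrative evaluation (the numbers are ours, not the paper's) for a 1500-byte MTU:

\[ \frac{1.3\,m}{r\sqrt{l}} = \frac{1.3 \times 1500 \times 8\ \text{bits}}{0.1\ \text{s} \times \sqrt{0.01}} = 1.56\ \text{Mb/s} \quad (r = 100\ \text{ms},\ l = 1\%), \]
\[ \frac{1.3\,m}{r\sqrt{l}} = \frac{1.3 \times 1500 \times 8\ \text{bits}}{0.02\ \text{s} \times \sqrt{0.001}} \approx 24.7\ \text{Mb/s} \quad (r = 20\ \text{ms},\ l = 0.1\%). \]

These caps are consistent with the steep degradation of standard TCP seen in Fig. 5, and they are exactly the limits that CHART's explicit-rate signaling removes.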

REFERENCES

[1] HP ProCurve Open Network Ecosystem (ONE), http://www.procurve.com/one/index.htm.

[2] EOS, Extensible Modular Operating System by Arista Networks, http://www.aristanetworks.com/en/EOS.
[3] J. Kelly, W. Araujo, and K. Banerjee, "Rapid service creation using the JUNOS SDK," in ACM PRESTO, 2009.
[4] OpenWRT: Linux distribution for embedded devices, http://openwrt.org.
[5] J. Brassil et al., "The CHART System: A High-Performance, Fair Transport Architecture Based on Explicit-Rate Signaling," ACM SIGOPS Review, February 2009.
[6] K. Ramakrishnan, S. Floyd, and D. Black, "The Addition of Explicit Congestion Notification (ECN) to IP," RFC 3168 (Proposed Standard), Internet Engineering Task Force, Sept. 2001. [Online]. Available: http://www.ietf.org/rfc/rfc3168.txt
[7] "QoS Signaling for IP QoS Support," Telecommunications Industry Association (TIA) Standard 1039, May 2006.
[8] A. Bavier et al., "Increasing TCP Throughput with an Enhanced Internet Control Plane," in Proceedings of MILCOM, 2006.
[9] "OpenFlow: Enabling Innovation in Campus Networks," http://openflow.org.
[10] J. C. Mogul, P. Yalagandula, J. Tourrilhes, R. McGeer, S. Banerjee, T. Connors, and P. Sharma, "API Design Challenges for Open Router Platforms on Proprietary Hardware," in ACM HotNets-VII, Calgary, Alberta, Canada, October 2008.
[11] Soekris, Inc., http://www.soekris.com/.
[12] A. Tirumala, F. Qin, J. Dugan, J. Ferguson, and K. Gibbs, "Iperf: The TCP/UDP Bandwidth Measurement Tool, version 2.02."
[13] S. Ha, I. Rhee, and L. Xu, "CUBIC: A New TCP-Friendly High-Speed TCP Variant," SIGOPS Oper. Syst. Rev., vol. 42, no. 5, 2008.
[14] J. T. Yu, "Performance Evaluation of Linux Bridge," in Telecommunications System Management Conference, Louisville, Kentucky, 2004.
[15] Web Cache Communication Protocol V2.0, IETF Internet-Draft, draft-wilson-wrec-wccp-v2-01.txt, Apr. 2001.
[16] G. Gibb, D. Underhill, N. McKeown, A. Covington, and T. Yabe, "OpenPipes: Prototyping High-Speed Networking Systems," in ACM SIGCOMM, Demo session, 2009.
[17] A. Greenhalgh, F. Huici, M. Hoerdt, P. Papadimitriou, M. Handley, and L. Mathy, "Flow Processing and the Rise of Commodity Network Hardware," ACM SIGCOMM Comput. Commun. Rev., vol. 39, no. 2, 2009.
[18] A. Gorti and V. Pandey, "An Integrated High-Speed Load Balancing and Flow Steering Framework," in First Workshop on Data Center - Converged and Virtual Ethernet Switching (DC CAVES), Issy-les-Moulineaux, France, September 2009.
[19] B. Knutsson and L. Peterson, "Transparent Proxy Signalling," Journal of Communications and Networks, Korean Institute of Communication Sciences (KICS), vol. 3, no. 2, 2001.
[20] T. Ott, J. Kemperman, and M. Mathis, "Window Size Behavior in TCP/IP with Constant Loss Probability," in DIMACS Workshop on Performance of Realtime Applications on the Internet, Plainfield, NJ, November 1996.
[21] S. Floyd, "Connections with Multiple Congested Gateways in Packet-Switched Networks Part 1: One-way Traffic," Computer Communications Review, vol. 21, no. 5, pp. 30-47, October 1991.
[22] J. Mahdavi and S. Floyd, "TCP-Friendly Unicast Rate-Based Flow Control," Technical note sent to the end2end-interest mailing list, 1997.
