
Design and Implementation Experiments of Scalable Socket Buffer Tuning

Tatsuhiko Terai, Takuya Okamoto, Go Hasegawa and Masayuki Murata
Department of Informatics and Mathematical Science, Graduate School of Engineering Science, Osaka University
1-3, Machikaneyama, Toyonaka, Osaka 560-8531, Japan
Phone: +81-6-6850-6616, Fax: +81-6-6850-6589
E-Mail: {terai, tak-okmt, hasegawa, murata}@ics.es.osaka-u.ac.jp

Abstract

Although many research efforts have been devoted to relieving network congestion caused by the growth of Internet traffic, comparatively little attention has been paid to improving the performance of endhosts. In this paper, we propose a new architecture, called Scalable Socket Buffer Tuning (SSBT), to provide high performance and fair service for many TCP connections at Internet endhosts. SSBT has two major features. The first is Equation-based Automatic TCP Buffer Tuning (E-ATBT), in which the sender host estimates the 'expected' throughput of its TCP connections through a simple mathematical equation and assigns send socket buffers to them according to the estimated throughput. The second, called the Simple Memory-copy Reduction (SMR) scheme, reduces the number of memory accesses at the sender host through new socket system calls. We confirm the effectiveness of our proposed algorithm through implementation experiments, finding that SSBT can achieve up to a 30% gain in Web server throughput while using the send socket buffer fairly and effectively.

1 Introduction

With the rapid growth in the number of Internet users, many research efforts have been made to avoid and resolve network congestion caused by increasing network traffic. However, comparatively little attention has been paid to improving the performance of Internet endhosts, in spite of the projection that the bottleneck is shifting from the network to the endhosts. For example, busy WWW (World Wide Web) servers in the current Internet receive hundreds of document transfer requests every second at peak times. Of course, improving protocol processing at endhosts is not a new subject. An early example can be found in [1], where the authors propose the 'fbuf' (fast buffer) architecture, which shares memory space between the system kernel and the user process to avoid redundant memory copies during data exchanges. It is based on the observation that memory copy is a main cause of the bottleneck at endhosts in TCP data transfer. Other approaches can be found in, e.g., [2]. However, past research, including the above approaches, does not consider 'fair' treatment of connections, by which connections receive fair service from the server. By achieving fair service among connections, we can also expect performance improvements, for the following reason. Suppose that a server host is sending TCP data to two clients, one on a 64 Kbps dial-up link (client A) and one on a 100 Mbps LAN (client B). If the server host assigns send socket buffers of equal size to both clients, the assigned buffer is likely to be too large for client A and too small for client B, because of the difference in the capacities (more strictly, the bandwidth-delay products) of the two connections. For an effective buffer allocation to both clients, a compromise on buffer usage must be found. Another important example requiring 'fair' buffer treatment is a busy Internet WWW server that simultaneously accepts a large number of TCP connections with different bandwidths and round trip times (RTTs). In [3], the authors proposed a buffer tuning algorithm called Automatic TCP Buffer Tuning (referred to as ATBT in this paper), which dynamically adjusts the send socket buffer size according to changes in the TCP sending window size of the connection. However, it cannot provide 'fairness' among connections because the throughput of a TCP connection is not proportional to its sending window size [4].

In this paper, we propose a novel architecture, called Scalable Socket Buffer Tuning (SSBT), to provide high performance and fair service for many TCP connections at the sender host. For this purpose, we develop two mechanisms in SSBT: the E-ATBT (Equation-based Automatic TCP Buffer Tuning) and SMR (Simple Memory-copy Reduction) schemes. Our main contribution in this paper is E-ATBT, which provides a fair assignment of the socket buffer. In E-ATBT, we estimate the 'expected' throughput of TCP connections by an analytic approach. It is characterized by the packet loss rate, RTT, and RTO (Retransmission Time Out) values of the connections, which can easily be monitored at the sender host. The send socket buffer is then assigned to each connection based on its expected throughput, with consideration of max-min fairness among connections. The SMR scheme provides a set of socket system calls that reduce the number of memory copy operations at the sender host in TCP data transfer. It is a replacement for existing so-called 'zero-copy' mechanisms; however, it is much simpler and can still achieve the same effect in reducing the overhead at the sender host. We validate the effectiveness of our proposed mechanisms through implementation experiments, by which we confirm the effectiveness of SSBT in terms of overall server performance, the number of accommodated connections, and fairness among connections.

This paper is organized as follows. In Section 2, we describe our SSBT algorithm. In Section 3, we evaluate its effectiveness through implementation experiments. Finally, we present concluding remarks in Section 4.

2 Scalable Socket Buffer Tuning (SSBT)

Our proposed method, Scalable Socket Buffer Tuning (SSBT), includes two mechanisms: Equation-based Automatic TCP Buffer Tuning (E-ATBT) and the Simple Memory-copy Reduction (SMR) scheme. We explain them in the following subsections.

2.1 Equation-based Automatic TCP Buffer Tuning (E-ATBT)

Equation-based Automatic TCP Buffer Tuning (E-ATBT) provides fair and effective send socket buffer assignment. In E-ATBT, the sender host estimates the 'expected' throughput of each TCP connection by monitoring three parameters: the packet loss probability, RTT, and RTO values. It then determines the required buffer size of the connection from the estimated throughput, not from the current TCP window size. The estimation method for TCP throughput is based on the analysis in [5], where the authors derived the average throughput of a TCP connection for a model in which multiple connections with different input link bandwidths share a bottleneck router employing the RED algorithm [6]. The following parameters are necessary to derive the throughput:

• p: packet dropping probability of the RED algorithm
• rtt: the RTT value of the TCP connection
• rto: the RTO value of the TCP connection

In the analysis, the average window size of the TCP connection is first calculated from the above three parameters. The average throughput is then obtained by accounting for the performance degradation caused by TCP's retransmission timeouts. The analysis in [5] is easily applied to our case by viewing the packet dropping probability of the RED algorithm as the observed packet loss rate. The parameter set (p, rtt, rto) is obtained at the sender host as follows. The rtt and rto values can be obtained directly from the sender TCP. The packet loss rate p can be estimated from the number of successfully transmitted packets and the number of lost packets detected at the sender host via acknowledgement packets.

A possible cause of estimation error in p is the stochastic nature of packet losses, since the analysis in [5] assumes random packet loss. Validation is therefore needed for the case where packet losses occur at a drop-tail router, where packets tend to be dropped in bursts. Note that the appropriateness of the estimation method used in the E-ATBT algorithm has been validated in [5]. Alternatively, we can apply another estimate of TCP throughput, given in [4], which uses an additional parameter Wmax, the buffer size of the receiver:

$$\lambda = \min\left(\frac{W_{\max}}{RTT},\ \frac{1}{RTT\sqrt{\frac{2bp}{3}} + T_0\,\min\!\left(1,\,3\sqrt{\frac{3bp}{8}}\right)p\,(1+32p^2)}\right)$$

where b is set to 2 if the receiver uses TCP's delayed ACK option, and 1 otherwise. We denote the estimated throughput of connection i by ρi. From ρi, we determine Bi, the required buffer size of connection i, as

$$B_i = \rho_i \times rtt_i$$

where rtti is the RTT of connection i. This mechanism is expected to provide a stable assignment of the send socket buffer to TCP connections as long as the parameter set (p, rtt, rto) used in the estimation is stable. In ATBT, on the other hand, the assignment is inherently unstable even when the three parameters are stable, since the window size oscillates significantly regardless of their stability. As in ATBT, our E-ATBT adopts a max-min fairness policy for re-assigning the excess buffer. Unlike ATBT, however, E-ATBT employs a proportional re-assignment policy: the excess buffer is re-assigned to connections that need more buffer in proportion to their required buffer sizes calculated from the analysis. Note that ATBT re-assigns the excess buffer equally, since it has no means of knowing the expected throughput of the connections.
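To make the assignment procedure concrete, the following is a minimal C sketch of the E-ATBT computation under the throughput estimate of [4] quoted above. The names (struct conn_stat, expected_throughput, eatbt_assign) are hypothetical illustrations, not the identifiers of our FreeBSD implementation, and the scaling step is a simplified stand-in for the max-min proportional re-assignment policy described above.

#include <math.h>
#include <stddef.h>

/* Hypothetical per-connection state monitored by the sender TCP. */
struct conn_stat {
    double p;     /* observed packet loss rate                    */
    double rtt;   /* round-trip time [sec]                        */
    double rto;   /* retransmission timeout T0 [sec]              */
    double wmax;  /* receiver buffer size Wmax [packets]          */
    double rho;   /* estimated throughput [packets/sec]           */
    double buf;   /* assigned send socket buffer [packets]        */
};

/* Throughput estimate of [4]; b = 2 with TCP's delayed ACK option. */
static double expected_throughput(const struct conn_stat *c, double b)
{
    if (c->p <= 0.0)                  /* no observed loss: window-limited */
        return c->wmax / c->rtt;
    double loss_term = c->rtt * sqrt(2.0 * b * c->p / 3.0)
        + c->rto * fmin(1.0, 3.0 * sqrt(3.0 * b * c->p / 8.0))
          * c->p * (1.0 + 32.0 * c->p * c->p);
    return fmin(c->wmax / c->rtt, 1.0 / loss_term);
}

/* Assign Bi = rho_i * rtt_i; if the requirements exceed the total
 * buffer, scale the assignments in proportion to the required sizes
 * (a simplified version of E-ATBT's proportional re-assignment). */
void eatbt_assign(struct conn_stat *conns, size_t n, double total_buf)
{
    double required = 0.0;
    for (size_t i = 0; i < n; i++) {
        conns[i].rho = expected_throughput(&conns[i], 2.0);
        conns[i].buf = conns[i].rho * conns[i].rtt; /* Bi = rho_i x rtt_i */
        required += conns[i].buf;
    }
    if (required > total_buf)
        for (size_t i = 0; i < n; i++)
            conns[i].buf *= total_buf / required;
    /* Any excess (total_buf - required) would be re-assigned to
     * connections wanting more buffer, in proportion to their
     * required sizes. */
}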

2.2 Simple Memory-copy Reduction Scheme (SMR)

The other mechanism we implemented is the Simple Memory-copy Reduction (SMR) scheme. The detailed mechanism of TCP data transfer at the sender host in the original BSD UNIX system is illustrated in Figure 1(a). When a user application transfers a file over TCP, the file is first copied from the file system (on disk or in memory) to the application buffer located in user memory space by one memory copy operation. Then, the data in the application buffer is copied to the socket buffer, which is located in kernel memory space; this is performed by system calls provided by the socket interface. Two memory copy operations are thus necessary. It is redundant, however, to copy the file from the file system to the socket buffer via the application buffer; in file transfer, the data could be copied directly from the file system to the socket buffer. Of course, the memory space should be clearly divided into user and kernel spaces by the UNIX memory management system, so that kernel memory space is protected from illegal accesses by user applications.

Figure 1: Memory Copy Operations in TCP Data Transfer ((a) original mechanism: data file copied to the application buffer, then from the application buffer to the socket buffer via a socket system call crossing the user/kernel space boundary, then to TCP; (b) proposed SMR mechanism: data file copied directly from the file system to the socket buffer, then to TCP)
When we consider performance improvement of TCP data transfer, however, we should avoid the memory copy from the file system to the application buffer: memory access is the largest obstacle to improving TCP data transfer [7], and reducing the number of memory copy operations is a key to improving the server's protocol processing speed. Reducing the number of memory accesses in TCP data transfer is not a new issue; examples can be found in [1, 2]. In [1], the authors proposed the 'fbuf' (fast buffer) architecture, which shares memory space between the system kernel and user processes to avoid redundant memory copies during data transfers. However, it is difficult to implement fbuf on actual systems, because its complicated architecture requires many modifications to the memory management mechanism of the operating system. In this paper, we propose the Simple Memory-copy Reduction (SMR) scheme, a new and simpler mechanism that avoids the memory copy from the file system to the application buffer. Figure 1(b) illustrates the proposed mechanism of TCP data transfer. In the proposed mechanism, the file being transferred is copied directly from the file system to the socket buffer by new socket system calls. The original socket system calls take the application buffer as one of their arguments and copy the data from the application buffer to the socket buffer. In the proposed mechanism, by contrast, the socket system calls are modified to take a file descriptor of the file being transferred as an argument, by which the data can be copied directly from the file system to the socket buffer, avoiding the redundant memory copy. Note that the SMR scheme is much simpler than the above-mentioned schemes, but it can still achieve the same effect in reducing the overhead at the sender host.
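Our implementation realizes this with modified socket system calls inside the FreeBSD kernel. As a rough user-level illustration of the same idea (not the SMR interface itself), FreeBSD's sendfile(2) likewise hands a file descriptor directly to the protocol stack, so the data never passes through an application buffer. The helper below, with its hypothetical name send_file_direct, sketches such a direct file-to-socket transfer.

#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <fcntl.h>
#include <unistd.h>
#include <err.h>

/* Send a whole file on a connected TCP socket without first copying
 * it into an application buffer (the effect the SMR scheme aims at). */
ssize_t send_file_direct(int sock, const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        err(1, "open");

    off_t sent = 0;
    /* On FreeBSD, nbytes = 0 means "send until end of file". */
    if (sendfile(fd, sock, 0, 0, NULL, &sent, 0) == -1)
        err(1, "sendfile");

    close(fd);
    return (ssize_t)sent;
}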

3 Implementation Experiments

In this section, we present the results obtained with our experimental system. We implemented the SSBT mechanism on an Intel Pentium PC running FreeBSD 4.0-RELEASE.

3.1 Experiment 1: Fair Buffer Assignment among Different Connections

We first present the time-dependent behavior of the buffer size assigned to each connection in Figure 2, using the network topology depicted in Figure 3. The server host has a 550 [MHz] Pentium-III Xeon processor and 1024 [MByte] of memory; the client hosts have 1 [GHz] Pentium-III processors and 512 [MByte] of memory. For the routers (Routers 1 and 2) in the figure, we use the PC-based router ALTQ [8] to realize a RED router. We establish connections 1, 2, and 4 between the server host and client host 1, and connection 3 between the server host and client host 2. In this experiment, connections 1 and 2 start packet transmission at time 0 [sec]; connections 3 and 4 join the network at times 5 [sec] and 10 [sec], respectively. We set the total buffer size to 512 [KByte]. In ATBT (Figure 2(a)), the assigned buffer sizes oscillate heavily because the ATBT algorithm adapts the buffer assignment to the window size. Another, more important reason is that when a retransmission timeout expires at a certain connection, that connection resets its window size to 1 [packet] according to the TCP retransmission mechanism, so the assigned buffer size of that connection becomes very small. Once the assigned buffer becomes small, the connection cannot inflate its window size for a while because of the throttled buffer size; this is why the assigned buffer size stays low for a while under the ATBT algorithm. On the other hand, E-ATBT provides a stable and fair buffer assignment, as shown in Figure 2(b).

3.2 Experiment 2: Web Server Performance

In this experiment, the effectiveness of the SSBT scheme is investigated using a machine running the Apache Web server. We use the network topology in Figure 3 and run the SSBT-enabled Web server on the server host. The server and all client systems are the same as in the experiment in Subsection 3.1. For the routers (Routers 1 and 2) in the figure, we use the PC-based router NISTNet [9] to emulate

Figure 2: Changes of Assigned Buffer Sizes ((a) ATBT case; (b) E-ATBT case; assigned buffer size [KByte] versus time [sec] for Connections 1 through 4)

Figure 3: Network Environment (server host connected to client hosts 1 and 2 through Routers 1 and 2, with 1000 [Mbps] and 100 [Mbps] links)

a variety of network conditions, such as large RTTs, packet losses, and so on. In this experiment, we set the delay between the server host and each client host to 2 [ms] and the number of HTTP connections at client host 2 to 300. At the receiver host, we run httperf [10] to generate document transfer requests to the Web server over HTTP. Furthermore, to obtain a realistic scenario for the traffic arriving at the server, we follow [11] in determining the characteristics of Web access, including the document size distribution, the idle time distribution of requests, and the distribution of the number of embedded documents. We use the HTTP/1.1 protocol for Web access to the server. The send/receive socket buffer size of each TCP connection is set to 64 [KByte]. We run each experiment for 30 minutes, which corresponds to the generation of about 25,000 document transfer requests from each client.

Figure 4 shows the average performance gain of the SSBT scheme as a function of the document size. Here, the average performance gain for documents of size i [Byte] is determined as follows. Let the numbers of document requests during the experiments with and without the SSBT scheme be M_SSBT,i and M_orig,i, respectively. Similarly, let the observed transfer time of the jth document of size i be T_SSBT,i,j and T_orig,i,j. Then, the average performance gain G_i for documents of size i is defined as

$$G_i\,[\%] = 100 \times \frac{\left(\sum_{j=1}^{M_{orig,i}} T_{orig,i,j}\right)\big/\,M_{orig,i}}{\left(\sum_{j=1}^{M_{SSBT,i}} T_{SSBT,i,j}\right)\big/\,M_{SSBT,i}}$$
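As a sanity check on this definition, the hypothetical helper below computes G_i from two arrays of logged transfer times; it is simply the ratio of the mean transfer time without SSBT to the mean with SSBT, expressed in percent.

#include <stddef.h>

/* Average performance gain G_i for documents of size i:
 * 100 x (mean T_orig) / (mean T_SSBT); values above 100 [%] mean
 * that the documents were transferred faster with SSBT. */
double perf_gain(const double *t_orig, size_t m_orig,
                 const double *t_ssbt, size_t m_ssbt)
{
    double sum_orig = 0.0, sum_ssbt = 0.0;
    for (size_t j = 0; j < m_orig; j++) sum_orig += t_orig[j];
    for (size_t j = 0; j < m_ssbt; j++) sum_ssbt += t_ssbt[j];
    return 100.0 * (sum_orig / (double)m_orig)
                 / (sum_ssbt / (double)m_ssbt);
}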

Note that since we generated the requests according to probability distributions for the document size, idle time, and number of embedded documents, M_orig,i and M_SSBT,i may differ between the experiments with and without SSBT. If G_i is over 100 [%], our SSBT outperforms the original mechanism. The cases of 10, 100, 300, and 600 clients at client host 1 are shown in Figures 4(a), 4(b), 4(c), and 4(d), respectively. At first glance, SSBT appears to degrade the performance of the WWW server when the document size is small. This is caused by TCP retransmission timeouts, as follows. When a requested document is small, the window size of the TCP connection carrying it does not grow large. Therefore, if a packet loss occurs, the TCP connection tends to wait for a retransmission timeout, since the window size is too small to invoke fast retransmit. As a result, the timeout duration occupies most of the document transfer time; that is, the transfer time of a small document is dominated by whether a timeout occurs, not by whether the SSBT algorithm is used. When the document size is large, on the other hand, the effectiveness of SSBT becomes apparent, as expected. Especially as the number of clients increases, SSBT greatly reduces document transfer times (see Figures 4(c) and 4(d)). This means that the SSBT algorithm shares the Web server resources in an effective and fair manner compared with the original mechanism. These results show that the SSBT algorithm is effective even at actual Internet servers.
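The fast-retransmit arithmetic behind this explanation: a loss can be recovered by three duplicate ACKs only if enough segments follow the lost one in flight, so for very small windows every loss costs a full RTO. A minimal sketch of that rule, assuming the BSD duplicate-ACK threshold of 3:

/* With fewer than dupack_threshold + 1 segments in flight, a loss
 * cannot produce the three duplicate ACKs needed for fast retransmit,
 * so recovery must wait for the retransmission timeout. */
int loss_forces_timeout(int segments_in_flight)
{
    const int dupack_threshold = 3;   /* BSD tcprexmtthresh */
    return segments_in_flight < dupack_threshold + 1;
}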

Figure 4: Document Size vs. Average Performance Gain by the SSBT Scheme (average performance gain [%] versus document size [Bytes] for (a) 10, (b) 100, (c) 300, and (d) 600 clients)

4 Conclusion

In this paper, we have proposed SSBT (Scalable Socket Buffer Tuning), a novel architecture for effectively and fairly utilizing the send socket buffer of a busy Internet server. SSBT consists of two algorithms: the E-ATBT and SMR schemes. The E-ATBT algorithm assigns the send socket buffer to connections according to their estimated throughput, for fair and effective usage of the socket buffer. The SMR scheme reduces the number of memory-copy operations when data packets are sent by TCP, improving the overall performance of the server host. We have confirmed the effectiveness of the SSBT algorithm through implementation experiments, and have shown that SSBT can improve the overall performance of the server and provide a fair assignment of the send socket buffer to heterogeneous TCP connections.

Acknowledgements

This work was partly supported by the Research for the Future Program of the Japan Society for the Promotion of Science under the project "Integrated Network Architecture for Advanced Multimedia Application Systems," by the Telecommunication Advancement Organization of Japan under the project "Global Experimental Networks for Information Society Project," and by financial support for "Research on High-performance WWW Server for the Next-Generation Internet" from the Telecommunications Advancement Foundation.

References

[1] P. Druschel and L. L. Peterson, "Fbufs: A high-bandwidth cross-domain transfer facility," in Proceedings of the Fourteenth ACM Symposium on Operating Systems Principles, pp. 189-202, December 1993.
[2] A. Gallatin, J. Chase, and K. Yocum, "Trapeze/IP: TCP/IP at near-gigabit speeds," in Proceedings of the 1999 USENIX Technical Conference, June 1999.
[3] J. Semke, J. Mahdavi, and M. Mathis, "Automatic TCP buffer tuning," in Proceedings of ACM SIGCOMM '98, pp. 315-323, August 1998.
[4] J. Padhye, V. Firoiu, D. Towsley, and J. Kurose, "Modeling TCP throughput: A simple model and its empirical validation," in Proceedings of ACM SIGCOMM '98, pp. 303-314, August 1998.
[5] G. Hasegawa, T. Matsuo, M. Murata, and H. Miyahara, "Comparisons of packet scheduling algorithms for fair service among connections on the Internet," in Proceedings of IEEE INFOCOM 2000, March 2000.
[6] S. Floyd and V. Jacobson, "Random early detection gateways for congestion avoidance," IEEE/ACM Transactions on Networking, vol. 1, pp. 397-413, August 1993.
[7] D. D. Clark, V. Jacobson, J. Romkey, and H. Salwen, "An analysis of TCP processing overhead," IEEE Communications Magazine, vol. 27, pp. 23-29, June 1989.
[8] ALTQ (Alternate Queueing for BSD UNIX), available at http://www.csl.sony.co.jp/~kjc/software.html.
[9] NIST Net home page, available at http://www.antd.nist.gov/itg/nistnet/.
[10] D. Mosberger and T. Jin, "httperf: A tool for measuring Web server performance," Technical Report HPL-98-61, Hewlett-Packard Laboratories, March 1998.
[11] P. Barford and M. Crovella, "Generating representative Web workloads for network and server performance evaluation," in Proceedings of ACM SIGMETRICS '98, 1998.