August 2008 (vol. 9, no. 8), art. no. 0808-mds2008080002 1541-4922 © 2008 IEEE Published by the IEEE Computer Society
Distributed Wisdom
On the (Un)Reliability of TCP Connections: The Return of the End-to-End Argument
Vadim Drabkin, Roy Friedman, and Gabriel Kliot • Technion—Israel Institute of Technology
Developers often use TCP connections to realize reliable point-to-point communication in distributed systems. A common issue in such systems’ design is whether a middleware or an application can rely solely on TCP or whether a higher-level reliable mechanism should be implemented above it. A related question is whether developers can use the breakage of a TCP connection for failure detection. The famous end-to-end argument [1] answers the first question. Yet common wisdom suggests that TCP breakage always results from the failure of a process or machine on either end of the connection or from a severe networking problem [2]. Consequently, some designers might be tempted to avoid implementing a higher-level reliable delivery mechanism when designing systems for LAN environments. Others might rely on TCP breakage as a definite indication of a failure or a network partition.
Here, we highlight the dangers of relying solely on TCP for reliability without any additional message-recovery mechanism at the application level (or at least inside a middleware in the same address space as the application). We also show that TCP breakage can occur in a perfectly functioning LAN, so it can’t be relied on for failure detection either.
An example: WiPeer
We’ve implemented a serverless collaborative application suite called WiPeer (available free at www.wipeer.com). WiPeer lets users automatically discover other WiPeer users on the same network segment and provides utilities such as chatting, multiplayer games, and file sharing, with all communication flowing directly from one computer to the other. WiPeer uses JazzEnsemble group communication, a derivative of Ensemble [3], for membership maintenance and reliable delivery. WiPeer is written mostly in C#, whereas JazzEnsemble is implemented in OCaml.
In our initial implementation, all communication was carried through JazzEnsemble. However, in the case of file sharing, we discovered that, owing to the overhead of marshaling and unmarshaling and the cost of passing between the memory domains of C# and OCaml, transfers were about 10 times slower than we would have expected based on the wire speed. Hence, we changed the implementation of the file-sharing mechanism to operate directly over TCP; the rest of the communication still flows over JazzEnsemble. This is somewhat similar to the Maestro concept [4]. Moreover, to reduce the complexity of multiplexing and avoid adding headers to each file transfer packet, each file transfer uses a separate TCP connection.
Because downloading very large files can take several minutes, if a user disconnects and later reconnects in the middle of a file transfer, the transfer should resume immediately. Such disconnections can result from networking problems or users shutting down their computers. In particular, in wireless ad-hoc networks, users can temporarily move out of range or external temporary interference can occur. So, we’ve implemented a mechanism for restarting such file transfers. Initially, though, we would do so only following a view change (a change in the set of members currently participating in the application).
Results
This solution enabled us to achieve near-wire speeds, especially for files larger than a few Kbytes. However, we encountered a bug in which a transfer of a large file (500 Mbytes or more) would occasionally hang after a few hundred Mbytes had already been transferred successfully. This happened rarely, but often enough for users to complain. During such events, all processes were alive, and the network seemed perfectly fine; all other communication flowed smoothly, and other concurrent file transfers between the same two machines continued uninterrupted.
We discovered that the exception error message returned in these cases was “An established connection was aborted by the software in your host machine.” After some investigation, we determined that the problem can have two causes: the Windows firewall shutting down the TCP connection, or antivirus software doing the same. Hence, we had to add a mechanism that immediately restarts the transfer from the last received byte whenever the TCP connection of a file transfer breaks without the group communication layer having detected a failure (that is, without a view change). Of course, from the point of view of dependable-systems designers, similar behavior could have occurred owing to various unexpected problems in libraries and runtime environments that are outside the developer’s reach but that participate in executing the application.
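The restart rule described above can be illustrated with a minimal Python sketch. It is not the WiPeer code (which is written in C#), and every name in it is hypothetical: open_stream(offset) stands for "open a fresh TCP connection and ask the sender for the bytes from offset onward," and peer_in_view() stands for a query against the current membership view. For brevity the sketch collects the bytes in memory; a real receiver would presumably append them to a file.

```python
from typing import Callable, Iterable


class PeerLeftView(Exception):
    """Raised when the membership layer no longer lists the peer."""


def receive_with_resume(
    open_stream: Callable[[int], Iterable[bytes]],  # hypothetical transport hook
    total_size: int,
    peer_in_view: Callable[[], bool],               # hypothetical membership query
    max_restarts: int = 5,
) -> bytes:
    """Receive total_size bytes, resuming after each broken connection."""
    received = bytearray()
    restarts = 0
    while len(received) < total_size:
        try:
            # Each attempt opens a new stream at the last received byte.
            for chunk in open_stream(len(received)):
                received.extend(chunk)
            if len(received) < total_size:
                # The stream ended early without an error: treat it as a break.
                raise ConnectionError("stream closed before end of file")
        except ConnectionError:
            # A broken connection alone is not evidence that the peer failed;
            # only the membership view decides whether the peer is gone.
            if not peer_in_view():
                raise PeerLeftView("peer removed by a view change")
            restarts += 1
            if restarts > max_restarts:
                raise
            # Otherwise loop: the next open_stream call starts at len(received).
    return bytes(received)
```

The point of the design is that the decision to give up is taken from the membership view, not from the socket error: as long as the peer is still a member, a broken connection only triggers a restart at the current offset, bounded here by an assumed max_restarts limit.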
The bottom line is that even when the network is working well, the loss of a TCP connection in a modern operating system could be a result of some other application or service on the same machine. So, to develop truly dependable applications, a high-level recovery mechanism is required, even when using TCP for point-to-point communication. Moreover, the break of a TCP connection can’t serve as evidence of node failure or network partition.

References
1. J.H. Saltzer, D.P. Reed, and D.D. Clark, “End-to-End Arguments in System Design,” ACM Trans. Computer Systems, vol. 2, no. 4, 1984, pp. 277–288.
2. R. Ekwall, P. Urbán, and A. Schiper, “Robust TCP Connections for Fault Tolerant Computing,” Proc. 9th Int’l Conf. Parallel and Distributed Systems (ICPADS 02), IEEE CS Press, 2002, pp. 501–508.
3. M. Hayden, The Ensemble System, tech. report TR98-1662, Computer Science Dept., Cornell Univ., 1998.
4. K.P. Birman et al., “Middleware Support for Distributed Multimedia and Collaborative Computing,” Software Practice and Experience, vol. 29, no. 14, 1999, pp. 1285–1312.
Vadim Drabkin is a PhD student in the Computer Science Department of the Technion—Israel Institute of Technology. Contact him at [email protected].
Roy Friedman is an associate professor of computer science at the Technion—Israel Institute of Technology. Contact him at [email protected].
Gabriel Kliot is a PhD student in the Computer Science Department of the Technion—Israel Institute of Technology. Contact him at [email protected].
Cite this article: Vadim Drabkin, Roy Friedman, and Gabriel Kliot, "On the (Un)Reliability of TCP Connections: The Return of the End-to-End Argument," IEEE Distributed Systems Online, vol. 9, no. 8, 2008, art. no. 0808-o8003.