A new method of fault tolerance TCP - Computer ... - Semantic Scholar

A New Method of Fault Tolerance TCP

Li Zhao1, Xu Ke2, Xu Mingwei2, Fu Lizheng1, Wu Jianping3
1 Tsinghua Bitway Networking Technology Corporation, {lizhao,fulz}@bit-way.com
2 Department of Computer Science, Tsinghua University, {xuke, xmw}@csnet1.cs.tsinghua.edu.cn
3 Department of Computer Science, Tsinghua University, [email protected]

Abstract

With the rapid development of the Internet, the need for highly available data services on the Internet has become more urgent. However, TCP, one of the most widely used protocols on the Internet, cannot by itself provide high availability when hardware or software on the Server or Client fails. In this paper, we propose a new method of implementing fault-tolerant TCP to improve the availability of data transmission. First, we analyze some existing methods of fault-tolerant TCP. Then, based on the characteristics of present server architecture, we put forward our new method. Finally, we describe in detail how we implemented and tested it. Experimental results show that our fault-tolerant TCP can offer highly available and efficient communication support for reliable data services on the Internet.

1. Introduction

1.1. The Problems

As an important protocol in the TCP/IP protocol suite, the Transmission Control Protocol (TCP) [1] is extensively used for data transmission on the Internet. TCP uses positive acknowledgement with retransmission to compensate for the unreliability of the underlying IP protocol, and offers a reliable data transmission service to application protocols such as HTTP, Telnet, and FTP. TCP is also used to exchange routing information between routers' BGP entities.

A TCP application always has two communicating peers, one called the Client and the other the Server. A TCP connection is usually initiated by the Client's connection request and established by the Server's response. Once the connection has been established, the Server offers data service to the Client through it. Typical Servers include HTTP Servers and FTP Servers. On the Internet, an ISP (Internet Service Provider) offers and manages many different Servers that provide data services to a large number of Clients. The problem an ISP must consider is how to offer reliable and uninterrupted data services to Clients.

Although TCP implements a reliable data flow on top of an unreliable communication system, the protocol itself does not deal with a TCP connection breaking down because of software or hardware faults on the Server or Client. One interruption of a TCP connection amounts to one interruption of the data service. As the Internet expands, Servers must serve more and more Clients, and the probability that a Server suffers overload or a software or hardware failure grows accordingly. How to guarantee that a TCP connection recovers quickly from an interruption has therefore become a hot topic in current TCP research.

1.2. Related Works

There are already many research papers on fault-tolerant TCP. Some influential methods that have code implementations include:
1. TCP Splice of IBM Research Division and CMU [3];
2. MTCP of Rutgers University [4];
3. Wrapping Server-side TCP of UT Austin [5].

TCP Splice was originally intended to support mobility, but it can also be used to realize fault-tolerant TCP. The method mainly uses a proxy to redirect the TCP data flow between Clients and Servers. Compared with a general proxy, this proxy is more intelligent: besides forwarding TCP packets, it also maintains the state of the TCP connections between Clients and Servers. Once a Server crashes and can no longer offer data services over a TCP connection, the proxy transfers the connection state and the subsequent TCP packets to another similar Server, which continues to serve the Client over the resumed connection. If this method is adopted to realize fault-tolerant TCP, however, we face a more serious fault-tolerance problem: a large number of TCP connection states are maintained on one proxy, but the proxy might fail too.

MTCP is based on a trend in present Internet development. The authors of MTCP hold that Clients care mostly about service quality rather than the location of the Server. By rebuilding the TCP protocol, MTCP realizes a new transport layer protocol, which lets a Client switch between Servers of its own accord in order to obtain better quality for the same data service. In MTCP, upper-layer fault-tolerant applications must back up their states as part of the TCP connection state, and all backup information is stored by the Client's software. Although MTCP offers some foundation for designing upper-layer fault-tolerant applications, it has shortcomings. First, an application might need to back up a large amount of state information, which can overload the Client, and the application's backup operations cannot be arranged according to the application's own logical structure. Second, the method must change the TCP code on the Server and the Client at the same time, but Client software is not under the control of ISPs at present.

Wrapping Server-side TCP realizes fault tolerance by changing TCP code on the Server side. The main modification is to add two wrapping layers, one between TCP and IP and one between TCP and the upper-layer application. These layers back up TCP state information to a logging system. When a Server fails, another Server establishes a new TCP connection identical to the collapsed one according to the data in the logging system. This method avoids modifying the TCP protocol and does not need to change any Client-side software. However, it does not consider how upper-layer applications back up their state information.

Proceedings of the 2003 International Conference on Computer Networks and Mobile Computing (ICCNMC’03) 0-7695-2033-2/03 $17.00 © 2003 IEEE

1.3. Main Contributions of This Paper

In this paper we describe a new method of implementing fault-tolerant TCP. The method is based on the recent development of Server architecture and does not need to alter the TCP protocol. It adds a TCP connection state backup function by modifying the TCP code on the Server. The method overcomes several shortcomings of the methods above: it is simple, it can communicate with Clients that use the standard TCP protocol, and it offers flexible support for the design of upper-layer fault-tolerant applications.

The rest of this paper is organized as follows. Section 2 discusses the development of present Server architecture. Section 3 introduces the model in which fault-tolerant TCP and an upper-layer fault-tolerant application cooperate to realize highly available data services. Section 4 describes how we realize our fault-tolerant TCP. Section 5 describes the test results. The last section draws some conclusions.

2. Development of Server Architecture

As a kind of high-performance computer, a Server runs server applications and offers data services to a large number of Clients. Server architecture has developed through several stages, from the early mainframe to the present SMP, and from shared-memory multiprocessors to distributed-memory multiprocessors. Finally, the Cluster has become the preferred architecture because of its high performance, high availability, high scalability, and low cost [6].

A Cluster is a group of computers (Nodes) connected by a high-performance communication network according to a certain architecture. Supported by system and application environments, all Nodes work together in a coordinated way, like a single computing resource, to offer coherent and efficient computing services to users. Generally, each Node in a Cluster is a high-performance workstation or high-end PC server. It has its own processor, cache, disk, I/O adapters, and a complete operating system. A Cluster adopts Single System Image (SSI) technology to integrate the computing resources so that they appear as a single computer, which makes the Cluster easier to use and manage. The communication network among Nodes can be a commodity network such as Ethernet or a proprietary network such as Myrinet or Mesh. Figure 1 shows the architecture of a typical Cluster.

[Figure 1. Cluster architecture: a Cluster System in which several Nodes, under a common SSI layer, are interconnected by a Switch Fabric.]

When a Cluster is used as a Server to offer Internet services, its Nodes generally need to be classified according to function. Some Nodes are Connect Nodes, which connect with the outside; a Connect Node has several network interfaces connected to the Internet. Other Nodes serve as Response Nodes, which respond to outside computing requests and are connected only to the communication network inside the Cluster. Connect Nodes transfer requests and response data between the outside and the Response Nodes. After a Response Node gets a computing request from a Connect Node, it carries out the corresponding computation and then returns the result to the outside through the Connect Nodes. Figure 2 shows the architecture of a Cluster used as a Server.


[Figure 2. Cluster Server Architecture: inside the Cluster Server, Compute Nodes under a common SSI layer are linked through a Switch Fabric to Connect Nodes, which connect to the Internet.]

The advantages of the Cluster Server, namely high performance, high availability, high scalability, and low cost, have already been analyzed in many papers [7]. Its high availability follows directly from its architecture. First, there are many physical channels between the Cluster Server and the outside Internet; when one channel fails, another can carry on the data flow that was interrupted. Many technologies realize this kind of fault tolerance for physical channels, such as Trunking [8] and EtherChannel [9] for Ethernet channels and APS/MSP for SONET/SDH channels [10]. Second, the internal communication network of the Cluster Server can work redundantly. Finally, because there are many Response Nodes in the Cluster Server, some can be designated as backup Nodes for other Response Nodes that run important tasks. The Cluster Server can use these redundant Nodes to realize fault-tolerant data services: once a Response Node running an important task fails, the backup Response Node starts up and resumes the task to continue the interrupted data service.

3. Model of Fault Tolerance TCP and Upper Fault Tolerance Application

When a Server needs long-lived TCP connections to offer data service to Clients, a Response Node of the Cluster Server may suffer overload or a software or hardware failure, which makes it unable to serve the Client. If the data service must continue, the task on the failed Response Node must be transferred to another working Response Node. The most common and basic way to realize this kind of fault tolerance is checkpoint backup: once the running Response Node fails, a spare Response Node rolls back to the backed-up state and resumes the failed task [11]. Figure 3 shows how this works.

[Figure 3. Fault-tolerance Application's state backup and rollback resume. Labels recoverable from the figure: App system, backup App system, Log system, Step 1 to Step 3, Check point 1 and Check point 2, states backup, states restore, Cpu crash point, Cpu restore point, fault.]

In Figure 3, step i can be a single instruction or a set of instructions that form a logical unit. We define the "backup granularity" of a backup system as the average number of instructions in one step. For TCP, any operation that sends out data may change the state of the remote TCP peer. Because the state of the remote peer is not under the control of the local TCP software, the local TCP cannot reproduce that state on the remote peer. The local TCP must therefore adopt a small backup granularity if it is to realize fault tolerance by state backup; that is, the state of fault-tolerant TCP must be backed up after every TCP write operation.

In an upper-layer application that uses fault-tolerant TCP, the read and write operations on TCP sockets are scattered throughout the code. It is therefore impractical to back up the application's state at the backup granularity of fault-tolerant TCP; the application's backup granularity can only be determined by the logical structure of the application's instructions. Generally, the application's backup granularity is larger than TCP's: a TCP state backup must be carried out at least once between any two socket write operations, whereas many socket write operations may occur between two application backup operations.

Generally, when a Response Node fails and the application is switched over to another Response Node, the TCP connection needs to be switched over too. As discussed above, the backup granularities of the application and of TCP differ, which creates a problem: TCP resumes from the last socket operation before the failure, whereas the application may resume from the state it last backed up. While resuming, the application re-executes some TCP socket operations, and these duplicated operations may disturb the remote TCP peer's data flow. This is not an error from TCP's point of view, but it creates a logic error for the application.

To resolve this mismatch, TCP can back up socket operation events whenever it backs up its state. Each event includes the data, the data length, and the result of the socket operation. When the application re-executes a socket operation during rollback, TCP does not perform the real operation again but returns the result from the backed-up event. Figure 4 shows how fault-tolerant TCP and the upper-layer fault-tolerant application cooperate to implement highly available data services.
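The event-replay idea described above can be illustrated with a minimal Python sketch. It is not the HEROS implementation: the names `SocketEvent` and `ReplaySocket`, and the callables `live_read` and `live_write`, are assumptions introduced only for illustration. The sketch shows a socket wrapper that answers re-executed operations from the backed-up events and falls through to real operations once the log is exhausted.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class SocketEvent:
    """One backed-up socket operation: kind, data, data length, and result."""
    op: str       # "read" or "write"
    data: bytes
    length: int
    result: int   # value the original call returned to the application


class ReplaySocket:
    """Hypothetical wrapper that replays logged events before going live."""

    def __init__(self, logged_events, live_read, live_write):
        # Events recorded since the application's last checkpoint.
        self._pending = deque(logged_events)
        self._live_read = live_read     # callable performing a real read
        self._live_write = live_write   # callable performing a real write

    def _replay(self, op):
        event = self._pending.popleft()
        if event.op != op:
            raise RuntimeError("application replay diverged from the event log")
        return event

    def write(self, data):
        if self._pending:
            # Answered from the log: nothing is sent to the remote peer again.
            return self._replay("write").result
        return self._live_write(data)

    def read(self, max_len):
        if self._pending:
            # Return the data that the original read delivered.
            return self._replay("read").data
        return self._live_read(max_len)
```

With this arrangement the remote peer never sees a duplicated write during rollback, which is the logic error the text aims to avoid.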

[Figure 4. The Model of Fault-tolerance TCP and Upper Fault-tolerance Application Cooperating to Realize High Availability data service. Flowchart with a fault side and a restore side. Labels recoverable from the figure: "Task is a restore task?" (N/Y), "Construct Connection socket", "Backup Connection socket id", "Get backup Socket index", "Restore socket status", TCP Status, APP Status, LOG, "Backup App step i status", "Backup App step i+1 status", "Restore app status to step i", "Repeat Tcp read or write", Cpu crash point, "current tcp read or write point", "Once switch from fault to restore".]

4. Implementation of Fault Tolerance TCP

4.1. Hardware Platform

We have implemented a prototype of fault-tolerant TCP on a hardware platform that has the same architecture as a Cluster Server. The platform has two Response Nodes and one Connect Node, interconnected through 100M Ethernet. The Connect Node is responsible for connecting with the Internet and forwards data between the outside Internet and the Response Nodes inside. The two Response Nodes form the hardware environment of a dual-node backup system, in which the nodes back up each other's state with the help of a heartbeat algorithm. In this system, a Response Node can be in one of four states:
1. Active: the Node offers normal computing service and backs up its present software state to the other Node.
2. Standby: the Node does not offer any computing service, but obtains backup information of the software state from the Active Node and stores it.
3. One-working: the Node offers normal computing service but does not produce any backup information.
4. No-working: the Node does not do any work.
Figure 5 shows the possible state combinations of the two Response Nodes.

[Figure 5. The Possible State Combinations of Two Response Nodes: (Compute Node 1, Compute Node 2) = (One-working, No-working), (No-working, One-working), (Standby, Active), (Active, Standby).]

At normal startup, the two Response Nodes negotiate which is the Active Node and which is the Standby Node. During dual-node operation, if the Active Node fails, its state switches to No-working, the Standby Node's state switches to One-working, and the One-working Node continues the former Active Node's work. If the Standby Node fails, the Active Node's state switches to One-working and the Standby Node's state switches to No-working. Once the No-working Node recovers, the One-working Node's state switches to Active and the recovered Node's state switches to Standby.
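The transition rules above can be written down as a small state machine. The following Python sketch is only an illustration of those rules; the function names `on_failure` and `on_recovery` are hypothetical, and the case where both nodes are down is not described in the paper, so the sketch simply leaves the peer unchanged in that case.

```python
from enum import Enum


class NodeState(Enum):
    ACTIVE = "Active"
    STANDBY = "Standby"
    ONE_WORKING = "One-working"
    NO_WORKING = "No-working"


def on_failure(states, failed):
    """Return the new (node 0, node 1) states after node `failed` fails."""
    peer = 1 - failed
    new = list(states)
    new[failed] = NodeState.NO_WORKING
    # A surviving Active or Standby peer carries on alone as One-working.
    # Assumption: any other peer state is left unchanged (not covered by the text).
    if states[peer] in (NodeState.ACTIVE, NodeState.STANDBY):
        new[peer] = NodeState.ONE_WORKING
    return tuple(new)


def on_recovery(states, recovered):
    """Return the new states after a No-working node `recovered` comes back."""
    peer = 1 - recovered
    new = list(states)
    if (states[recovered] is NodeState.NO_WORKING
            and states[peer] is NodeState.ONE_WORKING):
        new[peer] = NodeState.ACTIVE         # the worker starts backing up again
        new[recovered] = NodeState.STANDBY   # the recovered node receives backups
    return tuple(new)
```

For example, starting from (Active, Standby), a failure of node 0 followed by its recovery leaves the pair in (Standby, Active): the roles are swapped rather than restored.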

4.2. Software Environment

Every node in the hardware platform above runs a kernel of our own HEROS operating system [12]. HEROS is a multitasking real-time operating system developed by the Department of Computer Science of Tsinghua University. It implements a distributed shared message queue system on top of Ethernet hardware. The message queue system offers a highly efficient mechanism for communication between tasks on different Nodes and uses an acknowledgement mechanism to make that communication dependable. HEROS also includes a small SSI software component that presents the nodes' computing resources as a single image. The SSI software monitors the heartbeat state of the software, and elects and maintains the Active/Standby Node states. It can also run on the Connect Node, where it applies different data forwarding policies according to the position of the present Active or One-working Node.

Although each node in the system runs HEROS, HEROS has different functions on different Nodes according to the type of Node, so its function modules must be configured per Node. When HEROS on the Connect Node receives an IP packet from the Internet, it does not pass the packet to the upper-layer protocol on the Connect Node but hands it to the Response Nodes through the internal communication network. When a Response Node needs to send response data to the Internet, HEROS on the Response Node does not give the data to its own IP protocol software but sends it through the internal communication network to the Connect Node.

HEROS implements a rich set of network protocols, including the whole TCP/IP protocol suite. In HEROS, TCP runs on the Response Node as an independent task. It maintains a data structure, the TCB, for every TCP connection; one TCB corresponds to one TCP connection. The TCP task receives various kinds of messages from different software modules, including IP packets from the Connect Node's IP software, clock messages from the underlying software module, and socket function call commands from application software. The TCP task takes a message from its receive message queue and carries out the corresponding operation, which may modify a TCB or send some output. Figure 6 shows the structure of the TCP task in HEROS.

[Figure 6. Structure of TCP: the TCP msg queue feeds the Tcp msg Handle Task, which works on the TCBs and produces Output.]

4.3. Structure of Fault Tolerance TCP

Fault-tolerant TCP is based on the structure of ordinary TCP. Figure 7 shows the structure of fault-tolerant TCP.

[Figure 7. Structure of Fault Tolerance TCP: a Systematic state identification, a TCP msg queue feeding the Tcp msg Handle Task (which produces Output and Backup msg), a TCP backup msg queue feeding the Tcp Backup Server Task, and the TCBs.]

The structure of fault-tolerant TCP includes the following parts:
1. Systematic state identification: it represents the state of the Response Node that runs fault-tolerant TCP. Its value may be Active, Standby, One-working, or No-working.
2. TCP receive message queue: it is similar to ordinary TCP's message queue and is used for receiving various kinds of messages.
3. Task that processes TCP messages: it is similar to the task of ordinary TCP and mainly deals with TCP messages from the receive message queue. According to the present state of the Node, after finishing each message this task decides whether to produce a TCP connection state backup message and send it to the fault-tolerant TCP task running on the Standby Node.
4. TCBs: they are like the TCBs in ordinary TCP, but their function depends on the state of the Node that runs fault-tolerant TCP. If the Node's state is Active or One-working, they function as the TCBs in ordinary TCP do. If the Node's state is Standby, they serve as the storage area for TCP backup data.
5. TCP backup message queue: it is used for receiving TCP backup messages from the Active Response Node. This queue is effective only when the Node's state is Standby.
6. Task that processes TCP backup messages: it deals with the messages in the TCP backup message queue and writes them into its TCBs according to certain rules. This task is effective only when the Node's state is Standby.

Fault-tolerant TCP thus has two sets of mechanisms for receiving and processing messages, but the two never run at the same time. If the Node is currently Active or One-working, the TCP receive message queue and the task that processes TCP messages are running, the TCP backup message queue receives no messages, and the task that processes TCP backup messages is suspended. Likewise, if the Node is currently Standby, the TCP backup message queue and the task that processes TCP backup messages are running, the TCP message queue receives no messages, and the task that processes TCP messages is suspended. In addition, when the Response Node's state switches from Standby to One-working, it must first wait until all the backup messages in the TCP backup message queue have been processed. Figure 8 shows how the two fault-tolerant TCP instances back each other up.

[Figure 8. Mutually backup between Fault Tolerance TCP running on Active Node and Standby Node: each of the Active and Standby Nodes has a TCP msg queue, a Tcp msg Handle Task producing Output and Backup msg, and TCBs.]
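The two mutually exclusive message paths, and the rule that a Standby Node drains its backup queue before taking over, can be sketched as follows. This is a toy Python model under stated assumptions, not the HEROS code: the class `FaultTolerantTcpNode`, its method names, and the representation of a message as a `(connection id, state dict)` pair are all invented for illustration, and inter-node messaging is reduced to appending to the peer's queue.

```python
from collections import deque


class FaultTolerantTcpNode:
    """Toy model of fault-tolerant TCP on one Response Node."""

    def __init__(self, state, peer=None):
        self.state = state            # "Active", "Standby", "One-working", "No-working"
        self.peer = peer              # the Standby node, when this node is Active
        self.tcbs = {}                # connection id -> state (or its backup copy)
        self.msg_queue = deque()      # ordinary TCP messages
        self.backup_queue = deque()   # backup messages from the Active node

    def handle_tcp_messages(self):
        """Ordinary path: runs only on an Active or One-working node."""
        if self.state not in ("Active", "One-working"):
            return
        while self.msg_queue:
            conn_id, new_state = self.msg_queue.popleft()
            self.tcbs[conn_id] = new_state
            if self.state == "Active" and self.peer is not None:
                # After finishing a message, send a copy of the state as backup.
                self.peer.backup_queue.append((conn_id, dict(new_state)))

    def handle_backup_messages(self):
        """Backup path: runs only on a Standby node."""
        if self.state != "Standby":
            return
        while self.backup_queue:
            conn_id, saved_state = self.backup_queue.popleft()
            self.tcbs[conn_id] = saved_state   # TCBs act as the backup store

    def take_over(self):
        """Switch Standby to One-working, draining pending backups first."""
        self.handle_backup_messages()
        self.state = "One-working"
```

Draining the queue inside `take_over` mirrors the requirement that all pending backup messages be processed before the node starts serving, so the TCBs reflect the last state the Active Node reported.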

4.4. Steps of Fault Tolerant TCP Realization

In fault-tolerant TCP, TCBs are used not only for maintaining TCP connection states but also as the LOG system for the state backup data of TCP connections. So we must first modify the TCB data structure so that it can serve both functions. A data item in the TCB indicates the present state of the corresponding TCP connection; its value range is {TCP_FREE, TCP_CONNECT, TCP_ESTABLISH, TCP_WAIT…}. We add a new TCB state, TCP_BACKUP: if the data item equals TCP_BACKUP, all data in this TCB is backup data from the other node, not the data of a real TCP connection. Second, we need a flag in the TCB to indicate whether the TCP connection uses the backup mechanism, so we add a new data item, IsFaultToleranceMode, of type BOOLEAN. If it equals TRUE, the corresponding TCP connection uses the backup mechanism. Finally, because fault-tolerant TCP must back up some socket read and write operations, we also add two linked lists to the TCB to store read and write operation events.

Fault-tolerant TCP must back up the TCP connection state to the Standby Node at the proper times, so we modify the task that processes TCP messages. If a TCP message causes the TCP task to send TCP data to the remote TCP peer, the TCP task creates a backup message from the present state of the TCP connection and sends it to the backup TCP task running on the Standby Node. If the upper-layer application calls a socket read or write operation, the TCP task sends the operation event to the Standby Node as a backup message, and the TCP task running on the Standby Node stores the event in the corresponding list.

Fault-tolerant TCP handles not only ordinary messages but also backup messages, so we add a new message handling mechanism for receiving and processing backup messages.

Finally, to facilitate the design of upper-layer fault-tolerant application software, we modify the TCP socket API. Several new socket API functions are added, such as tcp_setsockftmode, tcp_restoreconnect, tcp_releaserwevent, and tcp_beginsockbackup. The tcp_setsockftmode function lets an application program conveniently establish either a fault-tolerant or an ordinary TCP connection. The tcp_restoreconnect function lets an application program recover the original TCP connection from backup data on its own initiative after switching over from Standby to Active/One-working. The tcp_releaserwevent function lets an application program release the backed-up socket operation events at any time; generally it should be called each time the fault-tolerant application completes a backup of its own state. The tcp_beginsockbackup function lets fault-tolerant TCP back up all information of a TCP connection to the Standby Node at any time.
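The TCB changes described above can be pictured with a short Python sketch. The paper gives only the names of the new state, the new flag, and the API functions, so everything else here is an assumption: the field `saved_state`, the helper names `release_rw_events` and `restore_connect`, and their signatures are hypothetical stand-ins that illustrate the roles of tcp_releaserwevent and tcp_restoreconnect rather than their real interfaces.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class TcbState(Enum):
    # Only the states named in the text; the original set has more.
    TCP_FREE = auto()
    TCP_CONNECT = auto()
    TCP_ESTABLISH = auto()
    TCP_WAIT = auto()
    TCP_BACKUP = auto()   # new: this TCB holds backup data from the other node


@dataclass
class Tcb:
    state: TcbState = TcbState.TCP_FREE
    is_fault_tolerance_mode: bool = False       # IsFaultToleranceMode in the paper
    # Hypothetical field: the live connection state carried inside the backup.
    saved_state: TcbState = TcbState.TCP_FREE
    read_events: list = field(default_factory=list)    # backed-up read events
    write_events: list = field(default_factory=list)   # backed-up write events


def release_rw_events(tcb):
    """Role of tcp_releaserwevent: drop logged events after an app checkpoint."""
    tcb.read_events.clear()
    tcb.write_events.clear()


def restore_connect(tcb):
    """Role of tcp_restoreconnect: turn a backup TCB into a live connection."""
    if tcb.state is not TcbState.TCP_BACKUP:
        raise ValueError("TCB does not hold backup data")
    tcb.state = tcb.saved_state   # logged events are kept for replay
    return tcb
```

Keeping the event lists across `restore_connect` matters: the restored application still needs them to replay the socket operations issued since its last checkpoint, and only releases them once it has backed up its own state again.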

5. Test of Fault Tolerance TCP

The test of fault-tolerant TCP is divided into two parts, a function test and a performance test. For the function test, we designed a simple Server application program that uses fault-tolerant TCP. Its main work is to accept TCP connection requests from a Client application that uses ordinary TCP. After the TCP connection is established, the Server application sends data carrying sequence numbers through the connection, and the Client application returns each piece of data it receives to the Server application. While the Client and Server are working normally, we make the Active Response Node fail so that the Standby Response Node switches from Standby to One-working, and then observe whether the Client application is aware that the Server application has been switched over. Figure 9 shows the Server application using fault-tolerant TCP and the Client application using ordinary TCP.

[Figure 9. Flowchart of the Server and Client applications. Labels recoverable from the figure: Client: "Connect to Server, Construct socket"; Server: "Is restore?", "get sock ID", "Respond to Client connect request, Construct socket", tcp_restoreconnect, tcp_setsockftmode (TRUE), "backup sock ID", tcp_beginsockbackup, tcp_read, tcp_write.]

[...] ordinary TCP, so we set up a basic 10M Ethernet hardware environment that uses a SUN workstation as the Client. After running many groups of data with TTCP, we can compare the performance of fault-tolerant TCP and ordinary TCP. Figure 10 shows the result. In Figure 10, the throughput measured at the Client differs from that measured at the corresponding Server, because the SUN workstation cannot measure small time intervals accurately. When the size of the transmitted data reaches 1M, the test results of the Server and the Client agree more closely. From the test results, we calculate that the throughput of fault-tolerant TCP is only 48.83% of that of ordinary TCP. The performance drops mainly because fault-tolerant TCP must back up its state frequently, and the message communication among the nodes can take up a large amount of time. Compared with the results of some existing fault-tolerant TCP methods, the performance of our fault-tolerant TCP is worse only than that of Wrapping Server-side TCP, and our method overcomes the shortcoming of Wrapping Server-side TCP noted in Section 1.2. How to improve the performance of our fault-tolerant TCP is left for future work.
