IMPACT OF SACK DELAY AND LINK DELAY ON FAILOVER PERFORMANCE IN SCTP
Johan Eklund, Karlstad University, S-651 88 Karlstad, Sweden
Anna Brunstrom, Karlstad University, S-651 88 Karlstad, Sweden
ABSTRACT The Stream Control Transmission Protocol (SCTP) was developed to support the transfer of telephony signaling over IP networks. One of the ambitions when designing SCTP was to offer a robust transfer of traffic between hosts. For this reason SCTP was designed to support multihoming, which makes it possible to set up several paths between the same hosts in the same session. If the primary path between a source machine and a destination machine breaks down, the traffic may still be sent to the destination by utilizing one of the alternate paths. The failover that occurs when changing path is to be transparent to the application. Consequently, the time from a break on the primary path until the traffic runs smoothly on one of the alternate paths is important. This paper presents experimental results concerning SCTP failover performance. The focus is on evaluating the impact of the SACK delay and the link delay on the failover time as well as on the maximum transfer time for a single message, which complements earlier studies in this area. The results show a significant performance impact of the SACK delay as well as of the link delay. This suggests that the SACK delay is an important parameter to tune to enhance application transparency in failure situations.
focus on the failover performance of SCTP. The setup of the experiment is inspired by the telephony signaling perspective in that we consider the transfer of many small messages. Our contribution is twofold. First, we extend the research by Grinnemo and Brunstrom [2] by investigating the impact of longer link delays on the failover time and on the resulting longest transfer time for an individual message. Second, and most importantly, we investigate the impact of the SACK delay on these performance parameters. The results indicate that the link delay as well as the SACK delay has a major impact on both the failover time and the maximum message transfer time. The remainder of the paper is organized as follows. In Section 2, SCTP and the failover procedure are described further. Section 3 presents related work in this area. Section 4 presents the experimental setup and the parameters used for the experiment. Section 5 presents the results from the experiment as well as an analysis of the results. Finally, Section 6 draws some conclusions and points out suggestions for future work.
KEY WORDS IP Networks, SCTP, Multihoming, Failover Performance, Experimental Study
SCTP is a reliable transport protocol, which has inherited many features from the traditional reliable transport protocol on the Internet, TCP [3]. The congestion control of SCTP is based on the congestion control of TCP [4]. The slow start, congestion avoidance and fast retransmit mechanisms have been almost directly inherited from TCP, although SCTP uses a byte-oriented congestion window as opposed to the segment-oriented window used by TCP. Although recently revised, the original SCTP specification [1] also required four duplicate acknowledgements to be received before triggering a fast retransmit (as opposed to three in TCP) and this is used in our experiment. SCTP employs a selective acknowledgement (SACK) scheme that is similar to SACK TCP. Although many of the features of SCTP have been inherited from TCP, there are, however, also some significant differences between the protocols. TCP uses only one interface on each host for the transfer of traffic. A path failure is handled by the routers in the network and the time to find a new path may take several seconds [5] if it is possible at all. Furthermore, a failure between the host and the first node
1 Introduction The Stream Control Transmission Protocol (SCTP) [1] is a relatively new general purpose transport protocol, primarily designed to enable telephony signaling over IP. One of the demands from telephony signaling was robustness, which requires redundant paths between hosts. This is supported in SCTP by a feature called multihoming, which enables more than one path to be set up between the same hosts in the same session. As long as the data is delivered to the receiver as expected, all traffic is sent on the path denoted primary. If the traffic on the primary path cannot be delivered, it is switched over to one of the alternate paths and the data delivery to the destination host proceeds; a so-called failover has occurred. In the experimental study presented in this paper we
2 SCTP and Failover
in the network is not handled at all by TCP. To overcome this limitation of TCP and to offer a more robust service, SCTP is designed to enable multihoming, where more than one interface can be used in the same session. In an ideal situation the paths connected to the different interfaces of the host are totally disjoint. During normal operation all data is sent over the path denoted primary, while only so-called Heartbeats are sent on the alternate paths at regular intervals to probe reachability. In case of failure the traffic is handed over to one of the alternate paths, which until then has been purely redundant. In this way, the data transfer between the hosts continues. After the failover, Heartbeats are sent on the path on which the failure was noticed. If the Heartbeats succeed in reaching the destination, the traffic is handed back to the original path [1]. SCTP discovers a primary path failure by noticing that SACKs from the receiver are missing. Missing SACKs can be due to path failure, but they can also be due to congestion in the network. The challenge for the sender is to distinguish between these two causes and decide when to abandon the primary path. In a multihomed association, separate congestion control parameters are kept for each destination. To be able to observe failures on a path, the sending SCTP host also keeps an error counter for each destination that counts the number of retransmission timeouts. In case of failure on the primary path, and under the assumption that there is outstanding traffic, the retransmission timer will expire after a retransmission timeout (RTO) interval. At this time the error counter for the primary path is incremented by one and the data not yet acknowledged is resent on one of the alternate paths. New traffic is at this moment still sent on the primary path, where the congestion window after the timeout is set to one and slow start is entered.
A path is considered unreachable when the error counter reaches the value of the protocol parameter Path.Max.Retrans (PMR). In case of a permanent failure the number of timeouts will after a while reach the PMR value and the failover will take place. From the description above it is evident that the failover performance is highly dependent on the parameters PMR and RTO. RTO is a dynamic parameter adjusted during the session, but it is possible to set the minimum and maximum values. Too strict a configuration of these parameters may cause spurious failovers, while too liberal a setting can degrade the protocol performance and the service offered to the application. The SCTP standard, RFC 2960 [1], has some recommendations for these parameters (PMR = 5, RTOinit = 3000 ms, RTOmin = 1000 ms, RTOmax = 60000 ms).
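To make the role of the RTO bounds concrete, the sketch below follows the standard smoothed round trip time calculation (smoothing gains 1/8 and 1/4, RTO = SRTT + 4 * RTTVAR) and clamps the result to the configured minimum and maximum. The function name, the argument names and the RTT samples are illustrative assumptions and are not taken from any SCTP implementation.

```python
from typing import Iterable


def rto_after(
    rtt_samples_ms: Iterable[float],
    rto_init_ms: float = 3000.0,   # RFC 2960 recommendation
    rto_min_ms: float = 1000.0,    # RFC 2960 recommendation
    rto_max_ms: float = 60000.0,   # RFC 2960 recommendation
    alpha: float = 1 / 8,          # gain for the smoothed RTT
    beta: float = 1 / 4,           # gain for the RTT variation
) -> float:
    """Return the RTO (ms) after processing the given RTT samples.

    Hypothetical helper: it only mirrors the textbook update rules and
    ignores clock granularity and other implementation details.
    """
    srtt = None
    rttvar = 0.0
    rto = rto_init_ms  # used until the first RTT measurement arrives
    for rtt in rtt_samples_ms:
        if srtt is None:
            # First measurement initialises both estimators.
            srtt = rtt
            rttvar = rtt / 2
        else:
            # RTTVAR is updated with the old SRTT, then SRTT is updated.
            rttvar = (1 - beta) * rttvar + beta * abs(srtt - rtt)
            srtt = (1 - alpha) * srtt + alpha * rtt
        # Clamp to the configured bounds.
        rto = min(max(srtt + 4 * rttvar, rto_min_ms), rto_max_ms)
    return rto


# A steady 40 ms round trip time leaves the RTO pinned at the minimum.
steady = [40.0] * 50
rto_default_min = rto_after(steady)
rto_low_min = rto_after(steady, rto_min_ms=80.0)
```

With a short and stable round trip time the computed RTO settles at the configured minimum, so the choice of RTOmin largely decides how quickly the first timeout can fire.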
3 Related Work
Some research has been done to investigate the failover performance of SCTP in relation to telephony signaling requirements. Grinnemo and Brunstrom [2] investigated failover and found PMR to have a great impact on the failover performance. In their experiment the tests were run with a maximum link delay of 20 ms, and under these conditions the link delay was shown to have only a marginal impact on the failover time and the maximum transfer time for a message. A similar study by Jungmaier et al. [6] found the recommendations in the RFC to be far too liberal to meet the telecommunication requirements. In both these previous experiments the retransmission timer was configured very strictly in order to meet the telephony signaling requirements in [7]. This strict configuration excluded the dynamic aspects of the tuning of the parameters available for optimizing failover performance. In our experiment we use the results from these studies, but we enable the retransmission timer to vary between 80 ms and 60 s. We also consider the impact of the SACK delay, which has not been investigated before. The failover performance has also been investigated in a wider context by Caro [8]. He looked at SCTP failover performance over both fixed infrastructures and mobile ad hoc networks, where the traffic on the primary path faces problems due to congestion but the path may become available again during the session. His research focuses on situations where a large file (4 MB) is transferred between hosts. For fixed infrastructures he recommends an aggressive approach to failover, where all further traffic after a timeout is sent on one of the alternate paths, while fast retransmissions are resent on the primary path. His study focuses on bulk transfers, which is quite different from our experiment, since we are motivated by telephony signaling and instead consider several small individual messages where each message has its own transfer time. For this reason the results are not directly comparable.
4 Experimental Design
The purpose of the experiment presented in this paper is to evaluate the impact of link delay and SACK delay on the performance of SCTP-controlled failovers. A typical network scenario for a multihomed session between two hosts is considered, as depicted in Fig. 1, where data is sent from the source to the destination.
Figure 1. Network scenario
4.1 Experimental Setup
The evaluation of the failover performance is done using a network setup illustrated in Fig. 2. The source application
at the source machine generates messages, which in compliance with the congestion window are immediately sent to the sink application at the destination machine. Both the source machine and the destination machine are PCs running Linux 2.6.10. Initially all data is sent on the primary path. After a specific time, long enough for a stationary transmission behavior to occur, the primary path is broken and SCTP then transfers the traffic to the alternate path. The primary path never recovers during the transfer. The time between the path failure and the time when the transfer is handed over to the alternate path is the time for the failover procedure. The test run is ended about 20 seconds after the failover, long enough to regain the stationary transmission behavior on the alternate path.
Figure 2. Experimental setup
The two paths between the source and the destination machines consist of disjoint links of bandwidth 100 Mbps. Both paths include link emulators (L1 and L2 in Fig. 2), which make it possible to restrict the bandwidth as well as to vary the link delays and the size of the router queues. The break of the primary path is also emulated by the emulator L1. The link emulators are PCs running FreeBSD 4.1 and dummynet [9].
4.2 Experimental Parameters The data transferred in the experiment consists of many small messages, which is inspired by the telephony signaling perspective. Several small individual messages of size 250 bytes are sent between the hosts. The messages are sent at a constant rate with an interval of 10 ms. The tests were performed with four different link delays: 20, 40, 60 and 80 ms, where 20 ms roughly represents a domestic link delay, whereas 80 ms roughly represents a transatlantic transfer link delay. Although the use of fixed link delays may not be completely realistic, it was used to better highlight the impact of this parameter. In a well managed telephony signaling network the delays are also typically relatively stable. In order to isolate the behavior of the flow under study, no concurrent traffic was sent during the experiment. The bandwidth in the experiment is the same for both the
primary and the alternate paths between the hosts and is restricted to 1 Mbps. One of the intentions of the experiment is to see the impact of the SACK delay on the failover performance. For this reason we run the experiment with different SACK delays on the destination machine. The default value in the implementation is 200 ms, which is why this value is used in the experiment. Furthermore, test runs with no SACK delay were performed, and to be able to compare the results to other studies a SACK delay of 40 ms was also used. Earlier studies have shown the configuration of the retransmission timer and PMR to have a great impact on the failover performance [2] [6]. Since these studies have shown the recommendations in RFC 2960 to be too liberal to meet the telecommunication requirements, the minimum value for RTO is in this experiment set to 80 ms, while the initial as well as the maximum value for RTO are kept as recommended in the RFC [1] (RTOinit 3000 ms, RTOmax 60000 ms). The configuration used in this experiment enables a faster recognition of problems in the transfer compared to the recommendations in the RFC, since a short round trip time will keep the RTO timer close to the minimum value. Since the range of variation for the RTO timer is wide, it is possible to observe whether the different SACK delays or the varying link delays have an impact on the failover performance. Different values for PMR are used in the experiment; 5 is equal to the RFC recommendation, while lower values have been shown necessary to meet the telephony signaling demands [2]. All the parameters used in the experiment are shown in Table 1.

Table 1. Experimental parameters
Parameter          Values
Message size       250 bytes
Message interval   10 ms
Link delay         20 ms, 40 ms, 60 ms, 80 ms
Bandwidth          1 Mbps
SACK delay         0 ms, 40 ms, 200 ms
RTOinit            3000 ms
RTOmin             80 ms
RTOmax             60000 ms
PMR                2, 3, 4, 5

4.3 Data Gathering
The logging of data took place at both the application and the network levels. The results presented in the paper are measured at the application level. The log data collected at the network level represents the same results and has served as verification and has also been used for analysis. Two performance measures were evaluated in the experiment: the failover time and the maximum message transfer time. The failover time is the time between the failure of the primary path until the path is regarded as down by the
sending machine, which then starts sending all data on the alternate path. This time cannot be calculated exactly in a traffic situation. To estimate the failover time in the experiment we use an approximation where the time for the failover is counted between the following two points in time. The first point is the original send time for the first message that is later resent. This message is sent in close connection to the primary path failure. The second point is the time when the message with the longest transfer time is resent on the alternate path. This is the last packet that was originally sent on the primary path during the failover procedure. From an application perspective the failover time is not critical by itself. The critical part is the delay in the transfer caused by the failover procedure. From the telephony signaling perspective it is thus important to know the longest transfer time for a single message, since too long a delay for a single message may disturb the signaling. This is captured by measuring the maximum message transfer time during a session.
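The two measures can be derived from an application-level log as sketched below. The record layout (`MessageRecord` and its fields) and the function name are hypothetical and only illustrate the approximation described above; the logging format actually used in the experiment is not given here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class MessageRecord:
    """One logged message (hypothetical layout, times in ms)."""
    seq: int
    first_sent_ms: float          # original send time on the primary path
    resent_ms: Optional[float]    # resend time on the alternate path, if any
    received_ms: float            # arrival time at the sink application

    @property
    def transfer_time_ms(self) -> float:
        return self.received_ms - self.first_sent_ms


def failover_metrics(log: List[MessageRecord]) -> Tuple[float, float]:
    """Return (estimated failover time, maximum message transfer time)."""
    resent = [m for m in log if m.resent_ms is not None]
    if not resent:
        raise ValueError("no retransmitted messages in the log")
    # First point: original send time of the first message later resent.
    first_resent = min(resent, key=lambda m: m.first_sent_ms)
    # Second point: resend time of the message with the longest transfer time.
    slowest = max(resent, key=lambda m: m.transfer_time_ms)
    failover_time = slowest.resent_ms - first_resent.first_sent_ms
    max_transfer = max(m.transfer_time_ms for m in log)
    return failover_time, max_transfer


# Made-up log: message 1 is delivered normally, 2 and 3 are resent.
example_log = [
    MessageRecord(1, 0.0, None, 45.0),
    MessageRecord(2, 10.0, 570.0, 615.0),
    MessageRecord(3, 20.0, 575.0, 620.0),
]
failover_ms, max_transfer_ms = failover_metrics(example_log)
```

The numbers in `example_log` are invented for illustration and do not correspond to any measurement in the experiment.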
5 Results
In this section the results from the experiment are presented and discussed. All tests in the experiment were repeated 40 times and the average values of the 40 repetitions are shown in the graphs. For all tests the 95% confidence intervals were below 2%. The confidence intervals are therefore not included in the graphs.
5.1 Link Delay
One of the questions asked in the experiment is what impact the link delay between the hosts has on the failover performance. In the study by Grinnemo and Brunstrom [2], no correlation was seen between the link delay and either the failover time or the maximum transfer time for a message.
Figure 3 shows the failover time as a function of the different link delays used in the experiment. In the tests presented in Fig. 3 a SACK delay of 40 ms is used, but the results for the other SACK delays show the same behavior. The results are presented as separate graphs for the different values of PMR. What is evident in the figure is that the failover time increases linearly as the link delay increases. This indicates a direct relation between the link delay and the failover time. As expected, for a given value of the link delay the failover time doubles with each increase of the PMR, due to the exponential back-off mechanism used for the retransmission timer. Figure 4 shows the maximum message transfer time as a function of the link delay, with the SACK delay set to 40 ms and the results for the different PMR values separated. As for the failover times, for a given PMR value there is a linear increase of the maximum message transfer time with link delay, and for a given link delay the maximum message transfer time doubles with each increase of the PMR.
Figure 3. Failover time versus link delay for different PMR, SACK delay 40 ms
Figure 4. Max message transfer time versus link delay for different PMR, SACK delay 40 ms
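The doubling with each PMR step follows from the exponential back-off of the retransmission timer. As a rough sketch, the time until the path is declared down can be approximated by summing the successive timeout intervals, assuming that the RTO doubles after every expiry and is capped at RTOmax. The number of timeouts is passed explicitly, since whether it equals PMR or PMR + 1 depends on how an implementation compares the error counter with the threshold. The function name and arguments are illustrative assumptions.

```python
def failover_time_estimate(
    rto_ms: float,
    timeouts: int,
    rto_max_ms: float = 60000.0,
) -> float:
    """Sum the timeout intervals of `timeouts` consecutive expirations.

    Assumes the RTO at the moment of failure is `rto_ms`, that it doubles
    after each expiry, and that it never exceeds `rto_max_ms`.
    """
    total = 0.0
    current = rto_ms
    for _ in range(timeouts):
        total += current
        current = min(current * 2, rto_max_ms)  # exponential back-off
    return total


# Starting from an 80 ms RTO: 80 + 160 + 320 and 80 + 160 + 320 + 640.
three_timeouts_ms = failover_time_estimate(80.0, 3)
four_timeouts_ms = failover_time_estimate(80.0, 4)
```

Without the cap the sum equals `rto_ms * (2**timeouts - 1)`, so each additional timeout roughly doubles the total, and the total scales linearly with the RTO at the moment of failure.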
The correlation between the link delay and both the failover time and the maximum message transfer time was not found in the study by Grinnemo and Brunstrom [2]. The reason for this is that the strict configuration of the RTO they used prevented the link delay from having any impact on the results. Another observation when comparing the results found in this experiment to the results by Grinnemo and Brunstrom is that in this experiment the failover times and the maximum message transfer times are almost equal, while the maximum message transfer times are significantly lower than the failover times in the study by Grinnemo and Brunstrom. The reason for this divergence is the different RTO configurations and different link delays used. Since the traffic is paced out from the application at a constant rate of 100 messages per second, the send buffer on the sender side will start to fill up with messages waiting to be sent to the destination when a failure occurs. For example, in this experiment and for a link delay of 80 ms, about 33 messages will arrive at the send buffer before the first timeout. Not all of these messages will be sent to the destination machine until the failover procedure is finished, and during this time more messages will be buffered at the sending machine. This situation does not occur for tight RTO configurations and low link delays.
5.2 SACK Delay
The second question asked in the experiment is what impact the SACK delay has on the failover performance. Figure 5 shows the mean value of the failover time as a function of the link delay. The results shown in Fig. 5 are based on a PMR of 2. Although not displayed, the same behavior was seen for all PMR values. The results for the different SACK delays are presented as separate graphs in the figure.
Figure 5. Failover time versus link delay for different SACK delays, PMR 2
All graphs in Fig. 5 start at the same point at link delay 20 ms. This is due to the setting of the RTO parameter, which is restricted to a minimum of 80 ms. For this reason all settings of SACK delay reach the minimum value of RTO at short link delays, which leads to the same failover times. Furthermore, it is visible that the graphs for SACK delays of 40 ms and 200 ms stay close together for all link delays. This is due to the traffic pattern used in the experiment. Since messages are sent at a constant rate with a 10 ms interval, the SACK delay timer will never time out during normal transfer, whether the SACK delay is 40 ms or 200 ms. For this reason both these settings show the same results. The interesting part in Fig. 5 is that the graph for no SACK delay diverges significantly from the other graphs in the figure and shows a shorter failover time compared to the other settings at the same link delay. This indicates that the SACK delay used in the system has a major impact on the failover time.
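The observation that the 40 ms and 200 ms settings behave alike can be illustrated with a small receiver model. The sketch assumes the usual delayed acknowledgement rule, where a SACK is sent when a second packet arrives or when the SACK delay timer expires, whichever comes first, and that a SACK delay of zero means every packet is acknowledged immediately. The function name and the arrival pattern are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def sack_wait_times(
    arrivals_ms: Sequence[float], sack_delay_ms: float
) -> Tuple[List[float], int]:
    """Return (per-packet wait before its SACK is sent, timer expirations)."""
    waits: List[float] = []
    expirations = 0
    pending = None  # arrival time of a packet still waiting for its SACK
    for t in arrivals_ms:
        if sack_delay_ms == 0:
            waits.append(0)          # acknowledged immediately
            continue
        if pending is not None and t - pending >= sack_delay_ms:
            # The timer fired before this packet arrived.
            waits.append(sack_delay_ms)
            expirations += 1
            pending = None
        if pending is None:
            pending = t              # first packet: start the timer
        else:
            # Second packet: SACK now, covering both packets.
            waits.extend([t - pending, 0])
            pending = None
    if pending is not None:
        # Last packet is acknowledged when the timer finally expires.
        waits.append(sack_delay_ms)
        expirations += 1
    return waits, expirations


# Ten messages arriving every 10 ms, as in the traffic pattern above.
arrivals = list(range(0, 100, 10))
waits_40, fired_40 = sack_wait_times(arrivals, 40)
waits_200, fired_200 = sack_wait_times(arrivals, 200)
waits_0, fired_0 = sack_wait_times(arrivals, 0)
```

Under these assumptions the timer never fires for either non-zero setting, because the second packet always arrives after 10 ms. Both settings therefore delay every other acknowledgement by the same amount, whereas the zero setting adds no delay at all.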
Figure 6. Maximum message transfer time versus link delays for different SACK delays, PMR 2
Next, the impact of the SACK delay on the maximum transfer time for a message is investigated. In Fig. 6 the maximum message transfer time is shown as a function of link delay. Again, the results shown in the figure are for a PMR of 2, but the same behavior was seen for all PMR values. The graphs in the figure represent the different SACK delays used during the experiment. Here too, as for the failover times above, all graphs start at the same point at link delay 20 ms. The reason for this, as well as for the closeness of the graphs for SACK delays 40 ms and 200 ms, is the same as for the failover times above. From Fig. 6 it is clearly visible that the graph for no SACK delay diverges significantly from the other graphs in the figure and shows a shorter maximum message transfer time. The SACK delay thus also has a major impact on the maximum transfer time for a message.
5.3 Analysis
The results clearly indicate that both the link delay and the SACK delay have an impact on both the failover time and the maximum transfer time of a message. The impact caused by the increasing link delay is quite intuitive. A longer link delay causes longer round trip times, and the round trip times are the basis for the RTO calculation [10]. Thus a longer link delay means a longer time before the first lost packet is discovered by the sender. Due to the exponential back-off of the RTO timer, its value increases quickly as the PMR increases. The impact of the SACK delay is also related to its effect on the RTO. Sample values for the RTO at the point of failure have been calculated for tests with a PMR of 2 and different SACK delays. These values can be found in Table 2. They are sample values, but the values for all the repetitions are almost the same, so they can be seen as representative for all samples. It is intuitive that the RTO value is 80 ms at 20 ms link delay, since 80 ms is the minimum value. It is also seen that the RTO values increase as the link delays increase. For all link delays longer than 20 ms the RTO values are lower with no SACK delay than with the other SACK delays, which explains the performance improvement seen for this case. For SACK delays of 40 ms and 200 ms the RTO values are very similar. This leads to the very similar performance, and the reason for this is the traffic pattern, as described above.

Table 2. Sample RTO values (ms) at the moment of failover at PMR 2
Link delay   SACK delay 0 ms   SACK delay 40 ms   SACK delay 200 ms
20 ms        80                80                 80
40 ms        94                117                116
60 ms        136               155                155
80 ms        175               195                195

6 Conclusion and Future Work
The intention of SCTP multihoming is to offer a robust service where a failover, in case of path failure, is to be as transparent to the application as possible. To be able to offer this service the performance of the failover mechanism is crucial. The intent of the experiment presented in this paper was to investigate the impact of the link delay and of the SACK delay on the failover time and on the maximum transfer time for a message. A simple traffic pattern without concurrent traffic was used in the experiment, but the results nevertheless provide some interesting findings. The experiment has shown a significant impact of the link delay as well as the SACK delay on the failover performance. The most interesting result from the study is the impact of the SACK delay, where the introduction of a SACK delay on the data receiver side significantly degrades the failover performance. This indicates that tuning the SACK delay to zero, or to a value close to zero, could be used to improve application performance. From a telephony signaling perspective, where a well managed network can be assumed, the extra traffic overhead that such a tuning may create may be well offset by the gain in failover performance that can be expected. Further studies with more complex traffic models and network topologies are needed to verify this conclusion, and we plan to conduct such studies in the future. The great importance of the configuration of PMR for failover performance has been shown in earlier studies and is also evident from the presented results. The impact of the SACK delay shown in the study is, however, present with a similar behavior for all PMR values. The results thus further indicate that the SACK delay and the PMR can be independently tuned to optimize failover performance without consideration of complex interaction effects.
Acknowledgment The authors would like to thank the Flux Research group at the University of Utah for providing the Emulab testbed [11]. The work is supported by grants from the Knowledge Foundations of Sweden with TietoEnator and Ericsson as industrial partners.
References
[1] R. Stewart, Q. Xie, K. Morneault, C. Sharp, H. Schwarzbauer, T. Taylor, I. Rytina, M. Kalla, L. Zhang, and V. Paxson. RFC 2960: Stream Control Transmission Protocol, October 2000.
[2] K-J Grinnemo and A. Brunstrom. Performance of SCTP-controlled failovers in M3UA-based SIGTRAN networks. Proc. Advanced Simulation Technologies Conference, Hyatt Regency Crystal City, Arlington, Virginia, USA, April 2004.
[3] J. Postel. RFC 793: Transmission control protocol, September 1981.
[4] M. Allman, V. Paxson, and W. Stevens. RFC 2581: TCP congestion control, April 1999.
[5] C. Labovitz, A. Ahuja, A. Bose, and F. Jahanian. Delayed internet routing convergence. IEEE/ACM Transactions on Networking, 9(3):293–306, June 2001.
[6] A. Jungmaier, E. Rathgeb, and M. Tuexen. On the use of SCTP in failover scenarios. Proc. of the 6th World Multiconference on Systemics, Cybernetics and Informatics, pages 363–368, July 2002.
[7] ITU-T. Q.706: Signalling system no.7 - message transfer part signalling performance. ITU-T, March 1993.
[8] A. Caro Jr. End-to-end Fault Tolerance using Transport Layer Multihoming. PhD thesis, University of Delaware, 2005.
[9] L. Rizzo. Dummynet: A simple approach to the evaluation of network protocols. ACM Computer Communication Review, 27(1):31–41, January 1997.
[10] V. Paxson and M. Allman. RFC 2988: Computing TCP’s Retransmission Timer, November 2000.
[11] B. White, J. Lepreau, L. Stoller, R. Ricci, S. Guruprasad, M. Newbold, M. Hibler, C. Barb, and A. Joglekar. An integrated experimental environment for distributed systems and networks. Proc. of the Fifth Symposium on Operating Systems Design and Implementation, pages 255–270, Boston, MA, December 2002. USENIX Association.