SCTP Switchover Performance Issues in WLAN Environments
Sheila Fallon*, Paul Jacob*, Yuansong Qiao*, Liam Murphy**, Enda Fallon*, Austin Hanley*
* Software Research Centre, Athlone Institute of Technology, Ireland
** Performance Engineering Laboratory, University College Dublin, Ireland
E-mail: [email protected], {pjacob, ysqiao, efallon, ahanley}@ait.ie, Liam.Murphy@ucd.ie
Abstract- The increased number and diversity of underlying networks have made transparent network migration a necessity. Through its support for multi-homing the Stream Control Transmission Protocol (SCTP) enables seamless network mobility by abstracting multiple underlying physical paths into a single end-to-end association. One of these paths is selected as the primary. When a number of retransmission failures occur on the primary path, switchover is initiated to a secondary path. The number of retransmission attempts before switchover is initiated can be configured; however, the delay between each retransmission is managed internally in SCTP using a Retransmission TimeOut (RTO) value. This paper shows that the current SCTP mechanism for calculating RTO values is inappropriate in WLAN environments, since increased Round Trip Times (RTT) significantly distort RTO calculations. Experimental and simulated results indicate that SCTP behaves in a counterintuitive manner which allows more time for switchover as network conditions degrade: delays of up to 187 seconds can be experienced before switchover occurs. We show that additional SCTP parameters need to be carefully configured in order to reduce this switchover delay to a more acceptable level.
Index Terms- SCTP, WLAN, Switchover, Heterogeneous Networks
I. INTRODUCTION
In recent years the demand for a seamless end user experience, particularly for real time applications such as Voice over IP (VoIP), has led to the wide scale deployment of heterogeneous access network types such as Wireless LAN (WLAN), WiMax and 3G. The availability of multiple network types has resulted in end user devices supporting multiple network interfaces. While the pervasive availability of multiple network types creates opportunities for application developers, the heterogeneity of these networks in terms of range and capacity creates the need for a network independent migration solution.

Originally designed as a transport protocol for Signaling System 7 (SS7) data across IP networks, the transport layer protocol SCTP [1] is well positioned to address these issues. As a transport layer protocol SCTP shares some of the features of TCP, as well as introducing enhancements to support network mobility. Foremost amongst these enhancements is its support for multi-homing: the ability to implement an end to end communication session transparently over multiple physical paths where the end point of each path is identified by an IP address.

Given their ubiquitous deployment, WLAN networks will be a critical element of any pervasive heterogeneous network implementation. Their limited signal range, however, means that any effective WLAN enabled area will be achieved through multiple overlapping zones of coverage. The implication of these overlapping zones of coverage is that multiple network switchovers will be required to support, for example, a mobile VoIP client session. It is critical therefore that handovers are implemented in an application transparent manner.

Through its support for multi-homing SCTP is a suitable technology to implement network mobility. This paper, however, identifies significant deficiencies which affect the use of SCTP in a WLAN environment. These deficiencies result from the mechanism by which SCTP determines when a network switchover should occur. Results presented will illustrate that the current SCTP standard behaves in a counterintuitive manner. As a mobile node moves from the coverage of the Access Point (AP) hosting the primary path, RTT and loss rates increase significantly. Rather than interpreting the network quality degradation as an indicator of imminent primary path failure and implementing an immediate switch, SCTP significantly delays the switchover. These findings also have implications in relation to how an aggressive SCTP switch strategy is best implemented in a WLAN environment.

This paper is organized as follows. Section II details related work in the area. Section III describes in detail the SCTP path management functionality. Section IV presents the experimental test configuration and results. Using result data from Section IV as input, Section V describes the simulated study undertaken and presents results. Finally conclusions are discussed in Section VI.

II. RELATED WORK
In this paper we investigate the performance of SCTP switchover in a WLAN environment.
Experimental results indicate that in a WLAN environment there exists a point of performance transition from a low-loss, low-delay regime to high-loss, high-delay operation. This sudden transition in performance is observed as a mobile node moves from the zone of coverage of the AP hosting the primary path.

In [2] the tradeoff between buffering and loss for voice traffic in 802.11 networks is investigated. An abrupt transition from the low-loss, low-delay regime to high-loss, high-delay operation is observed. The study indicates that below the point of performance transition AP buffer sizing has little impact on throughput. However, once the point of performance transition is passed, total delay depends strongly on AP buffer size. In [3] the performance of SCTP, TCP Reno and Eifel is compared during delay spikes. Results indicate that in the presence of delay spikes without packet loss, SCTP and TCP Reno have similar performance. That study does not consider continuously increasing RTT.

A number of studies have been undertaken which investigate the performance of SCTP switchover. In [4] an analytical study of SCTP failover is undertaken which indicates that the current mechanism for calculating the duration of an SCTP switchover is unsatisfactory. Two additional parameters are introduced to the SCTP failover strategy in order to more accurately reflect the exact time at which catastrophic primary path failure occurs. In [5] performance implications of the use of heterogeneous wireless networks with differing bandwidths are presented. In [6] it is suggested that the SCTP handover strategy is reactive in nature and that a more proactive approach, where handover is based on path delays, should be introduced in order to preempt and avoid path failures. In [7] an SCTP handover scheme for VoIP applications in heterogeneous networks is presented. The proposed handover scheme is based on the ITU-T E-Model for voice quality. In [8] failover mechanisms for transport layer protocols are investigated. Results indicate that an aggressive failover strategy does not degrade performance and often improves goodput.
III. CURRENT SCTP SWITCHOVER MANAGEMENT
As a multi-homed protocol SCTP has the ability to abstract multiple underlying physical paths, where the end point of each is identified by an IP address, into a single end to end association. At association start-up, one of these physical paths is defined as the primary path. This primary path is normally used for packet transmission. When the primary path fails, a backup path is selected as the new primary path.

The SCTP path management function monitors the reachability of each destination address using heartbeat packets when other packet data is inadequate to provide this information. If no data chunks have been sent to a destination address within the current heartbeat interval the address is marked as “idle”. If the sender receives the expected acknowledgement from its peer within a designated period the address is marked as “active”.
If an SCTP sender does not receive a response for an SCTP data chunk from its receiver within the Retransmission TimeOut (RTO) period, the sender considers this data chunk lost. If the number of such losses exceeds the SCTP parameter Path.Max.Retrans, the address is marked as inactive by the sender. RTO is therefore a very important factor for handover and for the stability of the protocol.

RTO is calculated separately for each destination address, based on the Smoothed Round-Trip Time (SRTT) and Round-Trip Time Variation (RTTVAR) of the path. It is initialized with RTO.Initial, an SCTP parameter which can be configured by the user:

$$\text{RTO} = \text{RTO.Initial} \qquad (3.1)$$

The SRTT and RTTVAR of a path are calculated from measurements of the Round-Trip Time (RTT) of the path. An RTT measurement for a path is made for every round trip. When SCTP gets the first measurement of RTT, RTT.1st, SRTT and RTTVAR are initialized as:

$$\text{SRTT} = \text{RTT.1st} \qquad (3.2)$$

$$\text{RTTVAR} = \frac{\text{RTT.1st}}{2} \qquad (3.3)$$

And RTO is updated to:

$$\text{RTO} = \text{SRTT} + 4 \times \text{RTTVAR} \qquad (3.4)$$

Each time SCTP gets a new measurement of RTT, RTT.new, SRTT and RTTVAR are updated as follows:

$$\text{RTTVAR.new} = (1-\beta) \times \text{RTTVAR.old} + \beta \times |\text{SRTT.old} - \text{RTT.new}| \qquad (3.5)$$

$$\text{SRTT.new} = (1-\alpha) \times \text{SRTT.old} + \alpha \times \text{RTT.new} \qquad (3.6)$$

where β and α are constants whose recommended values are 1/4 and 1/8 respectively [1]. The new RTO is then:

$$\text{RTO} = \text{SRTT.new} + 4 \times \text{RTTVAR.new} \qquad (3.7)$$

If the new RTO is less than RTO.Min, it is set to RTO.Min. If the new RTO is greater than RTO.Max, it is set to RTO.Max. Every time a transmission timeout occurs for an address, the RTO for this address is doubled:

$$\text{RTO} = \text{RTO} \times 2 \qquad (3.8)$$

If the new RTO is greater than RTO.Max, RTO.Max is used for the new RTO. If the sender gets a response from the receiver and a new RTT is measured, SCTP uses this new RTT to calculate RTTVAR, SRTT and finally RTO using equations (3.5) to (3.7).

The default values from [1] for the SCTP parameters which are used to implement the switchover management strategy are:

Parameter          Recommended Value
RTO.Initial        3 seconds
RTO.Min            1 second
RTO.Max            60 seconds
Path.Max.Retrans   5 attempts
HB.interval        30 seconds
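The update rules (3.1) to (3.8) and the Path.Max.Retrans check can be sketched in a few lines of Python. This is a minimal illustration rather than the sctplib or NS2 implementation: the class name `PathRtoState` and its method names are hypothetical, times are expressed in seconds, and resetting the loss count whenever an acknowledgement yields a new RTT sample is an assumption. The RTT samples in the usage loop are the values listed in Table 2.

```python
from dataclasses import dataclass
from typing import Optional

ALPHA = 1 / 8  # gain applied to SRTT in (3.6)
BETA = 1 / 4   # gain applied to RTTVAR in (3.5)


@dataclass
class PathRtoState:
    """Hypothetical per-destination state; all times are in seconds."""
    rto_initial: float = 3.0
    rto_min: float = 1.0
    rto_max: float = 60.0
    path_max_retrans: int = 5
    srtt: Optional[float] = None
    rttvar: Optional[float] = None
    rto: float = 0.0
    error_count: int = 0
    active: bool = True

    def __post_init__(self) -> None:
        self.rto = self.rto_initial  # (3.1)

    def on_rtt_sample(self, rtt: float) -> float:
        """Update SRTT, RTTVAR and RTO from a new RTT measurement."""
        if self.srtt is None or self.rttvar is None:
            self.srtt = rtt        # (3.2)
            self.rttvar = rtt / 2  # (3.3)
        else:
            # (3.5) needs the old SRTT, so RTTVAR is updated first.
            self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - rtt)
            self.srtt = (1 - ALPHA) * self.srtt + ALPHA * rtt  # (3.6)
        rto = self.srtt + 4 * self.rttvar  # (3.4) and (3.7)
        self.rto = min(max(rto, self.rto_min), self.rto_max)
        self.error_count = 0  # assumption: an acknowledgement clears the loss count
        return self.rto

    def on_timeout(self) -> float:
        """Count a loss, mark the path inactive past Path.Max.Retrans, double RTO."""
        self.error_count += 1
        if self.error_count > self.path_max_retrans:
            self.active = False
        self.rto = min(self.rto * 2, self.rto_max)  # (3.8)
        return self.rto


if __name__ == "__main__":
    path = PathRtoState()
    for sample in (0.4, 0.7, 0.7, 1.3, 1.6):  # RTT samples in seconds
        print(f"RTT {sample:.1f}s -> RTO {path.on_rtt_sample(sample):.3f}s")
    while path.active:
        print(f"timeout -> RTO {path.on_timeout():.3f}s")
```

Because every timeout doubles an RTO that was itself derived from the most recent RTT samples, a path whose RTT has been rising starts its sequence of timeouts from a larger baseline, which is the effect examined in the remainder of the paper.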
IV. EXPERIMENTAL RESULTS
A. Test Configuration
Two laptops, Laptop 1 representing a mobile client and Laptop 2 a back end server, are connected via two 802.11g access points. The configuration is illustrated in Figure 1. Both laptops were configured with SCTPLIB [10]. The SCTP client was hosted by a Dell D820 laptop which was multihomed through an internal Dell Wireless 802.11g WLAN card together with an external Belkin 802.11g wireless card. The SCTP server was hosted by a Dell D800 laptop. The client and server were networked by 2 Linksys WRT54GL 802.11g access points. The internal wireless card on the client laptop was configured with IP address 192.168.2.109 while the external Belkin wireless card was configured with 192.168.1.115. On the server side 2 Ethernet connections were configured. Ethernet connection 1 was configured with IP address 192.168.2.150 while Ethernet connection 2 was configured with 192.168.1.112.

Figure 1. Test Network Configuration

All IP addresses were statically configured to ensure that 2 distinct paths were created. Each test was initiated with Path 1 (192.168.1.115 - 192.168.1.112) as primary and Path 2 (192.168.2.109 - 192.168.2.150) as secondary. Each test started adjacent to access point 2 (which hosted the primary path). The mobile node then moved at slow walking pace directly towards access point 1 (which hosted the secondary path). 25 tests were executed in which the SCTP client sent ten packets, with a payload of 1500 bytes, every 50 milliseconds. This value was chosen to represent a constant bit rate required for media streaming applications of 2.4Mbits/sec.

B. Illustrating the Effect of Increased RTT with a Single Test
The manner by which RTO is calculated is critical to this investigation. The deficiencies in SCTP RTO calculation are initially best illustrated using a single test. Later sections detail the aggregated results of multiple tests. As the mobile node moves from the coverage area of the AP hosting the primary path the signal strength degrades and results in intermittent network connectivity. The average signal strengths for the twenty five second periods 1..25, 26..50, 51..75, 76..100 were -69.5, -78.5, -88 and -97 dBm respectively. As the signal strength degrades the RTT and corresponding RTO increase. Figure 2 illustrates the RTO, SRTT, RTT and RTTVAR for the selected test.

[Figure 2 plot: RTT (ms) and RTO (ms) against Time (Secs); series SRTT, RTTVAR, RTT and RTO]
Figure 2. RTT and RTO Values for Selected Test

After 100 seconds the RTT increases significantly. For the period 100-250 seconds the calculation of RTO is based on the recorded RTT value by applying (3.5), (3.6) and (3.7). As a result of the continuously high RTT value the baseline RTO value also increases significantly. After 250 seconds communication on the primary path fails. As the packet retransmission failures occur (3.8) is applied, which doubles an already high baseline RTO value. Table 1 details the average RTO values for various periods during the test.

Time (s)   0-100   101-250   251-300   301-350
RTO (ms)   1014    3148      20714     55040

Table 1. Average RTO Values by Interval

In order to illustrate how intermittent network connectivity affected RTO calculation a Wireshark [12] trace was taken. Details of packet transmission for the period 100-110 seconds are shown in Figure 3.
Figure 3. Wireshark Trace for Interval 100-110 Secs

At approximately 100 seconds TSN 27933 is successfully transmitted. The SCTP client continues to send packets which utilize sequential TSNs. At t = 100.2 secs TSN 28026 is transmitted. As a result of degrading signal quality packets with TSN 27991 to TSN 28008 are lost. Using the standard retransmission strategy defined in [1] packets were successfully retransmitted on the standby path. At t = 100.6 secs the SCTP server cumulatively ACKs all TSNs up to 28026. Using this cumulative ACK the SCTP client determines that the RTT for TSN 28026 was 400ms.

Table 2 details the transmission and acknowledgement times for a selection of packets which were successfully transmitted on the primary path.

TSN     Time Sent (s)   Time Acked (s)   RTT (ms)
28026   100.2           100.6            400
28069   100.7           101.4            700
28268   102.7           103.4            700
28415   104             105.3            1300
28484   104.7           106.3            1600

Table 2. RTT Calculation by Packet

After 100 seconds the RTO was 1000ms, SRTT was 107ms, RTTVAR was 79ms and the recorded RTT was 7ms. Figure 4 illustrates how increasing RTT affects the SRTT, RTTVAR and subsequent RTO calculations.

[Figure 4 plot: Time (ms) against Test Time (Sec) from 100 to 110; series RTO, SRTT, RTTVAR and RTT]

Figure 4. RTT and RTO Values for Time 100-110 Secs

Within a period of 10 seconds the RTO has increased from 1000ms to 2802ms. Table 1 indicated that the average RTO between 100 and 250 seconds was 3.148 seconds. If a retransmission timeout occurred during this time period (3.8) would be applied. Starting from the 2802ms value, the first timeout would double the RTO to 5604ms. Using the default PMR value of 5 it would take 2.8+5.6+11.2+22.4+44.8+60=146.8 seconds for switchover to occur, the final term being capped at RTO.Max. While a PMR value of 5 is generally accepted to be excessively large, lower PMR values also result in significant switch delays. Using PMR values of 0, 1 and 2, delays of 2.8, 8.4 and 19.6 seconds respectively are experienced. These delays are excessively large.

C. Summarized Experimental Results
Using the test configuration detailed in Section A, 25 tests were executed and the results are illustrated in Figure 5.

Figure 5. Average RTO and RTT Values

The experimental results indicate that SCTP behaves in a counterintuitive manner. Figure 5 illustrates that there is a continuous and significant increase in RTO between approximately 40 and 60 seconds as a result of increased RTT. This increase is a result of intermittent network connectivity as the mobile node moves from the coverage of the access point hosting the primary path. It would be desirable for SCTP to interpret this degradation in network quality as an imminent path failure. On the contrary, Figure 5 illustrates that at approximately 60 seconds SCTP significantly increases RTO. In this way more time is allocated before switchover occurs. This behavior effectively delays switchover when it is required most.

V. SIMULATION RESULTS
In order to illustrate how the SCTP RTO.max parameter could be used to limit the delay before switchover, an NS2 [9] simulation was created which utilized the University of Delaware's SCTP module [11]. The simulation used the results of the experimental study described previously as a basis. Figure 6 illustrates the simulation topology, which reflects the experimental test configuration illustrated in Figure 1.
Figure 6. NS2 Simulation Configuration

Node S and Node R are the SCTP sender and receiver respectively. Both SCTP endpoints have two addresses. R1,1, R1,2, R2,1 and R2,2 are routers. The implementation is configured with no overlap between the two paths. Node S begins to send data to Node R after 0.5 seconds at a rate of 2.4Mbps. The results of the experimental study indicated that the average time of primary path communication failure was 64 seconds. In the simulation the loss rate for the primary path was set to 100% at 64 seconds. Figure 7 illustrates the accumulated bytes transmitted for a range of PMR values; retransmissions are not included.
Figure 7. Bytes Transmitted By PMR

Using the PMR value of 5 and an RTO.max value of 60 seconds resulted in a switchover after 250.61 seconds, 186.61 seconds after catastrophic primary path failure occurred. It is widely accepted that the PMR value of 5 suggested in [1] is excessively large. Previous studies have suggested that values in the range 0-2 should be used. Simulation results indicate, however, that as a result of the manner by which SCTP RTO values are calculated even these aggressive PMR values do not result in acceptable switch performance. With an RTO.Max value of 60 seconds and PMR set to 2, 1 and 0, the delays between catastrophic failure of the primary path and subsequent switchover are 34.1, 15.84 and 6.36 seconds respectively.

In order to address the excessive delay between primary path failure and switchover a number of simulations were undertaken which limited the increase in RTO by utilizing the RTO.max parameter. Table 3 illustrates the switchover times, in seconds, for PMR values 0 to 5 when the maximum RTO is set to 60, 30, 10, 5 and 1 second respectively.

            RTO.Max (s)
PMR Value   60       30      10      5       1
5           186.61   126     58.24   34.37   9.14
4           130.91   95.4    47.65   28.77   7.53
3           70.31    64.8    37.05   23.17   6.53
2           34.1     34.1    26.44   17.57   5.53
1           15.84    15.84   15.84   11.97   4.52
0           6.36     6.36    6.36    6.36    2.92

Table 3. Switchover Times For PMR and RTO.Max

Results indicate that the PMR value in isolation is not sufficient to guarantee an aggressive switch strategy. With PMR=0, delays of 6.36 seconds were experienced before switchover occurred. This is due to the excessively large RTO values calculated using (3.5), (3.6) and (3.7) as a result of increased RTT in a WLAN environment. Limiting PMR alone therefore does not yield the aggressive switchover strategy recommended in [8]. Table 3 illustrates that limiting values for both PMR and RTO.max should be employed.

VI. CONCLUSIONS AND FUTURE WORK
This paper presents results which indicate that the current SCTP implementation behaves in a counterintuitive manner by allowing more time for switchover as network conditions degrade in a WLAN environment. As a mobile node moves from the coverage of the access point hosting the primary path, RTT and loss rates increase significantly. It would be desirable for SCTP to interpret this degradation in network quality as an imminent path failure. On the contrary, SCTP increases the RTO, further delaying the time at which switchover occurs: this effectively delays switchover when it is most required. Results also indicate that the PMR parameter in isolation cannot define an aggressive switch strategy in a WLAN environment: it is necessary to set limiting values for both PMR and RTO.Max in order to ensure that an appropriate switchover strategy is obtained. Future work includes developing an SCTP switch management algorithm which will recognize continuously increasing RTT as an indicator of imminent path failure in a WLAN environment.

REFERENCES
[1] R. Stewart et al., "Stream Control Transmission Protocol", IETF RFC 2960, Oct. 2000.
[2] Malone, D. W., Clifford, P., Leith, D. J., "On Buffer Sizing for Voice in 802.11 WLANs", IEEE Communications Letters, 2006.
[3] Fu, S., Atiquzzaman, M., Ivancic, W., "Effect of delay spike on SCTP, TCP Reno, and Eifel in a wireless mobile environment", Proceedings of the Conference on Computer Communications and Networks, 2002.
[4] Budzisz, L., Ferrus, R., Grinnemo, K., Brunstrom, A., Ferran, C., "An Analytical Estimation of the Failover Time in SCTP Multihoming Scenarios", Wireless Communications and Networking Conference (WCNC), 2007.
[5] Qiao, Y., Fallon, E., Murphy, L., Murphy, J., Hanley, A., Zhu, X., "SCTP Performance Issue on Path Delay Differential", Wired/Wireless Internet Communications (WWIC), 2007.
[6] Kelly, A., Muntean, G., Perry, P., Murphy, J., "Delay-Centric Handover in SCTP over WLAN", Transactions on Automatic Control and Computer Science, 49, 63 (2004), 1-6.
[7] Fitzpatrick, J., Murphy, S., Murphy, J., "An Approach to Transport Layer Handover of VoIP over WLAN", Proc. of IEEE Consumer Communications and Networking Conference (CCNC), 2006.
[8] Caro, A., Amer, P., Stewart, R., "Rethinking End-to-End Failover with Transport Layer Multihoming", Annals of Telecommunications, 2006.
[9] UC Berkeley, LBL, USC/ISI, and Xerox Parc, ns-2 documentation and software, Version 2.29, Oct. 2005, www.isi.edu/nsnam/ns.
[10] SCTP library (sctplib), version sctplib-1.0.5, www.sctp.de
[11] A. Caro et al., ns-2 SCTP module, Version 3.5, www.armandocaro.net/software/ns2sctp/.
[12] G. Combs et al., Wireshark network protocol analyzer, Version 0.99.5, www.wireshark.org