Stall and Path Monitoring Issues in SCTP

James Noonan, Philip Perry, Seán Murphy, John Murphy
School of Computer Science and Informatics, University College Dublin, Dublin 4, Ireland

Abstract: This paper shows how SCTP can stall during failover in certain multi-homed scenarios. A stall occurs when an SCTP end-point ceases to communicate for an extended period of time but does not report any error to the upper layer. Two different sets of circumstances that lead to a stall are presented: firstly, when the Retransmission Time-Out (RTO) value for a redundant network path is underestimated; and secondly, when a network error causes only SACKs to be lost, which confuses the SCTP sender about which network path is operational. Solutions to both stalls are presented; they include modifying the RTO value, applying Karn’s algorithm to path monitoring, and changing the destination address selection policy for SACKs. This paper also presents a mechanism to de-couple data acknowledgement and path monitoring in multi-homed transport protocols, which should remove the ambiguity in path monitoring and offer a universal solution to the stall.

I. INTRODUCTION

Multi-homed protocols will play an increasingly prominent role in the future of the Internet for a number of reasons. Applications are often dependent on their connection to the Internet and cannot afford to have this as a single point-of-failure; multi-homing is an effective way to provide redundancy by connecting to multiple service providers. Also, mobile applications can improve network availability and reduce handover latency by simultaneously connecting to multiple wireless access networks. It is therefore expected that a growing number of Internet nodes will connect to the Internet using multiple interfaces.

Multi-homing presents a new challenge to Internet applications and protocols. The traditional transport protocols of Transmission Control Protocol (TCP) and User Datagram Protocol (UDP) only consider a single connection to the Internet. Support for multi-homing partly motivated the design of a new transport protocol, Stream Control Transmission Protocol (SCTP). SCTP is a peer-to-peer transport protocol for reliable data exchange, similar to TCP, but it also supports a number of features not found in TCP, such as multi-homing, preservation of message boundaries and multiple logical streams. The issues examined in this paper relate to multi-homing.

This paper reports two scenarios where end-points using SCTP effectively cease to communicate but where no problem is reported to the upper-layer. This is termed a stall. It is triggered by a network path failure, where SCTP would be expected to perform a failover. After a successful failover, data is transmitted to an alternate destination address, normally over a different network path, but a stall either prevents or significantly delays a failover.

In both cases, the stall is caused by the SCTP sender misinterpreting Selective Acknowledgements (SACKs) and mistakenly deciding that a failed network path is operational. The first scenario is caused by an inaccurate estimate of the Round-Trip Time (RTT) of the alternate path. This results in unnecessary retransmissions on the failed path that then seem to be acknowledged by the late SACKs. The second scenario is caused by a network error that only affects traffic in one direction, coupled with the SCTP receiver using two different policies for selecting the destination address of SACKs. This makes it difficult for the data sender to isolate the network failure. Both scenarios are described in detail in Section IV.

By analysing this behaviour, a number of different solutions to the stall are identified. Although both scenarios can be resolved independently, an underlying problem for multi-homed protocols is identified, which prevents the correct identification of failed network paths. A contributing cause to this weakness is the coupling of the SCTP algorithms for path monitoring and data acknowledgement. It is suggested that a mechanism to de-couple these algorithms not only removes both the stall scenarios described, but should also prevent other cases where stalls might occur that have not been discovered. Section V describes the different approaches to resolving the stall situations.

Existing research that examines SCTP either compares its performance to TCP under similar circumstances [1] [2] or studies issues that are not applicable to TCP [3] [4]. As multi-homing is not supported by TCP, the issues examined in this paper are relevant only to SCTP and possibly future multi-homing protocols. This paper reports two different situations where an SCTP association stalls which, to our knowledge, are reported in the literature for the first time. One of the suggested solutions is to de-couple path monitoring from data acknowledgements. This is similar to other suggested modifications to SCTP that focus on different problems to the ones discussed here [5] [6].

The paper is set out as follows. Section II examines related work. SCTP, and multi-homing in particular, are described in Section III. The two stall scenarios are described in detail in Section IV and the proposed solutions are discussed in Section V. The conclusions are presented in Section VI.

II. RELATED WORK

The behaviour of multi-homed SCTP associations and stalls is studied here. The fundamental problem is that there is ambiguity about which destination address a SACK refers to when data chunks have been transmitted to multiple destination addresses. A similar problem was faced by TCP when measuring RTTs with packets that had been retransmitted: the sender could not know if the ACK referred to the first or second transmission. The solution, Karn’s Algorithm [7], discarded measurements from retransmitted data segments, a policy adopted by SCTP. However, retransmitted data packets can still be used to detect if a destination address is reachable even though the same packet might have been transmitted to multiple addresses. Both the SCTP standard [8] and reference book [9] state that an implementation can choose to discard any SACKs of retransmitted data chunks when clearing an error-counter, but also state that there is no significant performance difference in doing so. The stall scenarios described here provide a counter-example where performance is affected.

There is some work looking at other issues related specifically to multi-homing. Coene [10] wrote an Internet draft discussing multi-homing in SCTP, noting that multi-homing alone is not sufficient to ensure redundancy and that careful routing table configuration is also required. Jungmaier et al [4] examined SCTP’s failover performance, while Ravier et al [11] examined how SCTP congestion control algorithms are affected by multi-homing. The response of a multi-homed association to packet loss was studied by Caro et al [3]. This work recommended that lost packets that are fast-retransmitted (detected by gap reports in SACKs) should not use an alternate path, but those detected by a retransmission time-out should. As losses in the work presented here are all detected by a timeout, implementation of this recommendation has no effect on the results presented.
Iyengar et al examined SCTP’s behaviour during a changeover [6], where the primary address is changed by the application. It was found in this work that SCTP had problems recognising which address to credit SACKs to, which caused incorrect growth of the congestion window. This problem is similar to the one discussed here, both being related to ambiguity about how to credit destination addresses with SACKs.

The differences between the behaviour of SCTP and TCP are studied by Brennan [1] and Alamgir et al [2], amongst others, although their work does not study multi-homing. The reaction of SCTP to delay spikes is studied by Fu et al [12], while work performed by Ladha et al [5] focuses on making SCTP more robust to these network irregularities; both schemes remove the ambiguity about which transmission a SACK refers to.

There are also attempts to extend multi-homing for uses beyond redundancy. The two most interesting areas are increasing the capacity between two end-points using multi-homing [13], or using SCTP for mobility e.g. [14]. Previous work by us [15] [16] examines the use of SCTP in mobile scenarios where applications monitor network performance and select between different access networks. Clearly, both of these applications can only be improved by a better understanding of multi-homing.

Outside of SCTP, multi-homing is an area of increasing interest. A new transport protocol, Datagram Congestion Control Protocol (DCCP) [17], hopes to provide delay-sensitive traffic with a TCP-friendly transport protocol. This work is still in its early stages, and support for multi-homing is limited. Snoeren’s Migrate protocol [18] uses multiple TCP connections during one session. Both of these schemes focus on changing the network connection during a connection rather than explicitly supporting the use of multiple addresses simultaneously.

III. INTRODUCTION TO SCTP

This section introduces SCTP. After a brief discussion of the new features of SCTP, the remainder of the section focuses on the SCTP algorithms that contribute to the stall scenarios described in Section IV. The important areas are multi-homing, path monitoring, failover and association feedback or SACKs.

A. Overview

SCTP is, like TCP and UDP, a transport protocol for use over IP networks. It is designed to allow applications, represented as end-points, to communicate in a reliable manner, and so is similar to TCP. Indeed, it has inherited much of its behaviour from TCP, such as association (connection) setup, congestion control and packet-loss detection algorithms. Data delivery is significantly different: SCTP delivers discrete application messages within multiple logical streams in a single association. This approach to data delivery is more flexible than the single byte-stream used by TCP, as messages can be ordered, unordered or even unreliable within the same association. Internally, SCTP packets consist of a common header and one or more chunks. Each chunk-type has a specific function, e.g. data chunk, SACK chunk, and it is possible to extend the functionality of SCTP with additional chunks.
Further information and description about SCTP can be found in the SCTP standard [8], book [9] and a number of introductory articles, e.g. [19]–[21]. The remainder of this section describes features of SCTP that are important to the understanding of the stall scenarios described in Section IV.

B. Multi-homing in SCTP

An SCTP association consists of two end-points that communicate over the Internet. If either or both of these end-points are hosted on a multi-homed node, the association can use any of the available IP addresses when transmitting packets. The capability to select an IP address from a set in this manner is referred to as multi-homing at the transport layer. In many cases of interest, including when multi-homing is configured to provide redundancy or to improve the performance of mobile devices, each IP address also represents a different Point-of-Attachment to the Internet.

The use of multi-homing within SCTP is intentionally conservative, reflecting a concern that SCTP applications share network resources fairly with TCP applications. When an SCTP association is established, a single destination address is selected as the primary destination address and all new data is sent to this address. This means that the behaviour of a multi-homed SCTP association when there are no network losses is almost identical to the behaviour of a TCP connection. Alternate (or secondary) destination addresses are only used for redundancy purposes, either to retransmit lost packets or when the primary destination address cannot be reached. Of most interest in this paper are cases when the primary destination address cannot be reached, as this triggers the association to perform a failover.

A failover is performed when the SCTP sender cannot elicit an acknowledgement, either a SACK from a data packet or a heartbeat-acknowledgement from a heartbeat chunk, for a consecutive number of transmissions. An error-counter is maintained by the SCTP sender for each destination address and if this exceeds a threshold (normally six), the address is marked as inactive. SCTP does not transmit data packets to inactive addresses. If the primary destination address is marked as inactive, all data is transmitted to a secondary address and the association is said to have performed a failover.

As with TCP, packet losses in SCTP can be detected in two different ways. If a single packet is lost but subsequent packets are not, information contained in the SACKs allows the sender to deduce the loss and retransmit the packet. While the original SCTP standard suggested that this packet be transmitted to an alternate address, more recent research has suggested that retransmissions detected in this manner be transmitted to the same address as the original packet [3].
If no SACKs are received within a certain time, a retransmission timer maintained for each destination address expires, causing all outstanding data transmitted to that destination to be marked as lost and the error-counter to be incremented. It is recommended that after a timeout, retransmissions are sent to an alternate destination address if one is available. (One consequence of this is that if there are two destination addresses and the retransmitted data is also lost, the retransmissions will alternate between the two destinations until either the association fails or an acknowledgement is received.)

The primary address becomes active again if it responds to a heartbeat. If no data has been sent to an address for a specified time (usually 30-seconds), then it is considered to be idle and a heartbeat is transmitted to it. The receiver is expected to respond to the heartbeat immediately with a heartbeat-acknowledgement. As well as monitoring the status of destination addresses, the heartbeat is used to obtain RTT measurements on idle paths.

It is important to note that retransmitted data is not considered when deciding if an address is idle or not, nor is it used to take RTT measurements. One consequence of this is that during the time between a failure on the primary network and the association detecting the failure, both heartbeats and retransmitted packets are transmitted to the secondary destination addresses, although only the heartbeats are used to measure the RTT.

Although state is maintained on the basis of a destination address only, the term network path or path is sometimes used. This is taken to include both the source address and destination address and is used in this paper when the source and destination addresses are well defined.

C. Path Monitoring

SCTP monitors the performance of each destination address with the use of feedback information: data packets are acknowledged with SACK chunks while heartbeats are acknowledged with heartbeat-acknowledgements. While the SACKs perform data acknowledgement, both they and the heartbeat-acknowledgements are also used to monitor the status of destination addresses and to measure the path RTT for each destination.

The retransmission timer uses a Retransmission Time-Out (RTO) value to decide when either a data packet or a heartbeat has been lost. SCTP uses the same algorithm as TCP to calculate this value, basing it on measured RTTs, although in a multi-homed association, an RTO is maintained for each destination address. The SCTP standard recommends that one measurement should be taken each RTT, which means that one data chunk is being timed for any address with outstanding (new) data.

The calculation of the RTO value is of interest to this work. The RTT measurements are used to calculate a Smoothed RTT variable (SRTT) and an RTT Variance variable (RTTVAR), which are then used to calculate the retransmission time-out value using Equation 1.

\[ \mathrm{RTO} = \mathrm{SRTT} + \max(G,\ 4 \times \mathrm{RTTVAR}) \tag{1} \]
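As an illustration of Equation 1, the following is a minimal sketch of a per-destination RTO estimator, including the bounds and the time-out back-off described in this section. It is not taken from the paper or from any SCTP implementation: the class name `PathRto`, the smoothing constants of 1/8 and 1/4 (those of the standard TCP estimator, which SCTP is said to reuse) and the 3-second starting value are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

G = 0.2         # clock granularity in seconds ("usually 200-ms" in the text)
RTO_MIN = 1.0   # lower bound on the RTO, in seconds
RTO_MAX = 60.0  # upper bound on the RTO, in seconds
ALPHA = 1 / 8   # assumed SRTT gain (standard TCP estimator)
BETA = 1 / 4    # assumed RTTVAR gain (standard TCP estimator)


@dataclass
class PathRto:
    """Hypothetical RTO state kept for one destination address."""
    srtt: Optional[float] = None
    rttvar: Optional[float] = None
    rto: float = 3.0  # assumed starting value before any measurement

    def on_measurement(self, rtt: float) -> float:
        """Update SRTT/RTTVAR from a new RTT sample and recompute the RTO."""
        if self.srtt is None:
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - rtt)
            self.srtt = (1 - ALPHA) * self.srtt + ALPHA * rtt
        raw = self.srtt + max(G, 4 * self.rttvar)  # Equation 1
        self.rto = min(RTO_MAX, max(RTO_MIN, raw))
        return self.rto

    def on_timeout(self) -> float:
        """Back off: double the RTO after a time-out, up to the maximum."""
        self.rto = min(RTO_MAX, self.rto * 2)
        return self.rto
```

Note that in this sketch a new measurement discards any earlier doubling, because the RTO is recomputed from SRTT and RTTVAR alone. A small sample such as a heartbeat RTT therefore pulls a backed-off RTO straight back to the 1-second minimum.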

In this equation, the value G represents the resolution of the timing clock, usually 200-ms. The RTO is bounded by a minimum value of 1-second and a maximum of 60-seconds. The value is also subject to Karn’s algorithm, which states that the RTO should be doubled if a time-out occurs and that retransmitted packets should not be used for RTT measurements. Although the RTO doubles after a time-out, it is recalculated if a new measurement becomes available.

If either a heartbeat-acknowledgement or a SACK is successfully received after transmitting to a destination address, the SCTP sender can also clear the error-counter for that destination, which keeps it active. While this is quite simple in the case of heartbeats, as the destination address is included in the heartbeat payload, it means that the SCTP sender must keep track of which destination it last sent each data packet to, so that it can clear the appropriate error-counter.

As both heartbeats and SACKs are used to monitor the status of a destination address, it is important to consider how the data receiver transmits feedback packets. In general, they should be transmitted in the same manner every time, thereby ensuring that they measure the same network path. SCTP does this by transmitting SACKs to the source address of the packet being acknowledged. Assuming that both nodes select the same source address and out-going interface for packets with the same destination, then using the source address of packets as the destination address of feedback ensures that the same round-trip network path is used (and measured) each time.

There are two exceptions to this. SCTP uses ‘delayed SACKs’ to save network resources in the same way that TCP does; it does not respond to a data packet with a SACK immediately, but instead waits for either 200-ms or for a second data packet, combining the two SACKs into one and typically halving the number of required SACKs. If the two data packets arrive on different destinations, however, the SACK can be transmitted to either of the source addresses.

If a data packet is received twice, as might happen if a SACK is lost or delayed and the data packet is unnecessarily retransmitted, SCTP allows an optional implementation variation in how the SACK destination address is chosen. In this case, it can assume that the duplicate data packet is a result of a lost SACK, and retransmit the SACK acknowledging the Duplicate TSN (referred to as a DupTSN-SACK, see footnote 1) to an alternate destination address, which should increase the probability of the SACK being successful. This behaviour is implemented in the reference implementation of SCTP and is a component in one of the stall scenarios described.

1 A DupTSN-SACK should not be confused with a DUP-ACK in TCP. In TCP, a DUP-ACK is used to indicate that a data segment has been received but an earlier segment is missing. This is not necessary in SCTP as the SACK can explicitly acknowledge out-of-order data chunks. The DupTSN-SACK indicates to the sender that a data chunk has been received more than once.

D. SCTP Summary

This section has described some of the important aspects of the behaviour of SCTP that lead to the stall conditions described later. For the first stall, which occurs when an inaccurate RTO value is used on the alternate address, it should be noted that heartbeats are used to obtain RTT measurements but retransmitted data packets are not. The stall occurs when there is a significant difference between the RTT measurement for data packets and heartbeat packets on the alternate address. The second stall scenario occurs when the destination address of SACKs is varied as a result of receiving duplicate data packets. When coupled with an error that only affects SACKs, this causes a significant delay in an association performing a failover. Both stall scenarios occur when the correct response would be a failover.

IV. SCTP STALL SITUATIONS

This section describes how SCTP can stall. During a stall, SCTP can only transmit packets at a very low rate of between 1 and 3 per minute and this continues either indefinitely or for a prolonged period of time. SCTP does not recognise any problem, so the upper-layer is not informed and cannot react. A stall is caused when SACKs are misinterpreted as clearing the error-counter on the primary path, preventing the SCTP sender from detecting that it has failed, so it does not perform a failover. The RTO of the primary path increases to the maximum value of 60-seconds, reducing the transmission rate.

The work presented in this section was performed using the NS-2 simulation package with the Delaware NS-2 Modules [22]. The network configuration shown in Figure 1 is used throughout, so that there are two separate network paths between the two end-points. In each simulation, Node-S transmits data to Node-R for 3000-seconds of simulation time. At 400-seconds, a fault develops in the primary path, which should cause SCTP to failover to the secondary path.

Fig. 1. SCTP Association with two paths. [Diagram: Node-S (addresses A and B) and Node-R (addresses X and Y) are joined by a primary path between A and X and an alternate path between B and Y; the primary path is marked as the one that fails.]

In Figure 2, a normal and successful failover is shown. At 400-seconds, the primary path fails. After 1-second, the first timeout occurs, the primary path’s error-counter is incremented, and the lost packets are retransmitted on the secondary path. New data is then transmitted on the primary path, but again a timeout occurs, this time after 2-seconds. This is repeated until the error-counter on the primary path reaches six, at which point the sender marks it as inactive. The process takes 63-seconds, made up of 6 consecutive timeouts that double each time (1+2+4+8+16+32). Once the primary path is inactive, data is sent on the secondary path first, completing the failover. The primary path is monitored with heartbeat packets. Note that for clarity in Figure 2, only every 150th packet is shown in the simulation results, except during the failover. Also, only the first 1000-seconds of the simulation are shown.

Fig. 2. Successful Failover. [Plot of Transmission Sequence Number against time in seconds (0 to 1000), showing packets received on the primary and on the secondary path, with the failover marked.]
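The 63-second detection time of a normal failover follows directly from the RTO doubling after each timeout. The short sketch below reproduces that arithmetic; the function name `failover_time` and its parameters are illustrative only, with defaults taken from the values quoted in the text (1-second starting RTO, six timeouts, 60-second maximum).

```python
def failover_time(initial_rto=1.0, threshold=6, rto_max=60.0):
    """Sum the consecutive timeouts needed to mark a path inactive.

    Each timeout increments the error-counter and doubles the RTO
    (capped at rto_max) until `threshold` timeouts have occurred.
    """
    rto = initial_rto
    elapsed = 0.0
    timeouts = []
    for _ in range(threshold):
        elapsed += rto
        timeouts.append(rto)
        rto = min(rto_max, rto * 2)
    return elapsed, timeouts


total, steps = failover_time()
print(total, steps)  # 63.0 [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
```

The same loop also shows why a stall is so costly: if the error-counter is cleared before it reaches the threshold while the RTO stays backed off, each further attempt on the failed path waits up to the 60-second maximum.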

A. Incorrect RTO Value

In this scenario, the secondary path has an RTO value calculated only from heartbeat measurements when the primary path failure occurs. It is possible that this value is lower than required for data packets, causing unnecessary timeouts on the secondary path when data is transmitted on it. This causes a second retransmission, but on the primary path. After the second retransmission, the SACK due to the original retransmission, on the secondary path, arrives. Erroneously, this clears the error-counter for the primary address. This sequence can repeat indefinitely, preventing a successful failover.

A possible network where this situation might occur is shown in Figure 3. Each of the nodes might be a monitoring station that has to share readings on a regular basis. They are all connected to a high-speed network that provides the primary connections. They are also connected by a secondary network consisting of a telephone ring, where each link in the telephone ring provides a 36-kb/s duplex-link to another node. A secondary path might consist of up to three telephone links. The delays imposed by each telephone link are dependent on the packet size. A 1500-byte packet (a data packet) has a one-way delay of almost a second when passing three such telephone links; a 100-byte packet (heartbeat) has a 66-ms one-way delay over the same three links. A second scenario where this might occur could be a wireless link, where packets are fragmented before transmission over the air interface. Such networks could have significantly higher delays for big packets than small packets.
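The delay figures quoted for the telephone ring can be checked with a simple serialization-delay calculation. The sketch below is a simplification that counts only the time to clock each packet onto a 36-kb/s link, ignoring propagation, queueing and header overhead; the function names and the assumption that a returning SACK or heartbeat-acknowledgement is about 100 bytes are illustrative and not taken from the paper.

```python
def one_way_delay(packet_bytes, links=3, rate_bps=36_000):
    """Store-and-forward serialization delay in seconds over `links` hops."""
    return links * packet_bytes * 8 / rate_bps


def round_trip(out_bytes, back_bytes=100, links=3, rate_bps=36_000):
    """Round trip for a packet of out_bytes answered by one of back_bytes.

    back_bytes=100 is an assumed size for the acknowledgement.
    """
    return (one_way_delay(out_bytes, links, rate_bps)
            + one_way_delay(back_bytes, links, rate_bps))


heartbeat_rtt = round_trip(100)   # about 0.13 s, well under the 1-second RTO
data_rtt = round_trip(1500)       # about 1.07 s, just over the 1-second RTO
print(round(heartbeat_rtt, 3), round(data_rtt, 3))
```

Under these assumptions, heartbeats measure an RTT so small that the RTO sits at its 1-second minimum, while a full-sized data packet needs slightly longer than that to be acknowledged, which is the mismatch this scenario depends on.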

Fig. 3. Network with High Secondary Latency. [Diagram: nodes connected both to a fast fibre primary network and, through modems, to a secondary telephone ring; each POTS link is annotated with a delay of 22 ms for a 100-byte packet and 330 ms for a 1500-byte packet.]

In the simulation, the primary link failed at 400-seconds. Figure 4 shows a simplified message-sequence diagram of events from there. The significant events are circled and numbered (T1–T7), with a full circle for timeouts and a broken circle for acknowledgement events. The network path failure means that all data packets sent from A to X are lost. At 401-seconds, the first timeout occurs for the primary path, shown by the circled T1. This increases the error-counter for address X (EC(X)=1) and doubles the RTO. The lost packet (20) is retransmitted on the secondary path between B and Y. The RTO for the secondary path is the minimum 1-second, which is based on heartbeat measurements. The RTT required for a data packet, which is significantly bigger in size, is actually greater than 1-second and so a second timeout occurs, this time on the secondary path, shown by T2. The lost packet (20) is again retransmitted, this time on the primary path, where it is lost again. At time T3, some short time after the second retransmission, the SACK elicited by the first retransmission on the secondary path arrives; however, this SACK is erroneously attributed to the second retransmission on the primary path and causes the sender to clear the error-counter for the primary path, so that EC(X)=0.

Fig. 4. Message Sequence for Failed Failover. [Message-sequence diagram between the sender addresses A and B and the receiver addresses X (fast network) and Y (slow network), marking the events T1 (EC(X)=1), T2 (EC(Y)=1), T3 (EC(X)=0), T4 (EC(X)=1), a heartbeat and heartbeat-acknowledgement exchange with EC(X)=3, T5 (RTO(Y) reset), T6 and T7 (EC(X)=0).]

After this, packets are transmitted on the primary path (packets 21 and 22), but time out (T4) and are retransmitted on the secondary path. As the RTO on the secondary path was doubled after the earlier timeout on this path (T2), the SACKs for these packets arrive in time. Each timeout on the primary path doubles the RTO and increments the error-counter for that path, so that after the initial problem, the failover process has resumed.

However, as only retransmitted data has been sent on the secondary path so far, the secondary path is considered idle and therefore will be monitored by periodic heartbeats. Karn’s algorithm prevents the retransmitted data packets from updating the RTT measurements, but the heartbeats provide new measurements, which cause the RTO to be recalculated. This essentially resets the RTO to its initial low value. This problem is illustrated in the bottom half of Figure 4, and at T5, the RTO value for the secondary destination (RTO(Y)) is reset to its initial value based only on heartbeat measurements.

The next data packet after a heartbeat sent on the secondary path will time out as the RTO value has been reset. As above, this causes a second retransmission on the primary path (at time T6) which appears to be acknowledged by a SACK elicited by the first retransmission on the secondary path (at time T7). This clears the error-counter of the primary path again. The sequence will repeat itself after every heartbeat, preventing the sender from detecting the failed path. The sender will persist in sending data on the primary path first, and will then wait 60-seconds before retransmitting it on the secondary path. The association has stalled, as shown in Figure 5. In the simulations performed, it did not recover or failover.

This stall occurs when a number of conditions are met. The most important condition is that there is a significant difference between the RTO calculated by heartbeats and that required by data packets. The RTO formula has built into it two mechanisms that will tend to overestimate the value. Firstly, most implementations use a value of 200-ms for G in Equation 1 and secondly, any variance in the RTT measurements adds an additional margin. Therefore, a significant difference is one that overcomes both these factors. It is reasonable, however, to suggest that a secondary network might both be lightly loaded and use links that cause a significant difference between small and big, or heartbeat and data, packets.
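The misattribution at the heart of this stall can be reproduced with a few lines of bookkeeping. The sketch below is a toy model, not SCTP code: the class `Sender` and its methods are hypothetical, and the only behaviour it encodes is what the text describes, namely that the sender remembers the destination each TSN was last sent to and clears that destination's error-counter when a SACK for the TSN arrives.

```python
class Sender:
    """Toy model of per-destination error-counters (hypothetical names)."""

    def __init__(self, destinations):
        self.error_count = {d: 0 for d in destinations}
        self.last_dest = {}  # TSN -> destination it was last sent to

    def send(self, tsn, dest):
        self.last_dest[tsn] = dest

    def on_timeout(self, tsn):
        """Charge the timeout to the last destination, then retransmit
        the TSN to the other destination."""
        failed = self.last_dest[tsn]
        self.error_count[failed] += 1
        alternate = next(d for d in self.error_count if d != failed)
        self.send(tsn, alternate)
        return alternate

    def on_sack(self, tsn):
        """Credit the SACK to whichever destination the TSN was last sent to."""
        credited = self.last_dest[tsn]
        self.error_count[credited] = 0
        return credited


def replay_t1_to_t3():
    s = Sender(["X", "Y"])
    s.send(20, "X")    # original transmission, lost on the failed primary path
    s.on_timeout(20)   # T1: EC(X)=1, retransmit to Y
    s.on_timeout(20)   # T2: RTO(Y) too low, EC(Y)=1, retransmit to X
    credited = s.on_sack(20)  # T3: late SACK from the Y retransmission arrives
    return credited, s.error_count


print(replay_t1_to_t3())  # ('X', {'X': 0, 'Y': 1})
```

The replay ends with the failed destination X credited and its error-counter back at zero, while the working destination Y keeps its error, which matches the state at T3 in the message sequence.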

Fig. 5. SCTP Stall. [Plot of TSN against time in seconds (0 to 1000), showing packets received on the primary and on the secondary path.]

B. Duplicate Data Packets

The second stall scenario is caused by a network fault that only affects traffic in one direction. The same network configuration shown in Figure 1 is used, but this time the secondary path has increased bandwidth so that there is no significant delay difference between heartbeat and data packets that might cause the first stall condition. When the failure occurs, all data packets sent from address A to X are successfully delivered, but packets (SACKs) transmitted in the reverse direction are dropped.

The receiver in this experiment transmits DupTSN-SACKs (SACKs elicited by duplicate data chunks) to an alternate destination rather than to the source address of the data packet. The result of this is that if a duplicate data packet was received from A, then the DupTSN-SACK would be sent to B, and if the data packet was received from B, then the DupTSN-SACK would be sent to A. It was decided in these simulations to only select alternate addresses from those that are active, a decision which eventually allows the association to escape the stall.

The simplified message sequence diagram is shown in Figure 6. After the failure, a time-out occurs on the primary path at 401-seconds (T1) due to the loss of the SACK. The error-counter for this address is incremented to one (EC(X)=1) and the RTO is doubled. The data packet (20) is also retransmitted on the secondary path. The retransmitted packet arrives at the receiver, where it is a duplicate data packet, causing a DupTSN-SACK to be generated. This is sent to address A and so is lost again. Despite sending two packets, the sender still has not received a SACK, and so a third attempt is made at T2, this time on the primary path.

Fig. 6. Stall Caused by lost SACKs. [Message-sequence diagram between the sender addresses A and B and the receiver addresses X (primary) and Y (secondary), marking the events T1 (EC(X)=1), a DupTSN-SACK sent to A, T2 (EC(Y)=1), a DupTSN-SACK sent to B, T3 (EC(X)=0), lost heartbeats with EC(A)=5, T4 (EC(A)=6), T5 (EC(X)=1), the primary being inactive with SACKs sent to the secondary, and the failover occurring.]

The third transmission of packet 20 is received from address A on the primary path. A DupTSN-SACK is generated by the SCTP receiver when it arrives, which is sent to address B (the secondary path) and successfully arrives at the sender at T3. The sender now clears the error-counter for the primary path (EC(X)=0), and continues to transmit new data on the primary path (packet 21). However, this repeats the above sequence, and each packet will have to be sent three times before a SACK is received. The time between transmissions increases to 60-seconds as the primary path RTO value increases, and the SCTP sender cannot complete its failover. A stall has occurred.

Heartbeats play a significant role in this stall scenario. In the above message sequence, a timeout occurs on the secondary path, which causes its error-counter to be incremented. However, as the sender continues to transmit heartbeats and receive heartbeat-acknowledgements, the secondary path’s error-counter is cleared, preventing it from being marked as inactive. The RTO of the secondary path is also kept low as a consequence.

The SCTP receiver also uses heartbeats to monitor the sender’s addresses. All the heartbeats sent to address A are lost, causing the error-counter at the receiver to increase. After about 210 seconds (7 × 30 seconds), the loss of heartbeats causes the error-counter of A to hit the threshold, and the destination address A is marked as inactive (T4, EC(A)=6). From this point, the receiver stops sending DupTSN-SACKs to address A, hence the next packet retransmitted on the secondary path (at T5) is successfully acknowledged. It is now possible for the sender to failover. This is shown at the bottom of Figure 6.

Fig. 7. SCTP stalls as SACKs are lost (plot of TSN against time in seconds, 0 to 1000; series: Received On Primary, Received On Secondary; annotations: Primary Path Fails, Eventual Failover).
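The message sequence of Figure 6 can be replayed for a single packet with the following sketch. It is a simplification under stated assumptions: retransmissions strictly alternate between the two paths starting on the primary, address B is always considered active by the receiver, and the sender credits a SACK to the path it used most recently. The function name and the flag `receiver_sees_a_active` are hypothetical.

```python
def transmissions_until_sack(receiver_sees_a_active):
    """Replay one packet; return (paths tried, path credited with the SACK)."""
    source_of = {"primary": "A", "secondary": "B"}  # sender source address per path
    tried = []
    while True:
        # Assumption: retransmissions alternate, starting on the primary path.
        path = "primary" if len(tried) % 2 == 0 else "secondary"
        duplicate = bool(tried)  # every copy after the first is a duplicate
        tried.append(path)
        source = source_of[path]
        sack_to = source  # an ordinary SACK returns to the packet's source
        if duplicate:
            alternate = "B" if source == "A" else "A"
            # DupTSN-SACKs go to an alternate address, but only an active one.
            if alternate == "B" or receiver_sees_a_active:
                sack_to = alternate
        if sack_to != "A":  # the one-way fault drops everything sent to A
            # The sender credits the SACK to the path it used last.
            return tried, path
```

While the receiver still regards A as active, the packet needs three transmissions and the SACK is credited to the primary path, which is the stall cycle. Once the receiver has marked A inactive, the first retransmission on the secondary path is acknowledged and credited to the secondary path.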

In the simulations performed, it was found that failover occurred after 450 seconds, as shown in Figure 7. This time was made up of two stages. Initially, each packet had to be transmitted three times before it was acknowledged. This stage lasted 235 seconds, until the receiver realised that address A could not be reached. After this, each packet was transmitted on the primary path and then successfully retransmitted on the secondary path after 60 seconds, until the primary path was marked inactive. This stage took 215 seconds.

V. PREVENTING STALLS

The stall scenarios described occur only when a number of conditions are met, and consequently there are a number of different approaches that can be taken to remove the stalls. These are considered in this section. A successful solution allows a normal failover to occur after the primary path fails, but the impact on the normal operation of SCTP should also be considered.

A. Increasing the RTO Value

In the first stall scenario, a contributing cause is an estimate for the RTO value that is too low. An obvious solution is to ensure a minimum margin between the measured SRTT and the RTO; however, this also increases the time taken by SCTP to detect losses, reducing the efficiency of SCTP in the common case in order to accommodate exceptional cases. There would also be networks where any fixed safety margin would not be sufficient. In actual SCTP implementations, the clock resolution (G in Equation 1) provides a margin, but is not part of the standard. Simulations were performed where a value of G of 200 ms was used and the stall described in Section IV-A was prevented; however, this is of limited use as it was possible to reintroduce the stall by adding further narrow links to the secondary path.

It is possible to address the underestimation of the RTO in other ways. The heartbeat packet’s size could be increased to 1500 bytes, or varied. Varying the size allows the sender to calculate an RTT value based on packet size, but is quite a complex solution. It would, however, increase the RTTVAR value, meaning that the difference between the SRTT and RTO values would increase in proportion to the impact of packet size on delay. Both of these solutions create unjustified network traffic in the majority of cases, as well as additional protocol complexity.

The possible inaccuracy of heartbeat measurements could also be recognised. One solution is not to use heartbeats to measure the RTT, but this discards information that is useful in the vast majority of cases. A less drastic solution is not to use a heartbeat-acknowledgement to reduce the RTO after a timeout has occurred for a data packet. Although the SRTT could still be updated, so that the measurement is not lost, a lock could be placed on the RTO value after a timeout, so that only a new data measurement could be used to reduce its value. This allows most associations to use the valuable RTT information from heartbeats, but also recognises that there may be cases where this information is not accurate. This solution does not protect against the second stall scenario, however.

B. Using Karn’s Algorithm for Path Monitoring

Karn’s algorithm for RTT measurements discards any retransmitted packets when measuring the RTT.
Similarly, not resetting the error-counter for any packets that have been transmitted more than once can prevent accidentally clearing the error-counter in a number of cases. This prevents the first stall situation but, surprisingly, has no effect on the second stall situation. This is explained as follows. In the second stall scenario, there are normally multiple packets sent to the primary address, but only the first of these is retransmitted two more times. When the SCTP sender eventually receives a SACK, not only does it acknowledge the packet that has been transmitted three times, but it also acknowledges those packets that were originally transmitted with it to the primary destination address. As these packets have not been retransmitted, this causes the sender to clear the error-counter of the primary path anyway, and the stall continues as before.

C. Fixing the Destination Address for SACKs

The second stall situation is caused partly by the receiver selecting an alternate address for DupTSN-SACKs. This is an optional policy which an implementation need not adopt. Therefore, if an association chooses not to implement this policy, then the second stall scenario will not occur. Although this has no effect on the first stall scenario caused by the inaccurate RTO, coupled with one of the other solutions above, it is possible to prevent both stall cases.

D. Using Connections

Both stall scenarios arose when the SCTP sender misinterpreted the returning SACKs and incorrectly cleared the error-counter of the primary destination. Although the solutions discussed above are capable of preventing the stalls described, they cannot ensure that a stall never occurs. The fundamental cause of the problem is the coupling of data acknowledgement and path monitoring. Therefore, by de-coupling these two functions, it is possible to propose a universal solution that solves the described stall scenarios and also removes the possibility of other stall scenarios arising.

To do this, SCTP must recognise connections. The SCTP sender should use connections to transmit data and should maintain state, such as the error-counter and RTO, for connections. It should be able to identify these connections with a tag, and this tag should be replayed to it within SACKs. When a packet requiring acknowledgement is received, such as one containing a data-chunk, the acknowledgement should include both the relevant information, such as which chunks have been received, and the identification tag of the connection used to deliver the data. In this way, even if data is transmitted multiple times, both the delivered data and the successful path can be unambiguously recognised by the SCTP sender.

Currently, SCTP connections are based only on the destination address of the packet; it is assumed that the source Point-of-Attachment and return path are constant. Although it is possible that this definition can be expanded, so that a connection might include not only the destination address but also the source Point-of-Attachment and perhaps Quality-of-Service choices, it is assumed here that the destination address uniquely defines a connection.
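The tagging scheme described above can be sketched as follows. The names `Connection`, `Sack`, `Sender`, and the field `connection_tag` are hypothetical, and neither the tag format nor any chunk layout is specified in the text. In line with the assumption that the destination address uniquely defines a connection, the destination address itself serves as the tag here.

```python
from dataclasses import dataclass, field


@dataclass
class Connection:
    """Per-connection state kept by the sender (illustrative fields)."""
    error_count: int = 0
    rto: float = 3.0  # seconds; placeholder value for this sketch


@dataclass
class Sack:
    cumulative_tsn: int
    connection_tag: str  # tag of the connection that delivered the data


@dataclass
class Sender:
    connections: dict = field(default_factory=dict)

    def on_timeout(self, tag):
        # A timeout counts against the connection the packet was sent on.
        self.connections[tag].error_count += 1

    def on_sack(self, sack):
        # Clear only the connection named in the SACK, regardless of which
        # address the SACK arrived on or which path was used most recently.
        self.connections[sack.connection_tag].error_count = 0
```

If a sender holding connections `"X"` and `"Y"` records two timeouts on `"X"` and one on `"Y"`, and then receives `Sack(cumulative_tsn=20, connection_tag="Y")`, only the error-counter of `"Y"` is cleared. The error-counter of `"X"` keeps its value, so data acknowledgement no longer implies anything about a path that did not deliver the data.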
The SACK should include the destination address of the received packet or packets being acknowledged, to indicate to the data sender which of the transmissions were successful. This information can be included within the SACK, or perhaps in a new connection-acknowledgement chunk bundled with the SACK. Upon receipt of the SACK, the SCTP data sender can simply read off the appropriate connection and clear the error-counter for the appropriate destination address. If this destination address and the source address of the SACK happen to be the same, this might be indicated in the SACK with a flag, reducing the overhead involved in this scheme to a negligible level. This scheme was implemented in simulation code, and found to remove both stall scenarios successfully.

It should be noted that other work has considered including this type of information in SCTP SACKs. The heartbeat-acknowledgement normally includes information which allows the heartbeat sender to identify the appropriate destination address, while a time-stamp option similarly de-couples the data acknowledgement and path RTT measurement mechanisms. There have been proposals that retransmissions are marked and this mark acknowledged in SACKs [6] to prevent over-aggressive congestion window growth during an application-initiated change of primary address.

VI. CONCLUSIONS

This paper examined the behaviour of SCTP under unusual failover conditions. Two scenarios were studied in which SCTP stalled during a failover. During the stall, not only did SCTP not failover, but no problem was reported to the upper layers, preventing remedial action at these layers too. The performance of SCTP suffered greatly due to the stalls.

The first stall scenario occurred when different delays were experienced by monitoring (heartbeat) traffic as compared to data traffic, which caused an inappropriate estimation of the RTO value. The problem continued because SCTP then erroneously cleared the error-counter of the failed primary path while keeping the RTO of the secondary path low due to heartbeats. The second stall was caused when an error that only affected the SACKs on the primary path occurred and the receiver used an alternate address for DupTSN-SACKs.

Both stall situations could be resolved independently, but a weakness in the SCTP protocol was identified, where data acknowledgement and path monitoring are joined together. This is similar to the problem of using TCP ACKs for data acknowledgement and RTT measurements, and a solution similar to the TCP time-stamp option seems to be the optimal solution. It was suggested, therefore, that path monitoring should be performed by identifying paths and explicitly identifying, within the SACK, which path elicited it. This decouples two of the functions of an SCTP SACK: data acknowledgement and path monitoring.

A number of contributions are presented in this paper. The first is to identify a weakness in using heartbeat RTT measurements to estimate an RTO for data packets. The second contribution is to identify a stall scenario that might occur due to this underestimation, and how the multi-homing aspect of SCTP makes it difficult for an association to correct the inaccurate RTO estimate. The third contribution is recognising how varying the destination address for a DupTSN-SACK can cause a stall and the importance, therefore, of using a fixed policy when transmitting SACKs. The fourth contribution is to identify the fundamental cause of the stall scenarios as misinterpretation of SACK information, and to counter this with explicit identification of a path within a SACK. It was found that if a SACK could explicitly identify the destination address to which the packet that elicited it had been sent, then path monitoring could be decoupled from data acknowledgement, preventing stalls without affecting normal SCTP operation.

Finally, it is noted that the explicit monitoring of connections within multi-homed associations may be applicable to other protocols. It was noted that similar solutions were found to other problems in SCTP, such as spurious timeouts and change-over performance issues. Future work is to examine the importance of explicit connection and protocol information within associations.

ACKNOWLEDGEMENTS

The support of both the Research Innovation Fund (RIF) and the Advanced Technology Research Programme (ATRP) from the Informatics Research Initiative of Enterprise Ireland is gratefully acknowledged.

REFERENCES

[1] R. Brennan, T. Curran, “SCTP Congestion Control: Initial Simulation Studies”, Proc. 17th International Teletraffic Congress, December 2001.
[2] R. Alamgir, M. Atiquzzaman, W. Ivancic, “Effect of Congestion Control on the Performance of TCP and SCTP over Satellite Networks”, NASA Earth Science Technology Conf., June 2002.
[3] A.L. Caro Jr., P.D. Amer, R.R. Stewart, “Retransmission Schemes for End-to-end Failover with Transport Layer Multihoming”, GLOBECOM IEEE Global Telecommunications Conference, November 2004.
[4] A. Jungmaier, E.P. Rathgeb, M. Tuxen, “On the Use of SCTP in Failover-Scenarios”, Intl. Conf. on Information Systems, Analysis and Synthesis, July 2002.
[5] S. Ladha, S. Buacke, R. Ludwig, P.D. Amer, “On Making SCTP Robust to Spurious Retransmissions”, ACM Sigcomm Computer Communication Review, Vol. 32, Number 2, April 2004.
[6] J.R. Iyengar, A.L. Caro Jr., P.D. Amer, G.J. Heinz, R.R. Stewart, “Making SCTP More Robust to Changeover”, SPECTS 2003, July 2003.
[7] P. Karn, C. Partridge, “Improving Round-Trip Time Estimates in Reliable Transport Protocols”, ACM Sigcomm, August 1997.
[8] R. Stewart, Q. Xie, et al., “RFC 2960 – Stream Control Transmission Protocol”, October 2000.
[9] R.R. Stewart, Q. Xie, “Stream Control Transmission Protocol (SCTP): A Reference Guide”, AWT-Publishing, October 2001.
[10] L. Coene (ed), “Multihoming issues in the Stream Control Transmission Protocol <draft-coene-sctp-multihoming-04.txt>”, Internet Draft (expired), May 2002.
[11] T. Ravier, R. Brennan, T. Curran, “Experimental studies of SCTP Multihoming”, 1st Joint IEI/IEE Symp. on Telecoms. Systems, November 2001.
[12] S. Fu, M. Atiquzzaman, W. Ivancic, “Effect of Delay Spike on SCTP, TCP Reno, and Eifel in a Wireless Mobile Environment”, Intl. Conf. on Computer Communications and Networks, October 2002.
[13] J.R. Iyengar, K.C. Shah, P.D. Amer, R.R. Stewart, “Concurrent Multipath Transfer Using SCTP Multihoming”, Proc. of SPECTS 2004.
[14] Li Ma, Fei Yu, V. Leung, “A New Method UMTS/WLAN Vertical Handover Using SCTP”, Proc. IEEE VTC, October 2003.
[15] J. Noonan, P. Perry, J. Murphy, “Simulations of Multi-media Traffic over SCTP modified for Delay-centric Handover”, World Wireless Congress, June 2004.
[16] J. Noonan, P. Perry, J. Murphy, “Client Controlled Network Selection”, IEE 3G Conf. 5th Intl. Conf. on 3G Mobile Communications, October 2004.
[17] E. Kohler, M. Handley, S. Floyd, “Designing DCCP: Congestion Control without Reliability”, Technical Report, available from http://www.icir.org/kohler/dcp/
[18] A.C. Snoeren, D.G. Andersen, H. Balakrishnan, “Fine-Grained Failover Using Connection Migration”, Proc. 3rd USITS, March 2001.
[19] R.R. Stewart, C. Metz, “SCTP – New Transport Protocol For TCP/IP”, IEEE Internet Computing, vol. 5, no. 6, pp. 64-69, November 2001.
[20] A.L. Caro Jr., J.R. Iyengar, P.D. Amer, S. Ladha, G.J. Heinz, K.C. Shah, “SCTP: A Proposed Standard for Robust Internet Data Transport”, Computer, vol. 36, issue 11, November 2003.
[21] S. Fu, M. Atiquzzaman, “SCTP: State of the art in Research, Products, and Technical Challenges”, IEEE Communications Magazine, vol. 42, no. 4, April 2004.
[22] A.L. Caro Jr., J.R. Iyengar, “SCTP Module for Ns 2”, URL=http://www.armandocaro.net/software/ns2sctp
