
PERFORMANCE AND CORRECTNESS OF THE ATM ABR RATE CONTROL SCHEME

David Lee
Bell Laboratories, 600 Mountain Ave., Murray Hill, New Jersey
[email protected]

K. K. Ramakrishnan
AT&T Research, 600 Mountain Ave., Murray Hill, New Jersey
[email protected]

W. Melody Moh
San Jose State University, Dept. of Math. & Computer Sc., San Jose, California
[email protected]

A. Udaya Shankar
University of Maryland, Computer Sc. Dept., College Park, Maryland
[email protected]

ABSTRACT

We study both correctness and performance of the source/destination protocol of the Available Bit Rate (ABR) service in Asynchronous Transfer Mode (ATM) networks. Although the basic source/destination protocol for congestion management is relatively simple, the protocol specification has to cope with several "real-world" cases, such as failures and delayed/lost feedback, which may introduce complexity. Rigorous proofs of the correct functioning of the protocol, based on a formal specification, are necessary. We use a formal Extended Finite State Machine (EFSM) model to show that the ABR source/destination protocol is free of live-locks, so that under all conditions both Resource Management (RM) and data cells will be transmitted. We also show that the network options of Explicit Forward Congestion Indication (EFCI) and Explicit Rate (ER) interoperate correctly. While ensuring the correct functioning of the protocol, it is essential that pathological situations do not result in abysmal performance, which is another form of incorrect operation. We use the understanding of the informal English description of the source/destination behavior and of our EFSM model to derive conditions that ensure that the source transmission rate is stable in the presence of delayed or lost feedback RM cells, especially under the operation of a source rule that requires the reduction of the source rate under these conditions. We arrive at bounds on the number of consecutive RM cell losses tolerated while the rate remains stable. It is also of vital importance that the delay for feedback of network state is well understood. There are two components of this delay contributed by the source/destination protocol: the source delay for sending out the probe Forward RM cells and the destination delay to turn around the Backward RM cell. We provide a worst-case analysis of the delay in turning around RM cells at the destination station and of the worst-case inter-departure time of Forward RM cells from the source.

1. INTRODUCTION

The Available Bit Rate (ABR) service class for Asynchronous Transfer Mode (ATM) networks uses a rate control scheme to manage congestion. Sources adjust their rates such that the aggregate load on the network does not exceed the capacity of the network [ATM]. The source and destination behavior that achieves overall congestion avoidance and control is specified relatively informally in [ATM]. Many aspects of the protocol have been designed based on extensive performance analysis through simulation. Simulation has been used especially to determine correct parameter settings. There is little work in the literature that rigorously analyzes the performance and examines the correctness of the ABR protocol. Some properties of the protocol, such as whether it is correct or not, are difficult to show using simulation. A formal specification is necessary for this purpose. Correctness relates to whether the specified protocol is free of any live-locks or other undesirable properties. We model the ABR protocol by an Extended Finite State Machine [LY] and show that the ABR protocol is live-lock free in terms of transmitting data cells and Resource Management (RM) cells, which act as source probes of the network conditions.

In addition, the ABR congestion management specification accommodates at least two different modes in which switches in the network may operate. The source/destination policies (the main focus of the specification) are designed to operate smoothly with any mix of the two types of switches in the network: Explicit Forward Congestion Indication (EFCI) switches and Explicit Rate (ER) switches. EFCI switches use a single bit to communicate congestion [RJ] when a queue threshold is exceeded. ER switches, on the other hand, compute a max-min fair rate [CCJ,CR] and communicate this rate to the sources. The source and destination use a common ABR protocol to interface to both types of switches. We examine whether the specification correctly interoperates with both these types of switches, using the formal specification we have developed.

Rate based flow control mechanisms also need some form of protection against failures.
For example, when the feedback from the network is not provided in a timely manner or is lost, it is desirable that the source reduce its rate so that the network is not overloaded by the source transmitting at an incorrectly high rate. ATM’s ABR service also permits sources to start at a high initial rate so that higher layer protocols and applications such as Remote Procedure Calls (RPC) may transmit a short burst without a start-up delay of a full round-trip time for feedback from the network (Source Rule 6 in [ATM]). To avoid using this potentially high rate for too long, and thus exceeding the buffering in the network, the source rules specify a proportionate reduction in the rate in the absence of feedback from the network. This rate reduction may also be triggered in certain cases when the current transmission rate of RM cells by the source is mismatched with the rate at which RM cells return from the network, especially subsequent to a source rate increase. If the parameters are set inappropriately, this may result in a net rate reduction in spite of feedback from the network that allows a rate increase. Some of this has been analyzed in the ATM Forum [JKFL] through simulation. In this paper, we provide a quantitative analysis of the worst-case conditions under which a stable source transmission rate (ACR) is maintained in spite of Source Rule 6. It is particularly important to examine this in the presence of different types of RM cell loss, which is one of our main contributions. This is covered in the next section.

In conjunction with the problem of maintaining a stable rate, there is the need to ensure that the feedback from the network is timely. While some of the feedback delay contributed by the network is outside our framework, we quantify in this work the contribution to the feedback delay due to the source/destination policies. Specifically, in Section 3, we estimate the turn-around time delay for Backward RM (BRM) cells and the inter-departure time intervals for Forward RM (FRM) cells from the source. The estimation provides useful information for appropriate parameter setting as well as for understanding the relationship between the rate and the network feedback delay. In Section 4, we examine some correctness issues, proving that there are no live-locks in the specified protocol, and in Section 5 we show that the interoperation of EFCI and ER mechanisms is correct. The source/destination rules and the various parameters and acronyms used in the paper are from [ATM]. For the formal specification that we used in this paper, see [LMRS, LRMS].

2. RATE ANALYSIS

The ABR source policy includes a mechanism that limits the use of a current transmission rate, called Allowed Cell Rate (ACR), in the absence of timely feedback from the network (Source Rule 6). Under "steady-state" operation, where the rate is not changing and no RM cells are lost, the source endsystem should be receiving a BRM cell for every FRM cell sent. When a rate change occurs, we may be sending FRM cells at a different rate (based on the new ACR) than the returning rate of BRM cells. When a source starts up and is transmitting at an Initial Cell Rate (ICR), the source runs the risk of overloading the network if this initial rate is too high. RM cells, however, return only one Round-trip Time (RTT) after start-up. Consequently, in the absence of feedback from the network, the source rules provide for a conservative correction of the rate until the feedback arrives. The Forum ABR policy has chosen to have sources start at an ICR that may be relatively high. On the other hand, one could choose to start at a relatively low rate, and then increase based on feedback [Kes]. The ATM Forum’s ABR policy was chosen to allow RPC-like flows to derive the benefit of a "fast-start", especially if the total amount of data transmitted in the "burst" is small enough not to exceed the buffering in the network.

Subsequent to start-up, when a significant increase in the rate occurs, Rule 6 may once again be triggered. Consider the following (relatively general) scenario. Subsequent to a rate change at time t = 0, the rate at which BRM cells return will correspond to the rate at which FRM cells were sent approximately one RTT earlier. Therefore, if the original rate is small, and the new rate at which the station is transmitting FRM cells is much higher, then potentially more than Crm (missing RM-cell count) in-rate FRM cells may be transmitted prior to a BRM cell returning after the rate change at t = 0. Rule 6 will be triggered upon sending Crm FRM cells. The reduction of the source rate, ACR, is multiplicative and proportionate to the current ACR. When a BRM cell is received, and the Explicit Rate, ER, (or the CI, NI bits for EFCI operation) allows the source rate to increase, then ACR is increased additively, by a fixed amount. Subsequent BRM cells received would continue to allow increasing the ACR, until it reaches the ER value or the Peak Cell Rate (PCR) for the connection. The ACR value from which we begin to increase is the rate achieved after any (zero or more) earlier reductions caused by Rule 6.

As a result, there are two counter-acting actions taking place at the source: (1) a rate decrease as a result of Rule 6, when too many FRM cells are transmitted before a BRM cell is received; and (2) a rate increase, potentially, when a BRM cell returns indicating that the current source rate is below its target. We examine below, in Theorem 2.1, whether the rate increases that take place as a result of the BRM cells returning make up for the decreases caused while "running blind" awaiting the return of BRM cells. The analysis examines the case where there are n decreases (as a result of (1)) before an increase takes place (as a result of (2)). We need to ensure that there is no "net decrease" in the rate at the source because of Rule 6.

2.1. Estimating ACR With No RM cell loss

In this subsection we examine the Allowed Cell Rate, ACR, in a single "epoch", where an epoch is the time interval between the arrival of two Backward RM (BRM) cells.
Subsequent to the arrival of the first BRM cell of an epoch, we have possibly multiple decreases of the source rate, ACR, as a result of the repeated application of Rule 6, if the next BRM cell arrives late. Denote the current ACR by $\mathrm{ACR}_0$. The ACR after $k$ consecutive rate reductions from Rule 6 is:

$$\mathrm{ACR}_k = \mathrm{ACR}_0 \, (1 - \mathrm{CDF})^k, \quad k = 1, 2, \ldots \qquad (2.1)$$

Lemma 2.1. The time interval $t_k$ for the $k$th consecutive rate reduction from Rule 6 is:

$$t_1 = \mathrm{Crm} \cdot \min\left\{ \max\left\{ \frac{\mathrm{Mrm}}{\mathrm{ACR}_0}, \mathrm{Trm} \right\} + \frac{1}{\mathrm{ACR}_0}, \; \frac{\mathrm{Nrm}}{\mathrm{ACR}_0} \right\} + t_0, \qquad (2.2a)$$

$$t_k = \min\left\{ \max\left\{ \frac{\mathrm{Mrm}}{\mathrm{ACR}_{k-1}}, \mathrm{Trm} \right\} + \frac{1}{\mathrm{ACR}_{k-1}}, \; \frac{\mathrm{Nrm}}{\mathrm{ACR}_{k-1}} \right\}, \quad k = 2, \ldots \qquad (2.2b)$$

where

$$0 \le t_0 \le \min\left\{ \max\left\{ \frac{\mathrm{Mrm}}{\mathrm{ACR}_0}, \mathrm{Trm} \right\}, \; \frac{\mathrm{Nrm} - 1}{\mathrm{ACR}_0} \right\}.$$

Proof. There are two cases when FRM cells are sent according to Source Rule (3)(a) [ATM]. Case 1: At a rate $\mathrm{ACR}_j$, the source sends an FRM cell out after every Nrm − 1 cells, as long as there are data cells to be sent. This corresponds to the second factor in (2.2). Case 2: If instead, we have Mrm data cells sent out (Mrm < Nrm − 1) and the timer also expires after time Trm, then an FRM cell is sent. This is the first factor in equation (2.2). Observe that a BRM cell may be received either right before the first FRM cell is to be transmitted or just after the transmission of the previous FRM cell. This gives us the time interval $t_0$. The first rate reduction occurs after Crm FRM cells have been sent after receiving a BRM cell with BN = 0. The time interval of sending the Crm FRM cells is given in the first term of (2.2a). The sum of the two terms is the time interval $t_1$ in (2.2a). Afterwards, each FRM cell sent results in a rate reduction, and that gives the time intervals $t_k$ in (2.2b) for the consecutive rate reductions.

Consider the following scenario: we have a possible rate increase at time $T_0$ as a result of receiving a BRM cell. We examine what would be the maximum number of rate decreases occurring as a result of Rule 6, prior to the arrival of another BRM cell at time $T_0 + \tau$. Here $\tau$ is typically the inter-departure time of FRM cells from the source about one RTT earlier (at $T_0 - \mathrm{RTT}$). For example, $\tau = (\mathrm{Nrm} - 1)/\mathrm{ACR}_{T_0 - \mathrm{RTT}}$. The following lemma describes the rate decrease/increase in between two consecutive BRM cells with BN=0.

Lemma 2.2. Suppose that two consecutive BRM cells with BN=0 are received in time period $\tau$. Then after $n$ consecutive reductions, a rate increase occurs, where:

$$n = \max\left\{ k : \sum_{i=1}^{k} t_i \le \tau \right\} \qquad (2.3)$$

where $t_i$ is given in (2.2). Before the rate increase, the total rate reduction is:

$$\mathrm{ACR}_0 - \mathrm{ACR}_n = \mathrm{ACR}_0 \left[ 1 - (1 - \mathrm{CDF})^n \right]. \qquad (2.4)$$

This net rate decrease occurs prior to the arrival of a BRM cell with BN=0, which allows a rate increase. After $n$ consecutive rate reductions, we have a BRM cell allowing a rate increase (by Source Rule 8) of:

$$r_{inc} = \mathrm{RIF} \cdot \mathrm{PCR}. \qquad (2.5)$$

Comparing the rate increase in (2.5) with the rate decrease in (2.4), we obtain:

Theorem 2.1. A necessary and sufficient condition that there is no net rate reduction after a BRM cell with BN=0 is received is:

$$\frac{\mathrm{RIF} \cdot \mathrm{PCR}}{\mathrm{ACR}_0} + (1 - \mathrm{CDF})^n \ge 1 \qquad (2.6)$$

where n, given in (2.3), is the number of rate reductions since the previous BRM cell with BN=0 was received.

Theorem 2.1 relates the number of decreases in the rate caused by successive application of Rule 6 to the rate increase that must compensate for them. The time between two consecutive BRM cells (in the worst case, the first BRM cell was received just as the previous rate change occurred at t = 0) may be $1/\mathrm{ACR}_0$, if the rate was $\mathrm{ACR}_0$ for at least one RTT prior to the rate change at time t = 0. We assume that the time from the rate change, at t = 0, to the present time is less than one RTT. Condition (2.6) states how large RIF should be to compensate for the decreases in between the arrival of two consecutive BRM cells. This is the worst case behavior of Rule 6 when there is no loss of RM cells.

An important special case is when ACR is low: $\mathrm{ACR} \le \mathrm{RIF} \cdot \mathrm{PCR}$. In this case there is no net rate reduction, since the first term in (2.6) is always at least 1.

Corollary 2.1. Suppose that $\mathrm{ACR} \le \mathrm{RIF} \cdot \mathrm{PCR}$. Then there is no net rate reduction after a BRM cell with BN=0 is received.

We have also examined the asymptotic behavior of the rate when the network is consistently returning a fixed value of ER in the BRM cells, and examined whether the reductions due to Rule 6 result in an overall decrease in the source rate when observed over a long period covering a number of BRM cell arrivals [LRMS2].

2.2. Rate Stability in the Presence of RM Cell Loss

Suppose that an end system is operating at "steady state", transmitting at a rate ACR = ER. We want to ensure that the cumulative reduction due to Rule 6 does not overcome the aggregate increase achieved by the returning BRM cells. In "steady state", the total number of FRM cells transmitted should correspond to the number of BRM cells received.
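The no-loss epoch analysis of Section 2.1 can be traced numerically. The following is a minimal Python sketch, assuming rates in cells per second and times in seconds. The function names and the example parameter values are illustrative choices and are not taken from [ATM]; the sketch also ignores the MCR floor on ACR. It steps through (2.1) and (2.2), counts reductions as in (2.3), and tests condition (2.6).

```python
def count_reductions(tau, acr0, cdf, crm, nrm, mrm, trm, t0=0.0):
    """Return n from (2.3): the largest k such that the cumulative
    Rule 6 reduction times t_1 + ... + t_k fit within the BRM gap tau."""
    elapsed, n, acr = 0.0, 0, acr0
    while True:
        # FRM spacing at the current rate ACR_{k-1}, as in (2.2).
        spacing = min(max(mrm / acr, trm) + 1.0 / acr, nrm / acr)
        # (2.2a): the first reduction needs Crm FRM cells plus the offset t0;
        # (2.2b): every later FRM cell triggers one more reduction.
        t_k = crm * spacing + t0 if n == 0 else spacing
        if elapsed + t_k > tau:
            return n
        elapsed += t_k
        n += 1
        acr = acr0 * (1.0 - cdf) ** n  # (2.1)


def no_net_reduction(n, acr0, rif, pcr, cdf):
    """Condition (2.6): the additive increase RIF*PCR offsets n reductions."""
    return rif * pcr / acr0 + (1.0 - cdf) ** n >= 1.0


# Hypothetical example: the BRM gap tau reflects an older, much lower rate.
params = dict(acr0=50_000.0, cdf=1 / 16, crm=32, nrm=32, mrm=2, trm=0.1)
n = count_reductions(tau=0.031, **params)
print(n, no_net_reduction(n, acr0=50_000.0, rif=1 / 16, pcr=100_000.0, cdf=1 / 16))
```

Keeping the reduction count separate from the stability test mirrors the split between Lemma 2.2 and Theorem 2.1, so either piece can be checked on its own.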
In a sense, we are "integrating" the effect across multiple BRM cells returning to the source, without necessarily maintaining the constraint that there are reductions occurring between every arriving BRM cell. In some cases, a BRM cell may arrive prior to Rule 6 being triggered. We also compute the worst case when the maximal number of reductions occur.

The previous subsection looked at the case when there were no RM cells lost in the network. If there are RM cells lost, there is an increasing likelihood of rate reductions due to Rule 6. RM cell losses both in the forward (FRM cells) and backward (BRM cells) direction can result in the source experiencing a rate reduction. For the rest of this section, we will treat all RM cell losses as effectively being BRM cell losses (since when viewed from the source, a BRM cell does not arrive in time), for ease of exposition. The first case we examine here is the effect of losing ν − 1 consecutive BRM cells. Thus, the "epoch" between RM cell arrivals is now increased by ν − 1 BRM cell inter-arrival times. For clarity, we assume that with all possible reductions, ACR ≥ MCR and we know:

$$\mathrm{MCR} \le \mathrm{ACR} \le \mathrm{ER} \le \mathrm{PCR}. \qquad (2.9)$$

In the following rate analysis, we consider the worst case scenario for rate reductions. Consequently, we assume $t_0 = 0$. Suppose that after initial start-up, the source rate, ACR, converges to ER. We assume as before that the network consistently returns a constant allocation of ER for the VC. We derive the conditions such that ACR remains stable at ER in the event of losing one or more BRM cells. Suppose that ν − 1 BRM cells were lost. Then the time interval between two consecutive BRM cells received with BN=0 is $\nu \cdot \mathrm{Nrm}/\mathrm{ER}$. The following proposition gives the condition that the rate ACR remains stable:

Proposition 2.1. Suppose that ν − 1 BRM cells were lost consecutively. The rate ACR is stable if

$$n \le \frac{\log\left( 1 - \dfrac{\mathrm{RIF} \cdot \mathrm{PCR}}{\mathrm{ER}} \right)}{\log(1 - \mathrm{CDF})} \qquad (2.10)$$

where

$$n = \max\left\{ k : \sum_{i=1}^{k} t_i \le \frac{\mathrm{Nrm}}{\mathrm{ER}} \cdot \nu \right\} \qquad (2.11)$$

where $t_i$ is the time interval for the $i$th consecutive rate reduction, given in (2.2).

Proof. The rate is stable if an increment of ACR from the arrival of a BRM cell with BN=0 offsets the decrements before its arrival. This can be derived directly from Theorem 2.1 with $\mathrm{ACR}_0 = \mathrm{ER}$ and $\tau = \nu \cdot \mathrm{Nrm}/\mathrm{ER}$.

In the above proposition, the rate stability conditions depend on two intermediate parameters: $t_i$ and $n$. The condition and intuition derived from these are related to the number of reductions due to Rule 6 and the time for these, which are similar to the conditions on the system parameters that are specified in the ATM Forum [ATM]. We now go a significant step further, which is one of the main results of our paper: a quantification of how many BRM cells could be lost consecutively without affecting the rate stability, based on the system parameters only. A detailed proof is in the Appendix.

Theorem 2.2. If $\mathrm{ER} \le \mathrm{RIF} \cdot \mathrm{PCR}$ then the rate remains stable. Otherwise, the rate ACR remains stable when no more than ν − 1 consecutive BRM cells are lost, where:

$$\nu \le \frac{1}{\mathrm{Nrm}} \left\{ \mathrm{Crm} \cdot \mathrm{Nrm} + (l - 1) \cdot \mathrm{Nrm} + (r - l) \cdot (\mathrm{ER} \cdot \mathrm{Trm} + 1) + \left[ \frac{\log\left( 1 - \dfrac{\mathrm{RIF} \cdot \mathrm{PCR}}{\mathrm{ER}} \right)}{\log(1 - \mathrm{CDF})} - r \right] (\mathrm{Mrm} + 1) \right\} \qquad (2.12)$$

where

$$l = \max\left\{ 0, \; \frac{\log \dfrac{\mathrm{Nrm}}{\mathrm{Trm} \cdot \mathrm{ER}}}{\log(1 - \mathrm{CDF})} \right\} \qquad (2.13)$$

and

$$r = \max\left\{ 0, \; \frac{\log \dfrac{\mathrm{Mrm}}{\mathrm{Trm} \cdot \mathrm{ER}}}{\log(1 - \mathrm{CDF})} \right\}. \qquad (2.14)$$

If the rate remains high, so that $\mathrm{Nrm}/\mathrm{ACR} \le \mathrm{Trm}$, the rate decreases are triggered after Crm FRM cells are sent. In this case, $t_1 = \mathrm{Crm} \cdot \mathrm{Nrm}/\mathrm{ACR}_0$ and $t_i = \mathrm{Nrm}/\mathrm{ACR}_{i-1}$, $i = 2, \ldots, l$. Then, we have $n = \nu - \mathrm{Crm} + 1$. From Proposition 2.1, we have:

Corollary 2.3. If $\mathrm{ER} \le \mathrm{RIF} \cdot \mathrm{PCR}$ then the rate remains stable. Otherwise, suppose that the rate ACR satisfies $\mathrm{Nrm}/\mathrm{ACR} < \mathrm{Trm}$. Then the number of allowed consecutive BRM cell losses is:

$$\nu \le \frac{\log\left( 1 - \dfrac{\mathrm{RIF} \cdot \mathrm{PCR}}{\mathrm{ER}} \right)}{\log(1 - \mathrm{CDF})} + \mathrm{Crm} - 1. \qquad (2.15)$$

When the rate is low, i.e., $\mathrm{ER} \le \mathrm{RIF} \cdot \mathrm{PCR}$, then the rate remains stable because a single rate increase from a BRM cell overcomes the accumulated effect of rate decreases from Rule 6. The interesting region for the rate stability issue is when $\mathrm{ER} > \mathrm{RIF} \cdot \mathrm{PCR}$. In this case, the number ν of consecutive losses tolerated depends on several parameters and the network feedback rate ER.

2.2.1. Numerical Results for Acceptable Number of Consecutive RM Cell Losses

We examine, numerically, the number of consecutive RM cell losses (called BRM cell loss for simplicity), for typical parameter values, as the source rate ACR varies. We restrict ourselves to the condition that $\mathrm{ER} \ge \mathrm{RIF} \cdot \mathrm{PCR}$, and in particular to the case when Corollary 2.3 applies. In Table 2.1, we examine the number of consecutive RM cell losses that may be tolerated, for different ratios of the "steady state" rate, ER, returned by the network to the peak rate, PCR.

              ER/PCR=0.1    ER/PCR=0.25   ER/PCR=0.5    ER/PCR=1.0
  CDF=1/2     31.245110     31.093110     31.048580     31.022720
  CDF=1/32    36.351370     33.032790     32.0          31.496030
  CDF=1/64    41.788370     35.098110     33.0160       32.0

Table 2.1. Bounds on Allowable Consecutive RM Cell Loss for Varying Values of CDF (RIF = 1/64)
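The bound of Corollary 2.3 is straightforward to evaluate. The sketch below is an illustrative Python rendering of the right-hand side of (2.15), using Crm = 32 and RIF = 1/64 as defaults; the function name and the convention of returning infinity when ER ≤ RIF · PCR (the case the corollary declares stable) are assumptions made for this example. Its output can be compared against the entries of Table 2.1.

```python
import math


def max_consecutive_losses(er_over_pcr, cdf, rif=1 / 64, crm=32):
    """Right-hand side of (2.15), expressed in terms of the ratio ER/PCR."""
    x = rif / er_over_pcr  # equals RIF * PCR / ER
    if x >= 1.0:
        # ER <= RIF * PCR: the corollary states the rate remains stable,
        # so no finite bound applies (convention chosen for this sketch).
        return math.inf
    return math.log(1.0 - x) / math.log(1.0 - cdf) + crm - 1


for cdf in (1 / 2, 1 / 32, 1 / 64):
    row = [max_consecutive_losses(ratio, cdf) for ratio in (0.1, 0.25, 0.5, 1.0)]
    print(f"CDF={cdf:.6f}", "  ".join(f"{v:.6f}" for v in row))
```

Working with the ratio ER/PCR rather than absolute rates keeps the computation independent of link speed, which is the same choice made for the table.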

The parameter values chosen for the table are Crm = 32, Nrm = 32, Mrm = 2 and Trm = 100 milliseconds. We have chosen a somewhat conservative value of RIF = 1/64, based on our understanding of the need to minimize the burst load on the network when the source rate increases. We have chosen to vary the ratio ER/PCR, so that the results are applicable for a wide range of link speeds, and examine how many consecutive RM cell losses may be tolerated while keeping the source rate stable. We examined equation (2.15) for different values of CDF, varying in powers of 2 from 1/2 to 1/128 [LRMS2].

Increasing ER reduces the tolerance for consecutive RM cell losses. When we consider realistic values of CDF (we believe 1/64 or smaller), quite a large number of RM cell losses are tolerated. For CDF = 1/64, at a ratio of ER/PCR = 0.1, the number of consecutive losses is almost 42, which is reasonable. With a larger CDF, there is less tolerance for loss, but it still appears to be "safely" large. Hence, for low rates, ER, and for reasonable values of the parameters, RM cell losses are not such a substantial concern.

We now examine the tolerance to RM cell losses in the more realistic situation of having a higher ER rate. When the ratio ER/PCR ranges from 0.1 to 1.0, the number of consecutive RM cell losses tolerated continues to reduce. For CDF = 1/64, when ER/PCR = 1.0, the number of consecutive RM cell losses tolerated is 32. At intermediate rates of 0.1 < ER/PCR < 0.5, smaller values of CDF (not shown) allow for slightly greater loss tolerance.

2.3. Rate Analysis Under Probabilistic Assumptions of RM Cell Losses

We now examine the effect of RM cell losses in a probabilistic situation. We assume that our observation of the effect on the source rate, ACR, is over a sufficiently long period, covering multiple "epochs". As before, over this period, the total number of BRM cells (lost as well as those returned to the source) is equal to the number of FRM cells transmitted.
We derive, for a given probability of loss ρ of RM cells, the relationship between the parameters associated with Rule 6 (CDF, Crm) and the rate increase factor RIF so that the expected rate is not below the starting value of ACR. As before, we assume that the target rate ER returned to the source in BRM cells is constant over the interval. Knowing a starting rate ACR at time t = 0 which is equal to the steady-state rate for the VC (i.e., ACR = ER at t = 0), we examine the effect on ACR when RM cells are lost.

Suppose that the cell loss probability of RM cells is 0 ≤ ρ ≤ 1. The source events of interest are either sending FRM cells or receiving BRM cells. Given that we send N FRM cells over a time period, and receive $N \cdot (1 - \rho)$ BRM cells, the probability that the event is that of sending an FRM cell is $1/(2 - \rho)$, and the probability that the event is of receiving a BRM cell is $(1 - \rho)/(2 - \rho)$. The expected amount of rate increases in the steady state over a fixed time interval is $\mathrm{RIF} \cdot \mathrm{PCR}\,[1 - 1/(2 - \rho)]$. Corresponding to the same time interval, a decrease is caused if Crm or more consecutive FRM cells are sent. The probability of sending Crm + i FRM cells consecutively (i.e., without an intervening event of a BRM cell returning) is $1/(2 - \rho)^{\mathrm{Crm} + i}$, $i = 1, \ldots$, and each of these causes a rate decrease of $\mathrm{ACR}\,[1 - (1 - \mathrm{CDF})^i]$. The expected rate reduction is then

$$\sum_{i=1}^{\infty} \frac{\mathrm{ACR}\,[1 - (1 - \mathrm{CDF})^i]}{(2 - \rho)^{\mathrm{Crm} + i}} = \frac{\mathrm{ACR}}{(2 - \rho)^{\mathrm{Crm} - 1}} \left[ \frac{1}{1 - \rho} - \frac{1}{2 - \rho - \mathrm{CDF}} \right].$$

In summary, the expected rate change in the steady state with an initial rate of ACR is:

$$\mathrm{RIF} \cdot \mathrm{PCR} \left[ 1 - \frac{1}{2 - \rho} \right] - \frac{\mathrm{ACR}}{(2 - \rho)^{\mathrm{Crm} - 1}} \left[ \frac{1}{1 - \rho} - \frac{1}{2 - \rho - \mathrm{CDF}} \right].$$

For the rate to be stable, statistically, the expected rate change should be non-negative. Thus, we have:

Proposition 2.2. With an RM cell loss probability 0 ≤ ρ ≤ 1 and current transmission rate ACR, there is no expected rate decrease if:

$$\mathrm{RIF} \cdot \mathrm{PCR} \left[ 1 - \frac{1}{2 - \rho} \right] \ge \frac{\mathrm{ACR}}{(2 - \rho)^{\mathrm{Crm} - 1}} \left[ \frac{1}{1 - \rho} - \frac{1}{2 - \rho - \mathrm{CDF}} \right]. \qquad (2.16)$$

3. BRM CELL TURN-AROUND, FRM CELL TRANSMISSION INTERVALS, AND ACR

We first study the turn-around time for BRM cells at the receiving end-station. When an FRM cell arrives at an end-station with the current network information, it is converted into a turned-around BRM cell. Understanding the contribution of the destination in turning around the RM cell is critical to understanding the time it takes to feed back information to the source to adjust the rate.

Informally, a source sends an FRM cell every Nrm − 1 data cells or based on an upper bound on the time, Trm. These FRM cells arrive in the data stream (sent on the same VC) at the destination protocol machine of the remote station. After incurring queueing delays to have the hardware (adapter) process arriving cells (which we do not consider here), the FRM cell is handed to the destination protocol machine. Our estimation of the turn-around delay starts from the point when the FRM cell is handed to the destination protocol machine. The destination protocol machine follows its rules to take the information received in the FRM cell to create information to be sent in the turned-around BRM cell. Specifically, the DIR bit in the FRM cell is changed from "forward" to "backward" and the BN, CI, NI, QL and SN fields are set if necessary. From this point onwards, the relevant turned-around BRM cells contain updated information to be handed to the source. This process of updating the fields of an FRM cell can usually be done in hardware and the processing time is negligible.

There are rules in the destination which state that the information received in the FRM cell may be used to re-write any turned-around BRM cells already queued to the source protocol machine. For convenience, we call this BRM re-writing. Another alternative is to drop all queued turned-around BRM cells and queue this one to send to the source. We call it BRM dropping. It turns out that the turn-around time is invariant with respect to the two different rules. The turned-around BRM cell is then processed through the protocol state machines at the receiver (source and scheduler machines) to eventually be transmitted back to the sending endsystem. The significant turn-around delay is the time it takes for the BRM cell to receive its opportunity to be transmitted by the scheduler. The turn-around delay estimation ends at the point when the BRM cell is transmitted.

Our estimate of the turn-around time τ, from the moment an FRM cell is received at the destination protocol machine to the time a corresponding BRM cell is sent to the network, is derived informally based on the original specification [ATM]. We chose to do so since it provides more insight. On the other hand, a formal derivation based on the EFSM may also be obtained. We now estimate the delay of the BRM cell in the scheduler with the following assumptions:

Assumption 3.1. (1) Mrm ≥ 1 and Nrm ≥ 2; (2) Mrm < Nrm; and (3) Trm > 1/MCR.

Note that the problem is only relevant under these assumptions. The first lemma is concerned with the ordering of the transmission of RM cells. It is crucial for the estimation of the BRM cell turn-around time.

Lemma 3.1. Suppose that BRM cells are waiting for transmission and that an FRM cell has just been transmitted to the network by the scheduler. Then, the next cell to be transmitted by the scheduler is a BRM cell.

Implementations may be somewhat lax in interpreting the order in which BRM cells and queued data cells may be transmitted. However, the priority, according to the specification (source rule (3)(a)), is given to the waiting BRM cell.

Theorem 3.1. The turn-around delay τ of a BRM cell is bounded by:

$$\frac{1}{\mathrm{PCR}} \le \frac{1}{\mathrm{ACR}} \le \tau \le \min\left\{ \max\left\{ \mathrm{Trm}, \frac{\mathrm{Mrm}}{\mathrm{ACR}} \right\} + \frac{1}{\mathrm{ACR}}, \; \frac{\mathrm{Nrm}}{\mathrm{ACR}} \right\} \le \min\left\{ \max\left\{ \mathrm{Trm}, \frac{\mathrm{Mrm}}{\mathrm{MCR}} \right\} + \frac{1}{\mathrm{MCR}}, \; \frac{\mathrm{Nrm}}{\mathrm{MCR}} \right\}. \qquad (3.1)$$

Furthermore, the bounds are tight.

Proof. We show that τ is primarily the time to transmit a BRM cell to the network by the scheduler. The lower bound in equation (3.1) is obvious: suppose that an FRM cell has just been sent to the network. Then by Lemma 3.1, a BRM cell is sent immediately, in time 1/ACR according to the rate constraint. Since the rate PCR is attainable by ACR, the lower bound is tight.

For clarity, we denote the moment the BRM cell is queued to the scheduler by $t_0$. Again from Lemma 3.1, the upper bound is determined by the time for a maximum number of data cells sent before sending an FRM cell and the time Trm that would have to elapse for an FRM cell to be sent. The bound can be achieved if and only if the last RM cell sent before $t_0$ is a BRM cell. There are two cases.

Case 1: If we ignore the time constraint Trm, then neither source rule (3)(a) nor (3)(b) holds until Nrm − 1 cells have been sent since the last FRM cell was sent. To maximize the number of cells sent after $t_0$, we consider that the two cells sent before $t_0$ are an FRM cell followed by a BRM cell. Therefore, from $t_0$, Nrm − 2 data cells are sent, followed by an FRM cell and then the BRM cell to be transmitted. The total number of cells sent after $t_0$ (including this BRM cell) is Nrm, in time Nrm/ACR.

Case 2: The timer may expire before Nrm cells are sent, that is, before time Nrm/ACR has elapsed. By Lemma 3.1, the longest possible time duration (after $t_0$ and before sending the BRM cell) occurs when the two cells sent right before $t_0$ are an FRM cell and then a BRM cell. The first FRM cell should have been sent at time $t_0 - 1/\mathrm{ACR}$, and at time $(t_0 - 1/\mathrm{ACR}) + \mathrm{Trm}$ another FRM cell will be sent. By Lemma 3.1, it is followed by the BRM cell to be transmitted. Thus, it takes time 2/ACR to send these two RM cells and, therefore, the total time duration from $t_0$ to the moment the BRM cell is sent is Trm + 1/ACR. On the other hand, in conjunction with the expiration of the timer, at least Mrm in-rate cells have to be sent before transmitting a BRM cell, which takes time (Mrm + 1)/ACR, including this BRM cell.

To summarize, the BRM cell is sent whenever one of the above two events happens first, and (3.1) is proved. Since ACR is bounded below by MCR and the bound is tight, the upper bound on the right side of (3.1) is also tight. Since there are no other delays in the protocol state machines other than what is specified in equation (3.1), our estimation of the BRM cell turn-around time is tight.

3.1. FRM CELL INTER-DEPARTURE TIMES

RM cells act as the probes into the network to determine the capacity available for a VC. Frequent probes into the network introduce overhead, which may be excessive. However, sending probes too infrequently results in exposure of the network to congestion and delay in the source’s reaction to network state. Here, we attempt to understand the bounds on the time between transmitting two consecutive FRM cells. We assume that there are sufficient data/BRM cells available for transmission. Otherwise, the inter-departure time for FRM cells is close to being constant, Trm + 1/ACR, until ACR increases above Nrm/Trm.
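The turn-around bounds of Theorem 3.1 and the FRM spacing for a source that always has cells to send can be evaluated numerically. The Python sketch below is a hedged illustration: the function names and example values are hypothetical, rates are assumed to be in cells per second and times in seconds, and the spacing formula is taken from the upper bound this section states for the FRM inter-departure time (an FRM cell goes out after Nrm − 1 in-rate cells, or once both Mrm cells have been sent and Trm has elapsed, whichever comes first).

```python
def turnaround_bounds(acr, pcr, mcr, nrm, mrm, trm):
    """Return the chain of bounds in (3.1) as a tuple:
    (1/PCR, 1/ACR, upper bound at ACR, worst-case upper bound at MCR)."""
    upper = min(max(trm, mrm / acr) + 1.0 / acr, nrm / acr)
    worst = min(max(trm, mrm / mcr) + 1.0 / mcr, nrm / mcr)
    return 1.0 / pcr, 1.0 / acr, upper, worst


def frm_spacing_with_backlog(acr, nrm, mrm, trm):
    """Time between two FRM cells when data/BRM cells are always available
    (assumed form: min{max{Mrm/ACR, Trm}, (Nrm - 1)/ACR} + 1/ACR)."""
    return min(max(mrm / acr, trm), (nrm - 1) / acr) + 1.0 / acr


# Hypothetical example values; Trm > 1/MCR as required by Assumption 3.1.
bounds = turnaround_bounds(acr=1000.0, pcr=10_000.0, mcr=20.0, nrm=32, mrm=2, trm=0.1)
print(bounds)
for rate in (1000.0, 100.0, 10.0):
    print(rate, frm_spacing_with_backlog(rate, nrm=32, mrm=2, trm=0.1))
```

The three example rates were picked so that each of the three terms (Nrm-limited, Trm-limited, Mrm-limited) determines the spacing once.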

Proposition 3.1. Let τ be the time interval between two FRM cell transmissions. If there are data/BRM cells available for in-rate transmission, then

\[
\max\left\{ \frac{\mathrm{Mrm}}{\mathrm{ACR}},\ \mathrm{Trm} \right\} + \frac{1}{\mathrm{ACR}}
\;\le\; \tau \;\le\;
\min\left\{ \max\left\{ \frac{\mathrm{Mrm}}{\mathrm{ACR}},\ \mathrm{Trm} \right\},\ \frac{\mathrm{Nrm} - 1}{\mathrm{ACR}} \right\} + \frac{1}{\mathrm{ACR}} .
\tag{3.2}
\]

Proof. The lower bound is ensured by source rule (3)(a)(i) and the upper bound is a corollary of Lemma 2.1. If there are no data/BRM cells waiting for in-rate transmission, by rule (3)(a)(i) the next in-rate cell shall be an FRM cell if and only if at least Mrm in-rate cells have been transmitted and at least Trm time has elapsed since the last FRM cell was sent. Consequently, the time interval τ is nondeterministic.

condition for the protocol to perform properly. We show that there are no such live-locks in ABR. We use the formal specification in [LRMS], and the reader is referred to that paper for details on the EFSM. Note that sources queue data and RM cells to the scheduler as soon as they are available. Therefore, delay of cells and ‘‘live-locks’’ can only occur in the scheduler. We now examine the scheduler behavior. We first show that the transitions from the active state S_1 are mutually exclusive and inclusive:

Lemma 4.1. In the scheduler machine, at any moment one and exactly one of the transitions from the active state S_1 is executable.

The source rate may change whenever FRM cells are sent. We now discuss the dependency of the rate ACR and FRM cell transmissions.

Sketch of Proof. We ignore the condition t ≥ 1/ACR, since it becomes TRUE as time elapses. Examine the predicates of the outgoing transitions from S_1. It is straightforward to check that the conjunction of any two predicates is FALSE. Therefore, at any moment, at most one of the transitions is executable. On the other hand, one checks that the disjunction of all the predicates is TRUE and this implies that at any moment at least one transition is executable.

Proposition 3.2. Assume data/BRM cells are available for transmission. (1) If ACR ≥ Nrm / Trm then each FRM cell is transmitted after transmitting ( Nrm − 1 ) data/BRM cells. (2) If Mrm / Trm ≤ ACR < Nrm / Trm then FRM cells are transmitted every Trm time interval. (3) If ACR < Mrm / Trm then each FRM cell is transmitted after transmitting Mrm data/BRM cells.
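The three cases of Proposition 3.2 can be illustrated with a short Python sketch. It classifies a source by its ACR and evaluates the expression min{max{Mrm/ACR, Trm}, (Nrm − 1)/ACR} + 1/ACR as an estimate of the FRM inter-departure time. The function names and the parameter values (Nrm = 32, Mrm = 2, Trm = 100 ms) are assumptions chosen for illustration, not values taken from the analysis; rates are in cells per millisecond and times in milliseconds.

```python
def frm_regime(acr, nrm=32, mrm=2, trm=100.0):
    """Return which case of Proposition 3.2 applies (1, 2 or 3).

    acr is in cells/ms and trm in ms; the default parameter values
    are illustrative assumptions.
    """
    if acr >= nrm / trm:
        return 1  # FRM sent after every Nrm - 1 data/BRM cells
    if acr >= mrm / trm:
        return 2  # FRM transmission driven by the Trm timer
    return 3      # FRM sent after Mrm data/BRM cells


def frm_interval(acr, nrm=32, mrm=2, trm=100.0):
    """Estimate the FRM inter-departure time in ms.

    Evaluates min(max(Mrm/ACR, Trm), (Nrm - 1)/ACR) + 1/ACR.
    """
    return min(max(mrm / acr, trm), (nrm - 1) / acr) + 1.0 / acr


if __name__ == "__main__":
    for acr in (1.0, 0.1, 0.01):  # cells per millisecond
        print(acr, frm_regime(acr), frm_interval(acr))
```

With these assumed values, a source at 1 cell/ms falls in case (1) and the estimate reduces to Nrm/ACR = 32 ms; at 0.01 cells/ms it falls in case (3) and the estimate reduces to (Mrm + 1)/ACR.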

The above lemma shows that in the active state one and only one of the outgoing transitions is executable. It does not yet imply that the ABR protocol is live-lock free; we have to show that the transitions which transmit data cells and BRM cells will always be executed within a finite (bounded) amount of time.

From Proposition 3.2 and Lemma 2.1, we have the following relation between the rate ACR and BRM cell transmission.

Corollary 3.1. Assume that data and turned-around BRM cells are available for transmission. Then (1) If ACR ≥ Nrm/Trm then each turned-around BRM cell is transmitted after (Nrm − 2) data cells and one FRM cell are sent. (2) If Mrm/Trm ≤ ACR < Nrm/Trm then turned-around BRM cells are transmitted every Trm time interval. (3) If ACR < Mrm/Trm then each turned-around BRM cell is transmitted after (Mrm − 1) data cells and one FRM cell are sent.

We have also evaluated, numerically, the FRM cell inter-departure times [LRMS2]. The primary observation is that the feedback delay to the source is impacted by the FRM inter-departure times only in an ‘‘intermediate range’’ of source rates (e.g., at a source rate of 1 cell/millisecond, or about 384 Kbits/second, the feedback occurs around once every 33 milliseconds). It is less of a concern at very low source rates or at much higher source rates. We have excluded the details due to space constraints.

Theorem 4.1. The ABR source/destination protocol is live-lock free; both data cells and RM cells waiting for transmission will be sent eventually. Specifically,

(1) So long as data cells are waiting for transmission, FRM cells are sent in a time interval no more than:

\[
\tau_{FRM} = \min\left\{ \max\left\{ \frac{\mathrm{Mrm}}{\mathrm{MCR}},\ \mathrm{Trm} \right\},\ \frac{\mathrm{Nrm} - 1}{\mathrm{MCR}} \right\} .
\tag{4.1}
\]

(2) A turned-around BRM cell will be sent in time no more than:

\[
\tau_{BRM} = \min\left\{ \max\left\{ \mathrm{Trm},\ \frac{\mathrm{Mrm}}{\mathrm{MCR}} \right\} + \frac{1}{\mathrm{MCR}},\ \frac{\mathrm{Nrm}}{\mathrm{MCR}} \right\} .
\tag{4.2}
\]

(3) Data cells waiting for transmission will be sent in time no more than:

\[
\tau_{DATA} = \frac{4}{\mathrm{MCR}} .
\tag{4.3}
\]
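To make the three bounds of Theorem 4.1 concrete, the following Python sketch evaluates τ_FRM = min{max{Mrm/MCR, Trm}, (Nrm − 1)/MCR}, τ_BRM = min{max{Trm, Mrm/MCR} + 1/MCR, Nrm/MCR} and τ_DATA = 4/MCR for a given MCR. The function name and the default parameter values (Nrm = 32, Mrm = 2, Trm = 100 ms) are illustrative assumptions; MCR is in cells per millisecond and must be positive for the bounds to be finite.

```python
def livelock_bounds(mcr, nrm=32, mrm=2, trm=100.0):
    """Evaluate the worst-case waiting times (4.1)-(4.3) in ms.

    mcr is the minimum cell rate in cells/ms (must be > 0).
    The default Nrm, Mrm and Trm values are illustrative assumptions.
    """
    if mcr <= 0:
        raise ValueError("the bounds are only finite for MCR > 0")
    tau_frm = min(max(mrm / mcr, trm), (nrm - 1) / mcr)        # (4.1)
    tau_brm = min(max(trm, mrm / mcr) + 1.0 / mcr, nrm / mcr)  # (4.2)
    tau_data = 4.0 / mcr                                       # (4.3)
    return tau_frm, tau_brm, tau_data


if __name__ == "__main__":
    for mcr in (1.0, 0.1):
        print(mcr, livelock_bounds(mcr))
```

All three bounds scale with 1/MCR or are capped by Trm, so under these assumptions a connection with a very small MCR has correspondingly loose guarantees.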

4. CORRECTNESS — SOURCE/DESTINATION PROTOCOL BEING LIVE-LOCK FREE

Proof. Equation (4.1) can be derived from Proposition 3.1, where the time interval between FRM cell transmissions is derived. Equation (4.2) is a corollary of Theorem 3.1, where the turn-around delay of BRM cells is estimated.

We now discuss whether there are worst case scenarios such that available data or RM cells are never sent out. When this occurs the protocol has a live-lock [Kur]. This concerns the correctness of the rate control scheme and is a necessary

We now analyze the waiting time for a data cell ready for transmission. Data cells are queued to the scheduler immediately, and, therefore, the only delay occurs in the scheduler. There are three cases.

Case 1: The last cell sent was an FRM cell.

Theorem 5.1. EFCI and ER schemes inter-operate as follows:

The corresponding transitions executed are either T_2; or T_3 followed by T_4; or T_3 followed by T_5. In all the cases, we are back in state S_1 with X_1 + X_2 = 0. There are two subcases.

(1) If both EFCI and ER indicate congestion, then the final resulting rate is the minimum of the two rates obtained by rate reductions determined by the two schemes, respectively.

(A) X = 0. In this case, among all the transitions from S_1, only T_7 is executable, since event E is TRUE: there are data cells waiting for transmission, X = 0, and X_1 + X_2 = 0 < Nrm − 1. Therefore, T_7 is executed and the data cell waiting for transmission is sent. The delay is 1/ACR.

(2) If both EFCI and ER indicate no congestion, then the final rate increase is the minimum of the two increases obtained by applying the two schemes, respectively, unless NI = 1 in which case there would be no rate increase.

(B) X > 0. In this case, only T_6 is executable, since X > 0 and X_1 = 0. Therefore, a BRM cell is sent and we are still in state S_1 with X_1 = 1. Now, by a similar argument, among all the transitions from S_1 only T_7 is executable. Note that T_6 is no longer executable since (X_1 = 0) || !E = FALSE. Therefore, T_7 is executed and the data cell is sent. The delay is 2/ACR.

Case 2: The last cell sent was a BRM cell. The corresponding transition for the transmission was T_6. After the execution of T_6, X_1 > 0 and T_6 is no longer executable since (X_1 = 0) || !E = FALSE. From Lemma 4.1, at least one of the transitions from S_1 is executable and, consequently, either an FRM cell or the data cell is sent upon execution. If an FRM cell is sent, then from Case 1 the data cell will be sent in time no more than 3/ACR. Otherwise, the data cell is transmitted in time 1/ACR.

Case 3: The last cell sent was a data cell. The corresponding transition executed was T_7. By Lemma 4.1, among all the transitions from S_1 at least one of them is executable. Therefore, either T_7 is executed next, in which case the data cell is sent, or one of the following transitions is executed: T_2; T_3 followed by either T_4 or T_5; or T_6. In any one of these cases, either an FRM cell or a BRM cell is sent. By the analysis in Cases 1 and 2, the data cell is sent in time no more than 1/ACR + 3/ACR = 4/ACR ≤ 4/MCR.

5. INTEROPERATION OF EFCI AND ER SCHEMES

There are two rate control schemes in ABR: Explicit Forward Congestion Indication (EFCI) and Explicit Rate (ER). We now discuss their interoperation using the formal extended finite state machine specification in [LRMS]. Upon receiving BRM cells, the source adjusts ACR at state S_2 of the source machine. The four transitions involving rate changes are T_4, T_5, T_6, and T_7; they all go from state S_2 to S_1. We first show that these four rate-change transitions are mutually exclusive and inclusive:

(3) If there is a conflict (one indicates congestion and the other indicates no congestion), then the final outcome is the rate reduction obtained by the option indicating congestion.

Proof. The EFCI scheme indicates congestion by setting the CI bit; the ER scheme indicates congestion by setting ER such that ER ≤ ACR. There are three cases.

Case 1: Both EFCI and ER indicate congestion. In this case, (CI = 1) && (ER_BRM ≤ ACR). Source transition T_4 is enabled, and by Lemma 5.1 it is the one and only transition enabled; the result is the following rate change: ACR := max{MCR, min{ACR · (1 − RDF), ER_BRM}}. Thus the final rate is the minimum of the two rates determined by the two schemes, respectively.

Case 2: Both EFCI and ER indicate no congestion. In this case, (CI = 0) && (ACR < ER_BRM), and we consider two sub-cases:

(a) NI = 0: then (CI = 0) && (NI = 0) && (ACR < ER_BRM), and source transition T_5 is the one and only transition enabled; this results in the following rate change: ACR := min{ER_BRM, PCR, ACR + RIF · PCR}. Therefore the final rate increase is bounded by the minimum of the increases allowed by the two options.

(b) NI = 1: then (CI = 0) && (NI = 1) && (ACR < ER_BRM), and source transition T_6 is the one and only transition enabled; the result is no rate increase.

Case 3: One indicates congestion and the other indicates no congestion. Consider two sub-cases:

(a) EFCI indicates congestion but ER indicates no congestion. In this case, (CI = 1) && (ER_BRM > ACR), and source transition T_4 is the one and only transition enabled; the result is the following rate change: ACR := max{MCR, min{ACR · (1 − RDF), ER_BRM}}. Hence the final outcome is the rate reduction due to the EFCI scheme, since ER_BRM > ACR.

Lemma 5.1. In the source machine, in the rate-change state S_2, upon arrival of a BRM cell, one and exactly one rate-change transition is executable from the rate-change state S_2.

(b) EFCI indicates no congestion but ER indicates congestion. In this case, (CI = 0) && (ACR ≥ ER_BRM), and source transition T_7 is the one and only transition enabled; this results in the following new rate:

Sketch of Proof. From the rate-change state S_2, we examine the predicates of the four rate-change transitions: T_4, T_5, T_6, and T_7. Upon arrival of a BRM cell, we only have to check that the conjunction of any two predicates is FALSE and that the disjunction of all these predicates is TRUE.
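The check described in this sketch can be mechanized by enumeration. The Python sketch below does so for guard predicates inferred from the case analysis in the proof of Theorem 5.1 (T4: CI = 1; T5: CI = 0, NI = 0, ACR < ER; T6: CI = 0, NI = 1, ACR < ER; T7: CI = 0, ACR ≥ ER). These guards are an assumption standing in for the exact predicates of the EFSM in [LRMS], and all names in the code are hypothetical.

```python
from itertools import product

# Assumed guards of the four rate-change transitions, inferred from the
# case analysis of Theorem 5.1 (not copied from the EFSM specification).
GUARDS = {
    "T4": lambda ci, ni, acr, er: ci == 1,
    "T5": lambda ci, ni, acr, er: ci == 0 and ni == 0 and acr < er,
    "T6": lambda ci, ni, acr, er: ci == 0 and ni == 1 and acr < er,
    "T7": lambda ci, ni, acr, er: ci == 0 and acr >= er,
}


def enabled(ci, ni, acr, er):
    """Return the names of all transitions whose guard holds."""
    return [name for name, guard in GUARDS.items() if guard(ci, ni, acr, er)]


def check_exactly_one():
    """Check that exactly one guard holds for every input combination.

    The guards depend on the rates only through the comparison of ACR
    with ER, so one representative pair per ordering is sufficient.
    """
    rate_pairs = [(1.0, 2.0), (2.0, 2.0), (3.0, 2.0)]  # ACR <, ==, > ER
    for ci, ni, (acr, er) in product((0, 1), (0, 1), rate_pairs):
        if len(enabled(ci, ni, acr, er)) != 1:
            return False
    return True


if __name__ == "__main__":
    print(check_exactly_one())
```

Enumerating one representative per ordering of ACR and ER keeps the check finite even though the rates themselves are real-valued.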

ACR : = max MCR, ER BRM  .

We now discuss the interoperation of EFCI and ER in determining the rate ACR.

Thus the final outcome is the rate reduction due to ER scheme.
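The case analysis in this proof can be summarized as a single rate-update function. The Python sketch below is a minimal illustration under the assumption that the four transitions behave exactly as described in the cases above; the function name and the numeric values in the example are hypothetical.

```python
def update_acr(ci, ni, acr, er_brm, mcr, pcr, rif, rdf):
    """Return the new ACR after a BRM cell, following the proof's cases.

    ci, ni : CI and NI bits of the BRM cell (0 or 1)
    er_brm : explicit rate carried by the BRM cell
    The branch structure is an assumption based on the case analysis,
    not a transcription of the EFSM specification.
    """
    if ci == 1:
        # T4: EFCI signals congestion; ER may reduce the rate further.
        return max(mcr, min(acr * (1.0 - rdf), er_brm))
    if acr >= er_brm:
        # T7: only ER signals congestion.
        return max(mcr, er_brm)
    if ni == 0:
        # T5: no congestion; the increase is limited by ER, PCR and RIF.
        return min(er_brm, pcr, acr + rif * pcr)
    # T6: no congestion, but NI = 1 forbids an increase.
    return acr


if __name__ == "__main__":
    common = dict(mcr=10.0, pcr=1000.0, rif=1 / 16, rdf=1 / 16)
    print(update_acr(1, 0, 100.0, 80.0, **common))   # both indicate congestion
    print(update_acr(0, 0, 100.0, 200.0, **common))  # neither indicates congestion
```

Testing CI first mirrors the observation that T4 handles both Case 1 and Case 3(a), with the min term deciding which scheme determines the reduction.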

6. CONCLUSIONS

ACKNOWLEDGMENT

Protocol design has traditionally been an evolutionary process. First is the genesis of the algorithm and an informal specification of the algorithm using an English description. The protocol is subsequently refined as a result of additional scrutiny and performance analysis using simulations and other techniques. In the process, the protocol begins to accommodate complexity to address real-life situations where the initial design was inadequate. This has also been the evolution of the ATM Forum’s ABR service description and congestion management protocol. Subsequent study based on formal methods is useful to understand the correctness and performance issues of the resulting protocol.

Melody Moh is supported in part by the CSU Research Grant and the SJSU Faculty Development Award.

In this paper, we have examined the performance and correctness issues using a formal specification of the ABR service using an Extended Finite State Machine model. In the correctness domain, we have demonstrated using the EFSM that the protocol is live-lock free. This implies that if there are pending cells to be transmitted from a source, then they will eventually be transmitted. Furthermore, we computed the bounds on the time interval between successive transmissions of these cells. We have also demonstrated that the EFCI and ER options that have been accommodated in the ABR specification interoperate appropriately, with the source rate-change actions reflecting the minimum of the rates resulting from the two mechanisms when the network is congested. When a source is allowed to increase its rate, the rate is also allowed to increase to the minimum of the rate increases allowed by the two options. We believe these are needed to ensure that the protocol is robust without uncovered failure modes.

[CCJ] A. Charny, D. D. Clark, and R. Jain, ‘‘Congestion Control with Explicit Rate Indication,’’ Proc. ICC, June 1995.

In the performance domain, there has been considerable work done in the past to understand various aspects of the mechanism in different network conditions. In this paper, we have examined, based on the understanding obtained from constructing the EFSM, several worst-case behaviors. One of the rules for the source is to have a ‘‘safety-valve’’ for the rate in the light of delayed or lost feedback from the network. We examine the stability of the source rate under various situations of RM cell loss and provide an understanding of the relationships between the different protocol parameters. In particular, the acceptable loss probability depends not only on parameters such as RIF and CDF, but also on the ratio (ER returned to source / PCR). We also provided bounds on the worst-case delay for a destination to turn around RM cells that carry the network feedback to the source. The predominant delay here is the ‘‘waiting’’ period for the Backward RM cell for its turn to be transmitted by the scheduler. Another important bound to understand is the time between transmissions of Forward RM cells from the source, which depends on the current value of the source rate as well as the other connection-level parameters. This has to be considered in the context of the connection’s round trip time, to determine whether the ‘‘RM-cell probes’’ are being transmitted frequently enough.

REFERENCES

[ATM] ATM Forum Traffic Management Specification Version 4.0, ATM Forum/95-0013R11, March 1996.

[CGS] Y. Chang, N. Golmie, and D. Su, ‘‘Study of Interoperability between EFCI and ER Switch Mechanisms for ABR Traffic in an ATM Network,’’ Proc. of the 4th International Conference on Computer Communications and Networks, pp. 310-315, 1995.

[CR] Anna Charny and K. K. Ramakrishnan, ‘‘Time Scale Analysis of Explicit Rate Allocation in ATM Networks,’’ Proceedings of Infocom ’96, March 1996.

[JKFL] Raj Jain, Shiv Kalyanaraman, Sonia Fahmy and Fang Lu, ‘‘Parameter Values for Satellite Links,’’ ATM Forum Contribution ATM Forum/95-972, August 1995.

[Kes] S. Keshav, ‘‘A Control-Theoretic Approach to Flow Control,’’ Proceedings of ACM Sigcomm ’91, September 1991.

[Kur] R. P. Kurshan, Computer-aided Verification of Coordinating Processes, Princeton University Press, Princeton, New Jersey, 1995.

[LY] D. Lee and M. Yannakakis, ‘‘Principle and Methods of Testing Finite State Machines - a Survey,’’ Proceedings of The IEEE, Vol. 84, No. 8, August 1996.

[LMRS] D. Lee, W. M. Moh, K. K. Ramakrishnan, and U. Shankar, ‘‘An Extended Finite State Machine Representation of the Source/Destination Behavior,’’ ATM Forum Contribution ATM_Forum/96-0231, Feb. 1996.

[LRMS] D. Lee, K. K. Ramakrishnan, W. M. Moh, and U. Shankar, ‘‘Protocol Specification Using Parameterized Communicating Extended Finite State Machines - A Case Study of the ATM ABR Rate Control Scheme,’’ Proc. ICNP, pp. 208-217, October 1996.

[LRMS2] D. Lee, K. K. Ramakrishnan, W. M. Moh, and U. Shankar, ‘‘Performance and Correctness of the ATM ABR Rate Control Scheme,’’ Extended Version, June 1996.

[RJ] K. K. Ramakrishnan and Raj Jain, ‘‘A Binary Feedback Scheme for Congestion Avoidance in Computer Networks,’’ ACM Transactions on Computer Systems, May 1990.

APPENDIX: PROOF OF THEOREM 2.2

The first part of the theorem is a simple special case of Corollary 2.1. Now suppose that ER > RIF · PCR. During the stable period where the source’s rate ACR = ER, each FRM cell is sent after Nrm − 1 data or BRM cells are sent. Including the transmission time of the FRM cell, the time interval between successive FRM cells is Nrm/ER. The time interval between receipt of successive BRM cells with BN = 0 is also Nrm/ER, if there is no loss.

Suppose that ν − 1 consecutive FRM cells in a forward direction or BRM cells in a backward direction have been lost. The time interval between two BRM cells is

\[
\tau = \frac{\nu \cdot \mathrm{Nrm}}{\mathrm{ER}} .
\tag{A.1}
\]

Before we analyze the rate stability, we observe that the problem is interesting when ER − MCR > RIF · PCR, where RIF · PCR is the rate increase after receiving a BRM cell. Otherwise, a single rate increase will always take the source transmission rate, ACR, back to ER and the rate is always stable, independent of the amount of decrease due to Rule 6. We assume that ER − MCR > RIF · PCR and provide conditions such that consecutive rate decreases are no more than a rate increase from receiving a BRM cell. We first estimate the time interval t_k between consecutive rate decreases. We assume Nrm − 1 ≥ Mrm, based on generally accepted ranges for these parameters.

We order the decreases in the rate in the following manner: for time instants t_i = 0, ..., l, when the rate is still high, FRM cells are sent after every Nrm − 1 data cells and

For large enough rates of ACR, Rule 6 is triggered based on sending Crm FRM cells, each of which take Nrm/ACR_{i−1} cell times. For lower (intermediate) rates of ACR, when the source has sent more than Mrm cells but less than Nrm cells in time Trm, then the decrease is triggered by Crm FRM cells transmitted upon expiration of the timer Trm. For ‘‘very low rates’’, when Trm has already expired, FRM cell transmission occurs only after the requisite minimum number of cells Mrm, classifying the connection as active, have been sent. From (2.2), we can derive:

\[
t_i =
\begin{cases}
\dfrac{\mathrm{Crm} \cdot \mathrm{Nrm}}{\mathrm{ACR}_0}, & i = 1 \\[2ex]
\dfrac{\mathrm{Nrm}}{\mathrm{ACR}_{i-1}}, & i = 2, \ldots, l \\[2ex]
\mathrm{Trm} + \dfrac{1}{\mathrm{ACR}_{i-1}}, & i = l+1, \ldots, r \\[2ex]
\dfrac{\mathrm{Mrm} + 1}{\mathrm{ACR}_{i-1}}, & i = r+1, \ldots, k
\end{cases}
\tag{A.2}
\]

Therefore,

\[
\sum_{i=1}^{k} t_i = \mathrm{Crm} \cdot \frac{\mathrm{Nrm}}{\mathrm{ACR}_0}
+ \sum_{i=2}^{l} \frac{\mathrm{Nrm}}{\mathrm{ACR}_{i-1}}
+ (r - l) \cdot \mathrm{Trm}
+ \sum_{i=l+1}^{r} \frac{1}{\mathrm{ACR}_{i-1}}
+ \sum_{i=r+1}^{k} \frac{\mathrm{Mrm} + 1}{\mathrm{ACR}_{i-1}} .
\tag{A.3}
\]

From (2.1), ACR_{i−1} = ER(1 − CDF)^{i−1}, we have

\[
\sum_{i=2}^{l} \frac{\mathrm{Nrm}}{\mathrm{ACR}_{i-1}}
= \frac{\mathrm{Nrm}}{\mathrm{ER} \cdot \mathrm{CDF}}
\left[ \left( \frac{1}{1 - \mathrm{CDF}} \right)^{l-1} - 1 \right],
\]

\[
\sum_{i=l+1}^{r} \frac{1}{\mathrm{ACR}_{i-1}}
= \frac{1}{\mathrm{ER} \cdot \mathrm{CDF} \cdot (1 - \mathrm{CDF})^{l}}
\left[ \left( \frac{1}{1 - \mathrm{CDF}} \right)^{r-l} - 1 \right],
\]

and

\[
\sum_{i=r+1}^{k} \frac{\mathrm{Mrm} + 1}{\mathrm{ACR}_{i-1}}
= \frac{\mathrm{Mrm} + 1}{\mathrm{ER} \cdot \mathrm{CDF} \cdot (1 - \mathrm{CDF})^{r}}
\left[ \left( \frac{1}{1 - \mathrm{CDF}} \right)^{k-r} - 1 \right] .
\]

We have

\[
\begin{aligned}
\sum_{i=1}^{k} t_i ={}& \mathrm{Crm} \cdot \frac{\mathrm{Nrm}}{\mathrm{ER}}
+ \frac{\mathrm{Nrm}}{\mathrm{ER} \cdot \mathrm{CDF}}
\left[ \left( \frac{1}{1 - \mathrm{CDF}} \right)^{l-1} - 1 \right]
+ (r - l) \cdot \mathrm{Trm} \\
&+ \frac{1}{\mathrm{ER} \cdot \mathrm{CDF}}
\left[ \left( \frac{1}{1 - \mathrm{CDF}} \right)^{r} - \left( \frac{1}{1 - \mathrm{CDF}} \right)^{l} \right]
+ \frac{\mathrm{Mrm} + 1}{\mathrm{ER} \cdot \mathrm{CDF}}
\left[ \left( \frac{1}{1 - \mathrm{CDF}} \right)^{k} - \left( \frac{1}{1 - \mathrm{CDF}} \right)^{r} \right] .
\end{aligned}
\tag{A.4}
\]

If we consider the boundary cases in (A.2), so that Nrm/ACR_l = Trm and Mrm/ACR_r = Trm, then we obtain (2.13) and (2.14). Since CDF Trm and, similarly, the condition r = 0 implies that Mrm/ACR_i > Trm is always true.

\[
\max\left\{ k : k \le
\frac{\nu \cdot \mathrm{Nrm} - \left[ \mathrm{Crm} \cdot \mathrm{Nrm} + (l - 1) \cdot \mathrm{Nrm} + (r - l) + (r - l) \cdot \mathrm{ER} \cdot \mathrm{Trm} \right]}{\mathrm{Mrm} + 1} + r
\right\} .
\tag{A.6}
\]

On the other hand, from Proposition 2.1, we have

\[
\frac{\nu \cdot \mathrm{Nrm} - \left[ \mathrm{Crm} \cdot \mathrm{Nrm} + (l - 1) \cdot \mathrm{Nrm} + (r - l) + (r - l) \cdot \mathrm{ER} \cdot \mathrm{Trm} \right]}{\mathrm{Mrm} + 1} + r
\;\le\;
\frac{\log\left( 1 - \dfrac{\mathrm{RIF} \cdot \mathrm{PCR}}{\mathrm{ER}} \right)}{\log\left( 1 - \mathrm{CDF} \right)} .
\tag{A.7}
\]

A routine derivation from (A.7) yields (2.12).
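As a numerical illustration of the appendix, the following Python sketch computes the per-decrease intervals t_i directly from the piecewise definition (Crm · Nrm/ACR_0 for i = 1, Nrm/ACR_{i−1} for i = 2, ..., l, Trm + 1/ACR_{i−1} for i = l+1, ..., r, and (Mrm + 1)/ACR_{i−1} for i = r+1, ..., k), with ACR_i = ER(1 − CDF)^i. It also checks the closed form of the high-rate partial sum against direct summation. The function names, the choice of l, r, k as inputs, and all numeric values are assumptions made for illustration only.

```python
def acr_after(i, er, cdf):
    """Rate after i consecutive decreases: ACR_i = ER * (1 - CDF)**i."""
    return er * (1.0 - cdf) ** i


def decrease_intervals(k, l, r, er, cdf, crm, nrm, mrm, trm):
    """Return [t_1, ..., t_k], assuming the boundaries 1 <= l <= r <= k are given."""
    t = []
    for i in range(1, k + 1):
        prev = acr_after(i - 1, er, cdf)
        if i == 1:
            t.append(crm * nrm / prev)   # first decrease needs Crm FRM cells
        elif i <= l:
            t.append(nrm / prev)         # high rate: one FRM per Nrm cells
        elif i <= r:
            t.append(trm + 1.0 / prev)   # intermediate rate: Trm timer
        else:
            t.append((mrm + 1) / prev)   # very low rate: Mrm cells plus FRM
    return t


def high_rate_sum_closed_form(l, er, cdf, nrm):
    """Closed form of sum_{i=2}^{l} Nrm / ACR_{i-1}."""
    q = 1.0 / (1.0 - cdf)
    return nrm / (er * cdf) * (q ** (l - 1) - 1.0)


if __name__ == "__main__":
    params = dict(er=10.0, cdf=1 / 16, crm=32, nrm=32, mrm=2, trm=100.0)
    t = decrease_intervals(k=6, l=3, r=5, **params)
    print(sum(t))                                   # total time for 6 decreases
    print(sum(t[1:3]), high_rate_sum_closed_form(3, 10.0, 1 / 16, 32))
```

Summing the list directly avoids relying on any closed form, which makes it a convenient cross-check when experimenting with different parameter values.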