Real-Time Issues in Call Acceptance Management for ATM networks

C. Courcoubetis (1,2)    G. Fouskas (1,2)    V. Friesen (1)    S. Sartzetakis (1)

1. Institute of Computer Science, FORTH, GREECE
2. Department of Computer Science, University of Crete, GREECE

Abstract

We discuss the issue of measuring Quality of Service (QoS) parameters in an ATM network. Difficulties arise in a high-speed environment because of the need for both timely and accurate information. A fast estimation approach is presented which can be used to enhance ATM network management functions, and this approach is applied to the particular area of Call Acceptance Management.

1

Introduction

An ATM network provides support for a wide range of services which have differing bandwidth requirements, traffic pattern statistics and Quality of Service (QoS) constraints. Cells belonging to a number of different calls are statistically multiplexed in order to make efficient use of network resources. At the same time, the various QoS criteria must be maintained for each of the calls being served. The network is required to fulfill, with the help of the Call Acceptance Management (CAM) function, the two conflicting goals of maximizing resource usage and providing acceptable QoS. It is therefore crucial that the network have, first, a mechanism for accurately measuring QoS, and second, a CAM strategy which can efficiently provide the required QoS for the network users.

There are two main requirements for a mechanism which provides QoS information to the CAM function. The first requirement is that the information it provides must be accurate. The second requirement is that the information it provides must be timely. Due to the dynamically changing traffic mix in the network, and the high speeds at which this traffic is being carried, there are difficulties in providing the network with accurate and timely QoS measurements. Accuracy is difficult because of uncertainties in characterizing traffic. Timeliness is difficult because accurate estimates often require unacceptably long sampling periods due to the "rareness" of the occurrence of the relevant events.

In [4] a theory is proposed that addresses the issues of timeliness and accuracy in QoS estimation. In this paper, we discuss ways in which this fast estimation approach can be used in ATM network management, specifically in the area of Call Acceptance Management. It is clear that any CAM strategy will benefit from the availability of timely, accurate QoS estimations, and other network management functions may also be enhanced.

The paper is organized as follows.
In Section 2 we discuss the problem of QoS estimation and describe a viable solution to this problem. Section 3 explains how this solution can be used to complement most proposed CAM strategies. We conclude, in Section 4, with a summary and some suggestions for further research.

2

Quality of Service Estimation

Quality of Service can be measured by a variety of parameters, including average cell delay, delay jitter, cell-loss rate, and many others (see [12]). Not all of these parameters are directly controllable by, or the responsibility of, the call acceptance manager. For the purposes of our study, we will use the cell-loss rate as the sole criterion for QoS measurements. For most practical purposes, this is the dominant QoS parameter, since small delay and jitter are usually guaranteed by the architecture of the network.

We model the network by a directed graph, where nodes correspond to output-buffered switches and edges correspond to communication links between switches. In this model, for each output queue of an ATM switch, there is a unique corresponding link which is fed by the cells leaving the queue. The network's goal is to keep the cell-loss rate at each link i below some predetermined amount ε_i. We are therefore assuming that the QoS requirements are translated into a set of numbers ε_i, one for each link i.

2.1 The Estimation Problem

When a call with a given traffic profile makes a request for a connection, the call acceptance algorithm is responsible for deciding if and how this new call can be routed through the network so that the resulting cell-loss rate at each link along the route remains below the acceptable cell-loss threshold. Any CAM strategy must therefore be able to estimate, for each link, whether it is currently providing acceptable QoS, and whether adding the new call will degrade the QoS. There are essentially two options available for estimating the cell-loss rate at the links. We discuss these options in relation to the requirements of accuracy and timeliness.

The first option for evaluating cell loss is to use a parametric model of the traffic being offered. Either each call must provide an accurate description of its traffic behaviour by specifying a set of traffic parameters (see [3]), or a model must be fitted to the observed traffic. A number of methods for doing this type of estimation have been proposed [9, 10, 15]. There is much uncertainty inherent in this method of evaluation. If the parameters declared by each call are used for performing the evaluation, then some level of uncertainty is certain to be present in our estimate, because we cannot actually trust the calls to provide accurate traffic parameters, particularly the kind of detailed parameters that are necessary for calculating cell loss. We may improve our estimate by using parameters measured from the input traffic rather than those declared by the call. However, even if these (measured) parameters accurately describe the traffic that is entering the network, they may not accurately describe the traffic arriving at internal switches.
Studies indicate that as aggregate traffic streams travel through the network, the process of switching can cause changes (i.e., smoothing) to both the characteristics of the aggregate traffic streams and the characteristics of individual traffic streams (see [5, 6, 14]). Cell-loss estimates based only on input traffic characteristics would therefore likely predict higher cell-loss rates than are actually occurring.

The second option for evaluating cell loss, which we choose to pursue, is independent of traffic parameters and accounts for the changes that may have occurred to the traffic while traveling through the network. The way in which this is achieved is to explicitly measure, at each link, the cell-loss rate incurred by the current mix of traffic. Thus we base our estimate on the way in which the traffic is actually behaving, not on an inaccurate model which fails for the reasons mentioned above. An obvious problem arises in estimating cell loss at the individual links, however, because the mix of traffic being served at any given link is constantly changing. There is no guarantee that a link will serve a static mix of traffic long enough to make an accurate estimate of its cell-loss rate.

For example, to measure, with reasonable accuracy, a cell-loss rate of 10^-6, one would have to observe on the order of 10^9 cells passing through a buffer. Assuming that a 150 Mb/s link is serving a buffer of ATM cells, and that the load on the link is 0.80, the estimate would require approximately 10^3 seconds of network time. This is, quite obviously, unacceptable for meeting our requirement of timeliness: decisions related to QoS must be made within a much smaller time frame. It would appear that neither of the options we have presented is suitable for providing the required QoS estimate. The first method is too inaccurate, while the second method is too slow. It therefore becomes necessary to derive methods whereby accurate cell-loss-rate estimates can be obtained in a short amount of time.
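The arithmetic behind this timing claim can be sketched as follows. We assume standard 53-byte (424-bit) ATM cells; the 10^9-cell sample size is the rule-of-thumb figure from the text, not a derived quantity.

```python
# Back-of-the-envelope check of the sampling-time estimate above.
CELL_BITS = 53 * 8            # bits per ATM cell
LINK_RATE = 150e6             # link speed, bits/s
LOAD = 0.80                   # offered load on the link

cells_per_sec = LOAD * LINK_RATE / CELL_BITS   # cells observed per second
cells_needed = 1e9                              # samples needed for a 1e-6 estimate
seconds = cells_needed / cells_per_sec

print(f"{seconds:.0f} s")     # on the order of 10^3 seconds
```

At roughly 2.8 x 10^5 cells per second, a 10^9-cell sample indeed takes on the order of an hour, far too long for a call acceptance decision.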

2.2 A Solution: The Virtual Buffer Estimation Technique

In [4] the authors propose a new technique which can be used to reduce the amount of time required to achieve an accurate cell-loss-rate estimate. By observing the overflow process which would occur if the buffers of the switches were reduced in size, one can derive an estimate for the overflow process of the actual system. This is implemented as follows. With each buffer of size B, a number of virtual buffers with sizes B/k1, B/k2 and B/k3 are associated. A device estimates the probability of overflow for each of these smaller buffers by performing direct measurements in the actual switch (which operates with buffer size B). Since the virtual buffers are smaller than the actual buffer, these overflow probabilities are much larger than the overflow probability of the actual buffer, and can therefore be estimated with small variance in a reasonably short time. Using the theory of large deviations, a formula is derived which allows us to infer the overflow probability of the actual buffer of size B from the overflow probabilities of the smaller buffers. This formula is independent of the input traffic models.

Some simulations were run to evaluate the virtual buffer estimation technique. We chose an arbitrary time interval of 2 seconds of network time for obtaining the cell-loss estimate from a 150 Mb/s link. This is a reasonable time frame for a CAM strategy to obtain the information necessary for making call acceptance decisions. In the graphs of Figures 1 to 4, we present results for different sets of traffic mixes (see Appendix A for a description of the simulation and the traffic types). On the x-axis is the load, ρ, placed on the link by the given traffic mix (i.e., some combination of the 2 traffic types given). On the y-axis is the estimated cell-loss rate. For each traffic mix, the graphs contain three separate points.
The first point corresponds to an accurate estimate of the cell-loss rate obtained by observing cell losses from the actual buffer over a long period of time. The second point corresponds to the estimate achieved by using virtual buffers in the 2-second time frame. The third point corresponds to the estimate achieved by counting dropped cells in the 2-second time frame. As can be seen from these graphs, for low cell-loss rates (i.e., below 10^-2) the virtual buffer technique provides a good estimate much more consistently than counting dropped cells. For many of these points, the counting technique failed to see a single cell dropped in such a short time period, and so concluded that the cell loss was zero. For some points, the counting technique was lucky enough to make a good guess, but it could not do so with any consistency. The estimate provided by the counting technique at cell-loss rates lower than those presented here will be even worse, and completely unusable. The virtual buffer technique does not suffer from these shortcomings and is able to estimate cell-loss rates in the region of interest. For high cell-loss rates, the counting technique was able to do quite well in many instances, since overflow was frequent and so a sufficient number of samples were available. However, the network will never operate in the region of cell-loss rates where the counting technique performs well.
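The extrapolation step can be illustrated with a minimal sketch. We do not reproduce the exact formula of [4] here; instead we use the crude large-deviations approximation log P(overflow) ≈ -δB and fit a line through the virtual-buffer measurements. The buffer sizes are those used in Appendix A; the overflow probabilities below are synthetic, manufactured to decay exponentially for illustration.

```python
import math

def extrapolate_overflow(virtual_sizes, overflow_probs, actual_size):
    """Least-squares fit of log P(overflow) versus buffer size,
    extrapolated to the actual buffer size.  Large-deviations theory
    suggests log P(B) ~ -delta * B for large B; [4] derives a refined
    formula, and this is only the crude linear version of the idea."""
    xs = virtual_sizes
    ys = [math.log(p) for p in overflow_probs]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return math.exp(intercept + slope * actual_size)

# Synthetic example: if P(overflow) decays as exp(-0.01 * B), the three
# virtual buffers of sizes ~B/14, B/10 and B/6 (B = 1500) recover the
# actual-buffer overflow probability.
B = 1500
sizes = [107, 150, 250]
probs = [math.exp(-0.01 * b) for b in sizes]
estimate = extrapolate_overflow(sizes, probs, B)
print(estimate)   # ~3.06e-07, i.e. exp(-15)
```

Because the virtual-buffer probabilities are orders of magnitude larger than the target, each can be measured from a short observation window, which is the source of the speed-up.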

3

Application of the Real-Time QoS Estimation Technique

There are many possible applications of our real-time QoS estimation technique. To discuss all such applications would be beyond the scope of this paper. We concentrate, rather, on how this technique can be used to complement the CAM function.

3.1 Acceptance Regions

In order to simplify the task of the CAM functions, there appears to be some merit in being able to define what is known as an acceptance region. The acceptance region for a given link is the set of traffic loads for which the QoS (in our case, the cell-loss rate) can be guaranteed. Thus the definition of an acceptance region divides the set of possible traffic loads into two partitions: inside the acceptance region we find loads which produce acceptable cell-loss rates ("good" points); outside the acceptance region we find loads which produce unacceptable cell-loss rates ("bad" points). If an easily-defined acceptance region exists, then the CAM function need only evaluate whether or not adding the new call will cause the links to operate within the acceptance region or outside of it. Since acceptance regions are, for most practical purposes, impossible to compute analytically (based on inexact data and imperfect models), much research has been devoted to deriving successful approximations of such a region (see [1, 9, 13]).

One possible approximation for the acceptance region is based on the concept of an effective bandwidth [2, 7, 11, 13]. The effective bandwidth of a given call is the "amount" of bandwidth that this call must be allocated so as to statistically guarantee an acceptable level of QoS for the new call and all other calls currently being served. The traffic load on a link is therefore defined in terms of the number of each type of call that is currently using that link. Defining N_i to be the number of calls of type i currently being served by a link and E_i to be the effective bandwidth of type i calls, we can represent the current load on a link as Σ_i N_i E_i. If this sum is less than c, where c is the service rate of the link, then the load comprised of N_1, N_2, ..., N_I is declared to be acceptable. If the sum is greater than c, this load is declared to be unacceptable.
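The effective-bandwidth test above is a one-line computation. In the sketch below the call counts, effective bandwidths and link capacity are hypothetical numbers chosen only to illustrate the accept/reject boundary.

```python
def accepts(counts, eff_bw, capacity):
    """Effective-bandwidth acceptance test: the load sum_i N_i * E_i
    must stay at or below the link service rate c."""
    return sum(n * e for n, e in zip(counts, eff_bw)) <= capacity

# Hypothetical example: two call types with effective bandwidths of
# 2.0 and 0.5 Mb/s on a 150 Mb/s link.
print(accepts([40, 100], [2.0, 0.5], 150.0))   # 40*2 + 100*0.5 = 130 -> True
print(accepts([60, 100], [2.0, 0.5], 150.0))   # 60*2 + 100*0.5 = 170 -> False
```

The appeal of this approximation is exactly this linearity: the boundary of the acceptance region is a hyperplane, so the CAM decision reduces to a weighted sum and a comparison.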
Results confirm that such an approximation of the actual acceptance region is fairly accurate for a wide range of traffic mixes [1, 9, 13]. Another proposed acceptance region gives rise to the two-moment allocation scheme [13], which defines the load on a link in terms of the mean and standard deviation of the bandwidth requirements of the calls it is serving. The current load on a given link, say link j, is given as Σ_i N_i μ_i + α_j (Σ_i N_i σ_i^2)^(1/2), where α_j is a constant related to the characteristics of link j (i.e., buffer size and link capacity), μ_i is the mean bandwidth requirement of type i sources, and σ_i is the standard deviation of the bandwidth requirement of type i sources. (This method inherently assumes that the distribution of the bandwidth required by all existing calls can be approximated by the normal distribution with the same mean and variance. Results based on this model are given in [13].)

Hiramatsu [8] displays the acceptance region calculated by a neural network, where the average bit rate of the cell input is plotted against the bit-rate fluctuation of the cell input. The boundary between the "accepted" and "rejected" regions in this study appears to be nearly linear, lending support to the notion of a simply-defined acceptance region.
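The two-moment load formula translates directly into code. In the sketch below the call counts and the constant α_j are hypothetical; the per-type means and standard deviations are taken from Table A.1 (types (i) and (iii)) purely for illustration.

```python
import math

def two_moment_load(counts, means, stds, alpha_j):
    """Two-moment load estimate for one link:
    sum_i N_i * mu_i + alpha_j * sqrt(sum_i N_i * sigma_i^2)."""
    mean_part = sum(n * m for n, m in zip(counts, means))
    var_part = sum(n * s * s for n, s in zip(counts, stds))
    return mean_part + alpha_j * math.sqrt(var_part)

# Hypothetical mix: 3 calls of type (i) and 2 of type (iii), with
# alpha_j = 1.0 chosen arbitrarily (it depends on buffer size and
# link capacity in [13]).
load = two_moment_load([3, 2], [833.33, 1000.0], [1178.51, 3000.0], 1.0)
print(load)
```

The load would then be compared against the link capacity exactly as in the effective-bandwidth test; the second term penalizes bursty mixes even when their mean load is modest.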


As we have already explained in the previous section, the purpose of using an approximation of the acceptance region in doing call acceptance is to simplify the decision process: one would like to produce a decision that fulfills QoS and requires a small amount of computation. There are some inherent problems in implementing any such CAM strategy. First, the CAM must have knowledge of not only the existence of the acceptance region, but also of the boundaries of the acceptance region, and, for each link, of the position of its current operating point in the acceptance region. The virtual buffer estimation technique can greatly improve the process of defining the boundaries by providing the necessary feedback from the links. Information describing the distance of the current operating point of each link from the boundary of the corresponding acceptance region could be required by CAM functions which must decide on which, among many possible routes, to place an incoming call. Also, the time at which such accurate information can be available is of great importance. In what follows we describe how our real-time estimation strategy can be used to enhance CAM.

Consider the links as monitors that use traffic lights to inform the call acceptance manager of whether or not they are currently offering acceptable QoS. When the QoS is acceptable, the light is green, indicating that more calls can be routed through this link. When the QoS degrades, this light becomes red, indicating that no more calls can be accepted on this link. As well, if an acceptance region is being defined, the green light indicates that the current mix of traffic should be flagged as acceptable, while the red light indicates that the current mix of traffic should be flagged as unacceptable.
When a change in the input traffic occurs, the light becomes yellow, indicating that the estimator is not accurate; the duration of the yellow light should be the sum of the time it takes for the network to reach its new steady state plus the time it takes for the estimator to enter its desired confidence interval. Clearly, such a time will be prohibitively long if the virtual buffer technique is not used. As we demonstrated previously, using the above technique allows us to keep the yellow light on for a relatively short duration; we anticipate this to be on the order of a few seconds. Thus, the delay between the time at which QoS degrades at a link and the time at which the call acceptance manager is alerted to this situation (i.e., sees a red light) is very small.

The call acceptance manager uses the information regarding the colours of the traffic lights at the links as follows. While the traffic lights at the links along a given path are green, calls are accepted along this path. If some traffic light is red, then no call is routed along this path. If some traffic light is yellow, there are two alternatives. The conservative one is to block all calls along this path and wait until all lights turn green. This has the effect of minimizing QoS violations at the expense of reducing the utilization of the network. A riskier strategy would accept calls if the majority of the lights along the path are green and the previous operating point was far from the boundary of the acceptance region. Another observation is that constraining the amount of extra traffic that can be accepted over a certain time period (i.e., disallowing large perturbations of the network load) reduces the duration of the uncertainty period at the expense of increased call blocking. Note also that as the call acceptance manager becomes more and more confident of where the boundary of the acceptance region lies, erroneous call acceptance decisions will become less and less frequent.
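The per-path decision rule just described can be sketched as a small function. The function name, parameters, and the exact tie-breaking behaviour during yellow are our own interpretation of the text, not a specification.

```python
def route_decision(lights, conservative=True, prev_far_from_boundary=False):
    """Traffic-light call-acceptance sketch for one candidate path.
    lights: per-link colours, each 'green', 'yellow' or 'red'.
    The conservative policy blocks on any non-green light; the riskier
    policy accepts during yellow when most lights are green and the
    previous operating point was far from the acceptance-region boundary."""
    if 'red' in lights:
        return 'reject'                       # some link cannot take more calls
    if 'yellow' in lights:
        if conservative:
            return 'reject'                   # wait until all estimators settle
        mostly_green = lights.count('green') > len(lights) / 2
        return 'accept' if mostly_green and prev_far_from_boundary else 'reject'
    return 'accept'                           # all green: QoS acceptable

print(route_decision(['green', 'green', 'green']))              # accept
print(route_decision(['green', 'yellow', 'green']))             # reject (conservative)
print(route_decision(['green', 'yellow', 'green'],
                     conservative=False,
                     prev_far_from_boundary=True))              # accept (risky)
```

The two policies trade QoS-violation risk against utilization exactly as described above; a real CAM would also weigh the distance of each link's operating point from its boundary.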
It becomes apparent from the previous discussion that a number of CAM strategies can be defined by refining the above ideas.

4

Summary and Directions for Further Research

In this paper we addressed some issues related to the accurate and timely provision of QoS estimates in ATM networks. We demonstrated that the estimation technique proposed in [4] can provide accurate cell-loss estimates within a time frame suitable for call acceptance decisions.

Open issues and directions for further research are the following. One should do extensive simulations to validate our belief that the duration of the yellow light (the uncertainty period for the QoS estimators) can be chosen to be small without greatly increasing the cost due to QoS violations; in fact, the compromise between minimizing unnecessary blocking of calls and minimizing the cost of QoS violations should be studied in depth. Another important issue which should be further investigated is the use of the extrapolation technique introduced in [4] for predicting the effect of the addition of new traffic to the system. We believe that this can successfully complement the approach we proposed in the previous section of the paper.

One should also study new "on-line" routing strategies that can now be of practical interest, since the operating point of the network can be estimated in real time. Such strategies could use sophisticated optimization criteria, such as the expected cost due to blocking of different types of calls, various QoS violation costs, etc. One can easily see that "on-line" bandwidth allocation in such networks can be cast as a number of stochastic resource allocation problems, with the new feature that the actual amount of the resource allocated to a call becomes known with some time lag (hysteresis) after the time of the allocation.

Another useful application of the ideas in this paper is the simulation of ATM networks. The speed-up in estimating the QoS parameters remains the same and becomes necessary for many applications. Additional speed-up can be obtained by using fast simulation techniques. We intend to use some of the ideas in this paper in the implementation of the ATM QoS simulation experiment in the RACE project NEMESYS.

References

[1] J. Appleton, "Modeling a Connection Acceptance Strategy for Asynchronous Transfer Mode," Proceedings of the 7th ITC Specialists Seminar, Morristown NJ, 1990, Session 5.1.

[2] F. Borgonovo and L. Fratta, "Policing in ATM Networks: An Alternative Approach," Proceedings of the 7th ITC Specialists Seminar, Morristown NJ, 1990, Session 10.2.

[3] CCITT Recommendation I.311, "B-ISDN General Network Aspects," Geneva, Jan. 1990.

[4] C. Courcoubetis, G. Kesidis, R. Ridder, J. Walrand and R. Weber, "Admission Control and Routing in ATM Networks Using Inferences from Measured Buffer Occupancy," to appear.

[5] A. Descloux, "Stochastic Models for ATM Switching Networks," IEEE Journal on Selected Areas in Communications, Vol. 9, No. 3, April 1991, pp. 450-457.

[6] V.J. Friesen, "A Performance Study of Broadband Networks," Master's Thesis, University of Waterloo, Waterloo, Canada, 1991.

[7] R.J. Gibbens and P.J. Hunt, "Effective Bandwidths for the Multi-type UAS Channel," Statistical Laboratory, University of Cambridge, preprint, 1991.

[8] A. Hiramatsu, "ATM Communications Network Control by Neural Networks," IEEE Transactions on Neural Networks, Vol. 1, No. 1, March 1990, pp. 122-130.

[9] J. Hui, "Resource Allocation for Broadband Networks," IEEE Journal on Selected Areas in Communications, Vol. 6, No. 9, Dec. 1988, pp. 1598-1608.


[10] ... "... Algorithm for ATM Networks," Proceedings ICC 1989, pp. 415-422 (Session 13.5).

[11] F.P. Kelly, "Effective Bandwidths at Multi-class Queues," Statistical Laboratory, University of Cambridge, preprint, 1991.

[12] RACE Project R1005, "Specification of the NIKESH Experimental System," May 1990, RACE Document 05/GPT/BNM/DS/X/009/b1.

[13] RACE Project R1022, "Technology for ATD," RACE Document TG 123 0006 FD CC.

[14] H. Saito, M. Kawarasaki and H. Yamada, "An Analysis of Statistical Multiplexing in an ATM Transport Network," IEEE Journal on Selected Areas in Communications, Vol. 9, No. 3, April 1991, pp. 359-367.

[15] H. Uose, S. Shioda and K. Mase, "Fast Cell Loss Rate Evaluation Methods and Their Application to ATM Network Control," Proceedings of the 7th ITC Specialists Seminar, Morristown NJ, 1990, Session 13.3.


Figure 1: Mix of Traffic Types (i) and (v). [Plot of estimated cell-loss rate vs. link load ρ; series: measurement, virtual buffers estimate, unenhanced estimate. Plot data omitted.]

Figure 2: Mix of Traffic Types (iii) and (v). [Plot of estimated cell-loss rate vs. link load ρ; series: measurement, virtual buffers estimate, unenhanced estimate. Plot data omitted.]

Figure 3: Mix of Traffic Types (ii) and (iv). [Plot of estimated cell-loss rate vs. link load ρ; series: measurement, virtual buffers estimate, unenhanced estimate. Plot data omitted.]

Figure 4: Mix of Traffic Types (i) and (iii). [Plot of estimated cell-loss rate vs. link load ρ; series: measurement, virtual buffers estimate, unenhanced estimate. Plot data omitted.]


A.1 Simulation Models

In our simulation, we model a single link of the network which is fed by a single buffer of an output-buffered switch. Cells arrive at the buffer from a number of traffic sources and are served in a FCFS manner by the link. The buffer is served at a deterministic rate, corresponding to the capacity of the link. The simulation was constructed so as to give us results for a network link operating at a speed of 150 Mb/s. For practical purposes, we used a simulation environment where the real time of the network was scaled to be 25 times slower. In other words, the simulation was constructed for a 6 Mb/s link (rather than a 150 Mb/s link), and each of the traffic sources was slowed down by the same factor. The length of each of the simulation runs must then be scaled in order to give results for the 150 Mb/s case.

Each of our traffic sources is modeled by a two-state Markov chain. A traffic source remains in a given state for an exponentially distributed length of time and, in that state, produces cells at a deterministic rate. In order to validate our results for different traffic mixes, we used the following traffic types (see also Figure A.1 and Table A.1):



Traffic type (i) is a two-state Markov chain with rate matrix

    Q_i = [ -10   10 ]
          [  20  -20 ]

The probability of being in state 0 is therefore π_0 = 2/3 and the probability of being in state 1 is π_1 = 1/3. In state 1, the source produces cells with constant bit rate Λ_1 = 2500 cells/sec, and in state 0 it remains idle (i.e., Λ_0 = 0).


Traffic type (ii) is the same as (i) except that the rates at which it changes from state 0 to state 1 (and vice versa) are sped up by a factor of 10. Consequently, traffic type (ii) is more bursty than traffic type (i). It is worth noticing that the first two moments of the bit rate (see Table A.1) do not account for this difference in burstiness.

Traffic type (iii) has a higher peak bit rate and a higher bit-rate variance than traffic types (i) and (ii).

Traffic type (iv) is similar to (iii) except that, in state 0, it produces cells at a low bit rate.

Traffic type (v) is a low-bit-rate source.
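As a sanity check, the stationary probabilities and bit-rate moments of type (i) can be recomputed from its transition rates; the figures should match the type (i) row of Table A.1. This is a derivation sketch, not part of the original simulation code.

```python
# Two-state Markov source, traffic type (i): transition rates
# q01 = 10 /s (state 0 -> 1) and q10 = 20 /s (state 1 -> 0).
# The stationary distribution solves pi0 * q01 = pi1 * q10.
q01, q10 = 10.0, 20.0
pi0 = q10 / (q01 + q10)                      # P(state 0) = 2/3
pi1 = q01 / (q01 + q10)                      # P(state 1) = 1/3
peak = 2500.0                                 # cells/s produced in state 1

mean_rate = pi1 * peak                        # 833.33 cells/s
std_rate = peak * (pi1 * (1 - pi1)) ** 0.5    # 1178.51 cells/s
scv = (std_rate / mean_rate) ** 2             # squared coeff. of variation = 2.0

print(mean_rate, std_rate, scv)
```

Note that type (ii) has identical first two moments (same π and peak rate) despite its tenfold-faster state changes, which is exactly the burstiness distinction the moments fail to capture.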

A.2 Experiments

We can decompose our simulation runs into three main categories:

1. Our first set of experiments estimated the cell-loss rate such that the 95% confidence interval of our estimator was within 20% of the actual cell-loss rate.

2. Our second set of experiments used the virtual buffer technique of estimating the cell-loss rate. Three subsets of experiments were used:

(a) First, we ran the simulation as long as was necessary to obtain as accurate an estimate as we had for the experiments in #1 above (i.e., an estimate with a 95% confidence interval within 20% of the actual value).

(b) Second, we ran the simulation for time T = 50 sec of network time at 6 Mb/s (or equivalently, for T = 2 sec at 150 Mb/s).

(c) Third, we ran the simulation for time T = 10 sec (0.4 sec at 150 Mb/s).

3. Our third set of experiments estimated the cell-loss rate by counting the dropped cells in an amount of time comparable to that used in #2 above:

(a) First for T = 50 sec (2 sec at 150 Mb/s).

(b) Second for T = 10 sec (0.4 sec at 150 Mb/s).

For each set of experiments (1, 2a, 2b, 2c, 3a and 3b), there were 15 individual runs made: one for each individual traffic type (5 runs) and one for each pair of traffic types (10 runs). The parameters used in the simulations are:


Buffer size B = 1500 cells

Link capacity c = 15000 cells/sec (~6 Mb/s)

Virtual buffers:
    B_0 = 107 cells (~B/14)
    B_1 = 150 cells (B/10)
    B_2 = 250 cells (B/6)
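The structure of the simulated system can be sketched in a few lines. This is our own simplification (Bernoulli arrivals in small time slots rather than the two-state Markov sources above), intended only to show the single-buffer, deterministic-service model; the link rate matches the parameters listed, while the buffer size and the deliberately overloaded arrival rate are illustrative choices that make losses visible in a short run.

```python
import random

def simulate_link(buffer_size, service_rate, arrival_rate, sim_time, dt=1e-5):
    """Slotted-time sketch of the simulated system: cells join a single
    FCFS buffer served at a deterministic rate (the link capacity) and
    are dropped on overflow.  Returns the observed cell-loss rate."""
    queue = 0.0
    arrived = 0
    dropped = 0
    steps = int(sim_time / dt)
    for _ in range(steps):
        queue = max(queue - service_rate * dt, 0.0)   # deterministic service
        if random.random() < arrival_rate * dt:       # at most one cell per slot
            arrived += 1
            if queue + 1.0 > buffer_size:
                dropped += 1                          # buffer overflow: cell lost
            else:
                queue += 1.0
    return dropped / max(arrived, 1)

random.seed(1)
# Deliberate overload: 16000 cells/s offered on a 15000 cells/s link with a
# small (100-cell) buffer, so that losses are frequent in a 1-second run.
loss = simulate_link(buffer_size=100, service_rate=15000.0,
                     arrival_rate=16000.0, sim_time=1.0)
print(loss)
```

With the actual experimental parameters (B = 1500 and loads below capacity), losses are rare events, which is precisely why direct counting over a 2-second window fails and the virtual-buffer estimator is needed.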

The length of this paper does not permit us to present all of the results that we obtained from our experiments. However, the graphs in Figures 1 to 4 are indicative of our findings and represent a cross-section of the traffic mixes used. We would like to point out that, while the x-axis of the graphs indicates the load on the link (ρ), the individual points for a given value of ρ correspond to a particular traffic mix. Various combinations of the 2 traffic types shown on a graph could have produced the same load but different cell-loss rates (due to having different levels of burstiness). Thus the 3 points should be interpreted as corresponding to a certain traffic mix rather than a certain load. In the interests of readability, we did not plot multiple points for the same load.


Figure A.1 Traffic source types. [State-transition diagrams giving the transition rates and per-state cell rates for types (i)-(v); diagram omitted.]

Traffic source type | mean bit rate | std. deviation of bit rate | peak-to-mean ratio | squared coeff. of variation of bit rate
--------------------|---------------|----------------------------|--------------------|----------------------------------------
(i)                 | 833.33        | 1178.51                    | 3                  | 2.0
(ii)                | 833.33        | 1178.51                    | 3                  | 2.0
(iii)               | 1000          | 3000                       | 10                 | 9.0
(iv)                | 825           | 2238.72                    | 10                 | 7.36
(v)                 | 75            | 150                        | 5                  | 4.0

Table A.1 Traffic source characteristics