Simulation Support and ATM Performance Prediction

Björn Levin ([email protected]), Anders Holst ([email protected]), Anders Lansner ([email protected])
Swedish Institute of Computer Science, Box 1263, SE-164 29 KISTA, Sweden

Zsolt Haraszti ([email protected])
SwitchLab, Ericsson Infocom Systems AB, SE-126 25, Sweden

Abstract

The estimation of quality of service through the simulation of traffic passing through an ATM network requires considerable computational resources, due to the rare-event nature of the phenomena in question. By training a neural network to mimic the output of the simulator, i.e. to capture the functional relationship between different configurations and loads and their corresponding quality of service, estimates can be produced in under a millisecond, as opposed to around half a minute using only the simulation. This speed-up in turn enables interactive applications with smooth and instantaneous feedback, and the treatment of considerably bigger ATM networks than before. The system can also be run "backwards", i.e. given a desired quality of service, the system can determine the acceptable loads and configurations.

1 Introduction

The aim of the presented work is to improve the prediction of Quality of Service (QoS) in Asynchronous Transfer Mode (ATM) telephone networks. These predictions are normally obtained through simulations, in which synthetically generated traffic is led through a model of the switch and statistics are measured at appropriate points. All traffic in ATM networks is based on fixed-size cells, and the above-mentioned simulations are performed by randomly emitting such cells from a number of sources with adjustable statistical properties. Typically measured QoS properties are delay and loss. It should be noted that, for example, losses occur quite seldom, meaning that we are dealing with so-called "rare event simulations" [1], which require considerable time and computational resources.

Simply stated, the task for the neural network is to predict the outcome of a simulation. This is accomplished by having the neural network capture the functional relationship between the input and output measurements of the system being simulated. The training data consists of 8250 examples

[Figure 1 diagram: a random parameter generator produces a configuration and load description; the cell level simulator (random cell generator, ATM switch model, statistics collector) maps it to a quality of service vector; each such input/output pair forms a training pair for the neural network.]

Figure 1: An overview of the system. A vector with load and configuration parameters can be used by the simulator to compute a quality of service vector (upper solid arrow path). By using a random parameter generator, a large number of such simulations can be initiated. The inputs and outputs of these simulations can be used as training data for the neural network (dashed arrow path). Once trained, the neural network can be used (lower solid arrow path) to compute the quality of service from any load and configuration vector much more rapidly than through simulation.

generated by the simulator. Each individual example is created by randomly selecting input conditions, such as load levels, buffer lengths etc., and then simulating and collecting statistics. Once the neural network has captured the dependencies of the modelled system, the same results can be obtained either through the neural network or through the simulation, with the path through the neural network being orders of magnitude faster. An overview of the situation can be found in figure 1.

It should be noted that the gain in speed means much more than just getting the answers more rapidly. It enables, for example, the creation of interactive applications, where the displayed expected delay is directly updated as the user changes the load. This kind of application gives an unequalled feeling for how the simulated system is likely to respond to different types of changes. It also enables iterative search and rapid massive explorative testing.
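The training-data generation loop described above can be sketched as follows. This is a minimal illustration, not the authors' code: `run_cell_simulation` is a hypothetical stand-in for the cell level simulator, and the uniform sampling of nine parameters is a placeholder for the random parameter generator of figure 1.

```python
import random

def run_cell_simulation(params):
    """Hypothetical stand-in for the cell level simulator: maps a
    9-dimensional configuration/load vector to a 9-dimensional QoS
    vector (mean delay, max delay, loss per service class)."""
    # A real implementation would emit cells from stochastic sources,
    # pass them through the SMX model and collect statistics.
    return [sum(params) / len(params)] * 9

def generate_training_set(n_examples, seed=0):
    """Draw random input conditions, simulate each, and collect
    (input, QoS) training pairs for the neural network."""
    rng = random.Random(seed)
    training_pairs = []
    for _ in range(n_examples):
        # Randomly select input conditions (load levels, buffer lengths, ...)
        params = [rng.uniform(0.0, 1.0) for _ in range(9)]
        qos = run_cell_simulation(params)     # the slow, rare-event part
        training_pairs.append((params, qos))  # one training pair (figure 1)
    return training_pairs

pairs = generate_training_set(10)
```

In the paper's setup this loop was run 8250 times for training (plus 2750 test patterns); the neural network then replaces `run_cell_simulation` at query time.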

2 The ATM Component and its Simulation

The ATM component used most extensively in this work is the Service Multiplexer (SMX), shown in figure 2. It multiplexes data from a number of low-speed ports into a single high-speed data link. The ports are of three different types: 8 Switched Multimegabit Data Service (SMDS) ports (a data service designed to offer nearly constant bitrate data connections between communication access points), 4 video ports, and 16 LAN (IP) ports; each port type with its respective access speed and traffic characteristics.

[Figure 2 diagram: 8 SMDS interface cards (10 Mbps, SMDS-AAL, SMDS poller), 4 video adapters (5 Mbps, Video-AAL, video poller) and 16 Ethernet cards (10 Mbps, IP-AAL, Ethernet card poller) feed, via an SMDS/Video arbiter and high- and low-priority buffers of lengths l_high and l_low, a single 155 Mbps ATM link.]

Figure 2: Schematic diagram of the simulated ATM Service Multiplexer (SMX).

To provide low latency for the delay-sensitive video and SMDS traffic, while minimizing the probability of data loss for the loss-sensitive IP connections, an intricate scheduling/prioritizing discipline is used. The cells of the delay-sensitive services are kept separate from the IP cells up to the last stage of the SMX (see figure 2), where they are merged using a two-layer static priority scheme. IP cells can only be transmitted to the link when the high-priority queue is empty, while video cells are given a lower priority than SMDS cells but are injected before the SMDS cells once accepted. Two buffers, with adjustable lengths, are needed for this scheme.

In the simulation, the random behaviour and correlation structure of the three service classes are modeled by three different types of stochastic sources. The goal is then to estimate the perceived QoS, here expressed as the mean delay, maximum delay and data loss probability for each service type. Using conventional simulation combined with Importance Sampling (IS), we can obtain such estimates for any given configuration.
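The two-layer static priority scheme can be sketched as follows. This is a minimal toy model under stated assumptions, not the authors' simulator: the class `PrioritySMX`, the cell-type names, and the interpretation of "injected before the SMDS cells" as queue-front insertion are illustrative.

```python
from collections import deque

class PrioritySMX:
    """Toy sketch of the SMX output stage: a high-priority queue shared by
    SMDS and video cells, and a low-priority queue for IP cells."""

    def __init__(self, high_limit, low_limit):
        self.high = deque()           # SMDS + video cells, length l_high
        self.low = deque()            # IP cells, length l_low
        self.high_limit = high_limit
        self.low_limit = low_limit
        self.lost = 0                 # cells dropped on buffer overflow

    def accept(self, cell_type):
        """Place an arriving cell in the appropriate buffer, or drop it."""
        if cell_type in ("SMDS", "VIDEO"):
            if len(self.high) >= self.high_limit:
                self.lost += 1
            elif cell_type == "VIDEO":
                # Video loses to SMDS at arbitration, but once accepted it
                # is injected ahead of the queued SMDS cells.
                self.high.appendleft(cell_type)
            else:
                self.high.append(cell_type)
        else:                          # IP cell
            if len(self.low) >= self.low_limit:
                self.lost += 1
            else:
                self.low.append(cell_type)

    def transmit(self):
        """One cell slot on the link: IP cells go out only when the
        high-priority queue is empty."""
        if self.high:
            return self.high.popleft()
        if self.low:
            return self.low.popleft()
        return None
```

For example, after accepting the sequence IP, SMDS, VIDEO, IP, the link would carry VIDEO, SMDS, IP, IP: the high-priority queue drains first, with the accepted video cell ahead of the SMDS cell.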

3 Neural Network Methods

Variations of two main types of neural networks have been used during the work: Multi-Layer Perceptrons (MLPs), trained with Backward Error Propagation (Backprop), and Mixture Density Neural Networks (MDNNs), trained with an Expectation Maximization (EM) algorithm.

Learning in the MLP/Backprop networks was done using conjugate gradient descent, of which perhaps the part contributing most to the performance was the bracketing line search. Errors below 5 % of the range were truncated to zero. A weighting scheme for the training patterns was also designed and employed, since their distribution, especially in the output space, was quite uneven. Training was always continued until stable, i.e. no premature termination to increase generalization was used. The network was set up with three layers, in the form 9-6-6-9, with consecutive layers fully connected and shortcut connections from the input layer to all layers. This size was selected as a good compromise after trial and error.

The MLP networks were also extended to allow inversion. A conjugate gradient descent method was used to adjust the inputs from an initial guess, so as to minimize the squared error between desired and computed output, similar to Kindermann and Linden [3]. Externally imposed constraints between the inputs were expressed as linear relationships. By modifying the conjugate gradient method, these linear conditions could be automatically included in the descent algorithm during inversion.

The mixture density neural network consisted of a mixture model [4] with 36 multivariate Gaussian components in the 18-dimensional space (consisting of the 9 input and 9 output attributes). The mixture model was trained using a Bayesian version of the expectation maximization algorithm [2]. The full covariance matrices were adapted during training. The Bayesian version of EM was used to prevent over-fitting of the 171 free parameters in each covariance matrix, and to prevent components from obtaining zero probability.

One advantage of the MDNN is that inversion is handled without further requirements. All 18 axes are treated the same, and any of them can be used as input or output variables. When some inputs are given to the network, the units compete in a winner-take-all fashion, and the winning mixture component is used to calculate the expected output values given the current inputs. Another advantage of the mixture model is that it allows distributions for the inputs rather than fixed values. In the current implementation a Gaussian distribution could be specified for each input, making it possible to use soft constraints on the variables.
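The MDNN inference step just described, choosing the winning Gaussian component given any subset of known variables and reading off the conditional mean of the rest, can be sketched as follows. This is a 2-D toy with made-up components, not the paper's 18-dimensional, 36-component model; `mdnn_predict` and its arguments are illustrative names.

```python
import numpy as np

def mdnn_predict(weights, means, covs, known_idx, known_vals):
    """Winner-take-all inference in a Gaussian mixture: pick the component
    scoring highest on the known variables, then return the conditional
    mean of the unknown variables under that component."""
    known_idx = list(known_idx)
    dim = means.shape[1]
    unknown_idx = [i for i in range(dim) if i not in known_idx]

    best, best_score = None, -np.inf
    for w, mu, cov in zip(weights, means, covs):
        # Log marginal density of the known variables for this component
        cov_kk = cov[np.ix_(known_idx, known_idx)]
        diff = known_vals - mu[known_idx]
        score = (np.log(w)
                 - 0.5 * np.log(np.linalg.det(cov_kk))
                 - 0.5 * diff @ np.linalg.solve(cov_kk, diff))
        if score > best_score:
            best_score, best = score, (mu, cov)

    mu, cov = best
    # Gaussian conditional mean: mu_u + C_uk C_kk^{-1} (x_k - mu_k)
    cov_kk = cov[np.ix_(known_idx, known_idx)]
    cov_uk = cov[np.ix_(unknown_idx, known_idx)]
    return mu[unknown_idx] + cov_uk @ np.linalg.solve(
        cov_kk, known_vals - mu[known_idx])
```

Because every axis is treated the same, "inversion" is free: conditioning on output variables to recover inputs uses the exact same call. With a single component of mean (0, 0) and covariance [[1, 0.8], [0.8, 1]], observing the first variable at 1.0 yields a conditional mean of 0.8 for the second.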

4 Results

Performance, as measured on a separate test set comprising 2750 patterns, is quite satisfactory. Figures 3 and 4 show the output of the MLP neural network, plotted against the results of the corresponding simulations. Perfect agreement would mean that all points fell on a 45° line through the origin. As the diagrams show, the agreement is not perfect but still very good. One should also remember that the simulation is a stochastic process, hence generating noise that causes deviations from the theoretical line even if the neural network prediction were flawless.

Figure 3: SMDS average delay (in cells) as predicted by the MLP neural network (vertical axis) against the value computed by the cell level simulator (horizontal axis). The agreement is quite good. (See also section 4.)

The most obvious gain from using the neural networks is the speed with which results can be delivered. Estimating the quality of service from an input giving load and other configuration parameters of the ATM component described in section 2 takes around half a millisecond on an ordinary Sun Sparc5 computer. The corresponding simulation takes between half a minute and two minutes, depending on how the simulator is configured. The speed with which the estimates can be delivered means that it has been possible to build an interactive system in which user changes are instantaneously matched by updated dials. It is quite enlightening to see how other values smoothly follow as you adjust any load, configuration parameter or desired quality of service. The system based on the MDNN, although not yet properly tuned, also allows the plotting of graphs showing the relationships between any two variables.

Figure 4: Video cell loss (in cells per cell) as predicted by the MLP neural network (vertical axis) against the value computed by the cell level simulator (horizontal axis); both axes are logarithmic, from 1e-07 to 1. The agreement, especially for very low losses, is not as good as in figure 3 but is still quite acceptable. The important feature is the lack of points far from the previously mentioned diagonal line. (See section 4.)

5 Discussion

In the above, the neural networks have been used to predict the behaviour of single ATM components. The benefits are potentially even higher when larger, especially cyclic, systems are evaluated. Using a simulator is in many such cases impossible, since the time needed for the simulator to settle into a stable state is prohibitive.

If for some reason, e.g. reliability or added accuracy, there is a desire to continue using the simulator, the neural network can still be of assistance. By predicting where QoS values are likely to fall close to important thresholds, simulation resources can be allocated more efficiently. A similar use, which has already been successfully implemented, is to let the neural network adjust the importance sampling parameters needed by the simulator. This reduces the simulation time by a factor of 10 without sacrificing any accuracy or reliability. A future possibility is also to use the rapid QoS estimates as an input when performing admittance control.

Acknowledgment This work has been funded by SwitchLab at Ericsson Infocom Systems AB.

References

[1] P. Heidelberger. Fast simulation of rare events in queueing and reliability models. ACM Trans. on Modeling and Comp. Sim., 5(1):43-85, Jan. 1995.

[2] A. Holst. The Use of a Bayesian Neural Network Model for Classification Tasks. PhD thesis, Dept. of Numerical Analysis and Computing Science, Royal Institute of Technology, Stockholm, Sweden, 1997.

[3] J. Kindermann and A. Linden. Inversion of neural networks by gradient descent. Parallel Computing, 14:277-286, 1990.

[4] G. J. McLachlan and K. E. Basford. Mixture Models: Inference and Applications to Clustering. Marcel Dekker, New York, 1988.