A TABU SEARCH ALGORITHM FOR OPTIMAL SIZING OF LOCALLY RECURRENT NEURAL NETWORKS

B. Cannas, A. Fanni, M. Marchesi, F. Pilo
Dipartimento di Ingegneria Elettrica ed Elettronica – Università di Cagliari
Piazza d'Armi 09123 – ITALY
[email protected]

Abstract

A general-purpose implementation of the Tabu Search metaheuristic, called Universal Tabu Search, is used to optimally design a Locally Recurrent Neural Network architecture. In general, the design of a neural network is a tedious and time-consuming trial-and-error operation that leads to structures whose optimality is not guaranteed. In this paper, the problem of choosing the number of hidden neurons and the number of taps and delays in the FIR and IIR network synapses is formalised as an optimisation problem whose cost function, to be minimised, is the network error calculated on a validation data set. The performance of the algorithm has been tested on the difficult task of learning the chaotic behaviour of a non-linear circuit proposed by Chua as a paradigm for studying chaos.

1. Introduction

Artificial Neural Networks (ANNs) are a very powerful tool and can be a valuable aid in a large number of practical applications such as pattern recognition, prediction, optimisation, associative memory and control. Traditional ANNs have neither feedbacks nor delays, and consequently no memory of past inputs: the output is strictly a function of the instantaneous input to the network. On the other hand, many practical problems such as time series forecasting, digital signal processing, spatio-temporal pattern recognition, or industrial system control and diagnosis require the solution to take into account the existing link between the current input and the previous inputs and outputs, because the output also depends on the previous history of the system. Locally Recurrent Neural Networks (LRNNs) have synapses with internal memory (finite or infinite) [1,2], which provides better modelling accuracy compared to static neural networks and makes them particularly well suited for dynamic applications. LRNNs are made up of units that receive as input, at a generic time instant t, the output of the previous layer units at times t, t-1, t-2, ..., t-n, and their own output at times t-1, t-2, ..., t-m, all suitably weighted. These delayed inputs let the unit know the history of the signal, allowing the creation of richer and more complex decision surfaces. These networks are sometimes called FIR or IIR networks because each synapse has the structure of a Finite Impulse Response (FIR) digital filter or an Infinite Impulse Response (IIR) digital filter.

One of the major difficulties in designing a neural network is the task of determining the actual network structure, such as the number of layers, the number of nodes and the appropriate interconnections. Network performance can change radically when such parameters are modified. Particular attention must be paid to this problem for LRNNs, due to the large number of variables (i.e., number of hidden nodes, number of taps in the IIR and FIR synapses) that influence network performance. This problem can be formulated as an optimization problem with integer variables, and it can be efficiently solved with metaheuristic techniques, which are particularly well suited for combinatorial problems. Recently, a new metaheuristic, called Tabu Search (TS) [3-5], has provided advances in solving difficult optimization problems in many domains [6]. TS is a metaheuristic method that guides the search for the optimal solution by making use of flexible memory systems which exploit the history of the search. TS consists of the systematic prohibition of some solutions to prevent cycling and to avoid the risk of becoming trapped in local minima. New solutions are searched in the neighborhood of the current one. The neighborhood is defined as the set of points reachable with a suitable sequence of local perturbations, starting from the current solution. The TS algorithm can be used to optimally design LRNNs. The LRNN can be identified by means of M integers, each of which represents one of the most important parameters of the network (i.e., the number of hidden neurons or the number of delays in the synapses). In this way the problem becomes discrete and the TS algorithm can be successfully employed. The objective function to be minimized is the minimal error on a validation data set, reached after a complete training epoch. It should be noticed that the optimal neural network found with the TS algorithm has good generalization capabilities or, in other words, it is able to make accurate predictions for data not used in the training phase. The optimal dimensioning allows a more rapid and stable convergence of the learning algorithm even for very long time series. Moreover, the time-consuming manual design phase is avoided, and the software package developed can be used as a general-purpose tool well suited for a wide range of different neural network applications. In this paper, the performance of the proposed approach has been tested on the problem of learning the autonomous behaviour of a chaotic non-linear circuit presented in the literature as a paradigm for studying chaos.

2. Locally Recurrent Neural Networks

The usual feed-forward MLP neural network, once trained by algorithms that minimise the error between the obtained and the desired outputs, allows the reproduction of a general non-linear function provided that there is a sufficient number of neurons in the hidden layers. This non-linear approximation is static, because each input-output couple in the training set is considered independent of the others. Obviously, such a procedure is not suited when the output depends not only on the current input but also on the previous history of the system, and specific mathematical procedures are required. Many practical problems (e.g., time series forecasting or industrial system control and diagnosis) require that the solution take into account the existing link between the current output and the previous inputs and outputs. This goal may be achieved by using globally recurrent neural networks, which have feedback connections among all the network layers. These networks are rather difficult to train because of the need to use the Back-Propagation Through Time (BPTT) algorithm, which is non-causal and requires a large amount of memory, increasing drastically with network dimensions [7,8]. Recently, an intermediate architecture between a global feed-forward architecture and a globally recurrent one has been proposed [1]. These networks are characterised by a feed-forward structure whose synapses between adjacent layers have taps and feedback connections. Thus each synapse in the network has the structure of a Finite Impulse Response (FIR) digital filter or of an Infinite Impulse Response (IIR) digital filter, and for this reason these networks are sometimes called FIR or IIR networks. Different training algorithms have been proposed for such networks [7-12]. The locally recurrent architecture for a very simple network is shown in Fig. 1, where an IIR synapse having two feedback connections is depicted. If no feedback connections exist the synapse becomes a simple FIR filter with tapped inputs. Both FIR and IIR synapse outputs depend on the previous inputs; moreover, IIR outputs also depend on the past values of the output.

2.1. Time series prediction

The goal of time series prediction can be stated as follows: given a sequence deriving from the sampling of a continuous system, i.e., y(t-N), y(t-N+1), ..., y(t), find the continuation y(t+1), y(t+2), ... The oldest and most studied method for this task is linear regression, where the predicted value of the variable y at time t+1 is calculated as the linear combination of the last N samples that minimises the RMS error between the desired output and the calculated one. Besides this approach, the time series prediction problem can be written in terms of a suitable function that maps the samples of the y sequence. This non-linear and dynamic mapping may be achieved with neural networks, networks of FIR or IIR type being particularly well suited for this objective [1,2]. Such networks are trained using as input the value y(t) (previous values are stored in the synapse memories) and calculating the error e(t+1) between the network output y(t+1) and the desired output d(t+1) [4]. This error can then be used to adapt on-line the weights in the FIR and IIR filters, so as to minimise the global instantaneous error. Once training is complete, the network is able to predict the behaviour of the variable y at t+1 if the last N samples are available. This prediction is often called one-step-ahead prediction. The most important task in time series prediction is, of course, long-term forecasting, which may be obtained by using the output at t+1 as the input at instant t+2 (i.e., the LRNN operates under autonomous conditions). These working conditions allow the time series to be forecast as far ahead in time as desired, but the prediction accuracy is not guaranteed because the network has been trained only for one-step-ahead forecasting. However, the capability of the neural network to learn the underlying link between input and output variables yields quite good long-term predictions even for chaotic time series.
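As an illustration of the synapse structure described above (a sketch, not code from the paper), the output of an IIR synapse at time t can be computed as a weighted sum of the current and past inputs (the FIR taps, weights w) plus a weighted sum of the synapse's own past outputs (the feedback connections, coefficients v); with no feedback terms the same routine reduces to a plain FIR synapse. All names are illustrative.

```python
import numpy as np

def iir_synapse(x_hist, y_hist, w, v):
    """Output of one IIR synapse at time t.

    x_hist : [x(t), x(t-1), ..., x(t-n)]   current and delayed inputs
    y_hist : [y(t-1), ..., y(t-m)]         past synapse outputs (feedback)
    w      : n+1 feed-forward tap weights
    v      : m feedback coefficients (empty list for a plain FIR synapse)
    """
    return float(np.dot(w, x_hist) + np.dot(v, y_hist))

def neuron(synapse_outputs):
    """Hidden/output neuron: sum of incoming synapse outputs passed through a
    sigmoid (the 'sgm' block of Fig. 1)."""
    s = sum(synapse_outputs)
    return 1.0 / (1.0 + np.exp(-s))
```

For example, the IIR synapse of Fig. 1, which has two feedback connections, would correspond to m = 2 in this sketch.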

Figure 1: Example of Locally Recurrent Neural Network (two inputs, a hidden layer l=1 and an output layer l=2; the IIR synapses combine tapped delays z⁻¹, forward weights w and feedback coefficients v, with sigmoid activations)

3. Tabu Search Algorithm

TS is based on the use of an adaptive memory: the logic that guides the exploration of the search domain relies on the past trend of the search itself. In other words, the system «learns» from its history and «reacts» by trying to correct or to avoid past errors, using the knowledge acquired up to that moment to direct the search into the most promising regions of the domain. The use of dynamic memory clearly distinguishes TS from other well-known metaheuristics such as Simulated Annealing [13] and the family of Genetic Algorithms [14], which are characterised by memory-less approaches, and from algorithms that use rigid memory structures such as Branch and Bound, A* or other similar AI methods [3,4]. Adaptive memory is the core of any TS algorithm. In particular, the performance of an algorithm is heavily affected by the trade-off between short-term and long-term memory, which can be structured in terms of previous moves, visited solutions and their transition frequencies, and tabu tenure variation (i.e., the residence frequency of the tabu tenure parameter). Different memory management strategies (to which correspond different exploration strategies of the search domain) have been proposed in the literature [4,5]. As discussed in the following sections, we have chosen to implement in our tool only those strategies that are not specifically tied to a particular optimisation problem.

The easiest way to introduce TS is to think of it as a variant of the steepest descent method. Given a function f(x) to be minimised over a set X, and an initial feasible solution x*∈X, f(x*) is evaluated and the set N(x*), called the neighbourhood of x*, is generated. N(x*) is a subset of X such that each of its elements is reachable from x* in only one move. After generating the neighbourhood, the objective function f(x) is evaluated for each of its elements, and the minimal element in N(x*) is chosen as the new solution. To avoid cycling, TS also modifies the definition of neighbourhood by introducing the concepts of tabu solution and tabu move, from which the algorithm takes its name. For this purpose, a dynamic memory structure, called the tabu list, is generated and maintained: it consists of a set T of the TT most recently visited solutions. The new neighbourhood is then defined by the set N(x*)-T. Finally, this simple iterative scheme can be enhanced by introducing several rule-of-thumb criteria such as: aspiration (an element may be removed from the tabu list under certain conditions); intensification (deeply explore a region that looks promising); diversification (leave a region that does not look promising); etc. TS performance heavily depends on the value of different parameters, such as the Tabu Tenure (TT), i.e., the dimension of the tabu list. Reactive Tabu Search (RTS) [15] consists in dynamically tuning such parameters during the search.
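The following minimal sketch (an illustration under stated assumptions, not the authors' Universal Tabu Search implementation) shows the basic TS loop described above applied to a vector of integer sizing variables: at each iteration the unit-step neighbourhood is generated, tabu solutions are excluded, and the best admissible neighbour is accepted even if it worsens the cost; the tabu list is a simple FIFO of the TT most recently visited solutions, and aspiration, intensification and diversification are omitted.

```python
def tabu_search(x0, evaluate, tabu_tenure=7, max_iter=100, lo=1, hi=20):
    """Minimal Tabu Search over a vector of integer design variables.

    x0       : initial solution, e.g. [n_hidden, n_taps, n_feedbacks, ...]
    evaluate : objective function (e.g. validation error) to be minimised
    """
    current = list(x0)
    best, best_cost = list(current), evaluate(current)
    tabu = []                                    # set T: recently visited solutions

    for _ in range(max_iter):
        # Neighbourhood N(x*): solutions reachable by changing one variable by +/-1
        neighbours = []
        for i in range(len(current)):
            for step in (-1, 1):
                cand = list(current)
                cand[i] = min(hi, max(lo, cand[i] + step))
                if cand != current and cand not in tabu and cand not in neighbours:
                    neighbours.append(cand)
        if not neighbours:
            break
        costs = [evaluate(c) for c in neighbours]
        # Accept the best non-tabu neighbour, even if it worsens the cost
        current = neighbours[costs.index(min(costs))]
        tabu.append(list(current))               # short-term memory (tabu list)
        if len(tabu) > tabu_tenure:              # keep only the TT most recent
            tabu.pop(0)
        if min(costs) < best_cost:
            best, best_cost = list(current), min(costs)
    return best, best_cost
```

Section 4.2 uses exactly this kind of integer encoding (five variables in the range 1÷20), with the validation error as the objective to be minimised.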

4. Autonomous model of Chua's Circuit

In recent years, Chua's circuit has been extensively studied, becoming a reference paradigm for chaos [16]. Chua's circuit is one of the simplest systems for which the presence of chaos has been experimentally established, numerically confirmed and mathematically proven. A set of more than 30 strange attractors generated by Chua's circuit has been demonstrated [17], with particular attention on the double scroll and the generalized family of n-double scroll attractors [18]. Recently, the problem of generating the attractors of Chua's circuit by means of different networks has been considered. The neural network approach proposes a globally recurrent neural network that, starting from discrete-time samples of Chua's state variables, is trained to behave like a double scroll [19]. In this case, detailed knowledge of the system is not required, thus yielding a black-box approach, but the problem reduces to the learning of a globally recurrent neural network. Unfortunately, these networks are rather difficult to train because they need the Back-Propagation Through Time (BPTT) algorithm, which is non-causal and requires a large amount of memory, increasing drastically with network dimensions [7,8]. This may lead to serious difficulties when n-double scroll attractors are considered. A simpler alternative for training is the global feed-forward network (e.g., the multilayer perceptron). Unfortunately, the input-output relationship of these networks is static and can be inefficient when the output depends not only on the current input but also on the previous history of the system, as happens in chaotic behaviours. LRNNs avoid these problems, but no constructive algorithms exist to choose their optimal architecture. This problem can be formulated as a combinatorial optimisation problem, and the TS metaheuristic can be used to find the optimal LRNN layout.

4.1. Chua's circuit

Chua's circuit consists of a linear inductor L, two linear capacitors C1 and C2, a linear resistor R and a voltage-controlled PWL resistor NR. The electric circuit is shown in Fig. 2a. The circuit is completely described by a system of three ordinary differential equations, called Chua's circuit state equations:

\[
\begin{cases}
C_1 \dfrac{dv_{c1}}{dt} = G\,(v_{c2} - v_{c1}) - g(v_{c1})\\[4pt]
C_2 \dfrac{dv_{c2}}{dt} = G\,(v_{c1} - v_{c2}) + i_L\\[4pt]
L \dfrac{di_L}{dt} = -v_{c2}
\end{cases}
\tag{1}
\]

where vc1, vc2 and iL denote the voltages across C1 and C2 and the current through L, respectively (see Fig. 2a). An adequate choice of the circuit parameters has been shown to determine a chaotic behavior of the circuit, and several strange attractors have been demonstrated [16,17]. In particular, when the non-linear resistor NR corresponds to the so-called Chua's diode, the circuit generates the double scroll strange attractor [16,17]. The analytic expression of Chua's diode is:

\[
g(v_{c1}) = G_b\,v_{c1} + \tfrac{1}{2}(G_a - G_b)\left(\,\lvert v_{c1} + E\rvert - \lvert v_{c1} - E\rvert\,\right)
\tag{2}
\]

where E is the breakpoint abscissa (see Fig. 2b).
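As a sanity check of Eqs. (1)-(2) (an illustrative sketch, not the code used in the paper), the state equations can be integrated numerically as follows; the parameter values and the initial state anticipate those reported in Section 4.2, and SciPy's odeint is an assumed choice of solver.

```python
import numpy as np
from scipy.integrate import odeint

# Parameter values as reported in Section 4.2
inv_C1, inv_C2, inv_L = 9.0, 1.0, 7.0
G, Ga, Gb, E = 0.7, -0.8, -0.5, 1.0

def g(v):
    """Chua's diode characteristic, Eq. (2)."""
    return Gb * v + 0.5 * (Ga - Gb) * (abs(v + E) - abs(v - E))

def chua_rhs(state, t):
    """Right-hand side of Chua's state equations, Eq. (1)."""
    vc1, vc2, iL = state
    return [inv_C1 * (G * (vc2 - vc1) - g(vc1)),
            inv_C2 * (G * (vc1 - vc2) + iL),
            inv_L * (-vc2)]

# 3000 samples with a 0.11 step, as used for the training set in Section 4.2
t = np.arange(3000) * 0.11
x0 = [0.6365, -0.2610, 0.2889]        # training-set initial state (Section 4.2)
trajectory = odeint(chua_rhs, x0, t)  # columns: vc1, vc2, iL (double scroll)
```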

Figure 2a: Chua's circuit

Figure 2b: The Chua's diode characteristic

4.2. How a LRNN can learn chaos

In [20], it is shown that FIR and IIR neural networks can simulate an autonomous model of Chua's circuit (Fig. 3), obtained by using the output at time t+1 as the input at time t+2 (i.e., the LRNN operates under autonomous conditions). These working conditions allow the time series to be forecast as far ahead in time as desired, but the prediction accuracy is not guaranteed because the network has been trained only for one-step-ahead forecasting. However, the capability of the neural network to learn the underlying link between input and output variables yields quite good long-term predictions even for chaotic time series. Here the LRNN structure adopted consists of an input, a hidden and an output layer. The input and the output layers contain one and three neurons, respectively. The training set has been constructed by numerical integration of Eqs. (1). The values of the parameters used are 1/C1=9, 1/C2=1, 1/L=7, G=0.7, Ga=-0.8, Gb=-0.5 and E=1, with initial state:

\[
\begin{bmatrix} v_{c1} \\ v_{c2} \\ i_L \end{bmatrix} =
\begin{bmatrix} 0.6365 \\ -0.2610 \\ 0.2889 \end{bmatrix}
\]

The training set consists of 3000 data points, corresponding to samples in the time interval [0-320] s with a sampling rate of 0.11. The voltage vc1 (Fig. 2a) is employed as the input of the network, and the target vector is the sequence of all three state variables of the circuit, vc1, vc2 and iL. The validation set consists of 3000 samples with the same sampling rate as the training set, with initial state:

\[
\begin{bmatrix} v_{c1} \\ v_{c2} \\ i_L \end{bmatrix} =
\begin{bmatrix} 0.9375 \\ -0.0599 \\ 0.1880 \end{bmatrix}
\]

The kind of synapses (FIR or IIR), the order of the digital filters and the number of hidden neurons are found with the TS code. The total number of optimization variables is 5 and their values can vary within the range 1÷20. The objective function is the global instantaneous RMS error calculated during a cross-validation learning process [21]. It consists of training the network on the training samples, measuring its performance on the validation samples, and choosing the net that yields the best performance on the validation samples after a very small number (100) of epochs. The TS stops when a suitably chosen value of the objective function, i.e. of the validation set error, is reached; in this application this threshold has been set to –30 dB. The optimal neural network found with the proposed methodology was able to behave like an autonomous model of Chua's circuit. The optimal network architecture consists of 8 hidden neurons and 3 output neurons. All the 8 synapses connecting the input layer to the hidden one are IIR filters with 17 taps in the forward connection and 12 feedback connections. The 8×3 synapses between the hidden and the output layer are IIR filters with 10 taps and 3 feedback connections.
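For concreteness (an assumption about the error measure, since the paper only states an RMS error threshold of –30 dB), the validation objective minimised by TS can be computed as below; rmse_db is an illustrative helper, and with this convention the threshold corresponds to an RMS error of roughly 0.03.

```python
import numpy as np

TARGET_DB = -30.0   # TS stopping threshold on the validation-set error

def rmse_db(predicted, target):
    """Global RMS error on the validation set, expressed in dB
    (20*log10 of the RMS error, equivalently 10*log10 of the MSE)."""
    err = np.asarray(predicted) - np.asarray(target)
    return 20.0 * np.log10(np.sqrt(np.mean(err ** 2)))

# Example: an RMS error of about 0.0316 reaches the -30 dB threshold
# rmse_db([0.0316, -0.0316], [0.0, 0.0])  ->  ~ -30.0
```

In the TS loop sketched in Section 3, each candidate architecture would be trained for 100 epochs and rmse_db evaluated on the 3000 validation samples; the search halts once this value drops below TARGET_DB.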

Figure 3: LRNN structure adopted to simulate an autonomous model of Chua's circuit

Figure 4: Double scroll attractor reconstructed with LRNN (vc2 versus iL)

Figure 5: Double scroll attractor reconstructed with LRNN (vc2 versus vc1)

Figure 6: Double scroll attractor reconstructed with LRNN (iL versus vc1)

This network is considerably smaller than the network used in [20]. In addition, the use of UTS successfully overcomes the very long trial-and-error training procedure. Figures 4-6 show the attractors originated by the optimal LRNN operating under autonomous conditions (i.e., with the output fed back as input). As shown, the relationship among the state variables predicted by the LRNN is a double-scroll strange attractor. Clearly, due to the chaotic nature of the system, the errors become larger when the prediction is projected too far ahead in the future, as shown in Fig. 7, where the behavior of both the real (dotted line) and the predicted (solid line) voltage vc1 is depicted.

Figure 7: Iterated prediction of the voltage across the C1 capacitor
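The autonomous (iterated) operation used to generate Figs. 4-7 can be sketched as follows; lrnn_step is a hypothetical wrapper around the trained network (holding its internal synapse memories), so the snippet only illustrates how the predicted vc1 is fed back as the next input.

```python
def autonomous_run(lrnn_step, v0, n_steps):
    """Closed-loop prediction: the vc1 predicted at t+1 becomes the input at t+2.

    lrnn_step : callable mapping the current input vc1(t) to the predicted state
                (vc1, vc2, iL) at t+1 -- a hypothetical trained-LRNN wrapper
    v0        : initial input sample of vc1
    """
    trajectory = []
    v_in = v0
    for _ in range(n_steps):
        vc1, vc2, iL = lrnn_step(v_in)   # one-step-ahead prediction
        trajectory.append((vc1, vc2, iL))
        v_in = vc1                       # output fed back as the next input
    return trajectory
```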

5. Conclusions

Locally Recurrent Neural Networks proved to be very useful for temporal processing, and are able to model even highly non-linear dynamic systems. They can predict the system behaviour one step ahead and, by iterating the prediction, as far ahead as desired, provided that a suitable learning algorithm and network architecture are adopted. Although efficient learning algorithms have been proposed in the literature, no constructive algorithms exist to design an optimally chosen network architecture. In this paper, the TS metaheuristic has been used to solve this problem, and it has been shown how the optimal design allows a more rapid and stable convergence of the learning algorithm without time-consuming parameter setting. In particular, the results show that the optimal network is able to identify the underlying link among Chua's circuit state variables and exhibits strange attractors under autonomous working conditions. It should be noticed that this is a very interesting result, considering that both the learning phase and the validation test have been performed only for the one-step-ahead prediction. Similar results have been reached in [20], but at the cost of a time-consuming manual design procedure that leads to an empirically found network of larger dimensionality.

6. References

[1] A. D. Back, A. C. Tsoi, "FIR and IIR synapses, a new neural network architecture for time series modeling", Neural Computation, vol. 3, pp. 375-385, MIT, 1991.
[2] A. Back, E. A. Wan, S. Lawrence, A. C. Tsoi, "A unifying view of some training algorithms for multilayer perceptrons with FIR filter synapses", Neural Computation, vol. 8.1, MIT, 1991.
[3] F. Glover, M. Laguna, "Tabu search", in Modern Heuristic Techniques for Combinatorial Problems, Blackwell Scientific Publications, Oxford, pp. 70-150, 1993.
[4] F. Glover, "Tabu search fundamentals and uses", Technical Report, University of Colorado, Boulder, 1994.
[5] F. Glover, "Tabu search - part I", ORSA Journal on Computing, vol. 1, no. 3, pp. 190-206, Summer 1989.
[6] F. Glover, "Tabu search - part II", ORSA Journal on Computing, vol. 2, no. 1, pp. 4-32, Winter 1990.
[7] F. Pineda, "Generalization of back-propagation to recurrent neural networks", Physical Review Letters, vol. 59, pp. 2229-2232, 1987.
[8] P. Werbos, "Backpropagation through time: what it does and how to do it", Proc. IEEE, Special Issue on Neural Networks, vol. 2, pp. 1550-1560, 1990.
[9] E. A. Wan, "Temporal backpropagation for FIR neural networks", in Proc. Int. Joint Conf. Neural Networks, pp. 575-580, San Diego, 1990.
[10] D. Back, A. C. Tsoi, "A simplified gradient algorithm for IIR synapse multilayer perceptrons", Neural Computation, vol. 5, pp. 456-462, MIT, 1993.
[11] P. Campolucci, F. Piazza, A. Uncini, "On-line learning algorithms for neural networks with IIR synapses", in Proc. IEEE Int. Conf. Neural Networks (ICNN-95), Perth, 1995.
[12] P. Campolucci, A. Uncini, F. Piazza, B. D. Rao, "On-line learning algorithms for Locally Recurrent Neural Networks", IEEE Trans. on Neural Networks, vol. 10, no. 2, pp. 253-271, March 1999.
[13] S. Kirkpatrick, C. D. Gelatt, Jr., M. P. Vecchi, "Optimization by simulated annealing", Science, vol. 220, pp. 671-680, May 1983.
[14] D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, Addison Wesley, 1989.
[15] R. Battiti, G. Tecchiolli, "The reactive tabu search", ORSA Journal on Computing, vol. 6, no. 2, pp. 126-140, 1994.
[16] R. Madan, Chua's Circuit: A Paradigm for Chaos, World Scientific, Singapore, 1993.
[17] L. O. Chua, "Global unfolding of Chua's circuit", IEICE Trans. Fundamentals-I, vol. E76-A, pp. 704-734, 1993.
[18] J. A. K. Suykens, J. Vandewalle, "Generation of n-double scrolls", IEEE Trans. Circuits Syst.-I, vol. 40, no. 11, pp. 861-867, 1993.
[19] J. A. K. Suykens, J. Vandewalle, "Learning a simple recurrent neural state space model to behave like Chua's double scroll", IEEE Trans. Circuits Syst.-I, vol. 42, no. 8, pp. 499-502, 1995.
[20] B. Cannas, S. Cincotti, A. Fanni, M. Marchesi, F. Pilo, M. Usai, "Performance analysis of Locally Recurrent Neural Networks", Int. J. COMPEL, vol. 17, pp. 708-716, 1998.
[21] M. Smith, Neural Networks for Statistical Modeling, Van Nostrand Reinhold, 1993.
