Input-Output Stability of Recurrent Neural Networks with Delays using Circle Criteria
Jochen J. Steil and Helge Ritter, University of Bielefeld, Faculty of Technology, Neuroinformatics Group, P.O.-Box 10 01 31, D-33501 Bielefeld, Germany, fjsteil,
[email protected]
Abstract We present a frequency domain analysis of additive recurrent neural networks based on the passivity approach to input-output stability. For normal weight matrices we apply graphical Circle Criteria, which result in effectively computable stability bounds and also cover systems with delay. Approximation techniques yield a further generalisation to arbitrary matrices.
Keywords: recurrent neural network, input-output stability, delay, circle criteria.
1 Introduction
One strong motivation for research on recurrent neural network (RNN) models is their capability to model arbitrary temporal behaviour. Because a number of learning procedures are available which incrementally adapt an RNN to perform a desired transform from time-varying inputs to time-varying outputs [11], recurrent networks are found in a number of application areas and are frequently used as components in larger systems [13, 9]. In this setting we regard a network as an operator acting on inputs, and a basic requirement is then to avoid unbounded responses. To assure this we develop criteria for input-output stability, i.e., we bound the function norm of the output relative to the input norm, where input and output functions are usually taken from the space $L_2$ of square integrable functions. Thus the concept of input-output stability is related to the whole time development of the inputs and outputs and not to internal states of the system [10, 1, 6, 16]. Regarding a neural network as an input-output system is natural from the point of view of technical application; on the other hand it is artificial, because neural network models are by definition
given in state space, with states corresponding to the activity of formal neurons. The dynamics of the network are given by a set of differential equations, and we can use Lyapunov methods to find a globally asymptotically stable (GAS) equilibrium in the state space [2, 4, 7, 8]. However, there is no contradiction between the two approaches: they use different mathematical methods to address different aspects of the system's stability behaviour. The Laplace transform allows us to generate an input-output formulation of a state space model, and equivalence theorems make it possible to switch between the two concepts [12, 16]. Therefore we believe that the introduction of input-output methods from system theory can enrich recurrent network stability theory. We take this approach especially because it integrates un-delayed and delayed systems in one common framework, yields graphical Circle Criteria for stability conditions and can easily be applied to time-varying systems as well. Time-varying systems occur in RNN theory if on-line adaptation of the weight matrix introduces uncertain, changing parameters. To perform such an adaptation it is of great interest to know weight ranges in which networks operate stably despite weight changes within that range. Including this aspect, our problem can be stated as a classical absolute stability problem for recurrent networks: find a class of nonlinear transfer functions and a set of weight matrices such that, for all choices of the transfer function within the class and all weight matrices within the set, the resulting network is input-output stable.

In Section 2 we derive the input-output formulation for the RNN model and its frequency domain description. In Sections 3 and 4 we introduce the basic tools from nonlinear feedback system theory, and in Section 5 we apply them to simple and delayed RNN models. In Section 6 we highlight the connections to Lyapunov theory, in Section 7 we give an illustrative example, and finally we discuss our results.

[Figure 1. The RNN as feedback circuit with linear forward path $G$ and sector bounded feedback $\Phi$.]

2 The input-output framework
We derive an input-output formulation for the type of networks with state space equation

$$\dot{x} = -x + W\Phi(x) + \tilde{u}, \qquad (1)$$

where $W \in \mathbb{R}^{n \times n}$ is the weight matrix and $\Phi$ the vector of nonlinear transfer functions. The aim is to rewrite the system (1) as a nonlinear feedback circuit $(G, \Phi)$ of the type shown in Fig. 1. The forward path in Fig. 1 consists of a linear operator $G$, which is given either in the time domain by a convolution kernel $\mathcal{G}$ or in the frequency domain as a transfer function $G(s)$. The frequency function $G(s)$ can be found directly from (1) using the Laplace transform:

$$y(s) = x(s) = -\frac{1}{j\omega + 1}\, W e(s) = G(s)\, e(s), \qquad (2)$$

where $e = u - \Phi(x)$ and $u = -W^{-1}\tilde{u}$. We denote the output by $y = x$ for compatibility with the usual control theory notation. To obtain the time domain kernel $\mathcal{G}$, which is the impulse response function describing the system's behaviour for a delta peak at $t = 0$ as input, we apply the inverse Laplace transform $\mathcal{G}(t) = \mathcal{L}^{-1}(G(s))$ and get a corresponding linear convolution operator. In the time domain, equation (2) becomes

$$y(t) = (Ge)(t) = (\mathcal{G} * e)(t) = \int_0^t \mathcal{G}(t - \tau)\, e(\tau)\, d\tau,$$
and in either domain we can eliminate $e$ and write the input-output feedback system equation as

$$y = G(u - \Phi(y)).$$

In the feedback path we require the nonlinear function $\Phi(x(t), t) = (\varphi_1(x_1(t), t), \ldots, \varphi_n(x_n(t), t))^T$ to be subject to 'incremental sector conditions'

$$\alpha_i \le \frac{\varphi_i(\xi_i(t), t) - \varphi_i(\xi'_i(t), t)}{\xi_i(t) - \xi'_i(t)} \le \beta_i \qquad (3)$$

for all $\xi(t), \xi'(t) \in \mathbb{R}^n$ with $\xi_i(t) \ne \xi'_i(t)$, $\varphi_i(0, t) = 0$ and $\alpha_i < \beta_i \in \mathbb{R}$. The sector conditions (3) bound the slope of $\varphi_i$ and restrict its graph to lie between the straight lines $y = \alpha_i x$ and $y = \beta_i x$, as indicated in the lower block in Fig. 1. We write $\Phi \in [A, B]$, $A = \mathrm{diag}\{\alpha_i\}$, $B = \mathrm{diag}\{\beta_i\}$, if $\Phi$ belongs to the function class defined by $\alpha_i$, $\beta_i$ and (3), and abbreviate this as $\Phi \in [\alpha, \beta]$ if $A = \alpha I$, $B = \beta I$. As larger sectors cover smaller ones, we always have $\Phi \in [A, B] \Rightarrow \Phi \in [\min \alpha_i, \max \beta_i]$, i.e., the sector conditions can be made uniform. Apart from the sector conditions we do not require further properties of $\Phi$ such as saturation limits, monotonicity or time-invariance. Finally, also by virtue of (3), solutions exist and are unique [6].

An input-output setting for recurrent networks has previously been used only in Guzelis & Chua [5], where algebraic conditions based on the small gain theorem are presented. The approach taken in [5] differs in the position of the weight matrix in the loop: there it is situated in the feedback path and is included in a modified $\tilde{\Phi} = W\Phi$. That approach simplifies the evaluation of conditions on the forward path but does not yield the graphical methods and simplifications we present in this paper. The problem is that sector conditions for $\tilde{\Phi} = W\Phi$ have to be stated in a multivariable fashion referring to matrix cones and do not simply follow from the sector conditions on $\Phi$, as was incorrectly assumed in [5].
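The incremental sector conditions (3) can be probed numerically for a candidate transfer function. The following is a minimal sketch, assuming a scalar nonlinearity such as `tanh`; the helper names `incremental_slopes` and `in_sector`, the sampling scale and the tolerance are illustrative choices, not part of the original text. A sampling test of this kind can refute sector membership but cannot prove it.

```python
import numpy as np

def incremental_slopes(phi, xi, xi_prime):
    """Difference quotients (phi(xi) - phi(xi')) / (xi - xi') for all pairs with xi != xi'."""
    mask = xi != xi_prime
    return (phi(xi[mask]) - phi(xi_prime[mask])) / (xi[mask] - xi_prime[mask])

def in_sector(phi, alpha, beta, n_samples=100_000, scale=5.0, seed=0, tol=1e-9):
    """Sampling-based check of alpha <= slope <= beta on random pairs (xi, xi').

    A failed check disproves membership in [alpha, beta]; a passed check only suggests it.
    """
    rng = np.random.default_rng(seed)
    xi = scale * rng.standard_normal(n_samples)
    xi_prime = scale * rng.standard_normal(n_samples)
    slopes = incremental_slopes(phi, xi, xi_prime)
    return bool(np.all(slopes >= alpha - tol) and np.all(slopes <= beta + tol))

# tanh has difference quotients in (0, 1], so [0, 1] should pass and [0, 0.5] should fail.
tanh_in_unit_sector = in_sector(np.tanh, 0.0, 1.0)
tanh_in_half_sector = in_sector(np.tanh, 0.0, 0.5)
print(tanh_in_unit_sector, tanh_in_half_sector)
```

Because (3) is a condition on difference quotients rather than on derivatives, the same check applies unchanged to non-smooth nonlinearities such as piecewise linear saturations.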
3 L2-stability, passivity and similarity transformations
$L_2$-stability
In the sequel we assume that the input and output functions and the various operators are defined for the function space $L_2$ of square integrable functions. $L_2$-stability of the loop $(G, \Phi)$ in Fig. 1 means that the $L_2$-norm of the output, defined by $\|y\|_2^2 = \int_0^\infty \langle y(t), y(t) \rangle\, dt$, must not exceed the $L_2$-norm of the input by more than a constant gain factor $\gamma$:

$$\|y\|_2 \le \gamma \|u\|_2. \qquad (4)$$

Passivity
In the passivity approach to feedback system stability we analyse the loop $(G, \Phi)$ in terms of properties of the feedforward operator $G$ and the feedback $\Phi$, regarded as independent of each other. We require both paths to be passive, which, in analogy to physical systems, can be interpreted as dissipating the system's energy. Formally the passivity conditions on $G$ and $\Phi$ are stated as

$$\langle Gx, x \rangle \ge \delta \|x\|_2^2, \qquad \langle \Phi x, x \rangle \ge \eta \|x\|_2^2, \qquad (5)$$

where $\delta > 0$, $\eta \ge 0$ and $\langle \cdot, \cdot \rangle$ denotes the scalar product on $L_2$:

$$\langle x, y \rangle = \int_0^\infty \langle x(t), y(t) \rangle\, dt.$$

The conditions (5) are sufficient for $L_2$-stability of $(G, \Phi)$ if $G$ is also $L_2$-stable: $\|Gx\|_2 \le c\, \|x\|_2$ for some constant $c$.

Similarity transformations
It is well known that a coordinate transform $z = Px$ of the system (1) does not change the sector conditions (3) if $P = \mathrm{diag}\{p_i\} > 0$ [8]. It leads to the input-output system $(P^{-1}GP, \Phi)$, because $\Phi$ only represents the class defined by (3), which does not change. We now show that unitary similarity transformations define a second important class of matrices which leave uniform sector conditions invariant and thus can be applied if $\Phi \in [\alpha, \beta]$.

Lemma 1. $\Phi \in [\alpha, \beta] \Leftrightarrow U\Phi U^{-1} \in [\alpha, \beta]$, where $U$ is an arbitrary unitary matrix.

Proof. The Lemma follows from the fact that the sector conditions (3) can be expressed as the scalar product

$$\langle (\Phi(x) - \Phi(x')) - \alpha(x - x'),\ (\Phi(x) - \Phi(x')) - \beta(x - x') \rangle \le 0.$$

Now we substitute $U^{-1}z = x$ and multiply both sides of the scalar product by $U$ to get

$$\langle (\Phi'(z) - \Phi'(z')) - \alpha(z - z'),\ (\Phi'(z) - \Phi'(z')) - \beta(z - z') \rangle \le 0,$$

where $\Phi' = U\Phi U^{-1}$. In view of Lemma 1 we can analyse stability of the loop $(G, \Phi)$ in terms of any loop $(G', \Phi)$ with unitarily transformed $G' = UGU^{-1}$.

[Figure 2. The Circle Criterion: the eigenloci $\lambda_i(j\omega)$ must be inside the critical circle for $\alpha < 0 < \beta$.]
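The invariance stated in Lemma 1 can be illustrated numerically. The sketch below assumes a real orthogonal $U$ (a special case of a unitary matrix, so that the real dot product can be used), the componentwise nonlinearity `tanh` in the uniform sector $[0, 1]$, and random test points; `sector_form` and `phi_prime` are names introduced here for illustration only.

```python
import numpy as np

def sector_form(dphi, dx, alpha, beta):
    """<dphi - alpha*dx, dphi - beta*dx>; non-positive for increments inside the sector [alpha, beta]."""
    return float(np.dot(dphi - alpha * dx, dphi - beta * dx))

rng = np.random.default_rng(1)
n, alpha, beta = 4, 0.0, 1.0

# Random real orthogonal matrix (assumption: real case, so U^{-1} = U^T).
U, _ = np.linalg.qr(rng.standard_normal((n, n)))

def phi(x):
    """Componentwise nonlinearity in the uniform sector [0, 1]."""
    return np.tanh(x)

def phi_prime(z):
    """Transformed nonlinearity Phi'(z) = U Phi(U^{-1} z)."""
    return U @ phi(U.T @ z)

# Largest value of the sector form over random pairs, before and after the transformation.
worst_phi = worst_phi_prime = -np.inf
for _ in range(2000):
    z, z2 = rng.standard_normal(n), rng.standard_normal(n)
    worst_phi = max(worst_phi, sector_form(phi(z) - phi(z2), z - z2, alpha, beta))
    worst_phi_prime = max(worst_phi_prime,
                          sector_form(phi_prime(z) - phi_prime(z2), z - z2, alpha, beta))

print(worst_phi <= 0.0, worst_phi_prime <= 0.0)
```

Note that $\Phi'$ is no longer a componentwise function; only the uniform sector $[\alpha, \beta]$ survives the transformation, which is why the Lemma is restricted to $A = \alpha I$, $B = \beta I$.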
4 The Circle Criterion
To evaluate passivity of $G$ and $\Phi$ graphically it is necessary to transform the loop $(G, \Phi)$, by scaling and addition of linear auxiliary operators, into a loop $(G', \Phi')$ which is equivalent with respect to stability in the sense that $(G, \Phi)$ is $L_2$-stable if and only if $(G', \Phi')$ is. Our further development relies on a set of loop transformations introduced in Harris & Valenca ([6], p. 222), which result in

$$G' = (I + BG)(I + AG)^{-1}, \qquad \Phi' = (\Phi - A)(B - \Phi)^{-1}.$$

In [6] it is further shown that $\Phi'$ is in the infinite sector $[0, \infty]$ and therefore passive, whereas passivity of $G'$ remains to be proven graphically. As the Circle Criterion was originally developed for scalar frequency functions, its application to a multivariate $G(j\omega)$ requires diagonalising $G(j\omega)$ into $\mathrm{diag}\{\lambda_i(G(j\omega))\}$ such that the criterion can be applied to the scalar functions $\lambda_i(j\omega) = \lambda_i(G(j\omega))$. The problem is to carry out the diagonalisation without affecting the sector conditions, which is in general only possible if $G$ is normal, i.e. $G^*G = GG^*$. Then $G$ has a full set of orthogonal eigenvectors, can be diagonalised by a unitary matrix $U$ and, according to Lemma 1, we can perform the respective similarity transform $UGU^{-1}$ without changing the stability behaviour.

To state the graphical stability condition it remains to connect the sector bounds $\alpha, \beta$ to the critical circle $C(-b^{-1}, -a^{-1})$ in the complex $\mathrm{Re}[\lambda_i(j\omega)]$/$\mathrm{Im}[\lambda_i(j\omega)]$-plane shown in Fig. 2. It has its centre on the real line and passes through the points $(-b^{-1}, 0)$ and $(-a^{-1}, 0)$ on the real axis. For $a \to 0$ the critical circle degenerates to the abscissa $y = -b^{-1}$.
Theorem 1. (Circle Criterion) Let $G$ be normal. Then $G'$ is passive if
(i) all eigenloci $\lambda_i(j\omega) = \lambda_i(G(j\omega))$ lie inside and do not touch the critical circle $C(-b^{-1}, -a^{-1})$ for $a < 0 < b$,
(ii) all eigenloci $\lambda_i(j\omega)$ lie outside the critical circle for $0 < a < b$,
where $\Phi \in [a, b]$.

If $G$ is normal, e.g. symmetric, the real numbers $a, b$ can be chosen as $a = \alpha$, $b = \beta$, i.e. the sector conditions (3) directly define the critical circle. To extend the method to non-normal operators $G$ we use an approximation method also proposed in [6]. It replaces the original $G$ in the forward loop by a normal $G_n$ using a number of preliminary loop transformations. If the operator $(G_n - G)$, representing the approximation error, can also be bounded by sector conditions of the type (3), i.e. $(G_n - G) \in [m, r]$, then we choose $a^{-1} = (m + \alpha^{-1})$, $m < -\alpha^{-1}$, and $b^{-1} = (r + \beta^{-1})(1 + \epsilon)^{-1}$, $r > -\beta^{-1}$, $\epsilon > 0$. It follows that the transformed feedback path is passive by virtue of (3) and $(G_n - G) \in [m, r]$, and the circle criterion can be applied to the eigenloci of $G_n$ with modified $a, b$. The respective change in size of the critical circle $C(-b^{-1}, -a^{-1})$ is proportional to the estimate of the approximation error $(G_n - G)$ given by the sector bounds $[m, r]$. Though there is no systematic procedure to find the best normal approximation $G_n$ for a given $G$, it can always be chosen as the diagonal, symmetric or antisymmetric part of $G$. This technique is especially well suited to account for uncertainties and noise in systems with symmetric matrices, because then the approximation error is small and can e.g. be estimated by the variance of the noise process.
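Case (i) of the Circle Criterion can be turned into a simple numerical test on a frequency grid. The sketch below is an illustration under stated assumptions: the function names `critical_circle` and `eigenloci_inside`, the symmetric test matrix `W` and the frequency grid are hypothetical, and the forward operator is taken as $G(j\omega) = -W/(j\omega + 1)$. Sampling frequencies can only detect violations; it does not replace the graphical argument.

```python
import numpy as np

def critical_circle(a, b):
    """Centre (on the real axis) and radius of C(-1/b, -1/a), the circle through (-1/b, 0) and (-1/a, 0)."""
    left, right = -1.0 / b, -1.0 / a
    centre = 0.5 * (left + right)
    radius = 0.5 * abs(right - left)
    return centre, radius

def eigenloci_inside(G_of_w, omegas, a, b):
    """Case (i), a < 0 < b: True if all sampled eigenvalues of G(jw) lie strictly inside the critical circle."""
    if not a < 0.0 < b:
        raise ValueError("case (i) requires a < 0 < b")
    centre, radius = critical_circle(a, b)
    for w in omegas:
        eig = np.linalg.eigvals(G_of_w(w))
        if np.any(np.abs(eig - centre) >= radius):
            return False
    return True

# Hypothetical symmetric (hence normal) weight matrix with eigenvalues +0.5 and -0.5.
W = np.array([[0.0, 0.5], [0.5, 0.0]])
G = lambda w: -W / (1j * w + 1.0)
omegas = np.linspace(-50.0, 50.0, 2001)

wide = eigenloci_inside(G, omegas, a=-1.0, b=1.0)     # critical circle of radius 1
narrow = eigenloci_inside(G, omegas, a=-3.0, b=3.0)   # critical circle of radius 1/3
print(wide, narrow)
```

The larger the sector $[a, b]$, the smaller the critical circle, so enlarging the admissible class of nonlinearities tightens the condition on the eigenloci.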
5 Results for RNN
The application of the Circle Criterion to the RNN case is especially simple, because we make use of the fact that the eigenloci of the forward operator $G(j\omega)$ are circles in the complex plane which depend on the eigenvalues of $W$ only. This graphically simple form also allows conclusions for delayed systems of the form

$$\dot{x}(t) = -x(t) + W\Phi(x(t - \tau)) + \tilde{u}(t) \qquad (6)$$

with frequency function $G^\tau(j\omega) = G(j\omega)\, e^{-j\omega\tau}$. The points of the delayed eigenloci are those of the undelayed $\lambda_i(j\omega)$ rotated around the origin by an amount proportional to $\omega$ and $\tau$ at every frequency $\omega$.

[Figure 3. The eigenvalue circle of the RNN model together with the part $\omega > 0$ of its delayed version for $\tau = 0.3$.]

The most important properties of the eigenloci are summarised in (i)-(iii), see also Fig. 3.
(i) $G(j\omega)$ is normal if $W$ is normal.
(ii) The eigenloci $\lambda_i(G_n(j\omega))$ are circles with centre $(-\frac{1}{2}\mathrm{Re}[\lambda_i(W_n)], -\frac{1}{2}\mathrm{Im}[\lambda_i(W_n)])$ and radius $\frac{1}{2}|\lambda_i|$, denoted by $C(\lambda_i)$.
(iii) The delayed eigenloci $\lambda_i^\tau(j\omega)$ of $G_n^\tau(j\omega)$ lie inside circles centred at the origin with radius $|\lambda_i|$ for all $\tau$.

These properties together with the generalised circle criterion yield the following stability theorems for the recurrent networks in the input-output form of Fig. 1.
Theorem 2. Consider the RNN system (2), where $\Phi \in [\alpha, \beta]$. Then the system is $L_2$-stable if all circles $C(\lambda_i)$ lie inside and do not touch the critical circle $C(-b^{-1}, -a^{-1})$, where $a^{-1} = \alpha^{-1}$, $b^{-1} = \beta^{-1}$ if $W$ is normal, and $a^{-1} = (-r + \alpha^{-1})$, $b^{-1} = (r + \beta^{-1})(1 + \epsilon)^{-1}$ if $W$ is approximated by a normal $W_n$ and $\max_i \lambda_i\left[(W_n - W)^T (W_n - W)\right] \le r^2$.
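The circle properties of the eigenloci used in Theorem 2 follow from the identity $\left| -\frac{\lambda}{j\omega + 1} + \frac{\lambda}{2} \right| = \frac{|\lambda|}{2} \left| \frac{j\omega - 1}{j\omega + 1} \right| = \frac{|\lambda|}{2}$ and can be checked numerically. The sketch below assumes the forward operator $G(j\omega) = -W/(j\omega + 1)$; the function name `eigenloci_properties`, the normal test matrix and the grid are illustrative choices.

```python
import numpy as np

def eigenloci_properties(W, tau, omegas, tol=1e-10):
    """Check on a frequency grid that
    - the values -lambda_i / (jw + 1) are eigenvalues of G(jw) = -W / (jw + 1),
    - they lie on the circle with centre -lambda_i / 2 and radius |lambda_i| / 2,
    - their delayed versions stay inside the disc of radius |lambda_i| around the origin.
    """
    lam, V = np.linalg.eig(W)
    is_eig = on_circle = in_disc = True
    for w in omegas:
        loci = -lam / (1j * w + 1.0)
        G = -W / (1j * w + 1.0)
        # Column k of V is an eigenvector of G(jw) with eigenvalue loci[k].
        is_eig = is_eig and bool(np.allclose(G @ V, V * loci))
        on_circle = on_circle and bool(np.allclose(np.abs(loci + lam / 2), np.abs(lam) / 2))
        delayed = loci * np.exp(-1j * w * tau)
        in_disc = in_disc and bool(np.all(np.abs(delayed) <= np.abs(lam) + tol))
    return is_eig, on_circle, in_disc

# Hypothetical normal matrix (identity plus a skew part) with eigenvalues 1 +/- 0.5j.
W = np.array([[1.0, -0.5], [0.5, 1.0]])
result = eigenloci_properties(W, tau=0.3, omegas=np.linspace(-30.0, 30.0, 601))
print(result)
```

Since the circles depend on the eigenvalues of $W$ only, the stability test reduces to plane geometry once the spectrum of $W$ is known.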
Theorem 3. (Delays) If all circles centred at the origin with radius $|\lambda_i|$ are entirely inside the critical circle, then the system is stable for all delays $\tau > 0$. If the un-delayed system is stable according to Theorem 2, but not all circles with radius $|\lambda_i|$ are inside the critical circle, then the system is stable for all delays smaller than a finite $\tau_{max}$.

The main drawback of the method is the need to choose the largest sector $[\min \alpha_i, \max \beta_i]$ for regularisation of the sector conditions. If $\alpha = 0$, which can be assumed for RNN applications, then Theorem 2 can be modified in order not to lose the information contained in the coordinate-wise upper sector bounds $\beta_i$ in (3). We rewrite the system (1) as

$$\dot{x} = -x + WBB^{-1}\Phi(x) + \tilde{u}.$$

Now the modified feedback $\Phi' = B^{-1}\Phi$ is in the sector $[0, 1]$ and the following holds.

Theorem 4. Consider the input-output RNN (2) (respectively the delayed system (6)) with $\Phi \in [0, B]$. Then the system is $L_2$-stable if Theorem 2 (respectively Theorem 3) holds with $W$ replaced by $WB$, i.e., if $WB$ is normal and the eigenloci $\lambda_i$ ($\lambda_i^\tau$) are to the left of the abscissa $y = 1$, or if $WB$ is approximated by a normal $W_n$, the approximation error is in $[-r, r]$ and the graph of $\lambda_i$ ($\lambda_i^\tau$) lies to the left of $y = (1 + r)(1 + \epsilon)^{-1}$ for some $\epsilon > 0$.
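The effect of a delay on the admissible sector can be explored numerically. The following sketch assumes the sector $[0, \beta]$, for which the critical circle degenerates to the line through $-\beta^{-1}$, and delayed eigenloci of the form $-\lambda_i e^{-j\omega\tau}/(j\omega + 1)$. The function names, the eigenvalues and all grids are illustrative assumptions, and a grid search only approximates the true $\tau_{max}$.

```python
import numpy as np

def leftmost_real(lam, tau, omegas):
    """Smallest real part of the delayed eigenloci -lam * exp(-j w tau) / (j w + 1) on a frequency grid."""
    w = omegas[:, None]
    loci = -lam[None, :] * np.exp(-1j * w * tau) / (1j * w + 1.0)
    return float(loci.real.min())

def largest_grid_delay(lam, beta, taus, omegas):
    """Largest tau on the grid before the loci first reach the line Re = -1/beta (sector [0, beta]).

    Returns None if the first grid value already fails and the last grid value
    if no violation is found on the grid.
    """
    last_ok = None
    for tau in taus:
        if leftmost_real(lam, tau, omegas) <= -1.0 / beta:
            break
        last_ok = float(tau)
    return last_ok

# Hypothetical spectrum of a normal weight matrix.
lam = np.array([1 + 0.5j, 1 - 0.5j, -1.0, -1.0])
omegas = np.linspace(-20.0, 20.0, 8001)

undelayed = leftmost_real(lam, 0.0, omegas)
delayed = leftmost_real(lam, 0.5, omegas)
beta_all_delays = 1.0 / np.abs(lam).max()   # circle of radius max|lambda_i| touches the line -1/beta
tau_max = largest_grid_delay(lam, beta=0.92, taus=np.arange(0.0, 2.01, 0.05), omegas=omegas)
print(undelayed, delayed, beta_all_delays, tau_max)
```

For any $\beta$ below `beta_all_delays` the search never finds a violation, in line with the first statement of Theorem 3; above that value a finite delay bound appears.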
Stable weight ranges
To define a suitable set of weight matrices which parametrise a manifold of stable systems, we assume that a network is stable for all $\Phi$ in some class $[\alpha, \beta]$. If we now employ a nonlinearity $\tilde{\Phi}$ which is known to be of a smaller class $[\tilde{\alpha}, \tilde{\beta}]$ contained in $[\alpha, \beta]$, we can define an interval $[k_{min}, k_{max}]$ such that for all matrices $K = \mathrm{diag}\{k_i\}$, $k_i \in [k_{min}, k_{max}]$, the product $K\tilde{\Phi}$ remains in the original class $[\alpha, \beta]$, i.e., parametrises a stable system. We rewrite the system (1) as

$$\dot{x} = -x + WK\tilde{\Phi}(x) + \tilde{u} = -x + \tilde{W}\tilde{\Phi}(x) + \tilde{u} \qquad (7)$$

and find that the system (7) is stable for all weight matrices $\tilde{W}(t)$ in the matrix set

$$M := \{W' \mid W' = \mathrm{diag}\{k_i\}\, W,\ k_i \in [k_{min}, k_{max}]\}. \qquad (8)$$
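How the interval $[k_{min}, k_{max}]$ can be obtained is sketched below for one simple setting. The formula is an assumption made for illustration, not taken from the text: it covers only $\alpha \le 0 < \beta$ and $0 \le \tilde{\alpha} < \tilde{\beta}$, where scaling a function with slopes in $[\tilde{\alpha}, \tilde{\beta}]$ by $k \ge 0$ gives slopes in $[k\tilde{\alpha}, k\tilde{\beta}]$ and scaling by $k < 0$ gives slopes in $[k\tilde{\beta}, k\tilde{\alpha}]$. The function name `gain_interval` is hypothetical.

```python
def gain_interval(alpha, beta, alpha_t, beta_t):
    """Interval [k_min, k_max] such that k * phi stays in [alpha, beta] for every phi in [alpha_t, beta_t].

    Sketch for the case alpha <= 0 < beta and 0 <= alpha_t < beta_t only.
    """
    if not (alpha <= 0.0 < beta and 0.0 <= alpha_t < beta_t):
        raise ValueError("sketch only covers alpha <= 0 < beta and 0 <= alpha_t < beta_t")
    k_max = beta / beta_t    # k >= 0: the upper slope k * beta_t must not exceed beta
    k_min = alpha / beta_t   # k <= 0: the lower slope k * beta_t must not fall below alpha
    return k_min, k_max

# Example: stable class [-1, 0.88], employed nonlinearity 0.5 * tanh in the class [0, 0.5].
k_min, k_max = gain_interval(alpha=-1.0, beta=0.88, alpha_t=0.0, beta_t=0.5)
print(k_min, k_max)
```

Each diagonal gain $k_i$ can then vary independently inside this interval, which is what makes the set (8) useful for on-line weight adaptation.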
6 Relation to Lyapunov theory
In state space the stability concept corresponding to input-output stability is global asymptotic stability of an equilibrium, which can be taken to be the origin without loss of generality. Recently a number of strong conditions based on matrix measures have been developed [2, 4, 7, 8], which give sharper stability bounds than the earlier results for matrix norms reviewed e.g. in [7]. All these results rely on time-invariant feedback $\Phi(x(t), t) = \Phi(x(t))$ and explicit construction of Lyapunov functions. It can be shown that these results can also be derived in the input-output framework [14] using a multivariable Popov criterion and the Kalman-Yakubovich Lemma [12], even if the weight matrix is not invertible. In general there is a far-reaching equivalence between the input-output and the state space approaches. If for a given $u(t)$ the input-output system is $L_2$-stable and incrementally bounded, and the state space is uniformly observable and reachable, then the corresponding solution trajectory in state space is globally asymptotically stable, regardless of whether it is an equilibrium or a non-stationary solution. If $W$ is invertible, the assumptions are satisfied by the RNN models by virtue of the sector bounds (3). Thus our Theorems 2-4 implicitly provide new conditions for state space systems as well.
Theorem 5. Assume the system (2) with time-varying feedback $\Phi(x(t), t)$ is $L_2$-stable according to Theorems 2-4 and let a corresponding stable weight matrix set $M$ be defined by (8). Then for the time-varying state space system (1) under zero input $\tilde{u}(t) = 0$ the origin is globally asymptotically stable, and under non-zero $\tilde{u}(t)$ the corresponding state-space trajectory is globally asymptotically stable for all $W(t) \in M$.

The analysis of delays with Lyapunov methods is quite difficult and has been included in more general studies on systems with uncertain parameters with known bounds, which correspond to the sector bounds (3). These systems are regarded as defining a polytope in the linear matrix space, and stability for the whole class can be proven by solving a Lyapunov equation in parallel for every corner of that polytope [17, 3]. This leads to complicated proofs and much computational effort; e.g. in [17] a large number of 2n auxiliary linear systems must be checked for stability, or simplifications are made which result in conservative inequalities.
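The state space statement of Theorem 5 can be illustrated by a direct simulation of (1) with zero input. The sketch below rests on several assumptions that are not in the text: the sign pattern of the test matrix `Wn` (chosen so that it is normal with eigenvalues $-1$, $-1$, $1 \pm \frac{1}{2}j$), the nonlinearity $0.9 \tanh$ in the sector $[0, 0.9]$, the initial state, and a plain forward-Euler scheme. A single decaying trajectory is consistent with global asymptotic stability but does not prove it.

```python
import numpy as np

def simulate(W, phi, x0, dt=0.01, steps=10_000):
    """Forward-Euler integration of x' = -x + W phi(x) with zero input; returns the norm history."""
    x = np.array(x0, dtype=float)
    norms = [np.linalg.norm(x)]
    for _ in range(steps):
        x = x + dt * (-x + W @ phi(x))
        norms.append(np.linalg.norm(x))
    return np.array(norms)

# Assumed sign pattern: symmetric part built from q, antisymmetric part from p and r.
p, q, r = 0.25, 1.0, 0.25
Wn = np.array([[0.0,   r,   q,   p],
               [ -r, 0.0,  -p,   q],
               [  q,   p, 0.0,   r],
               [ -p,   q,  -r, 0.0]])

phi = lambda x: 0.9 * np.tanh(x)     # slopes in [0, 0.9]
norms = simulate(Wn, phi, x0=[1.0, -1.0, 0.5, 2.0])
print(norms[0], norms[-1])
```

The slowest mode of the linearisation at the origin, $-I + 0.9\, W_n$, has real part $-0.1$, so the decay over the simulated horizon of $t = 100$ is slow but clearly visible.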
7 Example
We consider a simple example where $W_n$ is normal for every choice of the parameters $p, q, r$, but is neither symmetric nor antisymmetric: $W_n$ is a $4 \times 4$ matrix with zero diagonal whose off-diagonal entries are $\pm p$, $\pm q$ and $\pm r$, here with magnitudes $1$ and $\frac{1}{4}$, and with eigenvalues $\lambda_{1,2} = -1$ and $\lambda_{3,4} = 1 \pm j\frac{1}{2}$.

[Figure 4. The eigenvalue circles $\lambda_{1,3}(j\omega)$ define the stability range through the abscissa $-\beta^{-1} = -1.06$ or the critical circle for $-\alpha^{-1} = 1$, $-\beta^{-1} = -1.25$.]

As complex conjugate eigenloci lead to the same circle conditions, we show in Fig. 4 only the eigenvalue circles $C(\lambda_1)$ and $C(\lambda_3)$, together with the critical circle corresponding to the minimal $\alpha = -1$ and the critical line for $\alpha = 0$. From Fig. 5 we see how in the delayed case for $\tau = 0.5$ the sector defined by $\beta^{-1}$ shrinks. Fig. 5 also illustrates Theorem 3, from which it follows that for $\beta^{-1} = \frac{\sqrt{5}}{2} = \max |\lambda_i|$ the system is stable for all delays. To apply the normal approximation method we add to $W_n$ a disturbance matrix $\Delta W$ with $|\Delta w_{ij}| < 0.03$. The matrix $\Delta W$ defines the approximation error $\frac{1}{j\omega + 1}\Delta W = (G_n - G)$ in the feedback path and is in the symmetric sector $[-0.12, 0.12]$. In the table in Fig. 6 we show some sectors derived from the various criteria presented above. Obviously, the larger we choose $\alpha$, the larger $\beta$ will be. However, from Fig. 4 it is obvious that the largest overall sector is achieved if $\alpha$ is at its minimum $-1$.
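The numbers of the example can be retraced with a few lines of code. The sketch below uses one sign pattern for $W_n$ that is normal for all $p, q, r$ and reproduces the stated eigenvalues for $q = 1$, $p = r = \frac{1}{4}$; this sign pattern and the function name `make_Wn` are assumptions made here for illustration. The eigenvalue circles are taken with centre $-\lambda_i/2$ and radius $|\lambda_i|/2$.

```python
import numpy as np

def make_Wn(p, q, r):
    """Assumed sign pattern: zero diagonal, symmetric q-part, antisymmetric p- and r-parts."""
    return np.array([[0.0,   r,   q,   p],
                     [ -r, 0.0,  -p,   q],
                     [  q,   p, 0.0,   r],
                     [ -p,   q,  -r, 0.0]])

Wn = make_Wn(p=0.25, q=1.0, r=0.25)
lam = np.linalg.eigvals(Wn)
is_normal = bool(np.allclose(Wn @ Wn.T, Wn.T @ Wn))

# Leftmost point of each eigenvalue circle C(lambda_i): centre -lambda_i/2, radius |lambda_i|/2.
leftmost = float(np.min(-lam.real / 2 - np.abs(lam) / 2))

beta_line = -1.0 / leftmost                           # sector [0, beta]: critical line through -1/beta
beta_all_delays = 1.0 / float(np.max(np.abs(lam)))    # Theorem 3: disc of radius max|lambda_i|

print(np.round(np.sort_complex(lam), 6))
print(round(leftmost, 3), round(beta_line, 3), round(beta_all_delays, 3))
```

Under these assumptions the leftmost point of $C(\lambda_3)$ lies at about $-1.06$, which gives $\beta \approx 0.94$ for $\alpha = 0$, and the bound for arbitrary delays is $\beta = 2/\sqrt{5} \approx 0.894$.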
[Figure 5. The $\lambda_3$ circle with and without delay $\tau = 0.5$. The delayed loop is stable for $\alpha = 0$, $\beta^{-1} = 1.09$.]

System                      a               b
$W_n$                       0               0.94
$W_n$                       -1              0.88
$W_n$                       -0.2            0.93
$W_n$, all $\tau > 0$       $-2/\sqrt{5}$   $2/\sqrt{5}$
$W_n$, $\tau = 0.3$         0               0.924
$W_n$, $\tau = 0.5$         0               0.917
$W_n$, $\tau = 1$           0               0.911
$W_n + \Delta W$            0               0.85
$W_n + \Delta W$            -0.89           0.80
Figure 6. Stability sectors for different systems with and without delays and disturbances.

8 Discussion
We provide an input-output framework to introduce a number of powerful methods of non-linear system theory into the research on recurrent network stability. The main gain lies in the possibility to handle time-invariant and time-varying systems, with and without delay, all in a unified manner. Furthermore, conceptually simple manipulations of the input-output loop provide many tools to reshape the system for better application of the theory, which is much simpler than developing a suitable Lyapunov function for every special case. We show that the input-output theory also enriches the much more developed state space theory. Especially interesting is the link to on-line weight adaptation, provided by the transfer of the degree of freedom in the choice of the non-linear feedback to the freedom to choose weights from some stable sets. We demonstrated a simple and effective graphical method to find such stability ranges for normal matrices and normal approximations of general matrices. Further research will concentrate on the task of finding structurally 'well behaved' networks which yield large stability sectors; this includes developing techniques for finding good normal approximations. In developing the approach we hope to draw more on the known methods of feedback design in control theory, e.g. to add stabilisers to a neural controller to give more internal freedom in the choice of the weights.
References
[1] C. Desoer and M. Vidyasagar. Feedback Systems: Input-Output Properties. Academic Press, New York, 1975.
[2] Y. Fang and T. G. Kincaid. Stability analysis of dynamical neural networks. IEEE Transactions on Neural Networks, 7(4):996-1005, 1996.
[3] Y. Fang, K. A. Lopardo, and X. Feng. Sufficient conditions for the stability of interval matrices. Int. J. Control, 58(4):969-977, 1993.
[4] M. Forti and A. Tesi. New conditions for global stability of neural networks with application to linear and quadratic programming problems. IEEE Transactions on Circuits and Systems-I: Fundamental Theory and Applications, 42(7):354-366, 1995.
[5] C. Guzelis and L. Chua. Stability analysis of generalized cellular neural networks. International Journal of Circuit Theory and Applications, 21(1):1-33, 1993.
[6] C. Harris and J. Valenca. The Stability of Input-Output Dynamical Systems. Academic Press, London, 1983.
[7] X. Liang and L. Wu. Global exponential stability of Hopfield-type neural network and its applications. Science in China (Series A), 38(6):757-768, 1995.
[8] K. Matsuoka. Stability conditions for nonlinear continuous neural networks with asymmetric connection weights. Neural Networks, 5:495-500, 1992.
[9] K. S. Narendra. Neural networks for control: Theory and practice. Proceedings of the IEEE, 84(10):1385-, 1996.
[10] K. S. Narendra and J. H. Taylor. Frequency Domain Criteria for Absolute Stability. Academic Press, New York, 1973.
[11] B. A. Pearlmutter. Gradient calculations for dynamic recurrent neural networks: A survey. IEEE Transactions on Neural Networks, 6(5):1212-1228, 1995.
[12] A. Rantzer. On the Kalman-Yakubovich-Popov lemma. Systems & Control Letters, 28:7-10, 1996.
[13] J.-J. C. Slotine and R. M. Sanner. Stable adaptive control of robot manipulators using neural networks. Neural Computation, 7(4):753-790, 1995.
[14] J. J. Steil and H. Ritter. Input-output vs. Lyapunov stability for continuous time recurrent neural networks. 1998. Submitted to NIPS 98.
[15] K. Tanaka. An approach to stability criteria of neural-network control systems. IEEE Transactions on Neural Networks, 7(3):629-642, 1996.
[16] M. Vidyasagar. Nonlinear Systems Analysis. Prentice Hall, second edition, 1993.
[17] K. Wang and A. M. Michel. Stability analysis of differential inclusions in Banach space with applications to nonlinear systems with time delays. IEEE Transactions on Circuits and Systems-I: Fundamental Theory and Applications, 43(8):617-626, 1996.