PROCEEDINGS OF THE FOURTH INTERNATIONAL CONFERENCE ON DYNAMICAL SYSTEMS AND DIFFERENTIAL EQUATIONS May 24 – 27, 2002, Wilmington, NC, USA
pp. 656–663
ON STOCHASTIC STABILITY OF DYNAMIC NEURAL MODELS IN PRESENCE OF NOISE
Kayvan Najarian
College of Information Technology, The University of North Carolina at Charlotte
9201 University City Blvd, Charlotte, NC 28223, U.S.A.

Abstract. Dynamic feedback neural networks are known to provide powerful tools for modeling complex dynamic systems. Since in many real applications the stability of such models (especially in the presence of noise) is of great importance, it is essential to address the stochastic stability of such models. In this paper, sufficient conditions for stochastic stability of two families of feedback sigmoid neural networks are presented. These conditions are set on the weights of the networks and can be easily tested.

1991 Mathematics Subject Classification. Primary: 58F15, 58F17, 58F11; Secondary: 53C35.
Key words and phrases. Dynamic Neural Networks, Stochastic Stability, Nonlinear ARX Models.
1. Introduction. Feedback neural networks have been successfully used to model many complex real systems. In such networks, the previous samples of the output signal are fed back to the input nodes of the model. As the described feedback can easily lead to instability (especially in the presence of additive noise), it is necessary to address the issue of stochastic stability for such models. The need to include noise in the stability discussion comes from the fact that, even when the deterministic input is bounded, nonlinear systems can easily generate unbounded output merely due to the presence of noise; i.e., in some practical systems, instability is caused not by the deterministic (control) input but by the noise. Here, the discussion of stability is first considered for a more general form of dynamic models, and the results are then further specified towards dynamic neural networks. Assuming that u_{t−q+1}, u_{t−q+2}, ..., u_{t−d} describe the history of the input variable and y_{t−k}, y_{t−k+1}, ..., y_{t−1} that of the output, the more general nonlinear Auto Regressive eXogenous (ARX) model can be formulated as follows:

y_t = f(y_{t−k}, y_{t−k+1}, ..., y_{t−1}, u_{t−q+1}, u_{t−q+2}, ..., u_{t−d}) + ζ_t    (1)
where d, q − d − 1, k, ζ_t, and f represent the degree of the input, the delay from the input to the output, the degree of the output, the additive noise on the system, and the functional dependency of the model, respectively. Although the model can represent multi-dimensional systems, here the single-input/single-output (SISO) case is considered. It is also assumed that u_t and ζ_t are uncorrelated sequences of independently and identically distributed (i.i.d.) random variables. The Markov process formed as (1) includes a wide range of dynamic models used in engineering applications, including dynamic (feedback) neural networks, as mentioned above. One of the most important properties of a nonlinear ARX model to be investigated is the stochastic stability of the model. Stochastic stability not only guarantees the
issue of boundedness of the output for bounded inputs, but also establishes the necessary conditions for many concepts such as learning and ergodicity.

The concept of stochastic stability has been addressed in the literature under different definitions, resulting in different sufficient stability conditions for nonlinear ARX models. The concept of Lagrange stability [1] defines a notion of stability based on a Lyapounov function [3], [4] defined in terms of the process (the definition of this function is given in Section 2). Kushner's work on stochastic stability [3], [2] has provided a more comprehensive mathematical framework for testing the stochastic stability of discrete systems. Important results in this field come from the relation between stochastic Lyapounov stability (as in Kushner's work) and the concept of geometric ergodicity [4], [6]. The results of this line of research not only provide simple practical notions of stochastic stability, but also create a foundation for the assessment of other statistical properties such as the learning properties of dynamic models [8]. In all learning paradigms presented for dynamic models, the assumption that the processes are stochastically stable and geometrically ergodic is treated as a fundamental requirement, i.e. the learning properties of dynamic models cannot be evaluated unless the assumption that the models are stochastically stable and geometrically ergodic is verified [10], [9], [11]. This further calls for the evaluation of these two properties for important families of nonlinear dynamic models. Here, the general results of [4] are applied to the special case of modeling with sigmoid neural networks, and specific sufficient conditions under which the model is both geometrically ergodic and stochastically stable are presented.

The paper is organized as follows. Section 2 gives the basic definitions of stochastic stability, as well as the existing results on the geometric ergodicity of a general family of nonlinear ARX models. Section 3 contains a set of sufficient conditions on the parameters of a sigmoid neural network which guarantees the stochastic stability of the model. Section 4 discusses the results obtained in Section 3 and is followed by Section 5, which gives the conclusions of the paper.

2. Basic Definitions of Stochastic Stability. In this section, some of the basic concepts of geometric ergodicity, as well as the existing results on the stochastic stability of nonlinear ARX models, are reviewed. Consider an integer t and let X_t be a Markov chain with state space (R^p, B), B being the collection of Borel sets. The t-step-ahead transition probability of X_t is denoted by P^t(x, A), i.e.:

P^t(x, A) = P(X_t ∈ A | X_0 = x),  x ∈ R^p, A ∈ B.    (2)
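For concreteness, a Markov chain of form (1) can be simulated directly. The short Python sketch below is an illustrative, hypothetical example (the lag orders, weights, and noise level are arbitrary, and the atan sigmoid of Section 3 is used for f); it is only meant to show how the state of lagged outputs and inputs evolves.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy nonlinear ARX model of form (1): two output lags (k = 2) and one
# lagged input, with f a small atan sigmoid network (weights arbitrary).
k = 2
a = np.array([0.5, -0.4])                            # output-layer weights
b = rng.normal(scale=0.3, size=(2, k + 1))           # hidden-layer weights

def f(state):
    # state = (y_{t-2}, y_{t-1}, u_{t-1}); atan sigmoid network
    return np.sum(a * (2.0 / np.pi) * np.arctan(b @ state))

T = 200
y = np.zeros(T)
u = rng.uniform(-1.0, 1.0, size=T)                   # i.i.d. bounded input
zeta = rng.normal(scale=0.1, size=T)                 # i.i.d. additive noise

for t in range(k, T):
    state = np.concatenate((y[t - k:t], u[t - 1:t])) # lagged outputs and input
    y[t] = f(state) + zeta[t]
```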
Now, the concept of geometric ergodicity is defined as follows.

Definition 2.1. X_t is geometrically ergodic if there exist a probability measure π on (R^p, B), a positive constant ρ < 1, and a π-integrable non-negative measurable function φ such that for any t:

‖P^t(x, ·) − π(·)‖_Var ≤ ρ^t φ(x),  x ∈ R^p,    (3)

where ‖·‖_Var denotes the total variation norm.
Definition (2.1) shows that geometric ergodicity is closely related to stability. According to (3), in a geometrically ergodic process the transition probability approaches a (possibly unknown) well-behaved probability measure π geometrically fast. Next, the definition of stochastic stability is given.

Definition 2.2. Consider a Markov chain X_t. The process is said to be stochastically stable iff there exist a non-negative and measurable function V (called a Lyapounov function) and constants c > 0 and 0 < ρ < 1 such that:

E(V(X_{t+1}) | X_t = x) ≤ ρ V(x) − c.    (4)
The concept of stochastic stability is also referred to as "stochastic Lyapounov stability". The Lyapounov function V and the way the stability condition is defined show why stochastic Lyapounov stability seems to be a more appropriate name for this concept. Also, from the definition it can be seen that when the sequence X_n is deterministic (i.e. there is no noise), the paradigm of stochastic stability reduces to conventional (deterministic) Lyapounov stability. The following theorem by Mokkadem [5] is known to be the most general result on the geometric ergodicity of Markov processes and the way this property is related to stochastic stability.

Theorem 2.1. (Mokkadem [5]) Suppose a Markov chain X_t is stochastically stable as described in Definition (2.2). Then X_t is a geometrically ergodic process.

Theorem (2.1) shows how geometric ergodicity relates to the concept of stochastic stability. Next, the focus is given to the existing results on the Markov process of form (1). Here, a theorem by Doukhan [4], which presents a set of sufficient conditions for the geometric ergodicity of the process (1), is reviewed.

Theorem 2.2. (Doukhan [4]) Consider the process (1). Let:

X_t = (y_{t−k}, y_{t−k+1}, ..., y_{t−1}, u_{t−q+1}, u_{t−q+2}, ..., u_{t−d}).    (5)
Assume that X_{t,i} indicates the i-th element of X_t. Also, assume the following.
1. There exist a number x_0 > 0, non-negative constants ψ_1, ..., ψ_k, a locally bounded measurable function h : R → R_+, and a positive constant c such that sup_{‖X_t‖ ≤ x_0} |f(X_t)| < ∞ (where ‖X_t‖ is the Euclidean norm of X_t), and

|f(X_t)| ≤ Σ_{j=1}^{k} ψ_j |X_{t,j}| + Σ_{j=k+1}^{q−d+k} h(X_{t,j}) − c    (6)

if ‖X_t‖ > x_0.
2. E[|ζ_1|] + (q − d) E[h(u_1)] < c < ∞.
Then, if the unique non-negative real zero of the "characteristic polynomial" P(z) = z^k − ψ_1 z^{k−1} − ··· − ψ_k is smaller than one, the process (1) is geometrically ergodic and stochastically stable.
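The root condition of Theorem 2.2 is straightforward to check numerically. The following Python sketch is a minimal illustration rather than part of the original development (the helper name is hypothetical, and numpy's generic polynomial root finder is used): it forms the characteristic polynomial from given coefficients ψ_1, ..., ψ_k and tests whether its largest non-negative real zero is below one.

```python
import numpy as np

def char_poly_root_below_one(psi, tol=1e-12):
    """Largest non-negative real zero of P(z) = z^k - psi_1 z^{k-1} - ... - psi_k.

    psi : sequence of non-negative coefficients (psi_1, ..., psi_k).
    Returns (condition_holds, root).
    """
    psi = np.asarray(psi, dtype=float)
    coeffs = np.concatenate(([1.0], -psi))   # coefficients in descending powers of z
    roots = np.roots(coeffs)
    nonneg = [r.real for r in roots if abs(r.imag) < tol and r.real >= -tol]
    root = max(nonneg) if nonneg else 0.0
    return root < 1.0, root

# Example: psi = (0.4, 0.3) gives P(z) = z^2 - 0.4 z - 0.3, whose
# non-negative zero is about 0.78 < 1, so the root condition holds.
print(char_poly_root_below_one([0.4, 0.3]))
```

It is worth noting that, because the ψ_j are non-negative, this root condition is equivalent to the simpler requirement ψ_1 + ··· + ψ_k < 1, which can serve as an even quicker check.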
Although the details of the long proof of this theorem (given by Doukhan) are not repeated here, a brief description of the general scheme of the proof is given. The proof starts by introducing a Lyapounov function of the form:
V(X_t) = Σ_{j=1}^{k} α_j |X_{t,j}| + Σ_{j=k+1}^{q−d+k} β_j h(X_{t,j}).    (7)
Then the appropriate choices of the α_j's and β_j's to satisfy (4) are investigated. It is then proved that if all the assumptions made in the theorem hold, the condition set on the zeros of the characteristic polynomial P(z) guarantees geometric ergodicity. As can be seen, the sufficient conditions set in Theorem 2.2 guarantee the stochastic Lyapounov stability as well as the geometric ergodicity of the model.

3. Stochastic Stability of Sigmoid Neural Networks. This section starts with the following lemma about atan sigmoid neural networks.

Lemma 3.1. Suppose x ∈ R^p. Consider a family of sigmoid neural networks F with members as follows:

f(x) = Σ_{i=1}^{l} a_i σ(b_i x),

where σ(·) = (2/π) tan^{−1}(·) is a smooth sigmoid activation function, l indicates the number of neurons, the a_i's (a_i ∈ R) are the weights of the output layer, and the p-dimensional vectors b_i = (b_{i1}, ..., b_{ip}) represent the weights of the hidden layer. Then:

|f(x)| ≤ Σ_{j=1}^{p} ( (2/π) Σ_{i=1}^{l} |a_i| |b_{ij}| ) |x_j|.    (8)

Proof. From the definition of atan sigmoid neural networks:

|f(x)| = (2/π) |Σ_{i=1}^{l} a_i tan^{−1}(b_i x)|
       ≤ (2/π) Σ_{i=1}^{l} |a_i| |tan^{−1}(b_i x)|
       ≤ (2/π) Σ_{i=1}^{l} |a_i| |b_i x|
       = (2/π) Σ_{i=1}^{l} |a_i| |Σ_{j=1}^{p} b_{ij} x_j|
       ≤ (2/π) Σ_{i=1}^{l} |a_i| Σ_{j=1}^{p} |b_{ij}| |x_j|
       = Σ_{j=1}^{p} ( (2/π) Σ_{i=1}^{l} |a_i| |b_{ij}| ) |x_j|,

which concludes the proof.
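As a quick numerical sanity check of the bound (8), one can compare both sides for a randomly drawn atan network. The sketch below is illustrative only; the network sizes and weights are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
l, p = 5, 3                          # number of neurons, input dimension
a = rng.normal(size=l)               # output-layer weights a_i
b = rng.normal(size=(l, p))          # hidden-layer weights b_{ij}

def f_atan(x):
    # f(x) = sum_i a_i * (2/pi) * arctan(b_i . x)
    return np.sum(a * (2.0 / np.pi) * np.arctan(b @ x))

def rhs_bound(x):
    # right-hand side of (8): sum_j ((2/pi) sum_i |a_i||b_{ij}|) |x_j|
    return ((2.0 / np.pi) * np.abs(a) @ np.abs(b)) @ np.abs(x)

for _ in range(1000):
    x = rng.normal(scale=10.0, size=p)
    assert abs(f_atan(x)) <= rhs_bound(x) + 1e-12
print("bound (8) holds on all sampled points")
```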
The next lemma gives a similar bound for bipolar exponential sigmoid networks.

Lemma 3.2. Suppose x ∈ R^p. Consider a family of sigmoid neural networks F with members as follows:

f(x) = Σ_{i=1}^{l} a_i σ(b_i x),

where σ(·) = (1 − e^{−(·)}) / (1 + e^{−(·)}) is a smooth sigmoid activation function, l indicates the number of neurons, the a_i's (a_i ∈ R) are the weights of the output layer, and the p-dimensional vectors b_i = (b_{i1}, ..., b_{ip}) represent the weights of the hidden layer. Then:

|f(x)| ≤ Σ_{j=1}^{p} ( Σ_{i=1}^{l} |a_i| |b_{ij}| ) |x_j|.    (9)
Proof. From the definition of bipolar exponential neural networks:

|f(x)| = |Σ_{i=1}^{l} a_i (1 − e^{−b_i x}) / (1 + e^{−b_i x})|
       ≤ Σ_{i=1}^{l} |a_i| |(1 − e^{−b_i x}) / (1 + e^{−b_i x})|
       ≤ Σ_{i=1}^{l} |a_i| |b_i x|
       = Σ_{i=1}^{l} |a_i| |Σ_{j=1}^{p} b_{ij} x_j|
       ≤ Σ_{i=1}^{l} |a_i| Σ_{j=1}^{p} |b_{ij}| |x_j|
       = Σ_{j=1}^{p} ( Σ_{i=1}^{l} |a_i| |b_{ij}| ) |x_j|,

which concludes the proof.

Now, the following theorems present a set of sufficient conditions for the stochastic stability and geometric ergodicity of the families of sigmoid neural networks discussed above. These conditions involve the known parameters of the network and, as a result, can be easily tested during a practical modeling task. A family of atan sigmoid networks is addressed first.

Theorem 3.1. Let X_t = (y_{t−k}, y_{t−k+1}, ..., y_{t−1}, u_{t−q+1}, u_{t−q+2}, ..., u_{t−d}). Take y_t, ζ_t and u_t as defined in (1). Also assume that f is a sigmoid neural network as defined in Lemma (3.1) with x = X_t, where p = q − d + k. Further assume that E[|ζ_t|] ≤ M_ζ and E[|u_t|] ≤ M_u. Define:
ω_j = (2/π) Σ_{i=1}^{l} |a_i| |b_{ij}|,    (10)
where j = 1, ..., k. Let M_ω = max_j ω_j. Also define the following characteristic polynomial: P(z) = z^k − ω_1 z^{k−1} − ··· − ω_k. Then the sequence X_t is geometrically ergodic and stochastically stable if the unique non-negative real zero of P(z) is smaller than one.

Proof. In order to apply the results of Theorem (2.2), the existence of a real number x_0 such that the conditions of the theorem are satisfied has to be investigated. Lemma (3.1) shows that for any x_0, sup_{‖X_t‖ ≤ x_0} |f(X_t)| < ∞. Therefore the case where ‖X_t‖ > x_0 is investigated. Assuming ‖X_t‖ > x_0, there exists at least one index τ such that:

|X_{t,τ}| > x_0 / √(q − d + k),  1 ≤ τ ≤ q − d + k.    (11)
Next, from Lemma (3.1):

|f(X_t)| ≤ Σ_{j=1}^{k} ω_j |X_{t,j}| + Σ_{j=k+1}^{q−d+k} ω_j |X_{t,j}|,    (12)

where for j = k + 1, ..., q − d + k the ω_j are given by the same expression (10).
Now, taking an arbitrary positive real number ρ > 0:

|f(X_t)| ≤ Σ_{j=1}^{k} (ω_j + ρ) |X_{t,j}| + Σ_{j=k+1}^{q−d+k} (ω_j + ρ) |X_{t,j}| − ρ Σ_{j=1}^{q−d+k} |X_{t,j}|.    (13)
Now observe that:

ρ Σ_{j=1}^{q−d+k} |X_{t,j}| ≥ ρ x_0 / √(q − d + k).    (14)
Therefore:

|f(X_t)| ≤ Σ_{j=1}^{k} (ω_j + ρ) |X_{t,j}| + Σ_{j=k+1}^{q−d+k} (ω_j + ρ) |X_{t,j}| − ρ x_0 / √(q − d + k).    (15)
Defining ψ_j = ω_j + ρ for 1 ≤ j ≤ k, h(X_{t,j}) = (M_ω + ρ) |X_{t,j}| for k + 1 ≤ j ≤ q − d + k, and c = ρ x_0 / √(q − d + k), the next step of the proof is to check the second condition of Theorem (2.2). It suffices to have:

M_ζ + (q − d) E[(M_ω + ρ) |X_{1,q−d+k}|] < ρ x_0 / √(q − d + k).

This means that it suffices to have:

M_ζ + (q − d) M_u (M_ω + ρ) < ρ x_0 / √(q − d + k).    (16)
Now it can be seen that in order to satisfy Inequality (16), x0 and ρ need only be chosen such that ρx0 is large enough to satisfy the inequality. Choose ρ sufficiently
small (but non-zero) such that if the positive real root of P(z) with ψ_j = ω_j (j = 1, ..., k) is less than 1, the positive real root of P(z) with ψ_j = ω_j + ρ (j = 1, ..., k) is also less than 1. Then, choose x_0 sufficiently large so that Inequality (16) holds. Under such choices of x_0 and ρ, if the unique positive real root of P(z) with ψ_j = ω_j (j = 1, ..., k) is less than 1, all the conditions for geometric ergodicity are satisfied. Moreover, if X_t is stationary then "y" is geometrically α-mixing.

In the above theorem it is assumed that if the positive real root of P(z) with ψ_j = ω_j (j = 1, ..., k) is less than 1, there exists ρ such that the positive real root of P(z) with ψ_j = ω_j + ρ (j = 1, ..., k) is also less than 1. This assumption requires only that a very small change in the coefficients of P(z) does not change the location of the roots significantly, because ρ can be made arbitrarily small (but not equal to zero) to avoid such a change.

A similar result can be obtained for a family of bipolar exponential networks, as follows.

Theorem 3.2. Let X_t = (y_{t−k}, y_{t−k+1}, ..., y_{t−1}, u_{t−q+1}, u_{t−q+2}, ..., u_{t−d}). Take y_t, ζ_t and u_t as defined in (1). Also assume that f is a sigmoid neural network as defined in Lemma (3.2) with x = X_t, where p = q − d + k. Further assume that E[|ζ_t|] < M_ζ and E[|u_t|] < M_u. Define:

ω_j = Σ_{i=1}^{l} |a_i| |b_{ij}|,    (17)
where j = 1, ..., k. Also define the following characteristic polynomial: P(z) = z^k − ω_1 z^{k−1} − ··· − ω_k. Then the sequence X_t is geometrically ergodic and stochastically stable if the unique non-negative real zero of P(z) is smaller than one.

Proof. Defining the ω_j's as in (17), the rest of the proof is the same as that of Theorem (3.1), and is not repeated here.

Theorems (3.1) and (3.2) give sufficient conditions for the stability of the corresponding sigmoid networks, which can be easily tested.

4. Discussion. The results of the previous section are now discussed:
1. The conditions given for stochastic stability are all "sufficient" and not "necessary". This means that a neural model may not satisfy the above conditions but still be stochastically stable.
2. Since the conditions are set only on the weights of the networks, they can be easily verified. The verification is done by forming the characteristic polynomial (as defined above) and calculating its positive real root, which can be easily performed (see the sketch after this list).
3. Once the stochastic stability of such models is tested and verified, the resulting neural models can be used as powerful and reliable tools in the modeling of complex systems.
4. The conditions given here also guarantee the important property of geometric ergodicity, which is a pre-assumed property in many fields of study. For example, in the promising paradigm of Computational Learning Theory, geometric ergodicity and stochastic stability are pre-assumed as fundamental properties of dynamic models. With the results obtained here, instead of assuming these properties, one can verify them [7].
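To make item 2 concrete, the following Python sketch carries out the test of Theorems 3.1 and 3.2. It is an illustration rather than part of the original paper: the function name and example weights are hypothetical, and the convention that the first k columns of the hidden-layer weight matrix multiply the fed-back outputs simply follows the ordering of X_t used in the theorems.

```python
import numpy as np

def sigmoid_net_is_stochastically_stable(a, b, k, activation="atan"):
    """Sufficient stability test of Theorems 3.1 / 3.2.

    a : output-layer weights, shape (l,).
    b : hidden-layer weights, shape (l, p) with p = q - d + k; the first k
        columns multiply the fed-back outputs y_{t-k}, ..., y_{t-1}.
    k : number of output lags.
    activation : "atan" (Theorem 3.1) or "bipolar" (Theorem 3.2).
    """
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    scale = 2.0 / np.pi if activation == "atan" else 1.0
    # omega_j = scale * sum_i |a_i| |b_{ij}| for the output-lag columns only.
    omega = scale * (np.abs(a) @ np.abs(b[:, :k]))
    # Characteristic polynomial P(z) = z^k - omega_1 z^{k-1} - ... - omega_k.
    roots = np.roots(np.concatenate(([1.0], -omega)))
    nonneg = [r.real for r in roots if abs(r.imag) < 1e-12 and r.real >= 0.0]
    rho = max(nonneg) if nonneg else 0.0
    return rho < 1.0, rho

# Hypothetical example: 3 neurons, k = 2 output lags, one input lag (p = 3).
a = np.array([0.5, -0.3, 0.2])
b = np.array([[0.4, 0.1, 1.0],
              [0.2, -0.3, 0.5],
              [-0.1, 0.2, 2.0]])
print(sigmoid_net_is_stochastically_stable(a, b, k=2, activation="atan"))
```

If the returned root is below one, the sufficient conditions of the corresponding theorem hold; if not, the test is simply inconclusive, since (as noted in item 1) the conditions are sufficient but not necessary.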
5. Conclusions. In this paper, the paradigm of stochastic stability has been applied to feedback sigmoid neural networks. Sufficient conditions for the stochastic stability of two popular families of neural networks have been introduced. This gives a quantitative evaluation of stability for such neural models, which in turn helps introduce dynamic neural models that are theoretically reliable.

REFERENCES

[1] J.P. La Salle, Stability theory for difference equations, MAA Studies in Mathematics, American Math. Assoc. (1977), 1–31.
[2] H.J. Kushner, On the stability of processes defined by stochastic difference-differential equations, J. Differential Equations, 4 (1968), 424–443.
[3] H.J. Kushner, Stochastic stability, in Lecture Notes in Math., Springer, New York, (1972), 97–124.
[4] P. Doukhan, Mixing: Properties and Examples, Springer-Verlag, (1994).
[5] A. Mokkadem, Mixing properties of polynomial autoregressive processes, Ann. Inst. H. Poincaré Probab. Statist., 26 (1990), 219–260.
[6] H. Tong, Non-linear Time Series, Oxford Science Publications, (1990).
[7] K. Najarian, Application of learning theory in neural modeling of dynamic systems, Ph.D. thesis, Department of Electrical and Computer Engineering, University of British Columbia, (2000).
[8] M.C. Campi and P.R. Kumar, Learning dynamical systems in a stationary environment, Proc. 31st IEEE Conf. Decision and Control, (1996), 2308–2311.
[9] D. Aldous and U. Vazirani, A Markovian extension of Valiant's learning model, Proc. 31st Annual IEEE Symp. on the Foundations of Comp. Sci., (1990), 392–396.
[10] K. Najarian, G.A. Dumont, M.S. Davies and N.E. Heckman, Neural ARX models and PAC learning, Lecture Notes in Artificial Intelligence, LNAI 1822, Springer (2000), 305–313.
[11] P. Bartlett, P. Fischer and K.-U. Höffgen, Exploiting random walks for learning, Proc. 7th ACM COLT, (1994), 318–327.
Received September 2002; in revised form March 2003.
E-mail address:
[email protected]