Automated nonlinear system modelling with multiple neural networks

Wen Yu^a*, Kang Li^b and Xiaoou Li^c
^a Departamento de Control Automático, CINVESTAV-IPN, A.P. 14-740, Av. IPN 2508, México D.F. 07360, México; ^b School of Electronics, Electrical Engineering and Computer Science, Queen's University Belfast, Ashby Building, Stranmillis Road, Belfast BT9 5AH, UK; ^c Departamento de Computación, CINVESTAV-IPN, A.P. 14-740, Av. IPN 2508, México D.F. 07360, México

(Received 2 April 2009; final version received 4 January 2010)
This article discusses the identification of nonlinear dynamic systems using multi-layer perceptrons (MLPs). It focuses on both the structure uncertainty and the parameter uncertainty, which have been widely explored in the literature on nonlinear system identification. The main contribution is an integrated analytic framework for automated neural network structure selection, parameter identification and hysteresis network switching with guaranteed neural identification performance. Firstly, an automated network structure selection procedure is proposed within a fixed time interval for a given network construction criterion. Then a network parameter updating algorithm is proposed with guaranteed bounded identification error. To cope with structure uncertainty, a hysteresis strategy is proposed to enable neural identifier switching with guaranteed network performance throughout the switching process. Both theoretical analysis and a simulation example show the efficacy of the proposed method.

Keywords: neural networks; identification for control
1. Introduction

Due to their simple topological structure and universal approximation ability, neural networks have been widely used in time-series prediction, nonlinear system modelling and control. This article focuses on the modelling of nonlinear dynamic systems using multi-layer perceptrons (MLPs). In neural modelling, the adjustable parameters, including the connection weights and biases, need to be adjusted along the identification process, while it is also preferable to control the network size in practical applications, based on the principle of model parsimony (Al-Duwaish, Nazmul Karim, and Chandrasekar 1997). Therefore parameter optimisation and network construction are both important issues (Li, Peng, and Bai 2006). While evolutionary algorithms (EAs) have been applied to deal with these two issues (Gonzalez et al. 2003; Leung, Lam, Ling, and Tam 2003), they can be computationally very expensive, and the tuning of continuous parameters can be very slow. Few analytic methods have been proposed to deal with both issues together, although each has been studied extensively in its own right.

For MLPs, the early training algorithms make use of first-derivative information, i.e. backpropagation
*Corresponding author. Email: [email protected]
with adaptive learning rate and momentum (BPAM), conjugate gradient, QuickProp (Bishop 1995), etc. Advanced training algorithms like Levenberg–Marquardt (LM), however, make use of second-derivative information, have proved more efficient and have been widely used in applications (Marquardt 1963). In contrast to the conventional two-stage learning procedure, supervised learning methods aim to optimise all the network parameters. To improve the convergence, various techniques have been introduced. For example, hybrid algorithms combine a gradient-based search for the nonlinear parameters with least-squares estimation of the linear output weights (Hong, Mitchell, and Chen 2008). Second-order algorithms have also been proposed which add an adaptive momentum term to the LM algorithm in order to maintain conjugacy between successive minimisation directions, resulting in good convergence on some well-known hard problems (Ampazis and Perantonis 2002). To control the network complexity, a number of additive (or constructive, or growing) methods and subtractive (or destructive, or pruning) methods have been proposed (Chen, Billings, and Grant 1992). Generally speaking, conventional network training
and selection methods can be extremely time consuming if both issues have to be considered simultaneously (Li et al. 2006). Furthermore, if neural modelling is put into a wider scope, it is well known that most plants under study experience a wide range of variations, including internal and external disturbances and variations of the operating conditions. These variations inevitably lead to both structure uncertainty and parameter uncertainty in neural modelling.

These two issues have also been researched intensively in the past. The first multi-model approach may be found in Lainiotis (1976), where multiple Kalman filters were used to improve the accuracy of the state estimation. Switching for multi-model schemes was first introduced in Morse, Mayne, and Goodwin (1992), where unknown linear systems are stabilised by adaptive schemes. More general versions of continuous-time and discrete-time multi-model adaptive controllers can be found in Chen and Narendra (2001). Stability analysis of multi-estimators for adaptive control with a reduced model is proposed in Gonzalez et al. (2003). Since multiple models can describe more complex behaviour of dynamic systems, the transient performance of adaptive control can be improved (Li et al. 2006). A comprehensive survey on nonlinear process identification with multiple models may be found in Boukhris, Mourot, and Ragot (1999).

In many cases, the plant to be modelled is too complex for the exact system dynamics to be found, and the operating conditions in dynamic environments may be unexpected. Therefore, neural modelling has been combined with multiple models. Kikens and Karim (1999) used several static neural networks as a multi-model identifier; the switching algorithm was realised by a gating neural network, but no stability analysis was presented. Multi-model identification and failure detection using static neural networks is presented in Selmic and Lewis (2001). In Yu (2006), a hierarchical mixture-of-experts method combining the input/output spaces is employed. Lee and Lee (2004) propose an adaptive feedback linearising controller where the nonlinearity terms are approximated with multiple neural networks; they conclude that the closed-loop system is globally stable in the sense that all signals involved are uniformly bounded.

Another type of multiple neural networks for adaptive control is adaptive critic neural networks (Werbos 1992). The adaptive critic method determines optimal control laws for a system by successively adapting two neural networks, namely an action neural network (which dispenses the control signals) and a critic neural network (which 'learns' the desired performance index for some function associated with the performance index). These two
neural networks approximate the Hamilton–Jacobi equation associated with optimal control theory. During the adaptations, neither of the networks needs any 'information' about an optimal trajectory; only the desired cost needs to be known. This technique of neuro-controller design does not require continual online training, thus avoiding the risk of instability (Yu 2006).

Despite the above proposals, little research has been carried out to integrally perform automated neural network structure selection and parameter identification, together with network switching under a multiple neural networks structure, with guaranteed neural identification performance. The identification objectives of this article are:

(1) To design and prove a new stable updating algorithm without robust modifications. It is well known that normal gradient algorithms are stable when neural network models match nonlinear plants exactly (Polycarpou and Ioannou 1992). In the presence of disturbances or unmodelled dynamics, some modifications have to be used to keep the learning process stable, but the identification errors then become larger. For example, the projection law (Al-Duwaish et al. 1997) forces the weights to stay inside a compact region, while σ-modification and the ε-rule (Chen 2008) assure boundedness of the weights.

(2) To design a sequential switching policy and prove the convergence of the multiple neural identification scheme. In general, even for linear time-invariant switched systems, the stability of each component system in the switching loop does not guarantee the stability of the entire switched system under arbitrary switching laws. Some conditions on the switching policy are needed (Morse et al. 1992) to stabilise the whole system.

In order to design an automated method for nonlinear system modelling via multiple neural networks, we study the following cases. First, multiple neural networks for nonlinear system identification are discussed in Section 2. Then, for each neural model, an automated structure selection procedure is proposed in Section 3. After the structures are determined, the parameter (weight) updating algorithm of the neural models is guaranteed to be bounded in Section 4. Section 5 proposes a hysteresis switching scheme to select the best neural model from the multiple neural networks generated automatically in Sections 3 and 4. Section 6 presents the simulation results and Section 7 concludes this article.
2. Multiple neural networks for nonlinear system identification

Consider a nonlinear discrete-time plant represented by

$$y(k) = f[x(k)] + e(k) \qquad (1)$$

where $k$ is the time instant and

$$x(k) = [\, y(k-1), y(k-2), \ldots, y(k-n_y), u(k-d), u(k-d-1), \ldots, u(k-d-n_u) \,]^T$$

is the regression vector formed from past outputs and delayed inputs of the plant.
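For concreteness, the following is a minimal sketch of how the regression vector $x(k)$ can be assembled from recorded input/output sequences; the helper name `build_regressor` and the array layout are illustrative assumptions, not constructs from the paper.

```python
import numpy as np

def build_regressor(y, u, k, n_y, n_u, d):
    """x(k) = [y(k-1), ..., y(k-n_y), u(k-d), ..., u(k-d-n_u)]^T."""
    past_outputs = [y[k - i] for i in range(1, n_y + 1)]
    past_inputs = [u[k - d - i] for i in range(n_u + 1)]
    return np.array(past_outputs + past_inputs)

# Example: n_y = 2, n_u = 1, delay d = 1, at time k = 4
y = np.array([0.0, 0.1, 0.3, 0.2, 0.5])
u = np.array([1.0, 0.8, 0.6, 0.4, 0.2])
print(build_regressor(y, u, k=4, n_y=2, n_u=1, d=1))  # [y(3) y(2) u(3) u(2)]
```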
3. Automated structure selection of neural models

With the residue matrix $R_i$ and the regression quantities established by (9) and (10), the sum-squared error after $i$ hidden nodes have been selected is

$$E_i = y^T R_i\, y$$

The entries of the upper-triangular matrix $A$ and of the vector $a_y$ are given by

$$a_{i,j} = \begin{cases} \phi_i^T R_{i-1}\, \phi_j, & i \le j \\ 0, & i > j \end{cases} \qquad a_{i,y} = \phi_i^T R_{i-1}\, y \qquad (12)$$

The matrix $A$ and the vector $a_y$ can be calculated recursively by (11). With the definitions

$$b_{i,j} \triangleq \phi_i^T R_{i-1}\, \phi_j, \qquad c_j^i \triangleq \phi_j^T R_i\, y, \qquad d_j^i \triangleq \phi_j^T R_i\, \phi_j \qquad (13)$$

these quantities obey the recursions

$$c_j^{i+1} = c_j^i - \frac{a_{i+1,y}\, b_{i+1,j}}{a_{i+1,i+1}}, \qquad d_j^{i+1} = d_j^i - \frac{b_{i+1,j}^2}{a_{i+1,i+1}} \qquad (14)$$

and the net contribution of candidate hidden node $j$ at stage $i$ to the cost function is

$$\Delta E_j^i = \frac{(c_j^i)^2}{d_j^i}, \qquad i, j = 1, \ldots, n \qquad (15)$$
Then at time $k = N$, the following automatic selection approach for the hidden nodes of a neural identifier is used.

Step 1: Initialisation. Let $E_0 = y^T y$, and $a_{j,k} = \phi_j^T \phi_k$, $a_{j,y} = y^T \phi_j$ for $j, k = 1, \ldots, n$ according to (12) and (13). Moreover, let $i = 0$, where $i$ is the number of selected neural nodes, and calculate $c_j^0$ and $d_j^0$ for $j = 1, \ldots, n$ according to (14).

Step 2: Selection of hidden nodes. Calculate $\Delta E_j$ for $j = i+1, \ldots, n$ according to (15). Search for the maximum contribution among all $n - i$ candidate hidden nodes; the one giving the maximal contribution is selected as the $(i+1)$-th hidden node. Then update the sum-squared cost function $E$ by subtracting the net contribution of the selected hidden node.

Step 3: Update phase. Update the matrix $A$ and the vector $a_y$, as well as $b_{i,j}$, $c_j^{i+1}$ and $d_j^{i+1}$, according to (12) and (13), respectively.

Step 4: Check phase. Check whether the network construction criterion is satisfied. If not, let $i = i + 1$ and go to Step 2 to select the next hidden node. Otherwise, terminate the network construction phase. Several criteria can be used, for example: (1) the desired number of hidden nodes has been selected; (2) the sum-squared error has been reduced to a given level; (3) the network contribution of the last selected hidden node is below a certain threshold; (4) other criteria, such as Akaike's information criterion (AIC) (Akaike 1974), begin to increase. In this article, we use criterion (3) to check whether the number of hidden nodes is suitable. A sketch of this selection loop is given below.
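The loop of Steps 1-4 can be illustrated with a small sketch. For clarity it evaluates each candidate's net contribution by an explicit least-squares refit rather than the fast recursions (13)-(14), so it makes the same greedy choices but less efficiently; `Phi` holds the candidate hidden-node output vectors $\phi_j$ as columns, `threshold` plays the role of criterion (3), and all names are illustrative.

```python
import numpy as np

def forward_select(Phi, y, threshold):
    """Greedy hidden-node selection by maximal net contribution
    Delta E_j (cf. (15)); a brute-force equivalent of Steps 1-4."""
    N, n = Phi.shape
    selected = []
    E = float(y @ y)                       # Step 1: E_0 = y^T y
    while len(selected) < n:
        best_j, best_gain = None, 0.0
        for j in range(n):                 # Step 2: scan candidates
            if j in selected:
                continue
            P = Phi[:, selected + [j]]
            w, *_ = np.linalg.lstsq(P, y, rcond=None)
            r = y - P @ w                  # residual with node j added
            gain = E - float(r @ r)        # net contribution of node j
            if gain > best_gain:
                best_j, best_gain = j, gain
        if best_j is None or best_gain < threshold:
            break                          # Step 4: criterion (3) met
        selected.append(best_j)            # accept the best node
        E -= best_gain                     # Step 3: update cost
    return selected, E
```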
4. Stable weights updating of neural models

After the hidden nodes are selected, the structure of the neural identifier is fixed. According to (3) and (5), the neural network and the nonlinear plant can be represented as

$$\hat y(k) = W(k)\,\varphi[V(k)x(k)], \qquad y(k) = W^0 \varphi[V^0 x(k)] + \tilde e_f(k)$$
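For concreteness, a sketch of the identifier's forward pass $\hat y(k) = W(k)\varphi[V(k)x(k)]$ with a sigmoid activation; the dimensions and names are assumptions for the example, not fixed by the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def identifier_output(W, V, x):
    """One-step prediction yhat(k) = W(k) phi[V(k) x(k)]."""
    return W @ sigmoid(V @ x)

# Example: 3 hidden nodes, 4-dimensional regressor x(k)
rng = np.random.default_rng(0)
V = rng.normal(size=(3, 4))   # hidden-layer weights V(k)
W = rng.normal(size=(1, 3))   # output weights W(k)
x = rng.normal(size=4)        # regressor x(k)
print(identifier_output(W, V, x))
```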
The neuro-identification error is defined as

$$e(k) = \hat y(k) - y(k) \qquad (16)$$
Using a Taylor series for $\varphi[V(k)x(k)]$ around the point $V^0 x(k)$,

$$\varphi[V(k)x(k)] - \varphi[V^0 x(k)] = \varphi'\,[V(k) - V^0]\,x(k) + R_l \qquad (17)$$
where $R_l$ is the remainder of the Taylor formula and $\varphi'$ represents the first derivative of the activation function vector $\varphi$. This remainder of the bounded function $\varphi$ is assumed to be bounded; the same assumption was proposed by Al-Duwaish et al. (1997). The identification error can then be represented as

$$\begin{aligned} e(k) &= W(k)\varphi[V(k)x(k)] - W^0 \varphi[V^0 x(k)] - \tilde e_f(k) \\ &= W(k)\varphi[V(k)x(k)] - W^0 \varphi[V(k)x(k)] + W^0 \varphi[V(k)x(k)] - W^0 \varphi[V^0 x(k)] - \tilde e_f(k) \\ &= \tilde W(k)\varphi[V(k)x(k)] + W^0 \varphi'\, \tilde V(k)\, x(k) + \zeta(k) \end{aligned} \qquad (18)$$
where $\tilde W(k) = W(k) - W^0$, $\tilde V(k) = V(k) - V^0$ and $\zeta(k) = W^0 R_l - \tilde e_f(k)$. In this article only open-loop identification is discussed, so the plant (1) can be assumed to be bounded-input bounded-output stable, i.e. $y(k)$ and $x(k)$ in (1) are bounded. By the boundedness of the sigmoid function $\varphi$, $\tilde e_f(k)$ is bounded, so $\zeta(k) = W^0 R_l - \tilde e_f(k)$ can also be assumed to be bounded. $W^0$ does not affect the stability property of the neuro identification, but it influences the identification accuracy. $W^0 = [w_1^0 \cdots w_n^0]$ can be obtained from the calculation in the last section. Multiplying both sides of (8) by $\Phi_i$ and using (9) yields

$$\Phi_i W_i = \Phi_i M_i^{-1} \Phi_i^T y = \sum_{m=1}^{i} w_m \phi_m \qquad (19)$$

Multiplying both sides of (19) by $\phi_j^T R_{j-1}$ then gives

$$\phi_j^T R_{j-1}\, \Phi_i W_i = \sum_{m=1}^{i} w_m\, \phi_j^T R_{j-1}\, \phi_m, \qquad j = 1, \ldots, n$$

Since $R_j \phi_m = 0$ when $m < j$, according to (12) this reduces to

$$a_{j,y} = w_j\, a_{j,j} + \sum_{m=j+1}^{i} w_m\, a_{j,m}$$

When $i = n$ (all candidates selected), the output weights follow by back-substitution:

$$w_j^0 = \frac{a_{j,y} - \sum_{m=j+1}^{n} w_m^0\, a_{j,m}}{a_{j,j}}, \qquad j = n, n-1, \ldots, 1 \qquad (20)$$
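Equation (20) is a standard back-substitution over the upper-triangular entries $a_{j,m}$. A minimal sketch, assuming the matrix $A$ and the vector $a_y$ have already been accumulated as in Section 3 (names are illustrative):

```python
import numpy as np

def output_weights(A, a_y):
    """Back-substitution (20): solve for w_j in the order
    j = n, n-1, ..., 1, given upper-triangular a_{j,m} and a_{j,y}."""
    n = len(a_y)
    w = np.zeros(n)
    for j in range(n - 1, -1, -1):                  # j runs backwards
        w[j] = (a_y[j] - A[j, j + 1:] @ w[j + 1:]) / A[j, j]
    return w
```

Because $A$ is upper triangular, each $w_j^0$ depends only on the weights already computed, so a single backward sweep suffices.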
Thus, within the time interval $[1, N]$, the structure of the neural network is determined using the method proposed in the previous section, and the output weights $W$ are calculated according to (20). The weights in the hidden layer can, within the same interval, be updated using the following backpropagation algorithm:

$$V(k+1) = V(k) - \eta\, e(k)\, \varphi'\, W^{0T} x^T(k), \qquad 1 \le k \le N \qquad (21)$$
where $0 < \eta \le 1$ and $e(k)$ is the modelling error at time instant $k$. In summary, for given $N$ data samples during the time interval $[1, N]$, along with the hidden node selection process presented in Section 3, the output weights $W$ are updated by (20) in conjunction with the backpropagation algorithm (21) for the hidden node weights. Normally, $\eta$ should be small to assure a stable learning procedure. In order to resolve the trade-off between fast convergence and stable learning, a normalising learning rate $\eta(k)$ is defined as

$$\eta(k) = \frac{\eta}{1 + \|\varphi'\, W^{0T} x^T(k)\|^2 + \|\varphi\|^2}$$

This rate is therefore easier to choose, and no a priori information is required; e.g. $\eta = 1$ can be used. A sketch of one such update is given below.
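As an illustration, a minimal sketch of one hidden-layer update that combines (21) with the normalised rate above; the single-output shape, the sigmoid derivative used for $\varphi'$, and all names are assumptions for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def update_hidden_weights(V, W0, x, e, eta=1.0):
    """One backpropagation step, cf. (21), with the normalising rate
    (illustrative shapes: single output, h hidden nodes, m-dim regressor)."""
    phi = sigmoid(V @ x)                 # hidden-layer outputs
    dphi = phi * (1.0 - phi)             # sigmoid derivative, phi'
    grad = np.outer(dphi * W0, x)        # phi' W^{0T} x^T(k) term of (21)
    eta_k = eta / (1.0 + np.linalg.norm(grad) ** 2 + np.linalg.norm(phi) ** 2)
    return V - eta_k * e * grad          # V(k+1) = V(k) - eta(k) e(k) grad
```

With $\eta = 1$ the effective step automatically shrinks when the gradient or activation norms grow, which is how the trade-off between fast convergence and stable learning is resolved.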
Some concepts of input-to-state stability (ISS) are now recalled. Consider the following discrete-time nonlinear system:

$$x(k+1) = f[x(k), u(k)] \qquad (22)$$
where $u(k) \in R^m$ is the input vector, $x(k) \in R^n$ is the state vector, and $f \in C^1$ is a general nonlinear smooth function.

Definition 1: The system (22) is said to be globally input-to-state stable if there exist a $K$-function $\gamma(\cdot)$ (continuous, strictly increasing, and $\gamma(0) = 0$) and a $KL$-function $\beta(\cdot,\cdot)$ (a $K$-function in its first argument, with $\beta(s, k) \to 0$ as $k \to \infty$), such that, for each $u \in L_\infty$ (i.e. $\sup_k\{\|u(k)\|\} < \infty$) and each initial state $x^0 \in R^n$, it holds that

$$\|x(k, x^0, u(k))\| \le \beta(\|x^0\|, k) + \gamma(\|u(k)\|)$$
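For instance, the scalar linear system $x(k+1) = a\,x(k) + u(k)$ with $|a| < 1$ satisfies this bound: since $\|x(k)\| \le |a|^k \|x^0\| + \sum_{i=0}^{k-1} |a|^i \sup_k \|u(k)\|$, Definition 1 holds with $\beta(s, k) = |a|^k s$ (a $KL$-function) and $\gamma(s) = s/(1 - |a|)$ (a $K$-function).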
Definition 2: A smooth function $L : R^n \to R_{\ge 0}$ is called a smooth ISS-Lyapunov function for the system (22) if:
(a) there exist $K_\infty$-functions $\alpha_1(\cdot)$ and $\alpha_2(\cdot)$ ($K$-functions with $\lim_{s \to \infty} \alpha_i(s) = \infty$, $i = 1, 2$) such that

$$\alpha_1(\|s\|) \le L(s) \le \alpha_2(\|s\|), \qquad \forall s \in R^n$$