A Recursive Orthogonal Least Squares Algorithm for ... - Springer Link

167

Neural Processing Letters 5: 167–176, 1997. c 1997 Kluwer Academic Publishers. Printed in the Netherlands.

A Recursive Orthogonal Least Squares Algorithm for Training RBF Networks D.L. YU,? J.B. GOMM and D. WILLIAMS School of Electrical Engineering and Electronics, Liverpool John Moores Univesity, Byrom Street, Liverpool L3 3AF, UK E-mail: [email protected]

Key words: multi-variable chemical process, neural network modelling, orthogonal least squares, RBF network, recursive algorithm Abstract. A recursive orthogonal least squares (ROLS) algorithm for multi-input, multi-output systems is developed in this paper and is applied to updating the weighting matrix of a radial basis function network. An illustrative example is given, to demonstrate the effectiveness of the algorithm for eliminating the effects of ill-conditioning in the training data, in an application of neural modelling of a multi-variable chemical process. Comparisons with results from using standard least squares algorithms, in batch and recursive form, show that the ROLS algorithm can significantly improve the neural modelling accuracy. The ROLS algorithm can also be applied to a large data set with much lower requirements on computer memory than the batch OLS algorithm.

Acknowledgements This work is funded by the EPSRC, UK under Grant No. GR/K 35815. 1. Introduction A radial basis function (RBF) network can represent any continuous non-linear function and therefore, can be used in dynamic system modelling and control. The main reason for RBF networks being widely used is the linear relationship between the network output and the weights, which enables linear optimisation methods, such as least squares (LS) type algorithms, to be employed. Training data can be collected for neural network modelling when the system is subjected to an excitation signal such as a random amplitude sequence (RAS) [1] or a modified RAS [2]. These signals are designed to excite the non-linear system dynamics completely and cover the input space. However, it is common for industrial systems not to allow production to stop for experimentation. In this case, on-line data with the system under closed-loop control is often collected. Ill-conditioning in the training data occurs with closed-loop data and consequently causes a reduction in the modelling accuracy. Ill-conditioning is usually measured for a data matrix by its condition

? Corresponding author

168

D.L. YU ET AL.

() = ( ) = ()

number defined as max=min , where are singular values of the matrix. In the batch LS algorithm, the information matrix T needs to be manipulated and because T 2 , the effect of ill-conditioning on the parameter estimation will be greater. Orthogonal decomposition is a well known technique to eliminate ill-conditioning. Although Korenberg et al. [3] proposed an orthogonal estimation method, which was later used by Chen et al. [4] for training neural network models, this is a batch orthogonal least squares (OLS) method and in practice can only be used for a small data set as the orthogonal decomposition of a large information matrix needs a large amount of computer memory. In this paper, a recursive OLS (ROLS) algorithm for multi-input, multi-output (MIMO) systems is developed, based on a single-input, single-output (SISO) form proposed recently by Bobrow and Murray [5], and is applied to updating the RBF network weighting matrix. The proposed algorithm is applied to training a RBF network model of a MIMO chemical reactor to demonstrate the effectiveness of the method.

2. RBF Network Modelling In neural modelling a NARX model is usually adopted to represent non-linear dynamic systems. In the MIMO case the following form of the NARX model is considered,

y(t) = f (y(t 1); ; y(t ny ); u(t kd); ; u(t kd nu + 1)) + e(t) where y

ep (t)]T

(1)

(t) = [y1(t); ; yp(t)]T ; u(t) = [u1 (t); ; um (t)]T ; e(t) = [e1 (t); ;

are the system output, input and noise vectors, respectively; p and m are the number of outputs and inputs respectively; ny and nu are the maximum lags in the output and input respectively; kd is the time delay in the inputs; e t is assumed to be a white noise sequence; and f is a vector-valued, continuous nonlinear function. A standard RBF network using a thin-plate-spline basis function, configured to represent the NARX model of Equation (1), performs a non-linear mapping x 2 RN ! y 2 Rp via the transformations,

()

()

^

y^T (t) = T (t)W

(2)

i (t) = d2i (t) log di(t) i = 1; ; nh

(3)

di(t) =k x(t) ci k

(4)

2

^( )

i = 1; ; nh

()

where y t and x t are the network output and input vectors at sample time t respectively; W 2 Rnh p is the weighting matrix with element wij denoting the weight connecting the ith hidden node output to the j th network output; t 2 Rnh is the output vector of the non-linear basis functions in the hidden layer with the

()

169

A RECURSIVE ORTHOGONAL LEAST SQUARES ALGORITHM

ith element (t); ci 2 RN is the ith centre vector; nh is the number of nodes in the hidden layer; di (t) is the Euclidean distance of the input vector to the ith hidden node centre. The weighting matrix, W , is computed to minimise the modelling error, "(t) = y (t) y^(t): There are several possible choices for the basis function, i (t), in the network, such as Gaussian or multiquadratic. It has been reported that

the choice of basis function is not crucial to the performance of the network [4]. The thin-plate-spline, equation (3), was chosen in this work because, unlike other possible basis functions including the Gaussian, this basis function does not require the choice of an additional width parameter. 3. ROLS Algorithm for MIMO Systems

Bobrow and Murray [5] proposed a ROLS algorithm for a single variable estimation. The method is extended to the MIMO case in this section. The development of the algorithm for MIMO systems is described as follows, with reference to determining the weighting matrix, W , in a RBF network. Considering equation (2) for a number of M groups of data, we have

Y = Y^ + E = W + E (5) where Y 2 RM p is the desired output matrix; Y^ 2 RM p is the neural model output matrix; 2 RM nh is the hidden layer output matrix; E 2 RM p is the modelling error matrix and

Y T = y(1); ; y(M ) ; Y^ T = y^(1); ; y^(M) ; T = (1); ; (M ) ; ET = "(1); ; "(M ) :

Now a MIMO least squares problem can be formulated to solve W such that the following cost function is minimised

J (W ) =k W Y kF :

(6)

Since the F-norm of a matrix is preserved by an orthogonal transformation, Equation (6) is equivalent to

J (W ) =k QT W QT Y kF where Q is an orthogonal matrix from the orthogonal decomposition of , 2R3 = Q4 5 :

(7)

(8)

0

Let

2^3 Y QT Y = 64 75 Y~

(9)

170

D.L. YU ET AL.

Equation (7) becomes

" ^ # " ^ # R Y RW Y J (W ) = 0 W Y~ = Y~ F : F

(10)

It follows that the optimal W which minimises the cost function (10) is

RW = Y^

(11)

~

and leaves the residual being Y . This is the batch algorithm for the OLS F method. Note that M may be very large for a training data set and therefore the orthogonal decomposition in (8) would be very difficult. To obtain a recursive algorithm, suppose an optimal Wk is obtained at stage k to minimise Jk jj k Wk Yk jjF , we need to solve a Wk such that

= Wk+1 = Wk + Wk

(12)

minimise

2 3 2 Yk 3 k Jk+1 = 4 5 Wk+1 4 5 : Tk+1 ykT+1 F

(13)

Because the size of matrix increases with new data, the manipulation of will be more difficult when the number of data becomes large. However, R in (8) can be manipulated much easier as its size is constant and small. In fact, cost function (13) is equivalent to the following cost function

2 2 ^ 3 Yk Rk 3 Jk+1 = 4 5 Wk+1 64 75 : Tk+1 ykT+1 F

(14)

This can be proved as follows. It is known from the definition of the F-norm that if for two matrices, A and B, AT A B T B , then jjAjjF jjB jjF . Let

2 3 k A = 4 5 Wk+1 Tk+1

= = 2 ^ 3 2 Y 3 2 Rk 3 Y k 4 5 ; B = 4 5 Wk+1 64 75 T T T yk+1

k+1

yk+1

then

AT A = W T W W T Y Y T W + Y T Y + WT W WyT yT W + yyT = B T B + Y~ T Y~

(15)

171


which bases on the following identities

T = RT R; T Y = RT Y^ ; Y T = Y^ T R; Y T Y = Y^ T Y^ + Y~ T Y~ : Equations (15) and (10) indicate that, since Y~ T Y~ is small and not affected by the

choice of W , cost function (14) is equivalent to (13). Using equation (12), cost function (14) is transformed to the following form

2 Rk 3 Jk+1 = 4 5 Wk T 2 Rk+1 3 k = 4 T 5 Wk k+1 2 Rk Orthogonally decompose 4

Tk+1

02 ^ 3 2 3 1 R Y k B @64 75 4 T 5 Wk CA = T F 2 yk+1 0 k+31 4 5 : ykT+1 Tk+1Wk F 3 5and let

2^ 3 2 3 0 64 Yk+1 75 = QTk+1 4 5 T T T yk+1 k+1Wk y~k+1

2 Rk 3 2R 3 k+1 4 5 = Qk+1 4 5 ; Tk+1

(16)

0

(17)

cost function (16) becomes

2 2 ^ 3 Rk+1 3 Yk+1 Qk+1 4 5 Wk Qk+1 64 75 = 2 0 y~kT+1 F 3 Rk+1Wk Y^k+1 75 : = 64 ::: y~T k+1

(18)

F

Hence, the optimal in (18) can be solved from

Rk+1Wk = Y^k+1

(19)

Note that Rk+1 is an upper triangular matrix and therefore Wk can be easily solved from (19) by backward substitution. The procedure of the algorithm is the following. Firstly, at stage k , calculate Rk+1 using QR decomposition and Yk+1 according to (17). Secondly, Wk solve in (19) and then update Wk+1 according to (12). Initial values for R and W can be assigned as R0 I and W0 0, where is a small positive number.

=

^

=

172

D.L. YU ET AL.

Figure 1. The chemical reactor process.

4. An Illustrative Example The methods described in Sections 2 and 3 are applied to neural network modelling of a chemical reactor. A RBF network model is trained and validated. The chemical reactor is first described followed by the RBF network modelling.

4.1. THE CHEMICAL REACTOR The reactor used in this research is a pilot system established in the laboratory to generally represent the dynamic behaviour of real chemical processes in industry. The schematic of the chemical reactor is shown in Figure 1. It consists of a continuously stirred tank to which the chemical solutions, ammonium hydroxide (NH4 OH ), acetic acid (CH3COOH ) and sodium sulphite (Na2 SO3 ), and air are added. The liquid level in the tank is maintained at a pre-specified constant level by an outflow pump system. The concentrations and the flow rates of CH3COOH and Na2 SO s are constant, while the flow rates of NH4 OH and air are adjustable to control the pH and the dissolved oxygen in the tank. The liquid temperature is controlled by a heating system. With the three inputs, the heating power (Q), the flow rate of NH4 OH fb and the flow rate of air fa , and the three outputs, liquid temperature (T), pH and percentage of dissolved oxygen (pO2), the process is a MIMO, non-linear system with complex, interactive dynamics.

( )

( )

4.2. RBF NETWORK MODELLING The reactor is modelled using a RBF network. Following some further investigations on the process, the system sample interval was chosen to be 10 seconds based

173


Figure 2. Process and model outputs for temperature.

on examining the rise times for different variables. The RBF network inputs were chosen as

x(t) = T (t fb(t

) ( ) pO2(t ) ( )

1 ; pH t 1 ; T 1 ; fa t 1

1

); pO2(t

2

); Q(t

22

);

using the model order and time-delay selection method of Gomm et al. [1]. The network outputs corresponded to the three process outputs, T(t), pH(t) and pO2(t). A set of closed-loop, input-output process data was collected with 2700 samples. The first 1800 samples formed the training data set and the last 900 samples formed the test data set. The RBF network used in the modelling is as described in Section 2, with 7 inputs, 60 neurons in the hidden layer and 3 outputs. The RBF network centres were chosen using the standard k-means clustering method and the weighting matrix was updated using the ROLS algorithm described in Section 3. The training data was normalised to have zero mean and unit variance. After training, cross-validation was performed using the test data set. The real process output and the RBF model output of the three variables for the test data set are displayed in Figure 2 to Figure 4, where the thin lines are the process outputs and the bold lines are the model outputs.

174

D.L. YU ET AL.

Figure 3. Process and model outputs for pH.

Figure 4. Process and model outputs for dissolved oxygen.

From the graphs it can be seen that the neural model accurately predicts the process outputs. To measure the accuracy of the model, a normalised mean square error (NMSE) index is employed as follows for the ith network output,

NMSE (yi ) = M1

M y (t) y^ (t) 2 X i i : y ( t ) i t=1

(20)

175


For the three output variables in this model the indices are respectively,

NMSE (T ) = 1:0573e 5; NMSE (pH ) = 9:3911e NMSE (pO2 ) = 0:0037

6;

To compare the effects of the ROLS algorithm with that of the conventional RLS and batch LS algorithms [6] on the modelling errors, the same neural network was trained using the RLS algorithm for the same set of training data and tested using the same set of test data. The NMSEs obtained for the three variables are respectively,


5;

These results show that for the temperature and the pH, the ROLS is much better than the RLS. A batch LS algorithm was also used to compute the weights for the same network. The results are


4;

It can be seen for the temperature and the pH that the batch LS updating even more severely degrades the modelling error. This is because the information matrix in the batch LS updating is nearly singular. For the dissolved oxygen, the ROLS algorithm also achieves the lowest NMSE, together with the LS algorithm. It is noted, however, that the three algorithms provide a similar accuracy for the dissolved oxygen. The reason is that, unlike the temperature and the pH, the dissolved oxygen data were not so correlated. 5. Conclusions A ROLS algorithm for MIMO systems is developed in this paper, based on the SISO form [5], and applied to updating the weighting matrix in a RBF network. A RBF model of a MIMO chemical reactor was trained using the developed algorithm. The modelling errors of the three network outputs were compared with those obtained using conventional RLS and batch LS algorithms. The results indicate that the ROLS algorithm can be applied to a large set of training data with a low requirement on computer memory, and can eliminate ill-conditioning in the data set, so that the network model accuracy is greatly improved. The ROLS algorithm for MIMO systems can be applied to any MIMO linear optimisation problem. The benefits of the ROLS algorithm should be further assessed in other application areas of neural networks. In [4], a batch OLS algorithm was used, not only to determine the output layer weights in an RBF network, but also for choosing the number and positions of the network centres. An interesting area for further research is to investigate how the ROLS algorithm can be applied to this problem.

176

D.L. YU ET AL.

This may also lead to an approach for developing on-line adaptive neural networks which are needed for application to time-varying problems. References 1. J.B. Gomm, D.L. Yu and D. Williams, “A new model structure selection method for non-linear systems in neural modelling”, Proc. UKACC International Conference on CONTROL’96, pp. 752– 757, Exeter, UK, Institution of Electrical Engineers, 1996. 2. J.B. Gomm, D. Williams, J.T. Evans, S.K. Doherty and P.J.G. Lisboa, “Enhancing the non-linear modelling capabilities of MLP neural networks using spread encoding”, Fuzzy Sets and Systems, Vol. 79, No. 1, pp. 113–126, 1996. 3. M.J. Korenberg, S.A. Billings, Y.P. Liu and P.J. McIloy, “Orthogonal parameter estimation algorithm for non-linear stochastic systems”, Int. J. Control, Vol. 48, pp. 193–210, 1988. 4. S. Chen, S.A. Billings, C.F.N. Cowan and P.M. Grant, “Practical identification of NARMAX models using radial basis functions”, Int. J. Control, Vol. 52, No. 6, pp. 1327–1350, 1990. 5. J.E. Bobrow and W. Murray, “An algorithm for RLS identification of parameters that vary quickly with time”, IEEE Trans. Automatic Control, Vol. 38, No. 2, pp. 351–354, 1993. 6. L. Ljung and T. Soderstrom, Theory and Practice of Recursive Identification, MIT Press: Cambridge, MA, 1983.

A Recursive Orthogonal Least Squares Algorithm for ... - Springer Link

A Recursive Orthogonal Least Squares Algorithm for ... - Springer Link

Suggest Documents

Recursive Least Squares Dictionary Learning Algorithm Abstract

The Kernel Recursive Least Squares Algorithm - CiteSeerX

Recursive Least Squares Dictionary Learning Algorithm Abstract

A recursive algorithm for nonlinear least-squares ... - Semantic Scholar

Orthogonal Least Squares Algorithm for Training Cascade Neural

Hierarchic Kernel Recursive Least-Squares

MULTIPLE CONCURRENT RECURSIVE LEAST SQUARES ...

Recursive Least-Squares Estimation for Hammerstein Nonlinear ...

Partial Diffusion Recursive Least-Squares for Distributed

Using Recursive Least Squares Estimator For ...

Orthogonal Least Squares Algorithm Applied to the Initialization of

Multichannel recursive-least-squares algorithms ... - Semantic Scholar

Least Squares Orthogonal Distance Fitting of

recursive least squares - IEEE Computer Society

Asymptotically Convergent Modified Recursive Least-Squares with ...

"Event Compression Using Recursive Least Squares Signal

Recursive Least Squares Method in Parameters ... - doiSerbia

Distributed Recursive Least-Squares: Stability and Performance ...

mutually orthogonal Latin squares DOCK - Springer Link

Research Article Kernel Recursive Least-Squares

Partial-Diffusion Recursive Least- Squares Estimation

Library Least-Squares (MCLLS) approach - Springer Link

A least-squares minimization approach for model ... - Springer Link

A total least squares solution for geodetic datum ... - Springer Link