Adaptive GRNN Modelling of Dynamic Plants

Teo Lian Seng+, Marzuki Khalid** and Rubiyah Yusof
(All correspondence should be sent to **)

+ Faculty of Engineering, University Telekom (UNITELE), Jalan Air Keroh Lama, Bukit Beruang, 75450 Melaka, Malaysia. Email: [email protected]

** Centre for Artificial Intelligence and Robotics, Universiti Teknologi Malaysia, Jalan Semarak, 54100 Kuala Lumpur, Malaysia. Email: [email protected]; Fax: +603-2970815
ABSTRACT

This paper proposes an integrated General Regression Neural Network (GRNN) adaptation scheme for dynamic plant modelling. The scheme can be used in a noisy and dynamic environment for online process control. It possesses several distinguishing features compared with the original GRNN proposed by Specht, such as a flexible add-in and delete-off mechanism for pattern nodes, dynamic initial sigma assignment using a non-statistical method, and automatic target adjustment and sigma tuning. These adaptation strategies are formulated around the inherently advantageous features of the GRNN, such as its highly localised pattern nodes, good interpolation capability and instantaneous learning. Good modelling performance was obtained when the GRNN was tested on a linear plant in a noisy environment, where it performed better than the well-known Extended Recursive Least Squares identification algorithm. The effects of some of the adaptation parameters are also analysed on a nonlinear plant. The results show that the proposed methodology is computationally efficient and exhibits several attractive features, such as fast learning, flexible network sizing and good robustness, which make it suitable for constructing estimators or predictors for many model-based adaptive control strategies.
Keywords: General Regression Neural Network (GRNN), modelling, dynamic process, adaptation, system identification.

1. INTRODUCTION

Over the past decade, artificial intelligence methods such as fuzzy logic, neural networks and chaos theory have emerged rapidly as alternative solutions for system modelling. In particular, neural networks have been widely applied to empirical process modelling, especially for nonlinear or ill-defined processes. Neural networks such as multi-layer feed-forward networks and recurrent networks can be trained to associate input data with output data, which in this context can be used to learn unknown plant dynamics. This inherent learning capability and good generalisation behaviour
make them favourable for complex system identification [1]. Currently, neural networks are being applied to physical plant modelling with satisfactory results [2-3]. In control engineering, neural network models are generally used as dynamic plant emulators for controller design and as predictor models in many adaptive control configurations [4-8]. Using neural networks as direct controllers for the plant is less popular; however, several techniques have been reported [6]. Based on the available information, the predictor model in a process control system is used to estimate the future system states, usually a few samples ahead and within the operating envelope of the system. However, implementing a predictor model is usually constrained by a limited network size, and many real-time applications require fast processing speed. In practice, neural network predictors that are applied online should have fast and continuous learning capability. Among the many neural network paradigms, the General Regression Neural Network (GRNN) possesses such characteristics [9] and thus serves as the motivation for this research.

The GRNN has been applied in a number of system control and identification applications [10-14]. There have also been comparative studies demonstrating the modelling capability of the GRNN relative to other types of neural networks [10,12,15]. Although there are some studies on GRNN adaptation methods, the assignment of the sigmas is usually based on an overall statistical calculation over a pre-collected batch of training data [10,16]. This approach may not be suitable in a continuous modelling environment, where the model needs to be updated continuously as the plant dynamics or operating conditions change. Little work has been reported on adaptive GRNNs for modelling of dynamical systems, especially for online applications, and investigation of the adaptation of GRNN parameters in dynamic process modelling is still in its infancy.

This paper proposes an integrated GRNN adaptation scheme for dynamical plant modelling. The adaptive GRNN modelling scheme is suitable for a noisy and dynamic control environment. The GRNN model is equipped with several distinguishing features not found in the original GRNN model [9], such as a flexible pattern node mechanism with add-in and delete-off features, dynamic initial sigma assignment using a non-statistical method, and automatic adjustment of the targets and sigmas associated with the pattern nodes. These adaptation strategies are formulated around the inherently advantageous features of the GRNN, such as its expandable and reducible network structure and the exclusively local properties of its pattern nodes [9,16]. The advantages and rationale of these strategies are experimentally investigated in the modelling of linear and nonlinear plants. The relative performance of the proposed adaptive GRNN modelling
technique is compared with the well-known, mathematically based Extended Recursive Least Squares identification algorithm (ERLS) [17]. Furthermore, the effects of some adaptation parameters on the overall modelling efficiency are also investigated.

This paper is organised as follows. A brief introduction and background review of the GRNN are given first, followed by an explanation of the adaptive strategies of the GRNN model. The next section presents the modelling experiments and their corresponding results. These experiments cover dynamic modelling with different noise levels for linear as well as nonlinear plants. The experiments show that the proposed integrated adaptation strategies give the GRNN model several advantageous characteristics, such as fast learning capability, robustness to noise and flexible network size, which make it suitable for building predictor models for many model-based adaptive control systems.
2. GRNN FOR PROCESS PLANT MODELLING

The GRNN paradigm was proposed by Donald Specht as an alternative to the well-known
back-error propagation training algorithm for feed-forward neural networks [18]. It is closely related to the better-known probabilistic neural network [9]. Regression in this context can be thought of as the least-mean-squares estimation of variables based on available data. From a computational viewpoint, the GRNN is based on the estimation of a probability density function from observed samples using Parzen window estimation [19]. It utilises a probabilistic model between an independent vector random variable X of dimension D and a dependent scalar random variable Y. Assume that x and y are measured values of X and Y, respectively, and that f(X,Y) is the known joint continuous probability density function. The expected value of Y given x (the regression of Y on x) can then be estimated as:
$$E[Y \mid x] = \frac{\int_{-\infty}^{\infty} Y \, f(x, Y)\, dY}{\int_{-\infty}^{\infty} f(x, Y)\, dY} \qquad \text{Eq(2.1)}$$
Assume that the underlying density is continuous and that the first partial derivatives of the function evaluated at any x are small. Then, based on p samples of observation data (the training set given by x and y), the probability estimator f̂(x,y) can be formulated as:
$$\hat{f}(x, y) = \frac{1}{(2\pi)^{(D+1)/2}\, \sigma^{D+1}} \cdot \frac{1}{p} \sum_{i=1}^{p} \exp\!\left(-\frac{(x - x_i)^T (x - x_i)}{2\sigma^2}\right) \exp\!\left(-\frac{(y - y_i)^2}{2\sigma^2}\right) \qquad \text{Eq(2.2)}$$
where x_i and y_i are the ith training sample pair, with x_i in vector form. A physical interpretation of the probability estimate f̂(x,y) is that it assigns a probabilistic sample of width σ to each sample pair x_i and y_i, such that the probability estimate is computed as the sum of these probabilistic samples. Substituting Eq(2.2) into Eq(2.1), the desired conditional mean of Y given x, denoted ŷ, is obtained as:
$$\hat{y}(x) = E[Y \mid x] = \frac{\sum_{i=1}^{p} y_i \exp(d_i)}{\sum_{i=1}^{p} \exp(d_i)} \qquad \text{Eq(2.3)}$$
where d_i is the distance function between the input vector and the centres recorded in the pattern nodes, given by:
$$d_i = -\frac{(x - x_i)^T (x - x_i)}{\sigma^2} \qquad \text{Eq(2.4)}$$
In the above GRNN formulation, all the input variables and pattern nodes share a single common sigma. The sigma is sometimes known as the smoothing factor or widening factor of the kernel. If a single sigma value is used for all the variables, the network must be provided with variables whose variations are all commensurate, which may be impractical in many dynamic modelling problems. On the other hand, assigning an independent sigma to each of the variables may greatly improve the generalisation accuracy and bring significant benefits [9,16]. This approach is especially useful when each of the stored patterns has its own importance, or when the distances between the centres of the patterns are not uniform. In process plant modelling, this non-uniformity of distance occurs because the rate of change of each network input variable can differ significantly between sampling instants. In this case, the pattern nodes of the GRNN model that record the transient behaviour of the plant should preferably be assigned different sigmas from the pattern nodes that store the steady-state characteristics. Thus, the distance function in Eq(2.4) can be re-written as:
$$d_i = -\sum_{j=1}^{D} \frac{(x_j - x_{ij})^2}{\sigma_{ij}^2} \qquad \text{Eq(2.5)}$$
where x_ij and σ_ij denote the centre and sigma, respectively, of the jth variable for the ith pattern node, and D is the dimension of the GRNN inputs. The computational procedure of the GRNN can thus be viewed as a weighted average of all the observed data using a distance criterion in the input space. A
general structure of the GRNN is illustrated in Fig. 1. The input layer simply channels the input vector into the GRNN; the distances to the recorded patterns are then calculated in each of the pattern nodes in the pattern layer. The summation layer performs the summing operations of the numerator and denominator of Eq(2.3), and the final network output is obtained at the output layer, which performs the normalisation.
{Fig.1}
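To make the computation of Eqs (2.3) and (2.5) concrete, the following is a minimal sketch of the GRNN forward pass with an independent sigma per variable and per pattern node. The function and variable names are illustrative, not from the paper.

```python
import numpy as np

def grnn_predict(x, centres, sigmas, targets):
    """GRNN forward pass following Eqs (2.3) and (2.5).

    x       : (D,)   current input vector
    centres : (p, D) stored pattern centres x_ij
    sigmas  : (p, D) kernel widths sigma_ij
    targets : (p,)   stored targets y_i
    """
    # d_i = -sum_j (x_j - x_ij)^2 / sigma_ij^2                (Eq 2.5)
    d = -np.sum(((x - centres) / sigmas) ** 2, axis=1)
    w = np.exp(d)                            # firing level of each node
    # y_hat = sum_i y_i exp(d_i) / sum_i exp(d_i)             (Eq 2.3)
    return float(np.dot(targets, w) / (np.sum(w) + 1e-12))
```

The small constant in the denominator is only a numerical guard for the case where no pattern node fires appreciably; it is not part of the original formulation.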
The GRNN is capable of approximating arbitrary linear or nonlinear relationships between input and output variables, drawing the function estimates directly from the data [10]. It is particularly advantageous with sparse data in a real-time environment, because the regression surface is instantly defined, even by just one sample. The learning technique of the GRNN differs from the most popular neural network learning method, the delta rule. It does not use an iterative tuning approach to acquire the training information; it learns almost instantaneously once presented with the training data. The GRNN simply stores the training patterns and processes them through a nonlinear smoothing function to determine the component output probability density functions. This fast learning feature is suitable for real-time modelling of complex plants, where updated process dynamics need to be captured quickly. The decision surfaces of the GRNN are guaranteed to approach the Bayes-optimal decision boundaries as the number of training samples increases [9]. In all cases, the GRNN instantly adapts to new data points, which is an advantage for the continuous modelling of process plants, whose dynamics often change. In addition, its output does not converge to local minima, making it preferable for online modelling [10-11]. The output of the GRNN is always bounded when the training data are bounded, i.e. the stability of the model is guaranteed. Furthermore, adding new samples to the training set does not require re-calibration of the model, and it is easy to modify or upgrade the GRNN according to the specific needs of an environment, since each learning pattern is stored locally in its corresponding pattern node [10,16]. Another advantage is that, during training, erroneous samples are tolerated by the other existing pattern nodes [20]. The GRNN has been reported to deliver better performance than other neural networks and classical modelling methods, especially in noisy environments [12].

However, one disadvantage of the GRNN is that it requires all the training samples to be stored for future use. The network size can grow substantially in a continuous and dynamic modelling environment, because large amounts of training data are fed into the network over a
long period. Some clustering algorithms have been proposed to overcome this problem [21-22]. In addition, there is currently no intuitive method for selecting the optimal smoothing factor, and the statistical method of determining smoothing factors is not suitable in a dynamic and continuous modelling environment [18]. Furthermore, the GRNN requires many training samples in order to adequately cover the variations in the data. These disadvantages are compensated for by the adaptive mechanism proposed in this paper.
3. ADAPTIVE GRNN MODELLING SCHEME

In the proposed methodology, the GRNN can evolve dynamically from a null network, i.e.
one with zero pattern nodes. The number of pattern nodes increases as the training progresses. When previously unlearned plant behaviour is presented to the network, a new pattern node is created to store this new knowledge. Thus, the model can learn consistently even with changing plant dynamics. At the same time, the less valuable stored pattern nodes can be discarded. Since the tuning of the pattern node parameters is performed locally, the scheme offers a speed advantage if parallel processing is applied. It was found that, with the proposed methodology, no stopping criterion for learning is required; adaptation can continue indefinitely in a process control system without causing over-tuning or over-fitting.

The proposed adaptation methodology comprises four strategies, covering the creation of new pattern nodes, the dynamic initialisation of new pattern nodes, the adjustment of the targets, and the tuning of the sigmas. Figure 2 presents the flow of the adaptation procedure. In each iteration, a new training vector is formed from the feedback system states. It is then channelled to the network for prediction and to test whether a new pattern node needs to be created. Based on its firing level, each pattern node then undergoes either target adaptation or sigma tuning.
{Fig.2}
3.1 Adaptation Strategy I: Creation of New Pattern Nodes

In an iterative and continuous learning environment, creation of one pattern node for each of
the training data is not appropriate, as it will cause the GRNN network size to grow without bound. However, fixing a specific network size is also not appropriate, as this may deter the acquisition of new knowledge. We use an alternative approach to avoid these difficulties while preserving the GRNN
modelling accuracy. New pattern nodes are created only when necessary, and the maximum number of pattern nodes allowed is limited, which caps the size of the network in view of hardware and software limitations. If the maximum number of pattern nodes has been reached but a new node still needs to be inserted, the existing pattern node with the lowest merit is identified and replaced by the new node. For this purpose, each pattern node is associated with a merit index (η: 0 ≤ η ≤ 1), with a value of one assigned upon its creation. The merit index η reflects the accumulated firing strength of that pattern node over the iterations. For prediction at the kth sampling instant, the index η for the ith pattern node is updated as follows:

$$\eta_i(k) = 0.99 \cdot \eta_i(k-1) + f_i(k) \qquad \text{Eq(3.1)}$$
where f_i(k) is the firing level of the ith pattern node, computed as:

$$f_i(k) = \exp\!\left(-\sum_{j=1}^{D} \left(\frac{x_j - x_{ij}}{\sigma_{ij}}\right)^2\right) \qquad \text{Eq(3.2)}$$
The merit index indicates the worthiness of a particular pattern node in the network. Pattern nodes that are seldom used or that are outdated will eventually have a comparatively low merit index, and vice versa. Besides allowing new knowledge to be absorbed within a bounded network size, the replacement action discussed above also serves as a simple and effective strategy for gradually phasing out outdated pattern nodes, which helps to maintain the efficiency of the network in a dynamic modelling environment. The risk of deleting pattern nodes that could be useful in the future was found to be low; furthermore, such an occurrence is compensated for by the instant learning characteristic of the GRNN. Moreover, deletion of an important node rarely happens when the maximum network size is large enough to keep the important information within the operating range of the system.

A decision threshold, the "create new node" minimum threshold (α: 0 ≤ α ≤ 1), is used to determine the creation of a new pattern node. At each sampling instant, if none of the pattern nodes fires above the α threshold, a new pattern node is created. When the matching degrees measured between the training vector and the stored patterns are all below the α threshold, a new characteristic of the plant dynamics has been observed; further learning is then necessary to maintain the prediction accuracy, since the existing network may not give an appropriate prediction for this new input pattern. The parameters of the newly created node are assigned dynamically, as explained in the following section.
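As an illustration of Strategy I, the following sketch maintains the merit indices of Eq (3.1) and creates or replaces nodes according to the α threshold. The class layout and names are ours, and the new node's sigma is assumed to come from Strategy II (Section 3.2).

```python
import numpy as np

class AdaptiveGRNN:
    """Minimal sketch of Strategy I: node creation with a merit index."""

    def __init__(self, alpha=0.5, max_nodes=300):
        self.alpha = alpha            # "create new node" minimum threshold
        self.max_nodes = max_nodes    # hard cap on the network size
        self.centres, self.sigmas, self.targets, self.merit = [], [], [], []

    def firing(self, x):
        # f_i(k) = exp(-sum_j ((x_j - x_ij) / sigma_ij)^2)     (Eq 3.2)
        return [float(np.exp(-np.sum(((x - c) / s) ** 2)))
                for c, s in zip(self.centres, self.sigmas)]

    def observe(self, x, y, new_sigma):
        f = self.firing(x)
        # decay every merit index and add the current firing    (Eq 3.1)
        self.merit = [0.99 * m + fi for m, fi in zip(self.merit, f)]
        if not f or max(f) < self.alpha:        # nothing matches: learn it
            if len(self.centres) >= self.max_nodes:
                worst = int(np.argmin(self.merit))  # lowest-merit node goes
                for lst in (self.centres, self.sigmas,
                            self.targets, self.merit):
                    del lst[worst]
            self.centres.append(np.asarray(x, dtype=float))
            self.sigmas.append(np.asarray(new_sigma, dtype=float))
            self.targets.append(float(y))
            self.merit.append(1.0)              # merit starts at one
```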
3.2 Adaptation Strategy II: Dynamic Sigma Initialisation

It is observed that the variance of the system states at each sampling period during the
transient states is much larger than during the steady state. This suggests that the pattern nodes storing the transient characteristics have centres farther apart than those recording the steady-state characteristics of the plant. It is therefore preferable to have a separate sigma for each pattern node of the GRNN model, as this can greatly improve the prediction accuracy. However, finding an appropriate sigma for each variable of each pattern node can be a difficult task [9,16]; it depends on the network input variables and on the plant characteristics. This paper introduces a simple, fast and dynamic sigma initialisation method based on the dynamic states of the plant, without the need for statistical calculation. This scheme assigns the centre and width of the Gaussian kernel for each input variable effectively, and with fewer computational steps than other clustering methods [21-22]. The initialisation of the sigma is based on the distance, i.e. the rate of change of the variable, at the time the new node is created. For the sigma of the ith variable (x_i) of the jth pattern node created at the kth sampling instant, the initialisation is:
$$\sigma_{ij} = \begin{cases} a_i + b_i\,|x_i(k) - x_i(k-1)| & \text{if } a_i + b_i\,|x_i(k) - x_i(k-1)| < c_i \\ c_i & \text{otherwise} \end{cases} \qquad \text{Eq(3.3)}$$
where a_i and c_i are the defined lower and upper limits of the sigma for x_i, respectively, and b_i is the slope rate of the sigma. The initial value of the sigma is thus bounded, i.e. a_i ≤ σ_ij ≤ c_i, where a_i is a small positive value giving the minimum sigma allowed. The parameter c_i prevents the sigma from becoming so large that it over-generalises and causes prediction inaccuracy in finite-sample modelling [9,16]. This strategy allocates a larger sigma when the rate of change |x_i(k) - x_i(k-1)| is larger, which implies that pattern nodes storing the transient characteristics of the plant are assigned larger sigma values. A larger sigma also means a wider pattern kernel, in accordance with the fact that these pattern nodes have centres farther away from the other pattern nodes. When necessary, the information between two such points can be extracted via interpolation, as the GRNN is known to have good interpolation capability. In general, this reduces the number of nodes needed to record the transient characteristics of the plant while still preserving good prediction. Conversely, the pattern nodes recording the steady states of the plant are assigned smaller sigmas, in order to give more precise steady-state prediction. Finally, the centre of the Gaussian kernel of the ith variable (x_i) of the new pattern node is simply assigned as x_i(k).
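A minimal sketch of this initialisation rule, with the limits supplied per input variable (the vectorised form is ours):

```python
import numpy as np

def init_sigma(x_now, x_prev, a=0.02, b=1.0, c=0.2):
    """Dynamic sigma initialisation, Eq (3.3).

    x_now, x_prev : current and previous values of the input variables
    a, b, c       : lower limit, slope rate and upper limit of the sigma
    Returns one sigma per input variable, bounded to [a, c].
    """
    sigma = a + b * np.abs(np.asarray(x_now, float) - np.asarray(x_prev, float))
    return np.minimum(sigma, c)   # cap at c; sigma >= a by construction
```

With the settings used later in Section 4.1 (a = 0.02, b = 1.0, c = 0.2), a variable that changed by 0.1 over one sample would receive an initial sigma of 0.12, while a variable at steady state would receive the minimum of 0.02.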
3.3 Adaptation Strategy III: Adjustment of the Targets

The output of each of the pattern nodes reflects the matching degree of the network input
vector to the pattern vector stored in that particular node. For a Gaussian pattern node, a higher firing output (closer to 1) means a higher matching degree, and vice versa. A firing level of 1 indicates that the input vector is exactly equal to the centres recorded in that pattern node. Owing to the presence of system noise, the recorded training targets can be distorted; therefore, some form of compensation of the pattern targets is needed. For noise compensation, adjusting the target vector is more appropriate than tuning the sigma of the pattern node. The rationale is that, in the case of a matching level of 1, the gradient of the error with respect to the corresponding sigma, as derived in Section 3.4, is zero because the distance measured in the input space is zero, resulting in a zero adjustment magnitude. Adding a new pattern node could serve the same purpose of reducing the prediction error, but adjusting the recorded targets is easier and uses no extra node. The idea is to merge the existing target with the incoming training target. However, the adjustment is applied only to pattern nodes with high matching degrees, i.e. firing levels close to 1. In this paper, a pattern node qualifies for target adjustment if its firing level is above a "target update" threshold (γ). The target vector of the ith pattern node for the tth output variable is then updated as follows:
$$y_{it\_new} = \begin{cases} (1-\lambda)\,\dfrac{y_{it\_training}}{f_i(k)} + \lambda\, y_{it\_old} & \text{if } y_{it\_training} \le y_{it\_old} \\[2mm] (1-\lambda)\, y_{it\_training}\, f_i(k) + \lambda\, y_{it\_old} & \text{otherwise} \end{cases} \qquad \text{Eq(3.4)}$$

where λ is the adjustment rate of the target and f_i(k) is the firing level of the ith pattern node as in Eq(3.2). The parameter y_it_training is the training target associated with the current input vector, and y_it_old is the target currently associated with the ith pattern node. The target update threshold (0 ≤ γ ≤ 1) is set close to 1. The adaptation of the targets serves two functions: firstly, it helps to reduce the prediction error due to noise contamination; secondly, it helps to update the network if the plant characteristics vary over time.
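The two cases of Eq (3.4) can be written out as a small sketch (the function name and arguments are illustrative):

```python
def adjust_target(y_training, y_old, firing, lam=0.5):
    """Merge the incoming training target with the stored target, Eq (3.4).

    Intended to be applied only when the node's firing level exceeds
    the "target update" threshold gamma, as checked by the caller.
    """
    if y_training <= y_old:
        blended = y_training / firing    # firing <= 1 raises the value
    else:
        blended = y_training * firing    # ... or lowers it, towards y_old
    return (1.0 - lam) * blended + lam * y_old
```

In both cases the correction is scaled by the matching degree, so a perfectly matching node (firing level 1) simply averages the old and new targets with weight λ.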
3.4 Adaptation Strategy IV: Tuning of the Sigmas
The sigma initialisation method of Strategy II is a convenient way to assign the sigmas appropriately; however, it provides no means of generating an optimal sigma. Although it has been reported that the GRNN algorithm does not get trapped in local minima [11], tuning of the sigmas is necessary to further refine the prediction accuracy, especially in a dynamic modelling environment. The gradient of the prediction error of the GRNN can be computed by partial differentiation, which is straightforward since the Gaussian function of the pattern node is differentiable with respect to the sigma. For simplicity, the following derivation is given for a GRNN with a single output node, i.e. one output variable only. The GRNN model of Eq(2.3) can be rewritten as:
$$\hat{y} = \frac{A}{B}, \qquad A = \sum_{i=1}^{p} y_i \exp(d_i), \qquad B = \sum_{i=1}^{p} \exp(d_i) \qquad \text{Eq(3.5)}$$
where ŷ is the predicted output of the GRNN and d_i is the distance function defined in Eq(2.5). The squared error between the GRNN predicted output (ŷ) and the target (T) can be written as:

$$e = (\hat{y} - T)^2 \qquad \text{Eq(3.6)}$$
Thus, the first derivative of the error with respect to each of the sigmas is obtained as:

$$\frac{\partial e}{\partial \sigma_{ij}} = 2(\hat{y} - T)\,\frac{\partial \hat{y}}{\partial \sigma_{ij}} = 2(\hat{y} - T)\,\frac{g - h\,\hat{y}}{B} \qquad \text{Eq(3.7)}$$

where

$$g \equiv \frac{\partial A}{\partial \sigma_{ij}} = y_i \exp(d_i)\,\frac{\partial d_i}{\partial \sigma_{ij}}, \qquad h \equiv \frac{\partial B}{\partial \sigma_{ij}} = \exp(d_i)\,\frac{\partial d_i}{\partial \sigma_{ij}}, \qquad \frac{\partial d_i}{\partial \sigma_{ij}} = \frac{2\,(x_j - x_{ij})^2}{\sigma_{ij}^3}$$
The optimisation of the sigma is intended to minimise the prediction squared error of Eq(3.6). The sigma of the jth variable of the ith pattern node at the kth sampling instant is then updated as follows:

$$\sigma_{ij}(k) = \sigma_{ij}(k-1) - \phi\,\frac{\partial e(k)}{\partial \sigma_{ij}} \qquad \text{Eq(3.8)}$$
where φ is the learning rate of the sigma, usually a small constant. While it is necessary to continuously improve the accuracy of the GRNN model, tuning the sigmas of all the pattern nodes at every sampling instant is rather time consuming and could cause problems in an online adaptation. The same objective can be achieved by tuning only the identified group of pattern nodes that contribute significantly to the model's predicted output, and the number of such nodes is usually small. On a serial computer, this tuning scheme can
greatly reduce the time spent tuning the sigmas, as experimentally demonstrated in the following sections. In this paper, a minimum significant level (ψ) is defined to select the pattern nodes to be tuned. Only the pattern nodes that fire above the ψ level qualify for the gradient tuning procedure; in other words, only the sigmas of the significantly contributing nodes, i.e. those closely related to the input patterns, are tuned.

In the adaptation strategies discussed, whether a pattern node is tuned, and by what magnitude, is determined locally from its firing level. The overall local adaptation action of the pattern nodes under Strategies III and IV can be summarised as follows. If a pattern node fires above the target update threshold (γ), i.e. f_i(k) > γ, its targets are adjusted without tuning its sigmas. Otherwise, if it fires above the minimum significant level (ψ), i.e. γ ≥ f_i(k) > ψ, the sigmas of the pattern node are tuned. There is no adaptation for pattern nodes that fire below ψ, i.e. f_i(k) < ψ. This highly localised GRNN adaptation strategy is therefore computationally efficient and favourable for parallel processing.
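A sketch of this per-node dispatch, combining the target adjustment of Strategy III with the gradient step of Strategy IV for a single-output GRNN (Eqs (3.5)-(3.8)); the helper names are ours, and the arrays are modified in place:

```python
import numpy as np

def adapt_nodes(x, T, centres, sigmas, targets, gamma=0.8, psi=0.1,
                phi=0.2, lam=0.5):
    """One local adaptation step over all pattern nodes.

    Nodes firing above gamma have their target adjusted (Eq 3.4); nodes
    firing in (psi, gamma] get a gradient step on their sigmas
    (Eqs 3.7-3.8); nodes below psi are left untouched.
    """
    d = -np.sum(((x - centres) / sigmas) ** 2, axis=1)        # Eq (2.5)
    w = np.exp(d)                                             # firing levels
    B = np.sum(w)
    y_hat = np.dot(targets, w) / B                            # Eq (2.3)

    for i, f in enumerate(w):
        if f > gamma:                       # Strategy III: adjust the target
            blended = T / f if T <= targets[i] else T * f
            targets[i] = (1.0 - lam) * blended + lam * targets[i]
        elif f > psi:                       # Strategy IV: tune the sigmas
            dd = 2.0 * (x - centres[i]) ** 2 / sigmas[i] ** 3  # dd_i/dsigma_ij
            g = targets[i] * f * dd                            # dA/dsigma_ij
            h = f * dd                                         # dB/dsigma_ij
            grad = 2.0 * (y_hat - T) * (g - h * y_hat) / B     # Eq (3.7)
            sigmas[i] -= phi * grad                            # Eq (3.8)
    return y_hat
```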
4. EXPERIMENTS AND RESULTS

In the first experiment, we compare the proposed adaptive GRNN modelling method with a
mathematical linear modelling method on a linear system. The popular Extended Recursive Least Squares estimation algorithm (ERLS), which has been widely applied to linear system identification, is chosen [17]. The effects of the proposed adaptation strategies on the prediction performance and structural growth of the GRNN models are also investigated. Next, the GRNN is applied to model a nonlinear plant, and the effects of some of the adaptation parameters involved in the proposed strategies are investigated. Figure 3 shows a typical configuration in which the GRNN is used to model a plant with randomly excited inputs. The GRNN inputs use only the delayed values of the plant inputs (u) and outputs (yε). The parameter d is the plant time delay. These GRNN inputs also serve as the training vector for adaptation of the model. The parameter yε is the measured plant output distorted by noise, i.e. yε = y + ε, where ε is Gaussian noise. For the configuration shown in Fig. 3, a set of pre-generated randomised plant inputs is first collected. In each learning cycle, this data set is fed in sample by sample to stimulate the plant; thus, one learning cycle consists of L iterations.
Although the network takes in yε, the prediction performance is evaluated through the prediction error (ê), which is the difference between the predicted system response (ŷ) and the actual system response (y). The prediction error, expressed as the accumulated normalised root mean square error (ARMSE), is given by:

$$\text{ARMSE} = \sum_{i=1}^{L} \sqrt{(y_i - \hat{y}_i)^2} \qquad \text{Eq(4.1)}$$

where L is the total number of data in the pre-generated data set. The maximum number of pattern nodes allowed in all the experiments is limited to 300.
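Since the exact normalisation of Eq (4.1) is not fully recoverable from the text, the following is one reading of it, accumulating the per-sample root of the squared error over a cycle:

```python
import numpy as np

def armse(y, y_hat):
    """Accumulated error over one learning cycle of L samples; one reading
    of Eq (4.1), summing the root of each squared prediction error."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.sum(np.sqrt((y - y_hat) ** 2)))
```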
{Fig.3}
4.1 Modelling of a Linear Plant

Consider a plant described by the following ARMAX model:

$$A(z^{-1})\, y(t) = B(z^{-1})\, u(t-d) + C(z^{-1})\, \varepsilon(t) \qquad \text{Eq(4.2)}$$
where

$$A(z^{-1}) = 1.0 - 1.5 z^{-1} + 0.7 z^{-2}, \qquad B(z^{-1}) = 1.0 + 0.5 z^{-1}, \qquad C(z^{-1}) = 1.0 - 1.0 z^{-1} + 0.2 z^{-2}$$
This plant has been used as an identification benchmark in much of the literature [23-25]. The parameter ε(t) is zero-mean Gaussian noise, and four noise levels are selected for comparison purposes; the maximum magnitude of the noise in each case is given in Table 1. Note that Strategy I, as discussed in Section 3.1, is applied in all cases to allow the GRNN model to develop flexibly. The different experiments carried out are summarised in Table 1.
{Table-1}
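For reproducibility, a minimal simulation of the benchmark plant of Eq (4.2) might look as follows; the unit time delay, excitation and noise amplitude shown are assumptions for the sketch:

```python
import numpy as np

def simulate_armax(u, d=1, noise_amp=0.05, seed=0):
    """Simulate A(z^-1) y(t) = B(z^-1) u(t-d) + C(z^-1) eps(t), Eq (4.2):
    y(t) = 1.5 y(t-1) - 0.7 y(t-2) + u(t-d) + 0.5 u(t-d-1)
           + eps(t) - 1.0 eps(t-1) + 0.2 eps(t-2)
    """
    rng = np.random.default_rng(seed)
    L = len(u)
    eps = noise_amp * rng.standard_normal(L)   # zero-mean Gaussian noise
    y = np.zeros(L)

    def at(a, i):                              # signals are zero before t = 0
        return a[i] if i >= 0 else 0.0

    for t in range(L):
        y[t] = (1.5 * at(y, t - 1) - 0.7 * at(y, t - 2)
                + at(u, t - d) + 0.5 * at(u, t - d - 1)
                + eps[t] - 1.0 * at(eps, t - 1) + 0.2 * at(eps, t - 2))
    return y

# One learning cycle: a pre-generated randomised excitation of L samples
u = np.random.default_rng(1).uniform(-1.0, 1.0, 500)
y = simulate_armax(u)
```

The GRNN training vector at each instant is then assembled from the delayed plant inputs and measured outputs, e.g. [yε(t-1), yε(t-2), u(t-1), u(t-2)], matching the four input nodes described below.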
The GRNN model has four input nodes and one output node. The inputs consist of two delayed plant input signals and two delayed plant output signals, to ensure that the neural network model learns the plant dynamics correctly (see [6]). In these experiments, the adaptation parameters are configured as follows: the "create new node" minimum threshold α = 0.5, and the sigma initialisation parameters a_i, b_i and c_i are set to 0.02, 1.0 and 0.2, respectively. The "target update" threshold γ is set to 0.8, the target adjustment rate λ to 0.5, the sigma minimum threshold ψ to 0.1, and the
sigma tuning rate φ to 0.2. The selection of these parameters is based largely on the rationale related to their functions, as discussed in the previous sections, and partly on trial and error. For example, γ and ψ should be large and small, respectively, while α should be moderate to allow an appropriate rate of knowledge acquisition. Similar to the learning rate in other neural learning paradigms, φ is usually set to a small value. The experiments in this paper show that these parameters are robust within a certain range. The orders of the numerator and denominator used in the extended RLS algorithm are set to be the same as those of the linear plant in Eq(4.2), which should result in high modelling accuracy. The extended RLS algorithm is also equipped with a directional forgetting technique, in this case with a forgetting factor of 0.95 [17].

Figures 4a, 4b, 4c and 4d show the modelling results of the GRNN and ERLS models. It can be observed that the initial errors of the GRNN models are high, as the networks start from a null network. However, the ARMSE of the GRNN prediction drops drastically after a few learning cycles, reflecting its fast learning algorithm. Note that in a noiseless environment the ERLS algorithm is able to model the plant dynamics perfectly, i.e. with zero prediction error. The reason for the good performance of the ERLS in these experiments is that its internal mathematical model was set to have the exact structure of the linear plant of Eq(4.2), and the plant was excited sufficiently for the ERLS to converge to the actual parameters of the plant. However, as the amplitude of the noise increased, the ERLS settled to a non-zero error level. For both the GRNN and ERLS models, it can be observed that the settled ARMSE level increases as the noise level increases, i.e. the residuals grow with the noise.
{Fig.4a, 4b, 4c, 4d}
For the GRNN model, it can be seen that by employing Strategies I and II, the error decreases much faster and to a lower level than when using Strategy I alone. Generally, the dynamic sigma initialisation of Strategy II enables the GRNN to learn faster and to obtain more accurate predictions. By employing Strategies III and IV (Cases 2, 6, 10 and 14), which adjust the targets and fine-tune the sigmas, the GRNN models gain an effective method of continuous network fine-tuning, especially in heavy-noise environments. Furthermore, applying all four strategies yields the best GRNN prediction performance (Cases 4, 8, 12 and 16). The experiments also show that, relatively, the GRNN out-performed the ERLS algorithm as the noise level in the operating environment increased. Although in a noiseless environment the ERLS algorithm is better than the
GRNN, as the noise amplitude increased, the GRNN was able to give better predictions than the ERLS algorithm. Figure 5 shows the change in the average RMSE prediction error for both models as the amplitude of the noise is increased. It can be seen that the prediction of the ERLS model deteriorated faster than that of the GRNN model. Moreover, the GRNN was able to sustain its prediction accuracy even as the noise level increased tremendously.
{Fig.5}
Comparing the various cases of GRNN adaptation, it can be observed that applying only Strategy I (Cases 1, 5, 9 and 13) gives the worst performance in its class. The results reflect two facts. First, although the network was able to acquire new knowledge and discard outdated information using Strategy I, this alone was not enough to produce high prediction accuracy; the other strategies were necessary to obtain a more accurate GRNN predictor model. Second, examining the effect of Strategy II shows that sharing one common sigma value among all the pattern nodes of the GRNN is not appropriate for dynamic plant modelling. In addition, even if a common sigma were applied, it would be difficult to determine a good common value for all the pattern nodes.
The size of a neural network is often a concern, and the proposed adaptation strategies also affect the size of the resulting GRNN. Figure 6 shows the growth in the number of pattern nodes for the various cases. In general, employing Strategy II, which involves dynamic initial sigma assignment, causes pattern nodes to be created more rapidly; in return, it gives better modelling accuracy, as shown by the prediction performance in Fig. 4c. On the other hand, Strategies III and IV, which further fine-tune the network, have only a slight impact on the network size.
{Fig.6}
The level of measurement noise contaminating the signal fed into the network also affects the network size, as shown in Fig. 7. When Gaussian noise of larger magnitude exists, more pattern nodes are needed; the extra pattern nodes compensate for the errors inherent in the training data. It can also be seen that adapting the targets as in Strategy III helps to reduce the error in
the modelling. These features contribute to the improved performance of the GRNN in noisy dynamic plant modelling.
{Fig.7}
4.3 Modelling of a Nonlinear Plant

By modelling a nonlinear plant, the effects of some of the adaptation parameters involved can
be investigated, such as the "create new node" minimum threshold (α), the target adjustment threshold (γ) and the sigma tuning rate (φ), with respect to the overall modelling performance and the GRNN structural growth. In these experiments, we assume the nonlinearities of the plant are unknown. For our investigation, a dynamical process that has been used as a benchmark for validating other neural network and fuzzy algorithms in plant identification [26-28] is used. Its dynamics are modified to have higher nonlinearity and more complex plant dynamics, and are given as follows:
$$y(k+1) = \frac{y(k)\, y(k-1)\,[2.5 + y(k-1)]}{1 + y^2(k) + y^2(k-1)} + 2.0\,\frac{e^{-u(k)} - 1}{e^{-u(k)} + 1} + \varepsilon(k) \qquad \text{Eq(4.3)}$$
where u(k) and y(k) are the plant input and output signals, respectively, and ε(k) is zero-mean Gaussian noise. The GRNN model has four inputs and one output. The inputs consist of a single delayed value of the plant input signal and two delayed values of the plant output. The experiments are first carried out in the same way as described in Table 1, with the default adaptation parameters of the GRNN model selected as in Section 4.1. Figure 8 shows the effects of the different strategies in a modelling environment containing Gaussian noise with a normalised amplitude of 0.05. Figure 9 shows the progress of the GRNN learning for different noise amplitudes. It was found that the effects of employing the different GRNN adaptation strategies are essentially similar for the linear and nonlinear plants, as discussed in Sections 4.1 and 4.2. Figure 10 compares the output of the GRNN model with that of the actual plant.

{Fig.8} {Fig.9} {Fig.10}
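A minimal simulation of this benchmark, with an assumed noise amplitude for illustration:

```python
import numpy as np

def nonlinear_plant(u, noise_amp=0.05, seed=0):
    """Simulate the modified benchmark plant of Eq (4.3)."""
    rng = np.random.default_rng(seed)
    L = len(u)
    y = np.zeros(L + 1)          # y[0] = y[1] = 0 as initial conditions
    for k in range(1, L):
        eps = noise_amp * rng.standard_normal()
        y[k + 1] = (y[k] * y[k - 1] * (2.5 + y[k - 1])
                    / (1.0 + y[k] ** 2 + y[k - 1] ** 2)
                    + 2.0 * (np.exp(-u[k]) - 1.0) / (np.exp(-u[k]) + 1.0)
                    + eps)
    return y[1:]
```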
It was found that Strategy II can effectively initialise each of the sigmas without much computation. If Strategy II is not applied, the problem arises that it is difficult to determine a single common sigma value suitable for all the patterns stored in the network. Figure 11a shows the prediction performance using several values of a sigma common to all the pattern nodes (0.05, 0.1, 0.2 and 0.4), and also when applying Strategy II, in a noise-free plant environment. Figure 11b shows the corresponding growth of the pattern nodes in the network as the learning progresses. In finite-sample learning of the GRNN, if the single common sigma is set too large, it creates wide kernels in the pattern nodes, which can over-generalise the plant dynamics [16]. This results in very few pattern nodes being created and relatively poor prediction performance. If a smaller common sigma is used instead, more pattern nodes are needed to store the necessary plant characteristics. From Fig. 11a, the experiments show that a lower ARMSE was then recorded, since the GRNN follows the training data closely, moving almost from point to point [16]. However, in this case the knowledge stored in the pattern nodes is piecewise and the generalisation can be very low; the network loses the robustness and self-organising features inherent in many neural networks and, worse, a large network may be created. From Figs 11a and 11b, it can be seen that employing Strategy II achieves satisfactory prediction performance with an appropriate number of pattern nodes. Thus, it can be concluded that Strategy II assigns the initial sigmas effectively.
{Fig.11a, 11b}
The following experiments were carried out with all four adaptation strategies applied. Figure 12a shows that the prediction accuracy of the GRNN can be maintained over a range of ψ levels; however, the performance of the network deteriorates if the ψ threshold is set too high. Figure 12b shows the average percentage of pattern nodes that fired above a given level in the experiments. It is clear that only a small percentage of the pattern nodes fire significantly at any particular instant. This graph also represents the percentage of pattern nodes chosen for sigma adaptation as the ψ level is varied. As indicated in Fig. 12b, only about 6% of the pattern nodes had their sigmas tuned when ψ = 0.1. This saves 94% of the computation time for sigma tuning compared with tuning the sigma of every pattern node, as in the conventional GRNN, and, as shown in Fig. 12a, causes no deterioration in the performance of the GRNN.
{Fig.12a, 12b}
5. CONCLUSION

This paper has discussed an effective methodology for using the GRNN in the modelling of
dynamic plants, in which four adaptation strategies have been formulated and investigated experimentally. The proposed GRNN model was able to evolve from a null network to an appropriate size that can be practically used for modelling purposes without any loss in performance. Furthermore, the model is equipped with a dynamic and fast adaptation algorithm. The experimental results have shown that the GRNN is superior for modelling in a dynamical and noisy environment when compared with popular traditional identification techniques such as the ERLS. In conclusion, the four proposed strategies can be summarised as having the following advantages. Strategy I helps to create an appropriate network size while keeping the pattern nodes up to date. Strategy II overcomes the difficulty of initialising the sigmas of the pattern nodes. Strategies III and IV further improve the prediction accuracy by continuously tuning the targets and sigmas of the pattern nodes, respectively. Further research on applying the methodology to real physical systems is currently being carried out, with some measurable success.
6. REFERENCES
1. Kosko, B., Neural Networks and Fuzzy Systems, Prentice-Hall, USA, 1992.
2. Pao, Y.H., Phillips, S.M. and Sobajic, D.J., Neural-net Computing and the Intelligent Control of Systems, Int. J. Control, Vol. 56, No. 2, 1992, pp. 263-289.
3. Gorinevsky, D., On the Persistency of Excitation in Radial Basis Function Network Identification of Nonlinear Systems, IEEE Trans. on Neural Networks, Vol. 6, No. 5, Sept. 1995, pp. 1237-1244.
4. Irwin, G., Brown, M., Hogg, B. and Swidenbank, E., Neural Network Modelling of a 200MW Boiler System, IEE Proc.-Control Theory Appl., Vol. 142, No. 6, Nov. 1995, pp. 529-536.
5. Draeger, A., Engell, S. and Ranke, H., Model Predictive Control Using Neural Networks, IEEE Control Sys. Mag., Vol. 15, No. 5, Oct. 1995, pp. 61-66.
6. Omatu, S., Khalid, M. and Yusof, R., Neuro-Control and Its Applications, Springer-Verlag, London, 1995.
7. Smith, M., Neural Networks for Statistical Modeling, Van Nostrand Reinhold, USA, 1993.
8. Hunt, K.J. and Sbarbaro, D., Adaptive Filtering and Neural Networks for Realisation of Internal Model Control, Intelligent Systems Engineering, Summer 1993, pp. 67-75.
9. Specht, D.F., Fuzzy Logic and Neural Network Handbook: Chapter 3 - Probabilistic and General Regression Neural Networks, McGraw-Hill, 1996.
10. Hyun, B.G. and Nam, K., Faults Diagnoses of Rotating Machines by Using Neural Networks: GRNN and BPN, Proc. of the 1995 IEEE IECON, Vol. 2, pp. 1456-1461.
11. Specht, D.F. and Romsdahl, H., Experience with Adaptive Probabilistic and General Regression Neural Networks, Proc. of the IEEE World Congress on Computational Intelligence, Orlando, USA, 1994, Vol. 2, pp. 1203-1208.
12. Marquez, L. and Hill, T., Function Approximation Using Backpropagation and General Regression Neural Networks, Proc. 26th Hawaii Int. Conf. on System Sciences, Vol. 4, pp. 607-615.
13. Patton, J.B., Brushless DC Motor Control Using a General Regression Neural Network, Proc. of the 1995 IECON, Vol. 2, pp. 1422-1427.
14. Schaffner, C. and Schroder, D., An Application of General Regression Neural Network to Nonlinear Adaptive Control, Proc. of the Fifth European Conf. on Power Electronics and Applications, Brighton, UK, Sept. 1993, Vol. 4, pp. 219-224.
15. Chen, C.H., Neural Networks for Financial Market Prediction, Proc. Int. Conf. on Neural Networks, Vol. 2, June 1994, pp. 1199-1202.
16. Masters, T., Advanced Algorithms for Neural Networks: A C++ Sourcebook, John Wiley & Sons, Canada, 1995.
17. Wellstead, P.E. and Zarrop, M.B., Self-Tuning Systems: Control and Signal Processing, John Wiley & Sons, England, 1991.
18. Specht, D.F., A General Regression Neural Network, IEEE Trans. on Neural Networks, Vol. 2, No. 6, Nov. 1991, pp. 568-576.
19. Parzen, E., On Estimation of a Probability Density Function and Mode, Annals of Mathematical Statistics, Vol. 33, 1962, pp. 1065-1076.
20. Patton, J.B. and Ilic, J., Identification of Static Distribution Load Parameters Using General Regression Neural Networks, Proc. of the 36th Midwest Symp. on Circuits and Systems, Detroit, USA, Aug. 1993, Vol. 2, pp. 1023-1026.
21. Burrascano, P., Learning Vector Quantization for the Probabilistic Neural Network, IEEE Trans. on Neural Networks, Vol. 2, July 1991, pp. 458-461.
22. Traven, H.G.C., A Neural Network Approach to Statistical Pattern Classification by 'Semiparametric' Estimation of Probability Density Functions, IEEE Trans. on Neural Networks, Vol. 2, May 1991, pp. 366-377.
23. Kristinsson, K. and Dumont, G.A., System Identification and Control Using Genetic Algorithms, IEEE Trans. on Sys., Man, and Cybern., Vol. 22, No. 5, Sept. 1992, pp. 1033-1046.
24. Ljung, L. and Soderstrom, T., Theory and Practice of Recursive Identification, MIT Press, Cambridge, MA, 1983.
25. Astrom, K.J. and Wittenmark, B., Computer Controlled Systems, Prentice-Hall, Englewood Cliffs, NJ, 1984.
26. Declercq, F. and De Keyser, R., Comparative Study of Neural Predictors in Model Based Predictive Control, Proc. Int. Workshop on Neural Networks for Identification, Control, Robotics & Signal Processing, Italy, 21-23 Aug. 1996, pp. 20-28.
27. Narendra, K.S. and Parthasarathy, K., Identification and Control of Dynamical Systems Using Neural Networks, IEEE Trans. on Neural Networks, Vol. 1, No. 1, 1990, pp. 4-27.
28. Wang, H. and Sun, J., Modeling of Dynamic Systems Using Fuzzy Logic, Proc. of the 36th Midwest Symp. on Circuits and Systems, Detroit, USA, Aug. 1993, Vol. 1, pp. 506-509.
ACKNOWLEDGMENTS

This research has been partly funded by the Ministry of Science, Technology and Environment of Malaysia and the Malaysian Toray Science Foundation.
List of notations

d_i : distance function for the ith pattern node
x_i : input for the ith input variable of the GRNN
x_ij : centre of the jth input variable for the ith pattern node
ŷ : GRNN output
D : dimension of the GRNN inputs
GRNN : General Regression Neural Network
ERLS : Extended Recursive Least Squares identification algorithm
γ : "target update" threshold
φ : learning rate for tuning the sigmas
α : minimum firing threshold to create a new node
σ_ij : sigma (kernel width) of the jth input variable for the ith pattern node
λ : adjustment rate for adapting recorded targets
η : merit index of a pattern node
ψ : minimum significant firing level deciding whether a pattern node is tuned
List of table and figure captions

Table 1 : Experimental setup for identification of the linear plant.
Fig. 1 : Structure of the General Regression Neural Network.
Fig. 2 : The flow of the GRNN model adaptation strategies.
Fig. 3 : Training scheme of modelling with random plant inputs.
Fig. 4a : Modelling ARMSE in a noiseless environment.
Fig. 4b : Modelling ARMSE in a low Gaussian noise environment.
Fig. 4c : Modelling ARMSE in a medium Gaussian noise environment.
Fig. 4d : Modelling ARMSE in a heavy Gaussian noise environment.
Fig. 5 : Deterioration of the prediction performance due to random noise.
Fig. 6 : The effect of the adaptation strategies on the growth of pattern nodes.
Fig. 7 : The effect of the magnitude of noise on the growth of pattern nodes.
Fig. 8 : ARMSE for cases with Gaussian noise amplitude of 0.05.
Fig. 9 : Performance of GRNN models incorporating Strategies I, II, III & IV under various noise conditions.
Fig. 10 : The GRNN prediction of the nonlinear plant with noise amplitude of 0.05.
Fig. 11a : The changes in ARMSE when Strategy II is applied, compared with various common sigma values.
Fig. 11b : The number of pattern nodes created when Strategy II is applied, compared with various common sigma values.
Fig. 12a : Effect of the tuning threshold ψ on the prediction errors.
Fig. 12b : Percentage of pattern nodes fired above a certain threshold.
Table and Figures
Table 1: Experimental setup for identification of the linear plant.

                      No noise         Noise amp. 0.05   Noise amp. 0.2    Noise amp. 0.5
Case                  1   2   3   4    5   6   7   8     9   10  11  12    13  14  15  16
Strategy II           x   x   √   √    x   x   √   √     x   x   √   √     x   x   √   √
Strategies III & IV   x   √   x   √    x   √   x   √     x   √   x   √     x   √   x   √

√ : applied; x : not applied.
* Strategy I, as discussed in Section 3.1, is applied in all cases.
* When Strategy II is not applied, all of the sigmas are initialised to 0.1 unless otherwise mentioned.
[Fig. 1: Structure of the General Regression Neural Network. Schematic: the input layer (x_1 ... x_D) feeds a pattern layer of p nodes with centres x_ij and widths σ_ij; the summation layer forms the weighted and unweighted sums of Eq(2.3); the output layer normalises them to give ŷ.]
[Fig. 2: The flow of the GRNN model adaptation strategies. For each iteration a new training vector is prepared; if a new pattern node is required, its centre is assigned and its sigma initialised dynamically; otherwise each pattern node either has its target updated or its sigma tuned, until all pattern nodes have been checked.]
[Fig. 3: Training scheme of modelling with random plant inputs. The plant output y is corrupted by Gaussian noise ε to give yε; delayed values of u and yε form the GRNN inputs, and the prediction ŷ is compared with the plant response to give the error ê used for adaptation.]
[Fig. 4a: Modelling ARMSE in a noiseless environment (ARMSE vs. learning cycles; Cases 1-4 and the RLS model).]
[Fig. 4b: Modelling ARMSE in a low Gaussian noise environment (ARMSE vs. learning cycles; Cases 5-8 and the RLS model).]
[Fig. 4c: Modelling ARMSE in a medium Gaussian noise environment (ARMSE vs. learning cycles; Cases 9-12 and the RLS model).]
[Fig. 4d: Modelling ARMSE in a heavy Gaussian noise environment (ARMSE vs. learning cycles; Cases 13-16 and the RLS model).]
[Fig. 5: Deterioration of the prediction performance due to random noise (average RMSE per prediction vs. amplitude of applied Gaussian noise; GRNN Cases 4, 8, 12 and 16 vs. the RLS model).]
[Fig. 6: The effect of the adaptation strategies on the growth of pattern nodes (number of pattern nodes vs. learning iterations; Cases 9-12).]
[Fig. 7: The effect of the magnitude of noise on the growth of pattern nodes (number of pattern nodes vs. learning iterations; Cases 4, 8, 12 and 16).]
[Fig. 8: ARMSE for cases with Gaussian noise amplitude of 0.05 (Strategy I alone; Strategies I, III & IV; Strategies I & II; Strategies I, II, III & IV).]
[Fig. 9: Performance of GRNN models incorporating Strategies I, II, III & IV under various noise conditions (noise amplitudes 0.0, 0.05, 0.1 and 0.5).]
[Fig. 10: The GRNN prediction of the nonlinear plant with noise amplitude of 0.05 (normalised responses yε, y and the GRNN prediction over iterations 800-1000).]
[Fig. 11a: The changes in ARMSE when Strategy II is applied, compared with common sigma values of 0.05, 0.1, 0.2 and 0.4.]
[Fig. 11b: The number of pattern nodes created when Strategy II is applied, compared with common sigma values of 0.05, 0.1, 0.2 and 0.4.]
[Fig. 12a: Effect of the tuning threshold ψ on the prediction errors (ψ = 0.4, 0.1, 0.01 and 0.001).]
[Fig. 12b: Percentage of pattern nodes fired above a certain threshold (ψ = 0.001, 0.01, 0.1 and 0.4).]