Annals of Biomedical Engineering, Vol. 27, pp. 538–547, 1999 Printed in the USA. All rights reserved.
0090-6964/99/27(4)/538/10/$15.00 Copyright © 1999 Biomedical Engineering Society
Robust Nonlinear Autoregressive Moving Average Model Parameter Estimation Using Stochastic Recurrent Artificial Neural Networks
K. H. CHON,1 D. HOYER,2 A. A. ARMOUNDAS,3 N-H. HOLSTEIN-RATHLOU,4 and D. J. MARSH5
1Department of Electrical Engineering, City College of New York, NY, 2Institute for Pathophysiology, Friedrich Schiller University, Jena, Germany, 3Harvard–MIT Division of Health Sciences and Technology, Cambridge, MA, 4Department of Medical Physiology, The Panum Institute, University of Copenhagen, Denmark, and 5Department of Molecular Pharmacology, Physiology and Biotechnology, Brown University, Providence, RI (Received 8 April 1998; accepted 15 April 1999)
Abstract. In this study, we introduce a new approach for estimating linear and nonlinear stochastic autoregressive moving average (ARMA) model parameters, given a corrupt signal, using artificial recurrent neural networks. The approach has two steps: the parameters of the deterministic part of the stochastic ARMA model are first estimated via a three-layer artificial neural network (deterministic estimation step) and then reestimated using the prediction error as one of the inputs to the artificial neural networks in an iterative algorithm (stochastic estimation step). The prediction error is obtained by subtracting the output of the ARMA model estimated in the deterministic estimation step from the corrupt system output response. We present computer simulation examples to show the efficacy of the proposed stochastic recurrent neural network approach in obtaining accurate model predictions. Furthermore, we compare the performance of the new approach to that of the deterministic recurrent neural network approach. Using this simple two-step procedure, we obtain more robust model predictions than with the deterministic recurrent neural network approach despite the presence of significant amounts of either dynamic or measurement noise in the output signal. The comparison between the deterministic and stochastic recurrent neural network approaches is furthered by applying both approaches to experimentally obtained renal blood pressure and flow signals. © 1999 Biomedical Engineering Society. [S0090-6964(99)00604-9]
Keywords: ARMA, NARMA, Polynomial function, Backpropagation, Measurement noise, Dynamic noise, Deterministic, Stochastic, Recurrent neural network.

Address correspondence to Ki H. Chon, PhD, City College of New York, Dept. of Electrical Engineering, Steinman Hall, Rm 677, Convent at 138th Street, New York NY 10031. Electronic mail: [email protected]

INTRODUCTION

Most physiological signals obtained from experimental settings are corrupted with noise, the sources of which can be classified into two main categories. The first is dynamic, arising from the surroundings, that is, from organs or stimuli other than the system being studied. The other type is the measurement noise, which may occur due to inaccurate measuring devices, computer quantization errors, and perhaps nonstationarity in the measured signal (if stationarity in the signal is assumed in analyzing the signal). One of the key challenges in physiological systems modeling is to estimate the physiological signals from input–output data corrupted by such noise sources. In a stochastic model the source of a physiological signal is considered the deterministic component of the model and the noise sources are considered the stochastic component of the model.

Many novel algorithms exist for obtaining parameter estimates with reasonable accuracy, using either nonparametric (e.g., Volterra–Wiener)7,11–14,17 or parametric [linear and nonlinear autoregressive moving average (ARMA)]3,5,6,8,10,18 methods. An algorithm can be used to estimate parameters of either a deterministic or a stochastic model. Of these two classes of models, deterministic models are more predominant in the literature, but stochastic models have also been developed in the attempt to best determine the deterministic component in a noise-corrupted signal. The general consensus is that more accurate parameter estimates result from applying the algorithms to stochastic models (the stochastic approach) than to their deterministic counterparts (the deterministic approach), because the noise source is specifically modeled in stochastic models. Noise is considered to be dynamic if the noise source is recursively added (feedback) to the system output at each time step, whereas additive noise refers to the corruption of observations by errors that are independent of the dynamics. For example, measurement noise is considered additive noise, but inputs from other systems are considered dynamic noise.

The common practice when estimating parameters of stochastic models is to first estimate the deterministic portion of the model and then calculate the
Stochastic Feedforward Artificial Neural Networks
prediction error signal by subtracting the output signal of the estimated deterministic model from the corrupt output signal. The deterministic components are then reestimated using an iterative procedure in which repeatedly updated estimates of the prediction errors are used in obtaining unbiased estimates of model parameters.6,10,18 An iterative procedure may provide further reduction in the magnitude of prediction error signals. Our approach to estimating stochastic ARMA model parameters is also based on calculating prediction errors, then reestimating linear and nonlinear ARMA model parameters (with prediction error terms as an additional input to train the neural network). We compare the efficacy of the proposed approach with the deterministic approach for estimating parameters using simulated data from both noiseless and noise-corrupted models. In the literature, the accuracy of parameter estimates in the presence of dynamic noise has been studied occasionally, whereas additive noise has been widely studied.4,6,10,11,18

The aim of the present study is to extend the use of polynomial activation function neural networks to estimate stochastic linear and nonlinear ARMA model parameters. Note that the polynomial activation neural network is a parametric algorithm. The motivation of the present paper is to evaluate our proposed approach when the output signal is corrupted not only by additive noise, but also by dynamic noise. Furthermore, comparison between the deterministic and stochastic approaches is made using experimentally measured renal blood pressure and flow data.

METHODS

We have previously demonstrated how deterministic nonlinear autoregressive moving average (NARMA) parameters may be obtained from three-layer neural networks (Fig. 1) utilizing a polynomial representation of the activation function in the hidden units.3 In brief, consider a nonlinear, time-invariant, discrete-time dynamic system represented by the following NARMA model:

$$
\begin{aligned}
y(n) ={}& \sum_{i=1}^{P} a(i)\,y(n-i) + \sum_{j=0}^{Q} b(j)\,u(n-j) + \sum_{i=1}^{P}\sum_{j=1}^{P} a(i,j)\,y(n-i)\,y(n-j) \\
&+ \sum_{i=0}^{Q}\sum_{j=0}^{Q} b(i,j)\,u(n-i)\,u(n-j) + \sum_{i=1}^{P}\sum_{j=0}^{Q} c(i,j)\,y(n-i)\,u(n-j) + \cdots + d(n),
\end{aligned}
\tag{1}
$$

where $P$ and $Q$ represent the model order of the autoregressive (linear and nonlinear) and moving-average (linear and nonlinear) terms, respectively; $y(n)$ is the system output signal; $u(n)$ is the input signal; and $i$, $j$, $m$, and $n$ are indices. The term $d(n)$ is some unmodeled noise source and no assumptions have been made regarding the statistical nature of the noise source. If the input and output terms on the right side of Eq. (1) are replaced by a polynomial or power series of these terms, then Eq. (1) may be expressed as follows:

$$
y(n) = \sum_{i=1}^{M} c_i\, p_i(x_i) + d(n),
\tag{2}
$$

where $\{p_i(x_i)\}_{i=1}^{M}$ is a set of basis functions that include past values of $y(n)$ and present and past values of $u(n)$. Referring to Eq. (2), we may identify $c_i$ as the weight of the coupling of hidden unit $i$ to the output unit, $M$ as the number of hidden units, and $x_i$ as the weighted sum of inputs to the hidden unit $i$, written as

$$
x_i = \sum_{j=1}^{P} w_{ji}\,y(n-j) + \sum_{j=0}^{Q} v_{ji}\,u(n-j).
\tag{3}
$$

FIGURE 1. Three-layer artificial neural network topology. Note that weights of S input neurons are W leads and weights of U input neurons are V leads.

If the basis function in Eq. (2) is written as a polynomial function,

$$
p_i(x) = \sum_{m=0}^{M} a_{mi}\,x^{m},
\tag{4}
$$

then combining Eqs. (2) and (4) yields
CHON et al.
$$
y(n) = \sum_{i=1}^{M} c_i \left( \sum_{m=0}^{M} a_{mi}\,x_i^{m} \right) + d(n).
\tag{5}
$$
Substituting the $x_i$ of Eq. (3) into Eq. (5) and gathering like terms, the following expression is derived:

$$
\begin{aligned}
y(n) ={}& \sum_{i=1}^{M} c_i a_{0i} + \sum_{j=1}^{P}\left(\sum_{i=1}^{M} c_i a_{1i} w_{ji}\right) y(n-j) + \sum_{j=0}^{Q}\left(\sum_{i=1}^{M} c_i a_{1i} v_{ji}\right) u(n-j) \\
&+ \sum_{j=1}^{P}\sum_{k=1}^{P}\left(\sum_{i=1}^{M} c_i a_{2i} w_{ji} w_{ki}\right) y(n-j)\,y(n-k) \\
&+ \sum_{j=0}^{Q}\sum_{k=0}^{Q}\left(\sum_{i=1}^{M} c_i a_{2i} v_{ji} v_{ki}\right) u(n-j)\,u(n-k) \\
&+ 2\sum_{j=1}^{P}\sum_{k=0}^{Q}\left(\sum_{i=1}^{M} c_i a_{2i} w_{ji} v_{ki}\right) y(n-j)\,u(n-k) + \cdots + d(n).
\end{aligned}
\tag{6}
$$
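The expansion in Eq. (6) can be checked numerically. The following is a minimal sketch, assuming a polynomial activation truncated at degree 2 and randomly drawn weights; the function names (`network_output`, `narma_coefficients`, `expanded_output`) and the array layout are illustrative choices, not part of the original method description. It computes the constant, linear, and second-order coefficients from the weights and polynomial coefficients, and compares the expanded form with the direct network output of Eq. (5), omitting $d(n)$.

```python
import numpy as np

def network_output(c, a, w, v, y_past, u_past):
    """Eq. (5) without d(n), for a degree-2 polynomial activation.

    c: (M,) output weights; a: (3, M) polynomial coefficients a_0i, a_1i, a_2i;
    w: (P, M) weights on y(n-1..n-P); v: (Q+1, M) weights on u(n..n-Q).
    """
    x = y_past @ w + u_past @ v                      # Eq. (3), one x_i per hidden unit
    return float(np.sum(c * (a[0] + a[1] * x + a[2] * x**2)))

def narma_coefficients(c, a, w, v):
    """Constant, linear, and second-order coefficients read off Eq. (6)."""
    const = np.sum(c * a[0])
    a_lin = w @ (c * a[1])                           # multiplies y(n-j)
    b_lin = v @ (c * a[1])                           # multiplies u(n-j)
    ca2 = c * a[2]
    a_quad = (w * ca2) @ w.T                         # multiplies y(n-j) y(n-k)
    b_quad = (v * ca2) @ v.T                         # multiplies u(n-j) u(n-k)
    c_cross = 2.0 * ((w * ca2) @ v.T)                # multiplies y(n-j) u(n-k)
    return const, a_lin, b_lin, a_quad, b_quad, c_cross

def expanded_output(coeffs, y_past, u_past):
    """Right-hand side of Eq. (6), truncated at second order, without d(n)."""
    const, a_lin, b_lin, a_quad, b_quad, c_cross = coeffs
    return float(const + a_lin @ y_past + b_lin @ u_past
                 + y_past @ a_quad @ y_past
                 + u_past @ b_quad @ u_past
                 + y_past @ c_cross @ u_past)

# Hypothetical network: M = 3 hidden units, P = 3, Q = 2, random weights.
rng = np.random.default_rng(0)
M, P, Q = 3, 3, 2
c = rng.standard_normal(M)
a = rng.standard_normal((3, M))
w = rng.standard_normal((P, M))
v = rng.standard_normal((Q + 1, M))
y_past = rng.standard_normal(P)        # y(n-1), ..., y(n-P)
u_past = rng.standard_normal(Q + 1)    # u(n), ..., u(n-Q)

coeffs = narma_coefficients(c, a, w, v)
direct = network_output(c, a, w, v, y_past, u_past)
expanded = expanded_output(coeffs, y_past, u_past)
print(np.isclose(direct, expanded))
```

Because the activation is truncated at degree 2 in this sketch, the two forms agree to floating-point precision; with a higher-degree polynomial, the expansion would need the corresponding higher-order terms.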
Note that Eqs. (1) and (6) are equivalent, where the linear and nonlinear coefficients in Eq. (1) are now represented by the neural network weight values and polynomial coefficients in Eq. (6). The general form of $n$th-order NARMA coefficients can be given by

$$
a(i) = \sum_{s=1}^{M} c_s a_{1s} w_{is},
\tag{7}
$$

$$
b(i) = \sum_{s=1}^{M} c_s a_{1s} v_{is},
\tag{8}
$$

$$
a(i,j,k,\ldots,n) = \sum_{s=1}^{M} c_s a_{ns} w_{is} w_{js} \cdots w_{ns},
\tag{9}
$$

$$
b(i,j,k,\ldots,n) = \sum_{s=1}^{M} c_s a_{ns} v_{is} v_{js} \cdots v_{ns},
\tag{10}
$$

$$
c(i,j,k,\ldots,n) = \frac{1}{2} \sum_{s=1}^{M} c_s a_{ns} w_{is} v_{js} \cdots w_{ns} v_{ns}.
\tag{11}
$$

Given a three-layer neural network having the topology of Fig. 1, NARMA coefficients can be obtained from the weights $\{w_{io}\}$ and $\{v_{io}\}$ and the polynomial coefficients $\{a_{ni}\}$. The unknown weights and the polynomial coefficients are all obtained from a backpropagation training algorithm. Although more efficient algorithms exist for training neural networks (e.g., radial basis functions),1,2,9,15,16 we have utilized the backpropagation algorithm since it is widely recognized and easily implemented. Note that the accuracy of the linear and nonlinear parameters in the above equations also depends on the proper selection of the model order and the number of hidden units in the neural network. Model order selection ($P$ and $Q$ in the above equations) can be determined using well-established techniques (e.g., the Akaike information criterion, minimum description length, principal component analysis, etc.). Proper selection of the number of hidden units remains an active research area. In practice, the number of hidden units is determined heuristically, for example, by using the minimum number of hidden units that does not compromise the description of the system dynamics. It has been shown that a large number of hidden units often leads to "overfitting" (i.e., memorizing statistical flukes in the data).1,4,14,15 Sharing weights among hidden units (i.e., constraining them to the same set of weights) has been proposed as another approach to reducing the dimensionality of neural networks.1,15 In the "Simulation Results" section, we describe an approach for determining the proper number of hidden units and the model order.

The procedure delineated so far comprises what we call deterministic polynomial function neural networks, which we explored in Ref. 3. Our new procedure, the stochastic polynomial function neural network, builds on this algorithm as follows. Once proper linear and nonlinear deterministic NARMA coefficients and the estimated output are obtained, the prediction error is computed as

$$
e(n) = y(n) - \sum_{i=1}^{M} c_i\, p_i(x_i).
\tag{12}
$$
The next step is to feed the prediction error sequence, $e(n)$, as one of the tapped-delay input sequences, into the network [i.e., delayed $e(n)$ sequences are now one of the input signals along with delayed sequences of $y(n)$ and $u(n)$ in Fig. 1]. This procedure allows the prediction error to approach an unpredictable sequence (i.e., white noise) so that the deterministic parameter estimate is more likely to be unbiased, such that

$$
E[y - \hat{y}] = 0,
\tag{13}
$$

where $E[\cdot]$ denotes the expected value. It has been shown that an iterative procedure of modeling the prediction error sequences, similar to the approach we propose, where prediction error changes after each iteration, is an effective method to achieve unbiased parameter estimates.6,8,10 Incorporating this strategy, Eq. (3) now becomes

$$
x_i = \sum_{j=1}^{P} w_{ji}\,y(n-j) + \sum_{j=0}^{Q} v_{ji}\,u(n-j) + \sum_{j=1}^{R} s_{ji}\,e(n-j),
\tag{14}
$$

where $R$ represents the model order of the prediction error terms. Note that $e(n)$ itself is not included among the prediction error terms of Eq. (14): the sum starts at $j = 1$, so $R$ must be at least equal to 1. The ARMA/NARMA coefficients are now reestimated from the newly obtained weights $\{w_{io}\}$ and $\{v_{io}\}$ and the polynomial coefficients $\{a_{ni}\}$ via backpropagation training using Eqs. (7)–(11). The parameters associated with error sequences can also be obtained from the weights $\{s_{ji}\}$ and the polynomial coefficients $\{a_{ni}\}$. Note that parameters associated with error sequences are not used in the test phase to estimate the deterministic portion of the model. For convenience, we will refer to this procedure as the stochastic polynomial feedforward neural network (SPFNN) and to the procedure which does not model the prediction error term as the deterministic polynomial feedforward neural network (DPFNN).

TABLE 1. Comparison of the DPFNN and the least-squares method for ARMA model parameter estimates of the process described by Eq. (15).

Model term    True value    DPFNN    Least squares
y(n-1)         0.60         -0.04     4.25
y(n-2)        -0.20          0.01     2.14
y(n-3)         0.30          0.28    -1.89
y(n-4)        -0.40         -0.24    -0.57
y(n-5)         0.15         -0.05     0.25
y(n-6)         0.00          0.02     1.26
y(n-7)         0.00          0.03    -0.63
x(n)           0.80          0.80     0.80
x(n-1)        -0.50          0.02    -3.19
x(n-2)         0.00         -0.18    -1.90
x(n-3)         0.00         -0.09     2.47
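The tapped-delay inputs of Eq. (14) can be assembled as a matrix with one row per time step. The sketch below is only illustrative: the function name `tapped_delay_inputs` and the choice to discard the first max(P, Q, R) samples (rather than padding with zeros) are assumptions, and the training of the network itself is not shown. In the two-step procedure, `e` would be the prediction error of Eq. (12) produced by the deterministic step.

```python
import numpy as np

def tapped_delay_inputs(y, u, e, P, Q, R):
    """Build the network inputs of Eq. (14) and the matching targets y(n).

    Columns: y(n-1..n-P), u(n..n-Q), e(n-1..n-R).
    Rows start at n = max(P, Q, R) so that every delayed sample exists.
    """
    y, u, e = (np.asarray(s, dtype=float) for s in (y, u, e))
    start = max(P, Q, R)
    n = np.arange(start, len(y))
    cols = [y[n - j] for j in range(1, P + 1)]     # past outputs only
    cols += [u[n - j] for j in range(0, Q + 1)]    # present and past inputs
    cols += [e[n - j] for j in range(1, R + 1)]    # past prediction errors only
    return np.column_stack(cols), y[n]

# Small synthetic sequences chosen so the delays are easy to read off.
y_demo = np.arange(10.0)
u_demo = 10.0 * np.arange(10.0)
e_demo = 100.0 * np.arange(10.0)
X_demo, target_demo = tapped_delay_inputs(y_demo, u_demo, e_demo, P=2, Q=1, R=1)
print(X_demo.shape, X_demo[0], target_demo[0])
```

Setting all entries of `e` to zero reduces the inputs to those of Eq. (3), which is consistent with the DPFNN being the special case without prediction error terms.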
SIMULATION RESULTS

Effects of Incorrect Model Order Selection

Successful use of ARMA modeling depends on the judicious selection of a correct model order. A priori, the optimal model order is unknown. To examine the effect of incorrect model order selection using either the DPFNN or SPFNN, the following input–output sequence was generated:

$$
y(n) = 0.6y(n-1) - 0.2y(n-2) + 0.3y(n-3) - 0.4y(n-4) + 0.15y(n-5) + 0.8x(n) - 0.5x(n-1).
\tag{15}
$$

Note that the model order selection criterion is the same for the DPFNN and SPFNN; thus, only the parameter estimates obtained by the DPFNN and the traditional least-squares method are compared henceforth. As was the case in Ref. 3, when the correct model order was chosen to be $P = 5$, $Q = 1$, both the DPFNN and the least-squares methods provide correct parameter estimates of Eq. (15). Table 1 shows results when the incorrect model order, $P = 7$ and $Q = 6$, is chosen. We observe in Table 1 that the least-squares method provides AR coefficients that are not stable (e.g., the first three delays of the output coefficients have magnitudes greater than 1) and the estimated coefficient values deviate significantly from the true coefficients. The coefficients obtained via the DPFNN, however, are not only stable but also closer to the true coefficients than those obtained with the least-squares method. Due to the unstable coefficients obtained with the least-squares method, the mean-square error resulted in an overflow value. For the DPFNN, the mean-square error (MSE) is zero. To further examine incorrect model order selection in the case when the input delay is greater than the output delay (e.g., $Q > P$), the following output sequence was generated:

$$
y(n) = 0.8y(n-1) - 0.5y(n-2) + 0.6x(n) - 0.25x(n-1) + 0.3x(n-2) - 0.4x(n-3) + 0.1x(n-4).
\tag{16}
$$

Table 2 shows the results of the DPFNN and the least-squares methods. The correct model order for the above sequence is $P = 2$ and $Q = 4$, but we have purposely selected an incorrect model order of $P = 4$ and $Q = 4$. Despite the incorrect model order, the DPFNN method resulted in coefficients that are exactly the same as the true coefficients. The coefficients obtained with the least-squares method, however, deviate from the true coefficients. We have performed many simulations other than the ones presented here and have found that the DPFNN as well as the SPFNN outperform the least-squares method. The neural network approach, therefore, is robust in providing accurate predicted output response even when the model order selection is incorrect.
TABLE 2. Comparison of the DPFNN and the least-squares method for ARMA model parameter estimates of the process described by Eq. (16).

Model term    True value    DPFNN     Least squares
y(n-1)         0.800         0.800     0.813
y(n-2)        -0.500        -0.500    -0.507
y(n-3)         0.000         0.000     0.005
x(n)           0.600         0.600     0.600
x(n-1)        -0.250        -0.250    -0.258
x(n-2)         0.300         0.300     0.301
x(n-3)        -0.400        -0.400    -0.404
x(n-4)         0.100         0.100     0.104
In the following, we will add dynamic and measurement noise to several linear and nonlinear ARMA models, generated by computer simulation, to analyze the performance of the SPFNN and DPFNN methods. Since the focus of the present paper is to compare the performance (i.e., robustness) of the DPFNN to that of the SPFNN, both with noisy output signals, simulation examples will concentrate on such noisy output cases. However, it is important to note that for all simulation examples to follow, the DPFNN and SPFNN methods provided equally accurate parameter estimates when the output signal was not corrupted by noise. In the case of a noiseless output signal, the SPFNN is expected to provide the same parameter estimates as the DPFNN since the prediction error signal is zero. Thus, comparison between the two approaches is not informative in the case of a noiseless output signal; consequently these results are not provided.

For all simulation examples to follow, the input–output signals generated contain 2000 data points. For the SPFNN approach, input–output signals containing 2000 data points (for the first step of the two-step process) were initially used as both training and test data to obtain prediction error values of equal length (2000 data points). Once prediction error values were obtained (via the DPFNN), all input–output data including prediction error values were segmented into halves. The first 1000 data points were used for training and the last 1000 data points were used for testing. Similarly, for the DPFNN, the first 1000 data points were used for training and the last 1000 data points were used for testing. Testing is not a part of the estimation process but is done to measure the effectiveness of the model; thus, testing is not performed during the training process.
By successively decreasing the number of hidden units from the initially chosen large number for the DPFNN, we selected the optimum number of hidden units to be the minimal one which could accurately capture the dynamics of noise-free output signals (e.g., when the MSE was reduced to a small value). The same number of hidden units found to be the minimum for the DPFNN was also used for the SPFNN. For all simulations considered, the initial adaptive learning rate, the momentum, and the number of iterations were set to 0.0001, 0.1, and 20,000, respectively.
Linear ARMA Model with Additive and Dynamic Noise

For the first example, we consider an arbitrarily chosen linear ARMA model described by the following difference equation:

$$
y(n) = 0.7u(n) - 0.4u(n-1) - 0.1u(n-2) + 0.25y(n-1) - 0.1y(n-2) + 0.4y(n-3),
\tag{17}
$$

where $y(n)$ is the output and $u(n)$ is the input. We consider two separate cases, one in which the output of Eq. (17) is corrupted by additive Gaussian white noise (GWN) of the form

$$
z(n) = y(n) + e(n),
\tag{18}
$$

and the other in which the output of Eq. (17) is disturbed by dynamic noise, which causes Eq. (17) to take the form

$$
\begin{aligned}
y(n) ={}& 0.7u(n) - 0.4u(n-1) - 0.1u(n-2) + 0.25y(n-1) - 0.1y(n-2) + 0.4y(n-3) \\
&+ 0.25e(n-1) - 0.11e(n-2) + 0.19e(n-3).
\end{aligned}
\tag{19}
$$

Note that additive noise [Eq. (18)] is statically added to the clean output signal. For example, $e(n)$ is added after the output sequence in Eq. (17) has been generated. For the dynamic noise, as we see in Eq. (19), the GWN source, $e(n)$, is fed back to the output so that the current and future output values are dependent on the past states of the input and noise signals. Thus, the outputs described by Eqs. (18) and (19) have different values.

To determine the optimum data length as well as the number of hidden units, simulations of Eq. (19) involving the DPFNN were performed with the number of hidden units varying from two to four and the number of data points being 1000, 2000, or 4000. Table 3 shows simulation results. Note that for each of the number of data points chosen, the first half of the data was used for training and the last half of the data was used for testing.
TABLE 3. Comparison of NMSE values as a function of data length and number of hidden units.

No. of data points    2 hidden units    3 hidden units    4 hidden units
1000                  9.16%             8.94%             9.27%
2000                  8.12%             7.87%             8.08%
4000                  7.75%             7.48%             7.90%
For all three cases with different data points (1000, 2000, and 4000), we see that the MSE is the lowest with three hidden units. With a choice of either two or four hidden units, the MSE is increased; thus, the optimum number of hidden units for estimating the parameters of the above input–output sequence is three. In addition, the minimum number of data points needed to obtain reasonably accurate estimates of the parameters of the above input–output data is shown to be 2000 data points (1000 for training and 1000 for testing). We observe that the MSE value does not decrease significantly as we increase the number of data points from 2000 to 4000 but does decrease as the number of data points is increased from 1000 to 2000.

With 2000 data points and three hidden units chosen, we performed simulations on the above examples with five independent noise sources for both dynamic and additive noise. Thus, we obtained five results each for both dynamic and additive noise to determine the statistical significance of the obtained results. The mean signal-to-noise ratio (SNR) due to five independent additive and five independent dynamic noise sources is -1.6 ± 0.1 and 7.6 ± 0.1 dB, respectively. The values of the SNR for both additive and dynamic noise are chosen to demonstrate that even with a significant amount of noise, the SPFNN and, to a lesser extent, the DPFNN provide accurate NARMA model parameter estimates. We define SNR to be $10 \log_{10}$(variance of signal/variance of noise).

Note that the ARMA model order can be selected by sequentially increasing the model order, then choosing the one which provides the minimum MSE. Another approach is to use the Akaike information criterion (AIC).8 However, it is our experience that the AIC does not always work well in practice. Figure 2 shows a bar graph of normalized mean-square error (NMSE) values as a function of the ARMA model order for the noisy output data generated by Eq. (18).
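The difference between the additive noise of Eq. (18) and the dynamic noise of Eq. (19), together with the SNR definition above, can be illustrated with a short simulation. This is a sketch under stated assumptions: the input is taken to be unit-variance Gaussian white noise, the noise variance is arbitrary (the scaling used to reach the reported SNR values is not given here), initial conditions are zero, and for the dynamic case the "noise" is taken to be the difference between the disturbed and the clean output. The function names are hypothetical.

```python
import numpy as np

def simulate_arma(u, e=None):
    """Eq. (17) when e is None; Eq. (19) when a noise sequence e is supplied.

    Zero initial conditions are an assumption of this sketch.
    """
    u = np.asarray(u, dtype=float)
    e = np.zeros_like(u) if e is None else np.asarray(e, dtype=float)
    pad = 3                                          # longest delay in the model
    up = np.concatenate([np.zeros(pad), u])
    ep = np.concatenate([np.zeros(pad), e])
    yp = np.zeros(len(u) + pad)
    for n in range(pad, len(yp)):
        yp[n] = (0.7 * up[n] - 0.4 * up[n - 1] - 0.1 * up[n - 2]
                 + 0.25 * yp[n - 1] - 0.1 * yp[n - 2] + 0.4 * yp[n - 3]
                 + 0.25 * ep[n - 1] - 0.11 * ep[n - 2] + 0.19 * ep[n - 3])
    return yp[pad:]

def snr_db(signal, noise):
    """SNR = 10 log10(variance of signal / variance of noise)."""
    return 10.0 * np.log10(np.var(signal) / np.var(noise))

rng = np.random.default_rng(0)
N = 2000
u = rng.standard_normal(N)                 # assumed GWN input
e = rng.standard_normal(N)                 # GWN noise source, arbitrary variance

y_clean = simulate_arma(u)                 # Eq. (17)
z_additive = y_clean + e                   # Eq. (18): noise added after generation
y_dynamic = simulate_arma(u, e)            # Eq. (19): noise fed back through the AR terms

print("additive SNR (dB):", snr_db(y_clean, e))
print("dynamic SNR (dB):", snr_db(y_clean, y_dynamic - y_clean))
```

With the same noise source, the two corrupted outputs differ because the dynamic noise is filtered by the autoregressive part of the model before it reaches the output.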
The NMSE is defined as

$$
\mathrm{NMSE} = \frac{\dfrac{1}{N}\displaystyle\sum_{i=1}^{N} \bigl(e(i) - \bar{e}\bigr)^{2}}{\dfrac{1}{N}\displaystyle\sum_{i=1}^{N} \bigl(y(i) - \bar{y}\bigr)^{2}},
$$

where $e(i)$ is the prediction error sequence, $y(i)$ is the output, and $\bar{e}$ and $\bar{y}$ are the means of the prediction error and output, respectively. The NMSE is high with an initial low model order and then decreases as the model order is increased. Since the subsequent decreases in NMSE values are small for model orders from (3,2) to (4,3) and (4,3) to (5,4), any of these three model orders can be selected as the optimal model order based on an MSE criterion. Under the parsimonious parameter estimation rule, the correct model order of $P = 3$ and $Q = 2$ should be selected as the smallest one that does not compromise the system dynamics. For the purpose of comparing the performance between the DPFNN and SPFNN, which is the primary focus of this paper, simulations herein are performed with the known correct model orders. Thus, for both types of noise [Eqs. (18) and (19)], the correct AR order $P = 3$ and MA order $Q = 2$ were selected. It should be noted that, as evidenced by previous simulation results (Tables 1 and 2), with either the DPFNN or SPFNN it is not as crucial as with the least-squares method to select the correct model orders to obtain accurate predicted output responses.

Once the prediction error signal was obtained via the DPFNN, the SPFNN method was then performed with prediction error lag $R = 3$. Despite severe amounts of either additive or dynamic noise, both the DPFNN and SPFNN provide fairly accurate parameter estimates. However, we observe that for both noisy cases, ARMA parameter estimates obtained via the SPFNN approach are closer to the true parameters than are those of the DPFNN method (not shown). The average normalized mean-square errors (NMSE) obtained with the DPFNN and SPFNN for the cases of additive and dynamic noise are shown in Table 4. For both additive and dynamic noise, the SPFNN achieved superior performance (NMSE, p < 0.5) to that of the DPFNN. A segment of the clean output signal (solid line) along with the predicted output signal (dotted line) for the DPFNN (top panel) and SPFNN (bottom panel), with additive noise, is shown in Fig. 3. Figure 3 confirms the superior prediction capability of the SPFNN method since the estimated output obtained by the SPFNN tracks the noiseless output better than does the estimated output obtained by the DPFNN. In this simulated example, the SPFNN performs better when it is confronted with dynamic noise than with additive noise.

FIGURE 2. Bar graph of NMSE values as a function of model order.

TABLE 4. Comparison of the mean NMSE (%) ± standard deviation for the DPFNN and the SPFNN.

Eq. No.                  SNR (dB)       DPFNN         SPFNN
(18)                     -1.6 ± 0.1     7.4 ± 0.4     6.1 ± 0.4
(19)                      7.6 ± 0.1     4.43 ± 0.3    2.6 ± 0.2
(21)                      3.9 ± 0.07    9.8 ± 0.8     4.8 ± 0.6
(22)                     -0.4 ± 0.08    6.7 ± 0.5     4.9 ± 0.6
(23)                      7.5 ± 0.1     3.6 ± 0.07    3.3 ± 0.07
(21) + additive noise    -1.7 ± 0.1     10.3 ± 0.7    8.2 ± 0.5
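The NMSE used throughout the tables can be computed in a few lines. The sketch below assumes the squared-deviation form of the definition (a ratio of variances) and that the prediction error is the measured output minus the model prediction; the helper name `nmse` and the toy sequences are illustrative only.

```python
import numpy as np

def nmse(e, y):
    """Normalized mean-square error: variance of the prediction error
    divided by the variance of the output (multiply by 100 for percent)."""
    e = np.asarray(e, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.mean((e - e.mean()) ** 2) / np.mean((y - y.mean()) ** 2)

# Toy example: an output with unit variance and a small prediction error.
y_toy = np.array([0.0, 2.0, 0.0, 2.0])
y_hat_toy = np.array([-0.1, 2.1, -0.1, 2.1])   # hypothetical model prediction
e_toy = y_toy - y_hat_toy                       # prediction error, as in Eq. (12)
print(100.0 * nmse(e_toy, y_toy), "%")
```

Because both numerator and denominator are mean-removed, a constant offset in the prediction error does not contribute to this measure.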
FIGURE 3. Segment of the exact system output (solid lines in top and bottom panels) and the model predictions with additive noise, from the DPFNN (dashed line, top panel) and the SPFNN (dashed line, bottom panel) methods (SNR = -1.6 dB).

Nonlinear ARMA Model with Additive and Dynamic Noise

The next simulation example consists of an arbitrarily chosen nonlinear ARMA difference equation of the form

$$
\begin{aligned}
y(n) ={}& 0.8u(n) - 0.13u(n-2) + 0.2y(n-1) - 0.11y(n-3) \\
&- 0.11u^{2}(n-1) + 0.13y^{2}(n-2) - 0.18u(n-1)\,y(n-1).
\end{aligned}
\tag{20}
$$

Again, we consider separate cases with additive and dynamic noise. Dynamic noise terms were recursively added to the above equation:

$$
\begin{aligned}
y(n) ={}& 0.8u(n) - 0.13u(n-2) + 0.2y(n-1) - 0.11y(n-3) \\
&- 0.11u^{2}(n-1) + 0.13y^{2}(n-2) - 0.18u(n-1)\,y(n-1) \\
&- 0.3e(n-1) + 0.1e(n-2) - 0.5e(n-3),
\end{aligned}
\tag{21}
$$

whereas the additive noise was applied after the output sequence in Eq. (20) had been generated:

$$
z(n) = y(n) + e(n),
\tag{22}
$$

where the noise source, $e(n)$, in both Eqs. (21) and (22) is GWN that is independent of the driving input signal, $u(n)$. The mean SNRs based on five independent noise sources each for dynamic noise [Eq. (21)] and additive noise [Eq. (22)] are 3.9 ± 0.07 and -0.4 ± 0.08 dB, respectively. With linear and nonlinear AR and MA model orders set to $P = 3$ and $Q = 2$, it was found that a minimum of six hidden units was necessary to estimate NARMA coefficients accurately for the clean signal of Eq. (20) (the MSE with six hidden units is only slightly greater than the minimum MSE obtained with seven hidden units). We continued to use six hidden units for the noisy signals of Eqs. (21) and (22). For the SPFNN method, we have used $R = 6$ (prediction error terms) as additional input lags to train the network for both noisy cases. We determined that six prediction error terms (among several prediction error orders examined, e.g., $R$ = 1–12) provided the smallest NMSE while keeping $R$ to a minimal number. The NMSE values for the DPFNN and SPFNN with additive and dynamic noise are shown in Table 4. It is clear that for both types of noise sources, the SPFNN method again outperforms the DPFNN. For this example, the SPFNN method is more robust against additive noise than dynamic noise, as evidenced by the approximately similar NMSE values (4.9% for additive and 4.8% for dynamic noise) even though the SNR was much lower for additive than for dynamic noise.

Nonlinear ARMA Model with Nonlinear Dynamic Noise

Dynamic noise can exhibit characteristics that are more complicated than the simple linear lags used in the two previous examples. To evaluate the performance of the SPFNN in the case of nonlinear dynamic noise (e.g., multiplicative), the deterministic portion of Eq. (20) was again simulated:

$$
\begin{aligned}
y(n) ={}& 0.8u(n) - 0.13u(n-2) + 0.2y(n-1) - 0.11y(n-3) \\
&- 0.11u^{2}(n-1) + 0.13y^{2}(n-2) - 0.18u(n-1)\,y(n-1) \\
&- 0.3e(n-1) - 0.11e^{2}(n-1) + 0.2e(n-3)\,y(n-2).
\end{aligned}
\tag{23}
$$

The same number of hidden units and model orders were used as in the previous example [Eqs. (20)–(22)]. Dynamic noise was added to yield a mean SNR of 7.5 ± 0.1 dB (based on five independent noise sources dynamically added). Table 4 shows the NMSE using $R = 6$ (prediction error terms) for the SPFNN and DPFNN. Both methods fared well in predicting the true output signal, but the SPFNN provided slightly better model prediction than did the DPFNN. When the error lag was increased from $R = 6$ to $R = 10$ and $R = 15$ (for the SPFNN), the NMSE was found to be 3.20% ± 0.1% and 3.22% ± 0.1%, respectively.

As a final simulated example, we consider a case when the output signal is corrupted by both the dynamic and additive noise sources. Measured physiological signals in most recordings are corrupted by both types of noise, with measurement noise being the additive noise. The errors associated with a finite sampling frequency in data collection as well as inaccurate peak R–R interval detection are examples of possible measurement noise sources. In addition, heart rate fluctuations measured using an electrocardiogram are likely to be influenced by other input stimuli such as the nervous system and other organs as well as cardiovascular variables such as lung volume, blood pressure, and stroke volume. These input stimuli, other than cardiovascular variables, can be characterized as dynamic noise sources. Thus, to simulate this more likely scenario confronted in data analysis, we performed the following to generate a noisy output signal: first, the signal expressed by Eq. (20) was corrupted by dynamic noise [from Eq. (21)], then additive noise that was independent of the dynamic noise was added.

FIGURE 4. Segment of the exact system output (solid lines for all panels) with noisy (both dynamic and additive noise) output (dashed line, top panel), and the model predictions obtained via the DPFNN (dashed line, middle panel) and the SPFNN (dashed line, bottom panel) methods (SNR = -1.7 dB).
TABLE 5. NMSE values as a function of the number of hidden units.

No. of hidden units    NMSE
2                      13.53%
3                      12.17%
4                       9.55%
5                      11.28%
6                      46.64%
The ratio of the variance of the signal to dynamic noise was 2.38, and the ratio of the variance of the signal to additive noise was 0.93. The final mean SNR based on five separate noise sources (with both dynamic and additive noise) was -1.7 ± 0.1 dB. Using the same model order and number of hidden units as in Eqs. (21) and (22), we obtained an NMSE of 10.3% for the DPFNN and 8.2% for the SPFNN (with 15 error lags), as shown in Table 4. The top panel of Fig. 4 shows a segment of the clean signal (solid line) along with the noise-corrupted signal (dotted line); the middle and bottom panels of Fig. 4 show the same clean signal (solid lines) along with the predicted outputs obtained via the DPFNN (dotted line) and SPFNN (dotted line), respectively. Although the variance of the noise was approximately 1.5 times greater than the variance of the signal (an unusually noisy case to be encountered in an experimental setting), both methods provide fairly good predictions. The SPFNN, as in the other simulation examples, provides better predictions than does the DPFNN.
Application to Renal Blood Pressure and Flow Data

The experimental data were collected from normotensive Sprague–Dawley rats using broadband perturbations of the arterial pressure (input) and measuring the resulting renal blood flow (output). Details of the experimental procedure have been presented elsewhere.4 Each of the four experimental data records used for analysis was 256 s long, with a sampling rate of two samples per second (Nyquist frequency of 1 Hz), after digital low-pass filtering to avoid aliasing. Each data record, containing 512 data points, was subjected to second-degree polynomial trend removal (which includes demeaning) and was normalized to unit variance. Data analysis employing the DPFNN and SPFNN used model orders of P = 5 and Q = 4, as these were previously determined to be the optimal model orders.4 Because our previous work7 showed nonlinear dynamics involving renal autoregulatory mechanisms, both the DPFNN and SPFNN employed a second-order polynomial function. Of the 512 data points available in each of the four data records, the first 384 data points (75%) were used to train the network and the remaining 128 data points (25%) were used as test data. For the SPFNN, the prediction error model order was set to R = 7. The step size of the network and the momentum were set to 0.0001 and 0.1, respectively. The optimum number of hidden units was taken to be the smallest number of hidden units that yielded a small MSE value. For example, Table 5 shows NMSE values as a function of the number of hidden units. We observed that the largest number of hidden units, M = 6, results in the
FIGURE 5. Impulse response function obtained from renal blood pressure and blood flow via the SPFNN approach.
largest MSE and M = 4 results in the smallest MSE values. Thus, four hidden units were selected for data analysis. The impulse response function (IRF) obtained via the SPFNN is shown in Fig. 5. The dotted lines in Fig. 5 represent the standard deviation bounds of the sample mean. The IRF obtained via the DPFNN, while not shown, is similar to the SPFNN IRF, and to those obtained with various other methods.4,7 The average normalized mean-square errors (NMSE) for the second-order NARMA model obtained with the DPFNN and SPFNN are 12.11% and 9.41%, respectively. As was the case in the simulation examples, the SPFNN provides better estimation of the predicted renal blood flow response than does the DPFNN approach.

DISCUSSION AND CONCLUSION

The number of free parameters used by the SPFNN in estimating ARMA/NARMA model parameters is M(P + Q + R + D + 3), where M is the number of hidden units, P, Q, and R are the model orders of the AR, MA, and error lag terms, respectively, and D is the degree of the polynomial function. For the DPFNN approach, the formula is the same except for the error lag terms [i.e., M(P + Q + D + 3)]. Thus, a fair comparison for the simulation examples considered would have been to increase the number of hidden units for the DPFNN so that the number of free parameters used was about equal for both approaches. We instead used the same number of hidden units for both methods because increasing the number of hidden units did not change the NMSE values significantly and, in certain cases, resulted in fitting noise (higher NMSE values). One disadvantage of the SPFNN is that it requires more computation time than the DPFNN approach. However, with ever-increasing computer processor speed and off-line analysis, the longer computation time of the SPFNN method need not be considered a serious drawback.
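As a worked instance of the two parameter-count formulas above, the following Python sketch evaluates them for the settings reported for the renal data (M = 4, P = 5, Q = 4, R = 7) and assumes that the second-order polynomial function corresponds to D = 2. The function name `free_parameters` is hypothetical.

```python
def free_parameters(M, P, Q, D, R=0):
    """Free-parameter count M(P + Q + R + D + 3).

    With R = 0 (no error lag terms) this reduces to the DPFNN
    count M(P + Q + D + 3).
    """
    return M * (P + Q + R + D + 3)

# Renal-data settings from the text; D = 2 is an assumption
# (second-order polynomial function).
n_spfnn = free_parameters(M=4, P=5, Q=4, D=2, R=7)
n_dpfnn = free_parameters(M=4, P=5, Q=4, D=2)
# DPFNN with more hidden units, chosen to match the SPFNN count.
n_dpfnn_matched = free_parameters(M=6, P=5, Q=4, D=2)

print(n_spfnn, n_dpfnn, n_dpfnn_matched)  # 84 56 84
```

With these values the SPFNN uses 84 free parameters and the DPFNN 56, so a DPFNN would need six hidden units to reach the same count. Table 5 reports the largest NMSE at M = 6, although the text does not state which of the two networks that table was computed with.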
Other stochastic ARMA/NARMA algorithms based on least-squares approaches6,8,10,18 also require more computation time than do deterministic ARMA/NARMA algorithms.3,5,11,12,14,17

In conclusion, both computer simulation examples of ARMA/NARMA systems corrupted by dynamic and additive noise sources and real physiological data have shown the superior predictive capabilities of the SPFNN over the DPFNN approach.

ACKNOWLEDGMENTS

This work was supported by NIH Grant No. DK 15968, the National Kidney Foundation, and the Whitaker Foundation.
REFERENCES
1. Bishop, C. M. Neural Networks for Pattern Recognition. New York, NY: Oxford University Press, 1995.
2. Chen, S., C. F. N. Cowan, and P. M. Grant. Orthogonal least squares learning algorithm for radial basis function networks. IEEE Trans. Neural Netw. 2:302–309, 1991.
3. Chon, K. H., and R. J. Cohen. Linear and nonlinear ARMA model parameter estimation using an artificial neural network. IEEE Trans. Biomed. Eng. 44:168–174, 1997.
4. Chon, K. H., Y. M. Chen, N. H. Holstein-Rathlou, D. J. Marsh, and V. Z. Marmarelis. On the efficacy of linear system analysis of renal autoregulation in rats. IEEE Trans. Biomed. Eng. 40:8–20, 1993.
5. Chon, K. H., R. J. Cohen, and N. H. Holstein-Rathlou. Compact and accurate linear and nonlinear autoregressive moving average model parameter estimation using Laguerre functions. Ann. Biomed. Eng. 25:731–738, 1997.
6. Chon, K. H., M. J. Korenberg, and N. H. Holstein-Rathlou. Application of fast orthogonal search to linear and nonlinear stochastic systems. Ann. Biomed. Eng. 25:793–801, 1997.
7. Chon, K. H., N. H. Holstein-Rathlou, D. J. Marsh, and V. Z. Marmarelis. Comparative nonlinear modeling of renal autoregulation in rats: Volterra approach versus artificial neural networks. IEEE Trans. Neural Netw. 9:430–435, 1998.
8. Goodwin, G. C., and R. L. Payne. Dynamic System Identification: Experimental Design and Data Analysis. New York, NY: Academic, 1977.
9. Hassoun, M. H. Fundamentals of Artificial Neural Networks. Cambridge, MA: MIT Press, 1995.
10. Korenberg, M. J., S. A. Billings, Y. P. Li, and P. J. McIlroy. Orthogonal parameter estimation algorithm for non-linear stochastic systems. Int. J. Control 48:193–210, 1988.
11. Korenberg, M. J. A robust orthogonal algorithm for system identification and time series analysis. Biol. Cybern. 60:267–276, 1989.
12. Marmarelis, V. Z. Identification of nonlinear biological systems using Laguerre expansion of kernels. Ann. Biomed. Eng. 21:573–589, 1993.
13. Marmarelis, P. Z., and V. Z. Marmarelis. Analysis of Physiological Systems: The White Noise Approach. New York, NY: Plenum, 1978.
14. Marmarelis, V. Z., and X. Zhao. Volterra models and three-layer perceptrons. IEEE Trans. Neural Netw. 8:1421–1433, 1997.
15. Ripley, B. D. Pattern Recognition and Neural Networks. New York, NY: Cambridge University Press, 1996.
16. Rumelhart, D. E., and J. L. McClelland, Eds. Parallel Distributed Processing. Cambridge, MA: MIT Press, 1986, Vols. I and II.
17. Wray, J., and G. G. R. Green. Calculation of the Volterra kernels of nonlinear dynamic systems using an artificial neural network. Biol. Cybern. 71:187–195, 1994.
18. Zhu, Q., and S. A. Billings. Fast orthogonal identification of nonlinear stochastic model and radial basis function neural networks. Int. J. Control 64:871–886, 1996.