IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, VOL. 43, NO. 4, APRIL 1996
Parametric Signal Restoration Using Artificial Neural Networks
Andrzej Materka,* Senior Member, IEEE, and Shizuo Mizushina
Abstract-The problem of parametric signal restoration given its blurred/nonlinearly distorted version contaminated by additive noise is discussed. It is postulated that feedforward artificial neural networks can be used to find a solution to this problem. The proposed estimator does not require iterative calculations that are normally performed using numerical methods for signal parameter estimation. Thus high speed is the main advantage of this approach. A two-stage neural network-based estimator architecture is considered in which the vector of measurements is projected on the signal subspace and the resulting features form the input to a feedforward neural network. The effect of noise on the estimator performance is analyzed and compared to the least-squares technique. It is shown, for low and moderate noise levels, that the two estimators are similar to each other in terms of their noise performance, provided the neural network approximates the inverse mapping from the measurement space to the parameter space with a negligible error. However, if the neural network is trained on noisy signal observations, the proposed technique is superior to the least-squares estimate (LSE) model fitting. Numerical examples are presented to support the analytical results. Problems for future research are addressed.

Manuscript received April 13, 1994; revised September 28, 1995. This work was supported by the Polish Scientific Research Committee under Grant 8T11F01010. Asterisk indicates corresponding author.
*A. Materka was with the Department of Electrical and Computer Systems Engineering, Monash University, Melbourne (Caulfield East), Australia. He is now with the Institute of Electronics, Technical University of Lodz, Stefanowskiego 18, Lodz, 90-924 Poland (e-mail [email protected]).
S. Mizushina is with the Research Institute of Electronics, Shizuoka University, Hamamatsu 432, Japan.
Publisher Item Identifier S 0018-9294(96)02430-5.

I. INTRODUCTION

Very often, when medical diagnosis or biological study requires estimation of the value of a physical quantity, there is no direct access to the variable which carries the information of interest. The signal that can be measured is dependent on this variable and depends also on other unknown variables. This applies to most of the noninvasive diagnostic techniques and is a characteristic of any measurement method which features a significant signal blurring and/or contamination by random noise. In some cases, the signal also becomes distorted due to the nonlinearity of biological/physical media. Examples include on-line identification of arterial circulation parameters [1], [2], evoked potential parameter estimation from noisy signals [3], signal and image analysis in laboratory instrumentation, e.g., related to spectrometry [4], [5] or electrophoresis [6], parameter estimation of multicompartmental responses of biological systems [7], [8], and many others.

In these examples, a model of the signal of interest is known. The model often incorporates electrical signals and systems [1], [2]. In [1], a step change in the aorta peripheral resistance causes a measurable change in the aortic flow. The shape of the signal waveform is thus known (a step in the resistance value), along with a differential equation that describes the relationship between the signal and the aortic flow. However, the actual value of the step is unknown and needs to be estimated given the measured flow waveform. On-line estimation of the signal parameter is essential to clinical applications of the methodology that require tracking of time-varying parameters [1]. Least-squares curve fitting is normally used for parameter estimation [1], [2]. However, due to its iterative nature, this technique is not suitable for real-time parameter tracking at a reasonable cost of the computing equipment. Even for a simple, three-parameter signal, a typical PC386/387 machine does not make on-line estimation possible [2].

Delay time of an evoked potential is another example of a signal parameter of diagnostic value [3]. Estimating this parameter by the method of least squares is in general even more time-consuming than estimating other parameters [9]-[11]. This is caused by the fact that iterative minimization of the sum-of-error-squares function is often terminated at a local minimum of the function when the time delay is one of the optimized parameters. For a global minimum to be achieved, a good initial estimate is required. To find a good starting point, least-squares fitting is performed for a number of fixed values of time delay and the value giving the lowest error is accepted as a final solution [9], [11]. Such a procedure significantly increases the time of calculations. Thus there is a need for a fast time-delay estimation technique that would not be iterative.

The radiant flux spectrum, as recorded by a spectrometer, is given by the convolution of the "true" radiant flux spectrum (as it would be recorded by a perfect instrument) with the spectrometer response function [4]. The observed spectrum is a blurred version of the radiant flux spectrum. This blurring phenomenon takes place in any measuring instrument, not only a spectrometer. Let t be the independent variable, which may represent time, wavelength, wave number, or other physical quantity, depending on the nature of the measurement. The signal y(t) observed at the output of a linear measuring instrument is described by the convolution integral
$$y(t) = \int_{-\infty}^{\infty} h(t - \tau)\, x(\tau)\, d\tau + \varepsilon(t) \qquad (1)$$
where $x(t)$ is the undistorted signal of interest, $h(t)$ is the instrument impulse response function, $\tau$ is the variable of integration, and $\varepsilon(t)$ denotes the observation noise. Depending
on the function h(t), two narrow pulse components of the signal x(t) that are sufficiently close to each other in the t domain cannot be distinguished at the output of the instrument since each pulse is spread over a range of the independent variable. Thus convolution involves a loss of resolution. There have been many methods proposed to process the instrument response y(t) in order to reconstruct the original waveform x(t), e.g., [4], [12], and [13]. Most of them involve iterative calculations which offer certain advantages [13] at the expense of a relatively long computational time.

It has been demonstrated [14] that feedforward artificial neural networks (ANN's) can be used for model parameter estimation. The advantage of this approach is that the estimation process becomes relatively fast. This is because no iterative calculations are involved in generating the estimated parameter values. They are obtained at the output of a feedforward neural network, e.g., electrical or optical. The response time is limited only by signal propagation through such a network. Thus the proposed estimation technique is highly suitable for real-time applications.

A neural network has to be trained before being applied to the estimation task. The training process involves adjustment of the network internal connections (weights) in order to minimize the estimation error over a set of examples. This could be a computationally intensive process. However, it has to be performed only once for a given class of signals whose parameters are to be estimated. The weights become fixed once the training has been completed. Training examples are calculated using mathematical equations which model the signal and the underlying mechanism of its generation. Thus a priori knowledge about the problem at hand is incorporated into the proposed technique.

The method is discussed in more theoretical detail in this paper. Its properties are compared to the well-known technique of least-squares model fitting. Three numerical examples are presented to better illustrate the discussion. Problems for further research are addressed.

II. MATERIALS AND METHODS
All the signal examples mentioned in Section I can be represented by a response of a general continuous dynamical system, either linear or nonlinear, to the signal of interest $x(t)$. The dynamical system either represents the biological organ that generates the signal $y(t)$, e.g., the vascular system [2], or corresponds to a laboratory instrument, e.g., a spectrometer [4]. This system distorts the shape of the signal $x(t)$ and contaminates it by noise. From this point of view it is a degrading system. A parameterized model of the signal $x(t)$ is known, and the actual values of the parameters that relate to a particular system response $y(t)$ are to be estimated given this response. The response is measured at some n values of the independent variable t to form the measurement (observation) vector

$$y = [y_1\ y_2\ \cdots\ y_n]^T \qquad (2)$$
where $y_i = f(t_i, \theta) + \varepsilon_i$, $i = 1, 2, \ldots, n$, and $\varepsilon_i$ denotes the random noise term. The vector quantity $\theta = [\theta_1\ \theta_2\ \cdots\ \theta_p]^T$, $\theta \in \Theta$, represents the signal parameters which remain constant within the observation interval $[t_1, t_n]$. The parameter space $\Theta$ is a subset of $R^p$. The continuous function $f(\cdot)$ is the response of the degrading system to the signal of interest. This function can either be expressed analytically or evaluated numerically using the model equations. The vector

$$f = [f(t_1, \theta)\ f(t_2, \theta)\ \cdots\ f(t_n, \theta)]^T \qquad (3)$$
can be used for the neural network training if the system equations are known. If they are unknown, the neural network can be trained using a set of vectors (2) measured for known parameter values. In the latter case, the training examples are noisy. One can show that this leads to biased parameter estimates which, however, have lower variance when compared to those obtained by noiseless training [15]. Bias can be traded for variance by introducing certain smoothness constraints into the training equations. This interesting and useful property is proven in the Appendix and illustrated by numerical examples in the following section.

In the proposed approach, the n-element measurement vector $y$ is applied to the input of a trained neural network of p outputs. The output variables form the estimated parameter vector $\hat{\theta}$. There are n measurements available and p parameters which are unknown. It can be shown that p observations are sufficient to uniquely define the mapping $\theta(y)$ in the noiseless case, provided the related Jacobian determinant is nonzero. On the other hand, an increased number of measurements, n > p, can lead to a smaller parameter variance [16]. This, however, means a larger number of input variables to the neural network. Thus the neural network would become more complex for larger n. The increased ANN complexity would make the training process duration prohibitively long for some applications [17]. One then faces the well-known problem of dimensionality reduction [18]. In the classical approach to finding a solution to this problem, the measurement vector is transformed into a lower-dimensional space while optimizing a certain performance index. In statistics, the principal component model and the factorial model are used for the purpose [19]. These two models are closely related to each other. They can be defined using eigenvectors of the measurement covariance matrix.
Dimensionality reduction by principal component analysis (PCA) has already been found useful in biomedical engineering, e.g., [18], [20], among many other applications, and will be adopted for the purpose of this study as well. The PCA can be interpreted in terms of a linear transformation that can be performed by the so-called linear combiner neural network [21]. Following this, a two-stage neural network architecture is proposed to estimate signal parameters. It is shown schematically in Fig. 1. The signal of interest excites the input of a degrading system which either represents a biological system under investigation or an imperfect measuring instrument. The measurements $y_1, y_2, \ldots, y_n$ form the input to the PCA network. This network transforms the measurements as follows:

$$z = (y - \bar{f})^T E \qquad (4)$$

where $z = [z_1\ z_2\ \cdots\ z_p]^T$ is the feature vector, and $E$ is an $n \times p$ matrix whose columns are those eigenvectors of the
data covariance matrix that correspond to the p largest eigenvalues of this matrix [19]. The vector $\bar{f}$ in (4) is the mean value of the vector $f$ over the parameter space $\Theta$. In the case of noiseless training, i.e., when the model equations are available, the data covariance matrix is described by

$$R = E[(f - \bar{f})(f - \bar{f})^T] \qquad (5)$$
where E[.] denotes the expectation operator. If the model equations are not available, the sample data covariance matrix
can be used to construct the matrix $E$ in (4), with $\bar{f}$ replaced by the mean value $\bar{y}$. In either case, if $\lambda_1 > \lambda_2 > \cdots > \lambda_p$ are the largest eigenvalues of the dispersion matrix and $e_1, e_2, \ldots, e_p$ are its respective eigenvectors, $e_i = [e_{i1}\ e_{i2}\ \cdots\ e_{in}]^T$, $i = 1, 2, \ldots, p$, then the projection matrix is defined as follows:
Eigenvalue analysis can be used to find the elements of the matrix $E$. In an alternative approach, an adaptive neural network can perform the PCA [22], [23], robust PCA [24], or nonlinear PCA [25], whichever gives a better feature representation of the particular signal. The transformation (4) based on (7) is assumed in this paper to define the signal features.

Fig. 1 shows that the features are applied to the input of an artificial neural network (ANN). The neural network considered here is a static system, e.g., an electronic analog/digital circuit, comprising a number of nonlinear processing units arranged in layers [26], [27]. The lowest layer is formed by the neural network input nodes whereas the uppermost layer is formed by its output nodes. Each processing unit in the internal (hidden) layers realizes a simple nonlinear function, most often sigmoidal (squashing), e.g., $\tanh(\alpha)$. The input $\alpha$ to any processing unit is a weighted sum of outputs from the layer below it. Referring back to Fig. 1, it is assumed that the output of the neural network is a continuous multivariable function of the signal features
where $w = [w_1\ w_2\ \cdots\ w_q]^T$ is the vector of weights which represent internal connections between the layers. The network is thus a feedforward nonlinear system which approximates the mapping from the feature space to the parameter space. This inverse mapping is normally unknown; however, its input-output examples can be calculated using models of the signal and of the degrading system. In other words, for any particular parameter vector $\theta \in \Theta$ one can calculate the vector $f = f(\theta)$ and then, by using (4) with $y = f$, obtain $z = z(\theta)$, which is the neural network input. The network response is then obtained from (8). The desired network response is equal to $\theta$, and the estimated parameter vector $\hat{\theta}$ should be as close to this target value $\theta$ as possible. Therefore the aim of the training process is to minimize a selected norm of the differences $\|\hat{\theta} - \theta\|$ over a set of examples $\{\theta_h \in \Theta,\ h = 1, 2, \ldots, N\}$. The minimization can be accomplished by the adjustment of the neural network weights. Finding an efficient training algorithm is one of the most active research topics in the area of artificial neural network theory and application, see e.g., [28]-[30]. In the numerical experiment described in the next section, the well-known backpropagation algorithm [26]-[30] was used to find initial weight estimates, followed by a faster and more accurate training using nonlinear programming [14].

It has been proven that three-layer artificial neural networks are capable of approximating any multivariable function to any desired degree of accuracy, provided certain mild assumptions are satisfied regarding the transfer function of their processing units [26], [28], [31]. However, except for some simple cases, a neural network of a given size is unable to provide a zero error of parameter estimation for every vector $\theta$ belonging to the parameter space, even if there is no measurement noise. This is caused by the dissimilarity between the approximated and approximating functions. There is therefore a bias involved with the proposed method of parameter estimation, caused by the inability of a finite-size network to exactly reconstruct the mapping of interest. Nevertheless, it has been demonstrated in [14] and [15], and will be shown later on as well, that the bias can be made practically negligible by increasing the number of processing units in the network.

Fig. 1. Neural network-based estimation of signal parameters. (Block diagram labels: parameterised signal, measurements, signal features, estimated parameters.)
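The feature-extraction stage of Fig. 1, i.e., the projection (4) onto eigenvectors of the covariance matrix (5), can be sketched as follows. This is only an illustration under stated assumptions: the function `model`, the parameter box `lo`/`hi`, and the observation grid are hypothetical stand-ins rather than the signal studied in this paper, the expectation in (5) is replaced by a sample average, and the columns of `E` are taken as unit-norm eigenvectors because the exact scaling in (7) is not available here.

```python
import numpy as np

def model(t, theta):
    # Hypothetical signal model (an assumption, not the paper's signal):
    # one Gaussian pulse with amplitude a, width w, and location c.
    a, w, c = theta
    return a * np.exp(-0.5 * ((t - c) / w) ** 2)

rng = np.random.default_rng(0)
t = np.linspace(33.0, 57.0, 9)      # n = 9 observation points
p = 3                               # number of parameters and of features

# Parameter vectors drawn uniformly from an assumed box-shaped parameter space.
lo = np.array([0.5, 2.0, 40.0])
hi = np.array([1.5, 4.0, 50.0])
thetas = rng.uniform(lo, hi, size=(2000, p))

# Noiseless training vectors f(theta), one per row.
F_train = np.array([model(t, th) for th in thetas])

f_mean = F_train.mean(axis=0)       # sample estimate of the mean vector
R = np.cov(F_train, rowvar=False)   # sample estimate of the covariance (5)

# eigh handles symmetric matrices and returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1][:p]
lam = eigvals[order]                # the p largest eigenvalues
E = eigvecs[:, order]               # n x p matrix with unit-norm columns

def features(y):
    # Projection (4): z = (y - f_mean)^T E
    return (y - f_mean) @ E

z = features(model(t, np.array([1.0, 3.0, 45.0])))
print(lam)
print(z)
```

With this choice of `E`, the sample covariance of the features is diagonal, with the retained eigenvalues on the diagonal, so the p features are uncorrelated over the training set.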
Suppose $g(z^*) = \theta^*$ where $z^*$ is defined by (4) for an observation vector $y = f^* = f(\theta^*)$, and $\theta^*$ is an interior point of $\Theta$. Consider a small deviation $\Delta\theta$ of the parameter vector, such that $(\theta^* + \Delta\theta) \in \Theta$. It causes the corresponding deviations of both $z$ and $g$. In particular, for the output node $g_k$, $k = 1, 2, \ldots, p$, one obtains using the Taylor series expansion [32]

$$\Delta g_k = g_k - g_k^* \cong \sum_{j=1}^{p} \left.\frac{\partial g_k}{\partial z_j}\right|_{z^*} \Delta z_j = G_{k1}\Delta z_1 + G_{k2}\Delta z_2 + \cdots + G_{kp}\Delta z_p \qquad (9)$$

where $\Delta z_j = z_j - z_j^*$.

[Equations (10)-(12) are not recoverable from the source.]

By definition, the function $g_k(\cdot)$ should be equal to $\theta_k$, $g_k(z^* + \Delta z) = \theta_k^* + \Delta\theta_k$, and be independent of all the other parameters. Under the assumption of $g(z^*) = \theta^*$, the following condition holds:

$$\Delta g_k = \Delta\theta_k. \qquad (13)$$

Using (13) and (12) one can formulate the system of p linear equations with p unknowns $G_{k1}, G_{k2}, \ldots, G_{kp}$

[Equation (14) is not recoverable from the source.]

Repeating the above discussion for $k = 1, 2, \ldots, p$ gives p equations similar to (14). All of them can be combined together to obtain

$$Z^T G = I \qquad (15)$$

where $Z$ and $G$ are the $p \times p$ matrices with elements $Z_{ij}$ and $G_{ij}$, respectively, and $I$ is the $p \times p$ identity matrix (16).

Assume the Jacobian $|Z|$ exists. Premultiplying both sides of (15) by $(Z^T)^{-1}$, one obtains

$$G = (Z^T)^{-1}. \qquad (17)$$

It follows from this first-order perturbation analysis that the function $g(y)$ can uniquely be determined if the feature sensitivity matrix $Z$ is nonsingular in the neighborhood of each point $\theta^* \in \Theta$. It should be noted that this is the necessary condition for the existence of this function. The question of whether a neural network of a given architecture can approximate it with a prescribed error is another issue. The following sensitivity matrix can be defined for the measurements

[Equations (18)-(20) are not recoverable from the source.]

which substituted to (17) gives

$$G = (E^T F)^{-1}. \qquad (21)$$
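As a numerical illustration of (21), the sketch below evaluates $E^T F$ at one parameter point and inverts it. Several things are assumptions made only for the sake of a runnable example: the function `model` is a hypothetical single-Gaussian signal rather than the one studied here, the measurement sensitivity matrix is taken as $F_{ij} = \partial f(t_i, \theta)/\partial\theta_j$ and estimated by central differences in the helper `jacobian`, and `E` holds unit-norm eigenvectors of a sample covariance matrix.

```python
import numpy as np

def model(t, theta):
    # Hypothetical signal model (an assumption, not the paper's signal).
    a, w, c = theta
    return a * np.exp(-0.5 * ((t - c) / w) ** 2)

def jacobian(t, theta, h=1e-6):
    # Central-difference estimate of F[i, j] = d f(t_i, theta) / d theta_j.
    theta = np.asarray(theta, dtype=float)
    F = np.empty((t.size, theta.size))
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = h
        F[:, j] = (model(t, theta + step) - model(t, theta - step)) / (2 * h)
    return F

t = np.linspace(33.0, 57.0, 9)
theta_star = np.array([1.0, 3.0, 45.0])

# Projection matrix: the three leading principal directions of noiseless responses.
rng = np.random.default_rng(1)
thetas = rng.uniform([0.5, 2.0, 40.0], [1.5, 4.0, 50.0], size=(2000, 3))
samples = np.array([model(t, th) for th in thetas])
_, vecs = np.linalg.eigh(np.cov(samples, rowvar=False))
E = vecs[:, ::-1][:, :3]            # reverse eigh's ascending order, keep 3 columns

F = jacobian(t, theta_star)         # n x p measurement sensitivity matrix
ZT = E.T @ F                        # Z^T = E^T F, a p x p matrix
det = np.linalg.det(ZT)             # a nonzero value indicates local invertibility
G = np.linalg.inv(ZT)               # equation (21)
print(det)
print(G)
```

Repeating the determinant evaluation over a grid of parameter points is one way to check the nonsingularity condition stated above before any network is trained.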
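Equation (21) will be used to derive the covariance matrix for the neural-network-based parameter estimator. The additive noise $\varepsilon_i$ in (2) is modeled as a stationary, white, zero-mean Gaussian process, i.e., $E[\varepsilon\varepsilon^T] = \sigma^2 I$, where $\varepsilon = [\varepsilon_1\ \varepsilon_2\ \cdots\ \varepsilon_n]^T$ and $\sigma^2$ is the noise variance. The noise is uncorrelated with the signal. Consider the observation vector $y^* = f^* + \varepsilon$, which, when substituted into (4), produces the following input to the neural network:

[Equation (22) is not recoverable from the source.]

Suppose that $g(z^*) = \theta^*$ at a point $\theta^* \in \Theta$. For a sufficiently small noise variance one can approximate the parameter estimate by the following expression:

$$\hat{\theta} = g(z^* + E^T \varepsilon) \cong \theta^* + G^T E^T \varepsilon. \qquad (23)$$

The covariance matrix of the estimate $\hat{\theta}$ can be obtained as

$$\mathrm{cov}[\hat{\theta}] = E[(\hat{\theta} - \theta^*)(\hat{\theta} - \theta^*)^T] \cong E[(G^T E^T \varepsilon)(G^T E^T \varepsilon)^T] = E[G^T E^T \varepsilon \varepsilon^T (G^T E^T)^T] = \sigma^2 G^T E^T E G \qquad (24)$$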
where

[Equation (25), which defines a diagonal matrix $\Lambda$, is not recoverable from the source.]

Using (21) one obtains from (24)

$$\mathrm{cov}[\hat{\theta}] = \sigma^2 (E^T F \Lambda F^T E)^{-1}. \qquad (26)$$
This result can be compared to the covariance matrix of a least-squares parameter estimate based on the same measurement vector. That matrix, for large n, is described by the following expression [7]:

$$\mathrm{cov}[\hat{\theta}_{LS}] = \sigma^2 (F^T F)^{-1}. \qquad (27)$$
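Expression (27) can be checked by simulation. The sketch below uses a model that is linear in its parameters, chosen purely as an assumption so that the least-squares estimate has a closed form and (27) holds for any n rather than only as a large-n limit. The polynomial sensitivity matrix `F`, the vector `theta_true`, and the trial count are illustrative choices, not values from this paper.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, sigma = 9, 3, 0.02

# Hypothetical linear-in-parameters model y = F theta + noise.
t = np.linspace(-1.0, 1.0, n)
F = np.column_stack([np.ones(n), t, t ** 2])      # n x p sensitivity matrix
theta_true = np.array([1.0, -0.5, 0.25])

cov_theory = sigma ** 2 * np.linalg.inv(F.T @ F)  # expression (27)

trials = 20000
noise = sigma * rng.standard_normal((trials, n))  # white Gaussian noise records
Y = F @ theta_true + noise                        # one noisy record per row

# Solve all least-squares problems at once; the result has shape (p, trials).
theta_hat, *_ = np.linalg.lstsq(F, Y.T, rcond=None)
cov_mc = np.cov(theta_hat)                        # each row is one parameter

print(np.round(cov_theory, 7))
print(np.round(cov_mc, 7))
```

The two printed matrices should agree to within the Monte Carlo sampling error, which shrinks as `trials` grows.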
The matrix $F^T F$ is called the information matrix for $\theta$ [33]. Since most of the information carried by the measurements is preserved by the transformation (4), one can expect that the proposed estimator is not much different from a least-squares estimator in terms of their parameter covariance matrices. This is indeed the case in practice, provided the inverse mapping of interest is accurately approximated by the neural network. This problem will be further discussed in the following section. (It should be noted here that (26) is valid for any number of observations n > p, whereas (27) is derived as a limit for $n \to \infty$. Thus their similarity is approximate only. More investigation is planned into the comparison of the proposed estimator to the least-squares [7] and total-least-squares [41] parameter estimators, including the high noise level case.) The formula (26) can also be written as

$$\mathrm{cov}[\hat{\theta}] = \sigma^2 (Z^T \Lambda Z)^{-1}. \qquad (28)$$
The larger the determinant $|Z^T \Lambda Z|$, the lower the volume of confidence ellipsoids [33] and the lower the uncertainty of parameter estimates. The maximization of this determinant by a proper selection of observation points can be used for the experiment design [33], [34].
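A determinant-based choice of observation points can be sketched as a grid search. The example below is an assumption-laden illustration, not the procedure used in this paper: it maximizes the determinant of the information matrix $F^T F$ instead of $|Z^T \Lambda Z|$, uses a hypothetical single-Gaussian `model`, and describes the equidistant observation points by a centre `c0` and a spacing `c1`, which are names introduced here.

```python
import numpy as np

def model(t, theta):
    # Hypothetical signal model (an assumption, not the paper's signal).
    a, w, c = theta
    return a * np.exp(-0.5 * ((t - c) / w) ** 2)

def jacobian(t, theta, h=1e-6):
    # Central-difference estimate of F[i, j] = d f(t_i, theta) / d theta_j.
    theta = np.asarray(theta, dtype=float)
    F = np.empty((t.size, theta.size))
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = h
        F[:, j] = (model(t, theta + step) - model(t, theta - step)) / (2 * h)
    return F

def d_criterion(c0, c1, theta, n=9):
    # n equidistant points centred on c0 with spacing c1.
    i = np.arange(1, n + 1)
    t = c0 + c1 * (i - np.trunc(n / 2 + 1))
    F = jacobian(t, theta)
    return np.linalg.det(F.T @ F)   # determinant of the information matrix

theta0 = np.array([1.0, 3.0, 45.0])         # assumed nominal parameters
c0_grid = np.linspace(35.0, 55.0, 21)       # candidate centres
c1_grid = np.linspace(0.5, 6.0, 12)         # candidate spacings

scores = np.array([[d_criterion(c0, c1, theta0) for c1 in c1_grid]
                   for c0 in c0_grid])
k0, k1 = np.unravel_index(np.argmax(scores), scores.shape)
best_c0, best_c1 = c0_grid[k0], c1_grid[k1]
print(best_c0, best_c1, scores[k0, k1])
```

Because the criterion is evaluated at one nominal parameter vector, the selected points are optimal only locally; a design that must serve the whole parameter space would need the criterion averaged or bounded over that space.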
III. RESULTS

To support the analytical results of the previous section and to demonstrate the practical usefulness of the proposed approach to the parameter estimation problem, three numerical examples are presented. The examples are selected to cover various aspects of biomedical engineering. They range from laboratory measurements using a spectrophotometer, through human electroretinogram modeling by a nonlinear dynamical system impulse response, to subcutaneous tissue temperature measurement using a multifrequency microwave radiometer.

A. Spectrometric Signal Deconvolution

This particular example has already been discussed in a number of papers [13], [35], [36] to illustrate the performance of iterative methods of signal deconvolution. Its selection here will give a better understanding of the properties of the neural network-based method of signal parameter estimation in the context of the previous related work performed by other research groups. Fig. 2 shows the two signals of interest: $x(t)$ and $y(t)$, calculated using (1). The ideal signal $x(t)$ comprises two Gaussian components

[Equation (29) is not recoverable from the source.]

each dependent on three parameters. The parameters of the second component are fixed for the purpose of this demonstration, whereas the parameters of the first component are unknown. Parameter $\theta_1$ represents one-tenth of the amplitude of the first spectral component, $\theta_2$ is the component width, and $\theta_3$ corresponds to the spectral component location on the axis of the independent variable t. Their actual values belong to the following parameter space:

$$(\Theta:\ 0.5 \le \theta_1 \le 1.5,\ 2 \le \theta_2 \le 4,\ 4 \le \theta_3 \le 5). \qquad (30)$$

The signal $x(t)$ is plotted in Fig. 2 for the nominal values of its parameters, $\theta_0 = [1.0, 3.0, 4.5]^T$. The number of parameters is limited to p = 3 in this example, to make the presentation simpler and the computer calculations shorter. The degrading system impulse response $h(t)$ is also assumed in the form of a Gaussian function in this example

[Equation (31) is not recoverable from the source.]

Fig. 2. Computer simulated spectrophotometric signals. (Curves: input, x; blur+noise, y. Horizontal axis: t.)
Equations (1), (29), and (31) were used to simulate the signal observations with normally distributed pseudorandom noise added. The signal $y(t)$ plotted in Fig. 2 corresponds to the noise standard deviation of $\sigma = 0.02$. The noiseless signal measurements, obtained for a set of different parameter values spanning the range (30), are shown in Fig. 3. The problem is to design the neural network to estimate the signal parameters given a set of noisy measurements $y(t)$. Observation points have to be selected first. Although other approaches are possible, it is assumed that equidistant samples of $y(t)$ are taken at the following values of the variable t:

$$t_i = C_0 + C_1\,(i - \mathrm{trunc}(n/2 + 1)), \quad i = 1, 2, \ldots, n. \qquad (32)$$
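A direct transcription of (32) makes the sampling grid concrete. The function name `observation_points` is introduced here for illustration; the values $C_0 = 45$, $C_1 = 3$, and $n = 9$ are the ones adopted later in this example.

```python
import math

def observation_points(c0, c1, n):
    # t_i = C0 + C1 * (i - trunc(n/2 + 1)), i = 1, ..., n   (equation (32))
    offset = math.trunc(n / 2 + 1)
    return [c0 + c1 * (i - offset) for i in range(1, n + 1)]

points = observation_points(45, 3, 9)
print(points)
# [33, 36, 39, 42, 45, 48, 51, 54, 57]
```

For odd n the grid is symmetric about $C_0$, so $C_0$ acts as the centre of the observation window and $C_1$ as the sampling step.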
Suppose n = 9 is a sufficient number of observations, i.e., it is the number that gives a sufficient reduction of the noise-induced estimation error as compared to the error that could be observed for n = p = 3. As discussed, e.g., in [33], the
determinant of the information matrix can be maximized in the search for the optimum observation points. The following criterion is suggested for the purpose

[Equation (33), which defines the criterion $\phi_0(F)$, is not recoverable from the source.]

Fig. 3. Signal y(t) simulated for $\sigma = 0$. (Curves for the parameter vectors (0.5, 3.0, 4.5), (1.5, 3.0, 4.5), (1.0, 3.0, 4.0), (1.0, 3.0, 5.0), (1.0, 2.0, 4.5), and (1.0, 4.0, 4.5). Horizontal axis: t.)

Fig. 4 shows the dependence of the function (33) on the constants $C_0$ and $C_1$ in (32), calculated for the nominal parameter vector $\theta_0$. The criterion function reaches its maximum at $C_0 = 45$ and $C_1 = 3$. It follows then that the optimum observation interval covers the range from t = 33 to t = 57, for the nominal parameter values and n = 9. One can expect that this range may not be optimal for other values of the unknown parameters and for different numbers n. Nevertheless, the values $C_0 = 45$ and $C_1 = 3$ were used to find the measurement covariance matrix $R$ using (1)-(3), (5), and (32). The matrix $R$ was estimated assuming that the elements of the parameter vector were random variables, uniformly distributed over the parameter space (30). The largest eigenvalues of the covariance matrix were equal, respectively, to

$$\lambda_1 = 0.345, \quad \lambda_2 = 0.243, \quad \lambda_3 = 0.048. \qquad (34)$$

Fig. 4. Determinant criterion $\phi_0(F)$ as function of $C_0$ and $C_1$ in (32).

Fig. 5. The eigenvectors of the data covariance matrix corresponding to its three largest eigenvalues. (Horizontal axis: t, from 33 to 57.)

The corresponding eigenvectors are plotted in Fig. 5. They were used to construct the projection matrix (7) for the PCA network shown in Fig. 1. Next, the feature sensitivity matrix $Z$ was calculated on a dense grid of points covering the parameter space. The corresponding surface plots showing the dependence of the criterion function $\phi_0(Z)$ on the signal parameters are presented in Fig. 6. The determinant $|Z|$ appears to be a smooth function of the parameters. It does not reach zero within the parameter space. One concludes that the inverse function $g(z)$ exists. The calculated examples $\{\theta, z(\theta)\}$ can be used for the neural network training. Fig. 7 demonstrates how the signal feature $z_1$ varies with the parameters in the present example. For the signal and parameter space under consideration, the features span the following range of values:

$$z_1 \in [-1.7, 1.6], \quad z_2 \in [-2.3, 2.3], \quad \text{and} \quad z_3 \in [-0.7, 2.8]. \qquad (35)$$

Fig. 6. Determinant criterion $\phi_0(Z)$ as function of the signal parameters: (a) $\theta_3 = 4.5$, (b) $\theta_2 = 3.0$, and (c) $\theta_1 = 1.0$.

Fig. 7. Feature $z_1$ as a function of the signal parameters: (a) $\theta_3 = 4.5$, (b)

A single-hidden-layer feedforward neural network with sigmoidal processing units was used to verify the proposed parameter estimation technique. This is a well-known neural network architecture, discussed by many authors, e.g., [26]-[30]. The output of this network at node k, $k = 1, 2, \ldots, p$, is described by

[Equation (36) is not recoverable from the source.]

where $w = [w_1, w_2, \ldots, w_q]^T = [u_0, u_1, \ldots, u_m, v_{10}, v_{11}, \ldots, v_{mp}]^T$ is the q-element weight vector associated with the output node k, $q = (p + 2)m + 1$, m is the number of processing units in the hidden layer, and $\xi(\alpha)$ denotes a sigmoidal function of the variable $\alpha$. There is generally a freedom in selecting the particular form of this "squashing" function [31]. The functions $1/(1 + \exp(-\alpha))$ or $\tanh(\alpha)$ are perhaps the most popular ones in the literature. However, calculation of their values involves evaluation of the exponential function, which is computationally intensive. The following piecewise function is proposed in this study:

[Equation (37): $\xi(\alpha) = -1$ for $\alpha < -1$ and $\xi(\alpha) = 1$ for $\alpha > 1$; the expression for $-1 \le \alpha \le 1$ is not recoverable from the source.]

The time of calculation of a value of $1/(1 + \exp(-\alpha))$ is approximately three times longer than the time needed to obtain a value of