Distributed Approximating Functional Wavelet Nets

Zhuoer Shi, D. S. Zhang, and Donald J. Kouri
Department of Physics and Chemistry, University of Houston, Houston, TX 77204
[email protected], [email protected], [email protected]

David K. Hoffman
Department of Chemistry and Ames Laboratory, Iowa State University, Ames, IA 50011
[email protected]

Abstract— Novel polynomial functional neural networks using Distributed Approximating Functional (DAF) wavelets (infinitely smooth filters in both the time and frequency regimes) are presented for signal estimation and surface fitting. The remarkable advantage of these polynomial nets is that the functional space smoothness is identical to the state space smoothness (the space consisting of the weighting vectors). The constrained cost energy function, using optimal regularization programming, endows the networks with a natural time-varying filtering feature. Theoretical analysis and an application show that the approach is extremely stable and efficient for signal processing and curve/surface fitting.
I. INTRODUCTION

For real-world signal processing, pattern recognition, and system identification, information extraction from a noisy background is the fundamental objective. To obtain an ideal output vector Y(X, W) from the observation input vector X, the system (neural network) should possess the following two kinds of smoothness [6] (where W is the response, or entry, of the system, normally called the weight vector for neural networks): (a) functional space smoothness; (b) state space smoothness. The degree of smoothness in functional space governs the quality of the filtering of the observed (noisy) signal: the smoother the output signal, the more noise is suppressed. State space smoothness implies that a weak fluctuation of the weight vector W = {w(i), i = 0, ..., L−1} has only a small effect on the output signal, which makes the system less sensitive to input distortion. For a robust estimation system, the output should not only approach the observed signal value, but also smooth the signal to suppress the distortion due to noise. Simultaneously, the state space should be smooth to ensure stability. Based on these facts, one finds that the least mean square (LMS) error

E_A = ∫ [Y(X) − Ŷ(W, X)]² dX,    (1)

the regularization constraint of order r,

E_R = ∫ [∂^r Ŷ(W, X) / ∂X^r]² dX,    (2)
and the condition of the system,

E_W = ||W||² / Σ_i |g(x_i)|²,    (3)
are the three dominant factors that must be accounted for in designing a robust estimation system.

Distributed Approximating Functionals (DAFs), which can be constructed as a window-modulated interpolating shell, were introduced previously as a powerful grid method for numerically solving partial differential equations with extremely high accuracy and computational efficiency [3,9,10]. In this paper, we use the DAF approximation scheme to implement a neural network. Compared with other popular networks, DAF nets possess several advantages: (1) a DAF wavelet is infinitely smooth in both the time and frequency domains; (2) for essentially arbitrary order of the Hermite polynomial, the DAF shells possess an approximately constant shape, while commonly used wavelet functions become more oscillatory as the regularization order is increased; (3) the translation invariance of the DAF approximation ensures feature preservation in state space, so the signal processing analysis can be implemented directly in the space spanned by the DAFs; (4) complicated mathematical operations, such as differentiation and integration, can be carried out conveniently using the DAF interpolating shell; (5) the identical smoothness of the DAF wavelet space and the DAF state space underlies the inherent robustness of the DAF wavelet nets.
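As a concrete illustration (not part of the original paper), the three design factors of Eqs. (1)-(3) can be written in discrete form on sampled data; here the r-th derivative in Eq. (2) is replaced by an r-th order finite difference, and the argument names are hypothetical stand-ins for the symbols in the text:

```python
import numpy as np

def cost_terms(y, y_hat, w, g, r=2):
    """Discrete analogues of Eqs. (1)-(3).

    y     : observed signal samples Y(X)
    y_hat : system output samples Y^(W, X)
    w     : weight vector W of the net
    g     : sampled signal values g(x_i)
    r     : regularization order
    """
    # Eq. (1): least mean square (LMS) error E_A
    E_A = np.sum((y - y_hat) ** 2)
    # Eq. (2): regularization constraint E_R of order r,
    # with the r-th derivative approximated by an r-th difference
    E_R = np.sum(np.diff(y_hat, n=r) ** 2)
    # Eq. (3): condition of the system E_W = ||W||^2 / sum_i |g(x_i)|^2
    E_W = np.sum(np.abs(w) ** 2) / np.sum(np.abs(g) ** 2)
    return E_A, E_R, E_W
```

A robust design trades these three terms off against each other, as the complete cost function of Section II makes explicit.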
II. REGULARIZED DAF WAVELET NETS

In general, signal filtering may be regarded as a special approximation problem with noise suppression. According to DAF polynomial theory, a signal approximation in DAF space can be expressed as

ĝ(x) = Σ_i g(x_i) δ_α(x − x_i)    (4)
where δ_α(x) is a generalized symmetric delta functional. We choose it as a Gauss-modulated interpolating shell, the so-called distributed approximating functional (DAF) wavelet. The Hermite-type DAF wavelet is given by the following equation [10].
δ_M(x|σ) = (1 / (σ√(2π))) exp(−x² / 2σ²) Σ_{n=0}^{M/2} (−1/4)^n (1/n!) H_{2n}(x / (√2 σ))    (5)
The function H_{2n} is the Hermite polynomial of even order 2n. The qualitative behavior of one particular Hermite DAF is shown in Fig. 1. The DAF wavelet neural nets take a form analogous to the usual DAF approximation,

ĝ(x) = Σ_i w(i) δ_α(x − x_i)    (6)
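The Hermite DAF shell of Eq. (5) and the approximation of Eqs. (4)/(6) can be sketched as follows. This is a minimal illustration assuming a uniform grid; the grid spacing Δ appears explicitly as a quadrature weight, which Eq. (4) absorbs into the functional:

```python
import math
import numpy as np
from numpy.polynomial.hermite import hermval

def daf_kernel(x, sigma, M):
    """Hermite DAF delta_M(x | sigma) of Eq. (5):
    (1/(sigma*sqrt(2*pi))) * exp(-x^2/(2*sigma^2))
        * sum_{n=0}^{M/2} (-1/4)^n (1/n!) H_{2n}(x/(sqrt(2)*sigma)).
    """
    x = np.asarray(x, dtype=float)
    # physicists' Hermite series with only even terms:
    # c[2n] = (-1/4)^n / n!
    c = np.zeros(M + 1)
    for n in range(M // 2 + 1):
        c[2 * n] = (-0.25) ** n / math.factorial(n)
    pre = np.exp(-x**2 / (2.0 * sigma**2)) / (sigma * np.sqrt(2.0 * np.pi))
    return pre * hermval(x / (np.sqrt(2.0) * sigma), c)

def daf_approx(x_eval, x_grid, g_grid, sigma, M):
    """Eq. (4)/(6): g^(x) = sum_i g(x_i) delta_M(x - x_i), with the
    uniform grid spacing entering as a quadrature weight."""
    x_eval = np.asarray(x_eval, dtype=float)
    dx = x_grid[1] - x_grid[0]
    K = daf_kernel(x_eval[:, None] - x_grid[None, :], sigma, M)
    return dx * K @ g_grid
```

With σ a few grid spacings wide and moderate M, this reproduces smooth test signals to high accuracy, reflecting the approximately constant shape of the DAF shell noted in the Introduction.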
The weights w(i) of the nets determine the superposition approximation ĝ(x) to the original signal g(x) ∈ L²(R). It is easy to show that the weights w(i) of the approximation nets are closely related to the DAF sampling coefficients g(x_i). The irregular finite discrete time samples of the original signal are selected for network learning. If the observed signal is limited to an interval I containing a total of N discrete samples, I = {0, 1, ..., N−1}, the square error of the signal is digitized according to

E_A = Σ_{n=0}^{N−1} [g(n) − ĝ(n)]²    (7)
This cost function is commonly used for neural network training in a noise-free background and is referred to as the minimum mean square error (MMSE) rule. However, if the observed signal is corrupted by noise, the network produced by MMSE training yields an unstable reconstruction, because MMSE recovers the noise components as well as the signal. In this case, the signal-to-noise ratio (SNR) cannot be improved much. Even for a noise-free signal, MMSE may lead to Gibbs-like undulations in the signal, which is harmful for calculating accurate derivatives. Thus, for more robust filtering, the network structure should be modified to deal with the particular situation. In this paper, we present a novel regularization design of the cost function for network training. It generates edge-preserving filters and reduces distortion. To define the regularity (smoothness) of a signal, we introduce a "Lipschitz index" [6].

Definition: Let f(x) ∈ L²(R). For any α > 0, if

|f(x) − f(y)| = O(|x − y|^α),    (8)

the signal f(x) is said to be uniformly Lipschitz in the space L²(R). The constant α is the Lipschitz index of f(x).
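To make Eq. (8) concrete, the Lipschitz index can be estimated numerically from the slope of log|f(x₀+h) − f(x₀)| against log h. This is a rough sketch, not a method from the paper; on noisy data the estimated slope drops, which is exactly the loss of regularity described in the text:

```python
import numpy as np

def lipschitz_estimate(f, x0, scales):
    """Estimate the Lipschitz index alpha at x0 from
    |f(x0 + h) - f(x0)| = O(h^alpha), Eq. (8), via a log-log fit."""
    h = np.asarray(scales, dtype=float)
    d = np.abs(f(x0 + h) - f(x0))
    # slope of log|increment| versus log(h) approximates alpha
    slope, _ = np.polyfit(np.log(h), np.log(d), 1)
    return slope
```

For f(x) = |x|^(1/2) at x₀ = 0 the fit recovers α ≈ 0.5, while a differentiable (linear) function gives α ≈ 1, matching the equivalence between integer Lipschitz index and differentiability noted below.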
It is easy to show that when the Lipschitz index α is an integer, Lipschitz regularity is equivalent to differentiability of f(x) of the same order. For commonly used signals, the Lipschitz index satisfies α > 0. In the presence of noise distortion, the Lipschitz index always satisfies α < 1.
The predominant advantage of the Hermite polynomial approximation is its high-order derivative preservation (which leads to a smooth approximation). To increase the stability of the approximation system further, an additional constraint in state space is taken to be

E_W = Σ_i |w(i)|² / Σ_i |g(x_i)|²    (13)
Thus the complete cost function utilized for DAF wavelet net training is given by

E = E_A + λE_r + ηE_W
  = Σ_k [g(k) − ĝ(k)]² + λ ∫_R [∂^r ĝ(x) / ∂x^r]² dx + η Σ_i |w(i)|² / Σ_i |g(x_i)|²    (14)
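Because ĝ(x) = Σ_i w(i) δ_α(x − x_i) is linear in the weights, Eq. (14) is quadratic in w and its minimizer solves a regularized normal equation. The sketch below illustrates this training step under stated assumptions: a Gaussian shell stands in for the DAF wavelet, and an r-th finite difference stands in for the derivative in E_r; λ and η are the trade-off parameters of Eq. (14):

```python
import numpy as np

def train_daf_net(x, g, sigma=0.5, lam=1e-2, eta=1e-3, r=2):
    """Minimize Eq. (14), E = E_A + lam*E_r + eta*E_W, for the linear
    net g^(x) = sum_i w(i)*delta(x - x_i).  A Gaussian shell is used
    here as a stand-in for the DAF wavelet.  Since E is quadratic in
    the weights w, the minimum solves
        (K'K + lam*(DK)'(DK) + (eta/sum g^2) I) w = K' g.
    """
    N = len(x)
    dx = x[1] - x[0]
    # kernel matrix K[k, i] = shell(x_k - x_i)
    K = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2.0 * sigma**2))
    # r-th finite-difference operator approximating d^r/dx^r
    D = np.diff(np.eye(N), n=r, axis=0) / dx**r
    A = K.T @ K + lam * (D @ K).T @ (D @ K) + (eta / np.sum(g**2)) * np.eye(N)
    w = np.linalg.solve(A, K.T @ g)
    return w, K @ w  # weights and filtered (reconstructed) signal
```

The λ term suppresses the high-curvature (noise) components of the output while the η term keeps ||w|| small, giving the stable, smoothed reconstruction that the MMSE rule alone, Eq. (7), cannot provide.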
III. SIMULATIONS

Two biomedical signal processing applications (for electrocardiogram and electromyography) of the DAF wavelet neural nets are presented in this section. Automatic diagnosis of electrocardiogram (ECG or EKG) signals is an important biomedical analysis tool. The diagnosis is based on the detection of abnormalities in an ECG signal. ECG signal processing is a crucial step for obtaining a noise-free signal and for improving diagnostic accuracy. A typical raw ECG signal is given in Fig. 2. The letters P, Q, R, S, T and U label the medically interesting features. For example, in the normal sinus rhythm of a 12-lead ECG, a QRS peak follows each P wave. Normal P waves rate 60-100 bpm with