Predicting Time Series with Wavelet Packet Neural Networks

Lipo Wang, Kok Keong Teo, and Zhiping Lin
School of Electrical and Electronic Engineering, Nanyang Technological University, Block S2, Nanyang Avenue, Singapore 639798
{elpwang, ~730988le, ezplin}@ntu.edu.sg

Abstract

Inspired by both the multilayer perceptron (MLP) and wavelet decomposition, Zhang and Benveniste proposed the wavelet MLP (W-MLP), which has been used for time series prediction. The wavelet packet MLP (WP-MLP) is an MLP with the wavelet packet as a feature extraction method to obtain time-frequency information. The WP-MLP has been successfully applied to biomedical, image, and speech classification. In this paper, we demonstrate that the WP-MLP can be a very promising approach to time series prediction.


1 Introduction

There have been extensive applications of MLP neural networks to time-series prediction (e.g., [1], [20]-[23]). Neural networks are supposed to emulate the brain, which is powerful, flexible, and efficient. However, conventional neural networks process signals only at their finest resolution. Wavelet decomposition [2]-[6] produces a good local representation of the signal in both the time and the frequency domains. Inspired by both the MLP and wavelet decomposition, Zhang and Benveniste [7] proposed the wavelet neural network, which allows for hierarchical, multi-resolution learning of input-output maps from data. Geva [8] used the wavelet neural network for time series prediction and demonstrated its excellent performance.

Wavelet packets were introduced by Coifman, Meyer, and Wickerhauser [9]. Recursive splitting of vector spaces is represented by a binary tree. The wavelet packet transform is used to produce time-frequency atoms. These atoms provide both time and frequency information with varying resolution throughout the time-frequency plane. The time-frequency atoms can be expanded in a tree-like structure to create an arbitrary tiling, which is useful for signals with complex structures. A tree algorithm [10] can be employed to compute the wavelet packet transform, with the wavelet coefficients as filter coefficients.
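As an illustration only, such a tree-structured wavelet packet decomposition can be computed with the PyWavelets package; the library, the Daubechies-2 filters, and the three-level depth in this sketch are assumptions chosen to mirror the "Db2" packets used later in the simulations, not code from the paper.

import numpy as np
import pywt   # PyWavelets, assumed for this illustration

# A toy signal standing in for a segment of the time series.
x = np.sin(2 * np.pi * 0.05 * np.arange(64)) + 0.1 * np.random.randn(64)

# Full binary wavelet packet tree with Daubechies-2 filters, three levels deep.
wp = pywt.WaveletPacket(data=x, wavelet='db2', mode='symmetric', maxlevel=3)

# The nodes at the deepest level are the time-frequency atoms: each one holds
# the coefficients of one frequency sub-band, ordered here from low to high frequency.
for node in wp.get_level(3, order='freq'):
    print(node.path, len(node.data))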

The wavelet packet multi-layer perceptron (WP-MLP) neural network is an MLP with the wavelet packet decomposition as a feature extraction method to obtain time-frequency information. The WP-MLP has been successfully applied to the classification of biomedical signals, images, and speech.

In this paper, we apply the WP-MLP neural network to time series prediction. The paper is organized as follows. In Section 2, we briefly review the WP-MLP. In Section 3, we present simulation results of time series prediction for three benchmarking cases, i.e., the sunspot, laser, and Mackey-Glass time series. We also discuss the effects of various weight initialization algorithms on the prediction results. Section 4 presents the conclusion.

2 Wavelet Packet MLP

The WP-MLP consists of three parts: (1) an input tapped delay line with p delays, (2) a wavelet packet decomposition, and (3) an MLP. The output x̂(n + 1) is the value of the time series at time n + 1 and is assumed to be a function of the values of the time series at the p previous time steps.


The input signal is decomposed by wavelet packets and then fed into a conventional MLP with k input units, one hidden layer with m sigmoid neurons, and one linear output neuron. The architecture of the WP-MLP is



defined as [p : l(wlet) : h], where p is the number of tapped delays, l is the number of decomposition levels, wlet is the type of wavelet packet used, and h is the number of hidden neurons in the MLP.
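A minimal sketch of how such a front end might be assembled, assuming NumPy and PyWavelets; the function names, the window indexing, and the default [14:3(Db2):h] setting are illustrative choices, not the authors' implementation:

import numpy as np
import pywt   # PyWavelets, assumed here; the paper does not name an implementation

def wp_features(window, wavelet='db2', level=3):
    """Wavelet packet coefficients of one tapped-delay window, concatenated
    over all sub-bands at the chosen decomposition level."""
    wp = pywt.WaveletPacket(data=window, wavelet=wavelet, maxlevel=level)
    return np.concatenate([node.data for node in wp.get_level(level, order='freq')])

def make_dataset(series, p=14, wavelet='db2', level=3):
    """Build (feature, target) pairs: the p most recent samples are decomposed
    by wavelet packets, and the network predicts the next sample."""
    X, y = [], []
    for n in range(p - 1, len(series) - 1):
        window = series[n - p + 1:n + 1]      # x(n-p+1), ..., x(n): the tapped delays
        X.append(wp_features(window, wavelet, level))
        y.append(series[n + 1])               # one-step-ahead target x(n+1)
    return np.array(X), np.array(y)

# Example: features and targets for a toy series with a [14:3(Db2):h] front end.
X, y = make_dataset(np.sin(0.1 * np.arange(600)))
print(X.shape, y.shape)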

3 Simulation Results

In this section, we first describe the various weight initialization algorithms used in the simulations. We then present simulation results on three benchmark time series.

3.1 Neural network weight initialization

Kolen and Pollack [12] showed that a feedforward network trained with the backpropagation technique is very sensitive to the initial weight selection. Prototype patterns [18] and the orthogonal least squares algorithm [19] can be used to initialize the weights. The initialization of the weights and biases has a great impact on the network training time and generalization performance. Usually, the weights and biases are initialized to small random values. If the random initial weights happen to be far from a good solution, or if they are near a poor local optimum, training may take a long time or become trapped in the local optimum [8]. Proper weight initialization places the weights close to a good solution, which reduces training time and increases the possibility of reaching a good solution. In this subsection, we describe methods of weight initialization based on clustering algorithms.

Geva [8] proposed to initialize the weights by a clustering algorithm based on mean local density (MLD). He showed that this method easily leads to good performance, whereas random weight initialization leads to a wide variety of different results, many of them poor. However, we note that the best result from random weight initialization was much better than the result obtained with the MLD initialization method.

Following Geva [8], we use a time-frequency event matrix defined by combining the time-frequency patterns and their respective targeted outputs of the training data. Each column is made up of a time-frequency pattern and its respective target. Clustering analysis [13] is the organization of a collection of patterns into clusters based on similarity. Groupings of the time-frequency events are thus revealed. Member events within a cluster are more similar to each other than they are to events belonging to a different cluster. The number of clusters is then chosen to be the number of neurons in the hidden layer of the WP-MLP. The number of clusters may be chosen before clustering, or may be determined by the clustering algorithm used. In the present work, the hierarchical tree clustering algorithm [14] and a competitive learning algorithm [15] are used. The hierarchical tree clustering algorithm consists of the following steps:

1. Compute the Euclidean distance between every pair of events in the data set.
2. Group the events into a binary, hierarchical cluster tree by linking together pairs of events that are in close proximity. As the events are paired into binary clusters, the newly formed clusters are grouped into larger clusters until a hierarchical tree is formed.
3. Cut the hierarchical tree to form a certain number of clusters chosen based on some prior knowledge.
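A rough sketch of these three steps using SciPy's agglomerative clustering routines; the library calls, the single-linkage criterion, and the synthetic event matrix are assumptions of this illustration, since the paper does not specify an implementation:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Time-frequency events; here each ROW is one event so that SciPy's conventions apply.
events = np.random.randn(200, 15)      # e.g. wavelet-packet features plus the target (illustrative)

# Steps 1-2: pairwise Euclidean distances and agglomerative linking of the
# closest events into a binary, hierarchical cluster tree.
tree = linkage(events, method='single', metric='euclidean')

# Step 3: cut the tree so that a pre-chosen number of clusters remains.
labels = fcluster(tree, t=8, criterion='maxclust')     # eight clusters, as in Section 3

# Cluster centroids, later used to initialize the hidden-layer weights.
centroids = np.array([events[labels == c].mean(axis=0) for c in np.unique(labels)])
print(centroids.shape)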

Let us consider a competitive learning network consisting of a layer of N_c competitive neurons and a layer of N + 1 input nodes, N being the dimension of the input pattern. We have

h_k = Σ_j w_kj x_j ,

where w_kj is the weight connecting neuron k to input j, h_k is the total input into neuron k, and x_j is element j of the input pattern. If neuron k has the largest total input, it wins the competition and becomes the sole neuron to respond to the input pattern. The input patterns that win the competition at the same neuron are considered to be in the same cluster. The neuron that wins the competition has its weights adjusted as follows:

w_kj(new) = (1 − α(t)) w_kj(old) + α(t) x_j ,


where the learning constant α(t) is a function of time. The other neurons do not adjust their weights. The learning constant changes as follows to ensure that each training pattern has equal statistical importance, independently of the order of presentation [16][17]:


α(τ_k) = α(1) / [1 + (τ_k − 1) α(1)] ,

where τ_k ≥ 1 is the number of times neuron k has modified its weights, including the current update. At the end of training, the neurons that never had their weights modified are discarded. We thus find the number of clusters.
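The procedure described above can be sketched in NumPy as follows; the initial learning constant, the prototype initialization from random training patterns, and the array shapes are illustrative assumptions:

import numpy as np

def competitive_clustering(patterns, n_neurons=10, alpha1=0.5, epochs=1, rng=None):
    """Winner-take-all clustering with the decaying learning constant
    alpha(tau_k) = alpha(1) / (1 + (tau_k - 1) * alpha(1))."""
    rng = np.random.default_rng(rng)
    n_samples, dim = patterns.shape
    weights = patterns[rng.choice(n_samples, n_neurons, replace=False)].copy()
    wins = np.zeros(n_neurons, dtype=int)          # tau_k: updates made by neuron k

    for _ in range(epochs):
        for x in patterns:
            k = np.argmax(weights @ x)             # neuron with the largest total input wins
            wins[k] += 1
            alpha = alpha1 / (1.0 + (wins[k] - 1) * alpha1)
            weights[k] = (1.0 - alpha) * weights[k] + alpha * x   # update only the winner

    # Neurons that never won are discarded; the survivors define the clusters.
    return weights[wins > 0]

# Example: the number of rows returned is the number of clusters found.
prototypes = competitive_clustering(np.random.randn(300, 15), n_neurons=10)
print(prototypes.shape)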


Once the clusters are formed, we proceed to initialize the WP-MLP: the number of hidden neurons of the WP-MLP is chosen to be the same as the number of clusters, and the weights of each hidden neuron are assigned to be the centroid of the corresponding cluster.
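In sketch form, assuming cluster centroids computed as above and a plain NumPy parameterization of the network (the output-weight scaling is an arbitrary choice of this illustration):

import numpy as np

def init_hidden_layer(centroids, rng=None):
    """Hidden-layer weights set to the cluster centroids: one hidden neuron
    per cluster; biases start at zero and output weights at small random values."""
    rng = np.random.default_rng(rng)
    W_hidden = centroids.copy()                         # shape: (h, input_dim)
    b_hidden = np.zeros(centroids.shape[0])
    W_out = 0.1 * rng.standard_normal((1, centroids.shape[0]))
    return W_hidden, b_hidden, W_out

# Example with eight centroids (the cluster count used in Section 3); dimensionality is illustrative.
W_hidden, b_hidden, W_out = init_hidden_layer(np.random.randn(8, 15))
print(W_hidden.shape, b_hidden.shape, W_out.shape)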

Table 1. Test MSE of the WP-MLP with random initialization on the Mackey-Glass chaotic time series. * stands for wavelet decomposition. Network architectures: [14:1(Db2):8], [14:2(Db2):8], [14:3(Db2):8], *[14:3(Db2):8].

We have chosen three time series often found in the literature for benchmarking, i.e., the Mackey-Glass delay-differential equation, the yearly sunspot readings, and the laser time series. Each data set is divided into three portions for training, validation, and testing, respectively. The MLP is trained by the backpropagation algorithm using the Levenberg-Marquardt method [11]. When there is an increase in the validation error, training stops. In order to have a fair comparison, 100 independent simulations are carried out for each weight initialization method and each time series (weight initialization can differ from trial to trial, since the clustering result can depend on the random initial starting point for a cluster). We record the minimum error, the mean error, and the standard deviation over the 100 runs.
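The evaluation protocol can be sketched as follows, with a trivial linear model trained by gradient descent standing in for the WP-MLP and the Levenberg-Marquardt optimizer, which are not reproduced here; the data, learning rate, and epoch limit are placeholders:

import numpy as np

def run_once(X_tr, y_tr, X_va, y_va, X_te, y_te, rng, lr=0.01, max_epochs=500):
    """One run of a stand-in linear model, trained with early stopping:
    training halts as soon as the validation error increases."""
    w = rng.standard_normal(X_tr.shape[1]) * 0.1       # random initial weights
    best_val = np.inf
    for _ in range(max_epochs):
        grad = 2.0 * X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)
        w -= lr * grad
        val = np.mean((X_va @ w - y_va) ** 2)
        if val > best_val:                             # validation error increased: stop
            break
        best_val = val
    return np.mean((X_te @ w - y_te) ** 2)             # test MSE of this run

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 14))                    # placeholder features
y = X @ rng.standard_normal(14) + 0.1 * rng.standard_normal(1000)
splits = (X[:400], y[:400], X[400:500], y[400:500], X[500:], y[500:])

mses = [run_once(*splits, rng) for _ in range(100)]    # 100 independent runs
print(np.mean(mses), np.std(mses), np.min(mses))       # mean, std, and minimum test MSE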

[Table 1 columns: Test MSE — Mean | Std | Min. The numerical entries are not legible in this copy.]

Table 1 shows the results on prediction errors for the Mackey-Glass time series using different architectures of the WP-MLP, i.e., the mean, the standard deviation, and the minimum of the mean squared error (MSE) of the prediction over 100 simulations. Tables 2 and 3 show the results on prediction error for networks initialized by the hierarchical tree and competitive learning clustering algorithms, respectively.

Table 2. Test MSE of the WP-MLP initialized with the hierarchical tree clustering algorithm on the Mackey-Glass chaotic time series. * stands for wavelet decomposition.

3.2. Mackey-Glass chaotic time series


The Mackey-Glass time-delay differential equation is defined by

dx(t)/dt = a x(t − τ) / (1 + x^10(t − τ)) − b x(t),

where x(t) represents the concentration of blood at time t, when the blood is produced. In patients with different pathologies, such as leukemia, the delay time τ may become very large, and the concentration of blood may oscillate, becoming chaotic when τ ≥ 17. In order to test the learning ability of our algorithm, we choose the value τ = 17 so that the time series is chaotic. The values of a and b are chosen as 0.2 and 0.1, respectively. The training data sequence is of length 400, followed by validation and testing data sequences of length 100 and 500, respectively.
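For illustration, the series can be generated by simple Euler integration of this delay equation with the stated parameters; the step size, the constant initial history, and the sampling interval below are assumptions of this sketch:

import numpy as np

def mackey_glass(n_samples, a=0.2, b=0.1, tau=17, dt=1.0, x0=1.2):
    """Euler integration of dx/dt = a*x(t - tau)/(1 + x(t - tau)**10) - b*x(t)."""
    steps_per_sample = int(round(1.0 / dt))
    hist = int(round(tau / dt))
    x = np.full(hist + 1, float(x0))          # constant initial history
    out = []
    for n in range(n_samples * steps_per_sample):
        x_tau = x[-hist - 1]                  # delayed value x(t - tau)
        x_next = x[-1] + dt * (a * x_tau / (1.0 + x_tau ** 10) - b * x[-1])
        x = np.append(x, x_next)
        if (n + 1) % steps_per_sample == 0:
            out.append(x_next)
    return np.array(out)

series = mackey_glass(1000)
train, valid, test = series[:400], series[400:500], series[500:]   # 400 / 100 / 500 split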

Network architectures: [14:1(Db2):10], [14:2(Db2):10], [14:3(Db2):10], *[14:3(Db2):10].

Eight neurons in the hidden layer are chosen; this is the prior knowledge required by the hierarchical clustering algorithm to cut the tree, i.e., there are eight clusters. The competitive learning clustering algorithm is initially given ten neurons.

[Test MSE columns for Tables 2 and 3: Mean | Std | Min. The numerical entries are not legible in this copy.]

The hierarchical tree clustering algorithm provides consistent performance for the various network architectures, except for the network with wavelet decomposition. The competitive learning clustering algorithm leads to the lowest minimum MSE for the various network architectures.


Our results are comparable to those obtained by the ScaleNet [8]; however, ScaleNet may be regarded as a committee of sub-networks, and we expect that a committee of multiple WP-MLPs should further improve on the performance of a single WP-MLP.

Table 4. Comparisons between the WP-MLP results (with random weight initialization, WP-MLP/Rm, and weight initialization based on the hierarchical tree, WP-MLP/HT, and competitive learning, WP-MLP/CL, clustering algorithms) and other results on the Mackey-Glass time series found in the literature, i.e., the discrete-time backpropagation neural network (DTB), the infinite impulse response (IIR) neural network, a network with nonlinear preprocessing (MLP), and the time delay neural network (TDNN) with global feedback (TDNN-GF).

[Table 4 columns: Network | Test NMSE. The numerical entries are not legible in this copy.]

Table 5 shows that the WP-MLP provides superior prediction performance compared to many other approaches, including the discrete-time backpropagation neural network [20], the infinite impulse response (IIR) neural network [21], a network with nonlinear preprocessing [22], and the time delay neural network with global feedback [23].

3.3. Sunspot and laser time series

Sunspots are large blotches on the sun that are often larger in diameter than the earth. The yearly average of sunspot areas has been recorded since 1700. The sunspots of the years 1700 to 1920 are chosen as the training set, 1921 to 1955 as the validation set, while the test set is taken from 1956 to 1979. "Set A" of the Santa Fe competition [1] is a clean physics laboratory experiment exhibiting Lorenz-like chaotic behavior in an NH3 far-infrared laser. This time series consists of 1100 samples of amplitude fluctuations of the far-infrared laser, approximately described by three coupled nonlinear ordinary differential equations. Samples 1 through 900 are chosen as the training data, followed by validation and testing data sequences of length 100 and 100, respectively.
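In sketch form, the splits described above amount to simple array slicing; the placeholder arrays below merely stand in for the actual sunspot and laser data, which are not distributed with the paper:

import numpy as np

# Sunspot series: one value per year from 1700 onwards (placeholder values here;
# the real yearly averages must be loaded from an external source).
years = np.arange(1700, 1980)
sunspots = np.random.rand(len(years))
sun_train = sunspots[(years >= 1700) & (years <= 1920)]
sun_valid = sunspots[(years >= 1921) & (years <= 1955)]
sun_test = sunspots[(years >= 1956) & (years <= 1979)]

# Santa Fe competition "Set A" laser series: 1100 samples split 900 / 100 / 100.
laser = np.random.rand(1100)
laser_train, laser_valid, laser_test = laser[:900], laser[900:1000], laser[1000:1100]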

4 Conclusion

In this paper, we have demonstrated that the WP-MLP can be extremely useful for time series prediction. We used hierarchical tree and competitive learning clustering algorithms to initialize the WP-MLP. Usually the network weights are initialized to small random values, which may be far from a good solution. Our simulations show that proper initialization of the WP-MLP weights improves performance.

References

1. Weigend, A.S., Huberman, B.A., Rumelhart, D.E.: Predicting sunspots and exchange rates with connectionist networks. In: Nonlinear Modeling and Forecasting, SFI Studies in the Sciences of Complexity, Vol. 12 (1992) 395-432
2. Mallat, S.G.: Multifrequency channel decompositions of images and wavelet models. IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. 37, No. 12 (1989) 2091-2110
3. Mallat, S.G.: A theory for multiresolution signal decomposition: the wavelet representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 11, No. 7 (1989) 674-693
4. Vetterli, M., Herley, C.: Wavelets and filter banks: theory and design. IEEE Transactions on Signal Processing, Vol. 40, No. 9 (1992) 2207-2232
5. Mallat, S.G.: A Wavelet Tour of Signal Processing. Academic Press (1998)
6. Strang, G., Nguyen, T.: Wavelets and Filter Banks. Wellesley-Cambridge Press (1996)
7. Zhang, Q., Benveniste, A.: Wavelet networks. IEEE Transactions on Neural Networks, Vol. 3, No. 6 (1992) 889-898
8. Geva, A.B.: ScaleNet - multiscale neural-network architecture for time series prediction. IEEE Transactions on Neural Networks, Vol. 9 (1998) 1471-1482
9. Coifman, R.R., Meyer, Y., Wickerhauser, M.V.: Entropy-based algorithms for best basis selection. IEEE Transactions on Information Theory, Vol. 38, No. 2, Part 2 (1992) 713-718



10. Vetterli, M., Herley, C.: Wavelets and filter banks: theory and design. IEEE Transactions on Signal Processing, Vol. 40, No. 9 (1992) 2207-2232
11. Hagan, M.T., Menhaj, M.B.: Training feedforward networks with the Marquardt algorithm. IEEE Transactions on Neural Networks, Vol. 5, No. 6 (1994) 989-993
12. Kolen, J.F., Pollack, J.B.: Back propagation is sensitive to initial conditions. Tech. Rep. TR 90-JKBPSIC, Laboratory for Artificial Intelligence Research, Computer and Information Science Department (1990)
13. Jain, A.K., Murty, M.N., Flynn, P.J.: Data clustering: a review. ACM Computing Surveys, Vol. 31, No. 3 (1999) 264-323
14. Aldenderfer, M.S., Blashfield, R.K.: Cluster Analysis. Sage Publications (1984)
15. Nielsen, R.H.: Neurocomputing. Addison-Wesley Publishing Company (1990)
16. Wang, L.: On competitive learning. IEEE Transactions on Neural Networks, Vol. 8, No. 5 (1997) 1214-1217
17. Wang, L.: Oscillatory and chaotic dynamics in neural networks under varying operating conditions. IEEE Transactions on Neural Networks, Vol. 7, No. 6 (1996) 1382-1388
18. Denoeux, T., Lengelle, R.: Initializing back propagation networks with prototypes. Neural Computation, Vol. 6 (1993) 351-363
19. Lehtokangas, M., Saarinen, J., Kaski, K., Huuhtanen, P.: Initializing weights of a multilayer perceptron by using the orthogonal least squares algorithm. Neural Computation, Vol. 7 (1995) 982-999
20. Duro, R.J., Reyes, J.S.: Discrete-time backpropagation for training synaptic delay-based artificial neural networks. IEEE Transactions on Neural Networks, Vol. 10, No. 4 (1999) 779-789
21. McDonnell, J.R., Waagen, D.: Evolving recurrent perceptrons for time-series modeling. IEEE Transactions on Neural Networks, Vol. 5 (1994) 24-38
22. Chow, T., Leung, C.T.: Performance enhancement using nonlinear preprocessing. IEEE Transactions on Neural Networks, Vol. 7 (1996) 1039-1042
23. Haykin, S., Principe, J.: Making sense of a complex world. IEEE Signal Processing Magazine, Vol. 15, No. 3 (1998) 66-81
