Neural Comput & Applic (2005) 14: 36–44 DOI 10.1007/s00521-004-0439-7

ORIGINAL ARTICLE

Y. F. Sun · Y. C. Liang · W. L. Zhang · H. P. Lee · W. Z. Lin · L. J. Cao

Optimal partition algorithm of the RBF neural network and its application to financial time series forecasting

Received: 2 December 2002 / Accepted: 27 July 2004 / Published online: 18 September 2004 © Springer-Verlag London Limited 2004

Abstract A novel neural-network-based method of time series forecasting is presented in this paper. The method combines the optimal partition algorithm (OPA) with the radial basis function (RBF) neural network. OPA for ordered samples is used to perform the clustering for the samples. The centers and widths of the RBF neural network are determined based on the clustering. The difference of the objective functions of the clustering is used to adjust the structure of the neural network dynamically. Thus, the number of the hidden nodes is selected adaptively. The method is applied to stock price prediction. The results of numerical simulations demonstrate the effectiveness of the method. Comparisons with the hard c-means (HCM) algorithm show that the proposed OPA method possesses obvious advantages in the precision of forecasting, generalization, and forecasting trends. Simulations also show that the OPA–orthogonal least squares (OPA–OLS) algorithm, which combines OPA with the OLS algorithm, results in better performance for forecasting trends.

Y. F. Sun · Y. C. Liang
College of Computer Science and Technology, Jilin University, 10 Qian Wei Road, Changchun 130012, China

Y. C. Liang
Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China

Y. C. Liang (corresponding author) · H. P. Lee · W. Z. Lin · L. J. Cao
Institute of High Performance Computing, 1 Science Park Road, #01-01 The Capricorn, Singapore Science Park II, Singapore 117528, Singapore
E-mail: [email protected]

W. L. Zhang
Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China

Keywords Optimal partition algorithm · Financial time series · Ordered samples · Radial basis function · Neural network · Stock price prediction

1 Introduction

Artificial neural networks have received increasing attention in time series forecasting in recent years [1, 2]. Financial time series forecasting is one of the most challenging applications of modern time series forecasting. Because financial time series are inherently noisy, nonstationary, and deterministically chaotic, there is no complete information that could be obtained from the past behavior of financial markets to fully capture the dependency between the future price and the past records [3]. Therefore, the financial industry is becoming more and more dependent on advanced computer technologies in order to maintain competitiveness in a global economy. Neural networks represent an exciting technology with a wide scope for potential applications, and there is growing interest, both in the field of neural computing and in the financial world, in exploring the use of neural networks to forecast future changes in the prices of stocks, exchange rates, commodities, and other financial time series [4–11].

The radial basis function (RBF) neural network is a feedforward neural network with a simple structure that has a single hidden layer. It has far fewer training parameters than other feedforward neural networks: only the centers, the widths, and the weights between the hidden layer and the output layer [12]. There are two kinds of training methods for the RBF network. In the first class of methods, the centers, widths, and weights are trained separately. Iterative methods, such as gradient-based methods and iterative least squares methods, can be used to train the weights, while the centers and widths can be trained using different algorithms, such as the random selection algorithm, the Kohonen self-organizing
mapping algorithm, and the hard c-means (HCM) algorithm. In the second class of methods, all parameters are trained simultaneously. Methods of this kind include the hierarchical genetic algorithm, the supervised gradient descent algorithm, and the nearest neighbor clustering algorithm. Due to its advantages, such as short learning time and global optimization, the RBF neural network has been extensively applied to a wide range of fields, such as pattern recognition and signal processing. It has also been applied to the forecasting and modeling of multivariable systems like the stock market. Numerical simulations show that the RBF neural network is suitable for dealing with complex nonlinear time series such as stock prices, and that good forecasting results can be obtained by the RBF neural network model.

Most existing studies on time series forecasting based on neural network methods do not consider the ordered characteristics of the samples. Considering that the samples of a time series are ordered, we propose a model that combines an algorithm for partitioning ordered samples with the RBF neural network. In the novel model, Fisher's optimal partition algorithm (OPA) is employed to train the centers and widths of the RBF neural network [13]. In order to decrease the manual control needed during network training and to improve the adaptability of the neural network, a methodology of dynamically adjusting the number of hidden nodes is adopted in this paper. The objective function of each class is generated in the process of training the centers and widths of the RBF neural network, and the number of hidden nodes is adjusted by monitoring the difference of the objective functions. As an application of OPA to the RBF neural network, the proposed method is applied to financial time series forecasting. Simulated results on the prediction of stock prices demonstrate the effectiveness of the method. Comparisons show that the proposed OPA and OPA–OLS (orthogonal least squares) algorithms are much superior to the HCM and OLS algorithms, respectively, in the precision of forecasting and the trends of prediction.
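To make this structure concrete, the following sketch (in Python with NumPy; the function and variable names are our own, not taken from the paper) shows a single-hidden-layer RBF network with Gaussian hidden units and a linear output layer, together with a minimal gradient-based LMS update of the output weights while the centers and widths are held fixed, in the spirit of the first class of training methods described above.

```python
import numpy as np

def rbf_forward(X, centers, widths, weights):
    """Forward pass of a single-hidden-layer RBF network.

    X:       (n_samples, n_inputs) input patterns
    centers: (n_hidden, n_inputs)  RBF centers c_j
    widths:  (n_hidden,)           RBF widths sigma_j
    weights: (n_hidden,)           hidden-to-output weights
    Returns the scalar network output for each sample.
    """
    # Squared Euclidean distances between every sample and every center
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # Gaussian hidden-layer activations
    H = np.exp(-d2 / (2.0 * widths ** 2))
    # Linear output layer
    return H @ weights

def train_weights_lms(X, y, centers, widths, lr=0.01, epochs=1000):
    """Train only the output weights with gradient-based LMS updates,
    keeping the centers and widths fixed (lr and epochs are arbitrary
    illustrative values, not taken from the paper)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    H = np.exp(-d2 / (2.0 * widths ** 2))
    w = np.zeros(centers.shape[0])
    for _ in range(epochs):
        err = y - H @ w                 # residual on all samples
        w += lr * H.T @ err / len(y)    # LMS gradient step
    return w
```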

cluster represents one hidden node. The weights are trained using the gradient-based iterative least mean squares (LMS) algorithm. Five steps are included in the OPA employed in the RBF neural network in this paper.

2.1 Defining the diameter of the cluster

Let a cluster G(i, j) be {x_i, x_{i+1}, ..., x_j} (i ≤ j). If m > 2, then there exists a partition point i_m which enables Eq. 8 to reach its minimum, that is:

e[p(N, m)] = e[p(i_m − 1, m − 1)] + d(i_m, N)    (10)

Thus, the cluster G(i_m, N) = {i_m, i_m + 1, ..., N} is obtained and abbreviated as G_m. Then the partition is continued, and the next partition point i_{m−1} is to be found so that the objective function is minimized. If m − 1 = 2, then:

e[p(i_m − 1, m − 1)] = d(1, i_{m−1} − 1) + d(i_{m−1}, i_m − 1)    (11)

and the partition ends; otherwise:

e[p(i_m − 1, m − 1)] = e[p(i_{m−1} − 1, m − 2)] + d(i_{m−1}, i_m − 1)    (12)

Thus, the cluster G(i_{m−1}, i_m − 1) = {i_{m−1}, i_{m−1} + 1, ..., i_m − 1} is obtained and abbreviated as G_{m−1}. In general, for any k ∈ {1, 2, ..., m − 2}, we may find a partition point i_{m−k} which enables the objective function in Eq. 7 or 8 to reach its minimum. If m − k = 2, then:

e[p(i_{m−k+1} − 1, m − k)] = d(1, i_{m−k} − 1) + d(i_{m−k}, i_{m−k+1} − 1)    (13)

and the partition ends; otherwise:

e[p(i_{m−k+1} − 1, m − k)] = e[p(i_{m−k} − 1, m − k − 1)] + d(i_{m−k}, i_{m−k+1} − 1)    (14)

Thus, the cluster G(i_{m−k}, i_{m−k+1} − 1) = {i_{m−k}, i_{m−k} + 1, ..., i_{m−k+1} − 1} is obtained and abbreviated as G_{m−k}. When k takes all the integers in {1, 2, ..., m − 2} sequentially, all the partition points i_{m−k} (k = 1, 2, ..., m − 2) can be obtained, up to the last one, i_2, which enables Eq. 7 to reach its minimum:

e[p(i_3 − 1, 2)] = d(1, i_2 − 1) + d(i_2, i_3 − 1)    (15)

Thus, the cluster G(i_2, i_3 − 1) = {i_2, i_2 + 1, ..., i_3 − 1} is obtained and abbreviated as G_2. The last cluster is G(i_1, i_2 − 1) = {i_1, i_1 + 1, ..., i_2 − 1}, abbreviated as G_1. Thus, all the clusters G_1, G_2, ..., G_m are obtained, which is the expected optimal partition. This is an algorithm for partitioning ordered samples into m clusters under the condition that the objective function reaches its minimum. Based on this algorithm, all the samples are divided into the clusters represented by m hidden nodes without changing their order.

2.4 Determining centers and widths

In order to determine the values of the centers and widths of the RBF, a method similar to HCM is employed. The center of the RBF is the arithmetic mean of all sample points in each cluster. Assuming that hidden node j contains l sample points, which are {x_{i_1}, x_{i_2}, ..., x_{i_l}}, the center of the RBF corresponding to hidden node j is:

c_j = (1/l) Σ_{k=1}^{l} x_{i_k}    (16)

The width of the RBF in hidden node j is the arithmetic mean of the distances from all sample points in the cluster to the center, i.e.:

σ_j = (1/l) Σ_{k=1}^{l} dist(x_{i_k}, c_j)    (17)

where dist(x_{i_k}, c_j) is the distance from the sample point x_{i_k} to the center c_j.
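As an illustration of the ordered-sample partition and of the center and width computation in Eqs. 16 and 17, the following Python sketch (NumPy assumed; all function names are ours) partitions N ordered samples into m contiguous clusters by dynamic programming over the cluster diameters, recovers the clusters G_1, ..., G_m by backtracking the partition points, and then computes the centers and widths. It is a minimal reading of the algorithm as reconstructed here, not the authors' original implementation.

```python
import numpy as np

def cluster_diameter(X):
    """d(i, j): sum of squared distances of the ordered samples
    X[i..j] to their mean (the 'diameter' of cluster G(i, j))."""
    c = X.mean(axis=0)
    return ((X - c) ** 2).sum()

def optimal_partition(X, m):
    """Fisher's optimal partition of N ordered samples into m contiguous
    clusters, minimizing the total within-cluster diameter. Returns the
    clusters as (start, end) index pairs (0-based, end exclusive) and
    the minimal objective value e[p(N, m)]."""
    N = len(X)
    # Precompute d(i, j) for all i <= j (O(N^2) storage; fine for small N)
    d = np.zeros((N, N))
    for i in range(N):
        for j in range(i, N):
            d[i, j] = cluster_diameter(X[i:j + 1])

    # e[n][k] = minimal objective for partitioning X[0..n-1] into k clusters
    e = np.full((N + 1, m + 1), np.inf)
    split = np.zeros((N + 1, m + 1), dtype=int)
    for n in range(1, N + 1):
        e[n][1] = d[0, n - 1]                       # one cluster: d(1, n)
    for k in range(2, m + 1):
        for n in range(k, N + 1):
            # recursion: e[p(n, k)] = min_j { e[p(j-1, k-1)] + d(j, n) }
            for j in range(k, n + 1):
                cost = e[j - 1][k - 1] + d[j - 1, n - 1]
                if cost < e[n][k]:
                    e[n][k], split[n][k] = cost, j

    # Backtrack the partition points i_m, i_{m-1}, ..., i_2
    clusters, n = [], N
    for k in range(m, 1, -1):
        j = split[n][k]
        clusters.append((j - 1, n))                 # cluster G_k
        n = j - 1
    clusters.append((0, n))                         # cluster G_1
    clusters.reverse()
    return clusters, e[N][m]

def centers_and_widths(X, clusters):
    """Eqs. 16-17: center = cluster mean, width = mean distance to center."""
    centers, widths = [], []
    for (a, b) in clusters:
        c = X[a:b].mean(axis=0)
        centers.append(c)
        widths.append(np.linalg.norm(X[a:b] - c, axis=1).mean())
    return np.array(centers), np.array(widths)
```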

2.5 Determining the number of hidden nodes

In most traditional algorithms, such as HCM, the number of hidden nodes is predetermined, which restricts the real applications of the algorithms. In order to reduce this manual intervention, we propose a novel approach in which the objective function is used to determine the number of hidden nodes. Note that e[p(N, i)] (1 ≤ i ≤ N) is a monotonically decreasing function with respect to i. Therefore, we may use the difference of the objective functions:

diff(i) = e[p(N, i)] − e[p(N, i + 1)]    (1 ≤ i < N)    (18)

to determine the number of hidden nodes. When the difference diff(i) reaches a given threshold, the corresponding clustering number i in e[p(N, i)] is taken as the number of hidden nodes. For a given threshold, the number of hidden nodes can thus be determined during network training and the network structure can be adjusted dynamically.
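A minimal sketch of this selection rule, assuming the objective values e[p(N, i)] have already been computed (for example, read off the dynamic-programming table of the optimal_partition sketch above), might look as follows; the threshold is a user-chosen value, as in the paper.

```python
def select_hidden_nodes(objective, threshold):
    """Eq. 18: objective[i-1] holds e[p(N, i)] for i = 1..len(objective),
    a monotonically decreasing sequence. Return the smallest i whose drop
    diff(i) = e[p(N, i)] - e[p(N, i+1)] has fallen to the given threshold;
    that i is used as the number of hidden nodes."""
    for i in range(1, len(objective)):
        if objective[i - 1] - objective[i] <= threshold:
            return i
    return len(objective)
```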


3 Stock price prediction using RBF with OPA

In order to examine the effectiveness of the proposed algorithm based on OPA, we apply it to stock price
forecasting. The data used are indexes from stock exchanges: the Standard and Poor's 500 (S&P 500) index and the composite index of the Shanghai Stock Exchange (CISSE). These are two groups of data with different characteristics. The S&P 500 data cover the time period from 01/04/1993 up to 12/31/1995 and possess obviously asymmetric characteristics [3], so the data need to be preprocessed. The original S&P 500 data are transformed into a relative difference in percentage of price (RDP), here a 5-day RDP (RDP-5). There are some advantages to applying this transformation, as mentioned by Thomason [16, 17]; the most prominent is that the distribution of the transformed data becomes more symmetrical and follows a normal distribution more closely, which improves the predictive power of the neural network. The RDP-5 is defined as:

(RDP-5)_i = 100 (x_i − x_{i−5}) / x_{i−5}    (19)

For the RDP-5 of the S&P 500, the following equation is used to process the data:

x'_i = mn + 2·std,  if x_i > mn + 2·std
x'_i = mn − 2·std,  if x_i < mn − 2·std    (20)
x'_i = x_i,         otherwise

where mn represents the mean of the samples and std is their standard deviation. The data are then scaled to the interval (0, 1).

The data of the CISSE used in this paper are those of 353 continuous trading days starting from 01/04/1999. A different and simpler preprocessing approach is employed here. The normalization of the data is performed first. Denote the stock price series as x_i (i = 1, 2, ..., 353) and let x_max = max_{1≤i≤353}(x_i).

Denote the normalized data as x'_i (i = 1, 2, ..., 353), where x'_i = x_i / x_max. For the above two groups of data, the number of input nodes is taken as 3; that is, a historical lag of order 3 is considered in the simulation. The number of output nodes is 1; that is, single-step prediction is performed. The original S&P 500 and CISSE data are first formed into 697 and 350 input–output data pairs, respectively. Then, the data are divided into two parts to form the training data set and the test data set.
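The preprocessing described above can be sketched as follows (Python/NumPy; the min–max scaling used to map the clipped RDP values into (0, 1) is our assumption, since the paper does not state the exact scaling, and the raw price series variable names are hypothetical).

```python
import numpy as np

def rdp5(prices):
    """Eq. 19: 5-day relative difference in percentage of price."""
    prices = np.asarray(prices, dtype=float)
    return 100.0 * (prices[5:] - prices[:-5]) / prices[:-5]

def clip_and_scale(x):
    """Eq. 20: clip outliers beyond mean +/- 2 std, then scale to (0, 1).
    (Min-max scaling is an assumption; the paper only says the data are
    mapped into the interval (0, 1).)"""
    mn, std = x.mean(), x.std()
    x = np.clip(x, mn - 2 * std, mn + 2 * std)
    lo, hi = x.min(), x.max()
    return (x - lo) / (hi - lo)

def make_pairs(series, lag=3):
    """Form input-output pairs: 'lag' past values -> next value."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[i:i + lag] for i in range(len(series) - lag)])
    y = series[lag:]
    return X, y

# S&P 500 (hypothetical variable sp500_close holding raw closing prices):
#   rdp = clip_and_scale(rdp5(sp500_close)); X, y = make_pairs(rdp, lag=3)
# CISSE (hypothetical variable cisse_close): normalize by the maximum
#   X, y = make_pairs(np.asarray(cisse_close) / np.max(cisse_close), lag=3)
```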

The error function is defined as:

E = sqrt( (1/(2N)) Σ_{i=1}^{N} (d_i − y_i)^2 )    (21)

where N is the size of the data set, d_i is the desired output, and y_i is the network output.

The proposed OPA algorithm is applied to the above two groups of stock price data. Table 1 lists comparative performance data for OPA applied to the S&P 500 and CISSE data. In the simulated experiments, the number of hidden nodes and the learning rate are determined by trial and error: the criterion for selecting the number of hidden nodes is to minimize the fitting error on the test data, and the criterion for selecting the learning rate is to make the iterative error reduction as successful as possible. Figures 1 and 2 show the comparison between the actual and predicted values when OPA is applied to the S&P 500 and CISSE data, respectively, where Fig. 1a, b illustrates the prediction curves using the training and test data for the S&P 500, and Fig. 2 gives the prediction curves using the training and test data for the CISSE data, with the shaded region indicating the test results.

In order to examine the effectiveness of OPA for prediction, a comparison with two other algorithms commonly used for training RBF neural networks, HCM and OLS [18–22], is performed. OPA and HCM are both clustering methods which restrict each point of the data set to exactly one cluster. The two methods use the same distance function and the same objective function of minimizing the sum of squares of the distances, but they differ in how the centers and widths of the RBF network are determined. Firstly, in HCM, all samples have to be assigned to initial clusters, and this initialization may affect the optimization process, whereas in OPA no initialization is necessary. Secondly, an iterative optimization method is used in HCM to minimize the sum of squares of distances, while in OPA, the clustering is obtained by partitioning the samples based on the objective function.

Table 1 Comparison of algorithm performance and model parameters

                     OPA                    HCM                    OLS                    OPA–OLS
                     S&P 500    CISSE       S&P 500    CISSE       S&P 500    CISSE       S&P 500    CISSE
Hidden nodes         26         26          26         26          16         5           16         6
Iteration times      80,000     80,000      80,000     80,000      Null       Null        Null       Null
Learning rate        0.0024     0.26        0.1050     0.077       Null       Null        Null       Null
Training time (s)    118        59          120        56          11         2           14         7
Training error       0.003376   0.000587    0.003528   0.001090    0.003406   0.000844    0.003323   0.000670
Test error           0.004440   0.000942    0.004573   0.007969    0.004494   0.000925    0.004393   0.000729

Fig. 1a, b Prediction effect of OPA to S&P 500 using a training data and b test data

Fig. 2 Prediction effect of OPA to CISSE

Lastly, in HCM, all the samples are regarded as having the same status and the ordered characteristics in the financial time series are not considered, whereas in OPA, the ordered characteristics are considered sufficiently, as stated earlier. Furthermore, OPA and HCM differ in determining the number of hidden nodes. In HCM, the number of hidden nodes is predetermined before the optimization process, whereas in OPA, the number of hidden nodes can be determined by the objective functions during the optimization process.

In the simulation of real financial time series forecasting, the comparison can be made in various aspects. In the comparison, the same input–output data pairs and the same numbers of input and output nodes are used, and the same preprocessing is performed on the original data. Numerical simulation results of HCM on the S&P 500 and CISSE data are shown in Table 1, and Fig. 3a, b shows the comparison between the actual and predicted values for the S&P 500 data. It can be seen that OPA is superior to HCM in both the numerical results and the simulation figures. Better clustering results can be obtained by applying OPA to the CISSE data. Compared with HCM, to obtain the same forecasting precision, OPA needs only a few hidden nodes whereas HCM needs more. For example, OPA with 11 hidden nodes can obtain almost the same results as HCM with 130 hidden nodes, while the running time of HCM is ten times that of OPA. In the simulation, the number of hidden nodes used in OPA is obtained through training, whereas the number of hidden nodes has to be selected by trial and error in HCM.

Fig. 3a, b Prediction effect of HCM to S&P 500 using a training data and b test data


When the same number of hidden nodes is used in the two algorithms, the simulation error of HCM is much greater than that of OPA. In forecasting the CISSE data, we use the same model as that used for the prediction of the S&P 500 data.

Fig. 4 Prediction effect of HCM to CISSE

Fig. 5a–d Behavior of cost function for S&P 500 using a training data and b test data, and CISSE using c training data and d test data

Figure 4 shows the forecasting curves when applying HCM to the CISSE data. Compared with Fig. 2, it can be seen that the predictions of OPA are much better than those of HCM. Especially in the later part of the test results, HCM shows obvious distortion, whereas OPA still possesses good fitting precision. This shows that the generalization performance of OPA is much better than that of HCM. The numerical results of Table 1 also demonstrate this: the training error of OPA is half that of HCM and the test error of OPA is seven times smaller than that of HCM.

In order to compare the two algorithms further, we also examine the variation of the error function defined in Eq. 21 during training. The behavior of the two algorithms is shown in Fig. 5a, b, giving the training and test curves for the S&P 500 data, respectively, and in Fig. 5c, d, giving the training and test curves for the CISSE data, respectively. From Fig. 5, it can be seen that for the S&P 500 data, on both the test data and the training data, the value of the error function of OPA falls below that of HCM at about iteration steps 7,000–8,000, which shows that the convergence of OPA is superior to that of HCM. For the CISSE data, the superiority of the convergence of OPA is even more obvious: a minimum value can be reached by OPA using only about 100 steps, whereas the results using HCM are much worse.

Fig. 6a, b Prediction effect of OLS to S&P 500 using a training data and b test data

For the training data, HCM needs more than 10,000 steps to reach the corresponding minimum value, whereas for the test data, HCM cannot reach the corresponding minimum value, which shows, again, the advantages of OPA in improving the generalization performance.

OLS is another algorithm in which the number of hidden nodes can be adjusted during training. Considering that the adaptive selection of the hidden nodes is another important factor in examining the performance of an algorithm, we also compare OPA with OLS. In the OLS algorithm, each sample is first taken as a center, so that the number of hidden nodes equals the number of samples. When the number of samples is large, the number of hidden nodes should be predetermined, and the relevant samples are selected randomly and assigned to the RBF centers. Then, the outputs of the hidden nodes are orthogonalized, and the least squares method is used for training the weights. There is no iterative process in OLS, and during the training, the number of hidden nodes can be adjusted. There are some differences between OPA and OLS. For example, the centers and widths have to be predetermined in OLS, whereas in OPA, they are obtained by the partition algorithm. Furthermore, in OLS, the number of hidden nodes is determined based on the contribution of the candidate variables to the variance of the output, whereas in OPA it is obtained by using the objective functions.

Similarly to OPA, OLS is applied to predicting the S&P 500 and CISSE data. The criterion for selecting the hidden nodes is still to minimize the fitting error on the test data. Numerical simulation results for the two kinds of data are listed in Table 1, and the corresponding comparisons of the actual and fitting curves are given in Figs. 6 and 7. From Table 1, it can be seen that there is no obvious difference between OPA and OLS in the training and test precision, but OLS is superior to OPA in running time. Considering the characteristics of no pre-partition and short running time in OLS, we try combining OPA with OLS and call this modified algorithm OPA–OLS. In order to examine the effectiveness of the combined algorithm, we also apply OPA–OLS to the forecasting of stock prices. The criterion for selecting the hidden nodes is the same as before. Figures 8 and 9 show the forecasting effect of applying OPA–OLS to the S&P 500 and CISSE data, respectively, and Table 1 gives the numerical results. From Table 1, it can be seen that the forecasting results using OPA–OLS are obviously superior to those using OLS when the numbers of hidden nodes are equivalent in the two algorithms, while the running time increases very little in OPA–OLS. This demonstrates, again, the effectiveness of applying OPA to the RBF neural network.

In order to examine the improved methods further, we employ three statistical metrics to evaluate the prediction performance. The three metrics are directional symmetry (DS), correct up (CP) trend, and correct down (CD) trend [3].

Fig. 7 Prediction effect of OLS to CISSE
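For reference, a simplified forward-selection sketch in the spirit of the OLS algorithm of Chen et al. [19] is given below (Python/NumPy; the function names, the single shared width, and the fixed number of selected centers are our assumptions, and this is not the authors' exact OPA–OLS procedure). Every sample is a candidate center, and at each step the candidate whose regressor, orthogonalized against the columns already chosen, explains the largest share of the remaining output variance is added; the output weights are then refit by least squares.

```python
import numpy as np

def gaussian_design(X, centers, width):
    """Candidate regressor matrix: one Gaussian column per center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

def ols_select_centers(X, y, width, n_select):
    """Greedy forward selection of RBF centers (simplified OLS sketch)."""
    P = gaussian_design(X, X, width)          # all samples as candidates
    n = P.shape[1]
    selected, residual = [], y.astype(float).copy()
    w = np.zeros(0)
    for _ in range(n_select):
        best_j, best_gain = None, 0.0
        for j in range(n):
            if j in selected:
                continue
            p = P[:, j]
            if selected:
                # Remove the component lying in the span of the chosen columns
                Psel = P[:, selected]
                coef, *_ = np.linalg.lstsq(Psel, p, rcond=None)
                p = p - Psel @ coef
            denom = p @ p
            if denom < 1e-12:
                continue
            gain = (p @ residual) ** 2 / denom   # reduction in residual sum of squares
            if gain > best_gain:
                best_j, best_gain = j, gain
        if best_j is None:
            break
        selected.append(best_j)
        Psel = P[:, selected]
        w, *_ = np.linalg.lstsq(Psel, y, rcond=None)  # refit output weights
        residual = y - Psel @ w
    return X[selected], np.array(selected), w
```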

The metrics DS, CP, and CD provide the correctness of the predicted direction, the predicted up trend, and the predicted down trend of the time series data, respectively, in terms of percentage. The larger the values of the three performance metrics, the closer the predicted time series values are to the actual ones. The definitions of the three metrics are as follows:

DS = (100/n_1) Σ_{i=1}^{n_1} a_i    (22)

where a_i = 1 if (y_i − y_{i−1})(d_i − d_{i−1}) > 0 and a_i = 0 otherwise; n_1 is the number of training (test) samples when the training (test) is performed.

CP = (100/n_2) Σ_{i=1}^{n_2} a_i    (23)

where a_i = 1 if (y_i − y_{i−1}) > 0 and (y_i − y_{i−1})(d_i − d_{i−1}) > 0, and a_i = 0 otherwise; n_2 is the number of training (test) samples for which (y_i − y_{i−1}) > 0.

CD = (100/n_3) Σ_{i=1}^{n_3} a_i    (24)

where a_i = 1 if (y_i − y_{i−1}) < 0 and (y_i − y_{i−1})(d_i − d_{i−1}) > 0, and a_i = 0 otherwise; n_3 is the number of training (test) samples for which (y_i − y_{i−1}) < 0.

Fig. 8a, b Prediction effect of OPA–OLS to S&P 500 using a training data and b test data

Fig. 9 Prediction effect of OPA–OLS to CISSE
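A sketch of the error measure of Eq. 21 and of the three trend metrics as reconstructed in Eqs. 22–24 is given below (Python/NumPy; the handling of the first sample and of empty up/down sets is our choice, since the paper does not specify it).

```python
import numpy as np

def error_E(d, y):
    """Eq. 21: square root of the halved mean squared error between
    the desired outputs d and the network outputs y."""
    d, y = np.asarray(d, float), np.asarray(y, float)
    return np.sqrt(((d - y) ** 2).sum() / (2.0 * len(d)))

def trend_metrics(d, y):
    """Eqs. 22-24: directional symmetry (DS), correct up (CP) trend and
    correct down (CD) trend, in percent, as reconstructed above."""
    d, y = np.asarray(d, float), np.asarray(y, float)
    dy = np.diff(y)              # predicted (network output) changes
    dd = np.diff(d)              # actual (desired output) changes
    same_dir = dy * dd > 0       # direction predicted correctly
    ds = 100.0 * same_dir.mean()
    up, down = dy > 0, dy < 0
    cp = 100.0 * (same_dir & up).sum() / max(up.sum(), 1)
    cd = 100.0 * (same_dir & down).sum() / max(down.sum(), 1)
    return ds, cp, cd
```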

Table 2 Comparison of statistical metrics for training and test data

              OPA              HCM              OLS              OPA–OLS          Average
              S&P 500  CISSE   S&P 500  CISSE   S&P 500  CISSE   S&P 500  CISSE   S&P 500  CISSE
DS  Training  57.11    56.57   55.30    51.43   58.01    53.14   57.56    58.86   57.00    55.00
    Test      50.59    52.30   48.62    50.57   50.99    51.72   53.75    52.87   50.99    51.87
CP  Training  55.80    60.64   53.57    55.32   58.93    48.94   56.25    60.64   56.14    56.39
    Test      50.00    56.52   48.41    43.48   53.17    50.00   53.97    56.52   51.39    51.63
CD  Training  56.25    51.85   54.81    46.91   54.81    58.02   56.73    56.79   55.65    53.39
    Test      51.18    47.56   48.82    58.54   48.82    53.66   53.54    48.78   50.59    52.14

Table 3 Comparison of metrics over average (%)

          OPA   HCM   OLS   OPA–OLS
Training  23    0     31    46
Test      33    8     17    42
Total     28    4     24    44

4 Conclusions and discussions

A novel radial basis function (RBF) neural network method is presented and its effectiveness is demonstrated using financial forecasting. The centers and widths of the RBF neural network are determined based on the optimal partition algorithm (OPA). The number of hidden

nodes is selected according to the objective function in the OPA. Examination of the forecasting precision for stock prices and of the statistical metrics of trends shows that the OPA method is obviously superior to the hard c-means (HCM) algorithm in the precision of prediction, generalization, and the correctness of forecasting trends. The statistical results on the metrics of the prediction trends show that the proportion over the corresponding statistical average of OPA is seven times that of HCM. An improved algorithm is obtained by combining OPA with OLS; the proportion over the corresponding statistical average of OPA–OLS is more than 90%. Simulated experiments show that useful results are obtained by applying OPA to the RBF neural network and by applying the proposed method to financial forecasting. Some problems still need further study, such as how to improve the on-line learning performance so as to decrease the fitting error on the test data, and how to improve the selection method for the hidden nodes so as to overcome the dependence on the data; these require further theoretical and applied studies on OPA.

Acknowledgements The first two authors are grateful for the support of the NSFC.

References

1. Qi M, Zhang GP (2001) An investigation of model selection criteria for neural network time series forecasting. Eur J Operat Res 132(3):666–680
2. Liang YC, Wang Z, Zhou CG (1998) Application of fuzzy neural networks to the prediction of time series. Comput Res Dev 7:661–665
3. Cao LJ, Tay Francis EH (2001) Financial forecasting using support vector machines. Neural Comput Appl 10(2):184–192
4. Swingler K (1996) Financial prediction: some pointers, pitfalls and common errors. Neural Comput Appl 4(4):192–197
5. Burrell PR, Folarin BO (1997) The impact of neural networks in finance. Neural Comput Appl 6(4):193–200
6. Van Gestel T, Suykens JAK, Baestaens DE, Lambrechts A, Lanckriet G, Vandaele B, De Moor B, Vandewalle J (2001) Financial time series prediction using least squares support vector machines within the evidence framework. IEEE Trans Neural Netw 12(4):809–821
7. Sitte R, Sitte J (2000) Analysis of the predictive ability of time delay neural networks applied to the S&P 500 time series. IEEE Trans Syst Man Cybern Part C Appl Rev 30(4):568–572
8. Virili F, Freisleben B (2001) Neural network model selection for financial time series prediction. Comput Stat 16(3):451–463
9. Dorsey R, Sexton R (1998) The use of parsimonious neural networks for forecasting financial time series. J Comput Intell Finance 6(1):24–31
10. Tay Francis EH, Cao LJ (2001) Application of support vector machines in financial time series forecasting. Int J Manag Sci 29(4):309–317
11. Tay Francis EH, Cao LJ (2002) ε-descending support vector machines for financial time series forecasting. Neural Processing Lett 15(2):179–195
12. Haykin S (1999) Neural networks: a comprehensive foundation. Prentice-Hall, Englewood Cliffs, New Jersey
13. Fang KT (1989) Practical multivariate statistical analysis. East China Normal University Press, Shanghai
14. Spath H (1980) Cluster analysis algorithms for data reduction and classification of objects. Halsted Press, New York
15. Dillon WR, Goldstein M (1984) Multivariate analysis: methods and applications. Wiley, New York
16. Thomason M (1999) A basic neural network-based trading system: project revisited (parts 1 and 2). J Comput Intell Finance 7(3):36–45
17. Thomason M (1999) A basic neural network-based trading system: project revisited (parts 3 and 4). J Comput Intell Finance 7(4):35–45
18. Theodoridis S, Koutroumbas K (2003) Pattern recognition, 2nd edn. Elsevier, Amsterdam, The Netherlands
19. Chen S, Cowan CFN, Grant PM (1991) Orthogonal least-squares learning algorithm for radial basis function networks. IEEE Trans Neural Netw 2(2):302–309
20. Chen S (2002) Multi-output regression using a locally regularised orthogonal least-squares algorithm. IEE Proc Vis Image Signal Process 149(4):185–195
21. Chen S, Hong X, Harris CJ (2003) Sparse multioutput radial basis function network construction using combined locally regularised orthogonal least square and D-optimality experimental design. IEE Proc Control Theory Appl 150(2):139–146
22. Wu KL, Yang MS (2002) Alternative c-means clustering algorithms. Pattern Recognit 35(10):2267–2278
