Neural Comput & Applic (2008) 17:311–323 DOI 10.1007/s00521-007-0132-8

ORIGINAL ARTICLE

The application of ridge polynomial neural network to multi-step ahead financial time series prediction

R. Ghazali · A. J. Hussain · P. Liatsis · H. Tawfik

Received: 6 August 2006 / Accepted: 7 June 2007 / Published online: 17 July 2007
© Springer-Verlag London Limited 2007

Abstract Motivated by the slow learning properties of multilayer perceptrons (MLPs) which utilize computationally intensive training algorithms, such as the backpropagation learning algorithm, and can get trapped in local minima, this work deals with ridge polynomial neural networks (RPNN), which maintain fast learning properties and powerful mapping capabilities of single layer high order neural networks. The RPNN is constructed from a number of increasing orders of Pi–Sigma units, which are used to capture the underlying patterns in financial time series signals and to predict future trends in the financial market. In particular, this paper systematically investigates a method of pre-processing the financial signals in order to reduce the influence of their trends. The performance of the networks is benchmarked against the performance of MLPs, functional link neural networks (FLNN), and Pi–Sigma neural networks (PSNN). Simulation results clearly demonstrate that RPNNs generate higher profit returns with fast convergence on various noisy financial signals.

Keywords Ridge polynomial neural networks · Financial time series · Multilayer perceptrons · Functional link neural networks · Pi–Sigma neural networks

R. Ghazali (&) · A. J. Hussain
School of Computing and Mathematical Sciences, Liverpool John Moores University, Liverpool, UK
e-mail: [email protected]

P. Liatsis
School of Engineering and Mathematical Sciences, City University, London, UK

H. Tawfik
Intelligent and Distributed Systems (IDS) Laboratory, Liverpool Hope University, Liverpool, UK

1 Introduction

There have been a number of research advancements in the area of neural network applications; however, not all of them can be used in real-time commercial applications. This is normally because the size of the neural networks can be so large as to prevent the problem solution from being commercialized in the real world [1]. Furthermore, a large network size can slow down the training speed and its convergence. For these reasons, selecting the optimum network structure is very important: a network of size below the optimum will usually fail to approximate the underlying function, while a network of size above the optimum will have a large number of weights and can lead to over-fitting of the problem, which can result in poor generalization [2]. Multilayer perceptrons (MLPs) have been successfully applied in a broad class of financial market prediction tasks [3–6]. However, MLPs adopt computationally intensive training algorithms such as error backpropagation [2] and can get stuck in local minima. In addition, these networks have problems when dealing with large amounts of training data, and demonstrate poor interpolation properties when using reduced training sets. Higher order neural networks (HONNs), which have a single layer of trainable weights, can help speed up the training process. HONNs are a type of feedforward neural network with certain advantages over MLPs. They are simple in their architecture, and this potentially reduces the number of required training parameters. As a result, they can learn faster, since each iteration of the training procedure takes less time [7]. HONNs have been used in various applications such as image compression [8], time series prediction [9], system identification [10], function approximation [11], and pattern recognition [11–13].


Higher order terms or product units in HONNs can increase the information capacity of the networks. The representational power of higher order terms can help solve complex problems with significantly smaller network sizes, while maintaining fast learning properties [1]. This makes them suitable for solving complex problems where the ability to retrain or adapt to new data in real time is critical [7, 14, 15]. Functional link neural networks (FLNNs) [16] are a type of HONN which transform the nonlinear input space into a higher dimensional space where linear separability may be possible. However, they suffer from a combinatorial explosion in the number of weights when the order of the network becomes excessively high. Pi–Sigma neural networks (PSNNs) are HONNs which were introduced by Shin and Ghosh [11] to overcome the problem of weight explosion in FLNNs. They have a regular structure and require a smaller number of free parameters when compared to other single layer HONNs. However, the PSNN is not a universal approximator [17]. In this paper, we apply the ridge polynomial neural network (RPNN) as a nonlinear predictor structure for financial time series forecasting. The RPNN was originally proposed by Shin and Ghosh [17], who tested the network on surface fitting, data classification, and multivariate function approximation problems. RPNNs maintain the fast learning and powerful mapping properties of single layer HONNs and avoid the explosion of weights as the number of inputs increases. The networks have a well-regulated structure, which is achieved through the embedding of different orders of PSNNs, and they are universal approximators [17]. Motivated by the powerful capabilities of RPNN models in pattern recognition [18], image compression [8], function approximation [17, 19], and intelligent control [20], we present a novel application of the RPNN to financial time series forecasting. The RPNN, in this work, is used to forecast the upcoming trend of the relative difference in percentage of price (RDP) five days ahead. We forecast the RDP in a multi-step-ahead manner, where the term "multi-step" means that the forecast horizon is more than one step (i.e., one day) ahead.


2 Higher order neural networks

2.1 Functional link neural networks

The FLNN was first introduced by Giles and Maxwell [16]. It naturally extends the family of theoretical feedforward network structures by introducing nonlinearities through input pattern enhancements [21]. These enhancement nodes act as supplementary inputs to the network. The FLNN calculates the products of the network inputs at the input layer, while at the output layer the weighted sums of the inputs are calculated. Functional link neural networks can use higher order correlations of the input components to perform nonlinear mappings using only a single layer of units. Since the architecture is simpler, it reduces the computational costs during training, whilst maintaining good approximation performance [10]. A single node in the FLNN model can receive information from more than one node through one weighted link. The higher order weights, which connect the higher order terms of the input products to the upper nodes, simulate the interaction among several weighted links. For this reason, the FLNN can greatly enhance the information capacity, and complex data can be learnt [7, 10, 22]. Fei and Yu [23] showed that the FLNN has powerful approximation capabilities over conventional backpropagation networks, and that it is a good model for system identification [10]. Cass and Radl [7] used FLNNs in process optimization and found that FLNNs can be trained much faster than MLP networks without sacrificing computational capability. The FLNN can be structured to incorporate properties of invariance under geometric transformations [22]. Figure 1 shows an example of a third order FLNN with three external inputs, x1, x2, and x3, and four higher order inputs which act as supplementary inputs to the network. The output of the FLNN is determined as follows:

Y = \sigma\Big(W_0 + \sum_{j} W_j X_j + \sum_{j,k} W_{jk} X_j X_k + \sum_{j,k,l} W_{jkl} X_j X_k X_l + \cdots\Big)    (1)

where \sigma is a nonlinear transfer function and W_0 is the adjustable threshold. The FLNN suffers from a combinatorial explosion of the weights, which increase exponentially with the number of inputs. As a result, only second and third order functional link networks are considered in practice [13, 24].

Fig. 1 Functional link neural network (the inputs X1, X2, X3 and the higher order terms X1X2, X1X3, X2X3, X1X2X3 are connected through adjustable weights to a single summing unit, followed by a nonlinear transfer function \sigma)
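To make Eq. 1 concrete, the sketch below enumerates the product terms of a low order FLNN explicitly. It is a minimal illustration rather than the authors' implementation; the function name, the sigmoid transfer function and the dictionary-based weight layout are assumptions.

```python
import numpy as np
from itertools import combinations

def flnn_output(x, weights, bias, order=3):
    """Output of a third order functional link network (Eq. 1)."""
    s = bias                                              # adjustable threshold W0
    for k in range(1, order + 1):
        for idx in combinations(range(len(x)), k):        # terms X_j, X_j X_k, X_j X_k X_l, ...
            s += weights.get(idx, 0.0) * np.prod(x[list(idx)])
    return 1.0 / (1.0 + np.exp(-s))                       # sigmoid transfer function

# Three external inputs and the four supplementary product terms of Fig. 1
x = np.array([0.2, -0.1, 0.4])
w = {(0,): 0.5, (1,): -0.3, (2,): 0.1,                    # first order weights W_j
     (0, 1): 0.2, (0, 2): -0.4, (1, 2): 0.3,              # second order weights W_jk
     (0, 1, 2): 0.05}                                     # third order weight W_jkl
print(flnn_output(x, w, bias=0.1))
```

The number of weight entries grows combinatorially with the order, which is precisely the weight explosion discussed above.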


2.2 Pi–Sigma neural networks

The PSNN was introduced by Shin and Ghosh [11]. It is a feedforward network with a single "hidden" layer and product units at the output layer [25]. The PSNN uses the product of sums of the input components, instead of the sum of products as in FLNNs. In contrast to FLNNs, the number of free parameters in the PSNN increases linearly with the order of the network. The reduction in the number of weights, as compared to FLNNs, allows the PSNN to enjoy faster training. Ghosh and Shin [25] showed that the PSNN requires fewer adjustable weights than the FLNN for the same order and the same number of inputs and outputs. The structure of the PSNN avoids the problem of the combinatorial explosion of the higher order terms. The PSNN is able to learn in a stable manner even with fairly large learning rates [26]. In addition, the use of linear summing units makes the convergence analysis of the learning rules for PSNNs more accurate and tractable. Shin and Ghosh [26] investigated the applicability of the PSNN to shift, scale and rotation invariant pattern recognition. Results for function approximation and classification were encouraging when compared to backpropagation networks of similar performance. Ghosh and Shin [11] argued that the PSNN requires less memory and at least two orders of magnitude fewer computations than MLPs for similar performance levels, over a broad class of problems. Figure 2 shows a PSNN, whose output is determined according to the following equation:

Y = \sigma\Big(\prod_{j=1}^{K}\Big(\sum_{k=1}^{N} W_{kj} X_k + W_{j0}\Big)\Big)    (2)

where W_{kj} are the adjustable weights, W_{j0} are the biases of the summing units, X_k is the k-th component of the input vector, K is the number of summing units (alternatively, the order of the network), N is the number of input nodes, and \sigma is a nonlinear transfer function.
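Equation 2 can be illustrated with the short sketch below; it is an assumed, minimal implementation (the function name and the sigmoid transfer function are our choices, not taken from the paper).

```python
import numpy as np

def psnn_output(x, W, b):
    """Output of a K-th order Pi-Sigma network (Eq. 2).

    x : input vector of length N
    W : adjustable weights of shape (K, N), one row per summing unit
    b : biases W_j0 of the K summing units
    """
    h = W @ x + b                                # K linear summing units (single trainable layer)
    return 1.0 / (1.0 + np.exp(-np.prod(h)))     # fixed product unit + sigmoid transfer

# A second order PSNN with four inputs, weights drawn from [-0.5, 0.5]
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=4)
W = rng.uniform(-0.5, 0.5, size=(2, 4))
b = np.zeros(2)
print(psnn_output(x, W, b))
```

Note that the order K only adds one extra row of weights, which is why the number of free parameters grows linearly with the order.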

PSNNs have demonstrated competent abilities in solving scientific and engineering problems [11, 25, 26], despite not being universal approximators.

2.3 Ridge polynomial neural networks

The RPNN was introduced by Shin and Ghosh [17]. The network is constructed by gradually adding more complex PSNNs, denoted by P_i(x) in Eq. 3. RPNNs can approximate any multivariate continuous function defined on a compact set in multidimensional input space with an arbitrary degree of accuracy. Similar to the PSNN, the RPNN has only a single layer of adaptive weights, as shown in Fig. 3, and hence the network preserves all the advantages of the PSNN. Any multivariate polynomial can be represented in the form of a ridge polynomial and realized by the RPNN [17], whose output is determined according to the following equations:

f(x) \approx \sigma\Big(\sum_{i=1}^{N} P_i(x)\Big)

P_i(x) = \prod_{j=1}^{i}\big(\langle W_j, X\rangle + W_{j0}\big), \quad i = 1, \ldots, N    (3)

where \langle W_j, X\rangle is the inner product between the trainable weight matrix W and the input vector X, W_{j0} are the biases of the summing units in the corresponding PSNN units, N is the number of PSNN units used (or, alternatively, the order of the RPNN), and \sigma denotes a suitable nonlinear transfer function, typically the sigmoid transfer function. The RPNN provides a natural mechanism for incremental network growth, by which the number of free parameters is gradually increased with the addition of Pi–Sigma units of higher orders. The structure of the RPNN is highly regular, in the sense that Pi–Sigma units are added incrementally until an appropriate order of the network or a predefined error level criterion is achieved.
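A minimal sketch of the forward pass in Eq. 3 follows; the representation of the Pi–Sigma units as (weight matrix, bias vector) pairs and the sigmoid transfer function are assumptions made for illustration.

```python
import numpy as np

def rpnn_output(x, units):
    """Output of an RPNN (Eq. 3): the sum of Pi-Sigma units of order 1..N through a sigmoid.

    units : list of (W, b) pairs; the i-th pair has W of shape (i, d) and b of length i,
            realizing the Pi-Sigma unit P_i(x) = prod_j (<W_j, x> + W_j0).
    """
    total = sum(np.prod(W @ x + b) for W, b in units)
    return 1.0 / (1.0 + np.exp(-total))

# An order-3 RPNN on a 5-dimensional input
rng = np.random.default_rng(1)
d = 5
x = rng.uniform(-1.0, 1.0, size=d)
units = [(rng.uniform(-0.5, 0.5, size=(i, d)), np.zeros(i)) for i in range(1, 4)]
print(rpnn_output(x, units))
```

Growing the network amounts to appending one more (W, b) pair of the next order to the list, which mirrors the incremental growth mechanism described above.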

Fig. 2 Pi–Sigma neural network (the inputs X1, ..., XN feed a hidden layer of linear summing units h1, ..., hk through adjustable weights Wkj; the summing unit outputs are combined by a fixed-weight product unit and passed through a nonlinear transfer function)

Fig. 3 Ridge polynomial neural network (Pi–Sigma units PSNN1, ..., PSNNN of increasing order receive the inputs Xi, ..., Xd through adjustable weights; their outputs are summed with fixed weights and passed through a nonlinear output unit)


Tawfik and Liatsis [9] tested the RPNN in the one-step prediction of the Lorenz attractor and the sunspot time series. They demonstrated that the RPNN has a more regular structure and a superior performance in terms of speed and efficiency when compared to MLPs. Voutriaridis et al. [18] found that the RPNN could give satisfactory results with significantly high performance rates when used in function approximation and character recognition. Schmitt [27] demonstrated that the performance of the RPNN and PSNN as controller structures is competitive with that obtained using fuzzy logic excitation controller designs. Shin and Ghosh [17] tested the RPNN in a surface fitting problem, the classification of high dimensional data, and the realization of a multivariate polynomial function. They highlighted the capabilities of the RPNN in comparison to MLPs, cascade correlation, and optimal brain damage (OBD). Results showed that an RPNN trained with the constructive learning algorithm provided smooth and steady learning and used considerably fewer computations and less memory, in terms of the number of units and weights.

3 Prediction of financial signals

3.1 Financial time series

Time series prediction is the process of predicting future values from a series of past data extending up to the present. The mapping takes an existing sequence of data X_{t-n}, ..., X_{t-2}, X_{t-1}, X_t and forecasts the future values of the time series X_{t+1}, X_{t+2}, .... The objective is to find a mapping approximation between the input and the output data in order to discover the implicit rules governing the observed signals. While many time series can be approximated with a high degree of confidence, financial time series are found to be among the most difficult to analyze and predict. This is hardly surprising when one considers the highly dynamic and complex behaviour of the financial markets themselves. Forecasting the behaviour of the financial market is a nontrivial task because the signals typically contain very high noise levels. These signals have significant nonlinear and non-stationary behaviour, and it has been suggested that some financial time series are not predictable [28].


Knowles et al. [29] showed that the prediction of the exchange rate between the Euro and the US dollar (EUR/USD) time series using HONN models demonstrated a profit increase of around 8% over the MLP. The networks also utilized a smaller number of weights, thus leading to faster training times and reduced memory requirements. Dunis and Williams [4] implemented neural network regression to forecast foreign exchange rates on EUR/USD time series data. Their study was benchmarked against several traditional forecasting techniques, including the Naïve Strategy, MACD Strategy, ARMA Methodology, and Logit Estimation. Their observations confirmed the applicability of neural networks in financial forecasting. Yao and Tan [5] examined the forecasting performance of neural networks on the exchange rates between the American Dollar and five other major currencies, i.e., the Japanese Yen, Deutsche Mark, British Pound, Swiss Franc and Australian Dollar. The results showed that, without the use of extensive market data or knowledge, useful predictions can be made and significant paper profits can be achieved on out-of-sample data. They also concluded that a backpropagation network with simple technical indicators proved to be adequate for forecasting. Another approach to time series forecasting can be found in [6], which analyzed the predictability of major world stock markets, such as Canada, France, Germany, Japan, the United Kingdom (UK), the United States (US), and the world excluding the US (World), using MLP models. They found that MLP models predict daily stock returns better than traditional ordinary least squares and general linear regression models. Neural networks have thus been shown to be a promising tool in forecasting financial time series. They have been widely used to model the behaviour of financial time series and to forecast future values [4–6, 28, 29].

3.2 Financial time series signals used in this research

In this work, six noisy financial time series signals are considered, as shown in Table 1. All financial time series were obtained from a historical database provided by Datastream, apart from the IBM common stock closing price time series, which was taken from the Time Series Data Library [30]. The signals were fed to the neural networks to capture the underlying rules of the movement in the financial markets.

Table 1 Financial time series signals used

    Time series data                                      Time periods                           Total
1   IBM common stock closing price (IBM)                  17 May 1961 to 02 November 1962          360
4   US dollar to EURO exchange rate (US/EU)               03 January 2000 to 04 November 2005    1,525
5   The Japanese Yen to UK exchange rate (JP/UK)          03 January 2000 to 04 November 2005    1,525
6   The Japanese Yen to US exchange rate (JP/US)          03 January 2000 to 04 November 2005    1,525
7   Japanese yen to EURO exchange rate (JP/EU)            03 January 2000 to 04 November 2005    1,525
8   The United States 10-year government bond (CBOT-US)   01 June 1989 to 11 December 1996       1,965

4 Data pre-processing

Financial time series need adequate pre-processing before being presented to the neural networks, as they exhibit high volatility, complexity, and noise. To smooth out the noise and to reduce the trend, the non-stationary raw data were pre-processed into a stationary series (see Fig. 4) by transforming them into measurements of the 5-day relative difference in percentage of price (RDP) [31]. The advantage of this transformation is that the distribution of the transformed data becomes more symmetrical and follows more closely the normal distribution. In fact, according to Thomason [31], this transformation of the signal often enhances the performance of trading systems when applied in neural network models. The assumption is that the transformation extracts market characteristics that are more useful to the prediction task than the absolute values alone, and that improved prediction performance translates to improved trading system performance. The input variables were determined from four lagged RDP values based on 5-day periods (RDP-5, RDP-10, RDP-15, and RDP-20) and one transformed signal (EMA15), which is obtained by subtracting a 15-day exponential moving average from the original signal. As mentioned in [31], the optimal length of the moving-average period, in this case 15 days, is not critical, but it should be longer than the forecasting horizon. Since the use of RDP to transform the original series may remove some useful information embedded in the data, EMA15 was used to retain the information contained in the original data. As argued in [32], smoothing both input and output data using either a simple or an exponential moving average is a good approach and can generally enhance prediction performance. The weighting factor, α ∈ [0,1], determines the impact of past returns on the actual volatility: the larger the value of α, the stronger the impact and the longer the memory. In our work, an exponential moving average with a weighting factor of α = 0.85 was selected, after a series of trial and error tests. From the trading aspect, the forecasting horizon should be sufficiently long that excessive transaction costs resulting from over-trading can be avoided [33]. Meanwhile, from the prediction aspect, the forecasting horizon should be short enough, as the persistence of financial time series is of limited duration. Thomason [31] suggested that a forecasting horizon of 5 days is a suitable choice for daily data. Therefore, in this work, we consider the prediction of the next five business days. The output variable, RDP+5, was obtained by first smoothing the signal with a 3-day exponential moving average, and is presented as a relative difference in percentage of price five days ahead. Since the statistical information of the previous 20 trading days was used for the definition of the input vector, the original series were transformed and reduced by a length of 20. The calculations for the transformation of the input and output variables are presented in Table 2. The RDP series were subsequently scaled using the standard minimum and maximum normalization method, which produces a new bounded dataset. One of the reasons for using data scaling is to process outliers, which consist of sample values that occur outside the normal (expected) range.

Table 2 Calculations for input and output variables

Indicator        Calculations
Input variables
EMA15            P(i) − EMA_15(i)
RDP-5            (p(i) − p(i−5)) / p(i−5) × 100
RDP-10           (p(i) − p(i−10)) / p(i−10) × 100
RDP-15           (p(i) − p(i−15)) / p(i−15) × 100
RDP-20           (p(i) − p(i−20)) / p(i−20) × 100
Output variable
RDP+5            (p(i+5) − p(i)) / p(i) × 100, where p(i) = EMA_3(i)

EMA_n(i) is the n-day exponential moving average of the i-th day; p(i) is the signal of the i-th day
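The transformations in Table 2 can be sketched as follows. This is an illustrative example, not the authors' code: the recursive form of the exponential moving average and the mapping between the 15-day/3-day windows and the weighting factors are assumptions, and the function names are hypothetical.

```python
import numpy as np

def ema(p, alpha):
    """Recursive exponential moving average; alpha weights the previous average
    (the exact EMA convention used in the paper is not spelled out, so this is assumed)."""
    out = np.empty(len(p))
    out[0] = p[0]
    for i in range(1, len(p)):
        out[i] = alpha * out[i - 1] + (1.0 - alpha) * p[i]
    return out

def rdp(p, k):
    """k-day relative difference in percentage of price: (p(i) - p(i-k)) / p(i-k) * 100."""
    r = np.full(len(p), np.nan)
    r[k:] = (p[k:] - p[:-k]) / p[:-k] * 100.0
    return r

def make_dataset(price, alpha_long=0.85, alpha_short=0.5):
    """Build the Table 2 input variables and the RDP+5 target.
    alpha_long stands in for the 15-day EMA (alpha = 0.85 in the paper);
    alpha_short stands in for the 3-day EMA used to smooth the output (value assumed)."""
    inputs = {
        "EMA15": price - ema(price, alpha_long),   # P(i) - EMA15(i)
        "RDP-5": rdp(price, 5),
        "RDP-10": rdp(price, 10),
        "RDP-15": rdp(price, 15),
        "RDP-20": rdp(price, 20),
    }
    p_smooth = ema(price, alpha_short)             # p(i) = EMA3(i) in Table 2
    target = np.full(len(price), np.nan)
    target[:-5] = (p_smooth[5:] - p_smooth[:-5]) / p_smooth[:-5] * 100.0   # RDP+5
    return inputs, target

def minmax_scale(x, lo=0.0, hi=1.0):
    """Standard minimum-maximum normalization to a bounded range."""
    m, M = np.nanmin(x), np.nanmax(x)
    return lo + (x - m) * (hi - lo) / (M - m)
```

In line with the text above, the first 20 values of each transformed series would be discarded before training, since they depend on history that is not available.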

5 Performance metrics

In financial forecasting parlance, accuracy typically refers to profitability. The reason is that, from the trading perspective, the objective is to use the network's predictions to generate profit. Indeed, it adds little value from a financial forecasting perspective if the network produces a very low prediction error while at the same time attaining a lower profit return. Therefore, it is important for the network to predict the correct direction of change of the signal. Certainly, any model that could predict the direction of change with 100% accuracy would be optimal from a profit point of view, regardless of what the error is. However, the number of direction changes that are correctly predicted is not, on its own, what matters most to the annualized return (AR); the size of the changes that are correctly predicted has a greater effect on the AR. If a model is accurate at predicting many smaller changes, it will lose profitability if it fails on the larger changes. Conversely, if a model is accurate at predicting the larger changes, its profitability will be eroded if it fails on many smaller changes.


This trade-off is not reflected in the mean squared error (MSE) metric. Therefore, it is important to consider the out-of-sample profitability when dealing with financial time series prediction. The prediction performance of our networks was evaluated using one financial metric, where the objective is to use the networks' predictions to make money, and two statistical metrics, which provide accurate tracking of the signals, as shown in Table 3. In order to measure the profits generated from the networks' predictions, a simple trading strategy was used: if the network predicts a positive change for the next five-day RDP, a "buy" signal is sent, otherwise a "sell" signal is sent. The ability of the networks as traders was evaluated by the annualized return (AR), a real trading measurement which is used to test the possible monetary gains and to measure the overall profitability in a year, through the use of the "buy" and "sell" signals [4]. The AR is a scaled calculation of the observed change in the time series value when the sign of the change is correctly predicted. The normalized mean squared error (NMSE) is used to measure the deviation between the actual and the predicted signals; the smaller the value of the NMSE, the closer the predicted signals are to the actual signals. The signal to noise ratio (SNR) provides the relative amount of useful information in a signal, as compared to the noise it carries.

Fig. 4 Signals before and after pre-processing (US/EU, JP/EU, JP/UK and CBOT-US). Thick solid line indicates the signal, solid line the trend, dashed dots the standard deviation

Table 3 Performance metrics and their calculations

Metrics                               Calculations
Annualised return (AR)                AR = 252 \times \frac{1}{n}\sum_{i=1}^{n} R_i
                                      R_i = |y_i| if y_i \hat{y}_i \ge 0, and R_i = -|y_i| otherwise
Normalised mean square error (NMSE)   NMSE = \frac{1}{\sigma^2 n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2
                                      \sigma^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2, \quad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i
Signal to noise ratio (SNR)           SNR = 10 \times \log_{10}(sigma)
                                      sigma = \frac{m^2\, n}{SSE}, \quad SSE = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2, \quad m = \max(y)

n is the total number of data patterns; y_i and \hat{y}_i represent the actual and predicted output values
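A sketch of the Table 3 metrics and of the simple trading rule described above is given below. The function names are hypothetical, the AR case definition follows the reconstruction in Table 3, and 252 is assumed to be the number of trading days per year.

```python
import numpy as np

def annualised_return(y_true, y_pred):
    """AR = 252 * mean(R_i), where R_i = +|y_i| when the sign of the change is
    predicted correctly and -|y_i| otherwise (Table 3)."""
    correct = (y_true * y_pred) >= 0
    r = np.where(correct, np.abs(y_true), -np.abs(y_true))
    return 252.0 * np.mean(r)

def nmse(y_true, y_pred):
    """Normalised mean squared error (Table 3)."""
    var = np.sum((y_true - np.mean(y_true)) ** 2) / (len(y_true) - 1)
    return np.sum((y_true - y_pred) ** 2) / (var * len(y_true))

def snr(y_true, y_pred):
    """Signal-to-noise ratio in dB (Table 3)."""
    sse = np.sum((y_true - y_pred) ** 2)
    sigma = (np.max(y_true) ** 2) * len(y_true) / sse
    return 10.0 * np.log10(sigma)

def trading_signals(y_pred):
    """'buy' when the predicted five-day RDP change is positive, otherwise 'sell'."""
    return np.where(y_pred > 0, "buy", "sell")
```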

6 Training of the networks

The RPNN is benchmarked against the MLP, FLNN and PSNN, which were trained with the incremental backpropagation learning algorithm [34]. We trained all the networks for a maximum of 3,000 epochs. For the benchmarked networks, early stopping was utilized and each signal was divided into three data sets, namely the training, validation and out-of-sample data, which represent 25, 25, and 50% of the entire data set, respectively. For FLNNs and PSNNs, the higher order terms were empirically selected between 2 and 5. The MLPs were trained with hidden units that varied from 3 to 8.

Meanwhile, we did not employ an early stopping criterion for the training of the RPNN. This is because every time a higher order PSNN unit is added to the network, the monitored MSE will slightly increase before it gradually decreases. If we used early stopping, the network training would usually stop just after a PSNN unit is added, at the very time that the newly added PSNN is about to be trained. This would result in truncated and incomplete learning. Therefore, the signals were segregated into two partitions, the training and the out-of-sample data, with a distribution of 25 and 75%, respectively. We train the RPNNs with a constructive learning algorithm [17], which follows these steps (a sketch of the procedure is given below the list):

1. Start with an RPNN of order 1, which has one unit of first order PSNN.
2. Carry out the training and update the weights asynchronously after each training pattern.
3. When the observed change in error falls below the predefined threshold r, i.e., |(e_c − e_p)/e_p| < r, a higher order PSNN is added. Note that e_c is the MSE for the current epoch, and e_p is the MSE for the previous epoch.
4. The threshold, r, for the error gradient and the learning rate, n, are reduced by suitable factors dec_r and dec_n, respectively.
5. The updated network carries out the learning cycle (repeat steps 2 to 4) until the maximum number of epochs is reached.
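The constructive procedure of steps 1–5 can be sketched as follows. This is an illustration under several assumptions: a plain gradient step on the squared error stands in for the incremental backpropagation update, only the newest Pi–Sigma unit is adapted while earlier units stay frozen, and the parameter values are placeholders within the ranges of Table 4.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def train_rpnn(X, Y, max_order=5, epochs=3000, lr=0.3, thresh=0.1,
               dec_r=0.1, dec_n=0.8, rng=np.random.default_rng(0)):
    """Constructive learning sketch for the RPNN (steps 1-5)."""
    d = X.shape[1]
    units = [(rng.uniform(-0.5, 0.5, (1, d)), np.zeros(1))]      # step 1: order-1 RPNN
    prev_mse = np.inf
    for epoch in range(epochs):                                   # step 5: repeat until max epochs
        W, b = units[-1]                                          # only the newest unit is adapted
        sq_err = 0.0
        for x, y in zip(X, Y):                                    # step 2: pattern-by-pattern update
            h = W @ x + b                                         # summing units of the newest unit
            out = sigmoid(sum(np.prod(Wi @ x + bi) for Wi, bi in units))
            err = out - y
            sq_err += err ** 2
            dout = err * out * (1.0 - out)                        # derivative through the sigmoid
            for j in range(len(h)):                               # gradient of P_i w.r.t. row j
                grad = dout * np.prod(np.delete(h, j))
                W[j] -= lr * grad * x
                b[j] -= lr * grad
                h = W @ x + b                                     # refresh after the update
        mse = sq_err / len(X)
        if abs((mse - prev_mse) / prev_mse) < thresh and len(units) < max_order:
            units.append((rng.uniform(-0.5, 0.5, (len(units) + 1, d)),
                          np.zeros(len(units) + 1)))              # step 3: add a higher order unit
            thresh *= dec_r                                       # step 4: shrink threshold ...
            lr *= dec_n                                           # ... and learning rate
            prev_mse = np.inf
        else:
            prev_mse = mse
    return units
```

With settings in the spirit of Table 4 (dec_n = 0.8 and dec_r within [0.05, 0.2]), each newly added unit is trained with a progressively smaller learning rate and a tighter error-change threshold.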

Table 4 The learning parameters used in RPNNs and the benchmarked neural networks

Neural networks        Initial weights   Learning rate (n)   dec_n   Threshold (r)   dec_r
Benchmarked networks   [−0.5, 0.5]       0.1 or 0.05         –       –               –
RPNN                   [−0.5, 0.5]       [0.1, 0.5]          0.8     [0.001, 0.7]    [0.05, 0.2]


Table 5 The average performance of the RPNNs and the benchmarked neural networks

IBM closing price
Predictor   MLP-Hidden 7   FLNN-Order 3   PSNN-Order 2   RPNN-Order 3
AR (%)      89.40          90.21          90.10          90.71
SNR         21.68          22.48          22.34          22.58
NMSE        0.3343         0.2764         0.2858         0.2701
Epochs      568            2,519          651            425

US/EU exchange rate
Predictor   MLP-Hidden 3   FLNN-Order 2   PSNN-Order 2   RPNN-Order 2
AR (%)      87.88          87.46          87.54          88.32
SNR         23.81          23.74          23.82          23.58
NMSE        0.2375         0.2414         0.2369         0.2506
Epochs      3,000          3,000          1,294          24

JP/EU exchange rate
Predictor   MLP-Hidden 7   FLNN-Order 3   PSNN-Order 5   RPNN-Order 4
AR (%)      87.05          87.34          87.06          87.48
SNR         27.84          27.88          27.89          27.85
NMSE        0.2156         0.2134         0.2133         0.2152
Epochs      3,000          3,000          871            817

JP/US exchange rate
Predictor   MLP-Hidden 6   FLNN-Order 5   PSNN-Order 5   RPNN-Order 2
AR (%)      83.55          84.75          83.53          84.84
SNR         25.60          25.79          25.65          25.24
NMSE        0.2694         0.2573         0.2656         0.2927
Epochs      699            1,489          1,141          8

JP/UK exchange rate
Predictor   MLP-Hidden 5   FLNN-Order 2   PSNN-Order 5   RPNN-Order 4
AR (%)      88.97          88.84          88.86          89.25
SNR         26.61          26.61          26.60          26.61
NMSE        0.2083         0.2084         0.2090         0.2084
Epochs      1,179          3,000          1,078          298

CBOT-US government bond
Predictor   MLP-Hidden 3   FLNN-Order 5   PSNN-Order     RPNN-Order 5
AR (%)      86.10          86.27          86.17          86.60
SNR         25.20          25.24          25.23          25.15
NMSE        0.2537         0.2510         0.2515         0.2564
Epochs      1,395          1,104          242            131

Values presented in bold font indicate the best value in AR (annualized return)


It should be noted that only the weights of the latest added Pi–Sigma unit are adjusted during the training, while the rest are kept frozen through an asynchronous weight update. A sigmoid activation function was used for all neural networks. An average performance of 20 trials was used with the respective learning parameters as given in Table 4. These sets of parameters were experimentally chosen to yield the best performance on the out-of-sample data.

Fig. 5 The learning curve for the prediction using RPNN (MSE versus epoch for the US/EU, JP/EU, JP/UK and CBOT-US signals)

7 Results

As we are concerned with financial time series prediction, in these extensive experiments our primary interest is not to assess the predictive ability of RPNN models against other neural network models, but to concentrate on the profitable value contained in the RPNN predictions. During generalization, we focus more on how the network generates profits. For this reason, the neural network structure which provides the highest percentage of AR on out-of-sample (unseen) data is considered the best model. The simulation results of the RPNN are benchmarked against those of the FLNN, PSNN, and MLP. Table 5 summarizes the average results of 20 simulations obtained on unseen data for all neural networks. As can be noticed, RPNNs made the best profit compared to all benchmarked models on all time series data. The networks outperformed the other models on the average AR by 0.1 to 1.31%. The prediction using MLPs produced the lowest AR for the IBM, JP/EU and CBOT-US signals. The FLNNs produced the lowest AR for the prediction of the US/EU and JP/UK signals, and the PSNN achieved the lowest AR on the JP/US signal. In terms of SNR, the simulation results for each dataset demonstrated very little deviation between the highest average value and the remaining results. There is no significant evidence that RPNNs can track the signal better than the other network models, apart from the prediction of the IBM closing price and the JP/UK exchange rate. The overall results given by all network predictors on the amount of meaningful information, given the present level of background noise in the forecast signals, suggest that the datasets are highly noisy. Although RPNNs have a relatively lower SNR than the other network models when used to predict the US/EU, JP/US and CBOT-US signals, they can still model the data reasonably well with a higher profit return. Results from Table 5 also show that the NMSE produced by RPNNs is on average below 0.3, which is considered to be satisfactory with respect to the high profit return generated by the network. Despite the fact that in some cases RPNNs produced a slightly higher NMSE and lower SNR than the other network models, these results do not reflect the significant profitable value offered by RPNNs. This is because the SNR and NMSE are calculated based on the squared error; therefore, if a model has a low NMSE, the SNR will be high. Conversely, if the NMSE is high, the SNR will drop. On the other hand, the AR is a scaled calculation of the observed change in the time series value, when the sign of the change is correctly predicted. Hence, it is worth noting that seeking optimal forecasting in terms of NMSE is not the aim of this research, as explained at the beginning of Sect. 5.

Fig. 6 Best simulation results from all networks on all data sets (annualised return (%) of MLP, FLNN, PSNN and RPNN on the IBM, US/EU, JP/EU, JP/US, JP/UK and CBOT-US signals)

Fig. 7 Performance of RPNN with increasing order (AR, SNR and NMSE versus network order). Solid line IBM, dashed line US/EU, thin solid line JP/EU, thick solid line JP/US, grey line JP/UK, dashed dots CBOT-US


In all financial signals, RPNNs were shown to require a smaller number of training cycles (epochs) to converge on all the datasets, making them between 1.066 and 186 times faster than the benchmarked models. For the purpose of demonstration, Fig. 5 shows the learning curves for the prediction of the data signals using RPNNs, where the networks showed the ability to converge quickly and learn the signals within 9–145 epochs. In general, the simulation results given in Table 5 demonstrate that RPNNs of order 2–5 appeared to have learned the financial time series signals. Apart from attaining the highest average profit return, RPNNs also outperformed all benchmarked models on the best simulation results, exceeding them on the annualized return by around 0.04–3.41% (see Fig. 6). In order to test the modelling capabilities and the stability of RPNNs, Fig. 7 shows the average AR, SNR, and NMSE on out-of-sample data when predicting the financial signals, with the number of higher order terms varied between 1 and 5. The plots indicate that the RPNNs learned the data steadily, with the AR and SNR continuing to increase and the NMSE decreasing as the network grows. The low performance of the first order RPNN is due to the structure of the network, which has a very small number of weights to carry out the mapping task and does not incorporate higher order terms. Experimental results showed that, in certain cases, the performance of the network started to degrade when a Pi–Sigma unit of order five was added. This is due to the growing number of free parameters introduced by the successively added units. The best forecasts made by RPNNs on the financial signals are depicted in Fig. 8. As can be noticed from Fig. 8, RPNNs are capable of learning the behaviour of chaotic and highly non-linear financial time series data and they can capture the underlying movements in financial markets. Figure 9 shows the histograms of the nonlinear prediction errors of the RPNNs, which indicate that the prediction errors can be approximately modelled by white Gaussian processes. This suggests that the prediction errors consist of stationary independent samples.

Fig. 8 Best forecast made by RPNN (RDP+5 values versus day for the US/EU, JP/EU, JP/UK and CBOT-US signals). Solid line original signal, thick solid line predicted signal

Fig. 9 Histogram of the error signals using RPNN (US/EU, JP/EU, JP/UK and CBOT-US)

8 Conclusion and future works

A novel application of RPNNs to the prediction of financial time series is presented. The RPNN is composed of increasing orders of PSNNs, therefore preserving all the advantages of the PSNN whilst maintaining a well-regulated structure. Six noisy financial time series signals, i.e., the US/EU, JP/EU, JP/US, and JP/UK exchange rates, the IBM closing price, and the United States 10-year government bond, were used to test the robustness of the RPNN when used to forecast the 5-day-ahead relative difference in percentage of price. Prediction results demonstrated that RPNNs generate higher profit returns compared to MLPs, FLNNs, and PSNNs, thereby showing considerable promise as a decision making tool. In addition to generating a profitable return value, which is a desirable property in nonlinear financial time series prediction, RPNNs also show fast convergence compared to the benchmarked neural networks. The superior performance in the prediction of all financial time series signals is attributed to the RPNNs' well-regulated structure and robustness. The presence of higher order terms in the networks equipped the RPNNs with the ability to accurately forecast the upcoming trends of the financial time series data. A noteworthy advantage of RPNNs is that there is no requirement to select the order of the networks, as in the case of FLNNs and PSNNs, or the number of hidden units, as in MLPs. The main intricacy when using RPNNs is to find suitable parameters for successively adding a higher order Pi–Sigma unit to the network. Future work will involve the use of genetic algorithms for optimally tuning these RPNN parameters. Another avenue for research will be the investigation of the use of recurrent links in the RPNNs. The new recurrent RPNNs promise to offer both the advantages of feedforward RPNNs and the temporal dynamics induced by the recurrent connections. The presence of feedback connections will allow the networks to utilize a system that is capable of storing internal states and implementing complex dynamics.

Acknowledgments The work of R. Ghazali is supported by Universiti Tun Hussein Onn (UTHM), Malaysia.

References

1. Leerink LR, Giles CL, Horne BG, Jabri MA (1995) Learning with product units. In: Tesauro G, Touretzky D, Leen T (eds) Advances in neural information processing systems 7. MIT Press, Cambridge, MA, pp 537–544
2. Lawrence S, Giles CL (2000) Overfitting and neural networks: conjugate gradient and backpropagation. In: International joint conference on neural networks, Italy. IEEE Computer Society, CA, pp 114–119
3. Hellström T, Holmström K (1998) Predicting the stock market. Technical Report IMa-TOM-1997-07, Center of Mathematical Modeling, Department of Mathematics and Physics, Mälardalen University, Västerås, Sweden
4. Dunis CL, Williams M (2002) Modelling and trading the EUR/USD exchange rate: do neural network models perform better? Derivatives Use, Trading and Regulation 8(3):211–239
5. Yao J, Tan CL (2000) A case study on neural networks to perform technical forecasting of forex. Neurocomputing 34:79–98
6. Shachmurove Y, Witkowska D (2000) Utilizing artificial neural network model to predict stock markets. CARESS Working Paper #00-11
7. Cass R, Radl B (1996) Adaptive process optimization using functional-link networks and evolutionary algorithm. Control Eng Pract 4(11):1579–1584
8. Liatsis P, Hussain AJ (1999) Nonlinear one-dimensional DPCM image prediction using polynomial neural networks. In: Proc SPIE, Applications of artificial neural networks in image processing IV, San Jose, California, 28–29 January, vol 3647, pp 58–68
9. Tawfik H, Liatsis P (1997) Prediction of non-linear time-series using higher-order neural networks. In: Proceedings of the IWSSIP'97 conference, Poznan, Poland
10. Mirea L, Marcu T (2002) System identification using functional-link neural networks with dynamic structure. In: 15th Triennial World Congress, Barcelona, Spain
11. Shin Y, Ghosh J (1991) The Pi–Sigma network: an efficient higher-order neural network for pattern classification and function approximation. In: Proceedings of the international joint conference on neural networks, Seattle, Washington, July 1991, vol 1, pp 13–18
12. Shin Y, Ghosh J (1992) Computationally efficient invariant pattern recognition with higher order Pi–Sigma networks. The University of Texas at Austin
13. Kaita T, Tomita S, Yamanaka J (2002) On a higher-order neural network for distortion invariant pattern recognition. Pattern Recognit Lett 23:977–984
14. Pao YH, Phillips SM (1995) The functional link net and learning optimal control. Neurocomputing 9:149–164
15. Artyomov E, Pecht OY (2004) Modified high-order neural network for pattern recognition. Pattern Recognit Lett 26(6):843–851
16. Giles CL, Maxwell T (1987) Learning, invariance and generalization in high-order neural networks. Applied Optics 26(23):4972–4978
17. Shin Y, Ghosh J (1995) Ridge polynomial networks. IEEE Trans Neural Netw 6(3):610–622
18. Voutriaridis C, Boutalis YS, Mertzios G (2003) Ridge polynomial networks in pattern recognition. In: EC-VIP-MC 2003, 4th EURASIP conference focused on video/image processing and multimedia communications, Croatia, pp 519–524
19. Shin Y, Ghosh J (1992) Approximation of multivariate functions using ridge polynomial networks. In: Proceedings of the international joint conference on neural networks, vol 2, pp 380–385
20. Karnavas YL, Papadopoulos DP (2004) Excitation control of a synchronous machine using polynomial neural networks. J Electr Eng 55(7–8):169–179
21. Durbin R, Rumelhart DE (1989) Product units: a computationally powerful and biologically plausible extension to back-propagation networks. Neural Comput 1:133–142
22. Giles CL, Griffin RD, Maxwell T (1988) Encoding geometric invariances in HONN. American Institute of Physics, USA, pp 301–309
23. Fei G, Yu YL (1994) A modified Sigma–Pi BP network with self-feedback and its application in time series analysis. In: Proceedings of the 5th international conference, vol 2243-508F, pp 508–515
24. Thimm G (1995) Optimization of high order perceptrons. Swiss Federal Institute of Technology (EPFL)
25. Ghosh J, Shin Y (1992) Efficient higher-order neural networks for function approximation and classification. Int J Neural Syst 3(4):323–350
26. Shin Y, Ghosh J (1992) Computationally efficient invariant pattern recognition with higher order Pi–Sigma networks. The University of Texas at Austin
27. Schmitt M (2001) On the complexity of computing and learning with multiplicative neural networks. Neural Comput 14:241–301
28. Schwaerzel R (1996) Improving the prediction accuracy of financial time series by using multi-neural network systems and enhanced data preprocessing. MSc thesis, The University of Texas at San Antonio
29. Knowles C, Hussain A, El Deredy W, Lisboa P (2005) Higher order neural networks for the prediction of financial time series. In: Forecasting Financial Markets, France
30. Hyndman RJ (2005) Time series data library. http://www-personal.buseco.monash.edu.au/~hyndman/TSDL/. Accessed September 2005. Original source: McCleary and Hay, Applied time series analysis for the social sciences, 1980, Sage Publications
31. Thomason M (1999) The practitioner method and tools. J Comput Intell Finance 7(3):36–45
32. Thomason M (1999) The practitioner method and tools. J Comput Intell Finance 7(4):35–45
33. Cao LJ, Francis EHT (2003) Support vector machine with adaptive parameters in financial time series forecasting. IEEE Trans Neural Netw 14(6)
34. Haykin S (1999) Neural networks: a comprehensive foundation, 2nd edn. Prentice-Hall, New Jersey
