How Good Is the Backpropagation Neural Network Using a Self-Organised Network Inspired by Immune Algorithm (SONIA) When Used for Multi-step Financial Time Series Prediction?

Abir Jaafar Hussain and Dhiya Al-Jumeily

Liverpool John Moores University, Byrom Street, Liverpool, L3 3AF, UK
{a.hussain,d.aljumeily}@ljmu.ac.uk
Abstract. In this paper, a novel application of the backpropagation network using a self-organised hidden layer inspired by immune algorithm is used for the prediction of financial time series. The simulations assess two time series: firstly, the daily exchange rate between the US dollar and the Euro for the period from 3rd January 2000 until 4th November 2005, giving approximately 1525 data points; secondly, the IBM common stock closing price for the period from 17th May 1961 until 2nd November 1962, giving 360 trading days as data points. In the simulations, the backpropagation network with the self-organised hidden layer inspired by immune algorithm produced an increase in profit of approximately 2% over the standard backpropagation network for the prediction of the IBM common stock closing price, but a slightly lower profit for the US dollar/Euro exchange rate prediction.
1 Introduction

The efficient market hypothesis states that a stock price, at any given time, reflects the state of the environment for that stock at that time; that is, the stock price depends on many variables, such as news events, other stock prices and exchange rates. The hypothesis suggests that future trends are completely unpredictable and subject to random occurrences, making it infeasible to use historical data or financial information to produce above-average returns [9]. In reality, however, market responses are not always instantaneous. Markets may be slow to react due to poor human reaction time or other psychological factors associated with the human actors in the system. In these circumstances it is therefore possible to predict financial data from previous observations [12]. There is a considerable body of evidence showing that markets do not work in a totally efficient manner, and much of this research shows that stock market returns are predictable by various methods, such as time series analysis of financial and economic variables [11]. Until now, stochastic methods based on the statistical analysis of the signals within the market system have been used for the prediction of financial time series [1-4].
The nonlinear nature of financial data has inspired many researchers to use neural networks as a modelling approach [5], replacing explicit linearity-in-the-parameters dependencies with implicit semi-parametric models [6]. When the networks are trained on financial data with a multivariate function, they become minimum average function approximators [1]. Whilst ease of use and the capability to model dynamical data are appealing general features of typical neural networks, there are concerns about their generalisation ability and parsimony. Cao and Francis [7] showed that for multilayer perceptrons (MLPs) trained using the backpropagation learning algorithm, the normalised mean square error (NMSE) on the validation data decreases for the first few epochs and then increases for the remaining epochs, indicating that MLP networks trained with backpropagation suffer from overfitting. Hence, the use of neural networks for financial time series prediction encounters several problems, including [8]:

1. Different neural network models can perform significantly differently when trained and tested on the same data sets, because artefacts influence the predictive ability of the models. Yet it would be reasonable to suppose that well-founded models would produce similar inferences regardless of the detailed architecture of the particular neural network used.

2. Any given type of neural network is sensitive to the topological choice and the size of the data set. Neural networks suffer from overfitting, so researchers need to take extra care when selecting the network architecture, the learning parameters and the training data in order to achieve good generalisation, which is critical when using neural networks for financial time series.

3. The nonstationary nature of financial time series, and their changing behaviour between oscillatory and monotonic trends, can prevent a single neural network from accurately forecasting an extended trading period even when it forecasts changes in the testing data well.

To improve the recognition and generalisation capability of backpropagation neural networks, Widyanto et al. [11] used a hidden layer inspired by immune algorithm for the prediction of a sinusoidal signal and of time-temperature-based food quality data. Their simulations indicated an improvement of 1/17 in the approximation error for the sinusoidal signal in comparison to backpropagation, and an 18% improvement in recognition capability for the prediction of the time-temperature-based food quality data.

In this paper, we propose the use of the backpropagation neural network with a hidden layer inspired by immune algorithm for financial time series prediction. Two financial time series are used to test the performance of the network: the exchange rate between the US dollar and the Euro, and the IBM common stock closing price.
The remainder of the paper is organised as follows: Section 2 presents the backpropagation neural network with the hidden layer inspired by immune algorithm. Section 3 outlines the financial time series used for the simulations, together with the pre-processing steps and the metrics used for benchmarking the performance of the neural networks. Section 4 is dedicated to the simulation results and discussion, while Section 5 concludes the paper.
2 The Self-Organised Network Inspired by the Immune Algorithm (SONIA)

The immune algorithm, first introduced by Timmis [12], has attracted considerable attention. Widyanto et al. [11] introduced a method to improve the recognition and generalisation capability of backpropagation by proposing a self-organised hidden layer inspired by immune algorithm, giving the so-called SONIA network. The input vector and the hidden layer of the SONIA network are considered as antigen and recognition ball, respectively. The recognition ball, a construct generated by the immune system, is used for hidden unit creation; in time series prediction, the recognition balls are used to address the overfitting problem. In the immune system, the recognition ball has a single epitope and many paratopes, where the epitope is attached to a B cell and the paratopes are attached to antigens, so that a single B cell represents several antigens. In the SONIA network, each hidden unit correspondingly has a centre that represents the connections of the input vectors attached to it. To avoid overfitting, each centre has a value that represents the strength of the connections between the input units and their corresponding hidden unit.

The SONIA network consists of three layers: an input, a self-organised and an output layer [11]. In what follows, the dynamic equations of the SONIA network are considered. The ith input unit receives a normalised external input S_i, where i = 1,…,N_I and N_I represents the number of inputs. The output of each hidden unit is determined by the Euclidean distance between the outputs of the input units and the connection strengths between the input units and the jth hidden unit. The use of the Euclidean distance enables the SONIA network to exploit locality information in the input data, which can lead to improved recognition capability [11]. The output of the jth hidden unit is determined as follows:
x_{Hj} = f\left( \sum_{i=1}^{N_I} \left( w_{Hij} - x_{Ii} \right)^2 \right), \qquad j = 1, \ldots, N_H

where w_{Hij} represents the strength of the connection from the ith input unit to the jth hidden unit, and f is a nonlinear transfer function.
The outputs of the hidden units form the inputs to the output layer. The network output is determined as follows:

y_k = g\left( \sum_{j=1}^{N_H} w_{ojk} x_{Hj} + b_{ok} \right), \qquad k = 1, \ldots, N_o
where w_{ojk} represents the strength of the connection from the jth hidden unit to the kth output unit, b_{ok} is the bias associated with the kth output unit, and g is a nonlinear transfer function.
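To make the forward pass concrete, the following Python sketch implements the two equations above: a distance-based self-organised hidden layer followed by a conventional weighted output layer. The paper states only that f and g are nonlinear, so the Gaussian-style hidden transfer and the sigmoid output used here, like all of the names, are our assumptions rather than the authors' specification.

```python
import numpy as np

def sonia_forward(x, w_h, w_o, b_o):
    """Forward pass of a SONIA-style network (a sketch; names are ours).

    x   : (N_I,)      normalised input vector
    w_h : (N_H, N_I)  hidden-unit centres (connection strengths w_Hij)
    w_o : (N_O, N_H)  output-layer weights w_ojk
    b_o : (N_O,)      output-layer biases b_ok
    """
    # Hidden unit j responds to the squared Euclidean distance between the
    # input and its centre, passed through a nonlinear transfer f. The paper
    # only says f is nonlinear; a Gaussian-style response is assumed here.
    dist2 = np.sum((w_h - x) ** 2, axis=1)
    x_h = np.exp(-dist2)
    # Output unit k is a weighted sum of the hidden outputs plus a bias,
    # passed through a nonlinear transfer g (a sigmoid is assumed here).
    z = w_o @ x_h + b_o
    return 1.0 / (1.0 + np.exp(-z)), x_h
```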
2.1 Training the SONIA Network

In this subsection, the training algorithm of the SONIA network is described; the B cell construction based hidden unit creation is described in the next subsection. In the immune algorithm, inside the recognition ball there is a single B cell which represents several antigens; here, the hidden unit is considered as the recognition ball of the immune algorithm. Let d(t+1) represent the desired response of the network at time t+1. The error of the network at time t+1 is defined as:
e(t+1) = d(t+1) - y(t+1) \qquad (1)
The cost function of the network is the squared error between the original and the predicted value, that is:
J(t+1) = \frac{1}{2}\left[ e(t+1) \right]^2 \qquad (2)
The aim of the learning algorithm is to minimise the squared error by a gradient descent procedure. Therefore, the change to any element w_{ojk} of the output weight matrix is determined according to the following equation:
\Delta w_{ojk}(t+1) = -\eta \frac{\partial J(t+1)}{\partial w_{ojk}} \qquad (3)
where j = 1,…,N_H, k = 1,…,N_o, and η is a positive real number representing the learning rate. The change to any element b_{ok} of the bias vector is determined as follows:

\Delta b_{ok}(t+1) = -\eta \frac{\partial J(t+1)}{\partial b_{ok}} \qquad (4)
where k = 1,…,N_o. The initial values of w_{ojk} are set to zero and the initial values of b_{ok} are chosen randomly.
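As a minimal illustration of equations (1)-(4), the sketch below performs one gradient-descent step on the output weights and biases for a single training pattern. To keep the derivative to one line it assumes an identity output transfer g; that simplification, and all names, are ours.

```python
import numpy as np

def train_output_step(x_h, d, w_o, b_o, eta=0.1):
    """One gradient-descent step for eqs. (1)-(4); a sketch, names are ours.

    x_h : (N_H,)  hidden-layer outputs for the current pattern
    d   : (N_O,)  desired response d(t+1)
    """
    y = w_o @ x_h + b_o                 # network output y(t+1), identity g
    e = d - y                           # eq. (1): prediction error
    # J = 0.5 * e^2 (eq. 2); its gradient w.r.t. w_ojk is -e_k * x_Hj, so
    # the descent step of eq. (3) adds eta * e_k * x_Hj to each weight.
    w_o += eta * np.outer(e, x_h)
    b_o += eta * e                      # eq. (4): gradient w.r.t. the bias
    return w_o, b_o, 0.5 * float(e @ e)
```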
2.2 B Cell Construction Based Hidden Unit Creation

The purpose of hidden unit creation is to form clusters from the input data and to determine the centroid of each cluster formed. These centroids are used to extract local
characteristics of the training data and enable the SONIA network to memorise the characteristics of the training data only, rather than those of the testing data; the overfitting problem can be prevented using this approach. Furthermore, the use of the Euclidean distance between the input data and these centroids enables the network to exploit local information in the input data, which may lead to improved recognition capability in pattern recognition problems.

For each hidden unit, two values are recorded: the number of input vectors associated with the jth hidden unit, and the cluster centroid of those input vectors, which represents the strength of the connection between the input units and the jth hidden unit.

Let (d_m, y_m) represent a given set of input-output pairs to be learned. In the initialisation process, the counters t_k for k = 1,…,N_H are set to 0, and the first hidden unit (t_1, w_{H1}) is created with t_1 = 0 and w_{H1} taken arbitrarily from the input vectors. The following procedure, derived from the immune algorithm [12], is used for hidden layer creation and is repeated until every input has found its corresponding hidden unit [11]; a sketch in code follows the list.

1. For j = 1 to N_H, determine the distance between the mth input and the centroid of the jth hidden unit as follows:

dist_{mj} = \sum_{i=1}^{N_I} \left( d_{mi} - w_{Hij} \right)^2

2. Select the shortest distance, c = \arg\min_j (dist_{mj}).

3. If the shortest distance dist_{mc} is below a stimulation level sl (where sl is selected between 0 and 1), the input has found its corresponding hidden unit; set t_c = t_c + 1 and update the centre, w_{Hc} = w_{Hc} + h d_{mc}, where h is a learning rate. Otherwise, a new hidden unit is added with t_{N_H} = 0. Then go to step 1.
3 Financial Time Series Prediction Using the SONIA Neural Network

The SONIA neural network was used to predict two financial time series: the daily exchange rate between the US Dollar and the Euro (US/EU) for the period 3rd January 2000 to 4th November 2005, containing approximately 1525 data points, and the IBM common stock closing price from 17th May 1961 to 2nd November 1962, giving a total of 360 trading days, obtained from a historical database provided by DataStream [15]. These time series were fed to the neural networks to capture the underlying rules of the movement in the financial markets. Since financial time series are highly nonlinear and nonstationary signals, they need adequate pre-processing before being presented to a neural network. To smooth out the noise and to reduce the trend, the nonstationary raw data are usually transformed into a stationary series.
Table 1. Calculations for Input and Output Variables

Indicator            Calculation
Input variables
  EMA15              p(i) - EMA_15(i)
  RDP-5              (p(i) - p(i-5)) / p(i-5) * 100
  RDP-10             (p(i) - p(i-10)) / p(i-10) * 100
  RDP-15             (p(i) - p(i-15)) / p(i-15) * 100
  RDP-20             (p(i) - p(i-20)) / p(i-20) * 100
Output variable
  RDP+5              (p̄(i+5) - p̄(i)) / p̄(i) * 100, where p̄(i) = EMA_3(i)

EMA_n(i) is the n-day exponential moving average of the i-th day; p(i) is the closing price of the i-th day.
The original closing prices were transformed into the five-day relative difference in percentage of price (RDP) [13]. The advantage of this transformation is that the distribution of the transformed data follows a normal distribution more closely. The input variables were determined from four lagged RDP values based on five-day periods (RDP-5, RDP-10, RDP-15 and RDP-20) and one transformed closing price (EMA15), which is obtained by subtracting a 15-day exponential moving average from the closing price. The optimal length of the moving average is not critical, but it should be longer than the forecasting horizon of five days [13]. Since the use of RDP to transform the original series may remove some useful information embedded in the data, EMA15 was used to retain the information contained in the original data. It has been argued in [14] that smoothing both input and output data using either a simple or an exponential moving average is a good approach that can generally enhance prediction performance.

The forecast horizon is five days, so the output variable represents the prediction of the price five days ahead. The output variable RDP+5 was obtained by first smoothing the closing price with a 3-day exponential moving average and is presented as a relative difference in percent for five days ahead. Because statistical information from the previous 20 trading days is used to define the input vector, the transformed series is 20 points shorter than the original. The calculations for the transformation of the input and output variables are presented in Table 1.

The RDP series were scaled using the standard minimum and maximum normalisation method, which produces a new bounded dataset. One of the reasons for using data scaling is to handle outliers, i.e. sample values that occur outside the normal range [14]. A sketch of these transformations follows.
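The sketch below builds the Table 1 input and output variables from a series of closing prices and applies min-max scaling. The pandas span-based EMA convention and all names are our assumptions.

```python
import numpy as np
import pandas as pd

def make_rdp_features(close: pd.Series) -> pd.DataFrame:
    """Illustrative sketch of the Table 1 transformations (names are ours)."""
    p = close.astype(float)
    # EMA15 input: closing price minus its 15-day exponential moving average.
    features = {"EMA15": p - p.ewm(span=15, adjust=False).mean()}
    # Lagged relative differences in percentage of price (RDP-5 ... RDP-20).
    for lag in (5, 10, 15, 20):
        features[f"RDP-{lag}"] = (p - p.shift(lag)) / p.shift(lag) * 100
    # Output RDP+5: five-day-ahead relative difference computed on a
    # 3-day EMA-smoothed closing price.
    smoothed = p.ewm(span=3, adjust=False).mean()
    features["RDP+5"] = (smoothed.shift(-5) - smoothed) / smoothed * 100
    # The first 20 rows lack full lag history (and the last 5 lack the
    # 5-day-ahead target), matching the paper's reduction of the series.
    return pd.DataFrame(features).dropna()

def minmax_scale(x: np.ndarray, lo: float = -1.0, hi: float = 1.0) -> np.ndarray:
    """Standard min-max normalisation used to bound the RDP series."""
    return lo + (x - x.min()) / (x.max() - x.min()) * (hi - lo)
```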
In financial forecasting, accuracy is related to profitability. Therefore, it is important to consider the out-of-sample profitability of a model as well as its forecasting accuracy. The prediction performance of our networks was evaluated using the financial and statistical metrics shown in Table 2.

Table 2. Performance Metrics and Their Calculations

Normalised Mean Square Error (NMSE):

NMSE = \frac{1}{\sigma^2 n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \quad
\sigma^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2, \quad
\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i

Signal-to-Noise Ratio (SNR):

SNR = 10 \log_{10}(sigma), \quad
sigma = \frac{m^2 n}{SSE}, \quad
SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \quad
m = \max(y)

Directional Symmetry (DS):

DS = \frac{100}{n} \sum_{i=1}^{n} d_i, \quad
d_i = \begin{cases} 1, & (y_i - y_{i-1})(\hat{y}_i - \hat{y}_{i-1}) \ge 0 \\ 0, & \text{otherwise} \end{cases}

Annualised Return (AR):

AR = 252 \cdot \frac{1}{n} \sum_{i=1}^{n} R_i, \quad
R_i = \begin{cases} y_i, & y_i \hat{y}_i \ge 0 \\ -y_i, & \text{otherwise} \end{cases}

n is the total number of data patterns; y_i and \hat{y}_i represent the actual and predicted output values.
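A compact sketch of the Table 2 metrics follows; the function and variable names are ours, and DS is computed over the n-1 available consecutive pairs.

```python
import numpy as np

def evaluate(y: np.ndarray, y_hat: np.ndarray) -> dict:
    """Compute the Table 2 metrics (a sketch; names are ours)."""
    n = len(y)
    # Normalised mean square error (NMSE).
    sigma2 = np.sum((y - y.mean()) ** 2) / (n - 1)
    nmse = np.sum((y - y_hat) ** 2) / (sigma2 * n)
    # Signal-to-noise ratio (SNR) in dB.
    sse = np.sum((y - y_hat) ** 2)
    snr = 10 * np.log10(y.max() ** 2 * n / sse)
    # Directional symmetry (DS): percentage of correctly predicted directions.
    moves = (y[1:] - y[:-1]) * (y_hat[1:] - y_hat[:-1])
    ds = 100 * np.mean(moves >= 0)
    # Annualised return (AR): gain y_i when the sign of the prediction is
    # correct, lose y_i otherwise; 252 trading days per year.
    r = np.where(y * y_hat >= 0, y, -y)
    ar = 252 * r.mean()
    return {"NMSE": nmse, "SNR": snr, "DS": ds, "AR": ar}
```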
4 Simulation Results

This work is concerned with financial time series prediction. Throughout the extensive experiments conducted, the primary goal was not to assess the predictive ability of the SONIA neural network against backpropagation models, but rather to determine the profitable value contained in the network. As a result, the focus was on how the network generates profit: the neural network structure that
produces the highest percentage of annualised return on out-of-sample data is considered the best model. Table 3 displays the average results of 20 simulations obtained on unseen data from the neural networks, while Fig. 1 shows part of the prediction of the IBM common stock closing price and of the US/EU exchange rate time series on out-of-sample data.

As can be seen in Table 3, the average annualised return shows that using the SONIA network to predict the IBM common stock closing price resulted in better profit performance than the MLP network, with an average increase of 1.72% using 11 hidden units. In the MLP network, the objective of backpropagation is to minimise the error over the whole dataset, while for the SONIA network the learning concentrates on the local properties of the signal: the aim of the network is to adapt to the local properties of the observed signal using the self-organised hidden layer inspired by the immune algorithm. Thus the SONIA network has a more detailed mapping of the underlying structure of the data and is able to respond more readily to the larger changes and regime shifts that are common in nonstationary data. This accounts for the observed better performance of the SONIA network, in comparison to the MLP network, when used to predict the IBM common stock closing price.

Table 3. The Average Results Over 20 Simulations for the MLP and the SONIA Neural Networks

US/EU Exchange Rate              MLP       SONIA (20 hidden units)
  AR (%)                         87.88     87.24
  DS (%)                         65.69     64.20
  NMSE                           0.2375    0.2628
  SNR                            23.81     23.37

IBM Common Stock Closing Price   MLP       SONIA (11 hidden units)
  AR (%)                         88.54     90.26
  DS (%)                         63.53     64.70
  NMSE                           0.3522    0.384
  SNR                            21.45     21.05
For the prediction of the US Dollar/Euro exchange rate, the simulations showed that the MLP network fared slightly better than the SONIA network, with a 0.64% increase in annualised return. For the MLP network, extensive tests were carried out beforehand to determine the number of hidden units (between 3 and 10) that delivered the best network performance; for the SONIA network, the number of hidden units was decided by the system itself. In attempting to understand why the SONIA network failed to generate better profit than the MLP network, the properties of the US Dollar/Euro exchange rate time series were studied. The dataset that was used has 59.51% small
How Good Is the Backpropogation Neural Network
929
changes containing 43.66% of the potential profit, and 40.49% larger changes containing 56.34% of the profit. This means that a large percentage of the potential return lies within the small changes. Since the purpose of the MLP network is to minimise the error over the whole dataset, and since it works better when the data contain more potential return within small changes, the MLP network can outperform the SONIA network on annualised return when used to predict the dynamic US/EU exchange rate.
Fig. 1. (a) Part of the prediction of the IBM common stock closing price using SONIA in the period 17th May 1961 to 2nd November 1962. (b) Part of the prediction of the daily US Dollar/Euro exchange rate using SONIA in the period 3rd January 2000 to 4th November 2005.
5 Conclusion

In this paper, a novel application of the SONIA neural network to financial time series prediction has been presented. Two time series were used in the simulations: the daily exchange rate between the US Dollar and the Euro for the period 3rd January 2000 to 4th November 2005, containing approximately 1525 data points, and the IBM common stock closing price from 17th May 1961 to 2nd November 1962, giving a total of 360 trading days. The simulation results showed that the SONIA network produced profit from the predictions on both time series.
References

1. Sitte, R., Sitte, J.: Analysis of the Prediction of Time Delay Neural Networks Applied to the S&P 500 Time Series. IEEE Transactions on Systems, Man and Cybernetics 30 (2000) 568-572
2. Lindemann, A., Dunis, C.L., Lisboa, P.: Level Estimation, Classification and Probability Distribution Architectures for Trading the EUR/USD Exchange Rate. Neural Computing & Applications, forthcoming (2005)
3. Lindemann, A., Dunis, C.L., Lisboa, P.: Probability Distributions, Trading Strategies and Leverage: An Application of Gaussian Mixture Models. Journal of Forecasting 23 (8) (2004) 559-585
4. Dunis, C., Williams, M.: Applications of Advanced Regression Analysis for Trading and Investment. In: Dunis, C., Laws, J., Naïm, P. (eds.): Applied Quantitative Methods for Trading and Investment. John Wiley, Chichester (2003)
5. Zhang, G.Q., Michael, Y.H.: Neural Network Forecasting of the British Pound/US Dollar Exchange Rate. Omega 26 (4) (1998) 495-506
6. Haykin, S.: Neural Networks: A Comprehensive Foundation. Prentice-Hall, Englewood Cliffs, NJ (1999)
7. Cao, L.J., Francis, E.H.T.: Support Vector Machine with Adaptive Parameters in Financial Time Series Forecasting. IEEE Transactions on Neural Networks 14 (6) (2003) 1506-1518
8. Versace, M., Bhatt, R., Hinds, O., Shiffer, M.: Predicting the Exchange Traded Fund DIA with a Combination of Genetic Algorithms and Neural Networks. Expert Systems with Applications 27 (2004) 417-425
9. Knowles, A.: Higher Order and Pipelined Networks for Time Series Prediction of Currency Exchange Rates. MPhil Thesis, Liverpool John Moores University (2006)
10. Fama, E.F., French, K.R.: Business Conditions and Expected Returns on Stocks and Bonds. Journal of Financial Economics 25 (1989) 23-49
11. Widyanto, M.R., Nobuhara, H., Kawamoto, K., Hirota, K., Kusumoputro, B.: Improving Recognition and Generalization Capability of Back-Propagation NN Using a Self-Organized Network Inspired by Immune Algorithm (SONIA). Applied Soft Computing 6 (2005) 72-84
12. Timmis, J.I.: Artificial Immune Systems: A Novel Data Analysis Technique Inspired by the Immune Network Theory. Ph.D. Dissertation, University of Wales, Aberystwyth (2001)
13. Thomason, M.: The Practitioner Method and Tools. Journal of Computational Intelligence in Finance 7 (3) (1999) 36-45
14. Kaastra, I., Boyd, M.: Designing a Neural Network for Forecasting Financial and Economic Time Series. Neurocomputing 10 (1996) 215-236
15. Hyndman, R.J. (n.d.): Time Series Data Library. http://www-personal.buseco.monash.edu.au/~hyndman/TSDL/. Original source: McCleary & Hay, Applied Time Series Analysis for the Social Sciences. Sage Publications (1980)