Lecture Notes in Computer Science

Genetic Programming with Wavelet-Based Indicators for Financial Forecasting

Jin Li¹, Zhu Shi² and Xiaoli Li¹

¹ The Centre of Excellence for Research in Computational Intelligence and Applications (CERCIA), School of Computer Science, The University of Birmingham, Edgbaston, Birmingham B15 2TT, UK.
² School of Software Engineering, The University of Science and Technology of China, P.R. China

Wavelet analysis is a promising technique that has been used to approach numerous problems in science and engineering. Recent years have witnessed its novel application in economics and finance. This paper investigates whether features (or indicators) extracted using wavelet analysis can improve financial forecasting by means of Financial Genetic Programming (FGP), a genetic programming based forecasting tool (see Li, 2001). More specifically, to predict whether the Dow Jones Industrial Average (DJIA) Index will rise by 2.2% or more within the next 21 trading days, we first extract indicators based on wavelet coefficients of the DJIA time series using a discrete wavelet transform; we then feed FGP with those wavelet-based indicators to generate decision trees and make predictions. A comparison with the prediction performance of our previous study (Li and Tsang, 2000) suggests that wavelet analysis is capable of providing promising indicators and of improving the forecasting performance of FGP.

Address for correspondence: Jin Li, CERCIA, School of Computer Science, The University of Birmingham, Edgbaston, Birmingham B15 2TT, UK. E-Mail: [email protected]

1 Introduction

In the past decade, researchers in applied mathematics and electrical engineering have developed wavelet analysis methods for the multi-scale representation and analysis of complicated signals. Examples of wavelet applications include turbulence analysis, image compression, earthquake prediction and biomedical signal processing (e.g., Daubechies, 1992; Li and Yao, 2005; Li et al., 2000; Percival and Walden, 2000; Meyer, 1993). More recent years have witnessed novel applications in economics and finance (e.g., Pan and Wang, 1998; Ramsey and Zhang, 1997; Ramsey and Lampart, 1998). Overviews of wavelets in economics and finance can be found in Gençay et al. (2002) and Ramsey (2002). Often, the wavelet transform is applied as a decomposition tool to analyse financial time series. Applications of wavelets usually focus on studying the dynamics and correlation of financial time series, including scaling properties of foreign exchange volatility (Gençay et al., 2001), systematic risk in a capital asset pricing model (Gençay et al., 2003), and the relationship between financial variables and real economic activity (Kim and In, 2003). Apart from these, there are also wavelet applications for financial forecasting in which wavelet coefficients are used directly as features and input to neural networks for prediction (e.g., Arino, 1996; Aussem et al., 1998; Murtagh et al., 2003).

The interest in wavelet analysis in empirical finance is attributable to its advantages. As opposed to traditional Fourier techniques, wavelet analysis is able to reveal localized information within the data in the time-scale plane. More specifically, it is capable of decomposing an observed time series into a set of multi-scale (multiresolution) constituent time series. This makes it suitable for the analysis of nonlinear and non-stationary financial time series. The time-scale decomposition leads to a number of benefits for financial analysis. Firstly, in theory, one can study a financial time series at many time scales, rather than at only a few traditional ones such as "long run" and "short run". In practice, signals are usually decomposed into a number of constituent signals by discrete wavelet transforms. Secondly, through the decomposition, many anomalies or noise components in the data can be revealed and therefore treated separately (e.g., removed) if necessary. Finally, from the forecasting point of view, it becomes possible to tailor specific computational forecasting techniques to different constituent time scales and thereby gain forecasting efficiency (Ramsey, 2002).

This study applies a discrete wavelet transform to decompose a financial time series. A number of features are derived from the wavelet coefficients. The differences between this study and existing studies in the literature (e.g., Arino, 1996; Aussem et al., 1998; Murtagh et al., 2003) are as follows. Firstly, the forms of the features are different. Our features are derived from the wavelet coefficients, rather than being the values of the coefficients themselves. The indicators extracted in this study reflect dynamic and statistical properties of the time series. Secondly, our features are generated using coefficients at a certain level, rather than at all levels, in an attempt to remove possible noise from the original data. Finally, our approach adopts genetic programming rather than neural networks, and is therefore able to generate comprehensible decision trees. This is an advantage over neural-network approaches, because solutions need to be understood by human beings for decision making in finance and economics. The purpose of this study is to investigate whether wavelet analysis can be exploited to improve financial forecasting.
We carry out this investigation through a prediction task addressed previously by a genetic programming based tool, Financial Genetic Programming (FGP) (see Li, 2001). The task is to predict whether an index will rise by r% or more within the next n periods. Our earlier studies (Li and Tsang, 1999a; 1999b; 2000; Tsang et al., 1998; Tsang and Li, 2002; Tsang et al., 2004) made predictions using indicators derived from technical analysis rules found in textbooks. In this study, predictions are made using a number of novel indicators extracted by means of wavelet analysis. It is worth pointing out that such indicators may have more predictive value for future price movements, because they may remove some of the noise in the original data. Provided that all experimental settings are the same, any improvement in the forecasting performance reported in this study would suggest that wavelet analysis is of value to FGP.

The structure of the paper is as follows. Section 2 describes the FGP system used. Section 3 introduces wavelet analysis and discusses how the wavelet-based indicators are generated. In Section 4, experiments and results are reported. Discussion and conclusions are given in the final section.

2 FGP for Financial Forecasting

This section reviews the history of FGP and briefly presents its technical detail for financial forecasting. The measures of its prediction performance are also given.

2.1 Overview of FGP

FGP is the main part of the implementation of the Evolutionary Dynamic Data Investment Evaluator (EDDIE) (Tsang et al., 1998; Tsang and Li, 2002; Tsang et al., 2004), which is an interactive genetic programming based financial forecasting tool. It aims to help analysts search the space of interactions and make financial decisions. Given a set of indicators (or features, from the point of view of data mining), FGP attempts to find interactions among indicators and to discover appropriate thresholds for them. Using genetic programming, FGP generates Genetic Decision Trees (GDTs), which can be understood by human experts. Human expertise is channelled into FGP through the indicators that form the input to the system. In this way, experts can experiment with a variety of indicators more easily. The forecasting performance of FGP depends crucially on the quality of the indicators chosen. This study aims to examine the effectiveness of alternative indicators based on wavelets, instead of the technical analysis indicators used in our previous studies.

The FGP system has two versions, FGP-1 and FGP-2. FGP-1 is designed to improve forecasting accuracy by combining experts' forecasts from different sources. FGP-2 is designed to improve prediction precision by means of a constraint handle. The handle allows users to choose a constraint, i.e. the percentage of opportunities, though possibly at the price of missing opportunities. FGP-2 improves on FGP-1 by providing this constraint handle through a novel fitness function (see Section 2.2 for details). Both FGP-1 and FGP-2 have been applied to a variety of financial forecasting problems with demonstrated accuracy (Tsang and Li, 2002). In particular, the efficacy of FGP-2 has been examined intensively through a set of prediction tasks: whether an index will rise by r% or more within the next n periods, denoted by $P_n^r$. In this study, FGP-2 is applied to the task $P_{21}^{2.2}$ on the prices of the Dow Jones Industrial Average (DJIA). As in our previous study (Li and Tsang, 2000), we focus on the prediction precision of FGP-2 (see the definition in Section 2.2).

FGP-2 generates GDTs to make predictions. An example of a GDT is shown below, where a "Positive" prediction means that the goal can be achieved and "Negative" means otherwise.

(IF (MV_50 < -18.45)
 THEN Positive
 ELSE (IF ((TRB_5 > -19.48) AND (Filter_63 < 36.24))
       THEN Negative
       ELSE Positive))

MV_50, TRB_5 and Filter_63 in this GDT belong to three types of technical indicators. They were derived from three simple technical analysis rules in the financial literature (e.g., Alexander, 1964; Sweeney, 1988; Brock et al., 1992), namely moving average rules, filter rules and trading range break rules. These indicators have been argued to have merit for financial forecasting (see Brock et al., 1992; Sweeney, 1988). Our previous study (Li and Tsang, 2000) adopted the following six indicators to attack $P_{21}^{2.2}$ on DJIA.

(1) MV_12 = Today's price - the average price of the last 12 trading days
(2) MV_50 = Today's price - the average price of the last 50 trading days
(3) Filter_5 = Today's price - the minimum price of the last 5 trading days
(4) Filter_63 = Today's price - the minimum price of the last 63 trading days
(5) TRB_5 = Today's price - the maximum price of the last 5 trading days
(6) TRB_50 = Today's price - the maximum price of the last 50 trading days
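
As an illustration only, the six technical indicators and the example GDT quoted above can be sketched in Python. The function names `indicator_values` and `example_gdt` and the synthetic price series are hypothetical; the sketch also assumes that "the last n trading days" includes today, which the text does not state explicitly.

```python
def indicator_values(prices):
    """Return the six technical indicators for the most recent day.

    Assumption: "the last n trading days" is read as the n most recent
    prices including today; the paper does not spell this out.
    """
    if len(prices) < 63:
        raise ValueError("need at least 63 daily prices")
    today = prices[-1]
    return {
        "MV_12": today - sum(prices[-12:]) / 12,
        "MV_50": today - sum(prices[-50:]) / 50,
        "Filter_5": today - min(prices[-5:]),
        "Filter_63": today - min(prices[-63:]),
        "TRB_5": today - max(prices[-5:]),
        "TRB_50": today - max(prices[-50:]),
    }


def example_gdt(ind):
    """Evaluate the example GDT from the text on one set of indicator values."""
    if ind["MV_50"] < -18.45:
        return "Positive"
    if ind["TRB_5"] > -19.48 and ind["Filter_63"] < 36.24:
        return "Negative"
    return "Positive"


# Hypothetical data: 63 days of a steadily rising price (not DJIA data).
prices = [100.0 + 0.5 * day for day in range(63)]
indicators = indicator_values(prices)
signal = example_gdt(indicators)
print(indicators, signal)
```

The nested `if` statements mirror the IF-THEN-ELSE structure of the tree, which is what makes a GDT readable by a human analyst.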

Nevertheless, finding alternative promising indicators is one of the important motivations of this study. The hope is that newly derived wavelet-based indicators will have more merit for prediction, so that the performance of FGP can be improved in terms of prediction precision. For brevity, the workings of FGP are explained in the pseudo code below. For more details of the genetic programming technique, interested readers can refer to Koza (1992).

Procedure FGP( )
Begin
    Partition the whole data into training data and testing data;
    /* The training data is employed to train FGP to find the best-so-far rule;
       the testing data is used to determine the predictive performance of the
       best-so-far rule. */

    Pop ← InitializePopulation(Pop);    /* randomly create a population of decision trees */
    Evaluation(Pop);                    /* calculate the fitness of each individual in Pop */

    Repeat
        Pop ← Reproduction(Pop) + Crossover(Pop);
        /* A new population is created by the genetic operators of reproduction,
           which reproduces M*Pr individuals, and crossover, which creates
           M*(1-Pr) individuals. In our case Pr = 0.1; M is the population size. */

        Pop ← Mutation(Pop);            /* apply mutation to the population */
        Evaluation(Pop);
    Until (TerminationCondition( ))     /* determine whether the termination condition is met */

    Apply the best-so-far rule to the testing data;
End
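
The generational loop in the pseudo code can be sketched as runnable Python. This is a minimal sketch under stated assumptions, not the FGP implementation: individuals here are single numbers standing in for decision trees, the operators `toy_new`, `toy_crossover` and `toy_mutate` are hypothetical, and termination uses a generation count only. The reproduction fraction Pr = 0.1 comes from the pseudo code; tournament selection of size 4 and the 0.01 mutation rate follow the settings reported in Table 2.

```python
import random


def tournament(pop, fitness, rng, size=4):
    """Pick the fittest of `size` randomly drawn individuals."""
    return max(rng.sample(pop, size), key=fitness)


def evolve(fitness, new_individual, crossover, mutate,
           pop_size=40, generations=30, p_repro=0.1, seed=0):
    """Generational loop: reproduction + crossover, then mutation."""
    rng = random.Random(seed)
    pop = [new_individual(rng) for _ in range(pop_size)]
    best = max(pop, key=fitness)                    # best-so-far rule
    for _ in range(generations):
        n_repro = round(pop_size * p_repro)         # M * Pr copies
        new_pop = [tournament(pop, fitness, rng) for _ in range(n_repro)]
        while len(new_pop) < pop_size:              # M * (1 - Pr) offspring
            a = tournament(pop, fitness, rng)
            b = tournament(pop, fitness, rng)
            new_pop.append(crossover(a, b, rng))
        pop = [mutate(ind, rng) for ind in new_pop]
        best = max(pop + [best], key=fitness)
    return best


# Toy stand-ins (hypothetical): an individual is one threshold value and
# fitness peaks at 3.0. Real FGP individuals are decision trees.
def toy_fitness(x):
    return -(x - 3.0) ** 2


def toy_new(rng):
    return rng.uniform(-10.0, 10.0)


def toy_crossover(a, b, rng):
    w = rng.random()
    return w * a + (1.0 - w) * b


def toy_mutate(x, rng, rate=0.01):
    return x + rng.gauss(0.0, 1.0) if rng.random() < rate else x


best = evolve(toy_fitness, toy_new, toy_crossover, toy_mutate)
print(best, toy_fitness(best))
```

Keeping the best-so-far individual outside the population guarantees that the returned rule is never worse on the training fitness than any rule seen earlier in the run.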

2.2 Performance Measures of FGP

The prediction problem $P_n^r$ can be treated as a binary classification problem. Each day can be classified as either a positive position or a negative position. A positive position predicted by the GDT is sometimes called a buying signal or a recommendation to buy; both terms are used in the remainder of this paper. For each GDT, we define the Rate of Correctness (RC), the Rate of Missing Chance (RMC), and the Rate of Failure (RF) as its prediction performance measures. The Rate of Precision (RP) is also given as an important reference measure for the user, as it measures the accuracy of buying signals. The formula for each measure is given through a contingency table (Table 1) as follows.

Table 1. A contingency table for the binary classification, where a specific prediction rule is invoked.

                     Predicted negative      Predicted positive      Number of cases
Actual negative      # of True Negative      # of False Positive     Actual # of negative
                     Positions [TN]          Positions [FP]          positions (O-) = TN+FP
Actual positive      # of False Negative     # of True Positive      Actual # of positive
                     Positions [FN]          Positions [TP]          positions (O+) = FN+TP
Number of cases      # of negative           # of positive
                     positions predicted     positions predicted
                     (N-) = TN+FN            (N+) = FP+TP

$$ RC = \frac{TP + TN}{O^{+} + O^{-}} = \frac{TP + TN}{N^{+} + N^{-}}; \qquad RMC = \frac{FN}{O^{+}}; \qquad RF = \frac{FP}{N^{+}} = 1 - RP $$
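
The measures in Table 1 can be computed directly from the four counts. The following is a small sketch; the function name `prediction_measures` and the example counts are hypothetical and are not results from the paper. RP is obtained as 1 - RF, which equals TP/N+.

```python
def prediction_measures(tp, tn, fp, fn):
    """Return RC, RMC, RF and RP from the four contingency-table counts."""
    o_pos = fn + tp          # actual positive positions (O+)
    o_neg = tn + fp          # actual negative positions (O-)
    n_pos = fp + tp          # predicted positive positions (N+)
    if o_pos == 0 or n_pos == 0:
        raise ValueError("RMC and RF need at least one actual "
                         "and one predicted positive position")
    rc = (tp + tn) / (o_pos + o_neg)
    rmc = fn / o_pos
    rf = fp / n_pos
    return {"RC": rc, "RMC": rmc, "RF": rf, "RP": 1.0 - rf}


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
measures = prediction_measures(tp=30, tn=50, fp=10, fn=10)
print(measures)
```

With these counts, 40 positions are predicted positive and 30 of them are correct, so RP is 0.75 and RF is 0.25, while 10 of the 40 actual opportunities are missed, giving an RMC of 0.25.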

As mentioned earlier, FGP-1 generates GDTs aimed at making predictions as accurately as possible. Thus, RC on its own is an appropriate fitness function for FGP-1. In contrast, FGP-2 attempts to improve prediction precision, i.e., RP, which is equivalent to reducing RF. A lower RF means that each positive recommendation made by the GDT is more likely to be a good and correct opportunity for the investor to make a bid. FGP-2 achieves this target by means of a novel constrained fitness function, defined as follows.

$$ f = w_{rc} \cdot RC - w_{rmc} \cdot RMC - w_{rf} \cdot RF, \qquad \text{where } 0 \le w_{rc},\ w_{rmc},\ w_{rf} \le 1 \tag{1} $$

It involves three performance measures, i.e. RC, RMC and RF, and three weights, i.e. w_rc, w_rmc and w_rf. The goodness of a GDT is no longer assessed only by its RC, but by a composite value, which is the weighted sum of its three performance rates. Users can express their preference for any measure by adjusting the weights. Due to the brittleness of the fitness function (cf. Tsang and Li, 2002), a novel constraint parameter, R = [Pmin, Pmax], is introduced into Function 1. It defines the minimum and maximum percentage of recommendations that FGP is forced to achieve on the training data (as with most machine learning methods, the assumption is that the test data exhibits similar characteristics). The effectiveness of the constraint in the fitness function for achieving more reliable and accurate predictions has been demonstrated in our previous studies. In general, FGP-2 allows the user to tune a parameter, i.e. the constraint R, in order to improve RP without affecting RC significantly, though at the price of increasing RMC. Such a scenario recurs in this study as well (see Section 4). It is worth emphasising again that the performance of FGP depends crucially on the quality of the indicators chosen by the users. We argue that higher-quality indicators would almost always lead to better performance of FGP. This is evident in this study.
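
To make the role of the weights in Function 1 concrete, the following sketch evaluates the weighted fitness for two GDTs. The function name `fgp2_fitness`, the weight values and the performance rates are all hypothetical. The constraint R = [Pmin, Pmax] is deliberately not implemented, because the way it enters the fitness is not specified in this section.

```python
def fgp2_fitness(rc, rmc, rf, w_rc, w_rmc, w_rf):
    """Weighted fitness f = w_rc*RC - w_rmc*RMC - w_rf*RF (Function 1).

    The constraint R = [Pmin, Pmax] is not modelled in this sketch.
    """
    for w in (w_rc, w_rmc, w_rf):
        if not 0.0 <= w <= 1.0:
            raise ValueError("weights must lie in [0, 1]")
    return w_rc * rc - w_rmc * rmc - w_rf * rf


# Two hypothetical GDTs: tree A is slightly more accurate overall,
# tree B issues fewer but more reliable buying signals.
tree_a = {"rc": 0.80, "rmc": 0.25, "rf": 0.25}
tree_b = {"rc": 0.78, "rmc": 0.40, "rf": 0.10}

# RC only, which is the FGP-1 style of fitness.
rc_only = {"w_rc": 1.0, "w_rmc": 0.0, "w_rf": 0.0}
# Illustrative weights that also penalise failures and missed chances.
weighted = {"w_rc": 1.0, "w_rmc": 0.2, "w_rf": 0.5}

f_a_rc = fgp2_fitness(**tree_a, **rc_only)
f_b_rc = fgp2_fitness(**tree_b, **rc_only)
f_a_w = fgp2_fitness(**tree_a, **weighted)
f_b_w = fgp2_fitness(**tree_b, **weighted)
print(f_a_rc, f_b_rc, f_a_w, f_b_w)
```

Under the RC-only weights tree A ranks higher, whereas the weights that penalise RF reverse the ranking in favour of tree B, which is the kind of preference shift the weights are meant to allow.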

3 Wavelet-Based Indicators

In this section, a brief introduction to wavelet analysis is given. We then describe the indicators used in this study, which are derived from wavelet coefficients.

3.1 Discrete Wavelet Transform

Successful applications in science and engineering demonstrate that the wavelet transform is a powerful signal and image processing method. The wavelet transform overcomes the shortcomings of the short-time Fourier transform (STFT) by performing a multiresolution analysis of signals (e.g., Meyer, 1993; Daubechies, 1992). It can be used to describe how the frequency content of a non-stationary time series changes over time in the time-scale space. Thus, some transients that are hidden in the time series can be highlighted. The wavelet transform of a time series x(t) is defined as

$$ W(a, b) = \frac{1}{\sqrt{a}} \int x(t)\, \psi^{*}\!\left(\frac{t - b}{a}\right) dt \tag{2} $$

where t is the time, a > 0 and b are the scale and translation parameters, respectively, and ψ(·) is a mother wavelet. W(a, b) are the coefficients of the wavelet transform of x(t). 1/a is proportional to the frequency of the wavelet function. For a small value of a, the wavelet coefficient corresponds roughly to a high frequency component of the time series, whereas a large value corresponds to a low frequency component. By adjusting the scale parameter a, the wavelet transform can flexibly decompose a time series x(t) into multiple-resolution constituent time series. Since the wavelet coefficients indicate local characteristics of a non-stationary time series in the time-scale space, in practice one often extracts features from them in order to identify system states. To derive wavelet-based indicators, we apply a discrete wavelet transform to decompose the time series. The discrete wavelet decomposition of a discrete time series x(n) (n = 1, 2, …, N), where N is the number of data points in the time series, can be defined as follows:

$$ x(n) = \sum_{k} C_J(k)\, \phi_J(2^{-J} n - k) + \sum_{j=1}^{J} \sum_{k} d_j(k)\, \psi_j(2^{-j} n - k) \tag{3} $$

where d_j(k) are called the wavelet coefficients at level j (j = 1, 2, …, J), and C_J(k) are the coefficients at the maximum resolution level J. Both sets of coefficients vary with position, as indicated by the value of k. The value of J can be set by the user from 1 up to a maximum integer supported by N (i.e. 2^J < N).

φ_j(·) are called father wavelets and ψ_j(·) mother wavelets; both are derivable from a basis wavelet (e.g., the Haar, the Daublet, the Morlet). The father wavelet provides an approximate version of the time series at successive resolutions, whilst the mother wavelet captures the detail at each resolution. In summary, given a time series, a basis wavelet, and a parameter J, both sets of coefficients, i.e. C_J(k) and d_j(k) (j = 1, 2, …, J), can be calculated by a fast recursive scheme (see Meyer, 1993). C_J(k) represents the smooth coefficients that capture the trend of the time series, whereas d_j(k), representing increasingly fine resolution deviations from the smooth trend, capture higher frequency oscillations. The extent to which the resultant coefficients C_J(k) smooth the time series is determined by the value of J selected: the larger J is, the smoother the part of the time series captured by C_J(k). The choice of J is crucial in applications of wavelet analysis to finance and will be discussed further in Section 4.
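
The fast recursive scheme can be illustrated with the Haar basis, the simplest of the basis wavelets named above. This is a sketch under stated assumptions: the text does not say which basis wavelet the experiments use, the function names `haar_dwt` and `haar_idwt` are hypothetical, the orthonormal 1/√2 scaling and the sign of the detail coefficients are one common convention, and the input length is assumed to be divisible by 2^J.

```python
import math


def haar_dwt(x, levels):
    """Return (smooth, details) with smooth = C_J and details = [d_1, ..., d_J]."""
    if levels < 1 or len(x) % (2 ** levels) != 0:
        raise ValueError("len(x) must be divisible by 2**levels")
    approx = list(x)
    details = []
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Detail coefficients: scaled differences of neighbouring values.
        details.append([(a - b) / math.sqrt(2) for a, b in pairs])
        # Smooth coefficients: scaled sums, passed on to the next level.
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return approx, details


def haar_idwt(smooth, details):
    """Invert haar_dwt, rebuilding the series from the coarsest level down."""
    approx = list(smooth)
    for d in reversed(details):
        rebuilt = []
        for s, w in zip(approx, d):
            rebuilt.append((s + w) / math.sqrt(2))
            rebuilt.append((s - w) / math.sqrt(2))
        approx = rebuilt
    return approx


# Hypothetical 8-point series decomposed to level J = 3.
prices = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
smooth, details = haar_dwt(prices, levels=3)
print(smooth, details)
```

Each pass halves the number of smooth coefficients, so a larger J leaves fewer, smoother values in C_J(k), in line with the remark on the role of J.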

3.2 Deriving Wavelet-Based Indicators

In this paper, the energy, entropy and other properties of C_J(k), the wavelet coefficients at level J, are calculated and taken as indicators for FGP-2. The reason for choosing C_J(k), rather than d_j(k), is that C_J(k) captures the major trends of a time series whereas d_j(k) only captures deviations from them. Some of the derived indicators describe dynamic features of a financial time series, whilst others are simple statistics of it. Given that a financial time series could potentially reflect the dynamics of the movements of financial markets, all indicators could have financial meaning to some extent. The formulae for extracting these features are given below.

(1) Energy feature. This feature is based on the amplitude of a time series at different frequencies. The energy of the wavelet coefficients at each resolution level j = 1, 2, …, J, computed over a sliding window of size l with index i, is written as:

$$ (BE_j)_i = \Big( \sum_{k} C_j(k)^2 \Big)_i \tag{4} $$

(2) Entropy feature. This feature measures the uncertainty of the wavelet coefficients at level j. p_{k,j} is the probability distribution of the wavelet coefficients at scale j. The entropy at each resolution level j = 1, 2, …, J, based on the wavelet coefficients within a sliding window of size l with index i, is given by

$$ (H_j)_i = \Big( -\sum_{k} p_{k,j} \ln p_{k,j} \Big)_i \tag{5} $$

(3) Curve length. This feature measures the length of the trajectory of the wavelet coefficients. If the curve length is long, the change of the system is severe; otherwise it is slight. The curve length is computed as:

$$ CL[j] = \sum_{i=1}^{l} \left| C_j(i-1) - C_j(i) \right| \tag{6} $$

(4) Nonlinear energy. This feature describes the local change of energy information, which can be used to extract the "spikes" in the wavelet coefficients. The nonlinear energy is computed as:

$$ NE[j] = \sum_{i=1}^{l} \big( C_j(i)\, C_j(i) - C_j(i-1)\, C_j(i+1) \big) \tag{7} $$

(5) Statistical features. Some basic statistics can also be applied to extract features from the wavelet coefficients. They are listed as follows:

$$
\begin{aligned}
\text{Mean:} \quad & Mean[j] = \frac{1}{l} \sum_{i=1}^{l} C_j(i) \\
\text{Maximum:} \quad & f_{max}(j) = \max\big(C_j(1), \ldots, C_j(l)\big) \\
\text{Minimum:} \quad & f_{min}(j) = \min\big(C_j(1), \ldots, C_j(l)\big) \\
\text{Median:} \quad & f_{med}(j) = \mathrm{median}\big(C_j(1), \ldots, C_j(l)\big) \\
\text{Standard deviation:} \quad & STD(j) = \sqrt{\frac{1}{l} \sum_{i=1}^{l} \big(C_j(i) - Mean[j]\big)^2}
\end{aligned}
\tag{8}
$$

All nine indicators are adopted by FGP. Note that any feature above at time index i is calculated using a fixed sliding window that covers the preceding l coefficient values (i.e. window size = l), because only previous coefficients are available at time index i. We did not conduct any feature selection in this study, as genetic programming itself is capable of adaptively selecting the more promising indicators via its genetic operators, such as reproduction, crossover and mutation, while evolving decision trees.
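
The nine indicators can be sketched for a single window of coefficients as follows. Several points are assumptions rather than details taken from the text: the function name `wavelet_indicators` is hypothetical; the probabilities in the entropy are taken as each coefficient's share of the window energy; the curve length is read as a sum of absolute differences; and the sums for curve length and nonlinear energy are restricted to indices whose neighbours lie inside the window.

```python
import math
import statistics


def wavelet_indicators(window):
    """Compute the nine indicators from one window of l coefficients."""
    l = len(window)
    if l < 3:
        raise ValueError("window needs at least 3 coefficients")
    energy = sum(c * c for c in window)
    # Assumption: p_k is the share of coefficient k in the window energy.
    probs = [c * c / energy for c in window if c != 0.0]
    entropy = -sum(p * math.log(p) for p in probs)
    # Assumption: only neighbours inside the window are used.
    curve_length = sum(abs(window[i - 1] - window[i]) for i in range(1, l))
    nonlinear_energy = sum(window[i] * window[i] - window[i - 1] * window[i + 1]
                           for i in range(1, l - 1))
    mean = sum(window) / l
    std = math.sqrt(sum((c - mean) ** 2 for c in window) / l)
    return {
        "BE": energy,
        "H": entropy,
        "CL": curve_length,
        "NE": nonlinear_energy,
        "Mean": mean,
        "fmax": max(window),
        "fmin": min(window),
        "fmed": statistics.median(window),
        "STD": std,
    }


# Hypothetical window of l = 4 smooth coefficients.
features = wavelet_indicators([1.0, 2.0, 4.0, 3.0])
print(features)
```

In use, the window would slide along the level-J coefficients so that each time index i receives one vector of nine values, which then serve as the input terminals of FGP-2.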

4 Experiments and Results

As mentioned earlier, this study examines whether wavelet-based indicators are able to bring any benefit to FGP in forecasting. In particular, we are keen to know whether they improve the performance of FGP-2 in reducing RF or, equivalently, increasing RP.

Table 2. Tableau for the parameters of FGP-2 experiments

Parameter                                  Value
Input terminals                            BE3, H3, CL[3], NE[3], Mean[3], fmax(3), fmin(3),
(9 wavelet-based indicators)               fmed(3), STD(3), and real values.
Prediction terminals                       {0, 1}: 1 means "Positive"; 0 means "Negative".
Non-terminals                              If-then-else, And, Or, Not, >, ≥, …
Crossover rate                             0.9.
Mutation rate                              0.01.
Population size                            1,200.
Maximum no. of generations                 30.
Termination criterion                      The maximum number of generations has been run or
                                           FGP-2 has run for more than 0.5 hours.
Selection strategy                         Tournament selection, Size = 4.
Max depth of individual programs           17.
Max depth of initial individual programs   4.
Run times (hours)                          0-0.5 hours.
Hardware and operating system              Intel® Xeon™ PC 2.8 GHz running Windows 2000 with 2G RAM.
