
Procedia Computer Science 48 (2015) 14–21

International Conference on Intelligent Computing, Communication & Convergence (ICCC-2015), organized by the Interscience Institute of Management and Technology, Bhubaneswar, Odisha, India

A Model Ranking Based Selective Ensemble Approach for Time Series Forecasting

Ratnadip Adhikari*, Ghanshyam Verma, Ina Khandelwal

Department of Computer Science and Engineering, The LNM Institute of Information Technology, Jaipur-302031, India

Abstract

Time series analysis is a highly active research topic that encompasses various domains of science, engineering, and finance. A major challenge in this field is to obtain reasonably accurate forecasts of future data by analyzing the past records. A fruitful alternative to using a single forecasting technique is to combine the forecasts from several conceptually different models. Numerous research studies strongly recommend this approach, because a combination of multiple forecasts almost always substantially reduces the overall forecasting errors and outperforms the component models. In this paper, we propose an ensemble method that selectively combines some of the constituent forecasting models, instead of combining all of them. On each time series, the component models are successively ranked as per their past forecasting accuracies and we then combine the forecasts of a group of high-ranked models. Empirical analysis is conducted with nine individual models and four real-world time series datasets. Results clearly show that our proposed ensemble mechanism achieves consistently better accuracies than all component models and other conventional forecasts combination schemes.

© 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Peer-review under responsibility of the scientific committee of the International Conference on Computer, Communication and Convergence (ICCC 2015).

* Corresponding author. Tel.: +91-966-093-3276; E-mail address: [email protected]

1877-0509. doi:10.1016/j.procs.2015.04.104

Keywords: Time series; Combination of forecasts; Forecasting accuracy; Box-Jenkins models; Neural networks; Support vector machines.

1. Introduction

Time series analysis is a highly important and dynamic research domain with numerous practical applications. Its primary objective is to develop a mathematical model that estimates the underlying data generation process, retaining the statistical properties of the series, and then to forecast the desired number of future observations through this model. Appropriate modeling and forecasting of a time series is a considerably difficult task, mainly due to several unintended characteristics often associated with the series. These include nonstationarity, irregular fluctuations, seasonal and cyclical variations, deviations from the standard statistical specifications, and severe multicollinearity among the observations [1].

A fruitful alternative is to combine the forecasts from several structurally different models, instead of adopting only one model. Forecasts combination is based on the rational idea that no specific model alone can consistently achieve the best forecasts for a class of time series, but multiple models in unison can provide a very close estimation of the actual data generation process [2]. A number of renowned research works in this domain have demonstrated that a combination of forecasts generally achieves much better forecasting accuracy than each component model. Moreover, this approach also substantially reduces the risk associated with selecting a single individual forecasting technique [3, 4]. Throughout the past two decades, there has been an overwhelming amount of research on combining forecasts, mainly due to its outstanding potency of accuracy enhancement. As a result, a variety of combination techniques have been developed in the literature [2, 3]. Most of them form a weighted linear combination of the component forecasts, the weights being determined from the past forecasting records of the participating models. They range from simple statistical techniques, e.g. the simple average, trimmed mean, winsorized mean, and median [5], to more advanced methods, e.g. the outperformance and optimal linear combination of forecasts [2, 6]. Recently, Adhikari and Agrawal [7] comprehensively reviewed the performances of several linear forecasts combination techniques on nine real time series datasets. An important finding from past as well as recent research is that simple combination techniques generally achieve considerably better accuracies than more complex schemes.

We further notice that there has been little work on selecting suitable models for the ensemble; as such, existing works combine all component forecasts. However, not all models will produce good forecasts for a particular time series, and so tactically discarding some of them can potentially improve the overall accuracy to a large extent. This observation is the primary motivation behind the present work. In this paper, we propose an ensemble methodology that combines the forecasts from selected component models. The appropriate subset of forecasts to combine is chosen through a ranking mechanism. At first, the models are successively ranked between one and the total number of models, so that a model with a comparatively smaller in-sample forecasting error receives a smaller, i.e. better, rank and vice versa. Then, starting with the first rank, we consecutively select a predefined number of models and form a weighted linear combination of their forecasts. The weight of each model in this group is assigned to be inversely proportional to its in-sample forecasting error. In this manner, the proposed approach selectively combines the forecasts from a group of better-performing models and discards the others. In order to check the precision and effectiveness of our approach, empirical analysis is carried out with nine individual forecasting models on four real time series datasets.
The forecasting performance of the proposed ensemble is compared with those of the individual models as well as a number of other traditional linear combination techniques, through two popular error measures.

The remainder of the paper is organized as follows. Section 2 describes various well-known linear forecasts combination techniques and Section 3 presents the proposed ensemble mechanism. Section 4 reports the empirical analysis and, finally, Section 5 concludes the paper.

2. The ensemble forecasting paradigm

The most popular and widely used ensemble method is to form a linear combination of the constituent forecasts. Let $\mathbf{Y} = \left[ y_1, y_2, \ldots, y_N \right]^T$ be the actual out-of-sample testing dataset of a time series and $\hat{\mathbf{Y}}^{(i)} = \left[ \hat{y}_1^{(i)}, \hat{y}_2^{(i)}, \ldots, \hat{y}_N^{(i)} \right]^T$ be its forecast through the $i$th model, $i = 1, 2, \ldots, n$. Then, a linear combination of these $n$ forecasts is obtained as follows:

$$\hat{y}_k = w_1 \hat{y}_k^{(1)} + w_2 \hat{y}_k^{(2)} + \cdots + w_n \hat{y}_k^{(n)} = \sum_{i=1}^{n} w_i \hat{y}_k^{(i)}, \qquad k = 1, 2, \ldots, N, \qquad (1)$$

where $w_i$ is the weight assigned to the $i$th forecasting model. Usually, the weights are assumed to be nonnegative, i.e. $w_i \geq 0 \;\forall i$, and unbiased, i.e. $\sum_{i=1}^{n} w_i = 1$. The combined forecast vector for $\mathbf{Y}$ is then $\hat{\mathbf{Y}} = \left[ \hat{y}_1, \hat{y}_2, \ldots, \hat{y}_N \right]^T$.

Over the years, various linear forecasts combination methods have been developed in the literature on the basis of different weight assignment techniques. Some widely popular ones are briefly discussed here.

The simple average is the most intuitive and easiest combination method: it assigns equal weights to all component forecasts, so that $w_i = 1/n$, $i = 1, 2, \ldots, n$. Due to its virtues of remarkable accuracy, impartiality, and robustness, the simple average is often a favorable choice in combining forecasts [4-6]. The median, trimmed mean, and winsorized mean are other successful alternatives to it. The trimmed and winsorized means both form a simple average after taking the $\alpha$ smallest and $\alpha$ largest forecasts and either completely discarding them or setting them equal to the $(\alpha+1)$th smallest and the $(\alpha+1)$th largest forecasts, respectively [6]. The simple average and median are in fact particular cases of a trimmed mean, corresponding to no trimming and maximum possible trimming, respectively.

In an Error Based (EB) method, the weights of the component models are assigned to be inversely proportional to their in-sample forecasting errors. Thus, a model with a larger error receives a smaller weight and vice versa. Usually, the in-sample forecasting errors are measured through some total absolute error statistic, e.g. the Sum of Squared Error (SSE) [2, 7].

A Differential Weighting (DW) scheme is an alternative to the EB method that adaptively estimates the combining weights from the past forecasting records of the constituent models. Here, we use a popular DW method from the work of Winkler and Makridakis [8]. Its weighting scheme is as follows:
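The simple statistical combination rules described above can be sketched as follows. This is a minimal illustration in plain Python (the paper's experiments are in MATLAB); the function names and the trimming level are illustrative, not from the paper.

```python
# Simple statistical forecast-combination rules: simple average,
# trimmed mean, winsorized mean, and median of n component forecasts
# for a single time point.

def simple_average(forecasts):
    """Equal-weight combination: w_i = 1/n."""
    return sum(forecasts) / len(forecasts)

def trimmed_mean(forecasts, alpha):
    """Discard the alpha smallest and alpha largest forecasts, then average."""
    s = sorted(forecasts)
    kept = s[alpha:len(s) - alpha]
    return sum(kept) / len(kept)

def winsorized_mean(forecasts, alpha):
    """Clamp the alpha smallest/largest forecasts to the (alpha+1)-th
    smallest/largest values, then average."""
    s = sorted(forecasts)
    lo, hi = s[alpha], s[-alpha - 1]
    clamped = [min(max(x, lo), hi) for x in s]
    return sum(clamped) / len(clamped)

def median(forecasts):
    """Middle forecast (maximum possible trimming)."""
    s = sorted(forecasts)
    n, mid = len(s), len(s) // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2
```

With `alpha = 0` the trimmed and winsorized means both reduce to the simple average, illustrating the observation above that the simple average and median are the two extreme cases of the trimmed mean.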

$$w_{i,t} = \beta\, w_{i,t-1} + (1 - \beta)\, \frac{\left[ \sum_{s=t-v}^{t-1} \left( e_s^{(i)} \right)^2 \right]^{-1}}{\sum_{j=1}^{n} \left[ \sum_{s=t-v}^{t-1} \left( e_s^{(j)} \right)^2 \right]^{-1}}, \qquad i = 1, 2, \ldots, n. \qquad (2)$$

Here, $n$ is the number of models; $t$ is the forecasting period; $w_{i,t}$ is the weight assigned to the $i$th model on the basis of the data preceding period $t$; $e_s^{(i)}$ is the percentage forecast error of the $i$th model at time $s$, with the summations running over the $v$ periods preceding $t$; and $\beta \in (0, 1)$ is a constant parameter. Following Winkler and Makridakis [8], we take $\beta = 0.7$ in this study.

In the Ordinary Least Squares (OLS) method, the component forecasts, together with a constant, are used as the regression terms in an OLS regression and the weights are determined by minimizing the combined forecast SSE [5, 9, 10]. This method is more general in the sense that it omits the requirements of nonnegativity and unbiasedness, but it includes the risk of producing negative weights, which are often insensible [5, 9]. In practical applications, the weights are determined by minimizing an in-sample combined forecast SSE.

The outperformance method, proposed by Bunn [11], determines the combining weights from the number of times the corresponding models performed best in past in-sample forecasting trials. It considers each weight as the probability that the respective model will outperform the others, i.e. produce the least error in the next trial. It is a very successful, robust, nonparametric approach of combining forecasts [5].
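The DW update of Eq. (2) can be sketched as below. This is a minimal Python sketch of the reconstructed formula, assuming per-model histories of percentage forecast errors; the variable names (`past_errors`, `v`, etc.) are illustrative, not from the paper or the toolbox used.

```python
# Differential Weighting (DW) update, Eq. (2): blend the previous
# weight w_{i,t-1} with a share proportional to the inverse sum of
# squared percentage errors over the v periods preceding t.

def dw_update(prev_weights, past_errors, t, v, beta=0.7):
    """prev_weights: list of w_{i,t-1}, one per model.
    past_errors:  per-model lists of percentage forecast errors e_s^(i),
                  indexed by time s = 0, 1, ..., t-1.
    Returns the updated weights w_{i,t}."""
    inv = []
    for errs in past_errors:
        window = errs[t - v:t]                 # errors for s = t-v, ..., t-1
        inv.append(1.0 / sum(e * e for e in window))
    total = sum(inv)
    return [beta * w + (1 - beta) * (x / total)
            for w, x in zip(prev_weights, inv)]
```

Since the error-based shares sum to one, the updated weights remain a valid (unbiased) weight vector whenever the previous weights do.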


3. The proposed forecasts combination methodology

A major challenge in a forecasts combination is to select the appropriate component models. Ideally, an ensemble should neither include models with reasonably bad forecasting accuracies nor discard any potentially good model. Evaluating the out-of-sample forecasting potency of a model in advance is very difficult and, as such, there has been considerably limited work in this direction [12]. Most of the existing ensemble schemes consider a group of component models and combine all the obtained forecasts. But such an ensemble obviously carries a genuine risk of including some models with reasonably poor performances, which ultimately deteriorate the overall combined forecasting accuracy. In this study, we propose an approach that combines the forecasts from a class of good models, selected through a ranking mechanism based on the past forecasting records of the component models. The models with potentially poor accuracies get filtered out through this ranking technique. The following steps are executed in our proposed ensemble method.

Step 1. We divide the original time series $\mathbf{Y} = \left[ y_1, y_2, \ldots, y_N \right]^T$ into the in-sample training dataset $\mathbf{Y}_{tr} = \left[ y_1, y_2, \ldots, y_{N_{tr}} \right]^T$, the in-sample validation dataset $\mathbf{Y}_{vd} = \left[ y_{N_{tr}+1}, y_{N_{tr}+2}, \ldots, y_{N_{tr}+N_{vd}} \right]^T$, and the out-of-sample testing dataset $\mathbf{Y}_{ts} = \left[ y_{N_{in}+1}, y_{N_{in}+2}, \ldots, y_{N_{in}+N_{ts}} \right]^T$, so that $N_{in} = N_{tr} + N_{vd}$ is the size of the total in-sample dataset and $N = N_{in} + N_{ts}$.

Step 2. Suppose we have $n$ component forecasting models and obtain $\hat{\mathbf{Y}}_{ts}^{(i)} = \left[ \hat{y}_{N_{in}+1}^{(i)}, \hat{y}_{N_{in}+2}^{(i)}, \ldots, \hat{y}_{N_{in}+N_{ts}}^{(i)} \right]^T$ as the forecast of $\mathbf{Y}_{ts}$ through the $i$th model.

Step 3. We fit each model on $\mathbf{Y}_{tr}$ and use it to predict $\mathbf{Y}_{vd}$. Let $\hat{\mathbf{Y}}_{vd}^{(i)} = \left[ \hat{y}_{N_{tr}+1}^{(i)}, \hat{y}_{N_{tr}+2}^{(i)}, \ldots, \hat{y}_{N_{tr}+N_{vd}}^{(i)} \right]^T$ be the prediction of $\mathbf{Y}_{vd}$ through the $i$th model.

Step 4. We find the in-sample forecasting error of each model through some suitable error measure. The Mean Absolute Error (MAE), Mean Squared Error (MSE), and Mean Absolute Percentage Error (MAPE) are three widely popular error statistics, defined as follows [13]:

$$\mathrm{MAE}\left(\mathbf{X}, \hat{\mathbf{X}}\right) = \frac{1}{N} \sum_{t=1}^{N} \left| e_t \right|, \quad \mathrm{MSE}\left(\mathbf{X}, \hat{\mathbf{X}}\right) = \frac{1}{N} \sum_{t=1}^{N} e_t^2, \quad \mathrm{MAPE}\left(\mathbf{X}, \hat{\mathbf{X}}\right) = \frac{1}{N} \sum_{t=1}^{N} \left| \frac{e_t}{x_t} \right| \times 100,$$

where $\mathbf{X} = \left[ x_1, x_2, \ldots, x_N \right]^T$ and $\hat{\mathbf{X}} = \left[ \hat{x}_1, \hat{x}_2, \ldots, \hat{x}_N \right]^T$ are respectively the actual and forecasted datasets and $e_t = x_t - \hat{x}_t$ is the forecasting error at time $t$. In the present study, we adopt the MSE to find the in-sample forecasting errors of the component models.

Step 5. Based on the obtained in-sample forecasting errors, we assign a score to each component model as $S_i = 1 \big/ \mathrm{MSE}\left(\mathbf{Y}_{vd}, \hat{\mathbf{Y}}_{vd}^{(i)}\right)$, $i = 1, 2, \ldots, n$. The scores are inversely proportional to the respective errors, so that a model with a comparatively smaller in-sample error receives a higher score and vice versa.

Step 6. We assign a rank $r_i \in \{1, 2, \ldots, n\}$ to the $i$th model on the basis of its score, so that $r_i \leq r_j$ if $S_i \geq S_j$, $\forall\, i, j \in \{1, 2, \ldots, n\}$. The minimum, i.e. the best rank is 1 and the maximum, i.e. the worst rank is at most $n$.
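Steps 4-6 can be sketched as follows. This is an illustrative Python sketch (the paper's experiments are in MATLAB); the function and variable names are ours, and the score symbol follows the reconstruction above.

```python
# Steps 4-6: compute each model's validation MSE, convert it to a
# score S_i = 1/MSE, and rank the models so that rank 1 is the best.

def mse(actual, forecast):
    """Mean Squared Error between an actual and a forecasted series."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)

def score_and_rank(y_vd, vd_forecasts):
    """y_vd: validation dataset; vd_forecasts: one validation forecast
    list per component model. Returns (scores, ranks)."""
    scores = [1.0 / mse(y_vd, f) for f in vd_forecasts]
    # Sort model indices by decreasing score; position gives the rank.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return scores, ranks
```

Note that a model forecasting the validation set exactly would give a zero MSE; a real implementation would need to guard that division.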

Step 7. We choose a number $n_r$ with $1 \leq n_r \leq n$ and let $I = \{i_1, i_2, \ldots, i_{n_r}\}$ be the index set of the $n_r$ component models whose ranks lie in $[1, n_r]$. Thus, we select the subgroup of the $n_r$ best-ranked component models.

Step 8. Finally, we obtain the weighted linear combination of these selected $n_r$ component forecasts as follows:

$$\hat{y}_k = w_{i_1} \hat{y}_k^{(i_1)} + w_{i_2} \hat{y}_k^{(i_2)} + \cdots + w_{i_{n_r}} \hat{y}_k^{(i_{n_r})} = \sum_{i \in I} w_i \hat{y}_k^{(i)}, \qquad k = 1, 2, \ldots, N, \qquad (3)$$

where $w_{i_k} = S_{i_k} \big/ \sum_{j=1}^{n_r} S_{i_j}$ is the normalized weight of the $k$th selected component model, so that $\sum_{k=1}^{n_r} w_{i_k} = 1$.
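Steps 7-8 can be sketched as below, reusing the scores and ranks from Steps 5-6. Again an illustrative Python sketch with our own names, not the paper's implementation.

```python
# Steps 7-8: keep the n_r best-ranked models and combine their test
# forecasts with weights proportional to their scores, normalized so
# that the selected weights sum to 1 (Eq. 3).

def selective_combine(ts_forecasts, scores, ranks, n_r):
    """ts_forecasts: one test-set forecast list per component model.
    scores, ranks:  as produced in Steps 5-6.
    Returns the combined forecast for the test set."""
    selected = [i for i, r in enumerate(ranks) if r <= n_r]
    total = sum(scores[i] for i in selected)
    weights = {i: scores[i] / total for i in selected}   # sums to 1
    n_points = len(ts_forecasts[0])
    return [sum(weights[i] * ts_forecasts[i][k] for i in selected)
            for k in range(n_points)]
```

Because the discarded models receive zero weight, a single badly behaved component (here the second model) cannot drag the combined forecast away from the consensus of the better-ranked ones.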

In the described ensemble scheme, the selection of the appropriate validation set, i.e. the parameter $N_{vd}$, and of the group size $n_r$ is very important. The validation set should reflect the characteristics of the testing dataset, which is practically unknown in advance. As such, in this study, we set $N_{vd}$ equal to $N_{ts}$, the size of the testing dataset. The group size $n_r$ should also be selected appropriately, so that it is neither too small nor too large.

4. Empirical analysis

To test the effectiveness of the proposed ensemble approach, we have carried out experiments in MATLAB on four important real time series datasets. The individual models are selected from the following widely popular classes: random walk [14], Box-Jenkins [13, 14], Artificial Neural Network (ANN) [13, 14], and Support Vector Machine (SVM) [15, 16]. The random walk and Box-Jenkins models are linear in the sense that each future value is assumed to be a linear function of past observations, whereas ANNs and SVMs have recognized capabilities of learning nonlinear structures in a time series. Two subclasses of Box-Jenkins models, viz. the Autoregressive Integrated Moving Average (ARIMA) [13, 14] and the Seasonal ARIMA (SARIMA) [13], are used in this work and are fitted through the default ARIMA class of the Econometrics toolbox of MATLAB. These two models are commonly expressed as ARIMA(p, d, q) and SARIMA(p, d, q)×(P, D, Q)s, where the parameter pairs (p, P), (d, D), and (q, Q) respectively denote the orders of the autoregressive, differencing, and moving average processes, and s is the period of seasonality. The appropriate parameters are determined through the Box-Jenkins model building methodology [14]. For ANN, we have considered three different variants, viz. the Feedforward ANN (FANN) [2, 14], the Elman ANN (EANN) [2, 17], and the Generalized Regression Neural Network (GRNN) [1], implemented through the default neural network toolbox [19] of MATLAB. Both the iterative (ITER) and direct (DIR) approaches of ANN forecasting are used here. The former predicts a single observation at a time, whereas the latter predicts all the future values in one step [18]. The appropriate ANN structure i×h×o, consisting of the numbers of input nodes (i), hidden nodes (h), and output nodes (o), is identified through in-sample validations. For SVM modeling, we have used the Least Squares SVM (LS-SVM) framework, developed by Suykens and Vandewalle [16] and implemented through the LS-SVMlab toolbox [20], with the Radial Basis Function (RBF) kernel. The optimal values of the parameters, viz. the regularization constant (C) and the RBF tuning parameter (σ), are selected through 10-fold cross-validation, with search ranges $[10^{-5}, 10^{5}]$ and $[2^{-10}, 2^{10}]$, respectively.
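The distinction between the iterative and direct forecasting modes described above can be sketched generically, independent of the ANN models used in the paper. This Python sketch uses a trivial stand-in predictor; in practice the `one_step` and `multi_step` functions would be the trained models, and all names here are illustrative.

```python
# Iterative (ITER) vs direct (DIR) multi-step forecasting.
# ITER: predict one step ahead at a time, feeding each prediction back
#       into the input window. DIR: predict all horizons in one shot.

def iterative_forecast(history, one_step, horizon):
    """one_step(window) -> next value. Predictions are appended to the
    window so later steps condition on earlier predictions."""
    window = list(history)
    out = []
    for _ in range(horizon):
        y = one_step(window)
        out.append(y)
        window.append(y)
    return out

def direct_forecast(history, multi_step, horizon):
    """multi_step(history, horizon) -> list of `horizon` predictions,
    produced in a single call (e.g. an ANN with `horizon` output nodes)."""
    return multi_step(history, horizon)
```

This mirrors the ANN structures in Table 1: the ITER models have a single output node (o = 1), while the DIR models have one output node per test-set observation.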


Four real time series are used in this study: (1) River flow: the monthly flow (in cms) of the Clearwater River at Kamiah, Idaho, USA, from 1911 to 1965 [21]; (2) Employment: the monthly US government employment by statistical area, in thousands of persons, from January 1990 to June 2014 [21]; (3) AUD-INR exchange rate: the monthly Australian Dollar (AUD) to Indian Rupee (INR) exchange rates from January 1993 to July 2014 [22]; (4) Facebook: the daily closing stock prices of Facebook from 18 May 2012 to 3 September 2014 [23]. The time plots of these series are depicted in Fig. 1 and the information regarding their sizes, types, and optimal model parameters is presented in Table 1.


Fig. 1. The time plots of the series: (a) River flow, (b) Employment, (c) AUD-INR exchange rate, (d) Facebook

Table 1. Modeling information of the four time series datasets

Information            | River flow               | Employment                   | AUD-INR exchange rate        | Facebook
Size (total, testing)  | (600, 100)               | (294, 61)                    | (259, 56)                    | (576, 92)
Type                   | Stationary, non-seasonal | Monthly seasonal             | Non-stationary, non-seasonal | Non-stationary, non-seasonal
Box-Jenkins model      | ARIMA(16, 0, 0)          | SARIMA(0, 1, 1)×(0, 1, 1)12  | ARIMA(2, 0, 0)               | ARIMA(1, 0, 0)
ITER-FANN              | 8×6×1                    | 9×3×1                        | 5×1×1                        | 9×11×1
DIR-FANN               | 8×6×100                  | 5×5×61                       | 7×7×56                       | 2×2×92
ITER-EANN              | 8×15×1                   | 7×9×1                        | 6×15×1                       | 13×15×1
DIR-EANN               | 8×15×100                 | 6×5×61                       | 1×12×56                      | 3×11×92
ITER-GRNN              | 11×5×1                   | 1×1×1                        | 4×4×1                        | 3×1×1
DIR-GRNN               | 12×5×100                 | 13×1×61                      | 2×1×56                       | 2×1×92
LS-SVM (C, σ)          | (2.413, 742.785)         | (966.835, 336.116)           | (0.732, 1.668)               | (4984.848, 0.527)
In Table 2, we present the forecasting results obtained through all methods, in terms of MAE and MSE. In this work, we take the group size nr = 5, so that the first five high-ranked models out of nine are considered for combining and the remaining four low-ranked models are discarded.

Table 2. Forecasting results through all methods

Forecasting errors | River flow^3   | Employment      | AUD-INR exchange rate | Facebook
                   | MAE     MSE    | MAE     MSE     | MAE      MSE          | MAE     MSE
Random walk        | 1.755   7.059  | 4.553   32.114  | 1.336    3.042        | 1.671   4.349
Box-Jenkins        | 1.260   2.606  | 4.792   31.413  | 1.307    2.424        | 1.824   4.671
ITER-FANN          | 0.913   1.787  | 4.580   32.762  | 1.245    2.640        | 1.851   5.358
DIR-FANN           | 1.137   1.982  | 5.025   35.720  | 1.760    4.732        | 1.987   6.301
ITER-EANN          | 1.036   2.189  | 4.788   34.732  | 1.342    2.747        | 1.921   5.772
DIR-EANN           | 1.585   4.257  | 4.531   30.200  | 1.415    3.157        | 1.855   5.430
ITER-GRNN          | 1.057   3.250  | 6.157   56.740  | 1.437    3.267        | 1.781   4.736
DIR-GRNN           | 1.947   4.684  | 5.431   42.850  | 1.299    2.525        | 2.233   6.899
LS-SVM             | 1.040   1.960  | 5.058   36.797  | 1.307    2.875        | 1.943   5.930
Simple average     | 0.805   1.656  | 4.067   25.146  | 0.798    1.057        | 0.983   1.678
Trimmed mean^1     | 0.793   1.584  | 4.308   27.803  | 0.806    1.062        | 1.072   1.886
Winsorized mean    | 0.798   1.610  | 4.268   27.386  | 0.799    1.050        | 1.056   1.907
Median             | 0.818   1.615  | 4.379   28.408  | 0.909    1.263        | 1.170   2.056
EB^2               | 0.778   1.476  | 4.347   28.336  | 0.822    1.073        | 0.989   1.747
DW                 | 0.771   1.555  | 4.173   26.229  | 0.805    1.057        | 0.985   1.703
OLS                | 1.128   2.448  | 4.238   27.013  | 1.346    2.908        | 0.986   1.730
Outperformance     | 0.854   1.555  | 4.438   29.778  | 0.915    1.255        | 1.113   2.142
Proposed           | 0.766   1.437  | 3.715   21.154  | 0.745    0.950        | 0.971   1.640

^1 20% trimming is used; ^2 the weight to each model is inversely proportional to its in-sample MSE; ^3 original MAE = MAE×10^2, original MSE = MSE×10^4.

From Table 2, it can be clearly seen that no individual model attains the uniformly best accuracies for all time series and that the combination methods have achieved overall better forecasting results than the component models. Further, the proposed method has achieved the least errors, and hence the best accuracies, for each time series. In Table 3, we present the ranks assigned to the component models through our proposed ensemble scheme. In Fig. 2, we depict the graphs of the actual testing datasets and their forecasts through the proposed method.

Table 3. The assigned ranks to the component models through the proposed ensemble

Models      | River flow | Employment | AUD-INR exchange rate | Facebook
Random walk | 9          | 4          | 1                     | 1
Box-Jenkins | 5          | 8          | 4                     | 3
ITER-FANN   | 1          | 1          | 3                     | 2
DIR-FANN    | 2          | 7          | 8                     | 8
ITER-EANN   | 6          | 2          | 9                     | 9
DIR-EANN    | 7          | 6          | 2                     | 7
ITER-GRNN   | 8          | 5          | 6                     | 6
DIR-GRNN    | 4          | 9          | 5                     | 4
LS-SVM      | 3          | 3          | 7                     | 5


Fig. 2. Testing set and its forecast through the proposed method for: (a) River flow, (b) Employment, (c) AUD-INR exchange rate, (d) Facebook


Fig. 2 visually depicts the forecasting precision of the proposed selective ensemble method. The actual testing dataset and its forecast through the proposed method are depicted through the solid and dotted lines, respectively. The remarkable closeness between the actual and forecasted observations is clearly evident in Fig. 2.

5. Conclusions

Obtaining reasonably precise forecasts of time series datasets is a major challenge in many domains of science, engineering, and finance. Numerous research studies show that combining the forecasts from multiple structurally different models substantially improves the forecasting accuracy and often outperforms all component models. In this study, an ensemble methodology is proposed that ranks the component models on the basis of their in-sample forecasting errors and then selectively combines the forecasts from a predefined number of high-ranked models. Thus, only a group of better-performing models is combined and the others are discarded. Empirical analysis is conducted with nine individual models on four real-world time series datasets. The obtained results clearly demonstrate that the proposed method attains consistently better accuracies than all component models as well as several other popular forecasts combination mechanisms. As such, this study justifies the superiority of a selective ensemble over combining all available forecasts. In future works, the proposed combination approach can be further explored with other varieties of forecasting models as well as more diverse time series datasets.

References

1. Gheyas, I. A., Smith, L. S. (2011). A novel neural network ensemble architecture for time series forecasting. Neurocomputing 74 (18), 3855-3864.
2. Lemke, C., Gabrys, B. (2010). Meta-learning for time series forecasting and forecast combination. Neurocomputing 73, 2006-2016.
3. Terui, N., van Dijk, H. K. (2002). Combined forecasts from linear and nonlinear time series models. International Journal of Forecasting 18, 421-438.
4. De Gooijer, J. G., Hyndman, R. J. (2006). 25 years of time series forecasting. International Journal of Forecasting 22, 443-473.
5. De Menezes, L. M., Bunn, D. W., Taylor, J. W. (2000). Review of guidelines for the use of combined forecasts. European Journal of Operational Research 120 (1), 190-204.
6. Jose, V. R. R., Winkler, R. L. (2008). Simple robust averages of forecasts: Some empirical results. International Journal of Forecasting 24 (1), 163-169.
7. Adhikari, R., Agrawal, R. K. (2012). Performance evaluation of weight selection schemes for linear combination of multiple forecasts. Artificial Intelligence Review, 1-20.
8. Winkler, R. L., Makridakis, S. (1983). The combination of forecasts. Journal of the Royal Statistical Society A 146 (2), 150-157.
9. Granger, C. W. J., Ramanathan, R. (1984). Improved methods of combining forecasts. Journal of Forecasting 3, 197-204.
10. Freitas, P. S., Rodrigues, A. J. (2006). Model combination in neural-based forecasting. European Journal of Operational Research 173, 801-814.
11. Bunn, D. (1975). A Bayesian approach to the linear combination of forecasts. Operational Research Quarterly 26 (2), 325-329.
12. Che, J. (2014). Optimal sub-models selection algorithm for combination forecasting model. Neurocomputing. doi:10.1016/j.neucom.2014.09.028.
13. Hamzaçebi, C. (2008). Improving artificial neural networks' performance in seasonal time series forecasting. Information Sciences 178, 4550-4559.
14. Zhang, G. P. (2003). Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing 50, 159-175.
15. Vapnik, V. (1995). The Nature of Statistical Learning Theory. Springer-Verlag, New York.
16. Suykens, J. A. K., Vandewalle, J. (1999). Least squares support vector machine classifiers. Neural Processing Letters 9 (3), 293-300.
17. Zhao, J., Zhu, X., Wang, W., Liu, Y. (2013). Extended Kalman filter-based Elman networks for industrial time series prediction with GPU acceleration. Neurocomputing 118, 215-224.
18. Hamzaçebi, C., Akay, D., Kutay, F. (2009). Comparison of direct and iterative artificial neural network forecast approaches in multi-periodic time series forecasting. Expert Systems with Applications 36 (2), 3839-3844.
19. Demuth, H., Beale, M., Hagan, M. (2010). Neural Network Toolbox User's Guide. The MathWorks, Natick, MA, USA.
20. Pelckmans, K., Suykens, J. A. K., Van Gestel, T., De Brabanter, J., Lukas, L., Hamers, B., et al. (2003). LS-SVMlab Toolbox User's Guide. Pattern Recognition Letters 24, 659-675.
21. Data Market (2014). http://datamarket.com
22. Pacific Exchange Rate Service (2014). http://fx.sauder.ubc.ca/data.html
23. Yahoo! Finance (2014). http://finance.yahoo.com
