Energy Economics 46 (2014) 236–245
Energy Economics journal homepage: www.elsevier.com/locate/eneco
A compressed sensing based AI learning paradigm for crude oil price forecasting
Lean Yu, Yang Zhao, Ling Tang ⁎
School of Economics and Management, Beijing University of Chemical Technology, Beijing 100029, China
Article info
Article history: Received 11 May 2014; Received in revised form 28 July 2014; Accepted 22 September 2014; Available online xxxx
JEL classification: C45; C53; Q47
Keywords: Compressed sensing; Data denoising; Crude oil price prediction; Hybrid model; Feed-forward neural network
Abstract
Due to the complexity of crude oil price series, traditional statistics-based forecasting approaches cannot produce good prediction performance. To improve the prediction performance, a novel compressed sensing based learning paradigm, CSD-AI, is proposed by integrating compressed sensing based denoising (CSD) with a certain artificial intelligence (AI) tool. In the proposed learning paradigm, CSD is first performed as a preprocessor on the original international crude oil price data to reduce the noise, and then a powerful AI tool is employed to predict the cleaned data. In particular, the CSD process aims to reduce the level of noise that pollutes the data, and thereby to enhance the prediction performance of the AI model. For verification purposes, the international crude oil price series of West Texas Intermediate (WTI) is taken as sample data. Empirical results demonstrate that the proposed CSD-AI learning paradigm significantly outperforms all other benchmark models, including single models without the CSD process and hybrid models with other denoising techniques, in terms of level and directional accuracies. Furthermore, across different data samples with different time ranges, the proposed model performs the best, indicating that the proposed CSD-AI learning paradigm is an effective and robust approach to crude oil price prediction. © 2014 Elsevier B.V. All rights reserved.
1. Introduction
International crude oil price has fluctuated frequently since 1970, with a large impact on the global economy and society. First, like other commodities, crude oil price is basically determined by its supply and demand (Hagen, 1994; Stevens, 1995). Besides, it is also strongly influenced by other factors, e.g., weather, stock levels, economic growth, political aspects, psychological expectations, and even irregular events (Yu et al., 2008). These factors lead to a strongly fluctuating international crude oil market with the characteristics of complex nonlinearity, dynamic variation and high irregularity (Watkins and Plourde, 1994). In turn, such volatility in the crude oil market has a worldwide influence on economy and society, which should be captured by both policy planners and economic agents to ensure stable development. For these reasons, an abundance of literature has focused on international crude oil price forecasting, which has been considered one of the most important but difficult tasks in the field of prediction research. According to existing studies, various powerful artificial intelligence (AI) models have been applied to overcome the difficulty in modeling
⁎ Corresponding author at: School of Economics and Management, Beijing University of Chemical Technology, 15 Beisanhuan East Road, Beijing 100029, China. Tel.: +86 158 1023 7921; fax: +86 10 6441 2210. E-mail addresses:
[email protected] (L. Yu),
[email protected] (Y. Zhao),
[email protected] (L. Tang).
http://dx.doi.org/10.1016/j.eneco.2014.09.019 0140-9883/© 2014 Elsevier B.V. All rights reserved.
nonlinear hidden patterns of international crude oil price. Unlike traditional linear techniques, AI models are self-adaptive, nonlinear and data-driven, and have been repeatedly shown to be much more effective in modeling complex, nonlinear and irregular data than the traditional linear models. Furthermore, the statistical data assumptions (e.g., stationarity, regularity and linearity) required in traditional models are not needed in the AI models. Accordingly, many AI techniques, e.g., artificial neural networks (ANNs), support vector machines (SVMs), and various AI optimization algorithms (e.g., genetic algorithm (GA)), have been widely used in international crude oil price forecasting. For example, Xie et al. (2006) implemented support vector regression (SVR) to predict crude oil price. Shambora and Rossiter (2007) and Yu et al. (2007a) each used the ANN model to predict crude oil price. All empirical results demonstrated the superiority of AI tools to traditional models. However, though superior to traditional models, AI-based models also have their own shortcomings. For example, local minimum and overfitting problems sometimes occur when using ANN, and ANN, SVM and their parameter optimization algorithms (e.g., GA) can be sensitive to parameter settings. To address the shortcomings of single models (including traditional and AI tools), a series of hybrid models have been proposed to improve prediction performance, based on the principle of “hybrid modeling” (Wang et al., 2005; Yu et al., 2008). In particular, the concept of “hybrid modeling” assumes that every model has its own disadvantages, while
L. Yu et al. / Energy Economics 46 (2014) 236–245
a hybrid learning paradigm integrating a set of models can achieve a much better capability by making full use of the merits of every model and offsetting the defects of the others. For example, Mirmirani and Li (2005) coupled the ANN model and GA to predict crude oil price and compared the results with the vector auto-regression (VAR) model. Amin-Naseri and Gharacheh (2007) proposed a hybrid AI approach integrating feed-forward neural networks (FNNs), GA, and k-means clustering to predict the monthly data of crude oil price. Combining a multilayer back propagation neural network and wavelet decomposition, Jammazi and Aloui (2012) proposed a hybrid model to achieve prominent prediction for crude oil price. Xiong et al. (2013) proposed a revised hybrid model by incorporating the slope-based method (SBM) into the empirical mode decomposition (EMD) based FNN modeling framework for crude oil price prediction. All the empirical results confirmed that the hybrid models outperformed the single models, thus implying the effectiveness of the “hybrid modeling” methodology. As mentioned above, the most important challenge in modeling crude oil price is the complexity in terms of interactive inner factors, which leads to a high level of noise corrupting the original data and thus largely weakens the prediction capability of models. Therefore, a data preprocessing step that reduces the noise level (i.e., denoising) can largely enhance the analysis and forecasting performance for international crude oil price. That is, because of the high level of noise, it is necessary to eliminate the noise from the original data, so that better analysis and forecasting results for crude oil price can be obtained by modeling the true inner factors without noise. Actually, data denoising has already been introduced into prediction, and the experimental results confirmed its effectiveness. For example, Sang et al. 
(2009a) proposed a new entropy-based wavelet denoising method for time series analysis. He et al. (2010) proposed a Slantlet denoising based least squares support vector regression (LSSVR) model to predict exchange rates. Faria et al. (2009) proposed an exponential smoothing denoising based neural network model for stock market forecasting. Yuan (2011) presented a novel model combining the Markov-switching model and the Hodrick–Prescott filter for exchange rate forecasting. Nasseri et al. (2011) proposed a hybrid model coupling the extended Kalman filter and genetic programming for forecasting water demand. Chen et al. (2012) introduced the Fourier transform into a fuzzy time series forecasting model for stock price. He et al. (2012) proposed a novel multivariate wavelet denoising based approach for estimating portfolio value at risk (PVaR). Sang (2013) proposed an improved wavelet modeling framework to remove noise for time series forecasting. Therefore, based on the concept of “hybrid modeling”, this paper tries to formulate a novel learning paradigm by integrating an effective data denoising technique and a powerful AI-based forecasting model. The empirical studies above have shown that these denoising based hybrid models all generate more satisfactory results than their respective single models, due to the effectiveness of the denoising process in extracting true information from noise. As for denoising techniques, many methods exist within the research field of data processing, e.g., exponential smoothing (Gardner, 1985), the Hodrick–Prescott (HP) filter (Hodrick and Prescott, 1997), the Kalman filter (Kalman, 1960), Fourier transform (FT), discrete cosine transform (DCT) (Ahmed et al., 1974), and wavelet transform (Mallat, 1989). However, these denoising methods share a critical weakness, i.e., a fixed basis design and hence sensitivity to parameter settings. 
In contrast, a recently popular denoising tool, i.e., the compressed sensing based denoising (CSD) approach, is a more flexible algorithm based on sparsity (Han et al., 2010; Marim et al., 2010; Zhu et al., 2009). Besides, with a proper sparse transform basis, the CSD process can retain most of the information due to sparsity, while most other denoising methods might lose some due to their own principles. For these two reasons, CSD is introduced in this study as the data denoising technique to formulate the novel hybrid forecasting approach. Generally speaking, due to the complexity in terms of high level of noise, a novel hybrid learning paradigm is proposed for international
crude oil price forecasting, integrating CSD and a certain AI tool, i.e., the CSD based AI method (abbreviated as CSD-AI). In the proposed learning paradigm, CSD is first performed as a preprocessor on the original data of international crude oil price, to reduce the noise level and help extract the cleaned data. Second, a certain powerful AI tool, e.g., ANN or LSSVR, is employed to model the cleaned data and to provide the final prediction result for international crude oil price. The main motivation of this study is to formulate a novel hybrid forecasting approach for international crude oil price, i.e., CSD-AI, and compare its prediction performance with some other popular benchmarks (including both single techniques without denoising and hybrid models with other denoising methods). The rest of this study is organized as follows. Section 2 describes the formulation process of the proposed CSD-AI learning paradigm in detail. In Section 3, crude oil spot price data of West Texas Intermediate (WTI) are used to test the effectiveness of the proposed method, and the corresponding results are discussed. Finally, some concluding remarks and future research directions are drawn in Section 4.
2. Methodology formulation
Due to the complexity in terms of high level of noise, this section formulates a novel hybrid learning paradigm for international crude oil price, integrating compressed sensing based denoising (CSD) and a certain artificial intelligence (AI) model (i.e., CSD-AI). In this section, CSD and some popular AI tools are described in Sections 2.1 and 2.2, respectively. Finally, the overall process of the proposed CSD-AI learning paradigm is formulated in Section 2.3.
2.1. Compressed sensing based denoising (CSD)
The compressed sensing (CS) theory was first proposed by Donoho in 2004, and it provides a new way of signal sampling which breaks through the limit imposed by Shannon's sampling theorem. 
The key idea of CS is to recover a sparse signal from very few non-adaptive, linear measurements by convex optimization (Eldar and Kutyniok, 2012). Amongst the abundant applications of CS, a compressed sensing based denoising (CSD) approach has been proposed for signal denoising (Han et al., 2010; Marim et al., 2010; Zhu et al., 2009). For a clear understanding, sparse representation is first introduced. As for sparse representation, signals can be concisely represented in terms of a convenient basis, such as a Fourier basis or a wavelet basis (Candès and Wakin, 2008). Mathematically, we have a vector $X \in \mathbb{R}^n$ which can be expanded in an orthonormal basis $\Psi = [\psi_1 \psi_2 \ldots \psi_n]$:
$$X = \sum_{i=1}^{n} s_i \psi_i \tag{1}$$
where $s_i$ is the $i$th entry of the coefficient sequence of $X$:
$$s_i = \langle X, \psi_i \rangle. \tag{2}$$
In this way, $X$ can be expressed as $\Psi s$, where $\Psi$ is an $n \times n$ matrix with $\psi_1, \ldots, \psi_n$ as columns. When the coefficient vector $s$ is sparse, i.e., most entries $s_i$ are zero, Eq. (1) is the sparse representation of $X$. Generally, three main steps are involved in CSD, i.e., sparse representation, random sampling and signal recovery.
1. Sparse representation. If the signal $X \in \mathbb{R}^n$ is sparse under an orthogonal basis $\Psi$, the sparse coefficients can be represented as $s = \Psi^T X$, which is the sparse or approximately sparse representation of the signal $X$.
2. Random sampling. We define an $m \times n$ ($m < n$) observation matrix $\Phi$, which is unrelated to the transform basis $\Psi$, to measure the sparse coefficients $s$ and obtain an observation vector $Y = \Phi s$. Accordingly,
the entire sensing process is:
$$Y = \Phi \Psi^T X. \tag{3}$$
3. Signal recovery. After obtaining the compressed sensed signal $Y$, we can recover $X$ from $Y$ by
$$\min \|\Psi^T X\|_0, \quad \text{s.t. } Y = \Phi \Psi^T X. \tag{4}$$
Since Eq. (4) is an NP-hard (i.e., non-deterministic polynomial-time hard) problem, it can be transformed to:
$$\min \|\Psi^T X\|_1, \quad \text{s.t. } Y = \Phi \Psi^T X. \tag{5}$$
If $X$ is polluted by noise, the minimization problem needs to be changed to:
$$\min \|\Psi^T X\|_1, \quad \text{s.t. } \|\Phi \Psi^T X - Y\|_2 \le \varepsilon. \tag{6}$$
Eq. (6) can be solved by the orthogonal matching pursuit (OMP) algorithm, which is a very popular and effective algorithm in guaranteeing the success of recovery. Since the signal $X$ can be sparsely represented in some transform domains, it can be recovered almost exactly from the compressed sensed signal $Y$. The CSD process for international crude oil price is shown in Fig. 1. It is assumed that the international crude oil price consists of a trend $T$ and high-frequency noise $Z$, and the goal of CSD is to remove the noise $Z$ from the original data $X$. According to the principle of CSD (i.e., sparsity), if the signal can be sparsely represented, it can be recovered through the CS process. In fact, the trend $T$ of the crude oil price series can be sparsely represented under some kinds of wavelet bases while the noise $Z$ cannot. Furthermore, the CS process is actually a dimensionality reduction process, since $Y$ has $m$ dimensions while $X$ has $n > m$ dimensions. Therefore, the CS process can effectively extract the trend $T$ from the original signal $X$. Compared to other traditional denoising methods, such as the Fourier filter and wavelet denoising, CSD is more flexible in parameter settings. For example, the Fourier filter needs to predetermine the frequency and amplitude thresholds in the frequency domain. Furthermore, this process may cause loss of information. Similarly, the wavelet denoising method needs to specify the frequency thresholds in different time scales, which may be inconvenient when processing large-scale data. In contrast, CSD only needs to choose a proper sparse transform basis and a proper sampling rate, and then a good denoising result can be generated based on CS theory.
Fig. 1. CSD process for crude oil price time series data.
2.2. Artificial intelligences (AIs)
To model the nonlinear patterns hidden in the complex data of international crude oil price, a series of artificial intelligences (AIs) have been implemented. Using flexible function designs and powerful self-learning capability, AIs have been repeatedly proved to be superior to traditional forecasting models. Amongst various techniques, artificial neural network (ANN) and least squares support vector regression (LSSVR) may be the most popular and effective AI tools (Tang et al., 2012; Yu et al., 2007).
2.2.1. Artificial neural network
Artificial neural network (ANN) is a typical intelligent learning model, widely used in complex data forecasting. In this study, a standard three-layer feed-forward neural network (FNN) (Hornik et al., 1989; White, 1990), based on the error back-propagation algorithm, is introduced for modeling international crude oil price data. Usually, an FNN-based forecasting model is trained on the in-sample dataset and applied to the out-of-sample dataset for prediction. The model parameters (including connection weights and node biases) are adjusted iteratively by minimizing the forecasting error function. Basically, the final output of the FNN-based forecasting model can be represented as:
$$f(x) = a_0 + \sum_{j=1}^{q} w_j \varphi\Big(a_j + \sum_{i=1}^{p} w_{ij} x_i\Big) \tag{7}$$
where $x_i$ ($i = 1, 2, \ldots, p$) represents the input patterns, $f(x)$ is the output, $a_j$ ($j = 0, 1, 2, \ldots, q$) is a bias on the $j$th unit, $w_{ij}$ ($i = 1, 2, \ldots, p$; $j = 1, 2, \ldots, q$) and $w_j$ are the connection weights between layers, $\varphi(\cdot)$ is the transfer function of the hidden layer, $p$ is the number of input nodes, and $q$ is the number of hidden nodes. Actually, the FNN model in Eq. (7) performs a nonlinear functional mapping from the past observations $(x_{t-1}, x_{t-2}, \ldots, x_{t-p})$ to the future value $x_{t+l}$:
$$x_{t+l} = \phi(x_{t-1}, x_{t-2}, \ldots, x_{t-p}; W) + \xi_t \tag{8}$$
where $l$ is the horizon, $W = \{w_{ij}\}$ is the vector of weight parameters, and $\phi(\cdot)$ is the function trained by FNN.
2.2.2. Least squares support vector regression
Support vector machine (SVM) was first proposed by Vapnik (1995). Since the training process of SVM takes a long time when analyzing large-scale data, least squares support vector machine (LSSVM) was proposed to overcome this shortcoming (Suykens and Vandewalle, 1999). Generally, LSSVM can be categorized into least squares support vector regression (LSSVR) and least squares support vector classification (LSSVC), for prediction and classification purposes, respectively. In this study, the LSSVR is also used as a prediction tool. For support vector regression (SVR), the basic idea is to map the original data into a high-dimensional feature space in which linear
regression is made. The regression function can be formulated as follows:
$$f(x) = \sum_{t=1}^{T} w_t K(x, x_t) + b \tag{9}$$
where $K(x, x_t)$ is the mapping function, $f(x)$ is the prediction estimation, and $w_t$ and $b$ are the weights obtained by minimizing the regularized risk function. Thus, Eq. (9) can be transformed into the following optimization problem:
$$\min \frac{1}{2} w^T w + \gamma \sum_{t=1}^{T} (\xi_t + \xi_t^*)$$
$$\text{s.t. } w^T \varphi(x_t) + b - y_t \le \varepsilon + \xi_t, \quad (t = 1, 2, \ldots, T)$$
$$y_t - \big(w^T \varphi(x_t) + b\big) \le \varepsilon + \xi_t^*, \quad (t = 1, 2, \ldots, T) \tag{10}$$
where $\gamma$ is the penalty parameter, and the nonnegative variables $\xi_t$ and $\xi_t^*$ are the slack variables which represent the distance from the actual values to the corresponding boundary values of the $\varepsilon$-tube. Since solving Eq. (10) is a time consuming process, LSSVR transforms the optimization problem into:
$$\min \frac{1}{2} w^T w + \frac{1}{2} \gamma \sum_{t=1}^{T} e_t^2$$
$$\text{s.t. } y_t = w^T \varphi(x_t) + b + e_t, \quad (t = 1, 2, \ldots, T) \tag{11}$$
where $e_t$ indicates the slack variable, with the same meaning as the variables $\xi_t$ and $\xi_t^*$ in the SVR method.
2.3. Compressed sensing based AI forecasting model
Based on the above techniques, a novel hybrid CSD-AI learning paradigm for crude oil price can be formulated, as illustrated in Fig. 2. For multi-step-ahead prediction, there are several strategies (Bao et al., 2014a; Bao et al., 2014b; Taieb et al., 2012; Xiong et al., 2013). In this study, the direct forecasting strategy is implemented. Given a time series $x_t$ ($t = 1, 2, \ldots, T$), the $m$-step-ahead prediction for $x_{t+m}$ can be generated by:
$$\hat{x}_{t+m} = f\big(x_t, x_{t-1}, \ldots, x_{t-(l-1)}\big) \tag{12}$$
where $\hat{x}_t$ is the predicted value in period $t$, $x_t$ is the actual value, and $l$ represents the lag order. Generally, two main steps are included in the proposed CSD-AI learning paradigm, i.e., data denoising and forecasting.
Step 1: Data denoising using CSD
First, the original data $X$ consisting of trend $T$ and noise $Z$ is represented by a proper transform basis, such as the wavelet basis in this study. Then, the sparse coefficients are sampled through a Gaussian white noise sampling matrix. Finally, the cleaned data $T$ can be obtained through the OMP recovery algorithm for further analysis.
Step 2: Forecasting using AI
After data denoising, a certain powerful AI technique, e.g., ANN or LSSVR, is used to model the cleaned data $T$ and obtain the prediction results $\hat{X}$ for the original data $X$.
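The two steps of Section 2.3 can be illustrated with a minimal sketch, written here in plain Python under several assumptions that are not taken from the paper: an orthonormal DCT basis stands in for the wavelet basis, the least-squares step inside OMP is solved through the normal equations, and all function names (`dct_basis`, `omp`, `csd_denoise`, `direct_samples`) are hypothetical. It follows Eq. (3) (sampling $Y = \Phi \Psi^T X$ with a Gaussian matrix), recovers the sparse coefficients greedily, and then builds the input/target pairs of the direct strategy in Eq. (12).

```python
import math
import random


def dct_basis(n):
    """Orthonormal DCT-II basis (assumed stand-in); row k holds psi_k."""
    basis = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        basis.append([scale * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)])
    return basis


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def solve(a, b):
    """Solve the square system a z = b by Gaussian elimination with pivoting."""
    n = len(b)
    aug = [a[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for j in range(c, n + 1):
                aug[r][j] -= f * aug[c][j]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (aug[r][n] - sum(aug[r][j] * z[j] for j in range(r + 1, n))) / aug[r][r]
    return z


def omp(phi, y, n_iter):
    """Orthogonal matching pursuit: sparse s such that phi s approximates y."""
    n = len(phi[0])
    cols = [[row[j] for row in phi] for j in range(n)]
    norms = [math.sqrt(dot(c, c)) for c in cols]
    support, coef, resid = [], [], list(y)
    for _ in range(n_iter):
        # Select the unused column most correlated with the current residual.
        best = max((j for j in range(n) if j not in support),
                   key=lambda j: abs(dot(cols[j], resid)) / norms[j])
        support.append(best)
        # Least-squares fit on the selected columns (normal equations).
        gram = [[dot(cols[a], cols[b]) for b in support] for a in support]
        coef = solve(gram, [dot(cols[a], y) for a in support])
        fit = [sum(c * cols[j][r] for c, j in zip(coef, support)) for r in range(len(y))]
        resid = [yr - fr for yr, fr in zip(y, fit)]
    s = [0.0] * n
    for j, c in zip(support, coef):
        s[j] = c
    return s


def csd_denoise(x, m, n_iter, seed=0):
    """Step 1: sparse representation, Gaussian sampling, OMP recovery."""
    n = len(x)
    psi = dct_basis(n)
    s = [dot(p, x) for p in psi]                      # s = Psi^T X
    rng = random.Random(seed)
    phi = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
    y = [dot(row, s) for row in phi]                  # Y = Phi s, Eq. (3)
    s_hat = omp(phi, y, n_iter)
    return [sum(s_hat[k] * psi[k][i] for k in range(n)) for i in range(n)]


def direct_samples(series, lag, horizon):
    """Step 2 inputs: ([x_t, ..., x_{t-(lag-1)}], x_{t+horizon}) as in Eq. (12)."""
    pairs = []
    for t in range(lag - 1, len(series) - horizon):
        pairs.append((series[t - lag + 1:t + 1][::-1], series[t + horizon]))
    return pairs
```

In this sketch the number of OMP iterations bounds how many basis coefficients survive, so it acts as the denoising strength; the pairs returned by `direct_samples` would then be passed to whichever forecasting model (e.g., ANN or LSSVR) is chosen. The pure-Python linear algebra is only meant for small illustrative inputs.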
3. Experimental study
In this section, the experimental design, including data description, performance measurement criteria, benchmark models and parameter settings, is first given in Section 3.1. The experimental results are then discussed in Section 3.2 from three perspectives. First, the effectiveness of compressed sensing based denoising (CSD) in improving forecasting is tested by comparing CSD based hybrid models with their single counterparts, as discussed in Section 3.2.1. Second, the superiority of CSD to other denoising models is tested in Section 3.2.2. Finally, a robustness test with different datasets is performed in Section 3.2.3.
3.1. Experimental design
3.1.1. Data description
In this study, crude oil spot price series of West Texas Intermediate (WTI) are chosen. The data are daily data obtained from the U.S. Energy Information Administration (EIA) (http://www.eia.gov/). In particular, the data range from January 03, 2011 to July 17, 2013, excluding public holidays, with a total of 640 observations. The data from January 03, 2011 to January 11, 2013 are used for model training (512 observations), and the remainder are used as the testing set (128 observations). For robustness analysis, two other datasets with different time ranges are also introduced (i.e., the data for the years 2011 and 2012). The sample data are selected for the following four reasons. First, compared with weekly data and monthly data, the daily data might be more complex in terms of a higher level of noise. Second, the size of the training data should be $2^n$ ($n > 0$) in order to create the orthogonal sparse transform basis in CSD. Third, we choose the start point after the year 2010 to avoid the structural change caused by the financial crisis in 2008. Finally, the size ratio of training to testing sets is set to 4:1, according to the related research (Yu et al., 2008).
3.1.2. Performance evaluation criteria
To evaluate the forecasting performance, two main classes of criteria, i.e., level and directional prediction accuracies, are used. Amongst various criteria, the root mean squared error (RMSE) has been popularly utilized as an effective statistic directly reflecting the level prediction errors (e.g., Tang et al., 2013; Tang et al., 2014a; Tang et al., 2014b; Wang et al., 2011; Xie et al., 2006; Yu et al., 2008). Therefore, RMSE is introduced here to evaluate the accuracy of level prediction, typically represented by:
$$RMSE = \sqrt{\frac{1}{N} \sum_{t=1}^{N} \big(\hat{x}(t) - x(t)\big)^2} \tag{13}$$
where $x(t)$ is the actual value and $\hat{x}(t)$ is the predicted value at time $t$, and $N$ is the number of prediction results. The ability to predict movement direction can be measured by a directional statistic Dstat, which can be expressed as:
$$D_{stat} = \frac{1}{N} \sum_{t=1}^{N} a_t \times 100\% \tag{14}$$
where $a_t = 1$ if $(x_{t+1} - x_t)(\hat{x}_{t+1} - x_t) \ge 0$, and $a_t = 0$ otherwise. To statistically test the significant differences in terms of predictive accuracy amongst different forecasting models, the Diebold–Mariano (DM) statistic is used (Diebold and Mariano, 2002). In this study, the loss function is set to mean square prediction error (MSPE) and the null hypothesis is that the MSPE value of the tested model a is not less than that of the benchmark model b. In particular, the DM statistic can be defined as:
$$S = \frac{\bar{g}}{\big(\hat{V}_{\bar{g}} / M\big)^{1/2}} \tag{15}$$
where $\bar{g} = \sum_{t=1}^{M} g_t / M$ (with $g_t = (x_t - \hat{x}_{a,t})^2 - (x_t - \hat{x}_{b,t})^2$) and $\hat{V}_{\bar{g}} = \gamma_0 + 2 \sum_{l=1}^{\infty} \gamma_l$ (with $\gamma_l = \mathrm{cov}(g_t, g_{t-l})$). $\hat{x}_{a,t}$ and $\hat{x}_{b,t}$ represent the predicted values for $x_t$ calculated by the tested method a and its benchmark method b, respectively, at time $t$. Here, a one-sided test is used to test the S statistic. In the DM test, the S value and p-value can be used to estimate the superiority of method a over method b.
Fig. 2. The overall process of CSD-AI learning paradigm.
Fig. 3. Original data and denoised data using CSD. (Legend: training set; denoised training set using CSD. Vertical axis: crude oil price, dollars per barrel; horizontal axis: Jan 2011 to Jan 2013.)
3.1.3. Benchmark models
First, the effectiveness of the CSD process in enhancing prediction accuracy is tested. For this purpose, a set of hybrid models is formulated by coupling CSD with popular forecasting techniques, e.g., the most traditional method of autoregressive integrated moving average (ARIMA) (Box and Jenkins, 1970), and the most popular AIs of LSSVR (Xie et al., 2006) and ANN (Yu et al., 2008). Then, by comparing these hybrid models (marked as CSD-ARIMA, CSD-LSSVR and CSD-ANN) with their respective single counterparts (i.e., single ARIMA, LSSVR and ANN), it can be clearly seen whether the CSD process improves the prediction ability of the models. The reasons for using ARIMA, LSSVR and ANN as the forecasting models in hybrid model formulation are twofold. On the one hand, ARIMA can be seen as the most typical linear regression model, and has been popularly used as a traditional benchmark in prediction research (e.g., Tang et al., 2014a; Tang et al., 2014b). On the other hand, LSSVR and ANN have been popularly utilized, especially for crude oil price forecasting, as the most typical AI techniques (e.g., Yu et al., 2008). However, neither of them can be fully proved to be better than the other, as each has its own merits. Therefore, the two powerful intelligent models (i.e., LSSVR and ANN) are both implemented here as the forecasting models under the proposed hybrid framework. Second, the superiority of the proposed CSD-AI learning paradigm is discussed. Accordingly, five other popular denoising methods, including exponential smoothing (ES) (Bowerman and O'Connell, 1993), Hodrick–Prescott (HP) filter (Baxter and King, 1999), Kalman filter (KF) (Shumway and Stoffer, 1982), discrete cosine transform (DCT) filter (Bracewell, 1980) and wavelet denoising (WD) (Struzik, 2001), are also introduced as preprocessors for the original data to formulate a series of hybrid benchmarks. Overall, for the proposed CSD-AI models (i.e., CSD-ANN and CSD-LSSVR), three single benchmarks (i.e., ARIMA, LSSVR and ANN), one CSD-based ARIMA hybrid benchmark (i.e., CSD-ARIMA), and a set of hybrid models with the five other denoising methods are formulated for comparison purposes.
Fig. 4. Dstat comparisons between CSD based hybrid models and their single benchmark models. (Bars: single forecasting model vs. CSD based hybrid forecasting model, for ARIMA, ANN and LSSVR in one-step-ahead and five-step-ahead prediction.)
Fig. 5. RMSE comparisons between CSD based hybrid models and their single benchmark models. (Bars: single forecasting model vs. CSD based hybrid forecasting model, for ARIMA, ANN and LSSVR in one-step-ahead and five-step-ahead prediction.)
3.1.4. Parameter settings
For the denoising methods, the parameters are set based on the related literature (e.g., Gardner, 1985; Hodrick and Prescott, 1997; Welch and Bishop, 1995; Ahmed et al., 1974; Mallat, 1989; Zhu et al., 2009) and the trial-and-error method. In CSD, symlet 6 is used as the sparse transform basis, the sampling number is set to 500, and the number of iterations in the orthogonal matching pursuit (OMP) algorithm is set to 125. In ES, the smoothing factor is set to 0.2. In the HP filter, the smoothing value is set to 100. In KF, the measurement covariance and process covariance are set to 0.25 and 0.0004, respectively. In DCT, the frequency threshold is set to 100. In WD, symlet 6 is chosen as the wavelet basis, the number of decomposition iterations is set to 8, and the frequency thresholds are set based on the soft threshold principle (Donoho, 1995). For the forecasting models, the best ARIMA model for each training sample is determined based on Schwarz Criterion (SC) minimization (Yu et al., 2008).
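To make two of the benchmark denoiser settings above concrete, the following is a minimal sketch in plain Python. The paper does not state the exact model forms, so both are assumptions: simple exponential smoothing initialized at the first observation, and a scalar Kalman filter for a local-level (random walk plus noise) model with an assumed initial variance of 1.0. The function names are hypothetical; only the values 0.2, 0.25 and 0.0004 come from the text.

```python
def exp_smooth(series, alpha=0.2):
    """Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    out = [series[0]]                       # assumed initialization at x_0
    for x in series[1:]:
        out.append(alpha * x + (1.0 - alpha) * out[-1])
    return out


def kalman_local_level(series, meas_var=0.25, proc_var=0.0004, init_var=1.0):
    """Scalar Kalman filter for an assumed local-level model."""
    est, var = series[0], init_var          # assumed initial state and variance
    out = [est]
    for z in series[1:]:
        var += proc_var                     # predict: state follows a random walk
        gain = var / (var + meas_var)       # Kalman gain
        est += gain * (z - est)             # update with the new observation
        var *= (1.0 - gain)
        out.append(est)
    return out
```

With a small process covariance relative to the measurement covariance, as in the settings above, the filter gain shrinks over time and the output becomes smoother than the exponential smoothing output.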
In ANN, a feed-forward neural network (FNN) (I–H–O) is built in this study (White, 1990), using seven hidden nodes, one output neuron and I input neurons, where I is the lag order determined by autocorrelation and partial correlation analyses and finally set to 6. Each ANN model iteratively runs 10,000 times using the training subset. In the LSSVR model, the most popular kernel function, Gaussian RBF, is selected, and the grid search method is used to determine the values of parameters γ and δ² (Tang et al., 2012). It is worth noting that all models are implemented in the Matlab R2013a software package and all programs are run on an HP desktop computer with a Core i3 3.30 GHz CPU, 4.00 GB of random-access memory (RAM) and a 500 GB hard disk.

Table 1
DM test results for CSD based hybrid models and their single benchmark models.
Tested model | Reference model
             | CSD-LSSVR      | CSD-ARIMA      | ANN            | LSSVR          | ARIMA
CSD-ANN      | −0.96 (0.1686) | −4.55 (0.0000) | −1.31 (0.0945) | −1.40 (0.0805) | −3.47 (0.0003)
CSD-LSSVR    |                | −4.30 (0.0000) | −1.19 (0.1175) | −1.34 (0.0904) | −3.40 (0.0003)
CSD-ARIMA    |                |                | 4.09 (1.0000)  | 4.02 (1.0000)  | 0.06 (0.5234)
ANN          |                |                |                | −1.16 (0.1236) | −3.29 (0.0005)
LSSVR        |                |                |                |                | −3.26 (0.0006)

3.2. Experimental results
In the proposed CSD-AI based learning paradigm, the first step is to apply CSD to denoise the crude oil price data, and Fig. 3 shows the corresponding result. The second step is to forecast the cleaned data by a certain powerful forecasting tool (e.g., ANN or LSSVR). Moreover, a series of benchmark models, including single and hybrid forecasting models (see Section 3.1.3), are also run for comparison purposes.
3.2.1. Effectiveness of CSD in forecasting
The effectiveness of CSD in improving forecasting performance is first discussed. Figs. 4 and 5 report the prediction comparison results between CSD based hybrid learning paradigms and their single benchmarks without CSD, in terms of Dstat and RMSE, respectively. Besides, Table 1 reports the results of the DM test amongst these CSD based hybrid models and single models. From the results, one obvious conclusion is that the hybrid models with the CSD process mostly perform better than their single benchmark models without CSD, in terms of both directional and level prediction accuracies. As for directional prediction accuracy, Fig. 4 shows the comparison results in terms of Dstat. From the results, it can be easily found that the CSD based hybrid methods obtain much better directional predictions than their respective single methods in all cases.
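The three criteria behind Figs. 4 and 5 and Table 1 (Eqs. (13) to (15)) can be computed with a short sketch such as the one below. It is plain Python with hypothetical function names and rests on assumptions the paper does not spell out: Dstat is returned as a fraction over the available consecutive pairs, the infinite sum in Eq. (15) is truncated at a user-chosen `max_lag`, and the one-sided p-value is taken from the standard normal distribution.

```python
import math


def rmse(actual, pred):
    """Root mean squared error, Eq. (13)."""
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, pred)) / len(actual))


def dstat(actual, pred):
    """Directional statistic, Eq. (14), returned as a fraction in [0, 1]."""
    hits = [1 if (actual[t + 1] - actual[t]) * (pred[t + 1] - actual[t]) >= 0 else 0
            for t in range(len(actual) - 1)]
    return sum(hits) / len(hits)


def dm_test(actual, pred_a, pred_b, max_lag=0):
    """Diebold-Mariano statistic S of Eq. (15) with squared-error loss.

    max_lag truncates the autocovariance sum (an assumption of this sketch).
    Returns (S, p), where p = P(Z <= S) for a standard normal Z, so a small p
    favours the tested method a over the benchmark b.
    """
    g = [(x - a) ** 2 - (x - b) ** 2 for x, a, b in zip(actual, pred_a, pred_b)]
    m = len(g)
    g_bar = sum(g) / m

    def autocov(lag):
        return sum((g[t] - g_bar) * (g[t - lag] - g_bar) for t in range(lag, m)) / m

    v = autocov(0) + 2.0 * sum(autocov(lag) for lag in range(1, max_lag + 1))
    s = g_bar / math.sqrt(v / m)
    p = 0.5 * (1.0 + math.erf(s / math.sqrt(2.0)))
    return s, p
```

Note that with a truncated sum the variance estimate can become non-positive for some series, in which case the statistic is undefined; a practical implementation would need to guard against that.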
In particular, in one-step-ahead prediction, the CSD based hybrid learning paradigms (i.e., CSD-ARIMA, CSD-ANN and CSD-LSSVR) achieve higher Dstat values than their corresponding single models (i.e., ARIMA, ANN and LSSVR), and CSD-ANN ranks first with the highest Dstat. In five-step-ahead prediction, the CSD based hybrid models are also better than their corresponding single models, without exception, in directional prediction. However, this improvement is somewhat limited compared to that in one-step-ahead prediction, and CSD-LSSVR achieves the highest Dstat. Generally speaking, the CSD based hybrid models all perform better than their single benchmark models in terms of Dstat, indicating the effectiveness of CSD in improving directional prediction. Moreover, the CSD-AI models (i.e., CSD-ANN and CSD-LSSVR) perform the best, even much better than CSD-ARIMA. The main reason may be that, since the ARIMA model is a traditional linear model, it may find difficulty
242
L. Yu et al. / Energy Economics 46 (2014) 236–245
KF-ANN
ES-ANN
HP-ANN
0.70 0.60
DCT-ANN
WD-ANN
CSD-ANN
0.6535 0.5197 0.5354
0.5748 0.5906 0.5276
0.5118 0.4961 0.4961 0.5276
0.50
0.5354 0.4724
0.40 0.30 0.20 0.10 0.00 One-step-ahead prediction
Five-step-ahead prediction
Fig. 6. Dstat comparison of different hybrid models coupling different denoising methods and ANN.
in modeling the nonlinear and complex patterns in crude oil price data, even with the help of CSD. As for level prediction accuracy, Fig. 5 shows the RMSE comparison results. Similarly, one obvious conclusion can be obtained that, the hybrid models with CSD are much better than their single benchmark models without CSD in most cases, in terms of RMSE criterion. In onestep-ahead prediction, the RMSE values of CSD based hybrid models are much lower than those of the corresponding single models, and CSD-ANN produces the best results with the lowest RMSE. In fivestep-ahead prediction, the CSD-AI models perform better than their corresponding single AI-based models in level prediction, and CSD-LSSVR ranks the first with the lowest RMSE. However, the CSD-ARIMA model performs somehow worse than its single benchmark ARIMA at horizon five. The main reason may be also referred to the intrinsic weakness of the statistical linear ARIMA model when modeling the nonlinear complex data of international crude oil price. In terms of RMSE, the CSD-AI models (i.e., CSD-ANN and CSD-LSSVR) perform much better than CSD-ARIMA and all single models (i.e., ARIMA, ANN and LSSVR), which further implies that the process of CSD can effectively improve the prediction performance of models by reducing the level of noise in the original data of crude oil price, and that the proposed learning paradigm with CSD and powerful AI technique is a quite promising forecasting tool for crude oil price. From above analysis, three main conclusions can be summarized. First, the proposed CSD-AI models (i.e., CSD-ANN and CSD-LSSVR) perform the best in both directional and level prediction, indicating the effectiveness of the novel methodology. In particular, CSD-ANN performs the best in one-step-ahead prediction, while the CSD-LSSVR performs the best in five-step-ahead prediction. 
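To make the one-step-ahead and five-step-ahead settings concrete, the sketch below builds supervised training samples from a price series using the lag order I = 6 given in the experimental setup. It assumes a direct strategy, in which the target is the price h steps after the last input; the multi-step strategy actually used in the study is not restated here, so this choice is an assumption, and `make_lagged_samples` is a hypothetical helper name.

```python
import numpy as np


def make_lagged_samples(series, lag=6, horizon=1):
    """Return (X, y) where each row of X holds `lag` consecutive prices
    and y is the price `horizon` steps after the last price in that row."""
    x = np.asarray(series, dtype=float)
    n_samples = len(x) - lag - horizon + 1
    if n_samples <= 0:
        raise ValueError("series is too short for this lag and horizon")
    # Row i covers x[i], ..., x[i + lag - 1].
    inputs = np.stack([x[i:i + lag] for i in range(n_samples)])
    # The matching target is x[i + lag - 1 + horizon].
    first_target = lag + horizon - 1
    targets = x[first_target:first_target + n_samples]
    return inputs, targets


# Placeholder series standing in for daily prices (not real data).
prices = np.arange(20.0)
X_one, y_one = make_lagged_samples(prices, lag=6, horizon=1)
X_five, y_five = make_lagged_samples(prices, lag=6, horizon=5)
print(X_one.shape, y_one[:3])
print(X_five.shape, y_five[:3])
```

With the same series, a longer horizon yields fewer usable samples and a target that is further from the inputs, which is consistent with the five-step-ahead task being the harder of the two.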
The main reason is that CSD can effectively reduce the noise level of the original crude oil price data, and thus helps to improve the prediction performance of the models. Second, when the CSD based hybrid models are compared with their single benchmarks, the former mostly defeat the latter in both directional and level prediction, which again confirms the effectiveness of the CSD process in improving prediction performance. Third, focusing on forecasting techniques, the CSD-AI models of CSD-LSSVR and CSD-ANN perform much better than CSD-ARIMA, and the single AI models of LSSVR and ANN outperform single ARIMA, in both level and directional prediction, which further indicates the superiority of the proposed CSD-AI methodology with AI as the forecasting tool. Moreover, CSD-ARIMA performs even worse than ARIMA in five-step-ahead prediction. The main reason may be that AI techniques are more effective in modeling the nonlinear patterns hidden in crude oil price, while the traditional method may lose power on such complex data.
To statistically confirm the above conclusions, the DM test is performed, and Table 1 reports the corresponding results at horizon one, where the S values and p-values (in brackets) are listed. First, when the CSD-AI forecasting models are tested against their single benchmarks, all the p-values are much smaller than 10%, indicating the effectiveness of the CSD process in improving forecasting performance at the 90% confidence level. Second, focusing on forecasting techniques, when the CSD-AI models (i.e., CSD-ANN and CSD-LSSVR) are compared with CSD-ARIMA and the AI models (i.e., ANN and LSSVR) are compared with ARIMA, all the p-values are well below 10%, indicating that the AI models are statistically more effective than the traditional linear ARIMA model for crude oil price data at the 90% confidence level. Owing to the weakness of the traditional ARIMA model in modeling nonlinear complex data, CSD-ARIMA, even with the CSD process, cannot be shown to be better than single ARIMA.
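The DM comparison described above can be sketched as follows. This is a minimal illustration for horizon one, assuming a squared-error loss and a one-sided normal p-value that is small when the first model has the lower loss. The loss function and sign convention behind Table 1 are not restated here, so both are assumptions, and `dm_test` is an illustrative name.

```python
import math

import numpy as np


def dm_test(actual, pred_a, pred_b):
    """Diebold-Mariano statistic S and one-sided p-value at horizon one.

    A negative S with a small p-value suggests that model A has a
    significantly lower squared-error loss than model B.
    """
    a = np.asarray(actual, dtype=float)
    err_a = a - np.asarray(pred_a, dtype=float)
    err_b = a - np.asarray(pred_b, dtype=float)
    d = err_a ** 2 - err_b ** 2  # loss differential series
    n = d.size
    # At horizon one only the lag-0 autocovariance enters the variance.
    var_d = d.var()
    if var_d == 0.0:
        raise ValueError("constant loss differential: statistic undefined")
    s = d.mean() / math.sqrt(var_d / n)
    # Standard normal CDF at s, i.e. P(Z <= s).
    p_value = 0.5 * math.erfc(-s / math.sqrt(2.0))
    return float(s), p_value


# Synthetic example (not the paper's data): model A has much smaller errors.
rng = np.random.default_rng(0)
actual = rng.standard_normal(200)
pred_a = actual + 0.1 * rng.standard_normal(200)
pred_b = actual + 1.0 * rng.standard_normal(200)
s_stat, p_val = dm_test(actual, pred_a, pred_b)
print("S =", s_stat, "p =", p_val)
```

Under this convention, a p-value below 10% corresponds to rejecting equal accuracy in favor of the first model at the 90% confidence level used in the text.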
Interestingly, for the two AI models (i.e., ANN and LSSVR), neither can be shown to be significantly better than the other, since the p-values are above 10% when comparing CSD-ANN with CSD-LSSVR and when comparing ANN with LSSVR.
3.2.2. Superiority of CSD to other denoising methods
To test the superiority of CSD in data processing, various other popular denoising techniques are introduced, including KF, ES, HP filter, DCT and WD (see Section 3.1.3). It is worth noting that, since ANN and LSSVR perform quite similarly in predicting crude oil price (see Section 3.2.1), only ANN is considered here as the typical AI forecasting tool for formulating hybrid benchmark models with different denoising techniques. Figs. 6 and 7 show the Dstat and RMSE comparison results, respectively, across the hybrid ANN models with different denoising methods (i.e., KF-ANN, ES-ANN, HP-ANN, DCT-ANN, WD-ANN and CSD-ANN). The results show that the proposed CSD-ANN model performs the best in most cases. In particular, in both one-step-ahead and five-step-ahead prediction, CSD-ANN achieves the highest Dstat and the lowest RMSE, indicating the superiority of the novel CSD-ANN model over its hybrid counterparts with other denoising techniques. According to the above analyses of Figs. 6 and 7, the proposed CSD-ANN learning paradigm performs the best in all cases in terms of both Dstat and RMSE, which indicates that CSD is a highly effective denoising method and that the proposed CSD based AI model is a promising tool for crude oil price series forecasting.
Fig. 7. RMSE comparison of different hybrid models coupling different denoising methods and ANN. (Bar chart; series: KF-ANN, ES-ANN, HP-ANN, DCT-ANN, WD-ANN and CSD-ANN; groups: one-step-ahead prediction and five-step-ahead prediction.)
3.2.3. Robustness analysis
With different sample data on different time ranges, a robustness analysis is performed to test the universality and stability of the proposed CSD-AI models, through two daily datasets for the years 2011 (i.e., from Jan. 03, 2011 to Aug. 19, 2011) and 2012 (i.e., from Jan. 01, 2012 to Aug. 23, 2012). Figs. 8 and 9 show the corresponding comparison results across different hybrid ANN forecasting models with different denoising techniques in one-step-ahead prediction in terms of Dstat and RMSE, respectively, while Figs. 10 and 11 do so for the hybrid LSSVR forecasting models. Figs. 8 and 9 show that, for different sample data on different time ranges, the proposed CSD-ANN consistently shows its superiority in predicting crude oil price data. In particular, as seen from Fig. 8, amongst the diverse hybrid ANN methods, the CSD-ANN model achieves the highest directional hit rate in terms of Dstat for both the 2011 and 2012 datasets. Similarly, Fig. 9 shows that CSD-ANN also achieves the highest level prediction accuracy in terms of RMSE for both datasets. Likewise, according to Figs. 10 and 11, the proposed CSD-LSSVR model achieves better prediction results than the hybrid LSSVR methods with other denoising techniques for the two datasets on different time ranges. As shown in Fig. 10, CSD-LSSVR achieves the highest Dstat values for both the 2011 and 2012 datasets. Fig. 11 shows that CSD-LSSVR performs the best, with the lowest RMSE, in prediction for the two sample datasets. Generally, Figs. 8–11 show that the CSD-AI models of CSD-ANN and CSD-LSSVR outperform their respective hybrid benchmarks with other denoising techniques in all cases of the data with different time ranges, in terms of both Dstat and RMSE. The main reason lies in the effectiveness of CSD in processing data. With a proper transform basis, the CSD method tends to be more stable and effective in processing complex crude oil price data than other denoising methods. The results also indicate the stability and robustness of the proposed CSD-AI models, especially for crude oil price data.
Fig. 8. Dstat comparison of different hybrid ANN methods for different sample data. (Bar chart; series: KF-ANN, ES-ANN, HP-ANN, DCT-ANN, WD-ANN and CSD-ANN; groups: data in 2011 and data in 2012.)
Fig. 9. RMSE comparison of different hybrid ANN methods for different sample data. (Bar chart; series: KF-ANN, ES-ANN, HP-ANN, DCT-ANN, WD-ANN and CSD-ANN; groups: data in 2011 and data in 2012.)
Fig. 10. Dstat comparison of different hybrid LSSVR methods for different sample data. (Bar chart; series: KF-LSSVR, ES-LSSVR, HP-LSSVR, DCT-LSSVR, WD-LSSVR and CSD-LSSVR; groups: data in 2011 and data in 2012.)
3.2.4. Summary
From the above experiments, the following five conclusions can be drawn. (1) CSD can effectively improve forecasting ability, since the CSD based forecasting models outperform their single counterparts without the CSD process. (2) The CSD based hybrid methods with AI as the forecasting tool (i.e., CSD-ANN and CSD-LSSVR) perform significantly better than that with a traditional model (i.e., CSD-ARIMA), indicating the effectiveness of AI in modeling the nonlinear patterns hidden in the crude oil price. (3) Hybrid methods with CSD and with other denoising methods are compared, and the results confirm the superiority of CSD in data processing. (4) With different sample data, the proposed CSD-AI models are again found to be the best models in terms of both level and directional prediction, implying their universality and stability. (5) Finally, the proposed CSD based AI learning paradigm, with the competitive denoising technique of CSD and a powerful AI forecasting tool, can be used as an effective forecasting method for crude oil price.
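Conclusion (1) rests on the CSD preprocessing step. As a reminder of how such a step can work, the sketch below follows the generic compressed sensing recipe of sparse transform, random measurement and sparse reconstruction. The DCT basis, the Gaussian measurement matrix, orthogonal matching pursuit and all parameter values are assumptions chosen for illustration; they are not claimed to be the settings used in this study, and the synthetic signal is not crude oil data.

```python
import numpy as np


def dct_basis(n):
    """Orthonormal DCT-II synthesis basis: x = basis @ coefficients."""
    t = np.arange(n)
    k = t[:, None]
    analysis = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t[None, :] + 1) * k / (2 * n))
    analysis[0, :] /= np.sqrt(2.0)
    return analysis.T


def omp(a_mat, y, sparsity):
    """Orthogonal matching pursuit: greedy sparse solution of y ~ a_mat @ s."""
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(sparsity):
        corr = np.abs(a_mat.T @ residual)
        if support:
            corr[support] = 0.0  # never pick the same atom twice
        support.append(int(np.argmax(corr)))
        # Refit all selected atoms jointly by least squares.
        coef, *_ = np.linalg.lstsq(a_mat[:, support], y, rcond=None)
        residual = y - a_mat[:, support] @ coef
    s = np.zeros(a_mat.shape[1])
    s[support] = coef
    return s


def csd_denoise(noisy, n_measurements, sparsity, rng):
    """Measure the noisy series, recover sparse coefficients, rebuild it."""
    n = noisy.size
    psi = dct_basis(n)  # assumed transform basis
    phi = rng.standard_normal((n_measurements, n)) / np.sqrt(n_measurements)
    y = phi @ noisy  # compressed measurements
    s_hat = omp(phi @ psi, y, sparsity)
    return psi @ s_hat


# Synthetic signal that is exactly 3-sparse in the DCT domain.
rng = np.random.default_rng(0)
n = 128
coeffs = np.zeros(n)
coeffs[[3, 10, 25]] = [10.0, -8.0, 6.0]
clean = dct_basis(n) @ coeffs
noisy = clean + 0.1 * rng.standard_normal(n)
denoised = csd_denoise(noisy, n_measurements=96, sparsity=3, rng=rng)
err_noisy = float(np.sqrt(np.mean((noisy - clean) ** 2)))
err_denoised = float(np.sqrt(np.mean((denoised - clean) ** 2)))
print("RMSE before:", err_noisy, "after:", err_denoised)
```

The reconstruction keeps only the few transform coefficients that explain the measurements, so noise spread across the remaining coefficients is discarded. How well this works depends on the signal actually being sparse in the chosen basis.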
4. Concluding remarks
Due to the complexity of daily crude oil price in terms of its high level of noise, a novel hybrid learning paradigm is proposed by incorporating compressed sensing based denoising (CSD) and a certain artificial intelligence (AI) forecasting tool, i.e., CSD-AI. In this model, CSD is used as a preprocessor to extract clean data from the original crude oil price data by denoising; a powerful AI model, e.g., artificial neural network (ANN) or least squares support vector regression (LSSVR), is then employed to model the clean data and to provide the final prediction result. With WTI crude oil price data as sample data, the empirical study indicates that the CSD process can significantly improve the forecasting capability of single AI models, since the CSD-AI models outperform their single benchmarks in both level and directional prediction. Moreover, the proposed CSD-AI outperforms other hybrid models with a traditional forecasting method or other denoising techniques in terms of both level and directional accuracy. Furthermore, for different data samples on different time ranges, the proposed CSD-AI models also perform the best, indicating the universality and stability of the novel learning paradigm. The results also indicate that the proposed CSD-AI hybrid model is very promising for forecasting complex time series with a high level of noise, especially crude oil price data.
Besides crude oil price data, the proposed CSD-AI learning paradigm can also be applied to other difficult forecasting tasks, especially for complex, irregular and highly nonlinear data. Moreover, choosing a suitable transform basis may be the most crucial part of CSD, directly determining how well CSD, and hence the CSD-AI learning paradigm, works. Therefore, the transform basis should be carefully selected, especially based on the data characteristics of the sample data. We will look into these issues in future research.
Acknowledgments
The authors would like to express their sincere appreciation to the editor and the three independent referees for their valuable comments and suggestions on the paper. Their comments have improved the quality of the paper immensely. This work is supported by grants from the National Science Fund for Distinguished Young Scholars (NSFC No. 71025005), the National Natural Science Foundation of China (NSFC Nos. 91224001 and 71301006), the National Program for Support of Top-Notch Young Professionals and the Fundamental Research Funds for the Central Universities in BUCT.
Fig. 11. RMSE comparison of different hybrid LSSVR methods for different sample data. (Bar chart; series: KF-LSSVR, ES-LSSVR, HP-LSSVR, DCT-LSSVR, WD-LSSVR and CSD-LSSVR; groups: data in 2011 and data in 2012.)
Appendix A. Supplementary data
Supplementary data to this article can be found online at http://dx.doi.org/10.1016/j.eneco.2014.09.019.
References
Ahmed, N., Natarajan, T., Rao, K.R., 1974. Discrete cosine transform. IEEE Trans. Comput. 100 (1), 90–93.
Amin-Naseri, M.R., Gharacheh, E.A., 2007. A hybrid artificial intelligence approach to monthly forecasting of crude oil price time series. The Proceedings of the 10th International Conference on Engineering Applications of Neural Networks, CEUR-WS284, pp. 160–167.
Bao, Y., Xiong, T., Hu, Z., 2014a. Multi-step-ahead time series prediction using multiple-output support vector regression. Neurocomputing 129, 482–493.
Bao, Y., Xiong, T., Hu, Z., 2014b. PSO-MISMO modeling strategy for multi-step-ahead time series prediction. IEEE Trans. Cybern. 44 (5), 655–668.
Baxter, M., King, R.G., 1999. Measuring business cycles: approximate band-pass filters for economic time series. Rev. Econ. Stat. 81 (4), 575–593.
Bowerman, B.L., O'Connell, R.T., 1993. Time Series and Forecasting: An Applied Approach. Duxbury Press, California.
Box, G.E.P., Jenkins, G.M., 1970. Time Series Analysis: Forecasting and Control. Holden Day Press, San Francisco, CA.
Bracewell, R.N., 1980. Fourier Transform and Its Applications. McGraw-Hill Press, New York.
Candès, E.J., Wakin, M.B., 2008. An introduction to compressive sampling. IEEE Signal Process. Mag. 25 (2), 21–30.
Chen, B.T., Chen, M.Y., Fan, M.H., Chen, C.C., 2012. Forecasting stock price based on fuzzy time-series with equal-frequency partitioning and fast Fourier transform algorithm. Computing, Communications and Applications Conference (ComComAp), IEEE, pp. 238–243.
Diebold, F.X., Mariano, R.S., 2002. Comparing predictive accuracy. J. Bus. Econ. Stat. 20 (1), 134–144.
Donoho, D.L., 1995. De-noising by soft-thresholding. IEEE Trans. Inf. Theory 41 (3), 613–627.
Eldar, Y.C., Kutyniok, G., 2012. Compressed Sensing: Theory and Applications. Cambridge University Press, Cambridge.
Faria, E.L., Albuquerque, M.P., Gonzalez, J.L., Cavalcante, J.T.P., Albuquerque, M.P., 2009. Predicting the Brazilian stock market through neural networks and adaptive exponential smoothing methods. Expert Syst. Appl. 36 (10), 12506–12509.
Gardner, E.S., 1985. Exponential smoothing: the state of the art. J. Forecast. 4 (1), 1–28.
Hagen, R., 1994. How is the international price of a particular crude determined? OPEC Rev. 18 (1), 127–135.
Han, B., Xiong, J., Li, L., Yang, J., Wang, Z., 2010. Research on millimeter-wave image denoising method based on contourlet and compressed sensing. Signal Processing Systems (ICSPS), 2010 2nd International Conference on IEEE. vol. 2 (V2-471 - V2-475).
He, K., Lai, K.K., Yen, J., 2010. A hybrid slantlet denoising least squares support vector regression model for exchange rate prediction. Procedia Comput. Sci. 1 (1), 2397–2405.
He, K., Lai, K.K., Xiang, G., 2012. Portfolio value at risk estimate for crude oil markets: a multivariate wavelet denoising approach. Energies 5 (4), 1018–1043.
Hodrick, R.J., Prescott, E.C., 1997. Postwar US business cycles: an empirical investigation. J. Money, Credit, Bank. 29 (1), 1–16.
Hornik, K., Stinchcombe, M., White, H., 1989. Multilayer feedforward networks are universal approximators. Neural Netw. 2 (5), 359–366.
Jammazi, R., Aloui, C., 2012. Crude oil price forecasting: experimental evidence from wavelet decomposition and neural network modeling. Energy Econ. 34 (3), 828–841.
Kalman, R.E., 1960. A new approach to linear filtering and prediction problems. J. Basic Eng. 82 (1), 35–45.
Mallat, S.G., 1989. A theory for multiresolution signal decomposition: the wavelet representation. IEEE Trans. Pattern Anal. Mach. Intell. 11 (7), 674–693.
Marim, M., Angelini, E., Olivo-Marin, J.C., 2010. Denoising in fluorescence microscopy using compressed sensing with multiple reconstructions and non-local merging. Engineering in Medicine and Biology Society (EMBC), 2010 Annual International Conference of the IEEE, pp. 3394–3397.
Mirmirani, S., Li, H.C., 2005. A comparison of VAR and neural networks with genetic algorithm in forecasting price of oil. Adv. Econ. 19, 203–223.
Nasseri, M., Moeini, A., Tabesh, M., 2011. Forecasting monthly urban water demand using Extended Kalman Filter and Genetic Programming. Expert Syst. Appl. 38 (6), 7387–7395.
Sang, Y.F., 2013. Improved wavelet modeling framework for hydrologic time series forecasting. Water Resour. Manag. 27 (8), 2807–2821.
Sang, Y.F., Wang, D., Wu, J.C., Zhu, Q.P., Wang, L., 2009. Entropy-based wavelet de-noising method for time series analysis. Entropy 11 (4), 1123–1147.
Shambora, W.E., Rossiter, R., 2007. Are there exploitable inefficiencies in the futures market for oil? Energy Econ. 29 (1), 18–27.
Shumway, R.H., Stoffer, D.S., 1982. An approach to time series smoothing and forecasting using the EM algorithm. J. Time Ser. Anal. 3 (4), 253–264.
Stevens, P., 1995. The determination of oil prices 1945–1995: a diagrammatic interpretation. Energy Policy 23 (10), 861–870.
Struzik, Z.R., 2001. Wavelet methods in (financial) time-series processing. Physica A 296 (1), 307–319.
Suykens, J.A., Vandewalle, J., 1999. Least squares support vector machine classifiers. Neural Process. Lett. 9 (3), 293–300.
Taieb, B.S., Bontempi, G., Atiya, A.F., Sorjamaa, A., 2012. A review and comparison of strategies for multi-step ahead time series forecasting based on the NN5 forecasting competition. Expert Syst. Appl. 39 (8), 7067–7083.
Tang, L., Yu, L., Wang, S., Li, J., Wang, S., 2012. A novel hybrid ensemble learning paradigm for nuclear energy consumption forecasting. Appl. Energy 93, 432–443.
Tang, L., Yu, L., Liu, F., Xu, W., 2013. An integrated data characteristic testing scheme for complex time series data exploration. Int. J. Inf. Technol. Decis. Mak. 12 (3), 491–521.
Tang, L., Wang, S., He, K., Wang, S., 2014a. A novel mode-characteristic-based decomposition ensemble model for nuclear energy consumption forecasting. Ann. Oper. Res. 2014, 1–22.
Tang, L., Yu, L., He, K., 2014b. A novel data-characteristic-driven modeling methodology for nuclear energy consumption forecasting. Appl. Energy 128, 1–14.
Vapnik, V., 1995. The nature of statistical learning theory. Springer-Verlag, New York.
Wang, S., Yu, L., Lai, K.K., 2005. Crude oil price forecasting with TEI@I methodology. J. Syst. Sci. Complex. 18 (2), 145–166.
Wang, S., Yu, L., Tang, L., Wang, S., 2011. A novel seasonal decomposition based least squares support vector regression ensemble learning approach for hydropower consumption forecasting in China. Energy 36 (11), 6542–6554.
Watkins, G.C., Plourde, A., 1994. How volatile are crude oil prices. OPEC Rev. 18 (4), 220–245.
Welch, G., Bishop, G., 1995. An introduction to the Kalman Filter, Department of Computer Science. University of North Carolina, Tech. Rep., TR-95-041.
White, H., 1990. Connectionist nonparametric regression: multilayer feedforward networks can learn arbitrary mappings. Neural Netw. 3 (5), 535–549.
Xie, W., Yu, L., Xu, S., Wang, S., 2006. A new method for crude oil price forecasting based on support vector machines. Computational Science—ICCS 2006. Springer, Berlin Heidelberg, pp. 444–451.
Xiong, T., Bao, Y., Hu, Z., 2013. Beyond one-step-ahead forecasting: evaluation of alternative multi-step-ahead forecasting models for crude oil prices. Energy Econ. 40, 405–415.
Yu, L., Lai, K.K., Wang, S., He, K., 2007. Oil price forecasting with an EMD-based multiscale neural network learning paradigm. Computational Science—ICCS 2007. Springer, Berlin Heidelberg, pp. 925–932.
Yu, L., Wang, S., Lai, K.K., 2008. Forecasting crude oil price with an EMD-based neural network ensemble learning paradigm. Energy Econ. 30 (5), 2623–2635.
Yuan, C., 2011. Forecasting exchange rates: the multi-state Markov-switching model with smoothing. Int. Rev. Econ. Financ. 20 (2), 342–362.
Zhu, L., Zhu, Y., Mao, H., Gu, M., 2009. A new method for sparse signal denoising based on compressed sensing. Knowledge Acquisition and Modeling, 2009. KAM'09. Second International Symposium on IEEE. vol. 1, pp. 35–38 (November).