2012 Fifth International Conference on Business Intelligence and Financial Engineering

Crude Oil Price Forecasting: A Transfer Learning based Analog Complexing Model

Jin Xiao, Changzheng He

Shouyang Wang

Business School, Sichuan University, Chengdu, China
e-mail: [email protected], [email protected]

Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
e-mail: [email protected]

Abstract—Most existing models for oil price forecasting use only the data in the forecasted time series itself. This study proposes a transfer learning based analog complexing model (TLAC). It first transfers related time series from the source domain to assist in modeling the target time series via the transfer learning technique, and then constructs the forecasting model by the analog complexing method. Finally, a genetic algorithm is introduced to find the optimal match between the two key parameters in TLAC. Two main crude oil price series, the West Texas Intermediate (WTI) crude oil spot price and the Brent crude oil spot price, are used for empirical analysis, and the results show the effectiveness of the proposed model.

Keywords—transfer learning method; analog complexing model; genetic algorithm; crude oil price forecasting

I. INTRODUCTION

Oil is the most used energy source in the world, accounting for 36.4% of primary energy consumption and 94.5% of global energy used for transportation [1]. The oil market fluctuates strongly, and sharp oil price movements are likely to disturb aggregate economic activity and create dramatic uncertainty for the global economy. Oil price forecasting has therefore been a common research theme in recent decades.

At present, there are abundant studies on the analysis and forecasting of crude oil prices. The approaches can be classified into two categories: 1) econometric forecasting models, which assume the time series is a random sequence with linear correlation, including the autoregressive moving average (ARMA), autoregressive conditional heteroscedasticity (ARCH) and vector autoregression (VAR) models [2]; 2) nonlinear dynamics forecasting models, including artificial neural networks (ANN), support vector regression (SVR), belief networks (BN) and genetic programming (GP) [3, 4]. However, the nonlinear models have their own disadvantages: for example, ANN often suffers from local minima and overfitting, while other models, such as SVR and GP, are sensitive to parameter selection [4]. The limitations of the econometric and nonlinear dynamics models motivate the search for new methods for crude oil price forecasting. Peters [5] pointed out that most financial time series are long-memory processes, in which future prices depend not on some part but on all of the price history. In this case, prediction approaches based on historical pattern matching should be chosen for long-memory systems, as they can overcome the disadvantages of short-memory forecasting methods. Further, Fan et al. [6] applied a pattern matching technique to crude oil price forecasting and proposed a new approach: generalized pattern matching based on a genetic algorithm (GPMGA).

The studies above have made important contributions to crude oil price forecasting. However, careful analysis shows that most of them use only the forecasted time series itself, i.e., target domain data. In fact, plenty of data are available outside the target domain, often from other time series, i.e., source domains. The forecasting accuracy for the target time series can be expected to improve if we make full use of the data in related source domain time series. The transfer learning technique [7], developed in the machine learning area, provides a good way to address this issue. Its basic idea is to utilize knowledge acquired from related tasks to assist in learning the target task for better performance. In recent decades, transfer learning has attracted more and more attention and has been applied successfully in many areas, such as text mining, speech processing and image recognition, but rarely in time series forecasting.

This study combines the transfer learning technique with a pattern matching forecasting method, analog complexing (AC), and a genetic algorithm, and constructs a transfer learning based analog complexing model (TLAC) for crude oil price forecasting. The empirical results show that the performance of TLAC is better than that of several existing models.

The rest of this study is organized as follows: Section II briefly introduces the working principle of the AC method and the basic idea of transfer learning; Section III presents the working principle and detailed steps of TLAC; Section IV conducts the empirical analysis. Finally, the conclusions are given in Section V.

978-0-7695-4750-3/12 $26.00 © 2012 IEEE DOI 10.1109/BIFE.2012.14

II. RELATED METHOD

A. Analog complexing

The analog complexing (AC) algorithm was developed by Lorenz [8] and was first applied to meteorological forecasting. Later, Lemke enhanced the AC algorithm with an inductive self-organizing approach and an advanced selection procedure to make it applicable to evolutionary processes as well. AC can be considered a sequential pattern recognition method for forecasting, clustering and classification of complex objects. The method is based on the assumption that typical


situations of a time process will repeat in some form; that is, each actual period of state development of a given multidimensional time process may have one or more analogous periods in history. In this way, the forecast of the present state can be obtained by transforming and combining the development states of the analogous periods in history. In general, the AC algorithm for forecasting is a four-step procedure [9], detailed as follows.

1) Generation of candidate patterns: Given an m-dimensional real-valued series with N observations, x_t = \{x_{1t}, \ldots, x_{mt}\}, t = 1, 2, \ldots, N, a pattern is defined as a table P_k(i) with k rows (observations) beginning from the ith line (period), where k is the pattern length (i = 1, 2, \ldots, N-k+1):

$$P_k(i) = \begin{bmatrix} x_{1,i} & \cdots & x_{l,i} & \cdots & x_{m,i} \\ \vdots & & \vdots & & \vdots \\ x_{1,i+j} & \cdots & x_{l,i+j} & \cdots & x_{m,i+j} \\ \vdots & & \vdots & & \vdots \\ x_{1,i+k-1} & \cdots & x_{l,i+k-1} & \cdots & x_{m,i+k-1} \end{bmatrix}_{k \times m}. \quad (1)$$

In general, the last pattern P_R = P_k(N-k+1) just before the forecast origin is selected as the reference pattern. Then all possible candidate patterns P_k(i) (i = 1, \ldots, N-k+1) are compared with the reference pattern, in the hope of finding one or more patterns similar to the reference pattern from which to study the behavior of the system.

2) Transformation of analogues: For a reference pattern with a length of k observations, there may be one or more similar patterns in history. However, because the system is dynamic, patterns with similar shapes may have different means and standard deviations. For the example in Step 1, consider the similarity between the candidate pattern P_2(1) and the reference pattern P_R. They have the same trend and structure (P_R = P_2(1) + 1) and are highly similar; however, their means and standard deviations are very different. Thus, to compute the similarity between patterns, we must look for a transformation from each candidate pattern to the reference pattern that describes these differences. It is advisable to define the transformed pattern T_i[P_k(i)] as a linear function of the pattern P_k(i):

$$T_i[P_k(i)] = \begin{bmatrix} x^*_{1,i} & \cdots & x^*_{l,i} & \cdots & x^*_{m,i} \\ \vdots & & \vdots & & \vdots \\ x^*_{1,i+j} & \cdots & x^*_{l,i+j} & \cdots & x^*_{m,i+j} \\ \vdots & & \vdots & & \vdots \\ x^*_{1,i+k-1} & \cdots & x^*_{l,i+k-1} & \cdots & x^*_{m,i+k-1} \end{bmatrix}, \quad (2)$$

where x^*_{l,i+j} = a^i_{0l} + a^i_{1l} x_{l,i+j}, j = 0, 1, \ldots, k-1; i = 1, 2, \ldots, N-k+1; l = 1, 2, \ldots, m. The parameter a^i_{0l} can be interpreted as the difference in state between the reference pattern and the similar pattern, and the parameter a^i_{1l} is considered to account for some of the uncertainties. Taking the data x_{ij} (i = N-k+1, N-k+2, \ldots, N; j = 1, 2, \ldots, m) in the reference pattern as the datum values, we estimate the unknown weights a^i_{0l} and a^i_{1l} for each candidate pattern P_k(i) by the least squares method, with the total sum of squares as the similarity measure.

3) Selection of most similar patterns: To measure the similarity between the candidate pattern P_k(i) transformed in Step 2 and the reference pattern P_R, we need to compute the distance between the two patterns. In general, the distance between the ith candidate pattern and the reference pattern can be defined as:

$$d_i = \frac{1}{k-1} \sum_{j=0}^{k-1} \sum_{r=1}^{m} \left( x^*_{r,i+j} - x_{r,N-k+j+1} \right)^2. \quad (3)$$

Then the pattern similarity between the ith pattern and the reference pattern is defined as:

$$s_i = 1/d_i. \quad (4)$$

Obviously, the greater the distance, the smaller the pattern similarity.

4) Combination forecasting: Suppose F most similar patterns are selected in Step 3; each selected analogue then has its own continuation, transformed to the reference pattern, for forecasting. Let R_1, R_2, \ldots, R_F be the forecasts of the F patterns; the combined forecasting result is obtained by:

$$\hat{R} = \sum_{i=1}^{F} w_i R_i, \quad (5)$$

where the w_i (i = 1, 2, \ldots, F) are the weights. In this study, they are computed according to the pattern similarity.

B. Transfer learning technique

The concept of transfer learning originates from psychology [7]. It refers to people's ability to use the experience and techniques learned in related areas to assist in learning a new task. People can learn new knowledge directly, and can also utilize old knowledge to assist in learning new knowledge. Machine learning has attempted to simulate human learning since it emerged. Learning new knowledge directly is the traditional machine learning paradigm we are familiar with; such methods often suppose that the learning tasks are independent of each other, and they discard past learning experience and results when learning a new task. Since the 1990s, transfer learning has gained more and more attention with the development of machine learning. Its basic idea is to utilize the data or information of related source tasks to assist in modeling the target task [7].

III. PROCESS OF THE TRANSFER LEARNING BASED ANALOG COMPLEXING

The TLAC model proposed in this study contains two phases. 1) Selection of the related source domain time series. There are many source domain time series related to the target domain time series, and these data often contain considerable noise. The forecasting performance may be damaged if the transfer is done improperly; that is, it may result in negative transfer. 2) Genetic algorithm based parameter optimization. There are two important parameters when modeling with the AC algorithm: the pattern length k and the number of most similar patterns F. The conventional AC model first fixes the pattern length at a certain level, generates the reference pattern and candidate patterns, selects the F candidate patterns most similar to the reference pattern, and combines them to get the combined forecasting result; it then changes the pattern length value repeatedly. Thus, each pattern length corresponds to a combined forecasting result, and the optimal forecasting result is finally selected from all the combined forecasting results. However, the conventional AC model can hardly find the optimal match between the pattern length k and the number of most similar candidate patterns F. To overcome this deficiency, this study introduces a genetic algorithm to optimize the parameters in the TLAC model.

A. Selection of related source domain time series

For any economic time series, we can always find many related time series in the source domain. To describe the relevance of these time series to the target time series, we calculate the Pearson correlation coefficient (PCC) between them. However, because different time series often have different numbers of data points, the PCC cannot be calculated directly. This study focuses mainly on daily crude oil spot price forecasting; therefore, we segment each time series, starting from the latest time point, into sub-series of length 255 (the approximate number of trading days in a year, excluding Saturdays and Sundays: 360 × 5/7 ≈ 255), and directly discard the final sub-series if it has fewer than 255 time points. Let the target time series of a forecasting problem be Y, and let there be m related time series X^i (i = 1, 2, \ldots, m) in the source domain. Suppose the target series Y and the source series X^i are divided into n_1 and n_2 sub-series, respectively; the PCC between Y_j (j = 1, 2, \ldots, n_1) and X^i_k (k = 1, 2, \ldots, n_2) is then calculated as:

$$r^i_{j,k} = \frac{\sum_{t=1}^{255} (X^i_{k,t} - \bar{X}^i_k)(Y_{j,t} - \bar{Y}_j)}{\sqrt{\sum_{t=1}^{255} (X^i_{k,t} - \bar{X}^i_k)^2} \sqrt{\sum_{t=1}^{255} (Y_{j,t} - \bar{Y}_j)^2}}, \quad (6)$$

where \bar{X}^i_k and \bar{Y}_j are the mean values of the sub-series X^i_k and Y_j. Further, the average similarity between the target series Y and the source series X^i (i = 1, 2, \ldots, m) is defined as:

$$r = \frac{1}{n_1 n_2} \sum_{j=1}^{n_1} \sum_{k=1}^{n_2} r^i_{j,k}. \quad (7)$$

It is easy to see that r ∈ [-1, 1], and the larger r is, the stronger the positive correlation between the two time series. In this study, the development trend of a source domain time series with stronger positive relevance to the target domain time series is closer to that of the target domain time series, and such a series is more helpful for forecasting the target domain time series. Therefore, in the TLAC model, we select the source domain time series with the largest average similarity to the target domain to assist in modeling.

B. Genetic algorithm based parameter optimization

A genetic algorithm (GA) encodes a possible solution to a specific problem in a simple chromosome-like string data structure and applies specified operators to these structures so as to preserve important information and produce a new population, with the purpose of generating strings that map to high function values [10]. In short, a GA is characterized by bit-string representations of potential solutions to a given problem, and these representations are altered and improved by the GA operations. The main GA operations, selection, crossover and mutation of genetic information, mirror the way binary strings change in natural evolution. Each generation of a GA consists of a new population produced from the previous generation. The binary representation and the main GA operations in this study can be summarized as follows.

1) Variable encoding: Before a genetic algorithm can be put to work on any problem, a method is needed to encode potential solutions to that problem in a form a computer can process. One common approach is to encode solutions as binary strings: sequences of 1s and 0s, where the digit at each position represents the value of some aspect of the solution. In the TLAC model, there are two important parameters: the pattern length k and the number of most similar patterns F. In this study, we set 1 ≤ F ≤ 8 and 2 ≤ k ≤ 33, and let the chromosome be an 8-bit binary string, whose first three bits denote the value of F and whose last five bits denote the value of k (see Fig. 1). Taking the chromosome in Fig. 1 as an example, the values of the two parameters are: F = (1×2^2 + 1×2^0) + 1 = 6; k = (1×2^3 + 1×2^2 + 1×2^0) + 2 = 15.

Figure 1. The encoding of a potential solution (first three bits: the number of selected most similar patterns F; last five bits: the pattern length k; the chromosome shown is 101 01101, i.e., F = 6 and k = 15).

2) Selection: The selection operator biases the search process in favor of the fitter members of the population on the basis of their fitness values; members participate in this operation with certain probabilities. The probability of the ith member in the population is calculated as:

$$P_i = \frac{f_i}{\sum_{c=1}^{q} f_c}, \quad (8)$$

where q is the population size and f_i (i = 1, 2, \ldots, q) is the fitness of the ith member. In the selection operation, members with better fitness can participate several times, while members with worse fitness may be deleted, which leads to an increase in average fitness.
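The 8-bit chromosome encoding and the fitness-proportional selection of Section III.B can be sketched as follows. This is a minimal illustration; the function names and the use of NumPy are my own, not from the paper.

```python
import numpy as np

def decode(chromosome):
    """Decode an 8-bit chromosome string: first 3 bits -> F (1..8), last 5 bits -> k (2..33)."""
    F = int(chromosome[:3], 2) + 1   # offset so F starts at 1
    k = int(chromosome[3:], 2) + 2   # offset so k starts at 2
    return F, k

def roulette_select(fitness, rng):
    """Fitness-proportional (roulette-wheel) selection, Eq. (8): P_i = f_i / sum(f_c)."""
    p = np.asarray(fitness, dtype=float)
    p = p / p.sum()
    # Draw a full new population; fitter members may be drawn several times.
    return rng.choice(len(fitness), size=len(fitness), p=p)

# The worked example from Fig. 1: chromosome 10101101 -> F = 6, k = 15
F, k = decode("10101101")
print(F, k)  # 6 15
```

Note that decoding adds the lower bound of each parameter range as an offset, which is how the paper's worked example (101 → 5 + 1 = 6; 01101 → 13 + 2 = 15) is obtained.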

3) Crossover: The crossover operation allows an exchange of design characteristics between mating members. It is executed by selecting two mating parents, randomly choosing two sites on each chromosomal string, and swapping the strings between the sites of the pair. The crossover operation is applied with a probability Pc, which typically takes values from 0.2 to 0.8 [11].

4) Mutation: This operator is another essential GA operator; it acts on each chromosome after the crossover operator. A random number is produced for each bit of a chromosome; if this number is smaller than Pm, mutation occurs in that bit, otherwise it does not. If mutation is not applied, the offspring enter the new generation unchanged after crossover. The mutation operation prevents the loss of potentially valuable genetic information from the population during the selection and crossover operations [12].

5) The fitness function: To evaluate the forecasting result of each chromosome, we define the following fitness function:

$$\min F = \frac{1}{N} \sum_{t=1}^{N} (y_t - \hat{y}_t)^2, \quad (9)$$

where y_t and \hat{y}_t are the observed and predicted values, and N is the number of observations in the test set. As the fitness function is to be minimized, individuals with smaller fitness values are chosen in each generation.

Figure 2. The block diagram of the TLAC model (source domain series X1–Xm and target domain series Y; compute the correlation between each source series and the target series; transfer the source series with the highest similarity to the target domain; forecast by analog complexing, optimized by a genetic algorithm; output the final forecast results).

Let Pg be the number of generations, Pn the population size, Pc the crossover probability, Pm the mutation probability, and Test the test set used to verify the performance of the candidate models. Fig. 2 shows the modeling process of the transfer learning based analog complexing (TLAC) model. It is generally composed of the following steps:

Step 1: Calculate the average similarity between each source domain time series X1, X2, ..., Xm and the target domain time series Y according to (7);
Step 2: Transfer the source time series with the highest similarity to the target time series;
Step 3: Forecast with GA based analog complexing:
(1) Initialize the population;
(2) For i = 1 to Pg (generations):
    For j = 1 to Pn (population size):
    - Determine F and k according to the structure of the chromosome;
    - Divide the target time series Y and the selected source domain time series into candidate patterns of length k;
    - Calculate the similarity between the reference pattern and each candidate pattern, select the F most similar ones, and combine the continuations of these patterns according to (5) to get the forecasting result on Test;
    - Calculate the fitness of the forecasting result corresponding to the chromosome according to (9).
    End
    Select Pn chromosomes with higher fitness into the next generation according to (8), and apply crossover and mutation with probabilities Pc and Pm.
    End
(3) Find the chromosome with the highest fitness, combine the continuations of the most similar patterns selected by that chromosome, and get the final forecasting result according to (5).

IV. EMPIRICAL ANALYSIS

A. Research data

In this study, we select the West Texas Intermediate (WTI) crude oil spot price and the Brent crude oil spot price as experimental samples. Both series are daily data downloaded from the Energy Information Administration (EIA) website of the U.S. Department of Energy (DOE). For the WTI crude oil spot price, we take the daily data from January 2, 1986 to December 30, 2011, excluding public holidays, for a total of 6559 observations. For convenience of modeling, the data from January 2, 1986 to June 30, 2011 are used as the training set, and the remainder as the test set. For the Brent crude oil spot price, the sample runs from May 20, 1987 to December 30, 2011, comprising 6251 observations. Similarly, we take the data from May 20, 1987 to June 30, 2011 as the training set and the data from July 1, 2011 to December 30, 2011 as the test set. Further, we regard four time series as source time series: the New York Harbor conventional gasoline regular spot price FOB, the U.S. Gulf Coast conventional gasoline regular spot price FOB, the New York Harbor No. 2 heating oil spot price FOB and the U.S. Gulf Coast kerosene-type jet fuel spot price FOB. The first three series run from January 2, 1986 to December 30, 2011, and the last from April 2, 1990 to December 30, 2011. It is worth noting that the Brent crude oil spot price is also regarded as a source time series for the WTI crude oil spot price, and vice versa.
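The core analog complexing forecast used inside Step 3 — least-squares alignment of each candidate pattern to the reference pattern as in (2), the distance and similarity measures of (3)–(4), and the similarity-weighted combination of the continuations as in (5) — can be sketched as follows for a univariate series. This is a simplified one-step illustration under my own naming; the paper's multivariate case adds a loop over the m dimensions.

```python
import numpy as np

def ac_forecast(series, k=15, F=6):
    """One-step analog complexing forecast for a 1-D series (simplified sketch)."""
    x = np.asarray(series, dtype=float)
    N = len(x)
    ref = x[N - k:]                          # reference pattern P_R = P_k(N-k+1)
    candidates = []
    for i in range(N - k):                   # candidate patterns with a known continuation
        pat = x[i:i + k]
        # Eq. (2): least-squares fit of ref ~ a0 + a1 * pat
        A = np.vstack([np.ones(k), pat]).T
        (a0, a1), *_ = np.linalg.lstsq(A, ref, rcond=None)
        transformed = a0 + a1 * pat
        d = np.sum((transformed - ref) ** 2) / (k - 1)      # Eq. (3)
        continuation = a0 + a1 * x[i + k]                   # transformed next value
        candidates.append((1.0 / (d + 1e-12), continuation))  # Eq. (4); guard d = 0
    # Keep the F most similar analogues and combine their continuations, Eq. (5)
    candidates.sort(key=lambda c: c[0], reverse=True)
    top = candidates[:F]
    s = np.array([c[0] for c in top])
    R = np.array([c[1] for c in top])
    w = s / s.sum()                          # similarity-based weights
    return float(np.dot(w, R))
```

On a series where past patterns recur exactly, e.g. a linear trend, every analogue's transformed continuation coincides with the true next value, so `ac_forecast(range(1, 41), k=5, F=3)` returns approximately 41.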

B. Evaluation criteria

To evaluate the prediction performance, two commonly used evaluation criteria, the root mean square error (RMSE) and the directional statistic (Dstat), are used in this study. Given N pairs of observed values y_t and predicted values \hat{y}_t, the RMSE, which describes the deviation of the estimates from the real values, is calculated as:

$$RMSE = \sqrt{\frac{1}{N} \sum_{t=1}^{N} (y_t - \hat{y}_t)^2}. \quad (10)$$

However, the RMSE cannot provide direct suggestions to decision makers. Many decision makers, such as investors, are much more interested in the direction of change. Therefore, we also introduce the directional change statistic Dstat:

$$D_{stat} = \frac{1}{N} \sum_{t=1}^{N} a_t, \quad (11)$$

where a_t = 1 if (y_t - y_{t-1})(\hat{y}_t - \hat{y}_{t-1}) > 0, and a_t = 0 otherwise.

C. Experimental results

We compared the TLAC model with the traditional AC model [8], the pattern matching method GPMGA [6], ARIMA and ANN. The TLAC models are run with the following user-specified parameters: Pg = 50, Pn = 100, Pc = 0.9, Pm = 0.05; for both the WTI and Brent crude oil spot price forecasts, only the one source time series with the highest similarity is selected to assist in modeling. For GPMGA, the parameter setting is the same as in [6].

Tables I and II show the forecasting performance of the five models for the WTI and Brent crude oil spot price series. It can be seen that the TLAC model outperforms the other four models for both crude oil prices under study, both on the goodness-of-fit measure RMSE and on the directional criterion Dstat. Therefore, it can be concluded that the forecasting performance of the proposed transfer learning based analog complexing model (TLAC) is better than that of the other models considered in this study.

TABLE I. THE PERFORMANCE COMPARISONS OF DIFFERENT MODELS FOR THE WTI CRUDE OIL SPOT PRICE

Models   RMSE     Rank   Dstat (%)   Rank
TLAC     1.0691   1      79.02       1
AC       1.4948   3      69.87       2
ARIMA    2.7725   5      51.97       5
ANN      2.0475   4      57.94       4
GPMGA    1.3642   2      66.49       3

TABLE II. THE FORECASTING PERFORMANCE COMPARISONS OF DIFFERENT MODELS FOR THE BRENT CRUDE OIL SPOT PRICE

Models   RMSE     Rank   Dstat (%)   Rank
TLAC     1.1506   1      77.29       1
AC       1.4740   3      72.39       2
ARIMA    2.5860   5      55.56       5
ANN      2.1423   4      61.11       4
GPMGA    1.3482   2      68.73       3

V. CONCLUSIONS

This study combines the transfer learning technique with a pattern matching forecasting method, analog complexing (AC), and a genetic algorithm, and constructs a transfer learning based analog complexing model (TLAC) for crude oil price forecasting. The empirical results on two main crude oil price series show that the proposed TLAC model outperforms the other four models.

ACKNOWLEDGMENT

This study is supported by the Natural Science Foundation of China under Grant Nos. 71101100, 70731160635 and 71071101; the New Teachers' Fund for Doctor Stations, Ministry of Education, under Grant No. 20110181120047; the China Postdoctoral Science Foundation under Grant No. 2011M500418; and the Research Start-up Project of Sichuan University under Grant No. 2010SCU11012.

REFERENCES

[1] G. Maggio and G. Cacciola, "A variant of the Hubbert curve for world oil production forecasts," Energy Policy, vol. 37, pp. 4761-4770, 2009.
[2] H. Mohammadi and L. Su, "International evidence on crude oil price dynamics: Applications of ARIMA-GARCH models," Energy Economics, vol. 32, pp. 1001-1008, 2010.
[3] S. Mirmirani and H. C. Li, "A comparison of VAR and neural networks with genetic algorithm in forecasting price of oil," Applications of Artificial Intelligence in Finance and Economics: Advances in Econometrics, vol. 19, pp. 203-223, 2004.
[4] L. Yu, S. Y. Wang, and K. K. Lai, "Forecasting crude oil price with an EMD-based neural network ensemble learning paradigm," Energy Economics, vol. 30, pp. 2623-2635, 2008.
[5] E. Peters, Fractal Market Hypothesis: Applying Chaos Theory to Investment and Economics. New York: Wiley, 1994.
[6] Y. Fan, Q. Liang, and Y. M. Wei, "A generalized pattern matching approach for multi-step prediction of crude oil price," Energy Economics, vol. 30, pp. 889-904, 2008.
[7] S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE Transactions on Knowledge and Data Engineering, vol. 22, pp. 1345-1359, 2010.
[8] E. N. Lorenz, "Atmospheric predictability as revealed by naturally occurring analogues," Journal of the Atmospheric Sciences, vol. 26, pp. 636-646, 1969.
[9] J. A. Müller and F. Lemke, Self-Organising Data Mining: An Intelligent Approach to Extract Knowledge from Data. Hamburg: Libri, 2000.
[10] J. H. Holland, Adaptation in Natural and Artificial Systems. Ann Arbor: University of Michigan Press, 1975.
[11] O. E. Canyurt and H. K. Ozturk, "Application of genetic algorithm (GA) technique on demand estimation of fossil fuels in Turkey," Energy Policy, vol. 36, pp. 2562-2569, 2008.
[12] A. Azadeh and S. Tarverdian, "Integration of genetic algorithm, computer simulation and design of experiments for forecasting electrical energy consumption," Energy Policy, vol. 35, pp. 5229-5241, 2007.