2015 International Conference on Computer Application Technologies
Time Series Prediction using DBN and ARIMA

Takaomi HIRATA, Takashi KUREMOTO, Masanao OBAYASHI, Shingo MABU
Graduate School of Science and Engineering, Yamaguchi University, Ube, Japan
{v003we, wu, m.obayas, mabu}@yamaguchi-u.ac.jp

Kunikazu KOBAYASHI
School of Information Science and Technology, Aichi Prefectural University, Nagakute, Japan
[email protected]
Abstract— The analysis and prediction of time series data are important to the study of nonlinear phenomena. Studies of time series prediction have a long history: linear models such as the autoregressive integrated moving average (ARIMA) model and nonlinear models such as the multi-layer perceptron (MLP) are well known. As a state-of-the-art method, a deep belief net (DBN) using multiple restricted Boltzmann machines (RBMs) was proposed recently. In this study, we propose a novel prediction method which combines not only a kind of DBN with RBM and MLP but also ARIMA. Prediction experiments were performed on real time series data and chaotic time series, and the results show the effectiveness of the proposed method.
Keywords—time series forecasting; artificial neural network; deep belief net; ARIMA

I. INTRODUCTION

A time series is a string of data observed from the temporal change of a certain phenomenon. Time series data exhibit characteristics and peculiarities influenced by both linear and nonlinear factors. For example, foreign currency exchange rates change dynamically according to seasonal variation and moment-to-moment social demand. Chaotic time series, such as Lorenz chaos, the Henon map, the number of sunspots, and electricity consumption, show more complexity: different initial values lead to different trajectories, and long-term values are unpredictable. The ARIMA model [1], established in the 1970s, is a well-known linear prediction model applied in many fields such as economics, finance, and industry. Since 1986, artificial neural networks (ANNs) have attracted attention for time series forecasting because of their high ability of nonlinear approximation. Recently, deep learning has shown its potential not only in pattern recognition and dimensionality reduction [2], but also in time series prediction [3] [4].

In this paper, we propose a novel hybrid prediction method which uses a kind of DBN with RBM and MLP to predict the residual of the prediction given by ARIMA, or uses ARIMA to predict the residual of the prediction given by the DBN. The principle is founded on the view that the ARIMA model is suitable for predicting the linear character of the data, while the nonlinear character may remain in the prediction errors [5].

Time series prediction experiments using USD/JPY exchange rate data [6], CATS competition time series data [7], Lorenz chaos, and the Henon map were performed, and the superiority of the proposed method was confirmed by comparing the prediction precision.

II. METHOD

A. Autoregressive Integrated Moving Average (ARIMA) model

The ARIMA model is a time series analysis and prediction model proposed by Box & Jenkins [1]. An ARIMA(p, d, q) model is represented by the following finite difference equations:

$\phi(B)(1-B)^{d} y_t = \theta(B)\varepsilon_t$   (1)
$\phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p$   (2)
$\theta(B) = 1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q$   (3)

where $B$ is the lag operator ($B y_t = y_{t-1}$), $\phi_1, \ldots, \phi_p$ are the autoregressive parameters, $\theta_1, \ldots, \theta_q$ are the moving average parameters, and $\varepsilon_t$ is an error term that follows a normal distribution. For time series data $y_t$, the parameters ($p$, $d$, $q$, $\phi_i$, $\theta_j$) and the variance of the white noise are estimated by minimizing the error between the estimated value $\hat{y}_t$ and $y_t$ with an evaluation function such as the Akaike Information Criterion (AIC). In this study, these parameters are estimated by the auto.arima function provided in the forecast library of the statistical analysis software R.
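For readers who do not use R, the parameter search above can be mimicked in Python. The sketch below uses the pmdarima package's auto_arima as an analogue of R's forecast::auto.arima; it is only an illustration under that assumption, not the authors' original workflow, and the toy series is invented for the example.

```python
# Hypothetical analogue of the paper's R workflow (forecast::auto.arima),
# sketched with the pmdarima package; not the authors' original code.
import numpy as np
import pmdarima as pm

# toy series standing in for the training data y_1, ..., y_800
y_train = np.sin(0.1 * np.arange(800)) + 0.1 * np.random.randn(800)

# auto_arima searches (p, d, q) by minimizing an information criterion (AIC here),
# which mirrors the estimation step described above.
model = pm.auto_arima(y_train, seasonal=False, information_criterion="aic")
print(model.order)              # selected (p, d, q)

# one-ahead (short-term) forecast of the next value
y_hat = model.predict(n_periods=1)
print(y_hat)
```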
B. Artificial Neural Network (ANN)

ANNs are excellent function approximators which model the advanced information processing ability of the human brain mathematically, and they are applied in a wide range of fields such as pattern recognition, control engineering, and forecasting.

1) Multi-Layer Perceptron (MLP)
The MLP is a feed-forward neural network with multiple layers. Generally, an MLP has three layers: the input layer, the hidden layer, and the output layer (Fig. 1). Neurons (units) on the input layer receive the input signal from outside and propagate it to the neurons on the hidden (intermediate) layer. A neuron on the hidden layer fires when the elements of the input signal, weighted by the connection weights, accumulate and exceed its threshold (bias), and the same processing occurs between the hidden layer and the output layer. The output function of each unit on the hidden layer and the output layer is the logistic sigmoid function shown in equation (4):

$f(x) = \frac{1}{1 + e^{-x}}$   (4)

Figure 1 The structure of MLP

The error back-propagation (BP) method, a supervised learning algorithm, is well known for modifying the synaptic weights between neurons; its details are omitted here.
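As an illustration of the three-layer structure and the sigmoid output of equation (4), the following minimal NumPy sketch trains a small MLP with plain back-propagation; the layer sizes, learning rate, and toy task are arbitrary choices for the example, not values from the paper.

```python
import numpy as np

def sigmoid(x):
    # logistic sigmoid output function of equation (4)
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_in, n_hid, n_out, lr = 4, 8, 1, 0.1           # arbitrary example sizes
W1, b1 = rng.normal(0, 0.5, (n_in, n_hid)), np.zeros(n_hid)
W2, b2 = rng.normal(0, 0.5, (n_hid, n_out)), np.zeros(n_out)

# toy task: predict x_{t+1} from the previous 4 values of a sine wave
s = (np.sin(0.3 * np.arange(300)) + 1) / 2      # scaled to (0, 1) for sigmoid output
X = np.array([s[i:i + n_in] for i in range(len(s) - n_in)])
y = s[n_in:].reshape(-1, 1)

for epoch in range(2000):
    h = sigmoid(X @ W1 + b1)                    # input layer -> hidden layer
    o = sigmoid(h @ W2 + b2)                    # hidden layer -> output layer
    err = o - y                                 # output error
    # back-propagation: gradients through the sigmoid units
    delta_o = err * o * (1 - o)
    delta_h = (delta_o @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ delta_o / len(X);  b2 -= lr * delta_o.mean(0)
    W1 -= lr * X.T @ delta_h / len(X);  b1 -= lr * delta_h.mean(0)

print("training MSE:", float(np.mean(err ** 2)))
```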
2) Restricted Boltzmann Machine (RBM)
The RBM is a Boltzmann machine consisting of two layers, a visible layer and a hidden layer, with no connections between neurons on the same layer [2] (see Fig. 2). It can extract features of high-dimensional data by compressing them into low-dimensional data with an unsupervised learning algorithm. In Figure 2, each unit $i$ on the visible layer has a symmetric connection weight $w_{ij}$ to each unit $j$ on the hidden layer: during learning, signals flow in both directions between the two layers, and the connection weight used is the same in both directions. Unit $i$ of the visible layer has a bias $b_i$, and unit $j$ of the hidden layer has a bias $b_j$. All units stochastically output 1 or 0 according to probabilities given by sigmoid functions, as shown in equations (5) and (6):

$p(h_j = 1 \mid \boldsymbol{v}) = \frac{1}{1 + \exp(-b_j - \sum_i w_{ij} v_i)}$   (5)

$p(v_i = 1 \mid \boldsymbol{h}) = \frac{1}{1 + \exp(-b_i - \sum_j w_{ij} h_j)}$   (6)

Figure 2 The structure of RBM

Using a learning rule which modifies the connection weights, the RBM network reaches a convergent state, which can be observed through its energy function:

$E(\boldsymbol{v}, \boldsymbol{h}) = -\sum_i b_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i w_{ij} h_j$   (7)

Details of the learning algorithm of the RBM can be found in [2] [3] [4].
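Although the learning details are deferred to [2] [3] [4], a rough sketch of the standard contrastive divergence (CD-1) update, which is the usual way such an RBM is trained, is given below; the array sizes, learning rate, and binary toy data are arbitrary example values, not settings from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_vis, n_hid, lr = 6, 3, 0.05                      # arbitrary example sizes
W = rng.normal(0, 0.1, (n_vis, n_hid))             # symmetric weights w_ij
b_vis = np.zeros(n_vis)                            # visible biases b_i
b_hid = np.zeros(n_hid)                            # hidden biases b_j

V = (rng.random((200, n_vis)) > 0.5).astype(float) # toy binary training data

for epoch in range(100):
    # positive phase: equation (5), p(h = 1 | v)
    ph = sigmoid(V @ W + b_hid)
    h = (rng.random(ph.shape) < ph).astype(float)  # stochastic 0/1 output
    # negative phase: one reconstruction step, equation (6), p(v = 1 | h)
    pv = sigmoid(h @ W.T + b_vis)
    ph2 = sigmoid(pv @ W + b_hid)
    # CD-1 weight and bias updates
    W += lr * (V.T @ ph - pv.T @ ph2) / len(V)
    b_vis += lr * (V - pv).mean(axis=0)
    b_hid += lr * (ph - ph2).mean(axis=0)
```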
3) Deep Belief Net (DBN)
A DBN is a multi-layer neural network which usually stacks multiple RBMs [2] [3] [4]. A DBN can extract the features of high-dimensional data, so this approach is also called "deep learning", and DBNs have been applied to many fields such as dimensionality reduction, image compression, pattern recognition, and time series forecasting. A DBN prediction system composed of plural RBMs was proposed recently in [3], and another DBN prediction system using an RBM and an MLP was proposed in [4]. However, all of these DBNs are nonlinear predictors and may not deal well with linear characteristics of time series data such as seasonal change factors.

Figure 3 The structure of a DBN with RBM and MLP [4]
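To make the structure of Figure 3 concrete, the sketch below wires an RBM feature extractor into an MLP regressor using scikit-learn as a convenient stand-in; the class names, lag length, layer sizes, and training settings are assumptions for illustration, not the implementation used in [4].

```python
# Illustrative stand-in for the DBN of Fig. 3 (RBM features fed to an MLP),
# built from scikit-learn components rather than the original code of [4].
import numpy as np
from sklearn.neural_network import BernoulliRBM, MLPRegressor

series = (np.sin(0.1 * np.arange(1000)) + 1) / 2     # toy series scaled to [0, 1]

lag = 5                                              # input dimension (assumed)
X = np.array([series[i:i + lag] for i in range(len(series) - lag)])
y = series[lag:]

rbm = BernoulliRBM(n_components=8, learning_rate=0.05, n_iter=50, random_state=0)
H = rbm.fit_transform(X)                             # unsupervised feature extraction (RBM part)

mlp = MLPRegressor(hidden_layer_sizes=(8,), activation="logistic",
                   max_iter=2000, random_state=0)
mlp.fit(H, y)                                        # supervised regression (MLP part)

x_last = series[-lag:].reshape(1, -1)
print("one-ahead prediction:", mlp.predict(rbm.transform(x_last))[0])
```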
C. Proposed Method (ARIMA+DBN, DBN+ARIMA)
Some researchers have reported that it is effective to consider the linear component and the nonlinear component of a time series separately [5]. That is:

$y_t = L_t + N_t$   (8)

where $y_t$ is the real value of the data at time $t$, $L_t$ is its linear component, and $N_t$ is its nonlinear component.

In this paper, we propose a hybrid prediction system which first uses the conventional linear predictor ARIMA to predict the one-ahead data (i.e., short-term prediction), and then uses the DBN with RBM and MLP (Fig. 3) to predict the prediction error of ARIMA. Finally, the prediction is given by the sum of the value predicted by ARIMA and the error predicted by the DBN. We call this method "ARIMA+DBN". The method is expressed by the following equations:

$e_t = y_t - \hat{L}_t$   (9)

$\hat{y}_t = \hat{L}_t + \hat{N}_t$   (10)

where $\hat{L}_t$ is the prediction result of ARIMA and $\hat{N}_t$ is the prediction result of the DBN, which approximates $e_t$, the prediction error of ARIMA.

Additionally, we can also give a nonlinear prediction of the original time series using the DBN and forecast its residual using ARIMA. That is, $\hat{L}_t$ in equation (9) is given by the DBN and $\hat{N}_t$, which approximates $e_t$, is given by ARIMA. We call this case of our proposed method "DBN+ARIMA".
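The sketch below strings these steps together for the ARIMA+DBN direction of equations (9) and (10), again with pmdarima and a scikit-learn regressor as stand-ins for the DBN; the lag length and model settings are illustrative assumptions, and swapping the roles of the two predictors gives the DBN+ARIMA variant.

```python
# Hybrid ARIMA+DBN sketch (eqs. (9)-(10)): ARIMA predicts the series,
# a nonlinear model predicts the ARIMA residuals, and the two are summed.
# pmdarima and MLPRegressor are stand-ins, not the authors' implementation.
import numpy as np
import pmdarima as pm
from sklearn.neural_network import MLPRegressor

y = np.sin(0.1 * np.arange(800)) + 0.1 * np.random.default_rng(3).standard_normal(800)

# 1) linear part: fit ARIMA and take its in-sample one-step predictions L_hat
arima = pm.auto_arima(y, seasonal=False)
L_hat = arima.predict_in_sample()
e = y - L_hat                                   # eq. (9): residual series e_t

# 2) nonlinear part: learn e_t from its own lagged values (DBN stand-in)
lag = 5                                         # assumed input dimension
E = np.array([e[i:i + lag] for i in range(len(e) - lag)])
nonlinear = MLPRegressor(hidden_layer_sizes=(10,), activation="logistic",
                         max_iter=3000, random_state=0).fit(E, e[lag:])

# 3) eq. (10): final one-ahead forecast is the sum of both predictions
L_next = arima.predict(n_periods=1)[0]
N_next = nonlinear.predict(e[-lag:].reshape(1, -1))[0]
print("hybrid forecast:", L_next + N_next)
```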
Figure 4 Flowchart of the proposed method

The proposed methods are depicted in Figure 4, where the "First predictor" indicates ARIMA or DBN and the "Second predictor" is, correspondingly, DBN or ARIMA: the first predictor forecasts the original time series, the second predictor forecasts the prediction error of the first, and the "Final prediction result" is given by equation (10).

III. EXPERIMENTS AND RESULTS

Four time series data sets were used in the prediction experiments: A) Lorenz chaos; B) the Henon map; C) a foreign currency exchange rate (USD vs. JPY) [6]; and D) the CATS benchmark data used in the IJCNN'04 time series prediction competition [7].

The training error and prediction error, measured by the mean squared error (MSE), are used to compare the performance of the different methods:

$MSE = \frac{1}{n} \sum_{t=1}^{n} (y_t - \hat{y}_t)^2$   (11)

where $n$ is the number of data (training samples or long-term prediction data).
A. Lorenz chaos

Lorenz chaos is well known for its butterfly-shaped strange attractor, shown in Figure 5. It is given by a system of nonlinear equations in three variables:

$\frac{dx}{dt} = \sigma (y - x)$, $\frac{dy}{dt} = x(\rho - z) - y$, $\frac{dz}{dt} = xy - \beta z$   (12)

where $\sigma$, $\rho$, and $\beta$ are constant parameters.

Figure 5 Strange attractor of Lorenz chaos

The time series data of the x-axis of the Lorenz chaos were used in the experiment. The first 800 data were used as training samples, and the following 200 data were predicted by one-ahead (short-term) forecasting with the different predictors.
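As a reproducibility aid, the sketch below generates an x-axis Lorenz series by simple Euler integration of equation (12) and splits it 800/200 as described; the parameter values and step size are common textbook choices, assumed here because the paper's exact settings are not stated.

```python
import numpy as np

# Euler integration of the Lorenz system (eq. (12)); sigma, rho, beta and dt
# are common textbook values, assumed here for illustration only.
sigma, rho, beta, dt = 10.0, 28.0, 8.0 / 3.0, 0.01
x, y, z = 1.0, 1.0, 1.0
xs = []
for _ in range(1000):
    dx = sigma * (y - x)
    dy = x * (rho - z) - y
    dz = x * y - beta * z
    x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
    xs.append(x)

series = np.array(xs)
train, test = series[:800], series[800:]     # 800 training samples, 200 to predict
print(train.shape, test.shape)
```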
Figure 6 Prediction result of Lorenz chaos: (1) ARIMA+DBN; (2) DBN+ARIMA

Figure 6 shows the time series data of x in the Lorenz chaos and the data predicted by the different methods: 1) ARIMA [1]; 2) DBN [4]; 3) ARIMA+DBN (proposed in this paper); 4) DBN+ARIMA (proposed in this paper). The forecasting precision of the different methods for the Lorenz chaos is compared in Table 1. The prediction accuracy for the Lorenz chaos is improved greatly by using ARIMA, because this series may contain high linearity. The proposed method ARIMA+DBN showed its superiority over the conventional methods.

Table 1. MSE comparison of Lorenz chaos

Method | Learning MSE | Prediction MSE
ARIMA [1] | 1.347×10^-11 | 2.576×10^-10
DBN [4] | 4.997×10^-6 | 9.363×10^-6
ARIMA+DBN | 8.730×10^-12 | 4.492×10^-12
DBN+ARIMA | 6.837×10^-10 | 2.161×10^-9
B. Henon map

The Henon map is given by nonlinear functions of two variables:

$x_{t+1} = 1 - a x_t^2 + y_t$, $y_{t+1} = b x_t$   (13)

where $a$ and $b$ are constant parameters.

Figure 7 Strange attractor of Henon map

Figure 7 shows the attractor of the Henon map, and Figure 8 shows the time series data of x in the Henon map together with the different predictors' outputs. The number of training samples was also 800, and the following 200 data were predicted by one-ahead (short-term) forecasting with the different predictors, as in the case of the Lorenz chaos.
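A corresponding data-generation sketch for equation (13) is shown below; the parameter values a = 1.4 and b = 0.3 are the classic Henon settings, assumed here since the paper does not list its values.

```python
import numpy as np

# Iterate the Henon map (eq. (13)); a and b are the classic values,
# assumed for illustration because the paper's settings are not recoverable.
a, b = 1.4, 0.3
x, y = 0.1, 0.1
xs = []
for _ in range(1000):
    x, y = 1.0 - a * x * x + y, b * x
    xs.append(x)

series = np.array(xs)
train, test = series[:800], series[800:]     # same 800/200 split as the Lorenz experiment
```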
Figure 8 Prediction result of Henon map: (1) ARIMA+DBN; (2) DBN+ARIMA

The forecasting precision of the different methods for the Henon map is compared in Table 2. As the comparison shows, the proposed method DBN+ARIMA gives the best prediction performance for the Henon map.

Table 2. MSE comparison of Henon map

Method | Learning MSE | Prediction MSE
ARIMA [1] | 6.761×10^-2 | 4.794×10^-2
DBN [4] | 5.523×10^-5 | 9.092×10^-5
ARIMA+DBN | 1.811×10^-4 | 3.016×10^-4
DBN+ARIMA | 5.382×10^-5 | 8.915×10^-5

C. USD vs JPY exchange rate

As an economic time series, we make predictions for the USD vs. JPY exchange rate (how many JPY one USD equals), using time series data provided by the Mizuho Bank website [6]. Figure 9 shows the exchange rate time series data, and Figure 10 shows the outputs of the different predictors. In this experiment, 1900 data were used as training samples, and the following 1000 data were used as test samples for one-ahead (short-term) forecasting with the different predictors.

Figure 9 Exchange rate time series data
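All of the short-term experiments above use the same evaluation pattern: walk through the test portion and forecast one step ahead at a time. A generic sketch of that loop is given below, where `predict_one_ahead` is a hypothetical placeholder standing in for any of the four predictors.

```python
import numpy as np

def one_ahead_mse(series, n_train, predict_one_ahead):
    """Walk-forward evaluation: forecast each test point from the data before it.

    `predict_one_ahead(history)` is a placeholder for any predictor
    (ARIMA, DBN, ARIMA+DBN, DBN+ARIMA); it must return one value.
    """
    errors = []
    for t in range(n_train, len(series)):
        y_hat = predict_one_ahead(series[:t])
        errors.append((series[t] - y_hat) ** 2)
    return float(np.mean(errors))             # prediction MSE of eq. (11)

# toy usage with a naive "repeat the last value" predictor
demo = np.sin(0.05 * np.arange(2900))
print(one_ahead_mse(demo, n_train=1900, predict_one_ahead=lambda h: h[-1]))
```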
Figure 10 Prediction result of the exchange rate: (1) ARIMA+DBN; (2) DBN+ARIMA

The forecasting precision of the different methods for this financial time series is compared in Table 3. The conventional ARIMA marked the highest prediction precision, while the proposed ARIMA+DBN gave the highest learning performance. In fact, in this real-data experiment all predictors showed similar prediction precision, which suggests that a better predictor for financial data needs to be developed in the future.

Table 3. MSE comparison of exchange rate

Method | Learning MSE (×10^-1) | Prediction MSE (×10^-1)
ARIMA [1] | 5.375 | 3.133
DBN [4] | 4.237 | 3.645
ARIMA+DBN | 4.023 | 3.567
DBN+ARIMA | 4.203 | 3.631

D. CATS time series

The CATS time series is an artificial benchmark data set for the IJCNN'04 prediction competition [7]. This artificial time series consists of 5,000 data, among which 100 are missing. The missing values are divided into 5 blocks:
- elements 981 to 1,000
- elements 1,981 to 2,000
- elements 2,981 to 3,000
- elements 3,981 to 4,000
- elements 4,981 to 5,000

The mean squared error $E_1$ is computed over the 100 missing values using:

$E_1 = \frac{1}{100} \sum_{t \in M} (y_t - \hat{y}_t)^2$   (14)

where $M$ is the set of the 100 missing indices and $\hat{y}_t$ is a long-term prediction result.

Figure 11 shows the original CATS time series data, and Figure 12 shows the first 20 missing data of the CATS time series together with the predictors' outputs. The learning process used 980 data before predicting the 20 targets.
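A small sketch of the E1 computation of equation (14) over the five missing blocks is given below; `y_true` and `y_pred` are placeholder arrays indexed like the 5,000-point CATS series.

```python
import numpy as np

# Indices (0-based) of the 100 missing CATS values: five blocks of 20.
missing = np.concatenate([np.arange(start, start + 20)
                          for start in (980, 1980, 2980, 3980, 4980)])

def cats_e1(y_true, y_pred):
    """E1 of eq. (14): mean squared error over the 100 missing values."""
    diff = y_true[missing] - y_pred[missing]
    return float(np.mean(diff ** 2))

# toy usage with placeholder arrays standing in for the CATS series and a forecast
y_true = np.random.default_rng(4).standard_normal(5000)
y_pred = y_true + 0.1
print(cats_e1(y_true, y_pred))
```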
Figure 11 CATS Benchmark

Figure 12 Prediction result of the CATS data: (1) ARIMA+DBN; (2) DBN+ARIMA
The forecasting precision of the different methods for the CATS data is compared in Table 4, and Table 5 compares the proposed methods with the best and the worst results of IJCNN'04. According to E1, the proposed method DBN+ARIMA ranked first among all of the compared methods.

Table 4. MSE comparison of CATS data [7]

Method | Learning MSE (×10^1) | Prediction MSE, short term (×10^2) | Prediction MSE, long term (×10^3)
ARIMA [1] | 11.57 | 1.64 | 1.72
DBN [4] | 3.28 | 2.13 | 0.26
ARIMA+DBN | 2.65 | 2.73 | 2.27
DBN+ARIMA | 2.71 | 1.90 | 0.24

Table 5. Error (E1) comparison of CATS data [7]

Method | E1
DBN+ARIMA | 244
DBN [4] | 257
Kalman Smoother (the best of IJCNN'04) [7] | 408
A hierarchical Bayesian Learning Scheme for Autoregressive Neural Networks (the worst of IJCNN'04) [7] | 1247
ARIMA [1] | 1715
ARIMA+DBN | 2266

IV. CONCLUSION

A novel time series prediction method combining the ARIMA model and a DBN composed of an RBM and an MLP was proposed in this paper. The experimental results show the effectiveness of the proposed method in terms of prediction precision. The proposed method showed its superiority especially for chaotic time series and the CATS benchmark time series data.
ACKNOWLEDGEMENT

This work was supported by JSPS KAKENHI No. 26330254 and No. 25330287.
REFERENCES

[1] G. E. P. Box and D. A. Pierce, "Distribution of Residual Autocorrelations in Autoregressive-Integrated Moving Average Time Series Models", Journal of the American Statistical Association, Vol. 65, No. 332, Dec. 1970, pp. 1509-1526.
[2] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks", Science, Vol. 313, 2006, pp. 504-507.
[3] T. Kuremoto, S. Kimura, K. Kobayashi, and M. Obayashi, "Time series forecasting using a deep belief network with restricted Boltzmann machines", Neurocomputing, Vol. 137, No. 5, Aug. 2014, pp. 47-56.
[4] T. Kuremoto, T. Hirata, M. Obayashi, S. Mabu, and K. Kobayashi, "Forecast Chaotic Time Series Data by DBNs", Proceedings of the 7th International Congress on Image and Signal Processing (CISP 2014), Oct. 2014, pp. 1304-1309.
[5] G. P. Zhang, "Time series forecasting using a hybrid ARIMA and neural network model", Neurocomputing, Vol. 50, 2003, pp. 159-175.
[6] USD vs JPY currency exchange rate, http://www.mizuhobank.co.jp/rate/market/historical.html, accessed 2014/4/25.
[7] A. Lendasse, E. Oja, and O. Simula, "Time Series Prediction Competition: The CATS Benchmark", Proceedings of IJCNN'2004 - International Joint Conference on Neural Networks, 2004, pp. 1615-1620.