Using Recurrent Neural Networks for Time Series Forecasting

Sandy D. Balkin
Pennsylvania State University
Department of Management Science and Information Systems
University Park, Pennsylvania 16802
e-mail: [email protected]
Abstract. In the past few years, artificial neural networks (ANNs) have been investigated as a tool for time series analysis and forecasting. The most popular architecture is the multilayer perceptron, a feedforward network often trained by backpropagation. The forecasting performance of ANNs relative to traditional methods is still open to question, although many experimenters seem optimistic. One problem with the multilayer perceptron is that, in its simplest form, it is similar to a pure autoregressive model, so it lacks the ability to account for any moving average structure that may exist in a series. By making the network recurrent, it is possible to include such structure. We present several examples showing how an ANN can be used to represent an ARMA scheme and compare the forecasting abilities of feedforward and recurrent neural networks with those of traditional methods.
Working Paper Series number 97-11. This paper was presented at the 1997 International Symposium on Forecasting, Barbados.
1. Introduction
An Artificial Neural Network (ANN) is an information processing paradigm inspired by the way the brain processes information. The key element of this paradigm is the novel structure of the information processing system: a large number of highly interconnected processing elements (neurons) working in unison to solve specific problems. ANNs, like people, learn by example. An ANN is configured for a specific application, such as pattern recognition or data classification, through a learning process. Learning in biological systems involves adjustments to the synaptic connections that exist between the neurons, and this is true of ANNs as well. For an introduction to ANNs see, for example, Hinton (1992), Lippmann (1987), or Cheng and Titterington (1994).

Recently, ANNs have been investigated as a tool for time series forecasting. Examples include Tang et al. (1991), Hill et al. (1996), and Faraway and Chatfield (1995). The most popular class is the multilayer perceptron, a feedforward network trained by backpropagation. This class of network consists of an input layer, a number of hidden layers and an output layer. Figure 1 is an example of an ANN of this type with 3 neurons in the input layer, a single hidden layer with 2 neurons, and 1 neuron in the output layer.

The first step in using an ANN concerns the choice of structure, known as the architecture of the network. Neural networks are known to be universal function approximators (Hornik, Stinchcombe, & White, 1989); however, for an ANN to perform well, the inputs and the number of nodes in the hidden layer must be carefully chosen. Since they learn by example, it is vital that the inputs characterize the important relationships and correlations in the time series being modeled. For a feedforward neural network with one hidden layer, the prediction equation given by Faraway and Chatfield (1995) for computing forecasts $\hat{x}_t$ from selected past observations $x_{t-j_1}, \ldots, x_{t-j_k}$ at lags $(j_1, \ldots, j_k)$, with $h$ nodes in the hidden layer, will be referred to as $NN[j_1, \ldots, j_k; h]$. Thus, the network in Figure 1 is $NN[1, 12, 13; 2]$.
Figure 1: A feedforward neural network with an input layer (a constant and the lagged values $x_{t-1}$, $x_{t-12}$, $x_{t-13}$), one hidden layer and an output layer.
The functional form may be written as
$$\hat{x}_t = \phi_o\Bigl(w_{co} + \sum_h w_{ho}\,\phi_h\Bigl(w_{ch} + \sum_i w_{ih}\,x_{t-j_i}\Bigr)\Bigr)$$
where $\{w_{ch}\}$ denote the weights for the connections between the constant input and the hidden neurons and $w_{co}$ denotes the weight of the direct connection between the constant input and the output. The weights $\{w_{ih}\}$ and $\{w_{ho}\}$ denote the weights for the other connections, between the inputs and the hidden neurons and between the hidden neurons and the output respectively. The two functions $\phi_h$ and $\phi_o$ denote the activation functions that define the mappings from inputs to hidden nodes and from hidden nodes to output(s) respectively. The logistic function, $y = \phi(x) = \frac{1}{1+e^{-x}}$, is widely used in the ANN literature. Faraway and Chatfield (1995) recommend, for forecasting applications, that $\phi_h$ be logistic and $\phi_o$ linear. Forecasts are generated iteratively by performing successive one-step-ahead forecasts using previous forecasts as estimates of the observables. This approach represents a neural network implementation of a non-linear autoregressive (NLAR) model of order $k$,
$$x_t = f(x_{t-j_1}, x_{t-j_2}, \ldots, x_{t-j_k}; \varepsilon_t),$$
where $\{\varepsilon_t\}$ is a sequence of IID random variables.
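To make the prediction equation concrete, here is a minimal sketch (my own illustration, not code from the paper, and all names are hypothetical) of how a fitted $NN[j_1, \ldots, j_k; h]$ network with logistic hidden units and a linear output generates iterated forecasts, feeding each one-step-ahead forecast back in as an estimate of the observable:

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def nn_forecast(x, lags, W_ih, w_ch, w_ho, w_co, steps):
    """Iterated forecasts from a fitted NN[j1,...,jk; h] network.

    x    : 1-D array of observations x_1, ..., x_n
    lags : the lags (j1, ..., jk)
    W_ih : (h, k) input-to-hidden weights
    w_ch : (h,) constant-to-hidden weights (hidden biases)
    w_ho : (h,) hidden-to-output weights
    w_co : scalar constant-to-output weight (output bias)
    """
    history = list(x)
    forecasts = []
    for _ in range(steps):
        # Gather the lagged inputs x_{t-j_1}, ..., x_{t-j_k}.
        inputs = np.array([history[-j] for j in lags])
        hidden = logistic(w_ch + W_ih @ inputs)  # phi_h: logistic
        xhat = w_co + w_ho @ hidden              # phi_o: linear
        forecasts.append(xhat)
        history.append(xhat)  # previous forecast stands in for the observable
    return np.array(forecasts)

# Shape check for the NN[1, 12, 13; 2] network of Figure 1 (random weights):
rng = np.random.default_rng(0)
x = rng.normal(size=132)
print(nn_forecast(x, [1, 12, 13], W_ih=rng.normal(size=(2, 3)),
                  w_ch=rng.normal(size=2), w_ho=rng.normal(size=2),
                  w_co=0.0, steps=12))
```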
Here $k$ is the number of time samples to remember. Once again, consider the simplest neural network of this type, a Jordan-style recurrent network. This network, shown in Figure 4, has 1 input neuron, 0 neurons in the hidden layer, 1 neuron in the output layer and 1 context unit. In Bulsari and Saxen (1993), it is shown that its forecasting function is
Figure 4: A simple recurrent neural network (node 0: constant input; node 1: input $x_t$; node 2: context unit $h_t$; node 3: output; with weights $w_{03}$, $w_{13}$ and $w_{23}$).
given by
$$\hat{x}_t(1) = w_{03} + w_{13}\,x_t + w_{23}\,h_t,$$
where the context unit holds the previous network output, $h_t = \hat{x}_{t-1}(1)$, so that $x_t - h_t$ is the one-step forecast error $\varepsilon_t$. The forecasting function can therefore be transformed as:
$$\begin{aligned}
\hat{x}_t(1) &= w_{03} + (w_{13} + w_{23})\,x_t - w_{23}\,(x_t - h_t)\\
&= w_{03} + (w_{13} + w_{23})\,x_t - w_{23}\,(x_t - \hat{x}_{t-1}(1))\\
&= w_{03} + (w_{13} + w_{23})\,x_t - w_{23}\,\varepsilon_t.
\end{aligned}$$
If we let $c = w_{03}$, $\phi = w_{13} + w_{23}$ and $\theta = w_{23}$, then
$$\hat{x}_t(1) = c + \phi\,x_t - \theta\,\varepsilon_t,$$
which is the forecasting equation for an ARMA(1,1) scheme.
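As a numerical check on this algebra, the sketch below (my own illustration; the weights are arbitrary and the variable names hypothetical) runs the recurrent forecasting function side by side with the ARMA(1,1) forecasting equation under the substitution $c = w_{03}$, $\phi = w_{13} + w_{23}$, $\theta = w_{23}$, with the context unit fed by the previous output; the two agree at every step. Additional AR terms, as in the ARMA(2,1) network of Figure 5 below, would simply enter as extra lagged inputs with their own weights.

```python
import numpy as np

# Arbitrary illustrative weights for the network of Figure 4.
w03, w13, w23 = 0.5, 0.6, -0.3
c, phi, theta = w03, w13 + w23, w23    # the substitution in the text

rng = np.random.default_rng(1)
x = rng.normal(size=50)

h = 0.0  # context unit: the previous network output, xhat_{t-1}(1)
for t in range(len(x)):
    eps = x[t] - h                        # error of the forecast of x_t made at t-1
    net = w03 + w13 * x[t] + w23 * h      # recurrent network forecast
    arma = c + phi * x[t] - theta * eps   # ARMA(1,1) forecasting equation
    assert np.isclose(net, arma)
    h = net  # feed the output back into the context unit
print("recurrent network and ARMA(1,1) forecasts agree")
```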
Figure 5: A neural network representation of an ARMA(2,1) scheme
Further AR terms can be added by including the desired lags as inputs; for example, Figure 5 is a neural network representation of an ARMA(2,1) scheme. It is important to mention that with this type of architecture, namely one direct connection from the output neuron to the context unit, we are restricted to a first-order moving average structure. It is also important to note that although what has been presented are neural network representations of ARMA-type schemes, the neural network coefficients are not constrained by stationarity conditions as they are for the conventional linear model.

4. Examples
For the first example, consider the Box-Jenkins airline series. Figure 6 is a plot of the series. The Box-Jenkins analysis of this monthly series entails taking one seasonal and one non-seasonal difference and moving average of the logarithm of the original series. This model is known as the airline model and is written $(0,1,1)\times(0,1,1)_{12}$. The following is a summary table of the forecasts generated by the various methods being compared. The model is fit on the first 132 data points and used to forecast the remaining 12. The neural network methods were performed on the original series and incorporated the same lags (1, 12 and 13) as the airline model, using a Jordan neural network with decay parameters set to 0.6 and 0.4 for the RNN. The table summarizes the accuracy of the forecasts shown in Figure 7.

Figure 6: Box-Jenkins Series G, International Airline Passengers: monthly totals (in thousands of passengers), January 1949 to December 1960.
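To reproduce this setup, the sketch below (my own code; the network fitting itself is omitted, and the placeholder data must be replaced with the real Series G values) builds the lag-1, lag-12 and lag-13 input matrix and splits the series into the 132 fitting observations and the 12 holdout values:

```python
import numpy as np

def make_lagged(series, lags):
    """Design matrix of lagged inputs and the aligned targets."""
    max_lag = max(lags)
    X = np.column_stack([series[max_lag - j : len(series) - j] for j in lags])
    y = series[max_lag:]
    return X, y

# Airline data: the 144 monthly totals of Series G (placeholder values here).
airline = np.arange(144, dtype=float)       # substitute the real data
train, test = airline[:132], airline[132:]  # fit on 132, forecast 12
X_train, y_train = make_lagged(train, lags=[1, 12, 13])
print(X_train.shape, y_train.shape)         # (119, 3) (119,)
```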
Modeling Type      Box-Jenkins            Feedforward ANN    Recurrent ANN
Model              (0,1,1)x(0,1,1)_12     lags = 1, 12, 13   lags = 1, 12, 13
# Hidden Neurons   NA                     2                  2
# Parameters       2                      11                 13
SS_RES             1.079                  0.9994             1.1751
σ̂                  0.095                  0.0962             0.1053
AIC                -555.7                 -546.786           -523.5157
BIC                -546.1                 -505.216           -487.28
FMSE               315.3                  310.5              306.3
FMAD               12.5                   14.1               12.9
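The last two rows of the table, FMSE and FMAD, are the out-of-sample criteria; I take them to be the mean squared error and the mean absolute deviation of the holdout forecasts (the paper does not define them explicitly here), computable as in this minimal sketch:

```python
import numpy as np

def fmse(actual, forecast):
    """Forecast mean squared error over the holdout period."""
    actual, forecast = np.asarray(actual), np.asarray(forecast)
    return np.mean((actual - forecast) ** 2)

def fmad(actual, forecast):
    """Forecast mean absolute deviation over the holdout period."""
    actual, forecast = np.asarray(actual), np.asarray(forecast)
    return np.mean(np.abs(actual - forecast))

# e.g. comparing the 12 holdout values against a method's forecasts:
# print(fmse(test, forecasts), fmad(test, forecasts))
```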
Although the within-sample fit is not as good as that of the other methods, the recurrent network achieves the best mean out-of-sample forecasts. From the plot, it is interesting to note that each method does better than the rest for some forecasted values, but based on the FMSE, the recurrent network provides marginally better results. The FMAD tells us that the Box-Jenkins and recurrent network methods are comparable, while the feedforward network performed significantly worse.

For another example, consider the Box-Jenkins Series A, chemical process concentration readings at 2-hour intervals, shown in Figure 8. Box et al. (1994) analyze this series and choose an ARMA(1,1) as a parsimonious model. For comparison, a feedforward neural network with one input at lag 1, two nodes in the hidden layer and one output node is created. For the recurrent network, a connection is added from the output node to a context unit, with decay parameters set to 0.9 and 0.1. The following table is a summary of the forecasts shown in Figure 9. The networks are trained on the first 191 observations and all three methods are used to forecast the final 6 values.
Figure 7: Actual and forecasted values of the last 12 observations from the Airline series (four panels: actual values, and actual values with Box-Jenkins, FFNN and JRNN forecasts; passengers in thousands).
Figure 8: Box-Jenkins Series A, Chemical Process Concentration Readings taken every 2 hours (concentration vs. reading number).
Modeling Type      Box-Jenkins    Feedforward ANN    Recurrent ANN
Model              (1,0,1)        lags = 1           lags = 1
# Hidden Neurons   NA             2                  2
# Parameters       2              7                  9
SS_RES             19.188         21.56              19.208
σ̂                  0.314          0.337749           0.320494
AIC                -451.47        -418.626           -437.266
BIC                -442.914       -395.341           -407.443
FMSE               0.157935       0.264902           0.1576944
FMAD               0.3256333      0.463793           0.3288155
Once again, the Box-Jenkins and recurrent neural network methods perform comparably while the feedforward network performs significantly worse. Moreover, perhaps the key result in both examples is that the recurrent ANN is able to match the performance of the ARIMA model, whereas the feedforward ANN does not.
Figure 9: Actual and forecasted values of the last 6 observations from the Chemical series (four panels: actual values, and actual values with Box-Jenkins, FFNN and JRNN forecasts; concentration).
5. Conclusions and Future Work
This paper shows how some ARMA models can be implemented with neural networks. Most of the research regarding neural networks for time series forecasting has been done with feedforward networks, which are analogous to a non-linear autoregressive model. By adding recurrence, it is possible to account for moving average structure inherent in a time series. Neural networks may perform marginally better, based on the FMSE, provided the moving average-like structure can be incorporated into the ANN via recurrent networks or some other scheme. Such improvements are possible since ANNs can include non-linear relationships. However, this slight improvement does come at a cost: the recurrent neural network takes considerably longer to train than a feedforward network, let alone the time needed to estimate the coefficients of a linear model.

The literature on linear model selection is vast compared to that on choosing neural network architectures. As with conventional models, neural networks must be constructed parsimoniously to avoid adding unnecessary noise to the system. Though much work has been done on architecture selection, it remains essentially a trial-and-error process. By using the techniques for linear model selection, it should be possible to construct a neural network that performs at least as well as the corresponding linear model. This architecture will not necessarily be the best, but it provides a good lower bound for further exploration.

Future research areas in this field include implementing neural network architectures that account for higher-order moving average structure and using statistical methods for detecting linear and non-linear relationships between variables to aid in architecture construction.
References
Bigus, J. P. (1996). Data Mining with Neural Networks. McGraw-Hill.

Box, G., Jenkins, G., & Reinsel, G. (1994). Time Series Analysis: Forecasting and Control (3rd edition). Prentice Hall, Englewood Cliffs, New Jersey.

Bulsari, A. B., & Saxen, H. (1993). A Recurrent Neural Network for Time-series Modelling. In Artificial Neural Networks and Genetic Algorithms: Proceedings of the International Conference in Innsbruck, Austria.

Cheng, B., & Titterington, D. M. (1994). Neural Networks: A Review from a Statistical Perspective (with discussion). Statistical Science, 9(1), 2-54.

Connor, J. T., Martin, R. D., & Atlas, L. E. (1994). Recurrent Neural Networks and Robust Time Series Prediction. IEEE Transactions on Neural Networks, 5(2), 240-254.

Elman, J. L. (1990). Finding Structure in Time. Cognitive Science, 14, 179-211.

Faraway, J., & Chatfield, C. (1995). Time Series Forecasting with Neural Networks: A Case Study. Research Report 95-06, Statistics Group, University of Bath.

Freeman, J. A. (1994). Simulating Neural Networks with Mathematica. Addison-Wesley.

Haykin, S. (1994). Neural Networks: A Comprehensive Foundation. Prentice Hall.

Hill, T., O'Connor, M., & Remus, W. (1996). Neural Network Models for Time Series Forecasts. Management Science, 42(7), 1082-1092.

Hinton, G. E. (1992). How Neural Networks Learn from Experience. Scientific American.

Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer Feedforward Networks are Universal Approximators. Neural Networks, 2, 359-366.
Jordan, M. I. (1986). Serial Order: A Parallel Distributed Processing Approach. Technical Report 86-104, Institute of Cognitive Science.

Lippmann, R. P. (1987). An Introduction to Computing with Neural Nets. IEEE ASSP Magazine, 4, 4-22.

McDonnell, J. R., & Waagen, D. (1994). Evolving Recurrent Perceptrons for Time-Series Modeling. IEEE Transactions on Neural Networks, 5(1).

Parzen, E. (1983). ARARMA Models for Time Series Analysis and Forecasting. Journal of Forecasting, 1, 67-82.

Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning Representations by Back-propagating Errors. Nature, 323, 533-536.

Sarle, W. S. (1994). Neural Networks and Statistical Models. In Proceedings of the Nineteenth Annual SAS Users Group International Conference.

Tang, Z., de Almeida, C., & Fishwick, P. A. (1991). Time Series Forecasting Using Neural Networks vs. Box-Jenkins Methodology. Simulation, 57(5), 303-310.

Tong, H. (1990). Non-linear Time Series. Oxford University Press.