Using Recurrent Neural Networks for Time Series Forecasting

Sandy D. Balkin
Department of Management Science and Information Systems, Pennsylvania State University, University Park, Pennsylvania 16802
e-mail: [email protected]

Abstract. In the past few years, artificial neural networks (ANNs) have been investigated as a tool for time series analysis and forecasting. The most popular architecture is the multilayer perceptron, a feedforward network often trained by backpropagation. The forecasting performance of ANNs relative to traditional methods is still open to question, although many experimenters seem optimistic. One problem with the multilayer perceptron is that, in its simplest form, it is similar to a pure autoregressive model, so it lacks the ability to account for any moving average structure that may exist. By making a network recurrent, it is possible to include such structure. We present several examples showing how an ANN can be used to represent an ARMA scheme and compare the forecasting abilities of feedforward and recurrent neural networks with traditional methods.

Working Paper Series number 97-11. This paper was presented at the 1997 International Symposium on Forecasting, Barbados. 


1. Introduction

An Artificial Neural Network (ANN) is an information-processing paradigm inspired by the way the brain processes information. The key element of this paradigm is the novel structure of the information-processing system: a large number of highly interconnected processing elements (neurons) working in unison to solve specific problems. ANNs, like people, learn by example. An ANN is configured for a specific application, such as pattern recognition or data classification, through a learning process. Learning in biological systems involves adjustments to the synaptic connections that exist between the neurons; this is true of ANNs as well. For an introduction to ANNs see, for example, Hinton (1992), Lippmann (1987), or Cheng and Titterington (1994).

Recently, ANNs have been investigated as a tool for time series forecasting. Examples include Tang et al. (1991), Hill et al. (1996), and Faraway and Chatfield (1995). The most popular class is the multilayer perceptron, a feedforward network trained by backpropagation. This class of network consists of an input layer, a number of hidden layers, and an output layer. Figure 1 is an example of an ANN of this type with three neurons in the input layer, a single hidden layer with two neurons, and one neuron in the output layer.

The first step in using an ANN concerns the choice of structure, known as the architecture of the network. Neural networks are known to be universal function approximators (Hornik, Stinchcombe, & White, 1989); however, for an ANN to perform well, the inputs and the number of nodes in the hidden layer must be carefully chosen. Since ANNs learn by example, it is vital that the inputs characterize the important relationships and correlations in the time series being modeled. For a feedforward neural network with one hidden layer, the prediction equation given by Faraway and Chatfield (1995) computes forecasts $\hat{x}_t$ from selected past observations $x_{t-j_1}, \ldots, x_{t-j_k}$ at lags $(j_1, \ldots, j_k)$ with $h$ nodes in the hidden layer; such a network will be referred to as $NN[j_1, \ldots, j_k; h]$. Thus, the network in Figure 1 is $NN[1, 12, 13; 2]$.


Figure 1: A feedforward neural network (input layer: constant, $x_{t-1}$, $x_{t-12}$, $x_{t-13}$; one hidden layer; one output neuron)

The functional form may be written as
$$\hat{x}_t = \sigma_o\!\left( w_{co} + \sum_{h} w_{ho}\, \sigma_h\!\left( w_{ch} + \sum_{i} w_{ih}\, x_{t-j_i} \right) \right)$$
where $\{w_{ch}\}$ denote the weights for the connections between the constant input and the hidden neurons, and $w_{co}$ denotes the weight of the direct connection between the constant input and the output. The weights $\{w_{ih}\}$ and $\{w_{ho}\}$ denote the weights for the other connections, between the inputs and the hidden neurons and between the hidden neurons and the output, respectively. The two functions $\sigma_h$ and $\sigma_o$ denote the activation functions that define the mappings from inputs to hidden nodes and from hidden nodes to output(s), respectively. The logistic function, $y = \sigma(x) = 1/(1+e^{-x})$, is widely used in the ANN literature. Faraway and Chatfield (1995) recommend that, for forecasting applications, $\sigma_h$ be logistic and $\sigma_o$ linear. Forecasts are generated iteratively by performing successive one-step-ahead forecasts, using previous forecasts in place of unobserved values. This approach represents a neural network implementation of a non-linear autoregressive (NLAR) model of order $k$,
$$x_t = f(x_{t-j_1}, x_{t-j_2}, \ldots, x_{t-j_k}; \varepsilon_t)$$
where $\{\varepsilon_t\}$ is a sequence of IID random variables and $f$ is an unknown function approximated by the network.
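To make the notation concrete, the following is a minimal sketch, not code from the paper: it assumes a logistic $\sigma_h$ and a linear $\sigma_o$, as recommended above, and uses arbitrary placeholder weights (and helper names of my own) simply to show the shape of an $NN[1, 12, 13; 2]$ forecast and of the iterated multi-step scheme.

```python
# A minimal sketch of the NN[j1, ..., jk; h] prediction equation with logistic
# hidden units and a linear output, plus the iterative forecasting scheme that
# feeds one-step forecasts back in as inputs. Weights are illustrative only.
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def nn_one_step(history, lags, w_ch, w_ih, w_co, w_ho):
    """One-step forecast: w_co + sum_h w_ho * logistic(w_ch + sum_i w_ih * x_{t-j_i})."""
    x_lagged = np.array([history[-j] for j in lags])   # x_{t-j_1}, ..., x_{t-j_k}
    hidden = logistic(w_ch + w_ih @ x_lagged)          # one activation per hidden node
    return w_co + w_ho @ hidden                        # linear output layer

def nn_forecast(history, lags, weights, horizon):
    """Iterated multi-step forecasts: previous forecasts stand in for unobserved values."""
    w_ch, w_ih, w_co, w_ho = weights
    extended, forecasts = list(history), []
    for _ in range(horizon):
        x_hat = nn_one_step(extended, lags, w_ch, w_ih, w_co, w_ho)
        forecasts.append(x_hat)
        extended.append(x_hat)                         # feed the forecast back in
    return np.array(forecasts)

# Example: an NN[1, 12, 13; 2] network, as in Figure 1 (placeholder weights).
rng = np.random.default_rng(0)
lags = [1, 12, 13]
weights = (rng.normal(size=2),        # w_ch: constant -> hidden
           rng.normal(size=(2, 3)),   # w_ih: inputs   -> hidden
           0.1,                       # w_co: constant -> output
           rng.normal(size=2))        # w_ho: hidden   -> output
series = list(rng.normal(size=36))
print(nn_forecast(series, lags, weights, horizon=12))
```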


Here $k$ is the number of time samples to remember. Once again, consider the simplest neural network of this type. This network, shown in Figure 4, has one input neuron, no neurons in the hidden layer, one neuron in the output layer, and one context unit. Bulsari and Saxén (1993) show that its forecasting function is given by
$$\hat{x}_t(1) = w_{03} + w_{13}\,x_t + w_{23}\,h_t$$

Figure 4: A simple recurrent neural network (weights $w_{03}$, $w_{13}$, $w_{23}$)

which can be transformed as:
$$\begin{aligned}
\hat{x}_t(1) &= w_{03} + (w_{13} + w_{23})\,x_t - w_{23}\,(x_t - h_t) \\
&= w_{03} + (w_{13} + w_{23})\,x_t - w_{23}\,(x_t - \hat{x}_t) \\
&= w_{03} + (w_{13} + w_{23})\,x_t - w_{23}\,\varepsilon_t
\end{aligned}$$


If we let $c = w_{03}$, $\phi = w_{13} + w_{23}$, and $\theta = w_{23}$, then
$$\hat{x}_t(1) = c + \phi\,x_t - \theta\,\varepsilon_t,$$
which is the forecasting equation for an ARMA(1,1) scheme. Further AR terms can be added by including the desired lags as inputs; for example, Figure 5 is a neural network representation of an ARMA(2,1) scheme. It is important to mention that with this type of architecture, namely a single direct connection from the output neuron to the context unit, we are restricted to a first-order moving average structure. It is also important to note that, although what has been presented are neural network representations of ARMA-type schemes, the neural network coefficients are not constrained by stationarity conditions as they are for the conventional linear model.
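The algebra above can be checked numerically. The snippet below is an illustrative sketch rather than anything from the paper: the weights are arbitrary, and it assumes the context unit simply stores the previous one-step forecast, as in the derivation. It verifies that the recurrent network's forecasts coincide with the ARMA(1,1) forecasting equation under $c = w_{03}$, $\phi = w_{13} + w_{23}$, $\theta = w_{23}$.

```python
# Numerical check that the simple recurrent network's one-step forecasts equal
# the ARMA(1,1) forecasting equation, assuming the context unit h_t holds the
# previous forecast. Weights and data are illustrative placeholders.
import numpy as np

w03, w13, w23 = 0.5, 0.4, 0.3                     # illustrative network weights
c, phi, theta = w03, w13 + w23, w23               # implied ARMA(1,1) parameters

rng = np.random.default_rng(1)
x = rng.normal(size=50)                           # any series will do for the identity

h = 0.0                                           # context unit (previous forecast)
prev_forecast = 0.0                               # state of the ARMA recursion
for t in range(len(x)):
    rnn_forecast = w03 + w13 * x[t] + w23 * h     # network's one-step forecast
    eps = x[t] - prev_forecast                    # one-step forecast error
    arma_forecast = c + phi * x[t] - theta * eps  # ARMA(1,1) forecasting equation
    assert np.isclose(rnn_forecast, arma_forecast)
    h = rnn_forecast                              # context unit stores the forecast
    prev_forecast = arma_forecast
print("RNN and ARMA(1,1) one-step forecasts agree.")
```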

Figure 5: A neural network representation of an ARMA(2,1) scheme

4. Examples

For the first example, consider the Box-Jenkins airline series. Figure 6 is a plot of the series.

Figure 6: Box-Jenkins Series G: International Airline Passengers, monthly totals, January 1949 to December 1960 (vertical axis: Passengers in thousands; horizontal axis: Time in years)

The Box-Jenkins analysis of this monthly series entails taking one seasonal and one non-seasonal difference of the logarithm of the original series and fitting seasonal and non-seasonal moving-average terms. This model is known as the airline model and is written $(0,1,1)\times(0,1,1)_{12}$. The following is a summary table of the forecasts generated by the various methods being compared. Each model is fit on the first 132 data points and used to forecast the remaining 12. The neural network methods were applied to the original series and incorporated the same lags (1, 12, and 13) as the airline model, using a Jordan neural network with decay parameters set to 0.6 and 0.4 for the RNN. The table summarizes the accuracy of the forecasts shown in Figure 7.
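For readers who want to reproduce the Box-Jenkins side of this comparison with modern tools, the following hedged sketch (not the software used in the paper) fits the airline model to the logged series with statsmodels' SARIMAX; the `passengers` array is a simulated stand-in for the actual data.

```python
# A sketch of the airline-model fit described above: (0,1,1)x(0,1,1)_12 on the
# logged series, trained on the first 132 points, forecasting the last 12.
# The series below is simulated and stands in for the real passenger counts.
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(2)
t = np.arange(144)
passengers = 120 + 2.0 * t + 30 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 5, 144)

train = np.log(passengers[:132])                   # model is fit to the logarithms
airline = SARIMAX(train, order=(0, 1, 1), seasonal_order=(0, 1, 1, 12))
fit = airline.fit(disp=False)
forecasts = np.exp(fit.forecast(steps=12))         # back-transform to passenger counts
print(forecasts)
```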


| Modeling Type | Box-Jenkins | Feedforward ANN | Recurrent ANN |
| --- | --- | --- | --- |
| Model | $(0,1,1)\times(0,1,1)_{12}$ | lags = 1, 12, 13 | lags = 1, 12, 13 |
| # Hidden Neurons | NA | 2 | 2 |
| # Parameters | 2 | 11 | 13 |
| $SS_{RES}$ | 1.079 | 0.9994 | 1.1751 |
| $\hat{\sigma}$ | 0.095 | 0.0962 | 0.1053 |
| AIC | -555.7 | -546.786 | -523.5157 |
| BIC | -546.1 | -505.216 | -487.28 |
| FMSE | 315.3 | 310.5 | 306.3 |
| FMAD | 12.5 | 14.1 | 12.9 |

Although its within-sample fit is not as good as that of the other methods, the recurrent network achieves the best out-of-sample forecasts in mean squared terms. From the plot it is interesting to note that each method does better than the rest for some forecasted values, but based on the FMSE the recurrent network provides marginally better results. The FMAD tells us that the Box-Jenkins and recurrent network methods are comparable, while the feedforward network performed significantly worse.

For another example, consider Box-Jenkins Series A, chemical process concentration readings at 2-hour intervals, shown in Figure 8. Box et al. (1994) analyze this series and choose an ARMA(1,1) as a parsimonious model. For comparison, a feedforward neural network with one input at lag 1, two nodes in the hidden layer, and one output node is created. For the recurrent network, a connection is added from the output node to a context unit with decay parameters 0.9 and 0.1. The following table is a summary of the forecasts shown in Figure 9. The networks are trained on the first 191 observations and all three methods are used to forecast the final 6 values.
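The two out-of-sample accuracy measures reported in the tables can be computed as below. This small sketch assumes that FMSE denotes the mean squared error of the hold-out forecasts and FMAD the corresponding mean absolute deviation, which is how they are used in the discussion; the example numbers are made up.

```python
# Forecast accuracy measures over the hold-out period, under the assumed
# definitions FMSE = mean squared error, FMAD = mean absolute deviation.
import numpy as np

def fmse(actual, forecast):
    """Forecast mean squared error over the hold-out period."""
    actual, forecast = np.asarray(actual), np.asarray(forecast)
    return np.mean((actual - forecast) ** 2)

def fmad(actual, forecast):
    """Forecast mean absolute deviation over the hold-out period."""
    actual, forecast = np.asarray(actual), np.asarray(forecast)
    return np.mean(np.abs(actual - forecast))

# Illustrative numbers only (not values from the paper).
actual = [500, 510, 495, 520, 540, 530]
forecast = [505, 500, 498, 515, 548, 525]
print(fmse(actual, forecast), fmad(actual, forecast))
```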

Figure 7: Actual and forecasted values of the last 12 observations from the Airline series (panels: Actual Values; Actual and Box-Jenkins; Actual and FFNN; Actual and JRNN; vertical axes: Passengers in thousands)


Figure 8: Box-Jenkins Series A: Chemical process concentration readings, every 2 hours (vertical axis: Concentration; horizontal axis: Reading)

| Modeling Type | Box-Jenkins | Feedforward ANN | Recurrent ANN |
| --- | --- | --- | --- |
| Model | $(1,0,1)$ | lags = 1 | lags = 1 |
| # Hidden Neurons | NA | 2 | 2 |
| # Parameters | 2 | 7 | 9 |
| $SS_{RES}$ | 19.188 | 21.56 | 19.208 |
| $\hat{\sigma}$ | 0.314 | 0.337749 | 0.320494 |
| AIC | -451.47 | -418.626 | -437.266 |
| BIC | -442.914 | -395.341 | -407.443 |
| FMSE | 0.157935 | 0.264902 | 0.1576944 |
| FMAD | 0.3256333 | 0.463793 | 0.3288155 |

Once again, the Box-Jenkins and recurrent neural network methods perform comparably, while the feedforward network performs significantly worse. Perhaps the key result in both examples is that the recurrent ANN is able to match the performance of the ARIMA model, whereas the feedforward ANN does not.

Figure 9: Actual and forecasted values of the last 6 observations from the Chemical series (panels: Actual Values; Actual and Box-Jenkins; Actual and FFNN; Actual and JRNN; vertical axes: Concentration)


5. Conclusions and Future Work

This paper shows how some ARMA models can be implemented with neural networks. Most of the research on neural networks for time series forecasting has been done with feedforward networks, which are analogous to a non-linear autoregressive model. By adding recurrence, it is possible to account for moving average structure inherent in a time series. Neural networks may perform marginally better, based on the FMSE, provided the moving average-like structure can be incorporated into the ANN, via recurrent networks or some other scheme. Such improvements are possible because ANNs can capture non-linear relationships. However, this slight improvement comes at a cost: the recurrent neural network takes considerably longer to train than a feedforward network, and far longer than the time needed to estimate the linear model coefficients.

The literature on linear model selection is vast compared to that on choosing a neural network architecture. As with conventional models, neural networks must be constructed parsimoniously to avoid adding unnecessary noise to the system. Though much work has been done on architecture selection, it remains essentially a trial-and-error process. By using the techniques of linear model selection, it should be possible to construct a neural network that performs at least as well as the corresponding linear model. This architecture will not necessarily be the best, but it provides a good lower bound for further exploration.

Future research areas in this field include implementing neural network architectures that account for higher order moving average structure and using statistical methods for detecting linear and non-linear relationships between variables to aid in architecture construction.
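As an illustration of the model-selection idea, the sketch below ranks hypothetical candidate architectures by information criteria computed from their residual sum of squares and parameter counts. The AIC/BIC forms are the usual Gaussian-likelihood versions and, like the candidate labels and SS values, are assumptions for illustration rather than the paper's exact calculations.

```python
# Hedged sketch: ranking candidate network architectures by AIC/BIC computed
# from residual sum of squares and parameter count. All numbers are placeholders.
import numpy as np

def aic(ss_res, n_obs, n_params):
    return n_obs * np.log(ss_res / n_obs) + 2 * n_params

def bic(ss_res, n_obs, n_params):
    return n_obs * np.log(ss_res / n_obs) + n_params * np.log(n_obs)

# Candidate architectures: (label, residual SS, number of parameters).
candidates = [("NN[1,12,13; 1]", 1.10, 6),
              ("NN[1,12,13; 2]", 1.00, 11),
              ("NN[1,12,13; 4]", 0.95, 21)]
n = 132
for label, ss, p in candidates:
    print(f"{label}: AIC={aic(ss, n, p):.1f}, BIC={bic(ss, n, p):.1f}")
best = min(candidates, key=lambda c: bic(c[1], n, c[2]))
print("Selected by BIC:", best[0])
```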


References

Bigus, J. P. (1996). Data Mining with Neural Networks. McGraw-Hill.

Box, G., Jenkins, G., & Reinsel, G. (1994). Time Series Analysis: Forecasting and Control (Third edition). Prentice Hall: Englewood Cliffs, New Jersey.

Bulsari, A. B., & Saxén, H. (1993). A Recurrent Neural Network for Time-series Modelling. Artificial Neural Networks and Genetic Algorithms: Proceedings of the International Conference in Innsbruck, Austria.

Cheng, B., & Titterington, D. M. (1994). Neural Networks: A Review from a Statistical Perspective (with discussion). Statistical Science, 9(1), 2-54.

Connor, J. T., Martin, R. D., & Atlas, L. E. (1994). Recurrent Neural Networks and Robust Time Series Prediction. IEEE Transactions on Neural Networks, 5(2), 240-254.

Elman, J. L. (1990). Finding Structure in Time. Cognitive Science, 14, 179-211.

Faraway, J., & Chatfield, C. (1995). Time Series Forecasting with Neural Networks: A Case Study. Research Report 95-06, Statistics Group, University of Bath.

Freeman, J. A. (1994). Simulating Neural Networks with Mathematica. Addison-Wesley Publishing Company.

Haykin, S. (1994). Neural Networks: A Comprehensive Foundation. Prentice Hall.

Hill, T., O'Connor, M., & Remus, W. (1996). Neural Network Models for Time Series Forecasts. Management Science, 42(7), 1082-1092.

Hinton, G. E. (1992). How Neural Networks Learn from Experience. Scientific American.

Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer Feedforward Networks are Universal Approximators. Neural Networks, 2, 359-366.


Jordan, M. I. (1986). Serial Order: A Parallel Distributed Processing Approach. Technical Report 86-104, Institute of Cognitive Science.

Lippmann, R. P. (1987). An Introduction to Computing with Neural Nets. IEEE ASSP Magazine, 4, 4-22.

McDonnell, J. R., & Waagen, D. (1994). Evolving Recurrent Perceptrons for Time-Series Modeling. IEEE Transactions on Neural Networks, 5(1).

Parzen, E. (1983). ARARMA Models for Time Series Analysis and Forecasting. Journal of Forecasting, 1, 67-82.

Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning Representations by Back-propagating Errors. Nature, 323, 533-536.

Sarle, W. S. (1994). Neural Networks and Statistical Models. Proceedings of the Nineteenth Annual SAS Users Group International Conference.

Tang, Z., de Almeida, C., & Fishwick, P. A. (1991). Time Series Forecasting Using Neural Networks vs. Box-Jenkins Methodology. Simulation, 57(5), 303-310.

Tong, H. (1990). Non-linear Time Series. Oxford University Press.
