Microcomputers in Civil Engineering 12 (1997) 353–368
Determining Inputs for Neural Network Models of Multivariate Time Series H. R. Maier Senior Civil Engineer, Western Samoa Water Authority, P.O. Box 245, Apia, Western Samoa
& G. C. Dandy∗ Department of Civil and Environmental Engineering, University of Adelaide, Adelaide 5005, Australia
∗ To whom correspondence should be addressed.

Abstract: In recent years, artificial neural networks have been used successfully to model multivariate water resources time series. By using analytical approaches to determine appropriate model inputs, network size and training time can be reduced. In this paper, it is proposed that the method of Haugh and Box and a new neural network–based approach can be used to identify the inputs for multivariate artificial neural network models. Both methods were used to obtain the inputs for a multivariate artificial neural network model used for forecasting salinity in the River Murray at Murray Bridge, South Australia. The methods were compared with a third method that uses knowledge of travel times in the river to identify a reasonable set of inputs. The results obtained indicate that all three methods are suitable for determining the inputs for multivariate time series models. However, the neural network–based method is preferable because it is quicker and simpler to use. Any prior knowledge of the underlying processes should be used in conjunction with the neural network method.

1 INTRODUCTION

Time series analysis methods have been used extensively to model hydrologic time series, water-quality time series, and water demand, water pricing, and meteorologic time series
and are a vital tool in water resources planning and management.23 Traditionally, the ARMA (autoregressive moving average) class of models5 has been used for modeling water resources time series because such models are “. . . accepted, standard representations of stochastic time series”46 and are among those forecasting models applied most successfully in practice.41 For example, this class of models has been used successfully to model river flows,3,9,17,39,42,44 water levels,26 and water-quality parameters.9 In many instances, multivariate models are required to model the complex relationship between the input and output time series, since the output time series may depend not only on its own previous values but also on past values of other variables. For example, water consumption depends on factors such as monthly rainfall, the number of rain days per month, monthly evaporation, and monthly average temperature,13 and river salinity at a particular location depends on upstream salinities, river levels, and flows.34 An excellent review of the use of ARMA type models for modeling multivariate water resources time series is given by Salas et al.39 In recent years, artificial neural network (ANN) methods have been applied successfully to a number of multivariate forecasting problems in the field of water resources engineering.13−15,30,31,34,47,48 ANNs are a form of computing inspired by the functioning of the brain and nervous system and are discussed in detail by a number of authors.1,21,24,29,34 ANN models have a number of advantages over ARMA type
© 1997 Microcomputers in Civil Engineering. Published by Blackwell Publishers, 350 Main Street, Malden, MA 02148, USA, and 108 Cowley Road, Oxford OX4 1JF, UK.
models, including the following:

1. The data used do not have to follow a Gaussian distribution.8,38
2. The data used may possess irregular seasonal variation.35
3. They are nonlinear.10
4. They perform well when limited data are available.40,41
5. They are very robust and are able to deal with outliers, as well as noisy or incomplete data.8,38,40,41
6. They are well suited to longer-term forecasting because they base their forecasts on the approximate underlying relationship in the data.38,41

It should be noted, however, that a number of more recent time series models are able to deal with some of these limitations, such as nonlinearities,43 nonstationary data,45 and data noise.2

One of the most challenging tasks in developing a multivariate forecasting model is to determine appropriate model inputs. The difficulty is deciding how many lagged values to include from each input time series. In other words, if there are n input time series (z_{j,t−1}, z_{j,t−2}, ..., z_{j,t−N}, j = 1, 2, ..., n), the problem is finding the maximum lag for each input time series (k_{j,max}: k_{j,max} < N, j = 1, 2, ..., n) beyond which values of the input time series have no significant effect on the output time series. When developing models of the ARMA type, the order of the inputs can be determined using empirical and/or analytical approaches.17,42 In the empirical approach, a priori knowledge about the processes that generated the time series is used to determine the lags of the inputs. The aim of the analytical procedures is to determine the strength of the relationship between the output time series and the input time series at various lags. The lags that indicate strong correspondence between the input and output time series can then be used as an indicator of the important inputs for the multivariate model.
Most analytical approaches are based on the method of Haugh and Box,20 which uses cross-correlation analysis to determine the strength of the relationship between the output time series and past values of the input time series. However, analytical approaches generally are not used to determine the inputs for multivariate ANN models. The main reason for this is that ANNs belong to the class of data-driven approaches, whereas conventional statistical methods are model-driven.10 In model-driven approaches, the structure of the model has to be determined first, which is done with the aid of the empirical or analytical approaches mentioned above, before the unknown model parameters can be estimated. Data-driven approaches, on the other hand, have the ability to determine which model inputs are critical,13,38 so there is no need for “. . . a priori rationalization about relationships between variables. . . .”27 However, presenting a large number of inputs to ANN models and relying on the network to determine the critical model inputs usually increases network size. This has a number of disadvantages, such as increasing training time, increasing the amount of data required to efficiently estimate the connection weights,27 and increasing the number of local minima in the error surface, which makes it more difficult to obtain the near-optimal combination of weights for the problem under consideration. This is particularly true for complex problems, where the number of potential inputs is large and where no a priori knowledge is available to suggest possible lags at which strong relationships exist between the output time series and the input time series. Consequently, there are distinct advantages in using an analytical technique to help determine the inputs for multivariate ANN models. Lachtermacher and Fuller27 have developed a hybrid methodology for determining the inputs for simple univariate ANN models. This method involves fitting a univariate Box-Jenkins model5 to the time series, which is used as a guide for determining the inputs to the ANN model. However, this method has not yet been extended to multivariate cases. In this paper it is proposed that the method of Haugh and Box20 can be used to determine the inputs to multivariate ANN models. An outline of the method is given in Section 3. In addition, a new neural network–based method for determining the inputs for multivariate ANN models is introduced, which is described in Section 4. The advantages and disadvantages of both methods are also discussed. Both methods are applied to the problem of forecasting salinity in the River Murray at Murray Bridge, South Australia.34 The model inputs obtained using these methods are compared with those obtained by Maier and Dandy,34 who used an empirical approach in conjunction with sensitivity analysis.
All three sets of inputs are used to develop ANN models for forecasting salinity in the River Murray at Murray Bridge in order to assess the adequacy of the three methods for determining inputs to multivariate ANN models.
2 CASE STUDY

Fig. 1. The lower reaches of the River Murray.

The case study considered in this paper is that of forecasting salinity in the River Murray at Murray Bridge, South Australia, 14 days in advance. River Murray water is pumped from Murray Bridge to Adelaide, the capital of South Australia, via the Murray Bridge to Onkaparinga pipeline (Fig. 1) and forms a significant part of Adelaide’s water supply. By forecasting the salinity at Murray Bridge up to several weeks in advance and varying pumping accordingly, the salinity of the water supplied to users in Adelaide can be reduced by approximately 10%,12 resulting in a saving of approximately $4 million per year to consumers. The case study is discussed in detail by Maier and Dandy.34 The factors affecting salinity at Murray Bridge include upstream salinities, flows, and river levels.34 Daily salinity,
flow, and river-level data were available from 1987 to 1991 at a number of locations, as indicated in Table 1 and Fig. 1. The data were supplied by the Engineering and Water Supply Department of South Australia (EWS). A plot of salinities at Murray Bridge and Loxton is shown in Fig. 2. It can be seen that both time series exhibit irregular seasonal variation. This is typical for all salinity time series. The salinity time series upstream of Murray Bridge are similar in shape to that of salinity at Murray Bridge. In general, the further upstream of Murray Bridge a site is, the lower are the salinity values and the greater is the time lag between changes in the upstream salinities and those at Murray Bridge. The shapes of the plots of the flow and river-level data are very similar. The flow and river-level data also exhibit irregular seasonal variation and are inversely related to the salinity data. A typical plot is shown in Fig. 3.

Fig. 2. Salinity at Murray Bridge (SMB) and salinity at Loxton (SLO), 1987–1991.

Since the flows
at Lock 1 Lower and Overland Corner are almost identical and flows and river levels are strongly correlated,32 it was decided to consider only the salinity time series (i.e., salinity at Murray Bridge, Mannum, Morgan, Waikerie, and Loxton) and the time series of flow at Lock 1 Lower as potential input time series.
3 DETERMINATION OF MODEL INPUTS USING THE METHOD OF HAUGH AND BOX (METHOD 1)

As mentioned in Section 1, the method of Haugh and Box20 uses cross-correlation analysis to determine the strength of the relationship between the input time series and the output time series at various lags. However, when cross-correlation analysis is used, all time series need to be “prewhitened” in order to obtain the true relationship between them.7 This is carried out by fitting a univariate ARMA-type model to each time series and calculating the differences between the historical data and the values predicted by the models. These are called the model residuals. The cross-correlation function (CCF) is then calculated between the residuals of the output time series and the residuals of each of the input time series. An outline of the procedure is given in Figs. 4 and 5. The software required to implement the method of Haugh and Box20 was developed with the aid of a commercially available suite of Fortran subroutines25 in order to cater for daily data with seasonal variation.
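In outline, the prewhitening-and-cross-correlation procedure can be sketched as follows. This is a minimal illustration rather than the Fortran implementation described above: the least-squares AR(1) prewhitening, the synthetic series, and the lag-3 dependence are all assumptions for demonstration.

```python
import numpy as np

def prewhiten(z, p=1):
    """Fit an AR(p) model by least squares and return its residuals."""
    z = np.asarray(z, dtype=float) - np.mean(z)
    n = len(z)
    # design matrix: row for time t contains z[t-1], ..., z[t-p]
    X = np.column_stack([z[p - k - 1 : n - k - 1] for k in range(p)])
    y = z[p:]
    phi, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ phi

def ccf(u, v, max_lag):
    """Sample cross-correlations corr(u[t-k], v[t]) for k = 0..max_lag."""
    u = (u - u.mean()) / u.std()
    v = (v - v.mean()) / v.std()
    n = min(len(u), len(v))
    u, v = u[:n], v[:n]
    return np.array([np.mean(u[: n - k] * v[k:]) for k in range(max_lag + 1)])

# synthetic example (assumed data): output depends on the input at lag 3
rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = 0.8 * np.roll(x, 3) + 0.3 * rng.normal(size=2000)
y[:3] = 0.0

ru, rv = prewhiten(x), prewhiten(y)
r = ccf(ru, rv, max_lag=10)
threshold = 2.0 / np.sqrt(len(ru))   # the 2/sqrt(N) significance bound
significant = [k for k in range(1, 11) if abs(r[k]) > threshold]
print(significant)                   # lag 3 should stand out
```

The residual series play the role of the prewhitened input and output time series; lags whose cross-correlation exceeds the significance bound would be carried forward as candidate model inputs.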
Table 1 Available daily data (1987–1991)

Location           Variable      Abbreviation
Murray Bridge      Salinity      SMB
Mannum             Salinity      SMN
Morgan             Salinity      SMO
Waikerie           Salinity      SWE
Loxton             Salinity      SLO
Lock 1 Lower       Flow          FL1L
Overland Corner    Flow          FOC
Murray Bridge      River level   LMB
Mannum             River level   LMN
Lock 1 Lower       River level   LL1L
Lock 1 Upper       River level   LL1U
Morgan             River level   LMO
Waikerie           River level   LWE
Overland Corner    River level   LOC
Loxton             River level   LLO
3.1 Inspection of plots of the time series

Fig. 3. Salinity at Murray Bridge (SMB) and flow at Lock 1 Lower (FL1L), 1987–1991.

Fig. 4. Summary of the procedure based on the method of Haugh and Box.

The first step is to inspect plots of the time series in order to get a visual indication of any seasonal effects, trends, and changes in variance and to enable the detection of outliers, discontinuities, and turning points. A description of the time series was given in Section 2.
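As a numerical complement to visual inspection, simple summary statistics can reveal the trend and seasonal variation that a plot would show. The synthetic daily series below is an assumption standing in for the 1987–1991 records.

```python
import numpy as np

# synthetic 5 years of daily "salinity" with a seasonal cycle and a trend
# (assumed data standing in for the historical records)
rng = np.random.default_rng(1)
days = np.arange(5 * 365)
series = (600 + 0.1 * days
          + 150 * np.sin(2 * np.pi * days / 365)
          + 30 * rng.normal(size=days.size))

# year-by-year means expose a trend; day-of-year means expose seasonality
yearly_mean = series.reshape(5, 365).mean(axis=1)
doy_mean = series.reshape(5, 365).mean(axis=0)

print(np.round(yearly_mean, 1))   # steadily increasing -> trend
print(doy_mean.max() - doy_mean.min())  # large range -> seasonal variation
```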
3.2 Checking of the time series for joint stationarity

Before the true relationship between the component time series can be determined, they have to be jointly stationary. A multivariate time series is deemed to be jointly stationary if the statistical properties of each individual time series and the joint statistical properties between the time series are independent of time. The autocorrelation function (ACF) and the partial autocorrelation function (PACF) can be used to test for stationarity of the individual time series, and the cross-correlation function (CCF) can be used to check for joint stationarity between the time series.4 A daily time series with seasonal variation over an annual period is deemed to be stationary if none of the values of the ACF and the PACF are significantly different from zero at lags greater than 732.4 Values of the ACF and PACF may be considered to be significantly different from zero if they satisfy Eqs. (1) and (2), respectively.4

|r_k| ≥ 2 (1/√N) [1 + 2 Σ_{j=1}^{k−1} r_j²]^{1/2}    (1)

|r_kk| ≥ 2 (1/√N)    (2)

where
r_k = sample autocorrelation at lag k
r_kk = sample partial autocorrelation at lag k
N = number of observations in the time series

In addition, for joint stationarity between two daily time series (z_{i,t} and z_{j,t}) with seasonal variation, no values of the CCF between the two time series may be significantly different from zero at lags greater than 732. Values of the CCF may be considered to be significantly different from zero if they satisfy Eq. (3).

|r_{z_{i,t} z_{j,t}}| ≥ 2 (1/√N)    (3)

where r_{z_{i,t} z_{j,t}} = sample cross-correlation between time series z_{i,t} and z_{j,t}

Using the preceding rules, each of the input time series was found to be nonstationary, since the ACF for each died down extremely slowly.

3.3 Transformation of time series by differencing

If the component time series are not jointly stationary, they need to be transformed into jointly stationary time series by the use of differencing.5 The following equation can be used to difference the time series:

y_t = (1 − B^L)^D (1 − B)^d z_t    (4)

where
z_t = raw data (nonstationary)
y_t = differenced data (stationary)
B = backshift operator (i.e., B z_t = z_{t−1}; B^L z_t = z_{t−L})
L = seasonality (e.g., L = 365 for daily data with seasonal variation)
D = degree of seasonal differencing
d = degree of nonseasonal differencing

The degree of seasonal and nonseasonal differencing required depends on the nature of the time series. Generally, the original (nonstationary) time series are differenced using various combinations of d and D. The differenced time series are then checked for stationarity. If more than one combination of d and D produces stationary time series, the time series with the lowest degree of differencing should be used. However, it is advisable to apply the same order of differencing to each component time series in order to maintain the phase relationship between them.18 Since each time series consists of daily data with seasonal variation, a degree of seasonal differencing of 1 (i.e., D = 1) and a seasonality of 365 (i.e., L = 365) were used. Various degrees of nonseasonal differencing (d = 0, 1, and 2) were tried. The lowest value of d that resulted in joint stationarity of the input time series was 1. It should be noted that after differencing, the value of N used in Eqs. (1) to (3) should be that for the differenced time series.

3.4 Development of univariate time series models for each component time series

Once the component time series are jointly stationary, they need to be “prewhitened” by fitting a univariate autoregressive moving average (ARMA) model to each.7 The Box-Jenkins methodology5 can be used to perform this task. A summary of the Box-Jenkins methodology is given in Fig. 5. The first three steps of the methodology (i.e., inspecting a plot of the time series, checking for stationarity, and differencing) are carried out as part of the procedure for obtaining jointly stationary time series described above. The initial type and order of the model are identified by considering the lags at which the ACF and the PACF are significantly different from zero (Eqs.
1 and 2), as described by Bowermann and O’Connell.4 The unknown model parameters are then estimated using a maximum-likelihood approach.18,19 The residuals (difference between actual and predicted values) are also obtained as part of the parameter estimation process. Finally, diagnostic checking has to be carried out to assess
Fig. 5. Summary of the Box-Jenkins methodology.
the adequacy of the tentative model. If the model is adequate, the residuals should be uncorrelated in time. The whiteness of the residuals can be examined with the aid of the Box-Pierce method6 and the Portmanteau lack-of-fit test, as suggested by Bowermann and O’Connell.4 If the tentative model proves to be inadequate, a new tentative model has to be chosen, and the parameter estimation and diagnostic checking steps have to be repeated. ARMA models were fitted to each differenced input time series. The residuals of each model were found to be uncorrelated when the Box-Pierce method and the Portmanteau lack-of-fit test were used.
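The differencing of Eq. (4) and the Bartlett-type significance bound of Eq. (1) can be sketched as follows. The seasonal random-walk series is an assumed stand-in for an actual salinity record, not the authors' data.

```python
import numpy as np

def difference(z, L=365, D=1, d=1):
    """Apply Eq. (4): y_t = (1 - B^L)^D (1 - B)^d z_t."""
    y = np.asarray(z, dtype=float)
    for _ in range(D):
        y = y[L:] - y[:-L]   # seasonal differencing
    for _ in range(d):
        y = y[1:] - y[:-1]   # nonseasonal differencing
    return y

def acf(y, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag."""
    y = y - y.mean()
    c0 = np.dot(y, y) / len(y)
    return np.array([np.dot(y[: len(y) - k], y[k:]) / len(y) / c0
                     for k in range(max_lag + 1)])

def significant_acf_lags(y, max_lag):
    """Lags whose ACF exceeds the bound of Eq. (1)."""
    r, n = acf(y, max_lag), len(y)
    sig = []
    for k in range(1, max_lag + 1):
        bound = 2.0 / np.sqrt(n) * np.sqrt(1.0 + 2.0 * np.sum(r[1:k] ** 2))
        if abs(r[k]) >= bound:
            sig.append(k)
    return sig

# assumed data: random walk plus an annual cycle (clearly nonstationary)
rng = np.random.default_rng(2)
t = np.arange(4 * 365)
z = np.cumsum(rng.normal(size=t.size)) + 100 * np.sin(2 * np.pi * t / 365)

y = difference(z, L=365, D=1, d=1)
print(len(significant_acf_lags(z, 30)))  # many significant lags: nonstationary
print(len(significant_acf_lags(y, 30)))  # few: differencing removed them
```

In the paper's procedure the differenced series would then be prewhitened with an ARMA model before the residual cross-correlations are computed.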
3.5 Calculation of cross-correlations between residuals

The CCFs between the residuals of the output time series and the residuals of each of the input time series indicate the lags at which maximum correspondence occurs between them. Values of the CCF are deemed to be significant if they satisfy Eq. (3). The values that indicate significant correspondence are then chosen as inputs to the multivariate model. The CCFs between the residuals of the ARMA model predicting salinity at Murray Bridge and the residuals of the ARMA models predicting salinity at Mannum, Morgan, Waikerie, and Loxton, as well as flow at Lock 1 Lower, were calculated. The values of the CCFs that were found to be significant using Eq. (3) were used as inputs to the multivariate ANN model and are summarized in Table 2 (Method 1).

Table 2 Comparison of inputs obtained using the various methods (an x indicates that the corresponding input was used in the model). For each input time series (SMB, SMN, SMO, SWE, SLO, and FL1L) there is one row per method (1, 2, and 3), and the columns give the lags (1 to 15) at which that input was included; the individual entries are not reproduced here.

4 DETERMINATION OF MODEL INPUTS USING THE NEURAL NETWORK–BASED APPROACH (METHOD 2)

This method involves the development of simple univariate and bivariate ANN models to establish relationships between the output time series and past values of each of the input time series. The strength of these relationships at various lags can then be determined by means of a sensitivity analysis. An outline of the procedure is shown in Fig. 6. A commercially available software package, NeuralWorks Professional II/Plus (NeuralWare, Inc., 1991), was used to develop the ANN models on an 80486 PC.

4.1 Inspection of plots of the time series

This step is the same as that described in Section 3.1.

4.2 Development of univariate and bivariate ANN models

If the number of component time series is n (z_{1,t}, z_{2,t}, z_{3,t}, ..., z_{n,t}), and the aim is to predict time series z_{1,t}, n neural network models have to be developed. Network 1 predicts z_{1,t} using its own previous values as inputs, network 2 predicts z_{1,t} using past values of z_{2,t} as inputs, network 3 predicts z_{1,t} using past values of z_{3,t} as inputs, and so on. The following steps form part of the model-development procedure.
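The enumeration of the n networks described above can be sketched as follows, using the output series, input series, and maximum lags of the case study; the dictionary-of-specifications structure itself is an illustrative assumption.

```python
# Sketch of setting up the n univariate/bivariate networks: each model
# forecasts SMB from the lagged values of one candidate input series.
# The k_max values follow the case study; the data structure is assumed.
inputs = {          # input series -> maximum lag k_max
    "SMB": 20, "SMN": 30, "SMO": 30, "SWE": 30, "SLO": 30, "FL1L": 50,
}

models = []
for j, (name, k_max) in enumerate(inputs.items(), start=1):
    lags = list(range(1, k_max + 1))
    if name == "SMB":
        # extra lags around one year for the univariate model, to pick
        # up the seasonal cycle
        lags += list(range(354, 377))
    models.append({"model": j, "output": "SMB", "input": name, "lags": lags})

print([(m["model"], m["input"], len(m["lags"])) for m in models])
```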
4.2.1 Choice of lags of inputs and outputs

For each network, only one input time series is used. A maximum lag kmax is chosen, and values at lags 1, 2, 3, ..., kmax are used as model inputs. If a priori knowledge about the relationship between the input and output time series is available, kmax is chosen so that the lags of the input time series that exceed kmax are not suspected to have any significant effect on the output time series. If no a priori knowledge is available, kmax has to be chosen arbitrarily. The number and lags of the outputs depend on the forecasting period(s) required. In the case study considered, the number of input time series is six (i.e., n = 6). A summary of the inputs and outputs chosen for each of the six ANN models developed is given in Table 3. The value of kmax was chosen to be 20 for the model using salinities at Murray Bridge as inputs (Model 1). This is conservative, since one would not expect future values of salinity at Murray Bridge to be strongly related to their own previous values up to lags of that magnitude. Also, inputs at lags of around 365 days were used to pick up seasonal cycles. Taking into account the salinity travel times from upstream
locations to Murray Bridge34 and the fact that the models are trained for a forecasting period of 14 days, a value of kmax = 30 was chosen for the models using upstream salinities as inputs. For the model using flows from Lock 1 Lower as inputs (Model 6), an arbitrary choice of kmax = 50 was made. For all models, the lag of the output was chosen to be −13, since the desired forecasting period is 14 days.

Fig. 6. Outline of the neural network–based approach.

Table 3 Details of the univariate and bivariate models trained for input identification purposes

Model   Output time series   Input time series   Lags of inputs (days)
1       SMB                  SMB                 1, 2, ..., 20, 354, 355, ..., 376
2       SMB                  SMN                 1, 2, ..., 30
3       SMB                  SMO                 1, 2, ..., 30
4       SMB                  SWE                 1, 2, ..., 30
5       SMB                  SLO                 1, 2, ..., 30
6       SMB                  FL1L                1, 2, ..., 50
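Assembling the training patterns for one of these networks can be sketched as follows. The helper `make_patterns` is illustrative (not part of the original NeuralWorks workflow) and uses the paper's convention of inputs at lags 1, 2, ..., kmax with an output 14 days ahead (an output lag of −13).

```python
import numpy as np

def make_patterns(x, y, k_max, lead=14):
    """Pair inputs x[t-1], ..., x[t-k_max] with the target y[t+lead-1],
    i.e. an output lag of -(lead - 1) relative to time t."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(x)
    rows = range(k_max, n - lead + 1)
    # each row holds x[t-1], x[t-2], ..., x[t-k_max]
    X = np.array([x[t - k_max : t][::-1] for t in rows])
    target = np.array([y[t + lead - 1] for t in rows])
    return X, target

# toy check with an index series (assumed data)
x = np.arange(100.0)
X, tgt = make_patterns(x, x, k_max=5, lead=14)
print(X.shape, tgt.shape)
print(X[0], tgt[0])   # first pattern: inputs [4, 3, 2, 1, 0], target 18
```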
4.2.2 Choice of network geometry and internal parameters

The network geometry (number of hidden layers and number of nodes per hidden layer) and some internal parameters (e.g., learning rate) have to be chosen. This requires judgment, although some guidelines are available.22,37 However, there is generally a wide range of parameters for which network performance is only slightly affected.33 It should be noted that at this stage it is not crucial to obtain the best possible forecasts, since the aim of the procedure is to obtain the dominant network inputs, which generally will be the same unless the model gets stuck in an undesirable region of the weight space during training.

Maier and Dandy33 carried out extensive sensitivity analyses, assessing the effect of different internal parameters and network geometries on network performance for the case study considered. They found that learning rate, momentum, epoch size, and network geometry did not have any significant effect on generalization ability, provided the combination of parameters chosen did not result in divergent behavior. However, the size of the steps taken in weight space during training was found to have a significant effect on learning speed. A learning rate of 0.1 in conjunction with a momentum value of 0.6 was found to give good results and hence was used for all models. The above-mentioned learning rate resulted in divergent behavior for the model using inputs of flows at Lock 1 Lower (Model 6); consequently, the learning rate for that model was reduced to 0.01. In the software used (NeuralWorks Professional II/Plus), epoch size is given a default value of 16, and since epoch size did not have any effect on generalization ability for this data set, this value was adopted. The hyperbolic tangent transfer function was found to be superior to sigmoidal and linear transfer functions and hence was used. For the same reason, the quadratic error function was used in preference to cubic and quartic error functions, which are the other options available with the software employed. Network geometries were chosen with the aid of the guidelines discussed by Maier and Dandy.34 For Models 1 to 5, the number of nodes used in the first hidden layer was 45, and the number of nodes used in the second hidden layer was 15. For Model 6, the numbers of nodes used in the first and second hidden layers were 60 and 20, respectively. However, Maier and Dandy33 found that network geometry did not have a significant effect on model performance for the data set used. To ensure that the network would not get “stuck” in an undesirable region of the weight space, the initial connection weights were chosen to be randomly distributed between −0.1 and +0.1, which are the default values suggested in the software.

4.2.3 Training

Daily data from 1987, 1989, 1990, and 1991 were used for training, and data from 1988 were used to test the generalization ability of the networks at various stages of learning. The 1988 data were used for testing because they do not contain any extreme values of inputs and outputs. It is more informative to use the input data from the test set during the sensitivity analysis (see Sec. 4.3), since this is more likely to identify the inputs that result in better generalization ability. The 1991 data were not used for testing because they are used in Section 6 to assess the ability of the various methods described to determine appropriate model inputs. The root-mean-square error (RMSE) between the actual (desired) and
predicted values of the test set was used to measure generalization ability. Training was continued until a plateau was reached in the RMS prediction error. This stopping criterion is known as cross-validation. The testing interval was chosen to be 5000. In each case, a local minimum in the error surface was reached prior to the presentation of 50,000 training samples.

4.3 Performance of sensitivity analyses

In this step, the strength of the relationship between the model output(s) and the model inputs is determined for the models developed in Section 4.2 with the aid of sensitivity analyses. By examining plots of the relative significance of the model inputs at different lags for each of the models, appropriate inputs for the multivariate model can be chosen. The choice of which level of significance warrants the inclusion of a particular input is somewhat arbitrary and requires a degree of judgment. The plots of the relative significance of the model inputs also can be used to check if the value of kmax chosen (Sec. 4.2) was large enough. If the relative significance of the input at lag kmax is high, the value of kmax is too small. In this case, a larger value of kmax has to be chosen, the network geometry and internal parameters have to be adjusted, and training and the sensitivity analysis have to be repeated. In the case study considered, the sensitivity analyses were carried out with the aid of the software’s (NeuralWorks Professional II/Plus) “Explain” function. As part of the sensitivity analysis, each of the network inputs is increased by 5% in turn. The percentage change in the output as a result of the increase in each of the inputs is then calculated, and the sensitivity of each of the inputs is given by

Sensitivity = (% change in output / % change in input) × 100    (5)
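The “Explain”-style calculation of Eq. (5) can be sketched for an arbitrary trained model as follows; the linear toy predictor is an assumed stand-in for the trained network, which is treated as a black box.

```python
import numpy as np

def sensitivity(predict, x, i, delta=0.05):
    """Eq. (5): percent change in the model output per percent change in
    input i (here a 5% increase), multiplied by 100. Assumes a nonzero,
    positive base output."""
    x_pert = x.copy()
    x_pert[i] *= 1.0 + delta
    base, pert = predict(x), predict(x_pert)
    pct_out = (pert - base) / base * 100.0   # % change in output
    pct_in = delta * 100.0                   # % change in input (5%)
    return pct_out / pct_in * 100.0

# toy stand-in for the trained network: the output depends strongly on
# input 0, weakly on input 2, and not at all on input 1 (assumed model)
predict = lambda x: 10.0 * x[0] + 0.0 * x[1] + 0.5 * x[2]
x0 = np.array([1.0, 1.0, 1.0])

sens = [sensitivity(predict, x0, i) for i in range(3)]
print(np.round(sens, 1))   # input 0 dominates
```

Repeating this for every lagged input of each network gives the relative-significance plots from which the inputs for the multivariate model are selected.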
The relative significance of the inputs was assessed separately for high- and low-flow conditions, since the significance of the various inputs is flow dependent.34 A typical plot of the relative significance of the inputs during high- and low-flow conditions for Model 1 (i.e., the model using salinities at Murray Bridge as inputs) is shown in Fig. 7. It can be seen that there are no significant inputs at high lags, indicating that the value of kmax chosen was large enough. By inspecting Fig. 7, inputs of salinity at Murray Bridge at lags 1 and 2 were considered significant and hence chosen as inputs for the multivariate model. A typical plot for another input time series (in this case salinity at Loxton) is shown in Fig. 8.

Fig. 7. Relative significance of inputs, Model 1 (i.e., model using salinities at Murray Bridge as inputs), 14 days in advance.

Fig. 8. Relative significance of inputs, Model 5 (i.e., model using salinities at Loxton as inputs), 14 days in advance.

As the distance between Murray Bridge and the location of the input time series
increases, the plots of the relative significance of the inputs exhibit the following trends:

• At small lags, the inputs during high-flow conditions become increasingly significant when compared with those during low-flow conditions.
• The number of significant lags increases.

The preceding observations are in agreement with what one would expect. During low-flow conditions, salinity travel times from the sites closer to Murray Bridge (e.g., Mannum) exceed 14 days and provide sufficient information to produce
adequate 14-day forecasts. Consequently, salinity inputs from sites further upstream are not needed during low-flow conditions. However, during high-flow conditions, salinities from sites further upstream are required to produce adequate 14-day forecasts, since salinity travel times from the sites closer to Murray Bridge are less than 14 days. The lags of the inputs that were found to be significant for the models using salinity at Mannum, Morgan, Waikerie, and Loxton as inputs are summarized in Table 2 (Method 2). At this stage, it should be highlighted that the choice of which inputs to include involves a degree of judgment. For example,
after inspection of Fig. 8, it was decided to include inputs at lags 1 to 7. However, equally strong cases could be made for the inclusion of inputs at lags 1 to 2 or 1 to 4. Figure 8 indicates that there is a significant input at lag 30 during low-flow conditions, suggesting that the salinity travel times from Loxton to Murray Bridge exceed 30 days at times of low flow. However, for sites upstream of Morgan, the significance of the inputs during low-flow conditions may be ignored. This is so because salinities from Murray Bridge, Mannum, and Morgan contribute to the prediction of salinity at Murray Bridge at times of low flow, whereas salinities from sites further upstream, including Waikerie and Loxton, mainly contribute to the prediction of salinity at Murray Bridge at times of high flow. During high-flow conditions, there are no significant inputs at high lags, indicating that the value of kmax chosen was appropriate. Using the preceding criteria, the value of kmax chosen was found to be adequate for the remaining models using salinity inputs (Models 2, 3, and 4). A typical plot of the relative significance of the inputs during high- and low-flow conditions for Model 6 (i.e., the model using flows at Lock 1 Lower as inputs) is shown in Fig. 9. It can be seen that there are no significant values at high lags, indicating that the lags of the inputs chosen were adequate. It should be noted that the input at lag 50 was not considered to be significant because it is isolated and less significant than the inputs at lags 1 to 7. It also can be seen that there is a much stronger relationship between flow at Lock 1 Lower and salinity at Murray Bridge at times of high flow. Figure 9 indicates that inputs at lags 1 to 8 appear to be significant for producing 14-day forecasts. Consequently, values of flow at Lock 1 Lower at lags 1 to 8 were chosen as inputs for the multivariate model (Table 2, Method 2).
5 SUMMARY OF NETWORK INPUTS CHOSEN

A summary of the network inputs chosen using the method of Haugh and Box (Method 1), the neural network–based approach (Method 2), and the empirical method utilizing a priori knowledge about the underlying processes causing high salinities (Method 3, Maier and Dandy34 ) is given in Table 2. In Method 3, information about salinity travel times from upstream sites to Murray Bridge was used to choose the initial model inputs. The number of model inputs was subsequently reduced by carrying out sensitivity analyses on the entire model. It should be noted that it was decided to make a number of changes to the inputs obtained using Method 334 for the sake of consistency, including replacing the flows at Overland Corner with flows at Lock 1 Lower and omitting the levels at Lock 1 Upper. A comparison of the number of salinity, flow, and total inputs obtained using the three methods is given in Table 4.
The following points should be noted about Tables 2 and 4:

• Generally, there is reasonable agreement between the methods.
• Fewer salinity inputs were obtained using Method 2 than Method 1, since Method 2 identifies the critical inputs for a 14-day forecast, whereas Method 1 can only identify the critical inputs for a 1-day forecast.
• The number of salinity inputs obtained using Method 3 was kept small by taking into account some of the underlying physical principles, resulting in salinity values at alternate days being used from Murray Bridge, Mannum, and Morgan. The reason for this is that salinity values from the preceding locations were assumed to be important at times of low flow, during which changes in salinity with time are slow.
• The lowest number of total inputs was obtained using Method 2, and the highest number of total inputs was obtained using Method 1.
6 FORECASTING

The inputs obtained using Methods 1, 2, and 3 were used to train ANN models for forecasting salinity at Murray Bridge 14 days in advance. Data from 1987 to 1990 were used for training, while data from 1991 were used to simulate real-time forecasting. It should be noted that, because of the limited data available, a true real-time forecasting simulation was not carried out in this study, since some of the information contained in the 1991 data had already been used to determine the model inputs.

Maier29 and Maier and Dandy33,36 found that the way the generalization ability of a network, as measured by the RMSE between the predicted and historical values of the independent test set, changes as training progresses is typically a function of the size of the steps taken in weight space. When small steps are taken in weight space, the RMSE decreases slowly and steadily until a local minimum in the error surface has been reached. Continued training results in small oscillations in RMSE as the network jumps from one side of a local minimum to the other. When larger steps are taken in weight space, the local minimum is reached more quickly, but continued training can result in large oscillations in the RMSE or even divergent behavior. Clearly, the former behavior is more desirable, although very small step sizes should be avoided because they increase training time. The absolute values of the step sizes that result in the two different types of network behavior described above are very much problem-dependent. Consequently, the way the generalization ability of a network changes as training progresses for a given step size has to be investigated for each particular data set. Such an investigation was carried out by Maier29
H. R. Maier & G. C. Dandy
Fig. 9. Relative significance of inputs, Model 7 (i.e., model using flows at Lock 1 Lower as inputs), 14 days in advance.

Table 4
Comparison of the number of inputs obtained using the various methods

                             Method 1   Method 2   Method 3
Number of salinity inputs       37         17         32
Number of flow inputs           10          8          4
TOTAL number of inputs          47         25         36

and Maier and Dandy33,36 for the data used in this study. They found that when relatively large steps were taken in weight space (learning rate = 0.1), a local minimum in the error surface was reached after the presentation of 45,000 training samples. Continued training (up to 150,000 training samples) did not result in overtraining, but the magnitude of the oscillations in RMSE was approximately 5.0 EC units (approximately 10%). When smaller steps were taken in weight space (learning rate = 0.005), a local minimum in the error surface was reached after the presentation of 150,000 training samples. Continued training (up to 200,000 training samples) did not result in overtraining, and the magnitude of the oscillations in RMSE was approximately 3.1 EC units (approximately 6%). Even when a learning rate of 0.001 was used and the network took a number of days to train, the magnitude of the oscillations in RMSE was still significant (approximately 1.5 EC units). As discussed by Maier29 and Maier and Dandy,33,36 when training a network, one has to ensure that a local minimum in the error surface has been reached when training is stopped (i.e., one has to ensure that the network is fully trained). On
the other hand, the way the generalization ability of a network changes with continued training, once a local minimum has been reached, also has to be known to avoid overtraining, divergent behavior, or large oscillations in the RMSE. When sufficient data are available, independent training, validation, and forecasting sets should be used. The validation set is then used to optimize the generalization ability of the network. However, as pointed out by Maier29 and Maier and Dandy,33,36 in many real-life applications only limited data sets are available, so as much data as possible should be used for training. If the way in which the generalization ability of a network changes as training progresses is known for a given set of parameters, a fixed number of training samples can be presented to the network.

In this study, a learning rate of 0.02, a momentum value of 0.6, an epoch size of 16, the hyperbolic tangent transfer function, and the quadratic error function were used for all models. These internal parameters are identical to those used by Maier and Dandy.34 Maier29 and Maier and Dandy33,36 found that 100,000 training samples should be presented to networks using the preceding parameters to ensure that the network is fully trained. Continued
training was not found to result in overtraining. The preceding response was found to be independent of network size. Consequently, 100,000 training samples were used for training by Maier and Dandy34 and in this study.

As discussed by Maier29 and Maier and Dandy,33,36 network geometry is not expected to have a significant impact on the generalization ability of the networks. However, for the internal parameters chosen, differences can be expected in the RMSEs of the forecasts obtained using networks with different geometries as a result of the oscillations in RMSE once the vicinity of a local minimum in the error surface has been reached (on the order of 4 to 5 EC units). This could be reduced by reducing the learning rate, at the cost of increased training time. The effect of different geometries was assessed in this study by trying three network geometries for each training set: 47-15-0-1, 47-20-0-1, and 47-35-0-1 for the inputs obtained using Method 1 (Models 1-1, 1-2, and 1-3); 25-5-0-1, 25-15-0-1, and 25-30-0-1 for Method 2 (Models 2-1, 2-2, and 2-3); and 36-15-0-1, 36-20-0-1, and 36-35-0-1 for Method 3 (Models 3-1, 3-2, and 3-3). The ranges of geometries tried are similar to those used by Maier and Dandy.34 Maier and Dandy34 presented the best forecast obtained using the three network geometries. The same was done in this study in order to obtain a fair comparison between the forecasts obtained using the inputs suggested by the three methods evaluated.
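The training setup described in this section (an inputs–hidden–(second hidden)–output geometry such as 25-5-0-1, tanh transfer functions, quadratic error, learning rate 0.02, momentum 0.6) can be illustrated with a minimal backpropagation sketch. This is a hypothetical reimplementation for exposition only; the study itself used a commercial package (NeuralWorks Professional II/Plus38 ), and the toy data below are not the salinity series.

```python
import numpy as np

class BackpropNet:
    """One-hidden-layer network in the spirit of the paper's I-H-0-1
    geometries: tanh transfer functions, quadratic error, and sample-by-
    sample gradient descent with momentum."""

    def __init__(self, n_in, n_hidden, lr=0.02, momentum=0.6, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=0.1, size=(n_hidden, n_in))
        self.W2 = rng.normal(scale=0.1, size=n_hidden)
        self.lr, self.mom = lr, momentum
        self.dW1, self.dW2 = np.zeros_like(self.W1), np.zeros_like(self.W2)

    def forward(self, x):
        self.h = np.tanh(self.W1 @ x)
        return np.tanh(self.W2 @ self.h)

    def train_sample(self, x, t):
        y = self.forward(x)
        err = y - t                              # gradient of quadratic error
        d_out = err * (1 - y ** 2)               # tanh derivative at the output
        d_hid = self.W2 * d_out * (1 - self.h ** 2)
        self.dW2 = self.mom * self.dW2 - self.lr * d_out * self.h
        self.dW1 = self.mom * self.dW1 - self.lr * np.outer(d_hid, x)
        self.W2 += self.dW2
        self.W1 += self.dW1
        return 0.5 * err ** 2

net = BackpropNet(n_in=25, n_hidden=5)           # the 25-5-0-1 geometry
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 25))
t = np.tanh(0.5 * X[:, 0])                       # toy target in the tanh range
losses = [net.train_sample(x, ti) for x, ti in zip(X, t)]
print(np.mean(losses[:100]), np.mean(losses[-100:]))  # mean error early vs late
```

Presenting a fixed number of training samples, as done in the study, corresponds to simply fixing the number of calls to `train_sample` rather than monitoring a validation set.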
7 RESULTS

The root mean squared errors (RMSEs) and average absolute percentage errors (AAPEs) of the 14-day forecasts obtained are shown in Table 5. It can be seen that, as discussed earlier, the generalization ability of the various models is comparable, regardless of which method was used to obtain the network inputs. The network geometry chosen had some impact on the results obtained. However, the differences were in the range expected as a result of oscillations in RMSE. As discussed earlier, this impact could be reduced by reducing the learning rate, although the time taken for training would be increased. The RMSEs of the best real-time forecasts for the models using the different training sets (Models 1-3, 2-1, and 3-2) and the corresponding time taken for training (i.e., the computer time taken to process 100,000 training samples) are shown in Fig. 10. It can be seen that the generalization ability of the three models is very similar. The RMSE of the model with inputs obtained using Method 1 was 44.0 EC units, compared with RMSEs of 44.6 and 46.5 EC units when Methods 2 and 3 were used. It should be noted that the 25 inputs identified by
Table 5
Real-time forecasting errors for models using inputs obtained by Methods 1, 2, and 3

Model   Method   Geometry    RMSE (EC units)   AAPE (%)
1-1     1        47-15-0-1        49.7           7.3
1-2     1        47-20-0-1        45.2           6.2
1-3     1        47-35-0-1        44.0           5.7
2-1     2        25-5-0-1         44.6           5.7
2-2     2        25-15-0-1        47.5           5.7
2-3     2        25-30-0-1        49.5           6.3
3-1     3        36-15-0-1        47.9           5.9
3-2     3        36-20-0-1        46.5           5.8
3-3     3        36-35-0-1        52.2           7.3
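The two error measures reported in Table 5 can be computed as follows (a straightforward sketch; the variable names and sample values are illustrative only):

```python
import numpy as np

def rmse(pred, obs):
    """Root mean squared error, in the units of the series (EC units here)."""
    pred, obs = np.asarray(pred, float), np.asarray(obs, float)
    return np.sqrt(np.mean((pred - obs) ** 2))

def aape(pred, obs):
    """Average absolute percentage error (%)."""
    pred, obs = np.asarray(pred, float), np.asarray(obs, float)
    return 100.0 * np.mean(np.abs((pred - obs) / obs))

obs  = [800.0, 820.0, 790.0]        # hypothetical salinities (EC units)
pred = [810.0, 815.0, 800.0]
print(rmse(pred, obs), aape(pred, obs))   # ≈ 8.66 EC units, ≈ 1.04%
```

RMSE penalizes large individual errors more heavily, whereas AAPE expresses the typical error relative to the magnitude of the observed salinity, which is why the two measures rank the models slightly differently in Table 5.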
Method 2 are a subset of the 47 inputs identified by Method 1. However, the latter performs only slightly better (in terms of RMSE) than the former. It can be inferred that the additional inputs do not assist greatly in forecasting salinity at Murray Bridge.

There is a marked difference in the time taken for training of the three models. It should be noted that there is good agreement between training time and the number of network inputs (and hence the network size that gives optimal generalization ability). In other words, the network that used the largest number of inputs (i.e., the network using the inputs obtained with the aid of Method 1) took longest to train, whereas the training time for the network with the smallest number of inputs (i.e., the network using the inputs obtained with the aid of Method 2) was shortest.

The results obtained indicate that Method 2 has the greatest ability to determine the critical network inputs. The addition of unnecessary inputs did not adversely affect the generalization ability of the model, as indicated by the comparative performance of the models trained with the three input sets. This is in agreement with the results obtained by Tang et al.41 and Maier and Dandy.35 However, if the number of unnecessary inputs is too great, generalization ability might be reduced as a result of difficulties in finding near-optimal local minima in the error surface due to the larger number of parameters.

The results also indicate that it might be worthwhile to combine Methods 2 and 3 (i.e., to incorporate any available empirical knowledge into the analytical procedure). For example, empirical knowledge was used to select inputs at alternate lags for salinities at Murray Bridge (SMB), Mannum (SMN), and Morgan (SMO) and flow at Lock 1 Lower (FL1L). This knowledge could be incorporated into the inputs obtained using Method 2, thus reducing the size of the input set further.
Fig. 10. Best real-time forecasts and corresponding training times for models with inputs obtained using methods 1, 2, and 3.
8 CONCLUSIONS

The results obtained indicate that the method of Haugh and Box (Method 1) and the method based on a neural network approach (Method 2) provide suitable analytical procedures for determining the inputs to multivariate ANN models. Such procedures are useful for determining the inputs for neural network models, especially when no a priori knowledge about the relationship between potential inputs and outputs is available and when the potential number of inputs is large. The advantages of the neural network–based method (Method 2) over the method of Haugh and Box (Method 1) include:

1. It can determine which inputs are significant for a specific forecasting period (e.g., 14 days in the case study considered), thus reducing the number of inputs, which in turn reduces network size and training time.
2. It can provide valuable information about the relationship between the input and output time series (e.g., salinity travel times between Murray Bridge and various upstream locations under a variety of flow conditions in the case study considered).
3. It is simpler and quicker to use, since there is no need for preprocessing of the data (e.g., differencing and prewhitening).

A disadvantage of Method 2 is that the initial lags of the inputs (i.e., kmax ) and the network geometry and parameters have to be determined using a trial-and-error approach. In addition, a large amount of judgment is required to decide which inputs of the univariate and bivariate models should be included in the multivariate model.
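The prewhitening step required by Method 1 can be sketched as follows. In the spirit of the Haugh and Box approach, each series is prewhitened by its own fitted AR model, and the cross-correlations of the residuals are compared against the approximate 95% confidence limit 2/√N to flag significant lags. This is an illustrative reconstruction on synthetic data, not the exact procedure used in the case study.

```python
import numpy as np

def ar_residuals(x, p=5):
    """Prewhiten a series: fit an AR(p) model by least squares and
    return the (approximately white) residual series."""
    X = np.column_stack([x[p - k - 1:len(x) - k - 1] for k in range(p)])
    phi, *_ = np.linalg.lstsq(X, x[p:], rcond=None)
    return x[p:] - X @ phi

def cross_correlations(u, v, max_lag=10):
    """r(k) = corr(u[t], v[t+k]) for k = 0..max_lag, i.e. u leading v."""
    u, v = u - u.mean(), v - v.mean()
    denom = np.sqrt((u @ u) * (v @ v))
    return np.array([u[:len(u) - k] @ v[k:] for k in range(max_lag + 1)]) / denom

rng = np.random.default_rng(0)
x = rng.normal(size=500)                          # "input" series
y = np.roll(x, 3) + 0.3 * rng.normal(size=500)    # output depends on x at lag 3
r = cross_correlations(ar_residuals(x), ar_residuals(y))
limit = 2 / np.sqrt(len(x))                       # approximate 95% limit
print(np.nonzero(np.abs(r) > limit)[0])           # lags flagged as significant
```

The extra machinery visible here (fitting an AR filter before any cross-correlation can be interpreted) is precisely the preprocessing burden that the neural network–based method avoids.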
Method 3 (using knowledge of the travel times in the river) gave results that were reasonable but not quite as good as Method 2. Clearly, knowledge about the underlying physical processes should be used wherever possible in developing time series or neural network models. It is recommended that where knowledge of the physical processes is available, this should be used to narrow the range of inputs and lags considered in Method 2 (using a combination of Methods 2 and 3). Unfortunately, this information is not always available, in which case Method 2 should be used. We believe that the analytical methods presented in this paper provide general procedures for determining the inputs to multivariate ANN models. The neural network–based method (Method 2) has been used successfully to determine the inputs for a neural network model relating cell densities of cyanobacteria to a number of environmental variables in the River Murray.31 However, final judgment should be reserved until the procedures suggested in this paper have been applied to a number of diverse case studies.
REFERENCES

1. Abdalla, K. M. & Stavroulakis, G. E., A backpropagation neural network model for semi-rigid steel connections, Microcomputers in Civil Engineering, 10 (1995), 77–87.
2. Bergman, M. J. & Delleur, J. W., Kalman filter estimation and prediction of daily stream flows: I. Review, algorithm, and simulation experiments, Water Resources Bulletin, 21 (5) (1985), 827–32.
3. Bergman, M. J. & Delleur, J. W., Kalman filter estimation and prediction of daily stream flows: II. Application to the Potomac River, Water Resources Bulletin, 21 (5) (1985), 815–25.
4. Bowermann, B. L. & O'Connell, R. T., Time Series and Forecasting, Duxbury Press, North Scituate, MA, 1979.
5. Box, G. E. P. & Jenkins, G. M., Time Series Analysis, Forecasting and Control, Holden-Day, San Francisco, 1976.
6. Box, G. E. P. & Pierce, D. A., Distribution of residual autocorrelations in autoregressive-integrated moving average time-series models, Journal of the American Statistical Association, 65 (1970), 1509–26.
7. Brockwell, P. J. & Davis, R. A., Time Series: Theory and Methods, Springer-Verlag, New York, 1987.
8. Burke, L. I. & Ignizio, J. P., Neural networks and operations research: An overview, Computer and Operations Research, 19 (3/4) (1992), 179–89.
9. Camacho, F., McLeod, A. I. & Hipel, K. W., Contemporaneous autoregressive-moving average (CARMA) modeling in water resources, Water Resources Bulletin, 21 (4) (1985), 709–20.
10. Chakraborty, K., Mehrotra, K., Mohan, C. K. & Ranka, S., Forecasting the behavior of multivariate time series using neural networks, Neural Networks, 5 (1992), 961–70.
11. Cryer, J. D., Time Series Analysis, Duxbury Press, North Scituate, MA, 1986.
12. Dandy, G. C. & Crawley, P. D., Optimization of multiple reservoir systems including salinity effects, Water Resources Research, 28 (4) (1992), 979–90.
13. Daniell, T. M., Neural networks: Applications in hydrology and water resources engineering, Preprints, International Hydrology and Water Resources Symposium, Perth, Australia, October 2–4, 1991, pp. 797–802.
14. Daniell, T. M. & Wundke, A. D., Neural networks: Assisting in water quality modelling, Preprints, Watercomp, Melbourne, Australia, March 30–April 1, 1993, pp. 51–7.
15. De Silets, L., Golden, B., Wang, Q. & Kumar, R., Predicting salinity in the Chesapeake Bay using backpropagation, Computer and Operations Research, 19 (3/4) (1992), 277–85.
16. Gunaratnam, D. J. & Gero, J. S., Effect of representation on the performance of neural networks in structural engineering applications, Microcomputers in Civil Engineering, 9 (1994), 97–108.
17. Haltiner, J. P. & Salas, J. D., Short-term forecasting of snowmelt runoff using ARMAX models, Water Resources Bulletin, 24 (5) (1988), 1083–89.
18. Harvey, A. C., Time Series Models, Philip Allan Publishers, Oxford, UK, 1981.
19. Harvey, A. C., Forecasting, Structural Time Series Models and the Kalman Filter, Cambridge University Press, Cambridge, England, 1989.
20. Haugh, L. D. & Box, G. E. P., Identification of dynamic regression (distributed lag) models connecting two time series, Journal of the American Statistical Association, 72 (397) (1977), 121–30.
21. Hegazy, T., Fazio, P. & Moselhi, O., Developing practical neural network applications using backpropagation, Microcomputers in Civil Engineering, 9 (1994), 145–59.
22. Hertz, J. A., Krogh, A. & Palmer, R. G., Introduction to the Theory of Neural Computation, Addison-Wesley, Redwood City, CA, 1991.
23. Hipel, K. W., Time series analysis in perspective, Water Resources Bulletin, 21 (4) (1985), 609–24.
24. Hyeong-Taek, K. & Yoon, C. J., Neural network approaches
to aid simple truss design problems, Microcomputers in Civil Engineering, 9 (1994), 211–8.
25. IMSL, Inc., IMSL STAT/LIBRARY User's Manual, version 2.0, IMSL, Houston, 1991.
26. Irvine, K. N. & Eberhardt, A. J., Multiplicative, seasonal ARIMA models for Lake Erie and Lake Ontario water levels, Water Resources Bulletin, 28 (2) (1992), 385–96.
27. Lachtermacher, G. & Fuller, J. D., Backpropagation in hydrological time series forecasting, in Stochastic and Statistical Methods in Hydrology and Environmental Engineering, vol. 3: Time Series Analysis in Hydrology and Environmental Engineering, K. W. Hipel, A. I. McLeod, U. S. Panu & V. P. Singh (eds.), Kluwer Academic Publishers, Dordrecht, Netherlands, 1994, pp. 229–42.
28. Maier, H. R., A Review of Artificial Neural Networks, Research report no. R131, Department of Civil and Environmental Engineering, University of Adelaide, Adelaide, Australia, 1995.
29. Maier, H. R., Use of artificial neural networks for modelling multivariate water quality time series, Ph.D. thesis, Department of Civil and Environmental Engineering, University of Adelaide, Adelaide, Australia, 1995.
30. Maier, H. R. & Dandy, G. C., Forecasting salinity using neural networks and multivariate time series models, Preprints, Water Down Under 94, Adelaide, South Australia, November 21–25, 1994, pp. 297–302.
31. Maier, H. R. & Dandy, G. C., Modelling cyanobacteria (blue-green algae) in the River Murray using artificial neural networks, Preprints, Modsim 95, vol. 3, Newcastle, Australia, November 27–30, 1995, pp. 268–75.
32. Maier, H. R. & Dandy, G. C., Application of Multivariate Time Series Modelling to the Prediction of Salinity, Research report no. R129, Department of Civil and Environmental Engineering, University of Adelaide, Adelaide, Australia, 1995.
33. Maier, H. R. & Dandy, G. C., Effect of Internal Parameters and Geometry on the Performance of Backpropagation Networks, Research report no. R132, Department of Civil and Environmental Engineering, University of Adelaide, Adelaide, Australia, 1995.
34. Maier, H. R. & Dandy, G. C., The use of artificial neural networks for the prediction of water quality parameters, Water Resources Research, 32 (4) (1996), 1013–22.
35. Maier, H. R. & Dandy, G. C., Neural network models for forecasting univariate time series, Neural Network World (in press).
36. Maier, H. R. & Dandy, G. C., The effect of internal parameters and geometry on the performance of back-propagation networks, Neural Computing and Applications (submitted).
37. Maren, A., Harston, C. & Pap, R., Handbook of Neural Computing Applications, Academic Press, San Diego, CA, 1990.
38. NeuralWare, Inc., Neural Computing, NeuralWorks Professional II/Plus, 1991.
39. Salas, J. D., Tabioas, G. Q. III & Bartolini, P., Approaches to multivariate modeling of water resources time series, Water Resources Bulletin, 21 (4) (1985), 683–708.
40. Schizas, C. N., Pattichis, C. S. & Michaelides, S. C., Forecasting minimum temperature with short time-length data using artificial neural networks, Neural Network World, 4 (2) (1994), 219–30.
41. Tang, Z., deAlmeida, C. & Fishwick, P. A., Time series forecasting using neural networks vs. Box-Jenkins methodology,
Simulation, 57 (5) (1991), 303–10.
42. Thompstone, R. M., Hipel, K. W. & McLeod, A. I., Forecasting quarter-monthly riverflow, Water Resources Bulletin, 21 (5) (1985), 731–41.
43. Tong, H., Thanoon, B. & Gudmundsson, G., Threshold time series modeling of two Icelandic riverflow systems, Water Resources Bulletin, 21 (4) (1985), 721–30.
44. Vecchia, A. V., Periodic autoregressive–moving average (PARMA) modeling with applications to water resources, Water Resources Bulletin, 21 (5) (1985), 683–708.
45. Young, P. C., Time-variable parameter and trend estimation in non-stationary economic time series, Journal of Forecasting, 13 (1994), 179–210.
46. Young, P. C., Ng, C. N., Lane, K. & Parker, D., Recursive forecasting, smoothing and seasonal adjustment of non-stationary environmental data, Journal of Forecasting, 10 (1991), 57–89.
47. Zhang, S. P., Watanabe, H. & Yamada, R., Prediction of daily water demands by neural networks, in Stochastic and Statistical Methods in Hydrology and Environmental Engineering, vol. 3: Time Series Analysis in Hydrology and Environmental Engineering, K. W. Hipel, A. I. McLeod, U. S. Panu & V. P. Singh (eds.), Kluwer Academic Publishers, Dordrecht, Netherlands, 1994, pp. 217–27.
48. Zhu, M.-L. & Fujita, M., Application of neural networks to runoff prediction, in Stochastic and Statistical Methods in Hydrology and Environmental Engineering, vol. 3: Time Series Analysis in Hydrology and Environmental Engineering, K. W. Hipel, A. I. McLeod, U. S. Panu & V. P. Singh (eds.), Kluwer Academic Publishers, Dordrecht, Netherlands, 1994, pp. 205–16.