Forecast accuracy of exponential smoothing methods applied to four major crops of Pakistan ∗
Muhammad Akram¹, Ishaq Bhatti², Muhammad Ashfaq³ and Asif Ali Khan³

¹ Monash University, Australia
² La Trobe University, Australia
³ University of Agriculture, Faisalabad
Abstract: This paper uses various forecasting methods to forecast future crop production levels using time series data for four major crops in Pakistan: wheat, rice, cotton and pulses. These forecasting methods are then assessed based on their out-of-sample forecast accuracies. We empirically compare three methods: Box-Jenkins' ARIMA, Dynamic Linear Models (DLM) and exponential smoothing. The best forecasting models are selected from each of the methods by applying them to various agricultural time series, in order to demonstrate the usefulness of the models and the differences between them in an actual application. The forecasts obtained from the best selected exponential smoothing models are then compared with those obtained from the best selected classical Box-Jenkins ARIMA models and DLMs using various forecast accuracy measures.

Key words: forecast; exponential smoothing; ARIMA; DLM; forecast accuracy measure.
∗ This work was partially supported by the Higher Education Commission (HEC) of Pakistan and the University of Agriculture, Faisalabad.
¹ Corresponding author. Email: [email protected]; telephone: +61-3-99030111; address: Monash University, Victoria 3800, Australia.
1 Introduction

Exponential smoothing (ES) methods, first proposed in the late 1950s (Brown 1959, Holt 1957, Winters 1960), have since been among the most widely used forecasting procedures in industry and commerce, particularly in production planning, inventory control, sales forecasting, production scheduling and operations management. A variety of methods fall into the ES family, each having the property that forecasts are weighted combinations of past observations. That is, forecasts produced using exponential smoothing methods are weighted averages of past observations, with the weights decaying exponentially as the observations get older; the more recent the observation, the higher the associated weight. The ES framework generates reliable forecasts quickly and for a wide spectrum of time series, which is a great advantage and of major importance for applications in industry (see Hyndman et al. 2008). The popularity of ES is due to several practical considerations in short-range forecasting, and to the surprising level of accuracy that can be obtained with minimal effort in model identification. Many studies have shown that, although the formulation of these methods is relatively simple and easy to apply, they are as accurate as more complicated and statistically sophisticated forecasting methods (see, for example, Makridakis & Hibon 1979, Makridakis et al. 1982, Makridakis & Hibon 2000).

ES can be applied to time series data either for smoothing the data or for producing forecasts. The basic idea behind ES is to construct an exponentially weighted moving average (EWMA) for each component of the time series, namely the level, trend and seasonality. The weight of each component is determined through its smoothing parameter, namely α for the level, β for the trend and γ for seasonality (for details, see Makridakis et al. 1998).
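As a minimal illustration of this weighting scheme (a hand-rolled sketch for exposition only, not the estimation procedure used later in the paper), simple exponential smoothing can be written in a few lines of R:

    # simple exponential smoothing by hand: the one-step forecast is an
    # exponentially weighted average of past observations
    ses <- function(y, alpha = 0.2) {
      f <- numeric(length(y) + 1)
      f[1] <- y[1]                          # initialize with the first observation
      for (t in seq_along(y)) {
        f[t + 1] <- alpha * y[t] + (1 - alpha) * f[t]
      }
      f                                     # f[t + 1] is the forecast of y[t + 1]
    }

Unrolling the recursion gives f[t+1] = α y_t + α(1−α) y_{t−1} + α(1−α)² y_{t−2} + ..., which makes the exponential decay of the weights explicit.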
The ultimate objective of any forecasting method, such as ES or ARIMA, is to provide precise future predictions. Many new forecasting methods have become available over the last few decades, and researchers and practitioners use various sets of methods to conduct their empirical forecasting comparisons. For example, Kirby (1966) compared regressions (trend fitting), moving averages and exponential smoothing for monthly forecasting, and reported in favor of exponential smoothing. The major finding of Makridakis & Hibon (1979), Makridakis et al. (1982) and Makridakis & Hibon (2000), who used thousands of time series, was that simple methods, such as simple exponential smoothing and damped trend methods, perform as well as, and in many cases better than, more statistically sophisticated methods such as ARIMA models. The flexibility of exponential smoothing is demonstrated by its ability to provide forecasts for the complete taxonomy of Pegels (1969), as extended by Gardner (1985), Hyndman et al. (2002) and Taylor (2003).

It is important to evaluate forecast accuracy using genuine forecasts. That is, it is invalid to look only at how well a model fits the historical data; the accuracy of forecasts can only be determined by considering how well a model performs on new data that were not used when fitting the model. When choosing models, it is common to use a portion of the available data for fitting, and to reserve the rest (the out-of-sample data) for testing the model (see Hyndman et al. 2002). The testing data can then be used to measure how well the model is likely to forecast on new data.

The aim of this paper is to assess different forecasting methods based on their out-of-sample forecast accuracies. We compare various methods, namely Box-Jenkins' ARIMA technique, dynamic linear models (West & Harrison 1997, Petris et al. 2009) and the exponential smoothing method, by applying them to real time series data in order to demonstrate the differences between the models in an actual application.

The rest of the paper is arranged as follows. A brief theoretical background of the forecasting methods is given in Section 2. In Section 3, the model selection criteria used to select the best exponential smoothing models are defined. The various forecast accuracy measures are defined in Section 4. The three methods under discussion are then compared by applying them to various agricultural production time series in Section 5. The final section contains concluding remarks.
2 Theoretical Background

2.1 Modeling framework

The state space framework, proposed by Hyndman et al. (2002) and extended by Taylor (2003), offers a statistical rationale in support of the ES models, which are based on capturing and smoothing the existing structural components of a time series, namely the level, trend and seasonality. Depending on the nature of the time series under consideration, an exponential smoothing model with appropriate components must be selected. The components of the method can interact in either an additive or a multiplicative way, and the trend can be either linear or damped. The forecasts generated by linear methods display a constant trend (increasing or decreasing) indefinitely into the future. Empirical evidence indicates that these methods tend to over-forecast, especially at longer forecast horizons. Motivated by this observation, Gardner & McKenzie (1985) introduced a parameter that dampens the trend to a flat line some time in the future.

The framework involves 30 different models (15 with additive errors and 15 with multiplicative errors). Each model consists of a measurement equation that describes the observed data, and some transition equations that describe how the unobserved components or states (level, trend, seasonal) change over time. Hence these are referred to as state space models. Each model is denoted by its error, trend and seasonal components, following the procedure of Hyndman et al. (2008), in order to allow for easy cross-referencing of the models. They code any exponential smoothing model as a triplet ETS (error, trend, seasonality), using the following shorthand: N for none, A for additive, Ad for additive damped, M for multiplicative and Md for multiplicative damped. For example, AAN refers to a model with additive errors, additive trend and no seasonality. Model ANN gives forecasts equivalent to simple exponential smoothing, model AAN underlies Holt's linear method, and the additive Holt-Winters' method is obtained by AAA. The following table, taken from Hyndman et al. (2002), shows the 15 models with additive errors.
Trend Component               Seasonal Component
                              N (None)   A (Additive)   M (Multiplicative)
N (None)                      NN         NA             NM
A (Additive)                  AN         AA             AM
Ad (Additive damped)          AdN        AdA            AdM
M (Multiplicative)            MN         MA             MM
Md (Multiplicative damped)    MdN        MdA            MdM
2.2 State space models

The state space models enable easy calculation of the likelihood, and provide facilities for the computation of prediction intervals for each model. For each of their methods, Hyndman et al. (2002) proposed two state space models with a single source of error, following the general approach of Ord et al. (1997). The two state space formulations correspond to the additive error and multiplicative error cases; they give equivalent point forecasts but different prediction intervals and likelihoods. The general framework, given by Ord et al. (1997), involves a state vector x_t and state space equations of the form

    y_t = h(x_{t−1}) + k(x_{t−1}) ε_t,        (1)
    x_t = f(x_{t−1}) + g(x_{t−1}) ε_t,        (2)

where {ε_t} is a Gaussian white noise process with mean zero and variance σ². Define e_t = k(x_{t−1}) ε_t and μ_t = h(x_{t−1}); then y_t = μ_t + e_t. The model with additive errors is written as y_t = μ_t + ε_t, so in this case k(x_{t−1}) = 1 and g(x_{t−1}) = g. The model with multiplicative errors is written as y_t = μ_t(1 + ε_t). Thus k(x_{t−1}) = μ_t for this model and ε_t = e_t/μ_t = (y_t − μ_t)/μ_t, so ε_t is a relative error for the multiplicative model. Equation (1) is called the measurement equation, and equation (2) is known as the transition equation. The term h(x_{t−1}) describes the effect of the past on y_t. The term f(x_{t−1}) shows the effect of the past on the current state x_t. The term g(x_{t−1}) ε_t shows the unpredictable change in x_t, and the vector g determines the extent of the effect of the innovation on the state. All of the exponential smoothing methods of Hyndman et al. (2002) can be written in the form of equations (1) and (2); the underlying equations are given in Hyndman et al. (2008, Tables 2.2 and 2.3).
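To make the single-source-of-error structure concrete, the following R sketch (with hypothetical values for α, σ and the initial level) simulates from the simplest case, the ANN model given below, for which h(x_{t−1}) = f(x_{t−1}) = ℓ_{t−1}, k(x_{t−1}) = 1 and g = α:

    # simulate from ANN:  y[t] = l[t-1] + e[t],   l[t] = l[t-1] + alpha * e[t]
    set.seed(1)
    n <- 100; alpha <- 0.3; sigma <- 1
    l <- numeric(n + 1); l[1] <- 10     # l[1] holds the initial level l_0
    y <- numeric(n)
    for (t in 1:n) {
      e      <- rnorm(1, 0, sigma)      # the single source of error
      y[t]   <- l[t] + e                # measurement equation (1)
      l[t+1] <- l[t] + alpha * e        # transition equation (2)
    }

The same innovation e drives both equations, which is exactly what distinguishes this class from the multiple-disturbance models of Section 2.4.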
The only difference between the additive error and multiplicative error models is in the observation equation (1): multiplicative error models are obtained by replacing ε_t in additive error models with μ_t ε_t. The coefficient matrices h, f and g, along with the state vector x_t, can be determined easily from Tables 2.2 and 2.3 of Hyndman et al. (2008), and are given below for some selected additive error models. Here, I_n denotes the n × n identity matrix, 0_n denotes a zero vector of length n, and 0′_n its transpose.

ANN:   h = f = 1,   g = α,   x_t = ℓ_t

AAdN:  h = [1  1],
       f = [ 1  1 ]        g = [ α ]        x_t = [ ℓ_t ]
           [ 0  φ ],           [ β ],             [ b_t ]

ANA:   h = [1  0′_{m−1}  1],
       f = [ 1        0′_{m−1}   0       ]          [ α       ]
           [ 0        0′_{m−1}   1       ],   g =   [ γ       ],
           [ 0_{m−1}  I_{m−1}    0_{m−1} ]          [ 0_{m−1} ]
       x_t = [ℓ_t, s_t, s_{t−1}, ..., s_{t−m+1}]′

AAdA:  h = [1  1  0′_{m−1}  1],
       f = [ 1        1        0′_{m−1}  0       ]          [ α       ]
           [ 0        φ        0′_{m−1}  0       ]          [ β       ]
           [ 0        0        0′_{m−1}  1       ],   g =   [ γ       ],
           [ 0_{m−1}  0_{m−1}  I_{m−1}   0_{m−1} ]          [ 0_{m−1} ]
       x_t = [ℓ_t, b_t, s_t, s_{t−1}, ..., s_{t−m+1}]′

The matrices for AAN and AAA are the same as for AAdN and AAdA respectively, but with φ = 1; m denotes the seasonal frequency.

ES methods are arguably the most widely applied forecasting methods. They can also be regarded as a sub-case of the ARIMA framework (Box et al. 2008), but they are often preferred to ARIMA models due to their simplicity, robustness and accuracy. However, Hyndman et al. (2002) showed that ES is in fact a general class of state space models that can provide optimal forecasts for a broader set of cases than ARIMA. In the opposite direction, an ARIMA model can also be put in the form of a linear state space model. The ES method has the advantage that its algorithm can be implemented relatively easily in a matrix language such as R (R Core Team 2013) or Matlab. This algorithm is
implemented in the R package fpp (Hyndman 2013).
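As a minimal sketch (assuming a univariate ts object y, a name we introduce here), fitting and forecasting with an automatically selected ETS model might look as follows; the ets() function is provided by the forecast package, which fpp loads:

    library(fpp)                    # loads the forecast package, which provides ets()

    fit <- ets(y, ic = "aic")       # select among the ETS models by AIC
    summary(fit)                    # reports the chosen ETS(error, trend, seasonal) triplet
    plot(forecast(fit, h = 10))     # point forecasts with prediction intervals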
2.3 Box-Jenkins ARIMA models

ARIMA models provide another approach to time series forecasting. ARIMA is an acronym for AutoRegressive Integrated Moving Average (integration in this context is the reverse of differencing). Exponential smoothing and ARIMA models are the two most widely used approaches to time series forecasting, and provide complementary approaches to the problem. While exponential smoothing models are based on a description of the trend and seasonality in the data, ARIMA models aim to describe the autocorrelations in the data. Since a vast literature on ARIMA methodology is available (see Box et al. 1994, for example), we provide only a brief description here. Before introducing ARIMA models, we first discuss autoregressive (AR) and moving average (MA) models.

Autoregressive models

In a multiple regression model, we forecast the variable of interest using a linear combination of predictors. In an autoregression model, we forecast the variable of interest using a linear combination of past values of the variable. The term autoregression indicates that it is a regression of the variable against itself. Thus an autoregressive model of order p can be written as

    y_t = c + φ_1 y_{t−1} + φ_2 y_{t−2} + ... + φ_p y_{t−p} + e_t,

where c is a constant and e_t is white noise. This is like a multiple regression, but with lagged values of y_t as predictors. We refer to this as an AR(p) model.

Moving average models

Rather than using past values of the forecast variable in a regression, a moving average model uses past forecast errors in a regression-like model:

    y_t = c + e_t + θ_1 e_{t−1} + θ_2 e_{t−2} + ... + θ_q e_{t−q},

where e_t is white noise. We refer to this as an MA(q) model. Of course, we do not observe the values of e_t, so it is not really a regression in the usual sense.
Non-seasonal ARIMA models

If we combine differencing with autoregression and a moving average model, we obtain a non-seasonal ARIMA model. The full model can be written as

    y′_t = c + φ_1 y′_{t−1} + ... + φ_p y′_{t−p} + θ_1 e_{t−1} + ... + θ_q e_{t−q} + e_t,        (3)

where y′_t is the differenced series (it may have been differenced more than once). The "predictors" on the right hand side include both lagged values of y_t and lagged errors. We call this an ARIMA(p,d,q) model. Once we start combining components in this way to form more complicated models, it is much easier to work with backshift notation, in which equation (3) becomes

    (1 − φ_1 B − ... − φ_p B^p)(1 − B)^d y_t = c + (1 + θ_1 B + ... + θ_q B^q) e_t,

where B^p y_t = y_{t−p}. Selecting appropriate values for p, d and q can be difficult. The auto.arima() function in R does this automatically; it is also available through the R package fpp. The function uses a variation of the Hyndman & Khandakar (2008) algorithm, which combines unit root tests, minimization of the AICc, and maximum likelihood estimation to obtain an ARIMA model. The order of the ARIMA model (i.e., the values of p, d and q) is determined by Akaike's Information Criterion (AIC). The algorithm sets an upper bound of 5 on the orders p and q, since orders greater than 5 are rarely needed in practice. Once the model order has been identified, the parameters c, φ_1, ..., φ_p, θ_1, ..., θ_q are estimated by maximum likelihood. See Box et al. (2008) for details.
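A minimal sketch of the automatic order selection (again assuming a univariate ts object y):

    library(forecast)

    # search for the best ARIMA order, with p and q capped at 5 as described above
    fit <- auto.arima(y, max.p = 5, max.q = 5)
    summary(fit)                    # reports the selected (p,d,q) and estimated coefficients
    plot(forecast(fit, h = 10))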
2.4 Dynamic linear models

In recent decades, dynamic linear models, and state space models more generally, have increasingly become a focus of interest in time series analysis. Dynamic linear models (DLM) represent a special case of general state space models, being linear and Gaussian.
These models are specified by means of two equations:

    y_t = H_t x_t + ε_t,          (4)
    x_t = F_t x_{t−1} + η_t,      (5)
where ε_t ∼ N(0, σ_ε²) and η_t ∼ N(0, σ_η²) are two independent white noise sequences, y_t denotes the observation at time t, and x_t is the random vector of k unobservable states of the system at time t. H_t is a fixed vector and F_t is a fixed state matrix. The usual features of a time series, such as trend and seasonality, can be modeled within this format. Sometimes, H and F are assumed to be independent of t, in which case the model is a time series DLM. Hyndman et al. (2008) refer to such time series DLMs as multi-disturbance or multiple source of error (MSOE) state space models, as equations (4) and (5) have two independent sources of error, ε_t and η_t. When η_t = g ε_t (where g is a fixed vector of persistence parameters), the MSOE state space model becomes equivalent to the single source of error (SSOE) state space model introduced in Section 2.2. The following table (taken from Hyndman et al. 2008) shows the corresponding standard structural models from the two approaches for the local level and linear trend models.

Model    MSOE models                          SSOE models
Level    y_t = ℓ_{t−1} + ε_t                  y_t = ℓ_{t−1} + ε_t
         ℓ_t = ℓ_{t−1} + η_t                  ℓ_t = ℓ_{t−1} + α ε_t
Trend    y_t = ℓ_{t−1} + b_{t−1} + ε_t        y_t = ℓ_{t−1} + b_{t−1} + ε_t
         ℓ_t = ℓ_{t−1} + b_{t−1} + η_t        ℓ_t = ℓ_{t−1} + b_{t−1} + α ε_t
         b_t = b_{t−1} + ζ_t                  b_t = b_{t−1} + β ε_t

Note that although the common symbols ℓ, b and ε are used to represent the level, slope and innovation respectively, their meanings differ between the two frameworks.

It is possible to represent an ARIMA model, whether univariate or multivariate, as a DLM (see Petris et al. 2009, Chapter 3).
However, although an ARIMA model can formally be considered a DLM, the philosophies underlying the two classes of models are quite different. On the one hand, ARIMA models provide a black-box approach to data analysis, offering the possibility of forecasting future observations but with very limited interpretability of the fitted model; on the other hand, the DLM framework encourages the analyst to think in terms of easily interpretable, albeit unobservable, processes that drive the observed time series, such as trend and seasonal components (for details, see Petris et al. 2009, Chapter 3).

For DLMs, estimation and forecasting can be carried out recursively by the well-known Kalman filter (Kalman 1960, Kalman & Bucy 1961). The Kalman filter is a recursive procedure for computing the optimal estimator of the state vector at time t, based on the information available at time t. Its central role is that, when the disturbances and the initial state vector are normally distributed, it enables the likelihood function to be calculated via the prediction error decomposition, which opens the way to estimating any unknown parameters in the model (see Harvey 1989, for details). The R function StructTS in the R package stats is used to fit the DLMs; it uses the Kalman filter to implement the maximum likelihood approach to estimation. Since all the data series considered here are non-seasonal, the best model is selected among the non-seasonal DLMs using the AIC.
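A minimal sketch of fitting and forecasting a local linear (level plus trend) structural model with StructTS, where y is again an assumed ts object:

    # fit the local linear structural model by maximum likelihood via the Kalman filter
    fit <- StructTS(y, type = "trend")    # type = "level" gives the local level model
    fit$coef                              # estimated variances of the disturbances
    pred <- predict(fit, n.ahead = 10)    # Kalman-filter forecasts with standard errors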
3 Model selection criteria

Hyndman et al. (2002) propose an automatic forecasting procedure which selects the best model from among the thirty single source of error state space models that underlie exponential smoothing. The same algorithm is used to select the best models in this paper. Ord et al. (1997) show that

    L(θ, x_0) = n log( Σ_{t=1}^{n} ε_t² ) + 2 Σ_{t=1}^{n} log |k(x_{t−1})|

is equal to twice the negative logarithm of the likelihood function (with constant terms eliminated), conditional on the parameters θ = (α, β, γ, φ) and the initial states x_0 = (ℓ_0, b_0, s_0, s_{−1}, ..., s_{−m+1}). The parameters θ and the initial states x_0 are estimated by minimizing L(θ, x_0); minimizing the negative logarithm of the likelihood function is equivalent to maximizing the logarithm of the likelihood function.
The models are selected using Akaike's Information Criterion (AIC):

    AIC = −2 log L(θ̂, x̂_0) + 2p,

where L denotes the likelihood, p is the number of parameters in θ plus the number of states in x_0, and θ̂ and x̂_0 denote the estimates of θ and x_0. The model is selected by minimizing the AIC amongst all 30 models. For model selection, the AIC is preferable to measures of the forecast error such as the MSE or MAPE, because it penalizes models which contain too many parameters. Obviously, other model selection criteria, such as the BIC, could be used in the same manner. There have been several suggestions for restricting the parameter space of α, β and γ (see Hyndman et al. 2008). The traditional approach is to ensure that the various equations can be interpreted as weighted averages, thus requiring all parameters to lie within (0, 1).
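As a small illustration (y is a hypothetical non-seasonal series; the object returned by ets() stores its AIC in the aic component), two candidate models can be compared directly:

    library(forecast)

    # fit two candidate ETS models for a non-seasonal series
    fit.ann <- ets(y, model = "ANN")    # simple exponential smoothing
    fit.aan <- ets(y, model = "AAN")    # Holt's linear method
    c(ANN = fit.ann$aic, AAN = fit.aan$aic)   # the smaller AIC identifies the preferred model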
3.1 Initialization

The non-linear optimization requires initial values of the parameters θ and the initial states x_0. The initial values of the parameters used are θ = (0.1, 0.01, 0.01, 0.99). The initial values of ℓ_0 and b_0 are obtained using the following heuristic scheme, proposed by Hyndman et al. (2002) and sketched in code below.

• Compute a linear trend on the first 10 observations against a time variable t = 1, ..., 10.
• Set ℓ_0 to be the intercept of the trend.
• For an additive trend, set b_0 to be the slope of the trend.
• For a multiplicative trend, set b_0 = 1 + b/a, where a denotes the intercept and b denotes the slope of the fitted trend.

Using other initial values of the parameters θ and the initial state x_0 makes no difference to our estimation.
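In R, this heuristic amounts to a short linear regression on the first ten observations (y is an assumed series):

    # heuristic initial states, following the scheme described above
    tt  <- 1:10
    fit <- lm(y[1:10] ~ tt)                  # linear trend on the first 10 observations
    l0     <- unname(coef(fit)[1])           # level: intercept of the fitted trend
    b0.add <- unname(coef(fit)[2])           # additive trend: slope
    b0.mul <- 1 + b0.add / l0                # multiplicative trend: 1 + b/a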
3.2 Automatic forecasting

The preceding ideas are combined to obtain a robust and widely applicable automatic forecasting algorithm. The steps involved are summarized below.

• For each series, apply all models, optimizing the parameters of each model.
• Select the best of the models according to the AIC.
• Produce forecasts using the best model (with optimized parameters) for as many steps ahead as required.
• Obtain prediction intervals for the best model, either using the analytical results (see Hyndman et al. 2008) or by simulating future sample paths for y_{n+1}, ..., y_{n+h} and finding the δ/2 and 1 − δ/2 percentiles of the simulated data at each forecast horizon. If simulation is used, the sample paths may be generated using a Gaussian distribution for the errors (parametric bootstrap) or using the resampled errors (ordinary bootstrap); a sketch follows this list.

For some models (particularly the linear models) it is possible to compute analytical prediction intervals, but for others it is very difficult (see Hyndman et al. 2008). In fact, analytical results on prediction intervals exist for only 15 of the 30 models in the Hyndman et al. (2008) exponential smoothing framework. It is therefore preferable to use a simulation approach, which is applicable to all of the models and very easy to implement.
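A minimal sketch of the simulation approach, assuming a fitted ets object fit and the simulate() method supplied by the forecast package (argument names as we understand them; check the package documentation):

    h <- 10; npaths <- 5000; delta <- 0.05

    # simulate future sample paths from the fitted model, resampling the residuals
    paths <- replicate(npaths,
                       simulate(fit, nsim = h, future = TRUE, bootstrap = TRUE))

    # the delta/2 and 1 - delta/2 percentiles at each horizon give the intervals
    pi <- apply(paths, 1, quantile, probs = c(delta / 2, 1 - delta / 2))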
4 Measures of forecast accuracy

In the past, many studies have been conducted with the aim of identifying which method provides the most accurate forecasts for a given class of time series data. In this section, we discuss the error measures used for drawing conclusions about the relative accuracies of the models' forecast performances. The Mean Square Error (MSE) is often used to compare the performances of forecasting models because of its computational convenience and theoretical relevance to statistics.
However, it is scale-dependent, and so should not be used to compare different series. The most widely used unit-free measure is the Mean Absolute Percentage Error (MAPE), and many books recommend its use (see Hanke & Reitsch 1995, Bowerman et al. 2004). However, one disadvantage of the MAPE is that it is relevant only for ratio-scaled data (i.e., data with a meaningful zero). Armstrong & Collopy (1992) recommended the Median Relative Absolute Error (MdRAE) when few series are available, and the Median Absolute Percentage Error (MdAPE) otherwise. Makridakis (1993) discussed some problems with the RAE, and proposed an alternative measure called the Symmetric Mean Absolute Percentage Error (sMAPE), a modification of the MAPE. Armstrong & Collopy (1992) referred to this as the Unbiased Absolute Percentage Error (UAPE). However, Koehler (2001) later showed that the sMAPE actually penalizes low forecasts more than high forecasts, meaning that a more accurate name would be the Mean Asymmetric Absolute Percentage Error. Relative measures and measures based on relative errors both try to remove the scale of the data by comparing the forecasts with those obtained from some benchmark forecasting method, usually the naïve method. However, both have problems. Relative errors have a statistical distribution with an undefined mean and an infinite variance, while relative measures can only be computed when there are several forecasts for the same series, meaning that they cannot be used to measure out-of-sample forecast accuracy at a single forecast horizon. Hyndman & Koehler (2006) proposed a new measure, the Mean Absolute Scaled Error (MASE), which is suitable for all three of these scenarios, and involves scaling the error by the in-sample mean absolute error (MAE) of the naïve forecast method.

With so many measures of forecast accuracy available, it is difficult to choose the most suitable one. In this paper, the following three commonly used forecast accuracy measures are used to analyze the performance of the forecasts of the various time series.
i. Root Mean Squared Error (RMSE)
ii. Mean Absolute Percentage Error (MAPE)
iii. Mean Absolute Scaled Error (MASE)

These particular measures are chosen because several authors and practitioners believe that they are more reliable and more sensitive to small changes than others (see Claeskens & Aerts 2000, Hyndman & Koehler 2006). Short descriptions of these accuracy measures are given below.

i. RMSE

The root mean squared error (RMSE) is defined as

    RMSE = √( (1/n) Σ_{t=1}^{n} (Y_t − F_t)² ),

where Y_t is the observed value, F_t is the forecast (predicted) value at time t, and n is the length of the data series.
ii. MAPE

First, a relative or percentage error is defined as

    PE_t = ((Y_t − F_t) / Y_t) × 100,

where Y_t is the actual value and F_t is the forecast value at time t. The MAPE is then defined as the average of the absolute values of the PE_t (for more details, see Makridakis et al. 1998):

    MAPE = (1/n) Σ_{t=1}^{n} |PE_t|.
iii. MASE

Hyndman & Koehler (2006) proposed a related idea that is suitable for all three of the scenarios described earlier. The scaled error is defined as

    q_t = (Y_t − F_t) / ( (1/(n−1)) Σ_{i=2}^{n} |Y_i − Y_{i−1}| ),        (6)

which is clearly independent of the scale of the data. A scaled error is less than one if it arises from a better forecast than the average one-step naïve forecast computed in-sample. Conversely, it is greater than one if the forecast is worse than the average one-step naïve forecast computed in-sample. The Mean Absolute Scaled Error (MASE) is simply

    MASE = mean(|q_t|).

The in-sample mean absolute error (MAE) is used in the denominator because it is always available and it effectively scales the errors. In contrast, the out-of-sample MAE for the naïve method can be based on very few observations, and is therefore more variable. The MASE is widely applicable: being scale-free, it can be used to compare forecast methods on a single series, and to compare forecast accuracy between series. The only situation in which the measure would be infinite or undefined is when all the historical observations are equal.
5 Empirical analysis

In this section, the exponential smoothing method, the dynamic linear model and the Box-Jenkins method are applied to real agricultural time series data, in order to demonstrate the differences between the methods in an application. The time series used are wheat production (WP), rice production (RP), cotton production (CP) and pulses production (PP). All of the data sets are annual, covering the period 1961 to 2012 for wheat and rice, and the period 1971 to 2011 for cotton and pulses. The data have been collected from various issues of the Agricultural Statistics of Pakistan, published by the Ministry of Food and Agriculture (Economic Wing), Government of Pakistan, and the
Pakistan Economic Surveys, published by the Economic Adviser's Wing, Finance Division, Government of Pakistan. The time series are plotted in Figure 1.

Agriculture is the backbone of Pakistan's economy, and plays an important role in the betterment of the large proportion of the population that lives in rural areas in particular, and of the overall economy in general. The growing population of Pakistan is a real point of concern, which needs to be addressed by proper and timely planning of the food supply. Among the food grain crops, wheat and rice are the main staple foods of the people of Pakistan, as well as being the most important parts of the agriculture sector.
[Figure 1: Time series data. Four panels showing annual production (million tons) of wheat, rice, cotton and pulses against year.]

To enable a forecast comparison, each of the time series (Figure 1) is divided into two parts: a training set (comprising 80% of the data) and a test set (comprising 20% of the data). That is, we hold back the last 20% of the data, called the test set (which we may also refer to as the future sample paths), and fit the models to the rest of the data, called the training set. The best selected exponential smoothing model, the best selected DLM and the best selected ARIMA model are then fitted to the training set using maximum likelihood estimation.
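A sketch of this workflow for one series (wp is a hypothetical ts object holding the wheat production data):

    library(fpp)    # loads the forecast package; StructTS is in the base stats package

    n     <- length(wp)
    split <- round(0.8 * n)
    train <- window(wp, end   = time(wp)[split])        # first 80% for fitting
    test  <- window(wp, start = time(wp)[split + 1])    # last 20% held back

    fit.ets   <- ets(train)                        # best ETS model by AIC
    fit.arima <- auto.arima(train)                 # best ARIMA model
    fit.dlm   <- StructTS(train, type = "trend")   # non-seasonal DLM

    h <- length(test)
    accuracy(forecast(fit.ets, h = h), test)       # RMSE, MAPE, MASE, etc. on the test set

The same accuracy() call can be applied to the forecasts from the other two fitted models.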
In order to compare the performances of the models, we have plotted the forecasts of wheat, rice, cotton and pulses production in Figures 2–5, respectively. The actual values in the forecast period are also shown. Overall, the exponential smoothing method tends to perform well: it is generally the best (for the wheat and rice time series) or second-best (for the cotton and pulses time series) in terms of forecast accuracy, and gives more reasonable forecasts than its DLM and ARIMA counterparts. The performance of the DLM is notable, as it outperforms the ARIMA method in all cases except the pulses series.

From Figure 2, it can be seen that the forecasts from the three methods all look quite similar, underestimating the actual values of wheat production. However, the measures of forecast accuracy (see Table 1) suggest that the exponential smoothing method performs better than its counterparts. Moreover, the actual values in the forecast period are well within the 95% prediction intervals obtained via exponential smoothing.

The forecasts and prediction intervals produced by the three methods for the rice time series are shown in Figure 3. The forecasts obtained from the various methods look quite different; in particular, the prediction intervals are much wider in the case of exponential smoothing than for the other methods. This is because the prediction intervals produced by ES reflect the uncertainty in the data, especially at the end of the training set, which the ARIMA and DLM methods seem to fail to pick up properly. Furthermore, according to the forecast accuracy measures, the exponential smoothing method again performs well relative to the other techniques for the rice time series (see Table 1).

Figure 4 shows that the ES and ARIMA methods are both unable to produce very good forecasts for the cotton time series relative to the DLM, which seems to pick up some form of trend from the training set. According to the forecast accuracy measures, the DLM takes the lead for the cotton time series, although the exponential smoothing method again beats the Box-Jenkins ARIMA technique. Only for the pulses production data (Figure 5), where stationarity is very prominent, does the Box-Jenkins ARIMA technique perform better than the other methods, with ES again second-best. This could be because the data series is stationary and all three methods fail to capture
the pattern in the training set, though they still produce sensible forecasts and forecast intervals.

The various measures of forecast accuracy, namely the root mean squared error (RMSE), the mean absolute percentage error (MAPE) and the mean absolute scaled error (MASE), have been calculated for all data sets over the forecast period (the test set, or future sample paths) for each model. According to these accuracy measures (Table 1), the exponential smoothing method comes out best for the wheat and rice time series, and second-best for the cotton and pulses time series. The DLM performs best for the cotton time series, while the Box-Jenkins ARIMA technique is best for the pulses time series, where stationarity is very prominent. While these are only a few empirical examples, making it hard to draw general conclusions, the evidence is consistent with Figures 2 to 5 in suggesting that the exponential smoothing method performs better than either the DLM or the Box-Jenkins ARIMA method.

Measure   Method   Wheat     Rice     Cotton   Pulses
MASE      ES        1.302    3.472    1.759    1.285
          ARIMA     1.797    3.678    1.864    1.245
          DLM       1.308    3.788    1.138    1.356
MAPE      ES        5.004   14.062   13.544   16.030
          ARIMA     6.830   14.825   14.352   14.902
          DLM       5.026   15.264    8.749   18.914
RMSE      ES        1366     1018      347      182
          ARIMA     1913     1086      366      193
          DLM       1374     1119      248      173

Table 1: Measures of forecast accuracy for each model
6 Conclusions

The main objective of any forecasting method is to make precise predictions of future random events. This paper has assessed different forecasting methods based on their out-of-sample forecast accuracy. Three methods, Box-Jenkins' ARIMA, exponential smoothing (ES) and dynamic linear models (DLM), have been used to produce
predictions for four important crops in Pakistan, which play important roles in Pakistan's economy. Empirical comparisons have been performed using several forecast accuracy measures, which suggest that the exponential smoothing method performs better (being either best or second-best in terms of the forecast accuracy measures) than its competitors, the DLM and the Box-Jenkins ARIMA. The DLM performs best for the cotton time series, followed by ES, while ARIMA does better for pulses, followed by ES. These forecasts may help farmers and financial institutions determine the prices of these four commodities in advance.
Acknowledgements

We would like to thank the Editor, the Associate Editor and two anonymous referees for their helpful and constructive comments and suggestions on an earlier draft of the paper, which have substantially improved the revised manuscript.
[Figure 2: Historical data (solid line) and forecasts (dashed line) for the wheat production time series. Top panel: forecasts from the best selected exponential smoothing model, ETS(M,Md,N). Middle panel: forecasts from the best selected ARIMA model, ARIMA(2,1,0) with drift. Bottom panel: forecasts from the best selected DLM, the local linear structural model. The forecasts are based on models fitted to the data up to 2003. The dark (light) shaded region indicates the 95% (80%) prediction interval.]
[Figure 3: Historical data (solid line) and forecasts (dashed line) for the rice production time series. Top panel: forecasts from the best selected exponential smoothing model, ETS(M,A,N). Middle panel: forecasts from the best selected ARIMA model, ARIMA(0,1,1) with drift. Bottom panel: forecasts from the best selected DLM, the local linear structural model. The forecasts are based on models fitted to the data up to 2003. The dark (light) shaded region indicates the 95% (80%) prediction interval.]
[Figure 4: Historical data (solid line) and forecasts (dashed line) for the cotton production time series. Top panel: forecasts from the best selected exponential smoothing model, ETS(A,N,N). Middle panel: forecasts from the best selected ARIMA model, ARIMA(0,1,0). Bottom panel: forecasts from the best selected DLM, the local linear structural model. The forecasts are based on models fitted to the data up to 2003. The dark (light) shaded region indicates the 95% (80%) prediction interval.]
[Figure 5: Historical data (solid line) and forecasts (dashed line) for the pulses production time series. Top panel: forecasts from the best selected exponential smoothing model, ETS(M,N,N). Middle panel: forecasts from the best selected ARIMA model, ARIMA(1,0,0) with non-zero mean. Bottom panel: forecasts from the best selected DLM, the local linear structural model. The forecasts are based on models fitted to the data up to 2003. The dark (light) shaded region indicates the 95% (80%) prediction interval.]
References

Armstrong, J. S. & Collopy, F. (1992), 'Error measures for generalizing about forecasting methods: empirical comparisons', International Journal of Forecasting 8, 69–80.

Bowerman, B. L., O'Connell, R. T. & Koehler, A. B. (2004), Forecasting, Time Series and Regression: An Applied Approach, Thomson Brooks/Cole, Belmont, CA.

Box, G. E. P., Jenkins, G. M. & Reinsel, G. C. (1994), Time Series Analysis: Forecasting and Control, 3rd edn, Prentice-Hall, Englewood Cliffs, NJ.

Box, G. E. P., Jenkins, G. M. & Reinsel, G. C. (2008), Time Series Analysis: Forecasting and Control, 4th edn, John Wiley, Hoboken, NJ.

Brown, R. G. (1959), Statistical Forecasting for Inventory Control, McGraw-Hill, New York.

Claeskens, G. & Aerts, M. (2000), 'On local estimating equations in additive multiparameter models', Statistics and Probability Letters 49, 139–148.

Gardner, Jr, E. S. (1985), 'Exponential smoothing: the state of the art', Journal of Forecasting 4, 1–28.

Gardner, Jr, E. S. & McKenzie, E. (1985), 'Forecasting trends in time series', Management Science 31(10), 1237–1246.

Hanke, J. E. & Reitsch, A. G. (1995), Business Forecasting, 5th edn, Prentice-Hall, Englewood Cliffs, NJ.

Harvey, A. C. (1989), Forecasting, Structural Time Series Models and the Kalman Filter, Cambridge University Press, Cambridge.

Holt, C. E. (1957), Forecasting trends and seasonals by exponentially weighted averages, O.N.R. Memorandum 52/1957, Carnegie Institute of Technology.

Hyndman, R. J. (2013), fpp: Data for "Forecasting: principles and practice". R package version 0.5. URL: http://CRAN.R-project.org/package=fpp
Hyndman, R. J. & Khandakar, Y. (2008), 'Automatic time series forecasting: the forecast package for R', Journal of Statistical Software 27(3).

Hyndman, R. J. & Koehler, A. B. (2006), 'Another look at measures of forecast accuracy', International Journal of Forecasting 22(4), 679–688. URL: http://www.robhyndman.info/papers/mase.htm

Hyndman, R. J., Koehler, A. B., Ord, J. K. & Snyder, R. D. (2008), Forecasting with Exponential Smoothing: The State Space Approach, Springer-Verlag, Berlin Heidelberg.

Hyndman, R. J., Koehler, A. B., Snyder, R. D. & Grose, S. (2002), 'A state space framework for automatic forecasting using exponential smoothing methods', International Journal of Forecasting 18(3), 439–454. URL: http://www.robhyndman.info/papers/hksg.htm

Kalman, R. E. (1960), 'A new approach to linear filtering and prediction problems', Journal of Basic Engineering 82, 35–45.

Kalman, R. E. & Bucy, R. S. (1961), 'New results in linear filtering and prediction theory', Journal of Basic Engineering 83, 95–108.

Kirby, R. M. (1966), 'A comparison of short and medium range forecasting methods', Management Science 13, B202–B210.

Koehler, A. B. (2001), 'The asymmetry of the sAPE measure and other comments on the M3-competition', International Journal of Forecasting 17, 537–584.

Makridakis, S. & Hibon, M. (2000), 'The M3-competition: results, conclusions and implications', International Journal of Forecasting 16, 451–476.

Makridakis, S. (1993), 'Accuracy measures: theoretical and practical concerns', International Journal of Forecasting 9, 527–529.

Makridakis, S., Anderson, A., Carbone, R., Fildes, R., Hibon, M., Lewandowski, R., Newton, J., Parzen, E. & Winkler, R. (1982), 'The accuracy of extrapolation (time series) methods: results of a forecasting competition', Journal of Forecasting 1, 111–153.
Makridakis, S. & Hibon, M. (1979), 'Accuracy of forecasting: an empirical investigation (with discussion)', Journal of the Royal Statistical Society, Series A 142, 97–145.

Makridakis, S., Wheelwright, S. C. & Hyndman, R. J. (1998), Forecasting: Methods and Applications, 3rd edn, John Wiley & Sons, New York.

Ord, J. K., Koehler, A. B. & Snyder, R. D. (1997), 'Estimation and prediction for a class of dynamic nonlinear statistical models', Journal of the American Statistical Association 92, 1621–1629.

Pegels, C. C. (1969), 'Exponential forecasting: some new variations', Management Science 15(5), 311–315.

Petris, G., Petrone, S. & Campagnoli, P. (2009), Dynamic Linear Models with R, Springer.

R Core Team (2013), R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria. URL: http://www.R-project.org/

Taylor, J. W. (2003), 'Short-term electricity demand forecasting using double seasonal exponential smoothing', Journal of the Operational Research Society 54, 799–805.

West, M. & Harrison, J. (1997), Bayesian Forecasting and Dynamic Models, 2nd edn, Springer-Verlag, New York.

Winters, P. R. (1960), 'Forecasting sales by exponentially weighted moving averages', Management Science 6, 324–342.