8 Big Data: Baseline Forecasting With Exponential Smoothing Models

Prediction is very difficult, especially if it is about the future.
NIELS BOHR (1885–1962), Nobel Laureate Physicist

Exponential smoothing models provide a viable framework for forecasting large-volume, disaggregate demand patterns. For short-term planning and control systems, these techniques are extremely reliable and have a more than adequate track record in forecast accuracy with trend/seasonal data. This chapter deals with the description and evaluation of techniques that

• are widely used in the areas of sales, inventory, logistics, and production planning, as well as in quality control, process control, financial planning, and marketing planning
• can be described in terms of a state-space modeling framework that provides prediction intervals and procedures for model selection
• are well suited for large-scale, automated forecasting applications, because they require little forecaster intervention, thereby freeing the demand forecaster to concentrate on the few problem cases
• are based on the mathematical extrapolation of past patterns into the future, accomplished by using forecasting equations that are simple to update and require a relatively small number of calculations
• capture level (a starting point for the forecasts), trend (a factor for growth or decline), and seasonal factors (for adjustment of seasonal variation) in data patterns

What is Exponential Smoothing?

In Chapter 3, we introduced forecasting with simple and weighted moving averages as an exploratory smoothing technique for short-term forecasting of level data. With exponential smoothing models, on the other hand, we can create short-term forecasts for a wider variety of data having trends and seasonal patterns; the modeling methodology offers prediction limits (ranges of uncertainty) and prescribed forecast profiles. Exponential smoothing provides an essential simplicity and ease of understanding for the practitioner, and it has a reliable track record for accuracy in many business applications.

Exponential smoothing was invented during World War II by Robert G. Brown (1923–2013), who was involved in the design of tracking systems for fire-control information on the location of enemy submarines. Later, the principles of exponential smoothing were applied to business data, especially in the analysis of the demand for service parts in inventory systems, in Brown's book Advanced Service Parts Inventory Control (1982).

8 - 2 EMBRACING CHANGE & CHANCE: Demand Forecasting Explained

As part of the state-space forecasting methodology, exponential smoothing models provide a flexible approach to weighting past historical data for smoothing and extrapolation purposes. This exponentially declining weighting scheme contrasts with the equal weighting scheme that underlies the outmoded simple moving average technique for forecasting. Exponential smoothing is a forecasting technique that extrapolates historical patterns, such as trends and seasonal cycles, into the future. There are many types of exponential smoothing models, each appropriate for a particular forecast pattern or forecast profile. As a forecasting tool, exponential smoothing is widely accepted and proven for a wide variety of short-term forecasting applications; most inventory planning and production control systems rely on it to some degree. We will see that the process for assigning smoothing weights is simple in concept and versatile for dealing with diverse types of data.

Other advantages of exponential smoothing are that the methodology takes account of trend and seasonal patterns in time series; embodies a weighting scheme that gives more weight to the recent past than to the distant past; is easily automated, making it especially useful for large-scale forecasting applications; and can be described in a modeling framework needed for deriving useful statistical prediction limits and forecast profiles.

When selecting a model for demand forecasting, focus on forecast profiles rather than fit statistics and model coefficients.

For demand forecasting, the disadvantages are that exponential smoothing models do not easily allow for the inclusion of explanatory variables in a forecasting model and cannot handle business cycles. Hence, such techniques are not expected to perform well on business data that exhibit cyclical turning points.

Smoothing Weights

To understand how exponential smoothing works, we first need to understand the concept of exponentially decaying weights. Consider a time series of production rates (number of completed assemblies per week) for a 4-week period in the following table:

Week                          Production
Three periods ago (T - 3)     266
Two periods ago (T - 2)       411
Previous (T - 1)              376
Current (t = T)               425

In order to predict next period's (T + 1) production rate without having knowledge of or information about future demand, we assume that the following week will be an average week for production. A reasonable projection for the following week can be based on taking an average of the production rates during past weeks. However, what kind of average should we propose?

Equally Weighted Average. The simplest option, described in Chapter 3, is to select an equally weighted average, which is obtained by giving equal weight to each of the weeks of available data:

(425 + 376 + 411 + 266) / 4 ≈ 370

This equally weighted average is simply the arithmetic mean of the data. The forecast of next week's production rate is 370 assemblies. Implicitly, we are assuming that events of 2 and 3 weeks prior (i.e., the more distant past) are as relevant to what may happen next week as are events of the current and prior weeks.
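The equally weighted average above can be sketched in a couple of lines of Python; note that the exact mean of these four values is 369.5, which the text rounds to 370.

```python
# Equally weighted (simple) average forecast for next week's production.
# Data are the four weekly production rates from the table above.
production = [266, 411, 376, 425]  # oldest (T - 3) to current (T)

forecast = sum(production) / len(production)  # arithmetic mean = 369.5
print(round(forecast))                        # 370 assemblies (rounded)
```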


In Figure 8.1, a weight is denoted by wi, where the subscript i represents the number of weeks into the past. For an equally weighted average, the weight given to each of the terms is 1/n, where n is the number of time periods. With n = 4, each weight in column 3 is equal to 1/4. If we consider only the latest week, we have another option, shown in column 4 of Figure 8.1: the Naïve_1 forecast, which places all weight on the most recent data value. Thus, the forecast for next week's production rate is 425, the same as the current week's production. This forecast makes sense if only the current week's events are relevant in projecting the following week; whatever happened before this week is ignored.

Figure 8.1 Various weighting schemes.

Exponentially Decaying Weights. Most business forecasters find a middle ground more appealing than either of the two extremes, equally weighted or Naïve_1. In between lie weighting schemes in which the weights decay as we move from the current period to the distant past:

w1 > w2 > w3 > w4 > . . .

The largest weight, w1, is given to the most recent data value. This means that to forecast next week's production rate, this week's figure is most important, last week's is less important, and so forth. Many patterns are possible with decaying weight schemes. As illustrated by column 5 of Figure 8.1, the weights start at 40% for the most recent week and decline steadily to 10% for week T - 3. Our forecast for week t = T + 1 is the weighted average with decaying weights:

425 x 0.4 + 376 x 0.3 + 411 x 0.2 + 266 x 0.1 = 392

This weighted average gives a production rate forecast that is higher than that of the equally weighted average and lower than that of the Naïve_1 forecast.

An exponentially weighted average refers to a weighted average of the data in which the weights decay exponentially. The most useful example of decaying weights is that of exponentially decaying weights, in which each weight is a constant fraction of its predecessor. A fraction of 0.50 implies a decay rate of 50%, as shown in column 6 of Figure 8.1. In forecasting next period's value, the current period's value is weighted 0.5, the prior week's half of that at 0.25, and so forth, with each new weight 50% of the one before. (These weights must be adjusted to sum to unity, as in column 7.) From Figure 8.1, we can see that the adjusted weights are obtained by dividing the exponentially decaying weights by their sum, 0.9375. Figure 8.2 illustrates the weighted average of all past data, with recent data receiving more weight than older data. (The most recent data value is at the bottom of the spreadsheet.) The weight on each data value is shown in Figure 8.3.
The weights decline exponentially with time, a feature that gives exponential smoothing its name.
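Both weighting schemes above can be reproduced in a short Python sketch: the decaying weights of column 5 give the forecast of 392, and the exponentially decaying weights of column 6 are normalized by their sum (0.9375) to give the adjusted weights of column 7.

```python
production = [425, 376, 411, 266]   # most recent first: T, T-1, T-2, T-3

# Column 5: a generic decaying-weight scheme (40%, 30%, 20%, 10%).
decaying = [0.4, 0.3, 0.2, 0.1]
forecast = sum(w * y for w, y in zip(decaying, production))
print(round(forecast))               # 392

# Columns 6-7: exponentially decaying weights, each 50% of its predecessor,
# then adjusted to sum to unity by dividing by their total (0.9375).
raw = [0.5 * 0.5**i for i in range(4)]      # 0.5, 0.25, 0.125, 0.0625
adjusted = [w / sum(raw) for w in raw]      # now sums to 1.0
```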


Figure 8.2 Computation of weights on historical data.

Figure 8.3 Exponentially decaying weights.

Simple Exponential Smoothing Method

All exponential smoothing techniques incorporate an exponential-decay weighting system, hence the term exponential. Smoothing refers to the averaging that takes place when we calculate a weighted average of the past data. To determine a one-period-ahead forecast of historical data, the projection formula is given by

Yt(1) = α Yt + (1 - α) Yt-1(1)

where Yt(1) is the smoothed value at time t, based on weighting the most recent observation Yt with a weight α (α is a smoothing parameter) and the current period's forecast (or previous smoothed value) Yt-1(1) with a weight (1 - α). By rearranging the right-hand side, we can rewrite the equation as

Yt(1) = Yt-1(1) + α [Yt - Yt-1(1)]

which can be interpreted as the current period's forecast Yt-1(1) adjusted by a proportion α of the current period's forecast error [Yt - Yt-1(1)].
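The error-correction form of the update is short enough to state directly in code; a minimal sketch (the function name is ours, not the book's):

```python
# One step of simple exponential smoothing in error-correction form:
# new forecast = previous forecast + alpha * (actual - previous forecast).
def ses_update(prev_forecast: float, actual: float, alpha: float) -> float:
    return prev_forecast + alpha * (actual - prev_forecast)

# Equivalent weighted-average form: alpha*actual + (1 - alpha)*prev_forecast.
print(ses_update(100.0, 120.0, 0.5))   # 110.0
```

With α = 0.5, a forecast of 100 against an actual of 120 moves halfway toward the observation, to 110.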


The simple exponential smoothing method produces forecasts that are a level line for any period in the future; it is not appropriate for projecting trending data or more complex patterns. We can now show that the one-step-ahead forecast Yt(1) is a weighted moving average of all past observations, with the weights decreasing exponentially. If we substitute for Yt-1(1) in the first smoothing equation, we find that

Yt(1) = α Yt + (1 - α) [α Yt-1 + (1 - α) Yt-2(1)]
      = α Yt + α (1 - α) Yt-1 + (1 - α)^2 Yt-2(1)

If we next substitute for Yt-2(1), then for Yt-3(1), and so on, we obtain the result

Yt(1) = α Yt + α (1 - α) Yt-1 + α (1 - α)^2 Yt-2 + α (1 - α)^3 Yt-3 + α (1 - α)^4 Yt-4 + . . . + α (1 - α)^(t-1) Y1 + (1 - α)^t Y0(1)

The one-step-ahead forecast YT(1) represents a weighted average of all past observations. For four selected values of the parameter α, the weights that are assigned to the past observations are shown in the following table:

Weight assigned to:   α = 0.1   α = 0.3   α = 0.5   α = 0.9
YT                    0.1       0.3       0.5       0.9
YT-1                  0.09      0.21      0.25      0.09
YT-2                  0.081     0.147     0.125     0.009
YT-3                  0.0729    0.1029    0.0625    0.0009
YT-4                  0.0656    0.0720    0.0313    0.00009
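The weights in the table follow directly from the expansion above: the weight on the observation i periods back is α(1 - α)^i. A few lines of Python regenerate the table (values rounded to four decimals):

```python
# Weights alpha * (1 - alpha)**i assigned to Y_{T-i} by simple exponential
# smoothing, reproducing the table above for four values of alpha.
def weights(alpha: float, n: int) -> list:
    return [alpha * (1 - alpha) ** i for i in range(n)]

for a in (0.1, 0.3, 0.5, 0.9):
    print(a, [round(w, 4) for w in weights(a, 5)])
```

Note that larger α makes the weights die out faster: with α = 0.9, almost all weight sits on the most recent observation.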

In Figure 8.4, we calculate a forecast of the production data, assuming that α = 0.5. (The production data are repeated in Figure 8.4, in the Actual column.) To use the formula, we need a starting value for the smoothing operation - a value that represents the smoothed average at the earliest week of our time series, here t = T - 3. The simplest choice for the starting value is the earliest data point. In our example, the starting value for the exponentially weighted average is the production rate for week t = T - 3, which was given as 266. The final result, YT (1) = 391 (rounded) for week t = T, is called the current level. It is a weighted average of 4 weeks of data, where the weights decline at a rate of 50% per week.
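The updating in Figure 8.4 can be checked directly: start the smoothed average at the earliest observation (266) and apply the weighted-average update with α = 0.5 through the current week.

```python
# Reproducing the computation of Figure 8.4: exponentially weighted average
# of the production data with alpha = 0.5, started at the earliest value.
alpha = 0.5
production = [266, 411, 376, 425]   # weeks T-3 ... T

level = production[0]               # starting value: earliest data point
for y in production[1:]:
    level = alpha * y + (1 - alpha) * level

print(round(level))                 # 391, the current level at week T
```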

Figure 8.4 Updating an exponentially weighted average.

We defined a one-period-ahead forecast made at time t = T to be YT(1). Likewise, the m-period-ahead forecast is given by YT(m) = YT(1), for m = 2, 3, . . . . For a time series with a relatively constant level,

this is a good forecasting technique. We called this simple smoothing in Chapter 3, but it is generally known as simple exponential smoothing. The forecast profile of the simple exponential smoothing method is a level horizontal line.

Simple exponential smoothing works much like an automatic pilot or a thermostat. At each time period, the forecasts are adjusted according to the sign of the forecast error (actual data minus forecast). If the current forecast error is positive, the next forecast is increased; if the error is negative, the forecast is reduced. To get the smoothing process started (Figure 8.5), we set the first forecast (cell E8) equal to the first data observation (cell D8). (We can also use the average of the first few data observations.) Thereafter, the forecasts are updated as follows: in column F, each error is equal to actual data minus forecast; in column E, each forecast is equal to the previous forecast plus a fraction of the previous error. This fraction is called the smoothing weight (cell I2).

But how do we select the smoothing weight? The smoothing weight is usually chosen to minimize the mean squared error (MSE), a statistical measure of fit. This smoothing weight is called optimal because it is our best estimate based on a prescribed criterion (MSE). Forecasts, errors, and squared errors are shown in columns E, F, and G. The one-step-ahead forecast (= 16.6 in cell E20) extends one period into the future. The travel expense data, smoothed values, and the one-period-ahead forecast are shown graphically in Figure 8.6.
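The MSE-based choice of smoothing weight described above can be sketched as a simple grid search. The data series below is hypothetical (the chapter's travel-expense figures are not reproduced here), and the grid search stands in for the optimizer a forecasting package would use.

```python
# Choose the smoothing weight alpha that minimizes the sum of squared
# one-step-ahead errors over the fit period (equivalent to minimizing MSE).
def sse(data: list, alpha: float) -> float:
    forecast = data[0]              # start: first forecast = first observation
    total = 0.0
    for actual in data[1:]:
        total += (actual - forecast) ** 2
        forecast += alpha * (actual - forecast)   # error-correction update
    return total

data = [12.0, 14.5, 13.8, 15.2, 16.1, 15.0, 16.8, 17.5]  # hypothetical series
best = min((a / 100 for a in range(1, 100)), key=lambda a: sse(data, a))
print(best)   # the "optimal" weight on this grid
```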

Figure 8.5 Forecasting with simple exponential smoothing – company travel expenses.

Figure 8.6 Simple exponential smoothing: company travel expenses and one-period ahead forecasts.


Forecast Profiles for Exponential Smoothing Models

A system of exponential smoothing models can be classified by the type of trend and/or seasonal pattern generated as the forecast profile. The most appropriate technique to use for any forecast should match the profile expected or desired in an application. Figure 8.7 shows the extended Pegels classification of 12 forecast profiles for exponential smoothing, developed by Everette S. Gardner in a seminal paper, "Exponential smoothing: The state of the art," Journal of Forecasting (1985).

A Pegels classification of exponential smoothing methods gives rise to 12 forecast profiles for trend and seasonal patterns. After a preliminary examination of the data from a time plot, we may be able to determine which of the dozen models seems most suitable. In Figure 8.7, there are four types of trends to choose from (Nonseasonal column), and two types of seasonality (Additive and Multiplicative).

Figure 8.7 Pegels' classification of exponential smoothing techniques, extended to include the damped trend technique.

For a downwardly trending time series, multiplicative seasonality appears as steadily diminishing swings about a trend. For level data, the constant-level multiplicative and additive seasonality techniques give the same forecast profile. Each profile can be directly associated with a specific exponential smoothing model (Figure 8.8), as described in the next section (some are referred to by a common name attributed to their authors). We now explain how each model works to generate forecasts; that is, we describe how each model produces the appropriate forecast profile.

Model Name        Trend Profile       Seasonal Profile             State Space Classification
Simple (single)   None                None                         (N, N)
Holt              Additive (Linear)   None                         (A, N)
Holt-Winters      Additive (Linear)   Additive or Multiplicative   (A, A) or (A, M)

Figure 8.8 Most commonly implemented exponential smoothing models.

Smoothing Levels and Constant Change

An exponential smoothing model comprises one or more of the following components: the current level, the current trend, and the current seasonal index.

• The current level serves as the starting point for the forecast. It is calculated to represent an exponentially weighted average of the time series at the end of the fit period. We can regard the current level as the value the time series would now have if there were nothing at all unusual going on at present. An alternative is to use the last observation as the starting point, but doing so might set the forecasts off at the wrong level if the most recent data are abnormal.
• The current trend represents the amount by which we expect the time series to grow or decline per time period into the future. It is often calculated as an exponentially weighted average of past period-to-period changes in the level of the series. In this way, recent growth or decline in the time series is given more weight than changes farther back in time.
• The current seasonal index is interpreted the same way as a conventional seasonal index – as the amount or degree by which the season's value tends to exceed or fall short of the norm. Recall that in the classical decomposition of a time series, a (multiplicative) seasonal index measured the norm as a moving average (see Chapter 6).

The determination of the current seasonal index, a key part of the Holt-Winters method, differs from the conventional indexes in two respects:

1. Difference in weighting years. In a ratio-to-moving-average (RMA) method, the data used over the years to determine a monthly or quarterly index are weighted equally. In contrast, the Holt-Winters method takes an exponentially weighted average of the ratios to level, thus giving more weight to recent years than to those of the past.
2. Representing the norm. The Holt-Winters method uses the current level of the series instead of a moving average of four quarters or 12 months.
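The first point above, exponential weighting of the ratios to level, can be sketched as a one-line update; the function name and example values are ours, and gamma is the seasonal smoothing parameter.

```python
# Sketch of a multiplicative seasonal-index update in the Holt-Winters style:
# the new index is an exponentially weighted average of the current
# ratio-to-level and the old index, so recent years count more.
def update_seasonal_index(old_index: float, actual: float,
                          level: float, gamma: float) -> float:
    return gamma * (actual / level) + (1 - gamma) * old_index

# e.g. actual 120 against a current level of 100, old index 1.10, gamma 0.2:
print(round(update_seasonal_index(1.10, 120.0, 100.0, 0.2), 4))   # 1.12
```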

The interpretation and estimation of the parameters in the smoothing algorithms is a complex matter that we will not deal with here. The formulas for the current level, trend, and seasonal indexes can be found in Hyndman, Koehler, Ord, and Snyder's book Forecasting with Exponential Smoothing (2008), which describes in detail the state-space framework for forecasting time series with exponential smoothing. Optimal or near-optimal parameter settings are derived using prescribed criteria for optimal selection found in modern forecasting software tools (e.g., the R forecast package, robjhyndman.com/software/forecast, and the free PEERForecaster Add-in for Excel, www.peerforecaster.com). In addition, combinations of multiple parameter values severely limit an intuitive feel for their impact on the forecasts. Moreover, for inventory replenishment planning purposes, a demand forecaster needs to rely on the automatic forecasting features of modern software because of the very large volume of data involved.

We now illustrate the use of the smoothing equations for (1) calculating the current level, trend, and seasonal indexes and (2) combining these values into the forecasting formula. Consider Figure 8.9 for weekly shipments of a canned beverage. Figure 8.10 is a time plot of the weekly shipments for 66 weeks, shown previously in Chapter 6. The series is highly variable but not trending. The average level is approximately 4000 – 5000 units per week, with a standard deviation of 3639. Some of the high peaks may be


attributed to promotions, but we will choose to forecast it using simple exponential smoothing. This model is appropriate for data lacking trend and seasonality.

Figure 8.9 Time plot of weekly shipments of a canned beverage (66 weeks) – no trend or seasonality. (Source: Figure 8.10)

Figure 8.10 Weekly shipments of a canned beverage – no trend or seasonality.

The simple exponential smoothing model is fit to all 66 weeks, and forecasts are made for 4 weeks ahead. The optimal estimate (based on the MSE criterion) of the smoothing parameter is 0.62, with MSE = 1669. The multi-period forecasts are a constant level (= 5129); thus, the forecast represents a "typical" level. Figure 8.11 displays the most recent 20 weeks of historical shipments, 20 weeks of fitted values, and the four forecasts. Because simple exponential smoothing views the future of the time series as lacking both trend and seasonality, the forecasting equation does not contain these terms, leaving the current level as the sole component.

The current level Lt is calculated by an equation for an exponentially smoothed average of the past data, so the forecast is

YT(m) = Lt

Figure 8.11 Forecast model for canned beverage shipments: simple exponential smoothing: no trend, no seasonality. (Source: Figure 8.10)

Figure 8.12 Annual car registrations – linear trend.

Consider Figure 8.12 for annual car registrations. Figure 8.13 shows a time plot of the data. Because the data are annual, the time series is necessarily nonseasonal. The global trend appears to be linear, although there are a number of local variations on the trend. Figure 8.13 also graphs the output from the Holt model (A, N) – linear trend, no seasonality. We see that, from the vantage point of Year 19, the current level of car registrations is estimated to be 1034 and the current trend is estimated to be an increase of 31 registrations per year. For nonseasonal data, the seasonal index terms are not present in the forecasting equations. What remains in the Holt method is the forecasting equation for a linear trend:


YT(m) = Lt + m x Tt

The forecast for Year 22 is 1126, based on calculating a 3-year-ahead projection from the base year T = 19:

Y19(3) = 1033.62 + 3 x 30.78 = 1126 (rounded)
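The Holt projection above is simple enough to verify directly (the function name is ours):

```python
# Holt's linear-trend forecast: Y_T(m) = L_T + m * T_T.
# Level and trend values are taken from the car-registration example.
def holt_forecast(level: float, trend: float, m: int) -> float:
    return level + m * trend

print(round(holt_forecast(1033.62, 30.78, 3)))   # 1126
```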

Figure 8.13 Time plot of the historical data and forecast model for car registrations: Holt method - linear trend, no seasonality. (Source: Figure 8.12)

Damped and Exponential Trends

Everette S. Gardner introduced the damped trend exponential smoothing model in a seminal paper, "Exponential smoothing: The state of the art," Journal of Forecasting (1985), and updated it 20 years later in an equally important paper, "Exponential smoothing: The state of the art – Part II," International Journal of Forecasting (2006). These damped trend exponential smoothing models offer practitioners a well-tested and consistently top-performing approach to trend-seasonal time series forecasting.

In the seasonal models, the trend component, if one is present, is assumed to be linear. Trends can also be nonlinear. For example, a damped (upward) trend assumes that the series will continue to grow but that the growth gradually dampens out. An exponential growth trend assumes that the series will grow by a progressively larger amount. Exponential growth is equivalent to a constant percentage rate of growth: as the base grows over time, the constant percentage increase on the base translates into larger and larger increments in volume, in the manner of compound-interest growth on an investment. In the case of a downward trend, the damped and exponential patterns are similar. During a phase-out or a decline under adverse market conditions, a forecast profile will be decaying without becoming negative. We may refer to the pattern of shipments of a cosmetic product (Figure 8.14) either as a downwardly damped trend or as exponential decay.

Figure 8.14 Time plot of weekly shipments of a cosmetic product – damped trend exponential smoothing model.

Like a no-trend or linear-trend model, the exponential trend may be used in conjunction with multiplicative, additive, or no seasonality. We use the nonseasonal case here to illustrate damped and exponential trends. (When seasonality is included, it is called the Holt-Winters procedure, discussed later.) Both damped and exponential trends can be represented in a single forecasting equation, given by

YT(m) = Lt + (φ + φ^2 + . . . + φ^m) x Tt

where m is the length of the forecast horizon. The symbol φ is called the trend-modification parameter. Depending on the value of φ, the forecast profile can be an exponential trend, linear trend, damped trend, or constant level:

If φ > 1, the trend is exponential.
If φ = 1, the trend is linear.
If φ < 1, the trend is damped.
If φ = 0, there is no trend.

Figure 8.15 shows the historical and forecast values for a damped trend and an exponential trend model of the annual car registration series. The growth in the forecasts, the change from the prior year's forecast, has dampened. With φ = 0.83, the forecasted trend is slowing by 1 - 0.83 = 0.17, or 17%, per period. The estimates of the level and trend weights are 0.6 and 0.2, respectively.

Figure 8.15 Time plot for historical car registrations with two exponential smoothing forecasts – damped trend and exponential trend. (Source: Figures 8.16 and 8.17)

We can illustrate how the forecast was obtained for Year 20 (= T + 1); the fitted value for Year 19 is 1078.55, Lt = 1060.86, and Tt = 18.75. Setting m = 1,

Y19(1) = Lt + φ x Tt = 1060.86 + 0.83 x 18.75 = 1076.42

The Year 21 forecast is calculated as follows:

Y19(2) = Lt + (φ + φ^2) x Tt = 1060.86 + (0.83 + 0.83^2) x 18.75 = 1089.34

Alternatively, we can obtain the forecast for T + 2 by calculating

YT(2) = YT(1) + φ^2 x Tt
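The two damped-trend calculations above can be checked with a short function (the name is ours) that accumulates the geometric sum φ + φ² + … + φ^m:

```python
# Damped-trend forecast: Y_T(m) = L_T + (phi + phi**2 + ... + phi**m) * T_T.
# Level, trend, and phi are taken from the car-registration example.
def damped_forecast(level: float, trend: float, phi: float, m: int) -> float:
    return level + sum(phi ** i for i in range(1, m + 1)) * trend

print(round(damped_forecast(1060.86, 18.75, 0.83, 1), 2))   # 1076.42
print(round(damped_forecast(1060.86, 18.75, 0.83, 2), 2))   # 1089.34
```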


Figures 8.16 and 8.17 show a comparison of the forecasts from the exponential trend and the damped trend models for the car registration data. Based on the MSE criterion, the MSE of the damped trend model is approximately 16% smaller than that of the linear trend model. The five-period-ahead forecast (fifth-year projection) for the damped trend model is approximately 5% below the linear trend projection. Whether the difference in the 5-year projections is significant depends on the context in which these forecasts are used. A comparison of the forecasts by the two techniques is shown graphically in Figure 8.15.

Figure 8.16 Exponential trend forecast model for car registrations (dampening factor = 1.0).

Figure 8.17 Damped trend forecast model for car registrations (dampening factor = 0.83).

Some Spreadsheet Examples

Annual Sales of a Cosmetic Product. Consider the trend shown in Figure 8.18 for the yearly sales of a cosmetic product. The sales for this cosmetic product, for the period 1978 – 2002, show a declining trend. The forecasts decline exponentially, modeled by a nonlinear trend exponential smoothing model. Figure 8.19 presents a comparison of a linear trend and a damped exponential trend model for the cosmetic product. Note that minimizing MSE should not be the only criterion for selecting a model; the forecast profile should also be considered. In this case, the linear trend model with the lowest MSE also yields much lower forecasts over the forecast period than the damped trend model. It may require some judgment on the part of the demand forecaster to determine the most appropriate profile for the data at hand.


Figure 8.18 Sales of a cosmetic product – damped exponential trend.

Figure 8.19 Forecast models for a cosmetic product: damped and linear trend exponential smoothing. (Source: Figure 8.18)

To illustrate how the damped exponential trend forecast was obtained for the year 2005 (= T + 3), we set m = 3 and T = 2002:

Y2002(3) = Lt + (φ + φ^2 + φ^3) x Tt = 279.85 + (0.88 + 0.88^2 + 0.88^3) x (-29.86) = 210.10

The difference in the forecast profiles for the linear trend and the damped trend arises from the value of the trend-modification parameter, which is below unity (φ = 0.88) for the damped trend model. This value of φ leads to a decreasing change over the prior year's forecast and is characteristic of an exponential decline. On the other hand, the linear trend model, for which the trend-modification parameter is


constrained to φ = 1, gives rise to a linear trend forecast profile. A comparison of the two techniques yields a statistical summary (Figure 8.20) and forecast profiles for the cosmetic sales data (Figure 8.21) that indicate the difference in the rounded results. Exponential trends should be applied with caution, and careful consideration should be given to the underlying business environment of the data. Blind acceptance of optimal parameter estimates and best MSE values should be avoided.

Figure 8.20 A four-period forecast comparison for sales of a cosmetic product: linear and damped trend exponential smoothing. (Source: Figure 8.18)

Figure 8.21 A time plot of history and forecast profiles of sales of a cosmetic product: damped trend exponential smoothing (φ = 0.88) compared to linear trend (φ = 1.0). (Source: Figure 8.18)


Figure 8.22 Sales of a cosmetic product: log transform, linear trend. (Source: Figure 8.18)

Figure 8.23 Forecast model for sales of a cosmetic product (logarithmic transformation): linear trend. (Source: Figure 8.22)

Now consider Figure 8.22. By taking a transformation of the data, exponential growth or decay patterns can also be modeled by applying a linear trend (Holt model) to the logarithm (base 10) of the time series. The results are shown in Figure 8.23. The forecasts in Figure 8.23 are calculated first in terms of the logarithm of the series and then transformed back to the original scale by exponentiation. To illustrate how the forecast was obtained for the year 2006 (= T + 4), we set m = 4 and T = 2002:

Forecast on the log10 scale = 2.44 + 4 x (-0.04) = 2.28
Transformed back to the original scale: 10^2.28 ≈ 190

The results can be compared with the corresponding forecasts from the damped trend exponential model (DT) in Figure 8.20. The log-transform approach yields forecasts that are slightly lower than those shown in Figure 8.20; however, both depict a forecast profile with a negative exponential trend. Depending on the context in which these projections are used in practice, the differences could become substantial. This illustrates the limitation of using these types of techniques for extrapolating highly trending annual


time series for more than a couple of periods. More important, when we calculate prediction limits on the forecasts, these two approaches also give different interpretations. Because of the transformation, the log-transformed model will result in asymmetric (right-skewed) prediction limits, whereas the original model will give symmetric limits.

Company Sales – Linear Trend. Consider Figure 8.24a. The initial values for level and trend are in the top cells of the columns Level and Trend, respectively. Thereafter, adding a fraction of the error to each smooths the level and trend. This is the same self-correcting idea used in the simple smoothing technique. The cells for LEVEL WEIGHT and TREND WEIGHT show that we used individual weights to smooth the level and trend. Each forecast is just the sum of the latest estimates of level and trend. For example, the linear trend forecast for 1990 is 48.02, the sum of the level and trend at the end of 1989 (46.08 + 1.94). At the end of the data, we can forecast as many years ahead as we like. The forecast for 2 years ahead (1991) is 49.96, the 1990 forecast plus another increment of trend (48.02 + 1.94).

For most business data, the magnitudes of the level and trend can differ greatly. In this example, the numbers in the level column range from 7 to 25 times larger than the numbers in the trend column. Thus, the level weight has to be larger than the trend weight to give reasonable forecasts. The weights shown minimize the MSE. (See Figure 8.25 for the time plot.)

Company Sales – Damped Trend. Figure 8.24b is identical to Figure 8.24a except that a new parameter has been added to modify the trend. Each forecast is the latest estimate of the level plus the trend times a trend modifier (shown in the cell for TREND MODIFIER). For example, the forecast for 1990 is 46.61.
The formula is:

Forecast = (Level at the end of 1989) + (Trend modifier) x (Trend at the end of 1989) = 45.78 + (0.80 x 1.03) = 46.61

To generate forecasts more than one period into the future, the formula is

Forecast = (Last forecast) + (Trend modifier)^(# of periods into the future) x (Final trend estimate)

The forecast for 1991 is thus 46.61 + 0.80^2 x 1.03 = 47.27.

Note that the trend modifier in Figure 8.24b is a fraction. Raising a fraction to a power produces smaller numbers as we move farther into the future. The result is called a damped trend because the amount of trend added to each new forecast declines. By changing the trend modifier, we can produce different kinds of trend. A modifier equal to 1.0 yields a linear trend, exactly the same result as in the linear trend worksheet. A modifier greater than 1.0 yields an exponential trend, in which the amount of growth gets larger each period. A modifier of 0 produces no trend at all, and the results are the same as those of simple exponential smoothing. The weights shown minimize the MSE, given that the trend modifier has been set. (See Figure 8.25 for the time plot.) In general, damped trend techniques are better suited for smoothing short-term patterns for operational forecasting in inventory and production planning than for long-term forecasting.
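The damped-trend forecast rule can be sketched in a few lines of Python. The level (45.78), trend (1.03), and trend modifier (0.80) below are the final estimates from the company sales example; the small differences from the text's 46.61 and 47.27 arise because the text rounds intermediate values.

```python
# Sketch of the damped-trend forecast rule: each step ahead adds
# (trend modifier)^h times the final trend estimate to the last forecast.
def damped_forecasts(level, trend, modifier, horizon):
    forecasts = []
    f = level
    for h in range(1, horizon + 1):
        f = f + (modifier ** h) * trend  # damped increment shrinks each period
        forecasts.append(f)
    return forecasts

# Final estimates from the company sales example (Figure 8.24b).
print([round(f, 2) for f in damped_forecasts(45.78, 1.03, 0.80, 2)])
# -> [46.6, 47.26]
```

Setting the modifier to 1.0 in the same function reproduces the linear (Holt) trend, and a modifier of 0 reproduces simple exponential smoothing, exactly as described above.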

8 - 18 EMBRACING CHANGE & CHANCE: Demand Forecasting Explained

Figure 8.24 Company sales: comparison of exponential smoothing of (a) linear and (b) damped-exponential trends.


Figure 8.25 Time plot of company sales: linear and damped trend smoothing.

Trend-Seasonal Models with Prediction Limits

The forecasting equations for additive and multiplicative seasonality are:

Additive: Y_T(m) = L_T + m x T_T + Seasonal index

Multiplicative: Y_T(m) = [L_T + m x T_T] x Seasonal index

where Y_T(m) denotes a forecast made at time T, the final season in the fit period, for m periods into the future. This technique is known as the Holt-Winters procedure. Unlike the nonseasonal exponential smoothing techniques, seasonality introduces a complexity that makes the smoothing equations less intuitive to interpret directly. Fortunately, these algorithms are available in software, such as the R forecast package (robjhyndman.com/software/forecast) and the free Excel add-in PEERForecaster.xla (www.peerforecaster.com), so we show here only that the forecast for m periods ahead can be compiled using three steps:

1. Start at the current level, L_T.
2. Add the product of the current trend, T_T, and the number of periods, m, that we are projecting ahead.
3. Adjust the resulting sum of level and trend for seasonality through a multiplicative or additive seasonal index.
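The three steps above can be written directly as a small function; a minimal sketch, with no fitted values assumed:

```python
# Steps 1-3 of the Holt-Winters forecast: start at the level, project the
# trend m periods ahead, then apply the seasonal index.
def hw_forecast(level, trend, m, seasonal_index, multiplicative=True):
    base = level + m * trend              # steps 1 and 2
    if multiplicative:
        return base * seasonal_index      # step 3, multiplicative form
    return base + seasonal_index          # step 3, additive form
```

For example, a level of 100, a trend of 2 per period, and a multiplicative index of 1.10 give hw_forecast(100, 2, 3, 1.10) = 116.6 for three periods ahead.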

Figure 8.26 is a time plot of a quarterly automobile sales time series that we analyze in a preliminary two-way ANOVA table decomposition (Figure 8.27). Approximately 68% of the total variation was attributed to seasonality. The trend appears to be linear, with seasonal peaks occurring in the second quarter of each year. The key results are summarized in Figure 8.28. (Recall that the various error measures - MAPE, MAE, MAD, and RMSE - are defined in Chapter 4.) Each measure for the model is compared with an appropriate naïve technique, such as Naïve_4 for quarterly seasonal data. The Naïve_4 technique assumes no trend but carries valuable information about seasonality. In a seasonal series with little trend, we cannot expect to do much better than Naïve_4. The fit coefficient, 1 - [(MSE of your model)/(MSE of Naïve_4)], measures the improvement over this benchmark. The Naïve_4 forecasts are computed as follows:

1. The series is deseasonalized so that the seasonal pattern is removed.
2. The preliminary forecast for each period is taken as the seasonally adjusted value from the previous period.
3. The final forecasts are computed by reseasonalizing the preliminary forecasts.
4. Forecast errors are computed by subtracting final forecasts from actual data.

The low seasonal peak in Year 2 was not adjusted because it occurred very early in the dataset and would have minimal weight in the exponential smoothing calculations.
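The four Naïve_4 steps can be sketched as follows. The quarterly data and multiplicative seasonal indexes here are hypothetical placeholders, and the fit coefficient compares any model's errors against the resulting benchmark errors.

```python
# Sketch of the Naive_4 benchmark (steps 1-4) for quarterly data with
# hypothetical multiplicative seasonal indexes, one per quarter.
def naive4_forecasts(y, indexes):
    deseason = [v / indexes[t % 4] for t, v in enumerate(y)]          # step 1
    prelim = deseason[:-1]                                            # step 2
    final = [f * indexes[(t + 1) % 4] for t, f in enumerate(prelim)]  # step 3
    errors = [a - f for a, f in zip(y[1:], final)]                    # step 4
    return final, errors

def fit_coefficient(model_errors, naive_errors):
    mse = lambda e: sum(x * x for x in e) / len(e)
    return 1 - mse(model_errors) / mse(naive_errors)
```

A purely seasonal series (constant after deseasonalizing) is forecast perfectly by this benchmark, which is why a model must beat Naïve_4 to show a positive fit coefficient.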

The prediction limits are shown in Figure 8.26. These 95% prediction limits indicate that we are 95% sure that the true forecasts (in the sense that the model is correct) will lie within these limits. The prediction limits do not appear symmetrical around the forecast, which suggests that the model errors are multiplicative. The forecasts are more likely to be high than low, which makes sense given the consistent upward trend in the historical data.

Figure 8.26 Time plot with 95% prediction limits of a quarterly automobile sales series.

Figure 8.27 Quarterly automobile sales series in Quebec City.


Figure 8.28 Model-fitting summary for quarterly automobile sales: (a) linear trend, additive seasonality model (A, A); (b) linear trend, multiplicative seasonality model (A, M). (Source: Figure 8.27)

The smoothing weights (level = 0.02, trend = 0.115, and seasonal = 0.00) in Figure 8.28(a) show the relative emphasis given to data from the recent and more distant past in the calculation of the current level, trend, and seasonal indexes. The values for the current level, trend, and seasonal indexes are called final values. Note that period T is the second quarter (spring) of Year 6. To forecast from this time origin, the current level is 46,841, the current trend is 561.8 (units per quarter), and each season has its own seasonal index. The summer index (Q3 = -2,225.8) indicates that automobile sales during the summer tend to be approximately 2,226 units below the norm.

Figure 8.29 Forecast model for automobile sales: additive Winters - linear trend, additive seasonality model (A, A). (Source: Figure 8.27)

In Figure 8.29, starting from spring (Q2) of Year 6, the automobile sales forecast for Q3 is (setting m = 1):

Y_T(m) = L_T + m x T_T + Seasonal index
Y_T(1) = 46,841 + 561.8 + (-2,225.8) = 45,177 units

To forecast three periods ahead, to winter (Q1) of Year 7, we set m = 3 and use the seasonal index for Q1:

Y_T(3) = L_T + 3 x T_T + Seasonal index = 46,841 + 3 x 561.8 + (-9,602.5) = 38,924 units

The forecast for the winter of Year 7 is lower than that for the previous summer for two reasons. First, the trend is growing by only approximately 562 units per quarter. Second, and more substantially, the seasonal index for winter is approximately 7,377 units lower than that for summer. In Figure 8.30, the multiplicative seasonal model produces the following forecast for the winter quarter of Year 7:

Y_T(3) = [L_T + 3 x T_T] x Seasonal index

= [46,961.5 + 3 x 957.1] x 0.755 = 37,624 units

Figure 8.30 Forecast model for automobile sales: multiplicative Winters - linear trend, multiplicative seasonality model (A, M). (Source: Figure 8.27)

In this example, the two versions of the Holt-Winters procedure give very similar values for the current level and trend. The seasonal indexes, however, are in a different form and result in different projections. The additive index for the summer season tells us that summer sales tend to be approximately 15,000 units above the norm; this is a constant amount for all future years. In contrast, the multiplicative index for the summer season is estimated to be approximately 39% above the norm, which represents an increasing amount as long as the data are trending up. Which model is preferable? On a strictly statistical basis, the multiplicative model has the better summary results. However, this does not necessarily mean that forecast performance will be better as well.
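The two winter-quarter projections can be checked with plain arithmetic, using the final level, trend, and Q1 index values reported in Figures 8.29 and 8.30:

```python
# Additive (A, A) finals from Figure 8.29 and multiplicative (A, M)
# finals from Figure 8.30; m = 3 steps ahead to winter (Q1) of Year 7.
additive = 46_841 + 3 * 561.8 + (-9_602.5)
multiplicative = (46_961.5 + 3 * 957.1) * 0.755

print(round(additive), round(multiplicative))  # -> 38924 37624
```

The two projections differ by roughly 1,300 units, illustrating how the form of the seasonal index, not the level and trend, drives the difference between the models.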

The Pegels Classification for Trend-Seasonal Models

A state-space framework for forecasting with exponential smoothing models has been developed by Rob Hyndman, Anne Koehler, Keith Ord, and Ralph Snyder and published in the book Forecasting with Exponential Smoothing: The State Space Approach (2008). The state-space models (Figure 8.31) that underlie the exponential smoothing techniques come in two forms: models with additive errors and models with multiplicative errors. Although the forecast profiles for a given model formulation are identical, there are important differences in the prediction limits produced by the additive and multiplicative error assumptions. With this distinction in error structure, the Error-Trend-Seasonal (ETS) state-space framework today effectively describes 30 models in the Pegels classification, by creating triplets, such as (A, N, N) and (M, N, N) for the entry (N, N) in the table. A Pegels classification of exponential smoothing techniques in a state-space modeling framework gives rise to 30 trend-seasonal forecast profiles with prediction limits for trend and seasonal patterns.


Figure 8.31 Classification of the exponential smoothing methods for state-space forecasting. (Source: Hyndman et al., Forecasting with Exponential Smoothing: The State Space Approach, 2008)

There are differences in their use as well. The multiplicative error models are not well defined if there are zeros or negative values in the data. Similarly, additive error models should not be used with multiplicative trend or multiplicative seasonality if any data value is zero. The estimation and model selection steps are beyond the scope of this chapter; the reader is referred to the aforementioned book by Hyndman et al. (2008) for the theoretical details. Fully functional software for exponential smoothing using the state-space algorithms (the PEERForecaster add-in for Excel) is available as a free download from http://www.peerforecaster.com. The free and open-source software R offers the state-space formulation in the R forecast package: robjhyndman.com/software/forecast

Handling Special Events with Smoothing Methods

Special events arise whenever periodic actions of the organization, such as special promotions, scheduled disruptions for maintenance, unusual weather, and holiday effects, cannot be treated as seasonality because they do not fall within the same period (week) each year. A special event refers to a sudden change in the level of the time series that is expected to recur; a special-event adjustment can be combined with an exponential smoothing method to improve forecast accuracy.

Event Adjustments for Outliers. When dealing with an outlier, it is generally not possible to simply delete it, because doing so would leave a gap (a missing value) between adjacent time periods. One remedy is to interpolate: project the missing value(s) with a forecasting method, or replace the value with the average of the two adjacent values. For example, a missing value for June can be replaced by the average of May and July. An alternative to interpolating missing values is to use dummy variable events. We first identify the specific time period(s) in which the outlier occurs and then create a dummy variable event: a time series of 0s and 1s, with 1 for each outlier time period and 0 for each normal time period. Once this dummy time series is incorporated into a forecasting technique, the outlying values are effectively removed

from the calculations of the coefficients of the forecasting formula. The resulting forecasts are generated as if the unusual time periods never occurred. Dummy variable events can be used with other forecasting techniques as well. For exponential smoothing, event adjustments can be applied to both outliers and special events. First, the dummy variable event is created as an index in the exponential smoothing forecasting formula. For example, in the Winters method, the forecasting formula is extended with an event index in the multiplicative and additive forms:

Additive:

YT (m) = Lt + m x Tt + Seasonal index + Event index

Multiplicative: YT (m) = [Lt + m x Tt] x Seasonal index x Event index A multiplicative event index is interpreted as a multiple of the level of the series. For example, if the current level is 100 and the event index is 0.75, the result indicates that the average effect of the outlier value(s) was to reduce the level of the series by 25%. The same result in the additive model would become event index of -25. Note that the effect of the event index in the procedure is to remove the outlier values from the calculation of the trend component and seasonal indexes. Then set the event index (= 0 for additive, = 1 for multiplicative) to generate forecasts on the assumption that no outliers will occur over the forecasting horizon. Event Adjustments for Promotions and Other Special Events. When data are available on the timing and magnitude of product promotions (price changes), we can effectively use this information in a causal forecasting approach, such as a regression model, in which the promotion variable enters as one or more explanatory (independent) variable. Although exponential smoothing does not explicitly incorporate explanatory variables, it can be extended to incorporate dummy variables. In this manner, a dummy (0, 1) coding can be used to distinguish those periods in which a promotion occurred from those free of promotion. The introduction of such a dummy event series effectively models the timing of the promotions, without specifying their magnitude. Figure 8.32 shows a summary from an analysis of the monthly sales of a consumer product during a 24 month period. There were two peaks in the data associated with promotions during December of Year 1 and a promotion in August of Year 2. Three versions of a Holt-Winters (multiplicative) procedure were fit. Version 1 is a direct application of a multiplicative Winters seasonal exponential smoothing method. The Mean Absolute Percentage Error (MAPE) is 16.3% for the 24-month fit period. 
The strong effect of the twin promotions is viewed as seasonality, giving the months December, January, August, and September the highest seasonal indexes of the year. However, treating promotions as seasonality works only if they are scheduled for the same months every year, which was not the case for this product in Year 3. Version 2 employs an event adjustment model in which the occurrence of promotions is represented by a (0, 1) dummy variable. For the historical period, the dummy variable is set equal to 1 in December of Year 1 and August of Year 2 and to 0 otherwise. The Winters procedure now yields a substantial reduction in the MAPE, from 16.3% to 11.2%, along with a much-altered seasonal pattern. The high seasonal indexes are now attributed to January, February, and September, because December of Year 1 and August of Year 2 are identified as promotion months. The event index is reported as 1.57, which implies that, on average, sales during a promotion month are approximately 57% higher than sales in a normal month. Although this is an improvement over Version 1, the seasonal pattern in Version 2 is still problematic. The high indexes for January and September are felt to be, at least in part, a response to the promotions occurring in December of Year 1 and August of Year 2, and thus unrepresentative of the plans for Year 3. Version 3 replaces the (0, 1) coding for promotions with a (0, 1, 2) coding. In general, a (0, 1, 2) coding indicates that two types of special events are being dealt with: Event 1 and Event 2. In this example, Event 2 was used to represent the delayed response of sales to an earlier promotion. Hence, the promotion time series was set equal to 1 for December of Year 1 and August of Year 2, to 2 for January of Year 2 and September of Year 2, and to 0 for all other months. Version 3 improves the fit to the historical data very substantially, with the MAPE falling to 6.9%.
The event indexes imply that when a promotion occurs, sales can be expected to rise by 71% in the same


month (Event 1) and 75% in the following month (Event 2). The January seasonal index has fallen from Version 2 because part of the January strength is now attributed to the December of Year 1 promotion.
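The event-adjusted forecast formula can be sketched as below. The level, trend, and seasonal index are hypothetical placeholders; the multiplicative event index 1.57 is the Version 2 estimate for a promotion month.

```python
# Winters forecast with a multiplicative event index appended; setting the
# event index to 1 generates forecasts for promotion-free periods.
def event_forecast(level, trend, m, seasonal, event):
    return (level + m * trend) * seasonal * event

promo = event_forecast(1000, 10, 1, 1.05, 1.57)   # promotion month
normal = event_forecast(1000, 10, 1, 1.05, 1.0)   # normal month
print(round(promo / normal, 2))  # -> 1.57
```

The ratio of the two forecasts recovers the event index itself: a promotion month is forecast 57% above an otherwise identical normal month, exactly as the Version 2 estimate implies.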

Figure 8.32 Forecast performance with three models. (n.a., not available)

Outlier Adjustment with Prediction Limits

Figure 8.33 depicts a time series of a pharmaceutical product in which the seasonal peak (observation #31) for the third year is diminished. What could be the root cause, and how should we clean these data? Left unadjusted, an automatic exponential smoothing algorithm (state-space model (N, M)) produced the forecasts and prediction limits shown in Figure 8.33 for periods #32-#43 (12 months). The profile clearly does not look credible. The issue is not with the no-trend, multiplicative seasonal exponential smoothing model, but rather the impact that an outlier has on the forecast profile. For exponential smoothing models, it is important that the data be cleaned before modeling, especially when the unusual values are close to the most current period. The one-step-ahead forecast for observation #31 is 27,036, based on the (N, M) model fit to the first 30 values. Replacing the original value 5,740 with this value (or any value within the prediction limits) yielded the much improved forecast profile with prediction limits in Figure 8.34. (Note the change in the vertical scale.)
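One way to automate the replacement described above is to flag any observation that falls outside its one-step-ahead prediction limits and substitute the one-step-ahead forecast. A sketch, with the limits supplied as inputs; in practice they would come from the fitted state-space model:

```python
# Replace values outside their prediction limits by the one-step forecast;
# values inside the limits are kept as-is.
def clean_outliers(actuals, forecasts, lower, upper):
    return [f if (a < lo or a > hi) else a
            for a, f, lo, hi in zip(actuals, forecasts, lower, upper)]

# Observation 5740 falls far below its (hypothetical) limits and is replaced
# by the one-step-ahead forecast 27036; the in-limit value 26000 is kept.
print(clean_outliers([5740, 26000], [27036, 26500], [20000, 20000], [34000, 34000]))
# -> [27036, 26000]
```

This mirrors the chapter's repair of observation #31 and scales to large numbers of series, which is the point of automatic cleaning in big-data applications.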

Figure 8.33 A predictive visualization of a seasonal peak adjustment, unadjusted observation #31.


Figure 8.34 A predictive visualization of a seasonal peak adjustment, with outlier adjustment of period #31.

Predictive Visualization of Change and Chance - Hotel/Motel Demand

In the spreadsheet examples above, we demonstrated how to calculate point forecasts with exponential smoothing methods. In Figure 8.35, we provide a model-fitting summary and model performance comparison for an additive seasonal (A, A) and a multiplicative seasonal (A, M) exponential smoothing model of the monthly time series of hotel/motel demand (Figure 8.36). The smoothing parameters come from minimizing the MSE over the fit period. On a strictly statistical basis, the multiplicative seasonal version (A, M) has the better summary results. However, fit may not be a good indication of post-sample forecast accuracy. In practice, it is advisable to maintain several models on an ongoing basis.

Fit Statistics                          (A, A)    (A, M)
Level (alpha)                            0.238     0.233
Trend (beta)                             0.1       0.1
Season (gamma)                           0.1       0.1
MAPE (Mean Absolute Percentage Error)    2.2%      1.9%
AIC                                      2269      1973
ME (Mean Residual Error)                -2231      -696
MdE (Median Residual Error)             -2511      -73
MAD (Mean Absolute Deviation)           25202     23237
MdAD (Median Absolute Deviation)        19852     18790

Figure 8.35 Model-fitting summary for monthly hotel/motel demand - additive trend, additive seasonality model (A, A) and additive trend, multiplicative seasonality model (A, M). (Source: Figure 8.36)
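The summary measures in Figure 8.35 can be computed from residuals (actual minus fitted) with a few lines of stdlib Python; a minimal sketch, assuming strictly positive actuals for the MAPE:

```python
import statistics

# Bias and precision measures of the kind reported in Figure 8.35,
# computed from paired actual and fitted values.
def error_measures(actuals, fitted):
    e = [a - f for a, f in zip(actuals, fitted)]
    ae = [abs(x) for x in e]
    return {
        "ME": statistics.mean(e),        # mean residual error (bias)
        "MdE": statistics.median(e),     # median residual error
        "MAD": statistics.mean(ae),      # mean absolute deviation
        "MdAD": statistics.median(ae),   # median absolute deviation
        "MAPE": statistics.mean([abs(x) / a for x, a in zip(e, actuals)]),
    }
```

Note that negative ME and MdE values, as in Figure 8.35, indicate that the fitted values tend to exceed the actuals, i.e., over-forecasting within the fit period.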


Figure 8.36 Historical data of monthly hotel/motel demand. (Source: D. C. Frechtling, Practical Tourism Forecasting, 1996, Appendix 1)

The preferred way to simulate forecast accuracy is to create a holdout period, distinguishing a fit period (to estimate parameters) from a forecast period (to determine forecast accuracy). Because our primary interest is in the forecasting performance of models, we held out the latest 12 months and created forecasts based on the first 84 monthly values. The summary statistics (Figure 8.37) led to the same conclusions we reached for the full (n = 96) time series. The stability of model parameter estimates may be relatively unimportant, because parameter estimates can frequently vary widely with little impact on forecast accuracy. Hence, we do not recommend manipulating parameter estimates to try to improve forecasting performance.

The evaluation in Figure 8.38 shows the results for an automatic, optimally selected model: the summary fit statistics, smoothing parameters, and seasonal factors suggest we have a credible (A, M) model. The data are clearly seasonal (Figure 8.39). This is a static test with a multiplicative Holt-Winters model in the sense that all forecasts were made from a single time point (T = 1984) for a fixed time horizon (m = 12). In Figure 8.38, we also summarize the forecasting evaluations using the ME and MdE for bias and the MAPE and MdAPE for precision. The 12 forecast errors are all negative, implying that the model is biased (i.e., over-forecasting). The average of the 12 forecast errors is -63,716 and the median error is -65,395, indicating over-forecasting throughout the entire year. The MAPE and MdAPE are both about 5%, so there is no evidence of any unusual monthly over-forecast.

The holdout-sample evaluation of forecasts in Figure 8.40 is a dynamic test: both the starting point and the horizon change. For one-period-ahead forecasts (lead time = 1), there are 12 possible forecasts that can be generated.
The average of the 12 one-period-ahead absolute forecast percentage errors is 2.4%, a measure of precision. This suggests that, on average, one-period-ahead forecasts can be expected to be accurate to within 3% or so. For longer horizons, the accuracy decreases but stays within 10%, based on this test. The lead-12 accuracy is based on only one forecast, so its apparently improved accuracy is an anomaly. When we repeat this dynamic simulation for the additive seasonal (A, A) version of the Holt-Winters model (not shown), the range of MAPEs is between 2.5% and 14%. Hence, the multiplicative model still appears to be the better choice for this forecast period.
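The dynamic test can be sketched with a rolling origin. Simple exponential smoothing stands in here for the fitted seasonal model, and the smoothing weight is an assumed value, so this is a sketch of the evaluation scheme rather than of the chapter's exact model:

```python
# Rolling-origin evaluation: move the forecast origin forward one period at
# a time and score each one-step-ahead forecast against the next actual.
def rolling_one_step_mape(y, fit_len, alpha=0.3):
    apes = []
    for origin in range(fit_len, len(y)):
        level = y[0]
        for v in y[1:origin]:
            level = level + alpha * (v - level)  # error-correction update
        apes.append(abs(y[origin] - level) / y[origin])
    return sum(apes) / len(apes)

# A series the model tracks perfectly gives a MAPE of zero.
print(rolling_one_step_mape([100.0] * 24, fit_len=12))  # -> 0.0
```

Repeating the loop for lead times 2, 3, ..., 12 instead of lead time 1 reproduces the full dynamic evaluation of Figure 8.40, with fewer forecasts available at each longer lead time.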


Figure 8.37 Monthly hotel/motel demand series - linear trend, multiplicative seasonality model (A, M). Summary fit statistics, smoothing parameters and seasonal factors for 12-month hold-out sample.

Figure 8.38 Monthly hotel/motel demand series - linear trend, multiplicative seasonality model (A, M). Forecast bias and precision over a 12-month horizon. (Source: Figure 8.36)

The one-step-ahead forecast errors play a special role in the analysis of forecast performance because prediction limits are based on them. Figure 8.41 displays the 12 one-period-ahead forecasts with the upper and lower prediction limits over the holdout period, made with the damped trend, multiplicative seasonal model (Ad, M). These forecasts are compared with the actuals in the holdout sample, and forecast errors are calculated. The final level and trend components are shown, along with the seasonal index for each month. Evidently, the peak month is May (index = 1.187), meaning that May is almost 19% above the norm. This may be attributable to the attractions of the spring season in this location. As expected for these data, the low months are in the winter.


Figure 8.39 Predictive visualization of actuals, forecasts, prediction limits, and moving average trend for the hotel/motel demand - additive trend, multiplicative seasonality model (A, M). (Source: Figure 8.36)

Figure 8.40 Performance evaluation of the hotel/motel demand series - damped trend, multiplicative seasonality model (Ad, M): evaluation of dynamic rolling forecasts. (Source: Figure 8.36)

Figure 8.41 One-period-ahead forecasts over the holdout period for the hotel/motel demand series - damped trend, multiplicative seasonality model (Ad, M). (Source: Figure 8.38)


Takeaways

This chapter provides an introduction to a family of exponential smoothing models useful for forecasting trending and seasonal data with prediction limits:

•	The components of these models describe a current level, trend, and seasonal index: the current level is the starting point, the trend is the growth or decline factor, and the seasonal index is the adjustment for seasonal variation.
•	All three components are exponentially weighted averages, rather than equally weighted averages, of the historical data. The current level is an exponentially weighted average of the past data, the current trend is an exponentially weighted average of the past changes in the level, and each seasonal index is an exponentially weighted average of the past ratios of data to level.

The estimation and manipulation of parameter values for exponential smoothing algorithms are not emphasized, because:

•	In practical situations, estimates can vary widely without significantly affecting the forecast profile created by the algorithm.
•	Optimal or near-optimal parameter settings are readily derived with automated software tools.
•	Combinations of multiple parameter values can limit an intuitive feel for their impact on the forecast profile.
•	In large database applications, a demand forecaster needs to be able to rely on the automatic forecasting features of modern software because of the very large volumes of data involved.

We have described the key characteristics of exponential smoothing as a flexible forecasting technique using (exponentially decaying) weights that give more emphasis to more recent periods in the data. Several examples illustrate the nature of exponentially decaying weights for different smoothing models, which helps the demand forecaster select the most appropriate technique for handling trend and seasonal patterns.
We have shown how the forecasting formula works for the Holt and Holt-Winters models, and also for nonlinear trend procedures such as those with damped and exponential trends.
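The exponentially decaying weights mentioned above are easy to display: with smoothing weight alpha, the observation k periods old receives weight alpha(1 - alpha)^k, and the weights sum to (essentially) one over a long history. The value of alpha below is purely illustrative.

```python
# Weights applied to past observations by simple exponential smoothing:
# the most recent observation gets alpha, and each older one gets a
# factor of (1 - alpha) less.
alpha = 0.3  # an illustrative smoothing weight
weights = [alpha * (1 - alpha) ** k for k in range(5)]
print([round(w, 3) for w in weights])  # -> [0.3, 0.21, 0.147, 0.103, 0.072]
```

Larger values of alpha concentrate the weight on recent periods (faster reaction, less smoothing); smaller values spread it over a longer history.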
