Issue 4  June 2006

FORESIGHT The International Journal of Applied Forecasting


SPECIAL FEATURES
Forecasting for Call Centers
Forecast Accuracy Metrics for Inventory Control
Lessons From Successful Companies
Breaking Down Barriers to Forecast Process Improvement
Transformation Lessons From Coca-Cola Enterprises Inc.

A PUBLICATION OF THE INTERNATIONAL INSTITUTE OF FORECASTERS

IIF

“Knowledge of truth is always more than theoretical and intellectual. It is the product of activity, as well as its cause. Scholarly reflection therefore must grow out of real problems, and not be the mere invention of professional scholars.” – John Dewey, University of Vermont

CONTENTS

Editorial Statement
Renew FORESIGHT
Subscribe to FORESIGHT

SPECIAL FEATURE: FORECASTING FOR CALL CENTERS
Preface
Nano Forecasting: Forecasting Techniques for Short-Time Intervals, by Jay Minnucci, Incoming Calls Management Institute
Forecasting Call Flow in a Direct Marketing Environment, by Peter Varisco, New York Life
Forecasting Weekly Effects of Recurring Irregular Occurrences, by Dan Rickwalder, Incoming Calls Management Institute
Commentary on Call Center Forecasting, by Tim Montgomery, The Service Level Group, LLC

FORECAST PROCESS IMPROVEMENT: LESSONS FROM SUCCESSFUL COMPANIES
Managing the Introduction of a Structured Forecast Process: Transformation Lessons from Coca-Cola Enterprises Inc., by Simon Clarke
Breaking Down Barriers to Forecast Process Improvement, by Mark Moon

SPECIAL FEATURE: FORECAST-ACCURACY METRICS FOR INVENTORY CONTROL AND INTERMITTENT DEMANDS
Preface
Measuring Forecast Accuracy: Omissions in Today's Forecasting Engines and Demand-Planning Software, by Jim Hoover
Forecast-Accuracy Metrics for Intermittent Demands: Look at the Entire Distribution of Demand, by Tom Willemain
Accuracy and Accuracy-Implication Metrics for Intermittent Demand, by John Boylan and Aris Syntetos
Another Look at Forecast-Accuracy Metrics for Intermittent Demand, by Rob J. Hyndman

FORECASTING PRINCIPLES AND METHODS
Lessons From Thomas Edison's Technological and Social Forecasts, by Steven Schnaars
Tips for Forecasting Semi-New Products, by Bill Tonetti

BOOK REVIEW
Anirvan Banerji reviews Fooled by Randomness: The Hidden Role of Chance in Life and in the Markets, by Nassim Nicholas Taleb

FORESIGHT 2006
Coming in Future Issues

FORESIGHT: The International Journal of Applied Forecasting
Issue 4, June 2006

Editorial Statement

FORESIGHT, an official publication of the International Institute of Forecasters, seeks to advance the practice of forecasting. To this end, it will publish high-quality, peer-reviewed articles, and ensure that these are written in a concise, accessible style for forecasting analysts, managers, and students. Topics include:

• Design and implementation of forecasting processes
• Forecasting principles and methods
• Integration of forecasting into business planning
• Forecasting books, software, and other technology
• Forecasting-application issues in related fields
• Case studies
• Briefings on new research

Contributors of articles will include:

• Analysts and managers, examining the processes of forecasting within their organizations
• Scholars, writing on the practical implications of their research
• Consultants and vendors, reporting on forecasting challenges and potential solutions

All invited and submitted papers will be subject to a blind editorial review. Accepted papers will be edited for clarity and style.

FORESIGHT welcomes advertising. Journal content, however, is the responsibility of, and solely at the discretion of, the editors. The journal will adhere to the highest standards of objectivity. Where an article describes the use of commercially available software or a licensed procedure, we will require the author to disclose any interest in the product, financial or otherwise. Moreover, we will discourage articles whose principal purpose is to promote a commercial product or service.

EDITOR: Len Tashman, [email protected]
ASSOCIATE EDITOR: Nada Sanders, [email protected]
MANAGING EDITOR: Bill Wicker, [email protected]
DESIGN EDITOR: Anne McLaughlin, Anne Matthew Design, [email protected]
MANUSCRIPT EDITORS: Steve Candiotti, Wordwright, [email protected]; Mary Ellen Bridge, [email protected]

EDITORIAL BOARD: Celal Aksu, J. Scott Armstrong, William Bassin, Roy Batchelor, John Boylan, Elaine Deschamps, Paul Fields, Robert Fildes, Paul Goodwin, Kesten Green, James Hoover, Greg Hudak, Ulrich Küsters, Michael Leonard, Hans Levenbach, Marcus O'Connor, Lars-Erik Öller, Roy Pearson, Steven Schnaars, Tom Willemain, George Wright

PRACTITIONER ADVISORY BOARD: Chairman: Joe Smith, Coca-Cola Enterprises Inc.; Jim Akers, DOW; Carolyn Allmon, ConAgra Foods; Thorodd Bakken, SEB; Sandy Balkin, Rodman & Renshaw LLC; Anirvan Banerji, Economic Cycle Research Institute; Nariman Behravesh, Global Insight; Charlie Chase, Information Resources, Inc.; Robert Dhuyvetter, J. R. Simplot Company; Jamilya Kasymova, Marriott International; Jay Minnucci, Incoming Calls Management Institute; Joseph McConnell, McConnell Chase Software Works; Carmel Nadav, Wells Fargo; Thomas Ross, Brooks Sports, Inc. (Russell); Eric Stellwagen, Business Forecast Systems; Dwight Thomas, Lucent Technologies; Bill Tonetti, Demand Works; Kitty Vollbrecht, Norfolk Southern; Patrick Wader, Bosch; Brenda Wolfe, SAS

FORESIGHT is published by the International Institute of Forecasters. Business Office: 53 Tesla Avenue, Medford, MA 02155 USA

© 2006 International Institute of Forecasters (ISSN 1555-9068)

It’s time for more.

Unless you renew, this will be your LAST ISSUE of Foresight.

MORE forecasting at its best. MORE in-depth analysis from experts in the field... MORE real-life corporate solutions... MORE strategic forecasting tips... MORE unbiased software and book reviews... MORE...

3 WAYS TO SUBSCRIBE. RENEW TODAY.
• CLICK www.forecasters.org/foresight to pay by credit card or check online
• CALL Bill Wicker toll-free at 866.395.5220 to pay by credit card
• E-MAIL Pam Stroud at [email protected] with questions

DON'T LOSE YOUR FORESIGHT! All Foresight subscriptions expire June 2006. Get MORE.

Subscribe to FORESIGHT

FORESIGHT is published 3 times annually, in February, June, and October. A one-year subscription to FORESIGHT includes all THREE issues of the journal plus a FREE digital version of each issue, available as a downloadable PDF.

INDIVIDUAL SUBSCRIPTIONS (domestic/in US dollars, includes mailing)
$95 per year: USA
$99 per year: Canada and Mexico
$105 per year: other countries

Subscribe or renew for 2 years and save $15: $175 USA; $183 Canada & Mexico; $195 other countries.

MULTI-SUBSCRIPTION RATES (domestic/in US dollars, includes mailing; Canada & Mexico add $4 per year, other countries add $10 per year). Two or more subscriptions are available at a discount rate; the more subscriptions you order, the more you save per issue.

# of Subscriptions | Price per Subscription
1 | $95 each
2-5 | $75 each
6-10 | $70 each
16-20 | $60 each
21+ | pricing on request

TO ORDER (credit card payments accepted):
Online: www.forecasters.org/foresight
Toll-free worldwide: 866.395.5220
E-mail: Bill Wicker, Managing Editor, [email protected]
Mail check or credit card info: The International Institute of Forecasters, Business Office, 53 Tesla Avenue, Medford, MA 02155 USA

FORESIGHT for IIF Members: Your annual IIF membership fee of $120 ($40 for a student) includes subscriptions to FORESIGHT: The International Journal of Applied Forecasting; The International Journal of Forecasting (quarterly); and The Oracle of the IIF (quarterly newsletter).

CONTRIBUTE TO FORESIGHT: To contribute an article or offer to serve as a reviewer or editor, contact Editor Len Tashman, [email protected]

Bulk Orders and Reprints: contact Bill Wicker, [email protected]

FORESIGHT is published by The International Institute of Forecasters. Business Manager: Pamela N. Stroud, 53 Tesla Avenue, Medford, MA 02155 USA, [email protected]


SPECIAL FEATURE: FORECASTING FOR CALL CENTERS

PREFACE

A call center is a centralized office where inbound calls are received – normally from customers requesting assistance – and outbound (telemarketing, survey) calls are made. Often staffing hundreds of agents, the call center concentrates the telephone-based hardware, service, and support in one location, a configuration that entices many companies to outsource their telephone functions. Call centers are a major producer of short-term forecasts, and forecast accuracy is at least as important a goal in call centers as it is in manufacturing organizations. The call-volume forecast drives staffing decisions (specifically, how many agents are required). Service-level considerations – the percentage of calls answered within a short time period – are paramount. The call center is truly a laboratory for forecasting methodology.

Our Special Feature on call center forecasting contains three papers and a commentary. Jay Minnucci of the Incoming Calls Management Institute (ICMI) lays out the challenges of forecasting incoming-call volumes on a subhourly basis. He calls such a short-interval focus nano forecasting, and he argues that the precepts of nano forecasting apply not only to call centers but also to most organizations facing service queues.

Peter Varisco, a call center consultant for New York Life Insurance, presents a case study on the use of dynamic modeling to link call volumes to marketing-campaign drivers. Here is a successful application of the transfer function, or ARIMAX, methodology to the very practical requirements of call center operations. Whereas Peter's case study deals with planned events (marketing campaigns), Dan Rickwalder of ICMI analyzes the forecasting challenges presented by "recurring irregular occurrences" (RIOs) such as the shifting calendar effects of paydays, billing cycles, and promotions. Dan shows how he has used the technique of event modeling – within the framework of exponential smoothing – to develop estimates of the impacts of RIOs on call volumes. Tim Montgomery's commentary is a call to action to call center managers to take advantage of nano forecasting and of the statistical modeling of call-volume series.

NANO FORECASTING: FORECASTING TECHNIQUES FOR SHORT-TIME INTERVALS by Jay Minnucci Preview: Call centers and other organizations that deal in real-time environments must be able to forecast in days, hours, and even minutes. They can do so successfully by finding smaller bits of data hidden within the “macro” data. Jay shows how this nano-forecasting focus can be employed to project call volumes and to improve resource productivity. Jay Minnucci is Vice President of Consulting for ICMI, an Annapolis, Maryland-based company specializing in consulting, seminars, conferences, and publications for inbound call centers. Jay works with call centers in all industries to improve forecasting and to plan for demand. When he is not in a call center, Jay enjoys time with his family or indulges in a round of golf.

• Analyzing data in short time intervals can often expose flaws in widely accepted assumptions about call-volume patterns.
• To effectively forecast call volume, we must understand customer telephone behavior, which requires that we focus on the very small time increments when customer decisions are made.
• While the call center provides an excellent laboratory for studying the value of nano forecasting, the benefits apply to many other forecasting organizations.

Introduction

Hiding somewhere in your forecasting database are small bits of data that can enable better forecasts and better organizational results. Finding these gems and learning how to use them will open up a world of possibilities for forecasting and productivity improvement. I use the term nano forecasting to represent the techniques of projecting volumes for very short time intervals.

In identifying the potential benefits of nano forecasting, I find similarities in the history of medical diagnoses and treatments. Prior to 1674, patients would present symptoms—much as they do today—but treatment was little more than a forecasting exercise based on past history. If I do nothing, what will happen in three, five, or sixty days? If I choose treatment X, what is the likely outcome? What about treatment Y? The selection of a treatment came down to observations on what had worked in the past. Successes occurred from time to time, but given the myriad causes of the same symptoms, medical treatment wasn't much more than a guessing game.


What happened in 1674 to turn this around? Anton van Leeuwenhoek, using a powerful microscope he had built, first discovered tiny microbes inside a drop of water. The impact was far-reaching. Physicians could now see the underlying causes of the symptoms, and they began to understand the sources of diseases. The discovery of microbes also enabled doctors to detect a disease prior to the onset of symptoms. In short, medical treatment became more science and less art. We forecasters deal not with microbes but with data. We take available data, adjust them, decompose them, and use them to build forecasting models. In the business world, the forecasting standard is the monthly forecast. Therein lies the problem. While we tend to deal with monthly numbers, we often forget to consider that months are actually made up of smaller segments of time. If you put time into the microscope, you would see many of the events that produce the monthly data. And if you isolate these events, you may find a new world of analytical possibilities.

Call Center Forecasting

"Short moments of time" is a concept very familiar to call centers. The call center forecaster is challenged to determine how many agents (operators) are required at any moment to achieve a desired level of service. Driving this staffing number is a call-volume forecast. Given a service-level objective, the standard conversion of the call-volume forecast—along with a handle-time forecast—into a staffing requirement uses a distribution known as Erlang-C. Named after its originator, Danish engineer A. K. Erlang, Erlang-C is the standard queuing formula to determine resource requirements (Cleveland & Mayben, 1997). Call center staffing is a common application, but the same distribution is used to determine resources such as the number of toll booths needed at a bridge and the number of restrooms needed at a stadium. The Erlang-C formula is

$$P(>0) \;=\; \frac{\dfrac{A^{N}}{N!}\cdot\dfrac{N}{N-A}}{\displaystyle\sum_{x=0}^{N-1}\dfrac{A^{x}}{x!} \;+\; \dfrac{A^{N}}{N!}\cdot\dfrac{N}{N-A}}$$

where A = total traffic offered in erlangs, N = number of servers (agents), and P(>0) = probability of delay greater than zero.

Handle time, the length of time an agent takes to complete a call, is expressed (along with call volume) as a part of A (total traffic in erlangs). The formula is widely available today in downloadable calculators and add-ins to Excel. It is also a key element of the typical workforce-management systems used in call centers today. Examples of these systems include IEX, Aspect, and Blue Pumpkin, among others.

Forecast accuracy in a call center is important. Overforecasting produces waste in the form of extra agents being paid to wait for calls. Underforecasting generates substantial queuing, which has consequences that can include lost calls, caller dissatisfaction, high employee turnover, and high telephone charges. So far, forecasting at a call center doesn't sound very different from many other forecasting challenges. What distinguishes the call center environment, however, is the element of time.

For call center forecasters, monthly data aren't useful: knowing that 100 agents are required, on average, for the month is of no value. Caller tolerance (the amount of time a caller will wait before hanging up) is rarely beyond two to five minutes. A staffing forecast must ensure that a queue remains at a manageable level on a moment-to-moment basis. On Monday at 10:30 AM, 165 agents might be needed, while on Thursday at 7:30 PM, the number required might be only 14. At minimum, the call center needs to set staffing by the half hour. Its ability to do so is made possible by equipment that captures incoming call data on a half-hour basis. (Depending on the equipment, this could be by the hour or by the quarter hour, but I will use the standard of a half hour.)

Focusing on the half hour gives the call center the ability to understand what can happen in the "nano" that in turn can have a significant impact on the "macro." And while half an hour may seem like a very short segment of time, it is often necessary in the call center to dig even deeper to find these impacts.

Looking Within the Half Hour

The Erlang-C calculation accounts for a random arrival of calls across a time interval. However, it does not account for increasing or decreasing call-arrival rates within the time interval. During an interval (a half hour) when volume is steadily increasing (typically early in the morning) or steadily decreasing (later in the afternoon), we need more than one staffing number per half hour.

For example, consider an early-morning half hour (8:00 to 8:30 AM) where 340 calls are expected, and we wish to answer 80% of them within 20 seconds. So our desired service level (SL) is 80%. The typical "macro-assumptive" approach inherent in the Erlang-C calculation assumes that roughly half the calls (170) come in during each fifteen-minute segment. Assuming that an agent will spend on average 240 seconds per call (AHT or Average Handle Time), the Erlang-C calculation projects that we need 51 phone agents throughout the half hour to meet our goal of 80% answered within 20 seconds. Table 1 shows this scenario, with the final column listing the actual percentage of calls answered within 20 seconds.

Table 1. Macro-Assumptive Approach

Interval Start | Interval End | Projected # of Calls | AHT (Secs) | Required Staff | Projected SL Result | Actual SL Result
8:00 | 8:15 | 170 | 240 | 51 | 80% | 99%
8:15 | 8:30 | 170 | 240 | 51 | 80% | 2%
Total | | 340 | 240 | 51 | 80% | 44%

As the last column indicates, service did not meet expectations. The organization was overstaffed during the first 15 minutes but understaffed during the last 15 minutes. Why did that happen? Had we put the data under a microscope, we would have learned that we typically get just under 44% of the half-hour's traffic from 8:00 to 8:15, with slightly more than 56% received from 8:15 to 8:30. Clearly, our assumption that the arrival rate would remain relatively constant during the entire 30 minutes was incorrect.

Such a pattern is typical among call centers during the morning "ramp-up" period. Because the call-arrival rate was slower than projected during the first fifteen minutes, we provided very quick service: nearly all calls (99%) were answered within 20 seconds. While that may be good for the caller, it means that the organization spent too much on staffing. During the second 15 minutes, when traffic was heavier, the results were quite different. Only 2% of the calls were answered in the first 20 seconds, creating high queuing conditions and high levels of customer dissatisfaction.

This micro-assumptive approach allows us to better refine our staffing. Now that we know that the arrival rates are not constant throughout the interval, we can run two different scenarios through our Erlang-C calculator. (For this and other examples, I used the Queueview (2002) Erlang Calculator.) The first interval is based on 149 calls in 15 minutes, while the second is based on 191 calls in 15 minutes. Armed with this greater level of detail, Erlang-C tells us to assign 45 agents to the first 15 minutes and 57 agents to the second 15 minutes. The results are shown in Table 2.

Table 2. Micro-Assumptive Approach

Interval Start | Interval End | Projected # of Calls | AHT (Secs) | Required Staff | Projected SL Result | Actual SL Result
8:00 | 8:15 | 149 | 240 | 45 | 80% | 80%
8:15 | 8:30 | 191 | 240 | 57 | 80% | 80%
Total | | 340 | 240 | 51 | 80% | 80%

Without increasing staff expenditure, we would achieve our service-level requirements throughout the half-hour period. This is an example of the precision impact of nano forecasting: greater precision in forecasting leads to better operational results.
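To make the Erlang-C arithmetic behind Tables 1 and 2 concrete, here is a minimal Python sketch. It assumes a steady-state M/M/N queue and the standard service-level approximation SL = 1 - P(>0) * exp(-(N - A) * t / AHT); it is not the Queueview calculator used for the tables, so the agent counts it returns may differ by an agent from those quoted in the text.

```python
import math

def erlang_c_wait_prob(a: float, n: int) -> float:
    """Erlang-C probability that an arriving call must wait,
    for offered traffic a (erlangs) and n agents (requires n > a)."""
    if n <= a:
        return 1.0
    term = 1.0        # running value of a^x / x!
    series = 0.0      # accumulates sum_{x=0}^{n-1} a^x / x!
    for x in range(n):
        series += term
        term *= a / (x + 1)
    top = term * n / (n - a)   # (a^n / n!) * n / (n - a)
    return top / (series + top)

def service_level(calls, aht_s, interval_s, n, target_s):
    """Fraction of calls answered within target_s seconds."""
    a = calls * aht_s / interval_s          # offered load in erlangs
    pw = erlang_c_wait_prob(a, n)
    return 1.0 - pw * math.exp(-(n - a) * target_s / aht_s)

def agents_needed(calls, aht_s, interval_s, target_s, sl_goal):
    """Smallest agent count that meets the service-level goal."""
    n = int(calls * aht_s / interval_s) + 1  # staff must exceed the load
    while service_level(calls, aht_s, interval_s, n, target_s) < sl_goal:
        n += 1
    return n

# Micro-assumptive scenario: 149 and 191 calls per 15-minute (900 s)
# interval, 240-second AHT, 80% answered within 20 seconds.
for calls in (149, 191):
    print(calls, agents_needed(calls, 240, 900, 20, 0.80))
```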

Hanging On Versus Hanging Up: Predicting the Carryover

Forecasting exercises become more interesting when consumer behavior comes into play. At busy times in the call center, a caller may be placed into a queue when first "entering" the call center. From that moment until the point of answer, one question runs through the caller's mind: Do I hang on, or do I hang up? The longer the delay, the more likely it is that a hang-up (or an "abandoned call") will occur. Our challenge is to predict how many people will stay on board under different queuing conditions as well as what staffing we need to answer the calls in a timely fashion.

For this purpose, the standard analysis based on Erlang-C both helps and hurts. It helps by providing an estimate of queue size under various conditions. It hurts by assuming that no caller ever hangs up, which is clearly not the case in a call center! While there are ways to estimate this impact outside of Erlang-C, we will not go into these here. Instead, we will focus on the callers who decide to hang on.

The call center forecaster may be wondering why we would go through this exercise. After all, callers who are hanging on are part of our original demand, so why would we forecast for them again? The microscope provides the answer. The staffing calculations are based on call-volume projections by half-hour increments, ensuring only enough staff for the incoming demand projected for that half hour.

But what about those who called during the previous half hour who are still hanging on? That split-second crossover from one interval to the next doesn't erase the queue. If we don't account for the crossover, we won't reach the desired service level. This is a failure of even the higher-priced workforce-management systems used in call centers. A typical "resource constrained" interval, in which we simply don't have enough agents to meet demand, is shown in Table 3.

Table 3. Determining the Carryover

Interval | # of Calls | AHT | # Agents Required | # Agents Scheduled | Agent Shortage | Projected Carryover (# of Callers in Queue)
9:00 | 198 | 240 | 31 | 28 | 3 | 11

Where did the carryover calculation of 11 callers in queue come from? It is derived in Erlang-C. The better Erlang-C calculators include "average queue size" as part of the calculation, and we make the assumption that the size of the queue at the end of the interval will be equal to the average queue during the interval. Queue size at a given call volume is dependent on the agent shortage (the difference between the number required on the phone and the number actually answering the phone). For the example above, the projected carryover provided by our Erlang-C calculator for different staffing levels is shown in Table 4.

Table 4. Carryover by Staffing Level

# of Calls | AHT (secs) | # Agents Required | # Agents Scheduled | Agent Shortage | Projected Carryover (# of Callers in Queue)
198 | 240 | 31 | 27 | 4 | 38
198 | 240 | 31 | 28 | 3 | 11
198 | 240 | 31 | 29 | 2 | 5
198 | 240 | 31 | 30 | 1 | 3
198 | 240 | 31 | 31 | 0 | 2
198 | 240 | 31 | 32 | -1 | 1

For our example, we project that our staffing shortage will produce a carryover of 11 calls into the next interval. What will happen to these 11 calls? If they are to be answered, we had better add the 11 calls to our projected call volume for the subsequent half-hour interval. Table 5 shows how this adjustment changes the planning for the next 30 minutes. We see that two extra agents are required to take care of the carryover.

Table 5. Extra Agents Required by the Carryover

Interval | Carryover (Average Queue Size) | Projected # of Calls | AHT | # Agents Required | # Agents Scheduled | Projected Speed of Answer
9:00 | 11 | 198 | 240 | 31 | 28 | 102
9:30 w/o carryover | | 262 | 240 | 40 | |
9:30 with carryover | | 273 | 240 | 42 | |
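The carryover column is simply the Erlang-C average queue size, which for an M/M/N queue is Lq = P(>0) * A / (N - A). A short sketch, reusing erlang_c_wait_prob from the listing above and adopting the article's assumption that the end-of-interval queue equals the average queue; the rounded results land on or near the Table 4 values:

```python
def average_queue_size(calls, aht_s, interval_s, n):
    """Erlang-C average queue length: Lq = P(wait) * a / (n - a)."""
    a = calls * aht_s / interval_s
    if n <= a:
        return float("inf")   # queue grows without bound
    return erlang_c_wait_prob(a, n) * a / (n - a)

# 198 calls in a half hour (1,800 s) at a 240-second AHT,
# for scheduled staff of 27 through 32 agents, as in Table 4:
for scheduled in range(27, 33):
    print(scheduled, round(average_queue_size(198, 240, 1800, scheduled)))
```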

Callers in the queue represent what I call conditional demand: they will hang on only if there are enough resources to get to them without substantial delay. And it is actually their perception of the length of delay that drives results. It may be that the delay is only 30 seconds, but if a caller perceives this time to be above a threshold of two minutes, the caller will hang up.

Clearing Out the Queue

The methodology illustrated in Tables 3, 4, and 5 still restricts us to the world of half hours, where we spread capacity over the half hour to handle calls that will be there at the start. A better approach is to clear out the queue in the first few (e.g., eight) minutes of the interval. How should we do that?

Fortunately, it is relatively simple. Because there are 11 calls in the queue, and they take on average 4 minutes (240 seconds) to complete, we'll need 11 extra agents for the first four minutes to get the job done. Here is the formula for clearing the queue in less than half an hour:

(# in queue) * (AHT in minutes / desired # of minutes to clear)

where the desired number of minutes to clear cannot be less than the AHT. So if you have 11 calls in the queue and wish the queue to be cleared in 8 minutes, the formula yields

11 * 4 / 8 = 5.5 extra agents

Other Applications in Call Centers

Once you embrace the potential of nano forecasting, you can find other applications in a call center. Two examples include

• Calendar effects. For long-term forecasting (6 to 60 months out), it is common to use monthly data. But if you take a simple "volume by month" approach, you ignore the fact that you are open more hours in some months than you are in others, and you ignore that some months have more high-volume days of the week (typically Monday and Tuesday) than others. A process exists to "true up" these factors so that monthly data is stored and forecasted in a "calls-per-index, day-per-month" format, rather than "volume by month." Though lengthy and somewhat complex, this process greatly enhances the accuracy of seasonal factors and can reduce monthly forecast-error rates by up to 5 percentage points.

• Average handle time. The amount of time required to handle a call is a function of many different factors. It is not possible to account for all of them, but a nano-forecasting approach to AHT could focus on the percentage of trainees with typically longer-than-average handle times, or on the likelihood of slow system response during any given interval.

Each call center is unique, and once you bring out the microscope you will likely find many more opportunities to explore. Each success improves the predictability of your call center staffing, and that is a win for both the customers and the organization.


Nano Forecasting Outside Call Centers

Do examples of nano forecasting exist outside the call center? They absolutely do, but finding them often takes some experimentation and observation. If you run a coffee shop, you probably know the number of muffins you sell per day. But if you decompose the day, you'll likely find that muffins are most popular early in the morning and that sales trail off substantially later in the day. If you also sell energy bars, you might find the opposite results—weak morning sales but stronger afternoon sales. And with that piece of nano information, noon might see you moving the muffin rack off to the side and putting the energy bars right on the counter. Other applications of nano forecasting include

• The impact of the number of cars in queue at the drive-through lane of a fast-food operation.
• The potential shoppers at a retail outlet who turn away when they see a full parking lot.
• The hopelessly understaffed department store that sees revenue slipping but doesn't notice that people are walking away because they cannot get assistance.

How do you address these situations? You begin by taking careful, detailed observations at the lowest levels. At the fast-food chain, watch what happens during the peak lunch hours. When the queue reaches a relatively small number (three, for example), record how quickly the next customer arrives, and express this arrival time—minutes per customer—in reciprocal form as customers per minute. This will give you an indicator of true arrival rates when demand is not impacted by the perception of a queue. Then record how long it takes for the next customer to arrive when there are four, five, and six cars in queue. After a number of days of observation, you will be able to develop a table that shows the impact of queue size on customer arrival during peak hours (a small scripted version of this tabulation appears just below). Armed with this information, a manager can make the case to invest more in speeding up the process to drive higher levels of revenue. For a good overview of queuing concepts in call centers, see Gans, Koole, & Mandelbaum (2003).
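As a tiny sketch of that tabulation, with made-up field notes (the queue length when a gap was observed, and the minutes until the next arrival), converting each gap to its reciprocal and averaging by queue size:

```python
from collections import defaultdict

# Hypothetical observations: (cars already in queue, minutes to next arrival)
observations = [(3, 0.50), (3, 0.55), (4, 0.70), (4, 0.80),
                (5, 1.10), (5, 1.00), (6, 1.60), (6, 1.90)]

rates = defaultdict(list)
for queue_len, gap_min in observations:
    rates[queue_len].append(1.0 / gap_min)   # reciprocal: customers per minute

for queue_len in sorted(rates):
    avg = sum(rates[queue_len]) / len(rates[queue_len])
    print(f"queue of {queue_len}: {avg:.2f} customers/minute")
```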

And with that, we've come full circle. Like the discovery of the microscope, the utilization of nano data uncovers events that could not previously be seen. Nano forecasting is a great challenge, but it is a small investment when you consider the payoff.

References

Cleveland, B. & Mayben, J. (1997). Call Center Management on Fast Forward, Annapolis, MD: Call Center Press.

Gans, N., Koole, G. & Mandelbaum, A. (2003). Telephone call centers: Tutorial, review, and research prospects, Manufacturing and Service Operations Management, 5, 79-141.

Queueview (2002). Call center staffing calculator, by ICMI, version 5.1.21.


Contact Info: Jay Minnucci Incoming Calls Management Institute [email protected]

FORECASTING CALL FLOW IN A DIRECT MARKETING ENVIRONMENT by Peter Varisco, New York Life Preview: Peter provides a case study in the use of dynamic modeling to forecast call volumes and to estimate how these volumes are affected by the timing of direct mail campaigns. Dynamic modeling, variously called dynamic regression, ARIMAX, and transfer function modeling, is a driver-based (explanatory) methodology that can supply precise timing effects of key drivers such as direct mail promotions. In summarizing the lessons from the application of this methodology at New York Life Insurance, Peter provides a working demonstration of the method’s value for call-volume forecasting. Peter Varisco is a senior research consultant for New York Life Insurance. He holds both a BA and an MA in mathematics from the University of South Florida, and he has taught mathematics for the Gifted Program at Plant City High School, where he has also coached the mathematics team. His teaching interests continue through involvement in the Great American Teach-In, Bring Our Kids to Work Day, and Big Brothers and Big Sisters of Tampa Bay.

• Dynamic (regression) modeling is a potentially valuable technique for linking call-volume forecasts to marketing plans, especially when individual calls cannot be coded by source.
• We present a case study of dynamic modeling to estimate the impact of a direct mail campaign on incoming call volume.
• The dynamic models are a major improvement over the standard workforce-management tools for call-volume forecasting.

The Background: The New York Life AARP Call Center

BenchmarkPortal and the Center for Customer-Driven Quality at Purdue University have recognized the New York Life AARP Call Center in Tampa (NYL-AARP) as a "Certified Center of Excellence." This is the second consecutive year that the center has ranked first among 13 life insurance companies reviewed in the study.

NYL-AARP operates in a highly dynamic environment. It is a leading direct marketer of life insurance and annuities in the senior market, and it constantly tests new products, new channels of communication, and new markets.

Originally, NYL-AARP used a workforce-management tool to forecast short-term call volumes. However, this tool was unable to distinguish between the effects of two very different categories of calls. Total inbound-call volume consists of customer service calls (service for existing customers) and telemarketing calls (inbound calls driven by direct marketing campaigns). Existing customers often call with billing questions or questions about policy features. Customer service patterns tend to show a certain degree of seasonality, and they trend up as the customer base increases. Telemarketing calls come in on phone numbers listed in NYL-AARP promotional materials. As a result, these calls are entirely driven by NYL-AARP promotions. During periods lacking a promotion, telemarketing-call volume approaches zero, and it stays there until the next promotional cycle begins.

To anticipate weeks having peak volume, the company needed separate forecasts for telemarketing calls and customer service calls. Also, the company needed to link forecasts for telemarketing calls directly to the appropriate marketing plans. In short, it needed driver-based forecasting.

This is more complicated than forecasting the number of insurance applications resulting from direct marketing campaigns. A source code is printed on each application so that the application can be tied to a specific campaign. As a result, when the campaign is complete, the company knows how many applications each campaign has generated, and also how many applications have arrived each week since the start of the campaign. Using this data, the company can forecast the weekly flow for subsequent campaigns.

Forecasting telemarketing-call volume, however, is not as simple. Telephone service representatives don't always capture a source code on each call. A source code establishes a link between the call and the specific campaign that generated it. Lacking the source code, the company can match the phone numbers reserved for certain types of promotional materials to the phone numbers feeding the call volume. Through this match, the company can determine that certain calls have originated from certain types of campaigns, but it cannot say how many calls a specific campaign has generated, nor can it tell when the calls arrived.

To link call volumes and timing to specific campaigns, the company uses two sets of measurements. The first shows the timing and quantity of promotions; the second shows the stream of call volumes resulting from those promotions. The challenge is to link these two sets—to forecast call volumes from the drivers, which are the timing and quantity of promotions. Enter dynamic regression modeling.

The Case Study: A Practical Application of Dynamic Modeling

NYL-AARP promotes its products through many channels, including AARP and non-AARP publications, DRTV, and direct mail campaigns. This case study is designed to illustrate how dynamic regression was employed to model one type of call volume: inbound telemarketing calls resulting from direct mail campaigns for the life insurance business.

In a direct mail campaign, selected prospects receive an offer for life insurance in the form of a direct mail kit with an enclosed application for insurance. Prospects can call with questions using a phone number listed in the kit. These phone numbers are reserved for direct mail campaigns. So, with the exception of a few wrong numbers, every incoming call originates from a direct mail campaign. The weekly volume of such calls is driven by several factors:

The timing and size of the mail campaign
The peaks and valleys in call volume reflect the timing and size of the mailing. Mail plans change from year to year, so the timing and size of peaks in previous years are not reliable guides to the current year's pattern. There is no 52-week seasonality.

The selected prospects' tendency to respond
Although the timing and size of the campaign are the primary drivers, the propensity to respond is also relevant. Using proprietary methods, marketing analysts estimate the responsiveness of the individuals selected for each direct mail campaign. In modeling the call flow, we scale our mail quantities using marketing-supplied factors.

The timing of mail arrival
Direct mail kits contain a respond-by date. To allow prospects to meet those dates, mail delivery is coordinated so that the mail arrives on a designated in-home date. Although the in-home date determines the peak in the arrival of the kits, some of the kits arrive a few days earlier or later than expected. If the in-home day is a Monday, some portion of the kits will arrive in the week prior to the scheduled in-home week. But if the in-home day is a Wednesday, almost all of the kits will arrive in the scheduled in-home week. These differences do affect the call flow, as the call center will begin to receive calls as soon as the kits arrive.

To see this, consider Figure 1, which shows a portion of N=156 weeks of actual call-volume data. The placement of the diamonds shows the weekly timing of seven direct mail campaigns. The in-home day values are shown on the right vertical axis; the in-home day is coded as Sunday=1, Monday=2, Tuesday=3, and so on. The calls (scaled per time unit) are shown on the left vertical axis.

[Figure 1. Illustration of Effect of In-Home Day on Calls: weekly calls (scaled, left axis) and in-home day (right axis) plotted over 37 weeks.]

The impact of the in-home day on call flow is evident. When the in-home day is 2 or 3 (Monday or Tuesday), weekly call flows form an inverted V. There is an upturn in call flow in the week prior to the campaign, an upturn that results from the early arrival of kits. When the in-home day is Wednesday or later, few if any kits arrive in the week prior to the campaign, but we see an abrupt upward spike during the week of the campaign.

Regardless of the in-home day, the effect of a campaign on call flow diminishes over the weeks following the campaign. How can dynamic modeling capture the weekly effects of campaign size (the number of kits mailed) and in-home day? The dynamic (regression) model I have developed is based on the Linear Transfer Function (LTF) Identification Method. The technical basis for this method is presented in the four references.

Step 1: Recode the data on campaigns.

The weekly data on campaigns consist of a string of zeros (to indicate no direct mail campaign) and positive values (to indicate the timing and quantity of kits mailed). For each mail campaign, we place the in-home day next to the mail quantity, as shown in Step A of Figure 2. The weekly series on the quantity of mail kits is separated based on the in-home day, resulting in the five columns shown in Step B. There is one series for Monday in-home days (Qty_M), one series for Tuesday in-home days (Qty_T), and so forth. In Step C, the starting data on mail quantities are shifted back one week to study the effect of the in-home day in the week prior to the week that the mail is scheduled to arrive in home.

Figure 2. Recoding of Campaign Data Using In-home Day (mock data)

Step A: Calculate the in-home day (Mon=2, Tues=3, ..., Fri=6)

Week | Mail Quantity (scaled) | In-home Day | In-home Day Code
1 | 0 | No campaign | 0
2 | 100 | Tues | 3
3 | 50 | Mon | 2
4 | 0 | No campaign | 0
5 | 125 | Weds | 4
6 | 0 | No campaign | 0
7 | 200 | Fri | 6
8 | 75 | Thur | 5

Step B: Separate into one series per in-home day (if In-home Day Code=2, then Qty_M=Mail Quantity, else Qty_M=0; similarly for Qty_T, Qty_W, Qty_R, Qty_F)

Week | Qty_M | Qty_T | Qty_W | Qty_R | Qty_F
1 | 0 | 0 | 0 | 0 | 0
2 | 0 | 100 | 0 | 0 | 0
3 | 50 | 0 | 0 | 0 | 0
4 | 0 | 0 | 0 | 0 | 0
5 | 0 | 0 | 125 | 0 | 0
6 | 0 | 0 | 0 | 0 | 0
7 | 0 | 0 | 0 | 0 | 200
8 | 0 | 0 | 0 | 75 | 0

Step C: Shift all recoded data back one week (final recoded data)

Week | Qty_M | Qty_T | Qty_W | Qty_R | Qty_F
1 | 0 | 100 | 0 | 0 | 0
2 | 50 | 0 | 0 | 0 | 0
3 | 0 | 0 | 0 | 0 | 0
4 | 0 | 0 | 125 | 0 | 0
5 | 0 | 0 | 0 | 0 | 0
6 | 0 | 0 | 0 | 0 | 200
7 | 0 | 0 | 0 | 75 | 0
8 | 0 | 0 | 0 | 0 | 0
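The recode in Figure 2 is mechanical enough to script. Below is a minimal pandas sketch of Steps A through C using the mock data; the column names are mine, not from the article's SCA implementation.

```python
import pandas as pd

# Mock campaign data from Step A of Figure 2 (in-home day: Mon=2, ..., Fri=6)
df = pd.DataFrame({
    "mail_qty":    [0, 100, 50, 0, 125, 0, 200, 75],
    "in_home_day": [0,   3,  2, 0,   4, 0,   6,  5],
}, index=pd.RangeIndex(1, 9, name="week"))

# Step B: one driver series per in-home day.
labels = {2: "Qty_M", 3: "Qty_T", 4: "Qty_W", 5: "Qty_R", 6: "Qty_F"}
for day, name in labels.items():
    df[name] = df["mail_qty"].where(df["in_home_day"] == day, 0)

# Step C: shift every recoded series back one week, so that lag 0
# corresponds to the week *prior* to the scheduled in-home week.
recoded = df[list(labels.values())].shift(-1, fill_value=0)
print(recoded)
```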

Step 2: Estimate the model.

In this case study, we have a single (scaled) explanatory variable, Mail Quantity, which has been recoded into five separate drivers: Qty_M, Qty_T, ..., Qty_F. For each driver, we introduce lagged terms to capture the diminishing effect of mail quantity on call rates in weeks following the campaign. Figure 3 shows the initial results. The vertical bars display the size of the initial model coefficients. The coefficients represent the effects of mail quantity on call rates for the week prior to the in-home day (lag 0), the week containing the in-home day (lag 1), and the three weeks following the campaign (lags 2-4). The horizontal axis shows the coefficient for each lag for all five recoded series. The 2 standard error line allows us to test the statistical significance of each coefficient.

[Figure 3. Initial Dynamic Model Coefficients: bar chart of the coefficient at each lag (0-4) for each recoded series, Qty_M through Qty_F, with a 2 standard error reference line.]

Consider, for example, Qty_M_0=1.0. This means that for the Monday in-home day series, call rates are elevated by one call per time unit in the week prior to the in-home week (lag 0) for each unit of (scaled) mail quantity. Because the coefficient value is well above the 2 standard error line, this is a statistically significant result. Given the pattern for the entire set of coefficients for Qty_M, the elevation in call rates peaks in the in-home week (lag 1) and falls off rapidly in following weeks (lags 2-4).

Step 3: Assess the impact of the in-home day.

Figure 3 reveals two distinct patterns. First, observe the spikes at lag 0, which represent the effects in the week prior to the in-home day. The spikes for Monday (Qty_M_0) and Tuesday (Qty_T_0) show that, for both Monday and Tuesday in-home days, call rates are elevated in the week prior to the in-home day.

Second, for every day of the week, Monday through Friday, the spikes peak at lag 1 and fall off steadily out to lag 4. This means that, regardless of the in-home day, call rates peak during the week that the mail is scheduled to arrive in home. Call rates are still elevated in subsequent weeks, but they fall off rapidly.

Step 4: Simplify the model if possible.

The model at this stage is quite complex, requiring 25 coefficients to estimate the impact of the in-home day and the fall-off patterns. Some simplification proved useful, in which Mondays and Tuesdays are combined into one series, labeled Qty_A, and Wednesdays through Fridays are combined into a second series, labeled Qty_B. Further simplification results from adopting a "rational-lag" model form that represents fall-off patterns using very few coefficients. In this case, the final model estimates the response pattern for Qty_A using just three coefficients, and the response pattern for Qty_B using just two coefficients. Figure 4 displays the weights that the simplified model generates using these five coefficients.

[Figure 4. Final Dynamic Model Weights: lag weights (lags 0-5) for Qty_A and Qty_B, where Qty_A = Qty_M + Qty_T and Qty_B = Qty_W + Qty_R + Qty_F.]

For Qty_A there is a positive effect on call rates at lag 0, a peak at lag 1, and a rapid fall-off thereafter, producing an inverted "V" shape in the forecasted call flow. For Qty_B there is no effect on call rates at lag 0, a higher peak at lag 1, and a similar fall-off thereafter. So the forecasted call flow will have a valley in the week prior to the mailing, an abrupt spike during the week of the mailing, and a rapid fall-off in subsequent weeks.

At this stage, one should perform certain diagnostic checks for model adequacy, as documented in the references. For my model, none of these diagnostics indicated model inadequacy (the results are not shown).

In addition, I tested my model against a simpler baseline model that does not assume distinct in-home day effects. The baseline model assumes a single call flow and applies it to all campaigns, regardless of in-home day. The baseline model does not achieve the performance of the final model.
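The article's estimation uses SCA's linear transfer function method. As a rough stand-in only—an ordinary least-squares fit on lags 0-4 of one recoded driver, with simulated data, and without the ARMA disturbance term a full transfer-function model would carry—the shape of the computation looks like this (variable names are hypothetical):

```python
import numpy as np
import statsmodels.api as sm

def add_lags(x: np.ndarray, max_lag: int) -> np.ndarray:
    """Columns x_t, x_{t-1}, ..., x_{t-max_lag}, zero-padded at the start."""
    cols = [np.concatenate([np.zeros(k), x[: len(x) - k]])
            for k in range(max_lag + 1)]
    return np.column_stack(cols)

# Stand-in data: 156 weeks, a few campaigns, and a lag-0 effect only.
rng = np.random.default_rng(0)
qty_m = np.zeros(156)
qty_m[rng.choice(156, 10)] = 100.0
calls = 50 + 1.0 * qty_m + rng.normal(0, 5, 156)

X = sm.add_constant(add_lags(qty_m, max_lag=4))   # lags 0-4, as in Figure 3
fit = sm.OLS(calls, X).fit()
print(fit.params)     # one coefficient per lag
print(2 * fit.bse)    # the "2 standard error" bound used in Figure 3
```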

Step 5: Test the final model on holdout data.

I tested the forecast accuracy of the final model on 52 weeks of call data (scaled) that had been withheld from the model fit. Figure 5 shows the actual and forecasted calls during this holdout period. The placement of the diamonds shows the weekly timing of 11 direct mail campaigns. The in-home-day values are shown on the right vertical axis.

[Figure 5. Actual and Forecasted Calls During Holdout Period: actual and forecasted weekly calls (scaled, left axis) and in-home day (right axis) over the 52 holdout weeks. MAPE = 16.48%; BIAS = -2.472%.]

From Figure 5, we make the following observations:

• The forecast errors average about 16.5% (MAPE), and there is a slight tendency to overforecast (BIAS of -2.5%). The MAPE must be interpreted in context. Total call volume is decomposed into telemarketing and customer service volume, and telemarketing volume is further decomposed to provide this demonstration. NYL-AARP typically experiences a MAPE on aggregate total inbound calls much smaller than the value reported here.

• The relatively large errors occur in forecasting the volumes at certain peaks. This is due in part to errors in forecasting the driver—in this case, the scaled quantities. The marketing group had made changes to mail quantities throughout the year, which were not incorporated into the data.

• Other large errors occur during holidays, when call volume falls off, requiring further adjustment to the model in order to incorporate holiday effects.
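For reference, the two error measures quoted in Figure 5 can be computed as below. The article does not spell out its formulas; this is one common convention whose sign matches the text (negative BIAS indicating overforecasting):

```python
import numpy as np

def mape(actual: np.ndarray, forecast: np.ndarray) -> float:
    """Mean absolute percentage error, in percent."""
    return 100 * float(np.mean(np.abs(actual - forecast) / actual))

def bias(actual: np.ndarray, forecast: np.ndarray) -> float:
    """Mean percentage error, in percent; negative means overforecasting."""
    return 100 * float(np.mean((actual - forecast) / actual))

# On the 52 holdout weeks, the article reports MAPE = 16.48%, BIAS = -2.47%.
```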

Assessment of Dynamic Modeling for Call Centers

NYL-AARP has used dynamic modeling to forecast inbound calls since the fall of 2003. As demonstrated by the preceding case study, this methodology has proved successful in establishing relationships between the timing and size of direct mail campaigns and the resulting call flow. The method has succeeded without the requirement to capture a source code for each call.

Software for dynamic modeling is able to automate the data transformations, the recodes, and updates, and to generate forecast reports. Scientific Computing Associates (Liu & Hudak, 2004) supplies the software used by NYL-AARP. Development and implementation of the dynamic models, however, require a significant up-front investment of time and learning, and proper use of the methodology requires statistical expertise. Moreover, the models are less transparent to management than spreadsheets and other simple approaches.

Nevertheless, the benefits have outweighed the costs. The timing and size of marketing campaigns are now direct inputs to the telemarketing forecasts, both short-term and long-term. Managers can visualize the link between the marketing plan and the call-volume forecast, and they can create scenarios to see how varying the mail plan affects call flows, thereby ensuring that the resources will be in place to handle the expected call volume.

Compared to the workforce-management tool, the accuracy of call-volume forecasts improved and the forecast process became more automated. NYL-AARP now places less reliance on subjective adjustments.

References

Liu, L.-M. (2005). Time Series Analysis and Forecasting, River Forest, IL: Scientific Computing Associates Corporation.

Liu, L.-M. & Hudak, G. B. (2004). The SCA Statistical System: Reference Manual for Forecasting and Time Series Analysis Using the SCA Statistical System, River Forest, IL: Scientific Computing Associates Corporation.

Makridakis, S., Wheelwright, S. C. & Hyndman, R. J. (1998). Forecasting Methods and Applications (3rd ed.), New York: John Wiley & Sons.

Pankratz, A. (1991). Forecasting with Dynamic Regression Models, New York: John Wiley & Sons.

Contact Info: Peter Varisco New York Life AARP [email protected]


FORECASTING WEEKLY EFFECTS OF RECURRING IRREGULAR OCCURRENCES by Dan Rickwalder, Incoming Calls Management Institute Preview: Weekly forecasts are important for call centers, but they present a host of challenges, including “recurring irregular occurrences” such as paydays and billing cycles. In this article, Dan describes his techniques for cleaning the weekly data, accounting for the irregular-event effects, and generating weekly forecasts of call volumes. As a consultant with the Incoming Calls Management Institute, Dan focuses on call center forecasting and staff planning. He has worked with call centers in many industries to improve their forecasting methodologies. Dan is also a frequent speaker and seminar leader on forecasting topics. Dan has a BA in History from James Madison University.

• Weekly forecasts must contend with a host of recurring irregular occurrences (RIOs) such as paydays, billing cycles, and account-statement cycles. These occurrences fall within the domain of weekly seasonality.
• When at least two years of weekly data are available, I use event models to estimate the call-volume lift from different types of RIOs. Event models can be implemented as part of the family of exponential smoothing methods.
• Weekly data must also be adjusted for missing data and for end-of-year crossover, and I illustrate a method for doing so.
• Ultimately, call centers would like to identify the underlying drivers of call volume, thus permitting implementation of causal forecasting models.

Introduction

The vast majority of business forecasting is performed at the monthly level. Most organizations are content to predict their monthly changes in customers, sales, and revenue. While weekly or daily variations do occur, they do not significantly impact the organization's planning.

Call centers exist in a much different environment. Weekly and daily variations play a large role in forecasting inbound call volume. A caller demands immediate service, and as calls randomly arrive at a call center, the appropriate number of agents must be on hand to answer them. Significant overstaffing or understaffing is costly. Therefore, call centers forecast call volumes in intervals as small as 30 or even 15 minutes.


Once the data have been cleaned of events and anomalies, call volumes tend to show repeating patterns at the daily and short-interval (hourly and subhourly) level: the 10:30 AM interval on a Monday tends to account for the same percentage of the week's volume as it did the week before. At those levels, call center forecasters can rely on percentage-of-week and percentage-of-day calculations. The variation that does arise at the short-interval level comes from planned events like television advertising, or from anomalies such as system outages, weather events, or software failures. The effects of television advertising can be estimated statistically using so-called event models. The effects of unforeseen events can be calculated as well, but little planning can be done to handle them.

Focus on Weekly Variation

In call center forecasts, variation becomes most significant at the weekly level; therefore weekly forecasts provide the most leverage for performance improvement. Weekly variation arises in response to planned events such as direct mail marketing and holidays. These can be accounted for in the forecast models. But there are also a host of recurring irregular occurrences (RIOs) such as paydays, billing cycles, and account-statement cycles. These fall within the domain of weekly seasonality. Finally, there are the uncertain effects on call volume of mass-publication marketing, such as regularly mailed circular inserts.

Outside call centers, most forecasting models are focused on monthly seasonality, with 12 monthly indexes to describe the variation within a year. With weekly data, we require 52 indexes, and these weekly indexes are more volatile because RIOs play a much larger role in weekly data than in monthly data. Adding to the challenge of weekly forecasting is that call centers do not always have two full years of data to use.

Cleaning the Weekly Data

Cleaning the data plays a larger role at the weekly level than at the monthly. Consider a call center that gets 100,000 calls per month (and 25,000 calls per week). If one day of calls is missing 5,000 calls, that is a 5% loss to the month, but a 20% loss to that week.

The first step in determining weekly seasonality is to identify those weeks in which there are missing data. Systems-reporting failures, emergency closures, and weather events are typical causes of missing data. Initially, we interpolate the volumes during a missing-data week from an average of surrounding weeks. This technique is problematic for seasonal data, however, as it smooths out the seasonality we are trying to measure. So we do a further adjustment based on our estimates of weekly seasonality.

We also need to deal with the end-of-year crossover—the adjustment for a year ending in the middle of a week. (Most reporting systems effectively deliver data at the daily and monthly level, but often do not provide good weekly reporting.) Table 1 demonstrates this adjustment. In the table, 2002 ends in the middle of a week. This is not an issue for a monthly forecast, but it must be accounted for when we forecast at the weekly level. In this case, the problem is resolved by adding the last four days of 2002 to the first three days of 2003.

Table 1. End-of-Year Adjustment (Holidays Normalized)

Week Begin | Sun | Mon | Tue | Wed | Thu | Fri | Sat | Total Calls
21-Dec-2002 | 486 | 2,358 | 2,163 | 2,054 | 1,923 | 1,875 | 899 | 11,758
28-Dec-2002 | 398 | 2,456 | 2,234 | 2,096 | 2,001 | 1,899 | 923 | 12,007
4-Jan-2003 | 568 | 2,586 | 2,486 | 2,214 | 2,129 | 2,014 | 1,002 | 12,999
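A small sketch of the crossover fix, using the daily values from the first two rows of Table 1 and hypothetical calendar dates; resampling into true seven-day bins combines the tail of one year with the head of the next into a single week:

```python
import pandas as pd

# Daily volumes across the year boundary (values from Table 1; the year
# ends midweek, so calendar months split the second week in two).
daily = pd.Series(
    [486, 2358, 2163, 2054, 1923, 1875, 899,       # first full week
     398, 2456, 2234, 2096, 2001, 1899, 923],      # crossover week
    index=pd.date_range("2002-12-22", periods=14),  # hypothetical dates
)

# Regroup into seven-day bins so the last days of 2002 and the first
# days of 2003 land in one week, as in Table 1:
weekly = daily.resample("7D").sum()
print(weekly)   # weekly totals: 11,758 and 12,007, matching Table 1
```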

We also need to deal with the end-of-year crossover—the adjustment for a year that ends in the middle of a week. (Most reporting systems effectively deliver data at the daily and monthly levels, but often do not provide good weekly reporting.) Table 1 demonstrates this adjustment. In the table, 2002 ends on a Wednesday. This is not an issue for a monthly forecast, but it must be accounted for when we forecast at the weekly level. In this case, the problem is resolved by adding the last four days of 2002 to the first three days of 2003.

Table 1. End-of-Year Adjustment (Holidays Normalized)

  Month/Year            Week Begin   Sun   Mon     Tue     Wed     Thu     Fri     Sat     Total Calls
  Dec-2002              21-Dec       486   2,358   2,163   2,054   1,923   1,875   899     11,758
  Dec-2002 / Jan-2003   28-Dec       398   2,456   2,234   2,096   2,001   1,899   923     12,007
  Jan-2003              4-Jan        568   2,586   2,486   2,214   2,129   2,014   1,002   12,999
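The splice itself is mechanical once daily data are available. A sketch (ours; it assumes pandas, daily counts, and Sunday-to-Saturday weeks):

```python
# Weekly totals built by resampling daily data ignore the calendar year, so
# the week that straddles the year-end automatically combines its December
# days with its January days -- the adjustment shown in Table 1.
import pandas as pd

days = pd.date_range("2002-12-22", "2003-01-11", freq="D")    # illustrative span
daily = pd.Series(range(100, 100 + len(days)), index=days)    # fake daily counts

weekly = daily.resample("W-SAT").sum()   # "W-SAT": weeks ending on Saturday
print(weekly)
```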

We can now fit an exponential-smoothing model to the weekly call-volume data. The model will account for any

trend and seasonality in the data and will provide estimates of the call-volume lift from each of the coded events in relation to Code-0 weeks (weeks without events). The call-volume lifts are reported under the label Factor in Table 3. Each factor is the ratio of the average number of calls during the weeks associated with an event code to the average number of calls during the Code-0 weeks, less 1; that is, the proportional lift. For example, Code-1 weeks average 5,963 calls against a Code-0 average of 5,682, a lift of 0.049, or 4.9%.

An alternative calculation of the event factors can be implemented within a spreadsheet: find the average number of calls per week for each code, then (judgmentally) adjust these averages for perceived trend and seasonality in the data. The use of exponential smoothing is preferable to this spreadsheet procedure, but it requires specialized forecasting software, such as Forecast Pro from Business Forecast Systems, Inc.
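A sketch of that spreadsheet-style calculation, using illustrative numbers rather than the article's data, and omitting the judgmental trend and seasonality adjustment:

```python
# Naive event factors: average weekly calls per event code, expressed as a
# proportional lift over the Code-0 (no-event) average.
import pandas as pd

df = pd.DataFrame({
    "calls": [5700, 5650, 5980, 6240, 5690, 6550, 5950, 6160],
    "code":  [0,    0,    1,    3,    0,    5,    1,    4   ],
})

avg_by_code = df.groupby("code")["calls"].mean()
factors = avg_by_code / avg_by_code.loc[0] - 1.0   # lift relative to Code 0
print(factors.round(3))
```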

The event model described here for predicting the impacts of RIOs is not practical for short time series (fewer than two years of weekly data); in the short-series scenario, we would rely more heavily on judgmental adjustments. Nor is it practical for organizations with a wide variety of events, or when the event effects are inconsistent. For example, if branded advertising causes large spikes in volume during certain weeks but no spike at all during other weeks, this model will not work well. However, these models do identify events that require more detailed analysis.

Table 3. Event Factors

  Code   Avg. Calls   Factor
   0       5,682      (baseline)
   1       5,963      0.049
   2       5,786      0.018
   3       6,225      0.096
   4       6,157      0.084
   5       6,537      0.150
   6       6,587      0.159
   7       6,230      0.096

Table 3 reveals that, among the primary events, in-store displays have the greatest volume lift at 9.6%, with payroll weeks second at 4.9% and branded advertising far behind at 1.8%. We also analyze the variance of the factor for each event to determine (1) how accurate the event factor is and (2) whether the event factor exhibits any trends. The table also shows that the combined lift for pay weeks with in-store displays (Code 5) is greater than the sum of the two individual lifts (Code 1 + Code 3): 15.0% versus a combined 14.5%. Likewise, branded advertising combined with payroll weeks (Code 4) has a greater effect than the individual lifts suggest: 8.4% versus a combined 6.7%. These results suggest a synergy between pay weeks and promotional events. For example, it probably makes sense to advertise more during the weeks in which your customers get paid.

Summary

Call centers often struggle to forecast call volume; a forecaster's only recourse may be to quiz phone agents about the reasons that people call. Recurring irregular occurrences (RIOs) pose a particular challenge because their effects must be forecast at weekly and even daily intervals. Event models allow us to quantify the RIO effects.

Ultimately, call centers would like to identify the underlying drivers of call volume, thus permitting implementation of causal forecasting models. Until systems can be set up to deliver this information to the forecaster, however, the procedures described here can help improve forecast accuracy.


Contact Info: Dan Rickwalder Incoming Calls Management Institute [email protected]

COMMENTARY ON CALL CENTER FORECASTING
by Tim Montgomery, The Service Level Group, LLC

Tim Montgomery is founder of the Service Level Group, which provides consulting services to call centers. Tim spent three years as a consultant, seminar leader, and technology advisor with the Incoming Calls Management Institute (ICMI), and he held leadership positions in three of the most celebrated service companies in the world: USAA, The Coca-Cola Company, and The SCOOTER Store. Tim has written articles and white papers on the management of call centers and has been featured in several global publications. He earned his BBA in accounting and his MBA from the University of Texas at San Antonio.

Jay Minnucci's article on nano forecasting addresses two challenges faced by call centers. First, Jay explains why every call center should adopt a short-interval approach to forecasting and staffing. Many call centers would benefit by considering their inbound workload from Jay's "ramp-up" perspective. By dividing the early workload into 15-minute increments, call centers would be better able to utilize their personnel and to improve customer satisfaction. The nano-forecasting approach also helps set the stage for the entire day: when you get behind in a call center, it can take several intervals to catch up, often with a lot of people running around trying to "put out the fire."

The second challenge is to properly use call center forecasting tools to predict the carryover or hangover—the length of the incoming call queues at times when staffing is insufficient. Following Jay's recommendations, a call center would be able to add staff in subsequent intervals, allowing for a faster, more informed (and less chaotic) reduction in the hangover queue. Jay's nano-forecasting approach can also help call centers better manage their environments on an intra-day basis.

For call centers to fully leverage their nano-forecasting knowledge, they must develop a way to quickly disseminate this information to the people who work on the front line. The first step is to communicate the expected workforce variances for each reported interval; a simple spreadsheet at the beginning of each day is a great starting point. Those responsible for capturing and reporting the interval variance would have the responsibility for updating the spreadsheet based on actual events. The spreadsheet, along with the ongoing updates, could then be e-mailed or posted on an intranet site. Providing this continual "snapshot" of the workforce and workload distribution by interval will eliminate many queue surprises.

Through nano forecasting, call centers can better leverage the information at their fingertips to help supervisors, frontline agents, and customers.

At some point, every call center manager experiences a disconnect with the center's marketing group. Poor communication from marketing creates unexpected surges or lulls in call volume, with the consequence of overstaffing or understaffing, possibly for several days. In his case study "Forecasting Call Flow in a Direct Marketing Environment," Peter Varisco illustrates how using a dynamic modeling approach for a marketing campaign can improve forecasting accuracy, provide more information on expected caller-response rates, and provide greater insight into call-volume expectations. All these factors make it easier for call center leaders to do their jobs effectively; the result is more precise scheduling and a better working environment for frontline employees.

Peter's analysis illustrates that call centers need to take a more scientific approach to understanding the drivers of call volume and how to model the timing of marketing campaigns. Working together, call center leaders and marketing departments can use Peter's approach to balance the timing of marketing campaigns against the expected call center staffing. The approach is a win for everyone: the marketing department gets better information for planning; the call center has additional details on volumes and types; and the customer is quickly connected to the appropriate agent. Peter shows the benefit of applying advanced forecasting models to the relationship between the timing of the message and the overall call volume.

Equally important for call center managers is the ability to leverage information by balancing staff utilization against the expected workload. Because call centers have a limited number of resources available on any given day, it is vital that they control as many of the internally created demand channels as possible. Call centers operate in a real-time environment. To be managed effectively, they must have an interval (typically 15- or 30-minute) focus on staffing and reporting. The challenge is further compounded by internal and external distractions: unplanned events that can affect call volumes, staffing, and cost of service.

In his article "Forecasting Weekly Effects of Recurring Irregular Occurrences," Dan Rickwalder starts by recommending something that all call centers should do on a regular basis: clean the data. Most call centers now have the ability to gather volume information down to the weekly level; however, as Dan points out, if you rely solely on the numbers from your system, you might forecast from the wrong baseline.


Dan shows us how to apply event modeling to determine the potential impact on call volumes of recurring irregular events. Many call centers have this type of information in front of them, but some managers don't know how to tie the activities together in ways that allow for better planning of staffing and controllable events.

While Dan's focus is on call volume, his cleaning approach and event modeling can also be applied to handle time (time spent by agents on calls). If handle time were added to each of the RIOs that Dan outlines, and were coded the same way, a call center could obtain not only the call-volume lift from individual RIOs but also estimates of expected handle times. There would most likely be an increase in handle time during the weeks when all three RIOs occur. An increase in handle time, combined with an increase in call volume, will create an even greater staffing challenge during the weeks when all three activities occur. By using event modeling, call centers can better forecast their expected staffing needs and better plan those events within their control.


Contact Info: Tim Montgomery The Service Level Group, LLC [email protected]


FORECAST PROCESS IMPROVEMENT: Lessons from Successful Companies

MANAGING THE INTRODUCTION OF A STRUCTURED FORECAST PROCESS: Transformation Lessons from Coca-Cola Enterprises Inc.
by Simon Clarke

Preview: Simon Clarke, Manager of Forecasting and Planning for Coca-Cola Enterprises Inc., led a corporate team that engineered a radical transformation of the forecasting process. The team took his organization from an unstructured, decentralized process to a disciplined internal collaboration of over 2,000 forecasters in a highly volatile promotional environment. Here he describes the lessons learned in managing the transformation.

Simon Clarke is the Manager of Forecasting and Planning for Coca-Cola Enterprises Inc., North American Business Unit. His career covers a variety of roles across operations, sales, marketing, and logistics, both in North America and Europe. Simon is a graduate of Grey College, Durham University, UK, where he received a degree in geography.

 Create a vision statement.
 Form a strong opinion and avoid compromises.
 Define the measures of success and how success affects the bottom line.
 Be clear and specific on roles and responsibilities.
 Focus on process before software.
 Manage change as a priority.
 Hire the very best people.
 Spend more time with users than you think you can afford.

PREFACE: Established in 1986, Coca-Cola Enterprises Inc. (CCE) is a young company; however, its business roots stretch back to 1899, when the first Coca-Cola bottling operations began. Built from many individual franchises, Coca-Cola Enterprises Inc. is now the largest soft drink bottler in the world, with global revenues exceeding $18 billion per year.

Introduction

Five years ago we were at a crossroads, facing significant forecasting challenges. As a result of rapid growth—through organic growth and acquisitions—we found ourselves maintaining more than 20 forecasting systems. Some were sophisticated and tailored to local needs. Others were simply inadequate for modern business management. Adding to the complexity of a multisystem environment were (1) the need to contend with a nonstandard data hierarchy and (2) a lack of consensus on the definition of standard metrics. We struggled to establish a consolidated view of the business, and this problem limited our ability to plan the business effectively.

During the next several years, we implemented a program of change. Our goal was to improve our organizational capability in forecasting. There is evidence that many other organizations face similar challenges (Moon, Mentzer, & Smith, 2003). I hope this description of our transformation will be valuable to other companies.

Identifying the Challenges

The first step was to identify three key characteristics of our business: scale of operation, volatility of demand for our products, and a fast pace of change.

Scale

In North America, CCE operates in 46 states and throughout Canada. Every location differs in scale of operation, in marketplace dynamics, and in the performance of our products. Therefore, our marketing approach is a function of location.

Volatility of Demand

Because our products are heavily promoted and react to seasonal and short-term weather events, our demand data exhibit great volatility. At the SKU (stock-keeping unit) level, we may see demand swings exceeding 400% from week to week. Figure 1 shows the weekly volatility for just one of our SKUs. Such volatility requires that the supply chain be well planned and proactively managed.

[Figure 1. Weekly Demand Volatility for a Single SKU (volume by week).]

Pace of Change

To accommodate consumer preferences, we must constantly change our brands. Such marketing shifts create headaches for our forecasters. A generation ago, our business was primarily focused on soft drinks. Now our portfolio includes juices, fruit drinks, water, sports drinks, tea, and energy drinks. In the last 20 years, the number of our SKUs has grown more than sixfold.

Transformation Lessons

To bring about change in our organization, a small group was assembled late in 2001. Known as the Uniform Forecast and Planning (UFP) team, we were responsible for redesigning the forecasting process, promoting acceptance, and managing the transformation. There were several key lessons that we learned along the way.

Lesson 1 – Create a vision statement.

The creation of a vision statement is a highly valuable preparatory activity. This document is often overlooked by companies engaged in process redesign. The creation of our vision statement did not determine whether we would succeed or fail, but it certainly made our lives easier as the project developed and as we tried to evaluate our progress. Our shared vision statement set clear expectations regarding the scope and objectives of our transformation. The statement also described responsibilities of key personnel.

In our case, we received support from our chief financial officer and from a broad group of other key executives. To get such strong advocacy among senior leadership, we prepared diligently and created a strong, fact-based business case that demonstrated the financial benefits of the change. We also promoted the "soft benefits" of the transformation, such as optimized planning capability.

As we gained senior leadership support, we also spent considerable time trying to influence other key opinion leaders within our business. The profiles of these opinion leaders were diverse, but all these people were highly respected by their peers and were willing to promote change at every opportunity. Although not formally engaged in the decision-making process, these opinion leaders were invaluable in creating positive momentum.

During the course of the project, several challenges surfaced. Some field users voiced specific objections that were perhaps locally important but had little resonance elsewhere in the system. To rebuff these challenges, we referred to the vision statement. In doing so, we redirected attention to the original plan and to the fundamental benefits of the transformation. Our experience substantiates the recommendation in the forecasting literature that collaboration and communication enable a process of change. Be sure to read the series "The Organizational Politics of Forecasting" in the October 2005 issue of Foresight (Deschamps et al., 2005).

Lesson 2 – Form a strong opinion and avoid compromises.

When meeting with our senior leadership and opinion leaders, we found that a strong, well-articulated opinion keeps conversations focused and effective. The same principle directed our approach to soliciting feedback from subject-matter experts who were assembled from each functional area and business unit. These experts gave us direction and feedback on key process-design questions. Our approach to getting this feedback was first to state how a process should work, based on our past observations.

We also described the best systems from other consumer packaged goods (CPG) companies. We then gave the subject-matter experts the opportunity to improve on these systems and to make changes when necessary. In most cases, the original design remained intact, albeit with many adjustments and improvements. We did not enter into these meetings with a completely blank slate; we wanted to avoid encouraging a series of negotiations and compromises among the participants. We focused on creating a new system, not on re-creating the past. We wished to avoid competition among advocates of our current forecasting processes. Such competition would have thwarted any attempt to transform our system.

Lesson 3 – Define the measures of success and how success affects the bottom line.

As part of the process of creating the vision statement, we described what forecasting success might look like—we described it as utopia. Consequently, we understood what we wanted to measure and how we might calculate "success." Unfortunately, we had made a mistake: when we rolled out the process, we did not have the benchmarks to determine whether our impact was successful. This gap in functionality led us to build a forecast-benchmarking area within our system that would display key performance metrics.

Once process adoption and performance metrics were in place, we were able not only to track the performance of each participating group across the company, but also to provide each business unit with incentives to reward top performers. The monetary value of the incentive was probably less of a motivational force than the competition itself. The galvanizing effect of the incentive was impressive, and accuracy improvements were evident and sustained.

The metrics dealt not only with forecast accuracy but also with supply-chain indicators such as inventory for finished goods and raw materials; service-level performance was addressed as well. These financial indicators revealed the value of our project to the bottom line, and they gave evidence to senior leadership of the effectiveness of our investment.

Lesson 4 – Be clear and specific on roles and responsibilities.

Our new collaborative forecasting process hinges on the inputs of constituent groups across functional areas of the business. We strove to design a process of shared accountability for the forecasts. However, in doing so, we knew that we would open ourselves to the possibility of misunderstanding and finger-pointing. To avoid these hiccups, we clearly articulated, with plenty of advance notice, each team member's responsibilities. We also mandated the timing of each individual's forecasting contributions.

We found that repeating the message of collaboration, however tiresome, was extremely valuable. Repetition of key messages was included both in formal presentations and, less conspicuously, in low-key informal communication, such as user conference calls. We began to recognize the true value of this repetition when we started to see unscripted local communication that contained our core message.

Lesson 5 – Focus on process before software.

We spent much time considering software options, but we did not let the software determine what our forecasting process should be. Instead, our process determined our software tools. As we examined the systems of several vendors, we concluded that, despite some packaging variation, the programs offered similar functionality. We also concluded that our process and scale requirements could not be satisfactorily met by any packaged software solution. We elected to develop an in-house solution that would not only meet our specific functionality requirements but also support the individual inputs of 2,000 weekly forecasters.

Before your forecasting process is in place, agonizing over which software package to select is a misspent effort. We recommend that you design the forecasting process before you evaluate software. Once the process is defined, a decision on the software system becomes a critical step.

Lesson 6 – Manage change as a priority.

Assess the impact of change on your users. Management of change is not merely a case of training users to perform their new tasks. It is in fact a broad set of activities designed to overcome the natural resistance to change—that is, to facilitate a seamless transition to a new way of working. Like any organization going through redesign of a core process, our group encountered some pockets of resistance. Most of this was benign, but we needed to overcome it, and we gave that effort a high priority.

Our new forecasting process was implemented in 21 business units, and in each case we carefully explained the potential impact of change on each user group. Using a formal Web-based survey of attitudes, concerns, and skills, we assessed each group's readiness for change. In those instances where there was a high degree of comfort between the leadership group and the user community, we needed to undertake little more than thorough communication and training. In more challenging situations, we evaluated the concerns of each user group and designed specific activities to address those concerns. For example, when one group expressed concern about how a forecast was to be disaggregated for detailed planning, we designed a model that could simulate this activity and allow users to gain familiarity through experimentation.

Face-to-face contact was important and was the preferred means for gaining acceptance of change. Because it was impossible for the UFP team to be everywhere at all times, we used other means of communication as well: conference calls, e-mail, intranet, and a monthly newsletter.

Lesson 7 – Hire the very best people.

It may seem logical to select the most proficient forecasters within your organization to lead your forecasting transformation. Their technical savvy and earned respect will be advantageous, but there are other dimensions that need to be considered. These key attributes include the ability to relate to others, the confidence to hold the line, and basic communication skills. We have found that it is easier to teach technical skills than it is to coach improved personal skills, particularly in a fast-paced project environment. Forecasting is repetitive and process-driven, and acquired behaviors can be difficult to break down. As a result, we see enthusiasm and perseverance as critical personal attributes.

In addition, we have found that the most effective way to deal with objections is to empathize with those affected and to put personal effort into understanding the nature and significance of the concerns expressed. Such empathy requires highly proficient communication and people skills.

Lesson 8 – Spend more time with users than you think you can afford.

When buried beneath a mountain of e-mail and voice-mail messages, you might be hard-pressed to spend time with your forecasters. But that time is essential to ensure that your project remains on track.

We were determined that training should be managed by the UFP team, despite the offer of support from our technical training team. We wanted to maximize the time we spent with our end users. Also, we wanted to explain how and why the system was being changed.

To keep the dialogue open, we held twice-monthly conference calls with the business units. We wanted to cascade information out to the field, to capture any resident concerns, and to share best practices. A testament to the effectiveness of this strategy is still evident today: the conference calls were intended to last only for the duration of the implementation; however, they have become a permanent component of managing the forecast process. Other techniques included creating a monthly newsletter that detailed the "big picture" changes while also giving bite-sized tips and tricks.

We recommend regular trips into the field to meet with users and their management teams. There you will learn about transformational successes and failures. Our field visits ensured that we did not become too complacent about our new process. The visits demonstrated to the users our own openness to revision.

Building an in-house forecasting tool and capturing design requirements from subject-matter experts helped us in unexpected ways. By asking the subject-matter experts to help design the functionality and test the tools, we gained their implicit buy-in. Had we bought an external solution, and had we been less reliant on our own resources, we might have lost some internal support.

Results

By creating a standardized process, we have greatly enhanced our forecasting capabilities. The key characteristics of the old and new processes are described in Table 1 below. Improvements in the process have enabled higher-quality inputs that lead to more valuable insights. For instance, a change in a large promotional event is more rapidly reflected in the forecast than ever before because roles and responsibilities are more clearly defined. In addition, because each business unit employs an identical methodology, improvements in the forecasting technique can be more rapidly shared and adopted. This has made us more nimble and adaptive to change. Probably the most significant benefit, however, has been our ability to effectively leverage the knowledge resident in each area of the business (sales, revenue-growth management, operations, and finance). This leverage has enabled us to generate more accurate forecasts.

Table 1. Results

  Characteristics                          Old Process                        New Process
  Consistent forecasting methodology       Multiple different methodologies   Single methodology
  Consistent roles and responsibilities    Different by business unit         Single set of roles and responsibilities
  Knowledge and best-practice sharing      Little exchange of knowledge       Active exchange of internal and external best practice
  Reactivity to changes                    Slow reaction to changes           Fast reaction to changes
  Integrated collaborative forecasts       Some collaboration                 Full cross-functional collaboration

Figure 2 demonstrates our progress over the last four years with just two of our key metrics: forecast accuracy and days of inventory. The chart describes each phase of the project—implementation, adoption, development, and maturity. The most radical improvements in accuracy and reduction in inventory have occurred in the implementation and process-adoption phases; however, despite there being fewer easy "wins," there have been ongoing improvements as the process has been fine-tuned.

Besides providing better supply-chain inputs, forecasting has been the catalyst for the development of new processes and capabilities in other areas of the business. In revenue-growth management, for example, we have been able to make optimized pricing decisions; we have also been able to identify revenue-growth opportunities.

[Figure 2. Progress in Two Key Metrics: Plant_SKU Accuracy (%) and Days of Inventory, plotted by month across the implementation, adoption, development, and maturity phases.]

Our supply-chain metrics have shown marked improvement, with reduced inventory stocks and better synchronization across the supply chain:

 Forecast errors at the SKU level by plant have been reduced on average by 15%.
 The stability of the weekly forecasts has also dramatically improved, with week-to-week fluctuations far smaller than before.
 We can more accurately predict how much labor we need to accommodate the merchandising requirements of our customers. Such forecasts improve our service and ensure the availability of our in-store products.


How much of our success has been driven by the adoption of the process, versus the adoption of the software tool? In our opinion, the process allows the tool to be effective. Success will come through focus on the process, on the end user, and on the management of change.

References

Deschamps, E. et al. (2005). The organizational politics of forecasting, Foresight: The International Journal of Applied Forecasting, Issue 2, 5-11.

Moon, M. A., Mentzer, J. T. & Smith, C. D. (2003). Conducting a sales forecasting audit, International Journal of Forecasting, 19(1), 5-25.

Contact Info: Simon Clarke Coca-Cola Enterprises Inc. [email protected]


BREAKING DOWN BARRIERS TO FORECAST PROCESS IMPROVEMENT
by Mark Moon

Preview: Mark draws upon his experience in audits of the forecast process at many large companies to identify the key barriers to forecast process improvement and how these barriers may be overcome. He examines the critical role of the forecast audit; discusses the need for changes in organization structure, forecast process, computer systems, and performance measurement; and explains how a forecast champion may be necessary to successfully implement the requisite changes.

Mark is an Associate Professor of Marketing at the University of Tennessee, Knoxville. He has an MBA from the University of Michigan and a PhD from the University of North Carolina. Mark's professional experience includes positions in sales and marketing with IBM and Xerox, and he has consulted with numerous companies, including Eastman Chemical, Hershey Foods, Lucent Technologies, DuPont, Union Pacific Railroad, Motorola, Sony, and Sara Lee. Mark teaches courses in demand planning, forecasting, and marketing strategy in the University of Tennessee's Center for Executive Education. He is the author, along with Dr. John T. (Tom) Mentzer, of Sales Forecasting Management: A Demand Management Approach.

 To improve the forecast process, organizational leaders must understand what needs to be improved, must develop a plan to fix their problems, and must identify and overcome any barriers to implementing the plan.
 To understand what needs to be improved, the organization should conduct a forecasting audit that provides a snapshot of the organization "as is."
 The plan to address the problems should consider organizational improvement, process improvement, systems improvement, performance-measurement improvement, and training programs.
 Even the most artfully conceived improvement plan can fail because there are barriers that hinder the design and implementation of the project. The presence of an effective change agent—a forecasting champion—is the most important element in breaking down such barriers.

Introduction

Demand forecasting is a management process that most companies wish to improve. It is integral to excellence in supply-chain management: understanding future demand is a critical element in planning future supply. Demand forecasting is also integral to successful management of customer relations. Excellence in demand forecasting allows an organization to provide customers with the products or services they want, when and where they want them.

Some clear signals emerge when demand forecasting is ineffective. When inventory levels begin to get out of control, or customer fill rates begin to suffer, companies know that ineffective demand forecasting is to blame and that forecast process improvement (FPI) is needed. However, FPI efforts are often unsuccessful. In this article I will explain some potential problems with forecast process improvement, and I will describe management strategies that can overcome these barriers.

My insights come primarily from a program of research conducted by members of the University of Tennessee Forecasting Research Team. In a benchmark study (Mentzer, Bienstock & Kahn, 1999), the team conducted in-depth work with 20 organizations (see the list of companies in the appendix). To date, 27 companies have participated in its forecasting-audit research (Moon, Mentzer & Smith, 2003). Not all of these FPI efforts were successful, but the experience gained from participating in the process provides the background for my analysis.

There are three stages for forecast process improvement:
 Understand what needs to be improved.
 Develop a plan to fix the organization's problems.
 Identify and overcome barriers to implementing the plan.

To understand what needs to be improved, conduct a forecasting audit.

To begin FPI, the organization needs a specific understanding of what needs to be improved. An excellent way to document problems is to conduct a forecasting audit.

Whether conducted internally or managed by a team of industry experts, a forecasting audit has three objectives. First, it should document the current state of the forecasting process—it should take a snapshot of the organization "as is." Second, it should articulate the goals of forecast process improvement, what we call the "should-be" state of forecasting practice. Finally, an audit should present a roadmap to guide the process-improvement effort.

Whether they are internal or external to the firm, auditors must possess certain qualifications. They should have a detailed knowledge of accepted industry standards. They should have no incentive to overlook sensitive areas. And they should have the ability to gather honest opinions from participants. If such expertise and independence resides within the organization, an internal audit can be effective. Otherwise, the organization should use external auditors.

The audit must document the current state of the organization's forecasting practice. Mentzer, Bienstock, and Kahn (1999) provide a useful framework for doing so. They identify four dimensions of forecasting practice (functional integration, approach, systems, and performance measurement), and they articulate four stages of sophistication for each dimension. Using this framework, a company can identify strengths and weaknesses in its current forecasting practice. For example, a firm may find that it is reasonably strong in the systems dimension but lacking in functional integration. Such insight provides guidance for the direction of process-improvement efforts. For a more detailed description of the forecasting audit, see Moon, Mentzer, and Smith (2003).

Develop a concrete plan for process improvement.

Once auditors have identified the key problems, the organization should develop a concrete plan to address them. This plan should consider improvements in organization, process, systems, performance measurement, and training.

Organizational improvement

A problem exists when forecasting is perceived as being organizationally bound to one functional area, especially when that area inappropriately introduces its biases into the forecasts. A second problem arises when the forecasting function lacks adequate organizational clout to implement the desired process change. A third occurs when there are insufficient resources (human, system, or financial) dedicated to the forecasting function.

In most companies, true forecast process improvement requires some form of organizational realignment. In some cases, additional human resources may be warranted. In other cases, a centralized forecasting team needs to be established, headed by a true forecasting champion (Mentzer, Moon, Kent, & Smith, 1997), one who is unencumbered, to the extent possible, by functional biases (Deschamps, 2005).

Process improvement

Major problems with the forecasting process can occur (1) when insufficient attention is given to statistical modeling of historical demand, (2) when there are misguided qualitative adjustments to statistical forecasts, and (3) when the financial goals of the firm exert undue influence on the forecasting process. In such circumstances, the FPI plan should suggest a more formalized, disciplined forecasting process. Deschamps (2005) makes a number of suggestions to purge bias from the forecast process.

Systems improvement

When companies use outdated legacy systems that fail to provide adequate functionality, system improvement requires a straightforward implementation of a new, functionally richer forecasting system. Alternatively, the problem may be a lack of integration between the forecasting system and other upstream and downstream systems; this problem can result in a manual transfer of data and possibly a lack of access among key users. Here the plan must call for significant realignment, perhaps a migration to the forecasting system of an integrated supply-chain suite.

Perhaps the most daunting system problem is the presence of “islands of analysis,” where forecasting is done by people who use different tools, make different assumptions, and access different sources of data. Aggravating the problem is the reliance on Excel as the forecasting tool. To see the pitfalls in the use of Excel for forecasting, you should read the Special Feature in Issue 3 (February 2006) of Foresight, entitled “Software: Spotlight on Excel.” The organization that uses Excel must consider changing to an appropriate integrated forecasting system. Moreover, the firm should address the challenge of behavioral change. It is difficult to convince forecasters to give up their treasured spreadsheets and to migrate to a common forecasting platform.

Improvement in performance measurement

Many organizations fail to follow this management mantra: "What gets measured gets rewarded, and what gets rewarded gets done" (Mentzer & Moon, 2005). As Dhuyvetter (2005) shows, companies can benefit from the insights available from their sales and marketing groups as inputs to or adjustments of statistical forecasts. However, incentives must be in place to ensure good work. Many companies lack a performance-measurement process that documents the quality of forecasting and rewards effective forecasting performance.

Improvement in training

Forecasters often receive inadequate training in forecasting. We find that forecasters, marketing personnel, and salespeople are often provided only with instruction on how to interact with the system, or how to "fill out the forms." To achieve true forecasting improvement, a comprehensive training program should be developed. Forecasting personnel need detailed instruction on how to perform statistical analysis of historical demand. They also need to understand the goals and procedures of the overall forecasting process. Users need to know what their forecasting system is doing, what it's not doing, and what it takes to get the most benefit from the system's capabilities.

These five areas—organization, process, systems, performance measurement, and training—are crucial in forecast process improvement. But even the most artfully conceived improvement process can fail because there are barriers that hinder the design and implementation of such a project.

Identify and overcome barriers to process improvement.

In the dozens of companies that have participated in our research over the past two decades, we have seen three assumptions that are major barriers to forecast process improvement:

A. The culture is wrong.
B. The system is the solution.
C. Management doesn't get it.

A. The culture is wrong.

During forecasting audits, we hear the following sentiments:

"Sales and operations live in different worlds. They can't communicate."
"You can't believe anything that comes out of sales. They're way too optimistic."
"You can't believe anything that comes out of sales. They're sandbaggers!"
"You can't believe anything that comes out of operations. They just want to produce what makes them look good, not what the customers want."
"We're not making our numbers. Increase the forecast!"
"We're not making our numbers. Push some more product on our distributors!"

Such perceptions indicate a culture that fails to recognize the importance of accurate forecasts. John Mello (2005) ascribes such corporate attitudes to misplaced strategies of "goal enforcement" and "manipulation"; these attitudes also result from a lack of understanding about how demand forecasting drives the supply chain.

Organization-wide training efforts can overcome cultural differences that often exist between the demand and supply sides of an organization. In addition, performance-measurement and reward strategies can focus forecasters' attention outside their own functional silos. If forecast-accuracy metrics are built into performance evaluation, salespeople quickly become more aligned with the supply side of the organization, and they

begin to understand their roles in driving supply-chain efficiency. In turn, such accountability improves the attitude of supply-side people toward the sales organization. The most effective way to overcome the “wrong culture” barrier is to set the forecast function in its rightful place within an integrated demand-supply process that we now call Sales and Operations Planning (S&OP) (Lapide 2002). The overall benefit will be the creation of a formal, disciplined process by which the demand side of the company communicates its demand forecast, and the supply side of the company communicates its capacity constraints. Then, in the context of the financial goals of the firm, strategic decisions can be made about what to produce (the operational plans) and what to sell (the demand plans). The key element here is that an effective process is formal and disciplined. With such a process in place, both sides of the organization’s cultural divisions can begin to appreciate the issues faced by the other side, and both can work together to operate a more effective supply chain. Mello (2005) argues that “coming together” in this way promotes teamwork, consensus, and more effective forecasting.

B. The system is the solution.

There is often a perception that forecast process improvement can be achieved through technology alone. The belief is that investment in technology will automatically make the forecasts more accurate. In many organizations, technology solutions are purchased and implemented before there is a complete understanding of the appropriate process for forecasting. Companies may attack problems with technology, rather than address the more slippery issues of cultural misalignment. And people may even reduce the time and energy they devote to forecasting, believing that the new technology will do the job for them.

Overcoming the "system is the solution" barrier requires reframing the source of the problem, and this is where an audit can be extremely beneficial. The audit should be perceived as an exercise to understand the strengths and weaknesses of the overall forecasting process. In other words, the audit should not be considered simply a technology-system evaluation. By drawing attention to issues such as functional integration, culture, and performance measurement, the audit provides a blueprint for overall improvement, and it broadens any narrow focus on technology.

C. Management doesn't get it.

In some organizations, senior executives do not understand the importance of demand forecasting. Senior managers may not embrace the notions that the firm participates in a supply chain and that effective supply-chain management requires effective anticipation of future demand. In other companies, management has not been shown the gains that are available from an investment in forecasting excellence.

Overcoming this barrier requires an effective forecasting champion who can recognize the importance of gaining support from senior executives (Mentzer, Moon, Kent, & Smith, 1997). Speaking management jargon and demonstrating the financial gains from forecasting improvement will appeal to senior management. Mentzer (1999) provides an excellent template for speaking the language of executives by showing how forecast process improvement can impact the firm's return on shareholder value. Process improvement can result in top-line effects from fewer stock-outs and higher fill rates. It can lower inventory carrying costs, reduce transshipment costs, and allow more effective procurement efforts, both for materials and transportation. And process improvement can impact the balance sheet by dramatically lowering inventory levels. All this adds up to significant improvement in return on shareholder value, a measurement that will always appeal to senior management.

Executive sponsorship can also be enlisted by identifying the "pain points" that currently exist in the organization, such as out-of-control levels of inventory. In this case, the chief supply-chain officer is a likely candidate for executive sponsorship. When the pain point is the failure to supply scarce products to important customers, the chief sales or marketing officer can be an effective executive sponsor.

It is critical to obtain effective executive sponsorship for any effort in process improvement. Identifying true pain points and "selling" process improvement through financial results are the best ways to overcome the "management doesn't get it" barrier.

Conclusions

Most companies can benefit from forecast process improvement. But leaders must recognize potential barriers to this improvement. These barriers are normally associated with cultural issues, making them difficult to overcome. The presence of an effective change agent—a forecasting champion—is the most important element in breaking down such barriers. This champion must be someone who has the ear of senior executives, who can provide the resources and sponsorship required for true change, and who has the ability to gain acceptance from a variety of functional areas—marketing, sales, finance, logistics, operations, and procurement. Building that level of consensus will integrate the demand-supply process.

Although cultural barriers may be difficult to overcome, the effort is generally worthwhile. Forecast process improvement will result in (1) significant improvement in forecasting performance, (2) substantial decreases in inventory and other supply-chain costs, and (3) dramatic improvement in customer service. The ultimate benefit is company profitability.

Appendix

Here are the 20 organizations in the benchmark study by Mentzer, Bienstock, and Kahn (1999): Anheuser-Busch, Becton-Dickinson, Coca Cola, Colgate Palmolive, Federal Express, Kimberly Clark, Lykes Pasco, Nabisco, J.C. Penney, Pillsbury, Prosource, Reckitt Colman, Red Lobster, RJR Tobacco, Sandoz, Schering Plough, Sysco, Tropicana, Warner Lambert, and Westwood Squibb.

Here are the 27 organizations that had participated in the audit research (Moon, Mentzer & Smith, 2003) at the time of this article: AET Films, AlliedSignal, Alticor, Avery Dennison, Bacardi USA, Conagra, Continental Tire, Corning, Deere and Company, DuPont, Eastman Chemical, Ethicon, Exxon, Hershey Foods, Lucent Technologies, Maxtor, Michelin, Motorola PCS, OfficeMax, Orbit Irrigation Products, Pharmavite, Philips Consumer Electronics, Sara Lee Intimate Apparel, Smith & Nephew, Union Pacific Railroad, Whirlpool, and Williamson-Dickie.

References

Deschamps, E. (2005). Six steps to overcome bias in the forecast process, Foresight: The International Journal of Applied Forecasting, Issue 2 (October 2005), 5-11.

Dhuyvetter, R. (2005). Management judgment: Best as an input to the statistical forecasting process, Foresight: The International Journal of Applied Forecasting, Issue 2 (October 2005), 24-26.

Lapide, L. (2002). New developments in business forecasting: You need Sales and Operations Planning, Journal of Business Forecasting (Summer 2002), 11-14.

Mello, J. E. (2005). The impact of corporate culture on sales forecasting, Foresight: The International Journal of Applied Forecasting, Issue 2 (October 2005), 12-15.

Mentzer, J. T. (1999). The impact of forecasting improvement on return on shareholder value, Journal of Business Forecasting (Fall 1999), 8-12.

Mentzer, J. T., Bienstock, C. C. & Kahn, K. B. (1999). Benchmarking sales forecasting management, Business Horizons, 42 (May-June 1999), 48-56.

Mentzer, J. T. & Moon, M. A. (2005). Sales Forecasting Management: A Demand Management Approach, Thousand Oaks, CA: Sage Publications.

Mentzer, J. T., Moon, M. A., Kent, J. L. & Smith, C. D. (1997). The need for a forecasting champion, Journal of Business Forecasting, 16(3), 3-8.

Moon, M. A., Mentzer, J. T. & Smith, C. D. (2003). Conducting a sales forecasting audit, International Journal of Forecasting, 19(1), 5-25.

Tashman, L. (ed.) (2006). Software: Spotlight on Excel, Foresight: The International Journal of Applied Forecasting, Issue 3 (February 2006), 38-47.


Contact Info: Mark A. Moon University of Tennessee [email protected]


SPECIAL FEATURE

FORECAST-ACCURACY METRICS FOR INVENTORY CONTROL AND INTERMITTENT DEMANDS

PREFACE

This special feature of Foresight examines the challenges in measuring forecast accuracy within the context of inventory management. We give special attention to the difficulties in accuracy measurement that arise when demands are intermittent.

This section contains four papers from five authors. The development of the papers was interactive: each author was shown a draft of every other paper, and each was asked to comment on the other authors' ideas. The mutual commentaries led to one or more revisions in the drafts. The final result is a consensus among the authors on the metrics that are preferable in different forecasting situations.

Perhaps the most widely reported metric for forecast accuracy is the mean absolute percentage error (MAPE), which expresses the numerical value of a forecast error as a percentage of the actual value. The MAPE indicates the average percentage by which the forecasts differ from the actual values. Perspective on forecast accuracy can be obtained by comparing MAPEs of different methods, on different series, or between different time periods.

Jim Hoover begins the section by noting that when demands are intermittent—with zero values interspersed in the series—the MAPE cannot be calculated, regardless of which demand-planning software package reports figures under the MAPE heading. Jim describes the problem with these software calculations, then considers several alternatives to the standard MAPE calculation that might overcome the problem. Jim points out further deficiencies of demand-planning software in reporting aggregate forecast errors across items or locations. Finally, he discusses the demand planner's need to assess improvements in forecast accuracy, and he shows that software is not responding adequately to this requirement. Jim is Foresight's Demand-Planning Software Editor and is Chief of Operations Research in the Logistics Operations Directorate of the Defense Logistics Agency. See his demand-planning software columns in Issues 1 and 2 of Foresight.

Tom Willemain, Vice President of Smart Software Inc., faculty member at RPI, and well-known researcher on intermittent demand, explains that MAPEs and other measurements of average errors provide only a speck of the information required for inventory-stocking decisions. He recommends that we pay attention to the entire demand distribution, not just to the average level of demand, and that we assess forecast error at distinct levels of demand. For intermittent-demand data, Tom proposes an accuracy metric based on the chi-square statistic to compare actual and predicted levels of demand.

John Boylan and Aris Syntetos show that the type of forecast-accuracy metric required depends on the inventory rules in place, and on whether accuracy is to be gauged for a single item or across a range of items. Depending on the particular inventory method, we need estimates of mean demand, variance of demand, percentiles of demand, and probabilities of high-demand values. John and Aris also make the important distinction between forecast-accuracy metrics, which measure the errors resulting from a forecast method, and accuracy-implication metrics, which measure the achievement of the organization's stock-holding and service-level goals. In their many publications in leading journals, John and Aris have advanced the state of the art in forecasting intermittent demand. John's article in Issue 1 of Foresight, "Intermittent and Lumpy Demand: A Forecasting Challenge," provides an excellent introduction to this topic.

Rob Hyndman concludes this special feature section by broadening our perspectives on forecast-accuracy metrics. He illustrates four types of accuracy measurements—scale-dependent, percentage-error, relative-error, and scale-free error metrics—and he explains that the appropriate choice depends on what is being forecast. For intermittent demands, he shows that many traditional metrics are inappropriate because they yield infinite or undefined values. Rob introduces a new metric, the mean absolute scaled error (MASE), and he shows that it overcomes the deficiencies of many standard metrics. Rob is Editor in Chief of the International Journal of Forecasting—Foresight's sister publication for original research—and he has published many valuable articles on forecasting methods and forecast-accuracy metrics.

Len Tashman, Foresight Editor


MEASURING FORECAST ACCURACY: OMISSIONS IN TODAY'S FORECASTING ENGINES AND DEMAND-PLANNING SOFTWARE
by Jim Hoover

Preview: In this article, Jim Hoover discusses:
 Forecast error calculations for intermittent (sporadic) demand items
 Issues with measuring forecast error aggregated across items
 Metrics for tracking forecast accuracy improvement over time

Jim Hoover, Foresight's Demand-Planning Software Editor, is a Captain in the U.S. Navy Supply Corps. He is currently assigned as the Chief of Operations Research in the Logistics Operations Directorate of the Defense Logistics Agency, which manages over five million line items of inventory and has sales of over $29 billion annually, supporting U.S. troops worldwide with food, fuel, clothing, spare parts and medical supplies.

 For zero-demand months, one cannot measure a percentage error because the percentage error on a base of zero is undefined. Nevertheless, some software packages report mean absolute percentage errors (MAPEs) for intermittent data; they do so by ignoring the time periods when actual demand is zero.

 To more reliably measure the percentage errors in forecasting intermittent demand, I suggest three variations on the MAPE: the denominator-adjusted MAPE, the symmetric MAPE, and the ratio of MAD to MEAN.

 A demand planner may want to assess the forecast accuracy of a model across a group of items, such as the aggregate of items sold in one location. However, software calculation of aggregated forecast-error metrics may be deficient, with some programs summing errors in a manner that presents an overly favorable picture of accuracy.

 Forecast-error metrics in today's software are almost universally calculated by giving equal weight to all errors, whether they represent the recent or distant past. But these measurements can disguise genuine improvements in accuracy that result from new systems and approaches. To measure accuracy improvements over time, I recommend the calculation of moving averages of forecast errors.

Introduction

Forecast-accuracy metrics are critical guidelines for proper selection and implementation of forecast models. In demand planning, improved accuracy and better modeling translate into reduced inventory costs, increased service levels, or some combination thereof. Unfortunately, today's software fails to provide the demand planner with several types of important information. In this piece, I will discuss three major shortcomings and how you might deal with them:

1. How to measure forecast accuracy when demand is intermittent (sporadic)
2. How to calculate aggregated forecast accuracy across multiple items
3. How to track improvements in forecast accuracy over time

Measuring Forecast Accuracy for Items with Intermittent Demands

Intermittent demands occur when some time periods (months) have no demand at all. These periods are represented as zeros in the midst of the historical time series. The "zero" periods occur in many kinds of businesses, including automotive, aircraft, and military parts. As John Boylan showed us in his Foresight article "Intermittent and Lumpy Demand: A Forecasting Challenge" (Boylan, 2005), these items are very difficult to forecast, but methodological advances are being made.

Problems persist, however, with the available metrics to assess forecast accuracy, and these problems are related to the presence of months when actual demand is zero. Probably the most widely reported metric is the mean absolute percentage error (MAPE). See the definition in Exhibit 1. For example, a MAPE of 15% tells us that the forecasts come within 15% of the actuals on average. For zero-demand months, however, one cannot measure a percentage error because the percentage error on a base of zero is undefined. Technically, the MAPE cannot be computed for those months.

Exhibit 1. Calculation of the MAPE

$$\mathrm{MAPE} = \frac{1}{T} \sum_{t=1}^{T} \left| \frac{\text{Actual Value}_t - \text{Forecast Value}_t}{\text{Actual Value}_t} \right|$$

Nevertheless, I have found that some software packages report MAPEs for intermittent data. These programs do so by ignoring the time periods when the actual demand is zero. For demand patterns with many zeros, the MAPE is derived only from nonzero periods. The MAPE calculation on nonzero values will probably understate the amount of the true forecast error; in any event, it provides only a partial picture of forecast accuracy. The forecaster will learn nothing about the model’s performance in predicting months of zero demand. Is there a better way to calculate a percentage-error metric for intermittent series? Certainly, any worthwhile metric must include all time periods in the calculation. I see several reasonable ways to do this.

1. A Denominator-Adjusted MAPE (DAM)

Referring to the MAPE formula in Exhibit 1, the calculation would proceed unchanged for every period of nonzero demand. For a period of zero demand, the numerator would remain the same, but the denominator would be set to +1 instead of zero. For example, in a month of zero demand, and with a forecast for that month of 1 unit, the absolute percentage error would be |0 – 1|/1 = 1, or 100%.

The interpretation of the DAM is straightforward: one unit of absolute error equals 100% error when the denominator is 0. This is the same percentage error that results when the actual value is 1 and the forecast is off by 1 unit.

The DAM is not without its problems. In his contribution to this section, Hyndman writes that the MAPE calculation remains dicey for periods of small but nonzero demand; in other words, small absolute errors translate into large percentage errors when the denominator is small.

2. The Symmetric MAPE (sMAPE)

Replace the denominator of the MAPE with the average of the actual and forecasted values, [(Actual Value + Forecasted Value)/2], while leaving the numerator of the MAPE the same. This metric is called the symmetric MAPE (sMAPE). The sMAPE denominator will always be nonzero when either the forecasted or actual value is nonzero. However, in a period where the actual and forecasted values are both zero, that period's absolute percentage error will be excluded from the calculation.

Is this a problem? That is, can a forecast of zero occur? My answer is that zero forecasts won't normally result from a statistical model (such as simple exponential smoothing or Croston's method), but they do occur in the automotive parts aftermarket (Kornafel, 2004). They can also occur when you have planes or equipment that have repair parts with a long mean time between failures (Muckstadt, 2005), or when you have repair-part support for a small fleet of vehicles such as the space shuttle (Sherbrooke, 2004). Moreover, forecasted values of zero can occur in an industrial repair facility that uses an MRP II repair-planning system, or in any environment that implements a judgmental or collaborative forecasting approach.

A more serious problem with the sMAPE calculation is pointed out by Boylan and Syntetos: the single-period symmetric absolute percentage error will equal 2 for any period when actual demand is zero, no matter what the size of the error. In effect, use of the sMAPE treats all methods as equally accurate over periods of zero demand.

3. Ratio of MAD/MEAN

Approximate the MAPE by calculating the ratio of the mean absolute error (MAD or MAE) to the mean demand, as shown in Exhibit 2. Instead of the mean absolute percentage error, you get the mean absolute error as a percentage of mean demand.

Exhibit 2. MAD to Mean Ratio

$$\text{Ratio of MAD to Mean} = \frac{\mathrm{MAD}}{\mathrm{Mean}} = \frac{\left( \sum_{t} |\text{Actual Value}_t - \text{Forecast Value}_t| \right) / T}{\left( \sum_{t} \text{Actual Value}_t \right) / T}$$
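To make the three alternatives concrete, here is a minimal Python sketch—mine, not from the article—that reproduces the summary values reported for the Item 1 series of Exhibit 3 below; the function names are my own.

```python
# Sketch of the three MAPE alternatives, on the Item 1 series of Exhibit 3.
actual   = [0, 0, 2, 2, 2, 1, 0, 1, 0, 1, 1, 6]
forecast = [0.80, 0.84, 0.67, 1.34, 1.27, 1.02, 1.81, 1.45, 1.36, 1.09, 1.67, 1.74]

def dam(a, f):
    # Denominator-adjusted MAPE: substitute 1 for the denominator when actual is 0.
    return sum(abs(at - ft) / (at if at != 0 else 1) for at, ft in zip(a, f)) / len(a)

def smape(a, f):
    # Symmetric MAPE: the denominator is the average of actual and forecast.
    # (Periods where both are zero would divide by zero; per the article,
    # such periods are excluded from the calculation.)
    return sum(abs(at - ft) / ((at + ft) / 2) for at, ft in zip(a, f)) / len(a)

def mad_over_mean(a, f):
    # Mean absolute error (MAD) as a percentage of mean demand.
    mad = sum(abs(at - ft) for at, ft in zip(a, f)) / len(a)
    return mad / (sum(a) / len(a))

print(f"DAM      = {dam(actual, forecast):.0%}")            # 68%
print(f"sMAPE    = {smape(actual, forecast):.0%}")          # 99%
print(f"MAD/Mean = {mad_over_mean(actual, forecast):.0%}")  # 81%
```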

Exhibit 3 illustrates how these three metrics compare to the typical computer-calculated MAPE for intermittent demand. The software-calculated MAPE, which excludes the zero-demand periods, is far too small. The three alternative metrics are calculated using all periods, those with both nonzero and zero demands, and they paint a more realistic picture of forecast accuracy.

Exhibit 3. Illustration of the Intermittent Demand Calculations (Item 1)

Period                      1      2      3      4      5      6      7      8      9     10     11     12
History                     0      0      2      2      2      1      0      1      0      1      1      6
Forecast                 0.80   0.84   0.67   1.34   1.27   1.02   1.81   1.45   1.36   1.09   1.67   1.74
Error                   -0.80  -0.84   1.33   0.66   0.73  -0.02  -1.81  -0.45  -1.36  -0.09  -0.67   4.26
Absolute Error           0.80   0.84   1.33   0.66   0.73   0.02   1.81   0.45   1.36   0.09   0.67   4.26
Absolute Percent Error    N/A    N/A    66%    33%    36%     2%    N/A    45%    N/A     9%    67%    71%
Denominator-Modified APE  80%    84%    66%    33%    36%     2%   181%    45%   136%     9%    67%    71%
Symmetric APE            200%   200%    99%    40%    45%     2%   200%    37%   200%     8%    50%   110%

MAD = 1.09; Mean = 1.33
Software-calculated MAPE (nonzero periods only) = 41%; DAM = 68%; sMAPE = 99%; MAD to Mean Ratio = 81%

Calculating Aggregate Forecast Accuracy Across Multiple Items

A demand planner may want to assess the forecast accuracy of a model across a group of items, such as the aggregate of items sold in one location. Furthermore, he or she may

wish to compare aggregated accuracy between or among locations. To measure the accuracy of forecasts across a group of items, we need a unitless metric, one not expressed in volume or value units. We call such a metric scale independent. This point is well made in the accompanying papers by Boylan and Syntetos and by Hyndman. Using scale-independent metrics, we can calculate averages and measurements of the spread of forecast errors across items (in a single forecasting period at a single location). We can use box plots and other graphics to track the distributions of forecast errors over time or across locations.

Software support for aggregated forecast-error metrics is sometimes deficient. Teradata reports that some software incorrectly calculates aggregated forecast errors by summing the errors on individual items and then dividing by the aggregate actual demands (Schrieber, 2005). I have confirmed Teradata's report from my own experience with demand-planning software integrators. In such a calculation, positive and negative errors on individual items can cancel, creating an overly optimistic picture of forecast accuracy at that location. Exhibit 4 illustrates this point along with some alternative calculation methods.

When forecasting multiple items at the SKU level, we must aggregate the absolute errors (or, alternatively, the squared errors). There are several ways to do so. For intermittent series, aggregation methods 3 and 4 require that absolute percentage errors be calculated with the denominator adjustment (DAM) or the sMAPE adjustment, as discussed in the previous section. The Boylan and Syntetos paper in this section recommends an alternative metric for aggregating across items; however, this metric is calculated using absolute errors, not absolute percentage errors.

Exhibit 4. Aggregating Forecast Error Across Items

SKUs at Store    History for      Forecast for     Error                  Absolute   Absolute
Location 1       Current Period   Current Period   (History - Forecast)   Error      Percent Error
Item 1                 5               20                 -15                15         300%
Item 2                10                8                   2                 2          20%
Item 3                15                5                  10                10          67%
Item 4                20               15                   5                 5          25%
Item 5                 4                6                  -2                 2          50%
Total                 54               54                   0                34

Aggregation Method 1 (= error / history)            =  0 / 54 =  0%
Aggregation Method 2 (= absolute error / history)   = 34 / 54 = 63%
Aggregation Method 3 (= mean of individual APEs)    = 92%
Aggregation Method 4 (= median of individual APEs)  = 50%
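As a rough sketch—mine, not from the article, with my own variable names—the four aggregation methods of Exhibit 4 can be computed as follows. Method 1 is the deficient calculation: its cancelling errors suggest perfect accuracy at the location.

```python
# The five SKUs at Store Location 1 from Exhibit 4.
history  = [5, 10, 15, 20, 4]
forecast = [20, 8, 5, 15, 6]

errors     = [h - f for h, f in zip(history, forecast)]
abs_errors = [abs(e) for e in errors]
apes       = [ae / h for ae, h in zip(abs_errors, history)]  # nonzero history here;
                                                             # otherwise apply the DAM
                                                             # or sMAPE adjustment

method1 = sum(errors) / sum(history)      # signed errors cancel:      0%
method2 = sum(abs_errors) / sum(history)  # absolute error / history: 63%
method3 = sum(apes) / len(apes)           # mean of individual APEs:  92%
method4 = sorted(apes)[len(apes) // 2]    # median of individual APEs (odd n): 50%
```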

Once we calculate a measurement of aggregate forecast accuracy across items, the forecast error at the aggregated level can be tracked over time. In this way, we can determine whether forecast accuracy is improving.

Tracking Forecast Improvement Over Time

Forecast-accuracy metrics in today's software are almost universally calculated by giving equal weight to all errors, whether they represent the recent or distant past. Such metrics are not useful if our goal is to determine whether forecasting accuracy is improving over time. And the task of gauging improvement is important to many demand planners. For example, planners may wish to determine if a new collaborative forecasting arrangement with supply- or demand-chain partners is working better than the previous system; alternatively, planners might seek to evaluate a new method of judgmental adjustment. Forecast managers must determine whether institutional changes, such as new ERP or SCM systems, are living up to expectations. In addition, managers may wish to see if actions taken by individual forecasters are affecting the accuracy of those forecasters' respective plans.

For accuracy-improvement metrics, I recommend the calculation of moving averages of forecast errors. The length of the moving average should be fairly short, perhaps 3 periods, in order to rapidly capture changes in forecast accuracy. Longer averaging periods could be chosen for forecast series that demonstrate a great deal of variability in each period's absolute percentage error. Exhibit 5 illustrates how a moving average captures forecast improvement with as few as 10 observations, while the traditional equally weighted MAPE does not make the improvement so obvious. For longer series, the equally weighted MAPE could completely disguise improved forecast accuracy.
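A minimal sketch of the comparison—my own code, using the Exhibit 5 data:

```python
history  = [20, 30, 22, 18, 28, 32, 21, 19, 29, 27]
forecast = [25, 25, 25, 25, 25, 25, 25, 20, 28, 28]

apes = [abs(h - f) / h for h, f in zip(history, forecast)]

for t in range(len(apes)):
    equally_weighted = sum(apes[: t + 1]) / (t + 1)  # all periods to date
    window = apes[max(0, t - 2): t + 1]              # most recent 3 periods
    moving_avg = sum(window) / len(window)
    print(f"period {t + 1}: MAPE = {equally_weighted:.0%}, 3-pd MA = {moving_avg:.0%}")

# By period 10 the equally weighted MAPE still reads 16%, while the
# 3-period moving average has fallen to 4%, revealing the improvement.
```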

Exhibit 5. Example of an Improving Forecast (Item 1; Improved Forecast Method Implemented in Period 8)

Period                       1      2      3      4      5      6      7      8      9     10
History                     20     30     22     18     28     32     21     19     29     27
Forecast                 25.00  25.00  25.00  25.00  25.00  25.00  25.00  20.00  28.00  28.00
Error                    -5.00   5.00  -3.00  -7.00   3.00   7.00  -4.00  -1.00   1.00  -1.00
Absolute Error            5.00   5.00   3.00   7.00   3.00   7.00   4.00   1.00   1.00   1.00
Absolute Percent Error     25%    17%    14%    39%    11%    22%    19%     5%     3%     4%
Equally Weighted MAPE      25%    21%    18%    24%    21%    21%    21%    19%    17%    16%
3-period Moving Avg MAPE   25%    21%    18%    23%    21%    24%    17%    15%     9%     4%

[Exhibit 5 charts, omitted: History and Absolute Percent Error by period; Equally Weighted MAPE; 3-Period Moving Average MAPE. Note how clearly the 3-period moving average MAPE captures the improvement.]

Summary

Some software is providing substandard information on at least three issues of practical importance to demand planners. Therefore, until software functionality improves, you should implement some manual calculations. Beware of reported MAPEs in the case of intermittent demand. Look into the specifics of how aggregated forecast accuracy is calculated, and consider moving averages of forecast errors as a metric for detecting improvement in forecast accuracy.

References

Boylan, J. (2005). Intermittent and lumpy demand: A forecasting challenge, Foresight: The International Journal of Applied Forecasting, Issue 1, 36-42.

Kornafel, P. (2004). Inventory Management and Purchasing: Tales and Techniques from the Automotive Aftermarket, Bloomington, IN: Authorhouse.

Muckstadt, J. A. (2005). Analysis and Algorithms for Service Parts Supply Chains, New York: Springer.

Schrieber, J. (2005). Redefining Demand Visibility, Dayton, OH: Teradata Corporation. http://www.teradata.com/t/go.aspx?id/=149027&pdf=134559

Sherbrooke, C. C. (2004). Optimal Inventory Modeling of Systems: Multi-Echelon Techniques (2nd ed.), Norwell, MA: Kluwer Academic Publishers.

Contact Info:
Jim Hoover
Defense Logistics Agency
[email protected]

FORECAST-ACCURACY METRICS FOR INTERMITTENT DEMANDS: LOOK AT THE ENTIRE DISTRIBUTION OF DEMAND
by Tom Willemain

Preview: While most forecast-error metrics are averages of forecast errors, Tom argues that, for intermittent-demand series, we should focus on the demand distribution and assess forecast error at each distinct level of demand. He illustrates how this can be done, and he suggests use of the chi-square statistic to judge the overall effectiveness of the forecast method.

Tom Willemain is Vice President of Smart Software, Inc. in Belmont, MA and Professor of Decision Sciences and Engineering Systems at Rensselaer Polytechnic Institute in Troy, NY. Tom is recipient of the Rensselaer School of Engineering Education Excellence Award and the Rensselaer Distinguished Teaching Fellowship.

 The MAPE is an inappropriate measure of forecast accuracy for intermittent-demand data not only because it cannot be calculated when demands are zero, but also because it is a measurement only of average error. Accuracy metrics must look at the entire demand distribution, not just the average error.

 To look at the demand distribution, we should calculate the errors (actual versus forecast) in predicting each demand level. For example, we should ask how often (during how many months) there was one unit of demand, compared to how often a method predicted one unit of demand.

 The chi-square is a useful statistic to summarize forecast accuracy across the entire demand distribution.

Introduction

As Jim Hoover points out in the preceding article, intermittent demand, with its frequent zero values, forces us to consider new ways of assessing forecast accuracy. The most obvious reason is that conventional measures that work well for nonintermittent or smooth demand, such as MAPE, present problems when the actual demand in the denominator of the calculation is zero. But there are other, less obvious reasons to develop new accuracy metrics. For example, forecasts of intermittent demand are usually made to support inventory management of service parts, where the key to success has less to do with assessing forecast accuracy itself and more to do with managing key events.


The Problem of Zero Demand

First, zero is a special demand value because it causes trouble in the denominator of a division calculation. There is a big difference operationally between having some demand and having none. Thus, predicting the likelihood of zero demand over a lead time means predicting the likelihood of no reordering costs, no (trans)shipments of inventory, and so on. A metric like the MAPE addresses the variability in demand but does not directly relate to the likelihood of significant events, such as a total lack of demand during a specific period.

Second, the MAPE does not directly address the likelihood of stocking out an item. To approach the stockout problem, one would have to switch from MAPE to some measure of absolute rather than relative error, such as the root mean square error (RMSE) or the mean absolute error (MAE). Even then, to estimate the probability of a stockout from RMSE or MAE, one would need to assume that demand follows a normal ("bell-shaped") distribution. If it does, then RMSE or MAE can be used to estimate the variance or standard deviation of demand, which in turn can be used to estimate high percentiles of the demand distribution, according to the following logic:

Probability of stockout = probability that demand over a lead time exceeds a certain level, where that level = expected lead-time demand plus a multiple of the standard deviation of lead-time demand.

Unfortunately, the normality assumption, while often valid for smooth demand, is usually grossly inaccurate for intermittent demand. Figure 1 shows the distribution of cumulative demand over a lead time of 3 weeks for several jet engine tools; not one of the distributions resembles the classic bell-shaped curve. Furthermore, hidden in the traditional logic is an assumption that demand in successive periods is independent, and this too is often not true for intermittent demand.

[Figure 1. Showing Nonnormal Distribution of Lead Time Demand: distributions of demand over a 3-week lead time for 16 jet engine tools (TOOL01–TOOL16); each panel plots the percent of lead times against units demanded.]

The Need to Focus on the Distribution of Demand, Not Just Average Demand

If the usual accuracy metrics blow up when demand is zero (MAPE), or they do not address key operational events (RMSE and MAE), what should we do? I believe the best approach is to shift attention from forecasting average demand to forecasting the entire distribution of demand. Then we should use metrics appropriate for the new objective.

Traditional forecasting focuses on predicting the most likely level of demand. Assuming a normal distribution of demand, the most likely level is the average level. The traditional accuracy measures like RMSE and MAE do well at assessing accuracy in this case. As we have seen, when the job shifts from demand forecasting to inventory management, the traditional metrics can also be used to forecast high percentiles of the demand distribution when it is normal. But when demand is intermittent, or when it is abnormal for some other reason, this approach breaks down.

The more general problem is to forecast not just the average level of demand but also its entire distribution. Accordingly, the appropriate accuracy metric will assess the difference between the actual and forecasted distributions of demand. Figure 2 illustrates the difference between the two distributions: it shows that the number of periods with 0 demand is somewhat misforecasted, as is the number with 1 demand, and so on. Inevitably there will be many possible demand values, and every demand value will be misforecasted to at least some degree. To make sense of all this information, we need to condense all the forecast mistakes into a single-number summary of overall forecast error.

[Figure 2. Showing the Difference Between Observed and Forecasted Demand Over a Lead Time: bar chart of the number of lead times (0–100) at each demand value (0, 1, 2, 3, 4, 5, 6+), observed versus forecasted.]

The Chi-Square Metric

One way to formulate a single-number summary of overall forecast error is by using a metric known as the "chi-square" statistic (pronounced "ky square"):

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

where $O_i$ is the observed number of lead times with demand equal to $i$, $E_i$ is the expected number of lead times with demand equal to $i$, and the summation is over all levels of demand. Figure 3 shows how the chi-square statistic is computed for the data in Figure 2.

There are two ways to use the chi-square statistic. Given two or more competing forecasting methods, and using observations of demand collected over many lead times, one can compute the chi-square statistic for each method and then choose the method that gives the lowest chi-square value. Alternatively, if one is assessing a single forecasting method, one can test the statistical hypothesis that any differences between observed and forecasted demand can be attributed to random chance (this strategy requires you to dust off your college statistics notes).
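A small sketch of the calculation—my own code—using the observed and expected counts from Figure 3:

```python
# Lead-time demand counts for levels 0, 1, ..., 6+ (from Figure 3).
observed = [83, 21, 11, 12, 8, 4, 1]
expected = [76, 26, 15, 10, 6, 5, 2]

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(f"chi-square = {chi_square:.1f}")  # 4.4

# With 7 demand levels, the chance model has 7 - 1 = 6 degrees of freedom;
# a value of 4.4 or worse arises by chance about 62% of the time, so the
# forecasted distribution is consistent with the observed one.
```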

Figure 3. Showing the Calculation of the Chi-square Statistic

Demand    Lead-Time Demand         Chi-square
Value     Observed    Expected     Contribution
0            83          76           0.64
1            21          26           0.96
2            11          15           1.07
3            12          10           0.40
4             8           6           0.67
5             4           5           0.20
6+            1           2           0.50
Total       140         140           4.4 = Chi-square statistic

The example in Figure 3 is the second type of use: here we evaluate a single forecasting method against actual data. The chi-square value of 4.4 in Figure 3 is certainly worse than 0, but it has to be compared to the sizes of discrepancies that would routinely arise by innocent chance alone before we can properly react to the apparent mismatch between forecasted and actual distributions. For this instance, with 7 levels of expected demand (0, 1, …, 6+), the chance model is the chi-square distribution with 7 - 1 = 6 degrees of freedom. Here we conclude that a chi-square value of 4.4 or even worse will happen quite routinely (about 62% of the time) when the forecasting method is correct. As there is nothing surprising about this result (most people would consider 5% the threshold for "surprising"), we would conclude that in this case the forecasted demand distribution closely matches the actual demand distribution.

In conclusion, when faced with intermittent demand, we need to replace our familiar accuracy metrics—those measurements that come from a world of demand forecasting with normal distributions. Using the chi-square statistic to compare full demand distributions is one simple approach that is well tuned to this new context.

Contact Info:
Tom Willemain
Rensselaer Polytechnic Institute
[email protected]

ACCURACY AND ACCURACY-IMPLICATION METRICS FOR INTERMITTENT DEMAND
by John Boylan and Aris Syntetos

Preview: John and Aris distinguish between forecast-accuracy metrics, which measure the errors resulting from a forecast method, and accuracy-implication metrics, which measure the achievement of the organization's stock-holding and service-level goals. Both measurements are important. The correct choice of a forecast-accuracy metric depends on the organization's inventory rules and on whether accuracy is to be gauged for a single item or across a range of items. The authors recommend specific accuracy and accuracy-implication metrics for each context.

John Boylan is Professor of Management Science at Buckinghamshire Chilterns University College. Previously, he worked in OR at Rolls-Royce and at the Unipart Group. His research and publications (Journal of the OR Society, International Journal of Production Economics, International Journal of Forecasting) have increasingly focused on the challenges of forecasting slow, intermittent and lumpy demands.

Aris Syntetos is a Lecturer in Operations Management and Operational Research at the University of Salford, UK. His research interests include intermittent-demand forecasting and the interface between forecasting and stock control. On behalf of the Salford Business School, he is currently involved in two inventory-management projects, one with an engineering firm and one with an international wholesaling company.

 In considering forecasting-accuracy metrics for intermittent demand, we should begin by looking at the inventory method. Depending on that method, we may need estimates of mean demand, variance of demand, percentiles of demand, and probabilities of high-demand values.

 When a forecast of mean demand is needed, the accuracy of the forecast for an individual item can be judged by the mean absolute error (MAE). To assess forecast accuracy across a range of items, a scale-independent metric, such as the ratio of the mean absolute error to the mean demand, is appropriate. Alternatively, the geometric mean absolute error (GMAE) may be used.

 If forecasts of percentiles of demand or probabilities of high-demand values are needed, then an appropriate chi-square test should be used, concentrating on the upper end of the distribution (for example, the 95th percentile).

 No matter which inventory system is used, the accuracy-implication metrics of stock-holding and service levels should always be considered.

Introduction

In the February 2006 edition of Foresight, Kenneth Kahn poses the following question: "Should we view forecast accuracy as an end in itself or rather as a means to an end?" (Kahn, 2006, p. 25). Most commonly, intermittent-demand forecasting is a means toward the twin ends of lowering stock-holding costs (including costs of stock obsolescence)

and maintaining or improving stock availability ("service level"). The achievement of these goals depends not only on the accuracy of the forecasting method but also on the suitability of the inventory rules determining the timing and size of orders. The relationship between these factors and the system's goals is shown in Figure 1. If we regard the design and implementation of a stock-management system as a means toward an end, then the outcome measures on the right-hand side of Figure 1 should not be ignored. These measures ensure that forecasters and inventory managers do not lose sight of the system's purpose.

[Figure 1. Relationship Among Forecasting, Inventory Rules and Performance Measures: the forecasting method and the inventory rules feed the stock management system, whose outcome measures are stock-holding costs and service level.]

Accuracy-Implication Metrics

Most managers would regard stock-holding costs and service level as outcome measures rather than accuracy measures. But if we keep the inventory rules fixed and try different forecasting methods, these outcome measures become accuracy-implication measures. The term "accuracy implication" is used instead of "accuracy" because metrics such as service level do not measure the accuracy of a forecasting method, but they do measure the implication of its accuracy under a given inventory rule.

A good example of this approach is the study by Eaves and Kingsman (2004). They estimate the effect of forecasting-method choice on 18,750 line items, including intermittent items. Importantly, they assume a constant service-level requirement, allowing accuracy implications to be assessed. For example, for quarterly data, they find that using single exponential smoothing instead of Croston's method requires an additional stock investment of £1.28m. Therefore, instead of simply reporting that Croston's method is more accurate than smoothing, they show the cost implication of making the wrong choice.

What Types of Forecasts Are Required?

To identify the most appropriate accuracy metrics, we must first ask what is to be forecast. The variables to be forecast depend on the inventory method.

For example, suppose that we use a periodic (R, s, S) inventory rule. This means that we review the inventory system every R periods; when the stock level drops to the reorder point (called s) or lower, we order enough stock to take us back up to the level S. Some of the most effective methods for finding s and S, including Naddor's heuristic (Naddor, 1975), require only estimates of the mean and variance of demand.

To take a second example, suppose that we use an (s, Q) inventory method. In this case, we review the stock continuously and place an order of fixed quantity Q if the stock drops to the reorder level s or below. This system is also known as (r, Q). In some systems, we wish to ensure that there is no more than a 10% chance of stockout during the replenishment cycle (review time plus lead time). For this case, we need to estimate the 90th percentile of the distribution of demand over the replenishment cycle, rather than the mean and the variance.

Another alternative is that we wish to ensure that at least 90% of demand is satisfied directly off the shelf—please note that this is not the same as a 90% chance of no stockouts; this point is discussed in greater detail by Silver, Pyke, & Peterson (1998, pp. 266-270). In this case, we need to estimate the probabilities of any demands that exceed the reorder level. Here, instead of estimates of percentiles, we want estimates of the probabilities of high demand.
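As an illustration of the percentile case—this sketch and its data are mine, not the authors'—an empirical 90th percentile of replenishment-cycle demand can serve as the reorder level:

```python
def nearest_rank_percentile(values, p):
    # Smallest value with at least p percent of the data at or below it.
    ordered = sorted(values)
    rank = max(1, -(-len(ordered) * p // 100))  # ceiling of n * p / 100
    return ordered[rank - 1]

# Hypothetical demands observed over 20 past replenishment cycles:
cycle_demands = [0, 3, 0, 1, 7, 0, 2, 0, 0, 5, 1, 0, 4, 0, 2, 9, 0, 1, 0, 3]

s = nearest_rank_percentile(cycle_demands, 90)
print(s)  # 5 -> at most a 10% chance that cycle demand exceeds this level
```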

Should We Forecast the Entire Demand Distribution?

Willemain (this issue) suggests that the general problem is to forecast the whole distribution of demand. It is true that this is the most general statement of the problem. However, as we have already noted, some inventory systems require estimates of only the mean and variance. For other systems, estimates of high percentiles and probabilities of high-demand values are needed; even in these cases, we do not need a forecast of the entire distribution.

Measures based on the entire distribution can be misleading. A good overall "goodness of fit" statistic may result from excellent forecasts of the chances of low-demand values, which can mask poor forecasts of the chances of high-demand values. It may be that for other applications (for example, revenue forecasts), forecasts of low percentiles are required (Willemain et al., 2004). However, for inventory calculations, we suggest that attention be restricted to the upper end of the distribution (the 90th or 95th percentiles).

In summary, percentile forecasts and estimates of probabilities of demand are required for some inventory systems. For other systems, we need forecasts of the mean and variance of demand. All these quantities are features of the overall distribution of demand. It is the accuracy of determining the key quantities (for example, mean demand, 90th percentile) required for the inventory rules that is important, rather than the accuracy across the entire demand distribution.

Estimates of Mean Demand

When it is necessary to forecast the mean demand level, there are two issues to address:

 What is the best forecasting method for a particular stock-keeping unit (SKU)?
 What is the best forecasting method across a range of SKUs?

The second problem is more common in practice, but answering the first question gives us some insight into how to answer the second.

The case of a single SKU

For a single SKU, we may use a simple measure such as the mean absolute error (MAE) to measure a method's accuracy in forecasting mean demand. (The mean absolute error is calculated by noting each of the errors and treating them all as positive in sign, and then averaging them.) The mean squared error is not suitable for intermittent-

Estimates of Mean Demand When it is necessary to forecast the mean demand level, there are two issues to address:  What is the best forecasting method for a particular stock-keeping unit (SKU)?  What is the best forecasting method across a range of SKUs? The second problem is more common in practice, but answering the first question gives us some insight into how to answer the second. The case of a single SKU For a single SKU, we may use a simple measure such as the mean absolute error (MAE) to measure a method’s accuracy in forecasting mean demand. (The mean absolute error is calculated by noting each of the errors and treating them all as positive in sign, and then averaging them.) The mean squared error is not suitable for intermittent-

demand items because it is sensitive to the occurrence of very high forecast errors. The accuracy of a method’s mean-demand forecasts can be compared with another method by calculating the percentage of series for which it has a lower MAE. This approach is known as the Percentage Better method, which is discussed in more detail by Boylan (2005). The approach can be easily extended to the comparison of more than two methods; in that case, it would be termed Percentage Best. A limitation of the Percentage Better method is that, although it summarizes the frequency with which one method outperforms another, it does not inform the user of the degree of improvement in accuracy. Averaging the values of mean absolute error across series would seem to be the obvious answer. Unfortunately, such measures can be dominated by a small number of SKUs with large errors. This problem is known as scale dependence. For non-intermittent data, an effective way of addressing the scale-dependence problem is to calculate the mean absolute percentage error (MAPE). However, the MAPE measure fails for intermittent data because the denominator (actual value) is frequently zero. Amending the denominator to unity when the actual value is zero, as suggested by Jim Hoover in this section, is a pragmatic idea, but it is without any foundation in statistical theory. Another option mentioned by Hoover is the symmetric MAPE (sMAPE), in which the numerator is the absolute value of the actual minus forecast, and the denominator is the average of the actual and forecast values. However, whenever the actual value is zero, the sMAPE entry will have a value of two, regardless of the forecast. If the actual is zero and our forecast is 1, then sMAPE = 1 / ((0+1) / 2)) = 2. If our forecast is 100, then sMAPE = 100 / ((0+100) / 2)) = 2. Therefore, sMAPE cannot be recommended because when actual demand is zero, it does not discriminate between forecasting methods. Scale-independent metrics Scale-independent metrics are required to assess forecast accuracy across a range of items. For intermittent data, a good scale-independent measure is the ratio of the mean absolute error to the mean demand, as suggested by Jim Hoover. A variation on this approach is to compare the accuracy of one method to another by taking the ratio of mean absolute errors.

Alternatively, instead of MAEs, we can compute the ratio of the geometric root mean square error (GRMSE) of one method to that of another. Although this metric is more complex, it is even more robust (less sensitive) than the MAE regarding outlying observations. Fildes (1992) showed that, in the GRMSE calculation, the distorting effect of large errors cancels out. For details on the application of the GRMSE to intermittent-demand items, see Syntetos and Boylan (2005). In his accompanying article, Rob Hyndman correctly notes that the geometric root mean square error is identical to the geometric mean absolute error (GMAE). Because the GMAE is easier to calculate than the GRMSE, and delivers the same result, we will use it in the example that follows. The geometric mean is an alternative to averaging by the arithmetic mean, in which we multiply all observations and then find the nth root. For example, suppose we have three observations: 1, 4, and 16. The geometric mean is 4, as this number is the cube root of 64 (= 1 x 4 x 16). This approach can also be applied to the absolute forecast errors. Suppose we have four forecast errors: -3, 1, -5, and 4. The absolute errors are 3, 1, 5, and 4. Then the geometric mean is the fourth root of 60 (= 3 x 1 x 5 x 4), namely 2.783. This is the geometric mean absolute error, and it is identical to the geometric root mean square error. A potential problem with the GMAE is that if any one forecast error is zero, then the GMAE is also zero, regardless of the size of the other forecast errors. Zero forecast errors can arise in two ways for intermittent demand: 1. Non-zero demand: identical non-zero is forecast. 2. Zero actual demand: zero is forecast. In our experience, the first case does not arise frequently in practice, and it never occurred on the dataset of 3,000 SKUs analyzed by Syntetos and Boylan (2005). Methods based on exponential smoothing (ES), such as Croston’s method and the Syntetos-Boylan approximation, do not generally produce whole-number estimates of the mean demand; therefore they do not typically generate zero errors. Consequently, series with zero GMAEs will be rare if ES-based methods are compared, and they can be excluded from an across-series analysis. If other methods are used, such as the naïve method, then the GMAE will not always be well defined. However, the naïve method is sensitive to large demands and will generate high forecasts in such instances, making it inappropriate for practical inventory applications. June 2006 Issue 4 FORESIGHT

41

The second case, highlighted to us in a private e-mail correspondence by Jack Hayya, can occur when it has been a very long time since there have been any non-zero observations. This may signal that the item is at the end of its life and should be reviewed for classification as “obsolescent,” requiring no subsequent forecasts. If the item is nearing obsolescence (but is not yet obsolete), there would have been some evidence of demand in recent years, and a zero mean demand forecast is inappropriate and should be reviewed.
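A minimal sketch of the GMAE on the four example errors above—the function name is mine:

```python
import math

def gmae(errors):
    # Geometric mean of the absolute errors: the nth root of their product.
    abs_errors = [abs(e) for e in errors]
    return math.prod(abs_errors) ** (1 / len(abs_errors))

print(round(gmae([-3, 1, -5, 4]), 3))  # fourth root of 60 = 2.783
# Caveat from the text: a single zero error drives the whole GMAE to zero.
```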

Estimates of Demand Variance

Why does the variance of demand need to be estimated? There are two reasons: (1) in some cases, the variance of demand is estimated as an intermediate step in finding a percentile of forecast demand, or the probability of high values of demand; (2) in other cases, the variance of demand is input to a formula that will be used to estimate inventory parameters, such as the reorder point (s).

It is not possible to assess the accuracy of variance estimates directly, unless assumptions are made about the demand distribution. However, indirect approaches are available. If the variance is estimated to find a percentile of demand, we can examine the accuracy of the resulting percentile estimate. To do this, we identify the percentile of interest (for example, the 90th percentile) and compare how many observations exceed the percentile estimate against the expected number. This can be achieved using the chi-square test, as discussed by Tom Willemain. A similar approach can be adopted if the variance estimate is used to calculate probabilities of high values of demand.

If the variance is used as an input to an inventory formula, we can look at the measures of inventory cost and service. This would enable different approaches to variance estimation to be compared indirectly and is an example of the accuracy-implication approach advocated in this paper.

Conclusions

In considering forecasting-accuracy metrics for intermittent demand, we should begin by looking at the inventory method. Which forecasts are required for the particular inventory method? The answer may be estimates of mean demand, variance of demand, percentiles of demand, or probabilities of high-demand values. There are appropriate accuracy metrics for each type of estimate.

No matter which inventory system is in use, the accuracy-implication metrics of stock-holding costs and service levels should always be considered because these are of prime importance to the organization. The use of these measures should not be limited to situations in which it is difficult to assess forecast error directly. Accuracy-implication metrics also offer a basis for the comparison of different forecasting methods.

References

Boylan, J. (2005). Intermittent and lumpy demand: A forecasting challenge, Foresight: The International Journal of Applied Forecasting, Issue 1, 36-42.

Eaves, A. H. C. & Kingsman, B. G. (2004). Forecasting for the ordering and stock-holding of spare parts, Journal of the Operational Research Society, 55, 431-437.

Fildes, R. (1992). The evaluation of extrapolative forecasting methods, International Journal of Forecasting, 8, 81-98.

Kahn, K. B. (2006). Commentary: Putting forecast accuracy into perspective, Foresight: The International Journal of Applied Forecasting, Issue 3, 25-26.

Naddor, E. (1975). Optimal and heuristic decisions on single and multi-item inventory systems, Management Science, 21, 1234-1249.

Silver, E. A., Pyke, D. F. & Peterson, R. (1998). Inventory Management and Production Planning and Scheduling (3rd ed.), New York: John Wiley & Sons.

Syntetos, A. A. & Boylan, J. E. (2005). The accuracy of intermittent demand estimates, International Journal of Forecasting, 21, 303-314.

Willemain, T. R., Smart, C. N. & Schwarz, H. F. (2004). A new approach to forecasting intermittent demand for service parts inventories, International Journal of Forecasting, 20, 375-387.

Contact Info:
John Boylan
Buckinghamshire Chilterns University College, UK
[email protected]

Aris Syntetos
University of Salford, UK
[email protected]

ANOTHER LOOK AT FORECAST-ACCURACY METRICS FOR INTERMITTENT DEMAND
by Rob J. Hyndman

Preview: Some traditional measurements of forecast accuracy are unsuitable for intermittent-demand data because they can give infinite or undefined values. Rob Hyndman summarizes these forecast-accuracy metrics and explains their potential failings. He also introduces a new metric—the mean absolute scaled error (MASE)—which is more appropriate for intermittent-demand data. More generally, he believes that the MASE should become the standard metric for comparing forecast accuracy across multiple time series.

Rob Hyndman is Professor of Statistics at Monash University, Australia, and Editor in Chief of the International Journal of Forecasting. He is an experienced consultant who has worked with over 200 clients during the last 20 years, on projects covering all areas of applied statistics, from forecasting to the ecology of lemmings. He is coauthor of the well-known textbook, Forecasting: Methods and Applications (Wiley, 1998), and he has published more than 40 journal articles. Rob is Director of the Business and Economic Forecasting Unit, Monash University, one of the leading forecasting research groups in the world.

 There are four types of forecast-error metrics: scale-dependent metrics such as the mean absolute error (MAE or MAD); percentage-error metrics such as the mean absolute percent error (MAPE); relative-error metrics, which average the ratios of the errors from a designated method to the errors of a naïve method; and scale-free error metrics, which express each error as a ratio to an average error from a baseline method.

 For assessing accuracy on a single series, I prefer the MAE because it is easiest to understand and compute. However, it cannot be compared across series because it is scale dependent; it makes no sense to compare accuracy on different scales.

 Percentage errors have the advantage of being scale independent, so they are frequently used to compare forecast performance between different data series. But measurements based on percentage errors have the disadvantage of being infinite or undefined if there are zero values in a series, as is frequent for intermittent data.

 Relative-error metrics are also scale independent. However, when the errors are small, as they can be with intermittent series, use of the naïve method as a benchmark is no longer possible because it would involve division by zero.

 The scale-free error metric I call the mean absolute scaled error (MASE) can be used to compare forecast methods on a single series and also to compare forecast accuracy between series. This metric is well suited to intermittent-demand series because it never gives infinite or undefined values.

Introduction: Three Ways to Generate Forecasts

There are three ways we may generate forecasts (F) of a quantity (Y) from a particular forecasting method:

1. We can compute forecasts from a common origin t (for example, the most recent month) for a sequence of forecast horizons $F_{n+1}, \dots, F_{n+m}$ based on data from times $t = 1, \dots, n$. This is the standard procedure implemented by forecasters in real time.

2. We can vary the origin from which forecasts are made but maintain a consistent forecast horizon. For example, we can generate a series of one-period-ahead forecasts $F_{1+h}, \dots, F_{m+h}$, where each $F_{j+h}$ is based on data from times $t = 1, \dots, j$. This procedure is done not only to give attention to the forecast errors at a particular horizon but also to show how the forecast error changes as the horizon lengthens.

3. We may generate forecasts for a single future period using multiple data series, such as a collection of products or items. This procedure can be useful to demand planners as they assess aggregate accuracy over items or products at a location. This is also the procedure that underlies forecasting competitions, which compare the accuracy of different methods across multiple series.

While these are very different situations, measuring forecast accuracy is similar in each case. It is useful to have a forecast-accuracy metric that can be used for all three cases.

An Example of What Can Go Wrong

Consider the classic intermittent-demand series shown in Figure 1. These data were part of a consulting project I did for a major Australian lubricant manufacturer.

[Figure 1. Three Years of Monthly Sales of a Lubricant Product Sold in Large Containers: units sold (0–10) by month (1–36). Data source: Product C in Makridakis et al. (1998, chapter 1). The vertical dashed line indicates the end of the data used for fitting and the start of the holdout set used for out-of-sample forecasting.]

Suppose we are interested in comparing the forecast accuracy of four simple methods: (1) the historical mean, using data up to the most recent observation; (2) the naïve or random-walk method, in which the forecast for each future period is the most recent actual value; (3) simple exponential smoothing; and (4) Croston's method for intermittent demands (Boylan, 2005). For methods (3) and (4) I have used a smoothing parameter of 0.1.

I compared the in-sample performance of these methods by varying the origin and generating a sequence of one-period-ahead forecasts—the second forecasting procedure described in the introduction. I also calculated the out-of-sample performance based on forecasting the data in the hold-out period, using information from the fitting period alone. These out-of-sample forecasts are from one to twelve steps ahead and are not updated in the hold-out period.

Table 1 shows some commonly used forecast-accuracy metrics applied to these data. The metrics are all defined in the next section. There are many infinite values occurring in Table 1. These are caused by division by zero. The undefined values for the naïve method arise from the division of zero by zero. The only measurement that always gives sensible results for all four of the forecasting methods is the MASE, or the mean absolute scaled error. Infinite, undefined, or zero values plague the other accuracy measurements.

In this particular series, the out-of-sample period has smaller errors (is more predictable) than the in-sample period because the in-sample period includes some relatively large observations. In general, we would expect out-of-sample errors to be larger.

Table 1. Forecast-Accuracy Metrics for Lubricant Sales

          Mean          Naïve         SES           Croston
          In     Out    In     Out    In     Out    In     Out
GMAE     1.65   0.96   0.00   0.00   1.33   0.09   0.00   0.99
MAPE      ∞      ∞      –      –      ∞      ∞      ∞      ∞
sMAPE    1.73   1.47    –      –     1.82   1.42   1.70   1.47
MdRAE    0.95    ∞      –      –     0.98    ∞     0.93    ∞
GMRAE     ∞      ∞      –      –      ∞      ∞      ∞      ∞
MASE     0.86   0.44   1.00   0.20   0.78   0.33   0.79   0.45

(GMAE = Geometric Mean Absolute Error; MAPE = Mean Absolute Percentage Error; sMAPE = Symmetric Mean Absolute Percentage Error; MdRAE = Median Relative Absolute Error; GMRAE = Geometric Mean Relative Absolute Error; MASE = Mean Absolute Scaled Error.)

Measurement of Forecast Errors

We can measure and average forecast errors in several ways.

Scale-dependent errors

The forecast error is simply $e_t = Y_t - F_t$, regardless of how the forecast was produced. This is on the same scale as the data, applying to anything from ships to screws. Accuracy measurements based on $e_t$ are therefore scale dependent. The most commonly used scale-dependent metrics are based on absolute errors or on squared errors:

Mean Absolute Error (MAE) = mean(|e_t|)
Geometric Mean Absolute Error (GMAE) = gmean(|e_t|)
Mean Square Error (MSE) = mean(e_t²)

where "gmean" is a geometric mean. The MAE is often abbreviated as the MAD ("D" for "deviation"). The use of absolute values or squared values prevents negative and positive errors from offsetting each other.

Since all of these metrics are on the same scale as the data, none of them are meaningful for assessing a method’s accuracy across multiple series.

For assessing accuracy on a single series, I prefer the MAE because it is easiest to understand and compute. However, it cannot be compared between series because it is scale dependent. For intermittent-demand data, Syntetos and Boylan (2005) recommend the use of the GMAE, although they call it the GRMSE. (The GMAE and GRMSE are identical because the square root and the square cancel each other in a geometric mean.) Boylan and Syntetos (this issue) point out that the GMAE has the flaw of being equal to zero when any error is zero, a problem which will occur when both the actual and forecasted demands are zero. This is the result seen in Table 1 for the naïve method. Boylan and Syntetos claim that such a situation would occur only if an inappropriate forecasting method is used. However, it is not clear that the naïve method is always inappropriate. Further, Hoover indicates that division-by-zero errors in intermittent series are expected occurrences for repair parts. I suggest that the GMAE is problematic for assessing accuracy on intermittent-demand data.

Percentage errors
The percentage error is given by p_t = 100 e_t / Y_t. Percentage errors have the advantage of being scale-independent, so they are frequently used to compare forecast performance between different data series. The most commonly used metric is

Mean Absolute Percentage Error (MAPE) = mean(|p_t|)

Measurements based on percentage errors have the disadvantage of being infinite or undefined if there are zero values in a series, as is frequent for intermittent data. Moreover, percentage errors can have an extremely skewed distribution when actual values are close to zero. With intermittent-demand data, it is impossible to use the MAPE because of the occurrences of zero periods of demand.

The MAPE has another disadvantage: it puts a heavier penalty on positive errors than on negative errors. This observation has led to the use of the "symmetric" MAPE (sMAPE) in the M3-competition (Makridakis & Hibon, 2000). It is defined by

sMAPE = mean(200 |Y_t – F_t| / (Y_t + F_t))

However, if the actual value Y_t is zero, the forecast F_t is likely to be close to zero. Thus the measurement will still involve division by a number close to zero. Also, the value of sMAPE can be negative, giving it an ambiguous interpretation.

Relative errors
An alternative to percentages for the calculation of scale-independent measurements involves dividing each error by the error obtained using some benchmark method of forecasting. Let r_t = e_t / e_t* denote the relative error, where e_t* is the forecast error obtained from the benchmark method. Usually the benchmark method is the naïve method, where F_t is equal to the last observation. Then we can define

Median Relative Absolute Error (MdRAE) = median(|r_t|)
Geometric Mean Relative Absolute Error (GMRAE) = gmean(|r_t|)

Because they are not scale dependent, these relative-error metrics were recommended in studies by Armstrong and Collopy (1992) and by Fildes (1992) for assessing forecast accuracy across multiple series. However, when the errors are small, as they can be with intermittent series, use of the naïve method as a benchmark is no longer possible because it would involve division by zero.

Scale-free errors
The MASE was proposed by Hyndman and Koehler (2006) as a generally applicable measurement of forecast accuracy without the problems seen in the other measurements. They proposed scaling the errors based on the in-sample MAE from the naïve forecast method. Using the naïve method, we generate one-period-ahead forecasts from each data point in the sample. Accordingly, a scaled error is defined as

q_t = e_t / [ (1/(n–1)) ∑_{i=2}^{n} |Y_i – Y_{i–1}| ]

The result is independent of the scale of the data. A scaled error is less than one if it arises from a better forecast than the average one-step, naïve forecast computed in-sample. Conversely, it is greater than one if the forecast is worse than the average one-step, naïve forecast computed in-sample.

The mean absolute scaled error is simply

MASE = mean(|q_t|)

The first row of Table 2 shows the intermittent series plotted in Figure 1. The second row gives the naïve forecasts, which are equal to the previous actual values. The final row shows the naïve-forecast errors. The denominator of q_t is the mean of the in-sample values in this final row; that is the MAE of the naïve method.
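Here is a minimal Python sketch of that calculation, applied to the series shown in Table 2 below. The 24-month fitting period and 12-month hold-out are my reading of the example, and the exact MASE values depend on that split, so they need not match Table 1 precisely:

```python
actual = [0,2,0,1,0,1,0,0,0,0,2,0,6,3,0,0,0,0,0,7,0,0,0,0,   # fitting period
          0,0,0,3,1,0,0,1,0,1,0,0]                            # hold-out period
n_fit = 24

# denominator of q_t: in-sample MAE of the one-step naive forecast
naive_abs_errors = [abs(actual[i] - actual[i - 1]) for i in range(1, n_fit)]
scale = sum(naive_abs_errors) / len(naive_abs_errors)          # about 1.65 here

def mase(errors, scale):
    # mean of the absolute scaled errors |q_t|
    return sum(abs(e) for e in errors) / (len(errors) * scale)

# score the non-updated naive forecast over the hold-out period
last_fit_value = actual[n_fit - 1]
holdout_errors = [y - last_fit_value for y in actual[n_fit:]]
print(mase(holdout_errors, scale))   # finite even though most demands are zero
```

Because the denominator is computed once from the fitting period, the MASE stays finite and comparable across series, which is the property the other metrics lack.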


Table 2. Monthly Lubricant Sales, Naïve Forecast

                     In-sample (fitting period)                       Out-of-sample (hold-out)
Actual Y_t           0 2 0 1 0 1 0 0 0 0 2 0 6 3 0 0 0 0 0 7 0 0 0 0 | 0 0 0 3 1 0 0 1 0 1 0 0
Naïve forecast Ŷ_t   – 0 2 0 1 0 1 0 0 0 0 2 0 6 3 0 0 0 0 0 7 0 0 0 | 0 0 0 0 0 0 0 0 0 0 0 0
Error |Y_t – Ŷ_t|    – 2 2 1 1 1 1 0 0 0 2 2 6 3 3 0 0 0 0 7 7 0 0 0 | 0 0 0 3 1 0 0 1 0 1 0 0

The MASE can be used to compare forecast methods on a single series, and, because it is scale-free, to compare forecast accuracy across series. For example, you can average the MASE values of several series to obtain a measurement of forecast accuracy for the group of series. This measurement can then be compared with the MASE values of other groups of series to identify which series are the most difficult to forecast. Typical values for the one-step MASE are less than one, as it is usually possible to obtain forecasts more accurate than the naïve method. Multistep MASE values are often larger than one, as it becomes more difficult to forecast as the horizon increases. The only circumstance under which the MASE would be infinite or undefined is when all historical observations are equal.

The in-sample MAE is used in the denominator because it is always available and it effectively scales the errors. In contrast, the out-of-sample MAE for the naïve method may be zero because it is usually based on fewer observations. For example, if we were forecasting only two steps ahead, then the out-of-sample MAE would be zero. If we wanted to compare forecast accuracy at one step ahead for ten different series, then we would have one error for each series. The out-of-sample MAE in this case is also zero. These types of problems are avoided by using the in-sample, one-step MAE.

A closely related idea is the MAD/Mean ratio proposed by Hoover (this issue), which scales the errors by the in-sample mean of the series instead of the in-sample mean absolute error. This ratio also renders the errors scale free and is always finite unless all historical data happen to be zero. Hoover explains the use of the MAD/Mean ratio only in the case of in-sample, one-step forecasts (situation 2 of the three situations described in the introduction). However, it would also be straightforward to use the MAD/Mean ratio in the other two forecasting situations.

The main advantage of the MASE over the MAD/Mean ratio is that the MASE is more widely applicable. The MAD/Mean ratio assumes that the mean is stable over time (technically, that the series is "stationary"). This is not true for data which show trend, seasonality, or other patterns. While intermittent data is often quite stable, sometimes seasonality does occur, and this might make the MAD/Mean ratio unreliable. In contrast, the MASE is suitable even when the data exhibit a trend or a seasonal pattern.

The MASE is the only available accuracy measurement that can be used in all three forecasting situations described in the introduction, and for all forecast methods and all types of series. I suggest that it is the best accuracy metric for intermittent demand studies and beyond.

References

Armstrong, J. S. & Collopy, F. (1992). Error measures for generalizing about forecasting methods: Empirical comparisons, International Journal of Forecasting, 8, 69–80.

Boylan, J. (2005). Intermittent and lumpy demand: A forecasting challenge, Foresight: The International Journal of Applied Forecasting, Issue 1, 36–42.

Fildes, R. (1992). The evaluation of extrapolative forecasting methods, International Journal of Forecasting, 8, 81–98.

Hyndman, R. J. & Koehler, A. B. (2006). Another look at measures of forecast accuracy, International Journal of Forecasting. To appear.

Makridakis, S. & Hibon, M. (2000). The M3-competition: Results, conclusions and implications, International Journal of Forecasting, 16, 451–476.

Makridakis, S. G., Wheelwright, S. C. & Hyndman, R. J. (1998). Forecasting: Methods and Applications (3rd ed.), New York: John Wiley & Sons.

Syntetos, A. A. & Boylan, J. E. (2005). The accuracy of intermittent demand estimates, International Journal of Forecasting, 21, 303–314.

Contact Info:
Rob J. Hyndman
Monash University, Australia
[email protected]

FORECASTING PRINCIPLES AND METHODS

LESSONS FROM THOMAS EDISON'S TECHNOLOGICAL AND SOCIAL FORECASTS
by Steven Schnaars

Preview: Thomas Edison's inventions have had an unparalleled influence on modern life. But Edison was also a technological forecaster, offering his vision of which technologies would (and would not) dominate our lives in the future. Steve Schnaars looks back on Edison's 13 technological and social forecasts to evaluate the inventor's predictive hits and blunders. The main lesson he sees in Edison's technological forecasting is that spreading risk by pursuing multiple paths to future market success is probably a better strategy than trying to predict precisely which technologies will succeed in the future.

Steve Schnaars is a Professor of Marketing at Baruch College in New York City, where he has taught since 1979. Steve is the author of three books and dozens of academic articles on technological forecasting and marketing strategy. He lives in Brooklyn and Vermont, and he has run competitively in 36 marathons.

edison's successful predictions

» punch cards will create smart machines.
» chemical food pills will not be popular.
» man-made fibers will revolutionize clothing.
» farming will become mechanized.
» electric locomotives will replace steam.
» atomic energy will be harnessed, but with some dangerous consequences.
» technology will raise the average person's standard of living.

edison's blunders

» the hovering design will work best for aircraft.
» poured-concrete houses will become common.
» financial havoc will be created by man-made gold.
» steel furniture and nickel books will replace wooden furniture and paper books.
» solar energy will emerge as a new power source.
» world poverty will end.

Introduction

There is arguably no more prolific inventor in American history than Thomas Alva Edison. Edison made his first fortune in 1869, when he developed an improved stock ticker. Using his connections with Wall Street financiers, he invested profits from the stock ticker to bankroll his development of the phonograph (1877), which started first with tinfoil cylinders and then progressed to disk-shaped records. He then presented his most famous invention, the first practical incandescent light bulb (1879). He developed the world's first large-scale, centralized power-generating station (1882), the motion picture projector (1888), the mimeograph, and later the alkaline, nickel-iron storage battery, which he thought would surely be used in the electric car. In total, Edison patented 1,093 inventions.

Edison is the epitome of the "great man" theory of innovation, which holds that major technological innovations arise solely from the genius of a limited number of men like Thomas Edison and Henry Ford. This theory, which was a prominent explanation of technological evolution in the 1920s, was supplanted by "gradualism," a theory posited by S. C. Gilfillan (1935), and later by George Basalla (1988), both of whom argued that scientific evolution is a gradual process developed by many. This view has recently been qualified by Joel Mokyr (1990), who argues that such evolution is more a series of "fits and starts" than incremental change.

Edison was an avid technological forecaster. As a result of the stunning inventions he made during the last quarter of the nineteenth century, he was often asked during the early twentieth century for his opinions about the future of technology. He enthusiastically offered his visions, and


these predictions were more than mere daydreams. Many of his forecasts were backed up by actual efforts to develop and market the products he espoused.

In this article, I evaluate 13 technological and social forecasts made by Thomas Edison between 1911 and 1914. Then in his mid-60s, he offered his vision of which technologies would (and would not) dominate our lives in the future. References for these forecasts are Edison (1914, 1911, and 1910). In total, Edison focused on 11 emerging technologies that he thought would be important (or unimportant) in the future. Also, he addressed two specific social trends that were embedded in his broader vision of how technology would impact society.

Technological forecasters can claim success in two ways. They can predict that an event will occur or, alternatively, they can predict that an event will not occur. Edison found success with at least one of these naysayer forecasts.

Edison's Successful Forecasts

Six of Edison's 11 technological forecasts were successful, based on an assessment of what actually happened during the 20th century.

punch cards will create smart machines.

Before the end of World War II, virtually no one predicted the predominance of computers in modern life. Edison was the exception to that rule. As early as 1911 he foresaw that punch card technology would someday spread to many sectors of the economy; a plethora of smart machines would automate many business processes.

At the root of Edison's forecast was the Jacquard loom, a device invented in the late 1700s by Joseph Jacquard; this machine stitched complicated patterns into rugs and fabrics, using information contained on punch cards. Nearly a hundred years later, in 1889, Herman Hollerith patented an adaptation of the loom's technology to collect information for the U.S. census. The company he formed for that purpose in 1890 was renamed International Business Machines (IBM) in 1924.

Thomas Edison was enamored of the potential of the Jacquard loom. It turned out to be one of his most interesting forecasts. Edison predicted that punch card technology would someday spread to many other industries, creating armies of smart machines that would "follow the order and commandment of the card." For example, seamstresses would disappear, replaced by machines that would do the tasks of many coordinated workers. But that was only one application. In Edison's words, "So far as I can see, there is almost no limit to the extent to which it [punch card automation] may be applied." His successful prediction hints at the arrival of the computer more than a half century later, as well as the diffusion of advanced manufacturing devices.


chemical food pills will not be popular.

In the early 20th century, there was experimental work being done on the creation of chemical food pills, which proponents believed would someday serve as a food substitute. So many other forms of modern life were moving away from old-fashioned, natural, handmade processes—such as making your own butter and sewing your own clothes—that some experts expected that the same thing would happen with food. Edison disagreed with this prediction. He argued that food in the future would still come from the farm, although science would increase crop yields and create hybrids in that "laboratory." Chemical foods would be accepted only when there was no alternative, perhaps during a severe drought or famine. Besides, Edison noted that "the complaint today seems to be that here [sic] are too many chemicals in our food." He was right. Throughout the 20th century, there was an inexorable trend toward more food additives to increase shelf life and heighten taste, but traditional food itself would remain a pleasure unlikely to be replaced by chemical food pills.

man-made fibers will revolutionize clothing.

Edison scored another direct hit with his prediction that new man-made fibers would revolutionize clothing. Rayon, which is made from cellulose, a natural substance, was already available. But that was only the beginning. Edison predicted that clothes would be cheap and widely available so "that every young woman will be able to follow fashions promptly." His bet on cheap, synthetic fibers was right. Twenty-five years later, nylon (1937) and polyester (1940) did indeed revolutionize the clothing industry.

Edison's belief in a bright future for synthetic fibers was part of his broader vision that many industries then dominated by manual labor would soon embrace large-scale mechanization that would drive down prices. He was undoubtedly influenced by the practices of his friend Henry Ford, who was revolutionizing the auto industry. Edison believed that automation would spread to many other sectors of the economy. Clothing, he predicted, would soon be made "in a single operation by machinery"; such automation would lead to increased productivity and low prices.

farming will become mechanized.

Edison predicted a revolution in farming practices. This too was part of his broader vision of machines replacing human labor. "The present type of farmer and the present methods of farming," he said, "are destined to disappear." In their places would be highly mechanized farms running like big businesses and relying heavily on scientific agriculture: "I think the coming farmer will be a man on a seat beside a push-button and some levers." The revolution in farming machinery would be matched by a revolution in farming practices. Science, he noted, was learning ever more about soil fertility and soil erosion. These dual revolutions would dramatically improve farming. That is exactly what happened.

electric locomotives will replace steam.

Edison achieved a clear hit when he predicted that "the steam locomotive is blowing its last blast for millions of people." He was also right in noting that within a generation, no one in New York or New England would be riding on a train pulled by a steam locomotive. He believed that steam locomotives would remain only in sparsely populated areas where it would be impractical to generate and transport electricity. Edison was correct in predicting that steam power would disappear and that electric power would become more common. However, he failed to see the rise of diesel as a locomotive power. In fact, electric locomotives did replace steam in and around cities, but diesel engines replaced steam on long-haul routes outside East Coast cities.

atomic energy will be harnessed, but with some dangerous consequences.

Thomas Edison was both fascinated and frightened by radium. He noted that it had "great power" with "no appreciable limit or end" because of its ability to give off intra-atomic energy. He correctly recognized that radium presented great possibilities for both mechanical and medical applications, even though it was no more than a poorly understood scientific curiosity at the time. Edison understood that radium packs a large amount of power in a small quantity of substance. He speculated that being able to harness that extraordinary power would lead to wondrous future developments. One expert had already created a radium-powered clock that "will go several hundred years without winding." Edison suggested that such developments were merely the start of a long stream of extraordinary inventions.

In 1910, radium was also useful for treating skin cancer; other medical applications were sure to be forthcoming. But Edison was leery of radium's dangers. Professor Curie, he recalled, "had a tiny bit [of radium] in his vest pocket that he was taking to London and it had the effect of making a festering sore in his side." Edison questioned how large deposits of radium would be handled "without dangerous consequences."

Overall, Edison's early views of radium captured the two main issues that sealed its fate later in the 20th century: (1) the allure of its immense power to generate electricity in the 1960s that led to tremendous investments in nuclear power plants; and (2) the overwhelming questions about its safety that led to its abandonment as a viable energy source, at least in the United States, in the 1980s. He erred only in believing that the extraordinary power of radium might be used to turn ordinary metal into gold.

Edison's Blunders

Five of Edison's 11 technological forecasts did not pan out.

the hovering design will work best for aircraft.

Edison's biggest forecasting blunder concerned one of the most important inventions of the 20th century—the airplane. It was not that Edison thought that the airplane would not fly, a mistake made by many forecasters ten years earlier; instead, he believed that the Wright brothers were pursuing the wrong design.

Wilbur and Orville Wright had built their first winged glider in 1900, and they had made the first sustained and controlled flight in a heavier-than-air, powered aircraft on a sand dune near Kitty Hawk, North Carolina, on December 17, 1903. By 1905, the brothers had built a practical plane that could fly "figure eights" and stay airborne for half an hour.

By January 1910, however, Edison was still not convinced that the Wright brothers' airplane would be a success. He proposed an airplane based on the "helicopter principle," which would be able to "vanquish the hostility of the wind." He said that "a successful air machine must be able to defy the winds." In 1911, Edison extended his vision of how successful flight would be accomplished.


He observed that buzzards and bumblebees hover effortlessly in place, and he surmised that they do so by creating sound waves under their wings. Airplanes would do the same once the "sound wave" principle could be harnessed mechanically. Then, "the present type of aeroplanes will soon be discarded, and . . . bumblebee fliers will carry passengers at the rate of one hundred miles an hour, or more."

By January 1914, when interviewed by a reporter for the Independent, Edison was chastened: he refused to comment on the future of flying machines, responding with a terse "Don't know" when asked about the prospects for powered flight. By this time it must have been clear to him that the Wright brothers had been correct. In the end, sound waves played no role in flight, and helicopters, while subsequently feasible, supplemented but did not supplant winged aircraft as pioneered by the Wrights.

Edison was as much a promoter as an inventor. He must have felt some jealousy at the magnitude of the Wright brothers' invention. That envy may have colored his forecasting and driven one of his greatest errors.

poured-concrete houses will be common.

Like many of his contemporaries, Edison fell victim to a perceived analogy between automobile and housing construction. He assumed that just as autos were being built efficiently on Henry Ford's assembly lines, so housing would be built more cheaply and efficiently. Edison envisioned an assembly line for houses where giant half-ton molds would be erected on the building site. Massive amounts of concrete would then be poured into the molds, which would create entire preformed bathrooms, closets, and even one-piece staircases. Virtually the entire house would be created in a single, efficient step. The result would be lower construction costs to consumers and less expensive fire insurance, as concrete does not burn. In recognition of his accomplishments, Edison was granted a patent for his innovative poured-concrete housing method.

However, the cost of assembling and moving the gigantic molds never declined substantially. The automobile analogy did not hold. It was always cheaper and more efficient to hire carpenters to put up "stick" houses. Edison's marketing mistakes exacerbated the problem. Ever the entrepreneur, he sold his poured-concrete houses as an affordable alternative to slum dwellings.


That tarnished the houses with a disparaging label and scared away potential buyers. Ultimately, 13 houses were "poured" in New Jersey, with another 12 in Gary, Indiana, around 1915. Most are still standing, but their aging has been anything but graceful (Almada, 1996). The Indiana houses fell into disrepair, and many were abandoned after Gary had declined as a steel center. A government-backed rehabilitation program brought many of these dwellings back to life as cheap inner-city housing. But poured-concrete housing proved to be a technology of false promise.

gold will be manufactured.

Hovercrafts and poured-concrete houses may have been two of Edison's most stunning failures, but the development he was most passionate about, and the one he most wanted to discuss, was the future role that gold would play as a standard of monetary value. Specifically, Edison predicted that "it is only a question of time until a way is discovered to manufacture gold, making it so inexpensive that it may be left out at night." The result would be a catastrophe for the financial system. Because most business contracts at that time stipulated payment in gold, Edison worried that debts would be paid off with a worthless metal once gold could be made for $25 a ton. Edison predicted a tremendous transfer of wealth from creditors to debtors. It never happened. There was a financial collapse at the end of the 1920s, but it had nothing to do with the cheap manufacture of gold.

steel furniture and nickel books will replace wooden furniture and paper books.

Metals played an important role in two other Edison forecasts. One was that all "furniture will soon be made of steel." Steel at the time cost one-fifth as much as wood, and Edison believed that "babies of the next generation will sit in steel high-chairs and eat from steel tables. They will not know what wood furniture is." Besides, steel could be stained to look just like wood. With cheap, highly durable steel furniture available, wooden pieces would be doomed to obsolescence.

Turning his attention to publishing, Edison predicted that paper books would be replaced by books with steel covers and nickel pages. Unlike paper books, which crumble after a century or so, nickel-based books would be indestructible except through fire and abuse. Cost savings would speed the substitution. Edison claimed that a two-inch-thick metal

book could contain forty thousand pages and could be produced for much less than a paper book.

solar energy will emerge as a new power source.

Edison described a solar energy system that seems lifted from the energy crisis of the 1970s. In 1910 he noted that a 30-horsepower "sun engine" was powering an Arizona steam turbine by focusing the sun's rays through a magnifying glass onto a copper boiler. "Sun engines," he believed, "are very promising machines." Edison was also bullish on wave, tidal, and geothermal energy systems, which, he predicted, would develop rapidly. But now, nearly a hundred years later, we still hear forecasts of a bright—sometimes imminent—future for alternative technologies.

Edison was by no means an environmentalist. He advocated burning coal at the mouth of a mine, setting entire veins of coal afire beneath the earth, then transporting the electricity over great distances. The "wasteful spouting" of those Yellowstone geysers was also a source of annoyance; this energy could be harnessed for industrial use. Meanwhile, some of Edison's contemporaries urged caution regarding some of the more extreme proposals, such as harnessing the power of volcanoes. They thought that such interventions might throw the earth out of orbit with catastrophic results. Edison replied that "You can't hurt the earth, even if you do throw it a trifle out of kilter."

A Pair of Social Forecasts

Edison's social forecasts were predicated on his broader vision of industrial mechanization. He believed that the widespread use of machines to do formerly manual tasks would create fundamental social changes. In the decades preceding 1900, many tasks, from making butter and clothing at home to manufacturing in factories, relied heavily on manual labor. But the turn of the century brought the start of a revolution in mechanical conveniences. The telephone, the electric light, and the automobile were invented in the years between the Civil War and the turn of the 20th century. After 1900, those inventions started to become prevalent in everyday life. These were times of great technological change, and many experts, including Edison, assumed that great social changes would follow.

the standard of living will rise.

Edison observed that mechanization increases productivity. The result would be "a world flooded with food, clothing, shelter, and luxuries" at much lower prices. Widespread machine manufacturing would cause consumer prices to fall, and goods would become widely available to the masses. This would drastically raise the average person's standard of living. Edison predicted that in the far distant future, "the ordinary laborer will live as well as a man does now with $200,000." Edison foresaw a profound revolution based on automation: "[T]here will be no manual labor in factories of the future." Workers would be "merely superintendents watching the machinery." As a result, there would be more emphasis on working with brains rather than backs. The workday would fall to eight hours and would offer more interesting work and more pleasant surroundings.

world poverty will end.

Edison did well in predicting the rising standard of living, but he overshot in predicting that "[t]here will be no poverty in the world a hundred years from now." Again, his prediction was based on the revolution in machinery and scientific agriculture that would result in a flood of cheap, high-quality goods. Edison saw a correlation between manual labor and poverty. "Now that men have begun to use their brains," he observed, "poverty is decreasing." Mechanization would ensure that there would be "no half-starved children, no overworked mothers, no poverty-worried fathers."

When asked by an interviewer what would happen if only a few men controlled such mechanized processes, he responded that "a few are getting too much and the rest not enough." That would change, however. Accompanying the revolution in production would be a revolution in government. Edison was no socialist, but he predicted that "there will be some big experiments in government in the next fifty years." Six years later, the Bolsheviks overthrew the Russian czar. Yet today, nearly a century after Edison first voiced his forecasts, there is still criticism about the disparity of wealth in the United States and about widespread poverty in much of the world.

Summary Evaluation of Edison's Forecasts

Thomas Edison's 13 forecasts are summarized in Table 1. Seven of these proved prescient, for an overall success rate of 54%. When it came to forecasting the future of emerging technological and social trends, this man who was rightfully called the "Wizard of Menlo Park" was barely able to exceed chance.


Table 1. An Evaluation of Edison’s Forecasts

1. Punch Cards: Smart machines controlled through punch cards will revolutionize industry.
   Right: Punch cards were just the beginning of the computer revolution.

2. Chemical Foods: Artificial foodstuffs would not supplant food grown on farms.
   Right: While many foods have chemical additives, most foods have their roots in the ground, not in the chemist's lab.

3. Man-Made Fibers: Fibers from cellulose and other artificial sources will make clothes plentiful and fashions accessible to all.
   Right: Clothes made of cellulose and synthetic fibers are ubiquitous, cheap, and in some respects superior to those made with natural fibers.

4. Mechanized Farming: Farms will become mechanized and will be managed like factories.
   Right: Family farms are economically under siege, and agriculture is dominated by industrial farming.

5. Electric Locomotives: Electricity will render steam locomotives obsolete.
   Right: The steam locomotive survives only in theme parks.

6. Atomic Energy: Radium presents great commercial possibilities but is dangerous.
   Right: The great building boom of atomic plants during the 1960s and 1970s was followed by abandonment because of safety concerns in the 1980s.

7. Standards of Living: Technology will yield better working and economic conditions for all.
   Right: Standards of living kept growing throughout the 20th century.

8. Aviation: Helicopters and "bumblebee fliers" floating on sound waves are superior to the Wright brothers' fixed-wing aircraft.
   Wrong: Fixed-wing aircraft prevailed.

9. Poured-Concrete Houses: Concrete can be molded into cheap and safe housing units.
   Wrong: Cultural resistance and the huge investments required for molds halted construction after a few units.

10. Man-Made Gold: New energy sources can be used to manufacture gold, with dramatic effects on the distribution of wealth.
    Wrong: It never happened.

11. Steel Furniture and Nickel Books: Metals will supplant wood and paper because they are cheaper, more plastic, and more durable.
    Wrong: Wood and paper remain the standard for furniture and publishing.

12. Solar Energy: Energy from climatic and geographic sources will be exploited, and electricity will be transported over long distances.
    Wrong: The appropriate economic conditions never materialized.

13. World Poverty: Abundant material goods will eliminate poverty and reduce inequities.
    Wrong: Poverty is very much with us.

Edison, however, did have a strategy for dealing with the vagaries of evolving technology. He held more than a thousand patents. Unlike a company such as AT&T, which has placed huge bets on a single technology like the picture telephone, Edison spread his bets across a large number of promising technologies. In forecasting the future of technology, he was investigating the potential of dozens and sometimes hundreds of inventions that might have had a future. Although he could not foresee which inventions would lead to great wealth, he knew that some subset of his inventions would be successful. That is the lesson from Edison’s technological forecasting. Spreading risk by pursuing multiple paths to future market success is probably a better strategy than trying to predict precisely which technologies will succeed in the distant future.

References

Almada, J. (1996). New life for abandoned and derelict Edison homes, The New York Times, January 14, 10.

Basalla, G. (1988). The Evolution of Technology, New York: Cambridge University Press.

Edison, T. (1914). Today and tomorrow. Interview with John McMahon. Independent, 77, 24-27.

Edison, T. (1911). The wonderful new world ahead of us. Reported by Allan Benson. Cosmopolitan, 50, 294-306.

Edison, T. (1910). Inventions of the future. Interview with John McMahon. Independent, 68, 15-18.

Gilfillan, S. C. (1935). The Sociology of Invention, Chicago: Follett Publishing Company.

Mokyr, J. (1990). The Lever of Riches, New York: Oxford University Press.


Contact Info:
Steven Schnaars
Baruch College, City University of New York
[email protected]

TIPS FOR FORECASTING SEMI-NEW PRODUCTS
by Bill Tonetti

Preview: "Semi-new" is the label that author Bill Tonetti assigns to products that are not truly new, but rather result from extensions and modifications of existing products. In this article, Bill shows how to use data that exist on the predecessor products to forecast demand for the semi-new products. Many firms overlook the opportunities Bill describes here, with severe consequences for forecast accuracy, inventory costs, and service levels.

Bill Tonetti is President and Cofounder of Demand Works, a software and services company that focuses on collaborative forecasting, demand and supply planning, and multidimensional data analysis. Bill has over two decades of experience in supply-chain operations, software development, and consulting for a variety of manufacturing and distribution industries.

• The majority of new products are actually modifications of existing products. I use the term semi-new product to refer to an item that may not be entirely new, but that has little or no demand history.

• All popular automatic-forecasting systems employ some number of time-series techniques with the objective of using a model that best fits the historical data. The reliability of automatic systems, however, fails for both semi-new products and end-of-life products.

• One strategy for forecasting a semi-new product is to derive a simulated history that links the demands of semi-new and predecessor products. The method to link these demands depends on the nature of the product change: for example, is it a style change or an engineering change?

• A second strategy links the semi-new product to the appropriate product group. So-called pyramid techniques can be valuable in transferring the trends and seasonal patterns in a group to the forecasts for the individual products within that group.

• Forecasts for end-of-life products must be extinguished either by automatic procedures such as extinction curves or by judgmental overrides of erroneous automatic forecasts.

Introduction

Most tactical business forecasting is based on the assumption that history is a useful predictor of future demand. Statistical approaches analyze historical levels, trends, and cycles to develop time-series forecasts. Unexplained variability (forecast error) is then used to determine optimal inventory levels. Demand history enables automated statistical forecasting and provides a basis not only for user judgment but also for refinement of the forecast. Today there is scarcely a manager in business who hasn't had some training in rudimentary time-series forecasting techniques. But what should a manager do when demand history is not readily available?

The Newness of New Products

There are countless reasons for new products. Today's dynamic economy is characterized by innovation, competitive moves and countermoves, and aggressive cost management. Even though we pride ourselves on the inventiveness of our era, the fact is that few really new products are introduced. Most change is evolutionary. The majority of new products are actually modified existing items, and many aren't new at all. An increasing number of companies forecast demand for specific customers and distribution centers, so a "new" product may be new only for a particular customer or distribution center.

For the purpose of this article, a semi-new product is an item that may not be entirely new, but that has little or no demand history. Regardless of the true degree of change or innovation causing the absence or irrelevance of history for a particular product, the need for accurate demand forecasting is nonetheless present, and the task of developing forecasting strategies for these products is a challenge for most businesses. Semi-new items generally emerge from change that is instigated by managers. These items result from market changes, product-engineering opportunities, material or manufacturing advances, cost pressures, and evolutionary or cyclical consumer preferences.


In some industries, such change is the regular order of doing business. The apparel and footwear markets are extreme examples, where producers must replace most if not all of their styles each year. Other industries have formulation changes, new configurations, or frequent changes in packaging or labeling. Even stable industrial manufacturers such as paper, glass, steel, chemicals, and other primary-product producers experience frequent formulation changes. This issue of inaccessible demand history of semi-new products spans nearly all industries, and it affects many corporate product portfolios. I recently worked with a packaging company that had completely stopped statistically forecasting demand because of the poor forecasts caused by minor product changes. As small changes to items were introduced, previous items were automatically forecast even though they had been discontinued; in turn, the replacements were very crudely forecast from the short history available. Although it may sound like an easily foreseeable problem with a straightforward remedy, many companies struggle with this issue, which is often a significant source of forecast error.

The Failures of Automatic Forecasting of Semi-New and End-of-Life Products

All popular automatic forecasting systems employ some number of time-series techniques with the objective of building a model that best fits the historical data. With varying degrees of precision, automatic forecasting systems have proved capable of rapidly producing good forecasts for thousands of products with stable demand. The M3 Competition (Makridakis and Hibon, 2000) shows that automatic forecasting procedures are among the most accurate when evaluated across many types of data. Because of their accuracy and efficiency, automatic forecasting procedures are a requirement for most manufacturers and distributors. The M3 Competition, however, did not include end-of-life or semi-new products.

The accuracy of automatic systems fails for end-of-life products. Exponential smoothing models, which are the most prevalent in automated systems, will consistently overforecast items with expiring demand. Similarly, items with no demand history will not be forecasted. With one period, the systems will choose a random walk (that is, the next-period forecast will equal this period's demand). With two periods, they will choose a random walk or a two-period moving average. The automatic models will consistently underforecast demand during the early stages of a ramp-up, and they often overestimate demand as the rate of growth begins to level off.

The problem is magnified if the data are seasonal. Seasonal models can be fit only when there are at least two, and preferably three, full seasonal cycles, which amounts to at least two to three years for most monthly or weekly data. As a result, automated forecasting systems, if unattended, will produce suboptimal, straight-line forecasts for seasonal items for a period of at least two years after their introduction. This is a serious problem, as product life cycles in many industries are shorter than three years.

Variability of demand is also an important (and often overlooked) characteristic, particularly because of the role it plays in determining service-inventory levels and demand- and supply-hedging strategies. Inaccurate forecasts produce higher errors, which in turn result in increased service inventories. All safety-stock formulas are based on unexplained demand variability during lead time. When less precise forecasting methods are used, inventory will rise or service will fall.
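To illustrate the last point, a common textbook form of the safety-stock calculation sets the buffer proportional to the standard deviation of forecast error over the replenishment lead time. This is a sketch of that generic formula, an assumption on my part since the article does not prescribe a specific one:

```python
import math

def safety_stock(error_std, lead_time_periods, z=1.65):
    # z = 1.65 targets roughly a 95% cycle-service level under normally
    # distributed forecast errors
    return z * error_std * math.sqrt(lead_time_periods)

# doubling forecast-error variability doubles the required buffer
print(safety_stock(error_std=40.0, lead_time_periods=4))   # 132.0
print(safety_stock(error_std=80.0, lead_time_periods=4))   # 264.0
```

The dependence on error_std is the point: a less precise forecasting method raises the error standard deviation, and the inventory needed to hold the service level rises with it.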

Origins of Semi-New Products

Unlike a truly new product, for which there is little data beyond sociological and judgmental inputs, a semi-new item can be linked to useful past data. Table 1 identifies the origins of semi-new products and shows the implications for developing forecasts. As the table illustrates, information about the semi-new item is often available. Packaging or engineering changes, for example, which are a prolific basis for semi-new products in many companies, often bring significant opportunities to leverage information about predecessor-demand levels, seasonality, trends, and variability.

Table 1. Origins of Semi-New Products and Implications for Forecasting

ORIGINS              IMPLICATIONS FOR FORECASTING

Engineering Change   » No expected change in level or variability of demand

Dimension Change     » Unlikely change in demand
                     » Possible pipeline issues with prior sizes if change is immediate
                     » The product may ship to a different production facility or to a customer with different buying patterns.

Style Change         » Demand is typically expected to exceed that of the style that is being replaced, but it may cannibalize several other styles.
                     » Seasonality and demand variability should be similar to the predecessor products.

Packaging Change     » Demand will change if the quantity of material in the package is changed.
                     » Both prior and replacement packaging will probably be sold in the transition.

Product Extension    » Similar to style change, in that seasonality, level, and variability should be similar to related products.
                     » Possible cannibalization effects

Tips for Forecasting Semi-New Products

Semi-new product forecasting is unique in that useful related information exists; however, that information has not yet been associated with the semi-new item. Because semi-new items are the result of management action, the reasons for change are usually known, and the implications for the new and previous products are known as well.

Create a history for the semi-new product.

The absence of historical demand is the key stumbling block to forecasting semi-new items. Thus, the first and most important step is to simulate a history for the semi-new item that reflects the linkage between the semi-new item and its predecessor. To establish such a linkage, your software must support multiple streams of historical information, including actual history (what was really demanded) and simulated history (the demand incorporated into your forecasts). In this way, history can be created for forecasting purposes without invalidating the real historical demand information in the data. If your software does not allow multiple related streams of historical data, you can still adjust the history and automate part of the forecast; however, you will no longer be able to use the system for sales analysis.

There are several ways to develop or simulate history. The best approach depends on the information that is available and also on the assumed linkage between the new item and the predecessor item.

• If the semi-new item is expected to follow the level, trend, seasonality, and variability of a predecessor, as in the case of an engineering change, then you can merely copy the history from the predecessor to the new item. Your simulated history will begin with the time series of the predecessor and therefore reflect historical data patterns of the predecessor. In effect, you are assuming that the change is "in name only."

• If the new and replacement products will be sold at the same time during a transitional period, as in the case of a packaging change, then it's best to use the sum of the actual histories for the two products as the simulated history for the new one. During the transition, however, you may need to repeat the summation calculation during each new forecasting period, which is a time-consuming process. If your software can link the items, the process of moving the summations forward can be performed automatically.

• If the baseline level of demand is expected to grow but the variability and seasonality are expected to remain similar to the older item—as in the case of a style change—then a growth factor can be built into the simulated history.

• If the product is truly new and there is a need to produce reasonable automated forecasts quickly, then it may be useful to create a couple of periods of simulated history to initiate automated forecasting. The simulated history will be augmented with actual demand as it becomes available. This technique will initially produce straight-line forecasts, which can then be judgmentally adjusted or else linked to the product family via pyramid techniques.

The first three of these options are sketched in code below.
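A minimal sketch of those three linkage strategies follows. The function names and numbers are mine, not taken from any particular demand-planning package:

```python
def copy_history(predecessor):
    # engineering change: inherit level, trend, seasonality, and variability unchanged
    return list(predecessor)

def summed_history(old_item, new_item):
    # packaging change sold during a transition: combine the two demand streams
    return [a + b for a, b in zip(old_item, new_item)]

def grown_history(predecessor, growth_factor):
    # style change: keep the historical pattern but lift the baseline level
    return [y * growth_factor for y in predecessor]

old = [10, 12, 30, 28, 11, 9]     # hypothetical predecessor demand
new = [0, 0, 2, 5, 9, 12]         # hypothetical semi-new item demand to date

print(summed_history(old, new))   # [10, 12, 32, 33, 20, 21]
print(grown_history(old, 1.2))    # same seasonal shape, level raised 20%
```

In practice the simulated stream would be stored alongside, not instead of, the actual history, for the reasons given above.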


Link the product to the product family: Pyramid techniques.

Forecasts for semi-new items can also be developed by using pyramid forecasting techniques. A pyramid is a hierarchical structure in which items or products are aggregated into meaningful groups such as product groups or families, brands, or location totals. A pyramid can be as simple as a two-level hierarchy (products and product family), but often it consists of multiple levels (component, product, package group, and brand). If the items within a group are expected to possess similar demand patterns such as seasonality, the forecaster can use an established demand pattern at the group level to forecast the individual item demands. For example, if an ice cream manufacturer launches a new flavor, the seasonal patterns for the new flavor will likely be consistent with the patterns of other flavors.

Linking the new item to the group pattern can be particularly productive if the individual-item data are inadequate for forecasting, as is the case for SKUs with demands that are volatile, sporadic (intermittent) or patternless—or for items whose histories are very short, such as semi-new items. The lengthier histories and more discernible structures of the group-level data make it feasible to develop forecasts for the aggregate and to obtain indexes for the trend and seasonality of the aggregate. For a semi-new product with a short history (where we cannot derive a lengthier simulated history), any forecast will be limited to a flat line, even though it is known that the item will follow the pattern of the group. In this case, the flat-line forecasts can be adjusted by the trend or by the seasonal patterns derived from the group-level data.

The preceding is an example of a top-down strategy for reconciling item and group-level forecasts. If the item data are poor, then the bottom-up strategy (forecasting the individual items and then aggregating to obtain a group forecast) will work badly. With a top-down approach, the patterns in group-level forecasts (the top) are imposed on the individual item forecasts, giving them the trend and seasonal structure of the group.

An added advantage of the pyramid approach to forecasting concerns the workload on the firm's planners. Most companies do not have enough time to create forecasts for all their products. By concentrating on the top levels of the forecast hierarchy, planners can employ pyramid forecasting to manage the forecast process without directly tending to each product.
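Here is a minimal sketch of that top-down adjustment: a flat item-level forecast is reshaped with seasonal indexes estimated from the group-level history. The indexes and level are invented for illustration:

```python
# group-level seasonal indexes (averaging 1.0), estimated from the long group history
group_index = [0.6, 0.7, 1.0, 1.4, 1.8, 0.5]

# flat-line forecast for a semi-new item whose history is too short for seasonality
item_level = 100.0
flat_forecast = [item_level] * len(group_index)

# top-down: impose the group's seasonal shape on the item forecast
shaped_forecast = [f * idx for f, idx in zip(flat_forecast, group_index)]
print(shaped_forecast)   # [60.0, 70.0, 100.0, 140.0, 180.0, 50.0]
```

The item keeps its own level while borrowing the group's seasonal structure, which is exactly what the flat line was missing.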

Adjust forecasts for predecessor or cannibalized items.

In end-of-life or product-replacement scenarios, the forecast for the predecessor product must be managed as well. A failure to shut off or adjust automated forecasts for these products will result in unwanted production of soon-to-be-obsolete goods. Similarly, forecasts for cannibalized items should be reduced to reflect lower demand. There are various approaches for these end-of-life situations; the best approach depends on the capabilities of your system. For example, sophisticated forecasting systems have capabilities for applying extinction curves. If you are using less sophisticated software, you may turn off automated forecasting for the item being replaced, or you may enter forecast adjustments for future periods.
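One simple shape an extinction curve can take is a multiplier that ramps the statistical forecast down to zero over a chosen phase-out horizon. The linear ramp below is an illustrative assumption; commercial systems typically offer richer curve shapes:

```python
def extinguish(forecast, phase_out_periods):
    # scale each future period down linearly, reaching (and holding) zero
    out = []
    for t, f in enumerate(forecast):
        weight = max(0.0, 1.0 - t / phase_out_periods)
        out.append(f * weight)
    return out

print(extinguish([100, 100, 100, 100, 100, 100], phase_out_periods=4))
# [100.0, 75.0, 50.0, 25.0, 0.0, 0.0]
```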

Monitor and leverage new information as it becomes available.

The days of forecasting only once per year or once per quarter are long gone. Best practices in demand management involve continual review and adaptation as new information becomes available. New demand information might surface daily. Orders, point-of-sale data, or other vital demand information can be compared to expectations. For example, a new configuration that had not been expected to affect demand levels may actually have an effect. Cannibalization may occur. Forecasters should frequently review semi-new products and other volatile items. With some systems, you can use alert flags to indicate when items may need attention. The sooner you become aware of changes, the more quickly and profitably you can react to them.

Conclusion

I have coined the term semi-new to classify "new" products that are really extensions or modifications of existing products. Because good information may exist on the predecessor products, you can significantly improve your forecasts of the semi-new products by analyzing the level, trend, and seasonality of the predecessor products. Information on the product group might also be useful in predicting demand for semi-new items. As intuitive as this may seem, it is remarkable how many forecasting processes are crippled by the failure to apply this approach to new styles, configurations, and packaging.

Automated statistical forecasting is a valuable tool for business, and newer demand-management systems incorporate utilities for carrying multiple historical streams of data that link related items. This capability to "inherit" data patterns from existing products makes it possible to forecast the semi-new products accurately and systematically, enabling you to keep up with the challenges of a competitive environment and an ever-growing workload.

Reference

Makridakis, S. & Hibon, M. (2000). The M3 competition: Results, conclusions and implications, International Journal of Forecasting, 16, 451-476.

Contact Info:
Bill Tonetti
Demand Works Co.
[email protected]

BOOK REVIEW

Fooled by Randomness: The Hidden Role of Chance in Life and in the Markets by Nassim Nicholas Taleb
reviewed by Anirvan Banerji

Anirvan Banerji is Director of Research of the Economic Cycle Research Institute (ECRI). He has written about business cycle research and forecasting in many journals, including the International Journal of Forecasting. He is the coauthor, with Lakshman Achuthan, of Beating the Business Cycle: How to Predict and Profit from Turning Points in the Economy (Doubleday, 2004). Anirvan is Vice President of the Forecasters Club of New York, and he serves on New York City's Economic Advisory Panel.

Nassim Nicholas Taleb (2005). Fooled by Randomness: The Hidden Role of Chance in Life and in the Markets (Second Edition). New York: Random House. ISBN: 0812975219.

Introduction

The British humor magazine Punch once gave a terse piece of advice to people who were about to marry: "Don't." That, to be overly simplistic, would be Taleb's advice to people who are about to make precise quantitative forecasts.

In this book—which is really an engaging but idiosyncratic, opinionated essay on why people, including professional forecasters, are bad at prediction—Taleb provides wide-ranging anecdotal, logical, and scientific evidence of misperceptions about the role of randomness in determining outcomes. He also offers philosophical musings about dealing with luck in life.

Taleb, who in his day job is a "quant," or quantitative derivatives trader, shows persuasively that success in life—and even more so in the financial markets—is all too often the result of luck rather than skill. He describes how people are frequently fooled by judgmental heuristics and biases. In doing so, he provides an entertaining roundup that ranges from the phenomenon of anchoring to available reference points, to hindsight bias and inductive fallacies.

The Black Swan

The problem of induction is exemplified by what he calls the black swan problem, after John Stuart Mill. If all the swans one has ever seen have been white, a facile induction might be that all swans are white. Then one day a black swan appears, demolishing the flawed inductive logic. The black swan events are those thought to be impossible because they have never happened before, but they turn out to be the most devastating of all. In that sense, the September 11 attacks were black swan events that killed about one-thousandth of 1% of the U.S. population, yet these incidents were inconceivable to most Americans before they happened. The 9/11 attacks are now so prominent in the national consciousness that, according to a recent poll, a fifth of all Americans expect to be casualties of a future terrorist attack.

Ironically, the principal criticism of Robert Shiller's (1981) critique of the efficient markets hypothesis came from Robert Merton (Marsh & Merton, 1986), who later helped found a hedge fund aimed at taking advantage of market inefficiencies. That fund, Long Term Capital Management,


had a spectacular blowup in 1998. Merton, along with fellow Nobelist Myron Scholes, thought he could construct a scientific model of risk based on past data, but he failed to allow for a “black swan” that was later declared a “ten sigma” event (that is, ten standard deviations from the norm).

Wittgenstein's Ruler

This assertion, according to Taleb, reveals a Wittgenstein's Ruler problem, namely that unless you have confidence in a ruler's reliability, you may be using a table to measure the ruler. Thus, someone claiming that catastrophic losses constitute a ten sigma event either (a) knows the true model almost perfectly, in which case such an event should occur once in several times the age of the universe; or (b) badly misrepresents the model but refuses to admit it.

This reminded me of a pair of acclaimed econometricians whose recession-probability model failed to anticipate the 1990-91 recession. One of them explained the failure by saying that not even the CIA had predicted Saddam Hussein's invasion of Kuwait, which helped trigger the downturn, so no model could be expected to forecast it. He was visibly upset when I pointed out that our research group had been almost alone in predicting the downturn months in advance ("No Recession Yet, Says Panel of Experts," 1990). These econometricians finally abandoned the model after it failed to predict the 2001 recession. Notably, John Chambers, CEO of high-flying Cisco Systems, which was laid low by that downturn, insisted that the "brightest people in the world" did not see that recession coming (Navarro, 2006).
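The scale of the "ten sigma" claim is easy to check. Under a genuinely Gaussian model, the probability of a ten-standard-deviation daily move is so small that observing one is overwhelming evidence against the model rather than simple bad luck. A quick sketch:

```python
import math

def normal_tail(z):
    # P(Z > z) for a standard normal variable, via the complementary error function
    return 0.5 * math.erfc(z / math.sqrt(2))

p = normal_tail(10.0)
print(p)                # roughly 7.6e-24
print(1 / (p * 252))    # expected years between such daily events,
                        # assuming about 252 trading days per year
```

The waiting time comes out vastly longer than the age of the universe, which is Taleb's point: the ruler, not the table, is what got measured.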

Hindsight Bias and the Illusion of Knowledge
As Taleb points out, hindsight bias is so strong that those who are very good at predicting the past think highly of their abilities to predict the future. However, especially in economics and financial markets, the true model is not merely unknown; it keeps changing over time. The response of another prominent econometrician to forecasting failures is an exhortation to do “more and better econometrics.” But Taleb is “now convinced that, perhaps, most of econometrics could be useless” because probability distributions might not be stable. Flawed assumptions embedded in forecasting models, especially in the financial markets, can lead to the “illusion of knowledge,” the tendency to make inferences not warranted by the data. It is as though a gambler were making probabilistic calculations while assuming that the game is being played with fair six-sided dice, when in reality it is being played with loaded ten-sided dice—and the number of sides keeps changing in unforeseen ways.

In the face of failure, the learned principals of Long Term Capital Management spent their energies “adducing ad hoc explanations and putting the blame on a rare event,” instead of acknowledging the root of the problem, which is “computing instead of thinking.” This could be an instance of attribution bias, which leads even scientists to credit successes to their own skill while blaming bad luck for their failures. But Taleb is not just an armchair philosopher decrying the difficulty of forecasting. Armed with a Wharton MBA and a PhD in financial mathematics, he has been a successful trader for two decades. What is his secret?

What Is the Secret of Forecasting?
Taleb relates a revealing story about his approach. Asked for his views on the stock market, he replies that there is a 70% probability it will rise slightly. His colleagues are utterly confused, because Taleb has just placed a large bet that the market will fall. But his view is that, although the market will probably go up, it is better to short it, because if it goes down it could fall substantially. In other words, his bet is based not on the most likely outcome, but on his loss function and on his belief that the market has underestimated the chance of extremely negative outcomes.
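
The logic becomes transparent as a simple expected-value calculation. The probabilities and payoffs below are hypothetical figures of my own choosing, used only to make the asymmetry concrete:

```python
# Hypothetical scenario: the market probably rises a little,
# but if it falls, it falls hard. (Illustrative figures only.)
p_up, return_if_up = 0.70, 0.01       # 70% chance of a 1% rise
p_down, return_if_down = 0.30, -0.10  # 30% chance of a 10% fall

expected_return = p_up * return_if_up + p_down * return_if_down
print(f"Expected market return: {expected_return:+.3f}")  # -0.023

# The most likely outcome is a gain, yet the expectation is a loss,
# so a short position has positive expected value.
```

The most probable direction and the profitable bet point opposite ways whenever the tail of the distribution is heavy enough.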


The tendency of forecasters to underestimate the probability of extreme events—the so-called black swans—is one of the book’s central themes. As Taleb explains elsewhere (Mandelbrot & Taleb, 2006), measures of uncertainty based on the bell curve, or Gaussian model, disregard the possibility of sharp jumps or discontinuities, often making such approaches as meaningless as focusing on the grass and missing out on the (gigantic) trees. While it might be common to remove “outliers” in econometric analysis, just the ten biggest one-day moves in the S&P 500 represent 63% of its total returns over the past 50 years.

Certainly there are systems that show mild randomness, where a single observation may appear extreme by itself but will not disproportionately affect the aggregate. This sort of randomness can be averaged away, and it is amenable to standard statistical models using large random samples. In such cases, the bell curve approach works very well: after all, when we measure the heights of people, we may come across someone who is eight feet tall, but we will never find someone who is a few million miles tall. But then there are systems that show wild randomness, where a single observation can disproportionately skew the total. Examples are income and wealth, financial market returns, and the number of casualties in terrorist attacks. In fact, almost all man-made variables—markets, societies, and economies—are complex systems that can display wild randomness; a small simulation following this passage makes the contrast concrete. In such situations, even the most highly qualified forecasters may know less than they think they do; they are likely to predict the future based on patterns in past data, completely missing the possibility of black swan events and rationalizing away their forecasting failures when they do occur.

The lesson for the practitioner is first to understand whether he or she is trying to forecast outcomes determined by complex systems. In these cases, to throw more econometrics at the problem may be to miss the point dangerously. Also, it is vital to consider one’s loss function. How costly is the failure to predict an extreme event? For instance, in sales forecasting, the impact of business cycles, which are economic fluctuations generated by a highly complex market economy, depends on the business you are in. If you are forecasting sales for a machine tool maker whose lead time for ordering raw materials is a year, failure to predict a recession could bankrupt the company. There is no such danger in forecasting the sales of toilet paper, which people will hardly forgo during a recession.
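
To see the mild/wild difference in miniature, compare how much of the total movement the ten largest observations account for in a thin-tailed versus a fat-tailed sample. This simulation is my own sketch, not the S&P 500 calculation cited above; a Student-t distribution with two degrees of freedom simply stands in for a wildly random process:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 252 * 50  # roughly 50 years of daily observations

def top10_share(x):
    """Fraction of total absolute movement contributed by the 10 largest moves."""
    a = np.sort(np.abs(x))[::-1]
    return a[:10].sum() / a.sum()

mild = rng.standard_normal(n)        # Gaussian: mild randomness
wild = rng.standard_t(df=2, size=n)  # fat tails: wild randomness

print(f"Top-10 share, Gaussian:  {top10_share(mild):.1%}")  # well under 1%
print(f"Top-10 share, Student-t: {top10_share(wild):.1%}")  # often 10% or more
```

In the mild case the extremes wash out as the sample grows; in the wild case no amount of averaging makes the largest observations ignorable.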

Guard Against Physics Envy
In effect, Taleb admonishes us to guard against physics envy—the fantasy of being able to make forecasts as accurate as those in classical Newtonian physics. Most of all, Taleb emphasizes the importance of an awareness of probability as a factor in all that happens, as opposed to the futile urge to quantify every outcome. This is a lesson especially applicable to professional forecasters.

More than most people, Taleb is keenly aware of the vagaries of fortune because of the history of his own family. A native of Lebanon whose aristocratic Greek Levantine family was ruined in the Lebanese civil war, Taleb is sensitive not only to the constant threat of misfortune in life, but also to the importance of dignity in the face of disaster. It is in this vein that he quotes the great poet Constantine Cavafy, who is “Greek by ancestry, Egyptian by choice,” and whose historical poems, according to a recent book review, “brood upon catastrophe.” Taleb observes that the only article over which Lady Fortuna has no control is personal behavior. He counsels us to play our roles with stoic dignity, knowing that everything else—much of what determines our successes and failures—is the product of randomness.

References
Mandelbrot, B. & Taleb, N. N. (2006). A focus on the exceptions that prove the rule [Mastering Uncertainty: A four-part weekly series], Financial Times, March 24, 2-3.
Marsh, T. A. & Merton, R. C. (1986). Dividend variability and variance bounds tests for the rationality of stock market prices, American Economic Review, 76(3), 483-498.
Navarro, P. (2006). The Well-Timed Strategy: Managing the Business Cycle for Competitive Advantage, Upper Saddle River, NJ: Wharton School Publishing.
“No Recession Yet, Says Panel of Experts” (1990). The Wall Street Journal, March 3.
Shiller, R. J. (1981). Do stock prices move too much to be justified by subsequent changes in dividends? American Economic Review, 71(3), 421-436.

Contact Info: Anirvan Banerji Economic Cycle Research Institute [email protected]


“Trying to predict the unpredictable, like trying to will what cannot be willed, drives people crazy.”
Adam Phillips, “A Mind is a Terrible Thing to Measure,” The New York Times, London, 2006

FORESIGHT 2006

UPCOMING SPECIAL FEATURE
Use of Monte Carlo Simulation Tools to Assess Forecast Error

FORECAST PROCESS IMPROVEMENT
Implementing a Consensus Forecasting Process
Merging Two Forecasting Groups

FORECASTING PRINCIPLES AND METHODS
Overcoming the Pitfalls of the Sales Force Composite Approach to Forecasting
Making the Best Use of Judgment in Forecasting
Proper Uses of Forecast Error Measures
Forecasts of the 2006 US Congressional Elections
Predicting the Box Office Success of Motion Pictures

SPOTLIGHT ON SOFTWARE
SAP: How Effective Are Its Forecasting Tools?
Phicast (PEERForecaster): A State-of-the-Art Excel Add-In for Time Series Forecasting
Jim Hoover Column on Demand Planning Software

BOOK REVIEW
How to Make Important Forecasts and Decisions: Abolish Meetings!
A review of James Surowiecki’s The Wisdom of Crowds

If you are interested in contributing an article or commentary on any forthcoming article, please contact Len Tashman, FORESIGHT Editor, [email protected]



Forecasting Summit
THE Conference Where Forecasters Converge to Share Knowledge and Ideas

Benjamin Moore & Co. • Federal Reserve Bank • Capital One • The Wharton School • Coca-Cola • SC Johnson • GlaxoSmithKline

Register now to:
• Learn best practices from leading practitioners and renowned experts
• Acquire new skills that will help advance your career
• Gain insights for dealing with real-world forecasting issues
• Exchange knowledge and ideas about forecasting

2006 Conference Dates: September 25-27, 2006 Boston, Massachusetts USA

Contact us for a free brochure with full schedule

www.forecasting-summit.com Phone: 617-484-5050 E-mail: [email protected]

Forecasting Summit is presented in cooperation with the International Institute of Forecasters

YOUR FORESIGHT SUBSCRIPTION EXPIRES WITH THIS ISSUE.

RENEW NOW!

see page 3.

SUBSCRIBE WORLDWIDE www.forecasters.org/foresight worldwide toll-free 866.395.5220

IIF

FORESIGHT: The International Journal of Applied Forecasting Published by The International Institute of Forecasters Business Office: 53 Tesla Avenue Medford, MA 02155 USA