Author's personal copy
Expert Systems with Applications 41 (2014) 3850–3855
Contents lists available at ScienceDirect
Expert Systems with Applications journal homepage: www.elsevier.com/locate/eswa
Implementing support vector regression with differential evolution to forecast motherboard shipments
Fu-Kwun Wang a, Timon Du b,⇑
a Department of Industrial Management, National Taiwan University of Science and Technology, Taiwan
b Department of Decision Sciences and Managerial Economics, The Chinese University of Hong Kong, Hong Kong
Article info
Keywords:
Generalized Bass diffusion model
Particle swarm optimization
Support vector regression
Differential evolution
Abstract
In this study, we investigate the forecasting accuracy of motherboard shipments from Taiwanese manufacturers. A generalized Bass diffusion model with external variables can provide better forecasting performance than the basic Bass diffusion model. We present a hybrid particle swarm optimization (HPSO) algorithm to improve the parameter estimates of the generalized Bass diffusion model. Support vector regression (SVR) models have recently been used successfully to solve forecasting problems. We propose an SVR model with a differential evolution (DE) algorithm to improve forecasting accuracy. We compare our proposed model with the Bass diffusion and generalized Bass diffusion models. The SVR model with a DE algorithm outperforms the other models on both model fit and forecasting accuracy.
© 2014 Elsevier Ltd. All rights reserved.
1. Introduction

Taiwanese motherboard manufacturers produce 98.5% of the world's desktop motherboards and dominate the global desktop motherboard market (Market Intelligence Center (MIC), 2012). However, this industry's growth rate has slowed due to the trend of replacing desktops with laptops and tablets. In addition, aggressive pricing by laptop/tablet manufacturers has diminished desktop motherboard sales. Forecasting plays an important role in many business activities, such as estimating demand volume in order and inventory management, production planning in manufacturing processes, capacity usage in production management, and the diffusion patterns of new products and technological innovations. Because the market is changing rapidly, a new forecasting model is required. Forecasted results can assist manufacturers in making better decisions on future expansion and investment.

The Bass diffusion model (Bass, 1969) has been used successfully to describe the empirical adoption curve for many new products and technological innovations. This model provides good predictions of the timing and magnitude of the sales peaks of the products to which it is applied. Bass, Krishnan, and Jain (1994) proposed a generalized Bass model that included marketing mix variables (e.g., price and advertising variables). This generalized model can provide better model fit and forecasting performance.

Bass (1969) used the ordinary least squares (OLS) method to estimate the parameters of the Bass diffusion model. However, the OLS approach is biased when estimating continuous-time models. To address this, Schmittlein and Mahajan (1982) proposed the maximum likelihood estimation (MLE) method. However, the maximum likelihood formulation considers only the sampling error and ignores all other sources of error, hence the computed standard error estimates may be too optimistic. Many researchers have tried to address this problem. For example, Srinivasan and Mason (1986) applied the nonlinear least squares (NLS) method to obtain valid error estimates. Venkatesan and Kumar (2002) used genetic algorithms (GAs) to estimate the parameters of the Bass diffusion model. The parameter estimates they obtained from the GAs were consistent with those of the NLS method. Wang, Chang, and Hsiao (2013) proposed an evolutionary approach based on a GA/particle swarm optimization (PSO) hybrid to obtain the parameter estimates of the modified Bass model. This hybrid evolutionary approach has been successfully applied to real-world engineering design problems (Nagi, Yap, Nagi, Tiong, & Ahmed, 2011; Niu, Liu, & Wu, 2010).

The support vector machine (SVM) developed by Vapnik (1998) is based on statistical learning theory. SVMs have been widely applied in the fields of pattern recognition, bioinformatics, and other artificial-intelligence-related applications. SVMs have also been used to solve nonlinear regression estimation problems, a process known as support vector regression (SVR). SVR models have been used successfully to solve forecasting problems (Cao, 2003; Che, Wang, & Tang, 2012; Chou, Cheng, & Wu, 2013; García, García Villalba, & Portela, 2012; Hong & Pai, 2007; Huang, 2012; Huang, Bo, & Wang, 2011; Jiang & He, 2012; Khashei & Bijari, 2012; Pai & Lin, 2005; Štěpnička, Cortez, Donate, & Štěpničková, 2013). Empirical results have indicated that the selection of the three parameters C, ε, and γ in an SVR model significantly influences its forecasting accuracy. SVRs with evolutionary algorithms (e.g., GA, simulated annealing, PSO, chaos-based PSO, chaos-based firefly, and hybrid) are used to determine appropriate parameter values (Hong, 2009; Kazem, Sharifi, Hussain, Saberi, & Hussain, 2013; Wu, 2010; Wu & Law, 2011).

We implement an SVR model with a differential evolution (DE) algorithm (Price, Storn, & Lampinen, 2006; Storn & Price, 1997) to improve the forecasting performance of motherboard shipments. In addition, we use a hybrid evolutionary algorithm that combines PSO with a quasi-Newton method to improve the parameter estimates of the generalized Bass diffusion model. In the following section, we present the forecasting models. Section 3 explains the SVR model with the DE and hybrid PSO (HPSO) algorithms. Section 4 uses data on motherboard shipments from Taiwanese firms to demonstrate the application of our proposed forecasting model. Finally, we offer a conclusion and suggestions for future studies in Section 5.

⇑ Corresponding author. Tel.: +852 26098569. E-mail address: [email protected] (T. Du).
0957-4174/$ - see front matter © 2014 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.eswa.2013.12.022
2. Forecasting models

2.1. Bass diffusion and generalized Bass diffusion models

The Bass diffusion model (Bass, 1969) is defined as

$$n(t) = m[F(t) - F(t-1)] + \varepsilon, \tag{1}$$

where $n(t)$ is the sales at time $t$, $m$ is the number of eventual adopters, $F(t) = \left(1 - e^{-(p+q)t}\right) / \left(1 + (q/p)\,e^{-(p+q)t}\right)$ is the cumulative distribution of adoptions at time $t$, $p$ is the innovation coefficient, $q$ is the imitation coefficient, and $\varepsilon$ is a normally distributed random error term with a mean of zero and a variance of $\sigma^2$. The adopter's probability density function $f(t)$ for adoption at time $t$ is given by

$$f(t) = \frac{\dfrac{(p+q)^2}{p}\, e^{-(p+q)t}}{\left(1 + \dfrac{q}{p}\, e^{-(p+q)t}\right)^{2}}. \tag{2}$$

In practical applications, quantities such as peak sales, peak sales times, sales period inflection points, and forecasts of future sales are of interest. The peak sales time is obtained by differentiating Eq. (2) with respect to $t$ and setting the derivative to zero, i.e., $T^{*} = \ln(q/p)/(p+q)$. The peak sales rate is obtained as $n(T^{*}) = m\,(p+q)^2/(4q)$. The inflection points on either side of the peak are obtained by differentiating Eq. (2) twice with respect to $t$ and solving for $t$, which yields $T_{\text{left}} = \ln[(2-\sqrt{3})\,q/p]/(p+q)$ and $T_{\text{right}} = \ln[(2+\sqrt{3})\,q/p]/(p+q)$.

However, the Bass diffusion model cannot consider external variables that can affect diffusion. A generalized Bass diffusion model (Bass et al., 1994) was developed to overcome this limitation. In the generalized Bass model, the mapping function $x(t)$, which describes the current effect of the decision variables on the conditional probability of adoption at time $t$, is added to Eq. (1):

$$n(t) = m[F(t) - F(t-1)]\,x(t) + \varepsilon, \tag{3}$$

where $x(t) = 1 + \beta v(t)/v'(t)$ represents the pricing effect, $v(t)$ and $v'(t)$ represent the absolute price and the rate of price change, respectively, and $\beta$ reflects the sensitivity to the price change (Bass et al., 1994). In addition, the form $x(t) = 1 + \beta v(t)$ is found in Jun and Park (1999). The generalized Bass diffusion model has been used to study optimal pricing and advertising policies for single-generation products (Krishnan & Jain, 2000).

2.2. Support vector regression

The SVM is a popular machine learning method for classification, regression, and other learning tasks (Vapnik, 1998). We consider a set of training points, $\{(x_1, y_1), \ldots, (x_l, y_l)\}$, where $x_i \in R^n$ is a feature vector and $y_i \in R$ is the target output. In theory, a function that is linear in the mapped feature space can represent the nonlinear relationship between the input and output data. An SVR function is defined as

$$f(x) = w^{T}\phi(x) + b, \tag{4}$$

where $f(x)$ denotes the forecasting values, $\phi(\cdot)$ is a nonlinear mapping function, and the coefficients $w$ ($w \in R^n$) and $b$ ($b \in R$) are adjustable. Under the given parameters $C > 0$ and $\varepsilon > 0$, the standard form of SVR (Vapnik, 1998) is defined as

$$\min_{w,\,b,\,\xi,\,\xi^{*}} \; \frac{1}{2} w^{T} w + C \sum_{i=1}^{l} (\xi_i + \xi_i^{*}), \tag{5}$$

with the constraints

$$\begin{aligned} y_i - w^{T}\phi(x_i) - b &\le \varepsilon + \xi_i, \\ w^{T}\phi(x_i) + b - y_i &\le \varepsilon + \xi_i^{*}, \\ \xi_i,\ \xi_i^{*} &\ge 0, \quad i = 1, 2, \ldots, l, \end{aligned} \tag{6}$$

where $\xi_i$ denotes training errors above $\varepsilon$ and $\xi_i^{*}$ denotes training errors below $-\varepsilon$. After the quadratic optimization with inequality constraints is solved, the parameter vector $w$ in Eq. (4) is calculated as

$$w = \sum_{i=1}^{l} (\beta_i - \beta_i^{*})\,\phi(x_i), \tag{7}$$

where $\beta_i$ and $\beta_i^{*}$ are the Lagrange multipliers obtained by solving the quadratic program. Finally, the SVR function is calculated as

$$f(x) = \sum_{i=1}^{l} (\beta_i - \beta_i^{*})\,K(x_i, x) + b, \tag{8}$$

where $K(x_i, x_j) = \exp(-\gamma \lVert x_i - x_j \rVert^{2})$ is the Gaussian radial basis function (RBF) kernel. Chang and Lin (2011) suggested trying small and large values for $C$, e.g., 1 to 1,000, deciding which are better for the data through cross validation, and finally trying several values of $\gamma$ for the better values of $C$. However, better results may be obtainable through practitioner experience.
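As a small illustration of the SVR prediction function of Section 2.2, the sketch below evaluates the Gaussian RBF kernel and the kernel expansion for a handful of support vectors, together with the ε-insensitive error implied by the slack constraints. The function names (`rbf_kernel`, `svr_predict`, `eps_insensitive_loss`) and all numeric values are hypothetical and chosen by hand; nothing here is fitted to the shipment data, and solving the quadratic program itself is not shown.

```python
import math


def rbf_kernel(x, z, gamma):
    """Gaussian RBF kernel: exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def svr_predict(x, support_vectors, dual_coefs, b, gamma):
    """Kernel expansion: sum_i (beta_i - beta_i^*) * K(x_i, x) + b.

    dual_coefs[i] is assumed to already hold the difference beta_i - beta_i^*.
    """
    return b + sum(
        coef * rbf_kernel(sv, x, gamma)
        for sv, coef in zip(support_vectors, dual_coefs)
    )


def eps_insensitive_loss(y, f, eps):
    """Error counted by the slack variables: zero inside the eps-tube."""
    return max(0.0, abs(y - f) - eps)


# Hypothetical values for illustration only (not estimated from any data).
support_vectors = [(0.0,), (1.0,), (2.0,)]
dual_coefs = [0.5, -0.2, 0.7]
b, gamma = 1.0, 0.5

prediction = svr_predict((1.0,), support_vectors, dual_coefs, b, gamma)
print(prediction)
```

A larger γ makes the kernel decay faster with distance, so each support vector influences a narrower neighbourhood; this is one reason the choice of γ, alongside C and ε, affects forecasting accuracy.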
3. The proposed model and HPSO algorithm

3.1. SVR-DE model

DE is a search heuristic that was introduced by Storn and Price (1997). It has been successfully applied in a wide variety of fields, from computational physics to operations research (Price et al., 2006). DE belongs to the class of genetic algorithms that use the biology-inspired operations of crossover, mutation, and selection on a population to minimize an objective function over the course of successive generations (Mitchell, 1998). DE uses floating-point instead of bit-string encoding of population members, and arithmetic instead of logical operations in mutation. Its advantages include a simple structure, ease of use, speed, and robustness (Storn & Price, 1997). Therefore, a DE algorithm can be used to find the best hyperparameters for SVR.

The DE procedure is summarized as follows (Ardia, Boudt, Carl, Mullen, & Peterson, 2011; Mullen, Ardia, Gil, Windover, & Cline, 2011). The variable NP represents the number of parameter vectors in the population. At generation 0, NP initial guesses for the optimal parameter vector are made using random values between the lower and upper bounds. Each generation involves the creation of a new population from the current population members $x_{i,g}$, where $i$ indexes the vectors and $g$ indexes the generation. This is accomplished using a differential mutation of the population members. A trial mutant parameter vector $v_{i,g}$ is created by choosing three
members of the population ($x_{r0}$, $x_{r1}$, and $x_{r2}$) at random. $v_{i,g}$ is then derived by

$$v_{i,g} = x_{r0} + F\,(x_{r1} - x_{r2}), \tag{9}$$

where $F$ is a positive scale factor with $F \in (0, 1)$. After the first mutation operation, mutation continues until either the mutation length has been reached or rand > CR, where CR $\in [0, 1]$ is the crossover probability. CR controls the fraction of the parameter values that are copied from the mutant. If an element of the trial parameter vector is found to violate the bounds after mutation and crossover, it is reset in such a way that the bounds are respected (with the specific protocol depending on the
implementation). The objective function values associated with the children are then determined. If a trial vector has an equal or lower objective function value than the previous vector, it replaces the previous vector in the population; otherwise, the previous vector remains.

The choices of NP, F, and CR depend on the specific problem. Price et al. (2006) suggested that the number of parents NP should be 10 times the number of parameters. Further, they suggested that F = 0.8 and CR = 0.9. DE is much more sensitive to the choice of F than to the choice of CR; CR acts more like a fine-tuning element. High CR values such as 1 allow for faster convergence if convergence occurs. Sometimes, however, CR has to be as low as 0 to make DE robust enough for a particular problem. For more details on the DE strategy, refer to Price et al. (2006) and Storn and Price (1997).

To improve the forecasting accuracy, we propose a model based on SVR with a DE algorithm (see Fig. 1). The suggested procedure is as follows.

Step 1: Initialize a set of parameter vectors ($C_i$, $\gamma_i$, and $\varepsilon_i$) in the population.

Step 2: Define the fitness function as the mean absolute percentage error (MAPE):

$$\text{MAPE} = \frac{\sum_{t=1}^{n} \left| (A_t - F_t)/A_t \right|}{n}, \tag{10}$$

where $A_t$ is the actual value at period $t$, $F_t$ is the forecasting value at period $t$, and $n$ is the number of periods used in the calculation.

Step 3: Choose NP = 30, CR = 0.5, and F = 0.8.

Step 4: Train an SVR for each individual in the population and calculate its MAPE.

Step 5: If the convergence rule is satisfied or the maximum number of iterations (200) is reached, output the best hyperparameter estimates.

Step 6: Train an SVR with the best hyperparameter estimates. Once the termination tolerance (0.0001) is met, output the forecasting values.

Fig. 1. SVR-DE flowchart.
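The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses binomial crossover and simple clipping at the bounds (the text describes a run-length crossover rule and leaves the bound-reset protocol to the implementation), and the fitness function is a toy stand-in that fits a two-parameter linear trend by MAPE instead of training an SVR. The names `mape`, `differential_evolution`, and `toy_fitness`, the toy data, and the fixed seed are all hypothetical; NP = 30, CR = 0.5, F = 0.8, and 200 iterations follow Steps 3 and 5.

```python
import random


def mape(actual, forecast):
    """Mean absolute percentage error: mean of |(A_t - F_t) / A_t|."""
    return sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def differential_evolution(fitness, bounds, NP=30, F=0.8, CR=0.5,
                           max_iter=200, seed=0):
    """DE sketch: differential mutation, binomial crossover, greedy selection."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Generation 0: NP random guesses between the lower and upper bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(NP)]
    scores = [fitness(ind) for ind in pop]
    for _ in range(max_iter):
        new_pop, new_scores = pop[:], scores[:]
        for i in range(NP):
            # Three distinct members, none equal to the target vector i.
            r0, r1, r2 = rng.sample([k for k in range(NP) if k != i], 3)
            j_rand = rng.randrange(dim)  # at least one element from the mutant
            trial = []
            for j, (lo, hi) in enumerate(bounds):
                if j == j_rand or rng.random() < CR:
                    v = pop[r0][j] + F * (pop[r1][j] - pop[r2][j])
                    v = min(max(v, lo), hi)  # assumed bound rule: clip
                else:
                    v = pop[i][j]
                trial.append(v)
            score = fitness(trial)
            # An equal or lower objective value replaces the previous vector.
            if score <= scores[i]:
                new_pop[i], new_scores[i] = trial, score
        pop, scores = new_pop, new_scores
    best = min(range(NP), key=scores.__getitem__)
    return pop[best], scores[best]


# Toy stand-in for "train an SVR and compute its MAPE" (hypothetical data).
periods = list(range(1, 11))
actual = [2.0 + 3.0 * t for t in periods]


def toy_fitness(params):
    a, b = params
    return mape(actual, [a + b * t for t in periods])


best_params, best_mape = differential_evolution(
    toy_fitness, [(0.0, 10.0), (0.0, 10.0)]
)
print(best_params, best_mape)
```

To tune SVR hyperparameters in this way, `toy_fitness` would be replaced by a function that trains an SVR with a candidate (C, γ, ε) and returns its MAPE, with `bounds` set to the search range of each hyperparameter.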
Table 1
Raw data.

Time  Shipment     Market value     Time  Shipment     Market value
      (thousand    (million US$)          (thousand    (million US$)
      units)                              units)
1     12,510       1,076.0          29    27,121       1,505.0
2     10,037       1,026.0          30    25,532       1,400.0
3     14,060       1,255.0          31    28,740       1,598.0
4     17,124       1,497.0          32    31,910       1,751.0
5     14,087       1,076.0          33    28,436       1,477.1
6     13,545       1,026.0          34    25,382       1,312.8
7     16,706       1,255.0          35    29,927       1,597.2
8     20,040       1,497.0          36    27,371       1,392.5
9     18,725       1,260.0          37    26,884       1,381.9
10    18,404       1,237.0          38    23,861       1,235.2
11    22,158       1,492.0          39    28,758       1,444.2
12    25,085       1,685.0          40    29,594       1,463.3
13    21,080       1,437.0          41    25,232       1,229.3
14    15,521       1,071.0          42    22,670       1,097.8
15    20,746       1,463.0          43    26,490       1,234.2
16    23,218       1,676.0          44    22,351       1,028.9
17    21,667       1,464.0          45    22,077       996.6
18    19,106       1,238.0          46    20,697       923.5
19    22,153       1,425.0          47    23,349       1,036.1
20    23,628       1,509.0          48    18,251       814.3
21    23,514       1,430.0          49    15,306       695.0
22    22,282       1,361.0          50    13,302       601.0
23    27,770       1,750.0          51    15,521       701.0
24    29,943       1,833.0          52    14,099       637.0
25    26,598       1,653.0          53    16,459       753.0
26    24,028       1,326.0          54    15,409       707.0
27    26,819       1,517.0          55    16,808       763.0
28    30,542       1,732.0          56    14,932       677.0
Table 2
Parameter estimates using the Bass diffusion model.
Parameter
Estimate
Std. error
t value
P-value
p q m
0.00800 0.06025 1,448,000
0.00037 0.00366 44,440
21.40 16.45 32.59