Available online at www.sciencedirect.com
ScienceDirect Procedia Manufacturing 2 (2015) 82 – 86
2nd International Materials, Industrial, and Manufacturing Engineering Conference, MIMEC2015, 4-6 February 2015, Bali Indonesia
A clustering-based sales forecasting scheme using support vector regression for computer server Wenseng Daia, Yang-Yu Chuangb, Chi-Jie Lub* b
a International Monetary Institute, Financial School, Renmin University of China, Beijing 100872, China Department of Industrial Management, Chien Hsin University of Science and Technology, Taoyuan County 32097, Taiwan
Abstract In this study, a clustering-based sales forecasting scheme based on support vector regression (SVR) is proposed. The proposed scheme first uses k-means algorithm to partition the whole training sales data into several disjoint clusters. Then, for each group, the SVR is applied to construct forecasting model. Finally, for a given testing data, three similarity measurements are used to find the cluster which the testing data belongs to and then employee the corresponding trained SVR model to generate prediction result. A real aggregate sales data of computer server is used as an illustrative example to evaluate the performance of the proposed model. Experimental results revealed that the proposed clustering-based sales forecasting scheme outperforms the single SVR without data clustering and hence is an effective alternative for computer server sales forecasting. © The Authors. Published byThis Elsevier B.V. access article under the CC BY-NC-ND license © 2015 2015 Published by Elsevier B.V. is an open Selection and Peer-review under responsibility of the Scientific Committee of MIMEC2015. (http://creativecommons.org/licenses/by-nc-nd/4.0/). Selection and Peer-review under responsibility of the Scientific Committee of MIMEC2015 Keywords: Sales forecasting; computer server; clustering algorithm; support vector regression
1. Introduction Computer server industry is one of the most important segments in the internet age due to web search engines and internet application companies start to use cloud computing as their leading solution to achieve mass-data searching and accessing within a short period time. As the computer servers have long life cycle and premium price over other
* Corresponding author. Tel.: +886-3-4581196 ext 6712. E-mail address:
[email protected];
[email protected]
2351-9789 © 2015 Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Selection and Peer-review under responsibility of the Scientific Committee of MIMEC2015 doi:10.1016/j.promfg.2015.07.014
Wenseng Dai et al. / Procedia Manufacturing 2 (2015) 82 – 86
computer products, how to create an effective sales forecast model becomes an essential topic for computer server providers. Sales forecasting is one of the most important tasks in many companies since it can be used to determine the required inventory level to meet consumer demand and avoid the problem of under/over stocking. A number of studies have been conducted which focus on sales forecasting in various industries [1-5]. However, only few studies concentrate on sales forecasting in computer product industry [6-8]. To the best knowledge of authors, there is no study focus on developing sales forecasting model for computer server. Many studies have proposed different kinds of clustering-based forecasting models to improve prediction performance [6,9,10]. For a given input data, the clustering-based forecasting model first uses a data clustering algorithm to partition the whole input data into several disjoint groups. Then, for each group, the prediction model is constructed to produce the output. Finally, for the prediction target, finding the group it belongs to and using the trained forecasting method corresponding to the cluster to generate final forecasting results. Since the partitioned groups have more uniform/stationary data structure than that of the whole input data, it will become easier for constructing effective forecasting model. However, most existing studies used single similarity measurement method to calculate the similarity of the prediction target and the clusters. This may make the clustering-based forecasting models cannot provide stable and effective results. In this study, a clustering-based sales forecasting scheme by combining K-means algorithm with SVR and three similarity measurements is proposed. Although there are many different clustering algorithms have been proposed, but K-means algorithm is still one of the most effective clustering algorithms [11]. This study considers the SVR as the predictor due to its great potential and superior performance in practical applications [13]. SVR based on statistical learning theory is an effective neural network algorithm for solving nonlinear regression estimation problems and has been successfully used in sales forecasting[6-8]. In the proposed clustering-based sales forecasting scheme, first, the predictors are used as inputs of K-means algorithm to group the input data into several disjoint clusters. Then, for each cluster, the SVR forecasting model is constructed and the final forecasting results can be obtained. Finally, for a given testing data, three similarity measurements including MiniMax, Median and Mean methods are used to find the cluster which the testing data belongs to and then employee the corresponding trained SVR model to generate prediction result. A real monthly aggregate sales data of computer server collected from a computer server company in Taiwan is utilized as an illustrative example to evaluate the performance of the proposed model. The single SVR model without data clustering is used as baseline method in this paper. 2. Support vector regression Support vector regression (SVR) is an artificial intelligent forecasting tool based on statistical learning theory and structural risk minimization principle [13]. It can be expressed as ݂ሺݔሻ ൌ ൫ࢠ ή ሺݔሻ൯ ܾ, where ࢠ is weight vector, ܾ is bias and ሺݔሻ is a kernel function which use a non-linear function to transform the non-linear input to be linear mode in a high dimension feature space. Vapnik [13] introduced so-called ε-insensitivity loss function to SVR. When the predicted value falls into the band area, the loss is zero. Contrarily, if the predicted value falls out the band area, the loss is equal to the difference between the predicted value and the margin. After selecting proper modifying coefficient (C), width of band area (ɂ) and kernel function (ሺሻሻ, the optimum of each parameter can be resolved though Lagrange function. Since the most widely used kernel unction is the radial basis function (RBF), it is applied in this study as kernel function. The performance of SVR is mainly affected by the setting of parameters C,Ɂ and ɂሺthe parameter of RBF). This study employees grid search which uses exponentially growing sequences of the parameters to identify good parameters (for example, ൌ ʹିଵହ ǡ ʹିଵଷ ǡ ǥ ǡ ʹଵହ ).
3. Proposed clustering-based forecasting scheme In this study, a clustering-based sales forecasting scheme is proposed. The proposed model contains training and testing stages. In the training phase, the training dataset is divided into several disjoint groups and the SVR is used in each group to build forecasting model. The detailed procedure of the training phase is as following steps.
83
84
Wenseng Dai et al. / Procedia Manufacturing 2 (2015) 82 – 86
1.
Collect the training dataset. Given the training dataset ൛ሺǡ ୨ ሻǡ ൌ ͳǡ ʹǡ ǥ ǡ ൟ, the target variable is ൌ ሺ୧ ሻ containing data points, and the corresponding predictors are ୧ . Each predictor has data points, i.e., ୨ ൌ ൫୨ ൯. 2. Data scaling. In this step, the original target variable and predictors are respectively scaled into the range of [-1.0, 1.0] by utilizing min-max normalization method. 3. Data clustering. The K-means algorithm is used to the predictors ୨ for clustering the training dataset into disjoint training clusters. Each group contains different training dataset. In the d-th training cluster, ൌ ͳǡ ʹǡ ǥ ǡ , the target output ሺሻ ൌ ሺሺሻ୧ ሻ and predictors ݂ሺ݀ሻ ൌ ൫݂ሺ݀ሻఉ ൯ contains data points, ൏ ܰ. 4. Building forecasting model for each group. The SVR algorithm is used to each training cluster to construct SVR forecasting models (SVR(), ൌ ͳǡ ʹǡ ǥ ǡ ). Every constructed SVR model is the most adequate one for a particular cluster. In the testing phase, the detailed steps are described as follows: 1.Collect the testing dataset. Let ᇱ ൌ ሺ୧ᇱ ሻ is the target variable and ୨ᇱ are the corresponding predictors in the testing dataset. Each predictor ᇱ ൯. contains data points, i.e., ୨ᇱ ൌ ൫୨ 2. For each testing sample, find the training cluster which it belongs to. This study uses Euclidean distance to measure the similarity between a given testing sample ୧ᇱ and the training clusters , and then assign the testing sample to a training cluster according to the minimum Euclidean distance. Three different similarity measurement methods are used to get effective assignments. They are MinMax, Median and Mean methods. In the Mean method, the mean statistic is used to get the central tendency (i.e. the main attribute) of a group. It is the most used statistic to measure the similarity of two objects using Euclidean distance. For each training cluster, ̴ሺሻ୨ is the mean of the -th predictors in the cluster , ̴ሺሻ୨ ൌ ሺሺሻ୨ஒ ሻ, where ሺሻ୨ஒ is the Ⱦ-th value of the -th predictor in the training cluster . The distance ୢ between a given testing sample୧ᇱ and every training ଶ
′ ′ clusters can be calculated by ୢ ൌ ටσ୮୨ୀଵሺ୨ െ ̴ሺሻ୨ ሻ , where ୨ is the Ƚ-th value of the -th predictor of testing
sample. The most adequate group G for the given testing sample is the cluster with minimum distance ୢ , ൌ ሺ ୢ ሻ. As the median is less affected by outliers than the mean when finding the central tendency of a group, in the Median method, the median statistic is used instead of mean. Using the same concept with Mean method, for each training cluster, ̴ሺሻ୨ ൌ ሺሺሻ୨ஒ ሻis the median of the -th predictors in the cluster . The distance ୢ can be ଶ
′ െ ̴ሺሻ୨ ሻ . The most suitable cluster G for the testing sample is also determined by calculated by ୢ ൌ ටσ୮୨ୀଵሺ୨
ൌ ሺ ୢ ሻ. Both Mean and Median methods use only one attribute of the group to measure the similarity of two objects, which may ignore some useful information of the group. In order to get more attributes from the group to similarity measurement, every data points in the group are used as attributes. The concept is used in the MinMax method. In this method, for a given testing sample, the distance ߣሺ݀ሻఉ between a given testing sampleݕᇱ and every data points ଶ
′ െ ݂ሺ݀ሻఉ ൯ ቇ ǡ ߚ ൌ ͳǡ ʹǡ ǥ ǡ ݊Ǥ Then, final distance ܳௗ in the training cluster ݀ are first calculated, ߣሺ݀ሻఉ ൌ ቆටσୀଵ൫݂ఈ
of the given testing sample and the training cluster ݀ is the maximum value of the distance ߣሺ݀ሻఉ , ܳௗ ൌ ݉ܽݔ൫ߣሺ݀ሻఉ ൯Ǥ It is to measure the worst distance of the sample ݕᇱ and the cluster ݀. Finally, the cluster which has the minimum value of the distance ୢ is the most suitable group for the testing sample ୧ᇱ , ൌ ሺ ୢ ሻ. 3. Obtain the predicted value of the testing samples. After determining the most suitable cluster for each testing sample, the trained SVR forecasting model (SVR(G)) corresponding to the cluster is used to obtain the predicted results of the testing sample.
85
Wenseng Dai et al. / Procedia Manufacturing 2 (2015) 82 – 86
4. Experiential results In order to evaluate the performance of the proposed clustering-based sales forecasting scheme, the monthly aggregate sales data of a computer server company in Taiwan are used in this study. The sales data from January 2002 to December 2012 is collected and shown in Figure 1. There are a total of 120 data points in the dataset. The data set is divided into training and testing data sets. The first 96 data points (80% of the total sample points) are used as the training samples while the remaining 24 data points (20% of the total sample points) are employed as the holdout and used as the testing sample for measuring out-of-sample forecasting ability. 80000
Sales a mount
70000 60000 50000 40000 1
10
19
28
37
46
55
64
73
82
91
100
109
118
Data points
Fig. 1. The monthly aggregate sales amount of a computer server company from January January 2002 to December 2012.
According to the three different similarity measurements, the proposed clustering-based forecasting scheme can be viewed as three forecasting models. They are Mean-F model using Mean method, Median-F model employing Median method and MinMax-F model applying MinMax method. The prediction results of the proposed three sales forecasting schemes are compared to those of SVR model without data clustering (called single SVR model). For building forecasting models, five predictors employed in [7,8] including previous month’s sales amount (T1), previous two months’ sales amount (T-2), previous three months’ sales amount (T-3), 3-month moving average (MA3) and sales amount at the same time last year (T-12) are used in this study. In this study, all of the four forecasting schemes are used for one-step-ahead forecasting of monthly sales data. The prediction performance is evaluated using the root mean square error (RMSE), mean absolute deviation (MAD), mean absolute percentage error (MAPE), root mean square percentage error (RMSPE) and normalized mean square error (NMSE). Table 4. Forecasting results using single SVR model and the three clustering-based forecasting models with different groups
86
Wenseng Dai et al. / Procedia Manufacturing 2 (2015) 82 – 86
For modelling the single SVR model, the five predictor variables are used as input variables. By using the grid search, the parameter set ( ܥൌ ʹଵହ ,ߜ ൌ ʹିଷ , ߝ ൌ ʹିହ ) gives minimum testing MSE and considers the best parameter set for the single SVR model in forecasting aggregate sales of computer server. In building the three clustering-based forecasting models, how to determine the number of groups is the most important issue affecting the forecasting result. In order to obtain a better forecasting result, two to six groups are tested in the Mean-F model, Median-F model and MinMax-F models. The group with minimum testing error is the most suitable numbers of groups. In training the three clustering-based forecasting models, like the process of modeling single SVR model, the best parameter sets of the SVR models in the three forecasting models can be determined by the grid search. The sales forecasting results in terms of various measurement criteria using the single SVR and three clustering-based models with different groups are computed and listed in Table 4. Table 4 depicts that MAPE, RMSPE, RMSE, MAD and NMSE of the MiniMax-F model with 2 groups (called Mini-F(2G)) are 2.72%, 3.55%, 1911, 1522 and 0.0474, respectively. It can be observed that these values are smaller than those of the single SVR model and the two clustering-based models. It indicates that there is a smaller deviation between the actual and predicted values using the MiniMax-F model. Thus, for forecasting aggregate sales of computer server, the proposed MiniMax-F model with 2 groups can provide better forecasting result than the three competing models. 5. Conclusion This paper proposed a clustering-based scheme by combining K-means algorithm with SVR and three similarity measurements for forecasting aggregate sales of computer server. The monthly aggregate sales data of a computer server company in Taiwan are used in this study for evaluating the performance of the proposed method. Experimental results showed that the proposed sales forecasting scheme based on MiniMax similarity measurement produces the best forecasting result and outperforms the three competing models. It is an effective alternative for sales forecasting of computer server. Acknowledgements This work is partially supported by the Ministry of Science and Technology, Taiwan, under Grant no. Most 1032221-E-231-003-MY2. References [1]. [2]. [3]. [4]. [5]. [6]. [7]. [8]. [9]. [10]. [11]. [12]. [13].
S. Thomassey, M. Happiette. A neural clustering and classification system for sales forecasting of new apparel items. Applied Soft Computing. 7 (2007) 1177-1187. S. Thomassey. Sales forecasts in clothing industry: the key success factor of the supply chain management. International Journal of Production Economics. 128 (2010) 470-483. Z.L. Sun, T.M. Choi, K.F. Au and Y. Yu. Sales forecasting using extreme learning machine with applications in fashion retailing. Decision Support Systems. 46 (2008) 411-419. M. Xia and W.K. Wong. A seasonal discrete grey forecasting model for fashion retailing. Knowledge-Based Systems 57 (2014) 119-126. A. Sa-ngasoongsong, S. T.S. Bukkapatnam, J. Kim, P. S. Iyer, R.P. Suresh. Multi-step sales forecasting in automotive industry based on structural relationship identification. International Journal of Production Economics. 140 (2012) 875-887. C.J. Lu and Y.W. Wang. Combining independent component analysis and growing hierarchical self-organizing maps with support vector regression in product demand forecasting. International Journal of Production Economics. 128 (2010) 603-613. C.J. Lu. Sales forecasting of computer products based on variable selection scheme and support vector regression. Neurocomputing. 128 (2014) 491-499. C.J. Lu, T.S. Lee and C.M. Lian. Sales forecasting for computer wholesalers: A comparison of multivariate adaptive regression splines and artificial neural networks. Decision Support Systems. 54(1) (2012) 584-596. F.E.H. Tay and L.J. Cao. Improved financial time series forecasting by combining support vector machines with self-organizing feature map. Intelligent Data Analysis. 5 (2001) 339--354. R.K. Lai, C.Y. Fan, W.H. Huang and P.C. Chang. Evolving and clustering fuzzy decision tree for financial time series data forecasting. Expert Systems with Applications. 36 (2009) 3761--3773. A.K. Jain. Data clustering: 50 years beyond K-means. Pattern Recognition Letters. 31 (2010) 651--666. C.J. Lu, T.S. Lee and C.C. Chiu. Financial time series forecasting using independent component analysis and support vector regression. Decision Support Systems. 47 (2009) 115-125. V.N. Vapnik. The Nature of Statistical Learning Theory. Springer, New York, 2000.