Predicting the Total Workload in Telecommunications by SVMs
Mingfang Zhu1,2, Changjie Tang1, Shucheng Dai1, Yong Xiang1, Shaojie Qiao1, Chen Yu1
(1. School of Computer Science, Sichuan University, Chengdu, 610065, China; 2. Dept. of Computer Sci. & Tech., Shaanxi Univ. of Tech., Hanzhong, 723003, China)
{zhumingfang, tangchangjie, daishucheng, xiangyong, qiaoshaojie, chenyu}@cs.scu.edu.cn
Supported by Grant of National Science Foundation of China (60473071), Specialized Research Fund for Doctoral Program by the Ministry of Education (SRFDP 20020610007) and the Software Innovation Project of Sichuan Youth (AA0807).
Abstract
As a learning mechanism, the support vector machine (SVM) has been studied and applied in a wide range of areas. This study addresses the specific features of SVMs in predicting the total workload in telecommunications. The contributions include: (a) building a prediction model of the total workload in telecommunications and forecasting with it; (b) analyzing the parameters of support vector regression (SVR) that influence its performance; and (c) experiments demonstrating that the SVM model in this paper outperforms the other methods in this area.
1. Introduction
Prediction is an important data mining task, widely used in scientific projects to support decision making. Currently, popular prediction methods for time series, such as time series analysis, the grey model method, regression analysis, and artificial neural networks, are in wide use.

Several studies have attempted to identify the most effective model for forecasting the workload in telecommunications. Among these works, [1] applies grey theory to build a prediction model, [2] provides a multivariable fuzzy inference forecasting method, and [3] provides a classified prediction model based on set pair analysis.

Time series analysis methods [], [] aim to find statistical principles in historical data, combine them with experience, and then make decisions. This approach requires the time series to have a stable trend; otherwise its predictions are very imprecise. Regression prediction is built on statistical theory; it searches for a linear relationship between the response variable and the predictor variables. This method needs a large data set; otherwise it is difficult to find a statistical pattern.

Unlike traditional statistical models, artificial neural networks [] are data-driven, nonparametric weak models that let the data speak for themselves. They are machine learning methods built on the empirical risk minimization (ERM) principle, which only minimizes the training error. In real applications there is no structured way to design the network architecture, and such networks suffer from over-fitting or under-fitting, local optima, and similar problems. All of these limit the prediction precision of ANNs.

Support vector regression (SVR) extends the excellent properties of support vector machines
(SVMs) to function approximation problems.

The support vector machine (SVM) was developed by Vapnik and his coworkers. It is based on the structural risk minimization (SRM) principle, which seeks to minimize an upper bound of the generalization error consisting of the sum of the training error and a confidence interval. SVMs have powerful learning ability and good generalization ability. The SVM has a concise mathematical form and an intuitive geometric interpretation, and it achieves higher generalization performance than traditional neural networks, which implement the ERM principle, on many machine learning problems. Another key characteristic of the SVM is that training it is equivalent to solving a linearly constrained quadratic programming problem, so the solution of an SVM is always unique and globally optimal.

2. Problem Specification

The main idea of SVM classification is to construct a hyper-plane that separates the data into two classes. Given a sample set T = {(x_i, y_i) | x_i ∈ R^N, y_i ∈ {1, −1}, i = 1, 2, …, l} ⊂ (X × Y)^l, generated randomly and independently from an unknown function, x_i is a sample vector, y_i is the class label, and l is the total number of samples. Assume the samples can be separated linearly. Let the decision function be f(x) = sgn(g(x)), where g(x) = (w · x) + b. The optimal hyper-plane is g(x) = 0, where w is the weight vector and b is a bias. To maximize the margin of the hyper-plane, the following quadratic optimization problem (1) is minimized:

$$ \min_{w,b}\; W(w)=\frac{1}{2}\|w\|^{2} \quad \text{s.t.}\;\; y_i((w\cdot x_i)+b)\ge 1,\; i=1,2,\dots,l \tag{1} $$

If the training set is nonlinearly separable, the optimal hyper-plane is constructed in a feature space H, which is equivalent to the optimization problem (2):

$$ \min_{w\in H,\, b\in\mathbb{R}}\; W(w)=\frac{1}{2}\|w\|^{2} \quad \text{s.t.}\;\; y_i((w\cdot\Phi(x_i))+b)\ge 1,\; i=1,2,\dots,l \tag{2} $$

If the training set is inseparable, slack variables ξ_i have to be introduced. The constraint conditions of (2) are then modified, giving the optimization problem (3):

$$ \min_{w\in H,\, b\in\mathbb{R},\, \xi\in\mathbb{R}^{l}}\; W(w)=\frac{1}{2}\|w\|^{2}+C\sum_{i=1}^{l}\xi_i \quad \text{s.t.}\;\; y_i((w\cdot\Phi(x_i))+b)\ge 1-\xi_i,\;\; \xi_i\ge 0,\;\; i=1,2,\dots,l \tag{3} $$

where C is the penalty parameter; the constant C > 0 represents the trade-off between the margin errors and the margin.

In SVM, problems (1), (2) and (3) are also called the SVM primal problems. In practical applications, we study their dual problems and then obtain the optimal hyper-plane.
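To make the soft-margin primal problem (3) concrete, the following minimal sketch fits a soft-margin SVM classifier with scikit-learn rather than the paper's own code; the toy sample set and the value of C are invented for illustration.

```python
# A minimal sketch of the soft-margin problem (3) in practice, using
# scikit-learn's SVC; the toy data and C value are hypothetical.
import numpy as np
from sklearn.svm import SVC

# Toy sample set T = {(x_i, y_i)} with labels y_i in {1, -1}
X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.9], [1.0, 1.0], [0.9, 1.1]])
y = np.array([-1, -1, -1, 1, 1])

# C trades off the margin errors (slack variables xi_i) against the
# margin width, as in problem (3)
clf = SVC(kernel="linear", C=10.0)
clf.fit(X, y)

# Decision function g(x) = (w . x) + b; the predicted label is sgn(g(x))
w, b = clf.coef_[0], clf.intercept_[0]
print(w, b, clf.predict([[0.5, 0.5]]))
```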
3. Preliminary Concepts and Basic Terminologies

An interesting property of the SVM is that it is an approximate implementation of the structural risk minimization induction principle, which aims at minimizing a bound on the generalization error of a model rather than minimizing the mean squared error over the training data set. The SVM is considered a good learning method that can overcome the internal drawbacks of neural networks.

The dual of the optimization problem (3) above is represented as (4):

$$ \min_{\alpha}\; Q(\alpha)=\frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l} y_i y_j \alpha_i \alpha_j K(x_i,x_j)-\sum_{i=1}^{l}\alpha_i \quad \text{s.t.}\;\; \sum_{i=1}^{l} y_i\alpha_i=0,\;\; 0\le\alpha_i\le C,\;\; i=1,2,\dots,l \tag{4} $$

where α = (α_1, α_2, …, α_l) is the vector of nonnegative Lagrange multipliers and K(x_i, x) = (Φ(x_i) · Φ(x)) is a Hilbert space inner product, called the kernel function. Suppose the optimal values of the Lagrange multipliers are denoted by α_i* (i = 1, 2, …, l). The decision function is then (5):

$$ f(x)=\operatorname{sgn}(g(x))=\operatorname{sgn}\Bigl(\sum_{i=1}^{l} y_i\alpha_i^{*}K(x_i,x)+b^{*}\Bigr) \tag{5} $$

where b* is the bias, computed as follows: select an $\alpha_j^{*}>0$, then $b^{*}=y_j-\sum_{i=1}^{l} y_i\alpha_i^{*}K(x_i,x_j)$.
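The decision function (5) can be checked numerically: scikit-learn's SVC stores the products y_i·α_i* for its support vectors in dual_coef_ and the bias b* in intercept_, so (5) can be evaluated by hand and compared with the library's own prediction. The one-dimensional data below is a hypothetical placeholder, not the paper's.

```python
# A sketch verifying decision function (5) against scikit-learn's SVC.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = np.array([-1, -1, 1, 1, 1])
clf = SVC(kernel="rbf", gamma=0.5, C=100.0).fit(X, y)

x_new = np.array([[1.6]])
K = rbf_kernel(clf.support_vectors_, x_new, gamma=0.5)  # K(x_i, x) for SVs
g = clf.dual_coef_ @ K + clf.intercept_  # sum_i y_i a_i* K(x_i, x) + b*
print(np.sign(g), clf.predict(x_new))    # sgn(g(x)) agrees with predict
```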
4. SVMs for Regression Estimation Model

SVMs have been successfully applied to forecasting problems in numerous fields. These studies show that the SVR model outperforms other forecasting methods, such as BP neural networks, RBF networks, ARIMA, etc.

Regression analysis is also called function approximation. Its proposition is: given a sample data set T = {(x_i, y_i) | x_i ∈ R^N, y_i ∈ R, i = 1, 2, …, l} ⊂ (X × Y)^l, where the x_i are the predictor variables, the y_i are the response values, and l is the number of samples, the task is to find a function y = f(x) that is the best function under a certain measure. By analogy with the SVM classification definition, let each response variable y_i be shifted up and down by a real number ε > 0; the data set T then becomes D = D+ ∪ D−, where D+ = {(x_i, y_i + ε), i = 1, 2, …, l} and D− = {(x_i, y_i − ε), i = 1, 2, …, l}. We view D as a classification problem; our aim is then to find a hyper-plane that separates D into D+ and D−.

Here, we state the SVR model as (6):

$$ \min_{\alpha^{(*)}\in\mathbb{R}^{2l}}\; Q(\alpha^{(*)})=\frac{1}{2}\sum_{i,j=1}^{l}(\alpha_i^{*}-\alpha_i)(\alpha_j^{*}-\alpha_j)K(x_i,x_j)+\varepsilon\sum_{i=1}^{l}(\alpha_i^{*}+\alpha_i)-\sum_{i=1}^{l}y_i(\alpha_i^{*}-\alpha_i) $$
$$ \text{s.t.}\;\; \sum_{i=1}^{l}(\alpha_i^{*}-\alpha_i)=0,\quad 0\le\alpha_i^{(*)}\le\frac{C}{l},\;\; i=1,2,\dots,l \tag{6} $$

where K(x_i, x_j) is the kernel function. Note that here α^{(*)} = (α_1, α_1*, …, α_l, α_l*)^T is a vector in R^{2l}. A sample x_j is called a support vector if α_j* ≠ 0 or α_j ≠ 0.

If the optimal solution of (6) is α^{(*)} = (α_1, α_1*, …, α_l, α_l*)^T, then the regression function is (7):

$$ f(x)=\sum_{i=1}^{l}(\alpha_i^{*}-\alpha_i)K(x_i,x)+b^{*} \tag{7} $$

5. Predicting the Total Workload in Telecommunications by Support Vector Machines

5.1. Data Set and Results Analysis

In [], the researchers analyze the relationship between the business total of posts and telecommunications and the first, second and third GNP; the data are preprocessed as rates of increase. The prediction of the business total of posts and telecommunications thus turns into the prediction of its rate of increase.

We choose the yearly data from 1991 to 1999 as the training set for the SVR, and the data of year 2000 as the prediction data. The selected kernel function is the RBF, with parameters C = 2000, ε = 0.05, σ = 1.1. The experiments are programmed in MATLAB 6.5. The regression curve is shown in Figure 1.

Figure 1. The real curve and the estimated curve

Figure 2 shows the spectrum of α^{(*)} versus the samples. From Figure 2, we see that samples 1, 2, 3, 4, 5 and 7 are support vectors, because their absolute errors are greater than or equal to ε = 0.05. In particular, sample No. 3 (year 1993) has absolute error 0.0532, and thus α_3 = C = 2000 (such samples are called boundary support vectors); we regard it as an outlier point. It indicates that in 1993 social development changed very rapidly or very greatly, which calls for analysis by economists. For forecasting, when this sample is eliminated, the precision of the forecast improves.

The choice of the kernel function and its parameter(s) and of the regression parameters C and ε is very important; they depend on the real problem at hand.
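As a hedged illustration of this setup, the sketch below reproduces the parameter choices (RBF kernel, C = 2000, ε = 0.05, σ = 1.1, taking gamma = 1/(2σ²) as the usual RBF convention) using scikit-learn's SVR in place of the original MATLAB 6.5 program. The predictor rows and target rates are invented placeholders, since the paper's data set is not reproduced here.

```python
# A sketch of the experiment's SVR setup with hypothetical data.
import numpy as np
from sklearn.svm import SVR

X_train = np.array([  # one row per year, 1991-1999: hypothetical growth
    [0.09, 0.14, 0.12], [0.11, 0.18, 0.15], [0.12, 0.20, 0.17],
    [0.10, 0.17, 0.14], [0.09, 0.15, 0.13], [0.08, 0.13, 0.12],
    [0.07, 0.12, 0.11], [0.06, 0.10, 0.10], [0.06, 0.09, 0.09]])
y_train = np.array([0.21, 0.25, 0.31, 0.28, 0.24, 0.22, 0.19, 0.17, 0.15])

sigma = 1.1                              # RBF width from the paper
svr = SVR(kernel="rbf",
          C=2000.0,                      # penalty parameter C
          epsilon=0.05,                  # epsilon-tube width
          gamma=1.0 / (2 * sigma ** 2))  # common RBF convention, assumed
svr.fit(X_train, y_train)

x_2000 = np.array([[0.05, 0.08, 0.09]])  # hypothetical year-2000 inputs
print(svr.predict(x_2000))               # forecast rate of increase
print(svr.support_)                      # samples with |error| >= epsilon
```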
Our model has high estimation precision: the mean squared error is only 1.46%, and the biggest relative error, 3.75%, occurred in 1991.
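For reference, a minimal sketch of how the two error figures quoted above can be computed, assuming arrays y_true and y_pred hold the actual and SVR-estimated rates of increase (the paper's own series is not reproduced here, and the exact formulas it used are an assumption):

```python
# Assumed definitions of the two reported error measures.
import numpy as np

def mse_percent(y_true, y_pred):
    # Mean squared error, expressed as a percentage
    return 100.0 * np.mean((y_true - y_pred) ** 2)

def max_relative_error_percent(y_true, y_pred):
    # Largest relative error |y_pred - y_true| / |y_true|, as a percentage
    return 100.0 * np.max(np.abs(y_pred - y_true) / np.abs(y_true))
```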
Figure 2. Alpha spectrum of the SVR

5.2. Comparison with Other Methods

In order to show that our method has better generalization ability than the others, Table 1 illustrates the prediction results with SVR and with the other methods.

Table 1. SVR vs. other techniques

Method    Clustering  Prediction  Rel. err.
SVR       —           1.4205      0.10%
SPA [3]   A2          1.415       0.49%
MFI [2]   —           1.419       0.21%
Grey [1]  A2          1.395       1.90%

Prediction based on set pair analysis and on grey theory first clusters the data and then predicts; please see the related work for their grading and classification methods.

6. Conclusions

Since the SVM is an approximate implementation of the structural risk minimization principle, it offers better generalization performance, particularly when the available training set is limited. Unlike neural networks, the SVM is easy both to implement and to use.

This paper analyzes the mechanism of SVR and, at the same time, applies support vector machines to forecast the rate of increase of the business total of posts and telecommunications, analyzing the relationship between the parameters and the support vectors. The experimental results are satisfactory: comparing the prediction results of our method with those of the other methods shows that SVR outperforms them.

7. References

[1] WU HuiRong. Applying Gray Model to Predict Business Total of Posts and Telecommunication [J]. Journal of Nanjing University of Posts and Telecommunication, 1990, 10(2): 91-94.
[2] GAO Jie. The Application of the Multivariable Fuzzy Inference Forecasting Method to Predicting Business Total of Posts and Telecommunications [J]. Journal of Nanjing University of Posts and Telecommunication, 2000, 20(1): 58-62.
[3] GAO Jie and SHENG Zhao-han. Method and Application of Set Pair Analysis Classified Prediction [J]. Journal of Systems Engineering, 2002, 10: 458-462.
[4] L. J. Cao and Francis E. H. Tay. Support Vector Machine with Adaptive Parameters in Financial Time Series Forecasting [J]. IEEE Transactions on Neural Networks, 2003, 14(6): 1509-1518.
[5] PENG Lifang, MENG Zhiqing, JIANG Hua, TIAN Min. Application of Support Vector Machine Based on Time Sequence in Stock Forecasting [J]. Computing Technology and Automation, 2006, 25(3): 88-91.
[6] Hanh H. Nguyen and Christine W. Chan. Multiple Neural Networks for a Long Term Time Series Forecast [J]. Neural Computing & Applications, 2004, 13: 90-98.
[7] DENG NaiYang and TIAN YingJie. New Approach in Data Mining: Support Vector Machine [M]. Beijing: Science Press, 2004: 77-93.
[8] V. N. Vapnik. The Nature of Statistical Learning Theory [M]. Beijing: Tsinghua University Press, 2000.
[9] YANG LiMing. Linear Support Vector Regression Based on VC Generalization Bounds and its Application [J]. Computer Engineering and Applications, 2006, 31: 230-232.
[10] WANG Yanfeng and GAO Feng. Stock Forecasting Based on Support Vector Machine [J]. Computer Simulation, 2006, 23(11): 256-258.