Predicting the Total Workload in Telecommunications by SVMs

Mingfang Zhu1,2, Changjie Tang1, Shucheng Dai1, Yong Xiang1, Shaojie Qiao1, Chen Yu1
(1. School of Computer Science, Sichuan University, Chengdu, 610065, China; 2. Dept. of Computer Sci. & Tech., Shaanxi Univ. of Tech., Hanzhong, 723003, China)
{zhumingfang, tangchangjie, daishucheng, xiangyong, qiaoshaojie, chenyu}@cs.scu.edu.cn

Supported by Grant of National Science Foundation of China (60473071), Specialized Research Fund for Doctoral Program by the Ministry of Education (SRFDP 20020610007) and the Software Innovation Project of Sichuan Youth (AA0807).

Abstract

As a learning mechanism, the support vector machine (SVM) has been studied and applied in a wide range of areas. This study deals with the special features of SVMs in predicting the total workload in telecommunications. The contributions include: (a) building a prediction model of the total workload in telecommunications and forecasting with it; (b) analyzing the parameters of support vector regression (SVR) that influence its performance; (c) experiments demonstrating that the SVM model in this paper outperforms the other methods in this area.



1. Introduction

Prediction is an important data mining task, widely used in scientific projects to support decision making. Popular time series prediction methods, such as time series analysis, the grey model method, regression analysis, and artificial neural networks, are all in wide use.

Some studies have attempted to identify the most effective model for forecasting the workload in telecommunications. Among these works, [1] applies grey theory to build a prediction model, [2] provides a multivariable fuzzy inference forecasting method, and [3] provides a classified prediction model based on set pair analysis.

Time series analysis [], [] aims to find statistical principles in historical data, combine them with experience, and then make decisions. It requires the time series to have a stable trend; otherwise its predictions are very imprecise. Regression prediction is built on statistical theory; it searches for a linear relationship between the response variable and the predictor variables. It needs a large data set; otherwise it is difficult to find statistical regularities.

Unlike traditional statistical models, artificial neural networks [] are data-driven, nonparametric weak models that let "the data speak for themselves". They are machine learning methods built on the empirical risk minimization (ERM) principle, which minimizes only the training error. In real applications there is no structured method for designing the network architecture, and the approach suffers from over-fitting or under-fitting, local optima, and similar problems, all of which limit ANN prediction precision. Support vector regression (SVR) extends the excellent properties of the support vector machine (SVM) to function approximation problems.





The SVM was developed by Vapnik and his coworkers. It is based on the structural risk minimization (SRM) principle, which seeks to minimize an upper bound on the generalization error, consisting of the sum of the training error and a confidence interval. SVMs have powerful learning ability and good generalization ability. They have a concise mathematical form and an intuitive geometric interpretation, and they achieve higher generalization performance in many machine learning problems than traditional neural networks, which implement the ERM principle. Another key characteristic of the SVM is that training it is equivalent to solving a linearly constrained quadratic programming problem, so its solution is always unique and globally optimal.

2. Problem Specification

The main idea of SVM classification is to construct a hyper-plane that separates the data into two classes. Given a sample set T = {(x_i, y_i) | x_i ∈ R^N, y_i ∈ {1, −1}, i = 1, 2, …, l} ∈ (X × Y)^l, generated randomly and independently from an unknown function, where x_i is a sample vector, y_i is the class label, and l is the total number of samples. Assume first that the samples can be separated linearly. Let the decision function be f(x) = sgn(g(x)), where g(x) = (w · x) + b. The optimal hyper-plane is g(x) = 0, where w is the weight vector and b is a bias. Maximizing the margin of the hyper-plane amounts to the quadratic optimization problem (1):

$$\min_{w,b}\; W(w)=\frac{1}{2}\|w\|^2 \qquad \text{s.t.}\;\; y_i((w\cdot x_i)+b)\ge 1,\;\; \min_{i=1,\dots,l} y_i((w\cdot x_i)+b)=1,\quad i=1,2,\dots,l \tag{1}$$

If the training set is not linearly separable in input space, the optimal hyper-plane is constructed in a feature space H; this is equivalent to the optimization problem (2):

$$\min_{w\in H,\; b\in R}\; W(w)=\frac{1}{2}\|w\|^2 \qquad \text{s.t.}\;\; y_i((w\cdot\Phi(x_i))+b)\ge 1,\quad i=1,2,\dots,l \tag{2}$$

If the training set is still inseparable, slack variables ξ_i have to be introduced. The constraint conditions of (2) are then relaxed, giving the optimization problem (3):

$$\min_{w\in H,\; b\in R,\; \xi\in R^l}\; W(w)=\frac{1}{2}\|w\|^2 + C\sum_{i=1}^{l}\xi_i \qquad \text{s.t.}\;\; y_i((w\cdot\Phi(x_i))+b)\ge 1-\xi_i,\;\; \xi_i\ge 0,\quad i=1,2,\dots,l \tag{3}$$

where the constant C > 0 is the penalty parameter, representing the trade-off between the margin errors ξ_i and the margin.

In SVM, (1), (2) and (3) are also called the SVM's primal problems. In practical applications we study their dual problems, from which the optimal hyper-plane is obtained.
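For illustration, the soft-margin trade-off in (3) can be reproduced with off-the-shelf tools. The following is a minimal sketch assuming Python with scikit-learn (the paper's own experiments use Matlab; the data here are synthetic):

import numpy as np
from sklearn.svm import SVC

# Two overlapping classes: not linearly separable, so slack variables apply.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1.0, 1.2, (50, 2)), rng.normal(1.0, 1.2, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

# C is the penalty parameter of (3): a small C tolerates margin errors,
# a large C penalizes them heavily.
for C in (0.1, 1000.0):
    clf = SVC(kernel="rbf", C=C).fit(X, y)
    print(f"C={C}: {clf.n_support_.sum()} support vectors")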


3. Preliminary Concepts and Basic Terminologies

An interesting property of the SVM is that it is an approximate implementation of the structural risk minimization induction principle, which aims at minimizing a bound on the generalization error of a model rather than minimizing the mean squared error over the training data set. For this reason the SVM is considered a good learning method that can overcome the internal drawbacks of neural networks. The dual of the optimization problem (3) is (4):

$$\min_{\alpha}\; Q(\alpha)=\frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l} y_i y_j \alpha_i \alpha_j K(x_i,x_j)-\sum_{i=1}^{l}\alpha_i \qquad \text{s.t.}\;\; \sum_{i=1}^{l} y_i\alpha_i=0,\;\; 0\le\alpha_i\le C,\quad i=1,2,\dots,l \tag{4}$$

where α = (α_1, α_2, …, α_l) is the vector of nonnegative Lagrange multipliers and K(x_i, x_j) = (Φ(x_i) · Φ(x_j)) is an inner product in a Hilbert space, called the kernel function.
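The kernel K is the only place the feature map Φ enters the computation. As a sketch, the Gaussian RBF kernel used later in Section 5 can be computed as follows (assuming the common convention K(u, v) = exp(−‖u − v‖² / (2σ²)); the paper does not state its exact parameterization):

import numpy as np

def rbf_kernel(X, Z, sigma=1.1):
    # K[i, j] = exp(-||X[i] - Z[j]||^2 / (2 * sigma^2))
    sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

X = np.random.default_rng(0).normal(size=(5, 3))
K = rbf_kernel(X, X)
print(K.shape, np.allclose(K, K.T))  # (5, 5) True: a symmetric Gram matrix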


Suppose the optimal values of the Lagrange multipliers are denoted α_i* (i = 1, 2, …, l). The decision function is then (5):

$$f(x)=\operatorname{sgn}(g(x))=\operatorname{sgn}\Big(\sum_{i=1}^{l} y_i\alpha_i^{*}K(x_i,x)+b^{*}\Big) \tag{5}$$

where b* is the bias, computed by selecting some α_j* > 0 and setting b* = y_j − Σ_{i=1}^{l} y_i α_i* K(x_i, x_j).
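Relation (5) can be checked against a library solver. In the sketch below (an illustration assuming scikit-learn, not the authors' code), dual_coef_ stores y_i α_i* for the support vectors and intercept_ stores b*:

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)

gamma = 0.5
clf = SVC(kernel="rbf", gamma=gamma, C=10.0).fit(X, y)

# Rebuild g(x) of (5) from the dual solution and compare with predict().
x_new = np.array([0.3, -0.2])
k = np.exp(-gamma * ((clf.support_vectors_ - x_new) ** 2).sum(axis=1))
g = clf.dual_coef_[0] @ k + clf.intercept_[0]
print(int(np.sign(g)) == clf.predict([x_new])[0])  # True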

4. SVMs for Regression Estimation

SVMs have been successfully applied to forecasting problems in numerous fields. These studies show that SVR models outperform other forecasting methods, such as BP neural networks, RBF networks, ARIMA, etc.

Regression analysis is also called function approximation. Its setting is: given a sample data set T = {(x_i, y_i) | x_i ∈ R^N, y_i ∈ R, i = 1, 2, …, l} ∈ (X × Y)^l, where the x_i are the predictor variables, the y_i are the responses, and l is the number of samples, the task is to find a function y = f(x) that is best under a chosen measure. By analogy with the SVM classification setting, shift every response y_i up and down by a real number ε > 0; the data set T then becomes D = D+ ∪ D−, where D+ = {(x_i, y_i + ε), i = 1, 2, …, l} and D− = {(x_i, y_i − ε), i = 1, 2, …, l}. Viewing D as a classification problem, the aim is to find a hyper-plane that separates D+ from D−.

This leads to the SVR model (6):

$$\min_{\alpha^{(*)}\in R^{2l}}\; Q(\alpha^{(*)})=\frac{1}{2}\sum_{i,j=1}^{l}(\alpha_i^{*}-\alpha_i)(\alpha_j^{*}-\alpha_j)K(x_i,x_j)+\varepsilon\sum_{i=1}^{l}(\alpha_i^{*}+\alpha_i)-\sum_{i=1}^{l}y_i(\alpha_i^{*}-\alpha_i)$$
$$\text{s.t.}\;\; \sum_{i=1}^{l}(\alpha_i^{*}-\alpha_i)=0,\qquad 0\le\alpha_i^{(*)}\le\frac{C}{l},\quad i=1,2,\dots,l \tag{6}$$

where K(x_i, x_j) is the kernel function and α^(*) = (α_1, α_1*, …, α_l, α_l*)^T is a vector in R^{2l}. A sample x_j is a support vector if α_j* ≠ 0 or α_j ≠ 0.

If the optimal solution of (6) is ᾱ^(*) = (ᾱ_1, ᾱ_1*, …, ᾱ_l, ᾱ_l*)^T, then the regression function is (7):

$$f(x)=\sum_{i=1}^{l}(\bar{\alpha}_i^{*}-\bar{\alpha}_i)K(x_i,x)+\bar{b}^{*} \tag{7}$$
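Intuitively, (6) corresponds to fitting with the ε-insensitive loss: residuals inside the ε-tube cost nothing, and only samples outside it receive nonzero multipliers. A minimal sketch of this standard loss (not spelled out in the paper):

import numpy as np

def eps_insensitive(y_true, y_pred, eps=0.05):
    # max(0, |y - f(x)| - eps): zero inside the eps-tube.
    return np.maximum(0.0, np.abs(y_true - y_pred) - eps)

print(eps_insensitive(np.array([1.00, 1.02, 1.20]), np.full(3, 1.0)))
# -> [0.   0.   0.15]: only the third point, outside the tube, is penalized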

5. Predicting the Total Workload in Telecommunications by Support Vectors

5.1. Data Set and Results Analysis

In [], the researchers analyze the relationship between the business total of posts and telecommunications and the primary, secondary and tertiary GNP, with both preprocessed as rates of increase. Prediction of the business total of posts and telecommunications thus turns into prediction of its rate of increase.

We choose the data of 1991-1999 as the training set for the SVR, and the data of year 2000 as the prediction target. The selected kernel function is the RBF kernel, with parameters C = 2000, ε = 0.05 and σ = 1.1. The experiments are programmed in Matlab 6.5. The regression curve is shown in Figure 1.

Figure 1. The real curve and the estimated curve

Figure 2 shows the spectrum of α^(*) over the samples. From Figure 2 we see that samples 1, 2, 3, 4, 5 and 7 are support vectors, because their absolute errors are bigger than or equal to ε = 0.05. In particular, sample No. 3 (year 1993) has absolute error 0.0532 and α_3 = C = 2000 (samples at this bound are named boundary support vectors). We regard it as an outlier: it suggests that in 1993 society developed very rapidly or changed greatly, which calls for analysis by economists. Our forecasts show that when this sample is eliminated, the forecasting precision improves. The choice of the kernel function and its parameter(s), and of the regression parameters C and ε, is very important; they are related to the real problem at hand.
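The setup just described can be sketched as follows, assuming Python with scikit-learn in place of the authors' Matlab 6.5 code. The growth-rate numbers are hypothetical stand-ins (the paper's data set is not reproduced), gamma = 1/(2σ²) assumes the Gaussian convention above, and scikit-learn's box bound is C rather than the C/l of (6):

import numpy as np
from sklearn.svm import SVR

# Hypothetical yearly rates of increase standing in for the 1991-1999 series.
years = np.arange(1991, 2000, dtype=float).reshape(-1, 1)
rates = np.array([0.30, 0.28, 0.45, 0.40, 0.33, 0.25, 0.20, 0.18, 0.15])

sigma = 1.1
svr = SVR(kernel="rbf", C=2000.0, epsilon=0.05, gamma=1.0 / (2.0 * sigma ** 2))
svr.fit(years, rates)
print("year-2000 forecast:", svr.predict([[2000.0]])[0])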


Our model has high estimation precision: the mean squared error is only 1.46%, and the biggest relative error, 3.75%, occurs for the year 1991.
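The α-spectrum inspection behind Figure 2 can be mimicked on the hypothetical setup above; in scikit-learn, dual_coef_ holds the difference of the paired multipliers for each support vector, and entries whose magnitude reaches C correspond to the boundary support vectors discussed earlier:

import numpy as np
from sklearn.svm import SVR

years = np.arange(1991, 2000, dtype=float).reshape(-1, 1)
rates = np.array([0.30, 0.28, 0.45, 0.40, 0.33, 0.25, 0.20, 0.18, 0.15])
svr = SVR(kernel="rbf", C=2000.0, epsilon=0.05,
          gamma=1.0 / (2.0 * 1.1 ** 2)).fit(years, rates)

alpha = np.zeros(len(years))             # one coefficient per training sample
alpha[svr.support_] = svr.dual_coef_[0]  # nonzero only for support vectors
for yr, a in zip(years.ravel().astype(int), alpha):
    kind = "boundary SV" if np.isclose(abs(a), 2000.0) else ("SV" if a else "in tube")
    print(f"{yr}: alpha = {a:+10.4f}  ({kind})")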

Figure 2. Alpha spectrum of the SVR

5.2. Comparison with Other Methods

To show that our method has better generalization ability than the others, Table 1 illustrates the prediction results of the SVR and of the other methods.

Table 1. SVR versus other techniques

Method     Clustering   Prediction   Rel. err.
SVR        -            1.4205       0.10%
SPA [3]    A2           1.415        0.49%
MFI [2]    -            1.419        0.21%
Grey [1]   A2           1.395        1.90%

Prediction based on set pair analysis and on grey theory first clusters the data and then predicts; see the related work for their grading and classification methods.
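For clarity, the "Rel. err." column is the usual relative prediction error |prediction − actual| / actual. Back-solving from the four rows suggests an observed 2000 value of about 1.422; the check below uses that reconstructed figure for illustration only:

# Reconstructed actual value (about 1.422) inferred from Table 1, shown for
# illustration; the paper does not report the observation directly here.
actual = 1.422
for name, pred in [("SVR", 1.4205), ("SPA", 1.415), ("MFI", 1.419), ("Grey", 1.395)]:
    print(f"{name}: rel. err. = {abs(pred - actual) / actual:.2%}")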

6. Conclusions

Since the SVM is an approximate implementation of the structural risk minimization principle, it offers better generalization performance, particularly when the available training set is limited. Unlike neural networks, the SVM is easy both to implement and to use. This paper analyzes the mechanism of SVR and, at the same time, applies support vector machines to forecast the rate of increase of the business total of posts and telecommunications, analyzing the relationship between the parameters and the support vectors. The experimental results are satisfactory, and comparing the prediction results of our method with those of the other methods shows that the SVR outperforms them.

7. References

[1] WU HuiRong. Applying Gray Model to Predict Business Total of Posts and Telecommunications [J]. Journal of Nanjing University of Posts and Telecommunications, 1990, 10(2): 91-94.
[2] GAO Jie. The Application of the Multivariable Fuzzy Inference Forecasting Method to Predicting Business Total of Posts and Telecommunications [J]. Journal of Nanjing University of Posts and Telecommunications, 2000, 20(1): 58-62.
[3] GAO Jie, SHENG Zhao-han. Method and Application of Set Pair Analysis Classified Prediction [J]. Journal of Systems Engineering, 2002, 10: 458-462.
[4] L. J. Cao, F. E. H. Tay. Support Vector Machine with Adaptive Parameters in Financial Time Series Forecasting [J]. IEEE Transactions on Neural Networks, 2003, 14(6): 1509-1518.
[5] PENG Lifang, MENG Zhiqing, JIANG Hua, TIAN Min. Application of Support Vector Machine Based on Time Sequence in Stock Forecasting [J]. Computing Technology and Automation, 2006, 25(3): 88-91.
[6] H. H. Nguyen, C. W. Chan. Multiple Neural Networks for a Long Term Time Series Forecast [J]. Neural Computing & Applications, 2004, 13: 90-98.
[7] DENG NaiYang, TIAN YingJie. New Approach in Data Mining: Support Vector Machine [M]. Beijing: Science Press, 2004: 77-93.
[8] V. N. Vapnik. The Nature of Statistical Learning Theory [M]. Beijing: Tsinghua University Press, 2000.
[9] YANG LiMing. Linear Support Vector Regression Based on VC Generalization Bounds and its Application [J]. Computer Engineering and Applications, 2006, 31: 230-232.
[10] WANG Yanfeng, GAO Feng. Stock Forecasting Based on Support Vector Machine [J]. Computer Simulation, 2006, 23(11): 256-258.
