The complex fuzzy system forecasting model based on fuzzy SVM with triangular fuzzy number input and output

Qi Wu (a,b,⇑), Rob Law (b)

a Key Laboratory of Measurement and Control of CSE (School of Automation, Southeast University), Ministry of Education, Nanjing, Jiangsu 210096, China
b School of Hotel and Tourism Management, Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong
Keywords: Fuzzy v-support vector machine; Wavelet kernel function; Particle swarm optimization; Fuzzy system forecasting
Abstract

This paper presents a new version of the fuzzy support vector machine to forecast nonlinear fuzzy systems with multi-dimensional input variables. The input and output variables of the proposed model are described as triangular fuzzy numbers. By integrating triangular fuzzy theory and the v-support vector regression machine, the triangular fuzzy v-support vector machine (TFv-SVM) is proposed. To seek the optimal parameters of TFv-SVM, particle swarm optimization (PSO) is applied. A forecasting method based on TFv-SVM and PSO is then put forward. The results of an application to sale system forecasting confirm the feasibility and validity of the forecasting method. Compared with the traditional model, the TFv-SVM method requires fewer samples and has better forecasting precision.
1. Introduction

Recently, a novel machine learning technique, called the support vector machine (SVM), has drawn much attention in the fields of pattern classification and regression estimation. SVM was first introduced by Vapnik and his colleagues (Vapnik, 1999, 2000). It is an approximate implementation of the structural risk minimization (SRM) principle in statistical learning theory, rather than the empirical risk minimization (ERM) method. The SRM principle is based on the fact that the generalization error is bounded by the sum of the empirical error and a confidence interval term depending on the Vapnik–Chervonenkis (VC) dimension (Vapnik, 2000). By minimizing this bound, good generalization performance can be achieved. Compared with traditional neural networks, SVM can obtain a unique global optimal solution and avoid the curse of dimensionality. These attractive properties make SVM a promising technique (Alzate & Suykens, 2008; Anguita, Pischiutta, Ridella, & Sterpi, 2006; Bo, Jiao, & Wang, 2007; Bo, Wang, & Jiao, 2008, 2006; Chalimourda, Schölkopf, & Smola, 2004; Deb, Jayadeva, Gopal, & Chandra, 2007; Fei & Liu, 2006; Garcia-Pedrajas, 2009; Gonen, Tanugur, & Alpaydm, 2008; Guo & Li, 2003; Ikeda, 2006; Jiao, Bo, & Wang, 2007; Kobayashi & Komaki, 2006; Kotsia, Pitas, & Zafeiriou, 2009; Lee & Huang, 2007; Lee, Kim, Lee, & Lee, 2007; Lee & Lee, 2007; Li, Mersereau, & Simske, 2007; Liu & Chen,
⇑ Corresponding author at: Key Laboratory of Measurement and Control of CSE (School of Automation, Southeast University), Ministry of Education, Nanjing, Jiangsu 210096, China. Tel.: +86 25 51166581; fax: +86 25 511665260. E-mail addresses: [email protected], [email protected] (Q. Wu).
2007; Liu, Hao, & Tsang, 2008; Lu, Roychowdhury, & Vandenberghe, 2008; Mavroforakis, Sdralis, & Theodoridis, 2007; Mavroforakis & Theodoridis, 2006; Mitra, Wang, & Banerjee, 2006; Navia-Vazquez, Gutierrez-Gonzalez, Parrado-Hernandez, & Navarro-Abellan, 2006; Nguyen & TuBao, 2006; Perfetti & Ricci, 2006; Romero & Toppo, 2007; Schölkopf, Smola, Williamson, & Bartlett, 2000; Shi & Han, 2007; Takahashi, Jun, & Nishi, 2008; Takahashi & Nishi, 2006; Tao, Chu, & Wang, 2008; Tsang, Kwok, & Zurada, 2006; Wai-Hung Tsang, Kocsor, & Kwok, 2008; Wang, Li, & Bi, 2007; Wang, Yeung, & Lochovsky, 2008; Wang, Yeung, & Tsang, 2007; Williams, Li, Feng, & Wu, 2007; Wu, 2009; Wu, Liu, Xiong, & Liu, 2009; Wu, 2010; Wu & Law, 2010a; Wu & Law, 2010b; Wu, Wu, & Liu, 2010; Wu, 2011a; Xun, Chen, & Guo, 2008). SVM was initially designed to solve pattern recognition problems (Fei & Liu, 2006; Garcia-Pedrajas, 2009; Gonen et al., 2008; Guo & Li, 2003; Kotsia et al., 2009; Lee & Huang, 2007; Lee et al., 2007; Lee & Lee, 2007; Liu & Chen, 2007; Liu et al., 2008; Lu et al., 2008; Perfetti & Ricci, 2006; Shi & Han, 2007; Wu & Liu et al., 2009). Recently, with the introduction of Vapnik's e-insensitive loss function, SVM has been extended to function approximation and regression estimation problems (Chalimourda et al., 2004; Jiao et al., 2007; Li et al., 2007; Wang et al., 2008; Wu, 2009; Wu, 2010; Wu & Law, 2010a; Wu & Law, 2010b; Wu et al., 2010; Wu, 2011a). In the e-SVM approach, the parameter e controls the sparseness of the solution in an indirect way. However, it is difficult to come up with a reasonable value of e without prior information about the accuracy of the output values. Schölkopf et al. (2000) modified the original e-SVM and introduced v-SVM, in which a new parameter v controls the number of support vectors and the points that lie outside of the e-insensitive tube.
Then, the value of e in the v-SVM is traded off between model complexity and slack variables via the constant v.

In many real applications, the observed input data cannot be measured precisely and are usually described in linguistic levels or ambiguous metrics. However, the traditional support vector regression (SVR) method cannot cope with qualitative information. It is well known that fuzzy logic is a powerful tool for dealing with fuzzy and uncertain data, and a fuzzy membership is used to represent the fuzzy information. For computational convenience, traditional fuzzy SVMs transform fuzzy sample data into crisp numbers (Wu, 2009; Wu, 2010; Wu et al., 2010; Wu & Law, 2010a; Wu, 2011a; Wu, 2011b), which introduces estimation errors during the fuzzy information transformation. Since a triangular fuzzy number can represent fuzzy information, the left and right spreads of the triangular fuzzy number are incorporated into the establishment of the SVM in order to reduce the effect of these errors on the final generalization capability; a novel fuzzy support vector machine is thus proposed on triangular fuzzy space. The proposed fuzzy SVM can handle a fuzzy sample set without transforming it into crisp numbers, whereas the sample set of a traditional fuzzy SVM is composed of crisp numbers (fuzzy memberships, or a mapping between fuzzy numbers and crisp numbers (Wu & Liu et al., 2009; Wu & Law, 2010b)) rather than fuzzy numbers. Some scholars have explored the fuzzy support vector machine (FSVM). However, the fuzzy support vector machines mentioned in the above literature are not suitable when the input and output variables are described as triangular fuzzy numbers. An optimization problem based on triangular fuzzy numbers is therefore proposed in this paper. We put forward a new FSVM, called TFv-SVM, and based on it we propose a forecasting method for nonlinear fuzzy systems.

The rest of this paper is organized as follows. The TFv-SVM is described in Section 2. In Section 3, PSO is used to optimize the parameters of TFv-SVM. In Section 4, a forecasting method based on TFv-SVM and PSO is proposed. Section 5 gives an application to car sale system forecasting, in which TFv-SVM is also compared with ARMA and other SVM methods. Section 6 draws some conclusions.
2. Triangular fuzzy v-support vector machine

2.1. Triangular fuzzy theory

Definition 1. Suppose M ∈ T(R) is a triangular fuzzy number (TFN) in triangular fuzzy space, whose membership function is represented as follows:

$$\mu_M(x)=\begin{cases}\dfrac{x-a_M}{r_M-a_M}, & a_M\le x<r_M\\[2pt] 1, & x=r_M\\[2pt] \dfrac{x-b_M}{r_M-b_M}, & r_M<x\le b_M\end{cases}\qquad(1)$$

where $a_M\le r_M<b_M$; $a_M,r_M,b_M\in R$; and $x\in R$. Then we have the formulation M = (a_M, r_M, b_M), in which r_M is the center, a_M is the left boundary and b_M is the right boundary.

Since the standard triangular fuzzy number is inconvenient for describing the input variables of an SVM, the following extended version of Definition 1 is considered:

Definition 2. $\tilde a=(r_a,\underline{\Delta r}_a,\overline{\Delta r}_a)$ is an extended triangular fuzzy number (ETFN), in which $r_a\in R$ is the center, $\underline{\Delta r}_a=r_a-a_a$ is the left spread and $\overline{\Delta r}_a=b_a-r_a$ is the right spread, where $\underline{\Delta r}_a>0$ or $\overline{\Delta r}_a>0$.

Let $A=(r_A,\underline{\Delta r}_A,\overline{\Delta r}_A)$ and $B=(r_B,\underline{\Delta r}_B,\overline{\Delta r}_B)$ be two ETFNs, whose λ-cuts are shown in Fig. 1. In the space T(R) of all ETFNs, we define linear operations by the extension principle:

$$A+B=\big(r_A+r_B,\ \max(\underline{\Delta r}_A,\underline{\Delta r}_B),\ \max(\overline{\Delta r}_A,\overline{\Delta r}_B)\big),$$
$$kA=(kr_A,\underline{\Delta r}_A,\overline{\Delta r}_A)\ \text{if}\ k\ge0,\qquad kA=(kr_A,\overline{\Delta r}_A,\underline{\Delta r}_A)\ \text{if}\ k<0,$$
$$A-B=\big(r_A-r_B,\ \max(\underline{\Delta r}_A,\overline{\Delta r}_B),\ \max(\overline{\Delta r}_A,\underline{\Delta r}_B)\big).$$

Fig. 1. The λ-cuts of two triangular fuzzy numbers.

The λ-cut of A can be labeled as $A_\lambda=[\underline A(\lambda),\overline A(\lambda)]$ for λ ∈ [0, 1], where $\underline A(\lambda)$ and $\overline A(\lambda)$ are the two boundaries of the λ-cut, as shown in Fig. 1. The λ-cut of a fuzzy number is always a closed and bounded interval. By the Hausdorff distance of real numbers, we can define a metric on T(R) as

$$D(A,B)=\sup_{\lambda}\max\big\{|\underline A(\lambda)-\underline B(\lambda)|,\ |\overline A(\lambda)-\overline B(\lambda)|\big\}\qquad(2)$$

where $A_\lambda=[\underline A(\lambda),\overline A(\lambda)]$ and $B_\lambda=[\underline B(\lambda),\overline B(\lambda)]$ are the λ-cuts of the two fuzzy numbers.

Theorem 1. In T(R), the Hausdorff metric can be obtained as follows:

$$D(A,B)=\max\big\{|(r_A-\underline{\Delta r}_A)-(r_B-\underline{\Delta r}_B)|,\ |r_A-r_B|,\ |(r_A+\overline{\Delta r}_A)-(r_B+\overline{\Delta r}_B)|\big\}\qquad(3)$$

Proof. The lower boundary of the λ-cut of A meets the following formula:

$$\lambda=\frac{\underline A(\lambda)-(r_A-\underline{\Delta r}_A)}{\underline{\Delta r}_A}\qquad(4)$$

Then we have $\underline A(\lambda)=r_A+(\lambda-1)\underline{\Delta r}_A$. In the same way, we can obtain $\overline A(\lambda)=r_A+(1-\lambda)\overline{\Delta r}_A$, $\underline B(\lambda)=r_B+(\lambda-1)\underline{\Delta r}_B$ and $\overline B(\lambda)=r_B+(1-\lambda)\overline{\Delta r}_B$. According to the definition in Eq. (2), we get

$$D(A,B)=\sup_{\lambda}\max\big\{|(r_A-r_B)+(\lambda-1)(\underline{\Delta r}_A-\underline{\Delta r}_B)|,\ |(r_A-r_B)+(1-\lambda)(\overline{\Delta r}_A-\overline{\Delta r}_B)|\big\}=\max\Big\{\sup_{\lambda}|(r_A-r_B)+(\lambda-1)(\underline{\Delta r}_A-\underline{\Delta r}_B)|,\ \sup_{\lambda}|(r_A-r_B)+(1-\lambda)(\overline{\Delta r}_A-\overline{\Delta r}_B)|\Big\}\qquad(5)$$

For the given triangular fuzzy numbers A and B, $(r_A-r_B)+(\lambda-1)(\underline{\Delta r}_A-\underline{\Delta r}_B)$ and $(r_A-r_B)+(1-\lambda)(\overline{\Delta r}_A-\overline{\Delta r}_B)$ are two linear functions of λ. As λ ∈ [0, 1], the following formulas must hold:

$$\sup_{\lambda}|(r_A-r_B)+(\lambda-1)(\underline{\Delta r}_A-\underline{\Delta r}_B)|=\max\{|(r_A-\underline{\Delta r}_A)-(r_B-\underline{\Delta r}_B)|,\ |r_A-r_B|\}\qquad(6)$$

$$\sup_{\lambda}|(r_A-r_B)+(1-\lambda)(\overline{\Delta r}_A-\overline{\Delta r}_B)|=\max\{|(r_A+\overline{\Delta r}_A)-(r_B+\overline{\Delta r}_B)|,\ |r_A-r_B|\}\qquad(7)$$

Substituting (6) and (7) into (5), we obtain Eq. (3). This completes the proof of Theorem 1. □

2.2. Triangular fuzzy support vector machine

Suppose a set of fuzzy training samples $\{(x_i,y_i)\}_{i=1}^{l}$, where $x_i\in T(R)^d$ and $y_i\in T(R)$; $T(R)^d$ is the set of d-dimensional vectors of ETFNs. We consider the approximation function f(x) = w · x + b, where $w=(w_1,w_2,\ldots,w_d)$ and w · x denotes the inner product of w and x. In T(R), f(x) can be written as

$$f(x)=\big(w\cdot r_x+b,\ \rho(\underline{\Delta r}_x),\ \overline\rho(\overline{\Delta r}_x)\big),\qquad w\in R^d,\ b\in R\qquad(8)$$

where $\rho(\underline{\Delta r}_x)=\max(\underline{\Delta r}_{x_1},\underline{\Delta r}_{x_2},\ldots,\underline{\Delta r}_{x_d})$ and $\overline\rho(\overline{\Delta r}_x)=\max(\overline{\Delta r}_{x_1},\overline{\Delta r}_{x_2},\ldots,\overline{\Delta r}_{x_d})$.
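Before moving to the optimization problem, the following minimal Python sketch (illustrative code, not from the paper) makes Definition 2 and Theorem 1 concrete: an ETFN type with the max-based addition of Section 2.1 and the closed-form Hausdorff metric of Eq. (3). The class and function names are assumptions made here.

```python
from dataclasses import dataclass

@dataclass
class ETFN:
    """Extended triangular fuzzy number (center, left spread, right spread)."""
    r: float   # center r_a
    dl: float  # left spread, underline Delta r
    dr: float  # right spread, overline Delta r

    def __add__(self, other: "ETFN") -> "ETFN":
        # Extension-principle addition with max-combined spreads (Section 2.1).
        return ETFN(self.r + other.r, max(self.dl, other.dl), max(self.dr, other.dr))

def hausdorff(a: ETFN, b: ETFN) -> float:
    """Hausdorff metric D(A, B) of Theorem 1, Eq. (3)."""
    return max(
        abs((a.r - a.dl) - (b.r - b.dl)),  # left endpoints, cf. Eq. (6)
        abs(a.r - b.r),                    # centers
        abs((a.r + a.dr) - (b.r + b.dr)),  # right endpoints, cf. Eq. (7)
    )

if __name__ == "__main__":
    A = ETFN(0.4, 0.04, 0.04)  # values in the style of Table 1
    B = ETFN(0.8, 0.08, 0.08)
    print(A + B)               # center 1.2, spreads 0.08 (up to float rounding)
    print(hausdorff(A, B))     # max(0.36, 0.40, 0.44) = 0.44 approximately
```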
From Theorem 1 and the idea of TFv-SVM, whose e-insensitive tube is shown in Fig. 2, the regression coefficients in T(R) can be estimated by the following constrained optimization problem:

$$\min_{w,b,e,\xi^{(*)}}\ \frac12\big(\|w\|^2+b^2\big)+C\Big(\nu e+\frac1l\sum_{k=1}^{3}\sum_{i=1}^{l}(\xi_{ki}+\xi^*_{ki})\Big)$$

$$\text{s.t.}\quad\begin{cases} r_{y_i}-(w\cdot\phi(r_{x_i})+b)\le e+\xi_{1i}\\ (w\cdot\phi(r_{x_i})+b)-r_{y_i}\le e+\xi^*_{1i}\\ (r_{y_i}-\underline{\Delta r}_{y_i})-\big(w\cdot\phi(r_{x_i})+b-\rho(\underline{\Delta r}_{x_i})\big)\le e+\xi_{2i}\\ \big(w\cdot\phi(r_{x_i})+b-\rho(\underline{\Delta r}_{x_i})\big)-(r_{y_i}-\underline{\Delta r}_{y_i})\le e+\xi^*_{2i}\\ (r_{y_i}+\overline{\Delta r}_{y_i})-\big(w\cdot\phi(r_{x_i})+b+\overline\rho(\overline{\Delta r}_{x_i})\big)\le e+\xi_{3i}\\ \big(w\cdot\phi(r_{x_i})+b+\overline\rho(\overline{\Delta r}_{x_i})\big)-(r_{y_i}+\overline{\Delta r}_{y_i})\le e+\xi^*_{3i}\\ \xi_{ki},\xi^*_{ki}\ge0,\ k=1,2,3\\ e\ge0\end{cases}\qquad(9)$$

where C > 0 is a penalty factor, $\xi^{(*)}_{ki}$ (k = 1, 2, 3; i = 1, ..., l) are slack variables, v ∈ (0, 1] is an adjustable regularization parameter, and l is the number of training samples.

Fig. 2. The e-insensitive tube of TFv-SVM.

Problem (9) is a quadratic programming (QP) problem. By introducing Lagrangian multipliers, a Lagrangian function can be defined as follows:

$$\begin{aligned} L(w,b,e,\xi^{(*)},\alpha^{(*)},\beta,\eta^{(*)})=\ &\frac12\big(\|w\|^2+b^2\big)+C\Big(\nu e+\frac1l\sum_{k=1}^{3}\sum_{i=1}^{l}(\xi_{ki}+\xi^*_{ki})\Big)-\sum_{k=1}^{3}\sum_{i=1}^{l}(\eta_{ki}\xi_{ki}+\eta^*_{ki}\xi^*_{ki})-\beta e\\ &+\sum_{i=1}^{l}\alpha_{1i}\big[r_{y_i}-(w\cdot\phi(r_{x_i})+b)-e-\xi_{1i}\big]+\sum_{i=1}^{l}\alpha^*_{1i}\big[(w\cdot\phi(r_{x_i})+b)-r_{y_i}-e-\xi^*_{1i}\big]\\ &+\sum_{i=1}^{l}\alpha_{2i}\big[(r_{y_i}-\underline{\Delta r}_{y_i})-\big(w\cdot\phi(r_{x_i})+b-\rho(\underline{\Delta r}_{x_i})\big)-e-\xi_{2i}\big]\\ &+\sum_{i=1}^{l}\alpha^*_{2i}\big[\big(w\cdot\phi(r_{x_i})+b-\rho(\underline{\Delta r}_{x_i})\big)-(r_{y_i}-\underline{\Delta r}_{y_i})-e-\xi^*_{2i}\big]\\ &+\sum_{i=1}^{l}\alpha_{3i}\big[(r_{y_i}+\overline{\Delta r}_{y_i})-\big(w\cdot\phi(r_{x_i})+b+\overline\rho(\overline{\Delta r}_{x_i})\big)-e-\xi_{3i}\big]\\ &+\sum_{i=1}^{l}\alpha^*_{3i}\big[\big(w\cdot\phi(r_{x_i})+b+\overline\rho(\overline{\Delta r}_{x_i})\big)-(r_{y_i}+\overline{\Delta r}_{y_i})-e-\xi^*_{3i}\big] \end{aligned}\qquad(10)$$

where $\alpha^{(*)}_{ki},\beta,\eta^{(*)}_{ki}\ge0$ (k = 1, 2, 3; i = 1, ..., l) are Lagrangian multipliers. Differentiating the Lagrangian function (10) with regard to $w,b,e,\xi^{(*)}_{ki}$, we have

$$\frac{\partial L}{\partial w}=0\ \Rightarrow\ w=\sum_{k=1}^{3}\sum_{i=1}^{l}(\alpha_{ki}-\alpha^*_{ki})\,\phi(r_{x_i})\qquad(11)$$

$$\frac{\partial L}{\partial b}=0\ \Rightarrow\ b=\sum_{k=1}^{3}\sum_{i=1}^{l}(\alpha_{ki}-\alpha^*_{ki})\qquad(12)$$

$$\frac{\partial L}{\partial e}=0\ \Rightarrow\ \beta=C\nu-\sum_{k=1}^{3}\sum_{i=1}^{l}(\alpha_{ki}+\alpha^*_{ki})\qquad(13)$$

$$\frac{\partial L}{\partial \xi^{(*)}_{ki}}=0\ \Rightarrow\ \eta^{(*)}_{ki}=C/l-\alpha^{(*)}_{ki}\qquad(14)$$

By substituting (11)–(14) into (10), we can obtain the corresponding dual form of problem (9) as follows:
Fig. 3. The architecture of TFv-SVM.
$$\max_{\alpha,\alpha^*}\ -\frac12\|w\|^2+\sum_{i=1}^{l}r_{y_i}(\alpha_{1i}-\alpha^*_{1i})+\sum_{i=1}^{l}\big(r_{y_i}-\underline{\Delta r}_{y_i}+\rho(\underline{\Delta r}_{x_i})\big)(\alpha_{2i}-\alpha^*_{2i})+\sum_{i=1}^{l}\big(r_{y_i}+\overline{\Delta r}_{y_i}-\overline\rho(\overline{\Delta r}_{x_i})\big)(\alpha_{3i}-\alpha^*_{3i})$$

$$\text{s.t.}\quad \alpha^{(*)}_{ki}\in[0,C/l],\qquad \sum_{k=1}^{3}\sum_{i=1}^{l}(\alpha_{ki}+\alpha^*_{ki})\le C\nu\qquad(15)$$

where $\|w\|^2=\sum_{i,j=1}^{l}(\alpha_{1i}-\alpha^*_{1i}+\alpha_{2i}-\alpha^*_{2i}+\alpha_{3i}-\alpha^*_{3i})(\alpha_{1j}-\alpha^*_{1j}+\alpha_{2j}-\alpha^*_{2j}+\alpha_{3j}-\alpha^*_{3j})\big(K(r_{x_i},r_{x_j})+1\big)$.

The Lagrangian multipliers $\alpha^{(*)}_{ki}$ can be determined by solving the above QP problem. Based on the Karush–Kuhn–Tucker (KKT) conditions, we have

$$\begin{cases} \alpha_{1i}\big(r_{y_i}-w\cdot\phi(r_{x_i})-b-e-\xi_{1i}\big)=0\\ \alpha^*_{1i}\big(w\cdot\phi(r_{x_i})+b-r_{y_i}-e-\xi^*_{1i}\big)=0\\ \alpha_{2i}\big(r_{y_i}-\underline{\Delta r}_{y_i}-w\cdot\phi(r_{x_i})-b+\rho(\underline{\Delta r}_{x_i})-e-\xi_{2i}\big)=0\\ \alpha^*_{2i}\big(w\cdot\phi(r_{x_i})+b-\rho(\underline{\Delta r}_{x_i})-r_{y_i}+\underline{\Delta r}_{y_i}-e-\xi^*_{2i}\big)=0\\ \alpha_{3i}\big(r_{y_i}+\overline{\Delta r}_{y_i}-w\cdot\phi(r_{x_i})-b-\overline\rho(\overline{\Delta r}_{x_i})-e-\xi_{3i}\big)=0\\ \alpha^*_{3i}\big(w\cdot\phi(r_{x_i})+b+\overline\rho(\overline{\Delta r}_{x_i})-r_{y_i}-\overline{\Delta r}_{y_i}-e-\xi^*_{3i}\big)=0\\ (C/l-\alpha^{(*)}_{ki})\,\xi^{(*)}_{ki}=0\\ \Big(C\nu-\sum_{k=1}^{3}\sum_{i=1}^{l}(\alpha_{ki}+\alpha^*_{ki})\Big)e=0 \end{cases}\qquad(16)$$

where the multipliers $\alpha_{km},\alpha^*_{kj}\in(0,C/l)$ identify support vectors whose constraints hold with equality, from which e can be recovered. Thus, the regression function can be determined as

$$f(x)=\Big(\sum_{k=1}^{3}\sum_{i=1}^{l}(\alpha_{ki}-\alpha^*_{ki})\big(K(r_{x_i},r_x)+1\big),\ \rho(\underline{\Delta r}_x),\ \overline\rho(\overline{\Delta r}_x)\Big)\qquad(17)$$

In fact, the SVM can be described as a neural network; Fig. 3 shows the architecture of TFv-SVM.
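To make Eq. (17) concrete, here is a minimal prediction sketch (illustrative, not the authors' code). It assumes the ETFN class from Section 2.1's sketch and dual coefficients `beta[k][i]` = α_ki − α*_ki already obtained from the QP (15), and it uses a Mexican hat wavelet kernel as one plausible choice consistent with the wavelet transforms of Figs. 5–8; the paper does not give the kernel's explicit formula here.

```python
import numpy as np

def mexican_hat_kernel(u: np.ndarray, v: np.ndarray, a: float) -> float:
    """Translation-invariant wavelet kernel built from the Mexican hat
    mother wavelet h(t) = (1 - t^2) exp(-t^2 / 2), one factor per dimension."""
    t = (u - v) / a
    return float(np.prod((1.0 - t ** 2) * np.exp(-t ** 2 / 2.0)))

def predict(centers, beta, x_centers, x_dl, x_dr, a):
    """Regression function of Eq. (17): an ETFN whose center is
    sum_{k,i} beta_ki (K(r_xi, r_x) + 1) and whose spreads come from rho."""
    r = sum(
        beta[k][i] * (mexican_hat_kernel(centers[i], x_centers, a) + 1.0)
        for k in range(3)
        for i in range(len(centers))
    )
    return ETFN(r, max(x_dl), max(x_dr))  # rho = max over input dimensions
```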
3. Particle swarm optimization (PSO)

It is difficult to seek the optimal unknown parameters of SVMs, so evolutionary algorithms such as the genetic algorithm (GA) and particle swarm optimization (PSO) are generally adopted. TFv-SVM involves three main parameters (C, v, a). Many studies show that PSO has good optimization performance (Ikeda, 2006; Kobayashi & Komaki, 2006; Romero & Toppo, 2007; Williams et al., 2007). Since this paper focuses on the establishment of TFv-SVM, the standard PSO is used to seek its optimal parameters; the vector (C, v, a) represents the position of a particle.

Similarly to other evolutionary computation techniques, PSO uses a set of particles to represent potential solutions to the problem under consideration. The swarm consists of n particles. Each particle has a position X_i = {x_i1, x_i2, ..., x_ij, ..., x_im} and a velocity V_i = {v_i1, v_i2, ..., v_ij, ..., v_im}, where i = 1, 2, ..., n and j = 1, 2, ..., m, and moves through an m-dimensional search space. According to the global variant of the PSO algorithm, each particle moves towards its best previous position and towards the best particle p_g in the swarm. Let us denote the best previously visited position of the ith particle (the one giving its best fitness value) as p_i = {p_i1, p_i2, ..., p_ij, ..., p_im}, and the best previously visited position of the swarm as p_g = {p_g1, p_g2, ..., p_gj, ..., p_gm}. The change of position of each particle from one iteration to the next is computed from the distance between the current position and its previous best position, and the distance between the current position and the best position of the swarm. The velocity and position updates are then given by the following equations:

$$v^{k+1}_{ij}=wv^{k}_{ij}+c_1r_1\big(p_{ij}-x^{k}_{ij}\big)+c_2r_2\big(p_{gj}-x^{k}_{ij}\big)\qquad(18)$$

$$x^{k+1}_{ij}=x^{k}_{ij}+v^{k+1}_{ij}\qquad(19)$$

Fig. 4. The forecasting model based on TFv-SVM.

where w is the inertia weight, employed to control the impact of the previous history of velocities on the current one. Accordingly, the parameter w regulates the trade-off between the global
and local exploration abilities of the swarm. A large inertia weight facilitates global exploration, while a small one tends to facilitate local exploration. A suitable value of the inertia weight w usually provides a balance between the global and local exploration abilities and consequently reduces the number of iterations required to locate the optimum solution. k denotes the iteration number, c1 is the cognition learning factor, c2 is the social learning factor, and r1 and r2 are random numbers uniformly distributed in [0, 1]. Thus, the particle flies through potential solutions towards p_i and p_g in a navigated way while still exploring new areas by the stochastic mechanism, so as to escape from local optima. Since there is no intrinsic mechanism for controlling the velocity of a particle, it is necessary to impose a maximum value Vmax on it: if the velocity exceeds this threshold, it is set equal to Vmax, which controls the maximum travel distance at each iteration so that a particle does not fly past good solutions. The PSO algorithm is terminated after a maximal number of generations, or when the best particle position of the entire swarm cannot be improved further after a sufficiently large number of generations. PSO has shown its robustness and efficacy in solving function optimization problems in real-number spaces.
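The following is a minimal Python sketch (not part of the original paper) of the standard global-best PSO loop of Eqs. (18) and (19) with velocity clamping; the fitness function, the search box, and all parameter names are illustrative assumptions.

```python
import numpy as np

def pso(fitness, bounds, n=30, k_max=100, w=0.9, c1=2.0, c2=2.0, v_max=0.5):
    """Standard global-best PSO minimizing `fitness` over a box `bounds`.

    bounds: array of shape (m, 2) with [low, high] per dimension.
    """
    rng = np.random.default_rng(0)
    low, high = bounds[:, 0], bounds[:, 1]
    m = len(bounds)
    x = rng.uniform(low, high, (n, m))           # particle positions
    v = np.zeros((n, m))                         # particle velocities
    p = x.copy()                                 # personal best positions
    p_fit = np.array([fitness(xi) for xi in x])  # personal best fitness
    g = p[p_fit.argmin()].copy()                 # global best position

    for _ in range(k_max):
        r1, r2 = rng.random((n, m)), rng.random((n, m))
        v = w * v + c1 * r1 * (p - x) + c2 * r2 * (g - x)  # Eq. (18)
        v = np.clip(v, -v_max, v_max)            # velocity limitation V_max
        x = np.clip(x + v, low, high)            # Eq. (19), kept inside bounds
        fit = np.array([fitness(xi) for xi in x])
        better = fit < p_fit                     # update personal bests
        p[better], p_fit[better] = x[better], fit[better]
        g = p[p_fit.argmin()].copy()             # update global best
    return g, p_fit.min()
```

For the TFv-SVM parameter search, each particle position would encode (C, v, a) and the fitness would be the MSE of Eq. (20) on a validation set, as described in Section 4.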
4. Regression forecasting method based on TFv-SVM

Since future data are unknown, the exact forecasting error cannot be computed in a practical forecasting task. In addition, even when an error can be given, a single forecast cannot be taken as the final result owing to stochastic noise interference, so the forecasting error should be assessed by an averaging method. For the same forecasting aim, the quality of a forecasting method can be judged by the mean square error (MSE): the method with the larger MSE value has the worse forecasting capability. The fitness function is very important for searching for an optimal parameter combination in a forecasting task, and the MSE is used as the fitness function in the following experiment. The MSE is defined by Eq. (20):
$$\mathrm{MSE}=\frac1l\sum_{i=1}^{l}\big(r_{y_i}-\hat r_{y_i}\big)^2\qquad(20)$$
where $r_{y_i}$ is the real datum from the sample set, $\hat r_{y_i}$ is the forecast of $y_i$, and l is the length of the test sample. The fitness function can then be formulated as

$$\mathrm{fitness}=\mathrm{MSE}(l,y_i)\qquad(21)$$
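As a small illustration (an assumption consistent with Eq. (20), not code from the paper), the fitness compares only the centers r_y of the forecast and observed triangular fuzzy numbers:

```python
import numpy as np

def mse_fitness(y_true_centers: np.ndarray, y_pred_centers: np.ndarray) -> float:
    """MSE of Eq. (20) over the centers r_y of the fuzzy outputs."""
    return float(np.mean((y_true_centers - y_pred_centers) ** 2))
```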
Suppose the number of variables is n, with n = n1 + n2, where n1 and n2 denote the numbers of fuzzy linguistic variables and crisp numerical variables, respectively. The linguistic variables are evaluated on several description levels, and a real number between 0 and 1 is assigned to each description level. Distinct numerical variables have different dimensions and should first be normalized. The following normalization is adopted:
$$\bar x_{di}=\frac{x_{di}-\min_{i=1,\ldots,l}x_{di}}{\max_{i=1,\ldots,l}x_{di}-\min_{i=1,\ldots,l}x_{di}},\qquad d=1,2,\ldots,n_2\qquad(22)$$

Fig. 5. Mexican hat wavelet transform.

where l is the number of samples, and $x_{di}$ and $\bar x_{di}$ denote the original value and the normalized value, respectively. In fact, all the numerical variables in (1)–(21) are normalized values, although they are not marked by bars. Fuzzification is used to process the linguistic variables and the normalized numerical variables: the centers of the corresponding triangular fuzzy numbers are assigned the normalized values, and the spreads of those fuzzy numbers can be determined by evaluation, or by taking some function of the observed values, such as $\underline{\Delta r}_{x_i}=\theta\,r_{x_i}$ and $\overline{\Delta r}_{x_i}=s\,r_{x_i}$, where θ and s are fuzzification coefficients.
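A minimal sketch of this preprocessing step (illustrative code, not from the paper), assuming the ETFN class from Section 2 and the proportional spread rule with θ = s = 0.1 as used in Section 5:

```python
import numpy as np

def normalize(col: np.ndarray) -> np.ndarray:
    """Min-max normalization of one numerical variable, Eq. (22)."""
    return (col - col.min()) / (col.max() - col.min())

def fuzzify(centers: np.ndarray, theta: float = 0.1, s: float = 0.1):
    """Turn normalized crisp values into ETFNs with proportional spreads."""
    return [ETFN(r, theta * r, s * r) for r in centers]

# Example with one raw numerical column in the style of Table 1.
raw = np.array([40.0, 80.0, 60.0, 10.0, 30.0])
samples = fuzzify(normalize(raw))  # first entry: ETFN(r~0.4286, dl~0.0429, dr~0.0429)
```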
Table 1. Learning and testing data.

No. | Input x1 | Input x2 | Input x3 | Input x4 | Input x5 | Input x6 | Desired outputs
1 | (0.4, 0.04, 0.04) | (0.4, 0.04, 0.04) | (0.4, 0.04, 0.04) | (4, 0.4, 0.4) | (3.1, 0.31, 0.31) | (9.3, 0.93, 0.93) | (950, 95, 95)
2 | (0.8, 0.08, 0.08) | (0.5, 0.05, 0.05) | (0.9, 0.09, 0.09) | (4, 0.4, 0.4) | (0.56, 0.056, 0.056) | (18.2, 1.82, 1.82) | (231, 23, 23)
3 | (0.6, 0.06, 0.06) | (0.3, 0.03, 0.03) | (0.1, 0.01, 0.01) | (4, 0.4, 0.4) | (1.5, 0.15, 0.15) | (21.1, 2.11, 2.11) | (606, 60.6, 60.6)
4 | (0.1, 0.01, 0.01) | (0.2, 0.02, 0.02) | (0.1, 0.01, 0.01) | (1, 0.1, 0.1) | (0.5, 0.05, 0.05) | (25, 2.5, 2.5) | (486, 48.6, 48.6)
5 | (0.3, 0.03, 0.03) | (0.6, 0.06, 0.06) | (0.9, 0.09, 0.09) | (1, 0.1, 0.1) | (2.1, 0.21, 0.21) | (23.7, 2.37, 2.37) | (891, 89.1, 89.1)
6 | (0.3, 0.03, 0.03) | (0.5, 0.05, 0.05) | (0.5, 0.05, 0.05) | (1, 0.1, 0.1) | (7.1, 0.71, 0.71) | (5.1, 0.51, 0.51) | (762, 76.2, 76.2)
7 | (0.5, 0.05, 0.05) | (0.8, 0.08, 0.08) | (0.2, 0.02, 0.02) | (1, 0.1, 0.1) | (0.5, 0.05, 0.05) | (2.8, 0.28, 0.28) | (456, 45.6, 45.6)
8 | (0.5, 0.05, 0.05) | (0.2, 0.02, 0.02) | (0.3, 0.03, 0.03) | (2, 0.2, 0.2) | (8.07, 0.807, 0.807) | (2.9, 0.29, 0.29) | (185, 18.5, 18.5)
9 | (0.8, 0.08, 0.08) | (0.4, 0.04, 0.04) | (0.4, 0.04, 0.04) | (1, 0.1, 0.1) | (0.45, 0.045, 0.045) | (8.2, 0.82, 0.82) | (821, 82.1, 82.1)
10 | (0.9, 0.09, 0.09) | (0.6, 0.06, 0.06) | (0.2, 0.02, 0.02) | (1, 0.1, 0.1) | (0.3, 0.03, 0.03) | (6.8, 0.68, 0.68) | (444, 44.4, 44.4)
... | ... | ... | ... | ... | ... | ... | ...
58 | (0.7, 0.07, 0.07) | (0.3, 0.03, 0.03) | (0.9, 0.09, 0.09) | (8, 0.8, 0.8) | (2, 0.2, 0.2) | (3.7, 0.37, 0.37) | (864, 86.4, 86.4)
59 | (0.6, 0.06, 0.06) | (0.6, 0.06, 0.06) | (0.9, 0.09, 0.09) | (10, 1, 1) | (5, 0.5, 0.5) | (2.3, 0.23, 0.23) | (853, 85.3, 85.3)
60 | (0.3, 0.03, 0.03) | (0.8, 0.08, 0.08) | (0.3, 0.03, 0.03) | (12, 1.2, 1.2) | (7.9, 0.79, 0.79) | (11.6, 1.16, 1.16) | (593, 59.3, 59.3)
The particle swarm optimization algorithm (Algorithm 1) is described in steps as follows:

Algorithm 1.
Step (1) Data preparation: the training and testing sets are represented as Tr and Te, respectively.
Step (2) Particle initialization and PSO parameter setting: generate the initial particles and set the PSO parameters, including the number of particles (n), the particle dimension (m), the maximal number of iterations (kmax), the error limitation of the fitness function, the velocity limitation (Vmax) and the inertia weight for particle velocity (w0). Set the iterative variable k = 0 and perform the training process of steps 3–7.
Step (3) Set the iterative variable k = k + 1.
Step (4) Compute the fitness function value of each particle. Take each particle's best position so far as its individual extremum point, and take the particle with the minimal fitness value as the global extremum point.
Step (5) Stop-condition checking: if a stopping criterion (the predefined maximum number of iterations or the error accuracy of the fitness function) is met, go to step 7; otherwise, go to the next step.
Step (6) Update the particle positions by Eqs. (18) and (19) to form the new particle swarm, then go to step 3.
Step (7) End the training procedure and output the optimal particle.

On the basis of TFv-SVM and PSO, we can summarize the forecasting method shown in Fig. 4 as follows (see Fig. 5); a sketch of the overall parameter search is given after the steps.
Step (1) Initialize the original data by normalization and fuzzification, then form the training patterns.
Step (2) Select the wavelet kernel function K; call Algorithm 1 to obtain the control constant v, the penalty factor C and the scaling parameter a of the wavelet kernel function.
Step (3) Construct the QP problem (15) of the TFv-SVM; solve the optimization problem and obtain the parameters $\alpha^{(*)}_{ki}$.
Step (4) For a new forecasting task, extract the influencing features and form a set of input variables x; then compute the forecast ŷ by Eq. (17).
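Putting the pieces together, the parameter search can be wired to the training step roughly as follows. This is a sketch reusing the pso, predict and mse_fitness helpers above; solve_qp_15 is a hypothetical stand-in for a QP solver of problem (15), which the paper does not specify.

```python
import numpy as np

def make_fitness(X_centers, y_centers_train, V_centers, V_dl, V_dr, y_centers_val):
    """Fitness of one PSO particle: MSE of Eq. (20) on a validation split."""
    def fitness(particle):
        C, v, a = particle  # particle position = (C, v, a)
        beta = solve_qp_15(X_centers, y_centers_train, C=C, v=v, a=a)  # hypothetical QP solver
        preds = np.array([
            predict(X_centers, beta, V_centers[j], V_dl[j], V_dr[j], a).r
            for j in range(len(V_centers))
        ])
        return mse_fitness(y_centers_val, preds)
    return fitness

# Hypothetical search box for (C, v, a); the paper reports C = 534.08,
# v = 0.89, a = 0.05 as the optimum found by PSO.
bounds = np.array([[1.0, 1000.0], [0.01, 1.0], [0.01, 1.0]])
# best, best_mse = pso(make_fitness(Xtr, ytr, Xv, Xv_dl, Xv_dr, yv), bounds)
```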
5. Experiments

Fig. 6. Morlet wavelet transform.
To illustrate the forecasting method, a car sale fuzzy system forecast with multiple influencing factors is studied. Some factors with large influencing weights are gathered to develop a factor list, as shown in Table 1 (see Fig. 6). In our experiments, a fuzzy sale sample set with 60 samples is selected from the past records of a typical car company; the detailed characteristic data and sale data of the company compose the corresponding patterns. The experiments are made on a 1.80 GHz Core(TM)2 CPU personal computer (PC) with 1.0 GB memory under Microsoft Windows XP Professional.

The initial parameters of PSO are given as follows: number of particles: n = 100; particle dimension: m = 6; inertia weight: w = 0.9; positive acceleration constants: c1 = c2 = 2; maximal iterative number: kmax = 100; fitness accuracy of the normalized samples: 0.0002; fuzzification coefficients: θ = s = 0.1. Some criteria, such as the mean absolute error (MAE), mean square error (MSE) and mean absolute percentage error (MAPE), are adopted to evaluate the performance of the TFv-SVM method. The index MSE is used as the fitness function of PSO in this experiment (see Figs. 7 and 8).

The optimal combinational parameters obtained by PSO are C = 534.08, v = 0.89 and a = 0.05. Fig. 9 shows the sale series forecasting results given by TFv-SVM. To analyze the forecasting capacity of the TFv-SVM-based model, forecasting models based on ordinary fuzzy SVMs and the autoregressive moving average (ARMA) model are selected to deal with the same series. Their results are shown in Table 2.
Table 2. Comparison of forecasting results (desired outputs vs. forecasting values) from four different models.

No. | Desired outputs | ARMA | Fv-SVRM | FWv-SVRM | FRWv-SVRM
1 | (593, 59.3, 59.3) | 635 | (601, 60.1, 60.1) | (604, 60.4, 60.4) | (596, 59.6, 59.6)
2 | (853, 85.3, 85.3) | 626 | (838, 83.8, 83.8) | (840, 84, 84) | (845, 84.5, 84.5)
3 | (864, 86.4, 86.4) | 630 | (848, 84.8, 84.8) | (850, 85, 85) | (855, 85.5, 85.5)
4 | (784, 78.4, 78.4) | 593 | (765, 76.5, 76.5) | (768, 76.8, 76.8) | (773, 77.3, 77.3)
5 | (979, 97.9, 97.9) | 658 | (951, 95.1, 95.1) | (953, 95.3, 95.3) | (958, 95.8, 95.8)
6 | (509, 50.9, 50.9) | 657 | (517, 51.7, 51.7) | (519, 51.9, 51.9) | (511, 51.1, 51.1)
7 | (541, 54.1, 54.1) | 701 | (546, 54.6, 54.6) | (543, 54.3, 54.3) | (538, 53.8, 53.8)
8 | (302, 30.2, 30.2) | 645 | (342, 34.2, 34.2) | (344, 34.4, 34.4) | (336, 33.6, 33.6)
9 | (682, 68.2, 68.2) | 601 | (664, 66.4, 66.4) | (666, 66.6, 66.6) | (671, 67.1, 67.1)
10 | (934, 93.4, 93.4) | 591 | (911, 91.1, 91.1) | (913, 91.3, 91.3) | (918, 91.8, 91.8)
11 | (897, 89.7, 89.7) | 615 | (877, 87.7, 87.7) | (880, 88, 88) | (884, 88.4, 88.4)
12 | (746, 74.6, 74.6) | 669 | (741, 74.1, 74.1) | (744, 74.4, 74.4) | (743, 74.3, 74.3)
Fig. 7. Gaussian wavelet transform.
Fig. 8. Complex Gaussian wavelet transform.
Fig. 9. The sales forecasting results based on TFv-SVM.
The indexes MAE, MAPE and MSE are used to evaluate the forecasting capacity of the four models shown in Table 3. To represent the error trend well, the latest 12 months of forecasting results are used to analyze the forecasting performance of the above models. It is obvious that the forecasting accuracy given by the fuzzy support vector regression machines exceeds that given by the autoregressive moving average (ARMA) model, and that the indexes MAE, MAPE and MSE provided by TFv-SVM are better than those provided by v-SVM and Fv-SVM. It is
Table 3. Error statistics of four forecasting models.

Model | MAE | MAPE | MSE
ARMA | 204.08 | 0.3171 | 51866
Fv-SVRM | 17.08 | 0.0283 | 388.08
FWv-SVRM | 15.83 | 0.0272 | 356.33
FRWv-SVRM | 11.17 | 0.0195 | 203.33
obvious that TFv-SVM has the best generalization performance and is appropriate to cases with finite samples and uncertain information.

6. Conclusions

In this paper, a new version of the FSVM, named TFv-SVM, is proposed to forecast sale systems by integrating triangular fuzzy theory and v-SVM. The performance of TFv-SVM is evaluated using a fuzzy sale system forecast with multi-dimensional input, and the simulation results demonstrate that TFv-SVM is effective in dealing with uncertain data and finite samples. Moreover, it is shown that the parameter-choosing algorithm presented here is effective for seeking the unknown parameters of TFv-SVM.

Compared to ARMA, the fuzzy support vector machines have some other attractive properties, such as strong learning capability for small samples, good generalization performance, insensitivity to noise or outliers and steerable approximation parameters. Compared to the v-SVM and Fv-SVM models, TFv-SVM has the best generalization performance. Moreover, TFv-SVM, which has strong robustness, can effectively penalize some types of noise and singular points in the input series.

In our experiments, some fixed coefficients, such as the fuzzification coefficients (θ and s), are adopted. However, how to choose an appropriate coefficient is not described in this paper; this is a meaningful problem for future research.

Acknowledgements

This research was partly supported by the National Natural Science Foundation of China under Grants 60904043 and 70761002, a research grant funded by the Hong Kong Polytechnic University (G-YX5J), the China Postdoctoral Science Foundation (20090451152), the third special grant from the China Postdoctoral Science Foundation, the Jiangsu Planned Projects for Postdoctoral Research Funds (0901023C) and the Southeast University Planned Projects for Postdoctoral Research Funds.

References

Alzate, C., & Suykens, J. (2008). Kernel component analysis using an epsilon-insensitive robust loss function. IEEE Transactions on Neural Networks, 19(9), 1583–1598.
Anguita, D., Pischiutta, S., Ridella, S., & Sterpi, D. (2006). Feed-forward support vector machine without multipliers. IEEE Transactions on Neural Networks, 17(5), 1328–1331.
Bo, L., Jiao, L., & Wang, L. (2007). Working set selection using functional gain for LS-SVM. IEEE Transactions on Neural Networks, 18(5), 1541–1544.
Bo, L., Wang, L., & Jiao, L. (2008). Training hard-margin support vector machines using greedy stagewise algorithm. IEEE Transactions on Neural Networks, 19(8), 1446–1455.
Casali, D., Costantini, G., Perfetti, R., & Ricci, E. (2006). Associative memory design using support vector machines. IEEE Transactions on Neural Networks, 17(5), 1165–1174.
Chalimourda, A., Schölkopf, B., & Smola, A. J. (2004). Experimentally optimal v in support vector regression for different noise models and parameter settings. Neural Networks, 17(1), 127–141.
Deb, A. K., Jayadeva, Gopal, M., & Chandra, S. (2007). SVM-based tree-type neural networks as a critic in adaptive critic designs for control. IEEE Transactions on Neural Networks, 18(4), 1016–1030.
Fei, B., & Liu, J. (2006). Binary tree of SVM: A new fast multiclass training and classification algorithm. IEEE Transactions on Neural Networks, 17(3), 696–704. Garcia-Pedrajas, N. (2009). Constructing ensembles of classifiers by means of weighted instance selection. IEEE Transactions on Neural Networks, 20(2), 258–277. Gonen, M., Tanugur, A. G., & Alpaydm, E. (2008). Multiclass posterior probability support vector machines. IEEE Transactions on Neural Networks, 19(1), 130–139. Guo, G. D., & Li, S. Z. (2003). Content-based audio classification and retrieval by support vector machines. IEEE Transactions on Neural Networks, 14(1), 209–215. Ikeda, K. (2006). Effects of kernel function on Nu support vector machines in extreme cases. IEEE Transactions on Neural Networks, 17(1), 1–9. Jiao, L., Bo, L., & Wang, L. (2007). Fast sparse approximation for least squares support vector machine. IEEE Transactions on Neural Networks, 18(3), 685–697. Kobayashi, K., & Komaki, F. (2006). Information criteria for support vector machines. IEEE Transactions on Neural Networks, 17(3), 571–577. Kotsia, I., Pitas, I., & Zafeiriou, S. (2009). Novel multiclass classifiers based on the minimization of the within-class variance. IEEE Transactions on Neural Networks, 20(1), 14–34. Lee, Y. J., & Huang, S. Y. (2007). Reduced support vector machines: A statistical theory. IEEE Transactions on Neural Networks, 18(1), 1–13. Lee, K. Y., Kim, D. W., Lee, K. H., & Lee, D. (2007). Density-induced support vector data description. IEEE Transactions on Neural Networks, 18(1), 284–289. Lee, D., & Lee, J. (2007). Equilibrium-based support vector machine for semisupervised classification. IEEE Transactions on Neural Networks, 18(2), 578–583. Li, D., Mersereau, R. M., & Simske, S. (2007). Blind image deconvolution through support vector regression. IEEE Transactions on Neural Networks, 18(3), 931–935. Liu, Y. H., & Chen, Y. T. (2007). Face recognition using total margin-based adaptive fuzzy support vector machines. IEEE Transactions on Neural Networks, 18(1), 178–192. Liu, B., Hao, Z., & Tsang, E. C. C. (2008). Nesting one-against-one algorithm based on SVMs for pattern classification. IEEE Transactions on Neural Networks, 19(12), 2044–2052. Lu, Y., Roychowdhury, V., & Vandenberghe, L. (2008). Distributed parallel support vector machines in strongly connected networks. IEEE Transactions on Neural Networks, 19(7), 1167–1178. Mavroforakis, M. E., Sdralis, M., & Theodoridis, S. (2007). A geometric nearest point algorithm for the efficient solution of the SVM classification task. IEEE Transactions on Neural Networks, 18(5), 1545–1549. Mavroforakis, M. E., & Theodoridis, S. (2006). A geometric approach to support vector machine (SVM) classification. IEEE Transactions on Neural Networks, 17(3), 671–682. Mitra, V., Wang, C. J., & Banerjee, S. (2006). Lidar detection of underwater objects using a neuro-SVM-based architecture. IEEE Transactions on Neural Networks, 17(3), 717–731. Navia-Vazquez, A., Gutierrez-Gonzalez, D., Parrado-Hernandez, E., & NavarroAbellan, J. J. (2006). Distributed support vector machines. IEEE Transactions on Neural Networks, 17(4), 1091–1097. Nguyen, D., & TuBao, H. (2006). A bottom-up method for simplifying support vector solutions. IEEE Transactions on Neural Networks, 17(3), 792–796. Perfetti, R., & Ricci, E. (2006). Analog neural network for support vector machine learning. IEEE Transactions on Neural Networks, 17(4), 1085–1091. Romero, E., & Toppo, D. (2007). 
Comparing support vector machines and feedforward neural networks with similar hidden-layer weights. IEEE Transactions on Neural Networks, 18(3), 959–963. Schölkopf, B., Smola, A. J., Williamson, R. C., & Bartlett, P. L. (2000). New support vector algorithms. Neural Computation, 12(5), 1207–1245. Shi, Z., & Han, M. (2007). Support vector echo-state machine for chaotic time-series prediction. IEEE Transactions on Neural Networks, 18(2), 359–372. Takahashi, N., Jun, G., & Nishi, T. (2008). Global convergence of SMO algorithm for support vector regression. IEEE Transactions on Neural Networks, 19(6), 971–982. Takahashi, N., & Nishi, T. (2006). Global convergence of decomposition learning methods for support vector machines. IEEE Transactions on Neural Networks, 17(6), 1362–1369. Tao, Q., Chu, D., & Wang, J. (2008). Recursive support vector machines for dimensionality reduction. IEEE Transactions on Neural Networks, 19(1), 189–193. Tsang, I. W. H., Kwok, J. T. Y., & Zurada, J. A. (2006). Generalized core vector machines. IEEE Transactions on Neural Networks, 17(5), 1126–1140. Vapnik, V. N. (1999). An overview of statistical learning theory. IEEE Transactions on Neural Networks, 10(5), 988–999. Vapnik, V. N. (2000). The nature of statistical learning. New York: Springer-Verlag. Wai-Hung Tsang, I., Kocsor, A., & Kwok, J. T. Y. (2008). Large-scale maximum margin discriminant analysis using core vector machines. IEEE Transactions on Neural Networks, 19(4), 610–624. Wang, G. L., Li, Y. F., & Bi, D. X. (2007). Support vector networks in adaptive friction compensation. IEEE Transactions on Neural Networks, 18(4), 1209–1219. Wang, G., Yeung, D. Y., & Lochovsky, F. H. (2008). A new solution path algorithm in support vector regression. IEEE Transactions on Neural Networks, 19(10), 1753–1767. Wang, D., Yeung, D. S., & Tsang, E. C. (2007). Weighted mahalanobis distance kernels for support vector machines. IEEE Transactions on Neural Networks, 18(5), 1453–1462. Williams, P., Li, S., Feng, J., & Wu, S. (2007). A geometrical method to improve performance of the support vector machine. IEEE Transactions on Neural Networks, 18(3), 942–947.
Wu, Q. (2009). The forecasting model based on wavelet v-support vector machine. Expert Systems with Applications, 36(4), 7604–7610.
Wu, Q., Liu, J., Xiong, F. L., & Liu, X. J. (2009). The fuzzy wavelet classifier machine with penalizing hybrid noises from complex diagnosis system. Acta Automatica Sinica, 35(6), 773–779.
Wu, Q. (2010). Product demand forecasts using wavelet kernel support vector machine and particle swarm optimization in manufacture system. Journal of Computational and Applied Mathematics, 233(10), 2481–2491.
Wu, Q., & Law, R. (2010a). Fuzzy support vector regression machine with penalizing Gaussian noises on triangular fuzzy number space. Expert Systems with Applications, 37(12), 7788–7795.
Wu, Q., & Law, R. (2010b). Complex system fault diagnosis based on a fuzzy robust wavelet support vector classifier and an adaptive Gaussian particle swarm optimization. Information Sciences, 180(23), 4514–4528.
Wu, Q., Wu, S., & Liu, J. (2010). Hybrid model based on SVM with Gaussian loss function and adaptive Gaussian PSO. Engineering Applications of Artificial Intelligence, 23(4), 487–494.
Wu, Q. (2011a). Fuzzy robust v-support vector machine with penalizing hybrid noises on triangular fuzzy number space. Expert Systems with Applications, 38(1), 39–46.
Wu, Q. (2011b). A self-adaptive embedded chaotic particle swarm optimization for parameters selection of Wv-SVM. Expert Systems with Applications, 38(1), 184–192.
Xun, L., Chen, R. C., & Guo, X. (2008). Pruning support vector machines without altering performances. IEEE Transactions on Neural Networks, 19(10), 1792–1803.