A Novel Recursive Solution to LS-SVR for Robust Identification of Dynamical Systems

José Daniel A. Santos¹, Guilherme A. Barreto²

¹ Federal Institute of Education, Science and Technology of Ceará, Department of Industry, Maracanaú, Ceará, Brazil
[email protected]
² Federal University of Ceará, Department of Teleinformatics Engineering, Center of Technology, Campus of Pici, Fortaleza, Ceará, Brazil
[email protected]

Abstract. Least Squares Support Vector Regression (LS-SVR) is a powerful kernel-based learning tool for regression problems. However, since it is based on the ordinary least squares (OLS) approach for parameter estimation, the standard LS-SVR model is very sensitive to outliers. Robust variants of the LS-SVR model, such as the WLS-SVR and IRLS-SVR models, have been developed to add robustness to the parameter estimation process, but they still rely on OLS solutions. In this paper we propose a totally different approach to robustify the LS-SVR. Unlike previous models, we maintain the original LS-SVR loss function, while the solution of the resulting linear system for parameter estimation is obtained by means of the Recursive Least M-estimate (RLM) algorithm. We evaluate the proposed approach on nonlinear system identification tasks, using artificial and real-world datasets contaminated with outliers. The obtained results for infinite-steps-ahead prediction show that the proposed model consistently outperforms the WLS-SVR and IRLS-SVR models in all studied scenarios.

Keywords: Nonlinear regression, outliers, LS-SVR, system identification, M-estimation.

1 Introduction

Least Squares Support Vector Regression (LS-SVR) [4, 5] is a widely used tool for regression problems, being successfully applied to time series forecasting [6], control systems [7] and system identification [8]. Its standard formulation makes use of equality constraints and a least squares loss function [5]. The solution for the parameters is then obtained by solving a system of linear equations using the ordinary least squares (OLS) algorithm. However, OLS provides an optimal solution only if the errors follow a Gaussian distribution. Unfortunately, in many real-world regression applications this is too strong an assumption, because the available data are usually contaminated with impulsive noise or outliers. In these scenarios, the performance of LS-SVR models may degrade significantly.


Consequently, some authors have devoted attention to developing robust variants of the LS-SVR model. For example, Suykens et al. [1] introduced a weighted version of LS-SVR based on M-estimators [9], named Weighted Least Squares Support Vector Regression (WLS-SVR). Later, De Brabanter et al. [2] built a robust estimate upon the previous LS-SVR solutions using an iterative reweighting approach, named Iteratively Reweighted Least Squares Support Vector Regression (IRLS-SVR). In essence, while these robust variants modify the original LS-SVR loss function to penalize large errors, they still rely on the OLS algorithm to solve the resulting linear system.

In this paper, we introduce a different approach to add outlier robustness to the LS-SVR model. Instead of modifying its loss function, we solve the resulting linear system for parameter estimation using the Recursive Least M-estimate (RLM) algorithm [3]. The RLM rule is itself a robust variant of the standard recursive least squares (RLS) algorithm, modified by the use of M-estimators to handle outliers. The proposed approach, referred to as the RLM-SVR model, recursively updates the vector of Lagrange multipliers and the bias for each input sample. By doing so, the occurrence of outliers in a given input sample can be treated individually, improving the quality of the solution.

For validation purposes, we use two synthetic datasets, whose outputs are contaminated with different amounts of outliers, and a real-world dataset. The performance of the proposed RLM-SVR model is then compared with those provided by the standard LS-SVR, WLS-SVR and IRLS-SVR models in infinite-step-ahead prediction tasks.

The remainder of the paper is organized as follows. In Section 2 we briefly describe the LS-SVR, WLS-SVR and IRLS-SVR models. In Section 3 we introduce our approach. In Section 4 we report the achieved results. Finally, we conclude the paper in Section 5.

2 Evaluated Models

Initially, let us consider the estimation dataset $\{(\mathbf{x}_n, y_n)\}_{n=1}^{N}$, with inputs $\mathbf{x}_n \in \mathbb{R}^p$ and corresponding outputs $y_n \in \mathbb{R}$. In a regression problem, the goal is to search for a function $f(\cdot)$ that approximates the outputs $y_n$ for all instances of the available data. For the nonlinear case, $f$ usually takes the form

$$f(\mathbf{x}) = \langle \mathbf{w}, \varphi(\mathbf{x}) \rangle + b, \qquad (1)$$

where $\varphi(\cdot): \mathbb{R}^p \to \mathbb{R}^{p_h}$ is a nonlinear map into a higher-dimensional feature space, $\langle \cdot, \cdot \rangle$ denotes the dot product in this feature space, $\mathbf{w} \in \mathbb{R}^{p_h}$ is a vector of weights and $b \in \mathbb{R}$ is a bias. The formulation of the parameter estimation problem in LS-SVR leads to the minimization of the following functional [5, 4]

$$J(\mathbf{w}, \mathbf{e}) = \frac{1}{2}\|\mathbf{w}\|_2^2 + C\,\frac{1}{2}\sum_{n=1}^{N} e_n^2, \qquad (2)$$


subject to

$$y_n = \langle \mathbf{w}, \varphi(\mathbf{x}_n) \rangle + b + e_n, \qquad n = 1, 2, \ldots, N, \qquad (3)$$

where $e_n = y_n - f(\mathbf{x}_n)$ is the error due to the $n$-th input pattern and $C > 0$ is a regularization parameter. The Lagrangian function of the optimization problem in Eqs. (2) and (3) is

$$\mathcal{L}(\mathbf{w}, b, \mathbf{e}, \boldsymbol{\alpha}_0) = \frac{1}{2}\|\mathbf{w}\|_2^2 + C\,\frac{1}{2}\sum_{n=1}^{N} e_n^2 - \sum_{n=1}^{N} \alpha_n \left[ \langle \mathbf{w}, \varphi(\mathbf{x}_n) \rangle + b + e_n - y_n \right], \qquad (4)$$

where the $\alpha_n$'s are the Lagrange multipliers. The optimal dual variables correspond to the solution of the linear system $\mathbf{A}\boldsymbol{\alpha} = \mathbf{y}$, given by

$$\underbrace{\begin{bmatrix} 0 & \mathbf{1}^T \\ \mathbf{1} & \boldsymbol{\Omega} + C^{-1}\mathbf{I} \end{bmatrix}}_{\mathbf{A}} \underbrace{\begin{bmatrix} b \\ \boldsymbol{\alpha}_0 \end{bmatrix}}_{\boldsymbol{\alpha}} = \underbrace{\begin{bmatrix} 0 \\ \mathbf{y}_0 \end{bmatrix}}_{\mathbf{y}}, \qquad (5)$$

where $\mathbf{y}_0 = [y_1, \ldots, y_N]^T$, $\mathbf{1} = [1, \ldots, 1]^T$, $\boldsymbol{\alpha}_0 = [\alpha_1, \ldots, \alpha_N]^T$, and $\boldsymbol{\Omega} \in \mathbb{R}^{N \times N}$ is the kernel matrix whose entries are $\Omega_{i,j} = k(\mathbf{x}_i, \mathbf{x}_j) = \langle \varphi(\mathbf{x}_i), \varphi(\mathbf{x}_j) \rangle$, where $k(\cdot,\cdot)$ is the chosen kernel function. Moreover, $\mathbf{A} \in \mathbb{R}^{(N+1)\times(N+1)}$, $\boldsymbol{\alpha} \in \mathbb{R}^{N+1}$ and $\mathbf{y} \in \mathbb{R}^{N+1}$. Thus, the solution for $\boldsymbol{\alpha}$ is provided by the OLS algorithm as

$$\boldsymbol{\alpha} = (\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T\mathbf{y}. \qquad (6)$$

Finally, the resulting LS-SVR model for nonlinear regression is given by

$$f(\mathbf{x}) = \sum_{n=1}^{N} \alpha_n k(\mathbf{x}, \mathbf{x}_n) + b. \qquad (7)$$

The Gaussian kernel, $k(\mathbf{x}, \mathbf{x}_n) = \exp\left\{ -\frac{\|\mathbf{x} - \mathbf{x}_n\|_2^2}{2\gamma^2} \right\}$, was adopted in this paper.
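For concreteness, the training and prediction steps described by Eqs. (5)–(7) can be sketched in a few lines of NumPy. This is a minimal illustration under the Gaussian kernel above; the function and variable names are ours, not taken from any LS-SVR library:

```python
import numpy as np

def gaussian_kernel_matrix(X1, X2, gamma):
    """Kernel matrix with entries k(x_i, x_j) = exp(-||x_i - x_j||^2 / (2*gamma^2))."""
    sq_dists = np.sum(X1**2, axis=1)[:, None] + np.sum(X2**2, axis=1)[None, :] \
               - 2.0 * X1 @ X2.T
    return np.exp(-sq_dists / (2.0 * gamma**2))

def lssvr_fit(X, y, C, gamma):
    """Build the linear system of Eq. (5) and solve it by OLS, as in Eq. (6)."""
    N = X.shape[0]
    Omega = gaussian_kernel_matrix(X, X, gamma)
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = Omega + np.eye(N) / C
    t = np.concatenate(([0.0], y))                 # right-hand side [0, y_0^T]^T
    alpha = np.linalg.lstsq(A, t, rcond=None)[0]   # least-squares solution of A @ alpha = y
    b, alpha0 = alpha[0], alpha[1:]                # alpha stacks [b; alpha_0] as in Eq. (5)
    return b, alpha0

def lssvr_predict(X_train, b, alpha0, X_new, gamma):
    """Evaluate f(x) = sum_n alpha_n k(x, x_n) + b, Eq. (7)."""
    K = gaussian_kernel_matrix(X_new, X_train, gamma)
    return K @ alpha0 + b
```

When $\mathbf{A}$ has full rank, the least-squares solution returned by `lstsq` coincides with the closed-form expression of Eq. (6).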

2.1 The WLS-SVR Model

The WLS-SVR model [1] is formulated by the minimization of the loss function

$$J(\mathbf{w}, \mathbf{e}) = \frac{1}{2}\|\mathbf{w}\|_2^2 + C\,\frac{1}{2}\sum_{n=1}^{N} v_n e_n^2, \qquad (8)$$

subject to the same constraints as in Eq. (3). The error variables from the unweighted LS-SVR are weighted by the vector $\mathbf{v} = [v_1, \ldots, v_N]^T$, according to Eq. (11). The optimal solution for WLS-SVR is provided by solving the linear system $\mathbf{A}_v\boldsymbol{\alpha} = \mathbf{y}$, also by means of the OLS algorithm:

$$\boldsymbol{\alpha} = (\mathbf{A}_v^T\mathbf{A}_v)^{-1}\mathbf{A}_v^T\mathbf{y}, \qquad (9)$$

where the matrix $\mathbf{A}_v \in \mathbb{R}^{(N+1)\times(N+1)}$ is defined as

$$\mathbf{A}_v = \begin{bmatrix} 0 & \mathbf{1}^T \\ \mathbf{1} & \boldsymbol{\Omega} + C^{-1}\mathbf{V} \end{bmatrix}, \qquad (10)$$


and the diagonal matrix $\mathbf{V} \in \mathbb{R}^{N \times N}$ is given by $\mathbf{V} = \mathrm{diag}\left\{ \frac{1}{v_1}, \ldots, \frac{1}{v_N} \right\}$.

Each weight $v_n$ is determined based on the error variables $e_n = \alpha_n / C$ from the original LS-SVR model. In this paper, the robust estimates are obtained from the Hampel weight function as follows:

$$v_n = \begin{cases} 1 & \text{if } |e_n/\hat{s}| \le c_1, \\ \dfrac{c_2 - |e_n/\hat{s}|}{c_2 - c_1} & \text{if } c_1 < |e_n/\hat{s}| \le c_2, \\ 10^{-8} & \text{otherwise}, \end{cases} \qquad (11)$$

where $\hat{s} = \mathrm{IQR}/1.349$ is a robust estimate of the standard deviation of the LS-SVR error variables $e_n$, with IQR denoting the interquartile range, i.e. the difference between the 75th and 25th percentiles. The values $c_1 = 2.5$ and $c_2 = 3.0$ are typical ones.
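As a minimal illustration (not the authors' code), the Hampel weighting of Eq. (11), applied to the errors $e_n = \alpha_n/C$ of a previously fitted LS-SVR model, could be computed as follows:

```python
import numpy as np

def hampel_weights(errors, c1=2.5, c2=3.0):
    """Weights v_n of Eq. (11), using the robust scale s_hat = IQR / 1.349."""
    q75, q25 = np.percentile(errors, [75, 25])
    s_hat = (q75 - q25) / 1.349
    r = np.abs(errors / s_hat)
    v = np.full_like(errors, 1e-8)          # default: heavily down-weight gross outliers
    v[r <= c1] = 1.0                        # small errors keep full weight
    mid = (r > c1) & (r <= c2)
    v[mid] = (c2 - r[mid]) / (c2 - c1)      # linearly decaying weight in between
    return v

# hypothetical usage, given alpha0 and C from a fitted LS-SVR model:
# v = hampel_weights(alpha0 / C)
```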

2.2 The IRLS-SVR Model

The weighting procedure of the WLS-SVR model can be repeated iteratively, giving rise to the IRLS-SVR model [2]. At each iteration $i$, one can weight the error variables $e_n^{(i)} = \alpha_n^{(i)}/C$ for $n = 1, \ldots, N$. The weights $v_n^{(i)}$ are calculated based upon $e_n^{(i)}/\hat{s}$ and using the weight function in Eq. (11). The next step requires the solution of the linear system $\mathbf{A}_v^{(i)}\boldsymbol{\alpha}^{(i)} = \mathbf{y}$, where

$$\mathbf{A}_v^{(i)} = \begin{bmatrix} 0 & \mathbf{1}^T \\ \mathbf{1} & \boldsymbol{\Omega} + C^{-1}\mathbf{V}^{(i)} \end{bmatrix}, \quad \text{and} \quad \mathbf{V}^{(i)} = \mathrm{diag}\left\{ \frac{1}{v_1^{(i)}}, \ldots, \frac{1}{v_N^{(i)}} \right\}. \qquad (12)$$

The resulting model at the $i$-th iteration is then given by

$$f^{(i)}(\mathbf{x}) = \sum_{n=1}^{N} \alpha_n^{(i)} k(\mathbf{x}, \mathbf{x}_n) + b^{(i)}. \qquad (13)$$

Then, we set $i = i + 1$ and the new weights $v_n^{(i+1)}$ and the vector $\boldsymbol{\alpha}^{(i+1)}$ are calculated. In this paper, this procedure continues until $\max_n \left( |\alpha_n^{(i)} - \alpha_n^{(i-1)}| \right) \le 10^{-3}$.
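A schematic sketch of this reweighting loop (Eqs. (11)–(13)) is given below. It assumes the hypothetical gaussian_kernel_matrix and hampel_weights helpers from the previous sketches and the stopping rule quoted above:

```python
import numpy as np

def irls_svr_fit(X, y, C, gamma, tol=1e-3, max_iter=50):
    """Iteratively reweighted LS-SVR (sketch). Reuses gaussian_kernel_matrix
    and hampel_weights from the previous sketches."""
    N = X.shape[0]
    Omega = gaussian_kernel_matrix(X, X, gamma)
    t = np.concatenate(([0.0], y))
    v = np.ones(N)                                    # start from the unweighted LS-SVR
    alpha_prev = np.zeros(N)
    for _ in range(max_iter):
        Av = np.zeros((N + 1, N + 1))
        Av[0, 1:] = 1.0
        Av[1:, 0] = 1.0
        Av[1:, 1:] = Omega + np.diag(1.0 / v) / C     # Omega + C^{-1} V^{(i)}, Eq. (12)
        sol = np.linalg.lstsq(Av, t, rcond=None)[0]   # OLS solution, as in Eq. (9)
        b, alpha0 = sol[0], sol[1:]
        if np.max(np.abs(alpha0 - alpha_prev)) <= tol:  # stopping rule of the paper
            break
        alpha_prev = alpha0
        v = hampel_weights(alpha0 / C)                # new weights from e_n = alpha_n / C
    return b, alpha0
```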

3 The Proposed Approach

Initially, we compute the matrix $\mathbf{A} = [\mathbf{a}_1, \ldots, \mathbf{a}_n, \ldots, \mathbf{a}_{N+1}]$, where $\mathbf{a}_n \in \mathbb{R}^{N+1}$, and the vector $\mathbf{y}$ in Eq. (5), as in the standard LS-SVR model. The proposed idea is to apply a robust recursive estimation algorithm that, rather than using the batch OLS algorithm, updates the vector $\boldsymbol{\alpha}_n$ at each iteration $n$, taking as input vectors the columns of the matrix $\mathbf{A}$. As we will show in the simulations, this simple approach turns out to be very effective in providing robust estimates of the vector $\boldsymbol{\alpha}$, resulting in very good performance for the RLM-SVR model when the data are contaminated with outliers.


The RLM algorithm [3] was chosen as the recursive estimation rule in our approach, and its loss function is given by

$$J_\rho^{(n)} = \sum_{n=1}^{N+1} \lambda^{N+1-n} \rho(e_n), \qquad (14)$$

where $\rho(\cdot)$ is an M-estimate function and $0 \ll \lambda \le 1$ is an exponential forgetting factor, so that information from the distant past has an increasingly negligible effect on the coefficient updating. In this paper, Hampel's three-part M-estimate function is considered:

$$\rho(e) = \begin{cases} \dfrac{e^2}{2}, & 0 \le |e| < \xi_1, \\[4pt] \xi_1 |e| - \dfrac{\xi_1^2}{2}, & \xi_1 \le |e| < \xi_2, \\[4pt] \dfrac{\xi_1}{2}(\xi_3 + \xi_2) - \dfrac{\xi_1^2}{2} + \dfrac{\xi_1}{2}\dfrac{(|e| - \xi_3)^2}{\xi_2 - \xi_3}, & \xi_2 \le |e| < \xi_3, \\[4pt] \dfrac{\xi_1}{2}(\xi_3 + \xi_2) - \dfrac{\xi_1^2}{2}, & \text{otherwise}, \end{cases} \qquad (15)$$

where $\xi_1$, $\xi_2$ and $\xi_3$ are threshold parameters which need to be estimated continuously. Let the error $e_n$ have a Gaussian distribution possibly corrupted by some impulsive noise. The error variance $\sigma_n^2$ at iteration $n$ is estimated as follows:

$$\hat{\sigma}_n^2 = \lambda_e \hat{\sigma}_{n-1}^2 + c(1 - \lambda_e)\,\mathrm{med}(F_n), \qquad (16)$$

where $0 \ll \lambda_e \le 1$ is a forgetting factor, $\mathrm{med}(\cdot)$ is the median operator, $F_n = \{e_n^2, e_{n-1}^2, \ldots, e_{n-N_w+1}^2\}$, $N_w$ is the fixed window length for the median operation, and $c = 1.483\,(1 + 5/(N_w - 1))$ is the estimator's correction factor. In this paper, we set $\xi_1 = 1.96\hat{\sigma}_n$, $\xi_2 = 2.24\hat{\sigma}_n$ and $\xi_3 = 2.576\hat{\sigma}_n$ [3]. Moreover, the values $\lambda_e = 0.95$ and $N_w = 14$ were fixed.

The optimal vector $\boldsymbol{\alpha}$ can be obtained by setting the first-order partial derivative of $J_\rho^{(n)}$ with respect to $\boldsymbol{\alpha}_n$ to zero. This yields

$$\mathbf{R}_\rho^{(n)} \boldsymbol{\alpha}_n = \mathbf{P}_\rho^{(n)}, \qquad (17)$$

where $\mathbf{R}_\rho^{(n)}$ and $\mathbf{P}_\rho^{(n)}$ are called the M-estimate correlation matrix of $\mathbf{a}_n$ and the M-estimate cross-correlation vector of $\mathbf{a}_n$ and $y_n$, respectively. Eq. (17) can be solved recursively using the RLM algorithm by means of the following equations:

$$\mathbf{S}_n = \lambda^{-1}\left( \mathbf{I} - \mathbf{g}_n \mathbf{a}_n^T \right)\mathbf{S}_{n-1}, \qquad (18)$$

$$\mathbf{g}_n = \frac{q(e_n)\,\mathbf{S}_{n-1}\mathbf{a}_n}{\lambda + q(e_n)\,\mathbf{a}_n^T \mathbf{S}_{n-1}\mathbf{a}_n}, \qquad (19)$$

$$\boldsymbol{\alpha}_n = \boldsymbol{\alpha}_{n-1} + (y_n - \mathbf{a}_n^T\boldsymbol{\alpha}_{n-1})\,\mathbf{g}_n, \qquad (20)$$

where $\mathbf{S}_n = \left(\mathbf{R}_\rho^{(n)}\right)^{-1}$, $q(e_n) = \frac{1}{e_n}\frac{\partial \rho(e_n)}{\partial e_n}$ is the weight function and $\mathbf{g}_n$ is the M-estimate gain vector. The maximum number of iterations is $N + 1$; however, it may be necessary to cycle over the data samples $N_e > 1$ times for the algorithm to converge. In this paper, we used $N_e = 20$ for all experiments. The RLM-SVR model is summarized in Algorithm 1.
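To make the role of the weight function concrete, the following sketch (a plain NumPy illustration, not the authors' implementation) computes $q(e_n) = \frac{1}{e_n}\frac{\partial\rho(e_n)}{\partial e_n}$ for Hampel's three-part function of Eq. (15), together with the robust variance estimate of Eq. (16); the guard for very short windows is our own addition:

```python
import numpy as np

def q_hampel(e, xi1, xi2, xi3):
    """Weight q(e) = (1/e) * d rho(e)/de for Hampel's three-part function, Eq. (15)."""
    a = abs(e)
    if a < xi1:
        return 1.0
    if a < xi2:
        return xi1 / a
    if a < xi3:
        return xi1 * (a - xi3) / (a * (xi2 - xi3))
    return 0.0            # errors beyond xi3 are completely rejected

def robust_sigma2(sigma2_prev, err_window, lam_e=0.95):
    """Recursive robust variance estimate of Eq. (16) over a window of recent errors."""
    Nw = len(err_window)
    c = 1.483 * (1.0 + 5.0 / max(Nw - 1, 1))          # correction factor (guarded for Nw = 1)
    return lam_e * sigma2_prev + c * (1.0 - lam_e) * np.median(np.square(err_window))

# thresholds used in the paper, given the current (hypothetical) sigma estimate:
# sigma = np.sqrt(sigma2)
# xi1, xi2, xi3 = 1.96 * sigma, 2.24 * sigma, 2.576 * sigma
```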

6

J. D. A. Santos and G. A. Barreto

Algorithm 1. Pseudo-code for the RLM-SVR algorithm.

Require: C, γ, δ, N_e, λ, λ_e, N_w
  calculate A and y from Eq. (5)
  set S_0 = δI, g_0 = α_0 = 0
  for i = 1 : N_e do
    for n = 1 : N + 1 do
      e_n = y_n − α_{n−1}^T a_n
      calculate σ̂_n² from Eq. (16)
      calculate q(e_n) from the derivative of Eq. (15)
      update S_n from Eq. (18)
      update g_n from Eq. (19)
      update α_n from Eq. (20)
    end for
  end for
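Under the assumptions above, Algorithm 1 can be sketched in NumPy as follows. The sketch reuses the hypothetical gaussian_kernel_matrix, q_hampel and robust_sigma2 helpers from the earlier sketches, and the initial variance estimate is an arbitrary assumption of ours:

```python
import numpy as np
from collections import deque

def rlm_svr_fit(X, y, C, gamma, delta=1e2, Ne=20, lam=0.999, lam_e=0.95, Nw=14):
    """Sketch of Algorithm 1: solve A @ alpha = y recursively with the RLM rule.
    Reuses gaussian_kernel_matrix, q_hampel and robust_sigma2 from the sketches above."""
    N = X.shape[0]
    Omega = gaussian_kernel_matrix(X, X, gamma)
    A = np.zeros((N + 1, N + 1))                # system matrix of Eq. (5)
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = Omega + np.eye(N) / C
    t = np.concatenate(([0.0], y))              # targets of the linear system, Eq. (5)

    S = delta * np.eye(N + 1)                   # S_0 = delta * I
    alpha = np.zeros(N + 1)                     # alpha_0 = 0 (first entry is the bias b)
    sigma2 = 1.0                                # initial error-variance estimate (assumption)
    window = deque(maxlen=Nw)                   # window F_n of recent errors

    for _ in range(Ne):                         # cycle over the samples Ne times
        for n in range(N + 1):
            a_n = A[:, n]                       # n-th column of A is the input vector
            e_n = t[n] - a_n @ alpha            # a priori error
            window.append(e_n)
            sigma2 = robust_sigma2(sigma2, np.array(window), lam_e)   # Eq. (16)
            sigma = np.sqrt(sigma2)
            q = q_hampel(e_n, 1.96 * sigma, 2.24 * sigma, 2.576 * sigma)
            g = (q * S @ a_n) / (lam + q * a_n @ S @ a_n)             # Eq. (19)
            S = (S - np.outer(g, a_n) @ S) / lam                      # Eq. (18)
            alpha = alpha + e_n * g                                   # Eq. (20)
    b, alpha0 = alpha[0], alpha[1:]
    return b, alpha0
```

Note that, as in Eq. (15), a sample whose error exceeds $\xi_3$ receives zero weight, so a single outlying target has no influence on the update of $\boldsymbol{\alpha}_n$ at that step.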

Table 1. Details of the artificial datasets used in the computational experiments. The indicated noise in the last column is added only to the output of the estimation data.

| # | Output | Input/Samples (Estimation) | Input/Samples (Test) | Noise |
|---|--------|----------------------------|----------------------|-------|
| 1 | $y_n = y_{n-1} - 0.5\tanh(y_{n-1} + u_{n-1}^3)$ | $u_n = \mathcal{N}(u_n\,|\,0,1)$, $-1 \le u_n \le 1$, 150 samples | $u_n = \mathcal{N}(u_n\,|\,0,1)$, $-1 \le u_n \le 1$, 150 samples | $\mathcal{N}(0, 0.0025)$ |
| 2 | $y_n = \dfrac{y_{n-1}\,y_{n-2}\,(y_{n-1} + 2.5)}{1 + y_{n-1}^2 + y_{n-2}^2} + u_{n-1}$ | $u_n = U(-2, 2)$, 300 samples | $u_n = \sin(2\pi n/25)$, 100 samples | $\mathcal{N}(0, 0.29)$ |
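As an illustration of how the estimation data in Table 1 can be generated, a minimal sketch for artificial dataset 1 is given below. Variable names are ours, and we assume $\mathcal{N}(0, 0.0025)$ denotes zero mean and variance 0.0025 (standard deviation 0.05):

```python
import numpy as np

def make_dataset1(n_samples=150, noise_std=0.05, outlier_frac=0.0, rng=None):
    """Artificial dataset 1 of Table 1: y_n = y_{n-1} - 0.5*tanh(y_{n-1} + u_{n-1}^3),
    driven by a Gaussian input clipped to [-1, 1], with Gaussian noise on the output."""
    rng = np.random.default_rng() if rng is None else rng
    u = np.clip(rng.normal(0.0, 1.0, n_samples), -1.0, 1.0)
    y = np.zeros(n_samples)
    for n in range(1, n_samples):
        y[n] = y[n - 1] - 0.5 * np.tanh(y[n - 1] + u[n - 1] ** 3)
    y += rng.normal(0.0, noise_std, n_samples)        # std 0.05, i.e. variance 0.0025
    # optional outlier contamination of the output, as in the experiments of Section 4
    n_out = int(outlier_frac * n_samples)
    if n_out > 0:
        idx = rng.choice(n_samples, n_out, replace=False)
        My = np.max(np.abs(y))
        y[idx] += rng.uniform(-My, My, n_out)
    return u, y
```

The regressors $\mathbf{x}_n$ are then formed from the lagged signals $y_{n-1}, \ldots, y_{n-L_y}$ and $u_{n-1}, \ldots, u_{n-L_u}$, according to the chosen orders.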

4 Computer Experiments

In order to assess the performance of the proposed approach in nonlinear system identification for infinite-step-ahead prediction and in the presence of outliers, we carried out computer experiments with two artificial datasets (Table 1) and a real-world benchmarking dataset. In addition to Gaussian noise, the estimation data of all datasets were also incrementally corrupted with a fixed number of outliers equal to 2.5%, 5% and 10% of the estimation samples. To each randomly chosen sample, a value uniformly distributed in $U(-M_y, +M_y)$ was added, where $M_y$ is the maximum absolute output. Only the output values were corrupted.

The artificial datasets 1 and 2 were generated following [10] and [11], respectively. The real-world dataset is the wing flutter dataset, available in the DaISy repository (http://homes.esat.kuleuven.be/smc/daisy/daisydata.html). This dataset corresponds to a SISO (Single Input Single Output) system with 1024 samples of each sequence $u$ and $y$, of which 512 samples were used for estimation and the remaining 512 for testing.

The orders $L_u$ and $L_y$ chosen for the regressors in the artificial datasets were set to their largest delays, according to Table 1. For the wing flutter dataset, the orders $L_u, L_y \in \{1, \ldots, 5\}$ were set by a 10-fold cross-validation strategy applied to the LS-SVR model. The same strategy was used to set $C \in \{2^0, \ldots, 2^{20}\}$, $\gamma \in \{2^{-10}, \ldots, 2^{0}\}$ and $\lambda \in \{0.99, 0.999, 0.9999\}$ in the search for their optimal values. Furthermore, we used $\delta = 10^2$ or $\delta = 10^3$ to initialize the matrix $\mathbf{S}$.

We compare the performance of the RLM-SVR, standard LS-SVR, WLS-SVR and IRLS-SVR models over 20 independent runs of each algorithm. The Root Mean Square Error (RMSE) distributions over the test samples for the artificial datasets and for the wing flutter dataset are shown in Fig. 1 and Fig. 2, respectively.

Fig. 1. RMSE with test samples in free simulation: (a) Artificial 1 dataset; (b) Artificial 2 dataset.

Fig. 2. RMSE with test samples of the flutter dataset in free simulation.

In all scenarios with artificial and real datasets, the LS-SVR, WLS-SVR and IRLS-SVR algorithms obtained the same RMSE value over the 20 runs. Although the RLM-SVR algorithm presented some variance in the RMSE values over the runs, its maximum values were still lower than those obtained with the other models in all contamination scenarios, both for the artificial datasets and for the flutter dataset. Moreover, in the scenarios with artificial datasets without outliers, the RLM-SVR model achieved an average RMSE value close to those of the other robust models.
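For completeness, the infinite-step-ahead (free simulation) evaluation used to compute the test RMSE can be sketched as below. The regressor layout and the single-argument predictor (e.g. `predict = lambda x: lssvr_predict(X_train, b, alpha0, x, gamma)`) are our assumptions, since such implementation details are not spelled out in the text:

```python
import numpy as np

def free_run_rmse(predict, u_test, y_test, Ly, Lu):
    """Free simulation: past outputs in the regressor are replaced by past predictions."""
    n = len(y_test)
    start = max(Ly, Lu)
    y_hat = np.array(y_test[:start], dtype=float)      # initialize with measured values
    for k in range(start, n):
        # regressor [y_{k-1},...,y_{k-Ly}, u_{k-1},...,u_{k-Lu}] built from predictions
        x = np.concatenate((y_hat[k - Ly:k][::-1], u_test[k - Lu:k][::-1]))
        y_hat = np.append(y_hat, predict(x[None, :]))
    return np.sqrt(np.mean((y_test[start:] - y_hat[start:]) ** 2))
```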

5 Conclusion

In this paper we presented an outlier-robust recursive strategy to solve the parameter estimation problem of the standard LS-SVR model. The chosen recursive estimation rule is a robust variant of the RLS algorithm which uses M-estimators. The resulting robust LS-SVR model, called the RLM-SVR model, was successfully applied to nonlinear dynamical system identification tasks under free-simulation scenarios and in the presence of outliers. For all datasets used in the computer experiments carried out in this paper, the worst results achieved by the RLM-SVR model (i.e., the maximum RMSE values) were better than those achieved by the other robust variants of the LS-SVR model. In outlier-free scenarios, the RLM-SVR model performed similarly to them. In future work, we intend to apply the RLM-SVR model to other tasks, including time series forecasting, and to perform a stability analysis with respect to the hyperparameters of the method.

Acknowledgments: The authors thank IFCE, NUTEC and CNPq (grant no. 309841/2012-7) for their financial support.

References

1. Suykens, J.A.K., De Brabanter, J., Lukas, L., Vandewalle, J.: Weighted least squares support vector machines: robustness and sparse approximation. Neurocomputing 48(1) (2002) 85–105
2. De Brabanter, K., Pelckmans, K., De Brabanter, J., Debruyne, M., Suykens, J.A.K., Hubert, M., De Moor, B.: Robustness of kernel based regression: a comparison of iterative weighting schemes. In: Artificial Neural Networks – ICANN 2009. Springer (2009) 100–110
3. Zou, Y., Chan, S., Ng, T.: A recursive least M-estimate (RLM) adaptive filter for robust filtering in impulse noise. IEEE Signal Processing Letters 7(11) (2000) 324–326
4. Saunders, C., Gammerman, A., Vovk, V.: Ridge regression learning algorithm in dual variables. In: Proceedings of the 15th International Conference on Machine Learning (ICML-1998), Morgan Kaufmann (1998) 515–521
5. Suykens, J.A.K., Van Gestel, T., De Brabanter, J., De Moor, B., Vandewalle, J.: Least Squares Support Vector Machines. World Scientific (2002)
6. Van Gestel, T., Suykens, J.A.K., Baestaens, D.-E., Lambrechts, A., Lanckriet, G., Vandaele, B., De Moor, B., Vandewalle, J.: Financial time series prediction using least squares support vector machines within the evidence framework. IEEE Transactions on Neural Networks 12(4) (2001) 809–821
7. Khalil, H.M., El-Bardini, M.: Implementation of speed controller for rotary hydraulic motor based on LS-SVM. Expert Systems with Applications 38(11) (2011) 14249–14256
8. Falck, T., Suykens, J.A.K., De Moor, B.: Robustness analysis for least squares kernel based regression: an optimization approach. In: Proceedings of the 48th IEEE Conference on Decision and Control (CDC'09) (2009) 6774–6779
9. Huber, P.J.: Robust estimation of a location parameter. The Annals of Mathematical Statistics 35(1) (1964) 73–101
10. Kocijan, J., Girard, A., Banko, B., Murray-Smith, R.: Dynamic systems identification with Gaussian processes. Mathematical and Computer Modelling of Dynamical Systems 11(4) (2005) 411–424
11. Narendra, K.S., Parthasarathy, K.: Identification and control of dynamical systems using neural networks. IEEE Transactions on Neural Networks 1(1) (1990) 4–27
