
Chapter VI

Discrete Time Signal Processing Framework with Support Vector Machines

José Luis Rojo-Álvarez, Universidad Rey Juan Carlos, Spain
Manel Martínez-Ramón, Universidad Carlos III de Madrid, Spain
Gustavo Camps-Valls, Universitat de València, Spain
Carlos E. Martínez-Cruz, Universidad Carlos III de Madrid, Spain
Carlos Figuera, Universidad Carlos III de Madrid, Spain

Abstract

Digital signal processing (DSP) of time series using SVM has been addressed in the literature with a straightforward application of SVM kernel regression, but the assumption of independently distributed samples in regression models is not fulfilled by time-series problems. Therefore, a new branch of SVM algorithms has to be developed for the advantageous application of SVM concepts when we process data with an underlying time-series structure. In this chapter, we summarize our past, present, and future proposal for the SVM-DSP framework, which consists of several principles for creating linear and nonlinear SVM algorithms devoted to DSP problems. First, the statement of linear signal models in the primal problem (primal signal models) allows us to obtain robust estimators of the model coefficients in classical DSP problems. Next, nonlinear SVM-DSP algorithms can be addressed from two different approaches: (a) reproducing kernel Hilbert space (RKHS) signal models, which state the signal model equation in the feature space, and (b) dual signal models, which are based on the nonlinear regression of the time instants with appropriate Mercer's kernels. In this way, concepts like filtering, time interpolation, and convolution are considered and analyzed, and they open the field for future development of signal processing algorithms following this SVM-DSP framework.

Introduction

Support vector machines (SVMs) were originally conceived as efficient methods for pattern recognition and classification (Vapnik, 1995), and support vector regression (SVR) was subsequently proposed as the SVM implementation for regression and function approximation (e.g., Smola & Schölkopf, 2004). Many other digital signal processing (DSP) supervised and unsupervised schemes have also been stated from SVM principles, such as discriminant analysis (Baudat & Anouar, 2000), clustering (Ben-Hur, Horn, Siegelmann, & Vapnik, 2001), principal and independent component analysis (Bach & Jordan, 2002; Schölkopf, 1997), and mutual information extraction (Gretton, Herbrich, & Smola, 2003). An interesting perspective on signal processing using SVM, which relies on a different point of view of signal processing, can also be found in Mattera (2005).

The use of time series with supervised SVM algorithms has mainly focused on two DSP problems: (a) nonlinear system identification of the underlying relationship between two simultaneously recorded discrete-time processes, and (b) time-series prediction (Drezet & Harrison, 1998; Gretton, Doucet, Herbrich, Rayner, & Schölkopf, 2001; Suykens, 2001). In both, the conventional SVR considers lagged and buffered samples of the available signals as its input vectors. Although good results in terms of signal-prediction accuracy are achieved with this approach, several concerns can be raised from a conceptual point of view. First, the basic assumption for the regression problem is that observations are independent and identically distributed; however, the requirement of independence among samples is not fulfilled at all by time-series data. Moreover, if we do not take the temporal dependence into account, we could be neglecting highly relevant structures, such as correlation or cross-correlation information. Second, most of the preceding DSP approaches use Vapnik's ε-insensitive cost function, which is in essence a linear cost (with an insensitivity region). Nevertheless, when Gaussian noise is present in the data, a quadratic cost function should also be considered. Third, the previously mentioned methods take advantage of the well-known "kernel trick" (Aizerman, Braverman, & Rozoner, 1964) to develop nonlinear algorithms from well-established linear signal processing techniques. However, the SVM methodology has many other advantages, in addition to the flexible use of Mercer's kernels, which are still of great interest for many DSP problems that consider linear signal models. Finally, if we consider only SVR-based schemes, the analysis of an observed discrete-time sequence becomes limited, because a wide variety of time-series structures are being ignored.

Therefore, our purpose is to establish an appropriate framework for creating SVM algorithms for DSP problems involving time-series analysis.

Figure 1. Scheme of the proposal for an SVM-DSP framework described in this chapter. Our aim is to develop a variety of algorithms for time-series processing that can benefit from the excellent properties of the SVM under a variety of different signal models. [The diagram groups linear SVM-DSP algorithms, stated from primal signal models (SVM-Spect, SVM-ARMA, among others), and nonlinear SVM-DSP algorithms, stated from RKHS signal models (e.g., nonlinear SVM-ARMA) and from dual signal models (e.g., SVM-Sinc, SVM-deconv).]

This framework is born from the consideration that discrete-time data should be treated in a conceptually different way from the SVR way in order to develop more advantageous applications of SVM concepts and performance to data with an underlying time-series structure. In this chapter, we summarize our past, present, and future proposal for the SVM-DSP framework, which consists of creating SVM algorithms devoted to specific DSP problems. A brief scheme of our proposal is presented in Figure 1. On the one hand, the statement of linear signal models in the primal problem, which will be called SVM primal signal models, allows us to obtain robust estimators of the model coefficients (Rojo-Álvarez et al., 2005) in classical DSP problems, such as auto-regressive and moving-average (ARMA) modeling, the γ-filter, and spectral analysis (Camps-Valls, Martínez-Ramón, Rojo-Álvarez, & Soria-Olivas, 2004; Rojo-Álvarez, Martínez-Ramón, Figueiras-Vidal, dePrado-Cumplido, & Artés-Rodríguez, 2004; Rojo-Álvarez, Martínez-Ramón, Figueiras-Vidal, García-Armada, & Artés-Rodríguez, 2003). On the other hand, nonlinear SVM-DSP algorithms can be addressed from two different approaches: (a) RKHS signal models, which state the signal model equation in the feature space (Martínez-Ramón, Rojo-Álvarez, Camps-Valls, Muñoz-Marí, Navia-Vázquez, Soria-Olivas, & Figueiras-Vidal, in press), and (b) dual signal models, which are based on the nonlinear regression of each single time instant with appropriate Mercer's kernels (Rojo-Álvarez et al., 2006). While RKHS signal models allow us to scrutinize the statistical properties in the feature space, dual signal models yield an interesting and simple interpretation of the SVM algorithm under study in connection with the classical theory of linear systems.

The rest of the chapter is organized as follows. In the next section, the ε-Huber cost function (Mattera & Haykin, 1999; Rojo-Álvarez et al., 2003) is described, and the algorithm based on a generic primal signal model is introduced. SVM linear algorithms are then created for well-known time-series structures (spectral analysis, ARMA system identification, and the γ-filter). An example of an algorithm statement from an RKHS signal model, the nonlinear ARMA system identification, is then presented. After that, SVM algorithms for time-series sinc interpolation and for nonblind deconvolution are obtained from dual signal models. A separate section presents simple application examples. Finally, some conclusions and several proposals for future work are given.


Primal Signal Models: SVM for Linear DSP

A first class of SVM-DSP algorithms are those obtained from primal signal models. Rather than the accurate prediction of the observed signal, the main estimation target of the SVM linear framework is a set of model coefficients or parameters that contain relevant information about the time-series data. In this setting, the use of the ε-Huber cost function of the residuals allows us to deal with Gaussian noise in all the SVM-DSP algorithms, while still yielding robust estimates of the model coefficients. Taking into account that many derivation steps are similar when proposing different SVM algorithms, a general model is included next, which highlights the common and the problem-specific steps of several previously proposed algorithms (Rojo-Álvarez et al., 2005). Examples of the use of this general signal model for stating new SVM-DSP linear algorithms are given by creating SVM versions of spectral analysis, ARMA system identification, and the γ-filter structure.

The ε-Huber Cost

As previously mentioned, the DSP of time series using the SVM methodology has mainly focused on two supervised problems (nonlinear time-series prediction and nonlinear system identification; Drezet & Harrison, 1998; Gretton et al., 2001; Suykens, 2001), and both have been addressed through the straight application of the SVR algorithm. We start by noting that the conventional SVR minimizes the regularized Vapnik ε-insensitive cost, which is in essence a linear cost. Hence, it is not the most suitable loss function in the presence of Gaussian noise, which will be a usual situation in time-series analysis. This fact was previously taken into account in the formulation of LS-SVM (Suykens, 2001), where a regularized quadratic cost is used for a variety of signal problems, but in this case, nonsparse solutions are obtained.

Figure 2. (a) In the ε-Huber cost function, three different regions allow adaptation to different kinds of noise. (b) The nonlinear relationship between the residuals and the Lagrange multipliers provides robust estimates of the model coefficients.


An alternative cost function of the residuals, the ε-Huber cost, has been proposed (Mattera & Haykin, 1999; Rojo-Álvarez et al., 2003), which combines both the quadratic and the ε-insensitive cost functions. It has been shown to be a more appropriate residual cost, not only for time-series problems, but also for SVR in general (Camps-Valls, Bruzzone, Rojo-Álvarez, & Melgani, 2006). The ε-Huber cost is represented in Figure 2a, and is given by:

 0,  1 L H (en ) =  ( en − ) 2 , 2 1  2 C ( en − ) − 2 C ,

en ≤ ≤ en ≤ eC en ≥ eC



(1)

where e_n is the residual corresponding to the nth observation for a given model; e_C = ε + δC; ε is the insensitivity parameter; and δ and C control the trade-off between the regularization and the losses. The three different regions allow us to deal with different kinds of noise: the ε-insensitive zone ignores absolute residuals lower than ε; the quadratic cost zone applies the L2 norm of the errors, which is appropriate for Gaussian noise; and the linear cost zone limits the impact of outliers on the solution model coefficients. Note that equation (1) reduces to the Vapnik ε-insensitive cost function when δ is small enough, to the least squares (LS) cost function for C → ∞ and ε = 0, and to the Huber cost function when ε = 0.
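For concreteness, the following minimal NumPy sketch evaluates equation (1) over an array of residuals; the function name and default parameter values are our own illustrative choices, not part of the original formulation.

```python
import numpy as np

def eps_huber_loss(e, eps=0.1, delta=1.0, C=10.0):
    """Evaluate the epsilon-Huber cost of equation (1) on residuals e.

    Zones: insensitive (|e| <= eps), quadratic (eps <= |e| <= e_C),
    and linear (|e| >= e_C), with e_C = eps + delta * C.
    """
    e = np.asarray(e, dtype=float)
    e_c = eps + delta * C
    abs_e = np.abs(e)
    loss = np.zeros_like(abs_e)
    quad = (abs_e > eps) & (abs_e <= e_c)
    lin = abs_e > e_c
    loss[quad] = (abs_e[quad] - eps) ** 2 / (2.0 * delta)
    loss[lin] = C * (abs_e[lin] - eps) - 0.5 * delta * C ** 2
    return loss
```

Plotting the returned losses against a grid of residuals reproduces the three-zone shape of Figure 2a.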

The Primal Signal Model

Let {y_n} be a discrete-time series from which a set of N consecutive samples are measured and grouped into a vector of observations:

$$\mathbf{y} = [y_1, y_2, \ldots, y_N]', \qquad (2)$$

and let {z^p} be a set of basis vectors spanning a P-dimensional subspace E^P of the N-dimensional Hilbert signal space E^N. These vectors are described by:

$$\mathbf{z}^p = [z_1^p, z_2^p, \ldots, z_N^p]', \qquad p = 1, \ldots, P. \qquad (3)$$

Each observed signal vector y can be represented as a linear combination of the elements of this basis set, plus an error vector e = [e_1, ..., e_N]' modeling the measurement errors:

$$\mathbf{y} = \sum_{p=1}^{P} w_p \mathbf{z}^p + \mathbf{e}. \qquad (4)$$
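In matrix form, with Z = [z^1 ... z^P] as columns, equation (4) reads y = Zw + e. The snippet below generates toy observations under this model; all dimensions and values are arbitrary illustration, not taken from the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)
N, P = 64, 4
Z = rng.standard_normal((N, P))     # columns play the role of the basis z^p
w_true = np.array([1.5, -0.7, 0.0, 2.0])
e = 0.1 * rng.standard_normal(N)    # measurement-error vector
y = Z @ w_true + e                  # equation (4): y = sum_p w_p z^p + e
```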

For a given time instant n, a linear time-series model can be written as:

$$y_n = \sum_{p=1}^{P} w_p z_n^p + e_n = \mathbf{w}'\mathbf{v}_n + e_n, \qquad (5)$$

where w = [w_1, ..., w_P]' is the model weight vector to be estimated, and v_n = [z_n^1, ..., z_n^P]' is the input vector, in the input space E^I, at time instant n. Equation (5) will be called the general primal signal model, and it defines the time-series structure of the observations. It represents the functional relationship between the observations, the data (the signals generating the projected signal subspace), and the model residuals. In practice, the general primal signal model equation is fulfilled by the N available observations.

Note that the input space E^I is closely related to the Hilbert signal subspace E^P, because the input vector at time instant n is given by the nth element of each of the basis vectors of E^P. For instance, in the case of nonparametric spectral estimation, the basis of E^P consists of the sinusoidal harmonics, whereas in the case of ARMA system identification, the basis of E^P consists of the input signal and the delayed versions of the input and output signals. The problem of estimating the coefficients can be stated as the minimization of:

$$\frac{1}{2}\|\mathbf{w}\|^2 + \sum_{n=1}^{N} L_{\varepsilon H}(e_n). \qquad (6)$$
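As an illustration of how the basis of E^P and the input vectors v_n might be assembled for the two examples just mentioned, the sketch below builds the matrix whose columns are the z^p (and whose rows are the v_n'); the function names, argument conventions, and lag ordering are our own assumptions, not specifications from the chapter.

```python
import numpy as np

def spectral_basis(N, freqs):
    """Basis of E^P for nonparametric spectral estimation: one cosine
    and one sine column (z^p vectors, equation (3)) per candidate
    normalized frequency. Row n of the result is v_n'."""
    n = np.arange(N)
    cols = []
    for f in freqs:
        cols.append(np.cos(2 * np.pi * f * n))
        cols.append(np.sin(2 * np.pi * f * n))
    return np.column_stack(cols)            # shape (N, 2 * len(freqs))

def arma_inputs(x, y, P, Q):
    """Input vectors v_n for ARMA system identification: P delayed
    outputs and Q delayed inputs per row (one possible lag convention)."""
    m = max(P, Q)
    rows = [np.concatenate((y[n - P:n][::-1], x[n - Q:n][::-1]))
            for n in range(m, len(y))]
    return np.vstack(rows)
```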

Equivalently, by plugging equation (1) into equation (6), we have the following functional:

$$\frac{1}{2}\|\mathbf{w}\|^2 + \frac{1}{2\delta}\sum_{n\in I_1}\left(\xi_n^2 + \xi_n^{*2}\right) + C\sum_{n\in I_2}\left(\xi_n + \xi_n^*\right) - \sum_{n\in I_2}\frac{\delta C^2}{2}, \qquad (7)$$

to be minimized with respect to w and {ξ_n^(*)}, and constrained to:

$$y_n - \mathbf{w}'\mathbf{v}_n \le \varepsilon + \xi_n, \qquad (8)$$

$$-y_n + \mathbf{w}'\mathbf{v}_n \le \varepsilon + \xi_n^*, \qquad (9)$$

$$\xi_n, \xi_n^* \ge 0, \qquad (10)$$

for n = 1, ..., N, where I_1 and I_2 are the sets of observation indices whose residuals lie in the quadratic and in the linear cost zones, respectively. The following expression for the coefficients is then obtained:
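Although the chapter derives the solution through the dual quadratic program, the functional (6) can also be minimized directly with an off-the-shelf convex solver. The sketch below, assuming the cvxpy package is available, is our own reformulation via the slack constraints (8)-(10), not the authors' algorithm; cvxpy's built-in huber atom is rescaled so that the penalty on each slack matches the quadratic and linear zones of equation (1).

```python
import cvxpy as cp
import numpy as np

def fit_primal_svm(Z, y, eps=0.1, delta=1.0, C=10.0):
    """Minimize (1/2)||w||^2 + sum_n L_epsH(e_n) through the slack
    formulation of equations (7)-(10). Z holds the basis vectors z^p
    as columns (rows are the input vectors v_n')."""
    N, P = Z.shape
    w = cp.Variable(P)
    xi = cp.Variable(N, nonneg=True)        # slacks of constraint (8)
    xi_s = cp.Variable(N, nonneg=True)      # slacks of constraint (9)
    cons = [y - Z @ w <= eps + xi,
            -(y - Z @ w) <= eps + xi_s]
    # cvxpy's huber(x, M) is quadratic up to M and linear beyond; with
    # M = delta*C and a 1/(2*delta) rescale it reproduces equation (1)
    # outside the insensitive zone.
    loss = cp.sum(cp.huber(xi, M=delta * C)
                  + cp.huber(xi_s, M=delta * C)) / (2 * delta)
    prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w) + loss), cons)
    prob.solve()
    return w.value
```

Applied to the toy data generated after equation (4), fit_primal_svm(Z, y) should approximately recover w_true, with a small bias introduced by the regularization term.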


$$w_p = \sum_{n=1}^{N} \eta_n z_n^p, \qquad (11)$$

where η_n = α_n − α_n^*, and α_n, α_n^* are the Lagrange multipliers corresponding to constraints (8) and (9).

Several properties of the method can be observed from these expressions:

1. The coefficient vector w can be expressed as a (possibly sparse) linear combination of the input space vectors.

2. A straightforward relationship between the residuals and the Lagrange multipliers can be derived from the Karush-Kuhn-Tucker conditions (e.g., Rojo-Álvarez et al., 2004):

$$
\eta_n =
\begin{cases}
\operatorname{sign}(e_n)\, C, & |e_n| \ge e_C \\[4pt]
\operatorname{sign}(e_n)\,\dfrac{1}{\delta}\,(|e_n| - \varepsilon), & \varepsilon \le |e_n| \le e_C \\[4pt]
0, & |e_n| \le \varepsilon
\end{cases}
$$