DICTIONARY-BASED DECOMPOSITION OF LINEAR MIXTURES OF GAUSSIAN PROCESSES

Christophe Couvreur
Service de Physique générale, Faculté Polytechnique de Mons, B-7000 Mons (Belgium)
[email protected]

Yoram Bresler
Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Urbana, IL 61801
[email protected]

ABSTRACT

We consider the problem of detecting and classifying an unknown number of multiple simultaneous Gaussian processes with unknown variances, given a finite-length observation of their sum and a dictionary of candidate models for the signals. The optimal minimum description length (MDL) detector is presented. Asymptotic and quadratic approximations of the MDL criterion are derived, and regularization algorithms for their efficient implementation are described. The performance of the algorithms is illustrated by numerical simulations. Interpretations in terms of vector quantization and of model-based spectral analysis are discussed, together with applications and possible extensions.

1. INTRODUCTION

In [1], a method was proposed for the detection and classification of an unknown number of simultaneous Gaussian autoregressive (AR) signals with unknown variances, given a finite-length observation of their sum and a dictionary of candidate AR models. It was shown that the problem reduces to the maximum-likelihood (ML) estimation of the variances of the AR components for every possible subset of AR models formed from the dictionary. The "best" subset of AR components is then found by applying Rissanen's minimum description length (MDL) principle. The ML estimates of the variances were obtained by combining the EM algorithm with the Rauch-Tung-Striebel optimal smoother. In this paper, we extend the results of [1] to dictionaries composed of more general classes of Gaussian processes than AR processes alone. We also propose an original method to avoid the combinatorial explosion usually associated with the search through all possible subsets of the dictionary.

A discrete-time series $\{y_t\}$ is defined as a linear mixture of Gaussian processes if $\{y_t\}$ is the sum of $c$ component sequences $\{x_{i,t}\}$,

    y_t = \sum_{i=1}^{c} x_{i,t},   (1)

Christophe Couvreur is a Research Assistant of the Belgian National Fund for Scientific Research, at Faculté Polytechnique de Mons, Mons, Belgium. The work of Yoram Bresler was supported in part by National Science Foundation grant No. MIP 91-57377.

and the sequences $\{x_{i,t}\}$, $i = 1, \ldots, c$, are uncorrelated stationary Gaussian random processes. Let $\sigma_{x_i}^2$ be the variance of $\{x_{i,t}\}$ and let $S_{x_i}(\omega)$ be its normalized power spectral density (PSD). The PSD of $\{y_t\}$ is then given by

    S_y(\omega) = \sum_{i=1}^{c} \sigma_{x_i}^2 S_{x_i}(\omega).   (2)

We consider the problem of detecting and classifying the components that are present in the mixture $\{y_t\}$ when a set of candidate processes of known statistics is available. That is, we have a finite dictionary of Gaussian processes $\{x_{i,t}\}$ defined, e.g., by their normalized PSDs $S_{x_i}(\omega)$, and we want to find which elements from this dictionary are present in the observed signal $\{y_t\}$. Typical examples of situations in which dictionary-based decomposition of a linear mixture of signals is desired include environmental sound recognition, speech processing in noisy backgrounds, and stochastic nuclear magnetic resonance (NMR) spectroscopy. As will be seen in the sequel, the dictionary-based mixture decomposition problem is, in a stochastic sense, equivalent to vector quantization (VQ) in the spectral domain with the possibility of simultaneously selecting multiple codewords from the codebook to best fit the observed signal. Such simultaneous VQ can be used as a preprocessor to implement an extension of the VQ-HMM paradigm for the classification of mixtures of signals, of direct interest in environmental sound recognition and in speech processing in noisy backgrounds [2]. In NMR spectroscopy, a dictionary of spectra of single chemical elements can easily be constructed. For a mixture of chemical elements, the NMR spectrum is the sum of the individual NMR spectra weighted by the proportions of the elements in the mixture. The algorithms proposed here not only detect which of the components from the dictionary are present in the observed signal, but also estimate their proportions, which suggests their application to model-based spectral analysis in stochastic NMR spectroscopy.
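As a small illustration of the mixture model (1)-(2), the sketch below (hypothetical Python, not part of the paper) builds two normalized AR(1) spectra and checks numerically that the mixture PSD is their variance-weighted sum; the AR coefficients and variances are arbitrary illustrative values.

```python
import numpy as np

def ar1_psd(a, omega):
    # Normalized PSD of the AR(1) process x_t = a*x_{t-1} + u_t; the factor
    # (1 - a^2) scales the spectrum so that (1/2pi) times its integral over
    # [-pi, pi] equals 1, i.e., unit process variance.
    return (1.0 - a**2) / np.abs(1.0 - a * np.exp(-1j * omega))**2

omega = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
S1, S2 = ar1_psd(0.7, omega), ar1_psd(-0.5, omega)

# each normalized PSD averages to 1 over a full period (unit variance)
assert abs(S1.mean() - 1.0) < 1e-3 and abs(S2.mean() - 1.0) < 1e-3

# eq. (2): the mixture PSD is the variance-weighted sum of component PSDs,
# so the mixture variance is the sum of the component variances
v1, v2 = 2.0, 0.5
Sy = v1 * S1 + v2 * S2
assert abs(Sy.mean() - (v1 + v2)) < 1e-2
```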

2. PROBLEM FORMULATION

Let $\Omega$ be a dictionary of $M$ models of stationary Gaussian processes defined by their PSDs $S_{x_i}(\omega)$ or, equivalently, by their autocorrelation sequences $\{r_{x_i,k}\}$:

    \Omega = \{ S_{x_i}(\omega),\ i = 1, \ldots, M \}.   (3)

Let $\Gamma$, $c(\Gamma)$, and $\sigma(\Gamma)$ denote, respectively, a subset of components from $\Omega$, the number of components in $\Gamma$, and the associated vector of variances $\sigma(\Gamma) = [\sigma_{x_1}^2, \ldots, \sigma_{x_{c(\Gamma)}}^2]^T$. To each subset $\Gamma$ with associated variances $\sigma(\Gamma)$ corresponds a linear mixture signal model for the process $\{y_t\}$ defined by (1) and (2).

Let $y = [y_1, \ldots, y_N]^T$ denote a finite-length observation of a linear mixture $\{y_t\}$ of $c$ Gaussian components taken from the dictionary $\Omega$. We want to find which $c$ elements of $\Omega$ are present in $\{y_t\}$ from the observation $y$. That is, we want to find the subset $\Gamma \subseteq \Omega$ that corresponds to the "best" linear mixture signal model for the observation $y$. Alternately, we want to find the "best" decomposition of the linear mixture $\{y_t\}$ onto the dictionary $\Omega$. The number of components $c$ and their variances $\sigma_{x_1}^2, \ldots, \sigma_{x_c}^2$ are a priori unknown.

It was shown in [1] that the optimal solution to this decomposition problem, in an information-theoretic sense, is given by the subset $\Gamma$ that minimizes Rissanen's minimum description length criterion $\mathrm{MDL}(\Gamma)$ over all possible subsets $\Gamma \subseteq \Omega$. For the linear mixture case, the MDL criterion is given by

    \mathrm{MDL}(\Gamma) = -L(y; \Gamma, \hat\sigma(\Gamma)) + \frac{c(\Gamma)}{2} \ln N,   (4)

where $L(y; \Gamma, \sigma(\Gamma)) = \ln p(y; \Gamma, \sigma(\Gamma))$ is the log-likelihood of the observation $y$ given the signal model corresponding to the subset $\Gamma$ and its associated variances $\sigma(\Gamma)$, and

    \hat\sigma(\Gamma) = \arg\max_{\sigma(\Gamma)} L(y; \Gamma, \sigma(\Gamma)),   (5)

that is, $\hat\sigma(\Gamma)$ is the ML estimate of $\sigma(\Gamma)$ given the signal model $\Gamma$. For a linear mixture of Gaussian processes, the log-likelihood is equal to

    L(y; \Gamma, \hat\sigma(\Gamma)) = -\frac{N}{2} \ln 2\pi - \frac{1}{2} \ln |R| - \frac{1}{2} y^T R^{-1} y,   (6)

with the Toeplitz covariance matrix $R = R(\Gamma, \sigma(\Gamma))$ defined by

    [R]_{ij} = \sigma_{x_1}^2 r_{x_1,|i-j|} + \ldots + \sigma_{x_{c(\Gamma)}}^2 r_{x_{c(\Gamma)},|i-j|}.   (7)

The MDL criterion can be shown to lead to a consistent solution.
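To make (4)-(7) concrete, here is a hypothetical Python sketch (not from the paper) that assembles the Toeplitz covariance (7) from component autocorrelation sequences and evaluates the Gaussian log-likelihood (6) and the MDL score (4); the function names and the direct O(N^3) solve are illustrative only.

```python
import numpy as np

def mixture_loglik(y, acfs, variances):
    # Gaussian log-likelihood (6) under the mixture covariance (7):
    # R[i, j] = sum_m variances[m] * acfs[m][|i - j|].
    N = len(y)
    lags = np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
    R = sum(v * np.asarray(r)[lags] for v, r in zip(variances, acfs))
    _, logdet = np.linalg.slogdet(R)          # O(N^3) direct evaluation
    quad = y @ np.linalg.solve(R, y)
    return -0.5 * (N * np.log(2.0 * np.pi) + logdet + quad)

def mdl(y, acfs, variances):
    # MDL criterion (4): each retained component is charged (ln N)/2.
    return -mixture_loglik(y, acfs, variances) + 0.5 * len(variances) * np.log(len(y))
```

For a single white-noise component with autocorrelation $[1, 0, \ldots]$ and variance $v$, this reduces to the familiar i.i.d. Gaussian likelihood, which gives a quick sanity check.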

3. REGULARIZATION

The minimization of (4) with (5) is a hybrid two-level optimization problem: it requires a combinatorial minimization over all possible subsets $\Gamma$ and a continuous maximization over the mixture variance parameters $\sigma(\Gamma)$ for each subset $\Gamma$. The combinatorial part rapidly leads to an explosion of the computational load, even for moderate dictionary sizes. To avoid this combinatorial explosion, a regularization argument can be used to reduce the hybrid optimization problem to a fully continuous one. The equivalent continuous formulation is obtained as follows. The signal model corresponding to the subset $\Gamma$ with variances $\sigma(\Gamma)$ can alternately be viewed as the signal model in which all the elements of the dictionary $\Omega$ are present, but with the variances of the components in $\Omega \setminus \Gamma$ set to zero. Therefore, the minimization of (4) with (5) is equivalent to the constrained minimization of the negative log-likelihood for the signal model corresponding to $\Omega$, plus regularization terms accounting for the non-zero variance components. That is,

    \hat\sigma(\Omega) = \arg\min_{\sigma(\Omega)} \left[ -L(y; \sigma(\Omega), \Omega) + \frac{\ln N}{2} \sum_{i=1}^{M} \eta(\sigma_i(\Omega)) \right],   (8)

where

    \eta(u) = \begin{cases} 0 & \text{when } u = 0 \\ 1 & \text{when } u > 0 \end{cases},   (9)

and the components of $\sigma(\Omega) = [\sigma_{x_1}^2, \ldots, \sigma_{x_M}^2]^T$, which are variances, are constrained to be non-negative. The subset of elements of $\Omega$ minimizing (4) is then determined by the non-zero elements of $\hat\sigma(\Omega)$, which act as indicator variables. The constrained minimization of (8) can be approximately replaced by the unconstrained minimization

    \tilde\sigma(\Omega) = \arg\min_{\sigma(\Omega)} \left[ -L(y; \sigma(\Omega), \Omega) + \frac{\ln N}{2} \sum_{i=1}^{M} \tilde\eta(\sigma_i(\Omega)) \right],   (10)

[Figure 1: Regularization term: (a) "exact" indicator $\eta(u)$, (b) smooth approximation $\tilde\eta(u)$.]

where the regularization term $\tilde\eta(u)$ is a smooth approximation of $\eta(u)$ that also accounts for the positivity constraint (see Figure 1). For example,

    \tilde\eta(u) = \begin{cases} \mu u^2 & \text{when } u < 0 \\ 1 - \exp(-u^2/\delta) & \text{when } u \geq 0 \end{cases},   (11)

with adequately chosen regularization constants $\mu$ and $\delta$. Once the unconstrained minimization of (10) has been performed, the variances in $\tilde\sigma(\Omega)$ can be thresholded and used as indicator variables.

Note that alternative selection criteria could be used instead of the MDL criterion for the selection of the best subset of $\Omega$. Many of these criteria have a structure similar to that of (4), and the regularization method proposed here can be applied to their optimization as well. For example, Akaike's information criterion (AIC) for the linear mixture decomposition problem can be written as

    \mathrm{AIC}(\Gamma) = -2 \ln p(y; \hat\sigma(\Gamma), \Gamma) + 2 c(\Gamma),   (12)

which can be straightforwardly regularized. In the next sections, we will mostly be concerned with the "regularized" problem. For notational simplicity, we will drop the $\Omega$ indexing: unless stated otherwise, $L(y; \sigma)$ will stand for $L(y; \Omega, \sigma(\Omega))$.
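The indicator (9) and a smooth surrogate in the spirit of (11) can be sketched as follows (hypothetical Python; the constants `mu` and `delta` are arbitrary illustrative choices, not values from the paper):

```python
import numpy as np

def eta(u):
    # "Exact" indicator penalty (9): 0 at u == 0, 1 for u > 0.
    u = np.asarray(u, dtype=float)
    return np.where(u > 0.0, 1.0, 0.0)

def eta_smooth(u, mu=100.0, delta=0.1):
    # Smooth approximation in the spirit of (11): a stiff quadratic wall
    # penalizes negative (infeasible) variances, while 1 - exp(-u^2/delta)
    # rises smoothly from 0 toward the indicator value 1 as u grows.
    u = np.asarray(u, dtype=float)
    return np.where(u < 0.0, mu * u**2, 1.0 - np.exp(-u**2 / delta))
```

As `delta` shrinks, `eta_smooth` approaches the indicator for non-negative arguments, while the quadratic branch replaces the hard constraint $u \geq 0$ by a steep but differentiable penalty.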

4. APPROXIMATION OF THE MDL CRITERION

Both the solution of the "exact" problem, including the combinatorial optimization, and the solution of the "regularized" problem with standard minimization techniques require the evaluation of the log-likelihood $L(y; \sigma)$ and its derivatives. Direct evaluation of $L(y; \sigma)$ by (6) for a given $y$ and a given $\sigma$ requires $O(N^3)$ operations, which is not practical, especially for large sample sizes $N$. In some situations, the specific structure of $L(y; \sigma)$ can be exploited to develop simultaneously efficient evaluation and optimization algorithms. For example, if the dictionary $\Omega$ contains only rational PSDs (AR, MA, or ARMA), the computation of the MDL criterion for a subset $\Gamma$ ($\Gamma = \Omega$ in the regularized case) can be implemented by combining the EM algorithm with the Rauch-Tung-Striebel optimal smoother [1][3]. The iterative EM algorithm requires $O(N [c(\Gamma)]^3)$ operations per iteration ($c(\Gamma) = M$ in the regularized case). In general, however, it is necessary to resort to approximations of $L(y; \sigma)$. An asymptotic approximation and a quadratic approximation of $L(y; \sigma)$ are now derived. Although they are introduced here for use in the regularized approach, these approximations are also suitable for reducing the computational load involved in the maximization of $L(y; \Gamma, \sigma(\Gamma))$ for each subset $\Gamma$ in the "combinatorial" approach.

4.1. Asymptotic approximation

Instead of the exact log-likelihood $L(y; \sigma)$, it is possible to use Whittle's asymptotic approximation, which is given by

    L_W(y; \sigma) = -\frac{N}{2} \ln 2\pi - \frac{N}{4\pi} \int_{-\pi}^{\pi} \left[ \ln S_\sigma(\omega) + \frac{I_N(\omega)}{S_\sigma(\omega)} \right] d\omega,   (13)

where

    I_N(\omega) = \frac{1}{2\pi N} \left| \sum_{k=1}^{N} y_k e^{-j\omega k} \right|^2   (14)

is the periodogram of $y$, and $S_\sigma(\omega)$ denotes the spectral density of $y$ given the value of $\sigma$ for the current subset. For large $N$, it can be shown that $L_W(y; \sigma) \approx L(y; \sigma)$ [4]. In practice, the integral in (13) can be replaced by its approximating Riemann sum over a discrete set of frequencies, e.g., $N$ equally spaced discrete frequencies with $N$ a power of 2, to allow efficient computation of the periodogram by FFT. The computation of the periodogram requires $O(N \log_2 N)$ operations. The minimization of the regularized approximate MDL criterion

    -L_W(y; \sigma) + \frac{\ln N}{2} \sum_{i=1}^{M} \tilde\eta(\sigma_i)   (15)

can then be performed by classical iterative optimization methods. Each iteration requires $O(N)$ operations for the evaluation of (15) or its derivatives.
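A minimal numerical sketch of (13)-(14) (hypothetical Python, not the paper's MATLAB code): the periodogram is computed by FFT, and the integral in (13) is replaced by its Riemann sum over the $N$ Fourier frequencies, which collapses the factor $N/4\pi$ times the grid spacing $2\pi/N$ into $1/2$. The white-noise sanity check assumes the convention that the PSD integrates to the process variance over $[-\pi, \pi]$.

```python
import numpy as np

def periodogram(y):
    # I_N at the N Fourier frequencies 2*pi*k/N, k = 0..N-1, per (14).
    N = len(y)
    return np.abs(np.fft.fft(y))**2 / (2.0 * np.pi * N)

def whittle_loglik(y, S):
    # Riemann-sum form of Whittle's approximation (13); S holds the model
    # PSD at the same N Fourier frequencies.
    N = len(y)
    I = periodogram(y)
    return -0.5 * N * np.log(2.0 * np.pi) - 0.5 * np.sum(np.log(S) + I / S)

# sanity check: for a white-noise model (flat PSD v/2pi), the Whittle
# estimate of the variance v is the sample mean square of y
rng = np.random.default_rng(3)
y = rng.standard_normal(1024)
grid = np.linspace(0.25, 4.0, 376)
scores = [whittle_loglik(y, np.full(len(y), v / (2.0 * np.pi))) for v in grid]
v_hat = grid[int(np.argmax(scores))]
assert abs(v_hat - np.mean(y**2)) < 0.02
```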

4.2. Alternative interpretation

Whittle's approximation also leads to an interesting vector quantization interpretation of the dictionary-based mixture decomposition problem. First, observe that the maximization of the log-likelihood is equivalent to the minimization of the Itakura-Saito spectral distance between the periodogram $I_N(\omega)$ and $S_\sigma(\omega)$. The Itakura-Saito distance $d_{IS}$ is defined by [5]

    d_{IS}(S_\sigma, I_N) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \left[ \ln \frac{S_\sigma(\omega)}{I_N(\omega)} + \frac{I_N(\omega)}{S_\sigma(\omega)} - 1 \right] d\omega.   (16)

Thus, the minimizer of the MDL selection criterion (4) is also approximately the minimizer of

    \min_{\sigma(\Gamma)} \frac{N}{2} d_{IS}(S_{\sigma(\Gamma)}, I_N) + \frac{c(\Gamma)}{2} \ln N.   (17)

Therefore, in VQ parlance, the use of the MDL selection criterion yields the linear combination of codewords taken from the codebook that has minimum Itakura-Saito distance to the empirical spectrum of the signal, with a complexity penalty on the number of codewords used. Alternately, the dictionary-based decomposition of a linear mixture of Gaussian processes can be interpreted as a model-based spectral analysis method in which the parameters are both discrete (the indicators of the elements taken from the dictionary) and continuous (the variances). These parameters are estimated via a spectral-matching criterion based on the Itakura-Saito spectral distance, including a penalty term for the number of continuous parameters used in the model.
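The discretized form of (16) can be sketched as follows (hypothetical Python; the grid and spectra are illustrative). For constant spectra the distance has the closed form $\ln(s/i) + i/s - 1$, which gives a quick check:

```python
import numpy as np

def itakura_saito(S, I):
    # Discretized Itakura-Saito distance (16) on a uniform grid of N
    # frequencies covering [-pi, pi): (1/2pi) * (2pi/N) * sum = mean.
    S, I = np.asarray(S, float), np.asarray(I, float)
    return np.mean(np.log(S / I) + I / S - 1.0)

N = 256
S = np.full(N, 2.0)
I = np.full(N, 0.5)
closed_form = np.log(2.0 / 0.5) + 0.5 / 2.0 - 1.0
assert abs(itakura_saito(S, I) - closed_form) < 1e-12

# the distance is non-negative and vanishes when the spectra coincide
rng = np.random.default_rng(0)
R = rng.uniform(0.5, 2.0, N)
assert itakura_saito(R, R) == 0.0
assert itakura_saito(R, rng.uniform(0.5, 2.0, N)) > 0.0
```

Note the asymmetry of $d_{IS}$: it is a divergence between spectra, not a metric, which is the standard situation in spectral-matching VQ.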

4.3. Quadratic approximation

Let $\hat S_N(\omega)$ denote a consistent spectral estimator obtained from $y$, e.g., an averaged, smoothed, or windowed periodogram [6]. If a second-order Taylor expansion about $\hat S_N(\omega)$ is used in (16), and the integral is replaced by its approximating Riemann sum over $N$ points, the following quadratic approximation of $d_{IS}(S_\sigma, I_N)$ is obtained after some straightforward algebra:

    d_{IS}(S_\sigma, I_N) \approx \frac{1}{4\pi} \int_{-\pi}^{\pi} \left[ \frac{S_\sigma(\omega) - I_N(\omega)}{\hat S_N(\omega)} \right]^2 d\omega
                         \approx \frac{1}{2N} \sum_{k=0}^{N-1} \hat S_N^{-2}\!\left(\tfrac{2\pi k}{N}\right) \left[ S_\sigma\!\left(\tfrac{2\pi k}{N}\right) - I_N\!\left(\tfrac{2\pi k}{N}\right) \right]^2
                         = \frac{1}{2N} Q(y; \sigma).   (18)

With this approximation and the remark of Section 4.2, the minimization of the regularized MDL criterion (10) becomes equivalent to the minimization of

    \frac{1}{4} Q(y; \sigma) + \frac{\ln N}{2} \sum_{i=1}^{M} \tilde\eta(\sigma_i).   (19)
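Since $Q(y; \sigma)$ is quadratic in the mixture variances, its unconstrained minimizer solves a weighted linear least-squares problem; this is what is used in Section 5 to initialize the iterative algorithms. A hypothetical Python sketch (names and data illustrative, not from the paper):

```python
import numpy as np

def q_value(variances, dict_psds, I, S_hat):
    # Q(y; sigma) from (18): weighted squared error between the mixture
    # PSD and the empirical spectrum, with weights 1/S_hat^2.
    S_model = np.asarray(dict_psds).T @ np.asarray(variances)
    return np.sum(((S_model - I) / S_hat)**2)

def q_minimizer(dict_psds, I, S_hat):
    # Unconstrained arg min of Q: a weighted linear least-squares
    # problem in the variance vector.
    A = np.asarray(dict_psds).T / S_hat[:, None]
    b = I / S_hat
    variances, *_ = np.linalg.lstsq(A, b, rcond=None)
    return variances

# check: if the empirical spectrum is an exact mixture, the weighted
# least-squares fit recovers the true variances
K = 128
omega = np.linspace(-np.pi, np.pi, K, endpoint=False)
psds = [np.ones(K), 1.0 + np.cos(omega)**2]   # two illustrative "PSDs"
I = 2.0 * psds[0] + 0.5 * psds[1]
S_hat = np.full(K, 1.3)                        # arbitrary positive weights
v = q_minimizer(psds, I, S_hat)
assert np.allclose(v, [2.0, 0.5])
```

The unconstrained minimizer may contain negative entries for noisy spectra, which is precisely what the penalties in (19) and the constraint handling of Section 3 address.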

Since (19) is non-convex in $\sigma$ and possibly multimodal, convergence of optimization algorithms to the global minimizer is not guaranteed. Furthermore, even convergence to a local minimum may be difficult with classical algorithms, owing to numerical difficulties that can arise with such objective functions. However, a recently proposed deterministic relaxation algorithm [7] can be used to transform this non-convex optimization problem into a sequence of piecewise quadratic (convex) optimization problems. This algorithm has been shown to be globally convergent for a class of cost functions $Q$ and regularization terms $\tilde\eta$ to which our problem belongs. If the regularization term (11) is used, and it is assumed without loss of generality that $\delta = 1$, the deterministic relaxation algorithm is defined as follows. Let $J : \mathbb{R}^M \times (0,1]^M \to \mathbb{R}$ be the functional defined by

    J(\sigma, e) = \frac{1}{4} Q(y; \sigma) + G(\sigma) + \frac{\ln N}{2} \sum_{i=1}^{M} e_i \sigma_i^2,   (20)

where $e = [e_1, \ldots, e_M]^T \in (0,1]^M$, and

    G(\sigma) = \mu \sum_{i=1}^{M} \left( [\sigma_i]_- \right)^2,   (21)

with $[\sigma_i]_-$ the negative part of $\sigma_i$. The $k$-th iteration of the algorithm is defined by

    e_i = \exp\!\left( -\left( \sigma_i^{(k)} \right)^2 \right), \quad i = 1, \ldots, M,   (22)

    \sigma^{(k+1)} = \arg\min_{\sigma} J(\sigma, e).   (23)

The global convergence of this algorithm is a direct consequence of the theorems in [7]. Note that $J(\sigma, e)$ is a convex function of $\sigma$ for a given $e$. Therefore, it admits a unique minimizer, and most standard iterative optimization algorithms will converge to it [8]. This algorithm requires $O(N)$ operations per iteration.
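The iteration (22)-(23) can be sketched on a toy instance (hypothetical Python, not from the paper): the data term $(1/4)Q$ is taken to be a small weighted least-squares fit $Q(\sigma) = \|A\sigma - b\|^2$, with column $m$ of $A$ standing in for the $m$-th dictionary spectrum on a frequency grid, and the convex inner problem (23) is solved by plain gradient descent. All sizes and constants ($\mu$, the step size, the iteration counts) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
M, K, N = 3, 40, 512
A = rng.uniform(0.1, 2.0, (K, M))       # surrogate dictionary spectra
sigma_true = np.array([2.0, 0.0, 1.0])  # component 2 absent from the mixture
b = A @ sigma_true
mu, half_log_N = 10.0, 0.5 * np.log(N)

def inner_min(e, sigma, steps=3000):
    # Gradient descent on the convex functional J(., e) of (20).
    lr = 1.0 / (0.5 * np.linalg.norm(A, 2)**2 + 2.0 * mu + 2.0 * half_log_N)
    for _ in range(steps):
        grad = 0.5 * A.T @ (A @ sigma - b)               # gradient of (1/4)*Q
        grad = grad + 2.0 * mu * np.minimum(sigma, 0.0)  # gradient of G, (21)
        grad = grad + 2.0 * half_log_N * e * sigma       # relaxed penalty term
        sigma = sigma - lr * grad
    return sigma

sigma = np.full(M, 0.5)
for _ in range(8):                  # outer relaxation sweeps
    e = np.exp(-sigma**2)           # (22), with delta = 1
    sigma = inner_min(e, sigma)     # (23)

# the absent component is driven toward zero; present ones survive
assert abs(sigma[1]) < 0.3
assert sigma[0] > 1.5 and sigma[2] > 0.5
```

Thresholding the final `sigma` then yields the indicator variables of Section 3; in the paper the inner problem (23) is solved with BFGS rather than the plain gradient steps used here.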

5. PRELIMINARY RESULTS

In order to evaluate the performance of the proposed algorithms, we conducted several Monte-Carlo simulations. The algorithms were implemented in MATLAB. We used a quasi-Newton method (the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm [8]) for the optimization of the regularized asymptotic MDL (15) and for the inner optimization (23) of the deterministic relaxation algorithm used for the regularized quadratic MDL (19). We used the Welch spectral estimator for $\hat S_N(\omega)$ in the quadratic term $Q(y; \sigma)$. The iterative algorithms were initialized with $\arg\min_\sigma Q(y; \sigma)$, which can be obtained by solving a weighted least-squares problem. We used a dictionary of 5 ARMA processes to allow comparison with the "exact" MDL approach of [3]. The ARMA models in the dictionary are described in Table 1. The dictionary includes white-noise, stop-band, low-pass, narrow-band, and broad-band processes. The number of samples $N$ was varied from 128 to 2048.

Table 1: Dictionary of ARMA models ($\{u_{i,t}\}$ is the innovation sequence of process $\{x_{i,t}\}$).
    x_{1,t} = u_{1,t}                                          (all-pass)
    x_{2,t} = u_{2,t} - 0.2 u_{2,t-1} + 0.9 u_{2,t-2}          (stop-band)
    x_{3,t} = 0.7 x_{3,t-1} + u_{3,t}                          (low-pass)
    x_{4,t} = 0.8 x_{4,t-1} - 0.92 x_{4,t-2} + u_{4,t}         (broad-band)
    x_{5,t} = -0.5 x_{5,t-1} + 0.65 x_{5,t-2} + u_{5,t}        (narrow-band)

We found that the regularized quadratic MDL method offered good convergence with only a few iterations of (22)-(23). Using 10 to 15 BFGS iterations in (23) seemed to provide a good compromise between accuracy and computational savings. The asymptotic approximation method sometimes encountered convergence problems. The numerical results obtained confirmed the validity of our approximations and established the consistency of the method. Moreover, the algorithms showed robustness to non-Gaussianity of the data and to dictionary mismatches, as could have been expected from their VQ interpretation. Complete results, including simulations performed on larger dictionaries and comparisons with the "combinatorial" approach, will be presented at the conference.
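The Table 1 models can be simulated with a direct-form ARMA recursion; the sketch below (hypothetical Python, whereas the paper used MATLAB) encodes each model as its AR and MA coefficient lists and filters a given innovation sequence, so the impulse responses can be checked against the difference equations.

```python
import numpy as np

# Table 1 models as (AR coefficients a_k, MA coefficients b_k) of
# x_t = sum_k a_k x_{t-k} + u_t + sum_k b_k u_{t-k}.
DICTIONARY = {
    "all-pass":    ([], []),
    "stop-band":   ([], [-0.2, 0.9]),
    "low-pass":    ([0.7], []),
    "broad-band":  ([0.8, -0.92], []),
    "narrow-band": ([-0.5, 0.65], []),
}

def arma_filter(ar, ma, u):
    # Direct-form recursion driven by the innovation sequence u.
    x = np.zeros(len(u))
    for t in range(len(u)):
        x[t] = u[t]
        x[t] += sum(b * u[t - k] for k, b in enumerate(ma, 1) if t - k >= 0)
        x[t] += sum(a * x[t - k] for k, a in enumerate(ar, 1) if t - k >= 0)
    return x

# impulse responses match the Table 1 difference equations
imp = np.array([1.0, 0.0, 0.0, 0.0])
assert np.allclose(arma_filter(*DICTIONARY["stop-band"], imp), [1.0, -0.2, 0.9, 0.0])
assert np.allclose(arma_filter(*DICTIONARY["low-pass"], imp), [1.0, 0.7, 0.49, 0.343])
```

Feeding `arma_filter` a long Gaussian innovation sequence (with a burn-in discarded) gives realizations of the dictionary processes for Monte-Carlo experiments of the kind described above.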

6. REFERENCES

[1] C. Couvreur and Y. Bresler, "Decomposition of a mixture of Gaussian AR processes", in Proc. ICASSP'95, Detroit, MI, May 8-12, 1995, pp. 1605-1608.
[2] C. Couvreur, V. Fontaine, and H. Leich, "Discrete HMM's for mixtures of signals", in Proc. VIII Euro. Sig. Process. Conf., Trieste, Italy, September 10-13, 1996.
[3] C. Couvreur and Y. Bresler, "Optimal decomposition and classification of linear mixtures of ARMA processes", in preparation.
[4] K. O. Dzhaparidze, Parameter Estimation and Hypothesis Testing in Spectral Analysis of Stationary Time Series, Springer-Verlag, 1986.
[5] F. Itakura and S. Saito, "A statistical method for estimation of speech spectral density and formant frequencies", Electronics and Communications in Japan, vol. 53-A, no. 1, pp. 36-43, 1970.
[6] B. Porat, Digital Processing of Random Signals: Theory and Methods, Prentice-Hall, 1994.
[7] A. H. Delaney and Y. Bresler, "Globally convergent edge-preserving regularized reconstruction: An application to limited-angle tomography", IEEE Trans. Image Process., 1994, submitted for publication.
[8] D. G. Luenberger, Linear and Nonlinear Programming, Addison-Wesley, second edition, 1989.
