Contrast Functions for Blind Source Separation Based on Time-Frequency Information Theory

M. Sahmoudi¹, M. G. Amin¹, K. Abed-Meraim², and A. Belouchrani³

¹ Center for Advanced Communications, Villanova University, Villanova, PA 19085, USA
{mohamed.sahmoudi, moeness.amin}@villanova.edu
² TSI, Telecom Paris, 37 rue Dareau, 75014 Paris Cedex, France
[email protected]
³ Ecole Polytechnique, EE Dept., 10 Av. Hassen Badi, Algiers, Algeria
[email protected]
Abstract. This paper introduces new contrast functions for the blind separation of sources with different time-frequency signatures. Two contrast functions based on the Kullback-Leibler and Jensen-Rényi divergences in the time-frequency (T-F) plane are introduced, together with two iterative algorithms for optimizing them and separating the sources. The first algorithm combines spatial whitening with a gradient-Jacobi maximization based on Givens rotations and a stochastic gradient; the second uses a quasi-Newton technique.
1 Introduction
Blind source separation (BSS) methods exploit the knowledge that the sources typically satisfy certain statistical or deterministic properties. In principle, BSS methods can be categorized into the following five classes:

– HOS approaches: Higher-order statistics (HOS)-based methods assume that the sources are statistically independent. Independence is typically enforced using the fourth-order moments or cumulants of the mixtures. This class fails if more than one source is Gaussian.
– SOS approaches: Second-order statistics (SOS)-based methods use temporal structures that are less restrictive than statistical independence. SOS approaches deal with uncorrelated sources, and fail when the sources have close power spectral shapes or when the sources are i.i.d.⁴
– NS-SOS approaches: This class exploits nonstationarity (NS) together with SOS when the sources have time-varying variances.
– FLOS approaches: Recently introduced for the separation of impulsive sources. For this kind of signal, the sources have heavy-tailed probability density functions (pdf), and fractional lower-order statistics (FLOS) can be used [12, 14].
– STF approaches: Space-time-frequency methods exploit various forms of signal diversity, typically time, frequency, and time-frequency structure [3, 10, 15, 4]. This is the class of diversity we analyze and use in this contribution.

⁴ Abbreviation of independent and identically distributed.
In this paper, we expand on the last class and build on nonstationarity properties in the time-frequency domain. Specifically, we propose a class of contrast functions for BSS in the time-frequency plane using information-theoretic divergence measures. Two different schemes are proposed. In the first, we minimize the T-F mutual information between the marginal TFD signal energies and the cross-TFD energies (the joint TFD), as an extension of the mutual-information-based FastICA algorithm in the time domain [9]. In the second, we maximize the divergence measures between the TFD energy supports of the observation components, as a natural extension of the minimum support contrasts proposed in the time domain [13]. To achieve BSS, we optimize the proposed cost functions using a two-stage whitening-gradient algorithm and an iterative quasi-Newton technique.
2 Problem Formulation
Consider m nonstationary signals for which n ≥ m noisy linear combinations are observed,

x(t) = A s(t) + w(t)    (1)

where s(t) = [s_1(t), ..., s_m(t)]^T is the m × 1 complex source vector, w(t) is the n × 1 complex noise vector, and A is an n × m full-rank mixing matrix. The source signals s_i(t), i = 1, ..., m, are assumed to be nonstationary complex stochastic processes with different time-frequency signatures. The additive noise w(t) is modeled as a stationary zero-mean complex random process. The goal of BSS is to find an m × n separating matrix B such that the output vector z(t) = Bx(t) is an estimate of the source signals. There are two inherent ambiguities in the underlying problem: first, the original labeling of the sources cannot be ascertained; second, exchanging a fixed scalar factor between a source signal and the corresponding column of A does not affect the observations. Accordingly, B is determined only up to a permutation and a scaling of its rows. That is, B is referred to as a separating matrix if z(t) = Bx(t) = PΛs(t), where P is a permutation matrix and Λ a non-singular diagonal matrix. Similarly, blind identification of A amounts to determining a matrix equal to A up to a permutation matrix and a non-singular diagonal matrix.
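As a quick illustration of the mixing model (1) and the permutation/scaling ambiguity, the following NumPy sketch builds a noisy two-source mixture; all signal parameters and the oracle separator are our own illustrative choices, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, T = 3, 2, 1024          # n sensors, m sources, T samples

# Two nonstationary (chirp-like) sources with different T-F signatures.
t = np.arange(T) / T
s = np.vstack([np.cos(2 * np.pi * (50 * t + 100 * t**2)),
               np.cos(2 * np.pi * (200 * t - 80 * t**2))])

A = rng.standard_normal((n, m))         # full-rank mixing matrix
w = 0.01 * rng.standard_normal((n, T))  # additive noise
x = A @ s + w                           # observations, x(t) = A s(t) + w(t)

# B separates if B A = P Lambda (permutation times non-singular diagonal).
B = np.linalg.pinv(A)                   # oracle separator for illustration
z = B @ x
print(np.allclose(B @ A, np.eye(m)))    # here P = Lambda = I
```

In the blind setting A is unknown, so B must instead be found by optimizing a contrast function of z(t), which is the subject of the following sections.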
3 Measuring Time-Frequency Information

3.1 Time-frequency distributions
The discrete-time form of Cohen's class of time-frequency distributions (TFD), for a signal x(t), is given by [5]

D_{xx}(t,f) = \sum_{l=-\infty}^{\infty} \sum_{m=-\infty}^{\infty} \phi(m,l)\, x(t+m+l)\, x^*(t+m-l)\, e^{-j4\pi f l}    (2)

where t and f represent the time index and the frequency index, respectively. The kernel φ(m, l) characterizes the TFD and is a function of both the time and lag variables. The cross-TFD of two signals x_1(t) and x_2(t) is defined by

D_{x_1 x_2}(t,f) = \sum_{l=-\infty}^{\infty} \sum_{m=-\infty}^{\infty} \phi(m,l)\, x_1(t+m+l)\, x_2^*(t+m-l)\, e^{-j4\pi f l}    (3)
The main obstacle in applying information-theoretic measures in the T-F plane is that TFDs are not always positive. If positive TFDs are desired, then the kernel must be signal-dependent [5]. Alternatively, one can use the copula-theory-based non-iterative method for constructing positive TFDs, as recently proposed in [7]. Therefore, in this paper we consider only positive TFDs (e.g., spectrograms). In addition, in order to ensure that the TFD behaves like a pdf, we normalize the TFD by its energy, E_D = \int\int D(t,f)\, dt\, df.
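As a concrete example of a positive, energy-normalized TFD, a spectrogram can be computed and normalized as sketched below; the window length and hop size are arbitrary illustrative choices:

```python
import numpy as np

def spectrogram(x, nwin=64, hop=16):
    """Positive TFD: squared magnitude of a windowed STFT."""
    win = np.hanning(nwin)
    frames = [x[i:i + nwin] * win for i in range(0, len(x) - nwin + 1, hop)]
    stft = np.fft.rfft(np.array(frames), axis=1)
    return np.abs(stft) ** 2            # nonnegative by construction

def normalize(D):
    """Scale the TFD by its total energy so it behaves like a pdf."""
    return D / D.sum()

t = np.arange(1024) / 1024
chirp = np.cos(2 * np.pi * (20 * t + 300 * t**2))
D = normalize(spectrogram(chirp))
print(D.min() >= 0, np.isclose(D.sum(), 1.0))
```

After normalization the spectrogram is nonnegative and sums to one, so the divergence measures of the next subsection apply directly.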
3.2 Information-theoretic measures on the time-frequency plane
Recently, there has been interest in applying information-theoretic measures to the time-frequency plane in order to quantify signal information [6, 1]. Using the energy density instead of the probability density function, divergence measures for probability distributions have been adapted to TFDs as follows [1]:

1. Time-frequency Kullback-Leibler divergence: This standard distance measure belongs to the class of Csiszar's φ-divergences with φ(t) = −log(t). It can be employed in the time-frequency plane to quantify the dissimilarity between two TFDs D_1 and D_2, as

K(D_1, D_2) = \int\int D_1(t,f) \log \frac{D_1(t,f)}{D_2(t,f)}\, dt\, df    (4)

2. Time-frequency Jensen-Rényi divergence: For two positive TFDs D_1 and D_2, the Jensen-Rényi divergence can be defined as

J_\alpha(D_1, D_2) = H_\alpha\left(\sqrt{D_1 D_2}\right) - \frac{H_\alpha(D_1) + H_\alpha(D_2)}{2}    (5)

where H_\alpha is the α-th T-F Rényi entropy, defined as

H_\alpha(D) = \frac{1}{1-\alpha} \log_2 \int\int \left( \frac{D(t,f)}{\int\int D(t,f)\, dt\, df} \right)^{\alpha} dt\, df    (6)

parameterized by α > 0, α ≠ 1; as α → 1, H_\alpha converges to the Shannon entropy.

3. Time-frequency divergence properties: Some of the most desirable properties of the time-frequency divergences are given in the following proposition [1].

Proposition 1.
– The divergence measures are always positive: 0 ≤ K(D_1, D_2), J_\alpha(D_1, D_2) ≤ ∞.
– The lower bound is reached if and only if D_1 = D_2.
– The upper bound is reached if and only if Supp(D_1) ∩ Supp(D_2) = ∅.

To separate nonstationary mixtures, the last property will be exploited to provide the mixed signals with nearly⁵ disjoint supports in the T-F plane through divergence maximization.

⁵ We relax the well-known assumption that sources must have disjoint T-F supports; we require only that the sources have linearly independent T-F signature vectors.
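The divergences (4)-(6) can be sketched numerically for discrete, energy-normalized TFDs as below; the small `eps` guard and the toy 4×4 distributions are our own illustrative choices:

```python
import numpy as np

def renyi_entropy(D, alpha):
    """alpha-th Renyi entropy of a TFD, normalized by its energy (Eq. 6)."""
    Dn = D / D.sum()
    return np.log2(np.sum(Dn ** alpha)) / (1.0 - alpha)

def kl_divergence(D1, D2, eps=1e-300):
    """T-F Kullback-Leibler divergence between normalized TFDs (Eq. 4)."""
    return float(np.sum(D1 * np.log((D1 + eps) / (D2 + eps))))

def jensen_renyi(D1, D2, alpha=3.0):
    """Jensen-Renyi divergence (Eq. 5): entropy of the geometric mean
    minus the mean of the entropies."""
    return renyi_entropy(np.sqrt(D1 * D2), alpha) - \
        0.5 * (renyi_entropy(D1, alpha) + renyi_entropy(D2, alpha))

# Toy 2-D "TFDs" on a 4x4 grid, each normalized to unit energy.
D1 = np.ones((4, 4)) / 16.0
D2 = np.outer([0.4, 0.3, 0.2, 0.1], [0.4, 0.3, 0.2, 0.1])

print(round(kl_divergence(D1, D1), 6))    # identical TFDs -> 0
print(kl_divergence(D1, D2) > 0)          # distinct TFDs -> positive
print(abs(jensen_renyi(D1, D1)) < 1e-12)  # lower bound reached iff D1 = D2
```

Note that for fully disjoint supports the geometric mean in (5) vanishes, driving the Jensen-Rényi divergence to its upper bound, which is exactly the behavior the contrasts of the next section exploit.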
4 TFD Information-Based Contrast Functions
It is shown in [4] that the condition to achieve BSS in the T-F domain is given in the following proposition.

Proposition 2. Let (t_1, f_1), ..., (t_K, f_K) be time-frequency points corresponding to auto-terms, i.e., points of energy concentration in the time-frequency plane, and let d_i = [D_{s_i s_i}(t_1, f_1), ..., D_{s_i s_i}(t_K, f_K)] for i = 1, ..., m. Then, BSS can be achieved if and only if d_i and d_j are linearly independent for i ≠ j.

4.1 Minimum T-F Mutual Information Contrast Function
We replace the marginal pdfs used in the time domain by the TFD signal energies D_x(t,f) and D_y(t,f), and the joint density function by the joint energy distribution defined by the cross-spectrogram of the two signals, i.e.,

D_{(x,y)}(t,f) \stackrel{def}{=} STFT_x(t,f)\, STFT_y^*(t,f)    (7)

where STFT_x(t,f) = \int h(\tau - t)\, x(\tau)\, e^{-j2\pi f \tau}\, d\tau, with h(t) being the data window. Since the time marginal of D_{(x,y)}(t,f) leads to the cross energy of x(t) and y(t), i.e., \int D_{(x,y)}(t,f)\, df = x(t)\, y^*(t), it can be viewed as a joint energy distribution of the two signals. Then, the T-F mutual information between x(t) and y(t) can be defined as

I(D_x, D_y) \stackrel{def}{=} K(|D_{(x,y)}|, D_x D_y) = \int\int |D_{(x,y)}(t,f)| \log \frac{|D_{(x,y)}(t,f)|}{D_x(t,f)\, D_y(t,f)}\, dt\, df    (8)
We use the absolute value of D_{(x,y)}(t,f) because it can be complex-valued. When the two signals x(t) and y(t) are separated in the T-F plane, we have |D_{(x,y)}(t,f)| = 0 for all t, f, which implies that the T-F mutual information between x(t) and y(t) is equal to zero. This is analogous to the case where independent random variables have zero mutual information, and it allows us to introduce a mutual information BSS framework in the T-F domain. In this paper, we define the following T-F mutual information contrast function C_1 as

C_1(z) = \sum_{1 \le i \ne j \le m} I(D_{z_i}, D_{z_j})
       = \sum_{1 \le i \ne j \le m} \sum_{k=1}^{K} |D_{(z_i,z_j)}(t_k,f_k)| \log \frac{|D_{(z_i,z_j)}(t_k,f_k)|}{D_{z_i}(t_k,f_k)\, D_{z_j}(t_k,f_k)}
       = \sum_{1 \le i \ne j \le m} \sum_{k=1}^{K} |D_{(z_i,z_j)}(t_k,f_k)| \left[ \log |D_{(z_i,z_j)}(t_k,f_k)| - 2 \log D_{z_i}(t_k,f_k) \right]    (9)
Equation (9) implies that minimizing the T-F mutual information corresponds to minimizing the cross-terms and maximizing the auto-terms between the data components, as required for proper source estimation. A separating matrix B can be computed as

B = \arg\min_B C_1(z)    (10)
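A rough numerical sketch of the contrast C_1 computed from spectrograms and cross-spectrograms follows. Two simplifications are our own: the sum runs over all T-F points rather than over selected auto-term points (t_k, f_k), and each distribution is normalized by total component energy:

```python
import numpy as np

def stft(x, nwin=64, hop=16):
    """Windowed STFT (Hann window); rows are time frames."""
    win = np.hanning(nwin)
    frames = [x[i:i + nwin] * win for i in range(0, len(x) - nwin + 1, hop)]
    return np.fft.rfft(np.array(frames), axis=1)

def c1_contrast(z, eps=1e-12):
    """Sketch of the T-F mutual information contrast C1 of Eq. (9)."""
    S = [stft(zi) for zi in z]
    E = [np.sum(np.abs(Si) ** 2) for Si in S]          # component energies
    D = [np.abs(S[i]) ** 2 / E[i] for i in range(len(z))]
    c = 0.0
    for i in range(len(z)):
        for j in range(len(z)):
            if i != j:
                # |cross-spectrogram|, normalized consistently with D
                cross = np.abs(S[i] * np.conj(S[j])) / np.sqrt(E[i] * E[j])
                c += np.sum(cross * np.log((cross + eps) / (D[i] * D[j] + eps)))
    return c

t = np.arange(2048)
seps = np.vstack([np.cos(2 * np.pi * 0.10 * t),   # two tones, disjoint in T-F
                  np.cos(2 * np.pi * 0.35 * t)])
mixed = np.array([[1.0, 0.8], [0.8, 1.0]]) @ seps
print(c1_contrast(mixed) > c1_contrast(seps))     # mixing raises C1
```

Components that are well separated in the T-F plane have a near-zero cross-spectrogram, so C_1 is small; mixing the tones makes their cross-terms large and increases the contrast, which is what the minimization in (10) undoes.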
4.2 Maximum T-F Divergence Contrast Function
Considering the source probability density function, we discussed the minimum support contrast function for source separation in [13]. In this section, we extend this idea to the time-frequency domain using the above time-frequency support properties. From Proposition 1, for two signals x(t) and y(t), the divergence between D_x and D_y reaches its maximum when the two signals are well separated in the time-frequency plane, or equivalently when their TFD supports are disjoint. This parallels the well-known minimum support criterion for source extraction in the time domain (e.g., see [13] and references therein). Thus, we propose to maximize the divergence between the mixture components to achieve source separation. Using the Jensen-Rényi divergence, we can define a second contrast function as

C_2(z) = \sum_{1 \le i \ne j \le m} J_\alpha(D_{z_i}, D_{z_j})
       = \sum_{1 \le i \ne j \le m} \frac{1}{1-\alpha} \log_2 \frac{\sum_{k=1}^{K} \left( \sqrt{D_{z_i}(t_k,f_k)\, D_{z_j}(t_k,f_k)} \right)^{\alpha}}{\sqrt{\sum_{k=1}^{K} D_{z_i}^{\alpha}(t_k,f_k)}\; \sqrt{\sum_{k=1}^{K} D_{z_j}^{\alpha}(t_k,f_k)}}    (11)
The maximization of the above criterion leads to signals whose energies are almost disjoint in the T-F plane. Since the log is a monotonic function, the contrast C_2 can be further simplified as

C_2(z) = \sum_{1 \le i \ne j \le m} \frac{\sum_{k=1}^{K} \left( \sqrt{D_{z_i}(t_k,f_k)\, D_{z_j}(t_k,f_k)} \right)^{\alpha}}{\sqrt{\sum_{k=1}^{K} D_{z_i}^{\alpha}(t_k,f_k)}\; \sqrt{\sum_{k=1}^{K} D_{z_j}^{\alpha}(t_k,f_k)}}    (12)
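The simplified contrast (12) reduces to a short computation over the K selected T-F points; in the sketch below the 2×4 energy arrays are toy values of our own making:

```python
import numpy as np

def c2_contrast(D, alpha=3.0):
    """Simplified Jensen-Renyi contrast C2 of Eq. (12).

    D: array of shape (m, K) holding each component's TFD values at
    the K selected T-F points (t_k, f_k)."""
    m = D.shape[0]
    c = 0.0
    for i in range(m):
        for j in range(m):
            if i != j:
                num = np.sum(np.sqrt(D[i] * D[j]) ** alpha)
                den = np.sqrt(np.sum(D[i] ** alpha)) * \
                    np.sqrt(np.sum(D[j] ** alpha))
                c += num / den
    return c

# With alpha > 1, C2 is to be minimized: disjoint T-F supports give 0.
disjoint = np.array([[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]])
overlap  = np.array([[1.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]])
print(c2_contrast(disjoint))                         # 0.0
print(c2_contrast(overlap) > c2_contrast(disjoint))  # True
```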
Then, a separating matrix B can be computed as

B = \arg\max_B C_2(z) \text{ if } \alpha < 1, \quad B = \arg\min_B C_2(z) \text{ if } \alpha > 1    (13)

4.3 Jacobi-Gradient Algorithm
To derive a simple and fast algorithm, we proposed in [11] to combine a Jacobi-like decomposition into Givens rotations with a gradient algorithm using numerical differentiation, achieving BSS in two stages. First, the observations are pre-processed (whitening) so that the sensor vector is transformed into a white vector while the dimensionality is reduced from n to m; then, an m × m orthogonal matrix is determined as a product of Givens rotations. The so-called Jacobi-Gradient algorithm can be summarized as follows:
Step 0. Whitening (e.g., see [9, 12]).
Step 1. Initialize the Givens angles randomly.
Step 2. Compute the source T-F distributions used in the contrast function: D_{z_i}(t_k, f_k) and D_{(z_i,z_j)}(t_k, f_k) for i, j = 1, ..., m and k = 1, ..., K.
Step 3. Calculate the gradient ∂C(B)/∂Θ of the cost function with respect to the Givens angles Θ.
Step 4. Update the Givens angles with the gradient step Θ(k+1) = Θ(k) − η ∂C(B)/∂Θ (descent when minimizing; the sign is reversed when maximizing).
Step 5. Go back to Step 2 and continue until convergence.
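The steps above can be sketched for m = 2, where the rotation is parameterized by a single Givens angle. To keep the example self-contained, a sum-of-squared-kurtoses contrast stands in for the paper's T-F contrast (the overall structure of whitening, random initialization, and finite-difference gradient steps is unchanged); sources, mixing matrix, and step size are our own illustrative choices:

```python
import numpy as np

def givens(theta):
    """2x2 Givens rotation."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def contrast(theta, xw):
    """Toy stand-in for the T-F contrast: sum of squared kurtoses
    of the rotated whitened signals (to be maximized)."""
    z = givens(theta) @ xw
    k = np.mean(z ** 4, axis=1) - 3 * np.mean(z ** 2, axis=1) ** 2
    return np.sum(k ** 2)

rng = np.random.default_rng(1)
s = np.vstack([np.sign(rng.standard_normal(4000)),        # sub-Gaussian
               np.sqrt(3) * rng.uniform(-1, 1, 4000)])    # sub-Gaussian
x = np.array([[1.0, 0.6], [0.4, 1.0]]) @ s                # mixtures

# Step 0: whitening from the sample covariance.
evals, evecs = np.linalg.eigh(np.cov(x))
W = np.diag(evals ** -0.5) @ evecs.T
xw = W @ x

# Steps 1-5: random angle, then finite-difference gradient ascent.
theta, eta, h = rng.uniform(0.0, np.pi), 0.02, 1e-4
f0 = contrast(theta, xw)
for _ in range(1000):
    grad = (contrast(theta + h, xw) - contrast(theta - h, xw)) / (2 * h)
    theta += eta * grad
print(contrast(theta, xw) >= f0)   # ascent does not decrease the contrast
```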
4.4 Iterative Quasi-Newton Algorithm
Similar to [4], a block technique can be implemented for the contrast optimization using the received samples; it searches for solutions of equations (10) and (13) iteratively in the form

B^{(p+1)} = (I + \varepsilon^{(p)}) B^{(p)}    (14)

and thus

z^{(p+1)}(t) = (I + \varepsilon^{(p)}) z^{(p)}(t)    (15)

At the p-th step, the matrix ε^{(p)} is computed from a local linearization of the criterion. This is an approximate Newton technique with the benefit that ε^{(p)} can be computed simply, without Hessian inversion, provided that B^{(p)} is close to a separating matrix.
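The multiplicative update (14)-(15) can be illustrated with a deliberately simple choice of ε^{(p)}. Instead of the paper's contrast-based linearization, the sketch below uses ε^{(p)} = μ(I − Cov(z^{(p)})), an assumption of ours whose fixed point is merely a whitening matrix, but which exercises the same serial update structure:

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.standard_normal((2, 4000))
x = np.array([[1.0, 0.6], [0.4, 1.0]]) @ s   # mixtures

B = np.eye(2)                                # B(0)
mu = 0.1
for p in range(300):
    z = B @ x                                # Eq. (15): z(p) = B(p) x
    eps_p = mu * (np.eye(2) - np.cov(z))     # toy epsilon(p); fixed point Cov(z) = I
    B = (np.eye(2) + eps_p) @ B              # Eq. (14): multiplicative update
z = B @ x
print(np.allclose(np.cov(z), np.eye(2), atol=1e-3))
```

In the paper's algorithm the same loop runs with ε^{(p)} derived from linearizing C_1 or C_2 around B^{(p)}, so each step is a cheap approximate Newton step without Hessian inversion.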
5 Simulation Results

5.1 First experiment: separation of two crossing linear FM signals
In this experiment, we consider a linear mixture of two linear frequency-modulated nonstationary signals (chirps) with random Gaussian amplitudes. The model consists of two noisy mixture observations (n = m = 2) with K = 1024 samples and SNR = 10 dB. The spectrogram TFDs of the two mixtures and of the separated signals are shown in Fig. 1. Using the first Givens-gradient algorithm for T-F mutual information minimization, the proposed procedure clearly works well for noisy random Gaussian sources with overlapping TFD signatures. This specific example was chosen to test the algorithm when the mixed signals overlap in the time-frequency domain and when more than one source is Gaussian.
Fig. 1. Example of T-F mutual information separation.
5.2 Second experiment: performance comparison with FastICA
We characterize the performance of each algorithm in terms of the signal rejection level. When C = BA, the i-th estimated source is \hat{s}_i(t) = z_i(t) = \sum_{j=1}^{m} C_{ij} s_j(t), which contains the j-th source signal at level |C_{ij}|^2 / |C_{ii}|^2. The averaged rejection level is then given by

I_{perf} = \frac{1}{m} \sum_{i=1}^{m} I_i = \frac{1}{m} \sum_{i=1}^{m} \sum_{j \ne i} \frac{|C_{ij}|^2}{|C_{ii}|^2}

Here, the same default settings as in the first experiment are used. We evaluate the performance of the proposed T-F mutual information algorithm and compare it with that of the time-domain FastICA algorithm [9]. Fig. 2 presents the mean rejection level (I_{perf}) versus the signal-to-noise ratio (SNR) for each algorithm. It is evident from this figure that the proposed T-F mutual information algorithm outperforms the classic FastICA in this case. This can be explained by the fact that the FastICA assumptions are not satisfied in this example, since more than one source in the mixture is Gaussian. We also note that the sources may be dependent, in which case ICA methods fail.
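The rejection-level measure can be sketched as follows; the global matrices used in the example are illustrative values only, not results from the paper:

```python
import numpy as np

def rejection_level(C):
    """Averaged rejection level I_perf for the global matrix C = B A."""
    m = C.shape[0]
    P = np.abs(C) ** 2
    return np.mean([(P[i].sum() - P[i, i]) / P[i, i] for i in range(m)])

C_perfect = np.diag([2.0, -0.5])                # scaling ambiguity only
C_poor = np.array([[1.0, 0.5], [0.3, 1.0]])     # residual interference
print(rejection_level(C_perfect))               # 0.0
print(10 * np.log10(rejection_level(C_poor)))   # in dB, as in Fig. 2
```

Perfect separation (C diagonal up to scaling) gives a rejection level of zero, and smaller (more negative, in dB) values indicate better separation.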
6 Discussion and Conclusion
A new BSS approach has been developed that is based on the minimization of the T-F mutual information and the maximization of the T-F Jensen-Rényi divergence. An efficient implementation using a block quasi-Newton algorithm is proposed for the contrast optimization. Unlike existing BSS methods, the proposed method allows the separation of Gaussian sources with identical spectral shapes, provided that the sources have different time-frequency signatures.
Fig. 2. Mean rejection level [dB] versus SNR [dB] for the T-F Mutual Information algorithm and the FastICA algorithm. T = 1000.
References

1. S. Aviyente, "Information processing on the time-frequency plane", in Proc. IEEE ICASSP 2005, Philadelphia, USA, 2005.
2. Z. Shan and S. Aviyente, "Information-theoretic blind source separation on the time-frequency plane", to be submitted to ICA 2006.
3. A. Belouchrani and M. Amin, "Blind source separation based on time-frequency signal representations", IEEE Trans. on Signal Processing, 46(11), Nov. 1998.
4. A. Belouchrani and K. Abed-Meraim, "Contrast functions for blind source separation based on time frequency distributions", in Proc. ISCCSP 2004, IEEE International Symposium on Control, Communications and Signal Processing, Tunisia, 2004.
5. B. Boashash (Ed.), Time Frequency Signal Analysis and Processing: A Comprehensive Reference, Elsevier, 2003.
6. R. G. Baraniuk, P. Flandrin, A. J. E. M. Janssen, and O. J. J. Michel, "Measuring time-frequency information content using the Rényi entropies", IEEE Trans. on Information Theory, 47(4), 2001.
7. M. Davy and A. Doucet, "Copulas: a new insight into positive time-frequency distributions", IEEE Signal Processing Letters, 10(7), July 2003.
8. K. E. Hild II, D. Erdogmus, and J. C. Principe, "Blind source separation using Renyi's mutual information", IEEE Signal Processing Letters, 8, pp. 174-176, 2001.
9. A. Hyvarinen, J. Karhunen, and E. Oja, Independent Component Analysis, Wiley, 2001.
10. S. Rickard, R. Balan, and J. Rosca, "Blind source separation based on space-time-frequency diversity", in Proc. ICA 2003, Nara, Japan, April 2003.
11. M. Sahmoudi and K. Abed-Meraim, "Investigations on contrast functions for blind source separation based on non-Gaussianity and sparsity measures", in Proc. ISSPA 2005, Sydney, Australia, Aug. 2005.
12. M. Sahmoudi, K. Abed-Meraim, and M. Benidir, "Blind separation of impulsive alpha-stable sources using minimum dispersion criterion", IEEE Signal Processing Letters, 12(4), April 2005.
13. M. Sahmoudi, "Generalized contrast functions for blind separation of overdetermined linear mixtures with unknown number of sources", in Proc. SSP 2005, IEEE Statistical Signal Processing Workshop, France, July 2005.
14. M. Sahmoudi, H. Boumaraf, and D.-T. Pham, "Blind separation of convolutive mixtures using nonstationarity and fractional lower order statistics (FLOS) with application to audio sources", submitted to ICASSP 2006, France, May 2006.
15. Y. Zhang and M. G. Amin, "Blind separation of sources based on their time-frequency signatures", in Proc. ICASSP 2000.