Non-parametric Approach to ICA using Kernel Density Estimation

Prabir Burman
UC Davis Department of Statistics, 360 Kerr Hall, One Shields Ave., Davis, CA 95616, USA
[email protected]

Kuntal Sengupta
Advanced Interfaces, Inc., 403 S. Allen St., State College, PA 16801, USA
[email protected]

Abstract— Independent Component Analysis (ICA) has found a wide range of applications in signal processing and multimedia, ranging from speech cleaning to face recognition. This paper presents a non-parametric approach to the ICA problem that is robust to outliers. For the first time in the field of ICA, the algorithm adopts an intuitive and direct approach, focusing on the very definition of independence itself: the joint probability density function (pdf) of independent sources factorizes into the product of the marginal pdfs. This is contrary to traditional ICA algorithms, which attempt to fulfill necessary (but not sufficient) conditions for independence. For example, the Jade algorithm approximates independence by minimizing higher-order statistics. In the proposed algorithm, kernel density estimation is employed to provide a good approximation of the distributions that need to be estimated; this estimation technique is inherently robust to outliers and also frees the algorithm from assumptions about the source distributions. Experimental results show that the algorithm is able to separate sources in the presence of outliers, whereas existing algorithms like Jade and Infomax break down under such conditions. The results also show that the proposed non-parametric approach is largely independent of the source distribution. In addition, it is able to separate non-Gaussian zero-kurtotic signals, unlike traditional ICA algorithms such as Jade and Infomax.
I. INTRODUCTION

In the application of ICA to linearly mixed signals, the aim is to find a transformation such that the resultant output signals are mutually independent. This follows from the premise of the ICA problem that the source signals to be recovered are independent. In order to perform the ICA task, a measure of the amount of independence between signals is required. Many such measures exist, and different algorithms have been proposed around them to solve the ICA problem. Next we discuss some of these algorithms.

A. Infomax Algorithm

The Infomax algorithm was developed by Bell and Sejnowski [1] and is based on principles from information theory. Bell and Sejnowski demonstrated that by maximizing the joint entropy of the outputs of a neural processor, the mutual information between the output components is minimized. The joint entropy $H(\mathbf{y})$ and the mutual information $I(\mathbf{y})$ are related by

$$H(y_1, \ldots, y_n) = H(y_1) + \cdots + H(y_n) - I(y_1, \ldots, y_n) \quad (1)$$

Since independent components have zero mutual information, independence can be achieved simply by maximizing the joint entropy $H(\mathbf{y})$ of the neural outputs. However, maximizing $H(\mathbf{y})$ leads to independence only if the logistic function of the neural network matches the cumulative density function (cdf) of the source signals. Thus, the Infomax algorithm is source dependent. Bell and Sejnowski [1] used the function

$$f(u_i) = \tanh(u_i) \quad (2)$$

which happens to be the cdf of a super-Gaussian source. Thus the original Infomax can only separate signals that are super-Gaussian in nature.
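As a small numeric illustration of identity (1), the following hedged sketch estimates the entropies of two correlated signals with plug-in histogram estimates; the sample data and the bin count are illustrative choices, not taken from the paper:

```python
# Numeric check of equation (1): H(y1, y2) = H(y1) + H(y2) - I(y1, y2),
# using plug-in (histogram) entropy estimates in nats.
import numpy as np

rng = np.random.default_rng(0)
y1 = rng.laplace(size=100_000)            # a super-Gaussian signal
y2 = 0.5 * y1 + rng.normal(size=100_000)  # correlated with y1, so I > 0

pxy, _, _ = np.histogram2d(y1, y2, bins=50)
pxy /= pxy.sum()                          # joint probability table
px, py = pxy.sum(axis=1), pxy.sum(axis=0) # marginals

def H(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

mask = pxy > 0
I = np.sum(pxy[mask] * np.log(pxy[mask] / np.outer(px, py)[mask]))

print(f"H(y1,y2)          = {H(pxy):.4f}")
print(f"H(y1)+H(y2)-I     = {H(px) + H(py) - I:.4f}")  # agrees, per (1)
```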
B. Extended Infomax Algorithm

Aware of the limitations of the Infomax algorithm, Te-Won Lee [2] extended it so that it can separate both super-Gaussian and sub-Gaussian sources. The Extended Infomax algorithm achieves this through a switching function and two different logistic functions, one for sub-Gaussian sources and the other for super-Gaussian sources. The aim of the switching function is to determine the nature of the source so that the appropriate logistic function can be applied to the neural network. A possible switching criterion is the kurtosis of the signals: sub-Gaussian signals generally have negative kurtosis, while super-Gaussian signals generally have positive kurtosis. Lee chose a switching criterion based on the stability criteria derived from the generic stability analysis of separating solutions as employed by Cardoso and Laheld [3], Pham and Garrat [4], and Amari [5].
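As a hedged illustration of the kurtosis-based switching idea (not Lee's exact stability criterion), the sketch below picks a nonlinearity per estimated source from the sign of its excess kurtosis; the function names and the particular sub-Gaussian score are illustrative choices:

```python
# Illustrative kurtosis-based switching for Extended-Infomax-style separation.
# A sketch of the switching idea only, not Lee's stability criterion.
import numpy as np

def excess_kurtosis(u):
    u = u - u.mean()
    return np.mean(u**4) / np.mean(u**2) ** 2 - 3.0

def score(u):
    """Choose a nonlinearity for one estimated source u (1-D array)."""
    if excess_kurtosis(u) > 0:   # super-Gaussian branch
        return np.tanh(u)
    return u**3                  # sub-Gaussian branch (illustrative choice)
```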
However, despite the extension, the Extended Infomax algorithm is still dependent on the source distribution. It still fails to separate certain source distributions, e.g., skewed zero-kurtotic signals generated by the power distribution.

C. Jade

This algorithm was developed by Jean-François Cardoso [6]. It operates on cumulants as a measure of independence; cumulants are related to the higher-order statistics of the signals. The algorithm first applies whitening to the data to decorrelate them. Subsequently, it seeks to approach independence by minimizing the higher-order cumulants. Theoretically, independence is achieved when all the cumulants are zero, but this is computationally challenging. In the Jade algorithm, good results were achieved by minimizing cumulants up to the 4th order. However, one major weakness of this algorithm is that higher-order cumulants are extremely vulnerable to outliers, which limits its application in the presence of outliers. Besides being sensitive to outliers, Jade also fails to separate certain source distributions, e.g., skewed zero-kurtotic signals generated by the power distribution: by minimizing only the 4th-order cumulants, third-order effects like skewness are ignored.

II. NON-PARAMETRIC ICA APPROACH

The non-parametric ICA algorithm, like the Jade approach, can be described in two phases: a whitening process and a rotation process. Whitening refers to nullifying the covariances of the signals (de-correlation) and normalizing the variances of the de-correlated signals; in effect, it removes first- and second-order cross-statistics. After whitening, the non-parametric algorithm seeks to achieve independence by rotating the signals by a suitable angle. The angle of rotation is determined by minimizing a cost function based on the fundamental definition of independence (a code sketch of both phases is given at the end of this section). To explain the concept, we consider the two-signal case; the extension to n signals is straightforward and similar to the method discussed in [6].
The most fundamental relationship of independence between two signals $y_1$ and $y_2$ is

$$f_{y_1 y_2}(y_1, y_2) = f_{y_1}(y_1) \, f_{y_2}(y_2) \quad (3)$$

whereby $f_{y_1 y_2}(y_1, y_2)$ is the joint pdf and $f_{y_i}(y_i)$ is the marginal pdf of the signals. Logically, an ideal cost function would be

$$Q = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \big| f_{y_1 y_2}(y_1, y_2) - f_{y_1}(y_1) f_{y_2}(y_2) \big| \, dy_1 \, dy_2 \quad (4)$$

However, the closed-form solution of this expression proved computationally complicated. As such, the cost function was simplified to be evaluated at 25 evenly distributed points:

$$Q' = \sum_{k=1}^{25} \big( f_{y_1 y_2}(y_{1k}, y_{2k}) - f_{y_1}(y_{1k}) f_{y_2}(y_{2k}) \big)^2 \quad (5)$$
To estimate the probability density functions required in (5), we employ Gaussian kernel density estimation techniques [7][8], whereby

$$f_{y_i}(y_i) = \frac{1}{NB} \sum_{n=1}^{N} \Phi\!\left( \frac{y_i - y_i[n]}{B} \right) \quad (6)$$

$$f_{y_i y_j}(y_i, y_j) = \frac{1}{NB^2} \sum_{n=1}^{N} \Phi\!\left( \frac{y_i - y_i[n]}{B} \right) \Phi\!\left( \frac{y_j - y_j[n]}{B} \right) \quad (7)$$

In the above equations, $\Phi$ is the Gaussian kernel, $B$ is the bandwidth used in the estimation process, and $N$ is the number of samples. For our implementation, $B$ was chosen as $N^{-1/6}$, as suggested in [7]. Thus, after whitening, the signals are rotated until $Q'$ is minimized.
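The following is a minimal two-signal sketch of the procedure described in this section, under stated assumptions: whitening via an eigendecomposition of the sample covariance, the kernel estimates (6) and (7), and a brute-force grid search over the rotation angle that minimizes $Q'$. The 5x5 grid of evaluation points, the angular resolution, and all function names are illustrative choices, since the paper does not specify them.

```python
# Hedged sketch of the two-phase non-parametric ICA algorithm (2-signal case).
import numpy as np

def whiten(x):
    """x: (2, N) mixed signals -> (2, N) decorrelated, unit-variance signals."""
    x = x - x.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(x))
    return (E / np.sqrt(d)).T @ x          # equals diag(d)^(-1/2) E^T x

def q_prime(y, B):
    """Cost (5) on an assumed 5x5 grid of evaluation points spanning the data."""
    g1 = np.linspace(y[0].min(), y[0].max(), 5)
    g2 = np.linspace(y[1].min(), y[1].max(), 5)
    N = y.shape[1]
    cost = 0.0
    for a in g1:
        k1 = np.exp(-0.5 * ((a - y[0]) / B) ** 2) / np.sqrt(2 * np.pi)
        f1 = k1.sum() / (N * B)                  # marginal KDE, eq. (6)
        for b in g2:
            k2 = np.exp(-0.5 * ((b - y[1]) / B) ** 2) / np.sqrt(2 * np.pi)
            f2 = k2.sum() / (N * B)              # marginal KDE, eq. (6)
            f12 = (k1 * k2).sum() / (N * B**2)   # joint KDE, eq. (7)
            cost += (f12 - f1 * f2) ** 2
    return cost

def nonparametric_ica(x, n_angles=90):
    z = whiten(x)                                # phase 1: whitening
    B = x.shape[1] ** (-1.0 / 6.0)               # bandwidth, per [7]
    angles = np.linspace(0.0, np.pi / 2, n_angles)  # distinct up to sign/permutation
    costs = [q_prime(np.array([[np.cos(t), -np.sin(t)],
                               [np.sin(t),  np.cos(t)]]) @ z, B)
             for t in angles]
    t = angles[int(np.argmin(costs))]            # phase 2: rotate to minimize Q'
    R = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
    return R @ z
```

The exhaustive angle scan is the simplest way to realize "rotating the signals until $Q'$ is minimized"; a gradient-based or golden-section search over the angle would be a natural refinement.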
III. PERFORMANCE OF NON-PARAMETRIC ICA ALGORITHM WITH RESPECT TO OTHER ICA ALGORITHMS
The non-parametric algorithm's performance was evaluated against Infomax, Extended Infomax, and Jade in the separation of different pairs of mixed signals: deterministic, uniform, super-Gaussian, and skewed zero-kurtotic signals. The performance criterion used was the one proposed by Amari et al. [10]:

$$E = \sum_{j=1}^{n} \left( \sum_{i=1}^{n} \frac{|p_{ij}|}{\max_k |p_{kj}|} - 1 \right) + \sum_{i=1}^{n} \left( \sum_{j=1}^{n} \frac{|p_{ij}|}{\max_k |p_{ik}|} - 1 \right) \quad (8)$$
Here, the $p_{ij}$ are the elements of the performance matrix $P$, which is the product of the mixing matrix and the estimated unmixing matrix. A lower value of $E$ implies better performance.
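A direct transcription of index (8) in code; the function name is illustrative:

```python
# Amari performance index (8); P is the product of the mixing matrix
# and the estimated unmixing matrix.
import numpy as np

def amari_index(P):
    P = np.abs(np.asarray(P, dtype=float))
    col = (P / P.max(axis=0, keepdims=True)).sum(axis=0) - 1.0  # first sum in (8)
    row = (P / P.max(axis=1, keepdims=True)).sum(axis=1) - 1.0  # second sum in (8)
    return col.sum() + row.sum()
```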
Fig. 1: Performance of ICA algorithms on deterministic signals. The x-axis represents different trials, with a unique mixing matrix for each trial.

Fig. 1 shows the performance of the various ICA algorithms in unmixing a pair of deterministic signals. Most of the algorithms perform reasonably well, with the exception of Infomax. This is because Infomax is only suited to unmixing super-Gaussian signals, and deterministic signals are sub-Gaussian in nature.

Fig. 2: Performance of ICA algorithms on uniform signals. X-axis as in Fig. 1.

Fig. 2 shows the performance of the various ICA algorithms in unmixing a pair of uniform signals. The results are generally the same as in Fig. 1, as uniform signals are also sub-Gaussian.

Fig. 3: Performance of ICA algorithms on super-Gaussian signals. X-axis as in Fig. 1.

Fig. 3 shows the performance of the various ICA algorithms in unmixing a pair of super-Gaussian signals. All the algorithms perform well.

Fig. 4: Performance of ICA algorithms on zero-kurtotic, skewed signals. X-axis as in Fig. 1.

Fig. 4 shows the performance of the various ICA algorithms in unmixing a pair of zero-kurtotic skewed signals. Infomax, Extended Infomax, and Jade fail to separate the signals effectively; only the non-parametric approach separates them successfully. These results clearly illustrate the non-parametric ICA algorithm's independence of source distributions.
The failure of Infomax is largely due to the mismatch between the source's actual pdf and the parametric model assumed by the Infomax and Extended Infomax algorithms. Jade fails because it considers only the minimization of the 4th-order cumulants; third-order effects like skewness are ignored.

To test the effectiveness of the algorithms against outliers, the four algorithms (non-parametric, Jade, Infomax, and Extended Infomax) were used to separate outlier-contaminated deterministic signals. The magnitude of the outlier in the deterministic signal was increased from 0 to 10 in steps of 0.1. The results can be seen in Fig. 5 below.
Fig. 5: Effect of the magnitude of the outlier on performance.

Only the non-parametric algorithm succeeds in separating the outlier-contaminated signals, which clearly demonstrates its robustness to outlier contamination.

Fig. 6: Top row: original signals. Middle row: linearly mixed signals. Bottom row: separated signals.

IV. UNMIXING LINEARLY MIXED AUDIO SIGNALS

In this section, the application of the non-parametric algorithm to speech signals is demonstrated. The source signals used are two speech signals of 100,000 samples each, independently recorded and artificially mixed using a random mixing matrix. In order to ensure a reasonable processing time, only the first 10,000 samples of each signal were passed to the algorithm for processing. Fig. 6 shows the original speech signals, the mixed signals, and the unmixed output signals from the algorithm. The output signals are similar to the source signals used, and playing the output signals through speakers verified that the source separation was indeed successful. This demonstrates the applicability of the algorithm to speech separation (a usage sketch follows below).
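As a usage illustration only, the hedged snippet below drives the nonparametric_ica() function from the Section II sketch with two recordings; the file names, the mixing matrix, and the assumption of mono wav input are all hypothetical:

```python
# Hypothetical usage of the nonparametric_ica() sketch from Section II.
# File names and the mixing matrix are illustrative, not the paper's data.
import numpy as np
from scipy.io import wavfile

_, s1 = wavfile.read("speech1.wav")   # assumed mono recordings
_, s2 = wavfile.read("speech2.wav")
s = np.vstack([s1[:10_000], s2[:10_000]]).astype(float)  # first 10,000 samples

A = np.array([[0.6, 0.4], [0.35, 0.65]])  # example mixing matrix
x = A @ s                                 # linearly mixed observations
y = nonparametric_ica(x)                  # separated up to scale and permutation
```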
V. CONCLUSION

In conclusion, the non-parametric algorithm has two main advantages. First, it is source independent: separation of mixed signals does not depend on the source distribution of the signals. Second, it is resistant to outlier effects. Further work could extend the algorithm to the separation of real convolutive speech mixtures.

References

[1] A.J. Bell and T.J. Sejnowski (1995). "An Information-Maximization Approach to Blind Separation and Blind Deconvolution." Neural Computation, 7:1129-1159.
[2] T.-W. Lee, M. Girolami, and T.J. Sejnowski (1999). "Independent Component Analysis using an Extended Infomax Algorithm for Mixed Sub-Gaussian and Super-Gaussian Sources." Neural Computation, 11(2):417-441.
[3] J.-F. Cardoso and B. Laheld (1996). "Equivariant adaptive source separation." IEEE Trans. on Signal Processing, 45(2):434-444.
[4] D.-T. Pham and P. Garrat (1997). "Blind separation of mixture of independent sources through a quasi-maximum likelihood approach." IEEE Trans. on Signal Processing, 45(7):1712-1725.
[5] S. Amari (1997). "Neural learning in structured parameter spaces." In Advances in Neural Information Processing Systems 9, pages 127-133. MIT Press.
[6] J.-F. Cardoso (1998). "Blind Signal Separation: Statistical Principles." Proceedings of the IEEE, 86(10):2009-2025, Oct. 1998.
[7] J.S. Simonoff (1996). "Smoothing Methods in Statistics." Springer Series in Statistics, New York. ISBN 0387947167.
[8] W.J. Krzanowski. "Principles of Multivariate Analysis: A User's Perspective." Pages 230-235.
[9] N.A. Campbell (1980). "Robust Procedures in Multivariate Analysis I: Robust Covariance Estimation." Applied Statistics, 29:231-237.
[10] S. Amari, A. Cichocki, and H. Yang (1996). "A New Learning Algorithm for Blind Signal Separation." In Advances in Neural Information Processing Systems 8, page 757. MIT Press.