IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART C: APPLICATIONS AND REVIEWS, VOL. 34, NO. 4, NOVEMBER 2004


Blind Source Separation in Post-Nonlinear Mixtures Using Competitive Learning, Simulated Annealing, and a Genetic Algorithm

Fernando Rojas, Carlos G. Puntonet, Manuel Rodríguez-Álvarez, Ignacio Rojas, and Rubén Martín-Clemente, Member, IEEE

Abstract—This paper presents a new adaptive procedure for the linear and nonlinear separation of signals with nonuniform, symmetrical probability distributions, based on both simulated annealing and competitive learning methods implemented by a neural network, considering the properties of the vectorial spaces of sources and mixtures, and using a multiple linearization in the mixture space. Moreover, the paper proposes the fusion of two important paradigms: genetic algorithms and the blind separation of sources in nonlinear mixtures. In nonlinear mixtures, optimization of the system parameters and, especially, the search for invertible functions is very difficult because of the existence of many local minima. The main characteristics of the methods are their simplicity and their rapid convergence, validated experimentally by the separation of many kinds of signals, such as speech and biomedical data.

Index Terms—Independent component analysis, nonlinear blind separation of sources, signal processing.

I. INTRODUCTION

BLIND source separation (BSS) consists in recovering unobserved signals from a known set of mixtures. The separation of independent sources from mixed observed data is a fundamental and challenging signal-processing problem [2], [8], [17]. In many practical situations, one or more desired signals need to be recovered from the mixtures. A typical example is a set of speech recordings made in an acoustic environment in the presence of background noise and/or competing speakers. This general case is known as the cocktail party effect, in reference to the human brain's ability to focus on a single voice and ignore other voices/sounds produced simultaneously with similar amplitude in a noisy environment. Spatial differences between the sources greatly increase this capability.

The source separation problem has been successfully studied for linear instantaneous mixtures [1], [4], [14], [17] and, more recently, since 1990, for linear convolutive mixtures [12], [22], [24].

A. Independent Component Analysis (ICA)

The blind separation of sources problem can be approached from a wider point of view by using ICA [7]. ICA appeared as a generalization of the popular statistical technique of principal component analysis (PCA). The goal of ICA is to find a linear transformation given by a matrix W such that the random variables y_i, i = 1, ..., n, of

y(t) = W x(t)    (1)

are as independent as possible.

Manuscript received January 31, 2003; revised April 27, 2004. This work was supported in part by the Spanish Interministerial Commission for Science and Technology (CICYT) under Project TIC2001-2845 "PROBIOCOM" (Procedures for Biomedical and Communications Source Separation), Project TIC2000-1348, and Project DPI2001-3219. A preliminary version of this work was presented at Advances in Multimedia Communications, Information Processing and Education (Learning 2002), Madrid, Spain, in 2002. This paper was recommended by Guest Editors J. L. Fernández and A. Artés. F. Rojas, C. G. Puntonet, M. Rodríguez-Álvarez, and I. Rojas are with the Departamento de Arquitectura y Tecnología de Computadores (Computer Architecture and Technology Department), University of Granada, Granada 18071, Spain (e-mail: [email protected]; [email protected]; [email protected]; [email protected]). R. Martín-Clemente is with the Department of Electronic Engineering, University of Seville, Seville 41092, Spain (e-mail: [email protected]). Digital Object Identifier 10.1109/TSMCC.2004.833297

B. Nonlinear BSS

Nevertheless, the linear mixing model may not be appropriate for some real environments. Even though the nonlinear mixing model is more realistic and practical, most existing algorithms for the BSS problem were developed for the linear model. For nonlinear mixing models, however, many difficulties arise, and both linear ICA and the existing linear demixing methodologies are no longer applicable because of the complexity of the nonlinear parameters. Therefore, researchers have recently started to address the BSS problem with nonlinear mixing models [10], [19], [20], [23]. In [11] and [15], the nonlinear components are extracted by a model-free method using Kohonen's self-organizing feature map (SOFM), which, however, suffers from the exponential growth of network complexity and from interpolation errors in recovering continuous sources. Burel [3] proposed a nonlinear mixing model applying a two-layer perceptron to the blind source separation problem, trained by the classical gradient-descent method to minimize the mutual information. In [10], a new set of learning rules for nonlinear mixing models based on the information-maximization criterion is proposed. The mixing model is divided into a linear mixing part and a nonlinear transfer channel, in which the nonlinear functions are approximated by parametric sigmoids or by higher order polynomials.



More recently, Yang et al. [23] developed an information backpropagation algorithm for Burel's model using a natural gradient method, applied to special nonlinear mixtures in which the nonlinearity can be approximated by a two-layer perceptron. Other important work on source separation in post-nonlinear mixtures is presented in [19]; in that methodology, the nonlinear functions and score functions are estimated using a multilayer perceptron with sigmoidal units trained by unsupervised learning. In addition, one extension of ICA to the separation of sources in nonlinear mixtures is to employ a nonlinear function to transform the mixture so that the new outputs become statistically independent after the transformation. However, this transformation is not unique and, if it is not restricted to demixing transforms, this extension may yield statistically independent output sources that are completely different from the original unknown sources. Although, to the best of our knowledge, many difficulties remain in this transformation, several nonlinear ICA algorithms have been proposed and developed. Nevertheless, one of the greatest problems encountered with nonlinear mixing models is that the approximation of the nonlinear function by the various techniques (perceptrons, sigmoids, radial basis function (RBF) networks, etc.) runs into a serious difficulty: there are many local minima in the search space of the parameter solutions for adapting the nonlinear functions. The surface of a performance index based on the mutual information, as the parameters of the nonlinear function are modified, presents multiple and severe local minima. Therefore, algorithms based on gradient descent for the adaptation of these nonlinear-function parameters may become trapped in one such local minimum.

C. Outline of the Proposed Methods

This paper presents a new adaptive procedure for the linear and nonlinear separation of signals with nonuniform, symmetrical probability distributions, based on both simulated annealing (SA) and competitive learning (CL) methods implemented by a neural network, considering the properties of the vectorial spaces of sources and mixtures, and using a multiple linearization in the mixture space. Also proposed in this work is a new algorithm for the nonlinear mixing problem using flexible nonlinearities, approximated by Nth-order odd polynomials. An algorithm that exploits the synergy between genetic algorithms and the blind separation of sources (GABSS) was developed for the optimization of the parameters that define the nonlinear functions. Simultaneously, a natural gradient-descent method is applied to obtain the linear demixing matrix. Unlike many classical optimization techniques, GAs do not rely on computing local first- or second-order derivatives to guide the search; GAs are a more general and flexible method, capable of searching wide solution spaces and avoiding local minima (i.e., they provide more possibilities of finding an optimal or near-optimal solution). GAs deal simultaneously with multiple solutions rather than a single one, and they include random elements, which help to avoid getting trapped in suboptimal solutions.

Fig. 1. Post-nonlinear mixing and demixing models for blind source separation.

The paper is organized as follows. In the next section, the nonlinear mixture model and demixing system are described (Section II). Section III presents the proposed method using competitive learning (Section III-A), simulated annealing (Section III-B), and both techniques simultaneously (Section III-C). In Section IV, a genetic algorithm (GA) approach is presented in order to overcome entrapment within local minima, showing promising experimental performance (Section IV-D). The two proposed methods are compared in Section V. Finally, the conclusions and future work are summarized in Section VI.

II. NONLINEAR MIXTURE MODEL AND DEMIXING SYSTEM

The task of blind signal separation (BSS) is that of recovering unknown source signals from sensor signals described by

x(t) = f(A s(t))    (2)

where x(t) = [x_1(t), ..., x_p(t)]^T is an available sensor vector, s(t) = [s_1(t), ..., s_p(t)]^T is an unknown source vector having statistically independent, zero-mean, non-Gaussian elements, A is an unknown full-rank and nonsingular mixing matrix, and f = (f_1, ..., f_p) is a set of invertible nonlinear transfer functions. The BSS problem consists of recovering the source vector s(t) using only the observed data x(t), the assumption of statistical independence between the entries of the input vector s(t), and possibly some a priori information about the probability distribution of the inputs. If all of the functions f_i are linear, (2) reduces to the linear mixing model. Even though the dimensions of x(t) and s(t) generally need not be equal, we make this assumption here for simplicity.

Fig. 1 shows that the mixing system (known as a post-nonlinear mixture) is divided into two different phases, as proposed in [19]: first, a linear mixing stage and then, for each channel i, a nonlinear transfer part. Conversely, in the separating system, we first need to approximate g_i, the inverse of the nonlinear function f_i in each channel, and then undo the linear mixing by applying a matrix W to the output of the nonlinear functions

y(t) = W g(x(t))    (3)
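To make the two-stage structure concrete, the following Python sketch simulates (2) and inverts it with the ideal choices in (3). Everything numerical here is an illustrative assumption (uniform sources, a hand-picked matrix A, and tanh as the channel nonlinearity); the paper's algorithms estimate g and W blindly rather than using the true inverses as below.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative assumptions (not from the paper): two zero-mean uniform
# sources, a fixed full-rank mixing matrix A, and tanh as each channel's
# invertible nonlinearity f_i.
T = 10_000
s = rng.uniform(-1.0, 1.0, size=(2, T))      # unknown sources s(t)
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])                    # unknown mixing matrix A

x = np.tanh(A @ s)                            # post-nonlinear mixture, eq. (2)

# Separating structure of eq. (3): channel-wise inverses g_i = f_i^{-1},
# then a linear demixing matrix W. Both are set to their ideal values
# here purely to exhibit the shape of the model.
g_x = np.arctanh(x)                           # g(x(t))
W = np.linalg.inv(A)                          # ideal demixing matrix
y = W @ g_x                                   # recovered sources y(t)

print(np.max(np.abs(y - s)))                  # tiny: exact up to rounding
```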


In other approaches, the inverse function is approximated by a sigmoidal transfer function; however, since in certain situations the human expert is not given a priori knowledge about the mixing model, a more flexible nonlinear transfer function based on an Nth-order odd polynomial is used

g_i(x_i(t), θ_i) = Σ_{k=1}^{(N+1)/2} θ_{i,k} [x_i(t)]^{2k−1}    (4)

where θ_i = [θ_{i,1}, θ_{i,2}, ...]^T is a parameter vector to be determined. In this way, the output sources are calculated as

y(t) = W g(x(t), θ)    (5)

Nevertheless, computation of the parameter vectors θ_i is not easy, as it presents a problem with numerous local minima when the usual BSS cost functions are applied. Thus, we require an algorithm that is capable of avoiding entrapment in such a minimum. As a solution, we propose in this paper one algorithm employing simulated annealing and competitive learning together, and another algorithm applying GAs. We have used new metaheuristics such as simulated annealing and GAs for the linear case [5], [16], [18], [20], but in this paper we focus on a more difficult problem, namely nonlinear BSS.

III. HYBRIDIZING COMPETITIVE LEARNING AND SIMULATED ANNEALING

We propose an original method for independent component analysis and blind separation of sources that combines adaptive processing with a simulated annealing technique, and which is applied by normalizing the observed space x(t) into a set of concentric p-spheres in order to adaptively compute the slopes corresponding to the independent axes of the mixture distributions, by means of an array of neurons symmetrically distributed in each dimension (Fig. 2). A preprocessing stage to normalize the observed space is followed by the processing or learning of the neurons, which estimate the high-density regions in a way similar, but not identical, to that of self-organizing maps. A simulated annealing optimization method provides a fast initial movement of the weights toward the independent components by generating random values of the weights and minimizing an energy function; this is a way of improving the performance by speeding up the convergence of the algorithm.

In order to work with well-conditioned signals, the observed signals x(t) are preprocessed or adaptively set to zero mean and unity variance as follows:

e_i(t) = (x_i(t) − E[x_i(t)]) / σ[x_i(t)],    i = 1, ..., p    (6)

In general, for blind separation, and taking into account the possible presence of nonlinear mixtures, the observation space e(t) is subsequently quantized into n spheres of dimension p (p-spheres; circles if p = 2), each with a radius r_k, covering the points as follows:

(k − 1) r < ||e^(k)(t)|| ≤ k r,    k = 1, ..., n    (7)

Fig. 2. Array of symmetrically distributed neurons in concentric p-spheres.

The integer number of p-spheres, n, ensures accuracy in the estimation of the independent components, and it can be adjusted depending on the extreme values of the mixtures e(t) in each real problem. Obviously, the value of each radius r_k depends on the number of p-spheres, n. From now on, we shall use e^(k)(t) to denote a vector that verifies the inequality in (7). If, in some applications, the mixture process is known to be linear, then the number of p-spheres is set to n = 1, and a normalization of the space is obtained with a single radius. Although the quantization given in (7) allows a piecewise linearization of the observed space for the case of nonlinear and post-nonlinear mixtures, it is also useful under the assumption of linear media, since it allows us to detect unexpected nonlinearities in some real applications [16].

A. Competitive Learning

The above-described preprocessing is used to apply a competitive learning technique by means of a neural network whose weights are initially located on the Cartesian edges of the p-dimensional space, such that the network has 2pn neurons (2p neurons per p-sphere), each neuron being identified with p scalar weights. For instance, for mixtures of two sources (p = 2) in a single circle (n = 1), the network has four neurons, each represented by two scalar weights and initially located on the positive and negative semiaxes. The Euclidean distance between a point e(t) and the 2pn neurons existing in the p-dimensional space (Fig. 2) is

d_j(t) = ||e(t) − w_j(t)||,    j = 1, ..., 2pn    (8)

A neuron labeled j*, in a p-sphere, is at a minimum distance from the p-dimensional point e(t) and verifies

d_{j*}(t) = min_j d_j(t)    (9)

The main process for competitive learning, when a neuron approaches the density region in a sphere at time t, is given by

w_{j*}(t + 1) = w_{j*}(t) + α(t) sgn[e(t) − w_{j*}(t)]    (10)

with α(t) being a decreasing learning rate. Note that a variety of suitable functions α(t) and Λ(t) can be used.
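As a compact illustration of the preprocessing (6), the p-sphere quantization (7), and the winner-take-all sign update (8)–(10), consider the following sketch. The radius step, the linear decay of α(t), and the axis-aligned initialization are assumptions made here for brevity; the neighborhood factor and geometry-dependent rate introduced below in (11) and (12) are omitted.

```python
import numpy as np

def normalize(x):
    """Eq. (6): set each observed signal to zero mean and unit variance."""
    return (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)

def competitive_learning(e, n_spheres=4, iters=20_000, alpha0=0.05, seed=0):
    """Winner-take-all sign update of eqs. (8)-(10) on concentric p-spheres.

    The radius step, the linear decay of alpha(t), and the axis-aligned
    initialization are illustrative assumptions, not the paper's settings.
    """
    rng = np.random.default_rng(seed)
    p, T = e.shape
    r = np.linalg.norm(e, axis=0).max() / n_spheres   # radius step of eq. (7)
    # 2p neurons per sphere, initially on the positive/negative semiaxes.
    axes = np.vstack([np.eye(p), -np.eye(p)])         # shape (2p, p)
    w = np.concatenate([axes * (k + 1) * r for k in range(n_spheres)])
    for t in range(iters):
        et = e[:, rng.integers(T)]                    # random sample e(t)
        k = min(int(np.linalg.norm(et) / r), n_spheres - 1)  # its p-sphere
        sphere = w[2 * p * k: 2 * p * (k + 1)]        # that sphere's neurons
        j = np.argmin(np.linalg.norm(sphere - et, axis=1))   # winner, (8)-(9)
        alpha = alpha0 * (1.0 - t / iters)            # decreasing rate
        sphere[j] += alpha * np.sign(et - sphere[j])  # sign update, eq. (10)
        sphere[j] *= (k + 1) * r / np.linalg.norm(sphere[j])  # stay on sphere
    return w
```

After convergence, the 2p weights on each sphere mark the projections of the high-density axes from which the slope matrices (15) and (16) introduced below are formed.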

In particular, a learning procedure that activates all of the neurons at once is enabled by means of a factor Λ that modulates competitive learning, as in self-organizing systems, that is

w_j(t + 1) = w_j(t) + α(t) Λ(j, j*, t) sgn[e(t) − w_j(t)]    (11)

Here, Λ includes a neighborhood-decreasing parameter, and α is now geometry-dependent and proportional to the p-sphere radius, as follows:

(12)

where β and γ modify the value of the learning rate depending on the correlation of the points in the observation space and on the number of p-spheres, in order to equalize the angular velocity of the outer and inner weights. Note that the weight updating is carried out using the sign function, in contrast to the usual practice [6]. As is well known, the term Λ modulates the p-sphere of jurisdiction of the learning, depending on the value of its neighborhood parameter. After the learning process, the weights are maintained in their respective p-spheres, of radii r_k, by means of the following normalization:

w_j^(k)(t + 1) ← r_k w_j^(k)(t + 1) / ||w_j^(k)(t + 1)||    (13)

After converging, at the end of the competitive process, the weights in (13) are located at the center of the projections of the maximum-density points, or independent components, in each p-sphere. It is easy to corroborate that the total number of scalar weights is 2np² (2pn neurons with p weights each). For the purpose of the separation of sources, a matrix W similar to A^{−1} and verifying expression (3) is needed, and a recursive neural network similar to the Herault–Jutten network [9] uses, as weights, a continuous function of the 2p scalar weights per sphere, as shown in (16), for the general case of p sources. Once the neural network has estimated the maximum-density subspaces by means of the adaptive rule (11), and owing to the piecewise linearization of the observation space with a number n of p-spheres, a set of matrices can be defined as follows:

W = {W^(1), ..., W^(k), ..., W^(n)}    (14)

where, for p dimensions, the matrices W^(k) have the following form:

W^(k) = [ 1         w_12^(k)   ...   w_1p^(k)
          w_21^(k)  1          ...   w_2p^(k)
          ...       ...        ...   ...
          w_p1^(k)  w_p2^(k)   ...   1        ]    (15)

For linear systems or "symmetric" nonlinear mixtures, the elements of this matrix obtained using competitive learning are considered to be the symmetric slopes, in the segment of p-sphere radius between two consecutive weights initially located on the same axis, for each dimension; they are finally computed from the weights learned in (11) when the following transformation is carried out under geometric considerations:

(16)

The superscript CL indicates that the separation matrix has been computed using competitive learning, which will be useful in Section III-C. Note that (16) works only with even-labeled weights and can be simplified for linear media; for instance, when n = 1, it is practical to operate with only two neurons per dimension in the circle. If n > 1, the use of several p-spheres is useful for nonlinearity detection, since different matrices W^(k) in (15) are obtained for successive values of k. The total number of coefficients is p(p − 1)n, since the value of the diagonal elements in (16) is 1.

B. Simulated Annealing

Simulated annealing is a stochastic algorithm that represents a fast solution to some combinatorial optimization problems. As an alternative to the competitive learning method described above, we first propose hybridization with stochastic learning, such as simulated annealing, in order to obtain fast convergence of the weights around the maximum-density points in the observation space e(t). This technique is effective if the energy or cost function chosen for the global system is appropriate. It is first necessary to generate random values of the weights and, second, to compute the associated energy of the system. This energy vanishes when the weights achieve a global minimum, the method thus allowing escape from local minima. For the problem of blind separation of sources, owing to the necessary hypothesis of statistical independence between the original p sources, we define an energy related to their fourth-order statistics, as follows:

E(t) = Σ_{i≠j} |Cum_{2,2}(y_i(t), y_j(t))|    (17)

where y(t) = W e(t) and Cum_{2,2}(y_i, y_j) is the 2 × 2 fourth-order cumulant of y_i and y_j, that is

Cum_{2,2}(y_i, y_j) = E[y_i² y_j²] − E[y_i²] E[y_j²] − 2 (E[y_i y_j])²    (18)

and E[·] represents the expectation of [·].
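A direct transcription of the energy (17) built from the 2-2 cross-cumulants (18), together with a generic annealing loop over a candidate demixing matrix, might look as follows. The loop is a stand-in for the paper's random generation of neuron weights: the Gaussian proposal, linear cooling, and Metropolis acceptance rule are standard simulated-annealing choices assumed here, not the paper's exact settings.

```python
import numpy as np

def cum22(yi, yj):
    """Eq. (18): 2-2 fourth-order cross-cumulant of two zero-mean signals."""
    return (np.mean(yi**2 * yj**2)
            - np.mean(yi**2) * np.mean(yj**2)
            - 2.0 * np.mean(yi * yj) ** 2)

def energy(y):
    """Eq. (17): sum of absolute cross-cumulants over all output pairs;
    it vanishes when the outputs are pairwise independent."""
    p = y.shape[0]
    return sum(abs(cum22(y[i], y[j]))
               for i in range(p) for j in range(i + 1, p))

def anneal(e, steps=500, T0=1.0, seed=0):
    """Metropolis-style annealing of a demixing matrix W (a sketch)."""
    rng = np.random.default_rng(seed)
    p = e.shape[0]
    W = np.eye(p)
    E = energy(W @ e)
    for t in range(steps):
        T = T0 * (1.0 - t / steps) + 1e-6             # temperature schedule
        W_new = W + T * rng.normal(scale=0.1, size=(p, p))
        E_new = energy(W_new @ e)
        # Accept downhill moves always, uphill with Boltzmann probability.
        if E_new < E or rng.random() < np.exp((E - E_new) / T):
            W, E = W_new, E_new
    return W
```

In the hybrid scheme of Section III-C, such an annealing phase would only provide the fast initial placement, with the competitive rule refining the estimate once the temperature is low.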

C. Competitive Learning With Simulated Annealing

Despite the fact that the technique presented in Section III-B is fast, the greater accuracy achieved by means of the competitive learning shown in Section III-A led us to consider a new approach. An alternative method for the adaptive computation of the separation matrix concerns the simultaneous use (or hybridization) of the two methods described in Sections III-A and III-B (i.e., competitive learning and simulated annealing).


Fig. 3. Performance comparison between competitive learning and simulated annealing in a two-signal simulation.

Now, a proposed adaptive rule for the weights is the following:


(19)

where T(t) is a decreasing function that can be chosen in several ways. The main purpose of (19) is to provide a fast initial convergence of the weights by means of simulated annealing during the epoch in which the adaptation of the neural network by competitive learning is still inactive. When the value of T(t) falls to zero, the contribution of the simulated annealing process vanishes, since the random generation of weights ceases, and the more accurate estimation by means of competitive learning begins. The main contribution of simulated annealing here is its fast convergence compared with the adaptation rule (11), thus bringing the weights acceptably close to the distribution axes (independent components). However, the accuracy of the solution when the temperature T(t) is low depends mainly on the adaptation rule presented in Section III-A using competitive learning since, with it, the energy in (17) continues to decrease until a global minimum is obtained. The use of different approaches before the competitive learning process in order to estimate the centers of mass, as performed by standard K-means, is common practice in expectation-maximization fitting of Gaussians; however, the complexity of this procedure and the lack of knowledge of the centroids make simulated annealing more appropriate as the stage preceding competitive learning. A measure of the convergence in the computation of the independent components with the number of samples or iterations is shown in Figs. 3 and 4, which compare the competitive learning and simulated annealing methods using the root mean square error defined as follows:

(20)

Note that, a priori, the unknown matrix A depends on time, although in the simulations it remains constant (Section III-D). One parameter that provides information concerning the distribution of a signal x is its kurtosis, that is

kurt(x) = E[x⁴] / (E[x²])² − 3    (21)

where E[·] is the expectation of [·].

Fig. 4. Performance comparison between competitive learning and simulated annealing in a three-signal simulation.

Fig. 3 shows the root mean square (rms) error for linear mixtures of p = 2 signals, with two sources of fixed kurtosis, in several experiments. Using simulated annealing alone with 10 000 samples, the error settles at a clearly higher level than that obtained by applying simulated annealing followed by competitive learning with the same number of iterations. Fig. 4 shows the rms error in the case of p = 3 sources, again of fixed kurtosis. With this larger number of sources to be separated, using simulated annealing alone with 15 000 samples, the error again remains above that of the combination of simulated annealing and competitive learning.

Although simulated annealing is a stochastic process, the error values presented here are the result of several simulations and are for guidance only, since each experiment presents some randomness and is never exactly the same because of the different mixture matrices and sources.

D. Simulation Results

The simulation in Fig. 5(a) and (b) corresponds to the synthetic nonlinear mixture presented by Lin and Cowan [11] for sharply peaked distributions, the original sources being digital 32-b signals, as follows:

(22)
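Since both the convergence discussion above and the simulation analysis below are phrased in terms of the kurtosis (21), a sample estimator is a one-liner; the test signals below are illustrative, not those of the experiments.

```python
import numpy as np

def kurtosis(x):
    """Eq. (21): normalized fourth-order moment; zero for a Gaussian."""
    x = x - x.mean()
    return np.mean(x**4) / np.mean(x**2) ** 2 - 3.0

rng = np.random.default_rng(0)
print(kurtosis(rng.uniform(-1, 1, 100_000)))   # ~ -1.2 (sub-Gaussian)
print(kurtosis(rng.laplace(size=100_000)))     # ~ +3.0 (super-Gaussian)
```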


It has been verified that the greater the kurtosis of the signals, the more accurate and faster the estimation, except for the case in which the signals are not well conditioned or are affected by noise; this is so because a high density of points on the independent components speeds up convergence when the competitive learning rule (11) is used. Moreover, since the distribution estimation is made in the observation space e(t) and the separation is blind, it is useful to take into account the kurtosis of the observed signals in order to assess the convergence time and the precision.

Fig. 5. (a) Joint distribution space of 32-b uniform sources. (b) Nonlinear mixture joint distribution.

Fig. 6. Neuron configuration adapting to the Lin and Cowan nonlinearity.

The crosstalk parameter is used to verify the similarity between the original s_i and separated y_i signals with T samples, and is defined as

crosstalk_i = 10 log_10 [ Σ_t (y_i(t) − s_i(t))² / Σ_t s_i(t)² ]    (23)

As shown in Fig. 6, a good estimation of the density distribution is obtained with 20 000 samples and n = 4 p-spheres. For the purpose of the separation, the four matrices obtained by means of (14) at the end of the competitive learning process were

(24)

Averaged over 100 simulations, the crosstalk parameters of the separated signals with respect to the original sources were measured in decibels for both outputs.

IV. GENETIC ALGORITHMS

Genetic algorithms (GAs) are nowadays among the most popular stochastic optimization techniques, inspired by natural genetics and the biological evolutionary process. The GA evaluates a population and generates a new one iteratively, with each successive population referred to as a generation. Given the current generation at iteration t, the GA generates a new generation t + 1 based on the previous one by applying a set of genetic operations. The GA uses three basic operators to manipulate the genetic composition of a population: reproduction, crossover, and mutation [5]. Reproduction consists of copying chromosomes according to their objective function values (strings with higher evaluations have more chances of surviving). The crossover operator mixes the genes of two chromosomes selected in the reproduction phase, in order to combine their features, especially the positive ones. Mutation is occasional; with low probability, it produces an alteration of some gene values in a chromosome (for example, in a binary representation, a 1 is changed into a 0 or vice versa).

A. Evaluation Function Based on Mutual Information

In order to apply the GA, it is very important to define an appropriate fitness function (or contrast function, in the BSS context). This fitness function is constructed bearing in mind that the output sources must be as independent as possible given their nonlinear mixtures. For this purpose, we must use a measure of independence between random variables; here, the mutual information is chosen. Many forms of evaluation function can be used in a GA, subject to the minimal requirement that the function be able to map the population into a partially ordered set. As stated, the evaluation function is independent of the GA itself (i.e., of its stochastic decision rules). Unfortunately, regarding the separation of a nonlinear mixture, independence alone is not sufficient to perform blind recovery of the original signals; some knowledge of the moments of the sources, in addition to the independence, is required. An index similar to the one proposed in [5] is used for the fitness function, which approximates the mutual information

I(y) = Σ_i H(y_i) − H(y)    (25)

Values of the mutual information (25) near zero imply independence between the variables y_i, which are statistically independent if I(y) = 0.
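For reference, the crosstalk measure (23) used to score all of the experiments can be transcribed directly; the normalization and sign correction below are assumptions added to remove the scale and sign indeterminacies of BSS before comparing signals.

```python
import numpy as np

def crosstalk_db(y, s):
    """Eq. (23)-style crosstalk in dB (more negative means better).

    y: one estimated source, s: the matching true source, both 1-D.
    Zero-mean/unit-variance scaling and a sign flip are applied first,
    an assumption to factor out the usual BSS ambiguities.
    """
    y = (y - y.mean()) / y.std()
    s = (s - s.mean()) / s.std()
    y = y * np.sign(np.dot(y, s))      # resolve the sign ambiguity
    return 10.0 * np.log10(np.sum((y - s) ** 2) / np.sum(s ** 2))
```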


In expression (25), the calculation of H(y_i) requires approximating each marginal probability density function (pdf) of the output source vector y, which is unknown. One useful method is the application of the Gram–Charlier expansion, which needs only some moments of y_i, as suggested by Amari et al. [1], to express each marginal pdf of y as

p(y_i) ≈ (1/√(2π)) e^{−y_i²/2} [ 1 + (κ_3/3!) H_3(y_i) + (κ_4/4!) H_4(y_i) ]    (26)

where H_3 and H_4 are Chebyshev–Hermite polynomials, κ_3 = m_3, and κ_4 = m_4 − 3, with m_k denoting the kth-order moment of y_i.

The entropy approximation derived from (26) is only valid for uncorrelated random variables, so it is necessary to preprocess the mixed signals (prewhitening) before estimating their mutual information. Whitening or sphering a mixture of signals consists of filtering the signals so that their covariances are zero (uncorrelatedness), their means are zero, and their variances equal unity. The evaluation function that we compute is the inverse of the mutual information in (25), so that the objective of the GA is to maximize the following function in order to increase the statistical independence between variables:

eval(y) = 1 / I(y)    (27)

B. Synergy Between GAs and Natural Gradient Descent

Given a combination of weights obtained by the GA for the nonlinear functions, expressed as g(x(t), θ), where θ_i is the parameter vector that defines each function g_i, it is necessary to learn the elements of the linear separating matrix W in order to obtain the output sources y(t). For this task, we use the natural gradient descent method, as proposed in [20] and [23], to derive the learning equation for W

W(t + 1) = W(t) + η(t) [ I − φ(y(t)) y(t)^T ] W(t)    (28)

where φ(·) is the vector of nonlinear score functions given by

(29)

and ⊙ denotes the Hadamard product of two vectors.

C. Genetic Operators

Typical crossover and mutation operators are used for the manipulation of the current population in each iteration of the GA. The crossover operator is "Simple One-point Crossover." The mutation operator is "Non-Uniform Mutation" [13]. The latter presents the advantage, compared with the classical uniform mutation operator, of making less significant changes to the genes of a chromosome as the number of generations grows. This property makes the exploration–exploitation tradeoff more favorable to exploration in the early stages of the algorithm, while exploitation becomes more important as the solution given by the GA approaches the optimum.

Fig. 7. Original signals (s) corresponding to three persons saying the numbers 1 to 10 in English, Japanese, and Spanish.

D. Simulation Results

To provide an experimental demonstration of the validity of GABSS, we used a system of three sources. The signals correspond to three different persons saying the numbers one to ten in English, Japanese, and Spanish, respectively. The number of samples was 12 600. These signals were first linearly mixed with a 3 × 3 mixture matrix A

(30)

The nonlinear distortions f_i were then applied to each channel.

The goal of the simulation was to analyze the behavior of the GA and to observe whether the fitness function is thereby optimized. To this end, we studied the mixing matrix obtained by the algorithm and the inverse functions. When the number of generations reached its maximum value, the best individual of the population was selected and the estimated signals were extracted using the demixing matrix W and the inverse functions g_i. Figs. 7 and 8 show the original and mixed signals, respectively. Fig. 9 shows the separated signals obtained with the proposed algorithm in one of the simulations. As can be seen, the signals are very similar to the original ones, up to possible scaling factors and permutations of the sources (e.g., an estimated signal may match a source signal in a different position and may also change in amplitude).
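Putting the pieces of GABSS together, the sketch below evaluates one GA individual: a parameter matrix θ for the odd-polynomial inverses (4), an inner natural-gradient loop (28) for W, and a fitness in the spirit of (27). The cubic score φ(y) = y³ and the fourth-moment dependence proxy are common stand-ins assumed here; the paper derives φ and the mutual-information estimate from the Gram–Charlier expansion (26).

```python
import numpy as np

def poly_inverse(x, theta):
    """Eq. (4): odd polynomial g_i(x) = sum_k theta_{i,k} x^(2k-1),
    applied channel-wise; theta holds one row of coefficients per channel."""
    return np.stack([sum(t * xi ** (2 * k + 1) for k, t in enumerate(row))
                     for xi, row in zip(x, theta)])

def natural_gradient_W(z, iters=200, eta=0.01):
    """Eq. (28)-style rule: W <- W + eta * (I - phi(y) y^T) W.
    The cubic score phi(y) = y**3 is a common stand-in assumption."""
    p, T = z.shape
    W = np.eye(p)
    for _ in range(iters):
        y = W @ z
        phi = y ** 3
        W += eta * (np.eye(p) - (phi @ y.T) / T) @ W
    return W

def fitness(theta, x):
    """Eq. (27)-style fitness of one GA individual: inverse of a cheap
    pairwise-dependence proxy (fourth-moment covariances), assumed here
    in place of the paper's Gram-Charlier mutual-information estimate."""
    z = poly_inverse(x, theta)
    z = (z - z.mean(axis=1, keepdims=True)) / (z.std(axis=1, keepdims=True) + 1e-9)
    y = natural_gradient_W(z) @ z
    p = y.shape[0]
    dep = sum(abs(np.mean(y[i]**2 * y[j]**2) - np.mean(y[i]**2) * np.mean(y[j]**2))
              for i in range(p) for j in range(i + 1, p))
    return 1.0 / (dep + 1e-9)
```

A GA would then evolve a population of θ matrices under this fitness, applying the one-point crossover and nonuniform mutation of Section IV-C, and keep the best individual's W and g as the final separating system.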


Fig. 8. Mixed signals (x) after a post-nonlinear mixture.

Fig. 9. Obtained signals (y), which approximate the source signals in Fig. 7.

Fig. 10. Comparison of the unknown f and its approximation by the polynomial p.

Fig. 11. Representation of the joint distribution of the original (s), mixed (x), and obtained (y) signals.

Fig. 10 compares the approximation given by the polynomial functions p_i with the inverses of the nonlinearities f_i. Finally, Fig. 11 shows the joint representation of the original, mixed, and obtained signals.

1) Parameters of the GA: In this practical application, the population size and the number of generations were held fixed. Averaged over 100 simulations, the crosstalk parameters of the separated signals with respect to the original voices were measured in decibels for the three outputs. Concerning the genetic operator parameters, the crossover probability per chromosome and the mutation probability per gene were selected because of their better performance when compared with the other combinations that were evaluated.

V. PERFORMANCE COMPARISON

After having presented both approaches, let us compare the two. At first glance, simulated annealing is the faster approach, although competitive learning gives better results in terms of accuracy. The GA approach, on the other hand, is slower but leads to better performance when there are many local minima, owing to the tendency of other algorithms to get stuck in these local solutions. Although a thorough performance analysis should be carried out in future studies, some conclusions can be drawn from the experimental performance results shown in Table I. The parameters of the GA were the same as those described in Section IV-D.


TABLE I PERFORMANCE COMPARISON OF THE TWO APPROACHES


To quantify the performance achieved, we calculated the component-wise crosstalk measure (23), running each algorithm 100 times for two and three nonlinear mixtures and also measuring the convergence time. The two-uniform-signals simulation corresponds to the Lin–Cowan nonlinear mixture already presented in Section III-D. The simulation with two synthetic signals and a voice signal refers to a sinusoidal signal, a square signal, and a telephone-ring recording, respectively. The three Laplacian signals correspond to those shown in the simulation described in Section IV-D.

VI. CONCLUSION

We have shown a new, powerful adaptive-geometric method based on competitive unsupervised learning and simulated annealing, which finds the distribution axes of the observed signals, or independent components, by means of a piecewise linearization in the mixture space; the use of simulated annealing in the optimization of a fourth-order statistical criterion is an experimental advance. This article also discusses an appropriate application of GAs to the complex problem of the blind separation of sources. It is widely believed that the specific potential of genetic or evolutionary algorithms originates from their parallel search by means of entire populations. In particular, the ability to escape from local optima is very unlikely to be observed in steepest-descent methods. Although to date, and to the best of the authors' knowledge, there is little mention in the literature of this synergy between GAs and BSS in nonlinear mixtures [20], this article shows that GAs provide a perfectly valid approach to the problem. We are currently studying the adaptation of the proposed GA technique to other types of nonlinear mixtures, such as time-dependent and convolutive mixtures. Future work will include a more meticulous performance comparison of the proposed methods and their application to higher dimensionality problems.

REFERENCES

[1] S.-I. Amari, A. Cichocki, and H. Yang, "A new learning algorithm for blind signal separation," Adv. Neural Inform. Processing Syst., vol. 8, pp. 757–763, 1996.
[2] A. Bell and T. J. Sejnowski, "An information-maximization approach to blind separation and blind deconvolution," Neural Comput., pp. 1129–1159, 1995.
[3] G. Burel, "Blind separation of sources: A nonlinear neural algorithm," Neural Netw., vol. 5, pp. 937–947, 1992.
[4] J. F. Cardoso, "Source separation using higher order moments," in Proc. Int. Conf. Acoustics, Speech, Signal Processing, Glasgow, U.K., May 1989, pp. 2109–2112.
[5] D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning. Reading, MA: Addison-Wesley, 1989.
[6] S. Haykin, Neural Networks. Englewood Cliffs, NJ: Prentice-Hall, 1999.
[7] A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Analysis. New York: Wiley, 2001.
[8] A. Hyvärinen and E. Oja, "A fast fixed-point algorithm for independent component analysis," Neural Comput., vol. 9, no. 7, pp. 1483–1492, 1997.
[9] C. Jutten and J. Herault, "Blind separation of sources, part I: An adaptive algorithm based on neuromimetic structure," Signal Processing, vol. 24, no. 1, pp. 1–10, 1991.
[10] T.-W. Lee, B. Koehler, and R. Orglmeister, "Blind separation of nonlinear mixing models," in Proc. IEEE Neural Networks for Signal Processing, 1997, pp. 406–415.
[11] J. K. Lin, D. G. Grier, and J. D. Cowan, "Source separation and density estimation by faithful equivariant SOM," in Advances in Neural Information Processing Systems, vol. 9. Cambridge, MA: MIT Press, 1997, pp. 536–542.
[12] A. Mansour, C. Jutten, and P. Loubaton, "Subspace method for blind separation of sources in convolutive mixtures," in Proc. Eur. Signal Processing Conf., Trieste, Italy, Sept. 1996, pp. 2081–2084.
[13] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, 3rd ed. New York: Springer-Verlag, 1999.
[14] A. V. Oppenheim, E. Weinstein, K. C. Zangi, M. Feder, and D. Gauger, "Single-sensor active noise cancellation," IEEE Trans. Speech Audio Processing, vol. 2, pp. 285–290, Apr. 1994.
[15] P. Pajunen, A. Hyvärinen, and J. Karhunen, "Nonlinear blind source separation by self-organizing maps," in Proc. Progress in Neural Information Processing, vol. 2, New York, 1996, pp. 1207–1210.
[16] C. G. Puntonet, A. Mansour, C. Bauer, and E. Lang, "Separation of sources using simulated annealing and competitive learning," Neurocomputing, vol. 49, no. 1–4, pp. 39–60, 2002.
[17] C. G. Puntonet and A. Prieto, "Neural net approach for blind separation of sources based on geometric properties," Neurocomputing, vol. 18, no. 3, pp. 141–164, 1998.
[18] F. Rojas, I. Rojas, R. M. Clemente, and C. G. Puntonet, "Nonlinear blind source separation using genetic algorithms," in Proc. 3rd Int. Workshop Independent Component Analysis Signal Separation, San Diego, CA, 2001, pp. 400–405.
[19] A. Taleb and C. Jutten, "Source separation in post-nonlinear mixtures," IEEE Trans. Signal Processing, vol. 47, no. 10, pp. 2807–2820, 1999.
[20] Y. Tan and J. Wang, "Nonlinear blind source separation using higher order statistics and a genetic algorithm," IEEE Trans. Evol. Comput., vol. 5, pp. 600–612, Dec. 2001.
[21] Y. Tan, J. Wang, and J. M. Zurada, "Nonlinear blind source separation using a radial basis function network," IEEE Trans. Neural Networks, vol. 12, pp. 124–134, Jan. 2001.
[22] H. L. N. Thi and C. Jutten, "Blind source separation for convolutive mixtures," Signal Processing, vol. 45, pp. 209–229, 1995.
[23] H. H. Yang, S.-I. Amari, and A. Cichocki, "Information-theoretic approach to blind separation of sources in nonlinear mixture," Signal Processing, vol. 64, pp. 291–300, 1998.
[24] D. Yellin and E. Weinstein, "Multichannel signal separation: Methods and analysis," IEEE Trans. Signal Processing, vol. 44, pp. 106–118, Jan. 1996.


Fernando Rojas received the B.Sc. degree in computer science and the Ph.D. degree in independent component analysis and blind source separation from the University of Granada, Granada, Spain, in 2000 and 2004, respectively. Currently, he is an Associate Professor working in the Department of Computer Architecture and Technology, University of Granada. From 2000 to 2001, he worked under a local fellowship with the Department of Computer Science and Artificial Intelligence at the University of Granada. He was a Visiting Researcher with the University of Regensburg, Regensburg, Germany, in 2002. His main research interests include signal processing, source separation, independent component analysis, and evolutionary computation.

Manuel Rodríguez-Álvarez received the B.Sc. degree in electronic physics and the Ph.D. degree in physics from the University of Granada, Granada, Spain, in 1986 and 2002, respectively. Currently, he is Associate Professor at the Computer Architecture and Technology Department, University of Granada. He was a Visiting Professor at the National Polytechnic Institute, Grenoble, France, in 1994 and 1995 and at the University of Regensburg, Regensburg, Germany, in 2002. He was Associate Professor at the Electronic Department, University of Granada. His research interests include blind source separation (BSS) and independent component analysis (ICA).

Ignacio Rojas received the M.S. degree in physics and electronics and the Ph.D. degree in neuro-fuzzy networks from the University of Granada, Granada, Spain, in 1992 and 1996, respectively. Currently, he is a Teaching Assistant with the Department of Architecture and Computer Technology, University of Granada. In 1998, he was a Visiting Professor at the Berkeley Initiative in Soft Computing (BISC), University of California at Berkeley. His research interests are hybrid systems combining fuzzy logic, genetic algorithms, and neural networks, as well as financial forecasting and financial analysis.

Carlos G. Puntonet was born in Barcelona, Spain, on August 11, 1960. He received the B.Sc., M.Sc., and Ph.D. degrees in electronics physics from the University of Granada, Granada, Spain, in 1982, 1986, and 1994, respectively. Currently, he is an Associate Professor with the Department of Architecture and Computer Technology, University of Granada. He has been a Visiting Researcher with the Laboratoire de Traitement d'Images et Reconnaissance de Formes (INPG), Grenoble, France; the Institute of Biophysics, Regensburg, Germany; and the Institute of Physical and Chemical Research (RIKEN), Nagoya, Japan. His research interests include signal processing, independent component analysis and blind separation of sources, artificial neural networks, and optimization methods.

Rubén Martín-Clemente (S'96–M'01) was born in Córdoba, Spain, on August 9, 1973. He received the M.Eng. and Ph.D. (Hons.) degrees in telecommunications engineering from the University of Seville, Seville, Spain, in 1996 and 2000, respectively. Currently, he is an Assistant Professor with the Department of Electronic Engineering, University of Seville. His research interests include blind signal separation, higher order statistics, and statistical signal and array processing, together with their application to biomedical problems and communications. He has coauthored many contributions in international journals and conference papers.
