Multichannel Blind Deconvolution of Non-minimum Phase Systems Using Information Backpropagation

L.-Q. Zhang, A. Cichocki and S. Amari
Brain-style Information Systems Group, RIKEN Brain Science Institute
Wako-shi, Saitama 351-0198, JAPAN
[email protected]

In Proceedings of the 6th International Conference on Neural Information Processing (ICONIP'99), pp. 210-216, Perth, Australia, November 16-20.
Abstract — We present a novel method, the filter decomposition approach, for multichannel blind deconvolution of non-minimum phase systems. In [20] we developed an efficient natural gradient algorithm for causal FIR filters. In this paper we further study the natural gradient method for noncausal filters. We decompose the doubly finite filters into a product of two filters, a noncausal FIR filter and a causal FIR filter. The natural gradient algorithm is employed to train the causal FIR filter, and a novel information backpropagation algorithm is developed for training the noncausal FIR filter. Simulations are given to illustrate the effectiveness and validity of the algorithm.

I. Introduction

Recently, blind separation of independent sources has become an increasingly important research area, due to its similarity to the separation capability of the human brain as well as its rapidly growing applications in various fields, such as telecommunication systems, image enhancement and biomedical signal processing. Refer to the review papers [3] and [8] for details.

It has been shown that the natural gradient dramatically improves the learning efficiency in blind separation and blind deconvolution [1]. For the doubly infinite filter, the natural gradient algorithm has been developed by Amari et al. [5]. However, in most practical applications we have to implement doubly finite filters as demixing models. The main objective of this paper is to develop an efficient learning algorithm for training noncausal demixing models. In contrast to doubly infinite filters, the doubly finite filters do not have self-closed multiplication and inverse operations in the manifold of filters with a fixed length: in general the multiplication of two filters of a given length makes a new filter of extended length, and so does the inverse operation. In [20] we studied the geometrical structure of FIR manifolds and developed an efficient learning algorithm for training FIR filters. In this paper we decompose the doubly finite filters into two cascaded FIR filters, a forward filter (noncausal component) and a delay filter (causal component). We employ the natural gradient descent scheme to train the causal FIR filter and use the information backpropagation scheme to train the noncausal FIR filter. Novel learning algorithms are derived to update the parameters in the demixing model. Simulations are given to illustrate the effectiveness and validity of the proposed algorithm.

II. Problem Formulation

In this paper we consider, as a mixing model, a multichannel, linear time-invariant (LTI) and noncausal system of the form

    x(k) = H(z) s(k),                                               (1)

where H(z) = \sum_{p=-\infty}^{\infty} H_p z^{-p}, z^{-1} is the delay operator, H_p is an n x n-dimensional matrix of mixing coefficients at time-lag p, called the impulse response at time p, and s(k) is an n-dimensional vector of source signals with mutually independent components.

The goal of multichannel blind deconvolution is to retrieve the source signals using only the sensor signals x(k) and certain knowledge of the source signal distributions and statistics. We carry out the blind deconvolution with another multichannel LTI system, a doubly infinite multichannel equalizer of the form

    y(k) = W(z) x(k),                                               (2)

where W(z) = \sum_{p=-\infty}^{\infty} W_p z^{-p}, y(k) = [y_1(k), ..., y_n(k)]^T is the n-dimensional vector of outputs, and W_p is an n x n-dimensional coefficient matrix at time-lag p. The global transfer function is defined by G(z) = W(z)H(z). The objective of the blind deconvolution task is to find W(z) such that

    G(z) = P \Lambda D(z),                                          (3)

where P \in R^{n x n} is a permutation matrix, D(z) = diag{z^{-d_1}, ..., z^{-d_n}}, and \Lambda \in R^{n x n} is a nonsingular diagonal scaling matrix. In other words, the objective of multichannel blind deconvolution is to recover the source vector s(k) from the observation vector x(k), up to possibly scaled, reordered, and delayed estimates.
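To fix ideas, the following NumPy sketch applies a multichannel filter with finitely many taps W_{-N}, ..., W_N, of the kind used as the demixing model below, to a block of samples. This is our own illustration, not code from the paper; the function name apply_filter, the zero padding at the block edges, and the toy dimensions are all assumptions.

    import numpy as np

    def apply_filter(W, x, N):
        """Apply a doubly finite multichannel filter W(z) = sum_{p=-N}^{N} W_p z^{-p}.

        W : array (2N+1, n, n); W[N+p] holds the coefficient W_p
        x : array (T, n), one n-dimensional sample per row
        Returns y with y(k) = sum_p W_p x(k-p), zero-padded at the edges.
        """
        T, n = x.shape
        y = np.zeros((T, n))
        for p in range(-N, N + 1):
            for k in range(T):
                if 0 <= k - p < T:          # p < 0 uses "future" samples: noncausal
                    y[k] += W[N + p] @ x[k - p]
        return y

    # Toy example: n = 2 channels, filter length N = 3, T = 100 samples.
    rng = np.random.default_rng(0)
    N, n, T = 3, 2, 100
    W = rng.normal(size=(2 * N + 1, n, n)) * 0.1
    W[N] += np.eye(n)                       # make the zero-lag term dominant
    x = rng.uniform(-1, 1, size=(T, n))     # i.i.d. uniform signals, as in Sec. VII
    y = apply_filter(W, x, N)
    print(y.shape)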
In practical applications we have to implement the blind deconvolution problem with doubly finite multichannel filters

    W(z) = \sum_{p=-M}^{N} W_p z^{-p},                              (4)

where M and N are given positive integers. The set of such filters (4) is denoted by M(N, M),

    M(N, M) = { W(z) | W(z) = \sum_{p=-M}^{N} W_p z^{-p} }.         (5)

For simplicity, we assume M = N in the following discussion. In general the multiplication of two filters in M(N, N) will enlarge the filter length, so the product does not belong to M(N, N) anymore. This makes it difficult to introduce natural gradients in the manifold of doubly finite multichannel filters. Moreover, it is not easy to discuss directly the invertibility of such filters.

[Fig. 1. Illustration of the blind deconvolution structure: the unknown mixing model H(z) maps s(k) to x(k); the demixing model, the cascade of R(z^{-1}) and L(z), maps x(k) through u(k) to y(k).]

III. Filter Decomposition

In order to explore the geometric structure of M(N, N) and an effective learning algorithm for W(z), we present a novel filter decomposition approach and introduce operations for filters in the Lie group framework.

Let us decompose a noncausal filter into a cascade form of two FIR filters,

    W(z) = L(z) R(z^{-1}),                                          (6)

where L(z) = \sum_{p=0}^{N} L_p z^{-p} and R(z^{-1}) = \sum_{p=0}^{N} R_p z^{p}, with R_0 = I, are both one-sided finite multichannel filters: one is a forward filter (noncausal component) and the other is a delay filter (causal component). The relation between the coefficients of the three filters is as follows:

    W_k = \sum_{p-q=k} L_p R_q,  k = -N, ..., N.                    (7)

With this decomposition, it becomes much easier to discuss the invertibility of doubly finite multichannel filters in the Lie group sense [20]. An important theoretical problem is what conditions guarantee that a filter W(z) in M(N, N) has such a decomposition (6).

Theorem 1: Any doubly finite multichannel filter W(z) in M(N, N) has the decomposition (6) if the following matrix \bar{W} is nonsingular,

    \bar{W} = \begin{bmatrix}
        W_0     & W_{-1}  & \cdots & W_{-N+1} \\
        W_1     & W_0     & \cdots & W_{-N+2} \\
        \vdots  & \vdots  & \ddots & \vdots   \\
        W_{N-1} & W_{N-2} & \cdots & W_0
    \end{bmatrix}.                                                  (8)

Proof: See Appendix.

In this paper we develop a natural gradient approach to adjust the parameters in the filter L(z), and use the information back-propagation method to train the parameters in the filter R(z^{-1}). The blind deconvolution process of this paper is illustrated in Figure 1. It is plausible to decompose the doubly finite filters into the product of a causal FIR filter and a noncausal FIR filter, since it is then easy to study the invertibility of such filters and to develop an efficient learning algorithm.
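Theorem 1 reduces the existence of the decomposition (6) to the nonsingularity of a single block matrix, which is easy to test numerically. The sketch below is our own illustration of that test, assuming the block layout of (8); the helper name decomposition_condition and the determinant threshold are not from the paper.

    import numpy as np

    def decomposition_condition(W, N, n):
        """Build the block matrix of Theorem 1 (Eq. 8) and test its nonsingularity.

        W : dict mapping lag p (from -N to N) to the n x n coefficient W_p.
        Returns (Wbar, ok), where ok is True when the decomposition (6) exists.
        """
        Wbar = np.zeros((N * n, N * n))
        for i in range(N):            # block row i holds W_{i-j} at block column j
            for j in range(N):
                Wbar[i*n:(i+1)*n, j*n:(j+1)*n] = W[i - j]
        ok = abs(np.linalg.det(Wbar)) > 1e-12   # crude numerical nonsingularity test
        return Wbar, ok

    # Toy example with N = 3 lags on each side and n = 2 channels.
    rng = np.random.default_rng(1)
    N, n = 3, 2
    W = {p: rng.normal(size=(n, n)) * 0.2 for p in range(-N, N + 1)}
    W[0] += np.eye(n)
    _, ok = decomposition_condition(W, N, n)
    print("decomposable:", ok)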
IV. Natural Gradient Algorithm for FIR Filters

In the paper [20] we studied the geometrical structure of the FIR filter manifold, introduced the Lie group and Riemannian metric on the manifold, then derived a novel natural gradient algorithm for training FIR filters and discussed the properties of the learning algorithm.

In order to separate independent sources by a demixing model, we formulate the blind deconvolution problem as an optimization problem. Assume that p(y, W) and p_i(y_i, W) are the joint probability density function of y and the marginal pdf of y_i (i = 1, ..., n), respectively. Our target is to make the components of y both spatially mutually independent and temporally identically independently distributed, as far as possible. To this end, we employ the Kullback-Leibler divergence as a cost function [4],

    l(y, W) = -H(y, W) + \sum_{i=1}^{n} H_i(y, W),                  (9)

where H(y, W) = -\int p(y, W) log p(y, W) dy and H_i(y, W) = -\int p_i(y_i, W) log p_i(y_i, W) dy_i. The divergence l(y, W) is a nonnegative functional which measures the mutual independence of the output signals y_i(k). The output signals y are mutually independent if and only if l(y, W) = 0.

For the FIR demixing model, the cost function can be simplified as follows:

    l(W(z)) = -log |det(W_0)| - \sum_{i=1}^{n} log p_i(y_i(k), W).  (10)

The natural gradient learning algorithm [20] is described by

    \Delta W(z) = -\eta \frac{\partial l(W(z))}{\partial W(z)} W^T(z^{-1}) * W(z),

that is,

    \Delta W_p = \eta \sum_{q=0}^{p} [ \delta_{0q} I - \varphi(y(k)) y^T(k-q) ] W_{p-q},    (11)

for p = 0, 1, ..., N, where \eta is a learning rate, \delta_{0q} is the Kronecker delta, and * is the Lie group multiplication operator of two filters. In particular, the learning algorithm for W_0 is described by

    \Delta W_0 = \eta [ I - \varphi(y(k)) y^T(k) ] W_0.             (12)

Refer to [20] for the detailed derivation. It should be noted that the algorithm looks similar to, but is in fact not identical to, the one in [5]. The essential difference is that the update rule for W_p in this paper depends only on the coefficients W_q in the range q = 0, ..., p, while in [5] it depends on all parameters W_q in the range q = 0, ..., N. The algorithm (11) has two important properties: uniform performance (the equivariant property) and nonsingularity of W_0. The learning algorithm (11) works efficiently for minimum phase systems.
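A minimal sketch of one step of the update (11)-(12) may be helpful. The code below is our own rendering under the assumption that a buffer of the current and N past outputs is available; the function name and buffering convention are not the paper's.

    import numpy as np

    def natural_gradient_step(L, y_buf, eta, phi=lambda y: y**3):
        """One natural gradient update (11) for a causal FIR filter L(z).

        L     : array (N+1, n, n), L[p] = L_p
        y_buf : array (N+1, n), y_buf[q] = y(k-q) (current and N past outputs)
        phi   : componentwise nonlinearity; phi(y) = y^3 is used in Sec. VII
        """
        Np1, n, _ = L.shape
        f = phi(y_buf[0])                     # phi(y(k))
        # Y_q = delta_{0q} I - phi(y(k)) y(k-q)^T, cf. Eq. (23)
        Y = np.array([(np.eye(n) if q == 0 else np.zeros((n, n)))
                      - np.outer(f, y_buf[q]) for q in range(Np1)])
        dL = np.zeros_like(L)
        for p in range(Np1):                  # Eq. (11): only lags q <= p enter
            for q in range(p + 1):
                dL[p] += Y[q] @ L[p - q]
        return L + eta * dL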
V. Information Backpropagation

The capability of learning in a cascade of adaptive filters is of fundamental importance to the blind deconvolution and separation of nonminimum phase mixtures. To explore this, we begin with the filter decomposition and show how to train the noncausal filter using the mutual information backpropagation approach.

According to the filter decomposition (6), we denote

    u(k) = R(z^{-1}) x(k),                                          (13)
    y(k) = L(z) u(k).                                               (14)

If we consider u(k) as the observed signals, we can apply the natural gradient learning rule (11) to update the parameters in the filter L(z). In order to develop a learning algorithm for the noncausal filter R(z^{-1}), we use the information backpropagation technique. The backpropagation rule is described as follows:

    \frac{\partial l(W(z))}{\partial u(k)} = \sum_{p=0}^{N} \sum_{i=1}^{n} \frac{\partial l(W(z))}{\partial y_i(k+p)} \frac{\partial y_i(k+p)}{\partial u(k)} = L^H(z) \varphi(y(k)),    (15)

where L^H(z) = \sum_{p=0}^{N} L_p^T z^{p} is the conjugate operator of L(z), and

    \frac{\partial l(W(z))}{\partial R_q} = \sum_{i=1}^{n} \frac{\partial l(W(z))}{\partial u_i(k)} \frac{\partial u_i(k)}{\partial R_q} = \frac{\partial l(W(z))}{\partial u(k)} x^T(k+q),    (16)

for q = 1, ..., N. The structure of information backpropagation is illustrated in Figure 2. In this structure, the blind statistical error \partial l(W(z)) / \partial y(k) is propagated backward through the channel L^H(z) to form the blind error \partial l(W(z)) / \partial u(k), which is used to train the noncausal multichannel filter R(z^{-1}).

[Fig. 2. Information back-propagation structure: x(k) passes through R(z^{-1}) and L(z) to produce u(k) and y(k); the error signal \varphi(y(k)) is propagated backward through L^H(z).]

There are several learning schemes for training the parameters of R(z^{-1}). One typical scheme is the ordinary gradient descent learning algorithm, described as follows:

    \Delta R_q = -\eta \frac{\partial l(W(z))}{\partial u(k)} x^T(k+q),    (17)

where \eta is a learning rate. In the next section we also employ the natural gradient algorithm to update R(z^{-1}) with a slight modification, namely, we impose the constraint R_0 = I on the matrix R_0.
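To make the data flow of (15)-(17) concrete, here is a small NumPy sketch, again our own illustration. It treats \varphi(y) as the blind statistical error \partial l / \partial y, and it assumes buffers of "future" samples y(k+p) and x(k+q); in an online implementation these are obtained by simply delaying the update by N steps.

    import numpy as np

    def backprop_error(Lf, y_buf, phi=lambda y: y**3):
        """Blind error at u(k), Eq. (15): dl/du(k) = sum_p L_p^T phi(y(k+p)).

        Lf    : array (N+1, n, n), Lf[p] = L_p
        y_buf : array (N+1, n), y_buf[p] = y(k+p)
        """
        return sum(Lf[p].T @ phi(y_buf[p]) for p in range(Lf.shape[0]))

    def gradient_step_R(R, e_u, x_buf, eta):
        """Ordinary gradient update (17): R_q <- R_q - eta * (dl/du(k)) x(k+q)^T.

        R     : array (N+1, n, n), R[q] = R_q; R[0] stays fixed at I
        e_u   : blind error dl/du(k) from backprop_error
        x_buf : array (N+1, n), x_buf[q] = x(k+q)
        """
        for q in range(1, R.shape[0]):
            R[q] -= eta * np.outer(e_u, x_buf[q])
        return R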
VI. Numerical Implementation

In this section we consider the efficient implementation of the learning algorithms for blind deconvolution of nonminimum phase mixtures. We utilize the learning algorithm (11) to train the filter L(z). But since R(z^{-1}) is the noncausal part of the demixing model, we have to propose some strategy to implement its algorithm. There are several schemes to numerically implement the learning algorithm. We introduce the following notation:

    X(k) = [x^T(k), x^T(k+1), ..., x^T(k+N)]^T,
    U(k) = [u^T(k), u^T(k-1), ..., u^T(k-N)]^T,
    R = [R_0, ..., R_N],  L = [L_0, ..., L_N],                      (18)

so that we can rewrite (13)-(14) in the following form:

    u(k) = R X(k),                                                  (19)
    y(k) = L U(k).                                                  (20)

A. Natural gradient learning for L(z)

It is easy to see that the natural gradient learning algorithm (11) is equivalent to the following matrix form:

    L(k+1) = L(k) + \eta(k) Y(k) L(k),                              (21)

where the matrix Y(k) is defined as

    Y(k) = \begin{bmatrix}
        Y_0    & 0       & \cdots & 0      \\
        Y_1    & Y_0     & \cdots & 0      \\
        \vdots & \vdots  & \ddots & \vdots \\
        Y_N    & Y_{N-1} & \cdots & Y_0
    \end{bmatrix},                                                  (22)

and the matrices Y_p are defined by

    Y_p = \delta_{0p} I - \varphi(y(k)) y^T(k-p),  p = 0, ..., N.   (23)

From (21) it is observed that the natural gradient algorithm (11) is different from the one proposed in [5]. The difference comes from the fact that the matrix Y(k) in this paper is lower-triangular, while in [5] it is a full matrix.

B. Natural gradient learning for R(z^{-1})

In order to obtain better learning performance, we also implement the natural gradient approach for the noncausal filter R(z^{-1}). The natural gradient algorithm for R(z^{-1}) is described by

    R(k+1) = R(k) + \eta(k) Z(k) R(k),                              (24)

where the matrix Z(k) is defined as

    Z(k) = \begin{bmatrix}
        0      & 0       & \cdots & 0      \\
        Z_1    & 0       & \cdots & 0      \\
        \vdots & \vdots  & \ddots & \vdots \\
        Z_N    & Z_{N-1} & \cdots & 0
    \end{bmatrix},                                                  (25)

and the matrices Z_p are defined by

    Z_p = -\frac{\partial l(W(z))}{\partial u(k)} u^T(k+p),  p = 1, ..., N.    (26)

In the learning algorithm (24) we see that R_0 is fixed to I. Computer simulations show that the natural gradient learning algorithm has much better convergence performance than the ordinary gradient algorithm (17).

C. Minimum mutual information approach

The information backpropagation (15) usually needs a lot of computation. In order to reduce the computing cost, we could directly implement the minimum mutual information approach for R(z^{-1}). The learning rule is the same as (24), except for the matrices Z_p, which are defined by

    Z_p = -\varphi(u(k)) u^T(k+p),  p = 1, ..., N.                  (27)

This learning algorithm is much simpler and needs less computation. However, since it runs independently of the learning algorithm (21), computer simulations show that its convergence is much slower than the information backpropagation algorithm.
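The stacked form (21)-(23) can be rendered directly in code. The sketch below is our own; it stacks the taps L_0, ..., L_N vertically so that a single matrix product realizes all the convolutions of (11).

    import numpy as np

    def assemble_Y(y_buf, phi=lambda y: y**3):
        """Assemble the block lower-triangular matrix Y(k) of Eqs. (22)-(23).

        y_buf : array (N+1, n), y_buf[p] = y(k-p)
        """
        Np1, n = y_buf.shape
        f = phi(y_buf[0])
        Yp = [(np.eye(n) if p == 0 else np.zeros((n, n))) - np.outer(f, y_buf[p])
              for p in range(Np1)]
        Y = np.zeros((Np1 * n, Np1 * n))
        for i in range(Np1):
            for j in range(i + 1):            # lower-triangular block Toeplitz
                Y[i*n:(i+1)*n, j*n:(j+1)*n] = Yp[i - j]
        return Y

    def update_L(Lstack, y_buf, eta):
        """Stacked update L <- L + eta * Y(k) L, Eq. (21).

        Lstack : array ((N+1)*n, n), the taps L_0, ..., L_N stacked vertically.
        """
        return Lstack + eta * assemble_Y(y_buf) @ Lstack

The update (24) for R(z^{-1}) has the same shape, with the strictly block lower-triangular Z(k) of (25)-(27) in place of Y(k) and the first block row left at zero, so that R_0 stays equal to I.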
D. Performance of the algorithms

To evaluate the performance of the proposed learning algorithms, we employ the multichannel intersymbol interference, denoted by M_{ISI}, as a criterion:

    M_{ISI} = \sum_{i=1}^{n} \frac{\sum_{j,p} |g_{pij}|^2 - \max_{j,p} |g_{pij}|^2}{\max_{j,p} |g_{pij}|^2}
            + \sum_{j=1}^{n} \frac{\sum_{i,p} |g_{pij}|^2 - \max_{i,p} |g_{pij}|^2}{\max_{i,p} |g_{pij}|^2},    (28)

where g_{pij} denotes the (i, j) entry of the coefficient G_p of the global transfer function. It is easy to show that M_{ISI} = 0 if and only if G(z) is of the form (3). In order to remove the effect of a single numerical trial on the evaluation of the algorithms, we use the ensemble average approach: in each trial we obtain a time sequence of M_{ISI}, and then we average the ISI performance over the trials.

The learning rate is another important factor in implementing the natural gradient algorithm. The strategy in this paper is to update the learning rate by \eta(k+1) = max{0.9 \eta(k), 10^{-4}}, with \eta(0) = 10^{-2}, every 200 iterations.

VII. Simulations

A large number of computer simulations have been performed to show the effectiveness and performance of the natural gradient learning algorithm and the information backpropagation algorithm. We give two examples to demonstrate the behavior and performance of the algorithm (11). In both examples the mixing model is a multichannel ARMA model,

    u(k) + \sum_{i=1}^{M} A_i u(k-i) = \sum_{i=0}^{N} B_i s(k-i) + v(k),    (29)

where u, s, v \in R^3. The matrices A_i and B_i are randomly chosen such that the mixing system is stable and minimum phase. The sources s are chosen to be i.i.d. signals uniformly distributed in the range (-1, 1), and v is Gaussian noise with zero mean and covariance matrix 0.1 I. The nonlinear activation function is chosen to be \varphi(y) = y^3.

A. Causal FIR filter

In order to evaluate the performance of the learning algorithm (11), we randomly choose minimum-phase filters as the mixing model, and use causal FIR filters to recover the source signals. Both the natural and the ordinary gradient learning algorithms are used to train the parameters. A large number of simulations show that the natural gradient learning algorithm can easily recover the source signals in the sense of G(z) = P \Lambda D(z) in (3). Figure 3 illustrates the 100-trial ensemble average M_{ISI} performance of the natural gradient learning algorithm and the ordinary gradient learning algorithm. It is observed that the natural gradient algorithm usually needs fewer than 2000 iterations to obtain satisfactory results, while the ordinary gradient algorithm needs more than 20000 iterations, since there is a long plateau in ordinary gradient learning.

[Fig. 3. ISI performance of the natural gradient algorithm (solid line) and the ordinary gradient algorithm (dashed line) versus iteration number.]

B. Nonminimum phase mixture cases

To demonstrate the effectiveness and performance of information backpropagation, we randomly select nonminimum phase filters as mixing models and employ doubly finite noncausal filters as demixing models. In the second simulation the sensor signals are produced by the multichannel ARMA model (29), whose parameters are randomly chosen such that the mixing system is stable and nonminimum phase. The zero and pole distributions of the mixing system are plotted in Figure 4. In order to estimate the source signals, the learning algorithms (21) and (24) have been used. Figures 5 and 6 illustrate the coefficients of the global transfer function G(z) = W(z)H(z) at the initial state and after 8000 iterations, respectively, where the (i, j)th sub-figure plots the coefficients of the transfer function G_{ij}(z) = \sum_{p=-N}^{N} g_{pij} z^{-p} up to order N = 60.

[Fig. 4. Zero and pole distributions of the mixing ARMA model H(z).]

[Fig. 5. Coefficients of the global transfer function G(z) at the initial state (sub-figure (i, j) shows G_{ij}(z)).]

[Fig. 6. Coefficients of the global transfer function G(z) after convergence.]
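For completeness, the criterion (28) takes only a few lines to compute from the taps of the global transfer function; the array layout below is our own assumption, not the paper's.

    import numpy as np

    def multichannel_isi(G):
        """Multichannel intersymbol interference M_ISI of Eq. (28).

        G : array (P, n, n); G[p] is the coefficient of the global transfer
            function at the p-th stored lag, so G[:, i, j] collects g_{pij}.
        """
        E = np.abs(G) ** 2                    # |g_{pij}|^2, shape (P, n, n)
        # Row term: for each output i, energy spread over lags p and inputs j.
        row_max = E.max(axis=0).max(axis=1)
        row = ((E.sum(axis=(0, 2)) - row_max) / row_max).sum()
        # Column term: for each input j, energy spread over lags p and outputs i.
        col_max = E.max(axis=0).max(axis=0)
        col = ((E.sum(axis=(0, 1)) - col_max) / col_max).sum()
        return row + col

M_ISI vanishes exactly when each row and column of G(z) contains a single nonzero tap, i.e. when G(z) has the form (3).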
VIII. Appendix

A. Filter Decomposition

In this appendix we present a novel approach to decompose the doubly finite noncausal filters into a cascade form of two one-sided FIR filters. Denote L(z) = \sum_{p=0}^{N} L_p z^{-p} and R(z^{-1}) = \sum_{p=0}^{N} R_p z^{p}, with R_0 = I. Given a doubly finite filter W(z), we want to find an FIR filter R(z^{-1}) such that the multiplication of the two filters satisfies

    [ W(z) R(z^{-1}) ]_N = L(z),                                    (30)

where [W(z)]_N is the truncated polynomial of order N, i.e. any terms of order higher than N in the polynomial W(z) are omitted. In fact we can find R(z) in the following way. The product expands as

    W(z) R(z^{-1}) = \sum_{p} ( \sum_{q=0}^{N} W_{p+q} R_q ) z^{-p}.    (31)

Let the anticausal coefficients of the product vanish,

    \sum_{k=0}^{N} W_{k-p} R_k = 0,  p = 1, ..., N.                 (32)

If the matrix \bar{W} defined in (8) is nonsingular, we can solve the following linear system to obtain the filter R(z):

    \bar{W} \bar{R} = -\bar{W}_{-},                                 (33)

where \bar{R} = [R_1; ...; R_N] and \bar{W}_{-} = [W_{-1}; ...; W_{-N}] are the block columns of the unknown coefficients and of the anticausal coefficients of W(z), respectively. Once we have R(z^{-1}), we define L(z) as in (30). Therefore the filter W(z) can be decomposed as follows:

    W(z) = L(z) R^{\dagger}(z^{-1}),                                (34)

where R^{\dagger}(z^{-1}) is a noncausal filter, the Lie group inverse of R(z), and R^{\dagger} is recurrently defined by R^{\dagger}_0 = R_0^{-1} and R^{\dagger}_p = -\sum_{q=1}^{p} R^{\dagger}_{p-q} R_q for p = 1, ..., N. Refer to [20] for the detailed definition of the Lie group on the FIR manifold.
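The construction in this appendix is easy to make executable. The sketch below is our own, based on the reconstruction of (32)-(33) above: it solves for R(z^{-1}) and then truncates the product (30) to obtain L(z). The function name and the dictionary layout of the taps are assumptions, not code from the paper.

    import numpy as np

    def decompose(W, N, n):
        """Filter decomposition W(z) = L(z) R^dagger(z^{-1}) via Eqs. (30)-(34).

        W : dict mapping lag p in [-N, N] to the n x n coefficient W_p.
        Returns (L, R): lists of taps L_0..L_N and R_0..R_N, with R_0 = I.
        """
        # Solve Eq. (32): sum_{k=0}^{N} W_{k-p} R_k = 0 for p = 1..N, R_0 = I.
        A = np.zeros((N * n, N * n))
        b = np.zeros((N * n, n))
        for p in range(1, N + 1):
            b[(p-1)*n:p*n] = -W[-p]           # the k = 0 term, W_{-p} R_0
            for k in range(1, N + 1):
                A[(p-1)*n:p*n, (k-1)*n:k*n] = W[k - p]
        Rstack = np.linalg.solve(A, b)        # Eq. (33)
        R = [np.eye(n)] + [Rstack[(k-1)*n:k*n] for k in range(1, N + 1)]
        # L(z) = [W(z) R(z^{-1})]_N: keep lags 0..N of the product, Eq. (31).
        L = [sum(W[p + q] @ R[q] for q in range(N + 1) if p + q <= N)
             for p in range(N + 1)]
        return L, R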
References

[1] S. Amari. Natural gradient works efficiently in learning. Neural Computation, 10:251-276, 1998.
[2] S. Amari, T. Chen, and A. Cichocki. Stability analysis of adaptive blind source separation. Neural Computation, 10:1345-1351, 1997.
[3] S. Amari and A. Cichocki. Adaptive blind signal processing - neural network approaches. Proceedings of the IEEE, 86(10):2026-2048, 1998.
[4] S. Amari, A. Cichocki, and H.H. Yang. A new learning algorithm for blind signal separation. In G. Tesauro, D.S. Touretzky, and T.K. Leen, editors, Advances in Neural Information Processing Systems 8 (NIPS*95), pages 757-763, Cambridge, MA, 1996. The MIT Press.
[5] S. Amari, S. Douglas, and A. Cichocki. Multichannel blind deconvolution and source separation using the natural gradient. IEEE Trans. on Signal Processing, to appear.
[6] S. Amari, S. Douglas, A. Cichocki, and H. Yang. Novel on-line algorithms for blind deconvolution using natural gradient approach. In Proc. 11th IFAC Symposium on System Identification, SYSID'97, pages 1057-1062, Kitakyushu, Japan, July 8-11, 1997.
[7] A.J. Bell and T.J. Sejnowski. An information maximization approach to blind separation and blind deconvolution. Neural Computation, 7:1129-1159, 1995.
[8] J.-F. Cardoso. Blind signal separation: Statistical principles. Proceedings of the IEEE, 86(10):2009-2025, 1998.
[9] J.-F. Cardoso and B. Laheld. Equivariant adaptive source separation. IEEE Trans. Signal Processing, SP-43:3017-3029, Dec. 1996.
[10] A. Cichocki and R. Unbehauen. Robust neural networks with on-line learning for blind identification and blind separation of sources. IEEE Trans. Circuits and Systems I: Fundamental Theory and Applications, 43(11):894-906, 1996.
[11] A. Cichocki, R. Unbehauen, and E. Rummert. Robust learning algorithm for blind separation of signals. Electronics Letters, 30(17):1386-1387, 1994.
[12] A. Cichocki and L. Zhang. Two-stage blind deconvolution using state-space models (invited). In Proceedings of the Fifth International Conference on Neural Information Processing (ICONIP'98), pages 729-732, Kitakyushu, Japan, Oct. 21-23, 1998.
[13] A. Cichocki, L. Zhang, and S. Amari. Semi-blind and state-space approaches to nonlinear dynamic independent component analysis. In Proc. NOLTA'98, pages 291-294, 1998.
[14] Y. Hua. Fast maximum likelihood for blind identification of multiple FIR channels. IEEE Trans. Signal Processing, 44:661-672, 1996.
[15] K. Torkkola. Blind separation of convolved sources based on information maximization. In S. Usui, Y. Tohkura, S. Katagiri, and E. Wilson, editors, Proc. of the 1996 IEEE Workshop Neural Networks for Signal Processing 6 (NNSP'96), pages 423-432, New York, NY, 1996. IEEE Press.
[16] H. Yang and S. Amari. Adaptive on-line learning algorithms for blind separation: Maximum entropy and minimum mutual information. Neural Computation, 9:1457-1482, 1997.
[17] L. Zhang, S. Amari, and A. Cichocki. Natural gradient approach to blind separation of over- and under-complete mixtures. In Proceedings of Independent Component Analysis and Signal Separation (ICA'99), pages 455-460, Aussois, France, 1999.
[18] L. Zhang and A. Cichocki. Blind deconvolution/equalization using state-space models. In Proceedings of the 1998 Int'l IEEE Workshop on Neural Networks for Signal Processing (NNSP'98), pages 123-131, Cambridge, UK, August 31 - September 2, 1998.
[19] L. Zhang and A. Cichocki. Blind separation of filtered sources using state-space approach. In M.S. Kearns, S.A. Solla, and D.A. Cohn, editors, Advances in Neural Information Processing Systems, volume 11, pages 648-654. MIT Press, Cambridge, MA, 1999.
[20] L. Zhang, A. Cichocki, and S. Amari. Geometrical structures of FIR manifold and their application to multichannel blind deconvolution. In Proceedings of NNSP'99, pages 303-312, Madison, Wisconsin, August 23-25, 1999.