IEEE SIGNAL PROCESSING LETTERS, VOL. 25, NO. 5, MAY 2018
Approximate Joint Singular Value Decomposition Algorithm Based on Givens-Like Rotation

Jifei Miao, Guanghui Cheng, Yunfeng Cai, and Jing Xia
Abstract—An approximate joint singular value decomposition algorithm is proposed for a set of K (K ≥ 2) complex matrices. It can be seen as an orthogonal non-Hermitian approximate joint diagonalization algorithm. We exploit a Givens-like rotation method based on a special parameterization of the updating matrices and a reasonable approximation. The main points are the presentation of the new parameter structure and the analytical derivation of the elementary updating matrices. High accuracy and a fast convergence rate are obtained. Numerical simulations illustrate the overall good performance of the proposed algorithm.

Index Terms—Givens rotation, joint blind source separation, joint diagonalization, joint singular value decomposition.
I. INTRODUCTION

THE approximate joint diagonalization (AJD) problem for a set of matrices has stimulated increasing interest in the areas of blind source separation (BSS) [1]–[7], independent component analysis [8], biomedical signal processing [9], and so on. One natural extension of the AJD is the approximate joint singular value decomposition (AJSVD), which can be used for the joint blind source separation (JBSS) of two datasets. In general, the AJD problem can be boiled down to the following problem: given a set of matrices (target matrices) A = {A_k ≡ (a_{k,ij}) ∈ C^{N×N} : k = 1, 2, ..., K} that share the following latent common decomposition:
A_k = P D_k Q^H + N_k,   k = 1, 2, ..., K        (1)
where P, Q ∈ C^{N×M} (N ≥ M) are full column rank matrices, and D_k ∈ C^{M×M} and N_k ∈ C^{N×N} are, respectively, diagonal matrices and additive noise matrices for all k = 1, 2, ..., K.
Manuscript received January 19, 2018; revised March 9, 2018; accepted March 10, 2018. Date of publication March 15, 2018; date of current version April 2, 2018. This work was supported in part by the National Natural Science Foundation of China under Grant 11671023, in part by the China Postdoctoral Science Foundation under Grant 2014M560709, and in part by the Fundamental Research Funds for the Central Universities under Grant ZYGX2015J099. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Yong Xiang. (Corresponding author: Guanghui Cheng.)
J. Miao, G.-H. Cheng, and J. Xia are with the School of Mathematical Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan 611731, P. R. China (e-mail: jifmiao@163.com; ghcheng@uestc.edu.cn; jxia116@126.com).
Y. Cai is with the LMAM & School of Mathematical Sciences, Peking University, Beijing 100871, P. R. China (e-mail: yfcai@math.pku.edu.cn).
Color versions of one or more of the figures in this letter are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/LSP.2018.2815584
The superscript (·)^H denotes the conjugate transpose. In the application to BSS, if P = Q, model (1) corresponds to the single-set BSS problem; otherwise, it corresponds to the double-set JBSS problem, which can fully exploit the dependence between the two datasets. In this study, we consider the orthogonal AJD problem in the case that P ≠ Q in the complex domain (i.e., an orthogonal non-Hermitian AJD algorithm, also called AJSVD). Specifically, the aim of our algorithm is to seek two different unitary matrices U and V that make U A_k V^H as diagonal as possible for all A_k ∈ A.

Several state-of-the-art AJSVD and orthogonal JBSS algorithms have been proposed in the literature [10]–[14]. These works are mostly carried out in the real domain. As mentioned in [11], the steepest descent method and the (nonlinear) conjugate gradient method can also be used to solve the AJSVD problem. However, these two algorithms often require a considerable number of iterations to converge and are very sensitive to noise. In [10], Congedo, Phlypo, and Pham proposed an AJSVD algorithm, referred to here as the Givens algorithm, which is a Jacobi-like method based on Givens rotations. To the best of our knowledge, this algorithm has had the best overall performance so far. An alternating scheme is used to update the target matrices in the Givens algorithm; more specifically, the Givens algorithm does not obtain the optimal left and right Givens rotation matrices simultaneously. In this letter, we provide an alternative unitary rotation matrix that depends on only one unknown parameter (not a rotation angle). The left and right unitary rotation matrices can be obtained simultaneously, which improves the performance and shortens the execution time.

The main contributions of this letter are as follows: 1) A new parameter structure for the unitary rotation matrix is proposed. This structure depends on only one unknown parameter (not a rotation angle as in the Givens rotation matrix), so the AJSVD problem can be tackled in the complex domain. 2) Based on a useful approximation, analytic solutions for the parameters can be obtained. 3) Experiments on synthetic data and on the JBSS of two datasets are conducted. The simulation results illustrate that the proposed algorithm performs better than the Givens algorithm.

The rest of this letter is organized as follows. In Section II, we present the proposed algorithm, its overall description, and the analytical derivation of the updating matrices. In Section III, numerical simulations are given to illustrate the performance of the proposed algorithm. Finally, conclusions are drawn in Section IV.
II. PROPOSED ALGORITHM

Similar to the vast majority of AJD algorithms, we consider a natural and effective cost function [4], [8], [15]

J(U, V) = \sum_{k=1}^{K} \| ZDiag\{ U A_k V^H \} \|_F^2        (2)

where U and V are the so-called diagonalizing matrices, and ZDiag{·} is the operator that sets all the diagonal entries to zero. Our purpose is to find two unitary matrices U and V that minimize J(U, V). We carry out the estimation of U and V by using two multiplicative updates written as

U^{(t+1)} = R_1^{(t)} U^{(t)},   V^{(t+1)} = R_2^{(t)} V^{(t)}        (3)

where t is the iteration index and R_1^{(t)}, R_2^{(t)} ∈ C^{N×N} are unitary updating matrices. The minimization of the cost function J(U, V) is solved by repeating the following updating scheme until convergence:

A_k^{(t+1)} = R_1^{(t)} A_k^{(t)} (R_2^{(t)})^H,   t = 0, 1, 2, ....        (4)

The key is to find two optimal updating matrices in each iteration. For a fixed (i, j) position, we consider a reasonable parameter structure for the updating matrices R_1^{(t)} and R_2^{(t)}, which are equal to the identity matrix except for the following four entries:

R_1^{(t)}(i, i) = R_1^{(t)}(j, j) = λ_x^{(t)},   R_1^{(t)}(i, j) = λ_x^{(t)} \bar{x}^{(t)},   R_1^{(t)}(j, i) = −λ_x^{(t)} x^{(t)}        (5)

R_2^{(t)}(i, i) = R_2^{(t)}(j, j) = λ_y^{(t)},   R_2^{(t)}(i, j) = λ_y^{(t)} \bar{y}^{(t)},   R_2^{(t)}(j, i) = −λ_y^{(t)} y^{(t)}        (6)

where \bar{(·)} is the conjugate operator, x^{(t)}, y^{(t)} ∈ C are two unknown parameters, and λ_x^{(t)} = 1/\sqrt{1 + |x^{(t)}|^2}, λ_y^{(t)} = 1/\sqrt{1 + |y^{(t)}|^2}. It is easy to see that R_1^{(t)} and R_2^{(t)} are unitary matrices.

A. Analytical Derivation of the Updating Matrices

For convenience, we omit the iteration index t in the derivation of the updating matrices. For a fixed (i, j) position, the set of target matrices is updated as

Φ_k = R_1 A_k R_2^H,   k = 1, ..., K

where φ_{k,ij} and a_{k,ij} denote the (i, j)th entries of the matrices Φ_k and A_k, respectively. Then, the cost function (2) can be rewritten as

J(R_1, R_2) = \sum_{k=1}^{K} \Big\{ |φ_{k,ij}|^2 + |φ_{k,ji}|^2 + \sum_{n=1, n≠i,j}^{N} \big( |φ_{k,in}|^2 + |φ_{k,jn}|^2 + |φ_{k,ni}|^2 + |φ_{k,nj}|^2 \big) \Big\} + c_0        (7)

where c_0 is a positive constant that does not depend on x and y. After straightforward derivations, we obtain

φ_{k,ij} = λ_x λ_y ( −\bar{y} a_{k,ii} − \bar{y}\bar{x} a_{k,ji} + a_{k,ij} + \bar{x} a_{k,jj} )
φ_{k,ji} = λ_x λ_y ( −x a_{k,ii} − x y a_{k,ij} + a_{k,ji} + y a_{k,jj} )

and, for all n ≠ i, j,

φ_{k,in} = λ_x ( a_{k,in} + \bar{x} a_{k,jn} ),   φ_{k,jn} = λ_x ( −x a_{k,in} + a_{k,jn} )
φ_{k,ni} = λ_y ( a_{k,ni} + y a_{k,nj} ),   φ_{k,nj} = λ_y ( −\bar{y} a_{k,ni} + a_{k,nj} ).

Notice that directly optimizing the cost function (7), a highly nonlinear multiparameter optimization problem, is extremely complicated. Hence, we consider a useful approximation. Suppose we are close enough to a diagonalizing solution in the sense that all off-diagonal entries of A_k (k ∈ {1, 2, ..., K}) have a very small magnitude, i.e., |a_{k,ij}| ≪ 1 (for all k, i, j with i ≠ j). It is then easy to show that |x| ≪ 1 and |y| ≪ 1 [15], [16]. Therefore, we ignore the high-order (≥ 2) terms in x, y, and a_{k,ij} (for all k, i, j with i ≠ j) in the entries of the matrix Φ_k (the simplified matrix is denoted Ψ_k). As a result, the entries of the matrix Ψ_k are the same as those of the matrix A_k except for the two entries ψ_{k,ij} = −\bar{y} a_{k,ii} + a_{k,ij} + \bar{x} a_{k,jj} and ψ_{k,ji} = −x a_{k,ii} + a_{k,ji} + y a_{k,jj}, where ψ_{k,ij} denotes the (i, j)th entry of the matrix Ψ_k.

Denoting J_a(R_1, R_2) = \sum_{k=1}^{K} \| ZDiag\{ Ψ_k \} \|_F^2, (7) can be reasonably simplified as

J_a(R_1, R_2) = \sum_{k=1}^{K} \big( |ψ_{k,ij}|^2 + |ψ_{k,ji}|^2 \big) + c_1        (8)

where c_1 is a positive constant that does not depend on x and y. For the minimization task, we now calculate the partial derivatives of J_a(R_1, R_2) with respect to both \bar{x} and \bar{y}, and set them to zero [4], [15], [17], i.e.,

∂J_a(R_1, R_2)/∂\bar{x} = 0,   ∂J_a(R_1, R_2)/∂\bar{y} = 0.        (9)

Defining

M = [ β_1   −2α_1 ;  −2\bar{α}_1   β_1 ],   f = [ α_2 − α_3 ;  α_4 − α_5 ],

where

α_1 = \sum_{k=1}^{K} \bar{a}_{k,ii} a_{k,jj},   α_2 = \sum_{k=1}^{K} \bar{a}_{k,ii} a_{k,ji},   α_3 = \sum_{k=1}^{K} a_{k,jj} \bar{a}_{k,ij},
α_4 = \sum_{k=1}^{K} a_{k,ii} \bar{a}_{k,ij},   α_5 = \sum_{k=1}^{K} \bar{a}_{k,jj} a_{k,ji},   β_1 = \sum_{k=1}^{K} \big( |a_{k,ii}|^2 + |a_{k,jj}|^2 \big),

(9) can be simply written as M X = f with X = (x, y)^T. When M is invertible, the solution of this system can be analytically derived as

X = M^{−1} f.        (10)
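To make the per-pair update concrete, the following Python/NumPy sketch computes the scalars α_1–α_5 and β_1, solves (10) for x and y, and builds the Givens-like rotations of (5) and (6). It is a minimal illustration of the derivation above, not the authors' reference implementation; the function name and the list-of-arrays interface are our own choices.

```python
import numpy as np

def givens_like_update(A_list, i, j):
    """One (i, j) step of the Givens-like update: solve (10), build R1, R2 of (5)-(6)."""
    a_ii = np.array([A[i, i] for A in A_list])
    a_jj = np.array([A[j, j] for A in A_list])
    a_ij = np.array([A[i, j] for A in A_list])
    a_ji = np.array([A[j, i] for A in A_list])

    # Scalars of Section II-A
    alpha1 = np.sum(np.conj(a_ii) * a_jj)
    alpha2 = np.sum(np.conj(a_ii) * a_ji)
    alpha3 = np.sum(a_jj * np.conj(a_ij))
    alpha4 = np.sum(a_ii * np.conj(a_ij))
    alpha5 = np.sum(np.conj(a_jj) * a_ji)
    beta1 = np.sum(np.abs(a_ii) ** 2 + np.abs(a_jj) ** 2)

    M = np.array([[beta1, -2 * alpha1],
                  [-2 * np.conj(alpha1), beta1]])
    f = np.array([alpha2 - alpha3, alpha4 - alpha5])
    x, y = np.linalg.solve(M, f)            # X = M^{-1} f, Eq. (10)

    N = A_list[0].shape[0]
    lam_x = 1.0 / np.sqrt(1.0 + abs(x) ** 2)
    lam_y = 1.0 / np.sqrt(1.0 + abs(y) ** 2)

    R1 = np.eye(N, dtype=complex)
    R1[i, i] = R1[j, j] = lam_x
    R1[i, j], R1[j, i] = lam_x * np.conj(x), -lam_x * x   # Eq. (5)

    R2 = np.eye(N, dtype=complex)
    R2[i, i] = R2[j, j] = lam_y
    R2[i, j], R2[j, i] = lam_y * np.conj(y), -lam_y * y   # Eq. (6)
    return R1, R2
```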
TABLE I
THE N-AJSVD ALGORITHM

Input: A_k, k = 1, ..., K (matrices to be jointly diagonalized)
1: Initialize U, V ∈ C^{N×N} as unitary matrices.
2: A_k ← U A_k V^H for k = 1, ..., K.
3: Repeat
4:   for i = 1, ..., N − 1 do
5:     for j = i + 1, ..., N do
6:       Compute parameters x and y based on (10), then construct unitary matrices R_1 and R_2 based on (5) and (6), respectively.
7:       U ← R_1 U.
8:       V ← R_2 V.
9:       A_k ← R_1 A_k R_2^H for k = 1, ..., K.
10:    end for
11:  end for
12: Until convergence
Output: U, V, and A_k for k = 1, ..., K.
Based on (8) and (10), we have established a new AJSVD algorithm, called the N-AJSVD algorithm, which is outlined in Table I. Table I follows the cyclic-by-row Jacobi-like procedure. In line 1, U and V are initialized either with good initial guesses or with the identity matrix. Performing the update of lines 5–10 once for all pairs (i, j) with j > i is called a sweep.
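As a rough companion to Table I, the sketch below wires the per-pair routine `givens_like_update` (sketched after (10)) into the cyclic-by-row sweep. The stopping rule based on the change in off-diagonal energy is only one possible convergence test and is our own choice.

```python
import numpy as np

def n_ajsvd(A_list, n_sweeps=20, tol=1e-12):
    """Cyclic-by-row Jacobi-like procedure of Table I (illustrative sketch)."""
    N = A_list[0].shape[0]
    U = np.eye(N, dtype=complex)              # line 1: identity initialization
    V = np.eye(N, dtype=complex)
    A_list = [A.astype(complex).copy() for A in A_list]

    def off_energy(mats):
        # Sum of squared off-diagonal Frobenius norms, cf. the cost (2)
        return sum(np.linalg.norm(A - np.diag(np.diag(A))) ** 2 for A in mats)

    for _ in range(n_sweeps):
        before = off_energy(A_list)
        for i in range(N - 1):                # lines 4-11: one sweep
            for j in range(i + 1, N):
                R1, R2 = givens_like_update(A_list, i, j)
                U = R1 @ U                    # line 7
                V = R2 @ V                    # line 8
                A_list = [R1 @ A @ R2.conj().T for A in A_list]   # line 9
        if abs(before - off_energy(A_list)) < tol:
            break
    return U, V, A_list
```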
Fig. 1. PI1 values of the N-AJSVD/Givens algorithms. In gray, the 100 individual realizations are plotted and in black the mean over these realizations.
Fig. 2. PI2 values of the N-AJSVD/Givens algorithms. In gray, the 100 individual realizations are plotted and in black the mean over these realizations.
III. SIMULATION RESULTS

A. Two Performance Indexes

The first performance index is defined as follows:

PI_1(U, V) = \sum_{k=1}^{K} \frac{ \| ZDiag\{ U A_k V^H \} \|_F^2 }{ \| U A_k V^H \|_F^2 }

where U and V denote the diagonalizing matrices obtained by the proposed algorithm. The second performance index [6] is defined as follows:

PI_2(G_1, G_2) = \frac{1}{2} \{ pi_2(G_1) + pi_2(G_2) \}

where

pi_2(G) = \frac{1}{N(N−1)} \Big\{ \sum_{i=1}^{N} \Big( \sum_{j=1}^{N} \frac{|g_{ij}|^2}{\max_l |g_{il}|^2} − 1 \Big) + \sum_{j=1}^{N} \Big( \sum_{i=1}^{N} \frac{|g_{ij}|^2}{\max_l |g_{lj}|^2} − 1 \Big) \Big\}

and the matrix G = (g_{ij}) is an N × N matrix, called the global matrix.
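A small sketch of the two indexes, written directly from the definitions above; the function names and the vectorized row/column handling in pi_2 are our own choices.

```python
import numpy as np

def pi1(U, V, A_list):
    """PI_1: sum over k of off-diagonal-to-total Frobenius energy ratios."""
    total = 0.0
    for A in A_list:
        B = U @ A @ V.conj().T
        off = B - np.diag(np.diag(B))
        total += np.linalg.norm(off) ** 2 / np.linalg.norm(B) ** 2
    return total

def pi2_single(G):
    """pi_2(G): row- and column-normalized interference index of a global matrix."""
    N = G.shape[0]
    P = np.abs(G) ** 2
    rows = np.sum(P / P.max(axis=1, keepdims=True), axis=1) - 1.0
    cols = np.sum(P / P.max(axis=0, keepdims=True), axis=0) - 1.0
    return (rows.sum() + cols.sum()) / (N * (N - 1))

def pi2(G1, G2):
    """PI_2(G1, G2) = (pi_2(G1) + pi_2(G2)) / 2."""
    return 0.5 * (pi2_single(G1) + pi2_single(G2))
```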
Fig. 3. The average PI_1 values of 100 individual realizations, when N = 15, K = 4–20, ε = 10^{-4}.
B. Algorithm Performances

Simulation 1: In this experiment, we compare the convergence rates of the N-AJSVD algorithm and the Givens algorithm in the real domain (the Givens algorithm is mainly formulated in the real domain). K target matrices are generated as A_k = P D_k Q^H + ε N_k (k = 1, ..., K), where P and Q are randomly generated orthogonal matrices in R^{N×N}, obtained through the SVD of a random matrix with i.i.d. entries drawn from the normal distribution N(0, 1). Furthermore, the entries of the diagonal matrices D_k ∈ R^{N×N} and of the noise matrices N_k ∈ R^{N×N} are also i.i.d., drawn from a zero-mean, unit-variance normal distribution. The constant ε is a real scaling factor that measures the noise contribution. Let G_1 = U P and G_2 = V Q. We consider K = 12, N = 12, and ε = 10^{-4}, and let the initial diagonalizing matrices U and V be the identity matrix.
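The target matrices of Simulation 1 can be generated along the following lines; this is a sketch of the stated setup, with our own function name and random-number handling.

```python
import numpy as np

def make_targets(N, K, eps, rng=None):
    """Generate A_k = P D_k Q^H + eps * N_k as described in Simulation 1 (real case)."""
    rng = np.random.default_rng(0) if rng is None else rng
    # Random orthogonal factors from the SVD of Gaussian matrices
    P, _, _ = np.linalg.svd(rng.standard_normal((N, N)))
    Q, _, _ = np.linalg.svd(rng.standard_normal((N, N)))
    A_list = []
    for _ in range(K):
        D = np.diag(rng.standard_normal(N))        # i.i.d. N(0, 1) diagonal
        Noise = rng.standard_normal((N, N))        # i.i.d. N(0, 1) noise
        A_list.append(P @ D @ Q.T + eps * Noise)   # real case: Q^H = Q^T
    return A_list, P, Q
```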
Figs. 1 and 2 show that the N-AJSVD algorithm converges much faster than the Givens algorithm and is also more stable. Remarkably, the initial condition is far from a diagonalizing solution, which means that the N-AJSVD algorithm remains robust when the approximation does not hold. In all the following simulations, the initial diagonalizing matrices U and V are always the identity matrix.

Simulation 2: We test the accuracy of the N-AJSVD algorithm and the Givens algorithm against the number and size of the target matrices for a fixed noise level ε = 10^{-4}. Using the method of Simulation 1 to generate the sets of target matrices, we consider the following two scenarios: 1) N = 15, K = 4–20, ε = 10^{-4} (Figs. 3 and 4); 2) K = 15, N = 4–20, ε = 10^{-4} (Figs. 5 and 6). From Figs. 3 and 4, we see that the N-AJSVD algorithm clearly outperforms the Givens algorithm for small numbers of target matrices. This result illustrates the advantage of the N-AJSVD algorithm in difficult scenarios where only a small number of target matrices are available [18]. In addition, from Figs. 5 and 6, we note that the N-AJSVD algorithm also outperforms the Givens algorithm for different matrix sizes.
Fig. 4. The average PI_2 values of 100 individual realizations, when N = 15, K = 4–20, ε = 10^{-4}.

Fig. 7. Average PI_2 values of 100 individual realizations against the number of sweeps.
Fig. 5. Average PI_1 values of 100 individual realizations, when K = 15, N = 4–20, ε = 10^{-4}.
Fig. 8. (a) Source signals. (b) Retrieved signals by the N-AJSVD. (c) Retrieved signals by the Givens. (d) Retrieved signals by the JBSS–SOS.
Fig. 6. Average PI_2 values of 100 individual realizations, when K = 15, N = 4–20, ε = 10^{-4}.
Simulation 3: This experiment aims to demonstrate the effectiveness of the N-AJSVD in solving the JBSS of two datasets. The sources used in this simulation are four audio signals. The sources of the second dataset are generated in the same way as in [19]. Two 4 × 4 mixing matrices are generated randomly, with entries drawn from a standard normal distribution. (Here, we have prewhitened the mixed signals [20].) We consider eight target matrices, namely the covariance matrices estimated at different successive time lags of the prewhitened mixed signals. We also compare the proposed algorithm with JBSS–SOS [13] in the case of only two datasets. For JBSS–SOS, we choose the cross cost and consider L = 8. Fig. 7 shows the average PI_2 values of 100 individual realizations against the number of sweeps. We observe that the N-AJSVD and Givens algorithms offer better estimation precision than JBSS–SOS; moreover, the N-AJSVD converges much faster than the Givens and JBSS–SOS. Fig. 8 shows the separation results for the audio signals of the first dataset obtained with the N-AJSVD (b), Givens (c), and JBSS–SOS (d), respectively. Table II shows the correlation between the original signals and the extracted signals, where the correlation is defined as Cov(s_n, ŝ_n)/(σ_{s_n} σ_{ŝ_n}), ŝ_n denotes the estimate of the source s_n, Cov denotes the covariance between two variables, and σ denotes the standard deviation.
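The eight lagged target matrices described above can be estimated roughly as follows. The letter does not spell out whether auto- or cross-covariances are used; the sketch below assumes cross-covariances between the two prewhitened datasets, which fits the asymmetric AJSVD model (different left and right factors). The function name and lag convention are our own.

```python
import numpy as np

def lagged_cross_covariances(X1, X2, n_lags=8):
    """Cross-covariance target matrices at successive time lags (sketch).

    X1, X2: prewhitened mixtures of the two datasets, shape (channels, samples).
    """
    T = X1.shape[1]
    A_list = []
    for tau in range(1, n_lags + 1):
        # Sample cross-covariance between dataset 1 at time t and dataset 2 at t - tau
        A = X1[:, tau:] @ X2[:, :T - tau].conj().T / (T - tau)
        A_list.append(A)
    return A_list
```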
TABLE II
THE CORRELATION BETWEEN ORIGINAL AUDIO SIGNALS AND EXTRACTED AUDIO SIGNALS IN SIMULATION 3 (BOLD FACE MARKS THE BEST CORRELATION)

            Algorithm    s1       s2       s3       s4
Dataset 1   N-AJSVD      0.9998   0.9997   0.9992   0.9986
            Givens       0.9996   0.9997   0.9992   0.9983
            JBSS–SOS     0.9848   0.9596   0.9556   0.9859
Dataset 2   N-AJSVD      0.9997   0.9991   0.9983   0.9997
            Givens       0.9997   0.9982   0.9983   0.9995
            JBSS–SOS     0.9850   0.9552   0.9532   0.9855
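For reference, the correlation reported in Table II is the normalized covariance between each source and its estimate, Cov(s_n, ŝ_n)/(σ_{s_n} σ_{ŝ_n}); a one-function sketch (function name is ours):

```python
import numpy as np

def correlation(s, s_hat):
    """Cov(s, s_hat) / (sigma_s * sigma_s_hat), as reported in Table II."""
    s = s - s.mean()
    s_hat = s_hat - s_hat.mean()
    return np.mean(s * s_hat) / (s.std() * s_hat.std())
```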
We find that the accuracies of the N-AJSVD and Givens algorithms are very similar and better than that of JBSS–SOS (the results of Fig. 8 and Table II are obtained from the first of the 100 individual realizations).

IV. CONCLUSION

We have proposed a Givens-like AJSVD algorithm based on a special parameterization of the updating matrices. Contrary to most existing algorithms, the proposed algorithm is applicable in the complex domain. In order to determine the parameters analytically, we have suggested a useful approximation. Simulation results illustrate the overall good behavior of the N-AJSVD algorithm and show that it has a faster convergence rate and higher accuracy than its competitors. In particular, the N-AJSVD algorithm may be preferable in difficult situations where only a few target matrices are available.
REFERENCES

[1] E. Moreau, "A generalization of joint-diagonalization criteria for source separation," IEEE Trans. Signal Process., vol. 49, no. 3, pp. 530–541, Mar. 2001.
[2] V. Maurandi, E. Moreau, and C. D. Luigi, "Jacobi like algorithm for nonorthogonal joint diagonalization of Hermitian matrices," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Florence, Italy, May 4–9, 2014, pp. 6196–6200.
[3] V. Maurandi and E. Moreau, "A decoupled Jacobi-like algorithm for nonunitary joint diagonalization of complex-valued matrices," IEEE Signal Process. Lett., vol. 21, no. 12, pp. 1453–1456, Dec. 2014.
[4] T. Trainini and E. Moreau, "A coordinate descent algorithm for complex joint diagonalization under Hermitian and transpose congruences," IEEE Trans. Signal Process., vol. 62, no. 19, pp. 4974–4983, Oct. 2014.
[5] P. Tichavský and A. Yeredor, "Fast approximate joint diagonalization incorporating weight matrices," IEEE Trans. Signal Process., vol. 57, no. 3, pp. 878–891, Mar. 2009.
[6] B. Afsari, "Simple LU and QR based non-orthogonal matrix joint diagonalization," in Proc. 6th Int. Conf. Independent Component Anal. Blind Signal Separation, Charleston, SC, USA, Mar. 5–8, 2006, pp. 1–7.
[7] E. M. Fadaili, N. Thirion-Moreau, and E. Moreau, "Nonorthogonal joint diagonalization/zero diagonalization for source separation based on time-frequency distributions," IEEE Trans. Signal Process., vol. 55, no. 5–1, pp. 1673–1687, May 2007.
[8] B. Afsari and P. S. Krishnaprasad, "Some gradient based joint diagonalization methods for ICA," in Proc. 5th Int. Conf. Independent Component Anal. Blind Signal Separation, Granada, Spain, Sep. 22–24, 2004, pp. 437–444.
[9] H.-J. Yu and D.-S. Huang, "Graphical representation for DNA sequences via joint diagonalization of matrix pencil," IEEE J. Biomed. Health Informat., vol. 17, no. 3, pp. 503–511, May 2013.
[10] M. Congedo, R. Phlypo, and D.-T. Pham, "Approximate joint singular value decomposition of an asymmetric rectangular matrix set," IEEE Trans. Signal Process., vol. 59, no. 1, pp. 415–424, Jan. 2011.
[11] H. Sato, "Joint singular value decomposition algorithm based on the Riemannian trust-region method," Japan Soc. Ind. Appl. Math. Lett., vol. 7, pp. 13–16, 2015.
[12] G. Hori, "Comparison of two main approaches to joint SVD," in Proc. 8th Int. Conf. Independent Component Anal. Signal Separation, Paraty, Brazil, Mar. 15–18, 2009, pp. 42–49.
[13] X.-L. Li, T. Adali, and M. Anderson, "Joint blind source separation by generalized joint diagonalization of cumulant matrices," Signal Process., vol. 91, no. 10, pp. 2314–2322, 2011.
[14] M. Congedo, R. Phlypo, and J. Chatel-Goldman, "Orthogonal and non-orthogonal joint blind source separation in the least-squares sense," in Proc. 20th Eur. Signal Process. Conf., Bucharest, Romania, Aug. 2012, pp. 1885–1889.
[15] G.-H. Cheng, S.-M. Li, and E. Moreau, "New Jacobi-like algorithms for non-orthogonal joint diagonalization of Hermitian matrices," Signal Process., vol. 128, pp. 440–448, 2016.
[16] R. Andre, X. Luciani, and E. Moreau, "A coupled joint eigenvalue decomposition algorithm for canonical polyadic decomposition of tensors," in Proc. IEEE Sensor Array Multichannel Signal Process. Workshop, Rio de Janeiro, Brazil, Jul. 10–13, 2016, pp. 1–5.
[17] A. Hjorungnes and D. Gesbert, "Complex-valued matrix differentiation: Techniques and key results," IEEE Trans. Signal Process., vol. 55, no. 6–1, pp. 2740–2746, Jun. 2007.
[18] K. Wang, X.-F. Gong, and Q.-H. Lin, "Complex non-orthogonal joint diagonalization based on LU and LQ decompositions," in Proc. 10th Int. Conf. Latent Variable Anal. Signal Separation, Tel Aviv, Israel, Mar. 12–15, 2012, pp. 50–57.
[19] L. Zou, X. Chen, and Z. Jane Wang, "Underdetermined joint blind source separation for two datasets based on tensor decomposition," IEEE Signal Process. Lett., vol. 23, no. 5, pp. 673–677, May 2016.
[20] S. Haykin, Independent Component Analysis. Cambridge, U.K.: Cambridge Univ. Press, 2001.