2017 IEEE 7th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP)

Block Term Decomposition with Rank Estimation Using Group Sparsity

Xu Han¹·²·⁴, Laurent Albera¹·²·⁴, Amar Kachenoura¹·²·⁴, Huazhong Shu³·⁴, and Lotfi Senhadji¹·²·⁴

¹ INSERM, U 1099, 35402 Rennes Cedex, France
² LTSI, Université de Rennes 1, Campus de Beaulieu, 35042 Rennes Cedex, France
³ LIST, Southeast University, 2 Sipailou, 210096 Nanjing, China
⁴ Centre de Recherche en Information Biomédicale Sino-Français (CRIBs), Rennes, France
Email: [email protected], {laurent.albera, amar.kachenoura, lotfi.senhadji}@univ-rennes1.fr, [email protected]

Abstract—In this paper, we propose a new rank-(L, L, 1) Block Term Decomposition (BTD) method. Contrary to classical techniques, the proposed method also estimates the number of terms and the rank (L, L, 1) of each term, starting from overestimated initializations of both. This is achieved by exploiting the Group Sparsity of the Loading (GSL) matrices. Numerical experiments on noisy tensors show the good behavior of GSL-BTD and its robustness to noise in comparison with classical methods, and experiments on epileptic signals confirm its efficiency in practical contexts.

I. INTRODUCTION

The Block Term Decomposition (BTD) was proposed several years ago [1] and has shown its advantages in blind source separation [2, 3]. The Alternating Least Squares (ALS) optimization strategy can be used to compute the BTD [4]. The Enhanced Line Search (ELS) procedure [5, 6], with its sophisticated extrapolation scheme, can be used as well. Non-linear Least Squares (NLS) approaches have also been proposed [7]; they appear to be more efficient, particularly when there is some degree of collinearity between factors [7]. All these algorithms assume that the number R of terms of the BTD model and the rank of each term are known. Unfortunately, in practice, such information is not available, especially for low Signal-to-Noise Ratio (SNR) values, which may lead to the well-known overfactoring problem. In order to overcome this drawback, we propose a new rank-(L, L, 1) BTD method which estimates R and L before the computation of the loading matrices. This estimation is based on the Group Sparsity property of the Loading (GSL) matrices when R and L are initialized with overestimated values, and on the relationship that can be established between the rank-(L, L, 1) BTD and Canonical Polyadic Decomposition (CPD) models. Regarding the computation of the loading matrices, the developed approach relies on the Alternating Direction Method of Multipliers (ADMM) [8]. Interestingly, the GSL-BTD method presented in this manuscript gives better results than classical BTD algorithms [4, 7] for low SNR values, even when R and L are initialized with the true values. This performance comparison is carried out through numerical experiments with simulated noisy tensors, which also illustrate the ability of GSL-BTD to estimate all the parameters of the BTD model and its robustness to noise. In addition, experiments on epileptic signals confirm the efficiency of the GSL-BTD algorithm in practical contexts.
978-1-5386-1251-4/17/$31.00 ©2017 IEEE

II. NOTATIONS AND PRELIMINARIES

Scalars are denoted by lowercase letters, e.g., $x$; vectors by bold lowercase letters, e.g., $\mathbf{x}$; matrices by bold capital letters, e.g., $\mathbf{X}$; and 3-way arrays by bold calligraphic letters, e.g., $\mathcal{X}$. $\mathbf{X}^{\dagger}$ is the pseudo-inverse of the matrix $\mathbf{X}$. The symbols $\circ$, $\otimes$, $\odot$ and $\bar{\odot}$ denote the outer product, Kronecker product, column-wise Khatri-Rao product and matrix-wise Khatri-Rao product operators, respectively. Recall that if $\mathbf{A}=[\mathbf{A}_1,\dots,\mathbf{A}_R]$ and $\mathbf{B}=[\mathbf{B}_1,\dots,\mathbf{B}_R]$ are two partitioned matrices, we have $\mathbf{A}\,\bar{\odot}\,\mathbf{B}=[\mathbf{A}_1\otimes\mathbf{B}_1,\dots,\mathbf{A}_R\otimes\mathbf{B}_R]$. The symbol $\mathbf{X}_{i,:}$ denotes the $i$-th row of the matrix $\mathbf{X}$. The column vectorization of $\mathbf{X}$ is denoted by $\mathrm{vec}(\mathbf{X})$, and the mode-$i$ unfolding matrix of the array $\mathcal{X}$ by $\mathbf{X}^{(i)}$. Let us recall some definitions, starting with the inner product of multi-way arrays:

Definition 1: The inner product of $\mathcal{X}$ and $\mathcal{Y}$ with $(\mathcal{X},\mathcal{Y})\in(\mathbb{R}^{I\times J\times K})^2$ is defined as:

$$\langle\mathcal{X},\mathcal{Y}\rangle=\sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K}\mathcal{X}_{i,j,k}\,\mathcal{Y}_{i,j,k} \quad (1)$$

The matrix Frobenius norm can also be generalized to 3-way arrays:

Definition 2: The Frobenius norm of $\mathcal{X}\in\mathbb{R}^{I\times J\times K}$ is defined by:

$$\|\mathcal{X}\|_F=\sqrt{\sum_{i,j,k}(\mathcal{X}_{i,j,k})^2} \quad (2)$$

Less restrictive than the well-known CPD model, the BTD model is defined as follows (see [1] for an identifiability study):

Definition 3: The rank-$(L,L,1)$ BTD of a tensor $\mathcal{T}\in\mathbb{R}^{I\times J\times K}$ in a sum of $R$ rank-$(L,L,1)$ terms is given by:

$$\mathcal{T}=\sum_{r=1}^{R}\mathbf{E}_r\circ\mathbf{c}_r=\sum_{r=1}^{R}\left(\mathbf{A}_r\mathbf{B}_r^T\right)\circ\mathbf{c}_r \quad (3)$$

where each matrix $\mathbf{E}_r=\mathbf{A}_r\mathbf{B}_r^T\in\mathbb{R}^{I\times J}$ is rank-$L$ with $\mathbf{A}_r\in\mathbb{R}^{I\times L}$, $\mathbf{B}_r\in\mathbb{R}^{J\times L}$, and $\mathbf{c}_r$ is a $K$-dimensional vector.

Fig. 1: Visual representation of the BTD model with terms of rank-$(L,L,1)$.

Let us denote by $\mathbf{A}$, $\mathbf{B}$ and $\mathbf{C}$ the matrices $[\mathbf{A}_1,\dots,\mathbf{A}_R]$, $[\mathbf{B}_1,\dots,\mathbf{B}_R]$ and $[\mathbf{c}_1,\dots,\mathbf{c}_R]$, respectively. Then we have $\mathbf{T}^{(1)}_{I\times JK}=\mathbf{A}\,(\mathbf{C}\,\bar{\odot}\,\mathbf{B})^T$, $\mathbf{T}^{(2)}_{J\times IK}=\mathbf{B}\,(\mathbf{C}\,\bar{\odot}\,\mathbf{A})^T$ and $\mathbf{T}^{(3)}_{K\times IJ}=\mathbf{C}\,\mathbf{E}_{\mathrm{vec}}^T=\bar{\mathbf{C}}\,(\mathbf{B}\odot\mathbf{A})^T$, where $\mathbf{E}_{\mathrm{vec}}=[\mathrm{vec}(\mathbf{E}_1),\dots,\mathrm{vec}(\mathbf{E}_R)]$ and where $\bar{\mathbf{C}}$ is the $K\times(L_1+\dots+L_R)$ replicate matrix of $\mathbf{C}$ given by $[\mathbf{c}_1,\dots,\mathbf{c}_1,\dots,\mathbf{c}_R,\dots,\mathbf{c}_R]$. Eventually, let us recall the definition of the matrix mixed-norm:

Definition 4: The mixed-norm of $\mathbf{X}\in\mathbb{R}^{m\times n}$ is given by:

$$\|\mathbf{X}\|_{2,1}=\sum_{i=1}^{m}\sqrt{\sum_{j=1}^{n}(\mathbf{X}_{i,j})^2}=\mathrm{Tr}[\mathbf{X}^T\boldsymbol{\Phi}\mathbf{X}]$$

where $\mathrm{Tr}[\cdot]$ is the trace operator and where $\boldsymbol{\Phi}$ is a diagonal matrix with $\boldsymbol{\Phi}_{i,i}=1/\sqrt{\sum_{j=1}^{n}(\mathbf{X}_{i,j})^2}$.
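As a quick illustration (our own sketch, not part of the paper), Definition 4 and its diagonal weighting matrix $\boldsymbol{\Phi}$ can be written in a few lines of NumPy; the `eps` guard for all-zero rows is an assumption we add for numerical safety:

```python
import numpy as np

def mixed_norm(X):
    # ||X||_{2,1}: sum of the Euclidean norms of the rows of X (Definition 4)
    return np.sum(np.sqrt(np.sum(X**2, axis=1)))

def phi_matrix(X, eps=1e-12):
    # Diagonal matrix with Phi[i, i] = 1 / ||X[i, :]||_2;
    # eps guards against division by zero for all-zero rows (our addition).
    row_norms = np.sqrt(np.sum(X**2, axis=1))
    return np.diag(1.0 / np.maximum(row_norms, eps))

X = np.arange(6, dtype=float).reshape(3, 2) + 1.0
Phi = phi_matrix(X)
# Tr[X^T Phi X] equals the mixed-norm, as stated in Definition 4
assert np.isclose(np.trace(X.T @ Phi @ X), mixed_norm(X))
```

The identity $\mathrm{Tr}[\mathbf{X}^T\boldsymbol{\Phi}\mathbf{X}]=\sum_i\boldsymbol{\Phi}_{i,i}\|\mathbf{X}_{i,:}\|_2^2=\|\mathbf{X}\|_{2,1}$ is what makes the trace form in (5) below equivalent to the mixed-norm penalty.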

III. THE GSL-BTD METHOD

In this section, we assume that the loading matrices of the CPD and those of the BTD of the three-way array $\mathcal{T}$ are full column rank. In addition, we consider that $\mathcal{T}$ is low rank, i.e., its dimensions are much greater than its rank. The whole procedure allowing us to compute the BTD of a given three-way array $\mathcal{T}$ can then be described by means of the three consecutive steps given below.

A. Estimation of the number R of terms

Recall that the rank of $\mathcal{T}$ is defined as the number of terms of the CPD of $\mathcal{T}$. Under our assumptions, we can prove that this rank is equal to $R$, so we can find $R$ by applying the robust CPD method, named GSL-CPD, that we proposed at EUSIPCO'17 [9]. It consists in computing the rank-$\hat{R}$ CPD approximation of $\mathcal{T}$, where $\hat{R}$ is an overestimate of $R$. Consequently, the estimated loading matrices should be group sparse. We then propose to minimize their mixed-norm¹ in order to vanish their overestimated columns:

$$\min_{\mathbf{U},\mathbf{V},\mathbf{W}}\;\|\mathbf{U}\|_{2,1}+\|\mathbf{V}\|_{2,1}+\|\mathbf{W}\|_{2,1}\quad\text{s.t.}\quad\mathcal{T}\approx\mathcal{F}=\sum_{r=1}^{\hat{R}}\mathbf{u}_r\circ\mathbf{v}_r\circ\mathbf{w}_r \quad (4)$$

We solve (4) using the following augmented Lagrangian objective function:

$$\mathcal{L}(\mathbf{U},\mathbf{V},\mathbf{W},\mathcal{Y})=\lambda\left(\mathrm{Tr}[\mathbf{U}^T\boldsymbol{\Phi}\mathbf{U}]+\mathrm{Tr}[\mathbf{V}^T\boldsymbol{\Psi}\mathbf{V}]+\mathrm{Tr}[\mathbf{W}^T\boldsymbol{\Omega}\mathbf{W}]\right)+\langle\mathcal{Y},\mathcal{T}-\mathcal{F}\rangle+\frac{u}{2}\|\mathcal{T}-\mathcal{F}\|_F^2 \quad (5)$$

where $\lambda$ is a penalty parameter, $\mathcal{Y}$ is the multiplier tensor, and $\boldsymbol{\Phi}$, $\boldsymbol{\Psi}$ and $\boldsymbol{\Omega}$ are the diagonal matrices allowing us to compute the mixed-norms of $\mathbf{U}$, $\mathbf{V}$ and $\mathbf{W}$, as described in Definition 4. The minimization of $\mathcal{L}$ is performed using the ADMM: the $(k+1)$-th iterate of $\mathbf{U}$ is computed from the $k$-th iterates of the other variables by vanishing the following derivative:

$$\frac{\partial\mathcal{L}}{\partial\mathbf{U}}=2\lambda\boldsymbol{\Phi}\mathbf{U}+u\,\mathbf{U}(\mathbf{W}\odot\mathbf{V})^T(\mathbf{W}\odot\mathbf{V})-\left(\mathbf{Y}^{(1)}+u\,\mathbf{T}^{(1)}\right)(\mathbf{W}\odot\mathbf{V}) \quad (6)$$

Using the Lyapunov equation, we obtain:

$$\mathrm{vec}(\mathbf{U}_{k+1})=\left[\mathbf{I}\otimes 2\lambda\boldsymbol{\Phi}_k+u_k\left(\mathbf{W}_k\odot\mathbf{V}_k\right)^T\left(\mathbf{W}_k\odot\mathbf{V}_k\right)\otimes\mathbf{I}\right]^{-1}\mathrm{vec}\left[\left(\mathbf{Y}_k^{(1)}+u_k\mathbf{T}^{(1)}\right)\left(\mathbf{W}_k\odot\mathbf{V}_k\right)\right] \quad (7)$$

Similarly, the $(k+1)$-th iterate of $\mathbf{V}$ is given by:

$$\mathrm{vec}(\mathbf{V}_{k+1})=\left[\mathbf{I}\otimes 2\lambda\boldsymbol{\Psi}_k+u_k\left(\mathbf{W}_k\odot\mathbf{U}_{k+1}\right)^T\left(\mathbf{W}_k\odot\mathbf{U}_{k+1}\right)\otimes\mathbf{I}\right]^{-1}\mathrm{vec}\left[\left(\mathbf{Y}_k^{(2)}+u_k\mathbf{T}^{(2)}\right)\left(\mathbf{W}_k\odot\mathbf{U}_{k+1}\right)\right] \quad (8)$$

and the $(k+1)$-th iterate of $\mathbf{W}$ by:

$$\mathrm{vec}(\mathbf{W}_{k+1})=\left[\mathbf{I}\otimes 2\lambda\boldsymbol{\Omega}_k+u_k\left(\mathbf{V}_{k+1}\odot\mathbf{U}_{k+1}\right)^T\left(\mathbf{V}_{k+1}\odot\mathbf{U}_{k+1}\right)\otimes\mathbf{I}\right]^{-1}\mathrm{vec}\left[\left(\mathbf{Y}_k^{(3)}+u_k\mathbf{T}^{(3)}\right)\left(\mathbf{V}_{k+1}\odot\mathbf{U}_{k+1}\right)\right] \quad (9)$$

Eventually, the updating rule of $\mathcal{Y}_{k+1}$ is given by:

$$\mathcal{Y}_{k+1}=\mathcal{Y}_k+u_k(\mathcal{T}-\mathcal{F}_{k+1}) \quad (10)$$

and the update rule for $u_{k+1}$ is $u_{k+1}=\min(\rho u_k,u_{\max})$ with $\rho>1$. After convergence of this iterative procedure, $R$ is computed as the rank of $\mathbf{W}$.

¹ We can derive from [10, Proposition 1] that the mixed-norm is an upper bound of the nuclear norm, which is generally used as a convex surrogate of the rank function.
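As a toy illustration of the rank-selection step (our own sketch, not the authors' code), once the group-sparsity penalty has driven the superfluous columns of $\mathbf{W}$ toward zero, $R$ can be read off as the numerical rank of $\mathbf{W}$, e.g., by thresholding its singular values; the tolerance `tol` is an assumption of this sketch:

```python
import numpy as np

def estimate_num_terms(W, tol=1e-6):
    # After the mixed-norm penalty has annihilated the overestimated
    # columns of W, R is recovered as the numerical rank of W.
    s = np.linalg.svd(W, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

# Toy example: a 10 x 5 loading matrix whose last two columns were
# driven to zero by the penalty (overestimate R_hat = 5, true R = 3).
rng = np.random.default_rng(0)
W = np.hstack([rng.standard_normal((10, 3)), np.zeros((10, 2))])
print(estimate_num_terms(W))  # -> 3
```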

In order to estimate $L$, we set $\hat{R}$ at the value estimated by the iterative procedure described in Section III-A and we compute the rank-$\hat{R}\hat{L}$ CPD approximation of $\mathcal{T}$ by using an overestimate $\hat{L}$ of $L$. Indeed, the BTD of $\mathcal{T}$ can be reformulated as a rank-$RL$ CPD of $\mathcal{T}$ with $\mathbf{A}$, $\mathbf{B}$ and $\bar{\mathbf{C}}$ as loading matrices. Then, by computing the rank-$\hat{R}\hat{L}$ CPD approximation of $\mathcal{T}$, the estimated loading matrices should be group sparse. Consequently, we solve the following minimization problem using the ADMM algorithm:

$$\min_{\hat{\mathbf{A}},\hat{\mathbf{B}},\hat{\mathbf{C}}}\;\mathrm{Tr}[\hat{\mathbf{A}}^T\hat{\boldsymbol{\Phi}}\hat{\mathbf{A}}]+\mathrm{Tr}[\hat{\mathbf{B}}^T\hat{\boldsymbol{\Psi}}\hat{\mathbf{B}}]+\mathrm{Tr}[\hat{\mathbf{C}}^T\hat{\boldsymbol{\Omega}}\hat{\mathbf{C}}]\quad\text{s.t.}\quad\mathcal{T}\approx\mathcal{P}=\sum_{r=1}^{\hat{R}\hat{L}}\hat{\mathbf{a}}_r\circ\hat{\mathbf{b}}_r\circ\hat{\mathbf{c}}_r \quad (11)$$
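To make the reformulation above concrete, here is a small NumPy sketch of ours (the helper names are hypothetical) showing that a rank-$(L,L,1)$ BTD coincides with a rank-$RL$ CPD whose third loading matrix is the replicate matrix $\bar{\mathbf{C}}$:

```python
import numpy as np

def btd_tensor(A, B, C, L):
    # T = sum_r (A_r B_r^T) o c_r, with A_r = A[:, r*L:(r+1)*L], etc.
    I, J, K = A.shape[0], B.shape[0], C.shape[0]
    T = np.zeros((I, J, K))
    for r in range(C.shape[1]):
        Er = A[:, r*L:(r+1)*L] @ B[:, r*L:(r+1)*L].T
        T += Er[:, :, None] * C[:, r][None, None, :]
    return T

def cpd_tensor(A, B, C):
    # T = sum_p a_p o b_p o c_p (rank-RL CPD)
    return np.einsum('ip,jp,kp->ijk', A, B, C)

rng = np.random.default_rng(1)
I, J, K, R, L = 4, 5, 6, 2, 3
A = rng.standard_normal((I, R * L))
B = rng.standard_normal((J, R * L))
C = rng.standard_normal((K, R))
C_bar = np.repeat(C, L, axis=1)  # replicate matrix: [c_1,...,c_1,...,c_R,...,c_R]
# The BTD and the rank-RL CPD with loading matrices A, B, C_bar coincide.
assert np.allclose(btd_tensor(A, B, C, L), cpd_tensor(A, B, C_bar))
```

This equality is exactly why group sparsity reappears here: when $\hat{R}\hat{L}$ overestimates $RL$, whole groups of columns of the estimated loading matrices should vanish.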


Fig. 2: (a) Averaged value of $\alpha$ and (b) averaged value of $r_e$ for different values of $(\hat{R},\hat{L})$ and SNR = 5 dB. (c) Probability of accurate estimation of $R$ and $L$ using GSL-BTD for SNR = 5 dB.

Fig. 3: (a) Averaged value of $\alpha$ and (b) averaged value of $r_e$ for different values of $(\hat{R},\hat{L})$ and SNR = 0 dB. (c) Probability of accurate estimation of $R$ and $L$ using GSL-BTD for SNR = 0 dB.

where $\hat{\mathbf{C}}$ is constrained to have the same replicate structure as $\bar{\mathbf{C}}$ and where $\hat{\boldsymbol{\Phi}}$, $\hat{\boldsymbol{\Psi}}$ and $\hat{\boldsymbol{\Omega}}$ are the diagonal matrices allowing us to compute the mixed-norms of $\hat{\mathbf{A}}$, $\hat{\mathbf{B}}$ and $\hat{\mathbf{C}}$, respectively, as described in Definition 4. The arrays $\hat{\mathbf{A}}_{k+1}$, $\hat{\mathbf{B}}_{k+1}$, $\hat{\mathbf{C}}_{k+1}$ and $\mathcal{K}_{k+1}$ are computed as follows:

$$\mathrm{vec}(\hat{\mathbf{A}}_{k+1})=\left[\mathbf{I}\otimes 2\gamma\hat{\boldsymbol{\Phi}}_k+u_k(\hat{\mathbf{C}}_k\odot\hat{\mathbf{B}}_k)^T(\hat{\mathbf{C}}_k\odot\hat{\mathbf{B}}_k)\otimes\mathbf{I}\right]^{-1}\mathrm{vec}\left[\left(\mathbf{K}_k^{(1)}+u_k\mathbf{T}^{(1)}\right)(\hat{\mathbf{C}}_k\odot\hat{\mathbf{B}}_k)\right] \quad (12)$$

$$\mathrm{vec}(\hat{\mathbf{B}}_{k+1})=\left[\mathbf{I}\otimes 2\gamma\hat{\boldsymbol{\Psi}}_k+u_k(\hat{\mathbf{C}}_k\odot\hat{\mathbf{A}}_{k+1})^T(\hat{\mathbf{C}}_k\odot\hat{\mathbf{A}}_{k+1})\otimes\mathbf{I}\right]^{-1}\mathrm{vec}\left[\left(\mathbf{K}_k^{(2)}+u_k\mathbf{T}^{(2)}\right)(\hat{\mathbf{C}}_k\odot\hat{\mathbf{A}}_{k+1})\right] \quad (13)$$

$$\mathrm{vec}(\hat{\mathbf{C}}_{k+1})=\left[\mathbf{I}\otimes 2\gamma\hat{\boldsymbol{\Omega}}_k+u_k(\hat{\mathbf{B}}_{k+1}\odot\hat{\mathbf{A}}_{k+1})^T(\hat{\mathbf{B}}_{k+1}\odot\hat{\mathbf{A}}_{k+1})\otimes\mathbf{I}\right]^{-1}\mathrm{vec}\left[\left(\mathbf{K}_k^{(3)}+u_k\mathbf{T}^{(3)}\right)(\hat{\mathbf{B}}_{k+1}\odot\hat{\mathbf{A}}_{k+1})\right] \quad (14)$$

$$\mathcal{K}_{k+1}=\mathcal{K}_k+u_k(\mathcal{T}-\mathcal{P}_{k+1}) \quad (15)$$

Note that $\gamma$ controls the weight of the three mixed-norms with respect to the constraint. The update rule for $u_{k+1}$ is $u_{k+1}=\min(\rho u_k,u_{\max})$ with $\rho>1$. The iterative procedure is stopped when the rank of $\hat{\mathbf{C}}$ divided by $R$ is an integer.

C. Estimation of the loading matrices

After the estimation of $R$ and $L$, the loading matrices $\mathbf{A}$, $\mathbf{B}$ and $\mathbf{C}$ can be computed using any BTD method. We propose a new one based on the following minimization with $\beta=0$:

$$\min_{\mathbf{A},\mathbf{B},\mathbf{C}}\;\beta\left(\mathrm{Tr}[\mathbf{A}^T\boldsymbol{\phi}\mathbf{A}]+\mathrm{Tr}[\mathbf{B}^T\boldsymbol{\psi}\mathbf{B}]+\mathrm{Tr}[\mathbf{C}^T\boldsymbol{\omega}\mathbf{C}]\right)\quad\text{s.t.}\quad\mathcal{T}\approx\mathcal{M}=\sum_{r=1}^{R}\left(\mathbf{A}_r\mathbf{B}_r^T\right)\circ\mathbf{c}_r \quad (16)$$

and the use of the corresponding augmented Lagrangian function:

$$\mathcal{L}(\mathbf{A},\mathbf{B},\mathbf{C},\mathcal{Z})=\beta\left(\mathrm{Tr}[\mathbf{A}^T\boldsymbol{\phi}\mathbf{A}]+\mathrm{Tr}[\mathbf{B}^T\boldsymbol{\psi}\mathbf{B}]+\mathrm{Tr}[\mathbf{C}^T\boldsymbol{\omega}\mathbf{C}]\right)+\langle\mathcal{Z},\mathcal{T}-\mathcal{M}\rangle+\frac{u}{2}\|\mathcal{T}-\mathcal{M}\|_F^2 \quad (17)$$

The matrices $\mathbf{A}$, $\mathbf{B}$ and $\mathbf{C}$ are alternately updated using the following formulas:

$$\mathbf{A}_{k+1}=\frac{1}{u_k}\left(\mathbf{Z}_k^{(1)}+u_k\mathbf{T}^{(1)}\right)\left[\left(\mathbf{C}_k\,\bar{\odot}\,\mathbf{B}_k\right)^T\right]^{\dagger} \quad (18)$$

$$\mathbf{B}_{k+1}=\frac{1}{u_k}\left(\mathbf{Z}_k^{(2)}+u_k\mathbf{T}^{(2)}\right)\left[\left(\mathbf{C}_k\,\bar{\odot}\,\mathbf{A}_{k+1}\right)^T\right]^{\dagger} \quad (19)$$

$$\mathbf{C}_{k+1}=\frac{1}{u_k}\left(\mathbf{Z}_k^{(3)}+u_k\mathbf{T}^{(3)}\right)\left[\mathbf{E}_{\mathrm{vec},k+1}^T\right]^{\dagger} \quad (20)$$

and

$$\mathcal{Z}_{k+1}=\mathcal{Z}_k+u_k(\mathcal{T}-\mathcal{M}_{k+1}) \quad (21)$$

IV. EXPERIMENT RESULTS

A. On simulated noisy tensors

Fifty low-rank tensors $\mathcal{M}\in\mathbb{R}^{100\times 100\times 100}$ were generated using the rank-$(5,5,1)$ BTD model with Gaussian loading matrices and $R=5$. The Gaussian distribution was also used

to generate fifty noise tensors $\mathcal{N}$. Then the tensor $\mathcal{T}$ was obtained as follows:

$$\mathcal{T}=\frac{\mathcal{M}}{\|\mathcal{M}\|_F}+\sigma\frac{\mathcal{N}}{\|\mathcal{N}\|_F} \quad (22)$$

where the parameter $\sigma$ controls the SNR defined by $\mathrm{SNR}=-20\log_{10}(\sigma)$. Two SNR values of 5 dB and 0 dB were chosen for our simulations. We compared our method with three classical BTD methods, namely ALS, ELS-ALS and NLS, using TensorLab [11]. The performance was measured using the scale-invariant and permutation-invariant gap between the true loading matrix $\mathbf{C}$ and its estimate $\mathbf{C}^{(e)}$, defined as follows:

$$\alpha=\sum_{n=1}^{R}\min_{(n,k)\in I_n^2} d\left(\mathbf{c}_n,\mathbf{c}_k^{(e)}\right) \quad (23)$$

where $\mathbf{c}_n$ is the $n$-th column of $\mathbf{C}$, $\mathbf{c}_k^{(e)}$ is the $k$-th column of $\mathbf{C}^{(e)}$, and $I_n^2$ is defined recursively from $I_1^2=\{1,\dots,R\}\times\{1,\dots,R_{\mathrm{est}}\}$ and $I_{n+1}^2=I_n^2\setminus J_n^2$, where $J_n^2=\mathrm{argmin}_{(n,k)\in I_n^2}\, d(\mathbf{c}_n,\mathbf{c}_k^{(e)})$. The measure $d$ between two vectors [12] is given by:

$$d(\mathbf{c}_n,\mathbf{c}_k)=1-\frac{\left(\mathbf{c}_n^T\mathbf{c}_k\right)^2}{\|\mathbf{c}_n\|^2\,\|\mathbf{c}_k\|^2} \quad (24)$$

The performance was also measured using the relative error defined by $r_e=\|\mathcal{M}-\mathcal{M}^{(e)}\|_F/\|\mathcal{M}\|_F$.

Figure 2 shows the influence of using overestimates of $R$ and $L$ for an SNR value of 5 dB. All fifty independent Monte Carlo trials are stopped when they satisfy the convergence criterion. From Fig. 2(a) and Fig. 2(b), we can see that, whatever the overestimates of $R$ and $L$ are, the ALS-BTD, ELS-BTD and NLS-BTD methods are outperformed by GSL-BTD, which succeeds in perfectly identifying $R$ and $L$, as shown in Fig. 2(c). The same result is obtained for an SNR value of 0 dB, as shown in Fig. 3.

B. On simulated epileptic EEG data

Fig. 4: (a) Spatial ($S_1$), frequency ($F_{1a}$, $F_{1b}$, $F_{1c}$, $F_{1d}$) and temporal ($T_{1a}$, $T_{1b}$, $T_{1c}$, $T_{1d}$) components of one term of the BTD of $\mathcal{X}$ and (b) the corresponding time $\times$ frequency matrix.

Simulated ElectroEncephaloGraphy (EEG) data were stored in a $(500\times 21)$ matrix $\mathbf{E}$, representing 2 s of recordings with a sampling frequency of 250 Hz. These EEG data

correspond to the scalp activity due to one epileptic dipole, with a moment of linearly decreasing frequency from 8 to 4 Hz, located at coordinates² $(x,y,z)=(-0.5,0,0.1)$ with orientation $(1,0,0)$. Next, muscle artefacts, stored in a matrix $\mathbf{N}$, were imposed on these clean EEG data using an SNR value of 0 dB, leading to the noisy EEG matrix $\mathbf{X}=\mathbf{E}+\mathbf{N}$. Then a three-way array $\mathcal{X}$ was built by computing the wavelet transform of each column of $\mathbf{X}$, leading to a tensor of size $(30\times 500\times 21)$. We applied GSL-BTD with $(\hat{R},\hat{L})=(4,4)$ and found $(R,L)=(3,4)$. Fig. 4 shows the factors associated with one term of the BTD of $\mathcal{X}$ computed using GSL-BTD. In Fig. 4(a), we can see that the active region with spatial signature $S_1$ is close to the original location $(-0.5,0,0.1)$. Due to the indeterminacy of the factors, the $F_1$ and $T_1$ factors can be linear combinations of the true signals, but the time $\times$ frequency matrix displayed in Fig. 4(b) is unique and shows the time-frequency signature of the epileptic source (8 to 4 Hz). This shows that GSL-BTD can be used to localize epileptic sources from EEG data.

V. CONCLUSION

In this paper, we proposed a new block term decomposition method, namely GSL-BTD, able to estimate the underlying ranks as well as the loading matrices. The experiments on noisy tensors show that our method is more robust than classical BTD algorithms, and the experiments on simulated EEG data show the ability of GSL-BTD to solve practical problems such as those arising in biomedical engineering. Forthcoming work will include the comparison of GSL-BTD with GSL-CPD [9] for the localization of epileptic sources [13, 14].

ACKNOWLEDGMENT

The authors would like to thank Dr. Borbala Hunyadi for sharing the simulated EEG data.

² $(x,y,z)$ indicate the left ear to right ear, posterior to anterior, and upward (through the Cz electrode) directions, respectively.


REFERENCES

[1] L. De Lathauwer, "Decompositions of a higher-order tensor in block terms—Part II: Definitions and uniqueness," SIAM Journal on Matrix Analysis and Applications, vol. 30, no. 3, pp. 1033–1066, 2008.
[2] L. N. Ribeiro, A. L. de Almeida, and V. Zarzoso, "Enhanced block term decomposition for atrial activity extraction in atrial fibrillation ECG," in Sensor Array and Multichannel Signal Processing Workshop (SAM). IEEE, 2016, pp. 1–5.
[3] L. De Lathauwer, "Blind separation of exponential polynomials and the decomposition of a tensor in rank-(Lr, Lr, 1) terms," SIAM Journal on Matrix Analysis and Applications, vol. 32, no. 4, pp. 1451–1474, 2011.
[4] L. De Lathauwer and D. Nion, "Decompositions of a higher-order tensor in block terms—Part III: Alternating least squares algorithms," SIAM Journal on Matrix Analysis and Applications, vol. 30, no. 3, pp. 1067–1083, 2008.
[5] M. Rajih, P. Comon, and R. A. Harshman, "Enhanced line search: A novel method to accelerate PARAFAC," SIAM Journal on Matrix Analysis and Applications, vol. 30, no. 3, pp. 1128–1147, 2008.
[6] A. Karfoul, L. Albera, and L. De Lathauwer, "Iterative methods for the canonical decomposition of multi-way arrays: Application to blind underdetermined mixture identification," Signal Processing, vol. 91, no. 8, pp. 1789–1802, 2011.
[7] L. Sorber, M. Van Barel, and L. De Lathauwer, "Optimization-based algorithms for tensor decompositions: Canonical polyadic decomposition, decomposition in rank-(Lr, Lr, 1) terms, and a new generalization," SIAM Journal on Optimization, vol. 23, no. 2, pp. 695–720, 2013.
[8] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, "Distributed optimization and statistical learning via the alternating direction method of multipliers," Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1–122, 2011.
[9] X. Han, L. Albera, A. Kachenoura, L. Senhadji, and H. Shu, "Low rank canonical polyadic decomposition of tensors based on group sparsity," in EUSIPCO'17, XXV European Signal Processing Conference, Kos Island, Greece, August 28–September 1, 2017.
[10] X. Shu, F. Porikli, and N. Ahuja, "Robust orthonormal subspace learning: Efficient recovery of corrupted low-rank matrices," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 3874–3881.
[11] N. Vervliet, O. Debals, L. Sorber, M. Van Barel, and L. De Lathauwer, "Tensorlab 3.0," available online, URL: www.tensorlab.net, 2016.
[12] P. Comon, X. Luciani, and A. L. de Almeida, "Tensor decompositions, alternating least squares and other tales," Journal of Chemometrics, vol. 23, no. 7–8, pp. 393–405, 2009.
[13] B. Hunyadi, D. Camps, L. Sorber, W. Van Paesschen, M. De Vos, S. Van Huffel, and L. De Lathauwer, "Block term decomposition for modelling epileptic seizures," EURASIP Journal on Advances in Signal Processing, vol. 139, September 2014.
[14] H. Becker, P. Comon, L. Albera, M. Haardt, and I. Merlet, "Multi-way space–time–wave-vector analysis for EEG source separation," Signal Processing, vol. 92, no. 4, pp. 1021–1031, 2012.
