JACOBI-LIKE ALGORITHMS FOR NONSYMMETRIC JOINT MATRIX DIAGONALIZATION

PETER STROBACH∗

Abstract. We present two algorithms for the exact or approximate joint diagonalization of a set of m ≥ 2 square n × n real nonsymmetric matrices {Xk, k = 1, 2, . . . , m}, with underlying model assumption Xk = AΛkBᵀ + Nk, where {Λk} is a set of diagonal matrices and {Nk} is a set of perturbation matrices. In the first part of this work, we develop an overdetermined Kogbetliantz algorithm for the special case that A and B are both orthonormal matrices. In the second part, we extend this concept to the general case that A and B are both n × n nonorthogonal and nonsymmetric unit-determinant matrices. In all cases, the algorithms jointly diagonalize the given matrix set so that {Λ̂k = Â⁻¹XkB̂⁻ᵀ} is a sequence of nearly diagonal matrices with offdiagonal elements {off(Λ̂k)} minimized by some Jacobi-like sweeping. We obtain a perfect diagonalization of the given matrix set in cases of vanishing perturbation {Nk} and an approximate joint diagonalization in the more general cases of nonvanishing perturbation or superimposed noise. Approximate joint diagonalization methods are of practical interest in areas such as Array Processing, Direction-of-Arrival Estimation (DOA), Independent Component Analysis (ICA), and System Identification.

Key words. Joint Diagonalization, Jacobi Rotations, Kogbetliantz Algorithm, Generalized SVD, Array Processing.

1. Introduction. The roots of joint diagonalization (JD) can be traced back to an early problem in array processing, namely the estimation of a transfer matrix A from a given set of covariance (and hence symmetric) matrices {Φk}, where A is related to the Φk's by the following linear array data model:

\[
\mathbf{\Phi}_k = \mathbf{A}\mathbf{\Lambda}_k\mathbf{A}^{T} + \mathbf{N}_k \; ; \quad k = 1, 2, \ldots, m .
\tag{1.1}
\]

Here {Λk} is a sequence of diagonal matrices of unknown source powers and {Nk} represents perturbations such as superimposed noise. JD methods construct an estimate Â⁻¹ in the sense that the sequence of Φk's is maximally diagonalized via the transformation

\[
\hat{\mathbf{\Lambda}}_k = \hat{\mathbf{A}}^{-1}\mathbf{\Phi}_k\hat{\mathbf{A}}^{-T} \; ; \quad k = 1, 2, \ldots, m .
\tag{1.2}
\]

Early JD methods such as [1] and [2] assume that A is an orthonormal matrix. Hence the joint diagonalization process (1.2) can be implemented in terms of many local transformations, where Jacobi matrices

\[
\mathbf{G} = \begin{bmatrix} c & -s \\ s & c \end{bmatrix}
\tag{1.3}
\]

with c = cos(ϕ) and s = sin(ϕ) are applied from both sides to locally minimize the offdiagonal elements in a given sequence of Φk's. Recall that in classical Jacobi [3], we diagonalize only a single matrix. In this case, we can perfectly annihilate the core offdiagonal elements, and hence the necessary Jacobi angle ϕ is the solution of an exactly determined problem. In the case of joint diagonalization of more than one matrix, the calculation of the Jacobi angle ϕ becomes an overdetermined problem and can be solved, for instance, in the sense that the sum of the squares of the core offdiagonal elements is minimized.

∗AST-Consulting Inc., Bahnsteig 6, 94133 Röhrnbach, Germany (www.AST-Consulting.net, peter [email protected]).
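As a point of reference for the overdetermined case, the exactly determined classical Jacobi step for a single symmetric matrix can be sketched numerically as follows (a minimal illustration, not part of the paper's algorithms; the helper name `jacobi_angle` is ours):

```python
import math

import numpy as np


def jacobi_angle(M):
    """Rotation angle phi that annihilates the off-diagonal of the symmetric
    2x2 matrix M under the congruence G^T M G, with G as in (1.3)."""
    a, e, b = M[0, 0], M[0, 1], M[1, 1]
    # Zeroing (G^T M G)_{12} = e*(c^2 - s^2) + (b - a)*c*s leads to
    # tan(2*phi) = 2*e / (a - b).
    return 0.5 * math.atan2(2.0 * e, a - b)


M = np.array([[2.0, 1.0], [1.0, 3.0]])
phi = jacobi_angle(M)
c, s = math.cos(phi), math.sin(phi)
G = np.array([[c, -s], [s, c]])
M_rot = G.T @ M @ G  # off-diagonal elements are now (numerically) zero
```

With several matrices sharing one rotation, no single angle annihilates all off-diagonal elements simultaneously, which motivates the least-squares criterion described above.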


This means that ϕ is designed to map as much energy as possible from the offdiagonal into the main diagonal elements. JD based on local orthonormal “compression” transformations using G as elementary “compressor” yields optimal results only in cases where the A in model (1.1) is strictly orthonormal. If this condition is violated, we must turn to nonorthogonal compression using nonorthogonal elementary 2 × 2 compressor matrices. We should construct these 2 × 2 nonorthogonal compressor matrices so that as many useful properties of the classical G as possible are preserved. Notice that G has a constant-along-the-main-diagonal property, and its elements satisfy both a unit column norm and a unit determinant constraint. We must drop at least one of these two constraints to obtain a second free parameter, as required for elementary 2 × 2 nonorthogonal compression. There are apparently two choices. For instance, we can begin by dropping the unit determinant constraint and parameterizing the compressor in two independent rotation angles ϕ1 and ϕ2. This yields the following Zimmermann compressor:

\[
\mathbf{Z} = \begin{bmatrix} c_1 & -s_2 \\ s_1 & c_2 \end{bmatrix} ,
\tag{1.4}
\]

where c1 = cos(ϕ1), s1 = sin(ϕ1) and c2 = cos(ϕ2), s2 = sin(ϕ2). This compressor preserves the unit column norm property and is hence perfectly balanced. It first appeared in a classical work [4] on nonorthogonal Jacobi methods for the generalized symmetric eigenproblem, where a pair of two real symmetric matrices is jointly diagonalized. The method can be extended to the case of approximate joint diagonalization of more than two matrices [5]. We continue by dropping the unit column norm constraint while preserving the unit determinant constraint. In this case, we can preserve the constant-along-the-main-diagonal property as well. Then the compressor attains the form:

\[
\mathbf{S} = \begin{bmatrix} \gamma & \beta \\ \alpha & \gamma \end{bmatrix} \; ; \quad \gamma^2 - \alpha\beta = \sigma \; ; \quad \sigma = \pm 1 ,
\tag{1.5}
\]

where σ is case dependent and can attain values of +1 or −1. Strictly speaking, only the unit magnitude of the determinant is preserved.
The constant-along-main-diagonal property of S is a key feature that makes this compressor the perfect choice for nonsymmetric nonorthogonal JD. In nonsymmetric JD, it is assumed that the data can be modeled as follows:

\[
\mathbf{X}_k = \mathbf{A}\mathbf{\Lambda}_k\mathbf{B}^{T} + \mathbf{N}_k \; ; \quad k = 1, 2, \ldots, m .
\tag{1.6}
\]

Here, again, all matrices are square n × n and we distinguish between the orthogonal case, where both A and B are assumed strictly orthonormal, and the nonorthogonal case, where we only assume that A and B are unit-determinant matrices. In the orthogonal problem, we can work from both sides using Jacobi compressors G as in the symmetric case. However, the rotation angles on both sides are now generally unequal. In essence, we must find a generalization of a Kogbetliantz algorithm [6] for the overdetermined orthogonal case of JD. The problem of choosing the two different rotation angles of the left and right elementary compressors for overall maximum compression, as required in the nonsymmetric problem, is not nearly as easy as in the symmetric case.

This paper is organized as follows. In Section 2, we introduce the main framework, notation and data structures as required in the following sections. In Section 3, we develop the overdetermined Kogbetliantz algorithm for the nonsymmetric orthogonal problem. The most general nonsymmetric nonorthogonal JD algorithm is developed in Section 4. Section 5 contains our concluding statement.

2. Main Framework. The algorithms discussed here are based on local compression in a Jacobi-like fashion using either orthonormal G or nonorthogonal unit-determinant S compressor matrices. These compressor matrices act simultaneously from both sides on each matrix in the given sequence {Xk, k = 1, 2, . . . , m}. The overall organization is the known row-cyclic Jacobi addressing scheme, where the accessed row/column index pair {i, j} in each sweep is generated as follows:

\[
i = 1, 2, \ldots, n-1 ,
\tag{2.1}
\]
\[
j = i+1, i+2, \ldots, n .
\tag{2.2}
\]
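The row-cyclic addressing (2.1)-(2.2) can be sketched as follows (an illustrative helper under our naming, not from the paper):

```python
def row_cyclic_pairs(n):
    """All n*(n-1)/2 pivot index pairs {i, j} of one row-cyclic sweep,
    1-based as in (2.1)-(2.2)."""
    return [(i, j) for i in range(1, n) for j in range(i + 1, n + 1)]


pairs = row_cyclic_pairs(4)
# -> [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```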

So each sweep consists of n(n−1)/2 elementary transformations. For a given index pair {i, j}, generated according to (2.1-2), we can display the addressed elements in a matrix Xk as follows:

\[
\mathbf{X}_k = \begin{pmatrix}
 & & x_{1,i} & & x_{1,j} & & \\
 & & \vdots & & \vdots & & \\
x_{i,1} & \cdots & \boxed{x_{i,i}} & \cdots & \boxed{x_{i,j}} & \cdots & x_{i,n} \\
 & & \vdots & & \vdots & & \\
x_{j,1} & \cdots & \boxed{x_{j,i}} & \cdots & \boxed{x_{j,j}} & \cdots & x_{j,n} \\
 & & \vdots & & \vdots & & \\
 & & x_{n,i} & & x_{n,j} & &
\end{pmatrix} .
\tag{2.3}
\]

Each matrix element in (2.3) has a third index k, which we omitted here for ease of notation. The “boxed” quantities constitute the core or restricted data set. The transformations from both sides “intersect” at these boxed elements. We can represent them in a compact form and hereby also display the third index k as follows:

\[
\mathbf{C}_k = \begin{bmatrix} a_k & e_k \\ f_k & b_k \end{bmatrix}
= \begin{bmatrix} x_{i,i,k} & x_{i,j,k} \\ x_{j,i,k} & x_{j,j,k} \end{bmatrix} \; ; \quad k = 1, 2, \ldots, m .
\tag{2.4}
\]

Besides the “restricted” problem constituted by the data set {Ck, k = 1, 2, . . . , m}, we must also discuss the role of the “nonboxed” quantities in (2.3). These nonboxed elements constitute the peripheral data set, which is more compactly represented in terms of the matrices

\[
\mathbf{P}_k = \begin{bmatrix}
x_{i,1,k} & x_{j,1,k} \\
\vdots & \vdots \\
x_{i,i-1,k} & x_{j,i-1,k} \\
x_{i,i+1,k} & x_{j,i+1,k} \\
\vdots & \vdots \\
x_{i,j-1,k} & x_{j,j-1,k} \\
x_{i,j+1,k} & x_{j,j+1,k} \\
\vdots & \vdots \\
x_{i,n,k} & x_{j,n,k}
\end{bmatrix} \; ; \quad k = 1, 2, \ldots, m ,
\tag{2.5}
\]


for the horizontal or left accessed peripheral band and

\[
\bar{\mathbf{P}}_k = \begin{bmatrix}
x_{1,i,k} & x_{1,j,k} \\
\vdots & \vdots \\
x_{i-1,i,k} & x_{i-1,j,k} \\
x_{i+1,i,k} & x_{i+1,j,k} \\
\vdots & \vdots \\
x_{j-1,i,k} & x_{j-1,j,k} \\
x_{j+1,i,k} & x_{j+1,j,k} \\
\vdots & \vdots \\
x_{n,i,k} & x_{n,j,k}
\end{bmatrix} \; ; \quad k = 1, 2, \ldots, m ,
\tag{2.6}
\]

for the vertical or right accessed peripheral band in (2.3). Suppose now that we apply a compressor Gᵀ from the left to a matrix Xk in the set. The effect of this transformation on the peripheral elements is consequently expressed compactly as follows:

\[
\mathbf{P}_k' = \mathbf{P}_k\mathbf{G} .
\tag{2.7}
\]

We could express the corresponding overlined transformations from the right in exactly the same fashion. Observe that these left and right peripheral transformations are completely independent or decoupled. Only the core elements are affected by both the left and right compressors jointly. Let us now study the effect of these transformations on the overall peripheral power. We define the local peripheral power of Pk as follows:

\[
E_{p,k} = \mathrm{trace}(\mathbf{P}_k^{T}\mathbf{P}_k) .
\tag{2.8}
\]

Then it turns out that as a consequence of the orthonormality of G we must have:

\[
E_{p,k}' = \mathrm{trace}(\mathbf{P}_k'^{\,T}\mathbf{P}_k') = E_{p,k} .
\tag{2.9}
\]

Hence orthonormal transformations leave the peripheral power completely unaltered. They are sometimes called lossless transformations. This losslessness is the deeper reason why classical Jacobi algorithms look only at the core data and optimize the rotation angles according to criteria defined on the core or “boxed” offdiagonal elements: we cannot affect or reduce the peripheral power by an orthonormal transformation, whatever rotation angle we may choose. We must mention this here because the situation alters completely as soon as nonorthogonal local compression becomes an issue. It is clear that as a consequence of nonorthogonality, a compressor like the S of (1.5) will generally alter (increase or decrease) the peripheral power. Hence the optimization of S cannot be based solely on criteria defined on the core data only. The peripheral data must be taken into account as well, and we must minimize both the local peripheral and the core offdiagonal power jointly. This makes nonsymmetric nonorthogonal local compression a fairly difficult task. Throughout this work, we can benefit much from the fact that S and G are both unit-determinant, constant-along-main-diagonal matrices. There exists a smooth transition from the general nonorthogonal framework to the more restricted orthogonal case because G is just a special case of S, namely S = G, if we choose:

\[
\gamma = c ,
\tag{2.10}
\]
\[
\alpha = s ,
\tag{2.11}
\]
\[
\beta = -s .
\tag{2.12}
\]

We can hence formulate the core compression problem for the more general nonorthogonal case first and then obtain the special results for the orthogonal case by making transitions like (2.10-12). The core compression problem in the general nonorthogonal case is established as follows:

\[
\mathbf{C}_k' = \begin{bmatrix} a_k' & e_k' \\ f_k' & b_k' \end{bmatrix}
= \mathbf{S}^{T}\mathbf{C}_k\bar{\mathbf{S}}
= \begin{bmatrix} \gamma & \alpha \\ \beta & \gamma \end{bmatrix}
\begin{bmatrix} a_k & e_k \\ f_k & b_k \end{bmatrix}
\begin{bmatrix} \bar{\gamma} & \bar{\beta} \\ \bar{\alpha} & \bar{\gamma} \end{bmatrix} .
\tag{2.13}
\]

Only the transformed core offdiagonal elements e'k and f'k are of interest here. We can express them conveniently as follows:

\[
e_k' = [a_k, e_k, f_k, b_k]
\begin{bmatrix} \gamma & 0 \\ 0 & \gamma \\ \alpha & 0 \\ 0 & \alpha \end{bmatrix}
\begin{bmatrix} \bar{\beta} \\ \bar{\gamma} \end{bmatrix}
\tag{2.14}
\]
\[
\phantom{e_k'} = [a_k, e_k, f_k, b_k]
\begin{bmatrix} \bar{\beta} & 0 \\ \bar{\gamma} & 0 \\ 0 & \bar{\beta} \\ 0 & \bar{\gamma} \end{bmatrix}
\begin{bmatrix} \gamma \\ \alpha \end{bmatrix} ,
\tag{2.15}
\]

and

\[
f_k' = [a_k, e_k, f_k, b_k]
\begin{bmatrix} \beta & 0 \\ 0 & \beta \\ \gamma & 0 \\ 0 & \gamma \end{bmatrix}
\begin{bmatrix} \bar{\gamma} \\ \bar{\alpha} \end{bmatrix}
\tag{2.16}
\]
\[
\phantom{f_k'} = [a_k, e_k, f_k, b_k]
\begin{bmatrix} \bar{\gamma} & 0 \\ \bar{\alpha} & 0 \\ 0 & \bar{\gamma} \\ 0 & \bar{\alpha} \end{bmatrix}
\begin{bmatrix} \beta \\ \gamma \end{bmatrix} .
\tag{2.17}
\]

An overall core data covariance matrix Φ is defined:

\[
\mathbf{\Phi} = \begin{bmatrix}
\phi_1 & \phi_5 & \phi_8 & \phi_{10} \\
\phi_5 & \phi_2 & \phi_6 & \phi_9 \\
\phi_8 & \phi_6 & \phi_3 & \phi_7 \\
\phi_{10} & \phi_9 & \phi_7 & \phi_4
\end{bmatrix}
= \sum_{k=1}^{m}
\begin{bmatrix} a_k \\ e_k \\ f_k \\ b_k \end{bmatrix}
[a_k, e_k, f_k, b_k] .
\tag{2.18}
\]
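The accumulation (2.18) can be sketched numerically as follows (an illustrative helper under our naming, not from the paper):

```python
import numpy as np


def core_covariance(C_list):
    """Overall core data covariance matrix Phi of (2.18), accumulated from
    the 2x2 core matrices C_k of (2.4) with stacking order [a_k, e_k, f_k, b_k]."""
    Phi = np.zeros((4, 4))
    for Ck in C_list:
        v = np.array([Ck[0, 0], Ck[0, 1], Ck[1, 0], Ck[1, 1]])
        Phi += np.outer(v, v)  # rank-one update per matrix in the set
    return Phi
```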

We can now introduce a set of subcompressed quantities {ψ1, ψ2, ψ3} as follows:

\[
\mathbf{\Psi} = \begin{bmatrix} \psi_1 & \psi_3 \\ \psi_3 & \psi_2 \end{bmatrix}
= \begin{bmatrix} \gamma & 0 & \alpha & 0 \\ 0 & \gamma & 0 & \alpha \end{bmatrix}
\mathbf{\Phi}
\begin{bmatrix} \gamma & 0 \\ 0 & \gamma \\ \alpha & 0 \\ 0 & \alpha \end{bmatrix} ,
\tag{2.19}
\]


or equivalently:

\[
\psi_1 = \phi_1\gamma^2 + 2\phi_8\alpha\gamma + \phi_3\alpha^2 ,
\tag{2.20}
\]
\[
\psi_2 = \phi_2\gamma^2 + 2\phi_9\alpha\gamma + \phi_4\alpha^2 ,
\tag{2.21}
\]
\[
\psi_3 = \phi_5\gamma^2 + (\phi_6+\phi_{10})\alpha\gamma + \phi_7\alpha^2 .
\tag{2.22}
\]

In the same fashion, a second set of subcompressed quantities {ω1, ω2, ω3} is defined:

\[
\mathbf{\Omega} = \begin{bmatrix} \omega_1 & \omega_3 \\ \omega_3 & \omega_2 \end{bmatrix}
= \begin{bmatrix} \beta & 0 & \gamma & 0 \\ 0 & \beta & 0 & \gamma \end{bmatrix}
\mathbf{\Phi}
\begin{bmatrix} \beta & 0 \\ 0 & \beta \\ \gamma & 0 \\ 0 & \gamma \end{bmatrix} ,
\tag{2.23}
\]

where

\[
\omega_1 = \phi_1\beta^2 + 2\phi_8\beta\gamma + \phi_3\gamma^2 ,
\tag{2.24}
\]
\[
\omega_2 = \phi_2\beta^2 + 2\phi_9\beta\gamma + \phi_4\gamma^2 ,
\tag{2.25}
\]
\[
\omega_3 = \phi_5\beta^2 + (\phi_6+\phi_{10})\beta\gamma + \phi_7\gamma^2 .
\tag{2.26}
\]

Consequently, it follows from (2.14) and (2.16) that the top-right and bottom-left core offdiagonal powers attain the form:

\[
E' = \sum_{k=1}^{m}(e_k')^2
= [\bar{\beta} \;\; \bar{\gamma}]\,\mathbf{\Psi}\begin{bmatrix} \bar{\beta} \\ \bar{\gamma} \end{bmatrix} ,
\tag{2.27}
\]
\[
F' = \sum_{k=1}^{m}(f_k')^2
= [\bar{\gamma} \;\; \bar{\alpha}]\,\mathbf{\Omega}\begin{bmatrix} \bar{\gamma} \\ \bar{\alpha} \end{bmatrix} .
\tag{2.28}
\]
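The quadratic forms (2.27)-(2.28), built on Ψ of (2.19) and Ω of (2.23), can be checked numerically against the direct transformation (2.13) (a sketch under our naming; the sample compressor entries are arbitrary illustration values):

```python
import numpy as np


def core_offdiag_powers(Phi, gamma, alpha, beta, gamma_b, alpha_b, beta_b):
    """E' and F' of (2.27)-(2.28): quadratic forms in the overlined (right)
    compressor parameters, built on Psi (2.19) and Omega (2.23)."""
    J = np.array([[gamma, 0.0], [0.0, gamma], [alpha, 0.0], [0.0, alpha]])
    K = np.array([[beta, 0.0], [0.0, beta], [gamma, 0.0], [0.0, gamma]])
    Psi = J.T @ Phi @ J    # (2.19)
    Omega = K.T @ Phi @ K  # (2.23)
    bg = np.array([beta_b, gamma_b])
    ga = np.array([gamma_b, alpha_b])
    return bg @ Psi @ bg, ga @ Omega @ ga  # (2.27), (2.28)


# Consistency check against the direct transformation (2.13) for one matrix:
Ck = np.array([[1.0, 2.0], [3.0, 4.0]])
v = np.array([1.0, 2.0, 3.0, 4.0])
Phi = np.outer(v, v)                     # (2.18) with m = 1
S = np.array([[1.1, -0.3], [0.2, 1.1]])  # left compressor (gamma, beta; alpha, gamma)
Sb = np.array([[0.9, 0.4], [0.1, 0.9]])  # right (overlined) compressor
Cp = S.T @ Ck @ Sb
E, F = core_offdiag_powers(Phi, 1.1, 0.2, -0.3, 0.9, 0.1, 0.4)
# E == Cp[0, 1]**2 and F == Cp[1, 0]**2 up to rounding
```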

3. The Overdetermined Kogbetliantz Algorithm for Nonsymmetric Orthogonal JD. Based on this framework, we can now introduce the necessary details for the two-sided orthonormal compressor algorithm. The goal in this algorithm is the minimization of the overall core offdiagonal power Θ = E' + F'. First of all, we apply the transitions

\[
\gamma = c ,
\tag{3.1}
\]
\[
\alpha = s ,
\tag{3.2}
\]
\[
\beta = -s
\tag{3.3}
\]

to (2.27) and (2.28) and hereby obtain the offdiagonal power expressions in terms of the sine and cosine of the right orthonormal compressor Ḡ:

\[
E' = [-\bar{s} \;\; \bar{c}]\,\mathbf{\Psi}\begin{bmatrix} -\bar{s} \\ \bar{c} \end{bmatrix}
= [\bar{s}^2 \;\; \bar{c}\bar{s} \;\; \bar{c}^2]
\begin{bmatrix} \psi_1 \\ -2\psi_3 \\ \psi_2 \end{bmatrix} ,
\tag{3.4}
\]
\[
F' = [\bar{c} \;\; \bar{s}]\,\mathbf{\Omega}\begin{bmatrix} \bar{c} \\ \bar{s} \end{bmatrix}
= [\bar{s}^2 \;\; \bar{c}\bar{s} \;\; \bar{c}^2]
\begin{bmatrix} \omega_2 \\ 2\omega_3 \\ \omega_1 \end{bmatrix} .
\tag{3.5}
\]


Next we apply the transitions (2.10-12) to the ψ and ω expressions of (2.20-22) and (2.24-26) to obtain:

\[
\begin{bmatrix} \psi_1 \\ -2\psi_3 \\ \psi_2 \end{bmatrix}
= \begin{bmatrix}
\phi_3 & 2\phi_8 & \phi_1 \\
-2\phi_7 & -2(\phi_6+\phi_{10}) & -2\phi_5 \\
\phi_4 & 2\phi_9 & \phi_2
\end{bmatrix}
\begin{bmatrix} s^2 \\ cs \\ c^2 \end{bmatrix} ,
\tag{3.6}
\]
\[
\begin{bmatrix} \omega_2 \\ 2\omega_3 \\ \omega_1 \end{bmatrix}
= \begin{bmatrix}
\phi_2 & -2\phi_9 & \phi_4 \\
2\phi_5 & -2(\phi_6+\phi_{10}) & 2\phi_7 \\
\phi_1 & -2\phi_8 & \phi_3
\end{bmatrix}
\begin{bmatrix} s^2 \\ cs \\ c^2 \end{bmatrix} .
\tag{3.7}
\]

Substitute these expressions into (3.4) and (3.5). Since the result is a scalar, transposition yields the following final expression for the overall core offdiagonal power in the case of orthonormal compression:

\[
\Theta(\varphi, \bar{\varphi}) =
[s^2 \;\; cs \;\; c^2]
\begin{bmatrix}
(\phi_2+\phi_3) & 2(\phi_5-\phi_7) & (\phi_1+\phi_4) \\
2(\phi_8-\phi_9) & -4(\phi_6+\phi_{10}) & 2(\phi_9-\phi_8) \\
(\phi_1+\phi_4) & 2(\phi_7-\phi_5) & (\phi_2+\phi_3)
\end{bmatrix}
\begin{bmatrix} \bar{s}^2 \\ \bar{c}\bar{s} \\ \bar{c}^2 \end{bmatrix} .
\tag{3.8}
\]

This Θ is clearly a function of the two Jacobi angles ϕ and ϕ̄ of the left and right G-compressors. Extremal points of Θ can be found at the root locations of the direction derivatives ∂Θ/∂ϕ and ∂Θ/∂ϕ̄. A few remarks seem to be in order before we establish the closed-form expressions for these direction derivatives. First of all, notice that:

\[
\frac{\partial}{\partial\varphi}
\begin{bmatrix} s^2 \\ cs \\ c^2 \end{bmatrix}
= \begin{bmatrix} 2cs \\ c^2 - s^2 \\ -2cs \end{bmatrix}
= \begin{bmatrix} 0 & 2 & 0 \\ -1 & 0 & 1 \\ 0 & -2 & 0 \end{bmatrix}
\begin{bmatrix} s^2 \\ cs \\ c^2 \end{bmatrix} ,
\tag{3.9}
\]

and the same holds for the overlined quantities. Hence the derivative of the sine/cosine parameter vector is just a transformed version of the sine/cosine parameter vector itself. More important, however, is the fact that the transform matrix that appears here is a rank-two matrix. This is the key to a closed-form solution of the Θ minimization problem. After all, it is not difficult to verify that

\[
\frac{\partial\Theta}{\partial\varphi} = 2\,
[s^2 \;\; cs \;\; c^2]
\begin{bmatrix}
\frac{C}{4} & D & -\frac{C}{4} \\
-B & 4A & B \\
-\frac{C}{4} & -D & \frac{C}{4}
\end{bmatrix}
\begin{bmatrix} \bar{s}^2 \\ \bar{c}\bar{s} \\ \bar{c}^2 \end{bmatrix} ,
\tag{3.10}
\]
\[
\frac{\partial\Theta}{\partial\bar{\varphi}} = 2\,
[s^2 \;\; cs \;\; c^2]
\begin{bmatrix}
-A & -B & A \\
D & -C & -D \\
A & B & -A
\end{bmatrix}
\begin{bmatrix} \bar{s}^2 \\ \bar{c}\bar{s} \\ \bar{c}^2 \end{bmatrix} ,
\tag{3.11}
\]

where

\[
A = \phi_5 - \phi_7 ,
\tag{3.12}
\]
\[
B = (\phi_1+\phi_4) - (\phi_2+\phi_3) ,
\tag{3.13}
\]
\[
C = 4(\phi_9 - \phi_8) ,
\tag{3.14}
\]
\[
D = 2(\phi_6 + \phi_{10}) .
\tag{3.15}
\]

Equations (3.10) and (3.11) constitute a pair of bivariate quadratic equations. The system matrices in these forms are low rank and exhibit a nice structure. The root-finding problem for this pair of bivariate quadratic equations can be solved in closed form. For this purpose, introduce the following decompositions:

\[
\begin{bmatrix}
\frac{C}{4} & D & -\frac{C}{4} \\
-B & 4A & B \\
-\frac{C}{4} & -D & \frac{C}{4}
\end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ -\frac{4B}{C} & \frac{16v_2}{C^2} \\ -1 & 0 \end{bmatrix}
\begin{bmatrix} \frac{C}{4} & D & -\frac{C}{4} \\[2pt] 0 & \frac{C w_2}{4} & 0 \end{bmatrix} ,
\tag{3.16}
\]
\[
\begin{bmatrix}
-A & -B & A \\
D & -C & -D \\
A & B & -A
\end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ -\frac{D}{A} & \frac{v_1}{A^2} \\ -1 & 0 \end{bmatrix}
\begin{bmatrix} -A & -B & A \\ 0 & -A w_1 & 0 \end{bmatrix} ,
\tag{3.17}
\]

where

\[
v_1 w_1 = v_2 w_2 = AC + BD .
\tag{3.18}
\]

Introduce two auxiliary variables a and b. Clearly, we must have

\[
\frac{\partial\Theta}{\partial\varphi} = 0
\tag{3.19}
\]

if only

\[
[s^2 \;\; cs \;\; c^2]
\begin{bmatrix} 1 & 0 \\ -\frac{4B}{C} & \frac{16v_2}{C^2} \\ -1 & 0 \end{bmatrix}
= [a \;\; a] ,
\tag{3.20}
\]
\[
\begin{bmatrix} \frac{C}{4} & D & -\frac{C}{4} \\[2pt] 0 & \frac{C w_2}{4} & 0 \end{bmatrix}
\begin{bmatrix} \bar{s}^2 \\ \bar{c}\bar{s} \\ \bar{c}^2 \end{bmatrix}
= \begin{bmatrix} b \\ -b \end{bmatrix} .
\tag{3.21}
\]

Employ the two auxiliary variables a and b again to see that, in the same fashion,

\[
\frac{\partial\Theta}{\partial\bar{\varphi}} = 0
\tag{3.22}
\]

if only

\[
[s^2 \;\; cs \;\; c^2]
\begin{bmatrix} 1 & 0 \\ -\frac{D}{A} & \frac{v_1}{A^2} \\ -1 & 0 \end{bmatrix}
= [a \;\; a] ,
\tag{3.23}
\]
\[
\begin{bmatrix} -A & -B & A \\ 0 & -A w_1 & 0 \end{bmatrix}
\begin{bmatrix} \bar{s}^2 \\ \bar{c}\bar{s} \\ \bar{c}^2 \end{bmatrix}
= \begin{bmatrix} b \\ -b \end{bmatrix} .
\tag{3.24}
\]


Now we postmultiply (3.20) and (3.23) by \([1 \;\; {-1}]^{T}\) and premultiply (3.21) and (3.24) by \([1 \;\; 1]\) to obtain:

\[
[s^2 \;\; cs \;\; c^2]
\begin{bmatrix} 1 \\ -\frac{4B}{C} - \frac{16v_2}{C^2} \\ -1 \end{bmatrix} = 0 ,
\tag{3.25}
\]
\[
[\bar{s}^2 \;\; \bar{c}\bar{s} \;\; \bar{c}^2]
\begin{bmatrix} 1 \\ \frac{4D}{C} + w_2 \\ -1 \end{bmatrix} = 0 ,
\tag{3.26}
\]
\[
[s^2 \;\; cs \;\; c^2]
\begin{bmatrix} 1 \\ -\frac{D}{A} - \frac{v_1}{A^2} \\ -1 \end{bmatrix} = 0 ,
\tag{3.27}
\]
\[
[\bar{s}^2 \;\; \bar{c}\bar{s} \;\; \bar{c}^2]
\begin{bmatrix} 1 \\ \frac{B}{A} + w_1 \\ -1 \end{bmatrix} = 0 .
\tag{3.28}
\]

These are apparently 4 quadratic equations parametrized in v1, w1, v2, w2 with constraints (3.18). We must adjust these parameters so that the coefficient vectors of (3.25) and (3.27) become perfectly aligned. In the same fashion, we adjust these parameters so that the coefficient vectors of (3.26) and (3.28) become perfectly aligned as well. This yields the following two conditional equations for the v and w parameters:

\[
\frac{4B}{C} + \frac{16v_2}{C^2} = \frac{D}{A} + \frac{v_1}{A^2} ,
\tag{3.29}
\]
\[
\frac{4D}{C} + w_2 = \frac{B}{A} + w_1 .
\tag{3.30}
\]

Multiplying (3.29) by A²C² and multiplying (3.30) by AC yields more convenient forms of these conditional equations:

\[
4A^2(BC + 4v_2) = C^2(AD + v_1) ,
\tag{3.31}
\]
\[
A(4D + Cw_2) = C(B + Aw_1) .
\tag{3.32}
\]

Introduce an auxiliary quantity a as follows:

\[
a = AC + BD .
\tag{3.33}
\]

Now recall (3.18) to see that we can express v1 and v2 in terms of w1, w2 and a:

\[
v_1 = \frac{a}{w_1} ,
\tag{3.34}
\]
\[
v_2 = \frac{a}{w_2} .
\tag{3.35}
\]

Hereby, we can eliminate the v1 and v2 parameters in (3.31). Moreover, we can solve (3.30) for w2. This yields:

\[
w_2 = w_1 + \frac{BC - 4AD}{AC} .
\tag{3.36}
\]

We use this expression to finally eliminate the w2 from (3.31). This way, (3.31) turns into a quadratic equation in w1:

\[
ACg\,w_1^2 + (ah + gb)\,w_1 - \frac{C}{A}\,ab = 0 ,
\tag{3.37}
\]


where

\[
b = BC - 4AD ,
\tag{3.38}
\]
\[
g = 4AB - CD ,
\tag{3.39}
\]
\[
h = 16A^2 - C^2
\tag{3.40}
\]

are three more auxiliary quantities. The solutions of (3.37) must be real as long as Θ is extremizable by variation of ϕ and ϕ̄. For a pair of real solutions w1, we could use (3.36) to compute a corresponding pair of solutions w2. We could use these w1 and w2 solutions for computing the corresponding pairs of v1 and v2 parameters via (3.34) and (3.35). These v and w parameters completely determine the coefficients of the quadratics (3.25-28). But since our solution perfectly meets the alignment conditions (3.29) and (3.30), the coefficients of the quadratics (3.25) and (3.27) are identical, and the coefficients of the quadratics (3.26) and (3.28) are identical as well. Hence we have the choice which one of these quadratics we wish to pursue. Clearly, we choose (3.27) for computing the {c, s} parameters and (3.28) for computing the {c̄, s̄} parameters, because these quadratics depend only on w1 and v1, and therefore the explicit quantification of w2 and v2 can be avoided. The two quadratics (3.27) and (3.28) are both of the standard form

\[
s^2 + \rho\,cs - c^2 = 0 ,
\tag{3.41}
\]

where ρ is the only nonfixed coefficient. We introduce the tangent

\[
z = s/c ,
\tag{3.42}
\]

and solve the related quadratic

\[
z^2 + \rho z - 1 = 0 .
\tag{3.43}
\]

This quadratic has only real roots. The minor root is given by

\[
z_{\min} = \frac{2}{\rho + \mathrm{sign}(\rho)\sqrt{\rho^2 + 4}} .
\tag{3.44}
\]

Following elementary trigonometric facts, we can calculate the corresponding sine and cosine parameters according to the standard formulas

\[
c = \frac{1}{\sqrt{1 + z_{\min}^2}} ,
\tag{3.45}
\]
\[
s = c\,z_{\min} .
\tag{3.46}
\]
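The recipe (3.41)-(3.46) admits a direct numerical transcription (a sketch; the function name is ours):

```python
import math


def sine_cosine_from_rho(rho):
    """Diagonal dominant solution {c, s} of s^2 + rho*c*s - c^2 = 0 (3.41),
    via the minor root (3.44) of the tangent quadratic (3.43)."""
    sgn = math.copysign(1.0, rho)  # sign convention of (3.44)
    z_min = 2.0 / (rho + sgn * math.sqrt(rho * rho + 4.0))
    c = 1.0 / math.sqrt(1.0 + z_min * z_min)  # (3.45)
    s = c * z_min                             # (3.46)
    return c, s
```

For any ρ the returned pair satisfies the quadratic exactly (up to rounding) with c ≥ |s|, i.e. the diagonal dominant design discussed next.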

This concept is applied for computing the overlined sines and cosines as well. The choice of zmin maximizes c while minimizing s. We call this a diagonal dominant compressor design. The counterpart would be an antidiagonal dominant compressor design corresponding to the choice of zmax for calculating the sine and cosine parameters. The diagonal dominant design is generally preferred because in this case the compressor matrices approach identity matrices at convergence, as desired. It is also not difficult to demonstrate that both the diagonal and the related antidiagonal dominant design result in exactly the same Θ values and are hence both perfect compressors in the sense of overall core offdiagonal power minimization. The necessary computations for the overdetermined nonsymmetric orthogonal compressor design can be summarized as follows:


• Compute the parameters φ1 - φ10 of the overall core data covariance matrix Φ according to (2.18).
• Compute the parameters A - D according to (3.12-15).
• Compute the parameters a and b, g, h according to (3.33) and (3.38-40), respectively.
• Find the two real roots w1,min and w1,max of quadratic (3.37).
• For each of these two roots do the following:
  – Compute the corresponding v1 according to (3.34).
  – Solve the two quadratics (3.27) and (3.28) for the corresponding {c, s} and {c̄, s̄} compressor parameters. For this purpose, use the concept as outlined in (3.41-46).
• For each of the two solutions w1,min and w1,max of quadratic (3.37), we now have a parameter set {c, s}min, {c̄, s̄}min and {c, s}max, {c̄, s̄}max at our disposition.
• For each of these two parameter sets, calculate the resulting Θ according to (3.8). Choose the parameter set that minimizes Θ.

4. The Nonsymmetric Nonorthogonal Compressor Algorithm. We are now in a position to derive the most general nonsymmetric nonorthogonal compressor algorithm for performing the compression step (2.13) using nonorthogonal unit-determinant compressors S and S̄ acting from left and right onto the given matrix set. The cost function to be minimized is the overall offdiagonal power, given as the sum of the core offdiagonal powers E' and F' of (2.27) and (2.28) plus the peripheral powers E'p and Ē'p corresponding to the transformed horizontal Pk and vertical P̄k peripheral bands of (2.5) and (2.6). This is a demanding optimization problem that cannot be solved as a unit in closed form. We present an alternating optimization procedure that starts with a good initial solution for S. The best initial value of S that we can get here is the orthogonal compressor G of the previous section. As pointed out earlier, we can take advantage of the fact that S and G are both unit-determinant, constant-along-main-diagonal matrices.
Consequently, we set

\[
\mathbf{S} = \begin{bmatrix} \gamma & \beta \\ \alpha & \gamma \end{bmatrix}
= \mathbf{G} = \begin{bmatrix} c & -s \\ s & c \end{bmatrix} ,
\tag{4.1}
\]

and the initial value problem is solved. Given the initial S, the algorithm computes an optimal S̄, silently assuming that the underlying S was already optimal. But since this was not the case, we further optimize S based on the previously optimized S̄. This results in an alternating optimization scheme of the kind:

\[
\mathbf{G} \rightarrow \bar{\mathbf{S}} \rightarrow \mathbf{S} \rightarrow \bar{\mathbf{S}} \rightarrow \mathbf{S} \rightarrow \cdots .
\tag{4.2}
\]

The method converges very rapidly. Two alternating iterations are typically sufficient for convergence. A general observation is that this step of going from orthogonal to nonorthogonal JD further decreases the overall offdiagonal power significantly, indicating that as a consequence of doubling the number of free parameters in the compressor matrices, we will get a much tighter fit of model (1.6) onto a given matrix sequence. We begin with the necessary closed-form expressions for the offdiagonal powers. Much of the work has been done already in Section 2. We established the core offdiagonal power expressions E 0 and F 0 in (2.27) and (2.28) already. These expressions


were obtained from the “error” equations (2.14) and (2.16). There exists a second side of the medal, constituted by the error equations (2.15) and (2.17). Based on these error equations, we can introduce a set of subcompressed quantities {ω̄1, ω̄2, ω̄3} as follows:

\[
\bar{\mathbf{\Omega}} = \begin{bmatrix} \bar{\omega}_1 & \bar{\omega}_3 \\ \bar{\omega}_3 & \bar{\omega}_2 \end{bmatrix}
= \begin{bmatrix} \bar{\beta} & \bar{\gamma} & 0 & 0 \\ 0 & 0 & \bar{\beta} & \bar{\gamma} \end{bmatrix}
\mathbf{\Phi}
\begin{bmatrix} \bar{\beta} & 0 \\ \bar{\gamma} & 0 \\ 0 & \bar{\beta} \\ 0 & \bar{\gamma} \end{bmatrix} ,
\tag{4.3}
\]

or equivalently:

\[
\bar{\omega}_1 = \phi_1\bar{\beta}^2 + 2\phi_5\bar{\beta}\bar{\gamma} + \phi_2\bar{\gamma}^2 ,
\tag{4.4}
\]
\[
\bar{\omega}_2 = \phi_3\bar{\beta}^2 + 2\phi_7\bar{\beta}\bar{\gamma} + \phi_4\bar{\gamma}^2 ,
\tag{4.5}
\]
\[
\bar{\omega}_3 = \phi_8\bar{\beta}^2 + (\phi_6+\phi_{10})\bar{\beta}\bar{\gamma} + \phi_9\bar{\gamma}^2 .
\tag{4.6}
\]

In the same fashion, a second set of subcompressed quantities {ψ̄1, ψ̄2, ψ̄3} is defined:

\[
\bar{\mathbf{\Psi}} = \begin{bmatrix} \bar{\psi}_1 & \bar{\psi}_3 \\ \bar{\psi}_3 & \bar{\psi}_2 \end{bmatrix}
= \begin{bmatrix} \bar{\gamma} & \bar{\alpha} & 0 & 0 \\ 0 & 0 & \bar{\gamma} & \bar{\alpha} \end{bmatrix}
\mathbf{\Phi}
\begin{bmatrix} \bar{\gamma} & 0 \\ \bar{\alpha} & 0 \\ 0 & \bar{\gamma} \\ 0 & \bar{\alpha} \end{bmatrix} ,
\tag{4.7}
\]

where

\[
\bar{\psi}_1 = \phi_1\bar{\gamma}^2 + 2\phi_5\bar{\alpha}\bar{\gamma} + \phi_2\bar{\alpha}^2 ,
\tag{4.8}
\]
\[
\bar{\psi}_2 = \phi_3\bar{\gamma}^2 + 2\phi_7\bar{\alpha}\bar{\gamma} + \phi_4\bar{\alpha}^2 ,
\tag{4.9}
\]
\[
\bar{\psi}_3 = \phi_8\bar{\gamma}^2 + (\phi_6+\phi_{10})\bar{\alpha}\bar{\gamma} + \phi_9\bar{\alpha}^2 .
\tag{4.10}
\]

Based on these overlined subcompressed quantities, the following alternative forms for the top-right and bottom-left core offdiagonal powers can be established:

\[
E' = \sum_{k=1}^{m}(e_k')^2
= [\gamma \;\; \alpha]\,\bar{\mathbf{\Omega}}\begin{bmatrix} \gamma \\ \alpha \end{bmatrix} ,
\tag{4.11}
\]
\[
F' = \sum_{k=1}^{m}(f_k')^2
= [\beta \;\; \gamma]\,\bar{\mathbf{\Psi}}\begin{bmatrix} \beta \\ \gamma \end{bmatrix} .
\tag{4.12}
\]

We can compare (4.11-12) with the previously obtained (2.27-28) expressions for E' and F'. Clearly, these two representations of E' and F' are complementary in the sense that they feature an alternating iteration scheme for the desired compressor parameters: Once the parameters of S are known, we can use them in (2.27-28) for computing the optimized parameters of S̄. Once the S̄ parameters are known, we can in turn use them in (4.11-12) for finding the optimized parameters of S, and the circle is closed. Besides the core offdiagonal powers E' and F', the horizontal and vertical peripheral powers E'p and Ē'p come into play. We discussed this already in Section 2. Peripheral covariances are introduced:

\[
\begin{bmatrix} p_1 & p_3 \\ p_3 & p_2 \end{bmatrix}
= \sum_{k=1}^{m} \mathbf{P}_k^{T}\mathbf{P}_k ,
\tag{4.13}
\]
\[
\begin{bmatrix} \bar{p}_1 & \bar{p}_3 \\ \bar{p}_3 & \bar{p}_2 \end{bmatrix}
= \sum_{k=1}^{m} \bar{\mathbf{P}}_k^{T}\bar{\mathbf{P}}_k ,
\tag{4.14}
\]

where we use the Pk's and the P̄k's of (2.5) and (2.6), respectively. The S transformation from the left affects the Pk's (recall (2.7)) so that the covariances undergo the transformation:

\[
\begin{bmatrix} p_1' & p_3' \\ p_3' & p_2' \end{bmatrix}
= \mathbf{S}^{T}\begin{bmatrix} p_1 & p_3 \\ p_3 & p_2 \end{bmatrix}\mathbf{S} .
\tag{4.15}
\]

The overall peripheral power after transformation is hence given by:

\[
E_p' = p_1' + p_2'
= [\gamma \;\; \alpha]\begin{bmatrix} p_1 & p_3 \\ p_3 & p_2 \end{bmatrix}\begin{bmatrix} \gamma \\ \alpha \end{bmatrix}
+ [\beta \;\; \gamma]\begin{bmatrix} p_1 & p_3 \\ p_3 & p_2 \end{bmatrix}\begin{bmatrix} \beta \\ \gamma \end{bmatrix} .
\tag{4.16}
\]

So E'p is given as the sum of two quadratic forms in {γ, α} and {β, γ}, respectively. It should not be too hard to realize that we can glue these two quadratic forms together in a 3-dimensional joint parameter space {β, γ, α} as follows:

\[
E_p' = [\beta \;\; \gamma \;\; \alpha]
\begin{bmatrix} p_1 & p_3 & 0 \\ p_3 & (p_1+p_2) & p_3 \\ 0 & p_3 & p_2 \end{bmatrix}
\begin{bmatrix} \beta \\ \gamma \\ \alpha \end{bmatrix} .
\tag{4.17}
\]

In the same fashion, we can work in the overlined domain to obtain:

\[
\begin{bmatrix} \bar{p}_1' & \bar{p}_3' \\ \bar{p}_3' & \bar{p}_2' \end{bmatrix}
= \bar{\mathbf{S}}^{T}\begin{bmatrix} \bar{p}_1 & \bar{p}_3 \\ \bar{p}_3 & \bar{p}_2 \end{bmatrix}\bar{\mathbf{S}} ,
\tag{4.18}
\]

and consequently:

\[
\bar{E}_p' = [\bar{\beta} \;\; \bar{\gamma} \;\; \bar{\alpha}]
\begin{bmatrix} \bar{p}_1 & \bar{p}_3 & 0 \\ \bar{p}_3 & (\bar{p}_1+\bar{p}_2) & \bar{p}_3 \\ 0 & \bar{p}_3 & \bar{p}_2 \end{bmatrix}
\begin{bmatrix} \bar{\beta} \\ \bar{\gamma} \\ \bar{\alpha} \end{bmatrix} .
\tag{4.19}
\]
Continuing along this line of thought, we can also represent the overall core offdiagonal power in the 3-D parameter spaces. For instance, notice that the E' and F' powers of (4.11) and (4.12) sum up in the {β, γ, α} space as:

\[
E' + F' = [\beta \;\; \gamma \;\; \alpha]
\begin{bmatrix} \bar{\psi}_1 & \bar{\psi}_3 & 0 \\ \bar{\psi}_3 & (\bar{\psi}_2+\bar{\omega}_1) & \bar{\omega}_3 \\ 0 & \bar{\omega}_3 & \bar{\omega}_2 \end{bmatrix}
\begin{bmatrix} \beta \\ \gamma \\ \alpha \end{bmatrix} .
\tag{4.20}
\]

Alternatively, it follows from (2.27) and (2.28) that we can write:

\[
E' + F' = [\bar{\beta} \;\; \bar{\gamma} \;\; \bar{\alpha}]
\begin{bmatrix} \psi_1 & \psi_3 & 0 \\ \psi_3 & (\psi_2+\omega_1) & \omega_3 \\ 0 & \omega_3 & \omega_2 \end{bmatrix}
\begin{bmatrix} \bar{\beta} \\ \bar{\gamma} \\ \bar{\alpha} \end{bmatrix} .
\tag{4.21}
\]
We can now establish the overall cost functions, as required for the optimization of the S and the S̄ parameters in the sense of overall offdiagonal power minimization. The cost function for the optimization of S is given by:

\[
\Theta = E' + F' + E_p'
= [\beta \;\; \gamma \;\; \alpha]
\begin{bmatrix} u & g & 0 \\ g & v & h \\ 0 & h & w \end{bmatrix}
\begin{bmatrix} \beta \\ \gamma \\ \alpha \end{bmatrix} ,
\tag{4.22}
\]

where we added (4.20) and (4.17), so that

\[
u = \bar{\psi}_1 + p_1 ,
\tag{4.23}
\]
\[
v = \bar{\psi}_2 + \bar{\omega}_1 + p_1 + p_2 ,
\tag{4.24}
\]
\[
w = \bar{\omega}_2 + p_2 ,
\tag{4.25}
\]
\[
g = \bar{\psi}_3 + p_3 ,
\tag{4.26}
\]
\[
h = \bar{\omega}_3 + p_3 .
\tag{4.27}
\]
In the same fashion, we establish the cost function for an optimization of S̄ as the sum of (4.21) and (4.19):

\[
\bar{\Theta} = E' + F' + \bar{E}_p'
= [\bar{\beta} \;\; \bar{\gamma} \;\; \bar{\alpha}]
\begin{bmatrix} \bar{u} & \bar{g} & 0 \\ \bar{g} & \bar{v} & \bar{h} \\ 0 & \bar{h} & \bar{w} \end{bmatrix}
\begin{bmatrix} \bar{\beta} \\ \bar{\gamma} \\ \bar{\alpha} \end{bmatrix} ,
\tag{4.28}
\]

where

\[
\bar{u} = \psi_1 + \bar{p}_1 ,
\tag{4.29}
\]
\[
\bar{v} = \psi_2 + \omega_1 + \bar{p}_1 + \bar{p}_2 ,
\tag{4.30}
\]
\[
\bar{w} = \omega_2 + \bar{p}_2 ,
\tag{4.31}
\]
\[
\bar{g} = \psi_3 + \bar{p}_3 ,
\tag{4.32}
\]
\[
\bar{h} = \omega_3 + \bar{p}_3 .
\tag{4.33}
\]
The minimization of Θ is clearly a constrained optimization problem because of the unit-determinant constraint γ² − αβ = σ according to (1.5). The same holds for the minimization of Θ̄. Notice that

\[
\gamma^2 - \alpha\beta = [\beta \;\; \gamma \;\; \alpha]
\begin{bmatrix} 0 & 0 & -\tfrac{1}{2} \\ 0 & 1 & 0 \\ -\tfrac{1}{2} & 0 & 0 \end{bmatrix}
\begin{bmatrix} \beta \\ \gamma \\ \alpha \end{bmatrix} .
\tag{4.34}
\]

Hence Θ is given as the minor positive generalized eigenvalue of the following generalized symmetric eigenproblem (GSE):

\[
\begin{bmatrix} u & g & 0 \\ g & v & h \\ 0 & h & w \end{bmatrix}
\begin{bmatrix} \beta \\ \gamma \\ \alpha \end{bmatrix}
= \lambda
\begin{bmatrix} 0 & 0 & -\tfrac{1}{2} \\ 0 & 1 & 0 \\ -\tfrac{1}{2} & 0 & 0 \end{bmatrix}
\begin{bmatrix} \beta \\ \gamma \\ \alpha \end{bmatrix} ,
\tag{4.35}
\]

or equivalently:

\[
\begin{bmatrix} u & g & \tfrac{\lambda}{2} \\ g & (v-\lambda) & h \\ \tfrac{\lambda}{2} & h & w \end{bmatrix}
\begin{bmatrix} \beta \\ \gamma \\ \alpha \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} .
\tag{4.36}
\]
Consequently, Θ is given as the minor positive root λmin of the following characteristic cubic:

\[
C(\lambda) = \lambda^3 - v\lambda^2 + 4(gh - uw)\lambda + 4(uvw - h^2u - g^2w) .
\tag{4.37}
\]

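A numerical sketch for extracting λmin from (4.37), using NumPy's general polynomial root finder rather than a dedicated cubic solver (function name is ours; it assumes a positive real root exists, as for a valid GSE (4.35)):

```python
import numpy as np


def lambda_min_of_cubic(u, v, w, g, h):
    """Minor positive real root of the characteristic cubic (4.37)."""
    coeffs = [1.0, -v,
              4.0 * (g * h - u * w),
              4.0 * (u * v * w - h * h * u - g * g * w)]
    roots = np.roots(coeffs)
    real = roots[np.abs(roots.imag) < 1e-9].real
    pos = real[real > 0.0]
    return pos.min()  # assumption: at least one positive real root
```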
In practice, λmin can be computed in closed form or using the algorithm of [7], or variations thereof for computing only λmin without computing the other roots. In


any event, we must be aware that λmin can attain very small values. In the limit case of a perfectly diagonalizable matrix set, we even approach a value of λmin = 0. We can now substitute λmin back into (4.36) to obtain a rank-deficient system of linear equations for the desired compressor parameters α, β and γ, always taking into account the underlying constraint γ² − αβ = σ = ±1. For this purpose, we define an orthonormal transform matrix

\[
\mathbf{H} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & c & s \\ 0 & -s & c \end{bmatrix} ,
\tag{4.38}
\]

where c and s denote cosine and sine of some rotation angle. This rotation angle is determined so that

\[
\mathbf{H}
\begin{bmatrix} u & g & \tfrac{\lambda_{\min}}{2} \\ g & (v-\lambda_{\min}) & h \\ \tfrac{\lambda_{\min}}{2} & h & w \end{bmatrix}
\mathbf{H}^{T} = \mathbf{T} =
\begin{bmatrix} u & t_3 & 0 \\ t_3 & t_1 & t_4 \\ 0 & t_4 & t_2 \end{bmatrix} .
\tag{4.39}
\]

Moreover, we introduce auxiliary variables q1, q2 and q3:

\[
\begin{bmatrix} q_1 \\ q_2 \\ q_3 \end{bmatrix}
= \mathbf{H}\begin{bmatrix} \beta \\ \gamma \\ \alpha \end{bmatrix} .
\tag{4.40}
\]
Hereby, the problem reduces to solving a rank-deficient tridiagonal system of linear equations for q1, q2 and q3:

\[
\mathbf{T}\begin{bmatrix} q_1 \\ q_2 \\ q_3 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} .
\tag{4.41}
\]

Clearly, we can solve the top row for q1 and the bottom row for q3 to obtain:

\[
q_1 = -\frac{t_3}{u}\,q_2 ,
\tag{4.42}
\]
\[
q_3 = -\frac{t_4}{t_2}\,q_2 .
\tag{4.43}
\]
We can now substitute these results into the middle row of (4.41) to obtain the following result for q2:

\[
\frac{t_1 t_2 u - t_3^2 t_2 - t_4^2 u}{u t_2}\; q_2 = 0 .
\tag{4.44}
\]

But since

\[
t_1 t_2 u - t_3^2 t_2 - t_4^2 u = \det(\mathbf{T})
\tag{4.45}
\]

and

\[
\det(\mathbf{T}) = 0
\tag{4.46}
\]
by definition, it turns out that q2 is a completely free variable. It can be determined so that the underlying constraint γ 2 − αβ = σ = ±1 is fulfilled. For this purpose, the


constraint must be “transformed” into the “q-domain” of the intermediate solution. From (4.40) it follows that:

\[
\begin{bmatrix} \beta \\ \gamma \\ \alpha \end{bmatrix}
= \mathbf{H}^{T}\begin{bmatrix} q_1 \\ q_2 \\ q_3 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 \\ 0 & c & -s \\ 0 & s & c \end{bmatrix}
\begin{bmatrix} q_1 \\ q_2 \\ q_3 \end{bmatrix}
= \begin{bmatrix} q_1 \\ cq_2 - sq_3 \\ sq_2 + cq_3 \end{bmatrix} .
\tag{4.47}
\]

Plugging this into the constraint γ² − αβ = σ, we obtain the constraint in the q-domain:

\[
\gamma^2 - \alpha\beta = (cq_2 - sq_3)^2 - (sq_2 + cq_3)\,q_1 = \sigma .
\tag{4.48}
\]
Substituting q1 and q3 by q2 according to (4.42) and (4.43), we obtain from (4.48) the following equation for q2²:

\[
q_2^2 = \sigma z ,
\tag{4.49}
\]

where

\[
z = \frac{u t_2^2}{\,c t_2 (c t_2 + 2 s t_4)\,u + s^2 t_4^2\,u + t_2 t_3 (s t_2 - c t_4)\,} .
\tag{4.50}
\]

Until now, σ was a free ±1 variable. Now we see that we must choose

\[
\sigma = \mathrm{sign}(z) ,
\tag{4.51}
\]

for making (4.49) in any event solvable for q2 by square rooting:

\[
q_2 = \sqrt{\sigma z} .
\tag{4.52}
\]
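Steps (4.39)-(4.52) can be transcribed numerically as follows (a sketch under our naming; it assumes u ≠ 0 and t2 ≠ 0):

```python
import math

import numpy as np


def parameters_from_T(T, c, s):
    """Recover the compressor parameters (beta, gamma, alpha) from the
    transformed system matrix T of (4.39) and the H-rotation (c, s) of (4.38),
    following (4.42)-(4.43) and (4.49)-(4.52)."""
    u, t3, t1, t4, t2 = T[0, 0], T[0, 1], T[1, 1], T[1, 2], T[2, 2]
    z = u * t2**2 / (c * t2 * (c * t2 + 2.0 * s * t4) * u
                     + s**2 * t4**2 * u + t2 * t3 * (s * t2 - c * t4))  # (4.50)
    sigma = math.copysign(1.0, z)                                       # (4.51)
    q2 = math.sqrt(sigma * z)                                           # (4.52)
    q1 = -(t3 / u) * q2                                                 # (4.42)
    q3 = -(t4 / t2) * q2                                                # (4.43)
    H = np.array([[1.0, 0.0, 0.0], [0.0, c, s], [0.0, -s, c]])
    beta, gamma, alpha = H.T @ np.array([q1, q2, q3])                   # (4.47)
    return beta, gamma, alpha, sigma
```

By construction, the returned parameters satisfy the determinant constraint γ² − αβ = σ up to rounding.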
Once q2 has been determined, we obtain q1 and q3 from (4.42) and (4.43). In a final step, the desired compressor parameters α, β and γ are obtained by back-transformation via (4.47). The same procedure is applied for finding the overlined parameters in the related Θ̄ minimization problem. The algorithm for nonsymmetric nonorthogonal compressor design can be summarized as follows:
• Initialize γ = c, α = s and β = −s from the orthogonal compressor design according to (4.1).
• FOR µ = 1, 2, 3, ... iterate:
  – Compute ψ1, ψ2 and ψ3 according to (2.20-22), ω1, ω2 and ω3 according to (2.24-26) and p̄1, p̄2 and p̄3 according to (4.14).
  – Compute ū, v̄, w̄, ḡ, h̄ according to (4.29-33).
  – Solve the GSE of (4.35) using these overlined quantities for the desired parameters ᾱ, β̄ and γ̄. For this purpose, do the following:
    ∗ Find the minor root λmin of the cubic C(λ).
    ∗ Solve the low-rank system of linear equations (4.36) with λ = λmin for the desired ᾱ, β̄ and γ̄ parameters using the concept as outlined in (4.38-52).
  – Compute ψ̄1, ψ̄2 and ψ̄3 according to (4.8-10), ω̄1, ω̄2 and ω̄3 according to (4.4-6) and p1, p2 and p3 according to (4.13).
  – Compute u, v, w, g, h according to (4.23-27).
  – Solve the GSE of (4.35) using these nonoverlined quantities for the desired parameters α, β and γ. For this purpose, do the following:


    ∗ Find the minor root λmin of the corresponding cubic C(λ).
    ∗ Solve the low-rank system of linear equations (4.36) with λ = λmin for the desired α, β and γ parameters using the concept as outlined in (4.38-52).

5. Conclusions. We established a solid theory and algorithms for the exact or approximate joint diagonalization of square nonsymmetric matrix sets by Jacobi-like transformations using either orthogonal or nonorthogonal elementary compressor matrices.

REFERENCES

[1] A. Bunse-Gerstner, R. Byers, and V. Mehrmann, “Numerical methods for simultaneous diagonalization”, SIAM J. Matrix Anal. Appl., Vol. 14, No. 4, pp. 927-949, 1993.
[2] J.-F. Cardoso and A. Souloumiac, “Jacobi angles for simultaneous diagonalization”, SIAM J. Matrix Anal. Appl., Vol. 17, No. 1, pp. 161-164, 1996.
[3] J. Demmel and K. Veselić, “Jacobi's method is more accurate than QR”, SIAM J. Matrix Anal. Appl., Vol. 13, pp. 1204-1245, 1992.
[4] K. Zimmermann, “Zur Konvergenz eines Jacobiverfahrens für gewöhnliche und verallgemeinerte Eigenwertprobleme”, Ph.D. Dissertation Nr. 4305, ETH Zürich, 1969.
[5] P. Strobach, “Constrained optimization of the overdetermined Zimmermann compressor for nonorthogonal joint matrix diagonalization”, Numerische Mathematik, Vol. 129, No. 3, pp. 563-586, 2015.
[6] J.P. Charlier, M. Vanbegin and P. Van Dooren, “On efficient implementations of Kogbetliantz's algorithm for computing the singular value decomposition”, Numerische Mathematik, Vol. 52, No. 3, pp. 279-300, 1988.
[7] P. Strobach, “Solving cubics by polynomial fitting”, J. Computational and Applied Mathematics, Vol. 235, No. 9, pp. 3033-3052, 2011.