Coupled Cross-correlation Neural Network Algorithm for Principal Singular Triplet Extraction of a Cross-covariance Matrix

Xiaowei Feng, Xiangyu Kong, and Hongguang Ma
Abstract—This paper proposes a novel coupled neural network learning algorithm to extract the principal singular triplet (PST) of a cross-correlation matrix between two high-dimensional data streams. We first introduce a novel information criterion (NIC) whose stationary points are the singular triplets of the cross-correlation matrix. Then, based on Newton's method, we obtain a coupled system of ordinary differential equations (ODEs) from the NIC. The ODEs have the same equilibria as the gradient of the NIC; however, only the first PST of the system is stable (which is also the desired solution), and all others are (unstable) saddle points. Based on this system, we finally obtain a fast and stable algorithm for PST extraction. The proposed algorithm solves the speed-stability problem that plagues most noncoupled learning rules. Moreover, it can also be used to extract multiple PSTs effectively by means of a sequential method.

Index Terms—Singular value decomposition (SVD), coupled algorithm, cross-correlation neural network (CNN), speed-stability problem, principal singular subspace (PSS), principal singular triplet (PST).
Manuscript received July 20, 2015; accepted December 21, 2015. This work was supported by National Natural Science Foundation of China (61174207, 61374120, 61074072, 11405267). Recommended by Associate Editor Zhanshan Wang.

Citation: Xiaowei Feng, Xiangyu Kong, Hongguang Ma. Coupled cross-correlation neural network algorithm for principal singular triplet extraction of a cross-covariance matrix. IEEE/CAA Journal of Automatica Sinica, 2016, 3(2): 149-156.

Xiaowei Feng and Xiangyu Kong are with Xi'an Research Institute of High Technology, Xi'an 710025, China (e-mail: [email protected]; [email protected]). Hongguang Ma is with Beijing Institute of Technology, Zhuhai 519088, China (e-mail: mhg [email protected]).

I. INTRODUCTION

Neural networks for principal component analysis (PCA) are an important method for feature extraction and data compression applications. Until now, a significant number of neural network algorithms for PCA have been proposed. Compared with neural PCA algorithms, neural network algorithms that perform the singular value decomposition (SVD) have received relatively little attention[1]. Nevertheless, neural networks can provide an effective approach to the SVD of a rectangular matrix between two data streams[2-5]. Early methods for SVD were developed using purely matrix algebra[6-9], but these are batch methods and thus cannot be used for real-time processing. To extract feature information online, several SVD dynamical systems based on gradient flows were developed[10-12], but these algorithms converge slowly. To overcome this problem, Diamantaras and Kung[5] extended
the online correlation Hebbian rule to the cross-correlation Hebbian rule to perform the SVD of a cross-correlation matrix. However, these models diverge for some initial states[4]. Feng et al. provided a theoretical foundation for a novel model based on an extension of the Hebbian rule[13]. To improve convergence speed and stability, Feng proposed a cross-correlation neural network (CNN) learning rule for SVD[4] in which the learning rate is independent of the singular value distribution of the non-squared data or cross-correlation matrix. All of the aforementioned work, however, focuses on single singular component analysis. Some principal singular subspace (PSS) extraction algorithms have also been proposed. In [14], a novel CNN model for finding the PSS of a cross-correlation matrix was introduced. Hasan[15] proposed an algorithm to track the PSS in which the equilibrium point relates to the principal singular values. Later, Kong et al.[16] proposed an effective, parallel neural network algorithm for cross-correlation feature extraction. As is known, a neural network algorithm for extracting the principal singular vectors/subspace, usually derived from a Hebbian rule or a gradient method, is a special case of an optimization problem. Solving optimization problems with recurrent neural networks has also been well studied in [17-19].

In [1], neural network algorithms that only consider the extraction of the singular vectors are coined noncoupled algorithms. In most noncoupled algorithms, the convergence speed depends on the singular values of the cross-covariance matrix, and it is hard to select an appropriate learning rate. This problem is known as the speed-stability problem[1]. To solve it, Kaiser et al.[1] proposed a coupled neural network algorithm based on Newton's method, in which the principal singular triplet (PST) is estimated simultaneously. A coupled learning algorithm, which controls the learning rate of an eigenvector estimate by the corresponding eigenvalue estimate, has favorable convergence properties[1]. Besides the singular vectors, a coupled SVD (CSVD) algorithm also extracts the singular values, which are important and useful in engineering practice; moreover, a CSVD algorithm solves the speed-stability problem that plagues most noncoupled algorithms and thus converges faster. Unfortunately, only one CSVD algorithm has been proposed and analyzed so far. In this paper, we derive a new CSVD algorithm based on Kaiser's work. Our work is an improvement and
expansion of Kaiser's work. In this paper, we propose a novel information criterion (NIC) in which the desired solution corresponds to one of the stationary points. A gradient-based method is not suitable for the NIC, since the first PST of the NIC is not a minimum but a saddle point. Nevertheless, a learning rule can be derived by applying Newton's method; in this case, the key point is to find the gradient and to approximate the inverse Hessian of the NIC. We then obtain an algorithm based on the gradient and the inverse Hessian. Stability of the proposed algorithm is analyzed by finding the eigenvalues of the Jacobian of the ordinary differential equations (ODEs). The stability analysis shows that the algorithm solves the speed-stability problem, since it converges with approximately equal speed in almost all of its eigen-directions and is widely independent of the singular values. Experiment results show that the proposed algorithm converges faster than existing noncoupled algorithms and, unlike them, also estimates the principal singular value. The approach to approximating the inverse Hessian in this paper differs from that in [1]: the information criterion introduced in [1] contains a natural logarithm of the singular value, which is not suitable for a negative singular value, and the proposed information criterion overcomes this weakness.

II. PRELIMINARIES

Let A denote an m × n real matrix whose SVD is given by[1]

A = \bar{U}\bar{S}\bar{V}^{T} + \bar{U}_{2}\bar{S}_{2}\bar{V}_{2}^{T},   (1)

where Ū ∈ R^{m×M} and V̄ ∈ R^{n×M} (M ≤ min{m, n}) denote the left and right PSS of A, respectively, with Ū = [ū_1, . . . , ū_M] and V̄ = [v̄_1, . . . , v̄_M], where ū_i and v̄_i (i = 1, . . . , M) are the i-th left and right principal singular vectors of A. Moreover, S̄ = diag(σ̄_1, . . . , σ̄_M) ∈ R^{M×M} denotes the matrix with the principal singular values on its diagonal. We refer to these matrices as the principal portion of the SVD; (ū_j, v̄_j, σ̄_j) is called the j-th singular triplet of the cross-covariance matrix A. Furthermore, Ū_2 = [ū_{M+1}, . . . , ū_N] ∈ R^{m×(N−M)}, S̄_2 = diag(σ̄_{M+1}, . . . , σ̄_N) ∈ R^{(N−M)×(N−M)} and V̄_2 = [v̄_{M+1}, . . . , v̄_N] ∈ R^{n×(N−M)} correspond to the minor portion of the SVD, where N = min{m, n}. Thus, Â = Ū S̄ V̄^T is the best rank-M approximation (in the least-squares sense) of A. All left and right singular vectors are normal, i.e., ||ū_j|| = ||v̄_j|| = 1, ∀j, and mutually orthogonal, i.e., Ū^T Ū = V̄^T V̄ = I_M and Ū_2^T Ū_2 = V̄_2^T V̄_2 = I_{N−M}. In this paper, we assume that the singular values are ordered as |σ̄_1| ≥ |σ̄_2| ≥ ··· ≥ |σ̄_M| > |σ̄_{M+1}| ≥ ··· ≥ |σ̄_N|. In the following, all considerations (e.g., concerning fixed points) depend on the principal portion of the SVD only.

Consider an m-dimensional sequence x(k) and an n-dimensional sequence y(k) with the number of samples k large enough. Without loss of generality, let m ≥ n. If x(k) and y(k) are jointly stationary, their cross-correlation matrix is defined as A = E[xy^T] and can be estimated by
A(k) = \frac{1}{k}\sum_{j=1}^{k} x(j)y^{T}(j) \in R^{m \times n}.   (2)

If x(k) and y(k) are jointly nonstationary and (or) slowly time-varying, then their cross-correlation matrix can be estimated by

A(k) = \sum_{j=1}^{k} \alpha^{k-j} x(j)y^{T}(j) \in R^{m \times n},   (3)

where 0 < α < 1 denotes the forgetting factor, which makes the past data samples less weighted than the recent ones.
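As a brief illustration of the two estimators (2) and (3), the following Python/NumPy sketch (an illustrative aid, not part of the original paper) updates the estimate of A recursively; the running-mean form of (2) reappears as (29) in Section V-B, and (3) reduces to the recursion A(k) = αA(k−1) + x(k)y^T(k).

```python
import numpy as np

def update_cross_corr(A_prev, x, y, k=None, alpha=None):
    """Recursive estimate of the cross-correlation matrix A = E[x y^T].

    With alpha=None the running mean of eq. (2) is updated,
    A(k) = ((k-1)/k) A(k-1) + (1/k) x y^T  (cf. eq. (29));
    otherwise the exponentially weighted sum of eq. (3) is updated,
    A(k) = alpha A(k-1) + x y^T.
    """
    outer = np.outer(x, y)
    if alpha is None:
        return ((k - 1) * A_prev + outer) / k
    return alpha * A_prev + outer
```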
The task of a neural network algorithm for SVD is to compute two nonzero vectors (u ∈ R^m and v ∈ R^n) and a real number (σ) from the two data streams (x and y) such that

Av = \sigma u,   (4)

A^{T}u = \sigma v,   (5)

u^{T}Av = \sigma,   (6)

where A = E[xy^{T}].
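The defining relations (4)-(6) can be checked numerically against a batch SVD; the sketch below is only an illustration (the matrix size and random seed are arbitrary choices, not taken from the paper).

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((11, 9))

# Reference batch SVD; (u, v, sigma) is the first principal singular triplet.
U, S, Vt = np.linalg.svd(A)
u, v, sigma = U[:, 0], Vt[0, :], S[0]

print(np.allclose(A @ v, sigma * u))    # eq. (4): A v = sigma u
print(np.allclose(A.T @ u, sigma * v))  # eq. (5): A^T u = sigma v
print(np.isclose(u @ A @ v, sigma))     # eq. (6): u^T A v = sigma
```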
III. NOVEL INFORMATION CRITERION AND ITS COUPLED SYSTEM

Gradient-based algorithms are derived by maximizing the variance of the projected data or by minimizing the reconstruction error based on an information criterion. It is thus required that the stationary points of the information criterion be attractors. However, a gradient-based method is not suitable for the NIC proposed here, since the first PST of the NIC is a saddle point. Different from the gradient method, Newton's method has the beneficial property that it turns even saddle points into attractors, which guarantees the stability of the resulting learning rules[20]. In this case, a learning rule can be derived from an information criterion which is subject neither to minimization nor to maximization[1]. Moreover, Newton's method has a higher convergence speed than the gradient method. In this paper, a coupled algorithm is derived from an NIC based on Newton's method. The NIC is presented as

p = u^{T}Av - \frac{1}{2}\sigma u^{T}u - \frac{1}{2}\sigma v^{T}v + \sigma.   (7)

The gradient of (7) is determined through

\nabla p = \left[\left(\frac{\partial p}{\partial u}\right)^{T}, \left(\frac{\partial p}{\partial v}\right)^{T}, \frac{\partial p}{\partial \sigma}\right]^{T},   (8)

which has the components

\frac{\partial p}{\partial u} = Av - \sigma u,   (9)

\frac{\partial p}{\partial v} = A^{T}u - \sigma v,   (10)

\frac{\partial p}{\partial \sigma} = -\frac{1}{2}u^{T}u - \frac{1}{2}v^{T}v + 1.   (11)

It is clear that the stationary points of (7) are the singular triplets of A as given in (4)-(5), and at these points we can also conclude that u^T Av = σ and u^T u = v^T v = 1.
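A small sketch of the criterion (7) and its gradient (9)-(11) follows; it only verifies that the gradient vanishes at a true singular triplet (at which p = σ), with an arbitrary test matrix assumed for illustration.

```python
import numpy as np

def nic(A, u, v, sigma):
    """Novel information criterion p of eq. (7)."""
    return u @ A @ v - 0.5 * sigma * (u @ u) - 0.5 * sigma * (v @ v) + sigma

def nic_gradient(A, u, v, sigma):
    """Gradient components (9)-(11) of p."""
    return (A @ v - sigma * u,
            A.T @ u - sigma * v,
            1.0 - 0.5 * (u @ u) - 0.5 * (v @ v))

rng = np.random.default_rng(1)
A = rng.standard_normal((11, 9))
U, S, Vt = np.linalg.svd(A)
u1, v1, s1 = U[:, 0], Vt[0, :], S[0]
print(all(np.allclose(g, 0) for g in nic_gradient(A, u1, v1, s1)))  # stationary point
print(np.isclose(nic(A, u1, v1, s1), s1))                           # p = sigma there
```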
In Newton descent, the gradient is premultiplied by the inverse Hessian. The Hessian H = ∇∇^T p of (7) is

H = \begin{bmatrix} -\sigma I_m & A & -u \\ A^{T} & -\sigma I_n & -v \\ -u^{T} & -v^{T} & 0 \end{bmatrix},   (12)

where I_m and I_n are identity matrices of dimension m and n, respectively. In Appendix A, we determine an approximation for the inverse Hessian in the vicinity of the PST:

H^{-1} = \begin{bmatrix} -\sigma^{-1}(I_m - uu^{T}) & 0_{mn} & -\frac{1}{2}u \\ 0_{nm} & -\sigma^{-1}(I_n - vv^{T}) & -\frac{1}{2}v \\ -\frac{1}{2}u^{T} & -\frac{1}{2}v^{T} & 0 \end{bmatrix},   (13)

where 0_{mn} is a zero matrix of size m × n, and 0_{nm} denotes its transpose. Newton's method for SVD is defined as

\begin{bmatrix} \dot{u} \\ \dot{v} \\ \dot{\sigma} \end{bmatrix} = -H^{-1}\nabla p = -H^{-1}\begin{bmatrix} \partial p/\partial u \\ \partial p/\partial v \\ \partial p/\partial \sigma \end{bmatrix}.   (14)

By applying the gradient of p in (9)-(11) and the inverse Hessian (13) to (14), we obtain the system of ODEs

\dot{u} = \sigma^{-1}Av - \sigma^{-1}uu^{T}Av - \frac{1}{2}u + \frac{3}{4}uu^{T}u - \frac{1}{4}uv^{T}v,   (15)

\dot{v} = \sigma^{-1}A^{T}u - \sigma^{-1}vu^{T}Av - \frac{1}{2}v + \frac{3}{4}vv^{T}v - \frac{1}{4}vu^{T}u,   (16)

\dot{\sigma} = u^{T}Av - \frac{1}{2}\sigma(u^{T}u + v^{T}v).   (17)

It is straightforward to show that this system of ODEs has the same equilibria as the gradient of p. However, only the first PST is stable (see Section IV). The error introduced by the approximation of the inverse Hessian matrix is analyzed in Appendix B.

IV. ONLINE IMPLEMENTATION AND STABILITY ANALYSIS

A stochastic online algorithm can be derived from the ODEs in (15)-(17) by formally replacing A with x(k)y^T(k) and by introducing a small learning rate γ. Here k denotes the discrete time step. Under certain conditions, the online algorithm has the same convergence goal as the ODEs[1]. We introduce the auxiliary variables ξ(k) = u^T(k)x(k) and ζ(k) = v^T(k)y(k) and then obtain the update equations from (15)-(17):

u(k+1) = u(k) + \gamma\left\{\sigma(k)^{-1}\zeta(k)[x(k) - \xi(k)u(k)] - \frac{1}{2}u(k) + \frac{3}{4}u(k)u^{T}(k)u(k) - \frac{1}{4}u(k)v^{T}(k)v(k)\right\},   (18)

v(k+1) = v(k) + \gamma\left\{\sigma(k)^{-1}\xi(k)[y(k) - \zeta(k)v(k)] - \frac{1}{2}v(k) + \frac{3}{4}v(k)v^{T}(k)v(k) - \frac{1}{4}v(k)u^{T}(k)u(k)\right\},   (19)

\sigma(k+1) = \sigma(k) + \gamma\left\{\xi(k)\zeta(k) - \frac{1}{2}\sigma(k)[u^{T}(k)u(k) + v^{T}(k)v(k)]\right\}.   (20)
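A minimal Python/NumPy sketch of one step of the online rule (18)-(20) is given below; sample vectors are assumed to be 1-D arrays, σ(k) is assumed nonzero, and the initialization shown is an arbitrary choice for illustration only.

```python
import numpy as np

def coupled_pst_step(u, v, sigma, x, y, gamma=0.02):
    """One update of eqs. (18)-(20) for the principal singular triplet."""
    xi = u @ x        # auxiliary variable xi(k) = u^T(k) x(k)
    zeta = v @ y      # auxiliary variable zeta(k) = v^T(k) y(k)
    uu, vv = u @ u, v @ v
    u_new = u + gamma * (zeta / sigma * (x - xi * u)
                         - 0.5 * u + 0.75 * uu * u - 0.25 * vv * u)
    v_new = v + gamma * (xi / sigma * (y - zeta * v)
                         - 0.5 * v + 0.75 * vv * v - 0.25 * uu * v)
    sigma_new = sigma + gamma * (xi * zeta - 0.5 * sigma * (uu + vv))
    return u_new, v_new, sigma_new

# Example initialization (illustrative only): small random vectors, sigma(0) = 1.
rng = np.random.default_rng(2)
u, v, sigma = rng.standard_normal(11), rng.standard_normal(9), 1.0
```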
Stability is a crucial property of learning rules, as it guarantees convergence. The stability of the proposed algorithm can be proven by analyzing the Jacobian of the averaged learning rule in (15)-(17), evaluated at the q-th singular triplet, i.e., (ū_q, v̄_q, σ̄_q). A learning rule is stable if its Jacobian is negative definite. The Jacobian of the original ODE system (15)-(17) is

J(\bar{u}_q, \bar{v}_q, \bar{\sigma}_q) = \begin{bmatrix} -I_m + \frac{1}{2}\bar{u}_q\bar{u}_q^{T} & \bar{\sigma}_q^{-1}A - \frac{3}{2}\bar{u}_q\bar{v}_q^{T} & 0_m \\ \bar{\sigma}_q^{-1}A^{T} - \frac{3}{2}\bar{v}_q\bar{u}_q^{T} & -I_n + \frac{1}{2}\bar{v}_q\bar{v}_q^{T} & 0_n \\ 0_m^{T} & 0_n^{T} & -1 \end{bmatrix}.   (21)
The Jacobian (21) has M − 1 eigenvalue pairs α_i = σ̄_i/σ̄_q − 1 and α_{M+i} = −σ̄_i/σ̄_q − 1 (i = 1, . . . , M, i ≠ q), a double eigenvalue α_q = α_{M+q} = −1/2, and all other eigenvalues α_r = −1. A stable equilibrium requires |σ̄_i|/|σ̄_q| < 1, ∀i ≠ q, and consequently |σ̄_q| > |σ̄_i|, ∀i ≠ q, which is only provided by choosing q = 1, i.e., the first PST (ū_1, v̄_1, σ̄_1). Moreover, if |σ̄_1| ≫ |σ̄_j|, ∀j ≠ 1, all eigenvalues (except α_1 = α_{M+1} = −1/2) are α_i ≈ −1, so the system converges with approximately equal speed in almost all of its eigen-directions and is widely independent of the singular values.

V. EXPERIMENTS

A. Experiment 1

In this experiment, we conduct a simulation to test the PST extraction ability of the proposed algorithm, and compare its performance with that of some other algorithms, i.e., the coupled algorithm proposed by Kaiser et al.[1] and two noncoupled algorithms proposed by Feng et al.[14] and Kong et al.[16], respectively. A nine-dimensional Gaussian white sequence y(k) is generated randomly, and x(k) = Ay(k), where A is given by

A = [u_0, \ldots, u_8] \cdot \mathrm{randn}(9, 9) \cdot [v_0, \ldots, v_8]^{T},   (22)

where u_i and v_i (i = 0, . . . , 8) are the i-th basis vectors of the 11- and 9-dimensional orthogonal discrete cosine bases, respectively.
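The construction (22) can be sketched as follows; since the exact normalization of the discrete cosine basis is not specified in the text, an orthonormal DCT-II basis is assumed here, and the sample count is an arbitrary illustrative choice.

```python
import numpy as np

def dct_basis(n):
    """Orthonormal DCT-II basis vectors as columns (assumed normalization)."""
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    B = np.cos(np.pi * (i + 0.5) * j / n)
    B[:, 0] *= np.sqrt(1.0 / n)
    B[:, 1:] *= np.sqrt(2.0 / n)
    return B

rng = np.random.default_rng(3)
U0 = dct_basis(11)[:, :9]                      # u_0, ..., u_8 (11-dimensional)
V0 = dct_basis(9)                              # v_0, ..., v_8 (9-dimensional)
A = U0 @ rng.standard_normal((9, 9)) @ V0.T    # eq. (22)

y_samples = rng.standard_normal((2000, 9))     # 9-D Gaussian white sequence y(k)
x_samples = y_samples @ A.T                    # x(k) = A y(k)
```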
In order to measure the convergence speed and precision of the learning algorithms, we compute the direction cosine between the state vectors, i.e., u(k) and v(k), and the true principal singular vectors, i.e., ū_1 and v̄_1, at the k-th update:

\mathrm{DC}(u(k)) = \frac{|u^{T}(k)\bar{u}_1|}{\|u(k)\| \cdot \|\bar{u}_1\|},   (23)

\mathrm{DC}(v(k)) = \frac{|v^{T}(k)\bar{v}_1|}{\|v(k)\| \cdot \|\bar{v}_1\|}.   (24)
Clearly, if the direction cosines (23) and (24) converge to 1, then the state vectors u(k) and v(k) approach the directions of the true left and right singular vectors, respectively. For coupled algorithms, we also define the left and right singular errors at the k-th update:

\epsilon_L(k) = \|\sigma(k)^{-1}A^{T}u(k) - v(k)\|,   (25)

\epsilon_R(k) = \|\sigma(k)^{-1}Av(k) - u(k)\|.   (26)

If these two singular errors converge to 0, then the singular value estimate σ(k) approaches the true singular value as k → ∞.
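The performance indices (23)-(26) translate directly into code; the following helper functions are a sketch (the function names are ours, not from the paper).

```python
import numpy as np

def direction_cosine(w, w_true):
    """Direction cosine (23)/(24) between an estimate and the true vector."""
    return abs(w @ w_true) / (np.linalg.norm(w) * np.linalg.norm(w_true))

def singular_errors(A, u, v, sigma):
    """Left and right singular errors (25)-(26) for coupled algorithms."""
    eps_L = np.linalg.norm(A.T @ u / sigma - v)
    eps_R = np.linalg.norm(A @ v / sigma - u)
    return eps_L, eps_R
```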
In this experiment, the learning rate is chosen as γ = 0.02 for all rules. The initial values u(0) and v(0) are set to be orthogonal to ū_1 and v̄_1. Experiment results are shown in Figs. 1-6.
From Figs. 1 and 2, it is observed that all algorithms can effectively extract both the left and right principal singular vectors of a cross-correlation matrix between two data streams. The coupled algorithms have a higher convergence speed than Feng's algorithm. Compared with Kong's algorithm, the coupled algorithms have a similar convergence speed over the whole process but a higher convergence speed in the beginning steps. From Figs. 3 and 4, we find that all left and right state vectors of all algorithms converge to unit length, and the coupled algorithms converge faster than the noncoupled algorithms. Moreover, the principal singular value of the cross-correlation matrix can also be estimated by the coupled algorithms, which is an advantage of coupled algorithms over noncoupled algorithms. This is very helpful in engineering applications where singular value estimation is required. Figs. 5 and 6 confirm the efficiency of the principal singular value estimation of the coupled algorithms.

Fig. 1. Direction cosine between u(k) and ū_1.

Fig. 2. Direction cosine between v(k) and v̄_1.

Fig. 3. Norm of left state vector u(k).

Fig. 4. Norm of right state vector v(k).

Fig. 5. Principal singular value estimation; the true principal singular value is indicated by a dashed line.

Fig. 6. The left and right singular errors.
In order to demonstrate the estimation accuracy more clearly, we also compute the mean values of the last 100 steps of all aforementioned indices. The results are shown in Table I. We find that the coupled algorithms extract the principal singular vectors more accurately than the noncoupled algorithms.
TABLE I
MEAN VALUES OF THE LAST 100 STEPS OF ALL INDICES

Index    Proposed   Kaiser    Feng      Kong
DC(u)    1.0054     1.0059    0.9808    1.0090
DC(v)    1.0055     1.0058    0.9810    1.0083
||u||    1.0091     1.0100    1.0110    0.9899
||v||    0.9904     0.9901    1.0109    1.0101
ε_L      0.0095     0.0124    –         –
ε_R      0.0104     0.0123    –         –
B. Experiment 2

In this experiment, we use the proposed algorithm to extract multiple principal singular components, i.e., the first three PSTs of a cross-covariance matrix. The initial conditions are set to be the same as those in Experiment 1. The method of multiple component extraction is a sequential method, which was introduced in [4]:

A_1(k) = A(k),   (27)

A_i(k) = A_1(k) - \sum_{j=1}^{i-1} u_j(k)u_j^{T}(k)A_1(k)v_j(k)v_j^{T}(k) = A_{i-1}(k-1) - u_{i-1}(k)u_{i-1}^{T}(k)A_1(k)v_{i-1}(k)v_{i-1}^{T}(k), \quad i = 2, \ldots, n,   (28)

where A(k) is defined in (3) and can be updated as

A(k) = \frac{k-1}{k}A(k-1) + \frac{1}{k}x(k)y^{T}(k).   (29)

By replacing A in (15)-(17) with A_i(k) at each step (instead of the instantaneous outer product that yields ξ(k)ζ(k)), the i-th triplet (u_i, v_i, σ_i) can be estimated. In this experiment, we set α = 1. Fig. 7 shows the direction cosines between the first three (left and right) principal singular vector estimates and the true (left and right) singular vectors. Fig. 8 shows the norms of the first three (left and right) state vectors, and Fig. 9 shows the estimates of the first three principal singular values. Figs. 7-9 confirm the ability of the proposed algorithm to perform multiple component analysis.

Fig. 7. Direction cosine between the state vectors (u_i(k) and v_i(k), i = 1, 2, 3) and the true singular vectors (ū_i and v̄_i), respectively.

Fig. 8. Norm of all left and right state vectors (u_i(k) and v_i(k), i = 1, 2, 3).

Fig. 9. Principal singular values estimation.
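As a concrete illustration of the deflation step (28), the sketch below removes the already-extracted components from A_1(k) so that the coupled rule can be run on A_i(k) for the next triplet; it is a hedged reading of the sequential method, with function and variable names of our own choosing.

```python
import numpy as np

def deflate(A1, us, vs):
    """Deflated matrix A_i(k) of eq. (28).

    A1     : current estimate A_1(k) of the cross-correlation matrix
    us, vs : lists of the already extracted left/right vector estimates
             u_1(k), ..., u_{i-1}(k) and v_1(k), ..., v_{i-1}(k)
    """
    Ai = A1.copy()
    for u, v in zip(us, vs):
        Ai -= np.outer(u, u) @ A1 @ np.outer(v, v)  # subtract j-th component
    return Ai
```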
VI. CONCLUSION

In this paper, a novel CSVD algorithm is presented by finding the stable stationary point of an NIC via Newton's method. The proposed algorithm solves the speed-stability problem and thus performs much better than noncoupled algorithms. CSVD algorithms can track the left and right singular vectors and the singular value simultaneously, which is very helpful in some engineering applications. The major work in this paper extends the field of CSVD algorithms. Experiment results show that the proposed algorithm performs well.

APPENDIX A
APPROXIMATION OF THE INVERSE HESSIAN

Newton's method requires an inversion of the Hessian (12) in the vicinity of the stationary point, here the PST. The inversion of the Hessian is simplified by a similarity transformation, using the orthogonal matrix

T = \begin{bmatrix} \bar{U} & 0_{mn} & 0_m \\ 0_{nm} & \bar{V} & 0_n \\ 0_m^{T} & 0_n^{T} & 1 \end{bmatrix}.   (A1)
Here, Ū = [ū_1, . . . , ū_m] and V̄ = [v̄_1, . . . , v̄_n] are matrices which contain all left and right singular vectors in their columns, respectively. For the computation of the transformed matrix H* = T^T H T we exploit A^T Ū = V̄ S̄^T and A V̄ = Ū S̄; here S̄ is an m × n matrix whose first N diagonal elements are the singular values arranged in decreasing order, and all remaining elements of S̄ are zero.
Moreover, in the vicinity of the stationary point (ū_1, v̄_1, σ̄_1), we can approximate Ū^T u ≈ e_m and V̄^T v ≈ e_n; here e_m and e_n are unit vectors of the specified dimensions with a 1 as the first element. We obtain

H_0^{*} = \begin{bmatrix} -\sigma I_m & \bar{S} & -e_m \\ \bar{S}^{T} & -\sigma I_n & -e_n \\ -e_m^{T} & -e_n^{T} & 0 \end{bmatrix}.   (A2)

Next, by approximating σ ≈ σ̄_1 and by assuming |σ̄_1| ≫ |σ̄_j|, ∀j = 2, . . . , M, the matrix S̄ can be approximated as S̄ = σσ^{-1}S̄ ≈ σ e_m e_n^T. In this case, (A2) yields

H^{*} = \begin{bmatrix} -\sigma I_m & \sigma e_m e_n^{T} & -e_m \\ \sigma e_n e_m^{T} & -\sigma I_n & -e_n \\ -e_m^{T} & -e_n^{T} & 0 \end{bmatrix}.   (A3)

As introduced in [21], suppose that an invertible matrix of size (j + 1) × (j + 1) has the form

R_{j+1} = \begin{bmatrix} R_j & r_j \\ r_j^{T} & \rho_j \end{bmatrix},   (A4)

where r_j is a vector of size j and ρ_j is a real number. If R_j is invertible and R_j^{-1} is known, then R_{j+1}^{-1} can be determined from

R_{j+1}^{-1} = \begin{bmatrix} R_j^{-1} & 0_j \\ 0_j^{T} & 0 \end{bmatrix} + \frac{1}{\beta_j}\begin{bmatrix} b_j b_j^{T} & b_j \\ b_j^{T} & 1 \end{bmatrix},   (A5)

where

b_j = -R_j^{-1} r_j,   (A6)

\beta_j = \rho_j + r_j^{T} b_j.   (A7)

Here, it is obvious that r_j = -[e_m^{T}, e_n^{T}]^{T}, ρ_j = 0 and

R_j = \begin{bmatrix} -\sigma I_m & \sigma e_m e_n^{T} \\ \sigma e_n e_m^{T} & -\sigma I_n \end{bmatrix}.   (A8)

Based on the method introduced in [22] and [23], we obtain

R_j^{-1} = \begin{bmatrix} -\sigma I_m & \sigma e_m e_n^{T} \\ \sigma e_n e_m^{T} & -\sigma I_n \end{bmatrix}^{-1} = -\sigma^{-1}\begin{bmatrix} (I_m - e_m e_m^{T})^{-1} & e_m e_n^{T}(I_n - e_n e_n^{T})^{-1} \\ e_n e_m^{T}(I_m - e_m e_m^{T})^{-1} & (I_n - e_n e_n^{T})^{-1} \end{bmatrix}.   (A9)

It is found that I_m − e_m e_m^T and I_n − e_n e_n^T are singular, and thus we add a penalty term ε ≈ 1 to (A9), which yields

R_j^{-1} \approx -\sigma^{-1}\begin{bmatrix} (I_m - \varepsilon e_m e_m^{T})^{-1} & e_m e_n^{T}(I_n - \varepsilon e_n e_n^{T})^{-1} \\ e_n e_m^{T}(I_m - \varepsilon e_m e_m^{T})^{-1} & (I_n - \varepsilon e_n e_n^{T})^{-1} \end{bmatrix} = -\sigma^{-1}\begin{bmatrix} I_m + \eta e_m e_m^{T} & (1+\eta)e_m e_n^{T} \\ (1+\eta)e_n e_m^{T} & I_n + \eta e_n e_n^{T} \end{bmatrix},   (A10)

where η = ε/(1 − ε). Thus

b_j = -R_j^{-1} r_j = -2(1+\eta)\sigma^{-1}[e_m^{T}, e_n^{T}]^{T},   (A11)

\beta_j = \rho_j + r_j^{T} b_j = 4(1+\eta)\sigma^{-1}.   (A12)

Substituting (A10)-(A12) into (A5) yields

H^{*-1} = R_{j+1}^{-1} \approx \begin{bmatrix} -\sigma^{-1}(I_m - e_m e_m^{T}) & 0_{mn} & -\frac{1}{2}e_m \\ 0_{nm} & -\sigma^{-1}(I_n - e_n e_n^{T}) & -\frac{1}{2}e_n \\ -\frac{1}{2}e_m^{T} & -\frac{1}{2}e_n^{T} & \frac{\sigma}{4(1+\eta)} \end{bmatrix}.   (A13)

From the inverse transformation H^{-1} = T H^{*-1} T^{T} and by approximating \frac{\sigma}{4(1+\eta)} = \frac{1}{4}\sigma(1-\varepsilon) \approx 0, we get

H^{-1} \approx \begin{bmatrix} -\sigma^{-1}(I_m - uu^{T}) & 0_{mn} & -\frac{1}{2}u \\ 0_{nm} & -\sigma^{-1}(I_n - vv^{T}) & -\frac{1}{2}v \\ -\frac{1}{2}u^{T} & -\frac{1}{2}v^{T} & 0 \end{bmatrix},   (A14)

where we approximated the unknown singular vectors by ū_1 ≈ u and v̄_1 ≈ v.
−1
where H is the approximated inverse Hessian that has been calculated in (43); H0−1 is the true inverse matrix of Hessian (12) at the stationary point (¯ u1 , v¯1 , σ ¯1 ). The transformed matrix H0∗ = T T H0 T is shown in (31). In this case, we calculate the inverse of (31) at first, which is ∗ Z∗ − 12 em Y − σ −1 (Im − em eT m) Z ∗T X∗ − 12 en , (H0∗ )−1 = 1 T 1 T − 2 en 0 − 2 em (A16) −1 ¯ ∗ ¯ ∗ S¯T , Z ∗ = 1 σ −1 em eT SX and where Y ∗ = σ −2 SX n +σ 2 T −1 ∗ −1 ¯T ¯ X = (σ S S −σIn −4σen en ) . The true inverse Hessian is
H0−1 = T (H0∗ )−1 T T Y − σ −1 (Im − uuT ) Z ZT X = − 12 uT − 12 v T
− 21 u − 12 v , 0
(A17)
where Y = σ −2 AXAT , Z = 21 σ −1 uv T + σ −1 AX and X = V¯ X ∗ V¯ T −1 ¯ T = V¯ (σ −1 S¯T S¯ − σIn − 4σen eT V n) = (σ −1 AT A − σIn − 4σvv T )−1 . Then, the approximation error matrix is Y Z 0m ∆H −1 = Z T X + σ −1 (In − vv T ) 0n . 0T 0T 0 m n
(A18)
(A19)
(A11)
Now we calculate the upper bound of the approximation error. It is found that
(A12)
∆H −1 = T ∆(H ∗ )−1 T T ,
(A20)
where

\Delta(H^{*})^{-1} = (H_0^{*})^{-1} - H^{*-1} = \begin{bmatrix} Y^{*} & Z^{*} & 0_m \\ (Z^{*})^{T} & X^{*} + \sigma^{-1}(I_n - e_n e_n^{T}) & 0_n \\ 0_m^{T} & 0_n^{T} & 0 \end{bmatrix}.   (A21)

It holds that

\|\Delta H^{-1}\|_s = \|T\|_s \cdot \|\Delta(H^{*})^{-1}\|_s \cdot \|T^{T}\|_s,   (A22)

where \|T\|_s represents the spectral norm of the matrix T. Clearly, it holds that \|T\|_s = \|T^{T}\|_s = 1 and \|\bar{S}\|_s = \|\bar{S}^{T}\|_s = |\sigma_1|. Then, we have

\|\Delta(H^{*})^{-1}\|_s \le \|Y^{*}\|_s + \|Z^{*}\|_s + \|(Z^{*})^{T}\|_s + \|X^{*} + \sigma^{-1}(I_n - e_n e_n^{T})\|_s.   (A23)

Clearly, we have

\|Y^{*}\|_s = \|\sigma^{-2}\bar{S}X^{*}\bar{S}^{T}\|_s \le |\sigma^{-2}| \cdot \|\bar{S}\|_s \cdot \|X^{*}\|_s \cdot \|\bar{S}^{T}\|_s = \|X^{*}\|_s,   (A24)

\|(Z^{*})^{T}\|_s = \|Z^{*}\|_s = \left\|\tfrac{1}{2}\sigma^{-1}e_m e_n^{T} + \sigma^{-1}\bar{S}X^{*}\right\|_s \le \left\|\tfrac{1}{2}\sigma^{-1}e_m e_n^{T}\right\|_s + \left\|\sigma^{-1}\bar{S}X^{*}\right\|_s \le \tfrac{1}{2}|\sigma^{-1}| + \|X^{*}\|_s,   (A25)

and

\|X^{*} + \sigma^{-1}(I_n - e_n e_n^{T})\|_s \le \|X^{*}\|_s + \|\sigma^{-1}(I_n - e_n e_n^{T})\|_s = \|X^{*}\|_s + |\sigma^{-1}|.   (A26)

From (A20) to (A26), we conclude that

\|\Delta H^{-1}\|_s \le 4\|X^{*}\|_s + 2|\sigma^{-1}|,   (A27)

where

\|X^{*}\|_s = \|(\sigma^{-1}\bar{S}^{T}\bar{S} - \sigma I_n - 4\sigma e_n e_n^{T})^{-1}\|_s = \left\|\mathrm{diag}\!\left(-4\sigma,\ \frac{\bar{\sigma}_2^{2}}{\sigma}-\sigma,\ \ldots,\ \frac{\bar{\sigma}_n^{2}}{\sigma}-\sigma\right)^{-1}\right\|_s = \left\|\mathrm{diag}\!\left(-\frac{1}{4\sigma},\ \frac{\sigma}{\bar{\sigma}_2^{2}-\sigma^{2}},\ \ldots,\ \frac{\sigma}{\bar{\sigma}_n^{2}-\sigma^{2}}\right)\right\|_s = \frac{|\sigma|}{\sigma^{2}-\bar{\sigma}_2^{2}}   (A28)

and σ = σ̄_1. Thus, we get

\|\Delta H^{-1}\|_s \le \frac{4|\bar{\sigma}_1|}{\bar{\sigma}_1^{2} - \bar{\sigma}_2^{2}} + 2|\bar{\sigma}_1^{-1}|.   (A29)

The approximation error of (14) is

\begin{bmatrix} \Delta\dot{u} \\ \Delta\dot{v} \\ \Delta\dot{\sigma} \end{bmatrix} = -\Delta H^{-1}\nabla p = -\Delta H^{-1}\begin{bmatrix} \partial p/\partial u \\ \partial p/\partial v \\ \partial p/\partial \sigma \end{bmatrix}.   (A30)

Since the stability analysis shows that the proposed algorithm converges to the stationary point (u_1, v_1, σ_1) and the approximation error ∆H^{-1} is bounded, we have

\Delta\dot{u} = -Y(Av - \sigma u) - Z(A^{T}u - \sigma v) \approx 0_m,   (A31)

\Delta\dot{v} = -Z^{T}(Av - \sigma u) - [X + \sigma^{-1}(I_n - vv^{T})](A^{T}u - \sigma v) \approx 0_n,   (A32)

\Delta\dot{\sigma} = 0   (A33)

in the vicinity of the stationary point (u_1, v_1, σ_1). Thus, the approximation error of (14) is approximately 0.

REFERENCES

[1] Kaiser A, Schenck W, Möller R. Coupled singular value decomposition of a cross-covariance matrix. International Journal of Neural Systems, 2010, 20(4): 293−318
[2] Cochocki A, Unbehauen R. Neural Networks for Optimization and Signal Processing. New York: John Wiley, 1993.
[3] Yuille A L, Kammen D M, Cohen D S. Quadrature and the development of orientation selective cortical cells by Hebb rules. Biological Cybernetics, 1989, 61(3): 183−194
[4] Feng D Z, Bao Z, Zhang X D. A cross-associative neural network for SVD of non-squared data matrix in signal processing. IEEE Transactions on Neural Networks, 2001, 12(5): 1215−1221
[5] Diamantaras K I, Kung S Y. Cross-correlation neural network models. IEEE Transactions on Signal Processing, 1994, 42(11): 3218−3223
[6] Bunch J R, Nielsen C P. Updating the singular value decomposition. Numerische Mathematik, 1978, 31(2): 111−129
[7] Comon P, Golub G H. Tracking a few extreme singular values and vectors in signal processing. Proceedings of the IEEE, 1990, 78(8): 1327−1343
[8] Ferzali W, Proakis J G. Adaptive SVD algorithm for covariance matrix eigenstructure computation. In: Proceedings of the 1990 International Conference on Acoustics, Speech, and Signal Processing. Albuquerque, NM: IEEE, 1990. 2615−2618
[9] Moonen M, Van Dooren P, Vandewalle J. A singular value decomposition updating algorithm for subspace tracking. SIAM Journal on Matrix Analysis and Applications, 1992, 13(4): 1015−1038
[10] Helmke U, Moore J B. Singular-value decomposition via gradient and self-equivalent flows. Linear Algebra and Its Applications, 1992, 169: 223−248
[11] Moore J B, Mahony R E, Helmke U. Numerical gradient algorithms for eigenvalue and singular value calculations. SIAM Journal on Matrix Analysis and Applications, 1994, 15(3): 881−902
[12] Hori G. A general framework for SVD flows and joint SVD flows. In: Proceedings of the 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing. Hong Kong: IEEE, 2003. II-693-6
[13] Feng D Z, Bao Z, Shi W X. Cross-correlation neural network models for the smallest singular component of general matrix. Signal Processing, 1998, 64(3): 333−346
[14] Feng D Z, Zhang X D, Bao Z. A neural network learning for adaptively extracting cross-correlation features between two high-dimensional data streams. IEEE Transactions on Neural Networks, 2004, 15(6): 1541−1554
[15] Hasan M A. A logarithmic cost function for principal singular component analysis. In: Proceedings of the 2008 IEEE International Conference on Acoustics, Speech, and Signal Processing. Las Vegas, NV: IEEE, 2008. 1933−1936
[16] Kong X Y, Ma H G, An Q S, Zhang Q. An effective neural learning algorithm for extracting cross-correlation feature between two high-dimensional data streams. Neural Processing Letters, 2015, 42(2): 459−477
[17] Cheng L, Hou Z G, Tan M. A simplified neural network for linear matrix inequality problems. Neural Processing Letters, 2009, 29(3): 213−230
[18] Cheng L, Hou Z G, Lin Y, Tan M, Zhang W C, Wu F X. Recurrent neural network for non-smooth convex optimization problems with application to the identification of genetic regulatory networks. IEEE Transactions on Neural Networks, 2011, 22(5): 714−726
[19] Liu Q S, Huang T W, Wang J. One-layer continuous- and discrete-time projection neural networks for solving variational inequalities and related optimization problems. IEEE Transactions on Neural Networks and Learning Systems, 2014, 25(7): 1308−1318
[20] Möller R, Könies A. Coupled principal component analysis. IEEE Transactions on Neural Networks, 2004, 15(1): 214−222
[21] Noble B, Daniel J W. Applied Linear Algebra (Third edition). Englewood Cliffs, NJ: Prentice Hall, 1988.
[22] Hotelling H. Some new methods in matrix calculation. The Annals of Mathematical Statistics, 1943, 14(1): 1−34
[23] Hotelling H. Further points on matrix calculation and simultaneous equations. The Annals of Mathematical Statistics, 1943, 14(4): 440−441

Xiaowei Feng graduated from Xi'an Research Institute of High Technology, China, in 2009, and received the master degree from Xi'an Research Institute of High Technology in 2009. He is currently a Ph.D. candidate at Xi'an Research Institute of High Technology. His research interests include neural networks, random signal processing, feature extraction, and fault diagnosis. Corresponding author of this paper.

Xiangyu Kong graduated from Beijing Institute of Technology, China, in 1990. He received the master degree from Xi'an Research Institute of High Technology, China, in 2000 and the Ph.D. degree in control science and engineering from Xi'an Jiaotong University, China, in 2005. He is currently an associate professor at Xi'an Research Institute of High Technology. His research interests include adaptive signal processing, nonlinear system modeling and its application, and fault diagnosis.

Hongguang Ma graduated from Northwestern Polytechnical University, China, in 1982. He received the M.S. degree from Xi'an Institute of Electronic Engineering, China, in 1990 and the Ph.D. degree from Xi'an Jiaotong University, China, in 2005. He is currently a professor at Beijing Institute of Technology, China. His research interests include nonlinear signal processing, data mining of time series, chaos and nonlinear dynamics, complex system analysis, and fault diagnosis.