Fundamental Tensor Operations for Large-Scale Data Analysis in Tensor Train Formats

Namgil Lee¹ and Andrzej Cichocki¹

¹Laboratory for Advanced Brain Signal Processing, RIKEN Brain Science Institute, Wako-shi, Saitama 351-0198, Japan
Abstract

We review and introduce new representations of tensor train decompositions for large-scale vectors, matrices, or low-order tensors. We provide extended definitions of multilinear operations such as the Kronecker, Hadamard, and contracted products, together with their properties for tensor calculus. We then introduce an effective low-rank tensor approximation technique, the tensor train (TT) format, with a number of mathematical and graphical representations, and we briefly review the mathematical properties of the TT format as a low-rank approximation technique. With the aim of breaking the curse-of-dimensionality in large-scale numerical analysis, we describe basic operations on large-scale vectors and matrices in TT format. The suggested representations can be used to describe numerical methods based on the TT format for solving large-scale optimization problems such as systems of linear equations and eigenvalue problems.

KEY WORDS: tensor train; matrix product states; matrix product operators; tensor networks; generalized Tucker model; strong Kronecker product; contracted product; multilinear operators; numerical analysis; tensor calculus
1 Introduction
Multi-dimensional, or multi-way, data are prevalent nowadays and can be represented by tensors. An Nth-order tensor is a multi-way array of size I1 × I2 × ··· × IN, where the nth dimension, or mode, is of size In. For example, a tensor can be induced by the discretization of a multivariate function [24]: given a multivariate function f(x1, ..., xN) defined on a domain [0, 1]^N, we obtain a tensor whose entries contain the function values at the grid points. As another example, tensors arise from observed data [2, 22]: neuroimaging technologies such as functional magnetic resonance imaging (fMRI) and electroencephalography (EEG) produce measurements that can be collected and integrated across different modes: subjects, time, frequency, electrodes, task conditions, trials, and so on. Furthermore, high-order tensors can be created by a process called tensorization or quantization [23], by which large-scale vectors and matrices are reshaped into higher-order tensors.

However, it is impossible to store a high-order tensor explicitly, because the number of entries, I^N when I = I1 = I2 = ··· = IN, grows exponentially with the order N. This is called the "curse-of-dimensionality". Even for I = 2, with N = 50 we obtain 2^50 ≈ 10^15 entries. Such huge storage and computational costs for high-dimensional problems prohibit the use of standard numerical algorithms. To make high-dimensional problems tractable, approximation methods have been developed, including sparse grids [32, 1] and low-rank tensor approximations [14, 24, 13]. In this paper, we focus on the latter approach, where computational operations are performed on tensor formats, i.e., low-parametric representations of tensors.

We consider several tensor formats, especially the tensor train (TT) format, which is one of the simplest tensor networks developed with the aim of overcoming the curse-of-dimensionality. Extensive overviews of modern low-rank tensor approximation techniques are presented in [24, 13]. The TT format is equivalent to the matrix product states (MPS) with open boundary conditions proposed in computational physics, and it has played a key role in density matrix renormalization group (DMRG) methods for simulating quantum many-body systems [33, 31]. It was later re-discovered
in the numerical analysis community [28, 30, 18]. TT-based numerical algorithms achieve algorithmic stability and adaptive determination of ranks by employing the singular value decomposition (SVD) [30]. The scope of application of the TT format is quickly expanding to high-dimensional problems such as multi-dimensional integrals, stochastic and parametric PDEs, computational finance, and machine learning [13]. On the other hand, a comprehensive survey of traditional low-rank tensor approximation techniques for the CP and Tucker formats is presented in [5, 22].

Despite the large interest in high-order tensors in TT format, mathematical representations of TT tensors are usually limited to representations based on scalar operations on matrices and vectors, which leads to complex and tedious index notation in tensor calculus. For example, a TT tensor is typically defined entry-wise as a product of matrices [30, 18]. On the other hand, representations of traditional low-rank tensor formats have been developed based on multilinear operations such as the Kronecker product, Khatri-Rao product, Hadamard product, and mode-n multilinear product [2, 22], which enable coordinate-free notation. Through the use of multilinear operations, the traditional tensor formats have expanded their areas of application to chemometrics, signal processing, numerical linear algebra, computer vision, data mining, graph analysis, and neuroscience [22].

In this work, we develop extended definitions of multilinear operations on tensors. Based on these tensor operations, we provide a number of new and useful representations of the TT format. We also provide graphical representations of the TT format, motivated by [18], which are helpful for understanding the underlying principles and TT-based numerical algorithms. Based on the TT representations of large-scale vectors and matrices, we show that basic numerical operations such as matrix-vector and matrix-matrix multiplications are conveniently described by the suggested representations. We demonstrate the usefulness of the proposed tensor operations in tensor calculus by giving a proof of the orthonormality of the so-called frame matrices. We derive explicit representations of localized linear maps in TT format that have been presented only implicitly in matrix form in the literature on the alternating linear scheme (ALS) for solving various optimization problems. The suggested mathematical operations and TT representations can further be applied to describing TT-based numerical methods such as solvers for large-scale systems of linear equations and eigenvalue problems [18].

This paper is organized as follows. In Section 2, we introduce notations for tensors and definitions of tensor operations. In Section 3, we provide the mathematical and graphical representations of the TT format, and we review mathematical properties of the TT format as a low-rank approximation. In Section 4, we describe basic numerical operations on tensors in TT format, such as the addition, Hadamard product, matrix-vector multiplication, and quadratic form, in terms of the multilinear operations and TT representations. Discussion and conclusions are given in Section 5.
2 Notations for tensors and tensor operations
The notations in this paper follow the convention provided by [2, 22]. Table 1 summarizes the notations for tensors. Scalars, vectors, and matrices are denoted by lowercase, lowercase bold, and uppercase bold letters x, x, and X, respectively. Tensors are denoted by underlined uppercase bold letters X. The (i1, i2, ..., iN)th entry of X of size I1 × I2 × ··· × IN is denoted by x_{i1,...,iN} or X(i1, ..., iN). A subtensor of X obtained by fixing the indices i3, ..., iN is denoted by X_{:,:,i3,...,iN} or X(:, :, i3, ..., iN). We may omit ':' and write X_{i3,...,iN} or X(i3, ..., iN) if the remaining indices are clear to readers. The mode-n matricization of X ∈ R^{I1×···×IN} is denoted by X_(n) ∈ R^{In × I1···In−1 In+1···IN}. We denote the mode-(1,...,n) matricization of X by X_([n]) ∈ R^{I1···In × In+1···IN}, in the sense that [n] ≡ {1, 2, ..., n} is the set of integers from 1 to n.

In addition, we define the multi-index notation by

i1 i2 ··· iN = iN + (iN−1 − 1) IN + ··· + (i1 − 1) I2 I3 ··· IN

for in = 1, 2, ..., In, n = 1, ..., N. Using this notation, we can write an entry of a Kronecker product as (a ⊗ b)_{ij} = a_i b_j. Moreover, it is important to note that in this paper the vectorization and matricization are defined in accordance with the multi-index notation. That is, for X ∈ R^{I1×I2×···×IN}, we have

x = vec(X) ∈ R^{I1 I2 ··· IN}   ⇔   x(i1 i2 ··· iN) = X(i1, i2, ..., iN),
X = X_(n) ∈ R^{In × I1···In−1 In+1···IN}   ⇔   X(in, i1···in−1 in+1···iN) = X(i1, i2, ..., iN),
X = X_([n]) ∈ R^{I1···In × In+1···IN}   ⇔   X(i1···in, in+1···iN) = X(i1, i2, ..., iN),
for n = 1, ..., N. Table 2 summarizes the notations and definitions for the tensor operations used in this paper.
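As a concrete illustration, the multi-index convention above coincides with NumPy's row-major (C-order) linearization, so vec and the mode-([n]) matricizations are plain reshapes. The following minimal sketch (ours, not from the paper; 0-based indices) checks this:

```python
import numpy as np

# The multi-index i1 i2 ... iN (i1 most significant) is exactly NumPy's
# row-major (C-order) linearization, so vec(X) and X([n]) are plain reshapes.
I1, I2, I3 = 2, 3, 4
X = np.arange(I1 * I2 * I3).reshape(I1, I2, I3)

x = X.reshape(-1)                      # vec(X): x(i1 i2 i3) = X(i1, i2, i3)
X_12 = X.reshape(I1 * I2, I3)          # X([2]): rows indexed by i1 i2

i1, i2, i3 = 1, 2, 3                   # 0-based indices
assert x[(i1 * I2 + i2) * I3 + i3] == X[i1, i2, i3]
assert X_12[i1 * I2 + i2, i3] == X[i1, i2, i3]

# The mode-n matricization X(n) additionally brings mode n to the front.
n = 1
X_n = np.moveaxis(X, n, 0).reshape(I2, -1)
assert X_n[i2, i1 * I3 + i3] == X[i1, i2, i3]
```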
Table 1: Notations for tensors

X ∈ R^{I1×I2×···×IN} : Nth-order tensor of size I1 × I2 × ··· × IN
x, x, X : scalar, vector, and matrix
x_{i1,...,iN}, X(i1, ..., iN) : (i1, ..., iN)th entry of X
x_{i2,...,iN}, X(:, i2, ..., iN), X(i2, ..., iN) : mode-1 fiber of X
X_{i3,...,iN}, X(:, :, i3, ..., iN), X(i3, ..., iN) : frontal slice of X
X_(n) ∈ R^{In × I1···In−1 In+1···IN} : mode-n unfolding of X
X_([n]) ∈ R^{I1···In × In+1···IN} : mode-(1,...,n) unfolding of X
G^(n), X^(n), A^(n) : core tensors
R, Rn : ranks
i1 i2 ··· iN : multi-index iN + (iN−1 − 1) IN + ··· + (i1 − 1) I2 I3 ··· IN

2.1 Kronecker, Hadamard, and outer products
Definitions of traditional matrix-matrix products such as the Kronecker, Hadamard, and outer products can be generalized to tensor-tensor products.

Definition 2.1 (Kronecker product). The Kronecker product of A ∈ R^{I1×I2×···×IN} and B ∈ R^{J1×J2×···×JN} is defined by C = A ⊗ B ∈ R^{I1J1×I2J2×···×INJN} with entries

C(i1 j1, i2 j2, ..., iN jN) = A(i1, i2, ..., iN) B(j1, j2, ..., jN).

The mode-n Kronecker product of A ∈ R^{I1×···×In−1×J×In+1×···×IN} and B ∈ R^{I1×···×In−1×K×In+1×···×IN} is defined by C = A ⊗n B ∈ R^{I1×···×In−1×JK×In+1×···×IN} with mode-n fibers

C(i1, ..., in−1, :, in+1, ..., iN) = A(i1, ..., in−1, :, in+1, ..., iN) ⊗ B(i1, ..., in−1, :, in+1, ..., iN).

Similarly, the mode-n̄ Kronecker product of A ∈ R^{I1×···×IN} and B ∈ R^{J1×···×JN} with common mode size In = Jn is defined by C = A ⊗n̄ B ∈ R^{I1J1×···×In−1Jn−1×In×In+1Jn+1×···×INJN} with subtensors

C(:, ..., :, in, :, ..., :) = A(:, ..., :, in, :, ..., :) ⊗ B(:, ..., :, in, :, ..., :)

for each in = 1, 2, ..., In.

Definition 2.2 (Hadamard product). The Hadamard (elementwise) product of A ∈ R^{I1×I2×···×IN} and B ∈ R^{I1×I2×···×IN} is defined by C = A ⊛ B ∈ R^{I1×I2×···×IN} with entries

C(i1, i2, ..., iN) = A(i1, i2, ..., iN) B(i1, i2, ..., iN).

Definition 2.3 (Outer product). The outer product of A ∈ R^{I1×I2×···×IM} and B ∈ R^{J1×J2×···×JN} is defined by C = A ◦ B ∈ R^{I1×I2×···×IM×J1×J2×···×JN} with entries

C(i1, i2, ..., iM, j1, j2, ..., jN) = A(i1, i2, ..., iM) B(j1, j2, ..., jN).
Table 2: Notations and definitions for tensor operations

C = A ⊗ B : Kronecker product of A ∈ R^{I1×···×IN} and B ∈ R^{J1×···×JN}; yields a tensor C of size I1J1 × ··· × INJN with entries C(i1j1, ..., iNjN) = A(i1, ..., iN) B(j1, ..., jN)

C = A ⊗n B : mode-n Kronecker product of A ∈ R^{I1×···×In−1×J×In+1×···×IN} and B ∈ R^{I1×···×In−1×K×In+1×···×IN}; yields a tensor C of size I1 × ··· × In−1 × JK × In+1 × ··· × IN with mode-n fibers C(i1, ..., in−1, :, in+1, ..., iN) = A(i1, ..., in−1, :, in+1, ..., iN) ⊗ B(i1, ..., in−1, :, in+1, ..., iN)

C = A ⊗n̄ B : mode-n̄ Kronecker product of A ∈ R^{I1×···×IN} and B ∈ R^{J1×···×JN} with In = Jn; yields a tensor C of size I1J1 × ··· × In−1Jn−1 × In × In+1Jn+1 × ··· × INJN with subtensors C(:, ..., :, in, :, ..., :) = A(:, ..., :, in, :, ..., :) ⊗ B(:, ..., :, in, :, ..., :)

C = A ⊛ B : Hadamard (elementwise) product of A ∈ R^{I1×···×IN} and B ∈ R^{I1×···×IN}; yields a tensor C of size I1 × ··· × IN with entries C(i1, ..., iN) = A(i1, ..., iN) B(i1, ..., iN)

C = A ◦ B : outer product of A ∈ R^{I1×···×IM} and B ∈ R^{J1×···×JN}; yields a tensor C of size I1 × ··· × IM × J1 × ··· × JN with entries C(i1, ..., iM, j1, ..., jN) = A(i1, ..., iM) B(j1, ..., jN)

C = A ⊕ B : direct sum of A ∈ R^{I1×···×IN} and B ∈ R^{J1×···×JN}; yields a tensor C of size (I1+J1) × ··· × (IN+JN) with entries C(k1, ..., kN) = A(k1, ..., kN) if 1 ≤ kn ≤ In ∀n, C(k1, ..., kN) = B(k1 − I1, ..., kN − IN) if In < kn ≤ In + Jn ∀n, and C(k1, ..., kN) = 0 otherwise

C = A ⊕n B : mode-n direct sum of A ∈ R^{I1×···×In−1×J×In+1×···×IN} and B ∈ R^{I1×···×In−1×K×In+1×···×IN}; yields a tensor C of size I1 × ··· × In−1 × (J+K) × In+1 × ··· × IN with mode-n fibers C(i1, ..., in−1, :, in+1, ..., iN) = A(i1, ..., in−1, :, in+1, ..., iN) ⊕ B(i1, ..., in−1, :, in+1, ..., iN)

C = A ⊕n̄ B : mode-n̄ direct sum of A ∈ R^{I1×···×IN} and B ∈ R^{J1×···×JN} with In = Jn; yields a tensor C of size (I1+J1) × ··· × (In−1+Jn−1) × In × (In+1+Jn+1) × ··· × (IN+JN) with subtensors C(:, ..., :, in, :, ..., :) = A(:, ..., :, in, :, ..., :) ⊕ B(:, ..., :, in, :, ..., :)

C = A ×n B : mode-n product [22] of a tensor A ∈ R^{I1×···×IN} and a matrix B ∈ R^{J×In}; yields a tensor C of size I1 × ··· × In−1 × J × In+1 × ··· × IN with mode-n fibers C(i1, ..., in−1, :, in+1, ..., iN) = B A(i1, ..., in−1, :, in+1, ..., iN)

C = A ×̄n b : mode-n (vector) product [22] of a tensor A ∈ R^{I1×···×IN} and a vector b ∈ R^{In}; yields a tensor C of size I1 × ··· × In−1 × In+1 × ··· × IN with entries C(i1, ..., in−1, in+1, ..., iN) = b^T A(i1, ..., in−1, :, in+1, ..., iN)

C = A • B : (mode-(M,1)) contracted product of a tensor A ∈ R^{I1×···×IM} and a tensor B ∈ R^{IM×J2×J3×···×JN}; yields a tensor C of size I1 × ··· × IM−1 × J2 × ··· × JN with entries C(i1, ..., iM−1, j2, ..., jN) = Σ_{iM=1}^{IM} A(i1, ..., iM) B(iM, j2, ..., jN)

⟦G; A^(1), ..., A^(N)⟧ : Tucker operator defined by (2)

C = A |⊗| B : strong Kronecker product of two block matrices A = [A_{r1,r2}] ∈ R^{R1I1×R2I2} and B = [B_{r2,r3}] ∈ R^{R2J1×R3J2}; yields a block matrix C = [C_{r1,r3}] ∈ R^{R1I1J1×R3I2J2} with blocks C_{r1,r3} = Σ_{r2=1}^{R2} A_{r1,r2} ⊗ B_{r2,r3}

C = A |⊗| B : strong Kronecker product of two block tensors A = [A_{r1,r2}] ∈ R^{R1I1×R2I2×I3} and B = [B_{r2,r3}] ∈ R^{R2J1×R3J2×J3}; yields a block tensor C = [C_{r1,r3}] ∈ R^{R1I1J1×R3I2J2×I3J3} with blocks C_{r1,r3} = Σ_{r2=1}^{R2} A_{r1,r2} ⊗ B_{r2,r3}

S(X) : self-contraction operator S : R^{I1×I2×···×IN×I1} → R^{I2×I3×···×IN} defined by (3)

⟨X^(n), Y^(n)⟩ : core contraction of two TT-cores defined by (23)
Note that an Nth-order tensor X ∈ R^{I1×I2×···×IN} is rank-one if it can be written as the outer product of N vectors,

X = v^(1) ◦ v^(2) ◦ ··· ◦ v^(N).

In general, an Nth-order tensor X ∈ R^{I1×I2×···×IN} can be represented as a sum of rank-one tensors, the so-called CP or PARAFAC format [22],

X = Σ_{r=1}^{R} v_r^(1) ◦ v_r^(2) ◦ ··· ◦ v_r^(N).
The smallest number R of rank-one tensors that produce X is called the tensor rank of X [22]. It is possible to define a tensor operation between rank-one tensors and generalize it to sums of rank-one tensors. For example, given two rank-one tensors A = a^(1) ◦ a^(2) ◦ ··· ◦ a^(N) and B = b^(1) ◦ b^(2) ◦ ··· ◦ b^(N), and writing a^(n) ⊛ b^(n) for the Hadamard (elementwise) product of vectors,

• the Kronecker product A ⊗ B can be written as A ⊗ B = (a^(1) ⊗ b^(1)) ◦ (a^(2) ⊗ b^(2)) ◦ ··· ◦ (a^(N) ⊗ b^(N)),
• the mode-n Kronecker product as A ⊗n B = (a^(1) ⊛ b^(1)) ◦ ··· ◦ (a^(n−1) ⊛ b^(n−1)) ◦ (a^(n) ⊗ b^(n)) ◦ (a^(n+1) ⊛ b^(n+1)) ◦ ··· ◦ (a^(N) ⊛ b^(N)),
• the mode-n̄ Kronecker product as A ⊗n̄ B = (a^(1) ⊗ b^(1)) ◦ ··· ◦ (a^(n−1) ⊗ b^(n−1)) ◦ (a^(n) ⊛ b^(n)) ◦ (a^(n+1) ⊗ b^(n+1)) ◦ ··· ◦ (a^(N) ⊗ b^(N)),
• the Hadamard product as A ⊛ B = (a^(1) ⊛ b^(1)) ◦ (a^(2) ⊛ b^(2)) ◦ ··· ◦ (a^(N) ⊛ b^(N)),
• and the outer product as A ◦ B = a^(1) ◦ ··· ◦ a^(N) ◦ b^(1) ◦ ··· ◦ b^(N).

However, the problem of determining the tensor rank of a specific tensor is NP-hard if the order is larger than 2 [16]. So, for practical applications, we will define tensor operations by using index notation and provide examples with rank-one tensors, as in the sketch below.
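The following minimal NumPy sketch (ours; the helper outer is our own naming) verifies the Hadamard and Kronecker identities above for rank-one tensors:

```python
import numpy as np

def outer(*vs):
    """Outer product v1 ∘ v2 ∘ ... ∘ vN as an Nth-order array."""
    X = vs[0]
    for v in vs[1:]:
        X = np.tensordot(X, v, axes=0)
    return X

sizes = (2, 3, 4)
a = [np.random.randn(I) for I in sizes]
b = [np.random.randn(I) for I in sizes]
A, B = outer(*a), outer(*b)

# Hadamard product of rank-one tensors: factor-wise Hadamard products.
assert np.allclose(A * B, outer(*[an * bn for an, bn in zip(a, b)]))

# Kronecker product of rank-one tensors: factor-wise Kronecker products,
# checked at one entry via the multi-index (in jn) = in * Jn + jn (0-based).
K = outer(*[np.kron(an, bn) for an, bn in zip(a, b)])
i, j = (1, 2, 3), (0, 1, 2)
idx = tuple(ii * Jn + jj for ii, jj, Jn in zip(i, j, sizes))
assert np.isclose(K[idx], A[i] * B[j])
```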
2.2 Direct sum
The direct sum of matrices A and B is defined by

A ⊕ B = diag(A, B) = [ A 0 ; 0 B ].
A generalization of the direct sum to tensors is defined as follows.

Definition 2.4 (Direct sum). The direct sum of tensors A ∈ R^{I1×I2×···×IN} and B ∈ R^{J1×J2×···×JN} is defined by C = A ⊕ B ∈ R^{(I1+J1)×(I2+J2)×···×(IN+JN)} with entries

C(k1, k2, ..., kN) =
  A(k1, k2, ..., kN)                  if 1 ≤ kn ≤ In ∀n,
  B(k1 − I1, k2 − I2, ..., kN − IN)   if In < kn ≤ In + Jn ∀n,
  0                                   otherwise.
The mode-n direct sum of A ∈ R^{I1×···×In−1×J×In+1×···×IN} and B ∈ R^{I1×···×In−1×K×In+1×···×IN} is defined by C = A ⊕n B ∈ R^{I1×···×In−1×(J+K)×In+1×···×IN} with mode-n fibers

C(i1, ..., in−1, :, in+1, ..., iN) = A(i1, ..., in−1, :, in+1, ..., iN) ⊕ B(i1, ..., in−1, :, in+1, ..., iN).

The mode-n̄ direct sum of A ∈ R^{I1×I2×···×IN} and B ∈ R^{J1×J2×···×JN} with common mode size In = Jn is defined by C = A ⊕n̄ B ∈ R^{(I1+J1)×···×(In−1+Jn−1)×In×(In+1+Jn+1)×···×(IN+JN)} with subtensors

C(:, ..., :, in, :, ..., :) = A(:, ..., :, in, :, ..., :) ⊕ B(:, ..., :, in, :, ..., :)

for each in = 1, 2, ..., In. As special cases, the direct sum of vectors a ∈ R^I and b ∈ R^J is the concatenation a ⊕ b ∈ R^{I+J}, and the direct sum of matrices A ∈ R^{I1×I2} and B ∈ R^{J1×J2} is the block diagonal matrix A ⊕ B = diag(A, B) ∈ R^{(I1+J1)×(I2+J2)}. We adopt the convention that the direct sum of scalars a, b ∈ R is the addition a ⊕ b = a + b ∈ R.
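A minimal NumPy sketch of Definition 2.4 (the helper direct_sum is our own naming): the two tensors are embedded into opposite corners of a zero tensor.

```python
import numpy as np

def direct_sum(A, B):
    """Direct sum A ⊕ B of two tensors of the same order (Definition 2.4)."""
    C = np.zeros(tuple(i + j for i, j in zip(A.shape, B.shape)),
                 dtype=np.result_type(A, B))
    C[tuple(slice(0, i) for i in A.shape)] = A       # A in the leading corner
    C[tuple(slice(i, None) for i in A.shape)] = B    # B in the trailing corner
    return C

A, B = np.random.randn(2, 3), np.random.randn(4, 5)
C = direct_sum(A, B)                                 # block diagonal diag(A, B)
assert C.shape == (6, 8)
assert np.allclose(C[:2, :3], A) and np.allclose(C[2:, 3:], B)
assert np.allclose(C[:2, 3:], 0) and np.allclose(C[2:, :3], 0)
```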
2.3 Tucker operator and contracted product
The mode-n product, denoted by ×n, is a multilinear operator between a tensor and a matrix [22]. Kolda and Bader [22] further introduced a multilinear operator called the Tucker operator [21] to simplify expressions involving mode-n products. The mode-n product of a tensor G ∈ R^{R1×R2×···×RN} and a matrix A ∈ R^{In×Rn} is defined by X = G ×n A ∈ R^{R1×···×Rn−1×In×Rn+1×···×RN} with entries

X(r1, ..., rn−1, in, rn+1, ..., rN) = Σ_{rn=1}^{Rn} G(r1, r2, ..., rN) A(in, rn),

or, in matrix form, X_(n) = A G_(n). The standard Tucker operator of a tensor G ∈ R^{R1×R2×···×RN} and matrices A^(n) ∈ R^{In×Rn}, n = 1, ..., N, is defined by

⟦G; A^(1), A^(2), ..., A^(N)⟧ = G ×1 A^(1) ×2 A^(2) ×3 ··· ×N A^(N) ∈ R^{I1×I2×···×IN}.   (1)

Here, we generalize this to a multilinear operator between tensors.

Definition 2.5 (Tucker operator). Let N ≥ 1 and Mn ≥ 0 (n = 1, 2, ..., N). For an Nth-order tensor G ∈ R^{R1×R2×···×RN} and (Mn+1)th-order tensors A^(n) ∈ R^{In,1×In,2×···×In,Mn×Rn}, n = 1, ..., N, the Tucker operator is a multilinear operator defined by the (M1 + M2 + ··· + MN)th-order tensor

X = ⟦G; A^(1), ..., A^(N)⟧ ∈ R^{I1,1×···×I1,M1×···×IN,1×···×IN,MN}

with entries

X(i1, i2, ..., iN) = Σ_{r1=1}^{R1} Σ_{r2=1}^{R2} ··· Σ_{rN=1}^{RN} G(r1, r2, ..., rN) A^(1)(i1, r1) A^(2)(i2, r2) ··· A^(N)(iN, rN),   (2)

where in = (in,1, in,2, ..., in,Mn) denotes the ordered indices.
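Before turning to special cases, a minimal NumPy illustration (ours) of the Tucker operator in the standard matrix case Mn = 1: the einsum expression follows the entry-wise formula (2), and the loop applies successive mode-n products via X_(n) = A G_(n).

```python
import numpy as np

G = np.random.randn(2, 3, 4)
A1, A2, A3 = np.random.randn(5, 2), np.random.randn(6, 3), np.random.randn(7, 4)

# Entry-wise formula (2) for the standard Tucker operator [[G; A1, A2, A3]].
X = np.einsum('pqr,ip,jq,kr->ijk', G, A1, A2, A3)

# Same result by successive mode-n products: contract A against mode n of G.
Y = G
for n, A in enumerate((A1, A2, A3)):
    Y = np.moveaxis(np.tensordot(A, Y, axes=(1, n)), 0, n)
assert np.allclose(X, Y)
```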
In the special case that the A^(n) are matrices, i.e., Mn = 1, we recover the standard Tucker operator (1). Even in the case of vectors a^(n) ∈ R^{Rn}, i.e., Mn = 0, we obtain the scalar

⟦G; a^(1), a^(2), ..., a^(N)⟧ = G ×̄1 a^(1) ×̄2 a^(2) ×̄3 ··· ×̄N a^(N) ∈ R,

where ×̄n is the mode-n (vector) product [22]. For example, let G = g^(1) ◦ g^(2) ◦ g^(3) ∈ R^{R1×R2×R3} and A = a^(1) ◦ a^(2) ◦ a^(3) ◦ a^(4) ◦ a^(5) ∈ R^{I1×I2×I3×I4×R2} be rank-one tensors. Then

⟦G; I_{R1}, A, I_{R3}⟧ = ⟨g^(2), a^(5)⟩ g^(1) ◦ a^(1) ◦ a^(2) ◦ a^(3) ◦ a^(4) ◦ g^(3) ∈ R^{R1×I1×I2×I3×I4×R3},

where ⟨v, w⟩ = v^T w is the inner product of vectors. In general, we can derive the following properties.

Proposition 2.6. Let N ≥ 1 and Mn ≥ 0 (n = 1, 2, ..., N). Let GX and GY be Nth-order tensors, let A^(n) and B^(n) be (Mn+1)th-order tensors, n = 1, 2, ..., N, and let

X = ⟦GX; A^(1), ..., A^(N)⟧ and Y = ⟦GY; B^(1), ..., B^(N)⟧

have the same sizes. Then

(a) X ⊗ Y = ⟦GX ⊗ GY; A^(1) ⊗ B^(1), ..., A^(N) ⊗ B^(N)⟧,
(b) X ⊛ Y = ⟦GX ⊗ GY; A^(1) ⊗_{M1+1} B^(1), ..., A^(N) ⊗_{MN+1} B^(N)⟧,
(c) X ⊕ Y = ⟦GX ⊕ GY; A^(1) ⊕ B^(1), ..., A^(N) ⊕ B^(N)⟧,
(d) X + Y = ⟦GX ⊕ GY; A^(1) ⊕_{M1+1} B^(1), ..., A^(N) ⊕_{MN+1} B^(N)⟧.

Example. We provide examples where the A^(n) and B^(n) are either matrices or vectors.

• Let Mn = 1 (n = 1, 2, ..., N), i.e., X and Y are in Tucker format:

X = ⟦GX; A^(1), ..., A^(N)⟧ ∈ R^{I1×I2×···×IN},
Y = ⟦GY; B^(1), ..., B^(N)⟧ ∈ R^{I1×I2×···×IN}.

It follows that the Kronecker product, Hadamard product, direct sum, and addition are also given in Tucker format:

(a) X ⊗ Y = ⟦GX ⊗ GY; A^(1) ⊗ B^(1), ..., A^(N) ⊗ B^(N)⟧,
(b) X ⊛ Y = ⟦GX ⊗ GY; A^(1) ⊗2 B^(1), ..., A^(N) ⊗2 B^(N)⟧,
(c) X ⊕ Y = ⟦GX ⊕ GY; A^(1) ⊕ B^(1), ..., A^(N) ⊕ B^(N)⟧,
(d) X + Y = ⟦GX ⊕ GY; A^(1) ⊕2 B^(1), ..., A^(N) ⊕2 B^(N)⟧.

In the case that X and Y are in CP format, where the core tensors GX and GY are super-diagonal, the results are also given in CP format, because the Kronecker product and direct sum of super-diagonal core tensors are again super-diagonal tensors.

• Let Mn = 0 (n = 1, 2, ..., N); then we have the scalars

x = ⟦GX; a^(1), ..., a^(N)⟧ ∈ R,
y = ⟦GY; b^(1), ..., b^(N)⟧ ∈ R.

The multiplication and addition are given in the form
(a) xy = x ⊗ y = x ⊛ y = ⟦GX ⊗ GY; a^(1) ⊗ b^(1), ..., a^(N) ⊗ b^(N)⟧,
(b) x + y = x ⊕ y = ⟦GX ⊕ GY; a^(1) ⊕ b^(1), ..., a^(N) ⊕ b^(N)⟧.

As a special case, we introduce a tensor-tensor contracted product as follows.

Definition 2.7 ((Mode-(M,1)) contracted product). Let M ≥ 1 and N ≥ 1. The (mode-(M,1)) contracted product of tensors A ∈ R^{I1×I2×···×IM} and B ∈ R^{IM×J2×···×JN} is defined by C = A • B ∈ R^{I1×···×IM−1×J2×···×JN} with entries

C(i1, ..., iM−1, j2, ..., jN) = Σ_{iM=1}^{IM} A(i1, ..., iM) B(iM, j2, ..., jN).
We note that the tensor-tensor contracted product defined above is a natural generalization of the matrix multiplication, AB = A • B, and of the vector inner product, ⟨a, b⟩ = a • b. In particular, the contracted products of a tensor A ∈ R^{I1×···×IM} with vectors p ∈ R^{I1} and q ∈ R^{IM} lead to tensors of smaller orders:

p • A ∈ R^{I2×···×IM},   A • q ∈ R^{I1×···×IM−1}.

The contracted product of rank-one tensors yields

(a^(1) ◦ ··· ◦ a^(M)) • (b^(1) ◦ ··· ◦ b^(N)) = (a^(M) • b^(1)) a^(1) ◦ ··· ◦ a^(M−1) ◦ b^(2) ◦ ··· ◦ b^(N).

In general, we have the following properties.

Proposition 2.8. Let A ∈ R^{I1×···×IM}, B ∈ R^{IM×J2×···×JN}, C ∈ R^{JN×K2×···×KL}, G ∈ R^{R1×···×RN}, P ∈ R^{I×R1}, and Q ∈ R^{RN×J}. Then

(a) (A • B) • C = A • (B • C),
(b) A • B = ⟦B; A, I_{J2}, ..., I_{JN}⟧,
(c) P • G = G ×1 P,
(d) G • Q = G ×N Q^T, and
(e) (A • B)_([m]) = A_([m]) (I_{Im+1} ⊗ I_{Im+2} ⊗ ··· ⊗ I_{IM−1} ⊗ B_(1)) for m = 1, 2, ..., M − 1.

Proof of (e). Note that A • B ∈ R^{I1×···×IM−1×J2×···×JN}. For 1 ≤ m ≤ M − 1, we have

(A • B)_([m])(i1···im, im+1···iM−1 j2···jN) = (A • B)(i1, ..., iM−1, j2, ..., jN) = Σ_{iM=1}^{IM} A(i1, ..., iM) B(iM, j2, ..., jN).

By inserting auxiliary variables km+1, ..., kM−1,

Σ_{iM=1}^{IM} A(i1, ..., iM) B(iM, j2, ..., jN) = Σ_{km+1=1}^{Im+1} ··· Σ_{kM−1=1}^{IM−1} Σ_{iM=1}^{IM} A(i1, ..., im, km+1, ..., kM−1, iM) B(iM, j2, ..., jN) δ(km+1, im+1) ··· δ(kM−1, iM−1).

We also have

A(i1, ..., im, km+1, ..., kM−1, iM) = A_([m])(i1···im, km+1···kM−1 iM)

and

B(iM, j2, ..., jN) δ(km+1, im+1) ··· δ(kM−1, iM−1) = I_{Im+1}(km+1, im+1) ··· I_{IM−1}(kM−1, iM−1) B_(1)(iM, j2···jN) = (I_{Im+1} ⊗ ··· ⊗ I_{IM−1} ⊗ B_(1))(km+1···kM−1 iM, im+1···iM−1 j2···jN).

It follows that (A • B)_([m]) = A_([m]) (I_{Im+1} ⊗ ··· ⊗ I_{IM−1} ⊗ B_(1)). ∎
Proposition 2.9. Let A^(n) and B^(n) be Mn-th order tensors, n = 1, 2, and let

X = A^(1) • A^(2) and Y = B^(1) • B^(2)

have the same sizes. Then

(a) X ⊗ Y = (A^(1) ⊗ B^(1)) • (A^(2) ⊗ B^(2)),
(b) X ⊛ Y = (A^(1) ⊗_{M1} B^(1)) • (A^(2) ⊗1 B^(2)),
(c) X ⊕ Y = (A^(1) ⊕ B^(1)) • (A^(2) ⊕ B^(2)),
(d) X + Y = (A^(1) ⊕_{M1} B^(1)) • (A^(2) ⊕1 B^(2)).
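A minimal NumPy sketch (ours; the helpers cprod and tkron are our own names for the contracted and tensor Kronecker products) illustrating Definition 2.7 and verifying Proposition 2.9(a):

```python
import numpy as np

def cprod(A, B):
    """Mode-(M,1) contracted product A • B (Definition 2.7): contract the
    last mode of A with the first mode of B."""
    return np.tensordot(A, B, axes=(-1, 0))

def tkron(A, B):
    """Tensor Kronecker product (Definition 2.1) of equal-order tensors:
    interleave the modes of A and B, then merge each (in, jn) pair."""
    N = A.ndim
    P = np.tensordot(A, B, axes=0)                    # modes i1..iN, j1..jN
    P = P.transpose([k for n in range(N) for k in (n, N + n)])
    return P.reshape([A.shape[n] * B.shape[n] for n in range(N)])

A, B = np.random.randn(2, 3, 4), np.random.randn(4, 5)
Ak, Bk = np.random.randn(3, 2, 6), np.random.randn(6, 7)

# Proposition 2.9(a): (A • B) ⊗ (Ak • Bk) = (A ⊗ Ak) • (B ⊗ Bk).
lhs = tkron(cprod(A, B), cprod(Ak, Bk))
rhs = cprod(tkron(A, Ak), tkron(B, Bk))
assert np.allclose(lhs, rhs)
```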
2.4 Self-contraction operator
We will define a linear operator, S, called the self-contraction, which generalizes the trace of matrices to tensors.

Definition 2.10 (Self-contraction). The self-contraction, S : R^{I1×I2×···×IN×I1} → R^{I2×I3×···×IN}, N ≥ 1, is a linear operator defined by Y = S(X) with

Y(i2, i3, ..., iN) = Σ_{i1=1}^{I1} X(i1, i2, ..., iN, i1).   (3)

The self-contraction is a generalization of the matrix trace: for a matrix A ∈ R^{I×I}, S(A) = tr(A) ∈ R. A more formal definition is given by using the contracted product as

S(X) = Σ_{i1=1}^{I1} e_{i1} • X • e_{i1},   (4)

where e_{i1} = [0, ..., 1, ..., 0]^T ∈ R^{I1} is the i1-th standard basis vector. The self-contraction of a rank-one tensor is calculated by

S(a^(1) ◦ a^(2) ◦ ··· ◦ a^(N+1)) = ⟨a^(1), a^(N+1)⟩ a^(2) ◦ ··· ◦ a^(N),

where ⟨v, w⟩ = v^T w. For a tensor X ∈ R^{I1×I2×···×IN×I1}, the (i2, i3, ..., iN)th entry of S(X) equals the matrix trace of the (i2, i3, ..., iN)th slice,

(S(X))_{i2,i3,...,iN} = tr(X(:, i2, i3, ..., iN, :)).

In addition, let A ∈ R^{I1×I2×···×IM+1} and B ∈ R^{IM+1×J2×J3×···×JN×I1}. Then

(S(A • B))_{i2,...,iM,j2,...,jN} = (S(B • A))_{j2,...,jN,i2,...,iM}.

As a special case, if M = N = 2, then (S(A • B))^T = S(B • A).
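Since S(X) traces out the first and last modes, it can be evaluated directly with NumPy's trace. A minimal sketch (ours), also checking the transpose property above:

```python
import numpy as np

# S(X) traces out the first and last modes; np.trace does exactly this.
X = np.random.randn(3, 4, 5, 3)                      # I1 x I2 x I3 x I1
S = np.trace(X, axis1=0, axis2=-1)                   # S(X), shape (4, 5)
assert np.isclose(S[1, 2], np.trace(X[:, 1, 2, :]))  # slice-wise matrix trace

# Transpose property for M = N = 2: (S(A • B))^T = S(B • A).
A = np.random.randn(3, 4, 6)                         # I1 x I2 x I3
B = np.random.randn(6, 5, 3)                         # I3 x J2 x I1
SAB = np.trace(np.tensordot(A, B, axes=(-1, 0)), axis1=0, axis2=-1)
SBA = np.trace(np.tensordot(B, A, axes=(-1, 0)), axis1=0, axis2=-1)
assert np.allclose(SAB, SBA.T)
```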
2.5 Strong Kronecker product
The strong Kronecker product is an important tool in low-rank TT decompositions of large-scale tensors. We present the definitions of the strong Kronecker product [27] and its generalization to tensors.
Definition 2.11 (Strong Kronecker product). Let A and B be R1 × R2 and R2 × R3 block matrices,

A = [A_{r1,r2}] = [ A_{1,1} ··· A_{1,R2} ; ⋮ ⋱ ⋮ ; A_{R1,1} ··· A_{R1,R2} ] ∈ R^{R1 I1 × R2 I2},
B = [B_{r2,r3}] = [ B_{1,1} ··· B_{1,R3} ; ⋮ ⋱ ⋮ ; B_{R2,1} ··· B_{R2,R3} ] ∈ R^{R2 J1 × R3 J2},

where A_{r1,r2} ∈ R^{I1×I2} and B_{r2,r3} ∈ R^{J1×J2}. The strong Kronecker product of A and B is defined by the R1 × R3 block matrix

C = [C_{r1,r3}] = A |⊗| B ∈ R^{R1 I1 J1 × R3 I2 J2}, where C_{r1,r3} = Σ_{r2=1}^{R2} A_{r1,r2} ⊗ B_{r2,r3} ∈ R^{I1 J1 × I2 J2}

for r1 = 1, 2, ..., R1 and r3 = 1, 2, ..., R3.

More generally, let A = [A_{r1,r2}] ∈ R^{R1 I1 × R2 I2 × I3} and B = [B_{r2,r3}] ∈ R^{R2 J1 × R3 J2 × J3} be R1 × R2 and R2 × R3 block tensors, where A_{r1,r2} ∈ R^{I1×I2×I3} and B_{r2,r3} ∈ R^{J1×J2×J3} are 3rd-order tensors. Then the strong Kronecker product of A and B is defined by the R1 × R3 block tensor

C = [C_{r1,r3}] = A |⊗| B ∈ R^{R1 I1 J1 × R3 I2 J2 × I3 J3}, where C_{r1,r3} = Σ_{r2=1}^{R2} A_{r1,r2} ⊗ B_{r2,r3} ∈ R^{I1 J1 × I2 J2 × I3 J3}

for r1 = 1, 2, ..., R1 and r3 = 1, 2, ..., R3.

Example. The strong Kronecker product is analogous to the matrix-matrix multiplication, with the scalar products between entries replaced by Kronecker products between blocks. For example,

[ A11 A12 ; A21 A22 ] |⊗| [ B11 B12 ; B21 B22 ] = [ A11 ⊗ B11 + A12 ⊗ B21   A11 ⊗ B12 + A12 ⊗ B22 ; A21 ⊗ B11 + A22 ⊗ B21   A21 ⊗ B12 + A22 ⊗ B22 ].
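A minimal NumPy sketch of Definition 2.11 (the helper strong_kron is our own naming); it loops over the output blocks and accumulates Kronecker products of the input blocks, mirroring the block matrix-matrix multiplication pattern of the example above:

```python
import numpy as np

def strong_kron(A, B, R1, R2, R3):
    """Strong Kronecker product A |⊗| B of an R1 x R2 block matrix A
    (blocks of size I1 x I2) and an R2 x R3 block matrix B (blocks J1 x J2)."""
    I1, I2 = A.shape[0] // R1, A.shape[1] // R2
    J1, J2 = B.shape[0] // R2, B.shape[1] // R3
    C = np.zeros((R1 * I1 * J1, R3 * I2 * J2))
    for r1 in range(R1):
        for r3 in range(R3):
            # C_{r1,r3} = sum_{r2} A_{r1,r2} ⊗ B_{r2,r3}
            blk = sum(np.kron(A[r1*I1:(r1+1)*I1, r2*I2:(r2+1)*I2],
                              B[r2*J1:(r2+1)*J1, r3*J2:(r3+1)*J2])
                      for r2 in range(R2))
            C[r1*I1*J1:(r1+1)*I1*J1, r3*I2*J2:(r3+1)*I2*J2] = blk
    return C

A = np.random.randn(2 * 3, 2 * 2)         # 2 x 2 blocks of size 3 x 2
B = np.random.randn(2 * 4, 1 * 5)         # 2 x 1 blocks of size 4 x 5
C = strong_kron(A, B, R1=2, R2=2, R3=1)
assert C.shape == (2 * 3 * 4, 1 * 2 * 5)  # (R1*I1*J1, R3*I2*J2)
```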
2.6 Graphical representations of tensors
It is quite useful to represent tensors and related operations by graphs of nodes and edges, as was proposed in [18]. Figure 1(a), (b), and (c) illustrate the graphs representing a vector, a matrix, and a 3rd order tensor. In each graph, the number of edges connected to a node indicates the order of the tensor, and the mode size can be shown by the label on each edge. Figure 1(d) represents the singular value decomposition of a matrix. The orthonormalized matrices are represented by half-filled circles and the diagonal matrix by a circle with slash inside. Figure 1(e) represents the mode-3 product, A ×3 B, for some A ∈ RI1 ×I2 ×I3 and B ∈ RJ1 ×J2 (I3 = J2 ). Figure 1(f) represents the contracted product, A • B, for some A ∈ RI1 ×I2 ×I3 and B ∈ RJ1 ×J2 ×J3 (I3 = J1 ).
Figure 1: Graphical representations of (a) a vector, (b) a matrix, (c) a 3rd order tensor, (d) the singular value decomposition of an I × J matrix, (e) the mode-3 product of a 3rd order tensor and a matrix, and (f) the contracted product of two 3rd order tensors.
Figure 2: Graphical representation of a 4th order tensor in TT format
3 Tensor train formats

3.1 TT format
In the TT format, a tensor X ∈ R^{I1×I2×···×IN} is represented as

X = G^(1) • G^(2) • ··· • G^(N−1) • G^(N),   (5)

where the G^(n) ∈ R^{Rn−1×In×Rn}, n = 1, ..., N, are 3rd-order tensors called the TT-cores, R1, ..., RN−1 are called the TT-ranks, and R0 = RN = 1. Since R0 = RN = 1, the contracted product (5) yields a tensor of order N, even though each of the TT-cores is regarded as a 3rd-order tensor for notational convenience. Figure 2 illustrates the graphical representation of a 4th-order TT tensor.

Alternatively, the TT format is often written entry-wise as

x_{i1,i2,...,iN} = G^(1)_{i1} G^(2)_{i2} ··· G^(N−1)_{iN−1} G^(N)_{iN} = Σ_{r1=1}^{R1} Σ_{r2=1}^{R2} ··· Σ_{rN−1=1}^{RN−1} g^(1)_{1,i1,r1} g^(2)_{r1,i2,r2} ··· g^(N−1)_{rN−2,iN−1,rN−1} g^(N)_{rN−1,iN,1},   (6)

where G^(n)_{in} = G^(n)(:, in, :) ∈ R^{Rn−1×Rn} is the in-th lateral slice of the nth TT-core. Note that G^(1)_{i1} and G^(N)_{iN} are 1 × R1 and RN−1 × 1 matrices.

The TT format can also be represented as a sum of outer products. From (6), we have

X = Σ_{r1=1}^{R1} Σ_{r2=1}^{R2} ··· Σ_{rN−1=1}^{RN−1} g^(1)_{1,r1} ◦ g^(2)_{r1,r2} ◦ ··· ◦ g^(N−1)_{rN−2,rN−1} ◦ g^(N)_{rN−1,1},   (7)

where g^(n)_{rn−1,rn} = G^(n)(rn−1, :, rn) ∈ R^{In} is a mode-2 fiber.

A large-scale vector x of length I1 I2 ··· IN can be tensorized into X ∈ R^{I1×I2×···×IN} and represented in TT format. By vectorizing the TT format (7), we get the TT format for x, represented as a sum of Kronecker products,

x = Σ_{r1=1}^{R1} Σ_{r2=1}^{R2} ··· Σ_{rN−1=1}^{RN−1} g^(1)_{1,r1} ⊗ g^(2)_{r1,r2} ⊗ ··· ⊗ g^(N−1)_{rN−2,rN−1} ⊗ g^(N)_{rN−1,1}.   (8)
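A minimal NumPy sketch (ours) relating the representations (5) and (6): tt_full (our own helper) contracts the TT-cores as in (5), and the entry check multiplies lateral slices as in (6).

```python
import numpy as np

def tt_full(cores):
    """Contract TT-cores G(n) of shape (R_{n-1}, I_n, R_n), R_0 = R_N = 1,
    into the full tensor X = G(1) • G(2) • ... • G(N) of equation (5)."""
    X = cores[0]
    for G in cores[1:]:
        X = np.tensordot(X, G, axes=(-1, 0))         # contract over R_n
    return X.squeeze(axis=(0, -1))                   # drop R_0 = R_N = 1

shapes = [(1, 2, 2), (2, 3, 3), (3, 4, 2), (2, 5, 1)]  # TT-ranks (2, 3, 2)
cores = [np.random.randn(*s) for s in shapes]
X = tt_full(cores)
assert X.shape == (2, 3, 4, 5)

# Entry-wise form (6): a product of lateral slices G(n)[:, in, :].
i = (1, 2, 3, 4)
m = np.linalg.multi_dot([G[:, ii, :] for G, ii in zip(cores, i)])
assert np.isclose(X[i], m[0, 0])
```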
Table 3: Various representations for the TT format of a tensor X of size I1 × ··· × IN

Contracted products:
X = G^(1) • G^(2) • ··· • G^(N−1) • G^(N)

Matrix products:
x_{i1,i2,...,iN} = G^(1)_{i1} G^(2)_{i2} ··· G^(N−1)_{iN−1} G^(N)_{iN}

Scalar products:
x_{i1,i2,...,iN} = Σ_{r1=1}^{R1} Σ_{r2=1}^{R2} ··· Σ_{rN−1=1}^{RN−1} g^(1)_{1,i1,r1} g^(2)_{r1,i2,r2} ··· g^(N−1)_{rN−2,iN−1,rN−1} g^(N)_{rN−1,iN,1}

Outer products:
X = Σ_{r1=1}^{R1} Σ_{r2=1}^{R2} ··· Σ_{rN−1=1}^{RN−1} g^(1)_{1,r1} ◦ g^(2)_{r1,r2} ◦ ··· ◦ g^(N−1)_{rN−2,rN−1} ◦ g^(N)_{rN−1,1}

Strong Kronecker products:
x = vec(X) = G̃^(1) |⊗| G̃^(2) |⊗| ··· |⊗| G̃^(N−1) |⊗| G̃^(N), with G̃^(n) = [g^(n)_{rn−1,rn}] ∈ R^{Rn−1 In × Rn}, each block g^(n)_{rn−1,rn} ∈ R^{In}

Vectorizations:
x = vec(X) = (I_{I1 I2···IN−1} ⊗ (G^(N)_(1))^T) (I_{I1 I2···IN−2} ⊗ (G^(N−1)_(1))^T) ··· (I_{I1} ⊗ (G^(2)_(1))^T) vec(G^(1))
x = vec(X) = ((G^(1)_(3))^T ⊗ I_{I2 I3···IN}) ((G^(2)_(3))^T ⊗ I_{I3 I4···IN}) ··· ((G^(N−1)_(3))^T ⊗ I_{IN}) vec(G^(N))
x = vec(X) = (G^{<n} ⊗ I_{In} ⊗ (G^{>n})^T) vec(G^(n)), see (14)
The above form (8) can be compactly represented using strong Kronecker products as

x = G̃^(1) |⊗| G̃^(2) |⊗| ··· |⊗| G̃^(N−1) |⊗| G̃^(N),   (9)

where the G̃^(n), n = 1, ..., N, are the Rn−1 × Rn block matrices

G̃^(n) = [ g^(n)_{1,1} ··· g^(n)_{1,Rn} ; ⋮ ⋱ ⋮ ; g^(n)_{Rn−1,1} ··· g^(n)_{Rn−1,Rn} ] ∈ R^{Rn−1 In × Rn}

with each block g^(n)_{rn−1,rn} ∈ R^{In}. Table 3 summarizes the alternative representations for the TT format.

In principle, any tensor X can be represented in TT format through the TT-SVD algorithm [30]. The storage cost for a TT format is O(N I R²) with I = max(In) and R = max(Rn), so the storage and computational complexities can be substantially reduced if the TT-ranks are kept small enough.
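As a hedged sketch of how the TT-SVD algorithm [30] proceeds (a compact reconstruction of the idea, not the paper's pseudocode): sweep over the modes, at each step taking a truncated SVD of an unfolding; the left singular vectors become a TT-core and the remainder is carried forward. NumPy's row-major reshape matches the paper's vectorization convention.

```python
import numpy as np

def tt_svd(X, eps=1e-10):
    """Sketch of TT-SVD: successive (truncated) SVDs of unfoldings.
    Singular values below eps are truncated, giving adaptive TT-ranks."""
    dims = X.shape
    cores, r_prev = [], 1
    C = X.reshape(r_prev * dims[0], -1)
    for n in range(len(dims) - 1):
        U, s, Vt = np.linalg.svd(C, full_matrices=False)
        r = max(1, int(np.sum(s > eps)))             # adaptive TT-rank R_n
        cores.append(U[:, :r].reshape(r_prev, dims[n], r))
        C = (s[:r, None] * Vt[:r]).reshape(r * dims[n + 1], -1)
        r_prev = r
    cores.append(C.reshape(r_prev, dims[-1], 1))
    return cores

X = np.random.randn(4, 5, 6)
G = tt_svd(X)
i = (1, 2, 3)                                        # reconstruct one entry
m = np.linalg.multi_dot([g[:, ii, :] for g, ii in zip(G, i)])
assert np.isclose(X[i], m[0, 0])
```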
3.2 Recursive representations for TT format
The TT format can be expressed in a recursive manner, which is summarized in Table 4. Given the TT-cores G^(n), n = 1, ..., N, we define the partial contracted products G^{≤n} ∈ R^{I1×···×In×Rn} and G^{≥n} ∈ R^{Rn×In+1×···×IN} as

G^{≤n} = G^(1) • G^(2) • ··· • G^(n)   (10)

and

G^{≥n} = G^(n) • G^(n+1) • ··· • G^(N).   (11)

G^{<n} and G^{>n} are defined in the same way. For completeness, we define G^{<1} = G^{>N} = 1. The vectorizations of the partial contracted products yield the following recursive equations:

vec(G^{≤n}) = vec(G^{≤n−1} ×n (G^(n)_(1))^T) = (I_{I1 I2···In−1} ⊗ (G^(n)_(1))^T) vec(G^{≤n−1})
Table 4: Recursive representations for the TT format of a tensor X of size I1 × ··· × IN

Contracted products:
G^{≤n} = G^{≤n−1} • G^(n), n = 1, 2, ..., N, with G^{≤0} = 1
G^{≥n} = G^(n) • G^{≥n+1}, n = 1, 2, ..., N, with G^{≥N+1} = 1
X = G^{≤N} = G^{≥1}

Matrix products:
x_{i1,...,iN} = G^{≤N}_{i1,...,iN} with G^{≤n}_{i1,...,in} = G^{≤n−1}_{i1,...,in−1} G^(n)_{in}
x_{i1,...,iN} = G^{≥1}_{i1,...,iN} with G^{≥n}_{in,...,iN} = G^(n)_{in} G^{≥n+1}_{in+1,...,iN}

Scalar products:
x_{i1,...,iN} = g^{≤N}_{1,i1,...,iN,1} with g^{≤n}_{1,i1,...,in,rn} = Σ_{rn−1=1}^{Rn−1} g^{≤n−1}_{1,i1,...,in−1,rn−1} g^(n)_{rn−1,in,rn}
x_{i1,...,iN} = g^{≥1}_{1,i1,...,iN,1} with g^{≥n}_{rn−1,in,...,iN,1} = Σ_{rn=1}^{Rn} g^(n)_{rn−1,in,rn} g^{≥n+1}_{rn,in+1,...,iN,1}

Outer products:
X = G^{≤N}_{rN} (rN = 1) with G^{≤n}_{rn} = Σ_{rn−1=1}^{Rn−1} G^{≤n−1}_{rn−1} ◦ g^(n)_{rn−1,rn}
X = G^{≥1}_{r0} (r0 = 1) with G^{≥n}_{rn−1} = Σ_{rn=1}^{Rn} g^(n)_{rn−1,rn} ◦ G^{≥n+1}_{rn}

Vectorizations:
vec(X) = vec(G^{≤N}) with vec(G^{≤n}) = (I_{I1 I2···In−1} ⊗ (G^(n)_(1))^T) vec(G^{≤n−1})
vec(X) = vec(G^{≥1}) with vec(G^{≥n}) = ((G^(n)_(3))^T ⊗ I_{In+1 In+2···IN}) vec(G^{≥n+1})

Matricizations:
X_(1) = G^(1)_([2]) G^{>1}_(1) with G^{>n}_([2]) = G^(n+1)_([2]) G^{>n+1}_(1)
for n = 2, 3, ..., N, and

vec(G^{≥n}) = vec(G^{≥n+1} ×1 (G^(n)_(3))^T) = ((G^(n)_(3))^T ⊗ I_{In+1 In+2···IN}) vec(G^{≥n+1})

for n = 1, 2, ..., N − 1. Hence, the vectorization of the tensor X can be expressed as

vec(X) = vec(G^{≤N}) = (I_{I1 I2···IN−1} ⊗ (G^(N)_(1))^T) vec(G^{≤N−1})

and

vec(X) = vec(G^{≥1}) = ((G^(1)_(3))^T ⊗ I_{I2 I3···IN}) vec(G^{≥2}).

Proposition 3.1. We obtain the following formulas:

vec(X) = (I_{I1 I2···IN−1} ⊗ (G^(N)_(1))^T) (I_{I1 I2···IN−2} ⊗ (G^(N−1)_(1))^T) ··· (I_{I1} ⊗ (G^(2)_(1))^T) vec(G^(1))   (12)

and

vec(X) = ((G^(1)_(3))^T ⊗ I_{I2 I3···IN}) ((G^(2)_(3))^T ⊗ I_{I3 I4···IN}) ··· ((G^(N−1)_(3))^T ⊗ I_{IN}) vec(G^(N)).   (13)
3.3 Extraction of TT-cores
Using the concept of splitting a tensor train into sub-tensor trains, we can obtain another important expression for the vectorization, in which the nth core tensor is separated from the others, as [6, 26]

vec(X) = vec(G^(n) ×1 G^{<n} ×3 (G^{>n})^T) = (G^{<n} ⊗ I_{In} ⊗ (G^{>n})^T) vec(G^(n)),   (14)

where G^{<n} = G^{≤n−1}_([n−1]) ∈ R^{I1···In−1×Rn−1} and G^{>n} = G^{≥n+1}_(1) ∈ R^{Rn×In+1···IN} denote matricizations of the partial contracted products. The matrix

X_{≠n} = G^{<n} ⊗ I_{In} ⊗ (G^{>n})^T ∈ R^{I1 I2···IN × Rn−1 In Rn}

is called the frame matrix.
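The following numerical check of (14) is our own illustration under the row-major vectorization convention established in Section 2 (not code from the paper); it forms the frame matrix X≠n explicitly and compares X≠n vec(G^(n)) with vec(X):

```python
import numpy as np

# Build a small TT tensor X = G(1) • G(2) • G(3) • G(4).
Gs = [np.random.randn(*s) for s in [(1, 2, 2), (2, 3, 3), (3, 4, 2), (2, 5, 1)]]
X = Gs[0]
for G in Gs[1:]:
    X = np.tensordot(X, G, axes=(-1, 0))
X = X.squeeze(axis=(0, -1))

n = 2                                  # separate the core G(3) (0-based n = 2)
G_lt = Gs[0]                           # G^{<n}: matrix I1...I_{n-1} x R_{n-1}
for G in Gs[1:n]:
    G_lt = np.tensordot(G_lt, G, axes=(-1, 0))
G_lt = G_lt.reshape(-1, Gs[n].shape[0])

G_gt = Gs[n + 1]                       # G^{>n}: matrix R_n x I_{n+1}...I_N
for G in Gs[n + 2:]:
    G_gt = np.tensordot(G_gt, G, axes=(-1, 0))
G_gt = G_gt.reshape(Gs[n].shape[-1], -1)

# Frame matrix X_{≠n} = G^{<n} ⊗ I_{In} ⊗ G^{>n T}, and the identity (14).
frame = np.kron(np.kron(G_lt, np.eye(Gs[n].shape[1])), G_gt.T)
assert np.allclose(X.reshape(-1), frame @ Gs[n].reshape(-1))
```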