HERCMA-2009 PROCEEDINGS
Recursive and Tensor-Train Decompositions in Higher Dimensions
Ivan Oseledets and Eugene Tyrtyshnikov (Invited Paper)
Manuscript received May 30, 2009. The authors are with the Institute of Numerical Mathematics of the Russian Academy of Sciences, Lomonosov Moscow State University, and the Moscow Institute of Physics and Technology.
Abstract—This is a brief survey of our recent results on tensor decompositions propelling a new generation of numerical tensor algorithms. In higher dimensions these algorithms are free from the "curse of dimensionality"; in lower dimensions they are useful as new and competitive data compression and image processing tools.
Index Terms—Curse of dimensionality, tensor decomposition, tensor rank, tensor-train decomposition, TT decomposition, low-rank approximation, cross approximation, maximal-volume principle.
I. INTRODUCTION

Many problems in mathematics and the sciences give rise to arrays (tensors) of size $n_1 \times \dots \times n_d$ with elements $a(i_1,\dots,i_d)$ defined by $d$ indices. If $n_1 = \dots = n_d = n$, then $a$ contains $n^d$ elements. This exponential growth in $d$ is known as the "curse of dimensionality". Hence computations for large $d$ are not feasible unless the tensors are represented in a structured form described by far fewer parameters than $n^d$. There are two basic formats for low-parametric representation of tensors. The Tucker decomposition of $a$ reads
$$ a(i_1,\dots,i_d) = \sum_{t_1=1}^{r} \cdots \sum_{t_d=1}^{r} g(t_1,\dots,t_d)\, q_1(i_1,t_1)\cdots q_d(i_d,t_d), \qquad (1) $$
where the value of $r$ is called the Tucker rank. Efficient linear algebra operations with three-dimensional tensors in the Tucker format were recently proposed in [3]; important applications to electron density computations with fast multilevel and factor-filtering techniques are presented in [4]. The canonical decomposition of $a$ is of the form
$$ a(i_1,\dots,i_d) = \sum_{s=1}^{R} u_1(i_1,s)\cdots u_d(i_d,s), \qquad (2) $$
where the value of $R$ is called the canonical rank. The number of parameters in the Tucker decomposition is larger than $r^d$, so the exponential growth is still there. The canonical decomposition involves $dRn$ parameters, and usually $R = o(n)$ or $R$ does not depend on $n$ at all. However, save for some important but very special structured cases [5], $R$ can and does grow rapidly during computations. Despite that, a robust procedure for low-rank tensor approximation in many dimensions is in
effect not available. Even when the tensor is given in some canonical format with rank $R$ and we want to approximate it by a tensor of smaller rank $r < R$, there are no robust algorithms. So we are led to the following principal question: can we find an approximation format for $d$-dimensional arrays that is free from exponential dependence on $d$ and allows us to perform effective and efficient linear algebra operations by some SVD-type approach? A positive answer is suggested in [6], [7]. The new format contains a smaller, and generally much smaller, number of parameters than a canonical decomposition of the same tensor. Moreover, it possesses good stability properties similar to those of the SVD. In [8], [9] it was noted that the general construction of [6], [7] loses nothing if only one particular case is used. Moreover, in this particular case everything becomes simple and lucid, and, as a consequence, we obtain efficient and effective algorithms with a sufficiently simple structure. We call the new decomposition a tensor-train decomposition, or TT decomposition.
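For concreteness, the following small NumPy sketch (ours, not from the paper; the sizes and rank are arbitrary) assembles a tensor from canonical factors as in (2) and compares the number of representation parameters with the number of entries of the full array.

```python
# A minimal illustration of the canonical format (2): a d-dimensional tensor
# assembled from factor matrices u_k of size n x R (here d = 4, n = 10, R = 5).
import numpy as np

d, n, R = 4, 10, 5
U = [np.random.rand(n, R) for _ in range(d)]

# a(i1,...,i4) = sum_s u1(i1,s) u2(i2,s) u3(i3,s) u4(i4,s)
a = np.einsum('as,bs,cs,ds->abcd', *U)

print(a.shape)                     # (10, 10, 10, 10): n**d = 10000 entries
print(sum(u.size for u in U))      # d*n*R = 200 representation parameters
```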
II. FROM RECURSION TO TENSOR TRAINS
As a starting point, we subdivide the indices into two subsets, e.g., the first $k$ indices and the remaining ones, and observe that a dyadic (skeleton) decomposition
$$ a(i_1,\dots,i_k;\, i_{k+1},\dots,i_d) = \sum_{s=1}^{r} u(i_1,\dots,i_k,s)\, v(i_{k+1},\dots,i_d,s) $$
reduces the decomposition problem for a $d$-dimensional tensor to two problems, for tensors of dimensionality $k+1$ and $d-k+1$. The same reduction can be applied recursively to the new tensors $u$ and $v$. Eventually we represent $a$ via some binary tree whose leaves are associated with tensors of dimension 2 or 3. This observation in itself is rather trivial and could easily be dismissed if it were not supported by appropriate algorithms. The main new finding of [6], [7] for the above recursive decomposition is exactly the robust algorithms. At the outset we called it a tree-Tucker decomposition, assuming that the given tensor is first written (at least implicitly) in the form (1) and the tree representation is constructed for the core tensor $g$, so that the leaves indicate tensors of dimension 3. Then we observed that the Tucker decomposition on the upper level may be skipped: the same number of representation parameters is achieved if we extend the tree so that the leaf tensors can be of dimension 3 and 2. For the modified decomposition we decided to keep the name TT decomposition. Then we
concentrated on one particular case of our construction and suddenly realized that it should be made the main case. We still call it the TT decomposition, although it is no longer a Tucker decomposition nor a tree decomposition. However, the same abbreviation TT now has the full right to symbolize that the tensor elements are represented by tensor trains of the following form:
$$ a(i_1,\dots,i_d) = \sum_{\alpha_0,\dots,\alpha_d} g_1(i_1,\alpha_0,\alpha_1)\, g_2(i_2,\alpha_1,\alpha_2)\cdots g_d(i_d,\alpha_{d-1},\alpha_d), \qquad (3) $$
where $\alpha_k$ takes $r_k$ values and, by definition, $r_0 = r_d = 1$. In matrix form this train of tensors becomes a product of matrices:
$$ a(i_1,\dots,i_d) = G_1^{i_1} G_2^{i_2} \cdots G_{d-1}^{i_{d-1}} G_d^{i_d}, \qquad (4) $$
where $G_k^{i_k}$ is a matrix of size $r_{k-1} \times r_k$. Note that $G_1^{i_1}$ is a single row and $G_d^{i_d}$ is a single column, so the product of $d$ matrices yields a number (a matrix of size $1 \times 1$).
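The following sketch (ours, with arbitrary mode sizes and ranks) evaluates a single entry of a tensor stored in the TT format exactly as in (4), as a product of $d$ small matrices.

```python
# One entry of a TT tensor is the product G_1^{i_1} G_2^{i_2} ... G_d^{i_d}.
import numpy as np

d, n = 5, 10
ranks = [1, 3, 4, 4, 3, 1]              # r_0 = r_d = 1 by definition
# core g_k is stored as an array of shape (n, r_{k-1}, r_k)
cores = [np.random.rand(n, ranks[k], ranks[k + 1]) for k in range(d)]

def tt_entry(cores, index):
    """Evaluate a(i_1,...,i_d) as a product of r_{k-1} x r_k matrices."""
    result = np.eye(1)
    for g, i in zip(cores, index):
        result = result @ g[i]          # multiply by the matrix G_k^{i_k}
    return result[0, 0]                 # a 1 x 1 matrix, i.e. a number

print(tt_entry(cores, (0, 3, 2, 7, 9)))
```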
If $n_1 = \dots = n_d = n$, then we have $dn$ matrices $G_1^{i_1}, \dots, G_d^{i_d}$, and the total number of entries (representation parameters) does not exceed $(d-2)nr^2 + 2nr$, where $r = \max_k r_k$. Thus, instead of $n^d$ entries we define the same tensor by a TT decomposition that contains only $O(dnr^2)$ parameters; the dependence on $d$ is linear!
III. STABILITY OF COMPRESSION RANKS
The numbers $r_k$ for the index ranges in (3) are called compression ranks. For a TT decomposition with the minimal possible values of the compression ranks we have [9]
$$ r_k = \operatorname{rank} A_k, \qquad A_k = \bigl[\, a(i_1,\dots,i_k;\, i_{k+1},\dots,i_d) \,\bigr], $$
where $A_k$ is the $k$-th unfolding matrix of $a$, with rows indexed by $(i_1,\dots,i_k)$ and columns by $(i_{k+1},\dots,i_d)$.
Hence, if $a$ and $b$ are close elementwise, $|a-b| \le \varepsilon$, then for all sufficiently small $\varepsilon > 0$ the compression ranks of the minimal TT decompositions satisfy $r_k(b) \ge r_k(a)$. In other words, under $\varepsilon$-perturbations the compression ranks cannot decrease, provided that $\varepsilon$ is sufficiently small. In contrast, under small perturbations the canonical rank of some tensors can decrease, and even considerably. A well-known example is the following [1]: take linearly independent vectors $x_1,\dots,x_d$,
$y_1,\dots,y_d$ and define a new vector using Kronecker products as follows:
$$ a = \sum_{t=1}^{d} z_1^t \otimes \cdots \otimes z_d^t, \qquad z_k^t = \begin{cases} x_k, & k \ne t, \\ y_k, & k = t. \end{cases} $$
We can naturally view $a$ as a $d$-dimensional tensor, and it is easy to prove that the canonical rank of $a$ is not less than $d$. At the same time,
$$ a = \frac{1}{\varepsilon}\,(x_1+\varepsilon y_1) \otimes \cdots \otimes (x_d+\varepsilon y_d) \;-\; \frac{1}{\varepsilon}\, x_1 \otimes \cdots \otimes x_d \;+\; O(\varepsilon). $$
Hence, $a$ is a limit of tensors of canonical rank 2. Examples of this kind make the computation of canonical ranks a really difficult problem. Using the TT decomposition for the same tensor, we obtain all compression ranks equal to 2. Thus, the stability comes here with a bonus of a reduced number of representation parameters: the unstructured canonical format would require $d^2 n$ parameters, whereas the TT format contains only $4(d-2)n + 2n$ parameters.
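A small numerical check of this example (our own sketch, with arbitrary $d$ and $n$) builds the tensor $a$ above from random vectors and confirms that every unfolding matrix has rank 2.

```python
# Build a = sum_t z_1^t x ... x z_d^t with z_k^t = x_k (k != t), y_k (k = t),
# then verify that all unfolding (compression) ranks equal 2.
import numpy as np

d, n = 5, 4
rng = np.random.default_rng(0)
X = [rng.standard_normal(n) for _ in range(d)]
Y = [rng.standard_normal(n) for _ in range(d)]

a = np.zeros((n,) * d)
for t in range(d):
    z = [Y[k] if k == t else X[k] for k in range(d)]
    term = z[0]
    for v in z[1:]:
        term = np.multiply.outer(term, v)   # Kronecker (outer) product of vectors
    a += term

# unfolding A_k groups the first k indices as rows and the rest as columns
for k in range(1, d):
    Ak = a.reshape(n**k, n**(d - k))
    print(k, np.linalg.matrix_rank(Ak))     # prints 2 for every k
```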
IV. TENSOR-TRAIN OPERATIONS
If a tensor $a$ can be represented in the canonical format (2) with $R$ summands, then there is a TT decomposition with all compression ranks $r_k \le R$. In order to obtain it, we merely set
$$ g_1(i_1,\alpha_1) = u_1(i_1,\alpha_1), \qquad g_d(i_d,\alpha_{d-1}) = u_d(i_d,\alpha_{d-1}), $$
$$ g_k(i_k,\alpha_{k-1},\alpha_k) = u_k(i_k,\alpha_k)\,\delta(\alpha_{k-1},\alpha_k), \qquad 2 \le k \le d-1. $$
Here, $\delta(\alpha,\beta) = 1$ if $\alpha = \beta$ and $\delta(\alpha,\beta) = 0$ otherwise, and each $\alpha_k$ takes $R$ values (a small sketch of this conversion is given below). Then we can apply a general tensor-train recompression algorithm [9] possessing two remarkable properties:
• The complexity is linear in $d$ and $n$.
• The result comes with a guaranteed approximation accuracy.
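The sketch below (ours; sizes and rank are arbitrary) implements the canonical-to-TT conversion above with diagonal middle cores and checks one entry against the canonical sum (2).

```python
# Canonical factors u_k (n x R) -> TT cores with all compression ranks equal to R.
import numpy as np

d, n, R = 4, 6, 3
rng = np.random.default_rng(1)
U = [rng.standard_normal((n, R)) for _ in range(d)]

cores = [U[0].reshape(n, 1, R)]                      # g_1(i_1, alpha_1)
for k in range(1, d - 1):
    g = np.zeros((n, R, R))
    for s in range(R):
        g[:, s, s] = U[k][:, s]                      # u_k(i_k, a_k) * delta(a_{k-1}, a_k)
    cores.append(g)
cores.append(U[d - 1].reshape(n, R, 1))              # g_d(i_d, alpha_{d-1})

# compare one entry of the train with the canonical sum (2)
idx = (1, 2, 0, 5)
tt_val = np.eye(1)
for g, i in zip(cores, idx):
    tt_val = tt_val @ g[i]
cp_val = sum(np.prod([U[k][idx[k], s] for k in range(d)]) for s in range(R))
print(tt_val[0, 0], cp_val)                          # the two values agree
```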
The minimal TT compression ranks are the ranks of the unfolding matrices $A_k$. But although $A_k$ is of size $n^k \times n^{d-k}$, it never appears as a full array of $n^d$ entries. We manage to compute a truncated singular value decomposition (SVD) of $A_k$ with a guaranteed accuracy so that the involved orthogonal (unitary) matrices are kept in a compact factorized form. For example, consider a reshaped tensor train
$$ a(i_1,i_2,i_3) = \sum_{\alpha_1,\alpha_2} g_1(i_1,\alpha_1)\, g_2(\alpha_1,i_2,\alpha_2)\, g_3(\alpha_2,i_3). $$
Using QR decompositions, in the first sweep from right to left we successively transform the train:
$$ a(i_1,i_2,i_3) = \sum_{\alpha_1,\alpha_2'} g_1(i_1,\alpha_1)\, \tilde g_2(\alpha_1,i_2,\alpha_2')\, q_3(\alpha_2',i_3) = \sum_{\alpha_1',\alpha_2'} \hat g_1(i_1,\alpha_1')\, q_2(\alpha_1',i_2,\alpha_2')\, q_3(\alpha_2',i_3), $$
so that the unfolding matrices $q_2(\alpha_1';\, i_2,\alpha_2')$ and $q_3(\alpha_2';\, i_3)$ acquire orthonormal rows. The transformations themselves read
$$ g_3(\alpha_2;\, i_3) = \sum_{\alpha_2'} r_3(\alpha_2;\, \alpha_2')\, q_3(\alpha_2';\, i_3), \qquad \tilde g_2(\alpha_1, i_2;\, \alpha_2') = \sum_{\alpha_2} g_2(\alpha_1, i_2;\, \alpha_2)\, r_3(\alpha_2, \alpha_2'), $$
$$ \tilde g_2(\alpha_1;\, i_2, \alpha_2') = \sum_{\alpha_1'} r_2(\alpha_1;\, \alpha_1')\, q_2(\alpha_1';\, i_2, \alpha_2'), \qquad \hat g_1(i_1;\, \alpha_1') = \sum_{\alpha_1} g_1(i_1;\, \alpha_1)\, r_2(\alpha_1;\, \alpha_1'). $$
Then, in the second sweep from left to right we successively modify the train:
$$ a(i_1,i_2,i_3) = \sum_{\alpha_1'',\alpha_2'} z_1(i_1,\alpha_1'')\, \hat g_2(\alpha_1'',i_2,\alpha_2')\, q_3(\alpha_2',i_3) = \sum_{\alpha_1'',\alpha_2''} z_1(i_1,\alpha_1'')\, z_2(\alpha_1'',i_2,\alpha_2'')\, \hat g_3(\alpha_2'',i_3), $$
so that the matrices $z_1(i_1;\, \alpha_1'')$ and $z_2(\alpha_1'', i_2;\, \alpha_2'')$ acquire orthonormal columns. For uniformity we introduce auxiliary indices $\alpha_0'' = \alpha_3' = 1$ and observe that
$$ \operatorname{rank} A_1 = \operatorname{rank}\bigl[\hat g_1(\alpha_0'', i_1;\, \alpha_1')\bigr], \quad \operatorname{rank} A_2 = \operatorname{rank}\bigl[\hat g_2(\alpha_1'', i_2;\, \alpha_2')\bigr], \quad \operatorname{rank} A_3 = \operatorname{rank}\bigl[\hat g_3(\alpha_2'', i_3;\, \alpha_3')\bigr]. $$
Therefore, the minimal compression ranks are computed with complexity linear in $d$. Moreover, using truncated SVDs of the small matrices in the right-hand sides with a prescribed Frobenius-norm accuracy $\varepsilon$, we come up with a tensor train approximating the original tensor with accuracy $\sqrt{d}\,\varepsilon$ (in the Frobenius norm for tensors). Basic linear algebra operations in the TT format are performed in two steps: first, in a straightforward way, the result is put in some TT format; then the tensor-train recompression is applied to reduce the compression ranks within a prescribed accuracy.
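The following rough sketch (ours, following the two sweeps described above; the threshold, sizes, and test data are illustrative rather than the authors' reference implementation) carries out the recompression for a three-dimensional train: a right-to-left QR sweep followed by a left-to-right sweep of truncated SVDs.

```python
import numpy as np

def tt_round_3d(g1, g2, g3, eps=1e-10):
    """g1: (n1, r1), g2: (r1, n2, r2), g3: (r2, n3). Returns truncated cores."""
    # first sweep, right to left: make q3 and q2 row-orthonormal via QR
    Q, R = np.linalg.qr(g3.T)                      # g3 = R.T @ Q.T
    q3, r3 = Q.T, R.T                              # q3 has orthonormal rows
    g2t = np.tensordot(g2, r3, axes=(2, 0))        # absorb r3 into the middle core
    r1, n2, r2p = g2t.shape
    Q, R = np.linalg.qr(g2t.reshape(r1, n2 * r2p).T)
    q2, r2 = Q.T.reshape(-1, n2, r2p), R.T         # q2 unfolding has orthonormal rows
    g1h = g1 @ r2                                  # absorb r2 into the first core

    # second sweep, left to right: truncated SVDs with a simple absolute threshold
    U, s, Vt = np.linalg.svd(g1h, full_matrices=False)
    k = max(1, int(np.sum(s > eps)))
    z1 = U[:, :k]                                  # orthonormal columns
    g2h = np.tensordot(np.diag(s[:k]) @ Vt[:k], q2, axes=(1, 0))
    k1 = g2h.shape[0]
    U, s, Vt = np.linalg.svd(g2h.reshape(k1 * n2, r2p), full_matrices=False)
    k = max(1, int(np.sum(s > eps)))
    z2 = U[:, :k].reshape(k1, n2, k)
    g3h = (np.diag(s[:k]) @ Vt[:k]) @ q3
    return z1, z2, g3h

# a train whose true compression ranks are (2, 2), stored with inflated ranks (6, 5)
rng = np.random.default_rng(0)
n1 = n2 = n3 = 8
c1 = rng.standard_normal((n1, 2))
c2 = rng.standard_normal((2, n2, 2))
c3 = rng.standard_normal((2, n3))
g1 = np.hstack([c1, np.zeros((n1, 4))])
g2 = np.zeros((6, n2, 5)); g2[:2, :, :2] = c2
g3 = np.vstack([c3, np.zeros((3, n3))])

z1, z2, g3h = tt_round_3d(g1, g2, g3)
full = np.einsum('ia,ajb,bk->ijk', g1, g2, g3)
approx = np.einsum('ia,ajb,bk->ijk', z1, z2, g3h)
print(z1.shape, z2.shape, g3h.shape)               # ranks shrink back to (2, 2)
print(np.linalg.norm(full - approx))               # error at machine-precision level
```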
V. FROM MATRICES TO TENSORS
Efficient techniques for numerical work with $d$-dimensional tensors suggest that we may treat tensors in lower dimensions, e.g. matrices, as tensors in higher dimensions. For instance, given a matrix $a(i,j)$, we can replace $i$ and $j$ with multi-indices $(i_1,\dots,i_d)$ and $(j_1,\dots,j_d)$:
$$ a(i_1,\dots,i_d;\, j_1,\dots,j_d) = a(i,j). $$
We still have little hope that tensor trains for $a$ would provide small compression ranks. In order to achieve this, we have to reshape the tensor as suggested in [10]:
$$ b(i_1, j_1;\, \dots;\, i_d, j_d) = a(i_1,\dots,i_d;\, j_1,\dots,j_d). $$
Let the original matrix be of size $n \times n$ with $n = 2^d$. Then the indices $i_1,\dots,i_d$ and $j_1,\dots,j_d$ each take 2 values, and $b$ can be regarded as a $d$-dimensional tensor of size $4 \times \dots \times 4$. On many examples we have learned that the ranks of the unfolding matrices $B_k$ of $b$ are quite small within a preset accuracy. In some cases we support this experience by rigorous estimates. The compression ranks are exactly 1 if our matrix is a Kronecker product of $d$ matrices of size $2 \times 2$. If our matrix is the inverse of a banded Toeplitz matrix with half-bandwidth $s$, then the compression ranks do not exceed $4s^2 + 1$ [11].
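The sketch below (ours; $d$ and the random factors are arbitrary) performs the reshaping described above for a matrix that is a Kronecker product of $d$ matrices of size $2 \times 2$ and confirms that all unfolding ranks equal 1.

```python
# Reshape a 2^d x 2^d matrix into a d-dimensional 4 x ... x 4 tensor b with
# interleaved index pairs (i_k, j_k), then check the unfolding ranks.
import numpy as np
from functools import reduce

d = 5
rng = np.random.default_rng(3)
factors = [rng.standard_normal((2, 2)) for _ in range(d)]
a = reduce(np.kron, factors)                          # n x n matrix, n = 2**d

# split i and j into bits, then interleave: b(i1, j1; ...; id, jd) = a(i, j)
b = a.reshape((2,) * d + (2,) * d)                    # axes (i1,...,id, j1,...,jd)
perm = [k // 2 + d * (k % 2) for k in range(2 * d)]   # -> (i1, j1, i2, j2, ...)
b = b.transpose(perm).reshape((4,) * d)               # d-dimensional, size 4 x ... x 4

# for a Kronecker-product matrix every unfolding B_k has rank 1
for k in range(1, d):
    Bk = b.reshape(4**k, 4**(d - k))
    print(k, np.linalg.matrix_rank(Bk))
```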
VI. CONCLUSION

The tensor-train decomposition is a new instrument with the potential of breaking the "curse of dimensionality". Most importantly, we have found an efficient recompression procedure for tensor trains with the reliability of the SVD. As a consequence, basic tensor (linear algebra) operations can be implemented in the TT format in a fast and robust way. Moreover, even $n \times n$ matrices can be considered as $d$-dimensional tensors with $d \sim \log_2 n$. Under the proper multilevel reshaping, the corresponding tensors can be approximated in the TT format with sufficiently small ranks. This pertains, for example, to matrices that represent digital images. The complexity of the TT recompression for matrices is comparable with that of discrete wavelet transforms [10]. Other new vistas in this direction are related to approximate numerical computation of matrix functions (for certain classes of matrices) with complexity linear or polynomial in $d$, and hence logarithmic or polylogarithmic in the matrix order $n$. Tensor trains also possess certain interpolation properties that are the subject of our current research. Eventually we are able to develop the ideas of our incomplete cross approximation
method [2] to tensors in higher dimensions in such a way that the number of entries involved is linear or polynomial in $d$. At least at this moment, we are confident that tensor trains provide the only viable approach to this task.

ACKNOWLEDGMENT

This work was supported by RFBR grants 08-01-00115, 09-01-91332, 09-01-00565.

REFERENCES

[1] V. de Silva and L.-H. Lim, "Tensor rank and the ill-posedness of the best low-rank approximation problem", SIAM J. Matrix Anal. Appl., vol. 30, no. 3, pp. 1084–1127, 2008.
[2] I. Oseledets, D. Savostyanov, and E. Tyrtyshnikov, "Tucker dimensionality reduction of three-dimensional arrays in linear time", SIAM J. Matrix Anal. Appl., vol. 30, no. 3, pp. 939–956, 2008.
[3] I. V. Oseledets, D. V. Savostyanov, and E. E. Tyrtyshnikov, "Linear algebra for tensor problems", ICM HKBU Research Report 08-03, August 2008 (www.math.hkbu.edu.hk/ICM); submitted to Computing.
[4] I. V. Oseledets, D. V. Savostyanov, and E. E. Tyrtyshnikov, "Cross approximation in electron density computations", ICM HKBU Research Report 09-04, February 2009 (www.math.hkbu.edu.hk/ICM); submitted to Numerical Linear Algebra with Applications.
[5] I. V. Oseledets, E. E. Tyrtyshnikov, and N. L. Zamarashkin, "Matrix inversion cases with size-independent tensor rank estimates", ICM HKBU Research Report 08-12, November 2008 (www.math.hkbu.edu.hk/ICM); Linear Algebra Appl., vol. 431, pp. 558–570, 2009.
[6] I. Oseledets and E. E. Tyrtyshnikov, "Breaking the curse of dimensionality, or how to use SVD in many dimensions", ICM HKBU Research Report 09-03, February 2009 (www.math.hkbu.edu.hk/ICM); submitted to SIAM J. Sci. Comput., 2009.
[7] I. Oseledets and E. E. Tyrtyshnikov, "On a recursive decomposition of multi-dimensional tensors", Doklady of the Russian Academy of Sciences, vol. 427, no. 2, 2009.
[8] I. Oseledets, "Compact matrix form of the d-dimensional tensor decomposition", submitted to SIAM J. Sci. Comput., 2009.
[9] I. Oseledets, "On a new tensor decomposition", Doklady of the Russian Academy of Sciences, vol. 427, no. 3, 2009.
[10] I. Oseledets, "On approximation of matrices with logarithmic number of parameters", Doklady of the Russian Academy of Sciences, vol. 427, no. 4, 2009.
[11] N. Zamarashkin, I. Oseledets, and E. Tyrtyshnikov, "Tensor structure of the inverse to a banded Toeplitz matrix", Doklady of the Russian Academy of Sciences, vol. 427, no. 5, 2009.