A Singular Value Decomposition of a k-Way Array for a Principal Component Analysis of Multiway Data, PTA-k

Didier Leibovici*
School of Computing and Mathematical Sciences, University of Greenwich, Woolwich Campus, Wellington Street, London SE18 6PF, U.K.

and

Robert Sabatier
Faculté de Pharmacie, Avenue Charles Flahaut, 34000 Montpellier, France

Submitted by Richard A. Brualdi
ABSTRACT

Employing a tensorial approach to describe a k-way array, the singular value decomposition of this type of multiarray is established. The algorithm given to attain a singular value, based on a generalization of the transition formulae, has a Gauss-Seidel form. A recursive algorithm leads to the decomposition termed SVD-k. A generalization of the Eckart-Young theorem is introduced by consideration of new rank concepts: the orthogonal rank and the free orthogonal rank. The application of this generalization in data analysis is illustrated by a principal component analysis (PCA) over k modes, termed PTA-k, which conserves most of the properties of a PCA. © 1998 Elsevier Science Inc.
*E-mail:
[email protected].
LINEAR ALGEBRA AND ITS APPLICATIONS 269:307-329 (1998)
© 1998 Elsevier Science Inc. All rights reserved. 655 Avenue of the Americas, New York, NY 10010
0024-3795/98/$19.00 PII S0024-3795(97)90229-2
1. INTRODUCTION

While in factorial data analysis the duality scheme introduced by Cailliez and Pages (1976) has permitted adequate comprehension in algebraic terms of such simple exploratory methods as principal component analysis (PCA) and correspondence analysis or multiple correspondence analysis (CA or MCA), its capacity to explain a multimode analysis is limited. Thus, it is most appropriate for two-way arrays. For a three-way array, three duality schemes can be drawn, but each entry does not play the same role, as for example in the case of the statis and prestatis duality schemes (i.e. statis on the arrays); see Lavit (1988).

Algebraists and statisticians, attempting to generalize PCA or singular value decomposition (SVD), have developed models for three-way arrays and, thereafter, k-way arrays. These have been the Parafac model (Harshman, 1970) and Candecomp by Carroll and Chang (1970), which are actually the same model, and the Tucker model (PCA-3) (Tucker, 1966; Kruskal, 1977; Kroonenberg and De Leeuw, 1980). The main problem lies, however, in the manner in which the data may be represented, and for what optimization. The latter authors, focusing on SVD and orthogonal matrices, have systematically used the Kronecker product. It is in this context that the Tucker model (PCA-3) was introduced.

Apart from the Kronecker product, a new algebraic approach, based on the tensor product, was introduced by Franc (1992) in his thesis. A k-way array is seen as a tensor of order k, an element of the tensor product of k vector spaces. This new approach enabled Franc to describe algebraically and analytically existing models such as Candecomp, Parafac, and PCA-3, and extend them to k modes without difficulty. Preliminary work was conducted in this area by Kaptein et al. (1986) and Yoshisawa (1987) to extend models in PCA over three modes, for which an extension of the algebraic framework was also required. These models, along with other models, are described in Leibovici (1993).

The purpose of this presentation is to base an extension of PCA to a PCA with k modes on the tensorial approach, by deriving singular values and the SVD of the tensor, in order to obtain a theorem similar to that of Eckart and Young (1936). In the second section, simple theoretical elements of the tensor product are described. Two further sections are devoted to the explanation of the SVD for a tensor of order 2, 3, or k. An algorithm to obtain the SVD-k will be shown in Section 5, and a generalization of the Eckart-Young theorem for a tensor of order k in Section 6. This last part leads to the elaboration of a method termed principal tensor analysis over k modes (PTA-k), which can be used as a standard method for multiway multidimensional analysis, as PCA is for multidimensional analysis.

2. TENSOR PRODUCT AND MULTIWAY ARRAYS
Firstly, it is essential to recall some definitions for a simple construction of the tensor product and some of the main properties of the calculus from which subsequent methodologies will be derived. These points are given in greater detail in Chambadal and Ovaert (1968), Schwartz (1975), Charles and Allouch (1984), and Lang (1984).
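DEFINITION 1. (i) Let $E_1, \ldots, E_k$ be k Euclidean vector spaces of finite dimensions, with metrics $D_1, \ldots, D_k$. With a k-tuple $(a_1, \ldots, a_k)$ of vectors in these spaces, let the element denoted $a_1 \otimes a_2 \otimes \cdots \otimes a_k$ be a k-linear map on $E_1 \times E_2 \times \cdots \times E_k$ defined by

$$(a_1 \otimes a_2 \otimes \cdots \otimes a_k)(x_1, \ldots, x_k) = \prod_{t=1}^{k} \langle a_t, x_t \rangle_{E_t}, \tag{1}$$

where $\langle\ ,\ \rangle_{E_t}$ indicates the inner product in $E_t$. This element is termed a decomposed tensor.

(ii) The space generated by all the decomposed tensors is termed the tensor product of the k spaces $E_1, \ldots, E_k$, and is denoted $E_1 \otimes E_2 \otimes \cdots \otimes E_k$. Its dimension is the product of the dimensions.

(iii) The inner product in $E_1 \otimes E_2 \otimes \cdots \otimes E_k$ is defined for two decomposed tensors as

$$\langle a_1 \otimes \cdots \otimes a_k,\ b_1 \otimes \cdots \otimes b_k \rangle_{E_1 \otimes E_2 \otimes \cdots \otimes E_k} = \prod_{t=1}^{k} \langle a_t, b_t \rangle_{E_t}. \tag{2}$$

Let $\{e_{j i_j}\}$ be a basis of $E_j$, and X and A be two tensors:

$$X = \sum_{i_1 i_2 \ldots i_k} X_{i_1 i_2 \ldots i_k}\, e_{1 i_1} \otimes e_{2 i_2} \otimes \cdots \otimes e_{k i_k}, \qquad A = \sum_{i_1 i_2 \ldots i_k} A_{i_1 i_2 \ldots i_k}\, e_{1 i_1} \otimes e_{2 i_2} \otimes \cdots \otimes e_{k i_k}.$$

Then

$$\langle X, A \rangle_{E_1 \otimes E_2 \otimes \cdots \otimes E_k} = {}^t\vec{X}\,(D_1 \,\underline{\otimes}\, D_2 \,\underline{\otimes}\, \cdots \,\underline{\otimes}\, D_k)\,\vec{A},$$

where $\underline{\otimes}$ means the Kronecker product, and $\vec{X}$ is the vectorialization of X, i.e., its representation as a vector of length $\dim(E_1 \otimes E_2 \otimes \cdots \otimes E_k)$. This definition of the tensor leads to the expression

$$(E_1 \otimes E_2 \otimes \cdots \otimes E_k)^* = E_1^* \otimes E_2^* \otimes \cdots \otimes E_k^*, \tag{3}$$

where $*$ means the dual space.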
(iv) Chambadal and Ovaert (1968) generalize the assertion (3), defining the tensor product of two linear applications. Let $A_1 : E_1 \to F_1$ and $A_2 : E_2 \to F_2$; then let $A : E_1 \otimes E_2 \to F_1 \otimes F_2$ be such that $A(x_1 \otimes x_2) = A_1(x_1) \otimes A_2(x_2)$; this unique linear application is expressed as $A = A_1 \otimes A_2$.

(v) A useful operation is proposed by Schwartz (1975), generalizing the image of a vector by a linear application as the contracted product of a vector by a tensor, here denoted $..$ (no notation having been given by the author). It consists of tensor multiplication of the tensor and the vector, followed by contraction on the space to which the vector belongs. For example, let A be a tensor of $E \otimes F \otimes G$, and let $\{e_i\}_{1,n}$, $\{f_j\}_{1,q}$, and $\{g_k\}_{1,p}$ be bases of E, F, and G:

$$A = \sum_{ijk} A_{ijk}\, e_i \otimes f_j \otimes g_k.$$

Consider a vector $z^* \in G^*$. Then

$$A..z^* = \sum_{ijk} A_{ijk}\, e_i \otimes f_j\, \langle g_k, z^* \rangle = \sum_{ijk} A_{ijk} z_k\, e_i \otimes f_j. \tag{4}$$
$A..z^*$ is an element of $E \otimes F$. With z an element of G, $A..z$ will often be expressed in the same way, explaining a contraction as an inner product. In (4), $\langle g_k, z^* \rangle$ is then changed to $\langle g_k, z \rangle_G$. Thus the inner product of two tensors can be seen as the contracted product between them, and so the metric may be expressed [see (iv)] as

$$\langle A, X \rangle_{E_1 \otimes E_2 \otimes \cdots \otimes E_k} = A..X = A..(D_1 \otimes D_2 \otimes \cdots \otimes D_k)..X. \tag{5}$$
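The two expressions of the inner product, the vectorialized Kronecker form of Definition 1(iii) and the contracted form (5), can be compared numerically. The following is only a sketch under stated assumptions: NumPy is assumed, the dimensions and the helper `random_metric` are hypothetical, and the vectorialization is taken in row-major (lexicographic) order.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_metric(m, rng):
    """Return a random symmetric positive definite m x m matrix (hypothetical metric)."""
    B = rng.standard_normal((m, m))
    return B @ B.T + m * np.eye(m)

dims = (2, 3, 4)                       # illustrative dimensions of E_1, E_2, E_3
D1, D2, D3 = (random_metric(m, rng) for m in dims)

X = rng.standard_normal(dims)          # coordinates of two tensors of order 3
A = rng.standard_normal(dims)

# Contracted form: each metric acts on its own mode, then everything is contracted.
inner_contracted = np.einsum("abc,ai,bj,ck,ijk->", X, D1, D2, D3, A)

# Vectorialized form: t(vec X) (D1 kron D2 kron D3) vec A, lexicographic index order.
K = np.kron(np.kron(D1, D2), D3)
inner_kron = X.ravel() @ K @ A.ravel()

# For two decomposed tensors the inner product factorizes over the modes, as in (2).
a = [rng.standard_normal(m) for m in dims]
b = [rng.standard_normal(m) for m in dims]
Ta = np.einsum("i,j,k->ijk", *a)
Tb = np.einsum("i,j,k->ijk", *b)
inner_decomposed = np.einsum("abc,ai,bj,ck,ijk->", Ta, D1, D2, D3, Tb)
inner_factorized = np.prod([u @ D @ v for u, D, v in zip(a, (D1, D2, D3), b)])
```

The Kronecker form builds a matrix whose side is the product of all the dimensions, whereas the contracted form never leaves the k-way layout; this is one practical reading of the flexibility attributed to the tensor product.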
REMARK 1. (1) Note that (4) can be obtained by transforming A to a matrix $A_G$ with $\dim(E \otimes F)$ rows and $\dim(G)$ columns, i.e., with qn rows and p columns. Computing the image of z by this matrix leads to

$$A..z^* = A_G\, z^*. \tag{6}$$

If complete vectorialization is put into bijection, for example

$$E \otimes F \otimes G \quad \text{and} \quad L(\mathbb{R};\ E \otimes F \otimes G), \tag{7}$$

then the indexed vectorialization as in (6) identifies $E \otimes F \otimes G$ as $L(G^*;\ E \otimes F)$.

(2) The fundamental difference between $\underline{\otimes}$ and $\otimes$ is that the Kronecker product operates with a specific and fixed choice of base (lexicographic order of indices), i.e., $\otimes$ is algebraic, whereas $\underline{\otimes}$ is arithmetic (on coordinates). The advantage of the tensor product is the flexibility of its representations. They depend on the operation applied.

(3) Using the contracted product with the inner product enables one to have an underlying use of metrics.
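Remark 1(1) and (2) can be illustrated on coordinates. This is a minimal sketch assuming NumPy, identity metrics, and a row-major ordering of the pair (i, j) for the rows of the matricized tensor; the dimensions and variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
n, q, p = 3, 4, 5                      # illustrative dim E, dim F, dim G
A = rng.standard_normal((n, q, p))     # coordinates A_ijk
z = rng.standard_normal(p)

# Contracted product A..z: sum over the index of G, leaving an element of E (x) F.
A_dot_z = np.einsum("ijk,k->ij", A, z)

# Matricized version: nq rows and p columns, then an ordinary matrix-vector product.
A_G = A.reshape(n * q, p)
image = A_G @ z

# The complete vectorialization of a decomposed tensor is a Kronecker product
# once the lexicographic order of indices has been fixed.
e, f, g = rng.standard_normal(n), rng.standard_normal(q), rng.standard_normal(p)
vec_tensor = np.einsum("i,j,k->ijk", e, f, g).ravel()
vec_kron = np.kron(np.kron(e, f), g)
```

Reshaping `image` back to n rows and q columns recovers `A_dot_z`; choosing another index order would change `A_G` and `vec_kron` but not the tensor itself.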
There are several important properties of the tensor product which may be considered fundamental to factorial data analysis.
PROPERTY 1. (a) Definition 1(iii) describes the universal property of the tensor product, which is generally taken for the definition and construction of the tensor product. For any bilinear map S on $E \times F$, the tensor product space implies the commutative diagram

$$E \times F \xrightarrow{\ \otimes\ } E \otimes F \xrightarrow{\ s\ \text{(linear)}\ } \cdot\,, \qquad S\ \text{(bilinear)} = s \circ \otimes. \tag{8}$$

(b) The tensor product of two subspaces of E and F is a subspace of $E \otimes F$.

(c) $E \otimes F^*$ is isomorphic to $L(F; E)$, the space of linear maps from F to E, and $E^* \otimes F$ is isomorphic to $L(E; F)$. Even if $E \otimes F \neq F \otimes E$, they are isomorphic.

(d) The operation $\otimes$ is associative.
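Property 1(c) and (d) can be checked on coordinates. The sketch below assumes NumPy and canonical bases with identity metrics; the vectors and sizes are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(2)
x, y, w = rng.standard_normal(2), rng.standard_normal(3), rng.standard_normal(4)

# (d) Associativity, checked on coordinates through the Kronecker product.
left = np.kron(np.kron(x, y), w)
right = np.kron(x, np.kron(y, w))

# E (x) F and F (x) E give different coordinate arrays,
# but swapping the two indices maps one onto the other (the isomorphism).
xy = np.outer(x, y)          # coordinates of x (x) y
yx = np.outer(y, x)          # coordinates of y (x) x

# (c) A tensor of order two acts as a linear map from F to E by contracting on F.
T = rng.standard_normal((2, 3))
v = rng.standard_normal(3)
image = np.einsum("ij,j->i", T, v)
```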
By Property 1(c) a matrix is identified with a linear map and with a tensor of order two: $E \otimes F \simeq L(F^*; E) \simeq M(n; q; \mathbb{R})$. The factorial analysis methods can thus be described by tensor calculus. This approach can be generalized to an array with k ways, by consideration of the latter as a tensor of order k, i.e., an element of a tensor product of k vector spaces. In practice, and in our presentation, those spaces will be $\mathbb{R}^{m_t}$, where $m_t$ is the number of cells in way t.
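In this practical setting a tensor of order k is simply a k-way array of shape $(m_1, \ldots, m_k)$. The following sketch, assuming NumPy and identity metrics, builds a decomposed tensor from one vector per way and evaluates it as a k-linear map in the sense of Definition 1(i); the sizes in `m` are hypothetical.

```python
import numpy as np
from functools import reduce

rng = np.random.default_rng(3)
m = (2, 3, 4, 5)                               # m_t = number of cells in way t
a = [rng.standard_normal(mt) for mt in m]      # one vector per way
x = [rng.standard_normal(mt) for mt in m]

# Coordinates of the decomposed tensor a_1 (x) ... (x) a_k as a k-way array.
T = reduce(np.multiply.outer, a)

# Evaluate T as a k-linear map on (x_1, ..., x_k) by contracting each way in turn.
value = T
for xt in x:
    value = np.tensordot(xt, value, axes=(0, 0))
value = float(value)

# With identity metrics this equals the product of the ordinary inner products.
expected = np.prod([at @ xt for at, xt in zip(a, x)])
```

The array `T` has as many entries as the product of the dimensions, in line with Definition 1(ii).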
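3. SINGULAR VALUES FOR TWO MODES

Let $S_X : E^* \times F^* \to \mathbb{R}$ be the bilinear map defined by $S_X(e_i^*, f_j^*) = X_{ij}$, with $\{e_i^*\}_{1,n}$ and $\{f_j^*\}_{1,q}$ the canonical bases of the spaces. The universal property of the tensor product implies

$$E^* \times F^* \xrightarrow{\ \otimes\ } E^* \otimes F^* \xrightarrow{\ s_X\ } \mathbb{R}, \qquad S_X = s_X \circ \otimes. \tag{9}$$

Then for all $\psi^*$ and $\varphi^*$ in $E^*$ and $F^*$,

$$S_X(\psi^*, \varphi^*) = s_X(\psi^* \otimes \varphi^*) = \langle X..\psi, \varphi \rangle_F = X..(\psi \otimes \varphi). \tag{10}$$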
PROPERTY 2. (i) The first singular value can be expressed by different maximizations:

$$\sigma_1 = \max_{\|\psi^*\|_{E^*} = 1,\ \|\varphi^*\|_{F^*} = 1} s_X(\psi^* \otimes \varphi^*) = \max_{\|\psi^*\|_{E^*} = 1,\ \|\varphi^*\|_{F^*} = 1} \langle \psi^* \otimes \varphi^*, X \rangle = \max_{\|\psi\|_E = 1,\ \|\varphi\|_F = 1} X..(\psi \otimes \varphi) = \langle \psi_1 \otimes \varphi_1, X \rangle_{E \otimes F} = X..(\psi_1 \otimes \varphi_1). \tag{11}$$

(ii) The tensor solution in (11) is unique up to an orthogonal transformation leaving X invariant.

Proof. It is a maximization of a continuous linear map over

$$\{\psi \otimes \varphi \mid \|\psi\|_E = 1,\ \|\varphi\|_F = 1\} \subset \{X \mid \|X\|_{E \otimes F} = 1\}, \tag{12}$$

which is closed in a compact set (the unit sphere); thus it is itself compact. This implies the existence of $\sigma_1$. The uniqueness is because of the linear map. ∎
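The maximization in (11) can be probed numerically. This is a sketch assuming NumPy, identity metrics, and the matrix SVD routine `np.linalg.svd` as a reference; random sampling of unit vectors only illustrates the bound and does not prove it.

```python
import numpy as np

rng = np.random.default_rng(4)
n, q = 5, 4
X = rng.standard_normal((n, q))

U, s, Vt = np.linalg.svd(X)
psi1, phi1, sigma1 = U[:, 0], Vt[0], s[0]

# Value of the bilinear form at the first pair of singular vectors: X..(psi_1 (x) phi_1).
value_at_solution = np.einsum("ij,i,j->", X, psi1, phi1)

# Bilinear form on many random unit vectors: none should exceed sigma_1.
psi = rng.standard_normal((10000, n))
phi = rng.standard_normal((10000, q))
psi /= np.linalg.norm(psi, axis=1, keepdims=True)
phi /= np.linalg.norm(phi, axis=1, keepdims=True)
sampled = np.einsum("ij,ti,tj->t", X, psi, phi)
```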
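In expressing the Lagrange problem associated with this maximization, the classical transition formulae which lead to the eigenequations of the well-known operators are found. In matrix form these are

$$X\varphi = \sigma\psi, \qquad {}^tX\psi = \sigma\varphi, \qquad X\,{}^tX\psi = \sigma^2\psi, \qquad {}^tX X\varphi = \sigma^2\varphi. \tag{13}$$

If there are metrics D and Q on E and F respectively, (13) becomes

$$XQ\varphi = \sigma\psi, \qquad {}^tXD\psi = \sigma\varphi, \qquad XQ\,{}^tXD\psi = \sigma^2\psi, \qquad {}^tXDXQ\varphi = \sigma^2\varphi. \tag{14}$$

In a tensorial form the transition formulae are

$$X..\varphi = \sigma\psi, \qquad X..\psi = \sigma\varphi, \tag{15}$$

where X is the tensor equivalent to the matrix X.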
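The transition formulae suggest a simple alternating scheme: apply one formula, normalize, apply the other. The sketch below is an illustration under assumptions rather than the algorithm of the later sections: NumPy is assumed, the function name `first_singular_triplet` and its parameters are hypothetical, and the test data are built with diagonal metrics so that the singular values relative to D and Q are known in advance.

```python
import numpy as np

def first_singular_triplet(X, D=None, Q=None, n_iter=200, seed=0):
    """Alternate X Q phi = sigma psi and tX D psi = sigma phi, as in (14)."""
    n, q = X.shape
    D = np.eye(n) if D is None else D
    Q = np.eye(q) if Q is None else Q
    rng = np.random.default_rng(seed)
    phi = rng.standard_normal(q)
    phi /= np.sqrt(phi @ Q @ phi)           # unit norm for the metric Q
    for _ in range(n_iter):
        psi = X @ Q @ phi
        psi /= np.sqrt(psi @ D @ psi)       # unit norm for the metric D
        phi = X.T @ D @ psi
        sigma = np.sqrt(phi @ Q @ phi)
        phi /= sigma
    return sigma, psi, phi

# Illustrative data whose singular values relative to (D, Q) are 3, 1.5, 1, 0.5.
rng = np.random.default_rng(5)
d = np.array([1.0, 2.0, 0.5, 1.5, 1.0])
w = np.array([2.0, 1.0, 0.5, 1.0])
D, Q = np.diag(d), np.diag(w)
Q1 = np.linalg.qr(rng.standard_normal((5, 5)))[0][:, :4]
Q2 = np.linalg.qr(rng.standard_normal((4, 4)))[0]
Psi = Q1 / np.sqrt(d)[:, None]              # columns are D-orthonormal
Phi = Q2 / np.sqrt(w)[:, None]              # columns are Q-orthonormal
X = Psi @ np.diag([3.0, 1.5, 1.0, 0.5]) @ Phi.T

sigma, psi, phi = first_singular_triplet(X, D, Q)
```

Calling the function without `D` and `Q` uses identity metrics and corresponds to (13). The fixed iteration count is a simplification; a practical version would stop on a convergence criterion.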
The other singular values can be obtained by consideration of orthogonality constraints in the optimization. Lemma 1, given in Section 4 (for the generalization), enables us to affirm the existence and uniqueness. A priori there is a choice with regard to orthogonality: in E or F or both. Let $E_1$ and $F_1$ be the subspaces generated by the first solution, so that

$$E \otimes F = (E_1 \oplus E_1^{\perp}) \otimes (F_1 \oplus F_1^{\perp}) = (E_1 \otimes F_1) \oplus (E_1^{\perp} \otimes F_1^{\perp}) \oplus (E_1 \otimes F_1^{\perp}) \oplus (E_1^{\perp} \otimes F_1) = (E_1 \otimes F_1) \oplus (E_1 \otimes F_1)^{\perp}; \tag{16}$$

note that $E_1^{\perp} \otimes F_1^{\perp} \subset (E_1 \otimes F_1)^{\perp}$. Given the duality [(13) or (15)] in the first solution, the maximization with constraint in $(E_1 \otimes F_1)^{\perp}$ is in fact in $E_1^{\perp} \otimes F_1^{\perp}$, i.e., the orthogonal projections of X on the other subspaces of the orthogonal space lead to the null tensor. Thereafter, this space is termed the orthogonal-tensorial space of the subspaces $E_1$ and $F_1$.

REMARK 2. The well-known core matrix in the PCA-3 of Kroonenberg and De Leeuw (1980) derives from this observation and from the lack of duality in these solutions. That is to say, for three modes the solution in the orthogonal space of the first solution is not always in the orthogonal-tensorial space of the preceding solution.

After reiterating the process of solution for singular values or, in this case, after diagonalization (13), an orthogonal decomposition of the tensor, the singular value decomposition SVD-2, may be expressed as

$$X = \sum_{i=1}^{\operatorname{rank} X} \sigma_i\, \psi_i \otimes \varphi_i, \tag{17}$$

or in matrix form,

$$X = \sum_{i=1}^{\operatorname{rank} X} \sigma_i\, \psi_i\, {}^t\varphi_i. \tag{18}$$
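Both the decomposition (16) and the sum (18) can be verified on a small matrix. This is a sketch assuming NumPy and identity metrics, with `np.linalg.svd` standing in for the reiterated optimization; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
n, q = 5, 4
X = rng.standard_normal((n, q))

U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Sum of decomposed tensors sigma_i psi_i (x) phi_i, as in the SVD-2.
terms = [s[i] * np.outer(U[:, i], Vt[i]) for i in range(len(s))]
X_rebuilt = np.sum(terms, axis=0)

# Orthogonal projectors on E_1, F_1 and on their orthogonal complements.
psi1, phi1 = U[:, 0], Vt[0]
P_E1, P_F1 = np.outer(psi1, psi1), np.outer(phi1, phi1)
P_E1c, P_F1c = np.eye(n) - P_E1, np.eye(q) - P_F1

# Projections of X on the four subspaces of the decomposition (16).
on_E1_F1 = P_E1 @ X @ P_F1
on_E1_F1c = P_E1 @ X @ P_F1c
on_E1c_F1 = P_E1c @ X @ P_F1
on_E1c_F1c = P_E1c @ X @ P_F1c

# The next singular value is the first one of the part left in the
# orthogonal-tensorial space.
sigma2 = np.linalg.svd(on_E1c_F1c, compute_uv=False)[0]
```

The two mixed projections vanish numerically, so the search for the second solution can be restricted to the orthogonal-tensorial space, as stated for two modes.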
The well-known matrix approximation theorem may thus be formulated; it permits the performance of a PCA:

THEOREM 1 (Eckart and Young, 1936). The best rank r (r < q) approximation of a rank q matrix X, according to the norm coming from the inner product in $E \otimes F$, is given by the matrix built with the first r tensors of the SVD:

$$X_r = \sum_{i=1}^{r} \sigma_i\, \psi_i \otimes \varphi_i,$$

the squared distance being

$$\min_{Z,\ \operatorname{rank} Z = r} \|X - Z\|^2 = \|X - X_r\|^2 =$$