Towards a standardized notation and terminology in ... - CiteSeerX

11 downloads 75950 Views 129KB Size Report
used three-way models and their multiway generalizations [3,5]. ..... given names in addition to symbols, we call Xa the mode A matricized version of X, Xb the.
JOURNAL OF CHEMOMETRICS J. Chemometrics 2000; 14: 105–122

Towards a standardized notation and terminology in multiway analysis Henk A. L. Kiers*y Heymans Institute (PA), University of Groningen, Grote Kruisstraat 2/1, NL-9712 TS Groningen, The Netherlands

SUMMARY This paper presents a standardized notation and terminology to be used for three- and multiway analyses, especially when these involve (variants of) the CANDECOMP/PARAFAC model and the Tucker model. The notation also deals with basic aspects such as symbols for different kinds of products, and terminology for threeand higher-way data. The choices for terminology and symbols to be used have to some extent been based on earlier (informal) conventions. Simplicity and reduction of the possibility of confusion have also played a role in the choices made. Copyright  2000 John Wiley & Sons, Ltd. KEY WORDS:

multiway data; tensors; three-way methods

1.

INTRODUCTION

The first proposals for three- and higher-way generalizations of factor and principal component analysis date back to the 1960s [1,2] and early 1970s [3,4]. The model proposed by Tucker [1,2] generalized the principal component and factor analysis model in that it used one component matrix for all three ‘ways’ of a three-way data array; these component matrices are related to each other by a so-called core array. The model proposed independently by Carroll and Chang [3] and Harshman [4], called the ‘CANDECOMP’ and ‘PARAFAC’ model respectively, also uses component matrices for all three ways, but in their model each component is related to only one component of each of the other two ways. The latter aspect entails that its solution is (under mild conditions) unique. This uniqueness has made the model very popular for purposes of estimation of parameters in situations where two-way data would not allow for unique estimation (e.g. estimating concentrations of chemical analytes in a mixture). In certain situations, however, the model is too restrictive or the uniqueness conditions are not satisfied. In such situations the model proposed by Tucker offers a useful alternative. The above two models can be considered the fundamental models underlying most currently * Correspondence to: H. A. L. Kiers, Heymans Institute (PA), University of Groningen, Grote Kruisstraat 2/1, NL-9712 TS Groningen, The Netherlands. y E-mail: [email protected]

Copyright  2000 John Wiley & Sons, Ltd.

106

H. A. L. KIERS

used three-way models and their multiway generalizations [3,5]. Unfortunately, the models, having been developed within psychometrics, have not always been widely known and have been reinvented under different names. Moreover, even when the models are properly attributed, the notation in which they are described differs widely over different papers and disciplines, and so does the terminology employed (even as far as the names of the models are concerned). Some informal attempts have been made towards a standard notation and terminology, but this has not been successful yet. The present paper is an attempt to formally standardize notation and terminology of the most important aspects in multiway analysis. The proposal takes into account any informal conventions the author is aware of; in cases of several competing choices encountered in the literature, mnemonic simplicity and conceptual clarity are the main criteria for adopting a standard. 2. 2.1.

DATA

General

In statistics, data are generally denoted by the symbol X. In the case of two-way data (i.e. a data matrix) the bold-face version X is used. To distinguish the more or less standard two-way data sets from the less common three- and higher-way data sets, the latter are indicated by an underlined bold-face X. When, in addition to X, there are other data sets, it is preferred to use subscripts or to denote them by other letters at the far end of the alphabet (V, W, Y, Z). For example, in the case of regression involving three-way data, the three-way array pertaining to the predictor variables is denoted by X and that pertaining to the criterion variables by Y. Data to be analysed are usually real numbers, but in some instances they are complex. In the present paper, only real-valued data are considered. Mostly, the notation and properties mentioned in the present paper readily generalize to complex-valued data, but one should be aware of exceptions (e.g. the complex conjugate differs from the simple transpose). 2.2.

Indices

The elements of a three-way array X are denoted by xijk, where the indices are taken to run from 1 to their capital version: i = 1,…,I, j = 1,…,J, k = 1,…,K. For higher-way arrays, additional subscripts are used, chosen as the next letters in the alphabet (e.g. an element of a six-way array is denoted as xijklmn). In some higher-way cases one may thus run out of symbols (considering that we need other running indices as well). In that case one can resort to the more complex use of indices i1, i2, i3, i4, etc. running from 1 to I1, I2, I3, I4, etc. 2.3.

Tensors

An N-way data array is sometimes generically called a tensor (see e.g. Reference [6], p. 10). Thus the notion ‘tensor’ captures arrays of different sizes: a vector of order I is a tensor in RI, an I  J matrix is a tensor in RI  J, an I  J  K three-way array is a tensor in RI  J  K, etc. In general, an I1  I2  …  IN N-way array is a tensor in RI1  I2  …  IN. 2.4.

Modes of an array

In Figure 1 a three-way array is depicted. The entities along the vertical axis are indicated by the first index (i), those along the horizontal axis by the second index (j) and those along the depth axis by the third index (k). Note that the use of the first and second indices is the same as what is common for matrices. The three sets of entities define the three ‘ways’ or three Copyright  2000 John Wiley & Sons, Ltd.

J. Chemometrics 2000; 14: 105–122

NOTATION AND TERMINOLOGY IN MULTIWAY ANALYSIS

107

Figure 1. Three-way array, cut into (a) horizontal, (b) lateral and (c) frontal slices.

‘dimensions’ of the three-way array. Because the term ‘way’ is not very specific and the term ‘dimension’ may be confused with the concept of dimensions of a factor space in factor analysis, here, following Tucker [7], we use the term ‘mode’ to refer to a set of entities. In the case of three-way arrays the first mode is denoted as mode A, the second mode as mode B and the third mode as mode C. This terminology can be extended to higher-way data (leading to modes D, E, etc.), but often for higher-way data it is more convenient to denote the modes by numbers, hence as mode 1, mode 2,…, mode N. Copyright  2000 John Wiley & Sons, Ltd.

J. Chemometrics 2000; 14: 105–122

108 2.5.

H. A. L. KIERS

One-mode, two-mode and three-mode three-way data

Carroll and Arabie [8] distinguished three-way data into three types as follows. If the three modes pertain to three different sets of entities, then the data are denoted as three-mode threeway data or simply three-way data. If two modes pertain to the same set of entities (e.g. when dealing with correlations between variables or (a)symmetric proximities between objects) and hence only two different modes are involved, this can be indicated by specifying such threeway data as two-mode three-way data. Finally, if all three modes pertain to the same set of entities (e.g. in the case of transitions between a number of states over three consecutive time points), the data can be denoted as one-mode three-way data. This terminology can easily be extended to N-way data by simply counting the number of different modes, but is not always fruitful then, because it does not distinguish, for instance, between four-way arrays with two modes that both occur twice and four-way arrays with one mode that occurs three times and one mode that occurs only once. There is one exception in which there is no doubt about the meaning: one-mode N-way data pertain to data in which all modes pertain to the same entities. Incidentally, it should be noted that even in cases where some of the modes are equal, we will still refer to them by different names (mode A, mode B, mode C, etc. or mode 1, mode 2,…, mode N), as described in Section 2⋅4. 2.6.

Matrices, vectors and other subarrays of three- and higher-way arrays

A three-way array is frequently considered in terms of a set of matrices. These matrices form the horizontal, lateral and frontal slices of the three-way array, as visualized in Figures 1a–1c. Specifically, the I horizontal slices pertain to the entities i = 1,…,I of mode A, the J lateral slices pertain to the entities j = 1,…,J of mode B and the K frontal slices pertain to the entities k = 1,…,K of mode C. For higher-way arrays, submatrices can be defined as well, but it no longer makes sense to give them intuitive names as is done in the three-way case. Sometimes it is useful to consider a three-way array as a set of vectors. Then three sets of such vectors (denoted as fibers) can be distinguished, namely vertical fibers, horizontal fibers and ‘depth’ fibers, which run over the mode A entities, mode B entities and mode C entities respectively, as visualized in Figures 2a–2c. To compare with matrices: horizontal fibers are rows and vertical fibers are columns. In general, these vectors are called ‘mode n vectors’ (compare Reference [6]), where n denotes the mode over which the vectors run. They are associated with particular vector spaces, to be called the ‘mode n spaces’. Thus for a threeway array the mode A space is the (sub)space spanned by all vertical fibers, the mode B space is the (sub)space spanned by all horizontal fibers and the mode C space is the (sub)space spanned by all depth fibers. These spaces are subspaces of RI, RJ and RK respectively. To give a numerical example, consider the 4  3  2 array with frontal planes

Figure 2. Three-way array, cut into (a) horizontal, (b) vertical and (c) depth fibers.

Copyright  2000 John Wiley & Sons, Ltd.

J. Chemometrics 2000; 14: 105–122

NOTATION AND TERMINOLOGY IN MULTIWAY ANALYSIS

0

1 B 0 B @ 1 ÿ1

2 1 0 0

1

0

0 0 C C ÿ1 A 0

and

0 B1 B @1 0

1 0 2 0

109

1

0 0C C 0A 1

The vectors 0 1 2 B1C B C @0A 0

and

0 1 0 B0C B C @0A 1

are two vertical fibers (or mode A vectors) of the array, the vectors (1 2 0), (1 071) and (1 2 0) are examples of three horizontal fibers (or mode B vectors) and the vectors (1 0), (2 1) and (0 1) are examples of depth fibers (or mode C vectors). The mode A space is the space spanned by the vectors 0

1 1 B 0 C B C @ 1 A; ÿ1

0 1 2 B1C B C; @0A 0

0

1 0 B 0 C B C @ ÿ1 A; 0

0 1 0 B1C B C; @1A 0

0 1 1 B0C B C @2A 0

and

0 1 0 B0C B C @0A 1

which can readily be seen to be the whole R4. The mode B and mode C spaces are defined analogously. In general, for N-way arrays we can define subarrays of any size desired. For instance, one may define the three-way arrays that are a subset of a four-way array. Such arrays can hardly be given intuitive names as we did for matrices and vectors embedded within three-way arrays, but they can be specified using a notation system similar to that used in MATLAB, where we use subscript colons for the modes that remain intact and ordinary indices to indicate the entities with which the subarray is associated. For example, Xi::, X:j: and X::l then denote the horizontal, lateral and frontal planes of a three-way array respectively, and examples of submatrices of a four-way array are Xi::l and X:jk:. Fibers (or mode n vectors) can be denoted by small bold letters using the same indexing system. For example, x:jk denotes a mode A vector (vertical fiber) in a three-way array and xij:l denotes a mode C vector in a fourway array. Furthermore, three- or higher-way subarrays then are denoted by underlined bold capitals, again with the same indexing system. For example X::kl: denotes the three-way subarray of the five-way array X associated with the kth entity of mode 3 and the lth entity of mode 4. Thus any subarray of an N-way array can be identified unambiguously. In practice, the colons will often be omitted, as in many cases there will be no reason for confusion. A common situation is that where a three-way array is only subdivided into its frontal planes. These are then simply denoted as X1,…,XK. 2.7.

Matricization: transforming a three-way or N-way array into a matrix

It is sometimes fruitful to collect all mode n vectors in a single matrix. The supermatrix with all vertical fibers of a three-way array collected next to each other in an I  JK matrix, with mode B entities (j = 1,…,J) nested within mode C entities (k = 1,…,K), is denoted as Xa. This matrix simply contains all the frontal slices of the array next to each other (see Figure 3a). The Copyright  2000 John Wiley & Sons, Ltd.

J. Chemometrics 2000; 14: 105–122

110

H. A. L. KIERS

Figure 3. Matricizing a three-way array: (a) mode A matricization; (b) mode B matricization; (c) mode C matricization.

process of rearranging the elements of X into Xa is often called ‘unfolding’ in chemometrics, but this term is confusing, because in psychometrics unfolding [9] is a particular technique for multidimensional scaling of data with distances between two sets of entities. Here this process is denoted as ‘matricizing’ a three-way array into a matrix (in analogy to the more common term ‘vectorizing’ for rearranging a matrix into a vector), and the reverse process is then called ‘reshaping’ a matrix into a three-way array. Other matricizations are those that form the supermatrices Xb (of order J  KI, with mode C entities nested within mode A entities) and Xc (of order K  IJ, with mode A entities nested within mode B entities) (see Figure 3b and 3c). Other nestings are possible, but without further specification, matricization pertains to one of the above procedures. The above matricizations are related to each other by a simple cyclic permutation of the modes. Analogous to Xa containing frontal planes of X next to each other, Xb contains frontal planes of the three-way array that is obtained upon once cyclically permuting the modes of X (i.e. the second index becomes the first, the third index becomes the second and the first index becomes the third), and Xc contains frontal planes of the three-way array that is obtained upon twice permuting the modes of X. For the numerical example in Section 2⋅6 we have that Copyright  2000 John Wiley & Sons, Ltd.

J. Chemometrics 2000; 14: 105–122

111

NOTATION AND TERMINOLOGY IN MULTIWAY ANALYSIS

0

1 B 0 B Xa ˆ B @ 1 0

ÿ1

1

2 1

0 0

0 1

1 0

0

ÿ1

1

2

0 0C C C 0A

0

0

0

0

1 1

B Xb ˆ @ 2

0

0

1

1

1

ÿ1

1

1

0

0

2

0

C 0A

0

0

0

0

ÿ1

0

0

1

1

0

1

ÿ1

2

1

0

0

0

0

ÿ1

0

0

1

1

0

1

0

2

0

0

0

0

1

 Xc ˆ

1

0



Matricization is not only useful for three-way arrays. In fact, N-way data usually are to be read from file in a two-way structure. An N-way array is ‘matricized’ in essentially the same way as a three-way array. For instance, Xa (or X1 if it is preferred to denote modes by numbers rather than letters, as is the case when there are many modes) contains all vertical fibers collected next to each other in an I  JKLM… matrix, with in the columns the mode B entities nested within mode C entities, mode C entities nested within mode D entities, etc. Likewise, Xb (or X2) is the J  KLM…I matrix, with in the columns the mode C entities nested within mode D entities, mode D entities nested within mode E entities, etc., with mode A entities as the ‘outermost’ entities. Thus, again, Xb is obtained from the once cyclically permuted array X in the same way as Xa is obtained from X itself. This permutational equivalence makes programming with N-way arrays relatively straightforward, even when procedures are to employ matricized versions. If the matricized versions of arrays are to be given names in addition to symbols, we call Xa the mode A matricized version of X, Xb the mode B matricized version of X, etc., and Xa the mode n matricized version of X. Obviously, other possibilities exist for transforming a tensor into a matrix. The above procedures should be seen as the standard and preferable ones, and the use of others should always be explicitly described. For example, the procedure of writing an I  J  K  L arrays as an IJ  KL matrix can be denoted as ‘matricization by combining modes 1 and 2 (1 nested within 2) and modes 3 and 4 (3 nested within 4)’. 2.8.

Vectorization: transforming a three-way or N-way array into a matrix

Sometimes it is useful to represent all the elements of a three- or N-way array as a vector. This can be done by ‘vectorizing’ the array. For matrices, ‘vectorization’ is defined as putting the successive columns of the matrix below each other in a single vector. The vectorization of the matrix U into a vector u is denoted as u = Vec(U). To vectorize a three-way or higher-way array, we simply vectorize the mode A matricized version of it and obtain x = Vec(Xa) as the vectorized version of an array of arbitrary order. 2.9.

Mode n rank and tensor rank

To define the ‘rank’ of a three- or N-way array is more complex than for a matrix. In fact, two types of rank have been defined. The simplest is the mode n rank [6], which is defined as the rank of the mode n space (see Section 2⋅6). For instance, the mode A rank (or mode 1 rank) of an I  J  K  L array X is the rank of the space spanned by the JKL ‘mode A vectors’ x:jkl. Clearly, the mode A rank is the rank of the matrix Xa. The mode B rank (or mode 2 rank) of Copyright  2000 John Wiley & Sons, Ltd.

J. Chemometrics 2000; 14: 105–122

112

H. A. L. KIERS

this array is the rank of the space spanned by the IKL ‘mode B vectors’ xi:kl, hence it is the rank of Xb. The mode C and mode D ranks of X are defined analogously. For matrices the mode n rank can be viewed as the column rank (mode A rank) or row rank (mode B rank) of a matrix. Since the row and column ranks of matrices are known to be equal, for two-way arrays the mode A and mode B ranks are equal. For three- and higher-way arrays such an equality no longer holds. The usual definition of (tensor) rank [10] uses a decomposition of a tensor as a sum of ‘rank-1 tensors’. A rank-1 tensor is a tensor for which the elements can be written as xijkl… = aibjckdl…. The rank of a tensor then is the smallest number of rank-1 tensors sufficient to fully decompose the tensor additively. In the case of matrices this rank is equal to the row and column ranks, but for three- and higher-way arrays no such equivalence exists and the rank can actually be considerably higher than (and is never less than) any of the mode n ranks.

2.10.

Special tensors

A tensor which has all elements zero is called the ‘zero tensor’ (or ‘zero array’). For (hyper)cubic tensors, i.e. tensors with all modes of equal size, we have some special terminology. A tensor which has all elements zero except those for which all indices are the same (which are called the ‘superdiagonal’ elements) is called a ‘superdiagonal’ tensor. In the case where these superdiagonal elements all equal one, it is called the ‘unit superdiagonal’ tensor, denoted by the symbol I (chosen because, as far as its elements are concerned, it resembles the identity matrix). Note that the unit superdiagonal tensor should not to be denoted as ‘identity’, because it does not perform a role similar to that of the identity element (1) or matrix (I) in ordinary (matrix) algebra (see Reference [6], p. 26).

3.

SOME SPECIAL SYMBOLS AND PROPERTIES

In multiway analysis, certain specific matrix products and operators are often used. They will be summarized in this section. Here aij or ai,j is used to denote element (i,j) of a matrix A, and al denotes the lth column of A. Furthermore, a ‘prime’ denotes transposition of a matrix or vector. The Kronecker product is denoted by the symbol 6 and is defined according to (U6V)ik,jl = uij kl. Thus we have 0

u11 V U V ˆ @ ... uI1 V

... ... ...

1 u1J V ... A uIJ V

The columnwise Kronecker product (also denoted as the Khatri–Rao product; see Reference [11], p. 13), denoted by the symbol , can be computed between matrices of the same column order and is defined according to (U V)ik,l = uil kl. Hence, if U and V both have L columns, we have U V = (u16v1j…juL6vL). The elementwise or Hadamard product, denoted by the symbol *, can be computed between matrices of the same order only and is defined according to (U*V)ij = uijvij. In multiway analysis the latter product is typically encountered upon matrix multiplication of two columnwise Kronecker products, because (U V)'(U V) = (U'U)*(V'V). This property can be derived as follows: Copyright  2000 John Wiley & Sons, Ltd.

J. Chemometrics 2000; 14: 105–122

113

NOTATION AND TERMINOLOGY IN MULTIWAY ANALYSIS

0 B …U V†0 …U V† ˆ @ 0

u01 v01

1

C . . . A…u1 v1 j . . . juL vL † u0L v0L

u01 u1 v01 v1 B ... ˆ@

... ...

1 u01 uL v01 vL C ... A ˆ …U0 U†  …V0 V†

u0L u1 v0L v1

...

u0L uL v0L vL

The mode n multiplication, denoted by the symbol  n (see Reference [6], p. 15; see also Reference [12] for a similar definition), defines the multiplication of an array by a matrix along mode n of the array. Specifically, multiplication of the array G by the matrix Sn (with an arbitrary ˜ = G  n Sn and the outcome is number of rows and In columns) along mode n is written as G ˜ n = SnGn involving the mode n matricized versions of G equivalent to the matrix multiplication G ˜ , Gn and G ˜ n (of order In  I1I2…IN/In) respectively. For example, let X be the 2  2  2 and G array with frontal planes     1 0 2 1 and 0 2 0 0 and let  Sˆ

1 ÿ1

0 1



 and



0 2

1 1

Then X  1S can be obtained in mode A matricized version as     1 0 1 0 1 0 2 1 ˆ SXa ˆ ÿ1 1 ÿ1 1 0 2 0 0



2 ÿ2

1 ÿ1



and hence has as frontal planes 

1 ÿ1

0 2



 and

2 ÿ2

1 ÿ1



To find X  3U, we compute its mode C matricized version as     0 1 1 0 0 2 2 0 UXc ˆ ˆ 2 1 2 0 1 0 4 0 and from this recover its frontal planes as   2 1 0 0

 and

4 0

1 4

1 1

0 4





An overview of properties of the above special products can be found, for instance, in Reference [13]. There (p. 263) the following not very well-known but important property relating vectorization of a product of matrices to the Kronecker product is given: Vec…UVW† ˆ …W0 U†Vec…V†

Copyright  2000 John Wiley & Sons, Ltd.

…1†

J. Chemometrics 2000; 14: 105–122

114

H. A. L. KIERS

4. 4.1.

FUNDAMENTAL THREE-WAY MODELS

Names: Tucker models and CANDECOMP/PARAFAC

As already mentioned in Section 1, the fundamental models used in three-way analysis are the model proposed by Tucker [1] and that proposed by Carroll and Chang [3] and Harshman [4]. The former was termed the ‘three-mode factor analysis’ model by Tucker. Kroonenberg and De Leeuw [14], who proposed an algorithm for least squares fitting of this model, called their procedure ‘three-mode principal component analysis’ (to distinguish it from the more narrowly defined term factor analysis which usually is fitted by means of fitting covariances, using assumptions of uncorrelated unique factors). They also distinguished two special cases: the case where one mode is represented by as many components as there are entities (no reduction in this mode), and the case where no reduction takes place in two modes. Consequently, they denoted the original model as the Tucker3 model (reduction in all three modes), the model where only two modes were reduced was denoted as the Tucker2 model, and that where only one mode is reduced is called the Tucker1 model. This distinction between models is a useful one and is often made, albeit not necessarily by these names. Here it is proposed to generally follow this terminology for denoting the models, but we make two modifications. Firstly, in cases where confusion may arise, we add the term ‘three-way’ to indicate that these are threeway models, and thus we can analogously specify four- and higher-way models where not all modes are reduced; for instance, the ‘four-way Tucker2 model’ indicates a four-way model in which only two of the modes are reduced. Secondly, in the case where all modes are reduced, which we see as the ‘default’, we may simply drop the number indicating how many modes are reduced; thus the three-way Tucker3 model can be denoted as the three-way Tucker model, and more generally, the N-way Tucker model indicates the Tucker model for N-way data in which all modes are reduced. The methods fitting these models in the least squares sense are then called N-way Tucker analysis if a model with all modes reduced is considered, or N-way Tucker1 analysis, N-way Tucker2 analysis, N-way Tucker3 analysis, etc. if only some modes are reduced. In fact, N-way Tucker1 analysis comes down to a principal component analysis (PCA) of all fibers pertaining to one mode, and hence of a matricized version of the data array. For three-way data this method has also been denoted as PCASUP [15], and in chemometrics it is sometimes denoted as ‘unfold PCA’ or ‘multiway PCA’, but, as mentioned in Section 2⋅7, the term ‘unfolding’ is confusing and the term ‘multiway PCA’ is easily confused with other multiway generalizations of PCA. The model proposed by Carroll and Chang [3] and Harshman [4] received entirely different names by its two proposers, referring to different features of the model. To give credit to both proposers and to both types of features, the model is referred to as the CANDECOMP/ PARAFAC model or, abbreviated, CP model. The order of the constituent names in the name of the model has been chosen alphabetically and leads to the least confusing abbreviation (PC model resembles PC (personal computer) or PCA). The CP model can be written mathematically as a constrained variant of the three-way Tucker model. This property is often used, and therefore terminology and notation in the two models should be well adjusted.

4.2.

Standard notation for models

The most general model is the three-way Tucker model. This is given by Copyright  2000 John Wiley & Sons, Ltd.

J. Chemometrics 2000; 14: 105–122

NOTATION AND TERMINOLOGY IN MULTIWAY ANALYSIS

xijk ˆ

Q X P X R X

aip bjq ckr gpqr ‡ eijk

115 …2†

pˆ1 qˆ1 rˆ1

In (2), aip, bjq and ckr denote elements of the component matrices A (for the mode A), B (for the mode B) and C (for the mode C) of orders I  P, J  Q and K  R respectively. Thus for the mode A components the running index is p, which runs from 1 to P, for the mode B components the running index is q, which runs from 1 to Q, and for the mode C components the running index is r, which runs from 1 to R. Furthermore, gpqr denotes the element (p,q,r) of the P  Q  R core array G. Finally, eijk denotes the error term for element xijk and is an element of the I  J  K array E. Up to six-way generalizations can be defined analogously using component matrices A, B, C, D, E and F and component indices p, q, r, s, t and u. For clarity, in cases where E denotes the E-mode component matrix, the error array may be denoted by a different symbol, even though no confusion is likely to arise, because the error array and the component matrix play entirely different roles and do not emerge in the same term. From seven-way on, however, we would need a component matrix G, which could become confusing with the core G. Therefore for N-way arrays with N > 6 (or, if preferred, even for N > 3) we use subscripts to distinguish the component matrices for the different modes: A1, A2,…, AN. The elements of An are then indexed by in and pn and hence given by ainpn,n, where the matrix subscript is separated from the others by a comma. Note that in this general form the final subscript n is unnecessary, but in specific cases we need all three subscripts to identify a particular element of a particular component matrix (e.g. a34,2 denotes element (3,4) of component matrix A2 and clearly a34 would be incomplete). With this general formulation the N-way Tucker model [5] can be written as xi1 i2 ...iN ˆ

P1 X P2 X

...

p1 ˆ1 p2 ˆ1

PN N X Y pN ˆ1

! ain pn ;n gp1 ...pN ‡ ei1 i2 ...iN

…3†

nˆ1

The main advantage of (3) is that it is fully general. It is, however, rather hard to read, hence this formulation should only be used if the specific notation, preferably using different letters for different modes, is unfeasible. The (three-way) CP model is given by xijk ˆ

R X

air bjr ckr ‡ eijk

…4†

rˆ1

Here air, bjr and ckr again are elements of the component matrices A (for the mode A), B (for the mode B) and C (for the mode C), now of orders I  R, J  R and K  R respectively. Thus all component matrices have the same number of columns (R). Analogously, the N-way CP model is given by xi1 i2 ...iN ˆ

R N X Y rˆ1

! ain r;n

…5†

‡ ei1 i2 ...iN

nˆ1

For completeness we also give the three-way Tucker2 and Tucker1 models Copyright  2000 John Wiley & Sons, Ltd.

J. Chemometrics 2000; 14: 105–122

116

H. A. L. KIERS

xijk ˆ

Q P X X

aip bjq gpqk ‡ eijk

…6a†

pˆ1 qˆ1

xijk ˆ

P X

aip gpjk ‡ eijk

…6b†

pˆ1

which are special cases of the three-way Tucker model with C = IK and G of order P  Q  K (Tucker2) and B = IJ, C = IK and G of order P  J  K (Tucker1). P All models are usually fitted to a data set by minimizing the sum of squared error terms, ijkeijk2, over A, B, C and G.

4.3.

Matrix formulations of models

Matrix formulations of the Tucker and CP models are often given. For the three-way Tucker model this formulation is Xa ˆ AGa …C0 B0 † ‡ Ea

…7†

From (4) and (2) it is clear that the CP model is the constrained version of the three-way Tucker model with all elements in the core equal to zero, except the elements g111, g222,…, gRRR which are equal to one [3]. Thus, using the R  R  R three-way unit superdiagonal array I (see Section 2⋅10), which has unit elements in the positions (r,r,r), r = 1,…,R, and zeros elsewhere, we can view the CP model as the special case of the three-way Tucker model with the core equal to the unit superdiagonal array. Hence the CP model is Xa ˆ AIa …C0 B0 † ‡ Ea

…8†

An alternative notation can be based on the columnwise Kronecker product (see e.g. Reference [16], p. 25). Then we obtain Xa ˆ A…C B†0 ‡ Ea

…9†

which expresses the structural part of the model only in terms of its three parameter matrices. Note that whereas (C6B)' = (C'6B'), we do not have that (C B)' equals (C' B'). For higher-way generalizations of the three-way Tucker model the matrix formulations are obtained straightforwardly. For instance, the four-way Tucker model given by xijkl ˆ

Q X P X R X S X

aip bjq ckr dls gpqrs ‡ eijkl

…10†

pˆ1 qˆ1 rˆ1 sˆ1

can be written in matrix representation as Xa ˆ AGa …D0 C0 B0 † ‡ Ea

…11†

More generally, the matrix formulation of the N-way Tucker model (3) is given by X1 ˆ A1 G1 …A0N . . . A3 0 A2 0 † ‡ E1

Copyright  2000 John Wiley & Sons, Ltd.

…12†

J. Chemometrics 2000; 14: 105–122

NOTATION AND TERMINOLOGY IN MULTIWAY ANALYSIS

117

where X1 and G1 denote mode 1 matricized versions, hence with the row order equal to the order of mode 1. The N-way CP model (5) is similarly defined as X1 ˆ A1 …AN . . . A3 A2 †0 ‡ E1

…13†

Models for cyclically permuted versions of the array are easily obtained by cyclic permutation of the letters or numbers that indicate the modes; for the three-way Tucker model we thus have …14† Xb ˆ BGb …A0 C0 † ‡ Eb Xc ˆ CGc …B0 A0 † ‡ Ec

…15†

and it becomes clear that the component matrices play fully symmetric roles. 4.4.

Vector formulations of models

By applying (1) (see Section 3) to (7), we can write the vectorized version of the three-way Tucker model as x ˆ …C B A†g ‡ e

…16†

where x, g and e denote the vectorized versions of X, G and E respectively. Expression (16) better displays the symmetry of the three-way Tucker model than does (7), and also gives insight into the role of the elements of the core (as regression weights for the columns of C6B6A). Moreover, the N-way version of (16) is obtained by straightforward generalization as x ˆ …AN AN ÿ1 . . . A2 A1 †g ‡ e

…17†

Finally, the three- and N-way CP models are now given by x ˆ …C B A†1R ‡ e

…18†

x ˆ …AN AN ÿ1 . . . A2 A1 †1R ‡ e

…19†

and

respectively, where 1R denotes a vector of order R with unit elements only. 5. 5.1.

PRE- AND POSTPROCESSING

Centering and scaling ‘within ’ and ‘across ’

Before actually carrying out a multiway analysis, it is often useful to preprocess the data, just as in two-way analysis. In two-way analysis, data are often centered and/or normalized (the combination being called ‘standardized’ or ‘autoscaled’) across the rows to eliminate unwanted differences in level and scale. For three- and higher-way data one may similarly wish to eliminate such unwanted differences, but it is no longer obvious how each of the modes should be dealt with. Therefore it is most important to carefully explain how the data are centered and/or normalized prior to analysis. In their detailed account of preprocessing Copyright  2000 John Wiley & Sons, Ltd.

J. Chemometrics 2000; 14: 105–122

118

H. A. L. KIERS

Figure 4. Visualization of ‘centering across’ and ‘normalizing within’.

three-way data, Harshman and Lundy [17] and Ten Berge [18] used the following terminology. The type of centering is indicated by specifying across which modes the data are centered. For instance, ‘centering across (mode) A’ is carried out by first averaging the data (only) over the entities of mode A and then subtracting each thus obtained average from all data elements that partake in it (see Figure 4a). Analogously, ‘centering across the combination of modes A and B’ is carried out by first averaging the data over the entities of modes A and B and then subtracting each such average from the data that partake in it. In formulae for three-way data, ‘centering across A’ leads to computing centered data as e xijk ˆ xijk ÿ x:jk

…20†

where the subscript dot is used to indicate the mean across i = 1,…,I; ‘centering across the combination of modes A and B’ leads to computing centered data as e xijk ˆ xijk ÿ x::k

…21†

The type of normalization used is indicated by specifying the entities within which the data are normalized. Specifically, ‘normalization within (the levels of mode) A’ consists of, per entity, first computing the sum of squares of all data elements associated with this particular entity and then dividing all these elements by the square root of this sum of squares (see Figure 4b for ‘normalization within B’). In formulae, for three-way data, normalizing within A leads to computing a normalization factor v u J K uX X x2ijk i ˆ t

…22†

jˆ1 kˆ1

and computing the normalized data as Copyright  2000 John Wiley & Sons, Ltd.

J. Chemometrics 2000; 14: 105–122

NOTATION AND TERMINOLOGY IN MULTIWAY ANALYSIS

119

e xijk ˆ xijk =i

…23†

Normalization ‘within the combination of modes A and B’ consists of, for each combination of an entity of mode A and one of mode B, first computing the sum of squares over all other modes and then dividing all associated elements by the square root of this sum of squares. Why do we propose different terminology for centering and normalization? For instance, for four-way data, normalizing within A could also be denoted as normalizing across the combination of modes B, C and D. One reason for using across for centering and within for normalization is that in computing the centered data the subtracted average is best recognized by the mode(s) across which the average is computed (the ones for which dot subscripts are used: x¯.jk contains the average across i), whereas in normalization the scaling factor usually carries the subscript for the entities of the mode(s) within which the normalization takes place (e.g.  j). A second reason is that the preferred procedures for centering and normalizing multiway data are centering across one mode at a time and normalizing within one mode at a time (see Reference [17], where it is argued that only these types of preprocessing leave the structure of the CP or three-way Tucker model intact); with the above terminology these procedures can be referred to in a simple way. If the ‘inappropriate’ preprocessing procedures are to be used, this (justly) requires rather complex terminology. In three-way analysis these preferred forms of centering and scaling are called fiber centering and slab scaling. In multiway analysis the term fiber centering can still be used for centering across one mode only; normalizing within a mode should then no longer be called slab scaling, however. It can be useful to center across several different modes successively or to normalize within several modes successively. For instance, we may first center across A and then across B (or vice versa, which gives the same result). We denote this as ‘(fiber) centering across A and across B’ (or, for short, ‘centering across A and B’). Similarly, normalizing first within A and then within B can be denoted as normalization ‘within A and within B respectively’. Here it is important to give the order of the normalizations, because one normalization affects the other, and the reversed order generally leads to a different outcome. Proposals have been made [17,18] for iterative normalizations within different modes. If such iterative procedures are used, this should be mentioned specifically. Also, when using combinations of centering and normalization in most cases, the order should be specified, and if the procedure is performed iteratively, this should be mentioned as well. 5.2.

Rotation: transformation of Tucker models

The CP model typically gives unique solutions up to permutations and scalings. The three-way Tucker model, on the other hand, is by no means identifiable. Specifically, (7) can be written as eG e a …C e0 B e 0 † ‡ Ea Xa ˆ AGa …C0 B0 † ‡ Ea ˆ A

…24†

˜ = AS, B ˜ = CU and G ˜ a = S71Ga(U71'6T71'). Here the matrices S, T and U ˜ = BT, C with A are non-singular square matrices of orders P  P, Q  Q and R  R respectively.* Sometimes

*Clearly, it would have been nicer if the symbols to denote the orders of the matrices and the symbols for the matrices were the same. However, using symbols P, Q and R for rotation matrices would be confusing, since R usually refers to a correlation matrix. Conversely, using S, T and U for the numbers of components would bypass the common choices for denoting the number of components as Q and R. Therefore it seems best to ignore this inconsistency.

Copyright  2000 John Wiley & Sons, Ltd.

J. Chemometrics 2000; 14: 105–122

120

H. A. L. KIERS

they are taken orthonormal, in which case they are called orthogonal rotation matrices. When they are not orthonormal, we call them oblique rotation matrices. For the N-way Tucker model a similar rotational freedom holds. However, for higher-way cases we will denote the rotation matrices as S1, S2, …, SN, of orders P1  P1, P2  P2,…, PN  PN respectively, and we have e 2 ˆ A2 S2 ; . . . ; A e N ˆ AN SN e 1 ˆ A1 S1 ; A A

and

e a ˆ S1 ÿ1 Ga …SN ÿ10 . . . S2 ÿ10 † G

…25†

Obviously, (25) could also be used for the three-way case, but it has the disadvantage that subscripted matrices are needed. This renders formulae somewhat more difficult to read (and write) and is particularly embarrassing in cases where elements of such matrices are to be specified, as we then need three indices. It frequently happens that rotation in only one ‘direction’ is to be performed. This comes down to multiplication of an N-way array with a matrix. Although such multiplications can easily be written in matrix algebra, as can be seen above, it is sometimes useful to use the mode n multiplication (see Section 3). Specifically, multiplication of the array G by Sn71 ˜ =G ˜ n Sn71 and the outcome is equivalent to the matrix along the mode n is written as G 71 ˜ , Gn and G ˜n ˜ n=Sn Gn involving the mode n matricized versions of G and G multiplication G ˜ = G n Sn71m Sm71, etc. respectively. Successive multiplications can then be written as G The full rotational freedom of the N-way Tucker model is represented by ˜ =G ˜ 1S1712S271…NSN71. The form has a nice symmetric treatment of all modes, G but for practical purposes, where actual matrix multiplications are to be carried out, expression (25) is much more useful. 6. 6.1.

OTHER MULTIWAY MODELS AND CONCLUSION

Extended models

A first class of other multiway models that can be considered is that where the fundamental models (CP or Tucker) are extended by additive terms. Such models have been described in Reference [17] and various variants have popped up at several places in the literature. Adding additive terms implies combination with the general linear model, and it is therefore appropriate to follow common notation used there. Thus first-order additive terms are denoted by single-indexed , , , etc., with as indices simply those that we used for modes A, B, C, etc., leading to i, j, k, etc. Second-order terms are denoted as double-subscripted , , etc. Thus ij denotes the interaction effect of mode A entity i with mode B entity j, etc. For higher-order terms this system of combining symbols is simply extended. 6.2.

Structurally different N-way models

The present overview of notational conventions is based on only two models and hence is by no means complete. For other models it is suggested to choose notation as much as possible along the same scheme. Thus first-mode entities are denoted by the letter ‘a’ and index i, while their components are indexed by p, and the associated rotation is denoted by S, and analogously for the second mode, etc. Furthermore, as far as other models have parameter matrices in common with the ‘fundamental’ models, these should have the same symbol. If parameters play the same role as those in the fundamental model but differ in certain details, this could be expressed by using the same symbol but with a subscript indicating its special role. Copyright  2000 John Wiley & Sons, Ltd.

J. Chemometrics 2000; 14: 105–122

NOTATION AND TERMINOLOGY IN MULTIWAY ANALYSIS

6.3.

121

Concluding remarks

The present paper provides a large number of suggestions for notation and terminology. These suggestions are based on notions and operations that are currently in use in N-way analysis. Some notions, e.g. symmetry of (hyper)cubic arrays, have been ignored given their rather high complexity and limited use. The same holds for operations that rearrange N-way arrays into M-way arrays. The two most common of these are vectorization and matricization (hence with M = 1 and M = 2 respectively), and these can indeed be expected to remain the two most common rearranging operations, because they are sufficient prerequisites for using matrix algebra. However, arranging N-way arrays into three-way arrays may become quite common too, and so may other operations. For arranging tensors into three-way arrays, the term ‘ternarizing’ would seem appropriate (as a ternary array is sometimes used to denote a threeway array), but we are not aware of similar terminological possibilities for arranging tensors into four- or higher-way arrays. It should be noted that, as for matricizing, for ternarizing it would be necessary to indicate specifically which modes are combined and how this is done (see Section 2⋅7). Obviously, the present paper on notation and terminology is based on subjective choices that may not always be deemed most fortunate or useful. However, the aim of this paper is that the notation scheme set out here sets ‘a default’: when there are no reasons to choose a different notation scheme, use the present one. The use of a common notation scheme is expected to increase readability of multiway literature by enhancing recognizability of model formulations. ACKNOWLEDGEMENTS

The author is obliged to Jos ten Berge, Claus Andersson, Rasmus Bro, Age Smilde and three anonymous reviewers for helpful comments on an earlier version of this paper. APPENDIX: OVERVIEW OF NOTATION X, Y I Xa, Xb, Xc,… or X1, X2, X3, … Vec( ) i, j, k, … or i1, i2, i3,… I, J, K, … or I1, I2, I3,… G A, B, C,… or A1, A2, A3,… p, q, r,… or p1, p2, p3,… P, Q, R,… or P1, P2, P3,… S, T, U … or S1, S2, S3,… i, j, k , … ij, ik, ijk, etc. 6 * n X1 = A1G1(AN'6…6A3'6A2') ‡ E1 X1 = A1(AN … A3 A2)' ‡ E1

Copyright  2000 John Wiley & Sons, Ltd.

N-way arrays ‘unit superdiagonal’ tensor matricized versions of X vectorized version of a matrix running indices for respective modes sizes of respective modes core array in Tucker model component matrices for respective modes running indices for components for respective modes numbers of components for respective modes rotation matrices for respective modes additive terms for respective modes examples of additive interaction terms Kronecker product columnwise Kronecker product (Khatri–Rao product) elementwise or Hadamard product mode n multiplication mode 1 matricized version of N-way Tucker model mode 1 matricized version of N-way CP model

J. Chemometrics 2000; 14: 105–122

122

H. A. L. KIERS

REFERENCES 1. L. R. Tucker, Implications of factor analysis of three-way matrices for measurement of change in: C. W. Harris (Ed. ) Problems in measuring change. The University of Wisconsin Press, Madison (1963), 122–137. 2. L. R. Tucker, Some mathematical notes on three-mode factor analysis, Psychometrika, 31, 279–311 (1966). 3. J. D. Carroll, & J.-J. Chang, Analysis of individual differences in multidimensional scaling via an N-way generalization of Eckart-Young decomposition, Psychometrika 35, 283–319 (1970). 4. R. A. Harshman, Foundations of the PARAFAC procedure: models and conditions for an explanatory multimode factor analysis. University of California at Los Angeles, UCLA Working Papers in Phonetics. 16, 1–84 (1970). 5. A. Kapteyn, H. Neudecker, & T. Wansbeek, An approach to n-mode components analysis, Psychometrika 51, 269–275 (1986). 6. L. De Lathauwer, Signal Processing based on Multilinear Algebra, doctoral dissertation, University of Leuven, Belgium (1997). 7. L. R. Tucker, The extension of factor analysis to three-dimensional matrices in N. Frederiksen & H. Gulliksen (Eds.), Contributions to mathematical psychology (pp. 109–127), Holt, Rinehart, and Winston, New York (1980). 8. J. D. Carroll, & P. Arabie, Multidimensional scaling, Annual Review of Psychology, 31, 438–457 (1980). 9. C. H. Coombs, Psychological scaling without a unit of measurement. Psychological Review, 57, 145–158 (1950). 10. J. B. Kruskal, Three-way arrays: Rank and uniqueness of trilinear decompositions. Linear Algebra and its Applications, 18, 95–138 (1977). 11. C. R. Rao, & S. K. Mitra, Generalized Inverse of Matrices and its Applications. Wiley, New York (1971). 12. J. D. Carroll, S. Pruzansky, & J. B. Kruskal, CANDELINC: a general approach to multidimensional analysis of many-way arrays with linear constraints on parameters. Psychometrika, 45, 3–24 (1980). 13. J. R. Schott, Matrix Analysis for Statistics. Wiley: New York (1997). 14. P. M. Kroonenberg, & J. de Leeuw, Principal component analysis of three-mode data by means of alternating least squares algorithms. Psychometrika 45, 69–97 (1980). 15. H. A. L. Kiers, Hierarchical relations among three-way methods, sychometrika 56, 449–470). 16. R. Bro, Multiway analysis in the food industry. Models, algorithms and applications, Doctoral dissertation, University of Amsterdam (1998). 17. R. A. Harshman, & M. E. Lundy, Data preprocessing and the extended PARAFAC model in H. G. Law, C. W. Snyder, J. A. Hattie, and R. P. McDonald (Eds.), Research methods for multimode data analysis, (pp. 216–284). Praeger, New York (1984). 18. J. M. F. ten Berge, Convergence of PARAFAC preprocessing procedures and the Deming-Stephan method of iterative proportional fitting in R. Coppi, & S. Bolasco Multiway Data Analysis (pp. 53–63). Elsevier, Amsterdam (1989).

Copyright  2000 John Wiley & Sons, Ltd.

J. Chemometrics 2000; 14: 105–122

Suggest Documents