Appendix B: Biplots and Their Interpretation


Applied Multiway Data Analysis. By Pieter M. Kroonenberg. Copyright © 2007 John Wiley & Sons, Inc.


B.1 INTRODUCTION¹

Biplots (see, e.g., Gabriel, 1971) allow for the analysis of the two-way interaction in a table of I objects by J variables such that systematic patterns between rows, between columns, and between rows and columns can readily be assessed and evaluated. The prefix bi refers to the simultaneous display of both the rows and the columns of the table, not to the two-dimensionality of the plot. The number of dimensions is at most min(I, J). Arbitrarily, we will assume that there are more objects than variables, so that I is greater than J and thus at most J dimensions are possible. As displays of more than two dimensions are generally difficult to make and even more difficult to interpret, most biplots show only the two dimensions that account for the maximum amount of variation in the table (see Section 11.4.2, p. 262). Using the singular value decomposition (SVD), it is possible to find such a "best" representation in a low-dimensional space. The technique provides the coordinates on dimensions (or directions in space); in the mathematical literature these dimensions are called singular vectors. The dimensions are arranged in such a way that they are orthogonal (i.e., at right angles) and successively represent as much of the variation as possible. Moreover, the technique provides measures (singular values) which, when squared, indicate the amount of variability accounted for by each dimension. To display the main variability in the table in a two-dimensional graph, we should use the first two dimensions.

B.2 SINGULAR VALUE DECOMPOSITION

B.2.1 Basic theory

Suppose that we have a two-way data matrix X with information on I objects for J variables, and that there are more objects than variables, so that min(I, J) = J. The singular value decomposition (SVD) of the matrix X is defined as

$$X = U\Lambda V', \tag{B.1}$$

which may be written in summation notation as

$$X = \sum_{s=1}^{S} \lambda_s \mathbf{u}_s \mathbf{v}_s', \tag{B.2}$$

where S is in most cases equal to J; that is, we generally need J terms to reproduce the original matrix X perfectly. The scalars $\lambda_s$ are the singular values, arranged in decreasing order of magnitude; the $\mathbf{u}_s$ form a set of object vectors (the left singular vectors), and the $\mathbf{v}_s$ form a set of variable vectors (the right singular vectors). In both sets the vectors are orthonormal, that is, they are pairwise at right angles and have lengths equal to 1. U and V are the matrices that have the vectors $\mathbf{u}_s$ and $\mathbf{v}_s$ as their columns, respectively, and $\Lambda$ is the diagonal matrix of singular values. If the entries in the table are the interactions from a univariate two-way analysis of variance on the original table, then both the $\mathbf{u}_s$ and the $\mathbf{v}_s$ are centered; that is, each column of U and V has a zero mean, because the original table of interaction effects is centered. Moreover, in this case S is at most J − 1, because centering reduces the number of independent dimensions by 1.

¹This appendix is based on Kroonenberg (199%).
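As a minimal sketch of Eqs. (B.1) and (B.2), the NumPy code below decomposes a small random matrix (hypothetical data), rebuilds it both as a matrix product and as a sum of rank-one terms, and checks that the singular vectors are orthonormal. It then double-centers the matrix to mimic a table of two-way ANOVA interactions and checks that at most J − 1 singular values are nonzero and that the corresponding singular vectors are centered. The variable names are our own; note that `numpy.linalg.svd` returns V' rather than V.

```python
import numpy as np

# Hypothetical 6 x 4 matrix: I = 6 objects, J = 4 variables (random, for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))

# Thin SVD: U is I x J, lam holds the J singular values in decreasing order, Vt is V'.
U, lam, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T

# Eq. (B.1): X = U Lambda V'
X_b1 = U @ np.diag(lam) @ V.T
# Eq. (B.2): the same matrix as a sum of rank-one terms lambda_s u_s v_s'
X_b2 = sum(lam[s] * np.outer(U[:, s], V[:, s]) for s in range(len(lam)))

# Orthonormal columns: U'U = I and V'V = I
J = X.shape[1]
orthonormal = np.allclose(U.T @ U, np.eye(J)) and np.allclose(V.T @ V, np.eye(J))

# Interaction table of a two-way ANOVA: remove row and column effects (double centering).
X_int = X - X.mean(axis=0) - X.mean(axis=1, keepdims=True) + X.mean()
U_int, lam_int, Vt_int = np.linalg.svd(X_int, full_matrices=False)

# Count the singular values that are not numerically zero.
rank_int = int(np.sum(lam_int > 1e-10 * lam_int[0]))
print("singular values of X:", np.round(lam, 3))
print("rank of the interaction table:", rank_int)
```

For this random matrix the interaction table has rank J − 1 = 3, and the columns of `U_int` and rows of `Vt_int` belonging to the nonzero singular values have zero means.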

B.2.2 Low-dimensional approximation

To find a low-dimensional approximation of X, we have to minimize the distance between the original matrix and an approximating matrix, $\hat{X}$. This (Euclidean) distance between two matrices, $X = (x_{ij})$ and $\hat{X} = (\hat{x}_{ij})$, is defined as

$$d(X, \hat{X}) = \sqrt{\sum_{i=1}^{I}\sum_{j=1}^{J} (x_{ij} - \hat{x}_{ij})^2}, \tag{B.3}$$

and the Eckart and Young (1936) theorem shows that the best S-dimensional least-squares approximation of the matrix X can be obtained from the SVD of X by summing only the first S terms of Eq. (B.2) (S ≤ J). Such an $\hat{X}$ is also referred to as the best rank-S approximation of the matrix X. The first S vectors $\mathbf{u}_s$ and $\mathbf{v}_s$, with S usually two or three, are used as the coordinates for graphical representations of the data. They can be combined with the singular values $\lambda_s$ in different ways, of which the following two are the most common:

$$\hat{X} = \sum_{s=1}^{S} \mathbf{u}_s(\lambda_s\mathbf{v}_s)' = \sum_{s=1}^{S} \mathbf{y}_s\mathbf{z}_s', \tag{B.4}$$

$$\hat{X} = \sum_{s=1}^{S} (\lambda_s^{1/2}\mathbf{u}_s)(\lambda_s^{1/2}\mathbf{v}_s)' = \sum_{s=1}^{S} \mathbf{y}_s^{*}\mathbf{z}_s^{*\prime}, \tag{B.5}$$

where the $\mathbf{y}_s$ and $\mathbf{z}_s$ are the object and variable coordinates of the first, principal coordinate scaled version, and the $\mathbf{y}_s^{*}$ and $\mathbf{z}_s^{*}$ those of the second, symmetrically scaled version (see Section B.3.3). In this book the matrix $\hat{X}$ is called the structural image of the data X.

B.2.3 Quality of approximation

To evaluate the quality of the S-dimensional approximation, we have to know how much of the original variability of X is contained in the structural image $\hat{X}$. The total variability in a matrix, here defined as the uncorrected sum of squares, is equal to the sum of the squared entries in the table,

$$\text{Total variability} = SS_T = \|X\|^2 = \sum_{i=1}^{I}\sum_{j=1}^{J} x_{ij}^2, \tag{B.6}$$

where $\|X\|$ is called the norm of X. Because of the least-squares properties of the singular value decomposition, the squared norm can be split into an explained and a residual part,

$$\|X\|^2 = \|\hat{X}\|^2 + \|X - \hat{X}\|^2. \tag{B.7}$$

Furthermore, one can use the orthonormality of U and V to show that this equation may be expressed in terms of the squared singular values,

$$\sum_{s=1}^{J} \lambda_s^2 = \sum_{s=1}^{S} \lambda_s^2 + \sum_{s=S+1}^{J} \lambda_s^2. \tag{B.8}$$

Equation (B.8) shows that the sum of the first two squared singular values divided by the total sum of the squared singular values gives the proportion of the variability accounted for by the first two dimensions. A large proportion of explained variability indicates that a plot based on these two dimensions gives a good representation of the structure in the table. If only a moderate or low proportion of the variability is accounted for, the main structure of the table will still be represented in the graph, but some parts of the structure may reside in higher dimensions. If the data are centered per variable, objects located near the origin either have all their values close to the variable means or have most of their variability in other dimensions. Similarly, variables close to the origin may have little variability or may not fit well in two dimensions.
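The following NumPy sketch illustrates Sections B.2.2 and B.2.3 on a small random matrix (hypothetical data, centered per variable). It forms the two-dimensional structural image with both the principal coordinate scaling of Eq. (B.4) and the symmetric scaling of Eq. (B.5), and checks the sum-of-squares decompositions (B.7) and (B.8). The names `Y`, `Z`, `Y_sym`, `Z_sym`, and `X_hat` are our own choices.

```python
import numpy as np

# Hypothetical data: 8 objects by 5 variables, centered per variable.
rng = np.random.default_rng(1)
X = rng.normal(size=(8, 5))
X = X - X.mean(axis=0)

U, lam, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T
S = 2  # number of retained dimensions

# Eq. (B.4), principal coordinate scaling: y_s = u_s, z_s = lambda_s v_s
Y = U[:, :S]
Z = V[:, :S] * lam[:S]
# Eq. (B.5), symmetric scaling: y*_s = lambda_s^(1/2) u_s, z*_s = lambda_s^(1/2) v_s
Y_sym = U[:, :S] * np.sqrt(lam[:S])
Z_sym = V[:, :S] * np.sqrt(lam[:S])

# Both scalings yield the same structural image X_hat, the best rank-S approximation.
X_hat = Y @ Z.T
X_hat_sym = Y_sym @ Z_sym.T

# Eqs. (B.6)-(B.8): total, explained, and residual sums of squares.
ss_total = np.sum(X ** 2)
ss_fit = np.sum(X_hat ** 2)
ss_res = np.sum((X - X_hat) ** 2)
prop_fit = np.sum(lam[:S] ** 2) / np.sum(lam ** 2)
print(f"proportion of variability accounted for by {S} dimensions: {prop_fit:.3f}")

# A different rank-S choice (dimensions 2 and 3 instead of 1 and 2) leaves a larger residual.
X_alt = U[:, 1:S + 1] @ np.diag(lam[1:S + 1]) @ V[:, 1:S + 1].T
```

The last line is only a single comparison, not a proof of the Eckart and Young theorem; it shows the ordering of the singular values at work.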

B.3 BIPLOTS

B.3.1 Standard biplots

A standard biplot is the display of an object-by-variable (interaction) table X, decomposed into a product YZ' of an I × S matrix $Y = (y_{is})$ and a J × S matrix $Z = (z_{js})$. Using a two-dimensional decomposition for the structural image $\hat{X}$, each element $\hat{x}_{ij}$ of this matrix can be written as

$$\hat{x}_{ij} = y_{i1}z_{j1} + y_{i2}z_{j2}, \tag{B.9}$$

which is the inner (or scalar) product of the row vectors $(y_{i1}, y_{i2})$ and $(z_{j1}, z_{j2})$. A biplot is obtained by representing each row as a point $Y_i$ with coordinates $(y_{i1}, y_{i2})$, and each column as a point $Z_j$ with coordinates $(z_{j1}, z_{j2})$, in a two-dimensional graph with origin O. These points are generally referred to as row markers and column markers, respectively. Sometimes the word "markers" is also used for the coordinate vectors themselves. Because it is not easy to evaluate markers in a three-dimensional space, the most commonly used biplots are two-dimensional, although with the current state of graphics software three-dimensional biplots are likely to become more common.

Figure B.1 Representation of two object markers and one variable marker in a biplot.

A straight line through the origin O and a point, say, $Z_j$, is often called a biplot axis and is written as $OZ_j$; it should not be confused with a coordinate axis. If we write $Y_i^{(j)}$ for the orthogonal projection of $Y_i$ onto the biplot axis $OZ_j$, $\theta_{ij}$ for the angle between the vectors $OY_i$ and $OZ_j$, and $|OZ_j|$ for the length of the vector $OZ_j$, we obtain the geometric equivalent of Eq. (B.9), as displayed in Fig. B.1:

$$\hat{x}_{ij} = |OZ_j|\,|OY_i|\cos(\theta_{ij}) = |OZ_j|\,|OY_i^{(j)}|. \tag{B.10}$$

Equation (B.10) shows that $\hat{x}_{ij}$ is proportional to $|OY_i^{(j)}|$, the length of the projection $OY_i^{(j)}$, counted as negative when the angle $\theta_{ij}$ is obtuse. This relationship is of course true for any other object i′ as well. The relationships or interactions of two objects with the same variable can therefore be assessed simply by comparing the lengths of their projections onto that variable. Furthermore, the relationship or interaction between an object vector $OY_i$ and a variable vector $OZ_j$ is positive if their angle is acute, and negative if it is obtuse. When the projection of a marker $Y_i$ onto the variable vector $OZ_j$ coincides with the origin, $\hat{x}_{ij}$ is equal to 0 and, if the data were centered per variable, the object has approximately the mean value for that variable. A positive value for $\hat{x}_{ij}$ indicates that object i has a high score on variable j relative to the average score on that variable, and a negative value indicates that object i has a relatively low score on variable j. In graphs, the object markers $Y_i$ are generally represented by points and the variable markers $Z_j$ by vectors, so that the two types of markers can be clearly distinguished. This arrangement is preferred because objects are compared with respect to a variable rather than the reverse.
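The geometry of Eqs. (B.9) and (B.10) can be checked numerically. The sketch below uses made-up two-dimensional markers for two objects and one variable; it computes the inner products directly, recomputes them from lengths and angles (the angles obtained independently with `arctan2`), and locates the projection points $Y_i^{(j)}$ on the biplot axis.

```python
import numpy as np

# Hypothetical markers: rows of Y are objects Y_1, Y_2; z_j is one variable marker Z_j.
Y = np.array([[1.5, 0.5],
              [-1.0, 0.5]])
z_j = np.array([2.0, 1.0])

# Eq. (B.9): approximated values as inner products
x_hat = Y @ z_j

# Eq. (B.10): lengths and angles, with the angles obtained independently via arctan2
len_z = np.linalg.norm(z_j)
len_y = np.linalg.norm(Y, axis=1)
theta = np.arctan2(Y[:, 1], Y[:, 0]) - np.arctan2(z_j[1], z_j[0])
proj_len = len_y * np.cos(theta)          # signed length of OY_i^(j)
x_hat_geom = len_z * proj_len

# Projection points Y_i^(j): feet of the perpendiculars from Y_i onto the axis OZ_j
Y_proj = np.outer(x_hat / len_z ** 2, z_j)
```

Here the first object makes an acute angle with $OZ_j$ and gets the positive value 3.5, whereas the second makes an obtuse angle and gets the negative value −1.5.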


B.3.2 Calibrated biplots

Because the inner products between the coordinates of the object markers $Y_i$ and those of a column marker $Z_j$ vary linearly along the biplot axis $OZ_j$, it is possible to mark (or calibrate) the biplot axis $OZ_j$ linearly in such a way that the $\hat{x}_{ij}$ can be read directly from the graph (Gabriel & Odoroff, 1990; Greenacre, 1993). Note that the approximate value $\hat{x}_{ij}$ does not depend on the position of $Y_i$ itself, but only on its orthogonal projection $Y_i^{(j)}$ onto the axis $OZ_j$. When a data matrix is centered, as is the case with data centered per variable, the approximating matrix is centered as well, and a value of $\hat{x}_{ij}$ equal to zero means that on the jth uncentered variable object i has a value approximately equal to the mean of the jth variable. One option is to mark biplot axes with (approximations of) the centered variables. However, it is sometimes also informative to replace the centered values by their "real" values by adding the observed means. After this decentering, the origin indicates the true mean values of the variables, rather than zero for all of them (see Carlier, 1995, for some three-mode examples).
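A calibrated axis needs, for each tick value v, the point on $OZ_j$ whose inner product with $Z_j$ equals v, namely $v\,Z_j/|OZ_j|^2$. The sketch below computes such tick positions together with decentered labels obtained by adding the variable mean. The helper `calibration_ticks`, the marker coordinates, and the mean of 10 are hypothetical; the code illustrates the principle and is not the procedure of Gabriel and Odoroff (1990) or Greenacre (1993).

```python
import numpy as np

def calibration_ticks(z_j, values, mean_j=0.0):
    """Hypothetical helper: tick positions on the biplot axis OZ_j.

    A point p on the axis represents the value v when p . z_j = v, which gives
    p = v * z_j / |z_j|^2. Tick labels are decentered by adding the variable mean.
    """
    z_j = np.asarray(z_j, dtype=float)
    values = np.asarray(values, dtype=float)
    positions = np.outer(values, z_j) / (z_j @ z_j)
    labels = values + mean_j
    return positions, labels

z_j = np.array([2.0, 1.0])          # hypothetical variable marker
mean_j = 10.0                       # hypothetical observed mean of variable j
positions, labels = calibration_ticks(z_j, [-2, -1, 0, 1, 2], mean_j)

# Reading the graph: object marker y_i projects onto the tick for x_hat_ij,
# so its decentered approximation is x_hat_ij + mean_j.
y_i = np.array([1.5, 0.5])
x_hat_ij = y_i @ z_j
foot = x_hat_ij * z_j / (z_j @ z_j)      # orthogonal projection of y_i on the axis
decentered_value = x_hat_ij + mean_j
```

The tick for the centered value 0 sits at the origin and carries the label 10, the mean of the variable, which is exactly the decentering described above.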

B.3.3 Two different versions of the biplot

In Section B.2.2 the two most common decompositions of $\hat{X}$ were presented, both based on the SVD. These two decompositions lead to biplots with different properties. Equations (B.4) and (B.5) show that the values of the inner products between object and variable markers are independent of the version used, so that in this respect the two versions are equivalent. However, when we look at the relationships within each set of markers, the two decompositions lead to different interpretations. In the case of principal coordinate scaling (Eq. (B.4)), the objects are in so-called standard coordinates, that is, they have zero means and unit lengths, and the variables are in principal coordinates, that is, they have unrestricted means and lengths equal to the associated singular values. If in the data matrix X the variables are standardized, the coordinates of the variables may be interpreted as correlations between the variables and the coordinate axes. With symmetric scaling (Eq. (B.5)), the correlation interpretation cannot be used, because both the object and the variable coordinate vectors have lengths equal to the square roots of the singular values. Therefore, this version should primarily be used when the relations between the objects and the variables are the central focus of the analysis, rather than the relations among objects and/or among variables, or when the row and column variables play a comparable role in the analysis. The advantage of this representation is that the lengths of the variable and the object vectors in the biplot are approximately equal. With principal coordinate scaling it can easily happen that the objects are concentrated around the origin of the plot while the variables are located on the rim, or vice versa.
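The correlation interpretation of principal coordinates can be illustrated as follows. This sketch assumes that "standardized" means each variable is centered and scaled to unit length (sum of squares 1), so that X'X is the correlation matrix; scaling to unit variance instead would multiply all variable coordinates by the constant $\sqrt{I-1}$ and leave the correlations unchanged. The data are random and hypothetical.

```python
import numpy as np

# Hypothetical data: 10 objects by 4 variables.
rng = np.random.default_rng(2)
X = rng.normal(size=(10, 4))
# Standardize: center each variable and scale it to unit length (sum of squares 1),
# so that X'X is the correlation matrix (assumed convention).
X = X - X.mean(axis=0)
X = X / np.linalg.norm(X, axis=0)

U, lam, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T

# Principal coordinate scaling, Eq. (B.4): object coordinates U, variable coordinates V Lambda
Z = V * lam
# Correlation of variable j with object coordinate vector u_s
R = np.array([[np.corrcoef(X[:, j], U[:, s])[0, 1] for s in range(X.shape[1])]
              for j in range(X.shape[1])])

# Symmetric scaling, Eq. (B.5): each coordinate vector has length sqrt(lambda_s)
Z_sym = V * np.sqrt(lam)
```

Each variable coordinate equals the correlation of that variable with the corresponding column of U; in all dimensions the variable vectors have length 1, and in two dimensions their lengths are at most 1, which is why they indicate the quality of fit.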


B.3.4 Interpretational rules

An important point in constructing the actual graphs for biplots is that the vertical and horizontal coordinate axes have the same physical scale. This ensures that when objects are projected onto a variable vector, they end up in the correct place. Failing to adhere to this scaling makes it impossible to evaluate inner products from the graph. The ratio of the unit length on the vertical axis to that on the horizontal axis is referred to as the aspect ratio, and it should be equal to 1. The most basic property of a biplot is that the inner product of a row (object) vector and a column (variable) vector in the plot is the best approximation to the corresponding value in the table. If there is a perfect fit in, say, two dimensions, then the inner products are identical to the values in the table. The majority of the interpretational rules given below follow from this basic property. Additional interpretations become available if special treatments such as centering and standardization have been applied to the rows and/or columns, or principal coordinate scaling or symmetric scaling to the coordinate axes. Below we present only those interpretational rules that we think are relevant for object-by-variable tables; in particular, we will not consider the situation in which the original table is analyzed without centering.

General interpretational rules (irrespective of the scaling of the coordinate axes)

- Vectors and points
  * objects are preferably displayed as points and variables as vectors or arrows;
  * if the angle between two object vectors is small, they have similar response patterns over variables;
  * if the angle between two variable vectors is small, they are strongly associated.

- Centered per variable
  * the biplot displays the table of object main effects plus the two-way interaction;
  * object scores are expressed as deviations from their average for each of the variables;
  * the origin represents the average value for each variable, that is, it represents the object that has an average value on each variable; this average object has a value of 0 in the centered data matrix;
  * an object at a large distance from the origin has a large object-plus-interaction effect;
  * the larger the projection of an object on a variable vector, the more this object deviates from the average on that variable.


- Centered per variable and per object
  * the biplot displays the two-way interaction table;
  * there are at most min(I, J) − 1 dimensions or coordinate axes;
  * both object scores and variable coefficients are deviations from their averages;
  * the origin represents the average value both for each variable and for each object across all variables;
  * an object (variable) at a large distance from the origin has a large interaction effect with at least one variable (object);
  * the larger the projection of an object on a variable vector, the more this object deviates from the average on that variable, and vice versa.

Principal coordinate scaling: U and $V\Lambda$ (principal component biplot)

- Centered per variable
  * the cosine of the angle between any two variables approximates their correlation, with equality if the fit is perfect;
  * the lengths of the variable vectors are approximately proportional to the standard deviations of the variables, with exact proportionality if the fit is perfect;
  * the inner product between two variables approximates their cross-product, that is, their covariance multiplied by I − 1, with equality if the fit is perfect;
  * the Euclidean distance between two objects does not approximate the distance between their rows in the original matrix but their standardized distance, which is the square root of the Mahalanobis distance (for further details, see Gabriel, 1971, p. 460ff.);
  * variables can have much longer vectors than objects, making visual inspection awkward; a partial remedy is to multiply all variable coordinates by an arbitrary constant, which makes the lengths of the variable and object vectors comparable. Note, however, that there is no obligation to use such a constant, and that it is an ad hoc measure.

- Standardized per variable
  * the lengths of the variable vectors indicate how well the variables are represented by the graph, with a perfect fit if all vectors have equal lengths;
  * the inner product between two variables (and the cosine of the angle between them) approximates their correlation, with equality if the fit is perfect.


Symmetric scaling: $U\Lambda^{1/2}$ and $V\Lambda^{1/2}$

- General
  * if the angle between two variable vectors is small, they are highly correlated, but their correlation cannot be deduced from the graph;
  * similarly, the association between the objects cannot be properly read from the graph;
  * due to the symmetric scaling of variables and objects, both are located in the same part of the space and inner products are easily assessed.
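For data centered per variable, the principal coordinate rules can be verified when all J dimensions are retained (perfect fit). In this sketch, with hypothetical random data, the variable markers $V\Lambda$ reproduce X'X, that is, the covariances multiplied by I − 1; the cosines between variable vectors equal the correlations; the vector lengths are $\sqrt{I-1}$ times the standard deviations; and the distance between two object markers in U equals their Mahalanobis distance divided by $\sqrt{I-1}$. In a two-dimensional biplot these equalities hold only approximately.

```python
import numpy as np

# Hypothetical data: I = 12 objects, J = 3 variables with unequal spreads,
# centered per variable.
rng = np.random.default_rng(3)
I, J = 12, 3
X = rng.normal(size=(I, J)) * np.array([1.0, 2.0, 0.5])
X = X - X.mean(axis=0)

U, lam, Vt = np.linalg.svd(X, full_matrices=False)
Z = Vt.T * lam                          # variable markers V Lambda, all J dimensions

cov = np.cov(X, rowvar=False)           # sample covariances (divisor I - 1)
cross = Z @ Z.T                         # inner products between variable markers
lengths = np.linalg.norm(Z, axis=1)     # lengths of the variable vectors
cosines = cross / np.outer(lengths, lengths)

# Distance between the markers of objects 1 and 2 (rows of U) versus
# their Mahalanobis distance based on the sample covariance matrix.
d_biplot = np.linalg.norm(U[0] - U[1])
diff = X[0] - X[1]
d_mahal = np.sqrt(diff @ np.linalg.solve(cov, diff))
```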

B.4 RELATIONSHIP WITH PCA

In principal component analysis, we look for the linear combination c = Xb of the variables in X that accounts for the largest amount of their variation. The standard solution to this problem is to construct the sums-of-squares-and-cross-products matrix X'X (or, after centering and scaling, the correlation matrix) and to decompose it, via its eigenvectors and eigenvalues, into $V\Lambda^2V'$; similarly, XX' can be decomposed into $U\Lambda^2U'$. It can be shown that U, V, and $\Lambda$ are the same as the matrices defined in Eq. (B.1). Moreover, c is equal to the first column of U, and b is equal to $\lambda_1^{-1}$ times the first column of V. In other words, principal component analysis corresponds to the factorization of Eq. (B.9). The parameters for a principal component analysis can thus be derived directly from the singular value decomposition (see also Fig. 4.4, p. 49). However, in PCA it is general practice that X'X is a correlation matrix, whereas this assumption is not made for the singular value decomposition. This shows that PCA is a procedure with two steps: a centering and scaling step followed by a (singular value) decomposition. The separation of these two steps is generally not emphasized in object-by-variable analyses, but it becomes essential when analyzing three-way data of objects by variables by occasions (see Section 6.1, p. 109).
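The correspondence between PCA and the SVD can be sketched as follows, assuming the variables are converted to z-scores as the centering-and-scaling step. The eigendecomposition of X'X and the SVD of X give the same V (up to the sign of each column, which neither method fixes) and eigenvalues equal to the squared singular values; the weights $\mathbf{b} = \lambda_1^{-1}\mathbf{v}_1$ give a first component $\mathbf{c} = X\mathbf{b}$ equal to $\mathbf{u}_1$. The data are random and hypothetical.

```python
import numpy as np

# Hypothetical data: 9 objects by 3 variables.
rng = np.random.default_rng(4)
X = rng.normal(size=(9, 3))

# Step 1: centering and scaling (z-scores), so X'X / (I - 1) is the correlation matrix.
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Step 2a: eigendecomposition of X'X = V Lambda^2 V'
evals, evecs = np.linalg.eigh(X.T @ X)
order = np.argsort(evals)[::-1]          # eigh returns eigenvalues in ascending order
evals, evecs = evals[order], evecs[:, order]

# Step 2b: singular value decomposition X = U Lambda V'
U, lam, Vt = np.linalg.svd(X, full_matrices=False)

# First component: weights b = v_1 / lambda_1 give c = X b = u_1
b = Vt[0] / lam[0]
c = X @ b
```

Comparing absolute values of the eigenvectors with the right singular vectors sidesteps the arbitrary sign of each column.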

B.5 BASIC VECTOR GEOMETRY RELEVANT TO BIPLOTS

The interpretation of biplots depends heavily on properties of vectors in the plane or three-dimensional space. This section describes the basic properties of vectors, inner products, and projections.

Vectors and their properties

Vector. Symbol: $\mathbf{x}$ or $\vec{x}$. Definition: A vector is a directed line segment; it has a length and a direction. In biplots, vectors start at the origin, the point (0, 0) in a two-dimensional biplot. The coordinates of $\vec{x}$ in the two-dimensional case are $(x_1, x_2)$, where $x_1$ is the value on the horizontal coordinate axis and $x_2$ the value on the vertical coordinate axis. Therefore, a vector $\vec{x}$ runs from (0, 0) to $(x_1, x_2)$.

Length. The length of a vector, $|\vec{x}|$, is found via the Pythagorean theorem ($a^2 = b^2 + c^2$): $|\vec{x}| = (x_1^2 + x_2^2)^{1/2} = \sqrt{x_1^2 + x_2^2}$.

Angles and their properties

Angle. The angle between two vectors can be directly inferred from a graph; the angle between $\vec{x}$ and $\vec{y}$ is written $\theta_{xy}$. The angle can be computed via the inner product or dot product.

Inner product/Dot product/Scalar product. These terms are equivalent. The term dot product is used for the product between two vectors in vector geometry and is written as $\vec{x}\cdot\vec{y}$. When the term inner product is used, the product is mostly written as $\mathbf{x}'\mathbf{y}$. In terms of the coordinates of the vectors, the product is defined as $\vec{x}\cdot\vec{y} = x_1y_1 + x_2y_2$ or, in more geometric terms, as $\vec{x}\cdot\vec{y} = |\vec{x}|\,|\vec{y}|\cos\theta_{xy}$, which is the length of $\vec{x}$ times the length of $\vec{y}$ times the cosine of the angle between them.

Calculation. $\cos\theta_{xy} = (\vec{x}\cdot\vec{y})/(|\vec{x}|\,|\vec{y}|)$; convert the cosine to an angle via the "inverse cosine" button on your pocket/desktop calculator, look it up in a table, or use a computer program.

Special angles.
- $\theta_{xy} = 0° \Rightarrow \cos\theta_{xy} = 1$: $\vec{x}$ and $\vec{y}$ are collinear, that is, they lie on the same line in the same direction; $\vec{y} = b\vec{x}$ with $b > 0$; $\vec{x}$ is collinear with itself, $\theta_{xx} = 0$;
- $\theta_{xy} = 180° \Rightarrow \cos\theta_{xy} = -1$: $\vec{x}$ and $\vec{y}$ are collinear, that is, they lie on the same line but in opposite directions; $\vec{y} = b\vec{x}$ with $b < 0$;
- $\theta_{xy} = 90° \Rightarrow \cos\theta_{xy} = 0$: $\vec{x}$ and $\vec{y}$ are orthogonal (perpendicular); $\vec{x}\cdot\vec{y} = 0$.

Projections and their properties

Projection. The projection $\vec{y}^{(x)}$ of $\vec{y}$ on $\vec{x}$ is a vector collinear with $\vec{x}$, which can be found by dropping a perpendicular line from $\vec{y}$ onto $\vec{x}$. Thus, $\vec{y}^{(x)} = d\,\vec{x}$. The length of $\vec{y}^{(x)}$ is $|\vec{y}|\cos\theta_{xy}$, and $d = (\vec{x}\cdot\vec{y})/|\vec{x}|^2$.

Equality between cosines and correlations. If the variables are centered, the cosine of $\theta_{xy}$, the angle between two variables $\vec{x}$ and $\vec{y}$, is equal to their correlation $r_{xy}$:

$$\cos\theta_{xy} = \frac{\vec{x}\cdot\vec{y}}{|\vec{x}|\,|\vec{y}|} = \frac{\sum_{i=1}^{I} x_i y_i}{\sqrt{\sum_{i=1}^{I} x_i^2}\,\sqrt{\sum_{i=1}^{I} y_i^2}} = r_{xy},$$

where we have used the fact that the means are 0.
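The definitions in this section translate directly into a few lines of NumPy. The vectors below are arbitrary examples chosen so that the numbers are easy to check by hand (a 3-4-5 triangle); the last lines confirm that, for centered variables, the cosine of the angle equals the Pearson correlation computed by `numpy.corrcoef`.

```python
import numpy as np

x = np.array([3.0, 4.0])
y = np.array([4.0, 3.0])

# Length via Pythagoras: |x| = sqrt(x1^2 + x2^2)
len_x = np.sqrt(x[0] ** 2 + x[1] ** 2)
# Dot (inner) product from coordinates: x1*y1 + x2*y2
dot_xy = x[0] * y[0] + x[1] * y[1]
# Cosine of the angle, and the angle itself via the inverse cosine
cos_xy = dot_xy / (np.linalg.norm(x) * np.linalg.norm(y))
angle_deg = np.degrees(np.arccos(cos_xy))

# Projection of y on x: d * x with d = (x . y) / |x|^2
d = dot_xy / (x @ x)
y_proj = d * x

# Special angle: orthogonal vectors have a zero inner product
e1 = np.array([1.0, 0.0])
e2 = np.array([0.0, 1.0])

# Centered variables (one entry per object): cosine equals correlation
a = np.array([2.0, 4.0, 6.0, 9.0])
b = np.array([1.0, 3.0, 2.0, 5.0])
a_c, b_c = a - a.mean(), b - b.mean()
cos_ab = (a_c @ b_c) / (np.linalg.norm(a_c) * np.linalg.norm(b_c))
r_ab = np.corrcoef(a, b)[0, 1]
print(f"|x| = {len_x}, x.y = {dot_xy}, angle = {angle_deg:.2f} degrees")
```

Here $|\vec{x}| = 5$, $\vec{x}\cdot\vec{y} = 24$, and $\cos\theta_{xy} = 24/25 = 0.96$, so the projection of $\vec{y}$ on $\vec{x}$ is $0.96\,\vec{x} = (2.88, 3.84)$.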