Mathematical Geology, Vol. 12, No. 6, 1980
Scaling Variables and Interpretation of Eigenvalues in Principal Component Analysis of Geologic Data
A. T. Miesch²
The dominant feature distinguishing one method of principal components analysis from another is the manner in which the original data are transformed prior to the other computations. The only other distinguishing feature of any importance is whether the eigenvectors of the inner product-moment of the transformed data matrix are taken directly as the Q-mode scores or scaled by the square roots of their associated eigenvalues and called the R-mode loadings. If the eigenvectors are extracted from the product-moment correlation matrix, the variables, in effect, were transformed by column standardization (zero means and unit variances), and the sum of the p largest eigenvalues divided by the sum of all the eigenvalues indicates the degree to which a model containing p components will account for the total variance in the original data. However, if the data were transformed in any manner other than column standardization, the eigenvalues cannot be used in this manner, but can only be used to determine the degree to which the model will account for the transformed data. Regardless of the type of principal components analysis that is performed, even whether it is R-mode or Q-mode, the goodness-of-fit of the model to the original data is given better by the eigenvalues of the correlation matrix than by those of the matrix that was actually factored.
KEY WORDS: R-mode and Q-mode analysis, scaling variables, correlation.

¹Manuscript received 28 February 1980; revised 14 April 1980.
²U.S. Geological Survey, Denver, CO 80225.
0020-5958/80/1200-0523$03.00/0 © 1980 Plenum Publishing Corporation

INTRODUCTION

Principal components analysis is one of a family of multivariate methods commonly referred to as factor analysis. True factor analysis (Jöreskog, Klovan, and Reyment, 1976, p. 78) is another member of the family, as is the more general procedure referred to simply as components analysis. Nearly all factor analyses begin with the determination of the principal components, which serve as reference axes for the multidimensional cluster of data points. Elimination of one or more of the highest-order principal components (those with the smallest eigenvalues) constitutes a projection of the data into a space of fewer dimensions. After the projection, it is commonly desirable to rotate the principal component axes, or simply to replace them with alternative reference axes that are more easily interpretable in geologic terms. Nevertheless, the goodness-of-fit of the final model to the data is fixed with the decision on how many principal components to retain, and subsequent refinement of the principal components model by rotation or replacement of the principal component axes does nothing whatsoever to improve the fit of the model to the data. Thus, it is important to understand the principal components procedures as thoroughly as possible before considering the various refinements.

A variety of principal components methods have become popular in geology. They range from the conventional R-mode method, based on the correlation or covariance matrix, to Q-mode methods, which are based on coefficients that express the relations among observations rather than variables. The purpose of this paper is to organize the various computational procedures and to show that they differ primarily in the way in which the data are scaled prior to the other computations. It may also be useful to point out the circumstances wherein the eigenvalues can and cannot be used to judge the goodness-of-fit of the derived model to the original data.
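Two of the statements above can be checked numerically. The following is a minimal sketch, not the paper's own procedure: it assumes NumPy, uses randomly generated stand-in data, and the names `standardize_columns` and `eigen_share` are hypothetical. It column-standardizes a data matrix, factors W'W/n (which is then the correlation matrix), and confirms (a) that the sum of the p largest eigenvalues divided by the sum of all eigenvalues equals the fraction of the standardized variance reproduced by a p-component model, and (b) that an orthogonal rotation of the retained axes leaves the fitted values unchanged.

```python
import numpy as np


def standardize_columns(X):
    """Column standardization: zero mean and unit variance for each variable."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def eigen_share(W, p):
    """Eigen-decompose W'W/n; return eigenvalues (descending), eigenvectors,
    and the sum of the p largest eigenvalues over the sum of all eigenvalues."""
    n = W.shape[0]
    eigvals, eigvecs = np.linalg.eigh(W.T @ W / n)
    order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvals, eigvecs, eigvals[:p].sum() / eigvals.sum()


# Hypothetical data: 50 samples, 4 correlated variables on very different scales.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4)) @ rng.normal(size=(4, 4)) * np.array([1.0, 10.0, 100.0, 0.1])

p = 2                                          # number of components retained
W = standardize_columns(X)
eigvals, V, share = eigen_share(W, p)

# p-component model of the standardized data and the variance it reproduces.
Vp = V[:, :p]
W_hat = W @ Vp @ Vp.T
explained = 1.0 - ((W - W_hat) ** 2).sum() / (W ** 2).sum()

# Rotate the retained axes by an arbitrary angle; the fitted values do not change.
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])
V_rot = Vp @ R
W_hat_rot = W @ V_rot @ V_rot.T

print("eigenvalue share:", share)
print("variance reproduced by the model:", explained)
print("rotation changes the fit:", not np.allclose(W_hat, W_hat_rot))
```

The standardization uses the population standard deviation (NumPy's default, `ddof=0`) so that W'W/n has a unit diagonal; with the sample standard deviation the eigenvalues would be scaled by a constant factor, but the ratio would be the same.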
NOTATION

The following notation is used in the text.

Symbols, in the order defined: n, m, X, X̂, W, Ŵ, q, p, A, G, H, A, F, U, V

n    Number of samples (observations)
m    Number of variables
X    Matrix of original data (n rows; m columns)
X̂    Estimate of the matrix of original data (n rows; m columns)
W    Matrix of transformed data (n rows; m columns)
Ŵ    Estimate of the matrix of transformed data (n rows; m columns)
q    Rank of W; equals the number of positive eigenvalues from W'W/n or WW'/n (1