A note on using principal components in linear discriminant analysis

Ian T. Jolliffe†, Department of Mathematical Sciences, University of Aberdeen, UK
Byron J.T. Morgan, Institute of Mathematics and Statistics, University of Kent at Canterbury, UK
Philip J. Young, Department of Mathematics, University of York, UK

August 1995
Summary

Simple, readily calculated t-statistics can be used to calibrate measures devised by Chang (1983) for indicating which principal components are important in linear discriminant analysis.

Keywords: Discriminant analysis; Principal components; Variable selection; Shape; Chang's method
†Address for correspondence: Department of Mathematical Sciences, University of Aberdeen, Edward Wright Building, Dunbar Street, Aberdeen AB9 2TY, Scotland.
1. Introduction
Each year that multivariate analysis is taught at Kent, students are given the option of reporting their own sex, together with the following measurements, taken in inches: chest, waist, hand, head, height, forearm and wrist; see Bassett et al. (1996). Various versions of this data set have been analysed elsewhere (Jolliffe, 1986, p.51; Pack et al., 1988), and it may be obtained via the World Wide Web, using the URL http://www.ukc.ac.uk/IMS/
Sex discrimination using the full measurement set is excellent, leading to considerations of variable selection (see McLachlan, 1992, p.396). Chang (1983) considers the use of principal components for reducing the number of variables used in discrimination, and the aim of this paper is to provide a very simple calibration of his valuable approach, which has so far been lacking.
2. Chang's approach
Suppose a $p$-dimensional random variable $x$ has a distribution which is a mixture of two multivariate normal distributions with the same dispersion matrix, so that $x \sim N(\mu_1, \Sigma)$ with probability $\pi$, and $x \sim N(\mu_2, \Sigma)$ with probability $(1-\pi)$. Chang (1983) notes that the covariance matrix of $x$ is then given by
$$\Sigma^* = \pi(1-\pi)\,\delta\delta' + \Sigma, \qquad (1)$$
where $\delta = (\mu_1 - \mu_2)$. The vector of coefficients in the population discriminant function is then $\beta = \Sigma^{-1}\delta$. Let $\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_p$ be the eigenvalues of $\Sigma^*$ in descending order, and let $\gamma_1, \gamma_2, \dots, \gamma_p$ be the corresponding eigenvectors, scaled so that $\gamma_k'\gamma_k = 1$, $1 \leq k \leq p$. Chang (1983) shows that if $\Delta_q^2$ is the Mahalanobis squared distance between the two populations, based on the first $q$ principal components, then
$$\Delta_q^2 = \left\{\sum_{k=1}^{q} \frac{(\gamma_k'\delta)^2}{\lambda_k}\right\}\left[1 - \pi(1-\pi)\sum_{k=1}^{q} \frac{(\gamma_k'\delta)^2}{\lambda_k}\right]^{-1}. \qquad (2)$$
A similar expression results for any subset of components. Thus the importance of the $k$th principal component $z_k = \gamma_k'x$ (with $\mathrm{var}(z_k) = \lambda_k$) in discrimination between the two populations depends on
$$\omega_k^2 = \frac{(\gamma_k'\delta)^2}{\lambda_k}, \qquad 1 \leq k \leq p.$$
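Equation (2) is easily checked numerically. The following is a minimal sketch (in Python with NumPy; all parameter values are arbitrary illustrations, not taken from the paper), comparing (2) with the Mahalanobis distance computed directly from the within-group covariance of the first $q$ principal component scores.

```python
import numpy as np

rng = np.random.default_rng(0)
p, pi = 4, 0.4                         # dimension and mixing probability (arbitrary)
mu1, mu2 = rng.normal(size=p), rng.normal(size=p)
A = rng.normal(size=(p, p))
Sigma = A @ A.T + p * np.eye(p)        # an arbitrary positive-definite dispersion
delta = mu1 - mu2

# Equation (1): covariance matrix of the mixture
Sigma_star = pi * (1 - pi) * np.outer(delta, delta) + Sigma
lam, gam = np.linalg.eigh(Sigma_star)
lam, gam = lam[::-1], gam[:, ::-1]     # eigenvalues in descending order

omega2 = (gam.T @ delta) ** 2 / lam    # omega_k^2 = (gamma_k' delta)^2 / lambda_k

for q in range(1, p + 1):
    s_q = omega2[:q].sum()
    delta2_chang = s_q / (1 - pi * (1 - pi) * s_q)   # equation (2)
    Gq = gam[:, :q]                    # first q principal component directions
    u = Gq.T @ delta
    delta2_direct = u @ np.linalg.solve(Gq.T @ Sigma @ Gq, u)
    print(q, round(delta2_chang, 6), round(delta2_direct, 6))   # columns agree
```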
Chang (1983) employed likelihood-ratio tests to evaluate the importance of the $\{\omega_k\}$, and the complexity of this procedure is a disincentive to its use. Authors who also comment on the importance of the $\{\omega_k\}$ include Dillon et al. (1989) and Kshirsagar et al. (1990). In practice, of course, we must base inferences on estimates of the $\{\omega_k\}$. Although Dillon et al. (1989) and Kshirsagar et al. (1990) suggest estimating $\omega_k$ using the obvious sample analogue $\hat\omega_k$ (see below), no one has addressed the important question of the sampling distribution of the estimates $\{\hat\omega_k\}$.
3. The approximate distribution of $\hat\omega_k$

Suppose we have a random sample $x_{11}, \dots, x_{1n_1}$ from $N(\mu_1, \Sigma)$, and independently a random sample $x_{21}, \dots, x_{2n_2}$ from $N(\mu_2, \Sigma)$. Let $n = n_1 + n_2$,
$$\bar{x}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij}, \qquad i = 1, 2,$$
and
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{2}\sum_{j=1}^{n_i} x_{ij}.$$
The sample analogue of (1) is T = W + B (Mardia et al., 1979, p.334), where
$$W = \sum_{i=1}^{2}\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)(x_{ij} - \bar{x}_i)'$$
and
$$B = c\,dd',$$
where $c = n_1 n_2 / n$ and $d = (\bar{x}_1 - \bar{x}_2)$. Clearly $B$ is of rank one in the two-population case of interest here. Let $S = W/m$, where $m = (n_1 + n_2 - 2)$, so that the linear discriminant function is $b'x$, where $b = S^{-1}d$ is the eigenvector of $W^{-1}B$ corresponding to the only non-zero eigenvalue $\theta = \mathrm{tr}(W^{-1}B)$, and the Mahalanobis squared distance between the two samples is
$$D^2 = d'S^{-1}d.$$
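These sample quantities are straightforward to compute. The following is a minimal sketch using two illustrative normal samples (the sizes and parameters are arbitrary, not the shape data); as a check it verifies the identity $\theta = cD^2/m$, which follows directly from the definitions above since $B$ has rank one.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n1, n2 = 5, 60, 80                       # arbitrary illustrative sizes
X1 = rng.normal(size=(n1, p)) + 0.8         # sample from N(mu_1, Sigma)
X2 = rng.normal(size=(n2, p))               # sample from N(mu_2, Sigma)
n, m = n1 + n2, n1 + n2 - 2

xbar1, xbar2 = X1.mean(axis=0), X2.mean(axis=0)
d = xbar1 - xbar2
c = n1 * n2 / n

W = (X1 - xbar1).T @ (X1 - xbar1) + (X2 - xbar2).T @ (X2 - xbar2)
B = c * np.outer(d, d)                      # rank one, as noted above
T = W + B                                   # sample analogue of (1)
S = W / m

b = np.linalg.solve(S, d)                   # discriminant coefficients b = S^{-1} d
theta = np.trace(np.linalg.solve(W, B))     # only non-zero eigenvalue of W^{-1} B
D2 = d @ np.linalg.solve(S, d)              # Mahalanobis squared distance

print(np.isclose(theta, c * D2 / m))        # theta = c D^2 / m from the definitions
```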
The sample analogue of $\omega_k$ provides the obvious estimate,
$$\hat\omega_k = \frac{a_k'd}{\ell_k^{1/2}},$$
where $\ell_1 \geq \ell_2 \geq \dots \geq \ell_p$ are the ordered eigenvalues of the sample covariance matrix, $F = T/(n_1 + n_2 - 1)$, and $\{a_i,\ 1 \leq i \leq p\}$ are the corresponding eigenvectors. A likelihood ratio test of the null hypothesis
$H_{0k}: \omega_k = 0$ has test statistic (Mardia et al., 1979, p.323)
$$\kappa_k = \frac{(m - p + 1)\,c\,b_k^2}{m\,t^{kk}(m + cD^2)}, \qquad 1 \leq k \leq p,$$
where $t^{kk}$ is the $k$th diagonal element of $T^{-1}$, and under $H_{0k}$,
$$\kappa_k \sim F_{1,\,m-p+1}. \qquad (3)$$
Dealing with principal components greatly simplifies the analysis, as $T$ is then diagonal. Simulation studies suggest that the practical importance of the dependencies induced by using principal components is negligible (see Takemura, 1985; Jolliffe et al., 1995). Using the approximation that results from treating principal components as ordinary variates, we show (Jolliffe et al., 1992) that
$$\kappa_k = \frac{c(m - p + 1)(1 + \theta)}{m + 1}\,\hat\omega_k^2,$$
and so, from (3), under $H_{0k}$ applied to the principal component case,
$$\left\{\frac{c(m - p + 1)(1 + \theta)}{m + 1}\right\}^{1/2}\hat\omega_k \sim t_{m-p+1}. \qquad (4)$$
In practice we can therefore decide whether or not to include a particular principal component in the linear discriminant function on the basis of a t-test, where the test statistic is a multiple of $\hat\omega_k$. This is the t-test used in standard linear discriminant analysis on the principal components, and so is readily carried out, for example using computer packages such as Minitab which perform multiple regression (Jolliffe et al., 1992).
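A minimal sketch of this t-test follows, assuming the quantities from the previous sketch ($T$, $n$, $m$, $c$, $d$ and $\theta$, for two illustrative samples rather than the shape data) are already in scope; the printed p-values are purely illustrative.

```python
import numpy as np
from scipy import stats

p = T.shape[0]
F = T / (n - 1)                        # sample covariance matrix, F = T/(n1 + n2 - 1)
ell, a = np.linalg.eigh(F)
ell, a = ell[::-1], a[:, ::-1]         # ordered so that l_1 >= l_2 >= ... >= l_p

omega_hat = (a.T @ d) / np.sqrt(ell)   # sample analogue: omega_hat_k = a_k'd / l_k^{1/2}
mult = np.sqrt(c * (m - p + 1) * (1 + theta) / (m + 1))
t_stat = mult * omega_hat              # ~ t_{m-p+1} under H_0k, by (4)
p_val = 2 * stats.t.sf(np.abs(t_stat), df=m - p + 1)

for k in range(p):
    print(f"pc{k + 1}: t = {t_stat[k]:7.2f}, p = {p_val[k]:.3f}")
```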
4. Application to the shape data
The example chosen comprised 100 females and 132 males. Several of the principal components shown in Table 1 have simple interpretations: for example, pc1 is a measure of size, pc2 separates fat and thin people, and pc3 is primarily a chest/waist contrast. It is interesting to note that the head measurement, which frequently enters the second or third component when the correlation matrix for shape data is analysed (Maxwell, 1977, p.43; Jolliffe, 1986, p.52), features here only in components 5 and 6.
Table 1 about here

The only components that are significant for the sex discrimination are pcs 1 and 3. Illustrative apparent error rates are 5% (all variables) and 6% (pcs 1 and 3, or just the chest and waist measurements). Why a chest/waist contrast should be important for human sex discrimination is, with hindsight, obvious.
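For readers wishing to reproduce the comparison, a minimal sketch follows. It assumes the shape data are held in a 232 x 7 NumPy array X (columns ordered as in Table 1) with a 0/1 integer array sex; both names are hypothetical, and the data themselves are available at the URL given in Section 1. The sketch uses the resubstitution (apparent) error of the usual midpoint allocation rule.

```python
import numpy as np

def apparent_error(Z, g):
    """Resubstitution error of the two-group linear discriminant rule."""
    Z0, Z1 = Z[g == 0], Z[g == 1]
    zb0, zb1 = Z0.mean(axis=0), Z1.mean(axis=0)
    m = len(Z) - 2
    W = (Z0 - zb0).T @ (Z0 - zb0) + (Z1 - zb1).T @ (Z1 - zb1)
    b = np.linalg.solve(W / m, zb1 - zb0)   # b = S^{-1} d
    cut = b @ (zb0 + zb1) / 2               # midpoint allocation rule
    pred = (Z @ b > cut).astype(int)
    return np.mean(pred != g)

# all seven variables, versus the scores on pcs 1 and 3 only
ell, a = np.linalg.eigh(np.cov(X, rowvar=False))   # eigenvectors of F = T/(n-1)
a = a[:, np.argsort(ell)[::-1]]
print(apparent_error(X, sex))                      # about 5% in the paper
print(apparent_error(X @ a[:, [0, 2]], sex))       # about 6% in the paper
```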
Acknowledgement
We are grateful to a referee for bringing the work of Takemura (1985) to our attention.
5. References
Bassett, E.E., Brooks, S.P. and Morgan, B.J.T. (1996) MINITAB multivariate macros. Applied Statistics, to appear.
Chang, W.C. (1983) On using principal components before separating a mixture of two multivariate normal distributions. Applied Statistics, 32, 267-275.
Dillon, W.R., Mulani, N. and Frederick, D.G. (1989) On the use of component scores in the presence of group structure. J. Consumer Research, 16, 106-112.
Jolliffe, I.T. (1986) Principal component analysis. New York: Springer-Verlag.
Jolliffe, I.T., Morgan, B.J.T. and Young, P.J. (1992) On using principal components in linear discriminant analysis. University of Kent, Institute of Mathematics and Statistics, Tech. Report No. UKC/IMS/S92/11.
Jolliffe, I.T., Morgan, B.J.T. and Young, P.J. (1995) A simulation study of the use of principal components in linear discriminant analysis. Submitted for publication.
Kshirsagar, A.M., Kocherlakota, S. and Kocherlakota, K. (1990) Classification procedures using principal component analysis and stepwise discriminant function. Commun. Statist. Theor. Meth., 19, 91-109.
McLachlan, G.J. (1992) Discriminant analysis and statistical pattern recognition. New York: Wiley.
Mardia, K.V., Kent, J.T. and Bibby, J.M. (1979) Multivariate Analysis. London: Academic Press.
Maxwell, A.E. (1977) Multivariate analysis in behavioural research. London: Chapman & Hall.
Pack, P., Jolliffe, I.T. and Morgan, B.J.T. (1988) Influential observations in principal component analysis: a case study. J. Appl. Statist., 15, 37-50.
Takemura, A. (1985) A principal decomposition of Hotelling's $T^2$ statistic. pp. 583-597 in P.R. Krishnaiah (Ed.) Multivariate Analysis VI. Elsevier Science Publishers B.V.
Table 1: A principal component analysis of the shape data dispersion matrix

                                          PCs
                                 1      2      3      4      5      6      7
$\ell_k$                      32.0    4.3    2.0    0.8    0.6    0.5    0.1
$\hat\omega_k = a_k'd/\ell_k^{1/2}$
                              1.64  -0.00  -0.46   0.05  -0.01   0.01  -0.13
t-statistic (4)              23.11  -0.03  -6.48   0.73  -0.18   0.19  -1.83

Variables (coefficients $a_k$):
Chest                         0.42   0.48  -0.71   0.05  -0.06  -0.05  -0.05
Waist                         0.60   0.47   0.63   0.07   0.11  -0.01  -0.05
Hand                          0.11  -0.05  -0.05  -0.89   0.42   0.06  -0.11
Head                          0.10  -0.02   0.02  -0.14  -0.44   0.88   0.04
Height                        0.64  -0.73  -0.10   0.18   0.10  -0.01  -0.03
Forearm                       0.14  -0.08   0.09  -0.37  -0.78  -0.47  -0.05
Wrist                         0.09   0.02  -0.00   0.10   0.03  -0.05   0.99