Communications in Statistics—Simulation and Computation®, 35: 727–749, 2006
Copyright © Taylor & Francis Group, LLC
ISSN: 0361-0918 print/1532-4141 online
DOI: 10.1080/03610910600716290

Multivariate Analysis

A Comparison of Some Tests for Determining the Number of Nonzero Canonical Correlations

TADEUSZ CALIŃSKI¹, MIROSŁAW KRZYŚKO², AND WALDEMAR WOŁYŃSKI²

¹Department of Mathematical and Statistical Methods, August Cieszkowski Agricultural University of Poznań, Poznań, Poland
²Faculty of Mathematics and Computer Science, Adam Mickiewicz University of Poznań, Poznań, Poland

When considering the relationships between two sets of variates, the number of nonzero population canonical correlations may be called the dimensionality. Several tests for dimensionality in canonical correlation analysis are known from the literature. A comparison of seven sequential test procedures is presented, based on the results of a simulation study. The tests are compared with regard to the relative frequencies of underestimation, correct estimation, and overestimation of the true dimensionality. Some conclusions are drawn from the simulation results.

Keywords: Bartlett–Nanda–Pillai trace statistic; Canonical correlations; Lawley–Hotelling trace statistic; Simulation study; Testing dimensionality hypotheses; Wilks statistic.

Mathematics Subject Classification: Primary 62H20; Secondary 62H15.

Received February 27, 2004; Accepted January 6, 2006
Address correspondence to Tadeusz Caliński, Department of Mathematical and Statistical Methods, August Cieszkowski Agricultural University of Poznań, Wojska Polskiego 28, 60-637 Poznań, Poland; E-mail: [email protected]

1. Introduction

For studying interrelations between two sets of variables, the canonical correlation analysis proposed by Hotelling (1936) is often used. It consists of determining linear transformations of the original variables from both sets into two new sets of variables that are uncorrelated within the sets but maximally correlated between them. Pairs of corresponding new variables are called canonical variates, and the correlation coefficients within the pairs are called canonical correlations. Most frequently, one is interested in reducing the number of resulting pairs, so as to consider only those that are really highly correlated. This amounts to reducing the dimensionality of the two new coordinate systems. Several methods for estimating the dimensionality in canonical correlation analysis are known from the literature. They are either sequential test procedures or direct estimation methods. The former are based on well-known statistics, such as the Lawley–Hotelling trace statistic, the Wilks statistic, and the Bartlett–Nanda–Pillai trace statistic. The latter are based on information criteria (IC), in particular the Akaike IC and the Bayesian IC. The main purpose of the present article is to compare seven different test procedures, using the results of an extensive simulation study. To give some idea of the performance of ICs, one of them is also included in the study. The tests are compared with regard to the relative frequencies of underestimation, correct estimation, and overestimation of the true dimensionality.

2. Canonical Correlation Analysis

Let Y and X be two vectors of p and q (p ≤ q) random variables, respectively, having a joint (p + q)-variate normal distribution with means $\mu_1$ and $\mu_2$, respectively, and a positive definite covariance matrix

$$\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}. \tag{1}$$

Then the population canonical correlations between Y and X,

$$1 > \rho_1 \ge \rho_2 \ge \cdots \ge \rho_d > \rho_{d+1} = \cdots = \rho_p = 0, \tag{2}$$

are the positive square roots of the eigenvalues of $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$, where d is the unknown rank of $\Sigma_{12}$ (0 ≤ d ≤ p). Furthermore, let

$$S = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix}$$

be the unbiased sample covariance matrix formed from a sample of N ≥ p + q + 1 observations, $\mathbf{Y} = [\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_N]$ and $\mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N]$, on Y and X, respectively. The sample canonical correlations

$$1 > r_1 > r_2 > \cdots > r_p > 0 \tag{3}$$

are the positive square roots of the eigenvalues of $S_{11}^{-1}S_{12}S_{22}^{-1}S_{21}$. With probability 1, the sample canonical correlations $r_1, r_2, \ldots, r_p$ are distinct and the rank of $S_{12}$ is p (see Seber, 1984, Sec. 5.7.2). In interpreting the relationships between two sets of variables, the number d of nonzero population canonical correlations may be called the dimensionality, referring to the subspace to which the truly correlated canonical variates belong. Of interest is the estimation of this dimensionality.
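As an illustration, the sample canonical correlations in (3) can be computed in Python with NumPy. This is a minimal sketch, not the software used by the authors, and the function name `sample_canonical_correlations` is hypothetical. Rather than forming $S_{11}^{-1}S_{12}S_{22}^{-1}S_{21}$ explicitly, it uses the Cholesky factors $S_{11} = L_1L_1'$ and $S_{22} = L_2L_2'$: the singular values of $L_1^{-1}S_{12}L_2'^{-1}$ are the $r_i$, because the product of this matrix with its transpose is similar to $S_{11}^{-1}S_{12}S_{22}^{-1}S_{21}$.

```python
import numpy as np


def sample_canonical_correlations(Y: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Sample canonical correlations r_1 >= ... >= r_p for Y (N x p) and X (N x q), p <= q.

    Hypothetical helper: the r_i are the positive square roots of the eigenvalues of
    S11^{-1} S12 S22^{-1} S21, obtained here as singular values of L1^{-1} S12 L2'^{-1}.
    """
    p, q = Y.shape[1], X.shape[1]
    if p > q:
        raise ValueError("expected p <= q; swap the roles of Y and X")
    S = np.cov(np.hstack([Y, X]), rowvar=False)  # unbiased sample covariance matrix
    S11, S12, S22 = S[:p, :p], S[:p, p:], S[p:, p:]
    L1 = np.linalg.cholesky(S11)  # S11 = L1 L1'
    L2 = np.linalg.cholesky(S22)  # S22 = L2 L2'
    # K = L1^{-1} S12 L2'^{-1}; K K' is similar to S11^{-1} S12 S22^{-1} S21.
    K = np.linalg.solve(L2, np.linalg.solve(L1, S12).T).T
    r = np.linalg.svd(K, compute_uv=False)  # p values, in descending order
    return np.clip(r, 0.0, 1.0)  # guard against rounding just outside [0, 1]
```

Because canonical correlations are invariant under nonsingular linear transformations of either set of variables, mixing or rescaling the columns of Y or X (or shifting them) leaves the output of this sketch unchanged.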


3. Tests of Dimensionality

Canonical correlation analysis makes sense only if $\Sigma_{12} \neq 0$, that is, d > 0. Therefore, one should first test the null hypothesis $\Sigma_{12} = 0$, which is equivalent to the hypothesis

$$H_0\colon\ \rho_1 = \rho_2 = \cdots = \rho_p = 0. \tag{4}$$

If $H_0$ is rejected, the next question is which of the canonical correlations $\rho_i$, $i = 1, 2, \ldots, p$, are zero. Because the coefficients are ordered as shown in (2), the relevant hypotheses following $H_0$ are

$$H_d\colon\ \rho_{d+1} = \rho_{d+2} = \cdots = \rho_p = 0, \qquad d = 1, 2, \ldots, p - 1. \tag{5}$$

Thus, the question of interest is “what is the value of d?” To answer it, it is advisable to proceed as follows (see, e.g., Anderson, 1984, Sec. 12.4.1; Seber, 1984, Sec. 5.7.2). First, test the overall hypothesis that d = 0, i.e., the hypothesis $H_0$ in (4). If $H_0$ is not rejected, accept that d = 0. Otherwise, test $H_d$ for d = 1. If it is not rejected, accept that d = 1. Otherwise, continue in this way until $H_d$ is not rejected for the first time, and then conclude that the last p − d canonical correlations are all zero.

Several testing procedures have been proposed in the literature for implementing this advice. For testing the hypothesis $H_d$, d = 0, 1, …, p − 1, the following three statistics, or suitable functions of them, analogous to the statistics proposed for MANOVA (see, e.g., Fujikoshi, 1977, p. 64), can be considered:

1. Lawley–Hotelling trace statistic
$$T_d^2 = \sum_{i=d+1}^{p} \frac{r_i^2}{1 - r_i^2}, \tag{6}$$

2. Wilks statistic (the likelihood ratio statistic)
$$\Lambda_d = \prod_{i=d+1}^{p} \left(1 - r_i^2\right), \tag{7}$$

3. Bartlett–Nanda–Pillai trace statistic
$$V_d = \sum_{i=d+1}^{p} r_i^2. \tag{8}$$
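The three statistics (6)–(8) depend only on the p − d smallest sample canonical correlations. A minimal Python sketch, assuming the $r_i$ are already sorted in decreasing order (the helper name `dimensionality_statistics` is hypothetical):

```python
import numpy as np


def dimensionality_statistics(r, d):
    """Return (T2_d, Lambda_d, V_d) as defined in (6)-(8) for testing H_d.

    r : sample canonical correlations r_1 >= ... >= r_p
    d : hypothesised dimensionality, 0 <= d <= p - 1
    Only r_{d+1}, ..., r_p enter the statistics.
    """
    r = np.asarray(r, dtype=float)
    if not 0 <= d < r.size:
        raise ValueError("d must satisfy 0 <= d <= p - 1")
    tail = r[d:] ** 2  # r_{d+1}^2, ..., r_p^2 (the 0-based slice starts at index d)
    t2 = float(np.sum(tail / (1.0 - tail)))  # Lawley-Hotelling trace, (6)
    wilks = float(np.prod(1.0 - tail))       # Wilks Lambda, (7)
    pillai = float(np.sum(tail))             # Bartlett-Nanda-Pillai trace, (8)
    return t2, wilks, pillai


t2, wilks, pillai = dimensionality_statistics([0.8, 0.5, 0.3], d=1)
print(round(t2, 4), round(wilks, 4), round(pillai, 4))  # 0.4322 0.6825 0.34
```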

Under $H_d$, Bartlett’s (1938, 1947) χ²-approximation can be applied to $T_d^2$, $\Lambda_d$, and $V_d$ when they are transformed to

$$S_B(T_d^2) = (N - p - q - 2)\,T_d^2, \tag{9}$$

$$S_B(\Lambda_d) = -\left[N - 1 - \tfrac{1}{2}(p + q + 1)\right]\ln \Lambda_d, \tag{10}$$

and

$$S_B(V_d) = (N - 1)\,V_d, \tag{11}$$

respectively. Under $H_d$, their distributions are approximately χ² with (p − d)(q − d) degrees of freedom. For the statistic $\Lambda_d$, Lawley (1959) gave an improved transformation,

$$S_L(\Lambda_d) = -\left[N - 1 - d - \tfrac{1}{2}(p + q + 1) + \sum_{i=1}^{d} r_i^{-2}\right]\ln \Lambda_d, \tag{12}$$

which under $H_d$ is distributed approximately as χ² with (p − d)(q − d) degrees of freedom. Further, Fujikoshi (1977, p. 66) has given modified test statistics for the other two statistics considered above:

$$S_F(T_d^2) = \left(N - p - q - 2 + \sum_{i=1}^{d} r_i^{-2}\right) T_d^2 \tag{13}$$

and

$$S_F(V_d) = \left(N - 1 - 2d + \sum_{i=1}^{d} r_i^{-2}\right) V_d, \tag{14}$$

which are also approximately distributed as χ² with (p − d)(q − d) degrees of freedom.

The procedures most commonly used in practice are those based on the $\Lambda_d$ criterion, modified either according to Bartlett or to Lawley. Relevant computing procedures, mainly using the former modification, are available in many statistical program packages, e.g., BMD, STATISTICA, and STATGRAPHICS. Surprisingly, procedures based on the other multivariate test criteria have received little attention, even though their limiting distributions are known (see Fujikoshi, 1977, Sec. 4).

Caliński and Krzyśko (2005) have suggested a closed testing procedure, in the sense of Marcus et al. (1976), based on the Lawley–Hotelling trace statistic $T_d^2$. Note that the hypotheses $H_d$ for d = 0, 1, …, p − 1 have the property that, for any d > 0, the hypothesis $H_d$ is implied by $H_{d-1}$, and thus all of them are implied by $H_0$. Moreover, since the equality $H_{d_1} \cap H_{d_2} = H_{d_1}$ holds for any $d_1 < d_2$, the set of hypotheses

$$\mathcal{H} = \{H_0, H_1, \ldots, H_{p-1}\} \tag{15}$$

forms a family closed under intersection. Therefore, any $H_d \in \mathcal{H}$ should be tested by means of an α-level test if and only if all hypotheses preceding $H_d$ in $\mathcal{H}$ have been tested and rejected. To ensure that the probability of making no Type I error is at least 1 − α, it is sufficient that the test employed for testing any $H_d$ rejects a true $H_d$ with probability not exceeding α (see Marcus et al., 1976, p. 656).
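The six χ²-approximation procedures defined by (9)–(14) can each be combined with the sequential rule of this section: test $H_0$, then $H_1$, and so on, stopping at the first non-rejection. The following Python sketch assumes SciPy is available; the method labels such as `"SB_T2"` and both function names are hypothetical, and the code is an illustration rather than the implementation used in the study.

```python
import numpy as np
from scipy.stats import chi2


def chi2_statistic(method, r, d, N, p, q):
    """Transformed statistic (9)-(14) for H_d; the method labels are hypothetical."""
    r = np.asarray(r, dtype=float)
    tail = r[d:] ** 2                        # r_{d+1}^2, ..., r_p^2
    head_inv = float(np.sum(r[:d] ** -2.0))  # sum_{i=1}^{d} r_i^{-2}; 0.0 when d == 0
    t2 = np.sum(tail / (1.0 - tail))         # T_d^2, (6)
    log_wilks = np.sum(np.log1p(-tail))      # ln Lambda_d, from (7)
    v = np.sum(tail)                         # V_d, (8)
    statistics = {
        "SB_T2": (N - p - q - 2) * t2,                                         # (9)
        "SB_Lambda": -(N - 1 - 0.5 * (p + q + 1)) * log_wilks,                 # (10)
        "SB_V": (N - 1) * v,                                                   # (11)
        "SL_Lambda": -(N - 1 - d - 0.5 * (p + q + 1) + head_inv) * log_wilks,  # (12)
        "SF_T2": (N - p - q - 2 + head_inv) * t2,                              # (13)
        "SF_V": (N - 1 - 2 * d + head_inv) * v,                                # (14)
    }
    return float(statistics[method])


def estimate_dimensionality(method, r, N, p, q, alpha=0.05):
    """Test H_0, H_1, ... in turn and return the first d whose H_d is not rejected."""
    for d in range(p):
        df = (p - d) * (q - d)
        if chi2_statistic(method, r, d, N, p, q) <= chi2.ppf(1.0 - alpha, df):
            return d  # H_d not rejected: estimate the dimensionality as d
    return p  # H_0, ..., H_{p-1} were all rejected


for m in ["SB_T2", "SF_T2", "SB_Lambda", "SL_Lambda", "SB_V", "SF_V"]:
    print(m, estimate_dimensionality(m, [0.9, 0.05, 0.02], N=100, p=3, q=3))  # each gives 1
```

For d = 0 the empty sum makes $S_F(T_0^2)$ coincide with $S_B(T_0^2)$, and likewise $S_L(\Lambda_0)$ with $S_B(\Lambda_0)$ and $S_F(V_0)$ with $S_B(V_0)$, so the corresponding procedures can differ only from the test of $H_1$ onward.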


When $H_0$ is true, the null distribution of the Lawley–Hotelling trace statistic $T_0^2$ is indexed by the triple $(p, m_H, m_E)$ and is denoted by $T^2_{p,\,m_H,\,m_E}$, where $m_H = q$, $m_E = N - q - 1$, and $p \le \min(m_H, m_E)$. Although the exact distribution of $T_d^2$ under $H_d$ is not available, the null distribution of the Lawley–Hotelling trace statistic related to the triple $(p - d, m_H - d, m_E)$ can be shown to be an optimal upper bound for the distribution of $T_d^2$ under $H_d$, so that

$$P\left(T_d^2 > T^2_{p-d,\,m_H-d,\,m_E;\,\alpha} \,\middle|\, H_d\right) \le \alpha,$$

where $T^2_{p-d,\,m_H-d,\,m_E;\,\alpha}$ denotes the upper α critical value of that distribution. This has been established in Caliński and Lejeune (1998, Sec. 2) using some results of Schott (1984). It is therefore justified to test the family of hypotheses (15) sequentially with the statistic $T_d^2$, for d = 0, 1, 2, …, until the inequality

$$T_d^2 \le T^2_{p-d,\,m_H-d,\,m_E;\,\alpha} \tag{16}$$

is reached. If d is the first value for which (16) holds, it can be concluded that the dimensionality, i.e., the number of nonzero population canonical correlations, is d. The probability of making no Type I error, i.e., of not overestimating the true d by stopping too late, is then at least 1 − α. Unlike Bartlett’s approximation (9) and Fujikoshi’s approximation (13), the method considered here approximates the distribution of $T_d^2$ by its optimal upper bound, which is an exact Lawley–Hotelling $T^2_{p-d,\,m_H-d,\,m_E}$ distribution. This allows the critical value for testing the hypothesis $H_d$ from the family (15) to be taken as given in (16). If $H_d$ is true, the probability that $T_d^2$ exceeds this critical value is then at most the designated α, which ensures control of the familywise Type I error rate at level α, as explained above.
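The closed testing procedure needs upper critical values of the Lawley–Hotelling distribution $T^2_{p-d,\,m_H-d,\,m_E}$. The sketch below approximates them by Monte Carlo simulation, which is a swapped-in technique rather than the exact computation behind (16). It assumes that $T^2$ is the unscaled trace $\operatorname{tr}(\mathbf{H}\mathbf{E}^{-1})$ with independent Wishart matrices $\mathbf{H} \sim W_s(m, \mathbf{I})$ and $\mathbf{E} \sim W_s(m_E, \mathbf{I})$, which matches the scale of $T_d^2$ in (6) (in the multivariate regression of Y on X, the eigenvalues of $\mathbf{H}\mathbf{E}^{-1}$ are $r_i^2/(1 - r_i^2)$). Both function names are hypothetical.

```python
import numpy as np


def lh_trace_critical_value(s, m_h, m_e, alpha, n_sim=5000, seed=0):
    """Monte Carlo upper-alpha point of the null Lawley-Hotelling trace T^2_{s, m_h, m_e}.

    Assumption: T^2 = tr(H E^{-1}) with independent H ~ W_s(m_h, I) and E ~ W_s(m_e, I),
    i.e. the unscaled form that matches T_d^2 in (6). Simulation replaces exact tables.
    """
    rng = np.random.default_rng(seed)
    zh = rng.standard_normal((n_sim, m_h, s))
    ze = rng.standard_normal((n_sim, m_e, s))
    H = np.einsum("nki,nkj->nij", zh, zh)  # batch of Wishart(m_h, I) matrices
    E = np.einsum("nki,nkj->nij", ze, ze)  # batch of Wishart(m_e, I) matrices
    traces = np.trace(np.linalg.solve(E, H), axis1=1, axis2=2)  # tr(E^{-1} H) = tr(H E^{-1})
    return float(np.quantile(traces, 1.0 - alpha))


def closed_test_dimensionality(r, N, p, q, alpha=0.05, n_sim=5000, seed=0):
    """Test H_0, H_1, ... with T_d^2 and stop when inequality (16) is first reached."""
    r = np.asarray(r, dtype=float)
    m_h, m_e = q, N - q - 1
    for d in range(p):
        tail = r[d:] ** 2
        t2 = float(np.sum(tail / (1.0 - tail)))  # T_d^2 from (6)
        crit = lh_trace_critical_value(p - d, m_h - d, m_e, alpha, n_sim, seed)
        if t2 <= crit:  # (16) holds: stop and estimate the dimensionality as d
            return d
    return p
```

For s = 1 the trace reduces to $(m/m_E)\,F_{m,\,m_E}$, which gives a quick check of the simulated critical values against exact F quantiles; the Monte Carlo error shrinks as `n_sim` grows, at the cost of memory proportional to `n_sim * m_e * s`.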

4. The Simulation Study

Consider the (p + q) × (p + q) covariance matrix Σ of the combined random vector (Y′, X′)′, of the form (1). The main problem in designing a simulation study of canonical correlation analysis is the infinite number of Σ matrices that may be selected. However, the problem can be alleviated considerably by confining the selection to Σ’s in the canonical form, i.e., such that

$$\Sigma = \begin{pmatrix} \mathbf{I}_p & [\,\mathbf{D}_\rho \;\; \mathbf{0}\,] \\ [\,\mathbf{D}_\rho \;\; \mathbf{0}\,]' & \mathbf{I}_q \end{pmatrix}, \qquad \mathbf{D}_\rho = \operatorname{diag}(\rho_1, \rho_2, \ldots, \rho_p),$$

where $\mathbf{0}$ is the p × (q − p) zero matrix, so that the off-diagonal block is a p × q matrix with $\rho_1, \ldots, \rho_p$ on its leading diagonal and zeros elsewhere.

This simplifies the selection problem, because results obtained from the canonical matrices are in no way less general than results obtained from any other Σ matrices. There are three reasons for this.
• First, canonical correlations are invariant under nonsingular linear transformations, and so are their test statistics (see, e.g., Kshirsagar, 1972, p. 278).
• Second, for any matrix Σ, one can always select two matrices, say P and Q, such that Y* = PY and X* = QX yield a covariance matrix Σ* in the canonical form.
• Finally, and most importantly, the exact joint distribution of $r_1^2, \ldots, r_p^2$ depends only on the parameters $\rho_1^2, \ldots, \rho_p^2$ (see, e.g., Muirhead, 1982, p. 557).

The simulations were conducted at the significance levels α = 0.01, α = 0.05, and α = 0.1, applying the same method as that used by Mendoza et al. (1978). The results presented here are confined to samples of size N = 25, 50, and 75, generated from normal distributions with four combinations of p and q: p = 4, q = 5; p = 4, q = 8; p = 6, q = 6; and p = 6, q = 8. From each population, specified by the assumed values of the canonical correlations (see Table 1), 10,000 samples were simulated for each N (10 times more than in the referenced paper). Seven testing procedures were applied to each sample:
• three procedures based on the Lawley–Hotelling trace statistic $T_d^2$: the proposed closed testing procedure, denoted by $T_d^2$; the Bartlett χ²-approximation procedure, denoted by $S_B(T_d^2)$; and the Fujikoshi χ²-approximation procedure, denoted by $S_F(T_d^2)$;
• two procedures based on the likelihood ratio statistic $\Lambda_d$: the Bartlett χ²-approximation procedure, denoted by $S_B(\Lambda_d)$, and the Lawley χ²-approximation procedure, denoted by $S_L(\Lambda_d)$;
• two procedures based on the Bartlett–Nanda–Pillai trace statistic $V_d$: the Bartlett χ²-approximation procedure, denoted by $S_B(V_d)$, and the Fujikoshi χ²-approximation procedure, denoted by $S_F(V_d)$.
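A single cell of such a simulation can be sketched in Python as follows. This is a hypothetical illustration, not the method of Mendoza et al. (1978) used in the study: it draws N observations from $N_{p+q}(\mathbf{0}, \Sigma)$ with Σ in the canonical form given above, estimates d with one procedure (the Bartlett-corrected Wilks procedure $S_B(\Lambda_d)$, chosen here only as an example), and reports the percentages of underestimation, correct estimation, and overestimation. All function names are hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def canonical_sigma(rho, q):
    """Sigma in canonical form: identity diagonal blocks, diag(rho) padded with zeros off-diagonal."""
    p = len(rho)
    off = np.zeros((p, q))
    off[:, :p] = np.diag(rho)
    return np.block([[np.eye(p), off], [off.T, np.eye(q)]])


def sample_cancorr(Y, X):
    """Sample canonical correlations via Cholesky factors and an SVD (requires p <= q)."""
    p = Y.shape[1]
    S = np.cov(np.hstack([Y, X]), rowvar=False)
    L1 = np.linalg.cholesky(S[:p, :p])
    L2 = np.linalg.cholesky(S[p:, p:])
    K = np.linalg.solve(L2, np.linalg.solve(L1, S[:p, p:]).T).T
    return np.linalg.svd(K, compute_uv=False)


def bartlett_wilks_estimate(r, N, p, q, alpha):
    """Sequential S_B(Lambda_d) procedure, used here only as an example estimator."""
    for d in range(p):
        stat = -(N - 1 - 0.5 * (p + q + 1)) * np.sum(np.log1p(-r[d:] ** 2))
        if stat <= chi2.ppf(1.0 - alpha, (p - d) * (q - d)):
            return d
    return p


def simulate_population(rho, q, N, alpha=0.05, n_rep=10_000, seed=0):
    """Percentages of under-, correct, and overestimation of d for one population."""
    rho = np.asarray(rho, dtype=float)
    p = rho.size
    d_true = int(np.count_nonzero(rho))
    chol = np.linalg.cholesky(canonical_sigma(rho, q))
    rng = np.random.default_rng(seed)
    counts = {"under": 0, "correct": 0, "over": 0}
    for _ in range(n_rep):
        Z = rng.standard_normal((N, p + q)) @ chol.T  # rows ~ N_{p+q}(0, Sigma)
        d_hat = bartlett_wilks_estimate(sample_cancorr(Z[:, :p], Z[:, p:]), N, p, q, alpha)
        key = "under" if d_hat < d_true else "correct" if d_hat == d_true else "over"
        counts[key] += 1
    return {k: 100.0 * v / n_rep for k, v in counts.items()}
```

For instance, `simulate_population([0.8, 0.7, 0.0, 0.0], q=5, N=50)` corresponds to population 3 of Table 1 with p = 4 and q = 5; any percentages it returns are output of this sketch, not the results reported in the article.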
The seven tests were compared with regard to (a) their ability to correctly identify the number of nonzero canonical correlations in the population, (b) the frequency with which they underestimate this number, and (c) the frequency with which they overestimate it. In addition, for comparison, the Akaike Information Criterion (AIC) has been included in the study, in the form adapted for canonical correlation analysis by Fujikoshi and Veitch (1979).

Table 1
The values of canonical correlations in the Σ matrices used in the simulation study

Population  p = 4                                 p = 6
0           ρ₁ = ρ₂ = ρ₃ = ρ₄ = 0                 ρ₁ = ρ₂ = ρ₃ = ρ₄ = ρ₅ = ρ₆ = 0
1           ρ₁ = 0.8, ρ₂ = ρ₃ = ρ₄ = 0            ρ₁ = 0.8, ρ₂ = ρ₃ = ρ₄ = ρ₅ = ρ₆ = 0
2           ρ₁ = 0.7, ρ₂ = ρ₃ = ρ₄ = 0            ρ₁ = 0.7, ρ₂ = ρ₃ = ρ₄ = ρ₅ = ρ₆ = 0
3           ρ₁ = 0.8, ρ₂ = 0.7, ρ₃ = ρ₄ = 0       ρ₁ = 0.8, ρ₂ = 0.7, ρ₃ = ρ₄ = ρ₅ = ρ₆ = 0
4           ρ₁ = 0.7, ρ₂ = 0.5, ρ₃ = ρ₄ = 0       ρ₁ = 0.7, ρ₂ = 0.5, ρ₃ = ρ₄ = ρ₅ = ρ₆ = 0
5           ρ₁ = 0.8, ρ₂ = 0.7, ρ₃ = 0.5, ρ₄ = 0  ρ₁ = 0.8, ρ₂ = 0.7, ρ₃ = 0.5, ρ₄ = ρ₅ = ρ₆ = 0
6           ρ₁ = 0.7, ρ₂ = 0.5, ρ₃ = 0.4, ρ₄ = 0  ρ₁ = 0.7, ρ₂ = 0.5, ρ₃ = 0.4, ρ₄ = ρ₅ = ρ₆ = 0

5. Results of a Simulation Study

Results of the simulation study are given in detail in Tables A.1–A.12 (see the Appendix) and in summarized form in Tables 2 and 3. The following conclusions can be drawn from them.

In the case of generated samples of size N = 25, the mean percentages of correct choices of the actual dimension d are comparatively low for all seven investigated testing procedures. When the nonzero population canonical correlations are all above 0.5, the highest percentage of correct choices obtained is 47.6% at α = 0.01, 58.6% at α = 0.05, and 63.3% at α = 0.10. The best performance in this respect is shown by the testing procedures $S_B(T_d^2)$ and $S_F(T_d^2)$. Very poor performance is shown by the procedures based on the statistic $V_d$, i.e., $S_B(V_d)$ and $S_F(V_d)$. If at least one of the nonzero population canonical correlations is equal to 0.5 or less, the percentages of correct choices are small for all the procedures, the highest being for $S_F(T_d^2)$: 4.6% at α = 0.01, 11.4% at α = 0.05, and 16.4% at α = 0.10. Under this condition, all the compared procedures show very high underestimation of the actual d.

In the case of samples of size N = 50, the mean percentage of correct choices of the actual d is quite high for all the compared testing procedures, provided that the nonzero population canonical correlations are all above 0.5. The best performance is shown by the three procedures based on the statistic $T_d^2$, i.e., $T_d^2$, $S_B(T_d^2)$, and $S_F(T_d^2)$, with small differences between them. With regard to possible overestimation of d, note that $S_F(T_d^2)$ may overestimate the actual dimension more frequently than the nominal significance level α allows. From this point of view, the best performance among the first three procedures is shown by the closed testing procedure $T_d^2$. Again, if at least one of the nonzero population canonical correlations is equal to 0.5 or less, the frequency of correct choices is not satisfactory for any of the compared procedures. In this case, the best performance is shown by $S_F(T_d^2)$, with mean percentages of correct choices of 22.5% at α = 0.01, 37.5% at α = 0.05, and 44.5% at α = 0.10.

For samples of size N = 75, the mean percentage of correct choices of the actual d is very high for all the compared testing procedures, provided that none of the nonzero population canonical correlations is equal to or smaller than 0.5. Under this condition, each procedure performs satisfactorily, with about 90% or more correct decisions, and the differences between them are negligible. Note, however, that $S_F(T_d^2)$ overestimates d slightly more often than the nominal significance level α allows. As before, the mean percentage of correct choices of d is much lower for every procedure if at least one of the nonzero population canonical correlations is 0.5 or smaller. Under this condition, the best performance is shown by $S_F(T_d^2)$, with mean percentages of correct choices of 49.8% at α = 0.01, 65.6% at α = 0.05, and 70.1% at α = 0.10.

When considering the means over all the generated samples, it can be noted that, when all nonzero population canonical correlations are above 0.5, comparatively successful performance is shown by the first three testing procedures, $T_d^2$, $S_B(T_d^2)$, and $S_F(T_d^2)$, i.e., the procedures based on the statistic $T_d^2$, with small differences between them, particularly when testing at α = 0.10. Under the other condition, when at least one of the nonzero population canonical correlations is 0.5 or less, the frequencies of correct decisions are much smaller, with the best performance shown by $S_F(T_d^2)$.

Table 2
Mean percentages of correct and incorrect dimensionality choices obtained from all simulations

Testing procedures applied at the significance level α (column 8h, the AIC, is listed once, in the last column)

                          α = 0.01                              α = 0.05                              α = 0.10
N       Cond. Choice      1a    2b    3c    4d    5e    6f    7g    1a    2b    3c    4d    5e    6f    7g    1a    2b    3c    4d    5e    6f    7g    8h
25      i     under     76.3  53.5  50.5  77.1  76.8  98.0  98.3  53.9  39.5  35.9  53.4  52.8  83.9  85.4  40.4  32.0  28.3  39.2  38.6  66.6  68.7   5.5
              correct   23.5  45.7  47.6  22.6  22.8   1.9   1.6  44.2  57.7  58.6  44.1  44.2  14.6  13.7  55.0  63.3  62.6  55.0  54.7  28.4  27.9  58.2
              over       0.2   0.9   2.0   0.3   0.4   0.1   0.0   1.9   2.8   5.5   2.5   3.0   1.5   0.9   4.7   4.8   9.1   5.8   6.7   5.0   3.4  36.3
        j     under     99.1  98.0  95.3  98.8  98.5  99.6  99.8  94.6  93.8  87.7  93.4  92.0  94.8  96.5  88.6  89.7  81.1  86.8  84.5  86.8  90.3  52.6
              correct    0.8   2.0   4.6   1.2   1.5   0.4   0.2   5.1   6.0  11.4   6.3   7.3   4.7   3.2  10.4   9.8  16.4  12.0  13.4  11.0   8.2  37.6
              over       0.0   0.0   0.2   0.0   0.0   0.0   0.0   0.3   0.2   1.0   0.4   0.6   0.6   0.3   1.0   0.5   2.4   1.3   2.1   2.2   1.6   9.8
50      i     under     14.1   9.1   8.3  19.6  19.2  50.3  51.2   5.1   3.7   3.3   7.2   7.0  19.6  20.2   2.6   2.1   1.8   3.6   3.5   9.5   9.9   0.3
              correct   85.2  89.7  89.9  79.7  80.0  49.4  48.5  91.0  91.8  90.5  88.9  88.5  77.2  76.9  89.3  89.6  87.0  88.0  87.2  82.5  82.8  76.0
              over       0.7   1.1   1.8   0.7   0.8   0.4   0.3   3.8   4.5   6.2   4.0   4.5   3.2   2.9   8.1   8.3  11.2   8.4   9.3   8.0   7.3  23.7
        j     under     77.8  74.5  69.6  78.7  77.1  83.2  83.8  60.2  58.9  52.3  61.1  58.2  64.7  65.7  48.6  49.1  42.2  49.3  46.1  51.0  52.0  28.5
              correct   14.6  17.9  22.5  13.6  15.2   9.2   8.5  31.0  32.4  37.5  30.1  32.1  26.3  25.2  40.5  40.6  44.5  39.6  41.2  37.3  36.0  54.1
              over       0.1   0.2   0.5   0.1   0.3   0.1   0.1   1.3   1.2   2.7   1.4   2.2   1.5   1.6   3.5   2.8   5.8   3.6   5.2   4.3   4.6   9.9
75      i     under      1.2   0.8   0.7   2.2   2.2   9.0   9.2   0.3   0.2   0.2   0.5   0.4   1.7   1.7   0.1   0.1   0.1   0.2   0.2   0.6   0.6   0.0
              correct   98.0  98.1  97.7  96.9  96.9  90.4  90.2  95.4  95.0  93.9  95.2  94.8  94.4  94.6  90.9  90.8  88.9  90.8  90.0  90.6  91.0  79.9
              over       0.8   1.1   1.6   0.8   1.0   0.6   0.5   4.3   4.8   6.0   4.4   4.8   3.9   3.7   9.0   9.1  11.0   9.0   9.8   8.8   8.4  20.1
        j     under     57.6  54.4  49.4  59.8  57.4  66.9  67.3  35.8  35.1  30.3  37.2  34.6  40.5  40.6  25.5  25.9  21.6  26.4  24.1  27.8  27.7  13.4
              correct   42.1  45.2  49.8  39.8  42.1  32.8  32.3  61.8  62.6  65.6  60.3  61.8  56.8  56.4  68.7  69.0  70.1  67.8  68.2  65.8  65.1  74.8
              over       0.3   0.4   0.8   0.3   0.6   0.3   0.3   2.5   2.3   4.1   2.5   3.5   2.7   3.0   5.7   5.1   8.4   5.8   7.7   6.4   7.2  11.7
Overall i     under     30.6  21.1  19.8  33.0  32.7  52.4  52.9  19.8  14.5  13.1  20.3  20.1  35.0  35.8  14.4  11.4  10.1  14.3  14.1  25.6  26.4   1.9
              correct   68.9  77.8  78.4  66.4  66.5  47.2  46.8  76.9  81.5  81.0  76.0  75.8  62.1  61.7  78.4  81.2  79.5  78.0  77.3  67.2  67.2  71.4
              over       0.6   1.1   1.8   0.6   0.7   0.3   0.3   3.3   4.0   5.9   3.6   4.1   2.9   2.5   7.3   7.4  10.4   7.7   8.6   7.3   6.4  26.7
        j     under     78.2  75.6  71.4  79.1  77.6  83.2  83.7  63.5  62.6  56.8  63.9  61.6  66.6  67.6  54.2  54.9  48.3  54.2  51.6  55.2  56.7  31.5
              correct   19.2  21.7  25.6  18.2  19.6  14.1  13.7  32.6  33.7  38.1  32.2  33.8  29.3  28.2  39.9  39.8  43.7  39.8  40.9  38.0  36.4  55.5
              over       0.2   0.2   0.5   0.2   0.3   0.1   0.2   1.4   1.2   2.6   1.4   2.1   1.6   1.7   3.4   2.8   5.5   3.5   5.0   4.3   4.4  10.5

Procedures: a $T_d^2$, b $S_B(T_d^2)$, c $S_F(T_d^2)$, d $S_B(\Lambda_d)$, e $S_L(\Lambda_d)$, f $S_B(V_d)$, g $S_F(V_d)$, h AIC.
i Populations in which the values of the canonical correlations are all above 0.5.
j Populations in which at least one of the nonzero canonical correlations has its true value equal to 0.5 or less.

Table 3
The ratios of the frequency results of the procedure $S_F(T_d^2)$ to those of the procedure $T_d^2$ (the first), with regard to the correct choice of d and to the overestimation of d

                 All canonical          At least one nonzero
                 correlations           canonical correlation
                 above 0.5              equal to 0.5 or less
N      α          a        b             a        b
25     0.01     2.03     9.28          5.40    12.40
       0.05     1.32     2.97          2.23     3.50
       0.10     1.14     1.95          1.58     2.43
50     0.01     1.06     2.66          1.54     3.47
       0.05     0.99     1.62          1.21     2.09
       0.10     0.97     1.37          1.10     1.68
75     0.01     1.00     1.91          1.18     2.50
       0.05     0.98     1.37          1.06     1.65
       0.10     0.98     1.23          1.02     1.46
All    0.01     1.14     3.12          1.34     3.04
       0.05     1.05     1.76          1.17     1.92
       0.10     1.01     1.44          1.10     1.63

a Ratio of $S_F(T_d^2)$ to $T_d^2$ with regard to the frequencies of the correct choice of d.
b Ratio of $S_F(T_d^2)$ to $T_d^2$ with regard to the frequencies of the overestimation of d.

Results obtained for the added AIC are shown in the last column of Table 2. If at least one of the nonzero population canonical correlations is equal to 0.5 or less, this criterion may perform better than the investigated testing procedures, but it always leads to a much higher frequency of overestimation of the actual d. Thus, if this type of deviation is to be kept under reasonable control, the criterion cannot be recommended.

To summarize, for small samples the performance of any of the seven testing procedures is not satisfactory, particularly when small canonical correlations appear in the population. For medium-sized samples, the performance of the compared procedures is much better. The best performance in terms of correct decisions on d can be expected from the first three procedures, those based on the Lawley–Hotelling statistic $T_d^2$. Among them, the most accurate in identifying the correct number of nonzero canonical correlations when some of them are small is the third procedure, $S_F(T_d^2)$, based on the Fujikoshi approximation of the distribution of $T_d^2$. For large samples, all the compared testing procedures perform satisfactorily, with only small differences among them, slightly in favor of the first three procedures; the differences are larger when small population canonical correlations are present, and then $S_F(T_d^2)$ is again the most successful. It tends, however, to overestimate the actual d more frequently than expected from the nominal significance level α, i.e., it does not always control the familywise Type I error rate at the designated level α. From this point of view, the most rigorous is the proposed closed testing procedure based on the statistic $T_d^2$, i.e., the first procedure. It performs particularly well when applied at a significance level not smaller than α = 0.05 and when the analyzed sample is not very small. In practice, it may be advisable to apply both of these testing procedures, $T_d^2$ and $S_F(T_d^2)$, simultaneously (see Table 3). In most cases, the first three procedures, $T_d^2$, $S_B(T_d^2)$, and $S_F(T_d^2)$, perform better than the others, although for large samples their dominance is less evident. Finally, in view of the simulation study reported here, there is no reason for commonly used statistical software packages, such as BMD, STATISTICA, or STATGRAPHICS, to favor only the procedures based on the Wilks statistic $\Lambda_d$.

References

Anderson, T. W. (1984). An Introduction to Multivariate Statistical Analysis. 2nd ed. New York: Wiley.
Bartlett, M. S. (1938). Further aspects of the theory of multiple regression. Proc. Camb. Phil. Soc. 34:33–40.
Bartlett, M. S. (1947). Multivariate analysis (with discussion). J. Roy. Statist. Soc. Suppl. 9(B):176–197.
Caliński, T., Lejeune, M. (1998). Dimensionality in MANOVA tested by a closed testing procedure. J. Multivariate Anal. 65:181–194.
Caliński, T., Krzyśko, M. (2005). A closed testing procedure for canonical correlations. Commun. Statist.-Theor. Meth. 34:1105–1116.
Fujikoshi, Y. (1977). Asymptotic expansion for the distributions of some multivariate tests. In: Krishnaiah, P. R., ed. Multivariate Analysis. Vol. IV. Amsterdam: North-Holland, pp. 55–71.
Fujikoshi, Y., Veitch, L. G. (1979). Estimation of dimensionality in canonical correlation analysis. Biometrika 66:345–351.
Hotelling, H. (1936). Relations between two sets of variates. Biometrika 28:321–377.
Kshirsagar, A. M. (1972). Multivariate Analysis. New York: Marcel Dekker.
Lawley, D. N. (1959). Tests of significance in canonical analysis. Biometrika 46:59–66.
Marcus, R., Peritz, E., Gabriel, K. R. (1976). On closed testing procedures with special reference to ordered analysis of variance. Biometrika 63:655–660.
Mendoza, J. L., Markos, V. H., Gonter, R. (1978). A new perspective on sequential testing procedures in canonical analysis: A Monte Carlo evaluation. Multivariate Behav. Res. 13:371–382.
Muirhead, R. J. (1982). Aspects of Multivariate Statistical Theory. New York: Wiley.
Schott, J. R. (1984). Optimal bounds for the distributions of some test criteria for tests of dimensionality. Biometrika 71:561–567.
Seber, G. A. F. (1984). Multivariate Observations. New York: Wiley.

Appendix

Table A.1
Numbers of dimensionality choices among 10,000 simulations for the case of N = 25, p = 4, q = 5

Testing procedures applied at the significance level α (column 8h, the AIC, is listed once, in the last column)

                         α = 0.01                              α = 0.05                              α = 0.10
Cond. of d Estim.      1a    2b    3c    4d    5e    6f    7g    1a    2b    3c    4d    5e    6f    7g    1a    2b    3c    4d    5e    6f    7g    8h
0          correct   9906  9578  9578  9880  9880  9994  9994  9504  9017  9017  9464  9464  9812  9812  9010  8530  8530  8920  8920  9446  9446  6151
           over        94   422   422   120   120     6     6   496   983   983   536   536   188   188   990  1470  1470  1080  1080   554   554  3849
1          under     3504  1587  1587  4271  4271  9327  9327  1413   791   791  1791  1791  6072  6072   786   500   500   948   948  3657  3657    87
           correct   6436  8260  8171  5666  5661   656   665  8216  8738  8511  7812  7777  3702  3757  8429  8697  8343  8249  8181  5656  5791  6987
           over        60   153   242    63    68    17     8   371   471   698   397   432   226   171   785   803  1157   803   871   687   552  2926
2          under     6957  4675  4675  7226  7226  9687  9687  4376  2944  2944  4557  4557  7694  7694  2926  2159  2159  3118  3118  5593  5593   629
           correct   3015  5231  5155  2745  2737   307   309  5367  6727  6528  5170  5115  2134  2173  6520  7262  6936  6262  6179  3863  3946  6776
           over        28    94   170    29    37     6     4   257   329   528   273   328   172   133   554   579   905   620   703   544   461  2595
3          under     7345  5975  5129  7604  7442  9386  9574  4393  3888  3105  4676  4472  6480  7043  2886  2829  2187  3056  2888  4128  4673   734
           correct   2632  3983  4752  2366  2512   599   418  5386  5916  6451  5092  5214  3281  2795  6564  6766  7000  6386  6407  5190  4785  7237
           over        23    42   119    30    46    15     8   221   196   444   232   314   239   162   550   405   813   558   705   682   542  2029
4          under     9698  9352  9018  9668  9616  9912  9932  8675  8351  7676  8584  8409  9034  9227  7544  7488  6648  7447  7188  7773  8050  4116
           correct    301   645   950   329   377    85    66  1257  1592  2143  1341  1474   883   705  2244  2369  2906  2314  2438  1900  1669  4749
           over         1     3    32     3     7     3     2    68    57   181    75   117    83    68   212   143   446   239   374   327   281  1135
5          under     9595  9473  8817  9588  9472  9778  9882  8127  8307  7122  8074  7723  8129  8552  6749  7286  5902  6717  6294  6437  6895  3845
           correct    398   523  1151   405   513   214   115  1775  1634  2658  1824  2085  1706  1300  2917  2542  3540  2939  3160  3002  2584  5149
           over         7     4    32     7    15     8     3    98    59   220   102   192   165   148   334   172   558   344   546   561   521  1006
6          under     9978  9966  9853  9971  9943  9983  9989  9689  9746  9252  9659  9463  9600  9675  9153  9377  8654  9115  8801  8896  9027  7320
           correct     22    34   144    29    57    17    11   293   246   674   323   473   354   281   741   579  1129   770   968   893   749  2273
           over         0     0     3     0     0     0     0    18     8    74    18    64    46    44   106    44   217   115   231   211   224   407

Cond. of d: population number in Table 1. Procedures: a $T_d^2$, b $S_B(T_d^2)$, c $S_F(T_d^2)$, d $S_B(\Lambda_d)$, e $S_L(\Lambda_d)$, f $S_B(V_d)$, g $S_F(V_d)$, h AIC.

Table A.2
Numbers of dimensionality choices among 10,000 simulations for the case of N = 25, p = 4, q = 8

Testing procedures applied at the significance level α

                         α = 0.01                              α = 0.05                              α = 0.10
Cond. of d Estim.      1a    2b    3c    4d    5e    6f    7g    1a    2b    3c    4d    5e    6f    7g    1a    2b    3c    4d    5e    6f    7g    8h
0          correct   9907  9385  9385  9879  9879  9999  9999  9530  8818  8818  9380  9380  9903  9903  9041  8352  8352  8800  8800  9605  9605  4716
           over        93   615   615   121   121     1     1   470  1182  1182   620   620    97    97   959  1648  1648  1200  1200   395   395  5284
1          under     5978  2843  2843  6336  6336  9888  9888  3259  1709  1709  3513  3513  8461  8461  2050  1230  1230  2183  2183  6416  6416   131
           correct   3986  7003  6860  3613  3608   109   109  6489  7839  7544  6143  6094  1395  1454  7320  8052  7621  7057  6975  3070  3246  6031
           over        36   154   297    51    56     3     3   252   452   747   344   393   144    85   630   718  1149   760   842   514   338  3838
2          under     8427  5721  5721  8361  8361  9953  9953  6196  4210  4210  6042  6042  9126  9126  4714  3351  3351  4496  4496  7682  7682   753
           correct   1559  4215  4129  1617  1614    44    47  3680  5545  5305  3763  3729   798   825  4926  6204  5811  4989  4923  1958  2065  5974
           over        14    64   150    22    25     3     0   124   245   485   195   229    76    49   360   445   838   515   581   360   253  3273
3          under     9107  7695  6746  8959  8878  9868  9940  6946  6028  4930  6740  6561  8537  8999  5317  5003  3885  5184  4990  6691  7396  1200
           correct    885  2282  3143  1023  1099   127    60  2934  3856  4657  3101  3223  1322   932  4313  4743  5355  4371  4432  2803  2310  6623
           over         8    23   111    18    23     5     0   120   116   413   159   216   141    69   370   254   760   445   578   506   294  2177
4          under     9906  9620  9293  9838  9814  9978  9986  9375  9025  8420  9179  9060  9595  9740  8667  8479  7675  8336  8158  8824  9147  4395
           correct     91   374   682   156   177    19    14   598   949  1455   781   879   365   236  1232  1457  2056  1524  1639   995   748  4523
           over         3     6    25     6     9     3     0    27    26   125    40    61    40    24   101    64   269   140   203   181   105  1082
5          under     9927  9824  9400  9869  9823  9967  9987  9324  9357  8312  9184  8969  9308  9622  8450  8821  7446  8250  7931  8192  8784  5082
           correct     71   174   582   129   172    33    13   627   629  1547   763   948   631   359  1392  1113  2187  1575  1763  1495  1044  4346
           over         2     2    18     2     5     0     0    49    14   141    53    83    61    19   158    66   367   175   306   313   172   572
6          under    10000  9993  9923  9990  9983  9999 10000  9918  9919  9648  9876  9819  9879  9942  9713  9814  9287  9620  9437  9510  9718  8022
           correct      0     7    75    10    17     1     0    77    80   316   119   161   113    56   253   177   604   338   478   412   232  1817
           over         0     0     2     0     0     0     0     5     1    36     5    20     8     2    34     9   109    42    85    78    50   161

For explanations see Table A.1.

Table A.3 Numbers of dimensionality choices among 10,000 simulations for the case of N = 25, p = 6, q = 6

Testing procedures applied at the significance level α = 0.01, α = 0.05 and α = 0.1

Cond. of d, Estim. | α = 0.01: 1a 2b 3c 4d 5e 6f 7g | α = 0.05: 1a 2b 3c 4d 5e 6f 7g | α = 0.1: 1a 2b 3c 4d 5e 6f 7g | 8h

0 correct 9888 9333 9333 9865 9865 9994 9994 | 9484 8747 8747 9377 9377 9893 9893 | 9018 8288 8288 8811 8811 9618 9618 | 4068

0 over 112 667 667 135 135 6 6 | 516 1253 1253 623 623 107 107 | 982 1712 1712 1189 1189 382 382 | 5932

1 under 6172 3010 3010 6665 6665 9863 9863 | 3502 1856 1856 3863 3863 8526 8526 | 2258 1335 1335 2445 2445 6712 6712 | 103

1 correct 3798 6826 6695 3280 3270 130 133 | 6240 7710 7366 5792 5754 1296 1371 | 7133 7953 7486 6773 6694 2713 2899 | 5330

1 over 30 164 295 55 65 7 4 | 258 434 778 345 383 178 103 | 609 712 1179 782 861 575 389 | 4567

2 under 8472 5783 5783 8385 8385 9925 9925 | 6331 4319 4319 6144 6144 9082 9082 | 4877 3492 3492 4698 4698 7695 7695 | 589

2 correct 1514 4127 4000 1582 1575 73 75 | 3505 5378 5111 3601 3565 774 835 | 4691 5992 5618 4737 4660 1865 2000 | 5362

2 over 14 90 217 33 40 2 0 | 164 303 570 255 291 144 83 | 432 516 890 565 642 440 305 | 4049

3 under 9186 7942 7038 9094 9026 9856 9932 | 7293 6415 5192 7186 6996 8652 9100 | 5678 5350 4199 5561 5360 7000 7677 | 966

3 correct 804 2038 2851 890 954 137 65 | 2587 3482 4446 2657 2785 1185 831 | 3993 4432 5118 4067 4153 2492 2038 | 6005

3 over 10 20 111 16 20 7 3 | 120 103 362 157 219 163 69 | 329 218 683 372 487 508 285 | 3029

4 under 9941 9705 9400 9889 9867 9980 9990 | 9493 9164 8564 9274 9182 9630 9768 | 8864 8684 7857 8510 8340 8822 9160 | 3901

4 correct 58 294 583 110 130 20 10 | 490 825 1336 686 760 320 209 | 1065 1271 1911 1370 1484 984 717 | 4503

4 over 1 1 17 1 3 0 0 | 17 11 100 40 58 50 23 | 71 45 232 120 176 194 123 | 1596

5 under 9953 9890 9512 9901 9872 9944 9986 | 9481 9537 8619 9352 9174 9378 9653 | 8770 9144 7849 8648 8351 8404 9011 | 4440

5 correct 46 109 466 96 123 54 14 | 488 453 1272 614 761 543 315 | 1143 823 1876 1235 1417 1311 825 | 4321

5 over 1 1 22 3 5 2 0 | 31 10 109 34 65 79 32 | 87 33 275 117 232 285 164 | 1239

6 under 10000 9994 9949 9996 9991 9998 10000 | 9949 9960 9726 9896 9845 9873 9936 | 9784 9878 9438 9690 9552 9536 9734 | 7499

6 correct 0 6 50 4 9 2 0 | 50 40 259 101 144 110 59 | 208 121 493 287 396 395 224 | 2092

6 over 0 0 1 0 0 0 0 | 1 0 15 3 11 17 5 | 8 1 69 23 52 69 42 | 409

For explanations see Table A.1.


Table A.4 Numbers of dimensionality choices among 10,000 simulations for the case of N = 25, p = 6, q = 8

Testing procedures applied at the significance level α = 0.01, α = 0.05 and α = 0.1

Cond. of d, Estim. | α = 0.01: 1a 2b 3c 4d 5e 6f 7g | α = 0.05: 1a 2b 3c 4d 5e 6f 7g | α = 0.1: 1a 2b 3c 4d 5e 6f 7g | 8h

0 correct 9892 9158 9158 9818 9818 10000 10000 | 9513 8550 8550 9222 9222 9926 9926 | 9055 8085 8085 8605 8605 9660 9660 | 2531

0 over 108 842 842 182 182 0 0 | 487 1450 1450 778 778 74 74 | 945 1915 1915 1395 1395 340 340 | 7469

1 under 7636 3801 3801 7443 7443 9968 9968 | 5038 2593 2593 4786 4786 9256 9256 | 3547 2005 2005 3298 3298 7846 7846 | 102

1 correct 2338 6053 5853 2507 2498 30 32 | 4783 7011 6621 4857 4822 626 693 | 5980 7390 6822 6002 5935 1712 1887 | 3985

1 over 26 146 346 50 59 2 0 | 179 396 786 357 392 118 51 | 473 605 1173 700 767 442 267 | 5913

2 under 9069 6373 6373 8713 8713 9976 9976 | 7388 5034 5034 6725 6725 9510 9510 | 6105 4266 4266 5348 5348 8471 8471 | 410

2 correct 928 3548 3411 1256 1251 24 24 | 2504 4724 4460 3051 3015 400 448 | 3598 5350 4939 4116 4047 1184 1324 | 4290

2 over 3 79 216 31 36 0 0 | 108 242 506 224 260 90 42 | 297 384 795 536 605 345 205 | 5300

3 under 9733 8738 7839 9468 9421 9951 9977 | 8561 7625 6361 8001 7864 9262 9589 | 7291 6832 5365 6658 6507 7969 8646 | 933

3 correct 265 1254 2078 525 565 47 23 | 1382 2326 3320 1889 1981 617 377 | 2511 3065 4058 3030 3095 1619 1191 | 5205

3 over 2 8 83 7 14 2 0 | 57 49 319 110 155 121 34 | 198 103 577 312 398 412 163 | 3862

4 under 9981 9788 9507 9914 9902 9986 9996 | 9731 9435 8904 9437 9381 9769 9885 | 9317 9130 8379 8871 8747 9225 9523 | 3211

4 correct 19 211 469 84 95 14 4 | 258 554 1017 529 570 192 98 | 634 843 1459 1040 1117 615 411 | 4656

4 over 0 1 24 2 3 0 0 | 11 11 79 34 49 39 17 | 49 27 162 89 136 160 66 | 2133

5 under 9989 9961 9677 9950 9924 9981 9996 | 9790 9808 9103 9596 9478 9602 9848 | 9412 9622 8590 9092 8880 8838 9439 | 4212

5 correct 11 39 313 50 74 18 4 | 202 191 820 388 478 334 140 | 541 370 1217 824 968 931 466 | 4271

5 over 0 0 10 0 2 1 0 | 8 1 77 16 44 64 12 | 47 8 193 84 152 231 95 | 1517

6 under 9998 9997 9967 9996 9995 9999 10000 | 9984 9986 9835 9946 9918 9921 9975 | 9913 9961 9653 9812 9721 9658 9861 | 7027

6 correct 2 3 33 4 5 1 0 | 16 14 156 54 78 68 25 | 84 39 308 174 243 283 118 | 2413

6 over 0 0 0 0 0 0 0 | 0 0 9 0 4 11 0 | 3 0 39 14 36 59 21 | 560

For explanations see Table A.1.


Table A.5 Numbers of dimensionality choices among 10,000 simulations for the case of N = 50, p = 4, q = 5

Testing procedures applied at the significance level α = 0.01, α = 0.05 and α = 0.1

Cond. of d, Estim. | α = 0.01: 1a 2b 3c 4d 5e 6f 7g | α = 0.05: 1a 2b 3c 4d 5e 6f 7g | α = 0.1: 1a 2b 3c 4d 5e 6f 7g | 8h

0 correct 9892 9795 9795 9897 9897 9951 9951 | 9502 9318 9318 9504 9504 9672 9672 | 9025 8836 8836 9002 9002 9223 9223 | 7533

0 over 108 205 205 103 103 49 49 | 498 682 682 496 496 328 328 | 975 1164 1164 998 998 777 777 | 2467

1 under 34 15 15 69 69 682 682 | 5 4 4 9 9 48 48 | 4 3 3 4 4 11 11 | 0

1 correct 9854 9818 9778 9817 9809 9250 9252 | 9500 9438 9320 9487 9447 9523 9558 | 9022 9005 8826 9014 8960 9040 9107 | 7828

1 over 112 167 207 114 122 68 66 | 495 558 676 504 544 429 394 | 974 992 1171 982 1036 949 882 | 2172

2 under 937 597 597 1264 1264 3614 3614 | 280 201 201 370 370 910 910 | 122 99 99 167 167 386 386 | 17

2 correct 8976 9267 9211 8649 8634 6336 6338 | 9290 9311 9181 9194 9136 8712 8723 | 8987 8978 8790 8933 8851 8761 8780 | 7912

2 over 87 136 192 87 102 50 48 | 430 488 618 436 494 378 367 | 891 923 1111 900 982 853 834 | 2071

3 under 669 473 392 831 786 1719 1859 | 161 146 107 206 197 323 353 | 67 64 54 79 74 116 126 | 12

3 correct 9269 9450 9470 9108 9137 8234 8101 | 9468 9499 9376 9424 9366 9291 9296 | 9119 9205 8926 9107 8999 8991 9047 | 8154

3 over 62 77 138 61 77 47 40 | 371 355 517 370 437 386 351 | 814 731 1020 814 927 893 827 | 1834

4 under 6710 6101 5627 6951 6727 7970 8038 | 4050 3851 3421 4181 3998 4778 4852 | 2815 2763 2411 2905 2739 3167 3232 | 1315

4 correct 3262 3867 4305 3022 3229 2011 1941 | 5718 5931 6164 5585 5654 4978 4872 | 6604 6721 6736 6509 6482 6179 6057 | 7276

4 over 28 32 68 27 44 19 21 | 232 218 415 234 348 244 276 | 581 516 853 586 779 654 711 | 1409

5 under 5273 5047 4302 5405 5091 5970 6168 | 2693 2757 2258 2754 2540 2807 2913 | 1667 1797 1421 1711 1543 1648 1717 | 788

5 correct 4690 4918 5609 4558 4840 3992 3789 | 7020 6990 7320 6959 7061 6851 6722 | 7626 7628 7669 7582 7561 7516 7404 | 7957

5 over 37 35 89 37 69 38 43 | 287 253 422 287 399 342 365 | 707 575 910 707 896 836 879 | 1255

6 under 9120 9023 8514 9142 8911 9313 9292 | 7223 7290 6402 7249 6750 7273 7171 | 5739 5949 5030 5752 5216 5612 5460 | 3895

6 correct 874 971 1450 852 1066 681 697 | 2631 2593 3317 2604 2988 2543 2586 | 3852 3725 4338 3834 4163 3899 3929 | 5355

6 over 6 6 36 6 23 6 11 | 146 117 281 147 262 184 243 | 409 326 632 414 621 489 611 | 750

For explanations see Table A.1.
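Because every entry counts outcomes among 10,000 simulations, each column of a row converts directly into relative frequencies of underestimation, correct estimation and overestimation, and the three counts in a column must add up to 10,000. The Python sketch below is an illustration only, not part of the original study: it encodes the row labelled 1 under "Cond. of d" in Table A.5, checks that total for every column, and reports the column with the highest relative frequency of correct estimation. The column labels such as "1a (α = 0.01)" and the function name `relative_frequencies` are shorthand introduced here.

```python
# Row "Cond. of d = 1" of Table A.5 (N = 50, p = 4, q = 5), copied from the table.
# Column order: procedures 1a-7g at alpha = 0.01, then 0.05, then 0.1, then 8h.
PROCS = ["1a", "2b", "3c", "4d", "5e", "6f", "7g"]
COLUMNS = [f"{p} (α = {a})" for a in ("0.01", "0.05", "0.1") for p in PROCS] + ["8h"]

UNDER = [34, 15, 15, 69, 69, 682, 682, 5, 4, 4, 9, 9, 48, 48,
         4, 3, 3, 4, 4, 11, 11, 0]
CORRECT = [9854, 9818, 9778, 9817, 9809, 9250, 9252, 9500, 9438, 9320, 9487, 9447,
           9523, 9558, 9022, 9005, 8826, 9014, 8960, 9040, 9107, 7828]
OVER = [112, 167, 207, 114, 122, 68, 66, 495, 558, 676, 504, 544, 429, 394,
        974, 992, 1171, 982, 1036, 949, 882, 2172]

N_SIM = 10_000  # simulations per table entry, as stated in the caption


def relative_frequencies(under, correct, over, n_sim=N_SIM):
    """Return per-column (under, correct, over) relative frequencies.

    Raises ValueError if a column's three counts do not add up to n_sim,
    which would point to a transcription error.
    """
    rows = []
    for col, (u, c, o) in enumerate(zip(under, correct, over)):
        if u + c + o != n_sim:
            raise ValueError(f"column {COLUMNS[col]}: {u}+{c}+{o} != {n_sim}")
        rows.append((u / n_sim, c / n_sim, o / n_sim))
    return rows


freqs = relative_frequencies(UNDER, CORRECT, OVER)
# Index of the column with the largest relative frequency of correct estimation.
best = max(range(len(COLUMNS)), key=lambda i: freqs[i][1])
print(f"highest correct-estimation rate: {COLUMNS[best]} = {freqs[best][1]:.4f}")
```

The same function can be pointed at any other row of Tables A.2 to A.12; a failed total is a quick way to catch a mistyped count.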


Table A.6 Numbers of dimensionality choices among 10,000 simulations for the case of N = 50, p = 4, q = 8

Testing procedures applied at the significance level α = 0.01, α = 0.05 and α = 0.1

Cond. of d, Estim. | α = 0.01: 1a 2b 3c 4d 5e 6f 7g | α = 0.05: 1a 2b 3c 4d 5e 6f 7g | α = 0.1: 1a 2b 3c 4d 5e 6f 7g | 8h

0 correct 9906 9727 9727 9895 9895 9972 9972 | 9478 9177 9177 9460 9460 9698 9698 | 8964 8653 8653 8958 8958 9267 9267 | 7255

0 over 94 273 273 105 105 28 28 | 522 823 823 540 540 302 302 | 1036 1347 1347 1042 1042 733 733 | 2745

1 under 141 58 58 339 339 3556 3556 | 30 15 15 62 62 661 661 | 10 6 6 22 22 185 185 | 0

1 correct 9787 9795 9737 9589 9577 6419 6423 | 9550 9470 9303 9493 9458 9014 9049 | 9096 9027 8816 9065 9005 8991 9076 | 7781

1 over 72 147 205 72 84 25 21 | 420 515 682 445 480 325 290 | 894 967 1178 913 973 824 739 | 2219

2 under 2015 1161 1161 2746 2746 6598 6598 | 717 490 490 978 978 2800 2800 | 380 270 270 507 507 1366 1366 | 57

2 correct 7920 8715 8664 7191 7177 3375 3377 | 8918 9032 8892 8632 8584 6927 6950 | 8821 8871 8601 8661 8569 7905 7947 | 7891

2 over 65 124 175 63 77 27 25 | 365 478 618 390 438 273 250 | 799 859 1129 832 924 729 687 | 2052

3 under 1687 1196 970 2195 2094 4970 5314 | 537 440 345 690 636 1468 1627 | 251 230 187 329 298 625 680 | 52

3 correct 8264 8730 8878 7757 7841 4999 4658 | 9110 9197 9068 8943 8917 8193 8075 | 8949 9033 8733 8841 8752 8507 8530 | 8378

3 over 49 74 152 48 65 31 28 | 353 363 587 367 447 339 298 | 800 737 1080 830 950 868 790 | 1570

4 under 8067 7303 6813 8241 8055 9123 9191 | 5642 5241 4639 5827 5584 6738 6872 | 4153 3990 3516 4301 4081 4906 5069 | 2208

4 correct 1920 2669 3108 1743 1919 865 799 | 4172 4569 4992 3981 4151 3082 2941 | 5389 5604 5737 5226 5277 4596 4421 | 6736

4 over 13 28 79 16 26 12 10 | 186 190 369 192 265 180 187 | 458 406 747 473 642 498 510 | 1056

5 under, correct and over for each of the 22 columns, in the order printed; each triple sums to 1,000 instead of 10,000, so leading digits are missing: 708 292 0; 623 372 5; 754 246 0; 761 239 0; 738 262 0; 838 162 0; 857 143 0; 477 499 24; 470 509 21; 393 562 45; 490 486 24; 457 506 37; 537 437 26; 562 413 25; 339 601 60; 355 596 49; 281 635 84; 347 593 60; 319 601 80; 356 574 70; 374 555 71; 202 722 76

6 under 9722 9605 9265 9696 9592 9800 9806 | 8644 8623 7851 8654 8271 8744 8752 | 7529 7684 6701 7519 7025 7487 7483 | 5973

6 correct 271 388 718 297 398 196 190 | 1288 1325 1986 1278 1597 1183 1159 | 2270 2157 2887 2279 2601 2252 2205 | 3763

6 over 7 7 17 7 10 4 4 | 68 52 163 68 132 73 89 | 201 159 412 202 374 261 312 | 264

For explanations see Table A.1.


Table A.7 Numbers of dimensionality choices among 10,000 simulations for the case of N = 50, p = 6, q = 6

Testing procedures applied at the significance level α = 0.01, α = 0.05 and α = 0.1

Cond. of d, Estim. | α = 0.01: 1a 2b 3c 4d 5e 6f 7g | α = 0.05: 1a 2b 3c 4d 5e 6f 7g | α = 0.1: 1a 2b 3c 4d 5e 6f 7g | 8h

0 correct 9898 9747 9747 9898 9898 9967 9967 | 9503 9217 9217 9489 9489 9722 9722 | 8998 8673 8673 8947 8947 9266 9266 | 6770

0 over 102 253 253 102 102 33 33 | 497 783 783 511 511 278 278 | 1002 1327 1327 1053 1053 734 734 | 3230

1 under 167 71 71 455 455 4185 4185 | 31 13 13 79 79 940 940 | 10 6 6 20 20 293 293 | 0

1 correct 9749 9770 9712 9458 9449 5771 5778 | 9517 9426 9293 9474 9418 8718 8766 | 9083 9029 8772 9030 8976 8861 8945 | 7175

1 over 84 159 217 87 96 44 37 | 452 561 694 447 503 342 294 | 907 965 1222 950 1004 846 762 | 2825

2 under 2317 1402 1402 3193 3193 7046 7046 | 845 578 578 1232 1232 3390 3390 | 453 339 339 618 618 1754 1754 | 53

2 correct 7627 8479 8408 6749 6733 2930 2933 | 8788 8963 8798 8393 8345 6323 6339 | 8744 8796 8513 8563 8457 7540 7582 | 7320

2 over 56 119 190 58 74 24 21 | 367 459 624 375 423 287 271 | 803 865 1148 819 925 706 664 | 2627

3 under 2057 1524 1246 2779 2638 5758 6101 | 747 614 476 1000 943 2085 2293 | 383 350 271 500 466 963 1084 | 35

3 correct 7895 8413 8616 7180 7311 4215 3879 | 8903 9039 8957 8645 8623 7573 7428 | 8894 8997 8713 8769 8659 8250 8212 | 7673

3 over 48 63 138 41 51 27 20 | 350 347 567 355 434 342 279 | 723 653 1016 731 875 787 704 | 2292

4 under 8405 7710 7231 8583 8436 9319 9385 | 6121 5709 5119 6321 6067 7207 7352 | 4689 4536 3975 4867 4610 5438 5601 | 2033

4 correct 1580 2265 2708 1398 1528 666 600 | 3725 4137 4526 3511 3683 2619 2477 | 4905 5115 5337 4717 4794 4086 3911 | 6434

4 over 15 25 61 19 36 15 15 | 154 154 355 168 250 174 171 | 406 349 688 416 596 476 488 | 1533

5 under 7920 7571 6686 8126 7852 8769 8922 | 5386 5390 4424 5592 5227 6008 6317 | 3921 4116 3247 4065 3719 4197 4457 | 1633

5 correct 2057 2406 3249 1846 2106 1203 1053 | 4439 4466 5205 4223 4473 3764 3461 | 5629 5534 5999 5474 5618 5220 4951 | 6937

5 over 23 23 65 28 42 28 25 | 175 144 371 185 300 228 222 | 450 350 754 461 663 583 592 | 1430

6 under 9765 9690 9397 9760 9641 9827 9833 | 8914 8916 8213 8901 8575 8932 8950 | 7983 8168 7138 7985 7471 7876 7886 | 5457

6 correct 228 303 581 233 344 165 157 | 1029 1039 1626 1037 1297 989 951 | 1849 1715 2496 1835 2193 1883 1818 | 3898

6 over 7 7 22 7 15 8 10 | 57 45 161 62 128 79 99 | 168 117 366 180 336 241 296 | 645

For explanations see Table A.1.


Table A.8 Numbers of dimensionality choices among 10,000 simulations for the case of N = 50, p = 6, q = 8

Testing procedures applied at the significance level α = 0.01, α = 0.05 and α = 0.1

Cond. of d, Estim. | α = 0.01: 1a 2b 3c 4d 5e 6f 7g | α = 0.05: 1a 2b 3c 4d 5e 6f 7g | α = 0.1: 1a 2b 3c 4d 5e 6f 7g | 8h

0 correct 9914 9715 9715 9887 9887 9978 9978 | 9496 9124 9124 9477 9477 9742 9742 | 8976 8593 8593 8924 8924 9331 9331 | 6216

0 over 86 285 285 113 113 22 22 | 504 876 876 523 523 258 258 | 1024 1407 1407 1076 1076 669 669 | 3784

1 under 401 150 150 999 999 6350 6350 | 99 52 52 231 231 2346 2346 | 38 20 20 96 96 937 937 | 1

1 correct 9524 9714 9647 8921 8911 3618 3626 | 9502 9435 9259 9364 9326 7386 7422 | 9138 9053 8753 9039 8969 8322 8416 | 6804

1 over 75 136 203 80 90 32 24 | 399 513 689 405 443 268 232 | 824 927 1227 865 935 741 647 | 3195

2 under 3275 1956 1956 4375 4375 8329 8329 | 1393 910 910 1953 1953 4828 4828 | 765 549 549 1096 1096 2877 2877 | 72

2 correct 6674 7943 7863 5572 5559 1647 1649 | 8294 8662 8466 7706 7659 4941 4966 | 8531 8662 8373 8155 8069 6487 6543 | 6939

2 over 51 101 181 53 66 24 22 | 313 428 624 341 388 231 206 | 704 789 1078 749 835 636 580 | 2989

3 under 3256 2338 1923 4276 4108 7494 7820 | 1309 1019 744 1802 1688 3682 4018 | 634 564 413 889 809 1927 2191 | 65

3 correct 6692 7587 7912 5678 5819 2475 2162 | 8398 8684 8701 7893 7919 6021 5749 | 8720 8854 8583 8423 8374 7337 7195 | 7353

3 over 52 75 165 46 73 31 18 | 293 297 555 305 393 297 233 | 646 582 1004 688 817 736 614 | 2582

4 under 9020 8378 7877 9098 8978 9646 9690 | 7107 6601 5918 7226 6976 8067 8215 | 5692 5458 4768 5811 5539 6491 6677 | 2376

4 correct 971 1604 2071 888 1000 344 300 | 2804 3306 3804 2662 2842 1816 1685 | 4018 4298 4623 3866 3970 3132 2962 | 6046

4 over 9 18 52 14 22 10 10 | 89 93 278 112 182 117 100 | 290 244 609 323 491 377 361 | 1578

5 under 8713 8341 7512 8844 8607 9339 9478 | 6576 6548 5480 6769 6380 7179 7504 | 5141 5374 4251 5324 4917 5523 5864 | 2162

5 correct 1272 1642 2434 1139 1364 645 509 | 3310 3364 4225 3103 3407 2657 2349 | 4529 4405 5094 4329 4533 4000 3703 | 6443

5 over 15 17 54 17 29 16 13 | 114 88 295 128 213 164 147 | 330 221 655 347 550 477 433 | 1395

6 under 9910 9867 9628 9891 9835 9929 9943 | 9354 9338 8677 9297 9032 9318 9365 | 8629 8754 7848 8564 8168 8469 8534 | 6148

6 correct 89 132 361 108 162 70 56 | 618 646 1214 670 893 641 586 | 1278 1192 1867 1333 1590 1364 1270 | 3344

6 over 1 1 11 1 3 1 1 | 28 16 109 33 75 41 49 | 93 54 285 103 242 167 196 | 508

For explanations see Table A.1.


Table A.9 Numbers of dimensionality choices among 10,000 simulations for the case of N = 75, p = 4, q = 5

Testing procedures applied at the significance level α = 0.01, α = 0.05 and α = 0.1

Cond. of d, Estim. | α = 0.01: 1a 2b 3c 4d 5e 6f 7g | α = 0.05: 1a 2b 3c 4d 5e 6f 7g | α = 0.1: 1a 2b 3c 4d 5e 6f 7g | 8h

0 correct 9897 9828 9828 9889 9889 9938 9938 | 9510 9398 9398 9508 9508 9616 9616 | 9012 8864 8864 8998 8998 9139 9139 | 7887

0 over 103 172 172 111 111 62 62 | 490 602 602 492 492 384 384 | 988 1136 1136 1002 1002 861 861 | 2113

1 under 0 0 0 0 0 5 5 | 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 | 0

1 correct 9906 9870 9847 9901 9895 9921 9925 | 9525 9487 9415 9531 9514 9589 9601 | 9043 9028 8915 9038 9004 9053 9101 | 8117

1 over 94 130 153 99 105 74 70 | 475 513 585 469 486 411 399 | 957 972 1085 962 996 947 899 | 1883

2 under 34 28 28 65 65 267 267 | 8 6 6 13 13 25 25 | 2 2 2 5 5 8 8 | 1

2 correct 9904 9884 9854 9874 9866 9697 9697 | 9551 9496 9412 9536 9496 9566 9575 | 9076 9055 8912 9067 9002 9096 9103 | 8112

2 over 62 88 118 61 69 36 36 | 441 498 582 451 491 409 400 | 922 943 1086 928 993 896 889 | 1887

3 under 13 10 8 20 17 51 54 | 2 1 1 2 2 3 3 | 1 1 1 1 1 1 1 | 0

3 correct 9898 9886 9845 9887 9880 9870 9870 | 9559 9574 9471 9557 9520 9547 9559 | 9098 9158 8972 9092 9011 9034 9066 | 8324

3 over 89 104 147 93 103 79 76 | 439 425 528 441 478 450 438 | 901 841 1027 907 988 965 933 | 1676

4 under 3230 2887 2632 3445 3307 4328 4353 | 1323 1240 1103 1428 1343 1699 1718 | 718 706 605 756 717 852 853 | 263

4 correct 6713 7049 7261 6495 6615 5616 5587 | 8343 8436 8428 8239 8239 7962 7919 | 8579 8623 8498 8537 8432 8398 8336 | 8319

4 over 57 64 107 60 78 56 60 | 334 324 469 333 418 339 363 | 703 671 897 707 851 750 811 | 1418

5 under 1860 1755 1506 1914 1789 2153 2208 | 639 653 529 663 598 680 697 | 333 351 280 338 311 332 350 | 127

5 correct 8081 8190 8402 8027 8128 7787 7725 | 8987 9008 8986 8963 8941 8910 8865 | 8842 8898 8751 8837 8731 8764 8698 | 8554

5 over 59 55 92 59 83 60 67 | 374 339 485 374 461 410 438 | 825 751 969 825 958 904 952 | 1319

6 under 6754 6602 5921 6833 6410 7146 6996 | 4022 4065 3451 4068 3680 4093 3942 | 2709 2818 2304 2738 2402 2646 2517 | 1609

6 correct 3224 3376 4015 3145 3539 2831 2970 | 5722 5714 6145 5676 5930 5619 5684 | 6660 6616 6842 6629 6753 6652 6646 | 7357

6 over 22 22 64 22 51 23 34 | 256 221 404 256 390 288 374 | 631 566 854 633 845 702 837 | 1034

For explanations see Table A.1.
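The rows labelled 0 under "Cond. of d" contain no underestimation counts, which is consistent with a true dimensionality of zero. Assuming that reading, any overestimate means a sequential procedure rejected its first hypothesis, so the "over" counts in those rows can be compared with the nominal level α. The Python sketch below does this for Table A.9 under the further assumption that each count behaves as a Binomial(10,000, α) variable when a procedure holds its level; it reports the empirical rate and an approximate z-score for procedures 1a–7g. Column 8h is left out because the table header attaches no significance level to it, and the name `size_z_scores` is introduced here for illustration.

```python
import math

N_SIM = 10_000
PROCS = ["1a", "2b", "3c", "4d", "5e", "6f", "7g"]

# "over" counts for Cond. of d = 0 in Table A.9 (N = 75, p = 4, q = 5),
# seven procedures per significance level; column 8h omitted.
OVER_COND0 = {
    0.01: [103, 172, 172, 111, 111, 62, 62],
    0.05: [490, 602, 602, 492, 492, 384, 384],
    0.1: [988, 1136, 1136, 1002, 1002, 861, 861],
}


def size_z_scores(over_by_alpha, n_sim=N_SIM):
    """Compare empirical rejection rates with the nominal level alpha.

    If a procedure holds its level, each count is approximately
    Binomial(n_sim, alpha), so (rate - alpha) / sqrt(alpha * (1 - alpha) / n_sim)
    is approximately standard normal.
    """
    result = {}
    for alpha, counts in over_by_alpha.items():
        se = math.sqrt(alpha * (1 - alpha) / n_sim)  # Monte Carlo standard error
        for proc, count in zip(PROCS, counts):
            rate = count / n_sim
            result[(proc, alpha)] = (rate, (rate - alpha) / se)
    return result


for (proc, alpha), (rate, z) in size_z_scores(OVER_COND0).items():
    flag = "  <- |z| > 2" if abs(z) > 2 else ""
    print(f"{proc} at alpha = {alpha}: rate = {rate:.4f}, z = {z:+.2f}{flag}")
```

With 10,000 simulations the standard error at α = 0.05 is about 0.0022, so a rate of 0.0602 (procedure 2b) lies roughly 4.7 standard errors above the nominal level, whereas 0.0490 (procedure 1a) is well within sampling error.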


Table A.10 Numbers of dimensionality choices among 10,000 simulations for the case of N = 75, p = 4, q = 8

Testing procedures applied at the significance level α = 0.01, α = 0.05 and α = 0.1

Cond. of d, Estim. | α = 0.01: 1a 2b 3c 4d 5e 6f 7g | α = 0.05: 1a 2b 3c 4d 5e 6f 7g | α = 0.1: 1a 2b 3c 4d 5e 6f 7g | 8h

0 correct 9897 9807 9807 9893 9893 9954 9954 | 9527 9348 9348 9506 9506 9667 9667 | 9040 8864 8864 9028 9028 9197 9197 | 7824

0 over 103 193 193 107 107 46 46 | 473 652 652 494 494 333 333 | 960 1136 1136 972 972 803 803 | 2176

1 under 0 0 0 2 2 45 45 | 0 0 0 0 0 1 1 | 0 0 0 0 0 0 0 | 0

1 correct 9897 9855 9832 9894 9889 9880 9888 | 9511 9426 9342 9498 9474 9583 9612 | 9040 8991 8847 9040 8979 9109 9162 | 8136

1 over 103 145 168 104 109 75 67 | 489 574 658 502 526 416 387 | 960 1009 1153 960 1021 891 838 | 1864

2 under 171 103 103 300 300 1325 1325 | 38 25 25 62 62 209 209 | 19 18 18 27 27 75 75 | 1

2 correct 9750 9786 9748 9621 9607 8631 8632 | 9548 9496 9394 9516 9477 9432 9435 | 9115 9071 8905 9088 9018 9090 9116 | 8286

2 over 79 111 149 79 93 44 43 | 414 479 581 422 461 359 356 | 866 911 1077 885 955 835 809 | 1713

3 under 84 46 38 136 126 434 471 | 9 7 6 11 9 38 44 | 4 4 3 5 5 7 9 | 0

3 correct 9845 9863 9830 9789 9786 9514 9483 | 9560 9552 9400 9570 9505 9565 9582 | 9101 9142 8887 9095 8989 9058 9099 | 8561

3 over 71 91 132 75 88 52 46 | 431 441 594 419 486 397 374 | 895 854 1110 900 1006 935 892 | 1439

4 under 4920 4350 4013 5289 5107 6570 6615 | 2601 2393 2117 2800 2657 3419 3452 | 1635 1574 1381 1740 1644 2067 2091 | 745

4 correct 5040 5593 5876 4668 4824 3394 3347 | 7081 7286 7395 6876 6922 6263 6209 | 7714 7796 7710 7595 7524 7233 7154 | 8152

4 over 40 57 111 43 69 36 38 | 318 321 488 324 421 318 339 | 651 630 909 665 832 700 755 | 1103

5 under 3711 3415 2929 3919 3680 4614 4747 | 1627 1607 1329 1704 1566 1895 1967 | 962 1002 792 994 914 1024 1075 | 490

5 correct 6233 6526 6956 6025 6239 5339 5203 | 8039 8096 8208 7962 8024 7752 7666 | 8333 8378 8285 8296 8210 8189 8093 | 8773

5 over 56 59 115 56 81 47 50 | 334 297 463 334 410 353 367 | 705 620 923 710 876 787 832 | 737

6 under 8412 8158 7476 8450 8116 8796 8751 | 5971 5940 5156 6064 5582 6252 6146 | 4544 4642 3877 4586 4117 4601 4461 | 3340

6 correct 1581 1833 2494 1543 1859 1198 1237 | 3900 3944 4547 3807 4173 3596 3649 | 5022 4988 5405 4979 5223 4905 4920 | 6213

6 over 7 9 30 7 25 6 12 | 129 116 297 129 245 152 205 | 434 370 718 435 660 494 619 | 447

For explanations see Table A.1.


Table A.11 Numbers of dimensionality choices among 10,000 simulations for the case of N = 75, p = 6, q = 6

Testing procedures applied at the significance level α = 0.01, α = 0.05 and α = 0.1

Cond. of d, Estim. | α = 0.01: 1a 2b 3c 4d 5e 6f 7g | α = 0.05: 1a 2b 3c 4d 5e 6f 7g | α = 0.1: 1a 2b 3c 4d 5e 6f 7g | 8h

0 correct 9901 9815 9815 9909 9909 9957 9957 | 9545 9359 9359 9530 9530 9673 9673 | 9071 8873 8873 9062 9062 9241 9241 | 7431

0 over 99 185 185 91 91 43 43 | 455 641 641 470 470 327 327 | 929 1127 1127 938 938 759 759 | 2569

1 under 0 0 0 2 2 98 98 | 0 0 0 0 0 6 6 | 0 0 0 0 0 0 0 | 0

1 correct 9923 9877 9840 9917 9911 9860 9867 | 9560 9509 9400 9558 9535 9620 9634 | 9084 9047 8874 9091 9043 9142 9191 | 7741

1 over 77 123 160 81 87 42 35 | 440 491 600 442 465 374 360 | 916 953 1126 909 957 858 809 | 2259

2 under 212 121 121 417 417 1810 1810 | 44 25 25 76 76 307 307 | 15 11 11 22 22 93 93 | 1

2 correct 9700 9748 9720 9492 9483 8126 8132 | 9540 9495 9393 9495 9451 9326 9343 | 9114 9073 8890 9103 9022 9096 9106 | 7845

2 over 88 131 159 91 100 64 58 | 416 480 582 429 473 367 350 | 871 916 1099 875 956 811 801 | 2154

3 under 151 108 92 246 236 747 810 | 25 22 15 47 44 115 122 | 7 7 5 11 11 28 37 | 0

3 correct 9775 9806 9760 9678 9670 9194 9140 | 9579 9582 9447 9548 9497 9483 9508 | 9161 9220 8956 9165 9055 9108 9149 | 7947

3 over 74 86 148 76 94 59 50 | 396 396 538 405 459 402 370 | 832 773 1039 824 934 864 814 | 2053

4 under 5353 4827 4420 5805 5602 7038 7106 | 2937 2725 2432 3177 3032 3915 3967 | 1899 1834 1594 2077 1947 2456 2493 | 669

4 correct 4606 5123 5472 4150 4328 2924 2853 | 6775 6987 7085 6533 6580 5795 5715 | 7467 7567 7474 7285 7222 6869 6776 | 7670

4 over 41 50 108 45 70 38 41 | 288 288 483 290 388 290 318 | 634 599 932 638 831 675 731 | 1661

5 under 4384 4112 3499 4738 4448 5622 5784 | 2089 2088 1692 2274 2070 2545 2664 | 1283 1338 1055 1358 1246 1445 1515 | 411

5 correct 5577 5848 6403 5221 5483 4340 4172 | 7618 7651 7847 7429 7518 7119 6977 | 8056 8105 8017 7985 7884 7809 7677 | 8025

5 over 39 40 98 41 69 38 44 | 293 261 461 297 412 336 359 | 661 557 928 657 870 746 808 | 1564

6 under 8817 8639 8038 8881 8572 9113 9083 | 6677 6676 5781 6764 6269 6925 6817 | 5148 5290 4348 5244 4681 5247 5122 | 2861

6 correct 1177 1354 1935 1113 1409 879 902 | 3214 3231 3947 3124 3501 2945 2993 | 4498 4430 4988 4400 4723 4328 4336 | 6125

6 over 6 7 27 6 19 8 15 | 109 93 272 112 230 130 190 | 354 280 664 356 596 425 542 | 1014

For explanations see Table A.1.


Table A.12 Numbers of dimensionality choices among 10,000 simulations for the case of N = 75, p = 6, q = 8

Testing procedures applied at the significance level α = 0.01, α = 0.05 and α = 0.1

Cond. of d, Estim. | α = 0.01: 1a 2b 3c 4d 5e 6f 7g | α = 0.05: 1a 2b 3c 4d 5e 6f 7g | α = 0.1: 1a 2b 3c 4d 5e 6f 7g | 8h

0 correct 9884 9761 9761 9879 9879 9953 9953 | 9466 9254 9254 9450 9450 9623 9623 | 8936 8724 8724 8922 8922 9150 9150 | 7043

0 over 116 239 239 121 121 47 47 | 534 746 746 550 550 377 377 | 1064 1276 1276 1078 1078 850 850 | 2957

1 under 2 2 2 16 16 644 644 | 0 0 0 3 3 49 49 | 0 0 0 0 0 8 8 | 0

1 correct 9896 9835 9792 9878 9869 9301 9306 | 9540 9464 9354 9541 9512 9565 9599 | 9044 8967 8788 9039 8985 9105 9162 | 7471

1 over 102 163 206 106 115 55 50 | 460 536 646 456 485 386 352 | 956 1033 1212 961 1015 887 830 | 2529

2 under 480 267 267 898 898 3387 3387 | 111 73 73 207 207 881 881 | 47 34 34 84 84 334 334 | 0

2 correct 9456 9623 9582 9032 9022 6568 6571 | 9489 9445 9302 9397 9345 8811 8829 | 9115 9081 8846 9074 8969 8887 8911 | 7567

2 over 64 110 151 70 80 45 42 | 400 482 625 396 448 308 290 | 838 885 1120 842 947 779 755 | 2433

3 under 326 222 182 587 544 2027 2181 | 70 57 47 130 121 380 417 | 33 31 21 43 39 129 144 | 1

3 correct 9601 9681 9647 9336 9353 7916 7774 | 9519 9522 9316 9440 9387 9209 9205 | 9110 9171 8855 9089 8968 8965 9028 | 7788

3 over 73 97 171 77 103 57 45 | 411 421 637 430 492 411 378 | 857 798 1124 868 993 906 828 | 2211

4 under 6513 5904 5433 6952 6754 8138 8208 | 3972 3677 3279 4310 4126 5213 5302 | 2747 2633 2280 2991 2811 3556 3633 | 920

4 correct 3467 4069 4473 3026 3200 1848 1779 | 5811 6103 6307 5462 5546 4560 4460 | 6744 6885 6861 6480 6456 5878 5768 | 7449

4 over 20 27 94 22 46 14 13 | 217 220 414 228 328 227 238 | 509 482 859 529 733 566 599 | 1631

5 under 5775 5414 4632 6211 5910 7236 7409 | 3220 3197 2559 3475 3205 3945 4125 | 2098 2174 1708 2269 2066 2445 2575 | 704

5 correct 4196 4553 5286 3759 4039 2738 2564 | 6549 6601 7013 6289 6453 5789 5607 | 7359 7365 7452 7179 7160 6902 6746 | 7878

5 over 29 33 82 30 51 26 27 | 231 202 428 236 342 266 268 | 543 461 840 552 774 653 679 | 1418

6 under 9345 9205 8739 9379 9177 9542 9537 | 7829 7805 6945 7884 7449 8010 7971 | 6552 6672 5650 6619 6054 6646 6568 | 3980

6 correct 646 785 1226 611 801 448 451 | 2085 2120 2832 2022 2372 1882 1889 | 3207 3137 3816 3125 3476 3020 3016 | 5281

6 over 9 10 35 10 22 10 12 | 86 75 223 94 179 108 140 | 241 191 534 256 470 334 416 | 739

For explanations see Table A.1.

