SUPPLEMENT TO MODEL-BASED PRINCIPAL COMPONENTS FOR CORRELATION MATRICES

Robert J. Boik
Department of Mathematical Sciences
Montana State University
Bozeman, MT 59717-2400
Email: [email protected]

January 19, 2013


Contents

100 Introduction
101 Notation and Preliminary Results
  101.1 Special Matrix Notation
  101.2 Derivative Notation
  101.3 Rearranging Derivatives
  101.4 Khatri-Rao Matrix Product
  101.5 Selected Matrix Equalities
102 Expression for PCs Based on Model (19)
103 Simplified and Sparse Principal Components
104 Identified and Testable Constraints on Γ
  104.1 Discussion of the Issues
  104.2 Proofs of Theorem 103 and Corollary 103.1
    104.2.1 Preliminary Lemmas
    104.2.2 Theorem 103
    104.2.3 Corollary 103.1
  104.3 Proofs of Theorem 104 and Corollary 104.1
    104.3.1 Preliminary Lemmas
    104.3.2 Theorem 104
    104.3.3 Corollary 104.1
105 Eigenvalue-Vector Dependencies in Correlation Matrices
  105.1 Properties of S_λ(Γ)
  105.2 Properties of S_Γ(λ)
  105.3 Proof of Theorem 105
106 Remarks on Existing Inference Procedures
  106.1 Lawley (1963)
  106.2 Bentler and Yuan (1998)
107 Details on Eigenvalue Parameterizations
  107.1 Parameterization (1a): λ = T_1 ξ_λ, C_λ' λ = c_λ
    107.1.1 Derivatives of λ with Respect to ξ_λ
    107.1.2 Initial Guess for ξ_λ
  107.2 Parameterization (1b): λ = 1_p + T_2 ξ_λ, C_λ' λ = c_λ
    107.2.1 Derivatives of λ with Respect to ξ_λ
    107.2.2 Initial Guess for ξ_λ
  107.3 Parameterization (2a): λ = p exp{T_2 ξ_λ} / [1_p' exp{T_2 ξ_λ}], C_λ' λ = c_λ
    107.3.1 Derivatives of λ with Respect to ξ_λ
    107.3.2 Initial Guess for ξ_λ
    107.3.3 Solving for η_λ
  107.4 Parameterization (2b): λ = p exp{T_2 ξ_λ} / [1_p' exp{T_2 ξ_λ}], C_λ' ln(λ) = c_λ
    107.4.1 Derivatives of λ and ln(λ) with Respect to ξ_λ
    107.4.2 Initial Guess for ξ_λ
    107.4.3 Solving for η_λ
  107.5 Parameterization (2c): λ = p exp{T_2 ξ_λ} / [1_p' exp{T_2 ξ_λ}], C_λ' ln(λ/|Λ_λ|^(1/p)) = c_λ
    107.5.1 Derivatives of λ and ln(λ/|Λ_λ|^(1/p)) with Respect to ξ_λ
    107.5.2 Initial Guess for ξ_λ
    107.5.3 Solving for η_λ
  107.6 Parameterization (3a): λ = p exp[T_1 exp{T_2 ξ_λ}] / (1_p' exp[T_1 exp{T_2 ξ_λ}]), C_λ' ln(λ) = c_λ
    107.6.1 Derivatives of λ and ln(λ) with Respect to ξ_λ
    107.6.2 Initial Guess for ξ_λ
    107.6.3 Solving for η_λ
  107.7 Parameterization (3b): λ = p exp[T_1 exp{T_2 ξ_λ}] / (1_p' exp[T_1 exp{T_2 ξ_λ}]), C_λ' ln(λ/|Λ_λ|^(1/p)) = c_λ
    107.7.1 Derivatives of λ and ln(λ/|Λ_λ|^(1/p)) with Respect to ξ_λ
    107.7.2 Initial Guess for ξ_λ
    107.7.3 Solving for η_λ
  107.8 Parameterization (4): λ = 1_p + T_1 exp{T_2 ξ_λ}, C_λ' λ = c_λ
    107.8.1 Derivatives of λ with Respect to ξ_λ
    107.8.2 Initial Guess for ξ_λ
    107.8.3 Solving for η_λ
108 Additional Eigenvalue Structures
  108.1 Bendel & Mickey (1978)
  108.2 Bentler & Yuan (1998)
109 Remarks on Composite Multiplicity Models
110 Remarks on Parameterization of G
111 Parameterization of Marginal Standard Deviations
  111.1 Parameterization (1a): ψ = T_5 ξ_ψ, C_ψ' ψ = c_ψ
    111.1.1 Derivatives of ψ with Respect to θ_ψ
    111.1.2 Initial Guesses for ξ_ψ and θ_ψ
    111.1.3 Solving for η_ψ and ξ_ψ
  111.2 Parameterization (1b): ψ = T_5 ξ_ψ, C_ψ' ψ / (1_p' ψ) = c_ψ
    111.2.1 Derivatives of ψ with Respect to θ_ψ
    111.2.2 Initial Guesses for ξ_ψ and θ_ψ
    111.2.3 Solving for η_ψ and ξ_ψ
  111.3 Parameterization (2a): ψ = T_4 exp{T_5 ξ_ψ}, C_ψ' ψ = c_ψ
    111.3.1 Derivatives of ψ with Respect to θ_ψ
    111.3.2 Initial Guesses for ξ_ψ and θ_ψ
    111.3.3 Solving for η_ψ and ξ_ψ
  111.4 Parameterization (2b): ψ = T_4 exp{T_5 ξ_ψ}, C_ψ' ln(ψ) = c_ψ
    111.4.1 Derivatives of ψ with Respect to θ_ψ
    111.4.2 Initial Guesses for ξ_ψ and θ_ψ
    111.4.3 Solving for η_ψ and ξ_ψ
  111.5 Parameterization (3): ψ = θ_{ψ,1} T_4 exp{T_5 ξ_ψ} / [1_p' T_4 exp{T_5 ξ_ψ}], C_ψ' ψ / (1_p' ψ) = c_ψ
    111.5.1 Derivatives of ψ with Respect to θ_ψ
    111.5.2 Initial Guesses for ξ_ψ and θ_ψ
    111.5.3 Solving for η_ψ and ξ_ψ
  111.6 Parameterization (4a): ψ = ξ_{ψ,1} exp(T_4 exp{T_5 ξ_{ψ,2}}), C_ψ' ln(ψ) = c_ψ
    111.6.1 Derivatives of ψ with Respect to θ_ψ
    111.6.2 Initial Guesses for ξ_ψ and θ_ψ
    111.6.3 Solving for η_ψ and ξ_ψ
  111.7 Parameterization (4b): ψ = θ_{ψ,1} exp(T_4 exp{T_5 ξ_ψ}), C_ψ' ln[ψ / (Π_{i=1}^p ψ_i)^(1/p)] = c_ψ
    111.7.1 Derivatives of ψ with Respect to θ_ψ
    111.7.2 Initial Guesses for ξ_ψ and θ_ψ
    111.7.3 Solving for η_ψ and ξ_ψ
112 Details on Algorithm 1
113 Details on Algorithm 2
114 Details on Algorithm 3
  114.1 Notes on the Lagrange Multiplier Method
  114.2 Description of Algorithm 3
  114.3 Derivative Expressions Required for Algorithm 3
115 Details on Algorithm 4
116 Details on Algorithm 5
  116.1 Justification of (53)
  116.2 Description of Algorithm 5
117 Illustrations
  117.1 Illustrations of Theorem 4 and Corollary 4.1
    117.1.1 Illustration 1a
    117.1.2 Illustration 1b
    117.1.3 Illustration 1c
    117.1.4 Illustration 1d
    117.1.5 Illustration 1e
  117.2 Illustrations of Theorem 5 and Corollary 5.1
    117.2.1 Illustration 2a
    117.2.2 Illustration 2b
    117.2.3 Illustration 2c
  117.3 Illustrations of Theorem 12
    117.3.1 Illustration 3a
    117.3.2 Illustration 3b
    117.3.3 Illustration 3c
    117.3.4 Illustration 3d
    117.3.5 Illustration 3e
  117.4 Illustrations of Theorem 14 and Corollaries 14.1-14.4
    117.4.1 Illustration 3a Continued
      117.4.1.1 Model 1: Empty C_γ
      117.4.1.2 Model 2: Non-empty C_γ
    117.4.2 Illustration 4
      117.4.2.1 Model 1: Empty C_γ, Empty A
      117.4.2.2 Model 2: Non-empty C_γ, Complete A
      117.4.2.3 Model 3: Non-empty C_γ, Empty A
      117.4.2.4 Model 4: Non-empty C_γ, Non-empty A
118 Simulations
  118.1 Estimating rk(W_φ)
  118.2 Comparison of Proposed Methods with those of Schott (1997a)
119 Proofs of Theorems in the Article
  119.1 Proofs of Theorem 1, Lemma 1, and Corollary 1.1
    119.1.1 Preliminary Lemmas
    119.1.2 Theorem 1
    119.1.3 Lemma 1
    119.1.4 Corollary 1.1
  119.2 Proof of Theorem 2
  119.3 Proof of Theorem 3
    119.3.1 Preliminary Results
    119.3.2 Theorem 3
    119.3.3 Remarks on Schott's Proof of Theorem 7
  119.4 Proofs of Lemma 2, Theorem 4, and Corollary 4.1
    119.4.1 Lemma 2
    119.4.2 Theorem 4
    119.4.3 Corollary 4.1
  119.5 Proofs of Theorem 5 and Corollary 5.1
    119.5.1 Theorem 5
    119.5.2 Corollary 5.1
  119.6 Proof of Theorem 6
  119.7 Proof of Theorem 7
  119.8 Proofs of Theorem 8 and Corollary 8.1
    119.8.1 Theorem 8
    119.8.2 Corollary 8.1
  119.9 Proof of Theorem 9
    119.9.1 Preliminary Lemma
    119.9.2 Theorem 9
  119.10 Proof of Theorem 10
  119.11 Proof of Theorem 11
  119.12 Proof of Theorem 12
  119.13 Proofs of Theorem 13 and Corollary 13.1
    119.13.1 Preliminary Lemmas
    119.13.2 Theorem 13
    119.13.3 Corollary 13.1
  119.14 Proofs of Theorem 14 and Corollaries 14.1-14.4
    119.14.1 Preliminary Lemmas
    119.14.2 Theorem 14
    119.14.3 Corollary 14.1
    119.14.4 Corollary 14.2
    119.14.5 Corollary 14.3
    119.14.6 Corollary 14.4
    119.14.7 Justification of (46)
  119.15 Proofs of Theorem 15 and Corollary 15.1
    119.15.1 Theorem 15
    119.15.2 Corollary 15.1
  119.16 Proof of Theorem 16
120 Derivatives of Discrepancy Functions
  120.1 Discrepancy Function L_1
  120.2 Discrepancy Function L_2
  120.3 Discrepancy Function L_3
121 Bias, Variance, and Skewness of Estimators
  121.1 General Expressions
  121.2 Expressions for Discrepancy Functions L_1 and L_2
    121.2.1 Simplifications Under Normality: L ∈ {L_1, L_2}
  121.3 Expressions for Discrepancy Function L_3
    121.3.1 Simplifications Under Normality: L = L_3
  121.4 Expansions Under Normality When the Normal-Theory Estimator of σ_τ² is Employed
122 References

List of Tables

100 Matrix Operators
101 Matrix Functions
102 Matrix/Vector Spaces and Sets
103 Eigenvalue Structures for Correlation Matrices
104 Expressions for D^(1)_{λ;ξ_λ'} and D^(1)_{h_3;ξ_λ'}
105 Expressions for D^(2)_{λ;ξ_λ',ξ_λ'}
106 Expressions for D^(2)_{h_3;ξ_λ',ξ_λ'}
107 Expressions for D^(3)_{λ;ξ_λ',ξ_λ',ξ_λ'}
108 Expressions for D^(3)_{h_3;ξ_λ',ξ_λ',ξ_λ'}
109 Special Notation for Eigenvalue Parameterizations
110 Properties of Geometric Eigenvalue Structures
111 Structures for p-Vector of Standard Deviations
112 Variant 1 Dimensions: Lawley & Maxwell Data
113 Variant 2 Dimensions: Lawley & Maxwell Data
114 List of Eigenvalue Structures for the Simulation Study
115 List of Test Procedures
116 Empirical Test Sizes: MVN Distribution
117 Empirical Test Sizes: χ²_2 Distribution
118 Empirical Test Sizes: Mixture Distribution
119 Comparison of Schott's (1997a) Notation and Current Notation
120 Coefficients for Ω_{42,n} and Ω_{222,n}

100 Introduction

This document contains mathematical and other details to supplement the article "Model-Based Principal Components of Correlation Matrices." All section, equation, table, lemma, theorem, and corollary numbers in {1, 2, ..., 99} refer to the article. Numbers in {100, 101, ...} refer to this Supplement. The Boik, Panishkan, and Hyde (2010) article is cited as BPH.

101 Notation and Preliminary Results

101.1 Special Matrix Notation

Notation for selected matrix functions and matrix/vector spaces is listed in Tables 1-3 in the article. For convenience, these tables are reproduced as Tables 100-102 in this Supplement. The quantities J*_p and J**_p in Table 101 are additions to Table 2 in the article. The quantity O_m in Table 102 is an addition to Table 3 in the article. The operator ⊙ modifies scalar functions to operate elementwise on matrices. For example, if b is a k-vector, then ⊙exp(b) = (e^{b_1} e^{b_2} ... e^{b_k})'. The matrices N_p and N_p^⊥ are projection operators onto the spaces of symmetric and skew-symmetric matrices, respectively. The notation U^⊥ refers to the basis set in Table 102, except when ⊥ is applied to N_p. The relation C ∈ N̊_ε(B) means that ||C − B|| < ε, where ||F||² = tr(F'F) and ε > 0 is a fixed constant.

Table 100: Matrix Operators

  Operator   Definition
  ⊗          Kronecker product operator (Harville, 1997, §16.1)
  ⊙          elementwise operator
  ⊕          direct sum operator (Schott, 1997b, §7.4)

The matrix L_{qr,p} in Table 101 is particularly useful. If dim(M) = p × p and dim(F_i) = p × s for i = 1, 2, then

  diag(M) = L_{21,p}' vec(M),   vec[M_dg] = L_{21,p} diag(M) = L_{22,p} vec(M),
  L_{21,p} L_{21,p}' = L_{22,p},   and   L_{21,p}' (F_1 ⊗ F_2) L_{21,s} = F_1 ⊙ F_2.        (100)
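The identities in (100) are easy to verify numerically. The sketch below (Python/NumPy; the helper name L and column-major vec are illustrative assumptions, not notation from the article) builds L_{qr,p} from its Table 101 definition and checks each equality in (100), with ⊙ realized as the usual elementwise (Hadamard) product.

    import numpy as np

    def L(q, r, p):
        # L_{qr,p} = sum_i (e_i^p)^{kron q} [(e_i^p)^{kron r}]', per Table 101
        out = np.zeros((p**q, p**r))
        for i in range(p):
            e = np.zeros((p, 1)); e[i] = 1.0
            kq = e
            for _ in range(q - 1):
                kq = np.kron(kq, e)
            kr = e
            for _ in range(r - 1):
                kr = np.kron(kr, e)
            out += kq @ kr.T
        return out

    p, s = 4, 3
    rng = np.random.default_rng(0)
    M = rng.standard_normal((p, p))
    F1, F2 = rng.standard_normal((p, s)), rng.standard_normal((p, s))
    vec = lambda X: X.reshape(-1, 1, order="F")          # column-stacking vec

    L21p, L22p, L21s = L(2, 1, p), L(2, 2, p), L(2, 1, s)
    assert np.allclose(L21p.T @ vec(M), np.diag(M).reshape(-1, 1))  # diag(M) = L21,p' vec(M)
    assert np.allclose(L21p @ L21p.T, L22p)                         # L21,p L21,p' = L22,p
    assert np.allclose(L22p @ vec(M), vec(np.diag(np.diag(M))))     # vec(M_dg) = L22,p vec(M)
    assert np.allclose(L21p.T @ np.kron(F1, F2) @ L21s, F1 * F2)    # elementwise product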

Table 101: Matrix Functions

  Function            Definition
  tr(A) & rk(A)       trace of the square matrix A & rank of the matrix A
  diag(A)             vector of diagonal elements of the square matrix A
  v_dg                diagonal matrix whose diagonals are the elements of vector v
  A_dg                [diag(A)]_dg, where A is a square matrix
  dim(M)              row × column dimensions of matrix M
  dim_j(M)            row dimension (j = 1) & column dimension (j = 2) of matrix M
  dim(V)              dimension of vector space V
  vec(M)              vector obtained by stacking columns of M (Harville, 1997, §16.2)
  dvec(M; a, b)       a × b matrix that satisfies vec[dvec(M; a, b)] = vec(M)
  (M)^{⊗r}            rth order Kronecker product of M; e.g., (M)^{⊗3} = M ⊗ M ⊗ M
  1_p & I_p           p × 1 column of ones & p × p identity matrix
  i_p & e_j^p         vec(I_p) & jth column of I_p (elementary vector)
  E_{j,a}             elementary matrix: jth sub-matrix of I_p = (E_{1,a} E_{2,a} ... E_{c,a}),
                      1_c' a = p, a_j is a non-negative integer, dim(E_{j,a}) = p × a_j
  E_{j,a}^⊥           complement of E_{j,a}: E_{j,a}^⊥ = (E_{1,a} ... E_{j-1,a} E_{j+1,a} ... E_{c,a})
  I_{p,3} & L_{qr,p}  I_{p,3} ≝ I_p ⊗ i_p ⊗ I_p & L_{qr,p} ≝ Σ_{i=1}^p (e_i^p)^{⊗q} [(e_i^p)^{⊗r}]'
  L_p & L*_p          p(p+1)/2 × p² lower triangular elimination matrix & p(p−1)/2 × p²
                      strictly lower triangular elimination matrix (Magnus, 1988, §5.2, §6.5)
  D_p                 p² × p(p+1)/2 duplication matrix (Harville, 1997, §3.8)
  K_{a,b}             ab × ab commutation matrix (Harville, 1997, §16.3)
  N_p                 N_p ≝ (I_{p²} + K_{p,p})/2
  N_p^⊥ & J_p         N_p^⊥ ≝ (I_{p²} − K_{p,p})/2 & J_p ≝ K_{p²,p} + (I_p ⊗ 2N_p)
  J*_p & J**_p        J*_p ≝ (I_p ⊗ 2N_p) + (K_{p,p} ⊗ I_p) & J**_p ≝ K_{p²,p} + (2N_p ⊗ I_p)
  ||M||               Euclidean norm of M: ||M|| ≝ [tr(M'M)]^{1/2}
  SVD(M)              singular value decomposition: M = SVD(M) = UDV', where U ∈ O_a,
                      V ∈ O_b, and dim(M) = a × b
  svd(M)              full rank singular value decomposition: M = svd(M) = UDV', where
                      U ∈ O_{a,r}, V ∈ O_{b,r}, dim(M) = a × b, and r = rk(M)
  M^a                 M^a = U(d^a)_dg U', svd(M) = UDU', d = diag(D), M ∈ D_q^+
  ppo(M)              perpendicular projection operator onto R(M): ppo(M) ≝ M(M'M)⁻M'
  B⁻ & B⁺             arbitrary & Moore-Penrose generalized inverse of B
  GS(M)               non-zero columns of Gram-Schmidt orthonormalization of M
  Corr(y, z)          matrix of correlations among elements of random vectors y and z

Table 102: Matrix/Vector Spaces and Sets

  Space/Set        Definition
  ∅ & D_{pq}       empty set & set of all p × q matrices
  D_p^+            set of all p × p symmetric positive semi-definite matrices
  D_p^{++}         set of all p × p positive definite matrices
  D_{dg,p}^{++}    set of all p × p diagonal positive definite matrices
  C_p              set of all p × p nonsingular correlation matrices
  P_p              set of all p × p permutation matrices
  O_p              set of all p × p orthogonal matrices
  O_{p,q}          set of all p × q semi-orthogonal matrices, p ≥ q
  O_m              set of all p × p orthogonal matrices with structure ⊕_{i=1}^d Q_i, where
                   Q_i ∈ O_{m_i}, Σ_{i=1}^d m_i = p, m_i is a positive integer, and m_i = 1 ⟹ Q_i = 1
  H_p              set of all p × p Hadamard matrices; i.e., M ∈ H_p ⟺ p^{−1/2}M ∈ O_p and M ⊙ M = 1_p 1_p'
  S_λ              set defined as {λ; λ_dg ∈ D_{dg,p}^{++}, 1_p' λ = p}
  S_λ(Γ)           set defined as {λ; λ ∈ S_λ, ΓΛΓ' ∈ C_p, Λ = λ_dg, Γ ∈ O_p}
  S_Γ(λ)           set defined as {Γ; Γ ∈ O_p, ΓΛΓ' ∈ C_p, Λ = λ_dg, λ ∈ S_λ}
  S_Λ              set defined as {Λ; Λ = ⊕_{j=1}^d Λ_j, Λ_j ∈ D_{m_j}^{++}, tr(Λ_j) = m_j}
  S_Λ(Γ)           set defined as {Λ; Λ ∈ S_Λ, ΓΛΓ' ∈ C_p, Γ ∈ O_p}
  S_Γ(Λ)           set defined as {Γ; Γ ∈ O_p, ΓΛΓ' ∈ C_p, Λ ∈ S_Λ}
  R(B)             vector space generated by the columns of the matrix B
  N(B)             null space (kernel) of the matrix B
  F^⊥              any matrix whose columns are a basis for N(F'), if F is either E_{j,a} or N_p; see Table 101
  N̊_ε(B)           open neighborhood of B; C ∈ N̊_ε(B) if ||C − B|| < ε, where ε > 0 is a constant

101.2 Derivative Notation

Let W be a p × q matrix function of θ, where θ is a ν̇ × 1 parameter vector partitioned as θ = (θ_1' θ_2' ... θ_k')', θ_i has dimension ν_i × 1, ν_i is the ith element of the k-vector ν, and ν̇ = Σ_{i=1}^k ν_i. The derivative of W with respect to θ or θ' is a pν̇ × q or a p × qν̇ matrix, and the derivative of vec W with respect to θ or θ' is a pqν̇ × 1 vector or a pq × ν̇ matrix. These matrices of derivatives are denoted and defined as follows:

  D^(1)_{W;θ} ≝ ∂W/∂θ = (∂/∂θ) ⊗ W = Σ_{i=1}^{ν̇} e_i^{ν̇} ⊗ ∂W/∂θ_i,

  D^(1)_{W;θ'} ≝ ∂W/∂θ' = (∂/∂θ') ⊗ W = Σ_{i=1}^{ν̇} e_i^{ν̇ '} ⊗ ∂W/∂θ_i,

  D^(1)_{vec W;θ} ≝ ∂ vec W/∂θ = (∂/∂θ) ⊗ vec W = Σ_{i=1}^{ν̇} e_i^{ν̇} ⊗ ∂ vec W/∂θ_i,  and

  D^(1)_{vec W;θ'} ≝ ∂ vec W/∂θ' = (∂/∂θ') ⊗ vec W = Σ_{i=1}^{ν̇} e_i^{ν̇ '} ⊗ ∂ vec W/∂θ_i.

Higher-order derivatives of W and vec W with respect to θ and with respect to sub-vectors of θ are denoted and defined as follows:

  D^(2)_{W;θ,θ'} ≝ ∂²W/(∂θ ⊗ ∂θ') = Σ_{i=1}^{ν̇} Σ_{j=1}^{ν̇} e_i^{ν̇} ⊗ e_j^{ν̇ '} ⊗ ∂²W/[(∂θ_i)(∂θ_j)],

  D^(2)_{vec W;θ',θ'} ≝ ∂² vec W/(∂θ' ⊗ ∂θ') = Σ_{i=1}^{ν̇} Σ_{j=1}^{ν̇} e_i^{ν̇ '} ⊗ e_j^{ν̇ '} ⊗ ∂² vec W/[(∂θ_i)(∂θ_j)],

  D^(3)_{vec W;θ_r',θ_s',θ_t'} ≝ ∂³ vec W/(∂θ_r' ⊗ ∂θ_s' ⊗ ∂θ_t')
      = Σ_{i=1}^{ν_r} Σ_{j=1}^{ν_s} Σ_{k=1}^{ν_t} e_i^{ν_r '} ⊗ e_j^{ν_s '} ⊗ e_k^{ν_t '} ⊗ ∂³ vec W/[(∂θ_{r,i})(∂θ_{s,j})(∂θ_{t,k})],

  D^(3)_{vec W;θ̂_r',θ̂_s',θ̂_t'} ≝ ∂³ vec W/(∂θ_r' ⊗ ∂θ_s' ⊗ ∂θ_t') evaluated at θ = θ̂,

and so forth, where θ̂ is an estimator of θ. In each case, the derivative is denoted by D, the superscript in parentheses gives the order of the derivative, and the subscripts give the quantity being differentiated followed by the variables with respect to which the derivative is taken.
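To make the arrangement conventions concrete, the following sketch (Python/NumPy; the two-parameter example W(θ) = θθ' is an arbitrary illustration, not from the article) compares the analytic Jacobian D^(1)_{vec W;θ'} with a finite-difference approximation, column by column.

    import numpy as np

    def W(theta):
        t = theta.reshape(-1, 1)
        return t @ t.T                              # W(theta) = theta theta'

    vec = lambda X: X.reshape(-1, 1, order="F")

    theta = np.array([0.7, -1.2])
    t1, t2 = theta

    # Analytic D^(1)_{vec W; theta'}: a pq x nu matrix; column i is d vec W / d theta_i.
    D1 = np.array([
        [2*t1, 0.0],
        [t2,   t1],
        [t2,   t1],
        [0.0,  2*t2],
    ])

    eps = 1e-6
    for i in range(2):
        d = np.zeros(2); d[i] = eps
        fd = (vec(W(theta + d)) - vec(W(theta - d))) / (2 * eps)
        assert np.allclose(fd.ravel(), D1[:, i], atol=1e-6)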

101.3 Rearranging Derivatives

Derivatives with respect to θ can be re-assembled from derivatives with respect to {θ_j}_{j=1}^c by using the elementary matrices defined in Table 101. Let W be an a × b matrix and let θ be a ν̇ × 1 vector partitioned as θ = (θ_1' θ_2' ... θ_c')', where θ_i has dimension ν_i × 1. Define ν as ν = (ν_1 ν_2 ... ν_c)'. Then,

  D^(1)_{W;θ} = Σ_{r=1}^c (E_{r,ν} ⊗ I_a) D^(1)_{W;θ_r},

  D^(1)_{W;θ'} = Σ_{r=1}^c D^(1)_{W;θ_r'} (E_{r,ν} ⊗ I_b)',

  D^(2)_{W;θ,θ'} = Σ_{r=1}^c Σ_{s=1}^c (E_{r,ν} ⊗ I_a) D^(2)_{W;θ_r,θ_s'} (E_{s,ν} ⊗ I_b)',

  D^(3)_{W;θ,θ',θ'} = Σ_{r=1}^c Σ_{s=1}^c Σ_{t=1}^c (E_{r,ν} ⊗ I_a) D^(3)_{W;θ_r,θ_s',θ_t'} (E_{s,ν} ⊗ E_{t,ν} ⊗ I_b)',

  D^(1)_{vec W;θ'} = Σ_{r=1}^c D^(1)_{vec W;θ_r'} E_{r,ν}',

  D^(2)_{vec W;θ,θ'} = Σ_{r=1}^c Σ_{s=1}^c (E_{r,ν} ⊗ I_{ab}) D^(2)_{vec W;θ_r,θ_s'} E_{s,ν}',

  D^(3)_{vec W;θ',θ',θ'} = Σ_{r=1}^c Σ_{s=1}^c Σ_{t=1}^c D^(3)_{vec W;θ_r',θ_s',θ_t'} (E_{r,ν} ⊗ E_{s,ν} ⊗ E_{t,ν})',

and so forth. If W with dimension ȧ × ḃ is partitioned as W = {W_ij}, where dim(W_ij) = a_i × b_j for i = 1, ..., u and j = 1, ..., v, then

  W = Σ_{i=1}^u Σ_{j=1}^v E_{i,a} W_ij E_{j,b}',  where a = (a_1 a_2 ... a_u)', b = (b_1 b_2 ... b_v)',

1_u' a = ȧ, and 1_v' b = ḃ. The derivatives of W and vec W can be written as follows:

  D^(1)_{W;θ} = Σ_{i=1}^u Σ_{j=1}^v Σ_{r=1}^c (E_{r,ν} ⊗ E_{i,a}) D^(1)_{W_ij;θ_r} E_{j,b}',

  D^(1)_{W;θ'} = Σ_{i=1}^u Σ_{j=1}^v Σ_{r=1}^c E_{i,a} D^(1)_{W_ij;θ_r'} (E_{r,ν} ⊗ E_{j,b})',

  D^(2)_{W;θ,θ'} = Σ_{i=1}^u Σ_{j=1}^v Σ_{r=1}^c Σ_{s=1}^c (E_{r,ν} ⊗ E_{i,a}) D^(2)_{W_ij;θ_r,θ_s'} (E_{s,ν} ⊗ E_{j,b})',

  D^(3)_{W;θ,θ',θ'} = Σ_{i=1}^u Σ_{j=1}^v Σ_{r=1}^c Σ_{s=1}^c Σ_{t=1}^c (E_{r,ν} ⊗ E_{i,a}) D^(3)_{W_ij;θ_r,θ_s',θ_t'} (E_{s,ν} ⊗ E_{t,ν} ⊗ E_{j,b})',

and so forth. The sequence of partial derivatives can be permuted by using commutation matrices; e.g.,

  D^(3)_{W;θ_s,θ_r,θ_t} = (K_{ν_s,ν_r} ⊗ I_{ȧν_t}) D^(3)_{W;θ_r,θ_s,θ_t}  and
  D^(3)_{W;θ_r',θ_t',θ_s'} = D^(3)_{W;θ_r',θ_s',θ_t'} (I_{ν_r} ⊗ K_{ν_s,ν_t} ⊗ I_{ḃ}),

provided that the partial derivatives are continuous. Also,

  vec[D^(2)_{W;θ_s,θ_t'}] = (I_{ν_t} ⊗ K_{ḃ,ν_s} ⊗ I_{ȧ}) vec[D^(2)_{vec(W);θ_t',θ_s'}],

  vec[D^(2)_{vec(W);θ_s',θ_t'}] = D^(2)_{vec(W);θ_s,θ_t},  and  D^(2)_{vec(W);θ_s,θ_t'} = dvec[D^(2)_{vec(W);θ_t',θ_s'}; ȧḃν_s, ν_t].

Because of the symmetry of Σ, derivatives of σ ≝ vec Σ have additional invariance properties:

  D^(1)_{σ;θ'} = K_{p,p} D^(1)_{σ;θ'},   D^(2)_{σ;θ',θ'} = K_{p,p} D^(2)_{σ;θ',θ'},   D^(3)_{σ;θ',θ',θ'} = K_{p,p} D^(3)_{σ;θ',θ',θ'},

  D^(2)_{σ;θ,θ'} = (I_{ν̇} ⊗ K_{p,p}) D^(2)_{σ;θ,θ'},  and  D^(3)_{σ;θ,θ,θ'} = (I_{ν̇²} ⊗ K_{p,p}) D^(3)_{σ;θ,θ,θ'}.
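As an illustration of the first vec-re-assembly identity, the following sketch (Python/NumPy; all names are illustrative) shows that D^(1)_{vec W;θ'} = Σ_r D^(1)_{vec W;θ_r'} E_{r,ν}' amounts to placing the block Jacobians side by side.

    import numpy as np

    rng = np.random.default_rng(1)
    nu = np.array([2, 3])                 # block sizes nu_1, nu_2
    nudot, pq = int(nu.sum()), 4

    # E_{r,nu}: the r-th block of columns of the nudot x nudot identity matrix.
    I = np.eye(nudot)
    E = [I[:, :2], I[:, 2:]]

    J1 = rng.standard_normal((pq, 2))     # D^(1)_{vec W; theta_1'}
    J2 = rng.standard_normal((pq, 3))     # D^(1)_{vec W; theta_2'}

    J = J1 @ E[0].T + J2 @ E[1].T         # sum_r D^(1)_{vec W; theta_r'} E_{r,nu}'
    assert np.allclose(J, np.hstack([J1, J2]))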

101.4 Khatri-Rao Matrix Product

The derivatives of the eigenvalues are expressed using the Khatri-Rao matrix product (Khatri and Rao, 1968). For completeness, this section gives a summary of the Khatri-Rao product. Two matrices are conformable for Khatri-Rao multiplication if and only if they have the same number of columns. Let A and B be matrices with dimensions a × d and b × d, respectively. Then, the Khatri-Rao product of A and B is an ab × d matrix. The product is denoted as A ∗ B and is defined as

  A ∗ B ≝ Σ_{i=1}^d (a_i ⊗ b_i) e_i^{d'},

where a_i is the ith column of A and b_i is the ith column of B. Let C be a c × d matrix, D be a d × d diagonal matrix whose vector of diagonal elements is diag(D) = d, M_1 be a matrix that has a columns, and M_2 be a matrix that has b columns. Then the following properties of the Khatri-Rao product can be established:

  (a) A ∗ (B ∗ C) = (A ∗ B) ∗ C,
  (b) L_{21,p} = I_p ∗ I_p, where L_{21,p} is defined in Table 101,
  (c) A ∗ B = (A ⊗ B) L_{21,d},
  (d) D = d' ∗ I_d,
  (e) (M_1 ⊗ M_2)(A ∗ B) = M_1 A ∗ M_2 B,
  (f) vec(ADB') = (B ⊗ A) L_{21,d} d = (B ∗ A) d,
  (g) vec(AB') = (B ∗ A) 1_d,
  (h) A ∗ B = K_{a,b}(B ∗ A), and
  (i) A ∗ 1_d' = A.        (101)

Khatri and Rao (1968) verified claims (a) and (e). Hyde (2004) verified claims (b), (c), (f), and (h). Claims (d) and (i) follow directly from the definition of the ∗ operator. Claim (g) follows from claim (f). Let ∆ be a q × q diagonal matrix whose vector of diagonal elements is diag(∆) = h(θ), where h(θ) is a q × 1 vector valued function of θ. Derivatives of ∆, exp[h(θ)], and ln[h(θ)] with respect to θ can be expressed economically by using the Khatri-Rao product.
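Claims (c) and (f) in (101) are easy to check numerically. The sketch below (Python/NumPy; the khatri_rao helper simply mirrors the definition above and is not from the article) verifies A ∗ B = (A ⊗ B)L_{21,d} and vec(ADB') = (B ∗ A)d.

    import numpy as np

    def khatri_rao(A, B):
        # A * B = sum_i (a_i kron b_i) e_i' : columnwise Kronecker product
        return np.vstack([np.kron(A[:, i], B[:, i]) for i in range(A.shape[1])]).T

    def L21(d):
        I = np.eye(d)
        return khatri_rao(I, I)            # property (b): L_{21,d} = I_d * I_d

    rng = np.random.default_rng(2)
    a, b, d = 3, 4, 5
    A, B = rng.standard_normal((a, d)), rng.standard_normal((b, d))
    dv = rng.standard_normal(d)
    vec = lambda X: X.reshape(-1, order="F")

    assert np.allclose(khatri_rao(A, B), np.kron(A, B) @ L21(d))              # claim (c)
    assert np.allclose(vec(A @ np.diag(dv) @ B.T), khatri_rao(B, A) @ dv)     # claim (f)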

Theorem 100. Let U be a q × s matrix whose elements do not depend on θ. Suppose that ∆ = [h(θ)]_dg, where h(θ) is a q × 1 vector valued function of θ and θ is a ν-vector. Then,

  (a) D^(1)_{∆U;θ'} = [D^(1)'_{h(θ);θ'} ∗ U']',
  (b) D^(2)_{∆U;θ',θ'} = [D^(2)'_{h(θ);θ',θ'} ∗ U']', and
  (c) D^(3)_{∆U;θ',θ',θ'} = [D^(3)'_{h(θ);θ',θ',θ'} ∗ U']'.

Proof. Write ∆U as

  ∆U = Σ_{i=1}^q e_i^q e_i^{q'} h(θ) e_i^{q'} U.

Then,

  D^(1)_{∆U;θ'} = Σ_{i=1}^q e_i^q e_i^{q'} D^(1)_{h(θ);θ'} (I_ν ⊗ e_i^{q'} U) = Σ_{i=1}^q e_i^q [e_i^{q'} D^(1)_{h(θ);θ'} ⊗ e_i^{q'} U]
    = {Σ_{i=1}^q [D^(1)'_{h(θ);θ'} e_i^q ⊗ U' e_i^q] e_i^{q'}}' = [D^(1)'_{h(θ);θ'} ∗ U']',

  D^(2)_{∆U;θ',θ'} = Σ_{i=1}^q e_i^q e_i^{q'} D^(2)_{h(θ);θ',θ'} (I_{ν²} ⊗ e_i^{q'} U) = Σ_{i=1}^q e_i^q [e_i^{q'} D^(2)_{h(θ);θ',θ'} ⊗ e_i^{q'} U]
    = {Σ_{i=1}^q [D^(2)'_{h(θ);θ',θ'} e_i^q ⊗ U' e_i^q] e_i^{q'}}' = [D^(2)'_{h(θ);θ',θ'} ∗ U']',  and

  D^(3)_{∆U;θ',θ',θ'} = Σ_{i=1}^q e_i^q e_i^{q'} D^(3)_{h(θ);θ',θ',θ'} (I_{ν³} ⊗ e_i^{q'} U) = Σ_{i=1}^q e_i^q [e_i^{q'} D^(3)_{h(θ);θ',θ',θ'} ⊗ e_i^{q'} U]
    = {Σ_{i=1}^q [D^(3)'_{h(θ);θ',θ',θ'} e_i^q ⊗ U' e_i^q] e_i^{q'}}' = [D^(3)'_{h(θ);θ',θ',θ'} ∗ U']'.

Theorem 101. Let U be a q × s matrix whose elements do not depend on θ. Suppose that ∆ = [exp{h(θ)}]_dg, where h(θ) is a q × 1 vector valued function of θ and θ is a ν-vector. Then,

  (a) D^(1)_{∆U;θ'} = ∆ [D^(1)'_{h(θ);θ'} ∗ U']',
  (b) D^(2)_{∆U;θ',θ'} = ∆ [D^(1)'_{h(θ);θ'} ∗ D^(1)'_{h(θ);θ'} ∗ U']' + ∆ [D^(2)'_{h(θ);θ',θ'} ∗ U']', and
  (c) D^(3)_{∆U;θ',θ',θ'} = ∆ [D^(1)'_{h(θ);θ'} ∗ D^(1)'_{h(θ);θ'} ∗ D^(1)'_{h(θ);θ'} ∗ U']'
        + ∆ [D^(2)'_{h(θ);θ',θ'} ∗ D^(1)'_{h(θ);θ'} ∗ U']' (J_ν ⊗ I_s) + ∆ [D^(3)'_{h(θ);θ',θ',θ'} ∗ U']',

where J_ν is defined in Table 101.

Proof. Write ∆U as

  ∆U = Σ_{i=1}^q e_i^q exp{e_i^{q'} h(θ)} e_i^{q'} U.

Then,

  D^(1)_{∆U;θ'} = Σ_{i=1}^q e_i^q exp{e_i^{q'} h(θ)} e_i^{q'} D^(1)_{h(θ);θ'} (I_ν ⊗ e_i^{q'} U)
    = Σ_{i=1}^q e_i^q exp{e_i^{q'} h(θ)} [e_i^{q'} D^(1)_{h(θ);θ'} ⊗ e_i^{q'} U]
    = Σ_{i=1}^q ∆ e_i^q [e_i^{q'} D^(1)_{h(θ);θ'} ⊗ e_i^{q'} U]   because e_i^q exp{e_i^{q'} h(θ)} = ∆ e_i^q
    = ∆ {Σ_{i=1}^q [D^(1)'_{h(θ);θ'} e_i^q ⊗ U' e_i^q] e_i^{q'}}' = ∆ [D^(1)'_{h(θ);θ'} ∗ U']'.

Differentiating again and applying the product rule gives

  D^(2)_{∆U;θ',θ'} = Σ_{i=1}^q ∆ e_i^q [e_i^{q'} D^(1)_{h(θ);θ'} ⊗ e_i^{q'} D^(1)_{h(θ);θ'} ⊗ e_i^{q'} U]
      + Σ_{i=1}^q ∆ e_i^q [e_i^{q'} D^(2)_{h(θ);θ',θ'} ⊗ e_i^{q'} U]
    = ∆ [D^(1)'_{h(θ);θ'} ∗ D^(1)'_{h(θ);θ'} ∗ U']' + ∆ [D^(2)'_{h(θ);θ',θ'} ∗ U']'.

A third differentiation yields one term with three first derivatives, three terms with one second and one first derivative of h, and one term with a third derivative. The three mixed terms are collected by permuting the sequence of differentiation with (I_ν ⊗ K_{ν,ν} ⊗ I_s) and combining through J_ν:

  D^(3)_{∆U;θ',θ',θ'} = Σ_{i=1}^q ∆ e_i^q [e_i^{q'} D^(1)_{h(θ);θ'} ⊗ e_i^{q'} D^(1)_{h(θ);θ'} ⊗ e_i^{q'} D^(1)_{h(θ);θ'} ⊗ e_i^{q'} U]
      + Σ_{i=1}^q ∆ e_i^q [e_i^{q'} D^(2)_{h(θ);θ',θ'} ⊗ e_i^{q'} D^(1)_{h(θ);θ'} ⊗ e_i^{q'} U] (J_ν ⊗ I_s)
      + Σ_{i=1}^q ∆ e_i^q [e_i^{q'} D^(3)_{h(θ);θ',θ',θ'} ⊗ e_i^{q'} U]
    = ∆ [D^(1)'_{h(θ);θ'} ∗ D^(1)'_{h(θ);θ'} ∗ D^(1)'_{h(θ);θ'} ∗ U']'
      + ∆ [D^(2)'_{h(θ);θ',θ'} ∗ D^(1)'_{h(θ);θ'} ∗ U']' (J_ν ⊗ I_s) + ∆ [D^(3)'_{h(θ);θ',θ',θ'} ∗ U']'.

Corollary 101.1. If U = 1_q and ∆ = [exp{h(θ)}]_dg, then ∆U = exp{h(θ)} and the derivatives in Theorem 101 simplify to

  (a) D^(1)_{exp{h};θ'} = ∆ D^(1)_{h(θ);θ'},
  (b) D^(2)_{exp{h};θ',θ'} = ∆ [D^(1)'_{h(θ);θ'} ∗ D^(1)'_{h(θ);θ'}]' + ∆ D^(2)_{h(θ);θ',θ'}, and
  (c) D^(3)_{exp{h};θ',θ',θ'} = ∆ [D^(1)'_{h(θ);θ'} ∗ D^(1)'_{h(θ);θ'} ∗ D^(1)'_{h(θ);θ'}]'
        + ∆ [D^(2)'_{h(θ);θ',θ'} ∗ D^(1)'_{h(θ);θ'}]' J_ν + ∆ D^(3)_{h(θ);θ',θ',θ'},

where J_ν is defined in Table 101.

Theorem 102. Suppose that ∆ = [h(θ)]_dg, where h(θ) is a q × 1 vector valued function of θ, θ is a ν-vector, and each element of h(θ) is positive. Then,

  (a) D^(1)_{ln(h);θ'} = ∆^{-1} D^(1)_{h(θ);θ'},
  (b) D^(2)_{ln(h);θ',θ'} = −∆^{-2} [D^(1)'_{h(θ);θ'} ∗ D^(1)'_{h(θ);θ'}]' + ∆^{-1} D^(2)_{h(θ);θ',θ'}, and
  (c) D^(3)_{ln(h);θ',θ',θ'} = 2∆^{-3} [D^(1)'_{h(θ);θ'} ∗ D^(1)'_{h(θ);θ'} ∗ D^(1)'_{h(θ);θ'}]'
        − ∆^{-2} [D^(2)'_{h(θ);θ',θ'} ∗ D^(1)'_{h(θ);θ'}]' J_ν + ∆^{-1} D^(3)_{h(θ);θ',θ',θ'},

where J_ν is defined in Table 101.

Proof. Write ln[h(θ)] as

  ln[h(θ)] = Σ_{i=1}^q e_i^q ln[e_i^{q'} h(θ)].

Then,

  D^(1)_{ln(h);θ'} = Σ_{i=1}^q e_i^q [e_i^{q'} h(θ)]^{-1} e_i^{q'} D^(1)_{h(θ);θ'}
    = Σ_{i=1}^q ∆^{-1} e_i^q e_i^{q'} D^(1)_{h(θ);θ'}   because e_i^q [e_i^{q'} h(θ)]^{-1} = ∆^{-1} e_i^q
    = ∆^{-1} D^(1)_{h(θ);θ'},

  D^(2)_{ln(h);θ',θ'} = −Σ_{i=1}^q e_i^q [e_i^{q'} h(θ)]^{-2} [e_i^{q'} D^(1)_{h(θ);θ'} ⊗ e_i^{q'} D^(1)_{h(θ);θ'}]
      + Σ_{i=1}^q e_i^q [e_i^{q'} h(θ)]^{-1} e_i^{q'} D^(2)_{h(θ);θ',θ'}
    = −∆^{-2} {Σ_{i=1}^q [D^(1)'_{h(θ);θ'} e_i^q ⊗ D^(1)'_{h(θ);θ'} e_i^q] e_i^{q'}}' + ∆^{-1} D^(2)_{h(θ);θ',θ'}
    = −∆^{-2} [D^(1)'_{h(θ);θ'} ∗ D^(1)'_{h(θ);θ'}]' + ∆^{-1} D^(2)_{h(θ);θ',θ'},  and

  D^(3)_{ln(h);θ',θ',θ'} = 2 Σ_{i=1}^q e_i^q [e_i^{q'} h(θ)]^{-3} [e_i^{q'} D^(1)_{h(θ);θ'} ⊗ e_i^{q'} D^(1)_{h(θ);θ'} ⊗ e_i^{q'} D^(1)_{h(θ);θ'}]
      − Σ_{i=1}^q e_i^q [e_i^{q'} h(θ)]^{-2} [e_i^{q'} D^(2)_{h(θ);θ',θ'} ⊗ e_i^{q'} D^(1)_{h(θ);θ'}] J_ν
      + Σ_{i=1}^q e_i^q [e_i^{q'} h(θ)]^{-1} e_i^{q'} D^(3)_{h(θ);θ',θ',θ'},

where the three mixed second-order terms produced by the product rule have been collected through the permutation (I_ν ⊗ K_{ν,ν}) and the matrix J_ν. Accordingly,

  D^(3)_{ln(h);θ',θ',θ'} = 2∆^{-3} [D^(1)'_{h(θ);θ'} ∗ D^(1)'_{h(θ);θ'} ∗ D^(1)'_{h(θ);θ'}]'
      − ∆^{-2} [D^(2)'_{h(θ);θ',θ'} ∗ D^(1)'_{h(θ);θ'}]' J_ν + ∆^{-1} D^(3)_{h(θ);θ',θ',θ'}.

101.5 Selected Matrix Equalities

Consider matrices A, B, and C with dimensions such that the matrix product ABC exists. Roth (1934) showed that

  vec(ABC) = (C' ⊗ A) vec(B).        (102)

Suppose that a and b are column vectors and that c is a scalar. Also, suppose that A: p × q, B: r × s, C: s × t, and D: t × u are matrices and that multiplication with E is conformable. Then the following matrix equalities can be established:

  (a) ab' = (a ⊗ b') = (b' ⊗ a),
  (b) cA = (c ⊗ A) = (A ⊗ c),
  (c) E(A ⊗ BCD) = [vec'(C) ⊗ EK_{p,r}][D ⊗ vec(B) ⊗ A]K_{u,q},
  (d) (A ⊗ BCD)E = K_{p,r}[B ⊗ vec'(D') ⊗ A][vec(C') ⊗ K_{u,q}E],
  (e) E(BCD ⊗ A) = [vec'(C) ⊗ E][D ⊗ vec(B) ⊗ A],
  (f) (BCD ⊗ A)E = [B ⊗ vec'(D') ⊗ A][vec(C') ⊗ E], and        (103)
  (g) [A ⊗ vec'(B) ⊗ D] = [A ⊗ vec'(I_s) ⊗ vec'(I_r) ⊗ D] {(K_{q,s} ⊗ K_{s,r}) K_{qs,rs} [vec(B') ⊗ K_{s,q}] ⊗ I_{ru}}
                        = [A ⊗ vec'(I_s) ⊗ vec'(I_r) ⊗ D] (K_{sq,rs} ⊗ I_{ru}) [vec(B) ⊗ I_{qrsu}].

Boik (2008b, eq. A2, A3) verified (103a)-(103f). The equality in (103g) can be verified in a similar manner.
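Roth's identity (102) is the workhorse behind many later derivations; the following check (Python/NumPy, column-major vec; dimensions chosen arbitrarily) confirms it.

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.standard_normal((2, 3))
    B = rng.standard_normal((3, 4))
    C = rng.standard_normal((4, 5))
    vec = lambda X: X.reshape(-1, 1, order="F")

    # Roth (1934): vec(ABC) = (C' kron A) vec(B)
    assert np.allclose(vec(A @ B @ C), np.kron(C.T, A) @ vec(B))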

102 Expression for PCs Based on Model (19)

The vector of standardized correlation-based PCs for y_i is

  z_i = Λ^{−α} Γ' Ψ^{−1} (y_i − B'x_i),  where X = (x_1 ... x_N)',

and ε_i = y_i − B'x_i ~ iid (0, Σ) ⟹ z_i ~ iid (0, Λ^{1−2α}). Set α to 1/2 to obtain standardized PCs and set α to 0 to obtain raw PCs. The N × p matrix of PCs for the entire sample is

  Z = (Y − XB) Ψ^{−1} Γ Λ^{−α},  and  vec(Z) ~ (0, Λ^{1−2α} ⊗ I_N).
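In practice the population quantities are replaced by estimates. The sketch below (Python/NumPy; simple sample-based stand-ins for B, Ψ, Γ, and Λ — not the article's estimators) carries out the computation Z = (Y − XB)Ψ^{−1}ΓΛ^{−α} with α = 1/2, so that the sample covariance of Z is approximately Λ^{1−2α} = I_p.

    import numpy as np

    rng = np.random.default_rng(4)
    N, p = 200, 5
    Y = rng.standard_normal((N, p)) @ rng.standard_normal((p, p))
    X = np.ones((N, 1))                                  # intercept-only design

    B = np.linalg.lstsq(X, Y, rcond=None)[0]             # OLS estimate of B
    E = Y - X @ B                                        # residuals
    Psi = np.diag(E.std(axis=0, ddof=1))                 # marginal standard deviations
    R = np.corrcoef(E, rowvar=False)                     # sample correlation matrix
    lam, Gamma = np.linalg.eigh(R)
    lam, Gamma = lam[::-1], Gamma[:, ::-1]               # descending eigenvalues

    alpha = 0.5                                          # 1/2: standardized PCs; 0: raw PCs
    Z = E @ np.linalg.inv(Psi) @ Gamma @ np.diag(lam**(-alpha))
    print(np.cov(Z, rowvar=False).round(2))              # approximately I_p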

103 Simplified and Sparse Principal Components

The material in this section was not addressed in the article. Nonetheless, it is a relevant topic, so it is addressed in this Supplement. Simplified PCs of correlation matrices can be obtained in a variety of ways, including the following.

1. The elements in Γ̂ can be constrained to be equal or proportional to values in a small set, e.g., {−1, 0, 1} (Hausman, 1982; Vines, 2000; Rousson & Gasser, 2004; Chipman & Gu, 2005; Gallo, Amenta, & D'Ambra, 2006).

2. A penalty function can be appended to the usual variance maximization objective of PCs (Jolliffe & Uddin, 2000).

3. In addition to the unit-norm constraint, an upper bound can be imposed on the sum of the absolute values in each column of Γ̂ (Jolliffe, Uddin, & Vines, 2002; Jolliffe, Trendafilov, & Uddin, 2003; Hannachi, Jolliffe, Stephenson, & Trendafilov, 2006; Trendafilov & Jolliffe, 2006).

Simplified PCs of covariance matrices can be obtained in a variety of ways, including the following; a small numerical illustration follows the list.

1. The quantity tr(Γ̂_1' S Γ̂_1) can be maximized subject to

     tr(Γ̂_1 Γ̂_1') = 1,  dim(Γ̂_1) = p × r,  and  Σ_{i=1}^p Σ_{j=1}^p |e_i^{p'} Γ̂_1 Γ̂_1' e_j^p| ≤ k.

   See d'Aspremont, El Ghaoui, Jordan, and Lanckriet (2007).

2. Impose L1 and L2 constraints on the regression formulation of PCA on covariance matrices. That is, minimize

     Q = Σ_{i=1}^N ||(I_p − P)(y_i − ȳ)||_2² + λ||B||_2² + Σ_{j=1}^r λ_{1,j} ||B e_j||_1,

   where P = AΓ̂', dim(A) = p × r, A'A = I_r, and dim(Γ̂) = p × r (Zou, Hastie, & Tibshirani, 2006).

3. Impose a penalty function on the regression formulation of PCA. That is, minimize

     Q = ||(I_N − H_X)Y − UΓ̂_1'||_2² + P(Γ̂_1, U),

   where dim(U) = N × r and P(Γ̂_1, U) is a penalty term (Shen & Huang, 2006; Witten, Tibshirani, & Hastie, 2009).
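As a hedged, concrete illustration of the penalized-regression ideas in items 2-3, scikit-learn's SparsePCA fits an L1-penalized reconstruction problem in the same spirit; it is one convenient off-the-shelf implementation, not the specific estimators cited above, and the data below are simulated.

    import numpy as np
    from sklearn.decomposition import SparsePCA

    rng = np.random.default_rng(5)
    N, p, r = 300, 8, 2
    scores = rng.standard_normal((N, r))
    loadings = np.zeros((p, r))
    loadings[:4, 0] = 0.5                       # block-sparse true structure
    loadings[4:, 1] = 0.5
    Y = scores @ loadings.T + 0.1 * rng.standard_normal((N, p))

    spca = SparsePCA(n_components=r, alpha=1.0, random_state=0)
    spca.fit(Y - Y.mean(axis=0))
    print(np.round(spca.components_, 2))        # rows are sparse loading vectors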

104 Identified and Testable Constraints on Γ

104.1 Discussion of the Issues

It is assumed in this section that identification constraints such as Γ̇_j = GS(Γ̇_j Γ̇_j') are imposed if m_j = 1. If m_j > 1, then identification constraints are not imposed unless otherwise specified. Accordingly, Γ̇_j in (6) is identified if and only if m_j = 1.

The proposed models allow Γ to be constrained by C_γ' vec(Γ) = c_γ, where dim(C_γ) = p² × q_γ, dim(c_γ) = q_γ × 1, c_γ ∈ R(C_γ'), and {C_γ, c_γ} are known. For example, if C_γ = E_{1,f_2} ⊗ E_{2,f_1}, where f_1 = (p − q_1, q_1)', f_2 = (q_2, p − q_2)', and p ≥ q_1 + q_2, then C_γ' vec(Γ) = 0 states that the last q_1 variables have zero loadings on the first q_2 PCs. This is the redundancy constraint discussed in the context of PCA on Σ by BPH (eq. 4) and by Schott (1991b). A small numerical construction of this constraint is given at the end of this subsection. In this section, identifiability of C_γ' vec(Γ) and testability of C_γ' vec(Γ) = c_γ are discussed. Define J, d*, C̃_{γ,1}, and C̃_{γ,2} as

  J ≝ {j; j ∈ {1, 2, ..., d}, m_j > 1},  d* ≝ size(J),
  C̃_{γ,1} ≝ 0_{p²×q_γ} + Σ_{j∉J} (E_{j,m} ⊗ I_p) C_{γ,j},
  C_{γ,j} ≝ (E_{j,m}' ⊗ I_p) C_γ,  and  C̃_{γ,2} ≝ 0_{p²×q_γ} + Σ_{j∈J} (E_{j,m} ⊗ I_p) C_{γ,j}.        (104)

Then, C_γ = C̃_{γ,1} + C̃_{γ,2}, and C_γ' vec(Γ) is identified if and only if C̃_{γ,2} = 0. Nonetheless, C_γ' vec(Γ) = c_γ may be testable even if C_γ' vec(Γ) is not identified. For example, suppose that m_j > 1 and that K_j is a p × s matrix of known constants. Then K_j' Γ̇_j is not identified; yet K_j' Γ̇_j = 0 is testable because K_j' Γ̇_j = 0 ⟺ K_j' Γ̇_j Γ̇_j' = 0 and Γ̇_j Γ̇_j' = ppo(Γ̇_j) is identified.

Partition the set {1, 2, ..., p} into two subsets, namely W and its complement W̄. It follows from (7) that the projection operator P_W ≝ Σ_{j∈W} γ_j γ_j' is identified if and only if {i ∈ W, f ∈ W̄} ⟹ λ_i ≠ λ_f. More generally, consider a space for Σ such that Σ obeys an arbitrary eigen-model in which the eigenvalue multiplicity vector is m. Denote this space by S̃_Σ. That is,

  S̃_Σ ≝ {Σ; Σ = ΨΦΨ = Σ(θ), Φ = ΓΛΓ' ∈ C_p, θ ∈ Θ̃(m)}.        (105)

Then, H_0: Σ ∈ S̃_Σ, C_γ' vec(Γ) = c_γ is testable against H_a: Σ ∈ S̃_Σ, C_γ' vec(Γ) ≠ c_γ if and only if the model spaces {Σ ∈ S̃_Σ, C_γ' vec(Γ) = c_γ} and {Σ ∈ S̃_Σ, C_γ' vec(Γ) ≠ c_γ} are disjoint. Furthermore, these spaces are disjoint if and only if

  Σ ∈ S̃_Σ and C_γ' vec(Γ_0) = c_γ ⟹ C_γ' vec(Γ_0 Q) = C_γ' vec(Γ_0) ∀ Q ∈ O_m,        (106)

where O_m is defined in Table 3 and

  Γ_0 ≝ (Γ̇_{0,1} Γ̇_{0,2} ... Γ̇_{0,d}),  where Γ̇_{0,j} satisfies Γ̇_{0,j} = GS(Γ̇_j Γ̇_j') for j = 1, ..., d.        (107)

Theorem 103 gives a necessary and sufficient condition for (106) to be satisfied. In practice, it is difficult to determine whether or not the condition in Theorem 103 is satisfied because it depends on the space of Γ such that Σ = ΨΓΛΓ'Ψ ∈ S̃_Σ. Theorem 104 gives sufficient conditions for (106) to be satisfied. It is straightforward to determine whether or not the sufficient conditions are satisfied because they are conditions on C_γ alone, without reference to Γ.
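To make the redundancy-constraint example above concrete, the sketch below (Python/NumPy; illustrative dimensions) builds C_γ = E_{1,f_2} ⊗ E_{2,f_1} and confirms, via Roth's identity (102), that C_γ' vec(Γ) stacks exactly the loadings of the last q_1 variables on the first q_2 PCs.

    import numpy as np

    p, q1, q2 = 5, 2, 2
    I = np.eye(p)
    E2_f1 = I[:, p - q1:]                  # second block of I_p under f1 = (p - q1, q1)'
    E1_f2 = I[:, :q2]                      # first block of I_p under f2 = (q2, p - q2)'
    C_gamma = np.kron(E1_f2, E2_f1)        # p^2 x (q1 q2)

    rng = np.random.default_rng(6)
    Gamma, _ = np.linalg.qr(rng.standard_normal((p, p)))
    vec = lambda X: X.reshape(-1, 1, order="F")

    lhs = C_gamma.T @ vec(Gamma)
    sub = Gamma[p - q1:, :q2]              # loadings of last q1 variables on first q2 PCs
    assert np.allclose(lhs, vec(sub))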

Theorem 103. A necessary and sufficient condition for C_γ' vec(Γ) = c_γ to be testable in model S̃_Σ is

  Σ ∈ S̃_Σ and C_γ' vec(Γ_0) = c_γ ⟹ { C̃_{γ,1}' vec(Γ_0) = c_γ  and  F_j' Γ̇_{0,j} = 0 ∀ j ∈ J },
  where F_j ≝ dvec(C_{γ,j}; p, q_γ m_j),

Γ_0 is defined in (107), and C̃_{γ,1}, J, and C_{γ,j} are defined in (104).

In practice, many linear constraints of interest can be written as vec(A'ΓB) = c_γ, where (A, B, c_γ) are matrices of constants.

Corollary 103.1.
(a) Consider fixed matrices A and B with dimensions dim(A) = p × a and dim(B) = p × b. The hypothesis A'ΓB = M_γ is testable if and only if

  Σ ∈ S̃_Σ and (B ⊗ A)' vec(Γ_0) = vec(M_γ) ⟹ { A'Γ_0 B̃_1 = M_γ  and  A'Γ̇_{0,j} ⊗ B_j = 0 ∀ j ∈ J },

where B_j = E_{j,m}'B, B̃_1 = 0_{p×b} + Σ_{j∉J} E_{j,m}B_j, Γ_0 is defined in (107), and J is defined in (104).

(b) If m_j > 1, then any testable hypothesis about a linear function of vec(Γ̇_j) can be written as K_j' Γ̇_j = 0 for some K_j, where K_j is a matrix of constants.

Theorem 104. The following conditions are jointly sufficient for C_γ' vec(Γ) = c_γ to be testable:

  (a) c_γ ∈ R(C̃_{γ,1}')  and  (b) R(C_γ) = R[(Q ⊗ I_p) C_γ] ∀ Q ∈ O_m.

Furthermore, (b) is satisfied if and only if both (c) and (d) are satisfied, where

  (c) C̃_{γ,1}(C_γ'C_γ)^{-1} C̃_{γ,2}' = 0,  or  R(C_γ) = R[(C̃_{γ,1}  C̃_{γ,2})],
      or  ppo(C_γ) = ppo(C̃_{γ,1}) + ppo(C̃_{γ,2}),  and

  (d) R(C̃_{γ,2}) = R[(1_{d*}' ⊗ I_{p²}) ⊕_{j∈J} (E_{j,m} ⊗ F_j)]  or  ppo(C̃_{γ,2}) = Σ_{j∈J} (E_{j,m}E_{j,m}' ⊗ H_j),

where F_j is defined in Theorem 103 and

  H_j ≝ (1/m_j) Σ_{i=1}^{m_j} (E_{j,m} e_i^{m_j} ⊗ I_p)' ppo(C_γ) (E_{j,m} e_i^{m_j} ⊗ I_p).

Corollary 104.1. Suppose that A is a p × s full column-rank matrix, B is a p × t full column-rank matrix, and c_γ is an st-vector. Then, vec(A'ΓB) = c_γ is testable if

  (a) c_γ ∈ R(B̃_1' ⊗ A'),
  (b) j ∈ J and E_{j,m}'B ≠ 0 ⟹ E_{j,m}'B has full row-rank,  and
  (c) B̃_1 (B'B)^{-1} B̃_2' = 0,

where B̃_j = M_j B, M_1 = 0_{p×p} + Σ_{j∉J} E_{j,m}E_{j,m}', and M_2 = I_p − M_1. Note that the condition in part (c) can be written as (I_p − M_1) ppo(B) M_1 = 0.

104.2 Proofs of Theorem 103 and Corollary 103.1

104.2.1 Preliminary Lemmas

Lemma 100. If B is a b × b matrix, then tr(BQ) = 0 ∀ Q ∈ O_b ⟹ B = 0.

Proof. Write B as B = SVD(B) = UDV'. Then,

  tr(BQ) = 0 ∀ Q ∈ O_b ⟹ tr(UDV'Q) = 0 ∀ Q ∈ O_b
  ⟹ tr(UDV'VU') = 0  because VU' ∈ O_b
  ⟹ tr(D) = 0
  ⟹ D = 0  because D = diag(d_1, d_2, ..., d_b) and d_i ≥ 0 for i = 1, ..., b
  ⟹ B = 0.

Lemma 101. If z is a b × 1 vector, then Qz = z ∀ Q ∈ O_b ⟹ z = 0.

Proof. Suppose that z ≠ 0. Choose Q = PQ_1, where P ∈ P_b is arbitrary and the first row of Q_1 ∈ O_b is (z'z)^{-1/2} z'. Then,

  Qz = z ∀ Q ∈ O_b ⟹ PQ_1 z = z ∀ P ∈ P_b
  ⟹ P e_1^b (z'z)^{1/2} = z ∀ P ∈ P_b
  ⟹ P e_1^b = z(z'z)^{-1/2} ∀ P ∈ P_b
  ⟹ z(z'z)^{-1/2} = e_1^b = e_2^b = ... = e_b^b,

which cannot be true. Accordingly, z = 0 must be true.

Lemma 102. If B is an a × b matrix, then Q_a' B Q_b = B ∀ Q_a ∈ O_a and ∀ Q_b ∈ O_b ⟹ B = 0.

Proof. Write B as B = SVD(B) = UDV'. Choose Q_a = UP_a' and Q_b = VP_b', where P_a ∈ P_a and P_b ∈ P_b are arbitrary. Then,

  Q_a' B Q_b = B ∀ Q_a ∈ O_a and ∀ Q_b ∈ O_b ⟹ P_a U'BV P_b' = B ∀ P_a ∈ P_a and ∀ P_b ∈ P_b
  ⟹ P_a D P_b' = B ∀ P_a ∈ P_a and ∀ P_b ∈ P_b
  ⟹ B = 0,

because a non-diagonal element of D (a zero) can be permuted to any location in the a × b matrix P_a D P_b'. Note that Lemma 101 is a corollary of Lemma 102.

Lemma 103. If B is a b × b symmetric matrix, then Q'BQ = B ∀ Q ∈ O_b ⟹ B = ωI_b, where ω is a scalar.

Proof. Write B in diagonal form as B = UDU', where U ∈ O_b and D is diagonal. Choose Q = UP', where P ∈ P_b is arbitrary. Then,

  Q'BQ = B ∀ Q ∈ O_b ⟹ PU'BUP' = B ∀ P ∈ P_b
  ⟹ PDP' = B ∀ P ∈ P_b
  ⟹ B = D and D = ωI_b,

where ω is a scalar, because PDP' permutes the diagonal elements of D.

104.2.2 Theorem 103

Theorem 103. A necessary and sufficient condition for C_γ' vec(Γ) = c_γ to be testable in model S̃_Σ is

  Σ ∈ S̃_Σ and C_γ' vec(Γ_0) = c_γ ⟹ { C̃_{γ,1}' vec(Γ_0) = c_γ  and  F_j' Γ̇_{0,j} = 0 ∀ j ∈ J },
  where F_j ≝ dvec(C_{γ,j}; p, q_γ m_j),

Γ_0 is defined in (107), and C̃_{γ,1}, J, and C_{γ,j} are defined in (104).

Proof. If d* = 0, then Γ is identified and, therefore, C_γ' vec(Γ) = c_γ is testable. Otherwise, denote the distinct values in J as J(1), J(2), ..., J(d*). If C_γ' vec(Γ) = c_γ is testable, then

  Σ ∈ S̃_Σ ⟹ C_γ' vec(Γ_0 Q) = c_γ ∀ Q ∈ O_m
  ⟹ Σ_{j=1}^d C_{γ,j}' vec(Γ̇_{0,j}) = Σ_{j=1}^d C_{γ,j}' vec(Γ̇_{0,j} Q_j),

where Q_j = 1 if m_j = 1 and Q_j is an arbitrary matrix in O_{m_j} otherwise. Choose a value s ∈ J and let Q_j = I_{m_j} ∀ j ≠ s. Then

  Σ_{j=1}^d C_{γ,j}' vec(Γ̇_{0,j}) = Σ_{j≠s} C_{γ,j}' vec(Γ̇_{0,j}) + C_{γ,s}' vec(Γ̇_{0,s} Q_s) ∀ Q_s ∈ O_{m_s}
  ⟹ C_{γ,s}' vec(Γ̇_{0,s}) = C_{γ,s}' vec(Γ̇_{0,s} Q_s) ∀ Q_s ∈ O_{m_s}.

Now choose Q_s = −I_{m_s}. Then,

  C_{γ,s}' vec(Γ̇_{0,s}) = −C_{γ,s}' vec(Γ̇_{0,s}) ⟹ C_{γ,s}' vec(Γ̇_{0,s}) = 0 ∀ s ∈ J
  ⟹ C_{γ,j}' vec(Γ̇_{0,j} Q_j) = 0 ∀ j ∈ J and ∀ Q_j ∈ O_{m_j}
  ⟹ e_t^{q_γ '} C_{γ,j}' vec(Γ̇_{0,j} Q_j) = 0 ∀ j ∈ J, ∀ Q_j ∈ O_{m_j}, and for t = 1, 2, ..., q_γ
  ⟹ vec'(F_{jt}) vec(Γ̇_{0,j} Q_j) = 0 ∀ j ∈ J, ∀ Q_j ∈ O_{m_j}, and for t = 1, 2, ..., q_γ,
      where F_{jt} = dvec(C_{γ,j} e_t^{q_γ}; p, m_j)
  ⟹ tr(F_{jt}' Γ̇_{0,j} Q_j) = 0 ∀ j ∈ J, ∀ Q_j ∈ O_{m_j}, and for t = 1, 2, ..., q_γ
  ⟹ F_{jt}' Γ̇_{0,j} = 0 ∀ j ∈ J and for t = 1, 2, ..., q_γ, using Lemma 100
  ⟹ F_j' Γ̇_{0,j} = 0 ∀ j ∈ J, where F_j ≝ (F_{j1} F_{j2} ... F_{jq_γ}) = dvec(C_{γ,j}; p, q_γ m_j)
  ⟹ (E_{j,m} ⊗ F_j)' vec(Γ_0) = 0 ∀ j ∈ J
  ⟹ F' vec(Γ_0) = 0, where
      F ≝ (E_{J(1),m} ⊗ F_{J(1)}  ...  E_{J(d*),m} ⊗ F_{J(d*)}) = (1_{d*}' ⊗ I_{p²}) ⊕_{j∈J} (E_{j,m} ⊗ F_j).

Also, note that if C_γ' vec(Γ) = c_γ is testable, then

  C_γ' vec(Γ_0 Q) = c_γ ∀ Q ∈ O_m
  ⟹ Σ_{j=1}^d C_γ'(E_{j,m}E_{j,m}' ⊗ I_p) vec(Γ_0) = c_γ, because I_p ∈ O_m
  ⟹ Σ_{j∉J} C_γ'(E_{j,m}E_{j,m}' ⊗ I_p) vec(Γ_0) = c_γ,
      because j ∈ J ⟹ C_γ'(E_{j,m}E_{j,m}' ⊗ I_p) vec(Γ_0) = C_{γ,j}' vec(Γ̇_{0,j}) = 0
  ⟹ C̃_{γ,1}' vec(Γ_0) = c_γ.

Accordingly, the claimed condition is necessary for C_γ' vec(Γ) = c_γ to be testable.

To verify that the claimed condition is sufficient, suppose that the condition is satisfied. Pick any Σ ∈ S̃_Σ. Then,

  C̃_{γ,1}' vec(Γ_0) = c_γ and F' vec(Γ_0) = 0, for every Γ_0 such that Σ = ΨΓΛΓ'Ψ ∈ S̃_Σ(Θ, C_γ, c_γ)
  ⟹ C̃_{γ,1}' vec(Γ_0) = c_γ and F_j' Γ̇_{0,j} = 0 ∀ j ∈ J
  ⟹ C̃_{γ,1}' vec(Γ_0) = c_γ and F_j' Γ̇_{0,j} Q_j = 0 ∀ j ∈ J and ∀ Q_j ∈ O_{m_j}
  ⟹ C̃_{γ,1}' vec(Γ_0) = c_γ and F_{jt}' Γ̇_{0,j} Q_j = 0 ∀ j ∈ J, ∀ Q_j ∈ O_{m_j}, and for t = 1, 2, ..., q_γ
  ⟹ C̃_{γ,1}' vec(Γ_0) = c_γ and tr(F_{jt}' Γ̇_{0,j} Q_j) = 0 ∀ j ∈ J, ∀ Q_j ∈ O_{m_j}, and for t = 1, 2, ..., q_γ
  ⟹ C̃_{γ,1}' vec(Γ_0) = c_γ and e_t^{q_γ '} C_{γ,j}' vec(Γ̇_{0,j} Q_j) = 0 ∀ j ∈ J, ∀ Q_j ∈ O_{m_j}, and for t = 1, 2, ..., q_γ
  ⟹ C̃_{γ,1}' vec(Γ_0) = c_γ and C_{γ,j}' vec(Γ̇_{0,j} Q_j) = 0 ∀ j ∈ J and ∀ Q_j ∈ O_{m_j}
  ⟹ C̃_{γ,1}' vec(Γ_0) = c_γ and Σ_{j∈J} C_{γ,j}' vec(Γ̇_{0,j} Q_j) = 0 ∀ Q_j ∈ O_{m_j}
  ⟹ C̃_{γ,1}' vec(Γ_0) = c_γ and C̃_{γ,2}' vec(Γ_0 Q) = 0 ∀ Q ∈ O_m
  ⟹ C_γ' vec(Γ_0 Q) = c_γ ∀ Q ∈ O_m
  ⟹ Σ ∈ S̄_Σ
  ⟹ C_γ' vec(Γ) = c_γ is testable.

104.2.3 Corollary 103.1

Corollary 103.1.
(a) Consider fixed matrices A and B with dimensions dim(A) = p × a and dim(B) = p × b. The hypothesis A'ΓB = M_γ is testable if and only if

  Σ ∈ S̃_Σ and (B ⊗ A)' vec(Γ_0) = vec(M_γ) ⟹ { A'Γ_0 B̃_1 = M_γ  and  A'Γ̇_{0,j} ⊗ B_j = 0 ∀ j ∈ J },

where B_j = E_{j,m}'B, B̃_1 = 0_{p×b} + Σ_{j∉J} E_{j,m}B_j, Γ_0 is defined in (107), and J is defined in (104).

(b) If m_j > 1, then K_j' Γ̇_j = 0 is testable, and any testable hypothesis about a linear function of Γ̇_j can be written as K_j' Γ̇_j = 0 for some K_j, where K_j is a matrix of constants.

Proof. Verification of Part (a). Part (a) can be verified by using Theorem 103 in which C_γ = B ⊗ A and showing that

  (a)(i) C̃_{γ,1}' vec(Γ_0) = c_γ ⟺ A'Γ_0 B̃_1 = M_γ, and
  (a)(ii) F_j' Γ̇_{0,j} = 0 ∀ j ∈ J ⟺ A'Γ̇_{0,j} ⊗ B_j = 0 ∀ j ∈ J.

Relation (a)(i) is established by writing C_γ as B ⊗ A to obtain C̃_{γ,1} = B̃_1 ⊗ A. To establish (a)(ii), write C_{γ,j} and F_j as follows:

  C_{γ,j} = (E_{j,m}' ⊗ I_p)(B ⊗ A) = E_{j,m}'B ⊗ A = B_j ⊗ A  and

  F_j = dvec(C_{γ,j}; p, ab m_j)
      = dvec[ Σ_{i=1}^b Σ_{f=1}^a (B_j e_i^b ⊗ A e_f^a)(e_i^b ⊗ e_f^a)'; p, ab m_j ]
      = Σ_{i=1}^b Σ_{f=1}^a A e_f^a (e_i^b ⊗ e_f^a ⊗ B_j e_i^b)'
      = Σ_{i=1}^b A (e_i^b ⊗ I_a ⊗ B_j e_i^b)'.

Accordingly,

  F_j' Γ̇_{0,j} = Σ_{i=1}^b (e_i^b ⊗ I_a ⊗ B_j e_i^b) A' Γ̇_{0,j},  and

  F_j' Γ̇_{0,j} = 0 ∀ j ∈ J ⟺ Σ_{i=1}^b (e_i^b ⊗ I_a ⊗ B_j e_i^b) A'Γ̇_{0,j} = 0 ∀ j ∈ J
  ⟺ (I_a ⊗ B_j e_i^b) A'Γ̇_{0,j} = 0 ∀ j ∈ J and for i = 1, 2, ..., b
  ⟺ A'Γ̇_{0,j} ⊗ B_j e_i^b = 0 ∀ j ∈ J and for i = 1, 2, ..., b
  ⟺ A'Γ̇_{0,j} ⊗ B_j = 0 ∀ j ∈ J.

Verification of Part (b). Note that

  { C̃_{γ,1}' vec(Γ_0) = c_γ and F_j' Γ̇_{0,j} = 0 ∀ j ∈ J } ⟹ C_γ' vec(Γ_0) = c_γ.

Accordingly, it follows from Theorem 103 that a necessary and sufficient condition for C_γ' vec(Γ) = c_γ to be testable in model S̃_Σ is

  Σ ∈ S̃_Σ and C_γ' vec(Γ_0) = c_γ ⟺ { C̃_{γ,1}' vec(Γ_0) = c_γ and F_j' Γ̇_{0,j} = 0 ∀ j ∈ J }.

First, suppose that the constraint of interest is C_γ' vec(Γ̇_j) = c_γ. It follows from Theorem 103 that if m_j > 1, then the constraint C_γ' vec(Γ̇_j) = c_γ is testable in model S̃_Σ if and only if

  Σ ∈ S̃_Σ and C_γ' vec(Γ̇_{0,j}) = c_γ ⟺ c_γ = 0 and K_j' Γ̇_{0,j} = 0,

where K_j = dvec(C_γ; p, q_γ m_j). The conclusion that c_γ = 0 follows because C̃_{γ,1} = 0. Accordingly, if the constraint is testable, then it can be written as K_j' Γ̇_{0,j} = 0.

Second, suppose that the constraint of interest is K_j' Γ̇_{0,j} = 0. It follows from Theorem 103 that the constraint K_j' Γ̇_{0,j} = 0 is testable because

  K_j' Γ̇_{0,j} = 0 ⟺ C_γ' vec(Γ_0) = 0, where C_γ = E_{j,m} ⊗ K_j
  ⟺ F_j' Γ̇_j = 0, where F_j = dvec(C_{γ,j}; p, q_γ m_j),
  because vec(Γ̇_j' F_j) = vec[(I_{m_j} ⊗ Γ̇_j') K_j].

104.3 Proofs of Theorem 104 and Corollary 104.1

104.3.1 Preliminary Lemmas

Lemma 104. Let B be an s × t matrix with rank t. Suppose that B = B_1 + B_2 and that B_1'B_2 = 0. Then each of the following equalities implies the other three:

  (a) ppo(B) = ppo(B_1) + ppo(B_2),
  (b) B_1 (B'B)^{-1} B_2' = 0,
  (c) B_j (B'B)^{-1} B_j' = B_j (B_j'B_j)⁻ B_j' for j = 1, 2, and
  (d) R(B) = R[(B_1  B_2)].

Proof. Part 1: (a) ⟹ (b), (c), and (d).

  (a) ⟹ B(B'B)^{-1}B' = ppo(B_1) + ppo(B_2)
  ⟹ B_1(B'B)^{-1}B_1' + B_2(B'B)^{-1}B_2' + B_1(B'B)^{-1}B_2' + B_2(B'B)^{-1}B_1' = ppo(B_1) + ppo(B_2)
  ⟹ ppo(B_j) [B_1(B'B)^{-1}B_1' + B_2(B'B)^{-1}B_2' + B_1(B'B)^{-1}B_2' + B_2(B'B)^{-1}B_1'] ppo(B_j)
      = ppo(B_j) [ppo(B_1) + ppo(B_2)] ppo(B_j) = ppo(B_j) for j = 1, 2
  ⟹ B_j(B'B)^{-1}B_j' = ppo(B_j) for j = 1, 2, because B_1'B_2 = 0
  ⟹ B_1(B'B)^{-1}B_2' + B_2(B'B)^{-1}B_1' = 0
  ⟹ [B_1(B'B)^{-1}B_2' + B_2(B'B)^{-1}B_1'] ppo(B_2) = 0
  ⟹ B_1(B'B)^{-1}B_2' = 0, because B_1'B_2 = 0.

To verify that (a) ⟹ (d), note that

  ppo[(B_1  B_2)] = ppo(B_1) + ppo(B_2) because B_1'B_2 = 0 ⟹ R(B) = R[(B_1  B_2)],

by the uniqueness of projection operators.

Part 2: (b) ⟹ (a), (c), and (d).

  (b) ⟹ ppo(B) = B_1(B'B)^{-1}B_1' + B_2(B'B)^{-1}B_2'
  ⟹ B_j(B'B)^{-1}B_j'B_j = B_j for j = 1, 2, because B_1'B_2 = 0 and ppo(B)B = B
  ⟹ B_j(B'B)^{-1}B_j' = [B_j(B'B)^{-1}B_j']² for j = 1, 2, because ppo(B) ppo(B) = ppo(B)
  ⟹ ppo(B_j) = B_j(B'B)^{-1}B_j' for j = 1, 2,
      because B_j(B'B)^{-1}B_j'B_j = B_j ⟹ R[B_j(B'B)^{-1}B_j'] = R(B_j)
  ⟹ (a) ⟹ (c) and (d).

Part 3: (c) ⟹ (a), (b), and (d).

  (c) ⟹ ppo(B) = ppo(B_1) + ppo(B_2) + B_1(B'B)^{-1}B_2' + B_2(B'B)^{-1}B_1'
  ⟹ [ppo(B) − ppo(B_1) − ppo(B_2)] B = [B_1(B'B)^{-1}B_2' + B_2(B'B)^{-1}B_1'] B = 0
  ⟹ B_1(B'B)^{-1}B_2'B_2 = −B_2(B'B)^{-1}B_1'B_1
  ⟹ B_1(B'B)^{-1}B_2'B_2 = B_2(B'B)^{-1}B_1'B_1 = 0, because R(B_1) ⊥ R(B_2)
  ⟹ B_1(B'B)^{-1}B_2' = 0, because AX'X = 0 ⟹ AX = 0 for any conformable A, X
  ⟹ (b) ⟹ (a) and (d).

Part 4: (d) ⟹ (a), (b), and (c).

  (d) ⟹ ppo(B) = ppo[(B_1  B_2)] ⟹ (a), because B_1'B_2 = 0 ⟹ (b) and (c).

Lemma 105. Suppose that L is an ab × c full row-rank matrix, where a, b, and c are positive integers. Then L̃ ≝ dvec(L; a, bc) also has full row-rank.

Proof. Suppose that t is an a × 1 vector that satisfies t'L̃ = 0. Then,

  t'L̃ = 0 ⟹ (I_c ⊗ I_b ⊗ t') vec(L̃) = 0
  ⟹ (I_c ⊗ I_b ⊗ t') vec(L) = 0
  ⟹ (I_b ⊗ t') L = 0
  ⟹ t = 0, because L has full row-rank
  ⟹ L̃ has full row-rank.

104.3.2 Theorem 104

Theorem 104. The following conditions are jointly sufficient for C_γ' vec(Γ) = c_γ to be testable:

  (a) c_γ ∈ R(C̃_{γ,1}')  and  (b) R(C_γ) = R[(Q ⊗ I_p) C_γ] ∀ Q ∈ O_m.

Furthermore, condition (b) is satisfied if and only if (c) and (d) are satisfied, where

  (c) C̃_{γ,1}(C_γ'C_γ)^{-1} C̃_{γ,2}' = 0,  or  R(C_γ) = R[(C̃_{γ,1}  C̃_{γ,2})],
      and ppo(C_γ) = ppo(C̃_{γ,1}) + ppo(C̃_{γ,2}),  and

  (d) R(C̃_{γ,2}) = R(F), where F is defined in Theorem 103,  or
      ppo(C̃_{γ,2}) = Σ_{j∈J} (E_{j,m}E_{j,m}' ⊗ H_j),  where
      H_j = m_j^{-1} Σ_{i=1}^{m_j} (E_{j,m} e_i^{m_j} ⊗ I_p)' ppo(C_γ) (E_{j,m} e_i^{m_j} ⊗ I_p).

Proof. First, some consequences of (b) will be established:

  (b) ⟺ ppo(C_γ) = (Q ⊗ I_p) ppo(C_γ) (Q' ⊗ I_p) ∀ Q ∈ O_m

  ⟺ (E_{f,m} ⊗ e_g^p)' ppo(C_γ)(E_{h,m} ⊗ e_i^p) = (E_{f,m}Q_f ⊗ e_g^p)' ppo(C_γ)(E_{h,m}Q_h ⊗ e_i^p)
      = Q_f'(E_{f,m} ⊗ e_g^p)' ppo(C_γ)(E_{h,m} ⊗ e_i^p) Q_h
      ∀ Q ∈ O_m, f = 1, ..., d, g = 1, ..., p, h = 1, ..., d, and i = 1, ..., p

  ⟺ (E_{f,m} ⊗ e_g^p)' ppo(C_γ)(E_{h,m} ⊗ e_i^p) =
      { 0                if f ∈ J, h ∉ J, or f ∉ J, h ∈ J, using Lemma 101,
        0                if f ∈ J, h ∈ J, f ≠ h, using Lemma 102,
        ω_{fgi} I_{m_f}  if f ∈ J, f = h, using Lemma 103,
        ω_{fghi}         if f ∉ J, h ∉ J, }

where ω_{fgi} is a scalar yet to be determined and

  ω_{fghi} = (E_{f,m} ⊗ e_g^p)' ppo(C_γ)(E_{h,m} ⊗ e_i^p) for f ∉ J, h ∉ J

  ⟺ ppo(C_γ) = Σ_{f=1}^d Σ_{g=1}^p Σ_{h=1}^d Σ_{i=1}^p (E_{f,m}E_{f,m}' ⊗ e_g^p e_g^{p'}) ppo(C_γ)(E_{h,m}E_{h,m}' ⊗ e_i^p e_i^{p'})
      = Σ_{f∈J} Σ_{g=1}^p Σ_{i=1}^p (E_{f,m} ⊗ e_g^p) ω_{fgi} I_{m_f} (E_{f,m}' ⊗ e_i^{p'})
        + Σ_{f∉J} Σ_{g=1}^p Σ_{h∉J} Σ_{i=1}^p (E_{f,m} ⊗ e_g^p) ω_{fghi} (E_{h,m}' ⊗ e_i^{p'})
      = H̃_1 + H̃_2,  where

  H̃_1 = Σ_{f∉J} Σ_{h∉J} (E_{f,m}E_{h,m}' ⊗ H_{fh}),   H̃_2 = Σ_{f∈J} (E_{f,m}E_{f,m}' ⊗ H_f),

  H_{fh} = Σ_{g=1}^p Σ_{i=1}^p e_g^p ω_{fghi} e_i^{p'} = (E_{f,m} ⊗ I_p)' ppo(C_γ)(E_{h,m} ⊗ I_p),  and

  H_f = Σ_{g=1}^p Σ_{i=1}^p e_g^p ω_{fgi} e_i^{p'} = m_f^{-1} Σ_{i=1}^{m_f} (E_{f,m}e_i^{m_f} ⊗ I_p)' ppo(C_γ)(E_{f,m}e_i^{m_f} ⊗ I_p).

Properties of H̃_j for j = 1, 2 include the following:

  (i) H̃_1 H̃_2 = 0,

  (ii) H̃_j² = H̃_j, because (H̃_1 + H̃_2)² = H̃_1 + H̃_2 ⟹ H̃_1² + H̃_2² = H̃_1 + H̃_2
       ⟹ H̃_1(I_{p²} − H̃_1) = H̃_2(H̃_2 − I_{p²})
       ⟹ H̃_1(I_{p²} − H̃_1) = 0 and H̃_2(H̃_2 − I_{p²}) = 0, using R(H̃_1) ⊥ R(H̃_2),

  (iii) H̃_j = ppo(C̃_{γ,j}) for j = 1, 2, because
       { ppo(C_γ)C_γ = C_γ, H̃_1 C̃_{γ,2} = 0, H̃_2 C̃_{γ,1} = 0 } ⟹ H̃_j C̃_{γ,j} = C̃_{γ,j} for j = 1, 2, and
       (H̃_1 + H̃_2)H̃_j = ppo(C_γ)H̃_j = H̃_j ⟹ R(H̃_j) ⊆ R(C̃_{γ,j}) ⟹ rk(H̃_j) ≤ rk(C̃_{γ,j}) for j = 1, 2, and

  (iv) ppo(C̃_{γ,1}) is not constrained, because
       ppo(C̃_{γ,1}) = Σ_{f∉J} Σ_{h∉J} (E_{f,m}E_{f,m}' ⊗ I_p) C_γ (C̃_{γ,1}'C̃_{γ,1})⁻ C_γ' (E_{h,m}E_{h,m}' ⊗ I_p)
         = Σ_{f∉J} Σ_{h∉J} (E_{f,m}E_{h,m}' ⊗ H_{fh}),  where
       H_{fh} = (E_{f,m}' ⊗ I_p) C_γ (C̃_{γ,1}'C̃_{γ,1})⁻ C_γ' (E_{h,m} ⊗ I_p) = (E_{f,m}' ⊗ I_p) C_γ (C_γ'C_γ)^{-1} C_γ' (E_{h,m} ⊗ I_p),

using Lemma 104. Accordingly,

  (b) ⟹ { R(C_γ) = R[(C̃_{γ,1}  C̃_{γ,2})]  and  ppo(C̃_{γ,2}) = Σ_{f∈J} (E_{f,m}E_{f,m}' ⊗ H_f) },

because ppo(C_γ) = ppo(C̃_{γ,1}) + ppo(C̃_{γ,2}) ⟹ R(C_γ) = R[(C̃_{γ,1}  C̃_{γ,2})], using Lemma 104. It also is true that

  (b) ⟸ { R(C_γ) = R[(C̃_{γ,1}  C̃_{γ,2})]  and  ppo(C̃_{γ,2}) = Σ_{f∈J} (E_{f,m}E_{f,m}' ⊗ H_f) },

because ppo(C̃_{γ,1}) is not constrained,

  { R(C_γ) = R[(C̃_{γ,1}  C̃_{γ,2})], C̃_{γ,1}'C̃_{γ,2} = 0 } ⟹ ppo(C_γ) = ppo(C̃_{γ,1}) + ppo(C̃_{γ,2}),
  using Lemma 104,

  Q [Σ_{f∈J} (E_{f,m}E_{f,m}' ⊗ H_f)] Q' = Σ_{f∈J} (E_{f,m}Q_fQ_f'E_{f,m}' ⊗ H_f)
    = Σ_{f∈J} (E_{f,m}E_{f,m}' ⊗ H_f) ∀ Q ∈ O_m,  and

  Q [Σ_{f∉J} Σ_{h∉J} (E_{f,m}E_{h,m}' ⊗ H_{fh})] Q' = Σ_{f∉J} Σ_{h∉J} (E_{f,m}E_{h,m}' ⊗ H_{fh}) ∀ Q ∈ O_m,

because f ∉ J ⟹ QE_{f,m} = E_{f,m}. Note that

  R(C_γ) = R[(C̃_{γ,1}  C̃_{γ,2})] ⟹ (C̃_{γ,1}  C̃_{γ,2}) = C_γ U,  where
  U = (C_γ'C_γ)^{-1} C_γ' (C̃_{γ,1}  C̃_{γ,2}) = (C_γ'C_γ)^{-1} (C_γ'C̃_{γ,1}  C_γ'C̃_{γ,2}).

Use c_γ ∈ R(C̃_{γ,1}') to write c_γ as c_γ = C̃_{γ,1}'b for some b ∈ R^{p²}. Choose any Σ = ΨΓΛΓ'Ψ ∈ S̃_Σ. If (a) and (b) are satisfied, then

  C_γ' vec(Γ) = c_γ ⟹ U'C_γ' vec(Γ) = U'c_γ

  ⟹ ( C̃_{γ,1}' ; C̃_{γ,2}' ) vec(Γ) = ( C̃_{γ,1}'C̃_{γ,1} ; C̃_{γ,2}'C̃_{γ,1} ) (C_γ'C_γ)^{-1} C̃_{γ,1}'b = ( c_γ ; 0 ),

      because C̃_{γ,1}(C_γ'C_γ)^{-1}C̃_{γ,1}' = ppo(C̃_{γ,1}) and C̃_{γ,2}(C_γ'C_γ)^{-1}C̃_{γ,1}' = 0, using Lemma 104

  ⟹ ( C̃_{γ,1}' ; H̃_2 ) vec(Γ) = ( c_γ ; 0 ),  because R(C̃_{γ,2}) = R(H̃_2)

  ⟹ ( C̃_{γ,1}' ; H̃_2 ) vec(ΓQ) = ( c_γ ; 0 ) ∀ Q ∈ O_m,

      because (Q ⊗ I_p)'C̃_{γ,1} = C̃_{γ,1} and
      { H̃_2 vec(Γ) = 0 ⟹ H_j Γ̇_j = 0 ∀ j ∈ J ⟹ H_j Γ̇_j Q_j = 0 ∀ j ∈ J and ∀ Q_j ∈ O_{m_j}
        ⟹ H̃_2 vec(ΓQ) = 0 ∀ Q ∈ O_m }

  ⟹ C̃_{γ,2}' vec(ΓQ) = 0 ∀ Q ∈ O_m, because R(C̃_{γ,2}) = R(H̃_2)
  ⟹ C_γ' vec(ΓQ) = c_γ ∀ Q ∈ O_m
  ⟹ Σ ∈ S̄_Σ ⟹ S̄_Σ = S̃_Σ ⟹ C_γ' vec(Γ) = c_γ is testable.

It follows from Lemma 104 that the three conditions in (c) are equivalent. It can be shown that the two conditions in (d) also are equivalent. First note that

  R(C̃_{γ,2}) = R(F) ⟹ ppo(C̃_{γ,2}) = F(F'F)⁻F' = F [ ⊕_{j∈J} (I_{m_j} ⊗ F_j'F_j) ]⁻ F'

f f f f

∈ J, ∈ J, ∈ J, 6∈ J ,

h 6∈ J or f 6∈ J , h ∈ J using Lemma 101, h ∈ J , f 6= h using Lemma 102, f = h using Lemma 103 h 6∈ J

where ωf gi is a scalar yet to be determined and 0 ωf ghi = Ef,m ⊗ epg ppo(Cγ ) (Eh,m ⊗ epi ) for f 6∈ J , h 6∈ J

⇐⇒ ppo(Cγ ) =

p p X d X d X X

 p p0  0 Ef,m E0f,m ⊗ epg ep0 g ppo(Cγ ) Eh,m Eh,m ⊗ ei ei

f =1 g=1 h=1 i=1

=

p X X p XX

p X p    XX  + Ef,m ⊗ epg ωf gi Imf E0f,m ⊗ ep0 Ef,m ⊗ epg ωf ghi E0h,m ⊗ ep0 i i f ∈J g=1 i=1

f 6∈J g=1 h6∈J i=1

e1 +H e 2 , where =H e1 = H

X X f 6∈J h6∈J

 Ef,m E0h,m ⊗ Hf h ,

e2 = H

X f ∈J

 Ef,m E0f,m ⊗ Hf ,

Details30

Hf h =

p X p X

0

epg ωf ghi ep0 i = (Ef,m ⊗ Ip ) ppo(Cγ ) (Eh,m ⊗ Ip ) and

g=1 i=1

Hf =

p X p X

epg ωf gi ep0 i

=

g=1 i=1

m−1 f

mf X

mj

Ef,m ei

0  m ⊗ Ip ppo (Cγ ) Ef,m ei j ⊗ Ip .

i=1

e j for j = 1, 2 include the following: Properties of H e 1H e 2 = 0, (i) H e2 = H e j because (ii) H j 

e1 +H e2 H

2

    e1 +H e 2 =⇒ H e1 +H e2 = H e2 +H e 2 =⇒ H e 1 I p2 − H e1 = H e2 H e 2 − Ip2 =H 1 2

        e 1 I p2 − H e 1 = 0 and H e2 H e 2 − Ip2 = 0 using R H e1 ⊥ R H e2 , =⇒ H   e γ,j for j = 1, 2 because e j = ppo C (iii) H n o e 1C e γ,2 = 0, H e 2C e γ,1 = 0 =⇒ H e jC e γ,j = C e γ,j for j = 1, 2, and ppo(Cγ )Cγ = Cγ , H       e1 +H e2 H ej = H e j =⇒ R H ej ⊆ R C ej = H e γ,j ppo(Cγ )H     e j ≤ rk C e γ,j for j = 1, 2, =⇒ rk H and   e γ,1 is not constrained because (iv) ppo C   X X  −   e γ,1 = e0 C e ppo C Ef,m E0f,m ⊗ Ip Cγ C Cγ0 Eh,m E0h,m ⊗ Ip γ,1 γ,1 f 6∈J h6∈J

=

X X

 Ef,m E0h,m ⊗ Hf h , where

f 6∈J h6∈J

−  −1    e0 C e γ,1 C 0 (Eh,m ⊗ Ip ) = E0 ⊗ Ip Cγ C f0 C fγ Hf h = E0f,m ⊗ Ip Cγ C Cγ0 (Eh,m ⊗ Ip ) γ,1 γ f,m γ using Lemma 104. Accordingly, (b) =⇒

  

R(Cγ ) = R

h

e γ,1 C

e γ,2 C

i

   X  e γ,2 = and ppo C Ef,m E0f,m ⊗ Hf ,  f ∈J

Details31     h i e γ,1 + ppo C e γ,1 =⇒ R(Cγ ) = R C e γ,1 C e γ,2 using Lemma 104. because ppo(Cγ ) = ppo C It also is true that     h i   X  e γ,2 = e γ,1 C e γ,2 (b) ⇐= R(Cγ ) = R C and ppo C Ef,m E0f,m ⊗ Hf ,   f ∈J

  e γ,1 is not constrained, because ppo C n h e γ,1 R(Cγ ) = R C

e γ,2 C

i

o     e0 C e γ,2 = 0 =⇒ ppo(Cγ ) = ppo C e γ,1 + ppo C e γ,2 ,C γ,1 using Lemma 104,

Q

X

X   Ef,m E0f,m ⊗ Hf Q0 = Ef,m Qf Q0f E0f,m ⊗ Hf

f ∈J

f ∈J

=

X

Ef,m E0f,m ⊗ Hf



∀ Q ∈ Om , and

f ∈J

Q

X X

X X   Ef,m E0h,m ⊗ Hf h Q0 = Ef,m E0h,m ⊗ Hf h ∀ Q ∈ Om

f 6∈J h6∈J

f 6∈J h6∈J

because f 6∈ J =⇒ QEf,m = Ef,m . Note that h i  e γ,1 C e γ,2 =⇒ C e γ,1 R(Cγ ) = R C

 e γ,2 = Cγ U, where C

  −1 0    0 e C e γ,1 C e0 C e γ,2 . e γ,1 C e γ,2 = Cγ0 Cγ −1 C U = Cγ0 Cγ Cγ C γ,1 γ,2   2 p 0 e0 e0 e Use cγ ∈ R C γ,1 to write cγ as cγ = Cγ,1 b for some b ∈ IR . Choose any Σ = ΨΓΛΓ Ψ ∈ SΣ . If (a) and (b) are satisfied, then, Cγ0 vec(Γ) = cγ =⇒ U0 Cγ0 vec(Γ) = U0 cγ

=⇒

e0 C γ,1 e0 C

! vec(Γ) =

γ,2

e γ,1 C 0 Cγ because C γ

=⇒

e0 C e C γ,1 γ,1 0 e C e γ,2 C

!

−1 Cγ0 Cγ

e0 b = C γ,1

γ,2

−1



   e0 b cγ C γ,1 = , 0 0

   e γ,1 = ppo C e γ,1 and C e γ,2 C 0 Cγ −1 C e γ,1 = 0 using Lemma 104 C γ

e0 C γ,1 e2 H

!

=⇒

 vec(Γ) =

e0 C γ,1 e2 H

!

     cγ e γ,2 = R H e2 , because R C 0

  cγ vec(ΓQ) = , ∀ Q ∈ Om 0

n e γ,1 = C e γ,1 and because (Q ⊗ Ip ) C

Details32 e 2 vec(Γ) = 0 =⇒ Hj Γ˙ j = 0 ∀ j ∈ J =⇒ Hj Γ˙ j Qj = 0 ∀ j ∈ J and ∀ Qj ∈ Om H j e 2 vec(ΓQ) = 0 ∀ Q ∈ Om =⇒ H

o

    e 0 vec(ΓQ) = 0 ∀ Q ∈ Om because R C e γ,2 = R H e2 =⇒ C γ,2 =⇒ Cγ0 vec(ΓQ) = cγ ∀ Q ∈ Om =⇒ Σ ∈ S Σ =⇒ S Σ = SeΣ =⇒ Cγ0 vec(Γ) = cγ is testable. It follows from Lemma 104 that the three conditions in (c) are equivalent. It can be shown that the two conditions in (d) also are equivalent. First note that  −     M  − e γ,2 = R(F) =⇒ ppo C e γ,2 = F (F0 F) F0 = F  Imj ⊗ F0j Fj  F0 R C j∈J

 = F

 M

Imj ⊗ F0j Fj 

− 

 F0

j∈J

=

X

 − Ej,m E0j,m ⊗ Hj , where Hj = Fj F0j Fj F0j .

j∈J ∗ ∗∗ Second, define the set J ∗ as J ∗ def = {j; j ∈ J , Hj 6= 0} and denote the size of J by d . For def ∗ 0 j ∈ J , define sj = rk(Hj ) and write Hj as Hj = Uj Uj , where Uj ∈ Op,sj . Then,   X  e γ,2 = ppo C Ej,m E0j,m ⊗ Hj j∈J

 



e γ,2 = R  =⇒ R C

 X

Ej,m E0j,m ⊗ Hj  

j∈J

e γ,2 = BL for some full row-rank matrix L, where =⇒ C B = 10d∗∗ ⊗ Ip2

 M

(Ej,m ⊗ Uj ) because BB0 =

j∈J ∗

=⇒ Cγ,j

X

Ej,m E0j,m ⊗ Hj



j∈J

( 0mj p×qγ  = Imj ⊗ Uj Lj

if j ∈ J , j 6∈ J ∗ otherwise,

where the rows of Lj are linearly independent and are a subset of the rows of L e j , where L e j = dvec(Lj ; sj , qγ mj ) =⇒ Fj = dvec(Cγ,j ; p, qγ mj ) = Uj L

Details33 e j has full row-rank using Lemma 105 =⇒ R(Fj ) = R(Uj ) because L X   Ej,m E0j,m ⊗ ppo(Fj )

=⇒ ppo(F) =

j∈J ∗

=

X

X   Ej,m E0j,m ⊗ Hj Ej,m E0j,m ⊗ Uj U0j = j∈J ∗

j∈J ∗

=

X

Ej,m E0j,m ⊗ Hj



because Hj = 0p×p if j ∈ J , j 6∈ J ∗

j∈J

  e γ,2 = ppo C   e γ,2 = R(F). =⇒ R C To verify that (c) and (d) are sufficient for (b), note that           X  e γ,1 + ppo C e γ,2 and ppo C e γ,2 = ppo(Cγ ) = ppo C Ef,m E0f,m ⊗ Hf   f ∈J

=⇒ (Q ⊗ Ip ) ppo(Cγ ) (Q0 ⊗ Ip )  = (Q ⊗ Ip ) 

 X X

Ef,m E0h,m ⊗ Hf h + 

f 6∈J h6∈J

=

X X f 6∈J h6∈J

X

Ef,m E0f,m ⊗ Hf  (Q0 ⊗ Ip ) 

f ∈J

 X  Ef,m E0h,m ⊗ Hf h + Ef,m Qj Q0j E0f,m ⊗ Hf = ppo(Cγ ) f ∈J

=⇒ R(Cγ ) = R [(Q ⊗ Ip ) Cγ ] ∀ Q ∈ Om .

104.3.3 Corollary 104.1 Corollary 104.1. Suppose that m 6= 1p , A is a p × a full column-rank matrix, B is a p × b full column-rank matrix, and cγ is an ab-vector. Then, vec(A0 ΓB) = cγ is testable if  (a) cγ ∈ R (B01 ⊗ A0 ) , (b) j ∈ J , E0j,m B 6= 0 =⇒ E0j,m B has full row-rank, and (c) B1 (B0 B)

−1

B02 = 0, where Bj = Mj B, M2 =

X

Ej,m E0j,m , and M1 = Ip − M2 .

j∈J

Proof. Assume that (a), (b), and (c) are satisfied. Note that B = B1 + B2 and B01 B2 = 0. Using

Details34 Lemma 104, it follows that (c) =⇒ ppo(B) = ppo(B1 ) + ppo(B2     e γ,1 + ppo C e γ,2 , where C e γ,j = Mj B ⊗ A for j = 1, 2. =⇒ ppo(Cγ ) = ppo C  Define the set J ∗ as J ∗ = j; j ∈ J , E0j,m B 6= 0 and denote the size of J ∗ by d∗∗ . Then,  (b) =⇒ R Ej,m E0j,m B = R(Ej,m ) ∀ j ∈ J ∗ 

 



e2 = =⇒ R B

0 R (1d∗∗

⊗ Ip )

M

Ej,m E0j,m B

j∈J ∗



 = R (10d∗∗ ⊗ Ip )

M

Ej,m 

j∈J ∗

  X e2 = =⇒ ppo B Ej,m E0j,m j∈J ∗

    X e γ,2 = ppo B e 2 ⊗ ppo(A) = =⇒ ppo C Ej,m E0j,m ⊗ ppo(A) j∈J ∗

( =

X

Ej,m E0j,m



⊗ Hj , where Hj =

j∈J

ppo(A) if j ∈ J ∗ 0p×p otherwise.

  e0 To complete the proof, note that (a) =⇒ cγ ∈ R C γ,1 . Accordingly, the sufficient conditions listed in Theorem 104 are satisfied and vec(A0 ΓB) = cγ is testable.

105

Eigenvalue -Vector Dependencies in Correlation Matrices

The eigenvalues and eigenvectors of Σ = ΓΣ ΛΣ Γ0Σ have separable parameterizations, even if constraints are imposed on ΛΣ and/or ΓΣ as in BPH. In contrast, the eigen-parameters of a correlation matrix generally are not separable because the constraint diag(Φ) = diag(ΓΛΓ0 ) = 1p induces dependencies between Λ and Γ. Specifically, λ ∈ Sλ ⊆ Sλ (Γ) and Γ ∈ Op ⊆ SΓ (λ). If λ is parameterized such that 10p λ = p, then diag(Φ) = 1p generally imposes p − 1 linearly independent constraints on Φ. For some Φ ∈ Cp structures, however, the number of linearly independent constraints can be fewer than p − 1. For example, the value of τ ∗ in Theorem 2 corresponds to the number of linearly independent constraints induced by diag(Φ) = 1p .

105.1

Properties of Sλ (Γ)

Lemma 2 on page 137 of this Supplement gives properties of orthogonal matrices which, in turn, are used in Theorem 4 on page 138 of this Supplement and Corollary 4.1 on page 139 of this

Details35 Supplement to deduce several properties of Sλ (Γ). Lemma 2(a) and Theorem 4(a) follow from Schott (1998, Theorem 1). This Supplement contains illustrations of Theorem 4 and Corollary 4.1. Theorem 4(b) reveals that Sλ (Γ) consists of the strictly positive vectors of an affine space with dimension nγ , where nγ ∈ {0, 1, . . . , p − 1}. Also, 1p ∈ Sλ (Γ) ∀ Γ ∈ Op ⇒ Sλ (Γ) 6= ∅ and nγ = 0 ⇒ Sλ (Γ) = 1p ⇒ Φ = Ip . Lemma 2(b) reveals that nγ = p − 1 ⇔ Γ Γ = 1p (1/p)10p and, from Theorem 4(c), it can be concluded that Sλ (Γ) = Sλ ⇔ nγ = p − 1. These equivalent √ √ conditions are satisfied if and only if p Γ ∈ Hp ; i.e., p Γ is a square matrix whose elements are ±1 and whose columns (rows) are pairwise orthogonal (a Hadamard matrix). Accordingly, it is of interest to know the conditions under which nγ = p − 1 is possible. It can √ be shown (Graybill, 1983, Theorem 8.14.5) that if p Γ ∈ Hp , then p ∈ {1, 2} or p ≡ 0 mod (4). If H1 and H2 are Hadamard matrices of order q1 and q2 , respectively, then H1 ⊗ H2 ∈ Hq1 q2 . This result can be used to construct Hadamard matrices of order p = 2i q for i = 0, 1, . . . , ∞, provided that a Hadamard matrix of order q can be constructed. The Hadamard conjecture states that a p × p Hadamard matrix exists if and only p ∈ {1, 2} or p ≡ 0 mod (4). If p < 668 and p ∈ {1, 2} or p ≡ 0 mod (4), then a p × p Hadamard matrix is known to exist (Ðoković, 2008), but the √ Hadamard conjecture remains a conjecture. If p Γ 6∈ Hp , then nγ < p − 1 and Sλ (Γ) is a proper subset of Sλ . Schott (1998, p. 447) claimed that if p is an integer for which a p × p Hadamard matrix does not exist, then a matrix Γ ∈ Op does exist such that Sλ (Γ) consists of the strictly positive vectors of an nγ = p − 2 dimensional affine space. Schott verified his claim by constructing, for any p, a matrix Γ ∈ Op such that if a p-dimensional matrix does not exist, then Sλ (Γ) consists of the strictly positive vectors of an nγ = p − 2 dimensional affine space. It is shown below that Schott’s construction of such a matrix was flawed. Let p∗ be the largest integer for which p∗ ≤ p and a p∗ × p∗ Hadamard matrix exists. Recall that a Hadamard matrix of order 2i exists for every non-negative integer i. Accordingly, a non-negative integer, i, exists such that 2i ≤ p∗ ≤ p < 2i+1 . Furthermore, 2i ≤ p∗ ≤ p < 2i+1 =⇒ 2i+1 ≤ 2p∗ ≤ 2p < 2i+2 =⇒ 2i+1 − p ≤ 2p∗ − p ≤ p < 2i+2 − p

(108)

=⇒ 0 < 2i+1 − p ≤ 2p∗ − p ≤ p because p < 2i+1 . Any Hadamard matrix can be normalized such that each element in the first row and first column is +1. Suppose that p∗ < p. Denote an arbitrary normalized p∗ × p∗ Hadamard matrix by Hp∗ and partition Hp∗ as   Hp∗,1 Hp∗ = , where dim(Hp∗,1 ) = k × p∗, Hp∗,2 dim(Hp∗,2 ) = (p∗ − k) × p∗, and k = 2p∗ − p. Note that k > 0 by (108). Schott’s (1998, p. 447) construction of a matrix that satisfies Γ ∈ Op and rk(Γ Γ) = 2 was the following: ! −1/2 0(p−p∗ )×(p−p∗ ) p∗ Hp∗,2 def . (109) Γ= −1/2 k 1/2 1p∗ (1/p∗ )10p∗ p∗ H0p∗,2 Note that,  0  (a) (1/p∗ )Hp∗,2 Hp∗,2 = Ip∗ −k , Γ ∈ Op =⇒ (b) Hp∗,2 1p∗ = 0, and   (c) 1p∗ (k/p∗ )10p∗ + (1/p∗ )H0p∗,2 Hp∗,2 = Ip∗ .

Details36 It is easy to verify that (a) and (b) are satisfied and that (a,c) =⇒ rk(Hp∗,2 ) = rk(H0p∗,2 Hp∗,2 ) = p∗ − k = rk [Ip∗ − 1p∗ (k/p∗ )1p∗ ] , because rk(M0 M) = rk(MM0 ) = rk(M) = rk(αM) for any matrix M and any non-zero scalar α. Furthermore, p∗ − k = rk [Ip∗ − 1p∗ (k/p∗ )1p∗ ] =⇒ k ∈ {0, 1}, The value of k cannot be 0, however, because that would contradict (108). Accordingly, the construction in (109) can be correct only if k = 1. It is readily shown that if k = 1, then the construction in (109) is correct because ! −1/2 p∗ Hp∗,2 0(p−p∗ )×(p−p∗ ) ∈ Op , k = 1 =⇒ Γ = −1/2 1p∗ (1/p∗ )10p∗ p∗ H0p∗,2  1p−p∗ (1/p∗ )10p∗ and Γ Γ = 1p∗ (1/p2∗ )10p∗ =

 1p−p∗ 0p∗ ×1

0(p−p∗ )×1 1 p∗

  −1 p∗ p∗−2

0



p−1 ∗

0(p−p∗ )×(p−p∗ ) 1p∗ (1/p∗ )10p−p∗

1p∗

0(p−p∗ )×1

0p∗ ×1 1p−p∗



0 =⇒ rk(Γ Γ) = 2,

where dim(Hp∗,2 ) = (p − p∗ ) × p∗ and 2p∗ − p = 1. Accordingly, Schott’s construction is correct only if p = 2p∗ − 1. Values of p for which the construction in (109) is correct are {p 6= 1, p 6= 2, p 6≡ 0

mod (4), p = 2p∗ − 1} =⇒ p ∈ {3, 7, 15, 23, 31, 39, 47, . . .}.

Values of p for which the construction in (109) is not correct are {p 6= 1, p 6= 2, p 6≡ 0

mod (4), p 6= 2p∗ − 1}

=⇒ p ∈ {5, 6, 9, 10, 11, 13, 14, 17, 18, 19, 21, 22, 25, 26, 27, 29, 30, 33, 34, 35, 37, 38, 41, 42, 43, 45, 46 . . .}. Although Schott’s construction is flawed, his claim (Schott, 1998, p. 447) that if p is an integer for which a p × p Hadamard matrix does not exist, then a matrix Γ ∈ Op does exist such that Sλ (Γ) consists of the strictly positive vectors of an nγ = p − 2 dimensional affine space is correct, at least if p < 668.

Theorem 105. Let p be a positive integer and let p∗ be the largest integer such that p∗ ≤ p and a p∗ × p∗ Hadamard matrix exists. Dene nγ as in (17). If p − p∗ ∈ {0, 1, 2, 3}, then max nγ = p − δ, where δ = 1 if p∗ = p and δ = 2 otherwise. Γ∈Op

If the Hadamard conjecture is correct, then p − p∗ ∈ {0, 1, 2, 3} always holds. Otherwise, the result in Theorem 105 still holds for most applications of principal components because p − p∗ ∈ {0, 1, 2, 3} is satisfied for p < 668 and p < 668 generally is satisfied in PC applications.

105.2

Properties of SΓ (λ)

For λ ∈ Sλ , the set SΓ (λ) can be defined as SΓ (λ) = {Γ; Γ ∈ Op , (Γ Γ) λ = 1p }. If m = p, then λ = 1p and SΓ (λ) = Op . Otherwise, SΓ (λ) generally depends on λ through both the multiplicities and the values of the distinct eigenvalues. A notable exception is described in Theorem 5 on page 143 of this Supplement. A corollary to Theorem 5 is given on page 144 of this Supplement.

Details37

105.3

Proof of Theorem 105

Theorem 105.

Let p be a positive integer and let p∗ be the largest integer such that p∗ ≤ p and a p∗ × p∗ Hadamard matrix exists. Dene nγ as in (17). If p − p∗ ∈ {0, 1, 2, 3}, then max nγ = p − δ, where δ = 1 if p∗ = p and δ = 2 otherwise. Γ∈Op

Proof. The claim can be verified by constructing orthogonal matrices that satisfy the rank condition. If p∗ = p and Γ = (1/p)Hp , then Γ ∈ Op and rk(Γ Γ) = 1. If p − p∗ = 1 and   (p − 1)−1/2 Hp−1 0(p−1)×1 , then Γ ∈ Op , Γ= 01×(p−1) 1 Γ Γ=

 1p−1 (p − 1)−1 10p−1 01×(p−1)

 0(p−1)×1 , and rk(Γ Γ) = 2. 1

If p − p∗ = 2 and  Γ=  Γ Γ=

(p − 2)−1/2 Hp−2 02×(p−2)

 0(p−2)×2 , then Γ ∈ Op , 2−1/2 H2

1p−2 (p − 2)−1 10p−2 02×(p−2)

 0(p−2)×2 , and rk(Γ Γ) = 2. 12 (1/2)102

If p − p∗ = 3 and  Γ=

(p − 3)−1/2 Hp−3,2 14 [4(p − 3)]−1/2 10p−3

  0  0(p−4)×3 1q , where H = , q Hq,2 2−1 H04,2 1p−4 (p − 3)−1 10p−3 14 [4(p − 3)]−1 10p−3

 then Γ ∈ Op ,  =

1p−4 04×1

0(p−4)×1 14

Γ Γ=

(p − 3)−1 [4(p − 3)]−1



 1p−3 4−1 03×1 0

0(p−4)×3 14 (1/4)103

0(p−3)×1 13



0 , and rk(Γ Γ) = 2.

If p = 7, then p − p∗ = 3 and the orthogonal matrix, Γ, constructed in the proof of Theorem 105 has the same structure as the orthogonal matrix, Γ, in (109).

106

Remarks on Existing Inference Procedures

106.1

Lawley (1963)

Suppose that Φ satisfies Lawley’s (1963) compound symmetry structure, Φ = (1 − α)Ip + 1p α10p . 0 0 If α ∈ (−1/(p − 1), 0), then m = p − 1 1 γ p = 1p p−1/2 , and ρ = 1 − α (p − 1)α + 1 , where m is the vector of multiplicities of the ordered distinct eigenvalues and ρ is the vector of ordered 0 distinct eigenvalues. If α ∈ (0, 1), then m = 1 p − 1 γ 1 = 1p p−1/2 , and 0 ρ = (p − 1)α + 1 1 − α . Note that n 0 0 o Φ = (1 − α)Ip + 1p α10p =⇒ m ∈ 1 p − 1 , p − 1 1 but m∈

n

1

0 p−1 , p−1

0 o 1 6=⇒ Φ = (1 − α)Ip + 1p α10p .

Details38 0 0 If m = p − 1 1 , then γ p need only satisfy γ 2 p = 1p (1/p) and if m = 1 p − 1 , then γ 1 need 2 only satisfy 5.1. Accordingly, compound symmetry is a special case n γ 1 = 1p (1/p). SeeCorollary o 0 0 of m ∈ 1 p − 1 , p − 1 1 ; equality of the smallest or largest p − 1 eigenvalues of Φ does not imply compound symmetry; and Lawley’s (1963) test is not merely a test of equality of the smallest or largest p − 1 eigenvalues of Φ. Lawley’s test is a test of the hypothesis that the smallest p − 1 eigenvalues of Φ are homogeneous and that γ 1 = 1p p−1/2 or that the largest p − 1 eigenvalues of Φ are homogeneous and that γ p = 1p p−1/2 .

106.2

Bentler and Yuan (1998)

Several issues regarding Bentler and Yuan’s (1998) modified χ2 test require comment. 1. It can be shown (Boik, 1998, page 263) that if nS ∼ Wp (n, Σ) and all eigenvalues of Φ are simple, then √

dist

b R − λ) −→ N (0, Ω) , where n(λ

(110)

 0 Ω = Ω(λ, Γ) = 2U (Φ ⊗ Φ) U0 , where U = L021,p (Γ ⊗ Γ) Ip2 − (Φ ⊗ Ip ) L22,p Φ = ΓΛΓ0 is the diagonal form of Φ, and Lqr,p is defined in Table 2. Note that Ω depends b Γ), b dg Γ b where Γ bλ b 0 is the on Γ as well as on λ. A minimum χ2 test would employ Ω(λ, 2 b bR ) or minimum χ estimator of Φ under H0 . Bentler and Yuan employed either Ω(λR , Γ e Γ e is Bentler and Yuan’s estimator of λ under H0 . Accordingly, Bentler and bR ), where λ Ω(λ, Yuan’s minimum χ2 test is a modified minimum χ2 test. 0 2. Let h = r p − r Bentler and Yuan’s model for λ is 

  λ1 1  λ2      1 0 E λ    1,h λ =  ...  = , where X =  . Xβ    ..  λr  1 Xβ

 p−r−1 p − r − 2  , ..  . 0

and β is a 2 × 1 unknown vector of parameters. Their estimator of λ is ! 0 b E λ R 1,h e= b is the minimizer of λ , where β b Xβ  0  −1   0 0 b b R − Xβ b 2,h b R ) = n E0 λ Q(β, Γ E ΩE E λ − Xβ , R 2,h 2,h 2,h e is not constrained to equal p and e Γ bR, Γ b = Ω(λ, bR ) or Ω b = Ω(λ bR ). Note that (a) 10 λ and Ω p e dg Γ bR λ b 0 need not be a correlation matrix. These issues may (b) the estimator of Φ, namely Γ R explain the slow convergence of the test statistic to χ2 . 3. Bentler and Yuan require that p − r ∈ {3, . . . , p − 1}. The value of p − r is not allowed to be p because if p − r = p, then E2,h = Ip and E02,h ΩE2,h is singular. A generalized inverse of E02,h ΩE2,h could be used to allow p − r = p, but Bentler and Yuan do not discuss this modification.

Details39 4. Equation (29) in Bentler and Yuan gives the asymptotic non-null distribution of the test statistic. The equation should be   dist −1/2 b −→ b ∼ χ2 rather than Qn (β) χ2p−r−2,ϕ Qn (β) + O n p p−r−2,ϕ because the non-centrality parameter, ϕ, is a function of n and limn→∞ ϕ = ∞. 5. The algorithm given in equations (25) and (26) can yield a test statistic substantially different from that obtained by iterating (27) and (28). Below are test statistic values for example 1 in Bentler and Yuan (1998).

p−r 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3

107

Iterating (25)–(26) 67.3683 41.9341 27.1816 13.5033 11.0027 9.6432 7.6840 6.6461 4.9810 4.0607 3.5554 3.5359 3.4978 3.3449 3.0811 2.6350 2.5023 1.6340 1.2538 1.0741 1.0706

Iterating (27)–(28) 103.3027 58.1106 34.8412 12.7203 10.7096 9.5324 7.7599 6.8246 5.0466 4.1003 3.6117 3.5739 3.5573 3.4539 3.2127 2.7336 2.5270 1.6298 1.1928 0.9745 0.9732

Details on Eigenvalue Parameterizations

For convenience, Table 5 is reproduced in this Supplement as Table 103 on page 40 of this Supplement. Details about parameterizations (1a)–(4) in Table 103 are given in the following sections. Each subsection gives (a) derivatives of λ and h∗3 (λ) with respect to ξλ and (b) an b is computed before algorithm to compute an initial guess for λ = λ(ξλ ) ∈ Sλ . The initial guess, λ,  b b b computing Γ. Accordingly, the initial guess need not satisfy λ ∈ Sλ Γ . For convenience, expressions for derivatives of λ and h∗3 (λ) are listed in Tables 104–108. Expressions for first (1) (1) derivatives, namely Dλ;ξ0 and Dh3 ;ξ0 , are listed in Table 104. Expressions for second derivatives λ λ are listed in Tables 105 and 106 on pages 42 and 43 of this Supplement. Expressions for third derivatives are listed in Tables 107 and 108 on pages 44 and 45 of this Supplement. All derivative expressions have been verified numerically. Special notation for this section is listed in Table 109 on page 46 of this Supplement. If qλ > 0, then Cλ must satisfy rk(Cλ ) = qλ in all structures and it must satisfy 1p 6∈ R(Cλ ) in e 0 T1 does not have full row-rank in (2a), then one or more columns structures (1b, 2a, 2c, 3b). If C λ of Cλ (and rows of cλ ) are degenerate and can be deleted. If 10q1 T2 = 0 is not satisfied in (1b, 2a,

Details40

Table 103: Eigenvalue Structures for Correlation Matrices Structure for λ

Optional Constraints

1a

T2 ξλ

Cλ0 λ = cλ

10p T2 6= 0, rk(Cλ0 T2 ) = qλ qλ ≥ 1, Cλ0 λ = cλ ⇒ 10p λ = p

1b

1p + T2 ξλ

Cλ0 λ = cλ

10p T2 = 0,

2a

p T1 exp {T2 ξλ } 10p T1 exp {T2 ξλ }

Cλ0 λ = cλ

10p T1 6= 0, 10q1 T2 = 0 e 0 T1 ) = qλ rk(C λ e Cλ = Cλ − 1p p−1 cλ0

2b

p T1 exp {T2 ξλ } 10p T1 exp {T2 ξλ }

Cλ0 ln (λ) = cλ

10p T1 6= 0, 10q1 T2 = 0 rk(Cλ0 T1 ) = qλ

2c

p T1 exp {T2 ξλ } 10p T1 exp {T2 ξλ }

 ¯ g = cλ Cλ0 ln λ/λ

10p T1 6= 0, 10q1 T2 = 0 Qp 1/p ¯ g def rk(Cλ0 T1 ) = qλ , λ = i=1 λi

3a

p exp {T1 exp [T2 ξλ ]} 10p exp {T1 exp [T2 ξλ ]}

Cλ0 ln (λ) = cλ

10p T1 = 0,

rk(Cλ0 T1 ) = qλ

3b

p exp {T1 exp [T2 ξλ ]} 10p exp {T1 exp [T2 ξλ ]}

 ¯ g = cλ Cλ0 ln λ/λ

10p T1 = 0,

rk(Cλ0 T1 ) = qλ

1p + T1 exp {T2 ξλ }

Cλ0 λ = cλ

10p T1 = 0,

rk(Cλ0 T1 ) = qλ

4

Requirements on T1 , T2 , & Cλ

rk(Cλ0 T2 ) = qλ

2b, 2c), then T2 can be replaced by any full column-rank matrix whose columns are a basis set for R [(Iq1 − Hq1 )T2 ], where Hq1 = ppo (1q1 ). If 10p T1 = 0 is not satisfied in (3a, 3b, 4), then T1 can be replaced by (Ip − H1 )T1 , provided that Cλ0 (Ip − H1 )T1 has full row-rank, where H1 = ppo (1p ).

107.1

Parameterization (1a): λ = T1 ξλ , Cλ0 λ = cλ

To ensure that Φ is a correlation matrix, the constraint Cλ0 λ = cλ must satisfy 1p = Cλ b for some b and b0 cλ = p. This condition is satisfied by equating the first column of Cλ to 1p and equating the first element of cλ to p.

107.1.1

Derivatives of λ with Respect to

ξλ

First, second, and third derivatives of λ with respect to ξλ are listed below: (1)

Dλ;ξ0 = T2 , λ

(2)

(3)

Dλ;ξ0 ,ξ0 = 0p×q22 , and Dλ;ξ0 ,ξ0 ,ξ0 = 0p×q23 . λ

λ

λ

λ

λ

Details41

(1)

(1)

Table 104: Expressions for Dλ;ξ0 and Dh3 ;ξ0

λ

λ

Structure for λ

(1)

(1)

Dλ;ξ0

Dh3 ;ξ0

T2

Cλ0 Dλ;ξ0

2a

pw (Ip − Hλ ) T1 Dξλ T2

(1) Cλ0 Dλ;ξ0 λ

2b

Same as 2a

Cλ0 Λ−1 λ Dλ;ξ0

Same as 2a

Cλ0

3a

(Ip − Hλ ) Λλ T1 Dξλ T2

(1) Cλ0 Λ−1 λ Dλ;ξλ0

3b

Same as 3a

Cλ0 T1 Dξλ T2

4

T1 Dξλ T2

Cλ0 Dλ;ξ0

λ

λ

1a, 1b

(1)

λ

(1)

λ

2c

107.1.2

(1)

(Ip − H1 ) Λ−1 λ Dλ;ξ0

λ

(1)

λ

Initial Guess for ξλ

Define Wλ as Wλ = Cλ0 T2 . By assumption, Wλ has full row-rank, namely rk(Wλ ) = rk(Cλ ) = qλ . Write Wλ as svd (Wλ ) = Uλ Dλ V30 and SVD (Wλ ) = Uλ D∗λ Vλ0 , where Uλ ∈ Oqλ , V3 ∈ Oq2 ,qλ ,

V4 ∈ Oq2 ,q2 −qλ ,

D∗λ = Dλ

Vλ ∈ Oq2 ,

Vλ = V3

 V4 ,

 ++ 0qλ ×(q2 −qλ ) , and Dλ ∈ Ddg,q . λ

Then, λ can be written as λ = V3 ηλ + V4 θλ . To solve for ηλ , write the restriction as Cλ0 λ = cλ =⇒ Cλ0 T2 (V3 ηλ + V4 θλ ) = cλ =⇒ Cλ0 T2 V3 ηλ = cλ because Cλ0 T2 V4 = 0 0 =⇒ ηλ = D−1 λ Uλ c λ ,

where Uλ Dλ V30 = svd (Wλ ) and Wλ = Cλ0 T2 . Accordingly, dim1 (θλ ) = q2 − qλ , ξλ = V3 ηλ + V4 θλ = Wλ+ cλ + V4 θλ , λ = T2 ξλ = T2 Wλ+ cλ + T2 V4 θλ , and Cλ0 T2 ξλ = cλ ∀ θλ .

(111)

Details42

(2)

Table 105: Expressions for Dλ;ξ0 ,ξ0 λ

Structure for λ

λ

(2)

Dλ;ξ0 ,ξ0 λ

λ

1a, 1b

0p×q22

2a, 2b, 2c

  0 (1) pw (Ip − Hλ ) T1 Dξλ (T02 ∗ T02 ) − w u021 ⊗ Dλ;ξ0 2Nq2

3a, 3b

 0 0 (Ip − Hλ ) Λλ T02 Dξλ T01 ∗ T02 Dξλ T01 + (Ip − Hλ ) Λλ T1 Dξλ (T02 ∗ T02 )   (1) −p−1 u031 ⊗ Dλ;ξ0 2Nq2

4

T1 Dξλ (T02

λ



λ



0 T02 )

Denote the vector of ordered eigenvalues of R by `, where `1 ≥ `2 ≥ · · · ≥ `p > 0 and R is the sample correlation matrix. An initial guess for λ, subject only to multiplicity constraints is def ˙0 ˙0 ˙ `0 = J˙ m D−1 m Jm `, where Dm = mdg = Jm Jm ,

(112)

J˙ m is defined in (7) and m is the d-vector of eigenvalue multiplicities. It can be shown (Boik, 1998, page 263) that if nS ∼ Wp (n, Σ) and m is the vector of multiplicities of the eigenvalues of the population correlation matrix, Φ, then   √ dist n(`0 − λ) −→ N 0, Ω`0 , where (113)  0 ˙0 0 Ω`0 = 2U (Φ ⊗ Φ) U0 , where U = J˙ m D−1 m Jm L21,p (Γ ⊗ Γ) Ip2 − (Φ ⊗ Ip ) L22,p Φ = ΓΛΓ0 is the diagonal form of Φ, and Lqr,p is defined in Table 101. Normality is not necessarily assumed in the current application. Nonetheless, (113) can be used to construct a generalized least-squares initial guess for θλ . Note that   rk Ω`0 = d − 1 and Ω`0 BB0 = Ω`0 , where (114) B is any p × (d − 1) matrix that satisfies ˙0 BB0 = J˙ m D−1 m Jm − H1 and H1 = ppo(1p ). It follows that the Moore-Penrose inverse of Ω`0 and an estimate of the Moore-Penrose inverse are  −1  −1 b + = B B0 Ω b B Ω+ = B B0 Ω`0 B B0 and Ω B0 , (115) `0 `0 `0 b = 2U b (R ⊗ R) U b 0, respectively, where Ω `0  0   b = J˙ m D−1 J˙ 0 L0 b ⊗Γ b U Γ Ip2 − (R ⊗ Ip ) L22,p , m m 21,p

Details43

(2)

Table 106: Expressions for Dh3 ;ξ0 ,ξ0 λ

Structure for λ

λ

(2)

Dh3 ;ξ0 ,ξ0

λ

λ

(2)

1a, 1b

Cλ0 Dλ;ξ0 ,ξ0

2a

(2) Cλ0 Dλ;ξ0 ,ξ0 λ λ

2b

 0   (1)0 (1)0 (2) −2 − Λ ∗ D Cλ0 Λ−1 D D λ λ λ;ξ0 λ;ξ0 λ;ξ0 ,ξ0

λ

λ

λ

λ

λ

λ

2c

 0   (1)0 (1)0 −2 −1 (2) 0 Cλ (Ip − H1 ) Λλ Dλ;ξ0 ,ξ0 − Λλ Dλ;ξ0 ∗ Dλ;ξ0

3a

 0   (1)0 (1)0 −2 −1 (2) 0 Cλ Λλ Dλ;ξ0 ,ξ0 − Λλ Dλ;ξ0 ∗ Dλ;ξ0

λ

λ

λ

3b

Cλ0 T1 Dξλ (T02 ∗

4

Cλ0 Dλ;ξ0 ,ξ0

λ

λ

λ

λ

λ

0 T02 )

(2)

λ

λ

bΛ bΓ b 0 is the diagonal form of R. B is given in (114), and R = Γ The initial guess is found by minimizing 0 +  b SSE(θ λ ) = `0 − T2 Wλ+ c1 − T2 V4 θ λ Ω ` − T2 Wλ+ c1 − T2 V4 θ λ `0 0 b + is given in (115). The solution is with respect to θ λ , where Wλ = Cλ0 T2 and Ω `0  −1  bλ = V0 T0 Ω b+ b + `0 − T2 W+ c1 θ V40 T02 Ω 4 2 ` T2 V4 λ `0 0 bλ . and b ξλ = ξλ = Wλ+ cλ + V4 θ

107.2

Parameterization (1b): λ = 1p + T2 ξλ , Cλ0 λ = cλ

107.2.1

Derivatives of λ with Respect to

ξλ

First, second, and third derivatives of λ with respect to ξλ are listed below: (1)

Dλ;ξ0 = T2 , λ

107.2.2

(2)

(3)

Dλ;ξ0 ,ξ0 = 0p×q22 , and Dλ;ξ0 ,ξ0 ,ξ0 = 0p×q23 . λ

λ

λ

λ

λ

Initial Guess for ξλ

Define Wλ as Wλ = Cλ0 T2 . By assumption, Wλ has full row-rank, namely rk(Wλ ) = rk(Cλ ) = qλ . Write Wλ as SVD (Wλ ) = Uλ D∗λ Vλ0 , where Vλ has been partitioned as in (111). Then, ξλ can be written as ξλ = V3 ηλ + V4 θλ and λ can be written as λ = 1p + T2 ξλ = 1p + T2 (V3 ηλ + V4 θλ ) .

Details44

(3)

Table 107: Expressions for Dλ;ξ0 ,ξ0 ,ξ0 λ

Structure for λ

λ

λ

(3)

Dλ;ξ0 ,ξ0 ,ξ0 λ

λ

λ

1a, 1b

0p×q23

2a, 2b, 2c

pw (Ip − Hλ ) T1 Dξλ (T02 ∗ T02 ∗ T02 )     (1) (2) −w Dλ;ξ0 ,ξ0 ⊗ u021 Jq2 − w u031 ⊗ Dλ;ξ0 Jq2 ,

0

λ

λ

3a, 3b

λ

0 ∗ T02 Dξλ T01    0 + (Ip − Hλ ) Λλ (T02 ∗ T02 ) Dξλ T01 ∗ T02 Dξλ T01 Jq 2 (Ip − Hλ ) Λλ



T02 Dξλ T01





T02 Dξλ T01



0

+ (Ip − Hλ ) Λλ T1 Dξλ (T02 ∗ T02 ∗ T02 ) h i   (1) (2) −p−1 Dλ;ξ0 ,ξ0 ⊗ u031 Jq2 − p−1 (v + u32 )0 ⊗ Dλ;ξ0 Jq2 4

T1 Dξλ (T02

λ

λ



T02

λ



0 T02 )

To solve for ηλ , write the restriction as Cλ0 λ = cλ =⇒ Cλ0 1p + Cλ0 T2 (V3 ηλ + V4 θλ ) = cλ =⇒ Cλ0 T2 V3 ηλ = cλ − Cλ0 1p because Cλ0 T2 V4 = 0 0 0 =⇒ ηλ = D−1 λ Uλ (cλ − Cλ 1p ) ,

where Uλ Dλ V30 is the full-rank SVD of Wλ = Cλ0 T2 . Accordingly, ξλ = V3 ηλ + V4 θλ = Wλ+ (cλ − Cλ0 1p ) + V4 θλ and λ = 1p + T2 ξλ = 1p + T2 Wλ+ (cλ − Cλ0 1p ) + T2 V4 θλ . Furthermore, νλ def = dim1 (θλ ) = q2 − qλ and the first derivative of λ with respect to θλ is (1) Dλ;θ0 = T2 V4 . If λ is parameterized as (1b) of Table 103, but no additional restrictions are λ

(1)

imposed, then νλ = q2 and the first derivative of λ with respect to θλ is Dλ;θ0 = T2 . λ Denote the vector of ordered eigenvalues of R by `, where `1 ≥ `2 ≥ · · · ≥ `p > 0 and R is the sample correlation matrix. Let `0 be the vector defined in (112). The initial guess is found by minimizing SSE(θ λ ) =  0 +   b `0 − 1p − T2 Wλ+ (c1 − Cλ0 1p ) − T2 V4 θ λ Ω ` − 1p − T2 Wλ+ (c1 − Cλ0 1p ) − T2 V4 θ λ `0 0

Details45

(3)

Table 108: Expressions for Dh3 ;ξ0 ,ξ0 ,ξ0 λ

Structure for λ

(3)

λ

λ

(3)

1a, 1b

Cλ0 Dλ;ξ0 ,ξ0 ,ξ0

2a

(3) Cλ0 Dλ;ξ0 ,ξ0 ,ξ0 λ λ λ

λ

λ

λ

0 0 h   (1)0 (1)0 (1)0 (2)0 (1)0 −2 ∗ D Jq2 ∗ D − Λ Cλ0 2Λ−3 ∗ D D D 0 0 0 0 0 0 λ λ λ;ξλ λ;ξλ λ;ξλ λ;ξλ ,ξλ λ;ξλ i (3) +Λ−1 λ Dλ;ξ0 ,ξ0 ,ξ0 0 0   (1)0 (1)0 (1)0 (2)0 (1)0 −2 ∗ D Jq 2 ∗ D − Λ Cλ0 (Ip − H1 ) 2Λ−3 ∗ D D D 0 0 0 0 0 0 λ λ;ξλ λ;ξλ λ;ξλ λ;ξλ ,ξλ λ;ξλ iλ (3) +Λ−1 λ Dλ;ξ0 ,ξ0 ,ξ0 h

λ

3a

λ

λ

λ

2c

λ

Dh3 ;ξ0 ,ξ0 ,ξ0 λ

2b

λ

λ

λ

h  0  0 (1)0 (1)0 (1)0 (2)0 (1)0 Cλ0 2Λ−3 Dλ;ξ0 ∗ Dλ;ξ0 ∗ Dλ;ξ0 − Λ−2 Dλ;ξ0 ,ξ0 ∗ Dλ;ξ0 Jq2 λ λ λi λ λ λ λ λ (3) +Λ−1 D 0 0 0 λ λ;ξ ,ξ ,ξ λ

λ

3b

Cλ0 T1 Dξλ

4

Cλ0 Dλ;ξ0 ,ξ0 ,ξ0

λ

(T02

∗ T02 ∗ T02 )

0

(3)

λ

λ

λ

b + is given in (115). The solution is with respect to θ λ , where Wλ = Cλ0 T2 and Ω `0 −1    bλ = V0 T0 Ω b+ b + `0 − 1p − T2 W+ (c1 − C 0 1p ) θ V40 T02 Ω λ 4 2 ` T2 V4 λ `0 0 bλ . and b ξλ = Wλ+ (cλ − Cλ0 1p ) + V4 θ

107.3

Parameterization (2a): λ = p

107.3.1

Derivatives of λ with Respect to



T1 exp {T2 ξλ } 10p T1 exp {T2 ξλ }



, Cλ0 λ = cλ

ξλ

Using Theorem 101, it is readily shown that first, second, and third derivatives of λ with respect to ξλ can be written as follows: (1)

Dλ;ξ0 = pw (Ip − Hλ ) T1 Dξλ T2 , λ

  0 (2) (1) Dλ;ξ0 ,ξ0 = pw (Ip − Hλ ) T1 Dξλ (T02 ∗ T02 ) − w u021 ⊗ Dλ;ξ0 2Nq2 , and λ

λ

λ

0

(3)

Dλ;ξ0 ,ξ0 ,ξ0 = pw (Ip − Hλ ) T1 Dξλ (T02 ∗ T02 ∗ T02 ) λ

λ

λ

    (2) (1) −w Dλ;ξ0 ,ξ0 ⊗ u021 Jq2 − w u022 ⊗ Dλ;ξ0 Jq2 . λ

λ

λ

Details46

Table 109: Special Notation for Eigenvalue Parameterizations −1  0 w def = 1p T1 exp {T2 ξλ } Dξλ def = (exp {T2 ξλ })dg Λλ def = λdg bh) Λ ˆ def = (λ

dg

λ,h

H1 def = ppo (1p ) −1 0 Hλ def = λ p 1p b h p−1 10 H ˆ def =λ

p

λ,h

0 0 u21 def = T2 Dξλ T1 1p 0 0 u31 def = T2 Dξλ T1 λ 0 0 0 u22 def = (T2 ∗ T2 ) Dξλ T1 1p 0 0 0 u32 def = (T2 ∗ T2 ) Dξλ T1 λ

v def =

107.3.2



  T02 Dξλ T01 ∗ T02 Dξλ T01 λ

Initial Guess for ξλ

Define Wλ as

e 0 T1 Dξ T2 , Wλ def =C λ λ

(116)

e λ = Cλ − 1p p−1 c 0 . It is assumed that Cλ has been chosen such that the p × qλ matrix where C λ ∂ C0λ ∂ξλ0 λ has full row-rank. It follows that Wλ has full row-rank. Write Wλ as SVD (Wλ ) = Uλ D∗λ Vλ0 , where Vλ has been partitioned as in (111). Then, ξλ can be written in as ξλ = V3 η λ + V4 θ λ where η λ = V30 ξλ , and θλ = V40 ξλ . Furthermore, νλ def = dim1 (θλ ) = q2 − qλ and for fixed V3 and V4 , the first derivative of λ with respect to θλ is (1)

Dλ;θ0 = wp (Ip − Hλ ) T1 Dξλ T2 V4 . λ

If Cλ in parameterization (2a) of Table 103 is empty, then qλ = 0, V4 = Iq2 , νλ = q2 , and the derivative of λ simplifies to (1)

Dλ;θ0 = wp (Ip − Hλ ) T1 Dξλ T2 . λ

(117)

Details47 Denote the vector of ordered eigenvalues of R by `, where `1 ≥ `2 ≥ · · · ≥ `p > 0 and R is the sample correlation matrix. Let `0 be the vector defined in (112). Write λ as λ = λ(ξλ ), where λ(ξλ ) = p

T1 exp {T2 ξλ } . 10p T1 exp {T2 ξλ }

A three-stage procedure is used to obtain an initial guess for ξλ . The first stage employs the Gauss-Newton algorithm to minimize 0

SSE(ω) = [`0 − λ(1q2 ω)] [`0 − λ(1q2 ω)] with respect to the scalar ω, where the initial guess for ω is zero. At iteration h + 1, the estimate of ω is updated by ω bh+1 = ω bh + αh (x0h xh ) b h = λ (1q ω λ 2 bh ) ,

−1

b h ), where x0h (`0 − λ

 T1 (exp {T2 1q2 ω  bh })dg T2 1q2 . xh = p Ip − Hλ,h ˆ bh } 10p T1 exp {T2 1q2 ω

and αh ∈ (0, 1]. The second stage employs the Gauss-Newton algorithm to minimize 0 b+ SSE(ξλ ) = [`0 − λ(ξλ )] Ω [` − λ(ξλ )] `0 0

b + is given in (115), the initial guess for ξλ is 1q ω with respect to ξλ , where Ω b , and ω b is the 2 `0 minimizer from stage one. At iteration h + 1, the estimate is updated by  −1 b b h ), where b + X h + βh I q b + (`0 − λ ξ λ,h+1 = b ξ λ,h + αh X0h Ω X0h Ω 2 `0 `0

  bh = λ b λ ξ λ,h ,

  ξ λ,h } T2  T1 exp {T2 b dg n o , Xh = p Ip − Hλ,h ˆ 0 1p T1 exp T2 b ξ λ,h 

αh ∈ (0, 1], and βh ≥ 0. The values of αh and βh are chosen to ensure that SSE decreases at each iteration. If λ is subject to no additional constraints, then the initial guess for ξλ is the stage two estimate b ξ λ . If λ is subject to Cλ0 λ = cλ , then the third stage is executed. In this stage, the initial guess for θ λ is the obtained as the minimizer of 0 b+ SSE(θ λ ) = [`0 − λ (ξλ )] Ω [` − λ (ξλ )] `0 0

with respect to θ λ , where ξλ = V3 η λ + V4 θ λ , η λ satisfies e 0 T1 exp {T2 (V3 η + V4 θ λ )} = 0 C λ λ e λ = Cλ − 1p p−1 c0 , and the matrices V3 and V4 are given in (111) in which Wλ is for fixed θ λ , C 1

Details48 defined in (116). At iteration h + 1, the estimate of θ λ is updated as  −1 bλ,h+1 = θ bλ,h + αh X0 Ω b h ), where b + Xh + βh Iν b + (`0 − λ θ X0h Ω h ` 2 `0 0

  bh = λ b λ ξ λ,h ,

  ξ λ,h } T2 V4,h  T1 exp {T2 b dg o n Xh = p Ip − Hλ,h , ˆ 0 b 1p T1 exp T2 ξ λ,h 

b bλ,h , b λ,h + V4,h θ ξ λ,h = V3,h η αh ∈ (0, 1], βh ≥ 0, and the matrices V3,h and V4,h are given in (111) in which   e 0 T1 exp {T2 b Wλ,h =C ξ λ,h } T2 ˆ λ dg

has been substituted for Wλ . The values of αh and βh are chosen to ensure that SSE decreases at bλ,h+1 , the value of the corresponding implicit parameter, η b λ,h+1 , is each iteration. Given θ obtained by solving n  o bλ,h+1 e 0 T1 exp T2 V3,h η b λ,h+1 + V4,h θ C = 0. λ

107.3.3

Solving for ηλ

The issue in this section is to solve n  o bλ e 0 T1 exp T2 V3 η b C + V θ = 0, 4 λ λ bλ is a fixed vector, C e λ = Cλ − 1p p−1 c 0 , and V3 and V4 are fixed semiorthogonal bλ , where θ for η λ b λ,0 = V30 b matrices. To solve this equation, first set η ξλ , where b ξλ is the current estimate of ξλ . A b λ,j+1 is modified Newton update for η  b λ,j+1 = η b λ,j − αj η

−1 h n oi o n 0 b e e 0 T1 exp T2 b Cλ T1 exp T2 ξ λ,j T2 V3 C ξ λ,j , λ dg

bλ . b λ,j + V4 θ where αj ∈ (0, 1], and b ξ λ,j = V3 η

107.4

Parameterization (2b): λ = p

107.4.1

Derivatives of λ and

ln (λ)



T1 exp {T2 ξλ } 10p T1 exp {T2 ξλ }

with Respect to



, Cλ0 ln (λ) = cλ

ξλ

Using Theorems 101 and 102, it is readily shown that first, second, and third derivatives of λ and ln (λ) with respect to ξλ can be written as follows: (1)

Dλ;ξ0 = pw (Ip − Hλ ) T1 Dξλ T2 , λ

(1)

(1)

Dln (λ);ξ0 = Λ−1 λ Dλ;ξ0 , λ

λ

  0 (2) (1) Dλ;ξ0 ,ξ0 = pw (Ip − Hλ ) T1 Dξλ (T02 ∗ T02 ) − w u021 ⊗ Dλ;ξ0 2Nq2 , λ

λ

λ

Details49 0  (1)0 (1)0 (2) (2) −2 , − Λ ∗ D D Dln (λ);ξ0 ,ξ0 = Λ−1 D 0 0 0 0 λ λ λ;ξ λ;ξ λ;ξ ,ξ λ

λ

λ

λ

λ

λ

0

(3)

Dλ;ξ0 ,ξ0 ,ξ0 = pw (Ip − Hλ ) T1 Dξλ (T02 ∗ T02 ∗ T02 ) λ

λ

λ

    (1) (2) −w Dλ;ξ0 ,ξ0 ⊗ u021 Jq2 − w u022 ⊗ Dλ;ξ0 Jq2 , and λ

λ

λ

0 0   (1)0 (1)0 (2)0 (1)0 (1)0 (3) −2 Jq 2 ∗ D − Λ ∗ D D ∗ D D Dln (λ);ξ0 ,ξ0 ,ξ0 = 2Λ−3 0 0 0 0 0 0 λ λ λ;ξ λ;ξ λ;ξ ,ξ λ;ξ λ;ξ λ

λ

λ

λ

λ

λ

λ

λ

λ

(3)

+Λ−1 λ Dλ;ξ0 ,ξ0 ,ξ0 . λ

107.4.2

λ

λ

Initial Guess for ξλ

Define Wλ as

0 −1 Wλ def = pwCλ Λλ (Ip − Hλ ) T1 Dξλ T2 .

(118)

It is assumed that Cλ has been chosen such that the p × qλ matrix ∂ C 0 ln (λ) ∂ξλ0 λ has full row-rank. It follows that Wλ has full row-rank. Write Wλ as SVD (Wλ ) = Uλ Dλ Vλ0 , where Vλ has been partitioned as in (111). Then, ξλ can be written in as ξλ = V3 η λ + V4 θ λ where η λ = V30 ξλ , and θλ = V40 ξλ . Furthermore, νλ def = dim1 (θλ ) = q2 − qλ and for fixed V3 and V4 , the first derivative of λ with respect to θλ is (1)

Dλ;θ0 = wp (Ip − Hλ ) T1 Dξλ T2 V4 . λ

If Cλ in parameterization (2b) of Table 103 is empty, then qλ = 0, V4 = Iq2 , νλ = q2 , and the derivative of λ simplifies to (1)

Dλ;θ0 = wp (Ip − Hλ ) T1 Dξλ T2 . λ

Denote the vector of ordered eigenvalues of R by `, where `1 ≥ `2 ≥ · · · ≥ `p > 0 and R is the sample correlation matrix. Let `0 be the vector defined in (112). Write λ as λ = λ(ξλ ), where λ(ξλ ) = p

T1 exp {T2 ξλ } . 10p T1 exp {T2 ξλ }

A three-stage procedure is used to obtain an initial guess for ξλ . The first two stages are identical to the first two stages in parameterization (2a) of Table 103. If λ is subject to no additional constraints, then the initial guess for ξλ is is the stage two estimate b ξ λ . If λ is subject to Cλ0 ln (λ) = cλ , then the third stage is executed. In this stage, the initial guess for θ λ is the obtained as the minimizer of 0 b+ SSE(θ λ ) = [`0 − λ (ξλ )] Ω [` − λ (ξλ )] `0 0

with respect to θ λ , where ξλ = V3 η λ + V4 θ λ ,

Details50 η λ satisfies Cλ0

  T1 exp {T2 (V3 η λ + V4 θ λ )} = cλ ln p 0 1p T1 exp {T2 (V3 η λ + V4 θ λ )}

for fixed θ λ , and the matrices V3 and V4 are given in (111) in which Wλ is defined in (118). At iteration h + 1, the estimate of θ λ is updated as  −1 bλ,h+1 = θ bλ,h + αh X0 Ω b h ), where b + X h + βh I ν b + (`0 − λ θ X0h Ω h ` 2 `0 0

  bh = λ b λ ξ λ,h ,

  ξ λ,h } T2 V4,h  T1 exp {T2 b dg o n , Xh = p Ip − Hλ,h ˆ 0 b 1p T1 exp T2 ξ λ,h 

b bλ,h , αh ∈ (0, 1], βh ≥ 0, the matrices V3,h and V4,h are given in (111) in b λ,h + V4,h θ ξ λ,h = V3,h η which i   h  −1 T2 Wλ,h = pwC ˆ λ0 Λλ,h Ip − Hλ,h T1 exp T2 b ξ λ,h ˆ ˆ ˆ dg

 n o−1 has been substituted for Wλ , and w ˆ = 10p T1 exp T1 b ξ λ,h . The values of αh and βh are bλ,h+1 , the value of the chosen to ensure that SSE decreases at each iteration. Given θ b λ,h+1 , is obtained by solving corresponding implicit parameter, η n  o   bλ,h+1 b λ,h+1 + V4,h θ T1 exp T2 V3,h η n  o  = cλ . Cλ0 ln p bλ,h+1 b λ,h+1 + V4,h θ 10p T1 exp T2 V3,h η

107.4.3

Solving for ηλ

The issue in this section is to solve 

n  o  bλ b λ + V4 θ T1 exp T2 V3 η n  o  = cλ , Cλ0 ln p bλ b λ + V4 θ 10p T1 exp T2 V3 η

bλ is a fixed vector, and V3 and V4 are fixed semiorthogonal matrices. To solve this bλ , where θ for η b λ,0 = V30 b equation, first set η ξλ , where b ξλ is the current estimate of ξλ . A modified Newton update b λ,j+1 is for η b λ,j+1 = η b λ,j η   −αj pCλ0 Λ−1 j

  −1 b T exp {T2 ξ λ,j } T V h   i 1 2 3   dg b j − cλ , n o Ip − Hλj Cλ0 ln λ  10p T1 exp T2 b ξ λ,j

where αj ∈ (0, 1],

  bj Λj = λ , dg

  bj = λ b λ ξ λ,j ,

b j p−1 10 , and b bλ . b λ,j + V4 θ Hλj = λ ξ λ,j = V3 η p

Details51

107.5

Parameterization (2c): λ = p



T1 exp {T2 ξλ } 10p T1 exp {T2 ξλ }



,

 Cλ0 ln λ/|Λλ |1/p = cλ

107.5.1

Derivatives of λ and

ln λ/|Λλ |1/p



with Respect to

ξλ

Derivatives of ln (λ/|Λλ |1/p ) can be obtained from derivatives of ln (λ) by using the relation ln (λ/|Λλ |1/p ) = (Ip − H1 ) ln (λ). Using Theorem 101 and 102, it is readily shown that first, second, and third derivatives of λ and ln (λ/|Λλ |1/p ) with respect to ξλ can be written as follows: (1)

Dλ;ξ0 = pw (Ip − Hλ ) T1 Dξλ T2 , λ

(1)

(1)

Dln (λ/Φ1/p );ξ0 = (Ip − H1 ) Λ−1 λ Dλ;ξ0 , λ

λ

  0 (1) (2) Dλ;ξ0 ,ξ0 = pw (Ip − Hλ ) T1 Dξλ (T02 ∗ T02 ) − w u021 ⊗ Dλ;ξ0 2Nq2 , λ

λ

λ

  0  (2) (2) (1)0 (1)0 −2 Dln (λ/Φ1/p );ξ0 ,ξ0 = (Ip − H1 ) Λ−1 D − Λ D ∗ D , 0 0 0 0 λ λ λ;ξ ,ξ λ;ξ λ;ξ λ

λ

λ

λ

λ

λ

0

(3)

Dλ;ξ0 ,ξ0 ,ξ0 = pw (Ip − Hλ ) T1 Dξλ (T02 ∗ T02 ∗ T02 ) λ

λ

λ

    (2) (1) −w Dλ;ξ0 ,ξ0 ⊗ u021 Jq2 − w u022 ⊗ Dλ;ξ0 Jq2 , and λ

λ

λ

0 h  (3) (1)0 (1)0 (1)0 Dln (λ/Φ1/p );ξ0 ,ξ0 ,ξ0 = (Ip − H1 ) 2Λ−3 ∗ D ∗ D D 0 0 0 λ λ;ξ λ;ξ λ;ξ λ

λ

λ

λ

λ

λ

 0 i (2)0 (1)0 (3) −Λ−2 Dλ;ξ0 ,ξ0 ∗ Dλ;ξ0 Jq2 + Λ−1 λ λ Dλ;ξ0 ,ξ0 ,ξ0 . λ

107.5.2

λ

λ

λ

λ

λ

Initial Guess for ξλ

Define Wλ as

−1 0 Wλ def = Cλ (Ip − H1 ) Λλ T1 Dξλ T2 .

(119)

It is assumed that Cλ has been chosen such that the p × qλ matrix ! λ ∂ 0 C ln 1 ∂ξλ0 λ |Λλ | p has full row-rank. It follows that Wλ has full row-rank. Write Wλ as SVD (Wλ ) = Uλ Dλ Vλ0 , where Vλ has been partitioned as in (111). Then, ξλ can be written in as ξλ = V3 η λ + V4 θ λ where η λ = V30 ξλ , and θλ = V40 ξλ . Furthermore, νλ def = dim1 (θλ ) = q2 − qλ and for fixed V3 and V4 , the first derivative of λ with respect to θλ is (1)

Dλ;θ0 = wp (Ip − Hλ ) T1 Dξλ T2 V4 . λ

Details52 If Cλ in parameterization (2c) of Table 103 is empty, then qλ = 0, V4 = Iq2 , νλ = q2 , and the derivative of λ simplifies to (1)

Dλ;θ0 = wp (Ip − Hλ ) T1 Dξλ T2 . λ

Denote the vector of ordered eigenvalues of R by `, where `1 ≥ `2 ≥ · · · ≥ `p > 0 and R is the sample correlation matrix. Let `0 be the vector defined in (112). Write λ as λ = λ(ξλ ), where λ(ξλ ) = p

T1 exp {T2 ξλ } . 10p T1 exp {T2 ξλ }

A three-stage procedure is used to obtain an initial guess for ξλ . The first two stages are identical to the first two stages in parameterization (2a) of Table 103. If λ is subject to no additional constraints, then the initial guess for ξλ is the stage two  estimate, b ξ λ . If λ is subject to Cλ0 ln λ/|Λλ |1/p = cλ , then the third stage is executed. In this stage, the initial guess for θ λ is the obtained as the minimizer of 0 b+ SSE(θ λ ) = [`0 − λ (ξλ )] Ω [` − λ (ξλ )] `0 0

with respect to θ λ , where ξλ = V3 η λ + V4 θ λ , η λ satisfies Cλ0

  T1 exp {T2 (V3 η λ + V4 θ λ )} = cλ (Ip − H1 ) ln p 0 1p T1 exp {T2 (V3 η λ + V4 θ λ )}

for fixed θ λ , H1 = ppo(1p ), and the matrices V3 and V4 are given in (111) in which Wλ is defined in (119). At iteration h + 1, the estimate of θ λ is updated as  −1 bλ,h+1 = θ bλ,h + αh X0 Ω b h ), where b + X h + βh I ν b + (`0 − λ θ X0h Ω h ` 2 `0 0

  bh = λ b λ ξ λ,h ,

  ξ λ,h } T2 V4,h  T1 exp {T2 b dg n o , Xh = p Ip − Hλ,h ˆ 0 b 1p T1 exp T2 ξ λ,h 

b bλ,h , αh ∈ (0, 1], βh ≥ 0, and the matrices V3,h and V4,h are given in (111) b λ,h + V4,h θ ξ λ,h = V3,h η in which  n o −1 b T2 = Cλ0 (Ip − H1 ) Λλ,h Wλ,h ˆ ˆ T1 exp T2 ξ λ,h dg

has been substituted for Wλ . The values of αh and βh are chosen to ensure that SSE decreases at bλ,h+1 , the value of the corresponding implicit parameter, η b λ,h+1 , is each iteration. Given θ obtained by solving n  o   bλ,h+1 b λ,h+1 + V4,h θ T1 exp T2 V3,h η n  o  = cλ . Cλ0 (Ip − H1 ) ln p bλ,h+1 b λ,h+1 + V4,h θ 10p T1 exp T2 V3,h η

107.5.3

Solving for ηλ

The issue in this section is to solve n  o  bλ b λ + V4 θ T1 exp T2 V3 η n  o  = cλ , Cλ0 (Ip − H1 ) ln p bλ b λ + V4 θ 10p T1 exp T2 V3 η 

Details53 bλ is a fixed vector, H1 = 1p p−1 10 , and V3 and V4 are fixed semiorthogonal bλ , where θ for η p b λ,0 = V30 b matrices. To solve this equation, first set η ξλ , where b ξλ is the current estimate of ξλ . A b λ,j+1 is modified Newton update for η b λ,j+1 = η b λ,j η   −1 T1 exp {T2 b ξ λ,j } T2 V3 h   i   dg 0 b n o C (I − H ) ln λ − c −αj pCλ0 (Ip − H1 ) Λ−1  1 j λ , λ p j 10p T1 exp T2 b ξ λ,j 

where αj ∈ (0, 1],

  bj = λ b λ ξ λ,j ,

  bj Λj = λ , dg

bλ . b λ,j + V4 θ H1 = ppo (1p ) , and b ξ λ,j = V3 η

107.6

Parameterization (3a): λ = p

107.6.1

Derivatives of λ and

ln (λ)



exp [T1 exp {T2 ξλ }] exp [T1 exp {T2 ξλ }]



10p

with Respect to

, Cλ0 ln (λ) = cλ

ξλ

Using Theorems 101 and 102, it is readily shown that first, second, and third derivatives of λ and ln (λ) with respect to ξλ can be written as follows: (1)

Dλ;ξ0 = (Ip − Hλ ) Λλ T1 Dξλ T2 , λ

(1)

(1)

Dln (λ);ξ0 = Λ−1 λ Dλ;ξ0 , λ

λ

(2)

Dλ;ξ0 ,ξ0 = (Ip − Hλ ) Λλ T02 Dξλ T01 ∗ T02 Dξλ T01 λ

0

λ

0

+ (Ip − Hλ ) Λλ T1 Dξλ (T02 ∗ T02 )

  (1) −p−1 u031 ⊗ Dλ;ξ0 2Nq2 , λ

 0 (2) (2) (1)0 (1)0 −2 Dln (λ);ξ0 ,ξ0 = Λ−1 D − Λ D ∗ D , 0 0 0 0 λ λ λ;ξ ,ξ λ;ξ λ;ξ λ

λ

λ

λ

λ

λ

(3)

Dλ;ξ0 ,ξ0 ,ξ0 = (Ip − Hλ ) Λλ T02 Dξλ T01 ∗ T02 Dξλ T01 ∗ T02 Dξλ T01 λ

λ

0

λ

 0 + (Ip − Hλ ) Λλ (T02 ∗ T02 ) Dξλ T01 ∗ T02 Dξλ T01 Jq2 0

+ (Ip − Hλ ) Λλ T1 Dξλ (T02 ∗ T02 ∗ T02 )

      (2) (1) (1) −p−1 Dλ;ξ0 ,ξ0 ⊗ u031 Jq2 − p−1 v0 ⊗ Dλ;ξ0 Jq2 − p−1 u032 ⊗ Dλ;ξ0 Jq2 and λ

λ

λ

λ

 0  0 (3) (1)0 (1)0 (1)0 (2)0 (1)0 −2 Dln (λ);ξ0 ,ξ0 ,ξ0 = 2Λ−3 D ∗ D ∗ D − Λ D ∗ D Jq 2 0 0 0 0 0 0 λ λ λ;ξ λ;ξ λ;ξ λ;ξ ,ξ λ;ξ λ

λ

λ

λ

λ

λ

λ

(3)

+Λ−1 λ Dλ;ξ0 ,ξ0 ,ξ0 . λ

λ

λ

λ

λ

Details54

107.6.2

Initial Guess for ξλ

Define Dξλ as in Table 109 and define Wλ as 0 0 Wλ def = Cλ (Ip − Hλ ) T1 Dξλ T2 .

(120)

It is assumed that Cλ has been chosen such that the p × qλ matrix ∂ C 0 ln (λ) ∂ξλ0 λ has full row-rank. It follows that Wλ has full row-rank. Write Wλ as SVD (Wλ ) = Uλ Dλ Vλ0 , where Vλ has been partitioned as in (111). Then, ξλ can be written in as ξλ = V3 η λ + V4 θ λ where η λ = V30 ξλ , and θλ = V40 ξλ . Furthermore, νλ def = dim1 (θλ ) = q2 − qλ and for fixed V3 and V4 , the first derivative of λ with respect to θλ is (1)

Dλ;θ0 = (Ip − Hλ ) Λλ T1 Dξλ T2 V4 . λ

If Cλ in parameterization (3a) of Table 103 is empty, then qλ = 0, V4 = Iq2 , νλ = q2 , and the derivative of λ simplifies to (1) Dλ;θ0 = (Ip − Hλ ) Λλ T1 Dξλ T2 . λ

Denote the vector of ordered eigenvalues of R by `, where `1 ≥ `2 ≥ · · · ≥ `p > 0 and R is the sample correlation matrix. Let `0 be the vector defined in (112). Write λ as λ = λ(ξλ ), where λ(ξλ ) = p

exp (T1 exp {T2 ξλ }) . 10p exp (T1 exp {T2 ξλ })

A three-stage procedure is used to obtain an initial guess for ξλ . The first stage employs the Gauss-Newton algorithm to minimize 0

SSE(ω) = [`0 − λ(1q2 ω)] [`0 − λ(1q2 ω)] with respect to the scalar ω, where the initial guess for ω is zero. At iteration h + 1, the estimate of ω is updated by ω bh+1 = ω bh + αh (x0h xh ) b h = λ (1q ω λ b ), 2 h

−1

b h ), where x0h (`0 − λ

  Λλ,h bh })dg T2 1q2 , xh = Ip − Hλ,h ˆ ˆ T1 (exp {T2 1q2 ω

and αh ∈ (0, 1]. The second stage employs the Gauss-Newton algorithm to minimize 0 b+ SSE(ξλ ) = [`0 − λ(ξλ )] Ω [` − λ(ξλ )] `0 0

b + is given in (115), the initial guess for ξλ is 1q ω with respect to ξλ , where Ω b , and ω b is the 2 `0 minimizer from stage one. At iteration h + 1, the estimate is updated by  −1 b b h ), where b + X h + βh I q b + (`0 − λ ξ λ,h+1 = b ξ λ,h + αh X0h Ω X0h Ω 2 `0 `0   bh = λ b λ ξ λ,h ,

    b Λλ,h T2 , Xh = Ip − Hλ,h ˆ ˆ T1 exp {T2 ξ λ,h } dg

Details55 αh ∈ (0, 1], and βh ≥ 0. The values of αh and βh are chosen to ensure that SSE decreases at each iteration. If λ is subject to no additional constraints, then the initial guess for ξλ is the stage two estimate, b ξ λ . If λ is subject to Cλ0 ln (λ) = cλ , then the third stage is executed. In this stage, the initial guess for θ λ is the obtained as the minimizer of 0 b+ SSE(θ λ ) = [`0 − λ (ξλ )] Ω [` − λ (ξλ )] `0 0

with respect to θ λ , where ξλ = V3 η λ + V4 θ λ , η λ satisfies    Cλ0 T1 exp {T2 (V3 η λ + V4 θ λ )} − 1p ln p−1 10p exp [T1 exp {T2 (V3 η λ + V4 θ λ )}] = cλ for fixed θ λ , and the matrices V3 and V4 are given in (111) in which Wλ is defined in (120). At iteration h + 1, the estimate of θ λ is updated as  −1 bλ,h+1 = θ bλ,h + αh X0 Ω b h ), where b + X h + βh I ν b + (`0 − λ θ X0h Ω h ` 2 `0 0   bh = λ b λ ξ λ,h ,

    b Xh = Ip − Hλ,h Λ T exp {T ξ } T2 V4,h , ˆ ˆ 1 2 λ,h λ,h dg

b bλ,h , αh ∈ (0, 1], βh ≥ 0, and the matrices V3,h and V4,h are given in (111) b λ,h + V4,h θ ξ λ,h = V3,h η in which  0  n o def 0 Wλ,h T1 exp T2 b ξ λ,h T2 ˆ ˆ = Cλ Ip − Hλ,h dg

is substituted for Wλ . The values of αh and βh are chosen to ensure that SSE decreases at each bλ,h+1 , the value of the corresponding implicit parameter, η b λ,h+1 , is obtained by iteration. Given θ solving h   Cλ0 T1 exp T2 V3,h η λ,h+1 + V4,h θ λ,h+1      i −1p ln p−1 10p exp T1 exp T2 V3,h η λ,h+1 + V4,h θ λ,h+1 = cλ .

107.6.3

Solving for ηλ

The issue in this section is to solve h n o n h n oioi Cλ0 T1 exp T2 b ξλ − 1p ln p−1 10p exp T1 exp T2 b ξλ = cλ bλ , where for η b bλ , b λ + V4 θ ξλ = V3 η bλ is a fixed vector, and V3 and V4 are fixed semiorthogonal matrices. To solve this equation, first θ b λ,0 = V30 b b λ,j+1 is set η ξλ , where b ξλ is the current estimate of ξλ . A modified Newton update for η  b λ,j+1 = η b λ,j − αj η where αj ∈ (0, 1],

−1 h  0 h n oi i b j ) − cλ , Cλ0 Ip − Hλ,j T1 exp T2 b ξ λ,j T2 V3 Cλ0 ln(λ ˆ dg

  bj = λ b λ ξ λ,j ,

b −1 10 , and b bλ . b λ,j + V4 θ Hλ,j ξ λ,j = V3 η ˆ = λj p p

Details56

107.7

Parameterization (3b): λ = p



exp [T1 exp {T2 ξλ }] 10p exp [T1 exp {T2 ξλ }]



,

 Cλ0 ln λ/|Λλ |1/p ) = cλ

107.7.1

Derivatives of λ and

ln λ/|Λλ |1/p



with Respect to

ξλ

Derivatives of ln (λ/|Λλ |1/p ) can be obtained from derivatives of ln (λ) by using the relation ln (λ/|Λλ |1/p ) = (Ip − H1 ) ln (λ). Using Theorems 101 and 102, it is readily shown that first, second, and third derivatives of λ and ln (λ/|Λλ |1/p ) with respect to ξλ can be written as follows: (1)

Dλ;ξ0 = (Ip − Hλ ) Λλ T1 Dξλ T2 , λ

(1)

(1)

Dln (λ/Φ1/p );ξ0 = (Ip − H1 ) Λ−1 λ Dλ;ξ0

λ

λ

= (Ip − H1 ) T1 Dξλ T2 = T1 Dξλ T2 , (2)

Dλ;ξ0 ,ξ0 = (Ip − Hλ ) Λλ T02 Dξλ T01 ∗ T02 Dξλ T01 λ

λ

0

0

+ (Ip − Hλ ) Λλ T1 Dξλ (T02 ∗ T02 )

  (1) −p−1 u031 ⊗ Dλ;ξ0 2Nq2 , λ

 0 (2) (2) (1)0 (1)0 −2 Dln (λ/Φ1/p );ξ0 ,ξ0 = (Ip − H1 ) Λ−1 Dλ;ξ0 ∗ Dλ;ξ0 λ Dλ;ξ0 ,ξ0 − (Ip − H1 ) Λλ λ

λ

λ

λ

λ

λ

0

= (Ip − H1 ) T1 Dξλ (T02 ∗ T02 ) 0

= T1 Dξλ (T02 ∗ T02 ) , (3)

Dλ;ξ0 ,ξ0 ,ξ0 = (Ip − Hλ ) Λλ T02 Dξλ T01 ∗ T02 Dξλ T01 ∗ T02 Dξλ T01 λ

λ

0

λ

 0 + (Ip − Hλ ) Λλ (T02 ∗ T02 ) Dξλ T01 ∗ T02 Dξλ T01 Jq2 0

+ (Ip − Hλ ) Λλ T1 Dξλ (T02 ∗ T02 ∗ T02 )

      (2) (1) (1) −p−1 Dλ;ξ0 ,ξ0 ⊗ u031 Jq2 − p−1 v0 ⊗ Dλ;ξ0 Jq2 − p−1 u032 ⊗ Dλ;ξ0 Jq2 , and λ

λ

λ

λ

 0 (1)0 (1)0 (1)0 (3) Dln (λ/Φ1/p );ξ0 ,ξ0 ,ξ0 = 2 (Ip − H1 ) Λ−3 D ∗ D ∗ D 0 0 0 λ λ;ξ λ;ξ λ;ξ λ

λ

λ

λ

λ

λ

 0 (2)0 (1)0 (3) − (Ip − H1 ) Λ−2 D ∗ D Jq2 + (Ip − H1 ) Λ−1 0 0 0 λ λ Dλ;ξ0 ,ξ0 ,ξ0 λ;ξ ,ξ λ;ξ λ

λ

λ

λ

0

= (Ip − H1 ) T1 Dξλ (T02 ∗ T02 ∗ T02 ) 0

= T1 Dξλ (T02 ∗ T02 ∗ T02 ) .

λ

λ

Details57 The simplified expression for the derivatives of ln (λ/|Λλ |1/p ) can be verified by using 10p T1 = 0 and either (Ip − H1 ) Λ−1 λ (Ip − Hλ ) Λλ = (Ip − H1 ) or  ln

107.7.2

λ |Λλ |1/p

 = (Ip − H1 ) T1 exp {T2 ξλ } = T1 exp {T2 ξλ } .

Initial Guess for ξλ

Define Wλ as

0 Wλ def = Cλ (Ip − H1 ) T1 Dξλ T2 .

(121)

It is assumed that Cλ has been chosen such that the p × qλ matrix ! λ ∂ 0 C ln 1 ∂ξλ0 λ |Λλ | p has full row-rank. It follows that Wλ has full row-rank. Write Wλ as SVD (Wλ ) = Uλ Dλ Vλ0 , where Vλ has been partitioned as in (111). Then, ξλ can be written in as ξλ = V3 η λ + V4 θ λ where η λ = V30 ξλ , and θλ = V40 ξλ . Furthermore, νλ def = dim1 (θλ ) = q2 − qλ and for fixed V3 and V4 , the first derivative of λ with respect to θλ is (1)

Dλ;θ0 = (Ip − Hλ ) Λλ T1 Dξλ T2 V4 . λ

If Cλ in parameterization (3b) of Table 103 is empty, then qλ = 0, V4 = Iq2 , νλ = q2 , and the derivative of λ simplifies to (1) Dλ;θ0 = (Ip − Hλ ) Λλ T1 Dξλ T2 . λ

Denote the vector of ordered eigenvalues of R by `, where `1 ≥ `2 ≥ · · · ≥ `p > 0 and R is the sample correlation matrix. Let `0 be the vector defined in (112). Write λ as λ = λ(ξλ ), where λ(ξλ ) = p

exp (T1 exp {T2 ξλ }) . 10p exp (T1 exp {T2 ξλ })

A three-stage procedure is used to obtain an initial guess for ξλ . The first two stages are identical to the first two stages in parameterization (3a) of Table 103. If λ is subject to no additional constraints, then the initial guess for ξλ is the stage two  estimate, b ξ λ . If λ is subject to Cλ0 ln λ/|Λλ |1/p = cλ , then the third stage is executed. In this stage, the initial guess for θ λ is the obtained as the minimizer of 0 b+ SSE(θ λ ) = [`0 − λ (ξλ )] Ω [` − λ (ξλ )] `0 0

with respect to θ λ , where ξλ = V3 η λ + V4 θ λ , η λ satisfies

Cλ0 T1 exp {T2 (V3 η λ + V4 θ λ )} = cλ

Details58 for fixed θ λ , and the matrices V3 and V4 are given in (111) in which Wλ is defined in (121). At iteration h + 1, the estimate of θ λ is updated as  −1 bλ,h+1 = θ bλ,h + αh X0 Ω b h ), where b+ b + (`0 − λ θ X0h Ω h ` Xh + βh Iν2 `0 0   bh = λ b λ ξ λ,h ,

    b Xh = Ip − Hλ,h Λλ,h T2 V4,h , ˆ ˆ T1 exp {T2 ξ λ,h } dg

b bλ,h , αh ∈ (0, 1], βh ≥ 0, the matrices V3,h and V4,h are given in (111) in b λ,h + V4,h θ ξ λ,h = V3,h η which o o  n  n def 0 T2 T2 = Cλ0 T1 exp T2 b ξ λ,h Wλ,h ξ λ,h ˆ = Cλ (Ip − H1 ) T1 exp T2 b dg

dg

has been substituted for Wλ , and H1 = ppo(1p ). The values of αh and βh are chosen to ensure bλ,h+1 , the value of the corresponding implicit that SSE decreases at each iteration. Given θ b λ,h+1 , is obtained by solving parameter, η n  o bλ,h+1 b λ,h+1 + V4,h θ Cλ0 (Ip − H1 ) T1 exp T2 V3,h η = cλ .

107.7.3

Solving for ηλ

The issue in this section is to solve n  o bλ b λ + V4 θ Cλ0 (Ip − H1 ) T1 exp T2 V3 η = cλ , bλ is a fixed vector, H1 = 1p p−1 10 , and V3 and V4 are fixed semiorthogonal bλ , where θ for η p matrices. Note that the constraint simplifies to n  o bλ b λ + V4 θ Cλ0 T1 exp T2 V3 η = cλ b λ,0 = V30 b because 10p T1 = 0. To solve this equation, first set η ξλ , where b ξλ is the current estimate of b λ,j+1 is ξλ . A modified Newton update for η  b λ,j+1 = η b λ,j − αj η

−1 h h n oi o i n Cλ0 T1 exp T2 b ξ λ,j T2 V3 Cλ0 T1 exp T2 b ξ λ,j − cλ , dg

bλ . b λ,j + V4 θ where αj ∈ (0, 1], and b ξ λ,j = V3 η

107.8

Parameterization (4): 1p + T1 exp {T2 ξλ }, Cλ0 λ = cλ

Under certain conditions, parameterization (4) has an alternative representation. Specifically, if 1q1 ∈ R(T2 ), then  T2 = 1q1 T∗2 B for some nonsingular B, where dim(T∗2 ) = q1 × (q2 − 1),

∗ R(T2 ) = R [(Iq1 − Hq1 ) T2 ] , and Hq1 = ppo(1q1 ).

Details59 Accordingly, T2 ξλ = 1q1

 T∗2 Bξλ = 1q1

 T∗2 ξ ∗λ where ξ ∗λ =



∗ ξλ,1 ξ ∗λ,2

 = Bξλ

∗ =⇒ T2 ξλ = 1q1 ξλ,1 + T∗2 ξ ∗λ,2

 ∗ =⇒ λ = 1p + T1 exp 1q1 ξλ,1 + T∗2 ξ ∗λ,2   ∗ ∗∗ ∗∗ = 1p + ξλ,1 T1 exp T∗2 ξ ∗λ,2 , where ξλ,1 = exp ξλ,1 .

107.8.1

Derivatives of λ with Respect to

ξλ

Using Theorem 101, it is readily shown that first, second, and third derivatives of λ with respect to ξλ can be written as follows: (1)

Dλ;ξ0 = T1 Dξλ T2 , λ

0

(2)

Dλ;ξ0 ,ξ0 = T1 Dξλ (T02 ∗ T02 ) , and λ

λ

0

(3)

Dλ;ξ0 ,ξ0 ,ξ0 = T1 Dξλ (T02 ∗ T02 ∗ T02 ) . λ

107.8.2

λ

λ

Initial Guess for ξλ

Define Wλ as

0 Wλ def = Cλ T1 Dξλ T2 .

(122)

It is assumed that Cλ has been chosen such that the qλ × q2 matrix ∂ C0λ ∂ξλ0 λ has full row-rank. It follows that Wλ has full row-rank. Write Wλ as SVD (Wλ ) = Uλ Dλ Vλ0 , where Vλ has been partitioned as in (111). Then, ξλ can be written in as ξλ = V3 η λ + V4 θ λ where η λ = V30 ξλ , and θλ = V40 ξλ . Furthermore, νλ def = dim1 (θλ ) = q2 − qλ and for fixed V3 and V4 , the first derivative of λ with respect to θλ is (1)

Dλ;θ0 = T1 Dξλ T2 V4 . λ

If Cλ in parameterization (4) of Table 103 is empty, then qλ = 0, V4 = Iq2 , νλ = q2 , and the derivative of λ simplifies to (1)

Dλ;θ0 = T1 Dξλ T2 . λ

Denote the vector of ordered eigenvalues of R by `, where `1 ≥ `2 ≥ · · · ≥ `p > 0 and R is the sample correlation matrix. Let `0 be the vector defined in (112). Write λ as λ = λ(ξλ ), where λ(ξλ ) = 1p + T1 exp {T2 ξλ } .

Details60 A three-stage procedure is used to obtain an initial guess for ξλ . In the first stage, expand the exponential function around ξλ = 0 to obtain λ(ξλ ) ≈ 1p + T1 (1q1 + T2 ξλ ) = 1p + T1 1q1 + T1 T2 ξλ . Then, choose ξλ to minimize 0 b+ SSE(ξλ ) = (`0 − 1p − T1 1q1 − T1 T2 ξλ ) Ω (` − 1p − T1 1q1 − T1 T2 ξλ ) , `0 0

b + is given in (115). The solution is where Ω `0  −1 b b + T1 T2 b + (`0 − 1p − T1 1q ) . ξλ = T02 T01 Ω T02 T01 Ω 1 `0 `0 In the second stage, the quantity b + [`0 − λ(ξλ )] SSE(ξλ ) = [`0 − λ(ξλ )]0 Ω `0 is minimized with respect to ξλ , where the initial guess for ξλ is the stage one estimate, b ξλ . Using a Gauss-Newton method, the estimate of ξλ is updated at iteration h + 1 by   −1  b b h , where b + X h + βh I q b + `0 − λ ξ λ,h+1 = b ξ λ,h + αh X0h Ω X0h Ω 2 `0 `0   bh = λ b λ ξ λ,h ,

  Xh = T1 exp {T2 b ξ λ,h } T2 . dg

αh ∈ (0, 1], and βh ≥ 0. The values of αh and βh are chosen to ensure that SSE decreases at each iteration. If λ is subject to no additional constraints, then the initial guess for ξλ is the stage two estimate, b ξ λ . If λ is subject to Cλ0 λ = cλ , then the third stage is executed. In this stage, the initial guess for θλ is obtained as the minimizer of 0 b+ SSE(θλ ) = [`0 − λ (ξλ )] Ω [` − λ (ξλ )] `0 0

with respect to θλ , where ξλ = V3 η λ + V4 θλ , η λ satisfies

Cλ0 [T1 exp {T2 (V3 η λ + V4 θλ )} + 1p ] − c1 = 0

for fixed θ λ , and the matrices V3 and V4 are given in (111) in which Wλ is defined in (122). At iteration h + 1, the estimate of θλ is updated as  −1 bλ,h+1 = θ bλ,h + αh X0 Ω b h ), where b+ b + (`0 − λ θ X0h Ω h ` Xh + βh Iν2 `0 0   bh = λ b λ ξ λ,h ,

  Xh = T1 exp {T2 b ξ λ,h } T2 V4,h , dg

b bλ,h , αh ∈ (0, 1], βh ≥ 0, and the matrices V3,h and V4,h are given in (111) b λ,h + V4,h θ ξ λ,h = V3,h η in which   Wλ,h = Cλ0 T1 exp {T2 b ξ λ,h } T2 ˆ dg

has been substituted for Wλ . The values of αh and βh are chosen to ensure that SSE decreases at bλ,h+1 , the value of the corresponding implicit parameter, η b λ,h+1 , is each iteration. Given θ obtained by solving h n  o i bλ,h+1 b λ,h+1 + V4,h θ Cλ0 T1 exp T2 V3,h η + 1p − cλ = 0.

Details61

107.8.3

Solving for ηλ

The issue in this section is to solve h n  o i bλ b λ + V4 θ Cλ0 T1 exp T2 V3 η + 1p − cλ = 0, bλ is a fixed vector, and V3 and V4 are fixed semiorthogonal matrices. To solve this bλ , where θ for η b λ,0 = V30 b equation, first set η ξλ , where b ξλ is the current estimate of ξλ . A modified Newton update b λ,j+1 is for η b λ,j+1 = η b λ,j η

 −1 n  n o h n o i o 0 b Cλ0 T1 exp T2 b − αj Cλ T1 exp T2 ξ λ,j T2 V3 ξ λ,j + 1p − cλ , dg

bλ . b λ,j + V4 θ where αj ∈ (0, 1], and b ξ λ,j = V3 η

108

Additional Eigenvalue Structures

108.1

Bendel & Mickey (1978)

Bendel and Mickey’s (1978) geometric structure and shifted geometric structure can be written as  p (1 − β) if β ∈ (0, 1), 1 − βp (a) λi = α β i−1 , where β ∈ (0, 1], and α = and  1 if β = 1;  (123) p (1 − δ)(1 − β) if β ∈ (0, 1), 1 − βp (b) λi = α β i−1 + δ, where β ∈ (0, 1], α =  1−δ if β = 1, and δ ∈ (0, 1) is a lower bound on the p − 1 smallest eigenvalues. Bendel and Mickey (1978) used (123) to generate eigenvalues, but they did not fit the structures in (123) to observed data. Structure (123a) is a special case of structure (2a) in Table 103 in which Cλ and cλ are empty,   0 p−1 and x1 def (124) T1 = Ip , and T2 = z1 , where z1 def x − 1 = 0 1 ··· p − 1 . = 1 p 2 Two structures that are similar to (123b) are the following:  ( − β)   p(1 − δ)(1 i−1 p−1 δ + αβ if i < p, 1 − β (a) λi = where α =  δ if i = p,  p(1 − δ) p−1 β ∈ (0, 1], δ ∈ (0, 1), and

(b)

if β ∈ (0, 1), if β = 1,

 − β)   p(δ − 1)(1 if β ∈ (0, 1), if i = 1, 1 − β p−1 where α =  if i > 1,  p(δ − 1) if β = 1, p− 1  p(1 − β)  1, if β ∈ (0, 1), p(1 − β) − 1 + β p−1 β ∈ (0, 1], and δ ∈  (1, p) if β = 1.

( δ λi = δ − α β p−i

(125)

Both structures in (125) represent p − 1 of the eigenvalues in a shifted geometric pattern. The eigenvalue that does not conform to this pattern is λp in (125a) and λ1 in (125b) and the value of

Details62 the non-conforming eigenvalue is δ. The structures in (125) can be written as (2a) in Table 103, where !   1 0 1p−1 Ip−1 h i for (125a), T1 = and T2 = 2x2 2 1p−1 − p2x − 1 01×(p−1) −2 (p − 1)(p − 2)  T1 =

1 1p−1

01×(p−1) −Ip−1

 and T2 =

! 0 h i for (125b), 3 1p−1 − p2x −2

1 2x3 − (p − 1)(p − 2)

x2 and x3 are given in (126), and Cλ and cλ are empty. Table 110 summarizes some limiting properties of structures in Table 110 is defined in (124). Also,  0      1 ¯2 x2 − x 0  z2 def and z3 def , where x2 def = = x −x =  .. ¯ 0  . 3 3

(123b) and (125). The vector z1    , 

  p−2 p − 3   x3 def =  ..  ,  . 

p−2

(126)

0

and x ¯2 = x ¯3 = 1p−1 (p − 2)/2. Bendel and Mickey’s shifted geometric structure, (123b), cannot be recommended because estimation of β and δ is hampered by issues of ill-conditioning. Table 110 (1) reveals that the ill-conditioning occurs because Dλ;δ → 0 for any δ ∈ (0, 1) as β → 1. The structures in (125), however, do not suffer from ill-conditioning. Table 110: Properties of Geometric Eigenvalue Structures Limit as β → 0

Struc-

(1)

(1)

ture

λ

Dλ;β

Dλ;δ

123b

1p δ + ep1 p(1 − δ)

(ep2 − ep1 ) p(1 − δ)

1p − ep1 p

125a

1p δ + ep1 p(1 − δ)

1p − ep1 p

125b

1p δ − ep p(δ − 1)

(ep2 − ep1 ) p(1 − δ)  epp − epp−1 p(δ − 1) Limit as β → 1

Struc-

(1)

ture

λ

Dλ;β

123b

1p   p 1p (p − δ) − ep (1 − δ)p /(p − 1)

z1 (1 − δ)

125a 125b

108.2

1p − epp p

[1p (p − δ) +

ep1 (δ

− 1)p] /(p − 1)

z2 (1 − δ)p/(p − 1) z3 (1 − δ)p/(p − 1)

(1)

Dλ;δ

epp (ep1

0p×1  p − 1p /(p − 1) p − 1p ) /(p − 1)

Bentler & Yuan (1998)

Bentler and Yuan’s (1998) linear structure on the smallest p − k eigenvalues is a special case of structure (2a) in Table 103, in which Cλ and cλ are empty,     0 Ik+1 1k 1k x1 Ik T1 = , T2 = , and x = p − k − 1 p − k − 2 · · · 0 . 0 −1k+1 1p−k x 0(p−k)×k

Details63

109

Remarks on Composite Multiplicity Models

One motivation for the composite model is that the parameterization of Φ, as written in (19), fails when comparing models whose eigenvalue multiplicities differ. For example, suppose that p = 6, a test of homogeneity of the three largest eigenvalues is desired, the multiplicities of the three smallest eigenvalues are not specified, and λ3 > λ4 . The null and alternative hypotheses can be written as H0 : m ∈ M0 and Ha : m ∈ Ma , where n o n o 0 0 M0 = 3 m02 ; m2 ∈ M(3) and Ma = m01 m02 ; m1 ∈ M(3)\3, m2 ∈ M(3) , (127) c M(3) is defined in (24) and E\F def = E ∩ F . Denote the parameter spaces under H0 and Ha as Θ0 and Θa , respectively. To minimize a discrepancy function under θ ∈ Θ0 ∪ Θa one could fit a model with the least restrictive multiplicity vector, namely ma = 16 . The parameterization of Φ in (19), however, cannot be used to represent Φ with multiplicity vector 16 if H0 is true. This failure occurs because if λi = λj for some i 6= j, then the corresponding eigenvectors are not identified and neither λi nor λj , are differentiable functions of Φ. A composite model for Λ allows selected eigenvalues to have arbitrary multiplicities and enables Λ to be parameterized without reference to non-differentiable eigenvalues. Consider, again, the hypotheses H0 : m ∈ M0 and Ha : m ∈ Ma , where M0 and Ma are 0 defined in (127). To construct a discrepancy-based test of H0 versus Ha , (a) set m = 3 3 and write Λ as Λ1 ⊕ Λ2 ; (b) set A = 2 for θ ∈ Θ0 ; and (c) set A = {1, 2} for θ ∈ Θ0 ∪ Θa . Note that the eigenvalues of Φ, under θ ∈ Θ0 ∪ Θa , are unrestricted except for λ3 > λ4 . There can be issues regarding fitting models in which composite multiplicities are employed. For example, consider the model in (39). A goodness of fit test of this model is a test of ,r HM : R(M) ⊆ R(Γ). For this model, Λ is parameterized as 0

Λ = Λ1 ⊕ Λ2 , where vec(Λj ) = ρj vec(Imj ) + T3,j ξΛ,,j , 0 where m = r p − r . See (25) for details. The eigenvalues of Φ are the eigenvalues of Λ. It is assumed that λr > λr+1 . The issue is that if λr /λr+1 is near 1 and sample size is not large, then, b 2 can be larger than the smallest eigenvalue of Λ b 1, for the fitted model, the largest eigenvalue of Λ ,r M,r even if HM is true. If this occurs, then H is rejected. If sample size is large, then this failure to 0 0 fit the model is unlikely. See §118.2 for some empirical results.

110

Remarks on Parameterization of G

The indicator matrix A1 in (29) can be written as e ∗0 def A1 = A1,p = L p =

p X i X

 ∗ epj ⊗ epi epfij0 , where fij = i + (2p − j)(j − 1)/2.

i=1 j=1

e0 . The transpose of A1 is Magnus’s (1988, §5.2) elimination matrix. If m = 1p , then A2,m = L p 0 e . Otherwise, the columns of A2,m are a subset of the columns of L p

111

Parameterization of Marginal Standard Deviations

Any of the structures described in Table 7 can be employed to model ψ, the vector of marginal standard deviations. For completeness, a list of marginal standard deviation structures is given in this document in Table 111. These parameterizations are minor modifications of the covariance

Details64

Table 111: Structures for p-Vector of Standard Deviations Structure for ψ

Optional Constraints

1a.

T5 ξ ψ

Cψ0 ψ = cψ

1b.

T5 ξ ψ

Cψ0 ψ = cψ 10p ψ

2a.

T4 exp {T5 ξ ψ }

Cψ0 ψ = cψ

2b.

T4 exp {T5 ξ ψ }

Cψ0 ln (ψ) = cψ

 3.

θψ,1

T4 exp {T5 ξ ψ } 10p T4 exp {T5 ξ ψ }



4a.

   ξψ,1 exp T4 exp T5 ξ ψ,2

4b.

  θψ,1 exp T4 exp T5 ξ ψ

Requirements on T4 & T5 10p T5 6= 0 rk(Cψ0 T5 ) = qψ 10p T5 6= 0 e 0 T5 ) = qψ rk(C ψ e ψ = Cψ − 1p c 0 C ψ 10p T4 6= 0 rk(Cψ0 T4 ) = qψ 10p T4 6= 0 rk(Cψ0 T4 ) = qψ 10p T4 6= 0 10q4 T5 = 0 e 0 T4 ) = qψ rk(C ψ e Cψ = Cψ − 1p cψ0 10p T4 = 0 rk(Cψ0 T4 ) = qψ

Cψ0 ψ = cψ 10p ψ Cψ0 ln (ψ) = cψ Cψ0 ln

ψ 1 Qp ( i=1 ψi ) p

! = cψ

10p T4 = 0 rk(Cψ0 T4 ) = qψ

matrix eigenvalue parameterizations employed by BPH. Details about these parameterizations were given in a supplement to BPH. For completeness, details about these parameterizations also are given in this section. Specifically, this Supplement gives details about (a) rank and/or vector space requirements on the matrices {T4 , T5 , Cψ , cψ }, (b) the implicit function ξ ψ = ξ ψ (θψ ), (c) derivatives of ψ with respect to θψ , and (d) initial guesses for θψ . In Table 111, ψ is written as a function of a vector ξ ψ which, in turn, is an implicit of the ν2 -dimensional vector θψ . Dimensions of the matrices and vectors in Table 111 are the following: dim(T4 ) = p × q4 ,

dim(T5 ) = q4 × q5 ,

dim(Cψ ) = p × qψ , and dim(cψ ) = qψ × 1.

(128)

In models (1a) and (1b), the value of q4 is p. In structures (3, 4b), θψ is partitioned as Qp 1/p 0 0 θψ = θψ,1 θψ,2 and ξ ψ = ξ ψ (θψ,2 ), where θψ,1 = 10p ψ in structure (3), and θψ,1 = i=1 ψi in structure (4b). The values of ν2 for the structures in Table 111 are as follows: ( q5 − qψ for structures (1a, 1b, 2a, 2b), ν2 def (129) = dim1 (θψ ) = q5 − qψ + 1 for structures (3, 4a, 4b). The standard deviation parameterizations in Table 111 require that certain rank and/or vector space conditions on the matrices T4 , T5 , Cψ , and cψ be satisfied. First, if qψ > 0, then Cψ is required to have full column-rank. Second, 1p 6∈ R(Cψ ) must be satisfied in (1b, 3). If rk(Cψ0 T5 ) = qψ is not satisfied in 1a, or if rk(Cψ0 T4 ) = qψ is not satisfied in (2a, 2b, 4a, 4b), or if e 0 T5 ) = qψ is not satisfied in 1b, or if rk(C e 0 T4 ) = qψ is not satisfied in 3 then one or more rk(C ψ ψ columns of Cψ (and rows of cψ ) are degenerate and can be deleted. If 1p ∈ N (T04 ) is not satisfied in (4a, 4b) then T4 can be replaced by any full column-rank matrix whose columns are a basis set

Details65 for R [(Ip − H1 )T4 ], where H1 = p−1 1p 10p . A comparable substitution can be made for T5 in parameterization (3). 0 The vector ξ ψ is a function of θψ,2 in (3, 4b), where θψ is partitioned as θψ = (θψ,1 θψ,2 )0 ; and of θψ (not partitioned) otherwise. Specifically, denote by ηψ a vector-valued implicit function of  either θψ or θψ,2 . Also, let V = V5 V6 ∈ Oq5 be a matrix whose value is not yet determined. Then, ξ ψ can be written as ξ ψ = Iξ ψ = (V5 V50 + V6 V60 ) ξ ψ ( V5 ηψ + V6 θψ,2 = V5 ηψ + V6 θψ

for models 3 & 4b, where ηψ = V50 ξ ψ and θψ,2 = V60 ξ ψ , otherwise, where ηψ = V50 ξ ψ and θψ = V60 ξ ψ .

As an illustration, consider parameterization (2b) in Table 111. In this case, the model can be written as   ψ = T4 exp T5 V5 ηψ + V6 θψ subject to Cψ0 ln (ψ) = cψ , where θψ is the identified parameter and ηψ is an implicit function of θψ . Details about parameterizations (1a)–(4b) in Table 111 are given in the following sections. Each subsection gives derivatives of ψ with respect to θψ , initial guesses for ψ, and an algorithm to solve for the implicit parameter ηψ given a value for the identified parameter, θψ . A proof of the derivative expressions is given for parameterizations (2b) and (3). Proofs for the remaining parameterizations use the same strategy and are not reported. All derivative expressions have been verified numerically. The strategy for obtaining initial guesses for the standard deviation parameters is first to obtain an initial guess for ξ ψ without imposing the restrictions involving Cψ and cψ . The initial guess for ξ ψ is then used as the starting point for obtaining an initial guess for θψ .

111.1

Parameterization (1a): ψ = T5 ξψ , Cψ0 ψ = cψ

111.1.1

Derivatives of ψ with Respect to

θψ

The matrices of first, second, and third derivatives of ψ with respect to θψ are given in Theorem 106.

Theorem 106.

Dene Wψ as Wψ = Cψ0 T5 . By assumption, Wψ has full row-rank, namely rk(Wψ ) = rk(Cψ ) = qψ . Write Wψ as  0 SVD (Wψ ) = Uψ Dψ Vψ , where Uψ ∈ Oqψ , Vψ ∈ Oq5 , Vψ = Vψ,1 Vψ,2 , (130) Vψ,1 ∈ Oq5 ,qψ ,

Dψ = Dψ,1

 ++ 0qψ ×(q5 −qψ ) , and Dψ,1 ∈ Ddg,q . ψ

Then, ψ can be written as + ψ = T5 Wψ cψ + T5 V6 θψ , where V6 = Vψ,2 .

Furthermore, ν2 = dim1 (θψ ) = q5 − qψ and the rst three derivatives of ψ with respect to θψ are (1)

Dψ;θ0 = T5 V6 , ψ

(2)

(3)

Dψ;θ0 ,θ0 = 0p×ν22 , and Dψ;θ0 ,θ0 ,θ0 = 0p×ν23 . ψ

ψ

ψ

ψ

ψ

If ψ is parameterized as (1a) of Table 111, but no additional restrictions are imposed, then ν2 = dim1 (θψ ) = q5 and the rst three derivatives of ψ with respect to θψ are obtained by equating V6 to Iq5 . That is, (1)

Dψ;θ0 = T5 , ψ

(2)

(3)

Dψ;θ0 ,θ0 = 0p×ν22 , and Dψ;θ0 ,θ0 ,θ0 = 0p×ν23 . ψ

ψ

ψ

ψ

ψ

Details66

111.1.2

Initial Guesses for ξψ and

θψ 1/2

b . That is, ψ b = [diag(S)] Denote the vector of sample standard deviations by ψ S S shown that if nS ∼ Wp (n, Σ), then   √ 1 dist b n(ψS − ψ) −→ N 0, (Φ Σ) 2

. It is readily

(131)

Normality is not necessarily assumed in the current application; nonetheless, (131) can be used to construct a generalized least-squares initial guess for θψ . The initial guess is found by minimizing  0   b − T5 W+ cψ − T5 V6 θ ψ (R S)−1 ψ b − T5 W+ cψ − T5 V6 θ ψ SSE(θ ψ ) = ψ S S ψ ψ with respect to θ ψ , where Wψ = Cψ0 T5 and R is the sample correlation matrix. The solution is h i−1   −1 b + bψ = V0 T0 (R S)−1 T5 V6 θ V60 T05 (R S) ψS − T5 Wψ cψ . 6 5

111.1.3

Solving for ηψ and

ξψ

In parameterization (1a), the value of ηψ depends on Cψ and cψ , but it does not depend on θψ . To solve for ηψ , write the restriction as  Cψ0 ψ = cψ =⇒ Cψ0 T5 V5 ηψ + V6 θψ = cψ =⇒ Cψ0 T5 V5 ηψ = cψ because Cψ0 T5 V6 = 0 0 =⇒ ηψ = D−1 ψ Uψ c ψ ,

where Uψ Dψ V50 = svd (Wψ ) and Wψ = Cψ0 T5 . Accordingly, + ξ ψ = V5 ηψ + V6 θψ = Wψ cψ + V6 θψ and + ψ = T5 ξ ψ = T5 Wψ cψ + T5 V6 θψ .

111.2

Parameterization (1b): ψ = T5 ξψ ,

111.2.1

Derivatives of ψ with Respect to

C0 ψ ψ

10p ψ

= cψ

θψ

The matrices of first, second, and third derivatives of ψ with respect to θψ are given in Theorem 107.

Theorem 107.

Dene Wψ as e 0 T5 , where C e ψ = Cψ − 1p c 0 . Wψ = C ψ ψ

e ψ ) = qψ . Write Wψ in terms of its By assumption, Wψ has full row-rank, namely rk(Wψ ) = rk(C 0 singular values and vectors; namely SVD (Wψ ) = Uψ Dψ Vψ , where Vψ has been partitioned as in (130). Then, ψ can be written as ψ = T5 V6 θψ , where V6 = Vψ,2 .

Details67 Furthermore, ν2 = dim1 (θψ ) = q5 − qψ and the rst three derivatives of ψ with respect to θψ are (3)

(2)

(1)

Dψ;θ0 ,θ0 = 0p×ν22 , and Dψ;θ0 ,θ0 ,θ0 = 0p×ν23 .

Dψ;θ0 = T5 V6 ,

ψ

ψ

ψ

ψ

ψ

ψ

If ψ is parameterized as (1b) of Table 111, but no additional restrictions are imposed, then ν2 = dim1 (θψ ) = q5 and the rst three derivatives of ψ with respect to θψ are obtained by equating V6 to Iq5 . That is, Dψ;θ0 ,θ0 = 0p×ν22 , and Dψ;θ0 ,θ0 ,θ0 = 0p×ν23 . ψ

ψ

111.2.2

(3)

(2)

(1)

Dψ;θ0 = T5 ,

Initial Guesses for ξψ and

ψ

ψ

ψ

ψ

θψ

The initial guess for θψ is found by minimizing  0   b − T5 V6 θ ψ (R S)−1 ψ b − T5 V6 θ ψ SSE(θ ψ ) = ψ S S with respect to θ ψ . The solution is h i−1 −1 b bψ = V0 T0 (R S)−1 T5 V6 θ V60 T05 (R S) ψ S. 6 5

111.2.3

Solving for ηψ and

ξψ

In parameterization (1b), the value of ηψ depends on Cψ and cψ , but it does not depend on θψ . To solve for ηψ , write the restriction as Cψ0



ψ 10p ψ



e 0 ψ = 0, where C e ψ = Cψ − 1p c 0 = cψ =⇒ C ψ ψ  e 0 T5 V5 ηψ + V6 θψ = 0 =⇒ C ψ

e 0 T5 V6 = 0 =⇒ C0 T5 V5 ηψ = 0 because C ψ =⇒ ηψ = 0, e 0 T5 V5 = Uψ Dψ has full column-rank, where Uψ Dψ V0 = svd (Wψ ) and Wψ = C e 0 T5 . because C 5 ψ ψ Accordingly, ξ ψ = V5 ηψ + V6 θψ = V6 θψ and ψ = T5 ξ ψ = T5 V6 θψ .

111.3

Parameterization (2a): ψ = T4 exp {T5 ξψ }, Cψ0 ψ = cψ

111.3.1

Derivatives of ψ with Respect to

θψ

The matrices of first, second, and third derivatives of ψ with respect to θψ are given in Theorem 108.

Details68

Theorem 108.

Dene dξ , Dξ , and Wψ as 0 Dξ = (dξ )dg , and Wψ def = Cψ T4 Dξ T5 .

dξ def = exp {T5 ξ ψ },

(132)

It is assumed that Cψ has been chosen such that the qψ × q5 matrix ∂ C0 ψ ∂ ξ 0ψ ψ 0 has full row-rank. It follows that Wψ has full row-rank. Write Wψ as SVD (Wψ ) = Uψ Dψ Vψ , where Vψ has been partitioned as in (130). Then, ξψ can be written in as

ξ ψ = V5 η ψ + V6 θψ , where η ψ = V50 ξ ψ ,

θψ = V60 ξ ψ ,

V5 = Vψ,1 , and V6 = Vψ,2 . Furthermore, ν2 = q5 − qψ and the rst three derivatives of ψ with respect to θψ are (1)

Dψ;θ0 = T4 Dξ T5 V6 , ψ

(2)

Dψ;θ0 ,θ0 = T4 Dξ (Iq4 − Pξ )L02 , and ψ

ψ

h i 0 (3) Dψ;θ0 ,θ0 ,θ0 = T4 Dξ (Id1 − Pξ ) L03 − L2 P0ξ ∗ V60 T05 Jν2 , where ψ

ψ

ψ

L2 = V60 T05 ∗ V60 T05 ,

+ 0 Pξ = Dξ T5 Wψ Cψ T4 ,

L3 = L2 ∗ V60 T05 ,

and Jν2 is dened in Table 101. If Cψ in parameterization (2a) of Table 111 is empty (i.e., there are no constraints on ψ ), then qψ = 0, ν2 = q5 , V6 = Iq5 , ξψ = θψ , Pξ = 0 and the derivatives simplify to (1)

Dψ;θ0 = T4 Dξ T5 , ψ

(2)

(3)

Dψ;θ0 ,θ0 = T4 Dξ L02 , and Dψ;θ0 ,θ0 ,θ0 = T4 Dξ L03 , ψ

ψ

ψ

ψ

ψ

where L2 = T05 ∗ T05 , and L3 = L2 ∗ T05 .

111.3.2

Initial Guesses for ξψ and

θψ

Write ψ as    ψ = ψ ξ ψ , where ψ ξ ψ = T4 exp T5 ξ ψ . A three-stage procedure is used to obtain an initial guess for ξ ψ . The first stage employs the Gauss-Newton algorithm to minimize h i0 h i b − ψ(1q ω) ψ b − ψ(1q ω) SSE(ω) = ψ S S 5 5 with respect to the scalar ω, where the initial guess for ω is zero. At iteration h + 1, the estimate of ω is updated by −1

ω bh+1 = ω bh + αh (x0h xh ) b = ψ (1q ω ψ b ), h 5 h

b −ψ b ), where x0h (ψ S h

xh = T4 (exp {T5 1q5 ω bh })dg T5 1q5 ,

Details69 and αh ∈ (0, 1]. The second stage employs the Gauss-Newton algorithm to minimize h i0 h i b − ψ(ξ ) (R S)−1 ψ b − ψ(ξ ) SSE(ξ ψ ) = ψ S ψ S ψ with respect to ξ ψ , where the initial guess for ξ ψ is 1q5 ω b , and ω b is the minimizer from stage one. At iteration h + 1, the estimate is updated by h i−1 −1 −1 b b b ξ ψ,h+1 = b ξ ψ,h + αh X0h (R S) Xh + βh Iq5 X0h (R S) (ψ S − ψ h ), where   b =ψ b ψ ξ ψ,h , h

o  n T5 , Xh = T4 exp T5 b ξ ψ,h dg

αh ∈ (0, 1], and βh ≥ 0. The values of αh and βh are chosen to ensure that SSE decreases at each iteration. If ψ is subject to no additional constraints, then the initial guess for θ ψ is b ξ ψ . If ψ is subject to Cψ0 ψ = cψ , then the third stage is executed. In this stage, the initial guess for θ ψ is obtained as the minimizer of h h i 0 i −1 b b −ψ ξ SSE(θ ψ ) = ψ (R S) ψS − ψ ξ ψ S ψ with respect to θ ψ , where ξ ψ = V5 η ψ + V6 θ ψ , η ψ satisfies   Cψ0 T4 exp T5 V5 η ψ + V6 θ ψ = cψ for fixed θ ψ , and where the matrices V5 and V6 are given in Theorem 108. At iteration h + 1, the estimate of θ ψ is updated as h i−1 −1 b bψ,h+1 = θ bψ,h + αh X0 (R S)−1 Xh + βh Iν b θ X0h (R S) (ψ S − ψ h ), where h 2   b =ψ b ψ ξ ψ,h , h

h n oi Xh = T4 exp T5 b ξ ψ,h T5 V6,h , dg

b bψ,h , b ψ,h + V6,h−1 θ ξ ψ,h = V5,h−1 η αh ∈ (0, 1], βj ≥ 0, and the matrices V5,h and V6,h are given in Theorem 108 in which b ξ ψ,h has been substituted for ξ ψ . Values of αh and βh are chosen to ensure that SSE decreases at each bψ,h+1 , the value of b iteration. Given θ ξ ψ,h+1 is obtained by solving n o bψ,h+1 . b ψ,h+1 , where b b ψ,h+1 + V6,h θ Cψ0 T4 exp T5 b ξ ψ,h+1 = cψ for η ξ ψ,h+1 = V5,h η

111.3.3

Solving for ηψ and

ξψ

The issue in this section is to solve n  o bψ b ψ + V6 θ Cψ0 T4 exp T5 V5 η = cψ , bψ is a fixed vector and V5 ∈ Oq ,q and V6 ∈ Oq ,q −q are fixed matrices that bψ , where θ for η 5 ψ 5 5 ψ b ψ,0 = V50 b satisfy V50 V6 = 0. To solve this equation, first set η ξ ψ , where b ξ ψ is the current estimate

Details70 b ψ,j+1 is of ξ ψ . A modified Newton update for η b ψ,j+1 = η b ψ,j η  −αj

Cψ0 T4

−1 h n  oi b b ψ,j + V6 θ ψ exp T5 V5 η T5 V5 dg

 i h  × Cψ0 ψ b ξ ψ,j − cψ , bψ . b ψ,j + V6 θ where αj ∈ (0, 1], and b ξ ψ,j = V5 η

111.4

Parameterization (2b): ψ = T4 exp {T5 ξψ }, Cψ0 ln (ψ) = cψ

111.4.1

Derivatives of ψ with Respect to

θψ

The matrices of first, second, and third derivatives of ψ with respect to θψ are given in Theorem 109.

Theorem 109.

Dene dξ and Dξ as in (132), and dene Wψ as Wψ def =

∂ C 0 ln (ψ) = Cψ0 Ψ−1 T4 Dξ T5 , ∂ ξ 0ψ ψ

where Ψ = ψ dg . It is assumed that Cψ has been chosen such that the qψ × q5 matrix Wψ has full 0 , where Vψ has been partitioned as in (130). row-rank. Write Wψ as SVD (Wψ ) = Uψ Dψ Vψ Then, ξψ can be written in as ξ ψ = V5 η ψ + V6 θψ , where η ψ = V50 ξ ψ ,

θψ = V60 ξ ψ ,

V5 = Vψ,1 , and V6 = Vψ,2 . Furthermore, ν2 = q5 − qψ and the rst three derivatives of ψ with respect to θψ are (1)

Dψ;θ0 = T4 Dξ T5 V6 , ψ

(2)

e 0 , and Dψ;θ0 ,θ0 = (Ip − Pξ )T4 Dξ L02 + Pξ Ψ−1 L 2 ψ

ψ

   0 (3) (2)0 Dψ;θ0 ,θ0 ,θ0 = (Ip − Pξ )T4 Dξ L03 + Dξ ;θ0,θ0 T05 ∗ V60 T05 Jν2 ψ

ψ

λ

ψ

+Pξ Ψ−1 L2 = V60 T05 ∗ V60 T05 ,



(2)0

(1)0

Dψ;θ0 ,θ0 ∗ Dψ;θ0 ψ

ψ

ψ

L3 = L2 ∗ V60 T05 ,

+ 0 Pξ = T4 Dξ T5 Wψ Cψ Ψ−1 ,

(2) 0 0 λ ;θλ,θλ



Ψ = ψ dg , and Jν2 is dened in Table 101.

0

λ λ

 e 0 , where Jν2 − 2Ψ−1 L 3

e 2 = D(1)0 0 ∗ D(1)0 0 , L ψ;θ ψ;θ ψ

ψ

e3 = L e 2 ∗ D(1)0 0 , L ψ;θ

  + 0 e 0 − T4 Dξ L0 , = Wψ Cψ Ψ−1 Ψ−1 L 2 2

ψ

Details71 If Cψ in parameterization (2b) of Table 111 is empty (i.e., there are no constraints on ψ ), (2) then qψ = 0, ν2 = q5 , V6 = Iq5 , ξψ = θψ , Dξ ;θ0,θ0 = 0, Pξ = 0, and the derivatives simplify to λ

λ λ

(3)

(2)

(1)

Dψ;θ0 ,θ0 = T4 Dξ L02 , and Dψ;θ0 ,θ0 ,θ0 = T4 Dξ L03 ,

Dψ;θ0 = T4 Dξ T5 ,

ψ

ψ

ψ

ψ

ψ

ψ

where L2 = T05 ∗ T05 , and L3 = L2 ∗ T05 . Proof. If Cψ0 ln (ψ) = cψ ∀ θψ in the parameter space, then all derivatives of Cψ0 ln (ψ) with respect to θψ are 0. It follows from Theorem 102 that the derivatives of ln (ψ) with respect to θψ 0 can be written as follows: (1)

(1)

Dln (ψ);θ0

ψ

(2)

Dln (ψ);θ0

0 ψ ,θ ψ

(3)

Dln (ψ);θ0

0 0 ψ ,θ ψ ,θ ψ

= Ψ−1 Dψ;θ0 ; ψ

0  (1)0 (2) (1)0 e 0 + Ψ−1 D(2) 0 0 ; and = −Ψ−2 Dψ;θ0 ∗ Dψ;θ0 + Ψ−1 Dψ;θ0 ,θ0 = −Ψ−2 L 2 ψ;θ ,θ ψ

ψ

=

ψ

ψ

ψ

ψ

 0 (1)0 (1)0 (1)0 2Ψ−3 Dψ;θ0 ∗ Dψ;θ0 ∗ Dψ;θ0 ψ

ψ

ψ

 0 (2)0 (1)0 (3) −Ψ−2 Dψ;θ0 ,θ0 ∗ Dψ;θ0 Jν2 + Ψ−1 Dψ;θ0 ,θ0 ,θ0 ψ

=

ψ

ψ

ψ

ψ

ψ

 0 e 0 − Ψ−2 D(2)0 0 0 ∗ D(1)0 0 Jν + Ψ−1 D(3) 0 0 0 , 2Ψ−3 L 3 2 ψ;θ ,θ ψ;θ ψ;θ ,θ ,θ ψ

ψ

ψ

ψ

ψ

ψ

e 2 = D(1)0 0 ∗ D(1)0 0 , where Ψ = ψ dg and ∗ is the Khatri-Rao matrix product operator, L ψ;θ ψ;θ ψ

ψ

e3 = L e 2 ∗ D(1)0 0 , and Ja is defined in Table 101. Furthermore, C 0 ln (ψ) − cψ = 0 for all θψ in the L ψ ψ;θψ parameter space implies that (1)

(1)

(a) Cψ0 Dln (ψ);θ0 = Cψ0 Ψ−1 Dψ;θ0 = 0; ψ

ψ

(2)

(b) Cψ0 Dln (ψ);θ0

0 ψ ,θ ψ

(2)

e 0 + C 0 Ψ−1 D 0 0 = 0; and = −Cψ0 Ψ−2 L 2 ψ ψ;θ ,θ ψ

ψ

(133) (c)

(3) Cψ0 Dln (ψ);θ0 ,θ0 ,θ0 ψ ψ ψ

=

e0 2Cψ0 Ψ−3 L 3



Cψ0 Ψ−2



(2)0 Dψ;θ0 ,θ0 ψ ψ



(1)0 Dψ;θ0 ψ

0

Jν2

(3)

+ Cψ0 Ψ−1 Dψ;θ0 ,θ0 ,θ0 = 0, ψ

ψ

ψ

Write ξ ψ as ξ ψ = V5 ηψ + V6 θψ , where ηψ is a function of θψ and V = V5 fixed matrix. Also, write ψ as

 V6 ∈ Oq5 is a

q4 X     ψ = T4 exp T5 V5 ηψ + V6 θψ = T4 eqj 4 exp eqj 4 0 T5 V5 ηψ + V6 θψ . j=1

It follows from (133a) and (134) that   ∂ Cψ0 ln (ψ) = Wψ V5 , where Wψ = Cψ0 Ψ−1 T4 Dξ T5 and Dξ = exp T5 ξ ψ dg . 0 ∂ηλ

(134)

Details72 If Wψ has full row-rank, then choosing V5 = Vψ,1 ensures that Wψ V5 is nonsingular and, by the implicit function theorem, ηλ is a function of θψ , where Vψ,1 is defined in (130). It follows from (134) that derivatives of ψ can be written as (1) 0 ; ψ ;θψ

(1)

(a) Dψ;θ0 = T4 Dξ T5 Dξ ψ

0  (2) (1)0 (1)0 (2) (b) Dψ;θ0 ,θ0 = T4 Dξ Dξ ;θ0 T05 ∗ Dξ ;θ0 T05 + T4 Dξ T5 V5 Dη ;θ0 ,θ0 ; and ψ

ψ

ψ

ψ

ψ

ψ

ψ

ψ

ψ

(135) (c)

(3) Dψ;θ0 ,θ0 ,θ0 ψ ψ ψ

= T4 Dξ



(1)0 Dξ ;θ0 T05 ψ ψ



(1)0 Dξ ;θ0 T05 ψ ψ

(1)0 Dξ ;θ0 T05 ψ ψ



0

0  (3) (1)0 (2)0 + T4 Dξ Dξ ;θ0 ,θ0 T05 ∗ Dξ ;θ0 T05 Jν2 + T4 Dξ T5 V5 Dη ;θ0 ,θ0 ,θ0 ; ψ

ψ

ψ

ψ

(1) 0 ψ ;θψ

where Dξ is defined in Theorem 109 and Dξ

ψ

ψ

(1)

= V5 D η

0 ψ ;θψ

ψ

ψ

ψ

+ V6 . It follows from (133a) and

(135a) that   (1) (1) Wψ V5 Dη ;θ0 + V6 = 0 =⇒ Wψ V5 Dη ;θ0 = 0 because Wψ V6 = 0 ψ

ψ

ψ

(1)

=⇒ Dη

0 ψ ;θψ

(1) 0 ψ ;θψ

= 0 =⇒ Dξ

ψ

(1)

= V6 =⇒ Dψ;θ0 = T4 Dξ T5 V6 , ψ

because Wψ V5 has full column-rank, where V5 , and V6 are defined in Theorem 109. It follows from (133b) and (135b) that (2)

Wψ V5 D η

0 0 ψ ;θψ,θψ

(2)

=⇒ Dη (2) 0 0 ψ ;θψ,θψ

=⇒ Dξ

0 0 ψ ;θψ,θψ

e0 = 0 + Cψ0 Ψ−1 T4 Dξ L02 − Cψ0 Ψ−2 L 2

  + 0 e 0 − T4 Dξ L0 = V50 Wψ Cψ Ψ−1 Ψ−1 L 2 2

  + 0 e 0 − T4 Dξ L0 because V5 V0 W+ = W+ = Wψ Cψ Ψ−1 Ψ−1 L 2 2 5 ψ ψ (2)

e0 , =⇒ Dψ;θ0 ,θ0 = (Ip − Pψ ) T4 Dξ L02 + Pψ Ψ−1 L 2 ψ

ψ

+ 0 where L2 = V60 T05 ∗ V60 T05 and Pψ = T4 Dξ T5 Wψ Cψ Ψ−1 . It follows from (133c) and (135c) that  0 (3) (2)0 Wψ V5 Dη ;θ0 ,θ0 ,θ0 + Cψ0 Ψ−1 T4 Dξ L03 + Cψ0 Ψ−1 T4 Dξ Dξ ;θ0 ,θ0 T05 ∗ V60 T05 Jν2 ψ

ψ

ψ

ψ

ψ

ψ

ψ

 0 e 0 − C 0 Ψ−2 D(2)0 0 0 ∗ D(1)0 0 Jν = 0 +2Cψ0 Ψ−3 L 3 ψ 2 ψ;θ ,θ ψ;θ ψ

=⇒

(3) Dη ;θ0 ,θ0 ,θ0 ψ ψ ψ ψ

=

+ 0 −V50 Wψ Cψ Ψ−1

e0 +2Ψ−2 L 3

=⇒

(3) Dψ;θ0 ,θ0 ,θ0 ψ ψ ψ

−Ψ

ψ

ψ

  0 (2)0 T4 Dξ L03 + T4 Dξ Dξ ;θ0 ,θ0 T05 ∗ V60 T05 Jν2

−1

ψ



(2)0 Dψ;θ0 ,θ0 ψ ψ



(1)0 Dψ;θ0 ψ

0

ψ

ψ

 Jν 2

   0 (2)0 0 0 0 0 = (Ip − Pξ )T4 Dξ L3 + Dξ ;θ0 ,θ0 T5 ∗ V6 T5 Jν2 ψ

ψ

ψ

Details73

+Pξ Ψ−1



(1)0

(2)0

Dψ;θ0 ,θ0 ∗ Dψ;θ0 ψ

ψ

ψ

0

 e0 , Jν2 − 2Ψ−1 L 3

e 3 = D(1)0 0 ∗ D(1)0 0 ∗ D(1)0 0 . where L3 = V60 T05 ∗ V60 T05 ∗ V60 T05 and L ψ;θ ψ;θ ψ;θ ψ

111.4.2

Initial Guesses for ξψ and

ψ

ψ

θψ

Write ψ as  ψ = ψ(ξ ψ ), where ψ(ξ ψ ) = T4 exp T5 ξ ψ . A three-stage procedure is used to obtain an initial guess for ξ ψ . The first two stages are identical to the first two stages in parameterization (2a) of Table 111. If ψ is subject to no additional constraints, then the initial guess for θ ψ is b ξ ψ . If ψ is subject to Cψ0 ln (ψ) = cψ , then the third stage is executed. In this stage, the initial guess for θ ψ is obtained as the minimizer of h h i 0 i −1 b b −ψ ξ SSE(θ ψ ) = ψ (R S) ψS − ψ ξ ψ S ψ with respect to θ ψ , where ξ ψ = V5 η ψ + V6 θ ψ , η ψ satisfies     Cψ0 ln T4 exp T5 V5 η ψ + V6 θ ψ = cψ for fixed θ ψ , and where the matrices V5 and V6 are given in Theorem 109. At iteration h + 1, the estimate of θ ψ is updated as h i−1 −1 b bψ,h+1 = θ bψ,h + αh X0 (R S)−1 Xh + βh Iν b θ X0h (R S) (ψ S − ψ h ), where h 2   b =ψ b ψ ξ h ψ,h ,

h n oi Xh = T4 exp T5 b ξ ψ,h T5 V6,h , dg

b bψ,h , b ψ,h + V6,h−1 θ ξ ψ,h = V5,h−1 η αh ∈ (0, 1], βj ≥ 0, and the matrices V5,h and V6,h are given in Theorem 108 in which b ξ ψ,h has been substituted for ξ ψ . Values of αh and βh are chosen to ensure that SSE decreases at each bψ,h+1 , the value of b iteration. Given θ ξ ψ,h+1 is obtained by solving oi h n bψ,h+1 . b ψ,h+1 , where b b ψ,h+1 + V6,h θ ξ ψ,h+1 = cψ for η ξ ψ,h+1 = V5,h η Cψ0 ln T4 exp T5 b

111.4.3

Solving for ηψ and

ξψ

The issue in this section is to solve h n  oi bψ b ψ + V6 θ Cψ0 ln T4 exp T5 V5 η = cψ , bψ is a fixed vector and V5 ∈ Oq ,q and V6 ∈ Oq ,q −q are fixed matrices that bψ , where θ for η 5 ψ 5 5 ψ 0 b ψ,0 = V50 b satisfy V5 V6 = 0. To solve this equation, first set η ξ ψ , where b ξ ψ is the current estimate

Details74 b ψ,j+1 is of ξ ψ . A modified Newton update for η b ψ,j+1 = η b ψ,j η  −αj

b −1 T4 Cψ0 Ψ j

h

× where αj ∈ (0, 1],

−1 n  oi b b ψ,j + V6 θ ψ exp T5 V5 η T5 V5 dg

i o n h  Cψ0 ln ψ b ξ ψ,j − cψ ,

i h  bψ . bj = ψ b b ψ,j + V6 θ , and b ξ ψ,j = V5 η Ψ ξ ψ,j dg

111.5

Parameterization (3): ψ = θψ,1

111.5.1

Derivatives of ψ with Respect to



T4 exp {T5 ξψ } 10p T4 exp {T5 ξψ }

 C0 ψ

,

ψ

10p ψ

= cψ

θψ

The matrices of first, second, and third derivatives of ψ with respect to θψ are given in Theorem 110.

Theorem 110. as

Partition θψ as θψ = (θψ,1 θ 0ψ,2 )0 . Dene dξ and Dξ as in (132). and dene Wψ e 0 T4 Dξ T5 , Wψ def =C ψ

e ψ = Cψ − 1p c 0 . It is assumed that Cψ has been chosen such that the qψ × q5 matrix where C ψ ∂ ∂ ξ 0ψ



Cψ0 ψ 10p ψ



0 has full row-rank. It follows that Wψ has full row-rank. Write Wψ as SVD (Wψ ) = Uψ Dψ Vψ , where Vψ has been partitioned as in (130). Then, ξψ can be written in as

ξ ψ = V5 η ψ + V6 θ ψ,2 where η ψ = V50 ξ ψ ,

θ ψ,2 = V60 ξ ψ ,

V5 = Vψ,1 , and V6 = Vψ,2 . Dene h as     ν11 1 h= = . ν12 q5 − qψ def

(136)

The entries in h represent the dimensions of θψ,1 and θ ψ,2 . Then, ν2 = dim1 (θψ ) = 1 + q5 − qψ and for xed V5 and V6 , the rst three derivatives of ψ with respect to θψ are (1)

Dψ;θ0 =

2 X

ψ

(1)

Dψ;θ0 E0s,h , ψ,s

s=1

(2)

Dψ;θ0 ,θ0 = ψ

2 X 2 X

(2)

Dψ;θ0

0 ψ,s,θψ,t

ψ

 E0s,h ⊗ E0t,h ,

s=1 t=1

(3)

Dψ;θ0 ,θ0 ,θ0 = ψ

ψ

2 X 2 X 2 X

(3)

Dψ;θ0

0 0 ψ,s,θψ,t,θψ,u

ψ

s=1 t=1 u=1

 E0s,h ⊗ E0t,h ⊗ E0u,h , where

Details75 (1)

(1)

−1 Dψ;θψ,1 = ψθψ,1 ;

Dψ;θ0 = wθψ,1 (Ip − Hψ ) T4 Dξ T5 V6 ;

w=

ψ,2

(2)

−1 0 Hψ = ψθψ,1 1p ;

(2)

Dψ;θψ,1 ,θψ,1 = 0p×1 ;

Dψ;θ0

ψ,2,θψ,1

10p T4

1 ; exp {T5 ξ ψ }

= w (Ip − Hψ ) T4 Dξ T5 V6 ;

  (1) = −2w Dψ;θ0 ⊗ u0ξ Nν12 + wθψ,1 (Ip − Hψ ) T4 Dξ (Iq4 − Pξ )L02 ;

(2)

Dψ;θ0

0 ψ,2,θψ,2

ψ,2

(3) Dψ;θ0 ,θ0 ,θ0 ψ,2 ψ,2 ψ,2



= wθψ,1 (Ip − Hψ ) T4 Dξ (Iq4

  1 0 0 0 0 0 − Pξ ) L3 − L2 Pξ ∗ V6 T5 Jν12 3

i  h (2) (1) −w 10p T4 Dξ (Iq4 − Pξ ) L02 ⊗ Dψ;θ0 Jν12 − w Dψ;θ0

0 ψ,2,θψ,2

ψ,2

+e 0 Pξ = T5 Wψ Cψ T4 Dξ ;

L2 = V60 T05 ∗ V60 T05 ;

 ⊗ u0ξ Jν12 ;

uξ = V60 T05 Dξ T04 1p ; L3 = L2 ∗ V60 T05 ;

and Ja , Es,h and Nν12 are dened in Table 101. If Cψ in parameterization (3) of Table 111 is empty, then qψ = 0, V6 = Iq5 , ν12 = q5 , ξ ψ = θψ,2 , Pξ = 0, and the derivatives of ψ simplify to (1)

(1)

−1 Dψ;θψ,1 = ψθψ,1 ;

(2)

Dψ;θ0 = wθψ,1 (Ip − Hψ ) T4 Dξ T5 ;

Dψ;θ0

ψ,2,θψ,1

ψ,2

(2)

Dψ;θψ,1 ,θψ,1 = 0p×1 ;

(2)

Dψ;θ0

0 ψ,2,θψ,2

  (1) = −2w Dψ;θ0 ⊗ u0ξ Nν12 + wθψ,1 (Ip − Hψ ) T4 Dξ L02 ; ψ,2

(3)

Dψ;θ0

0 0 ψ,2,θψ,2,θψ,2

= wθψ,1 (Ip − Hψ ) T4 Dξ L03

   (1) (2) −w 10p T4 Dξ L02 ⊗ Dψ;θ0 Jν12 − w Dψ;θ0

0 ψ,2,θψ,2

ψ,2

w=

1 ; 10p T4 exp {T5 ξ ψ }

= w (Ip − Hψ ) T4 Dξ T5 ;

−1 0 Hψ = ψθψ,1 1p ;

uξ = T05 Dξ T04 1p ;

 ⊗ u0ξ Jν12 ; where L2 = T05 ∗ T05 ;

Nν12 and Ja are dened in Table 101.

Proof. First note that Cψ0



ψ 10p ψ

 = cψ ∀ θ ψ,2 =⇒

Cψ0

 ! T4 exp T5 ξ ψ  = cψ ∀ θ ψ,2 10p T4 exp T5 ξ ψ

  =⇒ Cψ0 T4 exp T5 ξ ψ = cψ 10p T4 exp T5 ξ ψ ∀ θ ψ,2   =⇒ Cψ0 − cψ 10p T4 exp T5 ξ ψ = 0 ∀ θ ψ,2  e 0 T4 exp T5 ξ ψ = 0 ∀ θψ where C e ψ = Cψ − 1p c 0 =⇒ C ψ ψ

L3 = L2 ∗ T05 ;

Details76

=⇒

 ∂ e0 Cψ T4 exp T5 ξ ψ = 0 ∀ θ ψ,2 0 ∂ θ ψ,2 (1) 0 ψ ;θψ,2

e 0 T4 Dξ T5 D =⇒ C ψ ξ (1) 0 ψ ;θψ,2

=⇒ Wψ Dξ e 0 T4 Dξ T5 , Wψ = C ψ

= 0 ∀ θ ψ,2

= 0 ∀ θ ψ,2 where

 Dξ = (dξ )dg , and dξ = exp T5 ξ ψ .

Write ξ ψ as ξ ψ = V5 η ψ + V6 θ ψ,2 , where η ψ is an implicit function of θ ψ,2 and the matrices V5 and V6 are yet to be determined. By the implicit function theorem (Fulks, 1978, pp. 352–354), η ψ is an implicit function of θ ψ,2 if Wψ V5 is nonsingular, because Wψ V5 =

 ∂ e0 C T4 exp T5 ξ ψ V5 . ∂ η 0ψ ψ

0 Write Wψ as SVD (Wψ ) = Uψ Dψ Vψ , where Vψ has been partitioned as in (130). If Wψ does not have full row-rank, then one or more linear functions of the restrictions Cψ0 ψ/(10p ψ) = cψ are degenerate and can be deleted. Let ψ 0 be a p-vector that can be parameterized as (3) in Table 111. Also, denote an open ˚(ψ 0 ). neighborhood of ψ 0 in the open set of p-vectors that can be parameterized as (3) by N Choose V5 to be any full column-rank matrix that satisfies R(V5 ) = R(Vψ0 ,1 ). Also, choose V6 to be any full column-rank matrix that satisfies R(V6 ) = R(Vψ0 ,2 ). Suitable choices are V5 = Vψ0 ,1 and V6 = Vψ0 ,2 . Continuity of ψ with respect to ξ ψ ensures that Wψ V5 is nonsingular for all ˚(ψ 0 ). ψ∈N Derivatives of ψ with respect to θψ,1 are straightforward to obtain because they do not involve implicit functions. For fixed V5 = Vψ0 ,1 and V6 = Vψ0 ,2 , the matrix of first derivatives of ψ with 0 respect to θψ,2 is     (1) (1) (1) Dψ;θ0 = wθψ,1 T4 Dξ T5 V5 Dη ;θ0 + V6 − wψ10p T4 Dξ T5 V5 Dη ;θ0 + V6 ψ

ψ,2

ψ

ψ,2

 (1) = wθψ,1 (Ip − Hψ ) T4 Dξ T5 V5 Dη ;θ0 ψ

ψ,2

ψ,2

 + V6 ,

 −1 −1 0 where w = 10p T4 exp T5 ξ ψ and Hψ = ψθψ,1 1p . (1)

To solve for Dη

0 ψ ;θψ,2

, note that

 e 0 T4 exp T5 ξ ψ = 0 ∀ ψ ∈ N ˚(ψ 0 ) =⇒ C ψ  (1) =⇒ Wψ V5 Dη ;θ0 ψ

(1)

The derivative Dη

0 ψ ;θψ,2

ψ,2

 ∂ e0 ˚ 0 Cψ T4 exp T5 ξ ψ = 0∀ ψ ∈ N (ψ 0 ) ∂ θψ,2

 ˚(ψ 0 ) =⇒ D(1) 0 + V6 = 0∀ ψ ∈ N η ;θ ψ

ψ,2

= − (Wψ V5 )

−1

Wψ V6 .

, evaluated at V5 = Vψ,1 and V6 = Vψ,2 , simplifies to 0 because (1)

Wψ Vψ,2 = 0. Accordingly, Dψ;θ0

ψ,2

evaluated at V5 = Vψ,1 and V6 = Vψ,2 is

(1)

Dψ;θ0 = wθψ,1 (Ip − Hψ ) T4 Dξ T5 V6 . ψ,2

Details77 For fixed V5 = Vψ0 ,1 and V6 = Vψ0 ,2 , the matrix of second derivatives of ψ with respect to θ 0ψ,2 is  (1) = wθψ,1 (Ip − Hψ ) T4 Dξ V5 Dη ;θ0

(2) Dψ;θ0 ,θ0 ψ,2 ψ,2

ψ

h  (1) −w2 θψ,1 10p T4 Dξ T5 V5 Dη ;θ0 ψ

ψ,2

ψ

= wθψ,1 (Ip − Hψ ) T4 Dξ



+ V6

T05





(1) V5 Dη ;θ0 ψ ψ,2

ψ

+ V6

ψ,2

i

(1) V5 Dη ;θ0 ψ ψ,2

0

T05

+ V6

i

+ V6

  (1) + V6 ⊗ (Ip − Hψ ) T4 Dξ T5 V5 Dη ;θ0

h  (1) (1) −w Dψ;θ0 ⊗ 10p T4 Dξ T5 V5 Dη ;θ0 ψ,2

ψ,2

0

ψ,2

0

(2)

+ wθψ,1 (Ip − Hψ ) T4 Dξ T5 V5 Dη

+ V6

0

T05



 h (1) (1) −w Dψ;θ0 ⊗ 10p T4 Dξ T5 V5 Dη ;θ0 ψ

ψ,2



ψ,2

(1) V5 Dη ;θ0 ψ ψ,2

+ V6

i

+ V6

0

0 0 ψ ;θψ,2 ,θψ,2

T05

0

2Nν12

(2)

+wθψ,1 (Ip − Hψ ) T4 Dξ T5 V5 Dη (2)

To solve for Dη

0 0 ψ ;θψ,2 ,θψ,2

0 0 ψ ;θψ,2 ,θψ,2

, note that  ∂ e0 ˚ 0 Cψ T4 exp T5 ξ ψ = 0∀ ψ ∈ N (ψ 0 ) ∂ θψ,2

=⇒

=⇒



0 θψ,2

 ∂2 e0 ˚ 0 Cψ T4 exp T5 ξ ψ = 0∀ ψ ∈ N (ψ 0 ) ⊗ ∂ θψ,2

  ∂ e0 (1) ˚ C T D T V D + V 0 5 η ;θ 6 = 0∀ ψ ∈ N (ψ 0 ) ψ 4 ξ 5 0 ψ ψ,2 ∂ θψ,2

e 0 T4 Dξ =⇒ C ψ



(1) V5 Dη ;θ0 ψ ψ,2

+ V6

0

(2)

+Wψ V5 Dη (2)

=⇒ V5 Dη

0 0 ψ ;θψ,2 ,θψ,2

+e 0 = −Wψ Cψ T4 Dξ

0 0 ψ ;θψ,2 ,θψ,2



T05



(1) V5 Dη ;θ0 ψ ψ,2

+ V6

0

T05

0

˚(ψ 0 ) = 0∀ ψ ∈ N

(1)

V5 Dη



0 ψ ;θψ,2

+ V6

0

 (1) T05 ∗ V5 Dη ;θ0 ψ

ψ,2

+ V6

0

T05

0

+e 0 = −Wψ Cψ T4 Dξ (V60 T05 ∗ V60 T05 ) evaluated at V5 = Vψ,1 and V6 = Vψ,2 . (2)

Accordingly, the matrix of derivatives Dψ;θ0

0 ψ,2,θψ,2

(2)

Dψ;θ0

0 ψ,2,θψ,2

, evaluated at V5 = Vψ,1 and V6 = Vψ,2 , is

  (1) = wθψ,1 (Ip − Hψ ) T4 Dξ (Iq4 − Pξ ) L02 − w Dψ;θ0 ⊗ u0ξ 2Nν12 ,

+e 0 where Pξ = T5 Wψ Cψ T4 Dξ ,

ψ,2

uξ = V60 T05 Dξ T04 1p , and L2 = V60 T05 ∗ V60 T05 .

0

Details78 For fixed V5 and V6 , the matrix of third derivatives of ψ with respect to θ 0ψ,2 , evaluated at V5 = Vψ,1 and V6 = Vψ,2 , is     (3) (1) Dψ;θ0 ,θ0 ,θ0 = −w2 θψ,1 u0ξ ⊗ (Ip − Hψ ) T4 Dξ L02 − w 10p T4 Dξ L02 ⊗ Dψ;θ0 Jν12 ψ,2 ψ,2 ψ,2

ψ,2

 (2)0 +wθψ,1 (Ip − Hψ ) T4 Dξ Dη ;θ0 ψ

0 ψ,2 ,θψ,2

V50 T05 ∗ V60 T05

0

Jν12

  (1) +wθψ,1 (Ip − Hψ ) T4 Dξ L03 + w2 u0ξ ⊗ Dψ;θ0 ⊗ u0ξ (Iν12 ⊗ 2Nν12 ) ψ,2

 (2) −w Dψ;θ0

0 ψ,2,θψ,2

  (2) ⊗ u0ξ (Iν12 ⊗ 2Nν12 ) − w 10p T4 Dξ T5 V5 Dη ;θ0 ψ

h (2) −w2 θψ,1 u0ξ ⊗ (Ip − Hψ ) T4 Dξ T5 V5 Dη ;θ0

i

0 ψ,2 ,θψ,2

ψ



(1)

0 ψ,2 ,θψ,2

⊗ Dψ;θ0

ψ,2

(3)

+ wθψ,1 (Ip − Hψ ) T4 Dξ T5 V5 Dη

0 0 0 ψ ;θψ,2 ,θψ,2 ,θψ,2

The above expression can be simplified by first replacing (2)

wθψ,1 (Ip − Hψ ) T4 Dξ T5 V5 Dη (2)

Dψ;θ0

0 ψ,2,θψ,2

0 0 ψ ;θψ,2 ,θψ,2

by

  (1) + w Dψ;θ0 ⊗ 10p T4 Dξ T5 V6 2Nν12 − wθψ,1 (Ip − Hψ ) T4 Dξ L02 , ψ,2

and then replacing (2)

T5 V5 Dη

0 0 ψ ;θψ,2 ,θψ,2

by − Pξ L02 .

The simplified expression is 

(3)

= wθψ,1 (Ip − Hψ ) T4 Dξ

Dψ;θ0

0 0 ψ,2,θψ,2,θψ,2

 0 1 0 L3 − L2 P0ξ ∗ V60 T05 Jν12 3

h i (1) −w 10p T4 Dξ (Iq4 − Pξ ) L02 ⊗ Dψ;θ0 Jν12 ψ,2

 (2) −w Dψ;θ0

0 ψ,2,θψ,2

(3)

To solve for Dη

0 0 0 ψ ;θψ,2 ,θψ,2 ,θψ,2

 (3) ⊗ u0ξ Jν12 + wθψ,1 (Ip − Hψ ) T4 Dξ T5 V5 Dη ;θ0 ψ



=⇒



0 0 ψ,2 ,θψ,2 ,θψ,2

.

, note that

0 θψ,2

0 θψ,2

 ∂2 e0 0 Cψ T4 exp T5 ξ ψ = 0∀ θψ,2 ⊗ ∂ θψ,2

 ∂3 e0 0 0 Cψ T4 exp T5 ξ ψ = 0∀ θψ,2 ⊗ ∂ θψ,2 ⊗ ∂ θψ,2

(  0  0 0 ∂ (1) (1) 0 0 e Cψ T4 Dξ V5 Dη ;θ0 + V6 T5 ∗ V5 Dη ;θ0 + V6 T05 =⇒ 0 ψ ψ,2 ψ ψ,2 ∂ θψ,2 ) e 0 T4 Dξ T5 V5 D(2) 0 +C ψ η ;θ ψ

Jν12

0 ψ,2 ,θψ,2

= 0∀ θψ,2

.

Details79  e 0 T4 Dξ L0 + C e 0 T4 Dξ D(2)0 0 =⇒ C ψ 3 ψ η ;θ ψ

0 ψ,2 ,θψ,2

V50 T05 ∗ V60 T05

(3)

e 0 T4 Dξ T5 V5 D +C ψ η (3)

=⇒ V5 Dη

0 0 0 ψ ;θψ,2 ,θψ,2 ,θψ,2

0 0 0 ψ ;θψ,2 ,θψ,2 ,θψ,2

+e 0 = −Wψ Cψ T4 Dξ



0

Jν12

=0

 0 1 0 L3 − L2 P0ξ ∗ V60 T05 Jν12 , 3

where L3 = V60 T05 ∗ V60 T05 ∗ V60 T05 . Accordingly, the matrix of third derivatives, evaluated at V5 = Vψ,1 and V6 = Vψ,2 , is   0 1 (3) Dψ;θ0 ,θ0 ,θ0 = wθψ,1 (Ip − Hψ ) T4 Dξ (Iq4 − Pξ ) L03 − L2 P0ξ ∗ V60 T05 Jν12 ψ,2 ψ,2 ψ,2 3 i  h (2) (1) −w 10p T4 Dξ (Iq4 − Pξ ) L02 ⊗ Dψ;θ0 Jν12 − w Dψ;θ0

0 ψ,2,θψ,2

ψ,2

 ⊗ u0ξ Jν12 .

To reduce memory and computational requirements, matrix derivatives can be stored as sparse matrices. Also, to decrease the number of non-zero entries in the matrix derivatives, V6 can 0 be equated to the transpose of the row reduced echelon form of Vψ,2 .

111.5.2

Initial Guesses for ξψ and

θψ

A three-stage procedure is used to obtain an initial guess for θψ . The first stage employs the Gauss-Newton algorithm to minimize 0    T4 exp {T5 1q5 ω} T4 exp {T5 1q5 ω} f− 0 SSE(ω) = f − 0 1p T4 exp {T5 1q5 ω} 1p T4 exp {T5 1q5 ω} b /(10 ψ b b with respect to the scalar ω, where f = ψ S p S ), ψS is the vector of sample standard deviations, and the initial guess for ω is zero. At iteration h + 1, the estimate is updated by −1

ω bh+1 = ω bh + αh {x0h xh }

x0h (f − b fh ), where

T4 exp {T5 1q5 ω bh } b fh = 0 , 1p T4 exp {T5 1q5 ω bh } xh =



Ip − b fh 10p



T4 (exp {T5 1q5 ω bh })dg T5 1q5

!

10p T4 exp {T5 1q5 ω bh }

,

and αh ∈ (0, 1]. The second stage employs the Gauss-Newton algorithm to minimize  SSE(ξ ψ ) =

f−

T4 exp {T5 ξ ψ } 10p T4 exp {T5 ξ ψ }

0

  T4 exp {T5 ξ ψ } M+ f − ψ 10p T4 exp {T5 ξ ψ }

Details80 with respect to ξ ψ , where the initial guess for ξ ψ is 1q5 ω b, ω b is the minimizer from stage one,  0 −1 Mψ = Ip − f 10p (R S) Ip − f 10p ,

(137)

h  0 i−1 0 −1 0 0 M+ Ip − f 10p U U, ψ = U U Ip − f 1p (R S) and U is any full column-rank matrix that satisfies R(U) = N (10p ). At iteration h + 1, the estimate is updated by  −1 T4 exp {T5 b ξ ψ,h } b b b , ξ ψ,h+1 = b ξ ψ,h + αh X0h M+ X0h M+ ψ Xh + βh Iq5 ψ (f − fh ), where fh = 0 1p T4 exp {T5 b ξ ψ,h }    T4 exp {T5 b ξ ψ,h } T5   dg  , 10p T4 exp {T5 b ξ ψ,h }  

Xh = Ip − b fh 10p



αh ∈ (0, 1], and βh ≥ 0. The values of αh and βh are chosen to ensure that SSE decreases at each iteration. If ψ is subject to no additional constraints, then the initial guess for θψ,2 is b ξ ψ , where b ξψ 1/2 0 0 b is the minimizer from stage two, and the initial guess for θψ,1 is 1p [diag (S)] = 1p ψS . If ψ is subject to Cψ0 ψ/(10p ψ) = cψ , then the initial guess for θψ is the minimizer of h i0 h i b − ψ (θψ ) (R S)−1 ψ b − ψ (θψ ) SSE(θψ ) = ψ S S with respect to θ ψ , where  T4 exp T5 ξ ψ  , ψ (θψ ) = θψ,1 0 1p T4 exp T5 ξ ψ

ξ ψ = V5 ηψ + V6 θψ,2 ,

V5 and V6 are given in Theorem 110, ηψ satisfies   e 0 T4 exp T5 V5 η + V6 θψ,2 = 0 C ψ ψ e ψ is defined in Theorem 110. for fixed θψ,2 , and C At iteration h + 1, the estimate of θ ψ is updated as  −1   −1 b bψ,h+1 = θ bψ,h + αh X0 (R S)−1 Xh + βh Iν θ X0h (R S) ψS − ψ h , where h 2

bψ,h = θ

θbψ,1,h b θ ψ,2,h

! ;

h −1 Xh = ψ h θbψ,1,h

n o ψ h = θbψ,1,h wh T4 exp T5 b ξ ψ,h ;

wh =

 n o Dξˆ = exp T5 b ξ ψ,h ; dg

i wh θbψ,1,h (Ip − Hψ,h ) T4 Dξˆh T5 V6,h ;

10p T4

1 n o; exp T5 b ξ ψ,h

−1 Hψ,h = ψ h θbψ,1,h 10p ;

b bψ,2,h , b ψ,h + V6,h−1 θ ξ ψ,h = V5,h−1 η

Details81 αh ∈ (0, 1], βj ≥ 0, and the matrices V5,h and V6,h are given in Theorem 108 in which b ξ ψ,h has been substituted for ξ ψ . Values of αh and βh are chosen to ensure that SSE decreases at each bψ,h+1 , the value of b iteration. Given θ ξ ψ,h+1 is obtained by solving o n bψ,2,h+1 , e 0 T4 exp T5 b b ψ,h+1 , where b b ψ,h+1 + V6,h θ ξ ψ,h+1 = V5,h η C ξ ψ,h+1 = 0 for η ψ e ψ = Cψ − 1 p c 0 . and C ψ

111.5.3

Solving for ηψ and

ξψ

The issue in this section is to solve n  o bψ,2 e 0 T4 exp T5 V5 η b ψ + V6 θ C = 0, ψ bψ,2 is a fixed vector and V5 ∈ Oq ,q and V6 ∈ Oq ,q −q are fixed matrices that bψ , where θ for η 5 ψ 5 5 ψ 0 b ψ,0 = V50 b satisfy V5 V6 = 0. To solve this equation, first set η ξ ψ , where b ξ ψ is the current estimate b ψ,j is of ξ ψ . A modified Newton update for η n  o bψ,2 , e0 b ψ,j+1 = η b ψ,j − αj B−1 b ψ,j + V6 θ η j Cψ T4 exp T5 V5 η h n  oi bψ,2 e 0 T4 exp T5 V5 η b ψ,j + V6 θ where Bj = C T5 V5 . ψ dg

and αj ∈ (0, 1].

111.6 111.6.1

Parameterization (4a): ψ = ξψ,1 exp T4 exp T5 ξψ,2 , 





Cψ0 ln (ψ) = cψ

Derivatives of ψ with Respect to

θψ

The matrices of first, second, and third derivatives of ψ with respect to θψ are given in Theorem 111.

Theorem 111.

Dene dξ and Dξ as  dξ = exp T5 ξ ψ,2 and Dξ = (dξ )dg .

Dene Wψ as

−1 Wψ = Cψ0 Bξ , where Bξ = 1p ξψ,1

 T4 Dξ T5 .

(138)

It is assumed that Cψ has been chosen such that the qψ × q5 matrix ∂ C 0 ln (ψ) ∂ ξ 0ψ ψ

has full row-rank. It follows that Wψ has full row-rank. Write Wψ as 0 SVD (Wψ ) = Uψ Dψ Vψ , where

Uψ ∈ Oqψ ,

Vψ ∈ Oq5 +1 ,

Vψ,1 is (q5 + 1) × qψ ,

Vψ = Vψ,1

Dψ = Dψ,1

 Vψ,2 ,

 ++ 0qψ ×(q5 +1−qψ ) , and Dψ,1 ∈ Ddg,q . ψ

(139)

Details82 Then, ξψ can be written in as ξ ψ = V5 η ψ + V6 θψ , where η ψ = V50 ξ ψ , V5 = Vψ,1 ,

θψ = V60 ξ ψ ,

  V21 V6 is partitioned as V6 = , V22

V6 = Vψ,2 ,

dim(V21 ) = 1 × (q5 + 1 − qψ ) and dim(V22 ) = q5 × (q5 + 1 − qψ ). Furthermore, ν2 = q5 + 1 − qψ and the rst three derivatives of ψ with respect to θψ are (1)

Dψ;θ0 = ΨBξ V6 , ψ

(2) e 0 + Ψ (Ip − Pξ ) T4 Dξ L0 + ΨPξ 1p ξ −2 (V21 ⊗ V21 ) Dψ;θ0 ,θ0 = ΨL 2 2 ψ,1 ψ

ψ

−1 +ξψ,1 (ΨT4 Dξ T5 V22 ⊗ V21 ) 2Nν2 , and (3)

−2 Dψ;θ0 ,θ0 ,θ0 = ΨPξ 1p ξψ,1 ψ

ψ

ψ

h (2) Dξ

0 0 ψ,1 ;θψ,θψ

 i −1 ⊗ V21 Jν2 − 2ξψ,1 (V21 ⊗ V21 ⊗ V21 )

 0 e 0 + Ψ D(2)0 0 0 T0 Dξ T4 ∗ V0 T0 Dξ T4 Jν + Ψ (L2 Dξ T0 ∗ V0 T0 Dξ T0 )0 Jν +ΨL 3 5 22 5 4 22 5 4 2 2 ξ ;θ ,θ ψ,2

ψ

ψ

   0 (2)0 0 +Ψ (Ip − Pξ ) T4 Dξ L03 + Dξ ;θ0 ,θ0 T05 ∗ V22 T05 Jν2 ψ,2

ψ

ψ

   −1 e 0 ⊗ V21 Jν + ξ −1 Ψ T4 Dξ T5 D(2) +ξψ,1 Ψ L 2 2 ψ,1 ξ

0 0 ψ,2 ;θψ,θψ

 (2) −1 −1 +ξψ,1 Ψ (T4 Dξ L02 ⊗ V21 ) Jν2 + ξψ,1 Ψ Dξ

0 0 ψ,1 ;θψ,θψ

0 0 where L2 = (V22 T05 ∗ V22 T05 ) ,

0 L3 = L2 ∗ V22 T05 ,

e3 = L e 2 ∗ V0 T0 Dξ T0 , L 22 5 4  (2) 0 0 ψ ;θψ,θψ



=

(2)



0 0 ψ,1 ;θψ,θψ

(2)



 ⊗ V21 Jν2

 ⊗ T4 Dξ T5 V22 Jν2 ,

e 2 = (V0 T0 Dξ T0 ∗ V0 T0 Dξ T0 ) , L 22 5 4 22 5 4

+e 0 Cψ , Pξ = Bξ Wψ

 h i e 0 1p ξ −2 (V21 ⊗ V21 ) − T4 Dξ L0 ,  = W+ C 2 ψ,1 ψ ψ

0 0 ψ,2 ;θψ,θψ

Ψ = ψ dg , and Jν2 is dened in Table 101. If Cψ in parameterization (4a) of Table 111 is empty (i.e., there are no constraints on ln ψ ), (q +1)0 , V22 = (0q5 ×1 Iq5 ), ξψ = θψ , Pξ = 0 and the then qψ = 0, ν2 = q5 + 1, V6 = Iq5 +1 , V21 = e1 5 derivatives simplify to (1)

Dψ;θ0 = ΨBξ , ψ

(2)

e 0 + ΨT4 Dξ L0 + ξ −1 (ΨT4 Dξ T5 V22 ⊗ V21 ) 2Nν , and Dψ;θ0 ,θ0 = ΨL 2 2 2 ψ,1 ψ

ψ

0

(3)

e 0 + Ψ (L2 Dξ T0 ∗ V0 T0 Dξ T0 ) Jν + ΨT4 Dξ L0 Dψ;θ0 ,θ0 ,θ0 = ΨL 3 4 22 5 4 3 2 ψ

ψ

ψ

Details83   −1 e 0 ⊗ V21 Jν + ξ −1 Ψ (T4 Dξ L0 ⊗ V21 ) Jν , +ξψ,1 Ψ L 2 2 2 2 ψ,1 0 0 where L2 = (V22 T05 ∗ V22 T05 ) ,

0 L3 = L2 ∗ V22 T05 ,

e 2 = (V0 T0 Dξ T0 ∗ V0 T0 Dξ T0 ) , L 22 5 4 22 5 4

e3 = L e 2 ∗ V0 T0 Dξ T0 , L 22 5 4

and Jν2 is dened in Table 101.

111.6.2

Initial Guesses for ξψ and

θψ

Write ψ as    ψ = ψ(ξ ψ ), where ψ(ξ ψ ) = ξψ,1 exp T4 exp T5 ξ ψ,2 A three-stage procedure is used to obtain an initial guess for ξ ψ . The first stage employs the Gauss-Newton algorithm to minimize   0  b −ψ b −ψ SSE(ω) = ψ ψ S ω S ω with respect to the scalar ω, where  ψ ω = ψ(ξ ψ ), and ξ ψ = 

exp

n

1 0 p 1p

b ) ln (ψ S

o .

1d ω The initial guess for ω is zero. At iteration h + 1, the estimate of ω is updated by   −1 b −ψ b ω bh+1 = ω bh + αh (x0h xh ) x0h ψ S ωh , where

b ψ ωh

  =ψ b ξ ψ,h ,

b ξ ψ,h

 o n b ) exp p1 10p ln (ψ S , = 1d ω bh

xh = Ψh T4 (exp {T5 1q5 ω bh })dg T5 1q5 ,

  b Ψh = ψ . ωh dg

and αh ∈ (0, 1]. The second stage employs the Gauss-Newton algorithm to minimize h i0 h i b − ψ(ξ ) (R S)−1 ψ b − ψ(ξ ) SSE(ξ ψ ) = ψ S ψ S ψ with respect to ξ ψ,2 , where    ψ(ξ ψ ) = ξψ,1 exp T4 exp T5 ξ ψ,2 ,

 n o b ) exp p1 10p ln (ψ S , ξψ =  ξ ψ,2

the initial guess for ξ ψ,2 is 1q5 ω b , and ω b is the minimizer from stage one. At iteration h + 1, the

Details84 estimate is updated by h i−1 −1 −1 b b b ξ ψ,2,h+1 = b ξ ψ,2,h + αh X0h (R S) Xh + βh Iq5 X0h (R S) (ψ S − ψ h ), where   b =ψ b ψ ξ ψ,h , h

 n o Xh = Ψh T4 exp T5 b ξ ψ,2,h T5 , dg

 b ξ ψ,h = 

exp

n

1 0 p 1p

  b Ψh = ψ , h dg

o b ) ln (ψ S ,

ξ ψ,2,h αh ∈ (0, 1], and βh ≥ 0. The values of αh and βh are chosen to ensure that SSE decreases at each iteration. If ψ is subject to no additional constraints, then the initial guess for θ ψ is b ξ ψ . If ψ is subject to Cψ0 ln (ψ) = cψ , then the third stage is executed. In this stage, the initial guess for θ ψ is obtained as the minimizer of h h i 0 i −1 b b −ψ ξ (R S) SSE(θ ψ ) = ψ ψS − ψ ξ ψ ψ S with respect to θ ψ , where ψ(ξ ψ ) = ξψ,1 exp



  T4 exp T5 ξ ψ,2 ,

 ξ ψ = V5 η ψ + V6 θ ψ ,

ξψ =

 ξψ,1 , ξ ψ,2

η ψ satisfies    Cψ0 1p ln(ξψ,1 ) + T4 exp T5 ξ ψ,2 = cψ for fixed θ ψ , and where the matrices V5 and V6 are given in Theorem 109. At iteration h + 1, the estimate of θ ψ is updated as h i−1 −1 b bψ,h+1 = θ bψ,h + αh X0 (R S)−1 Xh + βh Iν b θ X0h (R S) (ψ S − ψ h ), where h 2   b =ψ b ψ ξ ψ,h , h

h −1 Xh = Ψh 1p ξψ,1,h

  i T4 T5 b ξ ψ,2,h T5 V6,h , dg

b bψ,h , b ψ,h + V6,h−1 θ ξ ψ,h = V5,h−1 η αh ∈ (0, 1], βh ≥ 0, and the matrices V5,h and V6,h are given in Theorem 109 in which b ξ ψ,h has been substituted for ξ ψ . The values of αh and βh are chosen to ensure that SSE decreases at each bψ,h+1 , the value of b iteration. Given θ ξ ψ,h+1 is obtained by solving h   n oi b ψ,h+1 , where Cψ0 1p ln ξbψ,1,h+1 + T4 exp T5 b ξ ψ,2,h+1 = cψ for η

b ξ ψ,h+1 =

111.6.3

Solving for ηψ and

ξbψ,1,h+1 b ξ ψ,2,h+1

! bψ,h+1 . b ψ,h+1 + V6,h θ = V5,h η

ξψ

The issue in this section is to solve h   n oi Cψ0 1p ln ξbψ,1 + T4 exp T5 b ξ ψ,2 = cψ ,

Details85 bψ , where for η ξbψ,1 b ξ ψ,2

b ξψ =

! bψ , b ψ + V6 θ = V5 η

bψ is a fixed vector and V5 ∈ Oq +1,q and V6 ∈ Oq +1,q +1−q are fixed matrices that satisfy θ 5 5 5 ψ ψ b ψ,0 = V50 b V50 V6 = 0. To solve this equation, first set η ξ ψ , where b ξ ψ is the current estimate of ξ ψ . b ψ,j+1 is A modified Newton update for η b ψ,j+1 = η b ψ,j η h n −1 −αj Cψ0 1p ξbψ,1,j

o  n i o−1 T5 V5 T4 exp T5 b ξ ψ,2,j dg

oi o h n n ξ ψ,2,j − cψ × Cψ0 1p ln(ξbψ,1,j ) + T4 exp T5 b bψ . b ψ,j + V6 θ where αj ∈ (0, 1], and b ξ ψ,j = V5 η

111.7

Parameterization (4b): ψ = θψ,1 exp T4 exp T5 ξψ ,   

Cψ0 ln

Qp

ψ 1/p ψi



= cψ

i=1

111.7.1

Derivatives of ψ with Respect to

θψ

The matrices of first, second, and third derivatives of ψ with respect to θψ are given in Theorem 112.

Theorem 112.

Partition θψ as θψ = (θψ,1 θ 0ψ,2 )0 , where ξψ is an implicit function of θ ψ,2 . Dene dξ and Dξ as in (132). and dene Wψ as 0 Wψ def = Cψ T4 Dξ T5 .

It is assumed that Cψ has been chosen such that the qψ × q5 matrix   ∂ ψ  C 0 ln  Q 1 p ∂ ξ 0ψ ψ ψp i=1

i

0 has full row-rank. It follows that Wψ has full row-rank. Write Wψ as SVD (Wψ ) = Uψ Dψ Vψ , where Vψ has been partitioned as in (130). Then, ξψ can be written in as

ξ ψ = V5 η ψ + V6 θ ψ,2 where η ψ = V50 ξ ψ ,

θψ = V60 ξ ψ ,

V5 = Vψ,1 , and V6 = Vψ,2 . Dene h as in (136). The entries in h represent the dimensions of θψ,1 and θ ψ,2 . That is,   1 h= . q5 − qψ

Then, ν2 = dim1 (θψ ) = 1 + q5 − qψ and for xed V5 and V6 , the rst three derivatives of ψ with

Details86 respect to θψ are (1)

Dψ;θ0 =

2 X

ψ

(1)

Dψ;θ0 E0s,h , ψ,s

s=1

(2)

Dψ;θ0 ,θ0 = ψ

2 X 2 X

(2)

Dψ;θ0

0 ψ,s,θψ,t

ψ

 E0s,h ⊗ E0t,h ,

s=1 t=1

(3)

Dψ;θ0 ,θ0 ,θ0 = ψ

ψ

2 X 2 X 2 X

(3)

 E0s,h ⊗ E0t,h ⊗ E0u,h , where

Dψ;θ0

0 0 ψ,s,θψ,t,θψ,u

ψ

s=1 t=1 u=1 (1)

(1)

−1 Dψ;θψ,1 = ψθψ,1 ;

(2)

Dψ;θ0

0 ψ,2,θψ,2

Dψ;θ0 = ΨT4 Dξ T5 V6 ; ψ,2

e0 ; = ΨT4 Dξ (Iq4 − Pξ ) L02 + ΨL 2

(2)

Dψ;θψ,1 ,θψ,1 = 0p×1 ; (3)

(3)

(2)

0 ψ,2,θψ,2

−1 = θψ,1 Dψ;θ0

0 ψ,2,θψ,2



(3)

Dψ;θ0

0 0 ψ,2,θψ,2,θψ,2

= ΨT4 Dξ (Iq4 − Pξ )

(1)

ψ,2,θψ,1

−1 θψ,1 Dψ;θ0 ; ψ,2

(3)

Dψ;θψ,1,θψ,1,θψ,1 = 0p×1 ;

Dψ;θψ,1 ,θ0

(2)

Dψ;θ0

Dψ;θψ,1 ,θψ,1 ,θ0 = 0p×ν12 ; ψ,2

;

 0 1 0 L3 − L2 P0ξ ∗ V60 T05 Jν12 3

 0 0 e0 ; +Ψ L2 (Iq4 − Pξ ) Dξ T04 ∗ V60 T05 Dξ T04 Jν12 + ΨL 3 L2 = V60 T05 ∗ V60 T05 ;

L3 = L2 ∗ V60 T05 ;

e3 = L e 2 ∗ V0 T0 Dξ T0 ; L 6 5 4

e 2 = V0 T0 Dξ T0 ∗ V0 T0 Dξ T0 ; L 6 5 4 6 5 4

+e 0 Pξ = T5 Wψ Cψ T4 Dξ ;

Ψ = ψ dg , and Ja , Es,h and Nν12 are dened in Table 101. If Cψ in parameterization (4b) of Table 111 is empty, then qψ = 0, V6 = Iq5 , ν12 = q5 , ξ ψ = θψ,2 , Pξ = 0, and the derivatives of ψ simplify to (1)

−1 Dψ;θψ,1 = ψθψ,1 ;

(2)

Dψ;θ0

0 ψ,2,θψ,2

(1)

Dψ;θ0 = ΨT4 Dξ T5 ; ψ,2

e0 ; = ΨT4 Dξ L02 + ΨL 2

(3)

(3)

0 ψ,2,θψ,2

(2)

−1 = θψ,1 Dψ;θ0

0 ψ,2,θψ,2

(1)

ψ,2,θψ,1

−1 θψ,1 Dψ;θ0 ; ψ,2

Dψ;θψ,1 ,θψ,1 ,θ0 = 0p×ν12 ; ψ,2

; 0

(3)

0 0 ψ,2,θψ,2,θψ,2

L2 = T05 ∗ T05 ;

(2)

Dψ;θ0

(3)

Dψ;θψ,1,θψ,1,θψ,1 = 0p×1 ;

Dψ;θψ,1 ,θ0 Dψ;θ0

(2)

Dψ;θψ,1 ,θψ,1 = 0p×1 ;

e0 ; = ΨT4 Dξ L03 + Ψ (L2 Dξ T04 ∗ T05 Dξ T04 ) Jν12 + ΨL 3

L3 = L2 ∗ T05 ;

e 2 = T0 Dξ T0 ∗ T0 Dξ T0 ; L 5 4 5 4

and Ja , Es,h and Nν12 are dened in Table 101.

e3 = L e 2 ∗ T0 Dξ T0 ; L 5 4

Details87

111.7.2

Initial Guesses for ξψ and

θψ

An initial guess for θψ,1 is   b . θbψ,1 = p−1 10p ln ψ S An initial guess for θψ,2 can be obtained in the same manner that the initial guess for θψ was obtained in model (2a) of Table 111, except that (a) θψ is replaced by θψ,2 , b is replaced by ln (ψ b ) − 1p θbψ,1 , and (b) ψ S S (c) (R S)

111.7.3

−1

is replaced by Ip .

Solving for ηψ and

ξψ

The issue in this section is to solve n o Cψ0 T4 exp T5 b ξ ψ = cψ , bψ , where for η b bψ,2 , b ψ + V6 θ ξ ψ = V5 η bψ,2 is a fixed vector and V5 ∈ Oq ,q and V6 ∈ Oq ,q −q are fixed matrices that satisfy θ 5 ψ 5 5 ψ b ψ,0 = V50 b V50 V6 = 0. To solve this equation, first set η ξ ψ , where b ξ ψ is the current estimate of ξ ψ . b ψ,j+1 is A modified Newton update for η bψ,j+1 = η bψ,j η

 −1 h o o i  n n 0 b − αj Cψ T4 exp T5 ξψ,j T5 V5 Cψ0 T4 exp T5 b ξψ,j − cψ , dg

bψ,2 and αj ∈ (0, 1]. bψ,j + V6 θ where b ξψ,j = V5 η

112

Details on Algorithm 1

Lemma 106.

˚ (0). Dene D(1) 0 Suppose that ξγ,u ∈ N g;ξ

γ,u

(1) Dg;ξ0 def = γ,u

∂ vec(G) ∂ ξγ0

as .

ξγ =ξγ,u

Then, (1)

Dg;ξ0

γ,u

h i−1  ˙ 0 (Gu ⊗ Ip ) A1 ˙ 0 (Gu ⊗ Ip ) . = Ip2 − Pu A2 , where Pu = A1 D D p p

Details88 Proof. Recall that vec(G) = A1 η ϕ + A2 ξϕ . Accordingly, ˚ (0) G ∈ Op ∀ ξϕ ∈ N ˙ 0 vec(GG0 − Ip ) = 0 ⇐⇒ D p 0 ˙ 0 ∂ vec(GG − Ip ) = 0 ∀ ξϕ ∈ N ˚ (0) =⇒ D p 0 ∂ ξϕ

∂ ηϕ ˙ 0 (G ⊗ Ip ) A1 =⇒ 2D + A2 p ∂ ξϕ0

! ˚ (0) = 0 ∀ ξϕ ∈ N

=⇒

h i−1 ∂ ηϕ ˙0 ˙ 0 (G ⊗ Ip ) A2 ∀ ξ ∈ N ˚ (0) D ϕ p 0 = − Dp (G ⊗ Ip ) A1 ∂ ξϕ

=⇒

h i−1 ∂ vec(G) ˙ 0 (G ⊗ Ip ) A1 ˙ 0 (G ⊗ Ip ) A2 + A2 = −A1 D D p p 0 ∂ ξϕ

h i−1  ˙ 0 (G ⊗ Ip ) A1 ˙ 0 (G ⊗ Ip ) . = Ip2 − P A2 , where P = A1 D D p p ˙ 0 (G ⊗ Ip ) A1 is nonsingular because ˚ (0), then D Note that if ξϕ ∈ N p  ˙ 0 (G ⊗ Ip ) A1 ∈ N ˚ (0) =⇒ D ˚ Ip(p+1)/2 . ξϕ ∈ N p (1)

Also, note that if ξ γ,u = 0, then Dg;ξ0

γ,u

simplifies to

∂ vec(G) ∂ ξγ0

(1)

= Dg;ξ0 = 2N⊥ p A2 , γ

ξγ =0

but Ip2 − P does not simplify to 2Np (see eq. 164). ˙ 0 (Gu ⊗ Ip ) A1 has dimension p∗ × p∗ , where p∗ = p(p + 1)/2. To avoid inverting The matrix D p ˙ 0 (Gu ⊗ Ip ) A1 this matrix, one can use a second-order Taylor series approximation. Expanding D p around ξ γ,u = 0 yields h

˙ 0 (Gu ⊗ Ip ) A1 D p

i−1

1 ≈ Ip(p+1)/2 − Mu − M∗u + Mu Mu , where 2

˙ 0 (Bu ⊗ Ip ) A1 , Mu = D p

 Bu = dvec 2N⊥ p A2 ξ γ,u , p, p ,

h i ˙ 0 (B∗ ⊗ Ip ) A1 , and B∗ = dvec A1 D ˙ 0 vec (Bu Bu ) , p, p . M∗u = D p u u p  The error of the quadratic approximation has magnitude O kξ γ,u k3 .  Algorithm 1. Given {Γ∗ , θϕ , V1 , V2 }, compute ξϕ = ξγ0 ξλ0 ξΛ0 0 such that h(ξϕ ; Γ∗ ) = 0.

Details89

Step 1.

0 0 0 0 ξλ,0 ξΛ,0 , Choose a small positive . Denote the initial guess for ξϕ by ξϕ,0 = ξγ,0 where ξγ,0 = 0pm ×1 . Compute ηϕ,0 = V10 ξϕ,0 and ξϕ,1 = V1 ηϕ,0 + V2 θϕ . Set u = 1.  Step 2. Compute Γu = Γ∗ Gu , λu = λ(ξλ,u ), and Λu = (λu )dg + dvec T3 ξΛ,u ; p, p , where Gu = G(ξ γ,u ) is computed using the closed-form algorithm in Boik (2008b, Appendix D).

Step 3.

Compute ξϕ,u+1 = V1 ηϕ,u+1 + V2 θϕ , where ηϕ,u+1 = ηϕ,u − αu (Wϕ:g,u V1 )−1 h(ξϕ,u ; Γ∗ ) def ∂ h(ξϕ ; θϕ ) = Wϕ:g,u = ∂ ξϕ0 ξϕ =ξϕ,u



(1)

2C0 L021,p (ΓGu Λu ⊗ Γ) Dg;ξ0

C0 L021,p (ΓGh )

γ,u

     

(1)

Cγ0 (Ip ⊗ Γ) Dg;ξ0

(1)

L21,p Dλ;ξ0

λ,u

⊗2

C0 L021,p (ΓGu )

0qγ ×q2

γ,u

λ,u

⊗2

Cϕ0 (ΓGh )

(1)

Dg;ξ0

γ,u

(1) Dλ;ξ0 def = λ,u

∂ λ ∂ ξλ0

(1) L21,p Dλ;ξ0 λ,u

⊗2

Cϕ0 (ΓGu )

    ,  

0qλ ×q3

Dh3 ;ξ0 (1) Γ) Dg;ξ0 γ,u

T3

0qγ ×q3

(1)

0qλ ×pm Cϕ0 2Np (ΓGu Λu ⊗

⊗2

T3

is given in Lemma 106,

, ξλ =ξλ,u

(1) Dh3 ;ξ0 def = λ,u

∂ h3 (λ) ∂ ξλ0

, ξλ =ξλ,u

and αu ∈ (0, 1] is chosen to ensure that kh(ξϕ,u+1 ; Γ∗ )k < kh(ξϕ,u ; Γ∗ )k.

Step 4.

Set u = u + 1 and go to step 3 unless kh(ξϕ,u ; Γ∗ )k < .

To obtain a modified Gauss-Newton algorithm, Step 4 of Algorithm 1 can be replaced by

Step 4∗ .

Update ξϕ as ξϕ,h+1 = V1 ηϕ,u+1 + V2 θϕ , where 0 ηϕ,u+1 = ηϕ,u − αu βu Ib−rϕ + V10 Wϕ:g,u Wϕ:g,u V1

−1

0 V10 Wϕ:g,u h(ξϕ,u , θϕ ),

and βu ≥ 0 and αu ∈ (0, 1] are chosen to ensure that kh(ξϕ,u+1 ; θϕ )k < kh(ξϕ,u ; θϕ )k.

113

Details on Algorithm 2

Algorithm 2 (Fisher Scoring, Modified-Newton). Step 1.

Set u = 0. Choose small positive numbers  and ∗ . Denote initial guesses as b bψ,0 ) and Φ b 0 = Φ(b b0 ) = Γ b0 Λ b0 Γ b 0 , where b ψ 0 = ψ(θ ξ ϕ,0 ; Γ ξ γ,0 = 0pm ×1 . These guesses should 0 b b (nearly) satisfy h(ξϕ,0 ; Γ0 ) = 0.

Step 2.

cϕ,u as Wϕ of (32) in which (Λ, Γ, ξϕ ) is replaced by (Λ b u, Γ b u, b Dene W ξ ϕ,u ), where  0 0 0 0 b ξ ϕ,u = b ξ γ,u b ξ λ,u b ξ Λ,u and b ξ γ,u = 0pm ×1 . Reduce the row dimension of h(ξϕ ; Γ), if c necessary, by applying (40) to Wϕ,u .

Details90 b 1,u and V2,u Step 3. Dene V 

bu cϕ,u . Set θ to be V1 and V2 of (33) in which Wϕ is replaced by W 0 bϕ,u = V b b c = where θ 2,u ξ ϕ,u . If the smallest singular value of Wϕ,u is smaller than  , then terminate Algorithm 2 and execute Algorithm 4 or 5 in Ÿ6.2. Otherwise, update b as follows: θ b0 θ ϕ,u ∗

0 b0 θ ψ,u ,

 −1 (1)0 b (1) (1) bu+1 = θ bu + αu βu Iν˙ + D(1)00 H b u D(1) 0 b u ), where D ˆ 0 = Dσ;θ0 θ=θˆ , θ D ˆ0 H u (s − σ ˆ ˆ u σ;θ u

σ;θ u

σ;θ u

σ;θ u

bu = Σ b −1 ⊗ Σ b −1 if L = L1 , H b u = S−1 ⊗ S−1 if L = L2 , H bu = Ω b + if L = L3 , D(1) 0 is H u u 22,n σ;θ given in Theorem 10, and ν˙ = dim1 (θ) = νϕ + νψ . The scalars αu ∈ (0, 1] and βu ≥ 0 are (1) chosen to ensure that the value of DL,θˆ decreases at each iteration.

Step 4.

bu+1 , V1 = V bu , θϕ = θ b 1,u , and V2 = V b 2,u , to compute Use Algorithm 1, in which Γ∗ = Γ b bϕ,u+1 ). Set Γ bu+1 = Γ bu G(b ξ ϕ,u+1 = ξϕ (θ ξ γ,u+1 ) and then set b ξ γ,u+1 = 0pm ×1 . Compute b b b b b Λu+1 = Λ(ξ λ,u+1 , ξ Λ,u+1 ). Compute ψ u+1 = ψ(θ ψ,u+1 ) using the modied Newton algorithm that is given in Ÿsec:SD of this Supplement.

Step 5.

(1)

Set u = u + 1 and go to step 2 unless kDL,θˆ k < . u

If nS has the Wishart density Wp (n, Σ), then the score function and the information matrix  (1) (1)0 are −(1/2)DL1 ;θ = (n/2)Dσ;θ0 Σ−1 ⊗ Σ−1 (s − σ) and Iθ =

 n  (1) 1  (2) (1)0 E DL1 ;θ,θ0 = Dσ;θ0 Σ−1 ⊗ Σ−1 Dσ;θ0 , 2 2

(140)

respectively. It follows that the algorithm for minimizing L1 is a variation on the Fisher scoring algorithm. The algorithms for minimizing L2 and L3 are modified Gauss-Newton algorithms. Modified Newton algorithms can be obtained by replacing Step 3 with  bu+1 = θ bu − αu βu Iν˙ + D(2) θ ˆ

−1

ˆ0 L;θ u ,θ u

(1)

DL;θˆ , u

where L ∈ {L1 , L2 , L3 }. First and second derivatives of L are given in §120 of this Supplement.

114

Details on Algorithm 3

114.1

Notes on the Lagrange Multiplier Method

Consider fitting a parametric model by minimizing loss function L(θ), where θ is a ν-vector ˙ and θ is subject to the q-dimensional constraint h(θ) = 0. The Lagrange multiplier approach is to solve ∂Q = 0 for θ and ζ, where ω = θ 0 ∂ω

0 ζ0 ,

Q = Q(ω) = L(θ) − ζ 0 h(θ),

(141)

and ζ is a q-vector of Lagrange multipliers. Denote the vector of parameter estimates after the ith 0 b0 ζ b0 . Then, the Newton update of ω bi = θ b i at iteration i + 1 is iteration by ω i i h i−1 (2) (1) b i+1 = ω b i − DQ,ωˆ i ,ωˆ 0 ω DQ;ωˆ i . i

(142)

Details91 (1)

(2)

The gradient vector, namely DQ;ωˆ i , and the bordered Hessian matrix, namely DQ,ωˆ i ,ωˆ 0 , can be i written as !   (1) (1)0 b DL;θˆ − D ˆ 0 ζ bi,1 a (1) i def h; θ i i bi = DQ;ωˆ i = a = and bi,2 a b) −h(θ i

(2) b i def A = DQ,ωˆ i ,ωˆ 0 = i

b i,11 A b i,21 A

b i,12 A b i,22 A

!

   (2) b0 D(2) 0 D ˆ ˆ 0 − Iν˙ ⊗ ζ i ˆ i ,θ ˆ h;θ i =  L;θi ,θi (1) −D ˆ 0 h;θ i

Using the well-known expression for the inverse written as ! bi+1 θ = b ζ !

i

bi θ b ζ i

b −1 a −A i bi =

 −1 b −1 − A b −1 A b i,12 A b i,21 A b −1 A b i,12 b i,21 A b −1 A A i,11 i,11 i,11  i,11 −1  − −1 b −1 b b b b Ai,21 Ai,11 Ai,12 Ai,21 Ai,11 (1)

b b 12 ζ DL;θˆ + A i i × bi ) −h(θ

=

! −

i

b −1 D(1) A i,11 L;θ ˆi b ζ i

!

(1) ˆ0 h;θ

(1) ˆ0 h;θ

i

(1)+ ˆ0 h;θ i

Then, D

 −1  b −1 A b i,12 A b i,21 A b −1 A b i,12 A i,11 i,11   −1  −1 b b b − Ai,21 Ai,11 Ai,12

!

 −1  i −1 h b b i,12  A A i,11 bi ) , b i,21 A b −1 A b b i,21 A b −1 D(1) + h(θ A A + i,11 i,12 i,11 L;θ ˆi −Iq

assuming that A−1 i,11 exists and that D D

.

!



bi θ b ζ

i

0q×q



of partitioned matrices, the Newton update can be

i+1

bi θ b ζ

(143) (1)0 −D ˆ 0 h;θ

(1) ˆ0 h;θ

has full row-rank. Write the full-rank SVD of D

i

i

b i,21 = U b iD b iF b 0 , where U b i ∈ Oq , = −A i

b i ∈ D++ , and F b i ∈ Oν,q D ˙ . dg,q

b iD b −1 U b 0 and it follows from Khatri (1966, Lemma 1) that =F i i

 −1  −1 b −1 − A b −1 A b i,12 A b i,21 A b −1 A b i,12 b i,21 A b −1 = F b⊥ F b ⊥0 A b i,11 F b⊥ b ⊥0 , A A F i i i i i,11 i,11 i,11 i,11 b ⊥ is defined in Table 102. Accordingly, the Newton update is where F i ! ! bi − D(1)+0 h(θ bi ) bi+1 θ θ ˆ h;θ = i b ζ 0 i+1   −1 b⊥ F b ⊥0 A b i,11 F b⊥ b ⊥0   −F F i i i i  (1)  bi ) . b i,11 D(1)+0 h(θ  −1 +  (1)+0   DL;θˆ i − A ˆ h;θ i b i,11 F b⊥ F b ⊥0 A b i,11 F b⊥ b ⊥0 F D 0 Iν˙ − A 

ˆ h;θ i

i

i

i

i

as

Details92 A modified Newton algorithm is obtained by changing the step size from one to α ∈ (0, 1]. The modified Newton update is ! ! bi − αD(1)+0 h(θ bi ) bi+1 θ θ ˆ h; θ i = b b (1 − α) ζ ζ i+1 i  −1  b ⊥0 b ⊥0 A b i,11 F b⊥ b⊥ F   F −F i i i i  (1)  bi ) . b i,11 D(1)+0 h(θ  −1 +α  (1)+0   DL;θˆ i − A ˆ h;θ i b i,11 F b⊥ F b ⊥0 A b i,11 F b⊥ b ⊥0 D ˆ 0 Iν˙ − A F i i i i 

h;θ i

(1)

If Dh;θ0 does not have full row-rank, then a finite vector ω that satisfies (141) need not exist. (1)

One could eliminate one or more constraints to ensure that Dh;θ0 does have full row-rank, but b ∈ Cp . If a finite solution eliminating constraints may result in an estimate that does not satisfy Φ to (141) does exist, then the following method may be employed to solve (141). Begin by replacing (142) with (2) (1) b i+1 ) = αDQ;ωˆ i , (144) DQ,ωˆ i ,ωˆ 0 (b ωi − ω i

(1) Dh;θ0

(2)

where α is a step size parameter. If does not have full row-rank, then DQ,ωˆ i ,ωˆ 0 is singular, i bi − ω b i+1 . If (144) is (144) may be inconsistent and, therefore, (144) may have no solution for ω consistent, then a modified Newton update can be computed as h i+ (2) (1) b i+1 ) = α DQ,ωˆ i ,ωˆ 0 DQ;ωˆ i . (b ωi − ω i

Theorem 113 gives expressions for the modified Newton update.

Theorem 113.

  (1) (1) Suppose that the q × ν˙ matrix Dh;θ0 has rank q ∗ < q . If h(θ) ∈ R Dh;θ0 for all θ , then the Moore-Penrose solution to (144) is  ! b (1)+ bi ) θ − αD h( θ 0 b i ˆ θ i+1 h;θ i   =  b b b b0 ζ ζ Iq − αUi U i+1 i i   −1 b⊥ F b ⊥0 A b i,11 F b⊥ b ⊥0   −F F i i i i  (1)  bi ) , b i,11 D(1)+0 h(θ −1  +α  (1)+0   DL;θˆ i − A ˆ h;θ i b i,11 F b⊥ F b ⊥0 A b i,11 F b⊥ b ⊥0 D ˆ 0 Iν˙ − A F i i i i 

h;θ i

b iD b iF b 0 , is the full-rank SVD of D(1) 0 , where U i ˆ h;θ i

b i ∈ Oq,q∗ , U

b i ∈ D++ ∗ , D dg,q

  (1) ∗ b i ∈ Oν,q F ˙ ∗ , and q = rk D ˆ 0 . h;θ i

If Q(ω) is changed from Q(ω) = L(θ) − ζ 0 h(θ) to Q(ω) = L(θ) + ζ 0 h(θ), then the

Details93 Moore-Penrose solution is bi+1 θ b ζ

!

i+1

  bi − αD(1)+0 h(θ bi ) θ ˆ h; θ i   =  b b iU b0 ζ Iq − αU i i

  −1 b⊥ F b ⊥0 A b i,11 F b⊥ b ⊥0   F F i i i i  (1)  bi ) . b i,11 D(1)+0 h(θ  −1 −α  (1)+0   DL;θˆ i − A ˆ h;θ i b i,11 F b⊥ F b ⊥0 A b i,11 F b ⊥0 b⊥ D ˆ 0 Iν˙ − A F i i i i 

h;θ i

Proof. Define δ as b=ω bi − ω b i+1 = δ

b1 δ b2 δ

!

bi − θ bi+1 θ b −ζ b ζ

=

i

! .

i+1

Then, (144) can be written as b iD b iU b0 −F i 0q×q

b i,11 A b b iF b0 −Ui D i b i,11 = D(2) A ˆ

ˆ0 L;θ i ,θ i

!

b1 δ b2 δ

!

  b0 D(2) − Iν˙ ⊗ ζ i ˆ

 =α

0

ˆ h;θ i ,θ i

(1)

(1)0 b ˆ0 ζ i , h;θ i

bi,1 = D ˆ − D a L;θ i

,

 bi,1 a , where bi2 a

b iD b iF b 0 = D(1) 0 , U i ˆ h;θ i

bi ). bi,2 = −h(θ and a

Accordingly, b1 − F b b i,11 δ b iD b iU b 0δ bi,1 , and (a) A i 2 =a (145) b b iD b iF b0δ (b) U ai,2 . i 1 = −b b ⊥0 and using Iν˙ = F b iF b0 + F b ⊥F b ⊥0 yields Premultiplying (145a) by F i i i i b1 = F b b ⊥0 A b i,11 F b ⊥F b ⊥0 δ b ⊥0 ai,1 − F b ⊥0 Ai,11 F b iF b0δ F i i i i i i 1  −1  −1 b1 = F b b ⊥0 δ b ⊥0 A b i,11 F b⊥ b ⊥0 a b ⊥0 b b⊥ b ⊥0 Ai,11 F b iF b0δ =⇒ F F F i i i i bi,1 − Fi Ai,11 Fi i i 1  −1  −1 b b1 = F b⊥ F b ⊥0 A b i,11 F b⊥ b ⊥0 a b⊥ F b ⊥0 A b i,11 F b⊥ b ⊥0 Ai,11 F b iF b0δ b ⊥F b ⊥0 δ =⇒ F F F i i i i bi,1 − Fi i i i i 1 i i

It follows from (145b) that b b iF b0δ b b −1 b 0 bi,2 and F i 1 = − Fi D i U i a   b1 b1 = F b ⊥0 δ b iF b0 + F b ⊥F δ i i i    −1  −1 ⊥ b ⊥0 b ⊥ ⊥0 b b⊥ F b ⊥0 A b i,11 F b⊥ b ⊥0 a b b b b b −1 b 0 bi,2 . b =F F − I − F F A F F A i,1 ν˙ i,11 i i,11 Fi Di Ui a i i i i i i i

Details94 It follows from (145a) that   b b b iU b 0δ b b −1 b 0 b bi,1 U i 2 = Ui Di Fi Ai,11 δ 1 − a   b2 = U b1 − a b iD b −1 F b0 A b i,11 δ b ⊥ z, where bi,1 + U =⇒ δ i i i  b ⊥ satisfies U bi U i

 b ⊥ ∈ Oq and z is arbitrary. U i

Equating z to zero yields the Moore-Penrose solution to (144). The proof is completed by b i,11 , a b0F b⊥ bi,1 , and a bi,2 , and using F substituting definitions for A i i = 0. To verify the update expression for the case in which Q(ω) is defined as Q(ω) = L(θ) + ζ 0 h(θ), begin by defining δ in the same way as before; namely ! ! b1 bi − θ bi+1 δ θ b=ω bi − ω b i+1 = b δ = b . b δ2 ζ −ζ i

i+1

Then, (144) can be written as b i,11 A b b iF b0 Ui D i b i,11 = D(2) A ˆ

ˆ0 L;θ i ,θ i

b iD b iU b0 F i 0q×q

!

b1 δ b2 δ

!

  bi,1 a , where =α bi2 a

  b0 D(2) + Iν˙ ⊗ ζ i ˆ

ˆ0 , h;θ i ,θ i

(1)

bi,1 = D ˆ + D a L;θ i

(1)0 b ˆ0 ζ i , h;θ i

b iD b iF b 0 = D(1) 0 , U i ˆ h;θ i

bi ). bi,2 = h(θ and a

Accordingly, b1 + F b b i,11 δ b iD b iU b 0δ bi,1 , and (a) A i 2 =a (146) b b iD b iF b0δ bi,2 . (b) U i 1 =a b ⊥0 and using Iν˙ = F b iF b0 + F b ⊥F b ⊥0 yields Premultiplying (146a) by F i i i i b1 = F b b ⊥0 A b i,11 F b ⊥F b ⊥0 δ b ⊥0 ai,1 − F b ⊥0 Ai,11 F b iF b0δ F i i i i i i 1  −1  −1 b1 = F b b ⊥0 δ b ⊥0 A b i,11 F b⊥ b ⊥0 a b ⊥0 A b i,11 F b⊥ b ⊥0 Ai,11 F b iF b0δ b F − F F =⇒ F i,1 i i i i i i i i 1 −1  −1  b1 = F b b⊥ F b ⊥0 A b i,11 F b⊥ b ⊥0 a b⊥ F b ⊥0 A b i,11 F b⊥ b ⊥0 Ai,11 F b iF b0δ b ⊥F b ⊥0 δ F F =⇒ F i i i i i i bi,1 − Fi i i i i 1

Details95 It follows from (146b) that b b iF b0δ b b −1 b 0 bi,2 and F i 1 = Fi Di Ui a   b1 = F b1 b iF b0 + F b ⊥F b ⊥0 δ δ i i i    −1  −1 ⊥ b ⊥0 b ⊥ ⊥0 ⊥ b ⊥0 b ⊥ ⊥0 b b b b b b b b iD b −1 U b 0a bi,1 + Iν˙ − Fi Fi Ai,11 Fi = Fi Fi Ai,11 Fi Fi a Fi Ai,11 F i bi,2 . i It follows from (146a) that   b b1 b iU b 0δ b b −1 b 0 bi,1 − A b i,11 δ U i 2 = U i D i Fi a   b2 = U b1 + U b iD b −1 F b0 a b i,11 δ b ⊥ z, where b =⇒ δ − A i,1 i i i  b ⊥ satisfies U bi U i

 b ⊥ ∈ Oq and z is arbitrary. U i

b i,11 , Equating z to zero yields the Moore-Penrose solution to (144). Substituting definitions for A 0 ⊥ b b bi,1 , and a bi,2 , and using Fi Fi = 0 yields the claimed expression. a b to be a It follows from Corollary 4.1 in Boik (2008a), that a necessary condition for θ   0 0 0 b ζ b be zero and that constrained minimizer of L(θ) is that the gradient evaluated at ω = θ ⊥0 ⊥ b A b 11 F b be positive definite. A further modification of the Newton method is to the matrix F b ⊥0 A b i,11 F b ⊥ by F b ⊥0 A b i,11 F b ⊥ + βi Iν−q replace F , where βi is chosen to ensure that ˙ i i i i ⊥0 b ⊥ b b Fi Ai,11 Fi + βi Iν−q is positive definite. ˙

114.2

Description of Algorithm 3

Algorithm 3 (Newton-Lagrange). Step 1. Set u = 0, bξγ,0 = 0p ×1 , and bξΛ,0 = 0q

. Choose small positive values for  and ∗ . b 0 = λ(b b = ψ(θ bψ,0 ). The guesses need not satisfy b0 , λ Denote initial guesses as Γ ξ ), and ψ 0  + λ,0 (1) b =− W c0 b0 ) = 0. Set ζ cϕ,0 = Wϕ h(b ξ ϕ,0 ; Γ D b , where W and 0 ϕ,0 ξϕ =b ξϕ,0 L;ξϕ,0 (1) (1) D b = DL;ξϕ ξ =bξ . L;ξ m

ϕ

ϕ,0

3 ×1

ϕ,0

Step 2.

cϕ,u . If Reduce the row dimension of h(ξϕ ; Γ), if necessary, by applying (40) to W   +   (1) b =− W cϕ,u and set ζ c0 dim1 h(ξϕ ; Γ) is reduced, then recompute W D b . u ϕ,u L;ξ ϕ,u

Step 3.

cϕ,u is smaller than ∗ , then terminate Algorithm 3, and If the smallest singular value of W b as follows: execute Algorithm 4 or 5 in Ÿ6.2. Otherwise, update β h i−1 (1) (1) (1) (2) (2) b b − αu D(2) β = β D , where D = D , D = D , 0 0 0 u+1 u ˆ ˆ Q;β ˆ ˆ ˆ ˆ Q,β,β Q;β Q;β Q,β u ,β u

 0 b = b β ξ ϕ,u u

b0 θ ψ,u

u

u

ˆ β=β u

Q,β u ,β u

ˆ β=β u

0 b0 , b ξ γ,u = 0pm ×1 , Q is dened in (49), and αu ∈ (0, 1] is chosen to ζ u (1)

ensure that the value of kDQ,βˆ k decreases at each iteration. The update can be computed by inverting a matrix of dimension dim1 (β) − dim1 (ζ) rather than dim1 (β).

Details96

Step 4.

b b bu+1 = Γ bu G(b b u+1 = Λ(b Set Γ ξ γ,u+1 ), Λ ξ λ,u+1 , b ξ Λ,u+1 ), and ψ u+1 = ψ(θ ψ,u+1 ).

Step 5.

Set u = u + 1 and go to step 2 unless kDQ,βb k < .

114.3

(1)

u

Derivative Expressions Required for Algorithm 3 (1)

Execution of Algorithm 3 requires expressions for DQ;β listed in this section. Define w and u as     dim1 (ξϕ ) b     and u def ν w def = dim (θ ) ψ = = 1 ψ a dim1 (ζ) where τ = ξϕ0

(2)

and DQ;β,β0 . The required expressions are     dim1 (τ ) b + νψ = , dim1 (ζ) a

0 θψ0 . Then, (1)

(1)

0 DQ;β = E1,u DL;τ + E2,w Wϕ ζ + E2,u h(ξϕ ) and (2)

(2)

0 E02,u DQ;β,β0 = E1,u DL;τ ,τ 0 E01,u + E2,u Wϕ E01,w + E1,w Wϕ

 +E1,w dvec



(2)0 Dh;ξ0 ,ξ0 ζ ϕ ϕ

G=Ip

 ; b, b E01,w .

(1) (2) Expressions for Wϕ = Dh;ξ0 G=I and Dh;ξ0 ,ξ0 G=I are given p p ϕ ϕ ϕ in §119.9.1. Define ν and ν˙ as     dim1 (ξγ ) pm    q2  def  dim1 (ξλ )  0   and ν˙ def ν =  = = 14 ν, dim1 (ξΛ )   q3  νψ dim1 (θψ ) (1)

(2)

where pm is defined in (20). Expressions for DL;τ and DL;τ ,τ 0 are given in §120 in which τ is

Details97 substituted for θ and (1)

(1)

(1)

Dσ;τ 0 = (Ψ ⊗ Ψ) Dϕ;τ 0 + 2Np (ΨΦ ⊗ Ip ) L21,p Dψ;θ0 E04,ν and ψ

(2)

(2)

Dσ;τ 0,τ 0 = (Ψ ⊗ Ψ) Dϕ;τ 0,τ 0 + 2Np Ψ ⊗ i0p ⊗ Ip



 (1) (1) Dϕ;τ 0 ⊗ L21,p Dψ;θ0 E04,ν 2Nν˙ ψ

0

(2)

+ 2Np (ΨΦ ⊗ Ip ) L21,p Dψ;θ0 ,θ0 (E4,ν ⊗ E4,ν ) ψ

ψ

  (1) (1) 0 + 2Np (Ip ⊗ vec0 Φ ⊗ Ip ) L21,p Dψ;θ0 ⊗ L21,p Dψ;θ0 (E4,ν ⊗ E4,ν ) , where ψ

ψ

  (1) (1) (1) Dϕ;τ 0 = 2Np (ΓΛ ⊗ Γ) Dg;ξ0 E01,ν + (Γ ⊗ Γ) L21,p Dλ;ξ0 E02,ν + T3 E03,ν , γ

(147)

λ

 i  h (2) (1) (1) Dϕ;τ 0,τ 0 = 2Np Γ ⊗ i0p ⊗ Γ L21,p Dλ;ξ0 E02,ν + T3 E03,ν ⊗ Dg;ξ0 E01,ν 2Nν˙ γ

λ

  (1) (1) 0 − 2Np (Γ ⊗ vec0 Λ ⊗ Γ) Dg;ξ0 ⊗ Dg;ξ0 (E1,ν ⊗ E1,ν ) γ

γ

0

(2)

+ 2Np (ΓΛ ⊗ Γ) Dg;ξ0 ,ξ0 (E1,ν ⊗ E1,ν ) γ

γ

0

(2)

+ (Γ ⊗ Γ) L21,p Dλ;ξ0 ,ξ0 (E2,ν ⊗ E2,ν ) , λ

λ

and ip = vec(Ip ).

115

Details on Algorithm 4

There are two variants of Algorithms 4 and 5. The variants differ in the manner in which constraints of the form Cγ0 vec(Γ) = cγ are managed. The first variant corresponds to the parameterization described in the article. That is the constraint Cγ0 vec(Γ) − cγ = 0 is an explicit sub-vector in the constraint vector h(ξϕ ; Γ) in (30). The descriptions for Algorithm 4 on page 97 and Algorithm 5 on page 101 of this Supplement are based on this first variant. The second variant absorbs the constraint Cγ0 vec(Γ) − cγ = 0 into the parameterization of G in Γ = Γ∗ G (see eq. 20). Specifically, G is parameterized as in (17) in BPH, where θγ is replaced by ξγ , dim1 (ξγ ) = pm − rk , pm is defined in (20), rk is defined in Theorem 1 in BPH, and V3 and V4 are chosen as in Theorem 4 of BPH. Absorbing the constraint Cγ0 vec(Γ) − cγ = 0 into the parameterization of G reduces dim1 (h(ξϕ ; Γ)) and dim1 (Wϕ ) by qγ and reduces dim2 (Wϕ ) by rk . The descriptions for Algorithm 4 on page 97 and Algorithm 5 on page 101 of this Supplement also bu+1 = Γ bu G(b apply to the second variant, except that the solution to Γ ξ γ,u+1 ) in Step 3 is obtained using the algorithm described in §10.4 of the online supplement to BPH. For example, consider the model 2 analysis of the Lawley and Maxwell (1971) data set in §8 of the article. For variant 1, the dimensions of the relevant quantities are listed in Table 112. For variant 2, the dimensions of the relevant quantities are listed in Table 113.

Algorithm 4. Step 1.

(Newton-Wright)

Set u = 0, b ξ γ,0 = 0pm ×1 , and b ξ Λ,0 = 0q3 ×1 . Choose small positive values for  and ∗ . b b = ψ(θ bψ,0 ). The guesses need not satisfy b Denote initial guesses as Γ0 , λ0 = λ(b ξ λ,0 ), and ψ 0

Details98

Table 112: Variant 1 Dimensions: Lawley & Maxwell Data p=6 dim(ξγ ) = pm × 1 q2 = 3 dim(C) = p × p − 1 dim(Cγ ) = p2 × qγ a = 13 a−z =9 dim(hϕ ) = (a − z + bz) × 1 dim(Hϕ ) = 73 × 58

0 m= 1 1 2 2 pm = 13 dim(ξΛ ) = q3 × 1 dim(Cλ ) = p × qλ qγ = 8 b = 16 dim(θψ ) = ν2 dim(hϕ ) = 73 × 1 rk(Hϕ ) = 49

A=∅ dim(ξλ ) = q2 × 1 q3 = 0 qλ = 0 dim(Wϕ ) = a × b z=4 ν2 = p dim(Hϕ ) = 73 × [b + z(a − z) + ν2 ]

Table 113: Variant 2 Dimensions: Lawley & Maxwell Data p=6 dim(ξγ ) = (pm − rk ) × 1 dim(ξγ ) = 5 × 1 dim(ξΛ ) = q3 × 1 dim(Cλ ) = p × qλ a=5 a−z =1 dim(hϕ ) = (a − z + bz) × 1 dim(Hϕ ) = 33 × 18

0 m= 1 1 2 2 ρk = 8 dim(ξλ ) = q2 × 1 q3 = 0 qλ = 0 b=8 dim(θψ ) = ν2 dim(hϕ ) = 33 × 1 rk(Hϕ ) = 9

A=∅ dim(ξγ ) = 5 × 1 q2 = 3 dim(C) = p × p − 1 dim(Wϕ ) = a × b z=4 ν2 = p dim(Hϕ ) = 33 × [b + z(a − z) + ν2 ]

i+  0 h (1)0 0 0 b = − Hϕ (b b b b b0 ) = 0. Set ζ b b 0 D h(b ξ ϕ,0 ; Γ τ , ξ ; Γ , Π ) 0 0 0 0 νπ ×1 , where Π0 = U0 , 0 π,0 L;b τ0 h i b 0D b 0V b 0 = SVD Wϕ (b b0 , b cϕ,0 = Wϕ and U ξ ϕ,0 ; Γ ξ γ,0 = 0pm ×1 , W 0 ξϕ =b ξϕ,0 (1) (1) DL;ˆτ 0 = DL;τ τ =ˆτ 0 .

Step 2.

b as follows: Update β h b b e (2) β u+1 = β u − αu D ˆ

ˆ0 Q,β u ,β u

e (2) D ˆ

ˆ0 Q,β u ,β u

(2) = DQ,β,β0

def

ˆ β=β u

i−1

(1) (1) (1) DQ;βˆ , where DQ;βˆ = DQ;β

− E3,v ω b Iνζ E03,v ,

u

u

v def = ν˙

νπ

νζ

0

,

ˆ β=β u

,

h i (1) ω b = 10νβ abs DQ;βˆ , u

def ν˙ def (τ ), νπ def = dim = dim1 (ξπ )= z(a − z), νζ = dim1 (ζ) = a − z + bz ,  10 0 b = b b0 b0 , b b0 ξ γ,u = 0pm ×1 , b ξπ,u = 0z(a−z)×1 , Q is dened in (51), and β ξ ϕ,u θ ζ u ψ,u ξπ,u u (1)

αu ∈ (0, 1] is chosen to ensure that the value of kDQ,βˆ k decreases at each iteration. The e (2) 0 with dimension νβ × νβ can be computed as follows: matrix D Q,β,β (2) DQ,β,β0 def =

∂Q 0  0 ξγ =0 − E3,v ωIνζ E3,v (∂ β) ⊗ ∂ β ξ =0, π

Details99     (2) (2) (2)0 DL;τ ,τ 0 + dvec Dhϕ; τ 0 ,τ 0 ζ; ν, ˙ ν˙ dvec Dhϕ; ξ0 ,τ 0 ζ; ν, ˙ νπ π       (2)0 (2)0 = dvec Dhϕ; τ 0 ,ξ0 ζ; νπ , ν˙ dvec Dhϕ; ξ0 ,ξ0 ζ; νπ , νπ π π π  (1) (1) Dhϕ; τ 0 Dhϕ; ξ0 

π

(1)0

Dhϕ; τ 0



 (1)0  Dhϕ; ξ0  , π −ωIνζ

where νβ = dim1 (β) = ν˙ + νπ + νζ .

Step 3.

b b bu+1 = Γ bu G(b b u+1 = Λ(b Set Γ ξ γ,u+1 ), Λ ξ λ,u+1 , b ξ Λ,u+1 ), ψ u+1 = ψ(θ ψ,u+1 ), and   bu+1 = Γ bu G(b b u+1 = Π b u Gw b ξ γ,u+1 ), is computed using the Π ξ π,u+1 . The update Γ closed-form algorithm in Boik (2008b, Appendix D).

Step 4.

(1)

Set u = u + 1 and go to step 2 unless kDQ,βb k < . u

116

Details on Algorithm 5

116.1

Justication of

(53)

Consider the problem of minimizing L(θ) subject to h(θ) = 0, where θ ∈ Θ. The critical points of (1) interest are solutions for β to the Lagrange estimating equation DQ;β = 0, where (1) DQ;β

(1)

(1)0

DL;θ + Dh;θ0 ζ h(θ)

=

! and β = θ 0

0 ζ0 .

(148)

(1)

It is assumed that at least one bounded solution to DQ;β = 0 exists and that L and h are twice 0 differentiable in open neighborhoods of bounded solutions. Note that if θ = ξϕ0 θψ0 ξπ0 and (1)

h = hϕ , then the estimating equation DQ;β = 0 is identical to the estimating equation in (51).

Theorem 114 (Izmailov & Solodov). (1)

 b= θ b0 Suppose that β

b0 ζ

0

is a bounded solution to

(1) ˆ0 h;θ

by rh . It is assumed that the value of rh is known. The value  0 (1) e= θ e0 ζ e0 is an initial of rh can be obtained by examining the singular values of D e0 where β h; θ e D, e V} e and {U, b D, b V} b as guess for β . Dene {U,   eD eV e 0 = SVD D(1) 0 and U e

DQ;β = 0. Denote the rank of D

h; θ

b0



bD bV = U b1 U

b D11 b U2 0

0 0



b0 V 1 b0 V

2

!

    (1) b 1D b 11 V b 0 = svd D(1) 0 , = SVD D b0 = U 1 b h; θ

h; θ

   b 1 = rh . Partition U e as U e = U e1 where dim2 U

   e e 1 = rh . Assume that θ e 2 , where dim2 U U   0 ∗0 0 b so that U b∗ exists such that β b∗ = θ e0 U b 2 is nonsingular. Then a vector ζ b b close enough to θ ζ 2 (1)

is a solution to the modied Lagrange estimating equation MQ;β = 0, where 

(1) MQ;β

 (1) (1)0 DL;θ + Dh;θ0 ζ   . = e e 2U e0 ζ − ζ h(θ) + U 2

Details100    ∗   b =0⇒U b −ζ e = 0. Furthermore, if β b ∗∗ = θ e 2U e0 ζ b0 Note that h θ 2

to

(1) MQ;β

b = 0, then ζ

∗∗



b∗∗0 ζ

0

also is a solution



b . That is ζ b is unique. =ζ

b is a solution to D(1) = 0, then V b 1D b 11 U b 0 ζ = −D(1) is consistent and Proof. First note that if β 1 Q;β b L; θ   ∗ (1) b as b 1 . Define b this implies that D b ∈ R V z∗ and ζ L; θ  −1 h i e and ζ b∗ def e0 U b2 e0 U b 1D b −1 V b 0 D(1) + ζ b 1D b −1 V b 0 D(1) + U b 2b b z∗ def U z∗ . = U = −U 2 2 1 L; θ 1 L; θ 11 11 b b Then h i  ∗ b 1D b 11 U b 0 −U b 1D b −1 V b 0 D(1) + U b 2b +V z 1 1 L; θ 11 b     −1 h i =    (1) −1 b +U e −ζ e e 2U e 0 −U b 1D b V b0D b2 U e0 U b2 e0 U b 1D b −1 V b 0 D(1) + ζ h θ + U U 2 1 2 2 1 11 11 b b 

(1) b∗ Q; β

M

(1) b L; θ

D

L; θ



L; θ

(1) b b 0 (1) b − V1 V1 DL; θ b L; θ h (1) (1) 0 b b −1 b 0 0 b b −1 b 0 e e e e U2 U2 U1 D11 V1 D b + U2 U2 U1 D11 V1 D b L; θ L; θ



D

=   b − h θ

i  = 0, e −U e e 2U e0 ζ +ζ 2

    b = 0 and D(1) ∈ R V b ∗ is a solution to M(1) = 0. b 1 . Accordingly, β because h θ Q;β b L;θ

b ∗∗ also is a solution to M(1) = 0. Then, Second, suppose that β Q;β

b∗∗ = −U b 1D b −1 V b 0 D(1) + U b 2b ζ z∗∗ for some b z∗∗ 1 11 b L; θ

   ∗∗  b +U b −ζ e =0 e 2U e0 ζ and h θ 2  ∗∗    b −ζ e = 0 because h θ b =0 e 2U e0 ζ =⇒ U 2   ∗∗ b −ζ e =0 e0 ζ =⇒ U 2   ∗∗ e =0 e 0 −U b 1D b −1 V b 0 D(1) + U b 2b =⇒ U z − ζ 2 1 11 b L; θ

e0 U b z∗∗ = U e0 U b b −1 b 0 (1) e0 e =⇒ U 2 2b 2 1 D11 V1 D b + U2 ζ L; θ

 −1   e0 U b e0 U b b −1 b 0 (1) e0 e =⇒ b z∗∗ = U U 2 2 2 1 D11 V1 D b + U2 ζ L; θ

b =⇒ ζ

116.2

∗∗



b . =ζ

Description of Algorithm 5

Recall that there are two variants of Algorithm 5. See page 97 of this Supplement for a description b ϕ,u . It of the two variants. Denote the value of Hϕ in (52) at the uth iteration of Algorithm 5 by H

Details101 is assumed that

  b ϕ,u = rk (Hϕ ) . lim rk H

u→∞

Algorithm 5.

(Izmailov and Solodov)

Step 0.

e D, e V} e as U eD eV e 0 = SVD(H e ϕ ), where H e ϕ def b 0, Π b 0 ), Π b0 = U b 0, Dene {U, τ 0, b ξ π,0 ; Γ = Hϕ (b h i b 0D b 0V b 0 = SVD Wϕ (b b 0 ) , and b e D e , and V e as U ξ ϕ,0 ; Γ ξ γ,0 = 0pm ×1 . Partition U 0 

e = U e1 U



e2 , U

e 11 D e 21 D

e = D

e 12 D e 22 D

!

 e = V e1 , and V

 e2 , V

e 1 ) = dim2 (V e 1 ) = rh , dim(D e 11 ) = rh × rh , and rh def where dim2 (U = rk(Hϕ ) = b ϕ,u ) = the number of singular values of H e ϕ that have magnitude Op (1). limu→∞ rk(H

Step 1.

Set u = 0, b ξ γ,0 = 0pm ×1 , and b ξ Λ,0 = 0q3 ×1 . Choose small positive values for  and ∗ . b 0 = λ(b b = ψ(θ bψ,0 ). The guesses need not satisfy b0 , λ Denote initial guesses as Γ ξ λ,0 ), and ψ 0  0 (1) (1) (1)0 −1 e 0 0 e=ζ b = −U b0 ) = 0. Set ζ e e . h(b ξ ϕ,0 ; Γ D V D 0 1 0 1 11 νπ ×1 , where and DL;ˆ τ 0 = DL;τ τ =ˆ L;b τ0 τ0

Step 2.

b as follows: Update β i−1

h (2) b b β u+1 = β u − αu M ˆ

ˆ0 Q,β u ,β u

(1) (1) (1) MQ;βˆ , where MQ;βˆ = MQ;β u

(2) def ˆ ,β ˆ0 = Q,β u u

M  0 (1) b = b MQ;β dened in (53), β ξ ϕ,u u

b0 θ ψ,u

u

(2) MQ,β,β0

0 b ξπ,u

ˆ β=β u

ˆ β=β u

,

,

0 b0 , b ξ γ,u = 0pm ×1 , b ξπ,u = 0z(a−z)×1 , and ζ u (1)

αu ∈ (0, 1] is chosen to ensure that the value of kMQ,βˆ k decreases at each iteration. The (2)

matrix MQ,β,β0 with dimension νβ × νβ can be computed as follows: (2)

MQ,β,β0 def =

∂ MQ,β ξγ =0 ∂ β 0 ξπ =0,

    (2) (2) (2)0 DL;τ ,τ 0 + dvec Dhϕ; τ 0 ,τ 0 ζ; ν, ˙ ν˙ dvec Dhϕ; ξ0 ,τ 0 ζ; ν, ˙ νπ π       (2)0 (2)0 = dvec Dhϕ; τ 0 ,ξ0 ζ; νπ , ν˙ dvec Dhϕ; ξ0 ,ξ0 ζ; νπ , νπ π π π  (1) (1) Dhϕ; τ 0 Dhϕ; ξ0 

π

(1)0

Dhϕ; τ 0

 (1)0  Dhϕ; ξ0  , π 0 e e U2 U2

where νβ = dim1 (β) = ν˙ + νπ + νζ .

Step 3.

b b bu+1 = Γ bu G(b b u+1 = Λ(b Set Γ ξ γ,u+1 ), Λ ξ λ,u+1 , b ξ Λ,u+1 ), ψ u+1 = ψ(θ ψ,u+1 ), and   b u+1 = Π b u Gw b bu+1 = Γ bu G(b Π ξ π,u+1 . The update Γ ξ γ,u+1 ), is computed using the closed-form algorithm in Boik (2008b, Appendix D).

Step 4.

(1)

Set u = u + 1 and go to step 2 unless kMQ,βb k < . u



Details102

117

Illustrations

117.1

Illustrations of Theorem 4 and Corollary 4.1

This section contains several illustrations of the results of Theorem 4 and its corollary. In these illustrations, the rows of λ and the columns of Γ may need to be permuted to satisfy λ1 ≥ · · · ≥ λp .

117.1.1

Illustration 1a

It follows Theorem 4(b) that if      1/2 3 1 3 0 0 √ −1 1 −1  1/ 12 2 0   √  , then nγ = 1 and λ = 14 +   u Γ= −1 1 −1 −1 1  1/√6  −1 1 −1 −1 −1 1/ 2 dg

for some u ∈ (−1/3, 1) and the three smallest (or largest) eigenvalues are constrained to be equal. Note that Γ satisfies the conditions in Corollary 4.1(a) with s = 1 and that Sλ (Γ) contains λ with 0 0 multiplicity vectors 4, 1 3 and 3 1 .

117.1.2

Illustration 1b

It follows from Theorem 4(b) that if     1/2 1 3 1 1 3 √  1/ 20 1 −1 1 −1 −3   , then nγ = 2 and λ = 14 +  Γ= 1 −1 −1 −1  3  1/2 √ 1 −3 1 −1 −1 1/ 20 dg

 0   1  u1 , −2 u2 1

where u1 ∈ (−1/3, 1) and u2 ∈ (−1 + u1 , 3 + u1 ) ∩ (−[u1 + 3]/2, [1 − u1 ]/2). Note that Γ satisfies the conditions in Corollary 4.1(a) with s = 2 and that Sλ (Γ) contains λ with multiplicity vectors 0 0 0 0 0 4, 1 3 , 3 1 , 2 1 1 , 1 2 1 , and 1 1 2 .

117.1.3

Illustration 1c

 Suppose that Γ = Γ1 Γ2 , where Γ1 and Γ2 are given in (150) on page 104 of this Supplement. 0 This eigenvector structure is induced by constraining λ to have multiplicity vector m = 2 2 . 0 Although m = 2 2 induces this eigenvector structure, this eigenvector structure does not solely 0 induce m = 2 2 . If ±Qi ∈ P2 for i = 1, 2,       −1 0 −1 0 0 1 ± Qi ∈ P2 for i = 1, 2, or Q2 = ± Q1 , (149) 0 1 0 1 1 0 then it follows from Theorem 4(b) that λ has one of the following structures:     1 0   1 0     0 1 1  u1 or λ = 1p +  0  u1 λ = 1p +  −1  0 −1 u2 , 0 u2 −1 0 0 −1

Details103 where ui ∈ (−1, 1) for i = 1, 2. Accordingly, the multiplicity vector for the eigenvalues of Φ is 0 0 m = 4, m = 2 2 , m = 1 2 1 , or m = 104 , depending on the values of u1 and u2 . If ±Qi 6∈ P2 , and/or (149) is not satisfied, then the characterization of N (Γ Γ) is more complicated and a large variety of structures for λ can be generated, depending on the values of a, Q1 , and Q2 in (150). For any values of a, Q1 , and Q2 , however, the structure λ = 1p + 1

1

−1

0 −1 u,

u ∈ (−1, 1),

m= 2

2

0

can be generated.

117.1.4

Illustration 1d

It follows from Theorem 4(b,c) that if   1 1 1 1 1 1 −1 −1  (1/2), then nγ = 3 and Γλdg Γ0 ∈ Cp ∀ λ ∈ Sλ . Γ= 1 −1 1 −1 1 −1 −1 1

117.1.5

Illustration 1e

Suppose that p = 5 and s = 1 column of Γ satisfies γ 2 = 1p (1/p). Then part (a) of Corollary 4.1 0 guarantees that Sλ (Γ) contains vectors of eigenvalues having multiplicity vectors 5, 1 4 , and 0 4 1 . However, Sλ (Γ) could contain vectors of eigenvalues having other multiplicity vectors as well. Suppose that √   √ 1/2 1/2 1/2 1/√20 1/√5 1/ 5 −1/2 1/2 −1/2 1/√20   √  Γ = 1/√5 1/2 −1/2 −1/2 1/√20  P1 ,  1/ 5 −1/2 −1/2 1/2 1/ 20 √ √ 1/ 5 0 0 0 −4/ 20 where P1 ∈ P5 . Then, s = 1 and 

1  0  0 λ = 15 + P2 Zh for some h, where Z =   −3/4 −1/4

 0 0 1 0  0 1  −1 −1 0 0

and P2 ∈ P5 . By choosing h suitably, it is possible to generate vectors of eigenvalues having any 0 0 multiplicity vector that sums to 5 except for 2 3 , and 3 2 .

117.2

Illustrations of Theorem 5 and Corollary 5.1

117.2.1

Illustration 2a

Details104 If p = 2 and λ1 > λ2 , then q = p − q = 1 possible eigen-structures of Φ are    1 λ1 1 λ= , Γ= √ 2 − λ1 −1 2  λ=

 λ1 , 2 − λ1

1 Γ= √ 2

 1 1

2 and Γ must satisfy γ 2 1 = γ 2 = 12 (1/2). The only

 1 , 1

 1 , −1

 Φ=  Φ=



1 1 − λ1

1 − λ1 1

1 λ1 − 1

 λ1 − 1 , where λ1 ∈ (1, 2). 1

and

Note that Φ is parameterized in terms of a single parameter, namely λ1 .

117.2.2

Illustration 2b

0 If m = 1 p − 1 , then each off-diagonal element of Φ is equal to ±(λ1 − 1)/(p − 1) and only 2p−1 eigen-structures are possible. Furthermore, Φ is parameterized in terms of a single parameter, namely λ1 . For example, if p = 4, then one such structure is       1 −ω −ω ω 1 λ1 −ω −1 1    1 ω −ω  ,  , Γ = γ1 γ⊥ λ= , and Φ =  4 − λ1  , γ 1 =  1    −ω ω 1 −ω  −1 2 13 3 ω −ω −ω 1 1 where λ1 ∈ (1, 4), ω = (λ1 − 1)/3, and the columns of γ ⊥ 1 form an arbitrary orthonormal basis of 0 (γ ). The remaining seven eigen-structures are obtained by changing the signs of one or more of N 1 the last three elements of γ 1 .

117.2.3

Illustration 2c

0 If p = 4 and m = 2 2 , then Φ can be parameterized in terms of 2 parameters, namely λ1 and one parameter for Γ. In particular, the eigenvectors must satisfy  p   p  1/2 1/2 p 0 p 0   1/2 − a2  1/2 − a2   Q1 and Γ2 = P  p −a  Q2 , p a Γ1 = P  (150)  1/2 − a2    2 −a − 1/2 − a −a p p 0 1/2 0 − 1/2 √ √ where P ∈ P4 , Qi ∈ O2 for i = 1, 2, and a ∈ [−1/ 2, 1/ 2]. The correlation matrix is √ √   1 √ (λ1 − 1)a 2 (λ1 − 1) 1 − 2a2 0 √  (λ1 − 1)a 2 1 0 (λ1 − 1) 1 −√2a2   P0 , √ Φ = P (λ1 − 1) 1 − 2a2 0 1 −(λ1 − 1)a 2  √ √ 0 (λ1 − 1) 1 − 2a2 −(λ1 − 1)a 2 1 where λ1 ∈ (1, 2).

117.3

Illustrations of Theorem 12

In illustrations 3b–3c, the permuted vector of eigenvalues, namely Qλ is denoted by λ∗ . The first p1 eigenvalues in λ∗ are the eigenvalues of Φ11 and the next p2 eigenvalues in λ∗ are the eigenvalues of Φ22 , etc.

Details105

117.3.1

Illustration 3a

An example in which AΦ does not have full column-rank is easily 0 constructed. Suppose 0 that p = 9, bΦ = 2 (i.e., PΦ ΦPΦ0 = Φ11 ⊕ Φ22 ), p1 = 3, p2 = 6, m = 3 6 , λ11 = ρ1 ρ2 ρ2 , and 0 λ22 = ρ1 ρ1 ρ2 ρ2 ρ2 ρ2 . Then the dimension of the correlation constraint in (31) can be reduced from p − 1 = 8 to 7. Specifically, pi − 1 constraints are required to force the diagonals of Φii to be equal for i = 1, 2. These 7 constraints plus 10p λ = 9 imply that Φ ∈ Cp because 2tr(Φ11 ) = 2ρ1 + 4ρ2 = tr(Φ22 ) ⇒ tr(Φ11 ) = 3. This constraint degeneracy is verified by examining 0 the 2 × 2 matrix AΦ in (37), which has rank 1 because both columns are equal to 1/3 2/3 . Accordingly, rk(Wϕ ) = 8 − 2 + 1 = 7; one constraint in C0 diag(Φ) = 0 is degenerate; and Wϕ does not have full row-rank.

117.3.2

Illustration 3b

Suppose that p = 12, bΦ = 3, p1 = p2 = p3 = 4,   ρ1 ρ2    ρ3    ρ4    ρ4    ρ4   λ= ρ5  ,   ρ5    ρ5    ρ6    ρ6  ρ6

  ρ3 ρ4    ρ4     ρ4        1    ρ2  1     ρ5  1 ∗ def 0      m =   , and λ = Q λ =   .  ρ5  3  ρ5  3     3    ρ1    ρ6    ρ6  ρ6 

0 Then AΦ =  0 1/4

0 1/4 1/4 0 0 0

0 3/4 0 0 0 3/4 0  , 0 0 3/4

and AΦ has full column-rank. This is an illustration Theorem 12(a)(iv). The vector ρ needs to satisfy ρi > 0 for i = 1, . . . , d and m0 ρ = p. For the above example, 0 ρ = 5/2 2 3/2 5/6 2/3 1/2 satisfies these requirements.

117.3.3

Illustration 3c

Details106 Suppose that A is empty, p = 13, bΦ = 3, p1 = 6, p2 = 3, p3 = 4,   ρ1 ρ2    ρ2    ρ2    ρ3    ρ4     λ= ρ4  , ρ4    ρ4    ρ4    ρ4    ρ5  ρ6

  ρ2 ρ2    ρ4    ρ4    ρ4      1  ρ4    3       1   ∗ def 0    m =   , and λ = Q λ =  ρ2  . 6 ρ4      1  ρ4    1      ρ1    ρ3    ρ5  ρ6

Then, 

0 AΦ =  0 1/4

1/3 0 1/3 0 0 1/4

2/3 0 2/3 0 0 1/4

0 0 0  , 1/4

rk(AΦ ) = 2,

and Wϕ does not have full row-rank. For example, ρ = 5/2

117.3.4

rk(Wϕ ) = p − 1 − 3 + 2 = p − 2, 2

1

1/2

2/5

0 1/10 .

Illustration 3d

Suppose that A is empty, p = 12, bΦ = 3, p1 = 6, p2 = p3 = 3,   ρ1 ρ1    ρ2    ρ2    ρ3    ρ3   λ= ρ4  ,   ρ4    ρ5    ρ5    ρ6  ρ6 Then,

  ρ1 ρ2    ρ3    ρ4      ρ5  2    ρ6  2      2  ∗ def 0    .   m =   , and λ = Q λ =    ρ2  2 ρ4  2    ρ5  2        ρ1    ρ3  ρ6

 0 1/6 1/6 1/6 1/6 1/6 1/6 1/3 0 1/3 1/3 0  , rk(AΦ ) = 2, AΦ =  0 1/3 0 1/3 0 0 1/3 0 because AΦ 2 −1 −1 = 0, rk(Wϕ ) = p − 1 − 3 + 2 = p − 2, and Wϕ does not have full 0 row-rank. For example, ρ = 8/5 3/2 11/10 1 1/2 3/10 .

Details107

117.3.5

Illustration 3e

0 Suppose that p = 12, bΦ = 3, p1 = 6, p2 = p3 = 3, m = 2 2 2 2 2 2 , and the eigenvalues of Φjj for j = 1, 2, 3 are 0 0 0 λ11 = ρ1 ρ2 ρ3 ρ4 ρ5 ρ6 , λ22 = ρ2 ρ4 ρ5 , and λ33 = ρ1 ρ3 ρ6 . Recall that C0 diag(Φ) together with 10p λ = p constrains the diagonal elements of Φ to be ones. In the above example, this can be accomplished by p − 2 = 10 constraints in addition to 10p λ = p rather than by p − 1 = 11 additional constraints. One such set of 10 constraints consists of 9 = 5 + 2 + 2 constraints to force the diagonals of Φjj to be equal for each j plus tr(Φ22 ) = tr(Φ33 ). These constraints plus 10p λ = 12 imply that tr(Φ11 ) = 6 as well as tr(Φ22 ) = tr(Φ33 ) = 3 and this is sufficient to ensure that all diagonals of Φ are ones. The AΦ matrix for this example is  0 1/6 1/6 1/6 1/6 1/6 1/6 1/3 0 1/3 1/3 0  , AΦ =  0 1/3 0 1/3 0 0 1/3 and this matrix has rank 2 because AΦ ` = 0, where ` = 2 −1 rk(Wϕ ) = 11 − 3 + 2 = 10 and Wϕ does not have full row-rank.

117.4

0 −1 . Accordingly,

Illustrations of Theorem 14 and Corollaries 14.114.4

The following paragraphs illustrate Corollaries 14.1–14.4 by finding auxiliary constraints for various models of two correlation matrices. In all cases, multiple sets of auxiliary constraints can be constructed such that Model 1 1st = Model 2 is satisfied. In practice, one could choose among the candidate sets of auxiliary constraints the one that is most interpretable.

117.4.1

Illustration 3a Continued

In Illustration 3a (page 105) of this Supplement, PΦ ΦPΦ = Φ11 ⊕ Φ22 , where dim(Φ11 ) = 3 × 3, 0 0 λ11 = ρ1 ρ2 ρ2 , m = 3 6 , and each Φjj is irreducible. For transparency, assume that 0 PΦ = Ip . In each of the following models, m = 3 6 , A is empty, and ρ is unrestricted except √ 0 for ρ1 ∈ (1, 3) and 3ρ1 + 6ρ2 = 9. It follows from Theorem 5 that γ 11 = (1/ 3) 1 ±1 ±1 is the eigenvector of Φ11 that corresponds to ρ1 (see Illustration 2b, page 104) of this Supplement. √ 0 0 0 Without loss of generality, assume that γ 11 = 13 / 3. Accordingly, γ 1 def = 13 06×1 is an eigenvector of Φ with eigenvalue ρ1 .

117.4.1.1

Model 1: Empty Cγ For this model, dim(Wϕ ) = 8 × 19, the rank deficiency of Wϕ is z = 1, and the required number of auxiliary constraints is rk (V20 ∆01·2 ) = 8. Application of (41) reduces C from a 9 × 8 matrix to the 0 0 9 × 7 matrix C = I2 −12 ⊕ I5 −15 . The auxiliary constraints can be chosen in several ways including the following. (a) Application of Corollary 14.1 to f (ξϕ ; θϕ ) = vec(Γ˙ j Γ˙ j0 ) for j = 1 or j = 2 yields an 81 × 8 matrix Kj such that K0j vec(Γ˙ j Γ˙ j0 ) = 0 constrains Γ˙ j Γ˙ j0 to be {Ip , 2} reducible. It follows from Theorem 11 that this constraint also forces Φ to be {Ip , 2} reducible. (b) Application of Corollary 14.2 yields an 81 × 8 matrix, Kϕ , such that K0ϕ vec(Φ) = 0 constrains Φ to be {Ip , 2} reducible; i.e., Φ = Φ11 ⊕ Φ22 .

Details108 (c) Application of Corollary 14.3 with Foj = Ip for j = 1, 2 yields the auxiliary constraints n R(M1 ) ⊂ R(Γ˙ 1 ), R(M2 ) ⊂ R(Γ˙ 2 ) ⇔ K0γ vec(Γ) = 0, where " 0 #      0 E⊥ 13 M21 1 −1 0 0 1,m ⊗ M1  Kγ = , M2 = , and M21 = . 0 , M1 = 06×1 06×2 1 0 −1 E⊥ 2,m ⊗ M2 Application of (47) reduces the dimension of Cγ = Kγ from 81 × 12 to 81 × 8. (d) Application of Corollary 14.4 with Fj = Ip for j = 1, 2 yields the same constraints as in (c) ⊥ ˙ because R(Γ˙ 1 ) = R(ΓE⊥ 2,m ) and R(Γ2 ) = R(ΓE1,m ). Using any of (a)–(d), dim1 [hreg (ξϕ ; Γ)] = 7 + 8; dim(Wϕ ) = 15 × 19; and rk(Wϕ ) = 15.

117.4.1.2

Model 2: Non-empty Cγ

Consider the model in which p = 9, Γ˙ 1 is constrained by M1 ∈ R(Γ˙ 1 ); i.e., Cγ0 vec(Γ) = 0, where Cγ = E⊥ 1,m ⊗ M1 , M1 is defined in §117.4.1.1 on page 107 of this Supplement, and ˙ dim(Γ1 ) = 9 × 3. Then, dim(Wϕ ) = 14 × 19, rk(Wϕ ) = 14 − z, z = 3, and rk (V20 ∆01·2 ) = 4 auxiliary constraints are needed. Application of (41) reduces C from a 9 × 8 matrix to the 9 × 5 0 matrix C = 05×3 I5 −15 . Corollaries 14.1–14.4 yield auxiliary constraints of the form (a) K01 vec(Γ˙ 1 Γ˙ 10 ) = 0 to force Γ˙ 1 Γ˙ 10 to be {Ip , 2} reducible; (b) K0ϕ vec(Φ) = 0 to force Φ to be {Ip , 2} reducible; (c) R(M2 ) ⊂ R(Γ˙ 2 ); and (d) K0γ,1 Γ˙ 1 = 0; where Kγ,1 = M2 , and M2 is given in §117.4.1.1 on page 107 of this Supplement. Using any of (a)–(d), dim1 [hreg (ξϕ ; Γ)] = 5 + 6 + 4; dim(Wϕ ) = 15 × 19, and rk(Wϕ ) = 15.

117.4.2

Illustration 4

 Consider Φ ∈ C9 for which Γ = Γ1 Γ2 ,   1/3 0 0 0 0 1  1  0 1/2 0 0 0 √     0√ 0  0 1/ 2 Γ1 =   0  0  0 0 0 1/ 2 0√  0 0 0 0 0 0 1/ 6

1 1 1 1 −1 −1 0 0 0 0 0 0 0 0 0

1 1 1 0 0 0 0 0 0 1 −1 0 1 1 −2

0 1 1  0 0   1 −1  , 0 0 0 0

and Γ2 ∈ O9,4 ∩ N (Γ10 ). It follows from Theorem 4 that λ = 19 + Zu for some u, where  0 46 0 −4 −5 −5 −8 −8 −8 −8 Z= . 0 46 14 6 6 −18 −18 −18 −18 0 0 19 Setting u = 12 520 yields λ = 2.6808 102 1.3654 1.0365 102 0.0500 104 , m = 2 1 2 4 , and   1.000 0.950 −0.365 −0.365 0.292 0.292 0.292 0.292 0.292  0.950 1.000 −0.365 −0.365 0.292 0.292 0.292 0.292 0.292   −0.365 −0.365 1.000 0.950 0.292 0.292 0.292 0.292 0.292   −0.365 −0.365 0.950 1.000 0.292 0.292 0.292 0.292 0.292   0.292 0.292 0.292 1.000 −0.037 −0.037 0.292 0.292 Φ=  0.292 .  0.292  0.292 0.292 0.292 −0.037 1.000 −0.037 0.292 0.292    0.292 0.292 0.292 0.292 −0.037 −0.037 1.000 0.292 0.292    0.292 0.292 0.292 0.292 0.292 0.292 0.292 1.000 −0.365 0.292 0.292 0.292 0.292 0.292 0.292 0.292 −0.365 1.000 In each of the following models, ρ is unrestricted except for ρi > 0 and m0 ρ = 9.

Details109

117.4.2.1

Model 1: Empty Cγ , Empty 0

If m and A are set to m = 2 1 2 auxiliary constraints are required.

117.4.2.2

4

A

and A = ∅, then dim(Wϕ) = 8 × 31 and rk(Wϕ ) = 8. No

Model 2: Non-empty Cγ , Complete A 0 4 , A = {1, 3, 4}; or to m = 2

If m and A are set to m = 2 1 2 R(M1 ) = R(Γ1 ) is imposed, where  M1 =

1 1

1 1 1 1 −1 −1

1 0

1 0

1 0

1 0

0 7 , A = {1, 2}, and 0 1 , 0

then dim(Wϕ ) = 22 × 44 and it follows from Theorem 13 and Corollary 13.1 that Wϕ has full row-rank. Accordingly, no auxiliary constraints are necessary.

117.4.2.3

Model 3: Non-empty Cγ , Empty

A

0

If m is set to its true value m = 2 1 2 4 , A = ∅, and R(M1 ) = R(Γ1 ) ⇔ Cγ0 vec(Γ) = 0 is imposed, where M1 is defined in §117.4.2.2 and Cγ = E⊥ 1,m ⊗ M1 , then dim(Wϕ ) = 22 × 31, rk(Wϕ ) = 22 − z, z = 3, and rk (V20 ∆01·2 ) = 9 auxiliary constraints are needed. Applying (41) 0 reduces C to the 9 × 5 matrix C = 15 104 −4 I5 . The auxiliary constraints can be chosen in several ways. (a) Application of Corollary 14.3 with F4 = Ip yields the auxiliary constraint R(M4 ) ⊂ R(Γ˙ 4 ) ⇔ K0γ vec(Γ) = 0, where Kγ = E⊥ 4,m ⊗ M4 , and  3 0 0 0 3 0 M4 =  0 0 0 0 0 1

0 3 −2 −2 −2 0 0 3 −2 −2 −2 0 0  . 0 −2 −2 −2 3 3 −1 0 0 0 0 0

Appending Kγ to Cγ yields Cγ with dim(Cγ ) = 81 × 34 which is reduced to 81 × 23 by (47). (b) Application of Corollary 14.4 with Fj = Ip for j ∈ {1, 2, 3, 4} the yields constraint  K0γ vec(Γ) = 0, where Kγ = (E2,m ⊗ Kγ,2 ) (E3,m ⊗ Kγ,3 ) (E4,m ⊗ Kγ,4 ) , 

 Kγ,2 =

I4 05×4

 ,

The dimension of Cγ

Kγ,3

1 0  0  0  = 0 0  0  0 0

0 1 0 0 0 0 0 0 0

0 0 1 0 0 0 0 0 0

0 0 0 1 0 0 0 0 0

0 0 0 0 0 0 0 1 1

  0 0 0 0 0 0 0 0   2 2 2 0   2 2 2 0   3 0 0 1 , and K = γ,4   0 3 0 1   0 0 3 1   2 2 2 0 0 0 0 0

 0 1 0 1  0 −1  0 −1  0 0 . 0 0  0 0  1 0 −1 0

 Kγ can be reduced from 81 × 50 to 81 × 23 by applying (47).

Using either (a) or (b), dim1 [hreg (ξϕ ; Γ)] = 5 + 14 + 9; dim(Wϕ ) = 28 × 31; and rk(Wϕ ) = 28.

Details110

117.4.2.4

Model 4: Non-empty Cγ , Non-empty

A If m is set to the true value, A = {1, 3}, and R(M1 ) = R(Γ˙ 1 ) is imposed, where M1 is defined in §117.4.2.2, then, dim(Wϕ ) = 22 × 35, rk(Wϕ ) = 22 − z, z = 2, and rk (V20 ∆01·2 ) = 9 auxiliary constraints are needed. Applying (41) reduces C to the 9 × 6 matrix C = C1

 C2 , where C1 =



0 1 1 0 0 0 0 0 0 −2 and C2 = 04×4 0 0 1 1 0 0 0 0 −2

I4

0 −14 .

The auxiliary constraints can be chosen in several ways. (a) Application of Corollary 14.3 with F4 = Ip yields the auxiliary constraint R(M4 ) ⊂ R(Γ˙ 4 ) ⇔ K0γ vec(Γ) = 0, where Kγ = E⊥ 4,m ⊗ M4 , and 

5 M4 = 0 1

0 0 5 −2 −2 −2 5 0 5 −2 −2 −2 −1 0 0 0 0 0

−2 −2 0

0 −2 −2 . 0

Appending Kγ to Cγ yields Cγ with dim(Cγ ) = 81 × 29 which is reduced to 81 × 23 by (47). (b) Application of Corollary 14.4 with Fj = Ip for j ∈ {1, 2, 3, 4} the yields constraint  K0γ vec(Γ) = 0, where Kγ = (E2,m ⊗ Kγ,2 ) (E3,m ⊗ Kγ,3 ) (E4,m ⊗ Kγ,4 ) ,  Kγ,2 =

I4

05×4



 ,

Kγ,3 =

The dimension of Cγ

 I5 , and Kγ,4 = 0 14 e50 5

0

0

0

0

0

0

1

0 −1 .

 Kγ can be reduced from 81 × 32 to 81 × 23 by applying (47).

Using either (a) or (b), dim1 [hreg (ξϕ ; Γ)] = 6 + 14 + 9; dim(Wϕ ) = 29 × 35; and rk(Wϕ ) = 29.

118

Simulations

118.1

Estimating rk(Wϕ )

Recall that dim(Wϕ ) = a × b and rk(Wϕ ) = a − z, where z is the rank-deficiency and Wϕ is defined in (32). To estimate the value of z, a profile plot of r1 , . . . , ra−1 is useful, where rj is defined in (50). Recall that z is the largest index value, j, for which rj ≥ Op (n1/2 ). For each of Models 1–2 of Illustration 3a (page 105) of this Supplement and Models 1–4 of Illustration 4, 25 data sets with N = 100 were sampled from a MVN distribution. Figure 1 displays the profile plots of r1 , r2 , . . . , ra−1 . To simulate Φ√for Illustration 3a, PΦ was set to I9 , the eigenvector of Φ11 that corresponds to ρ1 was set to 13 / 3, and ρ1 was set to 2.9. Other details are described below. cϕ ) = 8 × 19 and (a) Illustration 3a, Model 1. Both algorithms in §6.1 converge because dim(W cϕ ) = 8. Nonetheless, rk(Wϕ ) = 7, so z = 1. For each data set, r1 is the largest value of rj , rk(W √ but r1 was as small as 8.15 in one case and r1 > n in 20 of 25 cases.

(b) Illustration 3a, Model 2. Both algorithms in §6.1 terminate before convergence because cϕ ) = 14 × 19 and rk(W cϕ ) = rk(Wϕ ) = 11. The minimum value of r3 across the 25 data sets dim(W is 160.8. For each data set, z is correctly estimated to have value 3. cϕ ) = 8 × 31 and (c) Illustration 4, Model 1. Both algorithms in §6.1 converge because dim(W c rk(Wϕ) = rk(Wϕ ) = 8. The maximum value of rj across the 25 data sets is 1.50. For each data set, z is correctly estimated to have value 0.

Details111

Figure 1: Estimating Rank Deficiency (a) Illustration 3 Model 1

(b) Illustration 3 Model 2

30

900 800

25 700 600 Ratio

Ratio

20

15

500 400 300

10

200 5

100 1

2

3

4 Index

5

6

7

2

4

6

8

10

12

Index

(c) Illustration 4 Model 1

(d) Illustration 4 Model 2 2.5

1.6 2

1.2

Ratio

Ratio

1.4

1.5

1 1

0.8

0.6 1

2

3

4 Index

5

6

0.5

7

(e) Illustration 4 Model 3

5

10 Index

15

20

(f) Illustration 4 Model 4

35

30

30

25

25

Ratio

Ratio

20 20

15

15 10 10 5

5

5

10 Index

15

20

5

10 Index

15

20

(d) Illustration 4, Model 2. Both algorithms in §6.1 converge because dim(Wϕ ) = 22 × 31 and c rk(Wϕ) = rk(Wϕ ) = 22. The maximum value of rj across the 25 data sets is 2.37. For each data set, z is correctly estimated to have value 0. cϕ ) = 22 × 31 and (e) Illustration 4, Model 3. Both algorithms in §6.1 converge because dim(W cϕ ) = 22. Nonetheless, z = 3 because rk(Wϕ ) = 19. For each data set, r3 is the largest value rk(W

Details112 of rj , but r3 was as small as 3.65 in one case and r3 >



n in only 7 of 25 cases.

cϕ ) = 22 × 35 and (f ) Illustration 4, Model 4. Both algorithms in §6.1 converge because dim(W cϕ ) = 22. Nonetheless, z = 2 because rk(Wϕ ) = 20. For each data set, r2 is the largest value rk(W √ of rj , but r2 was as small as 4.19 in one case and r3 > n in 22 of 25 cases.

118.2

Comparison of Proposed Methods with those of Schott (1997a)

,r Schott (1997a) proposed tests of HM R(M) ⊆ R(Γ1 ), where M is a full column-rank matrix with 0 dimension p × s, Γ1 ∈ Op,r , and s ≤ r. To evaluate control of test size, the simulation conditions reported by Schott (1997a), for p = 8, were used. In addition to the MVN parent distribution used by Schott, two non-normal parent distributions were used, namely χ22 and and a mixture of normals. Let z be a N(0, Ip ) random vector, w be a random p-vector whose elements are iid centered and scaled χ22 random variables, and B be a Bernoulli random variable with parameter 0.7. The three distributions√for y were (a) y ∼ Φ1/2 z, (b) y ∼ Φ1/2 w, and (c) y ∼ Φ1/2 z [B + 3(1 − B)] / 3.4. For all three distributions, y ∼ (0, Φ). Sample sizes of N ∈ {30, 50, 100} were used for the MVN simulations and N ∈ {100, 200, 500} were used for the χ22 and normal mixture simulations. √ The structure of Γ was the following: p = 8; 8Γ ∈ H8 ; all elements of γ 1 were positive; all odd elements of γ 2 were positive; and for r = 3 the first 4 elements of γ 3 were positive. The   γ1 γ2 ; r = s = 3 ⇒ M = γ1 γ2 γ3 ; structure of M was the following: r = s = 2 ⇒ M =   (r, s) = (2, 1) ⇒ M = γ 2 ; and (r, s) = (3, 2) ⇒ M = γ 2 γ 3 . The eigenvalue structures that were employed are listed in Table 114. For each simulation condition, 1000 data sets were generated and analyzed.

Table 114: List of Eigenvalue Structures for the Simulation Study λ1 2.5 2.5 0.5 0.5 0.5 0.5 0.5 0.5

λ2 4.0 3.4 0.1 0.1 0.1 0.1 0.1 0.1

λ3 4.0 1.0 0.5 0.5 0.5 0.5 0.5 0.5

λ4 3.0 2.0 2.0 0.2 0.2 0.2 0.2 0.2

λ5 3.0 2.5 1.5 0.2 0.2 0.2 0.2 0.2

λ6 3.0 3.0 1.0 0.2 0.2 0.2 0.2 0.2

To evaluate the proposed methods, goodness of fit tests of the model in (39) were performed. Note that eigenvalue multiplicities are not specified in this model, but λr > λr+1 is assumed. For a br > λ br+1 . For these data sets, HM,r small number of data sets, the fitted model did not satisfy λ 0 was rejected. The test statistics that were evaluated are listed in Table 115. The simulation results are reported in Table 116 (MVN), Table 117 (χ22 ), and Table 118 (mixture). The results reported by Schott (1997a) are given in parentheses in Table 116. The number of data sets for which br > λ br+1 was not satisfied for each condition is listed in the column labeled NF (number of fit λ b + and this, in turn failures). The asymptotically distribution free tests require computation of Ω 22,n ˙ 0Ω b ˙ ˙0b ˙ requires that D p 22,n Dp be nonsingular. If N < p(p + 1)/2, then Dp Ω22,n Dp is singular and the ADF test cannot be performed. For these cases, the ADF test results are reported as NA. Overall, the proposed methods control test size adequately and their control is superior to that of the tests proposed by Schott (1997a). Tests M1 and M2 are recommended, although they are conservative in some cases.

Details113

Table 115: List of Test Procedures S1 S2 S3 M1 M2 M3 M4 M5

Q1 in (15) with ADF estimate of Θ1 if s = r or Q2 in (16) with ADF estimate of Θ2 if s < r Q1 in (15) with MVN estimate of Θ1 if s = r or Q2 in (16) with MVN estimate of Θ2 if s < r t1 from Schott(1997a, Theorem 6) if s = r or t2 from Schott (1997a, Theorem 7) if s < r Model-based goodness of fit residual-based test; (35) in BPH; see Browne (1984) Model-based goodness of fit test; (34) in BPH Model-based goodness of fit residual-based test; (35) in BPH with MVN estimator of Ω22,∞ in (12); see Browne (1984) Model-based goodness of fit likelihood ratio test under MVN Model-based goodness of fit Bartlett-corrected likelihood ratio test under MVN

119

Proofs of Theorems in the Article

119.1

Proofs of Theorem 1, Lemma 1, and Corollary 1.1

119.1.1

Preliminary Lemmas

Some of the notation of Schott (1997a) is modified to be consistent with the remainder of this Supplement. Major differences and similarities between Schott’s notation and the notation in this Supplement are summarized in Table 119. Additional notation from Tables 2 and 3 also is used. Denote the distinct eigenvalues of the population correlation matrix, Φ, as ρ1,1 > ρ1,2 > · · · > ρ1,d1 > ρ2,1 > · · · > ρ2,d2 , where ρ1,i for i = 1, . . . , d1 are the eigenvalues that correspond to the eigenvectors in Γ1 and ρ2,j for j = 1, . . . , d2 are the eigenvalues that correspond to the eigenvectors in Γ2 . That is, ΦΓ˙ 1,i = ρ1,i Γ˙ 1,i for i = 1, . . . , d1 , and ΦΓ˙ 2,j = ρ2,j Γ˙ 2,j for j = 1, . . . , d2 , where Γ1 = Γ˙ 1,1

Γ˙ 1,2

···

Γ˙ 1,d1



and Γ2 = Γ˙ 2,1

Γ˙ 2,2

···

 Γ˙ 2,d2 .

It also is useful to express Γ˙ 1,i and Γ˙ 2,j using elementary matrices. Partition the multiplicity vector m as 0 m = m01 m02 , where dim(mi ) = di × 1 for i = 1, 2. (151) Then, Γ˙ 1,i = Γ1 Ei,m1 , and Γ˙ 2,j = Γ2 Ej,m2 , where Ei,m1 and Ej,m2 are elementary matrices defined in Table 2.   √ b1 Γ b 0 − Γ1 Γ0 is Schott’s expression for the asymptotic covariance matrix of n Γ 1 1

Θ1 = HΩr H, where H =

d1 X d2 X

(ρ2,j − ρ1,i )−1 (P1,i ⊗ P2,j + P2,j ⊗ P1,i ) ,

i=1 j=1 0 P1,i = Γ˙ 1,i Γ˙ 1,i ,

(152)

0 P2,j = Γ˙ 2,j Γ˙ 2,j , and Ωr = lim Var n→∞

√

 n(R − Φ) .

Details114

Table 116: Empirical Test Sizes: MVN Distribution N 30 50 100 30 50 100 30 50 100 30 50 100 30 50 100 30 50 100 30 50 100 30 50 100 30 50 100 30 50 100 30 50 100 30 50 100

r 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3

s 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2

Lemma 107.

λ 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6

ADF Tests S1 M1 M2 NA NA NA 0.322 0.019 0.037 0.161 0.045 0.048 NA NA NA 0.277 0.041 0.042 0.135 0.054 0.046 NA NA NA 0.342 0.014 0.026 0.160 0.047 0.031 NA NA NA 0.513 0.013 0.043 0.206 0.031 0.039 NA NA NA 0.499 0.022 0.048 0.253 0.038 0.039 NA NA NA 0.539 0.013 0.050 0.224 0.028 0.033 NA NA NA 0.157 0.049 0.043 0.092 0.053 0.050 NA NA NA 0.124 0.029 0.053 0.063 0.028 0.039 NA NA NA 0.152 0.004 0.025 0.096 0.053 0.034 NA NA NA 0.311 0.022 0.052 0.132 0.041 0.045 NA NA NA 0.294 0.020 0.044 0.159 0.035 0.051 NA NA NA 0.303 0.027 0.047 0.151 0.034 0.037

S2 0.097 0.069 0.047 0.054 0.044 0.039 0.122 0.077 0.066 0.215 0.118 0.062 0.199 0.116 0.080 0.229 0.145 0.067 0.062 0.052 0.050 0.036 0.037 0.032 0.101 0.053 0.059 0.131 0.087 0.048 0.124 0.068 0.068 0.120 0.083 0.057

MVN S3 0.097 (0.085) 0.069 (0.054) 0.047 (0.041) 0.054 (0.052) 0.044 (0.045) 0.039 (0.034) 0.122 (0.116) 0.077 (0.076) 0.066 (0.052) 0.215 (0.211) 0.118 (0.103) 0.062 (0.087) 0.199 (0.198) 0.116 (0.124) 0.080 (0.077) 0.229 (0.193) 0.145 (0.135) 0.067 (0.079) 0.112 (0.107) 0.073 (0.066) 0.067 (0.055) 0.042 (0.027) 0.037 (0.035) 0.032 (0.047) 0.564 (0.547) 0.343 (0.335) 0.192 (0.173) 0.174 (0.169) 0.111 (0.063) 0.063 (0.062) 0.171 (0.161) 0.100 (0.063) 0.075 (0.072) 0.205 (0.215) 0.132 (0.083) 0.085 (0.087)

Tests M3 0.056 0.045 0.052 0.047 0.049 0.057 0.044 0.052 0.045 0.041 0.042 0.034 0.043 0.044 0.045 0.053 0.055 0.048 0.052 0.045 0.048 0.055 0.047 0.040 0.060 0.048 0.050 0.056 0.047 0.042 0.040 0.043 0.051 0.045 0.055 0.040

M4 0.122 0.082 0.067 0.091 0.075 0.070 0.134 0.092 0.061 0.118 0.083 0.063 0.131 0.081 0.066 0.150 0.093 0.057 0.095 0.062 0.056 0.105 0.068 0.048 0.153 0.087 0.070 0.126 0.079 0.052 0.099 0.068 0.063 0.105 0.080 0.053

Dene H as in equation (152). Then (a) H2Np (Γ1 ⊗ Γ2 ) t = 2Np vec (Γ2 UΓ10 ) , d1 X d2 X where U = Ej,m2 E0j,m2 TEi,m1 E0i,m1 (ρ2,j − ρ1,i )−1 , and (b)

i=1 j=1

H2Np (Γ1 ⊗ Γ2 ) t = 0 =⇒ t = 0.

M5 0.055 0.046 0.049 0.045 0.053 0.052 0.062 0.065 0.043 0.068 0.047 0.044 0.065 0.055 0.050 0.080 0.058 0.047 0.056 0.047 0.048 0.062 0.051 0.040 0.068 0.054 0.052 0.077 0.057 0.039 0.048 0.046 0.052 0.054 0.059 0.039

NF 3 0 0 0 0 0 12 3 0 0 0 0 0 0 0 1 0 0 4 0 0 0 0 0 16 0 0 0 0 0 0 0 0 3 0 0

Details115

Table 117: Empirical Test Sizes: χ22 Distribution N 100 200 500 100 200 500 100 200 500 100 200 500 100 200 500 100 200 500 100 200 500 100 200 500 100 200 500 100 200 500 100 200 500 100 200 500

r 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3

s 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2

λ 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6

ADF Tests S1 M1 M2 0.292 0.031 0.039 0.172 0.047 0.045 0.121 0.047 0.045 0.282 0.045 0.061 0.158 0.050 0.060 0.087 0.038 0.034 0.292 0.019 0.016 0.179 0.043 0.020 0.117 0.043 0.033 0.379 0.024 0.042 0.228 0.042 0.047 0.110 0.042 0.042 0.421 0.029 0.034 0.218 0.039 0.035 0.119 0.045 0.040 0.426 0.023 0.031 0.208 0.030 0.039 0.088 0.037 0.033 0.126 0.049 0.046 0.097 0.058 0.056 0.076 0.060 0.059 0.092 0.040 0.042 0.080 0.050 0.053 0.048 0.040 0.035 0.162 0.024 0.016 0.090 0.035 0.020 0.080 0.038 0.033 0.227 0.034 0.036 0.125 0.043 0.036 0.075 0.048 0.041 0.220 0.042 0.033 0.132 0.042 0.041 0.073 0.037 0.032 0.233 0.032 0.028 0.145 0.044 0.044 0.081 0.052 0.047

S2 0.046 0.061 0.065 0.112 0.124 0.140 0.093 0.071 0.085 0.105 0.103 0.067 0.109 0.080 0.085 0.121 0.080 0.055 0.055 0.069 0.059 0.054 0.052 0.041 0.065 0.057 0.067 0.084 0.064 0.063 0.076 0.072 0.049 0.079 0.070 0.071

MVN S3 0.046 0.061 0.065 0.112 0.124 0.140 0.093 0.071 0.085 0.105 0.103 0.067 0.109 0.080 0.085 0.121 0.080 0.055 0.067 0.075 0.060 0.055 0.052 0.042 0.197 0.117 0.089 0.095 0.072 0.066 0.093 0.077 0.049 0.094 0.084 0.075

Tests M3 0.058 0.070 0.064 0.192 0.184 0.177 0.070 0.066 0.063 0.079 0.090 0.075 0.086 0.069 0.083 0.084 0.078 0.058 0.059 0.074 0.061 0.064 0.058 0.043 0.076 0.060 0.061 0.075 0.060 0.063 0.067 0.063 0.049 0.063 0.066 0.064

Proof. Verification of part (a). First note that H2Np = 2Np H. Accordingly, H2Np (Γ1 ⊗ Γ2 ) t = 2Np H (Γ1 ⊗ Γ2 ) t

= 2Np

d1 X d2 X i=1 j=1

(P1,i Γ1 ⊗ P2,j Γ2 ) t(ρ2,j − ρ1,i )−1

M4 0.078 0.079 0.067 0.215 0.201 0.181 0.097 0.080 0.063 0.104 0.104 0.076 0.112 0.081 0.086 0.110 0.088 0.064 0.069 0.080 0.060 0.072 0.062 0.045 0.108 0.082 0.064 0.084 0.064 0.065 0.082 0.074 0.049 0.076 0.069 0.065

NF 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Details116

Table 118: Empirical Test Sizes: Mixture Distribution N 100 200 500 100 200 500 100 200 500 100 200 500 100 200 500 100 200 500 100 200 500 100 200 500 100 200 500 100 200 500 100 200 500 100 200 500

r 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3

s 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2

λ 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6

ADF Tests S1 M1 M2 0.500 0.021 0.031 0.226 0.040 0.026 0.115 0.047 0.040 0.445 0.025 0.032 0.200 0.039 0.041 0.098 0.050 0.048 0.534 0.013 0.012 0.252 0.025 0.016 0.119 0.046 0.028 0.699 0.006 0.031 0.339 0.028 0.043 0.140 0.036 0.042 0.695 0.017 0.027 0.373 0.024 0.030 0.138 0.043 0.041 0.722 0.014 0.031 0.337 0.022 0.035 0.140 0.043 0.047 0.239 0.035 0.046 0.138 0.046 0.048 0.089 0.047 0.049 0.187 0.023 0.037 0.108 0.024 0.034 0.075 0.046 0.038 0.253 0.008 0.015 0.134 0.026 0.020 0.083 0.051 0.023 0.456 0.026 0.043 0.231 0.027 0.036 0.102 0.038 0.036 0.435 0.017 0.025 0.231 0.034 0.033 0.120 0.045 0.046 0.432 0.010 0.046 0.244 0.031 0.046 0.102 0.035 0.046

= 2Np (Γ1 ⊗ Γ2 )

d1 X d2 X

S2 0.607 0.602 0.645 0.538 0.575 0.616 0.601 0.579 0.605 0.706 0.691 0.668 0.698 0.678 0.711 0.722 0.692 0.685 0.455 0.455 0.422 0.419 0.422 0.442 0.438 0.423 0.443 0.618 0.604 0.598 0.597 0.599 0.612 0.581 0.607 0.590

MVN S3 0.607 0.602 0.645 0.538 0.575 0.616 0.601 0.579 0.605 0.706 0.691 0.668 0.698 0.678 0.711 0.722 0.692 0.685 0.488 0.473 0.429 0.423 0.426 0.443 0.750 0.574 0.518 0.634 0.616 0.601 0.630 0.613 0.618 0.614 0.626 0.599

Tests M3 0.656 0.658 0.663 0.648 0.661 0.670 0.877 0.886 0.854 0.707 0.704 0.683 0.698 0.708 0.727 0.728 0.706 0.706 0.467 0.462 0.423 0.445 0.440 0.448 0.882 0.885 0.894 0.582 0.593 0.591 0.571 0.581 0.603 0.555 0.590 0.585

M4 0.702 0.670 0.675 0.685 0.670 0.666 0.905 0.901 0.864 0.749 0.729 0.690 0.743 0.724 0.734 0.774 0.724 0.716 0.489 0.474 0.427 0.469 0.456 0.451 0.907 0.896 0.896 0.613 0.610 0.596 0.609 0.610 0.611 0.599 0.607 0.592

 Ei,m1 E0i,m1 ⊗ Ej,m2 E0j,m2 t(ρ2,j − ρ1,i )−1

i=1 j=1

  d1 X d2 X    = 2Np vec Γ2 Ej,m2 E0j,m2 TEi,m1 E0i,m1 Γ10 (ρ2,j − ρ1,i )−1 ,   i=1 j=1

NF 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0

Details117

Table 119: Comparison of Schott’s (1997a) Notation and Current Notation Quantity Population Correlation Matrix Sample Correlation Matrix √ Asymptotic Variance of n vec(R  − Φ)  √ b1 Γ b 0 − P0 Asymptotic Variance of n vec Γ 1   √ b1 Γ b 0 − P0 Asymptotic Variance of n vec P0 Γ 1

Schott (1997a) Ω R Ψ1 or Ψ2

This Supplement Φ R Ωr

Φ1

Θ1

Ξ

Θ2

where T = dvec(t, p − r, r)

= 2Np vec (Γ2 UΓ10 ) , where U =

d1 X d2 X

Ej,m2 E0j,m2 TEi,m1 E0i,m1 (ρ2,j − ρ1,i )−1 .

i=1 j=1

Verification of part (b). It follows from part (a) that H2Np (Γ1 ⊗ Γ2 ) t = 0 =⇒ 2Np vec (Γ2 UΓ10 ) = 0 =⇒ Γ2 UΓ10 + Γ1 U0 Γ20 = 0 =⇒ Γ20 (Γ2 UΓ10 + Γ1 U0 Γ20 ) Γ1 = 0 =⇒ U = 0 =⇒ E0j,m2 UEi,m1 = 0 for i = 1, . . . , d1 and j = 1, . . . , d2 =⇒ E0j,m2 TEi,m1 = 0 for i = 1, . . . , d1 and j = 1, . . . , d2 =⇒ T = 0 =⇒ t = vec(T) = 0.

119.1.2 Theorem 1 Theorem 1. [Schott, 1997a, Theorem 2] Let C be any p × (p − 1) matrix that satises C = 1⊥p . The rank of Θ1 in (15) is r(p − r) − τ , where τ is the number of linearly independent vectors (a) u for which Γ2 UΓ10 + Γ1 U0 Γ20 is a non-null diagonal matrix and U = dvec(u, p − r, r); (b) a for which Γ2 Γ20 Da Γ1 Γ10 + Γ1 Γ10 Da Γ2 Γ20 = Da , where Da = (Ca)dg and dim1 (a) = p − 1. Proof. Verification of Part (a). Recall that Γ is partitioned as Γ = Γ1 easily verified that

 Γ2 , where dim(Γ1 ) = p × r and λr > λr+1 . It is

 W0 H = 0 and, therefore W0 Θ1 = 0, where W = Γ1 ⊗ Γ1

Γ2 ⊗ Γ2

 2N⊥ p (Γ1 ⊗ Γ2 ) ,

Details118 and H is defined in (152). Furthermore, the columns of W are mutually orthogonal and the columns of 2Np (Γ1 ⊗ Γ2 ) are an orthogonal basis set for N (W0 ). Accordingly, 2 R (Θ1 ) ⊆ R [2Np (Γ1 ⊗ Γ2 )] and rk (Θ1 ) = p − dim [N (Θ1 )]

n o \ = p2 − dim [R(W)] − dim N (Θ1 ) R [2Np (Γ1 ⊗ Γ2 )] = p2 − dim [R(W)] − dim {N [Θ1 2Np (Γ1 ⊗ Γ2 )]} = r(p − r) − τ, where τ = dim {N [Θ1 2Np (Γ1 ⊗ Γ2 )]} . That is, τ is the number of linearly independent vectors, t, that satisfy Θ1 2Np (Γ1 ⊗ Γ2 )t = 0 or, equivalently Ωr H2Np (Γ1 ⊗ Γ2 )t = 0, because Ωr is positive semi-definite. Lemma 107b verified that H2Np (Γ1 ⊗ Γ2 )t = 0 =⇒ t = 0. Accordingly, Ωr H2Np (Γ1 ⊗ Γ2 )t = 0 ⇐⇒ H2Np (Γ1 ⊗ Γ2 )t ∈ N (Ωr ), √ where Ωr is the asymptotic covariance matrix of n vec(R − Φ). Various expressions for Ωr are available in the literature. In particular, in Boik (1998) √ ˙ 0 it follows from equations (18) and (19) ˙ p is defined in Table vec(S − Σ) is nonsingular, where D that if the asymptotic variance of n D p 101, then Ωr has rank p(p − 1)/2. The p(p + 1)/2 singularities in Ωr consist of p(p − 1)/2 singularities due to the symmetry of R and p singularities due to the unit values on the diagonals of R. Accordingly, the matrix Ωr has rank p(p − 1)/2 and  L21,p , (153) N (Ωr ) = R N⊥ p 0 where L21,p is defined in Table 2. Let B be any full column-rank matrix that satisfies N⊥ p = BB . Note that p(p − 1) 0 2 N⊥ and B0 B = Ip(p−1)/2 (154) p = BB =⇒ dim(B) = p × 2 because B has full column-rank and N⊥ p is idempotent. Furthermore,   0 N⊥ . (155) p L21,p = 0 =⇒ B L21,p = 0 and N (Ωr ) = R B L21,p

Accordingly τ is the number of linearly independent vectors, t, that satisfy H2Np (Γ1 ⊗ Γ2 )t = Bv1 + L21,p v2 for some v1 ,

v2 .

Lemma 107a verified that H2Np (Γ1 ⊗ Γ2 )t = 2Np (Γ1 ⊗ Γ2 ) vec(U), where

U=

d1 X d2 X

Ej,m2 E0j,m2 TEi,m1 E0i,m1 (ρ2,j − ρ1,i )−1 , and T = dvec(t, p − r, r).

i=1 j=1

Equating H2Np (Γ1 ⊗ Γ2 )t to Bv1 + L21,p v2 reveals that v1 = 0 because H2Np (Γ1 ⊗ Γ2 )t = 2Np (Γ1 ⊗ Γ2 ) vec(U) = Bv1 + L21,p v2 =⇒ B0 2Np (Γ1 ⊗ Γ2 ) vec(U) = v1

Details119 and B0 2Np = 0. Also, U is a one-to-one function of T because U=

d1 X d2 X

Ej,m2 E0j,m2 TEi,m1 E0i,m1 (ρ2,j − ρ1,i )−1

i=1 j=1

⇐⇒ T =

d1 X d2 X

Ej,m2 E0j,m2 UEi,m1 E0i,m1 (ρ2,j − ρ1,i ) = Λ2 U − UΛ1 ,

i=1 j=1

L where Λ has been partitioned as Λ = Λ1 Λ2 . Accordingly, one can work directly with U rather than with t. Furthermore, the scalar (ρ2,j − ρ1,i )−1 can be absorbed into E02,j TEi,m1 to remove the dependence of U on the eigenvalues. It follows that τ is the number of linearly independent vectors u = vec(U) such that h i 2Np (Γ1 ⊗ Γ2 ) vec(U) = vec (v2 )dg h i because vec (v2 )dg = L21,p v2 . Using properties of Kp,p , the value of τ also can be written as the number of linearly independent vectors u = vec(U) for which Γ2 UΓ10 + Γ1 U0 Γ20 is a diagonal matrix. For later reference, the value of τ also is the number of linearly independent vectors, t for which H2Np (Γ1 ⊗ Γ2 )t = vec(D), where D is a non-null diagonal matrix, H2Np (Γ2 ⊗ Γ1 )t = vec(D), where D is a non-null diagonal matrix, (156) 2Np H(Γ1 ⊗ Γ2 )t = vec(D), where D is a non-null diagonal matrix, and 2Np H(Γ2 ⊗ Γ1 )t = vec(D), where D is a non-null diagonal matrix, because H2Np = 2Np H and 2Np (Γ2 ⊗ Γ1 )t = 2Np (Γ1 ⊗ Γ2 )t∗ , where t∗ = Kr,p−r t. Verification of Part (b). First, suppose that Γ2 UΓ10 + Γ1 U0 Γ20 is a non-zero diagonal matrix. Then, the trace of the diagonal matrix is zero because tr (Γ2 UΓ10 + Γ1 U0 Γ20 ) = tr (Γ10 Γ2 U) + tr (Γ20 Γ1 U0 ) and Γ10 Γ2 = 0. Accordingly the diagonal matrix can be written as Da = (Ca)dg for some (p − 1)-vector a. Also, Γ2 UΓ10 + Γ1 U0 Γ20 = Da =⇒ U = Γ20 Da Γ1 =⇒ Γ2 Γ20 Da Γ1 Γ10 + Γ1 Γ10 Da Γ2 Γ20 = Da and Γ2 UΓ10 + Γ1 U0 Γ20 = Da =⇒ a = 2 (C0 C)

−1

C0 L021,p (Γ1 ⊗ Γ2 ) vec(U).

Second, suppose that Γ2 Γ20 Da Γ1 Γ10 + Γ1 Γ10 Da Γ2 Γ20 = Da , where Da = (Ca)dg for some (p − 1)-vector a. Then Γ2 UΓ10 + Γ1 U0 Γ20 is a non-zero diagonal matrix, where U = Γ20 Da Γ1 and 0 vec(U) = (Γ1 ⊗ Γ2 ) L21,p Ca.

Details120 Itfollows that there is a one-to-one correspondence between U and a. If the columns of  Xu = vec(U1 ) vec(U2 ) · · · vec(Uτ ) form a basis set of vectors such that Γ2 UΓ10 + Γ1 U0 Γ20 is −1 a non-zero diagonal matrix, then the columns of Xa = 2 (C0 C) C0 L021,p (Γ1 ⊗ Γ2 ) Xu are linearly 0 independent because Xu= (Γ1 ⊗ Γ2 ) L21,p Xa . Similarly, if the columns of Xa = a1 a2 · · · aτ form a basis set of vectors such that Γ2 Γ20 Da Γ1 Γ10 + Γ1 Γ10 Da Γ2 Γ20 = Da , 0 then the columns of Xu = (Γ1 ⊗ Γ2 ) L21,p Xa are linearly independent because −1 Xa = 2 (C0 C) C0 L021,p (Γ1 ⊗ Γ2 ) Xu .

119.1.3

Lemma 1

√ Denote the asymptotic variance of n(b γ R,1 − γ 1 ) by Ωγ 1 . It follows from Schott (1991a, p. 748, above Theorem 1) and Kollo and Neudecker (1993, Theorem 8) that   Ωγ 1 = γ 01 ⊗ ΓD+ Γ0 V γ 1 ⊗ ΓD+ Γ0 , where D = (λ − 1p λ1 )dg , and V = lim N Var [vec(R − Φ)] . N →∞

Note that Ωγ 1 γ 1 = 0 because

Lemma 1.

D+ ep1

= 0.

If r = s = 1 and H0 is true, then rk(Ωγ 1 ) = rk(Θ1 ).

b1 = γ b 1) = γ b R,1 γ b 0R,1 , vec(P b R,1 ⊗ γ b R,1 , and Proof. If r = s = 1, then P √

b 1 − P0 ) = n vec(P



    √ b R,1 ⊗ γ b R,1 − γ 1 ⊗ γ 1 = 2Np (Ip ⊗ γ 1 ) n γ b R,1 − γ 1 + Op n−1/2 . n γ

It follows that  Θ1 = 2Np Ωγ 1 ⊗ γ 1 γ 01 2Np . Accordingly,    rk(Θ1 ) ≤ rk Ωγ 1 ⊗ γ 1 γ 01 = rk Ωγ 1 × rk (γ 1 γ 01 ) = rk Ωγ 1 . Denote the eigenvectors of Ωγ 1 that correspond to to the non-zero eigenvalues of Ωγ 1 by ω 1 , ω 2 , . . ., ω t and denote the non-zero eigenvalues by δ1 , δ2 , . . ., δt , where t = rk(Ωγ 1 ). Then, Np (ω j ⊗ γ 1 ) is an eigenvector of Θ1 that corresponds to a non-zero eigenvalue because  Θ1 Np (ω j ⊗ γ 1 ) = 2Np Ωγ 1 ⊗ γ 1 γ 01 2Np Np (ω j ⊗ γ 1 ) =  = 2Np Ωγ 1 ⊗ γ 1 γ 01 [(ω j ⊗ γ 1 ) + (γ 1 ⊗ ω j )] because 2Np Np = 2Np = 2Np (ω j δj ⊗ γ 1 ) because Ωγ 1 γ 1 = 0,

Ωγ 1 ω j = ω j δj , and γ 01 γ 1 = 1

= Np (ω j ⊗ γ 1 ) 2δj =⇒ rk(Θ1 ) ≥ rk(Ωγ 1 ) =⇒ rk(Θ1 ) = rk(Ωγ 1 ) because rk(Θ1 ) ≤ rk(Ωγ 1 ).

Details121

119.1.4  Corollary 1.1

Corollary 1.1 (Schott, 1991a, Theorem 1). If r = 1, then the rank of Θ1 in (15) is p − 1 − τ, where

    τ = 1 if γ1 contains only two non-zero elements, each with norm √(1/2), and
    τ = 0 otherwise.

Proof. Schott (1991a) provided a proof that rk(Ωγ1) = p − 1 − τ, where τ is given in the statement of the Corollary. Here, a proof that rk(Θ1) = p − 1 − τ is given. Under H0, ppo(M) = γ1γ1' is satisfied. Let z1 be any vector in Op,1 that satisfies ppo(M) = z1z1'. It follows from Theorem 1 that τ is the number of linearly independent vectors a that satisfy

    Da = z1z1' Da + Da z1z1', where Da = (Ca)dg,                          (157)

because z1' Da z1 = 0 and Γ2Γ2' Da z1z1' = (Ip − z1z1') Da z1z1' = Da z1z1'. Accordingly,

    vec(Da) = vec(z1z1' Da) + vec(Da z1z1')
    ⟹ L21,p Ca = (Ip ⊗ z1z1') L21,p Ca + (z1z1' ⊗ Ip) L21,p Ca
    ⟹ Ca = 2 (z1 ⊙ z1)dg Ca,

because L21,p' L21,p = Ip, L21,p' (Ip ⊗ z1z1') L21,p = Ip ⊙ z1z1' = (z1 ⊙ z1)dg, and L21,p' (z1z1' ⊗ Ip) L21,p = z1z1' ⊙ Ip = (z1 ⊙ z1)dg. Hence,

    h = 2 (z1 ⊙ z1)dg h, where h = Ca is a p-vector that sums to zero
    ⟹ [Ip − 2 (z1 ⊙ z1)dg] h = 0
    ⟹ (1 − 2 z1,i²) hi = 0 for i = 1, . . . , p
    ⟹ hi = 0 if z1,i² ≠ 1/2.
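Before turning to the case analysis, the three L21,p identities invoked above can be confirmed directly. A small sketch assuming NumPy (the dimension p = 4 and the random z are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(2)
p = 4

# L21: the p^2 x p matrix with columns e_i (x) e_i
L21 = np.zeros((p * p, p))
for i in range(p):
    L21[i + i * p, i] = 1.0

z = rng.standard_normal(p)
z /= np.linalg.norm(z)
zz = np.outer(z, z)

assert np.allclose(L21.T @ L21, np.eye(p))
# L21' (I x zz') L21 = I Hadamard zz' = (z Hadamard z)_dg, and likewise with (zz' x I)
assert np.allclose(L21.T @ np.kron(np.eye(p), zz) @ L21, np.diag(z * z))
assert np.allclose(L21.T @ np.kron(zz, np.eye(p)) @ L21, np.diag(z * z))
```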

To complete the proof, four cases are examined.

Case 1. Suppose that no elements of z1 satisfy z1,i² = 1/2. Then h = Ca = 0 ⟹ a = 0 ⟹ τ = 0.

Case 2. Suppose that exactly one element of z1 satisfies z1,i² = 1/2. Then a satisfies (157) only if h = Ca has at most one non-zero element. If h has no non-zero elements, then a = 0. The p-vector h cannot have exactly one non-zero element because h must satisfy 1p'h = 0. Accordingly, τ = 0.


Case 3.

2 = 1/2. Then at most two elements of Suppose that exactly two elements of z1 satisfy z1,i h are non-zero. If no element of h is non-zero, then a = 0. Also, h cannot have exactly one non-zero element because 10p h = 0. If h has exactly two non-zero elements, then a non-zero a does exist such that h = Ca. Denote by h∗ , the p-vector that contains non-zero elements in 2 the two locations for which z1,i = 1/2. The two non-zero elements in h∗ must be proportional −1 to 1 and −1 because 10p h = 0. Accordingly, a must satisfy a ∝ (C0 C) C0 h∗ and the value of τ is at most one. To verify that τ = 1, denote the index values of the non-zero elements in h∗ by f and g. Then, h∗ = (epf − epg )b for some non-zero b, z1 = epf z1,f + epg z1,g and

Da z1 z01 = [ppo(C)h∗ ]dg z1 z01 = (h∗ )dg z1 z01 = (h∗ z1 ) z01 =b

h  i epf − epg z1 z01

  0 = b epf z1,f − epg z1,g epf z1,f + epg z1,g   p p0 2 p p0 p p0 2 = b epf ep0 f z1,f + ef eg z1,f z1,g − eg ef z1,f z1,g − eg eg z1,g =

 b  p p0 p p0 p p0 ef ef ± epf ep0 g ∓ eg ef − eg eg 2

  p p0 =⇒ Da z1 z01 + z1 z01 Da = b epf ep0 − e e = hdg = Da . g g f
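Case 3 can also be checked numerically; the sketch below (assuming NumPy; p, f, g, and b are arbitrary illustrative choices) confirms that the constructed Da satisfies (157):

```python
import numpy as np

p, f, g = 5, 1, 3
z1 = np.zeros(p)
z1[f], z1[g] = 1 / np.sqrt(2), -1 / np.sqrt(2)   # the two norm-sqrt(1/2) entries

b = 2.7                                          # any non-zero scalar
h_star = b * (np.eye(p)[:, f] - np.eye(p)[:, g]) # h* = (e_f - e_g) b
Da = np.diag(h_star)                             # Da = (h*)_dg, trace zero

zz = np.outer(z1, z1)
# equation (157): Da = z1 z1' Da + Da z1 z1'
assert np.allclose(Da, zz @ Da + Da @ zz)
```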

Case 4. Three or more elements of z1 cannot satisfy z1,i² = 1/2 because z1'z1 = 1.

119.2  Proof of Theorem 2

Schott (1997) gave the following method to compute τ for any s = r ≥ 1. First, find P ∈ Pp such that P P0 P' = V1 ⊕ V2 ⊕ · · · ⊕ Vb, where dim(Vi) = pi × pi, b is at its maximal value, and P0 is defined in (15). Then τ is the number of blocks for which Vi = Δi (Ipi − Vi) Δi for some Δi, where Δi = (δi)dg, 1pi' δi = 0, and δi ⊙ δi = 1pi. Theorem 2 gives an alternative method for computing rk(Θ1 | H0). Theorem 2, below, contains results that are not reported in Theorem 2 in the article.
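For small blocks, Schott's per-block condition can be checked by brute force over the sign vectors δi, since δi ⊙ δi = 1pi forces entries ±1 and 1pi' δi = 0 forces them to sum to zero. The helper below is a hypothetical illustration only (assuming NumPy; block_satisfies is not from Schott (1997), and enumerating sign patterns is feasible only for small pi):

```python
import numpy as np
from itertools import product

def block_satisfies(V, tol=1e-10):
    """Per-block test: does V = Delta (I - V) Delta hold for some
    Delta = (delta)_dg with delta in {-1, +1}^k and 1' delta = 0?"""
    k = V.shape[0]
    for signs in product([-1.0, 1.0], repeat=k):
        if sum(signs) != 0.0:
            continue
        D = np.diag(signs)
        if np.allclose(V, D @ (np.eye(k) - V) @ D, atol=tol):
            return True
    return False

# the 2 x 2 block V = z z' with z = (1, -1)/sqrt(2) counts toward tau ...
z = np.array([1.0, -1.0]) / np.sqrt(2)
print(block_satisfies(np.outer(z, z)))          # True
# ... while a diagonal block without 1/2 entries on the diagonal does not
print(block_satisfies(np.diag([0.3, 0.7])))     # False
```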

Theorem 2. Define Θ1 as in (15), C as in Theorem 1(b), P0 = ppo(M), and

    b = dim[N(TM)] = p − 1 − rk(TM), where TM = Ip−1 − 2 (C'C)⁻¹ C' (P0)dg C.

If b > 0, let B be a (p − 1) × b matrix whose columns form a basis set for N(TM). If H0: R(M) = R(Γ1) is true, then rk(Θ1) = r(p − r) − τ, where

    (a) τ = r(p − r) − rk{Ḋp' (Ip² − L22,p)(M ⊗ M⊥)}, where Ḋp is the duplication matrix,
    (b) τ = r(p − r) + p − 1 − rk{Ḋp' [M ⊗ M⊥, L21,p C]},
    (c) τ = p − 1 − rk{Ḋp' (2 [P0 ⊗ (Ip − P0)] − Ip²) L21,p C},
    (d) τ = 0 if TM is nonsingular,
    (e) τ = r(p − r) + b − rk{Ḋp' [M ⊗ M⊥, L21,p C B]}, and
    (f) τ = b − rk{Ḋp' (2 [P0 ⊗ (Ip − P0)] − Ip²) L21,p C B}.

If B is empty, then (e) and (f) reduce to (d).

Proof. Verification of part (a). Begin with the result that rk(Θ1) = r(p − r) − τ, where τ is the number of linearly independent vectors u = vec(U) for which Γ2 U Γ1' + Γ1 U' Γ2' is a non-null diagonal matrix. Note that Γ2 U Γ1' + Γ1 U' Γ2' = 0 ⟺ U = 0 because

    Γ2 U Γ1' + Γ1 U' Γ2' = 0 ⟹ Γ2' (Γ2 U Γ1' + Γ1 U' Γ2') Γ1 = 0 ⟹ U = 0.

Accordingly, U satisfies the following relations:

    Γ2 U Γ1' + Γ1 U' Γ2' = (Γ2 U Γ1' + Γ1 U' Γ2')dg
    ⟺ Σ_{i=1}^p epi epi' (Γ2 U Γ1' + Γ1 U' Γ2') epi epi' = Γ2 U Γ1' + Γ1 U' Γ2'
    ⟺ L22,p vec(Γ2 U Γ1' + Γ1 U' Γ2') = vec(Γ2 U Γ1' + Γ1 U' Γ2')
    ⟺ L22,p 2Np vec(Γ2 U Γ1') = 2Np vec(Γ2 U Γ1')
    ⟺ L22,p vec(Γ2 U Γ1') = Np vec(Γ2 U Γ1')   because L22,p Np = L22,p
    ⟺ (Np − L22,p)(Γ1 ⊗ Γ2) vec(U) = 0.

It follows that

    τ = dim{N[(Np − L22,p)(Γ1 ⊗ Γ2)]}
      = r(p − r) − rk[(Np − L22,p)(Γ1 ⊗ Γ2)]
      = r(p − r) − rk[(Np − L22,p)(Γ1Γ1' ⊗ Γ2Γ2')]   because (Γ1' ⊗ Γ2') has full row-rank
      = r(p − r) − rk{(Np − L22,p)[P0 ⊗ (Ip − P0)]}

because P0 = ppo(M) = Γ1Γ1' under H0. Also,

    {R(Γ1) = R(M) and dim(Γ1) = dim(M)}
    ⟹ rk[(Np − L22,p)(Γ1 ⊗ Γ2)] = rk[(Np − L22,p)(M ⊗ M⊥)]
    ⟹ τ = r(p − r) − rk[(Np − L22,p)(M ⊗ M⊥)]
    ⟹ τ = r(p − r) − rk{Np (Ip² − L22,p)(M ⊗ M⊥)}   because Np L21,p = L21,p
    ⟹ τ = r(p − r) − rk{Ḋp' (Ip² − L22,p)(M ⊗ M⊥)}

because Np = Ḋp (Ḋp'Ḋp)⁻¹ Ḋp' and Ḋp (Ḋp'Ḋp)⁻¹ has full column-rank.

Verification of part (b). First note that tr(Γ2 U Γ1' + Γ1 U' Γ2') = tr(Γ1'Γ2 U) + tr(Γ2'Γ1 U') = 0 for any U. Also, if D is a diagonal matrix with trace zero, then D = (Cb)dg for some (p − 1)-vector b.
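As a final check on part (a), its rank formula can be compared with the direct definition of τ. The sketch below (assuming NumPy) uses the τ = 1 configuration from Corollary 1.1, with Kp,p, Ḋp, and L22,p built from their definitions:

```python
import numpy as np

def K(m, n):
    """Commutation matrix: K(m, n) @ vec(T) = vec(T.T) for an m x n matrix T."""
    Kmn = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            Kmn[j + i * n, i + j * m] = 1.0
    return Kmn

def dup(p):
    """Duplication matrix D_p: dup(p) @ vech(A) = vec(A) for symmetric A."""
    D = np.zeros((p * p, p * (p + 1) // 2))
    c = 0
    for j in range(p):
        for i in range(j, p):
            D[i + j * p, c] = D[j + i * p, c] = 1.0
            c += 1
    return D

p, r = 4, 1
# z has exactly two non-zero entries of norm sqrt(1/2), so tau = 1 (Corollary 1.1)
z = np.array([1.0, -1.0, 0.0, 0.0]) / np.sqrt(2)
Q, _ = np.linalg.qr(z.reshape(-1, 1), mode="complete")
G1, G2 = Q[:, :r], Q[:, r:]                     # Gamma1 spans R(z)

Np = 0.5 * (np.eye(p * p) + K(p, p))
L22 = np.zeros((p * p, p * p))
for i in range(p):
    L22[i + i * p, i + i * p] = 1.0             # L22,p = L21,p L21,p'

rank = np.linalg.matrix_rank
# direct count: tau = dim N[(Np - L22,p)(Gamma1 x Gamma2)]
tau_direct = r * (p - r) - rank((Np - L22) @ np.kron(G1, G2))

# Theorem 2(a): tau = r(p - r) - rk{Dp' (I - L22,p)(M x M_perp)}
M, Mperp = 2.5 * G1, G2                         # any bases for R(Gamma1) and its orthocomplement
tau_a = r * (p - r) - rank(dup(p).T @ (np.eye(p * p) - L22) @ np.kron(M, Mperp))

print(tau_direct, tau_a)                        # 1 1
```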
