TEPPER-CART-06_103
An Improved Approximation to the Distributions in GMM Estimation

Fallaw Sowell∗

This draft: February 2007 (first draft: June 2006)

Abstract

The empirical saddlepoint distribution provides an approximation to the sampling distributions for the GMM parameter estimates and the statistics that test the overidentifying restrictions. The empirical saddlepoint distribution permits asymmetry, non-normal tails, and multiple modes. If the identification assumptions are satisfied, the empirical saddlepoint distribution converges to the familiar asymptotic normal distribution as the sample size grows. Formulas are given to transform the moment conditions used in GMM estimation into the estimation equations needed for the saddlepoint approximation. Unlike the absolute errors associated with the asymptotic normal distribution and the bootstrap, the empirical saddlepoint approximation has a relative error. This provides a more accurate approximation to the sampling distribution, particularly in the tails. The calculation of the empirical saddlepoint approximation is computer intensive. The calculations are comparable to the bootstrap and require repeatedly solving nonlinear equations. The structure of the saddlepoint equation permits analytical first and second derivatives.

KEYWORDS: Generalized method of moments estimator, test of overidentifying restrictions, empirical saddlepoint approximation, sampling distribution, asymptotic distribution.

∗ Tepper School of Business, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, PA 15213. Phone: (412)-268-3769. [email protected] I wish to thank colleagues and graduate students at Carnegie Mellon University for helpful comments and discussions. Particularly helpful suggestions were provided by Steven Lugauer and Roni Israelov.
© 2007 Fallaw Sowell
Printed: February 21, 2007
1 Introduction
The empirical saddlepoint density provides an accurate approximation to the sampling distribution for parameters estimated with the generalized method of moments and the statistics that test overidentifying restrictions. Traditional GMM (Hansen (1982)) uses a central limit theorem as the foundation for the asymptotic distributions to approximate the sampling distributions. The saddlepoint distribution is a generalization of the central limit theorem and provides a more accurate approximation.

The saddlepoint approximation has a long history in the Statistics literature, starting with Esscher (1932) and Daniels (1954). General introductions to saddlepoint distributions are available in the articles Reid (1988), Goutis and Casella (1999) and Huzurbazar (1999) and the books Field and Ronchetti (1990), Jensen (1995) and Kolassa (1997). The saddlepoint distribution was first used to provide a more accurate approximation to the sampling distribution of the sample mean when the underlying distribution was known, e.g. Phillips (1978). It was then generalized to maximum likelihood estimates. Field (1982) gives the saddlepoint approximation to the density for parameter estimates defined as the solution to a system of equations where the underlying distribution is known. Ronchetti and Welsh (1994) presented the empirical saddlepoint distribution where the known distribution was replaced with the empirical distribution function. In Economic applications the underlying distribution is typically unknown so the empirical saddlepoint distribution is appropriate. If there exists a unique local minimum, this empirical saddlepoint approximation is an alternative approximation to the GMM sampling density. The Field (1982) and Ronchetti and Welsh (1994) papers are the natural bridges to the results in this paper.

The saddlepoint distribution uses a just identified system of estimation equations. However, most empirical Economic models use an overidentified system of moment conditions to allow the Economic theory to be tested. This paper shows how to use the overidentified system of moment conditions to create a just identified system of estimation equations. This just identified system uses an extended, m-dimensional parameter space created by augmenting the original k-dimensional parameter space. The additional parameters span the space of overidentifying restrictions and their estimates can be used to test the validity of the overidentifying restrictions.

The first contribution of this paper is a new procedure to use Economic moment conditions (m moment conditions to estimate k parameters) to create a set of m estimation equations in m unknowns. The m equations permit the calculation of the empirical saddlepoint distribution. The second contribution is the extension of the saddlepoint approximation to include statistics to test the validity of the overidentifying restrictions. Previous work has only focused on the parameter estimates and testing hypotheses concerning parameters. The third contribution is a saddlepoint approximation that allows for models and samples where the GMM objective function has multiple local minima. Multiple local minima are common for GMM objective functions. Previous saddlepoint distribution research has always restricted attention to a unique local minimum.
The final contribution is the interpretation of the saddlepoint density as the joint density for the parameter estimates and the statistics that test the validity of the overidentifying restrictions. In finite samples these will typically not be independent. The saddlepoint distribution gives a well defined density that accounts for this dependence.

The closest paper to the current research is Ronchetti and Trojani (2003), denoted RT. RT is an important paper that gives a procedure to go from the GMM moment conditions (m equations and k parameters) to estimation equations used for a saddlepoint approximation. RT also extends the parameter testing results from Robinson, Ronchetti and Young (2003) to situations where the moment conditions are overidentified. This paper takes a different approach to the saddlepoint approximation that results in a system with only $m$ equations, while the procedure presented in RT uses $\frac{1}{2}(m + 2k)(m + 1)$ equations. In this paper, all the parameters are tilted instead of only a subset. The resulting saddlepoint density approximates the joint density of the parameter estimates and the statistics that test the validity of the overidentifying restrictions.

The saddlepoint approximation results in two important differences over the commonly used asymptotic normal approximation. First, the parameters and the statistics that test the overidentifying restrictions are no longer forced to be independent. The asymptotic normal approximation uses a linear approximation that can be orthogonally decomposed into the identifying space and the overidentifying space, see Sowell (1996). The asymptotic normal approximation inherits the orthogonality from the tangent space and hence $\hat\theta$ and the J statistic are independent. In contrast, the saddlepoint uses different approximations at each parameter value. Each approximation gives a tangent space that is orthogonally decomposed. However, the union of these approximations is not required to give an orthogonal decomposition over the entire space. This is a common feature in approximations with higher orders of accuracy. In Newey and Smith (2004) and Rilstone, Srivastava and Ullah (1996), the higher order bias for the parameter estimates involves the estimated values for the statistics that test the overidentifying restrictions and vice versa. The lack of independence means the marginal densities are no longer equal to the conditional densities. It will now be informative to report the p-value for the statistics that test the overidentifying restrictions for both the marginal density and the density conditional on a parameter value. It is also informative to report the sampling distribution for the parameters conditional on the overidentifying restrictions being satisfied.

The second important difference is that the sampling density is no longer required to be unimodal. If the GMM objective function has a unique local minimum, then the saddlepoint approximation will be unimodal. However, if the GMM objective function has multiple local minima, then the saddlepoint approximation can be multimodal. When the GMM objective function has more than one local minimum there are two possible interpretations: (1) The model is identified. However, there is not enough data to determine the region where, in the limit, the unique global minimum will occur.
(2) The model is not identified. In both situations, the normal approximation using the location and convexity of the global minimum of the observed objective function will provide misleading inference concerning the parameter values associated with the other local minima. However, the saddlepoint approximation is the density for the location of the minima of the extended-GMM objective function associated with the local minima of the original GMM objective function. Hence inference is well defined for both situations. If the data have not spoken loudly enough, the saddlepoint density can summarize the available information. Again, these issues are not new to Econometrics, even from a classical statistics perspective. The S-sets of Stock and Wright (2000) can also lead to disjoint confidence intervals.

One of the attributes of the saddlepoint approximation is that it has a relative error versus the absolute error that occurs with the central limit theorem approximation. When the underlying distribution is unknown and the empirical distribution is used, the empirical saddlepoint approximation results in a relative error of order¹ $N^{-1/2}$. The relative error results in an improvement over the absolute error, particularly in the tails of the distribution. This is important in the calculation of p-values for parameter estimates and tests of the overidentifying restrictions. If $f(\alpha)$ denotes the density being approximated, the asymptotic approximation can be written
$$\hat f_A(\alpha) = f(\alpha) + O\left(N^{-1/2}\right).$$
The empirical saddlepoint approximation can be written
$$\hat f_S(\alpha) = f(\alpha)\left\{1 + O\left(N^{-1/2}\right)\right\}.$$
When there is a relative error, the error gets small with the density in the tails of the distribution. When the error is absolute, the same error holds in the tails as in the modes of the density. Hall and Horowitz (1996) show that the bootstrap approximation results in an absolute error of order $N^{-1}$. Hence, theoretically, neither the empirical saddlepoint approximation nor the bootstrap approximation dominates.

The next section reviews basic GMM notation and the assumptions needed to obtain first order asymptotic results. Section 3 derives the estimation equations from the GMM moment conditions and introduces the augmenting parameters. Section 4 presents the saddlepoint approximation, including extensions needed to apply the saddlepoint approximation to the estimation equations built on moment conditions from an Economic model. Section 5 compares the empirical saddlepoint approximation with the asymptotic normal approximation. Section 6 summarizes the previous sections with a list of the steps needed to calculate a saddlepoint approximation. Section 7 shows that the empirical saddlepoint approximation is well defined using the estimation equations presented in Section 3. Section 8 concludes with additional interpretations, implementation issues and directions for future work.
¹ When the underlying distribution is known, the relative error from the saddlepoint approximation is typically of order $N^{-1}$. In some special cases, the relative error can be of order $N^{-3/2}$. These dramatic improvements cannot be realized with empirical applications in Economics when the underlying distribution is unknown.
The random vector $x_i$ will be defined on the probability space $(\Omega, \mathcal{B}, F)$. Let a vector's norm be denoted $\|x\| \equiv \sqrt{x'x}$. Let a matrix's norm be denoted $\|M\| \equiv \sup \|Mx\|/\|x\|$. A $\delta$ ball centered at $x$ will be denoted $B_\delta(x) \equiv \{y : \|x - y\| < \delta\}$ for $\delta > 0$. Let $m(\cdot)$ denote the Lebesgue measure.
2 Model and first order asymptotic assumptions
The saddlepoint approximation requires stronger assumptions than needed for the asymptotic normal approximation. To help compare and contrast these assumptions, the basic assumptions required for asymptotic normality (the first order asymptotics) are recorded as the basic model.

Consider an m-dimensional set of moment conditions $g_i(\theta) \equiv g(x_i, \theta)$ where $\theta$ is a k-dimensional set of parameters with $k \le m$. The Economic theory implies that at the population parameter value the moment conditions have expectation zero, i.e. $E g(x_i, \theta_0) = 0$. The error process that drives the system is assumed to be iid.

Assumption 1. The p-dimensional series $x_i$ is iid from a distribution $F(x)$.

The functions that create the moment conditions satisfy regularity conditions.

Assumption 2. $g(x, \theta)$ is continuously partially differentiable in $\theta$ in a neighborhood of $\theta_0$. The functions $g(x, \theta)$ and $\partial g(x, \theta)/\partial\theta'$ are measurable functions of $x$ for each $\theta \in \Theta$, $E\left[\sup_{\theta\in\Theta}\left\|\partial g(x,\theta)/\partial\theta'\right\|\right] < \infty$, $E\left[g(x, \theta_0)' g(x, \theta_0)\right] < \infty$ and $E\left[\sup_{\theta\in\Theta}\|g(x,\theta)\|\right] < \infty$. Each element of $g(x, \theta)$ is uniformly square integrable.

Assumption 3. The parameter space $\Theta$ is a compact subset of $R^k$. The population parameter value $\theta_0$ is in the interior of $\Theta$.

To obtain an estimate from a finite sample, the weighted inner product of the sample analogue of the moment condition
$$G_N(\theta) = \frac{1}{N}\sum_{i=1}^{N} g(x_i, \theta)$$
is minimized conditional on $W_N$, a given symmetric positive definite weighting matrix:
$$\hat\theta = \operatorname*{argmin}_{\theta \in \Theta}\; G_N(\theta)' W_N G_N(\theta).$$
The foundation for the first order asymptotic results is the distributional assumption.
Assumption 4. The moment conditions evaluated at the population parameter values satisfy the central limit theorem
$$\sqrt{N}\, G_N(\theta_0) \sim N(0, \Sigma_g).$$

Attention will be restricted to GMM estimates with minimum asymptotic variance.

Assumption 5. The weighting matrix is selected so that $W_N \to \Sigma_g^{-1}$.

This assumption is usually satisfied by performing a first round estimation using an identity matrix as the weighting matrix.

Assumption 6. The matrix $M(\theta) \equiv E\left[\partial g(x_i, \theta)/\partial\theta'\right]$ has full column rank for all $\theta$ in a neighborhood of $\theta_0$. The matrix $\Sigma(\theta) \equiv E\left[g(x_i, \theta) g(x_i, \theta)'\right]$ is positive definite for all $\theta$ in a neighborhood of $\theta_0$.
It will be assumed that the parameter can be identified from the moment conditions.

Assumption 7. Only $\theta_0$ satisfies $E g(x_i, \theta) = 0$.

These assumptions are sufficient to ensure that the GMM estimator is a root-N consistent estimator of $\theta_0$. The GMM estimates are asymptotically distributed as
$$\sqrt{N}\left(\hat\theta - \theta_0\right) \sim N\left(0, \left(M(\theta_0)' \Sigma_g^{-1} M(\theta_0)\right)^{-1}\right).$$
The theory being tested implies that all $m$ of the moment conditions should be zero. The first order conditions
$$M_N(\hat\theta)' W_N G_N(\hat\theta) = 0$$
being satisfied implies that $k$ of the moments will be exactly zero, where $M_N(\theta) = \partial G_N(\theta)/\partial\theta'$. The remaining $(m-k)$ moments can be used to test the Economic theory. The statistic
$$J = N\, G_N(\hat\theta)' W_N G_N(\hat\theta)$$
tests these overidentifying restrictions and is asymptotically distributed $\chi^2_{m-k}$ if the null hypothesis of the Economic theory being true is correct.
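To make the standard setup above concrete, the following minimal sketch (assuming NumPy and SciPy are available) implements generic two-step GMM and the J statistic. The names `g`, `data` and `theta_start` are placeholders for a user-supplied moment function, data set and starting value; nothing here is specific to this paper's extended procedure.

```python
import numpy as np
from scipy.optimize import minimize

def gmm_objective(theta, data, W, g):
    """Quadratic form G_N(theta)' W G_N(theta); g(data, theta) returns an (N, m) array."""
    G = g(data, theta).mean(axis=0)          # G_N(theta), an m-vector
    return G @ W @ G

def two_step_gmm(data, g, theta_start):
    """Two-step GMM: identity weighting first, then W_N = estimated Sigma_g^{-1}."""
    m = g(data, theta_start).shape[1]
    # First step: identity weighting matrix.
    step1 = minimize(gmm_objective, theta_start, args=(data, np.eye(m), g))
    # Estimate the moments' covariance at the first-step estimate.
    G_i = g(data, step1.x)
    N = G_i.shape[0]
    W_N = np.linalg.inv(G_i.T @ G_i / N)
    # Second step: optimal weighting matrix.
    step2 = minimize(gmm_objective, step1.x, args=(data, W_N, g))
    theta_hat = step2.x
    # J statistic: N * G_N(theta_hat)' W_N G_N(theta_hat), asymptotically chi^2_{m-k}.
    J = N * gmm_objective(theta_hat, data, W_N, g)
    return theta_hat, W_N, J
```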
3 Moment Conditions to Estimation Equations
The saddlepoint approximation uses a just identified system of estimation equations. This section shows how a just identified system of equations, appropriate for a saddlepoint approximation, can be created from the moment conditions implied by the Economic theory. The estimation equations are created by augmenting the parameters
with a set of $(m-k)$ parameters denoted $\lambda$. The $\lambda$ parameters are a local coordinate system² that spans the space of overidentifying restrictions and are selected so that, under the null hypothesis of the Economic theory being true, the population parameter values are $\lambda_0 = 0$. The validity of the overidentifying restrictions can be tested by the hypothesis $H_0: \lambda = 0$.

For each value of $\theta$ the sample moment conditions $G_N(\theta)$ form an m-dimensional vector. As $\theta$ takes different values, the moment conditions create a k-dimensional manifold in this m-dimensional space. The k-dimensional manifold will be called the identifying space and the $(m-k)$-dimensional space is called the overidentifying space. This is a generalization of the decomposition used in Sowell (1996) where the tangent space at $\hat\theta$ was decomposed into a k-dimensional identifying space and an $(m-k)$-dimensional space of overidentifying restrictions. The generalization is defining this decomposition not only at $\hat\theta$ but at each value³ of $\theta$.

For each value of $\theta$, let $\bar M_N(\theta)$ denote the derivative of $G_N(\theta)$ with respect to $\theta$ scaled (standardized) by the Cholesky decomposition of the weighting matrix,
$$\bar M_N(\theta) = W_N^{1/2}\,\frac{\partial G_N(\theta)}{\partial\theta'}.$$
Using this notation, the first order conditions that define the GMM parameter estimate can be written
$$\bar M_N(\hat\theta)'\, W_N^{1/2}\, G_N(\hat\theta) = 0.$$
The columns of $\bar M_N(\hat\theta)$ define the $k$ linear combinations used to identify and estimate $\theta$. The orthogonal complement of the space spanned by the columns of $\bar M_N(\hat\theta)$ will be the $(m-k)$-dimensional space used to test the validity of the overidentifying restrictions and will be spanned by $\lambda$. To determine these augmenting parameters, perform the decomposition for every value of $\theta$. Denote the projection matrix for the space spanned by $\bar M_N(\theta)$
$$P_{\bar M(\theta),N} = \bar M_N(\theta)\left(\bar M_N(\theta)'\bar M_N(\theta)\right)^{-1}\bar M_N(\theta)'.$$
This is a real symmetric positive semidefinite matrix that is also idempotent. Denote a spectral decomposition⁴
$$P_{\bar M(\theta),N} = C_N(\theta)\Lambda C_N(\theta)' = \begin{bmatrix} C_{1,N}(\theta) & C_{2,N}(\theta)\end{bmatrix}\begin{bmatrix} I_k & 0 \\ 0 & 0_{(m-k)}\end{bmatrix}\begin{bmatrix} C_{1,N}(\theta)' \\ C_{2,N}(\theta)'\end{bmatrix}$$
² An introduction to the use of local coordinate systems can be found in Boothby (2003). Application of these tools in Statistics and Econometrics can be found in Amari (1985) and Marriott and Salmon (2000).

³ When attention is restricted to the empirical saddlepoint approximation, the decomposition only needs to exist for parameters in a neighborhood of the local minima.

⁴ This raises a concern because the spectral decomposition is not unique. The invariance of the inference with respect to alternative spectral decompositions is demonstrated in Section 7.
where $C_N(\theta)'C_N(\theta) = I_m$. The column span of $C_{1,N}(\theta)$ is the same as the column span of $\bar M_N(\theta)$ and the columns of $C_{2,N}(\theta)$ span the orthogonal complement at $\theta$. Hence, for each value of $\theta$, the m-dimensional space that contains $G_N(\theta)$ can be locally parameterized by
$$\Psi_N(\theta, \lambda) = \begin{bmatrix} C_{1,N}(\theta)'\, W_N^{1/2}\, G_N(\theta) \\ \lambda - C_{2,N}(\theta)'\, W_N^{1/2}\, G_N(\theta) \end{bmatrix}.$$
The first set of equations is the $k$ dimensions of $W_N^{1/2} G_N(\theta)$ that locally vary with $\theta$. The parameters $\theta$ are the local coordinates for these $k$ dimensions. The second set of equations gives the $(m-k)$ dimensions of $W_N^{1/2} G_N(\theta)$ that are locally orthogonal to $\theta$. The parameters $\lambda$ are the local coordinates for these $(m-k)$ dimensions. For each value of $\theta$, the parameters $\lambda$ span the space that is the orthogonal complement of the space spanned by $\theta$.

This parameterization of the m-dimensional space can be used to obtain parameter estimates by solving $\Psi_N(\theta, \lambda) = 0$. This set of estimation equations will be used in the saddlepoint approximation. The function $\Psi_N(\theta, \lambda) = \frac{1}{N}\sum_{i=1}^{N}\psi(x_i, \theta, \lambda)$ where
$$\psi(x_i, \theta, \lambda) = \begin{bmatrix} C_{1,N}(\theta)'\, W_N^{1/2}\, g_i(\theta) \\ \lambda - C_{2,N}(\theta)'\, W_N^{1/2}\, g_i(\theta) \end{bmatrix}.$$
Below a generic value of $\psi(x_i, \theta, \lambda)$ will be denoted $\psi(x, \theta, \lambda)$. These moment conditions give a just identified system of $m$ equations in $m$ unknowns. The moment conditions $\psi(x_i, \theta, \lambda)$ are an alternative way to summarize the first order conditions for GMM estimation and the statistics that test the validity of the overidentifying restrictions. Because the column spans of $C_{1,N}(\theta)$ and $\bar M_N(\theta)$ are the same, the system of equations $C_{1,N}(\theta)'\, W_N^{1/2}\, G_N(\theta) = 0$ is equivalent to
$$\bar M_N(\theta)'\, W_N^{1/2}\, G_N(\theta) = 0$$
and will imply the same parameter estimates, $\hat\theta$. This system of equations is solved independently of $\lambda$. The system of equations $C_{2,N}(\theta)'\, W_N^{1/2}\, G_N(\theta) - \lambda = 0$ implies the estimate
$$\hat\lambda = C_{2,N}(\hat\theta)'\, W_N^{1/2}\, G_N(\hat\theta).$$
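The construction of $C_{1,N}(\theta)$, $C_{2,N}(\theta)$, $\Psi_N(\theta,\lambda)$ and $\hat\lambda$ can be sketched as follows. This is an illustrative sketch only: it obtains an orthonormal basis for the column span of $\bar M_N(\theta)$ and its orthogonal complement from a single QR decomposition rather than an explicit spectral decomposition of the projection matrix (any such basis works by the invariance results, Theorems 2 and 6, but this is an implementation choice), and `g`, `dg`, `data` and `W_sqrt` are hypothetical user-supplied objects.

```python
import numpy as np

def basis_C1_C2(M_bar):
    """Orthonormal bases for span(M_bar) (C1, m x k) and its orthogonal
    complement (C2, m x (m-k)), from one fixed full QR decomposition."""
    m, k = M_bar.shape
    Q, _ = np.linalg.qr(M_bar, mode="complete")   # first k columns span col(M_bar)
    return Q[:, :k], Q[:, k:]

def Psi_N(theta, lam, data, g, dg, W_sqrt):
    """Extended estimation equations Psi_N(theta, lambda) of Section 3.
    g(data, theta)  -> (N, m) array of moment contributions
    dg(data, theta) -> (m, k) average derivative dG_N/dtheta'."""
    G = g(data, theta).mean(axis=0)               # G_N(theta)
    M_bar = W_sqrt @ dg(data, theta)              # W_N^{1/2} dG_N/dtheta'
    C1, C2 = basis_C1_C2(M_bar)
    top = C1.T @ W_sqrt @ G                       # identifying block
    bottom = lam - C2.T @ W_sqrt @ G              # overidentifying block
    return np.concatenate([top, bottom])
```

At the GMM estimate, `C2.T @ W_sqrt @ G` evaluated at $\hat\theta$ is the estimate $\hat\lambda$ described above.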
3.1 A differential geometry perspective
$G(\theta)$ is a k-dimensional manifold in $R^m$. The m-dimensional tangent space for each value in the parameter space can be decomposed into the k-dimensional space spanned by $M(\theta)$ and its $(m-k)$-dimensional orthogonal complement. This orthogonal
complement will be spanned by an orthonormal basis, denoted $C_2(\theta)$, with coefficient $\lambda \in R^{(m-k)}$. It will be convenient to let the columns of $C_1(\theta)$ be an orthonormal basis that spans the k-dimensional tangent space spanned by the columns of $M(\theta)$. Together this gives the $m$ columns of $[C_1(\theta)\ C_2(\theta)]$ as an orthonormal basis. Because there is a continuum of orthonormal bases, each equally valid, this requires the demonstration that the conclusions of the analysis are robust. This will be demonstrated below in Theorems 2 and 6.

In calculations a fixed orthonormal basis is required. To be specific, $[C_1(\theta)\ C_2(\theta)]$ are determined by the Gram-Schmidt orthogonalization of
$$ME(\theta) = \begin{bmatrix} m_1(\theta) & m_2(\theta) & \cdots & m_k(\theta) & e_1 & e_2 & \cdots & e_{m-k}\end{bmatrix},$$
where $m_i(\theta)$ is the $i$th column of $M(\theta)$ and $e_i$ is the $i$th standard basis element,
$$C_1(\theta) = \begin{bmatrix} P_0^{\perp}(\theta)m_1(\theta) & \cdots & P_{k-1}^{\perp}(\theta)m_k(\theta)\end{bmatrix}\operatorname{diag}\!\left(\frac{1}{\sqrt{m_j(\theta)'\,P_{j-1}^{\perp}(\theta)\,m_j(\theta)}}\right), \quad j = 1, 2, \ldots, k,$$
where $P_j^{\perp}(\theta)$ denotes the projection onto the orthogonal complement of the space spanned by the first $j$ columns of $ME(\theta)$, for $j = 1, 2, \ldots, m$. Let $P_0^{\perp} = I_m$. $C_2(\theta)$ is
$$C_2(\theta) = \begin{bmatrix} P_k^{\perp}(\theta)e_1 & \cdots & P_{m-1}^{\perp}(\theta)e_{m-k}\end{bmatrix}\operatorname{diag}\!\left(\frac{1}{\sqrt{e_{j-k}'\,P_{j-1}^{\perp}(\theta)\,e_{j-k}}}\right)$$
for $j = k+1, k+2, \ldots, m$.

For each parameter value, $\theta^{\circ}$, a linear approximation to the limiting function $E g(x_i,\theta)$ is defined by
$$E g(x_i,\theta) = E g(x_i,\theta^{\circ}) + M(\theta^{\circ})\left(\theta - \theta^{\circ}\right).$$
A reparameterization of the tangent space can be defined by an orthonormal basis of the space spanned by the columns of $M(\theta^{\circ})$. Let $C_1(\theta^{\circ})$ define such an orthonormal basis; then the new parameterization can be written
$$t = C_1(\theta^{\circ})'\,M(\theta^{\circ})\,\theta.$$
The inverse of the reparameterization is given by
$$\theta = \left(M(\theta^{\circ})'M(\theta^{\circ})\right)^{-1}M(\theta^{\circ})'\,C_1(\theta^{\circ})\,t.$$
This gives a local diffeomorphism between $\theta$ and $t$. $E g(x_i,\theta)$ is an element of $R^m$ but $\theta$ and $t$ are elements of $R^k$; the remaining $(m-k)$ dimensions can be spanned by an orthonormal basis $C_2(\theta^{\circ})$.

At each value of $\theta$ the estimation equations have a natural interpretation. The parameter $\theta$ is selected so that the normalized moment conditions projected onto $C_1(\theta)$ are set to zero. The parameters $\lambda$ are the normalized moment conditions projected onto $C_2(\theta)$.
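A direct implementation of the fixed Gram-Schmidt recipe above might look like the following sketch. It assumes the generic case in which $M(\theta)$ has full column rank and the appended standard basis vectors are not already in the span (otherwise a division by a near-zero norm would occur); all names are illustrative.

```python
import numpy as np

def gram_schmidt_basis(M):
    """Fixed Gram-Schmidt construction of [C1(theta) C2(theta)] from
    ME(theta) = [m_1 ... m_k, e_1 ... e_{m-k}], following Section 3.1.
    The same steps are applied for every theta, so the basis is a
    deterministic function of M(theta)."""
    m, k = M.shape
    ME = np.hstack([M, np.eye(m)[:, : m - k]])    # append standard basis vectors
    C = np.zeros((m, m))
    P_perp = np.eye(m)                            # P_0^perp = I_m
    for j in range(m):
        v = P_perp @ ME[:, j]                     # residual of the j-th column
        v = v / np.sqrt(v @ v)                    # normalize (assumes nonzero residual)
        C[:, j] = v
        P_perp = P_perp - np.outer(v, v)          # project out the new direction
    return C[:, :k], C[:, k:]                     # C1 (m x k), C2 (m x (m-k))
```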
3.2 Notes
1. The inner product of $\Psi_N(\theta, \lambda)$ will be called the extended-GMM objective function. The estimates are the parameters that minimize the inner product, which can be rewritten
$$Q_N(\theta, \lambda) = \Psi_N(\theta, \lambda)'\,\Psi_N(\theta, \lambda) = G_N(\theta)'\,W_N\,G_N(\theta) - 2\lambda'\,C_{2,N}(\theta)'\,W_N^{1/2}\,G_N(\theta) + \lambda'\lambda.$$
Because $\Psi_N(\theta, \lambda)$ is a just identified system of equations there is no need for an additional weighting matrix. When $\lambda = 0$ the extended-GMM objective function reduces to the traditional GMM objective function. The extended-GMM objective function is created from the traditional GMM objective function by appending a quadratic function in the tangent space of overidentifying restrictions.

2. As expected, the asymptotic distribution for the parameter estimates using the extended-GMM objective function agrees with the traditional results.

Theorem 1. If Assumptions 1-7 are satisfied, the asymptotic distribution for the extended-GMM estimators is given by
$$\sqrt{N}\begin{bmatrix}\hat\theta - \theta_0\\ \hat\lambda\end{bmatrix} \sim N\left(\begin{bmatrix}0\\0\end{bmatrix},\ \begin{bmatrix}\left(M(\theta_0)'\Sigma_g^{-1}M(\theta_0)\right)^{-1} & 0\\ 0 & I_{(m-k)}\end{bmatrix}\right).$$
The proofs of the theorems are presented in the appendix.

3. The estimates $\hat\theta$ and $\hat\lambda$ are independent up to first order asymptotics.

4. The validity of the overidentifying restrictions can be tested with
$$N\hat\lambda'\hat\lambda = N\,G_N(\hat\theta)'\,W_N^{1/2\prime}\,C_{2,N}(\hat\theta)C_{2,N}(\hat\theta)'\,W_N^{1/2}\,G_N(\hat\theta) = N\,G_N(\hat\theta)'\,W_N^{1/2\prime}\,P^{\perp}_{\bar M(\hat\theta),N}\,W_N^{1/2}\,G_N(\hat\theta) = N\,G_N(\hat\theta)'\,W_N^{1/2\prime}W_N^{1/2}\,G_N(\hat\theta) = N\,G_N(\hat\theta)'\,W_N\,G_N(\hat\theta) = J.$$

5. The asymptotic inference is invariant to the selection of the spectral decomposition.

Theorem 2. For the extended-GMM estimators the asymptotic distributions for the parameters, $\theta$, and the parameters that test the overidentifying restrictions, $\lambda$, are invariant to the spectral decomposition that spans the tangent space.
6. To reduce notation let $\alpha \equiv [\theta'\ \lambda']'$. A generic element $\psi(x_i,\alpha)$ will be denoted $\psi(x,\alpha)$. The estimation equations to be used in the saddlepoint approximation can be denoted $\Psi_N(\alpha) = 0$. The extended-GMM objective function will be denoted $Q_N(\alpha) = \Psi_N(\alpha)'\Psi_N(\alpha)$.

7. The point of departure for the saddlepoint approximation is a system of $m$ estimation equations in $m$ parameters. These will be the first order conditions from the minimization of the extended-GMM objective function. The first order conditions imply
$$\frac{\partial Q_N(\hat\alpha)}{\partial\alpha} = 2\,\frac{\partial\Psi_N(\hat\alpha)'}{\partial\alpha}\,\Psi_N(\hat\alpha) = 0.$$
The assumption that $M(\theta)$ and $W_N$ are of full rank implies that the first order conditions are equivalent to $\Psi_N(\hat\alpha) = 0$.

8. A minimum or root of the extended-GMM objective function can be associated with either a local maximum or a local minimum of the original GMM objective function. Attention must be focused on only the minima of the extended-GMM objective function associated with the local minima of the original GMM objective function. This will typically be done with an indicator function
$$I_{pos}(Q_N(\theta)) = \begin{cases} 1, & \text{if } Q_N(\theta) \text{ is positive definite at } \theta \text{ (i.e., the second derivative of the original GMM objective function is positive definite)} \\ 0, & \text{otherwise.}\end{cases}$$
Note that this indicator function uses the original GMM objective function, not the extended-GMM objective function. A numerical check of this indicator is sketched after these notes.
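A small illustration of the indicator $I_{pos}(Q_N(\theta))$ from note 8: the following sketch checks positive definiteness of a finite-difference Hessian of the original GMM objective at a candidate $\theta$ (a NumPy array). The names `g`, `data`, `W` and the step size are placeholders; an analytic Hessian would normally be preferred.

```python
import numpy as np

def I_pos(theta, data, g, W, eps=1e-5):
    """1 if the central-difference Hessian of Q_N(theta) = G_N' W G_N is
    positive definite at theta, else 0."""
    def Q(t):
        G = g(data, t).mean(axis=0)
        return G @ W @ G
    k = len(theta)
    H = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            ei, ej = np.eye(k)[i] * eps, np.eye(k)[j] * eps
            H[i, j] = (Q(theta + ei + ej) - Q(theta + ei - ej)
                       - Q(theta - ei + ej) + Q(theta - ei - ej)) / (4 * eps ** 2)
    return int(np.all(np.linalg.eigvalsh((H + H.T) / 2) > 0))
```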
4 The Saddlepoint approximation
The saddlepoint density replaces the central limit theorem in the traditional GMM distribution theory. The central limit theorem uses information about the location and convexity of the GMM objective function at the global minimum. The saddlepoint approximation uses information about the convexity of the objective function at each point in the parameter space. The central limit theorem is built on a two-term Taylor series expansion, i.e. a linear approximation, of the characteristic function about the mean. A higher order Taylor series expansion about the mean can be used to obtain additional precision. This results in an Edgeworth expansion. Because the expansion is at the distribution’s mean, the Edgeworth expansion gives a significantly better approximation at the
mean of the distribution, $O(N^{-1})$ versus $O(N^{-1/2})$. Unfortunately, the quality of the approximation can deteriorate significantly for values away from the mean. The saddlepoint approximation exploits these characteristics of the Edgeworth expansion to obtain an improved approximation. Instead of a single Taylor series expansion, the saddlepoint uses multiple expansions to obtain improved accuracy, one expansion at every value in the parameter space.

The significantly improved approximation of the Edgeworth expansion only occurs at the mean of the distribution. To obtain this improvement at an arbitrary value in the parameter space, a conjugate distribution is used. For the parameter values $\alpha$ the conjugate density is
$$h_{N,\beta}(x) = \frac{\exp\left\{\frac{\beta'}{N}\psi(x, \alpha)\right\} dF(x)}{\int \exp\left\{\frac{\beta'}{N}\psi(w, \alpha)\right\} dF(w)}.$$
The object of interest is the distribution of $\Psi(\alpha)$, not an individual element $\psi(\alpha)$; hence the parameter $\beta$ is normalized by $N$. At the parameter value of interest, $\alpha$, the original distribution is transformed to a conjugate distribution. The conjugate distribution is well defined for arbitrary values of $\beta$. This is a degree of freedom, i.e., $\beta$ can be selected optimally for a given $\alpha$. A specific conjugate distribution is selected so that its mean is transformed back to the original distribution at the value of interest. This will occur if $\beta$ is selected to satisfy the saddlepoint equation
$$\int \psi(x, \alpha)\exp\left\{\frac{\beta'\psi(x,\alpha)}{N}\right\} dF(x) = 0.$$
Denote the solution to the saddlepoint equation as $\beta(\alpha)$. An Edgeworth expansion about its mean is calculated for the conjugate distribution defined by $\beta(\alpha)$. This is then transformed back to give the saddlepoint approximation to the original distribution at the parameter value of interest, $\alpha$.

The basic theorems from the Statistics literature are in Almudevar, Field and Robinson (2000), Field and Ronchetti (1994), Field (1982) and Ronchetti and Welsh (1994). To date the saddlepoint distribution theory in the Statistics literature is not well suited for empirical Economics. There are two problems. The first is the assumption that the objective function has a single extreme value. The second is the assumption that the saddlepoint equation has a solution for every value in the parameter space. A common feature of GMM objective functions is the existence of more than one local minimum, e.g. see Dominguez and Lobato (2004). In addition, the nonlinearity of the moment conditions can make it impossible to solve the saddlepoint equation for an arbitrary value in the parameter space. The basic theorems from the Statistics literature need slight generalizations to allow for multiple local minima and the non-existence of a solution to the saddlepoint equation. The needed generalizations are presented in Theorems 3 and 4. The next subsection explains the needed generalizations. This is followed by a subsection that uses this paper's notation and presents the assumptions needed for the theorems. The last subsection contains the
theorems. The proofs are presented in the appendix with citations to the work on which they are built.
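Before turning to the formal assumptions, the tilting idea can be illustrated numerically in the simplest possible case, $\psi(x,\alpha) = x - \alpha$ (estimating a mean). The sketch below solves the empirical saddlepoint equation for a given $\alpha$ and verifies that the tilted weights move the mean of the conjugate (empirical) distribution to $\alpha$; the sample, the value of $\alpha$ and the root bracket are arbitrary illustrative choices.

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(0)
x = rng.exponential(size=100)           # a skewed sample
N = len(x)

def saddlepoint_eq(beta, alpha):
    """Empirical saddlepoint equation for psi(x, alpha) = x - alpha:
    sum_i psi_i * exp(beta * psi_i / N) = 0 (monotone increasing in beta)."""
    psi = x - alpha
    return np.sum(psi * np.exp(beta * psi / N))

alpha = 1.2                             # a value of interest away from the sample mean
beta_hat = brentq(saddlepoint_eq, -500.0, 500.0, args=(alpha,))

# The tilted (conjugate) weights p_i place the mean of the tilted
# distribution at alpha: sum_i p_i * x_i equals alpha up to solver tolerance.
w = np.exp(beta_hat * (x - alpha) / N)
p = w / w.sum()
print(p @ x, alpha)
```

A solution exists only when $\alpha$ lies inside the observed support of the sample, which is exactly the restriction discussed in Section 4.1 below.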
4.1 Density Restrictions
The empirical saddlepoint density applied to GMM moments requires two restrictions: the local behavior of the objective function must be associated with a local minimum and the parameters must be consistent with the observed data.

Identification for GMM implies that asymptotically there will only be one local minimum. However, in finite samples there may not be enough data to accurately distinguish this asymptotic structure. In finite samples there may be, and often are, multiple local minima. Traditional GMM addresses this by restricting attention to the global minimum. The traditional asymptotic normal approximation to the GMM estimator is an approximation built on the local structure (location and convexity) of the global minimum. The saddlepoint approximation takes a different approach when the data have not distinguished a unique local minimum. The saddlepoint density approximates the sampling density for the location of solutions to the estimation equations. These include both local maxima and local minima. As with traditional GMM estimation, as the sample size increases one of the local minima is expected to become the unique local minimum. For the empirical saddlepoint density, attention must be focused on the local minima by setting the saddlepoint density to zero if the first derivative of the estimation equation is not positive definite. The term $I_{pos}(Q_N(\theta))$ will be used to denote the indicator function for the original GMM objective function being positive definite at the $\theta$ value in $\alpha$. Note that a similar restriction was also used in Skovgaard (1990). However, in Skovgaard (1990) the restriction was applied to the equivalent of the extended-GMM objective function, not the equivalent of the original GMM objective function, i.e., the Skovgaard (1990) density would include both the local maxima and the local minima of the original GMM objective function.

The second restriction for the saddlepoint applied to the GMM estimation equations concerns the fact that the empirical saddlepoint equation may not have a solution. In this case the approximate density is set equal to zero. The lack of a solution to the empirical saddlepoint equation means the observed data is inconsistent with the selected parameter, $\alpha$. The term $I(\beta(\alpha))$ will be used to denote the indicator function for the event that the saddlepoint equation for $\alpha$ has a solution. This type of restriction has occurred recently in the Statistics and Econometrics literature. The inability to find a solution to the empirical saddlepoint equation is equivalent to the restrictions on the sample mean in the selection of the parameters in the exponential tilting/maximum entropy estimation of Kitamura and Stutzer
(1997). For the simple case of estimating the sample mean, the parameters must be restricted to the support of the observed sample. It is impossible to select nonnegative weights (probabilities) to have the weighted sum of the sample equal a value outside its observed range.
4.2 Assumptions
The needed results on the saddlepoint approximation and the empirical saddlepoint approximations are contained in two published papers in the Statistics literature: Almudevar, Field and Robinson (2000), denoted AFR, and Ronchetti and Welsh (1994). Four needed results are:

1. The existence of the distribution for the local minima of the extended-GMM objective function associated with the local minima of the original GMM objective function.

2. The tilting relationship between the density for the zeros (local minima) of the extended-GMM objective function associated with the local minima of the GMM objective function when the observations are drawn from the population distribution and the density when the observations are drawn from the conjugate distribution.

3. The saddlepoint approximation assuming that the distribution for the observed series is known.

4. The empirical saddlepoint approximation where the distribution for the observed series is replaced with the empirical distribution.

The first three results are achieved by a minor change to the results in Almudevar, Field and Robinson (2000). The change concerns the need to restrict attention to only the solutions to the estimation equations associated with the local zeros of the original GMM objective function. In AFR the density is for all solutions, even those associated with local maxima of the original GMM objective function. This additional restriction is satisfied by simply setting $\Psi^*(X,\theta)$ and $z(\theta_0,\tau)$ to infinity if $\Psi'(X,\theta)$ is not positive definite. With this change, the same steps of the proof are satisfied.

The final result is achieved by a minor change in the result in Ronchetti and Welsh (1994). The first change is to set the approximation to zero if the empirical saddlepoint equation does not have a solution. This is achieved by including an indicator function for the existence of a solution to the empirical saddlepoint equation. The other change is to allow for multiple solutions to the estimation equations. This is achieved by applying the result in Ronchetti and Welsh (1994) to each solution of the estimation equation associated with a local minimum of the original GMM objective function.

Relative to traditional assumptions in Econometrics, an important difference is that the saddlepoint approximation is well defined for some models that lack identification. Traditionally a maintained assumption is that the limiting objective function uniquely identifies the population parameter. The saddlepoint approximation does not require such a strong assumption to be well defined; there may be multiple solutions to the moment conditions associated with local minima of the GMM objective function. Define the set $T = \{\theta \in \Theta : E[g(x_i,\theta)] = 0\}$. If $T$ is a singleton then this
reduces to the traditional identification assumption. An individual element in $T$ will typically be denoted $\theta_0^*$.

The empirical saddlepoint approximation requires stronger assumptions than the assumptions needed for the normal approximation. These additional restrictions concern the existence and integrability of higher order derivatives of the moments. Stronger assumptions for the saddlepoint density are required so that the asymptotic behavior that occurs in the neighborhood of the global minimum of the GMM objective function also holds for the neighborhood of each element in $T$. Stronger assumptions are also needed to ensure that the Edgeworth expansion of the conjugate density is well defined. Finally, for the empirical saddlepoint approximation the stronger assumptions concerning the existence of higher order moments and smoothness ensure that the sample averages converge uniformly to their limits.

The error process that drives the system is assumed to be iid.

Assumption 1'. The p-dimensional series $x_i$ is iid from a distribution $F(x)$.

The moment conditions satisfy regularity conditions.

Assumption 2'. The function $g(x,\theta)$ is uniformly continuous in $x$ and has three derivatives with respect to $\theta$ which are uniformly continuous in $x$. The functions $g(x,\theta)$ and $\partial g(x,\theta)/\partial\theta'$ are measurable functions of $x$ for each $\theta\in\Theta$, and $E\left[\sup_{\theta\in\Theta}\left\|\partial g(x,\theta)/\partial\theta'\right\|\right] < \infty$, $E\left[g(x,\theta_0)'g(x,\theta_0)\right] < \infty$ and $E\left[\sup_{\theta\in\Theta}\|g(x,\theta)\|\right] < \infty$. Each element of $g(x,\theta)$ is uniformly square integrable.

This implies that N times the GMM objective function will uniformly converge to the nonstochastic function $E[g(x,\theta)]'E[g(x,\theta)]$.

Assumption 3'. The parameter space $\Theta$ is a compact subset of $R^k$. $\theta_0\in T$. Each element of $T$ is in the interior of $\Theta$. There exists a $\delta > 0$ such that for any two distinct elements in $T$, $\theta_{0,j}^*$ and $\theta_{0,i}^*$, $\left|\theta_{0,j}^* - \theta_{0,i}^*\right| > \delta$.

This assumption requires that the population parameter is in the set of minima of the extended-GMM objective function associated with the local minima of the original GMM objective function. This assumption also ensures that the limiting objective function can be expanded in a Taylor series about its local minima. Finally, this assumption also requires that any identification failure results in disjoint solutions.

Assumptions 1'-3' are sufficient to ensure that the GMM estimator is a root-N consistent estimator of an element in $T$. The foundation for the first order asymptotic results is the distributional assumption.

Assumption 4'. The moment conditions evaluated at $\theta_0^*\in T$ satisfy the central limit theorem
$$\sqrt{N}\,G_N(\theta_0^*) \sim N\left(0, \Sigma_g(\theta_0^*)\right)$$
where $\Sigma_g(\theta_0^*)$ may be different for each value of $\theta_0^*\in T$.
Assumption 5’. The weighting matrix is selected so that it is always positive definite and WN → Σg (θ0∗ )−1 where θ0∗ ∈ T . This assumption can be satisfied by performing a first round estimation using a positive definite matrix as the weighting matrix. If T is a singleton this ensures the first order distribution will be efficient. The objective function must satisfy restrictions for a neighborhood for each solutions to the extended-GMM objective function associated with a local minima of the original GMM objective function. Assumption 6’. For each θ0∗ ∈ T , is continuously differentiable and has full column 1. The matrix M (θ) ≡ E ∂g(x,θ) ∂θ0 rank for all θ in a neighborhood of θ0∗ . £ ¤ 2. The matrix Σ(θ) ≡ E g(x, θ)g(x, θ)0 is continuous and positive definite for all θ in a neighborhood of θ0∗ . 3. The function
Z
∂g(x, θ) exp {β 0 g(x, θ)} dF (x) ∂θ0 exist for β in a set containing the origin.
4. For 1 ≤ i, j, s1 , s2 , s3 ≤ m the integrals ¾2 ¾2 Z ½ Z ½ ∂gi (x, α) ∂gi (x, α) gj (x, α) dF (x), dF (x), ∂αs3 ∂αs3 Z ½
∂ 2 gi (x, α) ∂αs2 ∂αs3
¾ ∂ 2 gi (x, α) dF (x), gj (x, α) dF (x), ∂αs2 ∂αs3 ¾ Z ½ 3 ∂ gi (x, α) dF (x) ∂αs1 ∂αs2 ∂αs3
¾2
Z ½
are finite. Assumption 6’.1 and Assumption 6’.2 are straightforward. Assumption 6’.3 ensures that the saddlepoint equation will have a solution in the neighborhood of θ0∗ . Assumption 6’.4 with Assumption 2’ ensures the uniform convergence of the sample averages to their limits. The next assumption restricts attention to all the parameter values in a τ ball of a fixed parameter value. Calculate the maximum distance between the derivative at α0 and the derivative any other parameter values in Bτ (α0 ). If the derivative is zero for any parameter value in Bτ (α0 ), set the distance to infinity. Carnegie Mellon University
Define the random variable⁵
$$z(\theta_0^*, \tau) = \begin{cases}\displaystyle\sup_{\theta\in B_\tau(\theta_0^*)}\left\|\frac{\partial^2 g(x,\theta_0^*)'W_N g(x,\theta_0^*)}{\partial\theta'\,\partial\theta} - \frac{\partial^2 g(x,\theta)'W_N g(x,\theta)}{\partial\theta'\,\partial\theta}\right\|, & \text{if } \dfrac{\partial^2 g(x,\theta)'W_N g(x,\theta)}{\partial\theta'\,\partial\theta}\text{ is positive definite } \forall\,\theta\in B_\tau(\theta_0^*)\\ \infty, & \text{otherwise.}\end{cases}$$
Now define the events in $\Omega$ such that the maximum deviation between the first derivatives is below some value $\gamma > 0$:
$$H(\theta_0^*, \gamma, \tau) = \left\{z(\theta_0^*, \tau) < \gamma\right\}\subset\Omega.$$
This restricts attention to events where the objective function has an invertible derivative and is fairly smooth. Now define
$$R^*(x,\theta) = \begin{cases}\left(\dfrac{\partial^2 g(x,\theta_0^*)'W_N g(x,\theta_0^*)}{\partial\theta'\,\partial\theta}\right)^{-1}\dfrac{\partial g(x,\theta)'}{\partial\theta}\,W_N\, g(x,\alpha), & \text{if } \dfrac{\partial^2 g(x,\theta)'W_N g(x,\theta)}{\partial\theta'\,\partial\theta}\text{ is positive definite } \forall\,\theta\in B_\tau(\theta_0^*)\\ \infty, & \text{otherwise}\end{cases}$$
and the density
$$f_{R^*(x,\theta)}(z; H(\theta,\gamma,\tau)) = \frac{\Pr\left\{\{R^*(x,\theta)\in B_\tau(z)\}\cap H(\theta,\gamma,\tau)\right\}}{m(B_\tau(z))}.$$
Assumption 7’. For any compact set A ⊂ A and for any 0 < γ < 1, there exists τ > 0 and δ > 0 such that fR∗ (x,θ) (z; H(θ, γ, τ )) exists and is continuous and bounded by some fixed constant K for any α ∈ A and z ∈ Bδ (0). This is a high level assumption that requires that the moment conditions have a bounded density in the neighborhood of zero. This is required to establish the existence of the density for the location of the local minima of the original GMM objective function. The next assumptions permit the moment conditions (and hence the estimation equations) that possess a mixture of continuous and discrete random variables. For each θ let Dθ ⊂ Rm be a set with Lebesgue measure zero such that by the Lebesgue decomposition theorem, ½ ¾ ½ ¾ Z ∂g(x, θ)0 ∂g(x, θ)0 Pr WN g(x, θ) ∈ A = Pr WN g(x, θ) ∈ A ∩ Dθ + fθ dm ∂θ ∂θ A where fθ may be an improper density. Let Iiθ = 0 if g(xi , θ) ∈ Dθ and 1, otherwise. Assume that the moment conditions can be partitioned into contributions from the continuous components and the discrete components. 5
⁵ Almudevar, Field and Robinson (2000) uses $\|B^{-1}A - I\|$ as the distance between two matrices, $A$ and $B$. This paper will use the alternative distance measure $\|A - B\|$, which is defined for non-square matrices. This definition is more commonly used in the Econometrics literature, e.g., see Assumption 3.4 in Hansen (1982).
¡ ¢ Assumption 8’. Assume that there are iid random vectors Uiθ = Wiθ , V1iθ , V2iθ , where Wiθ are jointly continuous random vectors of dimension m and V1iθ and V2iθ are random vectors of dimension m and m∗ , respectively, such that NTθ =
N X ∂g(x, θ)0 i=1
∂θ
WN g(xi , θ) =
N X
Iiθ Wiθ +
N X
i=1
(1 − Iiθ ) V1iθ
i=1
! Ã N N X ∂ 2 g(xi , θ)0 WN g(xi , θ) X ¡ ¢ vec N S θ = vec = Aθ Iiθ Uiθ , ∂θ0 ∂θ i=1 i=1 where Aθ is of dimension m2 by 2m + m∗ . ¢ ¡ 0 0 Vjθ0 have the distribution of Ujθ conditional on Ijθ = 1 and = Wjθ Let Ujθ 00 V1jθ have the distribution of V1jθ conditional on Ijθ = 0. Define K ¡ 0 ¢ 1 X 0 0 ˜ ˜ ˜ Uθ = Wjθ Vjθ = U , N i=1 jθ where K is a draw from a binomial distribution with parameters N and ρ = Pr {Iiθ = 1} conditional on being positive. Define K N 1 X 0 1 X 00 T˜θ = Wjθ + V N j=1 N j=K+1 1jθ
when 0 < K < N . Finally, define ³
K ´ 1 X 0 ˜ U . vec N Sθ = Aθ N j=1 jθ
³ ´ Assumption 9’. Assume that det S˜θ 6= 0 and that the transformation T˜θ to S˜θ−1 T˜θ given V˜θ is one-to-one with probability one. © ª Assumption 10’. Assume that E exp β T Uθ < ∞ for kβk < γ for some γ > 0 and for all θ. Let ° h 2 h 2 i h 2 i° i ° ∂ g(x,θ0∗ )0 WN g(x,θ0∗ ) ∂ g(x,θ)0 WN g(x,θ) ∂ g(x,θ)0 WN g(x,θ) ° , E − E is ° °E ∂θ0 ∂θ ∂θ0 ∂θ ∂θ0 ∂θ 0 Λ(F ; θ , θ) = positive definite, ∞, otherwise and Λ∗ (F ; θ, τ ) = sup Λ(F ; θ0 , θ). θ 0 ∈Bτ (θ)
Assumption 11’. Given 0 < γ < 1 there is a τ such that supθ∈B˜τ (θ0 ) Λ∗ (F0 ; θ, τ ) < γ. Assumption 12’. For fixed θ ∈ Bτ (θ0 ), Λ∗ (·; ·, τ ) is continuous at (F0 , θ) in the product topology. Theorem 3. (Almudevar, Field and Robinson (2000)) Under Assumptions 1’-12’ for θ0∗ ∈ T , there is, with probability 1 − e−cN for some c > 0, a uniquely defined M −estimate α ˆ on Bτ (α0∗ ) which has a density, restricted to Bτ (θ0∗ ), fN (α) = KN × Ipos (QN (θ)) × I(β(α)) µ ¶ m2 ¯ ¯¯ ¯−1/2 N ¯ ¯¯ 0 0 ¯ × ¯ E [ψ (x, α)]¯ ¯ E [ψ(x, α)ψ(x, α) ]¯ 2π ¡ ¡ ¢¢ × exp {N κN (β(α), α)} 1 + O N −1 where β(α) is the solution of the saddlepoint equation ½ 0 ¾ Z β ψ(x, α) exp ψ(x, α) dF (x) = 0, N ¾ ½ Z β(α)0 κN (β(α), α) = exp ψ(x, α) dF (x), N the expectations are with respect to the conjugate density n o 0 exp β(α) ψ(x, α) dF (x) N n o , hN (x) = R 0 exp β(α) ψ(w, α) dF (w) N the term Ipos (QN (θ)) is an indicator function that sets the approximation to zero if the original GMM objective function is not positive definite at the θ value in α, the term I(β(α)) is an indicator function that sets the approximation to zero if the saddlepoint equation does not have a solution and KN is a constant that ensure that the approximation integrates to one. This shows how the saddlepoint approximation is calculated. The saddlepoint approximation is nonnegative and gives a faster rate of convergence than the asymptotic normal approximation. The calculation of the saddlepoint density requires knowledge of the distribution F (x). In most Economic applications this is unknown. Replacing the distribution with the observed empirical distribution results in the empirical saddlepoint approximation. This replaces the expectations with respect to the distribution F (x) with sample averages, i.e., the empirical distribution is used. The empirical saddlepoint density is defined to be fˆN (α) = KN × Ipos (QN (θ)) × I(βˆN (α)) ¯−1/2 ¯¯ µ ¶ m2 ¯¯X N N ¯ ¯ ¯ 0 X N ∂ψ(xi , α) ¯ ¯¯ ¯ 0 ψ(xi , α)ψ(xi , α) pi (α)¯ × pi (α)¯ ¯ ¯ ¯ ¯ ¯ ¯ 2π ∂α i=1 i=1 ( à ( )!) N 1 X βˆN (α)0 × exp N ln ψ(xi , α) exp , N i=1 N Carnegie Mellon University
where $\hat\beta_N(\alpha)$ is the solution of
$$\sum_{i=1}^N\psi(x_i,\alpha)\exp\left\{\frac{\beta'}{N}\psi(x_i,\alpha)\right\} = 0$$
and
$$p_i(\alpha) = \frac{\exp\left\{\frac{\hat\beta_N(\alpha)'}{N}\psi(x_i,\alpha)\right\}}{\sum_{j=1}^N\exp\left\{\frac{\hat\beta_N(\alpha)'}{N}\psi(x_j,\alpha)\right\}}.$$
Using the empirical distribution instead of a known distribution gives a nonparametric procedure. However, it results in a reduction in accuracy, as noted in the next theorem. The appropriate rate of convergence is achieved by restricting attention to parameters in a shrinking neighborhood of the local minima of the estimation equations, $|\alpha - \alpha_0| < \Delta/\sqrt{N}$ for some $\Delta < \infty$. This can also be thought of as obtaining the density for $u = \sqrt{N}(\alpha - \alpha_0)$.

Theorem 4. (Ronchetti and Welsh (1994)) If Assumptions 1'-12' are satisfied for $\theta_0^*\in T$,
$$\frac{f_N\left(\alpha_0^* + u/\sqrt{N}\right)}{\hat f_N\left(\hat\alpha_N^* + u/\sqrt{N}\right)} = 1 + O_p\left(N^{-1/2}\right)$$
where the convergence is uniform for $u$ in any compact set.

The empirical saddlepoint approximation is scaled by its (numerical) integral, $K_N$, to give a density that integrates to one.
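The following sketch evaluates the unnormalized empirical saddlepoint density at a single point $\alpha$ directly from the formula above. It assumes the user supplies callables returning the matrix of $\psi(x_i,\alpha)$ values and the array of derivatives $\partial\psi(x_i,\alpha)/\partial\alpha'$; the indicator $I_{pos}(Q_N(\theta))$ is not checked here, and a generic root finder is used for the saddlepoint equation.

```python
import numpy as np
from scipy.optimize import root

def esp_density_at(alpha, psi, dpsi):
    """Unnormalized empirical saddlepoint density at a single alpha.

    psi(alpha)  -> (N, m)    array with rows psi(x_i, alpha)
    dpsi(alpha) -> (N, m, m) array with slices dpsi(x_i, alpha)/dalpha'
    Returns 0.0 if the empirical saddlepoint equation has no solution,
    mirroring the indicator I(beta_hat(alpha))."""
    P = psi(alpha)
    N, m = P.shape

    def sp_eq(beta):                         # sum_i psi_i exp(beta' psi_i / N)
        return P.T @ np.exp(P @ beta / N)

    sol = root(sp_eq, np.zeros(m))
    if not sol.success:
        return 0.0                           # I(beta_hat(alpha)) = 0
    w = np.exp(P @ sol.x / N)
    p = w / w.sum()                          # tilted weights p_i(alpha)
    A = np.einsum('i,ijk->jk', p, dpsi(alpha))   # sum_i p_i dpsi_i/dalpha'
    B = (P * p[:, None]).T @ P               # sum_i p_i psi_i psi_i'
    kappa = np.log(w.mean())                 # empirical log tilted MGF
    return ((N / (2 * np.pi)) ** (m / 2)
            * abs(np.linalg.det(A)) / np.sqrt(np.linalg.det(B))
            * np.exp(N * kappa))
```

The returned values over a grid of $\alpha$ points would then be rescaled by their numerical integral, the constant $K_N$ above.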
5 Asymptotic Normal and Empirical Saddlepoint Approximations
This section compares and contrasts the asymptotic normal approximation and the empirical saddlepoint approximation. The two densities have similar structures. Their differences concern the “means” and “covariances.” From Theorem 2 the asymptotic normal approximation can be written
$$\hat f_A(\alpha) = (2\pi)^{-\frac{m}{2}}\left|\frac{\hat\Sigma}{N}\right|^{-1/2}\exp\left\{-\frac{1}{2}\begin{bmatrix}\theta-\hat\theta\\ \lambda-\hat\lambda\end{bmatrix}'\left(\frac{\hat\Sigma}{N}\right)^{-1}\begin{bmatrix}\theta-\hat\theta\\ \lambda-\hat\lambda\end{bmatrix}\right\}$$
where $\hat\theta$ is the GMM estimator, i.e. the global minimum of the objective function, $\hat\lambda = C_2(\hat\theta)'\,W_N^{1/2}\,G_N(\hat\theta)$ and $\hat\Sigma$ is the covariance matrix given in Theorem 2.
The covariance matrix $\hat\Sigma$ is the observed second derivative (convexity) of the extended-GMM objective function at the global minimum and can be estimated with $(A'B^{-1}A)^{-1}$ where
$$A = \frac{1}{N}\sum_{i=1}^N\frac{\partial\psi(x_i,\hat\alpha)}{\partial\alpha'}\quad\text{and}\quad B = \frac{1}{N}\sum_{i=1}^N\psi(x_i,\hat\alpha)\psi(x_i,\hat\alpha)'.$$

The asymptotic normal approximation is built on the local behavior (location and convexity) of the objective function at its extreme value. The asymptotic normal approximation delivers a familiar, but restricted, structure for the approximation to the sampling distribution: unimodal, symmetric with the thin tails associated with the normal distribution. The asymptotic normal approximation is constructed from a linear approximation to the first order conditions. Exactly linear first order conditions occur when the objective function is quadratic. So the asymptotic normal approximation is built on a quadratic approximation to the objective function. It has been noted that the asymptotic normal approximation is a poor approximation to the sampling distribution for several estimated models. This means that the objective function is not well approximated by a quadratic over relevant parameter values. The local behavior (location and convexity) of the objective function at its extreme value does not contain enough information to accurately approximate the objective function in a large enough region.

The empirical saddlepoint approximation to the sampling distribution takes a different approach. The empirical saddlepoint approximation uses the convexity of the objective function at each point in the parameter space. The matrices $\hat\Sigma_C(\theta)$ and $\hat\Sigma_e(\theta)$ are different measures of the convexity of the objective function. The matrix $\hat\Sigma_e(\theta)$ is the convexity of the objective function at the parameter values that is estimated using the empirical distribution, i.e., $\hat\Sigma_e(\theta) = \left(A_e(\alpha)'B_e(\alpha)^{-1}A_e(\alpha)\right)^{-1}$ where
$$A_e(\alpha) = \frac{1}{N}\sum_{i=1}^N\frac{\partial\psi(x_i,\alpha)}{\partial\alpha'}\quad\text{and}\quad B_e(\alpha) = \frac{1}{N}\sum_{i=1}^N\psi(x_i,\alpha)\psi(x_i,\alpha)'.$$
The matrix $\hat\Sigma_C(\theta)$ is the convexity of the objective function at the parameter values that is estimated using the conjugate distribution, i.e., $\hat\Sigma_C(\theta) = \left(A_C(\alpha)'B_C(\alpha)^{-1}A_C(\alpha)\right)^{-1}$ where
$$A_C(\alpha) = \sum_{i=1}^N p_i(\alpha)\frac{\partial\psi(x_i,\alpha)}{\partial\alpha'}\quad\text{and}\quad B_C(\alpha) = \sum_{i=1}^N p_i(\alpha)\psi(x_i,\alpha)\psi(x_i,\alpha)'$$
and
$$p_i(\alpha) = \frac{\exp\left\{\beta(\alpha)'\psi(x_i,\alpha)\right\}}{\sum_{j=1}^N\exp\left\{\beta(\alpha)'\psi(x_j,\alpha)\right\}}.$$
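As a concrete illustration of the two convexity measures, the sketch below computes $\hat\Sigma_e$ and $\hat\Sigma_C$ from user-supplied $\psi$ values, their derivatives, and a tilting vector $\beta(\alpha)$ obtained elsewhere (e.g. from the saddlepoint equation); the tilt follows the $p_i(\alpha)$ formula immediately above, and all names are illustrative.

```python
import numpy as np

def convexity_measures(alpha, psi, dpsi, beta):
    """Return (Sigma_e, Sigma_C): the sandwich (A' B^{-1} A)^{-1} computed with
    empirical weights 1/N and with conjugate (tilted) weights p_i(alpha)."""
    P = psi(alpha)                 # (N, m) rows psi(x_i, alpha)
    D = dpsi(alpha)                # (N, m, m) slices dpsi(x_i, alpha)/dalpha'
    N = P.shape[0]

    def sandwich(w):
        A = np.einsum('i,ijk->jk', w, D)       # weighted average derivative
        B = (P * w[:, None]).T @ P             # weighted psi psi'
        return np.linalg.inv(A.T @ np.linalg.inv(B) @ A)

    p_emp = np.full(N, 1.0 / N)                # empirical weights
    w = np.exp(P @ beta)                       # conjugate tilt, as in p_i(alpha)
    p_conj = w / w.sum()
    return sandwich(p_emp), sandwich(p_conj)
```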
In addition, the empirical saddlepoint approximation uses each of the local minima as the “mean” for the parameters that can be connected to it with a path where the
objective function remains positive definite. This “mean” will be denoted
$$\alpha^*(\alpha) = \begin{bmatrix}\theta^*(\alpha)\\ \lambda^*(\alpha)\end{bmatrix}.$$
The similar structure of the asymptotic normal approximation and the empirical saddlepoint approximation is shown in the following theorem.

Theorem 5. The empirical saddlepoint approximation can be written
$$\hat f_S(\alpha) = K_N\times I_{pos}(Q_N(\alpha))\times I(\beta(\alpha))\times(2\pi)^{-\frac{m}{2}}\left|\frac{\hat\Sigma_C(\theta)}{N}\right|^{-1/2}\exp\left\{-\frac{1}{2}\begin{bmatrix}\theta-\theta^*(\alpha)\\ \lambda-\lambda^*(\alpha)\end{bmatrix}'\left(\frac{\hat\Sigma_e(\theta)}{N}\right)^{-1}\begin{bmatrix}\theta-\theta^*(\alpha)\\ \lambda-\lambda^*(\alpha)\end{bmatrix}\right\}\times\left\{1 + O\left(N^{-\frac{1}{2}}\right)\right\}.$$

The saddlepoint approximation uses the global structure of the objective function. If $T$ is a singleton, as the sample size grows consistency implies that the mass of the sampling distribution converges to a shrinking neighborhood of the population parameter value. In this neighborhood the objective function will converge to a quadratic and hence the saddlepoint approximation will converge to the familiar normal approximation. The saddlepoint approximation is a natural generalization of the asymptotic normal approximation. If the objective function is quadratic then the saddlepoint approximation will be equal to the asymptotic normal approximation. However, if the objective function is not quadratic the saddlepoint approximation will incorporate the global structure of the objective function to provide more accurate inference.

The empirical saddlepoint approximation is the sampling distribution for the location of the local minima of the GMM objective function. If there is a unique local minimum then the empirical saddlepoint distribution is an approximation to the GMM estimator's sampling distribution. This approximation may be asymmetric and does not force the tail behavior associated with the normal approximation. If the GMM objective function does not have a unique local minimum then the empirical saddlepoint approximation gives the probability of the location of a local minimum. This approximation can have multiple modes. A given sample may not be informative enough to distinguish the general location of the population parameter values. This can result in multiple local minima. Instead of focusing only on the global minimum, the saddlepoint approximation summarizes the information in the sample using the global shape of the objective function.
6 Steps to a Saddlepoint Approximation
The steps to calculate the saddlepoint approximation for the θ parameters and the λ statistics are listed below. The terms' dependence on N will be suppressed in this section.
1. Calculate an optimal weighting matrix for the original GMM objective function. This is simply the first step of the two-step GMM estimation.
   (a) Calculate the first step GMM estimate using an identity as the weighting matrix.
   (b) Calculate an estimate of the moments' covariance matrix at the first step GMM estimate, $\hat W$.
   (c) Calculate a Cholesky decomposition of the estimate, $\hat W^{1/2}$.

2. Select a grid over the parameter space $\alpha = [\theta'\ \lambda']'$. These will be the parameter values on which the approximation will be calculated.

3. Loop through the grid over the θ parameters.
   (a) Determine the spectral decomposition $C_i(\theta)$ for $i = 1, 2$.
      i. Calculate $M(\theta)$.
      ii. Calculate $\bar M(\theta) = \hat W^{1/2} M(\theta)$.
      iii. Calculate $P_{\bar M(\theta)} = \bar M(\theta)\left(\bar M(\theta)'\bar M(\theta)\right)^{-1}\bar M(\theta)'$.
      iv. Calculate $C_1(\theta)$ using the same fixed steps for every θ.
      v. Calculate $P^{\perp}_{\bar M(\theta)} = I - P_{\bar M(\theta)}$.
      vi. Calculate $C_2(\theta)$ using the same fixed steps for every θ.
   (b) Loop through the grid over the λ parameters.
      i. Calculate the estimation equation
         $$\Psi_N(\alpha) = \begin{bmatrix} C_1(\theta)'\,\hat W^{1/2}\, G_N(\theta)\\ \lambda - C_2(\theta)'\,\hat W^{1/2}\, G_N(\theta)\end{bmatrix}.$$
      ii. Solve the saddlepoint equation to determine $\beta(\alpha)$.
      iii. Calculate $\hat\kappa(\beta(\alpha))$, $\widehat E\left[\partial\psi(x,\alpha)/\partial\alpha'\right]$ and $\widehat E\left[\psi(x,\alpha)\psi(x,\alpha)'\right]$.
      iv. Evaluate the saddlepoint approximation for this value in the parameter space.
   (c) End the loop over λ.

4. End the loop over θ.

5. Scale the estimated density so that it integrates to one. (A schematic implementation of these steps is sketched below.)
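The sketch below assumes a scalar $\theta$ and a scalar $\lambda$ and reuses the earlier sketches `basis_C1_C2` and `esp_density_at`; `psi_obs` and `dpsi_obs` are hypothetical helpers that build $\psi(x_i,\alpha)$ and $\partial\psi(x_i,\alpha)/\partial\alpha'$ from the user's moment functions `g` and `dg`. It is schematic rather than a complete implementation.

```python
import numpy as np

def esp_grid(theta_grid, lam_grid, data, g, dg, W_sqrt):
    """Schematic outer loop over the (theta, lambda) grid (Steps 2-5)."""
    out = np.zeros((len(theta_grid), len(lam_grid)))
    for a, theta in enumerate(theta_grid):
        M_bar = W_sqrt @ dg(data, theta)        # Step 3(a): W^{1/2} dG_N/dtheta'
        C1, C2 = basis_C1_C2(M_bar)             # same fixed steps for every theta
        for b, lam in enumerate(lam_grid):      # Step 3(b)
            alpha = np.array([theta, lam])
            out[a, b] = esp_density_at(
                alpha,
                lambda al: psi_obs(data, al, C1, C2, W_sqrt, g),
                lambda al: dpsi_obs(data, al, C1, C2, W_sqrt, g, dg),
            )
    return out / out.sum()   # Step 5: normalize numerically (up to the grid cell area)
```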
7 On the spectral decomposition
The use of the spectral decomposition to determine the estimation equations raises two natural questions. Are the approximate densities well defined? What is its derivative with respect to θ? Because the spectral decomposition is not unique, it must be demonstrated that the approximations are invariant to the spectral decomposition selected. The following theorem provides the needed result.

Theorem 6. The saddlepoint approximation is invariant to the spectral decomposition of the tangent spaces.

This result shows that the saddlepoint density possesses a large amount of symmetry similar to the normal density. Inference with respect to the parameters of interest is invariant to reparameterization characterized by unitary transformations. Similarly, inference with respect to the statistics that test the validity of the overidentifying restrictions is invariant to unitary reparameterizations of the space of overidentifying restrictions.

The estimation equations are built on the matrices $C_{i,N}(\theta)$ for $i = 1, 2$ from the spectral decomposition of the projection matrix $P_{\bar M(\theta)}$. The proof of the asymptotic normality of the extended-GMM estimators and the empirical saddlepoint density require the differentiability of the estimation equations $\psi_i(\alpha)$. This in turn requires the differentiation of $C_{i,N}(\theta)$, i.e. the derivative of a spectral decomposition. This is demonstrated in the next theorem. The result and method of proof are of interest in their own right.

The presentation will be simplified with the use of the notation $D[B(\theta)]\delta$, where $D[\bullet]$ denotes taking the derivative with respect to θ of the $m\times m$ matrix in the square bracket that is post multiplied by an $m\times 1$ vector, say δ, e.g. $D[B(\theta)]\delta \equiv \partial(B(\theta)\delta)/\partial\theta'$ is an $m\times m$ matrix.

Theorem 7. Let $M(\theta)$ be an $m\times k$ matrix with full column rank for every value of θ. Let $C(\theta) = [C_1(\theta)\ C_2(\theta)]$ be the orthonormal bases of $R^m$ defined by the spectral decompositions
$$P_{M(\theta)} = M(\theta)\left(M(\theta)'M(\theta)\right)^{-1}M(\theta)' = C_1(\theta)C_1(\theta)'$$
and
$$P^{\perp}_{M(\theta)} = I_m - P_{M(\theta)} = C_2(\theta)C_2(\theta)'.$$
The derivatives of $C_i(\theta)'$, $i = 1, 2$, times the constant vector δ, with respect to θ are
$$D\left[C_1(\theta)'\right]\delta = C_1(\theta)'\,M(\theta)\left(M(\theta)'M(\theta)\right)^{-1}D\left[M(\theta)'\right]C_2(\theta)C_2(\theta)'\,\delta$$
and
$$D\left[C_2(\theta)'\right]\delta = -C_2(\theta)'\,D\left[M(\theta)\right]\left(M(\theta)'M(\theta)\right)^{-1}M(\theta)'\,\delta.$$
24
February 21, 2007
8
Final Issues
8.1
Implementation and Numerical issues
1. The Cholesky decomposition “algorithm” must be fixed for the evaluation of the empirical saddlepoint density. In Theorem 2 and Theorem 7, the invariance of the approximations to the selection of the Cholesky decomposition means that any of a continuum of decompositions can be used to determine the empirical saddlepoint density. However, once a Cholesky decomposition algorithm has been selected, it must be used the exact same way for every parameter value in A. The default numerical implementation of the Cholesky decomposition associated with a given software package is typically unacceptable. The problem is that, to achieve higher numerical precision most programs that calculate the Cholesky decomposition, will interchange rows (and/or columns). For a single decomposition, this causes no problem. The difficulty is that for a new parameter value, the resulting matrix may cause the program to use a different set of interchanges in the rows (and/or columns). This is unacceptable. The exact same steps (algorithm) must be used for each parameter in A. 2. The calculation of the saddlepoint density requires repeatedly solving systems of m nonlinear equations in m unknowns. Each value in the grid over the parameter space A requires the solution to a system of equations. Fortunately, the continuity of the density implies that the solutions from nearby parameter values will be excellent starting values for the nonlinear search. 3. The system of m equations in m unknowns has a structure that is similar to the structure of an exponential regression. Regardless of the structure of the original GMM moment conditions, the empirical saddlepoint equations have a simple structure that allows the analytic calculation of first and second derivatives. This results in significant time savings in the calculation of the empirical saddlepoint density. 4. For highly parameterized problems the number of calculations can become large. The saddlepoint for each value in the parameter space can be solved independently. This structure is ideal for distributed computing with multiple processors.
8.2 Extensions
The local parameterization of the overidentifying space and the associated extended-GMM objective function have applications beyond the saddlepoint approximation. The S-sets of Stock and Wright (2000) can be applied to the estimation equations to provide confidence intervals that include the parameters that test the validity of the overidentifying restrictions. Because the estimation equations form a just-identified system, the associated S-sets will never be empty.
When there is a unique local minimum, the formulas of Lugannani and Rice (1980) can be used to evaluate tail probabilities more efficiently. Recent results in saddlepoint estimation (see Robinson, Ronchetti and Young (2003) and Ronchetti and Trojani (2003)) use the saddlepoint parameters $\hat\beta_N(\alpha)$ to test hypotheses about the parameters $\theta$. It should be possible to extend this to new tests of the validity of the overidentifying restrictions, i.e. to use $\hat\beta_N(\alpha)$ to test the hypothesis $H_0: \lambda_0 = 0$.

The saddlepoint approximations need to be extended to models with dependent data.

Recent work suggests that parameter estimates using moment conditions generated from minimum divergence criteria have better small sample properties. The first-order conditions from these estimation problems can be used to create estimation equations suitable for saddlepoint approximations. This will replace the CLT-based approximations with saddlepoint approximations and provide more accurate inference in small samples.
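For orientation only, and stated for the scalar i.i.d.-mean case rather than the paper's multivariate setting: with cumulant generating function $K$, sample size $n$, and saddlepoint $\hat s$ solving $K'(\hat s) = x$, the Lugannani and Rice (1980) tail approximation has the form
$$
P\left(\bar X \ge x\right) \approx 1 - \Phi(\hat w) + \phi(\hat w)\left(\frac{1}{\hat u} - \frac{1}{\hat w}\right),
\qquad
\hat w = \operatorname{sign}(\hat s)\sqrt{2n\left(\hat s x - K(\hat s)\right)},
\quad
\hat u = \hat s\sqrt{n K''(\hat s)},
$$
where $\Phi$ and $\phi$ are the standard normal distribution and density functions. Its appeal in the present context is that a tail probability is obtained directly from the solved saddlepoint, without numerically integrating the approximate density.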
9 Theorems

9.1 Theorem 1: Asymptotic distribution
Proof: To reduce notation, the dependence of the matrices $C_{i,N}(\theta)$ on the sample size is suppressed. The first-order conditions for the extended-GMM estimator are
$$
\Psi_N(\hat\theta, \hat\lambda) = \begin{bmatrix} C_1(\hat\theta)' W_N^{1/2} G_N(\hat\theta) \\ \hat\lambda - C_2(\hat\theta)' W_N^{1/2} G_N(\hat\theta) \end{bmatrix} = 0.
$$
Expand the estimation equations about the population parameter values and evaluate at the parameter estimates:
$$
\Psi_N(\hat\theta, \hat\lambda) = \Psi_N(\theta_0, \lambda_0)
+ \begin{bmatrix} \frac{\partial \Psi_N(\theta_0,\lambda_0)}{\partial\theta} & \frac{\partial \Psi_N(\theta_0,\lambda_0)}{\partial\lambda} \end{bmatrix}
\begin{bmatrix} \hat\theta - \theta_0 \\ \hat\lambda \end{bmatrix} + O_p\!\left(N^{-1}\right).
$$
The first-order conditions imply that the left-hand side is zero. For sufficiently large $N$ it is possible to solve for the parameters, which gives
$$
\begin{bmatrix} \hat\theta - \theta_0 \\ \hat\lambda \end{bmatrix}
= -\begin{bmatrix} \frac{\partial \Psi_N(\theta_0,\lambda_0)}{\partial\theta} & \frac{\partial \Psi_N(\theta_0,\lambda_0)}{\partial\lambda} \end{bmatrix}^{-1}\Psi_N(\theta_0,\lambda_0) + O_p\!\left(N^{-1}\right)
= -\begin{bmatrix} \operatorname{plim}_{N\to\infty}\frac{\partial \Psi_N(\theta_0,\lambda_0)}{\partial\theta} & \operatorname{plim}_{N\to\infty}\frac{\partial \Psi_N(\theta_0,\lambda_0)}{\partial\lambda} \end{bmatrix}^{-1}\Psi_N(\theta_0,\lambda_0) + O_p\!\left(N^{-1}\right).
$$
The $\theta$ parameters enter the first-order conditions through both $G_N(\theta)$ and $C_i(\theta)$ for $i = 1, 2$. The product rule implies
$$
\frac{\partial\, C_i(\theta)'W^{1/2}G_N(\theta)}{\partial\theta'} = C_i(\theta)'W^{1/2}M_N(\theta) + D\!\left[C_i(\theta)'\right]W^{1/2}G_N(\theta).
$$
The form of $D[C_i(\theta)']$ is explicitly given in Theorem 7. Sufficient assumptions are made to ensure each term is finite in a neighborhood of $\theta_0$; hence the law of large numbers implies $\operatorname{plim}_{N\to\infty} D[C_i(\theta)'] W_N^{1/2} G_N(\theta) = 0$. Using this result and substituting terms,
$$
\begin{bmatrix} \hat\theta - \theta_0 \\ \hat\lambda \end{bmatrix}
= -\begin{bmatrix} C_1(\theta_0)'\Sigma_g^{-1/2}M(\theta_0) & 0 \\ -C_2(\theta_0)'\Sigma_g^{-1/2}M(\theta_0) & I \end{bmatrix}^{-1}
\begin{bmatrix} C_1(\theta_0)'\Sigma_g^{-1/2}G_N(\theta_0) \\ C_2(\theta_0)'\Sigma_g^{-1/2}G_N(\theta_0) \end{bmatrix} + O_p\!\left(N^{-1}\right)
$$
$$
= -\begin{bmatrix} \left(C_1(\theta_0)'\Sigma_g^{-1/2}M(\theta_0)\right)^{-1} & 0 \\ -C_2(\theta_0)'\Sigma_g^{-1/2}M(\theta_0)\left(C_1(\theta_0)'\Sigma_g^{-1/2}M(\theta_0)\right)^{-1} & I \end{bmatrix}
\begin{bmatrix} C_1(\theta_0)'\Sigma_g^{-1/2}G_N(\theta_0) \\ C_2(\theta_0)'\Sigma_g^{-1/2}G_N(\theta_0) \end{bmatrix} + O_p\!\left(N^{-1}\right)
$$
$$
= -\begin{bmatrix} \left(C_1(\theta_0)'\Sigma_g^{-1/2}M(\theta_0)\right)^{-1}C_1(\theta_0)'\Sigma_g^{-1/2}G_N(\theta_0) \\ -C_2(\theta_0)'\Sigma_g^{-1/2}M(\theta_0)\left(C_1(\theta_0)'\Sigma_g^{-1/2}M(\theta_0)\right)^{-1}C_1(\theta_0)'\Sigma_g^{-1/2}G_N(\theta_0) + C_2(\theta_0)'\Sigma_g^{-1/2}G_N(\theta_0) \end{bmatrix} + O_p\!\left(N^{-1}\right)
$$
$$
= -\begin{bmatrix} \left(C_1(\theta_0)'\Sigma_g^{-1/2}M(\theta_0)\right)^{-1}C_1(\theta_0)'\Sigma_g^{-1/2}G_N(\theta_0) \\ -C_2(\theta_0)'\left(\Sigma_g^{-1/2}M(\theta_0)\left(C_1(\theta_0)'\Sigma_g^{-1/2}M(\theta_0)\right)^{-1}C_1(\theta_0)' - I\right)\Sigma_g^{-1/2}G_N(\theta_0) \end{bmatrix} + O_p\!\left(N^{-1}\right). \qquad (1)
$$
The first $k$ elements of (1) can be written
$$
-\left(C_1(\theta_0)'\Sigma_g^{-1/2}M(\theta_0)\right)^{-1}C_1(\theta_0)'\Sigma_g^{-1/2}G_N(\theta_0)
= -\left(C_1(\theta_0)'\Sigma_g^{-1/2}M(\theta_0)\right)^{-1}C_1(\theta_0)'P_{M(\theta_0)}\Sigma_g^{-1/2}G_N(\theta_0)
$$
$$
= -\left(C_1(\theta_0)'\Sigma_g^{-1/2}M(\theta_0)\right)^{-1}C_1(\theta_0)'M(\theta_0)\left(M(\theta_0)'M(\theta_0)\right)^{-1}M(\theta_0)'\Sigma_g^{-1/2}G_N(\theta_0)
$$
$$
= -\left(C_1(\theta_0)'\Sigma_g^{-1/2}M(\theta_0)\right)^{-1}C_1(\theta_0)'\Sigma_g^{-1/2}M(\theta_0)\left(M(\theta_0)'M(\theta_0)\right)^{-1}M(\theta_0)'\Sigma_g^{-1/2}G_N(\theta_0)
$$
$$
= -\left(M(\theta_0)'M(\theta_0)\right)^{-1}M(\theta_0)'\Sigma_g^{-1/2}G_N(\theta_0).
$$
The last $(m-k)$ elements of (1) can be written
$$
C_2(\theta_0)'\left(\left(P_{M(\theta_0)}^{\perp} + P_{M(\theta_0)}\right)\Sigma_g^{-1/2}M(\theta_0)\left(C_1(\theta_0)'\Sigma_g^{-1/2}M(\theta_0)\right)^{-1}C_1(\theta_0)' - I\right)\Sigma_g^{-1/2}G_N(\theta_0) + O_p\!\left(N^{-1}\right)
$$
$$
= C_2(\theta_0)'\left(P_{M(\theta_0)}\Sigma_g^{-1/2}M(\theta_0)\left(C_1(\theta_0)'\Sigma_g^{-1/2}M(\theta_0)\right)^{-1}C_1(\theta_0)' - I\right)\Sigma_g^{-1/2}G_N(\theta_0) + O_p\!\left(N^{-1}\right)
$$
$$
= C_2(\theta_0)'\left(C_1(\theta_0)C_1(\theta_0)'\Sigma_g^{-1/2}M(\theta_0)\left(C_1(\theta_0)'\Sigma_g^{-1/2}M(\theta_0)\right)^{-1}C_1(\theta_0)' - I\right)\Sigma_g^{-1/2}G_N(\theta_0) + O_p\!\left(N^{-1}\right)
$$
$$
= C_2(\theta_0)'\left(C_1(\theta_0)C_1(\theta_0)' - I\right)\Sigma_g^{-1/2}G_N(\theta_0) + O_p\!\left(N^{-1}\right)
= C_2(\theta_0)'\left(P_{M(\theta_0)} - I\right)\Sigma_g^{-1/2}G_N(\theta_0) + O_p\!\left(N^{-1}\right)
$$
$$
= -C_2(\theta_0)'P_{M(\theta_0)}^{\perp}\Sigma_g^{-1/2}G_N(\theta_0) + O_p\!\left(N^{-1}\right)
= -C_2(\theta_0)'C_2(\theta_0)C_2(\theta_0)'\Sigma_g^{-1/2}G_N(\theta_0) + O_p\!\left(N^{-1}\right)
= -C_2(\theta_0)'\Sigma_g^{-1/2}G_N(\theta_0) + O_p\!\left(N^{-1}\right).
$$
Combining these gives
$$
\begin{bmatrix} \hat\theta - \theta_0 \\ \hat\lambda \end{bmatrix}
= \begin{bmatrix} -\left(M(\theta_0)'M(\theta_0)\right)^{-1}M(\theta_0)'\Sigma_g^{-1/2}G_N(\theta_0) \\ -C_2(\theta_0)'\Sigma_g^{-1/2}G_N(\theta_0) \end{bmatrix} + O_p\!\left(N^{-1}\right).
$$
Multiply both sides by $\sqrt{N}$ and use the CLT to obtain the covariance. $\blacksquare$
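Spelled out (a restatement under the normalization $\sqrt{N}\,\Sigma_g^{-1/2}G_N(\theta_0) \xrightarrow{d} N(0, I_m)$, assumed here to match how $\Sigma_g$ enters the display above, together with $M(\theta_0)'C_2(\theta_0) = 0$), the representation just derived implies
$$
\sqrt{N}\begin{bmatrix} \hat\theta - \theta_0 \\ \hat\lambda \end{bmatrix}
\xrightarrow{d} N\!\left(0,\; \begin{bmatrix} \left(M(\theta_0)'M(\theta_0)\right)^{-1} & 0 \\ 0 & I_{m-k} \end{bmatrix}\right),
$$
so the parameter estimates and the statistics that test the overidentifying restrictions are asymptotically independent, and $\sqrt{N}\hat\lambda$ is asymptotically standard normal, as used in the proof of Theorem 2.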
9.2 Theorem 2: Invariance of asymptotic distribution
Proof: The derivation of the estimation equations required the calculation of a spectral decomposition of the projection matrix
$$
P_{M(\theta),N} = C\Lambda C' = \begin{bmatrix} C_{1,N}(\theta) & C_{2,N}(\theta) \end{bmatrix}
\begin{bmatrix} I_k & 0 \\ 0 & 0_{(m-k)} \end{bmatrix}
\begin{bmatrix} C_{1,N}(\theta)' \\ C_{2,N}(\theta)' \end{bmatrix}.
$$
The spectral decomposition is not unique. Hence it is necessary to show that the final results are invariant to the decomposition that is selected. Moment conditions with overidentifying restrictions are extended to a just-identified set of estimation equations where the parameter space is augmented with a set of parameters that span the space of overidentifying restrictions. The estimated values of these new parameters are statistics that test the validity of the overidentifying restrictions. In Theorem 1 the just-identified system of estimation equations is expanded to derive the joint first-order asymptotic distribution for both the parameters and the statistics that test the validity of the overidentifying restrictions. This shows that the statistics that test the overidentifying restrictions are asymptotically independent of the parameters of interest. This implies that the proof can be established by showing the result separately for the parameters of interest and for the statistics that test the validity of the overidentifying restrictions. Because the asymptotic distribution is normal, the parameters that model the overidentifying restrictions enter only through their inner product. This means that the same probability statements are obtained for any selection of the diagonalization of the projection matrix onto the space of overidentifying restrictions.

The key result is that two different decompositions are related by an $m \times m$ unitary matrix, say $U$. Recall that for a unitary matrix $U$, $U'U = I_m$ and $|U| = 1$. Consider two different decompositions $C = \begin{bmatrix} C_1(\theta) & C_2(\theta) \end{bmatrix}$ and $D = \begin{bmatrix} D_1(\theta) & D_2(\theta) \end{bmatrix}$. For $P_M = C_1(\theta)C_1(\theta)'$ and $P_M^{\perp} = C_2(\theta)C_2(\theta)'$, the estimation equations are
$$
\begin{bmatrix} C_1(\theta)'W^{1/2}G(\theta) \\ \lambda - C_2(\theta)'W^{1/2}G(\theta) \end{bmatrix}
$$
and the parameters for the overidentifying space are $\lambda = C_2(\theta)'W^{1/2}G(\theta)$. For the diagonalization $P_M = D_1(\theta)D_1(\theta)'$ and $P_M^{\perp} = D_2(\theta)D_2(\theta)'$, the estimation equations are
$$
\begin{bmatrix} D_1(\theta)'W^{1/2}G(\theta) \\ L - D_2(\theta)'W^{1/2}G(\theta) \end{bmatrix}
$$
and the parameters for the overidentifying space are $L = D_2(\theta)'W^{1/2}G(\theta)$.
First consider the parameters of interest. The asymptotic distribution of $\theta$ does not depend on the decomposition selected and is independent of the statistics that test the validity of the overidentifying restrictions. Hence the decomposition selected for the estimation equations does not affect the asymptotic distribution for $\theta$.

Finally consider the statistics that test the validity of the overidentifying restrictions. The parameters $\lambda$ and $L$ are linearly related:
$$
\lambda = C_2(\theta)'W^{1/2}G(\theta) = C_2(\theta)'P_M^{\perp}W^{1/2}G(\theta) = C_2(\theta)'D_2(\theta)D_2(\theta)'W^{1/2}G(\theta) = C_2(\theta)'D_2(\theta)L.
$$
Note that the $(m-k)\times(m-k)$ matrix $C_2(\theta)'D_2(\theta)$ is unitary:
$$
\left(C_2(\theta)'D_2(\theta)\right)'C_2(\theta)'D_2(\theta) = D_2(\theta)'C_2(\theta)C_2(\theta)'D_2(\theta) = D_2(\theta)'P_M^{\perp}D_2(\theta) = D_2(\theta)'D_2(\theta) = I_{m-k}.
$$
To transform from the density for $\lambda$ to the density for $L$, the Jacobian of the transformation is $d\lambda = \left|C_2(\theta)'D_2(\theta)\right| dL = dL$, because $C_2(\theta)'D_2(\theta)$ is unitary and its determinant is one. The asymptotic distribution for $\theta$ does not depend on the diagonalization, is independent of $\lambda$, and hence is invariant to the diagonalization. The asymptotic distribution for $\lambda$ is standard normal, hence
$$
(2\pi)^{-\frac{m-k}{2}}\exp\left\{\frac{-\lambda'\lambda}{2}\right\}
= (2\pi)^{-\frac{m-k}{2}}\exp\left\{\frac{-\lambda' I_{m-k}\,\lambda}{2}\right\}
= (2\pi)^{-\frac{m-k}{2}}\exp\left\{\frac{-\lambda'\left(C_2(\theta)'D_2(\theta)\right)\left(C_2(\theta)'D_2(\theta)\right)'\lambda}{2}\right\}
= (2\pi)^{-\frac{m-k}{2}}\exp\left\{\frac{-L'L}{2}\right\}.
$$
The last form is the asymptotic distribution for $L$. Hence, the same probability statements are made regardless of the decomposition. Testing the overidentifying restrictions with either $N\lambda'\lambda$ or $NL'L$ gives the same test statistic value and the same p-value from the asymptotic distribution. $\blacksquare$
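The invariance argument is easy to illustrate numerically: any two orthonormal bases of the space of overidentifying restrictions give $\lambda$ vectors related by a unitary map, so $N\lambda'\lambda$ is unchanged. A minimal sketch with hypothetical numbers (not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
m, k = 4, 2
M = rng.normal(size=(m, k))            # full-column-rank "Jacobian"
g = rng.normal(size=m)                 # plays the role of W^{1/2} G(theta)

# Two orthonormal bases C2, D2 of the orthogonal complement of col(M):
# one from a full QR decomposition, the other obtained by rotating it.
Q, _ = np.linalg.qr(M, mode='complete')
C2 = Q[:, k:]
R, _ = np.linalg.qr(rng.normal(size=(m - k, m - k)))   # arbitrary orthogonal rotation
D2 = C2 @ R

lam = C2.T @ g                         # overidentification parameters under C
L = D2.T @ g                           # overidentification parameters under D

print(np.allclose(lam @ lam, L @ L))   # True: the quadratic form is identical
print(np.allclose(C2.T @ D2 @ L, lam)) # True: lam = (C2'D2) L, a unitary map
```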
9.3 Theorem 3: The Saddlepoint approximation
"Proof" The steps of the proof are given in Almudevar, Field and Robinson (2000). What is demonstrated below is that the 8 assumptions made in Almudevar, Field and Robinson (2000) are satisfied by Assumptions 1'-12'. There is one minor change between the current model and the model considered in Almudevar, Field and Robinson (2000). The change concerns the need to restrict attention to only those solutions to the estimation equations associated with the local minima of the original GMM objective function. In Almudevar, Field and Robinson (2000) the density is for all solutions to the estimation equations and hence includes local maxima of the original GMM objective function. This additional restriction is satisfied by setting $\Psi^*(X,\theta)$ and $z(\theta_0,\tau)$ to infinity if $\Psi'(X,\theta)$ is not positive definite. With this change, the same steps of the proof apply.

A1 Assumption A1 in Almudevar, Field and Robinson (2000) is satisfied by Assumption 2', Assumption 5' and Assumption 6'.1. Assumption A1 in Almudevar, Field and Robinson (2000) is stated in terms of the estimation equations and Assumption 2' is stated in terms of the moment conditions. The estimation equations are linear functions of the moment conditions. Assumption 2', Assumption 5' and Assumption 6'.1 ensure that the coefficients in the linear function are finite. The diffeomorphism defined by $M(\theta^\circ)\theta$ and $C_1(\theta^\circ)t$ ensures the differentiability of the estimation equations.

A2 Assumption A2 in Almudevar, Field and Robinson (2000) is satisfied by Assumption 7', Assumption 5' and Assumption 6'.1. Assumption A2 in Almudevar, Field and Robinson (2000) is stated in terms of the estimation equations and Assumption 7' is stated in terms of the moment conditions. The estimation equations are linear functions of the moment conditions. Assumption 2', Assumption 5' and Assumption 6'.1 ensure that the coefficients in the linear function are differentiable and hence the density for the estimation equations can be obtained by a Jacobian transformation.

A3 Assumption A3 in Almudevar, Field and Robinson (2000) is satisfied by Assumption 5', Assumption 6'.1 and Assumption 6'.2. Assumption A3 in Almudevar, Field and Robinson (2000) is stated in terms of the estimation equations and Assumption 6'.2 is stated in terms of the moment conditions. The estimation equations are linear functions of the moment conditions. Assumption 2', Assumption 5' and Assumption 6'.1 ensure that the coefficients in the linear function are finite.
A4 Assumption A4 in Almudevar, Field and Robinson (2000) is satisfied by Assumption 8'. Assumption A4 in Almudevar, Field and Robinson (2000) is stated in terms of the estimation equations and Assumption 8' is stated in terms of the moment conditions. The estimation equations are linear functions of the moment conditions. Assumption 2', Assumption 5' and Assumption 6'.1 ensure that the coefficients in the linear function are finite and differentiable. Hence the restriction on the moment conditions implies the restriction on the estimation equations.

A5 Assumption A5 in Almudevar, Field and Robinson (2000) is satisfied by Assumption 9'. Assumption A5 in Almudevar, Field and Robinson (2000) is stated in terms of the estimation equations and Assumption 9' is stated in terms of the moment conditions. The estimation equations are linear functions of the moment conditions. Assumption 2', Assumption 5' and Assumption 6'.1 ensure that the coefficients in the linear function are finite and invertible. Hence the restriction on the moment conditions implies the restriction on the estimation equations.

A6 Assumption A6 in Almudevar, Field and Robinson (2000) is stated in terms of the estimation equations and Assumption 10' is stated in terms of the moment conditions. The estimation equations are linear functions of the moment conditions. Assumption 2', Assumption 5' and Assumption 6'.1 ensure that the coefficients in the linear function are finite and invertible. The existence of $\beta$ for the moment conditions implies the existence of a $\tilde\beta$ for the estimation equations:
$$
\beta'g_i(\theta) = \beta'W^{1/2}W^{-1/2}g_i(\theta) = \beta'W^{1/2}I_m W^{-1/2}g_i(\theta)
= \beta'W^{1/2}\left(P_{M(\theta)} + P_{M(\theta)}^{\perp}\right)W^{-1/2}g_i(\theta)
$$
$$
= \beta'W^{1/2}P_{M(\theta)}W^{-1/2}g_i(\theta) + \beta'W^{1/2}P_{M(\theta)}^{\perp}W^{-1/2}g_i(\theta)
= \beta'W^{1/2}C_1(\theta)C_1(\theta)'W^{-1/2}g_i(\theta) + \beta'W^{1/2}C_2(\theta)C_2(\theta)'W^{-1/2}g_i(\theta)
$$
$$
= \beta'W^{1/2}C_1(\theta)C_1(\theta)'W^{-1/2}g_i(\theta) - \beta'W^{1/2}C_2(\theta)\left(\lambda - C_2(\theta)'W^{-1/2}g_i(\theta)\right) + \beta'W^{1/2}C_2(\theta)\lambda
$$
$$
= \beta'W^{1/2}\begin{bmatrix} C_1(\theta) & -C_2(\theta) \end{bmatrix}\psi_i(\theta) + \beta'W^{1/2}C_2(\theta)\lambda.
$$
The existence of $\beta'$ for the moment conditions implies that $\tilde\beta' \equiv \beta'W^{1/2}\begin{bmatrix} C_1(\theta) & -C_2(\theta) \end{bmatrix}$ holds for the estimation equations. The term $\beta'W^{1/2}C_2(\theta)\lambda$ does not depend on $x$.

A7 Assumption A7 in Almudevar, Field and Robinson (2000) is stated in terms of the estimation equations and Assumption 11' is stated in terms of the moment conditions. The estimation equations are linear functions of the moment conditions. Assumption 2', Assumption 5' and Assumption 6'.1 ensure
that the coefficients in the linear function are finite and differentiable. Hence the restriction on the moment conditions implies the restriction on the estimation equations.

A8 Assumption A8 in Almudevar, Field and Robinson (2000) is stated in terms of the estimation equations and Assumption 12' is stated in terms of the moment conditions. The estimation equations are linear functions of the moment conditions. Assumption 2', Assumption 5' and Assumption 6'.1 ensure that the coefficients in the linear function are finite. Hence the restriction on the moment conditions implies the restriction on the estimation equations. $\blacksquare$
9.4 Theorem 4: Empirical Saddlepoint approximation
"Proof" The steps of the proof are given in Ronchetti and Welsh (1994). What is demonstrated below is that the 5 assumptions made in Ronchetti and Welsh (1994) are satisfied by Assumptions 1'-12'. There are two minor changes between the current model and the model considered in Ronchetti and Welsh (1994). The first change is to set the saddlepoint approximation to zero if the empirical saddlepoint equation does not have a solution. This is achieved by including an indicator function for the existence of a solution to the empirical saddlepoint equation. The second change is to allow for multiple solutions to the estimation equations. This is achieved by applying the result in Ronchetti and Welsh (1994) to each solution of the estimation equations that is associated with a minimum of the original GMM objective function.

(a) In Ronchetti and Welsh (1994) it is assumed that there is only one solution to the estimation equations. In this paper multiple solutions are allowed. The needed generalization of Assumption (a) in Ronchetti and Welsh (1994) is that each local minimum of the extended-GMM objective function associated with a local minimum of the original GMM objective function converges at rate $N^{-1/2}$ to an element of $T$. This "root-$N$ consistency" is implied by Assumption 1', Assumption 2' and Assumption 3'.

(b) Assumption (b) in Ronchetti and Welsh (1994) is satisfied by Assumption 6'.3 and Assumption 10'. Assumption (b) in Ronchetti and Welsh (1994) is stated in terms of the estimation equations, as is Assumption 10'; Assumption 6'.3 is stated in terms of the moment conditions. The estimation equations are linear functions of the moment conditions. Assumption 2', Assumption 5' and Assumption 6'.1 ensure that the coefficients in the linear function are finite. The existence of $\beta$ for the moment conditions implies the existence of a $\tilde\beta$ for the estimation equations:
$$
\beta'g_i(\theta) = \beta'W^{1/2}W^{-1/2}g_i(\theta) = \beta'W^{1/2}I_m W^{-1/2}g_i(\theta)
= \beta'W^{1/2}\left(P_{M(\theta)} + P_{M(\theta)}^{\perp}\right)W^{-1/2}g_i(\theta)
$$
$$
= \beta'W^{1/2}P_{M(\theta)}W^{-1/2}g_i(\theta) + \beta'W^{1/2}P_{M(\theta)}^{\perp}W^{-1/2}g_i(\theta)
= \beta'W^{1/2}C_1(\theta)C_1(\theta)'W^{-1/2}g_i(\theta) + \beta'W^{1/2}C_2(\theta)C_2(\theta)'W^{-1/2}g_i(\theta)
$$
$$
= \beta'W^{1/2}C_1(\theta)C_1(\theta)'W^{-1/2}g_i(\theta) - \beta'W^{1/2}C_2(\theta)\left(\lambda - C_2(\theta)'W^{-1/2}g_i(\theta)\right) + \beta'W^{1/2}C_2(\theta)\lambda
$$
$$
= \beta'W^{1/2}\begin{bmatrix} C_1(\theta) & -C_2(\theta) \end{bmatrix}\psi_i(\theta) + \beta'W^{1/2}C_2(\theta)\lambda.
$$
The existence of $\beta'$ for the moment conditions implies that $\tilde\beta' \equiv \beta'W^{1/2}\begin{bmatrix} C_1(\theta) & -C_2(\theta) \end{bmatrix}$ holds for the estimation equations. The term $\beta'W^{1/2}C_2(\theta)\lambda$ does not depend on $x$.

(c) Assumption (c) in Ronchetti and Welsh (1994) is satisfied by Assumption 6'.2 and Assumption 5'. Assumption (c) in Ronchetti and Welsh (1994) is stated in terms of the estimation equations and Assumption 6'.2 is stated in terms of the moment conditions. The estimation equations are linear functions of the moment conditions. Assumption 2', Assumption 5' and Assumption 6'.1 ensure that the coefficients in the linear function are finite.

(d) Assumption (d) in Ronchetti and Welsh (1994) is satisfied by Assumption 2' and Assumption 5'. Assumption (d) in Ronchetti and Welsh (1994) is stated in terms of the estimation equations and Assumption 2' is stated in terms of the moment conditions. The estimation equations are linear functions of the moment conditions. Assumption 2', Assumption 5' and Assumption 6'.1 ensure that the coefficients in the linear function are finite. The structure of the derivative of the coefficients with respect to the parameters is given in Theorem 7.

(e) Assumption (e) in Ronchetti and Welsh (1994) is satisfied by Assumption 6'.4, Assumption 5' and Assumption 2'. Note that Assumption (e) in Ronchetti and Welsh (1994) is stated in terms of the estimation equations and Assumption 6'.4 is stated in terms of the moment conditions. The estimation equations are linear functions of the moment conditions. Assumption 2', Assumption 5' and Assumption 6'.1 ensure that the coefficients in the linear function are finite. The structure of the derivative of the coefficients with respect to the parameters is given in Theorem 7. $\blacksquare$
9.5 Theorem 5: Empirical Saddlepoint Density Structure
Proof: The empirical saddlepoint theorem gives the density as
$$
K_N \times I_{pos}\left(Q_N(\theta)\right) \times I\left(\beta(\alpha)\right) \times
\left(\frac{N}{2\pi}\right)^{\frac{m}{2}}
\left| E_C\!\left[\frac{\partial\psi(\alpha)'}{\partial\alpha}\right]\left(E_C\left[\psi(\beta)\psi(\beta)'\right]\right)^{-1}E_C\!\left[\frac{\partial\psi(\alpha)}{\partial\alpha'}\right]\right|^{-\frac12}
\exp\left\{N\kappa\left(\beta(\alpha)\right)\right\}
$$
$$
= K_N \times I_{pos}\left(Q_N(\theta)\right) \times I\left(\beta(\alpha)\right) \times
\left(\frac{1}{2\pi}\right)^{\frac{m}{2}}
\left| \frac{E_C\!\left[\frac{\partial\psi(\alpha)'}{\partial\alpha}\right]\left(E_C\left[\psi(\beta)\psi(\beta)'\right]\right)^{-1}E_C\!\left[\frac{\partial\psi(\alpha)}{\partial\alpha'}\right]}{N}\right|^{-\frac12}
\exp\left\{N\kappa\left(\beta(\alpha)\right)\right\},
$$
where the expectations $E_C$ are with respect to the conjugate density. However, the cumulant generating function is calculated with the empirical distribution, i.e. $E_e$. The proof will be completed by showing that
$$
\exp\left\{N\kappa\left(\beta(\alpha)\right)\right\}
= \exp\left\{-\frac{N}{2}\left(\alpha - \alpha_*\right)' E_e\!\left[\frac{\partial\psi(\alpha)'}{\partial\alpha}\right]\left(E_e\left[\psi(\beta)\psi(\beta)'\right]\right)^{-1}E_e\!\left[\frac{\partial\psi(\alpha)}{\partial\alpha'}\right]\left(\alpha - \alpha_*\right)\right\}
\times\left(1 + O\!\left(N^{-\frac12}\right)\right),
$$
where $\alpha_*$ denotes the local minimum associated with $\alpha$. This will be achieved by showing that the cumulant generating function can be written
$$
\kappa\left(\beta(\alpha)\right) = -\frac12\left(\alpha - \alpha_*\right)' E_e\!\left[\frac{\partial\psi(\alpha)'}{\partial\alpha}\right]\left(E_e\left[\psi(\beta)\psi(\beta)'\right]\right)^{-1}E_e\!\left[\frac{\partial\psi(\alpha)}{\partial\alpha'}\right]\left(\alpha - \alpha_*\right) + O\!\left(N^{-\frac32}\right).
$$
The cumulant generating function for $\psi(\alpha)$ is $\kappa(\beta) = \ln\left(E_e\left[\exp\{\beta'\psi(\alpha)\}\right]\right) = \ln\left(M(\beta)\right)$, where $M(\beta)$ is the moment generating function of $\psi(\alpha)$ under the distribution of the observed data. Expand this in a three-term Taylor series about $\beta = \beta(\alpha_*) = 0$. Note
that
$$
\kappa(\beta)\big|_{\beta=0} = \ln\left(M(0)\right) = 0,
\qquad
\kappa'(\beta)\big|_{\beta=0} = \frac{M'(0)}{M(0)} = E_e\left[\psi(\alpha)\right],
$$
$$
\kappa''(\beta)\big|_{\beta=0} = \frac{M''(0)M(0) - M'(0)M'(0)'}{M(0)^2} = E_e\left[\psi(\alpha)\psi(\alpha)'\right] - E_e\left[\psi(\alpha)\right]E_e\left[\psi(\alpha)\right]'.
$$
The expansion can be written
$$
\kappa(\beta) = \beta' E_e\left[\psi(\alpha)\right] + \frac12\beta'\left(E_e\left[\psi(\alpha)\psi(\alpha)'\right] - E_e\left[\psi(\alpha)\right]E_e\left[\psi(\alpha)\right]'\right)\beta + O\!\left(|\beta|^3\right).
$$
Linearize the saddlepoint equation about $\beta = 0$. Recall the expectation is with respect to the distribution of the observed data:
$$
E_e\left[\psi(\alpha)\exp\{\beta'\psi(\alpha)\}\right] = E_e\left[\psi(\alpha)\right] + E_e\left[\psi(\alpha)\psi(\alpha)'\right]\beta + O\!\left(|\beta|^2\right).
$$
Evaluate this at $\beta(\alpha)$ and solve for
$$
\beta(\alpha) = -\left(E_e\left[\psi(\alpha)\psi(\alpha)'\right]\right)^{-1}E_e\left[\psi(\alpha)\right] + O\!\left(|\beta(\alpha)|^2\right)
$$
and substitute into the expansion:
$$
\kappa\left(\beta(\alpha)\right) = -E_e\left[\psi(\alpha)\right]'\left(E_e\left[\psi(\alpha)\psi(\alpha)'\right]\right)^{-1}E_e\left[\psi(\alpha)\right]
+ \frac12 E_e\left[\psi(\alpha)\right]'\left(E_e\left[\psi(\alpha)\psi(\alpha)'\right]\right)^{-1}E_e\left[\psi(\alpha)\right]
$$
$$
- \frac12\left(E_e\left[\psi(\alpha)\right]'\left(E_e\left[\psi(\alpha)\psi(\alpha)'\right]\right)^{-1}E_e\left[\psi(\alpha)\right]\right)^2
+ O\!\left(|\beta|^3\right) + O\!\left(|\beta(\alpha)|^2\left|E_e\left[\psi(\alpha)\right]\right|\right).
$$
Expand the estimation equation about $\alpha$ and evaluate the expansion at the closest local minimum that is connected by a path that is always positive definite (if such a path does not exist, set the approximate density to zero); denote this local minimum by $\alpha_*$:
$$
\Psi(\alpha_*) = \Psi(\alpha) + \frac{\partial\Psi(\alpha)}{\partial\alpha'}\left(\alpha_* - \alpha\right) + O\!\left(|\alpha - \alpha_*|^2\right).
$$
Because $\alpha_*$ is a local minimum, the estimation equations are satisfied there. This implies
$$
\Psi(\alpha) = -\frac{\partial\Psi(\alpha)}{\partial\alpha'}\left(\alpha_* - \alpha\right) + O\!\left(|\alpha - \alpha_*|^2\right).
$$
Take expectations with respect to the empirical distribution,
$$
E_e\left[\psi(\alpha)\right] = -E_e\!\left[\frac{\partial\psi(\alpha)}{\partial\alpha'}\right]\left(\alpha_* - \alpha\right) + O\!\left(|\alpha - \alpha_*|^2\right),
$$
and substitute into the expansion:
$$
\kappa\left(\beta(\alpha)\right) = -\frac12\left(\alpha_* - \alpha\right)' E_e\!\left[\frac{\partial\psi(\alpha)'}{\partial\alpha}\right]\left(E_e\left[\psi(\alpha)\psi(\alpha)'\right]\right)^{-1}E_e\!\left[\frac{\partial\psi(\alpha)}{\partial\alpha'}\right]\left(\alpha_* - \alpha\right)
+ O\!\left(|\beta|^3\right) + O\!\left(|\beta(\alpha)|^2|\alpha - \alpha_*|\right) + O\!\left(|\alpha - \alpha_*|^3\right).
$$
Because attention is restricted to a $\sqrt{N}$ neighborhood of $\alpha_*$ and because $\beta(\alpha)$ is continuous with $\beta(\alpha_*) = 0$, all the error terms can be collected into a single $O\!\left(N^{-\frac32}\right)$ term. As was to be shown. $\blacksquare$
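The Taylor coefficients used above ($\kappa(0) = 0$, $\kappa'(0) = E_e[\psi]$, $\kappa''(0) = \operatorname{Var}_e[\psi]$) can be confirmed numerically by differencing an empirical cumulant generating function. A minimal sketch with made-up scalar data (purely illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
psi = rng.normal(loc=0.3, scale=1.5, size=5000)   # hypothetical scalar psi_i values

def kappa(beta):
    # Empirical CGF: kappa(beta) = log( (1/N) sum_i exp{beta * psi_i} ).
    return np.log(np.mean(np.exp(beta * psi)))

h = 1e-5
d1 = (kappa(h) - kappa(-h)) / (2 * h)                 # central first difference
d2 = (kappa(h) - 2 * kappa(0.0) + kappa(-h)) / h**2   # central second difference

print(kappa(0.0))          # 0 by construction
print(d1, psi.mean())      # first derivative at 0 matches the sample mean
print(d2, psi.var())       # second derivative at 0 matches the sample variance
```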
9.6 Theorem 6: Saddlepoint approximation invariance to the decomposition
Use the same notation as introduced in Theorem 2. What will be shown is that the same value of the saddlepoint approximation is obtained regardless of the decomposition. First, the solution to the saddlepoint equation for the decomposition $D(\theta)$ is written in terms of the solution to the saddlepoint equation for the decomposition $C(\theta)$. This is then used to show that the different terms that compose the saddlepoint density take the same value for both decompositions.

Proof: The saddlepoint equation for the decomposition $C(\theta) = \begin{bmatrix} C_1(\theta) & C_2(\theta) \end{bmatrix}$ and $\lambda$ is
$$
0 = \int \begin{bmatrix} C_1(\theta)'W^{1/2}g_i(\theta) \\ C_2(\theta)'W^{1/2}g_i(\theta) - \lambda \end{bmatrix}
\exp\left\{\frac{\beta(\alpha)'}{N}\begin{bmatrix} C_1(\theta)'W^{1/2}g_i(\theta) \\ C_2(\theta)'W^{1/2}g_i(\theta) - \lambda \end{bmatrix}\right\} dF(x)
$$
$$
= \int I_m\begin{bmatrix} C_1(\theta)'W^{1/2}g_i(\theta) \\ C_2(\theta)'W^{1/2}g_i(\theta) - \lambda \end{bmatrix}
\exp\left\{\frac{\beta(\alpha)'}{N} I_m\begin{bmatrix} C_1(\theta)'W^{1/2}g_i(\theta) \\ C_2(\theta)'W^{1/2}g_i(\theta) - \lambda \end{bmatrix}\right\} dF(x)
$$
$$
= \int \begin{bmatrix} \left(D_1(\theta)'C_1(\theta)\right)'D_1(\theta)'C_1(\theta) & 0 \\ 0 & \left(D_2(\theta)'C_2(\theta)\right)'D_2(\theta)'C_2(\theta) \end{bmatrix}
\begin{bmatrix} C_1(\theta)'W^{1/2}g_i(\theta) \\ C_2(\theta)'W^{1/2}g_i(\theta) - \lambda \end{bmatrix}
$$
$$
\qquad\times
\exp\left\{\frac{\beta(\alpha)'}{N}\begin{bmatrix} \left(D_1(\theta)'C_1(\theta)\right)'D_1(\theta)'C_1(\theta) & 0 \\ 0 & \left(D_2(\theta)'C_2(\theta)\right)'D_2(\theta)'C_2(\theta) \end{bmatrix}
\begin{bmatrix} C_1(\theta)'W^{1/2}g_i(\theta) \\ C_2(\theta)'W^{1/2}g_i(\theta) - \lambda \end{bmatrix}\right\} dF(x)
$$
$$
= \int \begin{bmatrix} C_1(\theta)'D_1(\theta) & 0 \\ 0 & C_2(\theta)'D_2(\theta) \end{bmatrix}
\begin{bmatrix} D_1(\theta)'C_1(\theta)C_1(\theta)'W^{1/2}g_i(\theta) \\ D_2(\theta)'C_2(\theta)C_2(\theta)'W^{1/2}g_i(\theta) - D_2(\theta)'C_2(\theta)\lambda \end{bmatrix}
$$
$$
\qquad\times
\exp\left\{\frac{\beta(\alpha)'}{N}\begin{bmatrix} C_1(\theta)'D_1(\theta) & 0 \\ 0 & C_2(\theta)'D_2(\theta) \end{bmatrix}
\begin{bmatrix} D_1(\theta)'C_1(\theta)C_1(\theta)'W^{1/2}g_i(\theta) \\ D_2(\theta)'C_2(\theta)C_2(\theta)'W^{1/2}g_i(\theta) - D_2(\theta)'C_2(\theta)\lambda \end{bmatrix}\right\} dF(x)
$$
$$
= \begin{bmatrix} C_1(\theta)'D_1(\theta) & 0 \\ 0 & C_2(\theta)'D_2(\theta) \end{bmatrix}
\int \begin{bmatrix} D_1(\theta)'W^{1/2}g_i(\theta) \\ D_2(\theta)'W^{1/2}g_i(\theta) - L \end{bmatrix}
\exp\left\{\frac{\tilde\beta(\alpha)'}{N}\begin{bmatrix} D_1(\theta)'W^{1/2}g_i(\theta) \\ D_2(\theta)'W^{1/2}g_i(\theta) - L \end{bmatrix}\right\} dF(x),
$$
where
$$
\tilde\beta(\alpha)' \equiv \beta(\alpha)'\begin{bmatrix} C_1(\theta)'D_1(\theta) & 0 \\ 0 & C_2(\theta)'D_2(\theta) \end{bmatrix}.
$$
The leading matrix is of full rank and hence the final equation implies
$$
0 = \int \begin{bmatrix} D_1(\theta)'W^{1/2}g_i(\theta) \\ D_2(\theta)'W^{1/2}g_i(\theta) - L \end{bmatrix}
\exp\left\{\frac{\tilde\beta(\alpha)'}{N}\begin{bmatrix} D_1(\theta)'W^{1/2}g_i(\theta) \\ D_2(\theta)'W^{1/2}g_i(\theta) - L \end{bmatrix}\right\} dF(x).
$$
So $\tilde\beta(\alpha)$ is the solution to the saddlepoint equation for the decomposition $D(\theta)$ with parameter $L$. Use a superscript to denote the different spectral decompositions,
and it is possible to show the invariance of the conjugate density with respect to the selection of the spectral decomposition:
$$
h_{N,\beta}^{C(\theta)}(x)
= \frac{\exp\left\{\frac{\beta(\alpha)'}{N}\psi^{C(\theta)}(x,\alpha)\right\}dF(x)}{\int \exp\left\{\frac{\beta(\alpha)'}{N}\psi^{C(\theta)}(w,\alpha)\right\}dF(w)}
= \frac{\exp\left\{\frac{\tilde\beta(\alpha)'}{N}\psi^{D(\theta)}(x,\alpha)\right\}dF(x)}{\int \exp\left\{\frac{\tilde\beta(\alpha)'}{N}\psi^{D(\theta)}(w,\alpha)\right\}dF(w)}
= h_{N,\tilde\beta}^{D(\theta)}(x).
$$
The saddlepoint approximation for $C(\theta)$ and $\lambda$ with $\alpha = \begin{bmatrix} \theta' & \lambda' \end{bmatrix}'$,
$$
g_N(\alpha) = \left(\frac{N}{2\pi}\right)^{\frac{m}{2}}
\left| E\!\left[\frac{\partial\psi^{C(\theta)}(\alpha)}{\partial\alpha'}\right]\right|
\left| E\!\left[\psi^{C(\theta)}(\alpha)\psi^{C(\theta)}(\alpha)'\right]\right|^{-\frac12}
\exp\left\{N\kappa\left(\beta(\alpha),\alpha\right)\right\},
$$
is composed of three terms, where $\beta(\alpha)$ is the solution to the saddlepoint equation (2) and the expectation is with respect to the conjugate density $h_{N,\beta}^{C(\theta)}(x)$. The approximation depends on three terms that contain $C_1(\theta)$, $C_2(\theta)$ and/or $\lambda$. The proof will be completed once it is shown that the value of each of the three terms equals the value that would be obtained if $D_1(\theta)$, $D_2(\theta)$ and $L$ had been used.

1. Consider the term $\kappa\left(\beta(\alpha),\alpha\right)$:
$$
\kappa\left(\beta(\alpha),\alpha\right)
= \int \exp\left\{\frac{\beta(\alpha)'}{N}\begin{bmatrix} C_1(\theta)'W^{1/2}g_i(\theta) \\ C_2(\theta)'W^{1/2}g_i(\theta) - \lambda \end{bmatrix}\right\} dF(x)
= \int \exp\left\{\frac{\beta(\alpha)'}{N} I_m\begin{bmatrix} C_1(\theta)'W^{1/2}g_i(\theta) \\ C_2(\theta)'W^{1/2}g_i(\theta) - \lambda \end{bmatrix}\right\} dF(x)
$$
$$
= \int \exp\left\{\frac{\beta(\alpha)'}{N}\begin{bmatrix} \left(D_1(\theta)'C_1(\theta)\right)'D_1(\theta)'C_1(\theta) & 0 \\ 0 & \left(D_2(\theta)'C_2(\theta)\right)'D_2(\theta)'C_2(\theta) \end{bmatrix}
\begin{bmatrix} C_1(\theta)'W^{1/2}g_i(\theta) \\ C_2(\theta)'W^{1/2}g_i(\theta) - \lambda \end{bmatrix}\right\} dF(x)
$$
$$
= \int \exp\left\{\frac{\beta(\alpha)'}{N}\begin{bmatrix} C_1(\theta)'D_1(\theta) & 0 \\ 0 & C_2(\theta)'D_2(\theta) \end{bmatrix}
\begin{bmatrix} D_1(\theta)'C_1(\theta)C_1(\theta)'W^{1/2}g_i(\theta) \\ D_2(\theta)'C_2(\theta)C_2(\theta)'W^{1/2}g_i(\theta) - D_2(\theta)'C_2(\theta)\lambda \end{bmatrix}\right\} dF(x)
$$
$$
= \int \exp\left\{\frac{\tilde\beta(\alpha)'}{N}\begin{bmatrix} D_1(\theta)'W^{1/2}g_i(\theta) \\ D_2(\theta)'W^{1/2}g_i(\theta) - L \end{bmatrix}\right\} dF(x)
= \kappa\left(\tilde\beta(\alpha),\alpha\right).
$$
Hence this term is invariant to the decomposition selected.
2. Consider the term $\left| E\!\left[\frac{\partial\psi^{C(\theta)}(\alpha)}{\partial\alpha'}\right]\right|$. To reduce notation let $U$ denote the unitary matrix
$$
U = \begin{bmatrix} C_1(\theta)'D_1(\theta) & 0 \\ 0 & C_2(\theta)'D_2(\theta) \end{bmatrix},
$$
and hence $|U| = 1$. Use Theorem 7 to write
$$
\left| E_{\beta}\!\left[\frac{\partial\psi^{C(\theta)}(\alpha)}{\partial\alpha'}\right]\right|
= \left| \int \begin{bmatrix} C_1(\theta)'M_i(\theta) + C_1(\theta)'M(\theta)\left(M(\theta)'M(\theta)\right)^{-1}D\left[M(\theta)'\right]P_{M(\theta)}^{\perp}W^{1/2}g_i(\theta) & 0 \\ C_2(\theta)'M_i(\theta) - C_2(\theta)'D\left[M(\theta)\right]\left(M(\theta)'M(\theta)\right)^{-1}M(\theta)'W^{1/2}g_i(\theta) & -I_{m-k} \end{bmatrix} h_{N,\beta}^{C(\theta)}(x)\,dx\right|
$$
$$
= \left| \int \begin{bmatrix} C_1(\theta)'M_i(\theta) + C_1(\theta)'M(\theta)\left(M(\theta)'M(\theta)\right)^{-1}D\left[M(\theta)'\right]P_{M(\theta)}^{\perp}W^{1/2}g_i(\theta) & 0 \\ C_2(\theta)'M_i(\theta) - C_2(\theta)'D\left[M(\theta)\right]\left(M(\theta)'M(\theta)\right)^{-1}M(\theta)'W^{1/2}g_i(\theta) & -I_{m-k} \end{bmatrix} h_{N,\tilde\beta}^{D(\theta)}(x)\,dx\right|
$$
$$
= |U|\left| \int \begin{bmatrix} D_1(\theta)'M_i(\theta) + D_1(\theta)'M(\theta)\left(M(\theta)'M(\theta)\right)^{-1}D\left[M(\theta)'\right]P_{M(\theta)}^{\perp}W^{1/2}g_i(\theta) & 0 \\ D_2(\theta)'M_i(\theta) - D_2(\theta)'D\left[M(\theta)\right]\left(M(\theta)'M(\theta)\right)^{-1}M(\theta)'W^{1/2}g_i(\theta) & -D_2(\theta)'C_2(\theta) \end{bmatrix} h_{N,\tilde\beta}^{D(\theta)}(x)\,dx\right|
$$
$$
= \left| \int \begin{bmatrix} D_1(\theta)'M_i(\theta) + D_1(\theta)'M(\theta)\left(M(\theta)'M(\theta)\right)^{-1}D\left[M(\theta)'\right]P_{M(\theta)}^{\perp}W^{1/2}g_i(\theta) & 0 \\ D_2(\theta)'M_i(\theta) - D_2(\theta)'D\left[M(\theta)\right]\left(M(\theta)'M(\theta)\right)^{-1}M(\theta)'W^{1/2}g_i(\theta) & -I_{m-k} \end{bmatrix} h_{N,\tilde\beta}^{D(\theta)}(x)\,dx\right|
$$
$$
= \left| E_{\tilde\beta}\!\left[\frac{\partial\psi^{D(\theta)}(\alpha)}{\partial\alpha'}\right]\right|.
$$
The first and last equalities come from the definitions of $E_{\beta}\!\left[\frac{\partial\psi^{C(\theta)}(\alpha)}{\partial\alpha'}\right]$ and $E_{\tilde\beta}\!\left[\frac{\partial\psi^{D(\theta)}(\alpha)}{\partial\alpha'}\right]$ and from Theorem 7. The second equality is justified with the same steps used above to derive $\tilde\beta(\alpha)$. The third equality is from the definition of $U$. The fourth equality occurs because $U$ and $D_2(\theta)'C_2(\theta)$ are unitary and hence $|U| = |D_2(\theta)'C_2(\theta)| = 1$; note that the upper-right block being zero implies that the determinant is the product of the determinants of the matrices on the main diagonal. Hence this term is invariant to the decomposition selected.
3. Finally consider the term $\left| E_{\beta}\left[\psi^{C(\theta)}(\alpha)\psi^{C(\theta)}(\alpha)'\right]\right|$:
$$
E_{\beta}\left[\psi^{C(\theta)}(\alpha)\psi^{C(\theta)}(\alpha)'\right]
= \int \begin{bmatrix} C_1(\theta)'W^{1/2}g_i(\theta) \\ C_2(\theta)'W^{1/2}g_i(\theta) - \lambda \end{bmatrix}
\begin{bmatrix} C_1(\theta)'W^{1/2}g_i(\theta) \\ C_2(\theta)'W^{1/2}g_i(\theta) - \lambda \end{bmatrix}' h_{N,\beta}^{C(\theta)}(x)\,dx
$$
$$
= \int \begin{bmatrix} C_1(\theta)'W^{1/2}g_i(\theta) \\ C_2(\theta)'W^{1/2}g_i(\theta) - \lambda \end{bmatrix}
\begin{bmatrix} C_1(\theta)'W^{1/2}g_i(\theta) \\ C_2(\theta)'W^{1/2}g_i(\theta) - \lambda \end{bmatrix}' h_{N,\tilde\beta}^{D(\theta)}(x)\,dx
$$
$$
= U\int \begin{bmatrix} D_1(\theta)'W^{1/2}g_i(\theta) \\ D_2(\theta)'W^{1/2}g_i(\theta) - L \end{bmatrix}
\begin{bmatrix} D_1(\theta)'W^{1/2}g_i(\theta) \\ D_2(\theta)'W^{1/2}g_i(\theta) - L \end{bmatrix}' h_{N,\tilde\beta}^{D(\theta)}(x)\,dx\; U'
= U E_{\tilde\beta}\left[\psi^{D(\theta)}(\alpha)\psi^{D(\theta)}(\alpha)'\right]U'.
$$
The first and last equalities are from the definitions of $E_{\beta}\left[\psi^{C(\theta)}(\alpha)\psi^{C(\theta)}(\alpha)'\right]$ and $E_{\tilde\beta}\left[\psi^{D(\theta)}(\alpha)\psi^{D(\theta)}(\alpha)'\right]$. The second equality follows from the same steps that were used to define $\tilde\beta(\alpha)$ and the equality $\kappa\left(\beta(\alpha),\alpha\right) = \kappa\left(\tilde\beta(\alpha),\alpha\right)$. The third equality is from the definition of $U$. Now, $U$ is a unitary matrix and hence $|U| = 1$, so $\left| E_{\beta}\left[\psi^{C(\theta)}(\alpha)\psi^{C(\theta)}(\alpha)'\right]\right| = \left| E_{\tilde\beta}\left[\psi^{D(\theta)}(\alpha)\psi^{D(\theta)}(\alpha)'\right]\right|$. Hence the value is invariant to the decomposition selected.

The saddlepoint density is invariant to the decomposition selected. $\blacksquare$
9.7 Theorem 7: Derivative of the decomposition
The sufficient conditions for the existence of the saddlepoint density require the derivative of the estimation equations with respect to the parameters. This can be observed directly for the terms that involve the GMM moment conditions. The spectral decomposition, $C_i(\theta)$ for $i = 1, 2$, must be differentiated with respect to $\theta$. Actually, the needed terms are the derivatives of the matrices times a constant vector. Recall that $D[\,\cdot\,]$ denotes taking the derivative with respect to $\theta$ of the $m \times m$ matrix in the square bracket that is post-multiplied by an $m \times 1$ vector, say $\delta$, where $\delta$ does not depend on $\theta$.

Proof: For $C_2(\theta)$ the proof proceeds by evaluating $D\left[P_{M(\theta)}^{\perp}\right]\delta$ in two different ways. These are then set equal to determine $D\left[C_2(\theta)\right]\delta$. Consider
$$
D\left[P_{M(\theta)}^{\perp}\right]\delta = D\left[C_2(\theta)C_2(\theta)'\right]\delta = C_2(\theta)D\left[C_2(\theta)'\right]\delta + D\left[C_2(\theta)\right]C_2(\theta)'\delta. \qquad (2)
$$
Now consider the same derivative written as
$$
D\left[P_{M(\theta)}^{\perp}\right]\delta = D\left[I - M(\theta)\left(M(\theta)'M(\theta)\right)^{-1}M(\theta)'\right]\delta
$$
$$
= -D\left[M(\theta)\right]\left(M(\theta)'M(\theta)\right)^{-1}M(\theta)'\delta
- M(\theta)D\left[\left(M(\theta)'M(\theta)\right)^{-1}\right]M(\theta)'\delta
- M(\theta)\left(M(\theta)'M(\theta)\right)^{-1}D\left[M(\theta)'\right]\delta
$$
$$
= -D\left[M(\theta)\right]\left(M(\theta)'M(\theta)\right)^{-1}M(\theta)'\delta
+ M(\theta)\left(M(\theta)'M(\theta)\right)^{-1}M(\theta)'D\left[M(\theta)\right]\left(M(\theta)'M(\theta)\right)^{-1}M(\theta)'\delta
$$
$$
\qquad
+ M(\theta)\left(M(\theta)'M(\theta)\right)^{-1}D\left[M(\theta)'\right]M(\theta)\left(M(\theta)'M(\theta)\right)^{-1}M(\theta)'\delta
- M(\theta)\left(M(\theta)'M(\theta)\right)^{-1}D\left[M(\theta)'\right]\delta
$$
$$
= -P_{M(\theta)}^{\perp}D\left[M(\theta)\right]\left(M(\theta)'M(\theta)\right)^{-1}M(\theta)'\delta
- M(\theta)\left(M(\theta)'M(\theta)\right)^{-1}D\left[M(\theta)'\right]P_{M(\theta)}^{\perp}\delta
$$
$$
= -C_2(\theta)C_2(\theta)'D\left[M(\theta)\right]\left(M(\theta)'M(\theta)\right)^{-1}M(\theta)'\delta
- M(\theta)\left(M(\theta)'M(\theta)\right)^{-1}D\left[M(\theta)'\right]C_2(\theta)C_2(\theta)'\delta. \qquad (3)
$$
Comparing equations (2) and (3) gives the result
$$
D\left[C_2(\theta)'\right]\delta = -C_2(\theta)'D\left[M(\theta)\right]\left(M(\theta)'M(\theta)\right)^{-1}M(\theta)'\delta
= -C_2(\theta)'D\left[M(\theta)\right]\left(M(\theta)'M(\theta)\right)^{-1}M(\theta)'P_{M(\theta)}\delta
$$
$$
= -C_2(\theta)'D\left[M(\theta)\right]\left(M(\theta)'M(\theta)\right)^{-1}M(\theta)'C_1(\theta)C_1(\theta)'\delta.
$$
Now consider $C_1(\theta)'$. The derivative of the projection matrix gives
$$
D\left[P_{M(\theta)}\right]\delta = D\left[C_1(\theta)C_1(\theta)'\right]\delta = C_1(\theta)D\left[C_1(\theta)'\right]\delta + D\left[C_1(\theta)\right]C_1(\theta)'\delta. \qquad (4)
$$
For all $\theta$,
$$
P_{M(\theta)} + P_{M(\theta)}^{\perp} = I,
\qquad
P_{M(\theta)}\delta + P_{M(\theta)}^{\perp}\delta = \delta,
$$
and hence
$$
D\left[P_{M(\theta)}\right]\delta + D\left[P_{M(\theta)}^{\perp}\right]\delta = 0.
$$
This implies
$$
D\left[P_{M(\theta)}\right]\delta = -D\left[P_{M(\theta)}^{\perp}\right]\delta.
$$
Substitute from equation (3):
$$
D\left[P_{M(\theta)}\right]\delta = C_2(\theta)C_2(\theta)'D\left[M(\theta)\right]\left(M(\theta)'M(\theta)\right)^{-1}M(\theta)'\delta
+ M(\theta)\left(M(\theta)'M(\theta)\right)^{-1}D\left[M(\theta)'\right]C_2(\theta)C_2(\theta)'\delta
$$
$$
= C_2(\theta)C_2(\theta)'D\left[M(\theta)\right]\left(M(\theta)'M(\theta)\right)^{-1}M(\theta)'P_{M(\theta)}\delta
+ P_{M(\theta)}M(\theta)\left(M(\theta)'M(\theta)\right)^{-1}D\left[M(\theta)'\right]C_2(\theta)C_2(\theta)'\delta
$$
$$
= C_2(\theta)C_2(\theta)'D\left[M(\theta)\right]\left(M(\theta)'M(\theta)\right)^{-1}M(\theta)'C_1(\theta)C_1(\theta)'\delta
+ C_1(\theta)C_1(\theta)'M(\theta)\left(M(\theta)'M(\theta)\right)^{-1}D\left[M(\theta)'\right]C_2(\theta)C_2(\theta)'\delta. \qquad (5)
$$
Comparing equations (4) and (5) gives the result
$$
D\left[C_1(\theta)'\right]\delta = C_1(\theta)'M(\theta)\left(M(\theta)'M(\theta)\right)^{-1}D\left[M(\theta)'\right]C_2(\theta)C_2(\theta)'\delta
= C_1(\theta)'M(\theta)\left(M(\theta)'M(\theta)\right)^{-1}D\left[M(\theta)'\right]P_{M(\theta)}^{\perp}\delta. \qquad\blacksquare
$$
REFERENCES

1. Almudevar, Anthony, Chris Field and John Robinson (2000) "The Density of Multivariate M-Estimates," The Annals of Statistics, vol. 28, no. 1, pp. 275-297.

2. Amari, Shun-ichi (1985) Differential-Geometrical Methods in Statistics, Berlin Heidelberg: Springer-Verlag, Lecture Notes in Statistics, 28.

3. Anatolyev, Stanislav (2005) "GMM, GEL, Serial Correlation, and Asymptotic Bias," Econometrica, vol. 73, no. 3, pp. 983-1002.

4. Bhattacharya, R. N. and J. K. Ghosh (1978) "On the Validity of the Formal Edgeworth Expansion," Annals of Statistics, vol. 6, pp. 434-451.

5. Boothby, William M. (2003) An Introduction to Differentiable Manifolds and Riemannian Geometry, revised second edition, Academic Press.

6. Daniels, H. E. (1954) "Saddlepoint approximations in statistics," Annals of Mathematical Statistics, vol. 25, pp. 631-650.

7. Esscher, F. (1932) "The probability function in the collective theory of risk," Skand. Aktuarietidskr., vol. 15, pp. 175-195.

8. Feuerverger, Andrey (1989) "On the Empirical Saddlepoint Approximation," Biometrika, vol. 76, pp. 457-464.

9. Field, C. A. (1982) "Small Sample Asymptotics for Multivariate M-Estimates," Annals of Statistics, vol. 10, pp. 672-689.

10. Field, C. A. and E. Ronchetti (1990) Small Sample Asymptotics, Hayward, CA: IMS Monograph Series, vol. 13.

11. Goutis, Constantin and George Casella (1999) "Explaining the Saddlepoint Approximation," The American Statistician, vol. 53, no. 3, pp. 216-224.

12. Hall, Peter and Joel L. Horowitz (1996) "Bootstrap Critical Values for Tests Based on Generalized-Method-of-Moments Estimators," Econometrica, vol. 64, no. 4, pp. 891-916.

13. Hansen, L. P. (1982) "Large Sample Properties of Generalized Method of Moments Estimators," Econometrica, vol. 50.

14. Hansen, L. P., John Heaton and A. Yaron (1996) "Finite Sample Properties of Some Alternative GMM Estimators," Journal of Business and Economic Statistics, vol. 14, no. 3.

15. Huzurbazar, S. (1999) "Practical Saddlepoint Approximations," The American Statistician, vol. 53, no. 3, pp. 225-232.
16. Jensen, J. L. (1995) Saddlepoint Approximations, Oxford: Oxford University Press.

17. Kitamura, Yuichi and Michael Stutzer (1997) "An Information-Theoretic Alternative to Generalized Method of Moments Estimation," Econometrica, vol. 65, no. 4, pp. 861-874.

18. Kolassa, J. E. (1997) Series Approximation Methods in Statistics, 2nd edition, New York: Springer-Verlag, Lecture Notes in Statistics, 88.

19. Lugannani, R. and S. Rice (1980) "Saddle Point Approximations for the Distribution of the Sum of Independent Random Variables," Advances in Applied Probability, vol. 12, pp. 475-490.

20. Marriott, Paul and Mark Salmon (2000) Applications of Differential Geometry to Econometrics, Cambridge: Cambridge University Press.

21. Newey, Whitney and Richard J. Smith (2004) "Higher Order Properties of GMM and Generalized Empirical Likelihood Estimators," Econometrica, vol. 72, no. 1, pp. 219-255.

22. Ortega, J. M. and W. C. Rheinboldt (1970) Iterative Solution of Nonlinear Equations in Several Variables, New York: Academic Press.

23. Reid, N. (1988) "Saddlepoint Methods and Statistical Inference," Statistical Science, vol. 3, no. 2, pp. 213-227.

24. Robinson, J., E. Ronchetti and G. A. Young (2003) "Saddlepoint Approximations and Tests Based on Multivariate M-estimates," Annals of Statistics.

25. Ronchetti, Elvezio and Fabio Trojani (2003) "Saddlepoint Approximations and Test Statistics for Accurate Inference in Overidentified Moment Conditions Models," National Centre of Competence in Research Financial Valuation and Risk Management, working paper 27.

26. Ronchetti, Elvezio and A. H. Welsh (1994) "Empirical Saddlepoint Approximations for Multivariate M-Estimators," Journal of the Royal Statistical Society, Series B, vol. 56, pp. 313-326.

27. Rilstone, Paul, V. K. Srivastava and Aman Ullah (1996) "The Second-Order Bias and Mean Squared Error of Nonlinear Estimators," Journal of Econometrics, vol. 75, pp. 369-395.

28. Skovgaard, I. M. (1990) "On the Density of Minimum Contrast Estimators," Annals of Statistics, vol. 18, pp. 779-789.

29. Sowell, F. B. (1996) "Optimal Tests for Parameter Instability in the Generalized Method of Moments Framework," Econometrica, vol. 64, no. 5, pp. 1085-1107.
30. Stock, James and Jonathan Wright (2000) "GMM with Weak Identification," Econometrica, vol. 68, no. 5, pp. 1055-1096.