A Consistent LM Type Specification Test for Semiparametric

0 downloads 0 Views 2MB Size Report
Oct 17, 2018 - ... Brad Larsen, Joe Romano, and Paulo Somaini for very helpful ..... In order to do so, I turn the conditional moment restriction E[εi|Xi] = 0 into a ...
A Consistent LM Type Specification Test arXiv:1810.07620v1 [econ.EM] 17 Oct 2018

for Semiparametric Models∗ Ivan Korolev† October 15, 2018 Abstract This paper develops a consistent Lagrange Multiplier (LM) type specification test for semiparametric conditional mean models against nonparametric alternatives. Consistency is achieved by turning a conditional moment restriction into a growing number of unconditional moment restrictions using series methods. The test is simple to implement, because it requires estimating only the restricted semiparametric model and because the asymptotic distribution of the test statistic is pivotal. The use of series methods in estimation of the null semiparamertic model allows me to account for the estimation variance and obtain refined asymptotic results. The test demonstrates good size and power properties in simulations. I apply the test to one of the semiparametric gasoline demand specifications from Yatchew and No (2001) and find no evidence against it. ∗

I am grateful to my advisors Frank Wolak, Han Hong, and Peter Reiss for their support and guidance, as

well as to Svetlana Bryzgalova, Guido Imbens, Brad Larsen, Joe Romano, and Paulo Somaini for very helpful conversations. I thank seminar participants at several universities and conferences for helpful comments. I thank Adonis Yatchew for permission to use the Canadian household gasoline consumption dataset from Yatchew and No (2001). I gratefully acknowledge the financial support from the Stanford Graduate Fellowship Fund as a Koret Fellow and from the Stanford Institute for Economic Policy Research as a B.F. Haley and E.S. Shaw Fellow. All remaining errors are mine. † Department of Economics, Binghamton University. E-mail: [email protected]. Website: https: //sites.google.com/view/ivan-korolev/home

1

1

Introduction Applied economists often want to achieve two conflicting goals in their work. On the

one hand, they wish to use the most flexible specification possible, so that their results are not driven by functional form assumptions. On the other hand, they wish to have a model that is consistent with the restrictions imposed by economic theory and can be used for valid counterfactual analysis. While parametric models are often too restrictive and may not capture heterogeneity in the data well, nonparametric models may violate restrictions imposed by economic theory and suffer from the curse of dimensionality, i.e. become imprecise if the dimensionality of regressors is high. This, together with the fact that economic theory usually specifies one portion of the model but leaves the other unrestricted, makes semiparametric models especially attractive for empirical work in economics. But because many semiparametric models are restricted versions of fully nonparametric models, it is important to check the validity of implied restrictions. If semiparametric models are correctly specified, then using them, as opposed to nonparametric models, typically leads to more efficient estimates and may increase the range of counterfactual questions that can be answered using the model at hand. On the other hand, if semiparametric models are misspecified, then the semiparametric estimates are likely to be misleading and may result in incorrect policy implications. In this paper I develop a new specification test that determines whether a semiparametric conditional mean model that the researcher has estimated provides a statistically valid description of the data as compared to a general nonparametric model. The test statistic is based on a quadratic form in the semiparametric model residuals. When the errors are homoskedastic, this quadratic form can be computed as nR2 from the regression of the semiparametric residuals on the series approximating functions. The heteroskedasticity robust version of the quadratic form on which the test is based can also be computed in regression based ways. Thus, the proposed test is fast and simple to implement. Moreover, the asymptotic distribution of the test statistic is pivotal, i.e. does not depend on the unknown 2

parameters, so that calculating asymptotically exact critical values for the test is straightforward and does not require the use of resampling methods. The proposed test uses series methods to turn a conditional moment restriction into a growing number of unconditional moment restrictions. I show that if the series functions can approximate the nonparametric alternatives that are allowed as the sample size grows, the test is consistent. My assumptions and proofs make precise what is required of the approximation and its behavior as the number of series terms and the sample size grow. These arguments differ from standard parametric arguments, when the number of regressors in the model is fixed. My asymptotic theory allows both the number of parameters under the null and the number of restrictions to grow with the sample size. By doing so, I show that the parametric Lagrange Multiplier test can be extended to semiparametric models and serve as a consistent model specification test for these models. Because series estiamtors have a projection interpretation, using series methods to nest the null model in the alternative and estimate the restricted model makes it possible to directly account for the estimation variance and obtain refined asymptotic results. This refinement, which can be viewed as a degrees of freedom correction, allows me to derive the asymptotic distribution of the test statistic under fairly weak rate conditions and improves the finite sample performance of the test. Specification tests have long played an important role in theoretical econometrics. Several papers have studied specification testing when the null model contains a nonparametric component. Early work on specification testing in semiparametric models required certain ad hoc modifications, such as sample splitting in Yatchew (1992) and Whang and Andrews (1993) or randomization in Gozalo (1993). Fan and Li (1996) solve this problem and develop a kernel-based specification test that can be used to test a semiparametric null hypothesis against a general nonparametric alternative, but their test requires high-dimensional kernel smoothing and cannot be implemented with standard econometric software. Lavergne and Vuong (2000) refine the test of Fan and Li (1996), but they only consider significance testing

3

in nonparametric models. Kernel-based specification tests are also developed in Chen and Fan (1999), Delgado and Manteiga (2001), Ait-Sahalia et al. (2001), and Bravo (2012). In all these papers, the asymptotic distribution of the test statistic is quite complicated and requires either estimating several nuisance parameters or using the bootstrap, which can be computationally costly. I circumvent this problem by relying on series methods to construct the test statistic. Because the number of series terms grows with the sample size, the usual asymptotic results for the parametric Lagrange Multiplier test no longer apply. However, it is possible to normalize the test statistic so that the resulting normalized statistic is asymptotically standard normal. Therefore, the quantiles of the standard normal distribution can be used as asymptotically exact critical values for the test. The proposed test is based on a quadratic form in the restricted model residuals. Thus, it can be viewed as a nonparametric generalization of the conventional Lagrange Multiplier test, classical treatments of which include Breusch and Pagan (1980), Engle (1982), and Engle (1984). This generalization is not novel in itself, as Hall (1990) and McCulloch and Percy (2013) also develop Lagrange Multiplier specification tests for parametric models against flexible alternatives. However, they only consider null hypotheses with fully specified parametric distributions, and their asymptotic analysis treats the number of series terms in the alternative model as fixed. As a result, their tests fail to achieve consistency. In contrast, I allow for semiparametric conditional mean models and develop an asymptotic theory for the case when the number of series terms grows with the sample size, which results in a consistent specification test. My work is closely related to the literature on series-based specification tests, such as de Jong and Bierens (1994), Hong and White (1995), Koenker and Machado (1999), Donald et al. (2003), and Sun and Li (2006). These papers extend the Conditional Moment test of Newey (1985) by considering a growing number of unconditional restrictions and thus achieve consistency. However, they only consider parametric null hypotheses and do not

4

develop a degrees of freedom correction. In contrast, my test can handle a broad class of semiparametric models and involves a degrees of freedom correction. I show that this correction plays a crucial role in the semiparametric case, because it allows me to weaken the rate conditions and results in better performance of the test in simulations. Hong and White (1995), among others, investigate the behavior of the test statistic for parametric null hypotheses under the fixed alternative and under local alternatives. I repeat this analysis for semiparametric null hypotheses and reach similar conclusions: the test statistic diverges to infinity faster than the parametric rate n1/2 under the fixed alternative, but the test can only detect local alternatives that approach zero slower than the parametric rate n−1/2 . Moreover, both rates are asymptotically the same as in the parametric case.1 To my knowledge, there are two studies that develop series-based specification tests that allow for semiparametric null hypotheses. Gao et al. (2002) only consider additive models and do not explicitly develop a consistent test against a general class of alternatives. Their test is based on the estimates from the unrestricted model and can be viewed as a Wald type test for variables significance in nonparametric additive conditional mean models. In contrast, my test is based on the residuals from the restricted semiparametric model and is consistent against a broad class of alternatives for the conditional mean function. Li et al. (2003) use the approach that was first put forth in Bierens (1982) and Bierens (1990) and develop a series-based specification test based on weighting the moments, rather than considering an increasing number of series-based unconditional moments. Their test can detect local alternatives that approach zero at the parametric rate n−1/2 , but the asymptotic distribution of their test statistic depends on nuisance parameters, and it is difficult to obtain appropriate critical values. They propose using a residual-based wild bootstrap to approximate the critical values. In contrast, my test statistic is asymptotically standard normal under the null, so calculating critical values is straightforward. Another strand of literature related to my paper studies inference in conditional moment 1

By “asymptotically the same,” I mean that the exact rates in the parametric and semiparametric cases are different but the ratio of two rates goes to 1 as the sample size grows.

5

restriction models. Chen and Pouzo (2015a) and Chen and Pouzo (2015b) develop sieve-based inference methods, including the sieve score statistic, for semi- and nonparametric conditional moment models. Their framework allows for the tests of semiparametric null hypotheses against general nonparametric alternatives, but their test statistic is different from mine. In turn, Chernozhukov et al. (2015) develop hypothesis tests for constrained conditional moment restriction models. In semiparametric conditional mean models, their test statistic reduces to the quadratic form on which my test statistic is based and is asymptotically pivotal. While those papers are more general than mine and can handle a wider class of models, they impose high level assumptions, do not derive conditions under which their tests serve as consistent specification tests, and rely on the bootstrap techniques to prove their results. In contrast, I impose assumptions that are more primitive and familiar from the literature on series estimation, explicitly derive the rate conditions under which the proposed test is valid and consistent, and prove my results using the theory of U-statistics and convergence rates of semiparametric estimators. Finally, similar to the overidentifying restrictions test in GMM models or other omnibus specification tests, the proposed test is silent on how to proceed if the null is rejected. In this respect, it is clearly a test of a particular model specification but not a comprehensive model selection procedure. I rely on the results from this paper and study model selection methods for semiparametric and nonparametric models, such as series-based information criteria or upward/downward testing procedures based on the proposed test, in Korolev (2018). The remainder of the paper is organized as follows. Section 2 presents several motivating examples. Section 3 introduces the model and describes how to construct the series-based specification test for semiparametric models. Section 4 develops the asymptotic theory for the proposed test. Section 5 provides a step-by-step description of how to implement the proposed test. Section 6 studies the behavior of the proposed test in simulations. Section 7 applies the proposed test to one of the semiparametric household gasoline demand specifications from Yatchew and No (2001) and shows that it is not rejected by the data. Section 8 concludes.

6

Appendix A collects all figures. Appendix B contains proofs of technical results.

2

Motivating Examples This section presents several examples of applied papers in economics which have used

semiparametric models in their analysis. Example 1 (Production Function Estimation). In a seminal paper, Olley and Pakes (1996) combine a Cobb-Douglas production function and a nonparametric specification for investment to derive the following semiparametric relationship:

yit = βl lit + Φt (iit , kit ) + εit ,

(2.1)

where yit is the log of output from plant i at time t, kit the log of its capital input, lit the log of its labor input, iit investment in plant i at time t, and εit the error term. In turn, Levinsohn and Petrin (2003) combine a Cobb-Douglas production function and a nonparametric specification for demand for intermediate inputs to derive a different semiparametric relationship:

yit = βl lit + Φt (kit , mit ) + εit ,

(2.2)

where mit is an intermediate input (such as electricity, fuel, or materials). Finally, in a recent paper, Ackerberg et al. (2015) use an alternative approach and arrive at the fully nonparametric specification:

˜ t (kit , lit , mit ) + εit yit = Φ

(2.3)

Specification 2.2 is more restrictive than 2.3, and my test can be used to assess whether 2.2 is statistically adequate.

7

Example 2 (Gasoline Demand Estimation). Hausman and Newey (1995) estimate gasoline demand as a flexible function of prices and incomes. They also include in the model other covariates and treat them parametrically to reduce the dimension of the nonparametric part of the model: g = r0 (x) + w0 β + ε, where g is the log of gasoline demand, x includes the log of gasoline prices and the log of income, and w includes other variables. For similar reasons, Schmalensee and Stoker (1999) use the following semiparametric gasoline demand specification: y = G(x) + z 0 β + ε, where y is the log of gallons consumed, x denotes the log of income and the log of age, and z denotes the remaining variables. Semiparametric models have also been used in estimation of Engel curves (Gong et al. (2005)), the labor force participation equation (Martins (2001)), the relationship between land access and poverty (Finan et al. (2005)), and marginal returns to education (Carneiro et al. (2011)). My test can potentially be used to assess the validity of semiparametric specifications used in these papers.

3

The Model and Specification Test This section describes the model, discusses semiparametric series estimators, and intro-

duces the test statistic.

3.1

The Model and Null Hypothesis

Let (Yi , Xi0 )0 ∈ R1+dx , dx ∈ N, i = 1, ..., n, be independent and identically distributed random variables with E[Yi2 ] < ∞. Then there exists a measurable function g such that 8

g(Xi ) = E[Yi |Xi ] a.s., and the nonparametric model can be written as

Yi = g(Xi ) + εi ,

E[εi |Xi ] = 0

The goal of this paper is to test the null hypothesis that the conditional mean function is semiparametric. A generic semiparametric null hypothesis is given by

H0SP : PX (g(Xi ) = f (Xi , θ0 , h0 )) = 1 for some θ0 ∈ Θ, h0 ∈ H,

(3.1)

where f : X × Θ × H → R is a known function, θ ∈ Θ ⊂ Rd is a finite-dimensional parameter, and h ∈ H = H1 × ... × Hq is a vector of unknown functions. The fixed alternative is

H1 : PX (g(Xi ) 6= f (Xi , θ, h)) > 0 for all θ ∈ Θ, h ∈ H

(3.2)

When the semiparametric null hypothesis is true, the model becomes

Yi = f (Xi , θ0 , h0 ) + εi ,

E[εi |Xi ] = 0

Because the model is semiparametric while the moment condition E[εi |Xi ] = 0 is fully nonparametric, the null model is overidentified and it is possible to test its specification. In order to do so, I turn the conditional moment restriction E[εi |Xi ] = 0 into a sequence of unconditional moment restrictions using series methods. But first, I introduce series approximating functions and semiparametric series estimators. For any variable z, let Qan (z) = (q1 (z), ..., qan (z))0 be an an -dimensional vector of approximating functions of z, where the number of series terms an is allowed to grow with the sample size n. Possible choices of series functions include:2 2

For a more detailed discussion of series methods and for other possible choices of basis functions, see Section 5 in Newey (1997), Section 2 in Donald et al. (2003), or Section 2.3 in Chen (2007).

9

(a) Power series. For univariate z, they are given by:

Qan (z) = (1, z, ..., z an −1 )0

(3.3)

(b) Splines. Let s be a positive scalar giving the order of the spline, and let t1 , ..., tan −s−1 denote knots. Then for univariate z, splines are given by: Qan (z) = (1, z, ..., z s , 1{z > t1 }(z − t1 )s , ..., 1{z > tan −s−1 }(z − tan −s−1 )s )

(3.4)

Multivariate power series or splines can be formed from products of univariate ones.

3.2

Series Estimators

In this section I introduce additional notation and define semiparametric estimators. Suppose that the researcher writes the semiparametric null model in a series form:

Yi = f (Xi , θ, h) + εi = Wi0 β1 + Ri + εi = Wi0 β1 + ei ,

(3.5)

where Wi := W mn (Xi ) := (W1 (Xi ), ..., Wmn (Xi ))0 are appropriate regressors or basis functions, mn is the number of parameters in the semiparametric null model, Ri := Rimn := (f (Xi , θ, h) − Wi0 β1 ) is the approximation error, and ei = εi + Ri is the composite error term. Example 3. Suppose that the semiparametric null model is partially linear with f (Xi , θ, h) = X1i θ + h(X2i ), where X1i and X2i are scalars and Xi = (X1i , X2i )0 . Approximate h(x2 ) ≈ Pan an 0 j=1 γj qj (x2 ) = Q (x2 ) γ and rewrite the model as: Yi = X1i θ + h(X2i ) + ε = X1i θ + Qan (X2i )0 γ + Ri + εi = Wi0 β1 + ei , where Wi = (X1i , Qan (X2i )0 )0 , mn = an + 1, β1 = (θ, γ 0 )0 , Ri = h(X2i ) − Qan (X2i )0 γ, and an −1 0 2 ei = Ri + εi . If power series are used, then Wi = (X1i , 1, X2i , X2i , ..., X2i ).

10

In order to test the semiparametric model, I test whether additional series terms should enter the model. These are not the extra series terms used to approximate the semiparametric null model better, but rather the series terms that cannot enter the model under the null and are supposed to capture possible deviations from it. Thus, the alternative is given by:

Yi = Wi0 β1 + Ti0 β2 + Ri + εi = Pi0 β + ei ,

(3.6)

where Ti := T rn (Xi ) := (T1 (Xi ), ..., Trn (Xi ))0 are the series terms that are present only under the alternative, rn is the number of restrictions, Pi := P kn (Xi ) := (Wi0 , Ti0 )0 , kn = mn + rn is the total number of parameters, and β = (β10 , β20 )0 . Example 3 (continued). In the partially linear model example, a possible choice of Ti is an −1 an −1 an −1 0 2 Ti = (X1i , ..., X1i , X1i X2i , ..., X1i X2i ) . These series terms can enter the model if the

null hypothesis is false, but they cannot enter the model if it is true. If the moment condition E[Yi − f (Xi , θ0 , h0 )|Xi ] = 0 holds, then the errors εi = Yi − f (Xi , θ0 , h0 ) are uncorrelated with any function of Xi , not only with Wi , and the additional series terms should be insignificant. Hence, the null hypothesis corresponds to β2 = 0. The estimate of β1 under the null becomes β˜1 = (W 0 W )−1 W 0 Y ,3 and the restricted residuals are given by ε˜ = Y − W β˜1 = Y − W (W 0 W )−1 W 0 Y = MW Y = MW (ε + R), where MW = I − PW , PW = W (W 0 W )−1 W 0 . The residuals satisfy the condition W 0 ε˜ = 0. mn is the number of terms in the semiparametric null model. It has to grow with the sample size in order to approximate nonparametric components of the model sufficiently well.4 rn is the number of series terms that capture possible deviations from the null. kn = mn + rn is the total number of series terms in the unrestricted model. This number has to grow with the sample size if it is to approximate any nonparametric alternative. Typically the number 3 4

For any vector Vi , let V = (V1 , ..., Vn )0 be the matrix that stacks all observations together. The notion of sufficiently well is made precise later.

11

of terms in the restricted semiparametric model, mn , is significantly smaller than the number of terms in the unrestricted nonparametric model, kn , so that mn /kn → 0 and rn /kn → 1.

3.3

Test Statistics

Instead of the conditional moment restriction implied by the null hypothesis: E[εi |Xi ] = E[Yi − f (Xi , θ0 , h0 )|Xi ] = 0, the test will be based on the unconditional moment restriction E[Pi εi ] = 0. I will show that requiring kn to grow with the sample size and P (x) := P kn (x) to approximate any unknown function in a wide class of functions will allow me to obtain a consistent specification test based on a growing number of unconditional moment restrictions. The test statistic is based on the sample analog of the population moment condition E[Pi εi ] = 0. In the homoskedastic case, the test statistic resembles the LM test statistic and is given by

ξ = ε˜0 P (˜ σ 2 P 0 P )−1 P 0 ε˜,

(3.7)

˜ h) ˜ = Y −W β˜1 are the semiparametwhere P = (Pi , ..., Pn )0 , ε˜ = (˜ ε1 , ..., ε˜n )0 , ε˜i = Yi −f (Xi , θ, ric residuals, and σ ˜ 2 = ε˜0 ε˜/n. Note that because the null model is nested in the alternative, the test statistic reduces to ξ = ε˜0 T˜(˜ σ 2 T˜0 T˜)−1 T˜0 ε˜, where T˜ = MW T are the residuals from the regression of each element of Ti on Wi . In the heteroskedastic case, the test statistic is modified appropriately:

˜ )−1 P 0 ε˜, ξHC,1 = ε˜0 P (P 0 ΣP

(3.8)

˜ = diag(˜ where Σ ε2i ). Because the null model is nested in the alternative, an alternative test

12

statistic can be computed as follows:

˜ T˜)−1 T˜0 ε˜ ξHC,2 = ε˜0 T˜(T˜0 Σ

(3.9)

All three quadratic forms, ξ, ξHC,1 , and ξHC,2 can be computed in regression-based ways. These ways are discussed in Section 5.5 Because the dimensionality kn of Pi grows with the sample size, a normalization is needed to obtain a convergence in distribution result. I show that the test statistic ξ − τn tτn = √ 2τn

(3.10)

for a suitable sequence τn → ∞ as n → ∞ works and is asymptotically pivotal. The test is one-sided in the sense that the null hypothesis is rejected when the value of the test statistic is large and positive. I will discuss the choice of τn in the next section.

4

Asymptotic Theory In this section I derive the asymptotic properties of the proposed test. I focus here

on the case when series methods are used to nest the null model in the alternative and estimate the restricted model. This is because series methods have a projection interpretation, which makes it possible to directly account for the estimation variance. As a result, the rate conditions required for the validity of the proposed test are mild. The use of other semiparametric estimators, such as kernels or local polynomials, does not allow for the degrees of freedom correction and leads to more restrictive rate conditions. First, I impose basic assumptions on the data generating process. Assumption 1. (Yi , Xi0 )0 ∈ R1+dx , dx ∈ N, i = 1, ..., n are i.i.d. random draws of the random variables (Y, X 0 )0 , and the support of X, X , is a compact subset of Rdx . 5

For regression-based ways to compute the parametric LM test statistic, see Wooldridge (1987) or Chapters 7 and 8 in Cameron and Trivedi (2005).

13

Next, I define the error term and impose two moment conditions. Assumption 2. Let εi = Yi − E[Yi |Xi ]. The following two conditions hold: (a) 0 < σ 2 (x) = E[ε2i |Xi = x] < ∞. (b) E[ε4i |Xi ] is bounded. The following assumption deals with the behavior of the approximating series functions. From now on, let ||A|| = [tr(A0 A)]1/2 be the Euclidian norm of a matrix A. ˜ m (x) = Assumption 3. For each m, r, and k there are matrices B1 and B2 such that, for W ˜ m (x)0 , T˜r (x)0 )0 , B1 W m (x), T˜r (x) = B2 T r (x), and P˜ k (x) = (W ˜ m (x)|| ≤ (a) There exists a sequences of constants ζ(·) that satisfies the conditions supx∈X ||W ζ(m), supx∈X ||T˜r (x)|| ≤ ζ(r), and supx∈X ||P˜ k (x)|| ≤ ζ(k). (b) The smallest eigenvalue of E[P˜ k (Xi )P˜ k (Xi )0 ] is bounded away from zero uniformly in k. Remark 1 (ζ(·) for Common Basis Functions). Explicit expressions for ζ(·) are available for certain families of basis approximating functions. For instance, it has been shown (see Newey (1997) or Section 15.1.1 in Li and Racine (2007)) that under additional assumptions, ζ(a) = O(a1/2 ) for splines and ζ(a) = O(a) for power series. The following lemma shows that a convenient normalization can be used. This normalization is typical in the literature on series methods. Lemma 1 (Donald et al. (2003), Lemma A.2). If Assumption 3 is satisfied then it can be assumed without loss of generality that P˜ k (x) = P k (x) and that E[P k (Xi )P k (Xi )0 ] = Ik . Next, I impose an assumption that requires the approximation error for the semiparametric model to vanish sufficiently fast under the null. Assumption 4. Suppose that H0 holds. There exists α > 0 such that sup |f (x, θ0 , h0 ) − W mn (x)0 β1 | = O(m−α n ) x∈X

14

Remark 2 (α for Common Basis Functions). In certain special cases, it is possible to characterize α explicitly. Suppose that power series or splines are used and that f (x, θ, h) = x01 θ + h(x2 ), where h has r continuous derivatives and x2 is dx2 -dimensional. Then, as shown in Newey (1997) or Chapter 15 in Li and Racine (2007), α = r/dx2 . The following lemma provides the rates of convergence (in the sample mean squared error sense) for semiparametric series estimators under the null hypothesis. Lemma 2 (Li and Racine (2007), Theorem 15.1). Let f (x) = f (x, θ0 , h0 ), fi = f (Xi ), ˜ h) ˜ = W mn (x)0 β˜1 , and f˜i = f˜(Xi ) = W 0 β˜1 . Under Assumptions 1, 3, and 4, f˜(x) = f (x, θ, i the following is true: n

1X ˜ (fi − fi )2 = Op (mn /n + mn−2α ) n i=1 Remark 3 (Convergence Rates for Semiparametric Series Estimators). The exact rates given in Lemma 2 are derived in Li and Racine (2007) for series estimators in nonparametric models. However, as other examples in Chapter 15 in Li and Racine (2007) show, similar rates can be derived in a wide class of semiparametric models, such as partially linear, varying coefficient, or additive models (see Theorems 15.5 and 15.7 in Li and Racine (2007)). In each of these cases, it is possible to replace the rate in Lemma 2 with the rate for the particular case of interest. With appropriately modified rate conditions and assumptions, the results developed below will continue to hold. It is now possible to derive the asymptotic distribution of the test statistic 3.10 under the null. I consider the homoskedastic and heteroskedastic cases separately.

4.1

Homoskedastic Case

The limiting distribution of the test statistic under the null is given by the next result: Theorem 1. Assume that series methods are used to estimate the restricted model. Also assume that Assumptions 1, 2, 3, and 4 are satisfied, σ 2 (x) = σ 2 , 0 < σ 2 < ∞, for all 15

x ∈ X , and the following rate conditions hold:

ζ(kn )2 kn rn1/2 /n → 0

(4.1)

ζ(rn )rn /n1/2 → 0

(4.2)

ζ(kn )mn1/2 kn1/2 /n1/2 → 0

(4.3)

nmn−2α /rn1/2 → 0

(4.4)

ζ(rn )2 /n1/2 → 0

(4.5)

ξ − rn d trn = √ → N (0, 1), 2rn

(4.6)

Then

where ξ is as in Equation 3.7. Remark 4 (Rate Conditions under Homoskedasticity). Conditions 4.1–4.5 are sufficient conditions for the result of the theorem to hold. Conditions 4.1 and 4.2 are used to bound the error from replacing σ ˜ 2 T 0 MW T with σ 2 E[Ti Ti0 ]. Conditions 4.3 and 4.4 are used to bound the error from approximating unknown functions with finite series expansions. Condition 4.5 is used to obtain the convergence in distribution result by applying Lemma A.1 in the Appendix. The following corollary shows that using a χ2 approximation with a growing number of degrees of freedom also results in an asymptotically exact test in the homoskedastic case. Corollary 1. If the conditions of Theorem 1 hold, then

P ξ≥

χ21−α (rn )



 =P

χ2 (rn ) − rn ξ − rn √ ≥ 1−α√ 2rn 2rn

 →α

Thus, there are two ways to construct an asymptotically exact specification test. Exactly which one might be preferred in terms of the finite sample performance is studied in Section 6.

16

4.2

Heteroskedastic Case

The limiting distribution of the heteroskedasticity robust test statistic under the null is given by the next two results: Theorem 2. Assume that series methods are used to estimate the restricted model. Also assume that Assumptions 1, 2, 3, and 4 are satisfied, and the following rate conditions hold:

(mn /n + mn−2α )ζ(rn )2 rn1/2 → 0

(4.7)

ζ(rn )rn /n1/2 → 0

(4.8)

ζ(kn )mn1/2 kn1/2 /n1/2 → 0

(4.9)

1/2 nm−2α n /rn → 0

(4.10)

ζ(rn )2 /n1/2 → 0

(4.11)

−1/2

ˆ − Ω|| ˜ = op (rn Also assume that ||Ω

˜ /n and ˜ = T 0 ΣT ), where Ω

0˜ ˆ = T 0 ΣT ˜ /n − (T 0 ΣW/n)(W ˜ ˜ /n) Ω ΣW/n)−1 (W 0 ΣT

Then

trn ,HC,1 =

ξHC,1 − rn d √ → N (0, 1), 2rn

(4.12)

where ξHC,1 is as in Equation 3.8. Theorem 3. Assume that series methods are used to estimate the restricted model. Also assume that Assumptions 1, 2, 3, and 4 are satisfied, and the rate conditions 4.7–4.11 hold. −1/2

ˆ − Ω|| ˜ = op (rn Also assume that ||Ω

˜ = T 0 ΣT ˜ /n and Ω ˆ = T˜0 Σ ˜ T˜/n. ), where Ω

Then

trn ,HC,2 =

ξHC,2 − rn d √ → N (0, 1), 2rn 17

(4.13)

where ξHC,2 is as in Equation 3.9. Remark 5 (Rate Conditions under Heteroskedasticity). Conditions 4.7–4.11 are sufficient conditions for the result of the theorem to hold. Conditions 4.7 and 4.8 are used to bound the ˜ /n with E[ε2 Ti T 0 ].6 Conditions 4.9 and 4.10 are used to bound the error from replacing T 0 ΣT i i error from approximating unknown functions with finite series expansions. Condition 4.11 is used to obtain the convergence in distribution result by applying Lemma A.1 in the Appendix. The following corollary shows that using a χ2 approximation with a growing number of degrees of freedom also results in an asymptotically exact test in the heteroskedastic case. Corollary 2. If the conditions of Theorem 2 (Theorem 3) hold, then for j = 1 (j = 2)

P ξHC,j ≥

4.3

χ21−α (rn )



 =P

χ2 (rn ) − rn ξHC,j − rn √ ≥ 1−α√ 2rn 2rn

 → α,

Behavior of the Test Statistic under a Fixed Alternative

In this subsection, I study the behavior of the test statistic under a fixed alternative. To obtain a consistent specification test, I will rely on the following result from Donald et al. (2003) that shows that for a suitable class of functions the conditional mean restriction E[εi |Xi ] = 0 is equivalent to a sequence of unconditional moment restrictions. Assumption 5. (Donald et al. (2003), Assumption 1) Assume that E[Pi Pi0 ] is finite for all k, and for any a(x) with E[a(Xi )2 ] < ∞ there are k × 1 vectors γ k such that, as k → ∞,

E[(a(Xi ) − Pi0 γ k )2 ] → 0

Lemma 3. (Donald et al. (2003), Lemma 2.1) ˆ can be replaced with Ω, ˜ I directly impose the Instead of imposing primitive conditions under which Ω −1/2 ˆ ˜ high level condition that ||Ω − Ω|| = op (rn ). This condition is discussed in more detail in Appendix B. 6

18

Suppose that Assumption 5 is satisfied and E[ε2i ] is finite. If E[εi |Xi ] = 0 then E[Pi εi ] = 0 for all k. Furthermore, if E[εi |Xi ] 6= 0 then E[Pi εi ] 6= 0 for all k large enough. The class of functions a(x), for which the equivalence between the conditional and unconditional restrictions holds, consists of functions that can be approximated (in the mean squared sense) using series as the number of series terms grows. While it is difficult to give a necessary and sufficient primitive condition that would describe this class of functions, the test will likely be consistent against continuous and smooth alternatives, while it may not be consistent against alternatives that exhibit jumps. Section 6.3 examines the finite sample behavior of the test under such alternatives. This is a population result in the sense that it does not involve the sample size n. In order to use this result in practice, I require the number of series terms used to construct the test statistic, kn , to grow with the sample size. By doing so, I ensure that the unconditional moment restriction E[Pi εi ] = 0, on which the test is based, is equivalent to the conditional moment restriction E[εi |Xi ]. Thus, the test will be consistent against a wide class of alternatives satisfying Assumption 5. In order to analyze the behavior of the test under a fixed alternative, I introduce some notation first. The true model is nonparamertic:

Yi = g(Xi ) + εi ,

E[εi |Xi ] = 0

An alternative way to write this model is

Yi = f (Xi , θ∗ , h∗ ) + ε∗i , where θ∗ and h∗ are pseudo-true parameter values and ε∗i = εi + (g(Xi ) − f (Xi , θ∗ , h∗ )) = εi + d(Xi ) is a composite error term. The pseudo-true parameter values minimize E[(g(Xi ) − f (Xi , θ, h))2 ] 19

over a suitable parameter space. Note that the model can be written as

Yi = Wi0 β1∗ + ε∗i + Ri∗ , where Ri∗ = (f (Xi , θ∗ , h∗ ) − Wi0 β1∗ ). The pseudo-true parameter value β1∗ solves the moment condition E[Wi (Yi − Wi0 β1∗ )] = 0, and the semiparametric estimator β˜1 solves its sample analog W 0 (Y − W β˜1 )/n = 0. The following theorem provides the divergence rate of the test statistic under the fixed alternative. 0 Theorem 4. Assume that series methods are used in estimation. Let Ω∗ = E[ε∗2 i Ti Ti ]. In

ˆ =σ the homoskedastic case, let Ω ˜ 2 T 0 MW T /n. In the heteroskedastic case, let 0˜ ˆ = T 0 ΣT ˜ /n − (T 0 ΣW/n)(W ˜ ˜ /n), Ω ΣW/n)−1 (W 0 ΣT

˜ T˜/n, where T˜ = MW T , if ˆ = T˜0 Σ ˜ = diag(˜ where Σ ε2i ), if the test is based on ξHC,1 , and let Ω the test is based on ξHC,2 . p

ˆ − Ω∗ || → 0, the smallest eigenSuppose that supx∈X |f (x, θ∗ , h∗ ) − W mn (x)0 β1∗ | → 0, ||Ω value of Ω∗ is bounded away from zero, mn → ∞, rn → ∞, rn /n → 0, E[ε∗i Ti0 ]Ω∗−1 E[Ti ε∗i ] → ∆, where ∆ is a constant. Then under homoskedasticity √

√ rn ξ − rn p √ → ∆/ 2, n 2rn

and under heteroskedasticity for j = 1, 2 √ √ rn ξHC,j − rn p √ → ∆/ 2 n 2rn √ The divergence rate of the test statistic under the alternative is n/ rn . However, in

20

most cases, the restricted semiparametric model is of lower dimension than the unrestricted nonparametric model, so that mn /kn → 0 and rn /kn → 1. Thus, the divergence rate in the semiparametric case discussed here is the same as in the parametric case in Hong and White (1995) and Donald et al. (2003), so the fact that the null hypothesis is semiparametric does not affect the global power of the test.

4.4

Behavior of the Test Statistic under Local Alternatives

In this section, I analyze the power of the proposed test against local alternatives of the form H1n : gn (Xi ) = f (Xi , θ∗ , h∗ ) + (rn1/4 /n1/2 )d0 (Xi ), where d0 is square integrable on X , E[d0 (Xi )] = 0, and E[f (Xi , θ∗ , h∗ )d0 (Xi )] = 0. The null hypothesis corresponds to d0 = 0. I need to slightly modify the assumptions from the precious sections. First, I impose the following condition. It is not primitive; however, an analogous result for ε instead of d is derived in the proof of Theorem 1. Thus, this assumption is likely to hold under the primitive conditions I impose. Assumption 6. Assume that ||n−1 T 0 d|| = Op (

p p rn /n) and ||n−1 W 0 d|| = Op ( rn /n).

Next, I impose the following assumption that requires the basis functions Ti to approximate d0 (Xi ) sufficiently well. Note that because d0 (Xi ) is orthogonal to the semiparametric part of the model, f (Xi , θ∗ , h∗ ), it is also orthogonal to Wi , and only Ti is needed to approximate it. Assumption 7. There exists αd > 0 such that sup |d0 (x) − T rn (x)0 π| = O(rn−αd ) x∈X

Next, I apply the following result to obtain the convergence rates of series estimators 21

of d0 (x). These estimators are infeasible, because in practice d0 (x) is unknown, but these convergence rates will be used in my proofs. ˆ = Lemma 4 (Li and Racine (2007), Theorem 15.1). Let di = d0 (Xi ), π ˆ = (T 0 T )−1 T 0 d, d(x) ˆ i ). Under Assumptions 1, 3, and 7, the following is true: T rn (x)0 π ˆ , and dˆi = d(X (a) supx∈X

  p −αd ˆ |d(x) − d0 (x)| = Op ζ(rn )( rn /n + rn ) (dˆi − di )2 = Op (rn /n + rn−2αd )

(b)

1 n

Pn

(c)

R

˜ − d0 (x))2 dF (x) = Op (rn /n + r−2αd ) (d(x) n

i=1

Next, I impose an assumption that parallels Assumption 4. Assumption 8. There exists α > 0 such that

sup |f (x, θ∗ , h∗ ) − W mn (x)0 β1∗ | = O(m−α n ) x∈X

Now I impose an assumption that mimics the results of Lemma 2 and requires the semiparametric estimators to converge to the pseudo-true values under the local alternative at the same rates as to the true values under the null. ˜ h) ˜ = W mn (x)0 β˜1 , and Assumption 9. Let f ∗ (x) = fn (x, θ∗ , h∗ ), fi∗ = f ∗ (Xi ), f˜(x) = f (x, θ, f˜i = f˜(Xi ). For some α > 0, the following conditions hold: (a) supx∈X

  p ∗ −α ˜ |f (x) − f (x)| = Op ζ(mn )( mn /n + mn ) (f˜i − fi∗ )2 = Op (mn /n + m−2α n )

(b)

1 n

Pn

(c)

R

(f˜(x) − f ∗ (x))2 = Op (mn /n + mn−2α )

i=1

The next result gives the behavior of the test under local alternatives. For simplicity, I only treat the homoskedastic case, but a similar result can be obtained in the heteroskedastic case at the expense of additional, less transparent, assumptions and tedious derivations. 22

Theorem 5. Assume that series methods are used to estimate the restricted model. Also assume that Assumptions 1, 2, 3, 6, 7, 8, and 9 are satisfied, σ 2 (x) = σ 2 , 0 < σ 2 < ∞, for all x ∈ X , and rate conditions 4.1–4.5 hold. Then ξ − rn d trn = √ → N (δ, 1), 2rn where ξ is as in Equation 3.7 and δ = E[d2i ]/σ 2 .

4.5

Summary of the Results

My results show that the LM type test statistic normalized by the number of restrictions rn that the null hypothesis imposes on the nonparametric model is asymptotically standard normal. The normalization I derive, rn , is different from the normalization used in most seriesbased specification tests for parametric models, which use the total number of parameters in the nonparametric model kn (see equations (2.1) and (2.2) in Hong and White (1995) and Lemma 6.2 in Donald et al. (2003)). This difference can be viewed as a degrees of freedom correction. I will study its practical importance in Section 6, while here I discuss its theoretical significance. My approach differs from the approach used in previous literature in how it copes with a key step in the proof, going from the semiparametric regression residuals ε˜ to the true errors ε. My approach relies on the projection property of series estimators to eliminate the estimation variance and hence only has to deal with the approximation bias. Specifically, it uses the equality ε˜ = MW ε+MW R (see equation A.1 in the Appendix), applies a central limit theorem for U -statistics to the quadratic form in MW ε, and bounds the remainder terms by requiring the approximation error R to be small (see equation A.2 in the Appendix). The conventional approach does not impose any special structure on the model residuals ˆ and βˆ is and uses the equality ε˜ = ε + (g − g˜). In parametric models, g − g˜ = X 0 (β − β), √ n-consistent. This makes it possible to apply a central limit theorem for U -statistics to the ˆ However, quadratic form in ε and bound the remainder terms that depend on X 0 (β − β). 23

in semiparametric models this approach has to deal with both the bias and variance of semiparametric estimators. Specifically, g − g˜ = R + W 0 (β1 − β˜1 ), where R can be viewed as the bias term and W 0 (β1 − β˜1 ) as the variance term. Thus, to bound g − g˜, it is necessary to impose rate conditions that would make both bias and variance go to zero sufficiently fast, and these conditions turn out to be very restrictive. To see this, it is useful to look at the rates that would be permissible with and without the degrees of freedom correction. Usually ζ(k) = O(k 1/2 ) for splines and ζ(k) = O(k) for power series. It can be shown that if splines are used, the rates kn = O(n2/7 ), rn = O(n2/7 ), mn = O(n1/4 ) are permissible in the sense of rate conditions 4.1–4.5 if α ≥ 4. If power series are used, the rates kn = O(n2/9 ), rn = O(n2/9 ), mn = O(n1/5 ) are permissible in the sense of rate conditions 4.1–4.5 if α ≥ 5 Without the degrees of freedom correction, in order for the test to be asymptotically valid, 1/2

mn typically has to be of the order o(kn ). Hence, kn = O(n2/7 ) would require mn = o(n1/7 ) and α ≥ 7 if splines are used. For example, kn = O(n2/7 ) and mn = O(n2/15 ) are permissible if α ≥ 7. If power series are used, kn = O(n2/9 ) would require mn = o(n1/9 ) and α ≥ 9. For example, kn = O(n2/7 ) and mn = O(n2/19 ) are permissible if α ≥ 9. Intuitively, the estimation variance in semiparametric model increases with the number of series terms mn . With the degrees of freedom correction, the estimation variance is explicitly taken into account, so mn can grow fairly fast (almost as fast as kn ). This, in turn, makes it easier to approximate unknown functions and leads to fairly weak requirements on the behavior of unknown functions, as implied by the bounds on α. Without the degrees of freedom correction, mn has to grow very slowly to control the estimation variance. But small mn means that it is difficult to control the bias, and unknown functions have to be very well behaved to make the approximation error sufficiently small. This leads to much more restrictive bounds on α. To summarize, the degrees of freedom correction allows me to use more series terms to estimate the semiparametric model and makes it possible to deal with a wider class of unknown functions that enter semiparametric models.

24

5

Implementation of the Test In this section I describe how to implement the proposed test. The test statistic can be

computed using the following steps: 1. Pick the sequence of approximating functions of x, W mn (x) = (W1 (x), ..., Wmn (x))0 , which will be used to estimate the semiparametric model. Let Wi := W mn (Xi ). 2. Estimate the semiparametric model Yi = f (Xi , θ, h) + εi ≈ Wi0 β1 + εi using series methods. Obtain the estimates β˜1 = (W 0 W )−1 W 0 Y and residuals ε˜i = Yi − Wi0 β˜1 . 3. Pick the sequence of approximating functions of x, T rn (x) = (T1 (x), ..., Trn (x))0 , which will complement W mn (x) to form the matrix P kn (x) = (W mn (x)0 , T rn (x)0 )0 , kn = mn + rn , which corresponds to a general nonparametric model. P kn (x) should be able to approximate any unknown function sufficiently well. Common choices of basis functions include power series (see Equation 3.3) and splines (see Equation 3.4). Let Ti := T rn (Xi ) and Pi := P kn (Xi ). 4. Compute the quadratic form ξ = ε˜0 P (˜ σ 2 P 0 P )−1 P 0 ε˜, where P := (P1 , ..., Pn )0 . Note that in the homoskedastic case ξ can be computed as nR2 from the regression of ε˜i on Pi .7 The residuals from the regression of ε˜i on Pi are given by e = ε˜− P (P 0 P )−1 P 0 ε˜, so that e0 e = ε˜0 ε˜ − ε˜P (P 0 P )−1 P 0 ε˜ Then   e0 e ε˜P (P 0 P )−1 P 0 ε˜ nR = n 1 − 0 =n = ε˜P (˜ σ 2 P 0 P )−1 P 0 ε˜ ε˜ ε˜ ε˜0 ε˜ 2

Note that ξ can also be computed as the overidentifying restrictions test statistic from the 2SLS instrumental variables regression of Yi on Wi with (Wi0 , Ti0 )0 as instruments. 7

If the semiparametric model written in a series form includes a constant, both centered and uncentered R can be used because they are identical. If the semiparametric model does not include a constant, the uncentered R2 should be used. 2

25

5. Compute the test statistic which is asymptotically standard normal under the null: ξ − rn a t= √ ∼ N (0, 1) 2rn Reject the null if t > z1−α , the (1 − α)-quantile of the standard normal distribution. a

Alternatively, use the χ2 approximation directly: ξ ∼ χ2 (rn ), reject the null if ξ > χ21−α (rn ), the (1 − α)-quantile of the χ2 distribution with rn degrees of freedom. If the researcher suspects that the errors are heteroskedastic, then ξ = ε˜0 P (˜ σ 2 P 0 P )−1 P 0 ε˜ ˜ T˜)−1 T˜0 ε˜, where Σ ˜ )−1 P 0 ε˜ or ξHC,2 = ε˜0 T˜(T˜0 Σ ˜ = diag(˜ ε2i ) is replaced with ξHC,1 = ε˜0 P (P 0 ΣP and T˜i = Ti − Wi0 (W 0 W )−1 W 0 T . All other steps remain unchanged. Note that in the heteroskedastic case ξHC,1 can be computed as nR2 from the regression ˜ 1/2 P , where Σ ˜ 1/2 = diag(˜ of 1 on Pi ε˜i . The matrix of regressors can be written as Σ εi ). Let ˜ 1/2 ιn = ε˜. Then ιn be a n-vector of ones. Note that Σ e0 e nR = n 1 − 0 ιn ιn 2



 =n

˜ 1/2 ιn ˜ )−1 P 0 Σ ˜ 1/2 P (P 0 ΣP ι0n Σ ˜ )−1 P 0 ε˜ = ε˜0 P (P 0 ΣP n

In turn, ξHC,2 can be computed as nR2 from the regression of 1 on T˜i ε˜i . The matrix of ˜ 1/2 = diag(˜ ˜ 1/2 T˜, where Σ εi ). Let ιn be a n-vector of ones. Note regressors can be written as Σ ˜ 1/2 ιn = ε˜. Then that Σ e0 e nR = n 1 − 0 ιn ιn 2

6



 =n

˜ 1/2 T˜(T˜0 Σ ˜ T˜)−1 T˜0 Σ ˜ 1/2 ιn ι0n Σ ˜ T˜)−1 T˜0 ε˜ = ε˜0 T˜(T˜0 Σ n

Simulations There are several variants of the test, depending on what normalization and what limiting

distribution are used. In this section I study the finite sample performance of different versions of the proposed test in a partially linear model, due to its popularity in applied work (see examples in Section 2). The Monte Carlo studies have several goals: first, to compare the tests 26

based on the χ2 and normal asymptotic approximations; second, to study the importance of the degrees of freedom correction; third, to investigate the test behavior with different sample sizes and different numbers of series terms. I consider two sample sizes, n = 400 and n = 2, 000. I simulate data from two data generating processes: 1. Semiparametric partially linear, which corresponds to H0SP : Yi = 2X1i + g(X2i ) + εi (6.1) g(X2i ) = 3 + 2(exp(X2i ) − 2 ln(X2i + 3))

The left panel of Figure 1 plots the function g(x2 ). 2. Nonparametric, which corresponds to H1 : Yi = 2X1i + g(X2i ) + h(X1i , X2i ) + εi (6.2)

h(X1i , X2i ) = h1 (X1i )h2 (X2i ) h1 (X1i ) = 1.5 cos(X1i − 2),

h2 (X2i ) = 1.25 sin(0.75X2i )

Figure 2 plots the functions h1 (x1 ) and h2 (x2 ). The variance of the null model, 2X1i + g(X2i ), is around 8.5, while the variance of the deviation from the null, h(X1i , X2i ), is around 0.35. Because the semiparametric model is a restricted version of the nonparametric model, I will use the proposed test to check whether it is correctly specified. Before I move on and discuss the behavior of the proposed test in finite samples, I need to make two choices in order to implement the test. First, I need to choose the family of basis functions; second, choose the number of series terms in the restricted and unrestricted models. I use power series because of their simplicity, but I do not have any data-driven methods to choose tuning parameters.

27

The choice of tuning parameters presents a big practical challenge in implementing the proposed test, as well as many other specification tests.8 If one is only interested in estimation, then certain data-driven methods, such as cross-validation, can be used to select the number of series terms.9 However, it is not clear how these data-driven procedures may affect the proposed test and whether using them would lead to any optimality results in testing. I leave the choice of tuning parameters for the proposed test for future research, and in my simulations choose tuning parameters according to the rate conditions in Section 4. However, the choices I make are still arbitrary, because I can multiply mn or kn by any constant and still satisfy the rate conditions, which are asymptotic in nature. To estimate the partially linear model, I use an = b3.25n0.12 c terms in x2 in the series expansion of the unknown function g(x2 ), which leads to mn = an + 1. In the nonparametric model, I include the same number an of series terms in x1 and all products of series terms in x1 and x2 , which leads to kn = a2n . The number of restrictions is given by rn = kn − mn . These choices lead to mn = 7, rn = 29, and kn = 36 when n = 400, and to mn = 9, rn = 55, and kn = 64 when n = 2, 000. Because the behavior of the test statistic depends both on the sample size and the number of series terms, I also consider an intermediate setup with n = 2, 000 but mn = 7, rn = 29, and kn = 36 to separate these two effects. In other words, I first fix the number of series terms and increase the sample size, and then fix the sample size and increase the number of series terms. Throughout my analysis, I use B = 5, 000 simulation draws. 8

For a discussion of regularization parameters choice in the context of kernel-based tests, see a review by Sperlich (2014). 9 For details, see, e.g., Section 15.2 in Li and Racine (2007).

28

6.1

Homoskedastic Errors

First, I investigate the performance of the test when the errors are homoskedastic. I use the ξ test statistic directly: a

ξ = ε˜0 P (˜ σ 2 P 0 P )−1 P 0 ε˜ ∼ χ2 (τn ),

and the normalized t statistic: ξ − τn a tτn = √ ∼ N (0, 1), 2τn with τn = kn and τn = rn . The errors are normally distributed: εi ∼ i.i.d. N (0, 4). With this choice of the distribution of εi , the semiparametric part of the model explains about 2/3 of the dependent variable variance, while the errors account for the remaining 1/3. I repeated my analysis with centered exponential errors, which have an asymmetric distribution, and found no substantial difference from the normal case. The results for exponential errors are omitted for brevity. As advocated by Davidson and MacKinnon (1998), I use graphical methods to investigate the finite sample size and power of my hypothesis test. Figure 3 shows the simulated size of a

a

the tests based on the test statistics ξ ∼ χ2 (τn ) on the left and tτn ∼ N (0, 1) on the right. I vary the nominal test level α from 0 to 0.25 and plot the simulated size against the nominal level. The test controls size well if the resulting curve is close to the 45 degree line. I use three combinations of the sample size and the number of series terms: Setup 1 with n = 400, mn = 7, rn = 29, kn = 36; Setup 2 with n = 2, 000, mn = 7, rn = 29, kn = 36; and Setup 3 with n = 2, 000, mn = 9, rn = 55, kn = 64. I also consider two normalizations of the test statistic: τn = kn and τn = rn . The former normalization corresponds to the approach without the degrees of freedom correction used in the previous literature, while the latter makes the correction. The upper panel of Figure 3 corresponds to the normalization τn = kn . As we can see, this 29

normalization results in severe underrejection. In fact, at the nominal 5% level the test with τn = kn rejects less than 1% of the time. However, as the middle panel illustrates, once the normalization τn = rn is used, the simulated size of the test becomes very close to the nominal size. The test actually overrejects slightly for most nominal levels α, but, as the lower panel 2 = (n − mn )−1 ε˜0 ε˜ shows, a further finite-sample improvement can be achieved by using σ ˜adj

in place of σ ˜ 2 = n−1 ε˜0 ε˜. As far as the choice between the ξ test statistic and the normalized t test statistic is concerned, the former appears to control the size of the test slightly better, especially for the small values of α in Setups 1 and 2. Both tests appear to control the size very well in Setup 2 3, especially in the lower two figures that use σ ˜adj to compute the test statistics. a

Figure 8 shows the simulated power of the tests based on the test statistic tτn ∼ N (0, 1). The left panel plots the simulated power against the nominal size, while the right panel plots the simulated power against the simulated size. The higher the resulting line, the better the power of the test. The upper panel corresponds to the normalization τn = kn . As we can see from the upper left figure, the simulated power of the test is below the nominal size in Setup 1. As expected, it increases in Setups 2 and 3, but as the middle and lower left figures show, the 2 is used. The normalization τn = rn greatly improves power, no matter whether σ ˜ 2 or σ ˜adj

three plots on the right look almost identical, indicating that the relationship between the simulated size and power of the test is very similar, if not identical, for the three versions of the test. This suggests that if there was a method to adjust the size of the test with τn = kn appropriately, the resulting test would have similar performance to the test with τn = rn . Indeed, the bootstrap may be able to achieve that, but because the tests with τn = rn control the size well and have good power, I do not investigate the use of the bootstrap in this paper. Next, we can see from the figure that increasing the sample size from n = 400 to n = 2, 000 while keeping the series term constant at kn = 36 greatly increases power, while increasing the number of series terms from kn = 36 to kn = 64 with the sample size of n = 2, 000

30

noticeably reduces power. This observation calls for a data-driven method to choose the number of series terms. While including too many series terms when they are not necessary can worsen the test performance, including too few series terms can make it difficult to detect certain alternatives,10 which will also lead to low power. Thus, having a data-driven method that would help balance these two considerations and determine the appropriate number of series terms is important, and I plan to study this question in future work.

6.2

Heteroskedastic Errors

In this subsection I investigate the performance of the test when the errors are heteroskedastic. The form of heteroskedasticity is εi ∼ i.n.i.d. N (0, 1+1.75 exp(0.75(X1i +X2i ))). The right panel of Figure 1 illustrates this form of heteroskedasticity. I will consider the test statistic based on a

˜ )−1 P 0 ε˜ ∼ χ2 (τn ), ξHC,1 = ε˜0 P (P 0 ΣP ˆ = diag(˜ where Σ ε2i ), and alternative test statistic based on a ˜ T˜)−1 T˜0 ε˜ ∼ ξHC,2 = ε˜0 T˜(T˜0 Σ χ2 (τn ),

where T˜ = MW T . To make it easier to distinguish between two versions of the test, I will refer to the test based on ξHC,1 as unadjusted, and to the test based on ξHC,2 as adjusted. This is because the latter test accounts for the use of series methods in estimation and adjusts the quadratic form accordingly. ˆ corresponds to the so-called HC0 heteroskedasticityThe aforementioned choice of Σ ˆ adj = n/(n − mn )diag(˜ robust variance estimator. I also use Σ ε2i ), which corresponds to the so-called HC1 variance estimator, which is known to have better finite sample properties (see MacKinnon and White (1985)). 10

The series-based test cannot detect alternatives that are orthogonal to all series terms used to form the test statistic. The more series terms are used to construct the test, the fewer such alternatives exist.

31

The normalized t statistic is then given by

tτn ,HC,j =

ξHC,j − τn a √ ∼ N (0, 1), 2τn

for τn = rn and j = 1, 2. I do not report the results for τn = kn , because they are similar to the previous section, i.e. the test based on τn = kn is severely undersized and low-powered. Instead of comparing the normalizations τn = rn and τn = kn , I compare the feasible test statistics trn ,HC,j with the infeasible test statistics

trn ,HC,j,inf =

ξHC,j,inf − rn √ , 2rn

where j = 1, 2, and

ξHC,1,inf = ε˜0 P (P 0 ΣP )−1 P 0 ε˜, ξHC,2,inf = ε˜0 T˜(T˜0 ΣT˜)−1 T˜0 ε˜,

where Σ = diag(E[ε2i |Xi ]) = diag(1 + 1.75 exp(0.75(X1i + X2i ))). I make this comparison in order to understand whether the behavior of the test under heteroskedastic errors is driven by heteroskedasticity itself or by the use of the estimated variance-covariance matrix instead of the true one. a

Figure 5 shows the simulated size of the tests based on the test statistic trn ,HC,j ∼ N (0, 1). The left panel uses the unadjusted test statistic ξHC,1 , while the right panel uses the adjusted ˆ to compute the test statistic, the middle panel test statistic ξHC,2 . The upper panel uses Σ ˆ adj , and the lower panel uses Σ, which is infeasible in practice. uses Σ As we can see, the tests based on the unadjusted test statistic ξHC,1 and its versions are ˆ Ω|| ˜ = op (rn−1/2 ) severely oversized. This may indicate that the high level assumption that ||Ω− may not hold for the test statistic ξHC,1 . Moreover, this assumption is likely to fail even when Σ is known, because the infeasible unadjusted test overrejects severely as well. But when

32

the test statistic ξHC,2 is used, the simulated size of the test becomes close to the nominal ˆ = diag(˜ level. In fact, in Setups 1 and 3, the test based on ξHC,2 , both with Σ ε2i ) and ˆ adj = n/(n − mn )diag(˜ Σ ε2i ), are slightly undersized. Interestingly, the infeasible adjusted test that uses Σ directly is slightly oversized for the small values of α, so estimating Σ actually helps control the size of the test better. a

Figure 5 shows the simulated power of the tests based on the test statistic trn ,HC,j ∼ N (0, 1) against their nominal size. It uses the same test statistics as Figure 5. It appears that the adjusted test has slightly lower size-corrected power than the unadjusted test, and the infeasible test is less powerful than feasible tests. Based on the evidence presented in this section, I recommend using the adjusted test statistic instead of the unadjusted one, because the adjusted test controls the test size better. Moreover, the size distortion of the adjusted test seems to be caused by the variancecovariance matrix estimation, and this distortion may disappear in large samples, when the variance matrix can be estimated more precisely. At the same time, the unadjusted test statistic is oversized even when the form of heteroskedasticity is known, so the size distortion might remain severe even with large sample sizes. Better ability of the adjusted test to control the size comes at the cost of slightly lower size-adjusted power, but the difference in power between the adjusted and unadjusted tests decreases as the sample size grows.

6.3

Alternatives against Which the Test Has Limited Power

The alternatives considered above are “well-behaved” in the sense that they are smooth and can be captured by series approximating functions. In this section, I study the finite sample behavior of the test when the alternatives are nonsmooth or discontinuous. Because it is difficult or even impossible to approximate such functions well using series, the test should have low power against them. I consider two nonparametric data generating processes, which are shown in Figure 7. One alternative is continuous but nonsmooth, while the other one is discontinuous. The 33

variance of the deviation from the null is around 0.35 in both cases, which is comparable to the variance of the deviation from the null in the smooth case considered before. Thus, any possible difference in power from the smooth case will be caused by the difference in the shape of the deviations from the null, not by how variable these deviations are. Figure 8 plots the simulated size of the test on the test statistic trn for the two types of alternatives I consider. As we can see, despite the degrees of freedom correction, the simulated power of the test in Setup 1, when n = 400, is almost exactly equal to the simulated size. Though the power increases in Setups 2 and 3, when n = 2, 000, it still remains fairly low, below 20% when the simulated size is equal to 5%. These results demonstrate that the test has very limited power against alternatives that cannot be captured well by series methods.

7

Empirical Example In this section, I apply the proposed test to the Canadian household gasoline consumption

data from Yatchew and No (2001). They estimate gasoline demand (y), measured as the logarithm of the total distance driven in a given month, as a function of the logarithm of the gasoline price (P RICE), the logarithm of the household income (IN COM E), the logarithm of the age of the primary driver of the car (AGE), and other variables (z), which include the logarithm of the number of drivers in the household (DRIV ERS), the logarithm of the household size (HHSIZE), an urban dummy, a dummy for singles under 35 years old, and monthly dummies. Yatchew and No (2001) employ several demand models, including semiparametric specifications. They use differencing (see Yatchew (1997) and Yatchew (1998) for details) to estimate semiparametric models. The relevance of semiparametric models in gasoline demand estimation was first pointed out by Hausman and Newey (1995) and Schmalensee and Stoker (1999). Yatchew and No (2001) follow these papers in using semiparametric specifications and pay special attention to specification testing. However, they only compare

34

semiparametric specifications with parametric ones, while I use series methods to estimate semiparametric specifications and implement the proposed specification test to assess their validity as compared to a general nonparametric model. I focus on the model that is nonparametric in AGE but parametric in P RICE and IN COM E (roughly corresponds to Model 3.4 in Yatchew and No (2001)):

y = α1 P RICE + α2 IN COM E + g(AGE) + z 0 β + ε

(7.1)

In this model, the relationship between gasoline demand, price, and household income has a familiar log-log form11 that could be derived from a Cobb-Douglas utility function. The age of the primary driver enters the model nonparametrically to allow for possible nonlinearities, while the remaining demographics enter the model linearly. Next, I investigate whether demand model 7.1 is flexible enough as compared to a more general model. In order to apply my test, I need to choose the series functions P kn (x) or, equivalently, define a nonparametric alternative that can be approximated by these series functions. Ideally, I would want to use a fully nonparametric alternative y = h(P RICE, AGE, IN COM E, z)+ε. However, this is impractical in the current setting: the dataset from Yatchew and No (2001) contains 12 monthly dummies, an urban dummy, and a dummy for singles under 35 years old. The fully nonparametric alternative would require me to completely saturate the model with the dummies, i.e. interact all series terms in continuous regressors with a full set of dummies. This would be equivalent to dividing the dataset into 12 · 2 · 2 = 48 bins and estimating the model within each bin separately. Given that the total number of observations is 6,230, this would leave me with about 125 observations per bin on average and would make semiparametric estimation and testing problematic. I choose a different approach to deal with the dummies. I maintain the assumption that the nonparametric alternative is separable in the continuous variables and the dummies. 11

Recall that y, P RICE, and IN COM E are the logarithms of gasoline demand, price, and household income correspondingly.

35

Separating z into z1 , which includes the logarithm of the number of drivers in the household and the logarithm of the household size, and z2 , which includes the dummies, I consider the nonparametric alternative given by:

y = h(P RICE, AGE, IN COM E, z1 ) + z20 λ + ε

(7.2)

To implement my test, I first need to decide how many series terms in AGE to use to estimate the semiparametric model 7.1. To guide this choice, I use the Mallows’s Cp and the generalized cross-validation, as discussed in Section 15.2 in Li and Racine (2007). Both procedures suggest that an = 3 (not including the constant) power series terms in AGE is optimal. Thus, I use a cubic function of AGE to estimate the model. To construct the regressors P kn used to evaluate the test statistic, I use an = 3 power series terms in AGE, P RICE, and IN COM E, jn = 2 power series terms in DRIV ERS and HHSIZE, and the set of dummies discussed above. I then use pairwise interactions (tensor products) of univariate power series, and add all possible three, four, and five element interactions between AGE, P RICE, IN COM E, DRIV ERS, and HHSIZE, without using higher powers in these interaction terms to avoid multicollinearity. This gives rise to mn = 21 terms under the null, kn = 110 terms under the alternative, and rn = 89 restrictions. I estimate specification 7.1 using series methods and then compute the test statistic trn given in equation 4.6 and the heteroskedasticity robust test statistic trn ,HC,2 given in equation 4.13. I obtain trn = 0.227 and trn ,HC,2 = 0.526. The critical value at the 5% level equals 1.645, so the null hypothesis that model 7.1 is correctly specified is not rejected. This is in line with the results in Yatchew and No (2001): even though they do not test their semiparametric specifications against a general nonparametric alternative, they find no evidence against a specification similar to 7.1 when compared to the following semiparametric specification: y = g1 (P RICE, AGE) + α2 IN COM E + z 0 β + ε

36

Next, I investigate what would happen if instead of the semiparametric model 7.1 the researcher estimated the following parametric model:

y = α1 P RICE + α2 IN COM E + γ0 + γ1 AGE + z 0 β + ε

(7.3)

This model leads to mn = 19 terms under the null, kn = 110 terms under the altenative, and rn = 91 restrictions. When testing it against specification 7.2, I obtain trn = 3.891 and trn ,HC,2 = 4.065, so that specification 7.3 is rejected at the 5% significance level. Thus, it seems that controlling for AGE flexibly is crucial in this gasoline demand application, and specification 7.1 used in Yatchew and No (2001) is flexible enough yet parsimonious. Finally, I use the proposed test to compare specification 7.3 with the following semiparametric alternative:

y = α1 P RICE + α2 IN COM E + h1 (AGE, z1 ) + z20 λ + ε,

(7.4)

which is more restricted than the nonparametric model 7.2. In this case, P kn includes only series terms in AGE and z1 and their interactions. It results in kn = 40 terms under the alternative and rn = 21 restrictions. I obtain trn = 18.902 and trn ,HC,2 = 10.187, so that specification 7.3 is again rejected at the 5% significance level. Therefore, restricting the class of alternatives can improve the power of the test if the true model is close to the conjectured restricted class. However, this will also result in the loss of consistency against alternatives that do not belong to the restricted set. For practical purposes, if the null model under consideration can be nested in several more general models, it may make sense to test it against these several alternatives simultaneously and apply the Bonferroni correction to control the test size. Using a general nonparametric alternative will result in consistency, while using more restrictive alternatives may result in better power if these alternatives are close to being correct. My simulations (not reported to conserve space) suggest that such an approach may indeed be beneficial. 37

8

Conclusion In this paper, I develop a new Largange Multiplier type specification test for semiparamet-

ric conditional mean models. The proposed test achieves consistency by turning a conditional moment restriction into a growing number of unconditional moment restrictions using series methods. Because the number of series terms grows with the sample size, the usual asymptotic theory for the parametric Largange Multiplier test is no longer valid. I develop a new asymptotic theory that explicitly allows the number of terms to grow and show that the normalized test statistic converges in distribution to the standard normal. The proposed test has several attractive features compared to the existing tests. First, the proposed test is simple to implement. The test statistic is based on a quadratic form in the semiparametric regression residuals, so only estimation of the restricted model is required to compute the test statistic. In the homoskedastic case, the quadratic form on which the test is based can be computed as nR2 from the regression of the semiparametric residuals on the series terms used to construct the test. There are also regression based ways to compute the heteroskedasticity robust version of the test statistic. Moreover, the asymptotic distribution of the test statistic is pivotal, which facilitates the calculation of appropriate critical values. Second, the projection property of series estimators makes it possible to explicitly account for the estimation variance and obtain refined asymptotic results. This refinement can be thought of as a degrees of freedom correction. This adjustment allows me to obtain asymptotic results for the proposed test under mild rate conditions and improves the finite sample behavior of the test. Third, series methods make it easy to restrict the class of alternatives from a fully nonparametric to certain semiparametric classes, such as additive, varying coefficient, or partially linear. Doing so will result in the loss of consistency against a general alternative but will improve the power of the test in certain directions. One can combine tests against various alternatives, including a fully nonparametric alternative to maintain consistency and using 38

the Bonferroni correction to control the size of the test. I apply the proposed test to the Canadian household gasoline consumption data from Yatchew and No (2001) and find no evidence against one of the semiparametric specifications used in their paper. However, I show that my test does reject a less flexible parametric model. An important avenue for future research is to develop a data-driven procedure to choose tuning parameters (the number of series terms under the null and alternative). An interesting possibility is to take the maximum of the test statistic over the range of tuning parameters, as suggested for parametric models in Horowitz and Spokoiny (2001) or Gørgens and W¨ urtz (2012). I plan to study whether this approach can be extended to semiparametric models in future work.

39

Appendix A

Tables and Figures

Figure 1: Nonparametric Part of Partially Linear Model

The left figure shows the function g(x2 ) = 3 + 2(exp(x2 ) − 2 ln(x2 + 3)) from equation 6.1 in the partially linear semiparametric model used in simulations. The right figure illustrates the form of heteroskedasticity εi ∼ i.n.i.d. N (0, 1 + 1.75 exp(0.75(X1i + X2i ))). It plots the variance σ 2 (x) = 1 + 1.75 exp(0.75(x1 + x2 )) as a function of x1 + x2 .

Figure 2: Deviation from H0 in Nonparametric Model

The left figure shows the deviation from H0 along the x1 axis: h1 (x1 ) = 1.5 cos(x1 − 2). The right figure shows the deviation from H0 along the x2 axis: h2 (x2 ) = 1.25 sin(0.75x2 ). The total deviation is h(x1 , x2 ) = h1 (x1 )h2 (x2 ).

40

Figure 3: Nominal and Simulated Size of the Test

This figure plots the simulated test size against the nominal size for α ∈ [0, 0.25]. The errors εi are i.i.d. a N (0, 4). The upper left figure corresponds to the test statistic ξ = ε˜0 P (˜ σ 2 P 0 P )−1 P 0 ε˜ ∼ χ2 (kn ), where ε˜ are the residuals from the partially linear model and σ ˜ 2 = n−1 ε˜0 ε˜. The upper right figure corresponds ξ−kn a to the test statistic tkn = √ ∼ N (0, 1). The middle left figure corresponds to the test statistic ξ = 2k a

n

a

ξ−rn σ 2 P 0 P )−1 P 0 ε˜ ∼ χ2 (rn ). The middle right figure corresponds to the test statistic trn = √ ε˜0 P (˜ ∼ N (0, 1). 2rn 0 2 0 −1 0 2 The lower left figure corresponds to the test statistic ξadj = ε˜ P (˜ σadj P P ) P ε˜, where σ ˜adj = (n − mn )−1 ε˜0 ε˜. ξadj −rn a The lower right figure corresponds to the test statistic trn ,adj = √2r ∼ N (0, 1). n The dotted line shows the 45 degree line. The dash-dotted line corresponds to Setup 1 with n = 400 observations, mn = 7 parameters under H0 , rn = 29 restrictions, kn = 36 parameters under H1 . The dashed line corresponds to Setup 2 with n = 2, 000, mn = 7, rn = 29, kn = 36. The solid line corresponds to Setup 3 with n = 2, 000, mn = 9, rn = 55, kn = 64. The results are based on B = 5, 000 simulation draws.

41

Figure 4: Simulated Power of the Test

This figure plots the simulated test power for α ∈ [0, 1]. The figures on the left plot the simulated power against the nominal size, while the figures on the right plot the simulated power against the simulated size. ξ−kn a The errors εi are i.i.d. N (0, 4). The upper two figures correspond to the test statistic tkn = √ ∼ N (0, 1), 2kn 0 2 0 −1 0 2 −1 0 where ξ = ε˜ P (˜ σ P P ) P ε˜, ε˜ are the residuals from the partially linear model, and σ ˜ = n ε˜ ε˜. The ξ−rn a √ middle two figures correspond to the test statistic trn = 2r ∼ N (0, 1). The lower two figure correspond to ξ

−r

n

a

n 2 √ the test statistic trn ,adj = adj ∼ N (0, 1), where σ ˜adj = (n − mn )−1 ε˜0 ε˜. 2rn The dotted line shows the 45 degree line. The dash-dotted line corresponds to Setup 1 with n = 400 observations, mn = 7 parameters under H0 , rn = 29 restrictions, kn = 36 parameters under H1 . The dashed line corresponds to Setup 2 with n = 2, 000, mn = 7, rn = 29, kn = 36. The solid line corresponds to Setup 3 with n = 2, 000, mn = 9, rn = 55, kn = 64. The results are based on B = 5, 000 simulation draws.

42

Figure 5: Nominal and Simulated Size of the Test, Heteroskedastic Errors

This figure plots the simulated test size against the nominal size for α ∈ [0, 0.25]. The errors εi are i.n.i.d. N (0, Σ) with Σ = diag(1 + 1.75 exp(0.75(X1i + X2i ))). The upper left figure corresponds to the test statistic ξ −rn a ˜ )−1 P 0 ε˜, ε˜ are the residuals from the partially linear √ trn ,HC,1 = HC,1 ∼ N (0, 1), where ξHC,1 = ε˜0 P (P 0 ΣP 2rn ξ −rn a ˜ = diag(˜ √ model, and Σ ε2i ). The upper right figures corresponds to the test statistic trn ,HC,2 = HC,2 ∼ 2rn 0 ˜ ˜ 0 ˜ ˜ −1 ˜ 0 0 −1 0 ˜ N (0, 1), where ξHC,2 = ε˜ T (T ΣT ) T ε˜ and T = MW T = T − W (W W ) W T . The middle left figure corresponds to the test statistic trn ,HC,1,adj and the middle right figure corresponds to the test statistic n ˜ with Σ ˜ adj = trn ,HC,2,adj , which replace Σ ε2i ) in the definitions above. The lower left figure n−mn diag(˜ ξHC,1,inf −rn a √ corresponds to the test statistic trn ,HC,1,inf = ∼ N (0, 1), where ξHC,1,inf = ε˜0 P (P 0 ΣP )−1 P 0 ε˜. 2r n

ξ

−r

a

n √ The lower right figure corresponds to the test statistic trn ,HC,2,inf = HC,2,inf ∼ N (0, 1), where ξHC,2,inf = 2rn 0 ˜ ˜ 0 ˜ −1 ˜ 0 ε˜ T (T ΣT ) T ε˜. The dotted line shows the 45 degree line. The dash-dotted line corresponds to Setup 1 with n = 400 observations, mn = 7 parameters under H0 , rn = 29 restrictions, kn = 36 parameters under H1 . The dashed line corresponds to Setup 2 with n = 2, 000, mn = 7, rn = 29, kn = 36. The solid line corresponds to Setup 3 with n = 2, 000, mn = 9, rn = 55, kn = 64. The results are based on B = 5, 000 simulation draws.

43

Figure 6: Simulated Power of the Test, Heteroskedastic Errors

This figure plots the simulated test power against the simulated size for α ∈ [0, 1]. The errors εi are i.n.i.d. N (0, Σ) with Σ = diag(1 + 1.75 exp(0.75(X1i + X2i ))). The upper left figure corresponds to the test statistic ξ −rn a ˜ )−1 P 0 ε˜, ε˜ are the residuals from the partially linear √ trn ,HC,1 = HC,1 ∼ N (0, 1), where ξHC,1 = ε˜0 P (P 0 ΣP 2rn ξ −rn a ˜ = diag(˜ √ model, and Σ ε2i ). The upper right figures corresponds to the test statistic trn ,HC,2 = HC,2 ∼ 2rn 0 ˜ ˜ 0 ˜ ˜ −1 ˜ 0 0 −1 0 ˜ N (0, 1), where ξHC,2 = ε˜ T (T ΣT ) T ε˜ and T = MW T = T − W (W W ) W T . The middle left figure corresponds to the test statistic trn ,HC,1,adj and the middle right figure corresponds to the test statistic n ˜ with Σ ˜ adj = trn ,HC,2,adj , which replace Σ ε2i ) in the definitions above. The lower left figure n−mn diag(˜ ξHC,1,inf −rn a √ corresponds to the test statistic trn ,HC,1,inf = ∼ N (0, 1), where ξHC,1,inf = ε˜0 P (P 0 ΣP )−1 P 0 ε˜. 2r n

ξ

−r

a

n √ The lower right figure corresponds to the test statistic trn ,HC,2,inf = HC,2,inf ∼ N (0, 1), where ξHC,2,inf = 2rn 0 ˜ ˜ 0 ˜ −1 ˜ 0 ε˜ T (T ΣT ) T ε˜. The dotted line shows the 45 degree line. The dash-dotted line corresponds to Setup 1 with n = 400 observations, mn = 7 parameters under H0 , rn = 29 restrictions, kn = 36 parameters under H1 . The dashed line corresponds to Setup 2 with n = 2, 000, mn = 7, rn = 29, kn = 36. The solid line corresponds to Setup 3 with n = 2, 000, mn = 9, rn = 55, kn = 64. The results are based on B = 5, 000 simulation draws.

44

Figure 7: Nonstandard Alternatives

The left figure shows the deviation from H0 along the x1 axis in the nonsmooth case. The left figure shows the deviation from H0 along the x1 axis in the discontinuous case.

Figure 8: Simulated Power of the Test, Nonstandard Alternatives

This figure plots the simulated test power against the simulated size for α ∈ [0, 1]. The errors εi are i.i.d. ξ−rn a ∼ N (0, 1), where ξ = ε˜0 P (˜ σ 2 P 0 P )−1 P 0 ε˜, ε˜ N (0, 4). Both figures correspond to the test statistic trn = √ 2rn 2 −1 0 are the residuals from the partially linear model, and σ ˜ = n ε˜ ε˜. The dotted line shows the 45 degree line. The dash-dotted line corresponds to Setup 1 with n = 400 observations, mn = 7 parameters under H0 , rn = 29 restrictions, kn = 36 parameters under H1 . The dashed line corresponds to Setup 2 with n = 2, 000, mn = 7, rn = 29, kn = 36. The solid line corresponds to Setup 3 with n = 2, 000, mn = 9, rn = 55, kn = 64. The results are based on B = 5, 000 simulation draws.

45

Appendix B

Proofs

Proof of Theorem 1 Given the projection nature of series estimators, the test statistic becomes ξ − rn ε˜0 P (˜ σ 2 P 0 P )−1 P 0 ε˜ − rn (ε + R)0 MW P (˜ σ 2 P 0 P )−1 P 0 MW (ε + R) − rn √ √ √ = = 2rn 2rn 2rn Note that P = (W inverse formula,

T ), so that MW P = (0n×mn

(A.1)

MW T ). Then by the blockwise matrix

MW P (˜ σ 2 P 0 P )−1 P 0 MW = MW T (˜ σ 2 T 0 MW T )−1 T 0 MW Thus, σ 2 T˜0 T˜)−1 T˜0 (ε + R) − rn ξ − rn (ε + R)0 MW T (˜ σ 2 T 0 MW T )−1 T 0 MW (ε + R) − rn (ε + R)0 T˜(˜ √ √ √ = = , 2rn 2rn 2rn where T˜ = MW T . The remainder of the proof consists of several steps. √ Step 1. Show that ||T˜0 T˜/n − T 0 T /n|| = op (1/ rn ). Note that T˜0 T˜/n = T 0 T /n − (T 0 W/n)(W 0 W/n)−1 (W 0 T /n) Thus, ||T˜0 T˜/n − T 0 T /n|| = ||(T 0 W/n)(W 0 W/n)−1 (W 0 T /n)|| By Lemma 15.2 in Li and Racine (2007), E[||P 0 P/n − Ikn ||2 ] = Op (ζ(kn )2 kn /n) and p ||P 0 P/n − Ikn || = Op (ζ(kn ) kn /n). Similarly, it can be shown that ||W 0 W/n − Imn || = p p Op (ζ(mn ) mn /n) and ||T 0 T /n − Irn || = Op (ζ(rn ) rn /n). Moreover, note that 0

P P/n − Ikn =

! W 0 W/n W 0 T /n − T 0 W/n T 0 T /n

Imn 0rn ×mn

0mn ×rn Irn

!

p Hence, ||W 0 T /n|| = Op (ζ(kn ) kn /n). Thus, the eigenvalues of W 0 W/n are bounded below and above w.p.a.1, and 0 0 0 −1 0 0 (T W/n)(W W/n) (W T /n) ≤ C (T W/n)(W T /n) ≤ C (T 0 W/n) (W 0 T /n) = Op (ζ(kn )2 kn /n),

46

where the last inequality is due to the fact that ||AB||2 ≤ ||A||2 ||B||2 (see, e.g., Trefethen and Bau III (1997), p. 23). √ √ If condition 4.1 holds, ζ(kn )2 kn rn /n → 0, and thus ||T˜0 T˜/n − T 0 T /n|| = op (1/ rn ). This also implies that the smallest and largest eigenvalues of T˜0 T˜/n converge to one. Step 2. Decompose the test statistic and bound the remainder terms. (ε + R)0 T˜(˜ σ 2 T˜0 T˜)−1 T˜0 (ε + R) = ε0 T˜(˜ σ 2 T˜0 T˜)−1 T˜0 ε + 2R0 T˜(˜ σ 2 T˜0 T˜)−1 T˜0 ε + R0 T˜(˜ σ 2 T˜0 T˜)−1 T˜0 R (A.2) By the projection inequality and Assumption 4, R0 T˜(˜ σ 2 T˜0 T˜)−1 T˜0 R ≤ R0 R/˜ σ 2 = Op (nmn−2α ) Because T˜0 T˜/n and T˜T˜0 /n have the same nonzero eigenvalues and all eigenvalues of T˜0 T˜/n converge to one, λmax (T˜T˜0 /n) converges in probability to 1. Thus, X 0 ˜ 2 ˜0 ˜ −1 ˜0 σ T T ) T ε ≤ Cλmax (T˜T˜0 /n)R0 ε ≤ CR0 ε = C Ri εi R T (˜ i

Note that  E

!2  X i

Ri εi

=E

"

# XX i

εi εj Ri Rj = E

j

"

# X

Ri2 ε2i

i

= nE[Ri2 ε2i ] = nσ 2 E[Ri2 ] ≤ nσ 2 sup R(x)2 = O(nm−2α n ) x∈X

by Assumption 4. Hence, 0 ˜ 2 ˜0 ˜ −1 ˜0 X Ri εi = Op (n1/2 m−α σ T T ) T ε ≤ C R T (˜ n ) i

Thus, 1/2 −α mn ) σ 2 T˜0 T˜)−1 T˜0 ε + Op (nm−2α (ε + R)0 T˜(˜ σ 2 T˜0 T˜)−1 T˜0 (ε + R) = ε0 T˜(˜ n ) + Op (n

Next, ε0 T˜(˜ σ 2 T˜0 T˜)−1 T˜0 ε = ε0 T (˜ σ 2 T˜0 T˜)−1 T 0 ε − 2ε0 PW T (˜ σ 2 T˜0 T˜)−1 T 0 ε + ε0 PW T (˜ σ 2 T˜0 T˜)−1 T 0 PW ε

47

Note that E[(n−1 ε0 T )Ω−1 (n−1 T 0 ε)] = E[ε2i Ti0 Ω−1 Ti ]/n = E[tr(Ω−1 ε2i Ti Ti0 )] = tr(Irn )/n = rn /n Thus, by Markov’s inequality, ||Ω−1 (n−1 T 0 ε)|| ≤ C

p p (n−1 ε0 T )Ω−1 (n−1 T 0 ε) = Op ( rn /n)

Because the eigenvalues of Ω are bounded below and above w.p.a. 1, it is also true that p p ||n T 0 ε|| = Op ( rn /n) and ||n−1 W 0 ε|| = Op ( mn /n). Using this result and the inequality ||AB||2 ≤ ||A||2 ||B||2 , get −1

0 σ 2 T˜0 T˜)−1 T 0 ε = ε0 W (W 0 W )−1 W 0 T (˜ σ 2 T˜0 T˜)−1 T 0 ε ε PW T (˜ = n(ε0 W/n)(W 0 W/n)−1 (W 0 T /n)(˜ σ 2 T˜0 T˜/n)−1 (T 0 ε/n) 0 0 0 0 0 0 ≤ Cn (ε W/n)(W T /n)(T ε/n) ≤ Cn (ε W/n) (W T /n) (T ε/n) p p p p = nOp ( mn /n)Op (ζ(kn ) kn /n)Op ( rn /n) = Op (ζ(kn ) mn kn rn /n) In turn, 0 σ 2 T˜0 T˜)−1 T 0 W (W 0 W )−1 W 0 ε σ 2 T˜0 T˜)−1 T 0 PW ε = ε0 W (W 0 W )−1 W 0 T (˜ ε PW T (˜ 0 0 −1 0 2 ˜0 ˜ −1 0 0 −1 0 = n(ε W/n)(W W/n) (W T /n)(˜ σ T T /n) (T W/n)(W W/n) (W ε/n) ≤ Cn (ε0 W/n)(W 0 T /n)(T 0 W/n)(W 0 ε/n) ≤ Cn (ε0 W/n) (W 0 T /n) (T 0 W/n) (W 0 ε/n) p p p p = nOp ( mn /n)Op (ζ(kn ) kn /n)Op (ζ(kn ) kn /n)Op ( mn /n) = Op (ζ(kn )2 mn kn /n) Thus, under conditions 4.3 and 4.4, ε˜0 P (˜ σ 2 P 0 P )−1 P 0 ε˜ = ε0 T (˜ σ 2 T˜0 T˜)−1 T 0 ε + Op (nmn−2α ) + Op (n1/2 m−α n ) p √ + Op (ζ(kn )2 mn kn /n) + Op (ζ(kn ) mn kn rn /n) = ε0 T (˜ σ 2 T˜0 T˜)−1 T 0 ε + op ( rn )

(A.3)

Step 3. Deal with the leading term. Define Ω = σ 2 E[Ti Ti0 ] = σ 2 Irn . √ Lemma A.1. (Donald et al. (2003), Lemma 6.2) If E[Ti εi ] = 0, E[(εi Ti Ω−1 Ti0 εi )2 ]/(rn n) →

48

0, and rn → ∞, then n(n−1 ε0 T )Ω−1 (n−1 T 0 ε) − rn d √ → N (0, 1) 2rn

(A.4)

All three conditions of this lemma hold. E[Ti εi ] = 0 and rn → ∞ hold trivially, while E[(εi Ti Ω−1 Ti0 εi )2 ] ≤ CE[ε4i ||Ti ||4 ] ≤ CE[||Ti ||4 ] ≤ Cζ(rn )2 rn Under condition 4.5, the assumptions of Lemma A.1 hold. Lemma A.2. Suppose that Assumptions 2, 3, and 4 hold. Moreover, σ 2 (x) = σ 2 for all x ∈ X . Then n(n−1 ε0 T )(˜ σ 2 n−1 T˜0 T˜)−1 (n−1 T 0 ε) − n(n−1 ε0 T )Ω−1 (n−1 T 0 ε) p →0 √ rn

(A.5)

Proof of Lemma A.2. To prove the lemma, I will need an auxiliary result given below. ˜ =σ ˘ =σ ¯ = σ 2 T 0 T /n, Ω = σ 2 E[Ti Ti0 ]. Suppose that Lemma A.3. Let Ω ˜ 2 T˜0 T˜/n, Ω ˜ 2 T 0 T /n Ω Assumptions 2(ii), 3, and 4 are satisfied. Then ˜ − Ω|| ˘ = Op (ζ(kn )2 kn /n) ||Ω ˘ − Ω|| ¯ = Op (rn /n1/2 ) ||Ω ¯ − Ω|| = Op (ζ(rn )r1/2 /n1/2 ) ||Ω n If Assumption 2(i) is also satisfied then 1/C ≤ λmin (Ω) ≤ λmax (Ω) ≤ C, and if ζ(kn )2 kn /n → 1/2 ˜ ≤ λmax (Ω) ˜ ≤ C and 1/C ≤ 0 and ζ(rn )rn /n1/2 → 0, then w.p.a. 1, 1/C ≤ λmin (Ω) ¯ ≤ λmax (Ω) ¯ ≤ C. λmin (Ω) Proof of Lemma A.3 is presented after the remainder of the main proof. Given the result of Lemma A.3, n(n−1 ε0 T )Ω ˜ −1 (n−1 T 0 ε) n(n−1 ε0 T )Ω−1 (n−1 T 0 ε) √ √ − 2rn 2rn n(n−1 ε0 T )(Ω ˜ −1 − Ω−1 )(n−1 T 0 ε) n(n−1 ε0 T )(Ω ˜ −1 − Ω−1 )(n−1 T 0 ε) √ √ = = 2rn 2rn n(n−1 ε0 T )(Ω−1 (Ω ˜ − Ω)Ω ˜ −1 (Ω ˜ − Ω)Ω−1 )(n−1 T 0 ε) n(n−1 ε0 T )(Ω−1 (Ω ˜ − Ω)Ω−1 )(n−1 T 0 ε) √ √ = − 2rn 2rn ˜ − Ω)Ω ˜ −1 (Ω ˜ − Ω)Ω−1 )(n−1 T 0 ε) n(n−1 ε0 T )(Ω−1 (Ω ˜ − Ω)Ω−1 )(n−1 T 0 ε) n(n−1 ε0 T )(Ω−1 (Ω √ √ ≤ + 2rn 2rn 49



˜ − Ω|| + C||Ω ˜ − Ω||2 ) n||Ω−1 n−1 T 0 ε||2 (||Ω √ 2rn p As has been shown above, ||Ω−1 (n−1 T 0 ε)|| = Op ( rn /n). Hence, √ √ ˜ − Ω|| + C||Ω ˜ − Ω||2 ) nOp (rn /n)op (1/ rn ) op ( rn ) n||Ω−1 n−1 T 0 ε||2 (||Ω √ √ = = √ = op (1), 2rn 2rn 2rn

˜ − Ω|| = op (1/√rn ), which holds under rate conditions 4.1–4.2. provided that ||Ω

 

The result of Theorem 1 now follows from equations A.3, A.4, and A.5.

Proof of Lemma A.3. It has been shown in the proof of Theorem 1 that ||T˜0 T˜/n − T 0 T /n|| = p ˜ − Ω|| ˘ = Op (ζ(kn )2 kn /n). Op (ζ(kn )2 kn /n). As long as σ ˜ 2 → σ 2 , this implies ||Ω Note that ε˜ = ε + f − f˜. Due to homoskedasticity, ˘ − Ω|| ¯ = ||(˜ ||Ω σ2 − σ2)

X

Ti Ti0 /n|| ≤ |˜ σ2 − σ2|

i −1

= |n

X

(ε2i

2

− σ ) + 2n

i

X

||Ti ||2 /n

i −1

X

εi (fi − f˜i ) + n−1

i

X

(fi − f˜i )2 |

i

X

||Ti ||2 /n

i

P First, by Chebyshev’s inequality, n−1 i (ε2i − σ 2 ) = Op (n−1/2 ). P Second, by Assumption 2, n−1 i (fi − f˜i )2 = Op (mn /n + mn−2α ). Finally, note that n−1

X

εi (fi − f˜i ) = n−1

i

X

εi (Wi0 (β1 − β˜1 ) + Ri ) = n−1 (β1 − β˜1 )0 W 0 ε + n−1 R0 ε

i

Using the result proved above, n−1 R0 ε = Op (n−1/2 m−α n ). Next, −1 −1 −1 0 0 0 0 0 ˜ ˜ ˜ n (β1 − β1 ) W ε = n (β1 − β1 ) W ε ≤ ||β1 − β1 || n W ε   1/2 1/2 −2α 1/2 /n1/2 = Op (mn /n + m−2α Op (n−1/2 m1/2 n ) n ) = Op mn (mn /n + mn ) Then n−1

X

1/2 −2α 1/2 εi (fi − f˜i ) = Op (n−1/2 m−α /n1/2 n ) + Op mn (mn /n + mn )

i −2α 1/2 = Op m1/2 /n1/2 n (mn /n + mn )



50



Combining the results,  −2α 1/2 /n1/2 = Op (n−1/2 ), σ ˜ 2 − σ 2 = Op (n−1/2 ) + Op (mn /n + mn−2α ) + Op m1/2 n (mn /n + mn ) 1/2

1/2 /n1/2 = o(n−1/2 ) and mn /n + mn−2α = o(n−1/2 ). because mn (mn /n + m−2α n )  ˘ − Ω|| ¯ = Op rn n−1/2 . By Lemma 1, E[||Ti ||2 ] ≤ rn , which yields ||Ω ¯ − Ω|| = ||σ 2 (T 0 T /n − E[Ti T 0 ])||. Next, ||Ω i Thus,

¯ − Ω||2 ] = E[||σ 2 E[||Ω

X

(Ti Ti0 − E[Ti Ti0 ])/n||2 ]

i

= σ E[||Ti Ti0 − E[Ti Ti0 ]||2 ]/n ≤ CE[||Ti ||4 ]/n = Cζ(rn )2 rn /n, 4

p ¯ − Ω|| = Op (ζ(rn ) rn /n). and hence ||Ω The remaining conclusions follow from Lemma A.6 in Donald et al. (2003).



Proof of Theorem 2 Given the projection nature of the series estimators, the test statistic becomes ˜ )−1 P 0 ε˜ − rn ˜ )−1 P 0 MW (ε + R) − rn ε˜0 P (P 0 ΣP (ε + R)0 MW P (P 0 ΣP ξ − rn √ √ √ = = 2rn 2rn 2rn Next, note that P = (W matrix inverse formula,

T ), so that MW P = (0n×mn

MW T ). Then by the blockwise

˜ )−1 P 0 MW = MW T (nΩ) ˆ −1 T 0 MW , MW P (P 0 ΣP where 0˜ ˜ ˜ /n) ˆ = T 0 ΣT ˜ /n − (T 0 ΣW/n)(W Ω ΣW/n)−1 (W 0 ΣT

Thus, ˆ −1 T 0 MW (ε + R) − rn ˆ −1 T˜0 (ε + R) − rn ξ − rn (ε + R)0 MW T (nΩ) (ε + R)0 T˜(nΩ) √ √ √ = = , 2rn 2rn 2rn where T˜ = MW T . The remainder of the proof consists of several steps. ˆ − Ω|| ˜ = op (1/√rn ), where Ω ˜ = T 0 ΣT ˜ . This is not a primitive Step 1. Assume that ||Ω assumption; however, it is difficult to derive more primitive sufficient conditions. Step 2. Use the following auxiliary result: 51

˜ = P ε˜2 Ti T 0 /n, Ω ˘ = P ε2 Ti T 0 /n, Ω ¯ = P σ 2 Ti T 0 /n, Ω = E[ε2 Ti T 0 ], Lemma A.4. Let Ω i i i i i i i i i i i 2 2 where σi = E[εi |Xi ]. Suppose that Assumptions 2(ii), 3, and 4 are satisfied. Then  ˜ − Ω|| ˘ = Op ζ(rn )2 (mn /n + m−2α ) ||Ω n 1/2 1/2 ˘ ¯ ||Ω − Ω|| = Op (ζ(rn )r /n ) n

¯ − Ω|| = Op (ζ(rn )r1/2 /n1/2 ) ||Ω n If Assumption 2(i) is also satisfied then 1/C ≤ λmin (Ω) ≤ λmax (Ω) ≤ C, and if ζ(rn )2 (mn /n+ 1/2 1/2 ˜ ≤ λmax (Ω) ˜ ≤ C and → 0, then w.p.a. 1, 1/C ≤ λmin (Ω) m−2α n ) → 0 and ζ(rn )rn /n ¯ ≤ λmax (Ω) ¯ ≤ C. 1/C ≤ λmin (Ω) ˆ − Ω|| ˜ = op (1), then 1/C ≤ λmin (Ω) ˆ ≤ λmax (Ω) ˆ ≤ C. Moreover, if ||Ω Step 3. Decompose the test statistic and bound the remainder terms. ˆ −1 T˜0 (ε + R) = ε0 T˜(nΩ) ˆ −1 T˜0 ε + 2R0 T˜(nΩ) ˆ −1 T˜0 ε + R0 T˜(nΩ) ˆ −1 T˜0 R (ε + R)0 T˜(nΩ) Because T˜0 T˜/n and T˜T˜0 /n have the same nonzero eigenvalues and all eigenvalues of T˜0 T˜/n ˆ converge to one, λmax (T˜T˜0 /n) converges in probability to 1. Moreover, the eigenvalues of Ω are bounded below and above. Thus, by Assumption 4, ˆ −1 T˜0 R ≤ CR0 (n−1 T˜T˜0 )R/˜ R0 T˜(nΩ) σ 2 ≤ CR0 R/˜ σ 2 = Op (nm−2α n ) Next, 0 ˜ ˆ −1 ˜0 R T (nΩ) T ε ≤ Cλmax (T˜T˜0 /n)R0 ε ≤ CR0 ε = Op (n1/2 m−α n ) In turn, ˆ −1 T˜0 ε = ε0 T (nΩ) ˆ −1 T 0 ε − 2ε0 PW T (nΩ) ˆ −1 T 0 ε + ε0 PW T (nΩ) ˆ −1 T 0 PW ε ε0 T˜(nΩ) Next, 0 ˆ −1 T 0 ε = ε0 W (W 0 W )−1 W 0 T (nΩ) ˆ −1 T 0 ε ε PW T (nΩ) 0 0 −1 0 −1 0 ˆ = n(ε W/n)(W W/n) (W T /n)Ω (T ε/n) ≤ Cn (ε0 W/n)(W 0 T /n)(T 0 ε/n) ≤ Cn (ε0 W/n) (W 0 T /n) (T 0 ε/n) p p p p = nOp ( mn /n)Op (ζ(kn ) kn /n)Op ( rn /n) = Op (ζ(kn ) mn kn rn /n)

52

In turn, 0 0 −1 0 0 −1 0 −1 0 0 −1 0 ˆ ˆ ε P T (n Ω) T P ε = ε W (W W ) W T (n Ω) T W (W W ) W ε W W ˆ −1 (T 0 W/n)(W 0 W/n)−1 (W 0 ε/n) = n(ε0 W/n)(W 0 W/n)−1 (W 0 T /n)Ω ≤ Cn (ε0 W/n)(W 0 T /n)(T 0 W/n)(W 0 ε/n) 0 0 0 0 ≤ Cn (ε W/n) (W T /n) (T W/n) (W ε/n) p p p p = nOp ( mn /n)Op (ζ(kn ) kn /n)Op (ζ(kn ) kn /n)Op ( mn /n) = Op (ζ(kn )2 mn kn /n) Thus, under conditions 4.8 and 4.9 ˜ )−1 P 0 ε˜ = ε0 T (nΩ) ˆ −1 T 0 ε + Op (nm−2α ) + Op (n1/2 m−α ) ε˜0 P (P 0 ΣP n n p 2 0 −1 ˆ T 0 ε + op (√rn ) + Op (ζ(kn ) mn kn /n) + Op (ζ(kn ) mn kn rn /n) = ε T (nΩ)

(A.6)

Step 4. Deal with the leading term. Conditions of Lemma A.1 do not rely on the homoskedasticity assumption. Thus, the √ result of the lemma remains valid even under heteroskedasticity, as long as ζ(rn )2 / n → 0: n(n−1 ε0 T )Ω−1 (n−1 T 0 ε) − rn d √ → N (0, 1) 2rn

(A.7)

Next, as in the proof of Theorem 1, n(n−1 ε0 T )Ω −1 −1 0 −1 0 −1 −1 0 −1 0 −1 −1 −1 0 ˆ ˆ (n T ε) n(n ε T )Ω (n T ε) n(n ε T )(Ω − Ω )(n T ε) √ √ √ − = 2rn 2rn 2rn ≤

ˆ − Ω|| + C||Ω ˆ − Ω||2 ) n||Ω−1 n−1 T 0 ε||2 (||Ω √ 2rn

p Similarly to the proof of Theorem 1, ||Ω−1 (n−1 T 0 ε)|| = Op ( rn /n). Then √ √ ˜ − Ω|| + C||Ω ˜ − Ω||2 ) nOp (rn /n)op (1/ rn ) op ( rn ) n||Ω−1 n−1 T 0 ε||2 (||Ω √ √ = = √ = op (1), 2rn 2rn 2rn ˆ − Ω|| = op (1/√rn ), which holds under rate conditions 4.7, 4.8, and 4.8. provided that ||Ω Hence, ˆ −1 (n−1 T 0 ε) − n(n−1 ε0 T )Ω−1 (n−1 T 0 ε) p n(n−1 ε0 T )Ω →0 √ rn

53

(A.8)

The result of Theorem 2 now follows from equations A.6, A.7, and A.8.



Proof of Lemma A.4. First, ˜ − Ω|| ˘ = || ||Ω = ||

X

Ti Ti0

X

Ti Ti0 (˜ ε2i − ε2i )/n|| = ||

X

  Ti Ti0 (εi + fi − f˜i )2 − ε2i /n||

X  i  i 2 (fi − f˜i )2 + 2εi (fi − f˜i ) /n (fi − f˜i ) + 2εi (fi − f˜i ) /n|| ≤ sup ||Ti ||2 i

i

i

    −2α 1/2 = Op ζ(rn )2 (mn /n + m−2α + Op n−1/2 m1/2 = ζ(rn )2 Op mn /n + m−2α n ) n (mn /n + mn ) n The following two results can be obtained exactly as in Lemma A.6 in Donald et al. (2003): ˘ − Ω|| ¯ = || ||Ω

X

¯ − Ω|| = || ||Ω

X

Ti Ti0 (ε2i − σi2 )/n|| = Op (ζ(rn )

p rn /n)

i

Ti Ti0 ε2i /n − Ω|| = Op (ζ(rn )

p rn /n)

i

Finally, the results about the eigenvalues can also be obtained in the same way as in Lemma A.6 in Donald et al. (2003). 

Proof of Theorem 3 The proof is analogous to the proof of Theorem 2. The only difference is that in Step 1 ˆ is different. The condition ||Ω ˆ − Ω|| ˜ = op (1/√rn ) is imposed of the proof, the definition of Ω directly. Note that ˆ −Ω ˜ = T˜0 Σ ˜ T˜ − T 0 ΣT ˜ = T 0 MW ΣM ˜ W T − T 0 ΣT ˜ Ω 0 ˜ ˜ /n) = (T 0 W/n)(W 0 W/n)−1 (W 0 ΣW/n)(W W/n)−1 (W 0 T /n) − 2(T 0 W/n)(W 0 W/n)−1 (W 0 ΣT ˜ Similarly to the previous proofs, the eigenvalues of (W 0 W/n) and (W 0 ΣW/n) are bounded away from zero w.p.a. 1. Thus, 0 0 ˜ W/n)−1 (W 0 T /n) ≤ C (T 0 W/n) (W 0 T /n) (T W/n)(W 0 W/n)−1 (W 0 ΣW/n)(W = Op (ζ(kn )2 kn /n) = op (rn−1/2 )

54

Next, 0 0 ˜ 0 −1 0˜ 0 (T W/n)(W W/n) (W ΣT /n) ≤ C||(T W/n) (W ΣT /n) p ˜ /n) = Op (ζ(kn ) kn /n) (W 0 ΣT p −1/2 ˜ /n) = Op (ζ(kn ) kn /n). It remains to show that this is op (rn ), e.g. by showing that (W 0 ΣT I plan to derive primitive conditions for this result in future work.

Proof of Theorem 4 ˆ P = (˜ ˆ P = (n−1 P 0 ΣP ˜ ) in the hetDenote Ω σ 2 n−1 P 0 P ) in the homoskedastic case and Ω eroskedastic case. Then under homoskedasticity ˆ −1 (n−1 P 0 ε˜), ξ = n(n−1 ε˜0 P )(˜ σ 2 n−1 P 0 P )−1 (n−1 P 0 ε˜) = n(n−1 ε˜0 P )Ω P while under heteroskedasticity ˜ )−1 (n−1 P 0 ε˜) = n(n−1 ε˜0 P )Ω ˆ −1 (n−1 P 0 ε˜) ξHC,1 = n(n−1 ε˜0 P )(n−1 P 0 ΣP P Note that √

ˆ −1 (n−1 P 0 ε˜) − rn rn n(n−1 ε˜0 P )Ω 1 P ˆ −1 (n−1 P 0 ε˜) + T2 , √ = √ (n−1 ε˜0 P )Ω P n 2rn 2

√ where T2 = −rn /(n 2) → 0. p ˆ −1 (n−1 P 0 ε˜) → Hence, it suffices to show that (n−1 ε˜0 P )Ω ∆. P Next, note that due to the projection nature of the series estimators, ε˜ = MW Y = MW (ε∗ + R∗ ). Hence,  −1 −1 0  ∗ ∗ ˆ −1 (n−1 P 0 ε˜) = n−1 (ε∗ + R∗ )0 MW T Ω ˆ n T M (ε + R ) , (n−1 ε˜0 P )Ω W P ˆ is defined in the statement of Theorem 4. where Ω Note that ˜ W T )−1 (n−1 T 0 MW ε˜) ξHC,2 = n(n−1 ε˜0 MW T )(n−1 T 0 MW ΣM ˆ −1 (n−1 T 0 MW (ε∗ + R∗ )), = n(n−1 (ε∗ + R∗ )0 MW T )Ω ˆ in the statement of Theorem 4. Thus, for all three test for the appropriately defined Ω

55

statistics ξ, ξHC,1 , and ξHC,2 it suffices to show that p

ˆ −1 (n−1 T 0 MW (ε∗ + R∗ )) → ∆ (n−1 (ε∗ + R∗ )0 MW T )Ω Next,   −1 −1 0   −1 −1 0 ˆ ˆ n−1 (ε∗ + R∗ )0 MW T Ω n T MW (ε∗ + R∗ ) = n−1 ε∗0 MW T Ω n T MW ε∗    −1 −1 0  −1 −1 0 ˆ ˆ n T MW R∗ + n−1 R∗0 MW T Ω n T MW ε∗ + n−1 R∗0 MW T Ω Similarly to the proofs of Theorems 1 and 2, but using the fact that supx∈X R∗ (x) = o(1) instead of supx∈X R(x) = O(m−α n ),  −1 −1 0  ˆ n−1 R∗0 MW T Ω n T MW ε∗ ≤ CR∗0 ε∗ /(n˜ σ 2 ) ≤= Op (n−1/2 )op (1) = op (1) and  −1 −1 0  ˆ n−1 R∗0 MW T Ω n T MW R∗ ≤ CR∗0 R∗ /(n˜ σ 2 ) = op (1) Thus,  −1 −1 0  ˆ ˆ −1 (n−1 T 0 MW ε∗ ) + op (1) n−1 (ε∗ + R∗ )0 MW T Ω n T MW (ε∗ + R∗ ) = (n−1 ε∗0 MW T )Ω Next, given that, as shown in the proof of Theorem 1, MW T = T + op (1) and the ˆ are bounded above and below w.p.a. 1, eigenvalues of Ω ˆ −1 (n−1 T 0 MW ε∗ ) = (n−1 ε∗0 T )Ω ˆ −1 (n−1 T 0 ε∗ ) + op (1) (n−1 ε∗0 MW T )Ω Next, −1 ∗0 −1 ∗0 −1 ∗−1 −1 0 ∗ ∗−1 ˆ ∗ ˆ ∗−1 ˆ ∗ ∗−1 −1 0 ∗ ˆ (n ε T )( Ω − Ω )(n T ε ) ≤ (n ε T )(Ω ( Ω − Ω ) Ω ( Ω − Ω )Ω )(n T ε ) ˆ − Ω∗ )Ω∗−1 )(n−1 T 0 ε∗ ) ≤ ||Ω∗−1 n−1 T 0 ε∗ ||2 (||Ω ˆ − Ω∗ || + C||Ω ˆ − Ω∗ ||2 ) = op (1) + (n−1 ε∗0 T )(Ω∗−1 (Ω ˆ −1 (n−1 T 0 ε∗ ) = (n−1 ε∗0 T )Ω∗−1 (n−1 T 0 ε∗ ) + op (1). Thus, (n−1 ε∗0 T )Ω 0 To complete the proof, note that V ar(Ti ε∗i ) ≤ Ω∗ , because Ω∗ = E[ε∗2 i Ti Ti ]. Then E[(n−1 T 0 ε∗ − E[Ti ε∗i ])0 Ω∗−1 (n−1 T 0 ε∗ − E[Ti ε∗i ])] ≤ E[(n−1 T 0 ε∗ − E[Ti ε∗i ])0 V ar(Ti ε∗i )−1 (n−1 T 0 ε∗ − E[Ti ε∗i ])]  = E[tr V ar(Ti ε∗i )−1 (n−1 T 0 ε∗ − E[Ti ε∗i ])(n−1 T 0 ε∗ − E[Ti ε∗i ])0 ] = tr(Irn )/n = rn /n → 0

56

Thus, −1 ∗0 ∗−1 ∗ ∗−1 −1 0 ∗ ∗ 0 ]Ω E[T ε ] T (n ε T )Ω (n T ε ) − E[ε i i i i −1 0 ∗ ∗ 0 ∗−1 −1 0 ∗ ∗ ∗ 0 ∗−1 −1 0 ∗ ∗ ≤ (n T ε − E[Ti εi ]) Ω (n T ε − E[Ti εi ]) + 2 E[εi Ti ]Ω (n T ε − E[Ti εi ]) p p ≤ op (1) + 2 E[ε∗i Ti0 ]Ω∗−1 E[Ti ε∗i ] (n−1 T 0 ε∗ − E[Ti ε∗i ])0 Ω∗−1 (n−1 T 0 ε∗ − E[Ti ε∗i ]) √ = op (1) + 2 ∆op (1) = op (1) p

ˆ −1 (n−1 P 0 ε˜) → ∆. Combining the results above, (n−1 ε˜0 P )Ω P



Proof of Theorem 5 Because series methods are used, it is still the case that ε˜ = MW Y . Under the local alternative, ε˜ = MW Y = MW (gn + ε) = MW (f ∗ + (rn1/4 /n1/2 )d + ε)   = MW W 0 β1∗ + (f ∗ − W 0 β1∗ ) + (rn1/4 /n1/2 )d + ε = MW ε + R + (rn1/4 /n1/2 )d Thus, the test statistic becomes ε˜0 P (˜ σ 2 P 0 P )−1 P 0 ε˜ − rn ξ − rn √ √ = 2rn 2rn 1/4

=

1/4

(ε + R + (rn /n1/2 )d)0 MW P (˜ σ 2 P 0 P )−1 P 0 MW (ε + R + (rn /n1/2 )d) − rn √ 2rn 1/4

1/4

(ε + R + (rn /n1/2 )d)0 MW T (˜ σ 2 T 0 MW T )−1 T 0 MW (ε + R + (rn /n1/2 )d) − rn √ 2rn 1/4 1/4 (ε + R + (rn /n1/2 )d)0 T˜(˜ σ 2 T˜0 T˜)−1 T˜0 (ε + R + (rn /n1/2 )d) − rn √ = , 2rn =

where T˜ = MW T . The remainder of the proof consists of several steps. √ Step 1. It has been shown in the proof of Theorem 1 that ||T˜0 T˜/n − T 0 T /n|| = op (1/ rn ) √ as long as condition 4.1 holds, ζ(kn )2 kn rn /n → 0. This also implies that the smallest and largest eigenvalues of T˜0 T˜/n converge to one. Step 2. Decompose the test statistic and bound the remainder terms. (ε + R + (rn1/4 /n1/2 )d)0 T˜(˜ σ 2 T˜0 T˜)−1 T˜0 (ε + R + (rn1/4 /n1/2 )d) = ε0 T˜(˜ σ 2 T˜0 T˜)−1 T˜0 ε + 2R0 T˜(˜ σ 2 T˜0 T˜)−1 T˜0 ε + R0 T˜(˜ σ 2 T˜0 T˜)−1 T˜0 R + 2(rn1/4 /n1/2 )d0 T˜(˜ σ 2 T˜0 T˜)−1 T˜0 ε + 2(rn1/4 /n1/2 )d0 T˜(˜ σ 2 T˜0 T˜)−1 T˜0 R + (rn1/2 /n)d0 T˜(˜ σ 2 T˜0 T˜)−1 T˜0 d 57

0 ˜ 2 ˜0 ˜ −1 ˜0 2 ˜ 0 ˜ −1 ˜ 0 −2α ˜ σ T T ) T ε = As in the proof of Theorem 1, R T (˜ σ T T ) T R = Op (nmn ) and R T (˜ 0

Op (n1/2 m−α n ). Recall that, because T˜0 T˜/n and T˜T˜0 /n have the same nonzero eigenvalues and all eigenvalues of T˜0 T˜/n converge to one, λmax (T˜T˜0 /n) converges in probability to 1. Thus, as for the fourth term, 1/4 1/2 0 ˜ 2 ˜0 ˜ −1 ˜0 σ T T ) T ε ≤ (rn1/4 /n1/2 ) Cλmax (T˜T˜0 /n)d0 ε (rn /n )d T (˜ ≤ (rn1/4 /n1/2 ) Cd0 ε = (rn1/4 /n1/2 )Op (n1/2 ) = Op (rn1/4 ) As for the fifth term, 1/4 1/2 0 ˜ 2 ˜0 ˜ −1 ˜0 1/4 1/2 0 0 1/4 1/2 0 ˜ ˜ σ T T ) T R ≤ (rn /n ) Cλmax (T T /n)d R ≤ (rn /n ) Cd R (rn /n )d T (˜ By the Cauchy-Schwartz inequality, v ! ! u X u X 1/4 1/2 0 Ri2 /n d2i /n (rn /n )d R ≤ (rn1/4 n1/2 )t i

i

q 1/4 −α 1/2 2 ) = (rn1/4 n1/2 ) Op (m−2α n )E[di ](1 + op (1)) = Op (rn mn n As for the last term, (rn1/2 /n)d0 T˜(˜ σ 2 T˜0 T˜)−1 T˜0 d = (rn1/2 /n)d0 T (˜ σ 2 T˜0 T˜)−1 T 0 d + (rn1/2 /n)d0 PW T (˜ σ 2 T˜0 T˜)−1 T 0 PW d − 2(r1/2 /n)d0 PW T (˜ σ 2 T˜0 T˜)−1 T 0 d n

Using Assumption 6, 1/2 σ 2 T˜0 T˜)−1 T 0 d = (rn1/2 /n) d0 W (W 0 W )−1 W 0 T (˜ σ 2 T˜0 T˜)−1 T 0 d (rn /n)d0 PW T (˜ −1 0 1/2 0 0 −1 0 2 ˜0 ˜ σ T T /n) (T d/n) = rn (d W/n)(W W/n) (W T /n)(˜ ≤ Crn1/2 (d0 W/n)(W 0 T /n)(T 0 d/n) ≤ Crn1/2 (d0 W/n) (W 0 T /n) (T 0 d/n) p p p  = rn1/2 Op ( mn /n)Op (ζ(kn ) kn /n)Op ( rn /n) = Op ζ(kn )rn3/2 mn1/2 kn1/2 /n3/2 Next, 1/2 0 0 2 ˜ 0 ˜ −1 0 1/2 0 −1 0 2 ˜ 0 ˜ −1 0 0 −1 0 (r /n)d P T (˜ σ T T ) T P d = (r /n) d W (W W ) W T (˜ σ T T ) T W (W W ) W d n W W n = rn1/2 (d0 W/n)(W 0 W/n)−1 (W 0 T /n)(˜ σ 2 T˜0 T˜/n)−1 (T 0 W/n)(W 0 W/n)−1 (W 0 d/n) 58







Crn1/2 (d0 W/n)(W 0 T /n)(T 0 W/n)(W 0 d/n)

≤ Crn1/2 (d0 W/n) (W 0 T /n) (T 0 W/n) (W 0 d/n) p p p p = rn1/2 Op ( mn /n)Op (ζ(kn ) kn /n)Op (ζ(kn ) kn /n)Op ( mn /n) = Op (ζ(kn )2 rn1/2 mn kn /n2 ) Thus, under rate condition 4.3, (rn1/2 /n)d0 T˜(˜ σ 2 T˜0 T˜)−1 T˜0 d = (rn1/2 /n)d0 T (˜ σ 2 T˜0 T˜)−1 T 0 d + op (rn1/2 ) Next, similarly to the proof of Theorem 1, 1/2 r1/2 (n−1 d0 T )(n−1 σ 2 ˜ 0 ˜ −1 −1 0 −1 2 0 −1 −1 0 −1 0 ˜ T T ) (n T d) r (n d T )(n σ ˜ T T ) (n T d) n n √ √ − 2rn 2rn   1/2 −1 2 0 2 −1 2 0 −1 2 ˜ 0 ˜ −1 2 ˜ 0 ˜ −1 2 0 −1 −1 0 2 ˜ T T )|| ˜ T T )|| + C||(n σ ˜ T T ) − (n σ ˜ T T ) − (n σ rn ||(n σ ˜ T T ) n T ε|| ||(n σ √ ≤ 2rn √ 1/2 rn Op (rn /n)op (1/ rn ) √ = op (rn1/2 /n) = op (1) = 2rn Next, ˆ σ2, (rn1/2 /n)d0 T (T 0 T )−1 T 0 d/˜ σ 2 = (rn1/2 /n)(d0 T (T 0 T )−1 T 0 )(T (T 0 T )−1 T 0 d)/˜ σ 2 = (kn1/2 /n)dˆ0 d/˜ where dˆ = T (T 0 T )−1 T 0 d are the fitted values from the nonparametric series regression of d on T . By Lemma 4, (dˆ − d)0 (dˆ − d) = Op (n(rn /n + rn−2α )). By the Cauchy-Schwartz inequality and Lemma 4, v ! ! u X X u 0 ˆ d2i /n (dˆi − di )2 /n d (d − d) ≤ nt i

i

q  = n E[d2i ](1 + op (1))Op (n(rn /n + rn−2α )) = Op n(rn /n + rn−2α )1/2 Thus, σ 2 + Op rn1/2 (rn /n + rn−2α )1/2 (rn1/2 /n)d0 T˜(˜ σ 2 T˜0 T˜)−1 T˜0 d = (rn1/2 /n)d0 d/˜  = rn1/2 E[d2i ](1 + op (1))/σ 2 + Op rn1/2 (rn /n + rn−2α )1/2 + op (rn1/2 ) where the last equality is due to the law of large numbers and σ ˜ 2 = σ 2 + op (1).

59



Thus, 1/2 −α ε˜0 P (˜ σ 2 P 0 P )−1 P 0 ε˜ = ε0 T˜(˜ σ 2 T˜0 T˜)−1 T˜0 ε + rn1/2 E[d2i ]/σ 2 + Op (nm−2α mn ) n ) + Op (n  1/2 + Op (rn1/4 ) + Op (rn1/4 m−α ) + Op rn1/2 (rn /n + rn−2α )1/2 + op (rn1/2 ) n n

Next, as has been shown in the proof of Theorem 1, ε˜0 P (˜ σ 2 P 0 P )−1 P 0 ε˜ = ε0 T (˜ σ 2 T˜0 T˜)−1 T 0 ε + Op (nmn−2α ) + Op (n1/2 m−α n ) p + Op (ζ(kn )2 mn kn /n) + Op (ζ(kn ) mn kn rn /n) Under rate conditions 4.3 and 4.4, ε˜0 P (˜ σ 2 P 0 P )−1 P 0 ε˜ = ε0 T (˜ σ 2 T˜0 T˜)−1 T 0 ε + rn1/2 E[d2i ]/σ 2 + op (rn1/2 )

(A.9)

Step 3. Deal with the leading term. The result of Lemma A.1 applies, so that n(n−1 ε0 T )Ω−1 (n−1 T 0 ε) − rn d √ → N (0, 1) 2rn

(A.10)

Next, use the following lemma: Lemma A.5. Suppose that Assumptions 2, 3, and 8 hold Then n(n−1 ε0 T )(˜ σ 2 n−1 T˜0 T˜)−1 (n−1 T 0 ε) − n(n−1 ε0 T )Ω−1 (n−1 T 0 ε) p →0 √ rn

(A.11)

Proof of Lemma A.5. To prove the lemma, I will again need an auxiliary result given in the following lemma. ˜ =σ ˘ =σ ¯ = σ 2 T 0 T /n, Ω = σ 2 E[Ti Ti0 ]. Suppose that Lemma A.6. Let Ω ˜ 2 T˜0 T˜/n, Ω ˜ 2 T 0 T /n Ω Assumptions 2(ii), 3, and 8 are satisfied. Then ˜ − Ω|| ˘ = Op (ζ(kn )2 kn /n) ||Ω ˘ − Ω|| ¯ = Op (rn n−1/2 ) ||Ω ¯ − Ω|| = Op (ζ(rn )r1/2 /n1/2 ) ||Ω n If Assumption 2(i) is also satisfied then 1/C ≤ λmin (Ω) ≤ λmax (Ω) ≤ C, and if ζ(kn )2 kn /n → 1/2 ˜ ≤ λmax (Ω) ˜ ≤ C and 1/C ≤ 0 and ζ(rn )rn /n1/2 → 0, then w.p.a. 1, 1/C ≤ λmin (Ω) ¯ ≤ λmax (Ω) ¯ ≤ C. λmin (Ω) 60

Proof of Lemma A.6 is presented after the remainder of the main proof. Given the result of Lemma A.6, as in the proof of Theorem 1, n(n−1 ε0 T )Ω −1 −1 0 −1 0 −1 −1 0 ˜ ˜ − Ω|| + C||Ω ˜ − Ω||2 ) (n T ε) n(n ε T )Ω (n T ε) n||Ω−1 n−1 T 0 ε||2 (||Ω √ √ √ − ≤ 2rn 2rn 2rn √ √ nOp (rn /n)op (1/ rn ) op ( rn ) √ = = √ = op (1), 2rn 2rn ˜ − Ω|| = op (1/√rn ), which holds under rate conditions which holds under provided that ||Ω rate conditions 4.1–4.2.  The result of Theorem 5 now follows from equations A.9, A.10, and A.11.



Proof of Lemma A.6. It has been shown in the proof of Theorem 1 that ||T˜0 T˜/n − T 0 T /n|| = Op (ζ(kn )2 kn /n). p ˜ − Ω|| ˘ = Op (ζ(kn )2 kn /n). As long as σ ˜ 2 → σ 2 , this implies ||Ω Under the local alternative, ε˜ = ε + (gn − f˜) = ε + (fn∗ − f˜n ) + (rn1/4 /n1/2 )d Thus, σ ˜ 2 = ε˜0 ε˜/n = ε0 ε/n + (fn∗ − f˜n )0 (fn∗ − f˜n )/n + (rn1/2 /n)d0 d/n + 2(f ∗ − f˜n )0 ε/n + 2(r1/4 /n1/2 )(f ∗ − f˜n )0 d/n + 2(r1/4 /n1/2 )d0 ε/n n

n

n

n

P First, by Chebyshev’s inequality, n−1 i (ε2i − σ 2 ) = Op (n−1/2 ). Second, similarly to the proof of Lemma A.3, n−1

X

 ∗ εi (fni − f˜ni ) = Op mn1/2 (mn /n + mn−2α )1/2 /n1/2 ,

i

and n−1

X

∗ (fni − f˜ni )2 = Op (mn /n + mn−2α )

i

Next, by the law of large numbers, (rn1/2 /n)d0 d/n = (rn1/2 /n)E[d(Xi )2 ](1 + op (1)) = Op (rn1/2 /n)

61

In turn, 1/4 1/2 ∗ ˜ 0 1/2 /n1/2 ) − f ) d/n /n )(f (r = Op (rn1/4 (mn /n + m−2α n n n ) n Finally, (rn1/4 /n1/2 )d0 ε/n = (rn1/4 /n1/2 )Op (n−1/2 ) = Op (rn1/4 /n) Combining the results, σ ˜ 2 − σ 2 = Op (n−1/2 ) + Op mn1/2 (mn /n + mn−2α )1/2 /n1/2



+ Op (rn1/2 /n) + Op (rn1/4 (mn /n + mn−2α )1/2 /n1/2 ) = Op (n−1/2 ) ˘ − Ω|| ¯ = Op (rn n−1/2 ). Moreover, by Lemma 1, E[||Ti ||2 ] ≤ rn , which yields ||Ω The remaining conclusions can be proved as in Lemma A.3.

62



References Ackerberg, D. A., K. Caves, and G. Frazer (2015): “Identification Properties of Recent Production Function Estimators,” Econometrica, 83, 2411–2451. Ait-Sahalia, Y., P. J. Bickel, and T. M. Stoker (2001): “Goodness-of-fit tests for kernel regression with an application to option implied volatilities,” Journal of Econometrics, 105, 363 – 412. Bierens, H. J. (1982): “Consistent model specification tests,” Journal of Econometrics, 20, 105 – 134. ——— (1990): “A Consistent Conditional Moment Test of Functional Form,” Econometrica, 58, 1443–1458. Bravo, F. (2012): “Generalized empirical likelihood testing in semiparametric conditional moment restrictions models,” The Econometrics Journal, 15, 1–31. Breusch, T. S. and A. R. Pagan (1980): “The Lagrange Multiplier Test and its Applications to Model Specification in Econometrics,” The Review of Economic Studies, 47, 239–253. Cameron, A. C. and P. K. Trivedi (2005): Microeconometrics: Methods and Applications, Cambridge University Press. Carneiro, P., J. J. Heckman, and E. J. Vytlacil (2011): “Estimating Marginal Returns to Education,” American Economic Review, 101, 2754–81. Chen, X. (2007): “Large Sample Sieve Estimation of Semi-Nonparametric Models,” in Handbook of Econometrics, ed. by J. J. Heckman and E. E. Leamer, Elsevier, vol. 6B, chap. 76, 5549 – 5632. Chen, X. and Y. Fan (1999): “Consistent hypothesis testing in semiparametric and nonparametric models for econometric time series,” Journal of Econometrics, 91, 373 – 401. Chen, X. and D. Pouzo (2015a): “Sieve Wald and QLR Inferences on Semi/Nonparametric Conditional Moment Models,” Econometrica, 83, 1013–1079. ——— (2015b): “Supplement to “Sieve Wald and QLR Inferences on Semi/Nonparametric Conditional Moment Models”,” Econometrica Supplemental Material, 83, 1013–1079. Chernozhukov, V., W. Newey, and A. Santos (2015): “Constrained Conditional Moment Restriction Models,” ArXiv preprint 1509.06311. 63

Davidson, R. and J. G. MacKinnon (1998): “Graphical Methods for Investigating the Size and Power of Hypothesis Tests,” The Manchester School, 66, 1–26. de Jong, R. M. and H. J. Bierens (1994): “On the Limit Behavior of a Chi-Square Type Test If the Number of Conditional Moments Tested Approaches Infinity,” Econometric Theory, 10, 70–90. Delgado, M. A. and W. G. Manteiga (2001): “Significance Testing in Nonparametric Regression Based on the Bootstrap,” The Annals of Statistics, 29, 1469–1507. Donald, S. G., G. W. Imbens, and W. K. Newey (2003): “Empirical likelihood estimation and consistent tests with conditional moment restrictions,” Journal of Econometrics, 117, 55 – 93. Engle, R. F. (1982): “A general approach to Lagrange multiplier model diagnostics,” Journal of Econometrics, 20, 83 – 104. ——— (1984): “Wald, likelihood ratio, and Lagrange multiplier tests in econometrics,” in Handbook of Econometrics, ed. by Z. Griliches and M. D. Intriligator, Elsevier, vol. 2, chap. 13, 775 – 826. Fan, Y. and Q. Li (1996): “Consistent Model Specification Tests: Omitted Variables and Semiparametric Functional Forms,” Econometrica, 64, 865–890. Finan, F., E. Sadoulet, and A. de Janvry (2005): “Measuring the poverty reduction potential of land in rural Mexico,” Journal of Development Economics, 77, 27 – 51. Gao, J., H. Tong, and R. Wolff (2002): “Model Specification Tests in Nonparametric Stochastic Regression Models,” Journal of Multivariate Analysis, 83, 324 – 359. Gong, X., A. van Soest, and P. Zhang (2005): “The effects of the gender of children on expenditure patterns in rural China: a semiparametric analysis,” Journal of Applied Econometrics, 20, 509–527. ¨ Gørgens, T. and A. Wurtz (2012): “Testing a parametric function against a nonparametric alternative in IV and GMM settings,” The Econometrics Journal, 15, 462–489. Gozalo, P. L. (1993): “A Consistent Model Specification Test for Nonparametric Estimation of Regression Function Models,” Econometric Theory, 9, 451–477. Hall, A. (1990): “Lagrange Multiplier Tests for Normality against Seminonparametric Alternatives,” Journal of Business and Economic Statistics, 8, 417–426. 64

Hausman, J. A. and W. K. Newey (1995): “Nonparametric Estimation of Exact Consumers Surplus and Deadweight Loss,” Econometrica, 63, 1445–1476. Hong, Y. and H. White (1995): “Consistent Specification Testing Via Nonparametric Series Regression,” Econometrica, 63, 1133–1159. Horowitz, J. L. and V. G. Spokoiny (2001): “An Adaptive, Rate-Optimal Test of a Parametric Mean-Regression Model Against a Nonparametric Alternative,” Econometrica, 69, 599–631. Koenker, R. and J. A. Machado (1999): “GMM inference when the number of moment conditions is large,” Journal of Econometrics, 93, 327 – 344. Korolev, I. (2018): “LM-BIC Model Selection in Semiparametric Models,” mimeo, Binghamton University, Department of Economics. Lavergne, P. and Q. Vuong (2000): “Nonparametric Significance Testing,” Econometric Theory, 16, 576–601. Levinsohn, J. and A. Petrin (2003): “Estimating Production Functions Using Inputs to Control for Unobservables,” The Review of Economic Studies, 70, 317–341. Li, Q., C. Hsiao, and J. Zinn (2003): “Consistent specification tests for semiparametric/nonparametric models based on series estimation methods,” Journal of Econometrics, 112, 295 – 325. Li, Q. and J. S. Racine (2007): Nonparametric Econometrics: Theory and Practice, Princeton University Press. MacKinnon, J. G. and H. White (1985): “Some heteroskedasticity-consistent covariance matrix estimators with improved finite sample properties,” Journal of Econometrics, 29, 305 – 325. Martins, M. F. O. (2001): “Parametric and semiparametric estimation of sample selection models: an empirical application to the female labour force in Portugal,” Journal of Applied Econometrics, 16, 23–39. McCulloch, J. H. and E. R. Percy (2013): “Extended Neyman smooth goodness-of-fit tests, applied to competing heavy-tailed distributions,” Journal of Econometrics, 172, 275 – 282, latest Developments on Heavy-Tailed Distributions.

65

Newey, W. K. (1985): “Maximum Likelihood Specification Testing and Conditional Moment Tests,” Econometrica, 53, 1047–1070. ——— (1997): “Convergence rates and asymptotic normality for series estimators,” Journal of Econometrics, 79, 147 – 168. Olley, G. S. and A. Pakes (1996): “The Dynamics of Productivity in the Telecommunications Equipment Industry,” Econometrica, 64, 1263–1297. Schmalensee, R. and T. M. Stoker (1999): “Household Gasoline Demand in the United States,” Econometrica, 67, 645–662. Sperlich, S. (2014): “On the choice of regularization parameters in specification testing: a critical discussion,” Empirical Economics, 47, 427–450. Sun, Y. and Q. Li (2006): “An alternative series based consistent model specification test,” Economics Letters, 93, 37 – 44. Trefethen, L. N. and D. Bau III (1997): Numerical Linear Algebra, vol. 50, SIAM. Whang, Y.-J. and D. W. Andrews (1993): “Tests of specification for parametric and semiparametric models,” Journal of Econometrics, 57, 277 – 318. Wooldridge, J. (1987): “A Regression-Based Lagrange Multiplier Statistic that is Robust in the Presence of Heteroskedasticity,” Working papers 478, Massachusetts Institute of Technology (MIT), Department of Economics. Yatchew, A. (1997): “An elementary estimator of the partial linear model,” Economics Letters, 57, 135 – 143. ——— (1998): “Nonparametric Regression Techniques in Economics,” Journal of Economic Literature, 36, 669–721. Yatchew, A. and J. A. No (2001): “Household Gasoline Demand in Canada,” Econometrica, 69, 1697–1709. Yatchew, A. J. (1992): “Nonparametric Regression Tests Based on Least Squares,” Econometric Theory, 8, 435–451.

66

Suggest Documents