Nonparametric Instrumental Variable Estimation in ...

J. Econom. Meth. 2016; 5(1): 153–177

Practitioner’s Corner Philip Shaw*, Michael Andrew Cohen and Tao Chen

Nonparametric Instrumental Variable Estimation in Practice DOI 10.1515/jem-2013-0002 Previously published online November 18, 2015

Abstract: This paper investigates recent developments in the literature on nonparametric instrumental variables estimation and considers the practical importance of the features of these estimators in the context of typically applied econometric models. Our primary focus is on the estimation of econometric models with endogenous regressors, and their marginal effects, without a known functional form. We develop an estimator for the marginal effects and investigate its finite sample performance. We show that when instruments are weak, in the classic sense, the nonparametric estimates of the marginal effect outperform the classic two-stage least squares estimator, even when the model is correctly specified. When the instruments are strong, we show that the nonparametric estimator for the partial effects is still effective compared to the two-stage least squares estimator even as the number of IVs increases. We also investigate bandwidth choice and find that a rule-of-thumb bandwidth performs relatively well. Whereas cross-validation leads to a better fit when the number of instruments is small, as the number of instruments increases the rule-of-thumb bandwidth actually results in a better model fit. In an empirical application we estimate the workhorse aggregate logit demand model, discuss the required nonparametric identification properties, and document the differences between nonparametric and parametric specifications on the estimation of demand elasticities.

Keywords: information regularized estimators; instrumental variables; logit demand model; nonparametric.

JEL codes: C13; C14; C15.

*Corresponding author: Philip Shaw, Fordham University – Economics, 441 E. Fordham Rd., Bronx, NY 10458, USA, E-mail: [email protected]
Michael Andrew Cohen: New York University Stern School of Business – Marketing, 40 West Fourth Street, Suite 914, NY 10012, USA
Tao Chen: University of Waterloo – Economics, Waterloo, Ontario, Canada

Brought to you by | New York University Bobst Library Technical Services Authenticated Download Date | 2/7/16 5:09 PM

1 Introduction

This paper investigates recent developments in the literature on nonparametric instrumental variables estimation and considers the practical importance of the features of these estimators in the context of typically applied econometric models. Our primary focus is on the estimation of econometric models with endogenous regressors, and their marginal effects, without a known functional form. We develop an estimator for the marginal effects and investigate its finite sample performance. We show that when instruments are weak, in the classic sense, the nonparametric estimates of the marginal effect outperform the classic two-stage least squares estimator, even when the model is correctly specified. When the instruments are strong, we show that the nonparametric estimator for the partial effects is still effective compared to the two-stage least squares estimator even as the number of IVs increases. We also investigate bandwidth choice and find that a rule-of-thumb bandwidth performs relatively well. Whereas cross-validation leads to a better fit when the number of instruments is small, as the number of instruments increases the rule-of-thumb bandwidth actually results in a better model fit. In an empirical application we estimate the workhorse aggregate logit demand model, discuss the required nonparametric identification properties, and document the differences between nonparametric and parametric specifications on the estimation of demand elasticities.

Researchers are often interested in estimating functional relationships between a dependent variable y ∈ R and a set of explanatory variables $X \in [0, 1]^{p+m}$, where $X = [y_2 \; x_1]$ with $y_2 \in [0, 1]^p$ and $x_1 \in [0, 1]^m$. If the functional relationship is parametrically specified, for example if y is explained by X according to a linear parametric specification, then it is straightforward to estimate the relationship between the variables using well-known linear parametric estimation techniques. In most cases the functional relationship is not known; therefore, researchers must impose a possibly misspecified functional form on the problem. One way to overcome this problem is to estimate the functional relationship nonparametrically. In this case, the researcher no longer estimates a set of parameters but instead estimates points on the unknown function. To illustrate, consider the following model, which assumes the error term enters additively:

$$
y = \varphi(X) + u \tag{1}
$$



If u is conditionally mean independent of X, E(u|X) = 0, then kernel or sieve methods can be used to estimate φ. However, if E(u|X) ≠ 0, more information is required to identify φ. The parametric solution is to select instruments z such that u is conditionally mean independent of z (E(u|z) = 0) and z is correlated with X. This works well in the parametric case, but using instruments for nonparametric identification is non-trivial. The difficulty arises because the mapping from the distribution of the data to the regression function is not continuous (Kress 1999). Newey and Powell (2003) overcome this ill-posedness by assuming that φ belongs to a compact set and restricting their estimator to that set, which avoids the ill-posed problem directly. A similar approach is followed by Ai and Chen (2003) for a semi-nonparametric estimation problem of the same variety. Both implement a two-stage estimation procedure in which the estimate of φ is given by a series approximation:

$$
\varphi(X) \cong \sum_{j_1=1}^{k_n} \sum_{j_2=1}^{k_n} \cdots \sum_{j_{p+m}=1}^{k_n} \Pi_{j_1, j_2, \ldots, j_{p+m}} \Psi_{j_1}(X_1) \Psi_{j_2}(X_2) \cdots \Psi_{j_{p+m}}(X_{p+m}), \tag{2}
$$

where Ψj are basis functions. Plugging this approximation into the conditional moment restriction gives:

$$
E(u \mid z) = E(y \mid z) - E\left[ \sum_{j_1=1}^{k_n} \sum_{j_2=1}^{k_n} \cdots \sum_{j_{p+m}=1}^{k_n} \Pi_{j_1, j_2, \ldots, j_{p+m}} \Psi_{j_1}(X_1) \Psi_{j_2}(X_2) \cdots \Psi_{j_{p+m}}(X_{p+m}) \,\Big|\, z \right] = 0 \tag{3}
$$

The basic idea is to estimate a vector of parameters for which this condition holds. Using the approach of Ai and Chen (2003), Blundell, Chen, and Kristensen (2007) present one of the first empirical applications of a “sieve” type estimator. Sieve estimation optimizes an empirical counterpart to Equation (3) on a subset of the parameter space and then allows this subset to “grow” with the sample size, which requires kn in Equation (2) to go to infinity at a certain rate. They estimate a shape-invariant Engel curve system that admits a semi-nonparametric form. In their model, demographic scaling parameters enter parametrically and total expenditure is treated as endogenous. Hall and Horowitz (2005) present two methods for estimating φ, based on kernel and series approximating regressions, and derive optimal convergence rates.

Newey and Powell (2003) show that in their framework a condition needed for identification is the completeness of the conditional distribution f(X|z). Severini and Tripathi (2006) show that completeness of the conditional distribution is equivalent to a correlation between the model space and the instrument space. Severini and Tripathi (2006) also explore identification issues with these models and note that pointwise identification can easily fail. Their work provides intuition on how to determine the identified part of the structural function φ. They examine the relationship between identification of the structural function and identification of linear functionals and uncover a connection between them. D’Haultfoeuille (2011) discusses the completeness condition in more detail, showing that different versions of completeness arise depending on the regularity conditions imposed. Andrews (2011) constructs examples of L2-complete distributions, which allow for unbounded regression functions in nonparametric IV regressions. Hoderlein and Holzmann (2011) derive an estimator for the case in which the instruments and the endogenous regressors are jointly normal, conditional on the exogenous regressors. Under their specification, identification amounts to testing a zero condition on the conditional covariance function. They show that their estimator performs reasonably well when the problem is severely ill-posed, as is the case under joint normality.

Darolles et al. (2011) and Gagliardini and Scaillet (2012a) propose estimators that exploit information in the L2 norms of φ and its first derivative by regularizing the second-stage estimates with their Sobolev norm. This regularization penalizes the highly oscillating components to achieve a continuous mapping. They call their estimators Tikhonov regularized estimators (TIR) after the seminal work of Tikhonov (1963), which proposes this type of regularization for ill-posed inverse problems such as the one studied in this literature. In a similar line of work, Gagliardini and Scaillet (2012b) derive an estimator for the structural quantile effects using an estimator similar to the one discussed above. They also prove the pointwise asymptotic normality of their estimator and construct a consistent estimator for the asymptotic variance.

The purpose of this paper is to focus on recent developments in the literature on the nonparametric estimation of functions when one or more of the variables are endogenous. We review two types of estimators that overcome the ill-posed problem: the sieve-based and the kernel-based estimators. Our main focus is on the kernel-based estimator, paying close attention to the estimation of the partial effects of the model.
We present a feasible estimator for the partial effects and show that it performs nearly as well as the parametric estimator, even when the parametric estimator is given the best possible scenario. We also show that the nonparametric instrumental variable derivative estimator dominates the commonly used two-stage least squares estimator when the instruments are weak. Finally, we apply the estimator to an empirical data set to estimate an aggregate logit demand model, which is popular in applied research conducted in many fields of economics, and document the differences between nonparametric and parametric specifications on the estimation of demand elasticities.¹

The paper proceeds as follows. In Section 2 we present the estimators of Blundell, Chen, and Kristensen (2007) and Gagliardini and Scaillet (2012a) and discuss some of the practical considerations associated with implementing these types of estimators. Section 3 discusses how one might recover the partial effects of the model and also discusses instrument relevance. In Section 4 we design Monte Carlo simulations to investigate the finite sample performance of the partial effects estimator, and in Section 5 we discuss the results of the simulations. Section 6 introduces our empirical application and discusses the results.

2 The Estimators

In this section we present the estimators constructed by Blundell, Chen, and Kristensen (2007) and Gagliardini and Scaillet (2012a). Both methods rely on a minimum distance criterion that can be posed in a very general framework. Take the following model:

$$
y = \varphi(y_2, x_1) + u \tag{4}
$$

Under the assumption that E(u|y2, x1) ≠ 0, estimation of φ(y2, x1) by traditional nonparametric methods yields meaningless results. Now assume we observe a set of instrumental variables z = [z1 x1] that satisfy the condition E(u|z) = 0. Taking conditional expectations of Equation (4) given z, we obtain the following equation:

$$
m(\varphi(y_2, x_1), z) = E(y \mid z) - E(\varphi(y_2, x_1) \mid z) = 0 \tag{5}
$$

In operator notation let Tφ = E(φ(y2, x1)|z) and r = E(y|z) so that we can write Equation (5) as:

$$
T\varphi(z) - r(z) = 0 \tag{6}
$$

1 All code used in this paper was written in Matlab and is available upon request.


The problem in Equation (6) is said to be well-posed if the solution exists, is unique, and is continuous in r. Ill-posedness occurs because $T^{-1}$ need not be continuous. One approach to dealing with ill-posedness is regularization, which is generally represented as follows:

$$
\varphi_\lambda = \operatorname*{argmin}_{\varphi} \; \| T\varphi - r \|^2 + \lambda \| \varphi \| \tag{7}
$$



Blundell, Chen, and Kristensen (2007), Darolles et al. (2011), and Gagliardini and Scaillet (2012a) rely on the method above to estimate φλ; however, Blundell, Chen, and Kristensen (2007) allow the exogenous variables x1 to enter parametrically instead of nonparametrically. ‖φ‖ is the norm of the function and its derivatives, commonly referred to as a Sobolev norm, used by both Blundell, Chen, and Kristensen (2007) and Gagliardini and Scaillet (2012a). In practice the penalty term λ‖φ‖ is implemented through a penalization matrix determined by the Sobolev norm of φ, henceforth C, and a scaling parameter λ. An estimate of φ can be found by minimizing ‖Tφ–r‖² subject to the constraint that ‖φ‖ ≤ l. In practice the bound in this constraint may be unknown; therefore, Equation (7) is solved for a given value of λ, as suggested by Blundell, Chen, and Kristensen (2007). Gagliardini and Scaillet (2012a) approximate the mean integrated squared error (MISE) of their estimator and propose that in practice one should choose the penalization parameter λ that minimizes the MISE. Blundell, Chen, and Kristensen (2007) take a different approach, suggesting that one might choose various values of the penalization parameter and display the results for each choice. We follow this approach in the interest of providing a more complete picture of the implications of the penalization parameter choice for model inference.

At this point it is important to note that thus far we have assumed that the conditioning set and the arguments of the function have the common elements x1 in the set z = [z1 x1]. As shown by Carrasco, Florens, and Renault (2007), when the conditioning set contains arguments of the function, the operator T is no longer compact. This means the theory need not apply. This issue is directly addressed by Gagliardini and Scaillet (2012a). They replace Equation (5) with the following:

$$
m(\varphi_{\bar{x}_1}(y_2), z) = E(u \mid z) = E\big( (y - \varphi_{\bar{x}_1}(y_2)) \mid z_1, x_1 = \bar{x}_1 \big) = 0 \tag{8}
$$

By fixing the exogenous variables at a point in their distribution, they recover compactness, which allows them to proceed theoretically.

The main difference between the TIR estimator and the BCK estimator is how each component of Equation (5) is estimated. For example, Blundell, Chen, and Kristensen (2007) estimate both Tφ and r via sieve estimation, while Gagliardini and Scaillet (2012a) use a combination of basis functions and kernel methods to obtain a solution to the empirical counterpart to Equation (7). There are also significant theoretical differences between their estimators; however, it can be shown that in large samples both estimators produce similar results.² Both methods rely on the calculation of the penalization matrix C. Blundell, Chen, and Kristensen (2007) provide the following equation characterizing their calculation of the $k_n^p \times k_n^p$ penalization matrix C:

$$
C_r = \int_{[0,1]^p} [\nabla^r \Psi_{k_n}(y_2)]' \, [\nabla^r \Psi_{k_n}(y_2)] \, dy_2 \tag{9}
$$

where $\Psi_{k_n}(y_2)$ is the complete set of basis functions, which is $N \times k_n^p$. As in Blundell, Chen, and Kristensen (2007), one might choose r = 0 and r = 2 so that C is constructed as:

$$
C = \int_{[0,1]^p} [\Psi_{k_n}(y_2)]' \, [\Psi_{k_n}(y_2)] \, dy_2 + \int_{[0,1]^p} [\nabla^2 \Psi_{k_n}(y_2)]' \, [\nabla^2 \Psi_{k_n}(y_2)] \, dy_2 \tag{10}
$$
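As a rough illustration of the r = 0 term in Equation (10), the sketch below computes C0 for kn = 5 and p = 1. Two assumptions are ours and not the paper's: "standardized" is taken to mean scaling the shifted Chebyshev polynomials by 1/√π (order 0) and √(2/π) (higher orders), and Gauss-Legendre quadrature stands in for the Quasi-Monte Carlo integration described in the text. Under these assumptions the output agrees with the entries of Table 1 to two decimals.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb
from numpy.polynomial.legendre import leggauss


def shifted_cheb_basis(y, k_n):
    """Evaluate orders 0..k_n of the shifted Chebyshev basis at points y in [0, 1].

    Assumption: the basis is scaled by 1/sqrt(pi) for order 0 and
    sqrt(2/pi) for orders >= 1 (orthonormal under the Chebyshev weight).
    """
    y = np.asarray(y, dtype=float)
    basis = cheb.chebvander(2.0 * y - 1.0, k_n)   # columns T_0, ..., T_{k_n}
    scale = np.full(k_n + 1, np.sqrt(2.0 / np.pi))
    scale[0] = np.sqrt(1.0 / np.pi)
    return basis * scale


def penalty_matrix_r0(k_n, n_nodes=20):
    """Approximate C_0 = int_0^1 Psi(y)' Psi(y) dy by Gauss-Legendre quadrature."""
    nodes, weights = leggauss(n_nodes)            # rule defined on [-1, 1]
    y = 0.5 * (nodes + 1.0)                       # map the nodes to [0, 1]
    w = 0.5 * weights                             # rescale the weights accordingly
    psi = shifted_cheb_basis(y, k_n)              # n_nodes x (k_n + 1)
    return psi.T @ (psi * w[:, None])


C0 = penalty_matrix_r0(k_n=5)
print(np.round(C0, 2))
```

The quadrature rule is exact here because every integrand is a polynomial of degree at most 10; for large p a sampling-based rule such as the one the authors describe would be the more practical choice.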

The order of the derivative should be chosen based on the application at hand. Blundell, Chen, and Kristensen (2007) choose the order of the derivative based on the type of underlying function they focus on estimating. In practice, the integration can be quite cumbersome to compute symbolically. For small p and small kn the integration can be done fairly quickly; however, for larger values of p and kn, numerical methods of integration prove useful. For large p and kn we use Quasi-Monte Carlo methods with stratified sampling to compute the integrals. As an example, Table 1 shows the analytical expression for C0 as well as the one obtained under Monte Carlo integration.

2 Simulation results showing this are available upon request.

It should also be noted that in theory the support of the right-hand-side variables is restricted to lie on [0, 1]. This assumption is without loss of generality and serves to simplify the mathematics of the problem. In practice we transform our variables as follows:

$$
y_2 = \omega(\tilde{y}_2) = \frac{\tilde{y}_2 - \min(\tilde{y}_2)}{\max(\tilde{y}_2) - \min(\tilde{y}_2)}
$$

where $\tilde{y}_2$ is the variable measured in the original units.

Blundell, Chen, and Kristensen (2007) show that their estimator $\hat{\varphi}^{BCK}(y_2)$ has a closed form solution given by:

$$
\hat{\Pi}^{BCK}_{\lambda} = \left( \Psi' B (B'B)^{-1} B' \Psi + \lambda C \right)^{-1} \Psi' B (B'B)^{-1} B' y \tag{11}
$$

where B(z) is an $N \times J_{nk}$ complete set of basis functions. We construct $\hat{\varphi}^{BCK}(y_2)$ by multiplying our $N \times k_n^p$ basis functions $\Psi_{k_n}(y_2)$ by the $k_n^p$ vector of coefficients $\hat{\Pi}^{BCK}_{\lambda}$. Similarly, Gagliardini and Scaillet (2012a) show that their estimator also has a closed form solution expressed as:

$$
\hat{\Pi}^{TIR}_{\lambda, \bar{x}_1} = \left( \lambda_N C + \frac{1}{N} \hat{P}'_{\bar{x}_1} \hat{R}_{\bar{x}_1} \right)^{-1} \frac{1}{N} \hat{P}'_{\bar{x}_1} \hat{R}_{\bar{x}_1} \tag{12}
$$

where $\hat{P}_{\bar{x}_1} = \hat{E}(\Psi(y_2) \mid z_1, x_1 = \bar{x}_1)$ and $\hat{R}_{\bar{x}_1} = \Omega(z) \hat{E}(\Psi(y_2) \mid z_1, x_1 = \bar{x}_1)$, where Ω(z) is an optimal weighting matrix. $\hat{E}(\Psi(y_2) \mid z_1, x_1 = \bar{x}_1)$ is just a matrix of fitted values from the nonparametric regression of Ψ(y2) on z1 at some point $\bar{x}_1$ in the support of x1.
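The closed form in Equation (11) is straightforward to code. The sketch below is a minimal illustration under assumptions of our own: a single endogenous regressor, a single instrument, unscaled shifted Chebyshev bases for both Ψ and B, an identity matrix as a placeholder for the Sobolev penalty C, and a simulated data set. The helper names `cheb_design` and `bck_coefficients` are hypothetical.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb


def cheb_design(v, k):
    """N x (k + 1) matrix of shifted Chebyshev polynomials for v in [0, 1]."""
    return cheb.chebvander(2.0 * np.asarray(v, dtype=float) - 1.0, k)


def bck_coefficients(psi, b, y, c, lam):
    """Equation (11): (Psi'B(B'B)^{-1}B'Psi + lam*C)^{-1} Psi'B(B'B)^{-1}B'y."""
    # Project Psi and y on the instrument basis B; least squares avoids
    # forming (B'B)^{-1} explicitly.
    psi_fit = b @ np.linalg.lstsq(b, psi, rcond=None)[0]
    y_fit = b @ np.linalg.lstsq(b, y, rcond=None)[0]
    return np.linalg.solve(psi.T @ psi_fit + lam * c, psi.T @ y_fit)


# Simulated example (all design choices below are illustrative assumptions).
rng = np.random.default_rng(0)
n = 2000
z = rng.uniform(size=n)                            # instrument on [0, 1]
e = rng.normal(scale=0.1, size=n)
y2_raw = 0.2 + 0.6 * z + e                         # endogenous regressor
u = 0.5 * e + rng.normal(scale=0.05, size=n)       # error correlated with y2 via e
y = np.sin(np.pi * y2_raw) + u

y2 = (y2_raw - y2_raw.min()) / (y2_raw.max() - y2_raw.min())  # rescale to [0, 1]
psi = cheb_design(y2, 3)                           # basis for the unknown function
b = cheb_design(z, 5)                              # basis for the instrument
c = np.eye(psi.shape[1])                           # placeholder, not the Sobolev matrix
pi_hat = bck_coefficients(psi, b, y, c, lam=1e-4)
phi_hat = psi @ pi_hat                             # fitted structural function
```

A useful sanity check on the formula: when B = Ψ and λ = 0 the projection is the identity on the column space of Ψ, so the expression collapses to ordinary least squares.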

2.1 A Semi-Nonparametric Approach

In some cases it makes more sense to estimate the model in a semi-nonparametric framework. We follow the approach of Blundell, Chen, and Kristensen (2007), who assume a semi-nonparametric model. Suppose we observe a set of variables x2 that we believe enter the structural equation parametrically, so that:

$$
y = \varphi(y_2, x_1) + \psi(x_2 \theta) + u \tag{13}
$$

where ψ is a known function up to a finite set of unknown parameters θ. Using this modeling assumption, we can rewrite Equation (5) to include the semi-nonparametric component:

$$
m(z) = E(y \mid z) - E(\varphi(y_2, x_1) \mid z) - E(\psi(x_2 \theta) \mid z) = 0 \tag{14}
$$

where x1 and x2 are now subsets of z under the assumption that E(u|z) = 0, while still maintaining the assumption that E(u|y2) ≠ 0. From this we can estimate α = (θ, φ) by finding the minimum of the following expression:

$$
\hat{\alpha} = \operatorname*{argmin}_{\alpha \in A_n} \frac{1}{n} \sum_{i=1}^{n} \hat{m}(z_i, \alpha)' \, \hat{\Omega}(z_i)^{-1} \, \hat{m}(z_i, \alpha) \tag{15}
$$

where $A_n$ is the sieve parameter space, $\hat{\Omega}(z_i)$ is an estimate of the optimal weighting matrix, and $\hat{m}(z)$ is a nonparametric estimate of the moment condition in Equation (14). Blundell, Chen, and Kristensen (2007) recommend starting with the identity matrix for the optimal weight and then estimating $\hat{\alpha}$. One can then estimate the optimal weight matrix and iterate until convergence.

Table 1: $\int_{[0,1]^p} \int_{[0,1]^m} [\Psi_{k_n}(y_2, x_1)]' \, [\Psi_{k_n}(y_2, x_1)] \, dy_2 \, dx_1$ for kn = 5, p = 1, and m = 0.

Analytical solution
   0.32    0.00   –0.15    0.00   –0.03    0.00
   0.00    0.21    0.00   –0.13    0.00   –0.03
  –0.15    0.00    0.30    0.00   –0.12    0.00
   0.00   –0.13    0.00    0.31    0.00   –0.11
  –0.03    0.00   –0.12    0.00    0.31    0.00
   0.00   –0.03    0.00   –0.11    0.00    0.32

Calculated via Monte Carlo simulation
   0.32    0.00   –0.15    0.00   –0.03    0.00
   0.00    0.21    0.00   –0.13    0.00   –0.03
  –0.15    0.00    0.30    0.00   –0.12    0.00
   0.00   –0.13    0.00    0.31    0.00   –0.11
  –0.03    0.00   –0.12    0.00    0.31    0.00
   0.00   –0.03    0.00   –0.11    0.00    0.32

2.2 Basis Functions, Kernel Functions, and Choice of Bandwidth

In this section we briefly discuss the basis functions that we use for the function approximation. Blundell, Chen, and Kristensen (2007) show that the choice of basis function does not appear to be particularly important with respect to the finite sample properties of their estimator. Following Gagliardini and Scaillet (2012a), we choose the standardized shifted Chebyshev polynomials of the first kind. In lower dimensions the use of a tensor product basis poses no real problem:

$$
\Lambda = \left\{ \prod_{i=1}^{p+1} \Psi_{k_i}(X_i) \;\middle|\; k_i = 0, 1, \ldots, k_n \right\} \tag{16}
$$

Since our main focus is the estimation of the unknown function in higher dimensions, we will instead use the complete set of shifted Chebyshev polynomials:

$$
P^{p+1}_{k_n} = \left\{ \Psi_{i_1}(X_1) \Psi_{i_2}(X_2) \cdots \Psi_{i_{p+1}}(X_{p+1}) \;\middle|\; \sum_{l=1}^{p+1} i_l \le k_n, \; 0 \le i_1, \ldots, i_{p+1} \right\} \tag{17}
$$
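To make the size difference between the two bases concrete, the following sketch enumerates the index sets behind Equations (16) and (17) and counts their elements for kn = 5 and p = 3. The function names are ours, chosen for illustration.

```python
from itertools import product
from math import comb


def tensor_indices(k_n, dim):
    """All index tuples with each order in 0..k_n, as in Equation (16)."""
    return list(product(range(k_n + 1), repeat=dim))


def complete_indices(k_n, dim):
    """Index tuples whose orders sum to at most k_n, as in Equation (17)."""
    return [idx for idx in tensor_indices(k_n, dim) if sum(idx) <= k_n]


k_n, p = 5, 3
dim = p + 1                                   # number of arguments of the function
n_tensor = len(tensor_indices(k_n, dim))      # (k_n + 1) ** dim
n_complete = len(complete_indices(k_n, dim))  # binomial(k_n + dim, dim)
print(n_tensor, n_complete)                   # 1296 126
```

The tensor count grows as (kn + 1) raised to the number of arguments, whereas the complete basis count is the binomial coefficient C(kn + dim, dim), which is polynomial in kn for a fixed dimension.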

The main advantage of using the complete basis rather than the tensor product basis is that the number of tensor products grows exponentially, while the number of complete basis functions grows polynomially and provides an equally good approximation asymptotically with far fewer elements. For example, if kn = 5 and p = 3, then the full tensor product basis has 1296 elements while the complete basis has only 126.

For our choice of kernel function we use the generalized product kernel of Li and Racine (2003):

$$
K_\gamma(z_t, z) = W_h(z_t^c, z^c) \, L(z_t^d, z^d, \omega) \, J(z_t^s, z^s, \omega) \tag{18}
$$

$$
W_h(z_i^c, z^c) = \prod_{j=1}^{r_1} \frac{1}{h_j} \, w\!\left( \frac{z_j^c - z_{ji}^c}{h_j} \right) \tag{19}
$$

$$
L(z_i^d, z^d, \omega) = \prod_{j=1}^{r_2} \omega_j^{I(z_{ji}^d \neq z_j^d)} \tag{20}
$$

$$
J(z_i^s, z^s, \omega) = \prod_{j=1}^{r_3} \omega_j^{|z_{ji}^s - z_j^s|} \tag{21}
$$

where $z^c$ are continuous variables, $z^d$ are discrete variables, and $z^s$ are discrete variables with a natural ordering. For the continuous variables, we use the standard normal density function for w.
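The three factors in Equations (18) to (21) can be sketched as follows for a single pair of observations. This is an illustrative implementation under our own interface assumptions (separate arrays for the continuous, unordered, and ordered components, and separate weight vectors for the two discrete parts); it is not the authors' Matlab code.

```python
import numpy as np


def gaussian(v):
    """Standard normal density, used for the continuous components."""
    return np.exp(-0.5 * v ** 2) / np.sqrt(2.0 * np.pi)


def product_kernel(zc_i, zc, h, zd_i, zd, omega_d, zs_i, zs, omega_s):
    """K = W * L * J for one pair of observations, following Equations (18)-(21)."""
    zc_i, zc, h = (np.asarray(a, dtype=float) for a in (zc_i, zc, h))
    # Equation (19): product of scaled Gaussian kernels over continuous variables.
    w_part = np.prod(gaussian((zc - zc_i) / h) / h)
    # Equation (20): omega_j if the unordered categories differ, 1 otherwise.
    mismatch = (np.asarray(zd_i) != np.asarray(zd)).astype(float)
    l_part = np.prod(np.asarray(omega_d, dtype=float) ** mismatch)
    # Equation (21): omega_j raised to the distance between ordered categories.
    distance = np.abs(np.asarray(zs_i, dtype=float) - np.asarray(zs, dtype=float))
    j_part = np.prod(np.asarray(omega_s, dtype=float) ** distance)
    return w_part * l_part * j_part


# Identical observations versus observations that differ in both discrete parts.
k_same = product_kernel([0.0], [0.0], [1.0], [1], [1], [0.5], [2], [2], [0.5])
k_diff = product_kernel([0.0], [0.0], [1.0], [1], [0], [0.5], [2], [4], [0.5])
print(k_same, k_diff)
```

With weights between 0 and 1, a weight near 1 on a discrete variable treats all of its categories alike (the variable is smoothed out), while a weight near 0 effectively splits the sample by category.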


When estimating $\hat{P}_{z_1}$ and $\hat{R}_{z_1}$ one needs to specify the bandwidth across the choice of instruments and strictly exogenous regressors. For the choice of bandwidth we consider two different approaches: rule-of-thumb (ROT) and cross-validation (CV). For the ROT we employ the rule-of-thumb of Silverman (1986) for continuous regressors:

$$
h_j = c_j \, \sigma_j \, N^{-\frac{1}{4 + r_1}} \tag{22}
$$



where $c_j \in \mathbb{R}_+$ and σj is the standard deviation of zj. The distinct advantage of the ROT method is its computational simplicity. This becomes very important when the researcher is dealing with a large number of exogenous variables. Alternatively, the cross-validation approach seeks the choice of γ = [h ω] that minimizes the distance between the response variable and the leave-one-out (LOO) estimator of the unknown function. This approach can be summarized as follows:

$$
\gamma_{opt} = \operatorname*{argmin}_{\gamma \in \mathbb{R}^l} \sum_{i=1}^{N} \left( y_i - \hat{g}_{-i}(z_i) \right)^2 M(z_i) \tag{23}
$$

where 0 ≤ M(zi) ≤ 1 and

$$
\hat{g}_{-i}(z_i) = \frac{\sum_{l \neq i}^{N} y_l \, K_\gamma(z_i, z_l)}{\sum_{l \neq i}^{N} K_\gamma(z_i, z_l)}.
$$

As shown by Hall, Li, and Racine (2007), cross-validation has the ability to select bandwidths in such a way that it smooths out irrelevant regressors. It should be noted that choosing the bandwidth through cross-validation is computationally expensive. The computation time is an increasing function of the number of exogenous and instrumental variables and of the sample size.
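The two bandwidth rules can be compared with a short sketch. It assumes continuous regressors only, a Gaussian kernel, M(zi) = 1 for all i, and a small multiplicative grid around the ROT value in place of a full numerical optimizer; the constant c = 1.06 is a common default and is not taken from the paper.

```python
import numpy as np


def rot_bandwidth(z, c=1.06):
    """Equation (22): h_j = c * sigma_j * N^(-1 / (4 + r1)) for an N x r1 array z."""
    n, r1 = z.shape
    return c * z.std(axis=0, ddof=1) * n ** (-1.0 / (4 + r1))


def loo_cv_objective(h, z, y):
    """Equation (23) with M(z_i) = 1 and a Gaussian product kernel."""
    diff = (z[:, None, :] - z[None, :, :]) / h      # N x N x r1 scaled differences
    k = np.exp(-0.5 * np.sum(diff ** 2, axis=2))    # normalizing constants cancel
    np.fill_diagonal(k, 0.0)                        # leave observation i out
    g_loo = (k @ y) / k.sum(axis=1)
    return np.sum((y - g_loo) ** 2)


rng = np.random.default_rng(1)
z = rng.normal(size=(200, 1))
y = np.sin(z[:, 0]) + rng.normal(scale=0.2, size=200)

h_rot = rot_bandwidth(z)
grid = [s * h_rot for s in (0.25, 0.5, 1.0, 2.0, 4.0)]   # hypothetical search grid
scores = [loo_cv_objective(h, z, y) for h in grid]
h_cv = grid[int(np.argmin(scores))]
print(h_rot, h_cv)
```

Each evaluation of the CV objective costs on the order of N² kernel evaluations, which is why the text describes cross-validation as expensive relative to the single closed-form ROT computation.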

3 Recovering the Partial Effects

In many cases the underlying function φ(X) is not the object of interest. Instead we are more interested in obtaining the derivative of the function, which allows us to measure the impact one variable has on another. Recalling Equation (2), an intuitive estimator could be based on

$$
\varphi'(X) \cong \sum_{j_1=1}^{k_n} \sum_{j_2=1}^{k_n} \cdots \sum_{j_{p+m}=1}^{k_n} \left( \Pi_{j_1, j_2, \ldots, j_{p+m}} \Psi_{j_1}(X_1) \Psi_{j_2}(X_2) \cdots \Psi_{j_{p+m}}(X_{p+m}) \right)', \tag{24}
$$

and we only need to replace φ in the previous section by φ′. The obvious drawback of this simple solution is its computational burden. As we will show, it requires trivial effort to construct an estimator for φ′ based on the estimated φ, while the product rule in Equation (24) introduces many more coefficients to estimate if we choose to work with φ′ directly. Recall the classical definition of the derivative:

$$
\frac{\partial \varphi(y_2, x_1)}{\partial y_2} = \lim_{h \to 0} \frac{\varphi(y_2 + h, x_1) - \varphi(y_2, x_1)}{h} \tag{25}
$$

Following this representation we propose a natural estimator for $\partial \varphi(y_2, x_1) / \partial y_2$ as:

$$
\frac{\partial \hat{\varphi}(y_2, x_1)}{\partial y_2} = \frac{\hat{\varphi}(y_2 + c_n, x_1) - \hat{\varphi}(y_2, x_1)}{c_n} \tag{26}
$$

where cn is an appropriately chosen sequence of decreasing numbers converging to zero.³

3 The theoretical justification of this estimator along with the choice of cn is provided in the Appendix.
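Equation (26) amounts to a forward difference of the fitted function. The sketch below applies it to a stand-in quadratic function so the approximation error can be checked by hand; both the function and the rate used for cn are illustrative assumptions, since the paper's choice of cn is given in its Appendix.

```python
import numpy as np


def partial_effect(phi_hat, y2, x1, c_n):
    """Equation (26): forward difference of the fitted function in y2."""
    return (phi_hat(y2 + c_n, x1) - phi_hat(y2, x1)) / c_n


def phi_hat(y2, x1):
    """Hypothetical fitted function standing in for the estimated phi."""
    return 1.0 - 2.0 * y2 + 3.0 * y2 ** 2 + 0.5 * x1


n = 6300
c_n = n ** (-0.25)                     # illustrative rate only
grid = np.linspace(0.0, 0.9, 10)       # evaluation points for y2
d_hat = partial_effect(phi_hat, grid, 0.5, c_n)
d_true = -2.0 + 6.0 * grid             # exact derivative of the stand-in function
print(np.max(np.abs(d_hat - d_true)))
```

For this quadratic the forward difference equals the true derivative plus 3·cn exactly, which shows how the discretization error vanishes as cn shrinks with the sample size.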

Once we have an estimate of the partial effects, confidence intervals for the pointwise estimate of $\partial \varphi(y_2, x_1) / \partial y_2$ can be constructed to infer whether a particular null model falls within those confidence regions. For example (x1 is suppressed), one might be interested in testing whether the underlying population has a constant slope β1 so that φ(y2) = β0+β1y2. To test this hypothesis, one can estimate the confidence interval

$$
\left[ \left( \frac{\partial \hat{\varphi}(y_2, x_1)}{\partial y_2} \right)_{\alpha/2}, \; \left( \frac{\partial \hat{\varphi}(y_2, x_1)}{\partial y_2} \right)_{1-\alpha/2} \right]
$$

and determine whether β1 is contained in the interval over the appropriate range of y2. If β1 is contained in each confidence interval over a specified range for y2, then we would fail to reject H0: φ′(y2) = β1, or in other words we fail to reject a model that is linear in the variable y2.

Recall that in practice we must transform our variables onto [0, 1]. The researcher is typically interested in recovering the partial effect for a non-standardized variable, whereas the partial effect calculated above is the partial effect of the transformed variable. To see this, let $y_2 = \omega(\tilde{y}_2)$, where ω is some monotonic transformation onto [0, 1]. We are now interested in obtaining $\partial \varphi(y_2, x_1) / \partial \tilde{y}_2$, where $\tilde{y}_2$ is the original variable of interest. By the chain rule, an expression for this is given by:

$$
\frac{\partial \varphi(y_2, x_1)}{\partial \tilde{y}_2} = \frac{\partial \varphi(y_2, x_1)}{\partial y_2} \frac{\partial \omega}{\partial \tilde{y}_2} \tag{27}
$$

A feasible estimator for this is simply given by:

$$
\frac{\partial \hat{\varphi}(y_2, x_1)}{\partial \tilde{y}_2} = \frac{\partial \hat{\varphi}(y_2, x_1)}{\partial y_2} \frac{\partial \omega}{\partial \tilde{y}_2} \tag{28}
$$
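For the min-max transformation used earlier in the paper, the derivative of ω with respect to the original variable is the constant 1/(max − min), so Equation (28) is a simple rescaling. The sketch below illustrates this with made-up numbers; the price values and derivative estimates are hypothetical.

```python
import numpy as np


def to_unit_interval(y2_raw):
    """Min-max transformation omega and its constant derivative d omega / d y2_raw."""
    lo, hi = np.min(y2_raw), np.max(y2_raw)
    return (y2_raw - lo) / (hi - lo), 1.0 / (hi - lo)


y2_raw = np.array([0.10, 0.15, 0.20, 0.30])        # hypothetical prices, original units
y2, d_omega = to_unit_interval(y2_raw)

d_phi_scaled = np.array([-3.0, -2.5, -2.0, -1.0])  # hypothetical estimates on [0, 1] scale
d_phi_raw = d_phi_scaled * d_omega                 # Equation (28)
print(d_phi_raw)
```

Because the range of the hypothetical prices is 0.2, each derivative on the unit scale is multiplied by 5 to express it per unit of the original variable.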

In most cases we will be interested not only in the pointwise derivative but also in the average partial effect. We can calculate this as:

$$
E\left[ \frac{\partial \varphi(y_2, x_1)}{\partial y_2} \right] = \int_{y_2 \in \Xi_2} \int_{x_1 \in \Xi_1} f(y_2, x_1) \, \frac{\partial \varphi(y_2, x_1)}{\partial y_2} \, dx_1 \, dy_2 \tag{29}
$$

A simple estimator for the expression in Equation (29) is given by the following formula:

$$
\hat{E}\left[ \frac{\partial \varphi(y_2, x_1)}{\partial y_2} \right] = \frac{1}{n} \sum_{i=1}^{n} \frac{\partial \hat{\varphi}(y_{2i}, x_{1i})}{\partial y_2} \tag{30}
$$

In order to satisfy the compactness assumption, we calculate the derivatives and average derivatives while fixing $x_1 = \omega(\tilde{x}_1)$ at some point in the support of $\tilde{x}_1$. As a result of fixing x1 at a point in its support, the expectations become conditional, as opposed to the unconditional expectation that would be calculated by evaluating the derivative at all points in the support of x1. In addition to the average marginal effect, one might also be interested in calculating the median derivative, which we denote as $\operatorname{med}\left( \partial \hat{\varphi}(y_{2i}, x_{1i}) / \partial y_2 \right)$.
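Both summary measures are one-line computations once the pointwise derivatives are available. The following sketch uses made-up derivative values purely to show the arithmetic; the helper name is hypothetical.

```python
import numpy as np


def summarize_partial_effects(d_phi):
    """Average (Equation (30)) and median of pointwise derivative estimates."""
    d_phi = np.asarray(d_phi, dtype=float)
    return d_phi.mean(), np.median(d_phi)


# Hypothetical derivative estimates at five sample points, with x1 held fixed.
ape, mpe = summarize_partial_effects([-14.2, -13.9, -13.5, -13.8, -12.1])
print(ape, mpe)
```

The median is less sensitive than the average to a few extreme pointwise estimates, such as the last value in this made-up sample.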

3.1 Instrument Relevance

The classic studies of Nelson and Startz (1990) and Staiger and Stock (1997) document the importance of instrument quality with respect to relevance. When the function φ(X) is known to be linear in the model parameters, Staiger and Stock (1997) show that an F-statistic of 10 or greater on the instruments in the first-stage regression is needed to have confidence in point estimation and thus inference. The fundamental quantity that arises in the literature is the concentration parameter:


$$
\mu^2 = \frac{\pi' z_1' z_1 \pi}{\sigma_{v_2}^2} \tag{31}
$$

where π is the vector of coefficients from the following population model:

$$
y_2 = \beta_0 + x_1 \beta + z_1 \pi + v_2 \tag{32}
$$



The work on “weak” instruments has been done under the assumption that the structural function is known and linear in parameters. Under an unknown φ(X), not much is known regarding how one might characterize what Staiger and Stock (1997) refer to as “weak instruments.” In many of the finite sample studies on nonparametric IV estimators, z1 ~ N(0, 1) with π = 1 and $\sigma_{v_2} = 1$, yielding $\mu^2 \sim \chi^2_N$. Stock, Wright, and Yogo (2002) show that in large samples with a single IV, E(F) ≈ E(μ²) + 1 = N + 1, which leads to very large values for the F-statistic as the sample size increases. Estimation in the nonparametric case requires more from the data; therefore, it is not appropriate to apply the Staiger and Stock (1997) rule-of-thumb uniformly to the nonparametric case. To investigate the role of instrument strength in the finite sample performance of the nonparametric IV estimator, we vary the strength of the instruments in line with the parametric literature by adjusting the concentration parameter and thus the value of the F-statistic across different data generating processes. While creating instruments in this manner generates a severely ill-posed estimation problem (joint normality), Hoderlein and Holzmann (2011) point out that joint normality is arguably the leading distribution found in empirical applications. This framework is therefore the one that will most likely be of interest to applied researchers.
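The link between the concentration parameter and the first-stage F-statistic can be checked numerically. The sketch below is a simplified illustration with one instrument, no exogenous regressors, and unit error variance; the two values of π are hypothetical choices meant to represent a strong and a weak instrument.

```python
import numpy as np


def first_stage_f(y2, z1):
    """F-statistic for H0: pi = 0 in y2 = b0 + z1 @ pi + v2 (no exogenous x1 here)."""
    n, k = z1.shape
    x = np.column_stack([np.ones(n), z1])
    resid = y2 - x @ np.linalg.lstsq(x, y2, rcond=None)[0]
    rss_u = resid @ resid                          # unrestricted residual sum of squares
    rss_r = np.sum((y2 - y2.mean()) ** 2)          # restricted model: intercept only
    return ((rss_r - rss_u) / k) / (rss_u / (n - k - 1))


def concentration(z1, pi, sigma_v):
    """Equation (31): mu^2 = pi' z1' z1 pi / sigma_v^2."""
    fitted = z1 @ pi
    return fitted @ fitted / sigma_v ** 2


rng = np.random.default_rng(0)
n = 500
z1 = rng.normal(size=(n, 1))
results = {}
for pi_val in (1.0, 0.1):                          # strong versus weak instrument
    pi = np.array([pi_val])
    y2 = z1 @ pi + rng.normal(size=n)
    results[pi_val] = (concentration(z1, pi, 1.0), first_stage_f(y2, z1))
    print(pi_val, results[pi_val])
```

With π = 1 the concentration parameter is the sum of squared instrument values, so it grows in proportion to N, which is the behavior described in the text; shrinking π to 0.1 divides it by 100.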

4 Monte Carlo Simulations

In this section, we document the difference in finite sample performance of our NPIV derivative estimator as compared to the commonly used linear two-stage least squares estimator (2SLS). The empirical application presented in the next section of the paper is a demand model that is typically estimated under the assumption that the underlying random utility model is linear. In our simulation study we introduce three different data generating processes (DGPs) to investigate the relative performance of our NPIV derivative estimator when the linear specification is correct and when it is not.⁴ We also investigate how the strength and dimension of the instruments (K) impact the relative performance of our NPIV derivative estimator. In the simulations, the variable we intend to explain takes the following functional form:

$$
y = \beta_0 + \beta_1 \tilde{y}_2 + \beta_2 \tilde{x}_1 + \beta_3 \tilde{y}_2^2 + \beta_4 \tilde{x}_1 \tilde{y}_2 + v \tag{33}
$$



Table 2: Monte Carlo Designs.

        σw      σl      ρ     σz      β0      β1       β2    β3    β4    ρ(zi, zj)
DGP1    0.001   0.001   0.5   0.001   –2.67   –13.78   0.5   0     0     0, 0.50
DGP2    0.001   0.001   0.5   0.001   –2.67   –13.78   0.5   0     –20   0
DGP3    0.001   0.001   0.5   0.001   –2.67   1        0.5   –20   –20   0

4 Please see Table 2 for a summary of the different DGPs used for the Monte Carlo simulations.

162      P. Shaw et al.: Nonparametric Instrumental Variable Estimation in Practice where y = log(y ) and x 1 is generated from the empirical distribution of advertising data presented later in the paper. To control for instrument relevance and different functional forms the remaining variables are generated to fit the experimental design. Price data ( y 2 ) is generated according to the following process: y 2 = 0.1665 + z 1 P + w (34)



where w ~ N (0, σ w2 ), l ~ N (0, σ l2 ), and v = ρw+[(1–ρ2)1/2]l where ρ allows us to control the degree of endogeneity between y 2 and the structural error v.5 z1~N(0, Σz) which produces a mean price that matches the data set used later in the paper. Initially we assume a diagonal variance-covariance matrix Σz but relax this assumption to investigate the impact instrument correlation might have on the performance of the NPIV estimator. Given these modeling assumptions, the problem is severely ill-posed as the joint distribution of endogenous regressor and the instruments, conditional on the exogenous regressor, is jointly normal. Under DGP1 the model is log-linear which is analogous to the aggregate logit demand models used by Guadagni and Little (1983) and Berry (1994). ρz , z is the correlation parameter between the IVs which allows i j us to investigate how a moderate correlation between the IVs impacts the performance of the NPIV estimator. DGP2 allows for an interaction effect between x 1 and y 2 . Under DGP1 we expect the 2SLS estimator to outperform the NPIV estimator as the model would be correctly specified under the usual 2SLS framework. β4 = –20 for DGP2 which allows advertising expenditures to decrease the sensitivity of price changes on the dependent variable. DGP3 allows for both an interaction effect and a nonlinear price effect. We summarize the different functional forms implied by each design in Figure 1. Finally, we generate a sample size of N = 6300 for each of the 1000 Monte Carlo simulations, additionally the order of approximation is set to kn = 5 and the penalization parameter is set to λ = 0.0001 for our baseline simulations. We also assign values λ = 0.1, λ = 0.01, λ = 0.001, λ = 0.00001, and λ = 0 to investigate the relative performance of the NPIV estimators when the instruments are “weak”. 
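The designs described by equations (33) and (34) and Table 2 can be sketched in code. This is a hedged illustration rather than the authors' simulation code: the coefficients are copied from Table 2, but the uniform draw for x1 (a stand-in for the empirical advertising distribution), the default first-stage coefficients `pi`, the reading of σz as a standard deviation, and all function names are assumptions made here.

```python
import numpy as np

# Coefficients copied from Table 2; the dictionary layout is illustrative.
DGPS = {
    "DGP1": {"b0": -2.67, "b1": -13.78, "b2": 0.5, "b3": 0.0, "b4": 0.0},
    "DGP2": {"b0": -2.67, "b1": -13.78, "b2": 0.5, "b3": 0.0, "b4": -20.0},
    "DGP3": {"b0": -2.67, "b1": 1.0, "b2": 0.5, "b3": -20.0, "b4": -20.0},
}

def simulate(name, n=6300, pi=(1.0,), rho=0.5, sigma_w=0.001,
             sigma_l=0.001, sigma_z=0.001, seed=0):
    """Draw one sample following equations (33) and (34)."""
    rng = np.random.default_rng(seed)
    b = DGPS[name]
    pi = np.asarray(pi, dtype=float)                 # ASSUMPTION: first-stage coefficients
    z = sigma_z * rng.standard_normal((n, pi.size))  # uncorrelated instruments (diagonal case)
    w = sigma_w * rng.standard_normal(n)
    l = sigma_l * rng.standard_normal(n)
    v = rho * w + np.sqrt(1.0 - rho ** 2) * l        # structural error, correlated with w
    y2 = 0.1665 + z @ pi + w                         # equation (34)
    x1 = rng.uniform(0.0, 1.0, n)                    # ASSUMPTION: placeholder for advertising data
    y = (b["b0"] + b["b1"] * y2 + b["b2"] * x1
         + b["b3"] * y2 ** 2 + b["b4"] * x1 * y2 + v)  # equation (33)
    return {"y": y, "y2": y2, "x1": x1, "z": z, "v": v, "w": w}

def true_derivative(name, y2, x1):
    """Partial derivative of equation (33) with respect to y2."""
    b = DGPS[name]
    return b["b1"] + 2.0 * b["b3"] * np.asarray(y2) + b["b4"] * np.asarray(x1)

sample = simulate("DGP3", pi=(1.0, 1.0))
target = true_derivative("DGP3", sample["y2"], sample["x1"])
```

Under DGP1 the derivative is the constant β1, which is why a correctly specified linear model recovers it; under DGP2 and DGP3 it varies with x1 and y2, so a single 2SLS coefficient cannot match it at every evaluation point.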
The mean squared error (MSE) is reported for the NPIV and 2SLS estimators for each round of simulations.⁶ Finally, the Monte Carlo simulations evaluate the impact that bandwidth selection has on the fit of the NPIV estimator of the derivative.

Figure 1: φ(y2, x1) across different DGPs. (Plot of y against y2 for DGP1, DGP2, and DGP3.)

5 Recall that we transform all variables onto [0, 1] using $\tilde{y}_2 = \omega(y_2)$, where $y_2$ is the data measured on the original support and ω(·) is a mapping onto [0, 1].
6 Given the large sample size, our MSE metric is a consistent estimator for MISE.


5 Simulation Results

Table 3 presents the mean squared error (MSE) for each of the derivative estimators, varying only the choice of bandwidth.⁷ We generate a sample of size N = 1000 for 100 trials in which the bandwidth is selected via cross-validation, and we compare the results to those obtained with a bandwidth chosen according to a rule of thumb.⁸ We compare the MSE produced by the two types of bandwidth choice using DGP1 with an F-statistic of 157.⁹ The results indicate that when the bandwidth is chosen by cross-validation the MSE increases with K for K > 1, whereas the MSE strictly decreases in K for the rule-of-thumb bandwidth. Acknowledging the strong instrument and fixed sample size conditions, the MSE for the rule-of-thumb bandwidth falls below that of the cross-validated bandwidth as the number of instruments grows. The superior performance of the rule-of-thumb bandwidth could be due to the fact that the joint distribution of the endogenous regressor and the instruments is normal. We hesitate to claim that this result holds under more general conditions; however, to the best of our knowledge there is no natural or intuitive approach to cross-validation in the context of nonparametric instrumental variable estimation, particularly under uncertain instrument strength.

We present the second set of simulation results in Table 4, where we fix the concentration parameter such that the F-statistic is approximately equal to 5 for K = 1, …, 8 instruments. This corresponds to weak instruments as defined by Staiger and Stock (1997). We investigate the performance of the average derivative, the derivative at the average, and the median derivative, all of which are equal to each other in the population model given under DGP1. It is clear from Table 4 that all three nonparametric estimators outperform the 2SLS estimator in the presence of weak instruments as long as λ ≤ 0.001.
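The three summary measures compared in the simulations can be made concrete with a short sketch. It assumes the derivative function is known, which is only true in a simulation; the function names and the evaluation points drawn below are hypothetical. The two derivative functions follow from equation (33) with the Table 2 coefficients.

```python
import numpy as np

def derivative_summaries(dphi, y2, x1):
    """Average derivative, derivative at the average, and median derivative."""
    y2 = np.asarray(y2, dtype=float)
    x1 = np.asarray(x1, dtype=float)
    pointwise = dphi(y2, x1)
    return {
        "average": float(np.mean(pointwise)),
        "at_average": float(dphi(y2.mean(), x1.mean())),
        "median": float(np.median(pointwise)),
    }

def dphi_dgp1(y2, x1):
    # DGP1: beta3 = beta4 = 0, so the derivative is the constant beta1.
    return -13.78 + 0.0 * y2

def dphi_dgp3(y2, x1):
    # DGP3: beta1 + 2 * beta3 * y2 + beta4 * x1 with beta1 = 1, beta3 = beta4 = -20.
    return 1.0 - 40.0 * y2 - 20.0 * x1

rng = np.random.default_rng(0)
y2 = rng.uniform(0.0, 1.0, 1000)      # hypothetical evaluation points on [0, 1]
x1 = rng.exponential(0.2, 1000)       # hypothetical skewed regressor

linear_case = derivative_summaries(dphi_dgp1, y2, x1)
nonlinear_case = derivative_summaries(dphi_dgp3, y2, x1)
```

Under DGP1 all three measures equal β1, so differences across the NPIV columns of Table 4 reflect estimation error only. Under DGP3 the average derivative and the derivative at the average still coincide because the derivative is linear in (y2, x1), while the median generally differs when the regressors are skewed.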
As shown in Table 4, there is a non-monotonic relationship between the penalization parameter and the MSE of the derivative estimates. For the median derivative, the smallest MSE is attained when λ = 0.0001. For the average derivative, the MSE falls as λ approaches zero; however, the performance gains are minimal for λ = 0. For the derivative at the mean, the optimal λ is 0.0001 for K ≤ 6 and 0.001 for K > 6. This result implies that the optimal penalization parameter depends on which marginal effect is of interest, in addition to the number of instruments. The importance of the penalization parameter cannot be overstated, especially when dealing with weak instruments, and these findings provide practical guidance for the determination of the penalization parameter in applied settings. To support this emphasis, a recent working paper by Han (2014) formally demonstrates that the problem of weak identification in nonparametric models can be characterized as an ill-posed inverse problem, which motivates the introduction of a regularization scheme. Darolles et al. (2011) show that increasing the dimension of z1 increases the rate of convergence of their kernel-based estimator so long as the instruments are not completely irrelevant. Even at its best, the MSE for the 2SLS estimator is over 15 times larger than the MSE for the average derivative estimated using the NPIV approach when λ = 0.0001. Similar results are obtained when looking at the derivative at the average and the median derivative.

Table 3: MSE for Estimators Using Cross-validation vs. Rule-of-thumb under DGP1 with F = 157.
(Average = average derivative; Median = median derivative; At mean = derivative evaluated at the mean.)

       Cross-validation                    Rule-of-thumb
K      Average    Median     At mean       Average    Median     At mean
1      8.5248     7.1209     11.7596       8.1632     16.4434    34.8881
2      5.8707     3.903      12.2437       4.308      6.9212     15.6985
3      8.1089     6.6158     10.1046       2.4046     3.9284     9.1418
4      8.6522     6.2328     8.2105        1.1869     2.1993     5.2433
5      11.6205    7.9402     7.1919        0.7404     1.0122     2.3982
6      14.864     12.979     12.946        0.5840     0.8931     2.1338
7      17.4231    14.8244    13.7786       0.4568     0.6223     1.6931
8      19.984     18.440     18.132        0.4015     0.5197     1.2753

7 Recall that estimating Px1 and Rx1 requires bandwidths for the conditioning variables.
8 We choose a small sample size and a small set of simulations to make the computational time reasonable. It took roughly 18 h for the 100 trials with only a single instrument. The computation time is strictly increasing in sample size and the number of instruments.
9 We choose this value due to the fact that our empirical application generates this value for the F-statistic.

Table 4: MSE Across Derivative Estimates for DGP1 for F = 5.

λ          K    Average    Median     At mean    2SLS
0.1        1    189.70     189.70     196.12     4286.67
0.1        2    189.42     189.27     195.42     96.58
0.1        3    187.12     186.43     192.79     49.87
0.1        4    183.45     181.82     188.46     30.58
0.1        5    178.16     175.23     181.46     23.10
0.1        6    172.20     167.74     173.46     18.17
0.1        7    163.95     157.49     163.10     16.11
0.1        8    153.75     145.06     150.60     11.91
0.01       1    184.88     183.65     190.51     4286.67
0.01       2    174.47     170.54     176.85     96.58
0.01       3    157.63     149.81     155.89     49.87
0.01       4    136.28     124.21     129.76     30.58
0.01       5    111.04     95.46      99.08      23.10
0.01       6    85.58      68.79      69.35      18.17
0.01       7    63.17      47.25      45.04      16.11
0.01       8    46.10      31.91      28.10      11.91
0.001      1    147.50     138.38     143.85     4286.67
0.001      2    91.84      75.96      77.00      96.58
0.001      3    51.42      36.76      33.75      49.87
0.001      4    28.27      17.07      13.00      30.58
0.001      5    15.44      7.52       4.24       23.10
0.001      6    8.74       3.33       1.23       18.17
0.001      7    5.40       1.68       0.51       16.11
0.001      8    3.62       0.96       0.29       11.91
0.0001     1    44.45      51.29      52.02      4286.67
0.0001     2    8.88       10.88      11.12      96.58
0.0001     3    3.00       3.37       3.89       49.87
0.0001     4    1.53       1.59       2.02       30.58
0.0001     5    1.00       1.04       1.35       23.10
0.0001     6    0.77       0.78       0.95       18.17
0.0001     7    0.66       0.70       0.84       16.11
0.0001     8    0.57       0.61       0.75       11.91
0.00001    1    58.65      58.02      79.72      4286.67
0.00001    2    15.39      12.60      21.44      96.58
0.00001    3    5.79       4.07       7.58       49.87
0.00001    4    2.98       1.98       3.78       30.58
0.00001    5    1.83       1.32       2.49       23.10
0.00001    6    1.25       0.93       1.70       18.17
0.00001    7    0.98       0.80       1.35       16.11
0.00001    8    0.79       0.66       1.06       11.91
0          1    70.46      108.57     388.60     4286.67
0          2    9.49       14.75      45.87      96.58
0          3    2.93       4.33       13.65      49.87
0          4    1.38       1.98       5.52       30.58
0          5    0.92       1.29       3.15       23.10
0          6    0.71       0.91       2.02       18.17
0          7    0.61       0.76       1.44       16.11
0          8    0.54       0.62       1.08       11.91

To investigate the impact that correlation across instruments has on the relative performance of the NPIV estimator when instruments are weak, Table 5 reports the MSE of the derivative estimates for the baseline case (λ = 0.0001) when the correlation between instruments is set to be moderately high, with correlation parameter $\rho_{z_i, z_j} = 0.50$. The table illustrates that the mean squared error is higher across all estimators compared to the case where the correlation is zero. However, the main result still holds: the NPIV estimator outperforms the 2SLS estimator when weak instruments are present under a linear model.

Table 6 displays the MSE for DGP1 under the condition that the first-stage F-statistic is equal to 157. This is well above the Staiger and Stock (1997) rule of F ≥ 10; thus the instruments are strong in the classic sense. Conditions are therefore ideal for the 2SLS estimator: the model is correctly specified and the instruments are strong. In this case, there is no reason to believe that the nonparametric estimator should outperform the 2SLS estimator. As expected, the nonparametric estimators underperform the 2SLS estimator for most K. An interesting result is that as the number of IVs increases, the relative performance of the nonparametric estimators improves. For example, the median derivative outperforms the 2SLS estimator for K ≥ 6, and the other nonparametric estimators' MSEs converge towards the 2SLS estimator's MSE as K increases.
This result is presented in Figure 2, and the finding suggests that for the NPIV estimator to be a viable option over the linear 2SLS estimator when instruments are relatively strong and the population model is linear, the researcher must have access to a sufficient number of valid IVs. Table 7 shows the mean squared error of the estimators for DGP2 and DGP3 with K = 8 and F = 157. Given the presence of the interaction effect under DGP2 and the nonlinear price effect under DGP3, the 2SLS estimator is no longer consistent. Not surprisingly, the 2SLS estimator underperforms the NPIV estimator at the 25th and 75th percentiles of x1. However, towards the center of the distribution the 2SLS estimator has a small MSE relative to the NPIV estimator.

Table 5: MSE Across Derivative Estimates for DGP1 for F = 5 and ρ(zi, zj) = 0.50.

K    Average    Median    At mean    2SLS
2    17.91      12.70     12.70      177.45
3    8.00       4.64      4.91       50.87
4    4.72       2.42      2.72       31.44
5    3.10       1.50      1.76       23.47
6    2.23       1.12      1.33       18.30
7    1.64       0.87      1.03       16.25
8    1.27       0.70      0.78       12.08

Table 6: MSE Across Derivative Estimates for DGP1 for F = 157.

K    Average    Median    At mean    2SLS
1    12.713     11.381    15.238     3.264
2    4.147      4.305     7.724      1.491
3    2.311      2.302     4.451      0.992
4    1.368      1.096     2.225      0.803
5    0.895      0.706     1.213      0.604
6    0.668      0.480     0.732      0.492
7    0.539      0.404     0.576      0.453
8    0.461      0.359     0.497      0.399

Figure 2: MSE Across Estimators Relative to 2SLS. (MSE relative to 2SLS for DGP1 and F = 157, plotted against K = 1, …, 8 for the average derivative, the median derivative, and the derivative at the mean.)

Table 7: MSE Across Derivative Estimates for DGP2 and DGP3 for K = 8 and F = 157.

        Percentile of x1    Average    Median    At mean    2SLS
DGP2    25                  0.190      0.395     0.725      9.324
DGP2    50                  1.034      0.817     0.989      0.438
DGP2    75                  1.958      1.432     1.509      4.712
DGP3    25                  0.203      0.423     0.668      8.997
DGP3    50                  0.9631     1.0115    0.9407     0.4358
DGP3    75                  1.932      1.778     1.611      4.958
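The relative MSE comparison summarized in Figure 2 can be reproduced from the Table 6 entries. The sketch below is illustrative: the numbers are copied from Table 6, the assignment of the value columns to estimators follows the column order as read from the extracted table, and the helper names are not from the paper.

```python
# MSE values for K = 1, ..., 8 copied from Table 6 (DGP1, F = 157).
NPIV_MSE = {
    "average": [12.713, 4.147, 2.311, 1.368, 0.895, 0.668, 0.539, 0.461],
    "median": [11.381, 4.305, 2.302, 1.096, 0.706, 0.480, 0.404, 0.359],
    "at_mean": [15.238, 7.724, 4.451, 2.225, 1.213, 0.732, 0.576, 0.497],
}
TSLS_MSE = [3.264, 1.491, 0.992, 0.803, 0.604, 0.492, 0.453, 0.399]

def relative_mse(npiv, baseline):
    """Ratio of NPIV MSE to 2SLS MSE for each K; values below 1 favor NPIV."""
    return [a / b for a, b in zip(npiv, baseline)]

def first_k_beating(npiv, baseline):
    """Smallest K at which the NPIV MSE is below the 2SLS MSE, or None."""
    for k, (a, b) in enumerate(zip(npiv, baseline), start=1):
        if a < b:
            return k
    return None

ratios = {name: relative_mse(vals, TSLS_MSE) for name, vals in NPIV_MSE.items()}
crossover = {name: first_k_beating(vals, TSLS_MSE) for name, vals in NPIV_MSE.items()}
```

With these inputs the median derivative first falls below the 2SLS MSE at K = 6, matching the statement in the text, while the other two ratios approach but do not cross 1 within K ≤ 8.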

6 An Empirical Application: Differentiated Product Logit Demand Model

This section introduces an application of the nonparametric instrumental variables estimator to the popular aggregate logit demand model (Guadagni and Little 1983; Berry 1994). The core of the model specifies a random utility model for a differentiated product market t with products j:

$$ u_{ijt} = \delta_{jt} + \varepsilon_{ijt}, \tag{35} $$

where uijt is the utility consumer i derives from product j in market t, and εijt is an idiosyncratic taste shock that is assumed to have a type I extreme value distribution. Consumers are assumed to choose the product j that maximizes their utility uijt. Under the extreme value assumption, the idiosyncratic utility shock εijt can be integrated out to yield the familiar closed-form expression for the logit market share model:

$$ S_{jt} = \frac{\exp(\delta_{jt})}{1 + \sum_k \exp(\delta_{kt})}. \tag{36} $$

This expression for demand share, Sjt, is a function of the utility index δ for the focal product j as well as the full choice set of products k. Normalizing the utility of one of the products, j = 0, to zero implies that the demand share of that choice is

$$ S_{0t} = \Big(1 + \sum_k \exp(\delta_{kt})\Big)^{-1}, \tag{37} $$

such that

$$ S_{jt} = \exp(\delta_{jt}) \, S_{0t}. \tag{38} $$

Therefore, (36) can be rewritten as:

$$ \ln(S_{jt}) - \ln(S_{0t}) = \delta_{jt}, \tag{39} $$

where δjt is the indirect utility function we are interested in recovering. In general we specify

$$ \delta_{jt} = \delta(x_{jt}, p_{jt}, \xi_{jt}), \tag{40} $$

which is composed of an exogenous product or market attribute xjt, the endogenous price variable pjt, and a market-level demand shock ξjt, often characterized as a time-varying product attribute that is unobserved by the econometrician and partially observed by firms and consumers, leading to the endogeneity problem. Berry and Haile (2014) point out that nonparametric identification of this model requires monotonicity of the utility index δjt in ξjt. This requirement, along with the connected substitutes requirement that allows us to invert the demand system, implies that

$$ S_{jt} = h_j(s_{-jt}, x_{jt}, p_{jt}, \xi_{jt}), \tag{41} $$

which has the form of a nonseparable nonparametric regression model (Chernozhukov and Hansen 2005). The function hj is unknown, with the sole requirement that it be strictly increasing in ξjt. In this homogeneous-consumer form of the model, ξjt acts like a vertical characteristic, so invoking this assumption is natural under the standard laws of demand. Berry and Haile (2014) show that this is the case for the differentiated product logit demand model and, importantly, that it requires only a single exogenous cost-shifting instrument for the endogenous price variable pjt, beyond the exogenous characteristic xjt, when price enters only through the demand index as specified in (40). For estimation we specify a form of the model analogous to the one addressed in the introduction and used in the simulation exercise, where the unobserved component of the model ξjt is an additive scalar:

$$ \ln(S_{jt}) - \ln(S_{0t}) = h_j(x_{jt}, p_{jt}) + \xi_{jt}. \tag{42} $$
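The share inversion in equations (36) through (39), which produces the dependent variable in (42), can be illustrated numerically. This is a minimal sketch for a single market; the function names and the utility values are hypothetical.

```python
import numpy as np

def logit_shares(delta):
    """Inside-good shares (36) and outside-good share (37) for one market."""
    delta = np.asarray(delta, dtype=float)
    denom = 1.0 + np.exp(delta).sum()
    return np.exp(delta) / denom, 1.0 / denom

def invert_shares(shares, outside_share):
    """Recover the mean utilities from shares, as in equation (39)."""
    return np.log(np.asarray(shares, dtype=float)) - np.log(outside_share)

delta = np.array([-1.0, -2.5, -4.0])        # hypothetical mean utilities
shares, s0 = logit_shares(delta)
recovered = invert_shares(shares, s0)       # equals delta up to rounding
```

Because the outside share depends on the market size the researcher specifies, the recovered utilities, and hence the dependent variable in (42), depend on that choice as well.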

The data to which researchers typically apply this model are observed from the marketplace, where product managers set price, promotion, and product strategy conditional on their observation of ξ. This implies a plausible correlation between price and promotion and the structural error ξ. Typically researchers specify a parametric model of h(·), such as a linear one, and use well-established parametric instrumental variable estimation techniques, such as GMM, maximum likelihood, or empirical likelihood, to compute consistent parameter estimates (Berry, Levinsohn, and Pakes 1995; Nevo 2001). While researchers have gone to great lengths to make this model ever more flexible, they continue to restrict the model to some parametric form because of the difficulty of using nonparametric estimation techniques in the presence of endogenous regressors. Moreover, to the best of our knowledge, the literature has not addressed the estimation of partial effects within the nonparametric instrumental variables estimation framework developed in the economic theory literature. The partial effects are precisely the estimates that the typical analyst is interested in obtaining for hypothesis testing and for developing policy insights and recommendations. Estimates of the partial effects allow the researcher to recover elasticities, such as own- or cross-price elasticities, without restricting the functional form of utility. The own- and cross-price demand elasticities for this model are derived from the logit demand share expression (36) and take the following functional form:

$$ \varepsilon_{jj} = \frac{\partial \delta_j}{\partial p_j} (1 - S_j) \, p_j \quad \text{(own)}, \qquad \varepsilon_{jk} = -\frac{\partial \delta_j}{\partial p_k} S_k \, p_k \quad \text{(cross)}. $$

Here, Sj and Sk are the demand shares of product j and of products k ≠ j. To estimate the model specified in Equation (42), we use a data set that tracks purchases of breakfast cereals over 36 months, from January 2006 through December 2008, across 8 major metropolitan markets and 21 brands. The data set records quantity purchased, which we translate into demand share by exogenously specifying a market size, a common practice in the literature applying these models. The data also record the product-level mean purchase price after price promotions and product-level gross rating points (GRPs), a measure of television advertising reach and frequency. To instrument for price we use prices of products in other markets (Hausman and Taylor 1981; Nevo 2001) as the cost-shifting instrument required for nonparametric identification. The intuition is that prices in other markets are correlated with production costs that are common across all markets, such as product ingredients, while these costs are independent of demand shocks observed within a geographic market, making them a suitable instrument. One critique of using the prices of products in other markets as an instrument is that national advertising may confound their effectiveness through common advertising pressure across markets; however, we observe both national and local advertising levels, plausibly strengthening the chosen cost-shifting instrument. Armstrong (2014) argues that characteristics of competing products, often used to instrument for price, are correlated with prices through equilibrium markups; as a result, under Nash-Bertrand competition their dependence diminishes as the number of products in the market becomes large relative to the number of markets in the sample. Armstrong (2014) points out that instruments that shift marginal cost directly do not suffer from this type of instrument weakness.
Prices of products in other markets are proxies for production costs that shift marginal cost directly; consequently, they do not suffer from the type of instrument weakness that Armstrong (2014) cites. One may also argue that advertising GRPs are set upon observing the demand shock. Although this is an accurate insight, one institutional fact about the contracts that advertisers make with product marketers reduces this concern. Specifically, product marketers contract with the television advertiser to deliver a certain number of GRPs (GRPs are the currency of advertising), and because the delivery of GRPs is subject to television viewership, the advertisers do not have complete control over the exact amount of GRPs they can deliver within a market period. Consequently, advertisers “make good” on the contracts by running lower levels of advertising for the product marketers in subsequent contract periods (Dube, Hitsch, and Manchanda 2005). This institutional detail provides enough exogenous variation in advertising to ignore any systematic difference in the point estimates due to potential advertising endogeneity. In Figure 3 we plot the estimates of the nonparametric densities for price per ounce, log(1+GRP), and the dependent variable. The first thing to notice is that the advertising data are not normally distributed, while the other densities, excluding outliers, are approximately normal.¹⁰ In Table 8, we estimate the partial effects using OLS, 2SLS, and a nonparametric estimator under the assumption that price is exogenous (NP). The estimated partial effect from OLS is larger in magnitude than the nonparametric estimate whether or not we control for brand and time effects.¹¹ Because price is positively correlated with the demand shock ξ, the estimated partial effect is biased upward for both the OLS and NP estimates. The magnitude of the bias suggests that there is a significant amount of endogeneity present in the model.

10 Using the conditional normality test of Hoderlein and Holzmann (2011), we reject the null hypothesis that the endogenous variable and instruments are normally distributed conditional on the exogenous variables.
11 We report these estimated partial effects to facilitate a comparison between the NPIV estimates of the own- and cross-price elasticities and those implied by the alternative estimates presented in Table 8.
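The direction of the bias described above can be illustrated with a small simulation. This is a sketch under assumed values, not the paper's data: the price equation, the true slope of –10, and the function names are hypothetical, and the cost shifter `z` stands in for prices in other markets.

```python
import numpy as np

def ols_slope(x, y):
    """Slope from a simple regression of y on x with an intercept."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

def iv_slope(z, x, y):
    """Just-identified IV (2SLS) slope using a single instrument z."""
    zc = z - z.mean()
    return float(zc @ (y - y.mean()) / (zc @ (x - x.mean())))

def simulate(n=20000, beta=-10.0, seed=0):
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)                  # cost shifter, independent of the shock
    xi = rng.standard_normal(n)                 # demand shock
    price = z + 0.5 * xi + 0.5 * rng.standard_normal(n)   # price responds to xi
    y = beta * price + xi                       # linear utility index
    return ols_slope(price, y), iv_slope(z, price, y)

ols_estimate, iv_estimate = simulate()
```

In this design the OLS slope converges to β + cov(price, ξ)/var(price) = –10 + 0.5/1.5, which lies above the true value, while the IV slope is centered on –10. The same ordering appears in Table 8, where the OLS estimates are less negative than the 2SLS estimates.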

Figure 3: Density Estimates. Panel A: density estimate for price. Panel B: density estimate for GRP. Panel C: density estimate for the dependent variable y.

Table 8: OLS, 2SLS, and Nonparametric Estimates for the Average Partial Effect.

                                  βOLS      β2SLS      NP
Without brand and time effects    –5.688    –16.787    –2.949
With brand and time effects       –4.905    –13.679    –3.305

We next estimate the model without imposing any structure on the unknown function, under the assumption that price is endogenous. We fix kn = 5, as this should be sufficient to pick up any nonlinearities present in the model. We also estimate the model across a variety of values for the penalization parameter λ. The impact λ has on the estimated function is evident from Figure 4. The slope of the function is strictly increasing in λ and eventually becomes positive for λ = 0.1. To estimate price elasticities we estimate the partial effect at different points along the support of price and GRP. Table 9 shows the point estimates for the marginal utility with respect to price at the 25th, 50th, and 75th percentiles of price for a fixed value of advertising GRP. We also estimate the average derivative, as discussed in Section 3 of the paper, as well as bootstrapped 95% confidence intervals across various values of λ. We bootstrap the data across panel identifiers to respect the panel structure of our data. An interesting result from Table 9 is that for λ = 0 and λ = 0.0001 there is a substantial difference between the derivatives at the different price percentiles, suggesting that consumers are much more sensitive to price changes when they occur at the 75th percentile. For example, when λ = 0 and x1 = ω(x1,0.50), the derivative at the 75th percentile is roughly 1.75 times as large as the derivative at the 25th percentile. This suggests a significant nonlinearity that would normally be missed under a linear framework. The results are similar for λ = 0.0001; however, for λ ≥ 0.001 the derivative estimates across the different percentiles are similar, suggesting that the model is roughly linear. It should be noted that for λ between 0 and 0.001 the average derivative is fairly stable, ranging from –7.97 to –9.02 when GRP is fixed at its median value. We also estimate the function allowing brand and time effects to enter linearly, which lets us estimate the function using the semi-nonparametric approach laid out in Section 2.1. Figure 5 shows that for the same penalization parameter, the marginal effect is smaller when allowing for brand and time effects.

Looking at Table 10, we see that with brand and time effects the average derivative for λ = 0.0001 is –6.095, as compared to –9.028 when we do not control for brand or time effects, which not surprisingly indicates that price is correlated with brand and time effects. We also note that the results fail to indicate an economically significant interaction effect between price and advertising.

Figure 4: Estimated Function Across Various Values for λ. (Estimates of φ(p) at x1 = x1,0.50, plotted against price for λ = 0, 0.0001, 0.001, 0.01, and 0.1.)
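Resampling across panel identifiers, as described for the confidence intervals in Table 9, can be sketched as a cluster bootstrap. The code below is an illustration only: the statistic is a simple mean rather than the NPIV derivative estimator, the percentile interval is an assumed choice, and the panel layout in the example (40 panels of 36 periods) is hypothetical.

```python
import numpy as np

def cluster_bootstrap_ci(panel_ids, values, stat=np.mean, reps=999,
                         level=0.95, seed=0):
    """Percentile interval from resampling whole panels with replacement."""
    rng = np.random.default_rng(seed)
    panel_ids = np.asarray(panel_ids)
    values = np.asarray(values, dtype=float)
    # Keep each panel's observations together so within-panel dependence is preserved.
    groups = [values[panel_ids == g] for g in np.unique(panel_ids)]
    draws = np.empty(reps)
    for r in range(reps):
        picked = rng.integers(0, len(groups), size=len(groups))
        draws[r] = stat(np.concatenate([groups[i] for i in picked]))
    alpha = 1.0 - level
    lower, upper = np.quantile(draws, [alpha / 2.0, 1.0 - alpha / 2.0])
    return float(lower), float(upper)

rng = np.random.default_rng(1)
ids = np.repeat(np.arange(40), 36)                       # hypothetical panel identifiers
obs = rng.standard_normal(40)[ids] + rng.standard_normal(ids.size)
ci = cluster_bootstrap_ci(ids, obs, reps=300)
```

Drawing whole panels rather than individual observations typically widens the interval when observations within a panel are correlated, which is the reason for respecting the panel structure.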

Table 9: NPIV Derivative Estimates for Different Values of λ.
(Each entry is the derivative estimate followed by the bootstrapped 95% CI in brackets. Columns give the derivative evaluated at the 25th, 50th, and 75th percentiles of y2 and the average derivative.)

x1 = ω(x1,0.25)
λ         At y2,0.25                   At y2,0.50                    At y2,0.75                    Average
0.1       0.358 [–0.879, 1.390]        0.620 [–0.639, 1.674]         1.058 [–0.089, 2.042]         0.723 [0.207, 1.349]
0.01      –2.928 [–6.169, 0.034]       –2.774 [–5.867, 0.180]        –2.153 [–4.804, 0.558]        –2.384 [–5.332, 0.363]
0.001     –6.808 [–12.938, 0.856]      –7.146 [–12.874, 0.117]       –6.839 [–11.657, –1.188]      –6.493 [–11.369, –1.027]
0.0001    –6.258 [–13.930, 3.264]      –7.846 [–15.048, 0.226]       –8.997 [–14.371, –3.265]      –7.151 [–12.063, –0.825]
0         –8.966 [–20.321, 9.006]      –12.378 [–22.463, 2.849]      –12.672 [–20.129, –2.355]     –9.219 [–15.407, 0.351]

x1 = ω(x1,0.50)
λ         At y2,0.25                   At y2,0.50                    At y2,0.75                    Average
0.1       0.3312 [–0.923, 1.309]       0.5894 [–0.716, 1.582]        1.023 [–0.211, 1.946]         0.693 [0.277, 1.231]
0.01      –2.814 [–5.922, –0.180]      –2.674 [–5.666, 0.017]        –2.084 [–4.927, 0.461]        –2.296 [–4.996, 0.064]
0.001     –8.364 [–14.825, 0.358]      –8.7843 [–14.899, –0.615]     –8.409 [–13.705, –1.771]      –7.977 [–13.514, –1.378]
0.0001    –8.184 [–16.457, 4.908]      –10.024 [–17.484, 1.396]      –11.157 [–16.534, –3.019]     –9.028 [–14.162, –0.902]
0         –7.527 [–20.285, 11.166]     –11.716 [–21.799, 4.620]      –13.100 [–18.944, –2.839]     –8.634 [–14.057, –0.273]

x1 = ω(x1,0.75)
λ         At y2,0.25                   At y2,0.50                    At y2,0.75                    Average
0.1       0.317 [–0.238, 1.280]        0.570 [–0.011, 1.546]         0.996 [0.427, 1.902]          0.672 [0.279, 1.183]
0.01      –2.712 [–5.421, –0.193]      –2.594 [–5.335, –0.032]       –2.046 [–4.657, 0.354]        –2.230 [–4.825, 0.064]
0.001     –8.344 [–13.945, 0.070]      –8.889 [–14.061, –0.764]      –8.680 [–13.044, –1.938]      –8.091 [–12.381, –1.496]
0.0001    –7.614 [–16.352, 4.627]      –10.018 [–17.675, 1.050]      –11.847 [–17.289, –3.574]     –9.015 [–14.224, –1.129]
0         –5.883 [–20.462, 16.724]     –11.457 [–22.259, 8.565]      –14.409 [–20.022, –0.959]     –8.194 [–13.926, –0.344]

Figure 5: Estimated Function Across Various Values for λ Controlling for Brand and Time Effects. (Estimates of φ(p) at x1 = x1,0.50, plotted against price for λ = 0, 0.0001, 0.001, 0.01, and 0.1.)

Using brand and time effects, we compute the average own- and cross-price elasticities for the 21 brands in the data. The results for the NPIV approach are presented in Table 11, while the results for OLS, 2SLS, and the nonparametric (NP) approach are presented in Table 12. As expected, the elasticities computed under the NPIV approach are relatively insensitive to values of λ between 0 and 0.001. Comparing the elasticity estimates in Tables 11 and 12, the elasticities implied by the 2SLS estimates are typically twice the size of those computed under the NPIV approach. For Kellogg's Rice Krispies, for example, a 1% increase in price leads to a 2.74% decline in market share using the 2SLS estimates, while the decline is slightly over 1% using the NPIV-implied elasticities. The elasticities computed using OLS and the nonparametric approach are generally smaller than those computed under the NPIV approach. Overall, the computed cross-price elasticities are of the expected sign and, not surprisingly, the 2SLS cross-price elasticities are twice those implied by the NPIV method.
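The way a derivative estimate maps into the elasticities discussed above can be sketched from the logit elasticity formulas. The share and price values below are hypothetical and chosen only for illustration; the slope –13.679 is the 2SLS estimate with brand and time effects from Table 8, and the function names are not from the paper.

```python
def own_price_elasticity(d_delta_d_price, share, price):
    """Own-price elasticity: (d delta_j / d p_j) * (1 - S_j) * p_j."""
    return d_delta_d_price * (1.0 - share) * price

def cross_price_elasticity(d_delta_d_price_k, share_k, price_k):
    """Cross-price elasticity with respect to p_k: -(derivative) * S_k * p_k."""
    return -d_delta_d_price_k * share_k * price_k

share, price = 0.02, 0.20           # HYPOTHETICAL market share and price per ounce
slope_2sls = -13.679                # Table 8, 2SLS with brand and time effects

own = own_price_elasticity(slope_2sls, share, price)
cross = cross_price_elasticity(slope_2sls, share, price)
```

At a given share and price both elasticities are proportional to the derivative estimate, so a 2SLS slope roughly twice the NPIV derivative translates directly into elasticities roughly twice as large.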

7 Conclusion

In summary, this research finds that NPIV estimators have attractive properties, principally that the penalization method applied to address ill-posedness remedies the effect of having weak instruments. The article also offers a straightforward approach for estimating marginal effects, demonstrates its performance in the context of the estimation of marginal effects, and demonstrates the economic implications of relaxing linearity in the random utility model applied for estimation of an aggregate logit demand model. Recent developments in the literature on nonparametric instrumental variables estimation and the nonparametric identification of these models lead us to investigate the implications for practical implementation of nonparametric models. Monte Carlo experiments indicate that when instruments are weak, in the classic sense, the nonparametric estimate of the marginal effect outperforms the two-stage least squares estimator even when the model is correctly specified. When the instruments are strong, the nonparametric estimator for the partial effects performs well compared to the two-stage least squares estimator as the number of IVs increases. Comparing the standard rule-of-thumb (ROT) choice of bandwidth with cross-validation (CV), cross-validation attains a better mechanical fit when the number of instruments is small, but as the number of instruments increases the ROT bandwidth exhibits superior performance. We apply the NPIV estimator to empirical data from a consumer product market, and estimate an aggregate logit demand model

1.350  0.294  0.857  6.764  11.157 

1.161  0.572  1.235  3.264  7.495 

95% CI at x 1 = ω( x 1,0.25 )  λ = 0.1   –0.503   λ = 0.01   –4.000   λ = 0.001   –10.546   λ = 0.0001  –13.930   λ = 0   –14.209 

y 2, 0.25



95% CI at x 1 = ω( x 1,0.75 )  λ = 0.1   –0.795   λ = 0.01   –4.712   λ = 0.001   –12.239   λ = 0.0001  –14.314   λ = 0   –16.538 

ˆ (y , x ) ∂φ 2 1 ∂y 2

1.050  0.140  0.035  5.470  14.993 



95% CI at x 1 = ω( x 1,0.50 )  λ = 0.1   –0.666   λ = 0.01   –4.315   λ = 0.001   –11.446   λ = 0.0001  –12.342   λ = 0   –14.793 

λ

–0.333  –3.893  –10.999  –15.048  –16.022 

–0.550  –4.570  –12.819  –16.222  –18.917 

–0.501  –4.127  –11.887  –13.504  –17.162 

y 2, 0.50



1.362  0.834  0.659  0.226  1.911 

1.622  0.454  –0.271  3.268  3.804 

1.268  0.348  –0.690  2.384  7.029 

ˆ (y , x ) ∂φ 2 1 ∂y 2

0.058  –3.474  –10.598  –14.371  –15.072 

–0.009  –4.229  –12.142  –16.554  –18.694 

–0.095  –3.724  –10.640  –14.032  –16.887 

y 2, 0.75



1.624  1.281  –0.126  –3.265  –3.126 

1.971  0.856  –1.929  –1.574  –4.002 

1.550  0.785  –1.787  –1.678  –0.733 

ˆ (y , x ) ∂φ 2 1 ∂y 2

0.214  –3.395  –9.755  –12.063  –10.706 

0.393  –4.168  –11.338  –12.411  –11.274 

y 2, 0.25



Derivative estimates 0.429  –2.125  –7.012  –5.365  –2.583  Derivative estimates 0.364  –1.548  –5.481  –6.258  –3.776 

  1.082  0.833  0.086  –0.825  1.052 

Derivative estimates 0.340  –1.790  –5.836  –4.506  –2.635 

ˆ (y , x ) ∂φ 2 1 ∂y 2

  1.259  0.504  –0.986  0.268  0.572 

  0.997  0.268  –0.758  –0.304  0.193 

 ∂φ( y 2 , x 1 )    Eˆ   ∂y 2  

0.229  –3.743  –10.567  –11.646  –10.371 

Table 10: NPIV Derivative Estimates for Different Values of λ Including Brand and Time Effects.

y 2,0.50



0.564  –1.341  –5.806  –7.846  –7.429 

0.684  –1.987  –7.607  –8.080  –8.782 

0.546  –1.631  –6.299  –6.654  –7.362 

ˆ (y , x ) ∂φ 2 1 ∂y 2

y 2, 0.75



0.887  –0.774  –5.618  –8.997  –9.387 

1.102  –1.465  –7.612  –10.588  –13.308 

0.883  –1.118  –6.261  –8.662  –10.735 

ˆ (y , x ) ∂φ 2 1 ∂y 2

0.633 –1.060 –5.274 –7.151 –5.380

0.777 –1.669 –6.946 –7.338 –5.309

0.621 –1.344 –5.747 –6.095 –5.355

 ∂φ( y 2 , x 1 )  Eˆ   ∂y 2  

Table 11: Average Own and Cross-Price Elasticities for Different Values of λ with Brand and Time Effects.

Own-price columns report $\frac{\partial \hat{\delta}_j}{\partial p_j}(1 - S_j)\,p_j$; cross-price columns report $-\frac{\partial \hat{\delta}_j}{\partial p_k}\,S_k\,p_k$.

| Brand | λ = 0.01, own | λ = 0.01, cross | λ = 0.001, own | λ = 0.001, cross | λ = 0.0001, own | λ = 0.0001, cross | λ = 0, own | λ = 0, cross |
|---|---|---|---|---|---|---|---|---|
| P COCOA PEBBLES | –0.225 | 0.001 | –0.961 | 0.006 | –1.019 | 0.006 | –0.895 | 0.006 |
| P FRUITY PEBBLES | –0.228 | 0.002 | –0.973 | 0.007 | –1.032 | 0.007 | –0.906 | 0.006 |
| P HONEY-COMB | –0.223 | 0.001 | –0.954 | 0.005 | –1.012 | 0.005 | –0.889 | 0.005 |
| GM CINNAMON TOAST CRUNCH | –0.211 | 0.005 | –0.903 | 0.021 | –0.957 | 0.022 | –0.841 | 0.019 |
| GM COCOA PUFFS | –0.247 | 0.002 | –1.056 | 0.008 | –1.120 | 0.009 | –0.984 | 0.008 |
| GM COOKIE-CRISP | –0.292 | 0.002 | –1.249 | 0.007 | –1.324 | 0.007 | –1.164 | 0.006 |
| GM HONEY NUT CHEERIOS | –0.214 | 0.011 | –0.917 | 0.045 | –0.972 | 0.048 | –0.854 | 0.042 |
| GM LUCKY CHARMS | –0.242 | 0.004 | –1.036 | 0.017 | –1.098 | 0.018 | –0.965 | 0.015 |
| Q CAPN CRN | –0.192 | 0.002 | –0.823 | 0.006 | –0.872 | 0.007 | –0.766 | 0.006 |
| Q CAPN CRN CRN BRY | –0.201 | 0.001 | –0.859 | 0.006 | –0.911 | 0.006 | –0.801 | 0.005 |
| GM REESES PUFFS | –0.241 | 0.002 | –1.028 | 0.008 | –1.091 | 0.008 | –0.958 | 0.007 |
| GM TRIX | –0.276 | 0.002 | –1.182 | 0.006 | –1.254 | 0.007 | –1.101 | 0.006 |
| K APPLE JACKS | –0.233 | 0.002 | –0.994 | 0.010 | –1.055 | 0.011 | –0.927 | 0.009 |
| K CORN POPS | –0.235 | 0.002 | –1.004 | 0.010 | –1.065 | 0.010 | –0.936 | 0.009 |
| K FROOT LOOPS | –0.232 | 0.003 | –0.991 | 0.013 | –1.051 | 0.014 | –0.923 | 0.013 |
| K FROSTED FLAKES | –0.184 | 0.007 | –0.787 | 0.028 | –0.835 | 0.030 | –0.734 | 0.026 |
| K FROSTED MINI-WHEATS | –0.179 | 0.008 | –0.766 | 0.036 | –0.813 | 0.038 | –0.714 | 0.033 |
| K RAISIN BRAN | –0.160 | 0.005 | –0.684 | 0.020 | –0.725 | 0.021 | –0.637 | 0.018 |
| K RAISIN BRAN CRUNCH | –0.193 | 0.002 | –0.824 | 0.009 | –0.874 | 0.010 | –0.768 | 0.009 |
| K RICE KRISPIES | –0.269 | 0.004 | –1.150 | 0.019 | –1.219 | 0.020 | –1.071 | 0.017 |
| Q LIFE | –0.186 | 0.003 | –0.796 | 0.012 | –0.844 | 0.012 | –0.741 | 0.011 |

Own-price elasticities show the percent change in market share for brand j for a 1% change in price for brand j. Cross-price elasticities show the percent change in market share for brand j for a 1% change in price for brand k.

Table 12: Average Own and Cross-Price Elasticities for OLS, 2SLS, and Nonparametric Estimators with Brand and Time Effects.

Own-price columns report $\frac{\partial \hat{\delta}_j}{\partial p_j}(1 - S_j)\,p_j$; cross-price columns report $-\frac{\partial \hat{\delta}_j}{\partial p_k}\,S_k\,p_k$.

| Brand | OLS, own | OLS, cross | 2SLS, own | 2SLS, cross | Nonparametric, own | Nonparametric, cross |
|---|---|---|---|---|---|---|
| P COCOA PEBBLES | –0.820 | 0.005 | –2.286 | 0.014 | –0.552 | 0.004 |
| P FRUITY PEBBLES | –0.830 | 0.006 | –2.315 | 0.016 | –0.559 | 0.004 |
| P HONEY-COMB | –0.814 | 0.004 | –2.271 | 0.012 | –0.549 | 0.003 |
| GM CINNAMON TOAST CRUNCH | –0.770 | 0.018 | –2.148 | 0.049 | –0.519 | 0.012 |
| GM COCOA PUFFS | –0.901 | 0.007 | –2.514 | 0.020 | –0.607 | 0.005 |
| GM COOKIE-CRISP | –1.066 | 0.006 | –2.972 | 0.016 | –0.718 | 0.004 |
| GM HONEY NUT CHEERIOS | –0.783 | 0.038 | –2.182 | 0.107 | –0.527 | 0.026 |
| GM LUCKY CHARMS | –0.884 | 0.014 | –2.465 | 0.039 | –0.596 | 0.010 |
| Q CAPN CRN | –0.702 | 0.005 | –1.958 | 0.015 | –0.473 | 0.004 |
| Q CAPN CRN CRN BRY | –0.733 | 0.005 | –2.045 | 0.013 | –0.494 | 0.003 |
| GM REESES PUFFS | –0.878 | 0.007 | –2.447 | 0.018 | –0.591 | 0.005 |
| GM TRIX | –1.009 | 0.005 | –2.813 | 0.015 | –0.680 | 0.004 |
| K APPLE JACKS | –0.849 | 0.009 | –2.367 | 0.024 | –0.572 | 0.006 |
| K CORN POPS | –0.857 | 0.008 | –2.391 | 0.023 | –0.578 | 0.006 |
| K FROOT LOOPS | –0.845 | 0.011 | –2.358 | 0.032 | –0.570 | 0.008 |
| K FROSTED FLAKES | –0.672 | 0.024 | –1.874 | 0.066 | –0.453 | 0.016 |
| K FROSTED MINI-WHEATS | –0.654 | 0.030 | –1.824 | 0.085 | –0.441 | 0.021 |
| K RAISIN BRAN | –0.583 | 0.017 | –1.627 | 0.047 | –0.393 | 0.011 |
| K RAISIN BRAN CRUNCH | –0.703 | 0.008 | –1.962 | 0.022 | –0.474 | 0.005 |
| K RICE KRISPIES | –0.981 | 0.016 | –2.736 | 0.044 | –0.661 | 0.011 |
| Q LIFE | –0.679 | 0.010 | –1.894 | 0.028 | –0.458 | 0.007 |

Own-price elasticities show the percent change in market share for brand j for a 1% change in price for brand j. Cross-price elasticities show the percent change in market share for brand j for a 1% change in price for brand k.

in its parameter-free form. The empirical analysis documents economically significant differences between nonparametric and parametric estimates of own- and cross-price elasticities in a monopolistically competitive consumer product market.
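For readers who want a concrete picture of the rule-of-thumb bandwidth compared with cross-validation above, the following is a minimal sketch. It assumes the univariate normal-reference rule h = 1.06 σ̂ n^(−1/5) commonly associated with Silverman (1986); the exact ROT used in the Monte Carlo experiments may differ (for example in the constant or the exponent when there are several regressors). The function name and the sample are hypothetical.

```python
import math
import statistics

# Minimal sketch, assuming the univariate normal-reference rule
# h = 1.06 * sigma_hat * n^(-1/5). The constant and exponent are assumptions
# about the ROT, not taken from the paper.

def rule_of_thumb_bandwidth(sample, constant=1.06):
    """Return a normal-reference rule-of-thumb bandwidth for a 1-D sample."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    sigma_hat = statistics.stdev(sample)  # sample standard deviation
    return constant * sigma_hat * n ** (-1.0 / 5.0)


# Hypothetical evenly spaced sample, used only to exercise the function.
sample = [0.1 * i for i in range(100)]
h = rule_of_thumb_bandwidth(sample)
print(f"rule-of-thumb bandwidth: {h:.4f}")
```

Unlike cross-validation, this choice requires no optimization over the bandwidth, which is part of its practical appeal when the number of instruments grows.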

Appendix

It is well known that, for a convergent sequence of functions, we cannot in general infer the convergence of the derivative functions. For a detailed discussion, we refer to Rudin (1976, Chapter 7). As it is desirable to estimate the derivative of the limiting function, we propose the following estimator:

$$\frac{\partial \hat{\varphi}(y_2, x_1)}{\partial y_2} := \frac{\hat{\varphi}(y_2 + c_n, x_1) - \hat{\varphi}(y_2, x_1)}{c_n},$$

for some $c_n \to 0$. We now derive the conditions to impose on $c_n$ such that

$$\lim_{n \to \infty} \frac{\partial \hat{\varphi}_n(y_2, x_1)}{\partial y_2} = \frac{\partial \varphi(y_2, x_1)}{\partial y_2},$$

where the subscript $n$ marks the dependence of the estimator on the sample size and

$$\frac{\partial \varphi(y_2, x_1)}{\partial y_2} := \lim_{n \to \infty} \frac{\varphi(y_2 + c_n, x_1) - \varphi(y_2, x_1)}{c_n}.$$

We have

$$
\begin{aligned}
\lim_{n \to \infty} \left[ \frac{\partial \hat{\varphi}_n(y_2, x_1)}{\partial y_2} - \frac{\partial \varphi(y_2, x_1)}{\partial y_2} \right]
&= \lim_{n \to \infty} \frac{\hat{\varphi}(y_2 + c_n, x_1) - \hat{\varphi}(y_2, x_1)}{c_n} - \lim_{n \to \infty} \frac{\varphi(y_2 + c_n, x_1) - \varphi(y_2, x_1)}{c_n} \\
&= \lim_{n \to \infty} \left[ \frac{\hat{\varphi}(y_2 + c_n, x_1) - \hat{\varphi}(y_2, x_1)}{c_n} - \frac{\varphi(y_2 + c_n, x_1) - \varphi(y_2, x_1)}{c_n} \right] \\
&= \lim_{n \to \infty} \frac{[\hat{\varphi}(y_2 + c_n, x_1) - \varphi(y_2 + c_n, x_1)] - [\hat{\varphi}(y_2, x_1) - \varphi(y_2, x_1)]}{c_n} \\
&\leq \lim_{n \to \infty} \frac{|\hat{\varphi}(y_2 + c_n, x_1) - \varphi(y_2 + c_n, x_1)| + |\hat{\varphi}(y_2, x_1) - \varphi(y_2, x_1)|}{c_n}.
\end{aligned}
$$

If we use the estimator defined by GS, then by their Proposition 2,

$$\sup |\hat{\varphi}(y_2 + c_n, x_1) - \varphi(y_2 + c_n, x_1)| = \sup |\hat{\varphi}(y_2, x_1) - \varphi(y_2, x_1)| = O_P\big((\log n)\, n^{-\kappa}\big),$$

for some $\kappa > 0$. Therefore, if we choose $c_n = O((\log n)\, n^{-\gamma})$ for some $\gamma \in (0, \kappa)$, we will obtain

$$\lim_{n \to \infty} \left[ \frac{\partial \hat{\varphi}_n(y_2, x_1)}{\partial y_2} - \frac{\partial \varphi(y_2, x_1)}{\partial y_2} \right] = O_P(1).$$

A similar argument could be made for BCK’s estimator, but with a possibly different $\gamma$.
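The bound in this argument can be illustrated numerically. The sketch below is not the GS or BCK estimator. It assumes a known function φ(y) = sin(y), a hypothetical "estimate" equal to φ plus a perturbation whose sup-norm equals ε_n = (log n) n^(−κ), and a step c_n = (log n) n^(−γ) of exactly that order, with κ = 0.4 and γ = 0.2 chosen arbitrarily. It checks that the gap between the two difference quotients never exceeds 2ε_n / c_n = 2 n^(−(κ−γ)), which shrinks as n grows under these assumptions.

```python
import math

# Illustrative sketch only: phi, the perturbation, kappa, gamma, and y0 are
# all hypothetical choices made to exercise the bound in the Appendix.

def forward_difference(f, y, c):
    """Difference quotient (f(y + c) - f(y)) / c."""
    return (f(y + c) - f(y)) / c


def phi(y):
    """Stand-in for the true structural function."""
    return math.sin(y)


def make_phi_hat(eps):
    """Return phi plus a perturbation bounded by eps in sup-norm."""
    def phi_hat(y):
        return phi(y) + eps * math.cos(50.0 * y)
    return phi_hat


kappa, gamma = 0.4, 0.2  # assumed rates with 0 < gamma < kappa
y0 = 0.3                 # evaluation point

rows = []
for n in (10**3, 10**4, 10**5, 10**6):
    eps_n = math.log(n) * n ** (-kappa)  # assumed sup-norm estimation error
    c_n = math.log(n) * n ** (-gamma)    # step size of exact order
    phi_hat = make_phi_hat(eps_n)
    gap = abs(forward_difference(phi_hat, y0, c_n)
              - forward_difference(phi, y0, c_n))
    bound = 2.0 * eps_n / c_n            # equals 2 * n^(-(kappa - gamma))
    rows.append((n, gap, bound))

for n, gap, bound in rows:
    print(f"n = {n:>7}: gap = {gap:.5f}, bound = {bound:.5f}")
```

The step must shrink more slowly than the estimation error for the ratio to vanish, which is why γ is restricted to lie strictly below κ.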

References

Ai, C., and X. Chen. 2003. “Efficient Estimation of Models with Conditional Moment Restrictions Containing Unknown Functions.” Econometrica 71 (6): 1795–1843.
Andrews, D. 2011. “Examples of L2-Complete and Boundedly-Complete Distributions.” Cowles Foundation Discussion Paper no. 1801, Yale University.
Armstrong, T. 2014. “Large Market Asymptotics for Differentiated Product Demand Estimators with Economic Models of Supply.” Tech. rep., Yale Economics Working Paper.
Berry, S. T. 1994. “Estimating Discrete-Choice Models of Product Differentiation.” RAND Journal of Economics 25 (2): 242–262.
Berry, S., and P. Haile. 2014. “Identification in Differentiated Products Markets Using Market Level Data.” Econometrica 82 (5): 1749–1797.
Berry, S., J. Levinsohn, and A. Pakes. 1995. “Automobile Prices in Market Equilibrium.” Econometrica 63 (4): 841–890.
Blundell, R., X. Chen, and D. Kristensen. 2007. “Semi-Nonparametric IV Estimation of Shape-Invariant Engel Curves.” Econometrica 75 (6): 1613–1669.
Carrasco, M., J.-P. Florens, and E. Renault. 2007. “Linear Inverse Problems in Structural Econometrics Estimation Based on Spectral Decomposition and Regularization.” In Handbook of Econometrics, vol. 6, edited by J. Heckman and E. Leamer, chap. 77. Philadelphia: Elsevier.
Chernozhukov, V., and C. Hansen. 2005. “An IV Model of Quantile Treatment Effects.” Econometrica 73 (1): 245–261.
Darolles, S., Y. Fan, J. P. Florens, and E. Renault. 2011. “Nonparametric Instrumental Regression.” Econometrica 79 (5): 1541–1565.
D’Haultfoeuille, X. 2011. “On the Completeness Condition in Nonparametric Instrumental Problems.” Econometric Theory 27 (3): 460–471.
Dube, J.-P., G. J. Hitsch, and P. Manchanda. 2005. “An Empirical Model of Advertising Dynamics.” Quantitative Marketing and Economics 3 (2): 107–144.
Gagliardini, P., and O. Scaillet. 2012a. “Tikhonov Regularization for Nonparametric Instrumental Variable Estimators.” Journal of Econometrics 167 (1): 61–75.
Gagliardini, P., and O. Scaillet. 2012b. “Nonparametric Instrumental Variable Estimation of Structural Quantile Effects.” Econometrica 80 (4): 1533–1562.
Guadagni, P. M., and J. D. C. Little. 1983. “A Logit Model of Brand Choice Calibrated on Scanner Data.” Marketing Science 2 (3): 203–238.
Hall, P., and J. Horowitz. 2005. “Nonparametric Methods for Inference in the Presence of Instrumental Variables.” The Annals of Statistics 33 (6): 2904–2929.
Hall, P., Q. Li, and J. S. Racine. 2007. “Nonparametric Estimation of Regression Functions in the Presence of Irrelevant Regressors.” The Review of Economics and Statistics 89 (4): 784–789.
Han, S. 2014. “Nonparametric Estimation of Triangular Simultaneous Equations Models under Weak Identification.” Tech. rep., University of Texas at Austin.
Hausman, J. A., and W. E. Taylor. 1981. “Panel Data and Unobservable Individual Effects.” Econometrica 49 (6): 1377–1398.
Hoderlein, S., and H. Holzmann. 2011. “Demand Analysis as an Ill-posed Inverse Problem with Semiparametric Specification.” Econometric Theory 27 (3): 609–638.
Kress, R. 1999. Linear Integral Equations. New York: Springer.
Li, Q., and J. Racine. 2003. “Nonparametric Estimation of Distributions with Categorical and Continuous Data.” Journal of Multivariate Analysis 86 (2): 266–292.
Nelson, C. R., and R. Startz. 1990. “Some Further Results on the Exact Small Sample Properties of the Instrumental Variable Estimator.” Econometrica 58 (4): 967–976.
Nevo, A. 2001. “Measuring Market Power in the Ready-to-eat Cereal Industry.” Econometrica 69 (2): 307–342.
Newey, W., and J. Powell. 2003. “Instrumental Variable Estimation of Nonparametric Models.” Econometrica 71 (5): 1565–1578.
Rudin, W. 1976. Principles of Mathematical Analysis. 3rd ed. New York: McGraw-Hill.
Severini, T., and G. Tripathi. 2006. “Some Identification Issues in Nonparametric Linear Models with Endogenous Regressors.” Econometric Theory 22 (2): 258–278.
Silverman, B. 1986. Density Estimation for Statistics and Data Analysis. London: Chapman and Hall.
Staiger, D., and J. H. Stock. 1997. “Instrumental Variables Regression with Weak Instruments.” Econometrica 65 (3): 557–586.
Stock, J. H., J. H. Wright, and M. Yogo. 2002. “A Survey of Weak Instruments and Weak Identification in Generalized Method of Moments.” Journal of Business and Economic Statistics 20 (4): 518–529.
Tikhonov, A. 1963. “On Regularization of Ill-posed Problems.” In Dokl. Akad. Nauk SSSR, vol. 153, pp. 49–52, Russian publisher.

Supplemental Material: The online version of this article (DOI: 10.1515/jem-2013-0002) offers supplementary material, available to authorized users.
