who has shown me a great deal of unsurpassed love and now rests in peace. I also feel indebted to my adorable kids, Moonyoung and Nathan, who have ...... random walk model of Kleidon (1986) and LeRoy and Parke (1992) and the ...
Three Essays on Nonstationary Time Series and GMM Estimation
Dissertation
Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University
By Jungick Lee, M.S. Graduate Program in Economics
The Ohio State University 2010
Dissertation Committee: Robert M. de Jong, Advisor Stephen R. Cosslett Lung-Fei Lee
Copyright by Jungick Lee 2010
ABSTRACT
Three essays in this dissertation focus on developing new statistical methodologies to address some issues arising in applied econometrics literature. The first two essays concern issues involving the exponential of an integrated series, while the third essay proposes a GMM-type estimation that is robust to the presence of misspecified moment conditions. Since many time series of economic interest are often assumed to have a unit root in their logarithms, the exponential of an integrated series would then describe those series in levels. For some variables such as stock prices, dividends, and interest rates, however, it is not a priori clear, given that one suspects a unit root, whether this unit root is in the levels or in the logarithms. Consequently, in applied work, such variables are sometimes modeled in logarithms and sometimes in levels. Such a situation is considered in the second essay in the bivariate cointegrating regression context. To provide a relevant theoretical guidance as to the choice between cointegration in logarithms and cointegration in levels, testing procedures are proposed based on the slope coefficient and the residual sum of squares from a regression in levels. These test statistics then involve sums of the exponential of an integrated series under the assumption that cointegration in logarithms is the true data-generating process. Asymptotic null distributions of the proposed test statistics are derived by further developing the limit distribution ii
result for sums of the exponential of an integrated series established in the first essay. The finite-sample performance of the test statistics are also examined via Monte Carlo experiments. In empirical applications, the test results suggest that the cointegration in logarithms between real stock prices and real dividends is not well supported in the data, while there is no significant evidence against the specification of long-run money demand function using the logarithm of the nominal interest rate. A separate issue on GMM estimation is considered in the third essay. If some of the moment conditions are misspecified in GMM estimation, the standard GMM estimator is not consistent, and the subsequent inference is misleading. To address this problem, a GMM-type estimator is proposed that is robust against the presence of misspecified moment conditions. To achieve this robustness, a nonstandard criterion function is formulated as a squared weighted L2 -norm of a moment vector minimized over the space of moment selection vectors. A robust estimator is then defined by the minimizer of the objective function. This approach uses the same criterion function to select moment conditions and to estimate parameters, and thus, as opposed to the earlier literature on moment selection, it does not require any pre-selection procedures for choosing correct moment conditions. It is shown √ that the resulting estimator is n-consistent and the limit distribution is characterized by the argmin of a certain random limit function that does not depend on any incorrect moment conditions. Also, the proposed robust estimation framework is illustrated with an example of a simple linear IV model and its performance is evaluated via a simulation study.
iii
Dedicated to my parents and my family
iv
ACKNOWLEDGMENTS
First of all, I would like to express my deepest appreciation to my advisor, Professor Robert de Jong. Without his encouragement, support, excellent guidance, and persistent help, this dissertation would not have been possible. I would also like to sincerely thank Professor Stephen Cosslett, Professor Lung-Fei Lee, and Professor Masao Ogaki for their support, encouragement, insightful comments on my research, and for all their help as committee members. I extend my sincere appreciation to Professor Hajime Miyazaki for his considerate advice and constant support from the very beginning of my Ph.D. study at the Ohio State. I would also like to thank Sang-Yeob Lee for his lasting friendship. I am greatly indebted to my parents who have been always supporting and encouraging me with their best wishes. I send my best regards to my grandmother who has shown me a great deal of unsurpassed love and now rests in peace. I also feel indebted to my adorable kids, Moonyoung and Nathan, who have kept me from losing my vigor and offered a great source of energy and hope. Lastly, but certainly not leastly, my special heartfelt gratitude should go to my wife, Hyejung, for her support and extraordinary dedication to our family during the entire course of our Ph.D. studies. I cherish every moment we have been together in Columbus and look forward to the next stage.
v
VITA
1997 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B.B.A. Business Administration, Seoul National University 2000 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . M.A. Economics, Seoul National University 2000 to 2004 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Economist, The Bank of Korea 2005 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . M.A. Economics, The Ohio State University 2009 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . M.S. Statistics, The Ohio State University 2005 to present . . . . . . . . . . . . . . . . . . . . . . . . . . . . Graduate Teaching Associate, The Ohio State University
PUBLICATIONS Research Publications Lee, J. and R. M. de Jong, “Exponential functionals of integrated processes,” Economics Letters, 100(2):181–184, August 2008.
FIELDS OF STUDY Major Field: Economics
vi
TABLE OF CONTENTS
Page Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
ii
Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
iv
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
v
Vita . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
vi
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
x
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
xi
Chapters: 1.
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1
2.
Exponential of Integrated Series . . . . . . . . . . . . . . . . . . . . . . . .
6
2.1 2.2 2.3 2.4
2.5
Introduction . . . . . . . . . . . . . . . . Related studies . . . . . . . . . . . . . . Main result . . . . . . . . . . . . . . . . Examples and discussion . . . . . . . . 2.4.1 The exponential function . . . . 2.4.2 An exponential-type functional 2.4.3 Discussion . . . . . . . . . . . . Concluding remarks . . . . . . . . . . .
vii
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
6 9 10 12 12 14 14 16
3.
Cointegration in Logarithms versus Cointegration in Levels . . . . . . . 17 3.1 3.2
3.3 3.4
3.5 4.
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
17 22 22 25 29 34 35 44 53
Robust GMM-type Estimation with Misspecified Moment Conditions . . 54 4.1 4.2
4.3 4.4 4.5
4.6 5.
Introduction . . . . . . . . . . . . . . . . . . . . . . Testing for cointegration in logarithms . . . . . . 3.2.1 Assumptions . . . . . . . . . . . . . . . . . 3.2.2 Test statistics and asymptotic distributions Monte Carlo experiments . . . . . . . . . . . . . . Applications: empirical examples . . . . . . . . . 3.4.1 Stock prices and dividends . . . . . . . . . 3.4.2 Money demand function estimation . . . . Concluding remarks . . . . . . . . . . . . . . . . .
Introduction . . . . . . . . . . . . . . . . . . . . . . Estimation with misspecified moment conditions 4.2.1 Setup . . . . . . . . . . . . . . . . . . . . . 4.2.2 Objective function and estimator . . . . . . Consistency . . . . . . . . . . . . . . . . . . . . . . Limit distribution . . . . . . . . . . . . . . . . . . . Example and simulation study . . . . . . . . . . . 4.5.1 Simple linear IV model . . . . . . . . . . . 4.5.2 Consistency . . . . . . . . . . . . . . . . . . 4.5.3 Limit distribution . . . . . . . . . . . . . . 4.5.4 Discussion . . . . . . . . . . . . . . . . . . Concluding remarks . . . . . . . . . . . . . . . . .
. . . . . . . . . . . .
. . . . . . . . . . . .
. . . . . . . . . . . .
. . . . . . . . . . . .
. . . . . . . . . . . .
. . . . . . . . . . . .
. . . . . . . . . . . .
. . . . . . . . . . . .
. . . . . . . . . . . .
. . . . . . . . . . . .
54 57 57 60 62 66 72 72 74 75 79 79
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
Appendices: A.
Mathematical Proofs for Chapter 2 . . . . . . . . . . . . . . . . . . . . . . 84
B.
Useful Lemmas and Mathematical Proofs for Chapter 3 . . . . . . . . . . 87 B.1 Notations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87 B.2 Useful lemmas and their proofs . . . . . . . . . . . . . . . . . . . . 87 B.3 Proofs of Theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
viii
C.
Mathematical Proofs for Chapter 4 . . . . . . . . . . . . . . . . . . . . . . 101
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
ix
LIST OF TABLES
Table
Page
2.1
100α-percentiles of the distribution of supr∈[0,1] W (r ) . . . . . . . . . . 14
3.1
100α-percentiles of the distribution of
R1
3.2
100α-percentiles of the distribution of
R1
4.1
Selection probabilities for each instrumental variable . . . . . . . . . 74
x
0 0
W 2 (r ) dr . . . . . . . . . . . 33 W∗2 (r ) dr . . . . . . . . . . . 33
LIST OF FIGURES
Figure
Page
3.1
The estimated density of n−1/2 log βˆ n /(θˆn − 1)λˆ n . . . . . . . . . . . . 31
3.2
The estimated density of n−1/2 log Qn /2θˆn λˆ n . . . . . . . . . . . . . . 31
3.3
The estimated density of −n−1/2 log Tn /λˆ n . . . . . . . . . . . . . . . 32
3.4
e n /θˆn2 λˆ 2n . . . . . . . . . . . . . . . . . . 34 The estimated density of n−2 Q
3.5
FTSE All-Share index: (log) real price and (log) real dividend (Monthly, 1965:1–2005:12) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.6
Value-weighted NYSE index: (log) real price and (log) real dividend (Annual, 1926–2008) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.7
The U.S. long-run money demand: (log) interest rate vs. log real money balance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.8
The U.S. long-run (log) interest rate and log real money balance (Annual, 1900–1997) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.9
The seasonal adjustment of the log real money balance series by LOESS-smoothing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.10 Japanese (log) interest rate and log real money balance (Quarterly, 1979:1–2003:4) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 4.1
Distribution of parameter estimates from simulation (θ0 = 1) . . . . 76
4.2
Estimated limit distribution of θˆn from simulation (θ0 = 1, n = 10000) 78
xi
CHAPTER 1
INTRODUCTION
This dissertation studies (i) stochastic processes that can be used to describe the behavior of economic time series in levels when their logarithms are described by integrated series, and (ii) a variant of the generalized method of moments (GMM) estimation that is robust to the presence of misspecified moment conditions. It has been widely observed that many economic time series are well described by nonstationary integrated series, and hence a great deal of work has been devoted to modeling the nonstationarity and making statistical inferences about nonstationary time series models and their parameters, especially about cointegrated models and autoregressive models with unit roots. A fairly complete limit theory is now available for linear time series models with integrated series and, among others, the functional central limit theorem has become the workhorse asymptotic tool in time series econometrics. Also, Park and Phillips (1999) and several subsequent studies developed a limit theory for stochastic processes generated from various nonlinear transformations of integrated series, and established a good deal of limit distribution results for such processes. However, only a few limited results are available in the literature for the exponential of an integrated series, although it is an interesting process in practice. Since many time series of our interest in 1
macroeconomics and finance are often assumed to have a unit root in their logarithms, the exponential of an integrated series would then describe those series in levels. Chapters 2 and 3 explore the analytical and empirical issues involving the exponential of an integrated series. In Chapter 2, I derive a limit distribution for sums of stochastic processes generated from the exponential of an integrated series, viz. n
∑ exp(xt ),
(1.1)
t =1
where xt is an integrated series of order 1. Note that, if xt represents the logarithm of an economic time series, say the log interest rate, then exp( xt ) should describe the series in levels, say the interest rate (in levels). The object in (1.1) will appear, for example, in regression statistics obtained from a time series regression in which the variables are entered in levels but their logarithms are well described by integrated series. It is not obvious to obtain a limit distribution result for (1.1) since xt cannot straightforwardly be normalized with a sample size to appeal to the functional central limit theorem. Several studies in econometrics literature such as Park and Phillips (1999), Davies and Krämer (2003), and de Jong (2010) have attempted to characterize the limit properties of (1.1), but their results seem to be of no use or restrictive. I derive a nondegenerate limit distribution result for (1.1) in a general context. The only assumption imposed on xt is that it satisfies an invariance principle, and not only do I consider (1.1) but also ∑nt=1 exp( f ( xt )) for some nonlinear transformation f (·). The limit distribution result for (1.1) established in Chapter 2, an abridged earlier version of which can be found in Lee and de Jong (2008), does not require the exact order of magnitude of (1.1), yet it shows that the convergence rate of (1.1) depends on the extreme sample path realization as well as the sample 2
size. Furthermore, it is shown that the resulting limit distribution is characterized by the absolute value of standard normal random variable up to a constant, so that the established limit distribution result should be readily applicable to statistical testing in practice. In Chapter 3, I further develop the result and methodologies explored in Chapter 2 to provide a practical guidance as to cointegration specification between in levels and in logarithms. Although many time series in macroeconomics and finance are often assumed to have a unit root in their logarithms, for some variables such as stock prices, dividends, and interest rates, it is not a priori clear, given that one suspects a unit root, whether this unit root is in the levels or in the logarithms. Consequently, in applied work, such variables are sometimes modeled in logarithms and sometimes in levels. Specifically, I consider situations where a pair of time series appears to be cointegrated both in levels and in logarithms, and thus in applied work, cointegrating regression is being specified in both manners. For example, in some studies the present value model of stock prices is tested using cointegrating regression of real stock prices on real dividends in levels, but in some other studies it is tested using cointegrating regression of the logarithms of real stock prices on the logarithms of real dividends; see, e.g., Cochrane and Sbordone (1988) and Ogaki et al. (2007). Another example can be found in the specification of long-run money demand function. In some studies such as Stock and Watson (1993) and Ball (2001), the long-run money demand function is estimated by cointegrating regression of log real money balance on the nominal interest rate in levels; in some other studies such as Bae and de Jong (2007) and Bae et al. (2006), however, it is estimated by cointegrating regression of log real money balance on the 3
logarithm of the nominal interest rate. To provide a relevant theoretical guidance in these situations as to the choice between cointegration in logarithms and cointegration in levels, I develop testing procedures helpful in deciding whether one should take logarithms or not. To test for cointegration in logarithms, I propose test statistics based on the slope coefficient and the residual sum of squares from a regression in levels. Then by construction the test statistics will involve sums of the exponential of an integrated series. By further developing the result established in Chapter 2 and that of Davies and Krämer (2003), I derive the asymptotic distributions of the proposed test statistics under the assumption that cointegration in logarithms is the true data-generating process. I also examine the finite sample performance of the test statistics via Monte Carlo experiments. In empirical applications, I apply the proposed tests to cointegrating regressions of real stock prices on real dividends using U.K. and U.S. data, and to cointegrating regressions of the real money balance on the nominal interest rate using data from the U.S. and Japan. The test results suggest that, regardless of the data source, the cointegration in logarithms between real stock prices and real dividends is not well supported in the data, while there is no significant evidence against the specification of long-run money demand function using the logarithm of the nominal interest rate. In Chapter 4, I consider a separate issue on GMM estimation. The GMM proposed by Hansen (1982) is now one of the most widely used estimation methods in empirical economics and finance. The GMM framework is based on the population moment conditions and presumes that all the moment conditions given are correctly specified, or the population orthogonality condition is a priori assumed to hold. However, empirical researchers using GMM estimation often find 4
that the J-test of overidentifying restrictions rejects the null hypothesis, indicating that not all moment conditions are correct. If some of the moment conditions are misspecified in GMM estimation, the standard GMM estimator is not consistent, and the subsequent inference is misleading. To address this problem, I propose a GMM-type estimator that is robust against the presence of misspecified moment conditions. To achieve this robustness, I formulate a nonstandard criterion function as a squared weighted L2 -norm of a moment vector minimized over the space of moment selection vectors, and then propose a robust estimator defined by the minimizer of the objective function. This approach uses the same criterion function to select moment conditions and to estimate parameters, and thus, as opposed to the earlier literature on moment selection such as Andrews (1999), Andrews and Lu (2001), and Hong et al. (2003), it does not require any pre-selection procedures for choosing correct moment conditions. I establish that the resulting estimator is √ n-consistent and the limit distribution is characterized by the argmin of a certain random limit function that does not depend on any incorrect moment conditions. I illustrate the proposed robust estimation framework with an example of a simple linear IV model and evaluate its performance via a simulation study.
5
CHAPTER 2
EXPONENTIAL OF INTEGRATED SERIES
2.1
Introduction
A limit theory for nonlinear time series models with stochastic processes generated from nonlinear transformations of integrated series has been first developed by Park and Phillips (1999). They consider three classes of nonlinear transformations — integrable, asymptotically homogeneous, and explosive functions — of integrated series. For the first two classes, they fairly well establish the limit distribution results and, moreover, their results have been extended by several subsequent studies; see, e.g., de Jong (2004), Pötscher (2004), and de Jong and Wang (2005). But, for the exponential function, the representative function in the class of explosive transformations, Park and Phillips (1999) failed in obtaining a nondegenerate limit distribution for sample moments of the exponential of an integrated series and only a few results are known from the literature in a limited context. The exponential of an integrated series is, however, an interesting process in practice, in that it describes a time series in levels when the logarithm of the series is an integrated series,1 which is actually often assumed for many time series of 1 Note
that, unlike the exponential of an integrated series, the logarithm of an integrated series cannot be defined since the integrated series will eventually take negative values.
6
economic interest. In this chapter I derive a limit distribution for sample moments of stochastic processes generated from the exponential of an integrated series, and the result will be exploited in the next chapter to investigate a cointegration model with the series specified in levels when their logarithms are in fact integrated series. Let xt denote an integrated series that satisfies an invariance principle or the functional central limit theorem n−1/2 x[nr] ⇒ Ω1/2 W (r ), where xt ∈ Rk , [nr ] with r ∈ [0, 1] denotes the largest integer not exceeding nr, “⇒” signifies weak convergence, W (·) denotes a k-dimensional vector standard Brownian motion on [0, 1], and Ω is a positive definite k ×k matrix. Let X (r ) = Ω1/2 W (r ), that is, X (·) is a k-vector Brownian motion with variance matrix Ω. Here note that Ω has only an unimportant scaling effect in the subsequent analysis. It is well known that the sample average of functions of normalized integrated processes with a scaling factor of n1/2 , viz. n−1 ∑nt=1 T (n−1/2 xt ), satisfies n
−1
n
∑ T (n
−1/2
d
xt ) −→
t =1
Z 1 0
T ( X (r )) dr
in the case where the function T (·) is continuous on Rk . This result follows from the continuous mapping theorem. Park and Phillips (1999) show that this result holds for regular transformations (Park and Phillips, 1999, Definition 3.1) or locally integrable and Lipschitz continuous functions. de Jong (2004) and Pötscher (2004) explore such results in settings where T (·) is no longer continuous. de Jong (2004) 7
allows finitely many poles in T (·), which yet need to be continuous and monotone between poles and locally integrable. Pötscher (2004) establish the above result under the minimal condition that T (·) is locally integrable. For objects of the form cn n
n
−1
∑ T ( xt )
t =1
for some rate cn , i.e., the case where the integrated series xt is not normalized with a scaling factor of n1/2 , several results are known from the work of Park and Phillips (1999) and de Jong and Wang (2005). In the case where T (·) is absolutely integrable, under some regularity conditions, n−1/2
n
d
∑ T (xt ) −→ L(1, 0)
t =1
Z ∞ −∞
T (s) ds,
where L(·, ·) is the local time of Brownian motion on [0, 1]. Also, if T (λx ) ≈ κ (λ)h( x ) in the sense that is made precise in Park and Phillips (1999, Definition 4.2) or de Jong and Wang (2005), we have κ (n
1/2 −1 −1
)
n
n
∑ T ( xt )
t =1
d
−→
Z 1 0
h( X (r )) dr =
Z ∞ −∞
h(s) L(1, s) ds.
These results are the basis for a number of papers, of which Park and Phillips (2000, 2001) are probably the most impressive. For the case of the exponential function T ( x ) = exp( x ) or similar explosive functions, however, it is not obvious to obtain the similar results as above because we cannot straightforwardly normalize the integrated series with a sample size and the convergence rate depends on the extreme sample path realization as well as the sample size. In Section 2.2 I briefly review a few related studies that attempted to characterize the limit properties of sample moments of the exponential of an integrated series. I derive a limit characterization for sums or averages of 8
exponential-type functionals of an integrated process in Section 2.3. In Section 2.4 I provide some examples and discuss the applicability and limitation of the analytical result established in the previous section. Section 2.5 concludes. All mathematical proofs are relegated to Appendix A.
2.2
Related studies
In econometrics literature, Park and Phillips (1999), Davies and Krämer (2003), Lee and de Jong (2008), and de Jong (2010) have attempted to characterize the limit properties of sums or sample moments of the exponential of an integrated process. In this section I briefly review the results from these studies other than Lee and de Jong (2008), the main result of which will be given in the following section. Let xt be an integrated series or a unit root process. When ∆xt follows a stationary linear process that satisfies some regularity conditions, Park and Phillips (1999, Theorem 5.5) show that n
−1/2
n
∑ exp
t =1
xt − max xt 1≤ t ≤ n
d
−→ L(1, smax ),
(2.1)
where smax = max0≤r≤1 X (r ). But, the interpretation of Brownian local time as an occupation time density and the continuity of L(t, s) make it clear that L(1, smax ) = 0. This implies that the limit distribution established in (2.1) is degenerate, so that the result is of no use for statistical testing.2 For an i.i.d. ∆xt with mean zero and finite third moment, Davies and Krämer (2003, Theorem 1) show that n
sup E ∑ exp xt − max xt n ≥1
2 In
than
1≤ t ≤ n
t =1
< ∞,
fact, the only implication from the result in (2.1) is that n1/2 exp(max1≤t≤n xt ) grows faster
∑nt=1 exp( xt ).
9
implying ∑nt=1 exp( xt − max1≤t≤n xt ) = O p (1). Based on this result, Davies and Krämer (2003) obtain some limit results for the Dickey-Fuller test statistic when the data are generated by an exponential random walk exp( xt ) with xt following a simple random walk. For ∆xt that satisfy the linear process conditions of Phillips and Solo (1992), de Jong (2010, Theorem 1) shows that n
∑ exp
xt − max x˜ t 1≤ t ≤ n
t =1
converges in distribution using a Laplace transform argument, where x˜ t denotes the Beveridge-Nelson approximation to xt ; this result is however not amenable to being used for statistical testing in practice, because some resampling methods are needed to obtain the limit distribution and the x˜ t is not readily available from the data.
2.3
Main result
In this section I characterize a nondegenerate limit distribution result for sample moments of the exponential functional of an integrated series that can be readily applicable to statistical testing. The result established in this section will be exploited and further developed in the next chapter. Consider the asymptotic behavior of the average of exponential functionals of a possibly nonlinear transformation of an integrated time series xt . Let f : Rk → R be a Borel measurable transformation. The following lemma is key to the main result. Lemma 2.1. Let an > 0 be a scaling factor such that an log n = o (1). Then as n → ∞ d
an max f ( xt ) −→ Y 1≤ t ≤ n
10
(2.2)
if and only if !
n
an log
∑ exp
f ( xt )
d
−→ Y
t =1
for some random variable Y. Note that the summation can also be replaced by the average in the above lemma, given the assumption that an log n = o (1); that is, an log ∑nt=1 exp f ( xt ) and an log n−1 ∑nt=1 exp f ( xt ) are asymptotically equivalent. Next, I set forth conditions on f (·) that guarantee the convergence condition of Equation (2.2). Assumption 2.1. The function f (·) satisfies f (λx ) = κ (λ)h( x ) + r ( x, λ),
(2.3)
where (a) h(·) is continuous on Rk , or xt ∈ R and h(·) is monotone3 on R; (b) Either 1. |r ( x, λ)| ≤ cν(λ) g( x ) for all λ sufficiently large and for all x over any compact set C, where c is a constant which may depend on C, ν(λ)/κ (λ) → 0 as λ → ∞, and g(·) is bounded on C; or 2. |r ( x, λ)| ≤ cν(λ) g(λx ), ν(λ)/κ (λ) → 0 as λ → ∞, and supx∈R | g( x )| < ∞. For a function f (·) that satisfies Assumption 2.1, we may write f ( xt ) as f ( xt ) = κ (n1/2 )h(n−1/2 xt ) + r (n−1/2 xt , n1/2 ),
(2.4)
where n1/2 corresponds to λ in (2.3) of Assumption 2.1. Obviously, for f ( x ) = x, we can set κ (λ) = λ, h( x ) = x, and r ( x, λ) = 0. Assumption 2.1 (b) is similar to the 3A
function is said to be monotone if it is nondecreasing or nonincreasing.
11
“asymptotically homogeneous” assumption of Park and Phillips (1999, Definition 4.2). Now I present the main result in the following theorem: Theorem 2.1. Let f (·) given by (2.4) satisfy Assumption 2.1 and assume n−1/2 x[nr] ⇒ X (r ), where xt ∈ Rk . Then if log n/κ (n1/2 ) → 0 as n → ∞, we have ! n 1 d log ∑ exp f ( xt ) −→ sup h X (r ) . 1/2 κ (n ) r ∈[0,1] t =1
(2.5)
Remark. Note that analogous results to the one above can also be derived in cases where a rescaled version of x[nr] converges weakly to a limit process different from Brownian motion. For example, if xt is demeaned and n−1/2 x[nr] converges to Brownian bridge, or if a rescaled version of x[nr] converges weakly to a Lévy process or a fractional Brownian motion process, deriving a result analogous to Theorem 2.1 should be straightforward, following the lines of proof in Appendix A.
2.4
Examples and discussion
2.4.1
The exponential function
If f : R → R is set to an identity function, viz. f ( x ) = x, the following result is immediately obtained from Theorem 2.1: Corollary 2.1. Let xt ∈ R be an integrated series that satisfies n−1/2 x[nr] ⇒ X (r ). Then we have n−1/2 log
n
∑ exp(xt )
t =1
!
d
−→ sup X (r ). r ∈[0,1]
Note that we can further characterize the limiting random variable supr∈[0,1] X (r ) in terms of the standard normal random variable. To see this, observe first that we 12
can write X (r ) = λW (r ), where λ =
p
limn→∞ E(n−1/2 xn )2 and W is the standard Brownian motion on
[0, 1]. Next, it follows from Borodin and Salminen (1996, p. 125) that4 ! √ √ P sup W (r ) ≤ z = 1 − erfc(z/ 2) = erf(z/ 2), z ≥ 0, r ∈[0,1]
and it is easy to show that5
√ erf(z/ 2) = 2Φ(z) − 1,
z ≥ 0,
where Φ(·) denotes the cdf of standard normal random variable, say Z ∼ N (0, 1). But 2Φ(z) − 1 is the the cdf of | Z |. Therefore, we have d
sup W (r ) = = | Z |, where Z ∼ N (0, 1).
(2.6)
r ∈[0,1]
Table 2.1 gives critical values of | Z |. The basic result in Corollary 2.1 will be further developed in Chapter 3; see Corollary B.2. While the identity function f ( x ) = x corresponds to a trivial case with r ( x, λ) = 0 in (2.3) in Assumption 2.1, the example in the following subsection considers the case with nonzero r ( x, λ). distribution function of supr∈[0,1] W (r ), i.e., P supr∈[0,1] W (r ) ≤ z = 2Φ(z) − 1, can also be obtained by using the so-called reflection principle; see, e.g., Taylor and Karlin (1998, pp. 491–494). 4 The
5 Observe
that
√
2 erf(z/ 2) := √ π Z z
Z z/√2 0
2
e−u du
2 1 √ e−t /2 dt = 2 2π
Z
z
2 1 √ e−t /2 dt − 2π
Z 0
2 1 √ e−t /2 dt = 2 −∞ −∞ 0 2π √ where the second equality simply follows from the change of variable u = t/ 2.
13
= 2Φ(z) − 1,
α
.005
.01
.025
.05
.10
.90
.95
.975
.99
.995
zα
0.0063 0.0125 0.0313 0.0627 0.1257 1.6449 1.9600 2.2414 2.5758 2.8070
Note: P(supr∈[0,1] W (r ) ≤ zα ) = P(| Z | ≤ zα ) = α, where Z ∼ N (0, 1).
Table 2.1: 100α-percentiles of the distribution of supr∈[0,1] W (r )
2.4.2
An exponential-type functional
Consider f ( x ) = xF ( x ), where F (·) stands for a distribution function. We can write f (λx ) = λxI ( x > 0) + λx F (λx ) − I (λx > 0) , implying that we can set κ (λ) = λ and h( x ) = xI ( x > 0) and that
|r ( x, λ)| ≤ g(λx ) with g( x ) = | x || F ( x ) − I ( x > 0)|. Since xI ( x > 0) is nondecreasing and g(·) is bounded, the condition (b)-2 of Assumption 2.1 holds. Therefore, by Theorem 2.1 ! n d n−1/2 log ∑ exp xt F ( xt ) −→ sup X (r ) I ( X (r ) > 0). r ∈[0,1]
t =1
2.4.3
Discussion
While the above result opens up a new function class for which the limit behavior of sums of functions of an integrated series is well established, the exact order 14
of magnitude of sums of the exponential of an integrated series is not required to obtain the result. Consequently, the result established in Theorem 2.1 or Corollary 2.1 may not be satisfactory in some cases. For example, suppose we wish to investigate the behavior of the Dickey-Fuller test statistic when the data yt are better described by an exponential random walk, say yt = exp( xt ) with ∆xt iid. To be specific, consider the case where the Dickey-Fuller test is applied to an exponential of random walk and the test is based on a regression of exp( xt ) on exp( xt−1 ) with no intercept. Then the least squares coefficient, say ρˆ n , from this regression is given by ρˆ n =
∑nt=1 exp(2xt−1 ) exp(∆xt ) . ∑nt=1 exp(2xt−1 )
By Corollary 2.1 and along the lines of proof of Theorem 2.1, it can be shown that6 p
n−1/2 log ρˆ n −→ 0,
(2.7)
which is, however, of no help in analyzing the Dickey-Fuller statistic since the limit distribution is degenerate. To obtain a nondegenerate result in this case, the exact order of magnitude of ∑nt=1 exp( xt ) seems to be required and it has been proved by Davies and Krämer (2003, Theorem 1) that n
∑ exp
t =1
xt − max xt 1≤ t ≤ n
= O p (1),
that is, the convergence rate of ∑nt=1 exp( xt ) depends on the extreme sample path realization. Based on this observation Davies and Krämer (2003, Theorem 2) show a O p (1) property for ρˆ n .7 Given this result, the result in (2.7) should be a trivial 6 Likewise,
n−1/2 log tˆn
p
we can obtain the similar result for the Dickey-Fuller t-statistic, say tˆn ; that is,
−→ 0.
xt follows a “simple” random walk with increments ∆xt ∈ {−1, 1} for all t, Davies and Krämer (2003, Theorem 3) further show that plimn→∞ n−1 DFn exists and it can 7 In the particular case where
15
consequence. In the next chapter, however, the result established in Theorem 2.1 or Corollary 2.1 will be useful in analyzing cointegrating regression in which the series are specified in levels when their logarithms are in fact integrated series.
2.5
Concluding remarks
While well-established limit distribution results for several function classes are available, only a few results for the exponential have been previously obtained. In this chapter I derive a limit characterization for averages of exponential-type functionals of integrated series in a general context in the sense that (i) the only assumption imposed on the integrated series is that it satisfies a certain invariance principle and (ii) I consider not just the exponential of an integrated series but also the exponential functional of some nonlinear transformations of an integrated series. The result established in Corollary 2.1 of this chapter will be further developed in the next chapter and utilized in characterizing the asymptotic behavior of several statistics obtained in the context of cointegrating regression.
be evaluated, where DFn = n(ρˆ n − 1) is the Dickey-Fuller statistic. de Jong (2010) shows that ρˆ n converges in distribution even in the setting where ∆xt follows a linear process that satisfies the conditions of Phillips and Solo (1992); but his result may not be amenable to statistical inference since the limit distribution is not pivotal.
16
CHAPTER 3
COINTEGRATION IN LOGARITHMS VERSUS COINTEGRATION IN LEVELS
3.1
Introduction
The choice between a specification in levels and one in logarithms is an important issue in applied econometrics. It is often assumed that many time series of our interest in macroeconomics and finance have a unit root in their logarithms; see, e.g., Nelson and Plosser (1982).8 However, for some time series data such as stock prices, dividends, and interest rates, it is not a priori clear, given that one suspects a unit root, whether this unit root is in the levels or in the logarithms. Consequently, in applied work, such variables are sometimes modeled in logarithms and sometimes in levels. Therefore, finding an appropriate transformation of these series or deciding whether one should take logarithms is an important issue. Some previous studies in the literature investigate this issue for a univariate time series in the context of tests for a unit root. They note that the (augmented) Dickey-Fuller 8 Nelson and Plosser (1982) test for a unit root in fourteen U.S. historical time series data—real GNP, nominal GNP, real per capita GNP, industrial production, employment, unemployment rate, GNP deflator, CPI, nominal wages, real wages, M2, velocity, nominal interest rate, stock prices— with all series but the nominal interest rate being transformed to natural logarithms, assuming a unit root is in the logarithms of the series.
17
test is sensitive to nonlinear transformations of integrated series. Granger and Hallman (1991), Ermini and Granger (1993), and Corradi (1995) examine whether the long memory property of an integrated series is preserved under some nonlinear transformations. Granger and Hallman (1991), Burridge and Guerre (1996), and Breitung and Gouriéroux (1997) propose unit root tests invariant to monotonic transformations. Franses and McAleer (1998), Franses and Koop (1998), and Kobayashi and McAleer (1999) investigate the effects of nonlinear transformations on unit root tests in a general Box-Cox framework. Granger and Hallman (1991) and Kramer and Davies (2002) consider the exact null distribution of the DickeyFuller test as applied to a series in levels when the random walk is in the logarithms of the series. Davies and Krämer (2003) examine the asymptotic behavior of the Dickey-Fuller test statistic when the test is applied to a series in levels but its logarithms in fact follows a random walk. Ermini and Hendry (2008) and Spanos et al. (2008) investigate the linear vs. log-linear unit root models in the encompassing framework. Obviously if a time series is an integrated series in its logarithms, then the series in levels should be described by the exponential of an integrated series. Thus, a rigorous study of an econometric model with the series in levels requires analytical tools that enable one to handle the exponential of an integrated series should the logarithm of the variable be an integrated series. The previous studies above, except Davies and Krämer (2003), do not analyze the series in levels as the exponential of an integrated series, largely due to the lack of relevant analytical tools, when its logarithm is assumed to be an integrated series. However some analytical
18
tools have been developed in recent econometrics literature that enable us to investigate the exponential of an integrated series in a rigorous manner; see Sections 2.2 and 2.3 of the previous chapter. In this chapter I address a controversial issue regarding the choice between a specification in levels and a specification in logarithms in the cointegrating regression context. I consider situations where a pair of time series appears to be cointegrated both in levels and in logarithms; thus, in applied work, cointegrating regression is being specified in both manners. Suppose that some applied researchers believe that a pair of time series is cointegrated with both or one of the series specified in levels, while others believe that they are cointegrated in logarithms. For example, some researchers test the present value model of stock prices based on the cointegrating regression of real stock prices on real dividends in levels. However, in some other studies, it is tested based on the cointegrating regression of the logarithms of real stock prices on the logarithms of real dividends, motivated by an observation that the dividend-price ratio has been stable over a long period of time. See, e.g., Cochrane and Sbordone (1988) and Ogaki et al. (2007) for details. Another example can be found in the specification of long-run money demand function. It is believed by many researchers that the money demand function is stable in the long run and the nominal interest rate is an integrated series in its levels.9 Under these assumptions, the long-run money demand function has been estimated by cointegrating regression of log real money balance on the nominal 9 Although
the data do not clearly tell us whether a unit root is in the levels or in the logarithms of the nominal interest rate, it seems that in most literature the levels, rather than the logarithms, of the nominal interest rate is assumed to be an I(1) series. However, Hoffman and Rasche (1991) estimate the U.S. money demand function based on Johansen (1988) approach, assuming the logarithms of the nominal interest rate is I(1). Also, Kobayashi and McAleer (1999) argue that a unit root is in the logarithms, rather than in the levels, of the nominal interest rate.
19
interest rate in levels; see, e.g., Stock and Watson (1993) and Ball (2001). In some other studies, however, it has also been estimated by cointegrating regression of log real money balance on the logarithms of the nominal interest rate to account for the liquidity trap and the nonlinear relationship between the log real money balance and the nominal interest rate; see Bae and de Jong (2007) and Bae et al. (2006). Although specifications both in levels and in logarithms are being used in applied work as noted above, theoretical guidance is lacking in the existing literature as to which one should be used. Thus, as an attempt to provide a relevant theoretical guideline in such a situation, I develop testing procedures helpful in deciding whether one should take logarithms or not. For testing purpose, I set a bivariate cointegration model specified in logarithms as a null hypothesis. To be specific, consider the following cointegrating relationship between a pair of integrated series yt and xt : yt = c + θxt + ut ,
(3.1)
where c is a constant (possibly 0), θ 6= 0, yt and xt are the logarithms of the series under consideration, and the error ut are strictly stationary. I will call the model (3.1) cointegration in logarithms and assume it to be a true data-generating process or the correct specification for cointegrating regression. Now suppose that we run a regression in levels to estimate the cointegrating relation of those series. To be specific, I consider the following regressions in which both or one of the series under consideration is specified in levels: exp(yt ) = αˆ n + βˆ n exp( xt ) + residualt ,
20
(3.2)
or yt = α˜ n + β˜ n exp( xt ) + residualt ,
(3.3)
t = 1, . . . , n. Here the exp(yt ) and exp( xt ) should be understood as the series in levels since the logarithms of the series are denoted by yt and xt in model (3.1). It is clear from this expression that the series in levels can be described by the exponential of an integrated series should the logarithms of the series be an integrated series. The regression of real stock prices on real dividends in levels, as opposed to the regression in logarithms, can be thought of as an example of (3.2), while the regression of log real money balance on the nominal interest rate in levels, rather than on the logarithms of the nominal interest rate, can be considered as an example of (3.3). To test for the null of cointegration in logarithms, I propose test statistics based on the slope coefficient and the residual sum of squares from a regression in levels of type (3.2) or (3.3). Note that it is not straightforward to analyze these test statistics since they involve sums of the exponential of an integrated process, to which the functional central limit theorem cannot be readily applicable. However, by extending the result we established earlier in the previous chapter and that of Davies and Krämer (2003), I will show that the asymptotic null distributions of the proposed test statistics are characterized by functions of the Brownian motion process. In empirical applications, the proposed tests suggest that the cointegration in logarithms between real stock prices and real dividends is not well supported in the data, while there is no statistically significant evidence against the specification of long-run money demand function using the logarithms of the nominal interest rate. 21
The remainder of this chapter is organized as follows. In Section 3.2, to test for cointegration in logarithms, I specify a set of assumptions on the null model, propose test statistics from a regression in levels, and then establish the asymptotic null distributions of our test statistics. In Section 3.3 the finite sample performance of the proposed test statistics is examined via Monte Carlo experiments. In Section 3.4 I apply my testing procedures to cointegrating regressions of real stock prices on real dividends using U.K. and U.S. data, and to cointegrating regressions of the real money balance on the nominal interest rate using data from the U.S. and Japan. Section 3.5 concludes. All proofs are collected in Appendix B.
3.2
Testing for cointegration in logarithms
In this section I propose test statistics from a regression in levels and establish their asymptotic distributions under the null of cointegration in logarithms. Under the null, the test statistics will involve sums of the exponential of an integrated series. Some lemmas useful in characterizing the asymptotic behavior of the proposed test statistics can be found in Appendix B.
3.2.1
Assumptions
Throughout this chapter I maintain the following assumptions. Consider a bivariate cointegration model yt = c + θxt + ut ,
22
(3.1)
where c is a constant (possibly 0), θ 6= 0,10 and xt , ∆xt , ut , and (ut , ∆xt ) satisfy Assumptions 3.1, 3.2, 3.3, and 3.4 below, respectively: Assumption 3.1. The series xt , t = 0, 1, . . . , n, is generated by x t = x t −1 + w t ,
(3.4)
where x0 = O p (1) is arbitrary. Assumption 3.2. The innovation sequence wt in (3.4) follows the stationary linear process ∞
w t = ε t + ψ ( L ) ηt = ε t + ∑ ψ j ηt − j ,
(3.5)
j =0
where (ε t , ηt ) are iid with mean zero, E|ε t
|p
< ∞ for some p > 3, ηt bounded, and
∑∞ j=0 j | ψ j | < ∞. Assumption 3.3. The error ut in (3.1) is a strictly stationary process such that Eut = 0, E|ut | p < ∞ for some p > 3, supn≥1 P(|ut | > K | w1 , . . . , wn ) ≤ f (K ) with f (K ) → 0 as K → ∞, and the conditional density of ∆ut given w1 , . . . , wn is bounded. Assumption 3.4. Suppose that11
ut wt
= Ψ∗ ( L)ε∗t ,
(3.6)
10 If θ = 0, then it is a well-known situation called a spurious regression relationship, where, although two I(1) variables are independent of each other, the OLS estimate of θ does not provide a consistent estimate of true parameter value, and the t and F statistics associated with the OLS estimate get arbitrarily large as the sample size increases. If this is the case, i.e., θ = 0, and if we regress exp(yt ) on exp( xt ), instead of regressing yt on xt , then it can be shown that d n−1/2 log βˆ n −→ σy sup W1 (r ) − σx sup W2 (r ), where βˆ n is the OLS estimator of the slope r ∈[0,1]
r ∈[0,1]
coefficient, σx2 and σy2 are the long-run variances of ∆xt and ∆yt , respectively, and W1 (r ) and W2 (r ) are standard Brownian motions independent of each other. However, I will not pursue this issue any further here. (3.6) seems to be a general structure for (ut , wt )0 that is often employed in the literature, as it allows the correlation between ut and ∆xt as well as the serial correlation in (ut , wt ); see Hamilton (1994, Chapter 19). 11 Equation
23
where ε∗t = (ε t , ηt )0 are i.i.d. with ε t and ηt as in Assumption 3.2. Suppose further that the sequence of matrices {Ψ∗s }s≥0 is one-summable and that the rows of Ψ∗ (1) are linearly independent. Note that Assumptions 3.2 and 3.4 imply that the matrix moving average operator Ψ∗ ( L) in (3.6) is of the form Ψ∗ ( L) =
ψ∗ ( L) 1 1
ψ2∗ ( L)
(3.7)
ψ( L)
∗} ∗ with {ψ1j j≥0 , { ψ2j } j≥0 , and { ψ j } j≥0 one-summable.
Note also that Assumption 3.2 allows wt to be a (weakly) dependent process through the ψ( L)ηt term12 and wt is not restricted to be bounded due to the presence of ε t . Under Assumptions 3.1 and 3.2, it follows from the Beveridge-Nelson decomposition (Beveridge and Nelson, 1981; Phillips and Solo, 1992) and the functional central limit theorem that n
−1/2
[nr ]
x[nr]
1 = √ ∑ wt ⇒ λW (r ), n t =1
where [nr ] denotes the largest integer not exceeding nr, r ∈ [0, 1], “ ⇒” signifies weak convergence, W denotes the standard linear Brownian motion process on √ [0, 1], and λ = λ2 with λ2 the long-run variance of wt defined as13 ! !2 λ2 = lim Var n−1/2 n→∞
n
∑ wt
t =1
= lim n−1 E n→∞
n
∑ wt
.
t =1
Since λ or λ2 will enter into the asymptotic null distributions of our test statistics (see Section 3.2.2 below), it needs to be consistently estimated to eliminate the 12 One may consider an AR(1) process w ˜
˜ t−1 + ηt with |φ| < 1 and the innovation sequence t = φw j ηt drawn from a certain truncated distribution as an example of ψ( L)ηt = ∑∞ j=0 ψ j ηt− j with ψ j = φ . 13 Under
Assumption 3.2, the long-run variance of wt would be given by λ2 = σε2 + ση2 [ψ(1)]2 + σεη ψ(1), where σε2 = Eε2t , ση2 = Eηt2 , and σεη = cov(ε t , ηt ). If ε t = 0 ∀t, then we may need an additional condition that ψ(1) 6= 0 to ensure the positivity of the long-run variance of wt .
24
nuisance parameter asymptotically. A consistent estimator of λ2 , say λˆ 2n , can be obtained by a kernel-based heteroscedasticity and autocorrelation consistent (HAC) estimator λˆ 2n = γˆ 0 + 2
n −1
∑ k( j/bn )γˆ j ,
(3.8)
j =1
where γˆ j = n−1 ∑nt= j+1 wˆ t wˆ t− j is an estimate of γ j , the jth autocovariance of wt , with wˆ t the residuals from a regression of xt on an intercept and xt−1 , as in Phillips and Perron (1988), k (·) is the kernel function, and bn is the bandwidth parameter. It is known that λˆ 2n given by (3.8) is a consistent estimator of λ2 if k (·) and bn satisfy the following conditions. Assumption 3.5. The kernel function k (·) is continuous in the neighborhood of zero, R∞ k (0) = 1, k(s) = k (−s) for all s > 0, |k (·)| is bounded, −∞ |k (s)| ds < ∞, and the bandwidth sequence bn is such that bn−1 + bn /n → 0 as n → ∞. I will use the Bartlett kernel k (s) = (1 − |s|) I (|s| ≤ 1) and the data-dependent bandwidth bn selected by the Newey-West (1994) automatic criterion in our simulation study (see Section 3.3) and empirical applications (see Section 3.4).
3.2.2
Test statistics and asymptotic distributions
I propose test statistics based on the slope coefficient estimate and the residual sum of squares from a regression in levels and establish their asymptotic null distributions under the assumptions specified in Section 3.2.1. I first characterize the limit behavior of our first test statistic, which is based on the OLS estimator of slope coefficient from regression (3.2).
25
Theorem 3.1. Let βˆ n denote the OLS estimator of the slope coefficient from a regression of exp(yt ) on exp( xt ) with or without an intercept term. Then, under the null, if θ > 0, we have d n−1/2 log βˆ n −→ (θ − 1)λ sup W (r ). r ∈[0,1]
Observe that the asymptotic null distribution of this statistic depends on the nuisance parameters θ and λ2 . But it is well known that, under the null, θ can be super-consistently estimated by the OLS estimator of slope coefficient, say θˆn , from a regression of yt on an intercept and xt ; see, e.g., Stock (1987) and Watson (1994). Also, we can consistently estimate λ2 using a kernel-based HAC estimator λˆ 2n given by (3.8). Thus, division of n−1/2 log βˆ n by (θˆn − 1)λˆ n gives an asymptotic distribution free of nuisance parameters; that is, n−1/2 log βˆ n d −→ sup W (r ). (θˆn − 1)λˆ n r ∈[0,1] But we know from (2.6) in Chapter 2 that d
sup W (r ) = = | Z |, where Z ∼ N (0, 1),
(3.9)
r ∈[0,1]
and the critical values of this pivotal limit distribution are given in Table 2.1. A disadvantage of n−1/2 log βˆ n as a test statistic is, however, that its asymptotic null distribution becomes degenerate when θ = 1. Next, I characterize the limit behavior of our second test statistic, which is based on the residual sum of squares from regression (3.2). Theorem 3.2. Let Qn denote the residual sum of squares from a regression of exp(yt ) on exp( xt ) with or without an intercept term. Then, under the null, if θ > 0, we have d
n−1/2 log Qn −→ 2θλ sup W (r ). r ∈[0,1]
26
Observe that the asymptotic null distribution of this statistic is nondegenerate for all θ > 0. As noted above, we divide n−1/2 log Qn by 2θˆn λˆ n to get the pivotal test statistic. Then we have n−1/2 log Qn d −→ | Z |. 2θˆn λˆ n Note that, if we combine the statistics proposed in Theorems 3.1 and 3.2 as in the next theorem, we have a statistic whose asymptotic null distribution is free of θ, the true value of slope coefficient from cointegrating regression in logarithms: Theorem 3.3. Define Tn = βˆ n /
p
Qn ,
where βˆ n and Qn denote respectively the OLS estimator of slope coefficient and the residual sum of squares from a regression of exp(yt ) on exp( xt ) with or without an intercept term. Then, under the null, if θ > 0, we have d
n−1/2 log Tn −→ − λ sup W (r ). r ∈[0,1]
As previously, a pivotal statistic can be obtained by dividing n−1/2 log Tn by a consistent estimate of λ: d
−n−1/2 log Tn /λˆ n −→ | Z |. We may replace the statistic Tn in Theorem 3.3 by en = βˆ n / T
q
σˆ v2 ,
where σˆ v2 = Qn /n is the estimate of error variance from regression (3.2). Note en are asymptotically equivalent since n−1/2 log n = that n−1/2 log Tn and n−1/2 log T o (1). 27
The statistics proposed in Theorems 3.1–3.3 are all obtained from regression (3.2) with θ > 0, and they can be used, for example, to test for the cointegration in logarithms between stock prices and dividends, as demonstrated in Section 3.4.1. The statistic proposed in the following theorem is, however, obtained from regression (3.3) without any restrictions on the sign of θ 6= 0, and it can be used, for example, to test for the log-log functional form of long-run money demand function, as demonstrated in Section 3.4.2. e n denote the residual sum of squares from a regression of yt on Theorem 3.4. Let Q exp( xt ). Then, under the null, if an intercept term is not included in the regression, then we have n
−2
e n −d→ θ 2 λ2 Q
Z 1 0
W 2 (r ) dr,
while, if the regression contains an intercept term, then we have d
e n −→ θ 2 λ2 n −2 Q
Z 1 0
W∗2 (r ) dr,
where W∗ is the demeaned standard Brownian motion, i.e., W∗ (r ) = W (r ) −
R1 0
W (s) ds.
e n by θˆn2 λˆ 2n gives an asymptotic distribution free of nuiAs before, dividing n−2 Q sance parameters: en d n −2 Q −→ θˆn2 λˆ 2n
Z 1 0
W 2 (r ) dr or
Z 1 0
W∗2 (r ) dr,
(3.10)
depending on whether regression (3.3) contains an intercept term or not. Remark. So far an alternative hypothesis has not been explicitly specified. Given the setting in this chapter, cointegration in levels might be the alternative hypothesis of interest. However, under the hypothesis of cointegration in levels, theoretically it is intractable to analyze the limiting behavior of the proposed test statistics 28
with a standard set of assumptions. Note that the pivotal test statistics proposed in Theorems 3.1–3.4 contain θˆn and λˆ 2n , both of which involve the logarithmic function of an integrated series under the hypothesis of cointegration in levels. However, if an integrated series is driven by an innovation sequence with mean zero and finite and constant variance as usual, the integrated series will eventually take negative values with probability one. Thereby, the logarithmic function of an integrated series is not even defined and one cannot analyze the asymptotic behaviors of θˆn and λˆ 2n . The proposed test, therefore, should be understood as a generic specification test.
3.3
Monte Carlo experiments
In this section I report the results from a simple Monte Carlo study designed to assess the finite-sample performance of the test statistics proposed in the previous section. I generate a bivariate cointegrated system using a triangular representation. First, an integrated series $x_t$ is generated by
$$x_t = x_{t-1} + w_t,$$
where the innovation sequence $w_t$ follows an AR(1) process $w_t = \phi w_{t-1} + \eta_t$, in which $\phi = 0.5$ and $\eta_t \sim \mathrm{iid}\, N(0, \sigma_\eta^2)$ with $\sigma_\eta = 1/3$. Note that the long-run variance of $w_t$ is then $\lambda^2 = 0.4444$.14

Footnote 14: We can write $w_t = \psi(L)\eta_t = \sum_{j=0}^{\infty}\psi_j\eta_{t-j}$ with $\psi_j = 2^{-j}$, so that $\lambda^2 = \sigma_\eta^2[\psi(1)]^2 = (1/3)^2(1/(1-.5))^2 = 0.4444$.

Next, $y_t$ is generated by
$$y_t = \theta x_t + u_t,$$
where $\theta = 0.75$ and $u_t$ follows an AR(1) process $u_t = \rho u_{t-1} + \zeta_t$, in which $\rho = 0.75$ and $\zeta_t \sim \mathrm{iid}\, N(0, \sigma_\zeta^2)$ with $\sigma_\zeta = 1/4$. Here, as noted before, $x_t$ and $y_t$ are regarded as the series in logarithms. Three sample sizes, $n \in \{150, 300, 500\}$, are considered, and 100,000 replications are carried out for each sample size. In each replication, I run a regression in logarithms (3.1) with an intercept and a regression in levels of type (3.2) with and without an intercept to obtain $\hat\theta_n$, $\hat\beta_n$, $Q_n$, and $T_n$ in the notation of the previous section. Also, a consistent HAC estimator of $\lambda^2$, say $\hat\lambda_n^2$, is obtained as in (3.8) using the Bartlett kernel with the bandwidth parameter selected by the Newey-West (1994) automatic criterion. Then the pivotal test statistics proposed in Theorems 3.1–3.3,
$$\frac{n^{-1/2}\log\hat\beta_n}{(\hat\theta_n - 1)\hat\lambda_n}, \qquad \frac{n^{-1/2}\log Q_n}{2\hat\theta_n\hat\lambda_n}, \qquad\text{and}\qquad -\frac{n^{-1/2}\log T_n}{\hat\lambda_n}, \qquad (3.11)$$
are evaluated. All of these statistics converge in distribution to the limiting random variable given by (3.9), which has distribution function and density function given by $2\Phi(z) - 1$ and $2\phi(z)$, $z \ge 0$, respectively, where $\Phi(\cdot)$ and $\phi(\cdot)$ are the cdf and pdf of a standard normal random variable. Table 2.1 gives critical values of $|N(0,1)|$, and the solid lines in Figures 3.1–3.3 represent its density. The other lines in Figures 3.1–3.3 show the density estimates of the statistics in (3.11) obtained from the simulation for each sample size $n \in \{150, 300, 500\}$; the left panel (a) is for the case where an intercept term is not included in regression (3.2), and the right panel (b) is for the case where regression (3.2) contains an intercept term. We can clearly see that the distribution of each test statistic in (3.11) gets closer to that of (3.9) as the sample size grows.
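To make the design concrete, the following is a minimal Python sketch of the data-generating process and of the first two pivotal statistics in (3.11). It is an illustration under stated simplifications, not the original simulation code: the statistic based on $T_n$ is omitted because $T_n$ is defined earlier in the chapter, a fixed Bartlett bandwidth replaces the Newey-West automatic choice, and all function names are hypothetical.

```python
import numpy as np

def bartlett_lrv(w, bandwidth):
    """Bartlett-kernel (Newey-West) estimate of the long-run variance of w."""
    w = np.asarray(w) - np.mean(w)
    n = len(w)
    lrv = np.mean(w ** 2)
    for j in range(1, bandwidth + 1):
        gamma_j = np.sum(w[j:] * w[:-j]) / n          # j-th autocovariance
        lrv += 2.0 * (1.0 - j / (bandwidth + 1)) * gamma_j
    return lrv

def simulate_system(n, theta=0.75, phi=0.5, rho=0.75,
                    sig_eta=1/3, sig_zeta=1/4, rng=None):
    """Triangular cointegrated system of Section 3.3 (series in logarithms)."""
    if rng is None:
        rng = np.random.default_rng()
    eta, zeta = rng.normal(0, sig_eta, n), rng.normal(0, sig_zeta, n)
    w, u = np.zeros(n), np.zeros(n)
    for t in range(1, n):
        w[t] = phi * w[t - 1] + eta[t]   # AR(1) innovations of x_t
        u[t] = rho * u[t - 1] + zeta[t]  # AR(1) cointegration errors
    x = np.cumsum(w)                     # x_t = x_{t-1} + w_t
    return x, theta * x + u              # y_t = theta x_t + u_t

def ols(yv, X):
    """OLS coefficient vector and residual sum of squares."""
    b, *_ = np.linalg.lstsq(X, yv, rcond=None)
    e = yv - X @ b
    return b, float(e @ e)

n = 500
x, y = simulate_system(n, rng=np.random.default_rng(0))
c = np.ones(n)

(_, theta_hat), _ = ols(y, np.column_stack([c, x]))                   # regression (3.1)
(_, beta_hat), Q_n = ols(np.exp(y), np.column_stack([c, np.exp(x)]))  # regression (3.2)

lam_hat = np.sqrt(bartlett_lrv(np.diff(x), bandwidth=5))
stat1 = np.log(beta_hat) / np.sqrt(n) / ((theta_hat - 1) * lam_hat)
stat2 = np.log(Q_n) / np.sqrt(n) / (2 * theta_hat * lam_hat)
print(stat1, stat2)   # compare with the critical values of |N(0,1)|
```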
Figure 3.1: The estimated density of $n^{-1/2}\log\hat\beta_n/(\hat\theta_n - 1)\hat\lambda_n$; panel (a) without an intercept, panel (b) with an intercept; curves for $n = 150, 300, 500$ and the limiting density ($n = \infty$).
Figure 3.2: The estimated density of $n^{-1/2}\log Q_n/2\hat\theta_n\hat\lambda_n$; panel (a) without an intercept, panel (b) with an intercept; curves for $n = 150, 300, 500$ and the limiting density ($n = \infty$).
Figure 3.3: The estimated density of $-n^{-1/2}\log T_n/\hat\lambda_n$; panel (a) without an intercept, panel (b) with an intercept; curves for $n = 150, 300, 500$ and the limiting density ($n = \infty$).
It appears that the limiting distribution (3.9) approximates the distributions of the test statistics in (3.11) quite well even in finite samples.

Next, I consider the case that applies to the specification problem of the long-run money demand function. The setup of the Monte Carlo experiment is basically the same as above, but now I set $\theta = -0.5$ to reflect the inverse relationship between money demand and the nominal interest rate. As above, in each replication I run a regression in logarithms (3.1) with an intercept and a regression in levels of type (3.3) with and without an intercept to obtain $\hat\theta_n$ and $\tilde Q_n$ in the notation of the previous section. A consistent HAC estimator of $\lambda^2$, say $\hat\lambda_n^2$, is obtained as before. Then the pivotal test statistic proposed in Theorem 3.4,
$$\frac{n^{-2}\tilde Q_n}{\hat\theta_n^2\hat\lambda_n^2}, \qquad (3.12)$$
is evaluated.
α      .005    .01     .025    .05     .10     .90     .95     .975    .99     .995
c_α    0.0290  0.0343  0.0447  0.0566  0.0767  1.1999  1.6577  2.1523  2.7910  3.3142

Note: $P(\int_0^1 W^2(r)\,dr \le c_\alpha) = \alpha$.

Table 3.1: 100α-percentiles of the distribution of $\int_0^1 W^2(r)\,dr$

α      .005    .01     .025    .05     .10     .90     .95     .975    .99     .995
c_α    0.0218  0.0248  0.0303  0.0366  0.0460  0.3479  0.4630  0.5807  0.7445  0.8744

Note: $P(\int_0^1 W_*^2(r)\,dr \le c_\alpha) = \alpha$.

Table 3.2: 100α-percentiles of the distribution of $\int_0^1 W_*^2(r)\,dr$
R1 is evaluated. As shown in (3.10), this statistic converges in distribution to 0 W 2 (r ) dr R1 or 0 W∗2 (r ) dr depending on whether regression (3.3) contains an intercept or not, where W∗ denotes the demeaned Wiener process. R1 R1 Tables 3.1 and 3.2 give critical values of 0 W 2 (r ) dr and 0 W∗2 (r ) dr, respectively, calculated via direct simulations, using a sample size of 1,000,000 and 100,000 replications, and the solid lines in Figure 3.4 represent their densities. The other lines in Figure 3.4 show the density estimates of statistic in (3.12) obtained from a simulation for each sample size n ∈ {150, 300, 500}; the left panel (a) is for the case where an intercept term is not included in regression (3.3) and the right panel (b) is for the case where regression (3.3) contains an intercept term. We can see that, in panel (a), the finite-sample distribution is almost the same as the asymptotic distribution; in panel (b), the distribution of statistic in (3.12) gets closer to its limiting distribution as sample size gets larger. Overall, it appears that the limiting 33
Figure 3.4: The estimated density of $n^{-2}\tilde Q_n/\hat\theta_n^2\hat\lambda_n^2$; panel (a) without an intercept, panel (b) with an intercept; curves for $n = 150, 300, 500$ and the limiting density.
Overall, it appears that the limiting distribution approximates the distribution of the test statistic in (3.12) quite well even in finite samples.
3.4 Applications: empirical examples
In this section I apply the proposed testing procedures to empirical examples in which the cointegrating regression has been specified both in levels and in logarithms in the applied literature. I consider cointegrating regressions of real stock prices on real dividends using U.K. and U.S. data, and cointegrating regressions of the real money balance on the nominal interest rate using data from the U.S. and Japan.
3.4.1 Stock prices and dividends
The standard present value model of stock prices implies that real stock prices and real dividends are cointegrated. However, depending on whether the levels or the logarithms of real dividends are assumed to be difference stationary, the present value model can imply cointegration either in levels or in logarithms; see Cochrane and Sbordone (1988) and Ogaki et al. (2007, Chapter 13). Since the economic theory here, viz. the present value model, provides no guidance for applied researchers as to whether the model should be estimated in levels or in logarithms, I apply the proposed testing procedures to test for cointegration in logarithms.

Present value model of stock prices and cointegration

A generic present value model for two variables $y_t$ and $x_t$ states that $y_t$ is a linear function of the present discounted value of expected future values of $x_t$:
$$y_t = \theta(1-\delta)\sum_{i=0}^{\infty}\delta^i E_t x_{t+i} + c, \qquad (3.13)$$
where $c$, the constant, $\theta$, the coefficient of proportionality, and $\delta$, the discount factor, are parameters that may be known a priori or may need to be estimated, and $E_t = E(\,\cdot\,|\Omega_t)$ with $\Omega_t$ the full market information set, which includes $x_t$ and $y_t$ themselves (Campbell and Shiller, 1987). Models of this form include the expectations theory for interest rates (with $y_t$ the long-term yield and $x_t$ the one-period rate) and, with some modification, the permanent income theory of consumption.

Setting $y_t$ to real stock prices, say $p_t$, and $x_t$ to real dividends, say $d_t$, in (3.13), we have the present value model of stock prices as a special case of (3.13) with $\theta = \delta/(1-\delta)$ and $c = 0$,15 i.e.,
$$p_t = \sum_{i=0}^{\infty}\delta^{i+1} E_t d_{t+i}. \qquad (3.14)$$
The present value model of stock prices implies that real stock prices and real dividends are cointegrated. The exact form of cointegration implied by the model, however, depends on whether the levels or the logarithms of real dividends are assumed to be difference stationary; see Cochrane and Sbordone (1988) and Ogaki et al. (2007, pp. 287–289) for details.

Consider first the case where the real dividend series is assumed to be difference stationary in its levels; that is, suppose $\Delta d_t$ is I(0), as in Campbell and Shiller (1987).16 To see how the present value model implies cointegration of $p_t$ and $d_t$ in this case, define a new variable $s_t$, called the spread, by $s_t = p_t - \theta d_t$. Observe that subtracting $\theta d_t$ from both sides of (3.14) and rearranging yield
$$s_t = \theta\sum_{i=1}^{\infty}\delta^i E_t\,\Delta d_{t+i} \qquad (3.15)$$
and
$$s_t = \theta E_t\,\Delta p_{t+1}. \qquad (3.16)$$
Since $\Delta d_t$ is stationary, it follows from (3.15) that $s_t$ is stationary, which in turn implies from (3.16) that $\Delta p_t$ is stationary; in other words, $p_t$ and $d_t$ are cointegrated of order (1,1) with cointegrating vector $\alpha = (1, -\theta)'$ should the present value model hold.17

Footnote 15: One difficulty with this formulation is that $p_t$ and $d_t$ are not measured contemporaneously: $p_t$ is a beginning-of-period stock price and $d_t$ is paid sometime within period $t$. We assume $d_t$ is known at the end of period $t$.

Footnote 16: This assumption can also be found in Mankiw et al. (1985) and West (1987, 1988).

Footnote 17: Following Engle and Granger (1987), a vector $z_t$ is said to be cointegrated of order $(d, b)$, denoted $z_t \sim CI(d, b)$, if (i) all components of $z_t$ are integrated of order $d$ (i.e., stationary in $d$th differences) and (ii) there exists at least one nonzero vector $\alpha$ such that $\alpha' z_t \sim I(d - b)$, $b > 0$.
The fact that $s_t$, a linear combination of $d_t$ and $p_t$, is stationary, even though $d_t$ and $p_t$ are individually stationary only in first differences, turns out to be important for understanding and testing the present value model.18 To estimate the unknown parameter $\theta$, the cointegrating regression approach proposed by Engle and Granger (1987) is often used.

Next, consider the case where the real dividend series is assumed to be difference stationary in its logarithms; that is, suppose $\Delta\log d_t$ is I(0), as in Campbell and Shiller (1989).19 It can be shown (see, e.g., Ogaki et al. (2007, p. 288)) that, under this assumption, the present value model of stock prices implies that the log price-dividend ratio is stationary, which in turn implies that $\Delta\log p_t$ is I(0); hence, the present value model implies that $\log p_t$ and $\log d_t$ are cointegrated.

Note that the two different specifications of the cointegrating regression would result in different estimates of the cointegrating vector, which in turn may yield different implications for the validity of the present value model of stock prices. Hence, a formal test for choosing between cointegration in levels and cointegration in logarithms is important. In the following subsections, I test the hypothesis that $\log p_t$ and $\log d_t$ are cointegrated (under the assumption that the real dividend series is I(1) in its logarithms) based on the statistics obtained from the regression of $p_t$ on $d_t$ in levels, using U.K. and U.S. data.

Footnote 18: The discount factor $\delta$ is not known a priori but can be inferred by estimating the cointegrating vector. Note that the discount rate $r$ such that $\delta = (1+r)^{-1}$ is simply given by the reciprocal of $\theta$. One important property of cointegrated systems of order (1,1) concerns the estimation of unknown elements of the cointegrating vector $\alpha$. In the present value model, $\alpha$ is unique up to a scalar normalization and is proportional to $(1, -\theta)'$. Stock (1987) and Phillips and Ouliaris (1988) prove that a variety of methods provide estimates that converge to the true parameter at a rate proportional to the sample size $n$. This is because, asymptotically, all linear combinations of the elements of $z_t$ other than $\alpha' z_t$ have infinite variance. The practical implication is that an unknown element of $\alpha$ may be estimated in a first-stage regression and then treated as known in second-stage procedures (i.e., in testing the model), whose asymptotic standard errors will still be correct.

Footnote 19: It might be attractive to model $\log d_t$ and $\log p_t$ as difference stationary because the growth rates of real stock prices and real dividends are relatively stable over time. However, since the model (3.14) is linear in levels, a log specification is intractable unless one is willing to focus on a special case (Kleidon, 1986) or to approximate the model (Campbell and Shiller, 1989). Campbell and Shiller (1989) model dividends as a log-linear unit root process, incorporating the geometric random walk model of Kleidon (1986) and LeRoy and Parke (1992) and the dividend-smoothing model of Marsh and Merton (1986, 1987) as special cases.
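To illustrate the cointegration implication of (3.14) and (3.15) concretely, suppose dividend growth is AR(1) with parameter $\phi$, so that $E_t\Delta d_{t+i} = \phi^i\Delta d_t$ and (3.15) collapses to $s_t = \theta\delta\phi\,\Delta d_t/(1 - \delta\phi)$. The following minimal Python sketch, with illustrative parameter values not taken from the text, builds prices consistent with (3.14) and shows that the spread is stationary while $p_t$ and $d_t$ are individually I(1).

```python
import numpy as np

rng = np.random.default_rng(1)
n, phi, delta = 2000, 0.3, 0.95
theta = delta / (1 - delta)            # coefficient of proportionality

dd = np.zeros(n)                       # dividend growth: AR(1), so E_t dd_{t+i} = phi^i dd_t
for t in range(1, n):
    dd[t] = phi * dd[t - 1] + rng.standard_normal()

d = 100 + np.cumsum(dd)                # dividends: difference stationary in levels
s = theta * delta * phi / (1 - delta * phi) * dd   # spread implied by (3.15)
p = theta * d + s                      # prices implied by the present value model (3.14)

# p and d are each I(1), but the spread p - theta*d = s is stationary:
print(np.std(np.diff(d)), np.std(s))
```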
U.K. stock prices and dividends

I apply the proposed testing framework for choosing between cointegration in levels and in logarithms to U.K. monthly data on real stock prices and real dividends from January 1965 to December 2005. These series are constructed by dividing the FTSE All-Share price and dividend indices20 by the U.K. retail price index. The data set is obtained from Mills and Markellos (2008). Figure 3.5 shows real stock prices and real dividends in levels in the top panel and in logarithms in the bottom panel.

Footnote 20: The FTSE All-Share Index was originally called the FT Actuaries All-Share Index at its inception back in 1962. The FTSE All-Share is a market-capitalization weighted index representing the performance of all eligible companies listed on the London Stock Exchange's main market that pass screening for size and liquidity. It is considered the best performance measure of the overall London equity market, with the vast majority of UK-focused money invested in funds that track it.

At any conventional level, standard unit root tests such as the ADF, DF-GLS, and Phillips-Perron tests do not reject the null of a unit root in the logarithms of real stock prices and real dividends, and the KPSS (Kwiatkowski et al., 1992) test rejects the stationarity of these series as well. Also, the presence of a unit root in the levels of real stock prices and real dividends cannot be rejected at the 5 percent level.
Figure 3.5: FTSE All-Share index: (log) real price and (log) real dividend (Monthly, 1965:1–2005:12)
The OLS regression of log real stock prices on a constant and log real dividends gives
$$\log p_t = \underset{(0.147)}{1.888} + \underset{(0.067)}{1.571}\,\log d_t + \hat u_t, \qquad t = 1, \dots, 492,$$
where in parentheses under the coefficient estimates are HAC standard errors with a Bartlett kernel and a bandwidth parameter of 5. Next, the regression of the levels of real stock prices on a constant and the levels of real dividends gives
$$p_t = \underset{(16.444)}{-116.188} + \underset{(2.076)}{37.247}\,d_t + \mathrm{residual}_t, \qquad t = 1, \dots, 492,$$
where in parentheses under the coefficient estimates are HAC standard errors with a Bartlett kernel and a bandwidth parameter of 5. The residual sum of squares from the regression in levels is 1,189,973. The Phillips-Ouliaris (1990) test is then applied to the residuals from each regression to test whether the variables are
cointegrated or not; the test statistics suggest cointegration both in levels and in logarithms.21

Now, I apply the proposed specification testing framework to test for cointegration in logarithms. From the regression output above, we have $n = 492$, $\hat\theta_n = 1.571$, $\hat\beta_n = 37.247$, and $Q_n = 1{,}189{,}973$ in the notation of Section 3.2. The long-run variance of $\Delta\log d_t$ is estimated as $\hat\lambda_n^2 = 0.0021$ with a Bartlett kernel and a bandwidth parameter of 5.22 The test statistics are then evaluated as
$$\frac{n^{-1/2}\log\hat\beta_n}{(\hat\theta_n - 1)\hat\lambda_n} = 6.1797, \qquad \frac{n^{-1/2}\log Q_n}{2\hat\theta_n\hat\lambda_n} = 4.3408, \qquad -\frac{n^{-1/2}\log T_n}{\hat\lambda_n} = 3.2916,$$
all of which lie well above the upper 1 percentile, 2.5758, of their asymptotic null distribution (see Table 2.1). This suggests that cointegration in logarithms between U.K. real stock prices and real dividends is not well supported by the data.

Footnote 21: The ADF t statistics with 4 lagged differenced terms are −3.99 and −3.39 for the residuals from the regressions in logarithms and in levels, respectively, while the critical values provided by Phillips and Ouliaris (1990, Table II b) are −3.07, −3.37, and −3.96 at the 10%, 5%, and 1% significance levels.

Footnote 22: If we use the larger bandwidth of 17 chosen by the Newey-West (1994) automatic lag selection criterion, the long-run variance is estimated as $\hat\lambda_n^2 = 0.0028$; the test result remains the same, because the values of the test statistics, $n^{-1/2}\log\hat\beta_n/(\hat\theta_n - 1)\hat\lambda_n = 5.3817$, $n^{-1/2}\log Q_n/2\hat\theta_n\hat\lambda_n = 3.7803$, and $-n^{-1/2}\log T_n/\hat\lambda_n = 2.8666$, still fall into the rejection region.

U.S. stock prices and dividends

I now apply the proposed testing framework to U.S. annual data on real stock prices and real dividends from 1926 to 2008. The price variable is the value-weighted New York Stock Exchange index23 constructed by the Center for Research in Security Prices (CRSP). The dividend series is not directly available from the CRSP database, but we can recover it from the CRSP returns by comparing the portfolio returns with and without dividends:
$$D_{t+1} = P_{t+1}\left(\frac{1 + r_{t+1}}{1 + \tilde r_{t+1}} - 1\right), \qquad (3.17)$$
where $P_t$ is the price, $r_t$ is the value-weighted return including dividends, and $\tilde r_t$ is the value-weighted return excluding dividends. This technique is taken from Cochrane (1992, 2008).24 I use annual data to avoid the seasonality in dividends. The real stock price and dividend series, denoted respectively by $p_t$ and $d_t$, are constructed by dividing $P_t$ and $D_t$ by the CPI. Figure 3.6 shows real stock prices and real dividends in levels in the top panel and in logarithms in the bottom panel.

Standard unit root tests do not reject the null of a unit root in the logarithms of real stock prices and real dividends at conventional levels, and the KPSS test rejects the stationarity of these series as well. Also, the presence of a unit root in the levels of real stock prices and real dividends cannot be rejected at the 5 percent level.

Footnote 23: The value-weighted index is a portfolio built each calendar period using all issues listed on the NYSE with available shares outstanding and valid prices in the current and previous period, excluding American Depository Receipts. Issues are weighted by their market capitalization at the end of the previous period.

Footnote 24: Note that $r_{t+1} = (P_{t+1} + D_{t+1})/P_t - 1$ and $\tilde r_{t+1} = P_{t+1}/P_t - 1$, and hence the dividend yield is given by $D_{t+1}/P_{t+1} = (1 + r_{t+1})/(1 + \tilde r_{t+1}) - 1$, from which Equation (3.17) immediately follows. This procedure for recovering dividends from the CRSP returns implies that dividends paid early in the year are reinvested at the market return $R\,(= 1 + r)$ to the end of the year, as shown by Cochrane (1991). Accumulating dividends at a different rate is an attractive and frequently followed alternative, but then returns, prices, and dividends no longer obey the identity $R_{t+1} = (P_{t+1} + D_{t+1})/P_t$ with end-of-year prices (Cochrane, 2008, p. 1541).
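The dividend-recovery identity (3.17) translates directly into code. A minimal sketch follows; the array names are hypothetical placeholders for the CRSP series described in the text.

```python
import numpy as np

def implied_dividends(P, r, r_ex):
    """Equation (3.17): D_{t+1} = P_{t+1} * ((1 + r_{t+1}) / (1 + r~_{t+1}) - 1),
    where r is the cum-dividend return and r_ex the ex-dividend return."""
    P, r, r_ex = (np.asarray(a, dtype=float) for a in (P, r, r_ex))
    return P[1:] * ((1.0 + r[1:]) / (1.0 + r_ex[1:]) - 1.0)
```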
Figure 3.6: Value-weighted NYSE index: (log) real price and (log) real dividend (Annual, 1926–2008)
The OLS regression of log real stock prices on a constant and log real dividends gives
$$\log p_t = \underset{(0.294)}{1.558} + \underset{(0.125)}{1.681}\,\log d_t + \hat u_t, \qquad t = 1926, \dots, 2008,$$
where in parentheses under the coefficient estimates are HAC standard errors with a Bartlett kernel and a bandwidth parameter of 4. Next, the regression of the levels of real stock prices on a constant and the levels of real dividends gives
$$p_t = \underset{(63.390)}{-333.296} + \underset{(6.157)}{56.723}\,d_t + \mathrm{residual}_t, \qquad t = 1926, \dots, 2008,$$
where in parentheses under the coefficient estimates are HAC standard errors with a Bartlett kernel and a bandwidth parameter of 4. The residual sum of squares from
the regression in levels is 1,780,240. The Phillips-Ouliaris test is then applied to the residuals from each regression to test whether the variables are cointegrated, and the test statistics suggest no strong evidence for cointegration in either logarithms or levels.25 However, considering the small sample size and the low power of the ADF unit root test, I also apply the Hausman-type cointegration test recently developed by Choi, Hu, and Ogaki (2008), whose statistics suggest cointegration both in levels and in logarithms.26

Now, I apply the proposed specification testing framework to test for cointegration in logarithms. From the regression output above, we have $n = 83$, $\hat\theta_n = 1.681$, $\hat\beta_n = 56.723$, and $Q_n = 1{,}780{,}240$ in the notation of Section 3.2. The long-run variance of $\Delta\log d_t$ is estimated as $\hat\lambda_n^2 = 0.0131$ using a Bartlett kernel with a bandwidth parameter of 3.27 The test statistics are then evaluated as
$$\frac{n^{-1/2}\log\hat\beta_n}{(\hat\theta_n - 1)\hat\lambda_n} = 5.6816, \qquad \frac{n^{-1/2}\log Q_n}{2\hat\theta_n\hat\lambda_n} = 4.1009, \qquad -\frac{n^{-1/2}\log T_n}{\hat\lambda_n} = 3.0248,$$
all of which lie well above the upper 1 percentile, 2.5758, of their asymptotic null distribution (see Table 2.1). This suggests that cointegration in logarithms between U.S. real stock prices and real dividends is not well supported by the data, as in the U.K. case.

Footnote 25: The ADF t statistics with 1 lagged differenced term are −3.14 and −2.32 for the residuals from the regressions in logarithms and in levels, respectively, while the critical values provided by Phillips and Ouliaris (1990, Table II b) are −3.07, −3.37, and −3.96 at the 10%, 5%, and 1% significance levels.

Footnote 26: The Hausman-type statistics with 2 leads and lags are 1.01 and 2.15 for the regressions in logarithms and in levels, respectively, while the critical values from the null $\chi^2(1)$ distribution are 2.71, 3.84, and 6.63 at the 10%, 5%, and 1% significance levels, indicating that we cannot reject the null of cointegration at any conventional level.

Footnote 27: If we use the larger bandwidth of 4 chosen by the Newey-West (1994) automatic lag selection criterion, the long-run variance is estimated as $\hat\lambda_n^2 = 0.0126$; the test result remains the same, because the values of the test statistics, 5.7924, 4.1809, and 3.0838, still fall into the rejection region.
3.4.2 Money demand function estimation
For the estimation of the long-run money demand function, the cointegrating regression approach has been used in the recent literature. However, there is no consensus on the specification of the regression equation, as the nominal interest rate enters the regression sometimes in levels and sometimes in logarithms, depending on the author. I apply the proposed testing framework to see whether the long-run money demand function with the nominal interest rate specified in logarithms is acceptable in the data.

Specification of money demand function

In the literature on long-run money demand function estimation, the log-level functional form
$$\log m_t = \alpha + \beta\, r_t + \tilde u_t \qquad (3.18)$$
has been widely used, where $m_t$ denotes the real money balance28 and $r_t$ is the nominal interest rate; see, e.g., Stock and Watson (1993) and Ball (2001). However, in order to capture the nonlinear relationship between the log real money balance and the nominal interest rate (see Figure 3.7), the log-log functional form
$$\log m_t = c + \theta\log r_t + u_t \qquad (3.19)$$
has also been employed; see, e.g., Bae and de Jong (2007) and Bae et al. (2006).29

Footnote 28: Let $M_t$ denote monetary aggregates, $P_t$ the overall price level, and $Y_t$ income. Here I define the real money balance by $m_t = M_t/(P_t Y_t)$, instead of the usual definition $M_t/P_t$. To be precise, $m_t$ defined in this way is the reciprocal of velocity (see, e.g., Hoffman and Rasche (1991) and Stock and Watson (1993)), so that Equation (3.18) might be called a velocity function. However, I call it the money demand function, following Bae et al. (2006) and Bae and de Jong (2007), with the restriction of unitary income elasticity of money demand imposed. In fact, empirical studies of the demand for money in the U.S. generally found that a unitary long-run income elasticity could not be rejected; see, e.g., Hoffman and Rasche (1991, p. 665).
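Both specifications are simple OLS fits once the series are arranged. The sketch below, with hypothetical array names for the series described in the text, estimates the two forms side by side.

```python
import numpy as np

def fit_money_demand(log_m, r):
    """OLS fits of the log-level form (3.18) and the log-log form (3.19).
    log_m: log real money balance; r: nominal interest rate in levels."""
    c = np.ones_like(r)
    b_loglevel, *_ = np.linalg.lstsq(np.column_stack([c, r]), log_m, rcond=None)
    b_loglog, *_ = np.linalg.lstsq(np.column_stack([c, np.log(r)]), log_m, rcond=None)
    return b_loglevel, b_loglog   # (alpha, beta) and (c, theta)
```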
Figure 3.7: The U.S. long-run money demand: (log) interest rate vs. log real money balance
In the above-mentioned literature, the nominal interest rate is assumed to be I(1) in its levels rather than in its logarithms.30 Although it is argued by some researchers, e.g., Hu and Phillips (2004) and Bae and de Jong (2007), that the assumption of a unit root in the levels rather than in the logarithms of the nominal interest rate is more reasonable, this assumption has an internal inconsistency: the nominal interest rate is always nonnegative in levels, while an integrated series may well take negative values.

Footnote 29: The functional form employed by Bae and de Jong (2007) and Bae et al. (2006) is not exactly that of (3.19) but takes the form $\log m_t = c + \theta\log|r_t| + u_t$, since in principle $r_t$ could take on negative values under their assumption that $r_t$, rather than $\log r_t$, is an I(1) series.

Footnote 30: A notable exception is Hoffman and Rasche (1991), who use $\log r_t$, rather than $r_t$, as an I(1) series in estimating the U.S. money demand function based on the Johansen (1988) method.
In this regard, the assumption that the nominal interest rate is I(1) in its logarithms may have some advantage, or should at least be equally reasonable. Conventional unit root tests in fact do not reject the null of a unit root in either the levels or the logarithms of the nominal interest rate; see, e.g., Bae and de Jong (2007, Table I). It is important to note that the two functional forms (3.18) and (3.19) have quite different implications for money demand and monetary policy, especially when the interest rate is close to zero, as the functional form (3.19) can account for the liquidity trap while (3.18) cannot. Accordingly, testing the cointegration specification in the form of (3.19) is important as well. In the following subsections, I test the specification (3.19) (under the assumption that the nominal interest rate is I(1) in its logarithms) based on the statistic obtained from regression (3.18), using data from the U.S. and Japan.

U.S. money demand function estimation

I apply the proposed testing framework to U.S. annual data on the nominal interest rate and the log real money balance from 1900 to 1997. I use the same data set as in Bae and de Jong (2007), which extends the data set initially used by Stock and Watson (1993). The 6-month commercial paper (CP) rate is used for the nominal interest rate $r_t$, M1 for monetary aggregates $M_t$, the NNP deflator as a proxy for the overall price level $P_t$, and real NNP as a proxy for income $Y_t$. The real money balance series is then defined as $m_t = M_t/(P_t Y_t)$. Figure 3.8 shows the nominal interest rate in levels together with the log real money balance in the top panel, and both series in logarithms in the bottom panel.
Figure 3.8: The U.S. long-run (log) interest rate and log real money balance (Annual, 1900–1997)
At conventional levels, the ADF test does not reject the presence of a unit root in the nominal interest rate series, in both levels and logarithms, and the KPSS test rejects the stationarity of these series as well; see Table I of Bae and de Jong (2007, p. 778).

The OLS regression of the log real money balance on a constant and the nominal interest rate in logarithms gives
$$\log m_t = \underset{(0.0646)}{-0.8879} - \underset{(0.0422)}{0.3326}\,\log r_t + \hat u_t, \qquad t = 1900, \dots, 1997,$$
where in parentheses under the coefficient estimates are HAC standard errors with a Bartlett kernel and a bandwidth parameter of 4. Next, the regression of the log real money balance on a constant and the nominal interest rate in levels gives
$$\log m_t = \underset{(0.0592)}{-0.8961} - \underset{(0.0108)}{0.0914}\,r_t + \mathrm{residual}_t, \qquad t = 1900, \dots, 1997,$$
where in parentheses under the coefficient estimates are HAC standard errors with a Bartlett kernel and a bandwidth parameter of 4. The residual sum of squares from the regression in levels is 2.6507. The Phillips-Ouliaris test is then applied to the residuals from each regression to test whether the variables are cointegrated; the test statistics suggest cointegration in levels but no statistically significant evidence of cointegration in logarithms.31 However, considering the small sample size and the low power of the ADF unit root test, I also apply the Hausman-type cointegration test of Choi et al. (2008), whose statistics suggest cointegration both in levels and in logarithms.32

Now, I apply the proposed specification testing framework to test for cointegration in logarithms. From the regression output above, we have $n = 98$, $\hat\theta_n = -0.3326$, and $\tilde Q_n = 2.6507$ in the notation of Section 3.2. The long-run variance of $\Delta\log r_t$ is estimated as $\hat\lambda_n^2 = 0.0579$ using a Bartlett kernel with a bandwidth parameter of 4.33 The test statistic is then evaluated as
$$\frac{n^{-2}\tilde Q_n}{\hat\theta_n^2\hat\lambda_n^2} = 0.0431,$$
which is greater than the 5th percentile, 0.0366, and close to the 10th percentile, 0.046, of its asymptotic null distribution (see Table 3.2). This suggests that there is no statistically significant evidence against the log-log specification of the U.S. long-run money demand function.

Footnote 31: The ADF t statistics with 1 lagged differenced term are −2.47 and −4.01 for the residuals from the regressions in logarithms and in levels, respectively, while the critical values provided by Phillips and Ouliaris (1990, Table II b) are −3.07, −3.37, and −3.96 at the 10%, 5%, and 1% significance levels.

Footnote 32: The Hausman-type statistics with 5 leads and lags are 1.48 and 2.53 for the regressions in logarithms and in levels, respectively, while the critical values from the null $\chi^2(1)$ distribution are 2.71, 3.84, and 6.63 at the 10%, 5%, and 1% significance levels, indicating that we cannot reject the null of cointegration at any conventional level.

Money demand function estimation in Japan

I now apply the proposed testing framework to Japanese quarterly data on the nominal interest rate and the log real money balance from 1979:1 to 2003:4. I use the same data set as in Bae et al. (2006). The Gensaki rate34 is used for the nominal interest rate $r_t$, M1 for monetary aggregates $M_t$, the CPI as a proxy for the overall price level $P_t$, and real GDP as a proxy for income $Y_t$.35 The real money balance series is again defined by $m_t = M_t/(P_t Y_t)$. Since the data frequency is quarterly, I remove the seasonal component from the log real money balance series, $\log m_t$, using the seasonal-trend decomposition procedure based on LOESS, known as STL (Cleveland et al., 1990).36 Figure 3.9 shows the log real money balance series with and without seasonal adjustment in the top panel and its seasonal component extracted by LOESS-smoothing in the bottom panel.

Footnote 33: The bandwidth parameter of 4 is chosen by the Newey-West (1994) automatic lag selection criterion. The test result is, however, robust to the choice of bandwidth parameter between 1 and 9.

Footnote 34: The Gensaki rate is the repurchase agreement (repo) rate on Japanese government bonds; it is frequently employed in empirical studies of Japanese money demand.

Footnote 35: Real private consumption is also used as a proxy for income, as in Bae et al. (2006), but the result is not reported here since it is similar and robust to the choice of a proxy for income.

Footnote 36: The seasonal component is found by LOESS-smoothing the seasonal sub-series, i.e., the series of all 1st-quarter values, the series of all 2nd-quarter values, and so on. However, the overall level is not included in the seasonal component.
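The seasonal adjustment step can be reproduced with any STL implementation. A minimal sketch using the STL class in statsmodels, a reimplementation of the Cleveland et al. (1990) procedure, is given below; the series name is hypothetical.

```python
import pandas as pd
from statsmodels.tsa.seasonal import STL

def seasonally_adjust(log_m: pd.Series) -> pd.Series:
    """Remove the LOESS-extracted seasonal component from a quarterly series.
    log_m is assumed to hold the quarterly log real money balance."""
    res = STL(log_m, period=4).fit()   # seasonal-trend decomposition by LOESS
    return log_m - res.seasonal        # the seasonal component excludes the level
```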
Figure 3.9: The seasonal adjustment of the log real money balance series by LOESS-smoothing
I will use the seasonally adjusted series in what follows. Figure 3.10 shows the nominal interest rate in levels together with the log real money balance in the top panel, and both series in logarithms in the bottom panel. At conventional levels, standard unit root tests do not reject the presence of a unit root in the nominal interest rate series, in both levels and logarithms, and the KPSS test rejects the stationarity of these series as well.
Figure 3.10: Japanese (log) interest rate and log real money balance (Quarterly, 1979:1–2003:4)
The OLS regression of the log real money balance on a constant and the nominal interest rate in logarithms gives
$$\log m_t = \underset{(0.0125)}{2.7727} - \underset{(0.0047)}{0.0986}\,\log r_t + \hat u_t, \qquad t = 1, \dots, 100,$$
where in parentheses under the coefficient estimates are HAC standard errors with a Bartlett kernel and a bandwidth parameter of 4. Next, the regression of the log real money balance on a constant and the nominal interest rate in levels gives
$$\log m_t = \underset{(0.0805)}{2.9921} - \underset{(0.0146)}{0.0597}\,r_t + \mathrm{residual}_t, \qquad t = 1, \dots, 100,$$
where in parentheses under the coefficient estimates are HAC standard errors with a Bartlett kernel and a bandwidth parameter of 4. The residual sum of squares from
the regression in levels is 3.3639. The Phillips-Ouliaris test is then applied to the residuals from each regression to test whether the variables are cointegrated; the test statistics suggest cointegration in logarithms but no evidence of cointegration in levels.37 I also apply the Hausman-type cointegration test of Choi et al. (2008), whose statistics likewise suggest cointegration in logarithms but no evidence of cointegration in levels.38

In this case, then, the existing cointegration tests suggest that cointegration in logarithms is appropriate. Thus, applying the proposed specification testing framework to test for cointegration in logarithms, one would expect the null of cointegration in logarithms not to be rejected, and indeed this turns out to be the case. From the regression output above, we have $n = 100$, $\hat\theta_n = -0.0986$, and $\tilde Q_n = 3.3639$ in the notation of Section 3.2. The long-run variance of $\Delta\log r_t$ is estimated as $\hat\lambda_n^2 = 0.1943$ using a Bartlett kernel with a bandwidth parameter of 4.39 The test statistic is then evaluated as
$$\frac{n^{-2}\tilde Q_n}{\hat\theta_n^2\hat\lambda_n^2} = 0.1781,$$
which lies between the 10th percentile, 0.046, and the 90th percentile, 0.348, of its asymptotic null distribution (see Table 3.2). This suggests that there is no evidence against the log-log specification of the long-run money demand function in Japan, as in the U.S. case.

Footnote 37: The ADF t statistics with 1 lagged differenced term are −3.69 and −0.62 for the residuals from the regressions in logarithms and in levels, respectively, while the critical values provided by Phillips and Ouliaris (1990, Table II b) are −3.07, −3.37, and −3.96 at the 10%, 5%, and 1% significance levels.

Footnote 38: The Hausman-type statistics with 5 leads and lags are 0.20 and 15.67 for the regressions in logarithms and in levels, respectively, while the critical values from the null $\chi^2(1)$ distribution are 2.71, 3.84, and 6.63 at the 10%, 5%, and 1% significance levels, indicating that the null of cointegration in logarithms cannot be rejected at any conventional level, while the null of cointegration in levels is clearly rejected.

Footnote 39: The bandwidth parameter of 4 is chosen by the Newey-West (1994) automatic lag selection criterion. The test result is, however, robust to the choice of bandwidth parameter up to 70.
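As a quick check, the two reported values of the statistic (3.12) can be reproduced from the regression output quoted above; the function name below is illustrative.

```python
def stat_312(n, theta_hat, Q_tilde, lam2_hat):
    """The pivotal statistic (3.12): n^{-2} Q~_n / (theta_hat^2 * lambda_hat^2)."""
    return Q_tilde / n**2 / (theta_hat**2 * lam2_hat)

print(stat_312(98, -0.3326, 2.6507, 0.0579))    # U.S.:  ~0.0431
print(stat_312(100, -0.0986, 3.3639, 0.1943))   # Japan: ~0.1781
```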
3.5 Concluding remarks
I have proposed testing procedures that help decide whether one should take logarithms in the cointegrating regression context when a pair of time series appears to be cointegrated both in levels and in logarithms. The proposed procedures test for cointegration in logarithms based on statistics obtained from a regression in levels. The asymptotic null distributions of the test statistics are derived analytically, and their finite-sample performance is examined via Monte Carlo experiments. The proposed tests suggest that cointegration in logarithms between real stock prices and real dividends is not supported by the data, while there is no statistically significant evidence against the log-log specification of the long-run money demand function. The main technical innovation of this paper is the further development of analytical tools for handling the exponential of an integrated series, extending the results of Lee and de Jong (2008) and of Davies and Krämer (2003), which enables me to establish limit distribution results for regression statistics that involve sums of the exponential of an integrated series. The practical contribution of this paper is a set of new testing procedures that serve as theoretical guidance for applied researchers in specifying a cointegrating regression when a pair of time series appears to be cointegrated both in levels and in logarithms.
CHAPTER 4
ROBUST GMM-TYPE ESTIMATION WITH MISSPECIFIED MOMENT CONDITIONS
4.1 Introduction
The generalized method of moments (GMM) proposed by Hansen (1982) is now one of the most widely used estimation methods in empirical economics and finance. The GMM estimation framework is based on population moment conditions and presumes that all the given moment conditions are correctly specified, i.e., the population orthogonality condition is assumed a priori to hold. However, as pointed out by Andrews (1999), empirical researchers using GMM estimation often find that the J-test of overidentifying restrictions rejects the null hypothesis, indicating that not all moment conditions are correct. Nonetheless, it is not uncommon to find cases in which the sample evidence suggests some sort of misspecification but inference about the parameters is still performed based on the asymptotic theory appropriate for correctly specified cases. The presence of misspecified moment conditions, however, makes the standard GMM estimator inconsistent and subsequent inference misleading (Hall, 2005; Hall and Inoue, 2003). This is the basic motivation for the earlier literature on moment selection
such as Andrews (1999), Andrews and Lu (2001), and Hong, Preston, and Shum (2003).40 They introduce several procedures for consistently selecting correct moment conditions in the context of unconditional moment conditions models, based on information-type selection criteria. The selection criteria proposed by Andrews (1999) and Andrews and Lu (2001) are based on the GMM J test statistic for overidentifying restrictions, with a bonus term subtracted off that rewards utilizing more moment conditions. Hong et al. (2003) replace the J statistic with GEL statistics in the construction of the selection criteria. The correct moments are then selected by minimizing these criteria over the space of moment selection vectors. Andrews (1999) considers only moment selection problems, presuming the model itself is correctly specified in the sense that the dimension of the parameter vector to be estimated is valid, while Andrews and Lu (2001) and Hong et al. (2003) investigate selection procedures for both the model (i.e., the dimension of the parameter vector to be estimated) and the moment conditions (i.e., the correct moment conditions to be used in the estimation).

In the previous literature on moment selection, correct moment conditions are first selected based on certain information-type selection criteria (pre-selection of moments), and then the parameters of interest are estimated by, for example, the standard GMM method as if the preselected moments were all correct (post-selection estimation). Therefore, the effect of moment selection on post-selection inference can be another issue; see, e.g., Pötscher (1991), Kabaila (1995), and Leeb and Pötscher (2005). It is basically the consistent selection of correct moment conditions that the earlier literature on moment selection concerns; the asymptotic properties and limit distribution of the parameter estimator, which might be of ultimate interest, are not explored. Of course, the estimator would be quite close to the true parameter vector in the limit as long as the moment selection procedure is consistent and there is a sufficient number of correct moment conditions. However, in the earlier literature on moment selection, the asymptotic distribution of the parameter estimator is conceptually unclear because the criterion function used in the estimation is not the same as the one used for moment selection. To date, in the presence of incorrect moment conditions in a GMM context, no procedures seem to be available in the literature that consider the selection of correct moment conditions and the consistent estimation of the parameters simultaneously.41

I adopt such an approach in this paper by employing a nonstandard objective function. The criterion function is formulated as a squared weighted $L_2$-norm of a given moment vector minimized over the space of moment selection vectors. The proposed estimator is then defined by the minimizer of the objective function. Since the same criterion function is used in both moment selection and parameter estimation, it allows one to directly explore the asymptotic properties of the parameter estimator. For this purpose, I focus on establishing the consistency and the limit distribution of the proposed estimator, which is asymptotically robust against misspecified moment conditions. I establish that the resulting estimator is $\sqrt{n}$-consistent and that the limit distribution is characterized by the argmin of a certain random limit function that is free of any incorrect moment conditions.

The remainder of this chapter is organized as follows. In Section 4.2, I set up the framework of this paper and define the objective function and estimator. The consistency and the limit distribution of the estimator are then established in Sections 4.3 and 4.4. In Section 4.5 the proposed robust estimation framework is illustrated with a hypothetical example in the context of a simple linear IV model, and its performance is evaluated via a simulation study. Section 4.6 concludes with some remarks. All proofs are collected in Appendix C.

Footnote 40: There are a number of studies in the literature along similar lines, although they are not directly related to the problem of correct moment selection. Some concern testing individual moment conditions (Gallant et al., 1997) or testing whether a given subset of moment conditions is correct (Eichenbaum et al., 1988). Hall et al. (2007) introduce an entropy-based information criterion that can be used as a basis for selecting moments from a candidate set of moment conditions that are known to be valid. But selecting a small number of efficient moment conditions from a large set of correct moments (see also Gallant and Tauchen 1996) is a different problem from the one addressed in this paper. Also, Caner (2009) investigates a different problem from the one addressed here, in that he considers only model selection problems, assuming all the given moment conditions are correct.

Footnote 41: In a different context, Caner (2009) proposes a LASSO-type GMM estimator that selects the correct "model" (i.e., the "dimension of the parameter vector") and estimates it simultaneously.
4.2 Estimation with misspecified moment conditions

4.2.1 Setup
Following Andrews (1999), let $\{z_i\}$ be a sequence of observations taken from an unknown probability distribution (or DGP) $P_0 \in \mathcal{P}$, where $\mathcal{P}$ is a class of probability distributions and allows for cases where the random variables are i.i.d., independent but nonidentically distributed, stationary and ergodic, weakly dependent and nonidentically distributed, etc. Suppose we have a random vector of empirical moment conditions $\bar g(\theta) : \Theta \to \mathbb{R}^r$ that depends on sample data $\{z_i, 1 \le i \le n\}$ of size $n$, typically of the form
$$\bar g(\theta) = \frac{1}{n}\sum_{i=1}^n g(z_i, \theta),$$
where $\theta \in \Theta \subset \mathbb{R}^k$ is the parameter vector of dimension $k$ and $g$ is a known $\mathbb{R}^r$-valued moment function. Assume that for all $\theta \in \Theta$ and $P_0 \in \mathcal{P}$,
$$\bar g(\theta) \xrightarrow{p} g^0(\theta), \qquad (4.1)$$
where $g^0(\theta)$, called the population moment condition, is the expectation of $\bar g(\theta)$ or its limit as $n \to \infty$,42 and it is assumed to exist and be finite for all $\theta \in \Theta$ and for all $z_i$. Usually Equation (4.1) holds by a weak law of large numbers (WLLN).

Footnote 42: If $z_i$ are iid or stationary and ergodic, then $g^0(\theta) = E[g(z_i, \theta)]$, where $E$ denotes expectation under $P_0$.

In the standard GMM framework, it is assumed that all $r$ moment conditions are correct and that there exists a unique (true) parameter value $\theta_0 \in \Theta$ such that $g^0(\theta_0) = 0$. The GMM estimator of $\theta_0$ is then defined to minimize $\bar g(\theta)' W_n \bar g(\theta)$ over $\theta \in \Theta$ for some positive definite weight matrix $W_n$. In the overidentified moment conditions model with $r > k$, however, it is possible that no value of $\theta \in \Theta$ simultaneously satisfies all the moment restrictions exactly in the population, resulting in a misspecified model (Maasoumi and Phillips, 1982). In this regard, the model is often said to be misspecified if there is no value of $\theta \in \Theta$ that satisfies $g^0(\theta) = 0$, i.e., $\|g^0(\theta)\| > 0$ for all $\theta \in \Theta$, where $\|\cdot\|$ denotes the Euclidean (or $L_2$) norm (Hall and Inoue, 2003).

In this paper, I consider an overidentified (unconditional) moment conditions model where not all the moments in $\bar g(\theta)$ are necessarily correct, or some of the
moments in $\bar g(\theta)$ are incorrect (or misspecified).43 I assume that there is a sufficient number of correct moment conditions for identification; that is, we have at least as many correct moment conditions as the dimension of the parameter vector, for which the population orthogonality condition holds at a unique parameter value $\theta_0 \in \Theta$. Thus, $g_j^0(\theta_0) = 0$ if the $j$th moment condition is correct and $g_j^0(\theta_0) \neq 0$ otherwise.

Suppose a researcher found in an empirical study that the J-test of overidentifying restrictions rejects the null hypothesis and realized that not all $r$ given moment conditions are correct. Suppose that the true number of correct moment conditions is $q$ (out of $r$), which is unknown, and that the researcher does not know a priori which moment conditions are correct. But I suppose that the researcher specifies the number, $p$, of moment conditions utilized in estimating the parameter vector, assuming at least $p$ moment conditions are correct (or at most $r - p$ moments are incorrect). I need to impose the restriction that the number of moments, $p$, specified by the researcher for estimation be no less than the dimension, $k$, of the parameter vector, to ensure identification of $\theta_0$, and no more than the true number, $q$, of correct moment conditions, to rule out the possibility of being forced to choose incorrect moment conditions. As long as there are more correct moment conditions than the dimension of the parameter vector, the first restriction on the range of $p$ is not restrictive at all, since $k$ is known. The second restriction on the range of $p$, i.e., $p \le q$, will be guaranteed asymptotically within the framework of this paper, since the criterion function employed for the proposed robust estimation will diverge as the sample size gets larger if $p > q$, so that the researcher will not pick $p$ such that $p > q$. Under these circumstances I show that it is possible to obtain a consistent estimator of $\theta_0$ based asymptotically on only correct moment conditions of dimension $p$, and that the limit distribution of such an estimator does not depend on any incorrect moment conditions. To do so, I adopt a nonstandard criterion function and define the proposed estimator by the minimizer of that criterion function.

Footnote 43: This situation can arise for a variety of reasons. For example, as noted in Andrews and Lu (2001), it arises when we select between two non-nested GMM models, or when $\bar g(\theta)$ consists of moments for a single model or nested models but there is a hierarchy of restrictions on the model(s). However, if the moment conditions are obtained from the first-order conditions (in the form of Euler equations) for a certain model, as is common in some macroeconomic applications, then this situation would make little sense, since all the moment conditions would be either correct or incorrect.
4.2.2 Objective function and estimator
To formalize the objective function and estimator, let $s \in \mathbb{R}^r$ denote a moment selection vector as in Andrews (1999), i.e., a vector of ones and zeros that selects some moment conditions but not others. If the $j$th element of $s$ is a one, then the $j$th moment condition is selected by $s$ and will be included; if the $j$th element is a zero, it will not be included. Let $|s|$ denote the number of moment conditions selected by $s$, i.e., $|s| = \sum_{j=1}^r s_j$. In the setting specified above, however, $s$ selects $p$ moments out of the $r$ given moment conditions, so that $s$ is a vector of $p$ ones and $r - p$ zeros, i.e., $|s| = p$, where $p$ is specified by the researcher. Let
$$S_p = \bigl\{s \in \mathbb{R}^r : s_j \in \{0, 1\}\ \forall\, 1 \le j \le r,\ |s| = p\bigr\} \qquad (4.2)$$
be the set of moment selection vectors that select $p$ moment conditions. For any $r$-vector $v$ and any $s \in S_p$, let $v_s$ denote the $p$-vector that results from deleting all elements of $v$ whose coordinates equal coordinates of elements of $s$ that are zeros. Thus, $\bar g_s(\theta)$ represents the moment conditions of dimension $p$ selected by $s \in S_p$.
With the moment selection vector $s \in S_p$ defined as above, define a sequence of functions $f_n$ by
$$f_n(\theta, s) = \bar g_s(\theta)'\, W_n(s)\, \bar g_s(\theta) \qquad (4.3)$$
and the corresponding limit function $f$ by
$$f(\theta, s) = g_s^0(\theta)'\, W(s)\, g_s^0(\theta), \qquad (4.4)$$
where $s \in S_p$, $W_n(s)$ is the $p \times p$ positive definite weight matrix employed with $\bar g_s(\theta)$, and $W_n(s) \xrightarrow{p} W(s)$ for some positive definite matrix $W(s)$ for all $s \in S_p$. I construct $W_n(s)$ so that it is an asymptotically optimal weight matrix when the moment conditions selected by $s$ are correct; that is, $W_n(s) = \Omega_n^{-1}(s)$, where $\Omega_n(s)$ is a consistent estimator of $\Omega(s) = \lim_{n\to\infty}\mathrm{Var}(n^{1/2}\bar g_s(\theta_0))$. I define $\Omega_n(s)$ as follows:44
$$\Omega_n(s) = \frac{1}{n}\sum_{i=1}^n \bigl[g_s(z_i, \hat\theta(s)) - \bar g_s(\hat\theta(s))\bigr]\bigl[g_s(z_i, \hat\theta(s)) - \bar g_s(\hat\theta(s))\bigr]', \qquad (4.5)$$
where $\bar g_s(\theta) = n^{-1}\sum_{i=1}^n g_s(z_i, \theta)$ and $\hat\theta(s)$ is some estimator of $\theta_0$.

Footnote 44: Subtracting off the sample averages is particularly important when some of the moment conditions are not correct. The $\Omega_n(s)$ in (4.5) is for the case where $z_i$ are iid. In the case of temporal dependence, sample averages can be subtracted off from a HAC covariance matrix estimator in an analogous fashion.

Now, I define the objective function as
$$Q_n(\theta) = \inf_{s \in S_p} f_n(\theta, s) = \inf_{s \in S_p} \bar g_s(\theta)'\, W_n(s)\, \bar g_s(\theta), \qquad (4.6)$$
i.e., the objective function is a squared weighted $L_2$-norm of $\bar g_s(\theta) \in \mathbb{R}^p$ minimized over $s \in S_p$. A robust GMM-type estimator $\hat\theta_n$ is then assumed to satisfy
$$Q_n(\hat\theta_n) = \inf_{\theta \in \Theta} Q_n(\theta),$$
or
$$\inf_{s \in S_p} f_n(\hat\theta_n, s) = \inf_{\theta \in \Theta}\inf_{s \in S_p} f_n(\theta, s); \qquad (4.7)$$
i.e., $\hat\theta_n$ is defined as
$$\hat\theta_n = \arg\min_{\theta \in \Theta} Q_n(\theta) = \arg\min_{\theta \in \Theta}\inf_{s \in S_p} f_n(\theta, s). \qquad (4.8)$$
Note that the estimation method proposed here in fact selects p moment conditions and estimates parameters simultaneously. In the subsequent sections I will establish the asymptotic properties (consistency and limit distribution) of the proposed estimator.
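The definitions above translate directly into code. The sketch below is a naive, brute-force implementation of $Q_n(\theta)$ in (4.6) that enumerates all selection vectors in $S_p$; as a simplification of (4.5), the weight matrix is evaluated at the current $\theta$ rather than at a preliminary estimator $\hat\theta(s)$, and the linear IV moment function is an illustrative stand-in for the example of Section 4.5. All names are hypothetical.

```python
import numpy as np
from itertools import combinations
from scipy.optimize import minimize_scalar

def robust_gmm_objective(theta, moment_matrix, p):
    """Q_n(theta) of (4.6): min over s in S_p of gbar_s' W_n(s) gbar_s, with
    W_n(s) the inverse of the centered covariance matrix as in (4.5)."""
    G = moment_matrix(theta)          # n x r matrix with rows g(z_i, theta)
    n, r = G.shape
    gbar = G.mean(axis=0)
    Gc = G - gbar                     # centered moment contributions
    best = np.inf
    for idx in combinations(range(r), p):
        s = list(idx)                 # coordinates selected by s
        W = np.linalg.inv(Gc[:, s].T @ Gc[:, s] / n)
        best = min(best, gbar[s] @ W @ gbar[s])
    return best

# Illustrative linear IV model: y = x * theta0 + e, with r = 4 instruments in Z,
# the last of which is invalid (correlated with the error e).
rng = np.random.default_rng(0)
n, theta0 = 500, 1.0
Z = rng.standard_normal((n, 4))
e = rng.standard_normal(n)
Z[:, 3] = 0.8 * e + 0.6 * rng.standard_normal(n)   # misspecified instrument
x = Z[:, :3].sum(axis=1) + rng.standard_normal(n)
y = x * theta0 + e
moments = lambda t: Z * (y - x * t)[:, None]       # g(z_i, theta) = z_i (y_i - x_i theta)

res = minimize_scalar(lambda t: robust_gmm_objective(t, moments, p=2),
                      bounds=(-5, 5), method='bounded')
print(res.x)   # should be near theta0, ignoring the invalid instrument
```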
4.3 Consistency
To establish the consistency result, first observe that, by Equation (4.1) and the continuity of the objective function, we have as $n \to \infty$
$$\inf_{s \in S_p} \bar g_s(\theta)'\, W_n(s)\, \bar g_s(\theta) \xrightarrow{p} \inf_{s \in S_p} g_s^0(\theta)'\, W(s)\, g_s^0(\theta)$$
for all $\theta \in \Theta$ if $W_n(s) \xrightarrow{p} W(s)$ with $W(s)$ positive definite. That is, the limit objective function of (4.6) is given by
$$Q(\theta) = \inf_{s \in S_p} f(\theta, s) = \inf_{s \in S_p} g_s^0(\theta)'\, W(s)\, g_s^0(\theta). \qquad (4.9)$$
I claim that $Q(\theta_0) = 0$. To see this, observe that $g_s^0(\theta_0)'\, W(s)\, g_s^0(\theta_0) > 0$ for any $g_s^0(\theta_0) \neq 0$, since $W(s)$ is positive definite. Let $\Lambda_c = \{j \in \{1,\dots,r\} : g_j^0(\theta_0) = 0\}$ and $\Lambda_m = \{j \in \{1,\dots,r\} : g_j^0(\theta_0) \neq 0\}$ denote respectively the sets of coordinates of correct and incorrect moments in $\bar g(\theta) \in \mathbb{R}^r$. Also, let $\Lambda_0 = \{j \in \{1,\dots,r\} : s_j = 0\}$ and $\Lambda_1 = \{j \in \{1,\dots,r\} : s_j = 1\}$ denote respectively the sets of coordinates of elements of $s \in S_p$ that are zeros and ones in the limit.45 Then $g_s^0(\theta_0)'\, W(s)\, g_s^0(\theta_0) > 0$ if $\Lambda_m \not\subset \Lambda_0$ or $\Lambda_1 \not\subset \Lambda_c$, since $g_j^0(\theta_0) \neq 0$ for any $j \in \Lambda_m$ and $g_j^0(\theta_0) = 0$ for all $j \in \Lambda_c$. But it must be the case that
$$Q(\theta_0) = \inf_{s \in S_p} f(\theta_0, s) = \inf_{s \in S_p} g_s^0(\theta_0)'\, W(s)\, g_s^0(\theta_0) = 0 \quad\text{with } \Lambda_m \subset \Lambda_0 \text{ and } \Lambda_1 \subset \Lambda_c, \qquad (4.10)$$
for $\#\Lambda_1 = p \le q = \#\Lambda_c$, or $\#\Lambda_0 = r - p \ge r - q = \#\Lambda_m$, by supposition; that is, the number of coordinates of $s \in S_p$ that equal zero is at least as large as the number of incorrect moments, so that it is possible to attain $Q(\theta_0) = 0$, which is the minimum since $W(s)$ is positive definite. Thus, $\theta_0$ would be identified if $Q(\theta) > 0$ for all $\theta \in \Theta$ such that $\theta \neq \theta_0$. In other words, there should be no subset of moment conditions of dimension $p$ for which each of the moment restrictions is simultaneously satisfied at some parameter value $\theta \neq \theta_0$ in the population.

Let $s_c \in \mathbb{R}^q$ and $s_m \in \mathbb{R}^{r-q}$ denote the subvectors of $s \in S_p$ whose elements consist of $s_j \in s$ such that $j \in \Lambda_c$ and $j \in \Lambda_m$, respectively. Namely, $s_c$ and $s_m$ are the $q$- and $(r-q)$-subvectors of $s \in S_p$ such that $|s_c| + |s_m| = |s| = p$, and they are associated with correct and incorrect moments, respectively; for example, if $s_m = 0$, then no incorrect moment conditions are selected by $s \in S_p$. Now, define
$$\tilde Q(\theta, s_m) = \inf_{s_c} f(\theta, s_m, s_c).$$
Then we can simply put $\inf_{\theta \in \Theta}\inf_{s_m} \tilde Q(\theta, s_m) = \tilde Q(\theta_0, 0) = 0$, and this holds regardless of the value of $s_c$. Thus, for identification, $\tilde Q(\theta, s_m)$ needs to be uniquely minimized at $(\theta_0, 0)$.

Footnote 45: Let $\#A$ denote the cardinality of a set $A$. Then, given the supposition, we have $\#\Lambda_c = q$, $\#\Lambda_m = r - q$, $\#\Lambda_0 = r - p$, and $\#\Lambda_1 = p$.
The following regularity conditions will be needed to establish the consistency result for the proposed estimator defined in (4.8).

Assumption 4.1.
(i) $z_i$ are i.i.d.;
(ii) $\Theta$ is a compact subset of $\mathbb{R}^k$;
(iii) $g(z_i, \theta)$ is continuous in $\theta \in \Theta$ with probability one for each $z_i$ and is measurable with respect to $z_i$ for each $\theta \in \Theta$;
(iv) $E\sup_{\theta \in \Theta}\|g(z_i, \theta)\| < \infty$;
(v) $W_n(s) \xrightarrow{p} W(s)$ for some positive definite matrix $W(s)$ for all $s \in S_p$;
(vi) $\tilde Q(\theta, s_m)$ is uniquely minimized at $(\theta_0, 0)$; that is, there is no subset of moment conditions of dimension $p$ for which each of the moment restrictions is simultaneously satisfied at some parameter value $\theta \neq \theta_0$ in the population.

The assumptions above guarantee the following result, a (weak) uniform law of large numbers (ULLN):

Lemma 4.1. Under conditions (i)–(iv) of Assumption 4.1, $g^0(\theta)$ is continuous and $\bar g(\theta)$ converges in probability to $g^0(\theta)$ uniformly in $\theta \in \Theta$, i.e.,
$$\sup_{\theta \in \Theta}\bigl\|\bar g(\theta) - g^0(\theta)\bigr\| \xrightarrow{p} 0. \qquad (4.11)$$

This conclusion remains true if the iid assumption for $z_i$ is changed to strict stationarity and ergodicity.

In the remainder of this section, I establish the consistency of the estimator and at the same time characterize the asymptotic behavior of $s_m \in \mathbb{R}^{r-q}$. I have supposed that there are $q$ correct and $r - q$ incorrect moment conditions, although they are a priori unknown, in the given set of $r$ moments; that is, we have the $q$ correct moment conditions $\bar g_c(\theta) \in \mathbb{R}^q$ such that $g_c^0(\theta_0) = 0$ and the $r - q$ incorrect moment conditions $\bar g_m(\theta) \in \mathbb{R}^{r-q}$ such that $g_{mj}^0(\theta_0) \neq 0$ for each $j$th element of $g_m^0(\theta_0)$.
Lemma 4.2. Let the functions $f_n$ and $f$ be defined as in (4.3) and (4.4), respectively. Then, under conditions (i)–(v) of Assumption 4.1, the sequence $f_n$ converges in probability to $f$ uniformly over $\Theta \times S_p$, i.e., $\sup_{\theta \in \Theta,\, s \in S_p} |f_n(\theta, s) - f(\theta, s)| \xrightarrow{p} 0$.

Now, observe that the uniform convergence result in Lemma 4.2 implies that
$$\inf_{\theta \in \Theta} Q_n(\theta) = \inf_{\theta \in \Theta}\inf_{s \in S_p} f_n(\theta, s) \xrightarrow{p} \inf_{\theta \in \Theta}\inf_{s \in S_p} f(\theta, s) = \inf_{\theta \in \Theta} Q(\theta). \qquad (4.12)$$
Let $\hat\theta_n$ and $\hat s$ denote the $\theta \in \Theta$ and $s \in S_p$ that minimize $f_n(\theta, s)$, and write
$$\inf_{\theta \in \Theta}\inf_{s \in S_p} f_n(\theta, s) = f_n(\hat\theta_n, \hat s) = f_n(\hat\theta_n, \hat s_m, \hat s_c), \qquad (4.13)$$
where $\hat s = (\hat s_m, \hat s_c)$. From condition (vi) of Assumption 4.1, we can also write
$$\inf_{\theta \in \Theta}\inf_{s \in S_p} f(\theta, s) = \inf_{s_c} f(\theta_0, s_m = 0, s_c) = 0, \qquad (4.14)$$
which holds regardless of the value of $s_c$, as shown in Equation (4.10). The following theorem now asserts that $\hat s_m = 0$ with probability approaching one as $n \to \infty$, i.e., $\hat s_m \xrightarrow{p} 0$, and establishes the consistency of the proposed estimator at the same time.

Theorem 4.1. Under Assumption 4.1, if we minimize the function $f_n$ defined by (4.3) over $\theta \in \Theta$ and $s \in S_p$, the incorrect moment conditions in $\bar g(\theta)$ will be selected out via the moment selection vector $s$ with probability approaching one as $n \to \infty$, i.e., $\hat s_m \xrightarrow{p} 0$, and $\hat\theta_n$ is (weakly) consistent for $\theta_0$; that is, $(\hat\theta_n, \hat s_m) \xrightarrow{p} (\theta_0, 0)$.
The consistency result remains true if the iid assumption is replaced by the condition that zi are stationary and ergodic. Theorem 4.1 implies that the proposed framework allows one to estimate parameters and select moment conditions simultaneously.
4.4
Limit distribution
To derive the limit distribution of the estimator defined by (4.8) we cannot resort to the standard technique employed to establish the asymptotic normality of the usual GMM estimator, because the estimator is based on the criterion function that is not differentiable with respect to the unknown parameter.46 Instead, I will use Kim and Pollard’s (1990) approach. To do so, I define the real-valued random processes Hn (·) and H (·) on Rk as follows. First, I define the rescaled Rr -valued random processes hn , n ≥ 1, by ( n1/2 g¯ (θ0 + ξn−1/2 ), hn (ξ ) = n1/2 g¯ (θn∗ ),
if θ0 + ξn−1/2 ∈ Θ otherwise
(4.15)
where θn∗ = arg maxθ ∈Θ Qn (θ ) and ξ ∈ Rk . Then I define the random functions Hn (·) by Hn (ξ ) = inf hns (ξ )0 Wn (s)hns (ξ ) s∈S p
(4.16)
and the corresponding limit function will be given by47 H (ξ ) = inf ( xs + ys + Gs ξ )0 W (s)( xs + ys + Gs ξ ), s∈S p
(4.17)
where x is an r-vector of which jth component is either zero or infinite depending on whether the jth moment condition is correct or not, y is an r × 1 random vector 46 Note
that the objective function defined by (4.6) is a minimum function and so it is not differentiable with respect to θ ∈ Θ. 47 See
the proof of Lemma 4.5 in Appendix C for a detailed derivation of H (·) from Hn (·).
66
such that y ∼ N (0, Ω) with Ω = E[ g(zi , θ0 ) g(zi , θ0 )0 ] − [ Eg(zi , θ0 )][ Eg(zi , θ0 )]0 , G = E(∂/∂θ 0 ) g(zi , θ0 ), and Gs denotes the p × k matrix that equals G with the rows of G deleted that correspond to elements of s that are zeros. If p ≤ q as supposed, then we can rewrite (4.17) as H (ξ ) = inf∗ (ys + Gs ξ )0 W (s)(ys + Gs ξ ),
(4.18)
S p∗ = s ∈ S p : s j = 0 ∀ j ∈ Λm ,
(4.19)
s∈S p
where
i.e., S p∗ ⊂ S p denotes the set of moment selection vectors that select only correct moment conditions of dimension p. Observe that (4.18) is obtained by dropping the xs term from (4.17) since xs = 0 ∈ R p for any s ∈ S p∗ . To derive the limit distribution result of the proposed estimator, which will be established in Theorem 4.2 below, I restate the result of Kim and Pollard (1990, Theorem 2.7) in the following lemma: Lemma 4.3. Let H, H1 , H2 , . . . be real-valued random processes on Rk with continuous sample paths and ξˆn ∈ Rk be a random vector such that (i) H (ξ ) → ∞ as kξ k → ∞; (ii) H (·) achieves its minimum at a unique point in Rk ; (iii) Hn converges weakly to H on any set Ξ = [− M, M]k ; (iv) ξˆn = O p (1); (v) ξˆn minimizes Hn (ξ ). d Then ξˆn −→ arg minξ ∈Rk H (ξ ).
67
In addition to Assumption 4.1, the following conditions will be needed to establish the limit distribution of the estimator: Assumption 4.2. (i) Θ is a compact convex subset of Rk ; (ii) θ0 ∈ int(Θ); (iii) E[ g(zi , θ0 ) g(zi , θ0 )0 ] < ∞; (iv) There exists (∂/∂θ 0 ) g(zi , θ ) that is continuous in θ ∈ Θ for each zi and measurable for each θ ∈ Θ, E supθ ∈Θ k(∂/∂θ 0 ) g(zi , θ )k < ∞, and G = E(∂/∂θ 0 ) g(zi , θ0 ) is of full column rank; (v) H (ξ ) defined by (4.17) achieves its minimum at a unique point of ξ ∈ Rk . The main assertion of Theorem 4.2 below on the limit distribution of the estimator will be proved as the culmination of a sequence of results that shows the conditions of Lemma 4.3 are satisfied under Assumptions 4.1 and 4.2. First, I show that the usual O p (n−1/2 ) asymptotics holds for the proposed estimator. Lemma 4.4. Under Assumptions 4.1 and 4.2, n1/2 (θˆn − θ0 ) = O p (1). Once a O p (n−1/2 ) rate of convergence has been established, let’s focus our attention on the rescaled Rr -valued random processes hn and Hn , n ≥ 1, defined in (4.15) and (4.16) above. Now define ξˆn by ξˆn = n1/2 (θˆn − θ0 ),
(4.20)
where θˆn is the estimator defined in (4.8). Note that ξˆn then minimizes Hn (ξ ) by construction. 68
Since the convergence in distribution for random processes on Rk can be characterized by the usual sort of finite-dimensional convergence together with stochastic equicontinuity (or uniform tightness), I establish now these two results in the following lemmas. Lemma 4.5. Consider random functions Hn , n ≥ 1, and H defined by (4.16) and (4.17). Under Assumptions 4.1 and 4.2, the finite-dimensional distributions of Hn converge to the finite-dimensional distribution of H. Lemma 4.6. Under Assumptions 4.1 and 4.2, Hn (·) defined by (4.16) is stochastically equicontinuous on any set Ξ = [− M, M]k . Now, collecting all the lemmas above yields the following limit distribution result: Theorem 4.2. Let x be an r-vector whose jth element is either zero or infinite depending on whether the jth moment condition is correct or not. Let y be an r × 1 random vector such that y ∼ N (0, Ω) with Ω = E[ g(zi , θ0 ) g(zi , θ0 )0 ] − [ Eg(zi , θ0 )][ Eg(zi , θ0 )]0 . Then, under Assumptions 4.1 and 4.2, we have d n1/2 (θˆn − θ0 ) −→ arg min H (ξ ),
(4.21)
ξ ∈Rk
where H (ξ ) is as defined in (4.17) or (4.18). Remarks. (a) Theorem 4.2 shows that the limit distribution of the estimator defined in (4.8) is characterized by the random vector that minimizes the random limit function H (ξ ). Note that the framework proposed in this paper leads to the limit 69
distribution in which any misspecified moment conditions are automatically ruled out although I do not a priori specify or select which moments are correct or incorrect. In this sense, I call the proposed estimator defined by (4.8) based on the criterion function (4.6) an asymptotically robust GMM-type estimator. (b) To achieve robustness of the proposed estimator, the number of moment conditions, p, specified by the researcher must not exceed the true number of correct moment conditions, q. This is however guaranteed asymptotically within the framework proposed in this paper. Observe from (4.16) and (4.17) that Hn will diverge to infinity if p > q, namely, if the number of moment conditions utilized by the researcher for estimation is larger than the number of correct moments. Notice that the p − q elements of xs ∈ R p in (4.17) will take infinite values if s ∈ S p with p > q, i.e., if the moment selection vector should select more moments than the number of correct moments. (c) Note that the result in Theorem 4.2 in fact establishes the asymptotic normality of our robust estimator if the number of moment conditions, p, specified by a researcher for estimation equals the (true) number of correct moments, q. If p = q, then S p∗ defined in (4.19) contains a single element, say S p∗ = Sq∗ =
{s0 }, and s0 would select all the q correct moment conditions but no incorrect moments. Then the random limit function H (ξ ) in (4.18) reduces to H (ξ ) = (ys0 + Gs0 ξ )0 W (s0 )(ys0 + Gs0 ξ ), which is uniquely minimized when ξ is given by ξ 0 = −( Gs0 0 W (s0 ) Gs0 )−1 Gs0 0 W (s0 )ys0 . 70
Thus, by the assertion of Theorem 4.2, we have d n1/2 (θˆn − θ0 ) −→ N 0, ( Gs0 0 W (s0 ) Gs0 )−1 Gs0 0 W (s0 )Ω(s0 )W (s0 ) Gs0 ( Gs0 0 W (s0 ) Gs0 )−1 , where Ω(s0 ) = Egs0 (zi , θ0 ) gs0 (zi , θ0 )0 with gs0 a q-vector of all correct moments. Furthermore, if W (s0 ) = Ω−1 (s0 ) as constructed, we have d n1/2 (θˆn − θ0 ) −→ N 0, ( Gs0 0 Ω−1 (s0 ) Gs0 )−1 . Observe that the limit distribution here coincides with the usual analysis, yet it is characterized only by correct moment conditions even in the presence of misspecified moment conditions. (d) Now consider the case where the number of moment conditions, p, specified by a researcher for estimation is less than the (true) number of correct moments, q. In this case, S p∗ in (4.19) contains ` = ( qp) elements, say S p∗ = (1)
(`)
{s∗ , . . . , s∗ }, and each moment selection vector in S p∗ selects only correct moment conditions of dimension p, so that (ys + Gs ξ )0 W (s)(ys + Gs ξ ) can be minimized with any element in S p∗ . Observe that, for any given s ∈ S p∗ , (4.18) is minimized at ξ (s) = −( Gs0 W (s) Gs )−1 Gs0 W (s)ys ,
(4.22)
which would be distributed as N (0, V (s)) with V (s) = ( Gs0 W (s) Gs )−1 Gs0 W (s)Ω(s)W (s) Gs ( Gs0 W (s) Gs )−1 , which in turn will reduce to ( Gs0 Ω−1 (s) Gs )−1 for W (s) = Ω−1 (s). However, the s ∈ S∗p assumed to be given in (4.22) is not fixed but random, so that in the case where p < q the limit distribution of n1/2 (θˆn − θ0 ) would be characterized by a 71
mixture of ` normal distributions N
0,
(i ) G 0 (i ) Ω −1 ( s ∗ ) G (i ) s∗ s∗
−1 !
(i )
for s∗ ∈ S p∗ ,
i = 1, . . . , `. (e) In practice, however, neither the correct moment conditions nor the number of them are known. Thus, we may need to use some resampling method to approximate the limit distribution of n1/2 (θˆn − θ0 ) for both cases (c) and (d) above. But the result in Theorem 4.2 shows that in any case the limit distribution of the proposed robust estimator does not depend on any incorrect moment conditions.
4.5
Example and simulation study
In this section I consider a hypothetical example in the context of a simple linear instrumental variable (IV) model to illustrate my robust GMM-type estimation framework. I also report the result from a simple Monte Carlo experiment designed to evaluate the performance of the robust estimation framework.
4.5.1
Simple linear IV model
Consider the following simple linear model y i = θ0 x i + u i ,
ui ∼ iidN (0, 1),
(4.23)
where x i = ηi + u i ,
ηi ∼ iidN (0, 1).
Note that the regressor x is an endogenous variable since it is correlated with the model error term u. Suppose that a set of four (possibly invalid) instrumental 72
variables, z˜ = { a, d, e, v}, is available and each of them is generated according to a i = ηi + ζ i ,
ζ i ∼ iidN (0, 1),
d i = η i + γi ,
γi ∼ iidN (0, 2),
ei = ηi + φi ,
φi ∼ iidN (0, 1),
vi = ηi + 2ui , where ui , ηi , ζ i , γi , and φi are all mutually independent of each other. Observe that E[ ai ui ] = E[di ui ] = E[ei ui ] = 0 and E[vi ui ] = 2Eu2i = 2 6= 0, so that ai , di , and ei are valid IVs, while vi is an invalid IV. The sample moment conditions for GMM estimation in this case are given by g¯ (θ ) =
1 n z˜i (yi − θxi ), n i∑ =1
where z˜i = ( ai , di , ei , vi )0 , and the corresponding limit function is g0 (θ ) = E[z˜i (yi − θxi )]. Note again that
0 0 g2 ( θ 0 ) E [ d i u i ] 0 0 = , g ( θ0 ) = = 0 g3 ( θ 0 ) E [ e i u i ] 0 E [ vi ui ] 2 g40 (θ0 ) g10 (θ0 )
E [ ai ui ]
that is, the first three moment conditions are correct, while the last one is incorrect (or misspecified). In our notation, to estimate k = 1 parameter we have r = 4 moment conditions and q = 3 of them are correct. Suppose that a researcher does not know a priori which one is correct or incorrect and that she decides to utilize
73
p=2
p=3
a
d
e
v
n = 100
0.6782
0.6441
0.6777
0.0000
1000
0.6857
0.6242
0.6901
0.0000
10000
0.6831
0.6369
0.6800
0.0000
n = 100
1.0000
1.0000
1.0000
0.0000
1000
1.0000
1.0000
1.0000
0.0000
10000
1.0000
1.0000
1.0000
0.0000
Note: In each replication of a Monte Carlo experiment, the p ∈ {2, 3} instrumental variables (or moment conditions) are selected in accordance with our robust estimation framework. The selection probabilities for each of the four IVs, { a, d, e, v}, are then calculated based on 10000 such replications for each sample size n ∈ {100, 1000, 10000}.
Table 4.1: Selection probabilities for each instrumental variable
p ∈ {2, 3} moment conditions, assuming at least p moment conditions correct or at most r − p moments are incorrect, to estimate k = 1 unknown parameter value.48
4.5.2
Consistency
Theorem 4.1 in Section 4.3 states that the proposed robust GMM-type estimation yields a consistent estimator for θ0 with no incorrect moment conditions selected by the moment selection vector.49 To evaluate this property, a simple Monte Carlo simulation is performed with setting θ0 = 1 in (4.23). In the simulation, the 48 If
only one moment condition (p = 1) is used for estimation, the limit objective function could be minimized with an incorrect moment condition (i.e., s = (0, 0, 0, 1)) at some θ∗ 6= θ0 , which however violates condition (vi) of Assumption 4.1; thus, for θ0 to be identified, a researcher need to utilize more than one moment in this case. 49 It
is well known that the estimator becomes inconsistent if any incorrect (or misspecified) moment conditions are involved in GMM estimation.
74
p ∈ {2, 3} IVs are selected based on the robust estimation framework to construct p moment conditions. Table 4.1 shows the selection probabilities for each of the four IVs, z˜ = { a, d, e, v}, for each combination of the sample size n ∈ {100, 1000, 10000} and the number of moments p ∈ {2, 3} specified by a researcher for estimation.50 The selection probabilities are calculated based on 10000 replications. Note that in any case the invalid IV (or incorrect moment condition) is never selected. Figure 4.1 shows the corresponding distribution of parameter estimate for each combination of the sample size and the number of moments utilized. Observe that as n gets larger the estimator is more concentrated around θ0 , that is, the estimator is consistent for θ0 . Thus, the proposed robust estimation framework shows quite a good performance in yielding the consistent estimator.
4.5.3
Limit distribution
Theorem 4.2 in Section 4.4 states that the limit distribution of the proposed estimator is characterized by d
n1/2 (θˆn − θ0 ) −→ arg min H (ξ ), ξ ∈R
where H (ξ ) = inf∗ (ys + Gs ξ )0 W (s)(ys + Gs ξ ), s∈S p
(4.18)
in which S p∗ is as defined in (4.19). Consider first the case where p = 3 = q. In this case, given the setting of the above example, S p∗ has a single element s = (1, 1, 1, 0), and we can evaluate ys and 50 The
case where p = 4 is not reported because in that case p > q and so it is not of our interest in our robust estimation framework. Also, in this case, our criterion function will diverge asymptotically.
75
p=3, n=100
2
2.0
1
1.0
0
0.0
Density
3
3.0
p=2, n=100
0.0
0.5
1.0
0.2
0.4
0.6
1.0
1.2
p=3, n=1000
8 6
6
4
4
0
0
2
2
Density
8
10
p=2, n=1000
0.8
0.80
0.85
0.90
0.95
1.00
1.05
1.10
0.85
1.00
1.05
1.10
p=3, n=10000
20
25
10
15
0
0
5
Density
0.95
30
p=2, n=10000
0.90
0.96
0.98
1.00
1.02
1.04
0.96
θ^n
0.98
1.00
1.02
θ^n
Figure 4.1: Distribution of parameter estimates from simulation (θ0 = 1)
76
1.04
Gs . First, ys is a 3 × 1 random vector such that ys ∼ N (0, Ω(s)), where
E[ a2i u2i ]
E[ ai di u2i ] E[ ai ei u2i ]
2 1 1
Ω(s) = E[ gs (θ0 ) gs (θ0 )0 ] = E[ ai di u2i ] E[d2i u2i ] E[di ei u2i ] = 1 3 1 , E[ ai ei u2i ] E[di ei u2i ] E[ei2 u2i ] 1 1 2 for E[ a2i u2i ] = ( Eηi2 + Eζ i2 ) Eu2i = 2, E[d2i u2i ] = ( Eηi2 + Eγi2 ) Eu2i = 3, E[ei2 u2i ] =
( Eηi2 + Eφi2 ) Eu2i = 2, and E[ ai di u2i ] = E[ ai ei u2i ] = E[di ei u2i ] = Eηi2 Eu2i = 1, and its inverse is given by Ω
−1
5
−1 −2
1 ( s ) = −1 3 −1 . 7 −2 −1 5
Next, Gs is given by 1 E(∂/∂θ ) ai (yi − θ0 xi ) Gs = E(∂/∂θ )di (yi − θ0 xi ) = − 1 1 E(∂/∂θ )ei (yi − θ0 xi )
since E[ ai xi ] = E[di xi ] = E[ei xi ] = Eηi2 = 1. Then, as shown before, the limit distribution of the estimator is simply characterized by d n1/2 (θˆn − θ0 ) −→ N (0, ( Gs0 Ω−1 (s) Gs )−1 ) = N (0, 7/5),
which is indeed confirmed from the simulation study, as shown in Figure 4.2 (b). Consider now the case where p = 2 < 3 = q. In this case, given the setting of the above example, we have
S p∗ = {(1, 1, 0, 0), (1, 0, 1, 0), (0, 1, 1, 0)} 77
(b) p=3
0.35
0.35
(a) p=2
0.30 0.25
0.30
0.20
0.25
0.15
0.20
0.10
0.15
0.00
0.00
0.05
0.05
0.10
Density
N(0,1.4)
variance=1.7545
−4
−2
0
2
4
−4
^ −θ ) n (θ n 0
−2
0
2
4
^ −θ ) n (θ n 0
Figure 4.2: Estimated limit distribution of θˆn from simulation (θ0 = 1, n = 10000)
and a generic element s in S p∗ can be written as s = ( s1 , s2 , s3 , 0) with s j ∈ {0, 1}, j = 1, 2, 3, such that s1 + s2 + s3 = p = 2. Then, as noted in remark (d) at the end of Section 4.4, for any given s ∈ S p∗ , the limit distribution of the estimator would be characterized by N (0, ( Gs0 Ω−1 (s) Gs )−1 ), which can easily be evaluated as N (0, 5/3) for s ∈ {(1, 1, 0, 0), (0, 1, 1, 0)} and N (0, 3/2) for s = (1, 0, 1, 0). However, in this case, s ∈ S p∗ is not actually fixed but asymptotically random, so that the limit distribution of the estimator in this case would be characterized by a mixture of these three normal distributions. See 78
Figure 4.2 (a) for the estimated limit distribution of θˆn from simulation. Notice that, as expected,51 the asymptotic variance in the case where p < q is larger than the asymptotic variance in the case where p = q.
4.5.4
Discussion
How does a researcher specify p, the number of moment conditions utilized for estimation, in practice? I suggest that one specify p in the most conservative way for robustness (at the expense of efficiency) because the inclusion of any incorrect moment conditions typically yields an inconsistent parameter estimator, which in turn leads to rejection of all moments asymptotically, and it makes the subsequent inference totally misleading. If the number, q, of correct moment conditions were known, then the researcher can simply set p = q in the proposed framework and the limit distribution for the estimator easily follows, as shown before. However, in general, the true number of correct moment conditions is not known, neither which moments are correct. Thus, in practice, we may use some resampling method to approximate the limit distribution.
4.6
Concluding remarks
In order to obtain a robust GMM-type estimator in the presence of misspecified moment conditions, a nonstandard criterion function is formulated as a squared weighted L2 -norm of a given moment vector minimized over the space of moment selection vectors. A robust estimator is then defined by the minimizer of such 51 In
general, the asymptotic variance gets smaller as we add (correct) moment conditions.
79
an objective function. The estimation framework proposed in this paper does not require any pre-selection procedures for choosing correct moment conditions, as opposed to the earlier literature on moment selection such as Andrews (1999), Andrews and Lu (2001), and Hong et al. (2003). This is because the same criterion function is used to select moment conditions and to estimate parameters simultaneously. This feature allows one to directly focus on exploring the asymptotic √ properties of the estimator. I establish that the resulting estimator is n-consistent and the limit distribution is characterized by the argmin of a certain random limit function that is free of any incorrect moment conditions. But the asymptotic distribution depends on the unknown number of correct moment conditions, so that the limit distribution result established in this paper might be more of theoretical interest than of practical importance.
80
CHAPTER 5
CONCLUSION
Three essays in this dissertation focus on developing new analytical frameworks or statistical methodologies to address some issues arising in applied econometrics literature. The first two essays in Chapters 2 and 3 concern the analytical and empirical issues involving the exponential of an integrated series, while the last essay in Chapter 4 proposes a GMM-type estimation that is robust to the presence of misspecified moment conditions. A limit distribution for sums or averages of the exponential-type functionals of integrated series, which has not been well established in the previous literature, is derived in Chapter 2. It is the extreme sample path realization that plays a crucial role in characterizing the limit behavior of sums of the exponential of an integrated series. Moreover, the resulting limit distribution is nicely characterized by the absolute value of standard normal random variable up to a constant, so that it should be amenable to being used in practice for statistical testing. The methodologies in Chapter 2 are further developed in Chapter 3 to provide a practical guidance as to the choice between cointegration in logarithms and cointegration in levels. When the cointegrating relation of our interest involves some variables for which it is
81
not clear which series is I(1) between the levels and the logarithms, cointegrating regression is sometimes specified in levels and sometimes in logarithms. To provide a relevant theoretical guidance in such situations, I develop testing procedures helpful in deciding whether one should take logarithms or not. To test for cointegration in logarithms, I propose test statistics based on the slope coefficient and the residual sum of squares from a regression in levels. Then by construction the test statistics involve sums of the exponential of an integrated series. By further developing the result established in Chapter 2, the asymptotic distributions of the proposed test statistics are derived under the assumption that cointegration in logarithms is the true data-generating process. In empirical applications, the test results suggest that the cointegration in logarithms between real stock prices and real dividends is not well supported in the data, while there is no significant evidence against the specification of long-run money demand function using the logarithm of the nominal interest rate. A separate issue on GMM estimation is considered in Chapter 4. In order to obtain a robust GMM-type estimator in the presence of misspecified moment conditions, a nonstandard criterion function is formulated as a squared weighted L2 -norm of a given moment vector minimized over the space of moment selection vectors. A robust estimator is then defined by the minimizer of the objective function. This approach uses the same criterion function to select moment conditions and to estimate parameters, and thus, as opposed to the earlier literature on moment selection, it does not require any pre-selection procedures for choosing correct moment conditions. This feature allows one to directly focus on exploring the asymptotic properties of the proposed robust estimator. It is shown that 82
the resulting estimator is
√
n-consistent and the limit distribution is characterized
by the argmin of a certain random limit function that does not depend on any incorrect moment conditions. The asymptotic distribution, however, depends on the unknown true number of correct moment conditions, so that the established limit distribution result might be more of theoretical interest than of practical importance. Also, the proposed robust estimation framework is illustrated with an example of a simple linear IV model and its performance is evaluated via a simulation study.
83
APPENDIX A
MATHEMATICAL PROOFS FOR CHAPTER 2
Proof of Lemma 2.1: Observe first that we can write ! ! n n an log ∑ exp f ( xt ) = an max f ( xt ) + an log ∑ exp f ( xt ) − max f ( xt ) . 1≤ t ≤ n
t =1
1≤ t ≤ n
t =1
But also observe that by construction, because f ( xt ) ≤ max1≤t≤n f ( xt ) and because the equality holds for at least one value of t, n
1 ≤
exp f ( x ) − max f ( x ) ≤ n, t t ∑ 1≤ t ≤ n
t =1
implying that n
0 ≤ an log
∑ exp
f ( xt ) − max f ( xt )
!
1≤ t ≤ n
t =1
≤ an log n.
(A.1)
Since it is assumed that an log n = o (1), it now follows that ! n d an log ∑ exp f ( xt ) = o p (1) + an max f ( xt ) −→ Y. 1≤ t ≤ n
t =1
Proof of Theorem 2.1: I will apply Lemma 2.1 with setting an = 1/κ (n1/2 ). Clearly, an log n → 0 by assumption. Let κ (n1/2 ) and r (n−1/2 xt , n1/2 ) be as defined in (2.4). Then p 1 −1/2 1/2 max r ( n x , n ) − → 0 t κ (n1/2 ) 1≤t≤n
84
under part (b) of Assumption 2.1. This is because (b)-1 implies that p 1 −1/2 1/2 1/2 1/2 −1/2 max r ( n x , n ) ≤ c ( ν ( n ) /κ ( n )) max g ( n x ) − → 0 t t 1≤ t ≤ n κ (n1/2 ) 1≤t≤n
since ν(λ)/κ (λ) → 0 as λ → ∞ by assumption and max1≤t≤n g(n−1/2 xt ) = O p (1) because g(·) is bounded on any compact set, while (b)-2 implies 1 max r (n−1/2 xt , n1/2 ) ≤ c(ν(n1/2 )/κ (n1/2 )) sup | g( x )| → 0. κ (n1/2 ) 1≤t≤n x ∈R Now the desired result immediately follows from the result of Lemma 2.1 since 1 −1/2 −1/2 1/2 1/2 max f ( xt ) = max h(n xt ) + r (n xt , n )/κ (n ) 1≤ t ≤ n κ (n1/2 ) 1≤t≤n d → sup h X (r ) , r ∈[0,1]
where the asserted convergence in distribution follows immediately if h(·) is continuous. For the case where h(·) is monotone, I only discuss the case where h(·) in nondecreasing since the other case is analogous. We can assume without loss of generality that x[nr] is a Skorokhod version satisfying a.s. sup n−1/2 x[nr] − X (r ) −→ 0.
r ∈[0,1]
Then for any δ > 0, for n large enough,
max h(n−1/2 xt ) = h max n−1/2 xt
1≤ t ≤ n
1≤ t ≤ n
"
∈
h sup X (r ) − δ , h sup X (r ) + δ r ∈[0,1]
# a.s.
r ∈[0,1]
Because supr∈[0,1] X (r ) ≤ K with arbitrarily large probability for large K, it suffices to show that for all K > 0, " # E h sup X (r ) + δ − h sup X (r ) − δ I sup X (r ) ≤ K → 0. r ∈[0,1]
r ∈[0,1]
r ∈[0,1]
This however follows easily from the continuity of the density of supr∈[0,1] X (r ), which completes the proof. 85
Proof of Corollary 2.1: Obviously, for f ( x ) = x, we can set κ (λ) = λ, h( x ) = x, and r ( x, λ) = 0 in Equation (2.3). Then the desired result immediately follows from Theorem 2.1 with λ = n1/2 .
86
APPENDIX B
USEFUL LEMMAS AND MATHEMATICAL PROOFS FOR CHAPTER 3
B.1
Notations
Define Mn = max1≤t≤n xt and ξ t,n = xt − Mn . Note that ξ t,n is a function of x1 , . . . , xn (and hence of w1 , . . . , wn ) and always nonpositive by definition. Also, define τn as the first index at which Mn is achieved, so that xτn = Mn or ξ τn ,n = 0; more precisely, τn can be defined as52 τn = min j ∈ {1, . . . , n} :
max ( xt − x j ) < 0, max ( xt − x j ) ≤ 0 .
1≤ t ≤ j −1
j +1≤ t ≤ n
(B.1)
Note that τn itself is a random variable.53
B.2
Useful lemmas and their proofs
Lemma B.1. Let xt and ut be any sequences of random variables. Let an > 0 be a scaling p
factor such that an log n → 0 and an max1≤t≤n |ut | −→ 0 as n → ∞. Then, for any 52 If
wt has a continuous distribution, then one may define τn as the index, rather than the “first” index, of the maximum since τn is unique, that is, τn = { j ∈ {1, . . . , n} : max1≤t< j ( xt − x j ) < 0, max j 0, n
p
an log ∑ exp(γξ t,n + ut ) −→ 0. t =1
Proof. Clearly n
n
t =1
t =1
an u(1) + an log ∑ exp(γξ t,n ) ≤ an log ∑ exp(γξ t,n + ut ) n
≤ an u(n) + an log ∑ exp(γξ t,n ), t =1
where u(1) = min ut and u(n) = max ut . But, an max |ut | = o p (1) by assump1≤ t ≤ n
1≤ t ≤ n
1≤ t ≤ n
tion. Thus n
n
t =1
t =1
an log ∑ exp(γξ t,n + ut ) = an log ∑ exp(γξ t,n ) + o p (1). But, an log ∑nt=1 exp(γξ t,n ) = o p (1) as well, as shown in (A.1) in the proof of Lemma 2.1 above. Corollary B.1. Under the same conditions as in Lemma B.1, n
an log ∑ exp(γxt + ut ) and γan max xt 1≤ t ≤ n
t =1
are asymptotically equivalent. Proof. Write n
n
an log ∑ exp(γxt + ut ) = γan max xt + an log ∑ exp(γξ t,n + ut ). t =1
1≤ t ≤ n
t =1
Then the desired result simply follows from Lemma B.1. Note that Lemma B.1 and Corollary B.1 hold for any sequences of random varip
ables xt and ut , as long as an log n → 0 and an max1≤t≤n |ut | −→ 0 as n → ∞.54 54 Lemma
2.1 can be viewed as a special case of Lemma B.1 and Corollary B.1 here with ut = 0
for all t.
88
Corollary B.2. Under Assumptions 3.1–3.3, n
n−1/2 log ∑ exp(γxt + ut ) ⇒ γλ sup W (r ). r ∈[0,1]
t =1
Proof. Since there exists some p > 3 such that E|ut | p < ∞ by assumption, we have for any δ > 0 1/2 P max |ut | > n δ ≤
n
∑ E|ut | p n− p/2 δ− p
1≤ t ≤ n
= n−( p−2)/2 E|ut | p δ− p → 0
t =1
as n → ∞; that is, p
n−1/2 max |ut | −→ 0.
(B.2)
1≤ t ≤ n
Now the desired result immediately follows from Corollary B.1 with an = n−1/2 , for n−1/2 max xt ⇒ λ supr∈[0,1] W (r ). 1≤ t ≤ n
Lemma B.2. Under Assumption 3.3, uτn = O p (1). Proof. Observe that n
lim sup P(|uτn | > K ) = lim sup E ∑ I (τn = j) I (|u j | > K ) n→∞
n→∞
j =1
n
∑
= lim sup n→∞
j=1, P(τn = j)>0
≤ lim sup n→∞
P(τn = j) P |u j | > K | τn = j
sup
P |u j | > K | τn = j
1≤ j≤n, P(τn = j)>0
→ 0 as K → ∞, since for any j such that P(τn = j) > 0 P |u j | > K | τn = j
= P |u j | > K, τn = j /P(τn = j)
= E[ I (τn = j) E( I (|u j | > K ) | w1 , . . . , wn )]/P(τn = j) ≤ f (K ) → 0 89
as K → ∞ by Assumption 3.3. Therefore, limK →∞ lim supn→∞ P(|uτn | > K ) = 0. Lemma B.3. Under Assumptions 3.1–3.2, n
sup E ∑ exp(γξ t,n ) < ∞ n ≥1
t =1
for any constant γ > 0. Proof. First observe that by the Beveridge-Nelson decomposition we can write xt = x˜ t + η˜ t − η˜0 + x0 , ∞ ˜ ˜ where x˜ t = ∑it=1 (ε i + ψ(1)ηi ) and η˜ t = ∑∞ j=0 ψ j ηt− j with ψ j = − ∑k = j+1 ψk . There
exists a C < ∞ such that |ηt | ≤ C since ηt is bounded. Note that η˜ t is then also ∞ ˜ ˜ bounded, for |η˜ t | ≤ C ∑∞ j=0 | ψ j |. Let a = C ∑ j=0 | ψ j | < ∞. Now observe that for
any γ > 0 E exp(γ( xt − Mn ))
=
Z ∞ 0
Z ∞
max ( xs − xt ) ≤ r
exp(−γr ) P
1≤ s ≤ n
dr
max ( x˜ s − x˜ t + η˜ s − η˜ t ) ≤ r dr Z ∞ ≤ exp(−γr ) P max ( x˜ s − x˜ t ) ≤ r + 2a dr 1≤ s ≤ n 0 Z ∞ = exp(−γr ) P max ( x˜ s − x˜ t ) ≤ r + 2a P max ( x˜ s − x˜ t ) ≤ r + 2a dr.
=
0
exp(−γr ) P
0
1≤ s ≤ n
1≤ s ≤ t
t +1≤ s ≤ n
Then the desired result follows from the same argument as in Davies and Krämer (2003, pp. 868–870). Lemma B.4. Under Assumptions 3.1–3.4, n
∑ exp(γξ t,n + ut )
t =1
for any constant γ > 0. 90
= O p (1)
Proof. By the Beveridge-Nelson decomposition we can write xt = x˜ t + η˜ t − η˜0 + x0 , ∞ ˜ ˜ where x˜ t = ∑it=1 ε˜ i with ε˜ i = ε i + ψ(1)ηi , and η˜ t = ∑∞ j=0 ψ j ηt− j with ψ j = − ∑k = j+1 ψk .
Note that x˜ t is described by a random walk driven by i.i.d. innovations ∆ x˜ t = ε˜ t . Define e n = max x˜ t M 1≤ t ≤ n
and observe that e n | ≤ max |η˜ t − η˜0 + x0 | = max | xt − x˜ t | , | Mn − M 1≤ t ≤ n
1≤ t ≤ n
˜ which is O p (1), for ηt bounded, ∑∞ j=0 | ψ j | < ∞, and x0 = O p (1) by assumption. Thus, for any γ > 0, the stochastic order of n
∑ exp(γ(xt − Mn ) + ut )
t =1
e n ) + ut ) and of would be the same as the stochastic order of ∑nt=1 exp(γ( xt − M e n ) + ut ). On the basis of this observation, I prove the desired ∑nt=1 exp(γ( x˜ t − M result by showing that n
∑ exp(γ(x˜t − Me n ) + ut ) = O p (1).
(B.3)
t =1
Observe that from (3.6) and (3.7) we can write ut =
∑ ψ1j∗ ε t− j + ∑ ψ2j∗ ηt− j = ∑ ψ1j∗ ε˜ t− j + ∑ ψ˜ 2j∗ ηt− j ,
j ≥0
j ≥0
j ≥0
(B.4)
j ≥0
∗ = ψ∗ − ψ (1) ψ∗ . Note that the second term on the most right side of where ψ˜ 2j 2j 1j ∗} ∗ (B.4) is bounded because {ψ1j j≥0 , { ψ2j } j≥0 , and { ψ j } j≥0 are absolutely summable
and ηt are bounded by assumption, so that we can ignore this term to prove (B.3). 91
Then the ut process in (B.4) satisfies the conditions for the ct process in de Jong ∗} (2010, Assumption 1) because {ψ1j j≥0 is absolutely summable and ε˜ t = ∆ x˜ t are
iid with mean zero and E|ε˜ t | p < ∞ for some p > 3; hence, the object on the left side of (B.3) has the same structure as the object Rn in de Jong (2010). Therefore, the result in (B.3) now follows from Theorem 1 of de Jong (2010).
B.3
Proofs of Theorems
Proof of Theorem 3.1: First, suppose an intercept term is not included in the regression. The OLS estimator of the slope coefficient is then given by exp(c) ∑nt=1 exp((θ + 1) xt + ut ) ∑n exp( xt ) exp(yt ) βˆ n = t=1 n = . ∑t=1 exp(2xt ) ∑nt=1 exp(2xt ) Take logarithms and then apply the scaling factor of n−1/2 on both sides to get n
n
n−1/2 log βˆ n = n−1/2 log ∑ exp((θ + 1) xt + ut ) − n−1/2 log ∑ exp(2xt ) + o (1), t =1
where o (1) term represents
t =1
n−1/2 c.
Then, by Lemma B.1 and Corollary B.1, we
have n−1/2 log
n
∑ exp(2xt )
!
d
−→ 2λ sup W (r )
(B.5)
r ∈[0,1]
t =1
and n
d
n−1/2 log ∑ exp((θ + 1) xt + ut ) −→ (θ + 1)λ sup W (r ).
(B.6)
r ∈[0,1]
t =1
Now the desired result follows because the convergences in distribution in (B.5) and (B.6) are joint. Next, consider the case where the regression contains an intercept term. The OLS estimate of the slope coefficient is now given by ec ∑nt=1 (e xt − e xt )eθxt +ut ∑ n 1 ( e xt − e xt ) e yt βˆ n = t= = , ∑nt=1 (e xt − e xt )2 ∑nt=1 (e xt − e xt )2 92
which can be written as e−c βˆ n =
∑nt=1 e(θ +1) xt +ut − n−1 ∑nt=1 e xt ∑nt=1 eθxt +ut ∑nt=1 e2xt − n−1 (∑nt=1 e xt )2
= e ( θ − 1 ) Mn
∑nt=1 e(θ +1)ξ t,n +ut − n−1 ∑nt=1 eξ t,n ∑nt=1 eθξ t,n +ut . ∑nt=1 e2ξ t,n − n−1 (∑nt=1 eξ t,n )2
Take logarithms and then apply the scaling factor n−1/2 on both sides to get n−1/2 log βˆ n = (θ − 1)n−1/2 Mn
+ n−1/2 log
n
n
∑ e(θ+1)ξ t,n +ut − n−1 ∑ eξ t,n ∑ eθξ t,n +ut
t =1
− n−1/2 log
n
t =1
t =1
n
n
n
!
t =1
t =1
t =1
∑ e2ξ t,n − n−1 ∑ eξ t,n ∑ eξ t,n
! (B.7)
+ o (1), where o (1) term represents n−1/2 c. Observe that it suffices to show that both the second and the third terms on the right-hand side of (B.7) are o p (1) to get the desired result. Denote by An and Bn the objects within the parentheses in the second and the third terms, respectively, and note that Lemma B.3 and Lemma B.4 make it clear that the second terms of An and Bn are O p (n−1 ), i.e., asymptotically negligible. Now observe that e
uτn
− O p (n
−1
n
) ≤ An ≤
∑ e(θ+1)ξ t,n +ut .
t =1
But uτn = O p (1) by Lemma B.2 and ∑nt=1 e(θ +1)ξ t,n +ut = O p (1) by Lemma B.4. Hence, n−1/2 log An = o p (1). Observe also that e2ξ τn ,n − O p (n−1 ) ≤ Bn ≤
n
∑ e2ξ t,n .
t =1
But ξ τn ,n = 0 by definition and ∑nt=1 e2ξ t,n = O p (1) by Lemma B.3. Hence, n−1/2 log Bn = o p (1).
93
Proof of Theorem 3.2: First, suppose an intercept term is not included in the regression. The residual sum of squares is then given by n
Qn
n
= ∑ (e − βˆ n e xt )2 =
∑e
yt
t =1
2yt
− 2 βˆ n
t =1
n
∑e
xt +yt
+ βˆ 2n
t =1
n
∑ e2xt ,
(B.8)
t =1
which can be written using βˆ n = ∑nt=1 e xt +yt ∑nt=1 e2xt as 2 n ∑nt=1 e(θ +1) xt +ut , e−2c Qn = ∑ e2(θxt +ut ) − ∑nt=1 e2xt t =1 which in turn can be written
n
e−2c Qn = e2θMn ∑ e2(θξ t,n +ut ) − t =1
∑nt=1 e(θ +1)ξ t,n +ut ∑nt=1 e2ξ t,n
2 .
(B.9)
Multiply both sides of (B.9) by ∑nt=1 e2ξ t,n and rearrange to get e − 2 ( θ Mn + c ) Q n
n
n
n
n
t =1
t =1
t =1
t =1
∑ e2ξ t,n = ∑ e2ξ t,n ∑ e2(θξ t,n +ut ) − ∑ e(θ+1)ξ t,n +ut
!2 .
(B.10)
Let Gn denote the right-hand side of (B.10), i.e., Gn =
n
n
n
t =1
t =1
t =1
∑ e2ξ t,n ∑ e2(θξ t,n +ut ) − ∑ e(θ+1)ξ t,n +ut
!2 .
(B.11)
n−1/2 log Qn = 2θn−1/2 Mn + n−1/2 log Gn + o p (1),
(B.12)
Then we can write
where o p (1) term represents 2n−1/2 c − n−1/2 log ∑nt=1 e2ξ t,n (due to Lemma B.1), and observe that the desired result immediately follows, provided p
n−1/2 log Gn −→ 0.
(B.13)
But it is clear from (B.11) that Gn is bounded above by n
Un =
∑e
t =1
2ξ t,n
n
∑ e2(θξ t,n +ut )
t =1
94
(B.14)
and it follows from Lemma B.1 that n
n
t =1
t =1
n−1/2 log Un = n−1/2 log ∑ e2ξ t,n + n−1/2 log ∑ e2(θξ t,n +ut ) = o p (1). Now, to complete the proof of (B.13), we need to show that (B.13) holds as well for the lower bound of Gn . To identify a lower bound of Gn , observe that (B.11) can be rewritten as55 2 1 n n ξ s,n θξ t,n +ut ξ t,n θξ s,n +us e e − e e ∑ 2 s∑ =1 t =1 2 n = ∑ ∑ eξ s,n eθξ t,n +ut − eξ t,n eθξ s,n +us ,
Gn =
s > t t =1
so that, recalling ∆ξ t,n = ∆xt = wt , we have n
Gn ≥
=
∑
t =1 n
∑
eξ t+1,n eθξ t,n +ut − eξ t,n eθξ t+1,n +ut+1
2
ewt+1 e(1+θ )ξ t,n +ut − e(1+θ )ξ t,n eθwt+1 +ut+1
2
t =1
≥ e
2(1+θ )ξ τn ,n
e
wτn +1 +uτn
−e
θwτn +1 +uτn +1
= e2wτn +1 e2uτn 1 − e(θ −1)wτn +1 euτn +1 −uτn
2
2
and hence n
−1/2
log Gn ≥ 2n
−1/2
wτn +1 + 2n
−1/2
uτn + 2n
−1/2
(θ −1)wτn +1 +∆uτn +1 log 1 − e .
But n−1/2 uτn = o p (1) since uτn = O p (1) by Lemma B.2 and p
n−1/2 |wτn +1 | ≤ n−1/2 max |wt | −→ 0 1≤ t ≤ n
= that since ∑in=1 ∑nj=1 ( ai b j − a j bi )2 ∑in=1 a2i ∑nj=1 b2j + ∑nj=1 a2j ∑in=1 bi2 − 2 ∑in=1 ai bi ∑nj=1 a j b j , collecting together identical terms (albeit with different summation indices) we have !2 n n n 1 n n 2 2 2 ∑ ( a i b j − a j bi ) = ∑ a i ∑ bi − ∑ a i bi . 2 i∑ =1 j =1 i =1 i =1 i =1 55 Observe
95
since for any δ > 0 P
max |wt | > n
1/2
1≤ t ≤ n
n
δ
≤
∑ E|wt | p n− p/2 δ− p
= n−( p−2)/2 E|wt | p δ− p → 0
t =1
as n → ∞, for E|wt | p < ∞ for some p > 3 by Assumption 3.2. Thus, to complete the proof of (B.13), it suffices to show that lim sup P (|(θ − 1)wτn +1 + ∆uτn +1 | ≤ δ) → 0 n→∞
as δ → 0. To see this, observe that P (|(θ − 1)wτn +1 + ∆uτn +1 | ≤ δ)
= P ((1 − θ )wτn +1 − δ ≤ ∆uτn +1 ≤ (1 − θ )wτn +1 + δ) n
= E ∑ I (τn = j) I ((1 − θ )w j+1 − δ ≤ ∆u j+1 ≤ (1 − θ )w j+1 + δ) j =1 n
= =
∑ EI (τn = j) P((1 − θ )w j+1 − δ ≤ ∆u j+1 ≤ (1 − θ )w j+1 + δ | w1, . . . , wn )
j =1 n
Z (1 − θ ) w j +1 + δ
j =1
(1 − θ ) w j +1 − δ
∑ EI (τn = j)
p∆u j+1 |w1 ,...,wn (r ) dr
≤ 2δB for some B < ∞, since the conditional density of ∆ut given w1 , . . . , wn , p(∆ut | w1 , . . . , wn ), is bounded by Assumption 3.3. Next, consider the case where the regression contains an intercept term. Let αˆ n denote the OLS estimate of an intercept term. The residual sum of squares is now
96
given by n
∑ (eyt − αˆ n − βˆ n ext )2
Qn =
t =1 n
∑
=
(B.15)
2 eyt − eyt − βˆ n e xt − e xt
t =1
n xt − e xt e yt 2 e ∑ t = 1 = ∑ e − e yt − 2 ∑nt=1 e xt − e xt t =1 n 2 xt − e xt ec+θxt +ut 2 n e ∑ t =1 = ∑ ec+θxt +ut − ec+θxt +ut − 2 ∑nt=1 e xt − e xt t =1 2 n ξ θξ + u ξ t,n t,n t t,n 2 n e ∑ t =1 e − e = e2(c+θ Mn ) ∑ eθξ t,n +ut − eθξ t,n +ut − , 2 n ξ ξ t,n t,n t =1 ∑ t =1 e − e n
2
yt
which can be rearranged as e − 2 ( c + θ Mn ) Q n
n
∑
eξ t,n − eξ t,n
2
t =1 n
=
∑
eξ t,n − eξ t,n
2
t =1
n
∑
eθξ t,n +ut − eθξ t,n +ut
2
n
−
t =1
∑
!2
eξ t,n − eξ t,n eθξ t,n +ut
.
t =1
(B.16) Let G¯ n denote the right-hand side of (B.16), i.e., G¯ n =
n
∑
eξ t,n − eξ t,n
t =1
2
n
∑
eθξ t,n +ut − eθξ t,n +ut
2
n
−
t =1
∑
!2
eξ t,n − eξ t,n eθξ t,n +ut
.
t =1
(B.17) Then we can write n−1/2 log Qn = 2θn−1/2 Mn + n−1/2 log G¯ n + o p (1),
(B.18)
2 where o p (1) term represents 2n−1/2 c − n−1/2 log ∑nt=1 eξ t,n − eξ t,n as is already shown in the proof of Theorem 3.1, and observe that the desired result immediately follows, provided p n−1/2 log G¯ n −→ 0.
97
(B.19)
To prove (B.19), it suffices to show that G¯ n is asymptotically equivalent to Gn given by (B.11) because we have already proved above that (B.13) holds. Now observe that by Lemma B.3 and Lemma B.4 each summation term on the right-hand side of (B.17) can be written as n
∑
e
ξ t,n
− eξ t,n
2
n
=
t =1 n
∑
∑e
2ξ t,n
t =1
eθξ t,n +ut − eθξ t,n +ut
2
t =1
!2
n
1 − n
∑e
ξ t,n
n
=
t =1
n
∑ e2ξ t,n − O p (n−1 ),
t =1
1 = ∑ e2(θξ t,n +ut ) − n t =1
n
∑ eθξ t,n +ut
!2
t =1
n
=
∑ e2(θξ t,n +ut ) − O p (n−1 ),
t =1
and n
∑
e
ξ t,n
− eξ t,n
e
θξ t,n +ut
n
=
∑e
(θ +1)ξ t,n +ut
t =1 n
t =1
=
1 n ξ t,n n θξ t,n +ut − ∑e ∑e n t =1 t =1
∑ e(θ+1)ξ t,n +ut − O p (n−1 ),
t =1
which imply that G¯ n = Gn + o p (1) as is to be shown. Proof of Theorem 3.3: The desired result immediately follows from Theorems 3.1 and 3.2, since 1 n−1/2 log Tn = n−1/2 log βˆ n − n−1/2 log Qn 2 and d d n−1/2 log βˆ n −→ (θ − 1)λ sup W (r ) and n−1/2 log Qn −→ 2θλ sup W (r ) jointly. r ∈[0,1]
r ∈[0,1]
98
Proof of Theorem 3.4: First, suppose an intercept term is not included in the regression. The residual sum of squares is then given by n
en = Q
∑
t =1 n
=
∑
t =1
yt − β˜ n e xt y2t
2
(∑nt=1 yt e xt )2 − ∑nt=1 e2xt
∑nt=1 (c + θxt + ut )eξ t,n = ∑ (c + θxt + ut )2 − ∑nt=1 e2ξ t,n t =1 n
2
and so en n −2 Q
n−1 ∑nt=1 (c + θxt + ut )eξ t,n = n−2 ∑ (c + θxt + ut )2 − ∑nt=1 e2ξ t,n t =1 n
2 .
Observe that 1 ≤ ∑nt=1 e2ξ t,n = O p (1) by Lemma B.3, n −2
n
∑ (c + θxt + ut )2
t =1
n
= θ 2 n−2 ∑ xt2 + o p (1), t =1
and n −1
n
∑ (c + θxt + ut )eξ t,n
= o p (1)
t =1
since ∑nt=1 xt eξ t,n ≤ max1≤t≤n | xt | ∑nt=1 eξ t,n = O p (n1/2 ) and n n ξ t,n ≤ max ξ t,n = o ( n1/2 ) by (B.2) and Lemma B.3. There ∑ p 1≤ t ≤ n | u t | ∑ t =1 e t =1 u t e fore, we have d
e n −→ θ 2 λ2 n −2 Q
Z 1 0
W 2 (r ) dr.
Next, consider the case where the regression contains an intercept term. Let αˆ n denote the OLS estimate of an intercept term. The residual sum of squares is now
99
given by n
en = Q
yt − α˜ n − β˜ n e xt
∑
(yt − yt ) − β˜ n e xt − e xt
t =1 n
=
2
∑
2
t =1
2 ∑nt=1 e xt − e xt yt = ∑ (yt − yt ) − 2 ∑nt=1 e xt − e xt t =1 2 n ξ ξ t,n t,n n ∑t=1 (c + θxt + ut ) e − e = ∑ (θ ( xt − xt ) + (ut − ut ))2 − 2 n ξ t,n − eξ t,n t =1 e ∑ t =1 n
2
and so 2 n−1 ∑nt=1 (c + θxt + ut ) eξ t,n − eξ t,n e n = n−2 ∑ (θ ( xt − xt ) + (ut − ut ))2 − . n −2 Q 2 n ξ ξ t,n t,n t =1 ∑ t =1 e − e 2 n ξ ξ t,n t,n Observe that 0 < ∑t=1 e − e = O p (1) by Lemma B.3,
n
n −2
n
n
t =1
t =1
∑ (θ (xt − xt ) + (ut − ut ))2 = θ 2 n−2 ∑ (xt − xt )2 + o p (1),
and n −1
n
ξ t,n ξ t,n e = o p (1) ( c + θx + u ) e − t t ∑
t =1
since ∑n
xt ≤ max1≤t≤n | xt | ∑nt=1 eξ t,n = O p (n1/2 ), n n ξ t,n ≤ max ξ t,n = o ( n1/2 ) by (B.2) and Lemma B.3, and ∑ p 1≤ t ≤ n | u t | ∑ t =1 e t =1 u t e t =1
eξ t,n
eξ t,n = O p (n−1 ) by Lemma B.3. Hence, we have n where W∗ (r ) = W (r ) −
−2
R1 0
e n −d→ θ 2 λ2 Q
Z 1 0
W∗2 (r ) dr,
W (s) ds is the demeaned standard Brownian motion
and note that Z 1 0
W∗2 (r ) dr
=
Z 1 0
2
W (r ) dr −
100
2
1
Z 0
W (r ) dr
.
APPENDIX C
MATHEMATICAL PROOFS FOR CHAPTER 4
Proof of Lemma 4.1: See Newey and McFadden (1994, Lemma 2.4). Proof of Lemma 4.2: By the triangle and the Cauchy-Schwarz inequalities, we have56 f n (θ, s) − f (θ, s) = g¯ s (θ )0 Wn (s) g¯ s (θ ) − gs0 (θ )0 W (s) gs0 (θ ) ≤ ( g¯ s (θ ) − gs0 (θ ))0 Wn (s)( g¯ s (θ ) − gs0 (θ )) + gs0 (θ )0 (Wn (s) + Wn (s)0 )( g¯ s (θ ) − gs0 (θ )) + gs0 (θ )0 (Wn (s) − W (s)) gs0 (θ )
≤ k g¯ s (θ ) − gs0 (θ )k2 kWn (s)k + 2k gs0 (θ )kk g¯ s (θ ) − gs0 (θ )kkWn (s)k + k gs0 (θ )k2 kWn (s) − W (s)k ≤ k g¯ (θ ) − g0 (θ )k2 kWn (s)k + 2k g0 (θ )kk g¯ (θ ) − g0 (θ )kkWn (s)k + k g0 (θ )k2 kWn (s) − W (s)k, where the last inequality follows from the fact that kvs k ≤ kvk for any v ∈ Rr and any s ∈ S p , since vs is a p-subvector of v. Note that g0 (θ ) is bounded on Θ by
p condition (ii) of Assumption 4.1, supθ ∈Θ g¯ (θ ) − g0 (θ ) −→ 0 by Lemma 4.1, and 56 Let
k Ak denote the Euclidean norm of a vector or matrix A, i.e., k Ak = [tr( A0 A)]1/2 .
101
p
Wn (s) −→ W (s) by condition (v) of Assumption 4.1. Hence the asserted uniform convergence follows. Proof of Theorem 4.1: Observe from Equations (4.12)–(4.14) that 0 ≤ f (θˆn , sˆm , sˆc ) − f (θ0 , 0, sˆc )
= f (θˆn , sˆm , sˆc ) − f n (θˆn , sˆm , sˆc ) + f n (θˆn , sˆm , sˆc ) − f (θ0 , 0, sˆc ) ≤ f (θˆn , sˆm , sˆc ) − f n (θˆn , sˆm , sˆc ) + f n (θ0 , 0, sˆc ) − f (θ0 , 0, sˆc ) p ≤ 2 sup f n (θ, sm , sc ) − f (θ, sm , sc ) −→ 0 θ,sc ,sm
by the uniform convergence result of Lemma 4.2. Thus, f (θˆn , sˆm , sˆc ) → f (θ0 , 0, sˆc ), implying p
(θˆn , sˆm ) −→ (θ0 , 0), provided for any δ > 0 and sm 6= 0 inf
inf inf f (θ, sm , sc ) > inf f (θ0 , 0, sc ),
θ ∈Θ:|θ −θ0 |>δ sm 6=0 sc
sc
or equivalently inf
e (θ, sm ) > Q e ( θ0 , 0), inf Q
θ ∈Θ:|θ −θ0 |>δ sm 6=0
e (θ, sm ) is uniquely minimized at (θ0 , 0), which however follows i.e., provided Q from condition (vi) of Assumption 4.1; see also Equation (4.10) along with the arguments therein. Proof of Lemma 4.3: See Kim and Pollard (1990).
102
Proof of Lemma 4.4: Let (θˆn , sˆ) be the minimizer of f n (θ, s) = g¯ s (θ )0 Wn (s) g¯ s (θ ) defined in (4.3). By the mean value expansion we have g¯ s (θˆn ) = g¯ s (θ0 ) + (∂/∂θ 0 ) g¯ s (θ˜n )(θˆn − θ0 ), where θ˜n lies in between θˆn and θ0 . Now, observe that
0 (∂/∂θ 0 ) g¯ sˆ (θ˜n )n1/2 (θˆn − θ0 ) Wn (sˆ) (∂/∂θ 0 ) g¯ sˆ (θ˜n )n1/2 (θˆn − θ0 )
= (n1/2 g¯ sˆ (θˆn ) − n1/2 g¯ sˆ (θ0 ))0 Wn (sˆ)(n1/2 g¯ sˆ (θˆn ) − n1/2 g¯ sˆ (θ0 )) ≤ 2(n1/2 g¯ sˆ (θˆn ))0 Wn (sˆ)(n1/2 g¯ sˆ (θˆn )) + 2(n1/2 g¯ sˆ (θ0 ))0 Wn (sˆ)(n1/2 g¯ sˆ (θ0 )) ≤ 4(n1/2 g¯ sˆ (θ0 ))0 Wn (sˆ)(n1/2 g¯ sˆ (θ0 )). p
Since sˆm −→ 0 by Theorem 4.1, we have Λm ⊂ Λ0 or Λ1 ⊂ Λc in the limit, implying all the moments in g¯ sˆ (θ0 ) are correct asymptotically. Thus, n1/2 g¯ sˆ (θ0 ) converges in distribution to a normal random variable by the CLT and hence O p (1). Therefore,
0 (∂/∂θ 0 ) g¯ sˆ (θ˜n )n1/2 (θˆn − θ0 ) Wn (sˆ) (∂/∂θ 0 ) g¯ sˆ (θ˜n )n1/2 (θˆn − θ0 ) = O p (1). (C.1) p
Note that Wn (s) −→ W (s) ∀ s ∈ S p by condition (v) of Assumption 4.1. Also, condition (iv) of Assumption 4.2 implies (∂/∂θ 0 ) g¯ sˆ (θ ) follows a weak ULLN, which p combined with the consistency of θˆn implies that (∂/∂θ 0 ) g¯ sˆ (θ˜n ) −→ G∗ , where G∗
is a p × k matrix. Finally, since G is of full column rank by condition (iv) of Assumption 4.2, so is G∗ , for p ≥ k by supposition. Hence, Equation (C.1) implies that n1/2 (θˆn − θ0 ) is O p (1).57 57 A
detailed treatment for the last statement can be found in de Jong and Han (2002, p. 502).
103
Proof of Lemma 4.5: With fixed ξ, condition (ii) of Assumption 4.2 ensures that θ0 + ξn−1/2 ∈ Θ for n large enough. When this happens, we can write hn (ξ ) = n1/2 g¯ (θ0 + ξn−1/2 ) ˜ −1/2 )ξ = n1/2 g¯ (θ0 ) + (∂/∂θ 0 ) g¯ (θ0 + ξn ˜ −1/2 )ξ, = n1/2 E g¯ (θ0 ) + n1/2 [ g¯ (θ0 ) − E g¯ (θ0 )] + (∂/∂θ 0 ) g¯ (θ0 + ξn
(C.2)
where ξ˜ lies in between ξ and 0. Notice that the jth component of the first term on the right hand side of (C.2) would either be a zero or diverge depending on whether the jth moment condition is correct or not, for E g¯ j (θ0 ) = 0 if the jth moment is correct and E g¯ j (θ0 ) 6= 0 otherwise. That is, each component in the first term, n1/2 E g¯ (θ0 ), will be either 0 or O(n1/2 ) depending on whether the corresponding moment condition is correct or not. Denote by x the limit of this first term. Then we can write the jth component, x j , j = 1, . . . , r, of x as ( 0 if E g¯ j (θ0 ) = 0, |xj | = ∞ if E g¯ j (θ0 ) 6= 0. The second term on the rhs of (C.2) is asymptotically normal by an ordinary CLT under condition (iii) of Assumption 4.2, since it is a normalized sum of mean zero random variables.58 That is, we have n
1/2
1 n d √ ¯ ¯ [ g(θ0 ) − E g(θ0 )] = [ g(zi , θ0 ) − Eg(zi , θ0 )] −→ N (0, Ω), ∑ n i =1
where Ω = E[ g(zi , θ0 ) g(zi , θ0 )0 ] − [ Eg(zi , θ0 )][ Eg(zi , θ0 )]0 58 If
all the moment conditions are correct as in the standard GMM estimation framework, then we can apply the CLT to n1/2 g¯ (θ0 ), which is however no longer a sum of mean zero random variables in the presence of incorrect moment conditions.
104
for the case where zi are iid.59 Let y denote the limiting random variable for this second term; that is, y denotes a Rr -valued random vector distributed according to N (0, Ω). For the third term on the rhs of (C.2), condition (iv) of Assumption 4.2 p ˜ −1/2 → 0 implies that (∂/∂θ 0 ) g¯ (θ0 + ξn ˜ −1/2 )ξ −→ together with the fact that ξn Gξ,
where G = E(∂/∂θ 0 ) g¯ (θ0 ). Then we can write as n → ∞ hn (ξ ) → x + y + Gξ and d
Hn (ξ ) = inf hns (ξ )0 Wn (s)hns (ξ ) −→ inf ( xs + ys + Gs ξ )0 W (s)( xs + ys + Gs ξ ). s∈S p
s∈S p
(C.3) Let H (ξ ) = inf ( xs + ys + Gs ξ )0 W (s)( xs + ys + Gs ξ ). s∈S p
(4.17)
Notice that it must be the case that Λm ⊂ Λ0 or Λ1 ⊂ Λc to minimize ( xs + ys + Gs ξ )0 W (s)( xs + ys + Gs ξ ) over s ∈ S p ; otherwise, H (ξ ) = ∞ since | x j | = ∞ for any j ∈ Λm . In other words, ( xs + ys + Gs ξ )0 W (s)( xs + ys + Gs ξ ) is minimized only if the p moment conditions selected by s are all correct (in the limit). Thus, we may rewrite H (ξ ) as H (ξ ) = inf∗ (ys + Gs ξ )0 W (s)(ys + Gs ξ ), s∈S p
(4.18)
where the set S p∗ is as defined in (4.19). Finally, the convergence of the finitedimensional distributions of Hn (ξ ) to those of H (ξ ) follows from the Cramér-Wold device. 59 In
h i general, Ω = limn→∞ Var n−1/2 ∑in=1 g(zi , θ0 ) .
105
Proof of Lemma 4.6: Using the definition of Hn (·) and the truncation argument, we can write
| Hn (ξ 1 ) − Hn (ξ 2 )| = inf hns (ξ 1 )0 Wn (s)hns (ξ 1 ) − inf hns (ξ 2 )0 Wn (s)hns (ξ 2 ) I (|ξ 1 − ξ 2 | < δn ) s∈S p s∈S p + inf hns (ξ 1 )0 Wn (s)hns (ξ 1 ) − inf hns (ξ 2 )0 Wn (s)hns (ξ 2 ) I (|ξ 1 − ξ 2 | ≥ δn ), s∈S p
s∈S p
where hns (ξ i ) = n1/2 g¯ s (θ0 ) + (∂/∂θ 0 ) g¯ s (θ0 + ξ˜i n−1/2 )ξ i ∈ R p ,
i = 1, 2,
for n large enough. Note that there must be some δn small enough for which the same s ∈ S p , say s˜,60 can minimize both hns (ξ 1 )0 Wn (s)hns (ξ 1 ) and hns (ξ 2 )0 Wn (s)hns (ξ 2 ) if |ξ 1 − ξ 2 | < δn . Then, we have for n large enough and δn small enough
| Hn (ξ 1 ) − Hn (ξ 2 )| = hns˜ (ξ 1 )0 Wn (s˜)hns˜ (ξ 1 ) − hns˜ (ξ 2 )0 Wn (s˜)hns˜ (ξ 2 ) I (|ξ 1 − ξ 2 | < δn ) + Hn (ξ 1 ) − Hn (ξ 2 ) I (|ξ 1 − ξ 2 | ≥ δn ) = (hns˜ (ξ 1 ) − hns˜ (ξ 2 ))0 Wn (s˜)(hns˜ (ξ 1 ) + hns˜ (ξ 2 )) I (|ξ 1 − ξ 2 | < δn ) + Hn (ξ 1 ) − Hn (ξ 2 ) I (|ξ 1 − ξ 2 | ≥ δn ) = [(∂/∂θ 0 ) g¯ s˜ (θ0 + ξ˜1 n−1/2 )ξ 1 − (∂/∂θ 0 ) g¯ s˜ (θ0 + ξ˜2 n−1/2 )ξ 2 ]0 Wn (s˜) [2n1/2 g¯ s˜ (θ0 ) + (∂/∂θ 0 ) g¯ s˜ (θ0 + ξ˜1 n−1/2 )ξ 1 + (∂/∂θ 0 ) g¯ s˜ (θ0 + ξ˜2 n−1/2 )ξ 2 ] × I (|ξ 1 − ξ 2 | < δn ) + Hn (ξ 1 ) − Hn (ξ 2 ) I (|ξ 1 − ξ 2 | ≥ δn ). Then, under condition (iv) of Assumption 4.2, by the weak ULLN for (∂/∂θ 0 ) g¯ (θ ) and ξ˜i n−1/2 → 0 uniformly over all ξ 1 and ξ 2 , for δ sufficiently small enough sup ξ 1 ,ξ 2 ∈Ξ:|ξ 1 −ξ 2 |