Sharp Threshold Detection Based on Sup-norm Error Rates in High-dimensional Models Laurent Callot∗
Mehmet Caner†
Anders Bredahl Kock‡
and Juan Andres Riquelme§ May 3, 2015
Abstract We propose a new estimator, the thresholded scaled Lasso, in high dimensional threshold regressions. First, we establish an upper bound on the $\ell_\infty$ estimation error of the scaled Lasso estimator of Lee et al. (2015). This is a non-trivial task as the literature on high-dimensional models has focused almost exclusively on $\ell_1$ and $\ell_2$ estimation errors. We show that this sup-norm bound can be used to distinguish between zero and non-zero coefficients at a much finer scale than would have been possible using classical oracle inequalities. Thus, our sup-norm bound is tailored to consistent variable selection via thresholding. Our simulations show that thresholding the scaled Lasso yields substantial improvements in terms of variable selection. Finally, we use our estimator to shed further
VU University Amsterdam, Department of Econometrics and Operations Research, CREATES, and the Tinbergen Institute. Email:
[email protected] † North Carolina State University, Department of Economics, 4168 Nelson Hall, Raleigh, NC 27695. Email:
[email protected] ‡ Corresponding author. Aarhus University, Department of Economics and Business, and CREATES Center for Research in Econometric Analysis of Time Series (DNRF78), funded by the Danish National Research Foundation. Fuglesangs Alle 4, 8210, Aarhus V Denmark. Email:
[email protected] § North Carolina State University, Department of Economics, financial support of CONICYT-Chile (Comision Nacional de Investigacion Cientifica y Tecnologica) is gratefully acknowledged.
empirical light on the long running debate on the relationship between the level of debt (public and private) and GDP growth. Keywords and phrases: Threshold model, sup-norm bound, thresholded scaled Lasso, oracle inequality, debt effect on GDP growth. JEL classification: C13, C23, C26.
1
Introduction
Threshold models have been heavily studied and used in the past twenty years or so. In econometrics the seminal articles by Hansen (1996) and Hansen (2000) showed that least squares estimation of threshold models is feasible. These papers show how to test for the presence of a threshold and how to estimate the remaining parameters by least squares. Later, Caner and Hansen (2004) provided instrumental variable estimation of the threshold. These authors derived the limits for the threshold parameter in the reduced form as well as in the structural equations. There have been many applications of threshold models to cross-section data. One of the most recent ones is the analysis of the public debt to GDP ratio in a threshold regression model by Caner et al. (2010). In the context of time series we refer to the articles by Caner and Hansen (2001), Seo (2006), Seo (2008), and Hansen and Seo (2002). Lin (2014) considers the adaptive Lasso in a high dimensional quantile threshold model. Hansen (1999), Linton and Seo (2007), and Caner (2002) made contributions to panel data, semi-parametric, and least absolute deviation models, respectively. For applications to stock markets and exchange rates we refer to Akdeniz et al. (2003) and Basci and Caner (2006). These authors argue that threshold models can contribute to reducing forecast errors.

To be precise, we shall study the model
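$$Y_i = X_i'\beta_0 + X_i'\delta_0 1\{Q_i < \tau_0\} + U_i, \quad i = 1, \dots, n.$$

[...]

A random variable $Z$ is said to be subgaussian if there exist positive constants $A$ and $B$ such that $P(|Z| > t) \le A e^{-Bt^2}$ for all $t > 0$. $Z$ is said to be subexponential if there exist positive constants $C$ and $D$ such that $P(|Z| > t) \le C e^{-Dt}$ for all $t > 0$. For $x \in \mathbb{R}^k$, we will let $x^{(j)}$ denote its $j$th entry. Let "wpa1" denote with probability approaching one.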
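To make the two tail conditions concrete, consider a standard normal variable as a worked example. The constants below come from the usual Gaussian tail bound and are an illustration only; they are not taken from the paper.

```latex
Z \sim N(0,1): \quad P(|Z| > t) \le 2 e^{-t^2/2} \ \text{for all } t > 0,
\quad \text{so } Z \text{ is subgaussian with } A = 2,\ B = 1/2.
\\
P(Z^2 > t) = P(|Z| > \sqrt{t}) \le 2 e^{-t/2} \ \text{for all } t > 0,
\quad \text{so } Z^2 \text{ is subexponential with } C = 2,\ D = 1/2.
```

The second line is a special case of the fact, used in the appendix, that a product of two subgaussian variables is subexponential.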
2
Scaled Lasso for Threshold Regression
Defining the 2m × 1 vectors Xi (τ ) = Xi0 , Xi0 1{Qi cη,
For this assumption Lee et al. (2015) (pages A7-A8) also provide sufficient conditions encompassing the assumptions made in Assumption 1 above. Assumption 4. (Smoothness of Design). For any η > 0, there exists a constant C < ∞ such that wpa1 n 1 X (j) (k) sup sup Xi Xi |1{Qi q one has 3Cλ. Then, for all > 0 there exists a C such that for H = 2Cλ = 2C log(m) n ˜ = J(δ0 ) ≥ 1 − as n → ∞. P J(δ) Threshold selection consistency is weaker than model selection consistency as it only requires classifying δ0 correctly. However, it is still relevant as it answers the question whether a threshold is present or not. We discuss how to choose the threshold parameter C in practice in Section 5.
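The thresholding step and the data-driven choice of C can be sketched as follows. This is a minimal Python illustration, not the R code of the replication material. The form of the BIC used here (n log(RSS/n) plus log(n) times the number of retained coefficients) is an assumption, since the criterion is not spelled out in the text, and the function names and toy numbers are hypothetical.

```python
import math

def hard_threshold(coefs, level):
    """Keep coefficients with |c| >= level and set the others to zero."""
    return [c if abs(c) >= level else 0.0 for c in coefs]

def bic(y, X, coefs):
    """Assumed criterion: n*log(RSS/n) + log(n)*(number of non-zero coefficients)."""
    n = len(y)
    rss = sum((yi - sum(c * x for c, x in zip(coefs, row))) ** 2
              for yi, row in zip(y, X))
    k = sum(1 for c in coefs if c != 0.0)
    return n * math.log(rss / n) + math.log(n) * k

def select_threshold(y, X, coefs, lam, grid):
    """Return the grid value C minimising the assumed BIC, and the thresholded fit."""
    best_c = min(grid, key=lambda c: bic(y, X, hard_threshold(coefs, c * lam)))
    return best_c, hard_threshold(coefs, best_c * lam)

# Toy first-step estimate (hypothetical numbers); the second covariate is irrelevant.
x1 = [1.0, 2.0, 1.0, 2.0, 1.0, 2.0]
x3 = [1.0, -1.0, 2.0, 0.0, 1.0, -2.0]
noise = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1]
X = [[a, 0.0, b] for a, b in zip(x1, x3)]
y = [a - 2.0 * b + e for a, b, e in zip(x1, x3, noise)]
first_step = [1.0, 0.05, -2.0]
best_c, thresholded = select_threshold(y, X, first_step, lam=0.3, grid=[0.1, 1.0, 5.0])
```

In this toy example the smallest grid value keeps the spurious coefficient 0.05, while the largest also removes a relevant one; the intermediate value removes only the spurious coefficient.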
5
Simulations
In this section we report the results of a series of simulation experiments evaluating the finite sample properties of the thresholded scaled Lasso. We focus in turn on the following dimensions: the scale of the parameters, the number of observations, estimation in the absence of a threshold, and the dependence between the threshold variable and the covariates. Results focusing on increasing numbers of zero or non-zero variables are available in the supplementary material.

The regressors are generated as $X_i \sim N(0, I)$, the threshold variable as $Q_i \sim U[0, 1]$, and the innovations as $U_i \sim N(0, \sigma^2)$, where we set the residual variance $\sigma^2 = 0.25$, $i = 1, ..., n$. When the threshold parameter $\tau_0$ is not explicitly stated it is set to $\tau_0 = 0.5$; we search for $\tau_0$ over a grid from 0.15 to 0.85 by steps of 0.05. This grid is coarser than the grid used in Lee et al. (2015). In our experience, this has a mild detrimental effect on the precision with which $\tau_0$ is estimated but not on other measures of the quality of the estimator, while it substantially reduces computation time, thus allowing us to carry out more replications. We select the thresholding parameter C by BIC using a grid from 0.1 to 5, so that parameters smaller (in absolute value) than $\hat{C}\hat{\lambda}$ are set to zero by the thresholded scaled Lasso. Every model is estimated with an intercept so that we estimate 2m + 1 parameters, plus the threshold parameter $\tau_0$. All the results reported below are based on 1000 replications.
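The data generating process just described can be sketched as follows. This is a minimal Python illustration under the stated design (standard normal regressors, uniform threshold variable, error variance 0.25, threshold indicator 1{Q_i < τ0}); the function names and the seed handling are hypothetical and do not come from the replication material, which is written in R.

```python
import random

def simulate(n, m, beta, delta, tau0=0.5, sigma2=0.25, seed=0):
    """Draw (Y, X, Q) from Y_i = X_i'beta + X_i'delta * 1{Q_i < tau0} + U_i."""
    rng = random.Random(seed)
    sd = sigma2 ** 0.5
    X = [[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(n)]
    Q = [rng.random() for _ in range(n)]
    Y = []
    for row, q in zip(X, Q):
        base = sum(b * x for b, x in zip(beta, row))
        # The delta part of the model is only active below the threshold.
        shift = sum(d * x for d, x in zip(delta, row)) if q < tau0 else 0.0
        Y.append(base + shift + rng.gauss(0.0, sd))
    return Y, X, Q

def tau_grid(lo=0.15, hi=0.85, step=0.05):
    """Candidate thresholds from lo to hi (inclusive) in increments of step."""
    k = round((hi - lo) / step)
    return [round(lo + i * step, 2) for i in range(k + 1)]

# Design of Table 1 with scale a = 1 and m = 100 (5 non-zero entries per vector).
m, a = 100, 1.0
beta = [a] * 5 + [0.0] * (m - 5)
delta = [a, -a, a, -a, a] + [0.0] * (m - 5)
Y, X, Q = simulate(n=100, m=m, beta=beta, delta=delta)
grid = tau_grid()
```

The grid contains 15 candidate values for the threshold, which is what keeps the computational cost of each replication low.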
The simulations are carried out in R (R Development Core Team, 2008) using the glmnet package of Friedman et al. (2010). The results (and those of the empirical application in Section 6) can be replicated using knitr (Xie, 2014) and the supplementary material.[2] We report the following statistics, averaged across iterations.
• MSE: mean square prediction error.
• $|J(\hat{\alpha}) \cap J(\alpha_0)^c|$: number of zero parameters incorrectly retained in the model.
• $|J(\alpha_0) \cap J(\hat{\alpha})^c|$: number of non-zero parameters excluded.
• Perfect Sel.: the share (in %) of iterations for which we have perfect model selection.
• $\|\hat{\alpha} - \alpha_0\|_1$: $\ell_1$ estimation error for the parameters.
• $\|\hat{\alpha} - \alpha_0\|_\infty$: $\ell_\infty$ estimation error for the parameters.
• $|\hat{\tau} - \tau_0|$: absolute threshold parameter estimation error.
• C: selected (BIC) thresholding parameter.
• $\hat{\lambda}$: selected (BIC) penalty parameter.

Table 1 considers different values of the non-zero coefficients to investigate the effect of the scale of these coefficients. The data is generated as:
• Sample size: n = 100, 200, 1000.
• β = a[1, 1, 1, 1, 1, 0, ..., 0], δ = a[1, −1, 1, −1, 1, 0, ..., 0], m = 100.
• a = 0.3, 0.5, 1, 2 is the scale of the non-zero parameters.

As expected, Table 1 reveals that the Lasso does a good job at model screening in the sense that it retains all relevant variables in many instances. However, it often fails to exclude irrelevant variables. This is exactly where the thresholding sets in: it weeds out

[2] Available at https://github.com/lcallot/ttlas
Columns: MSE; FP = $|J(\hat{\alpha}) \cap J(\alpha_0)^c|$; FN = $|J(\alpha_0) \cap J(\hat{\alpha})^c|$; Perf. = Perfect Sel. (%); L1 = $\|\hat{\alpha} - \alpha_0\|_1$; Linf = $\|\hat{\alpha} - \alpha_0\|_\infty$; tau = $|\hat{\tau} - \tau_0|$; C; lambda = $\hat{\lambda}$. Cells with two entries read Lasso / Thresholded Lasso.

a     n     MSE          FP           FN           Perf.    L1             Linf         tau   C     lambda
0.3   100   0.50 / 0.52  0.41 / 0.02  5.50 / 6.22  0 / 0    2.27 / 2.30    0.30 / 0.30  0.28  0.46  0.15
0.3   200   0.38 / 0.39  0.29 / 0.01  4.00 / 4.56  0 / 0    1.89 / 1.91    0.30 / 0.30  0.32  0.44  0.10
0.3   1000  0.31 / 0.31  0.60 / 0.00  1.74 / 2.21  1 / 5    1.38 / 1.38    0.30 / 0.30  0.10  0.51  0.04
0.5   100   0.75 / 0.78  0.57 / 0.03  4.49 / 5.15  0 / 0    3.43 / 3.45    0.50 / 0.50  0.25  0.47  0.15
0.5   200   0.57 / 0.58  0.50 / 0.01  3.21 / 3.95  0 / 0    2.92 / 2.93    0.50 / 0.50  0.27  0.48  0.10
0.5   1000  0.31 / 0.31  2.75 / 0.00  0.04 / 0.06  9 / 94   1.37 / 1.35    0.32 / 0.32  0.10  0.75  0.03
1     100   1.87 / 1.94  1.12 / 0.05  3.52 / 4.21  0 / 0    6.31 / 6.31    1.00 / 1.00  0.22  0.56  0.18
1     200   1.09 / 1.12  3.95 / 0.04  1.16 / 1.54  0 / 39   4.46 / 4.39    0.86 / 0.86  0.21  0.88  0.09
1     1000  0.34 / 0.34  2.98 / 0.00  0.00 / 0.01  9 / 99   1.43 / 1.41    0.35 / 0.35  0.08  0.83  0.03
2     100   4.68 / 4.89  5.32 / 0.10  2.12 / 2.61  0 / 21   10.01 / 9.80   1.76 / 1.76  0.20  1.02  0.21
2     200   1.81 / 1.87  7.44 / 0.05  0.11 / 0.21  0 / 78   4.74 / 4.57    1.12 / 1.12  0.18  1.23  0.07
2     1000  0.56 / 0.56  3.18 / 0.01  0.00 / 0.01  7 / 98   1.70 / 1.68    0.49 / 0.49  0.07  0.79  0.03
Table 1: Lasso (white background) and Thresholded Lasso (grey background). Increasing parameter scale, 3 sample sizes, τ0 = 0.5.

the falsely retained variables by the first step scaled Lasso. Perfect model selection almost never occurs when a = 0.3, but for a ≥ 0.5 perfect model selection is achieved in at least 94% of the iterations for n = 1000. The rates of false positives and negatives decrease as n is increased. For every value of the scale of the non-zero coefficients all performance measures improve as n is increased. While variable selection is easier when the non-zero coefficients are well-separated from the zero ones, the MSE and estimation error of $\hat{\alpha}$ actually improve as the non-zero coefficients become smaller. The reason for this is that falsely classifying a non-zero coefficient as zero is less costly in terms of estimation error when this coefficient is already close to zero than when it is far from zero. On the other hand, $\hat{\tau}$ is estimated slightly more precisely as the non-zero coefficients become more separated from the zero ones.

To further illustrate the effect of the scale of the parameters on variable selection, Figure 1 shows the frequency of misclassification as well as that of perfect model selection in a setting where only the scale of the threshold parameters varies. The data is generated as:
• Sample size: n = 100, 500.
• β = [1, 1, 1, 1, 1, 0, ..., 0], δ = a[1, −1, 1, −1, 1, 0, ..., 0], m = 50.
• a = 0.1, 0.2, 0.3, 0.4, 0.5, 0.75, 1, 1.5, 2 is the scale of the non-zero parameters in δ.

Figure 1 shows that thresholding the Lasso estimates keeps the rate of false positives close to zero while that of the Lasso is large and increasing with the scale of the parameters. The rate of false negatives is marginally higher for the thresholded Lasso than for the Lasso when n = 100, and these rates are almost identical when n = 500. Taken together these results show that thresholding the Lasso estimates dramatically reduces the rate of classification error, explaining why the thresholded Lasso often achieves perfect variable selection (bottom panels of Figure 1) while the Lasso rarely does so.

Table 2 considers the case where no threshold effect is present, δ0 = 0. The exact data generating process is:
• Sample size: n = 200, β = [2, 2, 2, 2, 2, 0, ..., 0], δ = [0, ..., 0].
• The length of β and δ is m = 50, 100, 200, 400.

The main finding of Table 2 is that almost all performance measures improve drastically compared to Table 1. This is the case in particular for large m as the performance is no longer worsened as m increases. Note, for example, that the MSE and $\ell_1$ estimation error of
[Figure panels, for n = 100 and n = 500: False positive (nbr.), False negative (nbr.), and Perfect selection (pct.), each plotted against the scale of δ for the two estimators, Lasso and Thresholded Lasso.]
Figure 1: Variable selection with varying parameter scale.

$\hat{\alpha}$ are almost ten times lower for m = 100 than they were in Table 1. Most importantly for us, the perfect model selection percentage is now also stable across m.

In Table 3 we investigate the effect of using a threshold variable that is part of the set of covariates (Q ∈ X), or that is correlated with the covariates, to quantify the effect of violations of Assumption 1. Formally, let $X^{(1)}$ denote the first column of X and $\rho_{Q,X^{(1)}}$ be the correlation between Q and $X^{(1)}$. We consider the case where $Q = X^{(1)}$, as well as $\rho_{Q,X^{(1)}} \in \{0.5, 0.95\}$, and compare this to the case where Q is independent of X. The parameters are defined as:
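Columns: MSE; FP = $|J(\hat{\alpha}) \cap J(\alpha_0)^c|$; FN = $|J(\alpha_0) \cap J(\hat{\alpha})^c|$; Perf. = Perfect Sel. (%); L1 = $\|\hat{\alpha} - \alpha_0\|_1$; Linf = $\|\hat{\alpha} - \alpha_0\|_\infty$; C; lambda = $\hat{\lambda}$. Cells with two entries read Lasso / Thresholded Lasso.

m     MSE          FP           FN           Perf.    L1           Linf         C     lambda
50    0.29 / 0.29  1.56 / 0.21  0.00 / 0.00  23 / 81  0.60 / 0.56  0.16 / 0.16  0.73  0.07
100   0.30 / 0.31  1.56 / 0.18  0.00 / 0.00  23 / 83  0.65 / 0.61  0.17 / 0.17  0.61  0.08
200   0.31 / 0.32  1.45 / 0.15  0.00 / 0.00  27 / 86  0.70 / 0.66  0.18 / 0.18  0.53  0.09
400   0.32 / 0.33  1.44 / 0.12  0.00 / 0.00  27 / 89  0.74 / 0.71  0.19 / 0.19  0.46  0.10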
Table 2: Lasso (white background) and Thresholded Lasso (grey background). No threshold effect (δ = 0), n = 200, 4 different lengths of the parameter vector.

• Sample size: n = 200.
• β = [2, 2, 2, 2, 2, 0, ..., 0], δ = [2, −2, 2, −2, 2, 0, ..., 0], m = 50.
• τ0 ∈ {0.3, 0.5}.
• Q1 ∼ N(0, 1).

From Table 3 it appears that whether the threshold variable Q is included in the set of covariates or is correlated with one of the covariates has no impact on the performance of either the Lasso or the thresholded Lasso relative to the case where Q is independent of $X^{(1)}$. This supports the idea that Assumption 1, which imposes independence of Q and X, is rather innocuous.

In order to investigate the asymptotic properties of our procedure, Table 4 examines the effect of increasing the sample size for two values of τ0. The exact data generating process is:
• Sample size: n = 50, 100, 200, 500, 1000.
• β = [2, 2, 2, 2, 2, 0, ..., 0], δ = [2, −2, 2, −2, 2, 0, ..., 0].
Columns: MSE; FP = $|J(\hat{\alpha}) \cap J(\alpha_0)^c|$; FN = $|J(\alpha_0) \cap J(\hat{\alpha})^c|$; Perf. = Perfect Sel. (%); L1 = $\|\hat{\alpha} - \alpha_0\|_1$; Linf = $\|\hat{\alpha} - \alpha_0\|_\infty$; tau = $|\hat{\tau} - \tau_0|$; C; lambda = $\hat{\lambda}$. Cells with two entries read Lasso / Thresholded Lasso.

Design          tau0  MSE          FP           FN           Perf.   L1           Linf         tau   C     lambda
Q = X(1)        0.3   1.18 / 1.21  4.66 / 0.04  0.06 / 0.13  3 / 84  3.36 / 3.25  0.84 / 0.85  0.26  1.53  0.06
Q = X(1)        0.5   1.65 / 1.69  5.99 / 0.06  0.09 / 0.18  0 / 79  4.01 / 3.87  1.02 / 1.03  0.18  1.51  0.05
Q indep. of X   0.3   1.29 / 1.32  4.71 / 0.03  0.06 / 0.14  1 / 85  3.48 / 3.37  0.90 / 0.90  0.25  1.51  0.06
Q indep. of X   0.5   1.58 / 1.61  5.83 / 0.06  0.08 / 0.17  1 / 79  4.02 / 3.89  1.02 / 1.02  0.19  1.53  0.05
rho = 0.5       0.3   1.28 / 1.31  4.57 / 0.03  0.08 / 0.17  3 / 82  3.54 / 3.44  0.97 / 0.97  0.25  1.58  0.06
rho = 0.5       0.5   1.62 / 1.66  5.78 / 0.04  0.10 / 0.20  0 / 78  4.10 / 3.97  1.07 / 1.08  0.18  1.58  0.05
rho = 0.95      0.3   1.31 / 1.34  4.76 / 0.05  0.10 / 0.20  1 / 78  3.57 / 3.47  1.00 / 1.00  0.26  1.62  0.05
rho = 0.95      0.5   1.58 / 1.62  5.79 / 0.05  0.10 / 0.20  1 / 78  4.08 / 3.95  1.07 / 1.08  0.19  1.63  0.05
Table 3: Lasso (white background) and Thresholded Lasso (grey background). $Q = X^{(1)}$ and varying dependence between Q and $X^{(1)}$. 2 locations of τ0.

• τ0 ∈ {0.3, 0.5}.

As expected, the probability of correct model selection tends to one for the thresholded scaled Lasso. For the plain scaled Lasso, on the other hand, this probability reaches at most 11%. As seen already in Figure 1, the problem that the scaled Lasso suffers from is false positives: it fails to exclude irrelevant variables even as the sample size increases. Finally, and as expected, the penalty applied (λ) decreases as n increases.
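The selection and estimation error statistics reported in Tables 1 to 4 can be computed for a single replication along the following lines. This is a minimal Python sketch; the function name and the toy vectors are hypothetical, and averaging across replications is left out.

```python
def selection_metrics(estimate, truth):
    """Compare the support and values of an estimate with the true parameter vector."""
    est_support = {j for j, c in enumerate(estimate) if c != 0.0}
    true_support = {j for j, c in enumerate(truth) if c != 0.0}
    errors = [abs(e - t) for e, t in zip(estimate, truth)]
    return {
        "false_pos": len(est_support - true_support),  # zero parameters retained
        "false_neg": len(true_support - est_support),  # non-zero parameters excluded
        "perfect": est_support == true_support,        # perfect model selection
        "l1": sum(errors),                             # l1 estimation error
        "linf": max(errors),                           # sup-norm estimation error
    }

# Hypothetical estimate with one false positive and one false negative.
metrics = selection_metrics([1.1, 0.0, 0.2, 0.0], [1.0, 0.5, 0.0, 0.0])
```

Averaging the "perfect" indicator over replications and multiplying by 100 gives the Perfect Sel. column.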
6
Application
This application aims at investigating the presence of a threshold in the effect of debt on future GDP growth. The academic discussion regarding the impact of debt on growth, and the
Columns: MSE; FP = $|J(\hat{\alpha}) \cap J(\alpha_0)^c|$; FN = $|J(\alpha_0) \cap J(\hat{\alpha})^c|$; Perf. = Perfect Sel. (%); L1 = $\|\hat{\alpha} - \alpha_0\|_1$; Linf = $\|\hat{\alpha} - \alpha_0\|_\infty$; tau = $|\hat{\tau} - \tau_0|$; C; lambda = $\hat{\lambda}$. Cells with two entries read Lasso / Thresholded Lasso.

tau0  n     MSE            FP           FN           Perf.    L1             Linf         tau   C     lambda
0.3   50    10.04 / 10.64  1.83 / 0.29  4.92 / 5.51  0 / 0    14.72 / 14.66  1.99 / 1.99  0.30  0.67  0.58
0.3   100   3.34 / 3.53    7.22 / 0.12  1.09 / 1.38  0 / 45   7.92 / 7.63    1.51 / 1.51  0.27  1.32  0.15
0.3   200   1.46 / 1.50    5.56 / 0.04  0.08 / 0.16  1 / 82   4.07 / 3.95    1.00 / 1.00  0.25  1.25  0.07
0.3   500   0.76 / 0.76    3.31 / 0.01  0.01 / 0.02  6 / 97   2.27 / 2.23    0.64 / 0.64  0.17  0.95  0.04
0.3   1000  0.50 / 0.50    2.62 / 0.00  0.00 / 0.01  10 / 98  1.51 / 1.49    0.45 / 0.45  0.06  0.81  0.03
0.5   50    8.98 / 9.52    1.81 / 0.24  4.84 / 5.43  0 / 0    14.56 / 14.48  2.00 / 2.00  0.21  0.62  0.48
0.5   100   4.73 / 4.94    5.41 / 0.12  2.15 / 2.62  0 / 23   10.05 / 9.84   1.75 / 1.75  0.20  1.00  0.21
0.5   200   1.83 / 1.89    7.41 / 0.06  0.12 / 0.21  0 / 78   4.83 / 4.66    1.14 / 1.14  0.18  1.22  0.07
0.5   500   0.86 / 0.87    4.32 / 0.01  0.01 / 0.04  2 / 96   2.53 / 2.48    0.69 / 0.69  0.18  0.96  0.04
0.5   1000  0.55 / 0.55    3.27 / 0.01  0.00 / 0.01  8 / 98   1.70 / 1.67    0.49 / 0.49  0.08  0.80  0.03
Table 4: Lasso (white background) and Thresholded Lasso (grey background). Increasing sample size with m = 100 and 2 locations of τ0.

existence of a threshold above which debt becomes severely detrimental to future growth, has been reignited by Reinhart and Rogoff (2010), who provided evidence for the existence of such a threshold. The evidence presented by Reinhart and Rogoff (2010) has been challenged by Herndon et al. (2014), but others have put forth supportive evidence for this thesis; see among others Cecchetti et al. (2012); Caner et al. (2010); Baum et al. (2013). Using models allowing for multiple thresholds and cross-country heterogeneity, Eberhardt and Presbitero (2013); Kourtellos et al. (2013); Égert (2013) find that the sign of the relationship between debt and GDP growth is ambiguous and that the location of the thresholds is not robust to specification changes; we therefore restrict our analysis to models with a single threshold.
6.1
Data
We use the data made available by Cecchetti et al. (2012)[3], which originates mainly from the IMF and OECD databases. The data contains four measures of the debt-to-GDP ratio:
1. Government debt,
2. Corporate debt,
3. Private debt (corporate + household),
4. Total (non financial institutions) debt (private + government).

Notice that private and total debt are aggregate measures of debt. The data of Cecchetti et al. (2012) also contains a measure of household debt that we drop as the series is incomplete. A set of control variables, composed of standard macroeconomic indicators, is also included in the data.
1. GDP: The logarithm of the per capita GDP.
2. Savings: Gross savings to GDP ratio.
3. ∆Pop: Population growth.
4. School: Years spent in secondary education.
5. Open: Openness to trade, exports plus imports over GDP.
6. ∆CPI: Inflation.
7. Dep: Population dependency ratio.
8. LL: Ratio of liquid liabilities to GDP.
9. Crisis: An indicator for banking crisis in the subsequent 5 years. This is taken from Reinhart and Rogoff (2010).

The data is observed for 18 countries[4] from 1980 to 2009 at an annual frequency. We lose one observation at the start of the sample due to first differencing and five at the end of the sample due to computing the 5 years ahead average growth rate, so that the full sample is 1981-2004. The details on the construction of each variable can be found in Cecchetti et al. (2012).

[3] The original data is available at http://www.bis.org/publ/work352.htm, and can also be found in the replication material for this section.
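The loss of one observation at the start and five at the end of the sample can be illustrated with a short Python sketch. It assumes that annual growth is the first difference of log per capita GDP and that the dependent variable at year t averages growth over years t+1 to t+5; the exact construction is the one of Cecchetti et al. (2012), so this definition, the function name, and the toy series are assumptions made for illustration only.

```python
def forward_average_growth(years, log_gdp, horizon=5):
    """Average growth over the `horizon` years following each year t."""
    # First differencing loses the first year of the sample.
    growth = {years[i]: log_gdp[i] - log_gdp[i - 1] for i in range(1, len(years))}
    sample = {}
    for t in years[1:]:
        ahead = [t + h for h in range(1, horizon + 1)]
        # Years without a full forward window are dropped (the last five).
        if all(a in growth for a in ahead):
            sample[t] = sum(growth[a] for a in ahead) / horizon
    return sample

# Toy series: constant 2% growth in log per capita GDP, observed 1980 to 2009.
years = list(range(1980, 2010))
log_gdp = [10.0 + 0.02 * i for i in range(len(years))]
sample = forward_average_growth(years, log_gdp)
```

With data from 1980 to 2009 the usable years run from 1981 to 2004, matching the full sample stated above.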
6.2
Results
In order to evaluate the impact of debt on growth, as well as the potential presence of a threshold in this effect, we estimate a set of growth regressions. As in Cecchetti et al. (2012) our left hand side variable is the 5 years forward average rate of growth of per capita GDP. Even though our estimator is not a panel estimator we choose to pool the data so as to make our results comparable with those of Cecchetti et al. (2012) and benefit from a larger sample.

We report a first set of results focusing on the impact of government debt on future GDP growth in Table 5. We consider 3 different samples: 1981 to 2004 (full sample, 414 observations), 1990 to 2004 (252 observations), and a sample with no overlapping data (5 years[5], 90 observations). For the full sample we report results for models estimated with and without country specific dummies (denoted FE in the tables). We do not report the estimated parameters associated with the country specific dummies.

We estimate the models including every control variable and a single debt measure, that is, 23 parameters to estimate (11 parameters in β, 11 parameters in δ, and the threshold parameter τ) including the intercept and the thresholded intercept plus, in some instances, 17 country specific dummies. The country specific dummies are not penalized. The grid of

[4] US, Japan, Germany, the United Kingdom, France, Italy, Canada, Australia, Austria, Belgium, Denmark, Finland, Greece, the Netherlands, Norway, Portugal, Spain, and Sweden.
[5] 1984, 1989, 1994, 1999, 2004.
Threshold:
Government L T
Government L T
intercept GDP Savings ∆Pop School ˆ β Open ∆CPI Dep LL Crisis Government
42.43 -3.643 -0.035 -1.692 0.426 0.003 -0.061 -0.091 -0.433 -1.277 -0.713
intercept GDP Savings ∆Pop School Open ∆CPI Dep LL Crisis Government
-12.167
-12.167
-1.504
0.087 1.563 -0.077 -0.006
0.087 1.563 -0.077
-0.037 0.42
0.181 0.827 -0.459 1.762
0.181 0.827 -0.459 1.762
δˆ
τb b λ b C Sample FE
Government L T
Government L T
42.43 79.611 79.611 86.416 86.416 136.988 136.988 -3.643 -7.419 -7.419 -7.495 -7.495 -11.621 -11.621 -0.035 0.033 0.033 0.02 0.02 -1.692 -1.493 -1.493 -0.879 -0.879 -0.813 -0.813 0.426 0.507 0.507 0.095 0.095 -0.082 -0.082 0.026 0.024 0.024 0.037 0.037 -0.061 -0.056 -0.056 -0.157 -0.157 -0.252 -0.252 -0.091 -0.104 -0.104 -0.132 -0.132 -0.22 -0.22 -0.433 0.33 0.33 0.574 0.574 0.631 0.631 -1.277 -1.58 -1.58 -0.949 -0.949 -1.396 -1.396 -0.713 -0.518 -0.518 -1.504
0.42
0.007
0.82 0.82 0.007 0.007 0.1 1981 - 2004 ×
0.909 -0.294 1.471
0.909 -0.294 1.471
0.68 0.68 0.015 0.015 0.3 1981 - 2004 X
-0.052 0.222 0.203 0.012
-0.052 0.222 0.203
-0.035
-0.035
-1.338
-1.338
0.59 0.59 0.007 0.007 0.1 1990 - 2004 X
0.008 0.61 0.098
0.61 0.098
-3.23
-3.23
0.65 0.65 0.008 0.008 0.1 No overlap X
Table 5: 4 specifications with government debt included as threshold variable and regressor. Estimated parameters for the Lasso (L) and Thresholded Lasso (T). Empty cells are parameters set to zero, dashes indicate parameters not included in the model.

threshold parameters goes from the 15th to the 85th percentiles of the threshold variable by steps of 5 percentage points. We select the thresholding parameter C by BIC using a grid from 0.1 to 5, so that parameters smaller (in absolute value) than $\hat{C}\hat{\lambda}$ are set to zero by the thresholded scaled Lasso.

Table 5 reports the estimated parameters for the 4 specifications of the model, all including government debt. The L and T in the header of the table indicate a scaled Lasso estimate $(\hat{\beta}, \hat{\delta})$ or a thresholded scaled Lasso estimate $(\tilde{\beta}, \tilde{\delta})$. The upper panel of each table reports $\hat{\beta}$ and $\tilde{\beta}$, the middle panel $\hat{\delta}$ and $\tilde{\delta}$, and the lower panel gives the values of $\hat{\tau}$, $\hat{\lambda}$, and $\hat{C}$. Recall that the effect of the regressors when the threshold variable is below its threshold is given by $\hat{\beta} + \hat{\delta}$ ($\tilde{\beta} + \tilde{\delta}$), while the effect when the threshold variable is above its threshold is given by $\hat{\beta}$ ($\tilde{\beta}$), for the scaled Lasso (thresholded scaled Lasso).

A large fraction of $\hat{\beta}$ is non-zero (the Lasso drops a single variable twice), while $\hat{\delta}$ is more sparse: the Lasso drops between 2 and 7 variables. The thresholding parameter $\hat{C}$ is always chosen among the lowest values in the search grid; this nonetheless results in between 1 and 3 extra parameters being discarded compared to the scaled Lasso. A threshold ($\hat{\tau}$) for the effect of government debt on growth is found at between 60% and 80% of GDP, consistent with the findings of Cecchetti et al. (2012); Reinhart and Rogoff (2010); Caner et al. (2010); Baum et al. (2013).

The level of GDP is found to have a negative effect on GDP per capita growth as predicted by the income convergence hypothesis, as do inflation, the dependency ratio, population growth, and crises. Considering the effect of both $\hat{\beta}$ and $\hat{\delta}$, our model indicates in most instances that government debt has a positive effect below the threshold and a negative effect, or no effect at all, above the debt threshold. Ceteris paribus, a 10 percentage point increase in the government debt to GDP ratio, when it is above the threshold, is found to result in a decrease of the average 5 year growth rate of between 0.07% and zero. Looking at this effect of high debt on future growth in isolation is overly restrictive, though, since there are large changes in the other parameters of the model when the debt threshold is crossed. This is the case in particular for financial variables. Interestingly, crises are found to have a more detrimental effect on growth for countries with a government debt ratio below the threshold, and while liquid liabilities (LL) are beneficial to the future growth of a country with low debt, this does not appear to be the case when debt is high.

Table 6 reports estimates for 3 other measures of debt in a model with country dummies
Threshold: intercept GDP Savings ∆Pop School Open βˆ ∆CPI Dep LL Crisis Corporate Private Total
δˆ
intercept GDP Savings ∆Pop School Open ∆CPI Dep LL Crisis Corporate Private Total τb b λ b C Sample FE
Corporate L T 140.097 140.097 -11.642 -11.642 -0.026 -0.026 -1.063 -1.063 -0.172 -0.172 0.053 0.053 -0.204 -0.204 -0.242 -0.242 0.332 0.332 -0.96 -0.96 0.491 0.491 -
Private L
T
Total L
T
126.236 126.236 134.725 134.725 -10.616 -10.616 -11.396 -11.396 -0.031 -0.031 -0.011 -0.011 -0.995 -0.995 -0.132 -0.132 0.041 0.041 0.047 0.047 -0.19 -0.19 -0.166 -0.166 -0.191 -0.191 -0.235 -0.235 0.316 0.316 0.376 0.376 -0.319 -0.319 -0.943 -0.943 -0.968 -0.968 0.284 0.284
8.261
8.261
2.301
2.301
-0.243 -2.154 -0.29
-0.243 -2.154 -0.29
0.022 -1.1 -0.33
-0.032
-0.032
0.022 -1.1 -0.33 -0.007 -0.082
-0.082
1.175 -2.389
1.175 -2.389
-
-
0.365 -1.167 0.563 -
0.365 -1.167 0.563 -
0.69 0.69 0.001 0.001 0.1 1981 - 2004 X
1.62 1.62 0.005 0.005 0.1 1981 - 2004 X
2.387 0.387 0.063 0.777 -0.192
2.387 0.387 0.063 0.777 -0.192
-31.521 -
-31.521 -
2 2 0.002 0.002 0.1 1981 - 2004 X
Table 6: Growth regressions with corporate, private, or total debt (see header) included both as threshold variable and as regressor. Estimated parameters, pooled data, Lasso (L) and Thresholded Lasso (T). Empty cells are parameters set to zero, dashes indicate parameters not included in the model.

and using the full sample, the same model used in the first two columns of Table 5. The sparsity pattern in Table 6 is comparable to that of Table 5 and some similarities are found between the estimated values. Again, the level of per capita GDP is found to have a negative impact on future growth, as do the dependency ratio, inflation, population growth, and financial crises. A threshold is always found and identified: 69% for corporate debt, 162% for private debt, and 200% for total debt. The large value of the estimated thresholds for private and total debt can be explained by the fact that these are aggregate measures of debt and hence of a substantially larger magnitude than either corporate or government debt. The effect of corporate and total debt is found to be positive and not directly affected by the threshold, whereas the effect of private debt is negative, and more so when private debt is high. As previously, financial crises are found to have a stronger negative impact on countries with low debt, though crises are detrimental to growth irrespective of the level of debt.
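Reading Tables 5 and 6 requires combining the two panels: under the model with indicator 1{Q_i < τ}, the effect of a regressor is β + δ below the threshold and β above it. A minimal Python sketch of this rule follows; the function name and the coefficient values are hypothetical and are not taken from the tables.

```python
def regime_effect(beta_j, delta_j, q, tau):
    """Effect of regressor j: beta_j + delta_j when q < tau, beta_j otherwise."""
    return beta_j + delta_j if q < tau else beta_j

# Hypothetical coefficients: negative baseline effect, positive shift below the threshold.
beta_debt, delta_debt, tau_hat = -0.5, 0.8, 0.7
effect_low_debt = regime_effect(beta_debt, delta_debt, q=0.4, tau=tau_hat)
effect_high_debt = regime_effect(beta_debt, delta_debt, q=0.9, tau=tau_hat)
```

With these illustrative numbers the effect is positive below the threshold and negative above it, the qualitative pattern described for government debt.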
7
Conclusion
In this paper we considered high-dimensional threshold regressions and provided sup-norm oracle inequalities for the estimation error of the scaled Lasso of Lee et al. (2015). These results are non-trivial as most research has focused on either $\ell_1$ or $\ell_2$ oracle inequalities. The sup-norm bounds are shown to be crucial for exact variable selection by means of thresholding. To be precise, we can distinguish at a much finer scale between zero and non-zero coefficients than would have been possible if thresholding had been based on either $\ell_1$ or $\ell_2$ oracle inequalities. We carry out simulations and show that the thresholded scaled Lasso performs well in model selection. Finally, we estimate a set of growth regressions documenting the existence of a threshold in the amount of debt relative to GDP. Several parameters change when the threshold is crossed, making the effect of high debt on future growth unclear. Future work includes investigating the effect of multiple thresholds. Furthermore, it is of interest to allow for an endogenous threshold variable, as in Kourtellos et al. (2015), even in the high-dimensional setting.
APPENDIX

The following result is needed in the proofs of Theorems 1 and 2. It is similar to Lemma 6 in Lee et al. (2015) but allows for random regressors and non-Gaussian error terms.

Lemma 2. Let Assumption 1 be satisfied. Then,
$$\left\| \frac{1}{n} X'(\hat{\tau}) U \right\|_{\ell_\infty} = O_p\left( \sqrt{\frac{\log(m)}{n}} \right).$$
1 0
Proof. First, note that n X (ˆ τ )U
`∞
1 0
≤ supτ ∈T n X (τ )U
such that it suffices to bound
`∞
the right hand side. Let > 0 be arbitrary. By the independence of (X1 , ..., Xn , U1 , ..., Un ) and (Q1 , ..., Qn ) one has for j = 1, ..., m, X n k 1 X 1 (j) (j) P sup Xi Ui 1{Qi (Q1 , ..., Qn ) = P max Xi Ui > (Q1 , ..., Qn ) 1≤k≤n n τ ∈T n i=1 i=1 k 1 X (j) = P max (7) Xi Ui > 1≤k≤n n i=1 almost surely, where the first equality used that conditional on (Q1 , ..., Qn ), 1{Q1 ≤ cP 1≤k≤n n i=1
P
X n n (j) Xi U i > c i=1
(8)
(j)
As Xi Ui is subexponential (the product of two subgaussian variables is subexponential) for all i = 1, ..., n and j = 1, ..., m, Corollary 5.17 in Vershynin (2012) yields X n n (j) P Xi U i > ≤ 2 exp −d (/K)2 ∧ (/K) n c i=1 28
(9)
where d > 0 and K = K(c) > 0 are absolute constants. Therefore, choosing =
q A log(m) n
for some A ≥ 1 yields r X n n dA log(m) log(m) (j) P Xi Ui > ≤ 2 exp − 2 ∧ n c K ∨ K n n i=1 dA log(m) ≤ 2 exp − 2 K ∨K where the second estimate used that log(m)/n → 0 such that
log(m) n
(10)
is smaller than its square
root for n sufficiently large. Hence, X n dA 1 (j) P sup Xi Ui 1{Qi (Q1 , ..., Qn ) ≤ 2c exp − 2 log(m) K ∨ K τ ∈T n i=1 for all j = 1, ..., m almost surely. Taking expectations over (Q1 , ..., Qn ) yields X n dA 1 (j) log(m) . P sup Xi Ui 1{Qi ≤ 2c exp − 2 K ∨ K τ ∈T n i=1
(11)
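Therefore, combining (10) (this is also valid for c = 1 with a different K) and (11), a union bound over 2m terms yields upon synchronizing constants
$$P\left( \sup_{\tau \in \mathbb{T}} \left\| \frac{1}{n} X'(\tau) U \right\|_{\ell_\infty} > \epsilon \right) \le 2m(1 + c) \exp\left( -\frac{dA}{K^2 \vee K} \log(m) \right).$$
Choosing A sufficiently large implies that $\sup_{\tau \in \mathbb{T}} \left\| \frac{1}{n} X'(\tau) U \right\|_{\ell_\infty} = O_p\left( \sqrt{\log(m)/n} \right)$ using the definition of $\epsilon = \sqrt{A \log(m)/n}$.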
Lemma 3. Let Assumption 1 be satisfied. Then, $\sup_{\tau \in \mathbb{T}} \max_{1 \le j \le 2m} \| X^{(j)}(\tau) \|_n = O_p(1)$ and $\min_{1 \le j \le 2m} \| X^{(j)}(t_0) \|_n$ is bounded away from zero wpa1.
Proof. Consider the first claim and note that supτ ∈T max1≤j≤2m X (j) (τ ) n = max1≤j≤m X (j) (τ ) n . (j) (j) 2 As X1 is uniformly subgaussian in j = 1, ..., m it also holds that E X1 is uniformly bounded (this follows by Lemma 2.2.1 in van der Vaart and Wellner (1996) and the in29
equalities at the bottom of page 95 in that reference). Thus, by the triangle inequality and √ subadditivity of x 7→ x, v v u n u n q u 1 X (j) 2 u 1 X 2 2 (j) 2 (j) (j) t Xi ≤ t Xi − EXi + EX1 n i=1 n i=1 r P P n (j) 2 (j) 2 (j) 2 (j) 2 and hence it suffices to bound n1 ni=1 Xi − EXi , or, equivalently, n1 i=1 Xi − EXi (j) 2
uniformly in j = 1, ..., n by a constant with probability tending to 1. As the Xi
are uni-
formly subexponential (as they are a product of uniformly subgaussian random variables) in j = 1, ..., m, Corollary 5.17 in Vershynin (2012) implies that for any > 0 there exist constants c, K > 0 (see Vershynin (2012) for the exact meaning of the constants) such that
P
n 1 X (j) 2 (j) 2 2 > ≤ 2 exp −c (/K) ∧ (/K) n − EX X i i n i=1
for all j = 1, ..., m. Now, choosing = K ∨ K/c, the union bound yields that
P
n 1 X (j) 2 (j) 2 > ≤ 2me−n → 0 − EX X i i 1≤j≤m n i=1
max
as log(m)/n → 0. Thus, K ∨ K/c is large enough to be the sought constant.
Now turn to the second claim and observe min1≤j≤2m X (j) (t0 ) n = minm+1≤j≤2m X (j) (t0 ) n . Note that by Assumption 1, (j) 2
min E X1
1≤j≤m
(j) 2 1{Q1 0. 1≤j≤m
where the first equality used the independence of X1 and Q1 as well as that Q1 is uniformly distributed on [0, 1]. Therefore, it suffices to show that n
1 X (j) 2 (j) 2 − EX 1 X 1 {Qi