ISSN 1440-771X
Department of Econometrics and Business Statistics http://business.monash.edu/econometrics-and-business-statistics/research/publications
Error-in-Variables Jump Regression Using Local Clustering
Yicheng Kang, Xiaodong Gong, Jiti Gao, and Peihua Qiu
July 2016
Working Paper 13/16
Error-in-Variables Jump Regression Using Local Clustering

Yicheng Kang1, Xiaodong Gong2, Jiti Gao3 and Peihua Qiu4

1 JPMorgan, 2 University of Canberra, 3 Monash University, 4 University of Florida

Abstract

Error-in-variables regression is widely used in econometric models. The statistical analysis becomes challenging when the regression function is discontinuous and the distribution of the measurement error is unknown. In this paper, we propose a novel jump-preserving curve estimation method. A major feature of our method is that it can remove the noise effectively while preserving the jumps well, without requiring much prior knowledge about the measurement error distribution. The jump-preserving property is achieved mainly by local clustering. We show that the proposed curve estimator is statistically consistent and that it performs favorably in comparison with an existing jump-preserving estimator. Finally, we demonstrate our method with an application to a health tax policy study in Australia.
Keywords: Clustering; Demand for private health insurance; Kernel smoothing; Local regression; Measurement errors; Price elasticity.
JEL Classification: C13, C14.
1 Introduction
This research is motivated by our attempt to study the impact of the medical levy surcharge tax policy on the take-up rate of private health insurance in Australia. People in Australia are liable for the medical levy surcharge (about 1 percent of their annual taxable income) if they do not buy private health insurance and their annual taxable income is above a certain level. For example, the threshold level for single individuals was $50,000 per annum in the 2003-04 financial year, where the dollar sign “$” used here and throughout the paper represents the Australian dollar. The major purpose of the medical levy surcharge
policy was to give people more choices of health insurance and to take some pressure off the public medical system. Both policy makers and economists are interested in studying the impact of this policy on the relationship between the private health insurance take-up rate and annual taxable income. It was expected that this policy would generate a jump in the private health insurance take-up rate around the threshold taxable income. This discontinuous relationship could be used to evaluate the impact of the policy. However, the relationship became challenging to analyze after the Australian Tax Office, for privacy reasons, perturbed the income data by multiplying them by random numbers, because the distribution that generated the random numbers was not revealed.

In the literature, jump regression analysis (cf., Qiu 2005) provides a natural framework for studying discontinuous relationships between random variables. In that framework, two approaches have been suggested for estimating a discontinuous curve. The first approach, called the indirect approach, estimates the discontinuity locations first and then considers different segments of the design interval, in each of which the underlying function is assumed to be continuous and can be estimated as usual. See, for example, Eubank and Speckman (1994), Gijbels et al. (1999), Gijbels and Goderniaux (2004), Kang and Qiu (2014), Kang et al. (2015), Müller (1992), Müller (2002), Qiu (1991), Qiu et al. (1991), Qiu and Kang (2015), Wu and Chu (1993), among others. The second approach, called the direct approach, estimates the regression curve directly, without first estimating the number and locations of the discontinuities. Methods based on this idea include Gijbels et al. (2007), McDonald and Owen (1986), Qiu (2003), and the references therein.

Most existing jump-preserving estimation methods assume that the explanatory variable is measured without error. Error-in-variables regression models, on the other hand, allow measurement error in the explanatory variables, but most of them assume that the regression function is smooth and that the measurement error distribution is known or can be estimated reasonably well beforehand (cf., Carroll et al. 1999, 2012, Comte and Taupin 2007, Cook and Stefanski 1994, Delaigle and Meister 2007, Fan and Masry 1992, Fan and Truong 1993, Hall and Meister 2007, Staudenmayer and Ruppert 2004, Stefanski 2000, Stefanski and Cook 1995, and Taupin 2001).
In this paper, we propose a jump-preserving curve estimation method for discontinuous error-in-variables regression models. The proposed method is a direct approach that does not explicitly detect the jumps first, and thus it is easy to use. Another feature of our method is that it does not require the measurement error distribution to be specified beforehand, which makes it applicable to many real problems.

The remainder of this article is organized as follows. In Section 2, our proposed method is described in detail. In Section 3, some asymptotic properties of the proposed estimator are discussed. In Section 4, the numerical performance is evaluated by simulated examples. In Section 5, the proposed method is applied to the private health insurance data. Several remarks conclude the article in Section 6. Some technical details are provided in a supplementary file.
2 Proposed Methodology
Let $\{(W_i, Y_i) : i = 1, \ldots, n\}$ be independent and identically distributed observations from the model described below:
$$Y_i = g(X_i) + \varepsilon_i, \qquad (1)$$
$$W_i = X_i + \sigma_n U_i, \qquad (2)$$
where $i = 1, \ldots, n$, $g$ is the unknown regression function with possible discontinuities, $Y_i$ is the $i$th observation of the response variable, $X_i$ is the $i$th observation of the unobservable explanatory variable, the $\varepsilon_i$'s are independent and identically distributed random errors with mean 0 and unknown variance $\tau^2 > 0$, $W_i$ is the observed value of $X_i$ with a measurement error, $\sigma_n > 0$ denotes the standard deviation of the measurement error in $X_i$, and $U_i$ is the standardized measurement error with mean 0 and variance 1. It is also assumed that the $U_i$'s are independent and identically distributed, that $U_i$ is independent of both $X_i$ and $Y_i$, and that the distribution of $U_i$, denoted as $f_U$, and the distribution of $X_i$, denoted as $f_X$, are both unknown. Without loss of generality, assume that the design interval is $[0, 1]$. Our major goal is to estimate $g(x)$ from the observed data.
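To fix ideas, the following is a minimal simulation sketch of data generated from models (1) and (2). The code is in Python and is not part of the paper; the particular choices of $g$, $f_X$, $f_U$, $\sigma_n$ and $\tau$ below are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)

def g(x):
    # An illustrative regression function with a single jump at x = 0.5
    # (not one of the test functions used in the paper).
    return np.where(x < 0.5, 2.0 * x, 2.0 * x + 1.0)

n = 500
sigma_n = 0.02   # assumed measurement-error standard deviation
tau = 0.10       # assumed standard deviation of the regression error

X = rng.uniform(0.0, 1.0, n)         # unobservable design points (f_X taken as U(0,1))
U = rng.standard_normal(n)           # standardized measurement error (f_U taken as N(0,1))
eps = tau * rng.standard_normal(n)   # regression errors with mean 0 and variance tau^2

Y = g(X) + eps                       # model (1)
W = X + sigma_n * U                  # model (2); only (W, Y) are observed
```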
Our idea for estimating a regression function with possible jump points at unknown locations is that each point in the design interval is a potential jump point, and thus the estimation method should adapt at each point to a possible discontinuity. Next, we describe the proposed method in detail.

For any given point $x \in [h_n, 1 - h_n]$, where $h_n \in (0, 1/2)$ is a bandwidth parameter, consider a small neighborhood of $x$ defined by $N(x; h_n) = \{z \in (0, 1) : |z - x| \le h_n\}$, and the following local linear kernel (LLK) smoothing procedure:
$$\min_{a,b} \sum_{W_i \in N(x;h_n)} \left[Y_i - a - b(W_i - x)\right]^2 K\!\left(\frac{W_i - x}{h_n}\right), \qquad (3)$$
where $K$ is a density kernel function with support $[-1, 1]$. Let $(\hat a_n(x), \hat b_n(x))$ be the solution to $(a, b)$ in (3). Then, the weighted residual mean squares (WRMS) at $x$ is defined by
$$\mathrm{WRMS}_n(x) = \frac{\sum_{W_i \in N(x;h_n)} \left[Y_i - \hat a_n(x) - \hat b_n(x)(W_i - x)\right]^2 K\!\left(\frac{W_i - x}{h_n}\right)}{\sum_{W_i \in N(x;h_n)} K\!\left(\frac{W_i - x}{h_n}\right)}. \qquad (4)$$
If $x$ is a jump point, then the jump structure of the regression function would be dominant even when there is measurement error involved. This fact is illustrated in Fig. 1. By this observation, if $x$ is near a jump point, there should be significant evidence of lack of fit of the LLK smoothing procedure (3). In other words, $\mathrm{WRMS}_n(x)$ would be relatively large. Thus, if the following is true:
$$\mathrm{WRMS}_n(x) > u_n, \qquad (5)$$
where $u_n$ is a threshold value, then $x$ is likely to be close to a jump point, and we cannot use all observations near $x$ to estimate $g(x)$, because doing so would blur the jump. When there is no measurement error in $X$, one-sided estimators can estimate $g(x)$ reasonably well (cf., Qiu 2003). When $X$ has measurement error, such estimators are unavailable because the $X_i$'s are no longer observable. It may be problematic to simply replace the $X_i$'s by the $W_i$'s when constructing a one-sided estimator, because, due to the measurement error, we do not know whether a specific value $X_i$ is located on the right (or left) side of $x$ when its observed value $W_i$ is on the right (or left) side of $x$.
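As a concrete illustration of (3)–(5), the sketch below computes the local linear fit and $\mathrm{WRMS}_n(x)$ at a single point $x$. The helper names and the Epanechnikov kernel are our own assumptions; the paper only requires $K$ to be a symmetric density kernel supported on $[-1, 1]$.

```python
import numpy as np

def epanechnikov(t):
    # A density kernel with support [-1, 1]; any kernel satisfying the
    # paper's conditions could be used instead.
    return np.where(np.abs(t) <= 1.0, 0.75 * (1.0 - t ** 2), 0.0)

def llk_fit(W, Y, x, h):
    """Local linear kernel fit (3) at x; returns (a_hat, b_hat) and the local data."""
    k = epanechnikov((W - x) / h)
    mask = k > 0
    Wm, Ym, km = W[mask], Y[mask], k[mask]
    # Weighted least squares of Y on (1, W - x) with kernel weights.
    Z = np.column_stack([np.ones(Wm.size), Wm - x])
    A = Z.T @ (Z * km[:, None])
    c = Z.T @ (km * Ym)
    a_hat, b_hat = np.linalg.solve(A, c)
    return a_hat, b_hat, Wm, Ym, km

def wrms(W, Y, x, h):
    """Weighted residual mean squares (4) at x."""
    a_hat, b_hat, Wm, Ym, km = llk_fit(W, Y, x, h)
    resid = Ym - a_hat - b_hat * (Wm - x)
    return np.sum(km * resid ** 2) / np.sum(km)
```

For example, a point $x$ would be flagged as a possible jump location when wrms(W, Y, x, h) exceeds the threshold $u_n$, as in (5).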
Figure 1: The solid line denotes the regression function $g(\cdot)$, which has a jump at $x = 0.5$ (marked by the vertical dashed line). The dark points denote observations of $(W, Y)$, where $W$ is the observed value of $X$ with measurement error. It can be seen that the jump structure of $g(\cdot)$ is quite visible among the observations in $N(x; h_n')$ (i.e., those falling between the two vertical dotted lines) even in the presence of measurement error.
To overcome this difficulty, we can make use of the fact that the jump structure of $g(\cdot)$ is still quite visible even in the presence of measurement error (cf., Fig. 1). We suggest classifying all observations in the neighborhood $N(x; h_n')$ of $x$ into two significantly separated groups (i.e., Group 1 versus Group 2 in Fig. 1), where the bandwidth parameter $h_n'$ could be different from $h_n$. Then, we can estimate $g(x)$ using the observations in one group. As long as the observations are properly clustered, the jump should be preserved well.

Next, we describe the clustering procedure mentioned above. An ideal classification would put observations whose unobservable $X$ values are on the same side of the jump location into the same group. So, a classification should be reasonable when a certain separation measure reaches its maximum. Intuitively, if the two groups of observations are well separated, the within-group variability would be small and the between-group variability would be large. Consequently, the ratio of the between-group variability to the within-group variability would be large. Therefore, we can use this ratio as a separation measure of the two groups. Specifically, let $G(x; h_n') = \{(W_i, Y_i) : W_i \in N(x; h_n')\}$, let $G_l(x; h_n')$ and $G_r(x; h_n')$ be a partition of $G(x; h_n')$ (i.e., $G_l(x; h_n') \cup G_r(x; h_n') = G(x; h_n')$ and $G_l(x; h_n') \cap G_r(x; h_n') = \emptyset$), and let
$$\overline W_l = \frac{1}{|G_l(x; h_n')|} \sum_{(W_i, Y_i) \in G_l(x; h_n')} W_i, \qquad \overline Y_l = \frac{1}{|G_l(x; h_n')|} \sum_{(W_i, Y_i) \in G_l(x; h_n')} Y_i,$$
$$\overline W_r = \frac{1}{|G_r(x; h_n')|} \sum_{(W_i, Y_i) \in G_r(x; h_n')} W_i, \qquad \overline Y_r = \frac{1}{|G_r(x; h_n')|} \sum_{(W_i, Y_i) \in G_r(x; h_n')} Y_i,$$
where $|A|$ denotes the number of elements in the point set $A$. Next, we consider the following LLK smoothing procedures:
$$\min_{a,b} \sum_{(W_i, Y_i) \in G_l(x; h_n')} \left[Y_i - a - b(W_i - x)\right]^2 K\!\left(\frac{W_i - x}{h_n'}\right), \qquad (6)$$
and
$$\min_{a,b} \sum_{(W_i, Y_i) \in G_r(x; h_n')} \left[Y_i - a - b(W_i - x)\right]^2 K\!\left(\frac{W_i - x}{h_n'}\right). \qquad (7)$$
The WRMS defined in (4) can then be computed with $N(x; h_n)$ replaced by $G_l(x; h_n')$ and $G_r(x; h_n')$, respectively; the results are denoted as $\mathrm{WRMS}_l(x; h_n')$ and $\mathrm{WRMS}_r(x; h_n')$. Then,
we define the following separation measure:
$$T\!\left(G_l(x; h_n'), G_r(x; h_n')\right) = \frac{(\overline W_l - \overline W_r)^2 + (\overline Y_l - \overline Y_r)^2}{\mathrm{WRMS}_l(x; h_n') + \mathrm{WRMS}_r(x; h_n')}. \qquad (8)$$
It can be seen that the numerator in (8) represents the between-group variability and the denominator represents the within-group variability. Let $G_l^*(x; h_n')$ and $G_r^*(x; h_n')$ denote the partition that maximizes (8). In practice, solving the optimization in (8) by exhaustive search would be too time-consuming (cf., Everitt et al. 2011, Chapter 5). The more efficient algorithm proposed in Hartigan and Wong (1979) is well received in the literature, and we adopt that algorithm in this paper. Let $\hat a_{l,n}(x)$ and $\hat a_{r,n}(x)$ denote the solutions to $a$ in (6) and (7), respectively. If $|G_l^*(x; h_n')| > |G_r^*(x; h_n')|$, then it is more likely for $x$ to be on the same side of the jump point as the $X$ values of the observations in $G_l^*(x; h_n')$. So, our proposed estimator of $g(x)$ is
$$\hat g_n(x) = \begin{cases} \hat a_{l,n}(x), & \text{if } |G_l^*(x; h_n')| > |G_r^*(x; h_n')|,\\[2pt] \hat a_{r,n}(x), & \text{otherwise}. \end{cases} \qquad (9)$$
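The pointwise estimator (9) can be sketched as follows, reusing the llk_fit and wrms helpers above. We use scikit-learn's two-cluster k-means as a stand-in for the Hartigan and Wong (1979) algorithm, and clustering the raw $(W_i, Y_i)$ pairs is only an approximation to maximizing (8); all function names are our own and the snippet is a sketch of the three-step procedure summarized below.

```python
import numpy as np
from sklearn.cluster import KMeans  # stand-in for Hartigan-Wong k-means

def jump_preserving_estimate(W, Y, x, h, h_prime, u_n):
    """Sketch of the pointwise estimator: an ordinary LLK fit away from jumps,
    and a local two-group clustering followed by a one-group fit near a jump."""
    a_hat, b_hat, *_ = llk_fit(W, Y, x, h)
    if wrms(W, Y, x, h) <= u_n:
        # No evidence of lack of fit: estimate g(x) by the ordinary LLK solution of (3).
        return a_hat

    # Otherwise, cluster the observations in G(x; h') into two groups.
    in_nbhd = np.abs(W - x) <= h_prime
    pts = np.column_stack([W[in_nbhd], Y[in_nbhd]])
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(pts)

    # Estimate g(x) from the larger group, as in (9).
    larger = 0 if np.sum(labels == 0) >= np.sum(labels == 1) else 1
    keep = np.flatnonzero(in_nbhd)[labels == larger]
    a_group, *_ = llk_fit(W[keep], Y[keep], x, h_prime)
    return a_group
```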
The proposed jump-preserving curve estimation procedure is summarized as follows.

Jump-preserving Curve Estimation Procedure

Step 1: For any given $x$, compute its WRMS by (4).

Step 2: If (5) is true, go to Step 3. Otherwise, estimate $g(x)$ by $\hat a_n(x)$, the solution to $a$ in (3).

Step 3: Cluster the observations in $G(x; h_n')$ by maximizing (8), then compute $\hat g_n(x)$ by (9).

In the proposed estimation procedure (3)–(9), there are three parameters to choose: $h_n$, $h_n'$ and $u_n$. The bandwidth $h_n$ is used for two purposes: to flag possible jump points in (5) and to estimate the curve in continuity regions. The purpose that $h_n$ serves in our procedure is similar to that of the bandwidth parameter in conventional kernel smoothing. Thus, we suggest selecting $h_n$ to minimize the following leave-one-out cross-validation score:
$$\min_{h_n} \sum_{i=1}^{n} \left[Y_i - \hat a_n^{(-i)}(W_i)\right]^2,$$
where $\hat a_n^{(-i)}(\cdot)$ denotes the estimate $\hat a_n(\cdot)$ computed with the $i$th observation $(W_i, Y_i)$ omitted.

Next, we discuss the selection of $h_n'$ and $u_n$. In simulation studies, the true regression function $g$ is known. Then, once $h_n$ is selected, $(h_n', u_n)$ can be chosen to be the pair that minimizes the mean squared error (MSE), defined as
$$\mathrm{MSE}(\hat g, g; h_n', u_n) = \frac{1}{n} \sum_{i=1}^{n} \left[\hat g(x_i) - g(x_i)\right]^2, \qquad (10)$$
where $\{x_1, \ldots, x_n\}$ are equally spaced values on $[0, 1]$.
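A plain sketch of the leave-one-out cross-validation choice of $h_n$ is given below; the helper names and the candidate grid are our own illustrative assumptions (the paper does not prescribe a particular search grid), and llk_fit is the helper sketched earlier.

```python
import numpy as np

def loocv_score(W, Y, h):
    """Leave-one-out CV score for the LLK estimator a_hat_n with bandwidth h."""
    idx = np.arange(len(W))
    score = 0.0
    for i in idx:
        keep = idx != i
        a_hat_minus_i, *_ = llk_fit(W[keep], Y[keep], W[i], h)  # fit without observation i
        score += (Y[i] - a_hat_minus_i) ** 2
    return score

# Select h_n over an assumed grid of candidate bandwidths.
h_grid = np.linspace(0.05, 0.30, 11)
h_n = min(h_grid, key=lambda h: loocv_score(W, Y, h))
```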
In practice, $g$ is usually unknown. In such cases, we suggest the following bootstrap selection procedure (a code sketch follows the list):

• For a given bandwidth value $h_n' > 0$ and threshold value $u_n > 0$, apply the proposed estimation procedure (3)–(9) to the original dataset $\{(W_1, Y_1), \ldots, (W_n, Y_n)\}$, and obtain an estimator of $g$, denoted as $\hat g(\cdot\,; h_n', u_n)$.

• Draw with replacement $n$ times from the original dataset to obtain the first bootstrap sample, denoted as $\{(\widetilde W_1^{(1)}, \widetilde Y_1^{(1)}), \ldots, (\widetilde W_n^{(1)}, \widetilde Y_n^{(1)})\}$.

• Apply the proposed estimation procedure (3)–(9) to the first bootstrap sample, and obtain the first bootstrap estimator of $g$, denoted as $\tilde g^{(1)}(\cdot\,; h_n', u_n)$.

• Repeat the previous two steps $B$ times and obtain $B$ bootstrap estimators of $g$: $\{\tilde g^{(1)}(\cdot\,; h_n', u_n), \ldots, \tilde g^{(B)}(\cdot\,; h_n', u_n)\}$.

• Then, the bandwidth $h_n'$ and the threshold $u_n$ are chosen to be the minimizer of
$$\min_{h_n', u_n} \frac{1}{B} \sum_{k=1}^{B} \frac{1}{n} \sum_{i=1}^{n} \left[\tilde g^{(k)}(x_i; h_n', u_n) - \hat g(x_i; h_n', u_n)\right]^2. \qquad (11)$$
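The following is a condensed sketch of this bootstrap selection. The function fit_curve, the candidate grid, and the number of bootstrap replications are our own illustrative constructions built on the pointwise estimator sketched earlier, not part of the paper.

```python
import numpy as np

def fit_curve(W, Y, xs, h, h_prime, u):
    # Apply the pointwise procedure sketched earlier over a grid of x values.
    return np.array([jump_preserving_estimate(W, Y, x, h, h_prime, u) for x in xs])

def select_h_prime_u(W, Y, h, candidates, B=50, rng=None):
    """Bootstrap choice of (h_n', u_n) over a list of candidate (h', u) pairs."""
    rng = np.random.default_rng() if rng is None else rng
    xs = np.linspace(0.0, 1.0, len(W))               # evaluation grid
    best_pair, best_score = None, np.inf
    for h_prime, u in candidates:
        g_hat = fit_curve(W, Y, xs, h, h_prime, u)   # estimate from the original data
        score = 0.0
        for _ in range(B):
            idx = rng.integers(0, len(W), len(W))    # resample with replacement
            g_tilde = fit_curve(W[idx], Y[idx], xs, h, h_prime, u)
            score += np.mean((g_tilde - g_hat) ** 2)
        score /= B
        if score < best_score:
            best_pair, best_score = (h_prime, u), score
    return best_pair

# Example usage with an assumed candidate grid:
# candidates = [(hp, u) for hp in (0.05, 0.10) for u in (0.01, 0.02)]
# h_prime, u_n = select_h_prime_u(W, Y, h_n, candidates)
```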
3 Asymptotic Properties
In this section, we discuss some asymptotic properties of the proposed estimation procedure (3)–(9). To this end, we have the theorem below.

Theorem 1. Suppose that the following conditions hold. $\{(W_1, Y_1), \ldots, (W_n, Y_n)\}$ are independent and identically distributed observations from models (1) and (2). $g(\cdot)$ is a bounded, piecewise continuous function defined on $[0, 1]$ with finitely many jump points in $[0, 1]$; at each jump point, $g(\cdot)$ has finite one-sided limits; its first-order derivative, $g'(\cdot)$, is also a bounded function and is continuous on $[0, 1]$ except at those jump points; at each jump point, $g'(\cdot)$ also has finite one-sided limits. Denote the set of all the jump points by $S$. The support of $f_X$ is $[0, 1]$; $f_X$ is uniformly continuous, bounded, and positive on $(0, 1)$ and has bounded derivatives on $(0, 1)$. $f_U$ is continuous on its support, symmetric about $0$ with $f_U(0) > 0$, and satisfies $\int_{-\infty}^{\infty} u f_U(u)\, du = 0$ and $\int_{-\infty}^{\infty} u^2 f_U(u)\, du = 1$. $E|\varepsilon_1|^4 < \infty$. $h_n = o(1)$, $1/(n^{1/3} h_n) = o(1)$, $h_n' = o(1)$, $\sigma_n^2 / h_n' = o(1)$, and $1/(n^{1/3} h_n') = o(1)$. The kernel function $K$ is a Lipschitz-1 continuous density function with support $[-1, 1]$ and is symmetric about $0$. $u_n = \tau^2 + \delta_n$, where $\delta_n$ is a sequence of positive numbers such that $\delta_n = o(1)$ and $[h_n^2 + \sigma_n^2 + (\log n)^{1+\gamma}/(n h_n)^{\beta}]/\delta_n = o(1)$ for some $\gamma > 0$ and some $\beta \in (0, 1/4)$. Then, we have, with probability 1,
$$\hat g_n(x) - g(x) = \begin{cases} O\!\left(h_n^2 + \sigma_n^2 + (\log n)^{1+\gamma}/(n h_n)^{\beta}\right), & \text{if } d_E(x, S) > h_n,\\[2pt] O\!\left(h_n'^2 + \sigma_n^2 + (\log n)^{1+\gamma}/(n h_n')^{\beta}\right), & \text{otherwise}, \end{cases}$$
where $S$ is the set of all true jump points and $d_E(x, S) = \min_{x_s \in S} |x - x_s|$.

Theorem 1 shows that the proposed estimation procedure (3)–(9) estimates $g(\cdot)$ consistently under some regularity conditions. Its proof is given in a supplementary file.
Remark 1. Theorem 1 requires that the measurement error standard deviation $\sigma_n$ tend to 0 as the sample size increases. In the literature, it has been pointed out that this condition is needed for consistently estimating the regression function when the observations of the explanatory variable have measurement errors and when little prior information about the measurement error distribution is available (cf., Delaigle 2008).
Remark 2. The rate of convergence of $\sigma_n$ to 0 does not need to be comparable to that of $h_n$ for our proposed method to flag all jump points correctly (see the supplementary file for details). However, the condition $\sigma_n^2 / h_n' = o(1)$ is required for classifying a jump point into the correct cluster. This is weaker than the condition $\sigma_n / h_n' = o(1)$ that would be required by existing jump regression methods (e.g., Gijbels et al. 2007, Qiu 2003) to ensure consistency.
4 Numerical Studies

4.1 Numerical performance of the proposed methodology
In this subsection, the performance of the proposed estimation procedure is evaluated using the following two true regression functions: g1 (x) = (3x2 + 0.53)1{0.3≤x (log n)1+γ i=1 ( " #) n X ≤ exp −(log n)1+γ E exp ξn gn (i) − E(gn (i)) i=1
=n−(log n)
γ
n Y i=1
E {exp {ξn [gn (i) − E(gn (i))]}} 20
(S.15)
Notice that |ξn [gn (i) − E(gn (i))]| " ( k k #) W − x 1 W − x 1 W − x W − x i i i i = (nhn )1/2 −E K K nhn hn hn nhn hn hn ≤
1 C1 , (nhn )1/2
(S.16)
where we have used the fact that K(·) is bounded with support [−1, 1] and C1 is some positive constant unrelated to x. By (S.16) and the fact that ez ≤ 1 + z + z 2 for |z| ≤ 1/2, we have −(log n)γ
(S.15) ≤n
=n−(log n)
γ
n Y E 1 + ξn [gn (i) − Egn (i)] + ξn2 [gn (i) − Egn (i)]2 i=1 n Y i=1
γ
{1 + var [ξn gn (i)]}
≤n−(log n) exp −(log n)γ
( n X
) var[ξn gn (i)]
i=1
exp nξn2 var[gn (1)] " ( 2k 2k #) W − x W − x 1 γ i i K ≤n−(log n) exp nhn n 2 2 E n hn hn hn "1 # w γ =n−(log n) exp w2k K(w) dw · fX (x) + O hn + σn2
=n
−1
−(log n)γ
≤n
C2 ,
where C2 is some positive constant unrelated to x. By the same arguments, it can also be shown that ( pr
" n # ) X ξn γ E(gn (i)) − gn (i) > ≤n−(log n) C2 . 1+γ (log n) i=1
Therefore, it follows from Borel-Cantelli Lemma that (nhn )1/2 Zk (x) Zk (x) −E →0, (log n)1+γ nhk+1 nhk+1 n n 21
a.s. (n → ∞).
And thus (S.13) holds for l = 0 by the condition that nhn → ∞. Next, we prove (S.13) is also true for l = 1. Let 1 e Yei = Yi 1n|Y | 0 is a very small number to be determined later, k = 0, 1, 2, and i = 1, 2, . . .. Let ζn = (nhn )β . For any given > 0, We have ( " n # ) X ζn pr gen (i) − E [e gn (i)] > (log n)1+γ i=1 ( ( " n #)) X ≤ exp −(log n)1+γ E exp ζn gen (i) − E [e gn (i)] i=1
n Y γ
=n−(log n)
i=1
E {exp [ζn [e gn (i) − E [e gn (i)]]]}
(S.17)
|ζn (e gn (i) − Ee gn (i))| " k k # 1 W − x W − x 1 W − x W − x i i i i Yei K − E Yei K =(nhn )β nhn hn hn nhn hn hn ≤C3 (nhn )β =C3
1 1 +α n2 nhn
1 n
1 −α−β 2
h1−β n
,
(S.18)
where C3 is some positive constant. If α is chosen such that 0.5 − 3α = 2β (need to have β < 1/4), then (S.18) = o(1) by the condition that n1/3 hn → ∞. By ez ≤ 1 + z + z 2 for |z| ≤ 1/2, we have −(log n)γ
(S.17) ≤n
−(log n)γ
≤n
n Y
1 + ζn2 var [e gn (i)]
i=1
exp
( n X
) gn (i)] ζn2 var [e
" ( i=1 2k 2 #) n X W − x W − x ζn2 i i −(log n)γ 2 E Yei K ≤n exp 2 h2 n hn hn n i=1 ( n " 2 #) 2k X ζ2 Wi − x Wi − x n −(log n)γ 2 ≤n exp E Yi K n2 h2n hn hn i=1 22
( "
2k !#) n 2β 2β X n h W − x W − x γ i i n =n−(log n) exp E E(Yi2 | Xi )E K2 Xi 2 2 n hn i=1 hn hn " ( 2k #) W − x W − x C γ 1 1 4 E K2 ≤n−(log n) exp 2−2β 1−2β hn hn n hn " # ∞ w ∞ w C γ 4 =n−(log n) exp w2k K 2 (w)fX (x + whn − σn u)fU (u) dwdu 1−2β 1−2β n hn −∞ −∞ C5 −(log n)γ , ≤n exp (nhn )1−2β where the boundedness of g(·) is used in the second last inequality, the boundedness of fX (·) and its derivatives is used in the last inequality, and C4 and C5 are positive constants unrelated to x. Similar calculation is performed in the last equation as did in (S.14). By the same arguments, we also have ( ! ) n X ζn C5 −(log n)γ pr E [e gn (i)] − gen (i) > ≤ n exp . (log n)1+γ i=1 (nhn )1−2β By Borel-Cantelli Lemma, ζn (log n)1+γ
n X gen (i) − E [e gn (i)] → 0 a.s., as n → ∞.
(S.19)
i=1
Next, we look at " n # X ζn gn (i) − gen (i) (log n)1+γ i=1 k n X 1 Wi − x Wi − x e = (Yi − Yi ) K . (nhn )1−β (log n)1+γ i=1 hn hn
" E
∞ X i=1
# 1{Yi 6=Yei } =
∞ X i=1
∞ X 1 pr |Yi | > i 2 +α ≤ i=1
(S.20)
1
i
2 E Y < ∞, i 1+2α
where 1{·} denotes the indicator function. Thus, there exists an Ω0 such that pr(Ω0 ) = 1 and that for each ω ∈ Ω0 , there exists an Nω > 0, Yn (w) = Yen (w) (n ≥ Nω ). 23
(S.21)
By (S.21), we have " n # X ζn gn (i) − gen (i) → 0, (log n)1+γ i=1
a.s. (n → ∞).
(S.22)
Also, we have n X ζn E [gn (i) − gen (i)] 1+γ (log n) i=1 " k # n X Wi − x Wi − x 1 E Yi 1{|Y |>i 21 +α } K = 1−β 1+γ i (nhn ) (log n) hn hn i=1 " # n k X 1 1 Wi − x Wi − x 2 ≤ E Yi K 1 1−β 1+γ +α (nhn ) (log n) hn hn 2 i=1 i " # n k X 1 1 Wi − x Wi − x = K E E(Yi2 | Xi )E X i 1 1−β 1+γ +α (nhn ) (log n) hn hn 2 i=1 i " k # n X C6 Wi − x 1 Wi − x ≤ K E 1 1−β 1+γ +α (nhn ) (log n) hn hn 2 i=1 i ∞ ∞ n X C6 hn w w k = w K(w)fX (x + whn − σn )fU (u) dwdu (nhn )1−β (log n)1+γ i=1 i 12 +α −∞ −∞ 1 C7 hβn n 2 −α 1−β 1+γ n (log n) hβ =C7 1 −(α+β) n , n2 (log n)1+γ
≤
(S.23)
where the boundedness of g(·) is used in the second last inequality, the boundedness of fX (·) and its derivatives is used the last inequality, and C6 and C7 are some positive constants unrelated to x. Since α + β < 1/2, we have (S.23)= o(1) as well. By (S.19), (S.22), and (S.23), we have " k k # n Wi − x Wi − x Wi − x Wi − x (nhn )β 1 X K − E Yi K Yi (log n)1+γ nhn i=1 hn hn hn hn → 0 a.s., as n → ∞. Therefore, (S.13) holds for l = 1. And it completes the proof of Lemma 2.
Lemma 3. Suppose that $|x - x_s| \ge \delta$ for some $\delta > 0$. Then,
$$\hat a_n(x) = g(x) + O\!\left(h_n^2 + \sigma_n^2 + \frac{(\log n)^{1+\gamma}}{(n h_n)^{\beta}}\right) \quad a.s.,$$
$$\hat b_n(x) = g'(x) + O\!\left(h_n + \frac{\sigma_n^2}{h_n} + \frac{(\log n)^{1+\gamma}}{(n h_n)^{\beta} h_n}\right) \quad a.s.,$$
$$\mathrm{WRMS}_n(x) = \tau^2 + O\!\left(h_n^2 + \sigma_n^2 + \frac{(\log n)^{1+\gamma}}{(n h_n)^{\beta}}\right) \quad a.s.$$

Proof. First, we write
b an (x) =
Z2 (x) 1 nh3n nhn
Pn
i=1
Yi K
Wi −x hn
−
Z0 (x) Z2 (x) nhn nh3n
Z1 (x) 1 nh2n nhn
−
Pn
Wi −x i=1 Yi hn K
Wi −x hn
Z1 (x) Z1 (x) nh2n nh2n
.
(S.24)
Denote m(·) = g(·)fX (·). We have that " # ∞ ∞ n 1 w w 1 X Wi − x v + σn u − x E = Yi K fX (v)fU (u) dudv g(v)K nhn i=1 hn hn −∞ −∞ hn =
w∞ w∞ −∞ −∞
m(x + whn − σn u)K(w)fU (u) dudw
(S.25)
There exists xL < xU such that x ∈ [xL , xU ], |xL − xs | ≥ δ/2, and |xU − xs | ≥ δ/2. Then we can continue the calculation as follows. x −x−whn U 1 wσn w (S.25) = K(w) dw · m(x + whn − σn u)fU (u) du −1
xL −x−whn σn
xL −x−whn σn
+
w
−∞
=
w1 −1
m(x + whn − σn u)fU (u) du + xU −x−whn σn
w
K(w) du
xL −x−whn σn
w∞ xU −x−whn σn
m(x + whn − σn u)fU (u) du
1 m(x) + m0 (x)(whn − σn u) + m00 (x) 2
1 000 3 ·(whn − σn u) + m (ξn )(whn − σn u) fU (u) dudw + O(σn2 ) 6 1 00 1 =v0 m(x) + m (x)v2 h2n + m00 (x)v0 σn2 + O(h3n + σn3 ) + O(σn2 ), 2 2 2
25
(S.26)
where ξn is some real number,
r
|u|>x
fU (u) du = O(1/x2 ) 1 is used in the second last equation,
and the boundedness of m(·) and its derivatives, E(U ) = 0, and var(U ) = 1 are used in the last equation. By Lemma 2 and v3 = 0, we have Z2 (x) 1 1 0 2 00 2 =v f (x) + v f (x)h + v f (x)h + v2 fX00 (x)σn2 + O(h3n + σn3 )+ 2 X 3 4 X n X n 3 nhn 2 2 1+γ (log n) o (nhn )β 1 (log n)1+γ 1 00 2 00 2 3 3 a.s. (S.27) =v2 fX (x) + v4 fX (x)hn + v2 fX (x)σn + O(hn + σn ) + o 2 2 (nhn )β Thus, n Z2 (x) 1 X Wi − x Yi K nh3n nhn i=1 hn 1 1 =v2 v0 fX (x)m(x) + h2n v0 v4 fX00 (x)m(x) + v22 m00 (x)fX (x) + σn2 v0 v2 · 2 2 1+γ (log n) [fX00 (x)m(x) + fX (x)m00 (x)] + O(h2n + σn2 ) + o a.s. (nhn )β Next, we calculate the following " # n X 1 Wi − x Wi − x E Yi K nhn i=1 hn hn n 1 X Wi − x Wi − x = E E [Yi | Xi ] E K X i nhn i=1 hn hn n 1 X Wi − x Wi − x = E m(Xi )E K Xi nhn i=1 hn hn n 1 X Wi − x Wi − x = E E m(Xi ) K Xi nhn i=1 hn hn n 1 X Wi − x Wi − x = E m(Xi ) K nhn i=1 hn hn W1 − x W1 − x 1 = E m(X1 ) K hn hn hn ∞ ∞ w w = wK(w)m(x + whn − σn u)fU (u) dudw −∞ −∞
1
It follows from that
r
|u|>x
fU (u) du ≤
r
|u|>x
u2 /x2 fU (u) du ≤ 1/x2 var(U ) = 1/x2 .
26
(S.28)
=
w1
wK(w)
x −x−wh n U wσn xL −x−whn
−1
1 m(x) + m0 (x)(whn − σn u) + m00 (x)(whn − σn u)2 2
σn
+ O h3n + σn3
i
) fU (u) du
=v1 m(x) + v2 m0 (x)hn +
dw + O σn2
v1 v3 00 m (x)h2n + m00 (x)σn2 + O(h2n + σn2 ) 2 2
=v2 m0 (x)hn + O(h2n + σn2 ). By Lemma 2, we have n 1 X Wi − x Wi − x (log n)1+γ 0 2 2 K Yi =v2 m (x)hn + O hn + σn + a.s. nhn i=1 hn hn (nhn )β
(S.29)
Also, by (S.14), we have Z1 (x) v3 00 v1 00 (log n)1+γ 0 2 2 2 2 =v1 fX (x) + v2 fX (x)hn + fX (x)hn + fX (x)σn + o hn + σn + nh2n 2 2 (nhn )β (log n)1+γ =v2 fX0 (x)hn + o h2n + σn2 + a.s. (S.30) (nhn )β Then, by (S.29) – (S.30), n (log n)1+γ Wi − x Z1 (x) 1 X Wi − x 2 0 0 2 2 2 K Yi =v2 m (x)fX (x)hn + O hn + σn + a.s. nh2n nhn i=1 hn hn (nhn )β (S.31) By (S.28) and (S.31), n Z2 (x) 1 X nh3n nhn
Yi K
i=1
Wi − x hn
n Z1 (x) 1 X Wi − x Wi − x − Yi K nh2n nhn i=1 hn hn
h2n v0 v4 fX00 (x)m(x) + v22 m00 (x)fX (x) − 2v22 m0 (x)fX0 (x) 2 2 v0 v2 σn 00 (log n)1+γ 00 2 2 + [fX (x)m(x) + fX (x)m (x)] + o hn + σn + a.s. 2 (nhn )β
=v0 v2 fX (x)m(x) +
(S.32)
By Lemma 2, we have Z0 (x) Z2 (x) Z1 (x) Z1 (x) − nhn nh3n nh2n nh2n h2 =v0 v2 fX (x)2 + n v22 fX00 (x)fX (x) + v0 v4 fX00 (x)fX (x) − 2v22 fX0 (x)2 2 (log n)1+γ 2 00 2 2 + v0 v2 σn fX (x)f (x) + o hn + σn + a.s. (nhn )β 27
(S.33)
By (S.32) – (S.33), we have
b an (x) =
Z2 (x) 1 nh3n nhn
Pn
i=1 Yi K
Wi −x hn
−
Z0 (x) Z2 (x) nhn nh3n
Z1 (x) 1 nh2n nhn
Pn
Wi −x i=1 Yi hn K
−
Z1 (x) Z1 (x) nh2n nh2n
−
Z1 (x) 1 nh2n nhn
Wi −x hn
n)1+γ v0 v2 fX (x)m(x) + O h2n + σn2 + (log (nhn )β = n)1+γ v0 v2 fX (x)2 + O h2n + σn2 + (log (nhn )β 1+γ (log n) 2 2 =g(x) + O hn + σn + a.s. (nhn )β Similarly, we have bbn (x) =
Z0 (x) 1 nhn nhn
Pn
Wi −x i=1 Yi hn K
hn
h
Wi −x hn
Z0 (x) Z2 (x) nhn nh3n
−
Z1 (x) Z1 (x) nh2n nh2n
Pn
i=1
Yi K
Wi −x hn
i
v0 v2 fX (x)m0 (x)hn − v0 v2 fX0 (x)m(x)hn + O h2n + σn2 + h i = n)1+γ hn v0 v2 fX (x)2 + O h2n + σn2 + (log (nhn )β σ 2 (log n)1+γ a.s. =g 0 (x) + O hn + n + hn (nhn )β hn
(log n)1+γ (nhn )β
It remains to calculate WRMSn (x). i2 Pn h Wi −x b an (x) − bn (x)(Wi − x) K hn i=1 Yi − b WRMSn (x) = Pn Wi −x i=1 K hn P n Wi −x 1 2 i=1 εi K nhn hn = P n Wi −x 1 K i=1 nhn hn h i P n Wi −x 1 b an (x) − bn (x)(Wi − x) K hn i=1 εi g(Xi ) − b nhn +2 Pn Wi −x 1 K i=1 nhn hn h i2 Pn Wi −x 1 b an (x) − bn (x)(Wi − x) K hn i=1 g(Xi ) − b nhn + Pn Wi −x 1 K i=1 nhn hn By the same arguments in Lemma 2, we can show that (log n)1+γ 2 2 2 (S.34) =τ + O hn + σn + a.s. (nhn )β 28
(S.34)
(S.35)
(S.36)
Also notice that, for each i, we have g(Xi ) − b an (x) − bbn (x)(Wi − x) h iW −x i =g(x) − b an (x) + hn g 0 (x) − bbn (x) − g 0 (x)σn Ui hn 2 h2n 00 Wi − x σ2 Wi − x + g (x) + n g 00 (x)Ui2 − hn σn g 00 (x) 2 hn 2 hn + o (Wi − x)2 + σn2 Ui − 2σn (Wi − x) . And (log n)1+γ 2 2 g(x) − b an (x) =O hn + σn + a.s. (nhn )β h i 1+γ (log n) 0 2 2 hn g (x) − bbn (x) =O hn + σn + a.s. (nhn )β By the same arguments in Lemma 2, it can be shown that n 1 X Wi − x (log n)1+γ εi K =o nhn i=1 hn (nhn )β n 1 X Wi − x Wi − x (log n)1+γ εi K =o nhn i=1 hn hn (nhn )β 2 n Wi − x (log n)1+γ 1 X Wi − x εi K =o nhn i=1 hn hn (nhn )β n 1 X Wi − x (log n)1+γ εi Ui K =o nhn i=1 hn (nhn )β n Wi − x (log n)1+γ 1 X 2 =o εi Ui K nhn i=1 hn (nhn )β
a.s. a.s. a.s. a.s. a.s.
The above results can also be seen by applying Lemma 2 directly to the special case when m(·) = 0 (i.e., Yi = εi for each i = 1, 2, · · · ). Hence, (log n)1+γ 2 2 a.s. (S.35) =O hn + σn + (nhn )β (log n)2+2γ 4 2 (S.36) =O hn + σn + a.s. (nhn )2β 29
Therefore, (log n)1+γ 2 2 WRMSn (x) = τ + O hn + σn + a.s. (nhn )β 2
The proof of Lemma 3 has been completed. Lemma 4. Suppose that xs = x + s · hn with s ∈ (−1, 1). Let ds = g+ (xs ) − g− (xs ), where g± (xs ) = limz→xs ± g(z). We have the following results: (i) If σn /hn = o(1), then w1
σn (log n)1+γ b an (x) = g− (xs ) + ds K(w) dw + O + hn + a.s. β h (nh n n) s r1 1+γ wK(w) dw σ (log n) n s bbn (x) = ds +O + hn + a.s. hn v2 hn (nhn )β !2 w1 ws w1 w WRMSn (x) = d2s zK(z) dz K(w)dw K(z) dz + v 2 s −1 s ! 2 1 w1 ws w w + K(z) dz − zK(z) dz K(w)dw v 2 s −1 s σn (log n)1+γ 2 +τ +O + hn + a.s. hn (nhn )β
(ii) If σn /hn → ∞, then hn (log n)1+γ 1 b an (x) = [g− (xs ) + g+ (xs )] + O σn + + , a.s. 2 σn (nhn )β 1+γ bbn (x) = 1 O hn + σn + (log n) , a.s. hn σn (nhn )β d2s hn (log n)1+γ 2 WRMSn (x) = τ + , a.s. +O + σn + 4 σn (nhn )β (iii) If σn /hn = O(1), then b an (x) = g− (xs ) + ds
w∞
fU (u)
w1 σn s+u h n
−∞
(log n)1+γ K(w) dw + O hn + σn + (nhn )β
, a.s.
w∞ w1 1+γ d w 1 (log n) s bbn (x) = K(w) dw + O hn + σn + , a.s. fU (u) hn −∞ v2 hn (nhn )β σn s+u h
n
30
WRMSn (x) =
d2s
σn s+u h
w∞
fU (u)
−∞
+
w∞
+
w∞
w∞
0
fU (u )
w1
K(z) dzdu0
σn s+u0 h
−∞
n
K(z)
z dzdu0 · w K(w) dudw v2
n
w1
fU (u)
w∞
σn s+u h
−∞
w∞
−1
σn s+u0 h
−∞
n
2
w1
fU (u0 )
w
n
w1
σn s+u0 h
0
fU (u )
w
n
K(z) dzdu0
−1
−∞
2
z dzdu0 · w K(w) dudw v2 σn −∞ s+u0 h n (log n)1+γ 2 + τ + O hn + σn + a.s., a.s. (nhn )β
−
fU (u0 )
K(z)
Proof. (i) We have that n X
Wi − x Z2 (x) − Z1 (x)(Wi − x) b an (x) = Yi K hn Z0 (x)Z2 (x) − Z1 (x)2 i=1 n X Wi − x Z2 (x) − Z1 (x)(Wi − x) = g(Xi )K hn Z0 (x)Z2 (x) − Z1 (x)2 i=1 n X Wi − x Z2 (x) − Z1 (x)(Wi − x) + εi K hn Z0 (x)Z2 (x) − Z1 (x)2 i=1 X Wi − x Z2 (x) − Z1 (x)(Wi − x) = g(Xi )K hn Z0 (x)Z2 (x) − Z1 (x)2 Xi |G er (x; h0 )| a.s. when n is sufficiently large and therefore, the conclude that |G n n proof of Theorem 1 is completed.
References

Butler, J. R. (2002). Policy change and private health insurance: Did the cheapest policy do the trick? Australian Health Review, 25(6):33–41.

Carroll, R. J., Maca, J. D., and Ruppert, D. (1999). Nonparametric regression in the presence of measurement error. Biometrika, 86(3):541–554.

Carroll, R. J., Ruppert, D., Stefanski, L. A., and Crainiceanu, C. M. (2012). Measurement error in nonlinear models: a modern perspective. CRC Press.

Comte, F. and Taupin, M.-L. (2007). Nonparametric estimation of the regression function in an errors-in-variables model. Statistica Sinica, 17:1065–1090.

Cook, J. R. and Stefanski, L. A. (1994). Simulation-extrapolation estimation in parametric measurement error models. Journal of the American Statistical Association, 89(428):1314–1328.

Cox, D. (1970). The Analysis of Binary Data. Chapman and Hall, London.

Delaigle, A. (2008). An alternative view of the deconvolution problem. Statistica Sinica, 18(3):1025–1045.

Delaigle, A. and Meister, A. (2007). Nonparametric regression estimation in the heteroscedastic errors-in-variables problem. Journal of the American Statistical Association, 102(480):1416–1426.

Eubank, R. and Speckman, P. (1994). Nonparametric estimation of functions with jump discontinuities. Lecture Notes–Monograph Series, pages 130–144.

Everitt, B. S., Landau, S., Leese, M., and Stahl, D. (2011). Cluster Analysis. John Wiley and Sons, Ltd., 5th edition.

Fan, J. and Masry, E. (1992). Multivariate regression estimation with errors-in-variables: asymptotic normality for mixing processes. Journal of Multivariate Analysis, 43(2):237–271.

Fan, J. and Truong, Y. K. (1993). Nonparametric regression with errors in variables. The Annals of Statistics, 21(4):1900–1925.

Frech, H. E., Hopkins, S., and MacDonald, G. (2003). The Australian private health insurance boom: was it subsidies or liberalised regulation? Economic Papers: A Journal of Applied Economics and Policy, 22(1):58–64.

Gijbels, I. and Goderniaux, A.-C. (2004). Bandwidth selection for changepoint estimation in nonparametric regression. Technometrics, 46(1):76–86.

Gijbels, I., Hall, P., and Kneip, A. (1999). On the estimation of jump points in smooth curves. Annals of the Institute of Statistical Mathematics, 51(2):231–251.

Gijbels, I., Lambert, A., and Qiu, P. (2007). Jump-preserving regression and smoothing using local linear fitting: a compromise. Annals of the Institute of Statistical Mathematics, 59(2):235–272.

Hall, P. and Meister, A. (2007). A ridge-parameter approach to deconvolution. The Annals of Statistics, 35(4):1535–1558.

Hartigan, J. and Wong, M. (1979). Algorithm AS 136: A k-means clustering algorithm. Journal of the Royal Statistical Society, Series C (Applied Statistics), 28(1):100–108.

Kang, Y., Gong, X., Gao, J., and Qiu, P. (2015). Jump detection in generalized error-in-variables regression with an application to Australian health tax policies. The Annals of Applied Statistics, 9:883–900.

Kang, Y. and Qiu, P. (2014). Jump detection in blurred regression surfaces. Technometrics, 56:539–550.

McDonald, J. A. and Owen, A. B. (1986). Smoothing with split linear fits. Technometrics, 28(3):195–208.

Müller, C. H. (2002). Robust estimators for estimating discontinuous functions. Metrika, 55:99–109.

Müller, H.-G. (1992). Change-points in nonparametric regression analysis. The Annals of Statistics, 20(2):737–761.

Palangkaraya, A. and Yong, J. (2005). Effects of recent carrot-and-stick policy initiatives on private health insurance coverage in Australia. Economic Record, 81(254):262–272.

Palangkaraya, A., Yong, J., Webster, E., and Dawkins, P. (2009). The income distributive implications of recent private health insurance policy reforms in Australia. The European Journal of Health Economics, 10(2):135–148.

Qiu, P. (1991). Estimation of a kind of jump regression functions. Journal of Systems Science and Complexity, 4:1–13.

Qiu, P. (2003). A jump-preserving curve fitting procedure based on local piecewise-linear kernel estimation. Journal of Nonparametric Statistics, 15(4-5):437–453.

Qiu, P. (2005). Image Processing and Jump Regression Analysis. John Wiley & Sons.

Qiu, P., Asano, C., and Li, X. (1991). Estimation of jump regression function. Bulletin of Informatics and Cybernetics, 24:197–212.

Qiu, P. and Kang, Y. (2015). Blind image deblurring using jump regression analysis. Statistica Sinica, 25:879–899.

Staudenmayer, J. and Ruppert, D. (2004). Local polynomial regression and simulation–extrapolation. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 66(1):17–30.

Stefanski, L. (2000). Measurement error models. Journal of the American Statistical Association, 95(452):1353–1358.

Stefanski, L. and Cook, J. (1995). Simulation-extrapolation: the measurement error jackknife. Journal of the American Statistical Association, 90(432):1247–1256.

Taupin, M.-L. (2001). Semi-parametric estimation in the nonlinear structural errors-in-variables model. Annals of Statistics, 29(1):66–93.

Wu, J. and Chu, C. (1993). Kernel-type estimators of jump points and values of a regression function. The Annals of Statistics, 21(3):1545–1566.