Department of Social Systems and Management Discussion Paper Series No. 1257

On variable variance-stabilizing bandwidth for the Nadaraya-Watson regression estimator by

Kiheiji NISHIDA and Yuichiro J. KANAZAWA April 2010

UNIVERSITY OF TSUKUBA Tsukuba, Ibaraki 305-8573 JAPAN

On variable variance-stabilizing bandwidth for the Nadaraya-Watson regression estimator

Kiheiji NISHIDA$^{a,*}$, Yuichiro J. KANAZAWA$^{b}$

$^{a}$ Doctoral Program in Social Systems Engineering, Graduate School of Systems and Information Engineering, University of Tsukuba, 1-1-1 Ten-noh-dai, Tsukuba, Ibaraki 305-8573, Japan.
$^{b}$ Department of Social Systems and Management, Graduate School of Systems and Information Engineering, University of Tsukuba, 1-1-1 Ten-noh-dai, Tsukuba, Ibaraki 305-8573, Japan.

Abstract

In linear regression under heteroscedastic variances, the Aitken estimator is employed to counter heteroscedasticity. Employing the same principle, we propose the Nadaraya-Watson regression estimator with variable variance-stabilizing bandwidth (VS bandwidth) that minimizes the asymptotic MISE (AMISE) while maintaining asymptotic homoscedasticity. We examine its local and global properties relative to the MISE minimizing fixed bandwidth. The proposed VS bandwidth produces an asymptotic variance smaller on some part of the support than the fixed bandwidth, and may in some cases achieve smaller AMISE overall than its fixed counterpart. In numerical examples, we find that the proposed VS bandwidth is more serviceable when the distribution of the $X_i$'s is flatter.

Key words: Aitken estimator, Bandwidth selection, The Nadaraya-Watson estimator, Variable bandwidth, Variance-stabilization.

$^{*}$ Corresponding author. Email address: [email protected] (Kiheiji NISHIDA)

1. Introduction

Suppose we construct a Nadaraya-Watson regression estimator (henceforth the NW estimator; Nadaraya, 1964, 1965, 1970; Watson, 1964; Watson and Leadbetter, 1963) from a pair of random observations $(x_i, y_i)$, $i = 1, \ldots, n$. This setup, though limited in its applicability, highlights the mechanism under which the proposed variance-stabilizing NW estimator's MISE performs relative to that under the conventional NW estimator, via the ratio of two "densities" to be discussed later in (11). We assume the response $Y_i$ is influenced by the explanatory variable $X_i$ in the form of $m(X_i)$ and the disturbance $U_i$ as

$$Y_i = m(X_i) + U_i, \qquad (1)$$

where the $X_i$'s, $i = 1, \ldots, n$, are i.i.d. random variables whose marginal density is defined as $f_X(x)$ on support $I$, and the $U_i$'s, $i = 1, \ldots, n$, are identically distributed random variables conditionally independent given $X_i$ and independent of $X_j$, $i \neq j$. We assume that their conditional moments are

$$E_{U_i|X_i}[U_i \mid X_i = x] = 0, \qquad E_{U_i|X_i}[U_i^2 \mid X_i = x] = \sigma^2(x) < \infty.$$

Under these assumptions, the function $m(X_i)$ in (1) is expressed as the conditional expectation $m(X_i) = E_{Y_i|X_i}(Y_i \mid X_i) = \int y f_{Y|X}(y \mid x)\,dy = \int y f_{X,Y}(x,y)/f_X(x)\,dy$, where $f_{X,Y}(x,y)$ is the bivariate density function of $(X, Y)$. By replacing $f_{X,Y}(x,y)$ and $f_X(x)$ with their kernel bivariate and univariate estimates, the former with the multiplicative kernel $K_{X,Y}(x,y) = K_X(x)K_Y(y)$ of respective bandwidths $h_x$ and $h_y$ on $X$ and $Y$, and the latter with the kernel $K_X(x)$ of bandwidth $h_x$ on $X$, we arrive at the usual NW estimator at $X = x$,

$$\hat m_{h_x}(x) = \frac{\sum_{i=1}^{n} K_X\!\left(\frac{x - X_i}{h_x}\right) Y_i}{\sum_{i=1}^{n} K_X\!\left(\frac{x - X_i}{h_x}\right)}.$$

We write the kernel function $K_X(t)$ as $K(t)$ for brevity. The variance and bias of the NW estimator are well known (see e.g. Härdle et al., 2004; Pagan and Ullah, 1999). With the standard set of assumptions K1-K3 on the kernel and the additional assumptions A1-A6 on $f_X(x)$, $\sigma^2(x)$, $m(x)$ and $h_x$ presented in section 2, the theoretical bias of the NW estimator $\hat m_{h_x}(x)$ at $x$ is

$$E_{X,Y}[\hat m_{h_x}(x)] - m(x) = \frac{h_x^2}{2}\,\frac{\alpha(x)}{f_X(x)} \int t^2 K(t)\,dt + O\!\left(\frac{1}{nh_x}\right) + o(h_x^2), \qquad (2)$$

where, for notational convenience, we write

$$\alpha(x) = 2\, m^{(1)}(x) f_X^{(1)}(x) + m^{(2)}(x) f_X(x). \qquad (3)$$

Similarly, the theoretical variance at $x$ is known to be

$$V_{X,Y}[\hat m_{h_x}(x)] = \frac{1}{nh_x}\,\frac{\sigma^2(x)}{f_X(x)} \int K^2(t)\,dt + O\!\left(\frac{1}{n}\right) + o\!\left(\frac{1}{nh_x}\right). \qquad (4)$$
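The estimator itself is easy to compute directly. The following is a minimal sketch of $\hat m_{h_x}(x)$, assuming the Epanechnikov kernel used in the numerical examples below; the function names and the simulated data are our own illustration, not part of the paper.

```python
import numpy as np

def epanechnikov(t):
    # K(t) = 0.75 (1 - t^2) on |t| <= 1; a symmetric density with
    # int t^2 K(t) dt = 1/5 and int K^2(t) dt = 3/5.
    return np.where(np.abs(t) <= 1.0, 0.75 * (1.0 - t * t), 0.0)

def nw_estimate(x, X, Y, h):
    # hat m_h(x) = sum_i K((x - X_i)/h) Y_i / sum_i K((x - X_i)/h), as in (1)-(4)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    w = epanechnikov((x[:, None] - X[None, :]) / h)   # (len(x), n) weight matrix
    return (w * Y[None, :]).sum(axis=1) / w.sum(axis=1)

# toy usage on Y_i = m(X_i) + U_i with m(x) = 4 x^3
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, 1000)
Y = 4 * X**3 + rng.normal(0.0, 0.5, size=1000)
print(nw_estimate([0.25, 0.50, 0.75], X, Y, h=0.1))   # roughly 0.06, 0.5, 1.7
```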

On finding a bandwidth selection rule, Härdle et al. said, "[W]e are faced with the problem of finding a bandwidth-selection rule that has desirable theoretical properties and is applicable in practice" (Härdle et al., 2004, p.94). One such desirable theoretical property is pointwise convergence in probability of the proposed estimator to the underlying regression function, which naturally leads to a bandwidth selection rule that minimizes the mean squared error (MSE) at a single point. But if we are interested in how well the entire $m(x)$ is estimated over the support $I$, then we would use a global measure of closeness, such as the mean integrated squared error (MISE),

$$\mathrm{MISE}(m(x), \hat m_{h_x}(x)) = \int_I \int\!\cdots\!\int \left[m(x) - \hat m_{h_x}(x)\right]^2 \prod_{i=1}^{n} f_{X_i,Y_i}(x_i, y_i)\,dx_i\,dy_i\; f_X(x)\,dx, \qquad (5)$$

and the fixed bandwidth minimizing the MISE is

$$h_{fixed} = \left[\frac{\int K^2(t)\,dt\,\int_I \sigma^2(x)\,dx}{\left(\int t^2 K(t)\,dt\right)^2 \int_I \frac{\alpha^2(x)}{f_X(x)}\,dx}\right]^{1/5} n^{-1/5}. \qquad (6)$$

It is known that the NW estimator with this fixed bandwidth converges in mean square, and therefore converges in probability pointwise to $m(x)$, under the additional assumptions K4-K5 (see e.g. Härdle et al., 2004, pp.93-94). Substituting (6) for $h_x$ in (4) gives the leading term of the theoretical variance (henceforth the asymptotic variance, abbreviated AV),

$$AV_{X,Y}\!\left[\hat m_{h_{fixed}}(x)\right] = \frac{\sigma^2(x)}{f_X(x)} \left[\int K^2(t)\,dt\right]^{4/5} \left[\int t^2 K(t)\,dt\right]^{2/5} \left[\frac{\int_I \frac{\alpha^2(x)}{f_X(x)}\,dx}{\int_I \sigma^2(x)\,dx}\right]^{1/5} n^{-4/5}. \qquad (7)$$

The MISE minimizing fixed bandwidth in (6) does not achieve asymptotic homoscedasticity of $\hat m_{h_x}(x)$ up to the leading term unless $f_X(x)$ and $\sigma^2(x)$ are of the same functional form on $I$. To numerically illustrate the point, we consider a case where the regression function is $m(x) = 4x^3$, the density $f_X(x)$ is the normal $N(0.5, 0.04)$ truncated on $I = [0, 1]$, and the conditional variance increases as $m(x)$ increases, say $\sigma^2(x) = (x + 0.5)^2$. As a kernel function, we use the Epanechnikov kernel. The left panel of Figure 1 plots $m(x)$, $m(x) \pm AV_{X,Y}^{1/2}[\hat m_{h_{fixed}}(x)]$ and $f_X(x)$ along with one set of randomly generated $(X_i, Y_i)$, $i = 1, \ldots, 1000$. We observe that the asymptotic variances in (7) are larger at both tails.
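A data-generating sketch for this illustration follows. The Gaussian form of the disturbances is our assumption for the purpose of the sketch; the paper only fixes the first two conditional moments.

```python
# Simulated data behind Figure 1 (our reading): X ~ N(0.5, 0.2^2) truncated
# to [0,1]; Y = 4 X^3 + U with sd(U | X = x) = x + 0.5, i.e. sigma^2(x) = (x + 0.5)^2.
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(1)
a, b = (0.0 - 0.5) / 0.2, (1.0 - 0.5) / 0.2   # standardized truncation bounds
X = truncnorm.rvs(a, b, loc=0.5, scale=0.2, size=1000, random_state=rng)
U = rng.normal(0.0, X + 0.5)                  # heteroscedastic; Gaussian by assumption
Y = 4 * X**3 + U
```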

[Figure 1 appears here: two scatter plots of the 1000 simulated points. Left panel: $m(x)$, $m(x) \pm V^{1/2}[\hat m(x)]$ for $h = h_{fixed}$, and $f_X(x)$. Right panel: $m(x)$, $m(x) \pm V^{1/2}[\hat m(x)]$ for $h = h_{VS}$, and $f_X(x)$.]

Figure 1: Plots of the asymptotic variances. The conditional variance $\sigma^2(x) = (x + 0.5)^2$ increases as $m(x) = 4x^3$ increases, and $f_X(x)$ is a normal distribution $N(0.5, 0.04)$ truncated on $[0, 1]$.

In linear regression under non-spherical or heteroscedastic variances, the generalized least-squares or Aitken estimator is employed to account for the differences in variances at different $x$'s. We feel that the NW estimator can be improved by employing the same principle of variance-stabilization. If we let the bandwidth $h_x$ vary, but in the form

$$h_{VS}(x) = h_0\,\frac{\sigma^2(x)}{f_X(x)}, \qquad (8)$$

where $h_0$ is a constant possibly dependent on the kernel $K(t)$, the regression function $m(x)$, the density $f_X(x)$ of $X$ and the conditional variance function $\sigma^2(x)$, averaged over $I$, then it achieves asymptotic homoscedasticity. Hence, a variable bandwidth of the form (8) is variance-stabilizing (henceforth VS). This variable VS bandwidth can be justified for two reasons. First, in the domain where the density $f_X(x)$ is low, (8) forces one to choose a wider bandwidth. The $f_X(x)$ being low means that there are relatively few data points in the region, so aggregating them over a wider region to estimate the regression function $m(x)$ makes sense. Second, in the region where the variances $\sigma^2(x)$ are low, the bandwidths should be narrower, because the small variances imply that the data points there are more accurate. To determine $h_0$ in (8), we choose $h_0$ so as to minimize the MISE among the class of bandwidths (8). The variable VS bandwidth so obtained is

$$h_{VS}(x) = \left[\frac{\int K^2(t)\,dt}{\left(\int t^2 K(t)\,dt\right)^2 \int_I \frac{\sigma^8(x)\,\alpha^2(x)}{f_X^5(x)}\,dx}\right]^{1/5} n^{-1/5}\;\frac{\sigma^2(x)}{f_X(x)}, \qquad (9)$$

and the corresponding asymptotic variance is

$$AV_{X,Y}\!\left[\hat m_{h_{VS}}(x)\right] = \left[\int K^2(t)\,dt\right]^{4/5} \left[\int t^2 K(t)\,dt\right]^{2/5} \left[\int_I \frac{\sigma^8(x)\,\alpha^2(x)}{f_X^5(x)}\,dx\right]^{1/5} n^{-4/5}. \qquad (10)$$

The derivation of $h_0$ is in the Appendix. It is easy to see that the proposed bandwidth enables $\hat m_{h_{VS}}(x)$ to converge in probability to $m(x)$ pointwise under conditions K1-K5 and A1-A6, because the MSE at every point $x$ is of the same order of magnitude $n^{-4/5}$ as under the MISE minimizing fixed bandwidth in (6). To numerically illustrate the effect of this variance stabilization, see the right panel of Figure 1 for the aforementioned $m(x)$, $f_X(x)$, $\sigma^2(x)$ and $m(x) \pm AV_{X,Y}^{1/2}[\hat m_{h_{VS}}(x)]$, along with the same set of randomly generated $(X_i, Y_i)$, $i = 1, \ldots, 1000$. We observe that the asymptotic variance (10) stays constant.

Now we need to ask ourselves three questions. First, how well does the "stabilized" asymptotic variance in (10) fare relative to the asymptotic variance in (7) locally? Second, how well does the proposed VS bandwidth in (9) perform globally in comparison with the fixed bandwidth in (6) in terms of, say, the MISE? Even if these questions can be answered affirmatively, we wonder whether the proposed VS bandwidth in (9) is feasible, in the sense that a sample-based criterion function for it, such as the cross-validation statistic, can be formulated. In this paper, we answer the first two questions affirmatively in propositions 1, 2 and 3 in section 2. As for the last question, it is beyond the scope of this paper given the available space. In section 3, we present two illustrative examples to demonstrate propositions 1 and 3 and to investigate how fat-tailedness of the distribution of the $X_i$'s affects the MISE. In section 4, we comment on the proposed VS bandwidth.
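Before turning to the asymptotic results, note that (6)-(7) and (9)-(10) are straightforward to evaluate numerically. The sketch below does so for the Figure 1 setup; it is our own implementation under the stated assumptions (Epanechnikov constants $\int K^2 = 3/5$, $\int t^2 K = 1/5$), with illustrative names.

```python
# Evaluate (6)-(7) against (9)-(10) for m(x) = 4x^3, X ~ N(0.5, 0.2^2)
# truncated to [0,1], sigma^2(x) = (x + 0.5)^2, n = 1000.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

RK, MU2 = 3.0 / 5.0, 1.0 / 5.0           # int K^2(t) dt and int t^2 K(t) dt
n, s = 1000, 0.2
Z = norm.cdf(2.5) - norm.cdf(-2.5)       # mass of N(0.5, 0.2^2) on [0, 1]

f  = lambda x: norm.pdf(x, 0.5, s) / Z                        # f_X(x)
fp = lambda x: -(x - 0.5) / s**2 * f(x)                       # f_X^(1)(x)
s2 = lambda x: (x + 0.5) ** 2                                 # sigma^2(x)
alpha = lambda x: 2 * (12 * x**2) * fp(x) + (24 * x) * f(x)   # (3): m' = 12x^2, m'' = 24x

I_s2  = quad(s2, 0, 1)[0]
I_a2f = quad(lambda x: alpha(x) ** 2 / f(x), 0, 1)[0]
I_vs  = quad(lambda x: s2(x) ** 4 * alpha(x) ** 2 / f(x) ** 5, 0, 1)[0]

h_fixed = (RK * I_s2 / (MU2**2 * I_a2f)) ** 0.2 * n ** -0.2                     # (6)
av_fixed = lambda x: s2(x) / f(x) * (RK**4 * MU2**2 * I_a2f / I_s2) ** 0.2 * n ** -0.8  # (7)

h0 = (RK / (MU2**2 * I_vs)) ** 0.2 * n ** -0.2
h_vs = lambda x: h0 * s2(x) / f(x)                            # (9): wide where f_X is low
av_vs = (RK**4 * MU2**2 * I_vs) ** 0.2 * n ** -0.8            # (10): flat in x

for x in (0.05, 0.50, 0.95):
    print(f"x={x:.2f}  AV_fixed={av_fixed(x):.4f}  AV_VS={av_vs:.4f}  h_VS(x)={h_vs(x):.3f}")
# AV_fixed grows toward both tails (left panel of Figure 1); AV_VS is constant (right panel).
```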

2. Asymptotic results

Suppose that the kernel function satisfies:

K1 The kernel is a density function $K(t)$ symmetric about zero.
K2 $\int t^2 K(t)\,dt < \infty$.
K3 $\int K^2(t)\,dt < \infty$.
K4 $\int |K(t)|\,dt < \infty$.
K5 $\lim_{|t| \to \infty} |t K(t)| = 0$.

We place the following standard set of assumptions:

A1 $h_x \to 0$ as $n$ goes to infinity.
A2 $nh_x \to \infty$ as $n$ goes to infinity.
A3 The density of $X$ satisfies $0 < f_X(x) < \infty$ on a compact support $I$.
A4 The $f_X(x)$ is bounded continuously differentiable on $I$. At the left (right) boundary, $f_X(x)$ is right (left) differentiable.
A5 The conditional variance $\sigma^2(x)$ is continuous with respect to $x$, and $0 < \sigma^2(x) < \infty$ on $I$.
A6 The regression function $m(x)$ is twice bounded continuously differentiable, and is bounded and not identically constant on $I$. At the left (right) boundary, $m(x)$ is twice right (left) differentiable.

Define the ratio of two "densities", of the $\sigma^2(x)$ to the $f_X(x)$, or

$$\beta(x) = \frac{\sigma^2(x)\big/\int_I \sigma^2(x)\,dx}{f_X(x)\big/\int_I f_X(x)\,dx}. \qquad (11)$$

Consequently, by A3, A4 and A5, we can define $x_{max} = \arg\max_{x \in I} \beta(x)$ and $x_{min} = \arg\min_{x \in I} \beta(x)$. Note that the quantity $\beta(x)$ in (11), by virtue of the fact that it is a ratio of two "densities", satisfies $\beta(x_{min}) \le 1 \le \beta(x_{max})$. Then we show in proposition 1 that the asymptotic homoscedastic variance in (10) is bounded above and below by its heteroscedastic counterparts in (7).

Proposition 1: The asymptotic homoscedastic variance $AV_{X,Y}[\hat m_{h_{VS}}(x)]$ in (10) is bounded above and below by the maximal and the minimal heteroscedastic counterparts in (7) for all $x \in I$, or

$$AV_{X,Y}\!\left[\hat m_{h_{fixed}}(x_{min})\right] \le AV_{X,Y}\!\left[\hat m_{h_{VS}}(x)\right] \le AV_{X,Y}\!\left[\hat m_{h_{fixed}}(x_{max})\right].$$

Proof: Subtracting the leading term of (10) raised to the fifth power from that of (7) raised to the fifth power obtains

$$AV_{X,Y}^5\!\left[\hat m_{h_{fixed}}(x)\right] - AV_{X,Y}^5\!\left[\hat m_{h_{VS}}(x)\right] = C_0\, n^{-4} \left(\beta^5(x)\int_I \frac{\alpha^2(z)}{f_X(z)}\,dz - \int_I \beta^4(z)\,\frac{\alpha^2(z)}{f_X(z)}\,dz\right), \qquad (12)$$

where $C_0 = \left(\int K^2(t)\,dt\right)^4 \left(\int t^2 K(t)\,dt\right)^2 \left(\int_I \sigma^2(z)\,dz\right)^4 > 0$. Evaluating (12) at $x = x_{max}$, we obtain

$$AV_{X,Y}^5\!\left[\hat m_{h_{fixed}}(x_{max})\right] - AV_{X,Y}^5\!\left[\hat m_{h_{VS}}(x_{max})\right] = C_0\, n^{-4} \int_I \left(\beta^5(x_{max}) - \beta^4(z)\right)\frac{\alpha^2(z)}{f_X(z)}\,dz \ge 0,$$

because $\beta^5(x_{max}) \ge \beta^4(x_{max}) \ge \beta^4(z)$ and $\alpha^2(z)/f_X(z) \ge 0$. Similarly, evaluating (12) at $x = x_{min}$, we obtain

$$AV_{X,Y}^5\!\left[\hat m_{h_{fixed}}(x_{min})\right] - AV_{X,Y}^5\!\left[\hat m_{h_{VS}}(x_{min})\right] = C_0\, n^{-4} \int_I \left(\beta^5(x_{min}) - \beta^4(z)\right)\frac{\alpha^2(z)}{f_X(z)}\,dz \le 0,$$

because $\beta^5(x_{min}) \le \beta^4(x_{min}) \le \beta^4(z)$ and $\alpha^2(z)/f_X(z) \ge 0$. Since the asymptotic variance in (10) does not depend on $x$, these two inequalities give the claimed bounds for all $x \in I$. $\Box$

Remark 1: We mentioned before that if $f_X(x)$ and $\sigma^2(x)$ are of the same functional form, $AV_{X,Y}[\hat m_{h_{fixed}}(x)]$ is constant. If this is the case, $f_X(x)\big/\int_I f_X(x)\,dx$ coincides with $\sigma^2(x)\big/\int_I \sigma^2(x)\,dx$, and the asymptotic heteroscedastic variance in (7) is identical to its homoscedastic counterpart in (10). In other words, our proposed VS bandwidth $h_{VS}$ is a generalization of $h_{fixed}$.

Let us call the leading term of the $\mathrm{MISE}(m(x), \hat m_{h_x}(x))$ in (5) the asymptotic MISE, or $\mathrm{AMISE}(m(x), \hat m_{h_x}(x))$, and call the leading term of the bias in (2) the asymptotic bias. Similarly, we call the asymptotic variance (bias squared) integrated with respect to $f_X(x)$ on $I$ the integrated asymptotic variance (integrated asymptotic bias squared). Then we have the following result on the global performance of the proposed VS bandwidth in (9) and its fixed counterpart in (6).

Proposition 2: The ratio of the integrated asymptotic variance to the integrated asymptotic bias squared under the proposed VS bandwidth $h_{VS}(x)$ remains the same 4 to 1 as the ratio under the MISE minimizing fixed bandwidth $h_{fixed}$.

Proof: It is well known that the relation

$$\mathrm{AMISE}(m(x), \hat m_{h_{fixed}}(x)) = \frac{5}{4}\int_I AV_{X,Y}\!\left[\hat m_{h_{fixed}}(x)\right] f_X(x)\,dx$$

holds, because the AMISE is written as a function of a bandwidth $h_x$ as $(1/(nh_x))\,C_1 + (h_x^4/4)\,C_2$, where $(1/(nh_x))\,C_1$ and $(h_x^4/4)\,C_2$ are respectively the integrated asymptotic variance and the integrated asymptotic bias squared under the bandwidth $h_x$; at the minimizing $h_{fixed}$, $h_{fixed}^5 = C_1/(nC_2)$, so the bias-squared term is one fourth of the variance term. As seen from (A.2), the same ratio of 4 to 1 is maintained between the integrated asymptotic variance and the integrated asymptotic bias squared for $h_{VS}$, so

$$\mathrm{AMISE}(m(x), \hat m_{h_{VS}}(x)) = \frac{5}{4}\, AV_{X,Y}\!\left[\hat m_{h_{VS}}(x)\right]. \qquad \Box$$

Proposition 2 says that the AMISE can be measured by the integrated asymptotic variance. We would expect from this and proposition 1 that the resulting AMISE under $\hat m_{h_{VS}}(x)$ cannot always be larger than the AMISE under $\hat m_{h_{fixed}}(x)$.

Proposition 3: $\mathrm{AMISE}(m(x), \hat m_{h_{VS}}(x))$ is not larger than $\mathrm{AMISE}(m(x), \hat m_{h_{fixed}}(x))$ for all choices of $(f_X(x), \sigma^2(x), m(x))$; that is, for some choices the proposed VS bandwidth attains the smaller AMISE.

Proof: It suffices to show that there are cases where

$$\mathrm{AMISE}^5(m(x), \hat m_{h_{fixed}}(x)) - \mathrm{AMISE}^5(m(x), \hat m_{h_{VS}}(x)) = C_0 \left(\frac{5}{4}\right)^5 n^{-4} \int_I \left(1 - \beta^4(x)\right)\frac{\alpha^2(x)}{f_X(x)}\,dx > 0. \qquad (13)$$

Since the $\beta(x)$ in (11) is the ratio of two "densities", $\beta(x)$ cannot be greater than 1 for all $x$. Then $\alpha(x)$ in (3), entering through the term $\alpha^2(x)/f_X(x)$, is a function of $m(x)$ and $f_X(x)$ only, and as such $m(x)$ can be chosen independently of $1 - \beta^4(x)$, which is a function of $\sigma^2(x)$ and $f_X(x)$ only. $\Box$

Remark 2: The inequality (13) holds when the term $\alpha^2(x)/f_X(x)$ is large in the area where $\beta(x) < 1$, while $\alpha^2(x)/f_X(x)$ is small in the area where $\beta(x) > 1$.

3. Illustrative examples

Example 1. To demonstrate propositions 1 and 3, we set $f_X(x) = 2(2 - x)/3$ and $m(x) = (x - 2)^{-2}$ on the domain $I = [0, 1]$, as in the left panel of Figure 2. This determines $\alpha^2(x)/f_X(x) = (8/3)(2 - x)^{-7}$, a monotone increasing function on $I$, as seen in the right panel of Figure 2. At the same time, this choice of $f_X(x)$ dictates the choice of $\sigma^2(x)$ to be a function of $(2 - x)$ in order for $\beta(x) = 1$. Let us set $\sigma^2(x) = 0.1(2 - x)$, or $\sigma^2(x)\big/\int_I \sigma^2(x)\,dx = 2(2 - x)/3$. It is easy to see from (11) that $\beta(x) = 1$, and from (13) that in this case $\mathrm{AMISE}(m(x), \hat m_{h_{VS}}(x))$ is equal to $\mathrm{AMISE}(m(x), \hat m_{h_{fixed}}(x))$. In view of Remark 2 and the fact that $\alpha^2(x)/f_X(x)$ is monotone increasing,

we realize that $\mathrm{AMISE}(m(x), \hat m_{h_{VS}}(x))$ is larger than $\mathrm{AMISE}(m(x), \hat m_{h_{fixed}}(x))$ when $\sigma^2(x)$ is rotated about $x = 0.5$ clockwise as in the left panel of Figure 3, say to $\sigma^2(x) = 0.1(3 - 2x)$, or $\sigma^2(x)\big/\int_I \sigma^2(x)\,dx = (3 - 2x)/2$, and that the reverse occurs when $\sigma^2(x)$ is rotated counterclockwise as in the same panel, say to $\sigma^2(x) = 0.1(x + 1)$, or $\sigma^2(x)\big/\int_I \sigma^2(x)\,dx = 2(x + 1)/3$. Additionally, we examine the relative magnitude of these two measures when $\sigma^2(x)$ is constant, $\sigma^2(x) = 0.1$, or $\sigma^2(x)\big/\int_I \sigma^2(x)\,dx = 1$. For each conditional variance function $\sigma^2(x)$ above, we show in Table 1 the asymptotic homoscedastic variance in (10), the maximal and minimal values of the asymptotic heteroscedastic variance in (7), and the ratio of the two AMISE's,

$$r(f_X(x), \sigma^2(x), m(x)) = \frac{\mathrm{AMISE}(m(x), \hat m_{h_{VS}}(x))}{\mathrm{AMISE}(m(x), \hat m_{h_{fixed}}(x))}. \qquad (14)$$

It is easy to see that the asymptotic homoscedastic variance in (10) is sandwiched by its maximal and minimal counterparts in (7), as proposition 1 claims, for a given $\sigma^2(x)$. We also observe that the ratio $r$ moves from 1.5133 to 0.8904, as proposition 3 implies. From the two right panels in Figures 2 and 3, we see that $\sigma^2(x) = 0.1(3 - 2x)$ creates the situation described in Remark 2.

$\sigma^2(x)$      homo. var. in (10)    hetero. var. in (7)                         $r$ in (14)
                                         min. (arg min)       max. (arg max)
$0.1(x+1)$         0.0981                0.0324 ($x = 0$)     0.1297 ($x = 1$)       1.5133
$0.1$              0.0587                0.0351 ($x = 0$)     0.0703 ($x = 1$)       1.2525
$0.1(2-x)$         0.0648                0.0648               0.0648                 1.0000
$0.1(3-2x)$        0.0727                0.0612 ($x = 1$)     0.0918 ($x = 0$)       0.8904

Table 1: The result of numerical calculation for illustrative example 1. The values of the homoscedastic variance in (10), the maximal and the minimal of the asymptotic heteroscedastic variances in (7), and the ratio $r$ of the AMISE's in (14) are calculated numerically for the four $\sigma^2(x)$, for a given $(f_X(x) = 2(2 - x)/3$, $\sigma^2(x)$, $m(x) = (x - 2)^{-2})$. In calculating variances, $n$ is set to 1 and the Epanechnikov kernel is employed. The calculated numbers are rounded to the fourth decimal place.
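The entries of Table 1 can be reproduced from (7), (10) and (13)-(14) in a few lines. The sketch below is our own implementation for this example; it uses the identity $r^5 = \int_I \beta^4(x)\,\alpha^2(x)/f_X(x)\,dx \big/ \int_I \alpha^2(x)/f_X(x)\,dx$, our reading of the fifth-power expressions behind (13).

```python
# Reproduce Table 1: homoscedastic AV (10), extremes of heteroscedastic AV (7),
# and the AMISE ratio r (14), for the four sigma^2(x) of example 1 (n = 1).
import numpy as np
from scipy.integrate import quad

RK, MU2 = 3.0 / 5.0, 1.0 / 5.0                      # Epanechnikov kernel constants
f = lambda x: 2.0 * (2.0 - x) / 3.0                 # f_X(x)
a2f = lambda x: (8.0 / 3.0) * (2.0 - x) ** -7.0     # alpha^2(x) / f_X(x)
I_a2f = quad(a2f, 0, 1)[0]

cases = [("0.1(x+1) ", lambda x: 0.1 * (x + 1.0)),
         ("0.1      ", lambda x: 0.1 + 0.0 * x),
         ("0.1(2-x) ", lambda x: 0.1 * (2.0 - x)),
         ("0.1(3-2x)", lambda x: 0.1 * (3.0 - 2.0 * x))]
for label, s2 in cases:
    I_s2 = quad(s2, 0, 1)[0]                                         # int sigma^2 dx
    I_vs = quad(lambda x: s2(x) ** 4 * a2f(x) / f(x) ** 4, 0, 1)[0]  # int s^8 a^2 / f^5
    homo = (RK ** 4 * MU2 ** 2 * I_vs) ** 0.2                        # (10) with n = 1
    het = lambda x: s2(x) / f(x) * (RK ** 4 * MU2 ** 2 * I_a2f / I_s2) ** 0.2  # (7)
    ends = (het(0.0), het(1.0))               # (7) is monotone in x for these cases
    r = (I_vs / (I_s2 ** 4 * I_a2f)) ** 0.2                          # (14) via (13)
    print(label, round(homo, 4), round(min(ends), 4), round(max(ends), 4), round(r, 4))
# -> 0.0981 0.0324 0.1297 1.5133 | 0.0587 0.0351 0.0703 1.2525
#    0.0648 0.0648 0.0648 1.0000 | 0.0727 0.0612 0.0918 0.8904
```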

[Figure 2 appears here. Left panel: $m(x) = (x - 2)^{-2}$ and $f_X(x) = 2(2 - x)/3$. Right panel: $\alpha^2(x)/f_X(x) = (8/3)(2 - x)^{-7}$.]

Figure 2: Functions in illustrative example 1. The regression function $m(x) = (x - 2)^{-2}$ and its associated density of $X$ are illustrated in the left panel. The function $\alpha^2(x)/f_X(x) = (8/3)(2 - x)^{-7}$ in the integrand in (13) is in the right panel.

[Figure 3 appears here. Left panel: the "densities" $\sigma^2(x)/\int_I \sigma^2(x)\,dx$ for $\sigma^2(x) = 0.1(x+1)$, $0.1$, $0.1(2-x)$ and $0.1(3-2x)$. Right panel: the functions $1 - \beta^4(x)$ for the same four $\sigma^2(x)$'s.]

Figure 3: Functions in illustrative example 1. The "densities" $\sigma^2(x)/\int_I \sigma^2(x)\,dx$ for the four $\sigma^2(x)$'s are illustrated in the left panel, while the functions $1 - \beta^4(x)$ for the four $\sigma^2(x)$'s are in the right panel.

Example 2. We investigate how fat-tailedness of the distribution of the $X_i$'s affects the ratio $r$ of the AMISE's in (14). This is because we expect the low density $f_X(x)$ at and near the boundaries of $I = [0, 1]$ to make the proposed VS bandwidth very wide, rendering the bias non-ignorable. For this, we retain $m(x) = (x - 2)^{-2}$ on $I = [0, 1]$ from illustrative example 1. We then set the conditional variance $\sigma^2(x)$ to be fixed at 0.1 as the regression function $m(x)$ increases. This constant $\sigma^2(x)$ over the support corresponds to the second row of Table 1. If $\sigma^2(x)$ varies, it is more likely to increase as $m(x)$ increases. Thus we additionally set the increasing form $\sigma^2(x) = 0.1(x + 1)$, which corresponds to the first row of Table 1. Now, in Table 2, we set up 17 cases, all truncated normal with mean 0.5 at the center of the support $I$, and with the original (that is, before truncation) normal standard deviation $s$ ranging from 0.25 to 1 to infinity. The second column of Table 2 shows how much of the data would be concentrated on $I$ if the normal were not truncated on $I$. As $s$ increases, the density of the $X_i$'s becomes flatter. Especially, when $s$ approaches infinity, the distribution of the $X_i$'s is uniform. The third and fourth columns of Table 2 give the resulting $r$'s in (14) for the aforementioned two cases of $\sigma^2(x)$. In Figure 4, we plot the points $(s, r)$ shown in Table 2 for the two $\sigma^2(x)$'s.

For $\sigma^2(x) = 0.1$, from Table 2 and Figure 4, the ratio $r$ in (14) decreases rapidly at first and more slowly later, from 2.2751 to 0.9956, as $s$ increases from 0.25 to 0.6. As $s$ increases further from 0.6 to 0.85, the ratio $r$ increases, but barely, to 1.0220. As $s$ grows from 0.85 to infinity, the ratio $r$ decreases again slowly, to 1.0000. Similarly, in the case of $\sigma^2(x) = 0.1(x + 1)$, the ratio $r$ rapidly decreases from 2.8077 to 0.9331 as $s$ increases from 0.25 to 0.55. As $s$ increases further from 0.55 to 1.0, the ratio $r$ increases to 1.1953. As $s$ grows from 1.0 to infinity, the ratio $r$ decreases again slowly, to 1.1883. We notice that the ratios $r$ in both cases are quite high when the $X_i$'s are concentrated in the middle, but barely affected when the $X_i$'s have a flatter distribution.

$s$        $\int_I \phi(x; s)\,dx$   $r$ in (14), $\sigma^2(x) = 0.1$   $r$ in (14), $\sigma^2(x) = 0.1(x+1)$
0.25       0.9545                    2.2751                             2.8077
0.30       0.9044                    1.7000                             2.0834
0.35       0.8468                    1.4347                             1.7408
0.40       0.7887                    1.2762                             1.5202
0.45       0.7334                    1.1543                             1.3178
0.50       0.6826                    1.0519                             1.0859
0.55       0.6366                    0.9980                             0.9331
0.60       0.5953                    0.9956                             0.9976
0.65       0.5582                    1.0071                             1.0817
0.70       0.5249                    1.0157                             1.1317
0.75       0.4950                    1.0201                             1.1596
0.80       0.4680                    1.0218                             1.1755
0.85       0.4436                    1.0220                             1.1847
0.90       0.4214                    1.0214                             1.1901
0.95       0.4013                    1.0204                             1.1934
1.00       0.3829                    1.0193                             1.1953
$\infty$   --                        1.0000                             1.1883

Table 2: The result of numerical calculation for illustrative example 2. The function $\phi(x; s)$ in the second column is the density function of $N(0.5, s^2)$. The ratio $r$ in (14) is calculated numerically for different values of the parameter $s$. The calculated numbers are rounded to the fourth decimal place.
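The $r(s)$ columns of Table 2 can be recomputed the same way as for Table 1; the sketch below is our implementation for the truncated-normal design, again using the identity $r^5 = \int_I \beta^4\,\alpha^2/f_X\,dx \big/ \int_I \alpha^2/f_X\,dx$ (helper names are illustrative).

```python
# r(s) in (14) for X ~ N(0.5, s^2) truncated to I = [0, 1], m(x) = (x - 2)^{-2}.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def r_of_s(s, s2):
    Z = norm.cdf(0.5 / s) - norm.cdf(-0.5 / s)     # mass of N(0.5, s^2) on [0, 1]
    f  = lambda x: norm.pdf(x, 0.5, s) / Z         # truncated-normal f_X(x)
    fp = lambda x: -(x - 0.5) / s ** 2 * f(x)      # f_X^(1)(x)
    m1 = lambda x: 2.0 * (2.0 - x) ** -3.0         # m'(x)
    m2 = lambda x: 6.0 * (2.0 - x) ** -4.0         # m''(x)
    a2f = lambda x: (2 * m1(x) * fp(x) + m2(x) * f(x)) ** 2 / f(x)   # alpha^2 / f_X
    I_s2 = quad(s2, 0, 1)[0]
    num = quad(lambda x: (s2(x) / I_s2 / f(x)) ** 4 * a2f(x), 0, 1)[0]
    return (num / quad(a2f, 0, 1)[0]) ** 0.2

for s in (0.25, 0.50, 0.60, 1.00):
    print(s, round(r_of_s(s, lambda x: 0.1 + 0.0 * x), 4),
             round(r_of_s(s, lambda x: 0.1 * (x + 1.0)), 4))
# matches Table 2, e.g. s = 0.25 -> 2.2751 and 2.8077; s = 0.60 -> 0.9956 and 0.9976
```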

[Figure 4 appears here: the points $(s, r)$ from Table 2, one curve for $\sigma^2(x) = 0.1$ and one for $\sigma^2(x) = 0.1(x+1)$, with $s$ on the horizontal axis and $r$ on the vertical axis.]

Figure 4: The plots of $(s, r)$ in Table 2. We plot the points $(s, r)$ shown in Table 2 for the two conditional variance functions $\sigma^2(x)$.

4. Discussion

In this paper, we first show, in proposition 1, that the proposed VS bandwidth produces an asymptotic variance smaller than that produced by the standard fixed bandwidth on some part of the support $I$ for the NW estimator. We then show, in proposition 3, that the proposed bandwidth may in some cases achieve even smaller AMISE while maintaining homoscedasticity. We give illustrative example 1 to show that the penalty for homoscedasticity in terms of the AMISE ranges from -11% (that is, the proposed bandwidth is preferable) to 51% in Table 1. In illustrative example 2, we examine how fat-tailedness of the distribution of the $X_i$'s affects the AMISE. It shows that the proposed VS bandwidth in (9) is more serviceable when the distribution of the $X_i$'s becomes flatter, because only several percentage points of penalty in the form of increased AMISE are imposed in exchange for a homoscedastic $\hat m_{h_x}(x)$.

In Table 2, we are surprised that the ratios $r$ in (14) go below unity at $s = 0.55$ and $0.60$ under the two setups of $\sigma^2(x)$. Since the result is based on a limited number of numerical calculations, however, it is difficult to generalize it to conclude that there will always be cases where the proposed VS bandwidth in (9) outperforms the conventional fixed bandwidth in (6). Our theoretical results and the accompanying numerical calculations show that it is possible for the proposed variance-stabilizing NW estimator to attain an AMISE smaller than that achieved by the conventional NW estimator while retaining homoscedasticity. Further extensive numerical investigation of this matter will give us more insight into how the proposed VS bandwidth performs.

Appendix

Integrating the square of the bias in (2) and the variance in (4) over the support $I$, the MISE between $\hat m_{h_x}(x)$ and $m(x)$ is

$$\int_I \left[\frac{1}{nh_x}\frac{\sigma^2(x)}{f_X(x)}\int K^2(t)\,dt + \frac{h_x^4}{4}\left(\frac{\alpha(x)}{f_X(x)}\right)^2\left(\int t^2 K(t)\,dt\right)^2 + o\!\left(\frac{1}{nh_x}\right) + O\!\left(\frac{1}{n}\right) + o\!\left(h_x^4\right)\right] f_X(x)\,dx. \qquad (A.1)$$

Substituting $h_{VS}(x)$ in (8) for $h_x$ in (A.1) gives

$$\frac{1}{nh_0}\int K^2(t)\,dt \int_I f_X(x)\,dx + \frac{h_0^4}{4}\left(\int t^2 K(t)\,dt\right)^2 \int_I \frac{\sigma^8(x)\,\alpha^2(x)}{f_X^5(x)}\,dx + o\!\left(\frac{1}{nh_0}\right) + O\!\left(\frac{1}{n}\right) + o\!\left(h_0^4\right). \qquad (A.2)$$

Differentiating the two leading terms in (A.2) with respect to $h_0$ and equating the outcome to zero yields

$$-\frac{1}{nh_0^2}\int K^2(t)\,dt + h_0^3\left(\int t^2 K(t)\,dt\right)^2 \int_I \frac{\sigma^8(x)\,\alpha^2(x)}{f_X^5(x)}\,dx = 0. \qquad (A.3)$$

The constant term $h_0$ in (9) is obtained by solving (A.3) with respect to $h_0$. $\Box$
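As a quick cross-check of this appendix, one can minimize the two leading terms of (A.2) symbolically. The sketch below assumes the sympy library, with $R_K$ standing for $\int K^2(t)\,dt$ and $C$ for $\left(\int t^2 K(t)\,dt\right)^2 \int_I \sigma^8(x)\alpha^2(x)/f_X^5(x)\,dx$; it recovers the constant factor of (9).

```python
# Symbolic check of (A.3): d/dh0 [ R_K/(n h0) + h0^4 C / 4 ] = 0
# should give h0 = (R_K / C)^{1/5} n^{-1/5}, the constant in (9).
import sympy as sp

h0, n, RK, C = sp.symbols('h0 n R_K C', positive=True)
amise = RK / (n * h0) + h0**4 * C / 4
print(sp.solve(sp.diff(amise, h0), h0))   # [(R_K/(C*n))**(1/5)]
```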

Acknowledgements

This research is supported in part by the Grant-in-Aid for Scientific Research (C)(2) 12680310, (C)(2) 16510103 and (B) 20310081 from the Japan Society for the Promotion of Science.

References

Härdle, W., Müller, M., Sperlich, S., Werwatz, A., 2004. Nonparametric and Semiparametric Models. Springer, Berlin and Heidelberg.

Nadaraya, E. A., 1964. On estimating regression. Theory of Probability and Its Applications 9, 141-142.

Nadaraya, E. A., 1965. On nonparametric estimation of density functions and regression curves. Theory of Probability and Its Applications 10, 186-190.

Nadaraya, E. A., 1970. Remarks on nonparametric estimates for density functions and regression curves. Theory of Probability and Its Applications 15, 134-137.

Pagan, A., Ullah, A., 1999. Nonparametric Econometrics. Cambridge University Press, Cambridge.

Watson, G. S., 1964. Smooth regression analysis. Sankhyā, Series A 26, 359-372.

Watson, G. S., Leadbetter, M., 1963. On the estimation of probability density, I. Annals of Mathematical Statistics 34, 480-491.
