U.U.D.M. Report 2008:32
Department of Mathematics, Uppsala University

Asymptotic linearity of linear rank statistic in the case of symmetric heteroscedastic variables

Kristi Kuljus∗ and Silvelyn Zwanzig
Department of Mathematics, Uppsala University

Abstract

Let Y_1, . . . , Y_n be independent but not identically distributed random variables with densities f_1, . . . , f_n symmetric around zero. Suppose c_{1,n}, . . . , c_{n,n} are given constants such that Σ_i c_{i,n} = 0 and Σ_i c²_{i,n} = 1. Denote the rank of Y_i − ∆c_{i,n} for any ∆ ∈ R by R(Y_i − ∆c_{i,n}) and let a_n(i) be a score defined via a score function ϕ. We study the linear rank statistic

S_n(∆) = Σ_{i=1}^{n} c_{i,n} a_n[R(Y_i − ∆c_{i,n})]

and show that S_n(∆) is asymptotically uniformly linear in the parameter ∆ in any interval [−C, C], C > 0.

Key words: linear rank statistic, projection of rank statistics, contiguity, linear rank regression, heteroscedastic errors.

∗ Corresponding author: Kristi Kuljus, e-mail: [email protected], Department of Mathematics, Uppsala University, Box 480, 751 06 Uppsala, Sweden.

1 Introduction

Consider independent random variables Y_1, . . . , Y_n with densities f_1, . . . , f_n symmetric around zero. Suppose c_{1,n}, . . . , c_{n,n} are given constants. Denote the rank of Y_i − ∆c_{i,n} for any ∆ ∈ R by R(Y_i − ∆c_{i,n}), i = 1, . . . , n, and let a_n(1), . . . , a_n(n) be scores generated by a score function ϕ as

a_n(i) = ϕ( i/(n+1) ), i = 1, . . . , n.

In this paper we are interested in the linear rank statistic S_n(∆) defined as

S_n(∆) = Σ_{i=1}^{n} c_{i,n} a_n[R(Y_i − ∆c_{i,n})].    (1)

For ∆ = 0 we obtain the statistic

S_n(0) = Σ_{i=1}^{n} c_{i,n} a_n[R(Y_i)].    (2)
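As a concrete illustration of (1) and (2), the sketch below (plain Python; the heteroscedastic normal errors and the Wilcoxon score are illustrative choices, not prescribed by the paper) builds constants satisfying Σ_i c_{i,n} = 0 and Σ_i c²_{i,n} = 1 and evaluates S_n(∆) on a few values of ∆; for a nondecreasing score function the computed values are nonincreasing in ∆.

```python
import math
import random

def wilcoxon_score(t):
    # Bounded score function: phi(t) = sqrt(12) * (t - 1/2), t in [0, 1].
    return math.sqrt(12.0) * (t - 0.5)

def regression_constants(x):
    # Center and normalize arbitrary numbers: sum c = 0 and sum c^2 = 1.
    m = sum(x) / len(x)
    d = [xi - m for xi in x]
    norm = math.sqrt(sum(di * di for di in d))
    return [di / norm for di in d]

def rank_statistic(y, c, delta, score=wilcoxon_score):
    # S_n(delta) = sum_i c_i * a_n[R(Y_i - delta*c_i)], a_n(i) = phi(i/(n+1)).
    n = len(y)
    shifted = [yi - delta * ci for yi, ci in zip(y, c)]
    order = sorted(range(n), key=lambda i: shifted[i])
    rank = [0] * n
    for r, i in enumerate(order, start=1):
        rank[i] = r
    return sum(c[i] * score(rank[i] / (n + 1)) for i in range(n))

random.seed(0)
n = 200
c = regression_constants(list(range(n)))
# Heteroscedastic, symmetric errors: N(0, sigma_i^2) with varying sigma_i.
y = [random.gauss(0.0, 0.8 + 0.4 * (i % 5) / 4) for i in range(n)]
values = [rank_statistic(y, c, d) for d in (-1.0, 0.0, 1.0)]
# S_n(delta) is a nonincreasing function of delta for nondecreasing phi.
assert values[0] >= values[1] >= values[2]
```

The monotonicity checked at the end is a deterministic property of S_n(∆) (it holds for every realization), which is used repeatedly later in the paper.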

The theory for the family of linear rank statistics in the i.i.d. case is presented in the book by Hájek and Šidák (1967) and in the revised version of this book (Hájek et al., 1999). Randles and Wolfe (1979) give an overview of the "null asymptotic" theory of a general class of linear rank statistics of the form S = Σ_{i=1}^{n} c(i) a(R_i), where c(1), . . . , c(n) are called the regression constants and a(1), . . . , a(n) are called the scores. By "null hypothesis" they refer to any set of assumptions that results in a uniform distribution of R_1, . . . , R_n.

Our main aim is to show that the rank statistic S_n(∆) is, under certain regularity conditions on the densities f_1, . . . , f_n and the score function ϕ, asymptotically uniformly linear in the parameter ∆. This means that when the number of observations n is large enough, we can approximate S_n(∆) with a linear function in any interval [−C, C], C > 0. Asymptotic linearity of the linear rank statistic in the case of identically distributed regression errors was first studied by Jurečková (1969, 1971). In 1969 Jurečková proved asymptotic uniform linearity of S_n(∆) in the case of the simple linear regression model. In 1971 she generalized the result to the multiple regression case. The asymptotic theory of S_n(∆) in the i.i.d. case is also discussed in Hettmansperger and McKean (1998). Koul (2002) proved asymptotic uniform linearity of the linear rank statistic over bounded sets in the case of heteroscedastic random variables Y_1, . . . , Y_n. He used the technique of weighted empirical processes. It is important to emphasize that Koul (2002) considered only bounded score functions. In this work we will generalize the result by Jurečková (1969) to the case of heteroscedastic symmetric random variables Y_1, . . . , Y_n. We are going to combine the techniques used in Hájek (1968) and Jurečková (1969). The key tools for obtaining the asymptotic representation for S_n(∆) are the contiguity of probability measures and the projection method for rank statistics developed in Hájek (1968). We will also consider unbounded square integrable score functions. An example of such a score function is the normal score function ϕ(u) = Φ^{-1}(u), where Φ(·) denotes the distribution function of N(0, 1). The choice of the score function ϕ plays a crucial role in power considerations of rank tests and in efficiency comparisons of R-estimates. In the i.i.d. case when f is a normal density, the optimal scores are given by the normal score function. Therefore it is important to study unbounded square integrable score functions as well when working with inference procedures based on ranks.

The paper is organized as follows. In Section 2 the linear rank statistic S_n(∆) is studied in the case of bounded score functions. At first we find the asymptotic representation of S_n(∆) for any fixed ∆ ∈ R. Thereafter, we prove that S_n(∆) is asymptotically uniformly linear in ∆ in any interval [−C, C], C > 0. Section 3 deals with S_n(∆) in the case of unbounded score functions. We show that under some additional assumptions on the densities f_1, . . . , f_n and the constants c_{1,n}, . . . , c_{n,n}, the studied rank statistic is linear in ∆ also in the case of unbounded square integrable score functions. Section 4 contains the proofs of most lemmas and theorems of this paper.
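Both families of score functions discussed in this paper are easy to generate numerically. A minimal sketch (plain Python; the choice n = 99 is arbitrary) computes the scores a_n(i) = ϕ(i/(n+1)) for the bounded Wilcoxon score and the unbounded normal score:

```python
import math
from statistics import NormalDist

def wilcoxon_score(u):
    # Bounded: |phi(u)| <= sqrt(3) on [0, 1].
    return math.sqrt(12.0) * (u - 0.5)

def normal_score(u):
    # Unbounded near 0 and 1, but square integrable: phi(u) = Phi^{-1}(u).
    return NormalDist().inv_cdf(u)

n = 99
scores_w = [wilcoxon_score(i / (n + 1)) for i in range(1, n + 1)]
scores_n = [normal_score(i / (n + 1)) for i in range(1, n + 1)]

# Both score sequences are nondecreasing and odd about t = 1/2,
# i.e. phi(t) = -phi(1 - t).
assert all(a <= b for a, b in zip(scores_n, scores_n[1:]))
assert abs(scores_n[0] + scores_n[-1]) < 1e-9
```

The unboundedness of the normal score only shows up in the extreme ranks, which is exactly where the additional assumptions of Section 3 are needed.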

2 Asymptotic linearity of S_n(∆) in the case of bounded score functions

In this section we will work with bounded score functions. An example of a bounded score function is the Wilcoxon score function defined as ϕ(t) = √12 (t − 1/2), t ∈ [0, 1].

2.1 Assumptions

Introduce the following conditions for the score function ϕ, the densities f_1, . . . , f_n and the constants c_{1,n}, . . . , c_{n,n}:

(a) ϕ(t), 0 ≤ t ≤ 1, is nondecreasing;

(b) ∃ M1 such that ∀ t ∈ [0, 1],

|ϕ(t)| < M1, |ϕ'(t)| < M1, |ϕ''(t)| < M1;    (3)

(c) ϕ is odd about t = 1/2, i.e., −ϕ(t) = ϕ(1 − t), ∀ t ∈ [0, 1];

(d) f_1, . . . , f_n are symmetric functions and for every n and every i, i = 1, . . . , n, f_i(x) > 0, ∀ x ∈ (−∞, ∞);

(e) ∃ M2 such that for every n and every i, i = 1, . . . , n,

f_i(x) < M2, |f_i'(x)| < M2, |f_i''(x)| < M2, ∀ x ∈ (−∞, ∞);    (4)

(f) ∃ M3* and M3 such that the Fisher informations I(f_1), . . . , I(f_n),

I(f_i) = ∫_{−∞}^{∞} ( f_i'(x)/f_i(x) )² f_i(x) dx,

satisfy for every n and every i, i = 1, . . . , n,

M3* < I(f_i) < M3;    (5)

(g) ∃ M4 and M5 = M5(∆) that are independent of n and i such that

(g1) ∫ | f_i'(y)/f_i(y) |³ f_i(y) dy < M4,

(g2) ∫ [ f_i''(y)/f_i(y) − ( f_i'(y)/f_i(y) )² ]² f_i(y) dy < M4,

(g3) ∫ | f_i'''(y + θ_{i,n}(y)∆c_{i,n}) / f_i(y + θ_{i,n}(y)∆c_{i,n}) − 3 f_i''(y + θ_{i,n}(y)∆c_{i,n}) f_i'(y + θ_{i,n}(y)∆c_{i,n}) / f_i²(y + θ_{i,n}(y)∆c_{i,n}) + 2 ( f_i'(y + θ_{i,n}(y)∆c_{i,n}) / f_i(y + θ_{i,n}(y)∆c_{i,n}) )³ | f_i(y) dy < M5, where 0 < θ_{i,n}(y) < 1, ∀ y ∈ (−∞, ∞);

(h) for every n, the constants c_{1,n}, . . . , c_{n,n} satisfy

Σ_{i=1}^{n} c_{i,n} = 0, Σ_{i=1}^{n} c²_{i,n} = 1, max_{1≤i≤n} |c_{i,n}| → 0 as n → ∞.    (6)

2.2 Asymptotic representation of S_n(∆) for fixed ∆

To prove asymptotic linearity of S_n(∆), we start by studying S_n(∆) for any given ∆ ∈ R. From Theorem 2.1 in Jurečková (1969) it follows that for given realizations of Y_1, . . . , Y_n, S_n(∆) is a nonincreasing step function of ∆. At the points of discontinuity S_n(∆) can be defined to be either right- or left-continuous. From now on we assume that S_n(∆) is defined for every real ∆.

2.2.1 Preparations and the main theorem

The main difficulty when working with the statistics S_n(0) and S_n(∆) is that both are sums of dependent random variables. To make calculations easier, it is reasonable to approximate S_n(0) and S_n(∆) with statistics that are sums of independent random variables. One way of finding such approximations for statistics that do not have the standard form of a sum of independent random variables is provided by the method of projection introduced in Hájek (1968). Denote the cumulative distribution functions corresponding to f_1, . . . , f_n by F_1, . . . , F_n. It appears that S_n(0) can be approximated with a statistic given in Theorem 4.2 in Hájek (1968), which we denote by T_n(0). The statistic T_n(0) is defined as

T_n(0) = Σ_{i=1}^{n} l_{i,n}(Y_i) = Σ_{i=1}^{n} (1/n) Σ_{j=1}^{n} (c_{j,n} − c_{i,n}) ∫ [u(x − Y_i) − F_i(x)] ϕ'(H(x)) dF_j(x),    (7)

where

H(x) = (1/n) Σ_{i=1}^{n} F_i(x)  and  u(x) = 1 for x ≥ 0, u(x) = 0 for x < 0.

The statistic T_n(∆) is obtained by applying the same projection to the shifted variables,

T_n(∆) = Σ_{i=1}^{n} l_{i,n}(Y_i − ∆c_{i,n}),    (8)

and the slope parameter b_n is defined as

b_n = (1/n) Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} (c_{j,n} − c_{i,n})² ∫ ϕ'(H(x)) f_i(x) f_j(x) dx.    (9)

Lemma 2.1 Let f_1 = . . . = f_n = f and let ϕ be a bounded score function that is odd about t = 1/2. Then

T_n(0) = Σ_{i=1}^{n} c_{i,n} ϕ(F(Y_i))  and  b_n = −∫_0^1 ϕ(t) f'(F^{-1}(t))/f(F^{-1}(t)) dt.

Proof. See p. 16 in Section 4. □

Theorem 2.2 Consider any fixed ∆ ∈ R. Let S_n(∆) and T_n(0) be the statistics defined in (1) and (7), and let b_n be defined as in (9). Suppose assumptions (a)–(h) hold. Then for every ε > 0,

lim_{n→∞} P( |S_n(∆) − T_n(0) + b_n ∆| ≥ ε ) = 0.

Theorem 2.2 implies that for any fixed ∆ the rank statistic S_n(∆) can be represented as

S_n(∆) = T_n(0) − b_n ∆ + r_n(∆),

where r_n(∆) denotes the remainder term depending on ∆. This means that for n large enough we can approximate S_n(∆) by T_n(0) − b_n ∆.
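This approximation can be observed in a small simulation. The sketch below (plain Python; i.i.d. standard normal errors, Wilcoxon scores and all Monte Carlo sizes are illustrative choices) estimates the empirical slope (S_n(0) − S_n(∆))/∆ and compares it with the value that the i.i.d. reduction of b_n in Section 4 gives for this score/density pair, b_n = √12 ∫ f(x)² dx:

```python
import math
import random

SQRT12 = math.sqrt(12.0)

def s_n(y, c, delta):
    # S_n(delta) with Wilcoxon scores a_n(i) = sqrt(12) * (i/(n+1) - 1/2).
    n = len(y)
    order = sorted(range(n), key=lambda i: y[i] - delta * c[i])
    rank = [0] * n
    for r, i in enumerate(order, start=1):
        rank[i] = r
    return sum(c[i] * SQRT12 * (rank[i] / (n + 1) - 0.5) for i in range(n))

random.seed(1)
n, reps, delta = 400, 300, 0.5
c = [i - (n - 1) / 2 for i in range(n)]
norm = math.sqrt(sum(ci * ci for ci in c))
c = [ci / norm for ci in c]  # sum c = 0, sum c^2 = 1

slopes = []
for _ in range(reps):
    y = [random.gauss(0.0, 1.0) for _ in range(n)]
    slopes.append((s_n(y, c, 0.0) - s_n(y, c, delta)) / delta)
emp_slope = sum(slopes) / reps

# For f = standard normal density: b_n = sqrt(12) * integral of f^2
#                                      = sqrt(12) / (2*sqrt(pi)) ~ 0.977.
b_limit = SQRT12 / (2.0 * math.sqrt(math.pi))
assert abs(emp_slope - b_limit) < 0.15
```

The agreement is only asymptotic, so the tolerance is deliberately loose; the point is that the random step function S_n(∆) behaves like a line with slope −b_n.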


2.2.2 The idea of the proof of Theorem 2.2

We are going to prove Theorem 2.2 in several steps. The idea of the proof comes from Jurečková (1969), but the key step of the proof is based on Theorem 4.2 in Hájek (1968), see Theorem B.4 in Appendix B. This theorem helps us to define suitable approximations of S_n(0) and S_n(∆). In the case of i.i.d. variables the statistic T_n(0) can be found intuitively (Hettmansperger and McKean, 1998, p. 399). When Y_1, . . . , Y_n are nonidentically distributed, intuition does not help anymore. Another key notion needed in the proof is contiguity of two sequences of probability measures, see Definition B.1 and Lemma B.2.

Consider the statistic S_n(∆) − T_n(0) + ∆b_n for arbitrary ∆ ∈ R. Split it into a sum of three terms:

S_n(∆) − T_n(0) + ∆b_n = [S_n(∆) − T_n(∆)] + [T_n(∆) − E T_n(∆) − T_n(0)] + [E T_n(∆) + ∆b_n].    (10)

We are going to show that every term in the sum converges; the statement of the theorem then follows by Slutsky's lemma. We are going to prove that

Step 1: S_n(0) − T_n(0) →_p 0 as n → ∞,

Step 2: S_n(∆) − T_n(∆) →_p 0 as n → ∞,

Step 3: T_n(∆) − E T_n(∆) − T_n(0) →_p 0 as n → ∞,

Step 4: lim_{n→∞} [E T_n(∆) + ∆b_n] = 0.

Proofs of Step 1–Step 4. For every statement in Step 1–Step 4 we formulate a lemma. Some auxiliary results needed to prove these lemmas are also stated. All the longer proofs of the results given below will be presented in Section 4.

Step 1. The assertion of Step 1 implies that T_n(0) is a good approximation of S_n(0) for large n. Under the assumptions of Theorem 2.2, E S_n(0) = 0.

Lemma 2.3 Let S_n(0) be the statistic given in (2). If Y_1, . . . , Y_n are independent random variables symmetric about zero and ϕ : (0, 1) → R is odd about t = 1/2, then E S_n(0) = 0.

Proof. See p. 16. □

Lemma 2.4 Under assumptions (b), (c), (d) and (h) it holds for every ε > 0 that

lim_{n→∞} P{ |S_n(0) − T_n(0)| ≥ ε } = 0.

Proof. The proof is based on Hájek's theorem (see Theorem B.4). Lemma 2.3 implies that E S_n(0) = 0. According to Hájek's theorem there exists a constant M = M(ϕ) > 0 such that

E [S_n(0) − T_n(0)]² ≤ M/n,

which implies due to Markov's inequality that

S_n(0) − T_n(0) →_p 0 as n → ∞. □
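In the i.i.d. case the quality of the projection can be checked numerically, using the closed form T_n(0) = Σ_i c_{i,n} ϕ(F(Y_i)) from the proof of Lemma 2.1 in Section 4. The sketch below (plain Python; standard normal errors, Wilcoxon scores and the Monte Carlo sizes are illustrative choices) estimates E[S_n(0) − T_n(0)]² at two sample sizes and observes the O(1/n) decay:

```python
import math
import random
from statistics import NormalDist

SQRT12 = math.sqrt(12.0)
PHI = NormalDist().cdf  # F for standard normal errors

def s_n0(y, c):
    # S_n(0) with Wilcoxon scores.
    n = len(y)
    order = sorted(range(n), key=lambda i: y[i])
    rank = [0] * n
    for r, i in enumerate(order, start=1):
        rank[i] = r
    return sum(c[i] * SQRT12 * (rank[i] / (n + 1) - 0.5) for i in range(n))

def t_n0(y, c):
    # Projection in the i.i.d. case: T_n(0) = sum_i c_i * phi(F(Y_i)).
    return sum(ci * SQRT12 * (PHI(yi) - 0.5) for ci, yi in zip(c, y))

def mse(n, reps, rng):
    c = [i - (n - 1) / 2 for i in range(n)]
    norm = math.sqrt(sum(ci * ci for ci in c))
    c = [ci / norm for ci in c]
    errs = []
    for _ in range(reps):
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        errs.append((s_n0(y, c) - t_n0(y, c)) ** 2)
    return sum(errs) / reps

rng = random.Random(2)
m100, m400 = mse(100, 300, rng), mse(400, 300, rng)
# The projection error E[S_n(0) - T_n(0)]^2 <= M/n shrinks with n.
assert m400 < m100 < 0.05
```

The observed mean squared error is of order 1/n, in line with the bound M/n of Hájek's theorem.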

Step 2. One can show that Lemma 2.4 implies the statement of Step 2, when a certain contiguity argument holds. Consider the joint density functions of Y_1, . . . , Y_n and Y_1 − ∆c_{1,n}, . . . , Y_n − ∆c_{n,n} and denote them by p_n and q_n, respectively:

p_n(y_1, . . . , y_n) = Π_{i=1}^{n} f_i(y_i),  q_n(y_1, . . . , y_n) = Π_{i=1}^{n} f_i(y_i + ∆c_{i,n}).    (11)

We wish to show that the sequence {q_n} is contiguous with respect to the sequence of densities {p_n} for any ∆ ∈ R. This means that if the measure of any sequence of measurable sets A_n converges to zero under p_n, then the same holds under q_n.

Theorem 2.5 Consider the density functions p_n and q_n in (11). Assume that conditions (f), (g) and (h) hold. Then {q_n} is contiguous with respect to {p_n} for any ∆ ∈ R.

Proof. See p. 17 in Section 4. □
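For normal densities the limiting behaviour used in the proof of Theorem 2.5 can be simulated directly. In the sketch below (plain Python; the heteroscedastic standard deviations s_i and all sizes are illustrative choices), Y_i ∼ N(0, s_i²), so I(f_i) = 1/s_i² and σ² = Σ_i c²_{i,n}/s_i²; the simulated log-likelihood ratio ln L_n(∆) under p_n is (here exactly) N(−∆²σ²/2, ∆²σ²), which is what forces E V = E e^U = 1 in Le Cam's lemma:

```python
import math
import random

random.seed(3)
n, reps, delta = 300, 4000, 1.0

c = [i - (n - 1) / 2 for i in range(n)]
norm = math.sqrt(sum(ci * ci for ci in c))
c = [ci / norm for ci in c]
s = [0.8 + 0.4 * (i % 5) / 4 for i in range(n)]  # heteroscedastic sigmas
sigma2 = sum(ci * ci / (si * si) for ci, si in zip(c, s))

def log_lr(y):
    # ln L_n(delta) = sum_i [ln f_i(Y_i + delta*c_i) - ln f_i(Y_i)],
    # with f_i the N(0, s_i^2) density.
    return sum(-((yi + delta * ci) ** 2 - yi ** 2) / (2 * si * si)
               for yi, ci, si in zip(y, c, s))

draws = []
for _ in range(reps):
    y = [random.gauss(0.0, si) for si in s]
    draws.append(log_lr(y))

mean = sum(draws) / reps
var = sum((d - mean) ** 2 for d in draws) / (reps - 1)
# Under p_n: ln L_n ~ N(-delta^2*sigma^2/2, delta^2*sigma^2).
assert abs(mean + 0.5 * delta ** 2 * sigma2) < 0.1
assert abs(var - delta ** 2 * sigma2) < 0.2
```

The relation mean ≈ −variance/2 is exactly the condition under which the limit V = e^U has expectation one, so no mass escapes to sets that are null under p_n.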

The contiguity theorem can be applied to prove that T_n(∆) is for large n a good approximation of S_n(∆).

Lemma 2.6 Let assumptions (b), (c), (d), (f), (g) and (h) be satisfied. Then for any ∆ ∈ R and every ε > 0,

lim_{n→∞} P{ |S_n(∆) − T_n(∆)| ≥ ε } = 0.

Proof. See p. 20. □

Step 3. Study now the second term T_n(∆) − E T_n(∆) − T_n(0) of the decomposition in (10).

Lemma 2.7 Consider the random variable l_{i,n}(Y_i) in T_n(0) = Σ_{i=1}^{n} l_{i,n}(Y_i), see (7). Assume that ϕ' is bounded. Then E [l_{i,n}(Y_i)] = 0 and thus E T_n(0) = 0.

Proof. See p. 21 in Section 4. □

Lemma 2.8 Under assumptions (b), (e) and (h) it holds for every ∆ ∈ R and every ε > 0 that

lim_{n→∞} P{ |T_n(∆) − E T_n(∆) − T_n(0)| ≥ ε } = 0.    (12)

Proof. To prove Lemma 2.8, it is enough to show that

Var [T_n(∆) − T_n(0)] → 0 as n → ∞.

Then (12) follows according to Chebyshev's inequality, because E T_n(0) = 0. The entire proof is given on p. 22 in Section 4. □

Step 4. The next lemma gives an expression for the expectation of T_n(∆).

Lemma 2.9 Under the assumption that ϕ' is bounded, the expectation E T_n(∆) can be presented as

E T_n(∆) = (1/n) Σ_{i=1}^{n} Σ_{j=1}^{n} (c_{j,n} − c_{i,n}) ∫ ϕ'(H(x)) [F_i(x + ∆c_{i,n}) − F_i(x)] dF_j(x).    (13)

Proof. See p. 24. □

The purpose of Step 4 is to show that E T_n(∆) ≈ −b_n ∆ for large n.

Lemma 2.10 Suppose that conditions (b), (c), (d), (e) and (h) hold. Then for every ∆ ∈ R,

lim_{n→∞} [E T_n(∆) + ∆b_n] = 0.

Proof. Lemma 2.10 can be proved by using the Taylor expansions of F_i(x), i = 1, . . . , n. For the whole proof, see p. 25. □

We are now ready to prove Theorem 2.2.

Proof of Theorem 2.2. Apply Slutsky's lemma (see Lemma B.9) to the decomposition in (10). Lemma 2.6, Lemma 2.8 and Lemma 2.10 then imply that

S_n(∆) − T_n(0) + ∆b_n →_p 0 as n → ∞,

i.e. the assertion of Theorem 2.2 holds. □

2.3 Asymptotic uniform linearity of S_n(∆)

In the previous subsection we proved that S_n(∆) − T_n(0) + ∆b_n →_p 0 for every fixed ∆ ∈ R as n → ∞. In this subsection we show that this result holds uniformly in ∆ in any interval [−C, C], C > 0. Observe that −b_n is the slope parameter of the line that approximates S_n(∆). The parameter b_n defined in (9) is always positive and under certain conditions it has an upper bound. We state at first a general lemma.

Lemma 2.11 Let c_{1,n}, . . . , c_{n,n} be real constants satisfying Σ_i c_{i,n} = 0 and Σ_i c²_{i,n} = 1. Assume that 0 < b_{i,j} < K are real numbers such that b_{i,j} = b_{j,i}, i, j = 1, . . . , n. Then

0 < (1/n) Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} (c_{j,n} − c_{i,n})² b_{i,j} < K.

Applying Lemma 2.11 to b_n with b_{i,j} = ∫ ϕ'(H(x)) f_i(x) f_j(x) dx, which under assumptions (b) and (e) satisfies 0 < b_{i,j} < M1 M2, gives 0 < b_n < M1 M2.

Theorem 2.13 Suppose assumptions (a)–(h) hold. Then the rank statistic S_n(∆) has the uniform stochastic expansion

S_n(∆) = T_n(0) − b_n ∆ + r_n(∆),

where for any ε > 0 and C > 0,

lim_{n→∞} P{ max_{|∆|≤C} |r_n(∆)| ≥ ε } = 0.

Proof. The proof of this theorem can be performed as in Jurečková (1969). Let ε, η and C be any positive numbers. We want to show that

lim_{n→∞} P{ max_{|∆|≤C} |S_n(∆) − T_n(0) + b_n ∆| ≥ ε } = 0.

We have shown that 0 < b_n < M1 M2 = M, so b_n ∆ is an increasing function of ∆ in [−C, C]. Consider a partition −C = ∆_0 < ∆_1 < . . . < ∆_l = C of the interval [−C, C] such that

|∆_i − ∆_{i−1}| < ε/(2M), i = 1, . . . , l.

Theorem 2.2 implies that there exists n_0 such that for every n > n_0 and for every i = 0, . . . , l,

P{ |S_n(∆_i) − T_n(0) + b_n ∆_i| ≥ ε/4 } < η/(l + 1).

From Jurečková (1969) it follows that S_n(∆) is a nonincreasing step function of ∆, so both S_n(∆) and b_n ∆ are monotone functions in [−C, C]. Let ∆ be any point in the interval [−C, C]. There exists i_0, 1 ≤ i_0 ≤ l, such that ∆_{i_0−1} ≤ ∆ ≤ ∆_{i_0}, i.e. ∆ belongs to some subinterval of [−C, C]. Study now S_n(∆) − T_n(0) + b_n ∆. We have to distinguish between two cases:

1) when S_n(∆) − T_n(0) + b_n ∆ ≥ 0, the following inequality is satisfied:

0 ≤ S_n(∆) − T_n(0) + b_n ∆ ≤ S_n(∆_{i_0−1}) − T_n(0) + b_n ∆_{i_0}
≤ |S_n(∆_{i_0−1}) − T_n(0) + b_n ∆_{i_0−1}| + b_n (∆_{i_0} − ∆_{i_0−1});

2) when S_n(∆) − T_n(0) + b_n ∆ < 0, it holds that

0 < −b_n ∆ + T_n(0) − S_n(∆) ≤ −b_n ∆_{i_0−1} + T_n(0) − S_n(∆_{i_0})
≤ |S_n(∆_{i_0}) − T_n(0) + b_n ∆_{i_0}| + b_n (∆_{i_0} − ∆_{i_0−1}).

The two inequalities in 1) and 2) can be summarized as

|S_n(∆) − T_n(0) + b_n ∆| ≤ |S_n(∆_{i_0−1}) − T_n(0) + b_n ∆_{i_0−1}| + |S_n(∆_{i_0}) − T_n(0) + b_n ∆_{i_0}| + b_n (∆_{i_0} − ∆_{i_0−1})
≤ 2 max_{k ∈ {i_0−1, i_0}} |S_n(∆_k) − T_n(0) + b_n ∆_k| + ε/2.

The last inequality implies that

max_{|∆|≤C} |S_n(∆) − T_n(0) + b_n ∆| ≤ 2 max_{0≤i≤l} |S_n(∆_i) − T_n(0) + b_n ∆_i| + ε/2.    (15)

From (15) it follows that for every n > n_0,

P{ max_{|∆|≤C} |S_n(∆) − T_n(0) + b_n ∆| ≥ ε } ≤ P{ max_{0≤i≤l} |S_n(∆_i) − T_n(0) + b_n ∆_i| ≥ ε/4 }
= P{ ∪_{i=0}^{l} ( |S_n(∆_i) − T_n(0) + b_n ∆_i| ≥ ε/4 ) } ≤ Σ_{i=0}^{l} P{ |S_n(∆_i) − T_n(0) + b_n ∆_i| ≥ ε/4 } < η. □

3 Asymptotic linearity of S_n(∆) in the case of unbounded score functions

For unbounded score functions, conditions (a)–(c) are replaced by assumptions (i)–(m) on the score function and the densities; in particular, (i) requires ϕ(t), 0 ≤ t ≤ 1, to be nondecreasing, square integrable and odd about t = 1/2, and (l) requires the existence of K > 0 such that for every n and i, i = 1, . . . , n, f_i(x) < K h(x), ∀ x ∈ (−∞, ∞), where h = H' = (1/n) Σ_{i=1}^{n} f_i. If ϕ is not bounded in the neighbourhood of 0 and 1, then the existence of ∫ ϕ'(H(x)) f_i(x) f_j(x) dx is not obvious at all. Therefore, we are going to define the parameter corresponding to b_n in the case of an unbounded score function in another way.

The idea for redefining b_n comes from the i.i.d. case. Recall from Lemma 2.1 that if f_1 = . . . = f_n and ϕ is an odd bounded score function, then the expression of b_n in (18) reduces with the help of partial integration to

b_n = −∫_0^1 ϕ(t) f'(F^{-1}(t))/f(F^{-1}(t)) dt.

When partial integration is applied to the integral in b_n in the case of nonidentically distributed Y_1, . . . , Y_n, we obtain:

∫_{−∞}^{∞} ϕ'(H(x)) f_i(x) f_j(x) dx = ∫_0^1 ϕ'(t) f_i(H^{-1}(t)) f_j(H^{-1}(t)) / h(H^{-1}(t)) dt
= [ ϕ(t) f_i(H^{-1}(t)) f_j(H^{-1}(t)) / h(H^{-1}(t)) ]_0^1 − ∫_0^1 ϕ(t) · d/dt { f_i(H^{-1}(t)) f_j(H^{-1}(t)) / h(H^{-1}(t)) } dt
= 2 lim_{t→1} [ ϕ(t) f_i(H^{-1}(t)) f_j(H^{-1}(t)) / h(H^{-1}(t)) ] + Int(f_i, f_j, h, ϕ),    (19)

where

Int(f_i, f_j, h, ϕ) = −∫_0^1 ϕ(t) · d/dt { f_i(H^{-1}(t)) f_j(H^{-1}(t)) / h(H^{-1}(t)) } dt.    (20)

Consider now an unbounded square integrable score function ϕ and a sequence of bounded score functions ϕ_k, k = 1, 2, . . .. Suppose that both ϕ and {ϕ_k} are nondecreasing and odd about t = 1/2. Define the parameters b_{n,k} and b*_n as

b_{n,k} = (1/n) Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} (c_{j,n} − c_{i,n})² Int(f_i, f_j, h, ϕ_k),    (21)

b*_n = (1/n) Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} (c_{j,n} − c_{i,n})² Int(f_i, f_j, h, ϕ).    (22)

Remark. Suppose that assumption (l) holds. Then for every ϕ_k,

0 ≤ lim_{t→1} [ ϕ_k(t) f_i(H^{-1}(t)) f_j(H^{-1}(t)) / h(H^{-1}(t)) ] ≤ K ϕ_k(1) lim_{x→∞} f_j(x) = 0.

Due to the partial integration relationship in (19), it follows that for bounded score functions ϕ_k under (l),

Int(f_i, f_j, h, ϕ_k) = ∫_{−∞}^{∞} ϕ'_k(H(x)) f_i(x) f_j(x) dx.

Thus we can conclude that if condition (l) holds, then the integral Int(f_i, f_j, h, ϕ_k) is always positive and b_{n,k} is defined for every k.

We will now show that the integral Int(f_i, f_j, h, ϕ) defined in (20) is under certain assumptions always positive and has an upper bound that is independent of i, j and n. It follows that the parameter b*_n is properly defined under these assumptions. The proofs of the following two lemmas are given in Section 4, p. 27–28.

Lemma 3.1 Let ϕ be a nondecreasing, odd and square integrable score function and let f_1, . . . , f_n be densities such that assumption (l) holds. Then

lim_{t→1} [ ϕ(t) f_i(H^{-1}(t)) f_j(H^{-1}(t)) / h(H^{-1}(t)) ] = 0.

From (19) it now follows that if (i) and (l) hold and Int(f_i, f_j, h, ϕ) exists, then it is always positive, because ∫_{−∞}^{∞} ϕ'(H(x)) f_i(x) f_j(x) dx is positive.

Lemma 3.2 Let the score function ϕ satisfy (i) and let f_1, . . . , f_n be symmetric densities that satisfy conditions (f) and (l). Then there exists a constant C = C(ϕ) > 0, independent of i, j and n, such that

Int(f_i, f_j, h, ϕ) < C(ϕ), ∀ i, j, n.

Thus, under the assumptions of Lemma 3.2, b*_n is well defined.
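One concrete way to build a bounded approximating sequence ϕ_k for the normal score function — an illustrative construction, not the one prescribed by the paper — is truncation, ϕ_k(t) = max(−k, min(ϕ(t), k)). Each ϕ_k is bounded, nondecreasing and odd about t = 1/2, and ∫(ϕ_k − ϕ)² dt → 0 as k → ∞. A numerical sketch (plain Python; the midpoint grid is an arbitrary discretization):

```python
from statistics import NormalDist

inv_cdf = NormalDist().inv_cdf  # normal score function phi(t) = Phi^{-1}(t)

def phi_k(t, k):
    # Bounded truncation of the unbounded normal score function.
    return max(-k, min(inv_cdf(t), k))

def l2_gap(k, m=100_000):
    # Midpoint-rule approximation of int_0^1 (phi_k(t) - phi(t))^2 dt.
    total = 0.0
    for i in range(m):
        t = (i + 0.5) / m
        d = phi_k(t, k) - inv_cdf(t)
        total += d * d
    return total / m

gaps = [l2_gap(k) for k in (1.0, 2.0, 3.0)]
# Larger truncation levels leave a smaller L2 gap, vanishing as k grows.
assert gaps[0] > gaps[1] > gaps[2]
assert gaps[2] < 0.01
```

Square integrability of ϕ is exactly what makes the tail contribution to the L² gap vanish, so the truncated scores satisfy the approximation requirement used in the proofs below.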

3.3 Asymptotic linearity of S_n(∆)

To prove asymptotic uniform linearity of S_n(∆) in any interval [−C, C], C > 0, we start again by considering S_n(∆) for a fixed ∆ ∈ R.

Theorem 3.3 Consider any ∆ ∈ R. Suppose assumptions (d)–(m) hold. Let S_n(∆) and S_n(0) be the statistics defined in (1) and (2), respectively. Let b*_n be defined as in (22). Then for every ε > 0 it holds that

lim_{n→∞} P{ |S_n(∆) − S_n(0) + b*_n ∆| ≥ ε } = 0.

Remark. Observe the difference in the asymptotic representation of S_n(∆) compared to the case of bounded score functions. For unbounded score functions S_n(∆) ≈ S_n(0) − b*_n ∆, whereas for bounded score functions S_n(∆) ≈ T_n(0) − b_n ∆. Here T_n(0) is the approximation of S_n(0) and it is a sum of independent random variables. The difference and connection between b_n and b*_n was explained in the previous subsection.

The idea of the proof of Theorem 3.3. To prove Theorem 3.3, the same idea can be used as in the proof of Theorem 2.2. A new step is that we define a sequence of score functions ϕ_k, k = 1, 2, . . ., so that conditions (a), (b) and (c) are satisfied for every ϕ_k and lim_{k→∞} ∫ [ϕ_k(t) − ϕ(t)]² dt = 0. This means that for k large enough the unbounded score function ϕ can be approximated with a bounded score function ϕ_k. Denote the rank statistics S_n(0) and S_n(∆) corresponding to the score function ϕ_k by S_{n,k}(0) and S_{n,k}(∆), respectively. Let b_{n,k}, k ∈ {1, 2, . . .}, be defined as in (21). Split the statistic S_n(∆) − S_n(0) + b*_n ∆ in the following way:

S_n(∆) − S_n(0) + b*_n ∆ = [S_{n,k}(∆) − S_{n,k}(0) + b_{n,k} ∆] + [S_{n,k}(0) − S_n(0)]
+ [S_n(∆) − S_{n,k}(∆)] + (b*_n − b_{n,k})∆ = Σ_{i=1}^{4} term_i.    (23)

To show that S_n(∆) − S_n(0) + b*_n ∆ →_p 0 as n → ∞, we must show that every term in the partition above converges to zero in probability as k, n → ∞. By k, n → ∞ we mean that the limit is taken in the order lim_{k→∞} lim_{n→∞}. For term_1, results from Section 2 can be applied. The convergence of term_3 follows due to the convergence of term_2 and the contiguity result presented in Section 2. Therefore, we now study term_2 and term_4 and state two lemmas concerning the convergence of these terms. The proofs of the lemmas are given in Section 4.

Lemma 3.4 Suppose assumptions (h)–(k) and (m) hold. Let ϕ_k, k = 1, 2, . . ., be a sequence of bounded score functions such that (a), (b) and (c) are satisfied for every ϕ_k and lim_{k→∞} ∫ [ϕ_k(t) − ϕ(t)]² dt = 0. Under these assumptions it holds for every ε > 0 that

lim_{k→∞} lim_{n→∞} P{ |S_{n,k}(0) − S_n(0)| ≥ ε } = 0.

Proof. See p. 28. □

Lemma 3.5 Let the assumptions of Lemma 3.2 hold. Assume that Σ_i c_{i,n} = 0 and Σ_i c²_{i,n} = 1. Let ϕ_k, k = 1, 2, . . ., be a sequence of score functions such that lim_{k→∞} ∫ [ϕ_k(t) − ϕ(t)]² dt = 0. Then

lim_{k→∞} lim_{n→∞} (b_{n,k} − b*_n) = 0.

Proof. See p. 31. □

Proof of Theorem 3.3. Let ε > 0 and η > 0 be any constants. Consider the terms in partition (23).

1) term_1: Since the conditions in (a)–(h) hold for f_1, . . . , f_n and ϕ_k for every k, it follows from Lemma 2.4 and Theorem 2.2 that for every k there exists n_1(k) such that ∀ n > n_1(k),

P{ |S_{n,k}(∆) − S_{n,k}(0) + b_{n,k} ∆| ≥ ε/4 } < η/4;

2) term_2: Lemma 3.4 implies the existence of k_2 such that ∀ k > k_2 we can find n_2(k) so that ∀ n > n_2(k),

P{ |S_{n,k}(0) − S_n(0)| ≥ ε/4 } < η/4;

3) term_3: Denote the probabilities under the density functions p_n and q_n defined in (11) by P_{p_n}(·) and P_{q_n}(·), respectively. When we write P(·), we always mean the probability under p_n = Π_{i=1}^{n} f_i(y_i). We showed in Theorem 2.5 that the sequence of densities {q_n} is contiguous with respect to the densities {p_n} for any ∆ ∈ R. According to the alternative definition of contiguity given in Lemma B.2, this means that for every ε > 0 and for every sequence of measurable sets A_n there exist δ = δ(ε) > 0 and n_0 = n_0(ε) such that P_{p_n}{A_n} < δ for every n > n_0 implies P_{q_n}{A_n} < ε. Consider now, in the view of contiguity, the events {|S_{n,k}(0) − S_n(0)| ≥ ε/4} and the δ > 0 that corresponds to our η/4. Take ψ = min{δ, η/4}. Due to Lemma 3.4 there exists k_3 such that ∀ k > k_3 we can find n_3(k) so that ∀ n > n_3(k),

P_{p_n}{ |S_{n,k}(0) − S_n(0)| ≥ ε/4 } < ψ ≤ δ.

According to the contiguity definition in Lemma B.2 we can find n_4(k) ≥ n_3(k) for every k > k_3 so that ∀ n > n_4(k),

P_{q_n}{ |S_{n,k}(0) − S_n(0)| ≥ ε/4 } < η/4.

The last inequality is equivalent to

P_{p_n}{ |S_{n,k}(∆) − S_n(∆)| ≥ ε/4 } < η/4.

This can be shown in the same way as in the proof of Lemma 2.6.

4) term_4: According to Lemma 3.5 there exists k_4 so that for every n and ∀ k > k_4,

|b_{n,k} − b*_n| |∆| < η/4.

Take k_0 = max{k_2, k_3, k_4} and define for every k > k_0 the number n_0(k) = max{n_1(k), n_2(k), n_4(k)}. Then it holds that ∀ n > n_0(k),

P{ |S_n(∆) − S_n(0) + b*_n ∆| ≥ ε } ≤ P{ |S_{n,k}(∆) − S_{n,k}(0) + b_{n,k} ∆| ≥ ε/4 }
+ P{ |S_{n,k}(0) − S_n(0)| ≥ ε/4 } + P{ |S_{n,k}(∆) − S_n(∆)| ≥ ε/4 } + |b_{n,k} − b*_n| |∆| < η. □

Theorem 3.6 Suppose assumptions (d)–(m) are fulfilled. Then the rank statistic S_n(∆) has the uniform stochastic expansion

S_n(∆) = S_n(0) − b*_n ∆ + r_n(∆),

where for any ε > 0 and C > 0,

lim_{n→∞} P{ max_{|∆|≤C} |r_n(∆)| ≥ ε } = 0.

Proof. The proof of this theorem can be performed in the same way as the proof of Theorem 2.13. □

4 Proofs of theorems and lemmas

Proof of Lemma 2.1. 1) The expression of l_{i,n}(Y_i) in T_n(0) (see (7)) can in the case of i.i.d. errors be written as

l_{i,n}(Y_i) = (1/n) Σ_{j=1}^{n} (c_{j,n} − c_{i,n}) ∫ [u(x − Y_i) − F(x)] ϕ'(F(x)) dF(x),

where the integral can be calculated as follows:

∫ [u(x − Y_i) − F(x)] ϕ'(F(x)) dF(x) = ∫_{Y_i}^{∞} ϕ'(F(x)) f(x) dx − ∫_{−∞}^{∞} F(x) ϕ'(F(x)) f(x) dx
= ϕ(1) − ϕ(F(Y_i)) − ∫_0^1 t ϕ'(t) dt
= ϕ(1) − ϕ(F(Y_i)) − [ϕ(t) t]_0^1 + ∫_0^1 ϕ(t) dt = −ϕ(F(Y_i)),

since ∫_0^1 ϕ(t) dt = 0 for a score function that is odd about t = 1/2. Because ϕ(F(Y_i)) does not depend on j and Σ_j c_{j,n} = 0, we obtain

T_n(0) = Σ_{i=1}^{n} c_{i,n} ϕ(F(Y_i)).

2) The integral in b_n (see (9)) reduces in the i.i.d. case to ∫ ϕ'(F(x)) f(x) dF(x). Using partial integration, we obtain:

∫ ϕ'(F(x)) f(x) dF(x) = ∫_0^1 ϕ'(t) f(F^{-1}(t)) dt
= [ ϕ(t) f(F^{-1}(t)) ]_0^1 − ∫_0^1 ϕ(t) · d/dt f(F^{-1}(t)) dt = −∫_0^1 ϕ(t) · f'(F^{-1}(t))/f(F^{-1}(t)) dt.

Since the last expression is independent of i and j, and

(1/n) Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} (c_{j,n} − c_{i,n})² = 1

due to assumption (h), the statement follows. □

Proof of Lemma 2.3. According to the definition of S_n(0),

E S_n(0) = E [ Σ_{i=1}^{n} c_{i,n} a_n(R(Y_i)) ] = Σ_{i=1}^{n} c_{i,n} E ϕ( R(Y_i)/(n+1) ).

Consider E ϕ(R(Y_i)/(n+1)) for any i ∈ {1, . . . , n}. Since Y_1, . . . , Y_n are symmetric random variables, it holds according to Lemma A.2 that

P{R(Y_i) = j} = P{R(Y_i) = n + 1 − j}.

Denote the probability P{R(Y_i) = k} by p_i(k). Assume without loss of generality that n is even, because for odd n the middle rank contributes nothing:

ϕ( ((n+1)/2)/(n+1) ) = ϕ(1/2) = 0.

We obtain:

E ϕ( R(Y_i)/(n+1) ) = Σ_{k=1}^{n} p_i(k) ϕ( k/(n+1) )
= Σ_{k=1}^{n/2} [ ϕ( k/(n+1) ) + ϕ( (n−k+1)/(n+1) ) ] p_i(k),

and since ϕ( (n−k+1)/(n+1) ) = −ϕ( 1 − (n−k+1)/(n+1) ) = −ϕ( k/(n+1) ), every bracket vanishes, so the sum is 0. Thus, E S_n(0) = 0. □

Proof of Theorem 2.5. We are going to apply a part of Le Cam's first lemma (see Lemma B.3) to prove that the densities q_n are contiguous with respect to the densities p_n. Because M3* < I(f_i) < M3 according to (5) and Σ_i c²_{i,n} = 1, it holds that

M3* < Σ_{i=1}^{n} c²_{i,n} I(f_i) < M3.

Hence every subsequence of {Σ_i c²_{i,n} I(f_i)} contains a further subsequence, indexed by {n_j}, such that

lim_{j→∞} Σ_{i=1}^{n_j} c²_{i,n_j} I(f_i) = σ²    (24)

for some σ² > 0. Consider the likelihood ratio

L_{n_j}(∆) = Π_{i=1}^{n_j} f_i(Y_i + ∆c_{i,n_j}) / f_i(Y_i).

Let →^{P_{n_j}} denote convergence in distribution under p_{n_j} = Π_{i=1}^{n_j} f_i(y_i). If we can show that

L_{n_j}(∆) →^{P_{n_j}} V as j → ∞,    (25)

where V is a random variable with E V = 1, then the contiguity follows from part (iii) in Le Cam's first lemma. Instead of (25) one can show that

ln L_{n_j}(∆) →^{P_{n_j}} U, where U ∼ N(−½ ∆²σ², ∆²σ²),

because then it follows that

L_{n_j}(∆) →^{P_{n_j}} e^U,

where e^U is log-normally distributed with E(e^U) = 1 (see Hájek et al., 1999, p. 252–253). Let us write ln L_{n_j}(∆) = Σ_{i=1}^{n_j} ln[ f_i(Y_i + ∆c_{i,n_j}) / f_i(Y_i) ] as

ln L_{n_j}(∆) = Σ_{i=1}^{n_j} l_{f_i}(∆c_{i,n_j}), where l_{f_i}(x) = ln [ f_i(Y_i + x) / f_i(Y_i) ].

Recall from (6) that max_{1≤i≤n} |c_{i,n}| → 0 as n → ∞. This implies that ∆c_{i,n} is "small", so we can apply to every term l_{f_i}(∆c_{i,n_j}) Maclaurin's formula

f(h) = f(0) + h f'(0) + . . . + h^{k−1}/(k−1)! · f^{(k−1)}(0) + h^k/k! · f^{(k)}(θ_k h), 0 < θ_k < 1.

We can rewrite ln L_{n_j}(∆) as

ln L_{n_j}(∆) = ∆ Σ_{i=1}^{n_j} Z_{i,n_j} + (∆²/2) Σ_{i=1}^{n_j} Z'_{i,n_j} + (∆³/6) Σ_{i=1}^{n_j} Z''_{i,n_j},    (26)

where

Z_{i,n_j} = c_{i,n_j} f_i'(Y_i)/f_i(Y_i),

Z'_{i,n_j} = c²_{i,n_j} { f_i''(Y_i)/f_i(Y_i) − ( f_i'(Y_i)/f_i(Y_i) )² },

Z''_{i,n_j} = c³_{i,n_j} { f_i'''(G_{i,n_j})/f_i(G_{i,n_j}) − 3 f_i''(G_{i,n_j}) f_i'(G_{i,n_j}) / f_i²(G_{i,n_j}) + 2 ( f_i'(G_{i,n_j})/f_i(G_{i,n_j}) )³ }

with G_{i,n_j} = Y_i + θ_{i,n_j} ∆ c_{i,n_j}. Here θ_{i,n_j} = θ_{i,n_j}(Y_i), 0 < θ_{i,n_j} < 1, i = 1, . . . , n_j. We are going to show that:

1) Step 1: Σ_{i=1}^{n_j} Z_{i,n_j} →_D N(0, σ²) under p_{n_j},    (27)

2) Step 2: Σ_{i=1}^{n_j} Z'_{i,n_j} →_p −σ² under p_{n_j},

3) Step 3: Σ_{i=1}^{n_j} Z''_{i,n_j} →_p 0 under p_{n_j}.

Step 1. A version of the Lindeberg–Feller theorem (see Theorem B.8) can be used to prove that (27) holds. Calculate the expectation and the variance of the variables Z_{i,n_j}:

E Z_{i,n_j} = c_{i,n_j} ∫ ( f_i'(y)/f_i(y) ) f_i(y) dy = 0,

Var Z_{i,n_j} = c²_{i,n_j} ∫ ( f_i'(y)/f_i(y) )² f_i(y) dy = c²_{i,n_j} I(f_i).

From (24) it follows that

Var( Σ_{i=1}^{n_j} Z_{i,n_j} ) = Σ_{i=1}^{n_j} Var(Z_{i,n_j}) → σ² as j → ∞.

The conditions about the expectation and variance in the Lindeberg–Feller theorem are fulfilled for the random variables Z_{i,n_j}. Check the Lindeberg condition: on the event |Z_{i,n_j}| > ε we have 1 < |Z_{i,n_j}|/ε, so

E[ Z²_{i,n_j} 1{|Z_{i,n_j}| > ε} ] ≤ (1/ε) E |Z_{i,n_j}|³ = (1/ε) |c_{i,n_j}| c²_{i,n_j} ∫ | f_i'(y)/f_i(y) |³ f_i(y) dy.

The integral in the expression above is bounded for every i according to assumption (g1), thus it follows that

Σ_{i=1}^{n_j} E[ Z²_{i,n_j} 1{|Z_{i,n_j}| > ε} ] < (M4/ε) max_{1≤i≤n_j} |c_{i,n_j}| → 0 as j → ∞.

The Lindeberg condition holds, hence according to the Lindeberg–Feller theorem (27) follows.

Step 2. Consider now Σ_{i=1}^{n_j} Z'_{i,n_j}. Since

E Z'_{i,n_j} = c²_{i,n_j} [ ∫ ( f_i''(y)/f_i(y) ) f_i(y) dy − ∫ ( f_i'(y)/f_i(y) )² f_i(y) dy ] = −c²_{i,n_j} I(f_i),

it follows according to (24) that

lim_{j→∞} Σ_{i=1}^{n_j} E Z'_{i,n_j} = −lim_{j→∞} Σ_{i=1}^{n_j} c²_{i,n_j} I(f_i) = −σ².

Using Chebyshev's inequality we obtain:

P( | Σ_{i=1}^{n_j} (Z'_{i,n_j} − E Z'_{i,n_j}) | > ε ) ≤ Var( Σ_{i=1}^{n_j} Z'_{i,n_j} ) / ε² = Σ_{i=1}^{n_j} Var(Z'_{i,n_j}) / ε².

Because

Var(Z'_{i,n_j}) ≤ E (Z'_{i,n_j})² = c⁴_{i,n_j} ∫ [ f_i''(y)/f_i(y) − ( f_i'(y)/f_i(y) )² ]² f_i(y) dy < c⁴_{i,n_j} M4

by assumption (g2), it follows that

Σ_{i=1}^{n_j} Var(Z'_{i,n_j}) < M4 Σ_{i=1}^{n_j} c⁴_{i,n_j} ≤ M4 max_{1≤i≤n_j} c²_{i,n_j} → 0 as j → ∞.
a) if ∆c_{i,n} > 0, then

∫ [u(x − (y − ∆c_{i,n})) − u(x − y)]² dF_j(x) = ∫_{y−∆c_{i,n}}^{y} f_j(x) dx = F_j(y) − F_j(y − ∆c_{i,n}) = |F_j(y) − F_j(y − ∆c_{i,n})|;

b) if ∆c_{i,n} < 0, then

∫ [u(x − (y − ∆c_{i,n})) − u(x − y)]² dF_j(x) = ∫_{y}^{y−∆c_{i,n}} f_j(x) dx = F_j(y − ∆c_{i,n}) − F_j(y) = |F_j(y) − F_j(y − ∆c_{i,n})|.    (34)

Since f_1, . . . , f_n are uniformly bounded according to assumption (e) (for every i and ∀ x ∈ (−∞, ∞), f_i(x) < M2), it follows that the distribution functions F_1, . . . , F_n are Lipschitz continuous (see Definition B.5) with some Lipschitz constants L_1, . . . , L_n, where L_j ≤ M2 for every j = 1, . . . , n. This implies that

|F_j(y) − F_j(y − ∆c_{i,n})| ≤ L_j |∆c_{i,n}| ≤ M2 |∆c_{i,n}|.

The last inequality and (34) imply that

E [ψ_j(Y_i)]² < M1² ∫∫ [u(x − (y − ∆c_{i,n})) − u(x − y)]² dF_j(x) dF_i(y) ≤ M1² M2 |∆c_{i,n}|.

Thus, E [ψ_j(Y_i)]² has an upper bound that is independent of j. Putting together the last inequality, (32) and (33), we obtain:

Var [T_n(∆) − T_n(0)] → 0 as n → ∞.