To appear in Biometrika

12 July 2007

A jackknife variance estimator for unistage stratified samples with unequal probabilities

By YVES G. BERGER

Southampton Statistical Sciences Research Institute, University of Southampton, Southampton, SO17 1BJ, U.K.
y.g.berger@soton.ac.uk

SUMMARY

Existing jackknife variance estimators used with sample surveys can seriously overestimate the true variance under unistage stratified sampling without replacement with unequal probabilities. A novel jackknife variance estimator is proposed which is as numerically simple as existing jackknife variance estimators. Under certain regularity conditions, the proposed variance estimator is consistent under stratified sampling without replacement with unequal probabilities. The high entropy regularity condition necessary for consistency is shown to hold for the Rao-Sampford design. An empirical study of three unequal probability sampling designs supports our findings.

Some key words: Consistency; Design-based inference; Finite population correction; Sample survey; Smooth function of means; Stratification.

1. INTRODUCTION

Jackknife methods are widely used for variance estimation in sample surveys (Wolter, 1985; Shao & Tu, 1995). Properties of various forms of the jackknife variance estimator have been studied both theoretically and empirically (Lee, 1973; Jones, 1974; Kish & Frankel, 1974; Krewski & Rao, 1981; Rao & Wu, 1985; Kovar et al., 1988; Rao et al., 1992; Shao & Tu, 1995). Since the mid 1950s, there has been a well-developed theory of sample survey design inference embracing complex designs with stratification and unequal probabilities (Smith, 2001). However, customary jackknife variance estimators, henceforth called jackknife estimators, are not always consistent under these sampling designs (Demnati & Rao, 2004). We propose a novel jackknife estimator which is consistent under unistage stratified sampling with inclusion probability proportional to size.

The probability sample $s$ is defined as follows. Assume that the sample $s$ is randomly selected by a unistage stratified probability sampling design $p(s)$. Let $U_1, \ldots, U_H$ denote $H$ strata, where $\bigcup_{h=1}^{H} U_h = U$. The size of $U_h$ is denoted by $N_h$, and $\sum_{h=1}^{H} N_h = N$. Suppose that a sample $s_h$ of fixed size $n_h$ is selected without replacement with unequal probabilities from $U_h$. The complete sample is $s = \bigcup_{h=1}^{H} s_h$, and the size of $s$ is $n = \sum_{h=1}^{H} n_h$. This sampling design is often used by statistical agencies, and for surveys of rare animal populations (Thompson & Seber, 1996, p. 134).

2. THE CLASS OF POINT ESTIMATORS

Assume that the parameter of interest $\theta$ can be expressed as a function of means of $Q$ survey variables; that is, $\theta = g(\mu_1, \ldots, \mu_Q)$, where $g(\cdot)$ is a smooth differentiable function (Shao & Tu, 1995, Ch. 2). The parameter $\mu_q$ is the finite population mean defined by $\mu_q = N^{-1} \sum_{i \in U} y_{qi}$. The set $U = \{1, \ldots, N\}$ is a finite population containing $N$ units. The quantity $y_{qi}$ is the value of the $q$th survey variable ($q = 1, \ldots, Q$) associated with the unit labelled $i \in U$.

This definition of $\theta$ includes parameters of interest arising in common survey applications, such as ratios, subpopulation means, correlation and regression coefficients. It excludes parameters such as L-statistics (Shao, 1994) and coefficients of logistic regression, which cannot be expressed as functions of means. For simplicity, assume that the survey variables are free from errors due to nonresponse and measurement.

The point estimator $\hat\theta$ is the substitution estimator $\hat\theta = g(\hat\mu_1, \ldots, \hat\mu_Q)$, in which

$$\hat\mu_q = \sum_{i \in s} w_i y_{qi} \tag{1}$$

is the Hájek (1971) estimator of $\mu_q$, where $s$ is the sample, $w_i = \alpha_i (\sum_{k \in s} \alpha_k)^{-1}$ and $\alpha_i = 1/\pi_i$ denotes the Horvitz & Thompson (1952) sampling weights. The quantity $\pi_i$ denotes the first-order inclusion probability of unit $i$.
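As an illustration of (1) and of the substitution estimator, the following is a minimal Python sketch, assuming NumPy is available. The function names (`hajek_weights`, `hajek_means`, `substitution_estimator`) and the toy data are hypothetical and are not taken from the paper; the ratio of two means is used only as one example of a function $g$.

```python
import numpy as np

def hajek_weights(pi):
    """Normalised weights w_i = alpha_i / sum_k alpha_k, with alpha_i = 1 / pi_i."""
    alpha = 1.0 / np.asarray(pi, dtype=float)
    return alpha / alpha.sum()

def hajek_means(y, pi):
    """Hajek estimates of the Q population means; y has shape (n, Q)."""
    return hajek_weights(pi) @ np.asarray(y, dtype=float)

def substitution_estimator(g, y, pi):
    """theta_hat = g(mu_hat_1, ..., mu_hat_Q) for a user-supplied smooth g."""
    return g(hajek_means(y, pi))

# Hypothetical toy sample with Q = 2 variables; theta is the ratio mu_1 / mu_2.
y = np.array([[2.0, 4.0], [3.0, 5.0], [6.0, 8.0]])
pi = np.array([0.2, 0.5, 0.5])
ratio = lambda mu: mu[0] / mu[1]
theta_hat = substitution_estimator(ratio, y, pi)
```

Because the weights are normalised to sum to one, the population size $N$ is not needed to compute $\hat\mu_q$.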

3. EXISTING JACKKNIFE VARIANCE ESTIMATORS

3.1. The customary jackknife

A class of customary jackknife variance estimators of $\hat\theta$ is given by (Wolter, 1985, p. 182; Rao et al., 1992; Shao & Tu, 1995, p. 239)

$$v_l = \sum_{h=1}^{H} \Big(1 - \frac{n_h}{N_h}\Big) \frac{n_h - 1}{n_h} \sum_{j \in s_h} (\hat\theta_{(j)} - \hat\theta_{lh})^2, \tag{2}$$

for $l = 1, 2, 3$. In this class, there are three estimators given by different values of $l$, and $\hat\theta_{(j)}$ is the estimator of $\theta$ based on the data with the $j$th unit excluded; that is,

$$\hat\theta_{(j)} = g(\hat\mu_{1(j)}, \ldots, \hat\mu_{Q(j)}) \tag{3}$$

with

$$\hat\mu_{q(j)} = \sum_{i \in s} w_{i(j)} y_{qi}.$$

Here $w_{i(j)}$ are jackknife weights defined by

$$w_{i(j)} = \alpha_{i(j)} \Big(\sum_{k \in s} \alpha_{k(j)}\Big)^{-1}, \qquad \alpha_{i(j)} = \begin{cases} \xi_{ih}/\pi_i & \text{if } i \neq j, \\ 0 & \text{if } i = j, \end{cases} \tag{4}$$

with $\xi_{ih} = n_h/(n_h - 1)$, where $h$ is such that $i \in s_h$. There are three versions of the estimator in (2) given by different values of $\hat\theta_{lh}$ (Rao & Wu, 1985):

$$\hat\theta_{lh} = \begin{cases} n^{-1} \sum_{i \in s} \hat\theta_{(i)} & \text{when } l = 1, \\ \hat\theta & \text{when } l = 2, \\ n_h^{-1} \sum_{i \in s_h} \hat\theta_{(i)} & \text{when } l = 3. \end{cases}$$

The corresponding estimators are denoted by $v_1$, $v_2$ and $v_3$. The estimators $v_1$ and $v_2$ are often called type 1 and type 2 jackknife variance estimators (Särndal et al., 1992, p. 438). An alternative expression for $v_l$ in (2) is given by equation (5), which will be useful for comparison with the proposed jackknife estimator defined in § 4:

$$v_l = \sum_{h=1}^{H} \sum_{j \in s_h} c_h \Big(u_j - \sum_{i \in s_h} \phi_{ihl} u_i\Big)^2 \tag{5}$$

with $l = 1, 2, 3$ and

$$c_h = \frac{n_h}{n_h - 1} \Big(1 - \frac{n_h}{N_h}\Big), \tag{6}$$

$$u_j = \frac{n_h - 1}{n_h} (\hat\theta_{(j)} - \hat\theta) \quad \text{for } j \in s_h, \tag{7}$$

$$\phi_{ihl} = \begin{cases} 1/n & \text{when } l = 1, \\ 0 & \text{when } l = 2, \\ 1/n_h & \text{when } l = 3 \text{ and } i \in s_h, \\ 0 & \text{when } l = 3 \text{ and } i \notin s_h. \end{cases} \tag{8}$$

It can be easily shown that (5) equals (2), by substituting (6), (7) and (8) into (5). In § 4, we propose a novel jackknife estimator denoted by $v_p$, which has the same form as (5) and which is based upon different values for $c_h$ and $\phi_{ihl}$.
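The customary estimators in (2) can be sketched in a few lines of Python. This is an illustrative sketch only, assuming NumPy; the function name `customary_jackknife`, its argument layout and the dictionary `N_h` of stratum population sizes are hypothetical choices, and the factor $\xi_{ih}$ is applied exactly as written in (4).

```python
import numpy as np

def customary_jackknife(g, y, pi, strata, N_h, l=2):
    """Customary jackknife v_l of equation (2) for l in {1, 2, 3}.

    g      : function of the vector of estimated means
    y      : (n, Q) array of sample values
    pi     : first-order inclusion probabilities of the sampled units
    strata : stratum label of each sampled unit
    N_h    : dict mapping stratum label to stratum population size
    """
    y = np.asarray(y, dtype=float)
    pi = np.asarray(pi, dtype=float)
    strata = np.asarray(strata)
    n = pi.size
    labels = np.unique(strata).tolist()
    n_of = {h: int(np.sum(strata == h)) for h in labels}
    # xi_ih = n_h / (n_h - 1) for the stratum h containing unit i, as in (4)
    xi = np.array([n_of[h] / (n_of[h] - 1.0) for h in strata.tolist()])
    w = (1.0 / pi) / np.sum(1.0 / pi)
    theta = g(w @ y)
    theta_del = np.empty(n)
    for j in range(n):
        alpha = xi / pi
        alpha[j] = 0.0                      # delete unit j
        theta_del[j] = g((alpha / alpha.sum()) @ y)
    v = 0.0
    for h in labels:
        in_h = strata == h
        n_h = n_of[h]
        if l == 1:
            centre = theta_del.mean()       # mean over the whole sample
        elif l == 2:
            centre = theta                  # full-sample estimate
        else:
            centre = theta_del[in_h].mean() # mean within stratum h
        fpc = 1.0 - n_h / N_h[h]
        v += fpc * (n_h - 1.0) / n_h * np.sum((theta_del[in_h] - centre) ** 2)
    return v
```

For a single stratum selected by simple random sampling and $g$ equal to a mean, the three versions coincide with the familiar $(1 - n/N) s^2 / n$, which gives a quick sanity check of the sketch.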

3.2. Generalized jackknife that handles unequal probabilities

Campbell (1980) proposed a generalized jackknife variance estimator that allows for unequal probability sampling and stratification, and Berger & Skinner (2005) established its consistency. Campbell's generalized jackknife is given by

$$v_g = \sum_{i \in s} \sum_{j \in s} \frac{\pi_{ij} - \pi_i \pi_j}{\pi_{ij}} \, u_i^g u_j^g, \tag{9}$$

where

$$u_j^g = (1 - w_j)(\hat\theta - \hat\theta^g_{(j)}), \tag{10}$$

and $\hat\theta^g_{(j)}$ is defined by (3) after replacing $\xi_{ih}$ by 1 in (4). The quantity $\pi_{ij}$ denotes the joint inclusion probability of units $i$ and $j$.

Berger & Skinner (2005) showed that the correction $(1 - w_j)$ suggested by Campbell (1980) ensures that (9) reduces to the usual linearization variance estimator (Särndal et al., 1992, p. 182) when $\hat\theta$ is the Hájek estimator (1) of a mean. This correction reduces to $(n_h - 1) n_h^{-1}$ for stratified simple random sampling. In a simulation study, Berger & Skinner (2005) showed that the estimator (9) is more accurate than the customary estimators (2).

Nevertheless, the generalized jackknife has some practical disadvantages. First, this estimator is radically different from the customary jackknife (5), and hence it cannot be implemented with standard statistical packages. Secondly, exact joint inclusion probabilities may be difficult to calculate. The estimator $v_g$ can also take negative values. Thirdly, the double sum in (9) makes the generalized jackknife computationally more intensive than (5).

4. THE PROPOSED JACKKNIFE ESTIMATOR

By substituting the Hájek (1964, p. 1520) approximation for the $\pi_{ij}$, we obtain the proposed jackknife estimator given by (11). Although the $\pi_{ij}$ do not appear directly in (11), their effects are included by the approximation for the $\pi_{ij}$. The proposed estimator and the customary jackknife estimator (5) are similar:

$$v_p = \sum_{h=1}^{H} \sum_{i \in s_h} \tilde c_{ih} \Big(u_i^g - \sum_{j \in s_h} \tilde\phi_{jh} u_j^g\Big)^2 \tag{11}$$

with

$$\tilde c_{ih} = \frac{n_h}{n_h - 1} (1 - \pi_i) \quad (i \in s_h), \qquad \tilde\phi_{ih} = \lambda_h \tilde c_{ih} \delta_{ih}, \tag{12}$$

where $\delta_{ih} = 1$ if $i \in s_h$ and $\delta_{ih} = 0$ otherwise. The quantity $\lambda_h$ is such that $\sum_{i \in s} \tilde\phi_{ih} = 1$; that is, $\lambda_h = (\sum_{i \in s_h} \tilde c_{ih})^{-1}$. The quantity $\tilde c_{ih}$ includes a correction of degrees of freedom $n_h/(n_h - 1)$ and varying finite population corrections $(1 - \pi_i)$, which may be of particular use for surveys with large sampling fractions. Note that the pseudo-variable (10) is used in the proposed estimator $v_p$.

The proposed jackknife is as simple as (5); it does not require exact joint inclusion probabilities $\pi_{ij}$; and, unlike the generalized jackknife (9), it cannot take negative values. Furthermore, in § 5, we show that the proposed estimator is consistent under unequal probability sampling without replacement, which is not always true for (5).
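To make the construction in (10)-(12) concrete, here is a minimal Python sketch of $v_p$, assuming NumPy. The function name `proposed_jackknife` and its argument layout are hypothetical; the treatment of a stratum in which every $\pi_i = 1$ (skipped, since all $\tilde c_{ih} = 0$) follows the remark in § 6 that such units make no contribution to the variance.

```python
import numpy as np

def proposed_jackknife(g, y, pi, strata):
    """Proposed jackknife v_p of equation (11).

    g      : function of the vector of estimated means
    y      : (n, Q) array of sample values
    pi     : first-order inclusion probabilities of the sampled units
    strata : stratum label of each sampled unit
    """
    y = np.asarray(y, dtype=float)
    pi = np.asarray(pi, dtype=float)
    strata = np.asarray(strata)
    n = pi.size
    alpha = 1.0 / pi
    w = alpha / alpha.sum()
    theta = g(w @ y)
    u = np.empty(n)
    for j in range(n):
        a = alpha.copy()
        a[j] = 0.0                                   # delete unit j, xi_ih = 1
        theta_del = g((a / a.sum()) @ y)
        u[j] = (1.0 - w[j]) * (theta - theta_del)    # pseudo-value (10)
    v = 0.0
    for h in np.unique(strata).tolist():
        in_h = strata == h
        n_h = int(in_h.sum())
        c = n_h / (n_h - 1.0) * (1.0 - pi[in_h])     # c_ih in (12)
        if c.sum() == 0.0:
            continue                                 # all pi_i = 1: no contribution
        phi = c / c.sum()                            # lambda_h * c_ih
        centre = np.sum(phi * u[in_h])
        v += np.sum(c * (u[in_h] - centre) ** 2)
    return v
```

Each term of the sum is a non-negative weight times a square, which is why the result cannot be negative. With one stratum, equal $\pi_i = n/N$ and $g$ equal to a mean, the sketch returns $(1 - n/N) s^2 / n$, the same value as the customary estimators.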

5. CONSISTENCY OF THE PROPOSED JACKKNIFE VARIANCE ESTIMATOR

To demonstrate the consistency of the estimator in (11), consider a sequence of stratified samples $\{s^{(t)}\}$ selected from the sequence of nested finite populations $\{U^{(t)}\}$, for $t = 1, 2, \ldots$. All limiting processes are understood to be as $t \to \infty$. In what follows, a constant will be a scalar free of $t$. For simplicity of notation, the index $t$ is suppressed in what follows.

Let $r(s)$ be the probability of the stratified rejective sampling (Hájek, 1981, p. 132) defined by

$$r(s) = \prod_{h=1}^{H} r_h(s_h)$$

with

$$r_h(s_h) = \varphi_h I(s_h) \prod_{i \in s_h} \frac{p_i}{1 - p_i},$$

where $\varphi_h$ is such that $\sum_{s_h} r_h(s_h) = 1$; $I(s_h) = 1$ if $s_h$ contains $n_h$ units and $I(s_h) = 0$ otherwise; and the $p_i$ are such that the $\pi_i$ are the inclusion probabilities of $r(s)$; that is, the $p_i$ are such that $\sum_{s: i \in s} r(s) = \pi_i$. In order to have a unique set of $p_i$, we consider $\sum_{i \in U_h} p_i = n_h$ (Hájek, 1981, p. 132). Chen et al. (1994) propose a method for computing $p_i$ exactly.

This sampling design can be implemented by sequentially selecting Poisson samples with inclusion probabilities $p_i$ and retaining the first sample which has exactly $n_h$ units from $U_h$, for $h = 1, \ldots, H$. In this way, a fixed number $n_h$ of units are selected from $U_h$. The stratified rejective sampling design is not the stratified Poisson sampling design.

The stratified rejective sampling design may be difficult to implement in practice, as it requires the calculation of the $p_i$, which can be computationally intensive. For this reason, we assume that the sample is selected by a simpler sampling design $p(s)$ called the actual sampling design, which is also a stratified sampling design without replacement with unequal probabilities $\pi_i$. This design is generally different from $r(s)$. However, it will be assumed that $p(s)$ and $r(s)$ are asymptotically equivalent; see Condition 1 below. In order to show the consistency, the following regularity conditions are imposed.

Condition 1. We assume that

$$L = \sum_{s} \Big\{1 - \frac{p(s)}{r(s)}\Big\}^2 r(s) \to 0.$$

Condition 2. We assume that

$$K = \frac{E_r(\hat\theta - \theta)^4}{\mathrm{mse}_r(\hat\theta)^2} < \infty,$$

where $E_r(\cdot)$ and $\mathrm{mse}_r(\cdot)$ denote the expectation and mean squared error under $r(s)$.

Condition 3. We assume that $\{\theta - E_p(\hat\theta)\} / \mathrm{var}_p(\hat\theta)^{1/2} \to 0$ and $\{\theta - E_r(\hat\theta)\} / \mathrm{var}_r(\hat\theta)^{1/2} \to 0$, where $\mathrm{var}_p(\hat\theta)$ and $E_p(\hat\theta)$ denote the variance and the expectation of $\hat\theta$ under $p(s)$ and $\mathrm{var}_r(\hat\theta)$ denotes the variance of $\hat\theta$ under $r(s)$.

Condition 4. We assume that $\mathrm{var}_r(\hat u) / \mathrm{var}_r(\hat\theta) \to 1$, where $\mathrm{var}_r(\hat u)$ is the linearization variance of $\hat\theta$ under $r(s)$, see expressions (5.5.10) and (5.7.4) in Särndal et al. (1992); that is, $\mathrm{var}_r(\hat u)$ is the variance of $\hat u = \sum_{i \in s} \breve u_i$ under $r(s)$, where

$$\breve u_i = \frac{1}{\pi_i} \sum_{q=1}^{Q} \frac{\partial g(\mu_1, \ldots, \mu_Q)}{\partial \mu_q} (y_{qi} - \mu_q). \tag{13}$$

Condition 5. We assume that $\widehat{\mathrm{var}}(\hat u) / \mathrm{var}_r(\hat u) \to 1$, in probability, where $\widehat{\mathrm{var}}(\hat u)$ is the linearization variance based upon the Hájek (1964) variance estimator defined by

$$\widehat{\mathrm{var}}(\hat u) = \sum_{h=1}^{H} \Big\{ \sum_{i \in s_h} (1 - \pi_i) \breve u_i^2 - \hat d_h \hat B_h^2 \Big\}, \tag{14}$$

where

$$\hat B_h = \frac{1}{\hat d_h} \sum_{i \in s_h} (1 - \pi_i) \breve u_i, \qquad \hat d_h = \sum_{i \in U_h} \pi_i (1 - \pi_i).$$

Condition 1 implies that the entropy of $p(s)$ is large; see § 6. Condition 2 means that the kurtosis of $\hat\theta$ is bounded. Condition 3 means that $\hat\theta$ is asymptotically unbiased under $p(s)$ and $r(s)$; this is a standard requirement for using jackknife methods.

When we investigate asymptotic properties of the jackknife estimator for the variance, a standard requirement is the existence of the linearization variance estimator (Shao & Tu, 1995, Ch. 2; Shao, 1993). This assumption is given by Condition 4. When $\hat\theta$ is a smooth, continuous and differentiable function of means (Shao & Tu, 1995, Ch. 2), $\mathrm{var}_r(\hat u)$ agrees well with the actual variance $\mathrm{var}_r(\hat\theta)$ in large samples.

Condition 5 corresponds to the consistency of the linearization variance estimator based upon the Hájek estimator for the variance in (14). This can be supported by Hájek's (1964) consistency results. Hájek (1964, p. 1520) showed that $\widehat{\mathrm{var}}(\hat u)$ is a simplified version of the Sen-Yates-Grundy (Sen, 1953; Yates & Grundy, 1953) estimator for the variance of $\hat u = \sum_{i \in s} \breve u_i$, which can be assumed consistent. Berger & Skinner (2005) made an assumption similar to Condition 5, with the usual linearization Horvitz-Thompson variance estimator instead of $\widehat{\mathrm{var}}(\hat u)$. We also consider additional conditions proposed by Berger & Skinner (2005).

Condition 6. We assume that $|1 - w_i| \ge \gamma$ for all $i \in U$, where $\gamma$ is a strictly positive constant.

Condition 7. We assume that $\liminf \, n \, \mathrm{var}_r(\hat u) > 0$.

Condition 8. We assume that $n^{-1} \sum_{i \in s} w_i^{\tau} \| y_i - \hat\mu \|^{\tau} = O_p(n^{-\tau})$ for all $\tau \ge 2$, where $\| A \| = \mathrm{tr}(A'A)^{1/2}$ denotes the Euclidean norm, $y_i = (y_{1i}, \ldots, y_{Qi})$ and $\hat\mu = (\hat\mu_1, \ldots, \hat\mu_Q)$.

Condition 9. We assume that the gradient of $g(\mu)$ given by $\Delta(x) = (\partial g(\mu)/\partial \mu_1, \ldots, \partial g(\mu)/\partial \mu_Q)_{\mu = x}$ is Lipschitz continuous of order $\nu$ (Shao & Tu, 1995, p. 43).

Condition 10. We assume that $\| \Delta(\mu) \| = O_p(1)$.

Condition 6 ensures that none of the weights $w_i$ can approach 1, which would represent a degenerate design. Condition 7 holds in the standard circumstances where the linearization variance decreases with rate $1/n$ (Shao & Tu, 1995, p. 260). It holds when $\mathrm{var}_r(\hat u) \ge \xi / n$, where $\xi$ is a positive constant. This inequality is similar to the Cramér-Rao lower bound. Condition 8 is an assumption about the behaviour of the weights and the existence of moments of the $y_i$, which would hold, for example, if $n w_i$ and $y_i$ were bounded. Conditions 9 and 10 are smoothness requirements for the parameter of interest $\theta = g(\mu)$ (Shao & Tu, 1995, p. 43).

THEOREM 1. Under regularity Conditions 1-10, the proposed variance estimator $v_p$ is consistent; that is, $v_p / \mathrm{var}_p(\hat\theta) \to 1$.

The proof of this theorem is deferred to the Appendix. It follows as a corollary of Theorem 1 that if $\hat\theta$ is asymptotically normal then confidence intervals based upon $v_p$ are asymptotically valid, because by Slutsky's Lemma $(\hat\theta - \theta) v_p^{-1/2} \sim N(0, 1)$ asymptotically. It also follows that $v_p$ is asymptotically design-unbiased.

6. LARGE ENTROPY SAMPLING DESIGNS

In this section, we show that Condition 1 holds for the Rao-Sampford inclusion probability proportional to size design (Rao, 1965; Sampford, 1967). We also give evidence that suggests that Condition 1 holds for the Chao (1982) inclusion probability proportional to size design. Condition 1 implies that the entropy of p (s ) is large, as L is a distance between p (s ) and the stratified rejective sampling design r (s ) which is the maximum entropy design (Hájek, 1981, Ch. 3). Thus, this condition excludes low entropy designs such as the systematic inclusion probability proportional to size design which does not admit consistent estimators for the variance (Isaki & Fuller, 1982). However, the randomized systematic inclusion probability proportional to size design is a high entropy design (Berger, 1998), and simulations in § 7 show that the proposed estimator is accurate under this design. Condition 1 implies that π ij

π iπ j (Hájek, 1981, p. 32), and this ensures

efficient variance estimation (Hanurav, 1967; Isaki & Fuller, 1982). Note that Condition 1 holds for stratified simple random sampling, because p ( s ) = r ( s ) for this sampling design. The Rao-Sampford inclusion probability proportional to size design (Rao, 1965; Sampford, 1967) is a popular design used for unequal probability sampling without replacement. In an unpublished technical report available from the author, we have shown that Condition 1 holds for this design under the following conditions.

Condition 11. The number of strata $H$ is bounded by a constant.

Condition 12. The $\pi_i$ are such that $d_h = \sum_{i \in U_h} \pi_i (1 - \pi_i) \to \infty$ for each $h$.

Condition 13. We assume that $\pi_i \le \pi < 1$ for all $i \in U$, where $\pi$ is a constant.

Condition 12 implies that $n_h \to \infty$, as $d_h < n_h$. This condition excludes heavily stratified sampling designs, with $n_h$ small. Condition 11 implies that the number of strata does not tend to infinity. Condition 13 ensures that the $\pi_i$ are uniformly bounded and that none of the $\pi_i$ tends to one. When some $\pi_i$ equal one, the proposed estimator is still consistent, as the units with $\pi_i = 1$ will be included in a stratum which will make no contribution to the variance, as $\tilde c_{ih} = 0$ in (11), and $\pi_{ij} = \pi_i \pi_j$ in (9).

For the Chao (1982) sampling design, Berger (2005) gave conditions under which the divergence $\sum_s p(s) \log\{p(s)/r(s)\}$ tends to zero, and this implies that $\sum_s |1 - p(s)/r(s)| \, r(s) \to 0$. Thus, $p(s)/r(s) \to 1$ in probability by Chebyshev's inequality. This would imply Condition 1 if $|1 - p(s)/r(s)|$ were uniformly bounded (Lehmann, 1999, p. 53).

There is other evidence which shows that the proposed estimator is suitable under Chao sampling. Berger (2005) showed numerically that (A2) in the Appendix holds, which implies consistency under the Chao sampling design. In the Appendix, Condition 1 is only used as a sufficient condition for (A2); that is, Condition 1 is not necessary if (A2) holds. Thus, the proposed estimator should be consistent under the Chao sampling design. The simulation study in § 7 supports this finding.
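The stratified rejective design $r(s)$, which serves as the maximum entropy reference design in Condition 1, can be simulated by the sequential Poisson scheme described in § 5: draw Poisson samples with probabilities $p_i$ and keep the first one of the required size. The following Python sketch illustrates this under stated assumptions: NumPy is available, the working probabilities $p_i$ are supplied by the user (computing them from the $\pi_i$, as in Chen et al. (1994), is not attempted here), and the function names and the `max_tries` safeguard are hypothetical.

```python
import numpy as np

def rejective_sample(p, n_h, rng, max_tries=100_000):
    """Draw Poisson samples with probabilities p until one has exactly n_h units."""
    p = np.asarray(p, dtype=float)
    for _ in range(max_tries):
        take = rng.random(p.size) < p       # independent Bernoulli(p_i) draws
        if take.sum() == n_h:
            return np.flatnonzero(take)     # indices of the selected units
    raise RuntimeError("no Poisson sample of the required size was obtained")

def stratified_rejective_sample(p_by_stratum, n_by_stratum, seed=0):
    """Apply the rejective scheme independently within each stratum."""
    rng = np.random.default_rng(seed)
    return {
        h: rejective_sample(p_by_stratum[h], n_by_stratum[h], rng)
        for h in p_by_stratum
    }

# Hypothetical working probabilities; each stratum's p_i sum to its n_h.
sample = stratified_rejective_sample(
    {"a": [0.2, 0.5, 0.8, 0.5], "b": [0.5, 0.5]},
    {"a": 2, "b": 1},
)
```

Conditioning independent Bernoulli draws on the realised sample size gives selection probabilities proportional to $\prod_{i \in s_h} p_i/(1 - p_i)$, which is the form of $r_h(s_h)$ given in § 5.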

7. SIMULATION STUDY

In this section, the proposed variance estimator (11) is compared numerically with customary jackknife estimators (5) and the generalized jackknife estimator (9). We use a population frame given in Valliant et al. (2000, Appendix B) and available at the John Wiley world wide web site ftp://ftp.wiley.com/public/sci_tech_med/finite_populations. This population frame is extracted from the September 1976 Current Population Survey in the United States. We replicated this population frame five times to create an artificial population of $N = 2390$ individuals from which samples were selected. The population is stratified into $H = 3$ strata with stratum sizes 1050, 1060 and 280.

The variables of interest are the number of hours worked per week, $y_{1i}$, and the weekly wages, $y_{2i}$. Analysis of variance tests show that the stratum means of the $y_{2i}$ are significantly different, but that the stratum means of the $y_{1i}$ are not significantly different. The population parameter considered is the finite population correlation coefficient $\theta = S_{12} (S_{11} S_{22})^{-1/2} = 0.49$ between the two variables of interest, where $S_{kl} = \sum_{i \in U} (y_{ki} - \mu_k)(y_{li} - \mu_l)$ with $k, l = 1, 2$. The substitution estimator is $\hat\theta = \hat S_{12} (\hat S_{11} \hat S_{22})^{-1/2}$, where $\hat S_{kl} = \sum_{i \in s} w_i (y_{ki} - \hat\mu_k)(y_{li} - \hat\mu_l)$ with $k, l = 1, 2$.

Three different inclusion probability proportional to size designs are used to select units within each stratum: the Rao-Sampford sampling design, the Chao (1982) sampling design and the randomized systematic sampling design. A proportional allocation is used. The $\pi_i$ are proportional to a size variable, or equal to one for units with a large value of the size variable. The usual method described in Särndal et al. (1992, p. 89) is used to compute the $\pi_i$.

The value $z_i$ of the size variable for the $i$th unit is generated by the model $z_i = e_i y_{2i}$, where $e_i \sim N\{0.7, (0.25)^2\}$ if $y_{2i} < 550$, and $e_i \sim N\{1.5, (0.35)^2\}$ if $y_{2i} \ge 550$. The size variable is correlated with the weekly wage variable, with a correlation equal to 0.83. By generating $e_i$ from different normal distributions for $y_{2i} < 550$ and $y_{2i} \ge 550$, we obtain a skewed size variable which implies a strong design effect because of the unequal probabilities. If the $\pi_i$ are concentrated around their mean of $n/N$, we expect $p(s)$ to be similar to the stratified simple random sampling design, and in this situation we should not observe major differences between (5), (9) and (11).

For each simulation, 10 000 samples were selected and, for each variance estimator, we computed, as percentages, the empirical relative bias,

$$\mathrm{RB}(v) = 100 \, \frac{E(v) - \mathrm{mse}(\hat\theta)}{\mathrm{mse}(\hat\theta)} \, \%,$$

and the empirical relative root mean squared error,

$$\mathrm{RRMSE}(v) = 100 \, \frac{\mathrm{mse}(v)^{1/2}}{\mathrm{mse}(\hat\theta)} \, \%.$$

The quantity $\mathrm{mse}(\hat\theta)$ is the empirical mean squared error of $\hat\theta$. Tables 1 and 2 give RB and RRMSE for different sampling fractions $f = n/N$ and for the three sampling designs considered.

We do not have RB and RRMSE for the Rao-Sampford sampling design when $f \ge 0.10$, as in this case some of the $\pi_i$ are too large to allow the selection of a sample in a reasonable time. The Rao-Sampford design is implemented by selecting $n$ units with replacement with unequal probabilities. If the units drawn are all distinct, we accept the sample; otherwise we reject it and start again. With large $\pi_i$ we would almost surely draw the units with large $\pi_i$ at least twice, and it will not be possible in practice to select one Rao-Sampford sample. For example, consider $N = 86$, $n = 36$ and $\pi_i$ proportional to $z_i = (i/100)^5 + 1/5$. The probability that all the units drawn at $n$ subsequent independent draws are distinct is approximately $10^{-36}$ (Hájek, 1981, p. 70), which is negligible.

Table 1 shows that the customary jackknife estimators $v_1$, $v_2$ and $v_3$ may overestimate the variance for a small sampling fraction, and seriously underestimate the variance for a large sampling fraction. The value of RB of the proposed jackknife $v_p$ is generally smaller than those for $v_1$, $v_2$ and $v_3$, except for $f = 0.10$, in which case all the estimators have a small RB under Chao and randomized systematic sampling.

Table 2 shows that $v_1$, $v_2$ and $v_3$ have the same order of accuracy, as shown by Rao & Wu (1985). In all cases, $v_p$ and $v_g$ have the same order of accuracy. The values of RRMSE of $v_p$ and $v_g$ are essentially equal, and they are generally smaller than those of $v_1$, $v_2$ and $v_3$, except for $f = 0.15$ and $f = 0.20$, in which case the value is slightly smaller for $v_1$, $v_2$ and $v_3$.
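The two performance measures defined above can be computed from replicated simulation output along the following lines. This is a hedged sketch, assuming NumPy: the function names are hypothetical, and $\mathrm{mse}(v)$ is taken to be the empirical mean squared error of $v$ about $\mathrm{mse}(\hat\theta)$, which is an assumption since the text does not spell out this definition.

```python
import numpy as np

def relative_bias(v, theta_hat, theta):
    """Empirical RB(v) in percent, from replicated v and theta_hat values."""
    v = np.asarray(v, dtype=float)
    theta_hat = np.asarray(theta_hat, dtype=float)
    mse_theta = np.mean((theta_hat - theta) ** 2)   # empirical mse of theta_hat
    return 100.0 * (np.mean(v) - mse_theta) / mse_theta

def rrmse(v, theta_hat, theta):
    """Empirical RRMSE(v) in percent.

    Assumption: mse(v) is measured about the empirical mse of theta_hat.
    """
    v = np.asarray(v, dtype=float)
    theta_hat = np.asarray(theta_hat, dtype=float)
    mse_theta = np.mean((theta_hat - theta) ** 2)
    mse_v = np.mean((v - mse_theta) ** 2)
    return 100.0 * np.sqrt(mse_v) / mse_theta
```

In a study such as the one reported here, `v` and `theta_hat` would each hold one value per selected sample, and `theta` would be the known population value.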

8. DISCUSSION

The proposed jackknife variance estimator can be extended in a number of ways. Point estimators, such as calibration estimators (Huang & Fuller, 1978; Deville & Särndal, 1992) or a regression estimator (Särndal et al., 1992, Ch. 7), which use auxiliary population information, may often be expressible as functions of means. The proposed jackknife estimator needs to be extended to two-stage sampling. However, the estimator can be used in a stage-wise jackknife in which the clusters are deleted to estimate the variance between clusters and units are deleted within each cluster to estimate the variance within clusters.

Many surveys use single imputation to handle item nonresponse. Treating the imputed values as if they are true values, and then estimating the variance using standard methods, may lead to serious underestimation of the variance when the proportion of missing values is not small (Rao & Shao, 1992; Särndal, 1992). Here, one can use the Rao & Shao (1992) method, which consists of adjusting the imputed values whenever a responding unit is deleted. Berger & Rao (2006) showed that this method gives a consistent generalized jackknife under uniform response with unequal probabilities. This result can be extended to show that the proposed jackknife (11) is also consistent for the Rao & Shao (1992) method.

ACKNOWLEDGEMENT

The author is grateful to two anonymous referees for helpful comments.

APPENDIX

Proof of Theorem 1

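Let $\mathrm{mse}_p(\hat\theta)$ and $\mathrm{mse}_r(\hat\theta)$ denote the mean squared errors of $\hat\theta$ under $p(s)$ and $r(s)$. Using the triangle inequality and Cauchy's inequality, we have

$$\Big| \frac{\mathrm{mse}_p(\hat\theta)}{\mathrm{mse}_r(\hat\theta)} - 1 \Big| \le \frac{1}{\mathrm{mse}_r(\hat\theta)} \sum_{s} \Big| 1 - \frac{p(s)}{r(s)} \Big| \, r(s) (\hat\theta - \theta)^2 \le (K L)^{1/2}.$$

Conditions 1 and 2 imply that

$$\frac{\mathrm{mse}_r(\hat\theta)}{\mathrm{mse}_p(\hat\theta)} \to 1. \tag{A1}$$

Now, (A1), the following equation

$$\frac{\mathrm{mse}_r(\hat\theta)}{\mathrm{mse}_p(\hat\theta)} = \frac{\mathrm{var}_r(\hat\theta)}{\mathrm{var}_p(\hat\theta)} \times \frac{1 + \{\theta - E_r(\hat\theta)\}^2 / \mathrm{var}_r(\hat\theta)}{1 + \{\theta - E_p(\hat\theta)\}^2 / \mathrm{var}_p(\hat\theta)},$$

and Condition 3 imply that

$$\frac{\mathrm{var}_r(\hat\theta)}{\mathrm{var}_p(\hat\theta)} \to 1. \tag{A2}$$

The result in (A2), Condition 4 and Condition 5 imply that

$$\frac{\widehat{\mathrm{var}}(\hat u)}{\mathrm{var}_p(\hat\theta)} \to 1 \tag{A3}$$

in probability. It can be easily shown that

$$\widehat{\mathrm{var}}(\hat u) = \sum_{i \in s} \sum_{j \in s} D_{ij} \breve u_i \breve u_j,$$

where

$$D_{ij} = \begin{cases} -(1 - \pi_i)(1 - \pi_j) d_h^{-1} & \text{if } i \neq j \text{ and } i, j \in s_h, \\ 1 - \pi_i - (1 - \pi_i)^2 d_h^{-1} & \text{if } i = j \in s_h, \\ 0 & \text{if } i \in s_h \text{ and } j \in s_l, \text{ with } h \neq l. \end{cases} \tag{A4}$$

Conditions 6-10 imply that Berger & Skinner's (2005) regularity conditions (a), (b), (c), (d), (g) and (h) hold. Conditions (e) and (f) in Berger & Skinner (2005) hold as $D_{ij}$ is given by (A4). These conditions imply (Berger & Skinner, 2005) that

$$\frac{\widehat{\mathrm{var}}(\hat u^g)}{\widehat{\mathrm{var}}(\hat u)} \to 1 \tag{A5}$$

in probability, where

$$\widehat{\mathrm{var}}(\hat u^g) = \sum_{i \in s} \sum_{j \in s} D_{ij} u_i^g u_j^g. \tag{A6}$$

Now (A5) combined with (A3) implies that

$$\frac{\widehat{\mathrm{var}}(\hat u^g)}{\mathrm{var}_p(\hat\theta)} \to 1 \tag{A7}$$

in probability. Finally, if we substitute (A4) into (A6), it can be shown that $\widehat{\mathrm{var}}(\hat u^g) = v_p$, which implies the Theorem by (A7).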
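For readers who wish to check the first inequality of the proof, the step can be unpacked as below. This is our reading of the argument rather than a derivation given in the paper, and it assumes that $r(s) > 0$ whenever $p(s) > 0$, so that the ratio $p(s)/r(s)$ is well defined.

```latex
\begin{align*}
\mathrm{mse}_p(\hat\theta) - \mathrm{mse}_r(\hat\theta)
  &= \sum_s \{p(s) - r(s)\}(\hat\theta - \theta)^2
   = -\sum_s \Big\{1 - \frac{p(s)}{r(s)}\Big\}\, r(s)\,(\hat\theta - \theta)^2, \\
\Big|\frac{\mathrm{mse}_p(\hat\theta)}{\mathrm{mse}_r(\hat\theta)} - 1\Big|
  &\le \frac{1}{\mathrm{mse}_r(\hat\theta)}
       \sum_s \Big|1 - \frac{p(s)}{r(s)}\Big|\, r(s)\,(\hat\theta - \theta)^2
       && \text{(triangle inequality)} \\
  &\le \frac{1}{\mathrm{mse}_r(\hat\theta)}
       \Big[\sum_s \Big\{1 - \frac{p(s)}{r(s)}\Big\}^2 r(s)\Big]^{1/2}
       \Big[\sum_s r(s)\,(\hat\theta - \theta)^4\Big]^{1/2}
       && \text{(Cauchy's inequality)} \\
  &= \frac{L^{1/2}\,\{E_r(\hat\theta - \theta)^4\}^{1/2}}{\mathrm{mse}_r(\hat\theta)}
   = (K L)^{1/2}.
\end{align*}
```

Since $K$ is bounded by Condition 2 and $L \to 0$ by Condition 1, the right-hand side tends to zero, which gives (A1).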

REFERENCES

BERGER, Y.G. (1998). Rate of convergence to asymptotic variance for the Horvitz-Thompson estimator. J. Statist. Plan. Infer. 74, 149-68.

BERGER, Y.G. & RAO, J.N.K. (2006). Adjusted jackknife for imputation under unequal probability sampling without replacement. J. R. Statist. Soc. B 68, 531-47.

BERGER, Y.G. (2005). Variance estimation with Chao's sampling scheme. J. Statist. Plan. Infer. 127, 253-77.

BERGER, Y.G. & SKINNER, C.J. (2005). A jackknife variance estimator for unequal probability sampling. J. R. Statist. Soc. B 67, 79-89.

CAMPBELL, C. (1980). A different view of finite population estimation. In Proc. Surv. Res. Meth. Sect. Am. Statist. Assoc., pp. 319-24. Baltimore: Am. Statist. Assoc.

CHAO, M.T. (1982). A general purpose unequal probability sampling plan. Biometrika 69, 653-56.

CHEN, X.H., DEMPSTER, A.P. & LIU, J.S. (1994). Weighting finite population sampling to maximise entropy. Biometrika 81, 457-69.

DEVILLE, J.C. & SÄRNDAL, C.E. (1992). Calibration estimators in survey sampling. J. Am. Statist. Assoc. 87, 376-82.

DEMNATI, A. & RAO, J.N.K. (2004). Linearization variance estimators for survey data (with Discussion). Survey Methodol. 30, 17-34.

HÁJEK, J. (1964). Asymptotic theory of rejective sampling with varying probabilities from a finite population. Ann. Math. Statist. 35, 1491-523.

HÁJEK, J. (1971). Comment on a paper by D. Basu. In Foundations of Statistical Inference, Ed. V.P. Godambe and D.A. Sprott, p. 236. Toronto: Holt, Rinehart and Winston.

HÁJEK, J. (1981). Sampling in Finite Population. New York: Marcel Dekker, Inc.

HORVITZ, D.G. & THOMPSON, D.J. (1952). A generalization of sampling without replacement from a finite universe. J. Am. Statist. Assoc. 47, 663-85.

HUANG, E.T. & FULLER, W.A. (1978). Nonnegative regression estimation for sample survey data. In Proc. Social Statist. Sec. Am. Statist. Assoc., pp. 300-5. Baltimore: Am. Statist. Assoc.

ISAKI, C.T. & FULLER, W.A. (1982). Survey design under the regression superpopulation model. J. Am. Statist. Assoc. 77, 89-96.

JONES, H.L. (1974). Jackknife estimation of functions of stratum means. Biometrika 61, 343-8.

KISH, L. & FRANKEL, M.R. (1974). Inference from complex samples (with Discussion). J. R. Statist. Soc. B 36, 1-37.

KOVAR, J.G., RAO, J.N.K. & WU, C.F.J. (1988). Bootstrap and other methods to measure errors in survey estimates. Can. J. Statist. 16, 25-45.

KREWSKI, D. & RAO, J.N.K. (1981). Inference from stratified samples: properties of the linearization, jackknife and balanced repeated replication methods. Ann. Statist. 9, 1010-9.

LEE, K. (1973). Variance estimation in stratified sampling. J. Am. Statist. Assoc. 68, 336-42.

LEHMANN, E.L. (1999). Elements of Large-Sample Theory. New York: Springer-Verlag.

RAO, J.N.K. (1965). On two simple schemes of unequal probability sampling without replacement. J. Indian Statist. Assoc. 3, 173-80.

RAO, J.N.K. & SHAO, A.J. (1992). Jackknife variance estimation with survey data under hotdeck imputation. Biometrika 79, 811-22.

RAO, J.N.K. & WU, C.F.J. (1985). Inference from stratified samples: second-order analysis of three methods for nonlinear statistics. J. Am. Statist. Assoc. 80, 620-30.

RAO, J.N.K., WU, C.F.J. & YUE, K. (1992). Some recent work on resampling methods for complex surveys. Survey Methodol. 18, 209-17.

SAMPFORD, M.R. (1967). On sampling without replacement with unequal probabilities of selection. Biometrika 54, 494-513.

SÄRNDAL, C.E. (1992). Methods for estimating the precision of survey estimates when imputation has been used. Survey Methodol. 18, 241-52.

SÄRNDAL, C.E., SWENSSON, B. & WRETMAN, J.H. (1992). Model Assisted Survey Sampling. New York: Springer-Verlag.

SEN, P.K. (1953). On the estimate of the variance in sampling with varying probabilities. J. Indian Soc. Agric. Statist. 5, 119-27.

SHAO, J. (1993). Differentiability of statistical functionals and consistency of the jackknife. Ann. Statist. 21, 61-75.

SHAO, J. (1994). L-statistics in complex survey problems. Ann. Statist. 22, 946-67.

SMITH, T.M.F. (2001). Biometrika centenary: sample surveys. Biometrika 88, 167-94.

SHAO, J. & TU, D. (1995). The Jackknife and Bootstrap. New York: Springer-Verlag.

THOMPSON, S.K. & SEBER, G.A.F. (1996). Adaptive Sampling. New York: Wiley.

VALLIANT, R., DORFMAN, A.H. & ROYALL, R.M. (2000). Finite Population Sampling and Inference: a Prediction Approach. New York: Wiley.

WOLTER, K.M. (1985). Introduction to Variance Estimation. New York: Springer-Verlag.

YATES, F. & GRUNDY, P.M. (1953). Selection without replacement from within strata with probability proportional to size. J. R. Statist. Soc. B 15, 253-61.

TABLE 1. Simulation study. Empirical relative biases (%) of variance estimators v1, v2, v3, vg and vp for three different stratified unequal probability sampling designs.

f      Sampling design            v1       v2       v3       vg       vp

0.03   Rao-Sampford              12.6     12.8     12.3     -0.3     -0.5
       Randomized Systematic     13.8     14.0     13.4      0.1     -0.2
       Chao                      15.0     15.1     14.6      1.2      1.2

0.05   Rao-Sampford               5.7      5.8      5.6     -1.7     -1.9
       Randomized Systematic      8.3      8.4      8.2      0.5      0.2
       Chao                       7.2      7.2      7.1     -0.8     -0.9

0.07   Rao-Sampford               5.7      5.8      5.7      2.1      1.9
       Randomized Systematic      3.9      3.9      3.8      0.4      0.1
       Chao                       2.9      2.9      2.9     -1.1     -1.2

0.10   Randomized Systematic     -0.5     -0.4     -0.5      0.0     -0.3
       Chao                       1.3      1.3      1.2      1.4      1.4

0.15   Randomized Systematic     -5.3     -5.3     -5.3      0.6      0.2
       Chao                      -5.7     -5.7     -5.7     -0.1     -0.1

0.20   Randomized Systematic     -8.7     -8.7     -8.7      2.1      1.7
       Chao                     -10.5    -10.5    -10.5     -0.5     -0.4

0.40   Randomized Systematic    -26.4    -26.4    -26.4      0.4     -0.1
       Chao                     -25.4    -25.4    -25.4     -0.2      1.3

TABLE 2. Simulation study. Empirical relative root mean squared errors (%) of v1, v2, v3, vg and vp for three different stratified unequal probability sampling designs.

f      Sampling design            v1       v2       v3       vg       vp

0.03   Rao-Sampford             110.0    110.4    109.2     86.1     85.8
       Randomized Systematic    117.2    117.6    116.3     89.4     89.2
       Chao                     118.3    118.7    117.4     89.7     89.5

0.05   Rao-Sampford              90.6     90.8     90.4     75.4     75.2
       Randomized Systematic     93.9     94.1     93.7     77.6     77.4
       Chao                      93.7     93.8     93.4     78.0     77.7

0.07   Rao-Sampford              79.8     79.9     79.7     71.5     71.3
       Randomized Systematic     75.4     75.5     75.3     67.8     67.6
       Chao                      72.6     72.7     72.6     63.9     63.7

0.10   Randomized Systematic     63.9     63.9     63.8     60.6     60.4
       Chao                      62.7     62.7     62.6     58.5     58.3

0.15   Randomized Systematic     48.5     48.5     48.5     49.2     49.1
       Chao                      48.6     48.6     48.6     49.6     49.4

0.20   Randomized Systematic     40.4     40.4     40.4     43.8     43.6
       Chao                      39.2     39.2     39.2     41.9     41.8

0.40   Randomized Systematic     32.7     32.7     32.7     28.6     28.4
       Chao                      31.8     31.8     31.8     28.0     28.3
