Current Estimates for Sampling on Two Occasions Using Two-Stage Sampling

Jane Meza
University of Nebraska-Lincoln, Department of Mathematics and Statistics
927 Oldfather Hall
Lincoln, NE 68588-0323, U.S.A.
[email protected]

Manas Chattopadhyay
Gallup Organization
One Church Street, Suite 900
Rockville, MD 20850, U.S.A.
[email protected]

Partha Lahiri
University of Nebraska-Lincoln, Department of Mathematics and Statistics
922 Oldfather Hall
Lincoln, NE 68588-0323, U.S.A.
[email protected]

Roger Tourangeau
Gallup Organization
One Church Street, Suite 900
Rockville, MD 20850, U.S.A.
[email protected]

1. Introduction

Repeated sampling of a population is a common sampling procedure. In this paper we consider sampling on two occasions, with interest in the current estimates. Cochran (1977) considered sampling on two occasions using random sampling at each of the two occasions. He found that current estimates might be improved by replacing part of the sample on the second occasion. That is, rather than keeping all of the same units or drawing entirely new units for the second sample, some of the units from the first sample are retained (matched) in the second sample. This procedure may improve the efficiency of the estimators.

We consider instead sampling on two occasions, with two-stage sampling at each occasion. On the second occasion, should the same units be retained, should all new units be sampled, or should some of the units be matched with the first occasion? If some of the units are matched with the first occasion, what is the optimal percentage that should be matched? These questions are considered in the following discussion.

2. Sample Design

Suppose a sample is repeated on two occasions, with a sample size n on each occasion. For the second sample, m (for matched) of the n units in the first sample are retained and u (for unmatched) new units are selected, so that m + u = n. We wish to determine the optimal percentage to match, m/n. Suppose there are N primary sampling units on both occasions and that each primary sampling unit has K secondary sampling units. Implement a two-stage sampling design at each occasion.
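That is, at each occasion, select a simple random sample of n primary sampling units, and within each selected primary sampling unit, draw a simple random sample of k secondary sampling units.

3. Notation

The following notation is used.

$y_{ij2}$ = value for the jth s.s.u. in the ith p.s.u. on occasion 2

$y_{i2} = \sum_{j=1}^{k} y_{ij2}$ = total for ith p.s.u. at time 2

$\bar{y}_{2} = \frac{1}{n}\sum_{i=1}^{n} y_{i2}$ = average per p.s.u. at time 2

$\bar{y}_{2m} = \frac{1}{m}\sum_{i=1}^{m} y_{i2}$ = average per p.s.u. for the matched sample at time 2

$\bar{y}_{2u} = \frac{1}{u}\sum_{i=1}^{u} y_{i2}$ = average per p.s.u. for the unmatched sample at time 2

$\bar{Y}_{2} = \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{K} Y_{ij2}$ = average per p.s.u. at time 2

$$S_{B\alpha\alpha} = \frac{1}{N-1}\sum_{i=1}^{N}\left(\bar{Y}_{i\alpha} - \frac{1}{N}\sum_{i'=1}^{N}\bar{Y}_{i'\alpha}\right)^{2} \quad \text{for } \alpha = 1, 2.$$

$$S_{W\alpha\alpha} = \frac{1}{N(K-1)}\sum_{i=1}^{N}\sum_{j=1}^{K}\left(y_{ij\alpha} - \bar{Y}_{i\alpha}\right)^{2} \quad \text{for } \alpha = 1, 2.$$

$$S_{B12} = \frac{1}{N-1}\sum_{i=1}^{N}\left(\bar{Y}_{i1} - \frac{1}{N}\sum_{i'=1}^{N}\bar{Y}_{i'1}\right)\left(\bar{Y}_{i2} - \frac{1}{N}\sum_{i'=1}^{N}\bar{Y}_{i'2}\right)$$

$$S_{W12} = \frac{1}{N(K-1)}\sum_{i=1}^{N}\sum_{j=1}^{K}\left(y_{ij1} - \bar{Y}_{i1}\right)\left(y_{ij2} - \bar{Y}_{i2}\right)$$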
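The design and the sample quantities defined above can be illustrated with a short Python sketch. It is not taken from the paper. The helper names (`draw_two_occasion_sample`, `psu_totals`, `mean_per_psu`) are hypothetical, and two design details that the text leaves open are filled in by assumption: the u new p.s.u.'s are drawn from p.s.u.'s outside the first sample, and s.s.u.'s are drawn afresh within every p.s.u. on each occasion.

```python
import random


def draw_two_occasion_sample(N, K, n, k, m, seed=0):
    """Draw p.s.u. and s.s.u. indices for both occasions (hypothetical helper)."""
    rng = random.Random(seed)
    u = n - m
    first_psus = rng.sample(range(N), n)   # SRS of n p.s.u.'s on occasion 1
    matched = first_psus[:m]               # a slice of an SRS is itself an SRS
    first_set = set(first_psus)
    # Assumption: new p.s.u.'s are drawn from those not in the first sample.
    pool = [i for i in range(N) if i not in first_set]
    unmatched = rng.sample(pool, u)
    # Assumption: s.s.u.'s are drawn afresh within each p.s.u. on each occasion.
    occasion1 = {i: rng.sample(range(K), k) for i in first_psus}
    occasion2 = {i: rng.sample(range(K), k) for i in matched + unmatched}
    return occasion1, occasion2, matched, unmatched


def psu_totals(values, sample):
    """y_i = sum of the sampled s.s.u. values within each sampled p.s.u."""
    return {i: sum(values[i][j] for j in ssus) for i, ssus in sample.items()}


def mean_per_psu(totals, psus):
    """Average of the p.s.u. totals over the given list of p.s.u.'s."""
    return sum(totals[i] for i in psus) / len(psus)
```

With `values` an N-by-K nested list of second-occasion values, `mean_per_psu(psu_totals(values, occasion2), matched)` gives the matched-sample average per p.s.u., and the same call with `unmatched` gives the unmatched-sample average.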
4. Current Estimates When Sampling Over Two Occasions Using Two-Stage Sampling

Suppose there are N p.s.u.'s on both occasions and that each p.s.u. has K s.s.u.'s. Assume that n p.s.u.'s and k s.s.u.'s are sampled on each occasion and that two-stage sampling is used on each occasion. On the second occasion, m of the p.s.u.'s from the first sample are again sampled and u new p.s.u.'s are sampled. We are interested in estimating $\bar{Y}_{2}$. The mean of the unmatched portion on the second occasion, $\bar{y}_{2u}$, is an unbiased estimator of $\bar{Y}_{2}$, as is the following regression estimator considered by Cochran (1977):

$$\bar{y}'_{2m} = \bar{y}_{2m} + b\left(\bar{y}_{1} - \bar{y}_{1m}\right). \tag{1}$$

The best estimator of $\bar{Y}_{2}$ is

$$\bar{y}_{2r} = \phi\,\bar{y}_{2u} + (1-\phi)\,\bar{y}'_{2m}, \tag{2}$$

where

$$\phi = \frac{\left(V(\bar{y}_{2u})\right)^{-1}}{\left(V(\bar{y}_{2u})\right)^{-1} + \left(V(\bar{y}'_{2m})\right)^{-1}}.$$

Then

$$V(\bar{y}_{2r}) = \left[\left(V(\bar{y}_{2u})\right)^{-1} + \left(V(\bar{y}'_{2m})\right)^{-1}\right]^{-1}. \tag{3}$$

Proofs of the following theorems along with several preliminary lemmas are included in Meza and Lahiri (1999).
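The weighting in equations (2) and (3) is an inverse-variance combination of two estimators. The following minimal Python sketch shows the arithmetic only. The function name `combine_inverse_variance` is hypothetical, the variances are taken as known inputs, and the variance formula assumes the two estimators are uncorrelated.

```python
def combine_inverse_variance(est_u, var_u, est_m, var_m):
    """Combine two estimators of the same quantity as in equations (2) and (3).

    est_u, var_u: unmatched-portion estimate and its variance.
    est_m, var_m: matched-portion estimate and its variance.
    Assumes the two estimators are uncorrelated.
    """
    weight_u = 1.0 / var_u
    weight_m = 1.0 / var_m
    phi = weight_u / (weight_u + weight_m)          # weight on the unmatched part
    estimate = phi * est_u + (1.0 - phi) * est_m    # equation (2)
    variance = 1.0 / (weight_u + weight_m)          # equation (3)
    return phi, estimate, variance
```

For example, two estimates of 10 and 12 with equal variances of 4 receive equal weight, and the combined variance is half of either input variance.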
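Theorem 1. If n p.s.u.'s and k s.s.u.'s are selected by simple random sampling, then under two-stage sampling the variance of $\bar{y}_{2r}$ (ignoring the fpc) is given by

$$V(\bar{y}_{2r}) = k S_{B22}\,(k+\lambda_{2})\,\frac{n + c\,u\,\rho_{B}^{2}}{n^{2} + c\,u^{2}\rho_{B}^{2}}, \tag{4}$$

where

$$\lambda_{1} = \frac{S_{W11}}{S_{B11}}, \quad \lambda_{2} = \frac{S_{W22}}{S_{B22}}, \quad \lambda_{12} = \frac{S_{W12}}{S_{B12}}, \quad c = \frac{\lambda_{1} - k - 2\lambda_{12}}{k + \lambda_{2}} \quad \text{and} \quad \rho_{B}^{2} = \frac{S_{B12}^{2}}{S_{B11}S_{B22}}.$$

Further, the variance is optimal when

$$\frac{m}{n} = \frac{\sqrt{1 + c\rho_{B}^{2}}}{1 + \sqrt{1 + c\rho_{B}^{2}}}. \tag{5}$$

Remarks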
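(i) If K = k = 1, the variance reduces to

$$V(\bar{y}_{2r}) = S_{B22}\,\frac{n - u\rho_{B}^{2}}{n^{2} - u^{2}\rho_{B}^{2}}$$

and the variance is optimal when

$$\frac{m}{n} = \frac{\sqrt{1 - \rho_{B}^{2}}}{1 + \sqrt{1 - \rho_{B}^{2}}}.$$

(ii) If K = k = 1 and u = n, or equivalently m = 0, the variance reduces to

$$V(\bar{y}_{2r}) = \frac{S_{B22}}{n}.$$

(iii) The optimal variance reduces to

$$V_{opt}(\bar{y}_{2r}) = \frac{k S_{B22}\,(k+\lambda_{2})\left(1 + \sqrt{1 + c\rho_{B}^{2}}\right)}{2n}.$$

(iv) So that equation (5) is defined, assume $c \geq -1/\rho_{B}^{2}$.

Table. Optimal Percentage Matched and Percentage Gain in Precision (a dash marks combinations of $\rho_{B}$ and c excluded by remark (iv))

Optimal Percentage Matched

ρ_B     c=-.5   c=-1    c=-1.1  c=-1.23 c=-1.5  c=-2.75 c=-4
.5      48      46      46      45      44      36      0
.6      48      44      44      43      40      9       -
.7      46      42      40      39      34      -       -
.8      45      38      35      32      17      -       -
.9      44      30      25      6       -       -       -
.95     43      24      8       -       -       -       -
1.00    41      0       -       -       -       -       -

Percentage Gain in Precision

ρ_B     c=-.5   c=-1    c=-1.1  c=-1.23 c=-1.5  c=-2.75 c=-4
.5      3       7       8       9       12      28      100
.6      5       11      13      15      19      82      -
.7      7       17      19      23      32      -       -
.8      10      25      30      37      67      -       -
.9      13      39      50      89      -       -       -
.95     15      52      84      -       -       -       -
1.00    17      100     -       -       -       -       -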
The above table compares several values of $\rho_{B}$ and c. The percentage gain in precision is determined by comparing the optimal variance from equation (4) with the variance under no matching (u = n). When c = -1, the optimal percentage matched and the percentage gain in precision agree with Cochran (1977). For fixed $\rho_{B}$, the optimal percentage matched is an increasing function of c and the percentage gain in precision is a decreasing function of c. For fixed c, the optimal percentage matched is a decreasing function of $\rho_{B}$ and the percentage gain in precision is an increasing function of $\rho_{B}$.
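The table entries can be recomputed from equation (5) and remark (iii). The Python sketch below is an illustration rather than the authors' code, and its function names are hypothetical. It assumes that the percentage gain is defined as 100 (V_no matching / V_opt - 1), which reproduces the tabulated values that were checked.

```python
import math


def variance_ratio(mu, c, rho_b):
    """Equation (4) divided by its value at u = n, written with mu = u/n."""
    a = c * rho_b ** 2
    return (1 + a * mu) / (1 + a * mu ** 2)


def optimal_matched_fraction(c, rho_b):
    """Equation (5): the optimal fraction matched, m/n."""
    radicand = 1 + c * rho_b ** 2
    if radicand < 0:
        raise ValueError("equation (5) needs c >= -1 / rho_b**2")
    root = math.sqrt(radicand)
    return root / (1 + root)


def percent_gain(c, rho_b):
    """Percentage gain in precision of the optimum over no matching.

    Uses remark (iii): V_opt / V(no matching) = (1 + root) / 2.
    Assumed definition of gain: 100 * (V(no matching) / V_opt - 1).
    """
    radicand = 1 + c * rho_b ** 2
    if radicand < 0:
        raise ValueError("remark (iii) needs c >= -1 / rho_b**2")
    root = math.sqrt(radicand)
    return 100 * (2 / (1 + root) - 1)
```

For c = -1 and a correlation of 0.5, `optimal_matched_fraction` gives about 0.46 and `percent_gain` gives about 7, matching the corresponding table cells.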
We now consider another estimator of $\bar{Y}_{2}$. The mean of the unmatched portion on the second occasion, $\bar{y}_{2u}$, is an unbiased estimator of $\bar{Y}_{2}$. Using simple random sampling on both occasions, Des Raj (1968) considered the difference estimator

$$\bar{y}''_{2m} = \bar{y}_{2m} - \bar{y}_{1m} + \bar{y}_{1}. \tag{6}$$

Similarly, let

$$\bar{y}_{2d} = \phi\,\bar{y}_{2u} + (1-\phi)\,\bar{y}''_{2m}, \tag{7}$$

where

$$\phi = \frac{\left(V(\bar{y}_{2u})\right)^{-1}}{\left(V(\bar{y}_{2u})\right)^{-1} + \left(V(\bar{y}''_{2m})\right)^{-1}}.$$

Then

$$V(\bar{y}_{2d}) = \left[\left(V(\bar{y}_{2u})\right)^{-1} + \left(V(\bar{y}''_{2m})\right)^{-1}\right]^{-1}. \tag{8}$$

Theorem 2. Suppose n p.s.u.'s and k s.s.u.'s are selected by simple random sampling and $S_{B11} = S_{B22} = S_{B}^{2}$, $S_{W11} = S_{W22} = S_{W}^{2}$, $S_{B12} = S_{B}^{2}\rho_{B}$, and $S_{W12} = S_{W}^{2}\rho_{W}$. Let $\mu = u/n$. Then under two-stage sampling the variance of $\bar{y}_{2d}$ (ignoring the fpc) is given by

$$V(\bar{y}_{2d}) = \frac{k S_{B}^{2}(1+c)}{n}\cdot\frac{1 + d\mu}{1 + d\mu^{2}}. \tag{9}$$

Further, the variance is optimal when

$$\mu = \frac{\sqrt{1+d} - 1}{d}, \tag{10}$$

where

$$d = 1 - \frac{2(\rho_{B} + c\rho_{W})}{1+c} \quad \text{and} \quad c = \frac{S_{W}^{2}}{k S_{B}^{2}}.$$

Remarks
(i) If K = k = 1, the variance reduces to

$$V(\bar{y}_{2d}) = \frac{S_{B}^{2}}{n}\cdot\frac{1 + (1 - 2\rho_{B})\mu}{1 + (1 - 2\rho_{B})\mu^{2}}$$

and the variance is optimal when

$$\mu = \frac{1}{1 + \sqrt{2(1 - \rho_{B})}}.$$

(ii) If K = k = 1 and $\mu = 1$, the variance reduces to

$$V(\bar{y}_{2d}) = \frac{S_{B}^{2}}{n}.$$
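Equations (9) and (10) can be explored numerically with the sketch below. It is an illustration with hypothetical function names, not the authors' code. It evaluates equation (10) in the algebraically equivalent form 1 / (1 + sqrt(1 + d)), which avoids a 0/0 at d = 0. Since the factor in equation (9) minus 1 equals d mu (1 - mu) / (1 + d mu^2), the stationary point in (10) is a minimum of the variance when d < 0.

```python
import math


def d_coefficient(rho_b, rho_w, c):
    """d = 1 - 2 (rho_B + c rho_W) / (1 + c), with c = S_W^2 / (k S_B^2)."""
    return 1 - 2 * (rho_b + c * rho_w) / (1 + c)


def variance_factor(mu, d):
    """Equation (9) divided by k S_B^2 (1 + c) / n, its value at mu = 1."""
    return (1 + d * mu) / (1 + d * mu ** 2)


def optimal_mu(d):
    """Equation (10), evaluated as 1 / (1 + sqrt(1 + d))."""
    if d < -1:
        raise ValueError("equation (10) needs d >= -1")
    return 1 / (1 + math.sqrt(1 + d))
```

With c = 0 (the K = k = 1 case) and a between-p.s.u. correlation of 0.8, `d_coefficient` returns -0.6 and `optimal_mu` agrees with the closed form in remark (i).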
REFERENCES

Cochran, W. G. (1977). Sampling Techniques. John Wiley and Sons. New York.

Raj, D. (1968). Sampling Theory. McGraw-Hill Book Company. New York.

Meza, J. and Lahiri, P. (1999). Current Estimates for Sampling on Two Occasions Using Two-Stage Sampling, Technical Report. University of Nebraska-Lincoln.