RANDOM SET SAMPLING WITH NON RESPONSES
Carlos N. BOUZA¹

ABSTRACT: The problem of estimating the population mean under non-response is studied when Ranked Set Sampling [rss] is the sampling design used for selecting the sample among the non-respondents [nr]. Two rss strategies are proposed in this paper: 1. Selection of a sub sample from the nr in each cycle. 2. Selection of sub samples from the nr in each rank. The corresponding variances and their expectations are derived. We obtain that the resulting error is smaller than that of the classical simple random sampling alternative. The behavior of the proposed estimators is illustrated using some experiments.

KEYWORDS: Non-respondent strata, sub sampling, efficiency.
1 Introduction
Ranked set sampling [rss] is a procedure which was first proposed by McIntyre (1952) when modeling the estimation of pasture yields. The ranked set sampling procedure involves drawing $m$ independent sets of $m$ units each by simple random sampling with replacement [srswr]. The unit with rank $j$ in the $j$-th set is measured and the value of $Y$ in it, $y_{j(j)}$, is obtained. Thus $m$ units are quantified out of the $m^2$ units in the $m$ evaluated sets. This procedure is repeated $r$ times [cycles] and $n = mr$ measurements are made. We denote by $s_i$ the sample corresponding to cycle $i$. The procedure reduces costs and increases the efficiency when it is easy to rank the sampled units. Li, Sinha and Perron (1999) discuss different features of this model.

The classical non-response problem [nr], see Cochran (1977) and Ardilly (1994), is related to the existence of missing observations in the random sample $s$. We study the problem of estimating the population mean when the information on some units is not available and rss is used. As nr are observed, the population $U$ is divided into two strata

$$U_1 = \{u \in U \mid u \text{ responds at the first attempt}\}, \qquad U_2 = \{u \in U \mid u \text{ does not respond at the first attempt}\},$$

with $U = U_1 + U_2$, and the sample is $s_i = s_{i1} + s_{i2}$, where $s_{it} \subset U_t$, $t = 1, 2$. The size of $s_{it}$, $m_{it} = |s_{it}|$, is a random variable. Let $N_t$ represent the number of units within $U_t$; hence $|U| = N_1 + N_2 = N$.

¹ Facultad de Matemática y Computación, Universidad de La Habana, San Lázaro y L, Habana, CP 10 400, Cuba. email: [email protected].

Rev. Mat. Est., São Paulo, 19: 297-308, 2001
We will use the following notation and results:

$$E[Y_u \mid u \in U_t] = \mu_t, \qquad V[Y_u \mid u \in U_t] = \sigma_t^2, \qquad t = 1, 2,$$

$$E[Y_{(j)}] = \mu_{(j)}, \qquad V[Y_{(j)}] = \sigma^2_{(j)}, \qquad j = 1, 2, \dots, m,$$

$$E[Y_{(j)} \mid U_2] = \mu_{2(j)}, \qquad V[Y_{(j)} \mid U_2] = \sigma^2_{2(j)}, \qquad j = 1, 2, \dots, m,$$

$$\Delta_{(j)} = \mu_{(j)} - \mu, \qquad \Delta_{2(j)} = \mu_{2(j)} - \mu_2, \qquad j = 1, 2, \dots, m,$$

$$E[Y] = \mu = W_1 \mu_1 + W_2 \mu_2 \qquad \text{and} \qquad V[Y] = \sigma^2.$$

$Y_{(j)}$ denotes the order statistic of order $j$ and $W_t = N_t / N$.

The surveyor may select the sub sample from the nr by using one of the following strategies:

1. Select a sub sample $s'_{i2}$ of size $m'_{i2}$ from each $s_{i2}$, $i = 1, \dots, r$.
2. Select a sub sample $s'_{2(j)}$ of size $r'_{2(j)}$ from the nr with rank $j$, $j = 1, \dots, m$.

In both cases he/she uses srswr for selecting the sub samples. We will obtain estimators of $\mu$ for the model associated with each strategy. Section 2 is devoted to the development of unbiased estimators of $\mu$ and their errors. In Section 3 their accuracy is analyzed through simulation experiments, in which the behavior of the proposed estimators is evaluated. The result is that Strategy 1 is the best alternative.
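As an illustration of the rss procedure described in this section, the following Python sketch draws $r$ cycles of $m$ ranked sets from a finite population. It assumes perfect ranking on the variable of interest itself and ignores non-response; the names `rss_cycle` and `rss_sample` and the toy population are hypothetical and serve only as an example.

```python
import random


def rss_cycle(population, m, rank_key, rng):
    """One rss cycle: draw m sets of m units by srswr, rank each set,
    and keep the unit with rank j from the j-th set."""
    selected = []
    for j in range(m):
        ranked_set = sorted(rng.choices(population, k=m), key=rank_key)
        selected.append(ranked_set[j])  # rank j+1 in 1-based notation
    return selected


def rss_sample(population, m, r, rank_key=lambda y: y, seed=0):
    """Repeat the cycle r times, giving n = m * r measured units."""
    rng = random.Random(seed)
    return [rss_cycle(population, m, rank_key, rng) for _ in range(r)]


# Toy population (assumption): the values 1, 2, ..., 500.
population = [float(v) for v in range(1, 501)]
cycles = rss_sample(population, m=3, r=4)
n = sum(len(cycle) for cycle in cycles)
rss_mean = sum(sum(cycle) for cycle in cycles) / n
```

In practice the ranking would use a cheap auxiliary judgement rather than $Y$ itself, which is why `rank_key` is left as a parameter.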
2 Sub sampling the non respondent stratum with rss
The rss method produces $r$ independent samples $s_i$. If $m_{i2} = |s_{i2}| \neq 0$, a sub sample $s'_{i2}$ should be selected from $s_{i2}$. The nr may be a consequence of a different behavior of $Y$ in the units that belong to $U_2$. Therefore $\mu_2$, $\mu_{2(j)}$, $\sigma_2^2$ and $\sigma^2_{2(j)}$ may be very different from $\mu$, $\mu_{(j)}$, $\sigma^2$ and $\sigma^2_{(j)}$. The surveyor fixes $m'_{i2}$ and selects the sub sample from $s_{i2}$ using srswr for obtaining information from $U_2$. Take $y'_{iu}$ as the value of $Y$ in the unit $u$ of $s'_{i2}$. The sub sample mean is given by:

$$\bar{y}'_{i2} = \frac{1}{m'_{i2}} \sum_{u=1}^{m'_{i2}} y'_{iu}.$$

The mean of $s_{i1}$ is denoted by

$$\bar{y}_{rss,i1} = \frac{1}{m_{i1}} \sum_{j=1}^{m} \tilde{y}_{ij},$$

defining

$$\tilde{y}_{ij} = \begin{cases} y_{j(j)} & \text{if the unit with rank } j \text{ in the } j\text{-th ranked set responds,} \\ 0 & \text{otherwise.} \end{cases}$$

From Cochran [1977] it is derived that an estimator of $\mu$ is the sample weighted mean

$$\bar{\bar{y}}_i = w_{i1}\, \bar{y}_{rss,i1} + w_{i2}\, \bar{y}'_{i2},$$

where $w_{it} = m_{it}/m$, $t = 1, 2$, and $\bar{y}_{rss,i1}$ is the rss estimator of the population mean of $U_1$. Then it is an unbiased estimator of $\mu_1$. As $s'_{i2} \subset U_2$ we have that $\bar{y}'_{i2}$ is also unbiased for $\mu_2$. The rss estimator of it is

$$\bar{y}_{rss,i2} = \frac{1}{m_{i2}} \sum_{j=1}^{m} \left( y_{j(j)} - \tilde{y}_{ij} \right).$$

Therefore we can use

$$\bar{\bar{y}}_i = w_{i1}\, \bar{y}_{rss,i1} + w_{i2}\, \bar{y}_{rss,i2} + w_{i2} \left( \bar{y}'_{i2} - \bar{y}_{rss,i2} \right).$$
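The weighted mean of Strategy 1 can be sketched for a single cycle as follows. This is a minimal illustration, not the author's implementation: the function name `strategy1_cycle_estimate` is hypothetical, the sub sample size is taken as $m_{i2}/K$ rounded and forced to be at least 1 (an assumption), and the values of the non-respondents are passed in only so that the sub sample can look them up.

```python
import random


def strategy1_cycle_estimate(values, responds, K, rng):
    """Weighted mean w_i1 * (respondent mean) + w_i2 * (nr sub sample mean)
    for the m rank-selected units of one cycle."""
    m = len(values)
    resp = [y for y, ok in zip(values, responds) if ok]
    nonresp = [y for y, ok in zip(values, responds) if not ok]
    m1, m2 = len(resp), len(nonresp)
    if m2 == 0:
        return sum(resp) / m  # complete response: plain rss mean
    m2_sub = max(1, round(m2 / K))  # assumed rounding of m_i2 / K
    sub = rng.choices(nonresp, k=m2_sub)  # srswr among the nr
    mean_sub = sum(sub) / m2_sub
    mean_resp = sum(resp) / m1 if m1 else 0.0
    return (m1 / m) * mean_resp + (m2 / m) * mean_sub


rng = random.Random(1)
# Both non-respondents share the value 10, so the result is deterministic.
estimate = strategy1_cycle_estimate(
    [1.0, 2.0, 3.0, 10.0, 10.0], [True, True, True, False, False], K=2, rng=rng
)
full_response = strategy1_cycle_estimate([1.0, 2.0, 3.0], [True] * 3, K=2, rng=rng)
```

Here `estimate` combines the respondent mean 2 with weight 3/5 and the sub sample mean 10 with weight 2/5.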
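Note that the first term, $\bar{y}_{rss,i} = w_{i1}\, \bar{y}_{rss,i1} + w_{i2}\, \bar{y}_{rss,i2}$, is the rss estimator of $\mu$ under the complete response model. As the conditional expectation of the second term is zero, $\bar{\bar{y}}_i$ is unbiased for $\mu$ and its variance is:

$$V\left[\bar{\bar{y}}_i\right] = V\left[\bar{y}_{rss,i}\right] + w_{i2}^2\, E\left[\left(\bar{y}'_{i2} - \bar{y}_{rss,i2}\right)^2\right] + 2 w_{i2}\, \mathrm{Cov}\left[\bar{y}_{rss,i},\; \bar{y}'_{i2} - \bar{y}_{rss,i2}\right].$$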
It is well known, see Kaur, Patil and Taillie (1997) for example, that

$$V\left[\bar{y}_{rss,i}\right] = \frac{1}{m^2} \sum_{j=1}^{m} \sigma^2_{(j)}.$$
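Taking

$$\bar{y}'_{i2} - \mu_2 = \left(\bar{y}'_{i2} - \bar{y}_{rss,i2}\right) + \left(\bar{y}_{rss,i2} - \mu_2\right),$$

the expectation of the second term of $V\left[\bar{\bar{y}}_i\right]$ is

$$E\left[\left(\bar{y}'_{i2} - \bar{y}_{rss,i2}\right)^2\right] = \frac{\sigma_2^2}{m'_{i2}} - V\left[\bar{y}_{rss,i2}\right],$$

because there are no contributions from the cross term. Then

$$V\left[\bar{\bar{y}}_i\right] = \frac{1}{m^2} \sum_{j=1}^{m} \sigma^2_{(j)} + w_{i2}^2 \left[ \frac{\sigma_2^2}{m'_{i2}} - \frac{1}{m_{i2}^2} \sum_{j=1}^{m_{i2}} \sigma^2_{2(j)} \right] \qquad (1)$$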
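For calculating $EV\left[\bar{\bar{y}}_i\right]$ we need an explicit expression of $m'_{i2}$. If the surveyor uses Hansen-Hurwitz's rule, $m'_{i2} = m_{i2}/K_i$, $K_i > 1$. In this case, since $w_{i2} = m_{i2}/m$,

$$V\left[\bar{\bar{y}}_i\right] = \frac{1}{m^2} \sum_{j=1}^{m} \sigma^2_{(j)} + \frac{w_{i2} K_i\, \sigma_2^2}{m} - \frac{1}{m^2} \sum_{j=1}^{m_{i2}} \sigma^2_{2(j)} \qquad (2)$$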
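Using the results of Dell-Clutter (1972) it is possible to express the sums of the involved variances of the order statistics as

$$\frac{1}{m^2} \sum_{j=1}^{m} \sigma^2_{(j)} = \frac{\sigma^2}{m} - \frac{1}{m^2} \sum_{j=1}^{m} \Delta^2_{(j)}$$

and

$$\frac{1}{m_{i2}^2} \sum_{j=1}^{m_{i2}} \sigma^2_{2(j)} = \frac{\sigma_2^2}{m_{i2}} - \frac{1}{m_{i2}^2} \sum_{j=1}^{m_{i2}} \Delta^2_{2(j)}.$$

Then we can rewrite $V\left[\bar{\bar{y}}_i\right]$ and its expectation is easily obtained. It is

$$EV\left[\bar{\bar{y}}_i\right] = \frac{\sigma^2}{m} + \frac{W_2 [K_i - 1]\, \sigma_2^2}{m} - \frac{1}{m^2} \left[ \sum_{j=1}^{m} \Delta^2_{(j)} - E\left[ \sum_{j=1}^{m_{i2}} \Delta^2_{2(j)} \right] \right] \qquad (3)$$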
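The Dell-Clutter decomposition of the order statistic variances can be checked exactly in a case where the moments are known in closed form. The sketch below assumes a Uniform(0, 1) variable, which is not a distribution used in this paper: its $j$-th order statistic in a set of size $m$ has mean $j/(m+1)$ and variance $j(m-j+1)/[(m+1)^2(m+2)]$, with $\mu = 1/2$ and $\sigma^2 = 1/12$. The function names are hypothetical.

```python
from fractions import Fraction


def uniform_order_stat_moments(m):
    """Exact means and variances of the order statistics of m Uniform(0, 1) draws."""
    means = [Fraction(j, m + 1) for j in range(1, m + 1)]
    variances = [
        Fraction(j * (m - j + 1), (m + 1) ** 2 * (m + 2)) for j in range(1, m + 1)
    ]
    return means, variances


def dell_clutter_sides(m):
    """Both sides of (1/m^2) sum sigma_(j)^2 = sigma^2/m - (1/m^2) sum Delta_(j)^2."""
    mu, sigma2 = Fraction(1, 2), Fraction(1, 12)
    means, variances = uniform_order_stat_moments(m)
    lhs = sum(variances) / m**2
    rhs = sigma2 / m - sum((mean_j - mu) ** 2 for mean_j in means) / m**2
    return lhs, rhs


lhs3, rhs3 = dell_clutter_sides(3)
```

Exact rational arithmetic is used so that the two sides can be compared with `==` rather than with a tolerance.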
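The rss procedure establishes that we must replicate $r$ times for obtaining a sample of size $n = mr$ and average the $r$ sample means. In our case the corresponding mean of the $r$ samples $s_1, \dots, s_r$ is

$$\bar{\bar{y}}_{rss} = \frac{1}{r} \sum_{i=1}^{r} \bar{\bar{y}}_i.$$

It is unbiased for $\mu$ and its error is given by

$$EV\left[\bar{\bar{y}}_{rss}\right] = \frac{\sigma^2}{n} + \frac{W_2 [\bar{K} - 1]\, \sigma_2^2}{n} - \frac{1}{n} \left[ \Delta^2 - \frac{1}{n} \sum_{i=1}^{r} E\left[\Delta^2_{2i}\right] \right],$$

where

$$\Delta^2_{2i} = \sum_{j=1}^{m_{i2}} \Delta^2_{2(j)}, \qquad \Delta^2 = \frac{1}{m} \sum_{j=1}^{m} \Delta^2_{(j)} \qquad \text{and} \qquad \bar{K} = \frac{1}{r} \sum_{i=1}^{r} K_i.$$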
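The expected variance of the Strategy 1 estimator can be evaluated numerically once its ingredients are supplied. The sketch below is only an illustration under the assumption that the error has the form "srs error under nr minus a gain term divided by $n$", with the gain equal to $\Delta^2$ minus the sum over cycles of $E[\Delta^2_{2i}]$ divided by $n$; the function name `ev_strategy1` and the numerical inputs are hypothetical.

```python
def ev_strategy1(sigma2, sigma2_nr, W2, K, m, delta2, exp_delta2_nr):
    """Expected variance of the Strategy 1 mean over r = len(K) cycles.

    sigma2, sigma2_nr : variances of Y in U and in U2
    W2                : weight of the nr stratum
    K                 : list of the K_i used in each cycle
    delta2            : (1/m) * sum of squared Delta_(j)
    exp_delta2_nr     : list of E[Delta^2_{2i}], one value per cycle
    """
    r = len(K)
    n = m * r
    K_bar = sum(K) / r
    srs_part = sigma2 / n + W2 * (K_bar - 1.0) * sigma2_nr / n
    gain = delta2 - sum(exp_delta2_nr) / n
    return srs_part - gain / n, gain


# Illustrative inputs (assumptions, not values from the paper).
ev, gain = ev_strategy1(1.0, 1.0, 0.2, [2.0, 2.0], 5, 0.3, [0.5, 0.5])
```

A positive `gain` means the returned expected variance is below the srs part, which is the situation discussed in the text.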
Note that if $K_i = K$ the third term measures the gain in accuracy of the proposed model with respect to the use of srs in the presence of nr. The proposed model is more accurate than the srswr one if it is positive. Otherwise the relation between $K$ and $\bar{K}$ should be included in the analysis of the behavior of the expected variance. Let us assume that the surveyor has information for fixing $\Delta^2_{0i} = \max_j\left[\Delta^2_{2(j)}\right]$ for each $i = 1, \dots, r$. Then:

$$\Delta^2_{0i}\, m_{i2} \geq \sum_{j=1}^{m_{i2}} \Delta^2_{2(j)}$$

and we have that $m W_2 \Delta^2_{0i}$ is an upper bound of $E\left[\sum_{j=1}^{m_{i2}} \Delta^2_{2(j)}\right]$, and if

$$\frac{1}{n} \sum_{i=1}^{r} E\left[\Delta^2_{2i}\right] \leq \frac{W_2}{r} \sum_{i=1}^{r} \Delta^2_{0i} < \Delta^2$$

holds, a gain in accuracy is expected. Generally $W_2$ is sufficiently small to guarantee that this condition holds.

Note that the $r$ observed units with rank $j$ constitute an independent sample $s_{(j)}$ from the density $f_{(j)}$. In the alternative strategy a sub sample $s'_{2(j)}$ of $r'_{2(j)}$ units should be selected among the nr in $s_{(j)}$. The sample means

$$\bar{y}'_{2(j)} = \frac{1}{r'_{2(j)}} \sum_{p=1}^{r'_{2(j)}} y'_{p(j)} \qquad \text{and} \qquad \bar{y}_{1(j)} = \frac{1}{r_{1(j)}} \sum_{p=1}^{r_{1(j)}} y_{p(j)}$$

can be computed for $s'_{2(j)}$ and $s_{1(j)}$. We have that an unbiased estimator of $\mu_{(j)}$ is

$$\bar{\bar{y}}_{(j)} = q_{1(j)}\, \bar{y}_{1(j)} + q_{2(j)}\, \bar{y}'_{2(j)},$$

defining $q_{t(j)} = r_{t(j)}/r$, $t = 1, 2$. Mimicking the procedure used with Strategy 1,

$$\bar{\bar{y}}_{(j)} = q_{1(j)}\, \bar{y}_{1(j)} + q_{2(j)}\, \bar{y}_{2(j)} + q_{2(j)} \left( \bar{y}'_{2(j)} - \bar{y}_{2(j)} \right),$$
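where $\bar{y}_{2(j)}$ is the mean of the nr with rank $j$. As $E\left[\bar{y}'_{2(j)} \mid s_{(j)}\right] = \bar{y}_{2(j)}$, we have $E\left[\bar{\bar{y}}_{(j)}\right] = E\left[\bar{y}_{(j)}\right] = \mu_{(j)}$, where $\bar{y}_{(j)} = q_{1(j)}\, \bar{y}_{1(j)} + q_{2(j)}\, \bar{y}_{2(j)}$ is the mean of the $r$ units with rank $j$. Then $\bar{\bar{y}}_{(j)}$ is also unbiased. Its conditional variance is

$$V\left[\bar{\bar{y}}_{(j)}\right] = V\left[\bar{y}_{(j)}\right] + q^2_{2(j)}\, E\left[\left(\bar{y}'_{2(j)} - \bar{y}_{2(j)}\right)^2\right],$$

because the cross term has null expected value. The first term is $V\left[\bar{y}_{(j)}\right] = \frac{1}{r} \sigma^2_{(j)}$. If Hansen-Hurwitz's rule is used we have that $r'_{2(j)} = r_{2(j)}/K_{(j)}$ and

$$E\left[\left(\bar{y}'_{2(j)} - \bar{y}_{2(j)}\right)^2\right] = \frac{[K_{(j)} - 1]\, \sigma^2_{2(j)}}{r_{2(j)}}.$$

Therefore:

$$V\left[\bar{\bar{y}}_{(j)}\right] = \frac{1}{r} \sigma^2_{(j)} + \frac{q_{2(j)} [K_{(j)} - 1]\, \sigma^2_{2(j)}}{r}.$$

As an estimator of $\mu$ we use the mean of the $m$ estimators of the rank population means

$$\bar{\bar{y}}_{(rss)} = \frac{1}{m} \sum_{j=1}^{m} \bar{\bar{y}}_{(j)}.$$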
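Strategy 2 can be sketched in the same spirit: the data are grouped by rank, the nr of each rank are sub sampled, and the $m$ rank estimates are averaged. This is a minimal illustration with hypothetical names (`rank_estimate`, `strategy2_estimate`); rounding $r_{2(j)}/K$ and forcing at least one sub sampled unit are assumptions.

```python
import random


def rank_estimate(values, responds, K, rng):
    """q_1(j) * (respondent mean) + q_2(j) * (nr sub sample mean) for one rank."""
    r = len(values)
    resp = [y for y, ok in zip(values, responds) if ok]
    nonresp = [y for y, ok in zip(values, responds) if not ok]
    r2 = len(nonresp)
    total = sum(resp)  # equals r * q_1(j) * respondent mean
    if r2:
        r2_sub = max(1, round(r2 / K))  # assumed rounding of r_2(j) / K
        sub = rng.choices(nonresp, k=r2_sub)  # srswr among the nr of rank j
        total += r2 * (sum(sub) / r2_sub)  # equals r * q_2(j) * sub sample mean
    return total / r


def strategy2_estimate(by_rank, K, seed=0):
    """Average of the m rank estimates; by_rank holds (values, responds) pairs."""
    rng = random.Random(seed)
    estimates = [rank_estimate(v, ok, K, rng) for v, ok in by_rank]
    return sum(estimates) / len(estimates)


# Toy data (assumption): m = 2 ranks observed over r = 4 cycles.
by_rank = [
    ([1.0, 2.0, 7.0, 7.0], [True, True, False, False]),
    ([4.0, 5.0, 6.0, 9.0], [True, True, True, False]),
]
estimate = strategy2_estimate(by_rank, K=2)
```

The toy data are chosen so that the sub sample means do not depend on the random draw.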
It is unbiased and its variance is given by:

$$V\left[\bar{\bar{y}}_{(rss)}\right] = \frac{1}{m^2 r} \left[ \sum_{j=1}^{m} \sigma^2_{(j)} + \sum_{j=1}^{m} q_{2(j)} [K_{(j)} - 1]\, \sigma^2_{2(j)} \right].$$

Denote by $P_{2(j)}$ the probability that a sample unit with rank $j$ does not respond. Hence:

$$EV\left[\bar{\bar{y}}_{(rss)}\right] = \frac{\sigma^2}{n} - \frac{1}{nm} \sum_{j=1}^{m} \Delta^2_{(j)} + \frac{1}{nm} \sum_{j=1}^{m} P_{2(j)} [K_{(j)} - 1]\, \sigma^2_{2(j)}$$
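and adding and subtracting $\frac{1}{nm} \sum_{j=1}^{m} \sigma^2_{2(j)}$ in the last term we have that

$$\frac{1}{nm} \sum_{j=1}^{m} P_{2(j)} [K_{(j)} - 1]\, \sigma^2_{2(j)} = \frac{1}{nm} \sum_{j=1}^{m} \left[ P_{2(j)} [K_{(j)} - 1] - 1 \right] \sigma^2_{2(j)} + \frac{\sigma_2^2}{n} - \frac{1}{nm} \sum_{j=1}^{m} \Delta^2_{2(j)}.$$

Then the expected variance is given by:

$$EV\left[\bar{\bar{y}}_{(rss)}\right] = \frac{\sigma^2 + \sigma_2^2}{n} + \frac{1}{nm} \sum_{j=1}^{m} \left[ P_{2(j)} [K_{(j)} - 1] - 1 \right] \sigma^2_{2(j)} - \frac{1}{nm} \sum_{j=1}^{m} \left[ \Delta^2_{(j)} + \Delta^2_{2(j)} \right].$$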
An analysis of the gain in accuracy of this estimator depends on the values of the $P_{2(j)}$'s, which are generally unknown. Therefore we cannot conclude whether it is a better alternative than the other estimators.
3 Evaluating the efficiency
The classical srswr estimator of $\mu$ under nr is

$$\bar{\bar{y}}_{srs} = \frac{n_1\, \bar{y}_1 + n_2\, \bar{y}'_2}{n},$$

where $n_t = |s_t|$, $s_t \subset U_t$, $t = 1, 2$,

$$\bar{y}_1 = \frac{1}{n_1} \sum_{i=1}^{n_1} y_i$$

is the sample mean of the respondents and

$$\bar{y}'_2 = \frac{1}{n'_2} \sum_{i=1}^{n'_2} y'_i$$

is the mean corresponding to the sub sample among the nr, with $n'_2 = n_2/K$. The error of $\bar{\bar{y}}_{srs}$, see Cochran [1977], is

$$EV\left[\bar{\bar{y}}_{srs}\right] = \frac{\sigma^2}{n} + \frac{W_2 [K - 1]\, \sigma_2^2}{n}.$$

We made a comparison of the behavior of $\bar{\bar{y}}_{srs}$ with the rss estimators $\bar{\bar{y}}_{(rss)}$ and $\bar{\bar{y}}_{rss}$.
A set of experiments was developed using a normal $N[\mu_1, \sigma_1^2]$ distribution for generating $m_1$ values of $Y_u$, $u \in U_1$, and another $N[\mu_2, \sigma_2^2]$ for the $m_2$ nr. Each $m_t$ was generated by a Binomial with parameters $m$ and $W_t$. A similar procedure was used for generating $n_1$ and $n_2$. The efficiency of the estimators is denoted by:

$$\varepsilon[a, b] = \frac{EV\left[\bar{\bar{y}}_a\right]}{EV\left[\bar{\bar{y}}_b\right]}, \qquad a \neq b.$$

We computed $\varepsilon$, where the sub indexes denote rss, (rss) and srs. The number of cycles was fixed as $r = 10$, $K = \bar{K} = K_{(j)}$, and 100 populations of size 500 were generated. The results are given in Table 1. The analysis of the table establishes that $\bar{\bar{y}}_{rss}$ performs better than $\bar{\bar{y}}_{(rss)}$ and $\bar{\bar{y}}_{srs}$. But $\bar{\bar{y}}_{(rss)}$ is worse than $\bar{\bar{y}}_{srs}$. Note that $\bar{\bar{y}}_{rss}$ produces larger gains in accuracy when the parameters $W_1$ and $\sigma_2$ are large.

Table 1. Efficiency of the estimators under normal distributions, r = 10.

Distribution                     m     ε[rss,(rss)]     ε[rss,srs]      ε[(rss),srs]
                                       K=2     K=4      K=2     K=4     K=2     K=4
N[1,1], N[3,1], W1 = 0.8         10    0.81    0.67     0.92    0.88    1.12    1.22
                                 20    0.74    0.64     0.87    0.83    1.18    1.28
                                 30    0.60    0.57     0.81    0.76    1.22    1.31
N[1,1], N[3,1], W1 = 0.2         10    0.92    0.84     0.98    0.89    1.34    1.36
                                 20    0.87    0.77     0.93    0.85    1.38    1.40
                                 30    0.83    0.69     0.89    0.79    1.46    1.45
N[1,1], N[3,1], W1 = 0.8         10    0.71    0.66     0.86    0.81    1.43    1.49
                                 20    0.65    0.53     0.70    0.75    1.49    1.53
                                 30    0.59    0.48     0.67    0.69    1.52    1.55
N[1,1], N[3,3], W1 = 0.2         10    0.88    0.76     0.90    0.84    1.37    1.46
                                 20    0.76    0.63     0.85    0.77    1.45    1.50
                                 30    0.73    0.59     0.81    0.69    1.48    1.54
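The srs benchmark and the efficiency ratio used in the comparison are simple enough to compute directly. The following sketch evaluates the srs error under nr, $\sigma^2/n + W_2[K-1]\sigma_2^2/n$, and the ratio of two expected variances; the function names and the numerical inputs are hypothetical and are not the values behind Table 1.

```python
def ev_srs(sigma2, sigma2_nr, W2, K, n):
    """Expected variance of the srs estimator under nr with sub sampling rate 1/K."""
    return sigma2 / n + W2 * (K - 1.0) * sigma2_nr / n


def efficiency(ev_a, ev_b):
    """Ratio EV[a] / EV[b]; values below 1 favour estimator a."""
    return ev_a / ev_b


# Illustrative inputs (assumptions): unit variances, W2 = 0.2, K = 2, n = 100.
benchmark = ev_srs(sigma2=1.0, sigma2_nr=1.0, W2=0.2, K=2, n=100)
ratio = efficiency(0.009, benchmark)
```

With these inputs an estimator whose expected variance is 0.009 would have efficiency 0.75 relative to srs.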
In Table 2 we present a comparison among the estimators using real life data. They were obtained from a study of the abundance of Plankton in Veracruz, Mexico. The selection of the units was made by ranking the results of the analysis of the Hemoglobin. The variable of interest is the evaluation of the blood's quality expressed as a percent of the computed result with respect to a fixed ideal standard. One hundred samples were selected and

$$A\left[\bar{\bar{y}}_a\right] = \frac{1}{100} \sum_{s=1}^{100} \left| \bar{\bar{y}}_{a,s} - \mu \right|$$

was computed. The results coincide with those obtained in the simulations. They again suggest that $\bar{\bar{y}}_{rss}$ is the best alternative and $\bar{\bar{y}}_{(rss)}$ the worst.

Table 2. Accuracy of the estimators in a Hematic Biometry study, r = 10.

m      A[y_rss]         A[y_srs]         A[y_(rss)]
       K=2     K=4      K=2     K=4      K=2     K=4
10     0.23    0.26     0.33    0.38     0.44    0.56
20     0.11    0.19     0.25    0.31     0.35    0.46
30     0.10    0.08     0.10    0.20     0.28    0.39
Another set of data is given by the yields of rice in 1551 plots generated under experimental conditions. The results are given in Table 3. Again $\bar{\bar{y}}_{rss}$ is the best alternative but $\bar{\bar{y}}_{(rss)}$ has a behavior very similar to the srs estimator.

Table 3. Accuracy of the estimators in a study of rice yields, r = 10.

m      A[y_rss]         A[y_srs]         A[y_(rss)]
       K=2     K=4      K=2     K=4      K=2     K=4
10     1.61    0.82     2.83    3.45     3.06    3.51
20     1.38    0.73     2.33    2.87     2.52    2.96
30     0.76    0.69     1.81    1.12     1.74    1.25

The results suggest that the first strategy should be preferred.
BOUZA, C.N. Random set sampling with missing responses. Rev. Mat. Estat. (São Paulo), v. 19, p. 297-308, 2001.

SUMMARY: The problem of estimating population means in the absence of responses is studied when Ranked Set Sampling (RSS) is the sampling design used to select the sample among the missing responses (NR). Two RSS strategies are proposed in the article: 1) Selection of a sub sample of NR in each cycle. 2) Selection of a sub sample of NR in each step. The corresponding variances and their expected values are derived. We obtain that the resulting error is smaller than that of the classical simple random sampling alternative. The behavior of the proposed estimators is illustrated through some experiments.

KEYWORDS: Missing-response strata, sub sampling, efficiency.
References
ARDILLY, P. Les techniques de sondages. Paris: Technip, 1994. p.393.

COCHRAN, W.G. Sampling techniques. New York: Wiley, 1977. p.435.

DELL, T.R.; CLUTTER, J.L. Ranked set sampling theory with order statistics background. Biometrics, v. 28, p. 545-55, 1972.

KAUR, A.; PATIL, G.P.; TAILLIE, C. Unequal allocation models for ranked set sampling with skew distributions. Biometrics, v. 53, p. 123-30, 1997.

HANSEN, M.H.; HURWITZ, W.N. The problem of non response in sample survey. J. Amer. Stat. Assoc., v. 41, p. 517-29, 1946.

LI, D.; SINHA, B.K.; PERRON, F. Random selection in ranked set sampling and its applications. J. Stat. Plan. Infer., v. 76, p. 185-201, 1999.

McINTYRE, G.A. A method of unbiased selection sampling using ranked sets. Aust. J. Agric. Res., v. 3, p. 385-90, 1952.

PATIL, G.P.; SINHA, A.; TAILLIE, C. Ranked set sampling. In: PATIL, G.P.; RAO, C.R. (Ed.). Environmental statistics. Amsterdam: North-Holland, 1994. p. 167-98. (Handbook of Statistics, 12)

SAMAWI, H.M. Stratified set sampling. Pakistan J. Stat., v. 12, p. 9-16, 1996.
Received on 19.09.1999.