Journal of Reliability and Statistical Studies (ISSN: 0974-8024) Vol. 2, Issue 1(2009): 28 - 40
UTILIZATION OF NON-RESPONSE AUXILIARY POPULATION MEAN IN IMPUTATION FOR MISSING OBSERVATIONS

D. Shukla(1), Narendra Singh Thakur(2) and Dharmendra Singh Thakur(3)
Deptt. of Maths and Stats., Dr. Harisingh Gour University, Sagar (M.P.), India.
E-mail: 1. [email protected], 2. [email protected]
Abstract
This paper presents an imputation-based factor-type class of estimation strategies for estimating the population mean in the presence of missing values of the auxiliary variable. The non-sampled part of the population is used for imputation in the form of a proposed class of estimators. The bias and mean squared error of this class are obtained and some special cases are discussed. A specific range of the parameter is found where the proposed class is optimal. The efficiency of the proposed estimator is compared with that of a similar non-imputed estimator, and the proposal is found useful in the missing-observations setup.
Keywords: Imputation, Non-response, Post-stratification, Simple Random Sampling Without Replacement (SRSWOR), Respondents (R).
1. Introduction
To estimate the population mean using an auxiliary variable, many estimators are available in the literature, such as the ratio, product, regression and dual-to-ratio estimators. If some values of the auxiliary variable are missing, none of these estimators can be used. In sampling theory, the problem of estimating the population mean has been considered by many authors, e.g. Singh (1986), Singh and Singh (1991), Singh et al. (1994) and Singh and Singh (2001). Sometimes, in survey situations, a small part of the sample remains non-responded (or incomplete) for many practical reasons, and techniques and estimation procedures need to be developed for this purpose. Imputation is a well-defined methodology by virtue of which this kind of problem can be partially solved. Ahmed et al. (2006), Rao and Sitter (1995), Rubin (1976) and Singh and Horn (2000) have given applications of various imputation procedures, and Hinde and Chambers (1990) studied non-response imputation with multiple sources of non-response. Non-response in sample surveys has been looked into extensively by Hansen and Hurwitz (1946), Lessler and Kalsbeek (1992), Khot (1994), Grover and Couper (1998), etc. When the population is divided into two groups, namely "response" and "non-response", the procedure is known as post-stratification. The estimation problem in sample surveys, in the setup of post-stratification under non-response, has been studied by Shukla and Dubey (2001, 2004, 2006). Some other useful contributions to this area are due to Smith (1991), Agrawal and Panda (1993), Shukla and Trivedi (1999, 2001, 2006), Wywial (2001) and Shukla et al. (2002, 2006). When a sample has full response on the study variable but some auxiliary values are missing, it is hard to utilize the usual existing estimators; traditionally, those missing observations must first be estimated by some specific estimation technique.
One can think of utilizing the non-sampled part of the population in order to obtain estimates of the missing observations in the sample. These estimates can then be imputed into the actual procedures used for estimating the population mean. The content of this paper takes into account this aspect for the non-responding values of the sample, assuming a post-stratified setup and utilizing the auxiliary source of data.
1.1 Symbols and Setup
Let U = (U_1, U_2, ..., U_N) be a finite population of N units with Y as a study variable and X an auxiliary variable. The population contains two types of individuals: N_1 "respondents (R)" and N_2 "non-respondents (NR)", with N = N_1 + N_2 and population proportions W_1 = N_1/N and W_2 = N_2/N. Further, let \bar{Y} and \bar{X} be the population means of Y and X respectively. The following notations are used in this paper:

    R-group : respondents group (those who respond during the survey).
    \bar{Y}_1, \bar{X}_1 : population means of Y and X in the R-group.
    S^2_{1Y}, S^2_{1X} : population mean squares of Y and X in the R-group.
    C_{1Y}, C_{1X} : coefficients of variation of Y and X in the R-group.
    \rho : correlation coefficient between X and Y in the population.
    n_1 : post-stratified sample size coming from the R-group.
    \bar{y}_1, \bar{x}_1 : sample means of Y and X based on the n_1 observations of the R-group.
    \rho_1 : correlation coefficient between the study variable Y and the auxiliary variable X in the R-group.
    NR-group : non-respondents group (those who do not respond during the survey).
    \bar{Y}_2, \bar{X}_2 : population means of Y and X in the NR-group.
    S^2_{2Y}, S^2_{2X} : population mean squares of Y and X in the NR-group.
    C_{2Y}, C_{2X} : coefficients of variation of Y and X in the NR-group.
    n : sample size drawn from the population of size N by SRSWOR.
    n_2 : post-stratified sample size coming from the NR-group.
    \bar{y}_2, \bar{x}_2 : sample means of Y and X based on the n_2 observations of the NR-group.
    \rho_2 : correlation coefficient between the study variable Y and the auxiliary variable X in the NR-group.

Further, consider a few more symbolic representations:

    D_1 = E[1/n_1] = 1/(n W_1) + (N - n)(1 - W_1)/{(N - 1) n^2 W_1^2} ;
    D_2 = E[1/n_2] = 1/(n W_2) + (N - n)(1 - W_2)/{(N - 1) n^2 W_2^2}        (1.1)

    \bar{Y} = (N_1 \bar{Y}_1 + N_2 \bar{Y}_2)/N ;  \bar{X} = (N_1 \bar{X}_1 + N_2 \bar{X}_2)/N        (1.2)
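The approximations in (1.1) can be evaluated directly. The sketch below is illustrative (the function name and variable names are ours, not part of the paper); it computes D_1 and D_2 from N, n and the group proportions.

```python
# Hedged sketch: D(N, n, W) evaluates the approximation (1.1) for E[1/n_h],
# where W is the population share (W1 or W2) of the post-stratum.
def D(N: int, n: int, W: float) -> float:
    return 1.0 / (n * W) + (N - n) * (1.0 - W) / ((N - 1) * n ** 2 * W ** 2)

# Illustration with the Population I design of Section 8 (N = 180, n = 40, N1 = 100):
N, n, N1 = 180, 40, 100
W1, W2 = N1 / N, 1 - N1 / N
D1, D2 = D(N, n, W1), D(N, n, W2)
```

Note that D_h exceeds 1/(n W_h), reflecting the extra variability introduced by the random post-stratum size n_h.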
2. Assumptions
The following assumptions are made before formulating an imputation-based estimation procedure:

1. The values of N and n are known. Also, N_1 and N_2 are known from past data, past experience or a guess of the investigator (N_1 + N_2 = N).
2. Other population parameters are assumed known, either exactly or in ratio form, except \bar{Y}, \bar{Y}_1 and \bar{Y}_2.
3. The population means \bar{X} and \bar{X}_2 are known.
4. The sample of size n is drawn by SRSWOR and post-stratified into two groups of sizes n_1 and n_2 (n_1 + n_2 = n) according to the R and NR groups respectively.
5. The information about the Y variable in the sample is completely available.
6. The sample means \bar{y}_1 and \bar{y}_2 of both groups are known, so that

       \bar{y} = (n_1 \bar{y}_1 + n_2 \bar{y}_2)/n

   is the sample mean based on n units.
7. The sample mean \bar{x}_1 of the auxiliary variable for the R-group is known, but the information about \bar{x}_2 of the NR-group is missing. Therefore, the value of

       \bar{x} = (n_1 \bar{x}_1 + n_2 \bar{x}_2)/n

   cannot be obtained due to the absence of \bar{x}_2.
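Assumptions 4-6 can be illustrated by a small sketch with entirely synthetic data (all numbers and names below are ours, for illustration only): draw an SRSWOR sample, post-stratify it into R and NR groups, and verify that the pooled mean of assumption 6 equals the plain sample mean.

```python
# Illustrative sketch (synthetic data): SRSWOR draw, post-stratification into
# R/NR groups (membership assumed observable), and the pooled mean of
# assumption 6: y_bar = (n1*y1_bar + n2*y2_bar)/n.
import random

random.seed(1)
N = 180
y = [random.gauss(60, 15) for _ in range(N)]   # synthetic study variable
group = ["R"] * 100 + ["NR"] * 80              # R/NR membership labels

n = 40
sample = random.sample(range(N), n)            # SRSWOR of n units
r_vals = [y[i] for i in sample if group[i] == "R"]
nr_vals = [y[i] for i in sample if group[i] == "NR"]
n1, n2 = len(r_vals), len(nr_vals)

y1_bar = sum(r_vals) / n1
y2_bar = sum(nr_vals) / n2
y_bar = (n1 * y1_bar + n2 * y2_bar) / n        # equals the plain sample mean
```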
3. Proposed Class of Estimation Strategy
To estimate the population mean \bar{Y}, the usual ratio, product and regression estimators are not applicable when the observations related to \bar{x}_2 are missing. Singh and Shukla (1987) proposed a factor-type estimator for estimating the population mean \bar{Y}. Shukla et al. (1991), Singh and Shukla (1993) and Shukla (2002) have also discussed properties of factor-type estimators for estimating the population mean under SRSWOR and two-phase sampling. But all these cannot be used here because the information \bar{x}_2 is unknown. In order to solve this, an imputation \bar{x}_2^* is adopted as:

    \bar{x}_2^* = (N \bar{X} - n \bar{X}_2)/(N - n)        (3.1)

The logic of this imputation is to utilize the non-sampled part of the population of X for obtaining an estimate of the missing \bar{x}_2, and to generate \bar{x}^* as described below:

    \bar{x}^* = (N_1 \bar{x}_1 + N_2 \bar{x}_2^*)/(N_1 + N_2)        (3.2)

The proposed class of imputed factor-type estimation strategies for estimating \bar{Y} is:

    (\bar{y}_{FT})_k = [(N_1 \bar{y}_1 + N_2 \bar{y}_2)/N] . [{(A + C)\bar{X} + f B \bar{x}^*} / {(A + f B)\bar{X} + C \bar{x}^*}]        (3.3)

where k (0 < k < \infty) is a constant, A = (k - 1)(k - 2), B = (k - 1)(k - 4), C = (k - 2)(k - 3)(k - 4) and f = n/N.
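A minimal computational sketch of (3.1)-(3.3) follows. The function name and argument names are ours (not the authors'); the inputs are assumed known as listed in Section 2.

```python
# Sketch of the proposed imputed factor-type estimator (3.3), with the
# imputations (3.1)-(3.2). Illustrative only; names are not from the paper.
def ft_imputed_estimator(k, N, n, N1, N2, y1_bar, y2_bar, x1_bar, X_bar, X2_bar):
    A = (k - 1) * (k - 2)
    B = (k - 1) * (k - 4)
    C = (k - 2) * (k - 3) * (k - 4)
    f = n / N
    x2_star = (N * X_bar - n * X2_bar) / (N - n)   # imputed NR-group mean, (3.1)
    x_star = (N1 * x1_bar + N2 * x2_star) / N      # generated auxiliary mean, (3.2)
    y_ps = (N1 * y1_bar + N2 * y2_bar) / N
    return y_ps * ((A + C) * X_bar + f * B * x_star) / ((A + f * B) * X_bar + C * x_star)
```

For k = 4 one gets A = 6 and B = C = 0, so the factor multiplying y_ps equals 1 and the estimator reduces to the post-stratified mean.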
4. Large Sample Approximation
Consider the following for large n:

    \bar{y}_1 = \bar{Y}_1(1 + e_1) ;  \bar{y}_2 = \bar{Y}_2(1 + e_2) ;  \bar{x}_1 = \bar{X}_1(1 + e_3) ;  \bar{x}_2 = \bar{X}_2(1 + e_4)        (4.1)

where e_1, e_2, e_3 and e_4 are very small numbers with |e_i| < 1 (i = 1, 2, 3, 4).

Using the basic concept of SRSWOR and the concept of post-stratification of the sample n into n_1 and n_2 [see Cochran (2005), Sukhatme et al. (1984)], we get

    E(e_1) = E[E(e_1 | n_1)] = 0 ;  E(e_2) = E[E(e_2 | n_2)] = 0 ;
    E(e_3) = E[E(e_3 | n_1)] = 0 ;  E(e_4) = E[E(e_4 | n_2)] = 0        (4.2)

Assuming the independence of the R-group and NR-group representation in the sample, the following expressions could be obtained:

    E(e_1^2) = E[E(e_1^2 | n_1)] = E[(1/n_1 - 1/N) C_{1Y}^2] = (D_1 - 1/N) C_{1Y}^2        (4.3)
    E(e_2^2) = E[E(e_2^2 | n_2)] = (D_2 - 1/N) C_{2Y}^2        (4.4)
    E(e_3^2) = E[E(e_3^2 | n_1)] = (D_1 - 1/N) C_{1X}^2        (4.5)
    E(e_4^2) = (D_2 - 1/N) C_{2X}^2        (4.6)

and

    E(e_1 e_3) = E[E(e_1 e_3 | n_1)] = E[(1/n_1 - 1/N) \rho_1 C_{1Y} C_{1X}] = (D_1 - 1/N) \rho_1 C_{1Y} C_{1X}        (4.7)
    E(e_1 e_2) = E[E(e_1 e_2 | n_1, n_2)] = 0        (4.8)
    E(e_1 e_4) = 0        (4.9)
    E(e_2 e_3) = E[E(e_2 e_3 | n_1, n_2)] = 0        (4.10)
    E(e_2 e_4) = (D_2 - 1/N) \rho_2 C_{2Y} C_{2X}        (4.11)
    E(e_3 e_4) = 0        (4.12)

The expressions (4.8), (4.9), (4.10) and (4.12) are true under the assumption of independent representation of R-group and NR-group units in the sample, which is introduced to simplify the mathematical expressions.
Theorem 4.1: The estimator (\bar{y}_{FT})_k could be expressed under large sample approximations in the following form:

    (\bar{y}_{FT})_k = \delta \bar{Y}[1 + s_1 W_1 e_1 + s_2 W_2 e_2][1 + (\alpha - \beta) e_3 - (\alpha - \beta)\beta e_3^2 + (\alpha - \beta)\beta^2 e_3^3 - (\alpha - \beta)\beta^3 e_3^4 + ...]        (4.13)

Proof: Rewrite \bar{x}^* as in (3.2):

    \bar{x}^* = [N_1 \bar{x}_1 + N_2 \bar{x}_2^*]/(N_1 + N_2) ,  where  \bar{x}_2^* = (N \bar{X} - n \bar{X}_2)/(N - n)

    ⇒ \bar{x}^* = [N_1 \bar{x}_1 + N_2 (N \bar{X} - n \bar{X}_2)/(N - n)]/N
                = \bar{X}[W_1 r_1 + p(1 - f r_2) + W_1 r_1 e_3] = \bar{X}[v + W_1 r_1 e_3]        (4.14)

where

    p = N_2/(N - n) ;  r_1 = \bar{X}_1/\bar{X} ;  r_2 = \bar{X}_2/\bar{X} ;  W_1 = N_1/N ;  f = n/N ;  v = W_1 r_1 + p(1 - f r_2).

Now, the estimator (\bar{y}_{FT})_k under the large sample approximations (4.1), and using (4.12), will be

    (\bar{y}_{FT})_k = [(N_1 \bar{y}_1 + N_2 \bar{y}_2)/N] . [{(A + C)\bar{X} + f B \bar{x}^*} / {(A + f B)\bar{X} + C \bar{x}^*}]
        = \bar{Y}[1 + s_1 W_1 e_1 + s_2 W_2 e_2] . [(A + f B v + C) + f B W_1 r_1 e_3] / [(A + f B + C v) + C W_1 r_1 e_3]
        = \bar{Y}[1 + s_1 W_1 e_1 + s_2 W_2 e_2] . (\psi_1 + \psi_2 e_3)/(\psi_3 + \psi_4 e_3)
        = \delta \bar{Y}[1 + s_1 W_1 e_1 + s_2 W_2 e_2](1 + \alpha e_3)(1 + \beta e_3)^{-1}        (4.15)

where

    \psi_1 = A + f B v + C ;  \psi_2 = f B W_1 r_1 ;  \psi_3 = A + f B + C v ;  \psi_4 = C W_1 r_1 ;
    s_1 = \bar{Y}_1/\bar{Y} ;  s_2 = \bar{Y}_2/\bar{Y} ;  \alpha = \psi_2/\psi_1 ;  \beta = \psi_4/\psi_3 ;  \delta = \psi_1/\psi_3.

We can further express (4.15) as:

    (\bar{y}_{FT})_k = \delta \bar{Y}[1 + s_1 W_1 e_1 + s_2 W_2 e_2][1 + (\alpha - \beta) e_3 - (\alpha - \beta)\beta e_3^2 + (\alpha - \beta)\beta^2 e_3^3 - ...]        (4.16)
5. Bias and Mean Squared Error
Using E(.) for expectation, B(.) for bias and M(.) for mean squared error, we have, to the first order of approximation, for i, j = 1, 2, 3, ...:

    E(e_1^i e_2^j) = E(e_1^i e_3^j) = E(e_2^i e_3^j) = 0  when  i + j > 2        (5.1)

Theorem 5.1: To the first order of approximation, the bias of the estimator (\bar{y}_{FT})_k is

    B[(\bar{y}_{FT})_k] = \bar{Y}[(\delta - 1) - \delta(\alpha - \beta)(D_1 - 1/N) C_{1X}{\beta C_{1X} - s_1 W_1 \rho_1 C_{1Y}}]        (5.2)

Proof: Since B[(\bar{y}_{FT})_k] = E[(\bar{y}_{FT})_k] - \bar{Y}, taking expectations in (4.16) we have

    E[(\bar{y}_{FT})_k] = \delta \bar{Y} E{[1 + s_1 W_1 e_1 + s_2 W_2 e_2][1 + (\alpha - \beta) e_3 - (\alpha - \beta)\beta e_3^2 + (\alpha - \beta)\beta^2 e_3^3 - ...]}
        = \delta \bar{Y}[1 - (\alpha - \beta)\beta (D_1 - 1/N) C_{1X}^2 + (\alpha - \beta) s_1 W_1 (D_1 - 1/N) \rho_1 C_{1Y} C_{1X}]
        = \delta \bar{Y}[1 - (\alpha - \beta)(D_1 - 1/N) C_{1X}{\beta C_{1X} - s_1 W_1 \rho_1 C_{1Y}}]

Therefore,

    B[(\bar{y}_{FT})_k] = \bar{Y}[(\delta - 1) - \delta(\alpha - \beta)(D_1 - 1/N) C_{1X}{\beta C_{1X} - s_1 W_1 \rho_1 C_{1Y}}]

Theorem 5.2: The mean squared error of (\bar{y}_{FT})_k is

    M[(\bar{y}_{FT})_k] = \bar{Y}^2[(\delta - 1)^2 + (D_1 - 1/N){K_1 s_1^2 C_{1Y}^2 + K_2 C_{1X}^2 + 2 K_3 s_1 \rho_1 C_{1Y} C_{1X}} + (D_2 - 1/N)\delta^2 s_2^2 W_2^2 C_{2Y}^2]        (5.3)

where K_1 = \delta^2 W_1^2 ;  K_2 = \delta(\alpha - \beta){\delta(\alpha - \beta) - 2(\delta - 1)\beta} ;  K_3 = W_1 \delta(2\delta - 1)(\alpha - \beta).

Proof:

    M[(\bar{y}_{FT})_k] = E[(\bar{y}_{FT})_k - \bar{Y}]^2
        = E[\delta \bar{Y}{1 + s_1 W_1 e_1 + s_2 W_2 e_2}{1 + (\alpha - \beta) e_3 - (\alpha - \beta)\beta e_3^2 + ...} - \bar{Y}]^2

Using the large sample approximations of (5.1), we have

    M[(\bar{y}_{FT})_k] = \bar{Y}^2 E[(\delta - 1) + \delta{(\alpha - \beta) e_3 - (\alpha - \beta)\beta e_3^2 + (s_1 W_1 e_1 + s_2 W_2 e_2) + (\alpha - \beta)(s_1 W_1 e_1 + s_2 W_2 e_2) e_3}]^2
        = \bar{Y}^2[(\delta - 1)^2 + \delta^2{(\alpha - \beta)^2 E(e_3^2) + s_1^2 W_1^2 E(e_1^2) + s_2^2 W_2^2 E(e_2^2) + 2(\alpha - \beta) s_1 W_1 E(e_1 e_3)}
          + 2\delta(\delta - 1){-(\alpha - \beta)\beta E(e_3^2) + (\alpha - \beta) s_1 W_1 E(e_1 e_3)}]   [using (4.2), (4.7) and (4.8)]
        = \bar{Y}^2[(\delta - 1)^2 + (D_1 - 1/N){\delta^2 s_1^2 W_1^2 C_{1Y}^2 + \delta(\alpha - \beta)(\delta(\alpha - \beta) - 2(\delta - 1)\beta) C_{1X}^2
          + 2\delta(2\delta - 1)(\alpha - \beta) s_1 W_1 \rho_1 C_{1Y} C_{1X}} + (D_2 - 1/N)\delta^2 s_2^2 W_2^2 C_{2Y}^2]
        = \bar{Y}^2[(\delta - 1)^2 + (D_1 - 1/N){K_1 s_1^2 C_{1Y}^2 + K_2 C_{1X}^2 + 2 K_3 s_1 \rho_1 C_{1Y} C_{1X}} + (D_2 - 1/N)\delta^2 s_2^2 W_2^2 C_{2Y}^2]
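The expressions (5.2) and (5.3) can be evaluated numerically from population quantities. The sketch below is ours (function and argument names are assumptions, not the authors' notation); r_2 is recovered from the identity W_1 r_1 + W_2 r_2 = 1.

```python
# Sketch evaluating the bias (5.2) and MSE (5.3) from population quantities.
def ft_bias_mse(k, N, n, W1, W2, r1, s1, s2, C1Y, C1X, C2Y, rho1, Ybar):
    f = n / N
    A = (k - 1) * (k - 2); B = (k - 1) * (k - 4); C = (k - 2) * (k - 3) * (k - 4)
    p = W2 * N / (N - n)                       # N2 / (N - n)
    r2 = (1 - W1 * r1) / W2                    # from W1*r1 + W2*r2 = 1
    v = W1 * r1 + p * (1 - f * r2)
    psi1 = A + f * B * v + C; psi2 = f * B * W1 * r1
    psi3 = A + f * B + C * v; psi4 = C * W1 * r1
    alpha, beta, delta = psi2 / psi1, psi4 / psi3, psi1 / psi3
    D1 = 1 / (n * W1) + (N - n) * (1 - W1) / ((N - 1) * n ** 2 * W1 ** 2)
    D2 = 1 / (n * W2) + (N - n) * (1 - W2) / ((N - 1) * n ** 2 * W2 ** 2)
    g1, g2 = D1 - 1 / N, D2 - 1 / N
    bias = Ybar * ((delta - 1)
                   - delta * (alpha - beta) * g1 * C1X * (beta * C1X - s1 * W1 * rho1 * C1Y))
    K1 = delta ** 2 * W1 ** 2
    K2 = delta * (alpha - beta) * (delta * (alpha - beta) - 2 * (delta - 1) * beta)
    K3 = W1 * delta * (2 * delta - 1) * (alpha - beta)
    mse = Ybar ** 2 * ((delta - 1) ** 2
                       + g1 * (K1 * s1 ** 2 * C1Y ** 2 + K2 * C1X ** 2
                               + 2 * K3 * s1 * rho1 * C1Y * C1X)
                       + g2 * delta ** 2 * s2 ** 2 * W2 ** 2 * C2Y ** 2)
    return bias, mse
```

At k = 4 the function returns a zero bias, consistent with Case IV of Section 6.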
6. Some Special Cases
The terms A, B and C are functions of k. In particular, there are some special cases:

Case I: k = 1 ⇒ A = 0; B = 0; C = -6; \psi_1 = -6; \psi_2 = 0; \psi_3 = -6v; \psi_4 = -6 r_1 W_1;
    \alpha = 0 ;  \beta = r_1 W_1/v ;  \delta = 1/v ;
    K_1 = W_1^2/v^2 ;  K_2 = r_1^2 W_1^2 (3 - 2v)/v^4 ;  K_3 = r_1 W_1^2 (v - 2)/v^3.

The estimator (\bar{y}_{FT})_k along with its bias and m.s.e. under Case I is:

    (\bar{y}_{FT})_{k=1} = [(N_1 \bar{y}_1 + N_2 \bar{y}_2)/N] . (\bar{X}/\bar{x}^*)        (6.1)
    B[(\bar{y}_{FT})_{k=1}] = \bar{Y} v^{-3}[(1 - v) v^2 + (D_1 - 1/N) W_1^2 r_1 C_{1X}{r_1 C_{1X} - v s_1 \rho_1 C_{1Y}}]        (6.2)
    M[(\bar{y}_{FT})_{k=1}] = \bar{Y}^2 v^{-4}[(1 - v)^2 v^2 + (D_1 - 1/N) W_1^2{v^2 s_1^2 C_{1Y}^2 + (3 - 2v) r_1^2 C_{1X}^2 + 2(v - 2) v r_1 s_1 \rho_1 C_{1Y} C_{1X}} + (D_2 - 1/N) v^2 W_2^2 s_2^2 C_{2Y}^2]        (6.3)

Case II: k = 2 ⇒ A = 0; B = -2; C = 0; \psi_1 = -2fv; \psi_2 = -2 f W_1 r_1; \psi_3 = -2f; \psi_4 = 0;
    \alpha = r_1 W_1/v ;  \beta = 0 ;  \delta = v ;
    K_1 = W_1^2 v^2 ;  K_2 = r_1^2 W_1^2 ;  K_3 = r_1 W_1^2 (2v - 1).

    (\bar{y}_{FT})_{k=2} = [(N_1 \bar{y}_1 + N_2 \bar{y}_2)/N] . (\bar{x}^*/\bar{X})        (6.4)
    B[(\bar{y}_{FT})_{k=2}] = \bar{Y}[(v - 1) + (D_1 - 1/N) W_1^2 r_1 s_1 \rho_1 C_{1Y} C_{1X}]        (6.5)
    M[(\bar{y}_{FT})_{k=2}] = \bar{Y}^2[(v - 1)^2 + (D_1 - 1/N) W_1^2{v^2 s_1^2 C_{1Y}^2 + r_1^2 C_{1X}^2 + 2(2v - 1) s_1 r_1 \rho_1 C_{1Y} C_{1X}} + (D_2 - 1/N) v^2 W_2^2 s_2^2 C_{2Y}^2]        (6.6)

Case III: k = 3 ⇒ A = 2; B = -2; C = 0; \psi_1 = 2(1 - fv); \psi_2 = -2 f W_1 r_1; \psi_3 = 2(1 - f); \psi_4 = 0;
    \alpha = -f W_1 r_1/(1 - fv) ;  \beta = 0 ;  \delta = (1 - fv)/(1 - f) ;
    K_1 = (1 - fv)^2 W_1^2/(1 - f)^2 ;  K_2 = f^2 W_1^2 r_1^2/(1 - f)^2 ;  K_3 = -f W_1^2 r_1{1 + f(1 - 2v)}/(1 - f)^2.

    (\bar{y}_{FT})_{k=3} = [(N_1 \bar{y}_1 + N_2 \bar{y}_2)/N] . [(\bar{X} - f \bar{x}^*)/{(1 - f)\bar{X}}]        (6.7)
    B[(\bar{y}_{FT})_{k=3}] = \bar{Y} f (1 - f)^{-1}[(1 - v) - (D_1 - 1/N) W_1^2 r_1 s_1 \rho_1 C_{1Y} C_{1X}]        (6.8)
    M[(\bar{y}_{FT})_{k=3}] = \bar{Y}^2 (1 - f)^{-2}[f^2 (1 - v)^2 + (D_1 - 1/N) W_1^2{(1 - fv)^2 s_1^2 C_{1Y}^2 + f^2 r_1^2 C_{1X}^2 - 2{1 + f(1 - 2v)} f r_1 s_1 \rho_1 C_{1Y} C_{1X}} + (D_2 - 1/N)(1 - fv)^2 W_2^2 s_2^2 C_{2Y}^2]        (6.9)

Case IV: k = 4 ⇒ A = 6; B = 0; C = 0; \psi_1 = 6; \psi_2 = 0; \psi_3 = 6; \psi_4 = 0;
    \alpha = 0 ;  \beta = 0 ;  \delta = 1 ;  K_1 = W_1^2 ;  K_2 = 0 ;  K_3 = 0.

    (\bar{y}_{FT})_{k=4} = (N_1 \bar{y}_1 + N_2 \bar{y}_2)/N        (6.10)
    B[(\bar{y}_{FT})_{k=4}] = 0        (6.11)
    V[(\bar{y}_{FT})_{k=4}] = (D_1 - 1/N) W_1^2 \bar{Y}_1^2 C_{1Y}^2 + (D_2 - 1/N) W_2^2 \bar{Y}_2^2 C_{2Y}^2        (6.12)
7. Estimator Without Imputation
Throughout the discussion, the value of \bar{x}_2 has been assumed unknown and has been imputed by the term \bar{x}_2^* to provide the generation of \bar{x}^* [see (3.1) and (3.2)]. Suppose \bar{x}_2 is known; then there is no need of imputation and the proposed quantities (3.2) and (3.3) reduce to:

    \bar{x}^{(#)} = (N_1 \bar{x}_1 + N_2 \bar{x}_2)/N        (7.1)
    [(\bar{y}_{FT})_w]_k = [(N_1 \bar{y}_1 + N_2 \bar{y}_2)/N] . [{(A + C)\bar{X} + f B \bar{x}^{(#)}} / {(A + f B)\bar{X} + C \bar{x}^{(#)}}]        (7.2)

where k (0 < k < \infty) is a constant, A = (k - 1)(k - 2), B = (k - 1)(k - 4), C = (k - 2)(k - 3)(k - 4) and f = n/N.

Theorem 7.1: The estimator [(\bar{y}_{FT})_w]_k is biased for \bar{Y}, with the amount of bias

    B{[(\bar{y}_{FT})_w]_k} = \bar{Y}(\psi_1' - \psi_2')[(D_1 - 1/N) W_1^2 r_1 C_{1X}{s_1 \rho_1 C_{1Y} - \psi_2' r_1 C_{1X}} + (D_2 - 1/N) W_2^2 r_2 C_{2X}{s_2 \rho_2 C_{2Y} - \psi_2' r_2 C_{2X}}]        (7.3)

where \psi_1' = f B/(A + f B + C) and \psi_2' = C/(A + f B + C).

Proof: The estimator [(\bar{y}_{FT})_w]_k may be approximated as:

    [(\bar{y}_{FT})_w]_k = [(N_1 \bar{y}_1 + N_2 \bar{y}_2)/N] . [{(A + C)\bar{X} + f B \bar{x}^{(#)}} / {(A + f B)\bar{X} + C \bar{x}^{(#)}}]
        = [\bar{Y} + W_1 \bar{Y}_1 e_1 + W_2 \bar{Y}_2 e_2][1 + \psi_1'(W_1 r_1 e_3 + W_2 r_2 e_4)][1 + \psi_2'(W_1 r_1 e_3 + W_2 r_2 e_4)]^{-1}

Expanding the above using the binomial expansion, and ignoring terms e_i^k e_j^l for (k + l) > 2, (k, l = 0, 1, 2, ...), (i, j = 1, 2, 3, 4), the estimator results in
    [(\bar{y}_{FT})_w]_k = \bar{Y} + \bar{Y}(\Delta_1 - \Delta_2) + W_1 \bar{Y}_1 e_1 (1 + \Delta_1 - \Delta_2) + W_2 \bar{Y}_2 e_2 (1 + \Delta_1 - \Delta_2)        (7.4)

where \Delta_1 = (\psi_1' - \psi_2')(W_1 r_1 e_3 + W_2 r_2 e_4), \Delta_2 = \psi_2'(\psi_1' - \psi_2')(W_1 r_1 e_3 + W_2 r_2 e_4)^2, and W_1 r_1 + W_2 r_2 = 1 holds. Further, up to the first order of approximation, one may derive the following:

    (i) E(\Delta_1) = 0 ;
    (ii) E(\Delta_1^2) = (\psi_1' - \psi_2')^2[W_1^2 r_1^2 (D_1 - 1/N) C_{1X}^2 + W_2^2 r_2^2 (D_2 - 1/N) C_{2X}^2] ;
    (iii) E(\Delta_2) = \psi_2'(\psi_1' - \psi_2')[W_1^2 r_1^2 (D_1 - 1/N) C_{1X}^2 + W_2^2 r_2^2 (D_2 - 1/N) C_{2X}^2] ;
    (iv) E(e_1 \Delta_1) = (\psi_1' - \psi_2') W_1 r_1 (D_1 - 1/N) \rho_1 C_{1X} C_{1Y} ;
    (v) E(e_2 \Delta_1) = (\psi_1' - \psi_2') W_2 r_2 (D_2 - 1/N) \rho_2 C_{2X} C_{2Y} ;
    (vi) E(e_1 \Delta_2) = 0 [under o(n^{-1})] ;
    (vii) E(e_2 \Delta_2) = 0 [under o(n^{-1})].

The bias of the estimator without imputation is

    B{[(\bar{y}_{FT})_w]_k} = E{[(\bar{y}_{FT})_w]_k - \bar{Y}}
        = E[\bar{Y}(\Delta_1 - \Delta_2) + W_1 \bar{Y}_1 e_1 (1 + \Delta_1 - \Delta_2) + W_2 \bar{Y}_2 e_2 (1 + \Delta_1 - \Delta_2)]
        = \bar{Y}(\psi_1' - \psi_2')[(D_1 - 1/N) W_1^2 r_1 C_{1X}{s_1 \rho_1 C_{1Y} - \psi_2' r_1 C_{1X}} + (D_2 - 1/N) W_2^2 r_2 C_{2X}{s_2 \rho_2 C_{2Y} - \psi_2' r_2 C_{2X}}]

Theorem 7.2: The mean squared error of the estimator [(\bar{y}_{FT})_w]_k is:

    M{[(\bar{y}_{FT})_w]_k} = \bar{Y}^2[(D_1 - 1/N) W_1^2{s_1^2 C_{1Y}^2 + (\psi_1' - \psi_2')^2 r_1^2 C_{1X}^2 + 2 s_1 (\psi_1' - \psi_2') r_1 \rho_1 C_{1Y} C_{1X}}
        + (D_2 - 1/N) W_2^2{s_2^2 C_{2Y}^2 + (\psi_1' - \psi_2')^2 r_2^2 C_{2X}^2 + 2 s_2 (\psi_1' - \psi_2') r_2 \rho_2 C_{2Y} C_{2X}}]        (7.5)

Proof:

    M{[(\bar{y}_{FT})_w]_k} = E{[(\bar{y}_{FT})_w]_k - \bar{Y}}^2
        = E[\bar{Y}(\Delta_1 - \Delta_2) + W_1 \bar{Y}_1 e_1 (1 + \Delta_1 - \Delta_2) + W_2 \bar{Y}_2 e_2 (1 + \Delta_1 - \Delta_2)]^2
        = \bar{Y}^2 E(\Delta_1^2) + W_1^2 \bar{Y}_1^2 E(e_1^2) + W_2^2 \bar{Y}_2^2 E(e_2^2) + 2 W_1 \bar{Y} \bar{Y}_1 E(e_1 \Delta_1) + 2 W_2 \bar{Y} \bar{Y}_2 E(e_2 \Delta_1) + 2 W_1 W_2 \bar{Y}_1 \bar{Y}_2 E(e_1 e_2)

which, on substituting (4.3), (4.4), (4.8) and (i)-(v) above, gives (7.5).
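The MSE (7.5) can likewise be sketched in code. The function and argument names below are ours; \psi_1' and \psi_2' are computed from A, B and C as in Theorem 7.1.

```python
# Sketch of M[(y_FT)_w]_k from (7.5); illustrative, not the authors' code.
def ft_w_mse(k, N, n, W1, W2, r1, r2, s1, s2,
             C1Y, C1X, C2Y, C2X, rho1, rho2, Ybar):
    f = n / N
    A = (k - 1) * (k - 2); B = (k - 1) * (k - 4); C = (k - 2) * (k - 3) * (k - 4)
    d = (f * B - C) / (A + f * B + C)          # psi1' - psi2'
    D1 = 1 / (n * W1) + (N - n) * (1 - W1) / ((N - 1) * n ** 2 * W1 ** 2)
    D2 = 1 / (n * W2) + (N - n) * (1 - W2) / ((N - 1) * n ** 2 * W2 ** 2)
    g1, g2 = D1 - 1 / N, D2 - 1 / N
    return Ybar ** 2 * (
        g1 * W1 ** 2 * (s1 ** 2 * C1Y ** 2 + d ** 2 * r1 ** 2 * C1X ** 2
                        + 2 * s1 * d * r1 * rho1 * C1Y * C1X)
        + g2 * W2 ** 2 * (s2 ** 2 * C2Y ** 2 + d ** 2 * r2 ** 2 * C2X ** 2
                          + 2 * s2 * d * r2 * rho2 * C2Y * C2X))
```

At k = 4 the factor d vanishes, so (7.5) collapses to the variance (7.17) and the auxiliary-variable terms drop out.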
Remark: At k = 1, k = 2, k = 3 and k = 4, the biases and mean squared errors of the non-imputed estimators are given below:

Case I: k = 1 ⇒ A = 0; B = 0; C = -6; \psi_1' = 0; \psi_2' = 1;

    [(\bar{y}_{FT})_w]_{k=1} = [(N_1 \bar{y}_1 + N_2 \bar{y}_2)/N] . (\bar{X}/\bar{x}^{(#)})        (7.6)
    B{[(\bar{y}_{FT})_w]_{k=1}} = -\bar{Y}[(D_1 - 1/N) W_1^2 r_1 C_{1X}{s_1 \rho_1 C_{1Y} - r_1 C_{1X}} + (D_2 - 1/N) W_2^2 r_2 C_{2X}{s_2 \rho_2 C_{2Y} - r_2 C_{2X}}]        (7.7)
    M{[(\bar{y}_{FT})_w]_{k=1}} = \bar{Y}^2[(D_1 - 1/N) W_1^2{s_1^2 C_{1Y}^2 + r_1^2 C_{1X}^2 - 2 s_1 r_1 \rho_1 C_{1Y} C_{1X}} + (D_2 - 1/N) W_2^2{s_2^2 C_{2Y}^2 + r_2^2 C_{2X}^2 - 2 s_2 r_2 \rho_2 C_{2Y} C_{2X}}]        (7.8)
Case II: k = 2 ⇒ A = 0; B = -2; C = 0; \psi_1' = 1; \psi_2' = 0;

    [(\bar{y}_{FT})_w]_{k=2} = [(N_1 \bar{y}_1 + N_2 \bar{y}_2)/N] . (\bar{x}^{(#)}/\bar{X})        (7.9)
    B{[(\bar{y}_{FT})_w]_{k=2}} = \bar{Y}[(D_1 - 1/N) W_1^2 s_1 r_1 \rho_1 C_{1X} C_{1Y} + (D_2 - 1/N) W_2^2 s_2 r_2 \rho_2 C_{2X} C_{2Y}]        (7.10)
    M{[(\bar{y}_{FT})_w]_{k=2}} = \bar{Y}^2[(D_1 - 1/N) W_1^2{s_1^2 C_{1Y}^2 + r_1^2 C_{1X}^2 + 2 s_1 r_1 \rho_1 C_{1Y} C_{1X}} + (D_2 - 1/N) W_2^2{s_2^2 C_{2Y}^2 + r_2^2 C_{2X}^2 + 2 s_2 r_2 \rho_2 C_{2Y} C_{2X}}]        (7.11)

Case III: k = 3 ⇒ A = 2; B = -2; C = 0; \psi_1' = -f(1 - f)^{-1}; \psi_2' = 0;

    [(\bar{y}_{FT})_w]_{k=3} = [(N_1 \bar{y}_1 + N_2 \bar{y}_2)/N] . [(\bar{X} - f \bar{x}^{(#)})/{(1 - f)\bar{X}}]        (7.12)
    B{[(\bar{y}_{FT})_w]_{k=3}} = -\bar{Y} f (1 - f)^{-1}[(D_1 - 1/N) W_1^2 r_1 s_1 \rho_1 C_{1Y} C_{1X} + (D_2 - 1/N) W_2^2 r_2 s_2 \rho_2 C_{2Y} C_{2X}]        (7.13)
    M{[(\bar{y}_{FT})_w]_{k=3}} = \bar{Y}^2[(D_1 - 1/N) W_1^2{s_1^2 C_{1Y}^2 + (1 - f)^{-2} f^2 r_1^2 C_{1X}^2 - 2(1 - f)^{-1} f s_1 r_1 \rho_1 C_{1Y} C_{1X}} + (D_2 - 1/N) W_2^2{s_2^2 C_{2Y}^2 + (1 - f)^{-2} f^2 r_2^2 C_{2X}^2 - 2(1 - f)^{-1} f s_2 r_2 \rho_2 C_{2Y} C_{2X}}]        (7.14)

Case IV: k = 4 ⇒ A = 6; B = 0; C = 0; \psi_1' = 0; \psi_2' = 0;

    [(\bar{y}_{FT})_w]_{k=4} = (N_1 \bar{y}_1 + N_2 \bar{y}_2)/N        (7.15)
    B{[(\bar{y}_{FT})_w]_{k=4}} = 0        (7.16)
    V{[(\bar{y}_{FT})_w]_{k=4}} = \bar{Y}^2[(D_1 - 1/N) W_1^2 s_1^2 C_{1Y}^2 + (D_2 - 1/N) W_2^2 s_2^2 C_{2Y}^2]        (7.17)

8. Numerical Illustration
Consider two artificial populations I and II (given in Appendices A and B), each divided into R and NR-groups of sizes N_1 and N_2 respectively. Let samples of sizes 40 and 30, drawn with SRSWOR from populations I and II respectively, be post-stratified into R and NR-groups. Then we have:

    Population I:  n_1 = 28; n_2 = 12; n = 40; f = 0.22
    Population II: n_1 = 20; n_2 = 10; n = 30; f = 0.20

The values of the population parameters for the two populations are given in Table 8.1, and the values of bias and MSE are shown in Table 8.2.
                  Entire Population     R-group              NR-group
                  I        II           I        II          I        II
    Size          180      150          100      90          80       60
    Mean Y        63.77    159.03       66.33    173.60      59.92    140.81
    Mean X        29.20    113.22       30.72    128.45      26.92    94.19
    M.S. Y        299.87   2205.18      349.33   2532.36     206.35   1219.90
    M.S. X        110.43   1972.61      112.67   2300.86     100.08   924.17
    C.V. Y        0.272    0.295        0.282    0.290       0.240    0.248
    C.V. X        0.360    0.392        0.345    0.373       0.372    0.323
    Corr. Coeff.  0.809    0.897        0.805    0.857       0.808    0.956

Table 8.1: Parameters of Populations I & II given in Appendices A & B.
    Estimator                          Population I            Population II
                                       Bias       MSE          Bias       MSE
    (\bar{y}_{FT})_{k=1}               -2.0       17.1255      -2.3386    8.025
    (\bar{y}_{FT})_{k=2}               1.6628     228.6822     2.6018     49.8306
    (\bar{y}_{FT})_{k=3}               1.4183     28.0158      -0.6512    7.0054
    (\bar{y}_{FT})_{k=4}               0          43.64        0          9.2662
    [(\bar{y}_{FT})_w]_{k=1}           0.1433     12.9589      0.1095     6.0552
    [(\bar{y}_{FT})_w]_{k=2}           0.3141     216.3024     0.1599     46.838
    [(\bar{y}_{FT})_w]_{k=3}           -0.5962    24.327       -0.031     5.2423
    [(\bar{y}_{FT})_w]_{k=4}           0          43.64        0          9.2662

Table 8.2: Bias and M.S.E. comparisons of (\bar{y}_{FT})_k and [(\bar{y}_{FT})_w]_k
The m.s.e. of the proposed imputed estimator is higher than that of the non-imputed estimator, but the two are very close. Obviously, the non-imputed estimator is better than the imputed one because complete information is available to it; yet the proposed estimator comes very near to the non-imputed estimator, showing the utility of the new estimation technique in a missing-observation environment. Define a term LI, the "percentage loss due to imputation", by the formulation

    (LI)_k = [M{(\bar{y}_{FT})_k} / M{[(\bar{y}_{FT})_w]_k}] x 100        (8.1)

Table 8.3 shows the variation of LI over k.

    k          (LI)_k
               Population I     Population II
    k = 1      132.1524203      132.5307
    k = 2      105.7233762      106.3893
    k = 3      115.1633987      133.6322
    k = 4      100              100

Table 8.3: Variation of LI over k
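The computation (8.1) can be reproduced directly from the MSE columns of Table 8.2; the dictionaries below simply transcribe those columns (keys are the values of k).

```python
# Reproducing Table 8.3 from Table 8.2 via (8.1).
mse_imputed = {
    "I":  {1: 17.1255, 2: 228.6822, 3: 28.0158, 4: 43.64},
    "II": {1: 8.025,   2: 49.8306,  3: 7.0054,  4: 9.2662},
}
mse_without = {
    "I":  {1: 12.9589, 2: 216.3024, 3: 24.327,  4: 43.64},
    "II": {1: 6.0552,  2: 46.838,   3: 5.2423,  4: 9.2662},
}
LI = {pop: {k: 100 * mse_imputed[pop][k] / mse_without[pop][k] for k in (1, 2, 3, 4)}
      for pop in ("I", "II")}
```

For example, LI["I"][1] ≈ 132.15 and LI["I"][4] = 100, matching Table 8.3.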
The percentage loss in MSE due to imputation is small and can be accommodated by a suitable choice of k, while at the same time the proposal tackles and solves the problem of missing observations.

9. Conclusions
As per Tables 8.2 and 8.3, the imputed class performs close to the non-imputed class of estimators for a suitable choice of k. The overall comparison shows an almost equally close performance of the imputed factor-type estimator and its counterpart without imputation. The imputed factor-type class of estimators thus reveals a good potential for utilizing the known mean \bar{X}_2 in place of the missing \bar{x}_2. The class provides efficient members at k = 1 and k = 3, and the LI comparison shows that, at the cost of a small loss, one can handle the non-responded observations effectively. Actually, the best choice of k is supposed to be near k = 1 or near k = 3. It is worthwhile to note that the proposed class contains estimators that are effective for mean estimation even when some observations of the auxiliary variable X are missing (or non-responded).
Acknowledgement
The authors are thankful to the referee for the valuable comments and useful suggestions which improved the quality of the original manuscript.
References
1. Agrawal, M.C. and Panda, K.B. (1993). An efficient estimator in post-stratification, Metron, 51, 3-4, 179-187.
2. Ahmed, M.S., Al-titi, Omar, Al-Rawi, Ziad and Abu-dayyeh, Walid (2006). Estimation of a population mean using different imputation methods, Statistics in Transition, 7(6), 1247-1264.
3. Cochran, W.G. (2005). Sampling Techniques, Third Edition, John Wiley & Sons, New Delhi.
4. Grover and Couper (1998). Non-response in household surveys, John Wiley and Sons, New York.
5. Hansen, M.H. and Hurwitz, W.N. (1946). The problem of non-response in sample surveys, Jour. Amer. Stat. Asso., 41, 517-529.
6. Hinde, R.L. and Chambers, R.L. (1990). Non-response imputation with multiple sources of non-response, Jour. Off. Statistics, 7, 169-179.
7. Khot, P.S. (1994). A note on handling non-response in sample surveys, Jour. Amer. Stat. Assoc., 89, 693-696.
8. Lessler and Kalsbeek (1992). Non-response error in surveys, John Wiley and Sons, New York.
9. Rao, J.N.K. and Sitter, R.R. (1995). Variance estimation under two phase sampling with application to imputation for missing data, Biometrika, 82, 453-460.
10. Rubin, D.B. (1976). Inference and missing data, Biometrika, 63, 581-593.
11. Shukla, D. (2002). F-T estimator under two phase sampling, Metron, 59, 1-2, 253-263.
12. Shukla, D. and Dubey, J. (2001). Estimation in mail surveys under PSNR sampling scheme, Ind. Soc. Ag. Stat., 54, 3, 288-302.
13. Shukla, D. and Dubey, J. (2004). On estimation under post-stratified two phase non-response sampling scheme in mail surveys, International Jour. of Management and Systems, 20, 3, 269-278.
14. Shukla, D. and Dubey, J. (2006). On earmarked strata in post-stratification, Statistics in Transition, 7, 5, 1067-1085.
15. Shukla, D. and Trivedi, M. (1999). A new estimator for post-stratified sampling scheme, Proceedings of NSBA-TA (Bayesian Analysis), Eds. Rajesh Singh, 206-218.
16. Shukla, D. and Trivedi, M. (2001). Mean estimation in deeply stratified population under post-stratification, Ind. Soc. Ag. Stat., 54(2), 221-235.
17. Shukla, D. and Trivedi, M. (2006). Examining stability of variability ratio with application in post-stratification, International Jour. of Management and Systems, 22, 1, 59-70.
18. Shukla, D., Bankey, A. and Trivedi, M. (2002). Estimation in post-stratification using prior information and grouping strategy, Ind. Soc. Ag. Stat., 55, 2, 158-173.
19. Shukla, D., Singh, V.K. and Singh, G.N. (1991). On the use of transformation in factor type estimator, Metron, 49 (1-4), 359-361.
20. Shukla, D., Trivedi, M. and Singh, G.N. (2006). Post-stratification two-way deeply stratified population, Statistics in Transition, 7, 6, 1295-1310.
21. Singh, G.N. and Singh, V.K. (2001). On the use of auxiliary information in successive sampling, Jour. Ind. Soc. Ag. Stat., 54(1), 1-12.
22. Singh, H.P. (1986). A generalized class of estimators of ratio, product and mean using supplementary information on an auxiliary character in PPSWR scheme, Guj. Stat. Rev., 13, 2, 1-30.
23. Singh, S. and Horn, S. (2000). Compromised imputation in survey sampling, Metrika, 51, 266-276.
24. Singh, V.K. and Shukla, D. (1987). One parameter family of factor type ratio estimator, Metron, 45 (1-2), 273-283.
25. Singh, V.K. and Shukla, D. (1993). An efficient one-parameter family of factor type estimator in sample survey, Metron, 51, 1-2, 139-159.
26. Singh, V.K. and Singh, G.N. (1991). Chain type estimator with two auxiliary variables under double sampling scheme, Metron, 49, 279-289.
27. Singh, V.K., Singh, G.N. and Shukla, D. (1994). A class of chain ratio estimators with two auxiliary variables under double sampling scheme, Sankhya, Ser. B., 46(2), 209-221.
28. Smith, T.M.F. (1991). Post-stratification, The Statisticians, 40, 323-330.
29. Sukhatme, P.V., Sukhatme, B.V., Sukhatme, S. and Ashok, C. (1984). Sampling Theory of Surveys with Applications, Iowa State University Press, I.S.A.S. Publication, New Delhi.
30. Wywial, J. (2001). Stratification of population after sample selection, Statistics in Transition, 5, 2, 327-348.
Appendix A: Population I (N= 180) R-group: (N1=100) Y: X: Y: X: Y: X: Y: X: Y: X: Y: X: Y: X:
110 80 200 150 175 145 110 75 180 135 205 110 210 170
75 40 135 85 160 110 110 80 150 110 180 105 145 85
85 55 120 80 165 135 120 90 185 135 150 110 135 95
165 130 165 100 175 145 230 165 165 115 205 175 250 190
125 85 150 25 185 155 220 160 285 125 220 180 265 215
110 50 160 130 205 175 280 205 150 205 240 215 275 200
Y: X: Y: X: Y: X: Y: X: Y: X: Y: X:
85 55 110 70 115 75 105 65 155 115 160 120
75 40 90 60 140 105 125 75 190 130 155 115
115 65 155 85 180 120 110 70 150 110 170 120
165 115 130 95 170 115 120 80 180 120 195 135
140 90 120 80 175 125 130 85 200 125 200 150
110 55 95 55 190 135 145 105 160 145
85 35 165 135 140 80 275 215 235 100 260 225 205 165
80 40 145 105 105 75 220 190 125 195 185 110 195 155
150 110 215 185 125 65 145 105 165 85 150 90 180 150
165 115 150 110 230 170 155 115 135 115 155 95 115 175
135 95 145 95 230 170 170 135 130 75 115 85
120 60 150 75 255 190 195 145 245 190 115 75
140 70 150 70 275 205 170 135 255 205 220 175
135 85 195 165 145 105 185 110 280 210 215 185
145 115 190 160 125 85 195 145 150 105 230 190
NR-group: (N2=80) 115 60 100 60 160 110 160 110 155 120
135 65 125 75 155 115 170 115 170 105
120 70 140 90 175 135 180 130 195 100
125 75 155 105 195 145 145 95 200 95
120 80 160 125 90 45 130 65 150 90
150 120 145 95 90 55 195 135 165 105
145 105 90 45 80 50 200 130 155 125
90 45 90 55 90 60 160 115 180 130
105 65 95 65 80 50 110 55 200 145
Appendix B : Population II (N=150) R-group (N1=90) Y: X: Y: X: Y: X: Y: X: Y: X: Y: X:
90 30 100 50 90 40 45 15 65 25 75 25
75 35 40 10 95 50 40 15 60 40 70 20
70 30 45 25 80 35 40 20 75 35 40 35
85 40 55 25 85 45 35 10 75 30 55 30
95 45 35 10 55 35 55 30 85 40 75 45
55 25 45 15 60 25 75 25 95 35 45 10
Y: X: Y: X: Y: X: Y: X:
40 10 65 35 65 30 75 30
90 30 80 45 70 40 65 35
95 30 55 30 55 30 70 40
70 30 65 30 35 10 65 25
60 25 75 40 35 15 70 45
65 30 55 15 50 25 45 10
65 40 35 10 75 30 80 30 90 40 55 30
80 50 55 25 85 40 80 40 90 35 60 25
65 35 85 35 80 25 85 35 45 15 85 40
50 30 95 55 65 35 55 20 40 25 55 15
45 15 65 35 35 10 45 25 45 15 60 25
55 20 75 40 40 15 70 30 55 30 70 30
60 25 70 30 95 45 80 40 60 30 75 35
60 30 80 45 100 45 90 45 65 25 65 30
95 40 65 40 55 25 55 30 60 20 80 45
60 20 45 15 60 30 55 15
65 30 40 10 30 10 60 25
60 30 75 40 35 20 70 30
55 35 75 45 45 15 75 35
55 25 45 10 55 30 65 30
45 20 70 30 65 30 80 45
NR-group (N2=60) 85 40 50 15 55 30 55 30
55 25 55 20 35 15 60 25
45 15 60 30 55 20 85 40