Information and Management Sciences Volume 16, Number 2, pp. 73-82, 2005
Estimation of Finite Population Variance in Presence of Random Non-Response Using Auxiliary Variables

M. S. Ahmed, Sultan Qaboos University, Oman
Omar Al-Titi, Yarmouk University, Jordan
Ziad Al-Rawi, Yarmouk University, Jordan
Walid Abu-Dayyeh, Yarmouk University, Jordan
Abstract

This paper proposes some general estimators of the finite population variance in the presence of random non-response, using an auxiliary variable. All possible non-response cases are considered. The properties of these estimators are studied. Finally, a simulation study is carried out to compare their performance.
Keywords: Randomized Response, Ratio and Regression Estimators, Simple Random Sampling with or without Replacement.

Received July 2004; Revised December 2004; and January 2005. Supported by ours.

1. Introduction

Let $\omega = (v_1, v_2, \ldots, v_N)$ denote the population of $N$ units from which a simple random sample of size $n$ is drawn without replacement. If $r$ ($r = 0, 1, 2, \ldots, n-2$) denotes the number of sampling units on which information could not be obtained due to random non-response, then the remaining $(n-r)$ units in the sample can be treated as a simple random sample without replacement (SRSWOR) from $\omega$. Since we are considering the problem of unbiased estimation of the finite population variance, we assume that $r$ is less than $n-1$, i.e. $0 \le r \le n-2$. We assume that $p$ denotes the probability of non-response among the $(n-2)$ possible values of non-responses. Singh and Joarder (1998) assumed that $r$ has the following discrete distribution:

$$P(r) = \frac{n-r}{nq+2p}\; {}^{n-2}C_r\; p^r q^{\,n-2-r},$$

where $q = 1-p$, $r = 0, 1, 2, \ldots, n-2$, and ${}^{n-2}C_r$ denotes the total number of ways of obtaining $r$ non-responses out of the total possible $(n-2)$ responses. They studied the effect of random non-response on the study and auxiliary variables for several estimators of variance.

Define

$$s_y^{*2} = \frac{\sum_{i=1}^{n-r}(y_i-\bar y^*)^2}{n-r-1} \quad\text{and}\quad s_x^{*2} = \frac{\sum_{i=1}^{n-r}(x_i-\bar x^*)^2}{n-r-1},$$

which are conditionally unbiased estimators of

$$S_y^2 = \frac{\sum_{i=1}^{N}(y_i-\bar Y)^2}{N-1} \quad\text{and}\quad S_x^2 = \frac{\sum_{i=1}^{N}(x_i-\bar X)^2}{N-1},$$

respectively, where $\bar y^* = (n-r)^{-1}\sum_{i=1}^{n-r} y_i$ and $\bar x^* = (n-r)^{-1}\sum_{i=1}^{n-r} x_i$. Also,
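define

$$\mu_{ls} = (N-1)^{-1}\sum_{i=1}^{N}(y_i-\bar Y)^l (x_i-\bar X)^s, \qquad \lambda_{ls} = \frac{\mu_{ls}}{\mu_{20}^{l/2}\,\mu_{02}^{s/2}} - 1,$$

$$\lambda = \frac{1}{n} - \frac{1}{N} \quad\text{and}\quad \lambda^* = \frac{1}{nq+2p} - \frac{1}{N}.$$

Singh and Joarder (1998) obtained the following maximum likelihood estimators of $p$ (the probability of non-response), $\lambda_{ls}$ and $\mu_{ls}$:

$$\hat p = \frac{(n-1+r) - \sqrt{(n-1+r)^2 - \dfrac{4rn(n-3)}{n-2}}}{2(n-3)},$$

$$\hat\mu^*_{ls} = (n-r-1)^{-1}\sum_{i=1}^{n-r}(y_i-\bar y^*)^l (x_i-\bar x^*)^s \quad\text{and}\quad \hat\lambda^*_{ls} = \frac{\hat\mu^*_{ls}}{(\hat\mu^*_{20})^{l/2}(\hat\mu^*_{02})^{s/2}} - 1.$$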
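As a quick illustration of the non-response model, the following Python sketch evaluates the distribution $P(r)$ and the closed-form estimate $\hat p$ quoted from Singh and Joarder (1998). The function names `nonresponse_pmf` and `p_hat` and the values $n = 200$, $p = 0.10$, $r = 20$ are illustrative assumptions, not part of the paper.

```python
from math import comb, sqrt


def nonresponse_pmf(r, n, p):
    """P(r) = (n - r) / (n q + 2 p) * C(n-2, r) * p^r * q^(n-2-r), for r = 0..n-2."""
    q = 1.0 - p
    return (n - r) / (n * q + 2 * p) * comb(n - 2, r) * p**r * q ** (n - 2 - r)


def p_hat(r, n):
    """Closed-form estimate of p: the smaller root of
    (n-3) p^2 - (n-1+r) p + r n / (n-2) = 0 (requires n > 3)."""
    a = n - 1 + r
    disc = a * a - 4.0 * r * n * (n - 3) / (n - 2)
    return (a - sqrt(disc)) / (2.0 * (n - 3))


# Illustrative values (assumed, not taken from the paper's data).
n, p, r = 200, 0.10, 20
total_probability = sum(nonresponse_pmf(k, n, p) for k in range(n - 1))
estimate = p_hat(r, n)
print(total_probability, estimate)
```

The probabilities sum to one because $\sum_r (n-r)\,{}^{n-2}C_r\,p^r q^{n-2-r} = n - (n-2)p = nq + 2p$, which is exactly the normalizing constant in $P(r)$.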
We have studied the effect of random non-response on the study and auxiliary variables for several estimators of variance under the following three strategies considered by Singh and Joarder (1998):

Strategy I: Random non-response exists on both the study variable $y$ and the auxiliary variable $x$, and the population variance $S_x^2$ of the auxiliary variable is known.

Strategy II: Information on the variable $y$ could not be obtained for $r$ units, while information on the variable $x$ is available, and the population variance $S_x^2$ of the auxiliary variable is known.

Strategy III: Information on the variable $y$ could not be obtained for $r$ units, while information on the variable $x$ is obtained for all the sample units, but the population variance $S_x^2$ of the auxiliary variable is unknown.

For each strategy, we propose a number of estimators and derive their approximate bias and mean squared error. It is interesting that, under this distribution of random non-response, the exact bias and mean squared error expressions, up to the first order of approximation, exist for the proposed strategies.

2. Proposed Estimators

Strategy I: Under this strategy, Singh and Joarder (1998) proposed the following estimator of the finite population variance:

$$\hat\nu_1 = s_y^{*2}\,\frac{S_x^2}{s_x^{*2}} \tag{2.1}$$
For this strategy, we have proposed the following estimators for finite population variance:

$$t_1 = s_y^{*2}\left(\frac{S_x^2}{s_x^{*2}}\right)^{\alpha_1} \tag{2.2}$$

$$t_2 = s_y^{*2} + k_1\,(S_x^2 - s_x^{*2}) \tag{2.3}$$

$$t_3 = \frac{s_y^{*2}\,S_x^2}{\theta_1 s_x^{*2} + (1-\theta_1)S_x^2} \tag{2.4}$$
where $\alpha_1$, $k_1$ and $\theta_1$ are suitably chosen constants. The properties of these estimators are stated as theorems. The proofs of the theorems are similar. Therefore only the proof of Theorem 2.1 will be given in the appendix.

Theorem 2.1. The bias and mean squared error of $t_1$ in (2.2) are given respectively by

$$B(t_1) \approx \lambda^* S_y^2\left[\frac{\alpha_1(\alpha_1+1)}{2}\lambda_{04} - \alpha_1\lambda_{22}\right] \tag{2.5}$$

$$M(t_1) \approx \lambda^* S_y^4\left(\lambda_{40} - 2\alpha_1\lambda_{22} + \alpha_1^2\lambda_{04}\right) \tag{2.6}$$

The minimum mean squared error (2.6) of the proposed estimator $t_1$ is given by

$$M(t_1)_{\min} \approx \lambda^* S_y^4\left(\lambda_{40} - \frac{\lambda_{22}^2}{\lambda_{04}}\right) \tag{2.7}$$

The optimum value of $\alpha_1$ is $\alpha_1 = \dfrac{\lambda_{22}}{\lambda_{04}}$.
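The optimum in Theorem 2.1 comes from minimizing the quadratic (2.6) in $\alpha_1$. As a worked step, assuming $\lambda^* > 0$ and $\lambda_{04} > 0$ (conditions the paper does not state explicitly):

$$\frac{\partial M(t_1)}{\partial \alpha_1} \approx \lambda^* S_y^4\left(2\alpha_1\lambda_{04} - 2\lambda_{22}\right) = 0 \;\Longrightarrow\; \alpha_1 = \frac{\lambda_{22}}{\lambda_{04}},$$

$$M(t_1)_{\min} \approx \lambda^* S_y^4\left(\lambda_{40} - 2\,\frac{\lambda_{22}^2}{\lambda_{04}} + \frac{\lambda_{22}^2}{\lambda_{04}}\right) = \lambda^* S_y^4\left(\lambda_{40} - \frac{\lambda_{22}^2}{\lambda_{04}}\right).$$

The second derivative $2\lambda^* S_y^4\lambda_{04}$ is positive under the stated assumptions, so the stationary point is a minimum. The same argument applies to the other quadratic expressions of this section, for instance to (2.8) after writing $k_1 S_x^2 / S_y^2$ in place of $\alpha_1$.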
Theorem 2.2. The estimator $t_2$ in (2.3) is an unbiased estimator of $S_y^2$ with variance given by

$$V(t_2) = \lambda^*\left(S_y^4\lambda_{40} - 2k_1 S_x^2 S_y^2\lambda_{22} + k_1^2 S_x^4\lambda_{04}\right) \tag{2.8}$$

The minimum variance of the proposed estimator $t_2$ is given by

$$V(t_2)_{\min} = \lambda^* S_y^4\left(\lambda_{40} - \frac{\lambda_{22}^2}{\lambda_{04}}\right) \tag{2.9}$$

The optimum value of $k_1$ is $k_1 = \dfrac{S_y^2\lambda_{22}}{S_x^2\lambda_{04}}$.
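Theorem 2.3. The bias and mean squared error of $t_3$ in (2.4) are given respectively by

$$B(t_3) \approx \lambda^* S_y^2\left(\theta_1^2\lambda_{04} - \theta_1\lambda_{22}\right) \tag{2.10}$$

$$M(t_3) \approx \lambda^* S_y^4\left(\lambda_{40} - 2\theta_1\lambda_{22} + \theta_1^2\lambda_{04}\right) \tag{2.11}$$

The minimum mean squared error of the proposed estimator $t_3$ is given by

$$M(t_3)_{\min} \approx \lambda^* S_y^4\left(\lambda_{40} - \frac{\lambda_{22}^2}{\lambda_{04}}\right) \tag{2.12}$$

The optimum value of $\theta_1$ is $\theta_1 = \dfrac{\lambda_{22}}{\lambda_{04}}$.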
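To make the Strategy I estimators concrete, here is a minimal Python sketch that evaluates $\hat\nu_1$, $t_1$, $t_2$ and $t_3$ from the responding units. The function name `strategy_one_estimators`, its argument names and the toy data are assumptions made for illustration; the constants $\alpha_1$, $k_1$, $\theta_1$ are supplied by the caller (for example, the optimum values from Theorems 2.1 to 2.3).

```python
from statistics import variance


def strategy_one_estimators(y_resp, x_resp, Sx2, alpha1, k1, theta1):
    """Evaluate nu1 (2.1), t1 (2.2), t2 (2.3) and t3 (2.4).

    y_resp, x_resp: values of y and x on the (n - r) responding units.
    Sx2: known population variance of x.
    """
    sy2 = variance(y_resp)  # s_y^{*2}, divisor (n - r - 1)
    sx2 = variance(x_resp)  # s_x^{*2}, divisor (n - r - 1)
    return {
        "nu1": sy2 * Sx2 / sx2,
        "t1": sy2 * (Sx2 / sx2) ** alpha1,
        "t2": sy2 + k1 * (Sx2 - sx2),
        "t3": sy2 * Sx2 / (theta1 * sx2 + (1.0 - theta1) * Sx2),
    }


# Toy data (assumed): four responding units and a known S_x^2 of 10.
y_resp = [1.0, 2.0, 4.0, 7.0]
x_resp = [2.0, 3.0, 5.0, 10.0]
estimates = strategy_one_estimators(y_resp, x_resp, 10.0, alpha1=0.8, k1=0.3, theta1=0.8)
print(estimates)
```

Setting $\alpha_1 = 1$ or $\theta_1 = 1$ reduces $t_1$ or $t_3$ to the ratio-type estimator $\hat\nu_1$, while $\alpha_1 = 0$, $k_1 = 0$ or $\theta_1 = 0$ gives back $s_y^{*2}$.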
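Strategy II: Under this strategy, Singh and Joarder (1998) proposed the following estimator of the finite population variance:

$$\hat\nu_2 = s_y^{*2}\,\frac{S_x^2}{s_x^2} \tag{2.13}$$

Here $s_x^2 = (n-1)^{-1}\sum_{i=1}^{n}(x_i-\bar x)^2$, with $\bar x = n^{-1}\sum_{i=1}^{n}x_i$, is the sample variance of $x$ based on all $n$ sample units.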
For this case, we propose the following estimators of the finite population variance:

$$t_4 = s_y^{*2}\left(\frac{S_x^2}{s_x^2}\right)^{\alpha_2} \tag{2.14}$$

$$t_5 = s_y^{*2} + k_2\,(S_x^2 - s_x^2) \tag{2.15}$$

$$t_6 = \frac{s_y^{*2}\,S_x^2}{\theta_2 s_x^2 + (1-\theta_2)S_x^2} \tag{2.16}$$
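where $\alpha_2$, $k_2$ and $\theta_2$ are suitably chosen constants.

Theorem 2.4. The bias and mean squared error of $t_4$ in (2.14) are given respectively by

$$B(t_4) \approx \lambda S_y^2\left[\frac{\alpha_2(\alpha_2+1)}{2}\lambda_{04} - \alpha_2\lambda_{22}\right] \tag{2.17}$$

$$M(t_4) \approx S_y^4\left[\lambda^*\lambda_{40} + \lambda\left(\alpha_2^2\lambda_{04} - 2\alpha_2\lambda_{22}\right)\right] \tag{2.18}$$

The minimum mean squared error of the proposed estimator $t_4$ is given by

$$M(t_4)_{\min} \approx S_y^4\left(\lambda^*\lambda_{40} - \lambda\,\frac{\lambda_{22}^2}{\lambda_{04}}\right) \tag{2.19}$$

The optimum value of $\alpha_2$ is $\alpha_2 = \dfrac{\lambda_{22}}{\lambda_{04}}$.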
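Theorem 2.5. The estimator $t_5$ given in (2.15) is an unbiased estimator of $S_y^2$ with variance given by

$$V(t_5) = \lambda^* S_y^4\lambda_{40} + \lambda\left[k_2^2 S_x^4\lambda_{04} - 2k_2 S_x^2 S_y^2\lambda_{22}\right] \tag{2.20}$$

The minimum variance of the proposed estimator $t_5$ is given by

$$V(t_5)_{\min} = S_y^4\left(\lambda^*\lambda_{40} - \lambda\,\frac{\lambda_{22}^2}{\lambda_{04}}\right) \tag{2.21}$$

The optimum value of $k_2$ is $k_2 = \dfrac{S_y^2\lambda_{22}}{S_x^2\lambda_{04}}$.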
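Theorem 2.6. The bias and mean squared error of $t_6$ in (2.16) are given respectively by

$$B(t_6) \approx \lambda S_y^2\left(\theta_2^2\lambda_{04} - \theta_2\lambda_{22}\right)$$

$$M(t_6) \approx S_y^4\left[\lambda^*\lambda_{40} + \lambda\left(\theta_2^2\lambda_{04} - 2\theta_2\lambda_{22}\right)\right] \tag{2.22}$$

where the bias follows from the same expansion as in Theorem 2.3, with $\lambda$ in place of $\lambda^*$ for the terms involving $s_x^2$. The minimum mean squared error of the proposed estimator $t_6$ is given by

$$M(t_6)_{\min} \approx S_y^4\left(\lambda^*\lambda_{40} - \lambda\,\frac{\lambda_{22}^2}{\lambda_{04}}\right) \tag{2.23}$$

The optimum value of $\theta_2$ is $\theta_2 = \dfrac{\lambda_{22}}{\lambda_{04}}$.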
Strategy III: Under this strategy, Singh and Joarder (1998) proposed the following estimator of the finite population variance:

$$\hat\nu_3 = s_y^{*2}\,\frac{s_x^2}{s_x^{*2}} \tag{2.24}$$
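For this case, we propose the following estimators of the finite population variance:

$$t_7 = s_y^{*2}\left(\frac{s_x^2}{s_x^{*2}}\right)^{\alpha_3} \tag{2.25}$$

$$t_8 = s_y^{*2} + k_3\,(s_x^2 - s_x^{*2}) \tag{2.26}$$

$$t_9 = \frac{s_y^{*2}\,s_x^2}{\theta_3 s_x^{*2} + (1-\theta_3)s_x^2} \tag{2.27}$$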
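where $\alpha_3$, $k_3$ and $\theta_3$ are suitable constants.

Theorem 2.7. The bias and mean squared error of $t_7$ in (2.25) are given respectively by

$$B(t_7) \approx \frac{1}{2} S_y^2\left[\alpha_3^2\{(\lambda^*+\lambda)\lambda_{04} - 2\lambda\lambda_{22}\} + \alpha_3(\lambda^*-\lambda)(\lambda_{04} - 2\lambda_{22})\right] \tag{2.28}$$

$$M(t_7) \approx S_y^4\left[\lambda^*\lambda_{40} - 2\alpha_3(\lambda^*-\lambda)\lambda_{22} + \alpha_3^2\{(\lambda^*+\lambda)\lambda_{04} - 2\lambda\lambda_{22}\}\right] \tag{2.29}$$

The minimum mean squared error of the proposed estimator $t_7$ is given by

$$M(t_7)_{\min} = S_y^4\left[\lambda^*\lambda_{40} - \frac{(\lambda^*-\lambda)^2\lambda_{22}^2}{(\lambda^*+\lambda)\lambda_{04} - 2\lambda\lambda_{22}}\right] \tag{2.30}$$

The optimum value of $\alpha_3$ is $\alpha_3 = \dfrac{(\lambda^*-\lambda)\lambda_{22}}{(\lambda^*+\lambda)\lambda_{04} - 2\lambda\lambda_{22}}$.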
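Theorem 2.8. The proposed estimator $t_8$ given in (2.26) is an unbiased estimator of $S_y^2$ with variance

$$V(t_8) = S_y^4\lambda^*\lambda_{40} - 2k_3 S_x^2 S_y^2(\lambda^*-\lambda)\lambda_{22} + k_3^2 S_x^4\left[(\lambda^*+\lambda)\lambda_{04} - 2\lambda\lambda_{22}\right] \tag{2.31}$$

The minimum variance is given by

$$V(t_8)_{\min} = S_y^4\left[\lambda^*\lambda_{40} - \frac{(\lambda^*-\lambda)^2\lambda_{22}^2}{(\lambda^*+\lambda)\lambda_{04} - 2\lambda\lambda_{22}}\right] \tag{2.32}$$

The optimum value of $k_3$ is $k_3 = \dfrac{S_y^2(\lambda^*-\lambda)\lambda_{22}}{S_x^2\left[(\lambda^*+\lambda)\lambda_{04} - 2\lambda\lambda_{22}\right]}$.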
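Theorem 2.9. The bias and mean squared error of $t_9$ in (2.27) are given respectively by

$$B(t_9) \approx S_y^2\left[\theta_3\{(\lambda-\lambda^*)\lambda_{22} + \lambda(\lambda_{22}-\lambda_{04})\} + \theta_3^2\{(\lambda^*+\lambda)\lambda_{04} - 2\lambda\lambda_{22}\}\right] \tag{2.33}$$

$$M(t_9) \approx S_y^4\left[\lambda^*\lambda_{40} - 2\theta_3(\lambda^*-\lambda)\lambda_{22} + \theta_3^2\{(\lambda^*+\lambda)\lambda_{04} - 2\lambda\lambda_{22}\}\right] \tag{2.34}$$

The minimum mean squared error of the proposed estimator $t_9$ is given by

$$M(t_9)_{\min} \approx S_y^4\left[\lambda^*\lambda_{40} - \frac{(\lambda^*-\lambda)^2\lambda_{22}^2}{(\lambda^*+\lambda)\lambda_{04} - 2\lambda\lambda_{22}}\right] \tag{2.35}$$

The optimum value of $\theta_3$ is $\theta_3 = \dfrac{(\lambda^*-\lambda)\lambda_{22}}{(\lambda^*+\lambda)\lambda_{04} - 2\lambda\lambda_{22}}$.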
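The optimum constants and first-order minimum mean squared errors of this section are simple functions of $\lambda$, $\lambda^*$ and the moment ratios, so they can be evaluated directly. The Python sketch below does this for the population values reported in Section 3 ($n = 200$, $N = 8306$, $p = 0.10$); the function names are assumptions made for illustration, and the computation is a plug-in check rather than the authors' own code.

```python
def sampling_factors(n, N, p):
    """Return (lambda, lambda*) as defined in Section 1."""
    q = 1.0 - p
    lam = 1.0 / n - 1.0 / N
    lam_star = 1.0 / (n * q + 2.0 * p) - 1.0 / N
    return lam, lam_star


def optimum_constants(l04, l22, lam, lam_star):
    """Optimum alpha1 = alpha2 = theta1 = theta2, and optimum alpha3 = theta3."""
    denom = (lam_star + lam) * l04 - 2.0 * lam * l22
    return l22 / l04, (lam_star - lam) * l22 / denom


def minimum_mse(Sy2, l40, l04, l22, lam, lam_star):
    """First-order minima (2.7), (2.19) and (2.30) for Strategies I, II and III."""
    denom = (lam_star + lam) * l04 - 2.0 * lam * l22
    m1 = lam_star * Sy2**2 * (l40 - l22**2 / l04)
    m2 = Sy2**2 * (lam_star * l40 - lam * l22**2 / l04)
    m3 = Sy2**2 * (lam_star * l40 - (lam_star - lam) ** 2 * l22**2 / denom)
    return m1, m2, m3


# Population values as reported in Section 3 of the paper.
lam, lam_star = sampling_factors(200, 8306, 0.10)
alpha12, alpha3 = optimum_constants(100.501, 79.7033, lam, lam_star)
m1, m2, m3 = minimum_mse(338006.0, 255.731, 100.501, 79.7033, lam, lam_star)
print(alpha12, alpha3, m1, m2, m3)
```

The resulting constants can be compared with the values of $\alpha_1$ and $\alpha_3$ listed in Section 3, and the minima with the simulated mean squared errors in Tables 1 to 3 for $p = 0.10$.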
3. Numerical Illustration with Simulation

The estimators are compared using a real data set. The data for this illustration are taken from the Healthcare Utilization and Expenditure Survey, 2000, of the Department of Statistics (Jordan). The population under study contains 8306 households. We consider the variables $y$ and $x$, where $y$ is the monthly expenditure of the household and $x$ is the monthly income of the household. Throughout this study, the calculations are based on a simulation of 30,000 repeated samples drawn without replacement. We compute the bias and the MSE of all estimators. We assume the probabilities of non-response $p = 0.10$ and $p = 0.05$, and from the population we have $S_y^2 = 338006$, $S_x^2 = 862017$, $\mu_{20} = 338006$, $\mu_{02} = 862017$, $\lambda_{04} = 100.501$, $\lambda_{40} = 255.731$, $\lambda_{22} = 79.7033$, $\alpha_1 = \alpha_2 = \theta_1 = \theta_2 = 0.793063$ and $\alpha_3 = \theta_3 = 0.169604$.

Simulation

Simulation has been used to calculate the bias and MSE of each estimator, denoted generically by $\hat S_y^2$, using Mathematica 5. The simulation algorithm consists of the following steps:

Step 1. Select a simple random sample without replacement (SRSWOR) of size 200 from the population of size 8306.
Step 2. Select a value for the number of non-respondents, say $r$.
Step 3. Drop $r$ units randomly from the sample in Step 1.
Step 4. Find the value of the estimator based on the $(n-r)$ observations, say $\hat S_y^2$.
Step 5. Repeat Steps 1 to 4 30000 times. Thus, we obtain 30000 values of $\hat S_y^2$, say $\hat S_{y1}^2, \ldots, \hat S_{y30000}^2$.
Step 6. The bias of $\hat S_y^2$ is obtained as $B(\hat S_y^2) = \dfrac{1}{30000}\sum_{i=1}^{30000}\hat S_{yi}^2 - S_y^2$.
Step 7. The MSE of $\hat S_y^2$ is obtained as $MSE(\hat S_y^2) = \dfrac{1}{30000}\sum_{i=1}^{30000}\left(\hat S_{yi}^2 - S_y^2\right)^2$.

The following tables are obtained.
Table 1. Bias and MSE for estimators for Strategy I.

| Estimator | MSE, p = 0.10 | MSE, p = 0.05 | Bias, p = 0.10 | Bias, p = 0.05 |
|---|---|---|---|---|
| $\hat\nu_1$ | 41.2523 × 10^11 | 35.4198 × 10^11 | 305451 | 285382 |
| $t_1$ | 10.8574 × 10^11 | 9.4279 × 10^11 | 131735 | 123443 |
| $t_2$ | 1.19627 × 10^11 | 1.1308 × 10^11 | 353 | −175 |
| $t_3$ | 1.94415 × 10^11 | 1.7936 × 10^11 | 1030 | 1540 |

The estimators can be ranked according to their performance as $t_2$, $t_3$, $t_1$ and $\hat\nu_1$.
Table 2. Bias and MSE for estimators for Strategy II.

| Estimator | MSE, p = 0.10 | MSE, p = 0.05 | Bias, p = 0.10 | Bias, p = 0.05 |
|---|---|---|---|---|
| $\hat\nu_2$ | 33.2604 × 10^11 | 31.9796 × 10^11 | 253128 | 261142 |
| $t_4$ | 9.0795 × 10^11 | 8.6695 × 10^11 | 109467 | 113115 |
| $t_5$ | 1.2375 × 10^11 | 1.1495 × 10^11 | 518 | −154 |
| $t_6$ | 1.8889 × 10^11 | 1.7725 × 10^11 | 1021 | 1485 |

The estimators can be ranked according to their performance as $t_5$, $t_6$, $t_4$ and $\hat\nu_2$.

Table 3. Bias and MSE for estimators for Strategy III.

| Estimator | MSE, p = 0.10 | MSE, p = 0.05 | Bias, p = 0.10 | Bias, p = 0.05 |
|---|---|---|---|---|
| $\hat\nu_3$ | 4.2418 × 10^11 | 2.5626 × 10^11 | 31885 | 14638 |
| $t_7$ | 1.5698 × 10^11 | 1.4950 × 10^11 | −48 | 102 |
| $t_8$ | 1.5686 × 10^11 | 1.4957 × 10^11 | 77 | 189 |
| $t_9$ | 1.5592 × 10^11 | 1.4931 × 10^11 | −1748 | −304 |

The estimators can be ranked according to their performance as $t_9$, $t_8$, $t_7$ and $\hat\nu_3$. It is observed that the proposed estimators have smaller mean squared errors and smaller biases than the estimators of Singh and Joarder (1998).
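The simulation steps of Section 3 can be sketched in Python for Strategy I. This is only an illustration under stated assumptions: the survey data are not reproduced here, so a synthetic population is generated by the hypothetical helper `make_population`; the number of non-respondents is drawn from the distribution $P(r)$ of Section 1 (the paper's Step 2 does not spell this out); only $\hat\nu_1$ and $t_2$ are evaluated; and the sizes are much smaller than the paper's $N = 8306$, $n = 200$ and 30000 replications.

```python
import random
from math import comb
from statistics import mean, variance


def make_population(N, rng):
    """Hypothetical synthetic population: skewed x, and y loosely related to x."""
    xs = [rng.expovariate(1.0) for _ in range(N)]
    ys = [0.3 * x + rng.gauss(0.0, 1.0) for x in xs]
    return ys, xs


def optimum_k1(pop_y, pop_x):
    """k1 = S_y^2 lambda_22 / (S_x^2 lambda_04), written in terms of the mu_ls."""
    N = len(pop_y)
    ybar, xbar = mean(pop_y), mean(pop_x)

    def mu(l, s):
        return sum((y - ybar) ** l * (x - xbar) ** s for y, x in zip(pop_y, pop_x)) / (N - 1)

    return (mu(2, 2) - mu(2, 0) * mu(0, 2)) / (mu(0, 4) - mu(0, 2) ** 2)


def simulate(pop_y, pop_x, n, p, k1, reps, seed=0):
    """Return {name: (bias, mse)} for nu1 and t2 under Strategy I."""
    rng = random.Random(seed)
    N = len(pop_y)
    Sy2, Sx2 = variance(pop_y), variance(pop_x)
    q = 1.0 - p
    # Unnormalized P(r) for r = 0..n-2; random.choices normalizes the weights.
    weights = [(n - r) * comb(n - 2, r) * p**r * q ** (n - 2 - r) for r in range(n - 1)]
    values = {"nu1": [], "t2": []}
    for _ in range(reps):
        sample = rng.sample(range(N), n)                    # Step 1: SRSWOR of size n
        r = rng.choices(range(n - 1), weights=weights)[0]   # Step 2: draw r
        resp = rng.sample(sample, n - r)                    # Step 3: drop r units at random
        sy2 = variance([pop_y[i] for i in resp])            # Step 4: evaluate estimators
        sx2 = variance([pop_x[i] for i in resp])
        values["nu1"].append(sy2 * Sx2 / sx2)
        values["t2"].append(sy2 + k1 * (Sx2 - sx2))
    # Steps 6 and 7: bias and MSE over the replications.
    return {
        name: (mean(v) - Sy2, mean((e - Sy2) ** 2 for e in v))
        for name, v in values.items()
    }


rng = random.Random(1)
pop_y, pop_x = make_population(2000, rng)
result = simulate(pop_y, pop_x, n=50, p=0.10, k1=optimum_k1(pop_y, pop_x), reps=1500, seed=2)
for name, (bias, mse) in result.items():
    print(name, bias, mse)
```

Other estimators can be added by appending their values inside the replication loop; Strategies II and III would additionally need $s_x^2$ computed from the full sample before units are dropped.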
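APPENDIX

Proof of Theorem 2.1. Define

$$\varepsilon = \frac{s_y^{*2}}{S_y^2} - 1, \qquad \delta = \frac{s_x^{*2}}{S_x^2} - 1 \quad\text{and}\quad \eta = \frac{s_x^2}{S_x^2} - 1;$$

then

$$E(\varepsilon) = E(\delta) = E(\eta) = 0, \quad E(\varepsilon^2) = \lambda^*\lambda_{40}, \quad E(\delta^2) = \lambda^*\lambda_{04}, \quad E(\eta^2) = \lambda\lambda_{04},$$

$$E(\varepsilon\delta) = \lambda^*\lambda_{22}, \quad E(\varepsilon\eta) = \lambda\lambda_{22} \quad\text{and}\quad E(\delta\eta) = \lambda\lambda_{22}.$$

Using Taylor's theorem, the estimator $t_1$ can be written as

$$t_1 = S_y^2(1+\varepsilon)(1+\delta)^{-\alpha_1} \approx S_y^2(1+\varepsilon)\left(1 - \alpha_1\delta + \frac{\alpha_1(\alpha_1+1)}{2}\delta^2\right) \approx S_y^2\left(1 + \varepsilon - \alpha_1\delta - \alpha_1\varepsilon\delta + \frac{\alpha_1(\alpha_1+1)}{2}\delta^2\right).$$

Then the first order bias of $t_1$ is given by

$$B(t_1) = E(t_1 - S_y^2) \approx S_y^2\,E\left(\varepsilon - \alpha_1\delta - \alpha_1\varepsilon\delta + \frac{\alpha_1(\alpha_1+1)}{2}\delta^2\right) = S_y^2\left(-\alpha_1 E(\varepsilon\delta) + \frac{\alpha_1(\alpha_1+1)}{2}E(\delta^2)\right)$$

$$= S_y^2\left(-\alpha_1\lambda^*\lambda_{22} + \frac{\alpha_1(\alpha_1+1)}{2}\lambda^*\lambda_{04}\right) = \lambda^* S_y^2\left[\frac{\alpha_1(\alpha_1+1)}{2}\lambda_{04} - \alpha_1\lambda_{22}\right].$$

The first order mean squared error of $t_1$ is given by

$$M(t_1) = E(t_1 - S_y^2)^2 \approx S_y^4\,E(\varepsilon - \alpha_1\delta)^2 = S_y^4\left[E(\varepsilon^2) - 2\alpha_1 E(\varepsilon\delta) + \alpha_1^2 E(\delta^2)\right]$$

$$= S_y^4\left(\lambda^*\lambda_{40} - 2\alpha_1\lambda^*\lambda_{22} + \alpha_1^2\lambda^*\lambda_{04}\right) = \lambda^* S_y^4\left(\lambda_{40} - 2\alpha_1\lambda_{22} + \alpha_1^2\lambda_{04}\right).$$

The optimum value of $\alpha_1$ that minimizes the mean squared error is $\alpha_1 = \dfrac{\lambda_{22}}{\lambda_{04}}$. Substituting the optimum value of $\alpha_1$ in equation (2.6), we get

$$M(t_1)_{\min} \approx \lambda^* S_y^4\left(\lambda_{40} - \frac{\lambda_{22}^2}{\lambda_{04}}\right).$$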
The other theorems can be proved in a similar way.

Acknowledgement

We would like to thank the referees for their valuable comments, which led to improvements in our paper.

References

[1] Singh, S. and Joarder, A. H., Estimation of finite population variance in the presence of random non-response, Metrika, Vol. 47, pp. 241-249, 1998.
[2] Healthcare Utilization and Expenditure Survey (2000), Department of Statistics, Ministry of Planning, Jordan.
Authors' Information

M. S. Ahmed is currently an assistant professor in the Department of Mathematics and Statistics, Sultan Qaboos University, Oman. He received his Ph.D. in Statistics from Aligarh Muslim University, India. His research interests are survey design, experimental design and statistical quality control.
Department of Mathematics and Statistics, Sultan Qaboos University, P.O. Box 36, Al-Khodh PC 123, Muscat, Sultanate of Oman.
E-mail: [email protected]
Tel: 00968 24413333 Ext. 3117 (W), 00968 24541420 (R)
Omar Al-Titi is a research student in the Department of Statistics, Yarmouk University, Jordan. He received his M.Sc. in Statistics from Yarmouk University, Jordan. His research interests are survey design and multivariate analysis.
Department of Statistics, Yarmouk University, Irbid, Jordan.
E-mail: [email protected]

Ziad Al-Rawi is currently a professor in the Department of Statistics, Yarmouk University, Jordan. He received his Ph.D. in Biostatistics from the University of Pittsburgh, U.S.A. His research interests are applied multivariate analysis, design of experiments, survival function and distribution, and bioassay.
Department of Statistics, Yarmouk University, Irbid, Jordan.
E-mail: [email protected]
Tel: 00962 2 7271100 Ext. 2331 (Off.), 3485 (Res.)
Walid Abu-Dayyeh is currently a professor in the Department of Statistics, Yarmouk University, Jordan. He received his Ph.D. in Statistics from the University of Illinois, U.S.A. His research interests are asymptotic efficiency, ranked set sampling, and testing under restricted alternatives.
Department of Statistics, Yarmouk University, Irbid, Jordan.
E-mail: [email protected]
Tel: 0096 22 7271100 Ext. 2340 (W), 3599 (H)