Feb 15, 2018 - Keywords: Double Sampling for Ratio Estimation, Domain Mean, Auxiliary .... ratio estimation with non-linear cost function with a random.
Science Journal of Applied Mathematics and Statistics 2018; 6(1): 28-42 http://www.sciencepublishinggroup.com/j/sjams doi: 10.11648/j.sjams.20180601.14 ISSN: 2376-9491 (Print); ISSN: 2376-9513 (Online)
Domain Mean Estimation Using Double Sampling with Non-Linear Cost Function in the Presence of Non Response Alila David Anekeya1, *, Ouma Christopher Onyango2, Nyongesa Kennedy1 1
Department of Mathematics Masinde Muliro University of Science and Technology, Kakamega, Kenya
2
Department of Statistics and Actuarial Science Kenyatta University, Nairobi, Kenya
Email address: *
Corresponding author
To cite this article: Alila David Anekeya, Ouma Christopher Onyango, Nyongesa Kennedy. Domain Mean Estimation Using Double Sampling with Non-Linear Cost Function in the Presence of Non Response. Science Journal of Applied Mathematics and Statistics. Vol. 6, No. 1, 2018, pp. 28-42. doi: 10.11648/j.sjams.20180601.14 Received: December 17, 2017; Accepted: January 5, 2018; Published: February 15, 2018
Abstract: This paper describes theoretical estimation of domains mean using double sampling with a non-linear cost function in the presence of non-response. The estimation of domain mean is proposed using auxiliary information in which the study and auxiliary variable suffers from non-response in the second phase sampling. The expression of the biases and mean square errors of the proposed estimators are obtained. The optimal stratum sample sizes for given set of non-linear cost function are developed. Keywords: Double Sampling for Ratio Estimation, Domain Mean, Auxiliary Variable, Non-Linear Cost Function and Non-Response
1. Introduction 1.1. Domains Domain is a subgroup of the whole target population of the survey for which specific estimates are needed. In sampling, estimates are made in each of the class into which the population is subdivided; for instance, the focus may not only be the unemployment rate of the entire population but also the break-down by age, gender and education level. Units of domains may sometimes be identified prior to sampling. In such cases, the domains can be treated as separate stratum from a specific sample taken. Stratification ensures a satisfactory level of representativeness of the domains in the final sample. These domains are called planned domains. 1.2. Domain Estimation Consider a finite population under study U of size N divided into D domains; U1 , U 2 ,..., U D respectively.
Domain membership of any population unit is unknown before sampling. Its assumed that the domains are quite large and for a typical d th domain U d several characteristics maybe defined as described by Gamrot [4]. This includes; Y = yd k Domain total; U d
∑ ud
1
Domain mean; YU d = N d
∑y
dk
ud
∑(
1 2 Yd − Yud Domain variance; SU d (Y ) = N − 1 d k∈u
)
2
d
Domain Covariance between two characters X and Y is given by; CovU d =
∑(
1 xd k − X u d N d − 1 k∈u
)( y
dk
− Yud
)
d
According to Meeden [7] domain can be estimated by use
Science Journal of Applied Mathematics and Statistics 2018; 6(1): 28-42
of a non-informative Bayesian approach where a polya posterior is used on finite population that has little or no prior information about the population. Although a prior distribution is not specified there is a posterior distribution which may be used to make inferences. Udofia [12] proposed estimate of domains using double sampling for probabilities proportional to size (PPS) with known constituent domain. The assumptions proposed by Udofia [12] are; (i) The size of auxiliary variable X is not known. (ii) The distribution of the variable ( Z ) that defines the domain is the not known prior and therefore the population
( )
size N h j of the domain is also known. (iii) The cost of measuring the variable X and Z in each stratum is much lower than that of measuring of the study variable Y . Aditya et al. [1] developed a method of estimating domain total for unknown domain size in the presence of nonresponse with a linear cost function using two-stage sampling design. In this method the response mechanism is assumed to be deterministic. 1.3. Double Sampling in the Presence of Auxiliary Information In many sampling procedures the prior knowledge about the population mean of the auxiliary variable is required. If there is no such information, it’s easier and cheaper to take on the large initial sample from which the auxiliary variable is measured and from which the estimation of the population parameters like the total, mean or the frequency distribution of the auxiliary variable X is made. Srivastava [11] proposed a large class of ratio and product estimators in double sampling. It was found that the asymptotic minimum variance for any estimator of this class is equal to that which is generally believed to be linear regression estimators. According to Sahoo and Panda [10] if an experimenter knows the population mean of an additional auxiliary variable, say, Z whereas the population mean of an auxiliary variable X is unknown and can be estimated using double sampling scheme, it is possible to come up with a class of estimators for the finite population mean µY . 1.4. Double Sampling for the Ratio Estimator in the Presence of Non-Response Hansen and Hurwitz [5] proposed a way of dealing with non-response to address the bias problem. In this case, when dealing with non-response, a sub-sample is taken from the non-respondents to get an estimate of the sub-populations represented by the non-respondents. Cochran [2] employed Hansen and Hurwitz [5] technique and proposed ratio and regression estimation of the population mean of the study variables where the auxiliary variable information is obtained from all the sample units with some of the sample units failing to supply information on the study variable. According to Oh and Scheuren [8] and Kalton and Karsprzyk
29
[6], non-response is often compensated by weighting adjustment and imputation respectively. In these methods it was argued that the procedure used in weighting adjustment and imputation aimed at eliminating the bias due to nonresponse. Okafor and Lee [9] employed the double sampling method to estimate the mean of the auxiliary variable and went ahead to estimate the mean of the study variable in a similar way as Cochran [2]. In this method double sampling for ratio and regression estimation was considered. The distribution of the auxiliary information was not known and hence the the first phase sample was used to estimate the population distribution of the auxiliary variable while the second phase was used to obtain the required information on the variable of the interest. The optimum sampling fraction for the estimators for a fixed cost was derived. Performances of the proposed estimators were computed and compared with those of Hansen and Hurwitz [5] estimators without considering the cost. It was noted that for the results for which cost component was not considered, regression estimator functions were consistent than the Hansen and Hurwitz [5] estimator. Chaudhary and Kumar [3] proposed a method of estimating mean of a finite population using double sampling scheme under non-response. The proposed model was based on the fact that both the study and auxiliary variable suffered from the non-response with the information of X not available. Hence the estimate of X at first phase is given by,
x/ =
n1/ xn/1 + n2/ xh/2 n/
(1)
With the corresponding variance of,
( ) = n1 − N1 S
V x/
*
/
2 x
L/ − 1 + / W2 S x22 n
(2)
Where xn/1 and xh/2 are means from the n1/ responding units and n2/ non-responding units respectively. Sx2 and S x22 are mean square errors of the entire group and nonresponding respectively with L/ as the inverse sampling rate at first phase of the sampling. From the previous studies, a number of researchers have considered a linear cost function when estimating domains. In dealing with non-response most of them have considered subsampling while holding to the idea that the response mechanism is deterministic. This study therefore focuses on the estimation of domain mean using double sampling for ratio estimation with non-linear cost function with a random response mechanism. In this study we therefore establish an efficient and cost effective method of estimating domains when the travel cost component is inclusive and it is not linear. The problem of minimum variance and cost is addressed while considering non-linear cost function and optimal sample size.
30
Alila David Anekeya et al.: Domain Mean Estimation Using Double Sampling with Non-Linear Cost Function in the Presence of Non Response
2. Estimation of Domain Mean and Variance in the Presence of NonResponse 2.1. Developing Domain Concept Theory with NonResponse The problem of non-response is inherent in many surveys. It always persists even after call-backs. The estimates obtained from incomplete data will be biased especially when the respondents are different from the non-respondents. The non-response error is not so important if the characteristics of the non-responding units are similar to those of the responding units. However, such similarity of characteristics between two types of units (responding and non-responding) is not always attainable in practice. In double sampling when the problem of non-response is present, the strata are virtually divided into two disjoint and exhaustive groups of respondents and non-respondents. A sub-sample from nonresponding group is then selected and a second more extensive attempt is made to the group so as to obtain the required information. Hansen and Hurwitz [5] proposed a technique of adjusting the non-response to address the problem of bias. The technique consists of selecting a subsample of the non-respondents through specialized efforts so as to obtain an estimate of non-responding units in the population. This sub-sampling procedure albeit costly, it’s free from any assumption hence, one does not have to go for a hundred percent response which can be substantially more expensive. In developing the concept of domain theory with nonresponse the following assumptions are made; i. Both the domain study and auxiliary variables suffers from non-response. ii. The responding and non-responding units are the same for the study and auxiliary characters. iii. The information on the domain auxiliary variable X d is not known and hence X d is not available. iv. The domain auxiliary variables do not suffer from nonresponse in the first phase sampling but suffers from non-response in the second phase of sampling. 2.2. Proposed Domain Estimators Let U be a finite population with N known first stage units. The finite population is divided into D domains; U1 , U 2 ,..., U D of sizes N1 , N 2 ,..., N d ,..., N D respectively. Further, let U d be the domain constituents of any population size N d which is assumed to be large and known. Let U and N be defined as, D
U = ∪ U d and N = d =1
D
∑N
d
respectively.
d =1
Let Yd and X d be the domain study and auxiliary variables respectively. Further, let Yd and X d be their respective domain population means and auxiliary means
ydi ( i = 1, 2,3,..., N d )
with
xdi ( i = 1, 2,..., N d )
and
observations on the ith unit. In estimating the domain auxiliary population mean X d double sampling design is used. A large first phase sample of size n / is selected from N units of the population by simple random sampling without replacement (SRSWOR) design from which nd/ out n / first sample units falling in the d th domain. The assumption here is that all the n / units supply information of the auxiliary variable X d at first phase. A smaller second phase sample of size n is selected from n / by SRSWOR from which nd out of n second phase sample units fall in the d th domain. For estimating the domain population mean X d of the auxiliary variables X d from a large first phase sample of
(
size nd/ , values of the observations xd/ i i = 1, 2,3,..., nd/
) are
obtained and a sample auxiliary domain mean xd/ is computed. From the second sample of size nd , let ydi and
xdi be the domain study and auxiliary observations with
( i = 1, 2,3,..., nd ) .
Let ndi units supply the information on
ydi and xdi respondents while nd 2 be the non-respondents for both the study and the auxiliary domain variables respectively such that,
nd = nd1 + nd 2 . For the nd 2 non-respondent group at the second phase sampling, an SRSWOR of rd2 units is selected with an inverse sampling rate of vd2 such that,
rd2 =
nd2
, With vd2 > 1
vd2
All the rd2 units respond after making extra efforts of subsampling nd 2 non-responding units. In developing the framework of double sampling there are two strata that are non-overlapping and disjoint. Stratum one consist of those units that will respond in the first attempt of the second phase population made up of N d1 units and stratum two consist of those units that would not respond in the first attempt of phase two with domain population units N d2 = N d − N d1 . Both N d and N d2 units are not known in advance. The stratum weights of the responding and non- responding 1
groups are defined by
Wd1 =
N d1 Nd
their estimators defined by respectively.
and
Wd2 =
Wˆd1 = wd1 =
nd1 nd
N d2 Nd
respectively with
and Wˆd2 = wd2 =
nd 2 nd
Science Journal of Applied Mathematics and Statistics 2018; 6(1): 28-42
Following the Hansen and Hurwitz [5] techniques, the unbiased estimator for estimating the domain population
(
)
mean using nd1 + rd 2
iv) xrd = 2
observations on ydi domain study
yd =
nd
yd1 +
nd2
yrd
nd
2
(3)
xd =
nd
xd1 +
nd
xrd
E xd/ = E xd = X d , E ( yd ) = Yd (4)
observation ydi and xdi respectively.
The expression for the Mean square error (MSE) of Yd R
1
The following sample characteristics are defined when estimating domain mean, 1 nd1
and Yd R
i =1
di
approximation. Let
-domain mean of the study character
from the response group based on nd1 units ii) yrd = 2
1 rd 2
rd 2
∑y j =1
dj
1 nd1
ε d0 =
yd − Yd ⇒ yd = Yd ε d0 + 1 Yd
ε d1 =
xd − X d ⇒ xd = X d ε d1 + 1 Xd
-domain mean of the study character
for the non-responding group of rd2 respondent units iii) xd1 =
are derived by the use of the Taylor's series
2
nd1
∑y
(
nd1
∑x i =1
di
(5)
3. Bias and Mean Square Error of the Ratio Estimator
Where yd and xd are the sample domain means for the
yd1 =
-domain mean of the auxiliary
With the assumption that,
2
= wd1 xd1 + wd 2 xrd2
i)
dj
j =1
y y ˆ ˆ Yd R = d .xd/ = rd xd/ and Yd 2 = d/ .xd = rd xd 1 xd xd
Similarly the estimate for domain auxiliary variable is given by;
nd2
∑x
respondent units In estimating the overall domain population mean in the presence of non-response, double sampling ratio estimation of the domain mean is used. Define;
= wd1 yd1 + wd 2 yrd2
nd1
rd2
character for the non-responding group of rd2
character is given by;
nd1
1 rd2
31
ε d2 =
-domain mean of the auxiliary
(
)
xd/ − X d ⇒ xd = X d ε d0 + 1 Xd
(
( )
character from the response group based on nd1 units
)
( )
)
(6)
( ) =0
With the assumption that E ε d0 = E ε d1 = E ε d 2 Further define;
y − Y 2 d E ε d20 = E d Yd
( )
(
y −Y d d = E 2 Yd =
1 Yd2
)
2
= 1 E y − Y 2 d Y2 d d
Var ( yd )
(
)
(
)
=
1 V E E yd / nd/ + E1V2 E3 ( yd / nd ) + E1 E2V3 yd / rd2 2 1 2 3 Yd
=
vd − 1 1 1 1 2 1 1 − − / S y2d + 2 Wd 2 S y2d S yd + 2 / 2 Yd nd N d nd nd nd
(7)
32
Alila David Anekeya et al.: Domain Mean Estimation Using Double Sampling with Non-Linear Cost Function in the Presence of Non Response
vd − 1 1 1 2 1 1 Wd C y2 − / C y2d + 2 C yd + = / − 2 d2 N n n n n d d d d d Where, S y2d = Variance of the whole domain population mean of the study variable Yd
S y2d =Variance of the domain population mean for the 2
stratum of non-respondents for Stratum of non-respondents for the study variable Yd Consider also
the auxiliary variable X d
vd2 = The inverse sampling rate
S x2d = Variance of the domain population mean for the 2
stratum of non-respondents for Stratum of non-respondents for the auxiliary variable X d 2
xd/ − X d 2 1 = 2 E xd − X d Xd X d
( )
Next consider E ε d22 = E
2
E ε12
x − Xd 2 1 = E d = 2 E xd − X d Xd Xd =
=
1 X d2
=
Var ( xd )
=
(
)
1 V1 E2 ( xd / nd ) + E1 V2 xd / rd2 X d2 2
1 1 S xd = − 2 + Wd 2 nd N d X d
(8)
( )
Var xd/
1 N d − nd/ X d2 N d
S x2d / nd
2 1 1 S xd = / − n 2 d Nd X d
2
vd 2 − 1 S xd2 2 nd X d
1 X d2
(9)
Consider,
Where, S x2d = Variance of the whole domain population mean of
y − Yd xd − X d E ε d0 ε d1 = E d Yd X d
(
)( xd − X d )
=
1 E yd − Yd X d Yd
=
1 Cov ( xd yd ) X d Yd
=
1 1 1 Cov E ( yd / nd ) E ( xd / nd ) + E Cov ( yd xd ) / nd + E Cov ( yd xd ) / rd2 X d Yd X d Yd X d Yd
=
1 1 E Cov ( yd xd ) / nd + E Cov ( yd xd ) / rd2 X d Yd X d Yd
S xd S yd S xd S yd vd 2 − 1 1 1 2 2 = − + ρ Wd 2 ρ xd yd x y d d n 2 2 n N X Y X Y d d d d d d d Next, y − Y d E ε d0 ε d2 = d Y d
xd/ − X d Xd
(10)
Science Journal of Applied Mathematics and Statistics 2018; 6(1): 28-42
(
33
) ( xd/ − X d )
=
1 E yd − Yd X d Yd
=
1 Cov yd xd/ X d Yd
=
1 1 1 Cov E yd / nd/ , E xd / nd/ + E Cov yd xd/ / nd/ + E Cov yrd xrd / nd/ 2 2 X d Yd X d Yd X d Yd
(
)
(
) (
)
(
(
)
)
S x S yd 1 1 = / − ρ xd yd d n X d Yd d Nd
(11)
Consider, x − X d E ε d1 ε d 2 = E d X d
=
=
1 X d2
xd/
(
E1 E2 xd − X d
1 X d2
(
E1 xd/ − X d
)
2
− Xd Xd
y ˆ Yd R = d .xd/ = rd xd/ 1 xd
(
) ( xd/ − X d ) / nd/
2 1 1 S xd = / − 2 n d Nd X d
(
(12)
(
)
)
)
(
)
= Yd 1 + ε d0 + ε d2 + ε d0 ε d 2 − ε d1 − ε d0 ε d1 − ε d1 ε d2 + ε d21 (13)
2
The ratio estimator of Yˆd R and Yˆd R can be defined as; 1
)(
1+ εd 1+ εd 0 2 = Yd 1 + ε d1
ˆ ˆ 3.1. The Bias of the Ratio Estimator Yd R and Yd R 1
) ( ( )
Yd 1 + ε d X d 1 + ε d 0 2 = X d 1 + ε d1
ˆ 3.1.1. Bias of Ratio Estimator Yd R
2
1
y y ˆ ˆ Yd R = d .xd/ = rd xd/ and Yd 2 = d/ .xd = rd xd respectively 1 xd xd
Proposition 1
ˆ The bias of the ratio estimator Yd R is given by, 1
Define Yˆd R as 1
1 vd − 1 1 Yd − / Cxd C xd − ρ xd yd c yd + 2 Wd 2 Cxd2 Cxd2 − ρ xd2 yd2 Cxd2 n nd d nd
(
(
)
)
Where,
Cxd =
S xd Xd
, C yd =
S yd Yd
, C xd = 2
S xd
2
Xd
and C yd = 2
S yd
2
Yd
S y2d = Variance of the whole domain population mean of the study variable Yd
S y2d =Variance of the domain population mean for the stratum of non-respondents for Stratum of non-respondents for the 2
study variable Yd
S x2d = Variance of the whole domain population mean of the auxiliary variable X d
vd2 = The inverse sampling rate
S x2d = Variance of the domain population mean for the stratum of non-respondents for Stratum of non-respondents for the 2
auxiliary variable X d Proof
(
)
2 ˆ Yd R = Yd 1 + ε d0 + ε d2 + ε d0 ε d2 − ε d1 − ε d0 ε d1 − ε d1 ε d2 + ε d1 1
34
Alila David Anekeya et al.: Domain Mean Estimation Using Double Sampling with Non-Linear Cost Function in the Presence of Non Response
( ) = Y (1 + E (ε
ˆ E Yd R
d
1
(
) + E (ε ) + E (ε
d0
d2
(
) (
d0 ε d 2
) − E (ε ) − E ( ε d1
d 0 ε d1
) − E (ε
d1 ε d 2
) + E (ε d2 ) ) 1
) ( ))
) (
= Yd 1 + E ε d0 ε d2 − E ε d0 ε d1 − E ε d1 ε d 2 + E ε d21 1 1 vd 2 − 1 1 1 1 1 2 = Yd 1 + / − − ρ xd yd C xd C yd − C xd Wd2 ρ xd2 yd2 C xd2 C yd2 − / − ρ xd yd C xd C yd − nd N d nd nd N d nd N d 1 1 2 vd 2 − 1 + − W C2 C xd + d 2 xd2 n N n d d d 1 vd − 1 1 W C 2 − ρ xd yd C xd C yd = Yd 1 + − / C x2d − ρ xd yd C xd C yd + 2 n d2 xd 2 2 2 2 nd nd d
(
(
)
( )
)
1 vd − 1 1 ˆ 2 − / C x2d − ρ xd yd C xd C yd + 2 E Yd R − Yd = Yd Wd 2 C xd − ρ xd2 yd2 C xd2 C yd2 1 n nd nd d
(
(
)
1 vd − 1 1 2 2 Hence Bias of Yˆd R = Yd − / Cxd − ρ xd yd Cxd C yd + 2 Wd 2 Cxd − ρ xd2 yd2 Cxd2 C yd2 1 n nd nd d
(
(
)
)
)
ˆ 3.1.2. Bias of Ratio Estimator Yd R
2
Proposition 2 S xd S yd S S 1 ˆ 1 2 2 The bias of the ratio estimator Yd R is given by, Yd − ρ x y xd yd + Wd ρ x y / d d 2 d d 2 2 2 X n X Y Y n d d d d d d Proof yd ˆ Define Yˆd R as Yd R2 = / .xd = rd xd 2 xd
(
) (
Yd 1 + ε d X d 1 + ε d 0 1 = X d 1 + ε d2
(
(
)(
)
1+ εd 1+ εd 0 1 = Yd 1 + ε d2
(
)
)
)
(
)
= Yd 1 + ε d0 + ε d1 + ε d0 ε d1 − ε d2 − ε d0 ε d2 − ε d1 ε d2 + ε d22
(
)
2 ˆ Yd R = Yd 1 + ε d0 + ε d1 + ε d0 ε d1 − ε d2 − ε d0 ε d2 − ε d1 ε d2 + ε d2 2
( ) = Y (1 + E (ε
ˆ E Yd R
1
d
(
d0
(
) + E (ε ) + E (ε d1
) (
d 0 ε d1
) (
) − E ( ε ) − E (ε d2
d0 ε d 2
) − E (ε
d1 ε d 2
) + E (ε ) ) 2 d2
) ( ))
= Yd 1 + E ε d0 ε d1 − E ε d0 ε d 2 − E ε d1 ε d2 + E ε d22 2 1 S xd S yd 1 Sx S y 1 S x S y vd − 1 1 1 1 S xd2 2 2 = Yd 1 + − ρ xd yd d d − / − ρ xd yd d d + 2 W ρ − − d x y n 2 nd N d X d Yd n d N d X d Yd nd 2 d2 d2 X d Yd d Nd X d 2
1 1 S xd2 + − 2 nd N d X d
Science Journal of Applied Mathematics and Statistics 2018; 6(1): 28-42
35
S xd S yd 1 S x S yd vd 2 − 1 1 2 2 − / ρ xd yd d + Wd2 ρ xd yd Hence Bias of Yˆd R = Yd 2 2 X 2 X d Yd nd nd n d d Yd ˆ 3.2. Mean Square Error (MSE) of the Ratio Estimator Yd R 1 ˆ Sd2R = S y2d + S x2d Rd2 − 2 ρ xd yd Rd S xd S yd and Yd R 1 2
The ratio estimator of Yˆd R and YˆdR can be defined as; 1
Sd2R = S y2d + Rd2 S x2d − 2ρ xd
2
2
y y ˆ ˆ Yd R = d .xd/ = rd xd/ and Yd 2 = d/ .xd = rd xd respectively 1 xd xd Proposition 3 The mean square error (MSE) of the estimator defined by y ˆ Yd R = d .xd/ = rd xd/ is given by; 1 xd
1 1 2 1 1 − / / − S yd + N n d nd d nd
2
2
2
yd2 Rd S xd2 S yd2
With the notations defined as in preposition 1 above and Rd =Population ratio of Yd to X d Proof By definition,
( )
ˆ MSE Yd R
2 vd − 1 / Sd R + 2 Wd 2 Sd R2 1 n d
1
2 y ˆ = E Yd R − Yd = E d .xd/ − Yd 1 xd
Substituting the values of equations (5) we obtain
Where,
(
) ( ( )
(
)(
)
Yd 1 + ε d X d 1 + ε d 0 2 = E − Yd X d 1 + ε d1
)
1+ εd 1+ εd 0 2 = Yd2 E − 1 1 + ε d1
(
)
2
2
)(
(
)
= Yd2 E ε d2 + ε d0 + ε d0 ε d2 − ε d1 1 − ε d1 + ε d21 + ...
(
= Yd2 E ε d + ε d − ε d 0 2 1
)
2
2
( ) ( ) ( )
(
)
(
)
(
)
2 2 2 2 = Yd E ε d0 + E ε d1 + E ε d2 + 2 E ε d0 ε d2 − 2 E ε d0 ε d1 − 2 E ε d1 ε d2 2 2 S y2d 1 S yd 1 1 S yd vd 2 − 1 2 1 2 Y − + − + W = d / d nd N d Yd2 nd nd/ Yd2 nd 2 Yd2
2
2 2 1 vd − 1 S xd2 1 1 S xd 1 S xd − + + + Wd 2 2 − 2 / 2 n X 2 nd N d X d nd N d X d d d
S x S yd S xd S yd 1 1 1 1 − 2 − ρ xd yd d +2 / − ρ xd yd X d Yd X d Yd nd N d nd N d 2 S xd S yd vd − 1 1 1 S xd 2 2 −2 2 ρ W − 2 − n xd2 yd2 d 2 X Y n/ N d X 2 d d d d d
1 1 1 2 1 1 1 2 2 = / − − / S y2d + / − S yd + S xd Rd n d Nd nd nd nd N d
]
2
36
Alila David Anekeya et al.: Domain Mean Estimation Using Double Sampling with Non-Linear Cost Function in the Presence of Non Response
vd 2 − 1 2 2 2 Wd 2 S yd + Rd S xd − 2 ρ xd2 yd2 Rd S xd2 S yd2 n d
1 1 2 2 − − S R n / N xd d + d d
1 1 1 1 − ρ xd yd Rd S xd S yd − 2 ρ xd yd Rd S xd S yd +2 / − nd N d nd N d 1 1 2 1 1 = / − − / S y2d + S x2d Rd2 − 2 ρ xd yd Rd S xd S yd S yd + n d Nd nd nd
{
}
vd − 1 2 2 2 + 2 Wd 2 S yd + Rd S xd − 2 ρ xd2 yd2 Rd S xd2 S yd2 n d 1 vd − 1 1 2 1 1 2 = / − S yd + − / Sd2R + 2 Wd 2 Sd R2 n 1 nd d Nd nd nd Proposition 4 The mean square error (MSE) of the ratio estimator
y ˆ YdR = / .xd 2 xd
1
is given by;
nd/
−
1 Nd
2 1 1 − / S yd + n d nd
/2 vd − 1 /2 S d R + 2 Wd 2 S d R2 1 nd
Where,
S /2d R = S y2d + S x2d Rd2 + 2ρ xd yd Rd S xd S yd 1
/
Sd R2 = S y2d + Rd2 S x2d + 2 ρ xd 2
2
2
2
yd2
Rd S xd S yd 2
2
With the notations as defined in proposition 1 above Proof
( )
MSE of Yˆd R = MSE Yˆd R
2
2
ˆ = E Yd R − Yd 2
y = E d .x d/ − Yd xd = E Yd
(
2
2
)(
)
1+ ε d 1+ εd 0 1 − 1 1 + ε d2
(
)
2
)(
(
)
= Yd2 E ε d + ε d + ε d ε d − ε d 1 − ε d + ε d2 + ... 0 1 0 2 2 2 1
(
= Yd2 E ε d + ε d − ε d 0 1 2
)
2
2
( ) ( ) ( )
(
)
(
) (
)
2 2 2 2 = Yd E ε d0 + E ε d1 + E ε d2 + 2 E ε d0 ε d1 − 2 E ε d0 ε d2 − 2 ε d1 ε d2 2 1 1 S yd 1 1 = Yd2 / − + − nd N d Yd2 nd nd/
S y2d S y2d vd 2 − 1 2 + W 2 d 2 2 n Yd d Yd
2 2 S x2d 1 1 S xd 1 1 S xd vd 2 − 1 2 + − + − + W d2 2 2 / 2 n N N n X n X X d d d d d d d d
Science Journal of Applied Mathematics and Statistics 2018; 6(1): 28-42
37
S xd S y d 1 S xd S yd vd 2 − 1 1 2 2 − + Wd 2 ρ xd yd +2 ρ xd yd n 2 2 X n N X Y Y d d d d d d d
2 1 S xd S yd 1 1 1 S xd − ρ + − −2 xd yd X d Yd nd/ N d X d2 nd N d
1 1 2 1 1 = / − S + − / n yd n N d d d nd
2 vd 2 − 1 2 S yd + Wd 2 S yd2 n d
1 1 2 2 1 1 2 2 vd 2 − 1 2 2 − Rd S xd + + Wd 2 Rd S xd2 Rd S xd + / − n N N n d d d d nd vd 2 − 1 1 1 − W ρ R S S + 2 ρ xd yd Rd S xd S yd + 2 d 2 xd2 yd2 d xd2 yd2 nd N d nd 1 1 1 1 2 2 −2 / − ρ xd yd Rd S xd S yd − 2 / − Rd S xd n d Nd nd N d 1 1 2 1 1 = / − − / S y2d + S x2d Rd2 + 2 ρ xd yd Rd S xd S yd S yd + n d Nd nd nd
(
vd − 1 2 2 2 + 2 Wd2 S yd + Rd S xd2 + 2 ρ xd2 yd2 Rd S xd2 S yd2 n d
(
)
)
1 vd − 1 1 2 1 1 − / S /2d R + 2 W S /2 S yd + = / − n d 2 d R2 1 N n n n d d d d d Where, cd/ = The cost of measuring a unit in the first sample of
4. Estimation of Sample Size in the Presence of Non-Response
size nd/
Estimation of domain mean is developed using double sampling design based on the technique of sub-sampling of both the study and auxiliary variable of the non-response with unknown domain size. A study of cost surveys is therefore considered where a non-linear cost function is employed in obtaining the optimal sample sizes by minimizing variance for a fixed cost
θ
+ cd0 nd + cd1 nd1 + cd2 rd2
cd1 = The unit cost for processing the responded data of yd at the first attempt of size nd1 .
cd2 = The unit cost associated with the sub-sample of size However the first sample of size nd1 and sub-samples of
An optimum size of a sample is required so as to balance the precision and cost involved in the survey. The optimum allocation of a sample size is attained either by minimizing the precision against a given cost or minimizing cost against a given precision. In this study, a non-linear cost function has been considered. Denote the cost function for the ratio estimation by
( )
yd with second phase sample size nd .
rd2 from non-respondents of size nd 2
4.1. Optimal Allocation in Double Sampling for the Estimation of Domain
Cd = cd/ nd/
cd0 =The cost of measuring a unit of the first attempt on
(14)
size rd2 are not known until the first attempt is carried out. The cost will therefore be used in the planning for the survey. Hence the expected cost values of sizes nd1 and rd2 will be given by; nd1 = Wd1 nd and rd 2 = Wd 2 .
nd . Hence the vd2
expected cost function is;
( )
/ / E Cd = Cd* = cd nd
θ
+ cd0 nd + cd1 nd1 + cd2 Wd2
nd vd2
38
Alila David Anekeya et al.: Domain Mean Estimation Using Double Sampling with Non-Linear Cost Function in the Presence of Non Response
( )
θ
/ / Cd* = cd nd
Wd + nd cd0 + cd1Wd1 + cd2 . 2 (15) vd 2
1
( )
θ +1
λ cd/ nd/
4.2. Results for Double Sampling for Domain Estimation in the Presence of Non-Response
nd/
specified cost
when,
θ +1
=
S y2d − Sd2R
1
λ cd/ θ
2 2 2 Let Sωd = S yd − Sd R1 > 0 , thus,
( ) nd/
1 θ +1
Sω2 = d/ ∅ θ cd
nd/
(S ∅
nd =
− Wd 2 S d2R
d0
+ cd1Wd1
(c
cd2
vd2 =
.
Sd2R
2
(S (
)
2
2 d R1
)
− Wd2 Sd2R
2
cd0 + cd1Wd1
)
)
∂G (Wd ) ∂vd2
2
Sωd = ∅=
Sω2d
λ cd/ θ 1
Next the partial derivative with respect to vd2 obtained as;
Wd 2 S d2R
S y2d
=
2 2 Sωd θ +1 Sωd θ +1 nd/ = = ∅ / cd/ θ λ cd θ
Where, 2
θ +1
1
2 d R1
− Sd2R 1
=0
= S y2d − Sd2R
y
Cd*
θ +1
1
( )
Proposition 5 The variance for the estimated domain mean for the ˆ / d estimated domain mean Yd R1 = x .xd is minimum for a d
( )
= − S y2d + Sd2R + λ cd/ nd/
>0
nd
Wd2 S d2R
=
−
2
nd
=
λ cd 2 Wd2 nd vd22
=0
λ cd 2 Wd2 nd vd22
λ cd2 Wd2 nd 2 = vd22 Wd2 Sd2R
1
2
λ
Sd2R = S y2d + S x2d Rd2 ± 2ρ xd yd Rd S xd S yd
vd 2 =
1
nd λ cd 2
(17)
Sd R
2
Sd2R = S y2d + Rd2 S x2d ± 2 ρ xd 2
2
2
2
Rd S xd S yd
yd2
2
2
Consider the equation 1 1 2 1 1 G (Wd ) = / − − / S yd + n N n d d d nd
Proof To determine the optimum values of vd2 , nd and nd/ that
minimizes variance at a fixed cost, define
G (Wd )
1 vd − 1 1 2 1 1 = / − − / Sd2R + 2 W S2 S yd + n d2 d R2 n 1 N n n d d d d d
( )
+ λ cd/ nd/
θ
and the partial derivatives are equated to zero
∂G (Wd )
=
− S y2d nd/2
+
Sd2R 1 nd/2
( )
+ λ cd/ θ nd/
θ −1
=0
θ
Wd + nd cd 0 + cd1Wd1 + cd 2 . 2 vd 2
* − Cd (18)
But from (17),
Wd + nd cd0 + cd1Wd1 + cd 2 . 2 − Cd* (16) vd2
To obtain the normal equations, the expression of Equation (16) is differentiated partially with respect to vd2 , nd and nd/ ,
∂nd/
( )
+ λ cd/ nd/
2 vd 1 2 Sd R1 + 2 − Wd 2 Sd R2 n n d d
vd 2 nd
=
λ cd 2 Sd R
2
and Sd R nd 2 = vd 2 λ cd 2
Substituting this in Equations (18) we obtain
(19)
Science Journal of Applied Mathematics and Statistics 2018; 6(1): 28-42
1
1
1
λ cd 2
1
G (Wd ) = − − / Sd2R + S2 + n / N yd n 1 S d d d nd dR 2
1 Wd Sd2 nd 2 R2
Sd R 2 + nd cd0 + cd1Wd1 + cd 2 Wd 2 λ cd 2
( )
+λ cd/ nd/
−
θ
39
* − Cd
(20)
The partial derivative of the equation (20) with respect to nd is obtained as
∂G (Wd ) ∂nd
=−
Sd2R
1
nd2
+
Wd2 Sd2R
(
)
+ λ cd0 + cd1Wd1 = 0
2
nd2
(
)
= − Sd2R + Wd 2 Sd2R + λ nd2 cd 0 + cd1Wd1 = 0 1
2
(
)
λ nd2 cd0 + cd1Wd1 = Sd2R − Wd2 Sd2R nd2
=
1
Sd2R − Wd2 Sd2R
(
1
2
λ cd0 + cd1Wd1
)
Sd2 − Wd S d2 R 2 R1 nd = ∅ 1 cd + cd Wd 1 1 0
Where ∅ =
1
λ
But
vd2 =
λ cd2 Sd R
.nd
2
from equation (19)
2
Thus,
vd 2 =
cd 2 S d2R 2
(S .
2 d R1
− Wd 2 S d2R
d0
+ cd1Wd1
(c
2
)
)
To obtain λ the values of vd2 , nd and nd/ are substituted in the cost function equation (16) and then solve for the value of λ . Suppose the cost function is given by
( )
/ / Cd* = cd nd
θ
Wd + nd cd0 + cd1Wd1 + cd2 . 2 vd 2
Then, 1
Cd*
= cd/
Sω2 d θλ cd/
θ +1 θ 1 + λ
1 Cd* = cd/ / c d Let,
θ
θ +1 Sω2d θ
S d2 − Wd S d2 2 R2 R1 cd + cd Wd 0 1 1
c W cd + cd Wd + d 2 d2 0 1 1 cd 2
θ
θ θ +1 1 θ +1 1 + Wd Sd + cd0 + cd1Wd1 λ λ 2 R2
(
)
S 2 − Wd S d2 2 R2 d R1 cd + cd Wd 0 1 1
Sd2 − Wd Sd2 2 R2 R1 cd + cd Wd 0 1 1
(21)
40
Alila David Anekeya et al.: Domain Mean Estimation Using Double Sampling with Non-Linear Cost Function in the Presence of Non Response
( )
A = cd/
(
B = cd0 + cd1Wd1
1 θ +1
Sω2 d θ
θ
θ +1
Sd2 − Wd S d2 2 R2 R1 cd + cd Wd 0 1 1
)
+ Wd S d cd 2 2 R2
C = Cd* The equation (20) becomes;
Aλ
−
θ θ +1
+ Bλ
−
1 2
(22)
−C = 0
If θ = 1 and substituting this value in the equation (20) we obtain a linear equation of the form
Aλ
−
1 2
+ Bλ
−
1 2
−C = 0
With the values of A and B defined as;
A=
( ) cd/
Sω2 d θ
1 2
1
2 , B = cd0 + cd1Wd1
(
)
Sd2 − Wd S d2 2 R2 R1 cd + cd Wd 0 1 1
Solving the linear equation solution obtained is,
+ Wd S d cd 2 and C = Cd* 2 R2
cost Cd* if;
A+ B C
λ=
Sω2 nd/ = d/ ∅ θ cd
When θ = 1 and substituting this value in the equation (22) 3
we obtain a linear equation of the form,
Aλ
−
1 4
+ Bλ
−
1 2
nd = ∅
(23)
−C = 0
With the values of A defined as;
( )
A = cd/
1 2
vd 2 =
1
Sω2 d θ
2
(
2A B 2 + 4 AC − B
− Wd 2 S d2R
d0
+ cd1Wd1
(c
cd 2 S d2R 2
2
(S .
∅=
)
)
2 d R1
− Wd 2 S d2R
d0
+ cd1Wd1
(c
2
)
)
1
λ
Proof The proof for nd and vd2 is the same as the one in
4
)
Proposition 6 If the expected cost function is of the form cd/ log nd/ + nd cd0
2 d R1
Where,
While B and C remains as earlier defined Solving the equation (23) solution obtained is,
λ=
(S
proposition 5 above. For nd/ the Lagrangian multiplier technique is used. Let, C d* =
Wd + cd1Wd1 + cd 2 . 2 then the variance of vd 2
the estimated domain mean yd is minimum for a specified
1 vd − 1 1 2 1 1 − S yd + − / Sd2R + 2 W S2 / d2 d R2 n 1 N n n n d d d d d
G (Wd ) =
Science Journal of Applied Mathematics and Statistics 2018; 6(1): 28-42
Wd + λ cd/ log nd/ + nd cd0 + cd1Wd1 + cd 2 . 2 − Cd* (24) v d2
nd/ =
To obtain the normal equations for the expression (24) the equation is differentiated partially with respect to nd/ and the partial derivatives are equated to zero
∂G (Wd ) ∂nd/
=
=
− S y2d
+
nd/2
− S y2d
Sd2R
+
1
nd/2
+ Sd2R 1
S y2d − S d2R
1
λ cd/
But, Sω2d = S y2d − Sd2R > 0 , thus, 1
Sω2 d ∅ = = λ cd/ cd/ Sω2d
nd/
λ cd/
=0
nd/
+ λ cd/ nd/
41
1
∅=
=0
λ
To solve for λ , let the variance be given as V0 then
λ cd/ nd/ = S y2d − Sd2R
substitute the values of vd2 , nd and nd/ into the equation,
1
1 1 2 1 1 − / G (Wd ) = / − S yd + n N n d d d nd
2 vd − 1 2 Sd R + 2 Wd2 Sd R2 = V0 1 n d
1 vd − 1 1 2 1 1 = / − S yd + − / Sd2R + 2 W S 2 − V0 = 0 n n 1 n d 2 d R2 N n d d d d d
=
1 nd/
(
)
S y2d − Sd2R + 1
2 S yd 1 2 2 2 S − W S + v W S − + V0 = 0 d R1 d 2 d R2 d2 d 2 d R2 nd N d
(
)
(25)
Substitute the values of vd2 , nd and nd/ into the equation (25) and simplify to obtain, − Wd 2 S d2R
) (c
+ V0 , A = cd/ and B = Nd
(S
λ cd/ + λ
(S
2 d R1
2
d0
)
+ cd1Wd1 + Wd 2 S d2R cd 2 − Vd* = 0 2
(26)
Let, S y2
Vd* =
d
Thus equation (26) becomes, 1
Aλ + Bλ 2 − Vd* = 0
(27)
Solving for λ in equation (27) the solution becomes,
λ=
(
)
1 2
B 2 + 4 AC − B 2A
5. Conclusion
2 d R1
− Wd2 Sd2R
2
sampling rate ( vd2 ) tends to ( →) 1 then the MSE tends
d0
)
+ cd1Wd1 + Wd2 Sd2R
2
cd2
asymptotically to 0. From theoretical analysis it is observed that the Mean Square Error of the proposed estimator will decrease as the sub-sampling fraction together with the number of auxiliary characters is increased. As the subsampling fraction also increases and the value of θ increases then the values of nd/ and nd are minimized with the reduction in the value of Lagrangian multiplier ( λ ) which minimizes the cost function.
References [1]
Aditya, K., Sud U., and Chandra H., (2014). Estimation of Domain Mean Using Two-Stage Sampling with Sub-Sampling Non-response. Journal of the Indian Society of Agricultural Statistics 68 (1) pp. 39-54.
[2]
Cochran W. G., (1977) Sampling techniques. New York: John Wiley and Sons, (1977).
From the results it is noted that as values of first sample domain size ( nd/ ) tends to ( →) domain population size ( N d ), second sample size ( nd ) tends to ( →) nd/ and inverse
) (c
42
Alila David Anekeya et al.: Domain Mean Estimation Using Double Sampling with Non-Linear Cost Function in the Presence of Non Response
[3]
Chaudhary M. K, and Kumar A., (2016). Estimation of Mean of Finite Population using Double sampling Scheme under Non-response, Journal of Mathematical Sciences 5 (2), 4 pp 287 297.
[8]
Oh, H. L, and Scheuren F. J., (1983). Weighting adjustments for unit non-response, In W. G Madow, I, Olkin And B Rudin (Eds), Incomplete data in sample surveys New, York Academic press, 2 pp 143-184.
[4]
Gamrot, W., (2006). Estimation of Domain Total under Nonresponse using Double Sampling, Statistics in Transition, 7 (4) pp. 831-840.
[9]
Okafor F. C, (2001). Treatment of Non-response in Successive Sampling, Statistica, 61 (2) 195 204.
[5]
Hansen M. H. and Hurwirtz W. W, (1946). The problem of Non-response in Sample Surveys, The Journal of the American Statistical Association, 41 517-529.
[6]
Kalton, G., and Kasprzyk, D (1986). The treatment of Missing Survey Data, Survey Methodology, 12 pp. 1-16.
[11] Srivastava, S. K., Jhajj, H. S., (1983). A Class of Estimators of Population Means Using Multi-Auxiliary Information. Cal. Stat. Assoc. Bull. 10 (2) pp. 47-56.
[7]
Meeden, G., (2005). A Non-information Bayesian Approach to Domain Estimation. Journal of Statistical Planning and Inference, 129 (2) pp. 85-92.
[12] Udofia G. A., (2002). Estimation for Domains in Double Sampling for Probabilities Proportional to Size, the Indian Journal of Statistics 63 pp. 82-89.
[10] Sahoo, L. N and Panda, P., (199). Estimation Using Auxiliary Information in Two-Stage Sampling, Austrialian and New Zealand Journal of Statistics 41 (4) pp. 405-410.