Document not found! Please try again

Optimal Allocation in Domains Mean Estimation Using Double

0 downloads 0 Views 328KB Size Report
Feb 12, 2018 - Such a linear cost function can be of the form,. / /. 1. H. h h h. C cn cn. = = + ∑ ... allocation to the strata that minimizes the total sample sizes.
American Journal of Theoretical and Applied Statistics 2018; 7(2): 45-57 http://www.sciencepublishinggroup.com/j/ajtas doi: 10.11648/j.ajtas.20180702.11 ISSN: 2326-8999 (Print); ISSN: 2326-9006 (Online)

Optimal Allocation in Domains Mean Estimation Using Double Sampling with Non-Linear Cost Function in the Presence of Non-Response Alilah David Anekeya1, *, Ouma Christopher Onyango2, Nyongesa Kennedy1 1

Department of Mathematics, Masinde Muliro University of Science and Technology, Kakamega, Kenya

2

Departments of Statistics and Actuarial Science, Kenyatta University, Nairobi, Kenya

Email address: *

Corresponding author

To cite this article: Alilah David Anekeya, Ouma Christopher Onyango, Nyongesa Kennedy. Optimal Allocation in Domains Mean Estimation Using Double Sampling with Non-Linear Cost Function in the Presence of Non-Response. American Journal of Theoretical and Applied Statistics. Vol. 7, No. 2, 2018, pp. 45-57. doi: 10.11648/j.ajtas.20180702.11 Received: December 13, 2017; Accepted: January 5, 2018; Published: February 12, 2018

Abstract: Studies have been carried out on domain mean estimation using non-linear cost function. However little has been done on domain stratum estimation using non-linear cost function using ratio estimation in the presence of non-response. This study develops a method of optimal stratum sample size allocation in domain mean estimation using double sampling with non-linear cost function in the presence of non- response. To obtain an optimum sample size, Lagrangian multiplier technique is employed by minimizing precision at a specified cost. In the estimation of the domain mean, auxiliary variable information in which the study and auxiliary variables both suffers from non-response in the second phase sampling is used. The expressions of the biases and mean square errors of proposed estimator has also been obtained.

Keywords: Optimal Allocation, Double Sampling, Non-Linear Cost Function, Non-Response

1. Introduction

(Domain total) of a study variable yd over domain U d of size N d given by

1.1. Domains In sampling, estimates are made in each of the class into which the population is subdivided. Such subgroups or classes are known as the domain of study. Units of domains may sometimes be identified prior to sampling. Such domains are called planned domains. For unplanned domain the units cannot be identified prior to sampling and hence the estimates of certain domains is often evident only after the sampling design has been identified or after the Sampling and field work have been completed. Hence the size of unplanned domain cannot be controlled. The sample sizes for sub-populations are random variables since formation of these sub-populations is unrelated to sampling. According to Eurostat [4] the precision threshold and or minimum effective sample sizes are set up for effective planned domains. The minimum sample sizes required to achieve a relative margin error of 100.k% for the total Yd

nd (min) =

Z12−α .N d2 S y2d 2

(1)

K 2Yd2 + Z12−α N d2 S y2d 2

2 Where S yd is the variance of y over the domain and

Z1−α

2

(

)

is the percentile value at 100 1 − α 2 % of normal

distribution with mean 0 and variance 1, K is the relative margin of error expressed as a proportion while 100.k% is the relative margin error expressed as a percentage. The 2 population value Yd and S yd is unknowns and have to be estimated using data from auxiliary sources. 1.2. Optimal Allocation with Non-Linear Cost Function Optimal sample allocation involves determining the

46

Alilah David Anekeya et al.: Optimal Allocation in Domains Mean Estimation Using Double Sampling with Non-Linear Cost Function in the Presence of Non-Response

sample size n1 , n2 ,..., nH that minimizes the various cost characters under a given sampling budget C (where C is the upper limit of the total cost of the survey). Linear cost function is appropriate when the cost involved is associated with non-travel activities of survey e.g drawing sample, preparing survey methods, locating, identifying, interviewing respondents and coding data. Such a linear cost function can H

/ / be of the form, C = c n + ∑ ch nh h =1

where c / = cost of classification per unit n / =size of the first sample ch = Cost of measuring a unit in stratum h nh = number of units in stratum h . Generally the above linear cost function is mostly applicable when the major cost item is that of taking the measurements on each unit without considering the cost of the distance between the sample units. Chernyak [1] proposed optimum allocation in double sampling for stratification with a non-linear cost function. The proposed non-linear cost function is of the form,

( ) + ∑c n

C = c/ n/

α

L

k k

k =1

, α > 0 where, c / is the cost of

classification per unit and ck is the cost of measuring a unit in stratum k . Another non-linear cost function proposed was logarithmic in nature of the form, L

C = c / log n / + n/

∑c v W k k

k

k =1

Okafor and Lee [9] employed the double sampling method to estimate the mean of the auxiliary variable and proceeded to estimate the mean of the study variable in a similar way as Cochran [3]. In this method double sampling ratio and regression estimation was considered. The distribution of the auxiliary information was not known and hence the first phase sample was used to estimate the population distribution of the auxiliary variable while the second phase was used to obtain the required information on the variable of the interest. The optimum sampling fraction was derived for the estimators at a fixed cost. Performance of the proposed estimators was compared with those of Hansen and Hurwitz [5] estimators without considering the cost. It was noted that for the results for which cost component was not considered, regression estimator functions were consistent than the Hansen and Hurwitz [5] estimator. Tschuprow [11] and Neyman [8] proposed the allocation procedure that minimizes variance of sample mean under a linear cost H

function of sample size

n=

∑ n , where nh is the size of the h

h =1

stratum. Neyman [8] used Lagrange multiplier optimization technique to get optimum sample sizes for a single variable under study. Holmberg [6] addressed the problem of compromised allocation in multivariate Stratified sampling by taking into consideration minimization of some of the

variances or coefficient of variation of the population parameters and of some of the efficiency losses which may be as a result of increase in the variance due to the use of compromise allocation. Saini [10] developed a method of optimum allocation for multivariate stratified two stage sampling design by using double sampling. In this method the problem of determining optimum allocations was formulated as non-linear programming problem (NLPP) in which each NLPP has a convex objective function under a single linear constraints. The Lagrange multiplier technique was used to solve the formulated NLPPs. Khan et al. [7] proposed a quadratic cost function for allocating sample size in multivariate stratified random sampling in the presence of non-response in which a separate linear regression estimator is used. In this multi-objective Non-linear integer programming problem, an extended lexicographic goal programming was used for solution purpose and comparison made with individual optimum techniques. It is observed that in the allocation techniques, the extended lexicographic goal programming gives minimum values of coefficient of variation than the individual optimum and goal programming technique. Choudhry [2] considered sample allocation issues in the context of estimating Sub-populations (stratum and domain) means as well as the aggregate population means under stratified simple random sampling. In this method non-linear programming was used to obtain the optimal sample allocation to the strata that minimizes the total sample sizes subject to a specified tolerance on the coefficient of variation of the estimators of strata and population means. From the previous studies, a number of researchers have considered a linear cost function when estimating domains. In dealing with non-response most of them have considered subsampling while holding to the idea that the response mechanism is deterministic. This paper therefore focuses on the estimation of domain mean using double sampling for ratio estimation with non-linear cost function with a random response mechanism. In this study we therefore establish an efficient and cost effective method of estimating domains when the travel component is inclusive and it is not linear.

2. Estimation of Domain Mean and Variance in the Presence of NonResponse 2.1. Introduction The problem of non-response is inherent in many surveys. It always persists even after call-backs. The estimates obtained from incomplete data will be biased especially when the respondents are different from the non-respondents. The nonresponse error is not so important if the characteristics of the non-responding units are similar to those of the responding units. However, such similarity of characteristics between two types of units (responding and non-responding) is not always attainable in practice. In double sampling when the problem of

American Journal of Theoretical and Applied Statistics 2018; 7(2): 45-57

non-response is present, the strata are virtually divided into two disjoint and exhaustive groups of respondents and nonrespondents. A sub-sample from non-responding group is then selected and a second more extensive attempt is made to the group so as to obtain the required information. Hansen and Hurwitz [5] proposed a technique of adjusting the nonresponse to address the problem of bias. The technique consists of selecting a sub-sample of the non-respondents through specialized efforts so as to obtain an estimate of nonresponding units in the population. This sub-sampling procedure albeit costly, it’s free from any assumption hence, one does not have to go for a hundred percent response which can be substantially more expensive. In developing the concept of domain theory with nonresponse the following assumptions are made; i. Both the domain study and auxiliary variables suffers from non-response. ii. The responding and non-responding units are the same for the study and auxiliary characters. iii. The information on the domain auxiliary variable X d is not known and hence X d is not available. iv. The domain auxiliary variables do not suffer from nonresponse in the first phase sampling but suffers from non-response in the second phase of sampling. 2.2. Proposed Domain Estimators Let U be a finite population with N known first stage units. The finite population is divided into D domains; U1 , U 2 ,..., U D of sizes N1 , N 2 ,..., N d ,..., N D respectively. Further, let U d be the domain constituents of any population size N d which is assumed to be large and known. Let U and N be D

D

d =1

d =1

defined as, U = ∪ Ud and N = ∑ Nd respectively.

computed. From the second sample of size nd , let ydi and

xdi be the domain study and auxiliary observations with

( i = 1, 2,3,..., nd ) .

nd = nd1 + nd2 . For the nd2 non-respondent group at the second phase sampling, an SRSWOR of rd2 units is selected with an inverse sampling rate of vd2 such that,

A large first phase sample of size n / is selected from N units of the population by simple random sampling without replacement (SRSWOR) design from which nd/ out n / first sample units falling in the d th domain. The assumption here is that all the n / units supply information of the auxiliary variable X d at first phase. A smaller second phase sample of size n is selected from n / by SRSWOR from which nd out of n second phase sample units fall in the d th domain. For estimating the domain population mean X d of the auxiliary variables X d from a large first phase sample of size

nd/

(

/ / , values of the observations xdi i = 1, 2,3,..., nd

) are

obtained and a sample auxiliary domain mean xd/ is

nd2

rd2 =

vd 2

, With vd2 > 1

All the rd2 units respond after making extra efforts of subsampling nd2 non-responding units. In developing the framework of double sampling there are two strata that are non-overlapping and disjoint. Stratum one consist of those units that will respond in the first attempt of the second phase population made up of N d1 units and stratum two consist of those units that would not respond in the first attempt of phase two with domain population units Nd2 = Nd − Nd1 . Both N d1 and N d 2 units are not known in advance. The stratum weights of the responding and non- responding Wd1 =

groups are defined by their

Let Yd and X d be the domain study and auxiliary variables

the i th unit. In estimating the domain auxiliary population mean X d double sampling design is used.

Let ndi units supply the information on

ydi and xdi respondents while nd2 be the non-respondents for both the study and the auxiliary domain variables respectively such that,

with

respectively. Further, let Yd and X d be their respective domain population means and auxiliary means with ydi ( i = 1, 2,3,..., Nd ) and xdi ( i = 1, 2,..., N d ) observations on

47

Wˆd2 = wd2 =

nd2 nd

estimators

N d1

and

Nd

defined

Wd2 =

N d2 Nd

respectively

Wˆd1 = wd1 =

by

nd1 nd

and

respectively.

Following the Hansen and Hurwitz [5] techniques, the unbiased estimator for estimating the domain population

(

mean using nd1 + rd2

)

observations on ydi domain study

character is given by; yd =

nd1 nd

yd1 +

nd2 nd

yrd

2

= wd1 yd1 + wd2 yrd2

(2)

Similarly the estimate for domain auxiliary variable is given by; xd =

nd1 nd

xd1 +

nd2 nd

xrd

2

= wd1 xd1 + wd2 xrd2

(3)

Where yd and xd are the sample domain means for the

48

Alilah David Anekeya et al.: Optimal Allocation in Domains Mean Estimation Using Double Sampling with Non-Linear Cost Function in the Presence of Non-Response

observation ydi and xdi respectively.

and Yd R

In estimating the overall domain population mean in the presence of non-response, double sampling ratio estimation of the domain mean is used. Define;

approximation. Let

are derived by the use of the Taylor's series

2

y y ˆ ˆ Yd R = d .xd/ = rd xd/ and Yd 2 = d/ .xd = rd xd 1 xd xd With the assumption that,

E  xd/  = E  xd  = X d , E ( yd ) = Yd  

ε d0 =

yd − Yd ⇒ yd = Yd ε d0 + 1 Yd

ε d1 =

xd − X d ⇒ xd = X d ε d1 + 1 Xd

ε d2 =

xd/ − X d ⇒ xd = X d ε d0 + 1 Xd

(4)

3. Mean Square Error of the Ratio Estimator

(

)

(

(

( )

)

)

(5)

( )

( ) =0

With the assumption that E ε d0 = E ε d1 = E ε d 2 Further define;

The expression for the Mean square error (MSE) of Yd R

1

 y − Y  2  d E ε d20 = E  d    Yd    

( )

(

 y −Y d d = E 2  Yd  =

1 Yd2

)

2

  = 1 E  y − Y 2 d  Y2  d d 

Var ( yd )

(

)

(

)

=

1  V1 E2 E3 yd / nd/ + E1V2 E3 ( yd / nd ) + E1 E2V3 yd / rd 2   Yd2 

=

  vd − 1  1  1 1  2  1 1  2 − − /  S y2d +  2   S yd +   Wd 2 S yd 2  2  /   Yd  nd N d    nd   nd nd 

(6)

 vd − 1   1 1  2  1 1   Wd C y2 − /  C y2d +  2  C yd +  =  / −    2 d2 N n n n n d  d   d  d  d  Where, S y2d = Variance of the whole domain population mean of the study variable Yd

S y2d = Variance of the domain population mean for the 2 stratum of non-respondents for Stratum of non-respondents for the study variable Yd Consider also 2

E ε12   

 x − Xd  2 1 = E d  = 2 E  xd − X d  Xd  Xd  =

1 X d2

Var ( xd )

=

(

)

1  V1  E2 ( xd / nd )  + E1 V2 xd / rd 2      X d2 

2  1 1  S xd = −  2 + Wd 2  nd N d  X d

2

 vd 2 − 1  S xd 2   2  nd  X d

(7)

Where, S x2d = Variance of the whole domain population mean of the auxiliary variable X d

vd2 = The inverse sampling rate S x2d = Variance of the domain population mean for the 2 stratum of non-respondents for Stratum of non-respondents for the auxiliary variable X d

American Journal of Theoretical and Applied Statistics 2018; 7(2): 45-57

Next consider E

( ) ε d22

 x / − Xd = E d  X d

=

1 X d2

Var

=

2

 2 1  = 2 E  xd − X d  Xd 

49

1  N d − nd/  X d2  N d

 S x2d  /  nd

2  1 1  S xd = / −  n  2  d Nd  X d

( ) xd/

(8)

Consider,

 y − Y  x − X d   E ε d0 ε d1  = E  d d  d   Yd  X d  

(

)( xd − X d )

=

1 E  yd − Yd X d Yd 

=

1 Cov ( xd yd ) X d Yd

=

1 1 1 Cov  E ( yd / nd ) E ( xd / nd )  + E Cov ( yd xd ) / nd  + E Cov ( yd xd ) / rd2  X d Yd X d Yd  X d Yd 

=

1 1 E Cov ( yd xd ) / nd  + E Cov ( yd xd ) / rd 2  X d Yd X d Yd 

S xd S yd S xd S yd  vd2 − 1   1 1  2 2 = − + ρ W   ρ xd yd  n  xd2 yd2 d2 X Y n N X Y d  d d d d  d  d 

(9)

Next,  y − Y d E ε d 0 ε d 2  =  d  Yd

  xd/ − X d    Xd

(

    

) ( xd/ − X d )

=

1 E  yd − Yd X d Yd 

=

1 Cov yd xd/ X d Yd

=

1 1 1 Cov  E yd / nd/ , E xd / nd/  + E Cov yd xd/ / nd/  + E Cov yrd xrd / nd/   2 2   X d Yd   X d Yd  X d Yd

(

)

(

) (

)

(

(

)

Sx S y  1 1  = / − ρ xd yd d d  n  X d Yd  d Nd 

(10)

Consider,  x − X d E ε d1 ε d 2  = E  d X  d

  xd/ − X d    Xd

=

    

)

1

(

E xd/ 2 1 Xd

− Xd

)

2

2  1 1  S xd = / −  2 n  d Nd  X d

(11)

3.1. Mean Square Error (MSE) of the Ratio Estimator Yˆd R

R1

=

1 X d2

(

E1 E2  xd − X d 

)(

xd/

)

− Xd /

nd/

 

and Yˆd

with the Sample Size Allocation R2

ˆ ˆ The ratio estimator of Yd R and YdR can be defined as; 1

2

50

Alilah David Anekeya et al.: Optimal Allocation in Domains Mean Estimation Using Double Sampling with Non-Linear Cost Function in the Presence of Non-Response

S y2d =Variance of the domain population mean for the

y y ˆ ˆ Yd R = d .xd/ = rd xd/ and Yd 2 = d/ .xd = rd xd respectively 1 xd xd Proposition 1 The mean square error (MSE) of the estimator defined by y ˆ Yd R = d .xd/ = rd xd/ is given by; 1 xd

2

stratum of non-respondents for Stratum of non-respondents for the study variable Yd S x2d = Variance of the whole domain population mean of

the auxiliary variable X d

vd2 = The inverse sampling rate

 1 1  2  1 1  2  vd − 1  /  / −  S yd +  − /  Sd R +  2 Wd2 Sd R2 1 N n n n n d   d  d   d  d

S x2d = Variance of the domain population mean for the 2

stratum of non-respondents for Stratum of non-respondents for the auxiliary variable X d

Where, Sd2R 1

=

S y2d

+

S x2d

Rd2

− 2 ρ xd yd Rd S xd S yd

S d2R = S y2d + Rd2 S x2d − 2 ρ xd 2

2

2

Rd =Population ratio of Yd to X d Proof By definition,

2

yd 2

Rd S xd S yd 2

( )

2

ˆ MSE Yd R

With the notations defined as: S y2d = Variance of the whole domain population mean of

1

Substituting the values of equations (5) we obtain

the study variable Yd

(

) ( ( )

(

)(

)

 Yd 1 + ε d X d 1 + ε d  0 2 = E − Yd    X d 1 + ε d1  

= Yd2 E

)

 1+ εd 1+ εd  0 2  − 1   1 + ε d1  

(

2 y  ˆ = E Yd R − Yd  = E  d .xd/ − Yd   1   xd 

)

2

2

)(

(

)

= Yd2 E  ε d 2 + ε d0 + ε d0 ε d 2 − ε d1 1 − ε d1 + ε d21 + ...   

(

= Yd2 E ε d 0 + ε d 2 − ε d1

)

2

2

( ) ( ) ( )

(

)

(

)

(

)

2 2 2 2 = Yd  E ε d0 + E ε d1 + E ε d2 + 2 E ε d0 ε d2 − 2 E ε d0 ε d1 − 2 E ε d1 ε d2    2  1  S yd  1 1 2 1 Y − + −  = d  /  nd N d  Yd2  nd nd/ 

S y2d  S y2d  vd2 − 1  2 + W  2   d 2 2 n Yd  Yd  d  2

2 2  vd − 1  S xd 2  1 1  S xd 1  S xd  1 − + + Wd 2  2 + −    2  / 2  n  X 2   d  d  nd N d  X d  nd N d  X d

Sx S y S xd S yd  1  1 1  1  +2  / −  ρ xd yd d d − 2  −  ρ xd yd X d Yd X d Yd  nd Nd   nd Nd  2 S xd S yd  vd − 1   1 1  S xd 2 2 −2  2 ρ W − 2 −     n  xd2 yd2 d2 X Y  n/ N d  X 2 d d  d   d  d

 1 1  2  1 1  2  1 1  2 2 = / −  S yd +  − /  S yd +  / −  S xd Rd n  d Nd   nd nd   nd Nd 

]

2

American Journal of Theoretical and Applied Statistics 2018; 7(2): 45-57

 1 1  2 2 −  − S R  n/ N  xd d + d   d

51

 vd2 − 1  2 2 2  Wd2  S yd + Rd S xd − 2ρ xd2 yd2 Rd S xd2 S yd2  n d  

 1  1 1  1  +2  / −  ρ xd yd Rd S xd S yd − 2  −  ρ xd yd Rd S xd S yd  nd N d   nd N d   1 1  2  1 1  2 2 2 = / −  S yd +  − /  S yd + S xd Rd − 2 ρ xd yd Rd S xd S yd n N n n d d d   d  

{

}

 vd − 1  2 2 2 +  2 Wd2  S yd + Rd S xd − 2ρ xd2 yd2 Rd S xd2 S yd2  n d    1  vd − 1  1  2  1 1  2 2 = / −  S yd +  − /  Sd R +  2 Wd2 Sd R2 □ n 1 n N n n d d d   d   d   Proposition 2  1 y 1  2  1 1 ˆ  S yd +  − / The mean square error (MSE) of the ratio estimator Yd R2 = / .xd is given by;  / − N n xd d   nd  d nd  vd2 − 1  /2  Wd2 S d R2 n  d  Where,

 /2  S d R + 1 

S / d2 R = S y2d + S x2d Rd2 + 2 ρ xd yd Rd S xd S yd 1

/

Sd R2 = S y2d + Rd2 S x2d + 2 ρ xd 2

2

2

2

yd 2 Rd S xd 2 S yd 2

With the notations as defined in proposition 1 above Proof

( ) = E Yˆ

ˆ MSE of YdR = MSE Yˆd R

d R2

2

2

y  = E  d .x d/ − Yd   xd 

(

− Yd  

2

2

)(

)

  1+ ε d 1+ ε  d1 0 = E Yd  − 1     1 + ε d2    

(

)

2

)(

(

)

= Yd2 E  ε d + ε d + ε d ε d − ε d 1 − ε d + ε d2 + ...  0 1 0 2 2 2  1 

(

= Yd2 E ε d 0 + ε d1 − ε d 2

)

2

2

( ) ( ) ( )

(

)

(

) (

)

2 2 2 2 = Yd  E ε d0 + E ε d1 + E ε d2 + 2 E ε d0 ε d1 − 2 E ε d0 ε d2 − 2 ε d1 ε d2   

= Yd2

2 2  1 S y2d 1  S yd  1 1  S yd  vd 2 − 1  2  − + − +    Wd  nd/ N d  Yd2  nd nd/  Yd2  nd  2 Yd2 

52

Alilah David Anekeya et al.: Optimal Allocation in Domains Mean Estimation Using Double Sampling with Non-Linear Cost Function in the Presence of Non-Response

2 2 S x2d  1 1  S xd  1 1  S xd  vd 2 − 1  2 +  − W  2 +   2 +  / −  d 2 X 2 n N N n X n X d d d d   d  d  d  d 

S xd S y d  1 S x d S y d  vd 2 − 1  1  2 2 − + Wd 2 ρ xd yd +2    ρ xd yd 2 2 X X d Yd  nd  d Yd  nd N d 

  

2  1 S xd S yd  1 1  1  S xd  + / −  −2  −  ρ xd yd  n Nd  X d Yd  nd N d  X d2   d 

 1 1  2  1 1 = / −  S yd +  − / n N n d   d  d nd

 2  vd2 − 1  2  S yd +  Wd2 S yd2 n  d  

 1 1  2 2  1 1  2 2  vd2 − 1  2 2 − +   Rd S xd +  Wd2 Rd S xd2  Rd S xd +  / − n N N n d  d   d  d   nd  vd2 − 1   1 1  − + 2 Wd2 ρ xd2 yd2 Rd S xd2 S yd2  ρ xd yd Rd S xd S yd + 2   nd N d   nd   1  1 1  1  2 2 −2  / − ρ xd yd Rd S xd S yd − 2  / −   Rd S xd n  n  d Nd   d Nd   1 1  2  1 1 = / −  S yd +  − / n N n d   d  d nd

 2 2 2  S yd + S xd Rd + 2 ρ xd yd Rd S xd S yd 

(

 vd − 1  2 2 2 +  2  Wd2 S yd + Rd S xd2 + 2ρ xd2 yd2 Rd S xd2 S yd2 n  d 

(

 1 1  2  1 1 =  / −  S yd +  − / N n d   nd  d nd 3.2. Optimal Allocation in Double Sampling for Domain Estimation An optimum size of a sample is required so as to balance the precision and cost involved in the survey. The optimum allocation of a sample size is attained either by minimizing the precision against a given cost or minimizing cost against a given precision. In this study, a non-linear cost function has been considered. Denote the cost function for the ratio estimation by

( )

Cd = cd/ nd/

θ

+ cd 0 nd + cd1 nd1 + cd 2 rd 2

(12)

)

)

 /2  vd − 1  /2  S d R +  2 Wd2 S d R2 1 n  d  

cd1 = The unit cost for processing the responded data of yd at the first attempt of size nd1 .

cd2 = The unit cost associated with the sub-sample of size rd2 from non-respondents of size nd2 However the first sample of size nd1 and sub-samples of size rd2 are not known until the first attempt is carried out. The cost will therefore be used in the planning for the survey. Hence the expected cost values of sizes nd1 and rd2 will be n

d given by; nd1 = Wd1 nd and rd 2 = Wd 2 . v . Hence the expected d2

Where, cd/ = The cost of measuring a unit in the first sample of size nd/

cd0 =The cost of measuring a unit of the first attempt on yd with second phase sample size nd .

cost function is;

( )

/ / E Cd  = Cd* = cd nd

θ

+ cd0 nd + cd1 nd1 + cd2 Wd2

nd vd2

American Journal of Theoretical and Applied Statistics 2018; 7(2): 45-57

Cd*

( )

/ / = cd nd

θ

Wd   + nd  cd0 + cd1Wd1 + cd2 . 2  (13) vd 2  

1

( )

( )

cd 2

vd 2 =

S d2R

d0

+ cd1Wd1

)

) 2

+ cd1Wd1

)

∂G (Wd )

)

∂vd2

Sωd =

− S d2R 1

=

Wd 2 Sd2R

>0

nd

λ

vd 2 =

2

 1

2

yd 2

Rd S xd S yd 2

θ

 vd − 1 

1 

To obtain the normal equations, the expression of Equation (14) is differentiated partially with respect to vd2 , nd and nd/ , and the partial derivatives are equated to zero

∂G (Wd ) ∂nd/

=

+

vd22

nd λ cd2

Sd2R

1

nd/2

( )

+ λ cd/ θ nd/

θ −1

(15)

Sd R

 1

=0

1 

 1

1 

 vd

1 

2 − /  Sd2R +  2 −  Wd2 Sd2R G (Wd ) =  / −  S yd +   1  nd nd  2 N n n d     d  d nd 

( )

 + λ  cd/ nd/ 

θ

Wd    + nd  cd0 + cd1Wd1 + cd 2 . 2  − Cd*  (16) v    d2

But from (15),

Wd    + nd  cd0 + cd1Wd1 + cd 2 . 2  − Cd*  (14) v    d2

− S y2d nd/2

λ cd2 Wd2 nd

Consider the equation 2

2 − /  Sd2R +  2 W S2 G (Wd ) =  / −  S yd +   n  d2 d R2  1 N n n n d   d  d   d  d

( )

=0

2

Proof To determine the optimum values of vd2 , nd and nd/ that minimizes variance at a fixed cost, define

 + λ  cd/ nd/ 

vd22

2

S d2R = S y2d + Rd2 S x2d ± 2 ρ xd

1 

=

λ cd2 Wd2 nd

λ cd 2 Wd 2 nd 2 = vd22 Wd 2 S d2R

1

 1



2

nd

Sd2R = S y2d + S x2d Rd2 ± 2 ρ xd yd Rd S xd S yd 2

1

 Sω2  θ +1 =  / d ∅  cd θ   

1

∅=

2

λ cd/ θ

1

2

Where,

S y2d

=

 θ +1

Wd 2 Sd2R 2

Sω2d

Next the partial derivative with respect to vd2 obtained as;

1

(c

1

λ cd/ θ

 nd/ =   /  λ cd θ 

S d2R − Wd 2 S d2R d0

2

S y2d − Sd2R

=

Sω2d

2

( .

1

( )

− Wd 2 Sd2R

=0

= S y2d − S d2R

θ +1

nd/

2 d R1

θ +1

2 2 2 Let Sωd = S yd − S d R1 > 0 , thus,

1

(c

θ +1

nd/

 Sω2  θ +1 nd/ =  d/ ∅   θ cd   

nd =

θ +1

λ cd/ nd/

specified cost Cd* when,

(S ∅

( )

= − S y2d + S d2R + λ cd/ nd/

3.3. Results of Double Sampling for Domain Estimation in the Presence of Non-Response Proposition 3 The variance for the estimated domain mean for the yd / ˆ .xd is minimum for a estimated domain mean Yd R1 = xd

53

vd2 nd

=

λ cd2 Sd R

2

and Sd R nd 2 = vd 2 λ cd 2

(17)

Substituting this in Equations (17) we obtain into (16) we obtain

54

Alilah David Anekeya et al.: Optimal Allocation in Domains Mean Estimation Using Double Sampling with Non-Linear Cost Function in the Presence of Non-Response

 1

1 

 1

 λ cd

1 

1 

2 2  Wd S d2 − /  S d2R +  − G (Wd ) =  / −  S yd +  2 R2  1  S N n n n n d  d  d   d  d  dR 2 

Sd R   2 + nd  cd 0 + cd1Wd1  + cd 2 Wd 2 λ c   d2

( )

 + λ  cd/ nd/ 

θ

The partial derivative of the equation (18) with respect to nd is obtained as

∂G (Wd ) ∂nd

=−

Sd2R

1

nd2

+

Wd2 Sd2R

2

nd2

(

Where ∅ =

1

2

(

)

)

vd 2 =

)

nd2

=

(

1

2

λ cd0 + cd1Wd1

)

 S d2 − Wd S d2 R 2 R1 nd = ∅  1  cd + cd Wd 0 1 1  1

 Cd* = cd/  

 Sω2  d  θλ cd/ 

 1 Cd* = cd/  /   cd

θ

( )

/ / Cd* = cd nd

   

 θ +1 θ 1   +  λ  

θ +1  Sω2d     θ

from equation (17)

S d2R

.

(S

2 d R1

− Wd 2 S d2R

d0

+ cd1Wd1

(c

2

)

)

To obtain λ the values of vd2 , nd and nd/ are substituted in the cost function equation (13) and then solve for the value of λ . Suppose the cost function is given by

2

S d2R − Wd 2 S d2R

cd 2 2

λ nd2 cd0 + cd1Wd1 = Sd2R − Wd 2 Sd2R 1

.nd

Thus,

= − Sd2R + Wd 2 S d2R + λ nd2 cd0 + cd1Wd1 = 0 1

Sd R

(18)

2

+ λ cd0 + cd1Wd1 = 0

(

λ cd 2

vd 2 =

But

λ

 *   − Cd   

θ

Wd   + nd  cd0 + cd1Wd1 + cd2 . 2  vd 2  

Then,

 Sd2 − Wd Sd2 2 R2  R1  cd + cd Wd 0 1 1 

 c W   cd + cd Wd + d2 d2 0 1 1  cd2 

θ

θ  θ +1  1  θ +1 1     + Wd2 S d R2 + cd0 + cd1Wd1  λ λ  

(

)

 S 2 − Wd Sd2 2 R2  dR1  cd + cd Wd 0 1 1 

 S d2 − Wd S d2 2 R2  R1  cd + cd Wd 0 1 1 

    

      

(19)

Let,

A=

1 cd/ θ +1

( )

θ

 Sω2  d  θ 

(

B = cd0 + cd1Wd1

 θ +1   

)

 S d2 − Wd S d2 2 R2  R1  cd + cd Wd 0 1 1 

  +W S d 2 d R2 cd 2  

C = Cd* The equation (19) becomes;





θ θ +1

+ Bλ



1 2

−C = 0

If θ = 1 and substituting this value in the equation (20) we obtain a linear equation of the form

(20)

American Journal of Theoretical and Applied Statistics 2018; 7(2): 45-57





1 2

+ Bλ



1 2

55

−C = 0

With the values of A and B defined as;

( )

A=

cd/

1 2

1

 Sω2  d  θ 

2  , B = cd + cd Wd 0 1 1  

(

)

 S d2 − Wd S d2 2 R2  R1  cd + cd Wd 0 1 1 

Solving the linear equation solution obtained is,

estimated domain mean yd is minimum for a specified cost

Cd* if;

A+ B C

λ=

 *  +W S d 2 d R2 cd 2 and C = Cd  

 Sω2  nd/ =  d/ ∅   θ cd   

1 and substituting this value in the equation 3 (20) we obtain a linear equation of the form, When θ =





1 4

+ Bλ



1 2

nd =

−C = 0

( )

1 2

vd 2 =

1 2

 Sω2  d   θ   

(

   2 B + 4 AC − B   2A

d0

+ cd1Wd1

cd 2 S d2R 2

2

(S .

∅=

4

)

)

2 d R1

− Wd 2 S d2R

d0

+ cd1Wd1

(c

2

)

)

1

λ

Proof The proof for nd and vd2 is the same as the one in

)

Proposition 4 If the expected cost function is of the form Wd  cd/ log nd/ + nd  cd 0 + cd1 Wd1 + cd 2 . 2 vd 2 

− Wd 2 Sd2R

Where,

While B and C remains as earlier defined Solving the equation (21) solution obtained is,   λ=  

2 d R1

(c

(21)

With the values of A defined as;

A = cd/

(S ∅

proposition 3 above. For nd/ the Lagrangian multiplier technique is used. Let,

Cd* =

  then the variance of the 

G (Wd )

 1 1  2  1 1  2  vd2 − 1  2 = / −  S yd +  − /  S dR1 +   Wd2 SdR2 n N n n n d  d   d  d  d  Wd     + λ  cd/ log nd/ + nd  cd0 + cd1Wd1 + cd2 . 2  − Cd*  vd 2    

(22)

To obtain the normal equations for the expression (22) the equation is differentiated partially with respect to nd/ and the partial derivatives are equated to zero

∂G (Wd ) ∂nd/

=

− S y2d nd/2

+

S d2R

1

nd/2

λ cd/

+

nd/

=0

= − S y2d + S d2R + λ cd/ nd/ = 0 1

λ cd/ nd/ = S y2d − S d2R

1

56

Alilah David Anekeya et al.: Optimal Allocation in Domains Mean Estimation Using Double Sampling with Non-Linear Cost Function in the Presence of Non-Response

nd/

=

S y2d − Sd2R

1

λ cd/

2 2 2 But, Sωd = S yd − S d R1 > 0 , thus,

nd/ =

 Sω2   d ∅ =  λ cd/  cd/  Sω2d

1

∅=

λ

To solve for λ , let the variance be given as V0 then substitute the values of vd2 , nd and nd/ into the equation,

 2  vd2 − 1  2  Sd R +  Wd2 Sd R2 = V0 1 n  d  

 1 1  2  1 1 G (Wd ) =  / −  S yd +  − / n N n d   d  d nd  1 1  2  1 1 = / −  S yd +  − / n N n d   d  d nd =

1 nd/

(

S y2d

− S d2R 1

)

(

1  +  nd 

 2  vd2 − 1  2  SdR +  Wd2 SdR2 − V0 = 0 1 n  d  

S d2R 1

− Wd 2 S d2R 2

)

+ vd2 Wd 2 S d2R 2

2    S yd  + V0  = 0 −    N d 

(23)

Substitute the values of vd2 , nd and nd/ into the equation (23) and simplify to obtain,

 

λ cd/ + λ 

(S

2 d R1

− Wd2 Sd2R

2

) (c

d0

)

 + cd1Wd1  + Wd 2 S d2R cd 2 − Vd* = 0 2 

(24)

Let,  S y2  Vd* =  d + V0  , A = cd/ and B =  Nd   

(S

Thus equation (24) becomes, 1

(25)

Solving for λ in equation (21) the solution becomes,

(

)

B 2 + 4 AC − B 2A

− Wd 2 S d2R

2

) (c

d0

)

+ cd1Wd1 + Wd 2 S d2R

2

cd 2

respondents of size nd2 ) is less than both cd1 (the unit

Aλ + Bλ 2 − Vd* = 0

  λ=  

2 d R1

1 2

   

cost for processing the responded data of yd at the first attempt of size nd1 ) and cd0 (the cost of measuring a unit of the first attempt on yd with second phase sample size

nd ) and also when the value of Sd2R is not too large 1 2 relative to Sd R2 . The second phase sample size ( nd ) will

(

)

2 2 be minimum if the value of S d R1 − Wd2 Sd R2 > 0 but less

2 2 2 2 2 than 1. If the value of Sωd = S yd − Sd R1 < 1 and S yd > Sd R1

4. Conclusion

with the values of cd/ (the cost of measuring a unit in the

It is noted that value of inverse sampling rate ( vd2 ) do not depend on the value of the Lagrangian multiplier ( λ ).

first sample of size nd/ ) not being too large to the relative

Further the value of sampling rate, vd2 < 1, if cd2 (the unit cost associated with the sub-sample of size rd2 from non-

Sω2 d then the value of nd/ (size of the first sample) will be

minimum. These minimum values therefore make the theoretic cost survey of the proposed estimator as minimal as possible.

American Journal of Theoretical and Applied Statistics 2018; 7(2): 45-57

References [1]

Cherniyak O. I., (2001). Optimal allocation in stratified sampling and double sampling with non- linear cost function, Journal of Mathematical Sciences 103, 4 pp. 525-528.

[2]

Choudhry H. G., Rao, J. N. K, and Michael A., Hidiroglou, (2012). On sample allocation for efficient domain estimation, Survey methodology, 38 (1) pp. 23-29.

[3]

Cochran W. G., (1977) Sampling techniques. New York: John Wiley and Sons, (1977).

[4]

Eurostat., (2008). Introduction to Sample Design and Estimation Techniques, Survey Sampling Reference Guidelines. Luxembourg; Office for Publication of the European Communities pp. 36.

[5]

[6]

Hansen M. H. and Hurwitz W. W, (1946). The problem of non-response in sample surveys. The Journal of the American Statistical Association, 41 517-529. Holmberg A., (2002). A multi-parameter perspective on the

57

choice of sampling designs in surveys. Journal of statistics in transition. 5 (6) pp. 969-994. [7]

Khan S. U., Muhammad Y. S., and Afgan N., (2009). Multiobjective compromise allocation stratified sampling in the presence of non-response using quadratic cost function. International Journal of Business and social science. 5 (13). pp. 162-169.

[8]

Neyman, C. and Jerzy D. (1934).; On the Two Different Aspects of the Representative methods of stratified sampling and the method of purposive selection, Journal of royal statistical society. 97 (4) pp. 558-625.

[9]

Okafor F. C, (2001). Treatment of non-response in successive sampling, Statistica, 61 (2) 195-204.

[10] Saini M., and Kumar A, (2015). Method of Optimum allocation for Multivariate Stratified two stage Sampling design Using double Sampling. Journal of Probability and Statistics forum, 8, pp. 19-23. [11] Tschuprow and Al A., (1923). On mathematical expectation of the moments of frequency distribution in the case of correlated observation (chapters 4-6) Metron 2 (1) pp. 646-683, (1923).

Suggest Documents