Improved Estimators in Simple Random Sampling When Study ... - arXiv

Improved Estimators in Simple Random Sampling When Study Variable is An Attribute

Rajesh Singh and Prayas Sharma
Department of Statistics, Banaras Hindu University, Varanasi (U.P.), India

Abstract

This article addresses the problem of estimating the population mean in the presence of auxiliary information when the study variable itself is qualitative in nature. Bias and mean squared error (MSE) expressions of the class of estimators are derived up to the first order of approximation. The suggested estimators are compared with the traditional estimator and with several other estimators considered by Singh et al. (2010). In addition, an empirical study supports the theoretical results and shows the superiority of the proposed estimators over the others.

Key words: Attribute, point bi-serial, mean square error, simple random sampling.

1. Introduction

In the theory of sample surveys, it is usual to make use of auxiliary information at the estimation stage in order to improve the precision or accuracy of an estimator of an unknown population parameter of interest. Ratio, product and regression methods of estimation are good examples in this context. Many authors, including Upadhyaya and Singh (1999), Abu-Dayyeh et al. (2003), Kadilar and Cingi (2005), Khoshnevisan et al. (2007), Singh et al. (2007), Singh et al. (2008) and Singh and Kumar (2011), suggested estimators using known population parameters of an auxiliary variable. But there may be many practical situations in which the auxiliary information is not quantitative but qualitative in nature, that is, it is available in the form of an attribute. For example:

(a) The height of a person may depend on whether the person is male or female.
(b) The efficiency of a dog may depend on its breed.
(c) The yield of a wheat crop may depend on the variety of wheat, etc.

In these situations, by taking advantage of the point bi-serial correlation between the study variable $y$ and the auxiliary attribute $\phi$, along with prior knowledge of the population parameter of the auxiliary attribute, estimators of the population parameter of interest can be constructed. In many situations the study variable is influenced not only by ratio-scale variables but also by variables that are essentially qualitative, or nominal scale, in nature, such as sex, race, colour, religion, nationality, geographical region and political upheavals (see Gujarati and Sangeetha (2007)). Taking into consideration the point bi-serial correlation between the auxiliary attribute and the study variable, several authors, including Naik and Gupta (1996), Jhajj et al. (2006), Singh et al. (2007), Shabbir and Gupta (2007), Singh et al. (2008), Singh et al. (2010), Abd-Elfattah et al. (2010), and Singh and Solanki (2012), proposed improved estimators of the population mean. All these authors have implicitly assumed that the study variable Y is quantitative whereas the auxiliary variable is qualitative.

There may be situations when the study variable itself is qualitative in nature. For example, consider U.S. presidential elections. Assume that there are two political parties, Democratic and Republican. The dependent variable here is the vote choice between the two parties. Suppose we let Y = 1 if the vote is for a Democratic candidate and Y = 0 if the vote is for a Republican candidate. Some of the variables used to explain the vote choice are the growth rate of GDP, the unemployment and inflation rates, whether the candidate is running for re-election, etc. For the present purposes, the important thing to note is that the study variable is a qualitative variable. One can think of several other examples where the study variable is qualitative in nature. Thus, a family either owns a house or it does not, it has disability insurance or it does not, both husband and wife are in the labour force or only one spouse is, etc. (see Gujarati and Sangeetha (2007)). In this paper we propose estimators for the case in which the study variable itself is qualitative in nature.

Consider a sample of size $n$ drawn by simple random sampling without replacement (SRSWOR) from a population of size $N$. Let $\phi_i$ and $x_i$ denote the observations on the variables $\phi$ and $x$, respectively, for the $i$th unit $(i = 1, 2, \ldots, N)$, where $\phi_i = 1$ if the $i$th unit of the population possesses the attribute $\phi$ and $\phi_i = 0$ otherwise. Let $A = \sum_{i=1}^{N}\phi_i$ and $a = \sum_{i=1}^{n}\phi_i$ denote the total number of units possessing the attribute $\phi$ in the population and in the sample, respectively, and let $P = A/N$ and $p = a/n$ denote the corresponding proportions of units possessing the attribute $\phi$.

Let us define

$$e_p = \frac{p - P}{P}, \qquad e_1 = \frac{\bar{x} - \bar{X}}{\bar{X}}, \qquad e_3 = \frac{s_x^2 - S_x^2}{S_x^2},$$

such that $E(e_i) = 0$ for $i = p, 1, 3$, and

$$E(e_p^2) = f C_p^2, \qquad E(e_1^2) = f C_x^2, \qquad E(e_3^2) = f(\lambda_{04} - 1),$$

$$E(e_1 e_p) = f \rho_{pb} C_p C_x, \qquad E(e_3 e_p) = f C_p \lambda_{12}, \qquad E(e_1 e_3) = f C_x \lambda_{03},$$

where

$$f = \frac{1}{n} - \frac{1}{N}, \qquad C_p^2 = \frac{S_p^2}{P^2}, \qquad C_x^2 = \frac{S_x^2}{\bar{X}^2},$$

and $\rho_{pb}$ is the point bi-serial correlation coefficient.
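As a rough illustration of the notation above, the following sketch computes $P$, $\bar{X}$, $f$, $C_p$, $C_x$ and $\rho_{pb}$ for a small made-up population. The function name `population_summary`, the eight data values and the use of the divisor $N - 1$ in the variances and covariance are assumptions made for this example only; the text does not specify them.

```python
import math

def population_summary(phi, x, n):
    """Summary quantities for a 0/1 attribute phi and a numeric variable x."""
    N = len(phi)
    P = sum(phi) / N                      # population proportion
    X_bar = sum(x) / N                    # population mean of x
    # Divisor N - 1 is an assumption; the text does not state it.
    S2_p = sum((v - P) ** 2 for v in phi) / (N - 1)
    S2_x = sum((v - X_bar) ** 2 for v in x) / (N - 1)
    S_px = sum((a - P) * (b - X_bar) for a, b in zip(phi, x)) / (N - 1)
    return {
        "P": P,
        "X_bar": X_bar,
        "f": 1.0 / n - 1.0 / N,           # f = 1/n - 1/N
        "C_p": math.sqrt(S2_p) / P,
        "C_x": math.sqrt(S2_x) / X_bar,
        "rho_pb": S_px / math.sqrt(S2_p * S2_x),
    }

# Hypothetical population of N = 8 units (not data from the paper).
x_values = [4, 6, 8, 10, 12, 14, 16, 18]
phi_values = [0, 0, 0, 1, 0, 1, 1, 1]
summary = population_summary(phi_values, x_values, n=4)
for name, value in summary.items():
    print(name, round(value, 4))
```

With a 0/1 attribute, the point bi-serial correlation computed here is simply the ordinary product-moment correlation between $\phi$ and $x$.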



2. Estimators in Literature

Singh et al. (2010) proposed the following ratio-type estimator for estimating the unknown population mean when the study variable is an attribute:

$$t_a = \frac{p}{\bar{x}}\,\bar{X} \tag{2.1}$$

The bias and MSE of the estimator $t_a$, to the first order of approximation, are respectively given by

$$B(t_a) = f\left(\frac{C_x^2}{2} - \rho_{pb} C_p C_x\right) \tag{2.2}$$

$$\mathrm{MSE}(t_a) = f P^2\left(C_p^2 + C_x^2 - 2\rho_{pb} C_p C_x\right) \tag{2.3}$$
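The behaviour of the ratio-type estimator can be checked informally by simulation. The sketch below is only an illustration under stated assumptions: the population is synthetic (an income-like variable `x` and a 0/1 attribute `phi` that becomes more likely as `x` grows), and the function name `simulate_ratio_estimator`, the population size, sample size and number of replications are all hypothetical choices, not taken from the paper. It draws repeated SRSWOR samples and compares the empirical MSE of the sample proportion $p$ with that of $t_a = p\bar{X}/\bar{x}$.

```python
import random

def simulate_ratio_estimator(N=2000, n=50, reps=4000, seed=1):
    """Empirical MSE of p and of the ratio-type estimator under SRSWOR."""
    rng = random.Random(seed)
    # Synthetic population (an assumption of this sketch).
    x = [rng.uniform(5.0, 25.0) for _ in range(N)]
    phi = [1 if xi + rng.gauss(0.0, 3.0) > 15.0 else 0 for xi in x]
    P = sum(phi) / N
    X_bar = sum(x) / N

    sq_err_p = 0.0
    sq_err_ta = 0.0
    for _ in range(reps):
        units = rng.sample(range(N), n)      # one SRSWOR draw
        p = sum(phi[i] for i in units) / n
        x_bar = sum(x[i] for i in units) / n
        t_a = p * X_bar / x_bar              # ratio-type estimator
        sq_err_p += (p - P) ** 2
        sq_err_ta += (t_a - P) ** 2
    return sq_err_p / reps, sq_err_ta / reps

mse_p, mse_ta = simulate_ratio_estimator()
print("empirical MSE of p  :", mse_p)
print("empirical MSE of t_a:", mse_ta)
print("empirical PRE of t_a:", 100.0 * mse_p / mse_ta)
```

Because the attribute is strongly and positively related to `x` in this synthetic population, the ratio-type estimator should show a smaller empirical MSE than $p$; with a weak or negative relationship the ordering can reverse.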

Singh et al. (2010) also proposed a general class of estimators,

$$t_b = H(p, u) \tag{2.4}$$

where $u = \bar{x}/\bar{X}$ and $H(p, u)$ is a parametric function of $p$ and $u$ such that

$$H(P, 1) = P, \quad \forall\, P \tag{2.5}$$

and which satisfies the following regularity conditions:
(i) Whatever the sample chosen, the point $(p, u)$ assumes values in a bounded closed convex subset $R_2$ of the two-dimensional real space containing the point $(P, 1)$.
(ii) The function $H(p, u)$ is continuous and bounded in $R_2$.
(iii) The first and second order partial derivatives of $H(p, u)$ exist and are continuous as well as bounded in $R_2$.

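The bias and minimum MSE of the estimator $t_b$ are respectively given by

$$B(t_b) = f\left(P\rho_{pb} C_p C_x H_3 + C_x^2 H_2 + P^2 C_p^2 H_4\right) \tag{2.6}$$

$$\mathrm{MSE}(t_b)_{\min} = f P^2 C_p^2\left(1 - \rho_{pb}^2\right) \tag{2.7}$$

where

$$H_1 = \frac{\partial H}{\partial u}, \qquad H_2 = \frac{1}{2}\frac{\partial^2 H}{\partial u^2}, \qquad H_3 = \frac{1}{2}\frac{\partial^2 H}{\partial p\,\partial u}, \qquad H_4 = \frac{1}{2}\frac{\partial^2 H}{\partial p^2},$$

all evaluated at $p = P$, $u = 1$.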
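Expressions (2.3) and (2.7) can be evaluated directly from summary statistics. The sketch below plugs in the values quoted later in the empirical study (Section 5) and reports the percent relative efficiency (PRE) with respect to the usual estimator $p$, taking $V(p) = fP^2C_p^2$. The function names are illustrative only.

```python
def variance_usual(P, f, C_p):
    """First-order variance of the sample proportion p."""
    return f * P**2 * C_p**2

def mse_ratio(P, f, C_p, C_x, rho_pb):
    """MSE of the ratio-type estimator t_a, equation (2.3)."""
    return f * P**2 * (C_p**2 + C_x**2 - 2.0 * rho_pb * C_p * C_x)

def min_mse_class(P, f, C_p, rho_pb):
    """Minimum MSE of the class t_b, equation (2.7)."""
    return f * P**2 * C_p**2 * (1.0 - rho_pb**2)

# Summary values quoted in the empirical study (Section 5).
n, N = 11, 40
P, C_p, C_x, rho_pb = 0.525, 0.963, 0.308, 0.897
f = 1.0 / n - 1.0 / N

v_p = variance_usual(P, f, C_p)
pre_ta = 100.0 * v_p / mse_ratio(P, f, C_p, C_x, rho_pb)
pre_tb = 100.0 * v_p / min_mse_class(P, f, C_p, rho_pb)
print("PRE of t_a:", round(pre_ta, 2))
print("PRE of t_b:", round(pre_tb, 2))
```

The value for $t_b$ agrees with the 511.79 reported in Table 5.1, and the value for $t_a$ is close to the reported 189.38; the small gap is presumably due to rounding in the published summary statistics. Note that both PREs are free of $f$ and $P$, since these factors cancel in the ratio.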

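Singh et al. (2010) proposed another family of estimators for estimating $P$:

$$t_c = \left[q_1 p + q_2\left(\bar{X} - \bar{x}\right)\right]\left[\frac{a\bar{X} + b}{a\bar{x} + b}\right]^{\alpha}\exp\left[\beta\,\frac{(a\bar{X} + b) - (a\bar{x} + b)}{(a\bar{X} + b) + (a\bar{x} + b)}\right] \tag{2.8}$$

The bias and minimum MSE of the estimator $t_c$, to the first order of approximation, are respectively given by

$$\mathrm{Bias}(t_c) = P(q_1 - 1) + f\left[\left(q_2\bar{X}B + q_1 P A\right)C_x^2 - q_1 P B\,\rho_{pb} C_p C_x\right] \tag{2.9}$$

$$\mathrm{MSE}(t_c)_{\min} = P^2 - \frac{\Delta_1\Delta_5^2 + \Delta_3\Delta_4^2 - 2\Delta_2\Delta_4\Delta_5}{\Delta_1\Delta_3 - \Delta_2^2} \tag{2.10}$$

where

$$M_1 = P^2 f\left(C_p^2 + B^2 C_x^2 - 2B\rho_{pb} C_p C_x\right), \qquad M_2 = \bar{X}^2 f C_x^2, \qquad M_3 = P^2 f\left(A C_x^2 - 2B\rho_{pb} C_p C_x\right),$$

$$M_4 = P\bar{X} f\left(B C_x^2 - \rho_{pb} C_p C_x\right), \qquad M_5 = \bar{X} P f B C_x^2.$$

The optimum values of $q_1$ and $q_2$ are

$$q_1^* = \frac{\Delta_3\Delta_4 - \Delta_2\Delta_5}{\Delta_1\Delta_3 - \Delta_2^2} \qquad \text{and} \qquad q_2^* = \frac{\Delta_1\Delta_5 - \Delta_2\Delta_4}{\Delta_1\Delta_3 - \Delta_2^2} \tag{2.11}$$

where

$$\Delta_1 = P^2 + M_1 + 2M_3, \qquad \Delta_2 = -M_4 - M_5, \qquad \Delta_3 = M_2, \qquad \Delta_4 = P^2 + M_3, \qquad \Delta_5 = -M_5.$$

3. Proposed estimators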

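The following estimator is proposed:

$$t_1 = p\left(\frac{\bar{x}}{\bar{X}}\right)^{\alpha}\left(\frac{s_x^2}{S_x^2}\right)^{\beta} \tag{3.1}$$

where $\alpha$ and $\beta$ are suitably chosen constants to be determined such that the MSE of the estimator $t_1$ is minimum. The bias and MSE of the estimator $t_1$, to the first order of approximation, are respectively given by

$$\mathrm{Bias}(t_1) = P f\left[\frac{\alpha(\alpha - 1)}{2} C_x^2 + \frac{\beta(\beta - 1)}{2}\left(\lambda_{04} - 1\right) + \alpha\beta C_x\lambda_{03} + \alpha\rho_{px} C_p C_x + \beta C_p\lambda_{12}\right] \tag{3.2}$$

$$\mathrm{MSE}(t_1) = P^2 f\left[C_p^2 + \alpha^2 C_x^2 + \beta^2\left(\lambda_{04} - 1\right) + 2\alpha\rho_{px} C_p C_x + 2\beta C_p\lambda_{12} + 2\alpha\beta C_x\lambda_{03}\right] \tag{3.3}$$

Here $\rho_{px}$ denotes the point bi-serial correlation coefficient between the attribute and $x$ (denoted $\rho_{pb}$ in Sections 1 and 2).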

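Differentiating equation (3.3) partially with respect to $\alpha$ and $\beta$ and equating the derivatives to zero, we get the optimum values of $\alpha$ and $\beta$, respectively, as

$$\alpha^* = \frac{C_p\left[\lambda_{03}\lambda_{12} - \rho_{px}\left(\lambda_{04} - 1\right)\right]}{C_x\left(\lambda_{04} - 1 - \lambda_{03}^2\right)}, \qquad \beta^* = \frac{C_p\left(\rho_{px}\lambda_{03} - \lambda_{12}\right)}{\lambda_{04} - 1 - \lambda_{03}^2} \tag{3.4}$$

Putting the optimum values of $\alpha$ and $\beta$ from equation (3.4) into equation (3.3), we get the minimum MSE of the estimator $t_1$ as

$$\mathrm{MSE}(t_1)_{\min} = P^2 f C_p^2\left[1 - \rho_{px}^2 - \frac{\left(\lambda_{03}\rho_{px} - \lambda_{12}\right)^2}{\lambda_{04} - 1 - \lambda_{03}^2}\right] \tag{3.5}$$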
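The optimisation leading from (3.3) to (3.5) can be checked numerically. The sketch below treats (3.3) as a quadratic in $\alpha$ and $\beta$, evaluates the optimum from (3.4), and confirms that the value of (3.3) at that point agrees with the closed form (3.5). The function names are illustrative, and the inputs are the summary values quoted in the empirical study (Section 5).

```python
def mse_t1(alpha, beta, P, f, C_p, C_x, rho_px, lam12, lam04, lam03):
    """First-order MSE of t_1 as a function of alpha and beta, equation (3.3)."""
    return P**2 * f * (
        C_p**2
        + alpha**2 * C_x**2
        + beta**2 * (lam04 - 1.0)
        + 2.0 * alpha * rho_px * C_p * C_x
        + 2.0 * beta * C_p * lam12
        + 2.0 * alpha * beta * C_x * lam03
    )

def optimum_t1(C_p, C_x, rho_px, lam12, lam04, lam03):
    """Optimum alpha and beta, equation (3.4)."""
    denom = lam04 - 1.0 - lam03**2
    alpha = C_p * (lam03 * lam12 - rho_px * (lam04 - 1.0)) / (C_x * denom)
    beta = C_p * (rho_px * lam03 - lam12) / denom
    return alpha, beta

def min_mse_t1(P, f, C_p, rho_px, lam12, lam04, lam03):
    """Closed-form minimum MSE, equation (3.5)."""
    denom = lam04 - 1.0 - lam03**2
    return P**2 * f * C_p**2 * (
        1.0 - rho_px**2 - (lam03 * rho_px - lam12) ** 2 / denom
    )

# Summary values quoted in the empirical study (Section 5).
n, N = 11, 40
P, C_p, C_x, rho_px = 0.525, 0.963, 0.308, 0.897
lam12, lam04, lam03 = -0.118, 1.75, -0.153
f = 1.0 / n - 1.0 / N

alpha_opt, beta_opt = optimum_t1(C_p, C_x, rho_px, lam12, lam04, lam03)
direct = mse_t1(alpha_opt, beta_opt, P, f, C_p, C_x, rho_px, lam12, lam04, lam03)
closed = min_mse_t1(P, f, C_p, rho_px, lam12, lam04, lam03)
pre_t1 = 100.0 * (f * P**2 * C_p**2) / closed
print("alpha*, beta*:", alpha_opt, beta_opt)
print("direct vs closed form:", direct, closed)
print("PRE of t_1:", round(pre_t1, 2))
```

With these rounded inputs the PRE comes out close to, though not identical with, the 513.92 reported in Table 5.1; the difference is presumably due to rounding in the published summary statistics.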

Following Srivastava and Jhajj (1981), we propose a general family of estimators for estimating $P$:

$$t_2 = H(p, u, v) \tag{3.6}$$

where $u = \bar{x}/\bar{X}$, $v = s_x^2/S_x^2$, and $H(p, u, v)$ is a parametric function of $p$, $u$ and $v$ such that

$$H(P, 1, 1) = P, \quad \forall\, P \tag{3.7}$$

and which satisfies the following regularity conditions:
(iv) Whatever the sample chosen, the point $(p, u, v)$ assumes values in a closed convex subset $R_3$ of the three-dimensional real space containing the point $(P, 1, 1)$.
(v) The function $H(p, u, v)$ is continuous and bounded in $R_3$.
(vi) The first and second order partial derivatives of $H(p, u, v)$ exist and are continuous as well as bounded in $R_3$.

Expanding $H(p, u, v)$ about the point $(P, 1, 1)$ in a second order Taylor series, we have

$$t_2 = H(p, u, v) = H\left[P + (p - P),\; 1 + (u - 1),\; 1 + (v - 1)\right] \tag{3.8}$$

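$$t_2 = P + P e_0 + e_1 H_1 + e_3 H_2 + P^2 e_0^2 H_3 + e_1^2 H_4 + e_3^2 H_5 + P e_0 e_1 H_6 + e_1 e_3 H_7 + P e_0 e_3 H_8 \tag{3.9}$$

where $e_0 = (p - P)/P$ is the error term written $e_p$ in Section 1, $\partial H/\partial p = 1$ at the point $(P, 1, 1)$, and

$$H_1 = \frac{\partial H}{\partial u}, \qquad H_2 = \frac{\partial H}{\partial v}, \qquad H_3 = \frac{1}{2}\frac{\partial^2 H}{\partial p^2}, \qquad H_4 = \frac{1}{2}\frac{\partial^2 H}{\partial u^2},$$

$$H_5 = \frac{1}{2}\frac{\partial^2 H}{\partial v^2}, \qquad H_6 = \frac{1}{2}\frac{\partial^2 H}{\partial p\,\partial u}, \qquad H_7 = \frac{1}{2}\frac{\partial^2 H}{\partial u\,\partial v}, \qquad H_8 = \frac{1}{2}\frac{\partial^2 H}{\partial p\,\partial v},$$

all evaluated at $p = P$, $u = 1$, $v = 1$.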

Taking expectations of both sides of (3.9), we get the bias of the estimator $t_2$, to the first order of approximation, as

$$B(t_2) = f\left[P^2 C_p^2 H_3 + C_x^2 H_4 + \left(\lambda_{04} - 1\right)H_5 + P\rho_{px} C_p C_x H_6 + C_x\lambda_{03} H_7 + P C_p\lambda_{12} H_8\right] \tag{3.10}$$

Squaring both sides of (3.9) and neglecting terms in the $e$'s of power greater than two, we have

$$(t_2 - P)^2 = P^2 e_0^2 + e_1^2 H_1^2 + e_3^2 H_2^2 + 2P e_0 e_1 H_1 + 2P e_0 e_3 H_2 + 2 e_1 e_3 H_1 H_2 \tag{3.11}$$

Taking expectations of both sides of (3.11), we get the MSE of the wider class of estimators $t_2$ as

$$\mathrm{MSE}(t_2) = f\left[P^2 C_p^2 + C_x^2 H_1^2 + \left(\lambda_{04} - 1\right)H_2^2 + 2P\rho_{px} C_p C_x H_1 + 2P C_p\lambda_{12} H_2 + 2 C_x\lambda_{03} H_1 H_2\right] \tag{3.12}$$

Differentiating (3.12) with respect to $H_1$ and $H_2$ and equating the derivatives to zero, we obtain the optimum values of $H_1$ and $H_2$ as

$$H_1^* = \frac{P C_p\left[\lambda_{03}\lambda_{12} - \rho_{px}\left(\lambda_{04} - 1\right)\right]}{C_x\left(\lambda_{04} - 1 - \lambda_{03}^2\right)}, \qquad H_2^* = \frac{P C_p\left(\rho_{px}\lambda_{03} - \lambda_{12}\right)}{\lambda_{04} - 1 - \lambda_{03}^2} \tag{3.13}$$

Substituting the values of $H_1^*$ and $H_2^*$ from (3.13) into (3.12), we obtain the minimum MSE of the estimator $t_2$ as

$$\mathrm{MSE}(t_2)_{\min} = P^2 f C_p^2\left[1 - \rho_{px}^2 - \frac{\left(\lambda_{03}\rho_{px} - \lambda_{12}\right)^2}{\lambda_{04} - 1 - \lambda_{03}^2}\right] \tag{3.14}$$

We propose another improved family of estimators for estimating $P$:

$$t_3 = m_1 p\left[\frac{\bar{X}}{\beta\bar{x} + (1 - \beta)\bar{X}}\right]^{g} + m_2 p\exp\left[\delta\,\frac{S_x^2 - s_x^2}{S_x^2 + s_x^2}\right] \tag{3.15}$$

where $\delta$ is a suitable constant; $g$ and $\beta$ are constants that can take the values (0, 1, -1) for designing different estimators; and $m_1$ and $m_2$ are suitably chosen constants to be determined such that the mean square error (MSE) of the class of estimators $t_3$ is minimum. Expressing the class of estimators $t_3$ in equation (3.15) in terms of the $e$'s, we have

$$t_3 = m_1 P\left(1 + e_0\right)\left(1 + \beta e_1\right)^{-g} + m_2 P\left(1 + e_0\right)\exp\left[-\frac{\delta e_3}{2}\left(1 + \frac{e_3}{2}\right)^{-1}\right] \tag{3.16}$$

Simplifying equation (3.16) and retaining terms up to the first order of approximation, we have

$$t_3 - P = P\left[-1 + m_1\left\{1 + e_0 - g\beta e_1 - g\beta e_0 e_1 + \frac{g(g + 1)}{2}\beta^2 e_1^2\right\} + m_2\left\{1 + e_0 - \frac{\delta}{2}e_3 - \frac{\delta}{2}e_0 e_3 + \frac{\delta(\delta + 2)}{8}e_3^2\right\}\right] \tag{3.17}$$

Taking expectations of both sides of equation (3.17), we get the bias of the estimator $t_3$, to the first order of approximation, as

$$\mathrm{Bias}(t_3) = P\left[m_1 B + m_2 E - 1\right] \tag{3.18}$$

where

$$B = 1 - g\beta f\rho_{px} C_p C_x + \frac{g(g + 1)}{2}\beta^2 f C_x^2, \qquad E = 1 - \frac{\delta}{2} f C_p\lambda_{12} + \frac{\delta(\delta + 2)}{8} f\left(\lambda_{04} - 1\right) \tag{3.19}$$

Squaring both sides of equation (3.17), neglecting the terms of power greater than two, and then taking expectations of both sides, we get the MSE of the estimator $t_3$, to the first order of approximation, as

$$\mathrm{MSE}(t_3) = P^2\left[1 + m_1^2 A + m_2^2 C + 2 m_1 m_2 D - 2 m_1 B - 2 m_2 E\right] \tag{3.20}$$

where

$$A = 1 + f\left[C_p^2 - 4 g\beta\rho_{px} C_p C_x + \beta^2 g(2g + 1) C_x^2\right],$$

$$C = 1 + f\left[C_p^2 - 2\delta C_p\lambda_{12} + \frac{\delta(\delta + 1)}{2}\left(\lambda_{04} - 1\right)\right],$$

$$D = 1 + f\left[C_p^2 - \delta C_p\lambda_{12} + \frac{\delta(\delta + 2)}{8}\left(\lambda_{04} - 1\right) - 2 g\beta\rho_{px} C_p C_x + \frac{g\beta\delta}{2} C_x\lambda_{03} + \frac{g(g + 1)}{2}\beta^2 C_x^2\right],$$

and $B$ and $E$ are as defined in (3.19). The MSE of the class of estimators $t_3$ in equation (3.20) is minimised for the optimum values of $m_1$ and $m_2$ given by

$$m_1^* = \frac{BC - DE}{AC - D^2} \qquad \text{and} \qquad m_2^* = \frac{AE - BD}{AC - D^2} \tag{3.21}$$
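The optimum weights in (3.21) are the solution of the two normal equations obtained by differentiating (3.20) with respect to $m_1$ and $m_2$. The sketch below checks this numerically for the scaled quantity $\mathrm{MSE}(t_3)/P^2$. The numerical values of $A$, $B$, $C$, $D$ and $E$ are hypothetical placeholders chosen only so that $AC - D^2 > 0$; they are not computed from the paper's data, and the function names are illustrative.

```python
def mse_t3_scaled(m1, m2, A, B, C, D, E):
    """MSE(t_3) / P^2 as a function of the weights, following (3.20)."""
    return 1.0 + m1**2 * A + m2**2 * C + 2.0 * m1 * m2 * D - 2.0 * m1 * B - 2.0 * m2 * E

def optimum_weights(A, B, C, D, E):
    """Optimum m_1 and m_2, following (3.21)."""
    det = A * C - D**2
    return (B * C - D * E) / det, (A * E - B * D) / det

# Hypothetical constants (placeholders, not values from the paper).
A, B, C, D, E = 1.20, 1.01, 1.15, 1.05, 0.99

m1_opt, m2_opt = optimum_weights(A, B, C, D, E)
at_optimum = mse_t3_scaled(m1_opt, m2_opt, A, B, C, D, E)
closed_form = 1.0 - (B**2 * C - 2.0 * B * D * E + A * E**2) / (A * C - D**2)

# Residuals of the normal equations; both should be numerically zero.
grad_m1 = A * m1_opt + D * m2_opt - B
grad_m2 = D * m1_opt + C * m2_opt - E

print("m1*, m2*:", m1_opt, m2_opt)
print("scaled MSE at optimum:", at_optimum)
print("closed form          :", closed_form)
```

Since $A > 0$ and $AC - D^2 > 0$ for these placeholder values, the quadratic form is positive definite and the stationary point is a minimum.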

Putting equations (3.21) in (3.18) and (3.20), we get the resulting bias and minimum MSE of the proposed class of estimators $t_3$, respectively, as

$$\mathrm{Bias}(t_3)_{\min} = -P\left[1 - \frac{B^2 C - 2BDE + AE^2}{AC - D^2}\right] \tag{3.22}$$

$$\mathrm{MSE}(t_3)_{\min} = P^2\left[1 - \frac{B^2 C - 2BDE + AE^2}{AC - D^2}\right] \tag{3.23}$$

4. Efficiency Comparisons

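First we compare the MSE of the proposed estimators $t_1$ and $t_2$ with that of the usual estimator $p$. We have $\mathrm{MSE}(t_1)_{\min} = \mathrm{MSE}(t_2)_{\min} \le V(p)$ if

$$f C_p^2\left[1 - \rho_{px}^2 - \frac{\left(\lambda_{03}\rho_{px} - \lambda_{12}\right)^2}{\lambda_{04} - 1 - \lambda_{03}^2}\right] \le f C_p^2 \tag{4.1}$$

On solving, we observe that this condition always holds, since both subtracted terms are non-negative when $\lambda_{04} - 1 - \lambda_{03}^2 > 0$.

Now we compare the efficiency of the proposed estimator $t_3$ with that of the usual estimator. We have $\mathrm{MSE}(t_3)_{\min} \le V(p)$ if

$$\left[1 - \frac{B^2 C - 2BDE + AE^2}{AC - D^2}\right] \le f C_p^2 \tag{4.2}$$

On solving, we observe that this condition always holds.

Next we compare the efficiency of the proposed estimator $t_3$ with that of the wider class of estimators $t_2$. We have $\mathrm{MSE}(t_3)_{\min} \le \mathrm{MSE}(t_2)_{\min} = \mathrm{MSE}(t_1)_{\min}$ if

$$\left[1 - \frac{B^2 C - 2BDE + AE^2}{AC - D^2}\right] \le f C_p^2\left[1 - \rho_{px}^2 - \frac{\left(\lambda_{03}\rho_{px} - \lambda_{12}\right)^2}{\lambda_{04} - 1 - \lambda_{03}^2}\right] \tag{4.3}$$

Finally, we compare the efficiency of the proposed estimator $t_3$ with that of the class of estimators $t_c$ proposed by Singh et al. (2010). We have $\mathrm{MSE}(t_3)_{\min} \le \mathrm{MSE}(t_c)_{\min}$ if

$$P^2\left[1 - \frac{B^2 C - 2BDE + AE^2}{AC - D^2}\right] \le P^2 - \frac{\Delta_1\Delta_5^2 + \Delta_3\Delta_4^2 - 2\Delta_2\Delta_4\Delta_5}{\Delta_1\Delta_3 - \Delta_2^2} \tag{4.4}$$

5. Empirical study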

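Data Statistics: The data used for the empirical study have been taken from Gujarati and Sangeetha (2007), p. 601. Here, Y is home ownership and X is income (thousands of dollars).

| $n$ | $N$ | $P$ | $\bar{X}$ | $\rho_{pb}$ | $C_p$ | $C_x$ | $\lambda_{12}$ | $\lambda_{04}$ | $\lambda_{03}$ |
|---|---|---|---|---|---|---|---|---|---|
| 11 | 40 | 0.525 | 14.4 | 0.897 | 0.963 | 0.308 | -0.118 | 1.75 | -0.153 |

The following table compares some existing estimators and the proposed estimators with respect to the usual estimator.

Table 5.1: Percent relative efficiency (PRE) of the estimators with respect to the usual estimator

| Estimator | PRE |
|---|---|
| $p$ | 100 |
| $t_a$ | 189.38 |
| $t_b$ | 511.79 |
| $t_c$ | 518.05 |
| $t_1$ | 513.92 |
| $t_2$ | 513.92 |
| $t_3$ ($g = 1$, $\beta = 1$) | 685.51 |
| $t_3$ ($g = \pm 1$, $\beta = \pm 1$; signs not legible in the source) | 199.20 |
| $t_3$ ($g = 0$, $\beta = 1$) | 141.23 |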

Examining Table 5.1, we observe that the proposed estimators $t_1$, $t_2$ and $t_3$ all perform better than the usual estimator $p$. Moreover, for the choice $g = 1$, $\beta = 1$, the proposed estimator $t_3$ is the best among the estimators considered in this paper and performs better than the estimators proposed by Singh et al. (2010) for estimating $P$.

References

1. Abd-Elfattah, A.M., El-Sherpieny, E.A., Mohamed, S.M. and Abdou, O.F. (2010): Improvement in estimating the population mean in simple random sampling using information on auxiliary attribute. Applied Mathematics and Computation.
2. Gujarati, D.N. and Sangeetha (2007): Basic Econometrics. Tata McGraw-Hill.
3. Naik, V.D. and Gupta, P.C. (1996): A note on estimation of mean with known population proportion of an auxiliary character. Jour. Ind. Soc. Agr. Stat., 48(2), 151-158.
4. Jhajj, H.S., Sharma, M.K. and Grover, L.K. (2006): A family of estimators of population mean using information on auxiliary attribute. Pak. J. Statist., 22(1), 43-50.
5. Kadilar, C. and Cingi, H. (2005): A new estimator using two auxiliary variables. Applied Mathematics and Computation, 162, 901-908.
6. Khoshnevisan, M., Singh, R., Chauhan, P., Sawan, N. and Smarandache, F. (2007): A general family of estimators for estimating population mean using known value of some population parameter(s). Far East Journal of Theoretical Statistics, 22, 181-191.
7. Shabbir, J. and Gupta, S. (2007): On estimating the finite population mean with known population proportion of an auxiliary variable. Pakistan Journal of Statistics, 23(1), 1-9.
8. Singh, R., Chauhan, P., Sawan, N. and Smarandache, F. (2008): Ratio estimators in simple random sampling using information on auxiliary attribute. Pak. J. Stat. Oper. Res., 4(1), 47-53.
9. Singh, H.P. and Solanki, R.S. (2012): Improved estimation of population mean in simple random sampling using information on auxiliary attribute. Appl. Math. Comput., 218, 7798-7812.
10. Singh, R., Chauhan, P., Sawan, N. and Smarandache, F. (2007): Auxiliary information and a priori values in construction of improved estimators. Renaissance High Press.
11. Singh, R., Kumar, M. and Smarandache, F. (2010): Ratio estimators in simple random sampling when study variable is an attribute. World Applied Sciences Journal, 11(5), 586-589.
12. Singh, R., Kumar, M. and Smarandache, F. (2008): Almost unbiased estimator for estimating population mean using known value of some population parameter(s). Pak. J. Stat. Oper. Res., 4(2), 63-76.
13. Singh, R. and Kumar, M. (2011): A note on transformations on auxiliary variable in survey sampling. MASA, 6:1, 17-19.
14. Srivastava, S.K. and Jhajj, H.S. (1981): A class of estimators of population mean in survey sampling using auxiliary information. Biometrika, 68, 341-343.
15. Upadhyaya, L.N. and Singh, H.P. (1999): Use of transformed auxiliary variable in estimating the finite population mean. Biom. Jour., 41, 627-636.
16. Abu-Dayyeh, W.A., Ahmed, M.S., Ahmed, R.A. and Muttlak, H.A. (2003): Some estimators of a finite population mean using auxiliary information. Applied Mathematics and Computation, 139, 287-298.