Sampling Theory Improvement in Variance Estimation in Simple ...

2 downloads 0 Views 79KB Size Report
Isaki (1983) presented the ratio estimator for the population variance as s2 ratio = s2 y. S2 x s2 x. (1) where s2 y and s2 x are unbiased estimators of population ...
Communications in Statistics—Theory and Methods, 36: 2075–2081, 2007 Copyright © Taylor & Francis Group, LLC ISSN: 0361-0926 print/1532-415X online DOI: 10.1080/03610920601144046

Sampling Theory

Improvement in Variance Estimation in Simple Random Sampling CEM KADILAR AND HULYA CINGI Department of Statistics, Hacettepe University, Ankara, Turkey We propose a new estimator for the population variance using an auxiliary variable in simple random sampling and obtain the equations for its mean square error (MSE) and bias. In addition, theoretically, we show that the proposed estimator is more efficient than the traditional ratio and regression estimators, suggested by Isaki (1983), under certain conditions that are defined in this article. These conditions are satisfied with a numerical example. Keywords Auxiliary information; Bias; Efficiency; Mean square error; Ratio estimator; Regression estimator; Variance estimator. Mathematics Subject Classification 62D05.

1. Introduction Isaki (1983) presented the ratio estimator for the population variance as 2 sratio = sy2

Sx2  sx2

(1)

where sy2 and sx2 are unbiased estimators of population variances Sy2 and Sx2 , respectively. Prasad and Singh (1990) obtained the MSE and the bias equations of this estimator as follows:  2 MSE sratio   Sy4 2 y + 2 x − 2 (2)  2  2 B sratio  Sy 2 x − 1 −  − 1  (3) respectively, where  = n1 , 2 y = 402 is the kurtosis for the population of the study 20 variable, 2 x = 042 is the kurtosis for the population of the auxiliary variable,  = 22 and rs = 20 02

02 N j=1

yj −Y¯ r xj − X s N

(Kendall and Stuart, 1963). Here n is the sample

Received June 24, 2005; Accepted October 13, 2006 Address correspondence to Cem Kadilar, Department of Statistics, Hacettepe University, Beytepe, Ankara, Turkey; E-mail: [email protected]

2075

2076

Kadilar and Cingi

size, N is the population size,  X and  Y are population means of the auxiliary variable xi and the study variable yi , respectively. Isaki (1983) also presented the regression estimator for the population variance using auxiliary variable as 2 = sy2 + bSx2 − sx2  sreg

(4)

where b is a constant, which makes the MSE of the estimator minimum, when Sy2 VR −1 b = B =  x−1 . Here VR = S2 . Garcia and Cebrain (1996) obtained the MSE of this 2 x estimator as follows:    − 12 2 4 MSEsreg   Sy 2 y − 1 − (5) 2 x − 1 Note that this estimator is an unbiased estimator. When we compare MSE Eqs. (5) and (2), we see that the MSE of regression estimator is smaller than the MSE of ratio estimator for 2 x > 1. In other words, regression estimator, given in (4), is more efficient than the ratio estimator, given in (1), for 2 x > 1. There are some other important studies on this topic in literature such as Das and Tripathi (1978), Singh et al. (1988), Agrawal and Sthapit (1995), and Arcos et al. (2005). In these studies, the following population variance estimators are proposed: s12 = sy2 s32 =

   2   X S  s22 = sy2 2x  x¯ sx sy2  X

 X + ¯x −  X

 s42 =

sy2 Sx2 Sx2 + sx2 − Sx2 



X − x¯  s62 = 1 sy2 + 2 Sx2 − sx2  s52 = 1 sy2 + 2  s72 = ¯r Sx2 + srd  + 1 − sy2  s82 = sy2 + 1 Sx2 − sx2  + 2  X − x¯  where , 1 , 2 , 1 , 2 are suitable constants that minimize the MSE of the estimators, n n nN − 1 1 1  ¯  r¯ = ri  and srd = r − r¯ di − d Nn − 1 n i=1 n − 1 i=1 i   n Y 2 d yi −   di = xi −  X 2  and d¯ = i=1 i Here ri = n xi −  X

=

2. The Suggested Estimator Shabbir and Yaab (2003) proposed an estimator for the population mean as X y¯ SY = 1 − J ¯y + Jtb 

(6)

1+C

where J is a constant and tb = xy¯¯ 1+Cxy2 is called Beale’s estimator. Here, Cx and Cy x are the population coefficient of variations of xi and yi , respectively. Cxy = xy Cx Cy ,

Improvement in Variance Estimation

2077

 = 1−f , f = Nn , xy is the population coefficient of correlation between xi and yi , n y¯ and x¯ are sample means of xi and yi , respectively. Note that the optimum value of J is J ∗ = xy Cy /Cx . Adapting the estimator of Shabbir and Yaab (2003), given in (6), to the estimator for the population variance, we develop the following estimator: 2 = 1 sy2 + 2 tb  X spr

(7)

where 1 and 2 are the weights that satisfy the condition: 1 + 2 = 1. The MSE of the proposed estimator can be found using the first-degree approximation in the Taylor series method defined by  2  d d   MSE spr

(8)

where

ha b c ha b c ha b c d = 2 2 2 a b c Sy  X  Y Sy  X  Y Sy  X  Y   2 2 2 Vsy  covsy  x¯  covsy  y¯    2 V¯x cov¯x y¯    = cov¯x sy  cov¯y sy2  cov¯y x¯  V¯y

2 (see Wolter, 1985). Here, ha b c = hsy2  x¯  y¯  = spr . According to this definition, we obtain d for the proposed estimator as follows:

d = 1 

− 2 R 2 

1+C

where R = XY and  = 1+Cxy2 is a constant, whose value is approximately 1 in general x for a fixed sample size. We obtain the MSE of the proposed estimator using (8) as     2  MSE spr  12 V sy2 + 22 R2 2 V¯x + 22 2 V¯y − 21 2 Rcov x¯  sy2   + 21 2 cov y¯  sy2 − 222 R2 cov¯x y¯ 

(9)

To find the variance, autocovariance, and cross covariance terms in (9), we use the following equations:   2 Vmr  =  2r − r2 + r 2 2 r−1 − 2r r−1 r+1        Vmr  =  2r − r2  Vms  =  2s − s2 

(10) (11)

covmr  mq  =  r+q − r q + rq 2 r−1 q−1 − r r−1 q+1 − q r+1 q−1  covmr  m1  =  r+1 − r 2 r−1   covmrs muv  =  r+us+v − rs uv + ru 20 r−1s u−1v + sv 02 rs−1 uv−1 + rv 11 r−1s uv−1 + su 11 rs−1 u−1v − u r+1s u−1v  − v rs+1 uv−1 − r r−1s u+1v − s rs−1 uv+1

(12)

2078

Kadilar and Cingi

    covmrs muv  =  r+us+v − rs uv    covmrs muv  =  r+us+v − rs uv − u r+1s u−1v − v rs+1 uv−1 

(13) (14)

 respectively (see Kendall and Stuart, 1963, pp. 229–235), where mr =  nj=1 yj −  Y r , N     y − Y r Y r xj −  X s , mrs =  nj=1 yjr xjs , r = j=1 N j , mr =  nj=1 yjr , mrs =  nj=1 yj −  r =

N

r j=1 yj

N

, rs =

N

j=1

yj − Y r xj − X s N

, rs =

N

r s j=1 yj xj

N

. When r = 2 in (10), we have

Vsy2   Sy4 2 y − 1

(15)

When r = 1 and s = 1 in (11), we have V¯x  Sx2 

(16)

V¯y 

(17)

Sy2 

respectively. When r = 2 in (12), we have covsy2  y¯  = cov¯y sy2  =  30

(18)

When r = 0, s = 1, u = 1, and v = 0 in (13), we have cov¯x y¯   Syx 

(19)

where Syx is the population covariance between the study and auxiliary variables. When r = 0, s = 1, u = 2, and v = 0 in (14), we have cov¯x sy2  =  21

(20)

Using (15)–(20) in (9), we obtain the MSE equation of the proposed estimator as follows: 2 MSEspr   Sy4 12 2∗ y − 21 2 A + 22 B 

where 2∗ y = 2 y − 1, A = n2 MSE¯yr . Sy4

R 21 − 30 , Sy4

and B =

R2 2 VR Sy2

+

2 Sy2



2R2  . VR Sy Sx

(21) In other words,

B= Here, y¯ r is the ratio estimator for the population mean in simple random sampling and MSE¯yr   Sy2 − 2RSy Sx + R2 Sx2  (see Cochran, 1977). 2 The optimum values of 1 and 2 to minimize the MSE of spr can easily be shown as: 1∗ =

A+B A + 2∗ y and 2∗ = ∗ 2A + B + 2 y 2A + B + 2∗ y

(22)

The bias of the proposed estimator can be found using the second-degree approximation, 0n−2 , in the Taylor series method defined by 2  Bspr

3 3  1  dij Eˆi − i ˆj − j  2 i=1 j=1

(23)

Improvement in Variance Estimation

2079

2 , ht = hs2  x¯  y¯  = s2 , ˆ 1 = s2 , ˆ 2 = y¯ , (see Wolter, 1985), where dij = ˆ ht y pr y ˆ i j t=T  ˆ3 = x¯ and 1 = Sy2 , 2 =  Y , 3 = X . By these definitions, we can re-write (23) as  2 1 B spr  d Vs2  + d10 cov¯y sy2  + d20 cov¯x sy2  + d01 covsy2  y¯  + d11 V¯y 2 00 y  × d21 cov¯x y¯  + d02 covsy2  x¯  + d12 cov¯y x¯  + d22 V¯x (24) Using the definition dij and (15)–(20) in (24), we can easily obtain the bias of the proposed estimator as follows:  2  2    B spr  Syx + RSx2  X

(25)

3. Efficiency Comparisons In this section, firstly we compare the MSE of the proposed estimator with the MSE of the traditional ratio estimator given in (2). We have the condition as follows: 2 2 MSEmin spr  < MSEsratio 

1∗2 − 12 y − 1∗2 − 2 x + 2 − 21∗ 2∗ A + 2∗2 B < 0

(26)

When the condition (26) is satisfied, the proposed estimator is more efficient than the traditional ratio estimator given in (1). Secondly, we compare equations of the MSE between the proposed estimator and the traditional regression estimator given in (5) as follows:  2  2  MSEmin spr < MSE sreg 1∗2 − 12∗ y +

 − 12 − 21∗ 2∗ A + 2∗2 B < 0 2∗ x

(27)

where 2∗ x = 2 x − 1. When the condition (27) is satisfied, the proposed estimator is more efficient than the traditional regression estimator given in (4). As a result, we show that the MSE of the proposed estimator is smaller than the MSE of traditional ratio and regression estimators for the conditions (26) and (27), respectively. If we compare the bias of the proposed estimator, given in (25), with the bias of the ratio estimator, given in (3), we obtain the following equation: 2∗  Syx + RSx2  − Sy2 2∗ x −  − 1 < 0  X

(28)

When the condition (28) is satisfied, the proposed estimator is less biased than the traditional ratio estimator, given in (1).

4. Numerical Example We use the data of Kadilar and Cingi (2003) in this section. However, we consider the data of only the East Anatolia Region of Turkey, as we are interested in simple

2080

Kadilar and Cingi Table 1 Data statistics

N = 104 n = 20  = 0 865 Cy = 1 866 Cx = 1 653 Cyx = 2 668 Syx = 232470 06

 Y = 6 254  X = 13931 683 Sy = 11 670 Sx = 23026 133 2 x = 17 516 2 y = 16 523

 = 0 050  = 0 040  = 14 398 VR = 2.57E-07 1∗ = −0 005 2∗ = 1 005

A = −0 0845 B = 0 0018 R = 0 0004 21 = 9665672 33 30 = 5910 282  = 0 998

random sampling in this article. We apply the proposed and traditional estimators to this data set concerning the level of apple production (in 100 tones) (as study variable) and number of apple trees (as auxiliary variable) in 104 villages in the East Anatolia Region in 1999 (Source: Institute of Statistics, Republic of Turkey). The statistics of these data are given in Table 1. Using the statistics in Table 1, we obtain the values of conditions (26) and (27), as −5.241 and −4.653, respectively, for all sample sizes (n = 10, 20, 30, 40, etc.). Therefore, conditions are always satisfied for this data set. Thus, we can infer that the proposed estimator is more efficient than the traditional regression and ratio estimators. In addition, when we compare the bias of the proposed estimator with the bias of the traditional ratio estimator, we obtain the value of the condition (28) as approximately −390 for all sample sizes (n = 10, 20, 30, 40, etc.). Therefore, the bias of the proposed estimator is also less than of the traditional ratio estimator for this data set.

5. Conclusion We have developed a new population variance estimator whose MSE is smaller than the MSE of traditional ratio and regression estimators for the conditions (26) and (27), respectively. This theoretical inference is also supported by the result of an application with original data presented in Kadilar and Cingi (2003). In forthcoming studies, we hope to adapt the estimator, presented in this article, to the stratified random sampling as in Kadilar and Cingi (2005a) and hope to develop a variance estimator using two auxiliary variables as the estimator in Kadilar and Cingi (2005b).

References Agrawal, M. C., Sthapit, A. B. (1995). Unbiased ratio-type variance estimation. Statist. Probab. Lett. 25:361–364. Arcos, A., Rueda, M., Martinez, M. D., Gonzalez, S., Roman, Y. (2005). Incorporating the auxiliary information available in variance estimation. Appl. Math. Computat. 160:387–399. Cochran, W. G. (1977). Sampling Techniques. New York: John Wiley and Sons. Das, A. K., Tripathi, T. P. (1978). Use of auxiliary information in estimating the finite population variance. Sankhya 40:139–148. Garcia, M. R., Cebrain, A. A. (1996). Repeated substitution method: The ratio estimator for the population variance. Metrika 43:101–105.

Improvement in Variance Estimation

2081

Isaki, C. T. (1983). Variance estimation using auxiliary information. J. Amer. Statist. Assoc. 78:117–123. Kadilar, C., Cingi, H. (2003). Ratio estimators in stratified random sampling. Biometrical J. 45:218–225. Kadilar, C., Cingi, H. (2005a). A new ratio estimator in stratified random sampling. Commun. Statist. Theor. Meth. 34:597–602. Kadilar, C., Cingi, H. (2005b). A new estimator using two auxiliary variables. Appl. Math. Computat. 162:901–908. Kendall, M., Stuart, A. (1963). The Advanced Theory of Statistics: Distribution Theory. Vol. 1, London: Griffin. Prasad, B., Singh, H. P. (1990). Some improved ratio-type estimators of finite population variance in sample surveys. Commun. Statist. Theor. Meth. 19:1127–1139. Shabbir, J., Yaab, M. Z. (2003). Improvement over transformed auxiliary variable in estimating the finite population mean. Biometrical J. 45:723–729. Singh, H. P., Upadhyaya L. N., Namjoshi, U. D. (1988). Estimation of finite population variance. Curr. Sci. 57:1331–1334. Wolter, K. M. (1985). Introduction to Variance Estimation. Springer-Verlag.

Suggest Documents