
Model Assisted Statistics and Applications 1 (2005,2006) 125–130 IOS Press

Ratio method of estimation of population proportion using randomized response technique¹

Zaizai Yan (a, b)

(a) Science College of Inner Mongolia University of Technology, Hohhot, Inner Mongolia, 010051, P.R. China
(b) Management College of Inner Mongolia University of Technology, Hohhot, Inner Mongolia, 010051, P.R. China
E-mail: [email protected]

Abstract. In this paper, we present results of investigations concerning randomized response sampling. We consider a ratio estimation procedure that develops some existing investigations further, and provide a suitable ratio estimator for estimating the unknown proportion of people bearing a sensitive characteristic in a given community. The superiority of the proposed procedure over the Warner [9] procedure is demonstrated.

Keywords: Warner's randomized response technique, sensitive variable, auxiliary variable, estimation of proportion

1. Introduction

In surveys where some of the variables (or characteristics) concern incriminating or socially undesirable practices, asking directly about sensitive topics leads to some degree of evasiveness or non-cooperation, and refusals or untruthful answers may be encountered. One method of encouraging better cooperation from respondents is the randomized response technique introduced by Warner [9]. Warner's design applies to a population divided into two groups according to whether or not people belong to a certain sensitive group A (e.g., being homosexual). Each respondent uses a randomized response device to select one of the following two statements: (a) "I am a member of group A", (b) "I am not a member of group A", and gives a "yes" or "no" answer. The output of the randomized response device is not revealed to the interviewer. In this way the privacy of the respondents is protected, since the interviewer does not know which statement has been chosen. The purpose of the survey is to estimate, from the data obtained, the proportion π of the population belonging to group A. Warner [9] gave the following estimator of π:

$$\hat\pi_W = \frac{\bar z - 1 + p}{2p - 1}, \qquad p \neq 0.5,$$

where p is the proportion of cards bearing the first statement in the randomized response device, and $\bar z$ is the proportion of "yes" answers obtained from the n respondents selected by simple random sampling with replacement. The estimator $\hat\pi_W$ has been shown to be unbiased for π, with variance

$$V(\hat\pi_W) = \frac{\pi(1-\pi)}{n} + \frac{p(1-p)}{n(2p-1)^2}. \qquad (1)$$

A similar procedure is the unrelated question model of Greenberg et al., in which one of the two statements concerns a characteristic that is not stigmatizing. Many other authors, such as Horvitz et al. [5], N.S. Mangat [6], Anthony Y.C. Kuk [1], Chaudhuri [3] and Zaizai Yan [11], have proposed methods for conducting surveys on sensitive topics that improve the efficiency of the estimators and the level of respondent cooperation.

The use of auxiliary variables in the randomized response technique has been proposed by only a few authors. For example, Chaudhuri and Mukerjee [4] applied PPSWOR designs in RR surveys, and Arnab [2] considered the problem of estimating a population total using randomized response sampling. In direct surveys, by contrast, there is a great deal of research on improving accuracy by using auxiliary variables that are related to the sensitive characteristic of interest. It is important to use such auxiliary information appropriately in order to reduce the error arising from sampling and from the randomized response. In this paper, we attempt to improve accuracy by developing a ratio estimation procedure for randomized response surveys on sensitive topics.

¹ Supported by the Natural Science Foundation of Inner Mongolia (2005) and the Department of Education, Inner Mongolia (NJ03004).

ISSN 1574-1699/05/06/$17.00 © 2005/2006 – IOS Press and the authors. All rights reserved

Table 1
An artificial population of size N = 6

i    X_i   Y_i
1    2     0
2    3     0
3    5     1
4    5     1
5    7     1
6    8     1

2. The proposed procedure

Let N (known) be the size of a finite population and π (unknown) the proportion of persons possessing some sensitive characteristic (say Y) in the population. Let $Y_i = 1$ if the ith person in the population possesses the sensitive characteristic Y, and $Y_i = 0$ otherwise. Let $X_i$ be the value of an auxiliary variable, say X, for the ith person in the population, i = 1, 2, …, N. Suppose that the auxiliary variable is positively correlated with the sensitive characteristic variable in the population; for example,

$$X_i = \begin{cases} 1 & \text{if the } i\text{th person smokes},\\ 0 & \text{if the } i\text{th person does not smoke}, \end{cases} \qquad Y_i = \begin{cases} 1 & \text{if the } i\text{th person is a drug addict},\\ 0 & \text{if the } i\text{th person is not a drug addict}. \end{cases}$$

In other words, $X_i$ tends to take larger values when $Y_i = 1$ and smaller values when $Y_i = 0$. Clearly $\pi = \frac{1}{N}\sum_{i=1}^{N} Y_i$.

Sampling design: a sample s of size n is selected from the population by simple random sampling with replacement.

Randomized device: a box contains cards bearing one of the following two pairs of statements:
(a) (1) I have the sensitive characteristic Y, and (2) I have the characteristic X;
(b) (1) I do not have the sensitive characteristic Y, and (2) I have the characteristic X;
with pre-assigned probabilities p and 1 − p, respectively.
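The card-drawing mechanism can be simulated to see what the interviewer actually records. The Python sketch below is illustrative only: it assumes each respondent answers the statement on the drawn card truthfully (so the answer about Y is y_i for card (a) and 1 − y_i for card (b)) and reports x_i directly; the function names and the use of the Table 1 population are choices made here, not part of the paper. It also applies Warner's estimator to the recorded "yes" answers.

```python
import random

# Table 1 population (units 1..6): auxiliary X_i and sensitive indicator Y_i
X = [2, 3, 5, 5, 7, 8]
Y = [0, 0, 1, 1, 1, 1]

def draw_srswr(N, n, rng):
    """Simple random sampling with replacement: n unit indices from 0..N-1."""
    return [rng.randrange(N) for _ in range(n)]

def randomized_responses(units, p, rng):
    """Return the ordered answers (z_i, x_i) for the sampled units.

    Each respondent privately draws card (a) with probability p (d_i = 1)
    or card (b) with probability 1 - p (d_i = 0) and answers truthfully,
    so z_i = d_i*y_i + (1 - d_i)*(1 - y_i). Only (z_i, x_i) is recorded.
    """
    answers = []
    for i in units:
        d = 1 if rng.random() < p else 0
        z = d * Y[i] + (1 - d) * (1 - Y[i])
        answers.append((z, X[i]))
    return answers

def warner_estimate(answers, p):
    """Warner's estimator (z_bar - 1 + p) / (2p - 1), using only the z_i."""
    z_bar = sum(z for z, _ in answers) / len(answers)
    return (z_bar - 1 + p) / (2 * p - 1)

rng = random.Random(2005)
answers = randomized_responses(draw_srswr(len(Y), 4, rng), 0.8, rng)
print(answers, warner_estimate(answers, 0.8))
```

Averaging `warner_estimate` over many simulated samples should come close to π = 2/3 for this population, in line with the unbiasedness of $\hat\pi_W$, although a single sample of size 4 can give values outside [0, 1].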

Randomized response technique: each respondent selected in the sample is asked to draw a card at random, independently, from the box, and then has to disclose the true values of the variables Y and X according to the card drawn. Let $Z_i$ denote the ith person's randomized response about Y, and let

$$D_i = \begin{cases} 1 & \text{if person } i \text{ in the population draws statement (a)},\\ 0 & \text{if person } i \text{ in the population draws statement (b)}. \end{cases}$$

Then $Z_i = D_iY_i + (1-D_i)(1-Y_i)$ and $E_R(Z_i) = pY_i + (1-p)(1-Y_i)$, say $U_i$, i = 1, 2, …, N, so that $\bar U = (2p-1)\pi + (1-p)$.

The n persons in the sample use the randomized device independently and give n ordered answers $(z_1, x_1), (z_2, x_2), \ldots, (z_n, x_n)$, where

$$z_i = \begin{cases} y_i & \text{if respondent } i \text{ in the sample draws statement (a)},\\ 1 - y_i & \text{if respondent } i \text{ in the sample draws statement (b)}, \end{cases} \qquad i = 1, 2, \ldots, n.$$

Let

$$d_i = \begin{cases} 1 & \text{if respondent } i \text{ in the sample draws statement (a)},\\ 0 & \text{if respondent } i \text{ in the sample draws statement (b)}. \end{cases}$$

Then $z_i = d_iy_i + (1-d_i)(1-y_i)$ and $E_R(z_i) = py_i + (1-p)(1-y_i)$, say $u_i$, i = 1, 2, …, n, so that $\bar u = (2p-1)\bar y + (1-p)$.

Since $U_i$ and $X_i$ are positively correlated when $p > 1/2$, and $P(z_i = 1) = \pi p + (1-\pi)(1-p)$, we follow the ideas of moment estimation and ratio estimation and set $\pi p + (1-\pi)(1-p) = (\bar z/\bar x)\,\bar X$. Solving for π gives the following ratio estimator of π:

$$\hat\pi_R = \alpha\,\frac{\bar z}{\bar x}\,\bar X - \beta, \qquad (2)$$

with $\alpha = \frac{1}{2p-1}$ and $\beta = \frac{1-p}{2p-1}$. Following Cochran [10], the bias and mean square error of $\hat\pi_R$ are readily obtained as follows:

$$E(\hat\pi_R) = \pi + \frac{1}{n}\left[\pi\left(C_X^2 - C_{YX}\right) + \frac{1-p}{2p-1}\,C_X^2\right] + O\!\left(\frac{1}{n^2}\right), \qquad (3)$$

$$MSE(\hat\pi_R) = \frac{\pi(1-\pi)}{n} + \frac{p(1-p)}{n(2p-1)^2} - \frac{\bar U}{n(2p-1)^2}\left\{2(2p-1)\pi C_{YX} - \bar U C_X^2\right\} + O\!\left(\frac{1}{n^2}\right). \qquad (4)$$
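Equations (2) to (4) are easy to evaluate numerically. The Python sketch below is an illustration under stated assumptions, not the author's computation: it uses the population quantities $C_X = \sigma_X/\bar X$ and $C_{YX} = \sigma_{YX}/(\bar X\pi)$ with divisor N (the definitions the paper gives for these formulas), assumes simple random sampling with replacement, drops the $O(1/n^2)$ remainders, and also evaluates Warner's variance (1) and the sufficient condition of Theorem 1 in Section 3. The names `ratio_estimate` and `first_order_terms` are hypothetical.

```python
import math

def ratio_estimate(z, x, X_bar, p):
    """Ratio estimator (2): alpha * (z_bar / x_bar) * X_bar - beta."""
    alpha, beta = 1 / (2 * p - 1), (1 - p) / (2 * p - 1)
    return alpha * (sum(z) / len(z)) / (sum(x) / len(x)) * X_bar - beta

def first_order_terms(X, Y, n, p):
    """First-order bias (3) and MSE (4) of the ratio estimator, Warner's
    variance (1), and the quantities compared in Theorem 1.

    Population moments use divisor N; the O(1/n^2) remainders are dropped.
    """
    N = len(Y)
    pi = sum(Y) / N
    X_bar = sum(X) / N
    var_X = sum((x - X_bar) ** 2 for x in X) / N
    cov_YX = sum((y - pi) * (x - X_bar) for x, y in zip(X, Y)) / N
    C_X2 = var_X / X_bar ** 2                  # C_X^2
    C_YX = cov_YX / (X_bar * pi)               # C_YX
    U_bar = (2 * p - 1) * pi + (1 - p)         # population mean of U_i
    v_warner = pi * (1 - pi) / n + p * (1 - p) / (n * (2 * p - 1) ** 2)
    bias = (pi * (C_X2 - C_YX) + (1 - p) / (2 * p - 1) * C_X2) / n
    mse = v_warner - U_bar / (n * (2 * p - 1) ** 2) * (
        2 * (2 * p - 1) * pi * C_YX - U_bar * C_X2)
    # Theorem 1: the ratio estimator wins (for large n) when rho exceeds this
    C_X, C_Y = math.sqrt(C_X2), math.sqrt((1 - pi) / pi)
    rho = cov_YX / math.sqrt(var_X * pi * (1 - pi))
    threshold = 0.5 * U_bar / ((2 * p - 1) * pi) * C_X / C_Y
    return {"bias": bias, "mse_ratio": mse, "var_warner": v_warner,
            "rho": rho, "threshold": threshold}

X = [2, 3, 5, 5, 7, 8]   # Table 1
Y = [0, 0, 1, 1, 1, 1]
print(ratio_estimate([1, 1, 0, 0], [2, 3, 5, 5], 5.0, 0.8))
print(first_order_terms(X, Y, n=4, p=0.8))
```

For the Table 1 population with n = 4 and p = 0.8 this gives $\rho_{YX} \approx 0.85$ against a threshold of about 0.44, so the condition of Theorem 1 holds. These are with-replacement approximations, whereas the empirical study in Section 4 uses sampling without replacement.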



Table 2
Warner estimator for simple random samples of size n = 4 from the population of size N = 6 using the randomized device (design parameter p = 0.8)

SN  SSU      x̄     P_R(z̄=0/4)  π̂_W      P_R(z̄=1/4)  π̂_W     P_R(z̄=2/4)  π̂_W  P_R(z̄=3/4)  π̂_W     P_R(z̄=4/4)  π̂_W     E_R(π̂_W − π)²
1   1,2,3,4  3.75  0.0256      −0.3333  0.2176      0.0833  0.5136      0.5   0.2176      0.9166  0.0256      1.3333  0.138898368
2   1,2,3,5  4.25  0.0256      −0.3333  0.2176      0.0833  0.5136      0.5   0.2176      0.9166  0.0256      1.3333  0.138898368
3   1,2,3,6  4.50  0.0256      −0.3333  0.2176      0.0833  0.5136      0.5   0.2176      0.9166  0.0256      1.3333  0.138898368
4   1,2,4,5  4.25  0.0256      −0.3333  0.2176      0.0833  0.5136      0.5   0.2176      0.9166  0.0256      1.3333  0.138898368
5   1,2,4,6  4.50  0.0256      −0.3333  0.2176      0.0833  0.5136      0.5   0.2176      0.9166  0.0256      1.3333  0.138898368
6   1,2,5,6  5.00  0.0256      −0.3333  0.2176      0.0833  0.5136      0.5   0.2176      0.9166  0.0256      1.3333  0.138898368
7   1,3,4,5  4.75  0.0064      −0.3333  0.0784      0.0833  0.3264      0.5   0.4864      0.9166  0.1024      1.3333  0.118031864
8   1,3,4,6  5.00  0.0064      −0.3333  0.0784      0.0833  0.3264      0.5   0.4864      0.9166  0.1024      1.3333  0.118031864
9   1,3,5,6  5.50  0.0064      −0.3333  0.0784      0.0833  0.3264      0.5   0.4864      0.9166  0.1024      1.3333  0.118031864
10  1,4,5,6  5.50  0.0064      −0.3333  0.0784      0.0833  0.3264      0.5   0.4864      0.9166  0.1024      1.3333  0.118031864
11  2,3,4,5  5.00  0.0064      −0.3333  0.0784      0.0833  0.3264      0.5   0.4864      0.9166  0.1024      1.3333  0.118031864
12  2,3,4,6  5.25  0.0064      −0.3333  0.0784      0.0833  0.3264      0.5   0.4864      0.9166  0.1024      1.3333  0.118031864
13  2,3,5,6  5.75  0.0064      −0.3333  0.0784      0.0833  0.3264      0.5   0.4864      0.9166  0.1024      1.3333  0.118031864
14  2,4,5,6  5.75  0.0064      −0.3333  0.0784      0.0833  0.3264      0.5   0.4864      0.9166  0.1024      1.3333  0.118031864
15  3,4,5,6  6.25  0.0016      −0.3333  0.0256      0.0833  0.1536      0.5   0.4096      0.9166  0.4096      1.3333  0.222169037

$V(\hat\pi_W) = E_d E_R(\hat\pi_W - \pi)^2 = 0.13320944$.
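The entries of Table 2 follow from two facts: given the sample, the number of "yes" answers is a sum of independent Bernoulli variables with success probability p (if y_i = 1) or 1 − p (if y_i = 0), and each of the 15 SRSWOR samples of size 4 has probability 1/15. The Python sketch below, an illustration rather than the author's computation, rebuilds the P_R columns this way and averages $E_R(\hat\pi_W - \pi)^2$ over all samples; the helper names are hypothetical.

```python
from itertools import combinations

# Artificial population of Table 1 (unit i -> X_i, Y_i)
X = [2, 3, 5, 5, 7, 8]
Y = [0, 0, 1, 1, 1, 1]

def yes_count_distribution(ys, p):
    """P_R(n * z_bar = k | sample) for k = 0..n (a Poisson-binomial law):
    each respondent says 'yes' with probability p if y_i = 1, else 1 - p."""
    dist = [1.0]
    for y in ys:
        q = p if y == 1 else 1 - p
        nxt = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            nxt[k] += prob * (1 - q)      # this respondent answers 'no'
            nxt[k + 1] += prob * q        # this respondent answers 'yes'
        dist = nxt
    return dist

def warner_exact_variance(X, Y, n, p):
    """E_d E_R (pi_W - pi)^2 over all SRSWOR samples of size n."""
    N = len(Y)
    pi = sum(Y) / N
    samples = list(combinations(range(N), n))
    total = 0.0
    for s in samples:
        dist = yes_count_distribution([Y[i] for i in s], p)
        for k, prob in enumerate(dist):
            pi_w = (k / n - 1 + p) / (2 * p - 1)    # Warner estimator
            total += prob * (pi_w - pi) ** 2
    return total / len(samples)    # each sample has probability 1 / C(N, n)

print(yes_count_distribution([0, 0, 1, 1], 0.8))   # row 1 of Table 2
print(warner_exact_variance(X, Y, 4, 0.8))
```

For the Table 1 population with p = 0.8, `warner_exact_variance` returns 2/15 ≈ 0.1333, which agrees with the closed-form SRSWOR value 0.133333333 quoted in Section 4.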


Table 3
The ratio estimator for simple random samples of size n = 4 from the population of size N = 6 using the randomized device (design parameter p = 0.8)

SN  SSU      x̄     P_R(z̄=0/4)  π̂_R      P_R(z̄=1/4)  π̂_R     P_R(z̄=2/4)  π̂_R     P_R(z̄=3/4)  π̂_R     P_R(z̄=4/4)  π̂_R     E_R(π̂_R − π)²
1   1,2,3,4  3.75  0.0256      −0.3333  0.2176      0.2222  0.5136      0.7777  0.2176      1.3333  0.0256      1.8888  0.209847625
2   1,2,3,5  4.25  0.0256      −0.3333  0.2176      0.1568  0.5136      0.6470  0.2176      1.1372  0.0256      1.6274  0.154172435
3   1,2,3,6  4.50  0.0256      −0.3333  0.2176      0.1296  0.5136      0.5926  0.2176      1.0555  0.0256      1.5185  0.142660566
4   1,2,4,5  4.25  0.0256      −0.3333  0.2176      0.1568  0.5136      0.6470  0.2176      1.1372  0.0256      1.6274  0.154172435
5   1,2,4,6  4.50  0.0256      −0.3333  0.2176      0.1296  0.5136      0.5926  0.2176      1.0555  0.0256      1.5185  0.142660566
6   1,2,5,6  5.00  0.0256      −0.3333  0.2176      0.0833  0.5136      0.5000  0.2176      0.9166  0.0256      1.3333  0.138898368
7   1,3,4,5  4.75  0.0064      −0.3333  0.0784      0.1052  0.3264      0.5438  0.4864      0.9824  0.1024      1.4210  0.142788361
8   1,3,4,6  5.00  0.0064      −0.3333  0.0784      0.0833  0.3264      0.5000  0.4864      0.9166  0.1024      1.3333  0.118031864
9   1,3,5,6  5.50  0.0064      −0.3333  0.0784      0.0454  0.3264      0.4242  0.4864      0.8030  0.1024      1.1818  0.092063609
10  1,4,5,6  5.50  0.0064      −0.3333  0.0784      0.0454  0.3264      0.4242  0.4864      0.8030  0.1024      1.1818  0.092063609
11  2,3,4,5  5.00  0.0064      −0.3333  0.0784      0.0833  0.3264      0.5000  0.4864      0.9166  0.1024      1.3333  0.118031864
12  2,3,4,6  5.25  0.0064      −0.3333  0.0784      0.0634  0.3264      0.4603  0.4864      0.8571  0.1024      1.2539  0.101781235
13  2,3,5,6  5.75  0.0064      −0.3333  0.0784      0.0289  0.3264      0.3913  0.4864      0.7536  0.1024      1.1159  0.087383546
14  2,4,5,6  5.75  0.0064      −0.3333  0.0784      0.0289  0.3264      0.3913  0.4864      0.7536  0.1024      1.1159  0.087383546
15  3,4,5,6  6.25  0.0016      −0.3333  0.0256      0.0000  0.1536      0.3333  0.4096      0.6666  0.4096      1.0000  0.075554419

$MSE(\hat\pi_R) = E_d E_R(\hat\pi_R - \pi)^2 = 0.123832937$, $E(\hat\pi_R) = E_d E_R(\hat\pi_R) = 0.667715136$, $B(\hat\pi_R) = 0.00104847$, $V(\hat\pi_R) = MSE(\hat\pi_R) - B^2(\hat\pi_R) = 0.123831838$.
SN denotes sample number, SSU denotes selected sample units.
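Table 3 can be reproduced in the same spirit for the ratio estimator (2). This Python sketch, again an illustration rather than the author's code, enumerates every SRSWOR sample and every combination of cards (a)/(b) directly with `itertools.product`, weighting each combination by $p^{\#(a)}(1-p)^{\#(b)}$. The function name is hypothetical, and because the estimator values are not rounded to four decimals, the results may differ from the printed Table 3 summaries in the later decimals.

```python
from itertools import combinations, product

X = [2, 3, 5, 5, 7, 8]   # Table 1: auxiliary values X_i
Y = [0, 0, 1, 1, 1, 1]   # Table 1: sensitive indicators Y_i

def ratio_exact_moments(X, Y, n, p):
    """Return (E, MSE) of the ratio estimator (2) over all SRSWOR samples
    of size n and all card draws, i.e. E_d E_R(pi_R) and E_d E_R(pi_R - pi)^2."""
    N = len(Y)
    pi = sum(Y) / N
    X_bar = sum(X) / N
    alpha, beta = 1 / (2 * p - 1), (1 - p) / (2 * p - 1)
    samples = list(combinations(range(N), n))
    mean = mse = 0.0
    for s in samples:
        x_bar = sum(X[i] for i in s) / n
        for d in product((0, 1), repeat=n):       # d_i = 1: card (a); 0: card (b)
            prob = 1.0
            z_sum = 0
            for di, i in zip(d, s):
                prob *= p if di else 1 - p
                z_sum += di * Y[i] + (1 - di) * (1 - Y[i])   # z_i
            pi_r = alpha * (z_sum / n) / x_bar * X_bar - beta
            mean += prob * pi_r
            mse += prob * (pi_r - pi) ** 2
    m = len(samples)                               # every sample has probability 1/m
    return mean / m, mse / m

mean_r, mse_r = ratio_exact_moments(X, Y, 4, 0.8)
print(mean_r, mse_r)
```

The brute-force loop over $2^n$ card combinations is only practical for tiny n, which is all the artificial example needs; for larger samples the Poisson-binomial recursion used for the "yes" counts is the cheaper route.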


In (3) and (4), $C_X = \sigma_X/\bar X$, $C_{YX} = \sigma_{YX}/(\bar X\,\bar Y) = \sigma_{YX}/(\bar X\pi)$, $\sigma_X^2 = \frac{1}{N}\sum_{i=1}^{N}(X_i - \bar X)^2$ and $\sigma_{YX} = \frac{1}{N}\sum_{i=1}^{N}(Y_i - \bar Y)(X_i - \bar X)$.

3. Efficiency comparison

Comparing Eq. (4) with Eq. (1) yields the following theorem, which gives a condition under which the proposed estimator $\hat\pi_R$ is more efficient than Warner's usual estimator $\hat\pi_W$.

Theorem 1. For sufficiently large n, if

$$\rho_{YX} > \frac{1}{2}\cdot\frac{(2p-1)\pi + (1-p)}{(2p-1)\pi}\cdot\frac{C_X}{C_Y},$$

then $V(\hat\pi_R) < V(\hat\pi_W)$, where $\rho_{YX} = \frac{\sigma_{YX}}{\sigma_X\sigma_Y}$ and $C_Y = \left(\frac{1-\pi}{\pi}\right)^{1/2}$.

Proof: By expression (4), when n is sufficiently large,

$$V(\hat\pi_R) \approx \frac{\pi(1-\pi)}{n} + \frac{p(1-p)}{n(2p-1)^2} - \frac{\bar U}{n(2p-1)^2}\left\{2(2p-1)\pi C_{YX} - \bar U C_X^2\right\}.$$

Since $C_{YX} = \rho_{YX}C_XC_Y$, it follows that

$$V(\hat\pi_W) - V(\hat\pi_R) = \frac{2\pi\bar U C_X C_Y}{n(2p-1)}\left(\rho_{YX} - \frac{1}{2}\cdot\frac{(2p-1)\pi + (1-p)}{(2p-1)\pi}\cdot\frac{C_X}{C_Y}\right) > 0.$$

Remark 1. In particular, when $C_X \approx C_Y$ (as happens, for example, when $X_i$ is earlier data on $Y_i$), the ratio estimator is more accurate than the simple estimator provided that

$$\rho_{YX} > \frac{1}{2}\cdot\frac{(2p-1)\pi + (1-p)}{(2p-1)\pi}.$$

In short, the method proposed in this paper is an efficient survey method for sensitive questions that involves no additional sampling cost.

4. Empirical study

To assess the magnitude of the gain in efficiency of the proposed ratio estimation procedure over Warner's [9] simple estimation procedure, an empirical study is undertaken for the artificial population of size N = 6 given in Table 1. For simplicity, the sampling design is simple random sampling without replacement. The mean square error and the variance of the proposed ratio estimator are calculated over all possible samples of size n = 4 (Tables 2 and 3). We then compare the proposed procedure with Warner's procedure. A simple calculation gives $\bar X = 5$, $S_X^2 = 5.2$, $\bar Y = \pi = 2/3$, $S_{YX} = 1$ and $S_Y^2 = 4/15$. Taking p = 0.8 and using the variance formula of the Warner model under simple random sampling without replacement, we have

$$V(\hat\pi_W) = \frac{1-f}{n}\cdot\frac{N}{N-1}\,\pi(1-\pi) + \frac{1}{n}\cdot\frac{p(1-p)}{(2p-1)^2} = 0.133333333,$$

where f = n/N is the sampling fraction. By comparison, Table 3 gives $V(\hat\pi_R) = 0.123831838$. This numerical example therefore shows that the proposed ratio estimation procedure is more efficient than Warner's simple estimation procedure ($V(\hat\pi_R) < V(\hat\pi_W)$).

Acknowledgement

The author wishes to thank two referees and the Editor-in-Chief, Dr. Sarjinder Singh, for their valuable comments and suggestions, which helped considerably in improving the earlier version of the paper.

References

[1] A.Y.C. Kuk, Asking sensitive questions indirectly, Biometrika 77(2) (1990), 436–438.
[2] R. Arnab, Randomized response surveys: estimation of a finite population total, Statistical Papers 39 (1998), 405–408.
[3] A. Chaudhuri, Using randomized response from a complex survey to estimate a sensitive proportion in a dichotomous finite population, Journal of Statistical Planning and Inference 94 (2001), 37–42.
[4] A. Chaudhuri and R. Mukerjee, Randomized Response: Theory and Techniques, Marcel Dekker, New York, 1988.
[5] D.G. Horvitz, B.V. Shah and W.R. Simmons, The unrelated question randomized response model, in: Proc. Statist. Sect., Am. Statist. Assoc., 1967, 65–72.
[6] N.S. Mangat and Ravindra Singh, An alternative randomized response procedure, Biometrika 77(2) (1990), 439–442.
[7] P.D. Bourke and M.A. Moran, Estimating proportions from randomized response data using the EM algorithm, J. Am. Statist. Assoc. 83(404) (1988), 964–968.
[8] S. Singh, A new stochastic randomized response model, Metrika 56 (2002), 131–142.
[9] S.L. Warner, Randomized response: a survey technique for eliminating evasive answer bias, J. Am. Statist. Assoc. 60 (1965), 63–69.
[10] W.G. Cochran, Sampling Techniques, 3rd ed., John Wiley & Sons, 1977.
[11] Zaizai Yan and Zankan Nie, A fair comparison of the randomized response strategies, Acta Mathematica Scientia 24A(3) (2004), 362–368.
