
Quality & Quantity 38: 675–680, 2004. © 2004 Kluwer Academic Publishers. Printed in the Netherlands.


On Estimating the Proportion of a Qualitative Sensitive Character Using Randomized Response Sampling

HORNG-JINH CHANG¹, CHIH-LI WANG² and KUO-CHUNG HUANG³

¹ Department of Management Sciences, Tamkang University, Tamsui, Taipei, Taiwan 251, R.O.C.
² Department of Applied Statistics and Information Sciences, Mingchuan University, Taoyuan, Taiwan 333, R.O.C.
³ Department of Business Administration, Chung-Yu Junior Institute of Technology, Keelung, Taiwan 201, R.O.C.

Abstract. In this paper, we present results on randomized response sampling. We consider a simple generalization of some existing procedures and then provide suitable choices for the design parameters. The superiority of the proposed procedure over Warner's (1965) procedure is also demonstrated.

1. Introduction

To procure reliable sample data for estimating $\pi$, the proportion of respondents possessing a sensitive attribute $A$, Warner (1965) proposed an ingenious procedure called the randomized response technique. In this procedure, a simple random sample of size $n$ is drawn from the population with replacement. Each sampled respondent is provided with a suitable randomization device comprising two statements, viz. “I am a member of A” and “I am not a member of A”, represented with probabilities $p_1$ and $1 - p_1$, respectively. The respondent randomly chooses one statement and reports “yes” or “no” according to his/her actual status, without revealing to the interviewer which of the two statements was chosen. Thus the privacy of the respondent is maintained throughout the procedure. Assuming truthful reporting, the probability of a “yes” answer, $\lambda_W$, is given by

$$\lambda_W = p_1 \pi + (1 - p_1)(1 - \pi). \tag{1.1}$$

Warner considered

$$\hat{\pi}_W = \frac{\hat{\lambda}_W - (1 - p_1)}{2p_1 - 1}, \qquad p_1 \neq \frac{1}{2}, \tag{1.2}$$


as an unbiased estimator of $\pi$, where $\hat{\lambda}_W$ is the observed proportion of “yes” answers. Since $n\hat{\lambda}_W$ follows the binomial distribution $B(n, \lambda_W)$, the variance of the estimator $\hat{\pi}_W$ is given by

$$V(\hat{\pi}_W) = \frac{\pi(1 - \pi)}{n} + \frac{p_1(1 - p_1)}{n(2p_1 - 1)^2}. \tag{1.3}$$
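As a rough numerical check of (1.2) and (1.3), Warner's device can be simulated directly. The following is only a sketch under the truthful-reporting assumption; the function names and the parameter values ($\pi = 0.3$, $p_1 = 0.7$, $n = 500$) are illustrative choices, not part of the original study.

```python
import random


def warner_variance(pi, p1, n):
    """Variance (1.3) of Warner's estimator."""
    return pi * (1 - pi) / n + p1 * (1 - p1) / (n * (2 * p1 - 1) ** 2)


def warner_estimate(answers, p1):
    """Estimator (1.2) computed from a list of 0/1 "yes" indicators."""
    lam_hat = sum(answers) / len(answers)
    return (lam_hat - (1 - p1)) / (2 * p1 - 1)


def simulate_warner(pi, p1, n, rng):
    """Simulate one survey of n truthful respondents and return the estimate."""
    answers = []
    for _ in range(n):
        in_a = rng.random() < pi        # hidden true status of the respondent
        drew_first = rng.random() < p1  # drew the statement "I am a member of A"
        # "yes" is reported exactly when the drawn statement is true for the respondent
        answers.append(1 if in_a == drew_first else 0)
    return warner_estimate(answers, p1)


if __name__ == "__main__":
    rng = random.Random(0)
    pi, p1, n, reps = 0.3, 0.7, 500, 2000  # illustrative values only
    estimates = [simulate_warner(pi, p1, n, rng) for _ in range(reps)]
    mean = sum(estimates) / reps
    var = sum((e - mean) ** 2 for e in estimates) / (reps - 1)
    print(f"mean estimate: {mean:.4f} (true pi = {pi})")
    print(f"empirical variance: {var:.6f}, formula (1.3): {warner_variance(pi, p1, n):.6f}")
```

Over many replications the mean estimate should sit near $\pi$ and the empirical variance near the value of (1.3), up to simulation noise.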

In practice $p_1$ is chosen to be larger than 1/2, and its value should be as close to unity as practicable. Several other workers have contributed to the further development of Warner's (1965) strategy (Abul-Ela et al., 1967; Horvitz et al., 1967; Greenberg et al., 1969; Moors, 1971; Erikson, 1973; Folsom et al., 1973; Raghavarao, 1978; Mangat and Singh, 1990; and Mangat, 1994). In particular, Mangat et al. (1995) proposed a randomized response device that consists of a deck having three types of cards. The statements “I belong to group A” and “I do not belong to group A” are represented with probabilities $p_1$ (same as in Warner's case) and $p_2$ ($p_1 \neq p_2$), respectively, whereas a proportion $p_{3M}$ of the cards is left blank, such that $p_1 + p_2 + p_{3M} = 1$. If a blank card is drawn, the respondent reports “no” irrespective of his/her actual status with respect to the sensitive attribute. An unbiased estimator of $\pi$ is therefore given by

$$\hat{\pi}_M = \frac{\hat{\lambda}_M - p_2}{p_1 - p_2}, \tag{1.4}$$

with variance given by

$$V(\hat{\pi}_M) = \frac{\pi(1 - \pi)}{n} + \frac{\pi p_{3M}}{n(p_1 - p_2)} + \frac{p_2(1 - p_2)}{n(p_1 - p_2)^2}, \tag{1.5}$$

where $\hat{\lambda}_M$ is the observed proportion of “yes” answers in the sample. Recently, Bhargava and Singh (2000) proposed an alternative randomized response procedure that operates in the same manner as that of Mangat et al. (1995), with one modification: when a blank card is drawn, the respondent is required to say “yes”. With $p_{3BS}$ denoting the proportion of blank cards, an unbiased estimator of $\pi$ can be obtained as

$$\hat{\pi}_{BS} = \frac{\hat{\lambda}_{BS} - (p_2 + p_{3BS})}{p_1 - p_2}, \tag{1.6}$$

and its variance is given by

$$V(\hat{\pi}_{BS}) = \frac{\pi(1 - \pi)}{n} - \frac{\pi p_{3BS}}{n(p_1 - p_2)} + \frac{p_1(1 - p_1)}{n(p_1 - p_2)^2}, \tag{1.7}$$

where $\hat{\lambda}_{BS}$ is the proportion of “yes” answers.
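The variances (1.5) and (1.7) can be compared numerically for a common deck composition. The sketch below assumes both devices use the same $p_1$ and $p_2$, so that the blank-card proportion is $1 - p_1 - p_2$ in each; the function names and the values $p_1 = 0.7$, $p_2 = 0.1$, $n = 100$ are illustrative assumptions.

```python
def mangat_variance(pi, p1, p2, n):
    """Variance (1.5): blank cards (proportion 1 - p1 - p2) force a "no" answer."""
    p3 = 1 - p1 - p2
    d = p1 - p2
    return pi * (1 - pi) / n + pi * p3 / (n * d) + p2 * (1 - p2) / (n * d ** 2)


def bhargava_singh_variance(pi, p1, p2, n):
    """Variance (1.7): blank cards (proportion 1 - p1 - p2) force a "yes" answer."""
    p3 = 1 - p1 - p2
    d = p1 - p2
    return pi * (1 - pi) / n - pi * p3 / (n * d) + p1 * (1 - p1) / (n * d ** 2)


if __name__ == "__main__":
    p1, p2, n = 0.7, 0.1, 100  # illustrative values only
    for pi in (0.2, 0.5, 0.8):
        v_m = mangat_variance(pi, p1, p2, n)
        v_bs = bhargava_singh_variance(pi, p1, p2, n)
        print(f"pi = {pi}: V_M = {v_m:.5f}, V_BS = {v_bs:.5f}")
```

Under this common-deck assumption, subtracting (1.5) from (1.7) appears to reduce to $p_3(1 - 2\pi)/[n(p_1 - p_2)]$ with $p_3 = 1 - p_1 - p_2$, so for $p_1 > p_2$ the sign of the difference is governed by whether $\pi$ lies above or below 1/2.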


The present paper considers a simple generalization of the above two randomized response procedures. An attempt is made to provide an optimal solution that achieves a higher gain in efficiency. We also study its performance and compare it with Warner's (1965) procedure.

2. The Proposed Procedure

In the proposed procedure, a randomization device consisting of a deck with four types of cards is used. The statements “I belong to group A” and “I do not belong to group A” are represented with probabilities $p_1$ (same as in Warner's case) and $p_2$ ($p_1 \neq p_2$), respectively, whereas a proportion $p_3$ of the cards bears the statement “yes” and a proportion $p_4$ bears the statement “no”, such that $\sum_{j=1}^{4} p_j = 1$. When a “yes” or “no” card is drawn, the respondent is required to say “yes” or “no” accordingly, irrespective of his/her actual status concerning the sensitive attribute. The probability of a “yes” answer in this case is given by

$$\lambda = p_1 \pi + p_2(1 - \pi) + p_3. \tag{2.1}$$

After solving it for $\pi$, we obtain an estimator of $\pi$ as

$$\hat{\pi} = \frac{\hat{\lambda} - (p_2 + p_3)}{p_1 - p_2}, \tag{2.2}$$

where $\hat{\lambda}$ is the proportion of “yes” answers reported by the sampled individuals. Since $n\hat{\lambda}$ is a binomial random variable with parameters $(n, \lambda)$, $\hat{\lambda}$ is an unbiased estimator of $\lambda$.

THEOREM 2.1. The estimator $\hat{\pi}$ is unbiased with variance given by

$$V(\hat{\pi}) = \frac{\pi(1 - \pi)}{n} - \frac{\pi(p_3 - p_4)}{n(p_1 - p_2)} + \frac{(p_1 + p_4)(1 - p_1 - p_4)}{n(p_1 - p_2)^2}. \tag{2.3}$$

Proof. The unbiasedness follows from the fact that $\hat{\lambda}$ is an unbiased estimator of $\lambda$. Since $n\hat{\lambda}$ is binomially distributed with parameters $(n, \lambda)$, we have

$$V(\hat{\pi}) = \frac{\lambda(1 - \lambda)}{n(p_1 - p_2)^2}. \tag{2.4}$$

On substituting the expression (2.1) for $\lambda$ in (2.4), and after some algebraic simplification, we get (2.3).

An unbiased estimator of $V(\hat{\pi})$ can easily be obtained; it is given in the following theorem.

THEOREM 2.2. An unbiased estimator of the variance $V(\hat{\pi})$ is given by

$$\hat{V}(\hat{\pi}) = \frac{\hat{\lambda}(1 - \hat{\lambda})}{(n - 1)(p_1 - p_2)^2}. \tag{2.5}$$
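The results of this section can be illustrated with a small simulation of the four-card device. This is a minimal sketch assuming truthful reporting; the function names and the deck $(p_1, p_2, p_3, p_4) = (0.6, 0.1, 0.2, 0.1)$ are hypothetical choices made only for illustration. It also checks numerically that the closed form (2.3) agrees with the binomial form (2.4).

```python
import random


def proposed_variance(pi, p, n):
    """Closed form (2.3) for card proportions p = (p1, p2, p3, p4)."""
    p1, p2, p3, p4 = p
    d = p1 - p2
    return (pi * (1 - pi) / n
            - pi * (p3 - p4) / (n * d)
            + (p1 + p4) * (1 - p1 - p4) / (n * d ** 2))


def binomial_form_variance(pi, p, n):
    """Equivalent form (2.4), with lambda taken from (2.1)."""
    p1, p2, p3, _ = p
    lam = p1 * pi + p2 * (1 - pi) + p3
    return lam * (1 - lam) / (n * (p1 - p2) ** 2)


def simulate_survey(pi, p, n, rng):
    """Return (pi_hat, v_hat) from (2.2) and (2.5) for one simulated sample."""
    p1, p2, p3, _ = p
    yes = 0
    for _ in range(n):
        in_a = rng.random() < pi      # hidden true status
        u = rng.random()              # which card type is drawn
        if u < p1:                    # card "I belong to group A"
            yes += in_a
        elif u < p1 + p2:             # card "I do not belong to group A"
            yes += not in_a
        elif u < p1 + p2 + p3:        # card "yes"
            yes += 1
        # otherwise the card says "no", so nothing is added
    lam_hat = yes / n
    d = p1 - p2
    pi_hat = (lam_hat - (p2 + p3)) / d
    v_hat = lam_hat * (1 - lam_hat) / ((n - 1) * d ** 2)
    return pi_hat, v_hat


if __name__ == "__main__":
    rng = random.Random(0)
    pi, p, n, reps = 0.3, (0.6, 0.1, 0.2, 0.1), 400, 2000  # illustrative values
    results = [simulate_survey(pi, p, n, rng) for _ in range(reps)]
    mean_pi = sum(r[0] for r in results) / reps
    mean_v = sum(r[1] for r in results) / reps
    print(f"mean of pi_hat: {mean_pi:.4f} (true pi = {pi})")
    print(f"mean of v_hat: {mean_v:.6f}, formula (2.3): {proposed_variance(pi, p, n):.6f}")
```

Averaged over replications, both the estimate (2.2) and the variance estimate (2.5) should approach their targets, in line with Theorems 2.1 and 2.2.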



3. Optimal Choice of Parameter Design

Following the recommendation of Greenberg et al. (1969), a good working rule for the selection of $p_1$ is that it should be chosen as large as practicable, with $p_1 > 1/2$. In this regard, we study below how the other design parameters $p_2$, $p_3$ and $p_4$ should be selected.

First, we are interested in the optimal choices of $p_3$ and $p_4$ when $p_1$ and $p_2$ are given in advance. Because of the relation $\sum_{j=1}^{4} p_j = 1$, the variance of $\hat{\pi}$ can be written in the form

$$V(\hat{\pi}) = \frac{\pi(1 - \pi)}{n} - \frac{\pi(p_1 + p_2 + 2p_3 - 1)}{n(p_1 - p_2)} + \frac{(p_2 + p_3)(1 - p_2 - p_3)}{n(p_1 - p_2)^2}. \tag{3.1}$$

It can be shown that this variance is concave in $p_3$ (its second derivative with respect to $p_3$ is $-2/[n(p_1 - p_2)^2] < 0$), so the minimum is attained at an endpoint: the optimal choice of $p_3$ is either 0 or $1 - p_1 - p_2$ (that is, $p_4 = 0$). These two choices are identical to the procedures of Mangat et al. (1995) and Bhargava and Singh (2000), respectively. Indeed, Bhargava and Singh (2000) have pointed out that when $\pi \geq 1/2$ the latter procedure is certainly better than that of Mangat et al. (1995), whereas when $\pi < 1/2$ the procedure of Mangat et al. (1995) performs better.

Next, we study the optimal choices of $p_2$ and $p_3$ or $p_4$ when $p_1$ remains unchanged. Proceeding as above, the first derivative of the corresponding variance expression with respect to $p_2$ is always positive. In other words, the optimal value of $p_2$ is zero for both the Mangat et al. (1995) and the Bhargava and Singh (2000) procedures. From these discussions we conclude the following.

CASE I. When $\pi > 1/2$, the optimal randomized response device carries the two statements “I belong to group A” and “yes”, represented with probabilities $p_1$ and $1 - p_1$, respectively, so that the probability of a “yes” answer is $\lambda_1 = p_1\pi + (1 - p_1)$. An unbiased estimator of $\pi$ is given by

$$\hat{\pi}_1 = \frac{\hat{\lambda}_1 - (1 - p_1)}{p_1}, \tag{3.2}$$

with variance

$$V(\hat{\pi}_1) = \frac{\pi(1 - \pi)}{n} + \frac{(1 - \pi)(1 - p_1)}{np_1}, \tag{3.3}$$

where $\hat{\lambda}_1$ is the proportion of “yes” answers.

CASE II. When $\pi < 1/2$, the optimal randomized response device carries the two statements “I belong to group A” and “no”, represented with probabilities $p_1$ and $1 - p_1$, respectively, so that the probability of a “yes” answer is $\lambda_2 = p_1\pi$. An unbiased estimator of $\pi$ is given by

$$\hat{\pi}_2 = \frac{\hat{\lambda}_2}{p_1}, \tag{3.4}$$


with variance

$$V(\hat{\pi}_2) = \frac{\pi(1 - \pi)}{n} + \frac{\pi(1 - p_1)}{np_1}, \tag{3.5}$$

where $\hat{\lambda}_2$ is the proportion of “yes” answers.

CASE III. When $\pi = 1/2$, the randomized response devices presented in Cases I and II are equally efficient, and both are optimal.

4. Efficiency Comparison

In the present section, we study the performance of the proposed procedure relative to Warner's (1965) procedure. When $\pi \geq 1/2$, it is easy to derive that

$$V(\hat{\pi}_1) - V(\hat{\pi}_W) = \frac{-(1 - p_1)\left[(1 - p_1)(3p_1 - 1) + \pi(2p_1 - 1)^2\right]}{np_1(2p_1 - 1)^2}, \tag{4.1}$$

which is always less than zero when $p_1 > 1/3$. When $\pi < 1/2$, we have

$$V(\hat{\pi}_2) - V(\hat{\pi}_W) = \frac{(1 - p_1)\left[(2p_1 - 1)^2\pi - p_1^2\right]}{np_1(2p_1 - 1)^2}, \tag{4.2}$$

which is also less than zero when $p_1 > 1/3$. In general, the value of $p_1$ is always chosen to be larger than 1/2. In other words, the proposed procedure is more efficient than Warner's (1965) procedure in all practical situations.

To get an idea of the magnitude of the relative efficiency of the proposed procedure in relation to Warner's (1965) procedure, we resort to an empirical study. The percent relative efficiency of the proposed procedure with respect to Warner's procedure is defined as

$$\mathrm{PRE} = \frac{V(\hat{\pi}_W)}{V(\hat{\pi}_i)} \times 100, \tag{4.3}$$

where $i = 1, 2$ correspond to the cases $\pi \geq 1/2$ and $\pi < 1/2$, respectively. Here the values of $p_1$ are taken as 0.6, 0.7, 0.8 and 0.9, and the whole range of $\pi$ is considered. The PRE values computed are shown in Table I.

Table I. Percent relative efficiency of the proposed procedure with respect to Warner's procedure

$\pi$    $p_1 = 0.6$    $p_1 = 0.7$    $p_1 = 0.8$    $p_1 = 0.9$
0.1      3887.2         1055.6         464.7          228.1
0.2      2100.0         599.3          287.8          165.0
0.3      1514.6         449.7          229.6          144.1
0.4      1231.6         377.3          201.3          133.8
0.5      1071.4         336.5          185.2          127.8
0.6      1231.6         377.3          201.3          133.8
0.7      1514.6         449.7          229.6          144.1
0.8      2100.0         599.3          287.8          165.0
0.9      3887.2         1055.6         464.7          228.1

Table I clearly demonstrates that the proposed procedure is always superior to Warner's (1965) procedure. It is worth noting that the PRE values are symmetric in $\pi$ about 1/2 and convex in $\pi$. Hence higher efficiency is expected especially when $\pi$ is remote from 1/2. On the other hand, with $p_1$ close to 1/2, the

respondent is well protected and the PRE values become very large. This implies that a larger gain in relative efficiency results from a smaller value of $|p_1 - 1/2|$.

References

Abul-Ela, L. A., Greenberg, B. G. & Horvitz, D. G. (1967). A multiproportions randomized response model. Journal of the American Statistical Association 62: 990–1008.
Bhargava, M. & Singh, R. (2000). A modified randomization device for Warner's model. Statistica, anno. LX 2: 315–321.
Erikson, S. A. (1973). A new model for randomized response. International Statistical Review 41: 101–103.
Folsom, R. E., Greenberg, B. G. & Horvitz, D. G. (1973). The two alternate questions randomized response model for human surveys. Journal of the American Statistical Association 68: 525–530.
Greenberg, B. G., Abul-Ela, L. A., Simon, W. R. & Horvitz, D. G. (1969). The unrelated question randomized response model: theoretical framework. Journal of the American Statistical Association 64: 520–539.
Horvitz, D. G., Shah, B. V. & Simon, W. R. (1967). The unrelated question randomized response model. Proc. Social. Stat. Sec. (The American Statistical Association), pp. 65–72.
Mangat, N. S. (1994). An improved randomized response strategy. Journal of the Royal Statistical Society Series B 56: 93–95.
Mangat, N. S. & Singh, R. (1990). An alternate randomized response procedure. Biometrika 77: 439–442.
Mangat, N. S., Singh, S. & Singh, R. (1995). On use of a modified randomization device in Warner's model. Journal of the Indian Society of Statistics & Operational Research 16: 65–69.
Moors, J. J. A. (1971). Optimization of the unrelated question randomized response model. Journal of the American Statistical Association 66: 627–629.
Warner, S. L. (1965). Randomized response: a survey technique for eliminating evasive answer bias. Journal of the American Statistical Association 60: 63–69.
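As a supplementary check, the PRE values of Table I can be recomputed from (1.3), (3.3), (3.5) and (4.3). The sketch below reads the second term of (3.5) as $\pi(1 - p_1)/(np_1)$, which is the form that follows from the estimator (3.4); the function names are illustrative, and $n$ is set to 1 because it cancels in the ratio.

```python
def warner_variance(pi, p1, n=1):
    """Variance (1.3) of Warner's estimator."""
    return pi * (1 - pi) / n + p1 * (1 - p1) / (n * (2 * p1 - 1) ** 2)


def optimal_variance(pi, p1, n=1):
    """Variance (3.3) when pi >= 1/2, and (3.5) when pi < 1/2.

    Assumption: the second term of (3.5) is pi * (1 - p1) / (n * p1).
    """
    tail = (1 - pi) if pi >= 0.5 else pi
    return pi * (1 - pi) / n + tail * (1 - p1) / (n * p1)


def pre(pi, p1):
    """Percent relative efficiency (4.3); the sample size cancels in the ratio."""
    return 100 * warner_variance(pi, p1) / optimal_variance(pi, p1)


if __name__ == "__main__":
    p1_values = (0.6, 0.7, 0.8, 0.9)
    print("pi   " + "".join(f"p1={p1:<8}" for p1 in p1_values))
    for k in range(1, 10):
        pi = k / 10
        print(f"{pi:<5}" + "".join(f"{pre(pi, p1):<11.1f}" for p1 in p1_values))
```

The symmetry of the computed values about $\pi = 1/2$ follows from the way (3.3) and (3.5) mirror each other under the exchange of $\pi$ and $1 - \pi$.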