
Stat Papers (2012) 53:987–999 DOI 10.1007/s00362-011-0403-4 REGULAR ARTICLE

Efficiency comparison of unrelated question models based on same privacy protection degree
Sabrina Giordano · Pier Francesco Perri

Received: 7 October 2010 / Revised: 15 August 2011 / Published online: 1 October 2011 © Springer-Verlag 2011

S. Giordano (B) · P. F. Perri
Department of Economics and Statistics, University of Calabria, Via P. Bucci, Cubo 0C, 87036, Arcavacata di Rende (CS), Italy
e-mail: [email protected]

Abstract In this study, the problem of estimating the proportion $\pi_A$ of people bearing a sensitive attribute $A$ is considered. Three dichotomous unrelated question mechanisms which are alternative to the well-known Simmons’ model are discussed and their performance is evaluated taking into account both efficiency and respondent privacy protection. The variance of the estimators of $\pi_A$ is compared under equal levels of confidentiality measures introduced by Lanke (1976) and Leysieffer and Warner (1976).

Keywords Sensitive questions · Randomized response techniques · Simmons’ model · Jeopardy measures

1 Introduction

Surveys on controversial or highly personal and stigmatizing questions concerning, for example, drug addiction, abortion, rape, xenophobia and fraud can lead to unreliable data when people interviewed are required to answer directly. Consider, for instance, the case in which people are asked to give a yes or no response to the question “Have you ever used drugs?”. Such direct questioning can be met with refusal to cooperate or untruthful answers. To induce greater cooperation and collect data on sensitive topics while ensuring respondent anonymity, Warner (1965) introduced an ingenious tool known as the randomized response (RR) technique. According to it, the population under study is divided into two mutually exclusive groups, one with and the other without a stigmatizing attribute, say $A$. Hence, in order to estimate the proportion $\pi_A$ of population possessing $A$, a sample of $n$ respondents is selected and each one is provided with a deck of cards. On each card is written one of two statements: “I possess A” or “I do not possess A”, in a proportion which is fixed in advance by the researcher. Respondents are asked to select a card and report a yes or no response depending on whether their true status, with respect to $A$, matches the statement on the selected card or not. Respondents are assumed to answer truthfully and, since they are instructed not to reveal to anyone the question selected, their true status remains unknown and privacy is protected.

Different studies have assessed the validity of RR methods and shown that these can significantly outperform conventional data collection methods and procure more honest and reliable answers (see van der Heijden et al. (2000); Lara et al. (2004); Lensvelt-Mulders et al. (2005)). In the Netherlands, there is a well-established tradition regarding the use of RR data collection in official surveys mainly concerned with measuring the prevalence of fraud, especially in the area of disability benefits (see Lensvelt-Mulders et al. (2006)). However, many authors have used RR techniques in a wide range of empirical studies. Among others, Lara et al. (2004, 2006) have recently estimated the prevalence of abortion in Mexico; van der Heijden and Böckenholt (2008) discussed applications of the methodology in e-commerce; Ostapczuk et al. (2009) and Krumpal (2010) treated the issue of xenophobia and anti-semitism in Germany, while Arnab and Singh (2010) measured the impact of HIV infection in Botswana.

Different variants of Warner’s model have been suggested so far in order to enhance respondent cooperation and increase the efficiency of the estimates (see, among others, Diana and Perri (2009) and Chaudhuri et al. (2011) for some recent contributions).
A first alternative to Warner’s mechanism is known as Simmons’ model (Horvitz et al. (1967); Greenberg et al. (1969)). The technique provides respondents with the opportunity of replying to one of two statements: “I possess A” or “I possess B”, $B$ being an innocuous characteristic unrelated to $A$ and possessed by a known proportion $\pi_B$ of the population. Moreover, for the purpose of this paper, we mention Mangat et al. (1995) who designed an RR device by adding a number of blank cards to the two types of cards in Warner’s design. If a blank card is selected, respondents are asked to say no, irrespective of their true status. Bhargava and Singh (2000) considered the case when respondents are instructed to reply yes if a blank card is drawn. Similar procedures have been adopted by Singh et al. (2003) and Perri (2008) for Simmons’ model.

Usually, comparisons of different RR designs are only performed in terms of efficiency of the estimates, neglecting the degree of complexity and, above all, the level of confidentiality guaranteed. As pointed out by many authors (see, e.g. Nayak (1994); Bhargava and Singh (2002); Guerriero and Sandri (2007); Chaudhuri et al. (2009)), a fair comparison should consider both efficiency and privacy: the variance of different estimators is to be compared under equal levels of confidentiality. With this in mind, Lin (2005) compared the models of Warner (1965), Mangat et al. (1995) and Bhargava and Singh (2000) in terms of the efficiency/privacy criterion. Starting from Lin (2005), we intend, in this study, to compare the performance of the unrelated question models (UQMs) of Singh et al. (2003) and Perri (2008) with Simmons’ design. In so doing, we will consider the privacy protection measures proposed by Lanke (1976) and Leysieffer and Warner (1976).

The paper is organized as follows. Section 2 provides a general framework for the problem of estimating the proportion of people bearing a sensitive attribute when a dichotomous-response model is assumed. The UQMs mentioned earlier are discussed and the variance of the estimators given. Section 3 is devoted to the protection of privacy: some common respondent confidentiality measures are illustrated and the relevant relation between the variance of the estimator of $\pi_A$ and the degree of privacy is shown. In Sect. 4, the UQMs are compared in terms of variance under equal levels of confidentiality. Section 5 concludes the work with some final comments.

2 Unrelated question models

Consider a dichotomous population consisting of a sensitive group $A$ and its complement $\bar{A}$ in unknown proportions $\pi_A$ and $1 - \pi_A$. Consider a generic dichotomous RR design with a typical response $R$ which can be yes ($Y$) or no ($N$). Let $P(R|A)$ and $P(R|\bar{A})$ be the conditional probabilities that a response $R$ comes from a member of groups $A$ and $\bar{A}$, respectively. These probabilities denote the design probabilities of an RR survey. They are controlled by the researcher and are known both to the interviewee and the interviewer. Without loss of generality, the probability of a yes response can be expressed as

$$\lambda = P(Y) = P(Y|A)\pi_A + P(Y|\bar{A})(1 - \pi_A) = [P(Y|A) - P(Y|\bar{A})]\pi_A + P(Y|\bar{A}). \tag{1}$$

To estimate $\pi_A$, a simple random sample of $n$ respondents is drawn with replacement from the population. Sampled individuals are instructed to execute the RR device and give a yes or no response according to the procedure. Let $n_1$ be the number of yes responses in the sample, $n_1 \sim \mathrm{Bin}(n, \lambda)$, and $\hat{\lambda} = n_1/n$. Then, from (1), the unbiased moment estimator of $\pi_A$ is

$$\hat{\pi}_A = \frac{\hat{\lambda} - P(Y|\bar{A})}{P(Y|A) - P(Y|\bar{A})} \tag{2}$$

which is defined if $P(Y|A) - P(Y|\bar{A}) \neq 0$ and has variance

$$V(\hat{\pi}_A) = \frac{\lambda(1 - \lambda)}{n[P(Y|A) - P(Y|\bar{A})]^2}. \tag{3}$$
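As a quick illustration of (2) and (3), the following Python sketch computes the moment estimate and its variance from the two design probabilities. The function names and the numerical inputs are illustrative assumptions, not part of the original paper; exact fractions are used so that the arithmetic can be checked by hand.

```python
from fractions import Fraction


def estimate_pi_a(n_yes, n, p_yes_a, p_yes_not_a):
    """Moment estimator (2) from the observed number of yes responses."""
    if p_yes_a == p_yes_not_a:
        raise ValueError("estimator undefined when P(Y|A) = P(Y|A-bar)")
    lam_hat = Fraction(n_yes, n)  # sample proportion of yes responses
    return (lam_hat - p_yes_not_a) / (p_yes_a - p_yes_not_a)


def variance_pi_a(pi_a, n, p_yes_a, p_yes_not_a):
    """Variance (3), with lambda computed from equation (1)."""
    lam = p_yes_a * pi_a + p_yes_not_a * (1 - pi_a)
    return lam * (1 - lam) / (n * (p_yes_a - p_yes_not_a) ** 2)


# Hypothetical design probabilities and sample counts, chosen for illustration.
P_YES_A = Fraction(17, 20)
P_YES_NOT_A = Fraction(3, 20)
estimate = estimate_pi_a(72, 200, P_YES_A, P_YES_NOT_A)
variance = variance_pi_a(Fraction(3, 10), 200, P_YES_A, P_YES_NOT_A)
```

The estimator only needs the two conditional yes-probabilities, which is why every design discussed below can be handled by specifying $P(Y|A)$ and $P(Y|\bar{A})$.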

These general expressions can be specified for a given RR device by defining the design probabilities. For instance, consider Simmons’ model and let $p$ and $1 - p$ be the design parameters, i.e. the probabilities of answering statements “I possess A” and “I possess B”, respectively. Then, $P(Y|A) = p + (1 - p)\pi_B$ and $P(Y|\bar{A}) = (1 - p)\pi_B$. From (2) and (3), we get the expression for the estimator of $\pi_A$ and its variance as

$$\hat{\pi}_S = \frac{\hat{\lambda} - (1 - p)\pi_B}{p}$$

and

$$V(\hat{\pi}_S) = \frac{\pi_A(1 - \pi_A)}{n} + \frac{\pi_A(1 - p)(1 - 2\pi_B)}{np} + \frac{\pi_B(1 - p)[1 - (1 - p)\pi_B]}{np^2}. \tag{4}$$
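To make the mechanics of Simmons’ device concrete, here is a small simulation sketch. It assumes, as the text does, truthful answers and an innocuous attribute $B$ independent of $A$; the function names, the seed and the parameter values are illustrative choices rather than anything prescribed by the paper.

```python
import random


def simulate_simmons(pi_a, pi_b, p, n, seed=0):
    """Simulate n truthful respondents and return the estimate of pi_a."""
    rng = random.Random(seed)
    n_yes = 0
    for _ in range(n):
        has_a = rng.random() < pi_a   # true sensitive status
        has_b = rng.random() < pi_b   # innocuous attribute, independent of A
        asked_a = rng.random() < p    # card "I possess A" is drawn with probability p
        n_yes += has_a if asked_a else has_b
    lam_hat = n_yes / n
    return (lam_hat - (1 - p) * pi_b) / p


def simmons_variance(pi_a, pi_b, p, n):
    """Closed-form variance (4) of the Simmons estimator."""
    return (pi_a * (1 - pi_a) / n
            + pi_a * (1 - p) * (1 - 2 * pi_b) / (n * p)
            + pi_b * (1 - p) * (1 - (1 - p) * pi_b) / (n * p ** 2))


estimate = simulate_simmons(0.3, 0.5, 0.7, 20000, seed=1)
std_error = simmons_variance(0.3, 0.5, 0.7, 20000) ** 0.5
```

With a sample this large, the simulated estimate should fall within a few standard errors of the true proportion used to generate the data.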

In Singh et al.’s (2003) RR model (hereafter Design 1), some cards show the statement “I possess A”, others “I possess B”, and the remaining cards are left blank. These three types of cards occur with probabilities $p_1$, $p_2$ and $p_3$ (design parameters), with $\sum_{i=1}^{3} p_i = 1$. The device works as in Simmons’ with the exception that, in the case of a blank card being chosen, respondents will report no, no matter what their true status. The design probabilities are given by $P(Y|A) = p_1 + p_2\pi_B$ and $P(Y|\bar{A}) = p_2\pi_B$. According to the randomization procedure, $\pi_A$ is estimated by

$$\hat{\pi}_1 = \frac{\hat{\lambda} - p_2\pi_B}{p_1}$$

with variance

$$V(\hat{\pi}_1) = \frac{\pi_A(1 - \pi_A)}{n} + \frac{\pi_A(1 - p_1 - 2p_2\pi_B)}{np_1} + \frac{p_2\pi_B(1 - p_2\pi_B)}{np_1^2}. \tag{5}$$

Motivated by Singh et al. (2003), and aimed at improving the performance of Simmons’ model, Perri (2008) proposed two UQMs. The first one (hereafter Design 2) provides a yes response when a blank card is selected. Generally, a no answer is perceived to be less embarrassing by the respondents but, in certain situations, yes may be more fashionable. Consider, for instance, the question “Have you ever smoked marijuana?” posed to university students. Many students may be tempted to reply in the affirmative even if it is untrue. Let $q_1$ and $q_2$ be the probabilities of selecting a card with statements $A$ and $B$, and $q_3$ the probability of a blank card, $\sum_{i=1}^{3} q_i = 1$. Then, $P(Y|A) = q_1 + q_2\pi_B + q_3$, $P(Y|\bar{A}) = q_2\pi_B + q_3$ and

$$\hat{\pi}_2 = \frac{\hat{\lambda} - q_2\pi_B - q_3}{q_1}$$

with variance

$$V(\hat{\pi}_2) = \frac{\pi_A(1 - \pi_A)}{n} + \frac{\pi_A[q_2(1 - 2\pi_B) - q_3]}{nq_1} + \frac{(q_3 + q_2\pi_B)(1 - q_3 - q_2\pi_B)}{nq_1^2}. \tag{6}$$

On the other hand, when a blank card is selected, a yes answer could cause embarrassment in the respondent when the statement is unfashionable. With the intent of reducing possible reluctance and encouraging respondent cooperation, a different strategy is devised in this situation. The alternative procedure (hereafter Design 3) operates as the previous two methods with the difference that, when a blank card is drawn, respondents are instructed to perform again a Simmons’ device by replying to the sensitive and innocuous questions with probability $\theta$ and $1 - \theta$, respectively. Let $r_i$, $i = 1, 2, 3$, denote the probability of selecting one of the three types of cards in the first randomization step. Then, the new design probabilities are $P(Y|A) = r_1 + \theta r_3 + (1 - r_1 - \theta r_3)\pi_B$ and $P(Y|\bar{A}) = (1 - r_1 - \theta r_3)\pi_B$, whereas the estimator of $\pi_A$ takes the form

$$\hat{\pi}_3 = \frac{\hat{\lambda} - (1 - r_1 - \theta r_3)\pi_B}{r_1 + \theta r_3}$$

with variance

$$V(\hat{\pi}_3) = \frac{\pi_A(1 - \pi_A)}{n} + \frac{\pi_A(1 - r_1 - \theta r_3)(1 - 2\pi_B)}{n(r_1 + \theta r_3)} + \frac{\pi_B(1 - r_1 - \theta r_3)[1 - \pi_B(1 - r_1 - \theta r_3)]}{n(r_1 + \theta r_3)^2}. \tag{7}$$
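The closed-form variances (5), (6) and (7) all follow from plugging each design’s probabilities into the generic formula (3). The sketch below checks this with exact rational arithmetic; the helper names and the parameter values are assumptions made for illustration only.

```python
from fractions import Fraction as F


def variance(pi_a, n, p_yes_a, p_yes_not_a):
    """Generic variance (3) for given design probabilities."""
    lam = p_yes_a * pi_a + p_yes_not_a * (1 - pi_a)
    return lam * (1 - lam) / (n * (p_yes_a - p_yes_not_a) ** 2)


def design1_probs(p1, p2, pi_b):
    """Design 1: a blank card forces a 'no'. Returns (P(Y|A), P(Y|A-bar))."""
    return p1 + p2 * pi_b, p2 * pi_b


def design2_probs(q1, q2, pi_b):
    """Design 2: a blank card (probability q3 = 1 - q1 - q2) forces a 'yes'."""
    q3 = 1 - q1 - q2
    return q1 + q2 * pi_b + q3, q2 * pi_b + q3


def design3_probs(r1, r3, theta, pi_b):
    """Design 3: a blank card triggers a second Simmons step with parameter theta."""
    s = r1 + theta * r3  # overall probability of answering the sensitive statement
    return s + (1 - s) * pi_b, (1 - s) * pi_b


def design1_variance(pi_a, pi_b, p1, p2, n):
    """Closed form (5)."""
    return (pi_a * (1 - pi_a) / n
            + pi_a * (1 - p1 - 2 * p2 * pi_b) / (n * p1)
            + p2 * pi_b * (1 - p2 * pi_b) / (n * p1 ** 2))


def design2_variance(pi_a, pi_b, q1, q2, n):
    """Closed form (6)."""
    q3 = 1 - q1 - q2
    return (pi_a * (1 - pi_a) / n
            + pi_a * (q2 * (1 - 2 * pi_b) - q3) / (n * q1)
            + (q3 + q2 * pi_b) * (1 - q3 - q2 * pi_b) / (n * q1 ** 2))


def design3_variance(pi_a, pi_b, r1, r3, theta, n):
    """Closed form (7)."""
    s = r1 + theta * r3
    return (pi_a * (1 - pi_a) / n
            + pi_a * (1 - s) * (1 - 2 * pi_b) / (n * s)
            + pi_b * (1 - s) * (1 - pi_b * (1 - s)) / (n * s ** 2))


# Hypothetical parameter values used only to exercise the formulas.
PI_A, PI_B, N = F(3, 10), F(2, 5), 100
```

Note that (7) is (4) with $p$ replaced by $r_1 + \theta r_3$, which is why Design 3 and Simmons’ model share the same variance whenever $p = r_1 + \theta r_3$.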

The performance of these UQMs has been evaluated in Perri (2008) by comparing $V(\hat{\pi}_S)$ and $V(\hat{\pi}_i)$, $i = 1, 2, 3$, for $p = p_1 = q_1 = r_1$ and $p_3 = q_3 = r_3$. The comparison has analytically shown that Design 2 outperforms Simmons’ model as well as Design 1 in terms of efficiency under certain conditions only, whereas Design 3 always outperforms Simmons’ model, whatever the parameters $r_i$ and $\theta$, $i = 1, 2, 3$. Nonetheless, in the literature it is generally observed that the efficiency of the estimates tends to increase as the level of confidentiality decreases, and vice versa. We shall proceed therefore to compare the considered designs taking both efficiency and privacy protection into account.

3 Measuring respondent privacy protection

One way to evaluate different RR models is by comparing the variances when the degree of privacy protection is held constant. Many criteria for measuring confidentiality have been discussed in the literature. Here, we focus on two of the most widely used approaches proposed by Lanke (1976) and Leysieffer and Warner (1976).

3.1 The Leysieffer and Warner measures

Leysieffer and Warner (1976) introduced the measures of jeopardy (hereafter LW-measures) carried by yes and no responses about $A$ and $\bar{A}$, respectively

$$g(Y, A) = \frac{P(Y|A)}{P(Y|\bar{A})}, \qquad g(N, \bar{A}) = \frac{P(N|\bar{A})}{P(N|A)} = \frac{1 - P(Y|\bar{A})}{1 - P(Y|A)}. \tag{8}$$


Values of $g(Y, A)$ [$g(N, \bar{A})$] greater than unity indicate that response $Y$ [$N$] is jeopardizing with respect to $A$ [$\bar{A}$], in the sense that with this response members of group $A$ [$\bar{A}$] are more likely to belong to group $A$ [$\bar{A}$] than to group $\bar{A}$ [$A$]. Therefore, for respondents genuinely belonging to group $A$ [$\bar{A}$], the RR device yields an increased posterior probability $P(A|Y)$ [$P(\bar{A}|N)$] that they really belong to group $A$ [$\bar{A}$], and this may arouse their suspicions. Note that $g$ is a function of the design probabilities, but not of $\pi_A$, and the response is nonjeopardizing if and only if $g = 1$.

The LW-measures are linked to the efficiency of the estimators. In fact, for designs where $P(Y|A) - P(Y|\bar{A}) \neq 0$, it can be shown (Leysieffer and Warner 1976) that

$$V(\hat{\pi}_A) = \frac{\pi_A(1 - \pi_A)}{n} + \frac{\pi_A\, g(Y, A) + (1 - \pi_A)\, g(N, \bar{A})}{n[g(Y, A) - 1][g(N, \bar{A}) - 1]}. \tag{9}$$
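Identity (9) can be verified numerically by computing the variance once from the design probabilities via (3) and once from the LW-measures in (8). The following sketch does so with exact fractions; the function names and test designs are illustrative assumptions.

```python
from fractions import Fraction as F


def lw_measures(p_yes_a, p_yes_not_a):
    """LW-measures (8): returns (g(Y, A), g(N, A-bar))."""
    g_yes = p_yes_a / p_yes_not_a
    g_no = (1 - p_yes_not_a) / (1 - p_yes_a)
    return g_yes, g_no


def variance_from_design(pi_a, n, p_yes_a, p_yes_not_a):
    """Variance (3) computed from the design probabilities."""
    lam = p_yes_a * pi_a + p_yes_not_a * (1 - pi_a)
    return lam * (1 - lam) / (n * (p_yes_a - p_yes_not_a) ** 2)


def variance_from_lw(pi_a, n, g_yes, g_no):
    """Variance (9) computed from the two jeopardy measures."""
    return (pi_a * (1 - pi_a) / n
            + (pi_a * g_yes + (1 - pi_a) * g_no)
            / (n * (g_yes - 1) * (g_no - 1)))


# Hypothetical design: Design 1 with p1 = 1/2, p2 = 2/5 and pi_B = 1/2,
# so that P(Y|A) = 7/10 and P(Y|A-bar) = 1/5.
P_YES_A, P_YES_NOT_A = F(7, 10), F(1, 5)
G_YES, G_NO = lw_measures(P_YES_A, P_YES_NOT_A)
```

Because (9) depends on the design only through the pair of jeopardy measures, two devices with the same $g(Y, A)$ and $g(N, \bar{A})$ necessarily have the same variance.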

For $g(Y, A) > 1$ and $g(N, \bar{A}) > 1$, $V(\hat{\pi}_A)$ is a decreasing function of $g(Y, A)$, $g(N, \bar{A})$ or both. The general expressions for $g$ given in (8) can be easily worked out for any RR design by substituting the proper design probabilities. For Simmons’ model, the LW-measures take the form

$$g_S(Y, A) = \frac{p + (1 - p)\pi_B}{(1 - p)\pi_B}, \qquad g_S(N, \bar{A}) = \frac{1 - (1 - p)\pi_B}{(1 - p)(1 - \pi_B)} \tag{10}$$

whereas, for Designs 1, 2 and 3 they are

$$g_1(Y, A) = \frac{p_1 + p_2\pi_B}{p_2\pi_B}, \qquad g_1(N, \bar{A}) = \frac{1 - p_2\pi_B}{1 - p_1 - p_2\pi_B} \tag{11}$$

$$g_2(Y, A) = \frac{q_1 + q_2\pi_B + q_3}{q_2\pi_B + q_3}, \qquad g_2(N, \bar{A}) = \frac{1 - q_2\pi_B - q_3}{q_2(1 - \pi_B)} \tag{12}$$

$$g_3(Y, A) = \frac{r_1 + \theta r_3 + (1 - r_1 - \theta r_3)\pi_B}{(1 - r_1 - \theta r_3)\pi_B}, \qquad g_3(N, \bar{A}) = \frac{1 - (1 - r_1 - \theta r_3)\pi_B}{(1 - r_1 - \theta r_3)(1 - \pi_B)}. \tag{13}$$

It is easy to verify that $g > 1$ for each considered randomization design, meaning that respondent privacy is jeopardized both with a yes or a no answer.

3.2 The Lanke measure

Lanke (1976) proposed a measure of privacy protection based on the idea that the higher the probability $P(A|R)$ of being classified in the group $A$ by giving the answer $R$ the more embarrassing it is to give that answer. The Lanke measure (hereafter L-measure) is

$$L = \max\{P(A|Y), P(A|N)\}.$$


However, it takes $L = P(A|Y)$ when

$$P(A|Y) - P(A|N) = \frac{\pi_A[P(Y|A) - P(Y)]}{P(Y)[1 - P(Y)]} > 0$$

that is when $P(Y|A) - P(Y) = (1 - \pi_A)[P(Y|A) - P(Y|\bar{A})] > 0$. This is true for all the considered RR designs. In fact, the difference $P(Y|A) - P(Y|\bar{A})$ is clearly positive in all the four devices since it equals $p$ for Simmons’ model, and $p_1$, $q_1$, $r_1 + \theta r_3$ for Designs 1, 2 and 3, respectively. So, $L$ reduces to $P(A|Y) = \pi_A \frac{P(Y|A)}{P(Y)}$ for these randomization procedures. In particular, the L-measure for Simmons’ model is given by

$$L_S = \frac{\pi_A[p + (1 - p)\pi_B]}{p\pi_A + (1 - p)\pi_B}$$

and for Designs 1, 2 and 3 it turns out to be

$$L_1 = \frac{\pi_A(p_1 + p_2\pi_B)}{p_1\pi_A + p_2\pi_B}, \qquad L_2 = \frac{\pi_A(q_1 + q_2\pi_B + q_3)}{q_1\pi_A + q_2\pi_B + q_3},$$

$$L_3 = \frac{\pi_A[r_1 + \theta r_3 + (1 - r_1 - \theta r_3)\pi_B]}{(r_1 + \theta r_3)\pi_A + (1 - r_1 - \theta r_3)\pi_B}.$$

For each RR device, poor privacy protection is perceived when $L \to 1$. Note that, since, in general (Leysieffer and Warner 1976)

$$\frac{P(A|Y)}{P(\bar{A}|Y)} \cdot \frac{1 - \pi_A}{\pi_A} = \frac{P(Y|A)}{P(Y|\bar{A})}$$

thus there exists a relation between the L-measure $L = P(A|Y)$ and the LW-measure $g(Y, A)$, which will be useful later, that is

$$\frac{L}{1 - L} \cdot \frac{1 - \pi_A}{\pi_A} = g(Y, A). \tag{14}$$
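Relation (14) lends itself to a direct numerical check. The sketch below computes the L-measure from Bayes’ rule and compares the rescaled posterior odds with $g(Y, A)$; the function name and the chosen inputs are hypothetical and serve only as an illustration.

```python
from fractions import Fraction as F


def lanke_measure(pi_a, p_yes_a, p_yes_not_a):
    """L = max{P(A|Y), P(A|N)}, with both posteriors obtained from Bayes' rule."""
    lam = p_yes_a * pi_a + p_yes_not_a * (1 - pi_a)  # P(Y), equation (1)
    post_yes = pi_a * p_yes_a / lam                   # P(A|Y)
    post_no = pi_a * (1 - p_yes_a) / (1 - lam)        # P(A|N)
    return max(post_yes, post_no)


# Hypothetical Simmons design with p = 7/10 and pi_B = 1/2.
PI_A = F(3, 10)
P_YES_A, P_YES_NOT_A = F(17, 20), F(3, 20)

L = lanke_measure(PI_A, P_YES_A, P_YES_NOT_A)
g_yes = P_YES_A / P_YES_NOT_A                          # LW yes-measure from (8)
lhs_of_14 = L / (1 - L) * (1 - PI_A) / PI_A            # left-hand side of (14)
```

Since the map $L \mapsto L/(1 - L)$ is increasing, (14) implies that, for a fixed $\pi_A$, ordering designs by $L$ is the same as ordering them by $g(Y, A)$.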

4 Comparison of the unrelated question models

For proper design comparisons, both efficiency and privacy protection should be taken into account. Hence, we compare the four UQMs with respect to efficiency at the same level of confidentiality.

4.1 Comparison through LW-measures

According to the literature, let $K_1$ and $K_2$ be the maximal allowable values, both above unity, of the LW-measures $g(Y, A)$ and $g(N, \bar{A})$, respectively. Jeopardy degrees lower than $K_1$ and $K_2$ would ensure respondents a higher degree of protection, but this would necessarily increase the variance of the estimators as shown in (9).

We proceed by imposing $g(Y, A) = K_1$ and $g(N, \bar{A}) = K_2$ for each UQM under study and we assume that $\pi_B$ is invariant for the different devices albeit not fixed in advance by any consideration. Under the given privacy restrictions, the variance of $\hat{\pi}_A$ follows by (9)

$$V(\hat{\pi}_A) = \frac{\pi_A(1 - \pi_A)}{n} + \frac{K_1\pi_A + K_2(1 - \pi_A)}{n(K_1 - 1)(K_2 - 1)} \tag{15}$$

and cannot be reduced without violating the imposed jeopardy constraints. Therefore, the problem of comparing the procedures under the same levels of the LW-measures turns into finding suitable design parameters, if any, that achieve the levels $K_1$ and $K_2$ and, thus, (15).

First, assume $K_1 \neq K_2$, both finite. In this case, Simmons’ model cannot be improved by Designs 1, 2 and 3. In fact, the optimal choice for $\pi_B$ and $p$ of Simmons’ device which satisfies $g_S(Y, A) = K_1$ and $g_S(N, \bar{A}) = K_2$ in (10) is

$$\pi_B = \frac{K_2 - 1}{K_1 + K_2 - 2}, \qquad p = \frac{(K_1 - 1)(K_2 - 1)}{K_1 K_2 - 1}. \tag{16}$$

Moreover, provided $\pi_B$ in (16), the restrictions $g_i(Y, A) = K_1$ and $g_i(N, \bar{A}) = K_2$, $i = 1, 2$, in (11) and (12), lead to Designs 1 and 2 with parameters

$$p_1 = q_1 = \frac{(K_1 - 1)(K_2 - 1)}{K_1 K_2 - 1}, \qquad p_2 = q_2 = \frac{K_1 + K_2 - 2}{K_1 K_2 - 1} \tag{17}$$

and consequently $p_3 = q_3 = 0$. This choice of the design parameters is trivial since it actually turns Designs 1 and 2 into Simmons’. So, when the same privacy protection is required, the blank card option, which distinguishes the two methods from Simmons’ mechanism, vanishes.

Finally, imposing $g_3(Y, A) = K_1$ and $g_3(N, \bar{A}) = K_2$ in (13), together with $\pi_B$ from (16), yields

$$r_1 = \frac{K_2(1 - \theta r_3) - 1}{K_2} - \frac{(K_2 - 1)^2}{K_2(K_1 K_2 - 1)}, \qquad r_2 = \frac{K_1 + K_2 - 2}{K_1 K_2 - 1} - r_3(1 - \theta). \tag{18}$$

Then, contrary to Designs 1 and 2, for every degree of the LW-measures, there exists at least one nontrivial combination of the design parameters for which Design 3 is equivalent—in terms of variance and privacy protection—to Simmons’ model, and vice versa.
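The parameter choices in (16) and (18) can be checked by feeding them back into the LW-measures. The sketch below does this for one arbitrary pair of jeopardy levels; the function names and the values of $K_1$, $K_2$, $r_3$ and $\theta$ are assumptions chosen for illustration.

```python
from fractions import Fraction as F


def simmons_parameters(k1, k2):
    """Equation (16): (pi_B, p) that give g_S(Y, A) = k1 and g_S(N, A-bar) = k2."""
    pi_b = (k2 - 1) / (k1 + k2 - 2)
    p = (k1 - 1) * (k2 - 1) / (k1 * k2 - 1)
    return pi_b, p


def design3_parameters(k1, k2, r3, theta):
    """Equation (18): (r1, r2) for a freely chosen r3 and theta."""
    r1 = (k2 * (1 - theta * r3) - 1) / k2 - (k2 - 1) ** 2 / (k2 * (k1 * k2 - 1))
    r2 = (k1 + k2 - 2) / (k1 * k2 - 1) - r3 * (1 - theta)
    return r1, r2


def lw_simmons(p, pi_b):
    """Equation (10); Design 3 has the same form with p = r1 + theta * r3, see (13)."""
    g_yes = (p + (1 - p) * pi_b) / ((1 - p) * pi_b)
    g_no = (1 - (1 - p) * pi_b) / ((1 - p) * (1 - pi_b))
    return g_yes, g_no


# Hypothetical jeopardy levels and Design 3 settings.
K1, K2 = F(4), F(3)
R3, THETA = F(1, 10), F(1, 2)

pi_b, p = simmons_parameters(K1, K2)
r1, r2 = design3_parameters(K1, K2, R3, THETA)
```

In this example the blank-card probability $r_3$ stays strictly positive, which illustrates the nontrivial equivalence between Design 3 and Simmons’ model.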


Next, suppose that $K_1 = K_2 = K < \infty$. Then, from (15) the variance of $\hat{\pi}_A$ becomes

$$V(\hat{\pi}_A) = \frac{\pi_A(1 - \pi_A)}{n} + \frac{K}{n(K - 1)^2}$$

while, from (16) and (17), it immediately follows that

$$\pi_B = 0.5, \qquad p = p_1 = q_1 = \frac{K - 1}{K + 1}, \qquad p_2 = q_2 = \frac{2}{K + 1}$$

and, thus, $p_3 = q_3 = 0$, which are meaningless. As concerns Design 3, imposing $K_1 = K_2 = K$ when $\pi_B = 0.5$ leads to $\theta = \frac{(1 + K)(1 - r_1) - 2}{(1 + K) r_3}$ from (13) and the device turns out to be again equivalent to Simmons’ according to the efficiency/privacy criterion.

Let us consider another case: assume $K_1$ finite but $K_2$ unlimited. This refers to a problem in which there is no jeopardy in a no answer (Greenberg et al. 1969). As $K_2 \to \infty$ in (15), the variance of $\hat{\pi}_A$ becomes

$$V(\hat{\pi}_A) = \frac{\pi_A(1 - \pi_A)}{n} + \frac{1 - \pi_A}{n(K_1 - 1)}.$$

The optimal choice of the design parameters for Simmons’ model is $\pi_B = 1$ and $p = \frac{K_1 - 1}{K_1}$ by (16). Taking the limits of (17) as $K_2 \to \infty$, Design 1 and 2 parameters assume the values $p_1 = q_1 = \frac{K_1 - 1}{K_1}$, $p_2 = q_2 = \frac{1}{K_1}$ and, thus, $p_3 = q_3 = 0$. Hence, the resulting models are merely reduced to that of Simmons.

With regard to Design 3, from (18) we get $r_1 = \frac{K_1(1 - \theta r_3) - 1}{K_1}$ and $r_2 = \frac{1 - K_1(1 - \theta) r_3}{K_1}$. Once again, Simmons’ and Design 3 strategies offer equal efficiency/privacy.

Practical considerations on the protection of confidentiality can lead the researcher to consider the level of $g(N, \bar{A})$ not of major concern and give more importance to the control of $g(Y, A)$. In this context, it is useful to regard $g(Y, A)$ as fixed and compare the variances and the $g(N, \bar{A})$ values of RR methods sharing a common value of $g(Y, A)$. The design with the smallest variance value is judged the most efficient under the given privacy restrictions. The following proposition focuses on this point and considers Simmons’ model as a benchmark for comparing the alternative UQMs.

Proposition 1 Under a suitable choice of the design parameters, Simmons’ model is more [less] efficient than Design 1 [Design 2] under the privacy restriction of equal jeopardy yes-measure, $g_S(Y, A) = g_1(Y, A)$ [$g_S(Y, A) = g_2(Y, A)$]. On the contrary, Simmons’ model and Design 3 are equally efficient.

Proof Condition $g_S(Y, A) = g_1(Y, A)$ ensures that $p = \frac{p_1}{p_1 + p_2}$. By substituting this parameter in the expression of $g_S(N, \bar{A})$, it follows that $g_S(N, \bar{A}) > g_1(N, \bar{A})$, and consequently $V(\hat{\pi}_S) < V(\hat{\pi}_1)$ holds by (9).


In the same way, since $g_S(Y, A) = g_2(Y, A)$, Simmons’ model has parameter $p = \frac{\pi_B q_1}{\pi_B(1 - q_3) + q_3}$. With this choice of $p$, $g_S(N, \bar{A})$ is always lower than $g_2(N, \bar{A})$, and, therefore, $V(\hat{\pi}_S) > V(\hat{\pi}_2)$ is true by (9).

Similarly, from $g_S(Y, A) = g_3(Y, A)$ we get $p = r_1 + \theta r_3$ in Simmons’ model and, thus, by expressing $g_S(N, \bar{A})$ in terms of this new reparameterization, we find $g_S(N, \bar{A}) = g_3(N, \bar{A})$. Consequently, we conclude that $V(\hat{\pi}_S) = V(\hat{\pi}_3)$ from (9) since the LW-measures are identical for $p = r_1 + \theta r_3$.

On the other hand, one can fix a maximal allowable degree of $g(N, \bar{A})$ and describe how the UQMs behave in terms of efficiency. Again, comparisons of the procedures are illustrated in the next proposition by assuming Simmons’ model as reference.

Proposition 2 Under a suitable choice of the design parameters, Simmons’ model is less [more] efficient than Design 1 [Design 2] under the privacy restriction of equal jeopardy no-measure, $g_S(N, \bar{A}) = g_1(N, \bar{A})$ [$g_S(N, \bar{A}) = g_2(N, \bar{A})$]. On the contrary, Simmons’ model and Design 3 are equally efficient.

Proof Equating $g_S(N, \bar{A})$ and $g_1(N, \bar{A})$ provides $p = \frac{p_1(1 - \pi_B)}{1 - \pi_B(p_1 + p_2)}$. It is easy to prove that the measure $g_S(Y, A)$, expressed in terms of this optimal $p$, is always less than $g_1(Y, A)$, ensuring that $V(\hat{\pi}_S) > V(\hat{\pi}_1)$ by (9).

Using Simmons’ parameter $p = \frac{q_1}{1 - q_3}$ from the identity $g_S(N, \bar{A}) = g_2(N, \bar{A})$, it turns out that $g_S(Y, A) > g_2(Y, A)$, and consequently, $V(\hat{\pi}_S) < V(\hat{\pi}_2)$ by (9).

Finally, imposing $g_S(N, \bar{A}) = g_3(N, \bar{A})$ leads again to $p = r_1 + \theta r_3$ which, from the proof of Proposition 1, ensures $g_S(Y, A) = g_3(Y, A)$ and $V(\hat{\pi}_S) = V(\hat{\pi}_3)$.

Design 3 versus Designs 1 and 2 behaves as Simmons’ model, while comparing Design 1 and 2 does not lead to any manageable condition in terms of design parameters. Results are hence skipped.
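The orderings stated in Propositions 1 and 2 can be illustrated on a single numerical example. The sketch below matches Simmons’ parameter $p$ to Designs 1 and 2 under each restriction and compares $n$ times the variances; all names and parameter values are hypothetical, and one example is of course an illustration rather than a proof.

```python
from fractions import Fraction as F


def n_variance(pi_a, p_yes_a, p_yes_not_a):
    """n times the variance (3)."""
    lam = p_yes_a * pi_a + p_yes_not_a * (1 - pi_a)
    return lam * (1 - lam) / (p_yes_a - p_yes_not_a) ** 2


def lw(p_yes_a, p_yes_not_a):
    """LW-measures (8): (g(Y, A), g(N, A-bar))."""
    return p_yes_a / p_yes_not_a, (1 - p_yes_not_a) / (1 - p_yes_a)


def simmons(p, pi_b):
    return p + (1 - p) * pi_b, (1 - p) * pi_b


def design1(p1, p2, pi_b):
    return p1 + p2 * pi_b, p2 * pi_b


def design2(q1, q2, q3, pi_b):
    return q1 + q2 * pi_b + q3, q2 * pi_b + q3


# Hypothetical settings shared by Designs 1 and 2.
PI_A, PI_B = F(3, 10), F(1, 2)
C1, C2, C3 = F(1, 2), F(2, 5), F(1, 10)   # (p1, p2, p3) and (q1, q2, q3)
D1, D2 = design1(C1, C2, PI_B), design2(C1, C2, C3, PI_B)

# Proposition 1: Simmons' p chosen so that the yes-measures coincide.
S_YES_1 = simmons(C1 / (C1 + C2), PI_B)
S_YES_2 = simmons(PI_B * C1 / (PI_B * (1 - C3) + C3), PI_B)

# Proposition 2: Simmons' p chosen so that the no-measures coincide.
S_NO_1 = simmons(C1 * (1 - PI_B) / (1 - PI_B * (C1 + C2)), PI_B)
S_NO_2 = simmons(C1 / (1 - C3), PI_B)
```

In this example the direction of each inequality flips between the two propositions, as the statements predict.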

4.2 Comparison through L-measure

We compare the efficiency offered by the four designs given the same level of privacy protection evaluated by the L-measure. In so doing, we exploit the results stated in Proposition 1 and relation (14).

First, consider the case when the values of the L-measure are identical in pairs: if $L_1 = L_S$ then $V(\hat{\pi}_S) < V(\hat{\pi}_1)$; while if $L_2 = L_S$ it holds that $V(\hat{\pi}_S) > V(\hat{\pi}_2)$ and, finally, if $L_3 = L_S$ then $V(\hat{\pi}_S) = V(\hat{\pi}_3)$. This result can be easily verified since devices which admit equal values of the L-measure have also equal levels of $g(Y, A)$ due to (14), hence, the variances are ordered as in Proposition 1. Obviously, the design parameters which ensure the equalities between the LW yes-measures (in Proposition 1) are the same that guarantee equal L-measure.

Comparing two designs, the more protective has the smaller value of $L$. We shall now specify under what condition one UQM is considered more protective than another in accordance with the L-measure. After some algebra, we get


Table 1 Values of $n[V(\hat{\pi}_S) - V(\hat{\pi}_1)]$ for $p_1 = p$, $p_3 = 0.1$, $p_2 = 1 - p_1 - p_3$ such that $L_1 > L_S$. Negative values denote situations where Design 1 is less protective and also less efficient than Simmons’ device

πY     πA     p = 0.2    0.3       0.4       0.5       0.6       0.7       0.8
0.3    0.3     0.124     0.143     0.133     0.094     0.027    −0.070    −0.196
0.3    0.5     0.042     0.052     0.054     0.050     0.038     0.020    −0.006
0.3    0.7     0.020     0.026     0.031     0.034     0.034     0.033     0.029
0.5    0.3     0.098     0.103     0.080     0.028    −0.053    −0.163    −0.302
0.5    0.5     0.026     0.028     0.022     0.010    −0.010    −0.036    −0.070
0.5    0.7     0.008     0.009     0.008     0.005     0.000    −0.007    −0.016
0.7    0.3     0.071     0.063     0.027    −0.039    −0.133    −0.257    −0.409
0.7    0.5     0.010     0.004    −0.010    −0.030    −0.058    −0.092    −0.134
0.7    0.7    −0.003    −0.008    −0.015    −0.023    −0.034    −0.047    −0.062

(a) $L_1 > L_S$ if and only if $p$ […]
(b) […] $L_S$ if and only if $p$ […]
(c) $L_3 > L_S$ if and only if $p < r_1 + \theta r_3$
(d) $L_2 > L_1$ if and only if $p_1$ […]

[…]pected situations where $V(\hat{\pi}_1) > V(\hat{\pi}_S)$ […] $L_S$. Results of this study are reported in Table 1.

On the contrary, the variance of $\hat{\pi}_3$ and $\hat{\pi}_S$ changes inversely with respect to the Lanke degree of privacy protection. In particular, under restriction (c), Simmons’ device is more protective but less efficient than Design 3. In fact, comparing $V(\hat{\pi}_3)$ and $V(\hat{\pi}_S)$, after some algebra, we get $V(\hat{\pi}_S) > V(\hat{\pi}_3)$ if

$$p(r_1 + \theta r_3)(\pi_A - \pi_B)(2\pi_B - 1) < (p + r_1 + \theta r_3)\pi_B(1 - \pi_B)$$

which holds as $p(r_1 + \theta r_3) < p + r_1 + \theta r_3$ and $(\pi_A - \pi_B)(2\pi_B - 1) < \pi_B(1 - \pi_B)$. It is at once apparent that $(\pi_A - \pi_B)(2\pi_B - 1) < \pi_B(1 - \pi_B)$ is equivalent to $\pi_B^2 - 2\pi_A\pi_B + \pi_A > 0$ which is always true.
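The claim that Simmons’ device is less efficient than Design 3 whenever $p < r_1 + \theta r_3$ can be checked over a grid of parameter values. The sketch below relies on the fact that (7) is (4) with $p$ replaced by $s = r_1 + \theta r_3$; the grid and the names are illustrative assumptions, and a grid search is a sanity check rather than a substitute for the algebraic argument.

```python
from fractions import Fraction as F
from itertools import product


def n_var_simmons(pi_a, pi_b, p):
    """n times the variance (4); with p = r1 + theta * r3 it is n times (7)."""
    lam = p * pi_a + (1 - p) * pi_b
    return lam * (1 - lam) / p ** 2


GRID = [F(k, 10) for k in range(1, 10)]  # 0.1, 0.2, ..., 0.9

# Collect every grid point with p < s where V(pi_S) > V(pi_3) fails.
violations = [
    (pi_a, pi_b, p, s)
    for pi_a, pi_b, p, s in product(GRID, repeat=4)
    if p < s and not n_var_simmons(pi_a, pi_b, p) > n_var_simmons(pi_a, pi_b, s)
]

# The quadratic pi_B^2 - 2 pi_A pi_B + pi_A should be positive on the whole grid.
quadratic_positive = all(
    pi_b ** 2 - 2 * pi_a * pi_b + pi_a > 0 for pi_a, pi_b in product(GRID, repeat=2)
)
```

The positivity of the quadratic also follows from writing it as $(\pi_B - \pi_A)^2 + \pi_A(1 - \pi_A)$.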


5 Concluding remarks

The degree of privacy protection is an important aspect to be considered when dealing with RR devices. Owing to the rationale of RR techniques, greater privacy is thought to imply, in general, an increase in the variance of the estimates. The key issue when choosing a model is to find the right trade-off between privacy protection and efficiency in the estimates; hence the importance of measuring the degree of privacy protection correctly. Different measures have been discussed in the literature, a couple of which are mentioned in the paper. The main concern is that the measures can yield different results.

One way of fairly treating different RR designs is to compare variances when a required level of privacy is ensured. With this in mind, we have evaluated the performance of four UQMs by paralleling the work of Lin (2005) on the basis of the approaches proposed by Lanke (1976) and Leysieffer and Warner (1976). Leysieffer-Warner measures allow us to achieve a balance between privacy and efficiency since they are coherent, at least for the RR devices under study, with the desired principle of an inverse relationship between disclosure of personal information and precision of the estimates. On the contrary, the measure of the privacy attained through the Lanke approach could be misleading since it shows an ordering of the devices in terms of privacy which is not always reversible as regards efficiency. In fact, according to our Lanke-based outcomes, design parameters could be fine-tuned to ensure greater efficiency to an RR device which, paradoxically, can also be more protective than a competitive randomization procedure.

However, even if researchers can draw useful indications from a set of privacy protection measures, the question of how much importance should be given to the correct measurement of privacy remains, in our opinion, a matter for discussion. In fact, it is arguable whether the respondents themselves are able to measure the extent of privacy disclosure offered by an RR design. Therefore, since respondents cannot fully appreciate the level of privacy protection guaranteed by a device, researchers may be more inclined towards designs providing more efficient estimates rather than ones which theoretically offer higher protection. Finally, whatever the privacy measure, respondents usually have a subjective impression regarding disclosure which can be higher or lower than that measured by an index. In this sense, an empirical study would certainly be of interest to ascertain and quantify the difference between the perceived and measured degree of respondent privacy protection.

References

Arnab R, Singh S (2010) Randomized response techniques: an application to the Botswana AIDS impact survey. J Stat Plann Inference 140: 941–953
Bhargava M, Singh R (2000) A modified randomization device for Warner’s model. Statistica LX: 315–321
Bhargava M, Singh R (2002) On the efficiency comparison of certain randomized response strategies. Metrika 55: 191–197
Chaudhuri A, Bose M, Dihidar K (2011) Estimation of a sensitive proportion by Warner’s randomized response data through inverse sampling. Stat Pap 52: 343–354
Chaudhuri A, Christofides TC, Saha A (2009) Protection of privacy in efficient application of randomized response techniques. Stat Methods Appl 18: 389–418
Diana G, Perri PF (2009) Estimating a sensitive proportion through randomized response procedures based on auxiliary information. Stat Pap 50: 661–672
Greenberg BG, Abul-Ela ALA, Simmons WR, Horvitz DG (1969) The unrelated question randomized response model: theoretical framework. J Am Stat Assoc 64: 520–539
Guerriero M, Sandri MF (2007) A note on the comparison of some randomized response procedures. J Stat Plann Inference 137: 2184–2190
Horvitz DG, Shah BV, Simmons WR (1967) The unrelated question randomized response model. Soc Stat Sect Proceedings of the American Statistical Association, 65–72
Krumpal I (2010) Estimating the prevalence of xenophobia and anti-semitism in Germany: a comparison of the randomized response technique and direct questioning in the telephone survey mode. Unpublished Manuscript, Institute of Sociology, University of Leipzig
Lanke J (1976) On the degree of protection in randomized interviews. Int Stat Rev 44: 197–203
Lara D, Strickler J, Olavarrieta CD, García SG, Ellertson C (2004) Measuring induced abortion in Mexico. A comparison of four methodologies. Soc Meth R 32: 529–558
Lara D, García SG, Ellertson C, Camlin C, Suaréz J (2006) measure of induced abortion in Mexico using random response technique. Soc Meth R 35: 279–301
Lensvelt-Mulders GJLM, Hox JJ, van der Heijden PGM, Maas CJM (2005) Meta-analysis of randomized response research: thirty-five years of validation. Soc Meth R 33: 319–348
Lensvelt-Mulders GJLM, van der Heijden PGM, Laudy O, van Gils G (2006) A validation of a computer-assisted randomized response survey to estimate the prevalence of fraud in social security. J R Stat Soc, Ser A 169: 305–318
Leysieffer FW, Warner SL (1976) Respondent jeopardy and optimal designs in randomized response models. J Am Stat Assoc 71: 649–656
Lin TH (2005) The efficiency comparison on Warner’s and two modified models. Int J Inf Manage Sci 16: 73–78
Mangat NS, Singh S, Singh R (1995) On use of a modified randomization device in Warner’s model. J Indian Soc Stat Oper Res 16: 65–69
Nayak TK (1994) On a randomized response surveys for estimating a proportion. Commun Stat, Theory Methods 23: 3303–3321
Ostapczuk M, Musch J, Mashagen M (2009) A randomized-response investigation of the education effect in attitudes towards foreigners. Eur J Soc Psychol 39: 920–931
Perri PF (2008) Modified randomized devices for Simmons’ model. Model Assist Stat Appl 3: 233–239
Singh S, Horn S, Singh R, Mangat NS (2003) On the use of modified randomization device for estimating the prevalence of a sensitive attribute. Stat Transit 6: 515–522
van der Heijden PGM, van Gils G, Bouts J, Hox JJ (2000) A comparison of randomized response, computer-assisted self-interview, and face-to-face direct questioning. Soc Meth R 28: 505–537
van der Heijden PGM, Böckenholt U (2008) Applications of randomized response methodology in e-commerce. In: Jank W, Shmueli G (eds) Statistical methods in e-commerce research. Wiley, New York pp 401–416
Warner SL (1965) Randomized response: a survey technique for eliminating evasive answer bias. J Am Stat Assoc 60:63–69
