arXiv:1801.06255v1 [cs.IT] 18 Jan 2018
On the Contractivity of Privacy Mechanisms

Mario Diaz
Arizona State University and Harvard University
mdiaztor@{asu,g.harvard}.edu

Lalitha Sankar
Arizona State University
[email protected]

Peter Kairouz
Stanford University
[email protected]

Abstract—We present a novel way to compare the statistical cost of privacy mechanisms using their Dobrushin coefficient. Specifically, we provide upper and lower bounds for the Dobrushin coefficient of a privacy mechanism in terms of its maximal leakage and local differential privacy guarantees. Given the geometric nature of the Dobrushin coefficient, this approach provides some insights into the general statistical cost of these privacy guarantees. We highlight the strength of this method by applying our results to develop new bounds on the ℓ2-minimax risk in a distribution estimation setting under maximal leakage constraints. Specifically, we find its order with respect to the sample size and privacy level, allowing a quantitative comparison with the corresponding local differential privacy ℓ2-minimax risk.

I. INTRODUCTION

Recently, several compelling definitions for privacy have arisen; notable among them are the context-free (statistics-agnostic) notion of differential privacy and the context-aware information-theoretic measures such as mutual information and maximal leakage. Context-aware approaches provide average-case privacy guarantees and allow for a range of adversarial models.

Both context-free and context-aware definitions of privacy require designing a probabilistic mapping (henceforth referred to as a privacy mechanism) that satisfies the desired privacy requirement. Despite their operational interpretations, comparing privacy leakage measures numerically does not provide much insight. One approach is to evaluate the effect of the privacy requirement on utility (i.e., the statistical cost of using the corresponding privacy mechanism). The aim of this paper is to provide a framework in which different privacy metrics can be compared in terms of their general statistical cost.

We compare different privacy mechanisms via their Dobrushin coefficient, which is equal to the contractivity coefficient of the mechanism with respect to total variation distance. Specifically, we provide upper bounds on the privacy guarantees of a locally differentially private (L-DP) mechanism and a maximal leakage (MaxL) private mechanism in terms of its Dobrushin coefficient. Conversely, we provide upper bounds on the Dobrushin coefficient of any mechanism in terms of its L-DP and MaxL privacy guarantees. Since a small Dobrushin coefficient means a highly contractive mapping, the latter bounds are particularly useful to assess the cost of both aforementioned privacy guarantees for a wide range of statistical problems, including hypothesis testing and distribution estimation. More specifically, the Dobrushin coefficient can be used to provide strong data processing inequalities (SDPIs) for any f-divergence. Since these SDPIs are a fundamental part of many standard statistical methodologies, e.g., Le Cam's method, the Dobrushin coefficient leads to an immediate evaluation of the statistical cost of a privacy mapping. We highlight the value of this approach by presenting new results on the ℓ2-minimax risk for a distribution estimation setting under MaxL constraints, i.e., we compute the best worst-case expected ℓ2-loss of a distribution estimator when using data sanitized by a privacy mechanism with specific MaxL guarantees. We show that the minimax risk with respect to the sample size n and privacy level α has order (n(2^α − 1))^{−1}. This is the first step to enable quantitative comparisons with the corresponding locally differentially private minimax risk (see, for example, [1]–[3]). The value of this approach is in exploiting the connection between the Dobrushin coefficient and Le Cam's method to obtain bounds directly.

This paper is organized as follows. In Section II we introduce the main concepts and terminology used in this paper. In Section III we present our main results regarding the relation between L-DP or MaxL privacy guarantees and the Dobrushin coefficient of privacy mechanisms, and illustrate some of their consequences in Section IV. In Section V we apply our results to study the ℓ2-minimax risk of a distribution estimation problem under maximal leakage constraints. The proofs of our main results are provided in Appendix A.

II. PROBLEM SETTING AND PRELIMINARIES

A. Privacy Notions

We assume that X and Y are finite sets. A privacy mechanism is a function W : X × Y → R such that W(x, y) ≥ 0 for all (x, y) ∈ X × Y and, for all x ∈ X,

  Σ_{y∈Y} W(x, y) = 1.   (1)

Let P(Z) be the set of probability measures on a discrete set Z. Every privacy mechanism W : X × Y → R can be identified with a mapping W′ : P(X) → P(Y) determined by

  W′(P)(y) = Σ_{x∈X} P(x) W(x, y),  y ∈ Y.   (2)
This material is based upon work supported by the National Science Foundation under Grant No. CCF-1350914 and an ASU seed grant.
Note that this correspondence defines a bijection. Thus, by abuse of notation, we denote by W both W and W ′ .
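As a minimal numerical sketch (ours, not from the paper) of this identification: over finite alphabets, a privacy mechanism is simply a row-stochastic matrix, and W′ acts on input distributions as in (2). The matrix entries below are illustrative.

```python
# A toy privacy mechanism over X = {0, 1} and Y = {0, 1}; each row sums to 1, cf. (1).
W = [[0.8, 0.2],
     [0.3, 0.7]]

def apply_mechanism(W, P):
    """The induced map W' : P(X) -> P(Y) of (2): W'(P)(y) = sum_x P(x) W(x, y)."""
    return [sum(P[x] * W[x][y] for x in range(len(W)))
            for y in range(len(W[0]))]

Q = apply_mechanism(W, [0.5, 0.5])
print(Q)  # approximately [0.55, 0.45], again a probability vector
```

Since each row of W is a probability vector, the output Q is automatically a probability vector as well, which is the bijection noted above.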
Definition 1 ([4]). For α ∈ [0, ∞], a privacy mechanism W is said to be α-locally differentially private if

  max_{y∈Y} max_{x1,x2∈X} W(x1, y) / W(x2, y) ≤ 2^α.   (3)
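To make (3) concrete, the following sketch (ours, with illustrative parameters) computes the smallest α for which a given mechanism is α-locally differentially private; binary randomized response with flip probability 1/4 gives α = log2(3).

```python
from math import log2

def ldp_level(W):
    """Smallest alpha such that W satisfies (3).
    Uses the paper's conventions 0/0 := 1 and p/0 := infinity for p > 0."""
    worst = 1.0
    for y in range(len(W[0])):
        col = [row[y] for row in W]
        for a in col:
            for b in col:
                if b == 0:
                    ratio = 1.0 if a == 0 else float("inf")
                else:
                    ratio = a / b
                worst = max(worst, ratio)
    return log2(worst)

# Binary randomized response: report the truth w.p. 3/4, flip it w.p. 1/4.
W = [[0.75, 0.25],
     [0.25, 0.75]]
print(ldp_level(W))  # log2(3), approximately 1.585
```

A deterministic (identity) mechanism has ratio ∞ in some column, so its L-DP level is ∞, consistent with α ∈ [0, ∞] in Definition 1.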
For convenience, we define 0/0 := 1 and 1/0 := ∞. Following tradition, all logarithms are taken in base 2.

Definition 2 ([5]). For α ∈ [0, ∞], a privacy mechanism W is said to be α-MaxL private if

  Σ_{y∈Y} max_{x∈X} W(x, y) ≤ 2^α.   (4)

The operational significance of Def. 2 comes from the following result of Issa et al. [5].

Proposition 1 ([5]). Let X and Y be random variables with support X and Y, respectively. Define

  L(X → Y) := sup_{U−X−Y} log ( P(Û(Y) = U) / max_{u∈U} P_U(u) ),   (5)

where the support of U is finite but of arbitrary size, and Û(Y) is the maximum a posteriori estimator. Then,

  L(X → Y) = log Σ_{y∈Y} max_{x∈X} P_{Y|X}(y|x).   (6)
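The closed form (6) makes maximal leakage easy to evaluate numerically; here is a small sketch (ours) for a binary randomized-response channel with illustrative parameters.

```python
from math import log2

def maximal_leakage(W):
    """L(X -> Y) = log2( sum_y max_x W(x, y) ), cf. (6)."""
    return log2(sum(max(row[y] for row in W) for y in range(len(W[0]))))

# Binary randomized response with flip probability 1/4.
W = [[0.75, 0.25],
     [0.25, 0.75]]
print(maximal_leakage(W))  # log2(1.5), approximately 0.585
```

In the sense of Def. 2, a mechanism W is α-MaxL private exactly when `maximal_leakage(W) <= alpha`, since (4) bounds the same column-maximum sum.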
B. Dobrushin Coefficient

The Dobrushin coefficient of a probabilistic mapping is defined as follows.

Definition 3 ([6]). The Dobrushin coefficient η_TV(W) of a mapping W : P(X) → P(Y) is defined by

  η_TV(W) = max_{x1,x2∈X} (1/2) Σ_{y∈Y} |W(x1, y) − W(x2, y)|.   (7)
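A direct computation of (7) is straightforward; the sketch below (ours, illustrative parameters) evaluates it for binary randomized response with 2^α = 3, where the result 1/2 coincides with the value (2^α − 1)/(2^α + 1) = 2/4 appearing in the L-DP bound proved in Appendix A.

```python
def dobrushin(W):
    """Dobrushin coefficient (7): largest total variation distance between two rows of W."""
    k, m = len(W), len(W[0])
    return max(0.5 * sum(abs(W[x1][y] - W[x2][y]) for y in range(m))
               for x1 in range(k) for x2 in range(k))

# Binary randomized response with flip probability 1/4, i.e., 2^alpha = 3 in (3).
W = [[0.75, 0.25],
     [0.25, 0.75]]
print(dobrushin(W))  # 0.5
```

A smaller coefficient means a more contractive, and hence statistically more costly, mechanism; the constant channel (identical rows) has coefficient 0.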
C. f-Divergences

Another important property of the Dobrushin coefficient comes from its connection with f-divergences.

Definition 4. Let f : R+ → R be convex with f(1) = 0. For P, Q ∈ P(Z), the f-divergence D_f(P‖Q) is determined by

  D_f(P‖Q) := Σ_{z∈Z} Q(z) f( P(z)/Q(z) ).   (10)
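As a numerical illustration (our sketch, not from the paper) of the SDPI role mentioned in the introduction: with f(t) = t log2(t), the f-divergence (10) is the KL-divergence, and the Dobrushin coefficient bounds its contraction under W, i.e., D_f(W(P0)‖W(P1)) ≤ η_TV(W) · D_f(P0‖P1). The mechanism and distributions below are illustrative.

```python
from math import log2

def kl(P, Q):
    """KL divergence, i.e., (10) with f(t) = t*log2(t).
    Assumes Q(z) > 0 wherever P(z) > 0."""
    return sum(p * log2(p / q) for p, q in zip(P, Q) if p > 0)

def apply_mechanism(W, P):
    return [sum(P[x] * W[x][y] for x in range(len(W))) for y in range(len(W[0]))]

def dobrushin(W):
    k, m = len(W), len(W[0])
    return max(0.5 * sum(abs(W[a][y] - W[b][y]) for y in range(m))
               for a in range(k) for b in range(k))

W = [[0.75, 0.25],
     [0.25, 0.75]]
P0, P1 = [0.9, 0.1], [0.2, 0.8]

contracted = kl(apply_mechanism(W, P0), apply_mechanism(W, P1))
print(contracted <= dobrushin(W) * kl(P0, P1))  # True: the SDPI holds here
```

It is this contraction that feeds directly into Le Cam-type two-point arguments such as the proof of Proposition 4 below.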
For W : P(X) → P(Y), let η_f(W) be defined by

  η_f(W) := sup_{P0,P1 ∈ P(X) : 0 < D_f(P0‖P1) < ∞} D_f(W(P0)‖W(P1)) / D_f(P0‖P1).   (11)

Proposition 4. Let k ∈ N and α > 0 be given. There exists N = N(k, α) such that, for all n > N,

  r_{α,k,n}^{‖·‖₂²} ≥ 1 / (16 n (2^α − 1)).   (30)

By providing an achievability scheme, we provide an upper bound for r_{α,k,n}^{‖·‖₂²}.

Proposition 5. Let k ∈ N and α > 0 be given. If 2^α ≤ k, then, for all n ∈ N,

  r_{α,k,n}^{‖·‖₂²} ≤ (k − 1) / (n (2^α − 1)).   (31)

Note that when 2^α > k, the constraint in Def. 2 becomes vacuous. By Propositions 4 and 5,

  r_{α,k,n}^{‖·‖₂²} = Θ_k( 1 / (n (2^α − 1)) ).   (32)

The notation f(n, α) = Θ_k(g(n, α)) denotes that there exists N = N(k, α) such that, for all n > N,

  C1 g(n, α) ≤ f(n, α) ≤ C2 g(n, α),   (33)

where C1, C2 > 0 are constants depending only on k. In particular, the MaxL ℓ2-minimax risk is smaller than its L-DP counterpart, which satisfies (see [3, Thm. II.5])

  r_{α,k,n}^{‖·‖₂²} = Θ_k( 2^α / (n (2^α − 1)²) ).   (34)

Proof of Proposition 4. Let P0 ∈ Δk be the uniform distribution over [k], i.e., P0(x) = k^{−1} for all x ∈ [k]. For a given vector u ∈ R^k such that Σ_x u_x = 0 and Σ_x u_x² = 1, let P1 ∈ Δk be the distribution determined by

  P1(x) = 1/k + u_x / √(n(2^α − 1)),  x ∈ [k].   (35)

Note that if n ≥ k²/(2^α − 1), then P1 indeed defines a probability distribution. A direct computation shows that

  ‖P0 − P1‖₂ = 1 / √(n(2^α − 1)).   (36)

Fix an α-MaxL private mechanism W : Δk → Δm. For a given estimator P̂, let

  S(P̂) := (1/2) Σ_{i=0,1} E_{W,Pi} ‖P̂(Y^n) − Pi‖₂².   (37)

Let K_{P̂} := {y^n ∈ [m]^n : ‖P̂(y^n) − P0‖₂ ≥ ‖P̂(y^n) − P1‖₂}. By (36) and the triangle inequality, for y^n ∈ K_{P̂},

  ‖P̂(y^n) − P0‖₂ ≥ (1/2) ‖P0 − P1‖₂ = 1 / (2√(n(2^α − 1))),   (38)

and, for y^n ∈ K_{P̂}^c,

  ‖P̂(y^n) − P1‖₂ ≥ (1/2) ‖P0 − P1‖₂ = 1 / (2√(n(2^α − 1))).   (39)

Also, observe that

  E_{W,P0} ‖P̂(Y^n) − P0‖₂² ≥ E_{W,P0} ‖P̂(Y^n) − P0‖₂² 1_{K_{P̂}}   (40)
    ≥ (1 / (4n(2^α − 1))) E_{W,P0} 1_{K_{P̂}}   (41)
    = (1 / (4n(2^α − 1))) W(P0)^n(K_{P̂}),   (42)

where W(P0)^n(K_{P̂}) denotes the measure of K_{P̂} with respect to the n-fold tensor product measure W(P0) ⊗ ··· ⊗ W(P0). Using a similar argument, we obtain that

  S(P̂) ≥ (1 / (8n(2^α − 1))) [ W(P0)^n(K_{P̂}) + W(P1)^n(K_{P̂}^c) ].   (43)

Recall that P(E) + Q(E^c) ≥ 1 − ‖P − Q‖_TV for every event E. In particular,

  S(P̂) ≥ (1 / (8n(2^α − 1))) ( 1 − ‖W(P0)^n − W(P1)^n‖_TV )   (44)
       ≥ (1 / (8n(2^α − 1))) ( 1 − √( (n/2) D_KL(W(P1)‖W(P0)) ) ),   (45)

where the last inequality follows from Pinsker's inequality and the tensorization property of the KL-divergence. By (15) and Theorem 3, we obtain that

  S(P̂) ≥ (1 / (8n(2^α − 1))) ( 1 − √( (n(2^α − 1)/2) D_KL(P1‖P0) ) ).   (46)

A Taylor series expansion argument shows that, for n large enough, n(2^α − 1) D_KL(P1‖P0) ≤ 1/2, and hence

  S(P̂) ≥ 1 / (16n(2^α − 1)).   (47)

Note that sup_{P∈Δk} E_{W,P} ‖P̂(Y^n) − P‖₂² ≥ S(P̂), thus

  r_{k,n}^{‖·‖₂²}(W) = inf_{P̂} sup_{P∈Δk} E_{W,P} ‖P̂(Y^n) − P‖₂² ≥ inf_{P̂} S(P̂).   (48)

By (47), we obtain that

  r_{k,n}^{‖·‖₂²}(W) ≥ 1 / (16n(2^α − 1)).   (49)

Since this inequality holds for any given α-MaxL private mechanism W, the result follows.

Proof of Proposition 5. Let λ := (2^α − 1)/(k − 1). Consider the mapping W : Δk → Δ_{k+1} given by

  W = [ λ           1−λ
           ⋱         ⋮
              λ     1−λ ],   (50)

i.e., W(x, x) = λ, W(x, k+1) = 1 − λ, and all other entries are zero. It is immediate to verify that W is α-MaxL private: Σ_{y} max_{x} W(x, y) = kλ + (1 − λ) = 2^α. Hence, (29) readily implies that

  r_{α,k,n}^{‖·‖₂²} ≤ r_{k,n}^{‖·‖₂²}(W).   (51)

Also, let P̂ be the estimator determined by

  P̂(x) = ((k − 1)/(2^α − 1)) (1/n) Σ_{i=1}^n 1_{Yi=x},  x ∈ [k].   (52)

Hence, by (28) and (51),

  r_{α,k,n}^{‖·‖₂²} ≤ sup_{P∈Δk} E_{W,P} ‖P̂(Y^n) − P‖₂².   (53)

We now estimate the right-hand side term of (53). For a given P ∈ Δk, we let Q = W(P) be the common distribution of Y1, ..., Yn. Note that Q(x) = λP(x) for all x ∈ [k]. In particular,

  ‖P̂(Y^n) − P‖₂² = (1/(nλ)²) Σ_{x∈[k]} ( Σ_{i=1}^n (1_{Yi=x} − Q(x)) )².   (54)

Using the fact that Q(x) = λP(x) for all x ∈ [k], we obtain

  E ‖P̂(Y^n) − P‖₂² = (1/(nλ²)) Σ_{x∈[k]} Q(x)(1 − Q(x))   (55)
                    = (1/(nλ)) Σ_{x∈[k]} P(x)(1 − λP(x)).   (56)

Since Σ_x P(x)(1 − λP(x)) ≤ 1, we obtain that

  E ‖P̂(Y^n) − P‖₂² ≤ 1/(nλ) = (k − 1)/(n(2^α − 1)).   (57)

Since (57) holds for every P ∈ Δk, by (53) the result follows.

VI. CONCLUDING REMARKS

We have introduced a novel way to compare the statistical costs of any privacy mechanism via the Dobrushin coefficient. A significant advantage of this approach is that it eliminates the need to compute the precise mechanism for any privacy definition, which is oftentimes difficult to obtain in closed form. Many questions remain to be addressed, including tighter bounds for distribution estimation under MaxL constraints as well as applications to other statistical problems with different privacy requirements.

ACKNOWLEDGEMENT

The authors would like to thank Dr. Ibrahim Issa for many useful discussions at the earlier stages of this work.

APPENDIX A
PROOFS OF THE MAIN RESULTS

A. Local Differential Privacy Results

The proof of Theorem 1 is based on the following elementary observation.

Lemma 1. If a privacy mechanism W is α-locally differentially private, then, for all x1, x2 ∈ X and y ∈ Y,

  |W(x1, y) − W(x2, y)| / (W(x1, y) + W(x2, y)) ≤ (2^α − 1)/(2^α + 1).   (58)

Proof. Without loss of generality, assume that W(x1, y) ≥ W(x2, y). Note that (58) holds true if and only if

  (2^α + 1)(W(x1, y) − W(x2, y)) ≤ (2^α − 1)(W(x1, y) + W(x2, y)).   (59)

As the latter holds if and only if W(x1, y) ≤ 2^α W(x2, y), and W is α-locally differentially private, the result follows.

Proof of Theorem 1. For x1, x2 ∈ X, we define

  S(x1, x2) := (1/2) Σ_{y∈Y} |W(x1, y) − W(x2, y)|.   (60)

Note that for every x1, x2 ∈ X, S(x1, x2) equals

  (1/2) Σ_{y∈Y} [ |W(x1, y) − W(x2, y)| / (W(x1, y) + W(x2, y)) ] (W(x1, y) + W(x2, y)).   (61)

By Lemma 1, we have that

  S(x1, x2) ≤ ((2^α − 1)/(2(2^α + 1))) Σ_{y∈Y} (W(x1, y) + W(x2, y))   (62)
           = (2^α − 1)/(2^α + 1).   (63)

By (7), we obtain that η_TV(W) ≤ (2^α − 1)/(2^α + 1), as required.

Proof of Theorem 2. Without loss of generality, assume that

  max_{y∈Y} max_{x,x′∈X} log W(x, y)/W(x′, y) = log W(x2, y0)/W(x1, y0)   (64)

for some x1, x2 ∈ X and y0 ∈ Y. For ease of notation, let η = η_TV(W). By (7),

  η = max_{x,x′} (1/2) Σ_{y∈Y} |W(x, y) − W(x′, y)|   (65)
    ≥ (1/2) Σ_{y∈Y} |W(x2, y) − W(x1, y)|.   (66)

In particular, we have that

  W(x2, y0) − W(x1, y0) ≤ 2η − Σ_{y≠y0} |W(x2, y) − W(x1, y)|.   (67)

An elementary computation shows that

  Σ_{y≠y0} |W(x2, y) − W(x1, y)| ≥ Σ_{y≠y0} (W(x1, y) − W(x2, y))   (68)
                                 = W(x2, y0) − W(x1, y0),   (69)

and hence

  W(x2, y0) − W(x1, y0) ≤ 2η − (W(x2, y0) − W(x1, y0)),   (70)

i.e., W(x2, y0) − W(x1, y0) ≤ η. Therefore,

  W(x2, y0)/W(x1, y0) ≤ 1 + η_TV(W)/W(x1, y0) ≤ 1 + η_TV(W)/W_*.   (71)

The result follows.
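A quick numerical sanity check (ours, with an illustrative matrix) of the bound (71): for a strictly positive mechanism with smallest entry W_*, the largest likelihood ratio between rows is at most 1 + η_TV(W)/W_*.

```python
def dobrushin(W):
    """Dobrushin coefficient (7): largest total variation distance between two rows."""
    k, m = len(W), len(W[0])
    return max(0.5 * sum(abs(W[a][y] - W[b][y]) for y in range(m))
               for a in range(k) for b in range(k))

# A strictly positive 2 x 3 mechanism (illustrative numbers).
W = [[0.6, 0.3, 0.1],
     [0.2, 0.5, 0.3]]

w_star = min(min(row) for row in W)  # W_* in (71)
max_ratio = max(W[a][y] / W[b][y]
                for y in range(len(W[0]))
                for a in range(len(W)) for b in range(len(W)))
print(max_ratio, 1 + dobrushin(W) / w_star)  # approximately 3.0 vs 5.0
```

Here η_TV(W) ≈ 0.4 and W_* = 0.1, so the bound 1 + 0.4/0.1 = 5 comfortably dominates the largest ratio 3.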
B. Maximal Leakage Privacy Results

Proof of Thm. 3. For x1, x2 ∈ X, we define

  S(x1, x2) := (1/2) Σ_{y∈Y} |W(x1, y) − W(x2, y)|.   (72)

Fix x1, x2 ∈ X. Let

  Y+ = {y ∈ Y : W(x1, y) ≥ W(x2, y)}  and  Y− = Y \ Y+.   (73)

In particular, we have that

  S(x1, x2) = (1/2) Σ_{y∈Y+} (W(x1, y) − W(x2, y)) + (1/2) Σ_{y∈Y−} (W(x2, y) − W(x1, y)).   (74)

Since, for every x ∈ X,

  Σ_{y∈Y+} W(x, y) + Σ_{y∈Y−} W(x, y) = 1,   (75)

we have that

  S(x1, x2) = Σ_{y∈Y+} W(x1, y) + Σ_{y∈Y−} W(x2, y) − 1   (76)
            = Σ_{y∈Y} max{W(x1, y), W(x2, y)} − 1,   (77)

where the last equality follows from the definition of Y±. Note that for all y ∈ Y,

  max{W(x1, y), W(x2, y)} ≤ max_{x∈X} W(x, y).   (78)

Hence,

  S(x1, x2) ≤ Σ_{y∈Y} max_{x∈X} W(x, y) − 1 ≤ 2^α − 1,   (79)

where the last inequality follows as W is α-MaxL private. By (7), we conclude that

  η_TV(W) = max_{x1,x2∈X} S(x1, x2) ≤ 2^α − 1.   (80)

Since η_TV(W) ≤ 1 for every mapping W : P(X) → P(Y), the result follows.

Before proceeding with the proof of Theorem 4, we prove the following elementary lemma.

Lemma 2. If k > 1 and a1, ..., ak are non-negative real numbers, then there exist i1 ≠ i2 such that

  (a1 + ··· + ak)/k ≤ (a_{i1} + a_{i2})/2.   (81)

Proof. Let s = a1 + ··· + ak. In order to reach a contradiction, assume that (a_{i1} + a_{i2})/2 < s/k for all i1 ≠ i2. In this case,

  Σ_{i1≠i2} (a_{i1} + a_{i2})/2 < Σ_{i1≠i2} s/k = (k − 1)s,   (82)

where the last equality follows from the fact that

  |{(i1, i2) : i1 ≠ i2}| = k(k − 1).   (83)

In a similar way, for i ∈ {1, ..., k},

  |{(i1, i2) : i1 ≠ i2, i1 = i or i2 = i}| = 2(k − 1).   (84)

In particular, since each index i appears in 2(k − 1) ordered pairs, we have that

  Σ_{i1≠i2} (a_{i1} + a_{i2})/2 = Σ_{i=1}^k (k − 1) a_i = (k − 1)s.   (85)

By (82) and (85), we conclude that (k − 1)s < (k − 1)s. Contradiction.

Proof of Theorem 4. For each y ∈ Y, choose x(y) ∈ X such that W(x(y), y) = max_{x∈X} W(x, y). In particular,

  Σ_{y∈Y} max_{x∈X} W(x, y) = Σ_{y∈Y} W(x(y), y).   (86)

For each x ∈ X, let Y_x := {y ∈ Y : x(y) = x}. Note that

  Σ_{y∈Y} W(x(y), y) = Σ_{x∈X} Σ_{y∈Y_x} W(x(y), y),   (87)

where the last equality uses the fact that {Y_x : x ∈ X} is a partition of Y. Note that W(x(y), y) = W(x, y) for every y ∈ Y_x, thus

  Σ_{y∈Y} max_{x∈X} W(x, y) = Σ_{x∈X} Σ_{y∈Y_x} W(x, y).   (88)

By Lemma 2, (88) implies that there exist x1 ≠ x2 such that

  (2/|X|) Σ_{y∈Y} max_{x∈X} W(x, y) ≤ Σ_{y∈Y_{x1}} W(x1, y) + Σ_{y∈Y_{x2}} W(x2, y).   (89)

Note that for all y ∈ Y_{x1}, W(x1, y) = W(x(y), y) ≥ W(x2, y) and, in particular,

  W(x1, y) = max{W(x1, y), W(x2, y)}.   (90)

Also, W(x2, y) = max{W(x1, y), W(x2, y)} for all y ∈ Y_{x2}. Altogether, we have that

  (2/|X|) Σ_{y∈Y} max_{x∈X} W(x, y) ≤ Σ_{y∈Y_{x1}∪Y_{x2}} max{W(x1, y), W(x2, y)}
                                    ≤ Σ_{y∈Y} max{W(x1, y), W(x2, y)},   (91)

where we used the fact that Y_{x1} ∩ Y_{x2} = ∅ and Y_{x1} ∪ Y_{x2} ⊂ Y. Recall that, by (7),

  η_TV(W) = max_{x,x′∈X} (1/2) Σ_{y∈Y} |W(x, y) − W(x′, y)|.   (92)

In particular, we have that

  η_TV(W) ≥ (1/2) Σ_{y∈Y} |W(x1, y) − W(x2, y)|   (93)
          = (1/2) Σ_{y∈Y+} (W(x1, y) − W(x2, y))   (94)
            + (1/2) Σ_{y∈Y−} (W(x2, y) − W(x1, y)),   (95)

where, as before, Y+ = {y ∈ Y : W(x1, y) ≥ W(x2, y)} and Y− = Y \ Y+. By (75) we obtain that

  η_TV(W) ≥ Σ_{y∈Y+} W(x1, y) + Σ_{y∈Y−} W(x2, y) − 1.   (96)

By definition of Y±, we conclude that

  1 + η_TV(W) ≥ Σ_{y∈Y} max{W(x1, y), W(x2, y)}.   (97)

By (97) and (91), we conclude that

  1 + η_TV(W) ≥ (2/|X|) Σ_{y∈Y} max_{x∈X} W(x, y),   (98)
as we wanted to show.

REFERENCES

[1] A. Pastore and M. Gastpar, “Locally differentially-private distribution estimation,” in Proc. IEEE International Symposium on Information Theory (ISIT), 2016, pp. 2694–2698.
[2] J. C. Duchi, M. I. Jordan, and M. J. Wainwright, “Minimax optimal procedures for locally private estimation,” Journal of the American Statistical Association, 2017, to appear.
[3] M. Ye and A. Barg, “Asymptotically optimal private estimation under mean square loss,” arXiv preprint arXiv:1708.00059, 2017.
[4] J. C. Duchi, M. I. Jordan, and M. J. Wainwright, “Local privacy and statistical minimax rates,” in Proc. IEEE 54th Annual Symposium on Foundations of Computer Science (FOCS), 2013, pp. 429–438.
[5] I. Issa, S. Kamath, and A. B. Wagner, “An operational measure of information leakage,” in Proc. Annual Conference on Information Science and Systems (CISS), 2016, pp. 234–239.
[6] R. L. Dobrushin, “Central limit theorem for nonstationary Markov chains. I,” Theory of Probability & Its Applications, vol. 1, no. 1, pp. 65–80, 1956.
[7] J. Cohen, J. Kempermann, and G. Zbaganu, Comparisons of Stochastic Matrices with Applications in Information Theory, Statistics, Economics and Population. Springer Science & Business Media, 1998.
[8] Y. Polyanskiy and Y. Wu, “Dissipation of information in channels with input constraints,” IEEE Transactions on Information Theory, vol. 62, no. 1, pp. 35–55, 2016.
[9] A. Makur and L. Zheng, “Linear bounds between contraction coefficients for f-divergences,” arXiv preprint arXiv:1510.01844v3, 2017.
[10] P. Kairouz, S. Oh, and P. Viswanath, “Extremal mechanisms for local differential privacy,” The Journal of Machine Learning Research, vol. 17, no. 1, pp. 492–542, 2016.
[11] J. Liao, L. Sankar, F. P. Calmon, and V. Y. F. Tan, “Hypothesis testing under maximal leakage privacy constraints,” in Proc. IEEE International Symposium on Information Theory (ISIT), 2017, pp. 779–783.
[12] S. Kamath, A. Orlitsky, D. Pichapati, and A. T. Suresh, “On learning distributions from their samples,” in Proc. Conference on Learning Theory (COLT), 2015, pp. 1066–1100.