What is k-anonymity? What is restricted k-anonymity? Hardness results One special case Further discussion
On the Complexity of Restricted k-anonymity Problem
Xiaoxun Sun¹, Hua Wang¹, Jiuyong Li²
¹ Department of Mathematics & Computing, University of Southern Queensland. Email: [email protected]
² School of Computer and Information Science, University of South Australia
Asia-Pacific Web Conference (APWeb’08), Shenyang, China
SUN, Xiaoxun
On the Complexity of Restricted k-anonymity Problem
What is k-anonymity?
• A strategy for releasing large amounts of personal data while still protecting the privacy of individuals.
• Originally proposed by Latanya Sweeney.
• The level of privacy protection depends on a parameter k.
To achieve this, data fields are either generalized or suppressed.
• Generalized: e.g. “age 35” becomes “age 20-40”.
• Suppressed: e.g. “age 35” is withheld entirely, i.e. replaced by ∗.
In our work, we deal only with restricted k-anonymity via suppression.
Optimal k-anonymity: Given a list of records, minimize the number of fields suppressed such that for each record r there are at least k − 1 other records that are indistinguishable from r.
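The definition can be checked mechanically. Below is a minimal Python sketch, assuming a table is a list of rows and that a suppressed field is the literal string "*", which counts as equal to any other "*". The helpers `is_k_anonymous` and `suppressed_fields` are hypothetical names for illustration, not code from the paper.

```python
from collections import Counter

STAR = "*"  # marker for a suppressed field (assumption of this sketch)

def is_k_anonymous(table, k):
    """True if every row is identical to at least k - 1 other rows."""
    counts = Counter(tuple(row) for row in table)
    return all(counts[tuple(row)] >= k for row in table)

def suppressed_fields(table):
    """Number of fields replaced by STAR: the cost optimal k-anonymity minimizes."""
    return sum(value == STAR for row in table for value in row)

# Hypothetical released table: two pairs of identical rows.
released = [
    ["a", STAR, "x"],
    ["a", STAR, "x"],
    [STAR, "b", "y"],
    [STAR, "b", "y"],
]
print(is_k_anonymous(released, 2))  # True
print(is_k_anonymous(released, 3))  # False
print(suppressed_fields(released))  # 4
```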
Example of k-anonymity
Consider the query “Who had an x-ray at this hospital yesterday?” and the following response:

first      last      age   race
Harry      Stone     34    Afr-Am
John       Reyer     36    Cauc
Beatrice   Stone     34    Afr-Am
John       Delgado   22    Hisp

• We want to 2-anonymize this data (using suppression) before release.
Example of k-anonymity
After suppression, the response to the same query becomes:

first   last    age   race
∗       Stone   34    Afr-Am
John    ∗       ∗     ∗
∗       Stone   34    Afr-Am
John    ∗       ∗     ∗

• Rows 1 and 3 are indistinguishable, and rows 2 and 4 are indistinguishable, so the released table is 2-anonymous.
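As a quick sanity check of this example, the Python sketch below rebuilds the original response, applies the suppression shown in the released table, and counts the suppressed fields. The mask is read off the released table; encoding ∗ as the string "*" is an assumption of the sketch.

```python
from collections import Counter

STAR = "*"  # suppressed field (assumption of this sketch)

original = [
    ["Harry", "Stone", "34", "Afr-Am"],
    ["John", "Reyer", "36", "Cauc"],
    ["Beatrice", "Stone", "34", "Afr-Am"],
    ["John", "Delgado", "22", "Hisp"],
]

# True = suppress this field; read off the released table above.
mask = [
    [True, False, False, False],
    [False, True, True, True],
    [True, False, False, False],
    [False, True, True, True],
]

def suppress(table, mask):
    """Replace every masked field with STAR."""
    return [[STAR if hide else value for value, hide in zip(row, row_mask)]
            for row, row_mask in zip(table, mask)]

released = suppress(original, mask)
counts = Counter(map(tuple, released))
print(all(counts[tuple(row)] >= 2 for row in released))  # True: 2-anonymous
print(sum(v == STAR for row in released for v in row))   # 8 fields suppressed
```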
What is restricted k-anonymity?
• A restricted case of the general k-anonymity problem. We require:
(1) the alphabet $\Sigma = \{0, 1\}$;
(2) the number of zeroes in each attribute (column) is exactly k.
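The two restrictions are easy to test on a concrete table. This is a minimal Python sketch; `is_restricted_instance` is a hypothetical helper, and the sample table is made up for illustration.

```python
def is_restricted_instance(V, k):
    """True if V is a non-empty 0/1 table whose every column has exactly k zeroes."""
    if not V or any(len(row) != len(V[0]) for row in V):
        return False  # empty or ragged tables are rejected
    if any(value not in (0, 1) for row in V for value in row):
        return False  # restriction (1): binary alphabet
    m = len(V[0])
    return all(sum(row[j] == 0 for row in V) == k for j in range(m))  # restriction (2)

# Hypothetical 5 x 3 table with exactly three zeroes per column.
V = [
    [0, 1, 0],
    [0, 0, 1],
    [0, 1, 1],
    [1, 0, 0],
    [1, 0, 0],
]
print(is_restricted_instance(V, 3))  # True
print(is_restricted_instance(V, 2))  # False
```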
Example of restricted k-anonymity
The left dataset in the figure above is an instance of the general 3-anonymity problem, and the right dataset is an instance of the restricted 3-anonymity problem.
Why introduce restricted k-anonymity?
• Its close connection with the general k-anonymity problem. Given an instance of the general k-anonymity problem, we can construct an instance of the restricted k-anonymity problem. In the previous example, adding three vectors v4, v5, v6 turns the left dataset into an instance of the restricted 3-anonymity problem. We believe this could provide an alternative approach to solving the general k-anonymity problem.
Overview
• Optimal k-anonymity is NP-hard (Meyerson and Williams, PODS 2004).
  - For a large alphabet, k-anonymity is NP-hard for k ≥ 3.
• Optimal restricted k-anonymity is NP-hard (this work).
  - For a binary alphabet, restricted k-anonymity is NP-hard for k ≥ 3.
Since restricted k-anonymity is a special case of general k-anonymity, our result implies the result of Meyerson and Williams (2004).
Hardness of restricted k-anonymity
Restricted k-anonymity problem: Given $V \subseteq \Sigma^m$, where $\Sigma = \{0, 1\}$ and the number of zeroes in each attribute (column) is exactly k, is there a suppressor t such that t(V) is k-anonymous and t suppresses the minimum number of vector coordinates?
The reduction is from Exact Cover by 3-Sets (X3C): Given a finite set X with |X| = 3q and a collection C of 3-element subsets of X, does C contain an exact cover for X, that is, a subcollection C′ ⊆ C such that every element of X occurs in exactly one member of C′?
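To make the X3C question concrete, here is a brute-force Python sketch that lists every exact cover of a small instance. `exact_covers` is a hypothetical helper; it runs in exponential time, which is expected for an NP-complete problem, and is only meant for toy inputs such as the example instance used later in these slides.

```python
from itertools import combinations

def exact_covers(X, C):
    """Yield every subcollection of C in which each element of X occurs exactly once.

    Tries all q-element subcollections of C, where |X| = 3q (exponential time).
    """
    X = frozenset(X)
    if len(X) % 3:
        return
    q = len(X) // 3
    for chosen in combinations(C, q):
        # Union equal to X plus total size |X| means no element is covered twice.
        if frozenset().union(*chosen) == X and sum(len(c) for c in chosen) == len(X):
            yield chosen

X = {1, 2, 3, 4, 5, 6}
C = [{1, 2, 3}, {1, 4, 5}, {4, 5, 6}, {2, 3, 6}]
for cover in exact_covers(X, C):
    print(cover)
# ({1, 2, 3}, {4, 5, 6})
# ({1, 4, 5}, {2, 3, 6})
```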
From X3C to restricted 3-anonymity
Given an instance of X3C: X = (x1, x2, · · · , x3q) and C = (C1, C2, · · · , Cm), where |Ci| = 3 for i = 1, 2, · · · , m. Define a database V of records where:
• Records (rows) correspond to xi ∈ X.
• Attributes (columns) correspond to Cj ∈ C.
More precisely,
$$V_i[j] = \begin{cases} 0 & \text{if } x_i \in C_j \\ 1 & \text{otherwise} \end{cases}$$
Since every Cj has exactly three elements, every column of V contains exactly three zeroes, so V is an instance of restricted 3-anonymity.
We then ask: does the optimal restricted 3-anonymous solution suppress at most 3q(m − 1) fields?
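The construction of V can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: `x3c_to_table` and `suppression_budget` are hypothetical names, rows follow the sorted elements of X, and columns follow the order of C. The instance is the one from the example in these slides.

```python
def x3c_to_table(X, C):
    """Reduction from X3C: V[i][j] = 0 if x_i is in C_j, else 1.

    Rows follow the sorted elements of X; columns follow the order of C.
    """
    return [[0 if x in c else 1 for c in C] for x in sorted(X)]

def suppression_budget(X, C):
    """The threshold 3q(m - 1) from the decision question, with |X| = 3q and m = |C|."""
    q = len(X) // 3
    return 3 * q * (len(C) - 1)

X = {1, 2, 3, 4, 5, 6}
C = [{1, 2, 3}, {1, 4, 5}, {4, 5, 6}, {2, 3, 6}]
V = x3c_to_table(X, C)

# Every column holds exactly three zeroes, so V is a restricted 3-anonymity instance.
print([sum(row[j] == 0 for row in V) for j in range(len(C))])  # [3, 3, 3, 3]
print(suppression_budget(X, C))  # 18
```

Read column by column, V gives the rows of the example table that follows (that table lists one attribute Cj per row).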
Example of the reduction in action
X = {1, 2, 3, 4, 5, 6} and C = {{1, 2, 3}, {1, 4, 5}, {4, 5, 6}, {2, 3, 6}}. The reduction results in the following table, shown transposed (each row is an attribute Cj, each column a record xi):

            1  2  3  4  5  6
{1, 2, 3}   0  0  0  1  1  1
{1, 4, 5}   0  1  1  0  0  1
{4, 5, 6}   1  1  1  0  0  0
{2, 3, 6}   1  0  0  1  1  0
X3C 1
The exact 3-set cover {{1, 2, 3}, {4, 5, 6}} corresponds to the following restricted 3-anonymous table (rows are attributes Cj, columns are records xi):

            1  2  3  4  5  6
{1, 2, 3}   0  0  0  ∗  ∗  ∗
{1, 4, 5}   ∗  ∗  ∗  ∗  ∗  ∗
{4, 5, 6}   ∗  ∗  ∗  0  0  0
{2, 3, 6}   ∗  ∗  ∗  ∗  ∗  ∗

There are 3 × 2 × (4 − 1) = 18 fields suppressed, i.e. 3q(m − 1) with q = 2 and m = 4.
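The suppression pattern induced by a cover can be generated and counted directly. This Python sketch uses a hypothetical helper `table_from_cover` that takes the cover as indices into C; it only reproduces the table derived from a given cover and checks that it is 3-anonymous, and it does not search for an optimal suppressor.

```python
from collections import Counter

STAR = "*"  # suppressed field (assumption of this sketch)

def table_from_cover(X, C, cover):
    """Suppression pattern induced by an exact cover (given as indices into C).

    Each record x keeps only the column of the unique cover set containing it
    (where its value is 0); every other field is suppressed.
    """
    table = []
    for x in sorted(X):
        (j,) = [idx for idx in cover if x in C[idx]]  # exactly one, since the cover is exact
        table.append([0 if col == j else STAR for col in range(len(C))])
    return table

X = {1, 2, 3, 4, 5, 6}
C = [{1, 2, 3}, {1, 4, 5}, {4, 5, 6}, {2, 3, 6}]

for cover in ([0, 2], [1, 3]):  # {{1,2,3},{4,5,6}} and {{1,4,5},{2,3,6}}
    t = table_from_cover(X, C, cover)
    counts = Counter(map(tuple, t))
    print(all(counts[tuple(row)] >= 3 for row in t))  # True: 3-anonymous
    print(sum(v == STAR for row in t for v in row))   # 18
```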
X3C 2
The exact 3-set cover {{1, 4, 5}, {2, 3, 6}} corresponds to the following restricted 3-anonymous table (rows are attributes Cj, columns are records xi):

            1  2  3  4  5  6
{1, 2, 3}   ∗  ∗  ∗  ∗  ∗  ∗
{1, 4, 5}   0  ∗  ∗  0  0  ∗
{4, 5, 6}   ∗  ∗  ∗  ∗  ∗  ∗
{2, 3, 6}   ∗  0  0  ∗  ∗  0

Some observations:
• If a set Cj is not in the exact cover, then its column in V is all ∗’s (a row of ∗’s in the tables above).
• If Cj is in the exact cover, then exactly 3 entries in its column are not ∗’s.
Why does this work?
(Recall m = number of sets in the collection = number of columns in the table.)
• A group of 3 rows needs at least 3 × (m − 1) ∗’s to become indistinguishable. This follows from Vi[j] = 0 if xi ∈ Cj.
• A group of 3 rows corresponds to the elements of a set Cj if and only if exactly 3 × (m − 1) ∗’s are required: the rows have 0 in the jth column and differ in the other columns.
• Thus there is an X3C if and only if every group of 3 rows needs exactly 3 × (m − 1) ∗’s, giving 3q(m − 1) ∗’s in total.
So there is an X3C if and only if the number of entries suppressed in the optimal restricted 3-anonymous solution is 3q × (m − 1).
Restricted 2-anonymity
Restricted 2-anonymity is solvable in polynomial time via an equivalent transformation to perfect matching: Given a graph G = (U, E) with |U| = n and |E| = m, is there a subset S ⊆ E of n/2 edges such that each vertex of U is contained in exactly one edge of S?
From perfect matching to restricted 2-anonymity
Given an instance of perfect matching, a simple graph G = (U, E) with U = (v1, v2, · · · , vn) and E = (e1, e2, · · · , em), define a database V of records where:
• Records (rows) correspond to vi ∈ U.
• Attributes (columns) correspond to ej ∈ E.
In detail, for each vi define an m-dimensional vector $v_i \in \Sigma^m$ with vi[j] = 0 if vi ∈ ej and vi[j] = 1 otherwise. Since every edge has exactly two endpoints, each column contains exactly two zeroes.
Then the optimal restricted 2-anonymous solution has at most n × (m − 1) ∗’s suppressed by t if and only if there is a perfect matching in G.
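The graph-to-table construction can be sketched in Python as follows. `graph_to_table` is a hypothetical helper; vertices are assumed to be numbered 0..n-1 and the graph is assumed simple (no self-loops), and the 4-cycle used here is a made-up illustration.

```python
def graph_to_table(n, edges):
    """Vertices are 0..n-1 and edges is a list of (u, v) pairs of a simple graph.

    Row i is the vector v_i with v_i[j] = 0 if v_i is an endpoint of e_j, else 1.
    """
    return [[0 if i in edge else 1 for edge in edges] for i in range(n)]

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]  # hypothetical 4-cycle, e1..e4
V = graph_to_table(4, edges)
for row in V:
    print(row)
# Each column has exactly two zeroes (the endpoints of its edge).
print([sum(row[j] == 0 for row in V) for j in range(len(edges))])  # [2, 2, 2, 2]
print(4 * (len(edges) - 1))  # the n(m - 1) threshold: 12
```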
Example of equivalent transform
The algorithm
Algorithm 1: Polynomial-time algorithm for the restricted 2-anonymity problem.
Input: A dataset $V = (v_1, v_2, \dots, v_n) \subseteq \Sigma^m$
Output: The 2-anonymous dataset t(V) (where t is a suppressor)
1. Construct the graph G = (U, E), where U = (v1, v2, · · · , vn) and, for each j = 1, 2, · · · , m, E contains the edge eik(j) = (vi, vk) joining the two records with vi(j) = vk(j) = 0.
2. Find one perfect matching M in G.
3. If one is found, let M(i) denote the unique edge in M containing node i. Keep the 0 in column M(i), i.e. t(M(i)) = 0, and set t(j) = ∗ for every j ≠ M(i). Output t(V).
4. If none is found, output t(V) with every value in V replaced by ∗.
Algorithm complexity:
• Time complexity O(n²m);
• Space complexity O(nm).
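Algorithm 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: `restricted_2_anonymize` and `find_perfect_matching` are hypothetical names, the input is assumed to be a valid restricted 2-anonymity instance (0/1 entries, exactly two zeroes per column), and step 2 uses brute-force backtracking as a stand-in for a polynomial-time general-graph matching algorithm such as Edmonds' blossom algorithm. The sketch therefore does not achieve the stated O(n²m) bound.

```python
STAR = "*"  # marker for a suppressed field (assumption of this sketch)

def find_perfect_matching(n, edges):
    """Return {vertex: edge index} for some perfect matching of vertices 0..n-1, or None.

    Brute-force backtracking: exponential in the worst case, so only suitable
    for small examples.
    """
    match = {}

    def extend():
        free = [v for v in range(n) if v not in match]
        if not free:
            return True
        u = free[0]  # the lowest unmatched vertex must be covered by some edge
        for j, (a, b) in enumerate(edges):
            if u in (a, b):
                w = b if a == u else a
                if w != u and w not in match:
                    match[u] = match[w] = j
                    if extend():
                        return True
                    del match[u], match[w]  # undo and try the next edge
        return False

    return dict(match) if extend() else None

def restricted_2_anonymize(V):
    n, m = len(V), len(V[0])
    # Step 1: column j becomes the edge joining the two rows that hold a 0 there.
    edges = [tuple(i for i in range(n) if V[i][j] == 0) for j in range(m)]
    # Step 2: look for a perfect matching M.
    M = find_perfect_matching(n, edges)
    if M is None:
        # Step 4: no perfect matching, so every value is suppressed.
        return [[STAR] * m for _ in range(n)]
    # Step 3: row i keeps only column M(i), where its value is 0.
    return [[V[i][j] if j == M[i] else STAR for j in range(m)] for i in range(n)]

# Hypothetical instance built from the 4-cycle with edges (0,1), (1,2), (2,3), (3,0).
V = [
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
]
for row in restricted_2_anonymize(V):
    print(row)
# [0, '*', '*', '*']
# [0, '*', '*', '*']
# ['*', '*', 0, '*']
# ['*', '*', 0, '*']
```

With a triangle (three rows, three columns) no perfect matching exists, so the sketch falls through to step 4 and suppresses everything.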
Example of how algorithm works
Our objective is to make the left dataset 2-anonymous. In the previous graph we can find two perfect matchings, {e1, e3} and {e2, e4}; the 2-anonymous tables that Algorithm 1 produces from them are shown in the figure above.
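Listing every perfect matching yields every table Algorithm 1 can output. Because the graph from the figure is not reproduced in the text, this Python sketch assumes a hypothetical 4-cycle whose edges e1..e4 happen to admit exactly the two matchings {e1, e3} and {e2, e4}. The helpers `all_perfect_matchings` and `table_for_matching` are illustrative names, and the enumeration is brute force.

```python
from itertools import combinations

STAR = "*"  # suppressed field (assumption of this sketch)

def all_perfect_matchings(n, edges):
    """Yield every set of n/2 edge indices covering each vertex 0..n-1 exactly once."""
    if n % 2:
        return
    for chosen in combinations(range(len(edges)), n // 2):
        covered = [v for j in chosen for v in edges[j]]
        if sorted(covered) == list(range(n)):
            yield chosen

def table_for_matching(n, edges, matching):
    """Algorithm 1, step 3: row i keeps only column M(i) (its value there is 0)."""
    keep = {v: j for j in matching for v in edges[j]}  # M(i) for each vertex i
    return [[0 if j == keep[i] else STAR for j in range(len(edges))] for i in range(n)]

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]  # hypothetical 4-cycle e1..e4
for matching in all_perfect_matchings(4, edges):
    print([f"e{j + 1}" for j in matching])
    for row in table_for_matching(4, edges, matching):
        print(row)
```

The first matching printed is ['e1', 'e3'], followed by ['e2', 'e4'], each with its 2-anonymous table.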
Future work
Starting from the restricted k-anonymity problem:
• Developing approximation algorithms for the general k-anonymity problem.
• Developing efficient exact algorithms for the general k-anonymity problem.
Questions?