Min-Wise independent linear permutations Tom Bohman Department of Mathematical Sciences, Carnegie Mellon University, Pittsburgh PA15213, U.S.A. E-mail:
[email protected] Colin Coopery School of Mathematical Sciences, University of North London, London N7 8DB, UK. E-mail:
[email protected] Alan Friezez Department of Mathematical Sciences, Carnegie Mellon University, Pittsburgh PA15213, U.S.A., E-mail:
[email protected] Submitted: January 12, 2000; Accepted: April 23, 2000
Abstract A set of permutations F Sn is min-wise independent if for any set X [n] and any x 2 X , when is chosen at random in F we have P (minf(X )g = (x)) = 1 jX j . This notion was introduced by Broder, Charikar, Frieze and Mitzenmacher and is motivated by an algorithm for ltering near-duplicate web documents. Linear permutations are an important class of permutations. Let p be a (large) prime and let Fp = fa;b : 1 a p ? 1; 0 b p ? 1g where for x 2 [p] = f0; 1; : : : ; p ? 1g, a;b (x) = ax + b mod p. For X [p] we let F (X ) = Supported in part by NSF Grant DMS-9627408 y Research supported by the STORM Research Group z Supported in part by NSF grant CCR-9818411
1
the electronic journal of combinatorics
7 (2000), #R26
2
maxx2X fPa;b (minf(X )g = (x))g where Pa;b is over chosen at ran uniformly (log k )3 1 dom from Fp . We show that as k; p ! 1, E X [F (X )] = k + O k3=2 con rming that a simply chosen random linear permutation will suce for an average set from the point of view of approximate min-wise independence.
1 Introduction Broder, Charikar, Frieze and Mitzenmacher [3] introduced the notion of a set of min-wise independent permutations. We say that F Sn is min-wise independent if for any set X [n] and any x 2 X , when is chosen at random in F we have 1 : P (minf (X )g = (x)) = (1)
jX j
The research was motivated by the fact that such a family (under some relaxations) is essential to the algorithm used in practice by the AltaVista web index software to detect and lter near-duplicate documents. A set of permutations satisfying (1) needs to be exponentially large [3]. In practice we can allow certain relaxations. First, we can accept small relative errors. We say that F Sn is approximately min-wise independent with relative error (or just approximately min-wise independent, where the meaning is clear) if for any set X [n] and any x 2 X , when is chosen at random in F we have ? P min
f(X )g = (x) ? jX1 j jX j : (2) In other words we require that all the elements of any xed set X have only an almost equal chance to become the minimum element of the image of X under . Linear permutations are an important class of permutations. Let p be a (large) prime and let Fp = fa;b : 1 a p ? 1; 0 b p ? 1g where for x 2 [p] = f0; 1; : : : ; p ? 1g,
a;b (x) = ax + b mod p; where for integer n we de ne n mod p to be the non-negative remainder on division of n by p. For X [p] we let F (X ) = max fPa;b (minf(X )g = (x))g x2X where Pa;b is over chosen uniformly at random from Fp. The natural questions to discuss are what are the extremal and average values of F (X ) as X ranges over Ak = fX [p] : jX j = kg. The following results were some of those obtained in [3]:
the electronic journal of combinatorics
7 (2000), #R26
3
Theorem 1 (a) Consider the set Xk = f0; 1; 2 : : :k ? 1g, as a subset of [p]. As k; p ! 1, with k = o(p), 2
3 ln k + O k + 1 : Pa;b (minf (Xk )g = (0)) = k p k (b) As k; p ! 1, with k = o(p), 2
2
4
p 2 + 1 1 ; E X [F (X )] p + O 2(k ? 1) k 1
2k denotes expectations over X chosen uniformly at random from Ak . 2
where E X
In this paper we improve the second result and prove
Theorem 2 As k; p ! 1,
(log k) 1 E X [F (X )] = + O k k= 3 2
3
:
Thus a simply chosen random linear permutation will suce for an average set from the point of view of min-wise independence. Other results on min-wise independence have been obtained by Indyk [6], Broder, Charikar and Mitzenmacher [4] and Broder and Feige [5].
2 Proof of Theorem 2 Let X = fx ; x ; : : : ; xk? g [p]. Let i = axi mod p for i = 0; 1; : : : ; k ? 1. Let 0
1
1
i = i(X; a) = minf ? j mod p : j = 1; 2; : : : ; k ? 1g: 0
Let and note that Then
Ai = Ai(X ) = fa 2 [p] : i(X; a) = ig
jAij k ? 1;
i = 1; 2; : : : ; p ? 1:
minf(X )g = (x ) i 0 2 f + b; + b ? 1; : : : ; + b ? i + 1g mod p: 0
0
0
Thus if
Z = Z (X ) =
0
p?1 X i=1
ijAij;
(3)
the electronic journal of combinatorics
7 (2000), #R26
f(X )g = (x )) = p(pZ? 1) :
Pa;b (min
4 (4)
0
Fix a 2 f1; 2; : : : ; p ? 1g and x . Then 0
k ? Y 1 i+t P(a 2 Ai ) = (k ? 1) 1 ? p?1 t p?1?t 2
(5)
=1
We write Z = Z + Z where Z = 0
1
0
Pi0
i=1
ijAij where i = 0
4p log k
k
. Now, by symmetry,
f(X )g = (x )) = k1
E X (Pa;b (min
and so
p(p ? 1) : k
E X (Z ) =
It follows from (5) that E (Z1 )
(k ? 1) kp
(6)
0
p?1 X
i=i0 +1
i exp ? 4(k ? k2) log k
2
(7)
3
for large k; p. We continue by using the Azuma-Hoeding Martingale tail inequality { see for example [1, 2, 7, 8, 9]. Let x be xed and for a given X let X^ be obtained from X by replacing xj by randomly chosen x^j . For j 1 let dj = maxfjE xj (Z (X ) ? Z (X^ ))jg: 0
^
X
Then for any t > 0 we have
2t
2
j ? E (Z )j t) 2 exp ? d + + d : k?
P( Z0
0
2 1
2
1
(8)
We claim that
dj
i0 X
i+
i=1
i0 X (k i=1
? 1)i p
i2 + i3pk + O(p) 30(logk k) p 2 0
2
(9)
3 0
3 2
2
(10)
the electronic journal of combinatorics
7 (2000), #R26
5
Explanation for (9): If a 2 Ai(X ) because axj = ax ? i mod p then changing xj to x^j changes jAij by one. This explains the rst summation. The second accounts for those a 2 Ai(X ) for which ax ? ax^j mod p < i, changing the minimum in (3). We then use jAij k ? 1 and P(ax ? ax^j mod p < i) = pi . 0
0
0
Using (10) in (8) with t = " pk2 we see that
P
"k jZ ? E (Z )j " pk exp ? 450(log k) : 2
0
2
0
6
It now follows from (4), (6), (7) and the above that Z 1 1 1 1 + min 1; k exp ? E X [F (X )] = + O
k
and the result follows.
k
2
k
"=0
"k 2
450(log k)
6
d"
2
References [1] N. Alon and J.H. Spencer, The Probabilistic Method, Wiley, 1992. [2] B. Bollobas, Martingales, isoperimetric inequalities and random graphs, in Combinatorics, A. Hajnal, L. Lovasz and V.T. Sos Ed., Colloq. Math. Sci. Janos Bolyai 52, North Holland 1988. [3] A.Z. Broder, M. Charikar, A.M. Frieze and M. Mitzenmacher, Min-Wise Independent permutations, Proceedings of the 30th Annual ACM Symposium on Theory of Computing (1998) 327{336. [4] A.Z. Broder, M. Charikar and M. Mitzenmacher, A derandomization using min-wise independent permutations, Proceedings of Second International Workshop RANDOM '98 (M. Luby, J. Rolim, M. Serna Eds.) (1998) 15{24. [5] A.Z. Broder and U. Feige, Min-Wise versus Linear Independence, Proceedings of the 11th Annual ACM-SIAM Symposium on Discrete Algorithms (2000). [6] P. Indyk, A small approximately min-wise independent family of hash-functions, Proceedings of the 10th Annual ACM-SIAM Symposium on Discrete Algorithms (1999) 454{456. [7] C.J.H. McDiarmid, On the method of bounded dierences, Surveys in Combinatorics, 1989, Invited papers at the Twelfth British Combinatorial Conference, Edited by J. Siemons, Cambridge University Press, 148{188.
the electronic journal of combinatorics
7 (2000), #R26
6
[8] C.J.H. McDiarmid, Concentration, Probabilistic methods for algorithmic discrete mathematics, (M.Habib, C. McDiarmid, J. Ramirez-Alfonsin, B. Reed, Eds.), Springer (1998) 195{248. [9] M.J. Steele, Probability theory and combinatorial optimization, CBMS-NSF Regional Conference Series in Applied Mathematics 69, 1997.