Min-Wise independent linear permutations - CiteSeerX

5 downloads 0 Views 191KB Size Report
Jan 12, 2000 - Alan Friezez. Department ... E-mail:alan@random.math.cmu.edu .... mathematics, (M.Habib, C. McDiarmid, J. Ramirez-Alfonsin, B. Reed, Eds.),.
Min-Wise independent linear permutations Tom Bohman Department of Mathematical Sciences, Carnegie Mellon University, Pittsburgh PA15213, U.S.A. E-mail:[email protected] Colin Coopery School of Mathematical Sciences, University of North London, London N7 8DB, UK. E-mail:[email protected] Alan Friezez Department of Mathematical Sciences, Carnegie Mellon University, Pittsburgh PA15213, U.S.A., E-mail:[email protected] Submitted: January 12, 2000; Accepted: April 23, 2000

Abstract A set of permutations F  Sn is min-wise independent if for any set X  [n] and any x 2 X , when  is chosen at random in F we have P (minf(X )g = (x)) = 1 jX j . This notion was introduced by Broder, Charikar, Frieze and Mitzenmacher and is motivated by an algorithm for ltering near-duplicate web documents. Linear permutations are an important class of permutations. Let p be a (large) prime and let Fp = fa;b : 1  a  p ? 1; 0  b  p ? 1g where for x 2 [p] = f0; 1; : : : ; p ? 1g, a;b (x) = ax + b mod p. For X  [p] we let F (X ) =  Supported in part by NSF Grant DMS-9627408 y Research supported by the STORM Research Group z Supported in part by NSF grant CCR-9818411

1

the electronic journal of combinatorics

7 (2000), #R26

2

maxx2X fPa;b (minf(X )g = (x))g where Pa;b is over  chosen at ran  uniformly (log k )3 1 dom from Fp . We show that as k; p ! 1, E X [F (X )] = k + O k3=2 con rming that a simply chosen random linear permutation will suce for an average set from the point of view of approximate min-wise independence.

1 Introduction Broder, Charikar, Frieze and Mitzenmacher [3] introduced the notion of a set of min-wise independent permutations. We say that F  Sn is min-wise independent if for any set X  [n] and any x 2 X , when  is chosen at random in F we have 1 : P (minf (X )g =  (x)) = (1)

jX j

The research was motivated by the fact that such a family (under some relaxations) is essential to the algorithm used in practice by the AltaVista web index software to detect and lter near-duplicate documents. A set of permutations satisfying (1) needs to be exponentially large [3]. In practice we can allow certain relaxations. First, we can accept small relative errors. We say that F  Sn is approximately min-wise independent with relative error  (or just approximately min-wise independent, where the meaning is clear) if for any set X  [n] and any x 2 X , when  is chosen at random in F we have ? P min



f(X )g = (x) ? jX1 j  jX j : (2) In other words we require that all the elements of any xed set X have only an almost equal chance to become the minimum element of the image of X under . Linear permutations are an important class of permutations. Let p be a (large) prime and let Fp = fa;b : 1  a  p ? 1; 0  b  p ? 1g where for x 2 [p] = f0; 1; : : : ; p ? 1g, 

a;b (x) = ax + b mod p; where for integer n we de ne n mod p to be the non-negative remainder on division of n by p. For X  [p] we let F (X ) = max fPa;b (minf(X )g = (x))g x2X where Pa;b is over  chosen uniformly at random from Fp. The natural questions to discuss are what are the extremal and average values of F (X ) as X ranges over Ak = fX  [p] : jX j = kg. The following results were some of those obtained in [3]:

the electronic journal of combinatorics

7 (2000), #R26

3

Theorem 1 (a) Consider the set Xk = f0; 1; 2 : : :k ? 1g, as a subset of [p]. As k; p ! 1, with k = o(p), 2





3 ln k + O k + 1 : Pa;b (minf (Xk )g =  (0)) =  k p k (b) As k; p ! 1, with k = o(p), 2

2

4

p   2 + 1 1 ;  E X [F (X )]  p + O 2(k ? 1) k 1

2k denotes expectations over X chosen uniformly at random from Ak . 2

where E X

In this paper we improve the second result and prove

Theorem 2 As k; p ! 1,



(log k) 1 E X [F (X )] = + O k k= 3 2

3



:

Thus a simply chosen random linear permutation will suce for an average set from the point of view of min-wise independence. Other results on min-wise independence have been obtained by Indyk [6], Broder, Charikar and Mitzenmacher [4] and Broder and Feige [5].

2 Proof of Theorem 2 Let X = fx ; x ; : : : ; xk? g  [p]. Let i = axi mod p for i = 0; 1; : : : ; k ? 1. Let 0

1

1

i = i(X; a) = minf ? j mod p : j = 1; 2; : : : ; k ? 1g: 0

Let and note that Then

Ai = Ai(X ) = fa 2 [p] : i(X; a) = ig

jAij  k ? 1;

i = 1; 2; : : : ; p ? 1:

minf(X )g = (x ) i 0 2 f + b; + b ? 1; : : : ; + b ? i + 1g mod p: 0

0

0

Thus if

Z = Z (X ) =

0

p?1 X i=1

ijAij;

(3)

the electronic journal of combinatorics

7 (2000), #R26

f(X )g = (x )) = p(pZ? 1) :

Pa;b (min

4 (4)

0

Fix a 2 f1; 2; : : : ; p ? 1g and x . Then 0

k ? Y 1 i+t P(a 2 Ai ) = (k ? 1)  1 ? p?1 t p?1?t 2





(5)

=1

We write Z = Z + Z where Z = 0

1

0

Pi0

i=1

ijAij where i = 0

4p log k

k

. Now, by symmetry,

f(X )g = (x )) = k1

E X (Pa;b (min

and so

p(p ? 1) : k

E X (Z ) =

It follows from (5) that E (Z1 )

 (k ? 1)  kp

(6)

0

p?1 X



i=i0 +1

i exp ? 4(k ? k2) log k



2

(7)

3

for large k; p. We continue by using the Azuma-Hoe ding Martingale tail inequality { see for example [1, 2, 7, 8, 9]. Let x be xed and for a given X let X^ be obtained from X by replacing xj by randomly chosen x^j . For j  1 let dj = maxfjE xj (Z (X ) ? Z (X^ ))jg: 0

^

X

Then for any t > 0 we have



2t



2

j ? E (Z )j  t)  2 exp ? d +    + d : k?

P( Z0

0

2 1

2

1

(8)

We claim that

dj 

i0 X

i+

i=1

i0 X (k i=1

? 1)i p

 i2 + i3pk + O(p)  30(logk k) p 2 0

2

(9)

3 0

3 2

2

(10)

the electronic journal of combinatorics

7 (2000), #R26

5

Explanation for (9): If a 2 Ai(X ) because axj = ax ? i mod p then changing xj to x^j changes jAij by one. This explains the rst summation. The second accounts for those a 2 Ai(X ) for which ax ? ax^j mod p < i, changing the minimum in (3). We then use jAij  k ? 1 and P(ax ? ax^j mod p < i) = pi . 0

0

0

Using (10) in (8) with t = " pk2 we see that 

P







"k jZ ? E (Z )j  " pk  exp ? 450(log k) : 2

0

2

0

6

It now follows from (4), (6), (7) and the above that    Z 1 1 1 1 + min 1; k exp ? E X [F (X )] = + O

k

and the result follows.

k

2

k

"=0



"k 2

450(log k)

6



d"

2

References [1] N. Alon and J.H. Spencer, The Probabilistic Method, Wiley, 1992. [2] B. Bollobas, Martingales, isoperimetric inequalities and random graphs, in Combinatorics, A. Hajnal, L. Lovasz and V.T. Sos Ed., Colloq. Math. Sci. Janos Bolyai 52, North Holland 1988. [3] A.Z. Broder, M. Charikar, A.M. Frieze and M. Mitzenmacher, Min-Wise Independent permutations, Proceedings of the 30th Annual ACM Symposium on Theory of Computing (1998) 327{336. [4] A.Z. Broder, M. Charikar and M. Mitzenmacher, A derandomization using min-wise independent permutations, Proceedings of Second International Workshop RANDOM '98 (M. Luby, J. Rolim, M. Serna Eds.) (1998) 15{24. [5] A.Z. Broder and U. Feige, Min-Wise versus Linear Independence, Proceedings of the 11th Annual ACM-SIAM Symposium on Discrete Algorithms (2000). [6] P. Indyk, A small approximately min-wise independent family of hash-functions, Proceedings of the 10th Annual ACM-SIAM Symposium on Discrete Algorithms (1999) 454{456. [7] C.J.H. McDiarmid, On the method of bounded di erences, Surveys in Combinatorics, 1989, Invited papers at the Twelfth British Combinatorial Conference, Edited by J. Siemons, Cambridge University Press, 148{188.

the electronic journal of combinatorics

7 (2000), #R26

6

[8] C.J.H. McDiarmid, Concentration, Probabilistic methods for algorithmic discrete mathematics, (M.Habib, C. McDiarmid, J. Ramirez-Alfonsin, B. Reed, Eds.), Springer (1998) 195{248. [9] M.J. Steele, Probability theory and combinatorial optimization, CBMS-NSF Regional Conference Series in Applied Mathematics 69, 1997.