Approximate Counting, Almost Uniform Generation and Random Walks

Michal Parnas
Department of Computer Science
Hebrew University
Jerusalem, Israel

August 1989

A thesis submitted in fulfillment of the requirements for the degree of Master of Science, supervised by Prof. Avi Wigderson

Contents

1 Introduction . . . 4

2 Monte-Carlo Algorithms for Counting Problems . . . 7
  2.1 Estimating the Weight of a Set . . . 8
  2.2 Estimating the Relative Size of a Subset . . . 9
  2.3 DNF Formulas . . . 11
  2.4 The Multiterminal Network Reliability (MNR) Problem . . . 12
      2.4.1 A Monte-Carlo algorithm based on cuts for estimating b(F) . . . 13
  2.5 Translates . . . 14
      2.5.1 A Monte-Carlo algorithm for estimating |T| . . . 15
  2.6 The Permanent . . . 15
      2.6.1 A simple Monte-Carlo algorithm for Per(A) . . . 16
      2.6.2 A modified Monte-Carlo algorithm for Per(A) . . . 19

3 Uniform Generation . . . 20

4 Almost Uniform Generation and Randomized Approximate Counting . . . 25

5 Markov Chains as Almost Uniform Generators . . . 28
  5.1 Definitions and Notation . . . 28
  5.2 Markov Chains as Generators . . . 29

6 The Rate of Convergence of Markov Chains . . . 32
  6.1 Conductance . . . 32
  6.2 Coupling . . . 35

7 Random Walks on the n-Dimensional Cube . . . 39
  7.1 The n-dimensional Cube over Z_2 . . . 39
  7.2 The n-dimensional Cube over Z_k . . . 40

8 Approximating the Permanent for Dense Graphs . . . 43

9 Independent Sets in Graphs . . . 45
  9.1 Independent Sets of Cycles and Lines . . . 46
  9.2 What Can Be Done in the General Case . . . 48

10 A Deterministic Algorithm for Counting Independent Sets in Trees . . . 53

11 Partitions . . . 57
  11.1 A Random Walk on P_n . . . 57
  11.2 Random Walks on P_k^n . . . 60

12 Acknowledgements . . . 63

13 References . . . 64

1 Introduction

Counting is the process of finding the number of elements of a given set: for example, counting the number of students in a given class, or finding the number of people who voted Likud in the last elections. The obvious way to solve problems of this type is by exhaustive search of the set of interest. In the first example this may take only a few seconds, but in the second it may take a couple of days (depending on the computational power available).

One may try to develop more efficient counting procedures which depend on the special structure of the set of interest. In the first example, one can count the number of chairs in a row and multiply it by the number of rows. The answer may not be exact, because some of the seats may be unoccupied. In the election example, one may sample from the set of all votes, obtain an estimate of the percentage of Likud voters, and multiply that percentage by the number of all voters (assuming that number is known). The answer will only be an estimate. The quality of the estimate naturally depends on the number of samples made, and on the distribution of the samples over the total population.

In 1979, Valiant [V] defined the class #P of counting problems which are believed to be difficult, i.e. are not known to be solvable in polynomial time. For problems in this class it is natural to seek approximate solutions, in order to reduce the running time of the algorithm. One type of existing approximate counting algorithm is the Monte-Carlo algorithm. These are randomized algorithms which estimate the size of a set with a performance guarantee, i.e. the answer deviates by no more than a specified factor from the exact solution. The running time of the algorithm depends on this factor, and is usually faster than any known deterministic algorithm for the problem.
We will review the Monte-Carlo technique in section 2, and give some examples of counting problems solved using this technique in the past few years.

The example of counting electoral votes raises the idea that there may be a connection between counting the number of elements of a given set and sampling from it. Jerrum, Valiant and Vazirani [JVV] observed this connection, and defined the class Uniform Generation. The algorithms in this class generate (sample) uniformly at random the elements of a given set. [JVV] proved that this class is reducible to counting. That is, given a deterministic (approximate) counting algorithm for a given set, it is possible to build a uniform generator for the elements of that set. It was also shown that the classes of randomized approximate counting algorithms and almost uniform generation algorithms are equivalent, for problems which are self-reducible. These results are described in sections 3 and 4.

Given the above results it was understood that it may be easier to construct almost uniform generators and use them as building blocks in randomized approximate counting algorithms. Broder [B] was the first to use Markov chains as almost uniform generators. The main idea is to define a Markov chain on the set whose size we want to estimate, simulate the chain until it is close to its stationary distribution, and output the state reached as the output of an almost uniform generator. Broder defined an ergodic Markov

chain on the set of all perfect and almost perfect matchings in a dense bipartite graph. He showed that if it converges fast enough to its stationary distribution, then it can be used as an almost uniform generator for perfect matchings. Jerrum and Sinclair [JS] were the first to prove that the chain Broder defined converges fast (see section 8). The Markov chain method is described in section 5. The two major problems which arise when using Markov chains as almost uniform generators are:

- What is the rate of convergence of the chain?
- What is the stationary distribution of the chain?

A polynomial rate of convergence is necessary for the method to be practical. In section 6, we will review two methods to determine the rate of convergence of a given Markov chain: Coupling, by Aldous [A], and Conductance, by Jerrum and Sinclair [SJ].

The second question has no easy solution. For reversible Markov chains it will be shown that it is possible to compute the stationary distribution of each state as the simulation of the chain proceeds (see section 5). In the general case we have to try and bound the stationary distribution. If we can bound the ratio of the stationary probabilities of two neighboring states, then we can derive a global bound on the ratio of the stationary probabilities of any two states, as a function of the diameter of the underlying graph of the chain (see sections 5 and 9.2).

We will illustrate the above concepts by studying the random walk on the n-dimensional cube over Z_k. Of course the number of vertices of the cube is known exactly, and therefore there is no need to estimate it, but the rate of convergence of this walk is of independent interest. We will show that the walk converges rapidly to a uniform stationary distribution, using the Coupling technique (see section 7). As far as we know, no other proof of the rapid convergence of this walk was known before.

The next problem which will be studied is counting the number of independent sets in a graph (section 9). This problem was shown to be #P-complete [PB], and therefore is not likely to have an exact solution, but an approximate one may very well be possible. Recently, Dagum showed that it is possible to estimate the number of independent sets of a claw-free graph in polynomial time ([D]). He used techniques similar to those used by Jerrum and Sinclair ([SJ],[JS]) to bound the conductance of the exchange graph.
The problem of estimating the number of independent sets of a general graph (not necessarily claw-free) in polynomial time remains open. We will try to use the coupling method in order to bound the rate of convergence of more direct walks on the set of independent sets of a given graph. For the simple case of a graph with degree at most 2, a Markov chain with uniform stationary distribution will be defined on the set of independent sets of the graph, and will be shown to converge rapidly. For the general case, a similar Markov chain will be introduced, which again converges fast, but to an unknown stationary distribution. A proof that the stationary

distribution is bounded by a polynomial for a certain class of graphs would immediately yield a polynomial randomized approximate counting algorithm for independent sets of graphs from that class.

For a graph which is a tree, a deterministic polynomial-time algorithm will be presented for counting both the number of independent sets and the number of maximum independent sets of the graph (section 10).

Finally, we will examine the problem of finding the number of partitions of a given integer n. An efficient deterministic algorithm for this problem is not known. We will try to construct an approximate one, using the Markov chain method. We will define two Markov chains: one on the set of partitions of n into k parts, and the second on the set of all partitions of n. Both chains will be shown to be reversible and rapidly converging. Their stationary distribution will be computed exactly. However, since it is not bounded polynomially in n, it cannot be used to estimate the number of partitions of n in polynomial time. As a possible solution, a different Markov chain will be defined on the set of partitions of n into k parts. It has a stationary distribution which is bounded by a polynomial in n and k, but its rate of convergence is unknown so far (section 11).


2 Monte-Carlo Algorithms for Counting Problems

Enumeration problems can be described as computing the size of a set of elements S. We shall say |S| is difficult to compute if any known deterministic algorithm which finds |S| runs in time proportional to |S|. For small sets S this may not be too bad, but for sets of exponential size the computation may take quite a long time. In this section we will describe a class of algorithms which estimate the size of a set S, such that the running time of the algorithm is faster than any known deterministic algorithm which computes |S|, and depends on the accuracy of the estimation. We first need to define when an estimate is "accurate enough". The following definition is due to Karp and Luby [KL]:

Definition 2.1 A random variable Y is called an (ε, δ)-approximation to Q if

    Pr((1 + ε)^{-1} Q ≤ Y ≤ (1 + ε) Q) ≥ 1 − δ

Monte-Carlo algorithms are a randomized computational method for the estimation of some quantity Q. In general, a Monte-Carlo algorithm to estimate Q consists of N independent trials of the same experiment, such that if X_i is the outcome of the i'th trial then E(X_i) = Q. The output of the algorithm is Y = (X_1 + ... + X_N)/N. It is clear that E(Y) = Q, and the question is what number of experiments N guarantees that Y is a good estimate of Q.

Lemma 2.1 Let X_1, ..., X_N be N independent random variables with expectation μ = E(X_i), such that 0 ≤ X_i ≤ z, and let Y = (X_1 + ... + X_N)/N. Then Y is an (ε, δ)-approximation to μ if N ≥ (z/(ε²μ)) log(2/δ), for ε ≤ μ.

Proof: By the Hoeffding inequality (see [Ho]),

    Pr(Y ≥ μ + t) ≤ [ (μ/(μ+t))^{(μ+t)/z} ((z−μ)/(z−μ−t))^{(z−μ−t)/z} ]^N

for all 0 ≤ t ≤ z − μ. Take t = εμ; then

    Pr(Y ≥ (1 + ε)μ) ≤ [ (1/(1+ε))^{(1+ε)μ/z} (1 + εμ/(z−μ−εμ))^{(z−μ−εμ)/z} ]^N ≤ e^{−ε²μN/z}

for ε ≤ μ. In a similar way it is possible to prove that

    Pr(Y ≤ (1 + ε)^{-1}μ) ≤ e^{−ε²μN/z}

Take N ≥ (z/(ε²μ)) log(2/δ). Then

    Pr((1 + ε)^{-1}μ ≤ Y ≤ (1 + ε)μ) ≥ (1 − δ/2)(1 − δ/2) ≥ 1 − δ

Hence Y is an (ε, δ)-approximation to μ. □

Notice that the restriction ε ≤ μ is not a real problem: since we are interested in estimating the size of large sets, μ = Q is large, and of course ε < 1, so the requirement ε ≤ μ is satisfied.

A slightly different bound on the number of trials N needed to guarantee a good estimate is proved in the following lemma.

Lemma 2.2 Let X_1, ..., X_N be N independent random variables with expectation E(X_i) = μ. Then N ≥ (E(X_i²)/E(X_i)²) · (36/ε²) log(1/δ) is sufficient to guarantee an (ε, δ)-approximation to μ.

Proof: Let Y = (X_1 + ... + X_n)/n. Using the Chebyshev inequality, it is easy to prove that Y is an (ε, 1/4)-approximation to μ for n ≥ (E(X_i²)/E(X_i)²) · (3/ε²). Jerrum, Valiant and Vazirani show in [JVV] how to get an (ε, δ)-approximation to μ for any δ. They suggest making t = 12 log(1/δ) estimates Y, each an (ε, 1/4)-approximation to μ, and taking their median as the final estimate of μ. It is possible to show, using the Chernoff inequality, that this median is an (ε, δ)-approximation to μ. The complexity of this algorithm is N = tn = (E(X_i²)/E(X_i)²) · (36/ε²) log(1/δ). □

Let us now review some counting problems which were solved using the Monte-Carlo method. We will use one of the above two lemmas when bounding the number of trials needed in a Monte-Carlo algorithm.
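To make the two sample-size bounds concrete, here is a small numeric sketch (our own illustration, not part of the thesis); it assumes natural logarithms and takes z, μ, ε, δ and the second-moment ratio E(X_i²)/E(X_i)² as inputs:

```python
from math import ceil, log

def trials_hoeffding(z, mu, eps, delta):
    """Sample size from Lemma 2.1: N >= (z / (eps^2 * mu)) * log(2 / delta),
    valid for bounded trials 0 <= X_i <= z with eps <= mu."""
    return ceil(z / (eps ** 2 * mu) * log(2 / delta))

def trials_median_of_means(second_moment_ratio, eps, delta):
    """Sample size from Lemma 2.2:
    N >= (E(X_i^2) / E(X_i)^2) * (36 / eps^2) * log(1 / delta)."""
    return ceil(second_moment_ratio * 36 / eps ** 2 * log(1 / delta))
```

For instance, estimating a mean μ = 0.5 of [0,1]-bounded trials to within a factor (1 + 0.1) with failure probability 0.01 needs about a thousand trials under Lemma 2.1.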

2.1 Estimating the Weight of a Set

Let R be a set, and assume there is a weight function w : R → ...

... with σ, and for each r-tuple ⟨σ_1, ..., σ_r⟩ define a graph G(σ_1, ..., σ_r) where:

- There is a node for each pair ⟨j, k⟩, labelled with one or more of σ_1, ..., σ_r.
- There is an edge between ⟨j, k⟩ and ⟨j′, k′⟩ iff j = j′ or k = k′.

Let c(G) be the number of cycles of length ≥ 2 in the graph G. Let D_2 = {G(σ_1, σ_2) | ⟨σ_1, σ_2⟩ ∈ P²} be the set of graphs which are defined by two permutations in P. For each G ∈ D_2 let [G]_2 = {⟨σ_1, σ_2⟩ ∈ P² | G(σ_1, σ_2) = G} be the set of pairs of permutations in P which define the same graph G.

Lemma 2.6 |[G]_2| = 2^{c(G)}.

Proof: Let ⟨σ_1, σ_2⟩ ∈ [G]_2. Each cycle of length one in G is labelled with both σ_1 and σ_2. Let c be a cycle of length ≥ 2 in G. Every node at an even distance from a fixed node in c must be labelled with the same label, and every other node with the other label. Thus there are two possible labellings for each cycle of length ≥ 2, and therefore a total of 2^{c(G)} labellings. □

Define

    ODD = {⟨σ_1, ..., σ_4⟩ | ∃⟨s, t⟩ labelled with an odd number of labels from σ_1, ..., σ_4}
    EVEN = P⁴ − ODD

and let

    D_4 = {G(σ_1, ..., σ_4) | ⟨σ_1, ..., σ_4⟩ ∈ EVEN}

be the set of graphs which are defined by a 4-tuple of permutations from EVEN. Then it is easy to check that D_4 = D_2. For each G ∈ D_4 let

    [G]_4 = {⟨σ_1, ..., σ_4⟩ ∈ EVEN | G(σ_1, ..., σ_4) = G}

be the set of all 4-tuples from EVEN which define the same graph G.

Lemma 2.7 |[G]_4| = 6^{c(G)}.


Proof: Let ⟨σ_1, ..., σ_4⟩ ∈ [G]_4. Each cycle of length one in G is labelled with all four of σ_1, ..., σ_4. Let c be a cycle of length ≥ 2 in G. Every node at an even distance from a fixed node in c must be labelled with the same two permutations out of σ_1, ..., σ_4, and every node at an odd distance with the remaining two. Thus there are (4 choose 2) = 6 possible labellings for each cycle of length ≥ 2, and therefore a total of 6^{c(G)} labellings. □

And finally,

Lemma 2.8 E(X_i²)/E(X_i)² ≤ 3^{n/2}.

Proof:

1. E(X_i)² = Per(A)² = Σ_{P²} 1 = Σ_{G∈D_2} Σ_{[G]_2} 1 = Σ_{G∈D_2} 2^{c(G)} (by Lemma 2.6).

2. Let X(σ_1, ..., σ_4) = Π_{k=1}^{4} ( sgn(σ_k) Π_{j=1}^{n} b_{j σ_k(j)} ). Then:

    X_i² = det(B_i)⁴ = Σ_{P⁴} X(σ_1, ..., σ_4) = Σ_{ODD} X(σ_1, ..., σ_4) + Σ_{EVEN} X(σ_1, ..., σ_4)

- For each four permutations with ⟨σ_1, ..., σ_4⟩ ∈ ODD, there exist two indices s, t such that ⟨s, t⟩ is labelled with an odd number of labels from these four permutations. Thus either it is labelled by only one of them, say σ_1 (so that σ_1(s) = t while σ_2(s), σ_3(s), σ_4(s) ≠ t), or it is labelled by exactly three. In either case the entry b_{st} appears an odd number of times in the summand, so (using b_{st}² = 1) we can express each summand in the sum over ODD as H(σ_1, ..., σ_4) · b_{st}, where H is a function which does not contain b_{st}. But E(b_{st}) = 0, and therefore the expected value of the first sum is 0.
- For each summand in the sum over EVEN, every entry of B_i appears an even number of times, hence X(σ_1, ..., σ_4) = 1, independent of B_i, and thus E(X(σ_1, ..., σ_4)) = 1.

Thus E(X_i²) = Σ_{EVEN} 1 = Σ_{G∈D_4} Σ_{[G]_4} 1 = Σ_{G∈D_4} 6^{c(G)} (by Lemma 2.7).

Therefore

    E(X_i²)/E(X_i)² = (Σ_{D_4} 6^{c(G)}) / (Σ_{D_2} 2^{c(G)}) ≤ max_G 3^{c(G)} ≤ 3^{n/2}

because D_4 = D_2 and there are at most n/2 cycles of length ≥ 2 in any graph G. □

As a final result we get:

Theorem 2.7 It is possible to get an (ε, δ)-approximation to Per(A) if N = 3^{n/2} · (36/ε²) log(1/δ).

Proof: Follows from Lemma 2.2 and Lemma 2.8. □

Therefore the total running time of the algorithm is 3^{n/2} · (36/ε²) log(1/δ) · poly(n), where poly(n) is the time each trial takes.

2.6.2 A modified Monte-Carlo algorithm for Per(A)

[KKLLL] improve the bound on the number of trials N, using the following algorithm:

Algorithm:
1. Let w_1, w_2, w_3 be the 3 cube roots of unity. Define N random matrices B_i = {b_jk} as follows:
   - b_jk = 0 if a_jk = 0
   - b_jk is chosen uniformly at random from {w_1, w_2, w_3} if a_jk = 1.
2. Set X_i = det(B_i) · conj(det(B_i)) = |det(B_i)|².
3. Estimate Per(A) with Y = (X_1 + ... + X_N)/N.
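A minimal sketch of this estimator (our own illustration; the 0/1 list-of-lists matrix input, the function names, and the cofactor determinant are assumptions made for the example, and are only practical for tiny matrices):

```python
import cmath
import random

def det(M):
    """Determinant by cofactor expansion along the first row
    (fine for the tiny matrices used in this sketch)."""
    n = len(M)
    if n == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(n))

def sample_estimate(A, N, rng=random.Random(0)):
    """KKLLL-style Monte-Carlo estimate of Per(A) for a 0/1 matrix A.

    Each trial replaces every 1-entry of A by a uniformly random cube
    root of unity and evaluates |det(B_i)|^2; the average over N trials
    is an unbiased estimate of the permanent."""
    n = len(A)
    roots = [cmath.exp(2j * cmath.pi * k / 3) for k in range(3)]
    total = 0.0
    for _ in range(N):
        B = [[rng.choice(roots) if A[i][j] else 0 for j in range(n)]
             for i in range(n)]
        d = det(B)
        total += (d * d.conjugate()).real
    return total / N
```

For the all-ones 2×2 matrix, Per(A) = 2; averaging |det(B)|² over all 3⁴ possible entry choices recovers this value exactly, since each cross term contains some entry to the first power and E(b_jk) = 0.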

The following two lemmas can be proved in a way similar to the proofs given in section 2.6.1:

Lemma 2.9 E(X_i) = Per(A).

Lemma 2.10 E(X_i²)/E(X_i)² ≤ 2^{n/2}.

As a consequence we get:

Theorem 2.8 It is possible to get an (ε, δ)-approximation to Per(A) if N = 2^{n/2} · (36/ε²) log(1/δ).


3 Uniform Generation

The above section indicates the existence of a direct connection between counting problems and sampling algorithms. Jerrum, Valiant and Vazirani define and prove this connection in [JVV]. We will review their results here.

Let Σ be a finite alphabet, and R a relation which associates with each problem instance x ∈ Σ*, a set of solutions R(x) = {y ∈ Σ* | xRy}. Thus, for example, x may be a formula in DNF, y an assignment to the variables of x, and xRy if y is a satisfying assignment of x. Given a problem instance x, we can define the following problems:

1. Existence - Does there exist a y such that xRy?
2. Construction - Find a y such that xRy, if one exists.
3. Uniform Generation - Generate uniformly at random a y such that xRy.
4. Counting - What is the size of R(x)?

It is clear that the Existence problem is at least as easy as the Construction problem, in the sense that if we can construct a solution y, we can answer the corresponding Existence problem with 'yes', and if no y is constructed the answer is 'no'. It is also obvious that constructing some solution y is at least as easy as generating uniformly at random a solution y. Jerrum, Valiant and Vazirani show in their paper that the Generation problem is probably strictly more difficult than the Construction problem, and easier than Counting. That is, given an algorithm which finds |R(x)| we can build an algorithm which generates uniformly at random an element of R(x), but the converse is probably not true. Their results apply only to a special kind of relations, self-reducible relations. Schnorr [S] was the first to study this type of relation. Self-reducible relations R are relations for which there exists a simple inductive construction of the solutions in R(x) from the solutions of a few smaller instances of the problem x, and the length of each solution is polynomially bounded in the length of x. A more formal definition follows:

Definition 3.1 A relation R ⊆ Σ* × Σ* is self-reducible iff:

1. There exists a polynomial p such that |y| = p(|x|) whenever xRy.
2. It can be tested deterministically in polynomial time whether xRy.
3. There exist polynomial-time computable functions ψ : Σ* × Σ* → Σ* and σ : Σ* → N such that:
   (a) σ(x) = O(log |x|).
   (b) If p(x) > 0 then σ(x) > 0.
   (c) |ψ(x, w)| ≤ |x|.
   (d) ⟨x, y_1...y_n⟩ ∈ R iff ⟨ψ(x, y_1...y_{σ(x)}), y_{σ(x)+1}...y_n⟩ ∈ R.

Now let us define generating and counting algorithms in terms of relations R:

Definition 3.2 A Uniform Generator for a relation R is an algorithm which on input x outputs only strings y in the solution set of x, each y with equal probability, and the probability of producing some output is bounded away from 0.

A Deterministic Approximate Counter for a relation R is an algorithm which on input ⟨x, ε⟩ outputs a Y such that

    (1 + ε)^{-1} |R(x)| ≤ Y ≤ (1 + ε) |R(x)|

For self-reducible relations R, we can define the following set of partial solutions of a problem instance x:

    R_{x,w} = {z | xR wz}

The following lemma is easy to prove, using the self-reducibility structure:

Lemma 3.1 Let R be a self-reducible relation. If there exists a deterministic approximate counter for R, then there exists a deterministic approximate counter for R_{x,w} for any x and w.

We now get to the main theorem of this section, which states that given a deterministic approximate counter for a self-reducible R, we can construct a uniform generator for R. Note that the proof is constructive, and that the counting algorithm must be deterministic (for the connection between randomized approximate counters and generators, see section 4).

Theorem 3.1 Let R be a self-reducible relation. If there exists a deterministic approximate counter for R, then there exists a uniform generator for R.

Proof: Let C be a deterministic approximate counter for R, and let C_{x,y} be the approximation to |R_{x,y}| promised by the above lemma. We describe a recursive algorithm which generates uniformly a y = y_1 y_2 ... y_m ∈ R(x), m = p(|x|). Each iteration of the algorithm generates the next letter y_i of y, with probability proportional to the number of words in R_{x, y_1 y_2 ... y_i}. Assume for simplicity that Σ = {0, 1}. Set y = λ (the empty word). With probability C_{x,y0} / (C_{x,y0} + C_{x,y1}) set the first letter of y to be 0, that is y = y0; otherwise, with probability C_{x,y1} / (C_{x,y0} + C_{x,y1}), set the first letter to be 1. Continue recursively with the new y until |y| = m. Then check whether xRy, and if so output y with probability

    φ = (1/C_{x,λ}) · [(C_{x,0} + C_{x,1}) / C_{x,y_1}] · [(C_{x,y_1 0} + C_{x,y_1 1}) / C_{x,y_1 y_2}] · ... · [(C_{x,y_1...y_{m−1} 0} + C_{x,y_1...y_{m−1} 1}) / C_{x,y_1...y_m}]

Note that if C were an exact counter, then the product telescopes and φ = 1/|R_{x,y_1...y_m}| = 1; that is, we always output y, and each y is output with the same probability. Since C is only approximate, we output each y with probability ≤ 1, and the probability of each output y is the probability of generating it multiplied by φ, which is

    [C_{x,y_1} / (C_{x,0} + C_{x,1})] · ... · [C_{x,y_1...y_m} / (C_{x,y_1...y_{m−1} 0} + C_{x,y_1...y_{m−1} 1})] · φ = 1/C_{x,λ}

that is, each output y has an equal probability. The only thing which needs to be checked is that the probability of some output is bounded away from 0, i.e. that φ is bounded away from 0. But φ ≥ (1 + ε)^{−2m}, and for ε = 1/m it is bounded away from 0. □
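The construction in the proof can be sketched on a toy self-reducible relation (our own illustration). Below, R(x) is taken to be the set of binary strings of length m with at most k ones, and the counter is exact, so the acceptance probability φ equals 1 and every solution is generated with probability exactly 1/C:

```python
import random
from math import comb

def count_ext(m, k, prefix):
    """Exact counter for the toy relation: the number of binary strings
    of length m with at most k ones that extend the given prefix."""
    ones = prefix.count('1')
    if ones > k:
        return 0
    rem = m - len(prefix)
    return sum(comb(rem, i) for i in range(0, k - ones + 1))

def generate(m, k, rng=random.Random(0)):
    """Uniform generator built from the counter, as in Theorem 3.1:
    extend the word one letter at a time, choosing each extension with
    probability proportional to the number of solutions consistent
    with it."""
    y = ''
    while len(y) < m:
        c0 = count_ext(m, k, y + '0')
        c1 = count_ext(m, k, y + '1')
        y += '0' if rng.random() < c0 / (c0 + c1) else '1'
    return y
```

Because the counter is exact, the product of the branching probabilities along any solution telescopes to 1/C, so the distribution over outputs is exactly uniform.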

Let us move on and show that Uniform Generation is probably more difficult than Construction. That is, there are relations R for which there exist polynomial-time algorithms which construct solutions of R, but no fast uniform generator for R is believed to exist.

Theorem 3.2 If there exists a polynomial-time bounded uniform generator for cycles in a given directed graph, then NP = RP.

Proof: Let G = (V, E) be a directed graph with |V| = n, and define

    CYCLE(G) = {c | c is a cycle of G}

Then:

1. The problem of constructing a cycle in G is easily solved in polynomial time.
2. We will show that if there exists a uniform generator of cycles then NP = RP, and thus it is unlikely that uniform generation of cycles can be solved in polynomial time. Define

    DHC = {h | h is a Hamiltonian cycle in G}


Let G′ = (V′, E′) be the directed graph derived from G by replacing every edge ⟨u, v⟩ ∈ E by a chain of k two-path diamond gadgets leading from u to v (with internal nodes u′, v′), where k = n log n.

[Figure: each edge ⟨u, v⟩ of G is replaced in G′ by k copies of a diamond gadget, chained from u to v through nodes u′, v′.]

We can see that for each cycle c of G which includes the edge ⟨u, v⟩, there are 2^k corresponding cycles in G′ which include the nodes u′, v′, since each of the k diamonds can be traversed in one of two ways. Therefore, if G′ contains a cycle of length 2kn then it contains at least 2^{kn} cycles of this length, and this happens iff G contains a Hamiltonian cycle (of length n). Also, the total number of cycles in G′ of length < 2kn is at most n^n 2^{k(n−1)} ≤ 2^{kn}. Thus if |DHC| ≠ 0 then the probability that a randomly generated cycle of G′ has length 2kn is ≥ 1/2, and if |DHC| = 0 then the probability is 0. Therefore the Hamiltonian cycle problem is in RP, and because it is NP-complete we get that NP = RP. □

To illustrate the gap between Counting and Uniform Generation, Jerrum, Valiant and Vazirani turn to the problem of counting the number of satisfying assignments of a DNF formula, versus the problem of generating uniformly a satisfying assignment of such a formula. Their proof relies on the Monte-Carlo algorithm presented in section 2.3, which is due to Karp and Luby.

Theorem 3.3 There exists a polynomial-time bounded uniform generator for satisfying assignments of a DNF formula, but the corresponding counting problem is #P-complete.

Proof:

1. The problem of counting the number of satisfying assignments of a DNF formula was shown to be #P-complete ([V]).
2. The sampling algorithm presented in section 2.3 can be used as a uniform generator. We generate uniformly at random a pair (j, x), where x is an assignment satisfying the j'th clause of the formula F, and output x if j is the lowest-numbered clause satisfied by x (steps 1-3 of section 2.3). Each assignment is generated with equal probability, and the probability of some output is ≥ 1/m, where m is the number of clauses of F. If we repeat this algorithm m times, then the probability of no output is ≤ (1 − 1/m)^m ≤ 1/e, and therefore bounded away from 1. □
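One round of the generator described above can be sketched as follows (our own illustration; the clause representation as a dict from variable index to required truth value is an assumption made for the example):

```python
import random

def dnf_uniform_sample(clauses, n, rng=random.Random(0)):
    """One round of the Karp-Luby style generator: pick a clause with
    probability proportional to the number of assignments satisfying it,
    complete the free variables uniformly at random, and accept only if
    the chosen clause is the lowest-numbered one the assignment
    satisfies. Returns the assignment, or None if this round produced
    no output."""
    weights = [2 ** (n - len(c)) for c in clauses]
    r = rng.uniform(0, sum(weights))
    j = 0
    while r > weights[j]:           # pick clause j proportionally to weight
        r -= weights[j]
        j += 1
    x = [rng.random() < 0.5 for _ in range(n)]
    for var, val in clauses[j].items():   # force clause j's literals
        x[var] = val
    for i in range(j):              # reject unless j is the lowest satisfied
        if all(x[var] == val for var, val in clauses[i].items()):
            return None
    return tuple(x)
```

Each satisfying assignment is output with the same probability per round, so repeating the round until an output appears yields a uniform sample.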


4 Almost Uniform Generation and Randomized Approximate Counting

In the last section we reviewed the connection between uniform generation and counting. We showed that given a counting algorithm (even an approximate one) we can construct a uniform generator, but that the opposite is probably not true. Jerrum, Valiant and Vazirani in [JVV], and Broder in [B], showed that if we are willing to compromise and be satisfied with a randomized approximate counting algorithm, then it is possible to construct one given an almost uniform generator, and vice versa. Their results apply only to self-reducible relations. Let us define the concepts above (these definitions can be found in [JVV], [B], [JS] and other places, in various forms):

Definition 4.1 An ε-Generator for a relation R is an algorithm which on input x outputs, with probability bounded away from 0, only strings y in R(x), and

    (1 + ε)^{-1} / |R(x)| ≤ Pr(output y) ≤ (1 + ε) / |R(x)|

(that is, each output y has a probability which is close to uniform). An Almost Uniform Generator for R is an algorithm which on input ⟨x, ε⟩ behaves as an ε-Generator for R. An Almost Uniform Generator is fully polynomial (f.p.) if it runs in time polynomial in |x| and 1/ε.

An (ε, δ) Randomized Approximate Counter for a relation R is an algorithm which on input x outputs a Y such that

    Pr((1 + ε)^{-1} |R(x)| ≤ Y ≤ (1 + ε) |R(x)|) ≥ 1 − δ

A Randomized Approximate Counter for R is an algorithm which on input ⟨x, ε, δ⟩ behaves as an (ε, δ) Randomized Approximate Counter for R. A Randomized Approximate Counter is fully polynomial if it runs in time polynomial in |x|, 1/ε and log(1/δ).

We are now ready to state and prove the equivalence of randomized approximate counters and almost uniform generators for self-reducible relations R. These results can again be found in [JVV] and [B]. Before we proceed, let us state the following lemma. The proof is omitted.

Lemma 4.1 Let R be a self-reducible relation. If there exists an f.p. randomized approximate counter for R, then there exists an f.p. randomized approximate counter for R_{x,w}.


From this lemma, we get that given a fast approximate counter we can build a fast almost uniform generator. Again the proof is constructive.

Theorem 4.1 Let R be a self-reducible relation. If there exists an f.p. randomized approximate counter for R, then there exists an f.p. almost uniform generator for R.

Proof: The proof is identical to that of Theorem 3.1, with the difference that we are given an f.p. randomized approximate counter C instead of a deterministic one. Therefore the algorithm presented in Theorem 3.1 constructs an almost uniform generator instead of a uniform one. Let m = p(|x|); then by the algorithm described in the proof of Theorem 3.1, we make 2m calls to the randomized approximate counter C (using the above lemma). Therefore the deviation from the exact uniform probability 1/|R(x)| is at most 2mδ. But 1/|R(x)| ≥ 1/2^m, and thus if we take δ = ε/(2m · 2^m) then: if ⟨x, y⟩ ∈ R then

    (1 + ε)^{-1} / |R(x)| ≤ 1/|R(x)| − 2mδ ≤ Pr(output y) ≤ 1/|R(x)| + 2mδ ≤ (1 + ε) / |R(x)|

and if ⟨x, y⟩ ∉ R then Pr(output y) = 0. It is easy to verify that the resulting almost uniform generator is polynomial in |x| and 1/ε. □

The converse of the above theorem is:

Theorem 4.2 Let R be a self-reducible relation. If there exists an f.p. almost uniform generator for R, then there exists an f.p. randomized approximate counter for R.

Proof: Let m = p(|x|), and let M be an f.p. almost uniform generator for R. Make N calls to M with input ⟨x, ε/m⟩, and let w be the most commonly occurring prefix of length σ(x) among the strings generated. Set X_i = 1 if the initial segment of the i'th string generated is w, and X_i = 0 otherwise. Take N ≥ (|R(x)| / |R_{x,w}|) · (m²/ε²) log(2/δ); then Y = (X_1 + ... + X_N)/N is an (ε/m, δ)-approximation to |R_{x,w}| / |R(x)| (see section 2.2). But |R_{x,w}| / |R(x)| ≥ 1/2^{σ(x)} ≥ 1/|x|^{O(1)} (w was chosen as the most commonly occurring prefix of length σ(x), and σ(x) = O(log |x|)), and therefore N is polynomial in |x|, 1/ε and log(1/δ). R_{x,w} = R(ψ(x, w)) because R is self-reducible, and thus we can proceed recursively. After at most m iterations, we will be left with a set containing just one string. Let Y′ be the inverse of the product of all the ratios computed. Then Y′ is an (ε, δ)-approximation to |R(x)|, because

    (1 + ε)^{-1} |R(x)| ≤ (1 + ε/m)^{-m} |R(x)| ≤ Y′ ≤ (1 + ε/m)^m |R(x)| ≤ (1 + ε) |R(x)|

□
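The recursive ratio estimation in the proof can be sketched as follows (our own illustration; `sample` is an assumed black-box uniform sampler over the solutions extending a given prefix, and for simplicity one letter is fixed per iteration):

```python
def count_from_sampler(sample, m, N):
    """Estimate |R| from a uniform sampler over solutions of length m,
    following the scheme of Theorem 4.2: estimate the fraction of
    samples whose next letter is the most common one, recurse on that
    prefix, and invert the product of the estimated ratios.

    `sample(prefix)` must return a uniformly random solution extending
    `prefix` (an assumed black-box generator)."""
    prefix = ''
    estimate = 1.0
    while len(prefix) < m:
        draws = [sample(prefix)[len(prefix)] for _ in range(N)]
        w = max(set(draws), key=draws.count)
        ratio = draws.count(w) / N   # estimates |R_{prefix w}| / |R_prefix|
        estimate *= 1.0 / ratio
        prefix += w
    return estimate
```

Each step divides by a ratio that is at least 1/2 (w is the majority letter), so the per-step relative error stays small and the product recovers |R| up to an (1 + ε)-type factor.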

Theorem 4.3 Let R be a self-reducible relation. If there exists an f.p. |x|^k-generator for R, where the output probability of each y ∈ R(x) is known (or can be computed in polynomial time), then there exists an f.p. randomized approximate counter for R.

Proof: The proof is similar to the proof of Theorem 4.2, and relies on the algorithm described in section 2.2 for computing the ratio of the sizes of two sets. □

5 Markov Chains as Almost Uniform Generators

As stated in the above sections, there is a tight connection between counting and generating. Therefore it may be easier to try to build an almost uniform generator for a relation R, instead of trying to construct a counting algorithm for it directly. Broder was the first to use Markov chains as generators. Before we review his results and see how to use Markov chains as almost uniform generators, let us review some elementary concepts of Markov chains. These definitions can be found in any probability book (see for example [Ro] for a good reference on Markov chains).

5.1 Definitions and Notation

Let (X_t)_{t=0}^∞ be a Markov chain on a finite state space [N] = {1, ..., N}, with transition matrix P = (p_ij), i, j = 1, ..., N, where p_ij = Pr(X_{t+1} = j | X_t = i), independent of t. The s-step transition matrix is P^s = (p^s_ij), where p^s_ij = Pr(X_{t+s} = j | X_t = i), independent of t. Let π^t = (π^t_1, ..., π^t_N) be the distribution vector of X_t, where π^t_i = Pr(X_t = i). The initial distribution of the chain is π^0. We will write π^t_ij = Pr(X_t = i | X_0 = j); that is, (π^t_ij)_i is the distribution vector of the chain at time t, given that π^0_j = 1 and π^0_k = 0 for every k ≠ j. In this case j is called the initial state of the chain.

The chain is ergodic if there exists a distribution π = (π_1, ..., π_N), such that π_i > 0 for all i, and lim_{s→∞} p^s_ij = π_j for all i, j ∈ [N]. Then π is called the stationary distribution of the chain, and it is the only vector which satisfies πP = π; that is, it is a left eigenvector of P with eigenvalue λ_0 = 1.

The following lemma, which we state without proof, gives an easy criterion to verify whether a given chain is ergodic.

Lemma 5.1 The chain is ergodic iff:
1. The chain is irreducible - for all i, j ∈ [N] there exists s such that p^s_ij > 0, and
2. The chain is aperiodic - for all i, j ∈ [N], gcd{s | p^s_ij > 0} = 1.
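For a small explicit chain, both conditions of the lemma can be checked mechanically. The sketch below is my own illustration (not from the thesis); it examines the powers P^s only up to a finite horizon, an assumption that makes the check heuristic for chains with very long periods or paths.

```python
from math import gcd

# Sketch (illustration only): check irreducibility and aperiodicity as in
# Lemma 5.1 by inspecting which entries of P^s are positive, s = 1..horizon.

def is_ergodic(P, horizon=None):
    n = len(P)
    horizon = horizon or 2 * n * n
    reach = [[False] * n for _ in range(n)]   # was p_ij^s > 0 for some s?
    period = [[0] * n for _ in range(n)]      # gcd of {s : p_ij^s > 0}
    Q = [row[:] for row in P]                 # Q holds P^s
    for s in range(1, horizon + 1):
        for i in range(n):
            for j in range(n):
                if Q[i][j] > 0:
                    reach[i][j] = True
                    period[i][j] = gcd(period[i][j], s)
        Q = [[sum(Q[i][k] * P[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
    irreducible = all(all(row) for row in reach)
    aperiodic = all(period[i][j] == 1
                    for i in range(n) for j in range(n) if reach[i][j])
    return irreducible and aperiodic
```

For example, the deterministic 2-cycle [[0, 1], [1, 0]] is irreducible but has period 2, so is_ergodic returns False; adding self loops of probability 1/2, as suggested in section 5.2, makes it ergodic.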

An ergodic Markov chain is said to be time-reversible iff p_ij π_i = p_ji π_j for every i, j ∈ [N], where π is the stationary distribution of the chain. If we can find a distribution π such that p_ij π_i = p_ji π_j for every i, j, then:

(πP)_j = Σ_i p_ij π_i = Σ_i p_ji π_j = π_j

for every j. Hence the chain is reversible, and π is its stationary distribution. Thus, for example, if p_ij = 1/d for every i and j, where d is some positive constant, then the uniform distribution π_i = 1/N is a solution to the set of equations p_ij π_i = p_ji π_j, and therefore the stationary distribution of the chain. The underlying graph of a Markov chain is G = (V, E), where V = [N], and for each i, j ∈ V such that p_ij > 0 there is a directed edge (i, j) ∈ E with weight p_ij. State i will be called a neighbor of state j if the edge (i, j) exists, that is, if p_ij > 0. If also p_ji > 0, then states i, j will be called neighbors. If p_ij = 1/d_i, where d_i is the number of edges leaving state i, then as above the distribution π_i = d_i / 2|E| is a solution to the set of equations p_ij π_i = p_ji π_j. Hence the chain is reversible, with stationary distribution π.

5.2 Markov Chains as Generators

We now outline a general procedure which describes how to use Markov chains as almost uniform generators. Let R be a relation with solution set R(x), from which we want to generate uniformly.

1. Define a Markov chain on R(x) with a (not necessarily uniform) stationary distribution.
2. Simulate the chain until it is close to stationarity. That is, if we are at state i, then we move to state j with probability p_ij.
3. Output the state reached as the output of an almost uniform generator.

This simple scheme raises many problems:

1. How do we define the Markov chain? This is not always easy, since the chain should be irreducible, so that it will be possible to reach (generate) all states of the chain. The chain should also be aperiodic, so that at any given time there will be a positive probability of being at each of the states of the chain. This requirement is easily achieved if we add a self loop of probability 1/2 to each state; that is, we set p_ii = 1/2 for each i, and adjust the rest of the probabilities. This change only slows the simulation of the chain slightly, but results in an aperiodic chain.
2. From what state do we start simulating the chain? It should be possible to construct at least one element of R(x) in polynomial time and to start the simulation from it.

3. When is the chain close to stationarity? For how long do we need to simulate the chain until it is close to stationarity, and what does it mean to be close to stationarity? (See section 6.)
4. What is the stationary distribution of the chain? If the chain has a uniform stationary distribution, then it can be used as an almost uniform generator. However, we will show that it suffices to have a chain with a distribution that is 'close' to uniform. This too is not always easy to achieve or determine.

The first two problems are directly connected to the structure of the problem x, and each problem should be treated differently. For example, Broder in [B] defined a Markov chain on the set of perfect and almost perfect matchings of a given graph; the problem of finding one perfect matching has many known algorithms. In section 9, we will define a Markov chain on the set of all independent sets of a given graph. Constructing one independent set is of course easy: just take any node of the graph as an independent set of size one. For this reason, let us concentrate on the last two problems raised above. We will complete this section with a more concrete discussion of the last question, and leave the third question for the next section.

If the stationary distribution is not uniform, but it is known that π_i = c_i / D, for D = Σ_i c_i, then we can output state i with probability 1/c_i, and thus achieve a uniform distribution on R(x). As a result, we can build an approximate counter for R, whose running time will depend on the deviation of the stationary distribution from the uniform one. The resulting counting algorithm will be fully polynomial if the stationary distribution is close to uniform, where:

Definition 5.1 The stationary distribution π of a Markov chain over state space [N] is said to be close to uniform if 1/(cN) ≤ π_i ≤ c/N for every i, where c is polynomial in log N (or c is polynomial in |x|, where R(x) is the relation described by the Markov chain). (See also section 2.2 and theorem 4.3.)

If the chain is reversible, it is possible to compute the constants c_i as the simulation proceeds.

Algorithm:
1. Assume without loss of generality that the simulation of the chain begins at state i. Set c_i = 1.
2. The chain is reversible, and therefore

p_ij π_i = p_ji π_j

hence also

p_ij c_i = p_ji c_j

Therefore, for each j such that p_ij > 0, we can compute c_j = (p_ij / p_ji) c_i. Since we can move from state i only to states j for which p_ij > 0, we are done. Continue with state j.

If the chain is not reversible, but we know that π_i/π_j ≤ c for any two neighboring states i, j, where c is a constant, then we can derive the bound π_i/π_j ≤ c^d for any two states i, j, where d is the diameter of the underlying graph of the chain. Thus if d ≤ log log N, where N is the number of states (or d ≤ log |x| in terms of the relation R(x) described by the chain), then π_i/π_j ≤ c^d ≤ (log N)^{log c}; hence the stationary distribution π is close to uniform.
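The algorithm above translates directly into code. The sketch below is my own illustration: it simulates a hypothetical birth-death chain (which is always reversible) and records a weight c_j for each state the first time the walk reaches it, via c_j = (p_ij / p_ji) c_i.

```python
import random

# Sketch (illustration only): compute the weights c_j, proportional to pi_j,
# on the fly while simulating a reversible chain.

def simulate_with_weights(P, start, steps):
    c = {start: 1.0}  # step 1: set c_i = 1 at the initial state
    state = start
    for _ in range(steps):
        r, acc, nxt = random.random(), 0.0, state
        for j, p in enumerate(P[state]):
            acc += p
            if r < acc:
                nxt = j
                break
        if nxt not in c:  # step 2: detailed balance determines c_nxt
            c[nxt] = (P[state][nxt] / P[nxt][state]) * c[state]
        state = nxt
    return state, c

# A hypothetical reversible birth-death chain with pi = (1/4, 1/2, 1/4).
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
state, c = simulate_with_weights(P, 0, 500)
# Once all states have been visited, c = {0: 1.0, 1: 2.0, 2: 1.0}, i.e. c is
# proportional to pi.  Accepting the final state with probability
# min(c.values()) / c[state] then yields an (almost) uniform output, as in
# the discussion above.
```

The acceptance step uses min(c.values()) / c[state] rather than 1/c[state] so that it is a valid probability even when some c_j < 1; both choices produce output proportional to π_state / c_state, which is uniform.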


6 The Rate of Convergence of Markov Chains

Let us now turn to the question of estimating the time it takes a given Markov chain MC to get close to its stationary distribution π, or, more formally: estimating the rate of convergence of MC. A fast rate of convergence is needed if we want to use the chain as a polynomial generator. It is not enough to know that the chain converges fast; we also need to know how fast, so that we know for how long we should simulate the chain until it is close to stationarity. Before we introduce methods for estimating the rate of convergence of a given Markov chain, let us define more formally what we mean by 'close to stationarity' and 'converges fast':

Definition 6.1 Let d(t) be some distance measure between the stationary distribution π and the distribution π^t of the chain at time t. We say the chain is ε-close to π at time t if d(t) ≤ ε.

Definition 6.2 A Markov chain over state space [N] converges rapidly to its stationary distribution π if it is ε-close to π at time t, where t is a polynomial in log N and 1/ε.

We will now introduce two methods to estimate the time t needed until the chain is ε-close to its stationary distribution: one developed by Sinclair and Jerrum [SJ], and the other by Aldous [A].

6.1 Conductance

This method for estimating the rate of convergence of a given Markov chain is due to Jerrum and Sinclair [SJ]. The intuition behind their method is that the chain converges rapidly if it is unlikely to stay long in any subset S of the state space whose total stationary probability is small. They formalize this concept by considering the underlying graph of the chain as a network, and showing that the edges which leave S should carry a large 'flow', so that S is easily left. The conductance of the chain measures the minimum relative connection between small subsets S and the rest of the space, and is shown to be related to the rate of convergence of the chain. We now outline the main aspects of their proof, but first let us define the distance measure they use:

Definition 6.3 The relative pointwise distance between π and π^t is

Δ(t) = max_{i,j ∈ [N]} |π^t_ij − π_i| / π_i
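For a chain small enough to write down explicitly, Δ(t) can be computed directly; the sketch below (mine, not the thesis's) forms P^t and takes the maximum of |π^t_ij − π_i| / π_i, using the fact that π^t_ij = Pr(X_t = i | X_0 = j) is the (j, i) entry of P^t.

```python
# Sketch (illustration only): relative pointwise distance Delta(t) for an
# explicit chain with known stationary distribution pi.

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def rel_pointwise_distance(P, pi, t):
    n = len(P)
    Pt = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(t):  # Pt = P^t
        Pt = mat_mul(Pt, P)
    # pi_ij^t = Pr(X_t = i | X_0 = j) is entry (j, i) of P^t
    return max(abs(Pt[j][i] - pi[i]) / pi[i]
               for i in range(n) for j in range(n))

# Hypothetical 2-state chain with pi = (2/3, 1/3); Delta(t) decays
# geometrically, like 0.7^t (0.7 is the second eigenvalue of P).
P = [[0.9, 0.1], [0.2, 0.8]]
pi = [2.0 / 3.0, 1.0 / 3.0]
```

The geometric decay in the second eigenvalue is exactly what Theorem 6.1 formalizes.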

The proof of the following theorem can be found in [SJ]:

Theorem 6.1 Let P be the transition matrix of an ergodic time-reversible Markov chain, let λ_i, 1 ≤ i ≤ N, be the eigenvalues of P, and assume that λ_1 = 1 ≥ λ_2 ≥ ... ≥ λ_N. Then

Δ(t) ≤ λ_2^t / min_{i ∈ [N]} π_i

As a result of this theorem, we see that an ergodic time-reversible Markov chain will converge rapidly to π if π_i is not too small for all i ∈ [N], and λ_2 is bounded away from 1. The first of these conditions can easily be checked if the stationary distribution is uniform or known to be close to uniform; therefore we now examine the second. Let S ⊆ [N] and define C_S = Σ_{i ∈ S} π_i - the capacity of S, F_S = Σ_{i ∈ S, j ∉ S} p_ij π_i - the flow out of S, and Φ_S = F_S / C_S. Let Φ = min_{0 < C_S ≤ 1/2} Φ_S denote the conductance of the chain.
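For chains small enough to enumerate subsets, the quantities C_S, F_S and Φ_S can be computed exactly. The brute-force sketch below is my own; it takes the minimization to range over subsets S with 0 < C_S ≤ 1/2, the standard convention for the conductance.

```python
from itertools import combinations

# Sketch (illustration only): brute-force conductance
# Phi = min over S with 0 < C_S <= 1/2 of F_S / C_S.

def conductance(P, pi):
    n = len(P)
    best = float('inf')
    for size in range(1, n):
        for S in combinations(range(n), size):
            Sset = set(S)
            C = sum(pi[i] for i in S)                   # capacity C_S
            if C > 0.5:
                continue
            F = sum(pi[i] * P[i][j]                     # flow F_S out of S
                    for i in S for j in range(n) if j not in Sset)
            best = min(best, F / C)
    return best

# The lazy 2-state chain with uniform pi has conductance 1/2:
# for S = {0}, C_S = 1/2 and F_S = 0.5 * 0.5 = 1/4.
P = [[0.5, 0.5], [0.5, 0.5]]
print(conductance(P, [0.5, 0.5]))  # -> 0.5
```

Enumerating all subsets takes exponential time; the point of the conductance method is that for the chains of interest one bounds Φ analytically rather than computing it.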
