Jan 4, 1991 - Alan Friezey. Department of ...... is similar to that employed by Payne and Weinberger 28] to bound the second largest eigenvalue of the \free.
COMPUTING THE VOLUME OF CONVEX BODIES: A CASE WHERE RANDOMNESS PROVABLY HELPS Martin Dyer School of Computer Studies, University of Leeds, Leeds, U.K. and
Alan Friezey Department of Mathematics, Carnegie-Mellon University, Pittsburgh, U.S.A. 4 January, 1991 Abstract
We discuss the problem of computing the volume of a convex body K in n . We review worst-case results which show that it is hard to deterministically approximate voln K and randomised approximation algorithms which show that with randomisation one can approximate very nicely. We then provide some applications of this latter result. IR
Supported y Supported
by NATO grant RG0088/89 by NSF grant CCR-8900112 and NATO grant RG0088/89
1
1 Introduction The mathematical study of areas and volumes is as old as civilization itself, and has been conducted for both intellectual and practical reasons. As far back as 2000 B.C., the Egyptians1 had methods for approximating the areas of elds (for taxation purposes) and the volumes of granaries. The exact study of areas and volumes began with Euclid2 and was carried to a high art form by Archimedes3. The modern study of this subject began with the great astronomer Johann Kepler's treatise4 Nova stereometria doliorum vinariorum, which was written to help wine merchants measure the capacity of their barrels. Computational eciency has always been important in these studies but a formalisation of this concept has only occurred recently. In particular the notion of what is computationally ecient has been identi ed with that of polynomial time solvability. We are concerned here with the problem of computing the volume of a convex body in n , where n is assumed to be relatively large. We present results on the computational complexity of this problem which have been obtained over the past few years. Many of our results pertain to a general oracle-based model of computation for problems concerning sets developed by Grotschel, Lovasz and Schrijver [13]. This model is discussed in Section 2. We note here that classical approaches, using calculus, appear tractable only for bodies with a high degree of symmetry (or which can be anely mapped to such a body). We can for example show by these means that the volume of the unit ball B (0; 1) in n is n=2 =?(1 + n=2), or that the volume of a simplex with with vertices p0 ; p1 ; : : : ; pn is given by the \determinant formula" 1 1 : : : 1 : vol () = (1) IR
IR
p0 p1 : : : pn
n
However, for unsymmetric bodies, the complexity of the integrations grows rapidly with dimension, and quickly becomes intractable. In Section 3, we formalise this observation, and discuss negative results which show that it is provably hard for a completely deterministic polynomial time algorithm to calculate, or even closely approximate, the volume of a convex body. In stark contrast to these negative results, in Section 4 we describe the randomized polynomial time algorithm of Dyer, Frieze and Kannan [10], with improvements due to Lovasz and Simonovits [24], Applegate and Kannan [2]. We give some new improvements in this paper. This algorithm allows one, with high probability, to approximate the volume of a convex body to any required relative error. This algorithm has a number of applications, and some of these are described in Section 5. Section 6 then examines \how much randomness" is needed for this algorithm to succeed.
2 The oracle model A convex body K n could be be given in a number of ways. For example K could be a polyhedron and we are given a list of its faces, as we would be in the domain of Linear Programming. We could also be given a set of points in n and told that K is its convex hull. We consider this \polyhedral" situation brie y in Section 3.2. In general, however, K may not be a polyhedron, and it might be dicult (or even impossible) to give a compact description of it. For example, if K = f(y; z ) 2 m+1 : v(y) z g, where v(y) = maxfcx : Ax = y; x 0g is the value function of a linear program (A is an m n matrix.) IR
IR
IR
1 The Rhind Papyrus (copied ca. 1650 BC by a scribe who claimed it derives from the \middle kingdom" about 2000 - 1800 BC) consists of a list of problems and solutions, 20 of which relate to areas of elds and volumes of granaries. 2 The exact study of volumes of pyramids, cones, spheres and regular solids may be found in Euclid's Elements (ca. 300 BC). 3 Archimedes (ca. 240 BC) developed the method of exhaustion (found in Euclid) into a powerful technique for comparing volumes and areas of solids and surfaces. Manuscripts: 10 < < 3 1 ). 1. Measurement of the Circle. (Proves 3 71 7 2. Quadrature of the Parabola 3. On the Sphere and Cylinder 4. On Spirals 5. On Conoids and Spheroids
The application of modern in nitesimal ideas begins with Kepler's Nova stereometria doliorum vinariorum (New solid geometry of wine barrels), 1615. 4
2
We want a way of de ning convex sets which can handle all these cases. This can be achieved by taking an \operational" approach to de ning K i.e. we assume that information about K can be found by asking an oracle. This approach is studied in detail by Grotschel, Lovasz and Schrijver [13]. Our model of computation for convex bodies is taken from [13]. In order to be able to discuss algorithms which are ecient on a large class of convex bodies, we do not assume any one particular formalism for de ning them. For example, we do not want to restrict ourselves to convex polyhedra given by their faces. However, if the body is not described in detail, we must still have a way of gaining information about it. This is done by assuming that one has access to an \oracle". For example we may have access to a strong membership oracle. Given x 2 n we can \ask" the oracle whether or not x 2 K . The oracle is assumed to answer immediately. Thus the work that the oracle does is hidden from us, but in most cases of interest it would be a polynomial time computation. For example, if K is a polyhedron given by its facets, all the oracle needs to do is check whether or not x is on the right side of each de ning hyperplane. The advantage of working with oracles is that the algorithms so de ned can be applied in a variety of settings. Changing the class of convex body being dealt with, only requires changing the oracle (i.e. a procedure in the algorithm,) and not the algorithm itself. Moreover, an oracle such as this, plus a little more information, is sucient to solve a variety of computational problems on K . With such an oracle, we will need to be given a litle more information. We must assume that there exist positive r; R 2 and a 2 n such that B (a; r) K B (a; R) (2) where B (x; ) denotes the ball centred at x with radius . In this case we say that the oracle is well-guaranteed, with a; r; R being the guarantee. Without such a guarantee, one could not be certain of nding even a single point of K in nite time. So, from now on, we assume that the guarantee is given along with the oracle. We do not lose any important generality if we assume that r; R 2 and a 2 n . Using h i to denote the number of bits needed to write down a rational object, we let L0 = hr; R; ai and L = L0 + n. This will be taken as the size hK i of our input oracle. A polynomial time algorithm is then one which runs in time which is polynomial in hK i. Hence we are allowed a number of calls on our oracle which is polynomial hK i. In the cases of interest, it is also true that each such call can be answered in time which is polynomial in hK i, and hence we have a polynomial time algorithm overall. (See [13] for further details.) If K is a polyhedron given by its faces, then it is more usual to let the input length be the number of bits needed to write down the coecients of these faces. The reader should be able to convince him/herself that if K is non-empty then in polynomial time one can compute a; r; R as above and the two notions of input length are polynomially related. Now let us be precise about the other oracles considered in this paper. First there is the weak membership oracle. Given x 2 n and positive 2 this oracle will answer in one of the following ways: x 2 S (K; ) = fy 2 n : y 2 B (z; ) for some z 2 K g or x 62 S (K; ?) = fy 2 n : B (y; ) K g: Again each call to the oracle is normally assumed to take time which is polynomial in hK i and hi. We will also have need of a weak separation oracle. 
Here, given x 2 n and positive 2 this oracle will answer in one of the following ways: x 2 S (K; ) = fy 2 n : y 2 B (z; ) for some z 2 K g or c y c x + for all y 2 S (K; ?) n where kck1 = 1 and c 2 is output by the oracle. One pleasant consequence of the ellipsoid method is that a weak separation oracle can be obtained from a weak membership oracle in polynomial time (see [13]) and so it is not strictly necessary to consider anything other than weak membership oracles. The positive results of this paper will be couched in terms of weak oracles. Thus given a weak membership oracle for a bounded convex body K we will see that we can approximate its volume to within arbitrary accuracy in random polynomial time using the algorithm of Dyer, Frieze and Kannan [10]. IR
IR
IR
Q
Q
Q
Q
IR
IR
Q
IR
Q
3
Q
However some of the negative results can be couched in terms of strong oracles. Thus we must also mention the strong separation oracle. Here, given x 2 n the oracle will answer in one of the following ways: Q
x 2 K or c y < c x for all y 2 K
where kck1 = 1 and c 2 n is output by the oracle. It turns out that even with a strong separation oracle, it is not possible to deterministically approximate the volume of a convex body \very well" in polynomial time. Q
3 Hardness proofs In this section we review some results which imply that computing the volume of a convex body, or even an approximation to it, is intractable if we restrict ourselves to deterministic computations.
3.1 Oracle model We say that V^ is an -approximation to voln (K ) if 1=(1 + ) voln (K )=V^ (1 + ), and that volume is approximable if there is a deterministic polynomial time (oracle) algorithm which will produce an -approximation for any convex set K . We begin, historically, with the positive result. Assume that K is well-guaranteed (see Section 2). Grotschel, Lovasz and Schrijver [13] showed that there p is a polynomial time computable ane transformation f : x 7! Ax + b in n such that B (0; 1) f (K ) n n + 1B (0; 1). (The \rounding" operation.) Since the Jacobian of f is simply det(A), this implies that we can calculate (in deterministic polynomial time) numbers ; such that voln (K ) , with =p O(n3n=2 ). The p reader may easily check that the best we can do in these circumstances is to put V^ = , giving an ( = ? 1)-approximation. It follows that volume is O(n3n=4 )approximable. This may seem rather bad, but Elekes [11] showed that we cannot expect to do much better. His argument is based on the following IR
Theorem 1 (Elekes) Let p1; p2; : : : ; pm be points in the ball B = B(0; 1) in n, and P = conv fp1; p2; : : : ; pmg. Then voln (P )=voln (B ) m=2n. Proof1 Let B1 i be the ball centre 21 pi, radius 21 . Note voln(Bi ) = voln (B)=2n. Suppose y 62 Sni=1 Bi . Then ; m. Thus all pi lie in (y ? 2 pi )2 > 4 for i = 1; 2; : : : ; m. Since p2i 1, we have pi y < y2 for i = 1; 2; : : : S 2 . So P H , but clearly y 62 H , so y 62 P . Thus P n Bi , and therefore the half-space H : yx < y i=1 P vol (B ) = mvol (B)=2n. voln (P ) m 2 n i=1 n i IR
Keeping the above notation, it follows that, with any sub-exponential number m(n) of calls to a strong membership oracle, a deterministic algorithm A will be unable to obtain good approximations. For, suppose K = K (A) B is such that the oracle replies that the rst m(n) points queried lie in K . Then any K such that P K B is consistent with the oracle, and hence we cannot do better than (2n=2 =pm)-approximation. If m(n) is polynomially bounded, it follows, in particular, that volume is not 2n=2?! log n -approximable for any ! = !(n) ! 1. Note that it is crucial to this argument that A is deterministic, since K must be a xed body. For, suppose A is nondeterministic, and can potentially produce M (n) dierent query points, if allowed m(n) queries on a given input. Then it only follows that we cannot do better than (2n =M )-approximation. If M is a fast growing function of n, this bound may be weak. We return to this point in Section 6 below, in the context of randomized computation. Elekes' result was strengthened by Barany and Furedi [3], who showed that (even with a strong separation oracle) volume is not ncn -approximable, for any constant c < 21 . This result implies that the method of [13] described above is, in a weak sense, an \almost best possible" deterministic algorithm for this problem. However, recently, Applegate and Kannan [2] have adapted an idea of Lenstra [22] to produce an algorithm which works even better. This idea will also be exploited in the algorithm of Section 4. The idea is to start with any right simplex S in the body, and gradually \expand" it. Using the guarantee, we can initially nd such a simplex with vertices f0; rei (i 2 [n])g. (We will use ei for the ith unit vector and e for the vector of all 1's throughout.) If we 4
scale so that S is the standard simplex with vertices f0; ei (i 2 [n])g, K is contained in B (0; R=r). Thus, by simple estimations, voln (K )=voln (S ) < (2nR=r)n . Now, for each i = 1; 2; : : :; n, we check whether the region fx 2 K : jxi j 1 + 1=n2g is empty. This can be done in polynomial time [13] to the required precision. Suppose not, then for some i, we can nd a point yi in this region. Replace ei by yi as a vertex of S . Clearly the ratio voln (K )=voln (S ) decreases by a factor at least (1 + 1=n2). We now transform S back to the standard simplex. This leaves the volume ratio unaected. Clearly this must terminate before k iterations, for any (1 + 1=n2)k (2nR=r)n . Thus k = d2n3 ln(2nR=r)e iterations will suce, i.e. \polynomially" many. However, at termination K is clearly contained in a cube A(0; 1 + 1=n2), where A(a; b) is the cube centred at a with side 2b. Thus voln (K )=voln (S ) n!f2(1 + 1=n2)gn = O(n!2n ) = n(1?o(1))n: 1
We then approximate voln (K ) in the obvious way, producing an n( 2 ?o(1))n approximation. It now follows from [3] that this procedure is (in a certain sense) an \optimal" deterministic approximator. Moreover, since S contains the cube A(e=(2n); 1=(2n)) so does K . Thus, relocating the origin at e=(2n) and scaling by a factor 2n on all axes, we see that K will contain A(0; 1) and be (strictly) contained in A(0; 2(n + 1)) for any n 2. We make use of this in Section 4 below, following Applegate and Kannan [2].
3.2 Polyhedra Suppose a polyhedron P n is de ned as the solution set of a linear inequality system Ax b. The size of the input (as remarked in Section 2) is de ned by hAi + hbi. Here we might hope that the situation regarding volume computation would be better, but this does not seem to be the case (at least as far as \exact" computation is concerned). The following was rst shown by Dyer and Frieze [9]. Let us use Cn to denote the unit n-cube [0; 1]n = f0 x eg, and H n the half-space fax bg, where a; b are integral. Consider the polytope K = Cn \ H . Then it is #P-hard to determine the volume of K . The proof is based on the following identity, which is easily proved using inclusion-exclusion. Let V = f0; 1gn = vert Cn and, for v 2 V , write jvj = ev. Then IR
IR
voln (K ) =
X
v2V
(?1)jvjvoln (v );
(3)
where v = fx vg \ H . Now if v is nonempty, it is a simplex with vertices v; v + (b ? av)ei =ai (i = 1; 2; : : : n); (4) Q and hence by the determinant formula (1) for the volume of a simplex, (n! ni=1 ai )voln (v ) = max(0; b ? av)n . Thus, from (3), n X Y (5) (n! ai )voln (K ) = (?1)jvj max(0; b ? av)n : i=1
v2V
Now the right side of (5) may be regarded as a polynomial in b, for all b such that V \ H remains the same. (This true be for b in successive intervals of width at least 1.) The coecient of bn in the polynomial is P will jvj in polynomial time by v2H (?1) . Now, supposing we can compute volume, we can determine this coecient interpolation, using (n + 1) suitable values of b. Now let Nk = jfv 2 H : jvj = kgj, a0 = a + Me; b0 = b + Mk where M > ae > b > 0. Consider the inequality H 0 = fa0x b0 gP. It follows? easily that v 2 H 0 i either jvj < k, k ? 1 n i 0 n or jvj = k and v 2 H . Thus, from (5), b willPhave coecient i=1 (?1) i + (?1)k Nk . >From this we could compute all Nk (k = 1; 2; : : :; n). However, nk=1 Nk = jV \ H j is a well-known #P-hard quantity, i.e. the number of solutions to a zero-one knapsack problem. It follows that volume computation must also be #P-hard. Since a in the above must contain large integers, this still left open the question of strong #P-hardness of the problem of computing the volume of a polyhedron. This was rst shown to be strongly NP-hard by Khachiyan [20], using the intersection of \order polytopes" with suitable halfspaces. The order polytope is de ned as follows. Let be a partial order on the set [n] = f1; 2; : : :; ng, then the order polytope P () = fx 2 Cn : xi xj if i j g: A permutation of [n] is a linear extension of if (i) (i + 1) for i = 1; 2; : : : ; n ? 1. Given , let E () = f : is a linear extension of g; 5
and let e() = jE ()j. Linial [23] (and others) observed that, in fact, n!voln (P ()) = e(). To see this let
S = fx 2 Cn : x(1) x(2) : : : x(n) g: S Then one observes that the the S intersect in zero volume, and that P () = 2E() S . An application of (1) shows easily that voln (S ) = 1=n! always, so voln (P ()) = e()=n!, as required. It was conjectured that e() was #P-hard, but this issue, though of considerable interest, remained open for some years. Recently,
however, Brightwell and Winkler [6] have nally settled this conjecture in the armative. Their proof is a little too complicated to sketch here, but their result implies, in particular, that polyhedral volume computation is strongly #P-hard, even for this natural application. We will return to this application in Section 5.2 below. It is also shown in [9] that the volume of a polyhedron can be computed, to any polynomial number of bits, using a #P oracle. The construction uses a \dissection into cubes" similar to that used in Section 4 below. A pre-selected polynomial bound on the number of bits is in fact necessary, as the following considerations imply. By decomposing into simplices, we can easily show that the volume of a rational polyhedron is a rational p=q for p; q 2 . This argument also shows that p and q require only exponentially many bits, but it was asked in [9] whether polynomially many bits will suce. The answer to this is negative, and the situation is almost as bad the above indicates. This may be shown using a simple, but ingenious, construction due to Lawrence [21]. Consider the situation of (3), (4) above, with a = (2n?1 ; 2n?2 ; : : : ; 2; 1) and b > ae = 2n ? 1. Now K = Cn and V H . Observe that av is the number whose binary representation is v, so as v runs through V , (1 + av) runs through the integers from 1 to 2n . Suppose now we make the projective transformation f : x 7! x=(1 + ax) in n . Since projective transformations preserve hyperplanes, the identity corresponding to (3), i.e. Z Z
IR
voln (f (Cn )) =
X
v2V
(?1)jvjvoln (f (v ));
(6)
is still valid. Note that f (Cn ) is the polyhedron C~n = f0 x (1 ? ax)eg. But, from (4), ~ v = f (v ) has vertices v=(1 + av); (v + (b ? av)ei =ai )=(b + 1) (i = 1; 2; : : :n): (7) Letting b ! 1, (7) simpli es to v=(1 + av); ei =ai (i = 1; 2; : : : n): (8) Q n ~ Applying the determinant formula (1) to (8), we nd (n! i=1 ai )voln (v ) = 1=(1 + av). Hence, from (3), inserting the values of the ai , 2n X n ( n ? 1) = 2 = (n!2 )voln (C~n ) = 1=j; (9) j =1
where the sign is + i the binary number j contains an odd number of one-bits. It is not dicult to see that the rational number has an immense denominator. Consider the primes between 2n?1 and 2n . The Prime Number Theorem implies that, for large n, there are at least 2n?1=(n ? 1) such primes. Each of these primes occurs exactly once as a factor of any j in the expression for . It follows easily that every such prime divides the denominator of . Thus 's denominator is at least their product, i.e. more than 22n?1 . A polyhedron may be de ned dually as the convex hull of a set of m points p1 ; p2 ; : : : ; pm in n . This problem is, however, no easier. It is shown in [9] that computing volume in this situation is also #P-hard. The examples used are the \duals" of the polyhedra K described above. It remains open whether this problem is strongly #P-hard. However, it is true (and easy to prove) that, in this presentation, the volume is a rational of size polynomial in the input. (See [9] for details.) IR
4 Randomized volume approximation In spite of the negative results of Section 3, Dyer, Frieze and Kannan [10] succeeded in devising a randomized algorithm which can, with high probability, approximate the volume of a convex body as closely as desired in polynomial time. (This will be made precise later.) The algorithm itself is a fairly simple random walk. The diculties lie in the analysis. The analysis of [10] used the idea of \rapidly mixing Markov chains", and exploited a powerful isoperimetric inequality on the boundary of convex sets due to Berard, Besson and Gallot [5] in order 6
to prove a crucial property of the random walk. A dierent isoperimetric inequality was also conjectured in [10], concerning the \exposed" surface area of volumes in the interior of convex sets, which would improve the time bound of the algorithm. Aldous and Diaconis (see, for example, [1]) seem to have originated the investigation of Markov chains which \mix rapidly" to their limit distribution. A major step forward in their applicability to the analysis of randomized algorithms came when Sinclair and Jerrum [30] proved a very useful criterion for rapid mixing, based on conductance. They have applied this, for example, in [15]. Intuitively, conductance is a measure of \probability
ow" in the chain. More formally, it measures the isoperimetry of a natural weighted digraph underlying the chain. Good conductance implies rapid mixing. It was precisely to prove good conductance that the inequality of [5] was required in [10]. Recently, Lovasz and Simonovits [24] generalized the notion of conductance, and gave a sharper proof that this implies rapid mixing (although in a weaker sense than Sinclair and Jerrum [30]). They also proved the above conjecture of [10]. (See also Karzanov and Khachiyan [19].) With these improvements, they improved the analysis of the algorithm and its polynomial time bound. They also simpli ed the algorithm itself somewhat. In order to obtain rapid mixing, Dyer, Frieze and Kannan were obliged to smooth the boundary of the convex set by \in ating" it slightly. Lovasz and Simonovits dispensed with this assumption by showing that the \sharp corners" of the body cannot do too much harm, provided the walk is started uniformly on some \large enough" set. Applegate and Kannan [2] have recently obtained signi cant improvements in execution time with a dierent approach. The main new ingredients are a biassed random walk, and the use of the in nity-norm in the isoperimetry. Somewhat surprisingly, this overcomes the problem of \sharp corners" in a relatively ecient manner by allowing the walk to \step outside" the body if it enters such a region. They use this walk to sample from a non-uniform distribution over a convex body K { see Section 5, and to integrate log-concave functions over K . They estimate the volume of K by combining these two algorithms. In this paper we see how this biassed random walk works naturally with the original approach of [10]. We also manage to reduce the running time by a better method of statistical estimation, and by using uniformity to reduce the walking times. We will rst describe the algorithm, and subsequently develop the various components of its analysis. A key step in all of the algorithms that have been applied to this problem is that of computing a nearly uniform random point from a convex body. In Section 4.6 we prove a new result, which is a (sharpened) converse to this. We show that a polynomial number of calls to any good volume approximator suces to generate (with high probability) a uniform point in any convex body. We may observe that the only polynomial time (randomized) algorithms for the volume approximation problem seem to be based on the Dyer, Frieze and Kannan approach. For a slightly dierent approach in a special case, see [26]. It is of interest to display here the time bounds on the various volume algorithms so that we can see the progress that is being made on the problem. Let K be our convex body in n (n 2), given by a weak membership oracle. (See Section 2.) Given and , with probability (1 ? ) we wish to nd an -approximation to voln (K ). To avoid unnecessary complication, let us asume 1. We require the algorithm to run in time polynomial in hK i, 1= and log(1= ), i.e. it must be a fully polynomial randomized approximation scheme (FPRAS) [18]. IR
Dyer,Frieze and Kannan [10]
O(n23 (log n)5 ?2(log 1 )(log 1 )) convex programs.
Lovasz and Simonovits [24]
O(n16 ?4 (log n)8 (log n )(log n )) membership tests.
Applegate and Kannan [2]
O(n10 ?2 (log n)2 (log 1 )2 (log 1 ))(log log 1 )) membership tests. 7
This paper
O(n8 ?2 (log n )(log 1 )) membership tests.
4.1 The volume algorithm As discussed in Section 3, K can be \rounded" so that it contains the cube A(0; 1) and is contained in the cube A(0; 2(n + 1)). (The work required to carry out this rounding is dominated by the rest of the algorithm, so we will choose to ignore it.) Now let = 1=(2n), and let L = ( 12 e + n ) be an array of points, regularly spaced at distance , in n . We think of each point of L as being at the centre of a small cube of volume n (we refer to these as -cubes.) As in [10], we use the -cubes to approximate K closely enough that random sampling within cubes suces to obtain \nearly random" points within K . Our algorithm is a modi cation of that of [10], using Z Z
IR
the ideas of Applegate and Kannan [2]. Let = 21=n , k = dn lg 2(n + 1)e and di = bi =c (i = 0; : : : ; k) (so we are \rounding down to whole -cubes"). Now consider the sequence of cubes Ai = A(0; di ) (i = 0; 2 : : :; k). (Thus Ai is the `1 \ball" of radius di around 0.) It follows that A0 K Ak . So consider the convex bodies Ki = Ai \ K (i = 0; 1; 2; : : :; k). Clearly K0 = A(0; 1) and Kk = K . Also Ki Ki?1 . Thus
i = voln (Ki?1 )=voln (Ki ) ?n = 21 (i 2 [k]): Also it is easy to see that
Yk
i ); i=1 where voln (A(0; 1)) = 2n . It will therefore suce to estimate the i closely enough. voln (K ) = voln (A(0; 1))=(
(10) (11)
Suppose we can generate a point 2 Ki such that, for all S Ki with (say) voln (S ) > 31 voln (Ki ), we have Pr( 2 S ) very close to voln (S )=voln (Ki ). Then, by repeated sampling, we can estimate i closely, and hence voln (K ). For this, from purely statistical considerations, we need to assume that i is bounded away from zero. This is justi ed by (10). To estimate the volume, we perform a sequence of random walks on L, divided into phases. For i = 1; 2; : : : ; k, phase i consists of a number of random walks, which we will call trials, on L \ Ai . Trial j of phase i starts at a point Xi;j of Ai and ends at the point Xi;j+1 . If Xi;j+1 signals the end of phase i (see below), then we enter phase (i + 1) with Xi+1;1 = Xi;j (unless i = k, in which case we stop). The point X1;1 is chosen uniformly on L \ A0 . Its coordinates may be generated straightforwardly using n (independent) integers uniform on [4n]. Starting at Xi;j , trial j of phase i is a random walk which \moves" at each step from one point of L to an adjacent point (i.e. one which diers by in exactly one coordinate). The exact details are now spelled out. Associated with each y 2 L, we have an integer
(y) = minfs 2 : s 0 and y=(1 + (s + 21 )) 2 K g: Z Z
(12)
We keep track of this quantity. Since X1;1 2 K , (X1;1 ) = 0. We will show in Section 4.2 below that, if y1; y2 are adjacent in L (i.e. y2 ? y1 = er for some r 2 [n]) then j(y2 ) ? (y1 )j 1, so at most two membership tests suce to determine (y2 ) given (y1 ). The j th trial of phase i then proceeds as follows. Suppose at step t, the walk is at point Xt?1 2 L. We set X0 = Xi;j and the following operations comprise step t. With probability 21 \do nothing", i.e. put Xt = Xt?1 , t (t + 1) and end step t. (This is a technical requirement, see Section 4.4.) Otherwise, select a coordinate direction 2 fer g, all equally likely with probability 1=(2n). Let Xt0 = Xt?1 + . Test if Xt0 2 Ai . If not, do nothing. Otherwise determine (Xt0 ). If (Xt0 ) > (Xt?1 ), with probability 21 do nothing. Otherwise put Xt = Xt0 and end step t, setting (Xt ) = (Xt0 ). (Note that we require only weak membership tests here, with tolerance some small fraction of . There is sucient \slack" in our estimates below to allow for this source of small errors, but we omit further discussion of this issue. See [10] for the details.) We observe that what we have here is an example of the Metropolis algorithm { see the paper by Diaconis in this volume. 8
We continue the walk until t = , where = i = d29 n4 d2i ln(227 n3 ?4 )e = O(n4 log(n=)d2i ); then end trial j of phase i. We now continue with trial (j + 1) (or commence phase (i + 1) ) but, before doing so, we accumulate data for the volume estimate, as follows. We show later (in Sections 4.4 and 4.5) that Pr(X = x) c0 2?(x) (x 2 L \ Ai ); where c0 normalises the probabilities over L \ Ai . This distributional information about X is used to nd a point i;j , approximately uniform on Ki , in the following way. Let C be the -cube with centre X , and let s = (X ). If s > 0, do nothing. We declare trial j to be an improper trial and continue with trial (j + 1). We show in Section 4.2 that s > 0 implies C \ Ki = ;. Otherwise, if s = 0, C may meet Ki and we choose = i;j uniformly from C . If 2= Ki , we again declare trial j improper. Otherwise we have a proper trial, and we claim that is approximately uniformly distributed on Ki . We will justify this claim in Section 4.5 below. Now, if also 2 Ki?1 we declare the (proper) trial j to be a success. We continue phase i until a total of mi = d29 n2 =(2 di )e = O(n2 =(2 di )) proper trials have been observed, and we accumulate the number i of successes observed in these trials. Then we commence phase (i + 1), unless i = k, in which case we terminate and use the accumulated data to calculate our estimate of voln (K ). Let = 2?18 4 n?3 . If ^i;j = Pr(i;j 2 Ki?1 j i;j 2 Ki ), we will show in Section 4.5 that for each (proper) trial in phase i, p ji ? ^i;j j = 2?92 n?3=2 ; (13) conditional on the previous trial ending well in a sense made precise in Section 4.5. We show that no trial ends badly with probability at least 109 . We will also show in Section 4.5 that each trial is proper with probability at least 15 provided no trial ends badly. Thus, under these conditions, the expected number of trials in each phase is less than 5mi (and it is easy to show that the actual number will be less than, say, 10mi with very high probability. If after 10mi trials we have too few proper trials then we start again from the beginning.) Let
^i = m1 If
P=
Yk i=1
mi X
i j =1
^i;j :
i and P^ =
then, since i 12 , it is straightforward to show that
Yk i=1
^i ;
^ P ? 1 2?8: P
Now let us form the estimates and
(14)
Zi = mi for i = 1; 2; : : :; k i
Z=
Yk i=1
Zi :
We will use the Chebyche inequality to show that, if all trials end well, 3 1 4: Pr Z^ ? 1 > 10 P 9
(15)
Combining this with (14), and using the fact that the probability that there is a trial which ends badly is at most 1 10 , we obtain Z 1 Pr P ? 1 > 12 3 :
So if we take the median, W , of
= d12 lg(2= )e = O(log(1= )); repetitions of the algorithm, then by standard methods (see [16]), we may estimate
W Pr P ? 1 > ;
as required for use in (11). Combining our running time estimates, the expected time to compute W is
O(
k X i=1
as claimed. Here we have used
mi i ) = O(n6 ?2 log(n=)) log(1= )
k X i=1
di )
= O(n8 ?2 log(n=) log(1= )); k X
k X
di
i < 9n2 = O(n2 );
i=1 i=1 k since 4n and (as is easily shown) ? 1 > 1=(2n).
To prove (15) we observe that
E(Z ) = P^ 0 1 k Yk @ 1 X ^ i A ? Y ^ 2 : 0 Var(Z ) = + ^ ^ ij ij i m2 m i j 6=j 0
i=1
i
i=1
(16)
The pairwise independence needed to justify (16) will be established in Section 4.5. Then
Var(Z )
Yk mi2 ? mi
i=1
= P^ 2
P^ 2
P^ 2
mi 2
Yk i=1 Yk i=1
k p ^i ? Y ^2i ^2i (1 + )2 + m
1
1 ? m (1 + i
p
i
p
)2 +
1 + 2 + + m2
i=1
!
1 ?1 mi ^i
!
i
?1
( p
k 2) ! X
(
i=1
exp (2 + )k +
mi ? 1
) !
k 2 2 X exp 2 7 + 29n2 di ? 1 i=1 2 ? 7 ? ^ P (expf(2 + 9 2 9)2 g ? 1) 0:022P^ 2
P^ 2
and (15) follows from the Chebyche inequality and E(Z ) = P^ . To justify the algorithm, we must prove the various assertions made above. We do this in the following sections. We rst establish some essential theoretical results. 10
4.2 Convex sets and norms In this section we prove some preliminary technical results which will be used later. We assume we have any xed (symmetric) norm kxk for x 2 n . See [29] for general properties. In particular, we denote the `p norm by kxkp for 1 p 1. We will denote the \ball" fx : kx ? yk g by A(y; ). Since any two norms are equivalent, we note that for any other norm k k0 , there is a constant M 0 > 1 such that 1=M 0 < kxk=kxk0 < M 0 . For any S n , diam (S ) will denote the diameter of S in the norm kk and, for S1 ; S2 , dist (S1 ; S2 ) the (in mal) distance between the sets S1 ; S2 . It is well known that corresponding to k k, there is a dual norm k k , such that k k = k k, de ned by IR
IR
kxk = max ax=kak = maxfax : kak = 1g: (17) Now, for any a 2 n , consider the set of hyperlanes H (s) = fax = skakg orthogonal to a, and half-spaces H + (s) = fax skakg, H ? (s) = fax skakg they de ne. If K is any convex body, let K (s) = K \ H (s), K + (s) = K \ H + (s), K ? (s) = K \ H ? (s). (We call K (s) a \cross section" of K in \direction" a.) Let s1 = inf s fK (s) = 6 ;g, s2 = sups fK (s) =6 ;g. Then w = s2 ? s1 is the width of K in direction a, and we will write IR
w = W (K; a). Note that
Lemma 1 diam K = maxa W (K; a). Proof diam K = maxfkx ? yk : x; y 2 K g = maxfkz k : z 2 K ? K g = max max az=kak = max max az=kak z a a z = max W (K; a): a
2
We will also need the following technical result.
Lemma 2 Let a1; a2; : : : ; an?1 be mutually orthogonal. Then for some constant c > 0, depending only on n and k:k, diam K (s) < c maxi W (K; ai ) for all s. Proof If a is in the subspace generated by the ai, W (K (s); a) W (K; a) = (kak2=kak)W2 (K; a) < M W2 (K; a); where 2 denotes width in the Euclidean norm and M is the constant relating k k ; k k2 . But W2 (K; a) pn ? 1Wmax i W2p(K; ai ), since K can clearly be contained in an (in nite) cubical cylinder of side maxi W2 (K; ai ). 2 Taking c = M n ? 1 and using Lemma 1 now gives the conclusion.
If K is any convex body in n , then we can de ne a convex function IR
r(x) = inff 2 : > 0 and x= 2 K g; the gauge function associated with K . This has all the properties of a norm except symmetry. (See [29].) We IR
have
Lemma 3 If K contains the unit ball A(0; 1) then, for any x; y 2 n , jr(x) ? r(y)j kx ? yk: Proof Suppose, without loss, r(x) r(y). Then y 2 r(y)K and x ? y 2 kx ? ykA(0; 1) kx ? ykK: Thus x 2 (r(y) + kx ? yk)K , i.e. r(x) r(y) + kx ? yk. IR
11
2
Corollary 1 If A(0; 1) K , then r(y) > 1 + implies A(y; ) \ K = ;. Proof If x 2 A(y; ) \ K , then kx ? yk and r(x) 1. Hence r(y) ? r(x) > giving kx ? yk > , a
contradiction. 2 We use these results above with the `1 norm. If x 2 L, then the -cube C (x) = A(x; 21 ) in this norm. Also it is not dicult to see that (x), as de ned by (12), satis es
(x) = d(r(x) ? 1)= ? 12 e: From this we see 1+ ((x)+ 12 ) r(x) < 1+ ((x)+ 32 ). Any two adjacent points x; y, of L satisfy kx ? yk1 = . From Lemma 3 it now follows that jr(x) ? r(y)j , since A(0; 1) K . Thus we have r(x) ? r(y) > ((x) ? (y) ? 1); giving (x) (y) + 1. By symmetry we therefore have j(x) ? (y)j 1, as claimed in Section 4.1. Also, if (y) 1 for y 2 L, we have r(y) 1 + 23 . Thus from Corollary 1 we have C (y) \ K = ;, as claimed in
Section 4.1. We will extend the domain of the function (y) from L to n by letting (x) be the (obvious) upper semicontinuous function which satis es (x) = (y) for x 2 int C (y), y 2 L. Thus, in particular, (x) = maxf(y1 ); (y2 )g if y1 ; y2 are adjacent in L and x lies on the (n ? 1)-dimensional face int fC (y1 ) \ C (y2 )g. We bound this (extended) function (x) below by the convex function ^(x) = (r(x) ? 1)= ? 1. If x 2 C (y), we have (x) ? ^(x) (r(y) ? 1)= ? 21 ? (r(x) ? 1)= + 1 = (r(y) ? r(x))= + 21 0; (x) ? ^(x) (r(y) ? 1)= + 21 ? (r(x) ? 1)= + 1 = (r(y) ? r(x))= + 23 2; IR
so ^(x) (x) ^(x) + 2.
4.3 The isoperimetric inequality Here we derive an isoperimetric inequality about convex sets and functions which is the key to proving rapid convergence of the random walks. Our treatment follows that of Applegate and Kannan [2], and Lovasz and Simonovits [24], but we give an improvement and generalization of their theorems. We retain the notation of Section 4.2. Let A(s) = voln?1 (K (s)) and V (s) = voln (K + (s)), and temporarily assume, without loss, that s1 = 0 and s2 = w. Note then V (w) = voln (K ). It is a consequence of the Brunn-Minkowski theorem [7], that A(s)1=(n?1) is a concave function of s in [0; w]. Then we have
Lemma 4 (s=w)n V (s)=V (w) ns=w. Proof Since the inequality is independent of the norm used, we will assume the Euclidean norm for convenience. First we show that if 0 < s < u, A(s)=A(u) (s=u)n?1 . This follows since if s = 0 + (1 ? )u, then BrunnMinkowski implies
A(s)1=(n?1) A(0)1=(n?1) + (1 ? )A(u)1=(n?1) (1 ? )A(u)1=(n?1) = (s=u)A(u)1=(n?1) : Thus
V (s) V (w) ? V (s)
Zs Z0 s
(u=s)n?1 A(s) du = (s=n)A(s);
(18)
(u=s)n?1 A(s) du = (wn ? sn )=(nsn?1 )A(s):
(19)
w
12
Dividing (19) by (18) gives V (w)=V (s) (w=s)n , which is the left hand inequality. By symmetry, this inequality in turn implies (V (w) ? V (s))=V (w) ((w ? s)=w)n = (1 ? s=w)n 1 ? ns=w; since (1 ? x)n 1 ? nx for x 2 [0; 1]. This gives the right hand inequality. 2 We say that a real-valued function F (x) de ned on the convex set K n is log-concave if ln F (x) is concave on K . This clearly entails R F (x) > 0 on K . With such an F , we will associate a measure on the measurable subsets S of K by (S ) = S F (x) dx. We will need the following simple lemma asserting the existence of a hyperplane simultaneously \bisecting the measure" of two arbitrary sets. IR
Lemma 5 Let S1; S2 n , measurable, and a two-dimensional linear subspace of n. Then there exists a hyperplane H , with normal a 2 , such that the half-spaces H + , H ? determined by H satisfy (Si \ H + ) = (Si \ H ? ) for i = 1; 2. Proof Let 1; 2 be a basis for . For each 2 [?1; +1], let bi() be such that the hyperplane (1 + (1 ? jj)2 )x = bi () bisects the measure of Si for i = 1; 2. (If Si is disconnected in such a way that the possible bi form an interval, bi () will be its midpoint.) It clearly suces to show that b1 (0 ) = b2 (0 ) for some 0 . If b1 (?1) = b2 (?1) we are done, so suppose without loss b1 (?1) > b2 (?1). We clearly have bi (1) = ?bi (?1) for i = 1; 2, so b1(1) < b2 (1). But since is a continuous measure, it follows easily that bi () is a continuous function of . The existence of 0 2 (?1; 1) now follows. 2 IR
IR
Remark 1 This is a rather simple case of the so-called \Ham Sandwich Theorem". (See Stone and Tukey [31].) The proof here is a straightforward generalization of one in [8, p. 318].
We now give the rst version of the isoperimetric inequality. Without the constant 21 , the following was proved, for the case F (x) = 1 with Euclidean norm, by Lovasz and Simonovits [24], and, for the case of general F and the `1 norm, by Applegate and Kannan [2]. We give a further generalization and improvement of their theorems.
Theorem 2 Let K n be a convex body and F a log-concave function de ned on int K . Let S1; S2 K , and t dist (S1 ; S2 ) and d diam (K ). If B = K n (S1 [ S2 ), then minf(S1 ); (S2 )g 21 (d=t)(B ): Proof By considering, if necessary, an increasing sequence of convex bodies tending to K , it is clear that we may assume without loss F (x) > 0 on K . Thus, for some M1 > 1 we have 1=M1 < F (x) < M1 for all x 2 K . Also since F is positive log-concave, ln F (y) ln F (x)+ (x)(y ? x), where (x) is any subgradient at x. It follows that there exists a number M2 1 such that ln(F (y)=F (x)) < M2 ky ? xk for all x; y 2 K . Let M = maxfM1; M2 g. Now note that, if (B ) 21 (K ) the theorem holds trivially, since d t. We therefore assume otherwise. IR
We consider rst the case where K is \needle-like", i.e. there exists a direction a such that all cross sections of K are \small". Speci cally, for given 0 < < t, we require diam K (s) < for all s. If L is the line segment joining any point of K (s1 ) to any point of K (s2 ), let f (s) = F (y) for y 2 K (s) \ L. Now f (s) is log-concave in s, and we clearly have j ln(F (x)=f (s))j < M for any x 2 K (s). S Now for i = 1; 2 replace Si by S^i = s fK (s) : Si \ K (s) 6= ;g, and B by B^ = K n (S^1 [ S^2 ). Since < t, this operation is well de ned and dist (S^1 ; S^2 ) t^ = t ? . Clearly (S^i ) (Si ) (i = 1; 2), and (B^ ) (B ). Let us now drop the \hats", bearing in mind that t must eventually be replaced by t ? . The components of S1 ; S2 ; B now correspond to intervals of s. We may assume without loss that the components of S1 and S2 alternate in the increasing s direction, since otherwise we could increase (S1 ) and/or (S2 ) and decrease (B ) without decreasing dist (S1 ; S2 ). We show rst that it is sucient to consider the case when each of S1 , S2 contains a single component. By symmetry, let us assume that S1 = K + (u1 ) and S2 = K ? (u2 ) where (u2 ? u1 ) t. Call this the \connected case", and suppose we are not in this case. Consider any component S10 of S1 , covering the interval [s0 ; s00 ]. This meets two (possibly empty) components of B which meet no other S1 component. Let S20 = K +(s0 ? t), S200 = K ?(s00 + t). Note that S2 S20 [ S200 . Suppose (S10 ) (S20 ). Assuming the theorem holds for the connected case, let us apply it to K 0 = K + (s00 ) with S10 , S20 and B 0 = K 0 n (S10 [ S20 ). This implies (S10 ) 21 (d=t)(B 0 ), 13
where B 0 is a component of B which meets no other component of S1 . Similarly if (S10 ) (S200 ). If one or other of these holds for every component of S1 , adding all the resulting inequalities implies (S1 ) 21 (d=t)(B ). Thus suppose there is a component with both (S20 ) (S10 ) and (S200 ) (S10 ). Then we can show, similarly to the above, that (S20 ) 21 (d=t)(B 0 ) and (S200 ) 21 (d=t)(B 00 ), where B 0 ; B 00 are dierent components of B . Adding these now implies (S2 ) 21 (d=t)(B ). Thus it suces to consider the connected case. If A (s) = (kak2 =kak)A(s), is the \scaled area" of K (s), we have Z u2 f (s)A (s) ds = (u2 ? u1 )eM f ( )A ( ) teM f ( )A ( ); (20) e2M (B ) eM u1
for some 2 [u1 ; u2 ], by the rst mean value theorem for integrals. We will assume without loss that = 0, s1 = ?, s2 = , so w = W (K; a) = ( + ). By scaling orthogonal to a, we will also assume without loss that eM f ( )A ( ) = 1. Now ln f (s) is concave by assumption, and ln A (s) is log-concave since A (s)1=(n?1) is concave. Thus G(s) = M + ln f (s) + ln A (s) is concave with G(0) = 0. Let be any subgradient of G at s = 0. If = 0, then G(s) 0 for all s. But then it follows that (S1 ) and (S2 ) . Letting ~ = minf(S1 ); (S2 )g, we therefore have (21) ~ 21 ( + ) 21 e2M (w=t)(B ) using (20). If 6= 0, assume > 0, since otherwise we can re-label S1 ; S2 and use the direction ?a. By scaling in the a direction, we may assume = 1. Then G(s) s for all s, hence eM f (s)A (s) es for all s, giving
(S1 ) (S2 )
Z0
?
Z 0
es ds = (1 ? e? ); es ds = (e ? 1):
so ~ minf(1 ? e?); (e ? 1)g. This implies ? ln(1 ? ~) and ln(1 + ~). Thus 1 1 1 2 w = 2 ( + ) 2 (ln(1 + ~) ? ln(1 ? ~)) > ~; where the nal inequality may be obtained by series expansion of both terms in the penultimate expression. Thus (21) holds again, with strict inequality. Recalling that we must replace t by (t ? ), and that by Lemma 1 w d we have proved that in the needle-like case, minf(S1 ); (S2 )g 21 eM (B )d=(t ? ): (22) We move to the general case. Suppose there is a convex body K with sets S1 ; S2 such that the theorem fails. Then, for some > 0, (22) fails. Suppose that there exist mutually orthogonal directions a1 ; : : : ; aj such that max1ij W (K; ai ) < =c where c is the constant of Lemma 2. If j n ? 1, by Lemma 2 the needle-like case applies and we have a contradiction. Thus suppose j n ? 2 is maximal such that a counter-example can be found. Let be a two-dimensional linear subspace orthogonal to the aj . By Lemma 4 there is a hyperplane H with normal a 2 , kak = 1, which bisects the measure of both S1 ; S2 . We choose H + to be the half-space such that (B \ H +) is smaller. Let us write K 0 for K \ H + etc. If the theorem fails for K , S1 , S2 , then it follows that it must also fail for K 0 , S10 , S20 . (The diameter can only decrease, and the distance increase, so the same d, t, will apply.) Note that, since (B ) < 21 (K ), H cuts K into two parts K 0 ; K 00 with (K 0 ) (K 00 ) 3(K 0). Since 1=M < F (x) < M on K , for any measurable S we have voln (S )=M < (S ) < M voln (S ). Hence voln (K 0 )=M 2 voln (K 00 ) 3M 2 voln (K 0 ), and it follows that voln (K 0 )g voln (K )=(1+3M 2). Thus, by Lemma 4, W (K 0 ; a) W (K; a) for some constant < 1 depending only on M; n. Suppose we iterate this bisection, obtaining a sequence of bodies K = K (1) K (2) K (m) ; where K (m) = H (m) \ K (m?1), containing sets for which the theorem fails. Now K (m) clearly converges to a compact convex set K . If a(m) is the normal to H (m) , by compactness a(m) has a cluster point a 2 . By continuity, taking the limit in 0 W (K (m+1) ; a(m) ) W (K (m) ; a(m) ) gives 0 W (K ; a ) W (K ; a ). Thus W (K ; a ) = 0, and hence for some m, W (K (m); a(m) ) < =c. However, taking aj+1 = a(m) , the fact that K (m) is a counter-example to the theorem now gives a contradiction. 2 14
Remark 2 The method of proof by repeated bisection is due in this context to Lovasz and Simonovits [24], but is similar to that employed by Payne and Weinberger [28] to bound the second largest eigenvalue of the \free membrane" problem for a convex domain in n . Eigenvalues are, in fact, closely related to conductance. The approach of Sinclair and Jerrum [30] was based on bounding the second eigenvalue of the transition matrix. IR
We use this to prove the following isoperimetric inequality.
Theorem 3 Let K n be a convex body, and F log-concave on int K . Let S K , with (S ) 21 (K ), be such that R @S n @K is a piecewise smooth surface , with u(x) the Euclidean unit normal to at x 2 . If 0 (S ) = F (x)ku(x)k dx, then (S )=0 (S ) 21 diam (K ). Proof By considering the limit of an appropriate sequence of simplicial approximations, it clearly suces to prove the theorem for a \simplicial surface", i.e. one whose \pieces" are (n ? 1)-dimensional simplices. For small t > 0, let B be the closed 21 t-neighbourhood of such a surface . Consider a simplicial piece 0 , with R IR
normal u and surface integral = 0 F (x) dx. The measure of B around 0 is then approximately h, where h = maxfuz : kz k = tg = kukt:
Thus the measure of this portion of B is tkuk + o(t) and hence, since u is constant on each such 0 , (B ) = t0 (S ) + o(t). Now, from Theorem 2 with S1 = S , and S2 = K n (B [ S ), we have (S ) 21 (diam (K )=t)(B ), and the theorem follows by letting t ! 0. 2
Remark 3 The inequality in Theorem 3 is \tight". To see this, let K be any circular cylinder with radius very small relative to it length, F (x) = 1, and S be the region on one side of the mid-section of K .
Corollary 2 Let F (x) be an arbitrary positive function de ned on int K , and F^ (x) be any log-concave function such that F^ (x) F (x) for all x 2 K . If = maxx F^ (x)=F (x) then, in the notation of Theorem 3, (S )=0 (S ) 1 2 diam (K ).
Proof Use the result of Theorem 3 for F^ and the inequalities F (x) F^(x) F (x).
2
Remark 4 Applegate and Kannan [2] have proved a further weakening of Theorem 3, in terms the maximum
ratio of the function to a bounding concave function on each line in K . (The bounding function may vary from line to line.) In [2] this is proved by the bisection argument assuming that F satis es a Lipschitz condition. However, the condition appears unnecessarily strong to prove an analogue of Theorem 3. Continuity of F is certainly sucient, and even this can be dispensed with by employing an approximating sequence of continuous functions and dominated convergence of the integrals.
4.4 Rapidly mixing Markov chains In this section we prove some basic results about the convergence of Markov chains. Our treatment is based on Lovasz and Simonovits' [24] improvement of a theorem of Sinclair and Jerrum [30]. Let CN denote the unit cube, with vertex set V , as in Section 3. We regard v 2 V as a (column) N -vector. Then v = fi : vi = 1g gives the usual bijection between V and all subsets of [N ]. By abuse of language, we will refer to Sv simply as v, the meaning always being obvious from the context. Thus for example, jvj is the cardinality, and v = (e ? v) the complement, of v in its \set context". Suppose P is the transition matrix of a nite Markov chain Xt on state space [N ], whose distribution at time t = 0; 1; 2; : : : is described by the (row) N -vector p(t) . Thus
Pe = e; p(t) e = 1; p(t) = p(t?1) P:
(23)
(We use only basic facts concerning Markov chains but, if necessary, see [12] for an introduction.) In our application, observe that the points of L \ Ki correspond to the states. Thus any subset of cubes in the random walk is actually being identi ed with a vertex of CN here. 15
We suppose that we are interested in the \steady state" distribution q = limt!1 p(t) of Xt , given that this exists. We will write the corresponding random variable as X1 . It is easy to see that q must be a solution of qP = q; qe = 1: (24) Our objective is to sample approximately from the distribution q. We do this by choosing X0 from some initial distribution p(0) , and determining Xt iteratively in accordance with the transition matrix P (using a source of random bits). We do this for some predetermined nite time until X closely enough approximates X1 . By this we mean that we require the variation distance be small, i.e. for some 0 < 1,
jp( ) ? qj = 21
N X 1=1
jp(i ) ? qi j < :
(25)
We call the mixing time of Xt for . We will assume that P is such that pii 21 (i 2 [N ]). For our purposes, this assumption is unrestrictive, since it is easy to verify that the chain Xt0 with transition matrix P 0 = 21 (I + P ) also has limiting distribution q. (I is the N N identity matrix.) Also Xt0 has mixing time only (roughly) twice that of Xt , since it amounts to choosing at each step, with probability 21 , either to do nothing or else to carry out a step of Xt . Let G be the \underlying digraph" of Xt with vertex set N and edge set E = f(i; j ) : pij > 0g. As Xt \moves" probabilistically around G we imagine its probability distribution p(t) as a dynamic ow through G in accordance with (23). Thus, in the time interval (t ? 1; t), probability fij(t) = p(it?1) pij ows from state i to state P j . At (epoch) t, the probability p(jt) at j is, by (23), the total ow Ni=1 fij(t) into it during (t ? 1; t). Thus PN f (t) = PN f (t+1) expresses dynamic conservation of ow. Let f = lim f (t) = q p . Then clearly i ij ij t!1 ij i=1 jiP i=1 ij P we have Ni=1 fij = Ni=1 fji , i.e. static conservation of ow. This is the content of the rst equation of (24). In order that probability can ow through the whole of G, we must assume that it is connected (i.e. that Xt is irreducible). In applications, the validity of this hypothesis must be examined for the Xt concerned. Under these assumptions, however, we are guaranteed that q exists and is the unique solution of (24). The chain is then said to be ergodic. (See [12].) From (25) it follows easily that (p(t) ? q)(2v ? e) = max (p(t) ? q)v: (26) jp(t) ? qj = 21 max v2V v2V Note that (p(t) ? q)v = Pr(Xt 2 v) ? Pr(X1 2 v). We will examine the behaviour of maxv2V (p(t) ? q)v as a function of the limiting probability qv of the sets. The aim will be to show that this function is (approximately) pointwise decreasing with t, at a rate in uenced by the asymptotic speed of probability ow into, and out of, each set v. To make this idea precise, we digress for a moment. Sinclair and Jerrum [30] de ned the ergodic ow f (v) from v to be the asymptotic total ow out of v. (Equivalently, this is the limiting value of the probability Pr(Xt?1 2 v and Xt 2= v).) Thus, from the de nition, XX qi pij f (v) = = = = = =
i2v j 2= v X X
i2v j 2= v N XX i2v j =1 N XX i2v j =1 N X X j =1 i2v
XX j 2= v i2v
= f (v);
16
fij fij ? fji ? fji ? fji
XX i2v j 2v
XX i2v j 2v
XX j 2v i2v
fij fij fji
using conservation of ow. Thus the ergodic ow from v is the same as that from its complement v. (This is, of course, a property of any closed system having conservation of ow.) Sinclair and Jerrum [30] now de ned the conductance of Xt as = minv2V ff (v)=qv : qv 21 g. This quantity is clearly the limit of minv2V Pr(Xt 2= v j Xt?1 2 v) for sets of \small" limiting probability. (We call these \small sets".) Intuitively then, if the conductance is (relatively) large the ows will be high, and Xt cannot remain \trapped" in any small set v for too long. Lovasz and Simonovits [24] generalized this de nition to -conductance, which ignores \very small" sets. They de ned = min ff (v)=(qv ? ) : < qv 21 g: (27) v2V
Remark 5 In [30], conductance is only de ned for Xt \time reversible". Our de nition of -conductance does not agree precisely with that in [24], but is clearly equivalent since f (v) = f (v).
The intuition now is that, if the distribution of X0 is already close to that of X1 on all very small sets, we know that this will remain true for all Xt . (This will be shown below). Thus Xt cannot be trapped in any very small set, and we need only worry about the larger ones. We will use only the notion of conductance (i.e. 0-conductance) here, but we prove the results in this section in the more general setting of -conductance. To avoid a complication in the proof, we will modify the de nition (27) slightly. Let qmax = maxi qi , and de ne = min ff (v)=(qv ? ) : < qv 21 (1 + qmax)g: v2V
(28)
zt (x) = max fp(t)v ? x : qv = xg: v2V
(29)
The given by (28) is easily seen to be at least (1 ? 2 ? qmax)=(1 ? 2 + qmax) times that given by (27). Thus, provided, is bounded away from 21 and qmax = o(1), the value from (28) is asymptotic to that from (27). (In our application here, these assumptions are overwhelmingly true.) Now let us return to our main argument. For 0 x 1, we wish to examine the function Thus zt is the value function of an equality knapsack problem. This is dicult to analyse, since it is only de ned for a nite number of x's, and has few useful properties. Thus we choose to majorize zt by the \linear programming relaxation" of (29). Therefore de ne
ht (x) = wmax fp(t) w ? x : qw = xg: 2C N
We observe that, trivially,
(30)
ht (x) 1 ? x for all x 2 [0; 1]: (31) Clearly zt (x) ht (x) at all x for which zt is de ned. Also, it is not dicult to see that max0x1 ht (x) = max0x1 zt (x) = jp(t) ? qj, so the relaxation does not do too much harm. Its bene t is that ht (x) is the value function of a (maximizing) linear program, and hence is (as is easy to prove) a concave function of x on [0; 1]. We have ht (0) = ht (1) = 0. Now, for given x and t, let w^ be the maximizer in (30). By elementary linear programming theory, w^ is at a vertex of the polyhedron CN \ fqw = xg. Therefore it lies at the intersection of an edge of CN with the hyperplane qw = x. Thus there exists 2 [0; 1) and vertices v(1) ; v(2) 2 V , with v(2) = v(1) + ek for some k 2 [N ], such that w^ = (1 ? )v(1) + v(2) . So w^ has only one fractional coordinate w^k . Moreover, we must have ht (qv(i) ) = p(t) v(i) ? qv(i) , (i = 1; 2). Otherwise, suppose w(i) 2 CN is such that qw(i) = qv(i) , p(t) v(i) < p(t) w(i) . Then we can replace v(i) in the expression for w^ by w(i) to obtain a feasible solution to the linear program in (30) with objective function better that p(t) w^ ? x, a contradiction. Thus ht (x) = (1 ? )ht (qv(1) ) + ht (qv(2) ). So ht is piecewise linear with successive \breakpoints" x = qv(1) ; qv(2) , such that v(1) v(2) are sets diering in exactly one element. It follows that there are N ? 1 such breakpoints in the interior of [0; 1], with successive x values separated by a (unique) qi . Note that ht (x) = p(t?1) (P w^) ? x, P w^ 2 CN and q(P w^) = qw^ = x, using (24). Thus P w^ is feasible in the linear program (30) for ht?1 (x), giving immediately ht (x) ht?1 (x). Thus ht certainly decreases with t, but we wish to quantify the rate at which this occurs. We do this by expressing the ow into w^ during (t ? 1; t), p(t) w^, as 17
a convex combination of the flows out of "sets" (points in $C_N$) $w', w''$, with $qw' = x' < x < x'' = qw''$. This enables us to bound $h_t(x)$ as a convex combination of $h_{t-1}(x')$ and $h_{t-1}(x'')$. This is made precise in Lemma 6 below. Then, provided $x', x''$ are "far enough away" from $x$, $h_t(x)$ decays exponentially (in a certain sense) with $t$. This will be the content of Theorem 4.
Lemma 6 (Lovász-Simonovits) Let $y(x) = \min(x, 1 - x)$. Then, for $x \in [\mu, 1 - \mu]$,
$$h_t(x) \le \tfrac12 h_{t-1}(x - 2\Phi_\mu(y(x) - \mu)) + \tfrac12 h_{t-1}(x + 2\Phi_\mu(y(x) - \mu)).$$
Proof The function on the right side in the lemma is evidently concave in both intervals $[\mu, \frac12]$ and $[\frac12, 1 - \mu]$. Thus, since $h_t$ is also concave, it suffices to prove the inequality at the breakpoints of $h_t$ and at the point $x = \frac12$. Thus, consider a breakpoint $\mu < x = qv \le \frac12$, with $h_t(x) = p^{(t)} \cdot v - x$. (Breakpoints in $[\frac12, 1 - \mu]$ are dealt with by a similar argument.) Intuitively, we wish to express the flow $p^{(t)} \cdot v$ into $v$ as a convex combination of flows from "small subsets" and "large supersets" of $v$. Note that we have $0 \le 2Pv - v \le e$, since $0 \le v \le e$ and $(2P - I)$ is a non-negative matrix, all $p_{ii}$ being at least $\frac12$. Hence define
$$v'_i = 2(Pv)_i - v_i, \quad v''_i = v_i \qquad \text{if } v_i = 1; \qquad v'_i = v_i, \quad v''_i = 2(Pv)_i - v_i \qquad \text{if } v_i = 0. \qquad (32)$$
Thus $v', v'' \in C_N$ and $Pv = \frac12(v' + v'')$. Clearly, $v', v''$ are convex combinations of sets respectively contained in, or containing, $v$. Thus, since from (24)
$$p^{(t)} \cdot v = p^{(t-1)} \cdot (Pv) = \tfrac12 p^{(t-1)} \cdot v' + \tfrac12 p^{(t-1)} \cdot v'',$$
we have achieved our objective of expressing the flow into $v$ as a convex combination of flows from subsets and supersets $\tilde v$ of $v$. It remains to prove that the $\tilde v$ in this representation are large enough, or small enough, in comparison with $v$. From (32), since $(Pv)_i = \sum_{j \in v} p_{ij}$, we have
$$q(v'' - v) = 2\sum_{i \notin v}\sum_{j \in v} q_i p_{ij} = 2f(\bar v) = 2f(v). \qquad (33)$$
Also, using (24) and $Pv = \frac12(v' + v'')$,
$$q(v - v') = q(Pv - v') = q(v'' - Pv) = q(v'' - v) = 2f(v). \qquad (34)$$
Let $x' = qv'$ and $x'' = qv''$. Then (34) gives $(x - x') = (x'' - x) = 2f(v)$. Thus, from (27) and (34), since $x \le \frac12$,
$$(x - x') = (x'' - x) \ge 2\Phi_\mu(x - \mu). \qquad (35)$$
Also, since $v$ is a maximizer for $h_t(x)$ and $Pv = \frac12(v' + v'')$,
$$h_t(x) = (p^{(t-1)} - q) \cdot Pv = \tfrac12 (p^{(t-1)} - q)(v' + v'') \le \tfrac12 h_{t-1}(qv') + \tfrac12 h_{t-1}(qv'') = \tfrac12 h_{t-1}(x') + \tfrac12 h_{t-1}(x'').$$
Let $x_1 = x - 2\Phi_\mu(x - \mu)$ and $x_2 = x + 2\Phi_\mu(x - \mu)$. Then we have $x = \frac12(x' + x'') = \frac12(x_1 + x_2)$, and (35) implies $x' \le x_1 \le x_2 \le x''$. For these four $x$'s, denote $h_{t-1}(x')$ by $h'$ etc. Since $h_{t-1}$ is concave, the whole of the line segment $[(x_1, h_1), (x_2, h_2)]$ lies above $[(x', h'), (x'', h'')]$. Hence, in particular,
$$h_t(x) \le \tfrac12 h_{t-1}(x') + \tfrac12 h_{t-1}(x'') \le \tfrac12 h_{t-1}(x_1) + \tfrac12 h_{t-1}(x_2). \qquad (36)$$
We have still to consider the point $x = \frac12$. Observe that there must be a breakpoint of $h_t$ within $\frac12 q_{\max}$ of $\frac12$. Let this be $x_+$, and suppose that $x_+ \in [\frac12, \frac12(1 + q_{\max})]$, the other case being symmetric. Let the previous breakpoint be $x_- < \frac12$. By our definition (28), the inequality in (35) will still apply at $x_+$. Thus we can prove (36) for $x_+$. The linearity of $h_t$ in $[x_-, x_+]$ and the concavity of $h_{t-1}$ now imply that (36) holds throughout $[x_-, x_+]$, and hence at $x = \frac12$. $\Box$
Clearly Lemma 6 is equivalent to $h_t(x) \le H_t(x)$ ($\mu \le x \le 1 - \mu$), where $H_0(x)$ is any function such that $h_0(x) \le H_0(x)$ for all $x \in [\mu, 1 - \mu]$ and
$$H_t(x) = \tfrac12 H_{t-1}(x - 2\Phi_\mu(y(x) - \mu)) + \tfrac12 H_{t-1}(x + 2\Phi_\mu(y(x) - \mu)). \qquad (37)$$
We have to solve the recurrence (37). Clearly $H_t(x) = C$, for any constant $C$, is a solution. To find others, we use "separation of variables". We look for a solution of the form $H_t(x) = g(t)G(y(x))$ for $x \in [\mu, 1 - \mu]$. Then
$$g(t)/g(t-1) = \bigl(G(y - 2\Phi_\mu(y - \mu)) + G(y + 2\Phi_\mu(y - \mu))\bigr)/2G(y),$$
where $y = y(x) \in [\mu, \frac12]$. (Note $y(y(x)) = y(x)$.) Thus, for some $\lambda$, we must have $g(t) = \lambda g(t-1)$, i.e. $g(t) = C_1 \lambda^t$ for some constant $C_1$, and
$$2\lambda G(y) = G(y - 2\Phi_\mu(y - \mu)) + G(y + 2\Phi_\mu(y - \mu)) \qquad (\mu \le y \le \tfrac12).$$
The form of this equation suggests trying $G(y) = C_2(y - \mu)^\alpha$ for some constants $\alpha, C_2$. This gives
$$2\lambda = (1 - 2\Phi_\mu)^\alpha + (1 + 2\Phi_\mu)^\alpha.$$
Assuming that $\Phi_\mu$ is small, we have $\lambda \approx 1 + 2\alpha(\alpha - 1)\Phi_\mu^2$. We wish to minimize $\lambda$ in order to force $H_t$ to decrease quickly with $t$. Thus we should take $\alpha = \frac12$, giving
$$\lambda = \tfrac12\left(\sqrt{1 - 2\Phi_\mu} + \sqrt{1 + 2\Phi_\mu}\right) \le 1 - \tfrac12\Phi_\mu^2. \qquad (38)$$
The inequality in (38) is proved by noting that, for $x \in [0, 1]$,
$$\sqrt{1 - x} \le 1 - \tfrac12 x \quad \text{and} \quad \tfrac12\left(\sqrt{1 - x} + \sqrt{1 + x}\right) = \sqrt{\tfrac12\left(1 + \sqrt{1 - x^2}\right)}.$$
Both are easily proved by squaring. Thus the middle term of (38) is
$$\sqrt{\tfrac12\left(1 + \sqrt{1 - 4\Phi_\mu^2}\right)} \le \sqrt{1 - \Phi_\mu^2} \le 1 - \tfrac12\Phi_\mu^2.$$
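As a quick sanity check (our addition, not part of the paper), inequality (38) can be verified numerically over a grid of conductance values:

```python
import numpy as np

# Check (38): (sqrt(1 - 2p) + sqrt(1 + 2p)) / 2 <= 1 - p**2 / 2 for p in [0, 1/2].
for p in np.linspace(0.0, 0.5, 101):
    lam = (np.sqrt(1 - 2 * p) + np.sqrt(1 + 2 * p)) / 2
    assert lam <= 1 - p ** 2 / 2 + 1e-12, (p, lam)
print("inequality (38) holds at all 101 sampled points")
```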
In view of this discussion, we have justified a bound of the form
$$h_t(x) \le C + C'\left(1 - \tfrac12\Phi_\mu^2\right)^t \sqrt{y(x) - \mu} \qquad (39)$$
for some constants $C, C'$, given only that this inequality holds for $h_0(x)$ ($x \in [\mu, 1 - \mu]$). Thus we may prove
Theorem 4 (Lovász-Simonovits) If $C = \max\{h_0(x) : x \in [0, \mu] \cup [1 - \mu, 1]\}$ and $C' = \max_{\mu \le x \le 1 - \mu}(h_0(x) - C)/\sqrt{y(x) - \mu}$, then
$$h_t(x) \le C + C'\exp(-\tfrac12\Phi_\mu^2 t) \qquad (x \in [0, 1],\ t \ge 0).$$
Proof The constant $C$ ensures the inequality holds for $t = 0$ and $x \le \mu$ or $x \ge 1 - \mu$. Then $C'$ ensures that it holds for $x \in [\mu, 1 - \mu]$ and $t = 0$. It then holds for all $t$, using the solution (39) of the recurrence. $\Box$

We turn now to the application of Theorem 4 to the volume algorithm. The Markov chain $X_t$ we consider is the phase $i$, trial $j$, random walk. An ergodic Markov chain is time reversible if there exist constants $\pi_i \ge 0$ ($i \in [N]$), not all zero, such that $\pi_i p_{ij} = \pi_j p_{ji}$ for all $i, j \in [N]$. (These are called the detailed balance equations.) Since
$$\sum_{i=1}^N \pi_i p_{ij} = \pi_j \qquad (j \in [N]),$$
it follows, by uniqueness, that $q_i = \pi_i / \sum_{j=1}^N \pi_j$ for all $i \in [N]$. In our random walks we have (in obvious notation), for all $x, y \in L$,
$$p(x, y) = \begin{cases} 0 & \text{if } x, y \text{ nonadjacent,} \\ 1/(4n) & \text{if } x, y \text{ adjacent and } \psi(y) \le \psi(x), \\ 1/(8n) & \text{if } x, y \text{ adjacent and } \psi(y) > \psi(x), \\ 1 - \sum_{z \ne x} p(x, z) & \text{if } x = y, \end{cases}$$
where $\psi(x)$ is as defined in Section 4.1 and discussed in Section 4.2. If we take $\pi(x) = 2^{-\psi(x)}$, the only cases to be checked are those where $x, y$ are adjacent. It is then easy to verify that
$$\pi(x)p(x, y) = \pi(y)p(y, x) = \frac{1}{4n}\,2^{-\max\{\psi(x), \psi(y)\}} = \frac{1}{4n}\,2^{-\psi(z)} \qquad (40)$$
for any $z \in \operatorname{int}\{C(x) \cap C(y)\}$. The conductance is
$$\Phi = \sum_{x \in v}\sum_{y \notin v} \pi(x)p(x, y) \Big/ \sum_{x \in v} \pi(x)$$
for some $v \in V$. Let $S = \bigcup_{x \in v} C(x)$, with bounding surface $\sigma$. Note that $\sigma$ is a union of $(n-1)$-dimensional $\delta$-cube faces, with $\|u\|_1 = \|u\|_\infty = 1$ at all points at which the unit normal $u$ is defined. If we put $F(x) = \pi(x) = 2^{-\psi(x)}$ and $\hat F(x) = 2^{-\hat\psi(x)}$, where $\hat\psi(x)$ is as defined in Section 4.2, we have
$$\tfrac14\hat F(x) \le F(x) \le \hat F(x),$$
and $\hat F$ is log-concave, since $r(x)$ is convex. Letting $\mu$ be the measure induced by $F$ (and $\mu'$ the induced surface measure), we apply Corollary 2 with the norm $\ell_1$ and ratio 4. If $\Phi_i$ is the conductance of any phase $i$ random walk, we then have
$$\Phi_i = \frac{f(v)}{qv} \ge \frac{\delta}{4n}\cdot\frac{\mu'(\sigma)}{\mu(S)},$$
since $\psi(\cdot) = \max\{\psi(x), \psi(y)\}$ on $\operatorname{int}\{C(x) \cap C(y)\}$ by definition. Applying Corollary 2 now gives
$$\Phi_i \ge \frac{1}{2^4 n^2 d_i} \qquad (41)$$
for $i = 1, 2, \ldots, k$.
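Concretely, one move of this walk is easy to implement. The sketch below is ours and purely schematic: `psi` and `in_walk` stand in for the weight exponent of Sections 4.1-4.2 and the adjacency test (e.g. whether the proposed cube meets $K_i$), and are assumptions rather than definitions from the paper. Each of the $2n$ neighbours is proposed with probability $1/(4n)$, and an "uphill" proposal (one that increases $\psi$) survives a further coin flip, giving $p(x,y) = 1/(8n)$ in that case.

```python
import random

def walk_step(x, n, psi, in_walk):
    """One step of the cube walk; x is a tuple of n integer cube coordinates."""
    if random.random() < 0.5:                 # lazy half-step, so p(x, x) >= 1/2
        return x
    k = random.randrange(n)                   # pick one of the n axes ...
    s = random.choice((-1, 1))                # ... and a direction: 2n proposals
    y = x[:k] + (x[k] + s,) + x[k + 1:]       # the neighbouring delta-cube
    if not in_walk(y):                        # proposal outside the walk's state space
        return x
    if psi(y) > psi(x) and random.random() < 0.5:
        return x                              # uphill proposal rejected half the time
    return y
```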
4.5 The random walk

In this section we conclude the analysis of the random walks employed in the algorithm. For convenience, let us assume that a point $\xi$ is generated in the final $\delta$-cube at the end of every walk, and that we always check whether $\xi$ is in $K_i$. Thus, if the random walk is run "long enough", the (extended) function $F(x) = 2^{-\psi(x)}$ is the (unnormalised) probability density function of $\xi$. We call $F(x)$ the "weight function". We observe that each walk has one of three mutually exclusive outcomes:

$(E_1)$ $\xi \notin K_i$, an improper trial.
$(E_2)$ $\xi \in K_i \setminus K_{i-1}$, a failure.
$(E_3)$ $\xi \in K_{i-1}$, a success.

We generate $\xi$ and observe one of the outcomes $E_j$, $j = 1, 2, 3$. Let us denote the observed outcome by $E$. Denote the final (i.e. $t = \tau$, where $\tau$ is the length of the walk) and limiting distributions of the random walk by $p_j$ and $q_j$ for $j \in [\tilde N]$, similarly to Section 4.4, and let
$$z_j = \Pr(\xi \in E \mid X_t = j) \qquad (j \in [\tilde N]).$$
(Observe that this is independent of $t$.) We will use primes to denote the probabilities conditional on $E$. Thus, if $p_E = \Pr(\xi \in E) = p \cdot z$, and we write $q_E = q \cdot z$ for its asymptotic value,
$$p'_j = p_j z_j / p_E, \qquad q'_j = q_j z_j / q_E.$$
We say that $E$ is a good set, and the outcome is good, if
$$q_E \ge \frac{\epsilon^4}{2^{18} n^3}.$$
We now proceed inductively. We assume that the outcome of a trial is good and that its final distribution is close to its steady state, i.e.
$$h_\tau(x) \le 2^{-6}\epsilon\sqrt{\min(x, 1 - x)} \qquad (x \in [0, 1]). \qquad (42)$$
This is certainly true initially. Let us show next that, when the walk is close to its asymptotic distribution, the probability of $E_1$ will not be too high. Now
$$\psi(x) = \left\lceil \frac{r(y) - 1}{\delta} - \frac12 \right\rceil \ge \left\lceil \frac{r(x) - 1}{\delta} \right\rceil - 1$$
for some $y \in L$, using Lemma 3. Thus $F(x) \le 2^{-j}$ if $r(x) > (1 + j\delta)$. Thus, if $\bar E_1 = E_2 \cup E_3$, the definition of $r(x)$ shows that $\Pr(E_1)/\Pr(\bar E_1) = \Pr(\xi \notin K_i)/\Pr(\xi \in K_i)$ remains suitably small.

Let $p'', q''$ denote the $N$-vectors obtained by deleting the last $(\tilde N - N)$ components of $p', q'$. Now
let $h'_0$ denote the function (30) for the walk of the next trial, whose initial distribution is $p'$. Then
$$h'_0(x) = \max_{w \in C_{\tilde N}}\{p' \cdot w : qw = x\} - x = p' \cdot \tilde w - x, \text{ say,} = p'' \cdot \hat w - x \le \max_{w}\{p'' \cdot w : q''w = q''\hat w\} - x = \max_{w}\{p'' \cdot w : q''w = x''\} - x,$$
where $\hat w$ is the truncation of $\tilde w$ to its first $N$ components, and
$$x = \sum_{j=1}^{\tilde N} q_j \tilde w_j \ge \sum_{j=1}^{N} q_j z_j \hat w_j = q_E(q''\hat w) = q_E x''.$$
Thus
$$h'_0(x) \le h_\tau(x'') + x'' - x < 1.1\sqrt{x''} \le 1.1\sqrt{x/q_E} \le 2^{10} n^{3/2} \epsilon^{-2}\sqrt{x}.$$
The trivial inequality (31), $h'_0(x) \le 1 - x$, now implies that
$$h'_0(x) \le 2^{10} n^{3/2} \epsilon^{-2}\sqrt{\min(x, 1 - x)},$$
and thus we take (with $\mu = 0$) $C = 0$ and $C' = 2^{10} n^{3/2} \epsilon^{-2}$ in Theorem 4. Thus we need only run the random walk until
$$2^{10} n^{3/2} \epsilon^{-2}\exp\{-\tfrac12\Phi_i^2 t\} \le 2^{-6}\epsilon, \quad \text{i.e.} \quad t \ge 2\Phi_i^{-2}\ln(2^{16} n^{3/2} \epsilon^{-3}), \quad \text{or} \quad t \ge 2^9 n^4 d_i^2 \ln(2^{27} n^3 \epsilon^{-4}),$$
using (41). We have included an extra factor of $8/5$ to allow for the discrepancy in the definitions of conductance between (27) and (28) in Section 4.4. This is generous, since $q_{\max} \le 1/(4n)^n \le 2^{-6}$ (the initial distribution for $n = 2$), and thus the factor $(1 + q_{\max})/(1 - q_{\max}) < 1.1$.

We can now see that (16) is justified. Basically, we need to consider quantities $\Pr(E' \mid E'')$ where $E', E''$ are good events and $E''$ refers to an earlier trial than $E'$. We can assume that (42) holds at the trial corresponding to $E''$. Our inductive argument then implies that, assuming $E_{\mathrm{bad}}$ does not occur, the probability of $E'$ will be within the correct error bounds, because of (42). This concludes the analysis of the algorithm.
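To give a feel for the numbers, the walk length implied by (41) and the bound above can be evaluated directly. The snippet below is illustrative only (ours, using the constants as reconstructed above); it demonstrates polynomiality, not practicality.

```python
import math

def steps_per_walk(n, d_i, eps):
    """t >= 2^9 * n^4 * d_i^2 * ln(2^27 * n^3 / eps^4), per phase-i walk."""
    return math.ceil(2 ** 9 * n ** 4 * d_i ** 2 * math.log(2 ** 27 * n ** 3 / eps ** 4))

print(steps_per_walk(10, 2, 0.1))   # already about 7 * 10^8 steps for n = 10
```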
4.6 Generating uniform points

We have seen how a generator of "almost uniform" points in an arbitrary convex body can be used to estimate volume. Here we will prove a stronger converse to this: a volume estimator can be used to determine, with high probability, a uniformly generated point in a convex body. (The probability of failure is directly related to the probability that the volume estimator fails.) The development here has a similar flavour to, though is not derivable from, results of Jerrum, Valiant and Vazirani [16]. We will gloss over most of the issues of accuracy of computation, leaving the interested reader to supply these.

Let $\epsilon = 1/(6n)$ and $m = 60n^2$, say. We consider a general dimension $d$ ($2 \le d \le n$), and use the same terminology and notation as in Section 4.3. Choose the lowest numbered coordinate direction, and determine the Euclidean width $w$ of $K$ in this direction. We assume, for convenience, that the area function $A(s)$ is defined for $s \in [0, w]$. We know, from Brunn-Minkowski, that $A(s)^{1/(n-1)}$ is a concave function of $s$ on $[0, w]$. Thus, in particular, $A(s)$ is unimodal, i.e. for some $s^*$, $A(s)$ is nondecreasing in $[0, s^*]$ and nonincreasing in $[s^*, w]$. We will write $A^* = A(s^*)$. We have

Lemma 7 If $0 \le s \le s^*$, then $(s/s^*)^n(A^* s^*/n) \le V(s) \le A^* s$.
Proof From the proof of Lemma 4, for $0 < s < u \le s^*$ we have $A(s)/A(u) \ge (s/u)^{n-1}$. But $V(s) = \int_0^s A(y)\,dy$, so the result follows from this and $A(s) \le A^*$, on putting $u = s^*$ and integrating between $0$ and $s$. $\Box$

Corollary 3 $A^* w/n \le \mathrm{vol}_n(K) \le A^* w$.
Proof The right hand inequality is immediate. For the left hand, from Lemma 7, $V(s^*) \ge A^* s^*/n$. By symmetry, $V(w) - V(s^*) \ge A^*(w - s^*)/n$. The result follows by adding. $\Box$
Now let us divide the width of the body into $m$ "strips" of size $\delta = w/m$. Write $A_i = A(i\delta)$ and $V_i = \int_{(i-1)\delta}^{i\delta} A(s)\,ds$, so that $V = \mathrm{vol}_n(K) = \sum_{i=1}^m V_i$. We begin by obtaining some easy estimates which form the basis of the method. Assume without loss of generality that $s^* \in [(k-1)\delta, k\delta)$ with $k \le \frac12 m$. Then the $\{A_i\}$ form a nondecreasing sequence for $0 \le i \le k-1$, and a nonincreasing sequence for $k \le i \le m$. By Corollary 3, $V \ge A^* w/d$. Thus $A^* \le dV/w = dV/(m\delta)$. Therefore
$$A^*\delta \le dV/m \le nV/m = \epsilon V/10. \qquad (44)$$
Let $\hat A(s)$ be an $\epsilon$-approximation to $A(s)$ with probability at least $(1 - \eta)$, i.e. (with this probability) $A(s)/(1 + \epsilon) \le \hat A(s) \le (1 + \epsilon)A(s)$. Write $\hat A_i = \hat A(i\delta)$, and let $H_i = (1 + \epsilon)^3\max\{\hat A_{i-1}, \hat A_i\}$.
Lemma 8 If $s \in [(i-1)\delta, i\delta]$, then $\hat A(s) \le H_i$.
Proof If $i \ne k$, then
$$\hat A(s) \le (1 + \epsilon)A(s) \le (1 + \epsilon)\max\{A_{i-1}, A_i\} \le (1 + \epsilon)^2\max\{\hat A_{i-1}, \hat A_i\} = H_i/(1 + \epsilon) \le H_i.$$
If $s \in [(k-1)\delta, k\delta]$, then $\hat A(s) \le (1 + \epsilon)A^*$. Also, using Lemma 7,
$$A_{k-1} \ge ((k-1)\delta/s^*)^{d-1}A^* \ge ((k-1)/k)^{d-1}A^* \ge (1 - 2/m)^{d-1}A^* \quad (\text{since } k \le \tfrac12 m)$$
$$\ge (1 - 1/(30n^2))^n A^* \quad (\text{since } m = 60n^2) \ge (1 - \epsilon/(5n))^n A^* \quad (\text{for } n \ge 1) \ge A^*/(1 + \epsilon) \quad (\text{since } \epsilon < 1).$$
Thus $A^* \le (1 + \epsilon)A_{k-1}$, and therefore
$$\hat A(s) \le (1 + \epsilon)^2 A_{k-1} \le (1 + \epsilon)^3 \hat A_{k-1} \le (1 + \epsilon)^3\max\{\hat A_{k-1}, \hat A_k\} = H_k. \qquad \Box$$
Thus, if $V' = \delta\sum_{i=1}^m H_i$, we have
$$V' \le (1 + \epsilon)^4\delta\sum_{i=1}^m \max\{A_{i-1}, A_i\} \le (1 + \epsilon)^4\delta\Big(\sum_{i=0}^m A_i + A^*\Big). \qquad (45)$$
Also
$$V' \ge (1 + \epsilon)^2\delta\sum_{i=1}^m \max\{A_{i-1}, A_i\} = (1 + \epsilon)^2\delta\Big(\sum_{i=1}^{m-1} A_i + \max\{A_{k-1}, A_k\}\Big). \qquad (46)$$
Using elementary area estimates,
$$V \le \delta\Big(\sum_{i=1}^{m-1} A_i + A^*\Big) \qquad (47)$$
and
$$V \ge \delta\Big(\sum_{i=0}^m A_i - 2A^*\Big). \qquad (48)$$
From (46) and (47),
$$V \le V'/(1 + \epsilon)^2 + 2A^*\delta \le V'/(1 + \epsilon)^2 + \epsilon V/5,$$
using (44), so $V'/V \ge (1 - \epsilon/5)(1 + \epsilon)^2 \ge (1 + \epsilon)$. From (45) and (48),
$$V' \le (1 + \epsilon)^4(V + 3A^*\delta) \le (1 + \epsilon)^5 V,$$
using (44), so $V'/V \le (1 + \epsilon)^5$, i.e.
$$(1 + \epsilon) \le V'/V \le (1 + \epsilon)^5. \qquad (49)$$
We may now turn to the algorithm itself. We select a strip $i \in [m]$ from the probability distribution $H_i/\sum_{j=1}^m H_j$. Within the chosen strip we select a point uniformly, i.e. $s \in [(i-1)\delta, i\delta]$ with density $1/\delta$. With probability $\hat A(s)/H_i$ we "accept" $s$, and proceed recursively to dimension $(d-1)$ and the cross-section at $s$. When $d = 1$ we generate uniformly on $[0, w]$. The generated point $(s_1, s_2, \ldots, s_n) \in K$, where we use subscript $d$ to refer to quantities at dimension $d$, is now accepted with a final probability
$$q = \frac{1}{eV'_n}\prod_{d=1}^n \frac{V'_d}{\hat A(s_d)}.$$
Note that $q$ can be calculated within the algorithm. Now
$$q = \frac{1}{eV'_n}\prod_{d=1}^n \frac{V'_d}{V_d}\cdot\frac{A(s_d)}{\hat A(s_d)}\cdot\frac{V_d}{A(s_d)} \le \frac{1}{e(1 + \epsilon)V_n}\prod_{d=1}^n (1 + \epsilon)^5(1 + \epsilon)\frac{V_d}{A(s_d)} = \frac{(1 + \epsilon)^{6n-1}}{e} < 1, \quad \text{since } \epsilon = 1/(6n).$$
Here we have used (49), $\hat A \ge A/(1 + \epsilon)$, and the telescoping $\prod_{d=1}^n V_d/A(s_d) = \prod_{d=1}^n V_d/V_{d-1} = V_n$ (noting that $A(s_d)$, the cross-sectional volume at the point chosen in dimension $d$, is precisely $V_{d-1}$, with $V_0 = 1$). Also
$$q \ge \frac{1}{e(1 + \epsilon)^5 V_n}\prod_{d=1}^n \frac{1 + \epsilon}{1 + \epsilon}\cdot\frac{V_d}{A(s_d)} = \frac{1}{e(1 + \epsilon)^5} \ge \frac{1}{e(1 + 1/12)^5} > \frac15.$$
The overall (improper) density of the selected point is
$$q\prod_{d=1}^n \frac{H_{i_d}}{\sum_{j=1}^m H_{j_d}}\cdot\frac{\hat A(s_d)}{H_{i_d}}\cdot\frac{1}{\delta_d} = q\prod_{d=1}^n \frac{\hat A(s_d)}{V'_d} = \frac{1}{eV'_n},$$
since $V'_d = \delta_d\sum_j H_{j_d}$, i.e. uniform. The overall probability of acceptance is clearly
$$\frac{1}{eV'_n}\int_K \prod_{d=1}^n ds_d = \frac{V_n}{eV'_n} \ge \frac{1}{e(1 + \epsilon)^5} > \frac15.$$
Thus each "trial" of determining a point has a constant probability of success. We can make this as high as we wish by repeating the procedure. We use at most $60n^2 \cdot n = 60n^3$ calls to the volume approximator. Thus the overall error probability will be at most $60n^3\eta$, if the approximator fails with probability $\eta$. Finally, we observe that if $K$ is well guaranteed, then all the sections which we might wish to approximate can easily be shown to be well guaranteed also. Thus our approximator can be restricted to work only for well guaranteed bodies, as we would obviously require, so this is no real restriction. (Provided, of course, that the body $K$ from which we wish to sample is itself well guaranteed.)
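One level of this recursion is short enough to sketch in full. The code below is our condensed illustration, not the authors' implementation: `A_hat` is the randomised $\epsilon$-approximator for the cross-sectional volume $A(s)$ (obtained from the volume algorithm), $w$ the width in the chosen direction, $m$ the number of strips. By Lemma 8, $\hat A(s) \le H_i$ with high probability, so the acceptance ratio is a valid probability.

```python
import random

def sample_coordinate(A_hat, w, m, eps):
    """One level of the recursion: return an accepted s in [0, w], or None."""
    delta = w / m
    A = [A_hat(i * delta) for i in range(m + 1)]                   # \hat A_i
    H = [(1 + eps) ** 3 * max(A[i], A[i + 1]) for i in range(m)]   # H_{i+1} in the text
    r = random.random() * sum(H)                  # choose strip i w.p. H_i / sum_j H_j
    i = 0
    while i < m - 1 and r > H[i]:
        r -= H[i]
        i += 1
    s = (i + random.random()) * delta             # uniform density 1/delta in the strip
    if random.random() < A_hat(s) / H[i]:         # accept with probability A_hat(s)/H_i
        return s                                  # caller recurses on the cross-section
    return None                                   # rejected: repeat the trial
```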
5 Applications

5.1 Integration

We describe algorithms for integrating non-negative functions over a well-guaranteed convex body $K$. We assume non-negativity since we can only approximate, and so we cannot deal with integrals which evaluate to zero. It may of course be entirely satisfactory to integrate the positive and negative parts of the function separately.
5.1.1 Concave functions

Integration of a non-negative function $f : \mathbb{R}^n \to \mathbb{R}$ over a convex body $K$ can be expressed as a volume computation by
$$\int_{x \in K} f\,dx = \mathrm{vol}_{n+1}(K_f),$$
where
$$K_f = \{(x, z) \in \mathbb{R}^{n+1} : x \in K,\ 0 \le z \le f(x)\}.$$
Now if $f$ is concave then $K_f$ is convex, and so we can compute $\int_{x \in K} f\,dx$ as accurately as required by the algorithm of Section 4. The time taken depends on the guarantee that we make for $K_f$. This will depend on how large $f$ can become on $K$, and also on its average value
$$\bar f = \frac{\int_{x \in K} f\,dx}{\mathrm{vol}_n(K)}. \qquad (50)$$
We assume from here on that
$$f_{\max} = \max\{f(x) : x \in K\} \le \beta_1 = e^{L_1} \quad \text{and} \quad \bar f \ge \frac{1}{\beta_2} = e^{-L_2}.$$
We feel that $L_1$, $L_2$ and $\langle K \rangle$ are good measures of the size of the problem here. We need a parameter ($L_2$) which accounts for $f$ being very small on $K$. If the guarantees for $K$ are $a, r, R$, then observe that (i) $K_f \subseteq B(a, R + \beta_1)$ and (ii) $f(x) \ge \phi = \bar f r/(2(R + r))$ for $x \in B(a, r/2)$ (this follows from $f \le f_{\max}$ and the non-negativity of $f$). It follows that $K_f$ is well guaranteed by $((a, \phi/2),\ (\phi/2)(1 + ((r + R)/\phi)^2)^{-1/2},\ R + \beta_1)$. Thus we can compute the integral of $f$ over $K$ in time which is polynomial in $\langle K \rangle$, $L_1$ and $L_2$.
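Since $K_f$ is accessed only through membership queries, its oracle falls out directly from $K$'s oracle and an evaluator for $f$. The following is a minimal sketch of ours (the helper names are our own, not the paper's):

```python
def make_Kf_oracle(in_K, f):
    """Membership oracle for K_f = {(x, z) : x in K, 0 <= z <= f(x)}."""
    def in_Kf(point):
        *x, z = point                      # point = (x_1, ..., x_n, z) in R^{n+1}
        return in_K(x) and 0.0 <= z <= f(x)
    return in_Kf

# Example: K the square [-1, 1]^2 and f(x) = 1 - max(|x_1|, |x_2|), concave on K.
in_K = lambda x: max(abs(x[0]), abs(x[1])) <= 1.0
f = lambda x: 1.0 - max(abs(x[0]), abs(x[1]))
in_Kf = make_Kf_oracle(in_K, f)
print(in_Kf((0.2, -0.3, 0.5)))             # True: 0 <= 0.5 <= f(0.2, -0.3) = 0.7
```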
5.1.2 Mildly varying functions

Here we consider a pseudo-polynomial time algorithm, i.e. one which is polynomial in the parameters $L, \beta_1, \beta_2$, but which is valid for general integrable functions. We see from (50) that it is only necessary to get a good approximation for $\bar f$ in order to get a good approximation for the integral. We use the equation
$$\bar f = \int_0^\infty \Pr(f(x) \ge t)\,dt, \qquad (51)$$
where the probability in (51) is for $x$ chosen uniformly from $K$. Now let
$$N = \left\lceil \frac{(2 + \epsilon)\beta_1\beta_2}{\epsilon} \right\rceil \quad \text{and} \quad h = \frac{\beta_1}{N}.$$
Then we have
$$\bar f = \sum_{i=0}^{N-1} I_i, \quad \text{where} \quad I_i = \int_{ih}^{(i+1)h} \Pr(f(x) \ge t)\,dt.$$
Furthermore, putting $\alpha_i = \Pr(f(x) \ge ih)$ for $i = 0, 1, \ldots, N$, we have
$$h\alpha_{i+1} \le I_i \le h\alpha_i \quad \text{for } i = 0, 1, \ldots, N-1,$$
and so
$$S_0 \le \bar f \le S_1, \quad \text{where} \quad S_0 = h\sum_{i=1}^{N-1}\alpha_i, \quad S_1 = h\sum_{i=0}^{N-1}\alpha_i.$$
Thus
$$1 \le \frac{S_1}{S_0} = 1 + \frac{h\alpha_0}{S_0} \le 1 + \frac{h}{\bar f - h} \le 1 + \frac{\epsilon}{2}.$$
We have now reduced our problem to one of finding a good estimate for $S_0$, and hence for the $\alpha_i$, $i = 1, 2, \ldots, N-1$. Assume that we wish our estimate for $S_0$ to be within $\epsilon\bar f/3$ with probability at least $1 - \eta$. This will yield an $\epsilon$-approximation for $\bar f$ when $\epsilon$ is small. We let
$$M = \lceil 2160\,\epsilon^{-3}\beta_1\beta_2\ln(4N/\eta) \rceil$$
and choose points $x_1, x_2, \ldots, x_M$ uniformly at random from $K$. Let $\nu_i = |\{j : f(x_j) \ge ih\}|$ and $\hat\alpha_i = \nu_i/M$ for $i = 1, 2, \ldots, N-1$. Observe that the $\nu_i$ are binomially distributed, and we will use standard tail estimates of the binomial distribution without comment (see e.g. Bollobás [4]). We consider two cases.

Case 1: $\alpha_i < \epsilon/(20\beta_1\beta_2)$. For this case we observe that if $\lambda = \epsilon M/(6\beta_1\beta_2)$ then
$$\Pr(\nu_i \ge \lambda) \le \left(\frac{3e}{10}\right)^\lambda \le \frac{\eta}{2N}.$$
This enables us to assume that, if $i_0 = \min\{i : \alpha_i < \epsilon/(20\beta_1\beta_2)\}$, then
$$\hat\alpha_i \le \frac{\epsilon}{6\beta_1\beta_2} \quad \text{for } i \ge i_0,$$
the probability of this not holding being at most $\eta/2$.

Case 2: $\alpha_i \ge \epsilon/(20\beta_1\beta_2)$. For this case we observe that
$$\Pr\left(|\hat\alpha_i - \alpha_i| \ge \frac{\epsilon}{6}\alpha_i\right) \le 2\exp\left(-\frac{\epsilon^3 M}{2160\beta_1\beta_2}\right) \le \frac{\eta}{2N}.$$
This enables us to assume that
$$|\hat\alpha_i - \alpha_i| < \frac{\epsilon}{6}\alpha_i \quad \text{for } i < i_0,$$
the probability of this not holding being at most $\eta/2$.

Now our estimate for $\bar f$ will be $\hat S_0 = h\sum_{i=1}^{N-1}\hat\alpha_i$. It follows from the above that, with probability at least $1 - \eta$,
$$|S_0 - \hat S_0| \le h\sum_{i=1}^{N-1}|\alpha_i - \hat\alpha_i| \le \frac{\epsilon}{6}h\sum_{i=1}^{i_0-1}\alpha_i + \frac{\epsilon}{6\beta_1\beta_2}h\sum_{i=i_0}^{N-1} 1 \le \frac{\epsilon S_0}{6} + \frac{\epsilon}{6\beta_2} \le \frac{\epsilon\bar f}{3}.$$
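The estimator itself is very short once a near-uniform sampler for $K$ is available. The sketch below is ours: `sample_K` stands for the generator of Section 4.6 (or the random walk of Section 4), and the constants follow the analysis above. As the text notes, $M$ is pseudo-polynomial: it grows with $\beta_1\beta_2$, not with $L_1 + L_2$.

```python
import math
import random

def estimate_mean(sample_K, f, beta1, beta2, eps, eta):
    """Estimate mean(f) over K via S0_hat = h * sum_{i=1}^{N-1} alpha_hat_i."""
    N = math.ceil((2 + eps) * beta1 * beta2 / eps)
    h = beta1 / N
    M = math.ceil(2160 * eps ** -3 * beta1 * beta2 * math.log(4 * N / eta))
    xs = [sample_K() for _ in range(M)]
    alpha_hat = [sum(f(x) >= i * h for x in xs) / M for i in range(1, N)]
    return h * sum(alpha_hat)
```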
5.1.3 Quasi-concave functions

It is possible to improve the preceding analysis in the case where $f$ is quasi-concave, i.e. the sets $\{x : f(x) \ge a\}$ are convex for all $a \in \mathbb{R}$. We will need to assume that $f$ satisfies a (semi-)Lipschitz condition
$$f(y) - f(x) \le \beta_3\|y - x\| \quad \text{for } x, y \in K.$$
Our algorithm includes a factor which is polynomial in $L_3 = \ln\beta_3$, which can be taken to be positive. This is reasonable, for if $f$ grows extremely rapidly at some point then a small region may contribute disproportionately to the integral, and so require extra effort. Note that the algorithm will be polynomial in the log of the Lipschitz constant. Let $L = 1 + \max\{L_1, L_2, L_3\}$, and next let
$$N = \lceil \ln(10/\epsilon) \rceil + 1, \qquad M = \lceil 10NL/\epsilon \rceil.$$
It will be convenient later to assume that we know $a^* \in K$ such that $f(a^*) = \beta_1$, and that $L_1 \ge 1$. This can be justified as follows: we use the Ellipsoid algorithm to find $a^* \in K$ such that
$$\mathrm{vol}_n(\{x \in K : f(x) \ge f(a^*)\}) \le \frac{\epsilon}{10}e^{-2L}\,\mathrm{vol}_n(K),$$
and then replace $f(x)$ by $\min\{f(x), f(a^*)\}$. The loss in the computation of $\bar f$ is at most $\frac{\epsilon}{10}\bar f$, and can be absorbed in our approximation error. We can then, if necessary, scale to make $L_1 \ge 1$. By making a change of variable $t = e^u$ in (51), we have
$$\bar f = \int_{-\infty}^{L_1} \Pr(f(x) \ge e^u)e^u\,du = I + J.$$
Here
$$I = \int_{-\infty}^{-NL} \Pr(f(x) \ge e^u)e^u\,du \le \int_{-\infty}^{-NL} e^u\,du = e^{-NL} \le \frac{\epsilon}{10}\bar f.$$
Then
$$J = \int_{-NL}^{L_1} \Pr(f(x) \ge e^u)e^u\,du = \sum_{i=0}^{2M-1} J_i, \quad \text{where} \quad J_i = \int_{u_i}^{u_{i+1}} \Pr(f(x) \ge e^u)e^u\,du$$
and
$$u_i = \begin{cases} -NL + iNL/M & \text{if } i \le M, \\ (i - M)L_2/M & \text{if } i > M. \end{cases}$$
Now define $\alpha_i = \Pr(f(x) \ge e^{u_i})$ and $h_i = u_{i+1} - u_i$ for $i = 0, 1, \ldots, 2M-1$. Then
$$h_i e^{u_i}\alpha_{i+1} \le J_i \le h_i e^{u_{i+1}}\alpha_i.$$
Now let
$$S_0 = \sum_{i=0}^{2M-1} h_i e^{u_i}\alpha_{i+1}, \qquad S_1 = \sum_{i=0}^{2M-1} h_i e^{u_{i+1}}\alpha_i.$$
Then clearly $S_0 \le J \le S_1$. But
$$S_0 \ge \exp\{-NL/M\}(S_1 - h_0 e^{u_1}\alpha_0) \ge (1 - \tfrac{\epsilon}{10})(\bar f - \tfrac{\epsilon}{10}\bar f - \tfrac{\epsilon^2}{90}\bar f) \ge (1 - \tfrac{\epsilon}{4})\bar f.$$
(The second inequality uses $\alpha_0 \le 1$, $e^{u_1} \le \frac{\epsilon}{10}\bar f e^{\epsilon/10} \le \frac{\epsilon}{9}\bar f$, and $h_0 \le \epsilon/10$.) But $\bar f \ge J \ge S_0$, and so we need only estimate $S_0$. Equivalently, we need to estimate the $\alpha_i$. Suppose we can compute $\hat\alpha_i$ such that
$$\left|\frac{\hat\alpha_i}{\alpha_i} - 1\right| \le \frac{\epsilon}{2} \quad \text{for } i = 0, 1, \ldots, 2M-1.$$
(We will see shortly that we have fixed things so that $\alpha_{2M-1}$ is sufficiently large.) Under these circumstances, if
$$\hat S_0 = \sum_{i=0}^{2M-1} h_i e^{u_i}\hat\alpha_{i+1},$$
then
$$(1 - \tfrac{\epsilon}{2})(1 - \tfrac{\epsilon}{4})\bar f \le \hat S_0 \le (1 + \tfrac{\epsilon}{2})\bar f,$$
and we are done. Observe next that
$$\alpha_i = \frac{\mathrm{vol}_n(K_i)}{\mathrm{vol}_n(K)} \quad \text{for } i = 0, 1, \ldots, 2M-1, \qquad \text{where} \quad K_i = \{x \in K : f(x) \ge e^{u_i}\}.$$
Now the $K_i$ are convex sets, and it remains only to discuss their guarantees. Since $K_i \subseteq K$ for each $i$, we have no worries about the outer ball. It is the inner ball of $K_{2M-1}$ that we need to deal with. Now, letting $a_t = (1 - t)a + ta^*$ for $0 \le t \le 1$, we find that $K$ contains the ball $B(a_t, \rho_t)$ where $\rho_t = (1 - t)r$. Then, if
$$\theta = \frac{r}{R}\exp\{L_1 - L_3 - L_2/M\}$$
(we can make $L_3$ large enough so that $0 < \theta < 1$), then $x \in B(a^*, \theta)$ implies
$$f(a^*) - f(x) \le e^{L_3}\theta = \frac{r}{R}f(a^*)\exp\{-L_2/M\} \le f(a^*)\exp\{-L_2/M\},$$
and so $K_{2M-1} \supseteq B(a^*, \theta)$, and we have a guarantee of $(a^*, \theta, 2R)$ for each $K_i$. Thus we can approximate $\bar f$ in time polynomial in $L$ and $1/\epsilon$. It should be observed that Applegate and Kannan [2] have a more efficient integration algorithm for log-concave functions.
5.2 Counting linear extensions

We noted in Section 3.2 that determining the number of linear extensions of a partial order can be reduced to volume computation (and so it can be approximated by the methods of Section 4). The volume approximation algorithm of Dyer, Frieze and Kannan, applied (in the notation of Section 3.2) to $P(\rho)$, gave the first (random) polynomial time approximation algorithm for estimating $e(\rho)$. However, Karzanov and Khachiyan [19] have recently given an improvement to the algorithm for this application which is more natural, and which we will now outline.

Observe first that it suffices to be able to generate an (almost) random linear extension of $\rho$. For an incomparable pair $i, j$ under $\rho$, let $p_{ij}$ denote the proportion of linear extensions $\sigma$ with $\sigma^{-1}(i) < \sigma^{-1}(j)$. It is known (Kahn and Saks [17]) that for some $i, j$ we have $\min\{p_{ij}, p_{ji}\} \ge \frac{3}{11}$. Thus by repeated sampling we will be able to determine, for some $i, j$, a close approximation to the proportion of linear extensions with $\sigma^{-1}(i) < \sigma^{-1}(j)$; we choose the $i, j$ for which the estimate gives the largest minimum. We then add $i \prec j$ to the partial order and proceed inductively until the order becomes a permutation, and our estimate is then the product of the inverses of the proportions that we have found. This requires us to generate $O(n\log n)$ linear extensions.

To generate a random linear extension we do a random walk on $E(\rho)$, as sketched after this paragraph. At a given extension $\sigma$ we do nothing with probability $\frac12$; otherwise we choose a random integer $i$ between 1 and $(n-1)$. If $\sigma(i) \not\prec \sigma(i+1)$, then we get a new permutation $\sigma'$ by interchanging $\sigma(i)$ and $\sigma(i+1)$. Let us say that in these circumstances $\sigma, \sigma'$ are adjacent. The steady state of this walk is uniform over linear extensions, and so the main interest now is in the conductance of this chain, which is
$$\Phi = \min\left\{\frac{b(X)}{(2n-2)|X|} : X \subseteq E(\rho),\ |X| \le \tfrac12 e(\rho)\right\}, \quad \text{where} \quad b(X) = |\{(\sigma, \sigma') : \sigma \in X,\ \sigma' \notin X \text{ adjacent}\}|.$$
So let $X \subseteq E(\rho)$ satisfy $|X| \le e(\rho)/2$. Let $S_X = \bigcup_{\sigma \in X} S_\sigma$, and let $A_X$ be the $(n-1)$-dimensional volume of the common boundary of $S_X$ and $S_{E(\rho)\setminus X}$. Now a straightforward calculation (using a two-dimensional rotation followed by an application of (1)) shows that each simplicial face of this boundary has $(n-1)$-dimensional volume $\sqrt2/(n-1)!$. In the notation of Theorem 3, with $F(x) = 1$ and the $\ell_1$ norm, we see that the unit normal $u$ to any face of the common boundary has $\|u\| = \sqrt2$. Thus $\mu'(S_X) = \sqrt2 A_X$. Applying the theorem, we obtain
$$\sqrt2 A_X \ge \frac{2|X|}{n!},$$
since $\mathrm{diam}(K) = 1$ here. Thus
$$b(X) = A_X\frac{(n-1)!}{\sqrt2} \ge \frac{|X|}{n},$$
and so
$$\Phi \ge \frac{1}{2n(n-1)},$$
and we can generate a random linear extension in polynomial time. Note that this estimate is better by a factor of $\sqrt n$ than that given in [19]. (This order of improvement was, in fact, conjectured in [19].) Applying similar arguments to those in Section 4, we see that we can estimate $e(\rho)$ to within $\epsilon$, with probability at least $(1 - \eta)$, in $O(n^6\epsilon^{-2}(\log n)^2\log(n/\epsilon)\log(1/\eta))$ time.
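The walk itself transcribes into a few lines. This sketch is ours; `less(a, b)` is a hypothetical predicate returning True exactly when $a \prec b$ is forced by the partial order, and an extension is kept as a list of elements.

```python
import random

def extension_walk_step(sigma, less):
    """One step of the adjacent-transposition walk on linear extensions."""
    if random.random() < 0.5:
        return sigma                          # lazy step: do nothing w.p. 1/2
    n = len(sigma)
    i = random.randrange(n - 1)               # positions i, i+1 (0-based here)
    if not less(sigma[i], sigma[i + 1]):      # swap only if the order permits it
        sigma = sigma[:i] + [sigma[i + 1], sigma[i]] + sigma[i + 2:]
    return sigma
```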
5.3 Mathematical programming

We can use our algorithm to provide random polynomial time algorithms for approximating the expected value of some stochastic programming problems. Consider first computing the expected value of $v(b)$ when $b = (b_1, b_2, \ldots, b_m)$ is chosen uniformly from a convex body $K \subseteq \mathbb{R}^m$ and
$$v(b) = \max f(x) \quad \text{subject to } g_i(x) \le b_i \quad (i = 1, 2, \ldots, m).$$
To estimate $Ev(b)$ we need to estimate $\int_{b \in K} v$ and divide it by an estimate of the volume of $K$. We thus have to consider under what circumstances the results of Section 4 can be applied. If $f$ is concave and $g_1, g_2, \ldots, g_m$ are all convex, then $v$ is concave, and we can estimate $Ev$ efficiently if we know that $v$ is uniformly bounded below for $b \in K$. Observe also that we will be able to estimate $\Pr(v(b) \ge t)$ by randomly sampling $b$ and computing $v(b)$, provided this probability is large enough. Of particular interest is the case of PERT networks, where the $b_i$ represent (random) durations of the various activities and $f$ represents the completion time of the project. The results here represent a significant improvement, at least in theory, over the traditional heuristic method of assuming one critical path and applying a normal approximation.

As another application, consider computing the expected value of $\phi(c)$ when $c = (c_1, c_2, \ldots, c_n)$ is chosen uniformly from some convex body $K \subseteq \mathbb{R}^n$ and
$$\phi(c) = \min cx \quad \text{subject to } g_i(x) \le b_i \quad (i = 1, 2, \ldots, m).$$
Now $\phi(c)$, being the infimum of linear functions of $c$, is concave, and we will be able to estimate the expectation of $\phi$ when $\phi$ can be computed efficiently. The same remark holds for computing $\Pr(\phi(c) \ge t)$.

As a final example here, suppose that we have a linear program
$$\min cx \quad \text{subject to } Ax = b,\ x \ge 0.$$
Suppose that $(b, c)$ is chosen uniformly from some convex body $K \subseteq \mathbb{R}^{m+n}$, and that $B$ is a basis matrix (i.e. an $m \times m$ non-singular submatrix of $A$). Sensitivity analysis might require us to estimate the probability that $B$ is the optimal basis. This can be done efficiently, since it amounts to computing $\mathrm{vol}_{m+n}(K_{\mathrm{opt}})/\mathrm{vol}_{m+n}(K)$, where $K_{\mathrm{opt}}$ is the convex set
$$K \cap \{c_j \ge c_B B^{-1}a_j : j = 1, 2, \ldots, n\} \cap \{B^{-1}b \ge 0\}.$$
(Here we are using common notation: $a_j$ is column $j$ of $A$, and $c_B$ is the vector of basic costs.)
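The first of these estimates reduces to a few lines once the two black boxes exist. The sketch below is ours and purely schematic: `sample_K` stands for the near-uniform generator of Section 4.6, and `solve_v` for a convex-programming solver returning $v(b)$; both are hypothetical helpers, not routines from the paper.

```python
def estimate_Ev(sample_K, solve_v, M):
    """Monte Carlo estimate of E v(b) for b near-uniform on K, from M samples."""
    return sum(solve_v(sample_K()) for _ in range(M)) / M
```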
5.4 Learning a halfspace

This problem was brought to our attention by Manfred Warmuth, who suggested that volume computation might be useful in solving it. The method described here is due to the authors and Ravi Kannan. We describe here the application of good volume estimation to a problem in learning theory. Student X is trying to learn an inequality
$$\sum_{j=1}^n \alpha_j x_j \le \alpha_0.$$
The unknowns are $\alpha_j \ge 0$ ($j = 0, 1, \ldots, n$), and X's aim is to be able to answer questions of the form "What is the sign of $x \in \mathbb{R}^n$ relative to this inequality?" Here $\mathrm{sign}(x, \alpha) = +$ if $\sum_{j=1}^n \alpha_j x_j \le \alpha_0$, and $-$ otherwise. There is a teacher Y who provides X with an infinite sequence of examples $z(t)$, $t = 1, 2, \ldots$. Given an example $z(t)$, X must make a guess at $\mathrm{sign}(z(t), \alpha)$, and then Y will reveal whether or not X's guess is correct. We assume that there is an $L \in \mathbb{Z}$ such that $z(t) \in Z = \{0, 1, \ldots, L-1\}^n$. Integrality is not a major assumption, and non-negativity can be assumed, at the cost of doubling the number of variables, if X treats arbitrary components as the difference of two non-negative components.

The problem we have to solve is to design a strategy for X which minimises the total number of errors made. If there is no bound on component size then, even for $n = 2$, Y can construct a hyperplane in response to any answers which is consistent with X being wrong every time. We define an equivalence relation $\equiv$ on $\mathbb{R}^{n+1}$ by
$$\alpha^{(1)} \equiv \alpha^{(2)} \quad \text{if } \mathrm{sign}(x, \alpha^{(1)}) = \mathrm{sign}(x, \alpha^{(2)}) \text{ for all } x \in Z.$$
X cannot hope to compute $\alpha$ exactly, and instead aims to find $\alpha' \equiv \alpha$. Moreover, we will see that it is advantageous for X to assume $\alpha$ satisfies
$$\sum_{j=1}^n \alpha_j x_j \ne \alpha_0 \quad \text{for all } x \in Z. \qquad (52)$$
There is always a small perturbation $\hat\alpha$ of $\alpha$, $\hat\alpha \equiv \alpha$, that satisfies (52). We can also assume that $0 \le \alpha_j \le 1$, $j = 0, 1, \ldots, n$, since scaling does not affect signs. For $x \in Z$, let $a_x = (x, -1)$ and let $H_x$ be the hyperplane (in $\alpha$-space) $\{\alpha \in \mathbb{R}^{n+1} : a_x\alpha = 0\}$. These hyperplanes partition $\mathbb{R}^{n+1}$ into an arrangement of open cones. Consider the partition $S_1, S_2, \ldots$ that these cones induce on $C_{n+1} = [0, 1]^{n+1}$. Note that if two vectors $\alpha, \alpha'$ lie in the same $S_i$, then $\alpha \equiv \alpha'$. If $\alpha$ satisfies (52), then it lies in an $S_i$ of dimension $n+1$ and volume at least $\Delta = (nL)^{-n^2}$. It follows from these remarks that the following algorithm never makes more than $O(n^2(\log n + \log L))$ mistakes:

Keep a polytope $P$ within whose interior $\alpha$ is known to lie; initially $P = C_{n+1}$;
for $t = 1, 2, \ldots$ do
begin
    let $P_+ = \{\alpha \in P : a_{z(t)}\alpha \le 0\}$ and $P_- = \{\alpha \in P : a_{z(t)}\alpha \ge 0\}$;
    compute $\mathrm{vol}_{n+1}(P_+)$ and $\mathrm{vol}_{n+1}(P_-)$;
    answer $\alpha \in P_+$ if this has the larger volume, otherwise $\alpha \in P_-$;
    if you are wrong, having chosen $P_+$ say, then $P := P_-$
end

Each mistake halves the volume of $P$, which starts at 1. On the other hand, $\mathrm{vol}_{n+1}(P) \ge \Delta$, and the result follows. Although we cannot compute volumes exactly, a $\frac{1}{10}$-approximation will guarantee that each mistake reduces the volume of $P$ by a factor $\frac34$, say, which suffices. Also we have a probabilistic error in our computation. To keep the overall probability of error down to $\eta$, say, we need only keep the error probability for each computation down to $\eta/\log_{4/3}(1/\Delta)$. This analysis improves the number of errors required by a factor of $n$ over the method proposed by Maass and Turán [25].
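For concreteness, one round of X's strategy can be sketched as follows (our illustration): `volume` stands for the randomised $\frac{1}{10}$-approximator of Section 4, and `cut` is a hypothetical helper returning $P$ intersected with the appropriate halfspace $\{\alpha : a_z\alpha \le 0\}$ or $\{\alpha : a_z\alpha \ge 0\}$.

```python
def guess_and_update(P, z, volume, cut):
    """One round: guess the sign of z, and return an updater for P."""
    P_plus, P_minus = cut(P, z, '+'), cut(P, z, '-')
    guess = '+' if volume(P_plus) >= volume(P_minus) else '-'

    def update(true_sign):            # called once the teacher reveals the sign
        return P_plus if true_sign == '+' else P_minus

    return guess, update
```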
6 The number of random bits

We have already seen in Section 3.1 that a deterministic algorithm cannot guarantee a good approximation to volume in the oracle model. We return now to our remarks about nondeterministic computation, using the notation of Section 3.1. We assume we are interested in $\epsilon$-approximation, with $\epsilon = \Omega(n^{-c})$ for some $c \in \mathbb{R}$, i.e. polynomial approximation. As usual, we have a convex body $K \subseteq \mathbb{R}^n$ described by an oracle as in Section 2.

Suppose that we have a randomised algorithm which makes at most $m(n)$ calls on the oracle, for a polynomial $m$, and that it uses at most $b = n - \omega\log_2 n$ random bits, where $\omega = \omega(n) \to \infty$. Then $M(n) \le 2^b m(n)$. Thus the relative error of approximations from this algorithm cannot be guaranteed to be better than $(2^{n-b}/m(n))^{1/2} \ge n^{\omega/4}$ for large $n$. So we cannot polynomially approximate with much less than $n$ (truly) random bits.

On the other hand, a result of Nisan [27] shows that only $O(n(\log n)^2)$ truly random bits are actually necessary. This is rather surprising, but it follows from the fact that we need only $O(n\log n)$ space to maintain the random walk and accumulate the required information to make our estimate. (We need not, of course, worry about the space needed by the oracle.) Nisan's result states that, in an algorithm using space $S$ and $R$ random bits, the random bits can be supplied by a pseudorandom generator which uses only $O(S\log(R/S))$ truly random bits. One then observes from Section 4 that, in our case, for polynomial approximations, $R$ is polynomially bounded in $n$.
Acknowledgements

We thank Ross Willard for providing us with the first paragraph of historical information, and David Applegate, Ravi Kannan and Umesh Vazirani for general comments. We thank Nick Polson for his observation on the estimation of P in Section 4.1, which saved us from a much more complicated argument. We thank Russell Impagliazzo for bringing Nisan's paper to our attention.
References

[1] D. Aldous and P. Diaconis, Shuffling cards and stopping times, American Mathematical Monthly 93 (1986), 333-348.
[2] D. Applegate and R. Kannan, Sampling and integration of near log-concave functions, Computer Science Department Report, Carnegie-Mellon University, 1990.
[3] I. Barany and Z. Furedi, Computing the volume is difficult, Proc. 18th Annual ACM Symposium on Theory of Computing (1986), 442-447.
[4] B. Bollobas, Random Graphs, Academic Press, 1985.
[5] P. Berard, G. Besson and A. S. Gallot, Sur une inegalite isoperimetrique qui generalise celle de Paul Levy-Gromov, Inventiones Mathematicae 80 (1985), 295-308.
[6] G. Brightwell and P. Winkler, Counting linear extensions is #P-complete, DIMACS Technical Report 90-49, 1990.
[7] Yu. D. Burago and V. A. Zalgaller, Geometric inequalities, Springer-Verlag, Berlin, 1988.
[8] R. Courant and H. Robbins, What is Mathematics?, Oxford University Press, London, 1941.
[9] M. E. Dyer and A. M. Frieze, On the complexity of computing the volume of a polyhedron, SIAM J. Comput. 17 (1988), 967-974.
[10] M. E. Dyer, A. M. Frieze and R. Kannan, A random polynomial time algorithm for approximating the volume of convex bodies, Proc. 21st Annual ACM Symposium on Theory of Computing (1989), 375-381 (full paper to appear in J. ACM).
[11] G. Elekes, A geometric inequality and the complexity of computing volume, Disc. Comp. Geom. 1 (1986), 289-292.
[12] W. Feller, Introduction to the theory of probability and its applications, Vol. I, Wiley, New York, 1968.
[13] M. Grotschel, L. Lovasz and A. Schrijver, Geometric algorithms and combinatorial optimization, Springer-Verlag, 1988.
[14] W. Hoeffding, Probability inequalities for sums of bounded random variables, J. Amer. Stat. Assoc. 58 (1963), 13-30.
[15] M. R. Jerrum and A. J. Sinclair, Approximating the permanent, SIAM J. Comput. 18 (1989), 1149-1178.
[16] M. R. Jerrum, L. G. Valiant and V. V. Vazirani, Random generation of combinatorial structures from a uniform distribution, Theoretical Computer Science 43 (1986), 169-188.
[17] J. Kahn and M. Saks, Every poset has a good comparison, Proc. 16th Annual ACM Symposium on Theory of Computing (1984), 299-301.
[18] R. M. Karp and M. Luby, Monte-Carlo algorithms for enumeration and reliability problems, Proc. 24th Annual IEEE Symposium on Foundations of Computer Science (1983), 56-64.
[19] A. Karzanov and L. G. Khachiyan, On the conductance of order Markov chains, Technical Report DCS TR 268, Rutgers University, 1990.
[20] L. G. Khachiyan, On the complexity of computing the volume of a polytope, Izvestia Akad. Nauk SSSR, Engineering Cybernetics 3 (1988), 216-217 (in Russian).
[21] J. Lawrence, Polytope volume computation, Preprint NISTIR 89-4123, U.S. Dept. of Commerce, National Institute of Standards and Technology, Center for Computing and Applied Mathematics, Gaithersburg, 1989.
[22] H. W. Lenstra, Integer programming with a fixed number of variables, Math. Oper. Res. 8 (1983), 538-548.
[23] N. Linial, Hard enumeration problems in geometry and combinatorics, SIAM J. Alg. Disc. Meth. 7 (1986), 331-335.
[24] L. Lovasz and M. Simonovits, The mixing rate of Markov chains, an isoperimetric inequality, and computing the volume, Preprint 27/1990, Mathematical Institute of the Hungarian Academy of Sciences, 1990.
[25] W. Maass and G. Turan, On the complexity of learning from counterexamples, Proc. 30th Annual IEEE Symposium on Foundations of Computer Science (1989), 262-267.
[26] P. Matthews, Generating a random linear extension of a partial order, University of Maryland (Baltimore County) Technical Report, 1989.
[27] N. Nisan, Pseudorandom generators for space-bounded computation, Proc. 22nd Annual ACM Symposium on Theory of Computing (1990), 204-212.
[28] L. E. Payne and H. F. Weinberger, An optimal Poincare inequality for convex domains, Arch. Rat. Mech. Anal. 5 (1960), 286-292.
[29] R. T. Rockafellar, Convex analysis, Princeton University Press, Princeton, New Jersey, 1970.
[30] A. J. Sinclair and M. R. Jerrum, Approximate counting, uniform generation and rapidly mixing Markov chains, Information and Computation 82 (1989), 93-133.
[31] A. H. Stone and J. W. Tukey, Generalized "sandwich" theorems, Duke Math. J. 9 (1942), 356-359.