A Permanent Algorithm With $\Omega(\exp[\frac{n^{1/4}}{4 \log_2 n} \frac{1}{\sqrt{2\pi e}}])$ Expected Speedup for 0-1 Matrices

Eric Bax* and Joel Franklin†

December 12, 1997

Abstract

Formulas for the matrix permanent can be derived by applying finite differences to a generating function. For 0-1 matrices, some choices of finite-difference parameters produce formulas with many zero-valued terms. Eliminating sets of these terms speeds the computation. In this paper, we use a permanent decomposition identity to introduce more parameters. We show that a class of random parameter settings reduces the expected number of nonzero-valued terms to an exponentially small fraction of the terms. We present an algorithm that achieves $\Omega(\exp[\frac{n^{1/4}}{4 \log_2 n} \frac{1}{\sqrt{2\pi e}}])$ expected speedup for 0-1 matrices over Ryser's inclusion and exclusion algorithm.

Key words: algorithms, combinatorial problems, #P, permanent, finite difference, generating function. AMS subject classifications: 05, 68.

* Computer Science Department, California Institute of Technology 256-80, Pasadena, California, 91125 (eric@cs.caltech.edu). † Department of Applied Mathematics, California Institute of Technology 217-50, Pasadena, California, 91125 ([email protected]).


1 Introduction

The permanent of an n × n 0-1 matrix is the number of n-sets of one-valued entries with one entry from each row and one entry from each column. If the rows and columns represent the independent vertex sets of a bipartite graph and one-valued entries represent edges, then the permanent is the number of perfect matchings. If the rows represent women, the columns represent men, and the one-valued entries denote compatibility, then the permanent is the number of ways to arrange marriages such that everyone is married and there is no polygamy. The 0-1 matrix permanent problem has applications to statistical physics [6, 13]. Recently, several methods have been developed to estimate the 0-1 matrix permanent [7, 8, 9, 10]. This paper focuses on computing the permanent exactly.

Computing the permanent of a 0-1 matrix is a #P-complete problem [14]. The class #P is the set of counting problems associated with NP decision problems [5]. Other #P-complete problems include counting Hamiltonian cycles [5] and counting assignments that satisfy a CNF expression [12].

Ryser devised an inclusion and exclusion formula for the permanent. Evaluating the formula requires O(2^n poly n) time and O(poly n) space. The finite-difference formula for the permanent is a generalization of the inclusion and exclusion formula [1]. The finite-difference formula has some free parameters which can be chosen to produce permanent formulas that have many zero-valued terms. These terms can be collected to reduce computation [3].

In this paper, we review the finite-difference formula. Then, we use a permanent decomposition identity to introduce new parameters that can be chosen to produce more zero-valued terms. We exhibit a method to choose the parameters that yields formulas in which the expected fraction of nonzero-valued terms is

$$O\!\left(\exp\!\left(-n^{\frac{1}{4}} \frac{1}{\sqrt{2\pi e}}\right)\right) \tag{1}$$

Also, we present an algorithm to compute the permanent that has expected running time

$$O\!\left(\exp\!\left[-\frac{n^{\frac{1}{4}}}{4 \log_2 n} \frac{1}{\sqrt{2\pi e}}\right] 2^n\, \mathrm{poly}\, n\right) \tag{2}$$

for random 0-1 matrices.
This is a substantial increase in speed over the O(2n poly n) time required for Ryser's inclusion and exclusion algorithm [11]. Unfortunately, the algorithm requires super-polynomial space. We outline methods to vary the algorithm to trade space for time.
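Ryser's formula, per A = (−1)^n Σ_{S ⊆ {1,…,n}} (−1)^{|S|} ∏_i Σ_{j∈S} a_{ij}, is the baseline throughout this paper. The sketch below (an illustrative transcription under our own naming; it omits the Gray-code term reuse that an optimized O(2^n poly n) implementation would use) evaluates it and checks it against the definition of the permanent:

```python
from itertools import permutations
from math import prod

def ryser_permanent(a):
    """Ryser's inclusion-exclusion formula:
    per A = (-1)^n * sum over nonempty column subsets S of
            (-1)^|S| * prod_i (sum_{j in S} a[i][j])."""
    n = len(a)
    total = 0
    for mask in range(1, 1 << n):                    # each nonempty subset S
        sign = (-1) ** bin(mask).count("1")          # (-1)^|S|
        total += sign * prod(
            sum(row[j] for j in range(n) if mask >> j & 1) for row in a)
    return (-1) ** n * total

def brute_permanent(a):
    """The permanent by its definition, for checking small cases."""
    n = len(a)
    return sum(prod(a[i][s[i]] for i in range(n))
               for s in permutations(range(n)))
```

For example, for the 3 × 3 matrix [[1,1,0],[0,1,1],[1,1,1]], both functions return 3.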


2 A Finite-Difference Formula for the Permanent

2.1 Finite Differences and Multilinear Terms

Let P(·) be a polynomial in variables x_1, …, x_n with all terms of degree n or less. Define the finite-difference operator with respect to x_j as follows:

$$D_j(u_j, v_j)\, P(\cdot) = \frac{P(\cdot)|_{x_j = u_j} - P(\cdot)|_{x_j = v_j}}{u_j - v_j} \quad \text{for } u_j \neq v_j \tag{3}$$

where constants u_j and v_j are the finite-difference parameters. Abbreviate D_j(u_j, v_j) by D_j. The finite-difference operator D_j acts like the derivative with respect to x_j. Note that D_j x_j = 1 and D_j c = 0 for c constant with respect to x_j. Also, note that the finite-difference operators are linear, and D_j D_i = D_i D_j. Apply the finite-difference operators with respect to each variable to P(·) to form the expression:

$$D_1 \cdots D_n P(\cdot). \tag{4}$$

Expand P(·). Let c_{p_1,…,p_n} be the coefficient of term x_1^{p_1} ⋯ x_n^{p_n}. Then

$$D_1 \cdots D_n P(\cdot) = D_1 \cdots D_n \sum_{p_1 + \ldots + p_n \le n} c_{p_1,\ldots,p_n}\, x_1^{p_1} \cdots x_n^{p_n}. \tag{5}$$

Since finite differences are linear operators,

$$D_1 \cdots D_n P(\cdot) = \sum_{p_1 + \ldots + p_n \le n} c_{p_1,\ldots,p_n}\, D_1 \cdots D_n\, x_1^{p_1} \cdots x_n^{p_n}. \tag{6}$$

For the multilinear term, D_1 ⋯ D_n x_1 ⋯ x_n = 1. Since each term has degree n or less, each other term lacks some x_j (by the pigeonhole principle). The corresponding D_j kills the term. So the finite-difference operators yield the coefficient of the multilinear term:

$$D_1 \cdots D_n P(\cdot) = c_{1,\ldots,1}. \tag{7}$$

2.2 The Permanent as a Multilinear Term

Define a variable matrix A(x) corresponding to the n × n matrix A as follows: [A(x)]_{ij} = [A]_{ij} x_j, i.e., multiply each column by x_j. Define P(x) to be the product of row sums of A(x).

$$P(x) = \prod_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_j = \sum_{(j_1,\ldots,j_n) \in \{1,\ldots,n\}^n} a_{1 j_1} \cdots a_{n j_n}\, x_{j_1} \cdots x_{j_n}. \tag{8}$$

Each term in the expression on the right has one entry from each row, but it may not have exactly one entry from each column. The exponent of variable x_j is the number of entries from column j. The terms with one entry from each column are the permanent terms, so the permanent is the coefficient of the multilinear term of P(x):

$$\operatorname{per} A = D_1 \cdots D_n P(x). \tag{9}$$

2.3 Formula

Expanding the finite-difference operators produces the following formula:

$$\operatorname{per} A = \frac{1}{(u_1 - v_1) \cdots (u_n - v_n)} \sum_{x \in \{u_1, v_1\} \times \cdots \times \{u_n, v_n\}} (-1)^{s(x)} P(x) \tag{10}$$

where s(x) is the number of variables x_j set to v_j. (For a more detailed derivation of this formula, refer to [1].)
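Formula (10) transcribes directly into code. The following sketch (our own illustration; the function names are ours) evaluates it for arbitrary parameters u_j ≠ v_j and checks the result against the definition of the permanent on a small matrix:

```python
from itertools import permutations, product
from math import prod

def fd_permanent(a, u, v):
    """Evaluate formula (10):
    per A = (1 / prod_j (u_j - v_j)) *
            sum over x in {u_1,v_1} x ... x {u_n,v_n} of (-1)^{s(x)} P(x),
    where P(x) is the product of row sums of A(x) and s(x) is the
    number of coordinates of x set to v_j."""
    n = len(a)
    total = 0
    for picks in product((0, 1), repeat=n):           # 0 -> u_j, 1 -> v_j
        x = [v[j] if pick else u[j] for j, pick in enumerate(picks)]
        sign = (-1) ** sum(picks)                     # (-1)^{s(x)}
        p = prod(sum(a[i][j] * x[j] for j in range(n)) for i in range(n))
        total += sign * p
    return total / prod(u[j] - v[j] for j in range(n))

def brute_permanent(a):
    n = len(a)
    return sum(prod(a[i][s[i]] for i in range(n))
               for s in permutations(range(n)))
```

With u_j = 1, v_j = −1 this reproduces formula (11) below; with the alternating parameters (1, 0), (0, −1), (1, 0), … of Section 3 it yields the same value through different terms.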


3 Zero-Valued Terms

Consider the center expression in (8). Note that P(x) has value zero when any row sum of A(x) is zero, because P(x) is the product of row sums. If P(x) is zero, then the corresponding term of (10) is zero. Some choices of finite-difference parameters induce many zero-valued terms. Consider the setting u_j = 1 and v_j = −1 for all j. Formula (10) becomes

$$\operatorname{per} A = \frac{1}{2^n} \sum_{x \in \{1,-1\}^n} (-1)^{s(x)} P(x). \tag{11}$$

Assume the entries of A are determined randomly and independently, with each entry taking value one with probability p and value zero with probability q = 1 − p. Assume 0 < p < 1. The expected fraction of nonzero-valued terms in (11) is

$$\frac{1}{2^n} \sum_{x \in \{1,-1\}^n} \prod_{i=1}^{n} \left[1 - \Pr\{a_{i1} x_1 + \ldots + a_{in} x_n = 0\}\right]. \tag{12}$$

Collect terms, letting k be the number of entries in x assigned positive one. Then the fraction is

$$\sum_{k=0}^{n} \frac{1}{2^n} \binom{n}{k} \prod_{i=1}^{n} \left[1 - \Pr\{(a_{i1} + \ldots + a_{ik}) - (a_{i,k+1} + \ldots + a_{in}) = 0\}\right]. \tag{13}$$

Given k, the row sums are independent and identically distributed. So the fraction is

$$\sum_{k=0}^{n} \frac{1}{2^n} \binom{n}{k} \left[1 - \Pr\{(a_1 + \ldots + a_k) - (a_{k+1} + \ldots + a_n) = 0\}\right]^n \tag{14}$$

where a_j is a random variable that has value one with probability p and zero with probability q = 1 − p. The expected fraction of nonzero-valued terms can be computed exactly [3]. In this paper, we focus on the asymptotic form as n increases. We use some results from probability theory [4]. Approximate the leading binomial distribution $\frac{1}{2^n}\binom{n}{k}$ by the normal distribution $\frac{2}{\sqrt{n}}\, g(\frac{2}{\sqrt{n}}(k - \frac{n}{2}))$, where $g(z) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}z^2}$. Also, approximate the row sum by a normal distribution with mean (2k − n)p and variance npq. (For detailed justification and analysis of these approximations, see [1].) With these approximations, the expected fraction of nonzero-valued terms is

$$\sum_{k=0}^{n} \frac{2}{\sqrt{n}}\, g\!\left(\frac{2k - n}{\sqrt{n}}\right) \left[1 - \frac{1}{\sqrt{npq}}\, g\!\left(\frac{(2k - n)p}{\sqrt{npq}}\right)\right]^n \tag{15}$$

Let t_k = (2k − n)/√n and Δt = 2/√n. Approximate the sum by an integral. The previous expression is

$$\int_{-\infty}^{\infty} g(t) \left[1 - \frac{1}{\sqrt{npq}}\, g\!\left(\sqrt{\frac{p}{q}}\, t\right)\right]^n dt \tag{16}$$

In [1], it is shown that this expression is $O((n^{q/p} \log n)^{-\frac{1}{2}})$.

In [3], it is shown that even more zero-valued terms are produced by introducing zeros in the finite-difference parameters and alternating positive and negative ones as follows: (u_1, v_1) = (1, 0), (u_2, v_2) = (0, −1), (u_3, v_3) = (1, 0), etc.

The row sums are correlated. For x assignments with about half the entries positive and about half negative, there are likely to be several rows with sum zero. For x assignments with almost all entries positive (or negative), it is likely that no row has sum zero. When one row sum is zero, it is likely that several others are zero as well. Since only one zero row sum is needed to zero the term, the extra zero row sums are "wasted". If the row zeroings were independent, there would be many more zero-valued terms, as we show in the following analytic sketch.

For simplicity, assume n is a multiple of 4, A has n/2 "even" rows with n/2 one-valued entries each, and the other rows have odd numbers of one-valued entries so they are never zeroed. Each even row has sum zero when half the entries of x that correspond to one-valued entries in the row are assigned +1. The fraction of assignments that zero each even row is

$$\frac{1}{2^{n/2}} \binom{n/2}{n/4} \approx \sqrt{\frac{4}{\pi n}} \approx \frac{1}{\sqrt{n}} \tag{17}$$

by Stirling's formula [4]. So each even row has a nonzero sum in about 1 − 1/√n of the assignments. If the row zeroings were independent, then the fraction of nonzero-valued terms would be about

$$\left(1 - \frac{1}{\sqrt{n}}\right)^{\frac{n}{2}} \tag{18}$$

To find the fraction as n → ∞, take the natural logarithm, approximate by a Taylor series expansion about 1, and exponentiate. The fraction of nonzero-valued terms would be about

$$\approx e^{-\frac{1}{2}\sqrt{n} - \frac{1}{4}} \tag{19}$$

In the next section, we show how to reduce correlations among row sums to achieve an exponentially small fraction of nonzero-valued terms.


4 Decomposition

Note that the permanent is linear in its columns. For example, if we define A_a as the matrix A with the first column replaced by vector a, then

$$\operatorname{per} A_{a+b} = \operatorname{per} A_a + \operatorname{per} A_b \tag{20}$$

Set a to the first column of A to make A_a = A. Then we have a formula for the permanent of A:

$$\operatorname{per} A = \operatorname{per} A_{a+b} - \operatorname{per} A_b \tag{21}$$

We refer to b as the decomposition vector. To compute per A, we can use formula (21) to compute the permanents of A_{a+b} and A_b. On the face of it, this doubles the computation. However, we can choose the decomposition vector to decrease the expected fraction of nonzero-valued terms.

Consider the worst case for correlations among row zeroings, when A is the matrix with all entries one-valued. For each assignment x, all row sums in A(x) are equal. Using the formula with finite-difference parameters (u_1, v_1) = (1, 0), (u_2, v_2) = (0, −1), (u_3, v_3) = (1, 0), etc., the row sums are zero for O(1/√n) of the assignments x. So the fraction of nonzero-valued terms is 1 − O(1/√n).

We will now use decomposition to compute per A. For simplicity, assume n is odd. Use finite-difference parameters u_1 = 1, v_1 = −1; for odd j > 1, u_j = 1 and v_j = 0; for even j, u_j = 0 and v_j = −1. Let b = (−(n−1)/2, …, (n−1)/2). The row sums over columns 2 to n,

$$\sum_{j=2}^{n} a_{ij} x_j, \tag{22}$$

vary from −(n−1)/2 to (n−1)/2.

In the computation of the permanent of A_b, the first column, x_1 b, has entries −(n−1)/2 to (n−1)/2. Hence, each row sum over columns 2 to n is cancelled by some element in the first column. So every term has value zero.

In the computation of the permanent of A_{a+b}, the first column has entries −(n−1)/2 + 1 to (n−1)/2 + 1 if x_1 = +1 and entries −(n−1)/2 − 1 to (n−1)/2 − 1 if x_1 = −1. For x_1 = +1, the term with partial row sums −(n−1)/2 over columns 2 to n is the only nonzero-valued term. Likewise, for x_1 = −1, the term with partial row sums (n−1)/2 is the only nonzero-valued term. So the computation has only 2 nonzero-valued terms!

As illustrated in the example, decomposition can significantly increase the fraction of zero-valued terms for matrices with many one-valued entries. In general, using a decomposition vector with a variety of entries increases the variety among row sums. Let Z_i be the set of terms that produce a zero row sum in row i. The goal of decomposition is to enlarge the union of these sets, Z_1 ∪ … ∪ Z_n. Using decomposition increases the fraction of zero-valued terms if it reduces the intersections among sets without shrinking the sets too much. In this paper, we develop and analyze a simple strategy for selecting decomposition vector entries: choosing them at random.
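Identity (21) is easy to check numerically. The sketch below (our own illustration; the helper names are hypothetical) replaces the first column of a random 0-1 matrix by a + b and by b and verifies per A = per A_{a+b} − per A_b using the definition of the permanent:

```python
from itertools import permutations
from math import prod
import random

def brute_permanent(a):
    n = len(a)
    return sum(prod(a[i][s[i]] for i in range(n))
               for s in permutations(range(n)))

def replace_first_column(a, col):
    """Copy of A with its first column replaced by `col`."""
    return [[col[i]] + row[1:] for i, row in enumerate(a)]

random.seed(0)
n = 5
a = [[random.randint(0, 1) for _ in range(n)] for _ in range(n)]
first = [row[0] for row in a]                    # the vector a of identity (21)
b = [random.randint(-3, 3) for _ in range(n)]    # any decomposition vector

per_a = brute_permanent(a)
per_ab = brute_permanent(replace_first_column(a, [f + bi for f, bi in zip(first, b)]))
per_b = brute_permanent(replace_first_column(a, b))
print(per_a == per_ab - per_b)    # True: per A = per A_{a+b} - per A_b
```

The identity holds exactly for any choice of b because the permanent is multilinear in the columns.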

4.1 A Random Decomposition Strategy to Increase Row Sum Variance

Each row sum in matrix A_{a+b}(x) is the sum of the row sum in the original matrix A(x) and x_1 times the decomposition vector entry for the row. Each row sum in matrix A_b(x) is the sum of the partial row sum in A(x), neglecting the first entry, and x_1 times the decomposition vector entry for the row. If we select decomposition vector entries independently at random from an integer-valued distribution that takes the shape of a zero-mean Gaussian as n increases, then the distributions (over random 0-1 matrices and random decomposition vectors) of row sums of A_{a+b}(x) take the shape of Gaussians with the variance of the row sums of the original matrix A(x) plus the variance of the decomposition vector entry distribution. The row sums of A_b(x) have similar distributions. Hence, this decomposition strategy increases the variance of row sums. To specify our strategy in more detail, let us generate the entries of the decomposition vector by summing over i.i.d. Bernoulli variables:

$$b_i = (d_{i1} + \ldots + d_{im}) - (d_{i,m+1} + \ldots + d_{i,2m}) \quad \forall i \in \{1, \ldots, n\} \tag{23}$$

where each d_{ij} has value one with probability w and value zero with probability 1 − w, and m and w are values that we will choose. For matrix A_{a+b}(x), the row sums have the form:

$$[(d_{i1} + \ldots + d_{im}) - (d_{i,m+1} + \ldots + d_{i,2m})] x_1 + a_{i1} x_1 + \ldots + a_{in} x_n \tag{24}$$

Whether x_1 is assigned positive one or negative one, the term

$$[(d_{i1} + \ldots + d_{im}) - (d_{i,m+1} + \ldots + d_{i,2m})] x_1 \tag{25}$$

has mean zero and variance 2mw(1 − w). If k variables x_j are assigned positive one, then the sum of the remaining terms, e.g.,

$$a_{i1} + \ldots + a_{ik} - a_{i,k+1} - \ldots - a_{in}, \tag{26}$$

has mean (2k − n)p and variance npq. The entire row sum has mean (2k − n)p and variance npq + 2mw(1 − w). As n → ∞, the row sum distribution takes the shape of a Gaussian with mean (2k − n)p and variance npq + 2mw(1 − w). We can produce any variance greater than npq through the choice of m and w. For the remainder of this exposition, let w = p. Then the variance is (n + 2m)pq. We will control the variance through the choice of m.

For the computation of the permanent of A_{a+b}, the expected fraction of nonzero-valued terms is

$$\frac{1}{2^n} \sum_{x \in \{1,-1\}^n} \prod_{i=1}^{n} \left[1 - \Pr\{[(d_{i1} + \ldots + d_{im}) - (d_{i,m+1} + \ldots + d_{i,2m})] x_1 + a_{i1} x_1 + \ldots + a_{in} x_n = 0\}\right] \tag{27}$$

Note that the distribution of [(d_{i1} + … + d_{im}) − (d_{i,m+1} + … + d_{i,2m})] x_1 is the same whether x_1 is positive one or negative one, since the distribution of (d_{i1} + … + d_{im}) − (d_{i,m+1} + … + d_{i,2m}) is symmetric about zero. Collect terms, letting k be the number of entries in x assigned positive one. Then the expected fraction of nonzero-valued terms is

$$\sum_{k=0}^{n} \frac{1}{2^n} \binom{n}{k} \prod_{i=1}^{n} \left[1 - \Pr\{(d_{i1} + \ldots + d_{im}) - (d_{i,m+1} + \ldots + d_{i,2m}) + (a_{i1} + \ldots + a_{ik}) - (a_{i,k+1} + \ldots + a_{in}) = 0\}\right] \tag{28}$$

Given k, the row sums are independent and identically distributed. So the expected fraction of nonzero-valued terms is

$$\sum_{k=0}^{n} \frac{1}{2^n} \binom{n}{k} \left[1 - \Pr\{(d_1 + \ldots + d_m) - (d_{m+1} + \ldots + d_{2m}) + (a_1 + \ldots + a_k) - (a_{k+1} + \ldots + a_n) = 0\}\right]^n \tag{29}$$

where each a_j is a random variable that has value one with probability p and zero with probability q = 1 − p, and each d_j is a random variable that has value one with probability p and zero with probability q = 1 − p.

The row sums of A_b(x) are similar to the row sums of A_{a+b}(x). They have the form (24), except that the term a_{i1} x_1 is replaced by zero. Because of this, the asymptotic expected fraction of nonzero-valued terms in the computation of the permanent of A_b is slightly less than the expected fraction of nonzero-valued terms in the computation of the permanent of A_{a+b}. Hence, we focus on the fraction for A_{a+b} and use the result as a bound on the fraction for A_b.

4.2 Variance Increased by a Constant Multiplier

Consider the asymptotic form of the expected fraction of nonzero-valued terms (29) in the computation of the permanent of A_{a+b} when we select the number m to create row sum variances (n + 2m)pq = cn, where c is constant with respect to n. (Without decomposition, c = pq; with decomposition, c > pq.) Examine (29). Note that the bracketed expression is in [0, 1] for all k. So the term corresponding to k is no greater than $\frac{1}{2^n}\binom{n}{k}$, which is the probability that the sum of n i.i.d. Bernoulli variables is k, given that each variable takes value one with probability 1/2 and zero with probability 1/2. According to Feller [4], p. 193, (6.7),

$$\sum_{|k - \frac{n}{2}| > 2n^{0.64}} \frac{1}{2^n} \binom{n}{k} \le \frac{1}{2\sqrt{2\pi}\, n^{0.14}} \exp\!\left(-\frac{1}{2} n^{0.28}\right) \tag{30}$$

So we introduce error o(exp(−½ n^{0.28})) by restricting the sum to values of k such that |k − n/2| ≤ 2n^{0.64}. According to Feller [4], p. 184, (3.13),

$$\frac{1}{2^n} \binom{n}{k} \approx \frac{2}{\sqrt{n}}\, g\!\left(\frac{2k - n}{\sqrt{n}}\right) \quad \text{for } |k - \tfrac{n}{2}| \le 2n^{0.64} \tag{31}$$

where g(·) is the standard normal density, i.e., g(z) = (1/√(2π)) exp(−z²/2). Note that the sum of random variables in (29),

$$(d_1 + \ldots + d_m) - (d_{m+1} + \ldots + d_{2m}) + (a_1 + \ldots + a_k) - (a_{k+1} + \ldots + a_n), \tag{32}$$

is the difference of sums over two sets of i.i.d. Bernoulli variables:

$$(d_1 + \ldots + d_m + a_1 + \ldots + a_k) - (d_{m+1} + \ldots + d_{2m} + a_{k+1} + \ldots + a_n). \tag{33}$$

In [1], it is shown that the distribution of this difference can be asymptotically approximated by a Gaussian with the same mean and variance as the difference:

$$\Pr\{(d_1 + \ldots + d_m) - (d_{m+1} + \ldots + d_{2m}) + (a_1 + \ldots + a_k) - (a_{k+1} + \ldots + a_n) = 0\} \tag{34}$$

$$\approx \frac{1}{\sqrt{(n + 2m)pq}}\, g\!\left(\frac{(2k - n)p}{\sqrt{(n + 2m)pq}}\right) \tag{35}$$

for |k − n/2| ≤ 2n^{0.64}. Using tail bound (30) and asymptotic forms (31) and (35) in (29) gives the following expression for the asymptotic expected fraction of nonzero-valued terms:

$$\sum_{|k - \frac{n}{2}| \le 2n^{0.64}} \frac{2}{\sqrt{n}}\, g\!\left(\frac{2k - n}{\sqrt{n}}\right) \left[1 - \frac{1}{\sqrt{cn}}\, g\!\left(\frac{(2k - n)p}{\sqrt{cn}}\right)\right]^n + o\!\left(\exp\!\left(-\tfrac{1}{2} n^{0.28}\right)\right) \tag{36}$$

Since our result will dominate the error term o(exp(−½ n^{0.28})), we disregard it in the remaining analysis. Let t_k = (2k − n)/√n and Δt = 2/√n. Then the asymptotic expected fraction is

$$\sum_{|t_k| \le 4n^{0.14}} \Delta t\, g(t_k) \left[1 - \frac{1}{\sqrt{cn}}\, g\!\left(\frac{p\, t_k}{\sqrt{c}}\right)\right]^n \tag{37}$$

Observe that

$$\left[1 - \frac{1}{\sqrt{cn}}\, g\!\left(\frac{pt}{\sqrt{c}}\right)\right]^n \approx \exp\!\left(-\sqrt{\frac{n}{c}}\, g\!\left(\frac{pt}{\sqrt{c}}\right)\right) \exp\!\left(-\frac{1}{2c}\, g^2\!\left(\frac{pt}{\sqrt{c}}\right)\right) \tag{38}$$

The second exponential is no greater than one. So the asymptotic form of the expected fraction of nonzero-valued terms is

$$\le \sum_{|t_k| \le 4n^{0.14}} \Delta t\, g(t) \exp\!\left(-\sqrt{\frac{n}{c}}\, g\!\left(\frac{pt}{\sqrt{c}}\right)\right) \tag{39}$$

In [1], it is shown that this sum has the same asymptotic form as the following integral:

$$\int_{-\infty}^{\infty} g(t) \exp\!\left(-\sqrt{\frac{n}{c}}\, g\!\left(\frac{pt}{\sqrt{c}}\right)\right) dt \tag{40}$$

By symmetry, this is equal to

$$2 \int_{0}^{\infty} g(t) \exp\!\left(-\sqrt{\frac{n}{c}}\, g\!\left(\frac{pt}{\sqrt{c}}\right)\right) dt \tag{41}$$

Let r = c/p² and λ = √(n/(2πc)), so that √(n/c) g(pt/√c) = λ exp(−t²/(2r)). Make the change of variable w = exp(−t²/(2r)). Then we have the following bound for the asymptotic expected fraction of nonzero-valued terms:

$$\sqrt{\frac{r}{\pi}} \int_{0}^{1} w^{r-1} \left(\log \frac{1}{w}\right)^{-\frac{1}{2}} e^{-\lambda w}\, dw \tag{42}$$

This integral has asymptotic form [1]:

$$\sqrt{\frac{r}{\pi}}\, \Gamma(r)\, \lambda^{-r} (\log \lambda)^{-\frac{1}{2}} \tag{43}$$

So the asymptotic expected fraction of nonzero-valued terms is

$$O\!\left(\frac{1}{\sqrt{n^{c/p^2} \log n}}\right) \tag{44}$$

For n → ∞ and c constant with respect to n, increasing row sum variance cn through decomposition decreases the expected fraction of nonzero-valued terms.

4.3 Variance Increased by a Factor of n: Too Much Variance

There is a limit to how much the expected fraction of nonzero-valued terms can be decreased by increasing row sum variance through decomposition. In our previous analysis, c is constant with respect to n. Suppose instead we increase the variance multiplier with n, i.e., c = n, so that variance (n + 2m)pq = n². Examine expression (29), the expected fraction of nonzero-valued terms. For simplicity, assume n is even. Note that the expression in brackets, which is the probability of a nonzero row sum, is minimum when k = n/2, i.e., when half the variables in x are assigned positive one and half are assigned negative one. Hence, the expected fraction of nonzero-valued terms is

$$\ge \sum_{k=0}^{n} \frac{1}{2^n} \binom{n}{k} \left[1 - \Pr\{(d_1 + \ldots + d_m) - (d_{m+1} + \ldots + d_{2m}) + (a_1 + \ldots + a_{\frac{n}{2}}) - (a_{\frac{n}{2}+1} + \ldots + a_n) = 0\}\right]^n \tag{45}$$

Since k does not appear in the bracketed term, and $\frac{1}{2^n}\binom{n}{k}$ is a distribution, we can move the bracketed expression to the left of the sum and substitute one for the sum over the distribution. So the previous expression is equal to

$$\left[1 - \Pr\{(d_1 + \ldots + d_m) - (d_{m+1} + \ldots + d_{2m}) + (a_1 + \ldots + a_{\frac{n}{2}}) - (a_{\frac{n}{2}+1} + \ldots + a_n) = 0\}\right]^n \tag{46}$$

Use (35) to derive the following asymptotic form for this lower bound on the expected fraction of nonzero-valued terms. Remember that (n + 2m)pq = n².

$$\left[1 - \frac{1}{n}\, g(0)\right]^n \to \exp(-g(0)) = \exp\!\left(-\frac{1}{\sqrt{2\pi}}\right) \text{ as } n \to \infty \tag{47}$$

Hence, the expected fraction of nonzero-valued terms does not even go to zero as n → ∞. This is worse than not using decomposition at all.

4.4 Variance Increased by a Factor of √n: An Exponentially Small Fraction of Nonzero-Valued Terms

Suppose we choose m to make the variance (n + 2m)pq = n√n. Examine (29). Use (30) and (35) to derive the following expression for the asymptotic expected fraction of nonzero-valued terms:

$$\sum_{|k - \frac{n}{2}| \le 2n^{0.64}} \frac{1}{2^n} \binom{n}{k} \left[1 - \frac{1}{\sqrt{n\sqrt{n}}}\, g\!\left(\frac{(2k - n)p}{\sqrt{n\sqrt{n}}}\right)\right]^n + o\!\left(\exp\!\left(-\tfrac{1}{2} n^{0.28}\right)\right) \tag{48}$$

Since our result will dominate the error term o(exp(−½ n^{0.28})), we disregard it in the remaining analysis. Note that √(n√n) = n^{0.75}. The subtracted term in brackets is minimum for the largest deviations of k from n/2, i.e., |k − n/2| = 2n^{0.64}, so the bracketed expression is maximum there. So the asymptotic form is at most

$$\sum_{|k - \frac{n}{2}| \le 2n^{0.64}} \frac{1}{2^n} \binom{n}{k} \left[1 - \frac{1}{n^{0.75}}\, g\!\left(\frac{(2 \cdot 2n^{0.64})\, p}{n^{0.75}}\right)\right]^n \tag{49}$$

The bracketed term is independent of k, and the sum of $\frac{1}{2^n}\binom{n}{k}$ over the range of k is a partial distribution. So the expression is

$$\le \left[1 - \frac{1}{n^{0.75}}\, g\!\left(\frac{4p\, n^{0.64}}{n^{0.75}}\right)\right]^n = \left[1 - \frac{1}{n^{0.75}}\, g\!\left(\frac{4p}{n^{0.11}}\right)\right]^n \tag{50}$$

As n → ∞, 4p/n^{0.11} → 0. Bound this fraction by one to find an upper bound for the previous expression (since g is decreasing on [0, ∞), g(4p/n^{0.11}) ≥ g(1) once 4p/n^{0.11} ≤ 1):

$$\le \left[1 - \frac{1}{n^{0.75}}\, g(1)\right]^n \tag{51}$$

To find the asymptotic behavior of this expression, let x = −n^{−0.75} g(1). Then

$$\left[1 - \frac{1}{n^{0.75}}\, g(1)\right]^n = (1 + x)^n \tag{52}$$

Exponentiate and take the natural logarithm:

$$(1 + x)^n = \exp[\ln(1 + x)^n] = \exp[n \ln(1 + x)] \tag{53}$$

Take the Taylor series expansion for ln(1 + x):

$$\ln(1 + x) = \ln(1) + \frac{x}{1!} \ln'(1) + \frac{x^2}{2!} \ln''(1) + \frac{x^3}{3!} \ln'''(1) + \ldots \tag{54}$$

Note that ln′(y) = 1/y, ln″(y) = −1/y², ln‴(y) = 2/y³, and ln^{(k)}(y) = (−1)^{k+1} (k − 1)!/y^k. So

$$\ln(1 + x) = 0 + \frac{x}{1!}\left(\frac{1}{1}\right) + \frac{x^2}{2!}(-1) + \frac{x^3}{3!}(2) + \ldots \tag{55}$$

$$\ln(1 + x) = x - \frac{1}{2} x^2 + \frac{1}{3} x^3 - \ldots \tag{56}$$

So we have

$$\exp[n \ln(1 + x)] = \exp\!\left[n\left(x - \frac{1}{2} x^2 + \frac{1}{3} x^3 - \ldots\right)\right] \tag{57}$$

and

$$(1 + x)^n = \exp\!\left[n\left(x - \frac{1}{2} x^2 + \frac{1}{3} x^3 - \ldots\right)\right] \tag{58}$$

Undo the substitution x = −n^{−0.75} g(1):

$$\left[1 - \frac{1}{n^{0.75}}\, g(1)\right]^n = \exp\!\left[n\left(-\frac{g(1)}{n^{0.75}} - \frac{g^2(1)}{2 n^{1.50}} - \frac{g^3(1)}{3 n^{2.25}} - \ldots\right)\right] \tag{59}$$

$$= \exp\!\left[-n^{0.25}\, g(1) - \frac{g^2(1)}{2 n^{0.50}} - \frac{g^3(1)}{3 n^{1.25}} - \ldots\right] \tag{60}$$

$$= \exp[-n^{0.25}\, g(1)]\, \exp\!\left[-\frac{g^2(1)}{2 n^{0.50}}\right] \exp\!\left[-\frac{g^3(1)}{3 n^{1.25}}\right] \cdots \tag{61}$$

In all but the first exponential, the exponents go to zero as n → ∞, so these exponentials go to one. Hence, the first exponential determines the asymptotic form. Since g(1) = 1/√(2πe), the asymptotic expected fraction of nonzero-valued terms is

$$O\!\left(\exp\!\left(-n^{\frac{1}{4}} \frac{1}{\sqrt{2\pi e}}\right)\right) \tag{62}$$
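The dominance of the first exponential in (61) can be seen numerically. The following quick check (our own, not from the paper) compares [1 − n^{−0.75} g(1)]^n with the leading factor exp(−n^{0.25} g(1)) for growing n:

```python
from math import exp, sqrt, pi

g1 = exp(-0.5) / sqrt(2 * pi)     # g(1) = 1/sqrt(2*pi*e), about 0.242

for n in (10**2, 10**4, 10**6):
    exact = (1.0 - g1 / n**0.75) ** n      # left side of (59)
    leading = exp(-g1 * n**0.25)           # first factor in (61)
    print(n, exact / leading)              # ratio tends to 1 as n grows
```

The ratio moves toward one as n increases, consistent with the remaining factors in (61) tending to one.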


5 A Method to Compute the Permanent that Avoids Many Zero-Valued Terms

5.1 Alternative Computation of the Permanent Formula

Recall the permanent formula, i.e., formula (10) with parameters u_j = 1 and v_j = −1:

$$\operatorname{per} A = \frac{1}{2^n} \sum_{x \in \{-1,1\}^n} (-1)^{s(x)} \prod_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_j \tag{63}$$

where s(x) is the number of variables x_j assigned −1. The straightforward method to compute this formula is to step through assignments x, compute the signed product of row sums for each assignment, and sum these values over assignments. We use an alternative method to compute the formula in order to avoid computing many zero-valued terms. Focus on the product of row sums:

$$\prod_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_j \tag{64}$$

View vector x as the concatenation of "half vectors" y and z, with n/2 entries each. (To simplify the exposition, assume n is even.) For a given assignment x = (y_1, …, y_{n/2}, z_1, …, z_{n/2}), suppose we have already computed the row sums over the first half of the columns of A(x):

$$c_i(y) = \sum_{j=1}^{\frac{n}{2}} a_{ij} y_j \tag{65}$$

and suppose we have already computed the row sums over the second half of the columns of A(x):

$$d_i(z) = \sum_{j=\frac{n}{2}+1}^{n} a_{ij} z_{j - \frac{n}{2}} \tag{66}$$

Then we can compute the product of row sums by taking the product of the sums of the half-row sums:

$$\prod_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_j = \prod_{i=1}^{n} (c_i(y) + d_i(z)) \tag{67}$$

If we define s(y) to be the number of variables y_j assigned −1 and define s(z) to be the number of variables z_j assigned −1, then (−1)^{s(x)} = (−1)^{s(y)} (−1)^{s(z)}. For notational convenience, define c_0(y) = (−1)^{s(y)} and d_0(z) = (−1)^{s(z)}. Then

$$(-1)^{s(x)} = c_0(y)\, d_0(z) \tag{68}$$

Note that the space {−1,1}^n in formula (63) is the cross product {−1,1}^{n/2} × {−1,1}^{n/2}. Hence, the assignments x ∈ {−1,1}^n are equivalent to the assignments (y, z) ∈ {−1,1}^{n/2} × {−1,1}^{n/2}. We can use (67) and (68) to create a formula equivalent to (63):

$$\operatorname{per} A = \frac{1}{2^n} \sum_{(y,z) \in \{-1,1\}^{\frac{n}{2}} \times \{-1,1\}^{\frac{n}{2}}} c_0(y)\, d_0(z) \prod_{i=1}^{n} (c_i(y) + d_i(z)) \tag{69}$$

To compute this formula, we can first compute and store all half-row sums:

$$C = \{c(y) \mid y \in \{-1,1\}^{\frac{n}{2}}\} \quad \text{and} \quad D = \{d(z) \mid z \in \{-1,1\}^{\frac{n}{2}}\} \tag{70}$$

Then the formula becomes

$$\operatorname{per} A = \frac{1}{2^n} \sum_{(c,d) \in C \times D} c_0\, d_0 \prod_{i=1}^{n} (c_i + d_i) \tag{71}$$

Now, suppose we partition the sets of half-row-sum vectors according to the values in entries 1, …, m. (These are the sums over halves of the columns of A(x) for the first m rows. We will specify the value of m later.) Denote the partitions of C as follows:

$$C(s) = \{c \in C \mid c_1 = s_1 \text{ and } \ldots \text{ and } c_m = s_m\} \quad \forall\, s \in \{-\tfrac{n}{2}, \ldots, \tfrac{n}{2}\}^m \tag{72}$$

Denote the partitions of D as follows:

$$D(t) = \{d \in D \mid d_1 = t_1 \text{ and } \ldots \text{ and } d_m = t_m\} \quad \forall\, t \in \{-\tfrac{n}{2}, \ldots, \tfrac{n}{2}\}^m \tag{73}$$

Introducing the partitions into formula (71) produces:

$$\operatorname{per} A = \frac{1}{2^n} \sum_{(s,t) \in \{-\frac{n}{2},\ldots,\frac{n}{2}\}^m \times \{-\frac{n}{2},\ldots,\frac{n}{2}\}^m} \;\sum_{(c,d) \in C(s) \times D(t)} c_0\, d_0 \prod_{i=1}^{n} (c_i + d_i) \tag{74}$$

All c ∈ C(s) have values s_1, …, s_m in the first m entries. All d ∈ D(t) have values t_1, …, t_m in the first m entries. So these values are constant within the second sum, and they can be moved outside the sum:

$$\operatorname{per} A = \frac{1}{2^n} \sum_{(s,t) \in \{-\frac{n}{2},\ldots,\frac{n}{2}\}^m \times \{-\frac{n}{2},\ldots,\frac{n}{2}\}^m} \prod_{i=1}^{m} (s_i + t_i) \sum_{(c,d) \in C(s) \times D(t)} c_0\, d_0 \prod_{i=m+1}^{n} (c_i + d_i) \tag{75}$$

The new product is the key to avoiding many zero-valued terms. If s_i + t_i = 0 for some i ∈ {1, …, m}, then the new product is zero, so there is no need to compute the second sum. In terms of the original formula (63), we avoid computing the products of row sums for all assignments x such that A(x) has a zero row sum in the first m rows.


5.2 Algorithm

The algorithm proceeds as follows:

(1) For each y ∈ {−1,1}^{n/2}, compute:

$$c_i(y) = \sum_{j=1}^{\frac{n}{2}} a_{ij} y_j \quad \text{and} \quad c_0(y) = (-1)^{s(y)} \tag{76}$$

(2) Store vectors c(y) as set C.

(3) Partition set C:

$$C(s) = \{c \in C \mid c_1 = s_1 \text{ and } \ldots \text{ and } c_m = s_m\} \text{ for } s \in \{-\tfrac{n}{2}, \ldots, \tfrac{n}{2}\}^m \tag{77}$$

(4) For each z ∈ {−1,1}^{n/2}, compute:

$$d_i(z) = \sum_{j=\frac{n}{2}+1}^{n} a_{ij} z_{j - \frac{n}{2}} \quad \text{and} \quad d_0(z) = (-1)^{s(z)} \tag{78}$$

(5) Store vectors d(z) as set D.

(6) Partition set D:

$$D(t) = \{d \in D \mid d_1 = t_1 \text{ and } \ldots \text{ and } d_m = t_m\} \text{ for } t \in \{-\tfrac{n}{2}, \ldots, \tfrac{n}{2}\}^m \tag{79}$$

(7) For each pair (s, t) ∈ {−n/2, …, n/2}^m × {−n/2, …, n/2}^m, examine ∏_{i=1}^{m} (s_i + t_i). If it is nonzero, then do step (8).

(8) Compute

$$\prod_{i=1}^{m} (s_i + t_i) \sum_{(c,d) \in C(s) \times D(t)} c_0\, d_0 \prod_{i=m+1}^{n} (c_i + d_i) \tag{80}$$

and add the result to the running total.

On completion, the running total, divided by the normalization factor 2^n from (10), is the permanent of A.
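Steps (1)–(8) can be sketched compactly in Python (our own illustration; it keeps each partition in a dictionary keyed by the first m entries of the half-row-sum vector, iterates only over nonempty bucket pairs, and divides by the 2^n normalization of (10) at the end; it favors clarity over the stated time and space bounds):

```python
from itertools import permutations, product
from collections import defaultdict
from math import prod

def halves_permanent(a, m):
    """Steps (1)-(8) with finite-difference parameters (u_j, v_j) = (1, -1).
    Assumes n is even.  Half-row-sum vectors are bucketed by their first m
    entries; bucket pairs (s, t) with prod(s_i + t_i) = 0 are skipped."""
    n = len(a)
    h = n // 2

    def buckets(offset):
        # Steps (1)-(3) for offset 0, steps (4)-(6) for offset n/2:
        # map first-m-entries key -> list of (sign, full half-row-sum vector).
        table = defaultdict(list)
        for assign in product((1, -1), repeat=h):
            sign = (-1) ** assign.count(-1)           # c_0(y) or d_0(z)
            vec = tuple(sum(a[i][offset + j] * assign[j] for j in range(h))
                        for i in range(n))
            table[vec[:m]].append((sign, vec))
        return table

    C = buckets(0)        # left halves y
    D = buckets(h)        # right halves z

    total = 0
    for s, cs in C.items():                           # step (7)
        for t, ds in D.items():
            head = prod(si + ti for si, ti in zip(s, t))
            if head == 0:
                continue      # a zero row sum among the first m rows
            for c0, c in cs:                          # step (8)
                for d0, d in ds:
                    total += c0 * d0 * head * prod(
                        c[i] + d[i] for i in range(m, n))
    return total // 2 ** n    # normalization prod_j (u_j - v_j) = 2^n from (10)

def brute_permanent(a):
    """Check against the definition of the permanent."""
    n = len(a)
    return sum(prod(a[i][s[i]] for i in range(n))
               for s in permutations(range(n)))
```

For example, for the 4 × 4 cyclic 0-1 matrix with ones on the diagonal and superdiagonal (wrapping around), both functions return 2.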


5.3 Analysis

The algorithm requires quite a bit of space to store multisets C and D of vectors c(y) and d(z). Since there are 2^{n/2} assignments each for y and z, the algorithm requires O(2^{n/2} n) space.

Steps (1), (2), and (3) require O(2^{n/2} poly n) time since there are 2^{n/2} assignments y in {−1,1}^{n/2}. Likewise, steps (4), (5), and (6) require O(2^{n/2} poly n) time. Step (7) requires O(n^{2m} poly n) = O(2^{2m log₂ n} poly n) time, since there are (n + 1)^{2m} pairs (s, t) in {−n/2, …, n/2}^m × {−n/2, …, n/2}^m. Step (8) requires O(poly n) time for each assignment x in {−1,1}^n such that A(x) has no zero row sums in the first m rows.

Let f(n, p, m) represent the expected fraction of assignments x for which A(x) has no zero row sums in the first m rows, with the expectation over n × n matrices with each entry one with probability p and zero with probability 1 − p. Then the algorithm has expected running time:

$$\max\left[O(2^{\frac{n}{2}}\, \mathrm{poly}\, n),\; O(2^{2m \log_2 n}\, \mathrm{poly}\, n),\; O(2^n f(n, p, m)\, \mathrm{poly}\, n)\right] \tag{81}$$

If we set m = n/(4 log₂ n), then the center expression is equal to the first expression. In this case, we avoid computing terms with zero row sums in the first n/(4 log₂ n) rows. The running time is:

$$\max\left[O(2^{\frac{n}{2}}\, \mathrm{poly}\, n),\; O(2^n f(n, p, \tfrac{n}{4 \log_2 n})\, \mathrm{poly}\, n)\right] \tag{82}$$

In the previous section, we analyzed f(n, p, n), the expected fraction of assignments x for which A(x) has no zero row sums. Later in this section, we analyze f(n, p, m) for m < n.

5.4 Variations

The same basic steps can be used to construct an algorithm using finite-difference parameters (−1, 0) and (0, 1) instead of (−1, 1). If the first n/2 columns have parameters (0, 1) and the remaining columns have parameters (−1, 0), then the range of each entry in s is {0, …, n/2} and the range of each entry in t is {−n/2, …, 0}. This reduces the computation in step (7) from O(n^{2m} poly n) to O((n/2)^{2m} poly n).

The procedures also work with decomposition. If all columns have multipliers (−1, +1), and the decomposition vector is b, then the range of s_i is {b_i − n/2, …, b_i + n/2} ∪ {−b_i − n/2, …, −b_i + n/2}. The range of each entry s_i has at most 2n + 2 values, so the computation required for step (7) increases to O((2n)^m n^m poly n) = O(2^{m + 2m log₂ n} poly n). If only the initial (decomposition) column has finite-difference parameters (−1, 1), and the remaining parameters are (0, 1) for the first half of the columns and (−1, 0) for the second half, then the range of each entry s_i has no more than n + 2 elements, and the range of each entry t_i has n/2 + 1 elements. In this case, step (7) requires O(n^m (n/2)^m poly n) computation.

Computation can be saved by making a list of those partitions C(s) and D(t) that are nonempty and computing step (7) only for pairs (s, t) that correspond to nonempty pairs of partitions.

The algorithm will collect zero-valued terms more effectively if the first m rows can each have zero sums in A(x). Hence, if the finite-difference parameters are (−1, 1), then permute the rows of A so that as many as possible of the first m rows have even numbers of elements.

Computation and space can be saved by "merging" duplicated partial row sums. If C contains duplicates, then keep only one and store the number of duplicates as a coefficient. When the duplicated vector c is used to find the product of row sums, multiply the result by the coefficient. If C contains vectors c and c′ that are identical except for opposite signs (c′ = −c), then their terms will cancel in the computation, so they can be eliminated from C. (The same rules apply to duplicates and opposites in D.)

The storage requirement can be reduced as follows. Compute and store partial row sums over less than half of the columns. Partition these partial row sum vectors according to the first m entries, as we did with the vectors c(y) in the original algorithm. For each assignment to the x_j's corresponding to the remaining columns, compute the partial row sum vector d. Find the partitions such that none of the first m partial row sum vector entries are the opposite of the corresponding entries of d. For each c in these partitions, compute the product of row sums using c and d. For example, if we use the first n/4 columns to form partial row sums c, then the algorithm proceeds as follows:

(1)′ For each y ∈ {−1,1}^{n/4}, compute

$$c_i(y) = \sum_{j=1}^{\frac{n}{4}} a_{ij} y_j \quad \text{and} \quad c_0(y) = (-1)^{s(y)} \tag{83}$$

(2)′ Store vectors c(y) as set C.

(3)′ Partition set C:

$$C(s) = \{c \in C \mid c_1 = s_1 \text{ and } \ldots \text{ and } c_m = s_m\} \text{ for } s \in \{-\tfrac{n}{2}, \ldots, \tfrac{n}{2}\}^m \tag{84}$$

(4)′ For each z ∈ {−1,1}^{3n/4}, compute

$$d_i(z) = \sum_{j=\frac{n}{4}+1}^{n} a_{ij} z_{j - \frac{n}{4}} \quad \text{and} \quad d_0(z) = (-1)^{s(z)} \tag{85}$$

and let t = (d_1, …, d_m).

(5)' For each s ∈ {-n/2, ..., n/2}^m, examine \prod_{i=1}^{m} (s_i + t_i). If it is nonzero, then do step (6)'.

(6)' Compute

    \prod_{i=1}^{m} (s_i + t_i) \sum_{(c,d) \in C(s) \times D(t)} c' d' \prod_{i=m+1}^{n} (c_i + d_i)    (86)

and add the result to the running total.

The storage requirements are determined by step (2)'. Since C consists of one vector for each y ∈ {-1, 1}^{n/4}, the space required is O(2^{n/4} poly n), a reduction from the original algorithm by a factor of 2^{n/4}. There is a tradeoff of space for time. We do not store and partition the vectors d(z) as in the original algorithm, so we must step through these vectors one at a time (step (5)') instead of computing for entire partitions at once (step (7)). Hence, the time required for steps (4)' and (5)' is O(2^{3n/4} n^m poly n) = O(2^{3n/4 + m log_2 n} poly n). (Compare to O(2^{2m log_2 n} poly n) for step (7) in the original algorithm.) The expected running time for step (6)' is O(2^n f(n, p, m) poly n). Hence, the algorithm has expected running time

    \max[\, O(2^{3n/4 + m \log_2 n} \text{ poly } n), \; O(2^n f(n, p, m) \text{ poly } n) \,]    (87)

For example, if we set m = n/(8 \log_2 n), then the expected running time is

    \max[\, O(2^{7n/8} \text{ poly } n), \; O(2^n f(n, p, m) \text{ poly } n) \,]    (88)
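The meet-in-the-middle evaluation of steps (1)'-(6)' can be sketched in Python (a simplified illustration assuming finite-difference parameters (-1, 1) and no decomposition; it steps through the stored vectors c one at a time rather than partitioning them, and perm_split and perm_bruteforce are our names):

```python
from itertools import permutations, product
from math import prod

def perm_bruteforce(A):
    """Reference permanent by direct summation over permutations."""
    n = len(A)
    return sum(prod(A[i][p[i]] for i in range(n)) for p in permutations(range(n)))

def perm_split(A, m):
    """Permanent via the (-1, 1) finite-difference formula. Partial row sums
    over the first n/4 columns are precomputed once (steps (1)'-(2)'), and a
    term is skipped whenever one of its first m row sums is zero -- skipped
    terms contribute nothing, so the result is still exact."""
    n = len(A)
    k = n // 4  # columns covered by the stored vectors c(y)
    C = []
    for y in product((-1, 1), repeat=k):
        c = tuple(sum(A[i][j] * y[j] for j in range(k)) for i in range(n))
        C.append((c, (-1) ** y.count(-1)))  # c(y) with its sign c'(y)
    total = 0
    for z in product((-1, 1), repeat=n - k):
        d = tuple(sum(A[i][k + j] * z[j] for j in range(n - k)) for i in range(n))
        sd = (-1) ** z.count(-1)  # the sign d'(z)
        for c, sc in C:
            if any(c[i] + d[i] == 0 for i in range(m)):
                continue  # zero-valued term: the product of row sums vanishes
            total += sc * sd * prod(c[i] + d[i] for i in range(n))
    return total // 2 ** n  # the (-1, 1) formula divides by 2^n

A = [[1, 1, 0, 1],
     [1, 1, 1, 0],
     [0, 1, 1, 1],
     [1, 0, 1, 1]]
print(perm_split(A, m=2), perm_bruteforce(A))  # → 9 9
```

The space is dominated by C (2^{n/4} vectors), matching the O(2^{n/4} poly n) storage bound above, while the inner double loop mirrors the O(2^{3n/4}) sweep over z.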

5.5 Expected Fraction of Terms Computed by the Algorithm: f(n, p, m)

In this section, we find the asymptotic form (as n increases) of the expected fraction of the 2^n assignments x for which A(x) has no zero row sum among the first m rows when we use finite-difference parameters (-1, 1) and decomposition. We show that, for m = n/(4 log_2 n), this fraction determines the expected running time of our algorithm, and that the expected running time is an exponentially small fraction of the 2^n poly n time required to evaluate the permanent formula (63) directly.

In the previous section, we showed that if we use a random decomposition strategy to increase row sum variances to n\sqrt{n}, then the expected fraction of nonzero-valued terms is O(\exp(-n^{1/4}/\sqrt{2e})). The fraction of nonzero-valued terms is the fraction of assignments x for which A(x) has nonzero row sums in all


n rows. In the previous section, we showed that this fraction has an asymptotic form bounded by (51):

    \left[1 - \frac{1}{n^{0.75}} g(1)\right]^n    (89)

The fraction of terms computed by the algorithm is a similar quantity. It is the fraction of assignments x for which A(x) has nonzero row sums in the first m rows. Review the derivation of the previous expression. Note that the n in the exponent is the only reference to the number of rows that have nonzero sums in the fraction of terms counted by the expression. All other n's refer to the number of columns in A. Hence, an upper bound for the asymptotic form of f(n, p, m), the fraction of terms computed by our algorithm, is:

    \left[1 - \frac{1}{n^{0.75}} g(1)\right]^m    (90)

Following the derivation in the previous chapter from (51) to (59), we find that

    \left[1 - \frac{1}{n^{0.75}} g(1)\right]^m = \exp\left[m\left(-\frac{1}{n^{0.75}} g(1) - \frac{1}{2}\frac{1}{n^{1.50}} g^2(1) - \frac{1}{3}\frac{1}{n^{2.25}} g^3(1) - \dots\right)\right]    (91)

Let m = n/(4 \log_2 n):

    = \exp\left[\frac{n}{4 \log_2 n}\left(-\frac{1}{n^{0.75}} g(1) - \frac{1}{2}\frac{1}{n^{1.50}} g^2(1) - \frac{1}{3}\frac{1}{n^{2.25}} g^3(1) - \dots\right)\right]    (92)

    = \exp\left[-\frac{n^{0.25}}{4 \log_2 n} g(1) - \frac{1}{2}\frac{1}{n^{0.50} \log_2 n}\frac{g^2(1)}{4} - \frac{1}{3}\frac{1}{n^{1.25} \log_2 n}\frac{g^3(1)}{4} - \dots\right]    (93)

    = \exp\left[-\frac{n^{0.25}}{4 \log_2 n} g(1)\right] \exp\left[-\frac{1}{2}\frac{1}{n^{0.50} \log_2 n}\frac{g^2(1)}{4}\right] \exp\left[-\frac{1}{3}\frac{1}{n^{1.25} \log_2 n}\frac{g^3(1)}{4}\right] \cdots    (94)

The first exponential dominates. In the others, the exponents go to zero as n \to \infty, so the exponentials go to one. Hence, f(n, p, n/(4 \log_2 n)), the asymptotic expected fraction of terms computed by the algorithm, is
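A quick numerical sanity check of this dominant-term approximation (an illustration only, not part of the paper's argument; the value used for the constant g(1) here is a placeholder, since only its constancy matters for the expansion):

```python
import math

def exact_vs_dominant(n, g=0.4):  # g stands in for the constant g(1)
    """Compare [1 - g/n^0.75]^m against the leading exponential of (94)
    for m = n / (4 log2 n), as in the expansion (91)-(94)."""
    m = n / (4 * math.log2(n))
    exact = (1 - g / n ** 0.75) ** m
    dominant = math.exp(-(n ** 0.25) * g / (4 * math.log2(n)))
    return exact, dominant

exact, dominant = exact_vs_dominant(10 ** 6)
print(abs(exact / dominant - 1) < 1e-4)  # → True: the first exponential dominates
```

For n = 10^6 the neglected factors differ from one by roughly m g^2 / (2 n^{1.5}), on the order of 10^{-6}, consistent with the claim that the higher-order exponentials tend to one.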

    O\left(\exp\left[-\frac{n^{1/4}}{4 \log_2 n} \cdot \frac{1}{\sqrt{2e}}\right]\right)    (95)

We can use this result to analyze the expected running time of the algorithm. Earlier, we showed that steps (1) through (6) require O(2^{n/2} poly n) time. With decomposition, we found that step (7) requires O(2^{m + 2m log_2 n} poly n) time. With m = n/(4 log_2 n), this is O(2^{n/(4 log_2 n) + n/2} poly n). Also, we found that step (8) requires O(2^n f(n, p, m) poly n) expected running time. From (95), we can see that this step determines the expected running time, which is

    O\left(2^n \exp\left[-\frac{n^{1/4}}{4 \log_2 n} \cdot \frac{1}{\sqrt{2e}}\right] \text{poly } n\right)    (96)

This is smaller than the O(2^n poly n) required to evaluate the permanent formula directly, by a factor of

    \exp\left[-\frac{n^{1/4}}{4 \log_2 n} \cdot \frac{1}{\sqrt{2e}}\right]    (97)

5.6 Extension to {-1, 0, 1} Matrices

The result above can be extended to matrices with entries in {-1, 0, 1}. Specifically, examine the possibility of applying the algorithm developed in this chapter to a {-1, 0, 1} matrix, with decomposition to make row sum variances n\sqrt{n} and column multipliers (-1, 1). The range of possible row sums

    \sum_{j=1}^{n} a_{ij} x_j    (98)

is the same as for 0-1 matrices, so the data structures and procedures of the algorithm for 0-1 matrices work for {-1, 0, 1} matrices as well.

Now consider whether the expected computational reduction achieved for 0-1 matrices is also achieved for {-1, 0, 1} matrices. As in the analysis for 0-1 matrices, assume the entries of the {-1, 0, 1} matrices are drawn i.i.d. to compute expectations. Note that the reduction in computation relies on small expected values of row sums (98) and the possibility of achieving row sum variance n\sqrt{n} through decomposition. Recall that the expected value of a row sum in a 0-1 matrix, given k variables x_j assigned +1, is:

    \mu_k = (2k - n)p    (99)

where p is the probability of each entry taking value 1. Since the computational reduction proof applies for p \in [0, 1], it applies for |\mu_k| as large as |2k - n|. For {-1, 0, 1} matrices, let \mu' be the expected value of each entry. Then the expected value of a row sum, given k variables x_j assigned +1, is:

    \mu_k = (2k - n)\mu'    (100)

Note that |\mu'| \leq 1. Hence, |\mu_k| \leq |2k - n|, and the expected row sums are in the range covered by the proof.

Since decomposition can only increase row sum variance, the proof applies only if the row sum variance without decomposition is less than n\sqrt{n}. Let (\sigma')^2 be the variance of each entry. Note that the row sum variance without decomposition is n(\sigma')^2. The maximum entry variance is achieved by the distribution

    a_{ij} = \begin{cases} -1 & \text{with probability } 1/2 \\ 0 & \text{with probability } 0 \\ 1 & \text{with probability } 1/2 \end{cases}    (101)

In this case, (\sigma')^2 = 1, and the row sum variance without decomposition is n, which is in the range covered by the proof.
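A minimal check (illustration only; perm_glynn and perm_bruteforce are our names) that the (-1, 1)-parameter formula remains exact when entries may be -1, since nothing in its derivation requires 0-1 entries:

```python
from itertools import permutations, product
from math import prod

def perm_bruteforce(A):
    """Reference permanent by direct summation over permutations."""
    n = len(A)
    return sum(prod(A[i][p[i]] for i in range(n)) for p in permutations(range(n)))

def perm_glynn(A):
    """Permanent via the finite-difference formula with parameters (-1, 1):
    per(A) = 2^{-n} * sum over x in {-1,1}^n of (prod_j x_j) * prod_i (sum_j a_ij x_j).
    The formula extracts the top coefficient of a multilinear polynomial,
    so it is exact for {-1, 0, 1} matrices as well as 0-1 matrices."""
    n = len(A)
    total = 0
    for x in product((-1, 1), repeat=n):
        total += prod(x) * prod(sum(A[i][j] * x[j] for j in range(n)) for i in range(n))
    return total // 2 ** n

A = [[1, -1, 0],
     [0, 1, 1],
     [-1, 0, 1]]
print(perm_glynn(A), perm_bruteforce(A))  # → 2 2
```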


6 Discussion

We have developed a decomposition strategy to produce a permanent formula with an exponentially small fraction of nonzero-valued terms. This work introduces several challenges and opportunities for further inquiry.

Our decomposition strategy is random, and it ignores the specific problem instance A. It should be possible to produce more zero-valued terms using a decomposition vector with entries determined by examining the problem instance.

Strategies to reduce computation in an algorithm with polynomial space requirements by collecting sets of zero-valued terms are discussed in [3]. The strategies are heuristic, and they are designed for general finite-difference formulas. It should be possible to design better strategies for matrices with decomposition. One interesting idea is to assign decomposition vector entries in a manner that facilitates the collection of zero-valued terms, i.e., to develop a decomposition strategy with the direct goal of reducing computation.

Finite-difference algorithms have been developed for several problems other than the matrix permanent, including counting paths and cycles [2], sequencing, bin-packing, and deadlock avoidance [1]. One opportunity for future research is to find finite-difference parameters and procedures (like decomposition for the matrix permanent problem) that allow significant reductions in the time required to evaluate the finite-difference formulas for these problems.


References

[1] E. Bax, Finite-difference algorithms for counting problems, CalTech-CS-TR-97-23.

[2] E. Bax and J. Franklin, A finite-difference sieve to count paths and cycles by length, Inform. Process. Lett., 60 (1996) 171-176.

[3] E. Bax and J. Franklin, A permanent formula with many zero-valued terms, Inform. Process. Lett., 63 (1997) 33-39.

[4] W. Feller, An Introduction to Probability Theory and Its Applications, John Wiley and Sons, Inc., 1968.

[5] M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman and Company, New York, 1979.

[6] J. M. Hammersley, An improved lower bound for the multidimensional dimer problem, Proc. Camb. Phil. Soc., 64 (1968) 455-463.

[7] M. R. Jerrum and A. Sinclair, Approximating the permanent, SIAM Journal on Computing, 18(6):1149-1178, December 1989.

[8] M. Jerrum and U. Vazirani, A mildly exponential approximation algorithm for the permanent, Algorithmica, 16 (1996) 392-401.

[9] N. Karmarkar, R. Karp, R. Lipton, L. Lovász, and M. Luby, A Monte Carlo algorithm for estimating the permanent, SIAM Journal on Computing, 22(2):284-293, April 1993.

[10] R. Motwani and P. Raghavan, Randomized Algorithms, Cambridge University Press, 1995, pp. 315-329.

[11] H. J. Ryser, Combinatorial Mathematics, The Mathematical Association of America, 1963, Ch. 2.

[12] J. Simon, On Some Central Problems in Computational Complexity, Doctoral Thesis, Dept. of Computer Science, Cornell University, Ithaca, NY.

[13] A. Sinclair, Algorithms for Random Generation and Counting: A Markov Chain Approach, Birkhäuser, Boston, 1993.

[14] L. G. Valiant, The complexity of computing the permanent, Theoretical Computer Science, 8 (1979) 189-201.
