optimal, communication-efficient results. A nice effect, besides the optimality, is that communication is efficient for a large spectrum of BSP parameters. In the case of ...
Communication-Efficient Parallel Multiway and Approximate Minimum-Cut Computation

Friedhelm Meyer auf der Heide, Gabriel Teran Martinez
Heinz Nixdorf Institute and Department of Computer Science, University of Paderborn, D-33095 Paderborn, Germany
fmadh@uni-paderborn.de, gab@hni.uni-paderborn.de
Abstract
We examine different variants of minimum cut problems on undirected weighted graphs in the p-processor bulk-synchronous parallel (BSP) model of Valiant. This model and the corresponding cost measure guide algorithm designers to develop work-efficient algorithms that need only very little communication. Karger and Stein have presented a recursive contraction algorithm to solve minimum cut problems. They suggest a PRAM implementation of their algorithm working in polylogarithmic time, but being not work-optimal. Typically the problem size n is much larger than the number of processors p on real-world parallel computers (p ≪ n). ... The approximate minimum cut problem asks for a cut within a factor α > 1 of the optimum, i.e., a cut (A, B) satisfying c_opt ≤ c(A, B) ≤ α · c_opt. The minimum r-way cut problem is comparable to the minimum cut problem, but V is partitioned into r subsets for r > 2.
1.2 The BSP Model
In the BSP (Bulk Synchronous Parallel) model of parallel computing of Valiant [21], a parallel computer consists of a number of processors, each of which is equipped with a large local memory. The processors are connected by a router that transports messages between processors. A computation in the BSP model proceeds in a succession of supersteps. Conceptually, a superstep consists of a computation and a communication phase followed by a barrier synchronization. In the computation phase, processors independently perform operations on data that reside in their local memories at the beginning of the superstep. In the communication phase, the messages are transported to their destinations by the router. After the communication phase a barrier synchronization takes place. In the BSP model a parallel computer is characterized by the following parameters: The parameter p is the number of processors. The parameter l models the communication latency and the time needed for a barrier synchronization. One can view l as the minimum time for a superstep. The parameter g models the gap, i.e., the minimum time between the arrival of succeeding words of data; g^(-1) reflects the available communication bandwidth per processor. Consider a BSP computation consisting of L supersteps, where in the i-th superstep each processor performs at most w_i local operations and a c_i-relation is realized, i.e., in the communication phase each processor sends and receives at most c_i messages. The runtime of the i-th superstep is w_i + g·c_i + l. The runtime of the BSP computation is defined as

W + g·C + l·L,

where W = Σ_{i=1}^{L} w_i and C = Σ_{i=1}^{L} c_i. The costs W and C are denoted computation time and communication volume, respectively. Classical parallel models like the PRAM only aim to be work efficient, i.e., to use time close to T/p (T: sequential complexity of the underlying problem; p: number of processors). The design goal of a BSP algorithm is, in addition to being work efficient, also to reduce the number of supersteps and the communication volume as much as possible, because communication is a main bottleneck in real parallel machines. Let T_seq^A(n) be the sequential runtime of algorithm A for some problem with input size n. We call the BSP algorithm work optimal with respect to A if W = O(T_seq^A(n)/p). It is communication efficient if C = O(W/g) and L = O(W/l) for a large range of values for g, l, and p. For a detailed discussion of the BSP model and BSP algorithms see, e.g., [21, 5, 2].
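As a concrete illustration (ours, not part of the paper), the cost measure above can be evaluated by a small routine; the function name and the example parameters are chosen for illustration only.

```python
# Sketch of the BSP cost measure: runtime = W + g*C + l*L,
# where W = sum of the w_i, C = sum of the c_i, and L is the
# number of supersteps.
def bsp_runtime(supersteps, g, l):
    """supersteps: list of (w_i, c_i) pairs, one per superstep.
    w_i: max local operations of any processor in superstep i;
    c_i: max messages any processor sends/receives (a c_i-relation)."""
    W = sum(w for w, c in supersteps)
    C = sum(c for w, c in supersteps)
    L = len(supersteps)
    return W + g * C + l * L

# Example: three supersteps on a machine with gap g = 2 and latency l = 100:
# W = 3500, C = 70, L = 3, so runtime = 3500 + 2*70 + 3*100.
print(bsp_runtime([(1000, 50), (500, 20), (2000, 0)], g=2, l=100))  # 3940
```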
1.3 Previous work
The classical approach to solve the minimum cut problem is via reduction to the maximum flow problem. New algorithms for solving the minimum cut problem have been proposed recently. These algorithms are superior to previous algorithms with respect to simplicity and time complexity. Typically the new techniques do not perform any maximum flow computations. For instance, one of these techniques is based on the (deterministic or randomized) identification and contraction of edges which do not belong to a minimum cut. The Scan First Search procedure of Nagamochi and Ibaraki [17] identifies an edge with this property deterministically within linear time. The time complexity of their algorithm is O(mn + n^2 log n). Stoer and Wagner [20] proposed a simpler version of this algorithm which has the same time complexity. Matula [16] found a linear-time (2 + ε)-approximation algorithm for the minimum cut problem. A randomized version of contraction techniques has been discovered by Karger [9]. The algorithm works as follows: In each round one edge is selected at random and is contracted. This process is repeated until the graph is composed of only two meta-nodes. Karger shows that a certain minimum cut has survived the whole sequence of contractions with probability Ω(1/n^2). Karger and Stein [13] proposed the recursive contraction algorithm as a far more efficient variant of the contraction algorithm [9]. It can be thought of as a binary computation tree of depth log((n/2)^2), whose nodes represent contracted graphs. Each of the 2^k contracted graphs of level k, 0 ≤ k ≤ log((n/2)^2), has n/2^(k/2) nodes. In particular, the leaves of the computation tree are contracted graphs with two meta-nodes. Each of these leaves may represent a minimum cut. The algorithms of Karger and Stein use different strategies for traversing the computation tree. For instance, their sequential algorithm makes use of a depth-first-search strategy for traversing the computation tree. This algorithm finds a particular minimum cut with probability Ω(1/log n). The time complexity of this sequential search process is O(n^2 log n). Their PRAM implementation uses n^2 processors for realizing a breadth-first-search strategy. The transition from one level to
the next is done in polylogarithmic time. The runtime of their best RNC implementation (RNC is the class of problems that can be solved by a randomized algorithm in polylogarithmic time using a PRAM with a polynomial number of processors) is O(log^3 n) with n^2 processors, i.e., this algorithm is not work optimal. The methods for the multiway cut and the approximate minimum cut problems are similar. The sequential algorithm finds a particular α-minimum cut with probability Ω(1/log n) in time O(n^(2α)). The parallel implementation uses n^(2α) processors and performs work O(n^(2α) log^3 n). A particular minimum r-way cut of the multiway cut problem is found sequentially in time O(n^(2(r-1))) or, using parallel methods, in O(log^3 n) time with n^(2(r-1)) processors. Both PRAM algorithms are not work optimal.
1.4 Our results
The number p of processors of existing parallel computers is typically much smaller than the input size n. For such values of p we obtain trivial BSP algorithms by simulating the corresponding PRAM algorithm on the p processors. A BSP implementation of, e.g., the parallel minimum cut algorithm of Karger and Stein with p ≤ n^2 processors needs O(log^3 n) supersteps, communication volume O(n^2 log^3 n), and runtime O((n^2/p) log^3 n) to find a particular minimum cut with probability Ω(1/log n). (Here log^k n denotes (log n)^k.) Thus, the high amount of communication makes the algorithm infeasible on real parallel machines. As far as we know, no other algorithms based on the practically relevant BSP model exist for the graph optimization problems treated in this paper. We use the approximate minimum cut problem in order to explain why the PRAM algorithms of Karger and Stein are not work optimal. The objective of Karger and Stein is to traverse the computation tree in polylogarithmic time. This is achieved by means of a transformation from one level within the computation tree to the next which is done in polylogarithmic time. The size of all subproblems increases with the level in the computation tree. For instance, the size of all subproblems in the lowest level is n^(2α). Indeed, for this level the PRAM algorithm needs n^(2α) processors. On the other hand, the problem size is only n^2 for level 0. Thus, not all n^(2α) processors are busy in the first levels. Our method is based on a different idea. We begin with the sequential
algorithm, i.e., with only one processor. If we have more processors at our disposal we try to find a strategy for traversing the computation tree which is still optimal. For instance, if two processors are available then each of these processors is assigned one of the two subtrees which have the root of the computation tree as a parent. Both subtrees are then traversed sequentially by the corresponding processor. For larger numbers p of processors we proceed analogously, but not before level log p. A nice side effect, in addition to work optimality, is a saving of communication. For the approximate minimum cut problem resp. the multiway cut problem the BSP algorithms are work optimal for p ≤ n^(2(α-1))/log n resp. p ≤ n^(2(r-2))/log n, and communication efficient for

g = o(n^(2(α-1)) / (p^(1-1/α) log p))  resp.  g = o(n^(2(r-2)) / (p^(1-1/(r-1)) log p));

for r = 3, g = o(n^2/(√p log p)). Our BSP algorithm for the minimum cut problem is communication efficient only for low values of g and not work optimal, because the best sequential algorithm [11] finds all minimum cuts with high probability in time O(n^2 log n). Nevertheless, it is by far more efficient than the simulation of the PRAM algorithm of Karger and Stein. Our algorithms are the first communication-efficient algorithms for the multiway cut and approximate minimum cut problems. In the next section we review the contraction algorithms. Our BSP algorithms are presented in Section 3.
2 The Contraction Algorithm

In this section, we introduce the contraction algorithm for the minimum cut problem following Karger [9]. The main operation of this algorithm is the contraction of an edge (v1, v2). This operation replaces the two nodes v1 and v2 by one node v, and two edges (u, v1) and (u, v2) by one edge (u, v) with weight c(u, v) = c(u, v1) + c(u, v2). The rest of the graph remains unchanged. The implementation of this algorithm makes use of the weighted n × n adjacency matrix associated with the graph G. Using this representation of the graph, the contraction of an edge is reduced to elementary manipulations of the rows and columns of the adjacency matrix. For a detailed description of this implementation see [13]. Denote by G/(v, u) the graph resulting from G by the contraction of the edge (v, u). Likewise, denote by G/F the graph resulting from G by the contraction of a set F ⊆ E of edges. In the following, n → k denotes the contraction of the graph from n to k nodes.

proc Contract(G, n → k)
  repeat until G has k meta-nodes
    choose edge (v, u) with probability proportional to c(v, u);
    G := G/(v, u)
  return G

Figure 1: The contraction algorithm of Karger
Theorem 1 (Karger [9]) A fixed minimum cut of the graph G survives the contractions to k nodes with probability at least (k choose 2)/(n choose 2) = Ω((k/n)^2). For k = 2 the contraction algorithm returns a certain minimum cut of the graph G with probability Ω(1/n^2).
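As an illustration (ours), the contraction algorithm of Figure 1 can be sketched with a weighted adjacency dictionary instead of the paper's adjacency matrix; the representation and names are chosen for brevity only.

```python
import random

# Toy sketch of Contract(G, n -> k): repeatedly pick an edge with
# probability proportional to its weight and contract it, until only
# k meta-nodes remain. G is a dict with G[u][v] = c(u, v).
def contract(G, k):
    G = {u: dict(nbrs) for u, nbrs in G.items()}  # work on a copy
    while len(G) > k:
        # choose edge (u, v) with probability proportional to c(u, v)
        edges = [(u, v) for u in G for v in G[u] if u < v]
        weights = [G[u][v] for u, v in edges]
        u, v = random.choices(edges, weights=weights)[0]
        # merge v into u: c(u, w) += c(v, w) for every neighbour w of v
        for w, c in G[v].items():
            if w == u:
                continue  # drop the self-loop (u, v)
            G[u][w] = G[u].get(w, 0) + c
            G[w][u] = G[w].get(u, 0) + c
            del G[w][v]
        del G[u][v]
        del G[v]
    return G

random.seed(0)
# 4-cycle with unit weights; the two meta-nodes define a cut of the cycle.
G = {0: {1: 1, 3: 1}, 1: {0: 1, 2: 1}, 2: {1: 1, 3: 1}, 3: {0: 1, 2: 1}}
H = contract(G, 2)
u, v = list(H)
print(H[u][v])  # 2: every cut of a cycle into two arcs has weight 2
```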
2.1 The Recursive Contraction Algorithm
A more efficient version of the contraction algorithm is based on the following idea of Karger and Stein: The probability of the first contracted edge belonging to the minimum cut is only 2/n. However, for the last contraction we already have a probability of 2/3. Thus, the probability for a minimum cut to survive a few contractions is rather large. For instance, this probability is approximately 1/2 if the graph is contracted to n/√2 nodes. Therefore, a certain minimum cut is expected to survive almost surely a (twice) repeated contraction. The recursive continuation of this approach leads to the recursive contraction algorithm.

Theorem 2 (Karger and Stein [13]) The recursive contraction algorithm finds a particular minimum cut with probability Ω(1/log n) and time O(n^2 log n). It finds all minimum cuts with high probability in time O(n^2 log^3 n).

The recursive contraction algorithm can be represented by means of a binary computation tree. Each node of this tree is associated with a contracted graph; an edge represents a reduction of the nodes by the factor √2
proc RContract(G, n)
  if n = 2 then return cut
  else repeat the following steps twice
    G' := Contract(G, n → n/√2);
    RContract(G', n/√2)

Figure 2: The recursive contraction algorithm

by executing Contract(G, n → n/√2). The reduction factor for the minimum cut problem is said to be √2. The leaves of the tree are contracted graphs with two meta-nodes which may represent a minimum cut. The depth of the tree is log(n^2) + O(1), the number of leaves is O(n^2).
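The recursion of Figure 2 can be sketched as follows (a self-contained toy version, ours; graphs are weighted adjacency dicts as in the sketch of Section 2, not the paper's matrix representation).

```python
import math
import random

# Helper as in Section 2: weight-proportional edge contraction to k nodes.
def contract(G, k):
    G = {u: dict(nb) for u, nb in G.items()}
    while len(G) > k:
        edges = [(u, v) for u in G for v in G[u] if u < v]
        u, v = random.choices(edges, weights=[G[a][b] for a, b in edges])[0]
        for w, c in G[v].items():
            if w != u:
                G[u][w] = G[u].get(w, 0) + c
                G[w][u] = G[w].get(u, 0) + c
                del G[w][v]
        del G[u][v], G[v]
    return G

# RContract: contract to about n/sqrt(2) nodes, recurse twice on the
# result, and return the smaller of the two cuts found.
def rcontract(G):
    n = len(G)
    if n <= 2:
        u, v = list(G)
        return G[u][v]          # cut value at a leaf of the tree
    t = max(2, int(n / math.sqrt(2)))
    return min(rcontract(contract(G, t)) for _ in range(2))

random.seed(1)
# Two triangles joined by a single bridge edge: the minimum cut is 1.
G = {0: {1: 1, 2: 1}, 1: {0: 1, 2: 1}, 2: {0: 1, 1: 1, 3: 1},
     3: {2: 1, 4: 1, 5: 1}, 4: {3: 1, 5: 1}, 5: {3: 1, 4: 1}}
best = min(rcontract(G) for _ in range(20))
print(best)  # the bridge cut (value 1) is found with high probability
```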
3 The BSP Algorithms

We describe our BSP algorithms by giving a strategy for traversing the computation tree. Reducing the communication volume and the number of supersteps are two important objectives in connection with the design of BSP algorithms. Taking these objectives into account, we need a clever distribution of the nodes of the computation tree among the processors. Essentially, the computation tree is traversed in log p + 1 phases, where the last phase only consists of local computations. In phase k (k = 0, ..., log p − 1), all contractions associated with the transition from level k to level k + 1 within the computation tree are performed in parallel; each contraction is assigned the same number of processors. In phase log p, each of the p nodes of level log p is assigned to exactly one processor. Thus, the computation is local in this phase: No communication is needed in order to traverse each of the p subtrees. We will denote by n_k the number of nodes of a contracted graph in level k (k = 0, ..., log p). All 2^k contracted graphs in level k have the same number of nodes n_k. For representing the graph, we make use of the adjacency matrix as a data structure. At the beginning, each of the p processors stores n/p rows of the adjacency matrix. Without loss of generality we assume that the number of processors p is a power of 2. For each level k (k = 0, ..., log p), we group the p processors into 2^k sets of processors P_{k,j} = {P_{j·p_k}, ..., P_{(j+1)·p_k − 1}}, 0 ≤ j < 2^k, where p_k = p/2^k. The invariant of the BSP algorithm over all the phases k (k = 0, ..., log p) is: At the beginning of phase k, each of the contracted graphs in level k is stored in exactly one processor set P_{k,j}, 0 ≤ j < 2^k. The rows of the n_k × n_k adjacency matrix of a contracted graph are distributed uniformly among the processors of P_{k,j}. The invariant is true for phase 0 because of the assumption on the distribution of the adjacency matrix for the input graph. We give a proof of the invariant in the next section by a detailed description of the transition from level k to k + 1 in the computation tree. The computation trees for the minimum cut, the approximate minimum cut, and the multiway cut problem are comparable. In the subsequent sections we explain the BSP algorithms for these problems.
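The grouping of the p processors into the sets P_{k,j} can be sketched as follows (an illustration, ours; processor indices stand in for the processors themselves).

```python
# Sketch of the processor grouping: in level k the p processors are
# split into 2^k consecutive sets P_{k,j} of size p_k = p / 2^k each;
# each set stores the rows of one contracted graph.
def processor_sets(p, k):
    pk = p // 2 ** k
    return [list(range(j * pk, (j + 1) * pk)) for j in range(2 ** k)]

p = 8
print(processor_sets(p, 1))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(processor_sets(p, 2))  # [[0, 1], [2, 3], [4, 5], [6, 7]]
```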
3.1 Minimum Cut
The reduction factor for the minimum cut problem is √2. We summarize the properties of the computation tree for this problem.

Lemma 1
1. The number of nodes of a contracted graph in level k is n_k = n/2^(k/2).
2. The depth of the computation tree is log((n/2)^2) + O(1).
3. Each contracted graph in level log p has n/√p nodes.

Proof.
1. The result follows from n_0 = n and n_{k+1} = n_k/√2.
2. The depth t of the computation tree follows from n_t = 2.
3. The number of nodes of a contracted graph in level log p is n_{log p} = n/2^((log p)/2) = n/√p.

We give a detailed description of our implementation in the following two paragraphs. We first consider the last phase log p. The description of the transition from phase k to k + 1 in the next paragraph is the induction step in the proof of the invariant.
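Lemma 1 can be checked numerically with a short sketch (ours): repeatedly dividing the node count by √2 bottoms out at roughly depth log((n/2)^2).

```python
import math

# Sketch of Lemma 1: with reduction factor sqrt(2), level k of the
# computation tree holds graphs of n_k = n / 2^(k/2) nodes; the tree
# reaches two-node graphs at depth about log((n/2)^2).
def level_sizes(n, factor=math.sqrt(2)):
    sizes = [float(n)]
    while sizes[-1] > 2.5:      # stop once we are down to ~2 nodes
        sizes.append(sizes[-1] / factor)
    return sizes

n = 1024
depth = len(level_sizes(n)) - 1
print(depth)  # 18, matching log2((n/2)^2) = 18 for n = 1024
```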
3.1.1 The phase log p
In phase log p, each of the p contracted graphs of level log p of the computation tree is assigned exactly one of the p processors. That is, a processor stores exactly one graph. This will become clear from the discussion of the implementation of the previous phases in the next paragraph. From the above lemma, we know each of the graphs to have n/√p nodes. In this phase, each of the p computation subtrees is traversed sequentially by the corresponding processor. Therefore, this phase consists of one superstep without any communication, whose computation time is O((n^2/p) log n).
3.1.2 The transition from phase k to k + 1

We describe the phases 0 to log p − 1 by a detailed description of the transition from level k to k + 1, for 0 ≤ k < log p. At the beginning of phase k each processor set P_{k,j}, j = 0, ..., 2^k − 1, stores one contracted graph G_{k,j} with n_k nodes from level k of the computation tree. Because all processor sets P_{k,j} execute the same procedure we describe the algorithm only for processor set P_{k,0}.

proc Contract(G_k, n_k → n_{k+1}, P_{k+1})
  generate permutation L of the edges of G_k using exponentially distributed scores;
  return Compact(G_k, n_{k+1}, L, P_{k+1})

Figure 3: A parallel version of Contract

At the beginning of phase k we divide the processor set P_{k,0} into two equally sized processor sets P_{k+1,0} and P_{k+1,1}. In order that each of the processor sets P_{k+1,0} and P_{k+1,1} can execute the procedure Contract(G_{k,0}, n_k → n_{k+1}) we need that both processor sets store the adjacency matrix of G_{k,0}. We achieve this by realizing a simple 2(n_k)^2/p_k-relation. The idea hereby is that each processor of P_{k+1,0} informs exactly one processor of P_{k+1,1} about which part of the graph it has stored and vice versa. With Contract(G_k, n_k → n_{k+1}, P_{k+1}) we denote the procedure in which the processor set P_{k+1,0} executes the procedure Contract(G_{k,0}, n_k → n_{k+1}). Our procedure Contract(G_k, n_k → n_{k+1}, P_{k+1}) is a BSP implementation of a parallel algorithm of Karger [9] (see Fig. 3), with p_k = p/2^k processors. Instead of choosing one edge at random repeatedly, it generates a list of edges.
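As an illustration (ours, not the paper's parallel implementation), the list-based contraction can be sketched sequentially: draw an exponentially distributed score per edge with rate equal to its weight, sort, and binary-search the minimal prefix whose contraction leaves the target number of meta-nodes. A union-find structure stands in for the parallel connected-components computation; all names are illustrative.

```python
import random

# Number of connected components of ({0,...,n-1}, prefix), via union-find.
def components_after(n, prefix):
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    count = n
    for u, v, _ in prefix:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            count -= 1
    return count

# Score each edge with Exp(rate = weight) -- so the smallest score is a
# weight-proportional pick -- then binary-search the shortest prefix
# whose contraction leaves at most `target` components.
def contraction_prefix(n, weighted_edges, target):
    scored = sorted(weighted_edges, key=lambda e: random.expovariate(e[2]))
    lo, hi = 0, len(scored)
    while lo < hi:
        mid = (lo + hi) // 2
        if components_after(n, scored[:mid]) <= target:
            hi = mid
        else:
            lo = mid + 1
    return scored[:lo]

random.seed(2)
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 0, 1.0)]  # 4-cycle
prefix = contraction_prefix(4, edges, target=2)
print(components_after(4, prefix))  # 2
```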
proc Compact(G_k, n_{k+1}, L, P_{k+1})
  if G_k has n_{k+1} nodes then return G_k
  else
    let L1 be the first and L2 the second half of L
    l := number of connected components of (V(G_k), L1)
    if l < n_{k+1} then Compact(G_k, n_{k+1}, L1, P_{k+1})
    else Compact(G_k/L1, n_{k+1}, L2/L1, P_{k+1})

Figure 4: The procedure Compact

This list determines the order in which edges are contracted. The procedure Compact(G_k, n_{k+1}, L, P_{k+1}) is responsible for the contraction of edges until the graph consists of n_{k+1} nodes. It has to find a prefix L' of edges from the list L whose contraction results in a graph having n_{k+1} nodes (see Fig. 4). This prefix is found by means of binary search. The number of connected components of the graph (V(n_k), L') is determined in order to test whether a prefix L' of L belongs to some contracted graph having n_{k+1} nodes. In the case of an unweighted graph the generation of the list L of edges is relatively simple. It works as follows: Each edge is assigned a score which is chosen at random from the unit interval according to the uniform distribution. Then, the list L is determined by sorting the edges according to the score values. This method also works for weighted graphs. However, in this case the score of an edge is the realization of an exponentially distributed random variable. Karger proposed an efficient implementation in his PhD thesis [10].

Lemma 2 (Karger [10]) In O(log n_k) steps per edge, it is possible to assign to each edge an approximately exponentially distributed score such that all comparisons are the same as for exact exponential distributions, with high probability.

Now we are in a position to explain the BSP implementation in more detail and to estimate the corresponding cost. Our BSP implementation uses the BSP sorting algorithm of Goodrich [6] and the BSP algorithm of Caceres et al. [3] for computing the connected components of a graph.

Lemma 3 Consider a phase k, 0 ≤ k < log p.
1. The cost for generating the list L of the edges of G_k is
   W = O((n_k)^2/p_k · log (n_k)^2), C = O((n_k)^2/p_k · log (n_k)^2 / log ((n_k)^2/p_k)), and L = O(log (n_k)^2 / log ((n_k)^2/p_k)).

2. The cost of the BSP implementation of Compact(G_k, n_{k+1}, L, P_{k+1}) is
   W = O((n_k)^2/p_k · log p_k), C = O((n_k)^2/p_k · log p_k), and L = O(log p_k · log (n_k)^2).

3. The cost of the BSP implementation of Contract(G_k, n_k → n_{k+1}, P_{k+1}) is
   W = O((n_k)^2/p_k · log (n_k)^2), C = O((n_k)^2/p_k · (log (n_k)^2 / log ((n_k)^2/p_k) + log p_k)), and L = O(log p_k · log (n_k)^2).

Proof.
1. The generation of a score for each edge is a local computation which can be done within one superstep. Since each processor stores at most (n_k)^2/p_k edges, the corresponding computation time is O((n_k)^2/p_k · log n_k) by Lemma 2. In order to sort the (n_k)^2 edges according to their score, we apply the BSP sorting algorithm of Goodrich [6] with p_k processors. The cost of this BSP algorithm is
W = O((n_k)^2/p_k · log (n_k)^2), C = O((n_k)^2/p_k · log (n_k)^2 / log ((n_k)^2/p_k)), and L = O(log (n_k)^2 / log ((n_k)^2/p_k)).
2. For computing the connected components in the implementation of the procedure Compact we make use of the BSP algorithm of Caceres et al. [3]. For the computation time of Compact it holds that
W((n_k)^2, p_k) = O((n_k)^2/p_k · log p_k) + W((n_k)^2/2, p_k),
where the first part of the right-hand side essentially expresses the computation time of the BSP algorithm of Caceres et al. [3] for computing the connected components of (V(n_k), L1). We solve this recurrence to
W((n_k)^2, p_k) = O((n_k)^2/p_k · log p_k).
Analogously, for the communication volume we have
C((n_k)^2, p_k) = O((n_k)^2/p_k · log p_k) + C((n_k)^2/2, p_k),
where the first part of the right-hand side corresponds to the communication volume of the BSP algorithm of Caceres et al. The number of supersteps is
L((n_k)^2, p_k) = O(log p_k) + L((n_k)^2/2, p_k) = O(log p_k · log (n_k)^2).
The term log p_k is the number of supersteps in the algorithm of Caceres et al. [3].
3. The overall cost results from 1. and 2. □

We are now in the position to estimate the cost for all phases.

Theorem 3 Our BSP algorithm finds a particular minimum cut with probability Ω(1/log n). Its cost is
W = O((n^2/p) · log p · log n), C = O((n^2/p) · log^2 p), and L = O(log n · log^2 p).
Proof. The computation time of the phases k, 0 ≤ k < log p, is given by
Σ_{k=0}^{log p − 1} W(Phase k) = Σ_{k=0}^{log p − 1} O((n_k)^2/p_k · log (n_k)^2)
= Σ_{k=0}^{log p − 1} O((n^2/2^k)/(p/2^k) · log (n^2/2^k))
= O((n^2/p) · log p · log n).
The computation time of the last phase log p is given by
O((n/√p)^2 · log (n/√p)) = O((n^2/p) · log n).
Thus, we obtain
W = O((n^2/p) · log p · log n)
for the computation time of the algorithm. For the communication volume of all phases it holds that
C = Σ_{k=0}^{log p − 1} C(Phase k)
= Σ_{k=0}^{log p − 1} O((n_k)^2/p_k · (log (n_k)^2 / log ((n_k)^2/p_k) + log p_k))
= O((n^2/p) · log^2 p).
The number of supersteps of the algorithm is
L = Σ_{k=0}^{log p − 1} L(Phase k) = Σ_{k=0}^{log p − 1} O(log p_k · log (n_k)^2) = O(log n · log^2 p). □

The work performed by our algorithm is larger by a factor O(log p) than the work of the sequential algorithm of Karger and Stein (but better by a factor O(log^2 n / log p) than their PRAM algorithm). This is a consequence of the algorithm of Caceres et al., which needs O(log p) supersteps for computing the connected components.
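As a numeric sanity check (ours, not from the paper), the per-phase computation costs from Lemma 3 can be summed directly; with n_k = n/2^(k/2) and p_k = p/2^k the ratio (n_k)^2/p_k stays n^2/p in every phase, which is what yields the O((n^2/p) log p log n) total.

```python
import math

# Sum of the per-phase computation cost (n_k^2 / p_k) * log(n_k^2)
# over the log p tree phases, with n_k = n / 2^(k/2), p_k = p / 2^k.
def phase_cost_sum(n, p):
    total = 0.0
    for k in range(int(math.log2(p))):
        nk = n / 2 ** (k / 2)
        pk = p / 2 ** k
        total += (nk * nk / pk) * math.log2(nk * nk)
    return total

n, p = 2 ** 12, 2 ** 6
bound = (n * n / p) * math.log2(p) * math.log2(n)
ratio = phase_cost_sum(n, p) / bound
print(0.5 < ratio < 3)  # True: the sum is within a constant of the bound
```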
3.2 Approximate Minimum Cuts
The contraction algorithm of Karger and Stein can be adapted in order to compute approximations of minimum cuts. Let α > 1.
Theorem 4 (Karger and Stein [13])
1. The number of α-minimal cuts is O((2n)^(2α)).
2. A particular α-minimal cut can be found with probability Ω(1/log n) in time O(n^(2α)).
3. All α-minimal cuts can be found in time O(n^(2α) log^2 n) w.h.p.

The algorithm for generating a particular α-minimal cut corresponds to the recursive contraction algorithm with the following modifications:
- The reduction factor is 2^(1/(2α)), i.e., Contract(G, n → n/2^(1/(2α))) is executed instead of Contract(G, n → n/√2).
- The contraction procedure ends if the number of nodes of the graph is ⌈2α⌉.
Lemma 4 The computation tree for the α-minimal cut has the following properties:
1. The depth of the computation tree is log((n/⌈2α⌉)^(2α)) + O(1).
2. The number of nodes of a contracted graph in level k is n_k = n/2^(k/(2α)).
3. A contracted graph in level log p has n/p^(1/(2α)) nodes.

Proof. Analogously to Lemma 1. □
Our BSP scheme for traversing the computation tree is the same for the minimum cut problem and the α-minimum cut problem. The computation is divided into log p + 1 phases. Each processor stores one instance of an α-minimum cut problem with n/p^(1/(2α)) nodes at the beginning of the final phase log p. These problems can be solved by the corresponding processor within one superstep and without communication between the p processors. The phase k, 0 ≤ k < log p, is composed of several supersteps. Each Contract(G, n_k → n_k/2^(1/(2α))) is executed with p_k = p/2^k processors.
Theorem 5 Our BSP algorithm finds a particular α-minimal cut with probability Ω(1/log n). Its cost is
W = O((n^(2α)/p) · log n + (n^2/p^(1/α)) · log p), C = O((n^2/p^(1/α)) · log p), and L = O(log n · log^2 p).
Proof. Analogously to Theorem 3. □

Corollary 1 Our BSP algorithm finds all α-minimal cuts w.h.p. in optimal time O((n^(2α)/p) · log^2 n) for p ≤ n^(2(α−1))/log n. It is communication efficient for
g = o(n^(2(α−1)) / (p^(1−1/α) · log p)).
3.3 Multiway Cuts
Karger and Stein have shown that their techniques can also be applied to the minimum r-way cut problem. Let r ≥ 3 be a fixed integer.
Theorem 6 (Karger and Stein [13])
1. The number of minimum r-way cuts is O(n^(2(r−1))).
2. A particular minimum r-way cut can be found in time O(n^(2(r−1))) with probability Ω(1/log n).
3. All minimum r-way cuts can be found w.h.p. in time O(n^(2(r−1)) log^2 n).

The algorithm for generating all minimum r-way cuts corresponds to the recursive contraction algorithm modified in the following way:
- The reduction factor is 2^(1/(2(r−1))), i.e., Contract(G, n → n/2^(1/(2(r−1)))) is executed.
- The contraction process is finished if the graph has r nodes.
Lemma 5 The computation tree for the minimum r-way cut has the following properties:
1. The depth of the computation tree is log(n^(2(r−1))) + O(1).
2. The number of nodes of a contracted graph in level k is n_k = n/2^(k/(2(r−1))).
3. A contracted graph in level log p has n/p^(1/(2(r−1))) nodes.

Proof. Analogously to Lemma 1. □

The proof of the following result is similar to the one of Theorem 5.
Theorem 7 Our BSP algorithm finds a particular minimum r-way cut with probability Ω(1/log n). Its cost is
W = O((n^(2(r−1))/p) · log n + (n^2/p^(1/(r−1))) · log p), C = O((n^2/p^(1/(r−1))) · log p), and L = O(log n · log^2 p).
Corollary 2 Our BSP algorithm finds all minimum r-way cuts w.h.p. in optimal time O((n^(2(r−1))/p) · log^2 n) for p ≤ n^(2(r−2))/log n, and is communication efficient for
g = o(n^(2(r−2)) / (p^(1−1/(r−1)) · log p)).
4 Conclusion

We have proposed efficient BSP algorithms for the minimum cut problem, the α-minimum cut problem, and the multiway cut problem. These algorithms are based on the contraction algorithm of Karger and Stein [13]. The question whether an optimal BSP implementation for the new algorithm of Karger exists has not been answered yet. Some improvements of our algorithm are possible:
- A BSP algorithm for the connected components with a constant number of supersteps.
- The generation of an edge permutation without sorting.

For the multiway and α-minimum cut problems with realistic parameters our BSP algorithms are optimal and communication efficient. A nice property of the recursive contraction algorithm is the generation of p small problems in level log p. These small problems can be solved by other sequential algorithms, such as, e.g., the deterministic algorithm of Nagamochi and Ibaraki [17] in time O(n^3/p^(3/2)) or the new algorithm of Karger [11] in time O(n^2/p), for the minimum cut problem. This implies that only O(log p · log n) instead of O(log^2 n) repetitions are necessary in order to achieve high probability. This effect may be quite interesting for practical implementations. For the all-terminal network reliability problem Karger [12] has proposed a fully polynomial randomized approximation scheme (FPRAS). An important part of this algorithm is the generation of all α-minimum cuts using the recursive contraction algorithm. It is an interesting question whether our BSP algorithm for the α-minimum cut problem can be extended to a BSP-FPRAS for the all-terminal network reliability problem.
References

[1] M. Adler, J. W. Byers, R. M. Karp, Parallel sorting with limited bandwidth, SPAA 95, pages 129–136, 1995.
[2] A. Bäumker, W. Dittrich, F. Meyer auf der Heide, Truly efficient parallel algorithms: 1-optimal multisearch for an extension of the BSP model, ESA 95, 1995.
[3] E. Caceres, F. Dehne, A. Ferreira, P. Flocchini, I. Rieping, A. Roncato, N. Santoro, S. W. Song, Efficient parallel graph algorithms for coarse grained multicomputers and BSP, ICALP 97, Bologna, Italy, 1997.
[4] D. E. Culler, R. M. Karp, D. A. Patterson, A. Sahay, K. E. Schauser, E. Santos, R. Subramonian, T. von Eicken, LogP: Towards a realistic model of parallel computation, in Proc. 4th ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming, pages 1–12, 1993.
[5] A. V. Gerbessiotis, L. G. Valiant, Direct bulk-synchronous parallel algorithms, J. of Parallel and Distributed Computing, 22:251–267, 1994.
[6] M. T. Goodrich, Communication-efficient parallel sorting, STOC 96, 1996.
[7] M. T. Goodrich, Randomized fully-scalable BSP techniques for multisearching and convex hull construction, SODA 97, 1997.
[8] J. Jaja, An Introduction to Parallel Algorithms, Addison-Wesley, Reading, Mass., 1992.
[9] D. R. Karger, Global min-cuts in RNC and other ramifications of a simple mincut algorithm, SODA 93, pages 21–30, 1993.
[10] D. R. Karger, Random Sampling in Graph Optimization Problems, PhD thesis, Stanford University, 1994.
[11] D. R. Karger, Minimum cuts in near-linear time, STOC 96, pages 56–63, 1996.
[12] D. R. Karger, A randomized fully polynomial approximation scheme for the all terminal network reliability problem, STOC 95, pages 11–17, 1995.
[13] D. R. Karger, C. Stein, A new approach to the minimum cut problem, Journal of the ACM, 43(4):601–640, 1996.
[14] R. M. Karp, V. Ramachandran, Parallel algorithms for shared memory machines, in J. van Leeuwen, editor, Handbook of Theoretical Computer Science, pages 318–326, 1992.
[15] Y. Mansour, N. Nisan, U. Vishkin, Trade-offs between communication throughput and parallel time, STOC 94, pages 372–381, 1994.
[16] D. W. Matula, A linear time 2 + ε approximation algorithm for edge connectivity, SODA 93, pages 500–504, 1993.
[17] H. Nagamochi, T. Ibaraki, Computing edge connectivity in multigraphs and capacitated graphs, SIAM J. of Discrete Mathematics, 5:54–66, 1992.
[18] C. Papadimitriou, M. Yannakakis, Towards an architecture-independent analysis of parallel algorithms, STOC 88, pages 510–513, 1988.
[19] J. H. Reif, Synthesis of Parallel Algorithms, Morgan Kaufmann Publishers, Inc., San Mateo, CA, 1993.
[20] M. Stoer, F. Wagner, A simple min cut algorithm, ESA 94, pages 141–147, 1994.
[21] L. G. Valiant, A bridging model for parallel computation, Comm. ACM, 33:103–111, 1990.