Annu. Rev. Comput. Sci. 1988. 3:233-83
Copyright © 1988 by Annual Reviews Inc. All rights reserved

PARALLEL ALGORITHMIC TECHNIQUES FOR COMBINATORIAL COMPUTATION

David Eppstein
Computer Science Department, Columbia University, New York, NY 10027

Zvi Galil
Computer Science Department, Columbia University, New York, NY 10027, and Computer Science Department, Tel-Aviv University, Tel-Aviv, Israel

INTRODUCTION

Parallel computation offers the promise of great improvements in the solution of problems that, if we were restricted to sequential computation, would take so much time that solution would be impractical. There is a drawback to the use of parallel computers, however, and that is that they seem to be harder to program. For this reason, parallel algorithms in practice are often restricted to simple problems such as matrix multiplication. Certainly this is useful, and in fact we shall see later some non-obvious uses of matrix manipulation; but many of the large problems requiring solution are more complex. In particular, an instance of a problem may be structured as an arbitrary graph or tree, rather than in the regular order of a matrix. In this paper we describe a number of algorithmic techniques developed for solving such combinatorial problems. The intent of the paper is to show how the algorithmic tools we present can be used as building blocks for higher-level algorithms, and to present pointers to the literature for the reader to look up the specifics of these algorithms. We make no claim to completeness; a number of techniques have been omitted for brevity or because their chief application is not combinatorial in nature. In particular


we give very little attention to parallel sorting, although sorting is used as a subroutine in a number of algorithms we describe. We also only describe algorithms, and not lower bounds, for solving problems in parallel.

The model of parallelism used in this paper is that of the PRAM, or shared memory machine. This model is commonly used by theoretical computer scientists but less often by builders of actual parallel machines; nevertheless it is hoped that the techniques described here will be useful not only on the shared memory machines that have been built, but also on other types of parallel computers, either through simulations of the shared memory model on those computers (Mehlhorn & Vishkin 1984; Vishkin 1984a; Karlin & Upfal 1986; Ranade 1987), or through analogues of these techniques for different models of parallel computation (e.g. Hillis & Steele 1986).

All logarithms in this paper should be assumed to be base 2 unless another base is explicitly stated.

1. THE MODEL OF PARALLELISM

The model of parallel computation we for the most part use in this paper is the shared memory parallel random access machine, the PRAM. In this model, one is given an arbitrarily large collection of identical processors and a separate collection of memory cells; any processor can access any memory cell in unit time. The processors are assumed to know the size of their input, but the program controlling the machine is the same for all sizes of input.

There are both theoretical and practical reasons for using this model. The theoretical motivation is based on another model, that of the circuit. An algorithm in this model consists of a family of Boolean circuits, one for each size of a problem instance. To correspond with the PRAM requirement of having one program for all input sizes, the circuit family is usually required to satisfy a uniformity condition (Ruzzo 1981). Such a condition states that there must be a program meeting certain conditions (for instance taking polynomial time, or running in logarithmic space) which, given as input a number, produces as output the circuit having that number as its input size. The output of the algorithm for a given instance of a problem consists of the result of the circuit of the appropriate size, when the circuit's input lines are set to some binary representation of the instance. The two measures of an algorithm's complexity in this model are the circuit's size and depth, both as a function of input size.

Circuit models are good for producing theoretical lower bounds on the complexity of a problem, because a gate in a Boolean circuit is much


simpler than an entire processor in a parallel computer, and because most other models of parallelism can be described in terms of circuits. But it is difficult to construct algorithms in the circuit model, again because of the simplicity of the circuits.

It turns out that any uniform circuit family can be described as a PRAM algorithm, by simulating each gate of the circuits with a processor of the PRAM. The number of processors used is proportional to the size of the circuit, and the parallel time is proportional to the circuit's depth. Similarly, a PRAM algorithm may be simulated by a circuit family of size proportional to the total number of PRAM operations, and of depth proportional to the parallel time, with perhaps an extra logarithmic factor in the size and depth due to the fact that PRAM operations are performed on numbers rather than single bits. Therefore if a problem can be solved by a PRAM algorithm with a number of processors polynomial in the size of the input, and a time polynomial in the logarithm of the input size, it can be solved by a polynomial-sized circuit with polylogarithmic depth, and vice versa. The class of problems solved in such bounds is called NC; the theoretical importance of this class is due to the fact that it does not depend on whether one is using the PRAM or the circuit as a model, or on details such as whether the circuit is allowed unbounded fan-in or whether the PRAM is allowed simultaneous access to the same memory cell by multiple processors.

Practically, one is given a machine capable neither of configuring itself into arbitrary circuits nor of accessing memory in unit time, and one would like an algorithm that takes a total number of operations that is close to the best possible sequential time, but with a nonconstant factor speedup over that sequential time. It is in many cases possible to simulate a PRAM algorithm on a real parallel machine, with only a logarithmic loss in time for the simulation. In particular, efficient randomized simulations are known for the butterfly network (Karlin & Upfal 1986; Ranade 1987) and for a more specialized combination of sorting and merging networks (Mehlhorn & Vishkin 1984; Vishkin 1984a). In both cases the interconnection network has bounded degree.

Many combinatorial problems can be solved on the PRAM using algorithms that fulfill both the theoretical requirement of taking polylogarithmic time, and the practical requirement of taking a number of operations within a polylogarithmic factor of the best known sequential algorithm. For a number of other problems, the best known NC algorithm takes a number of processors that is a small polynomial of the input size, and so for small problem instances the parallel solution may still be practical. In this paper we restrict our attention to NC algorithms, but we


attempt to keep the number of processors small enough that the algorithms remain useful in practice.

1.1 Details of the PRAM Model

We assume that each processor of the PRAM is arbitrarily assigned a unique identifier, which can fit in a single memory cell; we also assume that memory addresses can fit in memory cells. Each processor can access any memory cell in unit time. In some cases we add the further restriction that a memory cell can contain only a number of bits of information proportional to the logarithm of the size of the input (note that there must be at least that many bits to meet our earlier restriction that addresses fit in a cell). Processors are allowed to perform in constant time any other operation that a typical sequential processor could do; this usually means processors are restricted to arithmetic and bit vector operations.

The input to a PRAM algorithm is given in a predetermined fashion in the memory cells of the machine. We assume that the size of a given input instance (which may be represented by more than one parameter; e.g. the numbers of vertices and edges in a graph) is known. The number of processors used for a given computation may vary depending on the size of the instance, but may not change during the computation, nor may it in any other way depend on the actual contents of the problem instance.

The efficiency of a PRAM algorithm is measured by two criteria, taken as functions of the instance size: the number of processors used to perform the computation, and the amount of time taken by that number of processors for the computation. The latter criterion is usually taken to be the worst case over all instances of a given size. Another criterion, the total number of operations, is then the product of the number of processors with the total time taken; that is, we count a processor as performing an operation even when it is idle. We later describe a way of performing a parallel computation such that at most a fixed fraction of the time is spent idle; this means that we could instead count non-idle operations only, and take the number of processors to be the number of operations divided by the time.

Since a parallel algorithm can easily be simulated on a sequential computer in time proportional to the number of parallel operations, it is clear that conversely the number of operations of a parallel algorithm must be at least proportional to the best known sequential algorithm. A parallel algorithm that meets this bound is called optimal. Optimality depends on the specifics of the model we are using, because as we shall see, simulation of an algorithm designed for one model on a processor of a different model may lose a polylogarithmic factor in the number of operations. Many of the algorithms we consider are in fact optimal for some model.
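To make these measures concrete, consider summing n numbers with a balanced EREW-style reduction. The sketch below is ours, not the paper's (a sequential Python loop stands in for each synchronous parallel step): with p = n/2 processors the reduction takes ⌈log₂ n⌉ steps, so the operation count p·t is Θ(n log n), a logarithmic factor above the n-1 additions of the best sequential algorithm; by the definition just given, this particular algorithm is therefore not optimal.

```python
# Illustrative sketch (ours, not from the paper): an EREW-style sum
# reduction, with the two PRAM cost measures counted explicitly.

def simulate_reduction(values):
    a = list(values)
    n = len(a)
    p = n // 2                  # number of processors, fixed for the run
    time_steps = 0
    stride = 1
    while stride < n:
        # One synchronous PRAM step: processor i adds a[2*i*stride + stride]
        # into a[2*i*stride].  All reads and writes touch distinct cells,
        # so the step is legal even on an EREW machine.
        for i in range(p):
            left = 2 * i * stride
            right = left + stride
            if right < n:
                a[left] += a[right]
        stride *= 2
        time_steps += 1
    operations = p * time_steps  # idle processors are counted too
    return a[0], time_steps, operations

total, t, ops = simulate_reduction(range(1, 17))  # n = 16
print(total, t, ops)  # 136 in 4 steps, 32 counted operations
```

Counting only non-idle operations, as in the remark above, the same reduction can be run with n/log n processors and still finish in O(log n) time, which does give an optimal algorithm.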


There is an important question we have not yet addressed, which is what happens when more than one processor tries to access the same memory cell at a given time. Various possible answers have been chosen; it will be convenient to pick different answers for different algorithms. This does not mean that the algorithms will only work for a specific choice of concurrent memory access behavior, because as we shall see, there are a number of simulation results that let algorithms written for one choice work with machines that use a different choice.

There are three main choices for processor action on concurrent memory access. The first choice is simply to disallow such access altogether, and leave undefined the behavior of a program that attempts to perform such access. Such a machine is called exclusive read exclusive write, or EREW for short. This is the weakest of the various types of PRAM, and so an algorithm that works on this model with a given time and number of processors is preferable to one that achieves the same time and processors on a stronger model.

The second type of PRAM, known as a concurrent read exclusive write machine, or CREW for short, allows several processors to read the same cell at once, but disallows multiple concurrent writes to a cell. Again, the behavior of a program that violates these constraints is undefined. For a long time this seems to have been the only type of PRAM used (Kučera 1982); it appeared in the literature as early as 1976 (Csanky 1976; Hirschberg 1976; Fortune & Wyllie 1978).

In the third and strongest version of the PRAM, the CRCW, both concurrent reads and concurrent writes are allowed. We still need to define what happens when several processors write to a single cell with different data values. A number of different possibilities have appeared in the literature under various names (a code sketch of these write rules follows this discussion):

1. Weak CRCW. In this model concurrent writes are allowed only if all processors performing a concurrent write are writing the value zero. Sometimes the further restriction is added that they be writing to one of a set of special cells that can only contain zero or one; but this restricted weak CRCW is equivalent to the version described here, as we show in the simulations below. This model was introduced by Kučera (1982), who showed that certain problems could be solved more quickly on it than on the CREW which had typically been used before then, and that (see below) problems that can be solved in a given time on any version of the CRCW can be solved within a constant factor of that time on this version (but perhaps with many more processors).

2. Common-mode CRCW. In this model there are no restrictions on the values written in a concurrent write; however, all processors writing at


a given time to the same cell must write the same value. Therefore there is no question of which value is actually written to the cell.

3. Arbitrary-winner CRCW. Processors may write different values to the same cell, and one of the values written is the value the cell ends up containing. But which processor wrote the winning value may be chosen arbitrarily; in particular the result may be different if the same step is executed another time. This version of the CRCW is the one most commonly used by workers in the area of parallel algorithms.

4. Priority CRCW. The result written in a concurrent write is the value from the writer with the largest processor identifier. Recall that these identifiers may be chosen arbitrarily; thus like the arbitrary-winner CRCW it is not clear a priori which processor will be the one to get its value chosen as the result; however, unlike the arbitrary-winner CRCW, a repetition of the same concurrent write will give the same result.

5. Strong CRCW. This model is less often seen than the previous ones, but it has appeared in the literature (Galil 1986a; Cole & Vishkin 1986c). In it, the value written in any concurrent write is the largest (or equivalently smallest) of any of the values the processors are attempting to write.

Note that the strong CRCW model can simulate each step of a priority CRCW in constant time by, in a concurrent write, first writing the identifiers of the processors attempting the write, and then the winner writing the original value it wanted written. Thus it is at least as strong as the priority CRCW. Similarly we can see that each of the models we have given is at least as strong as the previous ones, so the various models form a hierarchy. Grolmusz & Ragde (1987) have recently described some other CRCW models that are intermediate in power among the ones we have listed but do not fall into a hierarchy with them.

Vishkin (1984b) has described an even stronger version of the CRCW PRAM. In this model, which has been partially implemented as the NYU Ultracomputer (Gottlieb et al 1983), there is a further kind of concurrent write available. If one processor performs this write to a cell, the value it was writing is added to the value at the cell, and the result is stored back into the cell. The old cell value, before the addition, is returned to the writing processor. If several processors write to the same cell, the effect is as if they had performed the above operation one at a time, in some arbitrary order. One could further generalize this to any (associative) binary operation, rather than just addition. Vishkin showed that in certain simulations of one parallel model by another, this very strong write operation can also be simulated with no further loss of time or efficiency.
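The differences among these write rules are easy to state operationally. The sketch below is our illustration (the function and its representation of write requests are ours, not from the paper); it resolves a single concurrent-write step, to one cell, under each of the models just described, including Vishkin's fetch-and-add style write.

```python
# Hypothetical sketch (ours): resolving one concurrent-write step under
# the CRCW variants described above.  Each request is a pair
# (processor_id, value); all requests target the same memory cell.

import random

def resolve_write(requests, mode, old_value=0):
    vals = [v for _, v in requests]
    if mode == "weak":
        # concurrent writes allowed only when every writer writes zero
        assert all(v == 0 for v in vals), "weak CRCW: only zeros may collide"
        return 0
    if mode == "common":
        # all colliding writers must agree on the value
        assert len(set(vals)) == 1, "common mode: values must agree"
        return vals[0]
    if mode == "arbitrary":
        return random.choice(vals)       # any writer may win
    if mode == "priority":
        return max(requests)[1]          # largest processor id wins
    if mode == "strong":
        return max(vals)                 # largest value wins
    if mode == "fetch-and-add":          # Vishkin's stronger write
        # as if the additions happened one at a time, in some order
        # (the old values returned to each writer are omitted here)
        return old_value + sum(vals)
    raise ValueError(mode)

reqs = [(3, 7), (5, 2), (9, 4)]
for m in ("arbitrary", "priority", "strong", "fetch-and-add"):
    print(m, resolve_write(reqs, m))
# priority -> 4 (processor 9 wins); strong -> 7; fetch-and-add -> 13
```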


A final distinction to be made among PRAM models is whether they are randomized or deterministic. If we allow randomness in parallel computation, it is performed as follows. Each processor is given its own random number generator, with which it can generate in constant time a number drawn from a uniform distribution over the numbers fitting in a memory cell. These numbers are private; if a processor wants to share its random number with another processor it must write it to a memory cell. The distribution of numbers drawn at any one processor is usually taken to be independent of that drawn at all other processors, although some algorithms have weaker independence requirements.

When a parallel algorithm uses random numbers, we still require that the number of processors be a fixed function of the input size; that is, only the run time, and not the number of processors, may vary randomly. The algorithm may always terminate with the correct answer (such an algorithm is commonly known as a Las Vegas algorithm) or it may get the correct answer with high probability (a Monte Carlo algorithm). The algorithm must be able to detect when it has terminated. The time we measure for a randomized parallel algorithm is then the worst case expected time of termination over all instances of a given size.

Random numbers are used in a number of parallel algorithms for which there are efficient deterministic sequential solutions. One important use of them is for what is known as symmetry breaking; for instance, one may have a graph in which many nodes have similar local neighborhoods, but in which after the algorithm terminates different nodes must have different data values. One such problem is that of coloring the graph. With a deterministic algorithm, the nodes must look far away from themselves to see enough to differentiate them from their neighbors, but in a parallel algorithm they might merely flip coins and expect with high probability to get a different result. For certain problems these coin flips can be replaced by a use of the processor identifiers known as deterministic coin tossing, introduced by Cole & Vishkin (1986a,b); this technique is discussed below.
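As a concrete illustration of coin flipping for symmetry breaking, here is a sketch of ours in the spirit of one round of Luby's maximal independent set algorithm, discussed below (the details are illustrative and not taken from any of the cited papers). Each vertex draws a private random number in parallel; a vertex whose draw beats the draws of all of its neighbors can act immediately, since two adjacent vertices cannot both be strict local maxima.

```python
# A hedged sketch (ours) of one round of randomized symmetry breaking.
# The "for v in adj" loops stand in for one vertex per processor.

import random

def one_round(adj):
    """adj: dict mapping each vertex to the set of its neighbours."""
    draw = {v: random.random() for v in adj}       # private coin flips
    winners = {v for v in adj
               if all(draw[v] > draw[u] for u in adj[v])}
    return winners   # an independent set; recurse on the rest for an MIS

# A 6-cycle: every vertex looks locally identical, so only randomness
# (or processor identifiers) can break the symmetry.
cycle = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
print(one_round(cycle))
```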


There are several ways randomness can improve the parallel time and processor bounds of an algorithm. First, a randomized parallel algorithm may solve in polylogarithmic expected time a problem for which there is no known NC algorithm. We call the class of problems solved by such algorithms RNC. Such is the case for matching (Karp et al 1985a; Galil & Pan 1985a; Mulmuley et al 1987) and for the construction of maximal paths (Anderson 1985) and of depth-first search trees (Aggarwal & Anderson 1987). Karp et al (1985b) describe a general model of parallel computation in the presence of oracles; they give a problem which they prove no deterministic PRAM with polynomial processors can solve in polylogarithmic parallel time, but which can be solved quickly and efficiently in parallel using a probabilistic algorithm.

Second, a randomized parallel algorithm may perform a task that can be performed deterministically in parallel, but the parallel algorithm may be more efficient than the best known deterministic algorithm, may take less time than it, or may run on a weaker model of parallelism. Even if the randomized algorithm takes the same asymptotic bounds, it may be simpler and therefore more practical than its deterministic counterpart. Examples of improved efficiency in randomized parallel computation are the various algorithms known for finding maximal independent sets. The randomized algorithm of Luby (1985) uses fewer operations than the deterministic algorithm of the same paper, and runs more quickly (with the same number of processors) than the newer deterministic algorithm of Goldberg & Spencer (1987). Other examples of randomized parallel algorithms that are more efficient than their deterministic counterparts include those for finding connected components (Gazit 1986) and for sorting integers (Reif 1985). Another example, in which the randomized parallel time is faster than the best known deterministic time for a given number of processors (in fact the time is better than the deterministic lower bound), is the constant time selection algorithm given by Reischuk (1981); note, however, that both Reischuk's algorithm and the lower bounds given are for the comparison tree model of parallelism, and therefore do not translate directly to the PRAM.

Finally, a randomized algorithm may use the same asymptotic time and number of processors as a deterministic algorithm, but may be much simpler. Such an algorithm would typically be more useful in a practical implementation, because of the ease of writing the program for it and also because these simpler programs tend to run faster (by a constant factor, but sometimes a very large constant factor) than their deterministic counterparts. Also, the randomized algorithms are often discovered some time earlier than an equally efficient deterministic solution. One such instance is the randomized EREW list ranking algorithm of Vishkin (1984c), which was discovered some time before the best deterministic algorithms (Cole & Vishkin 1986c; Anderson & Miller 1988b). We give a more complete history of list ranking in the next section. Another example is the randomized tree contraction algorithm of Miller & Reif (1985), which was later made deterministic by Cole & Vishkin (1987).

1.2 Simulations Among PRAM Models

We have looked at a number of different ways that the PRAM model of parallel computation may have its details of concurrent memory access, and of randomness or determinism, filled out. All of the different types of deterministic PRAM we have seen have been used in the development of


parallel algorithms to solve various problems, and the same is true for many probabilistic PRAM models. An algorithm designed for a weak model can be used on a stronger model within the same processor and time bounds as it would take on the weaker model. But we would like to use any PRAM algorithm on any model, and so it is important to be able to simulate stronger models efficiently by weaker ones. We give a number of these simulation results between different pairs of models; but before we do, let us first see a result in which the model being simulated and the model doing the simulation are the same. That is, if we have a parallel algorithm running on a certain PRAM model with a given number of processors and within a given amount of time, we would like to be able to run it with fewer processors without more than a proportional loss of time.

Theorem 1 (Brent 1974). If a parallel computation can be performed in time t, using q operations on any number of PRAM processors (of a given fixed type), then it can be performed in time $t + (q-t)/p$ using p processors (of the same type).

Proof: Suppose that at each time step i, taking i from 1 to t, the original computation performs $q_i$ operations, so that $q = \sum_{i=1}^{t} q_i$. Then with p processors, we can simulate step i in time $\lceil q_i/p \rceil \le (q_i+p-1)/p$, by breaking the operations into blocks of size p and performing one block at a time, with one processor per operation within each block. Summing gives total time $\sum_{i=1}^{t} \lceil q_i/p \rceil \le \sum_{i=1}^{t} (q_i+p-1)/p = t + (q-t)/p$. ∎

For k > 0, note that each path from i to j having at most $2^k$ edges can be formed as a path from i to some vertex h having at most $2^{k-1}$ edges, together with another path from h to j again having at most $2^{k-1}$ edges. Thus $A_k[i,j] = \sum_{h=1}^{n} A_{k-1}[i,h] \cdot A_{k-1}[h,j]$, which by induction is the sum of the labels of all ways of combining two paths of length at most $2^{k-1}$. The matrix multiplication may compute the label for a given path in several different ways, but by idempotence the sum of any number of copies of the path label is that path label itself. Finally, note that, again using the fact that $1 + a = 1$, the sum of the label of any nonsimple path with that of a simple path using a subset of the same edges will be the label of the simple path; therefore, since all simple paths use at most n-1 edges, we need not take A to any higher power. ∎

Now let us describe some applications of closed semiring systems. First consider finding the shortest path between each pair of vertices in the graph, and assume the edge lengths are given as log n-bit integers. The corresponding semiring has as its addition operation the integer minimum function, and as its multiplication operation integer addition. The multiplicative identity is the integer 0; we add a special infinite value to be our additive identity. Matrix multiplication in the above semiring can be seen to take constant time using O(n³) weak CRCW processors. Therefore we can find all shortest paths in the graph in time O(log n), and the same number of processors.

As a second example, consider finding the transitive closure of a directed graph. We take as our semiring Boolean algebra: the addition operator will be logical or and the multiplication operator logical and. The additive identity is the value representing falsehood, and the multiplicative identity that representing truth. Again matrix multiplication can be performed in constant time with O(n³) weak CRCW processors, so finding the transitive closure takes O(log n) time with the same number of processors.
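A sequential sketch of the technique may make it concrete. The code below is ours, not the paper's: each call to square stands for one parallel matrix multiplication over the (min, +) semiring, which as noted above takes constant time on O(n³) weak CRCW processors, and the ⌈log₂(n-1)⌉ squarings account for the O(log n) parallel time. Replacing min by logical or and + by logical and turns the same sketch into the transitive closure algorithm.

```python
# Sketch (ours) of the semiring repeated-squaring technique, in the
# (min, +) semiring: all pairs shortest path lengths by path doubling.

INF = float("inf")

def square(A):
    """One semiring matrix multiplication of A with itself."""
    n = len(A)
    return [[min(A[i][h] + A[h][j] for h in range(n)) for j in range(n)]
            for i in range(n)]

def all_pairs_shortest_paths(A):
    n = len(A)
    # D starts as the matrix of paths with at most one edge:
    # edge lengths off the diagonal, the multiplicative identity 0 on it.
    D = [[0 if i == j else A[i][j] for j in range(n)] for i in range(n)]
    length = 1
    while length < n - 1:   # simple paths have at most n-1 edges
        D = square(D)       # doubles the path length covered
        length *= 2
    return D

A = [[0, 3, INF, 7],
     [8, 0, 2, INF],
     [5, INF, 0, 1],
     [2, INF, INF, 0]]
print(all_pairs_shortest_paths(A))
```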


Essentially this transitive closure algorithm, in a version for EREW processors, was given by Hirschberg (1976).

As a less obvious example of the use of semiring systems, consider finding a topological ordering of a directed acyclic graph. This can easily be performed in linear time sequentially, but the algorithm does not obviously lend itself to parallelism. The following algorithm, due to Kučera (1982), does find a topological ordering in parallel; however, the efficiency of the algorithm is much lower than that of the sequential topological sorting algorithm.

We first find the transitive closure of the graph, using the above semiring algorithm. For each vertex v, we compute the in-degree of v in the transitive closure. Finally we sort the vertices by their computed in-degrees. If a vertex v is an ancestor of another vertex w in the original graph, then in the transitive closure w will have incoming edges from all the ancestors of v, plus one from v itself, with possibly still others. Therefore sorting by in-degrees in fact results in a topological ordering. Each step can be performed in O(log n) time; the step that requires the most processors is that computing the transitive closure, which as we have seen takes O(n³) weak CRCW processors.

4.3 Matching

Another important problem that can be solved in parallel using matrix techniques is that of matching. A matching is a subset M of the edges of an undirected graph, such that no two edges in M share a vertex. There are a number of closely related problems one would like solved, all involving finding matchings of a certain sort. In particular, we might want to find any of the following types of matching (a small checker for the first three follows the list).

1. A perfect matching. Each vertex must appear in some edge of the matching.

2. A maximum matching. This is a matching with the largest number of edges over all possible matchings in the graph. If there is a perfect matching, it is obviously a maximum matching, but there are graphs in which no perfect matching exists.

3. A maximal matching. With such a matching, each edge of the graph has at least one of its vertices covered by the matching, so no more edges can be added to the matching. A maximum matching must be maximal, but the converse is not necessarily true.

4. A minimum (or maximum) weight perfect matching. Each edge is given an integer weight; the task is to find a matching such that the sum of the edge weights is minimized.

5. A minimum weight perfect matching, assuming that there is only one perfect matching having the minimum weight.
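To make the distinctions among the first three types concrete, here is a small checker of ours (the edge-list representation and function names are arbitrary choices, not from the paper).

```python
# Illustrative checker (ours) for the matching definitions above.

def is_matching(M):
    seen = set()
    for u, v in M:
        if u in seen or v in seen:   # two edges of M share a vertex
            return False
        seen.update((u, v))
    return True

def is_perfect(M, vertices):
    return is_matching(M) and {x for e in M for x in e} == set(vertices)

def is_maximal(M, edges):
    covered = {x for e in M for x in e}
    return is_matching(M) and all(u in covered or v in covered
                                  for u, v in edges)

# A path on four vertices: 0-1, 1-2, 2-3.
edges = [(0, 1), (1, 2), (2, 3)]
print(is_maximal([(1, 2)], edges))              # True: maximal...
print(is_perfect([(1, 2)], range(4)))           # False: ...but not perfect
print(is_perfect([(0, 1), (2, 3)], range(4)))   # True: perfect (and maximum)
```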


In problems 4 and 5 we restrict the weights to be integers bounded by a polynomial in the input size. It is not clear whether the problems can be solved quickly and efficiently in parallel for more general weights.

Matching is interesting in its own right, but it is also closely related to a number of other problems. In particular if we can calculate maximum flows on a bipartite graph, we can use that calculation to find a maximum matching on that graph; and conversely matchings can be used to find certain types of flows. Aggarwal & Anderson (1987) used flows calculated by matchings in their algorithm for computing a depth-first search tree. If we could compute matchings more efficiently, we could in turn use that computation to compute depth-first search trees more efficiently.

Here we describe a solution to problems 1 and 5, due to Mulmuley et al (1987). Problem 1 is reduced to problem 5 by an appropriate choice of relatively small random weights; then problem 5 is solved deterministically using inversion of a matrix derived from the edge weights. We should note that the same paper also describes a solution of problems 2 and 4 using similar techniques.

Problem 3, finding a maximal matching, seems to be much easier; it can be solved efficiently and deterministically in parallel with an algorithm based on the Euler tour technique for graphs (Israeli & Shiloach 1986). The best known randomized algorithm for this problem is even more efficient (Israeli & Itai 1986).

Finally we should note that an important special case of all of the above problems is finding the appropriate matching for a bipartite graph. We do not discuss this case further here.

Theorem 26 (Mulmuley et al 1987). Let $S = \{x_1, x_2, \ldots, x_n\}$ be a finite set, and let $F = \{S_1, S_2, \ldots, S_k\}$, with $S_j \subseteq S$, be a family of subsets of S. Let weights $w_i$, drawn randomly and independently from a uniform distribution on the integers from 1 to 2n, be assigned to each of the $x_i$, and define the weight $w(S_j)$ of a set in F to be the sum of the weights of its members. Then the probability that there is exactly one member of F having the minimum weight is at least 1/2.

Proof: Let P be the probability that more than one minimum weight set exists. Let $E_i$ be the event that $x_i$ is both in some minimum weight set and not in some other minimum weight set, and let $P_i$ be the probability of event $E_i$. We first show a bound on $P_i$, and then from this derive a bound on P.

Assume a fixed assignment of the weights $w_j$ for all $j \ne i$. For each i, define $F_i$ to be the collection of sets from F containing $x_i$. Let $m(X)$ for $X \subseteq F$ be the minimum weight of a set in X. Then $x_i$ can only be included in some minimum weight set and not included in some other such set when $m(F - F_i) = m(F_i)$;


equivalently, when $w_i = m(F - F_i) - (m(F_i) - w_i)$. But $w_i$ is independent of both $m(F - F_i)$ and $m(F_i) - w_i$, so $P_i$, the probability of $w_i$ being chosen to equal their difference, is at most 1/2n.

If there are two or more minimum weight sets in F, they must differ in at least one $x_i$; that is, at least one of the events $E_i$ must occur. These events are not independent, but the probability of any of them occurring can be no more than the sum of their individual probabilities. That is, $P \le \sum_{i=1}^{n} P_i \le n \cdot (1/2n) = 1/2$, so the probability that the minimum weight member of F is unique is at least 1/2. ∎
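Theorem 26 is easy to test empirically. The sketch below is ours (the set family and trial count are arbitrary choices): it draws weights uniformly from 1 to 2n and estimates how often the minimum weight member of a family is unique; the estimate comes out comfortably above the 1/2 that the theorem guarantees.

```python
# Empirical check (ours) of the isolation lemma proved above.

import random
from itertools import combinations

def unique_minimum_rate(family, n, trials=10000):
    hits = 0
    for _ in range(trials):
        w = [random.randint(1, 2 * n) for _ in range(n)]   # weights on x_i
        weights = [sum(w[i] for i in S) for S in family]   # w(S_j)
        if weights.count(min(weights)) == 1:               # unique minimum?
            hits += 1
    return hits / trials

n = 6
family = [set(S) for S in combinations(range(n), 3)]  # all 3-element subsets
print(unique_minimum_rate(family, n))  # well above the 1/2 guarantee
```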