Distributed Matroid Basis Completion via Elimination Upcast and Distributed Correction of Minimum-Weight Spanning Trees

D. Peleg



January 9, 1998

Abstract

This paper proposes a time-efficient distributed solution for the matroid basis completion problem. The solution is based on a technique called elimination upcast, enabling us to reduce the amount of work necessary for the upcast by relying on the special properties of matroids. As an application, it is shown that the algorithm can be used for correcting a minimum weight spanning tree computed for a D-diameter network, after k edges have changed their weight, in time O(k + D).

Department of Applied Mathematics and Computer Science, The Weizmann Institute, Rehovot 76100, Israel. E-mail: [email protected]. Supported in part by grants from the Israel Science Foundation and from the Israel Ministry of Science and Art. 

1 Introduction

1.1 Motivation

The theory of matroids provides a general framework allowing us to handle a wide class of problems in a uniform way. One of the main attractive features of matroids is that their associated optimization problems are amenable to greedy solution. The greedy algorithm is simple and elegant, and its runtime is linear in the number of elements in the universe, which is perfectly acceptable in the sequential single-processor setting. In the parallel and distributed settings, however, one typically hopes for faster solutions. Unfortunately, direct implementations of the greedy algorithm are inherently sequential. Hence the problem of designing time-efficient distributed algorithms for handling matroid optimization problems (particularly, problems involving the computation of the minimum / maximum cost basis of a given matroid) may be of considerable interest. This paper focuses on situations where a partial solution is already known, and it is only necessary to modify or complete it. In such cases, one may gain considerably from applying a matroid basis completion algorithm, rather than solving the entire optimization problem from scratch. Hence the paper considers time-efficient distributed solutions for the optimal matroid basis completion problem, and some of its applications. The problem is defined as follows. Given an independent but non-maximal set R in the universe, find a completion of R into a maximum-weight basis, assuming such a completion exists. (Precise definitions are given in Sections 3 and 4.) In particular, applied with R = ∅, the problem reduces to the usual matroid optimization problem.

1.2 MST correction

The main application we present for matroid basis completion is the problem of maintaining and correcting an MST in a distributed network. Maintenance of dynamic structures in a distributed network is a problem of considerable significance, and was handled in various contexts (cf., e.g., [ACK88, KP95b, KP95c]). The specific problem of MST correction can be described as follows. Suppose that a minimum-weight spanning tree M has been computed for the (weighted) network G. Moreover, suppose that each node stores a description of the edge set of M (but not of the entire network G). At any time, certain edges of the tree M may significantly increase their weight, indicating the possibility that they should no longer belong to the MST, and certain other edges may reduce their weight, possibly enabling them to join the MST. Suppose that a central vertex v0 in the graph accumulates information about such deviant edges. This can be done over some shortest paths tree T spanning the network. At some point (say, after hearing about a set Wbad of |Wbad| = kbad edges of M whose weight has deteriorated and a set Wgood of |Wgood| = kgood edges outside M whose weight has improved, where kbad + kgood or the accumulative change in weights exceeds some predetermined limit), v0 may decide to initiate a recomputation of the MST. The question studied in this paper is whether such a recomputation can be performed more cheaply than computing an MST from scratch. Let us observe the following facts concerning the problem. First, note that as collecting the information at v0 takes Θ(kbad + kgood + Diam(G)) time, we may as well distribute the information throughout the entire network, at the same (asymptotic) time, by broadcasting it on the tree T in

a pipelined fashion. Secondly, note that the two problems of taking Wbad and Wgood into account can be dealt with separately, one after the other. In particular, suppose first that Wgood = ∅. In this case, we only need to find a way to discard those edges of Wbad that should no longer be in M, and replace them by other edges. Similarly, supposing Wbad = ∅, we only need to find a way to add those edges of Wgood that should be in M, and throw away some other edges currently in M. Now, assuming we have two separate procedures for handling these separate problems, applying them one after the other to the general case would result in a correct MST. Our next observation is that, as every vertex stores a description of the entire tree M, checking the possibility of incorporating the edges of Wgood can be done locally at each vertex (ignoring the set Wbad or assuming it is empty). Hence the second of our two problems is easy, and can be solved in time Θ(kgood + Diam(G)). However, solving our first problem is not as straightforward, as possible modifications involving the elimination of some of the edges of Wbad from the tree M may require some knowledge about all other edges currently not in M, and this knowledge is not available at all vertices. Two natural approaches are the following. First, it is possible to keep at each vertex the entire topology of the graph. The disadvantage of this solution is that it involves large (Ω(|E|)) amounts of storage, and therefore may be unacceptable in certain cases. A second approach is to employ the well-known GHS algorithm for distributed MST computation [GHS83], but invoke it only from the current stage. Specifically, once v0 has broadcast the set Wbad throughout the graph, every vertex in G can (tentatively) remove these edges from M, and remain with a partition of M into a spanning forest F composed of k ≤ kbad + 1 connected subtrees (or "fragments" in the terminology of [GHS83]), M1, ..., Mk.
A standard rule can be agreed upon (say, based on vertex IDs) for selecting a responsible vertex in each of these fragments. These vertices can now initiate an execution of the GHS algorithm, starting from the current fragments. The disadvantage of this solution is that the last few phases of the GHS algorithm are often the most expensive ones. In particular, it may be the case that some of the fragments Mi have large depth (even if G has low diameter), thus causing the fragment-internal communication steps to be time consuming. As a result, the correction process may take Ω(n) time. The approach proposed in this paper is the following. We first discard the edges of Wbad from M, remaining with the forest F. We then view the problem as a matroid basis completion problem, and look for a completion of F into a full spanning tree. Our approach yields an O(k + Diam(G)) time MST correction algorithm.

1.3 Model

The network is modeled as an undirected unweighted graph G = (V, E), where V is the set of nodes, and E is the set of communication links between them. The nodes can communicate only by sending and receiving messages over the communication links connecting them. In this paper we concentrate on time complexity, and ignore the communication cost of our algorithms (i.e., the number of messages they use). Nevertheless, we still need to correctly reflect the influence of message size. Clearly, if messages of arbitrary unbounded size are allowed to be

transmitted in a single time unit, then any computable problem can be trivially solved in time proportional to the network diameter, no matter how many information items need to be collected. We therefore adopt the more realistic (and more common) model in which a single message can carry only a limited amount of information (i.e., its size is bounded), and a node may send at most one message over each edge at each time unit. For simplicity of presentation, it is assumed that the network is synchronous, and all nodes wake up simultaneously at time 0. Let us note, however, that all the results hold (with little or no change) also for the fully asynchronous model (without simultaneous wake up). It is also assumed that we are provided with a shortest paths spanning tree, denoted T , rooted at v0. Otherwise, it is possible to construct such a tree in time Diam(G).

1.4 Contribution

The main result of this paper is a time-efficient distributed solution for the optimal matroid basis completion problem. Specifically, in a tree T of depth D, given a partial basis R of size r in a matroid of rank t over an m-element universe, the optimal basis completion problem is solved in time O(D + t − r). (Hence in particular, a matroid optimization problem of rank t over an m-element universe is solved from scratch in time O(D + t), as opposed to the O(D + m) time complexity achieved, say, by a naive use of upcast.) Let us briefly explain the basic components of our solution. The reason why the greedy algorithm is inherently sequential is that it is global in nature, and has to look at all the edges. This difficulty can be bypassed by using a dual approach for matroid problems, which is more localized. The idea is to work top-down rather than bottom-up, i.e., instead of building the basis by gradually adding elements to it, one may gradually eliminate some elements from consideration. This is known in the context of MST construction as the "red rule" (cf. [Tar83]). One source of difficulty encountered when working with the basis completion problem, rather than solving the matroid optimization problem from scratch, is that the usual version of the red rule is not applicable. Consequently, the first component in our solution is a generalized form of the red rule, applicable to basis completion problems. It should be clear that if restricted to sequential computation, the use of the dual algorithm based on the red rule would still yield linear complexity, despite its more local behaviour. Consequently, the second component of our algorithm is geared at exploiting the generalized red rule in a distributed fashion, in order to enable us to reduce the time complexity of the algorithm in the distributed setting, relying on the special properties of matroids. This is done via a technique called elimination upcast.
This technique in fact implements a combination of the two dual approaches discussed above. Its main operation is a greedy upcasting process on a tree, collecting the best elements to the root. This operation is sped up by accompanying it with the complementary operation of eliminating elements known for certain to be out of the optimal solution. This elimination process is carried out locally at the various nodes, relying on the generalized red rule. The elimination upcast technique is a variant of the procedure used in [GKP98, KP95a] as a component in a fast distributed algorithm for computing a minimum-weight spanning tree (MST), but it applies to a wider class of problems (namely, all matroid optimization problems), and it deals

with the more general setting of basis completion. Moreover, its correctness proof and analysis are (perhaps surprisingly) somewhat simpler. For the MST correction problem, our matroid basis completion algorithm enables us to perform the second phase of the solution, replacing the edges of Wbad, in time Θ(kbad + Diam(G)). The result is an MST correction algorithm solving the entire problem in time Θ(kbad + kgood + Diam(G)). The rest of the paper proceeds as follows. In Section 2 we overview some of the basic material known regarding the convergecast process. Section 3 reviews some basic material concerning matroids. In Section 4 we introduce the matroid basis completion problem and its sequential solutions, and in Section 5 we describe our distributed solution.

2 Basics

This section reviews known or straightforward background concerning convergecasts and upcasts on a tree T rooted at v0. Let Depth(T) denote T's depth, i.e., the maximum distance from v0 to any vertex of T.

2.1 Convergecasts

Acknowledgements of receiving a broadcast message can be efficiently gathered on a tree T by the following convergecast process. Upon getting the message, a vertex v has to do the following. If v is a leaf in the tree T, then it immediately responds by sending up an Ack message to its parent. If v is an intermediate (non-leaf) vertex in T, then it must first collect Ack messages from all its children, and only then it may send an Ack message to its parent. The time complexity of the convergecast process is O(Depth(T)). The convergecast process is useful also for computing other types of global functions. Suppose that each vertex v in the graph holds an input Xv, and we would like to compute some global function f(Xv1, ..., Xvn) of these inputs. Suppose further that the function f enjoys the following properties.

1. f is well-defined for any subset Y of the input set X = {Xv1, ..., Xvn},

2. f is associative and commutative,

3. the representation of f(X) is compact enough to fit in a single message.

Functions satisfying the first two properties are sometimes referred to as semigroup operations. Such a function f can be computed efficiently on a tree T by a convergecast process. In this process, the value sent upwards by each vertex v in the tree will be the value of the function on the inputs of its subtree Tv, namely, fv = f(Yv) where Yv = {Xw | w ∈ Tv}. An intermediate vertex v with k children w1, ..., wk computes this value by receiving the values fwi = f(Ywi), 1 ≤ i ≤ k, from its children, and applying fv ← f(Xv, fw1, ..., fwk). Correctness of the computation is guaranteed by the associativity and commutativity of f. The time complexity of the process is O(Depth(T)).
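For concreteness, the convergecast of a semigroup function can be sketched as follows (an illustrative sketch only: the dictionary encoding of the tree and all identifiers are ours, not the paper's, and the recursion stands in for the actual bottom-up message exchange).

```python
def convergecast(tree, root, inputs, f):
    """Compute a semigroup function f over all inputs by a bottom-up sweep.

    tree: dict mapping each vertex to its list of children.
    inputs: dict mapping each vertex v to its input X_v.
    Each vertex combines its own input with the values convergecast from
    its children; associativity and commutativity of f make the
    combination order irrelevant.
    """
    def up(v):
        val = inputs[v]                  # X_v
        for w in tree.get(v, []):        # children's subtree values f_w
            val = f(val, up(w))
        return val                       # f_v, sent to the parent
    return up(root)

# Example: max-convergecast on a small rooted tree.
tree = {'r': ['a', 'b'], 'a': ['c']}
inputs = {'r': 3, 'a': 7, 'b': 1, 'c': 9}
M = convergecast(tree, 'r', inputs, max)
```

Any semigroup operation can be passed as f, e.g. addition in place of max.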


Two simple examples for such functions are addition and maximum computations. As an additional example, the convergecast process can be applied in order to collect global knowledge from local predicates. Suppose that each vertex v holds a predicate Pred(v) which may be "True" or "False". Let Xv be a bit variable set to 1 if Pred(v) = True and to 0 otherwise. Then it is possible to convergecast a combined form of this information efficiently, ending up with the root knowing, for instance, ∃v Pred(v) or ∀v Pred(v). In particular, the acknowledgement process presented earlier can be described in this way. Broadcasts and convergecasts can also be efficiently pipelined. Suppose each vertex in the tree stores k variables Xi(v), 1 ≤ i ≤ k, and we would like to compute all k maximums Mi = max_{v∈V} Xi(v) and inform all vertices of the results. Clearly, for each i, it is possible to compute Mi separately, through a max-convergecast process, and then broadcast the result on the tree. Assuming the variables hold log n-bit numbers, this process requires O(Depth(T)) time. But performing these operations sequentially, waiting for the computation of M_{i−1} to end before beginning the computation of Mi, would multiply the time complexity by a factor of k, resulting in a total of O(k · Depth(T)) time. A simple technique for reducing this time complexity is to pipeline the computations. Each leaf v starts the processes one after another, sending X1(v), followed by X2(v), and so on. Denote by Tv the subtree of T rooted at a vertex v. Each intermediate vertex v computes each of the partial maximums Mi(v) = max_{w∈Tv} Xi(w) immediately upon receiving the corresponding partial maximums from all its children, and sends the values Mi(v) to its parent one by one. The algorithm can be formalized as follows. Define the level of a vertex v in T as the depth of the subtree Tv rooted at v, denoted by L̂(v) = Depth(Tv).
More explicitly, L̂(v) is defined as follows:

    L̂(v) = 0                                 if v is a leaf;
    L̂(v) = 1 + max_{u ∈ children(v)} L̂(u)    otherwise.

Then each vertex v sends the values Mi(v) to its parent consecutively, at rounds L̂(v) + i (for 1 ≤ i ≤ k). This can be proved for every i by induction on L̂(v), from the leaves up. It follows that k global semigroup functions can be computed in Depth(T) + k time.
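The pipelined schedule can be checked on small instances with a sketch along these lines (our own encoding, not the paper's: partial(v) plays the role of Mi(v), and the returned round count is the Depth(T) + k bound stated above).

```python
def pipelined_maxima(tree, root, X, k):
    """Compute all k maximums Mi = max_v Xi(v) in one pipelined sweep.

    tree: dict mapping a vertex to its list of children.
    X: dict mapping each vertex v to its list [X1(v), ..., Xk(v)].
    Returns (M, finish): the list of the k maximums, and Depth(T) + k,
    the round by which the root holds all of them when every vertex v
    forwards Mi(v) at round L_hat(v) + i.
    """
    def depth(v):
        ch = tree.get(v, [])
        return 0 if not ch else 1 + max(depth(u) for u in ch)

    def partial(v):                      # Mi(v), component-wise over Tv
        vals = list(X[v])
        for u in tree.get(v, []):
            vals = [max(a, b) for a, b in zip(vals, partial(u))]
        return vals

    return partial(root), depth(root) + k
```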

2.2 Downcasts and upcasts

Another pair of tasks that can be carried out via processes similar to (but different from) broadcast and convergecast is downcast and upcast. These tasks refer to the case where the item communicated between the root and each of the vertices in the tree is possibly different, hence it needs to be treated individually, and may not be combined. Downcasts apply in the following situation. Suppose that the root has m distinct items A = {α1, ..., αm}, each destined to one specific vertex in the tree. (Each vertex in the tree may get zero or more such messages.) Clearly, both the depth of the tree and the number of distinct messages that need to be sent are potential bottlenecks, hence the process requires Ω(max{m, Depth(T)}) time in the worst case. This lower bound can be met by a straightforward algorithm. For each of its children w, the root r0 simply sends the messages destined to the subtree rooted at w one by one on the edge

(r0, w), in an arbitrary order. Each intermediate vertex v in the tree receives at most one message at each step, and passes it on towards its destination. This protocol downcasts m distinct messages on T in time O(m + Depth(T)). Let us now consider the dual problem. Suppose that m data items A = {α1, ..., αm} are initially stored at some of the vertices of the tree T. Items can be replicated, namely, each item is stored in one or more vertices (and each vertex may store zero or more items). The goal is to end up with all the items stored at the root of the tree. We refer to this operation as upcast. Note that this task is not really a "convergecast" process, since the items are sent up to the root individually, and are "combined" only in the sense that a vertex sends only a single copy of a replicated item. The same bottlenecks that exist for the downcast operation apply also to upcast. Hence upcasting m distinct messages on T requires Ω(max{m, Depth(T)}) time in the worst case. At first, it may seem that the upcast operation should be inherently more difficult than the downcast operation: intuitively, in the downcast operation the different messages "spread out" on the tree, hence they disrupt each other less and less, whereas in the upcast operation the different messages "converge" to a single spot on the tree, hence they tend to disrupt each other more and more. Moreover, in the upcast problem an item may be replicated, i.e., initially stored in a number of different locations. Despite these difficulties, a very simple algorithm guarantees this optimal bound on the upcast operation. For every v, let Mv denote the set of items initially stored at some vertex of Tv. The only rule that each vertex has to follow is to upcast to its parent in each round some item in its possession that has not been upcast in previous rounds. One can show that for every 1 ≤ i ≤ |Mv|, at the end of round L̂(v) + i − 1, at least i items are stored at v.
Hence at the end of round Depth(T) + m, all the items are stored at the root of the tree. Let us remark that similar results hold also in much more general settings, without the tree structure. In particular, the bounds hold even when the m messages are sent from different senders to different (possibly overlapping) recipients along arbitrary shortest paths, under a wide class of conflict resolution policies (for resolving collisions in intermediate vertices between messages competing over the use of an outgoing edge), so long as these policies are consistent (namely, if item i is preferred over item j at some vertex v along their paths, then the same preference will be made whenever their paths intersect again in the future). This was first shown in [CKMP90, RVVN90] for two specific policies, and later extended to any consistent greedy policy in [MPS91]. Next, let us consider the following smallest k-of-m problem. Suppose that the elements Xv stored at the vertices are taken out of an ordered domain A, and our goal is to collect the smallest k elements at the root. The global function computation scheme described above could be used to solve this problem as follows. First, find the minimum element, and inform all the vertices by broadcasting it throughout the tree. Now, find the next smallest element by the same method, and so on. This would take O(k · Depth(T)) time. An alternative and faster method is the following. At any given moment along the execution, every vertex keeps the elements it knows of in an ordered list. In each step, each vertex

sends to its parent the smallest element that hasn't been sent yet. Recalling the definition of L̂(v), the level of a vertex v in T, we note that for each vertex v, the smallest value stored at any vertex in the subtree Tv has already reached v by round L̂(v), and more generally, v already has the ith smallest value in Tv by round L̂(v) + i − 1. This can be proved by induction on L̂(v). It follows that upcasting the k smallest elements on a tree T can be performed in Depth(T) + k time. Finally, let us consider the smallest ki-of-mi problem, illustrated via the resource allocation example in the introduction. Thinking of each resource type i as defining a separate "smallest k-of-m" subproblem, we have to combine our solutions for these separate problems by pipelining them as discussed earlier. It is easy to verify that by pipelining the subproblems in the same order at all nodes, we get the correct k = Σ_i ki unit IDs at the root by time Depth(T) + k.
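A round-by-round simulation of this rule, under our own (assumed) synchronous encoding, might look like the following; the Depth(T) + k round budget is the bound stated above.

```python
def upcast_smallest_k(tree, root, X, k):
    """Simulate the smallest k-of-m upcast for Depth(T) + k rounds.

    tree: dict vertex -> list of children.
    X: dict vertex -> iterable of elements from an ordered domain.
    Each round, every non-root vertex sends its parent the smallest
    element it currently stores and has not sent before; all deliveries
    of a round take effect together.
    """
    parent = {u: v for v, ch in tree.items() for u in ch}

    def depth(v):
        ch = tree.get(v, [])
        return 0 if not ch else 1 + max(depth(u) for u in ch)

    store = {v: set(X.get(v, ())) for v in list(parent) + [root]}
    sent = {v: set() for v in store}
    for _ in range(depth(root) + k):
        msgs = []
        for v in parent:                   # all vertices act in parallel
            pending = store[v] - sent[v]
            if pending:
                m = min(pending)
                sent[v].add(m)
                msgs.append((parent[v], m))
        for p, m in msgs:                  # synchronous delivery
            store[p].add(m)
    return sorted(store[root])[:k]
```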

2.3 Two example applications

Let us conclude this review section by discussing two additional applications that fall under the general convergecast / upcast framework, although they do not fit precisely into the paradigms discussed above.

Route-disjoint matching: The following "matching" problem can be solved using the techniques discussed earlier. Suppose that we are given a rooted tree T and a set of 2k vertices W = {w1, ..., w2k} in it. Our goal is to find a matching of these vertices into pairs (xi, yi) for 1 ≤ i ≤ k, such that the (unique) routes connecting xi to yi in T are edge-disjoint. The existence of such a matching is given by the following lemma.

Lemma 2.1 [KR93] For every tree T and for every set W as above, k ≤ ⌊n/2⌋, there exists an edge-disjoint matching as required.

A simple distributed algorithm based on upcasts can be used to construct the matching. Clearly, it is possible to use a convergecast process in order to count, for every vertex v, the number of vertices in W that reside in its subtree Tv. This process can be modified to yield the matching by adopting the following ground rules. Each vertex will also upcast (up to) one name of a W element from its subtree. A vertex v receiving from (some of) its children the names w_{i1}, ..., w_{ik} of W vertices in their respective subtrees (including itself, if it belongs to W) will match those vertices in pairs (say, matching w_{i1} to w_{i2} and so on). If k is odd, then it will upcast w_{ik} to its parent. (See Fig. 1 for an example.) We have the following result.

Lemma 2.2 For every tree T and for every set W as above, k ≤ ⌊n/2⌋, the edge-disjoint matching can be found by a distributed algorithm on T in time O(Depth(T)).
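A sequential simulation of these ground rules (the tree encoding and identifiers are ours) might look like:

```python
def route_disjoint_matching(tree, root, W):
    """Simulate the matching upcast of Lemma 2.2 sequentially.

    Every vertex gathers at most one unmatched W-name from each child
    subtree (plus its own name if it is in W), matches the gathered
    names in pairs locally, and upcasts the single leftover name, if
    any. Names paired at a vertex come from disjoint child subtrees,
    so the tree routes of different pairs are edge-disjoint.
    """
    pairs = []

    def up(v):                     # returns the leftover name, or None
        names = [v] if v in W else []
        for u in tree.get(v, []):
            leftover = up(u)
            if leftover is not None:
                names.append(leftover)
        while len(names) >= 2:
            pairs.append((names.pop(), names.pop()))
        return names[0] if names else None

    leftover = up(root)
    assert leftover is None, "W must contain an even number of vertices"
    return pairs
```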


Token distribution: The token distribution problem is stated as follows: n tokens are initially distributed among the n vertices of a graph, with no more than K at each site. Redistribute the tokens so that each processor will have exactly one token. This problem was studied on expander graphs in [PU89]. Here we discuss a distributed solution on a tree. Let us suppose that each token is of O(log n) bits, so it can be sent in a single message. The cost of the entire redistribution process is the sum of the distances traversed by the tokens on their


Figure 1: The matching process for the vertices of W over the tree T.

way to their destinations. An optimal solution can be derived by using a convergecast process in order to determine, for every vertex u in the tree, the following three parameters:

1. su, the number of tokens in the subtree Tu,

2. nu, the number of vertices in the subtree Tu,

3. pu = su − nu, the (positive or negative) number of tokens that need to be sent out of Tu.

The parameters pu can now be used for performing the redistribution optimally. Generally, each vertex u with pu > 0 will upcast pu tokens to its parent, and each vertex u with pu < 0 will wait to get |pu| tokens from its parent. In addition, each intermediate vertex u in the tree is responsible for balancing those of its children w whose pw parameter is nonzero. In particular, it has to collect superfluous tokens from its "rich" children and provide the necessary tokens to its "needy" ones. (If pu > 0 then this entire redistribution operation inside Tu can be completed internally; in contrast, if pu < 0 then u must wait for tokens from its parent before it can help its needy children.) Overall, the total number of messages required for achieving an even distribution of the tokens is P = Σ_{u≠r0} |pu|.

Lemma 2.3 There is a distributed algorithm for performing token distribution on a tree with an optimal number of messages P, after a preprocessing stage requiring O(n) time and O(n) messages.
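The convergecast of these three parameters can be sketched as follows (an illustrative encoding of ours; the recursion stands in for the bottom-up message flow).

```python
def token_parameters(tree, root, tokens):
    """Convergecast the token-distribution parameters of each vertex u:
    s[u] = tokens in Tu, n[u] = vertices in Tu, p[u] = s[u] - n[u].
    Also returns P = sum of |p[u]| over u != root, the message cost of
    the optimal redistribution.
    """
    s, n, p = {}, {}, {}

    def up(u):
        s[u] = tokens.get(u, 0)
        n[u] = 1
        for w in tree.get(u, []):
            up(w)
            s[u] += s[w]
            n[u] += n[w]
        p[u] = s[u] - n[u]

    up(root)
    P = sum(abs(p[u]) for u in p if u != root)
    return p, P
```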

3 Matroid problems and greedy algorithms

Let us start with a brief presentation of matroid problems. A subset system is specified as a pair ⟨A, S⟩, where A is a universe of m elements, and S is a collection of subsets of A, closed under inclusion (namely, if A ∈ S and B ⊆ A then also B ∈ S). The sets in S are called the independent sets of the system. A maximal independent set is called a basis. The optimization problem associated with the system is the following: given a weight function ω

assigning nonnegative weights to the elements of A, find the basis of maximum total weight. This problem may be intractable in general. A natural approach for solving the associated optimization problem is to employ a greedy approach. Two types of greedy approaches may be considered. The best-in greedy algorithm starts with the empty set and adds at each step the heaviest element that still maintains the independence of the set. Its dual, the worst-out greedy algorithm, starts with the entire universe, and discards at each step the lightest element whose removal still leaves us with a set containing some basis of the system. Unfortunately, these algorithms do not necessarily yield an optimal solution. A subset system is said to be a matroid if it satisfies the following property.

Replacement property: If A, B ∈ S and |B| = |A| + 1, then there exists some element α ∈ B \ A

such that A ∪ {α} ∈ S. (There are in fact a number of other equivalent definitions for matroids.) One of the most well-known examples for matroids is the minimum weight spanning tree (MST) problem, where the universe is the edge set of a graph, the independent sets are cycle-free subsets of edges, and the bases are the spanning trees of the graph. (The goal here is typically to find the spanning tree of minimum weight, rather than maximum weight, but this can still be formalized as a matroid optimization problem.) Another fundamental example is that of vector spaces, where the universe is the collection of vectors in d-dimensional space, and the notions of dependence of a set of vectors and of bases are defined in the usual algebraic sense. One important property of matroids is that both the best-in greedy algorithm and the worst-out greedy algorithm correctly solve every instance of the associated optimization problem. (In fact, these properties hold for a somewhat wider class of problems, named greedoids, which were thoroughly treated in [KL81, KL83, KL84b, KL84a].) The common representation of matroids is based not on explicit enumeration of the independent sets, but on a rule, or procedure, deciding for every given subset of A whether or not it is independent. Here is another well-known property of matroids that we will use later on. (See [PS82] for more on the subject.)

Proposition 3.1 All bases of a given matroid are of the same cardinality, called the rank of the matroid.

One source of difficulty in trying to adapt the greedy algorithms for solving matroid problems fast in a distributed fashion is that both algorithms are inherently "global" and sequential. First, they require going over the elements in order of weight, and secondly, they require us to be able to decide, for each element, whether after eliminating it we still have a basis (namely, an independent set of cardinality equal to the rank) in our set of remaining elements.
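For concreteness, the best-in greedy algorithm can be phrased against an independence oracle, the "rule, or procedure" mentioned above (a sketch; the oracle interface and identifiers are our assumptions).

```python
def best_in_greedy(universe, weight, independent):
    """Best-in greedy: scan elements by non-increasing weight and keep
    each element that preserves independence of the chosen set. For a
    matroid independence oracle this returns a maximum-weight basis.
    """
    basis = set()
    for x in sorted(universe, key=weight, reverse=True):
        if independent(basis | {x}):
            basis.add(x)
    return basis

# A tiny matroid for illustration: the uniform matroid of rank 2
# (any set of at most 2 elements is independent).
w = {'a': 4, 'b': 1, 'c': 3, 'd': 2}
basis = best_in_greedy(w, w.get, lambda S: len(S) <= 2)
```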
It is therefore useful to have a variant of the greedy algorithm which is more localized in nature. Such a variant was given for the MST problem [Tar83]. This algorithm makes use of the so-called red rule, which is based on the following fact.

Lemma 3.2 [Tar83] Consider an instance of the MST problem on a graph G = (V, E, ω), with a solution of (minimum) weight ω*. Consider a spanning subgraph G′ of G (with all the vertices

and some of the edges), and suppose that G′ still contains a spanning tree of weight ω*. Let C be a cycle in G′, and let e be the heaviest edge in C. Then G′ \ {e} still contains a spanning tree of weight ω*.

The lemma leads to a localized version of the worst-out greedy algorithm, avoiding both difficulties discussed above. This localized algorithm starts with the entire graph G, and repeatedly applies the red rule (stated next), until remaining with a spanning tree.

The "red rule": Pick an arbitrary cycle in the remaining graph, and erase the heaviest edge in that cycle.

Lemma 3.2 guarantees that once the process halts, the resulting tree is an MST of the graph G. Indeed, this localized greedy algorithm was previously used as a component in a fast distributed algorithm for computing an MST [GKP98, KP95a]. The proof of Lemma 3.2 relies on some specific properties of the MST problem, and therefore it is not immediately clear that a similar general rule applies to every matroid problem. Nonetheless, it turns out that a rule of this nature exists for all matroids (cf. [Law76]). To illustrate the relevance of matroid problems in distributed systems, let us consider two simple examples.
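Before turning to the examples, the red rule can be illustrated by one valid (sequential) execution order: examining edges from heaviest to lightest, since the heaviest remaining edge lying on any cycle is in particular the heaviest edge of that cycle. This is a sketch under our own encoding, not the paper's algorithm.

```python
def red_rule_mst(vertices, edges):
    """One execution of the red rule: scan edges from heaviest to
    lightest, and erase an edge whenever it still lies on a cycle of
    the remaining graph, i.e. its endpoints stay connected without it.
    edges: (weight, u, v) tuples with distinct weights for simplicity.
    """
    remaining = set(edges)

    def connected(a, b, skipped):
        adj = {x: [] for x in vertices}
        for (w, u, v) in remaining:
            if (w, u, v) != skipped:
                adj[u].append(v)
                adj[v].append(u)
        stack, seen = [a], {a}
        while stack:
            x = stack.pop()
            if x == b:
                return True
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        return False

    for e in sorted(edges, reverse=True):        # heaviest first
        if connected(e[1], e[2], skipped=e):     # e closes a cycle:
            remaining.remove(e)                  # apply the red rule
    return remaining
```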

Distributed resource allocation: Suppose that our distributed system features t types of resources, with a set Ri of ri resource units of each type i. At any given moment, some of the units are occupied and only mi ≤ ri are readily available. There is also a cost c(u) associated with each resource unit u. At a given moment, a process residing in node v decides to perform some task which requires it to get hold of some ki resource units of each type 1 ≤ i ≤ t (where possibly ki ≪ mi). Naturally, the process would prefer to identify the ki cheapest free units of each type i. Assume that there is a spanning tree T (rooted at v, for the sake of this example), so v can broadcast its needs to all nodes over T. We would now like to collect the necessary information (namely, the IDs of the ki cheapest available resource units of each type i) from all nodes to v. Note that the necessary information (concerning the free units of each type and their costs) is scattered over the different nodes of the system, and is not readily available in one place. Hence a naive solution based on collecting all the information to v over the tree T might cost O(Σ_i mi + Depth(T)) time. This problem can be solved by casting it as a simple kind of matroid problem, where the independence of any particular set in the universe is determined solely by counting the number of elements of each type. The methods developed in this paper for handling matroid problems are thus applicable, and yield a solution of optimal time O(Σ_i ki + Depth(T)).
In such a case, it is not possible to neatly separate and pipeline the treatment of the different resource types. Yet

in some cases, these more intricate dependencies can still be formulated as matroids, and hence are still solvable by our algorithm.
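The independence test underlying this example is easy to make concrete: a set of resource units is independent exactly when it contains at most the desired number of units of each type (a partition matroid). The following Python sketch is our own illustration (names like `is_independent` and `greedy_cheapest` are invented, not from the paper); it implements this counting oracle and the plain sequential greedy algorithm for picking the cheapest free units.

```python
from collections import Counter

def is_independent(units, capacity):
    """A set of resource units is independent iff, for each type i, it
    contains at most capacity[i] units (a partition matroid)."""
    counts = Counter(type_ for (type_, _cost) in units)
    return all(counts[t] <= capacity.get(t, 0) for t in counts)

def greedy_cheapest(units, capacity):
    """Greedily scan units by increasing cost, keeping each unit that
    preserves independence; for a partition matroid this yields the
    ki cheapest free units of each type i."""
    chosen = []
    for unit in sorted(units, key=lambda u: u[1]):   # by cost
        if is_independent(chosen + [unit], capacity):
            chosen.append(unit)
    return chosen

units = [("cpu", 5), ("cpu", 3), ("cpu", 9), ("disk", 2), ("disk", 7)]
want = {"cpu": 2, "disk": 1}
print(greedy_cheapest(units, want))   # → [('disk', 2), ('cpu', 3), ('cpu', 5)]
```

The sequential scan is exactly what the distributed algorithm avoids paying for: it touches all Σi mi units, whereas the upcast-based solution moves only Σi ki of them toward v.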

Task scheduling: Our next example concerns optimally scheduling unit-time tasks on a single processor. Suppose that the sites of our system generate a (distributed) collection S of m tasks 1 ≤ i ≤ m, all of which must be executed on the same processor in the system, with each requiring exactly one time unit to execute. The specification of task i includes a deadline di by which it is supposed to finish, and a penalty pi incurred if task i is not finished by time di. The goal is to find a schedule for the collection S on the processor, minimizing the total penalty incurred by the tasks for missed deadlines. In particular, we would like to decide on a maximal set of k ≤ m tasks that can be scheduled without violating their deadlines. Collecting all the information to the root of the tree T spanning the system and computing the schedule centrally may require O(m + Depth(T)) time. This problem is discussed in a number of places (cf. [Law76, CLR90]), and again, it is known that it can be formulated as a matroid problem. Hence our methods can be applied, yielding an O(k + Depth(T)) time solution.
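The classical sequential greedy solution to this problem (as in [Law76, CLR90]) sorts the tasks by decreasing penalty and places each one in the latest still-free unit-time slot not after its deadline; the tasks that fit form a maximum-penalty independent set of the underlying matroid. A Python sketch of this centralized baseline (our own illustration, not the distributed procedure):

```python
def schedule(tasks):
    """tasks: list of (deadline, penalty) pairs, one per unit-time task.
    Returns the sorted indices of the tasks that can be scheduled on
    time, chosen greedily by decreasing penalty; the remaining tasks
    incur the minimum possible total penalty."""
    order = sorted(range(len(tasks)), key=lambda i: -tasks[i][1])
    occupied = {}            # time slot -> already taken?
    on_time = []
    for i in order:
        t = tasks[i][0]      # try the deadline slot first
        while t >= 1 and occupied.get(t):
            t -= 1           # fall back to the latest free earlier slot
        if t >= 1:
            occupied[t] = True
            on_time.append(i)
    return sorted(on_time)

# (deadline, penalty) per task; a CLR90-style instance
tasks = [(4, 70), (2, 60), (4, 50), (3, 40), (1, 30), (4, 20), (6, 10)]
print(schedule(tasks))   # → [0, 1, 2, 3, 6]; tasks 4 and 5 pay 30 + 20
```

Since the independence of a task set depends only on counting how many selected tasks have deadline at most t, for each t, the elimination upcast machinery applies directly and removes the O(m) term.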

4 Optimal matroid basis completion

We now consider the following slightly more general problem. Consider an instance ω of the optimization problem associated with the matroid M = ⟨Ā, S⟩, with a solution of (maximum) weight ω*, and two disjoint sets A, R ⊆ Ā, where R is a non-maximal independent set. A completion for R in A is a set W ⊆ A such that R ∪ W is a basis. W is said to be an optimal completion if ω(R ∪ W) = ω*. The problem is to find an optimal completion for R in A, assuming such a completion exists. (Note that in particular, if R = ∅ then the problem reduces to the basic question of finding an optimal basis.) This is again doable in a localized manner by a generalized variant of the red rule. We need the following definition. Consider an instance ω of the optimization problem associated with the matroid M = ⟨Ā, S⟩, with a solution of (maximum) weight ω*. Consider two disjoint sets A, R ⊆ Ā, where R is a non-maximal independent set. Suppose that A contains an optimal completion for R. Let e ∈ A and D ⊆ A \ {e}. Then the pair (e, D) is called an elimination pair for R if it satisfies the following: (1) R ∪ D is independent, (2) R ∪ D ∪ {e} is dependent, and (3) e is no heavier than any element of D.

Lemma 4.1 For an instance ω, M and disjoint sets A, R as above, if (e, D) is an elimination pair for R then A \ {e} still contains an optimal completion for R.


Proof: Let Z be some optimal completion for R in A. If e ∉ Z then we are done, as Z ⊆ A \ {e}. Otherwise, let Y = Z \ {e}. Since R ∪ D is independent, D can be expanded into a completion D′ = D ∪ Q for R. (In particular, the process of expanding D by repeatedly adding new elements that preserve the independence of R ∪ D, as long as possible, yields a basis R ∪ D′, and by Proposition 3.1, |R ∪ D′| = rank(M).) We next argue that there exists some X ⊆ Y such that D″ = D ∪ X is a completion for R, i.e., R ∪ D″ is a basis of cardinality rank(M). This D″ can be obtained from D ∪ Q by repeatedly applying the replacement rule, at each stage discarding some element of Q from R ∪ D ∪ Q (thus getting an independent set of cardinality rank(M) − 1) and replacing it by some element of Z (relying on the fact that R ∪ Z is an independent set of cardinality rank(M)). The replacing element cannot be e, as R ∪ D ∪ {e} is dependent, hence it must come from Y. Note that |D″| = |Y| + 1, hence we may apply the replacement rule to R ∪ D″ vs. R ∪ Y, and conclude that there exists some element α ∈ D″ \ Y such that R ∪ Y ∪ {α} is an independent set. Setting Z′ = Y ∪ {α}, we now argue that Z′ is a completion satisfying the requirements of the lemma. Indeed, note that R ∪ Z′ is of cardinality |R| + |Y| + 1 = rank(M), hence it is a basis. Moreover, Z′ ⊆ A and e ∉ Z′. It remains to prove that R ∪ Z′ is optimal. This is shown as follows. As X ⊆ Y, necessarily α ∈ D. This implies that ω(α) ≥ ω(e), as e is no heavier than the elements in D. Hence ω(Z′) = ω(Y) + ω(α) ≥ ω(Y) + ω(e) = ω(Z), necessitating that ω(R ∪ Z′) = ω*. The claim follows.

We thus get a modified greedy algorithm, based on the following rule.

The "generalized red rule": Pick an element e in the remaining set A for which there exists a set D ⊆ A \ {e} such that (e, D) is an elimination pair, and erase e from A. Of course, the rule in itself does not specify how such an e can be found systematically; our distributed algorithm addresses precisely this point.
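To make the rule concrete, here is a brute-force Python sketch (our own illustration; the oracle-based search and all names are invented, and this is far from the efficient distributed realization). It looks for an elimination pair by growing D greedily from elements at least as heavy as the candidate e, so condition (3) holds by construction, and conditions (1) and (2) are checked with the matroid's independence oracle.

```python
def find_elimination_pair(A, R, indep, w):
    """Try to exhibit an elimination pair (e, D) for R within A:
    R + D independent, R + D + [e] dependent, and e no heavier than
    any element of D.  `indep` is an independence oracle on lists,
    `w` a weight function.  Returns (e, D) or None."""
    for e in sorted(A, key=w):                       # lightest candidates first
        D = []
        for x in sorted(A, key=w, reverse=True):     # heaviest first
            # only elements at least as heavy as e, to satisfy condition (3)
            if x != e and w(x) >= w(e) and indep(R + D + [x]):
                D.append(x)
        if indep(R + D) and not indep(R + D + [e]):  # conditions (1) and (2)
            return e, D
    return None

# Uniform matroid of rank 2: a set is independent iff it has at most
# 2 elements.  The lightest element 1 is eliminable against D = [4, 3].
indep = lambda S: len(S) <= 2
print(find_elimination_pair([1, 2, 3, 4], [], indep, lambda x: x))
# → (1, [4, 3])
```

By Lemma 4.1, erasing the returned e cannot destroy all optimal completions, which is exactly what the rule exploits.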

5 A distributed algorithm for optimal matroid basis completion

We now describe a distributed algorithm for solving the optimal matroid basis completion problem on a tree T. In the distributed setting, it is assumed that the elements of the (non-maximal) independent set R are known to all the vertices of the system (connected by a tree T), and that each of the elements of the set A is stored at some vertex of T. (We make no a-priori assumptions on the precise distribution of the elements.) An element can be sent in a single message. Recall that m = |A| and r = |R|. Denote the number of elements missing from R by Δ = rank(M) − r. In order to solve the problem, we require that the elements of the maximum-weight independent set be gathered at the root of the tree T. (The Δ completion elements added to R can then be broadcast by the root over the tree T in a pipelined manner in O(Δ + Depth(T)) additional steps.) A straightforward approach to solving this problem would be to upcast all the elements of A to the root, and solve the problem locally using one of the greedy algorithms. However, this solution would require O(m − r + Depth(T)) time for completing the upcast stage. Our aim in this section is to derive an algorithm requiring only O(Δ + Depth(T)) time. The algorithm uses a technique called elimination upcast, presented next.

5.1 The Elimination Upcast Algorithm

The algorithm presented next for the problem is a distributed implementation of the localized greedy algorithm for matroid basis completion. It is based on upcasting the elements toward the root in a careful way, attempting to eliminate as many elements as we can along the way, relying on the generalized red rule. Our elimination upcast procedure operates as follows. During the run, each vertex v on T maintains a set Qv of all the elements of A it knows of, including both those stored in it originally and those it learns of from its children, but not including the elements of R, which are kept separately. The elements of Qv are ordered by non-increasing weight. The vertex v also maintains a set Av of all the elements it has already upcast to its parent. Initially Av = ∅. A leaf v starts upcasting elements at pulse 0. An intermediate vertex v starts upcasting at the first pulse after it has heard from all its children. At each pulse i, v computes the set

Depv = {α ∈ Qv \ Av | R ∪ Av ∪ {α} is dependent}

and the set of candidates

Cv = Qv \ (Av ∪ Depv).

If Cv ≠ ∅ then v upcasts to its parent the heaviest element in Cv. Else, it stops participating in the execution. Finally, once the root r0 stops hearing from its children, it locally computes the solution to the problem, based on the elements in R ∪ Qr0.
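Ignoring the pulse-by-pulse pipelining (which is what yields the time bound), the elimination logic of the procedure can be simulated by a post-order traversal: each vertex merges what it holds with what its children forwarded, and repeatedly forwards the heaviest candidate whose addition keeps R ∪ Av independent. A Python sketch under these simplifications (all names are our own; this models correctness, not timing):

```python
def elimination_upcast(children, v, store, R, indep, w):
    """Post-order simulation of elimination upcast.  Qv collects v's own
    elements plus everything its children forwarded; Av is the list v
    forwards upward, heaviest first.  Candidates are the elements of
    Qv outside Av whose addition to R + Av stays independent."""
    Qv = list(store.get(v, []))
    for u in children.get(v, []):
        Qv += elimination_upcast(children, u, store, R, indep, w)
    Av = []
    while True:
        Cv = [x for x in Qv if x not in Av and indep(R + Av + [x])]
        if not Cv:                    # Cv empty: stop participating
            return Av
        Av.append(max(Cv, key=w))     # upcast the heaviest candidate

# Root 0 with leaf children 1 and 2; uniform matroid of rank 3
# (independent iff at most 3 elements) and partial basis R = [5].
children = {0: [1, 2], 1: [], 2: []}
store = {1: [1, 2, 9], 2: [4, 7]}
indep = lambda S: len(S) <= 3
print(elimination_upcast(children, 0, store, [5], indep, lambda x: x))
# → [9, 7]: the Δ = 2 completion elements reaching the root
```

Note how each vertex forwards at most Δ = rank(M) − r elements, since R ∪ Av must stay independent; this is what caps the upcast volume.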

5.2 Analysis

The correctness proof proceeds by showing that the elements upcast by each vertex v to its parent are in nonincreasing weight order, and that v upcasts elements continuously, until it exhausts all the elements from its subtree. It follows that once Cv = ∅, v will learn of no new elements to report. We use the following straightforward observations.

Lemma 5.1 For every vertex v, R ∪ Av is independent.

Lemma 5.2 Every vertex v starts upcasting at pulse L̂(v).

Call a node t-active if it upcasts an element to its parent on round t − 1.

Lemma 5.3 (a) For each t-active child u of v, the set Cv examined by v at the beginning of round t contains at least one element upcast by u. (b) If v upcasts to its parent an element of weight ω0 at round t, then all the elements v was informed of at round t − 1 by its t-active children were of weight ω0 or smaller. (c) If v upcasts to its parent an element of weight ω0 at round t, then any later element it will learn of is of weight ω0 or smaller.


(d) Node v upcasts elements to its parent in nonincreasing weight order.

Proof: By induction on the height of the tree, from the leaves up. For a leaf v, Claims (a), (b) and (c) hold vacuously, and (d) follows trivially from the rules of the procedure. Let us now consider an intermediate vertex v, and assume that the claims hold for each of its children. We start with Claim (a). Suppose v is at height h. The children of v in T are of height h − 1 or lower, hence by Lemma 5.2, all of them started upcasting at round h − 1 or earlier. Let A^t_v denote the set Av at the beginning of round t, and let σt = |A^t_v| = t − h. The elements of A^t_v were upcast by v during rounds h, ..., t − 1 if t > h. Consider a t-active child u of v. Since u was still active on round t − 1, it has transmitted continuously to v since round L̂(u), which, as discussed before, is at most h − 1. Therefore, |A^t_u| ≥ σt + 1. By Lemma 5.1, both R ∪ A^t_v and R ∪ A^t_u are independent. Hence by the replacement property there exists some element α ∈ A^t_u \ A^t_v such that R ∪ A^t_v ∪ {α} is independent. This element is therefore in Cv, hence Claim (a) holds. Claim (b) is proved as follows. Consider any t-active child u of v. Let β be the element upcast by u on round t − 1. (Note that β is not necessarily in Cv.) Let β′ be some element that was upcast by u at some round t′ ≤ t − 1 and still resides in Cv at the beginning of round t (Claim (a) ensures the existence of such a β′). By Claim (d) of the inductive hypothesis, ω(β) ≤ ω(β′). By the selection rule applied at vertex v, ω(β′) ≤ ω0; Claim (b) follows. Finally, Claim (c) follows from Claim (d) of the inductive hypothesis combined with Claim (b), and Claim (d) follows from Claim (c) and the element selection rule.

Lemma 5.4 A vertex v that has stopped participating will learn of no new candidate elements.

Proof: By induction on the structure of the tree from the leaves up, we prove that once Cv becomes empty, no new elements will become known to v.
The inductive step of the proof follows from Claim (a) of the last lemma, which implies that if Cv = ∅ on round t, then none of v's children was t-active, hence they have all stopped participating.

Lemma 5.5 The algorithm requires O(Δ + Depth(T)) time (where Δ = rank(M) − r), and the resulting set is a solution for the optimal basis completion problem.

Proof: To prove that the resulting set is a maximum-weight independent set, we need to argue that the set of elements collected at the root at the end of the upcast contains some optimal completion for R, despite the element eliminations performed along the way. This is shown by observing that each elimination performed by the algorithm conforms to the "generalized red rule". Indeed, consider the first round t in which a vertex v places an element α in the set Depv of eliminated elements, and consider the set A^t_v. Clearly, α will be put by v in the set Depv in every subsequent round as well. But this is justified for the following reason. First, the set R ∪ A^t_v ∪ {α} is dependent. Secondly, by Lemma 5.1 the set R ∪ A^t_v is independent. And finally, observe that α is no heavier than any element in A^t_v. To see why this last observation holds, we consider two separate cases. If α was known to v when it decided which element to upcast in round t − 1, then α was still in the candidate set Cv on that round, and therefore the claim follows from the fact that it was not sent to v's parent on that round, combined with Claim (d) of Lemma 5.3 (applied to round t − 1). Otherwise, if α has arrived at v only after v has made its round t − 1 transmission to its parent,


then the observation follows by Claim (c) of Lemma 5.3 (applied to round t − 1). Therefore by Lemma 4.1, eliminating α does not harm the solution. The bound on the run time is proved as follows. By Lemma 5.1 the root r0 receives at most Δ = rank(M) − r elements from each of its children. Moreover, the children upcast these elements to r0 in a pipelined fashion, without stopping in the middle. As r0 starts receiving these messages at time Depth(T) at the latest, the claim follows.

Theorem 5.6 1. There exists a distributed algorithm for computing the optimal completion for a partial basis of cardinality r on a tree T in time O(rank(M) − r + Depth(T)). 2. There exists a distributed algorithm for solving a matroid optimization problem on a tree T in time O(rank(M) + Depth(T)).

5.3 Distributed MST correction

Let us now explain how our algorithm enables us to correct an MST fast in the distributed setting. Suppose that we start with a weighted graph G = (V, E, ω) and a spanning BFS tree T. As discussed earlier, the subproblem of taking into account the edges in Wgood is easily solved in time O(kgood + Diam(G)). The other subproblem, of taking into account the edges in Wbad, can now be solved by first removing those edges from the MST M, resulting in a partial edge set M′, and then completing M′ into a minimum weight spanning tree using the elimination upcast algorithm. Observe that the assumptions necessary for the elimination upcast procedure are satisfied in our case, namely, each node stores the entire current MST and every edge e is stored at some node (in the obvious way, namely, each node knows the edges incident to itself). As mentioned earlier, despite the fact that we seek the minimum-weight solution rather than the maximum-weight one, this problem is still a matroid optimization problem, so the same algorithm applies (flipping the ordering in the procedure, or redefining the edge weights by setting ω′(e) = Ŵ − ω(e), where Ŵ = max_e ω(e)). Hence by Theorem 5.6, this part can be solved in time O(kbad + Diam(G)), and thus the entire problem is solvable in time O(kbad + kgood + Diam(G)).
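As a sequential sanity check of this reduction: for the graphic matroid, completing the partial edge set M′ into a minimum-weight spanning tree is just Kruskal's algorithm seeded with the surviving edges, and the flip ω′(e) = Ŵ − ω(e) turns the minimization into the maximization form the completion algorithm is stated for. A Python sketch (our own illustration; function names are invented):

```python
def flip_weights(edges):
    """Apply the transformation w'(e) = W_max - w(e), which converts a
    minimum-weight problem into an equivalent maximum-weight one."""
    W = max(w for (_u, _v, w) in edges)
    return [(u, v, W - w) for (u, v, w) in edges]

def complete_mst(n, partial, edges):
    """Complete the partial forest `partial` (the MST edges surviving the
    weight changes) into a minimum-weight spanning tree of the n-vertex
    graph given by `edges`, via Kruskal seeded with the partial forest."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    def union(x, y):
        rx, ry = find(x), find(y)
        if rx == ry:
            return False
        parent[rx] = ry
        return True
    tree = []
    for (u, v, w) in partial:               # keep the surviving edges M'
        union(u, v)
        tree.append((u, v, w))
    for (u, v, w) in sorted(edges, key=lambda e: e[2]):
        if union(u, v):                     # add only acyclic cheap edges
            tree.append((u, v, w))
    return tree

edges = [(0, 1, 1), (1, 2, 2), (2, 3, 3), (3, 0, 4), (0, 2, 5)]
print(complete_mst(4, [(0, 1, 1)], edges))
# → [(0, 1, 1), (1, 2, 2), (2, 3, 3)]
```

The distributed algorithm computes the same completion, but gathers only the O(kbad) needed edges to the root instead of all of E.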

Acknowledgement I am grateful to Guy Kortsarz for helpful comments.


References

[ACK88] Baruch Awerbuch, Israel Cidon, and Shay Kutten. Communication-optimal maintenance of dynamic trees. Unpublished manuscript, September 1988.

[CKMP90] Israel Cidon, Shay Kutten, Yishay Mansour, and David Peleg. Greedy packet scheduling. In Proc. 4th Workshop on Distributed Algorithms, pages 169-184, 1990. LNCS Vol. 486, Springer Verlag.

[CLR90] T.H. Cormen, C.E. Leiserson, and R.L. Rivest. Introduction to Algorithms. MIT Press/McGraw-Hill, 1990.

[GHS83] Robert G. Gallager, Pierre A. Humblet, and P.M. Spira. A distributed algorithm for minimum-weight spanning trees. ACM Trans. on Programming Lang. and Syst., 5(1):66-77, January 1983.

[GKP98] J. Garay, S. Kutten, and D. Peleg. A sub-linear time distributed algorithm for minimum-weight spanning trees. SIAM J. on Computing, 1998. To appear. Extended abstract appeared in 34th IEEE Symp. on Foundations of Computer Science, pages 659-668, November 1993.

[KL81] B. Korte and L. Lovasz. Mathematical structures underlying greedy algorithms. In: Fundamentals of Computation Theory, Lecture Notes in Computer Science, 117:205-209, 1981.

[KL83] B. Korte and L. Lovasz. Structural properties of greedoids. Combinatorica, 3:359-374, 1983.

[KL84a] B. Korte and L. Lovasz. Greedoids: a structural framework for the greedy algorithms. In: W. Pulleyblank, editor, Progress in Combinatorial Optimization, pages 221-243, 1984.

[KL84b] B. Korte and L. Lovasz. Greedoids and linear objective functions. SIAM J. Alg. and Disc. Meth., 5:229-238, 1984.

[KP95a] Shay Kutten and David Peleg. Fast distributed construction of k-dominating sets and applications. In Proc. 14th ACM Symp. on Principles of Distributed Computing, 1995.

[KP95b] Shay Kutten and David Peleg. Fault-local distributed mending. In Proc. 14th ACM Symp. on Principles of Distributed Computing, August 1995.

[KP95c] Shay Kutten and David Peleg. Tight fault-locality. In Proc. 36th IEEE Symp. on Foundations of Computer Science, October 1995.

[KR93] P.N. Klein and R. Ravi. A nearly best-possible approximation for node-weighted Steiner trees. In Proc. 3rd MPS Conf. on Integer Programming and Combinatorial Optimization, pages 323-332, 1993.

[Law76] E.L. Lawler. Combinatorial Optimization: Networks and Matroids. Holt, Rinehart and Winston, 1976.

[MPS91] Y. Mansour and B. Patt-Shamir. Greedy packet scheduling on shortest paths. In Proc. 10th ACM Symp. on Principles of Distributed Computing, August 1991.

[PS82] C.H. Papadimitriou and K. Steiglitz. Combinatorial Optimization: Algorithms and Complexity. Prentice-Hall, Inc., 1982.

[PU89] D. Peleg and E. Upfal. The token distribution problem. SIAM J. on Computing, 18:229-243, 1989.

[RVVN90] P.I. Rivera-Vega, R. Varadarajan, and S.B. Navathe. The file redistribution scheduling problem. In Data Eng. Conf., pages 166-173, 1990.

[Tar83] Robert E. Tarjan. Data Structures and Network Algorithms. SIAM, Philadelphia, 1983.
