Scalable Coarse Grained Parallel Interval Graph Algorithms

Xin He



Chun-Hsi Huang


Abstract

We present scalable coarse grained parallel algorithms for solving interval graph problems on a BSP-like model, the Coarse Grained Multicomputer (CGM). The problems we consider include: finding maximum independent set, maximum weighted clique, minimum coloring, and cut vertices and bridges. With scalability at n/p ≥ p^ε, ∀ε > 0 (here n denotes the total input size and p the number of processors), our algorithms for maximum independent set and minimum coloring use optimal computation time and O(log p) communication rounds, a count that is independent of the input size and grows only slowly with the number of processors. Equally scalable are our algorithms for finding maximum weighted clique, cut vertices and bridges, which use O(1) communication rounds and optimal local computation time, achieving both communication and computation optimality.

1 Introduction

1.1 Interval Graphs

An interval graph is an undirected graph G whose vertices can be put into one-to-one correspondence with a set ℐ of intervals of a linearly ordered set, such that two vertices are connected by an edge of G if and only if their corresponding intervals intersect [8]. (ℐ is called an interval model of G.) Owing to a wealth of real-world applications in fields such as biology, psychology, archaeology, protein sequencing, VLSI design and scheduling, interval graphs have been intensely researched [3, 10, 11, 12, 13, 14, 15, 16, 19, 20]. Many problems that are NP-complete on general graphs, such as finding maximum clique, maximum independent set and minimum coloring, have efficient sequential algorithms and efficient parallel algorithms on the PRAM model when restricted to interval graphs. Throughout this paper, we use G_ℐ to denote an interval graph whose interval model is ℐ. I_k = [l_k, r_k] (1 ≤ k ≤ n) specifies an interval in ℐ. Without loss of generality [3, 16, 20], all endpoints of intervals in ℐ are assumed distinct.

* Department of Computer Science and Engineering, State University of New York at Buffalo, Buffalo, NY 14260. Email: [email protected]
† Department of Computer Science and Engineering, State University of New York at Buffalo, Buffalo, NY 14260. Email: [email protected]

1.2 Coarse Grained Parallel Programming Models

The PRAM model has long been criticized because its theoretical algorithmic efficiency does not yield satisfactory performance predictions when algorithms are implemented on commercially available parallel machines. It has been pointed out that communication is the major bottleneck for the performance of parallel algorithms [18]. Valiant proposed a bridging model, BSP (Bulk Synchronous Parallel), based on a generic parallel machine model [18], defined as the combination of three attributes: (1) p processors, each containing O(n/p) local memory, where n refers to the problem size; (2) a router delivering messages point to point between pairs of processors; and (3) facilities for synchronizing all or part of the processors at regular intervals. BSP emphasizes that parallel computation should be modeled as a series of supersteps, rather than as individual message passing steps or shared memory accesses. A BSP algorithm consists of a sequence of such supersteps, separated by barrier synchronizations, each consisting of message transmissions and local computation. In each superstep, every processor can perform operations on its local data, as well as send/receive messages of total size O(n/p). Assuming different network broadcast and combining capabilities [9], variations of BSP have also been proposed. One variation is the CGM (Coarse Grained Multicomputer), discussed in [2, 4, 5, 6, 7]. A CGM algorithm consists of an alternating sequence of computation rounds and communication rounds [4], separated by barrier synchronizations. The router is capable of duplicating data items in messages, as long as the destination processors are consecutively numbered. (For this reason, CGM is also referred to as weak-CREW BSP in [9].) CGM algorithms require that, in each communication round, each processor send/receive a message of maximum size h ≤ n/p. (This is referred to as an h-relation in [5, 6].) The cost of each communication round is considered the same.
Therefore, the communication cost can be specified in terms of a single parameter: the number of communication rounds. Our purpose is to design scalable CGM algorithms, minimizing the number of communication rounds as well as the local computation time. If the best possible sequential algorithm for a given problem takes T_s(n) time, then ideally we would like to design a CGM algorithm using O(1) communication rounds and O(T_s(n)/p) total local computation time. Although algorithms for solving many computational geometry problems on CGM are known [4], few CGM algorithms for solving graph problems have been published. Our work on CGM graph algorithms is among these pioneering papers.

1.3 Organization of This Paper

In Section 2, we present an algorithm for parallel prefix sum on CGM. Our algorithm solves this basic problem in an optimal O(1) rounds with full scalability. It is used in later sections and may find applications in solving other problems. In Sections 3 through 6, we present algorithms for finding maximum independent set, maximum weighted clique, minimum coloring, and cut vertices and bridges, respectively. All these problems have important practical applications [8]. Our algorithms in this paper are for the CGM model; they can easily be modified to work on Valiant's original EREW BSP model.

2 Pre x Sum on CGM

Prefix sum is defined in terms of a binary, associative operator ⊕. The computation takes as input a sequence ⟨a_1, a_2, …, a_n⟩ and produces as output a sequence ⟨b_1, b_2, …, b_n⟩ such that b_1 = a_1 and b_k = b_{k−1} ⊕ a_k for k = 2, 3, …, n. The prefix sum can trivially be solved on CGM using O(1) communication rounds and optimal local computation time with scalability at n/p ≥ p. Our algorithm further achieves scalability at n/p ≥ p^ε for any ε > 0, while preserving both computation and communication optimality.

Meijer and Akl [12] presented a parallel prefix sum algorithm on a binary tree machine. We conceptually view a p-processor CGM model as a complete g-ary tree machine, with g^k leaf processors and (g^{k+1} − 1)/(g − 1) processors in total. (Here g is a parameter, used for grouping processors, to be determined later.) Each non-leaf processor and its rightmost child are actually the same processor. An example using 16 processors with g = 4 is shown in Figure 1. Assume that the initial input is a_0, a_1, …, a_{n−1} and the processors are labeled P_0, P_1, …, P_{p−1}. Processor P_i initially contains a_{i(n/p)}, a_{i(n/p)+1}, …, a_{i(n/p)+(n/p)−1}. For simplicity, we also assume n ≡ 0 (mod p) and p = g^k. Our algorithm can easily be extended to any n and p.

[Figure 1: A conceptual 4-ary tree w.r.t. a 16-processor CGM model.]

First, each processor P_i computes all prefix sums of a_{i(n/p)}, a_{i(n/p)+1}, …, a_{i(n/p)+(n/p)−1} and stores the results in local memory locations s_{i(n/p)}, s_{i(n/p)+1}, …, s_{i(n/p)+(n/p)−1}. Then, starting from the leaves, for each tree level we perform the following steps:

1. Each P_i sends s_{i(n/p)+(n/p)−1} to its parent processor.

2. Each parent processor computes prefix sums of the g values received from its children and stores these values at t_0, t_1, …, t_{g−1}.

3. Each parent processor broadcasts t_i to all leaf processors of the subtree rooted at C_{i+1}, for 0 ≤ i ≤ g − 2. (Here C_0, …, C_{g−1} denote the child processors ordered from left to right.)

4. Each leaf processor adds the received value to s_{i(n/p)}, s_{i(n/p)+1}, …, s_{i(n/p)+(n/p)−1}.

For general n and p, a slight modification for computing the parent processor number is needed, but the algorithm otherwise remains intact. An example with n = 26 and p = 9 is shown in Figures 2, 3 and 4.
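The level-by-level scheme above can be illustrated with a short sequential simulation (a sketch only: the p blocks stand in for the processors' local data, and each pass of the while loop corresponds to one tree level, i.e., one collect/broadcast communication round):

```python
def cgm_prefix_sum(data, p, g):
    """Sequential simulation of the g-ary tree prefix-sum scheme.

    Assumes len(data) % p == 0 and p is a power of g, as in the text.
    """
    n, blk = len(data), len(data) // p
    s = list(data)
    # Computation round: each "processor" computes local prefix sums.
    for i in range(p):
        for j in range(i * blk + 1, (i + 1) * blk):
            s[j] += s[j - 1]
    rounds, span = 0, 1          # span = number of blocks per child subtree
    while span < p:
        rounds += 1
        for base in range(0, p, span * g):
            running = None       # prefix over left siblings (parent's t_i)
            for c in range(g):
                lo, hi = base + c * span, base + (c + 1) * span
                if running is not None:   # parent's broadcast to the leaves
                    for j in range(lo * blk, hi * blk):
                        s[j] += running
                running = s[hi * blk - 1]  # prefix total of the group so far
        span *= g
    return s, rounds
```

With p = g^k the loop runs k = log_g p times, matching the round bound analyzed below Theorem-style in the text.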

We let g = min{⌈n/p⌉, p}. The required number of communication rounds is the height of the conceptual tree; with scalability measure n/p ≥ p^ε, this is ⌈log_g p⌉ ≤ log_g p + 1 ≤ log_{p^ε} p + 1 = 1/ε + 1 = O(1). In each round, the local computation time is clearly O(n/p). This gives:

Theorem 2.1 Parallel prefix sum on n input items can be optimally implemented on a p-processor CGM computer in O(1) communication rounds and optimal local computation time in each round, with scalability at n/p ≥ p^ε, ∀ε > 0.

[Figure 2: (n = 26, p = 9) Initially each processor stores ⌈n/p⌉ data items. The diagram shows the result after the first computation round. ([ac] denotes the prefix sum a ⊕ b ⊕ c, where ⊕ stands for any prefix computation operator.)]

[Figure 3: At each level, each parent processor collects (with one g-relation), computes, and broadcasts (with one g-relation) values to its descendant leaf processors.]

[Figure 4: At the next level, the parent processor again collects, computes and broadcasts values. (Here we are employing the weak-CREW [9] property of the CGM model.)]

Next we describe two examples using prefix sum operations, which will appear as subprocedures in some of the algorithms in the following sections.

Array Packing [1]: Given an array A = {a_1, a_2, …, a_n}, some of whose elements are labeled, pack all labeled elements at the front of the array. The corresponding CGM algorithm is a straightforward application of the prefix sum algorithm.

Interval Broadcasting [1]: Given an array A, some of whose elements are "leaders", broadcast the values associated with the leaders to the subsequent elements, up to, but not including, the next leader. The CGM solution is as follows:

1. Each leader in A holds a pair (i, d_i), where i is the leader's index in A and d_i is its value. Each non-leader holds a pair (−1, #), where # is a dummy value.

2. Perform a CGM prefix operation with the operator ⊕ defined as:
   (i, a) ⊕ (j, b) = (i, a) if j < i; (j, b) otherwise.
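As a concrete illustration, the interval-broadcasting reduction can be sketched sequentially (the left fold over `pairs` computes exactly the prefix operation that a parallel prefix-sum routine would perform; elements before the first leader keep the dummy value):

```python
def interval_broadcast(values, is_leader):
    """Interval broadcasting via a prefix operation: each element ends
    up with the value of the nearest leader at or before its position."""
    def op(x, y):
        i, a = x
        j, b = y
        # keep the pair with the larger index (the most recent leader)
        return (i, a) if j < i else (j, b)

    pairs = [(i, v) if is_leader[i] else (-1, None)
             for i, v in enumerate(values)]
    out, acc = [], (-1, None)
    for pr in pairs:             # a plain left fold; any parallel
        acc = op(acc, pr)        # prefix-sum routine computes the same
        out.append(acc[1])
    return out
```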

3 Maximum Independent Set

Given G = (V, E), a subset S of V is called an independent set of G if no two vertices of S are adjacent in G. An independent set S is maximum if G has no independent set S′ with |S′| > |S|. We are interested in finding a maximum independent set (MIS) of an interval graph G_ℐ. The following sequential algorithm for this purpose was presented in [10].

Seq-MIS Algorithm: First sort the 2n endpoints. Scan in ascending order until the first right endpoint is encountered. Output the interval I with this right endpoint as a member of the MIS and delete all intervals containing this point (including I). Repeat this process until no interval is left.

A PRAM algorithm for finding an MIS of interval graphs requiring O(log n) time and O(n²/log n) processors was presented in [3]. In this section, we give a scalable CGM algorithm for this purpose, using O(log p) communication rounds and optimal local computation time. Our algorithm also implies a cost-optimal PRAM algorithm using O(log n) time and O(n) processors, improving the algorithm in [3].

An interval I_i ∈ ℐ is redundant if there exists I_j ∈ ℐ such that l_i < l_j and r_j < r_i. Given ℐ, ℐ′ denotes the interval set after removing all redundant intervals of ℐ. Next we prove two technical lemmas.

Lemma 3.1 An MIS of G_ℐ′ is also an MIS of G_ℐ.

Proof: The lemma follows directly from the fact that when the algorithm Seq-MIS is performed on ℐ, the redundant intervals never appear in the MIS. Thus the behavior of Seq-MIS on ℐ and ℐ′ is identical. ∎

Lemma 3.2 The order of the intervals of ℐ′ sorted by left endpoints and that sorted by right endpoints are identical.

Proof: Let I_1, …, I_m be the intervals in ℐ′ sorted by left endpoints. If the right endpoints of I_1, …, I_m were not strictly increasing, there would exist at least one redundant interval in ℐ′, a contradiction. ∎

Let I_1, I_2, …, I_n be the intervals in ℐ sorted by left endpoints. Let R(I_i) be the index of I_i in ℐ when sorted by right endpoints. By Lemma 3.2, an interval I_i is redundant if and only if there exists j (j > i) such that R(I_j) < R(I_i). Based on Lemmas 3.1 and 3.2, it is clear that the intervals selected by algorithm Seq-MIS on ℐ are exactly those selected by the following:

Modified-Seq-MIS Algorithm:

• Construct ℐ′ from ℐ.

• Sort the remaining intervals by their left endpoints. Let I_1, …, I_m be the sorted intervals in ℐ′.

• For each index i, let link(i) be the smallest index such that r_i < l_{link(i)}. If no such index exists, let link(i) = null. Note that if we consider I_{link(i)} as the parent of I_i, then the intervals in ℐ′ are arranged into a forest structure F.

• Select the intervals on the path from I_1 to the root of the tree in F that contains I_1.
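A sequential sketch of Modified-Seq-MIS (intervals given as (l, r) pairs with distinct endpoints; the backward scan that removes redundant intervals and the linear scan that stands in for link(i) are simplifications of the parallel steps below):

```python
def modified_seq_mis(intervals):
    """Sketch of Modified-Seq-MIS on an interval model."""
    # Remove redundant intervals: sort by left endpoint; I is redundant
    # iff some later interval (larger l) has a smaller right endpoint,
    # so keep exactly the running minima of r scanned from the right.
    by_l = sorted(intervals)
    kept, best_r = [], float('inf')
    for l, r in reversed(by_l):
        if r < best_r:            # not redundant
            kept.append((l, r))
            best_r = r
    kept.reverse()                # sorted by l, hence also by r (Lemma 3.2)
    # Follow link(i): the smallest j with r_i < l_j; the path from I_1
    # to its tree root in the forest F is the MIS.
    mis, i = [], 0
    while i < len(kept):
        mis.append(kept[i])
        r_i = kept[i][1]
        j = i + 1
        while j < len(kept) and kept[j][0] < r_i:
            j += 1                # linear scan stands in for link(i)
        i = j
    return mis
```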

Our CGM MIS algorithm is based on the algorithm Modified-Seq-MIS. The basic data structure in our algorithm is an array L[1…n]. Each L[i] (1 ≤ i ≤ n) specifies an interval in ℐ and is a record consisting of the following fields:

• L[i].id: the interval id, within [1…n]
• L[i].l: the coordinate of the left endpoint of interval L[i]
• L[i].r: the coordinate of the right endpoint of interval L[i]

CGM-MIS Algorithm: Input: Records of L are evenly distributed among the p processors.

1. Renaming intervals:
   (a) Sort L by the l fields. (Each processor P_i now contains n/p records of sorted L, ordered by left endpoints.)
   (b) Each processor replaces the id fields of its local n/p records by their corresponding indices in sorted L.

2. Computing ℐ′:
   (a) Sort L by the r fields. (Each processor P_i now contains n/p records of sorted L, ordered by right endpoints.)

   (b) Perform a prefix sum on the id field of all intervals, with max as the operator.
   (c) Discard redundant intervals, namely those whose id fields are not prefix maximum values.
   (d) Sort the remaining intervals by the l fields and update the id field values as in Step 1. These intervals form ℐ′.

3. Computing the MIS of G_ℐ′:

   (a) For each L[i] of ℐ′, create two records: L_l[i] = (L[i].l, 0, L[i].id, L[i].id) and L_r[i] = (L[i].r, 1, L[i].id, 0). In addition, create a dummy record (∞, 1, null, null).
   (b) Sort these 2|ℐ′| + 1 records by their first fields.
   (c) Perform backward interval broadcasting for these 2|ℐ′| + 1 records on the 4th field, with nonzero values as leaders. (For each interval L[i] in ℐ′, the 4th field of the record L_r[i] now contains the index link(i).)
   (d) Perform array packing to pack, among the 2|ℐ′| + 1 records, those |ℐ′| + 1 records whose second fields are 1's. These |ℐ′| + 1 records conceptually form a forest F linked by the 4th field.
   (e) Among the |ℐ′| + 1 records in F, identify the records on the path from the first record L_r[1] to the root of the tree in F that contains L_r[1]. These records correspond to the intervals (whose ids are stored in the 3rd fields) in the MIS of G_ℐ.
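Steps 2(b)-(c) reduce redundancy removal to a prefix maximum; a sequential sketch of that reduction (the running maximum plays the role of the parallel prefix-max):

```python
def remove_redundant(intervals):
    """Keep the non-redundant intervals of an interval model: after
    renaming by left-endpoint rank, in right-endpoint order an interval
    survives iff its rank is a prefix (running) maximum."""
    idx = range(len(intervals))
    by_l = sorted(idx, key=lambda k: intervals[k][0])
    rank = {k: i for i, k in enumerate(by_l)}        # rename by left rank
    by_r = sorted(idx, key=lambda k: intervals[k][1])
    kept, running_max = [], -1
    for k in by_r:
        if rank[k] > running_max:                    # a prefix maximum
            running_max = rank[k]
            kept.append(intervals[k])
    return sorted(kept)
```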

Analysis: The sorting steps can be done using the CGM parallel merge sort of [9], taking O(1) communication rounds and optimal local computation time. The prefix sum in Step 2(b), backward interval broadcasting in Step 3(c) and array packing in Step 3(d) are applications of our CGM prefix sum algorithm. Step 3(e) is the only step that requires O(log p) communication rounds. For each record L_r[i] in the forest F, let L_r[i].min denote the minimum of the interval ids (the 3rd field of L_r[i]) over all records in the subtree rooted at L_r[i]. Those records with L_r[i].min = 1 are the records to be identified. All L_r[i].min values can be computed using the pointer jumping technique. Here, in order to enforce the weak-CREW property of CGM, before performing pointer jumping operations we need to relocate the records so that those of the same depth are stored in consecutive processors. To accomplish this, we apply the well-known method of Tarjan and Vishkin [17] to compute Euler tours of the trees in our forest F and then compute depths for all records. Another sort by the depth of each record finishes the relocation. Cáceres et al. [4] presented CGM algorithms for list ranking and Euler tour computation, both requiring O(log p) communication rounds. So far, on the CGM model, list operations usually contribute the dominating part of the required communication rounds. Thus we have:

Theorem 3.1 Finding a maximum independent set of an interval graph containing n intervals can be optimally solved on a p-processor CGM computer, using O(log p) communication rounds and optimal local computation time in each round, given n/p ≥ p^ε, ∀ε > 0.


4 Maximum Weighted Clique

A set S ⊆ ℐ is a clique of G_ℐ if every pair of intervals in S intersects. A maximum cardinality clique is a clique with the maximum number of elements. An interval graph is weighted if a positive real number w_i, the weight, is associated with each interval I_i. A maximum weighted clique (MWC) of a weighted interval graph is the clique with maximum total weight. When all intervals carry unit weights, an MWC is simply a maximum cardinality clique.

Our MWC algorithm uses an array L[1…2n]. Each record L[i] (1 ≤ i ≤ 2n) specifies either a left or a right endpoint of some interval, and has the following fields:

• L[i].coord: coordinate of the endpoint specified by L[i]
• L[i].end = l if L[i] is a left endpoint; r if L[i] is a right endpoint
• L[i].weight = the weight of the interval corresponding to L[i] if L[i].end = l; −(the weight of the interval corresponding to L[i]) if L[i].end = r
• L[i].id: the id of the interval corresponding to L[i], within [1…n]
• L[i].sum: weight sum (initially 0)

MWC Algorithm: Input: Each processor P_i stores 2n/p endpoints of L.

1. Sort L, using coord as the key.

2. Perform a prefix sum on the field L[i].weight and store the result at L[i].sum (i.e., L[i].sum = Σ_{j≤i} L[j].weight). Note that if L[i].end = l, then L[i].sum is the total weight of the clique consisting of all intervals that contain the point L[i].coord.

3. Compute the prefix max on L[i].sum, over all L[i]'s with L[i].end = l.

4. Assume L[k].sum is the maximum of the sum fields among all records. Broadcast L[k].coord to all other processors.

5. In parallel, each processor determines the intervals containing the point L[k].coord. Those intervals are in the MWC.

Analysis: Step 1 can be done using the CGM parallel merge sort of [9], taking O(1) communication rounds and optimal local computation time. Steps 2 and 3 are applications of the prefix sum operation. Step 4 uses one communication round for all processors to receive the broadcast endpoint. Step 5 uses a single computation round in O(2n/p) time. This gives:

Theorem 4.1 Finding a maximum weighted clique of an interval graph containing n intervals can be optimally solved on a p-processor CGM computer, using O(1) communication rounds and optimal local computation time in each round, given n/p ≥ p^ε, ∀ε > 0.
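The whole sweep can be sketched sequentially (signed-weight prefix sums over sorted endpoints; the running variables stand in for the parallel prefix-sum and prefix-max of Steps 2-3):

```python
def max_weighted_clique(intervals, weights):
    """MWC sweep sketch: prefix-sum signed weights over sorted endpoints;
    the maximum value at a left endpoint locates the clique."""
    events = []                        # (coord, signed weight)
    for k, (l, r) in enumerate(intervals):
        events.append((l, weights[k]))
        events.append((r, -weights[k]))
    events.sort()
    best, best_coord, running = float('-inf'), None, 0
    for coord, w in events:
        running += w                   # prefix sum of signed weights
        if w > 0 and running > best:   # prefix max over left endpoints
            best, best_coord = running, coord
    # every interval containing best_coord belongs to the clique
    clique = [k for k, (l, r) in enumerate(intervals) if l <= best_coord < r]
    return best, sorted(clique)
```

With unit weights the same sweep returns a maximum cardinality clique, as noted above.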

5 Minimum Coloring

Given ℐ, a partition P = {S_1, S_2, …, S_k} of ℐ is a coloring of G_ℐ if each S_i (1 ≤ i ≤ k) is an independent set of G_ℐ. A coloring with minimum k is called a minimum coloring [8]. It is known that ω(G_ℐ), the maximum clique size, is equal to χ(G_ℐ), the chromatic number (the fewest number of colors needed to properly color the vertices of G_ℐ) [8]. Yu [20] devised an optimal algorithm for computing a minimum coloring of interval graphs on the EREW PRAM. It works as follows. Let ℐ = {I_1, …, I_n} be an interval set sorted by left endpoints. Let I_{R(i)} denote the interval with the i-th smallest right endpoint. For each i, compute link(i):

   link(i) = i + ω(G_ℐ) if i + ω(G_ℐ) < n + 1; n + 1 otherwise.

Then, for any i (1 ≤ i ≤ n), link the interval I_{R(i)} to the interval I_{link(i)} if link(i) ≠ n + 1; otherwise, link I_{R(i)} to null. There will be ω(G_ℐ) separate linked lists, and by assigning each list a distinct color we obtain a minimum coloring. The correctness proof can be found in [20].

Our CGM algorithm for finding a minimum coloring of interval graphs is based on the above observation. In addition to prefix computation and sorting, in order to propagate colors along these ω(G_ℐ) lists, our algorithm also needs the pointer jumping technique, which requires O(log p) communication rounds. The algorithm uses an array L[1…n]. Each record L[i] specifies an interval in ℐ and contains the following fields:

• L[i].id: the interval id, within [1…n]
• L[i].l: the coordinate of the left endpoint of the interval corresponding to L[i]
• L[i].r: the coordinate of the right endpoint of the interval corresponding to L[i]
• L[i].link = i + ω(G_ℐ) if i + ω(G_ℐ) < n + 1; n + 1 otherwise (initially null)
• L[i].color: the color assigned to interval L[i] (initially i)

Another array R[1…n] is needed in our algorithm:

• R[i]: the id of the interval containing the i-th smallest right endpoint

Minimum Coloring Algorithm: Input: L and R are evenly distributed among the p processors.

1. Renaming intervals: Sort L using l as the key field. Each processor replaces the id fields of its local n/p records by their corresponding indices in sorted L.

2. Computing array R:
   (a) Sort L using r as the key field.

   (b) Each processor computes R[i] = L[i].id for all its local records.

3. All records of L resume their positions from Step 1.

4. Compute ω(G_ℐ) by using the MWC algorithm of Section 4.

5. Each processor computes L[i].link for all its local intervals, using the above formula.

6. Assign a color to each interval:
   (a) Consider the record corresponding to I_{R[i]} as "linked" to the record corresponding to I_{L[i].link}. These records form ω(G_ℐ) linked lists.
   (b) In parallel, each record searches for the root of its list.
   (c) In parallel, each record replaces its color value by the color of the root record.
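Yu's link construction can be simulated sequentially (a sketch: ω is computed by a density sweep rather than the MWC call of Step 4, and the while loop in `root` plays the part of pointer jumping):

```python
def min_coloring(intervals):
    """Color an interval model with omega colors via Yu's link lists.
    intervals: (l, r) pairs with distinct endpoints."""
    n = len(intervals)
    order_l = sorted(range(n), key=lambda k: intervals[k][0])
    order_r = sorted(range(n), key=lambda k: intervals[k][1])
    # omega = maximum clique size = maximum endpoint density
    events = sorted([(l, 1) for l, _ in intervals] +
                    [(r, -1) for _, r in intervals])
    omega = depth = 0
    for _, t in events:
        depth += t
        omega = max(omega, depth)
    # link the interval with i-th smallest right endpoint (0-based i)
    # to the interval with left-endpoint rank i + omega, if it exists
    link = {}
    for i in range(n):
        j = i + omega
        link[order_r[i]] = order_l[j] if j < n else None
    def root(k):                  # follow the list to its root
        while link[k] is not None:
            k = link[k]
        return k
    color = {k: root(k) for k in range(n)}   # root's id = list's color
    return omega, color
```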

Analysis: Steps 1 and 2 both involve global sorting and, as mentioned in previous sections, can be finished using optimal local computation time and O(1) communication rounds. Step 3 consumes a single communication round with an (n/p)-relation. Step 4 takes O(1) communication rounds and optimal local computation time by Theorem 4.1; broadcasting ω(G_ℐ) to all processors takes another single communication round. Step 5 clearly takes O(n/p) local computation time. Step 6(a) finishes in a single computation round. Step 6(b) uses pointer jumping on CGM, and takes O(log p) communication rounds (for inter-processor pointer jumps) and O(n/p) local computation time in each round (for intra-processor pointer jumps). Step 6(c) finishes in a single computation round. This concludes:

Theorem 5.1 Finding a minimum coloring of an interval graph containing n intervals can be optimally solved on a p-processor CGM computer, using O(log p) communication rounds and optimal local computation time in each round, given n/p ≥ p^ε, ∀ε > 0.

6 Cut Vertices and Bridges A vertex v of G= is a cut vertex if removal of v results in a graph having more components than G= has. Likewise, an edge with such property is called a bridge. A biconnected component of G= is a maximal connected component of G= which contains no cut vertices. In [16], Sprague proposed an optimal EREW PRAM algorithm for nding cut vertices and bridges of interval graphs, using n n processors in O(log n) time. This algorithm heavily relies on the PRAM pre x sum operations. Here, by employing our CGM pre x sum procedure, we can easily implement an optimal CGM algorithm for nding cut vertices, bridges, and biconnected components in interval graphs. We use an array L[1    2n] for endpoints of intervals in =. Each L[i] (1  i  2n) speci es either a left or a right endpoint of some interval. Later in the algorithm, after sorting the 2n coordinates, we will further replace these original coordinates by their indices in sorted array L. From then on, all coordinates are within range [1    2n]. Detailed data structures are described as below: log


• L[i].coord: coordinate of L[i]
• L[i].end = l if L[i] is a left endpoint; r if L[i] is a right endpoint
• L[i].type = 1 if L[i].end = l; −1 if L[i].end = r
• L[i].id: the id of the interval corresponding to L[i], within [1…n]
• L[i].density: the number of intervals containing L[i].coord (initially 0)
• L[i].e = (id, r_id) if L[i].end = l; (−1, −1) if L[i].end = r, where r_id is the right endpoint coordinate of the interval I_id
• L[i].f = L[1].e ⊕ L[2].e ⊕ … ⊕ L[i].e, where (x, a) ⊕ (y, b) = (x, a) if a > b; (y, b) otherwise (initially e, to be computed in Step 4)
• L[i].g: to be assigned the cut vertex (denoted by interval id) in Step 5 (initially −1)

An important property, on which our CGM cut vertex algorithm is based, is established in [16]:

Lemma 6.1 An interval I_i is a cut vertex if and only if there exists j such that l_i ≤ j < r_i and L[j − 1].density = L[j + 1].density = 2, L[j].density = 1.

Cut Vertex Algorithm: Input: Each processor P_i stores 2n/p records of L.

1. Globally sort L, using coord as the key.

2. Each processor replaces the coord fields of its local 2n/p records by their corresponding indices in sorted L.

3. Compute density by a prefix sum: L[i].density = Σ_{j≤i} L[j].type. (An interval I_k is called a covering interval of a point x if l_k ≤ x < r_k. L[i].density now contains the number of covering intervals of L[i].coord.)

4. For each i such that L[i].density > 0, find a covering interval as follows: compute the f field for each L[i]. The first component of f then specifies the interval whose left endpoint appears at or before i and whose right endpoint extends farthest to the right.

5. Assign the first component of f to g for those L[j]'s satisfying (a) L[j].density = 1 and (b) L[j − 1].density = L[j + 1].density = 2.

6. Generate all cut vertices (using array packing for duplicate removal and packing).

Analysis: Step 1 is an application of Goodrich's CGM sorting [9] and uses O(1) communication rounds and optimal local computation time. Each processor now contains its part of 2n/p endpoints of sorted L. Step 2 assigns index i to L[i].coord, making the 2n endpoints consecutively numbered from 1 to 2n; each processor finishes this step sequentially in one computation round, using O(n/p) computation time. Step 3 is also a prefix sum operation and takes O(1) communication rounds. Step 4 is a prefix maxima operation. Step 5 needs one communication round for those L[i]'s whose density fields are 1 to obtain L[i − 1].density and L[i + 1].density. The required communication rounds are therefore dominated by the sorting and prefix sum operations, which take O(1) rounds, and the local computation time is also optimal. Therefore, we conclude:

Theorem 6.1 Finding the cut vertices of an interval graph containing n intervals can be optimally solved on a p-processor CGM computer, using O(1) communication rounds and optimal local computation time in each round, given n/p ≥ p^ε, ∀ε > 0.
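The density test of Lemma 6.1 can be sketched sequentially (coordinates renamed to 1…2n as in Steps 1-2, density computed by a running ±1 sum as in Step 3; a sketch for a connected interval graph):

```python
def cut_vertices(intervals):
    """Return the indices of the cut-vertex intervals, via Lemma 6.1."""
    n = len(intervals)
    pts = sorted(x for iv in intervals for x in iv)
    rank = {x: i + 1 for i, x in enumerate(pts)}      # coords -> 1..2n
    lefts = {rank[l] for l, r in intervals}
    # density[j] = number of intervals I_k with l_k <= j < r_k,
    # computed as a prefix sum of the +/-1 type fields
    density = [0] * (2 * n + 2)
    for j in range(1, 2 * n + 1):
        density[j] = density[j - 1] + (1 if j in lefts else -1)
    cuts = []
    for k, (l, r) in enumerate(intervals):
        li, ri = rank[l], rank[r]
        if any(density[j] == 1 and density[j - 1] == 2 and density[j + 1] == 2
               for j in range(li, ri)):
            cuts.append(k)
    return cuts
```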

Optimal CGM algorithms for finding bridges and biconnected components can be designed in a similar fashion.

7 Concluding Remarks

We have implemented scalable CGM prefix computations in a communication-efficient way and, by incorporating a sorting subroutine, provided algorithms for finding maximum weighted cliques, cut vertices and bridges, also in a communication-efficient manner. Since prefix computation and sorting are widely applied in interval graph algorithms, many other problems can be solved on CGM computers by similar approaches. Pointer jumping is another important procedure commonly used in parallel algorithms. On CGM computers, algorithms using pointer jumping usually take O(log p) communication rounds, a count that is independent of the input size and grows only slowly with the number of processors. Our algorithms for finding maximum independent set and minimum coloring jointly employ prefix computation, sorting and pointer jumping. These algorithms are simple and can easily be implemented on commercially available parallel computers with predictable performance.

References

[1] Selim G. Akl. Parallel Computation: Models and Methods. Prentice Hall, 1997.

[2] Mikhail J. Atallah and Danny Z. Chen. Parallel Geometric Algorithms in Coarse-Grain Network Models. In Proc. 4th Annual International Computing and Combinatorics Conference (COCOON), 1998, 55-64.

[3] Alan A. Bertossi and Maurizio A. Bonuccelli. Some Parallel Algorithms on Interval Graphs. Discrete Applied Mathematics, 16:101-111, 1987.

[4] E. Cáceres, F. Dehne, A. Ferreira, P. Flocchini, I. Rieping, A. Roncato, N. Santoro, and S. W. Song. Efficient Parallel Graph Algorithms for Coarse Grained Multicomputers and BSP. In Proc. 24th International Colloquium on Automata, Languages and Programming, 1997, 390-400.

[5] F. Dehne, A. Fabri, and A. Rau-Chaplin. Scalable Parallel Geometric Algorithms for Coarse Grained Multicomputers. In Proc. 9th ACM Annual Symposium on Computational Geometry, 1993, 298-307.

[6] F. Dehne, A. Fabri, and A. Rau-Chaplin. Scalable Parallel Computational Geometry for Coarse Grained Multicomputers. International Journal on Computational Geometry and Applications, 6(3):379-400, 1996.

[7] F. Dehne and S. Goetz. Practical Parallel Algorithms for Minimum Spanning Trees. In Proc. 17th IEEE Symp. on Reliable Distributed Systems, Advances in Parallel and Distributed Systems, 1998, 366-371.

[8] Martin Charles Golumbic. Algorithmic Graph Theory and Perfect Graphs. Academic Press, 1980.

[9] Michael T. Goodrich. Communication-Efficient Parallel Sorting. In Proc. 28th ACM Symp. on Theory of Computing, 1996, 247-256.

[10] U. I. Gupta, D. T. Lee, and J. Y.-T. Leung. Efficient Algorithms for Interval Graphs and Circular-Arc Graphs. Networks, 12:459-467, 1982.

[11] J. Mark Keil. Finding Hamiltonian Circuits in Interval Graphs. Information Processing Letters, 20:201-206, 1985.

[12] Henk Meijer and Selim G. Akl. Optimal Computation of Prefix Sums on a Binary Tree of Processors. International Journal of Parallel Programming, 16(2):127-136, 1987.

[13] Stephan Olariu. An Optimal Greedy Heuristic to Color Interval Graphs. Information Processing Letters, 37:21-25, 1991.

[14] G. Ramalingam and C. Pandu Rangan. New Sequential and Parallel Algorithms for Interval Graph Recognition. Information Processing Letters, 34:215-219, 1990.

[15] Sanjeev Saxena and N. Malahal Rao. Parallel Algorithms for Connectivity Problems on Interval Graphs. Information Processing Letters, 56:37-44, 1995.

[16] Alan P. Sprague. Optimal Parallel Algorithms for Finding Cut Vertices and Bridges of Interval Graphs. Information Processing Letters, 42:229-234, 1992.

[17] Robert E. Tarjan and Uzi Vishkin. An Efficient Parallel Biconnectivity Algorithm. SIAM Journal on Computing, 14(4):862-874, 1985.

[18] Leslie G. Valiant. A Bridging Model for Parallel Computation. Communications of the ACM, 33(8):103-111, 1990.

[19] Ming-Shing Yu and Cheng-Hsing Yang. An Optimal Parallel Algorithm for the Domatic Partition Problem on an Interval Graph Given Its Sorted Model. Information Processing Letters, 44:15-22, 1992.

[20] Ming-Shing Yu and Cheng-Hsing Yang. A Simple Optimal Algorithm for the Minimum Coloring Problem on Interval Graphs. Information Processing Letters, 48:47-51, 1993.

XIN HE received the Ph.D. degree in computer and information science in 1987, the M.S. degree in computer and information science in 1984, and the M.S. degree in mathematics in 1981, all from Ohio State University. He attended the Graduate School of the Academia Sinica, Beijing, China, from 1978 to 1980, majoring in mathematics. He attended Lanzhou University, Lanzhou, China, in 1978, majoring in chemistry. In the fall of 1987, Dr. He joined the faculty of the Department of Computer Science at the State University of New York at Buffalo, where he is currently an associate professor and director of graduate studies. His primary research interests are graph algorithms, parallel algorithms, and combinatorics. Dr. He is a member of the Association for Computing Machinery.

CHUN-HSI HUANG received his B.S. in computer and information science in 1989 from National Chiao-Tung University in Taiwan. He received his M.S. in computer science from the University of Southern California in 1993. Since 1996, he has been a doctoral student in the computer science and engineering department of SUNY at Buffalo. Prior to attending UB, he was a software engineer at the Multimedia B.D. of Acer Inc., Taipei, Taiwan. His current research interests are in parallel graph algorithms.

