HIGHLY PARALLELIZABLE PROBLEMS
(Extended Abstract)

Omer Berkman 1,4   Dany Breslauer 1,2   Zvi Galil 1,2   Baruch Schieber 3   Uzi Vishkin 1,4,5
Summary of Results. We establish that several problems are highly parallelizable. For each of these problems, we design an optimal O(log log n) time parallel algorithm on the Common CRCW PRAM model, which is the weakest among the CRCW PRAM models. These problems include:
- all nearest smaller values,
- preprocessing for answering range maxima queries,
- several problems in Computational Geometry,
- string matching.
Until recently, such algorithms were known only for finding the maximum and for merging. A new lower bound technique is presented, showing that some of the new O(log log n) upper bounds cannot be improved even when non-optimal algorithms are used. The technique extends Ramsey-like lower bound argumentation due to Meyer auf der Heide and Wigderson [MW-85]. Its most interesting applications are to Computational Geometry problems, for which no previous lower bounds were known.
1. Department of Computer Science, Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv, Israel 69978.
2. Department of Computer Science, Columbia University, New York, NY 10027. The research of these authors was supported in part by NSF grants CCR-86-05353 and CCR-88-14977.
3. IBM Research Division, Thomas J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY 10598.
4. Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742.
5. The research of this author was supported by NSF grant CCR-86-15337 and ONR grant N00014-85-K-0046 at the Department of Computer Science, Courant Institute, New York University.
1. Introduction

It is commonly agreed that the class NC contains exactly the problems that are amenable to parallel computation; it corresponds to the class P of problems with feasible sequential solutions. But establishing that a problem is in NC does not really prove that the problem has a good parallel algorithm. (This is even more so than the fact that a problem is in P does not mean that it has an efficient sequential algorithm.) Even being in NC^1 is not enough, because the O(log n) time solution might use many processors. Optimal parallel algorithms are those with a linear time-processor product and linear space; they correspond to linear time and space sequential algorithms. We call a problem highly parallelizable if it can be solved by an optimal parallel algorithm in O(log log n) time.

We restrict attention to the CRCW PRAM model because, in general, sublogarithmic time requires the concurrent-write model (see [CDR-86]). We use the weakest Common CRCW PRAM model, in which the only concurrent writes allowed are of the value one.

Several highly parallelizable problems are known for which there is a constant-time optimal algorithm. They include: (1) the OR and AND functions (a sketch of the OR computation on the Common model appears below); (2) maximum finding for "small integers" [FRW-88] (small integers are integers in the range [1, ..., n^c], for some constant c); (3) deciding whether n integers drawn from the domain [1, ..., M] are pairwise distinct, given a memory of size M [FMW-87]; and (4) log n-coloring of a cycle [CV-86]. Until recently only two other highly parallelizable problems were known: (general) maximum finding and merging of sorted arrays.
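To make the Common CRCW PRAM model concrete, the following is a minimal sequential simulation (our illustration, not taken from the paper) of the constant-time OR computation mentioned in (1): every processor whose input bit is 1 writes the value one into a single shared cell, and since all writers write the same value, the concurrent write is legal in the Common model.

```python
# Sequential stand-in (illustration only) for one Common CRCW PRAM step
# computing the OR of n input bits in constant time.

def common_crcw_or(bits):
    """Simulate n processors, one per input bit, on a Common CRCW PRAM.

    Every processor whose bit is 1 concurrently writes the value one into
    the same shared cell. All writers write the same value, so the
    concurrent write is permitted by the Common model.
    """
    shared = 0  # shared memory cell, initialized to zero
    for bit in bits:  # sequential stand-in for the single parallel step
        if bit == 1:
            shared = 1  # every writing processor writes the same value
    return shared

assert common_crcw_or([0, 0, 1, 0]) == 1
assert common_crcw_or([0, 0, 0, 0]) == 0
```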
In both cases, Valiant [Va-75] first described an O(log log n) algorithm in the parallel comparison model. His algorithms were later implemented as O(log log n) time optimal algorithms by Shiloach and Vishkin [SV-81] for maximum finding (on a CRCW PRAM) and by Kruskal [Kr-83] for merging (on a CREW PRAM). Note that Beame and Hastad [BHa-87] showed that computing the parity of n bits requires Ω(log n / log log n) time on the CRCW PRAM model with any polynomial number of processors. Thus, this simple problem is not highly parallelizable, and it is quite surprising that any problem is! The contribution of this paper is showing that other problems, with more elaborate structure, are highly parallelizable.

The first problem is new. It is called the All Nearest Smaller Values problem, or ANSV, and is defined as follows.

The All Nearest Smaller Values (ANSV) problem. Given an array A = (a_1, a_2, ..., a_n) of elements from a totally ordered domain, find, for each a_i, 1 ≤ i ≤ n, the nearest element to its left and the nearest element to its right that are less than a_i, if such elements exist. That is, for each 1 ≤ i ≤ n, find the maximal 1 ≤ j < i such that a_j < a_i, and the minimal i < k ≤ n such that a_k < a_i. (A serial sketch of the problem appears after the lower-bound discussion below.)

[...] X(v_{n+i}), for i = 2, ..., n. We claim that the nearest neighbor of a vertex w of P is a vertex z ≠ w such that |X(w) − X(z)| is minimal. In Theorem 3.2 we relate the strong CRCW PRAM model to the parallel comparison model, defined in [Va-75].

Theorem 3.2: If there exists an algorithm for the ANN (All Nearest Neighbors) problem that runs in T(n) time using m processors on a strong CRCW PRAM, then there exists an algorithm for merging two ordered lists of size n each that runs in T(n) time and performs at most m 2^{2T(n)} simultaneous comparisons in a comparison model.

Before proving the theorem we show how the lower bound follows. Suppose, by way of contradiction, that there exists an algorithm for the ANN problem that runs in o(log log n) time using n log^c n processors, for some constant c. By Theorem 3.2, there exists an algorithm for merging two ordered lists of size n each that runs in o(log log n) time and performs at most n log^{c+2} n simultaneous comparisons in a comparison model; indeed, with m = n log^c n and T(n) = o(log log n), the comparison count m 2^{2T(n)} = n log^c n · 2^{o(log log n)} = n log^c n · (log n)^{o(1)} is at most n log^{c+2} n. This contradicts the corresponding lower bound of [BHo-85] and [HH-82] for merging in a parallel comparison model. This concludes the proof of Theorem 3.1.
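To make the ANSV problem concrete: it has a well-known linear-time serial solution by a stack scan. The following is a minimal sketch of that standard serial method (our illustration; the paper's contribution is the optimal O(log log n) parallel algorithm, which is not shown here).

```python
def all_nearest_smaller_values(a):
    """For each a[i], return the pair (index of the nearest element to the
    left that is smaller than a[i], index of the nearest smaller element
    to the right), with None where no such element exists.

    Standard linear-time serial stack method: each index is pushed and
    popped at most once per scan.
    """
    n = len(a)
    left, right = [None] * n, [None] * n

    stack = []  # indices whose values form a strictly increasing sequence
    for i in range(n):  # left-to-right scan: nearest smaller to the left
        while stack and a[stack[-1]] >= a[i]:
            stack.pop()
        left[i] = stack[-1] if stack else None
        stack.append(i)

    stack = []
    for i in range(n - 1, -1, -1):  # right-to-left: nearest smaller to the right
        while stack and a[stack[-1]] >= a[i]:
            stack.pop()
        right[i] = stack[-1] if stack else None
        stack.append(i)

    return list(zip(left, right))

# Example: a[2] = 4 has nearest smaller elements a[1] = 1 and a[3] = 1.
print(all_nearest_smaller_values([3, 1, 4, 1, 5]))
# [(None, 1), (None, None), (1, 3), (None, None), (3, None)]
```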
The proof of Theorem 3.2 is based on a lemma from [MW-85]. Suppose we are given a strong CRCW PRAM algorithm for the ANN problem. Using Claim 3.1 below we show how to construct from this algorithm a comparison model algorithm for merging. Informally, Claim 3.1 states that there exists an infinite set of integers S with the following property. Consider the set of all input polygons whose vertices have x coordinates taken from S. (Later, we refer to this set as the set of inputs taken from S.) For this set of inputs and for each processor p_i, 1 ≤ i ≤ m, the indices of the vertices that become known to p_i at time t+1 depend only on the order of the (x coordinates of the) vertices known at time t. (A vertex is known to a processor p_i if it is in its local memory.)

Let us formalize Claim 3.1. For a given input polygon we make the following definitions.

(1) For 1 ≤ i ≤ m and 0 ≤ t ≤ T(n), let K_i^t be the set consisting of all indices of the vertices known to processor p_i at time t. Denote K^t = (K_1^t, ..., K_m^t).

(2) For 0 ≤ t ≤ T(n), let Π^t be a partial order on the indices {1, ..., 2n}. Π^t is the closure of the union of m+1 (consistent) partial orders π_0, π_1^t, ..., π_m^t, defined as follows (a sketch of the closure computation appears below). π_0 is the partial order induced by the counterclockwise direction of the input vertices. That is, 1 [...]
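For illustration, the closure in definition (2) is the transitive closure of the union of the m+1 partial orders, viewed as sets of ordered pairs over {1, ..., 2n}. Here is a minimal sketch (ours, with the explicit pair encoding of each partial order as our assumption), computing that closure serially:

```python
def closure_of_union(partial_orders, elements):
    """Transitive closure of the union of (consistent) partial orders.

    Each partial order is a set of (a, b) pairs meaning a precedes b;
    this illustrates how Pi^t is assembled from pi_0, pi_1^t, ..., pi_m^t.
    """
    # reach[a] = set of elements known to come after a.
    reach = {e: set() for e in elements}
    for order in partial_orders:
        for a, b in order:
            reach[a].add(b)

    # Warshall-style closure: if a precedes k and k precedes b, then a
    # precedes b. The intermediate element k is the outer loop.
    for k in elements:
        for a in elements:
            if k in reach[a]:
                reach[a] |= reach[k]

    return {(a, b) for a in elements for b in reach[a]}

# pi_0 orders 1 < 2 < 3 (counterclockwise); one processor also learned 3 < 4.
pi_0 = {(1, 2), (2, 3)}
pi_1 = {(3, 4)}
print(sorted(closure_of_union([pi_0, pi_1], [1, 2, 3, 4])))
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```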