An Optimal Parallel Algorithm for Merging using Multiselection

Narsingh Deo
Amit Jain
Muralidhar Medidi
Department of Computer Science, University of Central Florida, Orlando, FL 32816
Keywords: selection, median, multiselection, merging, parallel algorithms, EREW PRAM.
1 Introduction
We consider the problem of merging two sorted arrays A and B on an exclusive read, exclusive write parallel random access machine (EREW PRAM; see [8] for a definition). Our approach consists of identifying elements in A and B which would have the appropriate ranks in the merged array. These elements partition the arrays A and B into equal-size subproblems which can then be assigned to each processor for sequential merging. Here, we present a novel parallel algorithm for selecting the required elements, which leads to a simple and optimal algorithm for merging in parallel. Thus, our technique differs from those of other optimal parallel algorithms for merging, where the subarrays are defined by elements at fixed positions in A and B.
Formally, the problem of selection can be stated as follows. Given two ordered multisets A and B of sizes m and n, where m ≤ n, the problem is to select the jth smallest element in A and B combined. The problem can be solved sequentially in O(log m) time without explicitly merging A and B [5, 6]. Multiselection, a generalization of selection, is the problem where, given a sequence of integers 1 ≤ j_1 < j_2 < ... < j_r ≤ m + n, all the j_i th, 1 ≤ i ≤ r, smallest elements in A and B combined are to be found.

1 Supported in part by NSF Grant CDA-9115281.
2 For clarity in presentation, we use log x to mean max(1, log2 x).
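The sequential bound can be made concrete with a standard binary-search selection. The following sketch (the function name and structure are ours, not the paper's procedure) finds the jth smallest element of two sorted arrays without merging them, by searching for how many of the j smallest elements come from the shorter array:

```python
def select_kth(A, B, j):
    """Return the jth smallest (1-based) element of sorted lists A and B,
    in O(log(min(len(A), len(B)))) time, without merging them."""
    if len(A) > len(B):
        A, B = B, A                       # ensure A is the shorter array
    m, n = len(A), len(B)
    assert 1 <= j <= m + n
    lo, hi = max(0, j - n), min(m, j)     # feasible counts taken from A
    while lo < hi:
        i = (lo + hi) // 2                # try i elements from A, j - i from B
        if B[j - i - 1] > A[i]:           # too few taken from A
            lo = i + 1
        else:
            hi = i
    i = lo                                # i from A and j - i from B is valid
    candidates = []
    if i > 0:
        candidates.append(A[i - 1])
    if j - i > 0:
        candidates.append(B[j - i - 1])
    return max(candidates)
```

The binary search maintains the invariant that taking i elements from A and j − i from B is feasible; the answer is the larger of the last elements taken from each array.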
Parallel merging algorithms proposed in [1] and [5] employ either a sequential median or a sequential selection algorithm. Even though these parallel algorithms are cost-optimal, their time-complexity is O(log^2(m+n)) on an EREW PRAM. Parallel algorithms for merging described in [2, 3, 7, 10, 11] use different techniques, essentially to overcome the difficulty of multiselection. Without loss of generality, we assume that A and B are disjoint and contain no repeated elements. First, we present a new algorithm for the selection problem and then use it to develop a parallel algorithm for multiselection. The algorithm uses r processors of an EREW PRAM to perform r selections in O(log(m+n) + log r) time. We further show that the number of comparisons in our merging algorithm matches that of Hagerup and Rüb's algorithm [7] and is within lower-order terms of the minimum possible, even by a sequential merging algorithm. Moreover, our merging algorithm uses fewer comparisons when the two given arrays differ significantly in size.
2 Selection in Two Sorted Arrays
The median of 2k elements is defined to be the kth smallest element, while that of 2k + 1 elements is defined to be the (k+1)th element. Finding the jth smallest element can be reduced to selecting the median of the appropriate subarrays of A and B as follows. When j ≤ m, consider the subarrays A[1..j] and B[1..j]; the median of the 2j elements in these subarrays is the jth smallest element. This reduction is depicted as Case III in Figure 1. On the other hand, when m < j ≤ ⌊(m+n)/2⌋, the jth selection can be reduced to finding the median of the subarrays A[1..m] and B[j−m..j], which is shown as Case I in Figure 1. When j > ⌊(m+n)/2⌋, we can view the problem as that of finding the kth largest element, where k = m + n − j + 1. This gives rise to Cases II and IV, which are symmetric to Cases I and III, respectively, in Figure 1. From now on, these subarrays will be referred to as windows. The median can be found by comparing the individual median elements of the current
Figure 1: Reduction of selection to median finding. Cases I–IV show the active windows in A and B (and the discarded elements) for selecting the jth smallest element, 1 ≤ j ≤ m+n, where A[i] < A[i+1] for 1 ≤ i < m and B[q] < B[q+1] for 1 ≤ q < n; for j > ⌊(m+n)/2⌋, the jth smallest is found as the kth largest with k = m+n−j+1.
windows and suitably truncating the windows to half, until the window in A has no more than one element. The middle elements of the windows will be referred to as probes. A formal description of this median-finding algorithm follows.
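The Case I–IV reduction itself can be made concrete. The sketch below (our own illustrative encoding, with names ours; not the paper's formal procedure) computes, for a selection position j, the windows whose median — taken from the top for Cases II and IV — is the desired element. To keep the sketch non-recursive at the midpoint, it flips to the kth-largest view only when j exceeds ⌈(m+n)/2⌉:

```python
def reduce_to_median(j, m, n):
    """Map the selection of the jth smallest (1-based) element of sorted
    arrays A (size m) and B (size n), m <= n, to a median-finding problem
    over two windows, following the Case I-IV reduction of Figure 1.
    Returns (case, window_in_A, window_in_B); windows are 1-based and
    inclusive, and for Cases II and IV the median is taken from the top."""
    assert m <= n and 1 <= j <= m + n
    if 2 * j > m + n + 1:                # j > ceil((m+n)/2): Cases II and IV
        k = m + n - j + 1                # jth smallest = kth largest
        case, wa, wb = reduce_to_median(k, m, n)
        mirror = lambda w, size: (size - w[1] + 1, size - w[0] + 1)
        return {"III": "IV", "I": "II"}[case], mirror(wa, m), mirror(wb, n)
    if j <= m:                           # Case III: median of A[1..j], B[1..j]
        return "III", (1, j), (1, j)
    return "I", (1, m), (j - m, j)       # Case I: median of 2m+1 elements
```

For example, with m = 3 and n = 4, position j = 4 falls in Case I with windows A[1..3] and B[1..4], whose median (the 4th of 7 elements) is the answer.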
The comparison of the two probes determines which half of one window can be discarded; the other window is then truncated correspondingly, so that the combined window size is halved while the median of the remaining elements is preserved. This halving continues until the window in A contains at most one element, after which the median is located in B with a constant number of additional comparisons.

3 Parallel Multiselection

To perform several selections at once, the selection positions are processed together as a chain: selections whose current windows and probes coincide proceed with the same comparison, and a chain splits into two sub-chains when the outcome of a comparison separates its selection positions. Selections in Case III (IV) in the arrays A and B become selections in Case I (II) in the corresponding subarrays. Any selection in the interval (m, ⌊(m+n)/2⌋] dominates the time-complexity. In such a case, we note that all of the selections in the different cases can be handled by one chain with the appropriate reductions. Hence we have the following result.
Theorem 3.1 Given selection positions j_1 < j_2 < ... < j_r, all of the selections can be made in O(log(m+n) + log r) time using r processors on the EREW PRAM.
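To see how such multiselections drive the merging scheme of the next section, here is a sequential simulation (names and structure are ours; the EREW chain mechanics and processor allocation are elided): it uses the ranks i⌈log2(m+n)⌉ as the selection positions, computes how the two arrays split at each rank, and merges the resulting pieces independently:

```python
import math

def merge_by_multiselection(A, B):
    """Sequentially simulate the partition-based merge of two sorted lists
    with distinct elements: cut the combined order at ranks i*ceil(log2(m+n))
    and merge each piece independently.  (Our own sketch; in the paper each
    piece would be merged by one EREW-PRAM processor.)"""
    m, n = len(A), len(B)
    N = m + n
    if N == 0:
        return []
    g = math.ceil(math.log2(N)) if N > 1 else 1   # segment length ~ log(m+n)
    p = math.ceil(N / g)                          # number of pieces/processors

    def split_at(rank):
        # number of elements taken from A among the `rank` smallest
        lo, hi = max(0, rank - n), min(m, rank)
        while lo < hi:
            i = (lo + hi) // 2
            if B[rank - i - 1] > A[i]:
                lo = i + 1
            else:
                hi = i
        return lo

    cuts = [(0, 0)]
    for i in range(1, p):
        a = split_at(i * g)
        cuts.append((a, i * g - a))
    cuts.append((m, n))

    out = []
    for (a0, b0), (a1, b1) in zip(cuts, cuts[1:]):
        # each "processor" merges A[a0:a1] with B[b0:b1] sequentially
        ai, bi = a0, b0
        while ai < a1 and bi < b1:
            if A[ai] <= B[bi]:
                out.append(A[ai]); ai += 1
            else:
                out.append(B[bi]); bi += 1
        out.extend(A[ai:a1])
        out.extend(B[bi:b1])
    return out
```

Each piece has length about log(m+n), so merging the pieces takes time proportional to the segment length per processor, mirroring the cost balance of the parallel algorithm.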
4 Parallel Merging
Hagerup and Rüb [7] have presented an optimal algorithm which runs in O(log(m+n)) time using (m+n)/log(m+n) processors. Our merging algorithm can be expressed as follows. Let p = (m+n)/log(m+n).

1. Find the positions f_A(i) and f_B(i), 1 ≤ i ≤ p − 1, where f_A(i) ≠ 0 implies that A(f_A(i)) is the (i⌈log(m+n)⌉)th element of the merged array, and f_B(i) ≠ 0 implies that B(f_B(i)) is the (i⌈log(m+n)⌉)th element.

2. Let f_A(0) = f_B(0) = 0, f_A(p) = m and f_B(p) = n. For 1 ≤ i ≤ p − 1: if f_A(i) = 0 then f_A(i) = i⌈log(m+n)⌉ − f_B(i), else f_B(i) = i⌈log(m+n)⌉ − f_A(i).

3. For i = 1 to p do in parallel: merge A(f_A(i−1)+1 .. f_A(i)) with B(f_B(i−1)+1 .. f_B(i)).

Steps 1 and 3 both take O(log(m+n)) time using (m+n)/log(m+n) processors. Step 2 takes O(1) time using (m+n)/log(m+n) processors. Thus the entire algorithm takes O(log(m+n)) time using (m+n)/log(m+n) processors, which is optimal. The total amount of data copied in Step 1 is O((m+n)/log(m+n)), since r = (m+n)/log(m+n), and compares favorably with the Θ(m+n) data copying required by Hagerup and Rüb's [7] merging algorithm. If fewer processors, say q, are available, the proposed parallel merging algorithm can be adapted to perform q − 1 multiselections, which will require O((m+n)/q + log(m+n) + log q) time.

Let us now count the number of comparisons required. Step 2 does not require any comparisons, and Step 3 requires fewer than m+n comparisons. The estimation of the number of comparisons in Step 1 is somewhat more involved. First we need to prove the following lemma.
Lemma 4.1 Suppose we have a chain of size c ≥ 2. The worst-case number of comparisons required to completely process the chain is greater if the chain splits at the current probe than if it stays intact.
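Before the formal proof, the lemma can be checked mechanically. The small dynamic program below is our own encoding of the cost model used in the proof of Lemma 4.1: staying intact costs two comparisons and moves the chain down a level; splitting costs four, keeps one half at the same height, and sends the other half down a level. For every small c ≥ 2 and height h, the split option is the more expensive one:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def worst_comparisons(c, h):
    """Worst-case comparisons C(c, h) to finish a chain of c selections at
    height h, under our encoding of the proof's cost model: staying intact
    costs 2 and drops one level; splitting costs 4, leaves a chain of
    floor(c/2) at the same height, and sends ceil(c/2) down one level."""
    if h == 0:
        return 0
    intact = worst_comparisons(c, h - 1) + 2
    if c < 2:
        return intact                     # a singleton chain cannot split
    split = (worst_comparisons(c // 2, h)
             + worst_comparisons((c + 1) // 2, h - 1) + 4)
    return max(intact, split)

# Check Lemma 4.1: for c >= 2 the split option always costs strictly more.
for c in range(2, 33):
    for h in range(1, 12):
        intact = worst_comparisons(c, h - 1) + 2
        split = (worst_comparisons(c // 2, h)
                 + worst_comparisons((c + 1) // 2, h - 1) + 4)
        assert split > intact
```

The exhaustive check over small parameters is, of course, no substitute for the induction that follows, but it makes the adversary's choice (always split) easy to see.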
Proof: We can envisage the multiselection algorithm as a specialized search in a binary tree with height log(m+n). Let C(c, h) be the total number of comparisons needed, in the worst case, to process a chain of size c which is at a probe corresponding to a node at height h in the search tree. We proceed by induction on the height of the node. The base case, when the chain is at a node of height 1, can be easily verified. Suppose the lemma holds for all nodes at height h − 1. Consider a chain of size c at a node of height h. If the chain stays intact and moves down to a node of height h − 1 then

    C(c, h) = C(c, h − 1) + 2        (1)

since at most two comparisons are required to process a chain at a node. If the chain splits, one chain of size ⌊c/2⌋ stays at height h (in the worst case) and the other chain of size ⌈c/2⌉ moves down to height h − 1. The worst-case number of comparisons is, then,

    C(c, h) = C(⌊c/2⌋, h) + C(⌈c/2⌉, h − 1) + 4        (2)

By the induction hypothesis,

    C(c, h − 1) ≤ C(⌊c/2⌋, h − 1) + C(⌈c/2⌉, h − 2) + 4        (3)

Thus, when the chain stays intact, we can combine Eqs. (1) and (3) to obtain

    C(c, h) ≤ C(⌊c/2⌋, h − 1) + C(⌈c/2⌉, h − 2) + 6

Again, by hypothesis C(⌈c/2⌉, h − 1) ≥ C(⌈c/2⌉, h − 2) + 2, and we require at least one comparison for a chain to move down a level, hence C(⌊c/2⌋, h) ≥ C(⌊c/2⌋, h − 1) + 1, and the lemma holds. □

To determine the number of comparisons required in the multiselection algorithm, we consider two cases. In the first case, when r ≤ m, we have the following.
Lemma 4.2 In the worst case, the total number of comparisons required by the parallel multiselection algorithm for r selections is O(r(1 + log((m+n)/r))), if r ≤ m.
Proof: The size of the initial chain is r. Lemma 4.1 implies that the chain must split at every opportunity for the worst-case number of comparisons. Thus, after i levels of splitting, the maximum size of a chain is ⌈r/2^i⌉, 0 ≤ i ≤ ⌈log r⌉. Recall that at most two chains can remain at any node after any stage. The maximum number of chains possible is r (with each containing only one element), which, in the worst case, could be spread over the first ⌊log r⌋ levels. From a node i levels below the root, the maximum number of comparisons a search for an element can take is ⌊log(m+n)⌋ − i (for the chain at this node). Hence the number of comparisons after the chains have split is bounded by

    Σ_{i=0}^{⌈log r⌉} 2 · 2^i (⌊log(m+n)⌋ − i),

which is O(r log((m+n)/r)). We also need to count the number of comparisons required for the initial chain to split up into r chains and fill up ⌊log r⌋ levels. Since the size of a chain i levels below the root is at most ⌈r/2^i⌉, the maximum number of splits possible below such a node is ⌊log r⌋ − i. Also, recall that a chain requires four comparisons for each split in the worst case. Thus, the number of comparisons for splitting is bounded by

    4 Σ_{i=0}^{⌈log r⌉} 2 · 2^i (⌊log r⌋ − i),

since there can be at most 2 · 2^i chains at level i; this is O(r). □

In particular, Lemma 4.2 implies that, when m = Θ(n), the total number of comparisons for the merging algorithm is m + n + O((m+n) log log(m+n)/log(m+n)). This matches the number of comparisons in the parallel algorithm of Hagerup and Rüb [7]. When one of the lists is smaller than the other, however, our algorithm uses fewer comparisons. Consider the second case, when r > m.

Lemma 4.3 If r > m, the parallel multiselection algorithm performs r selections in O(m log((m+n)/m)) comparisons.
Proof: Using Lemma 4.1 and arguments similar to the ones in the proof of the previous lemma, the maximum size of a chain at level i is ⌈r/2^i⌉, and this chain can split at most ⌊log m⌋ − i times. Hence the number of comparisons needed for splitting is bounded by

    4 Σ_{i=0}^{⌊log m⌋} 2 · 2^i (⌊log m⌋ − i),

which is O(m). After the chains have split, there may be at most m chains remaining. The number of comparisons required is then bounded by

    Σ_{i=0}^{⌊log m⌋} 2 · 2^i (⌊log(m+n)⌋ − i),

which is O(m log((m+n)/m)). □

Hence, if (m+n)/log(m+n) >