Integer Problems on Reconfigurable Meshes, with Applications*

Stephan Olariu, James L. Schwing, and Jingyuan Zhang
Department of Computer Science
Old Dominion University
Norfolk, VA 23529-0162
U.S.A.
A proposed running head: Integer Problems
Correspondence Address:
Stephan Olariu
Department of Computer Science
Old Dominion University
Norfolk, VA 23529-0162
U.S.A.
Phone: (804)683-4417
FAX: (804)683-4900
Email: olariu@cs.odu.edu
* This work was supported by NASA under grant NCC1-99, and the first author was also partly supported by the National Science Foundation under grant CCR-8909996.
Abstract

Problems with solutions based upon integer computations are collectively referred to as integer problems. Such problems occur routinely in pattern recognition, image processing, graph theory, and query processing. The purpose of this paper is to present constant time algorithms for computing the prefix sums of an integer sequence and for solving the packing problem on reconfigurable meshes. These two algorithms are fundamental in the sense that they afford us efficient solutions to a wide variety of problems. Applications discussed in this paper include integer sorting, converting between the edge-list and the adjacency-list representations of a graph, and converting between different representations of a rooted tree. All these problems are solved in constant time. As another application, we show how to compute the perimeter and the area of a binary image of size $n \times n$ in $O(\log\log n)$ time.

Key Words: reconfigurable meshes, bus systems, integer sorting, prefix sums, graph, image processing, pattern analysis
1. Introduction

Recent advances in VLSI have made it possible to build massively parallel machines featuring many thousands of cooperating processors. Interprocessor communications and simultaneous memory accesses typically act as bottlenecks in parallel machines, preventing increases in computational power from translating into increased performance of the same order of magnitude. To overcome the inefficiency of long-distance communication among processors, bus systems have recently been added to a number of parallel machines [1-6]. If such a bus system can be changed dynamically, under program control, to suit the communication needs of the processors, it is referred to as reconfigurable. Examples include the bus automaton [6], the reconfigurable mesh [3,4], and the polymorphic torus [1,2].

The computational model used throughout this work is the reconfigurable mesh.¹ An $m \times n$ reconfigurable mesh consists of $m \times n$ identical processors positioned on a rectangular array (refer to Figure 1). The processor located at $(i, j)$, with $0 \le i \le m-1$ and $0 \le j \le n-1$, is referred to as $P(i,j)$. Every processor has 4 ports, denoted by N, S, E, and W. In each processor, ports can be dynamically connected in pairs to suit computational needs. Our computational model allows at most two connections to be set in each processor; furthermore, these two connections must involve disjoint pairs of ports (see Figure 2). In the absence of these local connections, the reconfigurable mesh is functionally equivalent to the mesh-connected computer. To make the model realistic, we assume that the processing elements have a small number of registers of $O(\log mn)$ bits and a very basic instruction set: a processor can perform standard arithmetic and boolean operations in unit time. More sophisticated operations, including floor and modulo computation, are specifically excluded.
To anticipate, we show that these operations can, in fact, be added to the repertoire with very little overhead, thus setting the stage for a very powerful computing environment.

¹ When no confusion is possible, a reconfigurable mesh will be referred to simply as a mesh.

We assume a SIMD (Single Instruction stream, Multiple Data stream) model: in each time unit the same instruction is broadcast to all processors, which execute it and wait for the next instruction. Each instruction can consist of setting local connections, performing an arithmetic or boolean operation, broadcasting a value on a bus, or reading a value from a specified bus.

The regular structure of the reconfigurable mesh makes it suitable for VLSI implementation [3,4]. In fact, it has been argued [4] that the reconfigurable mesh can be used as a universal chip capable of simulating any equivalent-area architecture without loss of time.

By adjusting the local connections within each processor, several subbuses can be established. We assume that the setting of local connections is destructive, in the sense that setting a new pattern of connections destroys the previous one. At any given time, only one processor can broadcast a value onto a given bus. Processors, if instructed to do so, read the bus; if no value is being transmitted on the bus, the read operation has no result. It is assumed [3,7-9] that communication along buses takes $O(1)$ time. This seems to be a reasonable assumption in the light of recent experiments with the YUPPIE system [7].

Problems in many practical applications can be reduced to computations on integer values. Such problems are collectively referred to as integer problems. Although, in principle, we could solve an integer problem by applying general-purpose algorithms, the hope is that specialized algorithms can handle these problems more efficiently. The purpose of this paper is to propose a number of efficient algorithms for integer problems on reconfigurable meshes. To begin, we present a constant time algorithm to compute the prefix sums of a sequence of $n$ integers ranging from 0 to $n-1$ on an $n \times n$ reconfigurable mesh.
We then show that the range of integers handled by both the sorting and the prefix sums algorithms can be extended to $n^c$ for an arbitrary $c$. In this case, the complexity of our algorithms is $O(c)$ time using a reconfigurable mesh of the same size.

As an application of the integer prefix sums algorithm, we propose an efficient solution to the following problem, referred to as the packing problem. Given an $n$-element set $X$ and a collection of $m$ ($m \le n$) predicates such that every element of $X$ satisfies at most one predicate, the packing problem asks for a partition of $X$ such that the elements of $X$ satisfying the same predicate occur in the same subset of the partition. Likewise, the elements of $X$ that satisfy no predicate belong to a single subset of the partition. We show that our integer prefix sums algorithm yields an efficient algorithm for the packing problem: specifically, we outline an algorithm that solves the packing problem in $O(1)$ time on a reconfigurable mesh of size $n \times n$.

As it turns out, the solution to the packing problem can be used to devise a constant time integer sorting algorithm. We show that $n$ integers in the range 0 to $n^c$ can be sorted in $O(c)$ time on a reconfigurable mesh of size $n \times n$. To the best of our knowledge, this is the first instance of such an algorithm reported in the literature.

Packing and prefix sums of integer sequences are fundamental and have far-reaching applications. Among them, we present a constant time algorithm to convert an edge-list representation of a graph to an adjacency-list representation, a constant time algorithm to convert a parent-pointer representation of a rooted tree to a standard form, and an $O(\log\log n)$ time algorithm to compute the perimeter and the area of a binary image. To the best of our knowledge, this is the first time such results are reported in the literature.

The paper is organized as follows: Section 2 provides a set of tools that will be used in the following sections; Section 3 presents an algorithm to compute the prefix sums of a sequence of integers; Section 4 discusses the packing problem; Section 5 discusses a number of applications; finally, Section 6 summarizes the results and proposes a number of open problems.
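The bus model described in this introduction can be made concrete with a small sequential simulation. In the sketch below, all names are hypothetical and the code is an illustration, not anything taken from the paper: each port is a node, the fixed mesh links join neighboring ports, each processor's local connections (at most two disjoint port pairs) join ports inside that processor, and a bus is simply a connected component, tracked with a union-find structure. As a demonstration it uses the commonly cited "staircase" construction for binary prefix sums on an $(n+1) \times n$ mesh: a column holding 0 connects W to E in every row, a column holding 1 connects W to S and N to E, so a signal injected at the W port of $P(0,0)$ leaves column $j$ in the row equal to $b_0 + \cdots + b_j$. This is a simpler technique on a larger mesh than the algorithm of [10] that the paper relies on.

```python
from itertools import product

Port = tuple[int, int, str]  # (row i, column j, port name N/S/E/W)


class UnionFind:
    """Disjoint sets over ports; each set is one bus."""

    def __init__(self) -> None:
        self.parent: dict[Port, Port] = {}

    def find(self, x: Port) -> Port:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: Port, b: Port) -> None:
        self.parent[self.find(a)] = self.find(b)


def build_buses(rows: int, cols: int, config: dict) -> UnionFind:
    """config[(i, j)] lists the local port pairs chosen by P(i, j)."""
    uf = UnionFind()
    for i, j in product(range(rows), range(cols)):
        # Fixed mesh wiring: S of P(i,j) to N of P(i+1,j), E of P(i,j) to W of P(i,j+1).
        if i + 1 < rows:
            uf.union((i, j, "S"), (i + 1, j, "N"))
        if j + 1 < cols:
            uf.union((i, j, "E"), (i, j + 1, "W"))
    for (i, j), pairs in config.items():
        ports = [p for pair in pairs for p in pair]
        # Model restriction: at most two connections, on disjoint pairs of ports.
        if len(pairs) > 2 or len(set(ports)) != len(ports):
            raise ValueError(f"illegal local connections at P({i},{j})")
        for a, b in pairs:
            uf.union((i, j, a), (i, j, b))
    return uf


def staircase_prefix_sums(bits: list[int]) -> list[int]:
    """Binary prefix sums read off the bus structure of an (n+1) x n mesh."""
    n = len(bits)
    rows, cols = n + 1, n
    config = {
        (i, j): [("W", "S"), ("N", "E")] if bits[j] else [("W", "E")]
        for i, j in product(range(rows), range(cols))
    }
    uf = build_buses(rows, cols, config)
    source = uf.find((0, 0, "W"))  # P(0,0) broadcasts a signal on this bus
    sums = []
    for j in range(cols):
        # Exactly one processor in column j sees the signal on its E port.
        hits = [i for i in range(rows) if uf.find((i, j, "E")) == source]
        assert len(hits) == 1
        sums.append(hits[0])
    return sums


print(staircase_prefix_sums([1, 0, 1, 1, 0]))  # [1, 1, 2, 3, 3]
```

On a real mesh every processor would test its own E port in a single read step after one broadcast; the loops here only serialize that parallel step.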
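The packing problem, and the way it can lead to integer sorting, can be illustrated with a sequential sketch. This is not the paper's reconfigurable-mesh algorithm: the names `pack` and `integer_sort` are hypothetical, the starting position of each subset is taken from prefix sums of the subset sizes (the role the integer prefix sums algorithm plays in the text), and the reduction to sorting is shown, as an assumption, as a least-significant-digit radix sort made of $c$ stable packing passes over base-$n$ digits, for keys in the range 0 to $n^c - 1$. Being sequential, the sketch uses Python's `//` and `%` freely.

```python
from typing import Callable, Optional


def pack(items: list, label: Callable[[object], Optional[int]], m: int) -> list:
    """Stable packing: label(x) is the index (0..m-1) of the predicate x satisfies, or None."""
    keys = []
    for x in items:
        k = label(x)
        if k is not None and not 0 <= k < m:
            raise ValueError("predicate index out of range")
        keys.append(m if k is None else k)  # slot m: elements satisfying no predicate
    sizes = [0] * (m + 1)
    for k in keys:
        sizes[k] += 1
    # Exclusive prefix sums of the subset sizes give each subset's starting position.
    start, total = [], 0
    for s in sizes:
        start.append(total)
        total += s
    out = [None] * len(items)
    for x, k in zip(items, keys):
        out[start[k]] = x
        start[k] += 1
    return out


def integer_sort(values: list[int], n: int, c: int) -> list[int]:
    """Sort integers in [0, n**c - 1] with c stable packing passes (LSD radix sort, base n)."""
    if any(not 0 <= v < n ** c for v in values):
        raise ValueError("values must lie in [0, n**c - 1]")
    for d in range(c):
        # Predicate t holds for v when the d-th base-n digit of v equals t.
        values = pack(values, lambda v, d=d: (v // n ** d) % n, n)
    return values


print(pack(["a", "bb", "ccc", "dd", "e"], lambda s: {1: 0, 2: 1}.get(len(s)), 2))
# ['a', 'e', 'bb', 'dd', 'ccc']
print(integer_sort([15, 3, 9, 0], n=4, c=2))  # [0, 3, 9, 15]
```

Stability of each packing pass is what makes the later digit passes preserve the order established by the earlier ones.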
2. Preliminaries

One of the most useful techniques in designing parallel algorithms on a reconfigurable mesh of size $n \times n$ involves partitioning the original mesh into $\sqrt{n}$ submeshes of size $\sqrt{n} \times n$ each.² At times, it may also be necessary to further subdivide the mesh into submeshes of size $\sqrt{n} \times \sqrt{n}$ each. In this context, it is important for every processor to determine which submesh it belongs to, as well as its local coordinates within that submesh. The problem can be abstracted as follows: for every $j$ ($0 \le j \le n-1$), compute $\lfloor j/\sqrt{n} \rfloor$ and $j \bmod \sqrt{n}$.

Consider column $j$ of an $n \times n$ mesh: every processor $P(i,j)$ ($0 \le i \le n-1$) in that column computes $i \times \sqrt{n}$ and compares it with $j$. Notice that in each column there must exist a unique row number $i$ for which the following two conditions hold:

• $i \times \sqrt{n} \le j$;
• $(i+1) \times \sqrt{n} > j$.

This unique row number $i$ is the largest integer smaller than or equal to $j/\sqrt{n}$, that is, $i = \lfloor j/\sqrt{n} \rfloor$, and, therefore, $j \bmod \sqrt{n} = j - \lfloor j/\sqrt{n} \rfloor \times \sqrt{n}$. The processor of column $j$ in which both conditions hold can then broadcast $i$ to the whole column over a vertical bus. To summarize the above discussion, we state the following result.

Lemma 1. For every $j$ ($0 \le j \le n-1$), $\lfloor j/\sqrt{n} \rfloor$ and $j \bmod \sqrt{n}$ can be obtained in $O(1)$ time on a reconfigurable mesh of size $n \times n$.

Next, let us restrict our attention to integers in the range from 0 to $n-1$. As it turns out, it is sometimes convenient to represent such an integer in radix-$\sqrt{n}$ form. That is, we want to decompose an integer $a$ ($0 \le a \le n-1$) into a unique ordered pair of integers $(x, y)$ such that $a = x \times \sqrt{n} + y$ with $0 \le x \le \sqrt{n}-1$ and $0 \le y \le \sqrt{n}-1$. Once such a representation is available, processing $a$ amounts to processing $x$ and $y$. This representation will occur frequently in this paper. Note that, in the above notation, $x = \lfloor a/\sqrt{n} \rfloor$ and $y = a \bmod \sqrt{n}$.

² To avoid handling tedious housekeeping details, we assume, without loss of generality, that $n$ is a perfect square.
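The column test behind Lemma 1 can be mimicked sequentially. The sketch below is a hypothetical illustration, not code from the paper: `column_quotient_remainder` plays the role of column $j$, letting every row test both conditions using only multiplication and comparison (floor and modulo being excluded from the instruction set), and `radix_sqrt_pair` reuses it to produce the radix-$\sqrt{n}$ pair $(x, y)$ of an integer $a$. As in the paper, $n$ is assumed to be a perfect square.

```python
from math import isqrt


def column_quotient_remainder(j: int, n: int) -> tuple[int, int]:
    """Mimic column j of an n x n mesh: return (floor(j / sqrt(n)), j mod sqrt(n))."""
    r = isqrt(n)
    if r * r != n:
        raise ValueError("n is assumed to be a perfect square")
    if not 0 <= j <= n - 1:
        raise ValueError("j must lie in [0, n - 1]")
    # Every P(i, j) checks i*sqrt(n) <= j and (i+1)*sqrt(n) > j using only
    # multiplication and comparison; exactly one row i passes both tests.
    winners = [i for i in range(n) if i * r <= j and (i + 1) * r > j]
    assert len(winners) == 1
    i = winners[0]
    # j mod sqrt(n) = j - floor(j / sqrt(n)) * sqrt(n)
    return i, j - i * r


def radix_sqrt_pair(a: int, n: int) -> tuple[int, int]:
    """Hypothetical helper: the pair (x, y) with a = x*sqrt(n) + y, 0 <= x, y <= sqrt(n) - 1."""
    return column_quotient_remainder(a, n)


# Example with n = 16 (sqrt(n) = 4): 13 = 3*4 + 1.
print(radix_sqrt_pair(13, 16))  # (3, 1)
```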
By Lemma 1, the unique ordered pair of any $j$ ($0 \le j \le n-1$) can be obtained in $O(1)$ time on a reconfigurable mesh of size $n \times n$.

Recently, Olariu, Schwing, and Zhang [10] pointed out that on a reconfigurable mesh of size $m \times n$, the prefix sums of a binary sequence $b_0, b_1, \ldots, b_{n-1}$, with $b_j$ stored by processor $P(0,j)$ ($0 \le j \le n-1$), can be computed in $O(\log n)$ time if $m = 1$ and in $O(\log n / \log m)$ time if $m > 1$. An important corollary of this result, obtained by taking $m = \sqrt{n}$, so that $\log n / \log m = 2$, goes as follows.

Lemma 2. The prefix sums of a binary sequence $b_0, b_1, \ldots, b_{n-1}$, with $b_j$ stored by processor $P(0,j)$ ($0 \le j \le n-1$), can be computed in $O(1)$ time on a reconfigurable mesh of size $\sqrt{n} \times n$.

We next present another important application of the binary prefix sums algorithm in [10]. Consider a reconfigurable mesh $M$ of size $n \times m$ with $m \le n$, and an integer sequence $x_0, x_1, \ldots, x_{m-1}$ with $0 \le x_i \le n/m - 1$ for all values of $i$. We assume that the sequence is stored by the processors in the first row of $M$, with $P(0,i)$ storing $x_i$, for all $i$.
Note that if $m$ is small, that is, $\log m \le \log n / \log m$, we can compute the prefix sums by using the general prefix sums algorithm discussed in [3]. We therefore assume that $\log m > \log n / \log m$.
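The case distinction above can be restated without the fraction. Assuming $m > 1$ (so that $\log m > 0$) and logarithms to base 2, which the text does not specify, multiplying both sides by $\log m$ gives the following equivalent form:

```latex
\log m \le \frac{\log n}{\log m}
  \iff (\log m)^2 \le \log n
  \iff \log m \le \sqrt{\log n}
  \iff m \le 2^{\sqrt{\log n}}
```

In other words, under these assumptions the case handled by the construction that follows is $m > 2^{\sqrt{\log n}}$.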
To begin, establish vertical buses in all columns of $M$ and let every processor $P(0,i)$ broadcast $x_i$ southbound to $P(i \cdot n/m + x_i,\, i)$. Next, establish horizontal buses in all rows $i \cdot n/m + x_i$ of $M$ and let processor $P(i \cdot n/m + x_i,\, i)$ broadcast $x_i$ to $P(i \cdot n/m + x_i,\, 0)$.
Next, every processor P (j ,0) writes a 1 or a 0 into a local variable depending on n n whether or not i ___