An Associative Processor for Multi-comparand Parallel Searching and Its Selected Applications

ZBIGNIEW KOKOSINSKI *
University of Aizu, Department of Computer Software
Aizu-Wakamatsu, 965-80 Fukushima, Japan
e-mail: [email protected]

Abstract

In this paper a multi-comparand associative processor is presented. The structure of the processor and its functions are described in detail. The processor works in a combined bit-serial/bit-parallel mode. Its main component is a multi-comparand associative memory with programmable prescription functions. The multi-comparand associative search paradigm is shown to be effective in processing complex search problems from many application areas including computational geometry, graph theory and list/matrix computations. Several representative problems, belonging to different complexity classes, and algorithms for them are presented and discussed.

Keywords:

associative processor, multiple search, multi-comparand search, geometric range search, matrix problem

1 Introduction

Searching is one of the key concepts in computer science and engineering [13, 20, 21]. The need for searching arises in many different application areas. Although the problem of searching has challenged researchers for decades, it will remain a fundamental question in the future. Since the size of searching tasks is increasing and, in consequence, time requirements are becoming more critical, new techniques that can speed up search operations by means of massively parallel computing are of interest. Among the many parallel models, associative machines, which belong to the broader SIMD category, are particularly well suited for performing fast parallel search operations, since the content-addressing mode of associative memories is supported by a built-in multiple comparison capability [3, 4, 22, 24, 26, 27, 30].

* On leave from Politechnika Krakowska, Kraków, Poland.

Recent advances in associative processing include: the massively parallel associative processor IXM2, which performs comparisons involving up to 260,000 data items simultaneously [7]; new associative devices with cells designed at the transistor level [6, 8, 25, 29]; new applications and algorithms in databases, computational geometry and artificial intelligence [12]; new selection and extreme-value search algorithms for fully parallel memories resulting from average-case analysis [23]; and an experimental optical architecture [18].

Most associative machines work in bit-serial word-parallel mode, with a single comparand and multiple data. The results of multiple comparisons are stored in a tag memory and are resolved by specialized circuitry (a logical tag resolver). However, many alternative paradigms have also been developed. Other commonly used working modes include fully parallel (bit-parallel word-parallel), word-serial, block-oriented, etc. Various machines implement different sets of basic logical matching functions. There exist architectures without tag memory, and solutions with separate tag and word mask circuitry. Logical tag resolvers often vary in their function and structure. All this leads to a variety that fits many particular processing requirements.

One common drawback of mainstream associative models is the use of a single comparand only. Although there exist particular applications where simultaneous many-to-many comparisons are feasible without any need for an external comparand vector [9, 10], the general practice is to process many-to-many comparisons sequentially, by performing a sequence of one-to-many compare operations. Alternatively, parallelization is achieved by processing many copies of the input data simultaneously. A processor with multiple-comparand processing capability would make multi-comparand associative search much faster. The problem was stated for the first time in [2], where a versatile search memory with embedded many-to-many comparison was proposed.

In this paper a hierarchical associative processor architecture is proposed that implements the idea of multi-comparand search. The multiple-comparand associative memory with a 2D tag memory is the key module of the processor. In comparison to its predecessor [2], the search functions are restricted to a few basic ones. On the other hand, in order to extract global features of the processed data, associative search capability is introduced into the 2D tag memory too. Thus, the associative search is organized hierarchically, on two levels. This new associative architecture is shown to be effective in solving some exemplary problems that involve complex search operations such as multiple search [28], geometric range search [19] and matrix problems [12]. Six representative problems with various computational complexity characteristics are selected, and algorithms for them are derived with the multi-comparand search as a basic operation.

In the next section the machine model is described. Section 3 is devoted to exemplary applications and algorithms formulated in the given model of computation. Section 4 suggests possible directions for further research.

2 The machine

The multi-comparand associative memory may be described as follows. Basic data processing is organized in bit-serial word-parallel mode, but the conventional single-comparand register is replaced by a comparand array, and the tag memory is extended to the size of the Cartesian product of the data and comparand sets. Processing is performed neither in the data array nor in the comparand array, but exclusively in the 2D tag memory. In each consecutive processing step, each tag memory cell is updated according to its previous state, the corresponding data/comparand values, and a so-called prescription function that defines the type of the search (24 logical and 10 arithmetical searches can be well defined in total [2]). The type of the search also determines the initial state of the tag memory. The possibility of defining a prescription function for each tag cell separately is the source of the memory's versatility, but it requires extra logic and ⌈log2 q⌉ control bits for each cell, where q is the number of different prescription functions. Obviously, cell functions that are known in advance to be homogeneous could be handled in a less hardware-consuming way, reducing the number of control bits. Using current ASIC technology, the processor's tag memory could be implemented as reconfigurable logic, with configuration data loaded from a PRAM, providing hardware savings and a substantial improvement in timing parameters.

Comparison results collected in the 2D tag memory have to be processed further in order to extract global search results. Otherwise, the partial intermediate data collected in the tag memory are difficult to interpret. This implies a need to provide the 2D tag memory with associative processing capabilities. As a result, a simple multi-comparand associative processor architecture is obtained, which is depicted in Fig. 1.

Description of the architecture components:

DATA ARRAY (DA) - n × p binary data matrix;

[Figure 1 block diagram; labels: C2, SM1, SM2, M, T2, LTR, DATA ARRAY, TAG MEMORY, COMPARAND ARRAY, PRAM PFG, BPC]

Figure 1: A multi-comparand associative processor.

COMPARAND ARRAY (CA) - m × p binary comparand matrix;
DM - n × 1 data and tag (TM and T2) mask vector;
CM - m × 1 comparand mask vector;
SM1 - 1 × p mask vector for multi-comparand search; (DM, CM and SM1 select the proper submatrices of DA and CA for multi-comparand search)
BPC - bit position counter; (generates the consecutive bit slices selected by SM1 for bit-serial processing; bit slice l is generated by BPC[l]=1, with BPC[j]=0 for all j ≠ l)
TAG MEMORY (TM) - n × m binary tag matrix; (memory for processing and storing comparison results; each cell TM[i, j] processes and stores the results of comparing subsequent pairs of bits {DA[i, l], CA[j, l]}, where l is the index of the processed bit slice, according to a prescription function loaded into TM from PRAM PFG; TM has a built-in capability of fully parallel (bit-parallel word-parallel) associative processing of its content with a single comparand C2 and search mask SM2; the results of the single-comparand EQUALITY search in TM are stored in T2; TM can also perform the SELECT FIRST function [4] in each column)
PRAM PFG - PRAM prescription function generator; (for our needs the set of prescription functions is restricted to a very basic one, i.e. =, ≠ and a few order relations, but in general this set can be freely modelled for a given application)
C2 - 1 × m binary comparand vector;
SM2 - 1 × m TM mask vector;
T2 - n × 1 binary tag vector for TM;
LTR - logical tag resolver; (in particular, it computes binary variables w and u: w = 1 iff at least one unmasked bit of T2 is equal to 1, while u = 1 iff all unmasked bits of T2 are equal to 1; variable w, often denoted as SOME/NONE, is commonly used in tag resolvers [4]; variable u, which can be denoted as ALL/NOT ALL, is used in [12]).

The above processor architecture is shown in a basic configuration. Components enhancing the functions of the basic associative memory structure are drawn in Fig. 1 with a bolder line. Depending on the particular application, it may be necessary to add some of the optional system components:

RESPONDER COUNTER - counter of the number of responders in T2 [4];
SELECT FIRST - circuit for each column of TM and T2 [4];
(p,k)-COMBINATION GENERATOR - hardware mask generator for SM1 and SM2 [15, 16];

(p,k)-PERMUTATION GENERATOR - hardware comparand generator placed between DA and CA [14, 15];
CONTROLLERS - for sequencing processor operations in TM (required for processing some specific problems only);
MEMORY REGISTERS - a space for storing temporary results of computations, such as tag vectors, mask vectors, etc.

The machine model is powerful and flexible enough to satisfy numerous requirements in fast associative processing. The model is a parametrized generalization of many simpler models proposed earlier and can substitute for them functionally, i.e. perform all algorithms derived for those models. For instance, models of single-comparand processors are equivalent to our model with the parameter m=1. In many cases where multi-comparand searching can be organized, the model provides a significant improvement in the performance of algorithms.

In order to simplify the evaluation of the algorithms presented in the next section, the following assumptions are made:

1. a single bit-serial search step (one bit slice) consumes unit time;
2. all bit-serial word-parallel search operations consume time proportional to the number of bit slices, i.e. O(p) or less;
3. TM prescription functions can be loaded from PRAM PFG in O(1) time;
4. single mask programming can be done in O(1) time;
5. a single comparand permutation can be done in O(1) time;
6. LTR, SELECT FIRST circuits and the RESPONDER COUNTER consume time that depends on the construction of these components and the available technology. In general this time is a logarithmic function of n, but this factor will not be considered in the rest of this paper.
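The bit-serial update of the 2D tag memory described in this section can be illustrated with a small software simulation. The following is only a sketch under stated assumptions: the function names `to_bits` and `multi_comparand_equal` are hypothetical, only the equality prescription function is modelled, and the per-cell update rule (initial state 1, AND with the match of the current bit pair) is an assumed realization rather than the circuit of the paper. The two inner loops stand for work that the hardware performs in parallel, so the outer loop over bit slices corresponds to the O(p) time of assumption 2.

```python
def to_bits(value, p):
    """Return the p-bit big-endian representation of a non-negative integer."""
    return [(value >> (p - 1 - k)) & 1 for k in range(p)]


def multi_comparand_equal(da, ca, sm1=None):
    """Simulate a bit-serial, word-parallel equality search.

    da: n data words, ca: m comparands, each a list of p bits.
    sm1: optional 1 x p mask; slices with mask bit 0 are skipped.
    Returns the n x m tag matrix TM, where TM[i][j] = 1 iff DA[i]
    equals CA[j] on every selected bit slice.
    """
    n, m, p = len(da), len(ca), len(da[0])
    if sm1 is None:
        sm1 = [1] * p
    tm = [[1] * m for _ in range(n)]      # assumed initial state for "="
    for l in range(p):                    # one bit slice per unit of time
        if not sm1[l]:
            continue
        for i in range(n):                # all cells are updated in parallel
            for j in range(m):            # in the hardware model
                tm[i][j] &= int(da[i][l] == ca[j][l])
    return tm


da = [to_bits(v, 4) for v in (5, 9, 12)]  # data array
ca = [to_bits(v, 4) for v in (9, 5)]      # comparand array
tm = multi_comparand_equal(da, ca)
```

With the mask `sm1=[1, 0, 0, 0]` only the most significant bit slice is compared, which is the mechanism behind the approximate searches discussed in the next section.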

3 Applications

The proposed machine is a powerful tool suitable for processing complex search problems in many application areas. In this section we review some exemplary problems and their processing in our machine model.

SET INCLUSION
Given two sets A = {a_1, a_2, ..., a_n} and B = {b_1, b_2, ..., b_m}, n ≤ m; is every element a_i ∈ A equal to an element b_j ∈ B, i.e. does A ⊆ B hold?

Remark: Let us consider two cases of this problem:
1. EXACT SET INCLUSION: assuming a p-bit representation and the same precision, the solution is obtained in O(p) time;
2. APPROXIMATE SET INCLUSION: with a q-bit representation the precision requirement is relaxed (only the first q bits are compared, q < p). The solution is obtained in O(q) time.
Both cases can be interpreted as a particular case of GEOMETRIC RANGE SEARCH, where the total range is the union of subranges defined by the significant bits of all elements of B. The following algorithm deals with the first case.

Algorithm 1

1. DA ← A.
2. CA ← B.
3. DM, CM, SM1 ← 1.
4. compute TM for the prescription function {=}.
5. C2 ← 0.
6. SM2 ← 1.
7. compute T2.
8. if w=0 then return YES else return NO.
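The steps of Algorithm 1 can be traced in software. This is a minimal sketch, assuming that elements are non-negative integers of p bits; the function name `set_inclusion` and the parameter `q` are hypothetical, and a word-level comparison stands in for the bit-serial computation of TM. Setting `q` below `p` gives the approximate variant in which only the q most significant bits are compared.

```python
def set_inclusion(a, b, p, q=None):
    """Sketch of Algorithm 1: does every element of a occur in b?

    p is the word length; q (default p) is the number of most
    significant bits that are compared.
    """
    q = p if q is None else q
    shift = p - q
    # Step 4: TM[i][j] = 1 iff a[i] equals b[j] on the compared bits.
    tm = [[int((x >> shift) == (y >> shift)) for y in b] for x in a]
    # Steps 5-7: with C2 = 0 and SM2 = 1, T2[i] = 1 iff row i of TM is
    # all zeros, i.e. a[i] matched no element of b.
    t2 = [int(not any(row)) for row in tm]
    # Step 8: w = 1 iff at least one tag of T2 is set.
    w = int(any(t2))
    return w == 0
```

For example, `set_inclusion([3, 6], [5, 3, 7], p=3)` is false because 6 is missing from B, while the same call with `q=2` succeeds because 6 and 7 agree on their two leading bits.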

GEOMETRIC RANGE SEARCH [19]
There are many variants of geometric range search problems. We will consider two of them.

Problem A: Given a system of ranges in the d-dimensional Euclidean space R^d and an n-point set P in R^d; determine, for a given range R, whether all points of P lie in R.

Remark: In [12] redundant multiple coordinate systems (both Cartesian and polar) are proposed for representing complex geometric shapes. This approach can be followed for modelling complex ranges with redundant mixed coordinate systems. Various ranges can be defined in R^d (see [19]) and modelled by systems of equality, inequality and logic relations (without loss of generality, we may assume that ranges are restricted to axis-parallel boxes and halfplanes in R^2 in a Cartesian coordinate system). The coordinates of the points are stored in the data array DA, and the range parameters in the comparand array CA. Since real numbers can be represented in computers with limited precision only, processing bit slices in the associative memory from the most significant bit of the representation towards the least significant bit allows us to adjust the precision of computations (which, in fact, leads naturally to the concept of approximate range matching). The geometric range matching problem as formulated above can be solved in O(p) time, where p is the precision of the binary representation.

Algorithm 2

1. DA ← P.
2. CA ← R.
3. DM, CM, SM1 ← 1.
4. compute TM for the prescription functions corresponding to a given system of relations.
5. C2 ← 1.
6. SM2 ← 1.
7. compute T2.
8. if u=1 then return YES else return NO.
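Algorithm 2 can be sketched in software as follows. The encoding of a range as a list of (axis, relation, bound) triples, one per column of TM, is an assumption made for this illustration, as are the names `RELATIONS` and `all_points_in_range` and the particular relation symbols; the paper does not fix how the range parameters are laid out in CA.

```python
import operator

# Hypothetical table of prescription functions, one per column of TM.
RELATIONS = {
    "=": operator.eq,
    "!=": operator.ne,
    "<=": operator.le,
    ">=": operator.ge,
}


def all_points_in_range(points, constraints):
    """Sketch of Algorithm 2: do all points satisfy every relation?

    constraints: list of (axis, relation, bound) triples.
    """
    # Step 4: TM[i][j] = 1 iff point i satisfies relation j.
    tm = [[int(RELATIONS[rel](pt[axis], bound))
           for axis, rel, bound in constraints]
          for pt in points]
    # Steps 5-7: with C2 = 1 and SM2 = 1, T2[i] = 1 iff row i is all ones.
    t2 = [int(all(row)) for row in tm]
    # Step 8: u = 1 iff all tags of T2 are set.
    u = int(all(t2))
    return u == 1


# An axis-parallel box 1 <= x <= 4, 0 <= y <= 3 in the plane.
box = [(0, ">=", 1), (0, "<=", 4), (1, ">=", 0), (1, "<=", 3)]
```

A halfplane bounded by an axis-parallel line needs a single triple, so boxes and such halfplanes can be mixed in one system of relations.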

Problem B: Given a system of ranges in the d-dimensional Euclidean space R^d and an n-point set P in R^d; determine, for a given range R, the number s of points of the set P lying in R.

Remark: If s < n, then Problem B can be decomposed into two subproblems: first Problem A is solved, leaving its result in T2, and then the number of 1's in T2 is counted (for a description of the specialized circuitry see [4]).

Algorithm 3

1. NUMBER=0;
2. DA ← P.
3. CA ← R.
4. DM, CM, SM1 ← 1.
5. compute TM for the prescription functions corresponding to a given system of relations.
6. C2 ← 1.
7. SM2 ← 1.
8. compute T2.
9. if w=1 then NUMBER = NUMBER OF RESPONDERS.
10. return NUMBER.
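The counting variant can be sketched in the same way. The example below assumes that the range is an axis-parallel box given by its lower and upper corners; the function name `count_points_in_box` and this representation of the range are illustrative assumptions. Summing T2 plays the role of the responder counter.

```python
def count_points_in_box(points, lo, hi):
    """Sketch of Algorithm 3 for a box with corners lo and hi."""
    number = 0                                   # step 1
    # Step 5: two columns of TM per coordinate, for the lower and the
    # upper bound of the box.
    tm = []
    for pt in points:
        row = []
        for c, low, high in zip(pt, lo, hi):
            row.append(int(c >= low))
            row.append(int(c <= high))
        tm.append(row)
    # Steps 6-8: C2 = 1, SM2 = 1, so T2[i] = 1 iff point i is in the box.
    t2 = [int(all(row)) for row in tm]
    # Step 9: w = 1 iff some point responded; then count the responders.
    w = int(any(t2))
    if w == 1:
        number = sum(t2)
    return number                                # step 10
```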

MATRIX INCLUSION [12]
Given two binary matrices A[n,p] and B[m,p], n ≤ m; is there a p-permutation of the columns such that the set A is included in B (in the sense of EXACT SET INCLUSION)?

Remark: The complexity status of the MATRIX INCLUSION problem is open, since certain instances of it represent GRAPH ISOMORPHISM [11] and no polynomial algorithm is known for these problems in general.

Algorithm 4

1. DA ← A.
2. CA ← B.
3. DM, CM, SM1 ← 1.
4. repeat
5. generate next permutation of CA columns.
6. compute TM for the p.f. {=}.
7. C2 ← 0.
8. SM2 ← 1.
9. compute T2.
10. until w = 0 or the last object is generated.
11. if w = 0 then return YES else return NO.
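A software sketch of Algorithm 4 follows. Rows are assumed to be tuples of bits, the function name `matrix_inclusion` is hypothetical, and `itertools.permutations` stands in for the hardware permutation generator, so each loop iteration corresponds to one O(p) multi-comparand search in the machine model. The loop is exhaustive over all p! column permutations in the worst case.

```python
from itertools import permutations


def matrix_inclusion(a, b):
    """Sketch of Algorithm 4: is A included in some column permutation of B?"""
    p = len(a[0])
    for perm in permutations(range(p)):          # steps 4-5
        b_perm = [tuple(row[c] for c in perm) for row in b]
        # Step 6: TM[i][j] = 1 iff row i of A equals permuted row j of B.
        tm = [[int(tuple(ra) == rb) for rb in b_perm] for ra in a]
        # Steps 7-9: C2 = 0, SM2 = 1, so T2[i] = 1 iff row i matched nothing.
        t2 = [int(not any(row)) for row in tm]
        w = int(any(t2))
        if w == 0:                               # step 10: stop on success
            return True
    return False                                 # step 11


a = [(1, 0, 0), (0, 1, 1)]
b = [(0, 0, 1), (1, 1, 0), (1, 1, 1)]
```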

NEIGHBOURHOOD [12]
Given a binary matrix A[n,p] containing a set A = {a_1, a_2, ..., a_n}, an integer constant k < p, and two disjoint subsets I, J ⊂ A, I, J ≠ ∅; is there a submatrix A'[n,r] of A, where r ≤ k, such that a_h' = a_i' and a_h' ≠ a_j' for all h, i ∈ I' and j ∈ J'?

Remark: This problem has been shown to be NP-complete, since it is a generalization of DIFFERABILITY, which is proven to be in the NP-complete class [12]. Binary matrix representation and single-comparand associative processing of many NP-complete problems are discussed in [11, 12].

Algorithm 5

1. DA ← A.
2. CA ← A.
3. DM ← mask for I.
4. CM, SM2 ← mask for I ∪ J.
5. set the prescription functions for columns of TM: if a column index i ∈ I then {=}, if j ∈ J then {≠}.
6. C2 ← 1.
7. repeat
8. generate next r-combination in binary representation and set mask SM1.
9. compute TM for the given prescription functions.
10. compute T2.
11. until u = 1 or the last object is generated.
12. if u = 1 then return YES else return NO.
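Algorithm 5 can be sketched as follows. The function name `neighbourhood`, the representation of I and J as lists of row indices, and the decision to try every r from 1 to k are assumptions made for this illustration; `itertools.combinations` stands in for the hardware combination generator that sets the mask SM1. The sketch returns the selected columns instead of YES, and `None` instead of NO.

```python
from itertools import combinations


def neighbourhood(a, k, rows_i, rows_j):
    """Sketch of Algorithm 5: find at most k columns on which all rows
    of I coincide and differ from every row of J."""
    p = len(a[0])
    for r in range(1, k + 1):
        for cols in combinations(range(p), r):   # step 8: next mask SM1
            # Projection of the relevant rows onto the selected columns.
            proj = {row: tuple(a[row][c] for c in cols)
                    for row in list(rows_i) + list(rows_j)}
            t2 = []
            for h in rows_i:                     # DM selects the rows of I
                eq_ok = all(proj[h] == proj[i] for i in rows_i)  # "=" columns
                ne_ok = all(proj[h] != proj[j] for j in rows_j)  # "!=" columns
                t2.append(int(eq_ok and ne_ok))  # C2 = 1: all tags must be 1
            u = int(all(t2))                     # step 11
            if u == 1:
                return cols
    return None


a = [(1, 0, 1, 0), (1, 1, 1, 0), (0, 0, 1, 1)]
```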

MULTIPLE SEARCH [28]
Given two sorted sequences of items A and B; determine, for each item a_i of A (1 ≤ i ≤ n), the item b_k of B such that b_{k-1} ≤ a_i < b_k.

Remark: This problem can be solved in the CREW PRAM model in O(log n) time by using m processors, each carrying an item of A, to do (simultaneously) a binary search in B. In [28] the problem has been solved on an EREW PRAM using k (k ≤ min{m, n}) processors in O( logm + mr ) time if n ≤ m, or in O( logn + mr log 2mn ) time if n > m. In our algorithm the input sequences are sorted in increasing lexicographic order, and the processing time is O(p), where p is the length of the binary representation of the items, and is independent of both set sizes. Each TM[j,i] computes whether b_j > a_i. The pairs {a_i, b_j} can be retrieved in any order, but only one at a time.

Algorithm 6

1. DA ← B.
2. CA ← A.
3. DM, CM, SM1 ← 1.
4. compute TM for the prescription function {>}.
5. compute SELECT FIRST for all columns of TM.
6. use SELECT FIRST bit in i-th column of TM for retrieving b_k for a given a_i.
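A software sketch of Algorithm 6 is given below. The function name `multiple_search` is hypothetical, items are assumed to be integers, and the sketch returns `None` for an item of A that is not smaller than any item of B, a case the problem statement leaves open.

```python
def multiple_search(a, b):
    """Sketch of Algorithm 6: for each a[i] return the first b[k] > a[i]."""
    # Step 4: TM[j][i] = 1 iff b[j] > a[i] (B is stored in DA, A in CA).
    tm = [[int(bj > ai) for ai in a] for bj in b]
    result = []
    for i in range(len(a)):
        # Step 5: SELECT FIRST in column i, i.e. the first row with a 1.
        # Because b is sorted, this row holds the smallest b[k] > a[i].
        k = next((j for j in range(len(b)) if tm[j][i]), None)
        # Step 6: retrieve b[k] for the given a[i].
        result.append(None if k is None else b[k])
    return result
```

For `a = [1, 4, 4, 9]` and `b = [2, 4, 6, 8]` the sketch yields `[2, 6, 6, None]`.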

MULTI-LIST RANKING [1, 5]
Given a set A = {l_1, l_2, ..., l_q} of lists, where each list l_r, 1 ≤ r ≤ q, contains one or more integers; an integer e, which is an element of some list; and a designated integer i. Does element e receive a rank less than or equal to i?

Remark: Ranks of elements are computed iteratively. Each iteration consists of the following steps: 1. assigning a consecutive rank to all first elements of all lists; 2. checking if element e has the current rank; 3. creating new lists by deleting all appearances of elements with this rank from all lists. The procedure is repeated no more than i times, until the rank of element e is found or iteration i gives a negative answer. In general the above problem is known to be in the P-complete class, i.e. it is believed to be inherently sequential. Several restrictions of the problem have, however, feasible highly parallel algorithms. For a detailed complexity characterization please refer to [1]. It is possible in principle to parallelize all steps of each iteration in a multi-comparand associative model, thus reducing the algorithm complexity to O(i). However, such a solution is rather costly, since it requires q separate SELECT FIRST circuits with extra mask and tag circuitry. Moreover, CA must contain all lists. The following algorithm parallelizes only the second and third step of each iteration, performing the many-to-many comparison operation in our basic associative model. The time complexity of the algorithm is O(iq). M[r] denotes a mask vector corresponding to the list l_r in DA. M[q+1] denotes a mask vector for element e in DA. N denotes a global mask vector for the current elements of all lists in DA.
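The three-step iteration described in the remark can be written down as a plain sequential procedure. This sketch follows the wording of the remark only and is not a transcription of the associative algorithm; the function name `rank_at_most` is hypothetical.

```python
def rank_at_most(lists, e, i):
    """Return True iff element e receives a rank of at most i."""
    current = [list(l) for l in lists if l]
    for _ in range(i):                     # at most i iterations
        # Step 1: the first elements of all lists get the current rank.
        heads = {l[0] for l in current}
        # Step 2: check whether e has the current rank.
        if e in heads:
            return True
        # Step 3: delete all appearances of the ranked elements.
        current = [[x for x in l if x not in heads] for l in current]
        current = [l for l in current if l]
        if not current:
            break
    return False


lists = [[1, 2, 3], [2, 4], [3, 1, 5]]
```

In this example the elements 1, 2 and 3 receive rank 1, and the elements 4 and 5 receive rank 2.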

Algorithm 7

1. DA ← A ∪ {e}.
2. SM1, N ← 1.
3. C2 ← 0.
4. s=1;
5. repeat
6. k=0;
7. CM, SM2 ← 0.
8. for r = 1 to q do
9. T2 ← M[r] ∩ N.
10. if w=1 then
11. k=k+1;
12. CA[k] ← DA[SELECT FIRST].
13. CM[k], SM2[k] ← 1.
14. DM ← M[q+1].
15. compute TM for the p.f. {=}.
16. compute T2 for the p.f. {=}.
17. if w=1 then
18. DM ← M[1] ∪ ... ∪ M[q].
19. compute TM for the p.f. {=}.
20. compute T2 for the p.f. {=}.
21. N ← ¬T2.
22. s:=s+1;
23. until w=0 or s=i+1;
24. if w=0 then return YES else return NO.

4 Final remarks

In this paper a multi-comparand processor architecture was proposed which enables highly parallel search operations. The machine is general and flexible enough to meet a wide range of requirements and to solve combinatorial problems belonging to various complexity classes, including polynomial, P-complete, isomorphic-complete and NP-complete problems. Some other machines known from the literature are included in the presented model. Some particular topics may be the object of further investigation. For instance, establishing the relationships between various prescription functions used in associative processing may contribute to minimizing the hardware cost of a single TM cell. Various techniques of speeding up search operations may still be discovered (see [23]). Finally, a closer inspection of various classes of combinatorial problems should significantly extend the application domain and result in many new algorithms based on the multi-comparand search operation.

References

[1] Dessmark A., Lingas A., Maheshwari A.: Multi-list ranking: complexity and applications, Proc. 10th Annual Symposium on Theoretical Aspects of Computer Science STACS'93, February 1993, Würzburg, Germany, LNCS 665, Springer-Verlag 1993, pp. 306-316.
[2] Digby D.W.: A search memory for many-to-many comparisons, IEEE Transactions on Computers, C-22 (1973), No. 8, pp. 768-772.
[3] Fernstrom C., Kruzela I., Svensson B.: LUCAS associative array processor. Design, programming and application studies, LNCS 216, Springer-Verlag 1986.
[4] Foster C.C.: Content addressable parallel processors, Van Nostrand Reinhold, N.Y. 1976.
[5] Greenlaw R., Hoover H.J., Ruzzo W.L.: Limits to parallel computation: P-completeness theory, Oxford University Press, N.Y. - Oxford, 1995.
[6] Herrmann F.P. et al.: A dynamic three-state memory cell for high-density associative processors, IEEE Journal of Solid-State Circuits, 26 (1991), No. 4, pp. 537-541.
[7] Higuchi T. et al.: The IXM2 parallel associative processor for AI, Computer, 27 (1994), No. 11, pp. 53-63.
[8] Jalaleddine S.M.S., Johnson L.G.: Associative IC memories with relational search and nearest-match capabilities, IEEE Journal of Solid-State Circuits, 27 (1992), No. 6, pp. 892-900.
[9] Kapralski A., Kokosinski Z., Mol W.: An electronic circuit for the maximum selection in a RAM, Patent No. 146021, Polish Patent Office, 1989.
[10] Kapralski A.: The maximum and minimum selector SELRAM and its application for developing fast sorting machines, IEEE Transactions on Computers, 38 (1989), No. 11, pp. 1572-1576.
[11] Kapralski A.: Supercomputing for solving a class of NP-complete and isomorphic complete problems, Computer Systems Science & Eng., 7 (1992), No. 4, pp. 218-228.
[12] Kapralski A.: Sequential and parallel processing in depth search machines, World Scientific, 1994.
[13] Knuth D.E.: The art of computer programming, Vol. 3, Sorting and searching, Addison-Wesley, Reading, MA, 1973.
[14] Kokosinski Z.: On generation of permutations through decomposition of symmetric groups into cosets, BIT, 30 (1990), pp. 583-591.
[15] Kokosinski Z.: Mask and pattern generation for associative supercomputing, Proc. 12th Int. Conf. on Applied Informatics AI'94, May 1994, Annecy, France, pp. 324-326.
[16] Kokosinski Z.: On parallel generation of combinations in associative processor architectures, Proc. of the Int. Conf. on Parallel and Distributed Systems EuroPDS'97, June 1997, Barcelona, Spain.
[17] Krikelis A., Weems C.C. (eds): Associative processing and processors, IEEE Computer Society Press, Los Alamitos, 1997.
[18] Louri A., Hatch J.A.: An optical associative parallel processor for high-speed database processing, Computer, 27 (1994), No. 11, pp. 65-72.
[19] Matousek J.: Geometric range searching, Computing Surveys, 26 (1994), No. 4, pp. 421-461.
[20] Mehlhorn K.: Data structures and algorithms 1: Sorting and searching, EATCS Monographs on Theoretical Computer Science, Springer-Verlag, 1984.
[21] Mehlhorn K.: Data structures and algorithms 3: Multi-dimensional searching and computational geometry, EATCS Monographs on Theoretical Computer Science, Springer-Verlag, 1984.
[22] Parhami B.: Associative memories and processors: an overview and selected bibliography, Proc. IEEE, 61 (1973), pp. 722-730.
[23] Parhami B.: Extreme-value search and general selection algorithms for fully parallel associative memories, The Computer Journal, 39 (1996), No. 3, pp. 241-250.
[24] Ramamoorthy C.V., Turner J.L., Wah B.W.: A design of a fast cellular associative memory for ordered retrieval, IEEE Transactions on Computers, C-27 (1978), No. 9, pp. 800-815.
[25] Schultz K.J., Gulak P.G.: Fully parallel integrated CAM/RAM using preclassification to enable large capacities, IEEE Journal of Solid-State Circuits, 31 (1996), No. 5, pp. 689-699.
[26] Stuttgen H.: A hierarchical associative processing system, LNCS 195, Springer-Verlag, 1986.
[27] Thurber K.J., Wald L.D.: Associative and parallel processors, Computing Surveys, 7 (1975), No. 4, pp. 215-255.
[28] Wen Z.: Parallel Multiple Search, Proceedings 5th International Parallel Processing Symposium, April-May 1991, Anaheim, CA, USA, pp. 114-119.
[29] Yamagata T. et al.: A 288-kb fully parallel content addressable memory using a stacked-capacitor cell structure, IEEE Journal of Solid-State Circuits, 27 (1992), No. 12, pp. 1927-1933.
[30] Yau S.S., Fung H.S.: Associative processor architecture - a survey, Computing Surveys, 9 (1977), No. 1, pp. 3-27.
