Unranking Combinations in Parallel

Technical Report 95-1-023


Zbigniew Kokosiński
May 31, 1995

Department of Computer Software The University of Aizu Tsuruga, Ikki-Machi, Aizu-Wakamatsu City Fukushima, 965-80 Japan


Abstract

In this report a parallel algorithm is presented for unranking k-element subsets (combinations) of an n-element set. The computations run on a specialized architecture that combines systolic and associative features. The algorithm has O(k) time complexity and can be applied in specialized systems for parallel adaptive generation of the set of all combinations, of its ordered subsets, and of arbitrary random sequences of combinations. This property enables fast distribution of subtasks among processors in parallel systems devoted to solving some classes of combinatorial problems. In particular, the presented solution may be applied directly to programming hardware generators of combinations used in associative computing.

1. Introduction

In combinatorial problem solving, ranking and unranking of combinatorial objects is of great importance. Algorithms for ranking and unranking combinations, permutations, partitions, trees etc. are often used in adaptive parallel generation of combinatorial objects [1] and play an important role in dividing splittable combinatorial tasks and distributing them among processors [2, 7]. One particular application pointed out recently is the programming of mask and pattern generators used in massive associative architectures [3, 4, 5, 8, 14].

Unranking algorithms for combinations can be used to produce k-subsets of the set {1, 2, ..., n}, where 1 ≤ k ≤ n, for any given sequence of natural numbers, including random sequences. All these algorithms derive the elements a_i of the sequence <a_1, ..., a_i, ..., a_k>, where 1 ≤ a_1 < ... < a_i < ... < a_k ≤ n, for given natural numbers n, k and N (1 ≤ N ≤ C(n,k)), by carrying out operations on binomial coefficients [1, 2, 7, 9, 11, 13]. All known unranking algorithms for combinations are roughly characterized in Table I. A detailed discussion of their properties can be found in [9].

Since no attempt to unrank combinatorial objects in parallel has been reported in the literature so far, in this paper we propose an O(k) parallel algorithm for unranking combinations. It is based on the UNRANKCOMB-A algorithm [7, 9]. Although deriving consecutive elements of the Nth combination is inherently sequential [11], two computation processes can be parallelized: 1) creation of the coefficient table, and 2) searching in the coefficient table.

In the next section we describe the computational structure and the parallel algorithms for creating and processing the coefficient table. The resulting solution can be used both as an unranking-based generator of combinations and as a hardware programmer for the hardware generator of combinations described in [7, 8]. For our parallel algorithms we adopt one model of parallel Boolean computations, namely arithmetic Boolean circuits [12].

TABLE I

Algorithm              Evaluation of            Linear   Time         Space
                       binomial coefficients    order    complexity   complexity
CNR [13]               RF                       IL       O(n)         O(k)
RANKCINV [1]           RF                       DL       O(nk)        O(k)
Kth COMBINATION [2]    CT                       IL       O(nk)        O(nk)
UNRANKCOMB-A [7,9]     CT                       DL       O(n)         O(nk)
UNRANKCOMB-B [9]       CT                       IL       O(n)         O(nk)
UNRANKCOMB-C [9]       CT                       DL       O(k log n)   O(nk)
UNRANKCOMB-D [9]       RF                       DL       O(n)         O(k)

RF - Reduced Factorialing, CT - Coefficient Table, IL - Increasing Lexicographical, DL - Decreasing Lexicographical

2. Parallel algorithm for unranking combinations

First of all we recall some basic notions related to subsets of a set (combinations) [9]. Let <A_i>_{i∈I} denote an indexed family of sets A_i = A, where A = {1, ..., n}, I = {1, ..., k}, 1 ≤ k ≤ n. Any mapping f which "chooses" one element from each set A_1, ..., A_k is called a choice function of the family <A_i>_{i∈I} [10]. If the supplementary condition a_i < a_j, for i < j and i, j ∈ I, is satisfied, then the choice function κ = <a_i>_{i∈I} of the indexed family <A_i>_{i∈I} is called an increasing choice function of this family. The set of all increasing choice functions is a representation of the set of all k-subsets (combinations) of the set A (in this case we deal in fact with the indexed sets C_i = {i, ..., n-k+i} ⊂ A_i) [2].

Let us now introduce a lexicographical order on the set of all choice functions of the family <A_i>_{i∈I}. For given choice functions δ = <d_1, ..., d_k> and γ = <g_1, ..., g_k>, we say that δ is less than γ according to the increasing lexicographical order if and only if there exists i ∈ {1, ..., k} satisfying d_i < g_i and d_j = g_j for every j < i. Likewise, we say that δ is less than γ according to the decreasing lexicographical order if and only if there exists i ∈ {1, ..., k} satisfying d_i > g_i and d_j = g_j for every j < i.
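The two orders defined above can be illustrated with a short Python sketch. It is not part of the report: the function names are illustrative assumptions, and `itertools.combinations` is used only as a convenient way to enumerate the increasing choice functions for small n and k.

```python
from itertools import combinations

def increasing_choice_functions(n, k):
    """All <a_1, ..., a_k> with 1 <= a_1 < ... < a_k <= n."""
    # combinations() yields tuples in increasing lexicographical order
    # when its input sequence is sorted.
    return list(combinations(range(1, n + 1), k))

def less_increasing(d, g):
    """True if d precedes g in the increasing lexicographical order."""
    for d_i, g_i in zip(d, g):
        if d_i != g_i:          # first position where the two differ
            return d_i < g_i
    return False                # identical sequences

def less_decreasing(d, g):
    """True if d precedes g in the decreasing lexicographical order."""
    for d_i, g_i in zip(d, g):
        if d_i != g_i:
            return d_i > g_i
    return False

il = increasing_choice_functions(4, 2)
dl = il[::-1]                   # the decreasing order is the reversed list
print(il)  # [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
print(dl)  # [(3, 4), (2, 4), (2, 3), (1, 4), (1, 3), (1, 2)]
```

For n = 4 and k = 2 the first choice function in the increasing order is <1, 2>, while the first one in the decreasing order is <3, 4>.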

There is a one-to-one correspondence between any linearly ordered set of choice functions S with cardinality |S| = s and the linearly ordered set {0, 1, ..., s-1}. If β ∈ S, then ρ(β) = x is called the rank of β, where ρ is the ranking function. The function ρ^{-1}(x) = β is called the unranking function. The ranks ρ(β) and ρ'(β) of β in the increasing and decreasing lexicographical orders, respectively, satisfy the relation ρ(β) + ρ'(β) = s - 1.

The parallel unranking algorithm UNRANKCOMB-E uses the table A, which contains a part of the modified Pascal Triangle (see Table II). The table A is equivalent to the table PTA in [9]: each binomial coefficient C(n, k) is mapped to the cell A[n-k+2, k], for n ≤ n_max, where n_max is any natural number (the size of the triangle). Note that the elements in each jth column of A form a sequence that increases with the row index i. This property is essential for speeding up the search in the columns of A.

TABLE II (Table A for 1 ≤ k ≤ n ≤ n_max = 6)

i\j     1     2     3     4     5     6
1       0     0     0     0     0     0
2       1     1     1     1     1    ...
3       2     3     4     5    ...
4       3     6    10    ...
5       4    10    ...
6       5    ...
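The cell mapping described above implies A[i, j] = C(i+j-2, j) inside the triangle i+j ≤ n_max+1. The following Python sketch fills the table directly from that closed form and checks the column monotonicity on which the search relies. The function name `table_a` and the dictionary layout are assumptions made for illustration only.

```python
from math import comb

def table_a(n_max):
    """Return {(i, j): C(i+j-2, j)} for all cells with i + j <= n_max + 1."""
    # math.comb(n, k) returns 0 when k > n, which gives the zero first row.
    return {(i, j): comb(i + j - 2, j)
            for j in range(1, n_max + 1)
            for i in range(1, n_max - j + 2)}

n_max = 6
A = table_a(n_max)

# Print the triangle row by row, as in Table II.
for i in range(1, n_max + 1):
    print(i, [A[(i, j)] for j in range(1, n_max - i + 2)])

# Every column is strictly increasing with the row index i.
columns_increasing = all(
    A[(i, j)] < A[(i + 1, j)]
    for j in range(1, n_max + 1)
    for i in range(1, n_max - j + 1)
)
print(columns_increasing)
```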

For a given pair {n, k}, creation of A requires O(nk) steps. For all pairs {n, k}, k ≤ n, creation of A requires O(n^2) steps. One method of creating the table A in parallel is to use systolic computations. Let us assume that each table element A[i, j] corresponds to the cell A[i, j] of a triangular systolic array (in this case i+j ≤ n_max+1), and that each cell can compute the sum of the operands obtained from two of its neighbours: A[i, j] = A[i, j-1] + A[i-1, j]. Accordingly, the inputs of the array are denoted by A[i, 0] for rows and A[0, j] for columns. In this systolic array the construction of the table A requires O(n) steps:

Algorithm CA (Construction of the table A)
1. set in parallel the inputs of the systolic array A
   1.1. for j=1 to n do in parallel A[0, j]:= 0;
   1.2. A[1, 0]:= 0;
   1.3. for i=2 to n do in parallel A[i, 0]:= 1;
2. for m=1 to n_max do
   2.1. for all {(i, j): i+j ≤ n_max+1} do in parallel A[i, j]:= A[i, j-1] + A[i-1, j];
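Algorithm CA can be simulated in software to see why n_max synchronous steps suffice. The Python sketch below is an illustration under stated assumptions, not the hardware design: the function name `construct_table_a` is hypothetical, and each "parallel" step is modelled by letting every cell read a snapshot of the previous step's outputs.

```python
def construct_table_a(n_max):
    """Simulate Algorithm CA on a triangular array with i + j <= n_max + 1."""
    size = n_max + 2
    a = [[0] * size for _ in range(size)]

    # Step 1: boundary inputs. A[0][j] = 0 and A[1][0] = 0 are already zero.
    for i in range(2, n_max + 1):
        a[i][0] = 1

    cells = [(i, j)
             for i in range(1, n_max + 1)
             for j in range(1, n_max + 1)
             if i + j <= n_max + 1]

    # Step 2: n_max synchronous steps. A cell (i, j) holds its final value
    # after i + j - 1 steps, and i + j - 1 <= n_max inside the triangle.
    for _ in range(n_max):
        prev = [row[:] for row in a]      # outputs of the previous step
        for i, j in cells:
            a[i][j] = prev[i][j - 1] + prev[i - 1][j]
    return a

a = construct_table_a(6)
print([a[i][1] for i in range(1, 7)])  # column 1: [0, 1, 2, 3, 4, 5]
print([a[i][2] for i in range(1, 6)])  # column 2: [0, 1, 3, 6, 10]
```

The snapshot copy makes the data dependencies explicit: values propagate one cell per step along the anti-diagonals, which is where the O(n) construction time comes from.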


The hardware implementation of this systolic array is straightforward. Since further processing of the coefficient table does not depend on the way it was created, we can assume that the table is already stored in the array registers. However, in order to execute our parallel algorithm we must build additional processing capabilities into the array, i.e. implement an associative search in all columns of the array A.

Let us assume that the output registers of all cells A[i, j] of the triangular systolic array have equal length and constitute an associative memory. In each jth memory column selected by the mask bit M[j] = 1, a fast parallel comparison is carried out to determine first all elements A[i, j] > P, where P is the input pattern held in the pattern register. If this condition is satisfied for a given A[i, j] in the jth column, then the pointer C[i, j] = 1; otherwise C[i, j] = 0. The pointers C[i, j] are then used to detect the unique cell A[t, j] that satisfies the condition A[t, j] = max {A[i, j]: A[i, j] ≤ P}. To implement this detection, one EX-OR gate is added to each cell A[i, j], realizing the logical function D[i, j] = C[i, j] ⊕ C[i+1, j], where 1 ≤ i ≤ n+1-j and the input C[n+2-j, j] = M[j]. Because the values in a column increase with i, the pointers C[i, j] change from 0 to 1 exactly once along the column; hence D[i, j] = 1 only for i = t, and the cell A[t, j] is uniquely determined. All outputs D[i, j] of the EX-OR gates in the cells lying in the ith row are connected to the ith (n-i+1)-input OR gate. The outputs of all n OR gates are connected to the coder (which converts the "1 out of n" code into binary code). The coder outputs form the binary output vector. A fragment of the computational structure of the array A is shown in Fig. 1.

Figure 1: Logic for processing binary variables C[i, j].

An alternative solution for the search in A is to implement conventional associative memory operations (e.g. "no greater than" and "maximum value"). However, the resulting solution requires more hardware and is significantly slower than the circuit described above.
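The comparator and EX-OR logic for a single selected column can be mimicked in a few lines of Python. This is only a sequential sketch of what the hardware does in one parallel step; the function name `search_column` and the list representation of a column are assumptions for illustration.

```python
def search_column(column, p):
    """Return t (1-based) with column[t-1] = max{A[i, j] : A[i, j] <= p}.

    `column` holds A[1, j], ..., A[n+1-j, j], which increase with i and
    start with A[1, j] = 0, so a pattern p >= 0 always has a match.
    """
    # Pointers C[i, j] = 1 where A[i, j] > P; the extra trailing 1 is the
    # sentinel input C[n+2-j, j] = M[j] of the selected column.
    c = [1 if value > p else 0 for value in column] + [1]
    # EX-OR of neighbouring pointers marks the single 0 -> 1 transition.
    d = [c[i] ^ c[i + 1] for i in range(len(column))]
    if sum(d) != 1:
        raise ValueError("column must be increasing and contain a value <= p")
    return d.index(1) + 1

column_4 = [0, 1, 5]                    # A[1..3, 4] for n = 6
print(search_column(column_4, 2))       # 2, since A[2, 4] = 1 <= 2 < A[3, 4]
print(search_column([0, 1, 4, 10], 10)) # 4, found thanks to the sentinel bit
```

The sentinel matters when every entry of the column is no greater than P: without it there would be no 0-to-1 transition to detect.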


The parallel hardware-oriented algorithm UNRANKCOMB-E presented below generates the Nth increasing choice function in the increasing lexicographical order:

Algorithm UNRANKCOMB-E
Input: n, k, N (0 ≤ N ≤ C(n,k)-1) - the number of the choice function in the increasing lexicographical order; triangular array A[n, n] in which the element A[r-k+2, k] contains the value C(r, k), for k-1 ≤ r ≤ n.
Output: Table K[k] with the choice function κ.
Method: Computations proceed with combination ranks in the decreasing lexicographical order. In each iteration of step 2, all binomial coefficients in the mth column of A are simultaneously compared with the combination rank P. The values M[j], C[i, j] and D[i, j] are computed in parallel in step 2.1. On this basis the binary vector D is computed in step 2.2; it contains a unique bit D[t] = 1, pointing out the coefficient A[t, j] = max {A[i, j]: A[i, j] ≤ P}. In each iteration the next value K[k-m+1] is computed, and a new combination rank P is derived for the condition in step 2.1.3.1. After k iterations we obtain the Nth increasing choice function κ.

1. P:= A[n-k+1, k] + A[n-k+2, k-1] - 1 - N;
2. for m=k downto 1 do
   2.1. for j=1 to n do in parallel
        2.1.1. if j=m then M[j]:=1 else M[j]:=0;
        2.1.2. C[n-j+2, j]:= M[j];
        2.1.3. for i=1 to n-j+1 do in parallel
               2.1.3.1. if A[i, j]*M[j] > P then C[i, j]:=1 else C[i, j]:=0;
               2.1.3.2. D[i, j]:= C[i, j] ⊕ C[i+1, j];
   2.2. for i=1 to n do in parallel
        2.2.1. D[i]:= D[i, 1] ∨ D[i, 2] ∨ ... ∨ D[i, n-i+1];
   2.3. convert vector D into t;
   2.4. K[k-m+1]:= n-t-m+2;
   2.5. P:= P - A[t, m];
3. return K.
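The steps above can be traced with a sequential Python simulation. It is a sketch under stated assumptions rather than the hardware algorithm itself: only the selected column m is processed (the masked columns contribute zeros in the circuit), the table is filled from the closed form A[i, j] = C(i+j-2, j), the boundary inputs A[i, 0] = 1 are extended to row n+1 so that step 1 also works for k = 1, and the name `unrankcomb_e` is hypothetical.

```python
from math import comb

def unrankcomb_e(n, k, N):
    """Sequential simulation of UNRANKCOMB-E; returns [a_1, ..., a_k]."""
    if not (1 <= k <= n and 0 <= N < comb(n, k)):
        raise ValueError("need 1 <= k <= n and 0 <= N <= C(n,k) - 1")

    # Table A with 1-based indices: A[i][j] = C(i+j-2, j), i + j <= n + 1.
    A = [[0] * (n + 2) for _ in range(n + 2)]
    for j in range(1, n + 1):
        for i in range(1, n - j + 2):
            A[i][j] = comb(i + j - 2, j)
    # Assumption: boundary inputs A[i][0] = 1 for i >= 2, up to row n + 1.
    for i in range(2, n + 2):
        A[i][0] = 1

    # Step 1: P = C(n, k) - 1 - N, the rank in the decreasing order.
    P = A[n - k + 1][k] + A[n - k + 2][k - 1] - 1 - N

    K = []
    for m in range(k, 0, -1):                    # step 2
        rows = n - m + 1
        C = [0] * (rows + 2)
        C[rows + 1] = 1                          # step 2.1.2: sentinel M[m] = 1
        for i in range(1, rows + 1):             # step 2.1.3.1
            C[i] = 1 if A[i][m] > P else 0
        D = [C[i] ^ C[i + 1] for i in range(1, rows + 1)]  # step 2.1.3.2
        t = D.index(1) + 1                       # steps 2.2 and 2.3
        K.append(n - t - m + 2)                  # step 2.4: K[k-m+1]
        P -= A[t][m]                             # step 2.5
    return K

print(unrankcomb_e(6, 4, 12))  # [2, 3, 5, 6]
```

In this simulation the inner comparison loop costs O(n) per iteration; the O(1) cost per iteration claimed for the circuit comes from performing all comparisons of a column simultaneously.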

Example 1 For n=6 and k=4 find the 12th increasing choice function in the increasing lexicographical order.


Solution:

Step 1: P = A[3, 4] + A[4, 3] - 1 - 12 = 2

Step 2: m = 4;
  Step 2.1:
    Step 2.1.1: M = [0,0,0,1,0,0];
    Step 2.1.2: C[4, 4] = 1; all others C[n-j+2, j] = 0;
    Step 2.1.3: C[3, 4] = 1; all others C[i, j] = 0; D[2, 4] = 1;
  Step 2.2: D[2] = 1; all others D[i] = 0;
  Step 2.3: t = 2;
  Step 2.4: K[1] = 2;
  Step 2.5: P = 2 - A[2, 4] = 1;

Step 2: m = 3;
  Step 2.1:
    Step 2.1.1: M = [0,0,1,0,0,0];
    Step 2.1.2: C[5, 3] = 1; all others C[n-j+2, j] = 0;
    Step 2.1.3: C[3, 3] = C[4, 3] = 1; all others C[i, j] = 0; D[2, 3] = 1;
  Step 2.2: D[2] = 1; all others D[i] = 0;
  Step 2.3: t = 2;
  Step 2.4: K[2] = 3;
  Step 2.5: P = 1 - A[2, 3] = 0;

Step 2: m = 2;
  Step 2.1:
    Step 2.1.1: M = [0,1,0,0,0,0];
    Step 2.1.2: C[6, 2] = 1; all others C[n-j+2, j] = 0;
    Step 2.1.3: C[2, 2] = C[3, 2] = C[4, 2] = C[5, 2] = 1; all others C[i, j] = 0; D[1, 2] = 1;
  Step 2.2: D[1] = 1; all others D[i] = 0;
  Step 2.3: t = 1;
  Step 2.4: K[3] = 5;
  Step 2.5: P = 0 - A[1, 2] = 0;

Step 2: m = 1;
  Step 2.1:
    Step 2.1.1: M = [1,0,0,0,0,0];
    Step 2.1.2: C[7, 1] = 1; all others C[n-j+2, j] = 0;
    Step 2.1.3: C[2, 1] = C[3, 1] = C[4, 1] = C[5, 1] = C[6, 1] = 1; all others C[i, j] = 0; D[1, 1] = 1;
  Step 2.2: D[1] = 1; all others D[i] = 0;
  Step 2.3: t = 1;
  Step 2.4: K[4] = 6;
  Step 2.5: P = 0 - A[1, 1] = 0;

Step 3: return K.

The 12th increasing choice function, for n=6 and k=4, in the increasing lexicographical order is <2, 3, 5, 6>. □
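The result of Example 1 can be cross-checked by brute force. The Python sketch below, which is not part of the report, enumerates all C(6, 4) = 15 increasing choice functions with `itertools.combinations`, reads off the one with rank 12 (ranks start at 0), and confirms that its rank in the decreasing order equals the initial value P = 2 computed in Step 1.

```python
from itertools import combinations

n, k, N = 6, 4, 12

# combinations() yields the k-subsets in increasing lexicographical order.
increasing = list(combinations(range(1, n + 1), k))
decreasing = increasing[::-1]

s = len(increasing)                  # s = C(6, 4)
beta = increasing[N]                 # choice function with rank N
rho_prime = decreasing.index(beta)   # its rank in the decreasing order

print(beta, rho_prime, s)            # (2, 3, 5, 6) 2 15
print(N + rho_prime == s - 1)        # True: rho + rho' = s - 1
```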


Theorem 1 Algorithm UNRANKCOMB-E is correct and its asymptotic computational complexity is O(k).

Proof The unranking algorithm is a variant of Lehmer's scheme [11]. Correctness of the method follows from the proof of Theorem 1 in [9]. The search in consecutive columns of the coefficient table A is organized in an associative manner. The rank P loaded into the pattern register is simultaneously compared with all values C(r, m) stored in the cells A[r-m+2, m] of the mth column of A. This reduces the search time in the mth column to O(1). In this way the value t is determined. Then K[k-m+1] is obtained. Before the next iteration the rank P is modified in step 2.5 of the algorithm. Each iteration of step 2 has time complexity O(1). Since the for loop in step 2 has k iterations, the total complexity of the algorithm is O(k). □

The Nth increasing choice function generated by the unranking algorithm can be used as input data for the programmable combination generator. The task of generating all C(n,k) combinations using a given number of processors can be easily divided into subtasks, and the resulting adaptive generation algorithm [1] is efficient in many applications. Because hardware generation of combinatorial objects such as permutations, combinations, partitions etc. provides fast generation of mask vectors and interconnections for pattern input in associative processors, the presented solution of parallel unranking of combinations in O(k) time seems to be very attractive. In the case of the versatile generator described in [7, 8], the programming process can be easily synchronized with the process of unranking, and both processes may overlap. Since the regular cellular structures of both circuits also overlap, the combination generator and its programmer can be implemented together as one integrated structure.

In this report we presented only one particular solution of parallel combination unranking. Applying the same approach, more unranking algorithms and circuits can be designed that satisfy additional design requirements which must be taken into account when integration with other system components is needed. They may differ in combination representations and code conversion.
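One simple way to divide the generation task among processors is to give each processor a contiguous block of ranks and to unrank only the first combination of every block. The Python sketch below illustrates this idea; the contiguous-block policy and the name `rank_blocks` are assumptions made here for illustration and are not taken from the adaptive algorithm in [1].

```python
from math import comb

def rank_blocks(n, k, p):
    """Split the ranks 0 .. C(n,k)-1 into p contiguous half-open blocks."""
    total = comb(n, k)
    base, extra = divmod(total, p)
    blocks, start = [], 0
    for q in range(p):
        size = base + (1 if q < extra else 0)  # first `extra` blocks get one more
        blocks.append((start, start + size))
        start += size
    return blocks

# Four processors sharing the 15 combinations of Example 1 (n = 6, k = 4).
print(rank_blocks(6, 4, 4))  # [(0, 4), (4, 8), (8, 12), (12, 15)]
```

Under this policy each processor would need a single unranking operation for the first rank of its block, after which it can continue with an ordinary successor-based generator.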


References

[1] Akl S.G.: Design and analysis of parallel algorithms, Prentice Hall, Englewood Cliffs, N.J., 1989, pp. 148-150.
[2] Kapralski A.: New methods for generation permutations, combinations and other combinatorial objects in parallel, J. Parallel and Distrib. Computing, 17 (1993), pp. 315-326.
[3] Kapralski A.: Sorting and searching in depth search machines, TR 93-1-001, University of Aizu, 1993, 66 pp.
[4] Kapralski A.: Sequential and parallel processing in depth search machines, World Scientific, 1994.
[5] Kapralski A.: Supercomputing for solving a class of NP-complete and isomorphic complete problems, Computer Systems Science & Eng., 7 (1992), No. 4, pp. 218-228.
[6] Kokosiński Z.: On generation of permutations through decomposition of symmetric groups into cosets, BIT, 30 (1990), pp. 583-591.
[7] Kokosiński Z.: Circuits generating combinatorial configurations for sequential and parallel computer systems, Monografia 160, Politechnika Krakowska, Krakow, Poland, 1993, 106 pp. (in Polish)
[8] Kokosiński Z.: Mask and pattern generation for associative supercomputing, Proceedings of the Twelfth IASTED International Conference "Applied Informatics", Annecy, France, May 1994, pp. 324-326.
[9] Kokosiński Z.: Algorithms for unranking combinations and other related choice functions, TR 95-1-006, University of Aizu, 1995, 20 pp.
[10] Mirsky L.: Transversal theory, Academic Press, N.Y. 1971.
[11] Lehmer D.H.: The machine tools of combinatorics, [in:] Beckenbach E.F. (editor): Applied combinatorial mathematics, John Wiley, N.Y. 1964, pp. 5-31.
[12] Reif J.H. (editor): Synthesis of parallel algorithms, Morgan Kaufmann 1993.
[13] Tang C.Y., Du M.W. and Lee R.C.T.: Parallel generation of combinations, [in:] Proc. Int. Computer Symposium, Taipei, Taiwan 1984, pp. 1006-1010.
[14] Yau S.S., Fung H.S.: Associative processor architecture - a survey, Computing Surveys, 9 (1977), No. 1, pp. 3-27.
