ARCHITECTURE OF RECONFIGURABLE PROCESSOR FOR IMPLEMENTING SEARCH ALGORITHMS OVER DISCRETE MATRICES

Valery Sklyarov, Iouliia Skliarova
University of Aveiro, Department of Electronics and Telecommunications, IEETA, 3810-193 Aveiro, Portugal

Abstract. The paper suggests an architecture for a reconfigurable processor that can be customized to implement different search algorithms over discrete matrices. Such algorithms can be used to solve various problems of combinatorial optimization, such as covering, Boolean satisfiability, etc. The proposed architecture contains memory blocks for a binary or ternary matrix, general-purpose registers, five stacks that make it possible to carry out recursive search procedures based on a decision tree, and a reprogrammable functional unit that performs the required operations over binary and ternary vectors. Two levels of control circuits are suggested. The first (top) level implements the search algorithm. The second (bottom) level implements the operations that the algorithm requires. The same architecture makes it possible to implement different search algorithms by reprogramming the RAM (ROM)-based components of the control circuits.

Keywords – problems of combinatorial optimization, reconfigurable architecture, search algorithm, FPGA, reprogrammable finite state machines, hierarchical control algorithm

I. INTRODUCTION

There are many practical applications [1-13] that require the solution of combinatorial problems, such as Boolean satisfiability [1-9], covering of binary matrices [10-13], graph coloring [10,14], etc. It is known that the majority of these problems are NP-hard and, as a result, they are time and resource consuming. That is why it is very important to design and implement dedicated accelerators, such as coprocessors for general-purpose computers [15-23]. Combinatorial computations have two distinctive features [24]. Firstly, as a rule, they require considering a huge number of different variants. Secondly, these variants are frequently ordered and examined with the aid of a decision tree, which provides an efficient way of handling intermediate solutions. The decision tree is constructed during the search process and is traversed starting from its root. During the search, special reduction methods are applied that both simplify intermediate situations and reduce the number of variants to be analyzed. Most combinatorial problems are discrete in nature. That is why they can be formulated in terms of such mathematical models as graphs, sets, discrete matrices, etc. [25]. These models are convertible, in the sense that one model can be formally transformed into another. For example, [25] shows that they can be converted to a universal matrix representation. Logic matrices are very well suited to hardware processing (in FPGAs in particular [24-28]). This property was an important reason for choosing matrices as the basic mathematical model for the architecture proposed in this paper. A classification of different problems that can be solved over discrete matrices is presented in [11,29]. Analysis of these problems shows that each of them requires quite a limited number of different operations. On the other hand, as a rule, different combinatorial problems require different subsets of operations. Two conclusions follow. First, reconfigurable devices should be more advantageous, because the same device can be efficiently employed for solving different problems. Second, the reconfiguration time should be negligible compared with the time of combinatorial computations. With the advent of Field Programmable Logic Devices (FPLDs) it became possible to design and implement digital systems and their components without the technological steps dealing with silicon. This process is very similar to the development of software for general-purpose computers. Initially, FPLDs were used for implementing unique, and usually relatively simple, digital circuits.
The effectiveness of this approach stimulated enormous research efforts aimed at developing new generations of FPLDs with a huge number of available logic primitives and much more powerful capabilities. Today FPLDs are considered an alternative to ASICs, and they have already been used very efficiently in a large number of practical applications, such as co-processors for general-purpose computers, problem-oriented digital systems, embedded controllers, prototyping boards, and so on.

All basic components of the proposed architecture of the combinatorial processor can easily be implemented in FPGAs of the Xilinx Spartan-II/Virtex families. Most models of computation in digital systems include components with state, whose behavior is given as a sequence of state transitions handled by control circuits. In order to provide a system with the properties of extensibility, flexibility and reusability, we have to be able to modify the behavior of control circuits after the system has been designed and implemented in hardware. If the circuit is designed so that it can be flexibly modified and updated at run time, a system that requires hardware resources Rc can be implemented on available hardware with resources Rh, where Rc > Rh. The main idea of the approach considered for the combinatorial processor is a rational combination of FPGA capabilities with the suggested methods for producing a modifiable specification.

II. AN EXAMPLE OF COMBINATORIAL SEARCH

Fig. 1 depicts the basic algorithm of combinatorial search, which can be used to solve combinatorial problems over Boolean (binary) and ternary matrices.

Figure 1. Basic search algorithm (flowchart of Γ with the blocks: Begin; applying reduction rules to simplify the problem; applying selection rules to choose a component; recording the intermediate result and simplifying the matrix; branching required?; recursive call of the same algorithm Γ; the solution is found?; previous result is either equal or better?; recording the final result; End)

The steps of the algorithm will be demonstrated on the example of the exact method [10] for finding a minimal column cover of the following binary matrix:

        A B C D E F G H I
    1   1 0 0 0 0 1 0 1 1
    2   0 1 1 0 0 0 1 0 0
    3   0 0 1 1 0 1 0 0 1
    4   1 0 0 1 1 0 0 0 0
    5   0 0 1 0 0 0 1 1 1
    6   0 1 0 0 0 1 0 0 0
    7   0 1 0 0 1 0 1 0 1
    8   1 0 1 1 0 0 0 1 0
    9   0 0 0 0 0 1 0 1 0
   10   0 0 0 1 0 0 0 0 1                (1)

Columns D, F, G represent a minimal column cover, i.e. a minimal subset of columns that have at least one value “1” in each row of the matrix. The following rules will be used to simplify the matrix:
• If for i≠j rowi & rowj = rowj, then rowi can be removed from the matrix. For example, row1 = 100001011, row9 = 000001010, row1 & row9 = row9, and row1 has to be removed from the matrix;
• If for i≠j columni & columnj = columni, then columni can be removed from the matrix. For example, after deleting rows 1 and 3 using the first rule, columnA = 01000100, columnD = 01000101, columnA & columnD = columnA, and columnA has to be removed from the matrix;
• If any column contains only values “0”, it has to be removed from the matrix;
• If there is a row that does not have any value “1”, then a covering cannot be found.
The first two operations are called subsumption operations. For selection purposes the following rules will be used:
• If a row has just one value “1”, then the column containing this value must be included in the covering;
• If all rows have more than one value “1”, then the first row from the top of the matrix that contains the minimum number of ones has to be selected. For this row it is necessary to analyze all possible branches, and the number of such branches is equal to the number of values “1” in the row. Any branch has to be examined only until the step where the intermediate result can no longer lead to a covering better than the best one discovered so far.
Fig. 2 shows all the steps that are required to find a minimal column cover of the matrix (1).
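As a small illustration of why these rules suit hardware, each of them reduces to one bitwise AND followed by a comparison. The following Python sketch is ours, not part of the paper: vectors are held as integers, all function names are hypothetical, and it only checks the two examples quoted above.

```python
def bits(vector: str) -> int:
    """Turn a vector written as a string of 0s and 1s into an integer."""
    return int(vector, 2)


def row_is_redundant(row_i: int, row_j: int) -> bool:
    """Rule 1: row_i & row_j == row_j means row_i can be removed, because any
    column that covers row_j also covers row_i. If the two rows are equal,
    only one of them may be removed."""
    return (row_i & row_j) == row_j


def column_is_redundant(col_i: int, col_j: int) -> bool:
    """Rule 2: col_i & col_j == col_i means col_i can be removed, because
    col_j covers every row that col_i covers."""
    return (col_i & col_j) == col_i


def column_is_empty(col: int) -> bool:
    """Rule 3: a column without ones is removed."""
    return col == 0


def row_is_uncoverable(row: int) -> bool:
    """Rule 4: a row without ones means that no covering exists."""
    return row == 0


# The two examples from the text (the columns are taken after rows 1 and 3 are removed)
row1, row9 = bits("100001011"), bits("000001010")
col_a, col_d = bits("01000100"), bits("01000101")
print(row_is_redundant(row1, row9), column_is_redundant(col_a, col_d))  # True True
```

In hardware, the same test is a wide AND gate array plus an equality comparator, so a whole row or column is checked in one step.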

Figure 2. Using the search algorithm to find a minimal column cover of the matrix (1)

The path that leads to the minimal cover (columns D, G, F) is shown with double arrows. There are three branching points in fig. 2: D-E, G-H and D-I. After the first solution (D, G, F) has been found, we are interested only in coverings that contain 2 or fewer columns. Thus, it is not necessary to traverse all branches: the search along a branch can be stopped at any point that gives a 2-component incomplete solution. Similar search algorithms can be used for solving many combinatorial problems, such as Boolean satisfiability [27,28], graph coloring [10,11], etc. These algorithms have several distinctive features:
1. They are recursive and, as a result, they require a recursive control circuit.
2. They do not change the initial data (i.e. the initial matrix), because matrix reduction can be provided by masking some rows/columns and using just the remainder of the matrix.
3. They invoke a very limited number of operations (such as the reduction and selection operations considered above), which have to be applied to a huge volume of data.
4. The subsets of required operations usually differ between combinatorial problems (compare, for example, the subset considered above with the subset of operations employed for Boolean satisfiability [24]).
5. In order to perform forward and backtrack propagation, we can use stack memory that stores and restores intermediate results (such as values of mask registers) at branching points (see rectangular nodes in fig. 2).
6. The algorithms can be decomposed into two levels of control operations. The top-level (recursive) sequence is very similar for different algorithms. The bottom-level sequence executes the required operations over Boolean and ternary vectors. As already mentioned, these operations generally differ between algorithms, which requires changes to the functionality of the respective circuit. This, in particular, shows the need for reconfiguration.
All these features have been taken into account in the proposed architecture of the combinatorial processor, which targets the considered type of combinatorial search algorithms.
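The features above can be prototyped in software before being mapped to hardware. The following Python sketch is one possible reading of algorithm Γ for the covering problem of section II, under our own assumptions: matrix (1) is never modified (feature 2), the sets of active rows and columns stand in for the mask registers, every recursive call works on copies so that returning from it restores the branching point (features 1 and 5), and a branch is abandoned as soon as it cannot yield a smaller covering than the best one found. The order in which the reduction rules are applied (columns, then rows, repeated until nothing changes) and all names are assumptions, so the intermediate matrices need not coincide with every node of fig. 2, although on matrix (1) the sketch takes the branches D-E and G-H and returns D, G, F.

```python
from typing import List, Optional, Set

# Matrix (1): rows 1..10 (top to bottom), columns A..I (left to right)
MATRIX = [
    "100001011", "011000100", "001101001", "100110000", "001000111",
    "010001000", "010010101", "101100010", "000001010", "000100001",
]


def min_column_cover(matrix: List[str], names: str = "ABCDEFGHI") -> Optional[List[str]]:
    """Exact minimal column cover by recursive search (a sketch of algorithm Γ)."""

    def row_ones(r: int, cols: Set[int]) -> Set[int]:
        # Unmasked columns that have "1" in row r
        return {c for c in cols if matrix[r][c] == "1"}

    def col_ones(c: int, rows: Set[int]) -> Set[int]:
        # Unmasked rows that have "1" in column c
        return {r for r in rows if matrix[r][c] == "1"}

    def reduce(rows: Set[int], cols: Set[int]) -> bool:
        """Apply the reduction rules in place; return False if no covering exists."""
        while True:
            if any(not row_ones(r, cols) for r in rows):
                return False  # rule 4: a row without ones cannot be covered
            changed = False
            for c in sorted(cols):  # rules 2 and 3, leftmost column first
                cc = col_ones(c, rows)
                if not cc or any(d != c and cc <= col_ones(d, rows) for d in cols):
                    cols.discard(c)
                    changed = True
            for r in sorted(rows):  # rule 1, topmost row first
                rr = row_ones(r, cols)
                if any(s != r and row_ones(s, cols) <= rr for s in rows):
                    rows.discard(r)
                    changed = True
            if not changed:
                return True

    best: Optional[List[int]] = None

    def search(rows: Set[int], cols: Set[int], chosen: List[int]) -> None:
        nonlocal best
        if not rows:  # every row is covered: keep the result only if it is better
            if best is None or len(chosen) < len(best):
                best = list(chosen)
            return
        if best is not None and len(chosen) + 1 >= len(best):
            return  # at least one more column is needed, so this branch cannot win
        rows, cols = set(rows), set(cols)  # local copies play the role of pushed masks
        if not reduce(rows, cols):
            return
        # Selection: the first row from the top with the minimum number of ones
        row = min(sorted(rows), key=lambda r: len(row_ones(r, cols)))
        for c in sorted(row_ones(row, cols)):  # one branch per "1", left to right
            search(rows - col_ones(c, rows), cols - {c}, chosen + [c])

    search(set(range(len(matrix))), set(range(len(matrix[0]))), [])
    return None if best is None else [names[c] for c in best]


print(min_column_cover(MATRIX))  # ['D', 'G', 'F']
```

A single-one row needs no special case here: its count is the minimum, so it is selected and yields exactly one branch, which is the first selection rule.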

III. ARCHITECTURE OF RECONFIGURABLE PROCESSOR

Fig. 3 depicts the proposed architecture of the reconfigurable processor. The processor is composed of five primary units, which are described below.

III.1. Storage for ternary (binary) matrix. Any binary matrix is kept in two memory (RAM) blocks. The first block addresses columns of the matrix and the second block addresses rows of the matrix (i.e. the initial matrix and its transpose are stored in two separate memories). This makes it possible to read any row or column in one clock cycle. An address is provided by the respective address counter, which can also be loaded in parallel (i.e. any arbitrary address can be specified). Any ternary matrix is coded by two binary matrices. The first matrix contains values "1" in all positions where the initial ternary matrix has values "1", and values "0" in all other positions. The second matrix contains values "1" in all positions where the initial ternary matrix has values "0", and values "0" in all other positions. Thus zeros, ones and don't-care values are coded as 01, 10 and 00 respectively (the first bit is taken from the first matrix and the second bit from the second). Note that the combination 11 is not needed for the considered algorithms, but it can also be used if necessary.

Figure 3. Architecture of reconfigurable combinatorial processor for implementing search algorithms over Boolean and ternary matrices (blocks: storage for matrix with column and row address counters; stack of masks for rows; stack of masks for columns; stack of results; auxiliary stack; stack of branch masks; general-purpose registers; reprogrammable functional unit for computations over Boolean and ternary vectors; control unit with modifiable functionality)
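A minimal sketch of the two-matrix coding of III.1, assuming ternary vectors are written with '1', '0' and '-' for don't care. It shows the two-bit codes 01, 10 and 00 and, as one example of an operation that becomes a couple of bitwise instructions under this coding, an orthogonality test (two ternary vectors are orthogonal if some position holds 1 in one and 0 in the other). The function names and the choice of operation are ours, not the paper's.

```python
from typing import Tuple

# Assumption: ternary vectors use '1', '0' and '-' (don't care).


def encode(vector: str) -> Tuple[int, int]:
    """Split a ternary vector into its rows of the two binary matrices.

    ones  has "1" where the ternary value is 1 (first matrix),
    zeros has "1" where the ternary value is 0 (second matrix);
    a don't care gives "0" in both.
    """
    ones = int("".join("1" if v == "1" else "0" for v in vector), 2)
    zeros = int("".join("1" if v == "0" else "0" for v in vector), 2)
    return ones, zeros


def code_of(value: str) -> str:
    """Two-bit code of one ternary value: (bit of first matrix, bit of second matrix)."""
    ones, zeros = encode(value)
    return f"{ones}{zeros}"


def decode(ones: int, zeros: int, width: int) -> str:
    """Rebuild the ternary vector; the unused code 11 is rejected."""
    symbols = {(1, 0): "1", (0, 1): "0", (0, 0): "-"}
    out = []
    for i in range(width - 1, -1, -1):  # most significant bit = leftmost position
        pair = ((ones >> i) & 1, (zeros >> i) & 1)
        if pair not in symbols:
            raise ValueError("code 11 is not used by the considered algorithms")
        out.append(symbols[pair])
    return "".join(out)


def orthogonal(u: Tuple[int, int], v: Tuple[int, int]) -> bool:
    """True if some position holds 1 in one vector and 0 in the other."""
    return bool((u[0] & v[1]) | (u[1] & v[0]))


print([code_of(v) for v in "01-"])  # ['01', '10', '00']
```

Because a whole row of each matrix is read in one clock cycle, such a test costs one read of each matrix and a few gates, independent of the vector length.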

III.2. There are five stacks, which make it possible to store intermediate results at any branching point in order to provide backtracking if required. The stacks are employed for storing the following data:
• masks for rows of the matrix;
• masks for columns of the matrix;
• intermediate results at branching points (i.e. columns that have already been included in the covering);
• masks for branches that have already been considered at any branching point;
• values of estimation criteria that make it possible to select the proper component of the matrix (auxiliary stack). For example, in the considered covering problem this stack contains the number of ones in each row, and this number has to be corrected at every step that removes row(s)/column(s) of the matrix with values "1".

III.3. General-purpose registers for storing intermediate results.

III.4. Reprogrammable functional unit, implementing bottom-level control, for computations over Boolean and ternary vectors. This unit carries out all the required elementary operations. For the example considered above, it has to execute the subsumption operation, which is required for reduction of the matrix, and the counting operation (counting the number of ones in rows), which is needed for selection purposes. Reprogrammability makes it possible to reuse the same unit for solving different combinatorial problems.

III.5. Control unit with modifiable functionality, implementing top-level control. Modifiability is needed to realize application-specific operations. For example, for any removed row/column in the considered covering algorithm it is necessary to correct the values for the auxiliary stack.

Let us now consider how the suggested architecture can be employed for realizing the search algorithm that solves the covering problem described in section II. Fig. 4 demonstrates all the steps of the algorithm execution. Fig. 5 shows the contents of all stacks and a coded value of the result D, G, F, which is stored in a general-purpose register. Note that additional registers are used for 3 of the stacks in fig. 5. This is necessary because the masks and the numbers of ones have to be changed between push operations.
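The interplay between working registers and stacks at a branching point can be modelled as follows. This Python sketch is hypothetical: the names, the string encoding of masks, and the decision to save a copy of the partial covering (rather than only the chosen column, as the stack of results in the paper does) are our assumptions. The values in the usage example are the masks quoted for step 3 of the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Registers:
    """Working registers (hypothetical layout). A "1" in a mask marks a removed row/column."""
    row_mask: str
    col_mask: str
    aux: str          # numbers of ones per row, as kept in the auxiliary register
    branch_mask: str  # columns already tried at the current branching point
    cover: List[str] = field(default_factory=list)  # columns included so far


class StackMemory:
    """The five stacks of III.2: one entry per open branching point on each stack."""

    def __init__(self) -> None:
        self.row_masks: List[str] = []
        self.col_masks: List[str] = []
        self.aux: List[str] = []               # auxiliary stack
        self.branch_masks: List[str] = []
        self.results: List[List[str]] = []     # stack of results

    def push(self, regs: Registers) -> None:
        """Save everything needed to come back to this branching point later."""
        self.row_masks.append(regs.row_mask)
        self.col_masks.append(regs.col_mask)
        self.aux.append(regs.aux)
        self.branch_masks.append(regs.branch_mask)
        self.results.append(list(regs.cover))  # copy: later changes must not leak in

    def pop(self) -> Registers:
        """Backtrack: restore the registers of the most recent branching point."""
        return Registers(
            row_mask=self.row_masks.pop(),
            col_mask=self.col_masks.pop(),
            aux=self.aux.pop(),
            branch_mask=self.branch_masks.pop(),
            cover=self.results.pop(),
        )

    def depth(self) -> int:
        return len(self.row_masks)


def next_branch(candidates: str, branch_mask: str) -> int:
    """Leftmost column with "1" in the selected row that has not been tried yet; -1 if none."""
    for i, (c, b) in enumerate(zip(candidates, branch_mask)):
        if c == "1" and b == "0":
            return i
    return -1


# Usage: the branching point D-E of step 3
stack = StackMemory()
regs = Registers("1011000000", "100000000", "03004240202", "000100000")
stack.push(regs)               # save the branching point
regs.cover.append("D")         # follow branch D ...
regs.row_mask = "1011000101"   # ... rows 8 and 10 are now covered as well
regs.col_mask = "100100000"    # ... and column D is removed
restored = stack.pop()         # backtrack to the branching point
print(restored.row_mask, restored.cover)                             # 1011000000 []
print("ABCDEFGHI"[next_branch("000110000", restored.branch_mask)])  # E
```

Separate working registers let the masks and counts change freely between push operations, while the branch mask tells the control unit which alternative (E after D) to take after backtracking.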

Figure 6. A backtrack process to the branching point G-H
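The corrections of the auxiliary values mentioned in III.2 and III.5 amount to two simple updates. Below is a hypothetical Python sketch (all names are ours), assuming, as noted for the auxiliary stack in step 3, that a zero field marks a removed row. Applied to matrix (1) after step 1 (rows 1, 3 and column A removed), it yields the counts 0 3 0 2 4 2 4 3 2 2 with minimum 2, which agrees with the first auxiliary-stack entry of fig. 4; it also reproduces the one-bit-per-column coding 000101100 of the covering D, G, F shown in fig. 5.

```python
from typing import List

COLUMNS = "ABCDEFGHI"
# Matrix (1), rows 1..10
MATRIX = [
    "100001011", "011000100", "001101001", "100110000", "001000111",
    "010001000", "010010101", "101100010", "000001010", "000100001",
]


def count_ones(matrix: List[str]) -> List[int]:
    """Initial contents of the auxiliary register: number of ones in each row."""
    return [row.count("1") for row in matrix]


def remove_row(aux: List[int], r: int) -> None:
    """Removing row r (0-based) resets its field; a zero field marks a removed row."""
    aux[r] = 0


def remove_column(matrix: List[str], aux: List[int], col: str) -> None:
    """Removing a column decrements the field of every present row with "1" in it.
    A present row that drops to 0 can no longer be covered (rule 4), so the
    search stops at that point anyway."""
    j = COLUMNS.index(col)
    for i, row in enumerate(matrix):
        if aux[i] > 0 and row[j] == "1":
            aux[i] -= 1


def minimal_count(aux: List[int]) -> int:
    """Minimal number of ones over the rows that are still present."""
    return min(v for v in aux if v > 0)


def cover_code(columns: List[str]) -> str:
    """One bit per column A..I, "1" if the column belongs to the covering."""
    return "".join("1" if c in columns else "0" for c in COLUMNS)


aux = count_ones(MATRIX)           # [4, 3, 4, 3, 4, 2, 4, 4, 2, 2]
remove_row(aux, 0)                 # row 1
remove_row(aux, 2)                 # row 3
remove_column(MATRIX, aux, "A")
print(aux, minimal_count(aux))     # [0, 3, 0, 2, 4, 2, 4, 3, 2, 2] 2
print(cover_code(["D", "G", "F"])) # 000101100
```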

Figure 4. Execution of search algorithm for solving the covering problem in the combinatorial processor

Figure 5. Intermediate values that are kept in stack memory, mask and general-purpose registers (including the coding 000101100 of the covering D, G, F in a general-purpose register)

Fig. 6 illustrates a backtrack process to the branching point G-H (see the bottom-left rectangle in fig. 2). Fig. 7 shows the basic operations of the top-level control algorithm, which has to be realized by the control unit depicted at the bottom of fig. 3. This algorithm activates the basic search algorithm Γ shown in fig. 1 with trivial modifications (see the right part of fig. 7).

Figure 7. The basic operations of top-level control algorithm (Begin; reset all the registers; calculate the number of ones in each row and store the result in the auxiliary register; run Γ (see fig. 1) with the following modifications: removing a column c decrements the fields of the auxiliary register that correspond to bits of the column c with values “1”, and removing a row r resets field r of the auxiliary register to “0”; End)

Let us now consider in a bit more detail the steps 1-8 depicted in circles in fig. 4.

Step 1: At the beginning, the mask registers for rows and columns contain all zeros (see fig. 7), which means that all rows and columns of the given matrix have to be considered. According to the reduction rules (see section II), the rows 1, 3 and the column A are removed from the initial matrix. As a result, the bits of the rows mask register for rows 1, 3 and the bit of the columns mask register for column A are changed from 0 to 1.

Step 2: In accordance with the selection rules, row 4 is chosen (this is the first row from the top that has the minimal number of ones).

Step 3: The first column from left to right that has "1" in row 4 is chosen. This is column D. Since this is a branching point to which it might be necessary to return in order to evaluate another branch (E in our case), all data needed to restore this branching point in future have to be pushed onto the stacks. These data are the following: the mask 1011000000 for rows; the mask 100000000 for columns; the numbers of ones in different rows 03004240202, where "0" indicates an absence of the row and the least significant value (2) shows the minimal number of ones; the mask 000100000 for the first branching point, marking the branch that does not need to be considered again; and the column D that has already been included in the covering. Note that the positions of zeros in the auxiliary stack (in the auxiliary register) correspond to the positions of ones in the stack of masks for rows (in the rows mask register). This makes it possible to replace two stacks (and two registers) with just one stack (and one register). The rows 8, 10 (because they have ones in the column D) and

IV. REPROGRAMMABLE FUNCTIONAL UNIT

There are many digital systems that require operations over binary and ternary vectors of arbitrary size. An example of such a system is shown in fig. 3. On the one hand, the number of feasible operations on such vectors is practically infinite. On the other hand, as a rule, each particular application (see, for example, section II and [24]) requires a very limited number of operations. Thus it is rational to construct a reusable circuit that performs a limited number of operations over the considered vectors, but where these operations can be customized for an unlimited (or at least very large) number of particular applications. Let us assume that the circuit has been designed in such a way that:
• at any given time it can perform any Q operations over vectors from a set O of allowed operations;
• the number Q is limited and Q