Distributed Software for Complex Boolean Tasks in Circuit Design

Bernd Steinbach, Christina Dorotska, Tochko Dobrev 1

Abstract - We present a concurrent and distributed application for solving complex computational tasks in parallel. The design process is demonstrated using an important CAD problem, which can be reduced to the calculation of the difference between two Boolean functions.

Keywords - Difference of Boolean Functions (DIF), Ordered Lists of Ternary Vectors (OTVL), Unified Modeling Language (UML), Virtual Parallel Computer (VPC), Java Remote Method Invocation (RMI).

I. INTRODUCTION

Heterogeneous computer networks are available at every university and company, and with appropriate hardware and software techniques these computers can be treated as a Virtual Parallel Computer (VPC) [1], characterized by high raw computing power at a low total price. By developing a concurrent and distributed application [2], it is possible to solve a task in parallel. Such an application is a program whose parts are distributed over the VPC nodes and can be used to calculate parts of an extremely large computational problem independently.

Suppose a VLSI circuit is to be designed. The complex internal Boolean functions must be realized by multilevel circuits. Each of the millions of logic gates can be synthesized using bi-decomposition. An OR-bi-decomposition (see Fig. 1) exists if and only if (1) holds [3]:

$$ f \wedge \overline{\min_{x_a}^{k} f} \wedge \overline{\min_{x_b}^{k} f} = 0 \qquad (1) $$

To check (1), two DIF-operations

$$ \mathrm{DIF}(f_1, f_2) = f_1 - f_2 = f_1 \wedge \overline{f_2} \qquad (2) $$

are necessary:

$$ \mathrm{DIF}(\mathrm{DIF}(f, \min_{x_a}^{k} f), \min_{x_b}^{k} f) = 0 \qquad (3) $$

Thus, an extremely efficient calculation of the DIF-operation is necessary for VLSI circuit design. This DIF-operation is studied in this paper.

Fig. 1. OR-bi-decomposition, f(xa, xb, xc) = g(xa, xc) ∨ h(xb, xc)

1 Bernd Steinbach, Christina Dorotska, Tochko Dobrev: Freiberg University of Mining and Technology, Akademiestr. 6, 09599 Freiberg, Germany. E-mail: {steinb, dorotsk}@informatik.tu-freiberg.de, [email protected]

There exists a very simple algorithm to calculate the difference of two Boolean functions. If each function is described by the list of Boolean vectors for which the function is equal to one, all pairs of vectors taken from the two lists must be compared. If the two Boolean vectors of a pair are equal, the selected vector has to be removed from the first list (see Fig. 2).
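As a minimal sketch of this simple pairwise procedure (not the authors' implementation), the following Python function compares every vector of the first list with the vectors of the second list and drops the matches. The function name `dif_bvl`, the comparison counter and the representation of Boolean vectors as strings of "0" and "1" are assumptions made for illustration.

```python
def dif_bvl(bvl1, bvl2):
    """Difference of two Boolean vector lists by pairwise comparison.

    Each list holds the vectors for which its function is equal to one.
    The result holds the vectors of bvl1 that do not occur in bvl2.
    """
    result = []
    comparisons = 0
    for v1 in bvl1:
        keep = True
        for v2 in bvl2:
            comparisons += 1
            if v1 == v2:      # equal pair: v1 is removed from the first list
                keep = False
                break
        if keep:
            result.append(v1)
    return result, comparisons


f1 = ["000", "011", "101", "111"]
f2 = ["011", "110", "111"]
remaining, count = dif_bvl(f1, f2)
print(remaining, count)
```

In the worst case the number of comparisons is the product of the two list lengths, which grows quickly because the lists themselves grow exponentially with the number of variables.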

The disadvantage of this simple algorithm is the large, time-consuming number of vector comparisons. Considering the large number of DIF-calculations in circuit design, it is necessary to use all possibilities to speed up the calculation of the difference.

Fig. 2. Difference of two Boolean functions, $f_1 \wedge \overline{f_2}$

Section 2 introduces improvements of the data structure and of the algorithms for the sequential calculation of the DIF-operation. Section 3 shows how the best sequential DIF-algorithm found can be executed in parallel and presents, with the help of the Unified Modeling Language (UML), an object-oriented, concurrent and distributed application for its computation. Finally, Section 4 summarizes the paper.

II. OPTIMAL SEQUENTIAL ALGORITHMS AND ORDERED LISTS OF TERNARY VECTORS

The first idea to speed up the DIF-algorithm is to use a list of ternary vectors (TVL) instead of a list of Boolean vectors (BVL) [4]. Both data structures contain the vectors for which the function has the value ONE. A binary vector consists of the elements ZERO and ONE. A ternary vector (TV) consists of the elements ZERO, ONE and STROKE, where a STROKE stands for either a ZERO or a ONE. Two binary vectors that differ in only one column may be combined into one ternary vector. A ternary vector that includes s STROKE-elements represents 2^s Boolean vectors and thereby reduces the exponential growth of the list of Boolean vectors. Two ternary vectors that have no Boolean vector in common are called orthogonal. This property holds if the two ternary vectors have a 0/1 combination in at least one column. In this paper only orthogonal TVLs are considered, which means that each pair of ternary vectors in a TVL is orthogonal.

When TVLs are used to represent the Boolean functions in the calculation of the difference DIF(f1,f2), each vector from the first TVL must be compared with each vector from the second TVL to find all non-orthogonal pairs of vectors. For such a pair we select all positions where the vector from TVL1 holds a STROKE and the vector from TVL2 holds a ZERO or a ONE. For each selected position we build a new vector that contains in this position the value opposite to the element of the second vector. This algorithm is described precisely in Fig. 3.

DIF(TVL1,TVL2)
  for all vectors tv1 from TVL1
    for all vectors tv2 from TVL2
      if tv1 is not orthogonal to tv2
        for all positions where tv1 consists of element "-" and tv2 of element "0" or "1"
          new_vect = tv1
          sign = tv2 holds in the current selected position element "1"
          if sign == true
            put in new_vect "0" in the selected position
            save new_vect
            put in tv1 "1" in the selected position
          else
            put in new_vect "1" in the selected position
            save new_vect
            put in tv1 "0" in the selected position
        delete tv1 from TVL1

Fig. 3. Sequential difference algorithm of two TVLs

The second idea to speed up the DIF-operation is to minimize the number of necessary comparisons of ternary vectors. To this end, we have developed an ordered model of classes and subclasses of ternary vectors. Classes are defined by a fixed number of ones and the subclasses of a class by a fixed number of strokes, so that a TVL is represented as an ordered series of separate TVLs (Sub_TVLs), and only vectors with the same number of ones #o and the same number of strokes #s belong to one Sub_TVL. More information on the TV-ordering can be found in [5]. This ordered structure is called Ordered List of Ternary Vectors (OTVL). With this ordering we do not have to compare as many vectors as with a representation of the Boolean function by a TVL or a BVL. It was proved in [5] that vectors from Sub_TVL(#o,#s) can share common information (be not orthogonal) only with vectors from those Sub_TVL(#o1,#s1) for which the formulas (4) and (5) are true:

$$ \#o \le \#o_1 + \#s_1 \qquad (4) $$

$$ \#o + \#s \ge \#o_1 \qquad (5) $$

To simplify the access to the Sub_TVLs, pointers to these Sub_TVLs are stored in the cells of a triangular matrix (see Fig. 4). The row index of this matrix corresponds to the number of ones in a Sub_TVL and the column index corresponds to its number of strokes.

ones \ strokes   0             1             2             3             4      5      6
0                NULL          NULL          NULL          Sub_TVL(0,3)  NULL   NULL   NULL
1                Sub_TVL(1,0)  Sub_TVL(1,1)  NULL          NULL          NULL   NULL
2                Sub_TVL(2,0)  Sub_TVL(2,1)  Sub_TVL(2,2)  NULL          NULL
3                Sub_TVL(3,0)  Sub_TVL(3,1)  NULL          NULL
4                Sub_TVL(4,0)  Sub_TVL(4,1)  Sub_TVL(4,2)
5                Sub_TVL(5,0)  NULL
6                NULL

Fig. 4. Matrix structure

If we represent the first Boolean function by OTVL1 and the second by OTVL2, we can calculate the difference of each Sub_TVL from OTVL1 and the whole OTVL2 independently. This is possible because we consider only orthogonal lists, so that joining the results again gives an orthogonal OTVL. In our algorithm shown in Fig. 5, each Sub_TVL from OTVL1 is compared with the corresponding selected subclasses in OTVL2 (see formulas (4) and (5)) using the described algorithm DIF(TVL1,TVL2). The selected classes that have to be checked for Sub_TVL(2,1) from OTVL1 are shown in Fig. 6.

DIF(OTVL1,OTVL2)
  for all Sub_TVL1 from OTVL1
    for selected Sub_TVL2 from OTVL2
      new_Sub_TVL = DIF(Sub_TVL1,Sub_TVL2)
    new_OTVL = new_OTVL + new_Sub_TVL

Fig. 5. Sequential algorithm of difference of two OTVLs

Fig. 6. Selected classes from OTVL2 that have to be checked with Sub_TVL(2,1) from OTVL1 (matrix of number of ones, rows 0 to 6, by number of strokes, columns 0 to 6)
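The sequential scheme of this section can be sketched in Python as follows. This is an illustration under stated assumptions, not the XBOOLE-based implementation of the paper: ternary vectors are strings over "0", "1" and "-", an OTVL is a dictionary keyed by (#o, #s), and all function names are hypothetical. New vectors are re-checked by folding over the vectors of the second list, which is a choice of this sketch.

```python
from collections import defaultdict


def orthogonal(tv1, tv2):
    """True if the vectors have a 0/1 combination in at least one column."""
    return any(a != b and a != "-" and b != "-" for a, b in zip(tv1, tv2))


def dif_tv(tv1, tv2):
    """Pieces of tv1 that are not covered by tv2 (inner part of Fig. 3)."""
    if orthogonal(tv1, tv2):
        return [tv1]
    rest, pieces = list(tv1), []
    for pos, (a, b) in enumerate(zip(tv1, tv2)):
        if a == "-" and b != "-":
            new_vect = rest.copy()
            new_vect[pos] = "0" if b == "1" else "1"   # opposite value
            pieces.append("".join(new_vect))
            rest[pos] = b                              # narrow tv1 towards tv2
    return pieces          # what is left of tv1 lies inside tv2 and is dropped


def dif_tvl(tvl1, tvl2):
    """DIF(TVL1, TVL2): subtract every tv2 from all current pieces."""
    result = list(tvl1)
    for tv2 in tvl2:
        result = [piece for tv1 in result for piece in dif_tv(tv1, tv2)]
    return result


def key(tv):
    """Subclass key (#o, #s): number of ones and number of strokes."""
    return tv.count("1"), tv.count("-")


def build_otvl(tvl):
    """Group a TVL into Sub_TVLs indexed by (#o, #s)."""
    otvl = defaultdict(list)
    for tv in tvl:
        otvl[key(tv)].append(tv)
    return dict(otvl)


def selected(o, s, otvl2):
    """Sub_TVLs of OTVL2 that satisfy formulas (4) and (5)."""
    return [sub for (o1, s1), sub in otvl2.items()
            if o <= o1 + s1 and o + s >= o1]


def dif_otvl(otvl1, otvl2):
    """DIF(OTVL1, OTVL2) in the manner of Fig. 5."""
    new_tvl = []
    for (o, s), sub_tvl1 in otvl1.items():
        part = sub_tvl1
        for sub_tvl2 in selected(o, s, otvl2):
            part = dif_tvl(part, sub_tvl2)
        new_tvl.extend(part)
    return build_otvl(new_tvl)


otvl1 = build_otvl(["1--", "00-"])
otvl2 = build_otvl(["11-", "000"])
print(dif_otvl(otvl1, otvl2))
```

In this example Sub_TVL(1,2) of the first list is compared only with Sub_TVL(2,1) of the second list, and Sub_TVL(0,1) only with Sub_TVL(0,0); the other combinations are excluded by (4) or (5). Skipping them is safe for the new pieces as well, because every piece is contained in the vector it was cut from.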

III. PARALLEL ALGORITHM OF DIFFERENCE OF BOOLEAN FUNCTIONS AND THEIR MODEL BY UML

Following the presented approach, we can organize the calculation of the difference between the two OTVLs, OTVL1 and OTVL2, in parallel. The idea is to use the computational power of arbitrary computers in a network, called Workers, where each one receives a part of the whole task from a machine called Host, calculates the difference on its data and sends the partial result back to the host. As described in the previous section, the structure of OTVL1 can be decomposed into n independent chunks, each of which is equal to a Sub_TVL from OTVL1. The decomposition is done on the computer that plays the role of the host in our scenario and thereby creates a workpile of the chunks.

DIF(OTVL1,OTVL2)
  for all workers
    send OTVL2 to the worker
  while there are unchecked Sub_TVL1 on the host
    send next unchecked chunk Sub_TVL1 to an idle worker
      for selected Sub_TVL2 from OTVL2
        new_Sub_TVL = DIF(Sub_TVL1,Sub_TVL2)
      send result to the host
    new_OTVL = new_OTVL + received result from worker

Fig. 7. Parallel algorithm of difference of two OTVLs
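The host/worker scheme of Fig. 7 can be illustrated with threads on one machine standing in for remote workers. This is only a sketch: Python's `queue.Queue` and `threading.Thread` replace the Java classes and the network transport of the paper, and the names `run_host`, `worker` and the stand-in task function `dif_chunk` are hypothetical.

```python
import queue
import threading


def worker(workpile, results, otvl2, dif_chunk):
    """Take chunks from the workpile until it is empty and report results."""
    while True:
        try:
            chunk = workpile.get_nowait()     # next unchecked chunk
        except queue.Empty:
            return                            # no work left for this worker
        results.put(dif_chunk(chunk, otvl2))  # send partial result to the host


def run_host(chunks, otvl2, dif_chunk, n_workers=3):
    """Host side: fill the workpile, start the workers, join the results."""
    workpile, results = queue.Queue(), queue.Queue()
    for chunk in chunks:
        workpile.put(chunk)
    # Passing otvl2 to every worker plays the role of the broadcast.
    threads = [
        threading.Thread(target=worker,
                         args=(workpile, results, otvl2, dif_chunk))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    new_otvl = []
    while not results.empty():
        new_otvl.extend(results.get())        # new_OTVL = new_OTVL + result
    return new_otvl


def dif_chunk(chunk, otvl2):
    """Stand-in task: plain removal of vectors that also occur in otvl2."""
    return [v for v in chunk if v not in otvl2]


chunks = [["000"], ["001", "010"], ["011", "101"], ["111"]]
otvl2 = {"010", "111"}
print(sorted(run_host(chunks, otvl2, dif_chunk)))
```

Because the chunks are independent, the order in which the workers return their partial results does not matter; the example sorts the joined result only to make the output reproducible.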

To prepare the parallel calculation, the host broadcasts OTVL2 to all workers. After that, a worker receives a Sub_TVL from the workpile and calculates the difference between this Sub_TVL and OTVL2. The result is sent back to the host and the worker waits for a new partial task. The formal algorithm of this approach is shown in Fig. 7.

All vectors from OTVL1 that are not orthogonal to some vectors in OTVL2 have to be decomposed into new vectors that do not contain common information. These new vectors are stored at the end of the appropriate new Sub_TVL according to their numbers of ones and strokes. Because each new Sub_TVL has its own selected Sub_TVLs from OTVL2, these may contain vectors that are not orthogonal to the new vectors of the new Sub_TVL. Therefore, we have developed an algorithm (see Fig. 8) that checks whether all new vectors in the new Sub_TVLs are orthogonal to the vectors of the appropriate selected Sub_TVLs from OTVL2.

while the worker is waiting for next Sub_TVL from workpile
  for special new_Sub_TVLs
    for all selected Sub_TVL from OTVL2
      DIF(new_Sub_TVL, new_Sub_TVL2)

Fig. 8. Check new vectors algorithm

A worker that has finished its partial calculation (first order task, see Fig. 7) and waits for a new task from the host starts this algorithm (second order task, see Fig. 8) and works on it until it has received a new chunk from the workpile. After the first order task is finished, the second order task may continue. The main purpose is that no worker is idle at any interval of the total computation time. Thus, the whole parallel approach achieves good load balancing.

To use the resources of many platforms for the parallel calculation with the strategy described above, we developed a concurrent and distributed application in the object-oriented, hardware-independent language Java [6]. The architecture of our system can be presented with the help of a UML class diagram (see Fig. 9). The class OTVLHost implements the host. When the host starts, it first decomposes the structure of OTVL1 into independent chunks, creates the workpile, and starts the workers by creating as many threads (instances of the class WorkerThread) as there are available machines; finally, it broadcasts OTVL2 to all workers. The workpile is implemented in the class Workpile, which has two methods, one for getting a chunk from it and one for putting a chunk into it. To avoid a deadlock situation in which many workers access the same part of the workpile simultaneously, we need synchronization mechanisms. The synchronization is organized by semaphores, which are implemented in the class Semaphore.

To obtain the input data for the parallel computation, such as the type, the number of variables and the vectors of an OTVL, and to fill the workpile with the Sub_TVLs from OTVL1, the methods of the logic design library XBOOLE [4] have to be used. This library is written in the C programming language, and to access it from the Java run-time level we use the Java Native Interface (JNI) technology [7]. Therefore, the methods of the class HostNativeMethods have to be declared as native, which gives us the opportunity to implement them in C++ using some methods from XBOOLE that are important for this task. There must be only one host object, which controls and operates on all the data needed for the parallel computation. Thus, it is possible that the class OTVLHost extends the class HostNativeMethods.
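A workpile with one method for getting and one for putting a chunk, guarded by a semaphore, might look as follows. The sketch uses Python's `threading.Semaphore` in place of the paper's Java class Semaphore; the method names `get` and `put`, the helper `drain` and the convention of returning `None` for an empty workpile are assumptions of this example.

```python
import threading


class Workpile:
    """Pile of chunks whose get/put methods are guarded by a semaphore."""

    def __init__(self, chunks=()):
        self._chunks = list(chunks)
        self._mutex = threading.Semaphore(1)   # binary semaphore: one accessor

    def put(self, chunk):
        with self._mutex:
            self._chunks.append(chunk)

    def get(self):
        """Return the next chunk, or None when the workpile is empty."""
        with self._mutex:
            return self._chunks.pop() if self._chunks else None


def drain(workpile, taken):
    """Worker loop: take chunks until the workpile is empty."""
    while (chunk := workpile.get()) is not None:
        taken.append(chunk)


pile = Workpile(range(1000))
taken_lists = [[] for _ in range(4)]
threads = [threading.Thread(target=drain, args=(pile, lst)) for lst in taken_lists]
for t in threads:
    t.start()
for t in threads:
    t.join()
all_taken = sorted(c for lst in taken_lists for c in lst)
print(all_taken == list(range(1000)))
```

The final check confirms that every chunk was handed out exactly once, regardless of how the four threads interleave.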

Fig. 9. A UML class diagram of the concurrent and distributed system

A thread in the application is implemented by the class WorkerThread. It extends the class Thread (java.lang.Thread) from the standard Java library, activates the workers, transfers the data between them and the host, and specifies the rules for how a worker accesses the workpile.

A worker is implemented on the basis of the Remote Method Invocation (RMI) technology [8], which is integrated in the Java programming language. It allows an object running in one Java Virtual Machine (JVM) on some computer to invoke methods of another object, called a remote object, running in a different JVM on a different computer. Every class that has remote objects as instances has to implement, directly or indirectly, the basic Remote interface (java.rmi.Remote) and extends the class UnicastRemoteObject (java.rmi.server.UnicastRemoteObject), both from the standard Java library. A remote object is represented by its Stub and Skel components. When a method is invoked on it, its Stub is downloaded to the caller machine, called Host in our scenario, and acts as a local representative or proxy for the object (see Fig. 10). The Stub is responsible, first, for carrying out the method call on the caller machine, then for building the connection to the Skel component and, finally, for transmitting the parameters, such as variables and Java objects, to the other machine, called Worker. The Skel takes the parameters and runs the method on the Worker (server). This creates the illusion that everything happens locally on the Host (client).
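The division of labour between Stub and Skel can be illustrated by an in-process analogy. The sketch below is not Java RMI: Python's `pickle` replaces Java serialization, a direct function call replaces the network connection, and the class names `Stub`, `Skel` and `DifWorker` as well as the method `dif` are hypothetical.

```python
import pickle


class DifWorker:
    """Hypothetical remote object living on the worker side."""

    def dif(self, chunk, other):
        return [v for v in chunk if v not in other]


class Skel:
    """Worker side: unpacks a request and runs the method on the real object."""

    def __init__(self, target):
        self._target = target

    def handle(self, request: bytes) -> bytes:
        method, args = pickle.loads(request)
        result = getattr(self._target, method)(*args)
        return pickle.dumps(result)


class Stub:
    """Host side: looks like the remote object but only forwards calls."""

    def __init__(self, transport):
        self._transport = transport          # callable: bytes -> bytes

    def __getattr__(self, method):
        def call(*args):
            request = pickle.dumps((method, args))         # decompose parameters
            return pickle.loads(self._transport(request))  # rebuild the result
        return call


skel = Skel(DifWorker())
stub = Stub(skel.handle)     # a network connection would replace this direct call
print(stub.dif(["00", "01", "10"], ["01"]))
```

The caller only ever touches `stub`, so the call reads like a local method invocation although the parameters and the result cross the Stub/Skel boundary as byte strings.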

Fig. 10. The RMI model: the Stub on the Host and the Skel on the Worker represent the remote object with its methods Method1( ), Method2( ), ..., Methodn( )

To make remote objects running in a JVM on some computer accessible in the network, references to them have to be bound in a local naming service provided by RMI, called rmiregistry. By default, there must be such a service on every machine that has remote objects. Based on these basic RMI features, we designed the workers in our application as remote objects. To avoid the disadvantage of having a local registry on every machine, a service for centralized object management, called central registry, was developed. More detailed information about it can be found in [2].

A worker is implemented by the class OTVLWorker. As RMI requires, it implements the connection to the remote object indirectly, through the interface OTVLWorkerInterface, which extends the interface Remote. Furthermore, the class OTVLWorker extends the class UnicastRemoteObject, whose methods define the mechanisms by which the data are transmitted between the host and the workers. It is important to remark that a thread on the host uses the interface OTVLWorkerInterface (see Fig. 9) to obtain a worker object by taking a reference to it from the central registry. To prevent data from being corrupted during a remote method invocation, an object has to be decomposed before it is sent and rebuilt after it has been received; for this purpose the class OTVLWorker has to implement the empty interface Serializable (java.io.Serializable) from the Java standard library. When a thread running on the host activates a worker object, the worker's Stub component is downloaded to the host within the thread and acts as a local representative of this object. Thus, it is possible to transmit data to the remote machine, which computes the difference and sends the result back to the host.

The methods of the library XBOOLE are used for the essential computation on the Boolean data on a worker. Therefore, we designed the class WorkerNativeMethods, whose methods are declared as native and implemented in the C++ programming language. It is well known that Java does not permit multiple inheritance. Therefore, an aggregation between the classes OTVLWorker and WorkerNativeMethods has to be implemented (see Fig. 9).

IV. CONCLUSIONS

In this paper we have described an efficient approach for the parallel computation of the difference between two Boolean functions with a large number of variables using Ordered Lists of Ternary Vectors (OTVL). We have shown that the structure of the first OTVL can be divided into n independent parts, which are organized into a workpile and can be handled in parallel in order to reduce the total computing time. For the practical computation of the Boolean task we developed a concurrent and distributed application that realizes a Host/Workers strategy and implemented it in the platform-independent programming language Java, which gives the effect of homogeneity in a heterogeneous network environment treated as a Virtual Parallel Computer (VPC). By dividing the worker's task into a first and a second order task, the idle time of the first order task during the data transfer of the next chunk can be filled with the second order task, which calculates the difference between the new_Sub_TVL and OTVL2. The practical results of the parallel computation of the difference of OTVLs will be presented in our talk at CADSM2001.

REFERENCES

[1] P. Boulet, J. Dongarra, F. Rastello, Y. Robert, and F. Vivien, "Algorithmic Issues on Heterogeneous Computing," Parallel Processing Letters, vol. 9, no. 2, pp. 197-213, Apr. 1999.
[2] B. Steinbach and T. Dobrev, "A Concurrent and Distributed Model for Complex Boolean Calculations," in 4th Int. Workshop on Boolean Problems, Univ. Freiberg, Germany, Sept. 2000, pp. 183-189.
[3] D. Bochmann, F. Dresig, and B. Steinbach, "A new Decomposition Method for Multilevel Circuit Design," in European Conference on Design Automation, Amsterdam, The Netherlands, 1991, pp. 374-377.
[4] D. Bochmann and B. Steinbach, Logikentwurf mit XBOOLE. Berlin: Verlag Technik, 1991.
[5] B. Steinbach and Ch. Dorotska, "Orthogonal Block Building using Ordered Lists of Ternary Vectors," in 4th Int. Workshop on Boolean Problems, Univ. Freiberg, Germany, Sept. 2000, pp. 125-133.
[6] J. Gosling, B. Joy, G. Steele, and G. Bracha, The Java(TM) Language Specification, Second Edition. Massachusetts, USA: Addison-Wesley Publishing Company, 1999.
[7] S. Liang, The Java(TM) Native Interface: Programmer's Guide and Specification. Massachusetts, USA: Addison-Wesley Publishing Company, 1999.
[8] T. B. Downing, The Java(TM) RMI: Remote Method Invocation. Foster City, CA: IDG Books Worldwide Inc., 1998.