Compilation from Matlab to Process Networks - CiteSeerX

4 downloads 2354 Views 87KB Size Report
graph (PRDG) which is the model from which the genera- ... ing no tokens, the process stalls completely until tokens be- .... fb. (!(c)). ,! :::fx. (!(c)). ,! ::: (7). When a function fires, it consumes data from the read ... with random access behavior.
CASES’99, OCTOBER 1-3, WASHINGTON 1999

1

Compilation from Matlab to Process Networks Edwin Rypkema and Ed F. Deprettere Bart Kienhuis Delft University of Technology Univeriteit of California 2628 CD Delft, The Netherlands Berkeley, CA 94720, USA frypkema,[email protected] [email protected] Abstract— Novel high-performance, domain specific, embedded architectures are often composed of a microprocessor, some memory, and a number of dedicated coprocessors. Onto these embedded architectures, applications will be executed that belong to the domain of multi-media processing, mobile communication, and adaptive array processing. In general, these applications are written using an imperative model of computation most commonly C or Matlab. A better specification format would be to use an inherent parallel model of computation like Process Networks. This paper presents a three-step approach to automatically transform a class of Matlab specifications into a process network specification. Keywords—Process Networks, Matlab, Compilation, Embedded Architectures.

I. I NTRODUCTION Novel high-performance, domain specific, embedded architectures are often composed of a microprocessor, some memory, and a number of dedicated coprocessors. Examples of these new architectures are for example the Prophid architecture [1], The Jacobium architecture [2], and the Pleiades architecture [3]. These architectures are used in respectively, video consumer appliances, adaptive radar processing, and mobile communication devices that are multi-standard. All these architectures have in common that they exploit parallelism using instruction level parallelism offered by the microprocessor, but also coarsegrained parallelism offered by the coprocessors. The coarse-grained parallelism is vital to achieve the performance constraints set for the embedded architectures. Onto these embedded architectures, applications will be executed that belong to the domain of multi-media processing, mobile communication, and adaptive array processing. In general, these applications are written using an imperative model of computation most commonly C or Matlab. Although this model of computation is well accepted to specify applications, it does not reveal parallelism due to its inherent sequential nature, even though many applications contain a lot of parallelism. Compilers exist that are able to extract instruction level parallelism from the original specifications at a very fine level of granularity. They

are, however, unable to exploit coarse-grained parallelism offered by the coprocessors of the architectures. A better specification format would be to use an inherent parallel model of computation like Process Networks [4], [5]. This model of computation describes an application as a network of concurrent executing processes. It describes parallelism naturally from the very fine-grained to the very coarse-grained, and each process in a process network is described using an imperative language. Consequently, a process could be mapped onto a coprocessor or onto a microprocessor exploiting all available techniques available to execute as fast as possible on the microprocessor. It would be nice to have a compiler that could extract the available parallelism from an application described in Matlab or C automatically and convert it into a process network description. This paper focuses on the steps and problems involved in the transformation of an imperative specification into a process network specification. We confine ourselves to the class of affine nested loop programs (NLP) [6]. Many applications of interest for the embedded architectures are described fully or partly as affine NLPs. Section II deals with the specific problems encountered when going from an imperative specification to a process network specification. Section III is about the way we decompose the transformation task into three smaller tasks and show that for some of them, we can use existing tools. Section IV deals with the polyhedral reduced dependence graph (PRDG) which is the model from which the generation of a process network starts. Section V gives the structure of the nodes in the process network description and explains how to specify processes. In Section VI, we describe the tasks and problems involved in the generation of processes from a PRDG. Section VII gives conclusions and directions for future work. II. P ROBLEM S TATEMENT Applications are specified using a particular Model of Computation (MoC), which is a formal representation of the operational semantics of a network of functional blocks describing computations. A language like Matlab has an imperative MoC whereas a process network has a process

CASES’99, OCTOBER 1-3, WASHINGTON 1999

network MoC. The problem we address in this paper is the steps involved to automatically transform an imperative specification written in Matlab in a specification in terms of a process network. As process network model, we use the Stream-based Function Model (SBF) [7]. This model describes an application as a network of SBF objects that are interconnected by channels. A channel is an unbounded FIFO queue that can contain an infinite sequence of tokens, i.e. a stream. Processes can write to a channel unconditionally, but can only read from the channel when the queue is non-empty. An SBF object describes a process, which maps one or more input streams into one or more output streams by executing a sequence of statements. Some statements perform a read on a channel and some statements perform a write on a channel. When reading from a channel containing no tokens, the process stalls completely until tokens become available again, implementing a blocking-read. A write on a channel can always proceed, implementing a non-blocking write. An SBF object describes a process in terms of a controller, a state, and a set of functions as shown in Figure 4. An enabled function consumes a number of tokens from input channels, performs its function, and writes tokens to output channels. By repeatedly enabling functions, an SBF object operates on streams. The controller enables the functions of the set in a particular sequence and uses data stored in the state of the SBF object to determine which function it must enable next. An SBF object differs from the well-known Kahn Process Model [4], in that it describes a process in a well-defined structured way. In the next section, we describe our approach to transform a Matlab specification to a specification in the SBF Model. III. S OLUTION

APPROACH

Our approach to transform a Matlab specification into a specification that uses the SBF Model, is to divide this transformation into a number of steps as shown in Figure 1. In this figure, a box represents a result and an ellipsoid represents an action. Given a Matlab specification, we convert the specification into a single assignment code (SAC) specification. The SAC describes all the parallelism available in the original Matlab specification. Next, we derive from the SAC a polyhedral reduced dependence graph (PRDG) [6] specification. From this PRDG, we go to the SBF Model. We derive from the PRDG the network description, called a SBF network, and the individual SBF objects. In the path from Matlab to the PRDG, we used two tools that we have already available, namely MatParser and DgParser, which are both described in [6]. MatParser is a

2

Step 3 in more detail

Matlab

Step 1.

Step 2.

MatParser

reduced dependence graph

single assignment code

domain scanning

DgParser linearization polyhedral reduced dependence graph domain data partitioning reconstruction

SBF network generation

SBF objects generation SBF objects

Step 3. SBF network

SBF objects

Fig. 1. Solution approach to transform a Matlab specification into a process network specification.

compiler that finds all the parallelism available in the Matlab code using a very aggressive data-dependence analysis technique based on integer linear programming. Consequently, MatParser handles only programs that belong to the class of Nested Loop Programs. The output of MatParser is a SAC version of the algorithm in terms of a set of consecutive affine nested loop programs [8]. In this paper, we outline the procedures for generating the SBF network and the SBF objects from a PRDG description. The generation of the SBF network is straightforward. The generation of the SBF objects, however, is further decompose into domain scanning, linearization, and domain reconstruction. We discuss these steps in more detail in section VI. In the next two sections, we first look in more detail what a PRDG is as well as what the SBF model is. IV. P OLYHEDRAL R EDUCED D EPENDENCE G RAPH The output of the tool DgParser is the specification of a polyhedral reduced dependence graph (PRDG). A polyhedral reduced dependence graph is a compact representation of a dependence graph (DG) using parameterized polyhedra. A polyhedral reduced dependence graph (PRDG) is a directed graph

G = (V; E )

(1)

where V is a set of node domains

V

=

fv ; v ;    ; v V g; 1

2

j

j

(2)

and where E is a set of edge domains

E = fe1; e2;    ; e E g: j

j

(3)

CASES’99, OCTOBER 1-3, WASHINGTON 1999

3

Figure 2 shows an PRDG where the cardinality of the set of node domains, jV j, is 5, and the cardinality of the set of edge domains,jE j, is 12. e

a

A

C k

b f

g

B. Edge domain An edge domain is the ordered pair (vi ; vj ) of node domains together with the ordered pair (pi ; pj ) of port domains where pi is the OPD of vi and pj the IPD of vj . This ordered pair corresponds with a data dependency in a DG. The data dependency can be expressed using an affine mapping M . This affine mapping is the same for all ports belonging to the same IPD. C. Example

E

c

To illustrates the notion of node and port domains, we show in Figure 3 a node that represents node C in Figure 2. d B D The figure shows the node domain (a), its iteration domain h with the iterators i and j (b), its port domains (c)-(f), and i its view as it appears in the PRDG (g). Thus the four port j domains (c)-(f) partitions the node domain (a) of node C. Fig. 2. An example of a polyhedral reduced dependence graph. For illustration purpose, we assume that the polytope of the node domain has a triangular shape. The next subsections we explain the notions of node doipd main and edge domain and give an example. opd l

1

A. Node domain

ipd2

A node domain is the collection of a linearly bounded lattice (LBL) [9], a function, and a set of port domains. An LBL is a (integral) polytope together with an affine mapping. A polytope P is a set of integral points given by

I

P = fx 2 ZnjAx  bg

(a)

C

(c)

(e)

i 0

1

2

3

N

(g) opd 3

opd4

1 2

j

(4)

ipd 3

N

3

ipd4 (b)

(d)

Iteration Domain

(f)

Port Domains

(5)

All matrices and vectors have proper dimensions here. In this paper we consider the case where L is the identity matrix and is the all-zero vector, both of proper dimension 1 . The polytope defines a set P of points, where each point corresponds to a node in the original DG and the set of these points is called the iteration space or iteration domain of the node domain. With every point inside the iteration domain, the same function is associated. A function has a number of input ports and output ports. An input port corresponds with an argument of the function; an output port corresponds with a value the function returns. Input ports corresponding to the same argument of the function are grouped together and form an input port domain (IPD); they have in common that they receive data from the same output port domain (OPD).

m

1 This simplification can be extended easily to the more general case of -polyhedra [10], [11].

Z

opd 2

Node Domain

x

and an LBL, , is the set of points defined by a polytope P and an affine map

I = fI jI = Lx + m ^ x 2 P g

1

Fig. 3. A node domain and its corresponding port domains.

In (c) and (d), we show IPDs and in (e) and (f), we show OPDs. In (c) we identify two IPD functions, ipd1(i; j ) and ipd2(i; j ). In (e) we identify two OPD functions, opd1(i; j ) and opd2(i; j ). The figure shows one dependency between opd1 of port domain (e) and ipd1 of port domain (c). The dependencies in the PRDG are expressed in terms of an affine function M (i; j ) between the different IPDs and OPDs. From the PRDG, we have to construct the network of SBF objects and the SBF objects themselves. Currently, the network of SBF objects has the same topology as the PRDG, i.e., each node in the PRDG is mapped onto a single SBF object. V. T HE SBF M ODEL The SBF Model describes an application as a network of SBF objects that are interconnected by channels. Each

CASES’99, OCTOBER 1-3, WASHINGTON 1999

4

SBF object describes a process in terms of a controller, a state, and a set of functions, as illustrated in Figure 4.

state read ports

f1 f2

write ports

...

Program 5.1:

enable signals

OBJECT

out1; out2;    ; outJ ) = f (in1; in2;    ; inI )

The set of functions defines the function repertoire F = ff1; f2;    ; fjF jg. The state of the object consists of control state c and data state d. The controller operates on the control state and defines two functions, i.e. the transition function ! , and the binding function  which are defined by

forall values outj , f () returns (j = 1; 2;    ; J ) forall output port domains opd` , (` = 1; 2;   ; L) if c 2 opd` , bind value outj to proper output

c ! !(c) VI. D ERIVING

(6)

where C is the space of all possible values of c 2. The transition function ! determines the new state s0 from the current state s. The binding function  determines what function has to be enabled for the current state s and exactly one function is associated with a state. Enabling of a function is called a firing. When a process executes, a sequence of firings will occur as given in equation 7. !(c)) !(c)) !(c)) (! (c)) finit (,! fa (,! fb (,! : : :fx ,! :::

SBF

(

Fig. 4. An SBF object.

!:C!C  : C ! F;

PSEUDO CODE FOR

forall arguments ini of f () (i = 1; 2;    ; I ) forall input port domains ipdk , (k = 1; 2;    ; K ) if c 2 ipdk , bind argument ini to proper input

f|F| controller

and output binding functions as a sequence of nested conditional constructs in terms of the control state. If all conditions hold, a read variable is bounded to the argument of the function. Now, let I be the number of arguments of f () and J the return values, and let K be the number of input port domains and let L be the number of output port domains, we can describe an SBF object as follows:

(7)

When a function fires, it consumes data from the read ports, from the state, or from both, and it produces data on the write ports, on the state, or on both. Each function should know where to get its input data from and where to send its output data. This leads to so-called function variants, which are functions with the same functionality but bound differently to read and write ports, and state. To avoid an explosion of variants, we define a single function invocation and define a input binding function and output binding function. The input binding function binds the function arguments to inputs (state/read ports) and the output binding function binds the values that the function returns to outputs (state/write ports). We specify the input 2 Because we look at NLP programs only, ! is a function in terms of the control space c only, and not from the combined control and data space [7].

AN

SBF

OBJECT FROM A

PRDG

As shown in Figure 1, we divide the generation of an SBF object from a PRDG description into three different steps, which we now discuss in more detail. A. Domain Scanning For each SBF object, we need to derive it’s transition function ! . For now, we construct ! such that it describes the lexicographical order imposed by the original nestedloops, a process we call domain scanning. The orignial nested-loops define for each node in the PRDG a polytope. We derive ! from this polytop by enumerating its points. A problem is that the nested-loops possiblely describes points outside the iteration space of a node, although when evaluated in Matlab produces the correct iteration space. To find ! , we rely on the Fourier-Motzkin algorithm [12] to obtain a system of constraints describing the polytope set-up by the nested-loops, but also to exclude points that are outside the iteration space. Note that we follow the lexicographical order, but we could have selected other orderings. B. Linearization The original Matlab specification can contain higherdimensional arrays representing vectors and matrices. In the process network, however, tokens are transported via channels. By indexing the tokens transported over the channel, the stream of indexed tokens can be viewed as

CASES’99, OCTOBER 1-3, WASHINGTON 1999

5

a one-dimensional array. The order in which a consuming process reads token from a channel is the same as the order in which tokens are written on the channel by the producing process. Depending on the data dependencies and the domain scanning, expressed by ! , the read order may or may not be the order in which the tokens are produced. In general, the tokens read from a channel will need to be reordered. We perform the reordering by using a data structure in the SBF object, i.e. the state, that provides us with random access behavior. The procedure of mapping higher dimensional arrays onto one-dimensional arrays and the out-of-order address generation is linearization.

VII. C ONCLUSIONS

AND FUTURE WORK

In this paper, we presented a three-step approach to automatically transform a Matlab specification into a process network specification in terms of a network of SBF objects. The process network model of computation fits better with modern embedded architectures using coprocessors because it describes coarse-grained parallelism in applications. We explained the SBF model as well as what a polyhedral reduced dependence graphs (PRDGs) is. Furthermore, we partitioned the transformation into three steps. For two of the steps, we could relay on existing tools. We elaboThe linearization method relies on methods to count the rated on the third step, which encompasses the specificanumber of integral points contained by a convex poly- tion from a PRDG of the individual SBF objects and the tope [13], [14], [15]. A solution to a counting problem network of SBF objects. is given by a so-called Ehrhart Polynomial. Using such The status of this work is that we implemented in softEhrhard polynomial, we derive a read polynomial. By ware the methods to scan the domains and to make the outevaluating this polynomial in the order tokens are con- put port domains (OPDs) explicit. The result of the compisumed, we obtain the the proper address where to read a lation is a C++ definition for each SBF object. We are still token from. working on the linearization problem. For simple Matlab programs, we already showed that we can automatically compile it, using the trajectory illustrated in Figure 1. A C. Domain Reconstruction simple simulator is available that can simulate the derived SBF objects and network [7]. However, we also working In Program 5.1, we use the input and output port domains on a port to the PN-domain in Ptolemy II [17]. This allows to determine where a function should read from and write us to combine the derived process network description with to. As such, an explicit specification of the IPDs and OPDs other domains, enabling the description and simulation of is required for the generation of the SBF objects. The probmore complex systems. lem is that MatParser generates a SAC description in which The presented work contains subjects for further reonly the IPDs are explicitly specified. Hence, we need to search. One subject could be the use of more complex comreconstruct the OPDs to make them explicit. munication structures between the nodes of the PRDG. For Making the output port domains explicit is illustrated example, one could think of sharing of channels. Another in Figure 5. It shows two communicating node domains subject for further research is to define operations that operNDp and NDc. The tokens produced by port domain Pp of ate on the PRDG, like index transformations, localization, node domain NDp are to be consumed by port domain Pc partitioning, scheduling, and memory optimizations. Usof node domain NDc, as describe by the data-dependency ing these operations, we could optimize a network of SBF with mapping M . Port domain Pp is an OPD and port do- objects in terms of some metric like throughput, memory main Pc is an IPD. To make Pp explicit, we should apply use or computation complexity. M (), which is also derived by MatParser, to IPD Pp, which For more information about the work described in this is a operation on -polyhedra [16]. paper as well as references to the tools used, please have a look at http://www.gigascale.org/compaan

Z

M() Pp NDP

opd

R EFERENCES ipd

[1]

Pc NDC

[2]

Fig. 5. Making the output port domain explicit. [3]

Jeroen A.J. Leijten, Jef L. van Meerbergen, Adwin H. Timmer, and Jochen A.G. Jess, “Prohid, a data-driven multi-processor architecture for high-performance dsp,” in Proc. ED&TC, Mar. 1720 1997. Edwin Rijpkema, Ed F. Deprettere, and Gerben Hekstra, “A strategy for determining a jacobi specific dataflow processor,” in Proceedings ASAP’97 conference.July 1997, IEEE Computer Society Press. Arthur Abnous and Rabaey Jan, “Ultra-low-power domain-

CASES’99, OCTOBER 1-3, WASHINGTON 1999

[4]

[5]

[6] [7]

[8]

[9]

[10]

[11] [12] [13]

[14]

[15]

[16]

[17]

specific multimedia processors,” in VLSI Signal Processing, IX, 1996, pp. 461–470. Gilles Kahn, “The semantics of a simple language for parallel programming,” in Proc. of the IFIP Congress 74. 1974, NorthHolland Publishing Co. Edward A. Lee and Thomas M. Parks, “Dataflow process networks,” Proceedings of the IEEE, vol. 83, no. 5, pp. 773–799, May 1995. Peter Held, Functional Design of Data-Flow Networks, Ph.D. thesis, Dept. EE, Delft University of Technology, May 1996. A.C.J. Kienhuis, Design Space Exploration of Stream-based Dataflow Architectures: Methods and Tools, Ph.D. thesis, Delft University of Technology, Jan. 1999. Ed Deprettere, Peter Held, and Paul Wielage, “Model and methods for regular array design,” Int. J. of High Speed Electronics, Special issue on Massively Parallel Computing-Part II, vol. 4, no. 2, pp. 133–201, 1993. Lothar Thiele, “Resource constrained scheduling of uniform algorithms,” in Conference on Application Specific Array Processors, October 1993, vol. 20, pp. 29–40. C Ancourt, G´en´eration Automatique de Code de Transfert pour Multiprocesseurs a` M´emoires locales, Ph.D. thesis, Universit´e de Paris VI, 1991. P.M. Gruberr and C.G. Lekkerkerker, Geometry of Numbers, North-Holland, Amsterdam, 1987. C. Ancourt and F. Irigoin, “Scanning polyhedra with DO loops,” in Proc. ACM SIGPLAN ’91, june 1991, pp. 39–50. Ph. Clauss, “Counting solutions to linear and nonlinear constraints through ehrhart polynomials: Applications to analyse and transform scientific programs,” in 10th International Conference on Supercomputing, Philadelphia, May 1996. Ph. Clauss and V. Loechner, “Parametric analysis of polyhedral iteration spaces,” in IEEE Int. Conf. on Application Specific Array Processors, ASAP’96, Chicago, Illinois. August 1996, 1996. Ph. Clauss and V. Loechner, “Parametric analysis of polyhedral iteration spaces,” Journal of VLSI Signal Processing, vol. 19, pp. 179–194, July 1998. Patrice Quinton, Sanjay Rajopadhye, and Tanguy Risset, “On Manipulating -polyhedra,” Tech. Rep., Institut de Recherche en Informatique et Syst`emes Al´eatoires, 1996. J. Davis, R. Galicia, M.C. Goel, Hylands, E.A. Lee, J. Liu, X. Liu, L. Muliadi, S. Neuendorffer, J. Reekie, N. Smyth, Tsay J., and Y. Xiong, “Heterogeneous concurrent modeling and design in java,” Tech. Rep., University of California,Dept EECS, Nov. 1998.

Z

6

Suggest Documents