Conditional and Iterative Structures using a Homogeneous Static Dataflow Graph Model

Lorenzo Verdoscia and Roberto Vaccaro
Istituto per la Ricerca sui Sistemi Informatici Paralleli - CNR
Via P. Castellino, 111
80131 Naples - Italy
e-mail [email protected]

Abstract. This paper presents a static dataflow graph model in which only data tokens are allowed to flow. The proposed model is formally described, and the dataflow graph is obtained by employing only actors with homogeneous I/O conditions. Each actor, which executes an elemental operation, is characterized by having one output and two input arcs. Even though no control tokens are allowed, so that no T-gate, Merge, or Switch actors are present in this model, it is always possible to represent conditional and iterative structures that are well-behaved. Since homogeneous I/O conditions are a severe restriction on the representation of the flow of a computation, and the token flow in such dataflow graphs is completely asynchronous, a proof is given to guarantee their determinacy.


I. INTRODUCTION

The recent developments in VLSI technology and the rapid decrease in hardware costs have raised interest in computing devices for performing computationally intensive tasks. One of the hardware architectures developed toward this goal is the dataflow machine. In this architecture, a network of processors (called Functional Units) is set up, the computational task is decomposed into smaller tasks to be performed by the individual Functional Units (FU), and the interconnections between them in the network correspond to the directed flow of information between FUs. This physical network can be represented by a directed (or dataflow) graph. The input is fed into some FUs, and the results of the computations of FUs are moved along the network arcs and used as input for computation at subsequent FUs. Since the dataflow approach [11] is asynchronous, FUs may perform their computations at different speeds. Even though theories exist about dataflow models [4] [14] [22] [25], and many architectures have been proposed (see [19], [28], and [30] for an insightful survey), they can be grouped into static and dynamic (or tagged-token) models. Recently a new proposal, which has the advantages of both models, has been presented in [1]. The static model allows only one token at a time to reside on an arc, while the dynamic one allows essentially unbounded token queues on arcs with no ordering, but each token carries a tag to identify its role in the computation. As opposed to the fine grain computation of a static model, a dynamic one seems better tailored for coarse grain computation [5]. Although several interesting proposals based on dynamic models have been formulated in the last years [9][21][16], they preserve, through the conventional processor cycle, the von Neumann model at some lower level of their implementation. On the contrary, the static model, even though it preserves its dataflow nature, has received several criticisms because of its fine grain computation. One objection is that the overhead of fine grain instruction scheduling prohibits the attainment of acceptable efficiency [17]. However, in [12][15][31] it has been shown how fine grain instruction scheduling can be done by simple and efficient hardware mechanisms. Another drawback of the model is that, since task switching occurs at the instruction level, it cannot take advantage of the instruction-level locality present in programs. This drawback is also arguable. For example, the reason why 128 FUs have been grouped in a cluster, and two of them are connected to the same cell of the Mail Box System of ALFA [32], is precisely to preserve locality in programs. In that work it has been shown how the communication penalty drastically decreases when sequential threads of code can be encapsulated in a cluster.

In this paper, we first present a static dataflow graph model whose actors have homogeneous I/O conditions, and then show how they can be combined to describe conditional and iterative structures. Homogeneous I/O conditions mean that actors have one output and two input arcs and consume and produce only data tokens. Although these conditions are rather severe, proof is given to show how it is possible to obtain completely asynchronous well-behaved dataflow graphs where actor firing is controlled only by the pure static dataflow paradigm. Because actors cannot produce control tokens, Merge and Switch actors [18] are not present in our model. However, by associating with actors only elemental logic and arithmetical operations, it is always possible to obtain dataflow graphs, including both iterative and conditional structures, which are well-behaved and where: 1) no feedback interpretation is needed to execute a program correctly, which means that no check is needed to verify whether the output token of an actor has been consumed, so only one-way token flow is present; 2) no synchronization mechanism is needed to control the token flow, thus the model is completely asynchronous. Since recursive functions can be transformed into iterative ones [2][6][7][27], this DFG model does not support recursion as re-entrant code, but rather uses the iterative structure to represent it.


The main advantage of this characterization with respect to other static models [11][15] is that our model makes it possible to establish an easy one-to-one correspondence between graph actors and the FUs of a very fine grain dataflow machine [32]. Consequently, VLSI technology can be massively employed to integrate into a chip a cluster of hundreds of simple FUs, so that the DFG for a given application can be directly mapped onto VLSI. Such a dataflow representation is also suitable for describing concurrency in any massively parallel multiprocessor implementation and can be used as a tool both for static analysis of programs and for performance and reliability analysis of computer systems modeled as dataflow graphs.

II. DATAFLOW MODEL

A. Dataflow notation

Since a static dataflow graph is represented by a directed graph, or digraph, we use some terminology and notation from graph theory [8] [20] [3]. Let A = {a1, a2, ..., an} be the set of actors and L = {l1, l2, ..., lm} be the set of links.

Definition 1: A dataflow graph (DFG) is a labelled directed graph

G = (N, E)    (1)

where N = A ∪ L is the set of nodes and E ⊆ (A × A) ∪ (A × L) ∪ (L × A) is the set of edges.

Actors represent functions, links are treated as place holders of values (tokens) as they flow from actor to actor, and edges are channels of communication like the arcs in Dennis' model [10]. Therefore, we will also use the word arc to denote an edge.

Definition 2: The head node set (or input node set) is:

HD = {l ∈ L | (a, l) ∉ E, ∀ a ∈ A}.    (2)

The tail node set (or output node set) is:

TL = {l ∈ L | (l, a) ∉ E, ∀ a ∈ A}.    (3)

These nodes, which are links, represent the inputs to and outputs from a DFG. Let I(a) = {l ∈ L | (l, a) ∈ E} be the set of input links (the in-degree) of an actor a and O(a) = {l ∈ L | (a, l) ∈ E} be the set of output links (the out-degree) of an actor a; similarly, the set of input actors of a link l is I(l) = {a ∈ A | (a, l) ∈ E} and the set of output actors of a link l is O(l) = {a ∈ A | (l, a) ∈ E}. The sets I(a) and O(a) constitute the I/O conditions for an actor a, while I(l) and O(l) constitute the I/O conditions for a link l. These conditions are:

(α) |I(a)| > 0   for all actors a ∈ A
(β) |I(l)| > 0   for all links l ∈ L          (4)
(γ) |O(a)| > 0   for all actors a ∈ A
(δ) |O(l)| > 0   for all links l ∈ L

Let D ⊆ R be a subset of the real numbers, V = {T, F} be the set of logical values T = true and F = false, and B = {0, 1} be the set of boolean values. The set of tokens for a DFG is K = B × W, where W = D ∪ V is the set of token values. The pair k = (b, w) is called a data token if w ∈ D and a control token if w ∈ V, ∀ b ∈ B.

Definition 3: A token k = (b, w) is valid (present on an arc) if b = 1 and not valid (not present on an arc) if b = 0, ∀ w ∈ W. A token value w ∈ W is held in a link l ∈ L (written w ↑ l) if k = (1, w); otherwise the link is empty (written w ↓ l).

B. Firing rules and determinacy

In a DFG the state of a computation is described by a configuration, and the transitions between configurations are governed by firing rules.

Definition 4: A configuration is an assignment of tokens in a DFG:

C(W′ ↑ L′) = {w ↑ l | w ∈ W′ ⊆ W and l ∈ L′ ⊆ L}.    (5)

Definition 5: A DFG computation is a function FC such that:

FC(C(W′ ↑ HD)) = C(W′ ↑ TL)    (6)

Definition 6: The firing of an actor is a combined action of consuming input tokens and producing output tokens.

One configuration is advanced to another by the firing of actors. The firing rules for a static dataflow model are [14]:
1) an actor a is enabled iff
   (a) {w ↑ l | ∀ l ∈ I(a)} (strict evaluation) and
   (b) {w ↓ l | ∀ l ∈ O(a)};
2) any enabled actor may be fired to arrive at the next configuration;
3) an actor a is fired if
   (a) {w ↓ l | ∀ l ∈ I(a)} (input token consuming) and
   (b) Fa({w ↑ l | ∀ l ∈ I(a)}) = {w ↑ l | ∀ l ∈ O(a)} (output token producing).

Fa is called the functionality of an actor a and represents the law which transforms input tokens into output ones. The first firing rule, moreover, gives an implicit definition of a static DFG, where no token queueing is allowed. As no assumption is made about the absolute or relative firing speed of a node, dataflow computation is completely asynchronous. Furthermore, if no control is exercised during such a computation, no well-behavedness or determinacy is guaranteed at the global (input/output) level.

Definition 7: A DFG is said to be well-behaved when the set of result values on the output links depends only on the set of values presented on the input links.

Although the firing rules provide a form of control during a computation, they are not completely general for all actors of any well-behaved DFG. For example, DFGs describing conditional and iterative computations need T-gate, F-gate, Merge, and Switch actors to control the data token flow. These special actors, shown in Figure 1, besides having their own firing rules, are characterized by heterogeneous I/O conditions and arcs. For them the first and third firing rules become:


T-gate (F-gate) actor:
1) a T-gate (F-gate) actor becomes enabled if
   (a) {d ∈ D ↑ l1, v ∈ V ↑ l2 | {l1, l2} = I(T-gate) (I(F-gate))} and
   (b) {d ∈ D ↓ l | ∀ l ∈ O(T-gate) (O(F-gate))};
3) once fired, an enabled T-gate (F-gate) actor
   (a) {w ↓ l | ∀ l ∈ I(T-gate) (I(F-gate))} and
   (b) FT-gate(F-gate)({d ∈ D ↑ l1, v ∈ V ↑ l2 | {l1, l2} = I(T-gate) (I(F-gate))}) =
       = {d ∈ D ↓ l | ∀ l ∈ O(T-gate) (O(F-gate))} if v = F (v = T)
       = {d ∈ D ↑ l | ∀ l ∈ O(T-gate) (O(F-gate))} if v = T (v = F)

Merge actor:
1) a Merge actor becomes enabled if
   (a) {d1 ∈ D ↑ l1, d2 ∈ D ↑ l2, v ∈ V ↑ l3 | {l1, l2, l3} = I(Merge)} and
   (b) {d ∈ D ↓ l | ∀ l ∈ O(Merge)};
3) once fired, an enabled Merge actor
   (a) {d1 ∈ D ↓ l1, d2 ∈ D ↓ l2, v ∈ V ↓ l3 | {l1, l2, l3} = I(Merge)} and
   (b) FMerge({d1 ∈ D ↑ l1, d2 ∈ D ↑ l2, v ∈ V ↑ l3 | {l1, l2, l3} = I(Merge)}) =
       = {d1 ∈ D ↑ l | ∀ l ∈ O(Merge)} if v = T
       = {d2 ∈ D ↑ l | ∀ l ∈ O(Merge)} if v = F

Switch actor:
1) a Switch actor becomes enabled if
   (a) {d ∈ D ↑ l1, v ∈ V ↑ l2 | {l1, l2} = I(Switch)} and
   (b) {d ∈ D ↓ l | ∀ l ∈ O(Switch)};
3) once fired, an enabled Switch actor
   (a) {d ∈ D ↓ l1, v ∈ V ↓ l2 | {l1, l2} = I(Switch)} and
   (b) FSwitch({d ∈ D ↑ l1, v ∈ V ↑ l2 | {l1, l2} = I(Switch)}) =
       = {d ∈ D ↑ lo1 | lo1 ∈ O(Switch)} if v = T
       = {d ∈ D ↑ lo2 | lo2 ∈ O(Switch)} if v = F

where lo1 and lo2 denote the two output links of the Switch.
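To make the heterogeneous behaviour of these special actors concrete before contrasting it with our homogeneous model, the following Python sketch (our illustration, not part of the original text) models their token-level effect. In it a control token is a bool, a data token a float, and None stands for "no token"; all names are our own.

def t_gate(d, v):
    # Pass the data token d when the control token v is true, absorb it otherwise.
    return d if v else None

def f_gate(d, v):
    # Complement of the T-gate: pass d only when v is false.
    return d if not v else None

def merge(d1, d2, v):
    # Forward d1 when v is true, d2 when v is false.
    return d1 if v else d2

def switch(d, v):
    # Route d to the first output when v is true, to the second otherwise.
    return (d, None) if v else (None, d)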

Heterogeneity concerning the tokens and the I/O conditions of actors, besides constituting an obstacle to VLSI implementation, makes DFGs hard to understand and hard to use for representing data dependencies and their flow along actors. We show this with a simple example.

Example 1
Let us consider the following program:

input (w, x)
y := x; t := 0;
repeat
  if y > 1 then y := y ÷ 2 else y := y × 3;
  t := t + 1;
until t = w;
output y

Variables w and x are the input variables of the program and y is the output variable. Figure 2 shows the equivalent DFG that guarantees the well-behaved behavior. The part enclosed in the gray rectangle represents the iteration control. The part enclosed in the white rectangle represents the body of the iteration, which in this case is the conditional structure if ... then ... else. First of all, the DFG, besides having heterogeneous I/O conditions for its actors, has heterogeneous links and values (data links to hold data tokens and control links to hold control tokens). To understand how this DFG works, we have to follow the flow of two different kinds of tokens along a graph where actors can have different numbers of input and output arcs, and consume and produce different kinds of tokens. Moreover, the initial behavior of actors like the T- and F-gate, Switch, and Merge depends on their position in the DFG rather than on the program input values. For example, the Switch and the two Merge actors belonging to the gray part of the DFG need their initial control token values to be set respectively to false and true in order to work correctly, while the Merge and Switch actors belonging to the white part of the DFG get their initial control token value from the computation of the related Decider actor. We point out that, in the gray part, the initial control tokens for the Switch and Merge actors are automatically present on their control arcs and have different values although they share the same control link. Furthermore, these control values, even though they might be deduced, are not a program input but a programmer's trick to allow the computation to start correctly. In addition, not all functions associated with actors are defined on the same domain and take values in the same codomain. So, while the function associated with an arithmetical actor is both defined and takes its value in D, the function associated with the Decider actor is defined in D but takes its value in V, and the functions associated with the Merge, Switch, and T-gate actors are defined in W but take their values in D. Finally, the implicit definition of static dataflow requires, for this model, that each arc is in practice constituted by a pair of arcs, value and signal [13], to enforce the condition that an actor cannot be enabled unless all of its output arcs are empty. In a static dataflow machine this control, obtained by means of a data/acknowledgment mechanism between actors, increases the communication overhead and in a certain way goes against the pure dataflow model, because it originates two opposite token flows, data tokens and signal tokens, for the same computation.
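For reference, the Example 1 program can be transcribed in a conventional sequential language as follows (a sketch; we assume ÷ denotes integer division and that w ≥ 1 so the repeat loop terminates):

def example1(w, x):
    # Sequential transcription of the Example 1 program; w and x are the inputs.
    y = x
    t = 0
    while True:
        if y > 1:
            y = y // 2      # y := y ÷ 2 (integer division assumed)
        else:
            y = y * 3       # y := y × 3
        t = t + 1
        if t == w:          # repeat ... until t = w
            break
    return y                # output y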

III. OUR PROPOSAL

In our DFG model nodes satisfy the following I/O conditions:

(α) |I(a)| = 2   for all actors a ∈ A
(β) |I(l)| > 0   for all links l ∈ L          (7)
(γ) |O(a)| = 1   for all actors a ∈ A
(δ) |O(l)| > 0   for all links l ∈ L

Actors and links are shown in Figure 3 and are represented respectively by a rectangle and a filled circle. Without loss of generality, with reference to link nodes, we will consider only DFGs with joint and replica links, shown in Figure 4, which are defined as follows.

A joint link is a link node such that:

|I(l)| > 1 and |O(l)| = 1    (8)

A replica link is a link node such that:

|I(l)| = 1 and |O(l)| > 1    (9)

In this model the set of tokens is K = B × D. Since no control tokens are allowed in this DFG model, any operation associated with an actor is defined and takes its value in D. We assume that the set of operations is that described in [32].

Firing rules and determinacy

Definition 8: A DFG is said to be static if the only legal configurations are those which assign no more than one token to each link: d ∈ D ↑ l, when it exists, is unique ∀ l ∈ L.

This means that if a new valid token arrives at a link where a not yet consumed token is present, it substitutes the old one, because no queueing is allowed. For this model the firing rules are:
1) an actor a is enabled iff {d ∈ D ↑ l | ∀ l ∈ I(a)};
2) any enabled actor may be fired to arrive at the next configuration;
3) an actor a is fired if
   (a) {d ∈ D ↓ l | ∀ l ∈ I(a)} and
   (b) Fa({d ∈ D ↑ l | ∀ l ∈ I(a)}) = {d ∈ D ↑ l | l ∈ O(a)}.

DFG well-behavedness is guaranteed by the actor firing rules and by token-level functionality. Token-level functionality means that, given the same tokens on its incoming edges, an actor will always produce the same tokens on its outgoing edges, regardless of the relative arrival times of the incoming tokens and of the computation state. However, we will use two further actors in order to obtain well-behaved DFGs representing conditional and iterative structures. These actors, called SEL and ℜ, each violate one of the firing rules.

The actor SEL transfers an input token to its output. Token-level functionality is imposed to guarantee its well-behavedness:

Assertion 1: On the SEL input arcs only one token can be present:
(d ↑ l1 and d ↓ l2) or (d ↓ l1 and d ↑ l2), for d ∈ D and (l1, l2) ∈ I(SEL).

For SEL the first firing rule becomes: SEL is enabled if {d ∈ D ↑ l | l ∈ I(SEL)} (non-strict evaluation).

The actor ℜ performs a logical operation between its two input tokens. For it the third firing rule becomes: ℜ is fired if
   (a) {d ∈ D ↓ l | ∀ l ∈ I(ℜ)} and
   (b) Fℜ({d ∈ D ↑ l | ∀ l ∈ I(ℜ)}) =
       = {d ∈ D ↑ l | d = 0 and l ∈ O(ℜ)} if the condition is satisfied
       = {d ∈ D ↓ l | l ∈ O(ℜ)} if the condition is not satisfied.

However, the SEL and ℜ actor violations do not restrict the ability of this model to describe dataflow programs. First of all, the actor SEL could be replaced by a joint link with |I(joint)| = 2. We prefer to have this actor because, as we will see, it is employed to identify the starting point of iterative constructs. Then, since the firing of an actor is time independent, the ℜ actor does not lose its generality if we consider that it always produces the token {d ∈ D ↑ l | d = 0 and l ∈ O(ℜ)}; the latter simply takes an infinite time to be produced when its condition is not satisfied. So, the functionality of the specific third firing rule can be written according to the functionality of the general third firing rule:

Fℜ({d ∈ D ↑ l | ∀ l ∈ I(ℜ)}) = {d ∈ D ↑ l | d = 0 and l ∈ O(ℜ)}.

Since the other actors compute elementary arithmetical operations, we can conclude that:

Assertion 2: All actors of the model are well-behaved.

Lemma 1: Actors and replica links are the only well-behaved nodes.

Proof: If a node is an actor, it is well-behaved by Assertion 2. If a node is a link, it is either a replica or a joint link. If it is a replica link, it is well-behaved because it only transfers its incoming token to its output arcs. If it is a joint link, its output is nondeterministic when two or more tokens are present simultaneously on its input arcs. This contradicts token-level functionality, and therefore it cannot be well-behaved.
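Continuing the sketch started above (again with illustrative names of our own), SEL and ℜ can be modelled as two subclasses of Actor: SEL is non-strict and forwards whichever input token is present, while ℜ produces the data token 0 only when its condition holds and otherwise produces nothing, which is how the "infinite time" reading above shows up in code.

class SelActor(Actor):
    def __init__(self, inputs):
        # The operation is irrelevant: SEL just forwards one input token.
        super().__init__(op=lambda a, b: a, inputs=inputs)

    def enabled(self) -> bool:
        # Non-strict evaluation: one present input is enough (Assertion 1
        # guarantees that at most one of the two inputs carries a token).
        return any(l.token is not None for l in self.inputs)

    def fire(self) -> None:
        for l in self.inputs:
            if l.token is not None:
                self.output.token, l.token = l.token, None
                return

class RhoActor(Actor):
    def __init__(self, condition, inputs):
        # `condition` is the logical operation C, e.g. a "greater than" predicate.
        super().__init__(op=lambda a, b: 0.0, inputs=inputs)
        self.condition = condition

    def fire(self) -> None:
        a, b = self.inputs[0].token, self.inputs[1].token
        self.inputs[0].token = self.inputs[1].token = None
        if self.condition(a, b):
            self.output.token = 0.0   # the only token ℜ ever produces
        # otherwise no token is produced and the output link stays empty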

IV. MACROACTORS

The simplest DFG includes one actor and three link nodes, as shown in Figure 5. By interconnecting actors and links it is possible to obtain larger DFGs. These larger DFGs, which can be seen as macronodes with in-degree > 2 and out-degree > 1, might not be well-behaved. In an asynchronous system, even though we interconnect well-behaved links and actors, the resulting macronode may not be determinate. In fact, when cycles¹ occur among nodes, clash-free operation is not guaranteed [26]. Consequently, no closure property can be guaranteed [26]. This observation is all the more relevant when DFGs include nodes that are not well-behaved, such as joint links. However, it is not excluded that macronodes obtained by combining simple nodes, i.e. actors and not well-behaved links, may be determinate. If this happens, we will call them macroactors.

¹ A directed cycle (or simply cycle) is a path for which the start and end nodes are the same.


Definition 9: A macronode is called a macroactor if it guarantees global token-level functionality, that is, if FC(C(D′ ⊆ D ↑ HD)) = C(D′ ⊆ D ↑ TL) is independent of the firing order of its actors.

Theorem 1: A macronode obtained by connecting determinate nodes without cycles is always well-behaved.

Proof: We call such a macronode an Acyclic macronode. Since there are no cycles, the determinacy property of the static dataflow graph model guarantees that it is necessary to examine only one execution sequence to derive the result of graph execution.
Case 1: |I(Acyclic)| = n with n > 2 and |O(Acyclic)| = 1. Under such conditions, Acyclic represents a reverse tree of actors. When one of its actors a is fired, we can substitute its functional value Fa({d ∈ D ↑ l | l ∈ I(a)}) = {d ∈ D ↑ l | l ∈ O(a)}, thus obtaining a new macronode with the same O(Acyclic) but |I(Acyclic)| = m < n.
Case 2: |O(Acyclic)| > 1. The macronode might not represent a reverse tree, which means it necessarily contains replica links. As these are well-behaved by Lemma 1, we can always split the macronode into smaller ones in correspondence with each replica link. Consequently, such smaller DFGs are reverse trees of actors which share some of the input tokens.
Hence, we can conclude that in both cases FAcyclic(C(D′ ⊆ D ↑ HD)) = C(D′ ⊆ D ↑ TL) satisfies Definition 9.


TEST macroactor

The macroactor TEST constitutes the kernel of many other macroactors, such as the iterative and conditional ones, because it is used to control the flow of a data token. It can be represented by a macronode with in-degree = 3 and out-degree = 1. It is constituted by interconnecting the logical actor ℜ and the actor ADD, as shown in Figure 6. Its well-behavedness comes from Theorem 1. Let k1 = (b1, d1), k2 = (b2, d2), and k3 = (b3, d3) ∈ K be three tokens and p = d2 C d3 be the condition evaluating a logical operation C between the token values d2 and d3; the functionality of TEST can be expressed as follows:

FTEST({d1 ↑ l1, d2 ↑ l2, d3 ↑ l3 for l1, l2, l3 ∈ I(TEST)}) =
   = {d1 ↑ l | l ∈ O(TEST)} if p is verified
   = {d1 ↓ l | l ∈ O(TEST)} if p is not verified

In addition to the macroactor TEST, we also have its complement, the macroactor TEST-bar (drawn in the figures as TEST with an overline), which passes d1 when p is not verified.
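At token level, the behaviour of TEST and of its complement can be sketched as plain functions, where returning None stands for "no token is ever produced"; the function names and the use of a Python callable for the logical operation C are our own conventions.

def test_macroactor(d1, d2, d3, C):
    # ℜ emits 0 only when the condition d2 C d3 holds; ADD then forwards d1 + 0.
    gate = 0.0 if C(d2, d3) else None
    return d1 + gate if gate is not None else None

def test_bar_macroactor(d1, d2, d3, C):
    # The complement simply evaluates the negated condition.
    return test_macroactor(d1, d2, d3, lambda a, b: not C(a, b))

# e.g. test_macroactor(42.0, 5, 1, lambda a, b: a > b) returns 42.0,
# while test_bar_macroactor(42.0, 5, 1, lambda a, b: a > b) returns None.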

V. CONDITIONAL AND ITERATIVE STRUCTURE GRAPHS

A. Conditional DFGs

The most immediate application of the macroactors TEST and TEST-bar is the representation of conditional structures. The conditional construct is an example of data dependence, that is, a case in which only at run time it is possible to know whether part of a program will be executed or not. Consider the following part of a program, which evaluates the conditional statement if...then...else:

....
if p(x) then f(x) else g(x);
....

where p(x) = a(x) C b(x) is the condition, C the logical operation, and a(x), b(x), f(x), g(x), and x ∈ D are variables and functions of the program. The macronode representing such a DFG is shown in Figure 7 and is called COND.

Theorem 2: The macronode COND is a macroactor.

Proof: To prove that the macronode COND satisfies Definition 9, it is sufficient to show that the joint link, whose |I(joint)| = 2, holds no more than one token at a time for each set I(COND).
Case 1: TEST verifies the condition p(x). Its functionality guarantees that the token f(x) is present on its output arc. Since this arc is an input of the joint link, f(x) is present in it. On the other side, TEST-bar, being the complement of TEST, cannot produce any token because its condition is not verified. So only f(x) is held in the joint link and ready on its output arc. But as this arc is the output arc of the macronode COND, we can conclude that in this case the behavior of COND is functional.
Case 2: TEST-bar verifies its condition, i.e. p(x) is not verified. The same reasoning as in Case 1 applies; the only difference is that now g(x) is held in the joint link.
Therefore, we can conclude that in both cases FCOND(C(D′ ⊆ D ↑ HD)) = C(D′ ⊆ D ↑ TL) is globally token-level functional.

In a similar way it can be proved that the most general construct, the case statement, whose DFG is shown in Figure 8, is well-behaved. In this case the condition p(x) for the i-th macroactor TEST is pi(x) = ai(x) C b(x), while mutual exclusion among the conditions of the TEST macroactors is expressed by:

pi × pj = F   ∀ j ≠ i.
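Using the token-level TEST functions sketched earlier, the behaviour of COND can be illustrated as follows (a sketch under our naming assumptions; the final selection models the joint link, which by Theorem 2 never receives two tokens at once).

def cond_macroactor(x, a, b, C, f, g):
    # p(x) = a(x) C b(x); the TEST branch carries f(x), the TEST-bar branch g(x).
    from_true = test_macroactor(f(x), a(x), b(x), C)
    from_false = test_bar_macroactor(g(x), a(x), b(x), C)
    # Joint link: exactly one of the two branches holds a token.
    return from_true if from_true is not None else from_false

# e.g. cond_macroactor(4.0, a=lambda x: x, b=lambda x: 1.0,
#                      C=lambda p, q: p > q,
#                      f=lambda x: x / 2, g=lambda x: x * 3) returns 2.0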


B. Iterative DFGs

With the term iterative structures we refer to two different kinds of iterations: manifest and data-dependent. Manifest iterations are those whose number of repetitions is known at compile time, and hence is independent of the data. In conventional programming languages, this would be expressed with a DO or FOR loop. Data-dependent iterations are those whose number of repetitions cannot be known at compile time but only at run time. They can be expressed with WHILE or REPEAT programming language constructs. Here we will refer only to the REPEAT construct, since the reasoning is easily transferable to WHILE and DO loops. Consider the following iterative part of a program:

.....
x := xI;
repeat
  (x, p(x), f(x)) = F(x, p(x), f(x))
until p(x);
y := f(x); ...
......

where p(x) = a(x) C b(x) is the condition, C the logical operation, and a(x), b(x), f(x), F, y, x, and xI ∈ D are variables and functions of the program. The macronode associated with the DFG expressing the REPEAT structure, called IT_R, is shown in Figure 9. It consists of two parts. The first, which represents the loop body and is indicated by the dashed rectangle LOOPB, is constituted by the actor SEL and the macronode F. The second, which performs the loop control function and is indicated by the dashed rectangle LOOPC, is constituted by the macroactors TEST and TEST-bar. The two input tokens of the SEL actor, xI and x, are respectively the initial and the upgraded value for the loop; the dotted link and arc represent other inputs for the macronode F. LOOPC, as can be observed, is a macroactor like COND, with the sole difference that |O(LOOPC)| = 2. It must be pointed out that, in a macronode IT_R, there is always an arc that connects TEST-bar and SEL.

Theorem 3: The macronode IT_R is a macroactor if the macronode F is a macroactor.

Proof: To prove this theorem we first suppose that IT_R does not have any cycles, that is, the TEST-bar and SEL actors are disconnected, and then we connect them. Furthermore, without any loss of generality, we suppose that only the tokens present on the SEL arcs influence the computation of IT_R.
Step 1: TEST-bar and SEL are disconnected. If IT_R has no feedback arc, |O(IT_R)| = 2 and O(TEST-bar) ⊂ O(IT_R). According to Theorem 1 the macronode IT_R is well-behaved if F is a macroactor. Its global token-level functionality can be expressed as:

FIT_R({xI ↑ l | l ∈ I(SEL)}) =
   = {f(x) ↑ l | l ∈ O(TEST)} if p(x) is verified
   = {x ↑ l | l ∈ O(TEST-bar)} if p(x) is not verified

Step 2: TEST-bar is connected to SEL. If the feedback arc is present, |O(IT_R)| = 1. The first firing of IT_R acts as in Step 1. If the condition p(x) is verified, f(x) expresses exactly the global token-level functionality of IT_R. If the condition p(x) is not verified, x becomes a new input for IT_R, which again acts as in Step 1 once it is fired by this upgraded value. This process continues until f(x) is produced, provided that the algorithm represented by IT_R converges. But as IT_R always acts as in Step 1, and Step 1 is well-behaved, we can conclude that IT_R is well-behaved. Its global token-level functionality can be expressed as:

FIT_R({xI ↑ l | l ∈ I(SEL)}) = {f(x) ↑ l | l ∈ O(IT_R)}
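The global functionality proved in Theorem 3 can be paraphrased by the following sketch, in which the feedback arc from TEST-bar to SEL becomes the loop of a while statement; F, p, and f are assumed to be given as Python callables and the names are purely illustrative.

def it_r_macroactor(x_initial, F, p, f):
    # xI arrives on the SEL input link; while p(x) is not verified, TEST-bar
    # feeds the upgraded x back to SEL and the body F fires again.
    x = x_initial
    while True:
        x = F(x)            # LOOPB: the macronode F upgrades the loop value
        if p(x):            # LOOPC: TEST lets f(x) leave the loop when p(x) holds
            return f(x)

# e.g. it_r_macroactor(1.0, F=lambda x: x + 1, p=lambda x: x >= 5,
#                      f=lambda x: x * x) returns 25.0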

Figure 10 shows the DFG of the Example 1 program obtained with the proposed model.


In contrast to the DFG of Figure 2, the DFG of Figure 10, besides allowing a quicker comprehension of how it works and which function it represents, employs only actors with homogeneous I/O conditions and only data links. The initial behavior of any actor depends on the program input values rather than on its position in the graph. To check whether the DFG is well-behaved we only have to verify whether its subgraphs are macroactors, rather than following the direct and reverse flows of value and signal tokens. Finally, every function associated with an actor consumes and produces only data tokens. As a result of this theorem, the following corollary holds:

Corollary: Nested iterative DFGs are well-behaved.

Proof: Suppose that the iterative macronode IT_R1 has a macroactor F such that F = IT_R. In this case IT_R1 represents a nested iterative DFG. But as IT_R is a macroactor, from Theorem 3 we have that IT_R1 is well-behaved.

Figure 11 shows such a nested iterative structure. The first projection is the outer loop while the second one is the inner loop. When yI is present on the arc of the SEL actor, the outer loop starts and yI enters the dashed block F to be processed. When xI is present on the arc of the (inner) SEL actor, the inner loop starts. The dashed block F1 is activated and, at the end of the loop, produces y1, which is passed to the dashed block F2 for further processing. After finishing its computation, F2 produces f(y), p(y), y, and x′I, which represent respectively the value of the entire computation, the condition for the outer loop, the upgraded input value for IT_R1, and the upgraded input value for IT_R. We point out that if yI and xI are not related, the two loops start in an independent mode. Furthermore, in the second projection the token x′I on the input arc of the macroactor TEST cannot be present if the block F2 does not produce it and the macroactor TEST does not select it. The value of x′I can be either the same as xI, as for example in DO loops, or evaluated at run time, as in data-dependent iterations.


Example 2

To clarify how a nested iterative DFG works in this model, consider the following part of a program:

input (w, x)
y := x;
repeat
  t := 0;
  repeat
    if y > 1 then y := y ÷ 2 else y := y × 3;
    t := t + 1;
  until t = w;
  w := w + t/2;
until y ≤ w;
output y

Figure 12 represents the corresponding DFG. As long as t ≠ w, the innermost loop is executed and upgraded values of y are produced. As soon as t = w, t/2 is evaluated, and the new values of w and y are compared (gray rectangle). If y ≤ w is true, y is produced; otherwise t is reinitialized to zero, the upgraded value of w is sent back into the cycle through the SEL actor in the gray rectangle, and the computation starts again.
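As for Example 1, a sequential transcription makes the intended nesting explicit (a sketch; ÷ is again assumed to be integer division, and termination depends on the input values, exactly as in the original program):

def example2(w, x):
    # Outer loop: repeat ... until y <= w; inner loop: repeat ... until t = w.
    y = x
    while True:
        t = 0
        while True:
            if y > 1:
                y = y // 2      # y := y ÷ 2 (integer division assumed)
            else:
                y = y * 3
            t = t + 1
            if t == w:
                break
        w = w + t / 2           # w := w + t/2
        if y <= w:
            break
    return y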

VI. CONCLUDING REMARKS

Since DFGs represent parallelism at a fine grain level, their VLSI implementation is a necessary condition to make dataflow computers competitive. However, even though some researchers [24][29][23] have worked in this direction, much more work must still be done to build large, scalable dataflow systems which directly map the DFG for a given application onto VLSI, so that each actor in the DFG corresponds to one computing element cell in VLSI. One of the obstacles to such an implementation is the heterogeneity of actor I/O conditions, which makes it impracticable to obtain a realistic interconnection and an efficient utilization of hundreds of such cells in a chip. With this proposal our aim is to contribute to overcoming this obstacle. In fact, the main characteristic that distinguishes this model from similar ones is that DFGs are obtained by using only actors with homogeneous I/O conditions. That means actors have two input links and one output link on which only data tokens flow, and no control token is required to compute conditional or iterative structures. As Merge and Switch actors are not allowed, the well-behavedness proof of the DFGs representing the above-mentioned structures is given. The main advantage is that it is possible to integrate identical FUs in clusters by using VLSI technology, because the actors of this model can be implemented with identical, simple hardware. Besides, in these clusters no memory is required to store partial results [31], so that an entirely asynchronous engine can be constructed. This result should simplify the construction of dataflow machines, and the model can be used to seriously challenge, at least at the chip level, the von Neumann model. At the moment we are starting the VLSI design of a cluster with about a hundred interconnected FUs.

REFERENCES

[1] Abramson D. and Egan G., "The RMIT Data Flow Computer: A Hybrid Architecture", The Computer Journal, Vol. 33, No. 3, 1990, pp. 230-240.
[2] Backus J.W., "Can programming be liberated from the von Neumann style? A functional style and its algebra of programs", Communications of the ACM, August 1978, pp. 613-641.
[3] Boffey T.B., "Graph Theory in Operations Research", The Macmillan Press Ltd, 1982.
[4] Böhm A.P.W., "Dataflow Computation", CWI Tract, 1983.
[5] Böhm A.P.W. and Gurd J.R., "Iterative Instructions in the Manchester Dataflow Computer", IEEE Trans. Parallel and Distributed Systems, Vol. 1, April 1990, pp. 129-139.
[6] Böhm C., "Reducing Recursion to Iteration by Algebraic Extension", ESOP 86 (Robinet R. and Wilhelm R., eds.), LNCS No. 213, pp. 111-118.
[7] Böhm C., "Reducing Recursion to Iteration by means of Pairs and N-tuples", LNCS No. 306, pp. 58-66, 1988.
[8] Christofides N., "Graph Theory: An Algorithmic Approach", Academic Press, New York, 1975.
[9] Culler D.E. and Papadopoulos G.M., "The Explicit Token Store", Journal of Parallel and Distributed Computing, December 1990, pp. 289-308.
[10] Dennis J.B., "First version of a data flow procedure language", Lecture Notes in Computer Science, Vol. 19, Springer-Verlag, 1974.
[11] Dennis J.B., "Data flow supercomputers", IEEE Computer, November 1980, pp. 48-56.
[12] Dennis J.B., Lim W.L-P., and Ackerman W.B., "The MIT data flow engineering model", in Information Processing, IFIP, 1983, pp. 553-560.
[13] Dennis J.B., Gao G.R., and Todd K.W., "Modelling the Weather with a Data Flow Supercomputer", IEEE Trans. Computers, July 1984, pp. 592-603.
[14] Dennis J.B., "Dataflow Computation", in Control Flow and Data Flow: Concepts of Distributed Programming, NATO ASI Series F: Computer and System Sciences, Vol. 14, 1985.
[15] Dennis J.B. and Gao G.R., "An efficient pipelined dataflow processor architecture", in Proc. Supercomputing '88, IEEE Computer Society Press, November 1988, pp. 368-373.
[16] Egan G.K., Webb N.J., and Böhm W., "Some Architectural Features of the CSIRAC II Data-Flow Computer", in Advanced Topics in Data-Flow Computing (Gaudiot J.L. and Bic L., eds.), Prentice Hall, 1991, pp. 143-173.
[17] Gajski D., Padua D.A., Kuck D.J., and Kuhn R.H., "A second opinion on data-flow machines and languages", IEEE Computer, February 1982, pp. 58-69.
[18] Gao G.R., "A Code Mapping Scheme for Dataflow Software Pipelining", Kluwer Academic Publishers, 1991.
[19] Gaudiot J.L. and Bic L., eds., "Advanced Topics in Data-Flow Computing", Prentice Hall, 1991.
[20] Harary F., "Graph Theory", Addison-Wesley, Reading, MA, 1969.
[21] Iannucci R.A., "Toward a dataflow/von Neumann hybrid architecture", in Proc. 15th Annual Symposium on Computer Architecture, IEEE Computer Society, 1988.
[22] Kavi K.M., Buckles B.P., and Bhat U.N., "A Formal Definition of Data Flow Graph Models", IEEE Trans. Computers, November 1986.
[23] Komori S., Shima S., Miyata S., Okamoto T., and Terada H., "The data-driven microprocessor", IEEE Micro, June 1989, pp. 44-59.
[24] Mendelson B., Patel B., and Koren I., "Designing Special-Purpose Co-processors Using the Data-Flow Paradigm", in Advanced Topics in Data-Flow Computing (Gaudiot J.L. and Bic L., eds.), Prentice Hall, 1991, pp. 547-570.
[25] Miklosko J. and Kotov V.E., eds., "Algorithms, Software and Hardware of Parallel Computers", Springer-Verlag, 1984.
[26] Patil S.S., "Closure Properties of Interconnections of Determinate Systems", in Proc. Project MAC Conference on Concurrent Systems and Parallel Computation, ACM, 1970, pp. 107-116.
[27] Skillicorn D., "Stream Languages and Data-Flow", in Advanced Topics in Data-Flow Computing (Gaudiot J.L. and Bic L., eds.), Prentice Hall, 1991, pp. 439-454.
[28] Srini V.P., "An Architectural Comparison of Data Flow Systems", IEEE Computer, March 1986.
[29] Terada H., "A design philosophy of a data-driven processor", Journal of Information Processing, March 1988, pp. 245-251.
[30] Veen A.H., "Data Flow Machine Architecture", ACM Computing Surveys, December 1986.
[31] Verdoscia L., Esposito A., and Vaccaro R., "A Specialized Hardware to Implement in Fine Grain Mode the Dataflow Paradigm", in Proc. Int. Workshop on Computer Architecture Technology for Computer Science Research and Application, Naples, Italy, March 30-April 2, 1992, IEEE Computer Society Press.
[32] Verdoscia L. and Vaccaro R., "ALFA: a Static Dataflow Architecture", in Proc. 4th Symposium on the Frontiers of Massively Parallel Computation, McLean, Virginia, October 19-21, 1992, IEEE Computer Society Press, pp. 318-325.


Figure 1. Special actors for the classical dataflow model: (a) T-gate actor, (b) F-gate actor, (c) Switch actor, (d) Merge actor, each shown firing when v = T and when v = F.

Figure 2. DFG of the Example 1 program according to the classical static dataflow model.

Figure 3. Actor node and link node.

Figure 4. Joint (a) and replica (b) links and their firing rules: a joint link places the first token arriving on any of its input edges on its output edge; a replica link places the token arriving on its input edge on each of its output edges.

Figure 5. The simplest DFG.

Figure 6. The TEST macroactor: TEST produces the token a if ℜ produces its output token.


Figure 7. The macroactor COND.

Figure 8. DFG for the case statement. The mutual exclusion condition is ai ≠ aj for i ≠ j.

Figure 9. The macroactor IT_R.


Figure 10. DFG of the Example 1 program applying the proposed model.


Figure 11. Nested iterative structure.


Figure 12. DFG of the Example 2 program. It computes a nested loop.
