28th IEEE International Real-Time Systems Symposium
Static Scheduling and Software Synthesis for Dataflow Graphs with Symbolic Model-Checking ∗

Zonghua Gu¹, Mingxuan Yuan¹, Nan Guan², Mingsong Lv², Xiuqiang He¹, Qingxu Deng² and Ge Yu²
¹ Dept of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, China
² Dept of Computer Science and Engineering, Northeastern University, Shenyang, China
Abstract
In this paper, we address the problem of static scheduling and software synthesis for dataflow graphs with the symbolic model-checker NuSMV using a two-step process: first, use model-checking to obtain a static schedule with the objective of minimizing the data buffer size; then, synthesize efficient code from the static schedule with the objective of minimizing code size and the performance overhead of runtime dynamic decisions. We show the effectiveness of these techniques using a number of digital signal processing examples.

∗ This work was partially supported by Hong Kong RGC CERG Grant No. 613506, National Basic Research Program of China (973 Program) Grant No. 2006CB303000 and the 863 program Grant No. 2007AA01Z181.
1052-8725/07 $25.00 © 2007 IEEE DOI 10.1109/RTSS.2007.51

1 Introduction

In the dataflow paradigm, a program is represented as a directed graph, where the nodes, called actors, represent computational modules, and the directed edges represent communication channels between the modules. Synchronous Dataflow (SDF) [1] is a special type of dataflow model where each actor invocation consumes and produces a constant number of data tokens. Due to its static nature, SDF can be statically scheduled offline. It is widely used in a broad class of signal processing and digital communications applications, including modems, multirate filter banks, and satellite receiver systems. As an example, Fig. 1 shows a simple SDF graph taken from [2]. Each invocation of actor A produces 2 tokens on edge eAB; each invocation of actor B consumes 3 tokens on edge eAB and produces 1 token on edge eBC; each invocation of actor C consumes 2 tokens on edge eBC.

Figure 1. An SDF graph.

We can solve the balance equations to obtain a feasible static schedule:

$$r_A O_A = r_B I_B; \quad r_B O_B = r_C I_C \qquad (1)$$
where $r_A$ is the number of invocations of actor A; $O_A$ is the number of tokens produced by each invocation of actor A; $r_B$ is the number of invocations of actor B; $I_B$ is the number of tokens consumed by each invocation of actor B, and so on. For the example in Fig. 1, the balance equations are $r_A \cdot 2 = r_B \cdot 3$; $r_B \cdot 1 = r_C \cdot 2$, with the solution $r_A = 3$; $r_B = 2$; $r_C = 1$. So a feasible static schedule is 3A2BC. Starting from the initial state shown in Fig. 1, if the actors are invoked in the sequence AAABBC, then the SDF graph returns to the initial state, where there are no tokens on either edge. Therefore, we can execute this sequence of actor firings repeatedly without deadlock or buffer overflow. Given the actor ordering ABC, the repetition vector is [3,2,1], and the schedule length is 6.

DSP applications are very performance sensitive, and digital signal processors have stringent resource limitations in terms of both processing speed and memory size. Not so long ago, engineers wrote DSP programs in assembly language; only recently have compilers for high-level programming languages like C/C++ become efficient enough to make high-level languages practical. To maximize performance, it is desirable to inline the code of each actor invocation directly instead of using procedure calls, which avoids their runtime overhead. The memory size requirements of an application consist of two parts: code size and data buffer size. Since inlining causes code size expansion, we should minimize the number of inline actor appearances to minimize code size.

One class of schedules has special significance in SDF scheduling: the Single Appearance Schedules (SAS), in which each actor appears exactly once in the program body, e.g., the schedule 3A2BC is a SAS while 2ABABC is not. The SAS has the minimal code size among all feasible schedules where actor invocations are implemented as inline functions.
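As an illustration of how the balance equations can be solved mechanically, here is a minimal sketch in C. It assumes a simple chain of actors like the one in Fig. 1; the names `gcd_l` and `chain_repetitions` are hypothetical helpers introduced for this example, not part of the paper.

```c
#include <assert.h>

/* Greatest common divisor by Euclid's algorithm (arguments must be positive). */
long gcd_l(long a, long b) {
    while (b != 0) {
        long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/*
 * Smallest positive integer solution of the balance equations for a chain
 * of n actors. Edge i connects actor i to actor i+1; the producer emits
 * prod[i] tokens and the consumer takes cons[i] tokens per invocation.
 * The repetition counts are written to r[0..n-1].
 * Assumption: chain topology only; general graphs need a full solver.
 */
void chain_repetitions(int n, const long *prod, const long *cons, long *r) {
    assert(n >= 1);
    r[0] = 1;
    for (int i = 0; i + 1 < n; i++) {
        /* We need r[i] * prod[i] == r[i+1] * cons[i]. Scale all earlier
         * counts so that the division below is exact. */
        long tokens = r[i] * prod[i];
        long scale = cons[i] / gcd_l(tokens, cons[i]);
        for (int j = 0; j <= i; j++) {
            r[j] *= scale;
        }
        r[i + 1] = r[i] * prod[i] / cons[i];
    }
    /* Divide out any common factor so the vector is minimal. */
    long g = r[0];
    for (int i = 1; i < n; i++) {
        g = gcd_l(g, r[i]);
    }
    for (int i = 0; i < n; i++) {
        r[i] /= g;
    }
}
```

For the graph of Fig. 1, calling `chain_repetitions` with `prod = {2, 1}` and `cons = {3, 2}` yields the repetition vector [3,2,1] stated in the text.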
Many researchers have developed various heuristic techniques for obtaining a SAS while minimizing data buffer size [1], implicitly treating minimizing code size as the primary objective and minimizing data buffer size as the secondary objective. However,
the data buffer size can be further reduced if we drop the SAS requirement, e.g., for the SDF graph in Fig. 1, the non-SAS 2ABABC requires a smaller data buffer size (6 tokens) than the SAS 3A2BC (8 tokens), assuming each edge has its separate buffer space without sharing. If we limit the search space to SAS only, then the code size is minimal, but the data buffer size can often be much larger than the buffer-minimal non-SAS, as we show in Section 5.

It is fairly easy to construct a dynamic SAS using runtime if decisions from a non-SAS, but the challenge is to minimize the number of runtime decisions, since a large number of runtime decisions can have a negative impact on performance, not only due to the processor time spent executing the decision statements themselves, but also due to slowdown caused by their interference with speculative execution and branch prediction mechanisms of modern processors with deep pipelines. This is especially true for digital signal processors, which are designed to execute long sequences of computation-intensive instructions, and do not handle control-flow-intensive code very well.

In this paper, we use model-checking to derive a data buffer-optimal non-SAS, and then construct a dynamic SAS while minimizing the number of runtime decisions, essentially sacrificing some runtime performance to minimize the overall memory size requirement of an application.

This paper is structured as follows: we first present a brief introduction to the model-checker NuSMV in Section 2. We then present the NuSMV models for three common variants of dataflow models in Section 3. We present our software synthesis technique from a NuSMV-generated schedule in Section 4; performance evaluation results in Section 5; related work in Section 6; conclusions and future work in Section 7.
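The buffer sizes quoted for the two schedules of Fig. 1 can be checked by replaying each firing sequence and recording the peak token count on each edge. The following C sketch does this under the separate-buffer assumption; the function name `separate_buffer_size` is hypothetical, and the token rates are hard-coded for the Fig. 1 graph.

```c
#include <assert.h>
#include <string.h>

/*
 * Replays a firing sequence for the Fig. 1 chain A -> B -> C
 * (A produces 2 tokens on eAB; B consumes 3 from eAB and produces 1 on eBC;
 * C consumes 2 from eBC) and returns the sum of the per-edge peak token
 * counts, i.e. the data buffer size with one separate buffer per edge.
 * Returns -1 if an actor fires without enough input tokens, if the string
 * contains an unknown actor, or if the graph does not end up empty again.
 */
int separate_buffer_size(const char *schedule) {
    int eAB = 0, eBC = 0;   /* tokens currently on each edge */
    int mAB = 0, mBC = 0;   /* peak token count seen on each edge */
    size_t len = strlen(schedule);

    for (size_t i = 0; i < len; i++) {
        switch (schedule[i]) {
        case 'A':
            eAB += 2;
            break;
        case 'B':
            if (eAB < 3) return -1;
            eAB -= 3;
            eBC += 1;
            break;
        case 'C':
            if (eBC < 2) return -1;
            eBC -= 2;
            break;
        default:
            return -1;
        }
        if (eAB > mAB) mAB = eAB;
        if (eBC > mBC) mBC = eBC;
    }
    return (eAB == 0 && eBC == 0) ? mAB + mBC : -1;
}
```

With this sketch, the SAS 3A2BC (firing sequence `"AAABBC"`) needs 8 tokens and the non-SAS 2ABABC (firing sequence `"AABABC"`) needs 6, matching the figures in the text.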
2 Introduction to NuSMV
NuSMV [3] is a reimplementation and extension of the original SMV model-checker from Carnegie Mellon University. NuSMV permits a number of different modeling styles [4], including synchronous or asynchronous modules, and direct specification of the FSM in terms of propositional formulas. Since we use the direct specification style in this paper, we provide a brief introduction to it, but do not discuss other modeling styles.

The set of possible initial states is specified as a formula in the current state variables. A state is initial if it satisfies the formula. The transition relation is directly specified as a propositional formula in terms of the current and next values of the state variables. Any current state/next state pair is in the transition relation if and only if it satisfies the formula. These two functions are accomplished by the INIT and TRANS keywords. Here is the NuSMV model for a hardware inverter that accepts a boolean input and produces a boolean output that is the logical inversion of the input after a non-deterministic delay:

    MODULE inverter(input)
    VAR
      output : boolean;
    INIT
      output = 0
    TRANS
      next(output) = !input | next(output) = output
According to the TRANS declaration, for each inverter, the next value of the output is equal either to the negation of the input, or to the current value of the output. Thus, in effect, each gate can choose non-deterministically whether or not to delay. The property specification language of NuSMV can be either Computation Tree Logic (CTL) or Linear Temporal Logic (LTL). For example, the CTL formula AF p specifies that, for all the paths (A) starting from a state, eventually in the future (F) condition p must hold. That is, all the possible evolutions of the system will eventually reach a state satisfying condition p. The CTL formula EF p specifies that there exists some path (E) that eventually in the future satisfies p. Other CTL formulae include AX p and EX p, which require that condition p is true in all or in some of the next states reachable from the current state.
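To make the meaning of EF and AF concrete, the sketch below evaluates both operators on a small explicit transition graph by fixpoint iteration. This is only an illustration: NuSMV represents states symbolically rather than enumerating them, and the `Kripke` type and the functions `check_ef` and `check_af` are hypothetical names introduced here. The AF computation assumes every state has at least one successor.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_STATES 16

/* Explicit transition graph: next[s][t] is true when s -> t is a transition. */
typedef struct {
    int n;
    bool next[MAX_STATES][MAX_STATES];
} Kripke;

/* EF p: least fixpoint of Z = p | EX Z.
 * out[s] becomes true when some path from s reaches a p-state. */
void check_ef(const Kripke *k, const bool *p, bool *out) {
    assert(k->n > 0 && k->n <= MAX_STATES);
    for (int s = 0; s < k->n; s++) out[s] = p[s];
    for (bool changed = true; changed; ) {
        changed = false;
        for (int s = 0; s < k->n; s++) {
            if (out[s]) continue;
            for (int t = 0; t < k->n; t++) {
                if (k->next[s][t] && out[t]) {
                    out[s] = true;
                    changed = true;
                    break;
                }
            }
        }
    }
}

/* AF p: least fixpoint of Z = p | AX Z.
 * out[s] becomes true when every path from s reaches a p-state. */
void check_af(const Kripke *k, const bool *p, bool *out) {
    assert(k->n > 0 && k->n <= MAX_STATES);
    for (int s = 0; s < k->n; s++) out[s] = p[s];
    for (bool changed = true; changed; ) {
        changed = false;
        for (int s = 0; s < k->n; s++) {
            if (out[s]) continue;
            bool has_succ = false, all_in = true;
            for (int t = 0; t < k->n; t++) {
                if (k->next[s][t]) {
                    has_succ = true;
                    if (!out[t]) all_in = false;
                }
            }
            if (has_succ && all_in) {
                out[s] = true;
                changed = true;
            }
        }
    }
}
```

On a three-state graph with transitions 0→0, 0→1, 1→2, 2→2 and p true only in state 2, EF p holds in state 0 (the path 0, 1, 2 reaches p) but AF p does not (the path that loops in state 0 forever never reaches p).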
3 Modeling Dataflow Models with NuSMV
In this section, we present NuSMV models for several common variants of dataflow models: Synchronous Dataflow (SDF) in Section 3.1, Cyclo-Static Dataflow (CSDF) in Section 3.2, Multi-Dimensional Synchronous Dataflow (MDSDF) in Section 3.3, and Boolean Dataflow (BDF) in Section 3.4. Of these variants, SDF, CSDF and MDSDF are all statically schedulable while BDF is not. In this paper, we focus on static scheduling and code synthesis for SDF, as extension to other static dataflow variants is relatively straightforward.
3.1 Synchronous Dataflow
Table 1 shows the NuSMV models for the example in Fig. 1. We focus on the separate buffer case, as the shared buffer case is self-explanatory once the separate buffer case is explained¹. A, B and C represent the three actors; eAB and eBC represent the number of data tokens on the two edges AB and BC. eAB:0..6 means that the integer variable eAB ranges from 0 to 6, i.e., we restrict the maximum number of tokens on edge eAB to 6. (We discuss how to set these bounds later.) Actor A is enabled if its output edge is not full (eAB<=4); actor B is enabled if its input edge holds enough tokens (eAB>=3) and its output edge is not full (eBC<=1); actor C is enabled if its input edge holds enough tokens (eBC>=2). (These enabling conditions have the same effect as inserting a back edge to model the number of available empty spaces on each forward edge.) mAB and mBC keep track of the maximum buffer usage on the two edges. In order to find a schedule with the minimum buffer size, we ask NuSMV to check the CTL formulas:

    SPEC AF mAB+mBC>=BUFF_SIZE+1
    SPEC AF mAB+mBC>=BUFF_SIZE

¹ For the shared buffer case, we do not consider memory fragmentation issues, i.e., the data tokens can fit in the buffer as long as the sum of the buffer sizes on all edges is less than or equal to the total buffer size. This is not true if we consider memory fragmentation caused by physical placement of data tokens with different lifetimes into memory, which is an NP-complete problem similar to bin-packing. Therefore, the buffer size obtained with model-checking should be viewed as a lower bound of the actual buffer size required.
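The enabling conditions described above can also be explored outside NuSMV on a graph this small. The following C sketch enumerates the reachable token states of the Fig. 1 graph explicitly for given per-edge capacities and looks for the smallest total capacity that admits a periodic schedule. It is an explicit-state analogue written for illustration, not the symbolic procedure used in the paper; the names `has_periodic_schedule`, `min_separate_buffer` and `CAP_MAX` are assumptions of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define CAP_MAX 16   /* largest per-edge capacity this sketch will try */

/*
 * True if, with capacities cap_ab and cap_bc on edges AB and BC, the
 * Fig. 1 graph can fire at least once and return to the empty state.
 * An actor is enabled when its input edge holds enough tokens and its
 * output edge has room for the tokens it produces.
 */
bool has_periodic_schedule(int cap_ab, int cap_bc) {
    assert(cap_ab >= 0 && cap_ab <= CAP_MAX);
    assert(cap_bc >= 0 && cap_bc <= CAP_MAX);
    bool seen[CAP_MAX + 1][CAP_MAX + 1];
    int stack[(CAP_MAX + 1) * (CAP_MAX + 1)][2];
    int top = 0;
    memset(seen, 0, sizeof seen);

    /* From the empty state only A can fire. */
    if (cap_ab < 2) return false;
    seen[2][0] = true;
    stack[top][0] = 2; stack[top][1] = 0; top++;

    while (top > 0) {
        top--;
        int ab = stack[top][0], bc = stack[top][1];
        if (ab == 0 && bc == 0) return true;   /* back at the initial state */

        int succ[3][2];
        int ns = 0;
        if (ab + 2 <= cap_ab) {                      /* fire A */
            succ[ns][0] = ab + 2; succ[ns][1] = bc; ns++;
        }
        if (ab >= 3 && bc + 1 <= cap_bc) {           /* fire B */
            succ[ns][0] = ab - 3; succ[ns][1] = bc + 1; ns++;
        }
        if (bc >= 2) {                               /* fire C */
            succ[ns][0] = ab; succ[ns][1] = bc - 2; ns++;
        }
        for (int i = 0; i < ns; i++) {
            int a = succ[i][0], b = succ[i][1];
            if (!seen[a][b]) {
                seen[a][b] = true;
                stack[top][0] = a; stack[top][1] = b; top++;
            }
        }
    }
    return false;
}

/* Smallest cap_ab + cap_bc admitting a periodic schedule, or -1 if none
 * exists with both capacities at most CAP_MAX. */
int min_separate_buffer(void) {
    for (int total = 0; total <= 2 * CAP_MAX; total++) {
        for (int cap_ab = 0; cap_ab <= total && cap_ab <= CAP_MAX; cap_ab++) {
            int cap_bc = total - cap_ab;
            if (cap_bc > CAP_MAX) continue;
            if (has_periodic_schedule(cap_ab, cap_bc)) return total;
        }
    }
    return -1;
}
```

For the Fig. 1 graph this search returns 6, which agrees with the BUFF_SIZE value used for the separate buffer case. Explicit enumeration like this does not scale to large graphs, which is the motivation for a symbolic model-checker.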
Each edge has its own separate buffer:

    MODULE main
    VAR
      eAB:0..6; eBC:0..2;
      mAB:4..6; mBC:2..2;
    INIT
      (eAB=0 & mAB=0 & eBC=0 & mBC=0)
    TRANS
      (Aenabled & next(eAB)=eAB + 2 & next(eBC)=eBC
        & (next(eAB)>mAB -> next(mAB)=next(eAB))
        & (next(eAB)<=mAB -> next(mAB)=mAB)
        & next(mBC)=mBC)
    | (Benabled & next(eAB)=eAB - 3 & next(eBC)=eBC + 1
        & next(mAB)=mAB
        & (next(eBC)>mBC -> next(mBC)=next(eBC))
        & (next(eBC)<=mBC -> next(mBC)=mBC))
    | (Cenabled & next(eAB)=eAB & next(eBC)=eBC - 2
        & next(mAB)=mAB & next(mBC)=mBC)
    DEFINE
      Aenabled:=eAB<=4;
      Benabled:=eAB>=3 & eBC<=1;
      Cenabled:=eBC>=2;
      BUFF_SIZE:=6;
    SPEC AF EX 1                   --True
    SPEC AF mAB+mBC>=BUFF_SIZE+1   --False
    SPEC AF mAB+mBC>=BUFF_SIZE     --True

All edges share one global buffer:

    MODULE main
    VAR
      eAB:0..6; eBC:0..2;
    INIT
      (eAB=0 & eBC=0)
    TRANS
      (Aenabled & next(eAB)=eAB + 2 & next(eBC)=eBC)
    | (Benabled & next(eAB)=eAB - 3 & next(eBC)=eBC + 1)
    | (Cenabled & next(eAB)=eAB & next(eBC)=eBC - 2)
    DEFINE
      Aenabled:=eAB<=4;
      Benabled:=eAB>=3 & eBC<=1;
      Cenabled:=eBC>=2;
      BUFF_SIZE:=4;
    SPEC AF EX 1                   --True
    SPEC AF eAB+eBC>=BUFF_SIZE+1   --False
    SPEC AF eAB+eBC>=BUFF_SIZE     --True
Table 1. NuSMV models.

The CTL formula φ = AF mAB+mBC>=BUFF_SIZE means "from the initial state, all possible execution paths eventually lead to a state where MemReq>=BUFF_SIZE." If it is proven false, then an execution path has been found along which the total buffer size requirement stays below BUFF_SIZE, so we decrement BUFF_SIZE to search for a tighter lower bound; otherwise, any execution path must have a buffer size requirement of at least BUFF_SIZE, so we increment BUFF_SIZE. The minimum buffer size requirement is the value BUFF_SIZE such that φ is true for BUFF_SIZE but false for BUFF_SIZE+1, and the schedule is the cycle produced by the model-checker as a counterexample, where every state on the cycle satisfies the property MemReq<BUFF_SIZE+1. [...]

For an edge whose source actor produces a tokens per invocation and whose sink actor consumes b tokens per invocation, the buffer size lower bound is

$$\mathrm{BSLB} = a + b - c \qquad (2)$$

where c = gcd(a, b). In the NuSMV model for the separate buffer case, we set buffer size lower bounds to BSLB obtained from Equation 2, e.g., the lower bound of mAB is 2+3-1=4, and that of mBC is 1+2-1=2. We set buffer size upper bounds to BSUB obtained from the balance equation, e.g., the upper bound of mAB is 6; that of mBC is 2. In the NuSMV model for both the separate buffer case and the shared buffer case, eAB and eBC have lower bounds 0 and upper bounds 6 and 2, respectively.

Our experience shows that BSUB is often quite pessimistic and grossly overestimates the actual buffer size needed on each edge. Since the variable bounds have a large impact on the system state space, we follow these steps:

1. Set buffer size upper bounds to BSUB obtained from the balance equation. If NuSMV can handle the model, then continue with the process of looking for a tight overall buffer size bound, discussed next.
2. If the state space is too large for NuSMV to handle, then set buffer size upper bounds to BSLB obtained from Equation 2. If a feasible schedule is found, then finish.
3. If NuSMV reports a deadlock, then gradually increase the buffer size bounds until no deadlock occurs.

Another modeling technique is to use the following lines to replace the SPEC parts in Table 1 (separate buffer case):

    DEFINE DONE:=(eAB=0 & eBC=0 & mAB+mBC>0);
    INVAR mAB + mBC [...]

    [...]=1;
    --the SELECT actor
    TRANS
      (Aenabled1 & next(ein1)=ein1-1 & next(ein2)=ein2 & next(eout)=eout+1)
    | (Aenabled2 & next(ein1)=ein1 & next(ein2)=ein2-1 & next(eout)=eout+1)
    DEFINE
      Aenabled1:=inb & ein1>=1;
      Aenabled2:=!inb & ein2>=1;
Figure 3. An MDSDF model and its NuSMV model.
3.4 Boolean Dataflow

Figure 4. Two common elements in a BDF model: SWITCH and SELECT, and their NuSMV models.
Boolean Dataflow (BDF) [8] allows an actor to make runtime decisions based on boolean inputs. BDF is Turing-complete, and its scheduling problems are undecidable in general. The basic mechanism used in BDF is to construct an annotated quasi-static schedule, a schedule where each firing is annotated with the Boolean conditions under which it occurs. Thus, any sequence of firings can depend on a sequence of Boolean values computed during the execution. Executing the annotated schedule involves much less overhead than executing a dynamic dataflow schedule. Fig. 4 shows the two common elements in BDF. Integer-Controlled Dataflow (ICDF) can be handled in a straightforward manner by extending the BDF model to have an integer value as the control input instead of a boolean value.
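The firing rules of the two BDF elements can be sketched as small state-update functions in C. The SELECT rule below follows the enabling conditions and token updates of the NuSMV fragment for the SELECT actor (variables ein1, ein2, eout and control input inb). The SWITCH rule is the usual mirror image; the variable names ein, eout1 and eout2 and the choice of routing a true control value to eout1 are assumptions of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Token counts on the edges around a SELECT actor. */
typedef struct {
    int ein1, ein2, eout;
} SelectState;

/* Token counts on the edges around a SWITCH actor (names are hypothetical). */
typedef struct {
    int ein, eout1, eout2;
} SwitchState;

/* SELECT: the control value inb picks which input edge supplies the token
 * copied to the output. Returns false if the chosen input edge is empty. */
bool select_fire(SelectState *s, bool inb) {
    assert(s != NULL);
    if (inb && s->ein1 >= 1) {
        s->ein1--;
        s->eout++;
        return true;
    }
    if (!inb && s->ein2 >= 1) {
        s->ein2--;
        s->eout++;
        return true;
    }
    return false;
}

/* SWITCH: the control value inb picks which output edge receives the
 * input token. Returns false if the input edge is empty. */
bool switch_fire(SwitchState *s, bool inb) {
    assert(s != NULL);
    if (s->ein < 1) return false;
    s->ein--;
    if (inb) {
        s->eout1++;
    } else {
        s->eout2++;
    }
    return true;
}
```

Because the number of tokens consumed from each edge depends on the control value, the token rates are no longer constants, which is why BDF cannot be scheduled with the balance-equation approach used for SDF.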
4 Synthesizing Efficient Code

One limitation of using model-checking for SDF scheduling is that it is not possible to search specifically for a SAS. There may be many possible schedules that all have the minimum buffer size, but NuSMV only produces one of them, the choice of which is not under user control. Unlike data buffer size, which is a property of each system state and is independent of execution history, SAS is a property of each schedule after post-processing [1]. It is not possible to encode SAS into the NuSMV model, since that would require checking at each individual state whether the execution path leading to that state is a possible prefix of a SAS, which is impossible since states are represented symbolically instead of explicitly. In this section, we compensate for model-checking's inability to generate a static SAS by synthesizing SAS code with dynamic runtime decisions from the non-SAS.

Given a potentially very long non-single appearance schedule (non-SAS) generated by NuSMV for an SDF graph, we face the problem of implementing it on a processor that is resource-constrained in terms of both memory size and processing speed. Several implementation alternatives are possible for a given schedule ABCDBCDC:

1. Use inline code to implement a non-SAS schedule:

    A();B();C();D();B();C();D();C();

   (We use the notation A() to denote the inline code of actor A, not a procedure call.) This approach has minimal runtime overhead, but is not feasible in general since the code size will be prohibitively large for a long schedule.

2. Turn each actor invocation into a procedure, and use procedure calls to ensure that each actor body only appears once in the code. This approach carries some runtime performance overhead due to procedure invocation, which is typically not acceptable for DSP applications. It may also result in increased code size. Even though the code size expansion is not as large as for the non-SAS inline schedule, it can still be significant if the schedule is very long.
    Index  0  1  2  3  4  5  6  7
    Actor  A  B  C  D  B  C  D  C

Table 2. The schedule ABCDBCDC with index numbers.

further reduce the runtime overhead by using bit-shifting and bitwise comparison instead of integers and boolean comparison:
As a third alternative, we present a systematic approach to generating a dynamic SAS from a given non-SAS schedule sequence while minimizing the number of runtime decisions by extending the work of Bjorklund [9]. Section 4.1 describes the basic approach of [9]; Section 4.2 presents our first extension to reduce the number of runtime decisions by actor grouping, and Section 4.3 presents our second extension to handle long schedules.
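To make the notion of a dynamic SAS concrete, the C sketch below replays the schedule ABCDBCDC with each actor body appearing once in the loop, guarded by a runtime test on the schedule index. It is a deliberately naive illustration and not the code generator described in this paper or in [9]: it performs four tests per index. The mask constants and the function `run_dynamic_sas` are hypothetical, and each actor body is replaced by a stand-in that records the actor's name.

```c
#include <assert.h>
#include <stddef.h>

#define SCHEDULE_LEN 8

/* Bit i of each mask is set when the actor fires at schedule index i
 * (schedule ABCDBCDC, indices 0..7). */
static const unsigned MASK_A = 1u << 0;
static const unsigned MASK_B = (1u << 1) | (1u << 4);
static const unsigned MASK_C = (1u << 2) | (1u << 5) | (1u << 7);
static const unsigned MASK_D = (1u << 3) | (1u << 6);

/*
 * Runs one iteration of the schedule. Each actor appears exactly once in
 * the loop body, so the code is single-appearance, at the cost of runtime
 * decisions. The firing order is written to out, which must hold at least
 * SCHEDULE_LEN + 1 characters.
 */
void run_dynamic_sas(char *out) {
    assert(out != NULL);
    size_t n = 0;
    for (unsigned i = 0; i < SCHEDULE_LEN; i++) {
        unsigned bit = 1u << i;
        if (MASK_A & bit) out[n++] = 'A';   /* inline body of A would go here */
        if (MASK_B & bit) out[n++] = 'B';   /* inline body of B */
        if (MASK_C & bit) out[n++] = 'C';   /* inline body of C */
        if (MASK_D & bit) out[n++] = 'D';   /* inline body of D */
    }
    out[n] = '\0';
}
```

Since exactly one mask has each bit set, the loop reproduces the original firing order. It evaluates 32 conditions for 8 firings, which illustrates why reducing the number of runtime decisions matters.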
4.1 Basic Dynamic SAS
Table 2 shows the schedule ABCDBCDC, where each actor invocation is numbered based on its position in the schedule for easy reference. A naive way of generating a dynamic SAS schedule is: for(int i=0; i