
Embedded code optimization via common control structure detection

Luciano Lavagno, Cadence Berkeley Laboratories

Jordi Cortadella, Universitat Politècnica de Catalunya

Alberto Sangiovanni-Vincentelli, University of California, Berkeley

Abstract

This paper addresses the problem of efficient code generation for embedded reactive real-time systems. Such systems have tight memory-size and execution-speed constraints. A method based on the theory of Petri net synthesis is proposed, aimed at reducing the size of the code by exploiting common control structures. Experimental results show that significant improvements can be obtained.

1 Introduction

In this paper we address the problem of synthesizing efficient software for embedded reactive real-time systems. Such systems are in general composed of software and hardware components, and the software has tight memory-size and execution-speed constraints.

The use of FSMs for embedded control specification offers several advantages over apparently more powerful formalisms (such as unrestricted programming languages). First of all, they are easily understood and widely used even as informal specifications. Secondly, there are abundant theoretical and practical results concerning their manipulation (minimization, encoding, formal verification of properties, ...). It is customary to extend them with the capability to perform assignments of expressions to variables, and to use comparisons to determine transition conditions.

The purpose of this paper is to describe algorithms for an optimizing compiler from an FSM specification to object code on a micro-controller ([4]). This compiler is not to be compared against traditional compilers for a programming language like C or Pascal, because we are solving a much simpler and more restricted problem. But exactly for this reason we can afford to perform optimizations that are either impossible or simply too expensive in the general case ([1]).

We use, like most compilation strategies, a control/data-flow diagram (called s-graph, for software graph, in the following) as an intermediate data structure [4]. The s-graph is simpler than general control/data-flow diagrams, because it needs only to represent a single function from a discrete domain (the set of FSM inputs) to a discrete domain (the set of FSM outputs). As such, it requires only two primitives: conditional branch and assignment (using arithmetic and relational expressions without side effects). An s-graph is a directed graph with the following nodes:

• a source node, with a single child,
• a sink node, without children,
• ASSIGN nodes, that evaluate a function of the FSM variables, assign it to an FSM variable, and pass control to their single child,
• TEST nodes, that evaluate a function of the FSM variables, and pass control to one of their children depending on the result of the function (each node has as many children as output values of the function).

* This work has been funded by MURST project “VLSI Architectures” and CICYT TIC95-0419.


This simple representation has a straightforward translation into C and can be translated with equal ease into object code by any available compiler. The execution time of the code then roughly depends on the length of a path from source to sink (see footnote 1) and its size roughly depends on the number of edges in the s-graph.

Our software synthesis procedure is composed of the following main steps:

1. Translation of a given FSM into an s-graph.
2. S-graph size minimization.
3. Translation of the s-graph into a target language.

Steps 1 and 3, which have been described in [4], use Binary Decision Diagrams ([3]) as an intermediate representation, to generate a very fast initial s-graph, potentially at the expense of code size. Step 2, which is the subject of this paper, uses a novel algorithm, based on the theory of Petri net synthesis [8, 5]. Its aim is to reduce the size of the s-graph by exploiting commonalities between subgraphs that cannot be used by BDD-based optimization techniques, because they yield cyclic s-graph structures, while BDDs are inherently acyclic. Section 4 shows an example of an s-graph that cannot be optimized, while preserving its execution semantics, without introducing cycles.
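As an illustration of the four node types, the following C sketch stores an s-graph as an array of nodes and walks it from source to sink. The names `sg_node`, `sg_run` and `example_graph`, the fixed bound on children, and the small example reaction are assumptions made for this sketch; they are not taken from [4].

```c
#include <assert.h>
#include <stddef.h>

/* Node kinds of an s-graph, as listed in the introduction. */
typedef enum { SG_SOURCE, SG_SINK, SG_ASSIGN, SG_TEST } sg_kind;

#define SG_MAX_CHILDREN 4  /* illustrative bound, not from the paper */

typedef struct {
    sg_kind kind;
    int (*fn)(const int *vars);  /* ASSIGN: value to store; TEST: child to take */
    int target;                  /* ASSIGN only: index of the assigned variable */
    int nchildren;
    int child[SG_MAX_CHILDREN];  /* indices into the node array */
} sg_node;

/* Walk the graph from node `start` to the sink.
   Returns the number of edges traversed (the path length). */
int sg_run(const sg_node *g, int start, int *vars)
{
    int steps = 0;
    int cur = start;
    while (g[cur].kind != SG_SINK) {
        int next = 0;  /* source and ASSIGN nodes have a single child */
        if (g[cur].kind == SG_ASSIGN) {
            vars[g[cur].target] = g[cur].fn(vars);
        } else if (g[cur].kind == SG_TEST) {
            next = g[cur].fn(vars);
            assert(next >= 0 && next < g[cur].nchildren);
        }
        cur = g[cur].child[next];
        steps++;
    }
    return steps;
}

/* Hypothetical reaction: vars[1] = (vars[0] > 0) ? vars[0] + 1 : 0. */
int test_in_positive(const int *v) { return v[0] > 0; }
int in_plus_one(const int *v)      { return v[0] + 1; }
int zero(const int *v)             { (void)v; return 0; }

const sg_node example_graph[] = {
    /* 0 */ { SG_SOURCE, NULL,             0, 1, {1}    },
    /* 1 */ { SG_TEST,   test_in_positive, 0, 2, {2, 3} },
    /* 2 */ { SG_ASSIGN, zero,             1, 1, {4}    },
    /* 3 */ { SG_ASSIGN, in_plus_one,      1, 1, {4}    },
    /* 4 */ { SG_SINK,   NULL,             0, 0, {0}    },
};
```

Calling `sg_run(example_graph, 0, vars)` with `vars[0]` set to the input performs one reaction. The returned edge count is the path length on which the execution time roughly depends.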

2 S-graphs and Petri Nets

A PN [9, 7] is a 4-tuple $(P, T, F, m_0)$ in which $P$ is a set of places, $T$ is a set of transitions, $F \subseteq (T \times P) \cup (P \times T)$ is the flow relation ($(P, T, F)$ is a bipartite directed graph), and $m_0$ is the initial marking. A marking is a multi-set of places. The cardinality of a place in a given marking is also called the number of tokens assigned to the place in that marking. A PN transition may fire when all its predecessor places have at least one token. When it fires, it decrements the markings of all its predecessors and increments the markings of all its successors. The behavior of a PN is the set of transition firing sequences from $m_0$. A PN is said to be safe if no place can have a marking greater than one in any of its reachable markings. In this paper we only deal with safe PNs.

Given a place $p \in P$, the marked set of $p$, $M_p$, is the set of markings in which $p$ has a value greater than zero. A PN is called minimal saturated if it contains all minimal places, that is all places $p$ such that there exists no $p'$ such that $M_{p'} \subset M_p$. Two places are disjoint if they can never be marked together in any marking reachable from $m_0$ by firing any sequence of transitions. Any PN can be transformed into a minimal saturated PN by performing reachability analysis and adding to it all the minimal places that do not change its behavior. A PN is called place-irredundant if no place can be removed from it without changing its behavior (see footnote 2). A State Machine is a PN such that every transition has at most one predecessor and one successor.

We want to exploit the powerful synthesis algorithms that have been developed for the synthesis of Petri Nets in order to minimize the size of a given s-graph. In order to do so, we need to define a correspondence between PNs and s-graphs. An s-graph can be interpreted as a PN in which each node is a place and each edge is a transition. An ASSIGN node is a simple single-predecessor, single-successor transition.
A TEST node is a place with several successor transitions, one for each possible value of the tested function, each with one successor place. It should be obvious from this definition that a PN derived from an s-graph is a State Machine.

The problem of synthesizing place-irredundant Petri nets was already solved in [5]. The algorithms given there minimize the number of transitions in the PN, and hence they can be used in order to minimize the code size of the s-graph, since transitions in the PN correspond to edges in the s-graph. Unfortunately, there is no guarantee that the minimized PN still has an efficient implementation as an s-graph. One mechanism for ensuring the existence of this backward mapping, from PNs to s-graphs, is via the notion of State Machine coverability.

Definition 2.1 (SM Component [6]) A State Machine Component $N_1$ of a Petri net $N$ is defined as a connected subnet of $N$ with the following properties:

1. Each transition of $N_1$ has exactly one input and one output edge.
2. All input and output transitions of a place in $N_1$ (and their connecting arcs) also belong to $N_1$.

Property 2.1 Given a minimal saturated Petri net $N$ obtained as described in [2], any set of disjoint minimal places such that at least one of them is marked in any marking reachable from $m_0$ defines an SM-component of $N$.

Definition 2.2 (SM-coverable Petri net) A Petri net $N$ is said to be SM-coverable if any place of the net belongs to an SM-component of $N$.

From Property 2.1 and from the theory of minimal saturated PNs presented in [2], it can be easily deduced that any minimal saturated Petri net is SM-coverable.

Footnote 1: We require that there are no data-dependent loops inside an s-graph, in order to guarantee a deterministic delay bound.

Footnote 2: Roughly speaking, a minimal saturated net is analogous to the set of prime implicants of a logic function, while a place-irredundant net is analogous to an irredundant set of primes.
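The firing rule for safe PNs defined in this section can be sketched in C as follows. Because a safe net holds at most one token per place, a marking is stored here as a bitmask. The bitmask representation and the names `marking`, `transition`, `pn_enabled`, `pn_fire` and `is_sm_transition` are assumptions of this sketch, not part of the synthesis tools cited above.

```c
#include <assert.h>
#include <stdint.h>

/* Marking of a safe PN: bit i is set when place p_i holds its token. */
typedef uint32_t marking;

typedef struct {
    marking pre;   /* predecessor places of the transition */
    marking post;  /* successor places of the transition */
} transition;

/* A transition may fire when all its predecessor places are marked. */
int pn_enabled(marking m, transition t)
{
    return (m & t.pre) == t.pre;
}

/* Firing removes one token from each predecessor and adds one to each
   successor. */
marking pn_fire(marking m, transition t)
{
    marking rest;
    assert(pn_enabled(m, t));
    rest = m & ~t.pre;
    assert((rest & t.post) == 0);  /* a second token would break safeness */
    return rest | t.post;
}

/* State Machine condition: at most one predecessor and one successor. */
int is_sm_transition(transition t)
{
    /* x & (x - 1) clears the lowest set bit, so 0 means at most one bit. */
    return (t.pre & (t.pre - 1)) == 0 && (t.post & (t.post - 1)) == 0;
}
```

A net derived from an s-graph should satisfy `is_sm_transition` for every transition, whereas a fork transition with two successor places would not.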

3 S-graph optimization algorithm

An SM cover of a PN can be used as a basis for s-graph (and hence code) generation as follows.

1. One SM component (typically the largest one) is implemented by using the s-graph traversal semantics from source to sink (i.e., it is implemented by the Program Counter of the CPU on which the synthesized code is executed).
2. The places not covered by this component are implemented by variables, that are incremented, decremented and tested to traverse the synthesized s-graph in a fashion that exactly mimics the PN firing rule. Hence it preserves the same sequence of ASSIGNs and TESTs as the original, un-minimized s-graph.

It is therefore extremely important to minimize the number of these variables, as well as the number of their assignments and tests, because they become additional costs that are not directly taken into account by the original PN minimization procedure.

The algorithm works as follows:

1. Generate all minimal places of the PN and a place-irredundant safe Petri net $N$ as proposed in [5].
2. For each place $p$ of $N$ do:
   • Find a maximal subset SM of disjoint places of $N$ that do not intersect with $p$.
   • If SM does not completely cover all reachable markings, add some disjoint redundant minimal places to $N$ until all reachable markings are covered (i.e. an SM-component is formed).
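The covering condition used in step 2 can be checked directly once the reachable markings are known. The C sketch below assumes markings and place sets are given as bitmasks and tests whether a candidate set holds exactly one token in every reachable marking, that is, whether its places are pairwise disjoint and cover all markings. The function names and the four-place fork/join example are hypothetical; the sketch does not implement the search for a maximal subset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of places of `set` that are marked in marking m. */
static int tokens_in(uint32_t m, uint32_t set)
{
    int n = 0;
    uint32_t x;
    for (x = m & set; x != 0; x &= x - 1)
        n++;
    return n;
}

/* Returns 1 when exactly one place of `set` is marked in each of the n
   reachable markings, i.e. the places are disjoint and cover them all. */
int is_one_token_cover(const uint32_t *reach, size_t n, uint32_t set)
{
    size_t i;
    assert(reach != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (tokens_in(reach[i], set) != 1)
            return 0;
    return 1;
}

/* Hypothetical fork/join net: {p0} -> {p1, p2} -> {p3}. */
const uint32_t example_reach[] = { 0x1, 0x6, 0x8 };
```

For this example the set {p0, p1, p3} (mask `0xB`) passes the check, while {p0, p1, p2, p3} fails because p1 and p2 are marked together, and {p0, p1} fails because it leaves the marking {p3} uncovered.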

The result of the previous synthesis algorithm will be an SM-coverable Petri net in which each SM-component will only have one token, i.e. only one place of each SM-component will be marked in every reachable marking.

Given a Petri net $N$ that can be covered by a set of SM-components, $SM_0, \ldots, SM_{k-1}$, with at most one token per SM-component, each reachable marking can be represented as a vector $m = (m_0, \ldots, m_{k-1})$, where $m_i$ is the place marked in $SM_i$ at marking $m$. For each SM-component $SM_i = \{p_1, \ldots, p_l\}$, an encoding function $C_i : SM_i \to \mathbb{N}$ can be defined in such a way that each marking $m$ can now be represented as a vector of codes $m = (C_0(m_0), \ldots, C_{k-1}(m_{k-1}))$. We will call this vector an SM-vector.

The information supplied by each SM-component of a Petri net can be redundant with regard to the information provided by the other SM-components. This becomes obvious in those reachable markings in which two or more SM-components share some marked place. In the framework of code generation to simulate the execution of a Petri net, a compact encoding for SM-components contributes to the reduction of operations for updating the SM-vector that represents the current marking.

The encoding strategy previously proposed can be further optimized by taking into account the contribution of each SM-component to the distinguishability of different markings. The following procedure is proposed to calculate a compact encoding for the places of $SM_i$, assuming that the SM-components $SM_0, \ldots, SM_{i-1}$ have already been encoded:

1. Build a compatibility graph in which each vertex represents a place of $SM_i$. There is an edge between two places $p_1$ and $p_2$ if for all pairs $(m_1, m_2)$ of reachable markings such that $p_1$ is marked in $m_1$ and $p_2$ is marked in $m_2$, their sub-markings with respect to $SM_0, \ldots, SM_{i-1}$ are different.

[Figure 1, panels (a) and (b): graph drawings not reproduced. Node labels: BEGIN, END, places p0 to p7, tests on a and b, operations op1, op2 and op3.]

(c)

    begin: sm1 = sm2 = 0;
    p0:    if (!a) { sm2 = 1; goto p3; }
    p4:    op2; sm1 = 1;
           if (b) { sm2 = 0; goto p7; }
           sm2 = 1;
    p3:    op1;
    p7:    if (sm1) op3;
           if (!sm2) op2;
    end:

Figure 1: (a) S-graph, (b) Petri net, (c) Optimized code.

2. Find a clique partitioning of the compatibility graph. Each clique represents a subset of places that can have the same code.

Informally, if two places do not contribute to the distinguishability of markings they can share the same code. The previous procedure can still be improved if priority is given to sharing codes between adjacent places (i.e. places that correspond to adjacent regions in the transition system). With this strategy, the number of changes on the SM-vector when transitions are fired is also reduced.

The overall complexity of the present approach is dominated by the synthesis of PNs from s-graphs. In the worst case, this technique is exponential in the size of the s-graph, but it manifests linear complexity for most practical cases ([5]).
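The two-step encoding procedure (compatibility graph, then clique partitioning) can be sketched in C under some simplifying assumptions. Each reachable marking is reduced to a pair: an integer `prefix` that identifies its sub-marking on the already encoded components, and the place of the current component that it marks. The clique partitioning shown is a simple greedy heuristic chosen for brevity, since the text does not say which partitioning method is used, and the priority for adjacent places is not modeled. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One reachable marking as seen from the component being encoded:
   `prefix` identifies its sub-marking on the earlier components,
   `place` is the index of the place of this component that it marks. */
typedef struct { int prefix; int place; } sm_marking;

/* Edge (p, q) of the compatibility graph: no marking that holds p has the
   same prefix as a marking that holds q. */
static int compatible(const sm_marking *ms, int n, int p, int q)
{
    int i, j;
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            if (ms[i].place == p && ms[j].place == q &&
                ms[i].prefix == ms[j].prefix)
                return 0;
    return 1;
}

/* Greedy clique partitioning (a heuristic, not necessarily minimum).
   code[p] receives the clique index of place p, used as its code.
   Returns the number of distinct codes. */
int encode_places(const sm_marking *ms, int n, int nplaces, int *code)
{
    int ncodes = 0;
    int p, q, c;
    assert(ms != NULL && code != NULL);
    for (p = 0; p < nplaces; p++) {
        for (c = 0; c < ncodes; c++) {
            int ok = 1;
            /* p may join clique c only if compatible with all its members */
            for (q = 0; q < p; q++)
                if (code[q] == c && !compatible(ms, n, p, q))
                    ok = 0;
            if (ok)
                break;
        }
        code[p] = c;
        if (c == ncodes)
            ncodes++;
    }
    return ncodes;
}
```

For example, with markings `{0,0}, {1,1}, {1,2}, {2,0}` over three places, places 1 and 2 occur with the same prefix and must receive different codes, while place 0 can share a code with place 1.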

4 Example

Figure 1 depicts a complete synthesis example. Let us assume that op1, op2 and op3 are complex operations in the s-graph. By synthesizing a safe PN from the s-graph, only one transition per operation is achieved. The state machines covering the Petri net are defined by the following sets of places: $SM_0 = \{p_0, p_3, p_4, p_7\}$, $SM_1 = \{p_0, p_2, p_6\}$ and $SM_2 = \{p_0, p_1, p_3, p_5\}$. $SM_0$ is implemented by the program counter, whereas the other two state machines are implemented by integer variables (sm1 and sm2). In this case, the following encoding has been used for the places: in $SM_1$ places $p_0$ and $p_2$ are represented by the value 0 and $p_6$ by 1, and in $SM_2$ places $p_0$ and $p_1$ by 0 and $p_3$ and $p_5$ by 1. Note that in the final code, only one instance of op1 and op3 is generated, at the expense of adding some overhead based on simple assignments and conditional branches, supposedly simpler than the saved code for the complex operations.
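To make the trade-off concrete, the following C sketch runs the optimized code of Figure 1(c) next to a tree-shaped version and compares the sequences of operations they perform. The operations are replaced by stubs that only record their number. The tree-shaped version is an assumption: it was obtained by unfolding the optimized code by hand, not copied from Figure 1(a).

```c
#include <assert.h>
#include <string.h>

/* Trace of executed operations: op('1') stands for op1, and so on. */
static char trace[8];
static int trace_len;

static void op(char id)
{
    assert(trace_len < (int)sizeof trace - 1);
    trace[trace_len++] = id;
    trace[trace_len] = '\0';
}

/* Optimized code of Figure 1(c), with the operations replaced by stubs. */
const char *run_optimized(int a, int b)
{
    int sm1, sm2;
    trace_len = 0;
    trace[0] = '\0';
    sm1 = sm2 = 0;
    if (!a) { sm2 = 1; goto p3; }
    op('2'); sm1 = 1;
    if (b) { sm2 = 0; goto p7; }
    sm2 = 1;
p3: op('1');
p7: if (sm1) op('3');
    if (!sm2) op('2');
    return trace;
}

/* Tree-shaped version, unfolded by hand from the code above (assumption):
   op1 and op3 each appear twice here. */
const char *run_unfolded(int a, int b)
{
    trace_len = 0;
    trace[0] = '\0';
    if (!a) {
        op('1');
    } else {
        op('2');
        if (b) { op('3'); op('2'); }
        else   { op('1'); op('3'); }
    }
    return trace;
}

/* Returns 1 if both versions perform the same operations for inputs a, b. */
int same_trace(int a, int b)
{
    char expected[sizeof trace];
    strcpy(expected, run_unfolded(a, b));
    return strcmp(expected, run_optimized(a, b)) == 0;
}
```

In this sketch the variables sm1 and sm2 decide, after the shared op1, whether op3 and op2 still have to run, which is what allows a single copy of op1 and op3 to serve every path.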

5 Experimental results

We have implemented the algorithms described in the previous sections, and we have applied them to several FSMs that are part of a dashboard controller. The results are summarized in Table 1. The columns labeled “original” and “minimized” contain the number of bytes of executable code generated from the original and the minimized s-graph respectively. The last column reports the PN minimization time (all other algorithm steps are linear in the size of the s-graph). A maximum of 2 additional variables (each requiring 1 byte) have been added to each s-graph to represent the SM-based encoding of places.

Table 1: Results of optimization of dashboard controller

task: belt cross display clock divisor quad2sign coil switch pwm fuel normalize odometer speedometer frc timer
original (bytes): 232 5856 118 360 497 612 550 291 225 366 222 1228
minimized (bytes): 216 5060 110 450 479 699 469 365 195 342 193 1962
CPU sec: 0.44 1.70 0.04 0.11 0.12 0.32 0.35 0.05 0.06 0.04 0.07 53.64

The results are still preliminary, because we are not fully exploiting all the encoding optimizations (for example, using increment and decrement operations instead of variable assignments). Moreover, we are not taking into account that in some cases the advantage of eliminating one transition (s-graph node) may be canceled by the overhead due to testing and assigning to auxiliary variables.

The most significant improvement can be seen in example cross display, which contains a relatively large number of similar arithmetic operations. Thus the optimization that is obtained by PN minimization in this case is related to common subexpression factoring, but is more powerful, because it can also take into account common control sub-structures.

The cases in which the cost actually increases are due to the fact that PN minimization takes into account only the cost of the transitions, i.e. of the ASSIGNs and TESTs of the original s-graphs. In some cases, the cost of the variables, ASSIGNs and TESTs added to implement the SM components exceeds the saving due to the reduction in the number of transitions.
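The byte counts of Table 1 can be summarized with a few lines of C. The sketch below copies the “original” and “minimized” columns in row order and counts how many rows grow or shrink. The names `size_summary` and `summarize` are hypothetical, and no mapping from rows to task names is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Byte counts copied from Table 1, in row order. */
const int original_bytes[]  = {232, 5856, 118, 360, 497, 612,
                               550, 291, 225, 366, 222, 1228};
const int minimized_bytes[] = {216, 5060, 110, 450, 479, 699,
                               469, 365, 195, 342, 193, 1962};
#define N_ROWS (sizeof original_bytes / sizeof original_bytes[0])

typedef struct {
    int total_before;  /* sum of the "original" column */
    int total_after;   /* sum of the "minimized" column */
    int grew;          /* rows whose code size increased */
    int shrank;        /* rows whose code size decreased */
} size_summary;

size_summary summarize(const int *before, const int *after, size_t n)
{
    size_summary s = {0, 0, 0, 0};
    size_t i;
    assert(before != NULL && after != NULL);
    for (i = 0; i < n; i++) {
        s.total_before += before[i];
        s.total_after  += after[i];
        if (after[i] > before[i])
            s.grew++;
        else if (after[i] < before[i])
            s.shrank++;
    }
    return s;
}
```

On these figures eight rows shrink and four grow, and the totals are 10557 bytes before and 10540 bytes after minimization. This is consistent with the remark that the overhead of auxiliary variables can cancel the savings.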

6 Conclusions

We have shown that Petri net transformations can be used in order to optimize the size of embedded code. The transformations exploit similarities in the control structure, and are particularly effective when the cost of the “factored” basic blocks is much larger than the bookkeeping overhead. In the future we are planning to explore more aggressive encoding schemes, and to consider the cost of adding auxiliary variables when deciding whether to merge code.

References

[1] A. V. Aho, R. Sethi, and J. D. Ullman. Compilers: Principles, Techniques and Tools. Addison-Wesley, 1988.
[2] E. Badouel, L. Bernardinello, and Ph. Darondeau. Polynomial algorithms for the synthesis of bounded nets. In TAPSOFT ’95: Theory and Practice of Software Development, 1995.
[3] R. Bryant. Graph-based algorithms for boolean function manipulation. IEEE Transactions on Computers, C-35(8):677–691, August 1986.
[4] M. Chiodo, P. Giusto, H. Hsieh, A. Jurecska, L. Lavagno, and A. Sangiovanni-Vincentelli. Synthesis of software programs from CFSM specifications. In Proceedings of the Design Automation Conference, June 1995.
[5] J. Cortadella, M. Kishinevsky, L. Lavagno, and A. Yakovlev. Synthesizing Petri nets from state-based models. In Proceedings of the International Conference on Computer-Aided Design, November 1995.
[6] M. Hack. Analysis of production schemata by Petri Nets. Technical Report TR 94, Project MAC, MIT, 1972.
[7] T. Murata. Petri Nets: Properties, analysis and applications. Proceedings of the IEEE, pages 541–580, April 1989.
[8] M. Nielsen, G. Rozenberg, and P.S. Thiagarajan. Elementary transition systems. Theoretical Computer Science, 96:3–33, 1992.
[9] C. A. Petri. Kommunikation mit Automaten. PhD thesis, Bonn, Institut für Instrumentelle Mathematik, 1962. (technical report Schriften des IIM Nr. 3).
