On the False Path Problem in Hard Real-Time Programs
Peter Altenbernd
C-LAB
D-33095 Paderborn, Germany
[email protected], http://www.cadlab.de/peter/

Abstract
This paper addresses the important subject of estimating the worst-case execution time (WCET) of hard real-time programs, an estimate that is essential for the further evaluation of real-time systems. Purely structure-oriented methods, which analyse the control flow of the program without taking functional dependencies into account, tend to overestimate the execution time. An exact solution of this NP-complete problem is infeasible for larger applications. In this paper, we propose a new heuristic for finding an estimate of the WCET. It provides a reasonable trade-off between the quality of the analysis results and the analysis effort: the results are still better than those of purely structure-oriented methods, without spending too much time on finding an exact solution. Our approach needs no user annotations other than maximum loop counts and maximum recursion depths. The actual algorithm combines pruned path enumeration with the concept of symbolic execution.
1. Introduction
Predicting the execution times of programs with static timing analysis tools is a fundamental problem in hard real-time systems. It is important for system design, especially for schedulability testing (e.g. [17, 1]), which relies on the values delivered by static timing analysis. Most recent work on this subject focuses on analytical methods for extracting the worst-case execution time (WCET) from a given program.

One simple approach is to find the longest structural path in the control flow of the program and to take its length as the worst-case execution time. The disadvantage of this method is that it ignores mutually exclusive control flow branches, which results in overestimated values, as demonstrated by the example of Fig. 1. Because of the two if-statements, the second of which depends on the first, the program runs either through block a and then through d, or through b and then through c.

    begin
      if E < 0 then
        /* block a = 251 */
        condition := FALSE;
        x := E + 45;
      else
        /* block b = 229 */
        condition := TRUE;
        x := -E;
      end;
      if condition then
        /* block c = 412 */
        result := x DIV y;
      else
        /* block d = 352 */
        result := y;
      end;
    end.

Figure 1. A simple example program

However, a purely structure-oriented analysis would compute the worst-case execution time as MAX(a, b) + MAX(c, d) = 251 + 412 = 663, instead of the true value 641 = MAX(a + d, b + c) = MAX(603, 641). Hence, the worst-case execution time is given not by the longest structural path (a, c) but by the longest executable path (b, c), sometimes referred to as a “feasible” path. In the following, we refer to non-executable paths such as (a, c) or (b, d) as false paths (sometimes called “dead” paths), and to the problem of finding the longest executable path in a real-time program as the false path problem. The false path problem is NP-complete; an exact solution would in general require considering all possible combinations of input values, so exact approaches cannot be used for larger applications.

In this paper, we propose a new heuristic for the false path problem that does not require user annotations on mutually exclusive control flow branches. It provides a reasonable trade-off between the quality of the analysis results and the analysis effort: the results are still better than those of purely structure-oriented methods, without spending too much time on finding the exact solution. Basically, our method combines path enumeration with pruning (for structural analysis) and symbolic execution (for executability analysis). Our paper addresses the determination of the worst-case execution time under the following assumptions:
We use the term “program” in the sense of a sequential program. Our method is devoted to hard real-time programs, i.e. the code to be analysed is quite restricted: maximum loop counts must be known in advance, as must maximum function recursion depths. Since we deal with programs automatically generated from differential equations, we can easily fulfill these restrictions. Our analysis operates on an abstract description of the program: the control flow graph (CFG), which is generated at compile time. The computation time of every block of basic statements is assumed to be known in advance. Caching and pipelining effects are beyond the scope of this paper. We focus on worst-case times only, although our method can be applied to obtain best-case times as well.
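To make the gap between structural and executable paths in Fig. 1 concrete, the following minimal Python sketch follows the control flow of the example for a range of inputs and compares the purely structural estimate with the exact worst case. It is an illustration only, not part of the proposed method: the function names are hypothetical, the block costs are the ones annotated in Fig. 1, and exhaustive input enumeration is only practical here because the sign of E is the only thing that influences the control flow.

```python
from itertools import product
from typing import Iterable, Set, Tuple

# Block costs as annotated in Fig. 1.
COST = {"a": 251, "b": 229, "c": 412, "d": 352}


def executed_blocks(E: int) -> Tuple[str, str]:
    """Follow the control flow of Fig. 1 for input E; return the two blocks visited."""
    if E < 0:
        first, condition = "a", False      # block a: condition := FALSE
    else:
        first, condition = "b", True       # block b: condition := TRUE
    second = "c" if condition else "d"     # if condition then c else d
    return first, second


def structural_wcet() -> int:
    """Longest structural path: both if-statements are maximised independently."""
    return max(COST["a"], COST["b"]) + max(COST["c"], COST["d"])


def executable_paths(inputs: Iterable[int]) -> Set[Tuple[str, str]]:
    return {executed_blocks(E) for E in inputs}


def exact_wcet(inputs: Iterable[int]) -> int:
    """Exact WCET by exhaustive input enumeration (only feasible for toy programs)."""
    return max(COST[p] + COST[q] for p, q in executable_paths(inputs))


inputs = range(-5, 6)                      # the sign of E is all that matters here
false_paths = set(product("ab", "cd")) - executable_paths(inputs)
print(structural_wcet(), exact_wcet(inputs))   # 663 641
print(sorted(false_paths))                     # [('a', 'c'), ('b', 'd')]
```

For realistic programs the input space cannot be enumerated like this, which is exactly why a heuristic is needed.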
The remainder of the paper is structured as follows: the next section relates our approach to others. In Section 3, we introduce our new method of finding the longest executable path. Subsection 3.1 describes our model of the control flow graph, Subsections 3.2 and 3.3 deal with the particular problems of loops and function calls, and Subsection 3.4 presents the algorithm. Section 4 presents some experimental results. Finally, Section 5 draws some conclusions.
2. Related Work
Early work on determining the worst-case execution time focused mainly on measuring program runtimes on the target machine, as in [9] or in the dual loop benchmark paradigm [4]. In this method, the program to be analysed is embedded in a loop, which is executed a number of times. In this way, a maximum or average runtime of the program can be measured. The disadvantages of this approach are obvious: the results are input-dependent, and the system overhead cannot be determined accurately. Simulating the target processor including its software [5] is another non-analytical approach; it suffers from essentially the same disadvantages as the loop benchmark paradigm and, in addition, from the slow speed of the simulator. Among the analytical methods, the simplest technique is to find the longest structural path in the control flow graph
(e.g. [15]). As mentioned in Section 1, this method tends to overestimate the worst-case execution time, which results in severe processor underutilisation; our method considerably improves on this. More sophisticated analytical methods, which try to find the longest executable path, have to cope with the huge complexity of the false path problem. Techniques that try to solve the problem exactly, like [16], are only applicable to very limited program sizes. To overcome the complexity problem, some methods require timing annotations to be added by the user, like [7], which considers functional and timing analysis simultaneously. These methods typically improve the runtime behaviour of the analysis while still delivering very accurate results. Their disadvantage is that the user has to be an expert on the internals of both the static timing analysis tool and the program code. In contrast, our approach does not need any user knowledge about the timing of the real-time program. This reflects the fact that many of today's real-time programs are automatically synthesised. In [12], another user-annotated method, based on integer linear programming (ILP), is proposed. This method is very well suited for modelling the whole system, including the processor architecture and the annotations. Nevertheless, its results are not better than those of a purely structural method in which the annotations are used to manipulate the control flow graph. Furthermore, this method is only applicable to relatively small examples, since larger examples create huge ILPs, which are difficult to solve at reasonable cost. In contrast, our method is based on the control flow graph, and its heuristic nature makes it applicable to larger examples as well. Most recent work in the field of static timing analysis of real-time programs focuses on the impact of pipelining [11] and caching [14] strategies.
Though these are also important problems, our approach emphasises the false path problem. However, the concepts for handling pipelines and caches can be included in our method as well. There is a similar false path problem in hardware development, where the goal is to find the longest executable path in a network of logic gates. This problem was first addressed in 1987 [6] and proved to be NP-complete in [13]. In contrast to the real-time field, this problem also has to deal with signal delay times, which makes it more difficult to solve. On the other hand, a digital circuit only has to deal with logic signal values (variables), not with the more complex data types of real-time programs, which makes the problem easier to solve. Conceptually, our heuristic is somewhat similar to the digital circuit analysis of [3].
3. False Path Problem Heuristic
Basically, our method combines path enumeration with pruning and symbolic execution. We use a branch-and-bound algorithm to perform the actual path search in the control flow graph. At each node (block) of the search, the algorithm bounds the enumeration by rejecting early those alternatives that cannot improve on the current best solution. Symbolic execution is a kind of simulated execution of the program. It maintains a virtual memory representing the current variable values, as far as they are known, and evaluates program statements alongside the path search. The symbolic execution does not try to evaluate every statement exactly; it only does the best it can. Our algorithm behaves heuristically, since it starts with no knowledge of the input data values. However, values (or at least value ranges) are assigned when certain branches are taken, as in if-statements: if, for example, the entrance condition of a branch is E < 0, then E is assumed to be negative in the control flow following that branch. This can be demonstrated with the example of Fig. 1. The path search starts by visiting block a, because the execution time of a is longer than that of b. The symbolic execution then records that E is negative. Furthermore, it assigns FALSE to condition and E + 45 to x, i.e. x is now known to be below 45. In the next step, the path search is forced to visit block d rather than c, because condition = FALSE. In this way, the false paths (a, c) and (b, d) are excluded. Hence, our heuristic symbolic execution is sufficient to avoid many false paths, even though it does not know the actual input values.
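The walkthrough above can be sketched with a tiny virtual memory that stores, per variable, either a known boolean or an integer range. This is a hypothetical illustration, not the paper's data structure: the class `SymbolicMemory` and its methods are invented names, it assumes integer variables, and it supports only the operations needed for block a of Fig. 1 (a `< bound` branch condition, a boolean assignment, and `x := E + k`).

```python
import math
from typing import Dict, Tuple, Union

Interval = Tuple[float, float]          # integer range [lo, hi]; +-inf = unbounded
Value = Union[bool, Interval]


class SymbolicMemory:
    """Hypothetical virtual memory: a variable is a known bool, an integer range, or absent (unknown)."""

    def __init__(self) -> None:
        self.vars: Dict[str, Value] = {}

    def range_of(self, name: str) -> Interval:
        v = self.vars.get(name)
        return v if isinstance(v, tuple) else (-math.inf, math.inf)

    def assume_less_than(self, name: str, bound: int) -> None:
        """Record that the branch condition `name < bound` holds (integers)."""
        lo, hi = self.range_of(name)
        self.vars[name] = (lo, min(hi, bound - 1))

    def assign_bool(self, name: str, value: bool) -> None:
        self.vars[name] = value

    def assign_sum(self, target: str, source: str, k: int) -> None:
        """target := source + k, propagated on ranges."""
        lo, hi = self.range_of(source)
        self.vars[target] = (lo + k, hi + k)

    def possible(self, name: str, value: bool) -> bool:
        """Can a branch guarded by `name == value` be taken? Unknown means yes."""
        v = self.vars.get(name)
        return not isinstance(v, bool) or v == value


m = SymbolicMemory()
m.assume_less_than("E", 0)          # entering block a under E < 0
m.assign_bool("condition", False)   # condition := FALSE
m.assign_sum("x", "E", 45)          # x := E + 45
print(m.range_of("x"))              # (-inf, 44): x is below 45
print(m.possible("condition", True), m.possible("condition", False))  # False True
```

Because `possible("condition", True)` is false, a path search consulting this memory would skip block c after block a, which is how the false path (a, c) is excluded.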
3.1. Control Flow Graph
Our algorithm is based on an attributed control flow graph, a directed graph G = (V, E), where the set of hierarchical vertices V represents blocks of program statements and the set of edges E represents the actual control flow, including branches. Each (hierarchical) vertex is of one of the following types: atomic, complex, function, or loop. Complex vertices represent hierarchy and loop bodies, whereas atomic vertices represent basic statement blocks at the lowest level. Atomic vertices are annotated with an attribute C giving the maximum total computation effort needed to execute the statements they contain. Function vertices are further decomposed into an internal lower-level complex vertex representing the function body. Similarly, loop vertices are decomposed into an internal lower-level complex vertex representing the loop body. An example involving all of these vertex types is shown in Fig. 2.
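The hierarchical graph model can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the paper's implementation: the names `Kind`, `Vertex`, `cost_of`, and `longest_from` are hypothetical, functions are treated as non-recursive, and the loop cost (one exit test, then M times body plus exit test) assumes the accounting of the loop case in the WCET algorithm of Section 3.4. The function computes only the purely *structural* worst case, i.e. the baseline that ignores false paths.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Kind(Enum):
    ATOMIC = "atomic"
    COMPLEX = "complex"
    FUNCTION = "function"
    LOOP = "loop"


@dataclass
class Vertex:
    kind: Kind
    C: int = 0                            # atomic: block cost; loop: exit-test cost; function: call cost
    M: int = 1                            # loop: maximum loop count
    body: Optional["Vertex"] = None       # loop/function: internal complex vertex
    entry: Optional["Vertex"] = None      # complex: first vertex of the internal subgraph
    succ: List["Vertex"] = field(default_factory=list)  # control-flow edges on the same level


def cost_of(v: Vertex) -> int:
    """Worst-case cost of executing vertex v itself, including its internal hierarchy."""
    if v.kind is Kind.ATOMIC:
        return v.C
    if v.kind is Kind.COMPLEX:
        return longest_from(v.entry) if v.entry else 0
    if v.kind is Kind.LOOP:               # test once, then M times (body + test)
        return v.C + v.M * (cost_of(v.body) + v.C)
    if v.kind is Kind.FUNCTION:           # call overhead + body (non-recursive here)
        return v.C + cost_of(v.body)
    raise ValueError(v.kind)


def longest_from(v: Vertex) -> int:
    """Longest structural path from v to a sink on its level (no false-path check)."""
    return cost_of(v) + max((longest_from(s) for s in v.succ), default=0)


# A function whose body is an atomic block followed by a loop and a two-way branch.
loop = Vertex(Kind.LOOP, C=2, M=3,
              body=Vertex(Kind.COMPLEX, entry=Vertex(Kind.ATOMIC, C=5)),
              succ=[Vertex(Kind.ATOMIC, C=7), Vertex(Kind.ATOMIC, C=4)])
func = Vertex(Kind.FUNCTION, C=3,
              body=Vertex(Kind.COMPLEX, entry=Vertex(Kind.ATOMIC, C=10, succ=[loop])))
program = Vertex(Kind.COMPLEX, entry=func)
print(cost_of(program))   # 3 + 10 + (2 + 3 * (5 + 2)) + 7 = 43
```

The recursion here is deliberately unmemoised for readability; on graphs with many reconverging branches it would revisit shared suffixes repeatedly.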
[Figure 2: control flow graph containing atomic, complex, loop, and function vertices]
Figure 2. Example of a control flow graph
3.2. Loops
Since we are dealing with hard real-time systems, we presume that loops behave deterministically, i.e. the maximum loop count is known in advance. This is quite restrictive, but our concern is hard real-time programs automatically generated from differential equations, for which this restriction is easily fulfilled. During the analysis, loops are treated as follows: whenever the algorithm reaches a loop vertex, it first analyses the loop body (i.e. computes the worst-case execution time of the loop body) before continuing with the rest of the current level. The loop body analysis is repeated either until the symbolic execution evaluates the termination condition to true or until the maximum loop count is reached. Variables that have been set inside the loop have to be cleared after the loop has been processed; otherwise, all possible combinations of processing all loops would have to be considered. This matters when such a variable is evaluated later, outside the loop. Similarly, variables that are set inside the loop within an if-statement are cleared at the end of each iteration; otherwise, all possible combinations of processing this loop would have to be considered. This matters when such a variable is evaluated later, inside the loop. A loop vertex is annotated with an attribute M giving the maximum loop count and an attribute C giving the maximum cost of evaluating the loop exit condition.
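The loop treatment can be sketched in Python as below. This is a simplified, hypothetical illustration: `analyse_loop`, `count_body`, and `i_reached_10` are invented names, the body's worst-case cost is a fixed number `body_cost` instead of being recomputed per iteration, and the body's symbolic effect is assumed to report which variables it assigned and which of those were assigned inside an if-statement. It repeats the body until the exit condition is known to be true or M iterations are reached, and applies both clearing rules described above.

```python
from typing import Callable, Dict, Optional, Set, Tuple

Memory = Dict[str, int]
# A loop body's symbolic effect returns (all variables it assigned,
# the subset assigned inside an if-statement).
Body = Callable[[Memory], Tuple[Set[str], Set[str]]]


def analyse_loop(mem: Memory, body: Body, body_cost: int,
                 exit_true: Callable[[Memory], Optional[bool]], M: int, C: int) -> int:
    """Worst-case cost of a loop vertex; updates the virtual memory mem in place."""
    wcet = C                                  # first evaluation of the exit condition
    assigned: Set[str] = set()
    iterations = 0
    # Repeat until the exit condition is known to be true or M iterations are reached.
    while iterations < M and exit_true(mem) is not True:
        written, written_in_if = body(mem)
        assigned |= written
        for name in written_in_if:            # cleared at the end of each iteration
            mem.pop(name, None)
        wcet += body_cost + C                 # body plus re-evaluation of the exit condition
        iterations += 1
    for name in assigned:                     # cleared after the loop has been processed
        mem.pop(name, None)
    return wcet


def count_body(m: Memory) -> Tuple[Set[str], Set[str]]:
    if "i" in m:                              # i := i + 1 (stays unknown if i is unknown)
        m["i"] += 1
    return {"i"}, set()


def i_reached_10(m: Memory) -> Optional[bool]:
    return m["i"] >= 10 if "i" in m else None


mem = {"i": 0}
print(analyse_loop(mem, count_body, 8, i_reached_10, M=16, C=2), mem)  # 102 {}
print(analyse_loop({}, count_body, 8, i_reached_10, M=16, C=2))        # 162
```

When i starts at a known value, the symbolic execution stops after 10 iterations (2 + 10 * (8 + 2) = 102); when i is unknown, the exit condition stays undecided and the analysis falls back to the maximum loop count M = 16.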
3.3. Function Calls
Function or procedure calls are handled with the help of function vertices, which express the hierarchical structure of the real-time program. For recursive functions, the maximum recursion depth must be known in advance for the same reasons as the maximum loop count. During the worst-case execution time analysis, a function vertex is handled by first analysing the function body before continuing with the rest of the current level. Recursive functions are analysed at most as many times as the maximum recursion depth allows. Function vertices are annotated with an attribute C giving the maximum cost of calling (not executing) the function. Recursive functions are additionally annotated with an attribute M, the maximum recursion depth.
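A minimal sketch of how the recursion bound M limits the analysis, under a strong simplifying assumption: the function's worst-case path makes exactly one recursive call, and everything else in the body costs a fixed `body_cost`. The names `FunctionVertex` and `wcet_call` and the numbers in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FunctionVertex:
    C: int          # cost of the call itself, not of the body
    M: int          # maximum recursion depth (1 = non-recursive)
    body_cost: int  # worst-case body cost excluding the recursive call (assumed)


def wcet_call(f: FunctionVertex, depth: int = 1) -> int:
    """Worst-case cost of calling f, expanding the self-call at most M levels deep."""
    cost = f.C + f.body_cost
    if depth < f.M:                       # deeper levels are not analysed
        cost += wcet_call(f, depth + 1)
    return cost


print(wcet_call(FunctionVertex(C=5, M=4, body_cost=20)))  # 100 = 4 * (5 + 20)
print(wcet_call(FunctionVertex(C=5, M=1, body_cost=20)))  # 25: non-recursive call
```

The analysis of the body is thus unrolled exactly M times, in the same way that a loop body is analysed at most M times.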
3.4. The Algorithm
[Figure 3: vertices “if E < 0” (C=30, mdts=663), “condition:=FALSE; x:=E+45” (C=221, mdts=633), “condition:=TRUE; x:=−E” (C=199, mdts=611), “if condition” (C=30, mdts=412), “result:=x/y” (C=382, mdts=382), “result:=y” (C=322, mdts=322)]
Figure 3. Control flow graph of the example program

The control flow graph is extracted from the real-time program at compile time, including all vertex attributes described in the previous subsections. Then, in a preprocessing step, the maximum-delay-to-sink (mdts) values are computed for all vertices. The mdts-value of a vertex is the length of the longest structural path from the vertex, including its own cost, to a vertex with no successor. The control flow graph of the example of Fig. 1 is shown in Fig. 3, including all mdts-values. The mdts-value of the source vertex of a graph equals the length of the longest structural path of the program (663 in the example). The mdts-values are needed to bound the path enumeration as far as possible: these values help to detect quite early whether
an alternative can lead to a better solution than the current best one. The cost of computing the mdts-values grows linearly with the number of vertices, using a breadth-first search technique.

    function WCET(integer wcet, vertex V, memory M): (memory, integer)
    begin
      switch (V.type)
        case atomic:
          for all statements S_i in V do
            M.eval(S_i);
          wcet += V.C;
          break;
        case loop:
          M.eval(exit condition);
          wcet += V.C;
          while (not loop_end(M, V.M)) do
            /* V' = loop body */
            (M, wcet) = WCET(wcet, V', M);
            wcet += V.C;
            M.eval(exit condition);
          break;
        case function:
          if (not max_recurs_depth(V.M)) then
            /* V' = function body */
            (M, wcet) = WCET(wcet, V', M);
          wcet += V.C;
          M.eval(call stmt);
          break;
      end switch;
      if (no_successor(V)) then
        max := MAX(max, wcet);
      else
        M.eval(branch condition);
        for all successors V_i of V do
          /* branch accessible and a better solution still possible */
          if (M.possible(V_i) and wcet + V_i.mdts > max) then
            WCET(wcet, V_i, M);   /* recursion */
      return (M, wcet);
    end;

    function main()
    begin
      max := 0;   /* current maximum wcet */
      WCET(0, program, free memory);
    end.

Figure 4. The WCET algorithm
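The core of Fig. 4 (mdts preprocessing, successors tried in descending mdts order, the mdts bound, and symbolic feasibility checks) can be sketched in runnable Python on the graph of Fig. 3. This is not the author's implementation: it assumes an acyclic graph of atomic vertices only (no loop or function vertices), a virtual memory that tracks only boolean variables and assumed branch conditions, and hypothetical names (`Vertex`, `compute_mdts`, `longest_executable_path`, the `symbolic` switch). The mdts-values are computed by a memoised depth-first traversal instead of the breadth-first technique mentioned above; it likewise evaluates each vertex once. The vertices a to d stand for the statement vertices of Fig. 3, whose costs exclude the if-tests.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Memory = Dict[str, bool]
# An edge is (target vertex, guard variable, required value); guard None = unconditional.
Edge = Tuple[str, Optional[str], Optional[bool]]


@dataclass
class Vertex:
    cost: int                                               # attribute C
    effect: Callable[[Memory], None] = lambda m: None       # symbolic effect of the block
    succ: List[Edge] = field(default_factory=list)


def compute_mdts(g: Dict[str, Vertex]) -> Dict[str, int]:
    """Longest structural path from each vertex to a sink, including the vertex's own cost."""
    mdts: Dict[str, int] = {}

    def visit(v: str) -> int:
        if v not in mdts:           # memoised: each vertex is evaluated once (graph is a DAG)
            mdts[v] = g[v].cost + max((visit(t) for t, _, _ in g[v].succ), default=0)
        return mdts[v]

    for v in g:
        visit(v)
    return mdts


def longest_executable_path(g: Dict[str, Vertex], source: str,
                            symbolic: bool = True) -> Tuple[int, List[str]]:
    mdts = compute_mdts(g)
    best_wcet = 0
    best_path: List[str] = []

    def search(v: str, wcet: int, mem: Memory, path: List[str]) -> None:
        nonlocal best_wcet, best_path
        g[v].effect(mem)                # symbolic execution of the block's statements
        wcet += g[v].cost
        path = path + [v]
        if not g[v].succ:               # sink reached: record a new maximum
            if wcet > best_wcet:
                best_wcet, best_path = wcet, path
            return
        # Visit successors in descending mdts order to find long paths early.
        for t, var, want in sorted(g[v].succ, key=lambda e: -mdts[e[0]]):
            if symbolic and var is not None and mem.get(var, want) != want:
                continue                # false path: contradicts the virtual memory
            if wcet + mdts[t] <= best_wcet:
                continue                # bound: this branch cannot beat the current best
            branch_mem = dict(mem)      # each alternative gets its own copy of the memory
            if var is not None:
                branch_mem[var] = want  # assume the branch condition holds on this path
            search(t, wcet, branch_mem, path)

    search(source, 0, {}, [])
    return best_wcet, best_path


def set_condition(value: bool) -> Callable[[Memory], None]:
    def effect(m: Memory) -> None:
        m["condition"] = value          # condition := FALSE / TRUE; x is not tracked here
    return effect


fig3 = {
    "if E<0": Vertex(30, succ=[("a", "E<0", True), ("b", "E<0", False)]),
    "a": Vertex(221, set_condition(False), [("if condition", None, None)]),
    "b": Vertex(199, set_condition(True), [("if condition", None, None)]),
    "if condition": Vertex(30, succ=[("c", "condition", True), ("d", "condition", False)]),
    "c": Vertex(382),
    "d": Vertex(322),
}

print(compute_mdts(fig3)["if E<0"])                              # 663
print(longest_executable_path(fig3, "if E<0"))                   # (641, ['if E<0', 'b', 'if condition', 'c'])
print(longest_executable_path(fig3, "if E<0", symbolic=False))   # (663, ['if E<0', 'a', 'if condition', 'c'])
```

With symbolic checks enabled, the search first finds (a, d) with 603, then uses the bound 30 + 611 = 641 > 603 to justify exploring b and finds 641; with the checks disabled it returns the structural estimate 663 and prunes every other branch by the bound alone.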
Pseudo-code of the simplified algorithm is given in Fig. 4. It is basically a recursive branch-and-bound algorithm enriched by the concept of symbolic execution, which is represented by the function M.eval(S). This function transfers the result of executing a statement S into the virtual memory M. The recursive function WCET(), representing the worst-case execution time analysis, returns a tuple consisting of the virtual memory and the resulting worst-case execution time. The edges at each vertex are sorted in descending order of the mdts-values of their target vertices. This gives the algorithm the
capability to find quite long paths very early. The mdts-values are further used to bound the current branch decision: whenever the current worst-case execution time wcet plus the mdts-value of the vertex reached by the considered edge does not exceed the current maximum worst-case execution time max, it is not necessary to explore the vertices following this edge any further. The algorithm is demonstrated in more detail by the example of Fig. 3. It starts by processing the “if E