Compiling for Time Predictability ⋆

Peter Puschner¹, Raimund Kirner², Benedikt Huber¹, and Daniel Prokesch¹

¹ Institute of Computer Engineering, Vienna University of Technology, Austria
{peter,benedikt,daniel}@vmars.tuwien.ac.at
² Department of Computer Science, University of Hertfordshire, United Kingdom
[email protected]

Abstract. Within the T-CREST project we work on hardware/software architectures and code-generation strategies for time-predictable embedded and cyber-physical systems. In this paper we present the single-path code generation approach that we plan to explore and implement in a compiler prototype for a time-predictable processor. Single-path code generation produces code that forces every execution to follow the same trace of instructions, thus supporting time predictability and simplifying the worst-case execution-time analysis of code. We present the rationale behind the single-path generation and details about the code-generation rules of the compiler.

Keywords: real-time systems, compilers, time predictability, worst-case execution-time analysis

1 Introduction

Many embedded and cyber-physical systems need safe and tight predictions of the timing of the hard real-time software that controls the safety-critical parts of the application. The problem with current systems is that planning application timing and obtaining reliable information about the timing behaviour of applications is becoming increasingly difficult, because the complexity of hardware and software keeps growing. This, in turn, means that the effort and cost of both constructing and validating safety-critical real-time applications are steadily increasing and becoming unacceptably high. Within the T-CREST project we are developing a novel processor architecture and new software and code-generation strategies to make real-time systems more time-predictable and to reduce the complexity of temporal planning and timing analysis. The strategy is to use simpler hardware that can be controlled by software, and to generate code that behaves in a more predictable manner than traditional code.

⋆ This research was partially funded under the European Union's 7th Framework Programme under grant agreement no. 288008: Time-Predictable Multi-Core Architecture for Embedded Systems (T-CREST).


Within this paper we present the code-generation strategy of the T-CREST approach. The ideas for the single-path code generation that is at the core of our code generation have been published in [6]: in the single-path approach, input-dependent code alternatives are translated into sequential pieces of predicated code with the same functional behaviour. In this paper we present the rationale for the single-path code transformation and provide details about the code-generation rules used in this approach. The advantages of the proposed code-generation strategy are that it yields predictable, compositional, and stable code execution times. These properties make it easy to argue about code timing. Further, they support a structured, hierarchical design and analysis of systems with respect to timing properties.

2 Desirable Code-Timing Properties

When compiling code for time-predictable embedded systems we should strive for the following properties:
– Composability: The execution times of generated code should not depend on the software context in which the code is executed, i.e., adding, changing, or removing one part of some software must not change the worst-case timing of the other code sections. This is necessary to make timing an integral (though platform-specific) property of a code piece or software component, which is a prerequisite for a hierarchical software-development process for real-time applications.
– Compositionality: Given the timing of some pieces of code, the timing of a composite should be derivable by a simple timing formula from the timing of its constituent code pieces.
– Analyzability: The code structure should allow for an accurate timing analysis at reasonable cost. In particular, the overestimation of the worst-case execution time should be low, in the order of a few percent.
– Stability: The generated code should run with constant execution time or with small execution-time jitter (variability). This greatly simplifies both timing analysis and the argumentation about the temporal behaviour of real-time application software.
– Simplicity: The execution-time analysis of the resulting code should be of low complexity. Simplicity also fosters analyzability.
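The compositionality and stability properties above can be summarised in a small timing sketch (our own notation, not taken from the paper: T(S) denotes the execution time of a code piece S):

```latex
\begin{align*}
  T(S_1; S_2) &= T(S_1) + T(S_2)
    && \text{(compositionality: sequences compose additively)}\\
  T(\texttt{if } c \texttt{ then } S_1 \texttt{ else } S_2)
    &\in \{\, T(c) + T(S_1),\; T(c) + T(S_2) \,\}
    && \text{(branching code: timing depends on inputs)}\\
  T_{\text{single-path}}
    &= T(c) + T(S_1) + T(S_2)
    && \text{(single-path code: one constant value)}
\end{align*}
```

A branching conditional has a set of possible timings, so the simple additive formula for a composite only gives an interval; single-path code collapses the set to a single value, which is exactly what makes such simple timing formulas exact.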

3 How to Make Execution Times Predictable

The execution time of a piece of code is determined by three factors: (1) the hardware on which the code is to run, (2) the sequence of actions defined by the code (depending on the algorithm chosen to solve a computing problem and the compiler that transforms the source code into machine code that the processor can execute), and (3) the context in which the code is executed [9]. The latter depends on the hardware state resulting from the execution history and the


application-specific context (e.g., the possible value assignments of variables) in which the code is executed. Each of the listed factors influences time predictability. Within the T-CREST project we aim at creating an environment in which time predictability is supported by all three factors.

In the hardware domain, the architecture of the Patmos processor [8], which is developed in the T-CREST project, provides means to make the processor timing independent of the execution context. It features a fully predicated instruction set (cf. Section 4.1), where every instruction is conditionally executed depending on the value of one of the eight predicate registers. The VLIW nature of the in-order dual-issue pipeline demands that hazards are resolved at compile time instead of stalling the pipeline implicitly at runtime. The memory architecture, which provides a predictable function cache and a software-managed scratchpad area, ensures that the timing of memory accesses is controllable by software.

As for the software, coding guidelines and specific code-generation strategies ensure that the sequence of instructions executed during a program run is insensitive to the values of input variables. The sequence of instructions executed by the resulting code is easy to analyse and, again, independent of the context in which the software is run. Within this paper we focus on code generation.

Enforcing Equal Timing for All Inputs

Regarding software, the central question in this paper is: how can the software help to make code timing predictable? Software contributes to variable timing by executing different instruction traces for different inputs. These different traces, in general, have different timing. So, if we want software to help make code timing predictable, we must find a way to make alternative code sections execute with equal time consumption.
This can be addressed in two ways: either find ways to make the timing of alternative traces equal, or eliminate the alternatives.

Enforcing Equal Timing for Alternatives

Assuming that the processor provides constant instruction-execution and memory-access times, an input-data dependent execution time can be traced back to some control-flow branch whose alternatives take different amounts of time. This difference can be eliminated by inserting as many single-cycle Nop instructions into the shorter (less time-consuming) alternative as are needed to make the execution times of the alternatives equal. A similar strategy can be applied to every loop with a non-constant but bounded number of iterations: insert a second loop of identical iteration timing but empty functionality to compensate for the non-taken iterations of the original loop, so that the number of iterations of both loops taken together is always the same. The same goal could be achieved by substituting the Nop sequences with a Delay instruction, parametrised to stall execution for the amount of time the Nops would take.
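The compensating-loop idea can be illustrated with a small C sketch (hypothetical function and bound names, not code from the paper): the padded version always performs MAX_ITER iterations in total, the first n doing real work and the remaining ones a dummy update of identical structure.

```c
#include <stdbool.h>

#define MAX_ITER 8  /* assumed static bound on the loop's iterations */

/* Original: iteration count depends on the input n (n <= MAX_ITER). */
int sum_original(const int *a, int n) {
    int s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Padded version: the total trip count of both loops together is
 * always MAX_ITER.  The compensating loop performs work of identical
 * structure whose result is discarded, so on a processor with
 * constant instruction timing the overall loop time is fixed. */
int sum_padded(const int *a, int n) {
    int s = 0, dummy = 0;
    for (int i = 0; i < n; i++)
        s += a[i];                        /* real iterations          */
    for (int i = n; i < MAX_ITER; i++)
        dummy += a[0];                    /* compensating iterations  */
    (void)dummy;
    return s;
}
```

Note that this only fixes the trip count; on hardware with caches or other state-dependent timing, identically structured iterations may still take different times, which is the limitation of padding approaches.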


This strategy is related to the approach of timing instructions, so-called deadline instructions, as provided in the implementation of the PRET architecture of Lickly et al. [3]. A deadline instruction sets the execution-time limit for the subsequent instructions until the next deadline instruction is encountered, which then stalls the execution until the specified time has elapsed. If the execution takes longer, an exception is raised that can be handled by the application, although it is desirable to verify the absence of such events.

There are some drawbacks to the Nop- or Delay-insertion approach. First, the insertion of Nop respectively Delay instructions increases code size. Second, the Nop-insertion approach can only be used in architectures that do not suffer from hardware-state dependent execution times. E.g., in architectures with instruction caches one cannot assign a fixed execution time to a set of instructions, because the access times of instructions differ for hits and misses (see Fig. 1). Even in architectures whose instruction execution times are not state dependent, Nop insertion requires detailed knowledge about the hardware timing in order to determine the correct number of Nop instructions to be placed at the different code locations. Similarly, the insertion of Delay instructions needs a detailed analysis of the worst-case timing of code sections to set the timers correctly, thus confronting us with the whole set of problems of a highly complex WCET analysis.
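To make the deadline-instruction semantics concrete, here is a toy C model (our own sketch with invented names, not the PRET instruction set): a cycle counter stands in for real time, and each "deadline instruction" first enforces the previously armed deadline, stalling on early completion or flagging an overrun, before arming the next one.

```c
#include <stdbool.h>

/* Toy model of deadline-instruction semantics (invented names). */
struct dl_state {
    unsigned long cycles;   /* elapsed "time" in cycles            */
    unsigned long next;     /* currently armed deadline (0 = none) */
    bool overrun;           /* set if a deadline was missed        */
};

/* Model of executing code that takes c cycles. */
static void dl_work(struct dl_state *s, unsigned long c) {
    s->cycles += c;
}

/* Model of a deadline instruction: enforce the previous deadline,
 * then set a new one 'budget' cycles in the future. */
static void dl_set(struct dl_state *s, unsigned long budget) {
    if (s->next != 0) {
        if (s->cycles > s->next)
            s->overrun = true;      /* PRET would raise an exception */
        else
            s->cycles = s->next;    /* stall: pad to constant length */
    }
    s->next = s->cycles + budget;
}
```

The stall branch is what gives code sections a constant observable duration; the overrun branch is the event whose absence one would like to verify statically.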

[Figure: a cache-conflict example showing a loop that contains two conflicting instructions A and B, and two possible executions of the loop with two iterations each. Depending on the execution, the same instruction A is sometimes a cache miss and sometimes a cache hit, so its execution time differs between the two runs.]

Fig. 1. Example Illustrating that There is Nothing Like a Fixed Execution Time for a Branch.

Eliminating Alternatives

If we manage to generate code that follows the same execution trace for whatever input data it receives, then obtaining composable and stable timing becomes almost trivial. This is the idea behind the single-path transformation [6]. The single-path transformation is a code-generation strategy that extends the idea of if-conversion [1] to transform branching code into code with a single trace. Instead of using conditional branches to react to different input data, the transformed code uses predicated instructions, which effectively hide the branches inside the instructions, to control the semantics of the executed code. Code generation for loops follows a similar idea: loops with input-dependent iteration conditions are transformed into loops for which the number of iterations is known at compile time, and input-data dependent iterations are again removed by if-conversion.


The execution of single-path code requires a small amount of hardware support (see Section 4.3). Apart from that, single-path code generation is a purely software-based approach: it does not need any information about the timing of hardware operations. This independence from hardware timing makes the single-path approach the preferable code-generation strategy among the discussed solutions.

4 Generating Single-Path Code

4.1 If-Conversion

As said before, the single-path transformation builds upon if-conversion [1]. If-conversion removes branches in the control flow of a piece of code by using predicated instructions. Predicated instructions are instructions whose semantics are controlled by a predicate, where the predicate can be implemented by the condition-code flag(s) or by specific predicate flags of the processor. If the predicate is true, the instruction realises the function associated with its op-code. If the predicate evaluates to false, the instruction behaves like a Nop instruction.
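This predicated-execution semantics can be captured in a one-line C sketch (a hypothetical helper of ours, not Patmos code): when the predicate is false, the "instruction" leaves its destination untouched, i.e., it degenerates to a Nop.

```c
#include <stdbool.h>

/* Model of a predicated add:  (pred) add dest, a, b
 * If pred is false, the destination keeps its old value (Nop). */
static inline int pred_add(bool pred, int dest, int a, int b) {
    return pred ? a + b : dest;
}
```

Crucially, the instruction is always fetched and executed; only its effect is gated by the predicate, so the instruction stream is the same whether the predicate holds or not.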

Source code:    if rA < rB then swap(rA, rB);

Branching code:            Predicated code:
    cmplt rA, rB               predlt Pi, rA, rB
    bf    skip            (Pi) swp    rA, rB
    swp   rA, rB
  skip:

Fig. 2. Branching Code (left) versus Predicated Code (right) Generated from the Same Source-Code Example.

Figure 2 shows an if-conversion example. In this example the values of two variables, rA and rB, are swapped if the value of rA is less than the value of rB. On the left side of the figure we see the branching code generated for the example, assuming the two variables are held in registers. The right side displays the code after if-conversion. The code on the right side is semantically equivalent, but uses a predicated swap instruction instead of the conditional branch. Let us for the moment assume that we have a processor that supports a fully predicated instruction set, i.e., the processor provides a predicated version of each of its op-codes. Building on this instruction set, we can use if-conversion to transform arbitrary conditional branches resulting from if-then-else constructs into predicated code. Figure 3 illustrates the single-path transformation of an if-then-else construct. The original, branching version uses a conditional branch to control


Branching execution:           Predicated execution:
    if cond                        P := cond
      res := expr1                 (P)     res := expr1
    else                           (not P) res := expr2
      res := expr2

Fig. 3. Using If-conversion to Eliminate Alternatives.

which alternative should be effective. The single-path version computes a predicate, P, and executes both alternatives with predicates P and not P, respectively, to implement the same semantics without branching.
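In C, the effect of the transformation in Figure 3 can be sketched as follows (our illustration: a real single-path compiler emits predicated machine instructions, whereas a C compiler is in principle free to reintroduce branches for the conditional operators):

```c
#include <stdbool.h>

/* Single-path version of:  if (cond) res = expr1; else res = expr2;
 * Both assignments appear in the trace; the predicates P and !P
 * select which one takes effect, so every input follows the same
 * sequence of operations. */
static int sp_if_then_else(bool cond, int expr1, int expr2) {
    bool p = cond;              /* P := cond            */
    int res = 0;
    res = p  ? expr1 : res;     /* (P)     res := expr1 */
    res = !p ? expr2 : res;     /* (not P) res := expr2 */
    return res;
}
```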

4.2 The Single-Path Transformation

We will now explain how we can use the if-conversion of conditionals to build a set of rules that allows us to transform more complex programs, i.e., programs that include sequences of constructs, loops, and procedure calls. These rules can then be applied to any piece of WCET-boundable code³ to transform it into a single-path equivalent.

Preparing for the Single-Path Transformation

Recall that only those conditional branches whose branching decision depends on the program inputs create different paths of program execution. Therefore, the single-path transformation only has to eliminate these input-data dependent branches. Conditional branches whose behaviour is not influenced by program inputs should not be affected by the transformation. To make sure that only input-dependent conditionals are transformed, the actual single-path transformation and code generation are preceded by a data-flow analysis [2]. This data-flow analysis traverses the entire program code and marks all parts of the code as either input-data dependent or input-data independent.

The Actual Transformation

After the data-flow analysis, the actual single-path transformation and code generation are performed. Although the single-path transformation is conducted on an intermediate representation of the code, we demonstrate it here for programming-language constructs represented at the source level; we think that this makes the performed steps easier to comprehend. To transform a program given in high-level source code, we first construct its syntax tree. We then recursively traverse the syntax tree and use the appropriate rules from Table 1 to perform the single-path transformation for the constructs represented by the nodes of the syntax tree.

³ The maximum number of iterations can be bounded for all loops.


Table 1. Single-Path Transformation Rules.

Construct                        Condition      Translated Construct
---------------------------------------------------------------------------------
S                                if σ = T       S
                                 otherwise      (σ) S

S1; S2                                         SP⟦S1⟧σδ; SP⟦S2⟧σδ

if cond then S1 else S2          if ID(cond)    guard_δ := cond;
                                                SP⟦S1⟧⟨σ ∧ guard_δ⟩⟨δ+1⟩;
                                                SP⟦S2⟧⟨σ ∧ ¬guard_δ⟩⟨δ+1⟩
                                 otherwise      if cond then SP⟦S1⟧σδ
                                                else SP⟦S2⟧σδ

while cond max N times do S      if ID(cond)    end_δ := false;
                                                for count_δ := 1 to N do begin
                                                  SP⟦if ¬cond then end_δ := true⟧σ⟨δ+1⟩;
                                                  SP⟦if ¬end_δ then S⟧σ⟨δ+1⟩
                                                end
                                 otherwise      while cond do SP⟦S⟧σδ

call proc p (pars)               if σ = T       call proc p (pars)
                                 otherwise      call proc p-sip(σ, pars)

def proc p (pars) S                             def proc p (pars) S;
                                                def proc p-sip (pcnd, pars) SP⟦S⟧⟨pcnd⟩⟨0⟩

Table 1 shows the single-path transformation rules for the basic control constructs used in a high-level language: simple constructs, sequences, alternatives, loops, and procedures⁴. We assume that the conditions controlling the execution of alternatives and loops are boolean variables, and thus side-effect-free expressions. Besides the statement type, each rule has two parameters, σ and δ. The first parameter, σ, is a boolean value that represents the precondition under which the statement under transformation is executed. The second parameter, δ, is used to pass the value of a counter to the code-transformation rule. Some rules use this counter value to generate unique variable names in the context of the rule. The details of the rules are as follows:

Simple Statement. For a simple statement S we distinguish two cases. If the precondition evaluates to true then the statement will be executed in every execution. Therefore the transformation generates S. Otherwise S will be

⁴ To keep the paper short we demonstrate the transformation for one representative of alternative statements (if-then-else) and loop statements (while) only. The rules for other variations of these statement types are similar.


executed conditionally, depending on the value of σ. Therefore the transformation generates code for the predicated execution of S with predicate σ.

Statement Sequence. For a statement sequence, the generated code is the sequence of its transformed constituents.

Conditional Statement. For a conditional statement, as represented by the if-then-else construct, we distinguish two cases. If the outcome of the branching condition depends on the program inputs (i.e., ID(cond) is true), then we generate a code sequence that consists of the serialisation of the two single-path transformed alternatives S1 and S2, where the precondition parameter of each alternative is the conjunction of the old precondition (σ) and the evaluation result of cond (for S1) respectively not cond (for S2). If the branching condition does not depend on the program inputs then the transformation conserves the if-then-else structure and only transforms S1 and S2.

Loop. In order to eliminate input-dependent control flow from a loop, the transformation replaces the original loop by a for-loop with a constant iteration count; as we are transforming hard real-time code, we assume that there is an input-data independent expression N bounding the maximum number of loop iterations. The termination of the new for-loop is controlled by a new counter variable count_δ. Further, we introduce an end_δ flag to enforce that the transformed loop has the same semantics as the original. This flag is initialised to false and assumes the value true as soon as the termination condition of the original loop evaluates to true for the first time. In the new loop the original loop body S is only effective as long as the end flag has not been set. Thus S in the end executes under the same condition as in the original loop.

Procedures. The last two rules illustrate code generation for procedures. If the precondition of a call is always true, then the generated code calls the procedure unconditionally.
Otherwise, i.e., if σ will only be known at runtime, we have to generate code that ensures that the procedure is called in every execution, but that the procedure execution respects the call's precondition. To facilitate the latter, we generate code for a new, single-path version of the procedure (with suffix -sip) that has an additional parameter pcnd. In the code generated for the definition of the single-path version of the procedure, pcnd is incorporated as a precondition that controls the predicated execution of the procedure body S. In the code generated for the call of the single-path version of the procedure, pcnd receives the actual value of the precondition passed to the procedure.
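The loop and procedure rules can be illustrated with a C sketch (our own example with invented names; N stands for the assumed input-independent iteration bound). The for-loop always runs N times, the end flag disables the body once the original termination condition holds, and the -sip procedure takes its precondition as an explicit parameter:

```c
#include <stdbool.h>

#define N 32   /* assumed bound on the loop's iterations */

/* Single-path version of:
 *     steps = 0; while (x > 1) { x = x / 2; steps = steps + 1; }
 * The for-loop has a constant trip count; 'end' records when the
 * original termination condition first became true. */
static int halving_steps_sp(unsigned x) {
    bool end = false;
    int steps = 0;
    for (int count = 1; count <= N; count++) {
        end   = end ? true : (x <= 1);     /* SP[ if !cond then end := true ] */
        steps = end ? steps : steps + 1;   /* SP[ if !end then S ], part 1    */
        x     = end ? x : x / 2;           /* SP[ if !end then S ], part 2    */
    }
    return steps;
}

/* Single-path version of a conditional procedure call: the procedure
 * is always called; pcnd predicates the effect of its body. */
static void increment_sip(bool pcnd, int *v) {
    *v = pcnd ? *v + 1 : *v;
}
```

Whatever the input, halving_steps_sp executes exactly N loop iterations and the same sequence of operations; only the predicated effects differ.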

4.3 Single-Path Transformation and Partial Predication

So far we have assumed that our processor provides support for full predication. While this is the case for the processor being developed in T-CREST, it seems noteworthy that the single-path transformation can easily be adapted for architectures that support only partial predication. In fact, the work published in [4] gives an excellent guide on how to apply if-conversion for such architectures.


The idea behind code generation for partially predicated architectures is to compute intermediate results for both alternatives of an if-then-else statement, but to use only the results of the one alternative for which the condition evaluates to true. Predicated Move instructions ensure that the correct temporary results are moved to the variables that store the final results of the translated code. An if-conversion example for an architecture with partial predication is shown in Figure 4. The example also demonstrates how the if-conversion avoids adverse side effects when generating code with partial predication.

Original code:
    if src2 ≠ 0 then dest := src1 / src2;

Partially predicated code, incorrect translation (exception on division by zero):
        P := (src2 ≠ 0)
        div  tmp_dst, src1, src2
    (P) cmov dest, tmp_dst

Partially predicated code without dangerous side effects (if src2 = 0, divide by a safe value, e.g., 1):
            P := (src2 ≠ 0)
            mov  tmp_src, src2
    (not P) cmov tmp_src, $safe_val
            div  tmp_dst, src1, tmp_src
    (P)     cmov dest, tmp_dst

Fig. 4. If-Conversion for Partial-Predication Support.
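The same guarded-divisor trick reads as follows in C (our sketch; the conditional expressions play the role of the cmov instructions in Figure 4):

```c
#include <stdbool.h>

/* Partial-predication sketch of:  if (src2 != 0) dest = src1 / src2;
 * A safe divisor is substituted when P is false, so the division can
 * execute unconditionally without trapping on division by zero. */
static int guarded_div(int dest, int src1, int src2) {
    bool p = (src2 != 0);              /* P := (src2 != 0)               */
    int tmp_src = p ? src2 : 1;        /* (not P) cmov tmp_src, safe_val */
    int tmp_dst = src1 / tmp_src;      /* div executes on every path     */
    return p ? tmp_dst : dest;         /* (P) cmov dest, tmp_dst         */
}
```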

5 Conclusion and Outlook

As the architectures of both the hardware and the software used in embedded and cyber-physical systems become more and more complex, the temporal predictability of code gets lost. This loss of predictability brings about a number of unpleasant effects for the design and analysis of time-critical systems: the absence of timing composability and compositionality impedes a meaningful argumentation about timing properties when designing, implementing, or re-using software components. The multitude of parameters determining hardware timing and execution paths makes code timing unstable and difficult to analyse. This results in great variability of execution times and in pessimistic results of worst-case execution-time analysis and global timing analysis, both of which lead to an overestimation of resource needs and thereby to higher-than-necessary expenses for computing resources. The goal of T-CREST is to build a hardware/software platform for time-predictable computing. With simpler and controllable hardware and single-path software we want to eliminate the above-mentioned problems.


The single-path conversion described in this paper allows us to produce code that executes along the same execution path for all possible inputs. The resulting code behaviour is insensitive to the values of input variables, which makes the analysis easy and supports stability from the software side. Together with time-predictable hardware, the proposed code-transformation strategy also provides the composability and compositionality of code timing that is desirable for real-time applications. The latter has been highlighted in [7]. So far the proposed single-path transformation has been explored in small experiments that involved the manual transformation of branches into conditionals [5]. Within T-CREST we are now implementing a compiler that performs the code transformation automatically. A first compiler prototype is expected to be available in a few months. This will allow us to conduct further experiments (including experiments with larger code) and to gain insights into the practical aspects of automated single-path code generation.

References

1. Allen, J., Kennedy, K., Porterfield, C., Warren, J.: Conversion of Control Dependence to Data Dependence. In: Proc. 10th ACM Symposium on Principles of Programming Languages. pp. 177–189 (Jan 1983)
2. Gustafsson, J., Lisper, B., Kirner, R., Puschner, P.: Code analysis for temporal predictability. Real-Time Syst. 32(3), 253–277 (Mar 2006), http://dx.doi.org/10.1007/s11241-005-4683-4
3. Lickly, B., Liu, I., Kim, S., Patel, H.D., Edwards, S.A., Lee, E.A.: Predictable programming on a precision timed architecture. In: Proceedings of the 2008 International Conference on Compilers, Architectures and Synthesis for Embedded Systems. pp. 137–146. CASES '08, ACM, New York, NY, USA (2008), http://doi.acm.org/10.1145/1450095.1450117
4. Mahlke, S., Hank, R., McCormick, J., August, D., Hwu, W.: A Comparison of Full and Partial Predicated Execution Support for ILP Processors. In: Proc. 22nd International Symposium on Computer Architecture. pp. 138–150 (Jun 1995)
5. Puschner, P.: Experiments with WCET-oriented programming and the single-path architecture. In: Proc. 10th IEEE International Workshop on Object-Oriented Real-Time Dependable Systems. pp. 205–210 (Feb 2005)
6. Puschner, P., Burns, A.: Writing temporally predictable code. In: Proc. 7th IEEE International Workshop on Object-Oriented Real-Time Dependable Systems. pp. 85–91 (Jan 2002)
7. Schellekens, M.: A Modular Calculus for the Average Cost of Data Structuring. Springer (2008)
8. Schoeberl, M., Schleuniger, P., Puffitsch, W., Brandner, F., Probst, C.W., Karlsson, S., Thorn, T.: Towards a time-predictable dual-issue microprocessor: The Patmos approach. In: First Workshop on Bringing Theory to Practice: Predictability and Performance in Embedded Systems (PPES 2011). pp. 11–20 (Mar 2011), http://www.jopdesign.com/doc/patmos_ppes.pdf
9. Wilhelm, R., Engblom, J., Ermedahl, A., Holsti, N., Thesing, S., Whalley, D., Bernat, G., Ferdinand, C., Heckmann, R., Mitra, T., Mueller, F., Puaut, I., Puschner, P., Staschulat, J., Stenström, P.: The worst-case execution-time problem – overview of methods and survey of tools. ACM Transactions on Embedded Computing Systems 7(3) (2008)