WCET Coverage for Pipelines

Adam Betts¹, Guillem Bernat¹, Raimund Kirner², Peter Puschner², and Ingomar Wenzel²

¹ Real-Time Systems Research Group, University of York, York YO10 5DD, UK
² Institut für Technische Informatik, Technische Universität Wien, Treitlstrasse 3/182/1, A-1040 Wien, Austria

Abstract Hybrid measurement-based (MB) approaches for computing WCET estimates are gaining popularity due to difficulties modelling advanced microprocessor speed-up features. These approaches combine measured data of program segments by using static analysis techniques in order to reconstruct the longest path through the program. To this end, execution times of program segments are collected by first instrumenting, and then testing, the program. However, the test phase must exercise the WCET between each pair of instrumentation points in order to enable safe re-combination of these data in the calculation stage. In general, exhaustive testing is not feasible, thus coverage criteria are required which give a quantitative view of the quality of the test phase. Often, such criteria within the functional domain, e.g. branch coverage and MC/DC, are insufficient because temporal properties are not considered. This deficiency becomes an issue for advanced microprocessors, which almost certainly include pipelines and caches, amongst other features. In this paper, we introduce the notion of WCET coverage, a set of criteria to be integrated into hybrid MB WCET analysis, by focusing on three metrics for programs executing on pipelined processors. Each metric utilises the instrumentation point graph (IPG) - a program model typically used in hybrid MB WCET analysis - as a measure of coverage. The most straightforward criterion, simple pipeline coverage, covers the pipeline effect on each IPG edge. This is subsumed by pairwise pipeline coverage, which attempts to cover the pipeline effect between adjacent IPG edges. A stronger criterion, pipeline hazard path coverage, utilises properties of the pipeline to identify potential sources of hazards, either structural or data, in the program and maps this information onto the IPG.

1 Introduction and Motivation

The dependability of real-time systems manifests itself to an ever larger degree in a computer-centric world. A guarantee of precise functionality in such systems requires affirmation of both functional and temporal correctness; the latter is necessary since both early and late computations alter the state of the system in a potentially adverse manner. Two classic examples are an in-flight control system and a video streaming application, in which temporal failure results in a wide range of possible consequences: death at the one extreme and an angry client at the other. Real-time system engineers deliver temporal correctness by breaking the system up into a number of tasks and assigning a temporal order to the task set according to a selected scheduling algorithm, e.g. fixed-priority [41] or earliest deadline first (EDF) [29]. The feasibility of scheduling a task set is ascertained through schedulability analysis, which analyses tasks' temporal parameters relative to the available system resources. A central parameter in this analysis is a task's worst-case execution time (WCET), which represents the maximum amount of CPU time a task requires in order to complete execution. Consequently, estimating the WCET of each task is a key deliverable when gauging the temporal correctness of a real-time system.

Deriving WCET estimates is driven by two conflicting requirements: safety and accuracy. On the one hand, it is essential to certify that a task's execution time will never exceed the predicted WCET bound so as to enable accurate schedulability analysis. On the other hand, estimates should lie in close proximity to the actual WCET of a task because real-time system resources are usually constrained; consequently, maximisation of these scarce resources is a primary objective. Erring on the side of caution prevails, however, since a large number of real-time systems are safety critical.

Before WCET research began in earnest, standard industrial practice obtained WCET estimates by testing the program according to a set of coverage criteria, in which the principal aim was functional testing [42]. In many cases, the observed WCET is then scaled by a safety factor in an attempt to compensate for any optimism that may exist. The pathological case, however, might still remain, thus static analysis (SA) has been proposed as a viable alternative [36]. These techniques are appealing because mathematical models of the program and the processor are constructed, which subsequently permit formal proofs. SA techniques produce accurate and safe estimates on microarchitectures that exhibit relatively simple hardware configurations, such as shallow pipelines.
The main drawback of SA is the intrinsic need for predictability at all levels of the analysis; this is jeopardised in the presence of more complex speed-up features, such as caches, branch prediction units and out-of-order execution, since accurately and safely modelling their complex interaction is an intractable problem. Naturally, this leads to pessimistic assumptions about how these features operate in order to facilitate tractable solutions. For instance, many approaches typically consider each micro-architectural unit in isolation before merging data together at a later stage, thus implicitly integrating some level of pessimism since these units operate in tandem during normal execution. These modelling problems become more crucial as an upsurge in the integration of such micro-architectural units grips the embedded market [19]. Worse yet, SA is never able to match the pace of innovation of hardware architects, hence it invariably lags behind the current state of the art in microprocessor design [35]. These difficulties provide the motivation for measurement-based (MB) approaches, a motivation further fuelled by the successful exploitation of MB techniques in many contemporary real-time systems. In particular, testing measures real system behaviour instead of using a model, which offers several advantages. First, there is relative ease in re-targeting an MB framework towards a new processor architecture, or towards modifications made to an existing processor. Second, interactions among modern speed-up features are not isolated and subsequently recombined, facilitating the possibility of more accurate WCET estimates. Third, unrealistic assumptions about the processor's implementation, e.g. an overly large instruction cache, are precluded by the processor's physical structure.
A clear disadvantage of an end-to-end MB approach is that safe WCET guarantees cannot be given unless the combination of input values leading to the WCET is exercised; this is not a trivial problem in the general case. It is for this reason that MB techniques aim towards mission-critical systems, usually within a probabilistic framework [6], whereby occasional deadline misses can be tolerated. However, the presence of micro-architectural speed-up features also inhibits end-to-end MB approaches in that it is becoming more difficult to ascertain confidence in the measurements, as the WCET can depend on a rare sequence of events at the architectural level, and not only on the worst-case input data. This is further underlined by a lack of research into test-data generation aimed at verifying temporal properties. In practice, functional test data are reused, but Tracey [42] has noted that these are not necessarily suited to eliciting WCET estimates.

Hybrid MB methods attempt to reduce the potential for underestimation and overestimation incurred by MB and SA techniques, respectively, by combining the best features of both approaches. To this end, execution times of program segments, which we term instruction blocks, are monitored by inserting instrumentation points and testing the program with state-of-the-art techniques. Furthermore, loop bounds are also extracted through measurements. The subsequent stage is to compute a WCET estimate combining these observed execution times through tree-based [13, 10, 11], path-based [22, 39] or IPET [28, 37] models. These calculations are instead based on the structure afforded by instrumentation points [7, 34] - the instrumentation point graph (IPG) - and not on the abstract syntax tree (AST) or control flow graph (CFG). However, the key assumption of this final calculation is that the WCET of each instruction block has been exercised during testing and that measurements are sufficient to provide upper loop bounds; otherwise, the safety of the final estimate is compromised. In this respect, and analogously to functional testing, an exhaustive strategy is generally not feasible.

Therefore, in this paper, we explore the notion of WCET coverage, which aims to guide the testing process towards completion whilst accounting for timing properties of the program. The work is primarily motivated by the current inadequacy of functional coverage criteria (statement and branch coverage [48], MC/DC coverage [9], loop count coverage [5]) with respect to such properties. That is, achieving full coverage of all these criteria does not imply exercising the WCET of each instruction block. In essence, two factors impact the stringency of WCET coverage. First, the number and placement of instrumentation points relative to the structure of the program is significant due to the resultant effect on the size of each instruction block. Intuitively, smaller instruction blocks, e.g. those comprising a single instruction or basic block, require less coverage since there is only a small number of different execution times [11]. On the other hand, instruction blocks incorporating, for example, nested loops of the CFG require greater coverage to ensure that all paths in the segment are exercised and that the WCET is indeed captured. The second crucial parameter is the hardware architecture on which the software executes. Relatively simple architectures with shallow, in-order pipelines result in smaller variations of execution times between instrumentation points. In comparison, state-of-the-art processors with multi-level caches, dynamic branch prediction, and out-of-order execution exhibit a greater variability in execution time between instrumentation points due to, for example, cache misses and branch mispredictions. Colin and Petters [11] have already noted that caches and out-of-order execution have the greatest impact on the execution time variability of basic blocks, whilst the impact of branch prediction is negligible.

In this paper, we concentrate on WCET coverage metrics for programs executing on processors with pipelines, since this feature is almost universally integrated into modern CPU design [24]. These metrics utilise structural properties of the IPG as a measure of coverage, assuming that the program has been optimally instrumented [3, 27], i.e. the entire execution path can be reconstructed from a trace with a minimal number of instrumentation points. There are several motivations for this requirement. The first is that hardware debug interfaces, such as Nexus [18] and the ARM Embedded Trace Macrocell [14], optimally instrument a large class of programs. Therefore, these metrics are applicable for instrumentation techniques which passively collect timing data, i.e. without adding the overhead incurred by software instrumentation.
The second, and more noteworthy, motivation is that optimal instrumentation guarantees each path between instrumentation points is unique. This property is central to the three metrics - simple pipeline coverage (SPC), pairwise pipeline coverage (PPC), and pipeline hazard path coverage (PHPC) - that we introduce, because traversing an IPG edge ensures coverage of that path. On the other hand, a coarser instrumentation would forbid this implication. SPC is a straightforward criterion that attempts to cover the pipeline effect on each IPG edge; it is subsumed by PPC, which attempts to cover the pipeline effect between adjacent IPG edges. PHPC is a stronger criterion whereby properties of the pipeline and the program are utilised to identify sources of potential data and structural hazards. This information is subsequently mapped onto the IPG to highlight (sub-)paths which must be executed to satisfy the criterion. For each criterion, the important detail is that the IPG is used to construct test cases which satisfy the criterion; hence it is employed to measure the amount of coverage.

The paper is organised as follows. In section 2, we review the current state of the art in three areas: WCET analysis, instrumentation techniques, and coverage criteria. Section 3 presents an overview of terminology, particularly that of graphs. This terminology is required in section 4 to formally introduce the IPG and properties of instruction blocks within the scope of an optimal instrumentation. These properties are significant in considering the WCET coverage metrics that we introduce in section 5, which is the main contribution of the paper. Finally, in section 6, we draw some conclusions and explore some future directions of work.

2 Related Work

WCET Analysis Puschner and Koza [36] are generally credited with introducing SA techniques for WCET analysis; they crucially noted that computing the execution time of an arbitrary program is an intractable problem. Tractability is hence accomplished by bounding loops and recursive call depth. Such information is typically elicited through annotations from the user, who is assumed to have detailed insight into the program. However, since this is often an error-prone task, an alternative method is to glean these constraints from sophisticated program analysis techniques [16, 23]. In order to compute WCET estimates, SA relies on two mathematical models: a program model and a processor model. The former is required to determine the longest path through the program: the CFG and the AST are utilised, both of which use basic blocks1 as atomic units of computation. The latter is needed since the execution time of a program varies among different microprocessors due to the presence or absence of speed-up features. Consequently, the execution times of basic blocks in the program model can vary significantly according to processor state and the data supplied: the processor model thus attempts to bound the WCET of each individual basic block. Regarding the processor model, pipelines and caches have been comprehensively studied. The former is widely used in many contemporary embedded processors due to low implementation cost and power consumption, whilst the latter has the greatest effect on the WCET [11, 25]. Engblom [15] has presented a model to bound timing effects due to pipelines that occur over consecutive basic blocks. Schneider and Ferdinand [38] have proposed how to predict the behaviour of pipelines using abstract interpretation. Comparatively, caches have received more attention [17, 31, 32, 45, 46]. Each of these methods essentially attempts to (conservatively) determine the contents of the cache at a particular program point.
Recent work in processor modelling has targeted features present in high-end microprocessors, such as branch prediction and out-of-order execution. The focal point of the former is to bound the number of branch mispredictions depending on the underlying prediction mechanism: either a static scheme [12], or a dynamic one using global history predictors [4]. For the latter, a timing estimate of basic blocks is produced without enumerating all possible instruction schedules, which might otherwise be necessary due to the timing anomaly problem [30].

1 A basic block is a sequence of instructions such that the flow of control can only enter at the beginning and leave at the end [2]


A timing anomaly occurs when local worst-case behaviour, e.g. a cache miss, does not necessarily lead to the global worst case, which creates the possibility of an underestimated WCET.

The program model determines how the final WCET computation is realised. In a tree-based method, each high-level construct (i.e. sequence, selection, iteration) is attributed with a timing rule, collectively referred to as a set of timing schema [33]. The calculation engine hence equates to a syntactical parse of the program, which is typically based on the abstract syntax tree. A tree-based calculation technique based on the IPG has recently been presented [7]. On the other hand, a flow graph representation of the program - typically the CFG - permits two kinds of calculation techniques. In a path-based approach, the paths of the program are explicitly explored with the aim of locating the longest in this set. This clearly succumbs to tractability problems when analysing programs with loops since the number of paths grows exponentially. Consequently, path-based analysis is usually limited to particular program segments, typically loops and functions [22, 39]. The other permissible calculation method is the implicit path enumeration technique (IPET) [28, 37], whereby an integer linear programming (ILP) problem is formulated. The constraint set is derived from the static structure of the flow graph and the dynamic properties of the program, i.e. loop bounds and infeasible paths. These constraints are solved by maximising an objective function, which returns a WCET value as well as the number of times each node and edge is executed.

MB approaches have recently been afforded increasing attention. Wegener et al. [43, 44] adopt evolutionary algorithms to generate inputs by evaluating the fitness of the WCET, i.e. a longer execution time is fitter. Bernat et al. [6] use probabilistic analysis to combine execution profiles of program segments, which have been collected during measurements. Colin et al. [11] have used measurements to quantify the effects of microprocessor speed-up features on the WCET estimate - as opposed to average-case behaviour - which motivates some of the ideas in this paper. Kirner et al. [26] adopt a hybrid MB approach; evolutionary testing is used in conjunction with model checking to exercise each acyclic path between instrumentation points.
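The IPET calculation described above can be illustrated on a tiny diamond-shaped flow graph. The sketch below is a minimal, hypothetical example: the edge WCETs are invented, and an exhaustive search over bounded edge counts stands in for a real ILP solver, purely to make the constraint formulation concrete.

```python
from itertools import product

# Diamond CFG: s -> a -> t and s -> b -> t.
# Variables are edge execution counts; we maximise total time subject to
# flow-conservation constraints -- the essence of IPET -- solved here by
# brute force instead of an ILP solver (hypothetical edge WCETs).
edges = ["sa", "sb", "at", "bt"]
cost = {"sa": 10, "sb": 3, "at": 4, "bt": 7}

best = None
for x in product(range(2), repeat=len(edges)):  # each edge 0 or 1 times
    v = dict(zip(edges, x))
    ok = (v["sa"] + v["sb"] == 1 and  # program entry executes exactly once
          v["sa"] == v["at"] and      # flow into node a equals flow out
          v["sb"] == v["bt"])         # flow into node b equals flow out
    if ok:
        t = sum(cost[e] * v[e] for e in edges)
        if best is None or t > best:
            best = t
print(best)  # 14: the objective selects the path s -> a -> t
```

Note that the solver returns not only the WCET value (14) but also the edge counts realising it, mirroring the node and edge frequencies mentioned in the text.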

Instrumentation Techniques Program instrumentation is mainly used to trace programs in functional testing in order to locate bugs and to measure the amount of coverage, and can loosely be classified as intrusive or non-intrusive. In the case of the former, physical instructions are present in the program that collect (timing) data during execution, and hence affect the execution time of the program in an adverse manner. On the other hand, non-intrusive counterparts collect these data passively, and require the support of hardware debug interfaces, such as the Nexus interface [18] or the ARM Embedded Trace Macrocell (ETM) [14]. Agrawal [1] has proposed an intrusive technique that reduces the number of probes with respect to basic block and branch coverage criteria. His technique utilises a super block dominator graph, which essentially merges pre- and post-dominator information from the CFG, to insert probes into leaf nodes: it can be shown that covering the instrumented nodes implies covering all nodes in the CFG (with respect to basic block and branch metrics). Tikir and Hollingsworth [40] detailed an intrusive approach that dynamically inserts and removes instrumentation points in order to reduce the run-time overhead of code coverage, as opposed to instrumenting statically. Instrumentation is only inserted into a function when it is executed for the first time during program execution; it is subsequently removed when it does not provide any additional coverage information.

A widely-adopted, and intrusive, intra-procedural tracing mechanism has been proposed by Ball and Larus [3, 27]. This has the significant property that the entire traversed path through a program can be reconstructed from a trace with the minimum number of instrumentation points. A heuristic edge weighting algorithm assigns lower weight to less frequently executed edges in the CFG, generally equating to deeply-nested conditional constructs. From this weighted graph, a maximum spanning tree of the CFG is constructed, and instrumentation points are inserted on non-tree edges, creating new basic blocks of tracing instructions. In the remainder of this document, we shall term this technique the optimal instrumentation. A sub-optimal instrumentation profile denotes an optimal instrumentation augmented with an additional, possibly empty, set of instrumentation points. We concentrate on WCET coverage for programs which have been optimally instrumented. In the scope of this paper, this is primarily due to several structural properties of the IPG that we enumerate in sections 4 and 5, but the motivation extends further. In particular, hardware debug interfaces, such as Nexus [18], record time stamps at points where program flow discontinues, i.e. taken conditional branches. This is, in fact, an optimal instrumentation for a large class of programs without the intrusiveness of software instrumentation points, as noted above.
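The spanning-tree placement step can be sketched as follows. This is a simplified illustration, not the Ball-Larus algorithm itself: the edge weights are hypothetical frequency estimates, edge direction is ignored as the text describes, and Kruskal's algorithm on descending weights yields the maximum spanning tree whose chords receive the probes.

```python
# Probe placement in the spirit of Ball and Larus: build a maximum
# spanning tree of the weighted CFG (direction ignored) and instrument
# only the non-tree edges. Weights are hypothetical frequency estimates.
def chords(nodes, weighted_edges):
    parent = {n: n for n in nodes}
    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path compression
            n = parent[n]
        return n
    tree, non_tree = [], []
    # Kruskal on descending weights produces a maximum spanning tree.
    for u, v, w in sorted(weighted_edges, key=lambda e: -e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((u, v))
        else:
            non_tree.append((u, v))  # a chord: this edge gets a probe
    return tree, non_tree

nodes = ["s", "a", "b", "t"]
edges = [("s", "a", 9), ("s", "b", 1), ("a", "t", 9), ("b", "t", 1)]
_, probes = chords(nodes, edges)
print(probes)  # the low-weight chord (b, t) carries the instrumentation
```

Keeping high-weight edges in the tree is exactly the heuristic mentioned above: the rarely executed edges become chords, so tracing instructions sit on infrequently executed paths.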

Coverage Criteria Software testing is required in the validation and verification of a program in order to uncover errors; unit testing is a term typically used to indicate that testing occurs at the functional/procedural level of a program. In unit testing, white box techniques exploit structural properties of the program by analysing the CFG. An ideal requirement of a structural (unit) testing strategy is to exercise each path in the CFG; this is generally infeasible due to an exponential blow-up of paths for programs with loops. Therefore, coverage criteria are used to ensure tractable testing, or as a measurement of test quality [48]. The subsumes relationship offers a comparison between different coverage criteria: a criterion A subsumes a criterion B if, and only if, every test case that satisfies A also satisfies B [9]. Statement coverage ensures that all statements in the program are exercised, and is clearly subsumed by basic block coverage. Full statement coverage can never be achieved in the presence of dead code. Branch coverage requires each edge of the CFG to be traversed, which also subsumes statement coverage. The all jump-to-jump paths criterion requires each linear code sequence and jump (sub-)path to be executed [47]. A linear code sequence and jump path begins with a basic block which is the target of a jump (or the program start node), contains a sequence of contiguous basic blocks with no internal jumps, and finishes with a jump from the last basic block in the sequence (or the program exit node). Covering all jump-to-jump paths subsumes branch coverage. Chilenski and Miller [9] have classified four kinds of condition/decision coverage metrics: a condition is a boolean expression containing no boolean operators; a decision is an outcome of a (composite) boolean valued statement in a high-level language. Following are the metrics, classified in increasing order of subsumption:

1. Decision coverage (DC): all outcomes of every decision have been exercised.
2. Condition/decision coverage (C/DC): all outcomes of every condition in a decision and all outcomes of every decision, respectively, have been taken.
3. Modified condition/decision coverage (MC/DC): every condition in a decision has been exercised, and each condition has been shown to independently affect the decision's outcome. The latter can be shown by varying the condition whilst all other conditions of the decision are held fixed.
4. Multiple-condition coverage (M-CC): all possible combinations of condition outcomes in each decision have been taken.

As stated above, path coverage is the strongest criterion since it requires each path in the CFG to be traversed, but it is impractical. Consequently, path-based coverage has focused on selecting the most important subset of paths. Simple path coverage requires that all paths without repeated execution of an edge are exercised, whereas the elementary path criterion requires that all paths without repeated execution of a node are exercised [48]; simple path coverage subsumes elementary path coverage since an elementary path must be a simple one. Bently and Miller [5] have proposed loop body coverage, which specifies that each loop body must be iterated n times, for each n up to some bound k.
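The independence requirement of MC/DC can be made concrete on a small, hypothetical decision such as `a and (b or c)`: for each condition there must exist a pair of tests differing only in that condition yet yielding different decision outcomes. A sketch that searches for such independence pairs:

```python
from itertools import product

# MC/DC illustrated on the hypothetical decision  a and (b or c):
# a condition independently affects the outcome if flipping it, with all
# other conditions held fixed, changes the decision's value.
decision = lambda a, b, c: a and (b or c)

def independence_pairs(cond_index):
    pairs = []
    for t in product([False, True], repeat=3):
        u = list(t)
        u[cond_index] = not u[cond_index]  # vary only this condition
        if decision(*t) != decision(*u):
            pairs.append((t, tuple(u)))
    return pairs

for i, name in enumerate("abc"):
    # Every condition has at least one independence pair, so a test set
    # containing one pair per condition achieves MC/DC for this decision.
    print(name, len(independence_pairs(i)) > 0)
```

Note that only a handful of the eight possible input combinations are needed, which is precisely why MC/DC is weaker, and far cheaper, than multiple-condition coverage.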

3 Terminology and Notation

A (undirected) graph G = (N, E) is a pair of finite sets N and E, called nodes and edges respectively, where E = {{u, v} | u, v ∈ N}. We sometimes use the notation N_G to clarify that N are the nodes of G. A directed graph (digraph) G = (N, E) has an edge set E = {(u, v) | u, v ∈ N}, where u is an immediate predecessor of v, and v an immediate successor of u. The set of immediate predecessors and immediate successors of a node u shall be denoted pred(u) and succ(u), respectively. A path of length m is a sequence v0 → v1 → ... → vm; v0 is the start node, vm is the end node, and each vi → vi+1 ∈ E for all 0 ≤ i < m. The notation u →* v (u →+ v) denotes a path of length zero (one) or more. A cycle is a path u →+ v such that u = v; a graph G = (N, E) is acyclic if there are no cycles. A graph G = (N, E) is connected if there is a path u →+ v between any u, v ∈ N such that u ≠ v. A tree T = (N, E) is a connected, acyclic graph. For a graph G = (N, E) with a weighting function f : E → R, a maximum spanning tree T = (N′, E′) is such that N′ = N and the total weight of E′ is maximal over all spanning trees of G. A control flow graph (CFG) is a connected digraph C = (N, E, s, t), s, t ∈ N, where we assume that s has no immediate predecessors and t has no immediate successors. Furthermore, it is assumed, without loss of generality, that for every u ∈ N − {s, t} there are paths p : s →+ u and q : u →+ t, i.e. no dead code. The set of regular expressions over a finite alphabet Σ is defined as follows: • ∅, Λ and a are basic regular expressions; ∅ denotes the empty set, Λ denotes the empty string and a ∈ Σ. • If r and s are regular expressions then (r+s), (r·s), and (r*) are corresponding compound regular expressions; + denotes set union, · denotes concatenation, and * denotes closure under concatenation. The regular expressions obtained from this definition are fully parenthesised. However, parentheses are relaxed using the operator precedence of * over · over +.
Notation is sometimes abused: (r · s) is written (rs); ((r∗ )r) is written r+ .
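The graph notation above can be rendered as a minimal sketch: a digraph with the pred/succ sets from the text, plus a reachability test corresponding to u →* v. The example graph is hypothetical.

```python
from collections import defaultdict, deque

# Minimal digraph G = (N, E) following the notation in the text, with
# pred(u)/succ(u) sets and a test for u ->* v (path of length zero or more).
class Digraph:
    def __init__(self, edges):
        self.succ = defaultdict(set)
        self.pred = defaultdict(set)
        for u, v in edges:
            self.succ[u].add(v)
            self.pred[v].add(u)

    def reaches(self, u, v):
        # BFS from u; the zero-length path means u ->* u always holds.
        seen, frontier = {u}, deque([u])
        while frontier:
            n = frontier.popleft()
            if n == v:
                return True
            for m in self.succ[n] - seen:
                seen.add(m)
                frontier.append(m)
        return False

g = Digraph([("s", "a"), ("a", "b"), ("b", "a"), ("b", "t")])
print(g.reaches("s", "t"), g.reaches("t", "s"))  # True False
```

The edge a → b → a forms a cycle, so this graph is not acyclic; t, having no successors, reaches only itself.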

4 Instrumentation Point Graphs

In this section, we formally introduce the instrumentation point graph (IPG), which is the program model used in our hybrid MB WCET analysis framework; the final calculation stage therefore operates on this data structure [7, 8]. However, the principal motivation for its inclusion is that we utilise its properties in the WCET coverage metrics presented in section 5, particularly because of the impact of optimal instrumentation on instruction blocks, which we also discuss.

The construction of the IPG depends on a canonical representation of the CFG, which we term the CFG*. This intermediate representation is necessary in order to distinguish between instrumentation points and functional instructions in the program. In practice, instrumentation points could be arbitrarily inserted, causing basic blocks to be split, resulting in a number of basic sub-blocks. We denote the sets of instrumentation points and basic sub-blocks as I and B, respectively. Following are the definitions of the CFG* and the IPG:

Definition 1. A CFG* is a connected digraph C = (N, E, s, t); N = B ∪ I, where B is the set of basic sub-blocks and I is the set of instrumentation points, with {s, t} ⊆ I; E is the set {u → v | there is possible flow of control from u to v}.

Definition 2. An IPG is a connected digraph I = (I, F, s, t) which is constructed from a CFG* C = (N, E, s, t); F is the set {u → v | there exists a path u → b1 → b2 → ... → bn → v, bi ∈ B ∧ n ≥ 0}.

In figure 1, an example CFG has been optimally instrumented and the resultant IPG is shown. For the CFG in 1(a), an arbitrary spanning tree has been selected, noting that the direction of tree edges (those which are dashed) is conceptually ignored, whilst non-tree edges (those which are solid) carry the instrumentation points I1, I2, and I3. Also note that we have omitted depiction of the CFG* because it is straightforwardly determined from the CFG shown; we shall therefore refer to the CFG and the CFG* interchangeably where no ambiguity is caused. The IPG in 1(b) has edges labelled with instruction blocks; functional instructions therefore appear on edges and not on nodes.
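Definition 2 can be sketched directly: from each instrumentation point, search forward through basic sub-blocks only, emitting an IPG edge at the next instrumentation point reached. The CFG* below is a small hypothetical example, not the one in figure 1.

```python
# Sketch of Definition 2: an IPG edge u -> v exists when a path from u
# reaches v passing only through basic sub-blocks (a hypothetical CFG*).
def ipg_edges(succ, ipoints):
    edges = set()
    for u in ipoints:
        stack, seen = list(succ.get(u, ())), set()
        while stack:
            n = stack.pop()
            if n in seen:
                continue
            seen.add(n)
            if n in ipoints:
                edges.add((u, n))  # path u -> b1 -> ... -> bn -> n, n >= 0
            else:
                # a basic sub-block: keep searching past it
                stack.extend(succ.get(n, ()))
    return edges

succ = {"s": ["b1"], "b1": ["I1", "I2"], "I1": ["b2"], "b2": ["t"], "I2": ["b2"]}
print(sorted(ipg_edges(succ, {"s", "t", "I1", "I2"})))
```

The search stops at the first instrumentation point on each path, mirroring the requirement bi ∈ B in the definition: intermediate nodes of an IPG edge are never themselves instrumentation points.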

Figure 1. An example CFG and IPG. (a) Optimal instrumentation inserted into the CFG; dashed edges are spanning tree edges, solid edges are non-tree edges and hence carry instrumentation points I1, I2, and I3. (b) The resultant IPG, where edges have been labelled with instruction blocks; there are three iteration edges: I1 → I1, I2 → I1, and I3 → I1.

Instruction Blocks In previous hybrid MB WCET analysis approaches [6, 11], the entity for which a WCET is assumed known, the atomic unit of computation, is the basic block. Accordingly, the CFG and the AST can be used in the final calculation since they are based on these units. However, instrumentation coarser than at the beginning or end of each basic block, e.g. optimal instrumentation, forces an alternative atomic unit of computation because the WCET of individual basic blocks cannot be extracted after the test phase2. Our term for the program segment residing between a pair of instrumentation points is the instruction block. Following is the formal definition:

Definition 3. The instruction block of an edge u → v ∈ F, denoted IBu→v, is the regular expression over E that represents the set of all paths from u to v in C. For y, z ∈ N, y + z denotes selection, y · z denotes sequence, y* denotes iteration zero or more times (e.g. a for loop), and y+ denotes iteration one or more times (e.g. a do-while loop).

For notational convenience, and because we are specifically interested in which basic sub-blocks are executed between instrumentation points, we omit the start and end nodes u and v of IBu→v. For example, in the IPG of figure 1(b), each edge has been labelled with its respective instruction block. The figure also motivates the introduction of the instruction block; for instance, IBI1→I1, IBI1→I2, IBI1→I3, and IBI1→t all contain context-sensitive execution of basic sub-block d, thus, after testing, the WCET of d alone cannot be known.

Optimal instrumentation has several interesting effects on instruction blocks. The most obvious is that, for a CFG C = (N, E, s, t) and the corresponding CFG* C = (N, E, s, t), N − {s, t} = B; that is, basic blocks and basic sub-blocks are equivalent, because instrumentation points essentially create new basic blocks. The second is that each instruction block contains a unique sub-path of the CFG*, precisely because the entire traversed path can be regenerated from a trace file. That is, between each pair of instrumentation points, the exact sequence of functional instructions executed can be enumerated.

Observation 1. Using sub-optimal instrumentation, each instruction block IBu→v, u → v ∈ F, is a compound regular expression of the form u · b1 · b2 · ... · bn · v, bi ∈ B and n ≥ 0. Moreover, optimal instrumentation guarantees that n ≥ 1.

2 This is also a primary motivation for introducing the IPG

5 WCET Coverage

For any WCET analysis technique using measurements, testing plays a central role in the accuracy of the ensuing estimate: a weak testing strategy creates the possibility of underestimation. On the other hand, exhaustive techniques are impractical for most contemporary applications due to program size. To make testing tractable within hybrid MB WCET analysis, coverage criteria are required that guide the test harness to completion. In this domain, functional coverage metrics are often unsuitable since they consider not the temporal behaviour of the code but its functionality with respect to errors.

In this section, we introduce the notion of WCET coverage, which should be considered within hybrid MB WCET analysis. The crux of WCET coverage is to provide a quantitative assessment of the quality of testing with respect to the timing properties of the program, as opposed to its functionality alone. A myriad of software and hardware factors would need to be contemplated to fully address WCET coverage; for example, external hardware factors such as bus contention and shared memory. This paper does not enumerate all such issues, but concentrates on providing coverage for programs executing on hardware with pipelines, as this feature is commonplace in a wide range of microprocessors used in the embedded market. The central assumption of the criteria that we introduce is that the program has been optimally instrumented. This assumption is motivated by re-emphasising the fact that hardware debug interfaces optimally instrument a large class of programs. More important, however, is observation 1, which eases the measurement of coverage: covering each edge in the IPG implies covering each sub-path between instrumentation points, and hence every instruction once all edges are covered. By contrast, coarser instrumentation techniques would require greater support from testing and coverage because a number of instruction blocks would incorporate several sub-paths; the problems arising from such instrumentation techniques are considered beyond the scope of this paper. The stimulus for our work stems from the results published by Colin and Petters [11], who quantified the effect of micro-architectural speed-up features on WCET analysis through MB techniques. The noteworthy characteristic of their study is that the effect of hardware features is observed. The essence of our approach is to try to provoke the negative impact of pipelines and provide feedback about (sequences of) instruction blocks which require further testing.
For these purposes, properties of the instructions (which comprise the instruction block) are analysed with respect to a high-level view of the hardware. That is, the hardware is not modelled, but knowledge of the properties which affect the execution time, e.g. pipeline depth, is required since they determine the amount of coverage needed. This is based on the intuitive notion that, for example, potential pipeline stalls caused by long-running floating-point instructions are only an issue when contention for pipeline resources can actually occur. In essence, we would require less coverage when no such contention could exist, either because the program contains no floating-point instructions, or because a pipeline is absent. In section 5.1, an overview of pipelines and their properties, including those pertaining to WCET analysis, is presented. Next, in section 5.2, we detail the inadequacy of current functional coverage with respect to timing analysis, particularly focusing on MC/DC since it is widely used in the avionics industry. We then introduce WCET criteria for pipelined architectures in section 5.3, which use properties of the IPG to measure coverage, and thus provide a stopping rule for sufficient testing. Finally, in section 5.4, we discuss some of the matters to be considered in fulfilling these criteria, such as problems arising from infeasible paths.

5.1 Pipelines and Their Properties

The almost mandatory inclusion of pipelines in hardware has been noted above and elsewhere [24]. Pipelining is a technique that permits multiple instructions to be in flight simultaneously by exploiting the fact that an instruction must pass through multiple stages to complete execution. In a simple RISC architecture, these stages are typically:

1. IF: fetch the next instruction.
2. ID: decode the instruction, and read the register sources from the register file.
3. EX: execute the necessary ALU operation.
4. MEM: access memory for load and store instructions.
5. WB: write the computed results back into the register file.

A pipeline thus consists of a number of stages - its depth - through which each instruction must pass, and it allows several instructions to occupy independent stages. Each instruction normally progresses to the next stage on every clock cycle, although an instruction need not pass through every stage. The instruction latency is the number of cycles taken to pass through the pipeline. The idealised latency of an instruction is the pipeline depth, but this is hindered by hazards: a data hazard occurs when an instruction depends on data computed by a previous instruction that is still in execution; a structural hazard occurs when there is contention amongst instructions for functional units, e.g. the ALU, or read and write ports. Hazards lead to stalls, whereby instructions cannot progress through the pipeline and prevent other instructions from being issued. Pipelines are often categorised according to the number of instructions issued and the way in which allocation to the functional units occurs. An in-order pipeline allocates instructions in the order in which they appear, whereas an out-of-order pipeline allows instructions to execute before previous instructions have completed. A scalar pipeline issues at most one instruction on each clock cycle and permits instructions to fork to different functional units. A superscalar pipeline dynamically issues multiple instructions on each clock cycle, whereas a very long instruction word (VLIW) architecture achieves this statically. A common need in a scalar pipeline arises for programs executing floating-point instructions, since these usually require multiple cycles to complete execution. To overcome potential, and unnecessary, stalls of integer instructions, a separate floating-point functional unit (FP) is employed to handle their EX stage. In this situation, both integer and floating-point instructions progress through IF and ID, but their execution then forks to EX and FP, respectively.
It is also possible for these functional units themselves to be pipelined, due to their inherently longer latency, thus preventing floating-point instructions from stalling each other. The initiation interval is the number of cycles that must elapse between issuing two operations of a given type [24].

                                    Clock Cycle
Instruction       1    2    3    4     5      6    7    8    9
sub  r1, r2, r3   IF   ID   EX   MEM   WB
load r4, 0(r3)         IF   ID   EX    MEM    WB
add  r6, r4, r5             IF   ID    stall  EX   MEM  WB
add  r8, r2, r7                  IF    stall  ID   EX   MEM  WB

Table 1. A pipeline diagram for several instructions on a typical load-store architecture. The execution of instruction add r6, r4, r5 is stalled due to a data hazard since the value of r4 is loaded by the previous instruction. For a scalar pipeline with a single ALU allocating in-order, all instructions after add r6, r4, r5 are also stalled; however, each instruction before add r6, r4, r5 progresses as normal to clear the stall.

A pipeline diagram is a useful pictorial aid that represents the state of the pipeline, i.e. which instructions are currently executing, through consecutive clock cycles. Table 1 shows a pipeline diagram for four instructions executing on an in-order scalar pipeline. In this example, a pipeline stall prevents instruction add r6, r4, r5 from entering the EX stage in clock cycle 5, and instruction add r8, r2, r7 is also stalled. However, the hazard clears on the next cycle, allowing execution to continue. To obtain a safe and accurate WCET estimate in the presence of pipelines, the timing effect over basic block boundaries must be contemplated, since the effect within a basic block is easily determined. In many cases, there is a relative speed-up in the execution time of consecutive basic blocks, in comparison to their individual execution, due to the inherent overlapping of the pipeline. However, structural and data hazards between basic blocks can produce an increase in execution time, a so-called positive timing effect.
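The stall accounting of Table 1 can be reproduced by a deliberately minimal model that only counts load-use bubbles on an in-order scalar five-stage pipeline. The sketch below is our own simplification, not a model from the paper: instructions are encoded as hypothetical (destination, sources, is_load) tuples, forwarding paths are folded into the single-bubble rule, and structural hazards are ignored.

```python
# Sketch: compute the write-back cycle of each instruction on an in-order
# scalar 5-stage pipeline (IF ID EX MEM WB), inserting a one-cycle bubble
# on a load-use hazard. A bubble delays every subsequent instruction.
DEPTH = 5  # pipeline depth: the idealised instruction latency

def writeback_cycles(instrs):
    cycles, bubbles = [], 0
    for i, (dest, srcs, is_load) in enumerate(instrs):
        if i > 0:
            pdest, _, pload = instrs[i - 1]
            if pload and pdest in srcs:   # value loaded by previous instr
                bubbles += 1              # stall propagates downstream
        cycles.append(DEPTH + i + bubbles)
    return cycles

prog = [
    ("r1", ("r2", "r3"), False),  # sub  r1, r2, r3
    ("r4", ("r3",),      True),   # load r4, 0(r3)
    ("r6", ("r4", "r5"), False),  # add  r6, r4, r5  -- stalls one cycle
    ("r8", ("r2", "r7"), False),  # add  r8, r2, r7  -- stalled behind it
]
print(writeback_cycles(prog))  # -> [5, 6, 8, 9], matching table 1
```

Even this crude model exposes why the effect over block boundaries matters: the bubble raised in one instruction delays everything issued after it.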

Engblom [15] has elaborated on the most significant properties of pipelines with respect to the WCET, which we summarise as follows:

• For single in-order pipelines with data dependences only between adjacent instructions, no positive timing effect can occur. However, positive timing effects can occur if there are data dependences between non-adjacent instructions or the pipeline supports multiple functional units.

• There does not exist an upper bound on the length of a sequence of instructions causing a timing effect. Therefore, positive timing effects can occur over an arbitrarily long sequence of instructions.

5.2 Inadequacy of Functional Coverage

MC/DC forms part of the DO-178B standard [21] for avionic software, and is employed to overcome the weaknesses of condition/decision coverage (C/DC) and multiple condition coverage (M-CC). For the former, each decision outcome and each condition outcome, i.e. either true (T) or false (F), needs to be toggled; however, it does not ensure decision coverage of object code [9]. For the latter, each possible combination of conditions must be evaluated; however, the number of test cases required grows exponentially since a decision composed of n conditions requires 2ⁿ test cases. MC/DC overcomes these issues by demanding that each condition is shown to independently affect the outcome of the decision. To construct test cases for MC/DC, a logic-based approach can be used, which considers a decision as the outcome of a logic gate (and, or, xor, and not) such that each condition is an input to the gate [21]. For example, satisfying MC/DC for an n-input and gate requires n + 1 test cases since:

• All inputs must be set to T and the outcome observed to be T.
• Each input must be exclusively set to F (whilst the other inputs remain T) and the outcome observed to be F; this shows the independent effect of each condition.

 1  void foo(int i, int j, int k)
 2  {
 3      float u, v, w, x;
 4
 5      u = 10.5;
 6      v = 1.5;
 7
 8      if (i > 5 && k < 50)    // Decision One
 9          w = u * v;
10      if (i > 20 && j > 100)  // Decision Two
11          x = u * v;
12  }

Listing 1. Example C program

We will show that MC/DC is not sufficient for timing analysis purposes by constructing test cases, according to the logic-based approach, for the synthetic program shown in listing 1. The functionality of the program is straightforward: perform two floating-point multiplications depending on the outcome of the two decisions on lines 8 and 10, which are denoted D1 and D2, respectively. Both D1 and D2 are composed of two conditions joined by the boolean operator and; the conditions i > 5, k < 50, i > 20, and j > 100 are denoted C1,1, C1,2, C2,1, and C2,2,
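The n + 1 test cases for an n-input and gate can be generated mechanically. The sketch below is an illustrative rendering of the logic-based approach described above, not tooling from the paper:

```python
# Sketch: generate the n+1 MC/DC test cases for an n-input AND gate.
# Each case pairs a tuple of condition truth values with the expected
# decision outcome.
def mcdc_and_cases(n):
    cases = [((True,) * n, True)]                 # all inputs T -> T
    for i in range(n):                            # toggle each input to F
        inputs = tuple(j != i for j in range(n))  # input i is F, rest T
        cases.append((inputs, False))             # independent effect of i
    return cases

# Each decision in listing 1 is a 2-input AND, so 3 cases per decision.
for inputs, outcome in mcdc_and_cases(2):
    assert all(inputs) == outcome                 # the gate agrees
print(len(mcdc_and_cases(2)))  # -> 3
```

Applied to D1 and D2 independently, this yields the six test cases of table 2 (the two all-T cases of the table's columns 1 and 4 plus the four single-F cases).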

respectively. The values of i, j, and k, on which the outcomes of these conditions depend, are assumed to be supplied by a test vector (i, j, k).

Figure 2. The CFG and the IPG generated from the example program in listing 1. (a) The generated CFG with basic block labels adjacent to nodes. Thick and dashed edges highlight an outcome of T and F, respectively, of a condition; solid edges represent a linear code sequence. I4, I5, I6, and I7 are optimal instrumentation points, where I8 and I9 are potentially added. (b) The resultant IPG, where edges have been labelled with instruction blocks. The path s → I4 → t contains the floating-point operations in basic blocks o and r, respectively.

The CFG of this program is shown in figure 2(a), where basic blocks are annotated with source-level statements, and basic block labels are adjacent. For conciseness, we assume that the declaration and initialisation (where appropriate) of the variables u, v, w, and x occur in basic block m. With respect to the structure of the CFG, we assume that the compiler short-circuits decisions; for example, if the condition i > 5 in basic block m is F, then the flow of control falls through to basic block p, and hence there is no requirement to evaluate the condition k < 50 in basic block n. A source-level statement in basic block z is absent because it represents the end of the program. Table 2 shows the test cases required to achieve MC/DC for D1 and D2 according to the logic-based method. Table 3 shows a set of test vectors that fulfil these test cases, and also highlights the CFG path traversed on executing the program with the respective test vector. Note that none of these test vectors exercises the path m → n → o → p → q → r → z. This exclusion could be problematic in an architecture with a scalar pipeline that has separate functional units serving integer and floating-point instructions. In such an architecture, the operation w = u ∗ v (in basic block o) could cause a long-running timing effect by occupying FP and preventing the operation x = u ∗ v (in basic block r) from entering that stage, i.e. a stall occurs. Therefore, the WCET of basic block r could still be undecided after MC/DC has

been satisfied, depending on which architectural speed-up features are present, and how these features are implemented.

                                Test Case Number
        1            2            3            4             5             6
C1,1    T (i > 5)    T (i > 5)    F (i ≤ 5)    -             -             -
C1,2    T (k < 50)   F (k ≥ 50)   T (k < 50)   -             -             -
D1      T            F            F            -             -             -
C2,1    -            -            -            T (i > 20)    T (i > 20)    F (i ≤ 20)
C2,2    -            -            -            T (j > 100)   F (j ≤ 100)   T (j > 100)
D2      -            -            -            T             F             F

Table 2. Test cases required to satisfy MC/DC for the CFG in figure 2 according to the logic-based approach.

Test Vector      Test Case Satisfied   Path
(6, 0, 49)       1                     s → m → n → o → p → z → t
(5, 0, 49)       2                     s → m → p → z → t
(6, 0, 50)       3                     s → m → n → p → z → t
(21, 101, 100)   4                     s → m → n → p → q → r → z → t
(20, 101, 100)   5                     s → m → n → p → z → t
(21, 100, 100)   6                     s → m → n → p → q → z → t

Table 3. Test vectors fulfilling the test cases in table 2.

Although it is clear that a slight modification to the test vectors would exercise the problem path whilst still achieving MC/DC, it should be underlined that coverage criteria merely serve as a guide to the quality of testing: they do not dictate how testing is to be implemented. The underlying problem of functional coverage is that architectural state is not considered. In the case of MC/DC, the timing effect between independent decisions is not always captured. Worse yet, this finding extends even to the most demanding functional criterion, path coverage: achieving full path coverage does not imply that the WCET of basic blocks can be deduced, since the initial architectural state must also be considered. To completely satisfy WCET coverage would therefore require full state coverage, which is clearly impractical.
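The omission can be checked concretely by replaying the two short-circuited decisions of listing 1 and recording the CFG path each test vector induces. The function below is our own illustrative encoding; node names follow figure 2(a).

```python
# Sketch: return the CFG path of figure 2(a) taken by test vector (i, j, k),
# assuming short-circuit evaluation of both decisions in listing 1.
def cfg_path(i, j, k):
    path = ["s", "m"]
    if i > 5:                       # C1,1 evaluated in basic block m
        path.append("n")
        if k < 50:                  # C1,2 evaluated in basic block n
            path.append("o")        # w = u * v
    path.append("p")
    if i > 20:                      # C2,1 evaluated in basic block p
        path.append("q")
        if j > 100:                 # C2,2 evaluated in basic block q
            path.append("r")        # x = u * v
    path += ["z", "t"]
    return path

# The six MC/DC vectors of table 3.
vectors = [(6, 0, 49), (5, 0, 49), (6, 0, 50),
           (21, 101, 100), (20, 101, 100), (21, 100, 100)]
covered = {tuple(cfg_path(*v)) for v in vectors}
hazard = ("s", "m", "n", "o", "p", "q", "r", "z", "t")
print(hazard in covered)                        # -> False: missed by MC/DC
print(hazard == tuple(cfg_path(21, 101, 49)))   # -> True: one vector suffices
```

A vector such as (21, 101, 49), which makes both decisions true, would exercise the problem path while the original six still satisfy MC/DC.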

5.3 Pipeline Coverage

In the previous section, we presented a motivating example that demonstrated the insufficiency of functional coverage for a simple architecture with a pipeline. In this section, we introduce pipeline coverage metrics which incorporate information about the hardware, and thus differ from functional metrics. A core assumption of these metrics is that the hardware is in a predictable state before testing commences. This does not mean that we assume the worst possible initial state, since that is generally very difficult to determine. It should also be stressed that these metrics do not guarantee that the WCET of instruction blocks is acquired on completion, just as satisfying functional metrics does not guarantee correct functionality. The metrics that we propose are motivated by observation 1: there is a unique sub-path of the CFG* between each pair of instrumentation points in the IPG. Due to this property, a very basic requirement is to cover each instruction block in order to partially consider the effects of the pipeline, i.e. the overlap, on this unique sub-path. We define this metric as simple pipeline coverage:

Definition 4. A test set T satisfies simple pipeline coverage (SPC) whenever there is at least one test case in T that exercises each u → v ∈ F.

The key word here is "partially" because the effect of neighbouring instruction blocks is not necessarily captured. The natural extension to this simple criterion, therefore, is to try to observe the impact between adjacent instruction blocks. We define this metric as pairwise pipeline coverage, which subsumes SPC³:

Definition 5. Let X = ⋃u∈I−{s,t} {(pi → u, u → sj) | pi ∈ pred(u) ∧ sj ∈ succ(u)}. A test set T satisfies pairwise pipeline coverage (PPC) whenever there is at least one test case in T that exercises each x ∈ X.

Node   Test cases
s      ∅
I4     {(s → I4, I4 → I6), (s → I4, I4 → I7), (s → I4, I4 → t)}
I5     {(s → I5, I5 → I6), (s → I5, I5 → I7), (s → I5, I5 → t)}
I6     {(I4 → I6, I6 → t), (I5 → I6, I6 → t)}
I7     {(I4 → I7, I7 → t), (I5 → I7, I7 → t)}
t      ∅
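Definition 5's requirement set X is mechanical to compute from the IPG's adjacency structure. The sketch below assumes the edge set of figure 2(b), including the edges I4 → t and I5 → t; the edge-list encoding is our own illustrative choice.

```python
# Sketch: build the pairwise pipeline coverage requirement X of
# Definition 5 from an IPG given as a list of directed edges.
def ppc_pairs(edges, s, t):
    preds, succs = {}, {}
    for u, v in edges:
        succs.setdefault(u, []).append(v)
        preds.setdefault(v, []).append(u)
    X = set()
    for u in {n for e in edges for n in e} - {s, t}:
        for p in preds.get(u, []):          # each incoming edge p -> u
            for q in succs.get(u, []):      # paired with each u -> q
                X.add(((p, u), (u, q)))
    return X

# Edge set assumed for the IPG of figure 2(b).
edges = [("s", "I4"), ("s", "I5"), ("I4", "I6"), ("I4", "I7"), ("I4", "t"),
         ("I5", "I6"), ("I5", "I7"), ("I5", "t"), ("I6", "t"), ("I7", "t")]
X = ppc_pairs(edges, "s", "t")
print(len(X))  # -> 10 adjacent-edge pairs, as enumerated in table 4
```

The ten pairs correspond to the rows for I4, I5, I6, and I7 in table 4; s and t contribute none, matching the footnoted caveat about a direct s → t edge.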

Table 4. Test cases required to satisfy pairwise pipeline coverage for the IPG of figure 2.

For example, consider the IPG in figure 2(b), which has been generated from the optimally instrumented CFG in figure 2(a). To satisfy PPC for this IPG requires the test cases shown in table 4. In particular, note that the test case (s → I4, I4 → t) will cover the potentially problematic path m → n → o → p → q → r → z, which is missed by MC/DC. However, a weakness of PPC is that it is highly dependent on the placement of instrumentation points - whether they be passive or active - because each instruction block is likely to have different properties: varying numbers of basic blocks, varying numbers of instructions, and varying types of instructions. Moreover, it does not attempt to cover sources of pipeline contention between instruction blocks which extend over more than a pair of IPG edges. Slight modifications to the instrumentation profile, i.e. additional instrumentation points or a rearrangement of their positions, could prevent coverage of the most important pipeline contentions, even after satisfying PPC. For example, if the two additional instrumentation points I8 and I9 are inserted into the CFG of figure 2(a) in the locations shown, PPC does not guarantee coverage of the problematic path. We propose a stronger metric - pipeline hazard path coverage - that is designed to decouple the specific instrumentation point placement from the coverage criterion whilst still using the IPG to construct test cases, and hence measure coverage. In essence, test cases are fine-tuned with the program and the employed pipeline in mind. To achieve this, a high-level analysis of the pipeline is performed, which does not construct a model, but instead uses its properties to examine sources of hazards, i.e. data and structural, between instructions residing in different instruction blocks. The analysis identifies pipeline hazard paths in the IPG, which we define as follows:

Definition 6. A pipeline hazard path p : v0 → v1 →∗ vm−1 → vm, where each vi → vi+1 ∈ F, is one whose execution leads to a potential pipeline stall, which occurs due to overlapping instructions in IBv0→v1 and IBvm−1→vm.

³ The subsumes relation will not hold if the edge s → t exists, since it does not have any adjacent edges. However, this can easily be handled by adding a dummy edge d → s (or t → d), and including a dummy test case (d → s, s → t) (or (s → t, t → d)).


In order to identify pipeline hazard paths, the pipeline depth is needed to determine whether instructions with a potential hazard can be in flight simultaneously. This assumes that instruction latency is equal to the pipeline depth, and essentially neglects timing effects over arbitrary sequences of instructions, simply because considering these would require a detailed model. Intuitively, we want to cover possible hazards for instructions which are known to overlap according to their idealised latency, since this provides a lower bound on the time an instruction occupies pipeline resources⁴. Identifying an accurate set of pipeline hazard paths requires additional knowledge of the pipeline, especially regarding the functional units. For instance, it is typical of some pipelines to have separate floating-point functional units. However, these pipelines can also provide support for particular floating-point operations, i.e. addition, multiplication, and division, by providing extra functional units for these purposes. This knowledge could be used to prune superfluous pipeline hazard paths which might arise when assuming that all floating-point instructions contend for the same resource. An equally important parameter is the initiation interval of these functional units, especially when it is long, because it is then possible to conjecture that stalls can propagate beyond an instruction's idealised latency. Pipeline hazard paths already identified can then be concatenated to produce longer sequences which require execution. Using the high-level view of the pipeline, the next stage is to identify structural and data hazards among instructions through static analysis of the CFG. Structural hazards of functional units are determined by analysing the instructions according to their type, and using the pipeline depth (of the appropriate functional unit) to decide whether instructions will contend for resources.
Similarly, data hazards are determined by examining whether there are dependences between instructions within the pipeline depth. This information is then mapped onto the IPG to identify the pipeline hazard paths that require execution to satisfy pipeline hazard path coverage, which is defined as follows:

Definition 7. Let P be the set of all pipeline hazard paths. A test set T satisfies pipeline hazard path coverage (PHPC) whenever there is at least one test case in T that exercises each p ∈ P.

To illustrate this metric, we again refer to the CFG and IPG of figure 2. Let us assume that, after analysis of the pipeline, it is discovered that the floating-point operations in basic blocks o and r can overlap, and hence compete for the same resource. Therefore, a test case is required for the path s → I4 → t, which is equivalent to the test case (s → I4, I4 → t) generated by PPC. However, the essential difference is that, if instrumentation points I8 and I9 are added, PHPC will require that the path s → I4 → I8 → I9 → t be exercised. On the other hand, PPC would require the test cases (s → I4, I4 → I8), (I4 → I8, I8 → I9), and (I8 → I9, I9 → t), thus potentially missing the timing effect. Note, however, that PHPC and PPC are complementary criteria. On the one hand, the goal of PPC is to blindly force the impact of the pipeline between sub-paths of the CFG according to the structure afforded by instrumentation points, and it captures many local effects with well-positioned, coarse instrumentation. On the other hand, the need for PHPC increases with denser instrumentation because positive timing effects caused by long-running instructions might not be captured by PPC.
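A crude approximation of the hazard identification just described can be sketched as follows: an IPG edge sequence is flagged as a pipeline hazard path when floating-point instructions in its first and last instruction blocks lie closer together than the FP unit's occupancy window. The opcode-class encoding of instruction blocks and the fp_window parameter are hypothetical stand-ins for the pipeline properties discussed above; real structural-hazard analysis would distinguish functional units and use their initiation intervals.

```python
# Sketch: decide whether an IPG edge sequence is a pipeline hazard path,
# i.e. whether an FP instruction in the first instruction block can still
# occupy the FP unit when one in the last block tries to enter it.
def is_hazard_path(path_edges, blocks, fp_window):
    instrs = [op for e in path_edges for op in blocks[e]]
    fp = [i for i, op in enumerate(instrs) if op == "fp"]
    head = len(blocks[path_edges[0]])                 # end of first block
    tail = len(instrs) - len(blocks[path_edges[-1]])  # start of last block
    return any(b - a < fp_window                      # close enough to contend
               for a in fp if a < head
               for b in fp if b >= tail)

# Toy instruction blocks for the IPG edges of figure 2(b): the fp entries
# stand for the multiplications in basic blocks o and r.
blocks = {("s", "I4"): ["int", "fp", "int"],
          ("I4", "t"): ["int", "fp", "int"]}
print(is_hazard_path([("s", "I4"), ("I4", "t")], blocks, fp_window=4))
# -> True: s -> I4 -> t must be exercised to satisfy PHPC
```

With a shorter window (say 2) the same sequence is not flagged, illustrating how the pipeline's properties prune superfluous hazard paths.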

5.4 Discussion

In WCET analysis, information regarding infeasible paths can tighten the final estimate. These are paths whose execution is possible according to the structure of the CFG, but infeasible when considering the semantics of the code. It is clear that attainment of any of the three criteria presented (SPC, PPC, or PHPC) might sometimes be hindered by infeasible paths in the CFG, because there might not be any test vector that executes the desired sequence of IPG edges. For example, in the IPG of figure 1(b), the edge s → t corresponds to executing the body of the for loop (in the CFG of figure 1(a)) zero times. Depending on the way in which the loop exit condition is evaluated, it might not be possible to execute this edge. There are several options for dealing with this issue. The first is to feed infeasible path information, acquired from static analysis [20], into the design of test cases for the relevant criteria, simply by disallowing those which attempt to exercise an infeasible path. The merit of this approach is the capability to capture quite complex infeasible path information in the IPG. However, its disadvantage is the complexity arising from the disambiguation of memory references for conditionals utilising pointers. The second is to utilise knowledge from the initial functional test phase, and assume any IPG edge not executed is infeasible, so that the edge is conceptually removed. For example, the potentially problematic IPG edge s → t in figure 1(b) could be removed if the for loop body is never executed zero times during functional testing. The deficiency is that this information applies only to individual IPG edges, and not to sequences of IPG edges as sometimes required. Naturally, the third possibility is to determine infeasible sequences of IPG edges from trace parsing, which is essentially an extension of the second. This assumes functional testing is adequate, which might not be the case. The fourth option, in tune with the sentiment of hybrid MB WCET analysis, is to use multiple sources of evidence. For instance, after trace parsing it could be hypothesised that a particular sequence of IPG edges is infeasible, which is then verified by an appropriate static analysis technique; clearly, the opposite direction is also possible.

⁴ This again highlights an essential property of a usable coverage criterion: due to tractability concerns, the worst-case situation is (possibly) not covered.

6 Conclusions and Future Work

In this paper, we have introduced the notion of WCET coverage, which must be considered in hybrid MB WCET analysis methods. The motivation for WCET coverage stems from the shortcomings of functional criteria, which do not consider temporal issues, particularly the impact of hardware. We have considered WCET coverage for hardware that contains pipelines due to their seamless integration into modern embedded CPU design. The metrics that we have introduced assume an optimal instrumentation mechanism because of particular properties of our program model, the instrumentation point graph (IPG). We have presented simple pipeline coverage (SPC) to cover the pipeline effect on each IPG edge, and pairwise pipeline coverage (PPC), which subsumes SPC, to try to cover the impact between adjacent IPG edges. However, a stronger criterion is sometimes necessary because of the dependence on instrumentation point placement when attempting to cover positive timing effects due to pipeline stalls. For these purposes, pipeline hazard path coverage (PHPC) exploits properties of the pipeline and the program to identify sub-paths of the IPG that should be executed. There are several open issues which are the focus of future work. First, we intend to evaluate PPC and PHPC on processors with different pipeline configurations. Second, we intend to extend the work presented here to account for the impact of other hardware features, especially caches and branch prediction units. Third, we intend to enumerate the issues of hardware and software that must be considered in order to satisfy WCET coverage.


References [1] H. Agrawal. Dominators, Super Blocks, and Program Coverage. In Proceedings of the ACM SIGPLAN-SIGACT symposium on Principles of Programming Languages, January 1994. 5 [2] A. Aho, R. Sethi, and J. Ullman. Compilers: Principle, Techniques and Tools. AddisonWesley, 1986. 4 [3] T. Ball and J. R. Larus. Optimally Profiling and Tracing Programs. In Proceedings of the ACM SIGPLAN-SIGACT symposium on Principles of Programming Languages, February 1992. 3, 5 [4] I. Bate and R. Reutemann. Worst-case Execution Time Analysis for Dynamic Branch Predictors. In Proceedings of the Euromicro Conference of Real-Time Systems, July 2004. 4 [5] W. G. Bently and E. F. Miller. Ct coverage - initial results. Software Quality Journal, 2(1):29–47, March 1993. 3, 7 [6] G. Bernat, A. Colin, and S. M. Petters. WCET Analysis of Probabilistic Hard Real-Time Systems. In Proceedings of the Real-Time Systems Symposium, December 2002. 2, 5, 8 [7] A. Betts and G. Bernat. Tree-Based WCET Analysis on Instrumentation Point Graphs. In Proceedings of the International Symposium on Object and component-oriented Real-time distributed Computing, April 2006. 3, 5, 7 [8] A. Betts and G. Bernat. WCET Analysis on Irreducible Instrumentation Point Graphs using IPET-based Techniques. Submitted for publication at Real-Time Systems Symposium 2006, May 2006. 7 [9] J. J. Chilenski and S. P. Miller. Applicability of modified condition/decision coverage to software testing. Software Engineering Journal, 9(5):193–200, September 1994. 3, 6, 12 [10] A. Colin and G. Bernat. Tree-Based WCET Analysis on Instrumentation Point Graphs. In Proceedings of the Euromicro Conference of Real-Time Systems, July 2002. 3 [11] A. Colin and S. M. Petters. Experimental Evaluation of Code Properties for WCET Analysis. In Proceedings of the Real-Time Systems Symposium, December 2003. 3, 4, 5, 8, 10 [12] A. Colin and I. Puaut. Worst-Case Execution Time Analysis for Processors with Branch Prediction. 
Real-Time Systems, 18(2-3):249–274, May 2000. 4 [13] A. Colin and I. Puaut. A Modular & Retargetable Framework for Tree-based WCET Analysis. In Proceedings of the Euromicro Conference of Real-Time Systems, July 2001. 3 [14] ARM development tools. http://www.arm.com, August 2006. 3, 5 [15] J. Engblom. Processor Pipelines and Static Worst-Case Execution Time Analysis. PhD thesis, Uppsala University, April 2002. 4, 12 [16] A. Ermedahl and J. Gustafsson. Deriving Annotations for Tight Calculation of Execution Time. In Proceedings of the International Euro-Par Conference on Parallel Processing, August 1997. 4 18

[17] C. Ferdinand, F. Martin, and R. Wilhelm. Applying Compiler Techniques to Cache Behavior Prediction. In Proceedings of the ACM SIGPLAN Workshop on Language, Compiler and Tool Support for Real-Time Systems, June 1997.
[18] The Nexus 5001 Forum. http://www.nexus5001.org, August 2006.
[19] G. Frantz. Digital Signal Processor Trends. IEEE Micro, 20(6):52–59, November 2000.
[20] J. Gustafsson, A. Ermedahl, and B. Lisper. Algorithms for Infeasible Path Calculation. In Proceedings of the International Workshop on Worst-Case Execution Time Analysis, July 2006.
[21] K. J. Hayhurst, D. S. Veerhusen, J. J. Chilenski, and L. K. Rierson. A Practical Tutorial on Modified Condition/Decision Coverage. Technical Report NASA/TM-2001-210876, NASA, May 2001.
[22] C. A. Healy, R. D. Arnold, F. Mueller, D. B. Whalley, and M. G. Harmon. Bounding Pipeline and Instruction Cache Performance. IEEE Transactions on Computers, 48(1):53–70, January 1999.
[23] C. A. Healy, M. Sjödin, V. Rustagi, and D. Whalley. Bounding Loop Iterations for Timing Analysis. In Proceedings of the Real-Time Technology and Applications Symposium, June 1998.
[24] J. Hennessy and D. A. Patterson. Computer Architecture: A Quantitative Approach. Morgan Kaufmann Publishers, 2003.
[25] S-K. Kim, R. Ha, and S. L. Min. Analysis of the Impacts of Overestimation Sources on the Accuracy of Worst Case Timing Analysis. In Proceedings of the Real-Time Systems Symposium, December 1999.
[26] R. Kirner, I. Wenzel, B. Rieder, and P. Puschner. Using Measurements as a Complement to Static Worst-Case Execution Time Analysis. Intelligent Systems at the Service of Mankind, 2:205–226, January 2006.
[27] J. R. Larus. Efficient Program Tracing. IEEE Computer, 26(5):52–61, May 1993.
[28] Y-T. S. Li and S. Malik. Performance Analysis of Embedded Software Using Implicit Path Enumeration. In Proceedings of the ACM/IEEE Conference on Design Automation, June 1995.
[29] C. L. Liu and J. W. Layland. Scheduling Algorithms for Multiprogramming in a Hard-Real-Time Environment. Journal of the Association for Computing Machinery, 20(1):46–61, January 1973.
[30] T. Lundqvist and P. Stenström. Timing Anomalies in Dynamically Scheduled Microprocessors. In Proceedings of the Real-Time Systems Symposium, December 1999.
[31] F. Mueller. Timing Predictions for Multi-Level Caches. In Proceedings of the ACM SIGPLAN Workshop on Language, Compiler, and Tool Support for Real-Time Systems, June 1997.
[32] F. Mueller. Timing Analysis for Instruction Caches. Real-Time Systems, 18(2-3):217–247, May 2000.
[33] C. Park and A. C. Shaw. Experiments With A Program Timing Tool Based on Source-Level Timing Schema. IEEE Computer, 24(5):48–57, May 1991.

[34] S. M. Petters, A. Betts, and G. Bernat. A New Timing Schema for WCET Analysis. In Proceedings of the International Workshop on Worst-Case Execution Time Analysis, June 2004.
[35] P. Puschner. Is Worst-Case Execution-Time Analysis a Non-Problem? - Towards New Software and Hardware Architectures. In Proceedings of the Euromicro International Workshop on WCET Analysis, June 2002.
[36] P. Puschner and C. Koza. Calculating the Maximum Execution Time of Real-Time Programs. Real-Time Systems, 1(2):159–176, September 1989.
[37] P. Puschner and A. V. Schedl. Computing Maximum Task Execution Times - A Graph-Based Approach. Real-Time Systems, 13(1):67–91, July 1997.
[38] J. Schneider and C. Ferdinand. Pipeline Behaviour Prediction for Superscalar Processors by Abstract Interpretation. In Proceedings of the ACM SIGPLAN Workshop on Languages, Compilers, and Tools for Embedded Systems, May 1999.
[39] F. Stappert, A. Ermedahl, and J. Engblom. Efficient Longest Executable Path Search for Programs with Complex Flows and Pipeline Effects. In Proceedings of the International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, November 2001.
[40] M. M. Tikir and J. K. Hollingsworth. Efficient Instrumentation for Code Coverage Testing. In Proceedings of the International Symposium on Software Testing and Analysis, July 2002.
[41] K. W. Tindell, A. Burns, and A. J. Wellings. Mode Changes in Priority Preemptively Scheduled Systems. In Proceedings of the Real-Time Systems Symposium, December 1992.
[42] N. Tracey. A Search-Based Automated Test-Generation Framework for Safety-Critical Software. PhD thesis, University of York, July 2000.
[43] J. Wegener and M. Grochtmann. Verifying Timing Constraints of Real-Time Systems by Means of Evolutionary Testing. Real-Time Systems, 15(3):275–298, November 1998.
[44] J. Wegener and F. Mueller. A Comparison of Static Analysis and Evolutionary Testing for the Verification of Timing Constraints. Real-Time Systems, 21(3):241–268, November 2001.
[45] R. T. White, F. Mueller, C. A. Healy, D. B. Whalley, and M. G. Harmon. Timing Analysis for Data Caches and Set-Associative Caches. In Proceedings of the Real-Time Technology and Applications Symposium, June 1997.
[46] R. T. White, F. Mueller, C. A. Healy, D. B. Whalley, and M. G. Harmon. Timing Analysis for Data and Wrap-Around Fill Caches. Real-Time Systems, 17(2-3):209–233, November 1999.
[47] M. R. Woodward and M. A. Hennell. On the Relationship between Two Control-Flow Coverage Criteria: All JJ-Paths and MCDC. Information and Software Technology, 48(7):433–440, July 2006.
[48] H. Zhu, P. A. V. Hall, and J. H. R. May. Software Unit Test Coverage and Adequacy. ACM Computing Surveys, 29(4):366–427, December 1997.
