Accurate Software Performance Estimation Using Domain Classification and Neural Networks

Márcio Seiji Oyamada 1,2
[email protected]

Felipe Zschornack 1
[email protected]

Flávio Rech Wagner 1
[email protected]

1 Instituto de Informática, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil

2 Centro de Ciências Exatas e Tecnológicas, Universidade Estadual do Oeste do Paraná, Cascavel, Brazil

ABSTRACT

For the design of an embedded system, there is a variety of available processors, each one offering a different trade-off concerning factors such as performance and power consumption. High-level performance estimation of the embedded software implemented on a particular architecture is essential for a fast design space exploration, including the choice of the most appropriate processor. However, advanced architectures present many features, such as deep pipelines, branch prediction mechanisms, and cache sizes, that have a non-linear impact on the execution time, which becomes hard to evaluate. In order to cope with this problem, this paper presents a neural network based approach for high-level performance estimation, which easily adapts to the non-linear behavior of the execution time in such advanced architectures. A method for automatic classification of applications is proposed, based on topological information extracted from the control flow graph of the application, enabling the utilization of domain-specific estimators and thus resulting in more accurate estimates. Practical experiments on a variety of benchmarks show estimation results with a mean error of 6.41% and a maximum error of 32%, which is more precise than previous work based on linear and non-linear approaches.

Categories and Subject Descriptors: C.3 [Computer Systems Organization]: Special-Purpose and Application-Based Systems: Real-time and embedded systems

General Terms: Measurement, Performance, Design

Keywords: Performance estimation, embedded software, neural networks

1. INTRODUCTION

The existence of various architectures presenting different trade-offs concerning factors such as performance and power consumption allows the exploration of a large design space


for the choice of the most adequate processor to execute a given embedded application. The evaluation of the execution time of an application, when running on a given architecture, may be performed by emulation, cycle-accurate simulation, or estimation. The exact number of cycles required by an application may be obtained only by emulation or cycle-accurate simulation. These techniques, however, have an inherently high cost, both for the development of the simulation or emulation setting and for obtaining the data. Even adopting techniques like simulation parallelization on a cluster of workstations, the simulation time is still too high, making a fast design space exploration impossible.

High-level performance estimation is a very attractive alternative, since it may combine a low cost for obtaining the performance data with an acceptable precision, thus allowing a fast evaluation of different architectural alternatives in the early phases of the design cycle. Presently, the main problem in the development of an estimation tool is the availability of an accurate performance model that considers advanced architectural features such as pipelines, caches, and branch predictors.

Performance estimation may be applied in two contexts: worst-case execution time (WCET) evaluation and design space exploration. In WCET evaluation [3], one of the main requirements is to guarantee that there will be no underestimation of the execution time of a given application task, since this could have deleterious effects when using the estimation in the analysis of the schedulability of a set of tasks. The goal of software performance estimation for design space exploration is to obtain an approximation of the software execution time for a given architecture [1, 5]. In this case, as in WCET calculation, precision is also required, although both underestimations and overestimations may be tolerated. Techniques for this purpose are usually based on an application profiling, which extracts the instructions executed by the application. An analytical or statistical model then maps the executed instructions into a number of cycles, resulting in a very low estimation cost.

In this paper, the applicability of a neural network for performance estimation of applications running on advanced architectures is demonstrated. Practical experiments have been performed with the PowerPC 750, using a set of 40 benchmarks from heterogeneous application domains, including dataflow, control intensive, and mixed behavior. Experiments obtained a mean error of 7.90% in the

performance estimation, which is more accurate than previous work based on a linear approach [5], which results in a mean error of 34%. Compared to previous work also based on a non-linear approach [1], our method obtains the same accuracy while using a more heterogeneous benchmark set. This paper also proposes a new method, based on the application control flow graph, to automatically classify the applications and derive two domain-specific estimators (control-flow and dataflow), thus increasing the accuracy.

The remainder of this paper is organized as follows. Section 2 discusses previous work. The proposed estimation method is presented in Section 3, and experiments and results are detailed and analyzed in Section 4. Section 5 concludes and introduces future work.

2. PREVIOUS WORK

In worst-case execution time (WCET) calculation, it is necessary to determine the sequence of basic blocks that is responsible for the worst case. The static extraction of the execution flow allows the determination of the WCET, even for applications whose execution behavior depends on the input data. Li and Malik [10] present a method to determine the number of executions of basic blocks by modeling structural and functional restrictions of the application code using linear equations. A linear programming method may thus be used to maximize these equations and obtain the WCET. Wolf and Ernst [17] present a method that tries to extract a single feasible path, thus reducing the problem complexity.

Other information that is relevant for performance estimation in the presence of complex architectural features may also be obtained statically. In [7, 11], the number of misses in the instruction cache is obtained by the application of linear equations. In [9], a method is presented that models the impact of speculative execution on the number of misses in the instruction cache. The number of misses of the branch predictor may also be statically obtained, as presented in [2]. These predictions are used to increase the precision of the calculation of the execution time of each basic block, since this calculation uses only local information. In this phase, cycle-accurate simulators may alternatively be used [10, 17], but at a higher cost. The adoption of more abstract processor models reduces the complexity and eases the retargeting of the estimation method to different processors [3, 12, 16].

Performance estimation may also be used for high-level design space exploration. Usually, an application profiling is performed to dynamically extract the number of executed instructions of various types [1, 5, 8]. An adequate method is then applied to estimate the execution time, using this profile as input. Giusto et al. [5] compile the application code into a virtual instruction set (a simplified RISC set with 25 instructions). The estimation is performed by evaluating the execution cost of the virtual instructions in the target architecture. They profile a benchmark set of 35 control-dominated automotive applications, considering the virtual instruction set, and use a cycle-accurate simulator to obtain the number of cycles consumed by each application. After that, a statistical analysis based on linear regression is applied to these data to determine the constant K and the coefficients Pi of equation 1, where Pi and Ni are the weight and the number of executions of each instruction type i, respectively:

Cycles = K + Σ_i P_i · N_i    (1)
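To make this linear baseline concrete, the following sketch fits the constant K and the coefficients P_i of equation 1 by ordinary least squares. The instruction counts and cycle numbers are made-up placeholders, not data from [5]; only the model structure follows the equation.

```python
import numpy as np

# Sketch of the linear model in equation (1): cycles = K + sum_i P_i * N_i.
# N holds per-instruction-type execution counts, one row per training benchmark.
# All numbers are illustrative placeholders.
N = np.array([[1.0e5, 3.0e5, 2.0e4],
              [5.0e4, 1.0e5, 8.0e3],
              [2.0e6, 6.0e6, 4.0e5],
              [7.0e5, 2.0e6, 1.5e5]])
cycles = np.array([6.1e5, 2.3e5, 1.2e7, 4.0e6])   # stand-in simulated cycle counts

A = np.hstack([np.ones((N.shape[0], 1)), N])       # the column of ones models K
coeffs, *_ = np.linalg.lstsq(A, cycles, rcond=None)
K, P = coeffs[0], coeffs[1:]
print("K =", K, "P_i =", P)
print("fitted cycles:", A @ coeffs)
```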

Since this approach is based on the application of a linear fitting method, it is adequate only when the training set of benchmarks is very similar to the applications for which the estimation is required, as the authors themselves demonstrate. They do not discuss details of the target architecture (such as cache and pipeline) for which estimations are obtained.

Bontempi and Kruijtzer [1] use a non-linear method to estimate the execution time. For a given benchmark set, a profiler extracts a functional signature vector for a virtual processor (with a set of 42 instructions), containing the instruction types that appear in the code and the number of times each instruction type is executed. This functional signature is theoretically independent of the target architecture, so that it can be reused for the estimation with different processors. The authors, however, do not discuss the impact of using this functional signature for estimating performance on a processor of an architectural type that is very different from the virtual processor upon which the profiler is based. In our work, we prefer to profile the application for the instruction set of the target processor, even if this requires compiling and profiling the application for each different processor to be evaluated. Their estimation method is also based on an architectural signature of the target processor, defined by two parameters: the number of memory wait cycles and the ratio between CPU clock and bus clock. They present estimation results for a MIPS R3000.

Bontempi and Kruijtzer use a training-and-test approach. In a training phase, they apply a modeling technique called lazy learning to choose an estimation function based on a criterion of neighborhood between the application and the training set. This function, which may be locally linear, uses only the points of the training set that are closest to the application. Inputs to this phase are the functional and architectural signatures and the number of clock cycles for executing each application of the benchmark set, obtained from a cycle-accurate simulation on the target processor. They propose a training method based on splitting the benchmarks into two disjoint sets for training and test. They report a mean error of 8.8% in the estimations, for a set of 6 benchmarks, each one executed with 15 different input data sets. They do not report, however, the sizes of the training and test sets. They present another training method, LOO (leave-one-out), which uses N-1 executions of the same algorithm with different input data for the training phase and one additional execution for the test phase.

In this paper we present a method to classify the applications into domains and create domain-specific estimators that result in smaller errors, even when using heterogeneous applications. A real architecture based on the PowerPC 750 has been used for the experiments. As we wish to model non-linear behavior due to caches, branch prediction, and to the nature of the application itself, the natural choice was to use a neural network as a non-linear predictor.
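As a rough illustration of the lazy-learning idea (not Bontempi and Kruijtzer's actual implementation), the sketch below estimates the cycle count of a new application by fitting a local linear model on the k training benchmarks whose signatures are closest to it. All signatures and cycle counts are synthetic placeholders.

```python
import numpy as np

# Local (lazy) estimation: pick the k nearest training benchmarks in signature
# space and fit a linear model on those neighbors only.
def lazy_estimate(signatures, cycles, query, k=5):
    dist = np.linalg.norm(signatures - query, axis=1)      # neighborhood criterion
    nearest = np.argsort(dist)[:k]
    A = np.hstack([np.ones((k, 1)), signatures[nearest]])  # local linear fit
    coeffs, *_ = np.linalg.lstsq(A, cycles[nearest], rcond=None)
    return float(np.concatenate(([1.0], query)) @ coeffs)

rng = np.random.default_rng(1)
signatures = rng.uniform(0, 1, size=(10, 4))               # stand-in functional signatures
cycles = signatures @ np.array([2.0, 1.0, 3.0, 0.5]) * 1e6 # stand-in simulated cycles
print("estimated cycles:", lazy_estimate(signatures, cycles, rng.uniform(0, 1, 4)))
```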

3. PERFORMANCE ESTIMATION

In the design of embedded systems, design space exploration can be performed to satisfy application requirements, for instance by modifying architectural choices and task partitioning. After this, the synthesis process generates the

final solution, composed of software (operating system, application tasks, drivers), hardware (processors, dedicated IP hardware), and a communication structure. An estimation process can be continuously applied to verify the proposed solution with regard to the system requirements. In embedded software estimation, the increasing use of advanced processors requires the development of accurate and fast estimation tools that consider the performance impact of advanced features such as caches, branch prediction, and pipelines.

Neural networks have been chosen for performance estimation since they can generalize their behavior even when the process to be modeled is highly non-linear. In this work, a feed-forward network has been used [4], due to its simplicity and its adaptation to the non-linear behavior of software performance estimation. Our network is composed of an input layer, one hidden layer, and an output layer. Each layer may have a different number of neurons, each one with its own transfer function.

Figure 1 presents the two main steps of our estimation methodology: training and utilization.

Figure 1: Estimation tool development and utilization (training phase and design space exploration phase)

In the training phase, a set of samples is presented to the network, so that it may learn. For performance estimation of embedded software, the inputs are the numbers of executed instructions of different instruction types, while the expected result is the number of cycles consumed by the embedded application. A cycle-accurate simulator is required to extract the number of executed instructions and the total number of cycles consumed by the application. The benchmarks are classified according to the method presented in Section 3.1, and a neural network is trained for each specific domain. An iterative learning process, based on the back-propagation method, modifies the weights of the input and output arcs of the neurons in each layer, so that the network presents an output that is as close as possible to the expected result.

After the training phase, the estimation tool is ready to be used in many designs. In the design exploration phase, an application is compiled for a given target processor and classified. The classification indicates the application domain and the suitable estimator to be used. The number of executed instructions of each type is obtained by a dynamic instruction count and presented to the neural network, which then estimates the number of cycles consumed by the application.

The training time may be long, depending on the inputs and on the complexity of the generalization. Considering that the training process is performed only once for each given architecture, this long time is acceptable. Once the network is trained, however, its utilization has a very low cost, consisting of the dynamic instruction count of the application plus the neural network evaluation, which corresponds only to the multiplication of the inputs by the weights of the neurons (illustrated by the sketch at the end of this section). The dynamic instruction count dominates the time consumed by the utilization phase. In our experiments, this task takes 5 minutes for a given application, compared to 24 hours taken by the cycle-accurate simulation. This fast performance estimation enables a design space exploration that would be difficult with a cycle-accurate simulator, due to the long time it takes to evaluate each new software design.

For each target processor, a different set of estimators is generated, one for each application domain. This performance estimation method is especially suited for design space exploration in the software domain, for instance when considering various algorithmic alternatives for design tasks and various partitionings of tasks among processors, since architectural modifications would require a new training process and thus a very long turnaround time.
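The low utilization cost mentioned above can be illustrated by a minimal sketch of the forward pass: estimating the cycles of a profiled application reduces to multiplying the instruction counts by the trained weights and applying the transfer functions. The weight values, the feature order, and the absence of input scaling are placeholders assumed for illustration only.

```python
import numpy as np

# Minimal sketch of the utilization phase: a forward pass through a small
# feed-forward network (inputs times weights, plus transfer functions).
def forward(instr_counts, W_hidden, b_hidden, w_out, b_out):
    x = np.asarray(instr_counts, dtype=float)   # in practice counts would be scaled
    hidden = np.tanh(W_hidden @ x + b_hidden)   # hidden layer, tanh transfer function
    return float(w_out @ hidden + b_out)        # linear output neuron -> estimated cycles

rng = np.random.default_rng(0)                  # random stand-ins for trained weights,
W_h, b_h = rng.normal(size=(5, 4)), rng.normal(size=5)  # so the printed value is meaningless
w_o, b_o = rng.normal(size=5), 0.0
profile = [1.2e5, 4.0e5, 9.3e5, 0.0]            # dynamic instruction counts per class
print("estimated cycles:", forward(profile, W_h, b_h, w_o, b_o))
```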

3.1 Automatic Domain Classification

To improve the estimation accuracy, topological information about the applications can be used to classify them and select domain-specific estimators. We observed that the quality of the prediction is tightly linked to the training set: if the training set contains mostly dataflow benchmarks, the prediction error for control-dominated applications is large, and vice-versa. In order to further improve the prediction process, the key issue is the automatic selection of the correct training set for a certain domain and of the correct estimator for a given application to be evaluated.

The basic topological information is the Control Flow Graph (CFG) of an application. This information can be used to classify the applications and improve the accuracy of the estimations, as shown in Section 4.2. A modified version of the GNU compiler GCC [6] was used to dump a file with information concerning the CFG. The classification is based on the computation of a CFG weight, derived from the number of arcs connecting the basic blocks. If a basic block has 2 output arcs, these arcs are assigned a weight of 2, reflecting the cost of a decision statement on the processor performance. The CFG weight is calculated by equation 2 and illustrated in Figure 2:

CFG_weight = (sum of weighted arcs) / (number of nodes)    (2)

Figure 2: CFG weight method (two example CFGs, with CFG_weight = 8/4 = 2 and CFG_weight = 9/5 = 1.8)

In this way, control-dominated applications will have a higher CFG weight than dataflow ones. The proposed classification method is very fast and can be applied statically, without manual intervention. A threshold of 1.95 has been used to separate the applications into the dataflow and control-flow domains. Using this threshold, the result of the automatic classification is very similar to a manual classification performed by an experienced designer.
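A minimal sketch of this classification step is given below. The CFG representation (a dictionary mapping each basic block to its successors), the single-exit arc weight of 1, and the example graph are assumptions of this sketch; the weight of 2 for decision arcs and the 1.95 threshold follow the description above. In the paper itself the CFG is dumped by a modified GCC.

```python
# Sketch of the CFG weight computation (equation 2) and threshold classification.
def cfg_weight(cfg):
    """cfg: dict {basic_block_id: [successor ids]}."""
    weighted_arcs = 0
    for successors in cfg.values():
        # Assumption: arcs leaving a block with two or more exits (a decision)
        # get weight 2; arcs leaving a single-exit block get weight 1.
        arc_weight = 2 if len(successors) >= 2 else 1
        weighted_arcs += arc_weight * len(successors)
    return weighted_arcs / len(cfg)

def classify(cfg, threshold=1.95):
    return "control-flow" if cfg_weight(cfg) > threshold else "dataflow"

# Hypothetical 4-block CFG where most blocks end in a two-way branch:
cfg = {"b0": ["b1", "b2"], "b1": ["b3"], "b2": ["b3", "b0"], "b3": []}
print(cfg_weight(cfg), classify(cfg))   # (4 + 1 + 4 + 0) / 4 = 2.25 -> control-flow
```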

4. EXPERIMENTS AND RESULTS

A set of 32 benchmarks [15] has been used for training and test. Some benchmarks have been executed with different input data, each execution being considered a new benchmark, resulting in a total of 40 samples. This set contains both control-dominated and data-dominated applications. Training has been performed with a mix of applications, and satisfactory estimations have been obtained for applications from both domains using the same single training, demonstrating the robustness of the estimator. Besides this set, a real application - a crane control [14] - has also been used to assess the precision of the estimator.

Figure 3 presents the cycle and instruction counts of the benchmarks, considering the PowerPC 750 as target processor. The y-axis is in logarithmic scale and demonstrates the large spectrum of applications used to train and test the estimator. The instruction count varies from 228 to 14x10^6, representing codes ranging from small device drivers and operating system functions to large applications.

Figure 3: Cycle and instruction count distribution (cycle and instruction counts per benchmark, logarithmic scale)

The PowerPC 750 processor has been used to evaluate the precision of the proposed estimation method. It is a superscalar RISC processor that may complete up to 2 instructions per cycle and contains 6 functional units: floating-point, branch, system register, load/store, and two integer units. A cycle-accurate PowerPC simulator [13] has been used to profile each benchmark and to obtain the exact number of cycles consumed by each application.

4.1 Generic Estimator

In the first phase, a generic estimator was trained using benchmarks 1 to 10 and 30 to 39. They represent a mix of data- and control-dominated applications and have very different sizes, as seen in Figure 3. The remaining 21 benchmarks have been used to test the estimation precision. Benchmarks 40 and 41 are executions of the Crane application with different execution times of the main loop of the control algorithm.

In order to build a first neural network estimator, instructions have been classified into four classes: branches, integer, floating point, and load/store. Since the PowerPC has a branch predictor, the influence of this resource on the estimator can be considered by enhancing the neural network with two additional parameters: the numbers of forward and backward branches. Backward branches are usually observed in loops and enhance the effectiveness of the branch predictor. These counts can be easily obtained from the application dynamic instruction count.

The neural network used in this experiment is composed of an input layer, a hidden layer with 5 neurons containing a tansig transfer function, and an output layer with one neuron containing a linear transfer function. The time consumed to train this neural network was about 5 hours on a PC workstation (Athlon 2GHz).
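The sketch below reproduces this estimator configuration with a generic MLP library: 5 inputs (backward branches, forward branches, load/store, integer, float), one hidden layer of 5 tanh ("tansig") neurons, and a linear output neuron. The training profiles, cycle counts, logarithmic input scaling, and the use of L-BFGS instead of plain backpropagation training are assumptions of this sketch, not the paper's actual setup or data.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# One row per training benchmark: [backward br, forward br, load/store, integer, float]
X_train = np.array([
    [1.0e4, 2.0e3, 5.0e4, 1.2e5, 0.0],
    [3.0e5, 8.0e4, 9.0e5, 2.1e6, 4.0e5],
    [2.0e2, 5.0e1, 1.0e3, 3.0e3, 0.0],
    [6.0e4, 1.5e4, 2.0e5, 7.0e5, 1.0e4],
])
y_train = np.array([2.3e5, 5.0e6, 5.5e3, 1.4e6])   # stand-in cycle counts from a simulator

# 5 tanh hidden neurons, linear output; L-BFGS stands in for backpropagation training here.
est = MLPRegressor(hidden_layer_sizes=(5,), activation='tanh',
                   solver='lbfgs', max_iter=5000, random_state=1)
est.fit(np.log10(X_train + 1), np.log10(y_train))   # assumed log scaling of counts/cycles

new_app = np.array([[5.0e3, 1.0e3, 2.0e4, 6.0e4, 0.0]])
print("estimated cycles:", 10 ** est.predict(np.log10(new_app + 1))[0])
```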


Table 1 presents the results obtained from the experiments and the corresponding mean estimation error, standard deviation, maximum underestimation, and maximum overestimation errors. For the neural network using only four instruction classes, the maximum overestimation is 41.01%, and the maximum underestimation is 20.69%. The utilization of the information on backward and forward branches results in an improvement of the estimator accuracy, reducing the mean error from 9.26% to 7.90%. Figure 4 presents the estimation error for each benchmark. The order of the benchmarks on the x-axis corresponds to the size of the application in number of cycles, as shown in Figure 3.

Figure 4: Prediction errors using 5 input parameters: backward branch, forward branch, load/store, integer, and floating point


Table 1: Estimation results for the 41-benchmark set

                  Branch, load/store,     Backward branch, forward branch,
                  integer, float          load/store, integer, float
    Mean error    9.26%                   7.90%
    Std dev       10.64%                  9.11%
    Max over      41.01%                  33%
    Max under     -20.69%                 -31%

In the experiments using four instruction classes and considering the training set only, the mean error obtained was 4.81% and the standard deviation 7.07%. Considering the test set only, we obtained a mean error of 13.30% and a standard deviation of 11.83%. The experiments using the backward and forward branches resulted in a mean error of 3.15% and a standard deviation of 4.22% for the training set, and a mean error of 12.23% and a standard deviation of 10.23% for the test set.

Using the same set of training benchmarks and applying linear regression as proposed in [5], one obtains a mean error of 34%, with a standard deviation of 33% and a maximum absolute error of 106%. This comparison demonstrates the advantage of using neural networks, especially when applied to advanced architectures, since they can cope with the non-linear impact of features such as caches, branch prediction, and deep pipelines.

4.2 Domain-specific estimators

We also applied the training method based on the LOO (leave-one-out) technique used in [1]. In this technique, N-1 runs of the same application with different input data are used for training, and one sample is left out to evaluate the estimation accuracy, thus creating an application-specific estimator. Table 2 presents the results obtained for the Quicksort and Matrix Multiply algorithms running with 15 different data inputs. The results show that the neural network adapts very well to estimating the performance of a single application on new runs, resulting in very small prediction errors.

Table 2: Estimation performance using the LOO (leave-one-out) training technique

    Benchmark          Mean error    Std dev    Max error
    Qsort              0.08%         0.16%      7.18%
    Matrix multiply    0.12%         0.19%      7.63%
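For clarity, here is a sketch of how such a LOO evaluation can be organized: each of the N runs is estimated by a network trained on the remaining N-1 runs. The per-run profiles and cycle counts below are synthetic stand-ins, and the network configuration simply follows the generic estimator sketch above.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Leave-one-out evaluation: train on N-1 runs, test on the run left out.
def loo_mean_error(X, y):
    errors = []
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        est = MLPRegressor(hidden_layer_sizes=(5,), activation='tanh',
                           solver='lbfgs', max_iter=5000, random_state=0)
        est.fit(X[mask], y[mask])
        pred = est.predict(X[i:i + 1])[0]
        errors.append(abs(pred - y[i]) / y[i] * 100.0)   # relative error, %
    return np.mean(errors)

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(15, 5))                  # stand-in normalized instruction counts
y = X @ np.array([1.5, 1.2, 2.0, 1.0, 3.0]) + 0.2        # synthetic cycle-count targets
print("mean LOO error: %.2f%%" % loo_mean_error(X, y))
```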


The LOO method has also been applied to the complete benchmark set, using 39 benchmarks for training and one benchmark for test, resulting in a mean error of 11.37% and a maximum error of 103%. The increase of the mean error shows that the heterogeneity of the applications makes the generalization task of the neural network much harder. This demonstrates that the quality of the prediction is tightly linked to the training set. For example, if the training set has most benchmarks from the dataflow domain, the prediction error for control-dominated applications is large, and vice-versa. In order to further improve the prediction process, domain-specific estimators may be developed. A key issue to make this approach practical is the automatic selection of the correct training set for a certain domain.

The CFG weight criterion, described in Section 3.1, has been used to classify the original benchmark set into two domains. The processor features (cache, branch prediction) react differently depending on the application domain (control or dataflow), influencing the overall performance. The CFG weight can be obtained statically and without manual intervention, and the selection and utilization of the most suitable neural network can be implemented without user knowledge.

In domain A, we have placed applications with high CFG weight (above 1.95) and, consequently, with strong control-flow characteristics. Domain B is composed of applications with low CFG weight, hence presenting dataflow characteristics. From the original benchmark set, domain A is composed of 20 applications, and domain B of 21 applications. From domain A we have kept only 16 benchmarks, removing 4 benchmarks with floating-point instructions, since they represent a small set and could harm the generalization of the neural network. To overcome this restriction, more benchmarks with floating-point instructions should be used, improving the generalization of the neural network.

Table 3 shows the results obtained with the domain-specific estimators. As one can see, their use results in a good performance prediction when compared to the generic estimator. In domain B (dataflow), the mean error decreases from 7.90% with the generic estimator to 6.41% with the domain-specific

estimator. This estimator also results in a decrease of the error range, which varies from -32.41% to 25.87%. In domain A (control flow), the mean error is very close to that of the generic estimator. We notice that the benchmark with the largest error (49%) is expint, a synthetic benchmark developed specifically to stress control-flow features, without a relation to real applications. If we do not consider expint, the error range varies from -17.81% to 24.96%, resulting in a mean error of 5.30%.

We also analyzed the prediction performance using a cross-test. In this case, we have used the domain A estimator with the benchmarks of domain B, and vice-versa. As one can see in Table 3, the use of domain-specific estimators on applications from an unrelated domain results in much poorer estimations. This illustrates the validity of our classification method.

Table 3: Estimation results using domain-specific estimators

                  Domain A    Domain B    Cross-test (A vs B)    Cross-test (B vs A)
    Mean error    7.62%       6.41%       17.65%                 55.12%
    Std dev       12.46%      9.45%       12.39%                 38.25%
    Max over      24.96%      25.87%      42.34%                 163.78%
    Max under     -49.37%     -32.41%     -28.49%                -95.70%

5. CONCLUSIONS AND FUTURE WORK

This paper shows the applicability of neural networks to improve the precision of embedded software performance estimation. In comparison to previous work, the results are more precise than those obtained with linear methods. In comparison with other non-linear methods that also use a partitioning strategy in the training-and-test approach, our work also achieves a better precision. In the experiments, 41 benchmarks and real applications from different domains, such as filters, matrix manipulations, sorting algorithms, and an embedded crane control, have been used. A real target processor with advanced features such as caches, a superscalar pipeline, and branch prediction was used to evaluate the predictor precision, and a mean error of 7.90% has been obtained. Experimental results show that our method is very precise for application-specific estimations, with mean errors of 0.16% using the LOO (leave-one-out) training method.

Topological information has been used to classify applications and create domain-specific estimators, increasing the estimation accuracy. A modified version of the GNU GCC compiler has been developed to statically provide the topological information, allowing the classification without user intervention. This new method results in a decrease of both the mean and maximum errors.

Future work will address the determination of further application parameters, such as cache misses and branch mispredictions, that may increase the estimation accuracy without loss of generality. The CFG weight is useful to separate applications with clear dataflow or control-flow characteristics. Another set of topological parameters will be proposed to classify the applications more precisely, thus creating other classes of mixed-behavior applications. Our approach for automatic domain classification and performance estimation will be applied to other architectures

such as Athlon (CISC-like) and a Java processor. Moreover, architectural parameters such as cache size and memory latency could be used as inputs to the neural network, enabling it to predict the performance in architectures with the same instruction set but with different micro-architectural features.

6. ACKNOWLEDGMENTS

The authors acknowledge the support from CNPq. Special thanks to Fabio Wronski for the support with the PowerPC simulator.

7. REFERENCES

[1] G. Bontempi and W. Kruijtzer. A Data Analysis Method for Software Performance Prediction. In Design, Automation and Test in Europe, pages 971–976. IEEE Computer Society Press, 2002.
[2] A. Colin and I. Puaut. Worst Case Execution Time Analysis for a Processor with Branch Prediction. Journal of Real-Time Systems, 18(2-3):249–274, 2000.
[3] J. Engblom, A. Ermedahl, and F. Stappert. A Worst-Case Execution-Time Analysis Tool Prototype for Embedded Real-Time Systems. In RTTOOLS'2001 - Workshop on Real-Time Tools, Aalborg, Denmark, 2001.
[4] J. A. Freeman and D. M. Skapura. Neural Networks: Algorithms, Applications, and Programming Techniques. Addison-Wesley, 1992.
[5] P. Giusto, G. Martin, and E. Harcourt. Reliable Estimation of Execution Time of Embedded Software. In Design, Automation and Test in Europe, pages 580–589. IEEE Computer Society Press, 2001.
[6] GNU. GCC - GNU Compiler Collection. http://www.gnu.org
[7] A. Hergenhan and W. Rosenstiel. Static Timing Analysis of Embedded Software on Advanced Processor Architectures. In Design, Automation and Test in Europe, pages 552–559. IEEE Computer Society Press, 2000.
[8] M. Lajolo, M. Lazarescu, and A. Sangiovanni-Vincentelli. A Compilation-based Software Estimation Scheme for Hardware/Software Co-simulation. In Proceedings of the 7th International Workshop on Hardware/Software Codesign, pages 85–89. ACM Press, 1999.
[9] X. Li, T. Mitra, and A. Roychoudhury. Accurate Timing Analysis by Modeling Caches, Speculation and their Interaction. In Design Automation Conference, pages 466–471. ACM Press, 2003.
[10] Y.-T. S. Li and S. Malik. Performance Analysis of Embedded Software Using Implicit Path Enumeration. In Design Automation Conference, pages 456–461. ACM Press, 1995.
[11] Y.-T. S. Li, S. Malik, and A. Wolfe. Performance Estimation of Embedded Software with Instruction Cache Modeling. In IEEE/ACM International Conference on Computer-Aided Design, pages 380–387. IEEE Computer Society Press, 1995.
[12] S.-S. Lim, J. H. Han, J. Kim, and S. L. Min. A Worst Case Timing Analysis Technique for Multiple-Issue Machines. In IEEE Real-Time Systems Symposium, page 334. IEEE Computer Society Press, 1998.
[13] Microlib. PowerPC 750 Simulator. http://www.microlib.org/G3/PowerPC750.php
[14] E. Moser and W. Nebel. Case Study: System Model of Crane and Embedded Control. In Design, Automation and Test in Europe, page 721. IEEE Computer Society Press, 1999.
[15] F. Stappert. WCET Benchmarks. http://c-lab.de/home/en/download.html#wcet
[16] S. Thesing, J. Souyris, R. Heckmann, F. Randimbivololona, M. Langenbach, R. Wilhelm, and C. Ferdinand. An Abstract Interpretation-Based Timing Validation of Hard Real-Time Avionics. In International Performance and Dependability Symposium (IPDS). IEEE Computer Society Press, June 2003.
[17] F. Wolf and R. Ernst. Intervals in Software Execution Cost Analysis. In International Symposium on System Synthesis, pages 130–135. ACM Press, 2000.
