TRITA-MAE 1994:X ISSN XXXX-XXXX ISRN KTH/MAE/R--94/X--SE
Calculation of Execution Times in Object-Oriented Real-Time Software. A Study Focused on RealTimeTalk. by Jan Gustafsson
Stockholm 1994 Licentiate Thesis DAMEK Research Group, Department of Machine Elements Royal Institute of Technology, KTH S-100 44 Stockholm, Sweden CUS (Department of Real-time Computer Systems) University of Mälardalen Box 11, S-721 03 Västerås
Abstract. This thesis describes the calculation of execution times in RealTimeTalk (RTT), an object-oriented language for hard real-time systems, i.e. systems where all tasks must fulfil their specified time requirements. The state of the art of execution time calculation in real-time systems is presented, and the different approaches are discussed and compared to the RTT approach. RTT itself is then presented, the algorithms and data structures of an implementation of execution time calculation in RTT are described, and the results and some limitations of the method are discussed.
Preface. This licentiate thesis is part of a joint research project between CUS (Department of Real-time Computer Systems) at the University of Mälardalen in Västerås and the Department of Machine Elements at KTH in Stockholm. Supervisors have been Prof. Sören Andersson and Dr. Jan Wikander at the Damek research group (Computer Controlled Machinery). The work has been supported by a reference group consisting of Prof. Bud Lawson, Tech. lic. Ulf Backman and representatives of Swedish companies (ABB Automation, DeLaRue Inter Innovation and CelsiusTech).

The RTT project started in 1990 as an idea to show that, contrary to many people's opinion, real-time systems can be constructed with object-oriented techniques. A number of presentations and papers have communicated the progress of the project. The first presentation was given in Uppsala at the SNART (Swedish Work Group for Real-Time) conference 19-20 August 1991. A paper was presented at the IFAC/IFIP international workshop on real-time programming (WRTP'92) in June 1992 [Brorsson92]. Another paper was presented at the Fifth Euromicro Workshop on Real-Time Systems, Oulu, Finland, June 1993 [Eriksson93]. A recent presentation was given at the SNART conference in Stockholm in August 1993.

Parallel to this thesis, another thesis [Eriksson94] has been written by my colleague Christer Eriksson at CUS. His thesis describes the RTT framework and run-time system in more detail and makes a comparison to the state of the art.

My work in the RTT project has been supported by NUTEK and by internal research funding at the University of Mälardalen. I would like to thank my supervisors and the members of the reference group. Thanks also to all my colleagues at CUS and at Damek for friendly co-operation. Special thanks to Christer Eriksson for all the creative discussions, and to Harriet Ekwall for the friendly atmosphere at CUS.

Västerås, a dark evening in December 1993. Jan Gustafsson
Contents.

1. Introduction .......... 1
   1.1. Real-time systems and software .......... 1
   1.2. Real-time systems and object-oriented programming .......... 1
2. Analysis of execution time .......... 3
   2.1. Measurement or calculation of execution times? .......... 3
      2.1.1. Measurement of execution times .......... 3
      2.1.2. Calculation of execution times .......... 4
      2.1.3. Combination of methods .......... 5
   2.2. System and hardware demands .......... 5
   2.3. Real and calculated execution time .......... 7
   2.4. The overreservation factor .......... 8
   2.5. The dynamic factor .......... 9
3. State of the art .......... 11
   3.1. Overview .......... 11
   3.2. Mok .......... 12
      3.2.1. General .......... 12
      3.2.2. Languages supported .......... 12
      3.2.3. Description of the approach .......... 12
      3.2.4. Results .......... 12
      3.2.5. Discussion .......... 12
   3.3. MARS .......... 13
      3.3.1. General .......... 13
      3.3.2. Languages supported .......... 13
      3.3.3. Description of the approach .......... 13
      3.3.4. Results .......... 14
      3.3.5. Discussion .......... 15
   3.4. Shaw and Park .......... 16
      3.4.1. General .......... 16
      3.4.2. Languages supported .......... 16
      3.4.3. Description of the approach .......... 16
      3.4.4. Results .......... 19
      3.4.5. Discussion .......... 20
   3.5. FLEX (Kenny, Lin et al.) .......... 21
      3.5.1. General .......... 21
      3.5.2. Languages supported .......... 21
      3.5.3. Description of the work .......... 21
      3.5.4. Results .......... 22
      3.5.5. Discussion .......... 22
   3.6. DEDOS .......... 23
      3.6.1. General .......... 23
      3.6.2. Languages supported .......... 23
      3.6.3. Description of the approach .......... 23
      3.6.4. Results .......... 24
      3.6.5. Discussion .......... 24
   3.7. Wall (Ada) .......... 25
      3.7.1. General .......... 25
      3.7.2. Languages supported .......... 25
      3.7.3. Description of the approach .......... 25
      3.7.4. Results .......... 25
      3.7.5. Discussion .......... 26
   3.8. CHAOS (Bihari et al.) .......... 27
      3.8.1. General .......... 27
      3.8.2. Languages supported .......... 27
      3.8.3. Description of the approach .......... 27
      3.8.4. Results .......... 28
      3.8.5. Discussion .......... 28
   3.9. Real-Time Euclid .......... 29
      3.9.1. General .......... 29
      3.9.2. Languages supported .......... 29
      3.9.3. Description of the approach .......... 29
      3.9.4. Results .......... 30
      3.9.5. Discussion .......... 30
4. Presentation of RealTimeTalk (RTT) .......... 31
   4.1. The design objects in RTT .......... 31
   4.2. Usecases .......... 32
   4.3. Scheduling .......... 34
   4.4. The RTT language .......... 34
      4.4.1. Syntax .......... 34
      4.4.2. Semantics .......... 34
      4.4.3. Control structures in RTT .......... 35
   4.5. Run-time support for real-time .......... 37
   4.6. The RTT compiler and other tools .......... 38
5. Calculation of execution times in RTT .......... 39
   5.1. Analysis of code in RTT .......... 40
      5.1.1. Sequences .......... 42
         5.1.1.1. Blocks .......... 44
      5.1.2. Selections .......... 45
      5.1.3. Repetitions .......... 46
         5.1.3.1. The timesRepeat repetition .......... 46
         5.1.3.2. The whileTrue repetition .......... 48
         5.1.3.3. Run-time supervision of loop limits .......... 50
   5.2. The C-macro analysis (the front-end tool) .......... 51
      5.2.1. The basic idea .......... 51
      5.2.2. Algorithms and data structures .......... 51
         5.2.2.1. What is recursion in object-oriented programs? .......... 55
         5.2.2.2. Detection of recursion .......... 58
         5.2.2.3. Sequences .......... 59
         5.2.2.4. Selections .......... 60
         5.2.2.5. Repetitions .......... 61
      5.2.3. Implementation .......... 63
         5.2.3.1. The C-macro parser .......... 63
         5.2.3.3. Programming tools .......... 63
   5.3. The assembly code analysis (the back-end tool) .......... 65
      5.3.1. The basic idea .......... 65
      5.3.2. Algorithms and data structures .......... 66
      5.3.3. Implementation .......... 69
         5.3.3.1. The assembly code parser .......... 69
         5.3.3.2. The assembly code graph parser .......... 69
         5.3.3.3. The assembly code graph analyser .......... 71
         5.3.3.4. Programming tools .......... 71
6. Limitations and future directions .......... 73
   6.1. Limitations and current problems .......... 73
      6.1.1. The back-end tool .......... 73
      6.1.2. The front-end tool .......... 73
      6.1.3. Complexity .......... 74
   6.2. Reduction of the dynamic factor using path information .......... 75
      6.2.1. The constructs .......... 76
      6.2.2. An example .......... 77
      6.2.3. Calculation of execution times .......... 78
      6.2.4. Simulation of execution times .......... 78
   6.3. Typing RTT .......... 79
   6.4. Proving terminating loops .......... 80
   6.5. Measurement on a PC version of RTT .......... 80
7. References .......... 81
Appendix A: Measurement of execution times on Motorola MC68000 .......... 85
   A.1. Aim .......... 85
   A.2. Test system .......... 85
      A.2.1. Measurement technique .......... 86
   A.3. Test 1: Short loop .......... 88
      A.3.1. Program list .......... 88
      A.3.2. Program graph with execution times .......... 88
      A.3.3. Output from the execution time analyser .......... 88
      A.3.4. Calculated time versus measured time .......... 88
   A.4. Test 2: Long loop .......... 89
      A.4.1. Program list .......... 89
      A.4.2. Program graph with execution times .......... 89
      A.4.3. Output from the execution time analyser .......... 89
      A.4.4. Calculated time versus measured time .......... 89
   A.5. Test 3: Long loop with jumps .......... 90
      A.5.1. Program list .......... 90
      A.5.2. Program graph with execution times .......... 90
      A.5.3. Output from the execution time analyser .......... 91
      A.5.4. Calculated time versus measured time .......... 91
   A.6. Overhead for refresh .......... 92
Appendix B: Calculation of execution times for a RTT C-macro example .......... 93
   B.1. The example .......... 93
   B.2. Manual calculation .......... 94
   B.3. Detection of recursion .......... 96
   B.4. Automatic calculation with the front-end tool .......... 96
Notation and terms used in the thesis.

The following object-oriented terms are used in this thesis:

• An object is an entity in the system that has a set of state variables and a set of operations, accessible from the outside, that operate on these variables.
• A class is a common description for a set of objects. From a class it is possible to create instances.
• An instance is an object that is created from a class.
• Polymorphism is the ability of a behaviour to have an interpretation over different classes. It can be introduced in two ways: through inheritance and through overloading.
• Inheritance is a relation between classes that allows us to build a class as an extension of an already defined class. The inheritance mechanism is an important feature when building and using frameworks.
• Overloading occurs when the same selector is used for several methods.
• A selector is the name of a method.
• A message is a service request that is sent to an object. The message includes a selector and arguments. It is up to the receiver to decide what will be done.
• A method is the implementation of a certain service.
The following real-time terms are used in this thesis:

• "A real-time system is a system where the correctness of the system depends not only on the results of computations, but also on the time at which the result is produced" [Sta88].
• A hard real-time system is a system where a failure in the temporal domain, just as in the functional domain, will cause the system to fail its mission.
• A soft real-time system is a system that will fulfil its mission even if deadlines are missed occasionally, but in normal operation it should not miss any deadlines.
• Period time is the time interval between two consecutive activations of a service.
• Deadline is the time before which a certain service must be completed.
• Release time is the earliest time, relative to the start of the period, at which a service can start its execution.
• Determinism is the possibility to guarantee something a priori, e.g. the time behaviour of software. Indeterminism is of course the opposite.
• Jitter is the variation in period (time uncertainty) for a periodic action.
• A schedule is an execution plan. A feasible schedule is a schedule which guarantees that all tasks meet their deadlines.
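The feasibility notion above can be checked mechanically for a static schedule: reserved execution slots must not overlap, and every slot must finish before its deadline. The following is a minimal sketch of such a check; the `Slot` record and function names are ours, not part of RTT.

```python
# Hypothetical sketch: checking that a static (pre-computed) schedule is
# feasible, i.e. slots do not overlap and every task meets its deadline.
from dataclasses import dataclass

@dataclass
class Slot:
    task: str
    start: int      # slot start, relative to the start of the period
    wcet: int       # reserved execution time (the calculated maximum)
    deadline: int   # deadline within the period

def is_feasible(schedule):
    """True if no two reserved slots overlap and each slot finishes
    no later than its task's deadline."""
    slots = sorted(schedule, key=lambda s: s.start)
    for a, b in zip(slots, slots[1:]):
        if a.start + a.wcet > b.start:   # overlapping reservations
            return False
    return all(s.start + s.wcet <= s.deadline for s in slots)

sched = [Slot("sample", 0, 2, 5), Slot("control", 2, 3, 8)]
print(is_feasible(sched))  # True: no overlap, both slots meet their deadlines
```

Note that the check relies on each `wcet` being a safe upper bound; this is exactly why the calculated maximum execution time discussed in this thesis is needed.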
The following computer science terms are used in this thesis:

• Dead code is program code that will never be executed.
• Static analysis of programs is analysis of the program code. Dynamic analysis of programs is analysis of the program behaviour when it is executed.
• Run-time is when the program (system) executes. Compile time is when the program (system) is compiled.
• The execution path of a program is the order in which the program executes its statements (code). The possible execution paths can be depicted in a control flow graph.
• Code is executed in sequence if the statements are executed in the order they are given in the program.
• Selections are program constructs where only one code segment (or none) out of a set of code segments is selected.
• Repetitions in programs are code segments that are repeatedly executed. Synonymous with loops and iterations.
• Recursion occurs when program code invokes itself.
• Dynamic binding means that the code to be invoked is selected at run-time. Synonymous with late binding. Static binding means that the code to be invoked is selected at compile time. Synonymous with early binding.
• Imperative languages are languages where the code tells the processor, more or less in detail, what it shall do. Other language groups are functional languages and logical languages.
• An exception is an abnormal event. An exception handler is a code segment that handles such an event.
• A language is typed if its variables, parameters etc. must be declared as certain types. A non-typed language does not have this demand. Type inference is the ability to deduce the type of a certain variable from the program code.
• The syntax of a language is the rules that define the correct form of the language elements of a program. The semantics is the meaning of the program, i.e. the intended function of the program.
• A parser is a program that reads and analyzes program syntax.
• A framework is a set of code and rules, used to develop applications of a certain type.
• A preemptable program is a program that can be interrupted by other programs.
• Garbage collection is a function that reclaims earlier used, but now free, memory.
• Invocation means activation of a code segment (e.g. a block).
• Return means termination with a result.
• A transitive closure is the result one gets when a function is repeatedly applied to its own output.
• A fair function (fairness) will eventually be performed, but no deadline is given.
• Proofs using assertional logic are proofs that use assertions (predicates) valid at certain points in the program.
• Pre- and post-conditions in programs are predicates that are valid before and after a certain action, e.g. a function, is performed.
• MINT is the real minimum execution time for a program.
• MAXT is the real maximum execution time for a program.
• MINTC is the calculated minimum execution time for a program.
• MAXTC is the calculated maximum execution time for a program.
• The overreservation factor (OF) = MAXTC / MAXT.
• The dynamic factor (D) = MAXT / MINT.
• The calculated dynamic factor (CALD) = MAXTC / MINTC.
• AVET = (∑ Ti) / n, summed over i = 1 to n, is the pathwise average execution time of a program.
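The factors defined above can be computed directly from a set of per-path execution times and the calculated bounds. A small illustrative sketch (the function name and inputs are ours):

```python
# Illustrative computation of the timing factors defined above, given the
# measured per-path execution times and the calculated bounds MINTC/MAXTC.
def timing_factors(measured_times, mintc, maxtc):
    mint, maxt = min(measured_times), max(measured_times)
    return {
        "OF":   maxtc / maxt,    # overreservation factor
        "D":    maxt / mint,     # dynamic factor (real times)
        "CALD": maxtc / mintc,   # dynamic factor (calculated bounds)
        "AVET": sum(measured_times) / len(measured_times),  # pathwise average
    }

f = timing_factors([10, 12, 20], mintc=8, maxtc=25)
print(f["OF"], f["D"], f["CALD"], f["AVET"])  # 1.25 2.0 3.125 14.0
```

Since MINTC <= MINT and MAXT <= MAXTC for a safe analysis, CALD is always at least as large as D, and OF >= 1.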
Chapter 1: Introduction

1. Introduction.

1.1. Real-time systems and software.
Real-time systems are systems where both the functional and the temporal behaviour of the system is essential. "A real-time system is a system where the correctness depends not only on the results of computations, but also on the time at which the result is produced" [Stanc88]. The system may be controlling a time-critical process, like a flying aircraft, where the signals to the rudders, flaps and the motor must be correct and must be given at the correct time instant.

Often these timing requirements are defined as deadlines for the response times of certain system use cases or transactions (input - process - output). Very often, time windows are more suitable, since an output too early may be as harmful to the process as an output too late. A typical application in a hard real-time system is a control loop controlling a mechanical device such as a motor. Here, the output to the actuators must come neither too early nor too late (i.e. the jitter must be small), or the control algorithm will not work correctly.

The time demands on the system are mapped to time demands on the software, depending on the resource structure of the implementation. This mapping depends on many factors; one of the most important is the execution strategy of the system. In the real-time community, there are two main streams for system implementation: the event-triggered and the time-triggered paradigms [Kopetz91]. The event-triggered paradigm is based on software invoked by process events, dynamic allocation of resources, and some priority ordering of software modules (tasks). The time-triggered paradigm is based on software invoked by time (often by a cyclic executive using an off-line schedule) and static allocation of resources; no priority ordering of software modules is necessary. An event-triggered system has a complicated, stochastic behaviour that depends on the process behaviour, while the behaviour of a time-triggered system is simpler and prescheduled. Thus, the time-triggered paradigm is more suitable for building deterministic systems, i.e. control systems for which the temporal behaviour must be guaranteed in advance.
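The time-triggered cyclic executive mentioned above can be sketched in a few lines: tasks are invoked purely by time, according to a fixed schedule computed off-line. The sketch below is our illustration, not RTT's actual run-time system, and the task names are hypothetical.

```python
# Hypothetical sketch of a time-triggered cyclic executive: tasks are
# invoked by time according to a fixed, off-line schedule, never by events.
import time

def cyclic_executive(schedule, minor_cycle, cycles):
    """schedule: list of minor cycles, each a list of task functions.
    Each minor cycle's tasks must fit within `minor_cycle` seconds
    (guaranteed in advance by execution time calculation)."""
    for i in range(cycles):
        start = time.monotonic()
        for task in schedule[i % len(schedule)]:
            task()                       # statically allocated, no priorities
        # idle until the next minor-cycle boundary (end of the reserved slot)
        remaining = minor_cycle - (time.monotonic() - start)
        if remaining > 0:
            time.sleep(remaining)

log = []
sample = lambda: log.append("sample")
control = lambda: log.append("control")
# Off-line schedule: sample every minor cycle, control every second cycle.
cyclic_executive([[sample, control], [sample]], minor_cycle=0.01, cycles=4)
print(log)  # ['sample', 'control', 'sample', 'sample', 'control', 'sample']
```

The point of the sketch is that the executive's behaviour is fully prescheduled: which task runs in which slot is decided before run-time, which is what makes the temporal behaviour analyzable.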
1.2. Real-time systems and object-oriented programming.
Building real-time systems is becoming increasingly complex due to the increased use of such systems in larger and more complex applications. A problem that arises is the treatment of software complexity, both in the structural and the temporal domain. Therefore, designers need better tools when designing and maintaining real-time software. The object-oriented concept seems to be a promising way to handle this complexity.
The software complexity in real-time systems is no different from the complexity in other fields of programming. However, real-time systems must also have a correct temporal behaviour. Thus the question: Is the object-oriented paradigm suitable for building software for real-time systems? Our assumption is that this is the case, and it is therefore our starting point.

An important aspect of the object-oriented idea is that it may serve as a common language for all levels of abstraction of the system. Non-programmers may use the same basic model as the people who design and implement the system; the difference is only the level of detail. Some other important issues that speak in favour of the object-oriented paradigm are:

• It is often natural to model the application domain by interacting objects [Shlaer88].
• Object-oriented programming supports enhanced productivity by use and reuse of components and frameworks that are well tested [Cox86].
• Polymorphism promotes the extensibility of systems [Gold89].
• Object-oriented design is suitable for fast creation of prototypes and development by refinement [Blair91].
Chapter 2: Analysis of execution time

2. Analysis of execution time.
To be able to guarantee the deadlines of a real-time system, the execution time of the software executed during run-time must be measured or calculated in advance. Real-time scheduling theory requires this, starting from the classical 1973 paper by Liu and Layland [LiuLey73].

Often, the execution time varies between different invocations of a software module. This is because the execution takes different paths through the control structure of the program, depending on input data or the internal state. In this case the maximum execution time is used instead, to avoid deadline violations. This may of course lead to unused processing power in a time-triggered system, since execution slots are reserved for the worst case. If the time requirements are defined as time windows instead of deadlines, the minimum execution time is also of great interest; this is the case e.g. in control applications. Note that the average execution time is not very interesting in a real-time system.
2.1. Measurement or calculation of execution times?
2.1.1. Measurement of execution times.

Measurement is made by timing certain entry and exit points in the software when it is executed on the target run-time system. Measurement is thus a dynamic method. It is often used in practice, and a number of commercial tools are available, see for example Turbo Profiler [Profiler].

Measurement is of course possible, but a number of theoretical difficulties occur. The most serious is how the longest execution path through a program is found and executed. This is no problem with small or simple programs, but it can be very difficult in a complex program with many repetitions and selections, and in systems with dynamic allocation of resources. In general, this is an NP-hard problem: all paths must be executed with timing measurement enabled, and then the longest execution time is chosen. To make the program execute through all these paths requires a control flow analysis and the generation of test data that forces the program along the desired paths. This work is of the same complexity as the calculation of execution time (see next section).

Another aspect of measurement is that the measuring mechanism must not interfere with the execution of the measured software. This is possible with the use of external hardware-based timers [Berggren92]. If software measurement is used, some technique to correct for the interference must be used, e.g. the dual loop technique described in [Clapp86].
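The correction idea can be illustrated with a dual-loop sketch: time many iterations of the code under test, time the same number of iterations of an empty loop, and subtract to cancel the loop and measurement overhead. This is our reading of the technique; the details in [Clapp86] may differ, and the names below are ours.

```python
# Sketch of the dual-loop idea: the empty (control) loop measures the
# overhead of the loop and timing mechanism, which is then subtracted.
import time

def dual_loop(func, n=100_000):
    t0 = time.perf_counter()
    for _ in range(n):
        func()                    # measured loop: overhead + code under test
    t1 = time.perf_counter()
    for _ in range(n):
        pass                      # control loop: overhead only
    t2 = time.perf_counter()
    return ((t1 - t0) - (t2 - t1)) / n   # corrected time per invocation

per_call = dual_loop(lambda: sum(range(50)))
print(per_call > 0)
```

On modern hardware such software timing is of course only an approximation; the thesis' point stands that the measuring mechanism itself must not distort the result.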
A negative property of measurement is that the resulting times are only valid for the actual target hardware; the times cannot easily be recalculated for other processors, memories, etc. The times are also only valid for the current version of the program: if the program is changed, the times must be re-measured. It should also be noted that measurement is often very time-consuming work with much manual labour (setting up hardware, making all the test runs, etc.).

2.1.2. Calculation of execution times.

Calculation of maximum execution times is a static method, based on analysis of the source code of the actual system. The analysis can be made either by hand or automatically; a combination is also possible. Of course, calculation by hand is only possible for small or simple programs. Automatic calculation is studied in a number of recent papers, and a number of research tools are mentioned in these papers, among them tools for MODULA/R [Puschner93], C [Puschner89], [Park91:2] and C++ [Vort92]. Calculation of maximum execution times for Smalltalk programs is not mentioned anywhere in the literature; it seems that our research in the RTT project is the first attempt in this direction. Automatic calculation does not seem to be used in industrial practice, and there are no widely used commercial tools.

A problem with the calculation of maximum execution times is that the execution times for basic language constructs are available only for assembly or machine instructions. When analyzing high-level programs, one has to know what machine code is produced from the high-level constructs. This implies that the code generation of the compiler also has to be known.

Some high-level object-oriented program constructs are data dependent. This means that only a lower and an upper limit for these constructs can be calculated, because the data values vary between different executions; what the actual data will be during execution is not known at compile time. These constructs are selections, repetitions, recursion and dynamic binding.

Selections are rather easy to handle. The lower and upper limits of this construct are simply the time for the code selection plus the shortest or the longest alternative, respectively. The lower and upper time limits for repetitions are simply the time to execute some initialization plus the minimum or maximum number of repetitions of the code, respectively. However, for repetitions of while-type, the maximum number of repetitions of a loop cannot in the general case be calculated by a static analyzer. A syntax for the maximum number of repetitions must therefore be added to the language, or the data must be transferred to the analyzer in some other way. In [Puschner89] both max-count loops and max-time loops are introduced. However, in our work with RTT, only max-count loops are used, since it makes little semantic sense to time out somewhere in the middle of a loop¹.

¹ The argument is the same as for Real-Time Euclid [Stoyen91].
Chapter 2: Analysis of execution time
Recursion is hard to handle, both direct and indirect. The number of recursion levels before the base case is reached is data-dependent and cannot in the general case be calculated by a static analyzer. Recursion must therefore either be forbidden or limited to a fixed maximum number of levels. Recursion in a non-typed, polymorphic language like RTT and Smalltalk can also be hard to identify: what looks like recursion may not be recursion at all. For example, the code for a method print may contain sends of the message print which at run-time always go to instances of another class. In the present version of RTT no type of recursion (true recursion, class recursion or polymorphic recursion) is allowed2.

Object-oriented programs pose yet another problem. The late (dynamic) binding which is a consequence of polymorphism means that a static analysis cannot determine which code will be executed at run-time for a certain message send. This is the case with virtual functions in C++, and is even more common in Smalltalk, which is an untyped language. Overestimation may occur3.

2.1.3. Combination of methods. A combination of measurement and calculation is also possible. One idea is to measure the execution times of small code fragments and then use these times to calculate the execution times of larger program constructs. This approach is used e.g. by G. Wall et al. in Uppsala [Wall92] and is described in section 3.7.
2.2. System and hardware demands.
The execution time of a certain software module in a certain computer system depends on a number of factors. These can be identified as:

1. The program source code.
2. The compiler.
3. The run-time system and the operating system.
4. The processor and other hardware.

The program code defines the control flow graph and the instructions that are executed along the execution paths. The analysis of the program code is the main topic of this thesis. Detailed knowledge of how the compiler translates the source code to machine instructions is essential. The run-time system and the operating system also affect the execution time, through factors like the execution strategy (e.g. late binding) and the time for system services (like floating point calculations) and communication. The system must be designed in a way that makes it possible to give an exact execution time (or a min-max interval) for each of these factors.
2 /This is discussed in more detail in section 5.2.2.1.
3 /For a discussion, see section 6.3.
The time behaviour of the processor and other hardware constitutes the basis upon which all other calculations rest. This time behaviour must be deterministic, i.e. known in advance. This presents a problem: many modern processors are designed to optimize throughput by using caches, pipelines, etc. This optimizes the average behaviour, but the worst-case execution times are very complicated to calculate (see for example [Zhang93]), since the internal behaviour of the hardware is so complicated and dynamic. Other interferences come from interrupts and the memory refresh of dynamic RAM. There are some ways to solve this problem:

1. Use a defensive approach, avoiding the optimizing hardware.
• Use "old" processors like the Motorola 68000 or 68010, which don't use caches or pipelines4.
• Use static RAM without memory refresh.
• Take clock interrupts into account using e.g. the formulas in [Park91:1].
• Don't use other stochastic interrupts.

2. Try to calculate the execution times for the optimizing hardware.
• The memory refresh of dynamic RAM is discussed in [Park91:1], and the worst-case effect on execution times is calculated.
• The worst-case execution times for pipelined processors are discussed and calculated in [Zhang93].

3. Adapt the optimizing hardware to real-time.
• One approach is the SMART cache approach of Kirk [Kirk92], which yields deterministic cache behaviour.

4. Develop new processor architectures for real-time.
• One example is the RTX2000 from Harris [Harris], which uses parallelism to get deterministic execution times; all instructions take one or two clock cycles to execute.

In the RTT project, we at present use a combination of the first and second approaches (Motorola 68000 with dynamic RAM and only clock interrupts).
4 / In fact, the Motorola 68000 and 68010 use prefetch hardware, but the times for the prefetch are included in the execution times for the instructions. See also app. A.
2.3. Real and calculated execution time.
Figure 2.1: Different measures of execution time (a time axis starting at 0, with MINTC, MINT, AVET, MAXT and MAXTC marked in that order; the times for the different paths fall between MINT and MAXT).

The real execution time of a program, T, may vary depending on the input data. With

MINT = real minimum execution time
MAXT = real maximum execution time    (Def. 1)

we mean the shortest and the longest real execution time it takes to execute a given program, for any input data, compiled with a given compiler on a given hardware. In many cases, MINT and MAXT are unknown. With

MINTC = calculated minimum execution time
MAXTC = calculated maximum execution time    (Def. 2)

we mean the highest lower bound and the lowest upper bound our calculation of execution times gives us for a given program, compiled with a given compiler on a given hardware. The calculation must be safe, i.e. never give a MAXTC that is too low or a MINTC that is too high. That is, the following inequalities must hold:

MINTC ≤ MINT and MAXTC ≥ MAXT.    (1)

The narrowness of the interval between MINTC and MAXTC tells us something about the quality of the calculation and about the dynamic factor of the program (see section 2.5). If, for example, MINTC = MAXTC, we know that we have a perfect calculation and that the dynamic factor is one. From (1) it then also follows that MINTC = MINT and MAXTC = MAXT, i.e. MINT and MAXT are determined. Therefore, it is worthwhile to calculate MINTC as well as MAXTC. MINT and MAXT are bounded by MINTC and MAXTC.
The pathwise average execution time of a program is

AVET = (1/n) ∑(i=1..n) T_i    (2)

where T_i is the execution time of path i and n is the number of different paths through the program. There are other possible mean values, like means over the input data domain, or over the actual input data distribution during run-time execution. Note that the average execution time is not very interesting for real-time programs.
2.4. The overreservation factor.
An interesting figure is the overreservation factor

OF = MAXTC / MAXT    (Def. 3)

for a software module. It expresses the quality of the calculation, since it captures the relative error of the execution time calculation. It also expresses the overreservation of hardware execution resources. Of course, OF is unknown if we don't know MAXT. Consider for example a cyclic system with only one module and no operating system overhead. If MAXTC is chosen as the cycle time, a deadline violation can never occur. However, the highest possible CPU load (utilisation) is

U = MAXT/MAXTC = 1/OF

which means that (1 - 1/OF) * 100% of the hardware capacity is never utilised. OF should be as close to 1 as possible if high utilisation of the hardware is important. Note that MAXTC = MINTC => OF = 1. An important cause of large overreservation factors is repetitions (loops) in the program, especially in programs with a number of loops which are dependent in such a way that the worst case for all loops never occurs at the same time. Another cause of large overreservation factors is selections: in a program with a number of dependent selections, it is possible that the worst case for all program branches never occurs at the same time. This also leads to a large overreservation factor.
These causes of overreservation have been discussed in several papers, and there are some suggestions for how to minimize them. See for example the presentation of the work by Puschner et al. in section 3.3 and of Park in section 3.4. Another important cause of large overreservation factors in object-oriented software is the use of polymorphism. Consider for example the expression

anObject aMessage.

Since the type of anObject is not known at static analysis time, we cannot decide which method will be invoked by aMessage. The largest MAXTC and the smallest MINTC among all methods with the selector aMessage in the system have to be chosen. This is of course not optimal, and may lead to a diminishing use of polymorphism in object-oriented software, which is contrary to the basic ideas of object-oriented programming5.
2.5. The dynamic factor.
The dynamic factor (D) for a software module is defined as

D = MAXT / MINT    (Def. 4)

D expresses the execution time dynamics of a module. From calculations, only the calculated dynamic factor

CALD = MAXTC / MINTC    (Def. 5)

can be obtained. From inequality (1) and Def. 5 it follows that

CALD ≥ D    (3)
5 /However, this problem can be solved, at least partly, by the use of type checking and type inference. This is discussed in more detail in section 6.3.
3. State of the art

3.1. Overview.
During the 1960s and 1970s, most real-time applications were programmed in FORTRAN or assembler. However, coding, testing and maintaining systems programmed in those languages was difficult. Also, neither of these rather primitive languages was suited for applications where time behaviour analysis was important. Small programs in assembler could be analysed by hand, but for systems of some size, trial-and-error testing was necessary to check that the systems met the timing requirements.

Later, in the 1980s, many companies turned to modern general systems programming languages like Pascal, Modula-2, and C, to get better support for the structuring, coding, debugging and maintenance of programs. However, these languages also lacked support for calculation of real-time behaviour. A number of programming languages were developed during the 1970s and 1980s, like Pearl and Ada, which were claimed to be suitable for real-time applications. These languages, well suited for general systems programming, did not really support the programmer in writing reliable real-time software, and did not have the support for time analysis or schedulability necessary in hard real-time systems. A critical analysis of seven "real-time" languages, including Pearl and Ada, is found in [Halang90]. However, Ada is still used for real-time applications, and development of tools and methods for Ada continues; see for example section 3.7.

A new approach was taken in Real-Time Euclid, which restricts the available constructs to those that are time-bounded. This makes the execution time of the program easier to calculate by static analysis. See section 3.9. Another approach was adopted in Esterel [Berry85]. The language permits direct expression of timing requirements and exception handlers to handle broken deadlines. However, the lack of compile-time analysis means that the programs still are not predictable.
The FLEX approach is not based on static analysis of the code, but on a mathematical model of the program's time behaviour, verified through measurement. See section 3.5. A common approach has been to add real-time constructs to general programming languages in order to support time behaviour calculation by static analysis. This is done for example in the MARS system (C and Modula-2, see section 3.3) and in the work of Shaw and Park (C, see section 3.4). Lately, the complexity of real-time systems has increased due to their use in larger and more complex applications. The object-oriented concept is one attempt to handle this complexity. Tools and methods to calculate the execution time of object-oriented software are studied by a number of researchers; see for example sections 3.5, 3.6 and 3.8, and our own project, RTT.
3.2. Mok.
3.2.1. General.
3.2.2. Languages supported.
3.2.3. Description of the approach.
3.2.4. Results.
3.2.5. Discussion.
3.3. MARS.
3.3.1. General. MARS stands for MAintainable Real-Time System and is a system developed at the Technical University of Vienna in Austria. The goal of the MARS project is to develop a distributed real-time system with a guaranteed, deterministic behaviour [Kopetz89]. All parts of the system, like hardware, compilers, and schedulers, are built within the project to get total control over the timing behaviour of the system. As a part of the project, a special tool has been developed that calculates the maximum execution time of the software from the source code.

3.3.2. Languages supported. The early version of the MARS system used MARS-C, a descendant of the C language. Later, MARS-C was replaced by MODULA/R, a real-time adaptation of Modula-2.

3.3.3. Description of the approach. In the now almost classical paper [Puschner89], Puschner and Koza discuss the restrictions that have to be placed on a common imperative language, such as C, to make calculation of the maximum execution time possible. They found that the following problems must be solved:

- the maximum number of iterations, or the time spent, in a loop must be bounded.
- recursion must not be allowed, due to the problem of determining the maximum recursion depth.
- functions as parameters and pointers to functions must not be allowed, because MAXT for these functions is unknown.
- GOTO must be forbidden, because unstructured programs cannot be analysed by a tool.

They proposed the following constructs to solve these problems: the maximum number of iterations in a loop is specified with the MAX_COUNT keyword in loop statements. Similarly, the maximum time to be spent in a loop is specified with MAX_TIME. If either of these bounds is broken, a type of exception handling is invoked. The authors define a set of formulas for simple language constructs: simple statements, statement sequences, alternatives, bounded loops and subroutines. The maximum execution times for the machine instructions of these constructs are calculated in the compiler. The maximum execution time for a program is calculated by the recursive use of these formulas.
The authors then define new constructs to reduce overestimation. Markers define the maximum number of times the control flow can pass through a certain position in the program. A scope defines the part of the program where the marker is valid. The usual use of markers is to define an upper bound on how many times a path can be executed within a loop (the scope then coincides with the loop). If, for example, the longest path through a loop can be executed only some of the MAX_COUNT iterations, the calculated upper bound is diminished. For loops which are dependent in such a way that the sum of the iterations never exceeds a fixed value, loop sequences are defined with certain keywords.

The recent paper [Puschner93] presents a programming environment for the new language MODULA/R. The same basic approach as before is used, but a more detailed description of the algorithms is given. As before, scopes, markers and loop sequences are used to reduce high calculated upper bounds. New is the analysis of the return and break statements and their impact on overestimation of the upper bounds. When, for example, a return statement is executed, the current function is terminated, which means that all subsequent statements are skipped. An analysis may thus show that the shorter alternative of a selection, without a return statement, contributes more to the total MAXTC of a function than the longer alternative with a return statement. A similar reasoning is made for loops. All information about the control flow is stored in the timing tree, and this information is also presented to the programmer. The actual worst-case times as well as hypothetical execution times are stored there. Hypothetical execution times are times that the programmer can use for modelling: he can enter values and study their effect on his program. The contribution of code sections to the total MAXTC is presented, so the programmer can decide where improvements should be made.

3.3.4. Results.
In the earliest paper, [Puschner89], the authors analyse a program with and without the marker construct. Their result is the following:

Construct used                  MAXTC
Bounded loops only:             551 475 096 cycles
Bounded loops and markers:       46 810 232 cycles

When markers are introduced, MAXTC is reduced by more than a factor of 10! Of course, the success of the method depends a lot on the application.
These constructs can also be used at run-time to supervise the program behaviour. The paper calculates the overhead of these run-time checks for this example, i.e. the execution time that the checks themselves use:

Construct used                    MAXTC
Bounded loops (calc_weigth):      2 115 840 cycles
Bounded loops (calc_center):      4 879 238 cycles
Marker:                             139 216 cycles

It is striking to see how cheap the marker construct is, in comparison to its dramatic effect on MAXTC (see above).

3.3.5. Discussion. The work done in the MARS project is a cornerstone of the real-time field. This is also the case with the methods and tools for execution time analysis developed in the project. The basic problem, calculating the execution time of programs in imperative languages, is solved. The high upper bounds of the calculations are reduced through new language constructs. Innovative programming tools support the programmer. Two things could be pointed out as missing in the MARS approach:

- MINTC is not determined.
- Object-oriented programming is not supported.
3.4. Shaw and Park.
3.4.1. General. A. C. Shaw and C. Y. Park from the University of Washington, Seattle, have presented a number of interesting papers on the analytical approach to execution time determination. The basic idea is to analyze the high-level source code of the real-time software and to calculate the time behaviour of the programs.

3.4.2. Languages supported. The authors have built a tool that analyzes a subset of C. The GNU C compiler is used without optimization, and the Motorola 68010-based Sun 2/100U is chosen as the target computer. For the path analysis, a prototype is built for a subset of C and for IDL (Information Description Language).

3.4.3. Description of the approach. The approach is to define a timing schema for the constructs of the software, and then proceed in the following way [Shaw91], [Park91:1]:

1. Decompose the statements into their primitive components, as defined in the timing schema.
2. Predict the implementation (e.g. object code) of each primitive component.
3. Determine the execution time of the primitive components from the times of the machine instructions.
4. Compute the execution time of the statements, using the times of the primitive components and the timing schema for the statements.

For example, the execution time of the statement

S1: a := b + c;

is calculated via the timing schema

T(S1) = T(b) + T(+) + T(c) + T(a) + T(:=)

which is composed of 5 primitive components.
The object code for each primitive component could, in a hypothetical assembly language, be predicted as:

b:   mov M, R    /* mov b, d0 */
+:   add M, R    /* add c, d0 */
c:   none
a:   none
:=:  mov R, M    /* mov d0, a */

The predicted execution time of S1 becomes the sum of the times of the three instructions. The papers discuss the problem of code generation in compilers. Many compilers optimize code within basic blocks by the intelligent use of registers, elimination of common expressions, etc. (see for example [Aho89], section 10). This means that the same statement may correspond to different object code in different program contexts. For example, the first assembly instruction in the object code above (mov M, R) would be eliminated by register optimization if the value of b was already stored in a register. The authors point out some ways to solve this problem. The method used in their tool is that the extended compiler places markers between the primitive components. With the help of these markers, the exact generated code for each primitive component can be identified. Timing schemata for more complicated constructs, such as selections and iterations, are given, and both sequential and parallel programs are analyzed.

Lately, Park has continued the work with a very interesting article in the Real-Time Systems Journal [Park93]. The basis is still the timing schemata, but by using user-provided information, infeasible paths in the program can be eliminated. In this way, tighter prediction bounds can be achieved. Park uses a path model (see figure 3.1 below). Ap is the set of all feasible program paths found by static analysis of the source code. Ip is the set of feasible paths given by the user-supplied information. Xp' is the set of possibly executable paths when Ap and Ip are combined. Finally, Xp is the set of paths really possible in the program.

Xp' = Ap ∩ Ip

Timewise, Xp' may give a better prediction than Ap, since the user-supplied information in Ip may exclude some paths that would give too high or too low calculated execution times in Ap.
The paths in the program are analyzed by pathwise decomposition, i.e. the program is decomposed into subsets for different patterns of possible program behaviour.
Figure 3.1: Program path model (within the universe Σ* of all paths, Xp ⊆ Xp' = Ap ∩ Ip).

The user-supplied information is given in an information description language (IDL), which contains constructs like always(A), samepath(A, B), nopath(A, B) and execute A x times, where A and B are statements. Example of a C function:

check_data()
{
    int i, morecheck, wrongone;

    morecheck = 1; i = 0; wrongone = -1;
L:  while (morecheck) {
        if (data[i] < 0)
A:          { wrongone = i; morecheck = 0; }
        else
B:          if (++i >= datasize) morecheck = 0;
    }
    if (wrongone >= 0)
C:      { handle_exception(wrongone); return 0; }
}
User-defined information for the function (IDL statements):

loop L [1, 10] times;
    (Loop L is iterated 1..10 times, since the size of data is known to be 10)
samepath(A, C);
    (Statements A and C must be executed together)
(not A) imply loop L 10 times;
    (If A is not executed, then L is iterated 10 times)
execute A [0, 1] times inside L;
    (The exception case A is executed at most once inside L)
The information from the program text and the IDL statements is combined in a path processor (see figure 3.2). Finally, the times T(P) for the program are calculated.

Figure 3.2: The time calculation tools by Shaw (the program text goes to the timed Ap-generator, producing Ap; the information goes to the Ip-generator, producing Ip; the path processor combines Ap and Ip into Xp'; the time computation then yields T(P)).
3.4.4. Results. The paper [Shaw91] presents comparisons between calculated time bounds and measured times for three programs. The agreement is quite good. In one case, some impossible paths are eliminated using data supplied by the user, which gives a tighter prediction. In [Park93], nine different programs have been analysed with and without path analysis. The results are very interesting: in all cases except three, the predictions are tighter with this approach. For example, the check_data function given above gives the following result:

Prediction without path analysis:  MINTC = 32.15 µs   MAXTC = 252.00 µs
Prediction with path analysis:     MINTC = 67.75 µs   MAXTC = 248.60 µs
Measured times:                    MINT  = 73.1 µs    MAXT  = 234.9 µs
3.4.5. Discussion. The early results of Shaw and Park are a basis for much research in the field, and the analytical approach is similar to the one used in RTT. The path analysis approach of Park is very promising. However, some of his constructs may be of only theoretical interest. Programs with too many IDL statements may be error-prone and hard to maintain; perhaps the information requested from the programmer is too complicated. Park shows that only some of the IDL statements are really effective in tightening the calculations. It is therefore a good strategy to give the information incrementally. One problem with the user-supplied information given in IDL is that the information must be correct, i.e. it must not conflict with the program. Park discusses this problem [Park93] and shows that it can be reduced to a problem of program verification by using assertional program logic. Another problem with the Park approach is that the calculation of execution times becomes very complicated when many of the possibilities in the IDL are used. The MARS (Puschner) approach and the Shaw and Park approach have very much in common. The MARS ideas about restricting the high upper bounds with new language constructs are taken further by the IDL in Park's paper (see above). However, the MARS project has probably come the furthest with the implementation of its ideas in a real system environment.
3.5. FLEX (Kenny, Lin et al.)
3.5.1. General. FLEX is a real-time language, which is part of the Concord project [Lin87]. FLEX is not based primarily on analysis of the program, but has an empirical approach: the FLEX system measures the timing behaviour of the software and then uses the measurements to find the parameters of a timing model of the software. This timing model is programmer-supplied [Kenny91:1]. FLEX does not forbid the programmer to use constructs like unbounded loops and recursion, nor does it abandon non-deterministic hardware like caches. It is claimed to work for programs in general.

3.5.2. Languages supported. FLEX is a superset of C++ and C. The code is instrumented by inserting #pragma statements, which are expanded by the compiler to executable code that performs the measurement.

3.5.3. Description of the work. The idea is to instrument the code with pragmas and then, during execution, collect timing information according to these pragmas. A pragma is an extension of the language. Mean and maximum durations can be measured. The measurement is done in software, but should ideally, especially for fine-grained measurement, be done by special hardware. The timing results are analyzed and compared to the timing model.
Figure 3.3: The FLEX tools (the instrumented FLEX program is compiled by the FLEX compiler into an instrumented object program, which yields measured execution times; the timing analyzer combines these with the timing-model description).

After the program has been run several times, the timing analyzer program is run. This program determines the best fit of the model parameters to the observed run-times and gives a confidence level for the model.
The method can be used e.g. to choose between different software routines performing the same function. Performance polymorphism, meaning an implicit selection of the best function depending on the actual data, is presented in [Kenny91:1]. This polymorphism can be used to build flexible real-time systems with adaptive behaviour: if there is not enough time to execute the full algorithm for some value, an imprecise, but still good enough, value may be calculated by a faster algorithm selected by the system [Kenny91:3].

3.5.4. Results. Some objects in the GNU C++ class library were analysed by the authors, and they found a very high confidence level for the expected model.

3.5.5. Discussion. FLEX is a very interesting performance analyser, and of course it fits perfectly in the performance polymorphism system. However, the technique used is best suited for programs that make numerical calculations, where a clear correspondence exists between e.g. the size of the input data and the execution time. It is less suited for e.g. a part of an operating system, where complex relations may exist between input data and execution time. Also, the random measurement technique that is the basis of the analysis contains a problem: you can never prove anything with an experiment! A complex program may contain unexpected behaviour for certain combinations of input data, and this possibility may be missed by the FLEX system. The analytical approach of other researchers and of RTT finds such behaviour, since all program paths are analysed.
3.6. DEDOS
3.6.1. General. DEDOS (Dependable Distributed Operating System) is developed at the University of Eindhoven. DEDOS is designed to handle distributed real-time processes. Soft as well as hard real-time processes are supported. A timing tool for DEDOS has been developed, which calculates the upper execution time limits for the hard real-time processes [Vort92]. Processes execute in beads, which are parts of process executions delimited by synchronization points.

3.6.2. Languages supported. DEDOS applications are written in C++, which is interesting: it is the only project, besides RTT, in which execution times for object-oriented programs are calculated. However, not all constructs of C++ are supported. In [Vort92], 8 restrictions are given; e.g. recursion is not allowed, goto is not allowed, and C++ library functions must not be used. Also, a new construct, the MAX_COUNT keyword, is added to the language to handle while-type loops.

3.6.3. Description of the approach. The timing analyser of DEDOS takes an analytical approach, based on Park's and Puschner's work, and uses the following method:

1. Compile the C++ source code to machine (assembly) code.
2. Scan and parse the C++ source code and divide it into small fragments for which no data dependency exists.
3. For these small parts, calculate the execution times by parsing the corresponding machine (assembly) code and adding the clock cycles of the instructions.
4. Combine the execution times for these small parts into high-level program constructs in a predefined way (a timing schema).
Figure 3.4: The DEDOS timing analyzer tools (DEAL code is preprocessed to C++ and compiled to machine code; the timing analyzer combines the machine code with the pre-processor database into timing info). DEAL = DEDOS Application Language, PPDB = Pre-processor Database.
3.6.4. Results. No results are presented in [Vort92].

3.6.5. Discussion. The method used is very similar to the approach of Shaw and Park (see section 3.4) and to our approach in RTT. The code is analysed at two levels, the machine level and the high-level language level, using timing schemata. However, the DEDOS method has one advantage over our approach: since the actual machine code is parsed, they can be sure that the calculated times are correct, and even some compiler optimizations can be turned on, as long as the machine code corresponding to the C++ code can be found. In RTT, the machine code analysis is made off-line, that is, the machine code components are parsed once6, and there is no guarantee that the same machine code will be produced in the actual program, even if all optimizations of the compiler are turned off7. What complicates matters in DEDOS is the fine-grained execution units (the beads). Since synchronization can occur at many points inside a C++ process, many different cases have to be taken into account. These problems seem to have been solved.
6 / This is described in section 5.3.
7 / See section 5.3.1.
3.7. Wall (Ada)
3.7.1. General. Göran Wall et al. at the Department of Computer Systems at Uppsala University are developing a source-level analysis technique based on a combination of measurements and program analysis [Wall93]. The goal is to create databases for use in program execution time estimation.

3.7.2. Languages supported. The language analyzed is a subset of Ada.

3.7.3. Description of the approach. The basic constructs of the language are measured using a variant of the dual-loop technique described in [Clapp86]. The idea behind the dual-loop measurement is to have two loops, one containing a null statement in the body, the other containing the code being measured. The execution time is of course the difference divided by the number of iterations. However, to get the worst-case time one must be very careful with the analysis of effects from e.g. compiler optimizations and hardware optimizations. This problem is discussed in [Clapp86]. The authors have developed a statistical method to find the execution times of the different statements that enter the mix of code in the loop body. A number of basic operations in Ada (e.g. boolean operations, float operations, loop statements and procedure calls) are measured. As an error measure, the standard deviation of the calculation is used.

3.7.4. Results. The times are measured on a Force CPU-37BZE with an M68030 processor with the data and instruction caches switched off. Moreover, all compiler optimizations were switched off. Basic operations were calculated with rather small standard deviations. Then an analytical method was used to estimate the execution times of some larger programs. Good agreement between measured and estimated execution times was found.
Two functions that calculated the sum of 10 integers in an array were analysed. One used value parameters, the other reference parameters. The measured execution times are compared to the estimated execution times for the functions in the table below.

                              Sum array         Sum array
                              value parameter   reference parameter
  Measured execution time          59.5              59.9
  Estimated execution time         71.9              62.0
  Standard deviation               12.3              12.4
  Overestimation                   12%               4.3%
The estimated execution time is determined with the method described above. The standard deviation is a measure of the error in the estimated execution time.

3.7.5. Discussion. The approach used is a pragmatic one which gives useful results when one does not have total control over the compiler, the run-time system and the hardware. The paper [Wall93] discusses the correctness of the model. It is assumed that there is a linear relation between the number of executed features (constructs) and the execution time, and that the features do not interfere. However, little is really known about what happens at the lowest level: what machine code is really executed, and how does the hardware optimize the execution? This is really the weak point in the method, as it is in RTT until the C compiler is controlled or a compiler of our own is developed. The validity of the results could be measured with a statistical hypothesis test (an R² test). This is however not done in the paper. Since the standard deviations in the calculations are not zero, some uncertainties in the measurements exist. Where do they come from? If there are variations in the execution times, isn't the measured time more a mean time than a maximum time? The authors comment on this by saying that when optimizations are switched on, all times will be shorter. This is certainly true on average, but there may still be cases where this is not valid8.
8 /See section 5.3.1.
3.8. CHAOS (Bihari et al.)
3.8.1. General. Gopinath, Bihari and Gupta discuss the difficult task of managing real-time constraints in large software systems in [Gopi92]. The authors are also engaged in the CHAOS project. Object-orientation is an effective way of managing complexity and gives the designer a convenient way to model the software. The article presents some ideas to improve the predictability of the code produced by the compiler for an object-oriented language.

3.8.2. Languages supported. No languages are mentioned in the paper; it is a theoretical study.

3.8.3. Description of the approach. There are two goals for each object in the system:

1. Avoid timing faults. Maximize the predictability of its own execution time.

2. Tolerate timing faults. Minimize the dependency on other objects' execution times.

The approach has two phases:

1. The compiler uses available information to rearrange the code to improve its predictability.

2. The compiler adds monitoring points in the code for dynamic tuning.
The compiler computes the best and worst execution times, together with the variance, derived from the syntax tree of the application. The rearrangement of the code is based on its predictability and monotonicity. The code is divided into partitions of different types:

• Predictable. The code has a fixed execution time, because it consists of sequences, conditionals with equal branches or loops with a fixed number of iterations.

• Unpredictable. The code has a varying execution time, because it consists of combinations of conditionals with unequal branches or loops with an unknown number of iterations.

• Monotonic. The quality of the result depends on the execution time. A loop refines the result if it is allowed to iterate more than a specific base number of times.

• Nonmonotonic. The quality of the result does not depend on the execution time.
The predictability is then based on the variance of the execution time. Monotonicity is a semantic property of the code, and it is therefore given as certain pragmas in the code. The compiler optimizes the code by rearranging it, within the constraints of data and control dependencies. The idea is to:

• Move the unpredictable code towards the start of the application, so a risk for deadline failure can be detected early.

• Identify the monotonic partitions, so the number of iterations can be lowered if a deadline failure is possible9.

The idea is that the monitoring points in the code will detect a risk for deadline failure, and the application will, during run-time, adapt itself to the shorter available time. If this is not possible, the application will be aborted.

3.8.4. Results. No results are given in the paper.

3.8.5. Discussion. The idea is very interesting, but an implementation would be very complex, as the discussion in the paper shows. Also, no guarantee is given that the application really will meet its deadline. Thus, this is not a solution for hard real-time systems. The idea could be developed in the following way. First, one would give a guarantee that the unpredictable code will never exceed a certain maximum execution time. Then, if the monotonic code could give a reasonable result within the time left, we would have a hard real-time solution. This solution would also attack the problem of over-reservation.
9 / This approach has similarities with the FLEX approach (see section 3.5).
3.9. Real-Time Euclid.
3.9.1. General. Alexander Stoyenko, then at the University of Toronto, was one of the driving forces behind Real-Time Euclid, a language specially addressed to reliability and schedulability in real-time systems. The language is described in [Stoyen86] and the schedulability analysis in [Stoyen91].

3.9.2. Languages supported. Real-Time Euclid, which is a language specifically developed for hard real-time systems.

3.9.3. Description of the approach. The real-time model, used in systems developed in Real-Time Euclid, has three components:

• the language and the compiler

• the schedulability analyser

• the run-time system
The language, Real-Time Euclid, is a descendant of Pascal and similar languages. It supports concurrent execution of processes. Activation of processes can be both periodic and aperiodic. Synchronisation between processes is handled with monitors. To guarantee correct estimation of the execution time, a number of features common in general programming languages have been removed. These are:

• dynamic arrays, pointers and arbitrarily long strings

• recursion

• unbounded loops

• process creation during run-time

Some constructs are added to support real-time programming. The language supports exception handling to enhance fault tolerance. Real-Time Euclid supports pre- and post-conditions in the programs.
The compiler generates code for the NS16000/32000 microprocessors. The schedulability analyser consists of two parts, the front-end and the back-end analyser. The front-end analyser is language-dependent and builds language-independent program trees which contain time information. The back-end analyser, which is language-independent, maps the program trees onto the run-time system, and then computes the total execution time. The run-time system that the trees are mapped upon is a distributed configuration with support for time-bounded mailbox communication. Besides execution time and communication delays, the scheduler takes blocking times, due to other processes and devices, into account.

3.9.4. Results. In [Stoyen91], two Real-Time Euclid programs were analysed to evaluate the schedulability analyser. The results were compared with the Leinbaugh and Yamini method [Leinb80] and [Leinb82]. The scheduling method developed by the authors performed quite well:

• very accurate (within 4%) for a light load case

• good (within 7%) for a medium load case

• reasonably accurate (within 14%) for a heavy load case

• marginally inaccurate (within 33%) for a heavy load case
The Leinbaugh and Yamini method resulted in much more pessimistic predictions.

3.9.5. Discussion. Real-Time Euclid is an interesting example of how a language specially developed for hard real-time can be designed and implemented. It also shows how the features available to the programmer are limited in such a language. The scheduling of processes becomes quite complicated, since periodic and aperiodic execution, process preemption, blocking and communication delays are taken into account. In RTT, scheduling is much more straightforward and simple, since our "processes" are non-preemptable and periodic. Nevertheless, the scheduling problem seems to be solved in the Real-Time Euclid system, and the predictions seem to be very close to the measured times. However, the results are very dependent on the chosen applications.
Chapter 4: Presentation of RealTimeTalk (RTT)

4. Presentation of RealTimeTalk (RTT).
This chapter presents RealTimeTalk [Bro92], which is a framework and an object-oriented language for distributed hard real-time systems. The main goals of RTT are to simplify the modelling and design of predictable real-time systems. We believe a system must be described with a syntax and semantics that are non-ambiguous to fulfil the above goals. Such a description simplifies the design and increases the maintainability of the system. These descriptions are made in either graphical or textual form, using the design objects in RTT. A basis for the development of RTT is given in [Brorsson92]. An overview is given in [Eriksson93], and [Eriksson94] describes the RTT analysis and design philosophy, the RTT object model, the RTT tools and the RTT run-time system in detail.

4.1. The design objects in RTT.
The design objects are defined within the RTT framework, which is a real-time framework with support for describing an application. The framework also includes basic classes, e.g. collections, arrays, etc. This framework is similar to the Smalltalk framework [Gold89]. The design philosophy in RTT is to use hierarchical decomposition, where one starts by finding the different operational modes for the application. The design steps are the following:

1. Define the different operational Modes of the application.

2. Define the allowed Mode Transitions between these modes.

3. Define the Usecases that belong to each mode.

4. For each usecase in the system, define the Parallel Executable Objects and their Object Relation Graphs and Precedence Graphs.

5. Define the object relation graph and the precedence graph for the objects that are involved in the Mode Transitions.

6. Implement the methods for all objects in all usecases.

Figure 4.1: The hierarchical structure of an application.
4.2. Usecases.
The services that each mode includes are called usecases. A usecase is the computation between a stimulus from the environment and a response given to the environment. Each usecase is defined by:

• its temporal characteristics (e.g. period time, release time, and deadline). These data are calculated from the user requirements in the analysis and design work.

• a precedence graph, i.e. the execution order of the objects involved in the computation.

• an object relation graph that defines the associations between the objects involved in the computation.
In RTT, all usecases are periodic. The services for aperiodic events are converted to periodic usecases according to the theory of Mok [Mok84]. The time dependency between the computational steps within a usecase is defined in a precedence graph. The objects that are defined in the precedence graph are called Parallel Executable Objects (PEO). These objects have all the capabilities needed to be distributed, and they define the granularity of schedulable objects. The objects which represent a sensor or actuator are called Interface Objects (IO). Data Communication Objects (DCO) are used to communicate between Parallel Executable Objects on different nodes. The figure below shows an example of a precedence graph for a usecase. S1 and S2 are input objects. Their update methods can be executed concurrently, while the control method in the object Motor must wait for both update methods to finish, so the input values are in a stable state. Activation methods are methods that are periodically activated during run-time.
Figure 4.2: Example of a precedence graph for a usecase.
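The precedence constraints of Figure 4.2 can be represented directly as a small data structure. The sketch below is illustrative only; the MAXTC numbers are invented and this is not the RTT scheduler:

```python
# Each activation method lists its predecessors in the precedence graph;
# "Motor control" must wait for both update methods to finish.
preds = {
    "S1 update": [],
    "S2 update": [],
    "Motor control": ["S1 update", "S2 update"],
}
maxtc = {"S1 update": 2, "S2 update": 3, "Motor control": 5}  # invented times

def earliest_finish(method):
    """Earliest finish time, assuming predecessors may run in parallel."""
    start = max((earliest_finish(p) for p in preds[method]), default=0)
    return start + maxtc[method]
```

With the invented times, Motor control cannot finish earlier than max(2, 3) + 5 = 8 time units after release, which is the kind of constraint the off-line scheduler has to respect.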
The scheduler gives each activation method a time slot in the schedule. The length of the time slot is MAXTC for the method10. Note that activation methods are not preemptable in RTT. For example, the precedence graph above could generate the following cyclic schedule in a one-node system:

Figure 4.3: Example of a schedule for a usecase.

In a system with more than one node, S1 update and S2 update could execute in parallel. Note that a PEO can have its own release time and deadline within a usecase. One example of the use of this is shown in the figure below. The objective is to reduce the jitter for input and output in a control application. The release time and deadline are used to narrow the possible execution window so the jitter becomes as small as possible. If the deadline is chosen as release time + MAXTC, the execution is locked in time, and the jitter only depends on what happens inside the method.

Figure 4.4: Use of release time and deadline of a PEO.

In this case, it is wise to separate input, control algorithm and output into different objects, especially the last two. Output is normally performed at the end of a method, and a complex algorithm often has a big CALD (dynamic factor). With a simple output object the CALD can be small, and so can the jitter (= the difference between MAXTC and MINTC, if output is made last in the method). High CALD values are often due to loops in the code. Therefore, if small jitter is required, loops should be avoided or should iterate a constant number of times.

10/ Actually, it is MAXTC corrected for run-time system overhead for activation, clock interrupts
and termination. See [Eriksson94] for details.
4.3. Scheduling.
The maximal execution time of the activation method of the PEO:s must be known in order to find a feasible schedule, that is a schedule where all deadlines are met. The scheduling is performed by an off-line scheduler which makes a heuristic search. The scheduler outputs a dispatch table that is used by the run-time system. The off-line scheduler is a part of the RTT tools (see below).
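A minimal necessary condition that any such schedule must satisfy can be sketched as follows. This is an invented illustration with made-up times, not the RTT heuristic search: in a one-node cyclic schedule the non-preemptable MAXTC-sized slots run back to back, and every activation method must finish before its deadline.

```python
# Sketch of a deadline check for a one-node cyclic schedule: methods run
# back-to-back in non-preemptable slots of length MAXTC, and each must
# finish no later than its deadline. All numbers are illustrative.
def feasible(order, maxtc, deadline):
    """True if executing the methods in 'order' meets every deadline."""
    t = 0
    for m in order:
        t += maxtc[m]          # slot length is MAXTC for the method
        if t > deadline[m]:
            return False
    return True

maxtc = {"S1 update": 2, "S2 update": 3, "Motor control": 5}
deadline = {"S1 update": 6, "S2 update": 6, "Motor control": 12}
```

For example, the order S2 update, S1 update, Motor control finishes at times 3, 5 and 10 and is feasible, while putting Motor control first pushes S2 update past its deadline. The real scheduler additionally enforces precedence, release times and distribution.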
4.4. The RTT language.
RealTimeTalk is an object-oriented language for hard real-time systems. We based our language on Smalltalk for the following reasons:

• The idea of building systems with components in frameworks is successfully used in Smalltalk.

• Smalltalk yields high-quality applications with short development time.

• Smalltalk supports incremental development and prototyping.

• Smalltalk has a very simple syntax and semantics.
4.4.1. Syntax. The syntax is the same as in Smalltalk [Gold89], with one difference:

• Both methods and separate statements can be written in C. This supports the re-use of existing code and is convenient for low-level routines in the system.
4.4.2. Semantics. Some changes compared to Smalltalk semantics have been made in order to handle real-time demands:

• It is not possible to create new classes during execution. The reason for this is of course that the off-line scheduler cannot schedule unknown PEO:s. Therefore, the metaclass level is removed in RTT.

• Recursion is not allowed.

• Unbounded loops are not allowed.

• Loops over unbounded data structures are not allowed.

One reason for the last three points is the demand from the time calculation algorithms (presented in section 5.2.2). Recursion cannot be handled by the present algorithms, and it must be known a priori when loops terminate.

• Invocation of blocks11 is only allowed explicitly, not via parameters or variables.
The reason for the last point is that we must know which block will execute to avoid a very big OF (the maximum over all blocks would have to be used)12.

4.4.3. Control structures in RTT. Control structures are the basic control primitives that the control flow graph is built with. The basic control structures of programs are sequences, selections and iterations [Böhm66]. The control structures of RTT are a subset of the Smalltalk control structures [Gold89]. The basic element in an RTT program is the expression. An expression is one or many invocations of blocks or methods through message sending. There are also primitive methods in RTT that execute without message sending. A method is a sequence of expressions with one entry point and one or more return points. The expressions are separated with dots. A block is equivalent to a method, with the difference that it is always defined within a block or a method. However, the block can be executed in the same method or in another method. The latter case is possible if the block is transferred as a parameter to the other method. This possibility is removed in RTT, because it is not possible to calculate a realistic execution time for this case. A method is invoked by message sending. A block is invoked by sending the special message value to it. Methods and blocks terminate when a return point is reached or the last expression is executed. Return points can be either conditional or unconditional. An unconditional return point is an explicit return expression (marked with an ^ in the source code), or an unconditional invocation of a block with an unconditional return point. Unconditional invocations of a block are all types of block invocation except selection (see section 5.1.2).

A conditional return point is a conditional invocation of a block with an unconditional return point, an unconditional invocation of a block with a conditional return point, or a conditional invocation of a block with a conditional return point. Conditional invocation of blocks occurs only in selection.
11/ Blocks in RTT are described in section 4.4.3. 12/ See also section 5.2.2.3.
There are three variants of termination structure in methods and blocks.

1. If no return points exist, execution is performed to the end of the sequence. Transfer of control is made to the invoking method or block, where execution continues. The last calculated object is put on the stack as an answer.

2. If an unconditional return point is present, execution is immediately terminated after this point. If the return expression is executed in a block invoked in the current method, the block and all levels of blocks above it are immediately terminated. Note that this termination breaks the structured control flow13. The object calculated in the return expression is put on the stack as an answer.

3. If a conditional return point is present, execution may be terminated or not, i.e. the first or the second of the variants above may occur.

One interesting observation is that methods and blocks always form sequences of expressions. However, this sequence is not a basic block as defined in compiler theory14. In Smalltalk and RTT, for each expression, control is transferred to one or several other methods by message sending and then returns to the next expression.
Figure 4.5: Graphical symbol for a method or a block.
The control structure of a method may be seen as a tree composed of sequences. The method is the sequence at the top; all other sequences are blocks.

Figure 4.6: Graphical description of a method. In the figure, a line followed downwards means invocation of a block. An "S" marks selection (see section 5.1.2) and an "R" marks repetition. A repetition with one block involved is a timesRepeat repetition (see section 5.1.3.1). A repetition with two blocks involved is a whileTrue repetition (see section 5.1.3.2).
4.5. Run-time support for real-time.
The RTT run-time system is created to eliminate all features that may cause indeterminism. The system uses:

• Optimised jump tables (the Modified Two Way Colouring method) [Hua92] with constant method look-up time, instead of the linear method search used in Smalltalk.

• Object creation (memory allocation) with fixed execution time.

• Real-time garbage collection with deterministic behaviour [Hassel93].
4.6. The RTT compiler and other tools.
A Smalltalk environment together with an RTT framework is used to generate models of the application, and to test and simulate different aspects of the application. As the implementation approaches the target computer environment, the application is converted to a load module and testing is continued in the target environment. The RTT compiler translates file-out files from the Smalltalk system to C code, which is passed through a timing analysis tool that calculates the maximum execution time for the activation methods of all parallel executable objects in the system. The off-line scheduler tool is used to generate a feasible schedule. C tools finally generate the load module.
Figure 4.7: RTT development tools. Note that the timing analysis currently exists only for the MC 68000 run-time system.
Chapter 5: Calculation of execution times in RTT

5. Calculation of execution times in RTT.
The basic idea contains the following steps:

1. Decompose the RTT code into basic components (C-macros). This is done by the RTT compiler. Each method and block produces a C function containing C-macros.

2. Predict the assembly code generated for each C-macro. The C compiler used for the run-time system generates assembly code from the C-macros. The code is of course dependent on the processor type.

3. Predict the assembly code generated for the C code in the system. Some of the methods are written in inline C (e.g. the methods in the base classes). Parts of the run-time system, like message sending, are parts of the execution of the methods and therefore have to be analyzed. The C compiler used for the run-time system generates processor-dependent assembly code from the C code.

4. Determine the execution times for the C-macros and the C code. The RTT assembly parser calculates the execution times from the assembly code produced in steps 2 and 3. The parser has knowledge of the execution times for the different machine instructions. The RTT assembly parser calculates the execution times for basic blocks. An assembly graph parser together with an assembly graph analyzer produces control flow graphs for the assembly code and calculates the execution times.

5. Compute the basic execution time for the RTT code, using the times for the basic components. The basic execution time is the time for C-macro execution within the method; the message sending and the execution of the invoked method (or block) are excluded. This computation is made by the RTT C-macro parser, which analyzes the code produced in step 1.

6. Search for recursion in the RTT code. If a method sends a message to itself, the execution time cannot be calculated (non-terminating loop). Therefore, the code is searched for recursion. If there is any risk of recursion, the calculation is aborted.

7. Compute the final execution time for the RTT code. All message sendings to methods and blocks are resolved and the final execution time is calculated, using knowledge of the control structures in RTT. The formulas given in section 5.1 are used. The goal is the execution times for the activation methods of the PEO:s. These times are passed on to the off-line scheduler.
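Steps 4 and 5 can be illustrated with a toy computation. The macro names follow the Transcript example in section 5.1, but the cycle counts are invented; the cost of the message sending itself is excluded from the basic time, which is why SENDKEYWORD is counted as zero here.

```python
# Step 4 (sketch): the back-end assigns an execution time to each C-macro.
# Step 5 (sketch): the basic execution time of a method is the sum of the
# times of the C-macros in its generated C function; message sending and
# the invoked method are excluded (modelled by a zero entry below).
macro_time = {"PUSH": 12, "POP": 8, "SENDKEYWORD": 0}  # invented cycle counts

def basic_execution_time(macros):
    """Basic execution time of an RTT method from its C-macro sequence."""
    return sum(macro_time[m] for m in macros)

# C-macro sequence generated for: Transcript show: 'ARRAYTST'.
body = ["PUSH", "PUSH", "SENDKEYWORD", "POP"]
```

Step 7 then adds, for each send, the message-sending cost and the time of the invoked method, using the formulas of section 5.1.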
5.1. Analysis of code in RTT.
The three levels of code in RTT: the RTT file-out code (XXX.CLS, e.g. Transcript show: 'ARRAYTST'.) is translated by the RTT compiler (RTT.EXE) into C-macro code (XXX.C, e.g. PUSH(stack,&_classTranscript,OBJPTR); PUSH(stack,_ITst_tststring0,STRING); SENDKEYWORD(stack,"show:",1); POP(stack);), which the C compiler/linker translates into assembly code (XXX.SRC) for the executable RTT program.
Figure 5.2: Time calculation in RTT.

For the C code and assembly code in the system, an analysis is made at assembly level. The C code is translated to assembly code by the C compiler. Then the execution times for the assembly code are calculated using an assembly code analyser. Thereafter the times for the activation methods of the Parallel Executable Objects are calculated by a C-macro analyser, using the basic times for the C-macros and knowledge about the basic constructs in the C-macro code. These times are used by the off-line scheduler. What we have is a front-end and a back-end tool. The front-end tool analyzes the C-macro code and is only dependent on the RTT language and the RTT compiler. The back-end tool analyzes the assembly code and is dependent on the processor and other hardware. If other hardware is used, only the back-end has to be changed. We could even change the back-end tool to a measurement tool (cf. section 6.5).
5.1.1. Sequences. As mentioned above, methods and blocks always form sequences of expressions. However, this sequence is not a basic block as defined in compiler theory. In Smalltalk and RTT, for each expression, control is transferred to one or several other methods by message sending and then returns to the next expression. Which method is performed is decided at run-time (late binding) and depends on the class the object is an instance of. For example, aSensor getValue may invoke different methods depending on the type of aSensor. This late binding (implicit selection) is marked with a black diamond in the graph. The black diamond symbolizes "black box selection". An arrow down means invocation, a horizontal arrow means execution and an arrow up means return.

Figure 5.3: Message sending and method selection.

The minimum and maximum execution times for an expression are:

MINTC(expression) = MINTC(implicit selection and message sending) + min[MINTC(method that implements the selector)]   (4)

MAXTC(expression) = MAXTC(implicit selection and message sending) + max[MAXTC(method that implements the selector)]   (5)

The equations for minimum and maximum execution time are found by following the possible execution paths and determining the shortest and the longest path.
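Equations 4 and 5 can be read as a small computation over the set of methods that may be bound at run-time; the numbers below are illustrative only:

```python
# Equations (4)-(5): with late binding, the bounds for an expression take
# the min/max over all methods that implement the selector, plus the cost
# of the implicit selection and message sending (assumed constant here).
def expression_bounds(send_min, send_max, method_bounds):
    """method_bounds: (MINTC, MAXTC) pairs, one per candidate method."""
    mintc = send_min + min(lo for lo, hi in method_bounds)
    maxtc = send_max + max(hi for lo, hi in method_bounds)
    return mintc, maxtc

# aSensor getValue may bind to either of two getValue implementations:
bounds = expression_bounds(5, 5, [(10, 20), (12, 30)])  # (15, 35)
```

Note that MINTC and MAXTC may come from different candidate methods, which is why the bounds can be loose when many classes implement the same selector.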
In methods and blocks, execution is performed in sequence from the entry point to a return point or to the end of the sequence.
The minimum and maximum execution times for a sequence of this type are:

MINTC(sequence1) = Σ(i = 1 .. #expressions) MINTC(expressioni)   (6)

MAXTC(sequence1) = Σ(i = 1 .. #expressions) MAXTC(expressioni)   (7)

where sequence1 = method or block without a return point.

Figure 5.4. A method or a block without a return point.
If an unconditional return point is present, all following code is dead15. The figure shows a sequence with a return expression (^XXX.) followed by dead code.

The minimum and maximum execution times for a sequence of this type are:

MINTC(sequence2) = Σ(i = 1 .. #leading expressions) MINTC(expressioni)   (8)

MAXTC(sequence2) = Σ(i = 1 .. #leading expressions) MAXTC(expressioni)   (9)

where sequence2 = method or block with an unconditional return expression, and leading expressions = all expressions before and including the first unconditional return point.

Figure 5.5. A method or a block with an unconditional return point.

15/ Of course, it makes no sense having dead code in a program, but it is perfectly possible!
If a conditional return point is present, the minimum and maximum execution times are calculated by equation 8 and equation 7 respectively.

Figure 5.6. A method or a block with a conditional return point.

If there are both unconditional and conditional return points in the code, the minimum execution time is calculated from equation 8 with leading expressions = all expressions before and including the first conditional return point. The maximum execution time is calculated from equation 9 with leading expressions = all expressions before and including the first unconditional return point.

5.1.1.1. Blocks.
An invocation of a block within a method or a block resembles message sending to a method. An important difference is that a return expression in a block always returns control to the method calling the present method.

block1 value.

The minimum and maximum execution times for an invocation of a block are:

MINTC(block invocation) = MINTC(message sending) + MINTC(block1)   (10)

MAXTC(block invocation) = MAXTC(message sending) + MAXTC(block1)   (11)

Times for the block are calculated with equations 6-9.

Figure 5.7. Invocation of a block without a return expression.

Blocks may be called with or without parameters. This doesn't influence the execution times, because parameters are put on and read from the execution stack, and the times for this (the PUSH and POP macros) are calculated separately16.

16/ The RTT run-time system is based on an abstract stack machine, with a stack for each usecase. See
[Eriksson94] for details.
5.1.2. Selections. Selections in RTT are handled by conditional block invocation. Depending on the result of cond, block1 or block2 is invoked.

(cond) ifTrue:[block1 code] ifFalse:[block2 code]

The minimum and maximum execution times for a selection with an else branch are:

MINTC(selection expression1) = MINTC(cond) + MINTC(choice) + MINTC(message sending) + min[MINTC(block1), MINTC(block2)]   (12)

MAXTC(selection expression1) = MAXTC(cond) + MAXTC(choice) + MAXTC(message sending) + max[MAXTC(block1), MAXTC(block2)]   (13)

Figure 5.8. Selection with an else branch.

(cond) ifTrue:[block1 code].

The minimum and maximum execution times for a selection without an else branch are:

MINTC(selection expression2) = MINTC(cond) + MINTC(choice)   (14)

MAXTC(selection expression2) = MAXTC(cond) + MAXTC(choice) + MAXTC(message sending) + MAXTC(block1)   (15)

Figure 5.9. Selection without an else branch.
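Equations 12-15 can be sketched as follows; all component times are illustrative (MINTC, MAXTC) pairs, not measured RTT values:

```python
# Equations (12)-(13): with an else branch exactly one block executes,
# so the bounds range over both blocks.
def selection_with_else(cond, choice, send, block1, block2):
    mintc = cond[0] + choice[0] + send[0] + min(block1[0], block2[0])
    maxtc = cond[1] + choice[1] + send[1] + max(block1[1], block2[1])
    return mintc, maxtc

# Equations (14)-(15): without an else branch the fastest path skips the
# block entirely, so only the condition and the choice contribute to MINTC.
def selection_without_else(cond, choice, send, block1):
    mintc = cond[0] + choice[0]
    maxtc = cond[1] + choice[1] + send[1] + block1[1]
    return mintc, maxtc
```

The asymmetry between equations 14 and 15 is what makes one-armed selections a source of jitter: the minimum path does no block invocation at all.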
5.1.3. Repetitions. Repetitions in RTT are handled by repeated block invocation. There are two types of repetitions in RTT, timesRepeat and whileTrue. The do-construct and similar forms (iterations over data structures) are removed in RTT 17. 5.1.3.1.
The timesRepeat repetition.
The first type, the timesRepeat repetition, is used when the number of iterations is known in advance (as the iteration starts). If it is known at compile time, the number can be given as an integer constant n. If the number of iterations is calculated during run-time, an upper limit maxIterations must be given. A lower limit, minIterations, is optional and is used for the MINTC calculation.
n timesRepeat: [block1 code].

Block1 is invoked exactly n (a constant) times. The minimum and maximum execution times for a timesRepeat expression without a return point in the block are:

MINTC(timesRepeat expression1) = MINTC(counter test) + n * [MINTC(message sending) + MINTC(block1)] (16)

MAXTC(timesRepeat expression1) = MAXTC(counter test) + n * [MAXTC(message sending) + MAXTC(block1)] (17)

Figure 5.10. The timesRepeat repetition without a return point in the block.

One must assume that block1 in the timesRepeat repetition does not contain an unconditional return point. If it did, we could only accept the uninteresting case where n = 1. Therefore, unconditional return points are not allowed within timesRepeat statements of this type18.
17/ See section 6.3. 18/ This is checked in the algorithm in section 5.2.2.
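Equations 16 and 17 can be sketched in the same interval style; again, this is an illustration with invented cycle counts, not code from the RTT tools:

```c
#include <assert.h>

typedef struct { long min; long max; } Interval;  /* (MINTC, MAXTC) */

/* Equations 16-17: n timesRepeat: [block1 code], with n a constant.
 * test is the counter test and send the message sending. */
Interval times_repeat_const(long n, Interval test, Interval send, Interval b1)
{
    Interval t;
    t.min = test.min + n * (send.min + b1.min);   /* eq 16 */
    t.max = test.max + n * (send.max + b1.max);   /* eq 17 */
    return t;
}
```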
If block1 in the timesRepeat repetition invokes a block which contains one or more conditional return points, there is a risk that the repetition is terminated before n iterations have finished. Therefore, conditional return points are not allowed within timesRepeat statements of this type19. A timesRepeat repetition with one or more conditional return points must instead be given in one of the forms below, where expr is an integer expression calculated at run-time. The first form is equivalent to the second form with min = 0. In these cases, the equations for the execution times change in the following way:

Form 1. expr timesRepeat: […] maxIterations: max
The minimum and maximum execution times for a timesRepeat expression of this type without a return point in the block are:

MINTC(timesRepeat expression3) = MINTC(counter test) (18)

MAXTC(timesRepeat expression3) = MAXTC(counter test) + max * [MAXTC(message sending) + MAXTC(block1)] (19)

If block1 in the timesRepeat repetition of this type contains one or more conditional return points, equations 18 and 19 will still hold.

Form 2. expr timesRepeat: […] minIterations: min maxIterations: max

The minimum and maximum execution times for a timesRepeat expression of this type without a return point in the block are:

MINTC(timesRepeat expression4) = MINTC(counter test) + min * [MINTC(message sending) + MINTC(block1)] (20)

Equation 19 will still hold for MAXTC.
19/ This is checked in the algorithm in section 5.2.2.
If block1 in the timesRepeat repetition of this type contains one or more conditional return points, equation 20 will change to:

MINTC(timesRepeat expression5) = MINTC(counter test) + (min - 1) * [MINTC(message sending) + MINTC(block1)] + MINTC(message sending) + MINTC'(block1) (21)

where MINTC'(block1) is the sum of the expressions up to the first return point. Cf equation 8. Equation 19 will still hold for MAXTC.

5.1.3.2. The whileTrue repetition.
The second type, the whileTrue repetition, is used when the termination of the loop depends on some logical expression calculated during run-time. An upper limit maxIterations must always be given. A lower limit, minIterations, is optional and is used for the MINTC calculation. In the whileTrue repetition, block2 is invoked as long as the result of block1 is true and the number of iterations does not exceed max (max is a non-negative number). The first test (test1) checks that max is not exceeded, and test2 checks that the result of block1 is true.
[block1 code] whileTrue: [block2 code] maxIterations: max.

The minimum and maximum execution times for a whileTrue expression without a return point in the blocks are:

MINTC(whileTrue expression1) = MINTC(counter init) + MINTC(counter test1) (23)

MAXTC(whileTrue expression1) = MAXTC(counter init) + MAXTC(counter test1) + max * [MAXTC(message sending) + MAXTC(block1) + MAXTC(test2) + MAXTC(message sending) + MAXTC(block2) + MAXTC(counter increment) + MAXTC(counter test1)] (24)

Figure 5.11. A whileTrue repetition without a return point in the blocks.
One must assume that block1 in the whileTrue repetition does not contain an unconditional return point. If it did, we could only accept the uninteresting case where the loop body is executed at most once. Therefore, unconditional return points are not allowed within whileTrue statements of this type20. If block1 or block2 in the whileTrue repetition contains a conditional return point, equations 23 and 24 will still hold.

If the following form is used:

[block1 code] whileTrue: [block2 code] minIterations: min maxIterations: max.

the equation for the minimum execution time changes to:

MINTC(whileTrue expression2) = MINTC(counter init) + MINTC(counter test1) + min * [MINTC(message sending) + MINTC(block1) + MINTC(test2) + MINTC(message sending) + MINTC(block2) + MINTC(counter increment) + MINTC(counter test1)] (25)

Equation 24 will still hold.

If block1 in the whileTrue repetition contains one or more conditional return points, equation 24 will still hold, but equation 25 will change to:

MINTC(whileTrue expression3) = MINTC(counter init) + MINTC(counter test1) + (min - 1) * [MINTC(message sending) + MINTC(block1) + MINTC(test2) + MINTC(message sending) + MINTC(block2) + MINTC(counter increment) + MINTC(counter test1)] + MINTC(message sending) + MINTC'(block1) (26)

where MINTC'(block1) is the sum of the expressions up to the first return point. Cf equation 8.

If block2 in the whileTrue repetition invokes a block which contains one or more conditional return points, equation 24 will still hold, but equation 25 will change to:

MINTC(whileTrue expression4) = MINTC(counter init) + MINTC(counter test1) + (min - 1) * [MINTC(message sending) + MINTC(block1) + MINTC(test2) + MINTC(message sending) + MINTC(block2) + MINTC(counter increment) + MINTC(counter test1)] + MINTC(message sending) + MINTC(block1) + MINTC(test2) + MINTC(message sending) + MINTC'(block2) (27)

where MINTC'(block2) is the sum of the expressions up to the first return point. Cf equation 8.

20/ This is checked in the algorithm in section 5.2.2.
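Equations 23-25 can be sketched in the same interval style as before (an illustration with assumed cost parameters, not RTT tool code); minIt = 0 gives equation 23:

```c
#include <assert.h>

typedef struct { long min; long max; } Interval;  /* (MINTC, MAXTC) */

/* Equations 23-25: [block1] whileTrue: [block2] with loop limits.
 * init/test1/incr handle the iteration counter, test2 tests the result
 * of block1, send is one message sending. */
Interval while_true(long minIt, long maxIt,
                    Interval init, Interval test1, Interval test2,
                    Interval send, Interval incr,
                    Interval b1, Interval b2)
{
    /* cost of one full iteration, as in the bracket of eq 24/25 */
    long iterMin = send.min + b1.min + test2.min + send.min + b2.min
                 + incr.min + test1.min;
    long iterMax = send.max + b1.max + test2.max + send.max + b2.max
                 + incr.max + test1.max;
    Interval t;
    t.min = init.min + test1.min + minIt * iterMin;  /* eq 25 (eq 23 if minIt = 0) */
    t.max = init.max + test1.max + maxIt * iterMax;  /* eq 24 */
    return t;
}
```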
5.1.3.3. Run-time supervision of loop limits.
The maxIterations limit is checked during run-time in the present version of RTT. If it is violated, a mode transition to an exception mode is performed21. However, this is not a pleasant situation: one of the limits given by the programmer has been broken. What should the system do? One ideal solution would be to guarantee the proper termination of the loop within the limit by a proof22. Then, the run-time check of maxIterations could be removed. The minIterations limit is only checked for the timesRepeat iteration during run-time. It may be important in certain applications to guarantee that this lower limit is not broken; one such occasion would be to minimize the jitter in a method. However, loops with a varying number of iterations should be avoided in such methods23.
21/ For a description of exception modes in RTT, see [Eriks93]. 22/ This matter is discussed in section 6.4. 23/ See section 4.2 for a discussion on jitter in methods.
5.2. The C-macro analysis (the front-end tool).
5.2.1. The basic idea.

The C functions generated by the RTT compiler normally consist of C-macros. This is the case for the user-defined application code written in the Smalltalk environment. This code is filed out and compiled by the RTT compiler. The C-macros are analysed by tools that calculate MINTC and MAXTC for the methods. The user code written in C is analysed on the assembly level by the back-end tool. See section 5.3.

5.2.2. Algorithms and data structures.

The RTT file-out code is translated into C-functions by the RTT compiler. Each method and block in the RTT code corresponds to a C-function. The C functions consist of macro calls (PUSH, RVALUE, SENDUNARY, ASSIGN etc.) which in the present version of RTT are expanded to C-code by the C-compiler. No executable C-code is visible in the translated code. In a future version of RTT the macro calls may be expanded directly to assembly code. This is one advantage with this approach: it is basically independent of the language chosen in the run-time system.

There are two basic types of C-macros in RTT.

The macros with constant execution time (type 1). Some examples are:

PUSH - Push data on the stack.
PUSH_BLOCK - Push a block on the stack.
POP - Pop data from the stack.
LINK - Link a method (init).
UNLINK - Unlink a method (terminate).
ASSIGN - Assign a result.

The macros which perform a message sending action (type 2):

SENDUNARY - Send a unary message to an object.
SENDBINARY - Send a binary message to an object.
SENDKEYWORD - Send a keyword message to an object.
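The precalculated times for the type-1 macros could be kept in a simple lookup table. The sketch below is hypothetical: the cycle counts are invented for illustration, and the real times come from the back-end analysis described in section 5.3.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct { const char *name; long min; long max; } MacroTime;

/* Invented cycle counts, for illustration only. */
static const MacroTime macroTimes[] = {
    { "PUSH",       12, 12 },
    { "PUSH_BLOCK", 14, 14 },
    { "POP",        10, 10 },
    { "LINK",       30, 30 },
    { "UNLINK",     28, 28 },
    { "ASSIGN",     16, 16 },
};

/* Look up a type-1 macro by name; returns NULL if it is unknown. */
const MacroTime *macro_time(const char *name)
{
    for (size_t i = 0; i < sizeof macroTimes / sizeof macroTimes[0]; i++)
        if (strcmp(macroTimes[i].name, name) == 0)
            return &macroTimes[i];
    return NULL;
}
```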
The front-end tool calculates MINTC and MAXTC in the following way.

Calculation of execution times (Algorithm 1, pass 1):

The source code for each method in the system is analysed one at a time. This is also the case for blocks. For each method or block, a new data structure (see fig. 5.14) and a new code sequence are created. The C-macros are then processed one at a time; when no macros remain, the code sequence is finished and marked as the end of a method or block. Each macro is handled according to its type:

1. For macros with known MINTC and MAXTC (type 1), the precalculated macro times are just added to the execution times of the code sequence.

2. For macros which perform the sending of a message (type 2), the precalculated macro times are added to the execution times, and the selector is stored for later calculation (pass 2).

3. For an invocation of one or more blocks, the times are added and the code sequence is finished. There are three cases:
   B1. Invocation of a block once. A reference to the block is stored for later calculation.
   B2. Invocation of a loop with blocks. References to the blocks, together with the minimum and maximum number of loops, are stored for later calculation (pass 2).
   B3. Conditional invocation of blocks (selection). References to the blocks are stored for later calculation (pass 2).

4. For an unconditional return expression, the code sequence is finished and its end is marked as an unconditional return expression. Go up one level (possible only in blocks); if there is no level above, continue with the next method or block. If the enclosing invocation is not a selection, repeat this step on that level; if it is a selection, mark the ends of the code sequences on all levels above as conditional return points and continue with the next macro.

5. For a conditional return expression, the code sequence is finished and its end is marked as a conditional return expression. The ends of the code sequences on all levels above are marked as conditional return points, and the analysis continues with the next macro.

Figure 5.12. Calculate execution times (Algorithm 1, pass 1).
Calculate execution times (Algorithm 1, pass 2)24:

The total times for each method previously stored are calculated. Each method in the system is analysed, one at a time. For each method or block, the times in the code sequences are added to MINTC and MAXTC of the method (or block). The times for the message sendings in the code sequences are also added to MINTC and MAXTC in the following way: min(MINTC) and max(MAXTC) for the invoked methods are calculated and added to the times of the method, together with the times for the message sending itself.

The times for the invoked blocks (Blocks, Loop blocks and Selection blocks) are calculated in the same way. When the times of the invoked blocks are complete, the times of the invoking expression are calculated according to the equations in section 5.1. The time depends on the type of expression, the loop limits, and on whether there is an unconditional, a conditional or no return point. Checks are also made that the construct is allowed, according to the rules in section 5.1.

Figure 5.13. Calculate execution times (Algorithm 1, pass 2).

24/ This description is somewhat simplified. The recursive nature of the algorithm is not clearly shown; recursion occurs when the analysis of an invoked block re-enters the pass from the start.
The following data structure is generated for each method in the symbol table. The same data structure is used for blocks, but for these "Messages sent by the method" are omitted. The method record holds the method selector, the selectors of the messages sent by the method (for use in recursion detection), and a list of code sequence records. Each code sequence record holds the messages sent in the sequence, the time for the sequence, and the block invocation that ends it: an invoked block (with a conditional/unconditional return mark), invoked loop blocks with their min and max limits, or invoked selection blocks; only one of these pointers is used.

Figure 5.14. Data structure for the methods and blocks in the RTT symbol table.

The pointers between methods (via "Messages sent by the method") will form a DAG (directed acyclic graph), with the base classes' methods at the leaves, if there is no recursion. The graph will, however, not form a DAG if recursion is used in the RTT code. This is used to detect recursion. Each code sequence between block invocations has its own record in the list. For each block invocation, information is stored about the type of invocation and about whether the block contains conditional or unconditional return points. The list ends when the last expression or an unconditional return expression is encountered. Each block record contains a pointer to a block description.
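As a sketch, the record in Figure 5.14 could be declared in C roughly as below. All field names are assumptions for illustration; they are not taken from the actual tool sources.

```c
typedef struct CodeSeq CodeSeq;
typedef struct MethodRec MethodRec;

typedef enum { INV_NONE, INV_BLOCK, INV_LOOP, INV_SELECTION } InvocationKind;
typedef enum { END_NORMAL, END_COND_RETURN, END_UNCOND_RETURN } SeqEnd;

struct CodeSeq {
    long minTC, maxTC;            /* accumulated time for the sequence      */
    const char **selectors;       /* messages sent in the sequence          */
    int nSelectors;
    InvocationKind kind;          /* how the sequence ends                  */
    MethodRec *block1, *block2;   /* invoked block(s), if any               */
    long minIter, maxIter;        /* loop limits (loop invocations only)    */
    SeqEnd end;                   /* cond./uncond. return or normal end     */
    CodeSeq *next;                /* next code sequence in the list         */
};

struct MethodRec {
    const char *selector;         /* the method (or block) selector         */
    const char **sentSelectors;   /* messages sent; for recursion detection */
    int nSent;                    /* (omitted for blocks)                   */
    CodeSeq *sequences;           /* the list of code sequences             */
    long minTC, maxTC;            /* the resulting MINTC and MAXTC          */
};
```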
5.2.2.1. What is recursion in object-oriented programs?
Recursion in classical imperative languages, like C, occurs when a function invokes itself directly or indirectly. A common example of direct recursion is the calculation of the factorial function:

int fac(int n)
{
    if (n == 0)
        return(1);
    else
        return(n * fac(n - 1));
}
In object-oriented programs, recursion is different, since a function (a method) is identified not only by the name of the function but also by the receiving object. Thus we have to distinguish between three different forms of recursion: true recursion, class recursion and polymorphic recursion. True recursion (instance recursion) occurs when the receiver of the message is the same instance as the sender, and the same method is invoked again.
Figure 5.15. True recursion, direct (anObject sends m to itself) and indirect (anObject sends m to itself via aMethod in anotherObject).

Code example of direct true recursion:

trueRecursion
    "Tests true recursion, i e same object, same method."
    Transcript show: counter printString. "Ready?"
    (counter > 0) ifTrue: [ counter := counter - 1. self trueRecursion ]

Class recursion occurs when the receiver of the message is another instance of the same class, and the same method is invoked again. A code example is the factorial method of the class Integer:

factorial
    self > 1 ifTrue: [ ^ (self - 1) factorial * self ].
    self < 0 ifTrue: [ ^ self error: 'negative factorial' ].
    ^1

The important message here is the expression

(self - 1) factorial

which sends a message to another instance of the class Integer25.

25/ We know that anInteger - 1 answers an Integer.
Polymorphic recursion occurs when the sender invokes the same method again in objects that are instances of unknown classes.

Figure 5.17. Polymorphic recursion, direct (anObject sends m to anotherObject, whose class we do not know) and indirect (anObject sends n to anotherObject, which sends m to athirdObject, whose class we do not know).

Code example of direct polymorphic recursion:

printString
    anOrderedCollection do: [:anObject |
        Transcript cr; show: anObject printString].

The important message here is the expression

anObject printString

which sends the message printString to the objects in the collection, which are instances of unknown classes. However, this may not be recursion at all, since the selector may exist in other classes. This is a problem which cannot be solved without extra information in the code (like typing) or some type inference mechanism26.

26/ See section 6.3 for a further discussion on this problem.
5.2.2.2. Detection of recursion.
Recursion in a method is suspected when it contains message sending to a selector identical to its own. Suspected recursion can be found by making a transitive closure of the message sendings in the method. The following recursive algorithm is used:

Find suspected recursion (Algorithm 2)27:

The source code for each method in the system is analysed, one at a time:

1. Find all selectors to which message sending is performed in the method and its blocks.
2. If any of these selectors is equal to the selector of the investigated method, report an error (risk for direct recursion).
3. Find all methods which have these selectors.
4. If all these methods are methods in base classes, return "complete" (basic case of the recursion).
5. Else, perform this algorithm for all the methods which are not base class methods. Then return "complete".

Figure 5.18: Find suspected recursion (Algorithm 2).

This algorithm is executed before the calculation of execution times, because the calculation will never terminate if a risk for recursion exists.

27/ This description is somewhat simplified. The recursive nature of the algorithm is not clearly shown; recursion occurs in the last step.
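Algorithm 2 is essentially a search over the message-sending graph, stopping at base-class methods. A simplified, self-contained sketch follows; the method table, its names and the depth guard are assumptions for illustration, not the tool's actual representation:

```c
#include <assert.h>
#include <string.h>

#define MAX_CALLS 8

typedef struct {
    const char *selector;           /* name of the method                */
    const char *calls[MAX_CALLS];   /* selectors it sends messages to    */
    int nCalls;
    int isBaseClass;                /* base-class methods end the search */
} Method;

/* Can `cur` (directly or indirectly) send a message with selector
 * `target`? depth bounds the search so it always terminates. */
static int reaches(const Method *tab, int n, const char *target,
                   const char *cur, int depth)
{
    if (depth > n)
        return 0;
    for (int i = 0; i < n; i++) {
        if (strcmp(tab[i].selector, cur) != 0)
            continue;
        if (tab[i].isBaseClass)
            continue;               /* basic case of the recursion */
        for (int c = 0; c < tab[i].nCalls; c++) {
            if (strcmp(tab[i].calls[c], target) == 0)
                return 1;
            if (reaches(tab, n, target, tab[i].calls[c], depth + 1))
                return 1;
        }
    }
    return 0;
}

/* Suspected recursion: the method can reach its own selector. */
int suspected_recursion(const Method *tab, int n, const char *sel)
{
    return reaches(tab, n, sel, sel, 0);
}
```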
5.2.2.3. Sequences.
The calculation of execution times for sequences of C-macros in RTT is simple: the times are just added. Note, however, that there is a severe problem with the RTT code sequence

a := […].
…
a value.

which generates the macros28

/* a := [...]. */
LVALUE(stack,GETFRAMEADDRESS(stack, 0));
PUSH_BLOCK(stack, _ITst_mthdblock0);
assign(stack);
POP1(stack);
/* a value. */
RVALUE(stack,GETFRAMEVALUE(stack, 0));
SENDUNARY1(stack,_value,0);
POP1(stack);29

We cannot know which block a refers to. This means that MINTC and MAXTC for the expression must be chosen as min(MINTC) and max(MAXTC) over all blocks, which is not acceptable, because it would lead to a big OF (overreservation factor). It follows that assignment of blocks, and blocks as parameters, must be forbidden in RTT. The same reasoning as above is also valid for value with parameters:

a value: 7.
a value: 7 value: 'a'.

What can be accepted is the direct

[…] value.

which generates

/* [a := 2] value. */
PUSH_BLOCK(stack, _ITst_mthdblock1);
SENDUNARY1(stack,_value,0);
POP1(stack);

Here we know that the block _ITst_mthdblock1 is pushed on the stack.
28/ The C-macro code in this section just illustrates the text. For more details on the execution of code in the RTT stack machine, please see [Eriks93]. 29/ SENDUNARY1 and POP1 are variants of SENDUNARY and POP.
5.2.2.4. Selections.
Four selection constructs are possible in RTT. The RTT compiler generates the following macro code:

1. (cond) ifTrue: [block1 code]

generates

PUSH_BLOCK(stack, _ITst_mthdblock2);
SENDKEYWORD1(stack,_ifTrue_,1,0);30
POP1(stack);

2. (cond) ifFalse: [block1 code]

generates

PUSH_BLOCK(stack, _ITst_mthdblock3);
SENDKEYWORD1(stack,_ifFalse_,1,0);
POP1(stack);

3. (cond) ifTrue: [block1 code] ifFalse: [block2 code].

generates

PUSH_BLOCK(stack, _ITst_mthdblock4);
PUSH_BLOCK(stack, _ITst_mthdblock5);
SENDKEYWORD1(stack,_ifTrue_ifFalse_,2,0);
POP1(stack);

4. (cond) ifFalse: [block1 code] ifTrue: [block2 code].

generates

PUSH_BLOCK(stack, _ITst_mthdblock6);
PUSH_BLOCK(stack, _ITst_mthdblock7);
SENDKEYWORD1(stack,_ifFalse_ifTrue_,2,0);
POP1(stack);

These constructs are recognized by the RTT C-macro parser. The execution times for the four cases are calculated from the equations in the previous section.
30/ SENDKEYWORD1 is a variant of SENDKEYWORD.

5.2.2.5. Repetitions.
The following repetition constructs are possible in RTT:

n timesRepeat: […]

where n is an integer constant.

expr timesRepeat: […] maxIterations: max
expr timesRepeat: […] minIterations: min maxIterations: max

where expr is an integer expression.

[…] whileTrue: […] maxIterations: max
[…] whileTrue: […] minIterations: min maxIterations: max

They generate the following macro code:

1. 7 timesRepeat: [block1 code]

generates

PUSH(stack,7,INTEGER);
PUSH_BLOCK(stack, _ITst_mthdblock8);
SENDKEYWORD1(stack,_timesRepeat_,1,0);
POP1(stack);

2. b timesRepeat: [block1 code] maxIterations: 7

generates

RVALUE(stack,GETFRAMEVALUE(stack, 0));
PUSH_BLOCK(stack, _ITst_mthdblock9);
PUSH(stack,7,INTEGER);
SENDKEYWORD1(stack,_timesRepeat_maxIterations_,2,0);
POP1(stack);

3. b timesRepeat: [block1 code] minIterations: 3 maxIterations: 7

generates

RVALUE(stack,GETFRAMEVALUE(stack, 0));
PUSH_BLOCK(stack, _ITst_mthdblock10);
PUSH(stack,3,INTEGER);
PUSH(stack,7,INTEGER);
SENDKEYWORD1(stack,_timesRepeat_minIterations_maxIterations_,3,0);
POP1(stack);
4. [block1 code] whileTrue: [block2 code] maxIterations: 7

generates

PUSH_BLOCK(stack, _ITst_mthdblock11);
PUSH_BLOCK(stack, _ITst_mthdblock12);
PUSH(stack,7,INTEGER);
SENDKEYWORD1(stack,_whileTrue_maxIterations_,2,0);
POP1(stack);

5. [block1 code] whileTrue: [block2 code] minIterations: 3 maxIterations: 7

generates

PUSH_BLOCK(stack, _ITst_mthdblock13);
PUSH_BLOCK(stack, _ITst_mthdblock14);
PUSH(stack,3,INTEGER);
PUSH(stack,7,INTEGER);
SENDKEYWORD1(stack,_whileTrue_minIterations_maxIterations_,3,0);
POP1(stack);

These constructs are recognized by the RTT C-macro parser. The execution times are calculated from the equations in the previous section.
5.2.3. Implementation.

5.2.3.1. The C-macro parser.
This parser (RTT_C.EXE) analyses the executable C-code generated by the RTT compiler from the filed-out RTT file (XXX.CLS). The RTT compiler output consists of two C-macro files: XXX.C, which is the C-macro code for the class and instance methods, and XXX.B, which is the C-macro code for all blocks. These two files are concatenated to form the input file to the C-macro parser. The tokens in the C-macro file are described in RTT_C.LEX. The grammar for the C-macro file is described in RTT_C.Y. The file C_SYMB.C is the symbol handler for the parser.

The parser RTT_C.EXE does the following:

1. It reads the symbol table generated from the execution of the RTT compiler.

2. It executes algorithm 1, pass 1: it parses the C-macros for each block and method and stores the accumulated MINTC and MAXTC for these in the symbol table. Times for the macros are fetched from an include file (XXX.H), created by the back-end tool. It saves the message sendings to other methods in the selectors list in the symbol table. It also saves the message sendings to blocks in loop blocks or selection blocks in lists in the symbol table.

5. It executes algorithm 1, pass 2, to calculate MINTC and MAXTC for each block and method in the symbol table. This is done by the functions in the file TIMECALC.C. Times for the base classes are fetched from an include file (XXX.H), created by the back-end tool. Note that return expressions are not accounted for in the current version.

6. It outputs the times for the activation methods of the PEOs to the scheduler.
5.2.3.3. Programming tools.

All tools are developed on and for PC compatibles. The basic language is C, and the programming environment is Borland C++ 2.0 [Borland]. The parser generator used is BISON, which is a PC freeware variant of the well-known yacc parser generator in the UNIX system [BISON]. FLEX, a PC freeware variant of lex in the UNIX system, is used as the lexical analyser generator [FLEX]. The C cross compiler for the MC68000 target computer is Microtec MCC68K [Microtek].
Figure 5.19: Source code for the RTT C-macro tool. The make-files build the RTT_C time parser from RTT_C.Y (via Bison), RTT_C.LEX (via FLEX, producing LEXYY.C), SYMBOL.C and SYMBOL.H, TIMECALC.C and TIMECALC.H, and M68K_TAB.C, using the Borland C-compiler/linker. At run-time, the RTT compiler translates the RTT code (XXX.CLS) into method C-macro code (XXX.C), block C-macro code (XXX.B) and the RTT symbol table (XXX.DMP); from these, together with the times from the back-end (XXX.H), the RTT_C time parser produces the times for the RTT methods, for use in the RTT C-code analysis.
5.3. The assembly code analysis (the back-end tool).
5.3.1. The basic idea.

Some of the code in the RTT system is written in C. This is the case for:

- inline C-code in RTT methods (e g special methods for process I/O)
- the basic C-macros (e g PUSH, POP, LINK, UNLINK)
- primitive classes (e g Integer, Float and String)
- base classes (e g Array, Bag, Time and Transcript)
- run-time operations (e g message sending)

C-functions containing C-code are translated to assembly code by a C compiler. The code is of course dependent on the processor type. The calculation of execution times for these functions is therefore made on the assembly level. There are also some parts of the RTT run-time kernel which are written in assembler and for which we need to know the execution time, e g the clock interrupt routine.

The minimum and maximum times for each type of macro and each method in the base classes are the base for all the later calculation of times, which rests upon an important assumption: all assembly code generated from the C-code is always generated equal, regardless of the context. This assumption permits us to calculate the times for e g a C-macro once and to use these times anywhere. This means that optimization of the code must be turned off. Especially, size optimization must be avoided, since it often leads to extra jumps to common code (the code for the RTT C-macros is duplicated many times!). Speed optimization should also be turned off, since some types of optimization may increase the execution time in certain cases31.

The ideal situation would be to have complete control over the code generation, either by having access to the source code of the C compiler (e g by using the GNU C compiler) or by generating the assembly code directly from the RTT code. Today, this is not the case; the Microtec C compiler is used.

The analysis is made in the following way: the basic blocks of the assembly code are identified and analysed by an assembly code parser which sums up the execution times given in the data sheet for the current processor. The clock frequency as well as the number of wait-states for the memory must be known. A basic block is a sequence of statements in which the flow of control enters at the beginning and leaves at the end, without halt or branching in-between. It has the following definition [Aho89]: for each leader (defined below), its basic block consists of the leader and all statements up to but not including the next leader or the end of the program.

31/ See e g [Aho89], page 642.
A leader is:

- the first statement of a program
- any statement which is the target of a conditional or an unconditional branching statement
- any statement that immediately follows a conditional or an unconditional branching statement
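The leader rules above can be sketched as a small scan over a toy instruction list. This is an illustration only, not the actual M68K parser; the instruction representation is an assumption.

```c
#include <assert.h>

typedef enum { OP_OTHER, OP_BRANCH } OpKind;  /* conditional or unconditional */

typedef struct {
    OpKind kind;
    int target;        /* branch target index, or -1 for none */
} Insn;

/* Mark leaders: the first statement, every branch target, and every
 * statement immediately following a branch. */
void mark_leaders(const Insn *code, int n, int *leader)
{
    for (int i = 0; i < n; i++)
        leader[i] = 0;
    if (n > 0)
        leader[0] = 1;                       /* first statement */
    for (int i = 0; i < n; i++) {
        if (code[i].kind != OP_BRANCH)
            continue;
        if (code[i].target >= 0 && code[i].target < n)
            leader[code[i].target] = 1;      /* branch target */
        if (i + 1 < n)
            leader[i + 1] = 1;               /* follows a branch */
    }
}
```

Each basic block then runs from a leader up to, but not including, the next leader.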
All branching statements created by selections and repetitions must be analysed by a tool. This tool takes the output of the assembly code parser as input and generates an output that describes the execution graph of the assembly program. The maximum and minimum numbers of iterations in loops have to be input by hand into the graph. The tool then calculates MAXTC and MINTC for the graph.

There is a problem with called subroutines (JSR) and code reached via interrupts (TRAP). They can be of two types:

1. Their times are known from another analysis (i e the source code is available). Their times are then input by hand into the graph.

2. They are not known, because the source code is not available. A typical example is the float operations in the C package from Microtec.

There are some ways to handle the problem in point 2:

- One can measure the subroutines, which is theoretically doubtful.
- One can rewrite the subroutines and calculate MINTC and MAXTC.
5.3.2. Algorithms and data structures.

The following equations are used for sequences:

MINTC(sequence) = Σ (i = 1 to #instructions) MINTC(instruction_i) (28)

MAXTC(sequence) = Σ (i = 1 to #instructions) MAXTC(instruction_i) (29)

The symbol for a sequence (and also for a subroutine and an interrupt routine) in the following is the rectangle. The diamond represents a branch, and the circle a branch address. The symbols are fetched from [Ammar]32.

32/ However, their paper discusses average execution times, while we calculate minimum and maximum times.
Selections (conditional branch statements):

Type 1: If-type (conditional branch ahead).

MINTC(selection1) = min{[time(no branch) + MINTC(code)], time(branch)} (30)

MAXTC(selection1) = max{[time(no branch) + MAXTC(code)], time(branch)} (31)

Type 2: If-else-type (conditional and unconditional branch ahead).

MINTC(selection2) = min{[time(no branch) + MINTC(code1) + time(branch)], [time(branch) + MINTC(code2)]} (32)

MAXTC(selection2) = max{[time(no branch) + MAXTC(code1) + time(branch)], [time(branch) + MAXTC(code2)]} (33)

Unconditional branch:

MINTC(selection3) = MAXTC(selection3) = time(branch) (34)
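Equations 30-33 pick the cheapest and the most expensive path through the branch. A sketch in C, with assumed cost parameters (this is not RTT tool code):

```c
#include <assert.h>

typedef struct { long min; long max; } Interval;  /* (MINTC, MAXTC) */

static long lmin(long a, long b) { return a < b ? a : b; }
static long lmax(long a, long b) { return a > b ? a : b; }

/* Equations 32-33: if-else-type. tNo/tBr are the costs of the
 * conditional branch when not taken / taken; tJmp is the unconditional
 * branch that skips code2 after code1. */
Interval if_else_time(long tNo, long tBr, long tJmp,
                      Interval code1, Interval code2)
{
    Interval t;
    t.min = lmin(tNo + code1.min + tJmp, tBr + code2.min);  /* eq 32 */
    t.max = lmax(tNo + code1.max + tJmp, tBr + code2.max);  /* eq 33 */
    return t;
}
```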
Repetitions (n is the maximum number of backward branches):

Type 1: Do-while-type (conditional branch backwards).

MINTC(repetition1) = MINTC(code) + time(no branch) (35)

MAXTC(repetition1) = (n + 1) * MAXTC(code) + n * time(branch) + time(no branch) (36)

Type 2: While-type (unconditional branch backwards).

MINTC(repetition2) = MINTC(code1) + time(branch) (37)

MAXTC(repetition2) = (n + 1) * MAXTC(code1) + n * [time(no branch) + MAXTC(code2) + time(branch)] + time(branch) (38)
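Equations 35-36 can be sketched the same way (an illustration with assumed parameters, not RTT tool code):

```c
#include <assert.h>

typedef struct { long min; long max; } Interval;  /* (MINTC, MAXTC) */

/* Equations 35-36: do-while-type loop. tBr/tNo are the costs of the
 * backward branch when taken / not taken; the body is executed once in
 * the best case and n + 1 times in the worst case. */
Interval do_while_time(long n, long tBr, long tNo, Interval code)
{
    Interval t;
    t.min = code.min + tNo;                        /* eq 35 */
    t.max = (n + 1) * code.max + n * tBr + tNo;    /* eq 36 */
    return t;
}
```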
5.3.3. Implementation.

5.3.3.1. The assembly code parser.
This parser (M68K.EXE) analyses the assembly source code files (XXX.SRC) which are generated by the Microtec C compiler. At present, all tools are developed for the Motorola MC68000 processor. The execution times for sequences (basic blocks) of instructions are calculated by the parser. The semantic rules in the grammar M68K.Y accumulate the execution times for the basic blocks of instructions for the chosen processor. The times are fetched from [MC68000]. The tokens in the Motorola MC68000 assembler language are described in M68K.LEX. The assembly language is described in detail in e g [Ford89]. There are two versions of the token description - one for assembly code produced by the Microtec C compiler (instructions with small letters) and one for hand-written code (capital letters). There are some central parts of the run-time system which are hand-coded in assembler (e g the clock interrupt routine) and which need to be analysed. The actual hardware parameters (clock frequency and number of wait states) are defined in HARDWARE.H. The file SYMBOL.C contains a symbol handler for the parser. The parser produces an output file (XXX.FW) which contains the accumulated times for the basic blocks. See Appendix A for an example of the output format. For assembly code which only contains a sequence, the analysis is complete. This is the case with many macros in RTT.

5.3.3.2. The assembly code graph parser.
This parser (YYY.EXE) reads the XXX.FW file, recognises the structures described in section 5.3.2 and produces a Smalltalk file. This file is the input to the Smalltalk application described in the next section. There are some limitations in this parser at present. Since assembly code produced by a C compiler is the main input to the parser, only "pure" structures (structured programming) are allowed. For example, branches into loops are not allowed. The parser is the result of a student project within the RTT project, and it is described in more detail in [Korho93].
Figure 5.20: Source code for the RTT assembly tool. The make-files build the M68K.EXE time parser from M68K.Y (via Bison, producing M68K_TAB.C), M68K.LEX (via FLEX, producing LEXYY.C), HARDWARE.H (F = clock frequency in MHz, W = number of wait-states) and SYMBOL.C and SYMBOL.H, using the Borland C-compiler/linker. At run-time, the Microtec C-compiler translates the C code (XXX.C) into assembly code (XXX.SRC); the M68K.EXE time parser produces the times for sequences (XXX.FW); the graph parser produces Smalltalk code for the graphs (XXX.STV), from which Smalltalk/V produces the graph with min and max times, for use in the RTT C-code analysis.
5.3.3.3. The assembly code graph analyser.
The Smalltalk file produced in the previous step is filed into a Smalltalk/V system. Before the code can be executed, the maximum number of repetitions in loops and the execution times for called subroutines have to be entered manually. When the code is executed, the corresponding graph is drawn on the screen and the minimum and maximum execution times for the assembler program are calculated. The graph symbols and the equations used are those defined in section 5.3.2.
Figure 5.21: Example of output from the graph analyser.

The Smalltalk application is the result of a student project within the RTT project, and it is described in more detail in [Jönsson93]. The application also makes it possible to input graphs manually via a dialogue. The minimum and maximum execution times are stored in an include file (XXX.H) which is the input to the C-macro parser.

5.3.3.4. Programming tools.
The same tools as for the C-macro parser are used (see 5.2.3.3.). The Smalltalk tools are developed for Smalltalk/V version 1.0 for the PC.
6. Limitations and future directions.

6.1. Limitations and current problems.
Prototypes of the tools have been implemented to show that the basic principles are correct. However, due to time and resource limits, some functions have been excluded. A brief list of these limitations and some ideas for solutions is given below.

6.1.1. The back-end tool.

1. Large functions in C have not been tested in the assembly graph analyser, due to a limitation in Smalltalk/V version 1.0 which means that the analyser at present cannot handle large graphs. This should be fixed with Smalltalk/V 2.0.
2. It may be difficult to find the loop limits to enter in the assembly code generated from C code. However, one is helped by having the C code as comments in the generated assembly code (an option in the Microtec compiler). It is important that the C functions are small and well-structured.
3. Unstructured assembly programs cannot be analysed. However, this is not really a problem, since only a small part of the RTT system is coded in assembly, and those parts are well-structured.
4. The problem with called subroutines (JSR) and code reached through interrupts (TRAP), described in section 5.3.1, remains.
6.1.2. The front-end tool.

1. In RTT, the machine code analysis is made off-line, and there is no guarantee that the same code will be produced in the actual machine code program, even if all optimizations of the compiler are turned off. In all but a very few cases, however, optimizations yield faster programs. A safe solution would be to generate the assembly code with RTT's own tools.
2. The calculations in the front-end tool (RTT C-macro analysis) do not consider return expressions. This means that the MINTC and MAXTC values are sometimes too high.
3. No automatic transfer of results is made between the RTT compiler and the front-end tool. This would require a function to dump and restore the symbol table. Today's version relies on manual work.
4. No automatic transfer of results is made between the front-end tool and the scheduler. Today's version relies on manual work.
5. No time calculation is performed for the garbage collection now being implemented for RTT. Only the explicit "delete" operation is supported.
6.1.3. Complexity.

The tests made in Appendix A and B are small in comparison with a real application. How scalable are the algorithms and tools? How do they handle complexity?

The scalability of the back-end tool is discussed in points 1 and 2 of section 6.1.1. The complexity problem can probably be handled when the C functions in RTT are analysed, provided the C functions are small and well-structured. There is, however, much manual work in finding and entering the loop limits in the assembly code.

The front-end tool uses the recursive algorithms in section 5.2.2, implemented as recursive C functions. How do the time and memory consumption of the tool depend on system size? Suppose that an application A consists of m methods, each with a mean size of x message sendings and b block invocations. Suppose further that the mean number of levels of message sendings, before the base class method or primitive method is reached, is n, and that the mean number of implementors of a selector is y. The execution time of applying an algorithm to an application is directly related to the number of executed operations in the algorithm; for the algorithms below, these operations are add and store.

A compact form of Algorithm 1 is:

Pass 1:
  For each method
    Add or store.
Pass 2:
  For each method (but only once per method)
    A: For each message sending
         For each implementor
           If method with calculated time: add time.
           Else start from A for the implementing method.
       For each block
         For each message sending
           For each implementor
             If method with calculated time: add time.
             Else start from A for the implementing method.
The number of operations in Algorithm 1 is m + m * ((x * y) + (b * x * y)).

A compact form of Algorithm 2 is:

For each method
  B: For each message sending
       For each implementor
         If base class method: return "complete".
         Else if recursion is suspected: return "error".
         Else start from B for the implementing method.
The number of operations in Algorithm 2 is m * (x * y)^n.
Suppose now that we double the complexity of application A to get a new application B. By double complexity we mean double the number of methods (2m). Each method, however, is of the same size, i.e. the number of message sendings per method is constant (x). The number of levels (n) and the number of implementors (y) are also constant. This means that we have succeeded in handling complexity in a typical object-oriented way: by adding more code, while each code segment stays as simple as before. The number of operations in Algorithm 1 is now 2m + 2m * ((x * y) + (b * x * y)), i.e. the execution time of Algorithm 1 has increased by a factor of 2. The number of operations in Algorithm 2 is now 2m * (x * y)^n, i.e. the execution time of Algorithm 2 has also increased by a factor of 2. Thus, under the assumptions above, the dependency between system size and the time and memory consumption of the front-end tool is linear. If the complexity also affects the mean size of methods, the number of levels of message sendings or the number of implementors of a selector, the cost grows faster than linearly.
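The operation counts above can be checked with a small sketch. The helper names are invented for illustration; the formulas are exactly those derived in the text.

```c
#include <assert.h>

/* Operation counts from the complexity analysis (hypothetical helpers):
   m methods, x message sendings and b block invocations per method,
   y implementors per selector, n levels of message sendings. */
static long ops_algorithm1(long m, long x, long b, long y) {
    return m + m * ((x * y) + (b * x * y));
}

static long ops_algorithm2(long m, long x, long y, long n) {
    long pow = 1;
    for (long i = 0; i < n; i++)   /* compute (x*y)^n */
        pow *= x * y;
    return m * pow;
}
```

Doubling m doubles both counts while x, b, y and n are held constant, which is the linear scaling claimed above.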
6.2. Reduction of the dynamic factor using path information.

If we had information on infeasible (impossible) paths in the program, some overestimation of MAXTC and underestimation of MINTC might be avoided. Examples of such path information constructs are found in the MARS approach33 and the work of Park34. Inspired by these ideas, we have made some experiments with two simple constructs in RTT:

1. An execution passage counter.
2. A path checker.

These constructs have been:
- implemented in the Smalltalk run-time environment;
- evaluated by manually calculating their impact on MINTC and MAXTC for an example;
- verified by simulations of the calculations.

The algorithms necessary for the front-end tool have not yet been developed or implemented.
33/ See section 3.3.
34/ See section 3.4.
6.2.1. The constructs.

These constructs (the execution passage counter and the path checker) are valid within one method. They are used both in the calculation of the execution time and for run-time checks.

1. The execution passage counter.

This counter counts the number of times the execution flow has passed. The construct can be used e.g. to
- limit long alternatives within a loop that are not selected in each iteration;
- mark loops that are dependent.

The syntax and semantics are:

maxCount1 := MaxCount new.    Create an object which can hold this counter information.
maxCount1 init:n.             This counter can be passed at most n times.
maxCount1 test.               Decrement and test this counter. This can be done at several places in a method.

2. The path checker.

This construct marks an execution path and can be used, for example, to mark that two maximum cases in two different selections can never occur at the same time. The syntax and semantics are:

executionPath1 := ExecutionPath new.    Create an object which can hold this path information.
executionPath1 init:1.                  Initialize this path as path 1.
executionPath1 is:1.                    Test that the path is initialized to path 1, i.e. that this statement lies in the same path as the init statement.
executionPath1 isNot:1.                 Test that the path is not initialized to path 1.
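The intended run-time semantics of the two constructs can be sketched in C. The names mirror the Smalltalk syntax above but are hypothetical; the actual constructs live in the Smalltalk run-time environment, and the convention that path id 0 means "uninitialised/any path" is the one described below.

```c
#include <assert.h>

/* Hypothetical C sketch of the execution passage counter. */
typedef struct { int remaining; } MaxCount;

static void maxcount_init(MaxCount *c, int n) { c->remaining = n; }

/* Decrement and test: returns 0 once the counter has been passed n times. */
static int maxcount_test(MaxCount *c) {
    if (c->remaining <= 0) return 0;   /* passed more than n times */
    c->remaining--;
    return 1;
}

/* Hypothetical C sketch of the path checker; 0 = uninitialised/default. */
typedef struct { int path; } ExecutionPath;

static void path_init(ExecutionPath *p, int id) { p->path = id; }

/* is: succeeds when the path is uninitialised (any path allowed) or
   equals the tested id. */
static int path_is(const ExecutionPath *p, int id) {
    return p->path == 0 || p->path == id;
}

static int path_is_not(const ExecutionPath *p, int id) {
    return p->path != id;
}
```

A counter initialised with init:2 would then allow exactly two passages before test fails, and a path initialised to 1 would satisfy is:1 but not isNot:1.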
If the execution path is not initialised, the execution may take any path in the method. If the execution path is initialised, the execution may take only path 0 or paths whose path number equals the correct path number. If one alternative in a selection is marked as belonging to a path, the other alternative is automatically defined as forbidden for that path.

6.2.2. An example.

The following method was tested with the new constructs.

[Figure 6.1 not reproduced: a graph of the test method, showing blocks on four levels with loop bounds, execution times, path ids and the counter tests a:50, b:80, c:40 and d:40.]
Figure 6.1: Test method graph.

The figure uses the graph symbols described in section 5.1. The rectangle at the top represents the method; the others represent the blocks. The figures to the left mark the level. The blocks are numbered on their level from left to right, e.g. the blocks on level 1 are numbered B11, B12 and B13. The figures and letters in the graph mean the following:
- 1..10 means the minimum and maximum number of iterations for a loop.
- A normal number within a sequence marks the execution time (for simplicity, MINTC = MAXTC).
- An italic number within a sequence marks the path id (0 is the default). For simplicity, we assume that the path is defined at the beginning of the block.
- a:5000 means that counter a is tested to be below 5000.
In this example we have:
- maxCounta is 50 (i.e. the sum of executions of B33 and B43 must not exceed 50), maxCountb is 80, maxCountc is 40 and maxCountd is 40.
- executionPath1 is initialised to 1 in B31 and tested = 1 in B34 (which means that B41 and B42 also lie in path 1).
- the forbidden block for path 1 (B33) is marked with -1.
6.2.3. Calculation of execution times (these figures have to be changed!).

MINTC(M) without the new constructs = 10 + 40 + 650 = 700.
MAXTC(M) without the new constructs = 10 + 17 + 80 + 770 + 1210 + 60 + 10 + 800 = 2 957.

MINTC(M) with the new constructs is the minimum over the different paths:
MINTC(M0) = 10 + 40 + 50 * (7 + 11) = 950.
MINTC(M1) = 10 + 40 + 50 * (7 + 3 + 3) = 700.
MINTC(M) = min(MINTC(M0), MINTC(M1)) = 700.

MAXTC(M) with the new constructs is the maximum over the different paths:
Path 0: MAXTC(M0) = 10 + 17 + 800 + 2000 * 18 + 60 + 10 + 1600 + 10 * 40 * 12 * 8 = 76 897.
Path 1: MAXTC(M1) = 10 + 16 + 640 + 560 + 800 + 60 + 10 + 1600 + 10 * 40 * 12 * 8 = 42 096.
MAXTC(M) = max(MAXTC(M0), MAXTC(M1)) = 76 897.

A reduction of 79%!

6.2.4. Simulation of execution times.

A Smalltalk program was written that simulated the method in the example. The execution time was simulated by adding integers. All the different paths were executed, and the minimum and maximum times were calculated. The results were the same as above.
6.3. Typing RTT.
The fact that RTT is untyped creates some problems during the calculation of execution times. These problems are mentioned in several chapters of this thesis. A list of the problems and a discussion of some possible solutions is given in this section.

Problems:

1. Recursion in an untyped, polymorphic language like RTT or Smalltalk can be hard to identify. What looks like recursion may not be recursion at all. We have forbidden suspected recursion in RTT.

2. Polymorphism together with untyped objects causes big overreservation factors for message sendings. We cannot decide which method will be invoked at run-time, so we have to choose the biggest MAXTC and the smallest MINTC among all methods with the current selector in the system.

3. It is not possible to calculate execution times for iterations over unlimited data structures. Therefore, the do-construct and similar forms have been removed in RTT.
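The reservation rule in problem 2 can be sketched as follows. The data layout is hypothetical; in the real tool the implementors of a selector are found via the symbol table.

```c
#include <assert.h>

/* Hypothetical sketch of the reservation rule for an untyped message
   sending: since the receiver's class is unknown, take the smallest MINTC
   and the biggest MAXTC over all implementors of the selector. */
typedef struct { long mintc, maxtc; } MethodTime;

static MethodTime reserve_for_selector(const MethodTime *impl, int n) {
    MethodTime r = impl[0];
    for (int i = 1; i < n; i++) {
        if (impl[i].mintc < r.mintc) r.mintc = impl[i].mintc;
        if (impl[i].maxtc > r.maxtc) r.maxtc = impl[i].maxtc;
    }
    return r;
}
```

The sketch makes the overreservation visible: the reserved interval can be much wider than that of the method actually invoked at run-time.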
Implications: All three problems are severe. The first forbids constructs that are common in Smalltalk, e.g. exploiting the benefits of polymorphism in lists by having one selector "expand" to the same selector for the objects in a collection. The second problem may lead to a diminishing use of polymorphism in object-oriented software; one naive solution would be to avoid common selectors in the system, which is contrary to the basic ideas of object-oriented programming. The third forbids constructs in Smalltalk that are very powerful (do, accept, reject, etc.); the programmer must use a less elegant solution with limited loops over indexable data sets.

Solutions: An at least partial solution would be to type the language. An implementation of Smalltalk with type-checking and type-inference is described in [Graver89]. If we could type the receivers of the messages in the first construct (recursion), we could separate real recursion (true recursion and class recursion) from "false" recursion (polymorphic recursion). True recursion and class recursion are similar to recursion in C, because the same code is executing. Real recursion might be possible to handle if the maximum depth was given by the programmer, and polymorphic recursion could be analysed as ordinary message sending. Overestimation due to the second problem would get smaller, since the possible classes of the receiver would be restricted. It would be possible to use the do-construct and similar forms if the size of collections and the types of the members were known.
6.4. Proving terminating loops.
How can we be sure that a while-type loop terminates gracefully, within its time limit? This is a classical problem in computer science, proven to be unsolvable in the general case. However, by using a proof mechanism, the termination of certain loops can be proven. One such mechanism is to have a value which is increased by a fair function; when the value reaches a limit, the calculation is complete and the loop terminates. The UNITY language [Chandy88], for example, is used to support such proof techniques. To include such proof techniques in RTT and to couple them to real time is a challenging research area.
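The proof idea can be illustrated with a trivial loop (an illustration only, not UNITY or RTT code): a variant value increases by at least one in every iteration, so the limit bounds the number of iterations, and an execution-time bound follows from the iteration bound.

```c
#include <assert.h>

/* Illustration of the termination argument: 'progress' is increased by a
   fair (always-advancing) step, so the loop runs at most LIMIT iterations. */
enum { LIMIT = 100 };

static int iterate_until_done(void) {
    int progress = 0, iterations = 0;
    while (progress < LIMIT) {
        progress += 1;      /* fair function: strictly increases */
        iterations += 1;
    }
    return iterations;      /* bounded by LIMIT */
}
```

Because the bound on iterations is known statically, a MAXTC for such a loop could in principle be derived without the programmer supplying a loop limit.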
6.5. Measurement on a PC version of RTT.

One version of RTT executes on an Intel 386/486-based PC. This hardware is not at all predictable, however; it uses both pipelines and caches. One cannot expect hard real-time support from such an implementation, but for test, simulation, development and education this version is very useful. One approach to get some idea of the time behaviour of that implementation is to replace the back-end calculation tool with a measurement tool. One could use the measurement technique described in section 3.7 to get an idea of the mean value and the variance of the execution time of the basic C-macros. The assembly code and, especially, the C code in RTT would cause some trouble; one would probably have to find the best and worst cases for these parts of the software and measure those cases. The final execution times for the methods could then be calculated by the front-end tool, as in the M68000 case.
7. References.
[Ammar]
Ammar, Wang and Scholl: "Graphic Modelling Technique for Software Execution Time Estimation", 1991.
[Aho89]
A. Aho, R. Sethi, and J. D. Ullman: "Compilers - principles, Techniques and Tools", pp 528 ff., Addison Wesley 1989.
[Berggren92]
Hans Berggren, Mikael Gustafsson and Lennart Lindh, "Measuring and Analyzing Real-Time Kernel Performance", EuroMicro 92, Paris, Sept. 1992.
[Berry85]
G. Berry and L. Cosserat: "The Esterel Synchronous Programming Language and Its Mathematical Semantics", Lecture Notes in Computer Science, Vol. 197, Springer-Verlag, Feb. 1985, pp. 389 - 448.
[BISON]
C. Donnelly and R. Stallman, "BISON, the YACC-compatible Parser Generator", Free Software Foundation, 1988.
[Blair91]
Blair, G., Gallagher, J., Hutchison, D., Shepard, D., "Object-Oriented Languages, Systems and Applications", Pitman Publishing, 1991, London.
[Borland]
Borland C++ 2.0, Borland Inc., 1800 Green Hills Road, P.O. Box 660001, Scotts Valley, CA 95067-0001, USA.
[Brorsson92]
E. Brorsson, C. Eriksson, J. Gustafsson, "RealTimeTalk, An Object-Oriented Language for Hard Real-Time Systems", International Workshop on Real-Time Programming, WRTP'92, IFAC/IFIP, Bruges, Belgium, June 1992.
[Böhm66]
C. Böhm and G. Jacopini: "Flow diagrams, Turing Machines and Languages with only Two Formation Rules", Communications of the ACM, Vol. 9, No. 5, May 1966.
[Chandy88]
K. M. Chandy and J. Misra, "Parallel Program Design: A Foundation", Addison-Wesley, 1988.
[Clapp86]
R. Clapp et al., "Toward Real-Time Performance Benchmarks for Ada", Communications of the ACM, Vol. 29, No. 8, Aug. 1986, pp 760 - 778.
[Cox86]
Cox B. J., Object-Oriented Programming: An Evolutionary Approach, Addison-Wesley, 1986, Reading (Mass).
[Eriksson93]
Christer Eriksson, Jan Gustafsson, Jerk Brorsson and Mikael Gustafsson, "An Object-Oriented Framework for Designing Hard Real-Time Systems", Fifth Euromicro Workshop on Real-Time Systems, Oulu, Finland, June 1993.
[Eriksson94]
Christer Eriksson, "An Object-Oriented Framework for Designing Hard Real-Time Systems", Licentiate Thesis, University of Mälardalen/KTH, Sweden, 1994.
[FLEX]
V. Paxson, "FLEX, Fast Lexical Analyzer Generator", Computer Science Dept., Cornell University, 1990.
[Ford89]
W. Ford and D. Topp, Assembly Language and Systems Programming for the M68000 family, Heath 1989.
[Gold89]
A. Goldberg and D. Robson, Smalltalk 80, The Language, Addison Wesley 1989.
[Gopi92]
P. Gopinath, T. Bihari and R. Gupta, "Compiler Support for Object-Oriented Real-Time Software", IEEE Software, sept 1992, pp 45 - 49.
[Graver89]
J. O. Graver, "Type-Checking and Type-Inference for Object-Oriented Programming Languages", Thesis, University of Illinois at Urbana-Champaign, 1989.
[Halang90]
W. Halang, A. Stoyenko: "Comparative Evaluation of High-Level Real-Time Programming Languages", The Journal of Real-Time Systems, 2, pp 365 - 382, Nov. 1990.
[Harris]
"RTX2000, Real Time Express 16-Bit Microcontroller", Harris Semiconductor, 1301 Woody Burke Road, Melbourne, Florida 32 902, USA, 1989.
[Hassel93]
Roger Hassel, Kristian Sandström, "Garbage Collection i realtid" (Garbage collection in real time), project report (in Swedish), 1993.
[Hua92]
S. Huang and D. Chen, "Efficient Algorithms for Method Dispatch in Object-Oriented Programming Systems", JOOP, September 1992.
[Jönsson93]
Michael Jönsson, "Ritande och beräknande av exekveringsgrafer i Smalltalk" (Drawing and calculation of execution graphs in Smalltalk), project report (in Swedish), 1993.
[Kenny91:1]
K. B. Kenny and K.-J. Lin, "Measuring and Analyzing Real-Time Performance", IEEE Software, September 1991, pp 41 - 49.
[Kenny91:2]
K. B. Kenny and K.-J. Lin, "A Measurement-Based Performance Analyzer for Real-Time Programs", Proc. International Phoenix Conference on Computer and Communication, IEEE CS Press, Los Alamitos, California, Order No. 2133, 1991, pp 93 - 99.
[Kenny91:3]
K. B. Kenny and K.-J. Lin, "Building Flexible Real-Time Systems Using the FLEX Language", IEEE Computer, May 1991, pp 70 - 78.
[Kirk91]
D. B. Kirk et al., "Allocating SMART Cache Segments for Schedulability", Proceedings, Euromicro '91 Workshop on Real-Time Systems, June 12 - 14, Paris, France, pp 41 - 50.
[Kopetz89]
H. Kopetz et al., "Distributed Real-Time Systems - The MARS Approach", IEEE Micro, Feb. 1989.
[Kopetz91]
H. Kopetz, "Event-Triggered versus Time-Triggered Real-Time Systems", Lecture Notes in Computer Science, Vol. 563, Springer-Verlag, Berlin, pp 87 - 101.
[Korho93]
Marko Korhonen, "Analys av exekveringsgrafer för assemblyprogram" (Analysis of execution graphs for assembly programs), project report (in Swedish), 1993 (not finished at present).
[Leinb80]
D. W. Leinbaugh, "Guaranteed Response Times in a Hard-Real-Time Environment", IEEE Transactions on Software Engineering, Vol. SE-6, Jan 1980, pp 85 - 91.
[Leinb82]
D. W. Leinbaugh, and M.-R. Yamini "Guaranteed Response Times in a Distributed Hard-Real-Time Environment", Proc. IEEE 1982 Real-Time Systems Symposium, Dec. 1982, pp 157 - 169.
[Lin87]
K.-J. Lin, S. Natarajan and J. W. S. Liu, "Concord: a System of Imprecise Calculations", Proceedings of COMPSAC '87, Tokyo, Japan, IEEE, Oct. 1987.
[Lin91]
K.-J. Lin, J. W. S. Liu, K. B. Kenny and S. Natarajan, "FLEX: A Language for Programming Flexible Real-Time Systems", Foundations of Real-Time Computing, Formal Specifications and Methods, Kluwer Academic Publishers, July 1991, pp 251 - 290.
[LiuLey73]
C. Liu and J. Layland, "Scheduling Algorithms for Multiprogramming in a Hard-Real-Time Environment", Journal of the ACM, Jan. 1973, pp 46 - 61.
[MC68000]
MC68000 Microprocessor, User's Manual, Section 8, Prentice-Hall, 1990.
[Microtec]
MCC68K (C Compiler for MC68000), Microtec Research Inc., 2350 Mission College Blvd., Santa Clara, CA 95054, USA.
[Mok84]
A. Mok, "The Design of Real-Time Programming Systems on Process Models", Proceedings, 5th Real-Time Systems Symposium, Dec. 1984.
[Mok89]
A. Mok et al., "Evaluating Tight Execution Time Bounds of Programs by Annotations", Proceedings of the 6th IEEE Workshop on Real-Time Operating Systems and Software, May 1989, Pittsburgh, pp. 74 - 80.
[Park89]
C. Y. Park and A. C. Shaw, "A Source-Level Tool for Predicting Deterministic Execution Times for Programs", TR #89-09-12, Dept. of Computer Science and Engineering, University of Washington, Sept. 1989.
[Park91:1]
C. Y. Park and A. C. Shaw, "Experiments with a Program Timing Tool Based on a Source-Level Timing Schema", IEEE Computer, May 1991, pp 48 - 56.
[Park91:2]
C. Y. Park and A. C. Shaw, "Experiments with a Program Timing Tool Based on a Source-Level Timing Schema", IEEE Computer, May 1991, pp 48 - 56.
[Park93]
C. Y. Park, "Predicting Program Execution Times by Analyzing Static and Dynamic Program Paths", The Journal of Real-Time Systems, 5, 1993, pp 31 - 62.
[Profiler]
Turbo Profiler 1.1, Borland Inc., 1800 Green Hills Road, P.O. Box 660001, Scotts Valley, CA 95067-0001, USA.
[Puschner89]
P. Puschner and C. Koza, "Calculating the Maximum Execution Times of Real-Time Programs", The Journal of Real-Time Systems, Sept. 1989, pp 159 - 176.
[Puschner93]
P. Puschner and A. Schedl, "A Tool for the Computation of Worst Case Task Execution Times", Proceedings, Fifth EuroMicro Workshop on Real-Time Systems, Oulu, Finland, June 1993.
[RR01]
CUS93RR01, Internal report at CUS, (Department of Real-time Computer Systems), University of Mälardalen, 1993.
[Shaw89]
A.C. Shaw, "Reasoning about Time in Higher Level Language Software", IEEE Transactions on Software Engineering, July 1989, pp 875 - 889.
[Shaw91]
A.C. Shaw, "Towards a Timing Semantics for Programming Languages", Foundations of Real-Time Computing, Formal Specifications and Methods, Kluwer Academic Publishers, July 1991, pp 217 - 250.
[Shlaer88]
Shlaer, S., and Mellor, S., "Object-Oriented Systems Analysis - Modelling the World in Data", Yourdon Press 1988, Computer Series.
[Stanc88]
Stankovic, J. A., "Misconceptions about Real-Time Computing", IEEE Computer, 21, Oct. 1988, pp 10 - 19.
[Stoyen86]
A. Stoyenko and E. Kligerman, "Real-Time Euclid: A Language for Reliable Real-Time Systems", IEEE Transactions on Software Engineering, Vol. SE-12, Sept. 1986, pp 940 - 949.
[Stoyen87]
A. Stoyenko, "A Real-Time Language with a Schedulability Analyzer", Ph. D. Thesis TR CSRI-206, Computer Systems Research Institute, University of Toronto, Dec. 1987.
[Stoyen91]
A. Stoyenko, V. C. Hamacher and R. C. Holt, "Analyzing Hard-Real-Time Programs for Guaranteed Schedulability", IEEE Transactions on Software Engineering, Aug. 1991, pp 737 - 750.
[Vort92]
K. H. Vortman, "A Timing Analyzer for DEDOS", Master's Thesis, Eindhoven University of Technology, Dept. of Mathematics and Computing Science, Aug. 1992.
[Wall93]
Göran Wall et al., "A Source-Level Performance Analysis Tool for Real-Time Programs", SNART Real-Time Conference, Stockholm, 23 - 25 August, 1993.
[Zhang93]
N. Zhang, A. Burns, and M. Nicholson,"Pipelined Processors and Worst Case Execution Times", Journal of Real-Time Systems, 5, Oct. 1993, pp. 319 - 343.
Appendix A: Measurement of execution times on Motorola MC68000.

A.1. Aim.
This appendix shows the results of the back-end tool tests. The primary goal was to compare the calculations with measurements, to ensure that the times given in the data sheet for the MC68000 processor were correct with respect to wait states, clock frequency, and the effect of the instruction prefetch at branching. The effect of dynamic memory refresh on execution times was also studied. The results are quite good: the measured times correspond well to the calculated ones. However, we have a dynamic factor in the calculations of about 10%. Parts of this factor could be eliminated with a cleverer parser for the assembler code; one way to do this would be to analyse the operands of the assembler instructions in detail.

Three different programs were analysed and measured. The aim was to:
- in general, check some of the times given in the MC68000 manual [MC68000];
- check the branching times: does the prefetch queue in the processor influence the times given in the manual?
- check how the refresh mechanism influences the times given in the manual.

The first program, Test 1 (section A.3), checks the times given in the manual for some move and branch instructions. The second program, Test 2 (section A.4), especially checks the times given for the impact of wait states on move.l (long) instructions. The third program, Test 3 (section A.5), checks that the times given in the manual are correct also for branch instructions, which empty the two-word prefetch queue of the processor.
A.2. Test system.

• MC68000 circuit board (FORCE-1) with DRAM (2 wait states). Clock frequency 8 MHz.
• Adapter to an I/O port which lets the oscilloscope read the output signal from the circuit board.
• PC system with the Microtec assembler.
• Oscilloscope.
A.2.1. Measurement technique.
The programs toggle a bit in an output port. This bit is presented on an oscilloscope via an adapter, and a square wave appears on the display as the program executes, with time on the x-axis. The 0-to-1 and 1-to-0 transitions are easily identified in the program.

[Figure A.1 (schematic oscilloscope display) not reproduced; the toggling bit is shown on the y-axis.]

Figure A.1. Oscilloscope display (schematic).

1 = point 1 in the program
2 = point 2 in the program
t1 = time between point 2 and 1 in the program
t2 = time between point 2 and 1 in the program when refresh has occurred
t3 = time between point 1 and 2 in the program
t4 = time between point 1 and 2 in the program when refresh has occurred
t5 = time between point 1 and 1 in the program (the whole loop)
r = time for refresh

The longer time t2 occurs when refresh is made between point 2 and 1 in the program. The longer time t4 occurs when refresh is made between point 1 and 2 in the program. (Since refresh occurs rather seldom, it never happens that two refreshes occur between point 2 and 2.) The processor is stopped completely during refresh.
The refresh time r is measured on the display to 128 µs, which agrees with the documentation of the FORCE-1 circuit board. That execution with refresh occurs more seldom than execution without refresh is obvious from the lines on the display: the longer path draws a thinner, flickering line. The reading error of the oscilloscope is about 2%.
A.3. Test 1: Short loop.

A.3.1. Program list.
inp:  EQU    $E0013
out:  EQU    $E0011
      ORG    $10000
      move.b #$0,$E0001
      move.b #$FF,$E0005
      move.b #$0,$E0007
      move.b #$C0,$E000D
      move.b #$C0,$E000F
loop: move.l $18000,$18004
      move.l $18000,$18004
      move.b #0,out
zero: move.l $18000,$18004
      move.l $18000,$18004
      move.l $18000,$18004
      move.l $18000,$18004
      move.l $18000,$18004
      move.b #1,out
one:  move.l $18000,$18004
      move.l $18000,$18004
      move.l $18000,$18004
      bra.w  loop

A.3.2. Program graph with execution times.

[Graph not reproduced. Min/max cycle counts per block: initialisation 120/150, loop 120/138, zero 264/300, one 144/162, loop branch 14/14.]

t3 = 300 cycles (point 1 to point 2)
t5 = 614 cycles (the whole loop)

A.3.3. Output from the execution time analyser.
Execution times for MC68000
MHz = 8, waitstates = 2
calmint = 120 cycles, 15.00 microseconds
calmaxt = 150 cycles, 18.75 microseconds
LABEL = loop
calmint = 120 cycles, 15.00 microseconds
calmaxt = 138 cycles, 17.25 microseconds
LABEL = zero
calmint = 264 cycles, 33.00 microseconds
calmaxt = 300 cycles, 37.50 microseconds
LABEL = one
calmint = 144 cycles, 18.00 microseconds
calmaxt = 162 cycles, 20.25 microseconds
Branch backward (loop)
Time min 14, max 14 cycles if branch taken
calmint = 0 cycles, 0.00 microseconds
calmaxt = 0 cycles, 0.00 microseconds
That's it!
A.3.4. Calculated time versus measured time.

t3 is calculated to 300 cycles => 37.5 µs, and t5 is calculated to 614 cycles => 77 µs. These are the same as the measured values, within the reading error.
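The cycle-to-time conversion behind these figures can be sketched as follows. The helper is hypothetical; the actual tool takes the clock frequency F and the number of wait states W from HARDWARE.H, and the wait states are assumed to be already folded into the cycle counts here.

```c
#include <assert.h>

/* Hypothetical helper: at F MHz, one cycle takes 1/F microseconds. */
static double cycles_to_us(long cycles, double mhz) {
    return (double)cycles / mhz;
}
```

For the 8 MHz test system, 120 cycles correspond to 15.00 µs and 614 cycles to 76.75 µs, consistent with the analyser output and with the measured t5 of about 77 µs.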
A.4. Test 2: Long loop.

A.4.1. Program list.
inp:  EQU    $E0013
out:  EQU    $E0011
      ORG    $10000
      move.b #$0,$E0001
      move.b #$FF,$E0005
      move.b #$0,$E0007
      move.b #$C0,$E000D
      move.b #$C0,$E000F
loop: move.l $18000,$18004
      move.l $18000,$18004
      move.b #0,out
zero: move.l $18000,$18004
      ...                     (a total of 50 move.l)
      move.b #1,out
one:  move.l $18000,$18004
      move.l $18000,$18004
      move.l $18000,$18004
      move.l $18000,$18004
      bra.w  loop
A.4.2. Program graph with execution times.

[Graph not reproduced. Min/max cycle counts per block: initialisation 120/150, loop 120/138, zero 2424/2730, one 192/216, loop branch 14/14.]

t3 = 2730 cycles (point 1 to point 2)
t5 = 3098 cycles (the whole loop)

A.4.3. Output from the execution time analyser.

Execution times for MC68000
MHz = 8, waitstates = 2
calmint = 120 cycles, 15.00 microseconds
calmaxt = 150 cycles, 18.75 microseconds
LABEL = loop
calmint = 120 cycles, 15.00 microseconds
calmaxt = 138 cycles, 17.25 microseconds
LABEL = zero
calmint = 2424 cycles, 303.00 microseconds
calmaxt = 2730 cycles, 341.25 microseconds
LABEL = one
calmint = 192 cycles, 24.00 microseconds
calmaxt = 216 cycles, 27.00 microseconds
BRA LABEL = loop
Branch backward (loop)
Time min 14, max 14 cycles if branch taken
calmint = 0 cycles, 0.00 microseconds
calmaxt = 0 cycles, 0.00 microseconds
That's it!
A.4.4. Calculated time versus measured time.

t3 is calculated to 2730 cycles => 341.25 µs, and t5 to 3098 cycles => 387.25 µs, which is the same as the measured values, within the reading error.
A.5. Test 3: Long loop with jumps.

A.5.1. Program list.
inp:   EQU    $E0013
out:   EQU    $E0011
       ORG    $10000
       move.b #$0,$E0001
       move.b #$FF,$E0005
       move.b #$0,$E0007
       move.b #$C0,$E000D
       move.b #$C0,$E000F
loop:  move.l $18000,$18004
       move.l $18000,$18004
       move.b #0,out
       bra.w  loop1
       move.l $18000,$18004
loop1: bra.w  loop2
       move.l $18000,$18004
loop2: bra.w  loop3
       move.l $18000,$18004
loop3: bra.w  loop4
       move.l $18000,$18004
loop4: move.b #1,out
one:   move.l $18000,$18004
       move.l $18000,$18004
       move.l $18000,$18004
       bra.w  loop

A.5.2. Program graph with execution times.

[Graph not reproduced. Min/max cycle counts per block: initialisation 120/150, loop 120/138; each forward branch 14/14 when taken, with 48/54 for the skipped move.l alternative; loop1 - loop3 0/0, loop4 24/30, one 144/162, backward loop branch 14/14.]

t3 = 86 cycles (point 1 to point 2)
t5 = 400 cycles (the whole loop)
A.5.3. Output from the execution time analyser.
Execution times for MC68000
MHz = 8, waitstates = 2
calmint = 120 cycles, 15.00 microseconds
calmaxt = 150 cycles, 18.75 microseconds
LABEL = loop
calmint = 120 cycles, 15.00 microseconds
calmaxt = 138 cycles, 17.25 microseconds
BRA LABEL = loop1
Branch forward (selection)
Time min 14, max 14 cycles if branch taken
calmint = 48 cycles, 6.00 microseconds
calmaxt = 54 cycles, 6.75 microseconds
LABEL = loop1
calmint = 0 cycles, 0.00 microseconds
calmaxt = 0 cycles, 0.00 microseconds
BRA LABEL = loop2
Branch forward (selection)
Time min 14, max 14 cycles if branch taken
calmint = 48 cycles, 6.00 microseconds
calmaxt = 54 cycles, 6.75 microseconds
LABEL = loop2
calmint = 0 cycles, 0.00 microseconds
calmaxt = 0 cycles, 0.00 microseconds
BRA LABEL = loop3
Branch forward (selection)
Time min 14, max 14 cycles if branch taken
calmint = 48 cycles, 6.00 microseconds
calmaxt = 54 cycles, 6.75 microseconds
LABEL = loop3
calmint = 0 cycles, 0.00 microseconds
calmaxt = 0 cycles, 0.00 microseconds
BRA LABEL = loop4
Branch forward (selection)
Time min 14, max 14 cycles if branch taken
calmint = 48 cycles, 6.00 microseconds
calmaxt = 54 cycles, 6.75 microseconds
LABEL = loop4
calmint = 24 cycles, 3.00 microseconds
calmaxt = 30 cycles, 3.75 microseconds
LABEL = one
calmint = 144 cycles, 18.00 microseconds
calmaxt = 162 cycles, 20.25 microseconds
BRA LABEL = loop
Branch backward (loop)
Time min 14, max 14 cycles if branch taken
calmint = 0 cycles, 0.00 microseconds
calmaxt = 0 cycles, 0.00 microseconds
That's it!
A.5.4. Calculated time versus measured time.

t3 is calculated to 86 cycles => 10.75 µs and t5 to 400 cycles => 50 µs, which agree with the measured values within the reading error.
A.6. Overhead for refresh.
The refresh frequency was determined by measuring the iteration frequency of the program in section A.5 to 17 300 Hz (iterations/second). Since one iteration takes 50 µs of pure CPU time, this gives the percentage of CPU time used by refresh:

R = 100% - (17 300 * 50 µs) = 100% - 86.5% = 13.5%

Number of refreshes per second = 135 000/128 = 1 055. The manual says 1 000. The difference (5.5%) may very well be due to tolerances in the circuits. If the refresh mechanism is known, its impact on hard real-time guarantees can be calculated. This is done e.g. in [Park91:1].
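The refresh arithmetic above can be restated as a small sketch (in Python, not part of the thesis tooling). The divisor 128 is taken directly from the 135 000/128 figure in the text; its interpretation as a per-refresh cost is an assumption:

```python
# Refresh overhead from the measured loop frequency (17 300 iterations/s)
# and the calculated loop time t5 = 400 cycles = 50 microseconds.
loop_freq_hz = 17_300
loop_time_s = 50e-6

cpu_share = loop_freq_hz * loop_time_s   # fraction of CPU running the loop
refresh_pct = (1.0 - cpu_share) * 100    # percentage stolen by refresh
print(round(refresh_pct, 1))             # -> 13.5

# 135 000 is the 13.5 % expressed in microseconds per second; dividing by
# 128 (the per-refresh figure used in the text) gives the refresh rate:
refreshes_per_s = refresh_pct / 100 * 1_000_000 / 128
print(round(refreshes_per_s))            # -> 1055
```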
Appendix B. Calculation of execution times for a RTT C-macro example.

This appendix shows the results of the front-end tool tests. The primary goal was to test Algorithm 1 by comparing the results from the front-end tool with manual calculations. No measurements were made.
B.1. The example.

A class hierarchy (see the figure below) was created with classes, class methods and instance methods. A file with RTT C-macros was prepared with code that exercises many types of macros and all kinds of control structures (sequences, selections and iterations). The file is a fake, i.e. it does not do anything useful. The file is given in the technical report [RR01]. The example does not contain any return expressions, since the tool does not handle this break of the structured control flow.35
[Figure B.1 is a class diagram; the classes and their message sendings, recovered from the figure text, are:]

Object: new
Class_1: meth_1 (sel_1): ->new, ->sel_4, ->sel_3, ->sel_2, ->sel_1 (direct recursion); block_1: ->sel_3; block_2: ->sel_4
Class_2: meth_2 (sel_2): ->sel_3; block_3: ->sel_4, ->sel_3; block_4; block_5: ->sel_1 (indirect recursion)
Class_3: meth_6 (sel_2): ->sel_3, ->sel_4, ->sel_2
Baseclass_1: meth_3 (sel_3)
Baseclass_2: meth_4 (sel_4), meth_5 (sel_2)

• Class 1, 2 and 3 are "normal" classes.
• Baseclasses 1 and 2 and Object have known execution times (in TIMES82.H).
• "sel_2" contains message sending to "sel_1" and the baseclasses.
• "new" and "meth_3" are class methods; the rest are instance methods.

Figure B.1: The test class hierarchy. To the right of each class in the figure, all message sendings of the class are written.
35/ See section 6.1.2.
The syntax of this can most easily be described with an example: the method meth_1 in Class_1 is invoked by sending a message with the selector sel_1 to an instance of the class. The method contains message sendings with the selectors sel_1, sel_2, sel_3 and sel_4. The method also invokes the block block_1, which contains a message sending with the selector sel_3, and the block block_2, which contains a message sending with the selector sel_4.

The methods new, meth_3, meth_4 and meth_5 are methods in base classes, with the following known (faked) times, expressed in clock cycles:

MINTC(new) = 1 162, MAXTC(new) = 1 552
MINTC(meth_3) = 42, MAXTC(meth_3) = 42
MINTC(meth_4) = 192, MAXTC(meth_4) = 192
MINTC(meth_5) = 300, MAXTC(meth_5) = 300

For the rest of the methods, the recursive algorithm Alg. 1 can be applied by hand to check the computed values. This is done below. Please note that for simplicity, system overhead is set to zero, i.e.

MINTC(init) = MINTC(test1) = MAXTC(test2) = MAXTC(message sending) = MAXTC(increment) = 0

in the equations.

B.2. Manual calculation.
The basic time (the first figure after the equals sign) is fetched from the run of the parser and is the sum of the times of the macros. These times are given in the technical report [RR01].

MINTC(meth_1) = 6 666 + MINTC(new) + MINTC(sel_4) + MINTC(sel_3) + MINTC(sel_4) + MINTC(sel_2) + 7 * (MINTC(block_1) + MINTC(block_2)) + 7 * MINTC(block_1) + 9 * MINTC(block_1)
= 6 666 + 1 162 + 42 + 2 * 192 + 300 + 23 * 1 000 + 7 * 1 106 = 39 296.

MAXTC(meth_1) = 6 984 + MAXTC(new) + MAXTC(sel_4) + MAXTC(sel_3) + MAXTC(sel_4) + MAXTC(sel_2) + 11 * (MAXTC(block_1) + MAXTC(block_2)) + 7 * MAXTC(block_1) + 31 * MAXTC(block_1) + 13 * MAXTC(block_1)
= 6 984 + 1 552 + 42 + 2 * 192 + 21 318 + 62 * 1 314 + 11 * 1 416 = 127 324.

MINTC(meth_2) = 7 370 + 2 * MINTC(sel_3) + MINTC(block_3)
= 7 370 + 2 * 42 + 5 412 = 12 866.
MAXTC(meth_2) = 8 682 + 2 * MAXTC(sel_3) + 2 * MAXTC(block_3)
= 8 682 + 2 * 42 + 2 * 6 276 = 21 318.

MINTC(meth_6) = 3 584 + 2 * MINTC(sel_3) + MINTC(sel_4) + MINTC(sel_2)
= 3 584 + 2 * 42 + 192 + 300 = 4 160.

MAXTC(meth_6) = 4 088 + 2 * MAXTC(sel_3) + MAXTC(sel_4) + MAXTC(sel_2)
= 4 088 + 2 * 42 + 192 + 21 318 = 25 682.

MINTC(block_1) = 958 + MINTC(sel_3) = 958 + 42 = 1 000.
MAXTC(block_1) = 1 272 + MAXTC(sel_3) = 1 272 + 42 = 1 314.
MINTC(block_2) = 914 + MINTC(sel_4) = 914 + 192 = 1 106.
MAXTC(block_2) = 1 224 + MAXTC(sel_4) = 1 224 + 192 = 1 416.

MINTC(block_3) = 3 978 + 2 * MINTC(sel_3) + MINTC(sel_4) + min(MINTC(block_4), MINTC(block_5))
= 3 978 + 2 * 42 + 192 + min(1 704, 1 158) = 5 412.

MAXTC(block_3) = 4 296 + 2 * MAXTC(sel_3) + MAXTC(sel_4) + max(MAXTC(block_4), MAXTC(block_5))
= 4 296 + 2 * 42 + 192 + max(1 704, 1 350) = 6 276.

MINTC(block_4) = 1 704. MAXTC(block_4) = 1 704.
MINTC(block_5) = 1 158. MAXTC(block_5) = 1 350.

MINTC("new") = MINTC(new) = 1 162.
MAXTC("new") = MAXTC(new) = 1 552.
MINTC("sel_1") = min(MINTC(meth_1), MINTC(meth_6)) = 4 160.
MAXTC("sel_1") = max(MAXTC(meth_1), MAXTC(meth_6)) = 120 856.
MINTC("sel_2") = min(MINTC(meth_2), MINTC(meth_5)) = 300.
MAXTC("sel_2") = max(MAXTC(meth_2), MAXTC(meth_5)) = 21 318.
MINTC("sel_3") = MINTC(sel_3) = 42.
MAXTC("sel_3") = MAXTC(sel_3) = 42.
MINTC("sel_4") = MINTC(sel_4) = 192.
MAXTC("sel_4") = MAXTC(sel_4) = 192.
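The hand calculation above can be re-checked mechanically. The following Python sketch (not the RTT front-end tool) encodes the base times and the equations of B.2 verbatim, with system overhead set to zero as in the text, and reproduces the meth_1 and meth_2 values:

```python
# Re-running the manual MINTC/MAXTC calculation of B.2. All constants are
# base times and basic (parser) times quoted above, in clock cycles.
MIN_SEL3, MAX_SEL3 = 42, 42          # meth_3
MIN_SEL4, MAX_SEL4 = 192, 192        # meth_4
MIN_METH5, MAX_METH5 = 300, 300      # meth_5
MIN_NEW, MAX_NEW = 1162, 1552        # new

min_block_1 = 958 + MIN_SEL3                                     # 1 000
max_block_1 = 1272 + MAX_SEL3                                    # 1 314
min_block_2 = 914 + MIN_SEL4                                     # 1 106
max_block_2 = 1224 + MAX_SEL4                                    # 1 416

min_block_3 = 3978 + 2 * MIN_SEL3 + MIN_SEL4 + min(1704, 1158)   # 5 412
max_block_3 = 4296 + 2 * MAX_SEL3 + MAX_SEL4 + max(1704, 1350)   # 6 276

min_meth_2 = 7370 + 2 * MIN_SEL3 + min_block_3                   # 12 866
max_meth_2 = 8682 + 2 * MAX_SEL3 + 2 * max_block_3               # 21 318

min_sel_2 = min(min_meth_2, MIN_METH5)                           # 300
max_sel_2 = max(max_meth_2, MAX_METH5)                           # 21 318

min_meth_1 = (6666 + MIN_NEW + 2 * MIN_SEL4 + MIN_SEL3 + min_sel_2
              + 7 * (min_block_1 + min_block_2)
              + 7 * min_block_1 + 9 * min_block_1)
max_meth_1 = (6984 + MAX_NEW + 2 * MAX_SEL4 + MAX_SEL3 + max_sel_2
              + 11 * (max_block_1 + max_block_2)
              + 7 * max_block_1 + 31 * max_block_1 + 13 * max_block_1)

print(min_meth_1, max_meth_1)   # 39296 127324
print(min_meth_2, max_meth_2)   # 12866 21318
```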
B.3. Detection of recursion.

If the message sending in meth_1 of Class_1 with the selector sel_1 were executed, direct recursion would occur. This is tested with a positive result. If the message sending in meth_2 of Class_2 with the selector sel_1 were executed, indirect recursion could occur (via meth_1). This is also tested with a positive result.
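The recursion test can be sketched as a depth-first search over the send graph, keeping the methods on the current call path. The tables below are a simplified, hypothetical encoding of Figure B.1 (send lists and dispatch tables reduced to what matters for the cycle check), not the tool's actual data structures:

```python
# Detecting direct and indirect recursion in a polymorphic send graph.
SENDS = {                        # method -> selectors it may send
    "meth_1": ["sel_1", "sel_2", "sel_3", "sel_4"],
    "meth_2": ["sel_1", "sel_3", "sel_4"],
    "meth_6": ["sel_2", "sel_3", "sel_4"],
}
IMPLEMENTS = {                   # selector -> candidate methods (polymorphism)
    "sel_1": ["meth_1", "meth_6"],
    "sel_2": ["meth_2", "meth_5"],
    "sel_3": ["meth_3"],
    "sel_4": ["meth_4"],
}

def is_recursive(method, path=()):
    """True if a cycle is reachable from `method` through any dispatch."""
    if method in path:           # method already on the call path: recursion
        return True
    new_path = path + (method,)
    return any(is_recursive(m, new_path)
               for sel in SENDS.get(method, [])   # base-class methods send nothing
               for m in IMPLEMENTS[sel])

print(is_recursive("meth_1"))    # direct recursion via sel_1 -> True
print(is_recursive("meth_2"))    # indirect recursion via sel_1/meth_1 -> True
```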
B.4. Automatic calculation with the front-end tool.
The example above has been analysed with the tool, with the same results as the manual calculations.36 Here follows a list of the calculated values for the different methods and the calculated dynamic factors (CALD), together with comments on the results.

Result: MINTC(meth_1) = 39 296, MAXTC(meth_1) = 127 324.
Comment: CALD = 3.24, due to loops with different min- and maxIterations and the polymorphism of selector sel_2.

Result: MINTC(meth_2) = 12 866, MAXTC(meth_2) = 21 318.
Comment: CALD = 1.66, due to selection expressions.

Result: MINTC(meth_3) = MAXTC(meth_3) = 42, MINTC(meth_4) = MAXTC(meth_4) = 192, MINTC(meth_5) = MAXTC(meth_5) = 300.
Comment: CALD = 1 for all these base class methods.

36/ The C-macro code together with the output from the front-end tool is found in the report RRXX.
Result: MINTC(meth_6) = 4 160, MAXTC(meth_6) = 25 682.
Comment: Very high CALD = 6.17, due to the polymorphism of selector sel_2.

Result: MINTC("new") = MINTC(new) = 1 162, MAXTC("new") = MAXTC(new) = 1 552.
Comment: CALD = 1.34. Base class method.

Result: MINTC("sel_1") = 4 160, MAXTC("sel_1") = 120 856.
Comment: Extreme CALD = 29.05, due to the polymorphism of selectors sel_1 and sel_2.

Result: MINTC("sel_2") = 300, MAXTC("sel_2") = 21 318.
Comment: Extreme CALD = 71.06, due to the polymorphism of selector sel_2.

Result: MINTC("sel_3") = MAXTC("sel_3") = 42, MINTC("sel_4") = MAXTC("sel_4") = 192.
Comment: CALD = 1 for both these base class methods.

Some concluding comments: the results certainly show varying CALD values. Although the example is highly artificial, it illustrates some of the problems mentioned in the thesis. The extreme CALDs for sel_1 and sel_2 would probably disappear if the code were typed or if the polymorphism of selectors sel_1 and sel_2 were removed. Had return statements been present, some of the CALD values would be even larger. Of course, a real judgement is only possible from a real application.
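The CALD figures quoted above match the ratio MAXTC/MINTC rounded to two decimals. That interpretation is inferred from the numbers rather than restated here explicitly, but it checks out for every entry:

```python
# CALD as the ratio of calculated maximum to calculated minimum time
# (inferred definition, verified against the figures in the text).
def cald(mintc: int, maxtc: int) -> float:
    return round(maxtc / mintc, 2)

print(cald(39_296, 127_324))   # meth_1  -> 3.24
print(cald(12_866, 21_318))    # meth_2  -> 1.66
print(cald(4_160, 25_682))     # meth_6  -> 6.17
print(cald(1_162, 1_552))      # "new"   -> 1.34
print(cald(4_160, 120_856))    # "sel_1" -> 29.05
print(cald(300, 21_318))       # "sel_2" -> 71.06
```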