Predictable Platforms for Safety-Critical Embedded Systems
Sidharta Andalam
February 2013
Supervisor: Dr. Partha S. Roop
Co-Supervisor: Dr. Alain Girault
Department of Electrical & Computer Engineering
The University of Auckland
New Zealand
A thesis submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Engineering
Abstract
Safety-critical embedded systems, commonly found in the automotive, space, and health-care sectors, are highly reactive and concurrent. Their most important characteristic is that they require both functional and timing correctness. C has been the language of choice for programming such systems. However, C lacks many features that can make the design process of such systems seamless while also maintaining predictability. In contrast, the synchronous programming paradigm offers an alternative approach for programming safety-critical applications. The formal semantics of synchronous programming languages establish a well-defined behaviour of a program. The synchronous paradigm adopts an abstract notion of time by viewing a system as evolving in a sequence of discrete steps. This simplifies program debugging, testing and validation, and leads to clear temporal constructs. These features make synchronous languages more expressive, but also make them less familiar to programmers trained in conventional languages such as C.
In this thesis, we address the need for a C-based design framework for programming safety-critical applications. Inspired by the synchronous programming paradigm, we propose the following. (1) A new language called Precision Timed C (PRET-C), which provides a small set of extensions to a subset of C to facilitate effective concurrent programming of safety-critical applications. We present a new synchronous semantics for PRET-C and guarantee that all PRET-C programs are deterministic and reactive, and that they provide thread-safe communication via shared memory access. (2) A new predictable architecture called ARPRET. It offers the ability to design time-predictable architectures through simple customizations of soft-core processors. We have designed ARPRET particularly for efficient and predictable execution of PRET-C. (3) A new static timing analyser for validating the timing deadlines of a synchronous program. Here, we consider pruning of infeasible paths for tighter analysis, along with a new fast and precise technique for analysing cache-based architectures. (4) A new cache analysis approach for analysing the behaviour of instructions executing on a direct mapped cache. Using a binary representation and a new abstraction, we reduce the analysis time without sacrificing precision. This offers the ability to analyse large PRET-C programs.
The framework proposed in this thesis is implemented and evaluated as follows. Firstly, the PRET-C language is supported using C macros. Experimental results reveal that PRET-C yields significantly more efficient code compared to other C-based synchronous languages. Secondly, the ARPRET architecture is synthesised on an FPGA, and extensive benchmarking shows that it significantly improves the throughput of PRET-C programs while maintaining predictability. Thirdly, the proposed static timing analyser is based on model checking. It is very effective in pruning infeasible paths. Experiments show that the proposed approach gives significantly more precise results than the current state-of-the-art static timing analysers for synchronous programs. Finally, the proposed cache analysis approach is very precise and completes within a reasonable amount of time. This is unlike existing cache analysis approaches, where either precision or scalability (analysis time) is sacrificed. Overall, the results demonstrate the viability of the ideas presented in this thesis for the development and verification of large safety-critical applications.
Acknowledgements
This thesis concludes one of the longest journeys in my life. It has been stimulating when reading and writing articles, adventurous when attending conferences, and most importantly, it has been fulfilling. These experiences have given me a sense of self-satisfaction at a spiritual level. This journey was only possible due to the contributions of many people. First among them is my supervisor Partha S. Roop, whom I thank for encouraging me to start this great journey, guiding me in my research, providing constant challenge and encouragement, and funding my conferences and field trips. His feedback on my ideas and the thesis has been invaluable. I am also grateful to Alain Girault, who has also supervised me during this journey. He has always provided me with a fresh perspective on my work, which enabled me to see a much wider picture. I would also like to thank his POPART team from INRIA and the AFMES project for their generous support in providing expertise and finance. This has added more colours to my journey. I would like to thank Roopak Sinha for sharing his expertise in verification and for his close collaboration. I would also like to thank Simon Yuan and Li Hsien Yoong for their valuable discussions and guidance in selecting research directions and using LaTeX. Li Hsien Yoong has also helped proofread this thesis. I would like to thank Matthew Kuo and Eugene Yip for their help in building tools and writing documents. I must also thank the summer interns David Liu, Kevin Chung, Brenny Wang and Subarno Banerjee, who have worked with me over the summer breaks in developing the timing analysis and the cache analysis tools. I would also like to thank Reinhard von Hanxleden from Germany for giving me the opportunity to work with his group. Last but not least, this journey was only possible with the help and support of my family members: my respectful father Satyapal Andalam, my caring mother Lata Andalam, and my dearest brother Kartik Andalam, who has provided all the IT support during this journey.
Contents
1 Introduction
  1.1 Design and implementation of real-time systems
    1.1.1 Real-Time Operating Systems (RTOS)
    1.1.2 Light-weight multithreading
    1.1.3 Synchronous languages
  1.2 Validation of real-time systems
  1.3 Motivation and overview of our approach
    1.3.1 Research contributions
    1.3.2 Thesis organisation
2 Literature Review
  2.1 Programming safety-critical applications
    2.1.1 Synchronous Languages
    2.1.2 Lightweight multithreading
  2.2 Predictable architectures
    2.2.1 Reactive processors
    2.2.2 Thread-interleaved pipelined processors
    2.2.3 Java optimised processors
  2.3 Timing analysis of synchronous programs
    2.3.1 Timing analysis for reactive processors
    2.3.2 Timing analysis for general-purpose processors
    2.3.3 High-level Timing analysis
    2.3.4 Analysing cache behaviour
  2.4 Discussion
3 The PRET-C Language
  3.1 PRET-C overview
    3.1.1 PRET-C language extensions
    3.1.2 An Unmanned Aerial Vehicle example
    3.1.3 Verifying Timing Requirements
  3.2 Semantics of PRET-C
    3.2.1 The Kernel language
    3.2.2 Structural Translations
    3.2.3 Rewrite rules
    3.2.4 Structural translation rules for preemption
    3.2.5 Structural translation rules for loops
    3.2.6 Illustration
    3.2.7 Reactivity and Determinism
    3.2.8 Proof for Reactivity
    3.2.9 Proof for determinism
  3.3 Comparison with Esterel, Reactive C and Synchronous C
  3.4 Discussion
4 Architectures for Executing PRET-C
  4.1 The basic platform
  4.2 ARPRET platform
    4.2.1 Communication
    4.2.2 Comparison between basic and ARPRET platforms
  4.3 Comparison with the Berkeley-Columbia PRET approach
  4.4 Intermediate Format for Timing Analysis
    4.4.1 Compiling PRET-C into TCCFG
  4.5 Discussion
5 Static Timing Analysis of PRET-C
  5.1 Background
    5.1.1 Motivating Example
  5.2 WCRT Analysis using Model Checking
    5.2.1 Structural translation of aborts
    5.2.2 Mapping TCCFG to TFSMs
    5.2.3 Mapping TFSMs to UPPAAL
    5.2.4 Computing $WCRT_{cmp}^{UoA}$ as a CTL Property
    5.2.5 Complexity
  5.3 Pruning Infeasible Paths
    5.3.1 Classification of infeasible paths
    5.3.2 A PRET-C example
    5.3.3 Our Model Checking Formulation
  5.4 Complexity
  5.5 Discussion
6 Modelling the Instruction Cache Behavior
  6.1 Background
  6.2 Cache Analysis
    6.2.1 Cache model
    6.2.2 Cache states
    6.2.3 Analysing the cache states
    6.2.4 Cache analysis problem
  6.3 The concrete approach
    6.3.1 The concrete join function
    6.3.2 The concrete transfer function
    6.3.3 Fixed point computation
  6.4 The abstract approach
    6.4.1 Abstract cache states
    6.4.2 The abstract join function
    6.4.3 The abstract transfer function
    6.4.4 Fixed-point computation
    6.4.5 Calculating cache misses
  6.5 Comparison between the concrete and abstract approaches
  6.6 Proposed approach
    6.6.1 Overview
    6.6.2 Reducing the CFG and abstracting the instructions
    6.6.3 Relative cache states
    6.6.4 Proposed transfer function
    6.6.5 Computing all possible reaching relative cache states of the reference block
    6.6.6 Calculating cache misses for the reference block
    6.6.7 Reducing the analysis time
  6.7 Comparisons between the concrete, abstract and proposed approaches
    6.7.1 Mapping relative cache states of $b_{ref}$ to concrete cache states
    6.7.2 Comparison between the approaches
  6.8 Cache analysis of PRET-C programs
  6.9 Discussion
7 Benchmarking and Experimental Results
  7.1 Evaluation of the ARPRET platform and the PRET-C language
    7.1.1 Benchmarking
  7.2 Evaluating the Model Checking framework for WCRT analysis
    7.2.1 Benchmarking and Results
  7.3 Evaluating the improvement in precision of the WCRT using EDFA technique
  7.4 Evaluating the Precision of the proposed Cache Analysis Technique
  7.5 Discussion
8 Conclusions and Future directions
  8.1 Future directions
    8.1.1 Platforms for execution of PRET-C
    8.1.2 Cache analysis
    8.1.3 Multi-core execution of PRET-C
    8.1.4 IDE support for programming in PRET-C
  8.2 Final remarks
A Research Articles Based on this Thesis
  A.1 Published papers
  A.2 Papers under preparation
References
List of Figures
1.1 A model of real-time systems
1.2 UAV example overview
1.3 Implementation using RTOS and light-weight multithreading
1.4 Synchronous execution
1.5 Implementation using Esterel
1.6 Estimating execution time
1.7 Overview of the proposed approach
1.8 Map of the thesis
3.1 Execution of PRET-C program w.r.t environment.
3.2 Execution of PRET-C program.
3.3 Simple example in PRET-C
3.4 UAV example overview
3.5 Unmanned Aerial Vehicle (UAV) example in PRET-C
3.6 Structural translation of strong aborts into a PAR statement
3.7 Structural translation of weak aborts into a PAR statement
3.8 Structural translation of nested aborts into a PAR statements
3.9 Structural translation of annotated loops.
3.10 PRET-C Example
3.11 Parallelism in Esterel and PRET-C
4.1 Architecture of ARPRET platform
4.2 TCCFG of the UAV Controller.
4.3 Overview of compiling PRET-C into TCCFG.
4.4 PRET-C to ASM to TCCFG.
5.1 Notations used in timing analysis of synchronous programs
5.2 Motivating Example
5.3 Brief overview of our approach.
5.4 Structural translation of strong aborts
5.5 Structural translation of weak aborts
5.6 Structural translation of nested aborts
5.7 Converting TCCFG to TFSM for each thread of the UAV example.
5.8 Simple UPPAAL model
5.9 Modelling a TFSM as UPPAAL model
5.10 Asynchronous model
5.11 Synchronous model
5.12 Mapping Timed FSM (TFSM) to UPPAAL model for the UAV example.
5.13 Infeasible paths due to the abstraction of data
5.14 PRET-C example to illustrate various infeasible paths
5.15 TCCFG of the PRET-C example
5.16 TFSM is an abstracted model that captures only tick boundaries.
5.17 UPPAAL model of thread T1's first tick of the example in Figure 5.15.
6.1 Memory hierarchy
6.2 (a) A simple control flow graph consisting of nine basic blocks (B1 to B9) and the instructions that are accessed during execution of the basic block. (b) Mapping of instructions on to four cache lines ($c_0$ to $c_3$).
6.3 Illustration of the cache states.
6.4 Illustration of the function mc
6.5 Illustration of the concrete transfer function
6.6 Computing all possible reaching cache states using the concrete approach.
6.7 Illustration of the abstract cache states.
6.8 Illustration of the abstract transfer function
6.9 Computing all possible abstract reaching cache states using the abstract approach.
6.10 Illustration of the functions $wmc_{abs}$ and $bmc_{abs}$
6.11 Comparing the concrete and abstract approaches on precision vs analysis time.
6.12 Illustration of Algorithm 4 with $b_{ref}$ = B8.
6.13 Illustration of the function $BI^{r}(b)$
6.14 Illustration of the relative cache states.
6.15 Illustration of the proposed transfer function
6.16 Computing all possible reaching relative cache states of the reference block $b_{ref}$ (B8) using the proposed approach.
6.17 UoA approaches for analysing a reference block $b_{ref}$.
6.18 Overview of concrete, abstract and proposed approaches for computing the worst/best cache misses for a block b.
6.19 Cache analysis of PRET-C programs
7.1 Comparing the WCRT on Basic and ARPRET platforms.
7.2 Comparing the ACRT on Basic and ARPRET platforms.
7.3 Number of threads versus hardware consumption in terms of LUTs.
7.4 Comparing the WCRT of PRET-C with Protothreads, Esterel, and SC.
7.5 Comparing the ACRT of PRET-C with Protothreads, Esterel, and SC.
7.6 Comparing the memory usage of PRET-C with Protothreads, Esterel, and SC.
7.7 The effect of load variance on the WCRT value of the Smokers example
7.8 WCRT over estimation for SDFA and EDFA based pruning.
7.9 Context sensitive information vs tightness of the estimated WCRT.
7.10 Context sensitive information vs the analysis time.
7.11 Context sensitive information vs the number of states explored.
7.12 Comparing the WCET estimates (the smaller the better) of the abstract and proposed approaches
7.13 Comparing WCET and analysis time for the last five examples (1% relative cache size)
List of Tables
1.1 Comparing the three approaches for implementing real-time systems.
3.1 PRET-C extensions to C
3.2 Kernel statements of PRET-C.
3.3 Structural translations of PRET-C extensions into the kernel statements.
3.4 Changes to the state of the environment E.
3.5 Qualitative Comparison with Esterel and RC.
4.1 Simple lookup table used by the PFU to decode data from the FSL bridge.
4.2 Comparing the execution cost of an EOT
4.3 Qualitative comparison between the Auckland and Berkeley-Columbia PRET approaches.
4.4 MB instructions and their worst case execution cost.
5.1 Comparison between SDFA and EDFA based path pruning.
5.2 Pruning infeasible tick transitions
6.1 Some of the symbols and the definitions (Illustrated using Figures 6.3)
6.2 Illustrate the subsumed function $S_{con}$
6.3 Illustrate the join function $J_{con}$
6.4 Comparing the precision between the concrete and the abstract approaches.
6.5 Some of the symbols and the definitions presented in this section.
6.6 Illustrate the subsumed function $S_{pro}$
6.7 Illustrate the join function $J_{pro}$
6.8 Reducing cache states based on the worst/best case analysis.
6.9 Illustration of functions $J_{pro+w}$ and $J_{pro+b}$
6.10 Some of the symbols and the definitions presented in this section.
6.11 Illustration of the function MAP instpro w.r.t cache line $c_0$, BI($b_{ref}$)[0] = m1 and CI($c_0$) = {m1, m5}
6.12 Comparing the cache states and the precision between the concrete, the abstract and the proposed approaches as we analyse block B8 (BI(B8) = [∓, m6, m3, m4]).
6.13 Qualitative comparing the precision between the concrete, the abstract and the proposed approaches.
7.1 Benchmark programs and their characteristics.
7.2 The complexity growth of WCRT analysis
7.3 Comparing the $WCRT_{cmp}^{max+}$ and the $WCRT_{cmp}^{UoA}$
7.4 Comparing $WCRT_{cmp}^{max+}$ and the $WCRT_{cmp}^{UoA}$ as the load varies for the Smokers example
7.5 Comparing the $WCRT_{cmp}$ value with the $WCRT_{obs}$ value.
7.6 Comparing the WCRT with SDFA and EDFA based path pruning.
7.7 Context sensitive information vs tightness of the estimated WCRT.
7.8 Context sensitive information vs analysis time.
7.9 Context sensitive information vs number of states explored.
7.10 Benchmark programs and their characteristics.
7.11 Quantitative comparison between the abstract, proposed and concrete approaches
1 Introduction
Originally, computers were developed for solving mathematical equations. They were the size of a large room and consumed as much power as several hundred modern personal computers. As computers became smaller, faster, cheaper and more energy efficient, they started to show up in different spheres of our lives: from cell phones and multimedia players to the automotive, aviation, defence, and space sectors. Their primary function is not only to interact with the environment, but also to perform information processing. Such applications often require both functional and timing correctness. These types of applications are referred to as real-time applications.
Before we proceed further, let us define the phrase real-time system. There are many definitions and interpretations, but they all relate to a notion of time. Here is one such definition [1]: “A real-time system is any information processing system which has to respond to externally generated input stimuli within a finite and specified period”.
This definition captures a wide range of activities. For example, an iPod may be considered real-time in the sense that when a user tries to unlock the device by swiping a finger across the screen, the user expects the screen to be unlocked almost instantaneously. Fortunately, it is not disastrous if the device does not respond or unlock immediately. However, it is still expected to unlock within a reasonable amount of time. Such systems are known as soft real-time systems. In contrast, applications such as the control of the shutter speed of a camera require a very precise response time for the correct operation of the product. Such systems are known as hard real-time systems. Further, applications such as missile control, nuclear power plants, flight control, and remote surgery are also considered hard real-time applications. Here, a failure could result in catastrophic loss of life. Such applications have safety as an additional requirement and are known as safety-critical real-time systems.
In both hard and soft real-time systems, a digital controller (or a micro-controller) usually interacts with the environment and is dedicated to the task of monitoring sensors and controlling actuators to achieve some predefined reaction within a finite and specified period. This idea is presented in Figure 1.1.
Figure 1.1: A model of real-time systems
In Figure 1.2, we present one such example of a hard real-time application that is also safety-critical: an abstracted Unmanned Aerial Vehicle (UAV) controller. This example is an abstraction of a benchmark in PapaBench [2] and will be the running example for the thesis. The example has four tasks, namely the navigation controller, the altitude controller, the engine controller, and the failSafe. The navigation controller calculates the desired angle of the flaps and the desired engine speeds (speedL and speedR). This is done based on the current altitude, speed, and GPS position of the vehicle. The altitude controller controls the duty cycle of the flaps. This is based on the value of the variable angle, which captures the desired angle of the flaps. The engine controller controls the pulse width modulation (PWM) duty cycle of the left and right engines based on the values of the speedL and speedR outputs. Finally, the failSafe makes sure that the value of angle always stays within a safe range.
Figure 1.2: UAV example overview
The UAV operates in two modes. In the flight mode, the UAV follows a preprogrammed route. In contrast, in the landing mode, the UAV shuts down its engines and glides down to land. There is a strict timing constraint that the time from capturing the inputs to determining the outputs should not exceed 0.1 seconds. We refer to this example as the UAV example. In the next section, we introduce some of the key features that must be considered during the design and implementation of the UAV example.
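To make the data flow of the UAV example concrete, the following is a minimal C sketch of one reaction, assuming a simple data model. The names angle, speedL, speedR, engineL, engineR and flaps, the two modes, and the 0.1 second constraint come from the description above. Everything else is a hypothetical placeholder and not taken from the thesis or from PapaBench: the numeric constants, the control laws, the GPS representation, and the order in which the four tasks are called.

```c
#include <assert.h>
#include <stdbool.h>

/* Operating modes named in the text. */
typedef enum { MODE_FLIGHT, MODE_LANDING } uav_mode_t;

/* Sensor inputs captured at the start of a reaction. */
typedef struct {
    double gps_x, gps_y;   /* GPS position (representation is assumed) */
    double altitude;
    double air_speed;
    uav_mode_t mode;
} uav_inputs_t;

/* Variables shared between the tasks. */
typedef struct {
    double angle;          /* desired flap angle */
    double speedL, speedR; /* desired engine speeds */
} uav_shared_t;

/* Actuator outputs, expressed as duty cycles in [0, 1]. */
typedef struct {
    double engineL, engineR;
    double flaps;
} uav_outputs_t;

/* Hypothetical constants: none of these values come from the thesis. */
#define TARGET_ALTITUDE 100.0
#define CRUISE_SPEED     20.0
#define MAX_SPEED        40.0
#define ANGLE_GAIN        0.5
#define ANGLE_LIMIT      30.0   /* assumed safe range is [-30, +30] */
#define DEADLINE_MS     100.0   /* the 0.1 s constraint from the text */

static double clamp(double v, double lo, double hi) {
    assert(lo <= hi);
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Navigation controller: writes angle, speedL and speedR. */
static void navigation(const uav_inputs_t *in, uav_shared_t *sh) {
    sh->angle = ANGLE_GAIN * (TARGET_ALTITUDE - in->altitude); /* placeholder law */
    if (in->mode == MODE_LANDING) {
        sh->speedL = sh->speedR = 0.0;  /* engines shut down, the UAV glides */
    } else {
        sh->speedL = sh->speedR = CRUISE_SPEED;
    }
}

/* failSafe: keeps angle within the safe range. */
static void fail_safe(uav_shared_t *sh) {
    sh->angle = clamp(sh->angle, -ANGLE_LIMIT, ANGLE_LIMIT);
}

/* Altitude controller: maps angle to the flap duty cycle. */
static void altitude_controller(const uav_shared_t *sh, uav_outputs_t *out) {
    out->flaps = (sh->angle + ANGLE_LIMIT) / (2.0 * ANGLE_LIMIT);
}

/* Engine controller: maps the desired speeds to PWM duty cycles. */
static void engine_controller(const uav_shared_t *sh, uav_outputs_t *out) {
    out->engineL = clamp(sh->speedL / MAX_SPEED, 0.0, 1.0);
    out->engineR = clamp(sh->speedR / MAX_SPEED, 0.0, 1.0);
}

/* One reaction: the four tasks run in a fixed (assumed) order. */
static void uav_react(const uav_inputs_t *in, uav_shared_t *sh,
                      uav_outputs_t *out) {
    navigation(in, sh);
    fail_safe(sh);
    altitude_controller(sh, out);
    engine_controller(sh, out);
}

/* Timing requirement: input-to-output latency must not exceed 0.1 s. */
static bool meets_deadline(double response_time_ms) {
    return response_time_ms <= DEADLINE_MS;
}
```

Calling uav_react once per set of captured inputs yields the actuator values for that reaction. The sketch runs the tasks sequentially, so it sidesteps the concurrency and synchronisation questions that the rest of this chapter addresses.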
1.1 Design and implementation of real-time systems
During the design of real-time systems, one must think about concurrency, determinism, predictability, reliability, and portability. We describe them as follows.
Concurrency: Real-time systems are designed to handle several concurrent activities. Thus, the design language must provide a means for expressing parallelism and for solving the resulting synchronisation and communication problems. However, as explained in [3], the challenge is to maintain understandability, such that the programmer is not burdened with ensuring correctness through complex synchronisation mechanisms. In the UAV example, there are four concurrent tasks and they need to synchronise their access to the shared variables speedL, speedR and angle.
Determinism: Real-time systems can benefit from consistent behaviour: when given the same sequence of inputs, the same set of outputs is generated. This makes a deterministic system simpler to specify and debug than a non-deterministic system [4]. In the UAV example, given a sequence of input sensor values (GPS, altitude, airSpeed and mode), the sequence of output values to the actuators (engineL, engineR and flaps) should always be the same. This allows us to guarantee the behaviour of a real-time system, which is important for the validation of safety-critical applications. In this thesis, when we refer to determinism, unless specified otherwise, we refer to input/output determinism.
Predictability: In this thesis, we describe predictability as the ability to guarantee a time bound on the reaction of a real-time system [5]. Time-critical tasks are analysed by static analysers to verify that these applications meet their timing deadlines at all times. In order to analyse these timing deadlines, designers must choose software languages and the underlying hardware such that they are amenable to static timing analysis [6], [7]. As per our definition, determinism is related to functionality, while predictability is related to timing. For the UAV example, there is a strict timing constraint that the time from capturing the inputs to determining the outputs should not exceed 0.1 seconds. During implementation, we must make sure that the underlying hardware and the software layers are amenable to static timing analysis.
Portability: A real-time application is portable if it can run on different hardware and software architectures. During implementation, design choices must be evaluated for their ability to allow such portability. For example, a mobile applications developer is aware of the portability issues between Apple OS and Android OS.
Historically, computers were developed for solving mathematical equations. Back then, the focus was purely on transforming data. Therefore, numerous transformational languages were developed and quickly became mainstream. However, as described above, real-time systems require more than just functional correctness. Moreover, the expressivity of transformational languages was inadequate to capture concurrency and to describe the interaction with the environment. New approaches were required for these purposes. The following are some of the existing approaches for implementing complex real-time systems.
1. Real-time operating system (RTOS) applications consist of tasks, where each task has a priority and a deadline. A scheduler manages the tasks such that each task adheres to its deadline [8]. Here, the focus is not on throughput, but on responsiveness to changes in the environment.
2. Light-weight multithreading approaches were introduced to minimise the memory footprint [9,10]. These provide alternatives to using an RTOS for programming resource-constrained applications, such as sensor networks [9].
3. Synchronous programs execute periodically in a sequence of ticks. The inputs from the environment are sampled at the beginning of each tick, and a program reacts to them by computing new outputs infinitely fast relative to the environment, such that the reaction of the system to the inputs is instantaneous (zero time). Synchronous languages are based on a solid mathematical foundation, which enables one to reason formally about the operations of a program [11].
Using the UAV example, we illustrate RTOSes, light-weight multithreading languages, and synchronous languages in more detail. For illustration purposes, we focus only on the navigation controller task and the engine controller task to explain the constructs for describing concurrency and synchronisation of threads in these approaches.
1.1.1
Real-Time Operating Systems (RTOS)
Using Figure 1.3(a), we illustrate the key features of concurrency and synchronisation in MicroC/OS-II RTOS [8]. An RTOS application consists of tasks, where each task has a unique priority (the lower the value, the higher the priority). Tasks are managed using a preemptive scheduler that always runs the highest priority task that is ready. For the UAV example, the priorities of the navigation controller task and the engine controller task are defined on lines 6 and 7. Shared variables speedR and speedL are used for communication between the two tasks. These are declared on line 9. Safe access to these shared variables is provided using semaphores [8], and a mutex is declared on line 11. The behaviour of the navigation controller task (NC) is described on lines 14 to 33. Similarly, the behaviour of the engine controller task (EC) is described on lines 35 to 44. Both tasks communicate using the shared variables (speedR and speedL). The navigation controller computes the new values for the shared variables on lines 20 and 21, while the engine controller reads the values on lines 39 and 40. During execution, when the lower priority task (engine controller) is reading the shared variables, it may be interrupted by the higher priority task (navigation controller), which writes to the shared variables. This read-write access pattern may corrupt data, and must be avoided. Hence, before accessing the shared variables, both tasks first wait until they are granted access to the shared variables. This is achieved by calling the OSMutexPend function on lines 19 and 38 respectively. Once a task finishes accessing the shared variables, it informs the RTOS by calling the OSMutexPost function on lines 22 and 41 respectively. As specified before, the UAV example has a strict deadline of 0.1 seconds (100 ms) to react to the inputs. For simplicity, let us assume that both tasks (navigation controller and engine controller) need to complete their computation within 100 ms.
(a) MicroC/OS-II (RTOS) [8]

 1  ...
 2  #include "semaphore.h"
 3  ....
 4  /* Task Priorities */
 5  #define MU_PRIO 1 //mutex
 6  #define NC_PRIO 2
 7  #define EC_PRIO 3
 8  /* Shared variables */
 9  float speedL, speedR, angle;
10  /* Declaration of the mutex */
11  OS_EVENT *mutex;
12
13  /* navigation controller */
14  void NC(void* pdata){
15    INT8U err = OS_NO_ERR;
16    while(1){
17      if (mode==flight){
18        angle=calAngle(altitude);
19        OSMutexPend(mutex, 0, &err);
20        speedL=calSpeedL(airSpeed,GPS);
21        speedR=calSpeedR(airSpeed,GPS);
22        OSMutexPost(mutex);
23      }
24      else{
25        angle=calGlide(altitude);
26        OSMutexPend(mutex, 0, &err);
27        speedL=0;
28        speedR=0;
29        OSMutexPost(mutex);
30      }
31      OSTimeDlyHMSM(0, 0, 0, 50);
32    }
33  }
34  /* engine controller */
35  void EC(void* pdata){
36    INT8U err = OS_NO_ERR;
37    while(1){
38      OSMutexPend(mutex, 0, &err);
39      engineL=calDuty(speedL);
40      engineR=calDuty(speedR);
41      OSMutexPost(mutex);
42      OSTimeDlyHMSM(0, 0, 0, 50);
43    }
44  }
45  //main
46  mutex=OSMutexCreate(MU_PRIO,...);
47  OSTaskCreate(NC,...,NC_PRIO);
48  OSTaskCreate(EC,...,EC_PRIO);
49  OSStart();

(b) Protothreads (Light-weight) [9]

 1  ...
 2  #include "pt-sem.h"
 3  ...
 4
 5  static struct pt pt_NC;
 6  static struct pt pt_EC;
 7
 8  /* Shared variables */
 9  float speedL, speedR, angle;
10
11
12
13  /* navigation controller */
14  static PT_THREAD(NC(struct pt *pt)){
15    PT_BEGIN(pt);
16    while(1){
17      if (mode==flight){
18        angle=calAngle(altitude);
19        speedL=calSpeedL(airSpeed,GPS);
20        speedR=calSpeedR(airSpeed,GPS);
21      }
22      else{
23        speedL=0;
24        speedR=0;
25        angle=calGlide(altitude);
26      }
27      PT_YIELD(pt);
28    }
29  }
30
31  /* engine controller */
32  static PT_THREAD(EC(struct pt *pt)){
33    PT_BEGIN(pt);
34    while(1){
35      engineL=calDuty(speedL);
36      engineR=calDuty(speedR);
37      PT_YIELD(pt);
38    }
39  }
40
41  //main
42  PT_INIT(&pt_NC);
43  PT_INIT(&pt_EC);
44  PT_SCHEDULE(NC(&pt_NC));
45  PT_SCHEDULE(EC(&pt_EC));

Figure 1.3: Implementation using RTOS and light-weight multithreading

Assuming an even workload between tasks, it is desirable that both tasks are executed every 50 ms. The RTOS allows us to define this deadline using the OSTimeDlyHMSM function on lines 31 and 42 respectively. However, the RTOS does not provide any guarantees, and the timing deadlines must be validated at compile time using static timing analysis techniques [12, 13]. The worst case execution time (WCET) analysis computes the time to execute a task in the worst case. This WCET is used during the schedulability analysis to decide the feasibility of scheduling tasks, such that all tasks meet their timing deadlines [12]. However, this is not easily accomplished. As described in [12], the WCET analysis of an RTOS is hard for the following reasons.
1. During execution, a task may be preempted and must wait for completion of the preempting task before continuing its execution. Thus, the execution of a task is dependent on the interfering tasks. This interference must be considered during the WCET analysis. However, determining the worst case cost of such interference is often not feasible.
2. In general, WCET analysis tools require users to guide the tool by manually specifying loop bounds. This may be possible for application programs, where the user may have some knowledge about the program. However, during the analysis of RTOS system calls, some of the loop bounds are related to runtime behaviour. For example, to find the next highest priority task, the scheduler often iterates over a queue of tasks. The size of this queue depends on the number of tasks that are running in the system. Since this may change throughout the course of the program, an overly pessimistic estimate from the user may lead to over-design.
3. Interrupt handlers and their latencies are easy to analyse. However, they may affect the execution time of a task and of the scheduler. Thus, the interference from interrupts must also be taken into account, which adds to the complexity of the analysis.
4. RTOS system calls execute dynamic function calls that are implemented through function pointers. These dynamic function calls are hard to analyse statically, because the actual called function is known only at run-time.
In general, WCET analysis of an RTOS is more complex than conventional WCET analysis of application programs. This is mainly due to the dynamic properties of system calls, which are based on runtime properties. Experimental results show that the overestimation is much higher when considering the RTOS than in conventional WCET analysis of individual tasks [12]. In summary, even though validating timing properties of an RTOS is harder, an RTOS makes it easier to describe tasks, provides mechanisms for safe communication
between tasks and supports time management. When a lower priority task is preempted to execute a higher priority task, the scheduler is responsible for storing the context of the lower priority task and for restoring the context of the higher priority task. This context switching and monitoring of tasks introduces overheads on the execution time and the memory usage, which may be prohibitive for resource-constrained systems. For such systems, a light-weight approach may be more suitable.
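The schedulability step described in this section can be sketched with the classical response-time iteration for fixed-priority preemptive tasks. This is only a minimal illustration and not the analysis used in [12]: it assumes independent periodic tasks, deadlines equal to periods, and zero RTOS overhead, which are exactly the simplifications that the four reasons above call into question. The names task_t, response_time and is_schedulable, and any WCET values fed to them, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical task model: WCET and period in milliseconds.
 * Tasks are stored in priority order, index 0 = highest priority. */
typedef struct {
    unsigned wcet_ms;
    unsigned period_ms;   /* deadline is assumed equal to the period */
} task_t;

/* Worst case response time of task i under preemptive fixed-priority
 * scheduling, found by iterating
 *   R = C_i + sum over higher-priority j of ceil(R / T_j) * C_j
 * until R stops changing. Returns 0 if R exceeds the deadline. */
unsigned response_time(const task_t *tasks, size_t i)
{
    assert(tasks != NULL);
    assert(tasks[i].wcet_ms > 0);
    unsigned r = tasks[i].wcet_ms;
    for (;;) {
        unsigned next = tasks[i].wcet_ms;
        for (size_t j = 0; j < i; j++) {
            assert(tasks[j].period_ms > 0);
            /* number of releases of task j within a window of length r */
            unsigned releases =
                (r + tasks[j].period_ms - 1) / tasks[j].period_ms;
            next += releases * tasks[j].wcet_ms;
        }
        if (next > tasks[i].period_ms) return 0;  /* deadline missed */
        if (next == r) return r;                  /* fixed point reached */
        r = next;
    }
}

/* A task set is accepted only if every task meets its deadline. */
int is_schedulable(const task_t *tasks, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (response_time(tasks, i) == 0) return 0;
    return 1;
}
```

For instance, with assumed WCETs of 10 ms for the navigation controller and 15 ms for the engine controller, both with a 50 ms period, the iteration yields response times of 10 ms and 25 ms, so this hypothetical task set would be accepted.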
1.1.2
Light-weight multithreading
Light-weight multithreading languages, like Protothreads [9] and Cilk [10], provide an alternative to using an RTOS. These languages are designed to maximise throughput with minimum memory footprint. They are used in programming small embedded applications, such as sensor networks [9]. Unlike the tasks of an RTOS, threads in Protothreads do not have any priorities and the scheduler is non-preemptive, i.e., a thread is allowed to execute until it terminates or blocks. When this happens, a light-weight scheduler is invoked, which then selects the next thread to be executed.
Using Figure 1.3(b), we illustrate the key features of concurrency and synchronisation in Protothreads [9]. The specification is very similar to the RTOS approach, shown in Figure 1.3(a). For the UAV example, the navigation controller and the engine controller use the variables speedR and speedL for communication between the threads. These are declared on line 9. The behaviour of the navigation controller task is described on lines 14 to 29. Similarly, the behaviour of the engine controller task is described on lines 32 to 39. The navigation controller computes the new values for the shared variables on lines 19 and 20, while the engine controller reads the values on lines 35 and 36. Unlike the RTOS, there is no need for a mutex to guard access to the shared variables, because of the implicit locking semantics of Protothreads, i.e., the executing thread will never be preempted by the scheduler. Also, the context switching points are explicitly specified using the PT_YIELD statement, as shown on lines 27 and 37.
As specified before, the UAV example has a strict deadline of 0.1 seconds (100 ms) to react to the inputs. Unlike the RTOS approach, due to the lack of any operating system, light-weight multithreading languages do not provide direct support for specifying an execution frequency of a task.
Instead, Protothreads libraries are implemented in C [9], allowing the validation of timing constraints (like the worst case execution time of a task) using existing tools for analysing C programs [7]. Compared to the WCET analysis of tasks executing on an RTOS, the WCET analysis is much simpler in the light-weight multithreading approach due to the use of a non-preemptive scheduler. Context switching
points are explicitly specified by the programmer (e.g., using the PT_YIELD statement), which simplifies the timing analysis and reduces the overestimation.
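The cooperative behaviour described above can be sketched in plain C. The macros below (CO_INIT, CO_BEGIN, CO_YIELD, CO_END) are hypothetical stand-ins written for this illustration and are not the Protothreads API; they use a switch on a saved line number, similar in spirit to a switch-based local continuation. The calDuty computation is replaced by a placeholder. The point of the sketch is that a thread gives up the processor only where it yields, so the shared variables need no mutex.

```c
#include <assert.h>

/* Hypothetical stand-in for a protothread control block: the only saved
 * context is the source line at which the thread last yielded. */
typedef struct { unsigned short line; } co_t;

#define CO_INIT(co)   ((co)->line = 0)
#define CO_BEGIN(co)  switch ((co)->line) { case 0:
#define CO_YIELD(co)  do { (co)->line = __LINE__; return 1; \
                           case __LINE__:; } while (0)
#define CO_END(co)    } (co)->line = 0; return 0

/* Shared variables: no lock is needed because a thread is never
 * interrupted between two yield points. */
int speedL, speedR;
int engineL, engineR;

/* Writer thread: updates both shared variables, then yields. */
int nc_thread(co_t *co, int target)
{
    CO_BEGIN(co);
    while (1) {
        speedL = target;
        speedR = target;
        CO_YIELD(co);
    }
    CO_END(co);
}

/* Reader thread: always observes a consistent speedL/speedR pair. */
int ec_thread(co_t *co)
{
    CO_BEGIN(co);
    while (1) {
        engineL = 2 * speedL;   /* placeholder for calDuty(speedL) */
        engineR = 2 * speedR;   /* placeholder for calDuty(speedR) */
        CO_YIELD(co);
    }
    CO_END(co);
}

/* One round of a non-preemptive schedule: each thread runs to its
 * next yield point before the other one starts. */
void run_round(co_t *nc, co_t *ec, int target)
{
    nc_thread(nc, target);
    ec_thread(ec);
}
```

A caller would initialise both control blocks with CO_INIT and then call run_round repeatedly. Because the yield points are the only places where control changes hands, a timing analyser has to consider far fewer interleavings than with a preemptive scheduler.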
1.1.3
Synchronous languages
Synchronous programming languages follow a model of computation [11] that is akin to synchronous circuits. Synchronous programs execute periodically in a sequence of ticks (or reactions), as illustrated in Figure 1.4. The inputs from the environment are sampled at the beginning of each tick, and the program reacts to these inputs and computes new outputs in zero time. This idealisation is based on the synchrony hypothesis, which states that the computation happens infinitely fast relative to the environment, such that the reaction of the system to the inputs from the environment is instantaneous (zero time). As a consequence, the interaction with the environment happens only at the tick boundaries. The need for the computation to happen infinitely fast may seem impractical. In practice, however, it suffices that the hardware is capable of reacting to an input event before the next input event arrives, i.e., the worst case reaction time (WCRT) of a synchronous program must be less than the minimum inter-arrival time of the input events.
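The tick-based execution and the practical reading of the synchrony hypothesis can be sketched as follows. This is a minimal illustration under assumed names: the types inputs_t and outputs_t, the variables env_in and env_out, the placeholder reaction logic, and the function synchrony_holds are all hypothetical and do not come from any synchronous language runtime.

```c
#include <assert.h>

/* Hypothetical input and output records for one reaction. */
typedef struct { int altitude; } inputs_t;
typedef struct { int flaps; } outputs_t;

inputs_t  env_in;    /* may be changed by the environment at any time */
outputs_t env_out;   /* updated only at the end of a tick */

/* One tick: sample the inputs, react on the snapshot, emit the outputs.
 * Input changes that occur after sampling are seen in the next tick. */
void tick(void)
{
    inputs_t in = env_in;                      /* 1. sample at tick start */
    outputs_t out;
    out.flaps = (in.altitude < 100) ? 1 : 0;   /* 2. placeholder reaction */
    env_out = out;                             /* 3. emit at tick end */
}

/* Practical form of the synchrony hypothesis: the worst case reaction
 * time must be below the minimum inter-arrival time of input events. */
int synchrony_holds(unsigned wcrt_us, unsigned min_interarrival_us)
{
    assert(min_interarrival_us > 0);
    return wcrt_us < min_interarrival_us;
}
```

The check in synchrony_holds is only as good as the WCRT bound passed to it, which is why the static timing analysis discussed later in this chapter matters for synchronous programs as well.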
Figure 1.4: Synchronous execution
Using the synchronous language Esterel [14], we present in Figure 1.5 an implementation of the UAV example (Figure 1.2). Threads in Esterel communicate using signals. Signals may consist of a status (present or absent) and/or a value component. Signals with only a status component are known as pure signals, while those with an additional value component are known as valued signals. For example, on line 2, the signals angle, speedR and speedL have a status and an associated value from the domain of floating point numbers. The scope of these signals extends from line 2 to line 33. The behaviour of the navigation controller task is described on lines 4 to 24. Similarly, the behaviour of the engine controller task is described on lines 27 to 31. Both threads
 1  ...
 2  signal angle, speedL, speedR: float in
 3  //nav control
 4  loop
 5
 6    abort
 7      loop
 8        emit angle(calAngle(altitude));
 9        emit speedL(calSpeedL(airSpeed,GPS));
10        emit speedR(calSpeedR(airSpeed,GPS));
11        pause;
12      end loop
13    when immediate(?mode=landing);
14
15    abort
16      loop
17        emit speedL(0);
18        emit speedR(0);
19        emit angle(calGlide(altitude));
20        pause;
21      end loop
22    when immediate(?mode=flight);
23
24  end loop
25  ||
26  //eng control
27  loop
28    emit engineL(calDuty(?speedL));
29    emit engineR(calDuty(?speedR));
30    pause;
31  end loop
32
33  end signal
34  ...
Figure 1.5: Implementation using Esterel
communicate using the signals speedR and speedL. The ‘||’ operator on line 25 denotes synchronous concurrency and indicates that both threads, each consisting of a non-terminating loop, are to be executed in a logically concurrent fashion. The navigation controller thread computes the new values for the shared signals and emits the valued signals on lines 9 and 10, while the engine controller reads the values on lines 28 and 29. Here, with respect to the shared signals, the navigation controller thread is considered a writer and the engine controller thread is considered a reader. During the execution of Esterel programs, all possible writers are scheduled before the readers, allowing all concurrent statements within the same tick to share the same synchronous view. This type of communication is known as instantaneous broadcast.
Instantaneous broadcast is a powerful feature of Esterel. However, if not handled correctly, it has the potential to produce non-deterministic code. For example, consider the Esterel program present A else emit A end. This program contradicts itself: signal A is emitted only if it is absent, yet the status of a signal can only be present or absent, and cannot change during the execution of a tick. Such a program is known as non-causal and is detected at compile time using existing techniques [14].
Another important feature of Esterel is its support for preemption. For example, the strong immediate abort statement is a preemption that terminates its body as soon as the preemption condition evaluates to true. We illustrate this feature on lines 6 and 13. Here, the abort body (lines 7 to 12), consisting of a non-terminating loop, is preempted when the expression ?mode=landing evaluates to true, and control continues execution from line 15. Also, when preemption statements are nested, the order of nesting dictates the priority of the preemptions.
Esterel relies on interfacing with a host language, typically C, to handle data computations. For example, on line 8, the value of the signal angle is computed using the host function calAngle(altitude). This lack of direct support for data manipulation is one of the limitations of Esterel.
In summary, we have presented three approaches for implementing real-time systems. Using Table 1.1, we present a qualitative comparison. The tasks of an RTOS are managed by a preemptive scheduler, which monitors tasks and performs context switching. This introduces scheduling overheads and consumes more memory per task. The threads of light-weight multithreading languages, like Protothreads, are managed using a non-preemptive scheduler. The context switching points are explicitly defined by the programmer. Thus, compared to an RTOS, the resulting scheduling and memory overheads for context switching are smaller.
Also, as explained earlier in Section 1.1.1, preemptive scheduling increases the complexity of the static timing analysis, because of the interference between tasks. This problem is significantly reduced by a non-preemptive scheduler, which reduces the number of possible context switching points, making the program more amenable to timing analysis. In contrast, concurrency between the threads of a synchronous program is compiled away into sequential code, avoiding the need for a scheduler. The synchronous model of computation is akin to synchronous digital hardware, which makes synchronous programs highly amenable to timing analysis. However, the semantics of synchronous constructs is comparatively harder to comprehend for conventional designers of embedded systems, who are more familiar with C. Overall, using the UAV example, we have presented three ways to implement real-time applications: using an RTOS, Protothreads (light-weight) and Esterel (synchronous language). While they differ in expressing functional specifications, they all share one
                                Real-time operating       Light-weight            Synchronous
                                systems                   multithreading          languages
                                (MicroC/OS-II [8])        (Protothreads [9])      (Esterel [4])
Scheduler type                  preemptive                non-preemptive          not applicable
Scheduling overhead             high                      medium                  not applicable
Memory footprint per thread     high                      medium                  not applicable
Amenable for timing analysis    low                       medium                  high
Learning time                   easy                      easy                    hard
Example application             automotive control [15]   sensor networks [16]    flight control of the Airbus A340 [11]

Table 1.1: Comparing the three approaches for implementing real-time systems.
common problem, in that their timing specifications are validated externally using static timing analysers. In the next section, we study some of the problems that are common to all three approaches.
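The earlier remark that the concurrency of a synchronous program is compiled away into sequential code can be made concrete with a small sketch. It is a hand-written simplification of the two threads of Figure 1.5 and not the output of an Esterel compiler. The record types, the function uav_tick, the constants FLIGHT and LANDING, and the bodies standing in for calSpeedL, calSpeedR and calDuty are assumptions made for illustration, and the angle signal is omitted. Within each tick the writer (navigation controller) is placed before the reader (engine controller), so no scheduler or lock is required.

```c
#include <assert.h>

enum { FLIGHT, LANDING };

/* Hypothetical records for the inputs, outputs and local signals. */
typedef struct { int mode; int airSpeed; } inputs_t;
typedef struct { int engineL, engineR; } outputs_t;
typedef struct { int speedL, speedR; } signals_t;

/* Placeholder for the host function calDuty() used in the figure. */
static int calDuty(int speed) { return speed / 2; }

/* Writer: navigation controller, placed first in every tick. */
static void nav_controller(const inputs_t *in, signals_t *s)
{
    if (in->mode == FLIGHT) {
        s->speedL = in->airSpeed;   /* stands in for calSpeedL() */
        s->speedR = in->airSpeed;   /* stands in for calSpeedR() */
    } else {
        s->speedL = 0;
        s->speedR = 0;
    }
}

/* Reader: engine controller, sees the values written in the same tick. */
static void eng_controller(const signals_t *s, outputs_t *out)
{
    out->engineL = calDuty(s->speedL);
    out->engineR = calDuty(s->speedR);
}

/* One tick of the whole program: a fixed, statically known order of
 * the two threads replaces any run-time scheduling decision. */
outputs_t uav_tick(const inputs_t *in)
{
    signals_t s;
    outputs_t out;
    nav_controller(in, &s);     /* all writers first */
    eng_controller(&s, &out);   /* then the readers */
    return out;
}
```

Because the resulting tick function is straight-line C with a fixed call order, its worst case reaction time can in principle be bounded by analysing a single sequential function.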
1.2
Validation of real-time systems
The computation of the worst case execution time (WCET) depends on the structure of the program and the complexity of the underlying target hardware. Today’s general purpose processors (GPPs) complicate the analysis by introducing speculative features that focus solely on improving the average case performance, while ignoring and sometimes worsening the ability to analyse the worst case. A summary of these problems is presented in [7], which describes the analysis of speculative components such as caches, pipelines, branch prediction, and more. Cache analysis classifies memory references as cache misses or hits. In general, the timing penalty for a miss is significantly greater than that of a hit. Thus, the precision of cache analysis can significantly affect the overall precision of the WCET analysis. However, current multi-level cache architectures and speculative replacement policies make static analysis of cache behaviour harder. Pipeline analysis is used to determine the execution time of instructions. This requires the modelling of hardware features such as branch prediction, data forwarding, and stalling of pipelines. Other analysis techniques include value analysis for determining the range of the register values and the memory references, inter-procedural analysis for analysing the flow of function calls in a program, and more [7]. Analysing functional properties of a program is also a challenge [7], e.g., loop bound
analysis computes an upper bound for the number of iterations of a loop. For example, in Figure 1.6, the termination condition of the for loop is i