
MODELLING AND VERIFICATION OF EMBEDDED SYSTEMS BASED ON PETRI NET ORIENTED REPRESENTATIONS

by Mauricio Varea

A thesis submitted for the degree of Doctor of Philosophy

Department of Electronics and Computer Science, University of Southampton, United Kingdom.

© September 2003

Dedicated to Fabiana, my “emotional support”.

UNIVERSITY OF SOUTHAMPTON

ABSTRACT

FACULTY OF ENGINEERING
ELECTRONICS AND COMPUTER SCIENCE DEPARTMENT

Doctor of Philosophy

MODELLING AND VERIFICATION OF EMBEDDED SYSTEMS BASED ON PETRI NET ORIENTED REPRESENTATIONS by Mauricio Varea

Driven by the demand for more functionality, the complexity involved in the design of embedded systems continues to increase. This has led to a progressive increase in the amount of control and data flow that current embedded systems need to deal with. This dissertation addresses the interaction between these two domains and investigates its influence on the design of embedded systems, in terms of overall design cost.

The first part of this dissertation presents the formalisation of a new design representation, called Dual Flow Net (DFN), which provides a tight control and data flow interaction. This is achieved by means of two new concepts. Firstly, the structure of the new DFN model is formulated employing a tripartite graph, as opposed to previous approaches based on a bipartite graph. Such a structure allows the use of a unique semantics to model the control flow, the data flow, and their interactions. Secondly, a marking scheme that captures the changes in the state of the system produced by the separate effects of control and data flow is described. The analysis of behavioural properties using such a marking is proposed, and illustrative examples are given.

The second part of this dissertation is concerned with the verification of DFN models through formal methods. A new set of algorithms for the symbolic model checking of DFN models is proposed. Behavioural properties of embedded systems, such as reachability, safety and liveness, are verified using both Computation Tree Logic (CTL) and Linear Temporal Logic (LTL) formulae. A new estimation method is described, which is capable of allocating resources to the verification process efficiently, hence dealing with the state explosion problem. The algorithms and estimation method have been validated by examples of varying complexity, ranging from simple systems, used to understand the modelling and verification principles, up to complex arrangements that depict real-life embedded systems, including an Ethernet coprocessor.

The final part of this dissertation investigates the applicability of DFN models to the co-synthesis of hardware/software systems, as a potential application of the new design representation. It is shown how the DFN model provides a flexible design framework for system-level trade-offs in the generated solution.

Contents

Abstract
Acknowledgements
List of Symbols

Chapter 1 Introduction
  1.1 Embedded Systems
  1.2 Embedded System Design
    1.2.1 Modelling
    1.2.2 Validation
    1.2.3 Synthesis
  1.3 Motivation and Contributions
  1.4 Thesis Organisation

Chapter 2 Background and Related Work
  2.1 Preliminaries
    2.1.1 System Modelling
    2.1.2 Temporal Logics
    2.1.3 Binary Decision Diagrams
  2.2 Modelling of Embedded Systems
    2.2.1 Finite-State Machines
    2.2.2 Petri Nets
    2.2.3 Signal Transition Graphs
    2.2.4 High Level Petri Nets
    2.2.5 Data Flow Graphs
  2.3 The Control/Data-Flow Heterogeneity
  2.4 Taxonomy of Embedded System Models
    2.4.1 MCD models
    2.4.2 MDC models
    2.4.3 MB¯ models
    2.4.4 Summary of Models
  2.5 Formal Verification
    2.5.1 Deductive Methods
    2.5.2 State-exploration Methods
  2.6 Concluding Remarks

Chapter 3 Dual Flow Net model
  3.1 Preliminaries
  3.2 Modelling the Control/Data-Flow paradigm
  3.3 DFN Structural model
    3.3.1 Structural model example
  3.4 A new concept in marking functions
  3.5 DFN Behavioural model
    3.5.1 Behavioural model example
  3.6 DFN Graphical interpretation: Analogy with Complex Numbers
  3.7 DFN Analysis of the Multiplier example
  3.8 DFN Matrix Equations
  3.9 Decidability Issues for DFN
  3.10 Concluding Remarks

Chapter 4 Formal Verification of Dual Flow Nets
  4.1 Background
    4.1.1 Model Checking of PN-based models
  4.2 Model Checking of DFN models
    4.2.1 Implementation and Results
    4.2.2 Properties
  4.3 Experimental Results
    4.3.1 Example 1: verification of a VME-bus controller
    4.3.2 Example 2: verification of an Ethernet network coprocessor
  4.4 Modular approach to the Verification of DFN models
    4.4.1 Compositional Verification
    4.4.2 Estimation Method
    4.4.3 Towards an Automatic Approach
  4.5 Real-life Example: the Ethernet coprocessor
  4.6 Concluding Remarks

Chapter 5 Co-Synthesis of Dual Flow Nets
  5.1 Hardware/Software Co-synthesis
  5.2 Co-Synthesis of Embedded System
    5.2.1 Dynamic Voltage Scaling
    5.2.2 A Co-Synthesis Tool
  5.3 Synthesising DFN models
    5.3.1 Example: Co-Synthesis of the Ethernet coprocessor
  5.4 Concluding Remarks

Chapter 6 Conclusions
  6.1 Future Work
    6.1.1 Infinite State Systems
    6.1.2 Energy-efficient Embedded Systems

Appendix A Transformations

Appendix B DFN Library
  B.1 main( )

Appendix C DFN Examples
  C.1 Multiplier
  C.2 VME-bus controller
  C.3 Ethernet coprocessor (flat representation)
  C.4 Ethernet coprocessor (modular verification)

Appendix D DFN Co-Synthesis results
  D.1 Mapping results without DVS
  D.2 Mapping results with DVS

Bibliography

List of Tables

2.1 Temporal Operators
4.1 Sizes of BDD trees for a ten-stage pipeline
4.2 Estimating the pipeline complexity
4.3 Ethernet LTL properties: guarantees
4.4 Ethernet LTL properties: assumptions
5.1 Low-power co-synthesis results of the Ethernet’s DFN model

List of Figures

1.1 Sales of microprocessors
1.2 Typical top-down embedded system design strategy
1.3 Impact of design methodology on system implementation cost
1.4 Relationship among Validation Methods
2.1 Binary signal
2.2 Hierarchical composition of logical systems
2.3 Temporal Logics
2.4 Modulo-8 counter
2.5 BDD representation of Eq. (2.2)
2.6 four-bit shift register
2.7 FSM model of the four-bit Shift Register
2.8 PN representation of a binary signal
2.9 STG model of a four-phase handshake protocol
2.10 Multiplier
2.11 DFG model of one cycle of the multiplier given in Figure 2.10
2.12 Processor architecture
2.13 PRES+ model of the multiplier
2.14 FGM model of the multiplier
2.15 ETPN representation of the multiplier
2.16 FunState model of the multiplier
3.1 A tripartite graph for control/data-flow systems
3.2 DFN model of the Fibonacci algorithm
3.3 Initial marking for the DFN model in Figure 3.2
3.4 Complex plane mapping of the state space
3.5 DFN model of the multiplier
3.6 Copying process (COPY)
3.7 Summation process (SUM)
3.8 Overwriting process (OWR)
3.9 DFN hull represented by PN
3.10 Comparation process (CMP)
3.11 The DFN philosophy
4.1 Proposed model checking methodology
4.2 Nondeterministic scheduler
4.3 Algorithm for a DFN Transition
4.4 Algorithm for a DFN Hull
4.5 Algorithm for a DFN Guard
4.6 Verification time depending on the capacity K and the length L
4.7 VME bus controller
4.8 DFN model of the VME bus controller
4.9 DMA tx/rx of the Ethernet coprocessor, and signals involved
4.10 DFN model of the Ethernet coprocessor
4.11 Cyclic and acyclic complexity
4.12 Ethernet coprocessor’s model, using signal modules
4.13 Initialisation: module M1
4.14 Sends destination address
4.15 Setting up the length: module M5
4.16 Reads successive data
4.17 Module M2: cancel transmission?
5.1 Typical co-synthesis design flow
5.2 Algorithm that translates DFN models into Task Graphs
5.3 Finding communication links in SD
5.4 Finding communication links in SC
5.5 Synthesising the DFN model of the multiplier
5.6 Task graph for the Ethernet coprocessor
5.7 Final Task Mapping
B.1 Type definitions
B.2 Definitions
B.3 Place
B.4 Transition
B.5 Guard function
B.6 Hull
B.7 Binary signal
B.8 Declarations
B.9 Conditions
B.10 Scheduler
B.11 Assignments
B.12 Arrays containing the DFN structure

Acknowledgements∗

This dissertation is the outcome of both three years of technical investigation and a fertile environment in which to foster my ideas. Perhaps unaware of it, a finite number of people have helped to develop this atmosphere. Thus, the aim here is to express my deepest gratitude towards them. Hu [1995] attains such an objective in a more abstract and fair way, i.e., without any name at all, based on the fact that each person’s contribution is not measurable, and therefore a prioritisation is not feasible. Since I believe in the immortality of the written word, my approach still gives the names of relevant people but does not, by any means, attempt to judge their individual merits. On the contrary, they all deserve my gratitude in equal proportion, because the outcome of this PhD would have been very different in the absence of any one of them.

First and foremost, I am thankful to my supervisor, Bashir Al-Hashimi, for leading me on and giving me the opportunity to explore challenging fields. He showed me how to conduct research and aided the development of my written and presentation skills. Also, being part of the ESD group¹ has been an amazing experience for me. Here I had the most rewarding discussions on a truly varied range of topics, with guys who were always on the spot for lively conversations. Thus, special thanks go to Marcus Schmitz, Theo Gonciari, and Paul Rosinger, for their constant and daily friendship, and also to Nicola Nicolici, who not only gave me valuable feedback on early papers, but also offered me his friendliness, even before my arrival in Southampton. Moreover, I wish to thank Reuben Wilcock and Edward James, for their patience when I bothered them with practical English usage, as well as Alan Williams for his early comments.

Beyond the boundaries of the ESD group, I am also indebted to the DSSE group². Arguing about aspects of formal methods with Michael Leuschel, Ulrich Ultes-Nitsche, and Juan Carlos Augusto has been a very rewarding experience for me. Furthermore, the writing of this dissertation has been possible mainly due to Michael’s patience in allowing me the extra time to write it up. Some other people in the UK deserve my perennial gratitude as well. For instance, the SMV support I have received from Gethin Norman (at Birmingham) has been truly helpful in the development of my verification methodology.

A landmark in my PhD training was my visit to the ESLAB³ in Sweden. In particular, I wish to acknowledge Petru Eles for inviting me to this lab and always being there for fruitful discussions, as well as Zebo Peng for fostering my research and, of course, Luis Cortés, with whom I have had some of the most intriguing debates. Other people at Linköping, such as Erik Larsson, Traian Pop, Paul Pop, Alexandru Andrei, et al., also contributed towards an ideal research atmosphere.

From the far side of the Atlantic Ocean also came some invaluable arguments that have surely led to many improvements in the work presented in this dissertation. Alan Hu, from Canada, and Ken MacMillan, from the USA, have made some great contributions to my research.

Furthermore, a dissertation is not only a technical contribution, but also a personal achievement. For this, I am thankful to all those in Argentina who have motivated me to pursue this PhD. This goes from the very beginning, when Carmen Elsa fired that “initial spark” by encouraging me to pursue some postgraduate study in the United Kingdom, to the helpful support that both Peter and Marcela gave me while I was settling down in the UK. Also, thanks to Paul, Cora, and Alejandro for their unconditional friendship over these years.

Last, but certainly not least, I ought to mention that the constant and daily support I have received from my family has been very motivating. Dad has always been a source of inspiration. Mom has synthesised all her love in the 493 e-mails she wrote to me over these three years, far more than anyone else! My two sisters, Ivana and Valeria, have filled each day of my life with nice thoughts and, of course, may Uncle P. be remembered here, for his influence on my current personality.

Finally, I might struggle trying to express in words my gratitude towards Fabiana, my fiancée, who has been putting up with many things so that I could achieve this degree. Therefore, this piece of work is dedicated to her.

∗ Besides the people mentioned here, I am very grateful to the Department of Electronics and Computer Science (ECS), University of Southampton, which has provided the financial support that made this investigation feasible!
¹ Electronic Systems Design Group, University of Southampton, http://www.esd.soton.ac.uk/
² Declarative Systems and Software Engineering, http://www.dsse.soton.ac.uk/
³ Embedded Systems Laboratory, University of Linköping, http://www.ida.liu.se/~eslab/

List of Symbols

General
  ⟹          implies
  ⟺          if, and only if
  ←          assignment
  νi ≺ νj    νi precedes νj (νj succeeds νi)
  νi ≻ νj    νi succeeds νj (νj precedes νi)

Set theory
  ∈          set membership
  ⊆          subset
  ∪          set union
  ∩          set intersection
  \          set difference
  2^S        power set
  ∅          empty set

Propositional logics
  ¬          negation (not)
  ∧          conjunction (and)
  ∨          disjunction (or)
  ⊻          exclusive disjunction (xor)
  ≡          equivalence

First Order logics (FOL)
  ∀          for all (universal quantifier)
  ∃          there exists (existential quantifier)

Temporal logics
  ○ϕ         ϕ holds next
  ◇ϕ         ϕ eventually holds
  □ϕ         ϕ always holds
  ϕ U φ      ϕ holds until φ holds

Domains
  ℕ          set of natural numbers, ℕ = {0, 1, 2, . . .}
  ℤ          set of integers, ℤ+ = ℕ \ {0}, ℤ = ℤ− ∪ {0} ∪ ℤ+
  ℤn         set of integers modulo-n, ℤn = {0, 1, 2, . . . , (n − 1)}
  ℝ          set of real numbers
  ℂ          set of complex numbers

DFN model
  •x         pre-set of x over the control domain
  ◦x         pre-set of x over the data domain
  x•         post-set of x over the control domain
  x◦         post-set of x over the data domain

Modular Verification
  J1 ∥ J2    parallel composition
  P          partition of a set of vertices
  #P(x)      number of possible partitions, for a set of x vertices

Chapter 1 Introduction

Never yet has a theory had to be regarded as falsified owing to the sudden breakdown of a well-confirmed law
Karl Popper [1959]

Aggressive competitiveness in the market place is one of the key issues behind the growth seen in both the electronics and computing industries. The increasing demand for faster systems and more complex functionalities, along with shrinking time-to-market windows, has fuelled a vast amount of research in new directions. Indeed, improving silicon technology from Large Scale of Integration (LSI) to Very Large Scale of Integration (VLSI)¹ has fostered the creation of Computer Aided Design (CAD) tools, which have reciprocally boosted the silicon technology. This booming cycle has become a major focus of research in both industry and academia, leading to the development of an industry widely known as Electronic Design Automation (EDA).

The boundary between what is and what is not possible in the EDA industry for VLSI technology is constantly changing. Based on the type of functionality to be performed, VLSI devices are classified into two types: (1) self-contained units, which can be programmed to run different applications according to what is needed, or (2) units that are part of a larger system, dedicated to complying only with a particular subset of the specified requirements. The first type of device includes workstations, desktop computers, notebooks, etc., while devices of the second type are widely known as embedded systems. This dissertation mainly deals with the identification of intrinsic features of the design of embedded systems that need to be targeted in order to improve the design process. These intrinsic features are not necessarily reflected directly in the cost of the final product, but are decisive for the design cost. For example, it is not true that increasing heterogeneity in embedded systems leads to higher costs; it does, however, lead to greater complexity, which requires additional resources to both validate and implement.

The remainder of this chapter introduces the overall embedded system design flow and outlines the main objectives of this research. Section 1.1 introduces embedded systems. In Section 1.2, a description of the design process of embedded systems is given. The main contributions and the underlying motivation of this work are highlighted in Section 1.3. Finally, Section 1.4 outlines the organisation of this dissertation.

¹ And recently, Ultra Large Scale of Integration (ULSI).

1.1 Embedded Systems

Embedded systems are currently being applied in a vast number of fields, such as mobile phones, cars, avionics, medical equipment, computer devices, consumer electronics, household appliances, etc. For instance, a modern vehicle’s control system may have up to 70 embedded systems [Thoma, 1999], called Engine Control Units (ECU), which assist in the operation of the car. Examples of such embedded systems are the Anti-lock Braking System (ABS), the fuel injection controller, the electronic ignition system, etc. Such diversification of applications has led to explosive growth in the embedded system market, which has widely overtaken the general purpose processor (GPP) market over the last five years.

Figure 1.1 compares the sales [Hennessy, 1999] of both kinds of processor: those included in embedded systems and GPPs. In order to make a fair comparison, only 32- and 64-bit microprocessors have been included for both types of processor². It can be seen that the sales of embedded processors far outstrip the sales of GPPs. The implication is that embedded system designers are concentrating on other aspects of the design process, including reliability and scalability issues, which are increasingly becoming the focus of research, instead of, e.g., performance, which is often more relevant to GPP designers.

Due to their dissemination among various types of application domain, embedded systems are more affected by market constraints than general-purpose units. These constraints include [Ernst, 1997]:

Cost vs. Performance: Contrary to general purpose processors, where performance is the main goal, embedded systems may allow a redefinition of their functionality in order to achieve certain cost limits. In other words, cost acts as a ceiling in the embedded system specification (and a strict one), as opposed to GPPs.

Time-to-market: Embedded systems may require a longer development time, when compared to GPPs, since their hardware and software design has to be finished before they can be released as a commercial product. This may not be the case with GPPs, where updated software versions can be supplied at a later date.

Safety: Often, embedded systems are part of a critical system, and hence subject to safety and reliability requirements, in order to guarantee a certain level of robustness. A more robust design tends to increase the cost, which cannot exceed a certain value.

Flexibility: Since embedded systems are application-specific units, it is difficult to provide an appropriate degree of programmability without falling into a GPP design.

Mobility: Embedded systems targeting mobile applications, such as wireless telecommunication or portable information processing devices, are likely to be severely affected by other factors, including size and energy consumption.

² 8- and 16-bit embedded processors are far more popular; including them would emphasise the difference even more.

[Figure 1.1: Sales of microprocessors. Vertical axis: units shipped (millions), 0 to 700; horizontal axis: years 1996 to 2003; series: Embedded Proc. and GPP.]

1.2 Embedded System Design

The design of embedded systems normally involves a number of steps that transform an abstract description, namely the specification, into a more refined characterisation, namely the implementation. Design strategies for embedded systems frequently encompass top-down and bottom-up approaches. Firstly, via a top-down approach, successive refinements are applied to a very abstract description of the desired functionality, in order to meet system specifications and constraints. Secondly, limits imposed by the manufacturing technology and financial budgets are propagated via a bottom-up approach. Due to this mixture of approaches, it is not trivial (sometimes not even possible) to produce an embedded system implementation that optimises all objectives. There is, however, a strong concern about reducing the number of iterations performed between these two approaches [Lavagno et al., 1999].

[Figure 1.2: Typical top-down embedded system design strategy. Blocks: Specification, Modelling, Parameters (stimuli, properties), Technology Library, Internal Design Representation (IDR), Validation, Synthesis, Implementation; intermediate steps: correctness identification, constraints estimation.]

Figure 1.2 shows a typical top-down design strategy for embedded systems design. The two inputs to this systematic approach are: (1) the embedded system specification, which contains all technology-independent information, and (2) the technology library, consisting of a set of resources that defines the architecture to be used. The output consists of the final implementation of the embedded system, which results from the heterogeneous integration of architectures carried out in the last step of the strategy.

There are three main steps in this design strategy: modelling, validation, and synthesis [Edwards et al., 1997], which are described in the following sections of this chapter. In addition to those steps, some intermediate steps are required in order to adapt a model at a certain stage for the next refinement. The main characteristic of these intermediate steps is that they are not carried out automatically. For example, in the correctness identification step the designer infers a set of parameters which are to be used in the validation step. Also, constraints estimation is a step that is performed by the designer, in order to guide the synthesis towards a feasible implementation.

1.2.1 Modelling

Modelling is the core of many disciplines in science and engineering. Scientists and engineers have increased their knowledge of physical phenomena by representing the interaction of different parameters as a model. For example, a well-known set of equations developed by Maxwell [1873] has provided physicists with an insight into the interactions between electric and magnetic fields. When applied to embedded system design, the modelling process covers those aspects of the design which can be abstracted from the physical implementation. Thus, three aspects need to be covered:

• Capturing the embedded system functionality,
• Meeting a number of requirements (including, e.g., temporal, energy, size, etc.), and
• Organising the dependencies of each part of the embedded system.

The use of an Internal Design Representation (IDR) in order to achieve these three aims is desirable, since it eliminates the ambiguity that an abstract specification is likely to have. Furthermore, the IDR not only defines the embedded system specification unambiguously, but also allows a better exploration of the possible implementations of a design. This indicates that choosing an appropriate IDR may facilitate the use of several tools for subsequent processes, such as estimation, validation and synthesis, which is the focus of this thesis.

An IDR consists of a finite set of objects (or symbols) and its composition rules (syntax and semantics). Both objects and composition rules can be expressed in either a textual or a graphical way [Gajski et al., 1997]. In spite of the apparent advantages of textual languages over their graphical counterparts, graphical representations have a multi-threaded nature, which allows certain features, e.g., concurrency, to be analysed.

[Figure 1.3: Impact of design methodology on system implementation cost. Vertical axis: design cost (logarithmic scale), £10M to £100B; horizontal axis: year, 1985 to 2020; series: RTL only (without IDR) and Designed with IDR.]

A thorough literature survey in the realm of IDRs for embedded system design indicates that, despite the important role of modelling in the overall cost, this issue has not yet been sufficiently addressed. The existence of an IDR within the design methodology has fostered some innovative paradigms that reduce the overall cost. Such paradigms are, for example, reuse [Seepold and Kunzmann, 1999], platform-based design [Keutzer et al., 2000], hardware/software co-design [Staunstrup and Wolf, 1997], fault tolerance [Torres-Pomales, 2000], etc. Figure 1.3 [ITRS, 2002] illustrates the design cost of System-on-a-Chip (SoC) technology for a low-power SoC (SOC-LP) Personal Digital Assistant (PDA). Two trends can be observed, which correspond to the following assumptions. Firstly, if no IDR had been used (RTL only), and consequently no innovative paradigms had been introduced into the embedded system design process, the estimated design cost would have risen exponentially over the last decade (recall the logarithmic nature of the vertical axis). Secondly, the use of models capable of incorporating these successful paradigms into the design methodology has led to a desirable gap in the design cost of (SOC-LP) PDAs. Considering, for example, the year 2001, the cost of designing and implementing this type of embedded system amounted³ to £9,491,867, which represents an improvement of more than 22 times with respect to an estimated £215,438,300, had the last decade’s innovative paradigms not taken place. This clearly shows the need for a better understanding of the underlying principles of embedded systems in order to obtain cost-effective implementations.

Validation

Through validation, the designer achieves a reasonable level of confidence about how much of the original embedded system design will in fact be reflected in the final implementation. Early detection of design errors dramatically reduces the cost of design, since fewer design iterations are needed. The cost of detecting errors also increases as the designer goes further into the synthesis process. In general, there are three methods for validation:
• Simulation (S)
• Testing (T)
• Formal Verification (FV)
The goal of each of these methods can be described in terms of a trade-off between behavioural coverage and structural accuracy. Figure 1.4 is based on the microeconomic theory of the consumer (MTC), in which economists analyse the behaviour of rational consumers, i.e., those who intend to choose the best bundle of goods they can afford, by means of indifference curves and budget lines [Varian, 1987]. Efficiency (referred to as capacity in the ITRS [2002]) is a trade-off between behavioural coverage and structural accuracy, and cost is a measure of the allocation of resources, e.g., memory, which is not necessarily reflected in the price of the embedded system. The MTC uses (hyperbolic) indifference curves to show the preference of the consumer for any combination of two products (goods) that gives him/her equal satisfaction, and budget lines to define the alternative combinations of such goods that the consumer can purchase on a fixed income. Likewise, assuming that both coverage and accuracy are goods (i.e., desired features), the efficiency of a validation method follows a hyperbolic curve, whereas the cost is linear. Therefore, simulation (S) depicts a constant-efficiency response, where more accurate models can be analysed by trading off behavioural coverage, in contrast to testing (T), which always has the maximum degree of structural accuracy (i.e., 100%) because an actual prototype of the embedded system is used. This means that a higher coverage (hence efficiency) is only attained by increasing the cost of the testing prototype. In contrast to S and T, formal verification (FV) methods always achieve maximum behavioural coverage, and the accuracy (hence efficiency) can be raised by increasing the verification cost, in terms of memory resources. It should be noted that maximum coverage at 100% accuracy is a utopian goal, which is unrealistic under strict cost margins.

³ The ITRS originally published the values in Figure 1.3 in US dollars. In the context of this dissertation, those values have been converted into GB pounds at the following rate: £1 = $1.58940

[Figure: Behavioural Coverability (min to max) plotted against Structural Accuracy (0 to 100 %), showing indifference curves and iso-cost lines for the three methods S, T and FV.]

Figure 1.4: Relationship among Validation Methods

With reference to Figure 1.2, one input (parameters) to the validation step is either a stimulus file or a set of properties. On the one hand, a stimulus file, i.e., a file consisting of predefined inputs to the system, is used when the validation is carried out by simulation. On the other hand, validation based on formal methods makes use of a set of behavioural descriptions, which are called properties. In the case of testing methods, an equivalent prototype of the final implementation should be synthesised in order to validate the design, which denotes a link between the validation and the final implementation.

1.2.3 Synthesis

Within the context of the design flow introduced in Figure 1.2, the abstract components of the Internal Design Representation are combined with the technology library in order to produce a final embedded system implementation that meets all requirements. Since the IDR used has a knock-on effect on the synthesis process, synthesis depends heavily on the characteristics of the model. For instance, an IDR which explicitly denotes data dependencies facilitates the synthesis of a hardware data path. The outcome of the synthesis process is a final implementation of the embedded system, i.e., a mixed hardware/software system [Ernst, 1997; Adams and Thomas, 1996] that fulfils the specification requirements. The embedded system implementation consists of two elements:
• hardware resources that are either (a) standard or (b) custom, and
• software processes, which are allocated to such hardware resources.
Standard hardware for embedded systems typically includes one or more commercial microprocessors (or microcontrollers), memory and interfacing units, whereas custom hardware is implemented either as Application Specific Integrated Circuits (ASICs) or as Field Programmable Gate Arrays (FPGAs), depending on the desired flexibility. In addition, standard software has to be customised by a software program in order to achieve the desired functionality. The design of embedded systems with such a combined (hardware/software) outcome has been an active area of research, widely known as hardware/software co-design [Wolf, 1994; De Micheli and Gupta, 1997; Staunstrup and Wolf, 1997]. In particular, when the hardware/software co-design approach focuses only on the synthesis part of the design strategy, it is known as co-synthesis. This issue is discussed further in Chapter 5.

1.3 Motivation and Contributions

There has been an increasing demand for more complex functionality, while time-to-market windows are rapidly shrinking. Greater design complexity is a consequence of technology development, and is hence very likely to continue rising in the future. In addition, meeting the plummeting time-to-market imposed by the burgeoning technology is becoming a major challenge. Examining Figure 1.3 suggests that the IDR may significantly influence the design, and hence the design cost. This issue is addressed throughout the dissertation.



One of the key parameters that gives rise to this issue is heterogeneity. The increasing demand for multi-functional architectures in embedded systems has been leading to higher levels of heterogeneity throughout the design process. The existence of event-driven mechanisms (control flow) that govern the operation of computationally intensive parts (data flow) is one of the forms of heterogeneity that is difficult to address with simple modelling techniques. Control and data flow cannot be analysed separately, since there is a tight interaction between them. However, this interaction is not a trivial matter, and it is seldom exploited by the reported approaches (cf. the first aim expressed below). To address the problem of efficiency in embedded system design, the aims of this thesis are:
1. Examine the role of modelling within the design flow and identify key features that lead to better embedded system implementations (Chapter 2).
2. Propose a new IDR developed specifically to exploit the strong interrelation between control and data flow parts in embedded systems (Chapter 3).
3. Study verification techniques suited for the proposed model by means of formal methods (Chapter 4).
4. Overcome the state explosion problem of conventional formal methods (Chapter 4).
5. Illustrate the applicability of the proposed model to real-life examples (Chapters 4 and 5).
In pursuit of these goals, the work described in this dissertation has led to a number of scientific publications, which are detailed below:
• Mauricio Varea and Bashir M. Al-Hashimi, “Dual Transitions Petri Net based Modelling Technique for Embedded Systems Specification”, in Proc. of the 4th Conference on Design Automation and Test in Europe (DATE), Munich, Germany, 13–16 March, 2001 [Varea and Al-Hashimi, 2001a].
• Mauricio Varea and Bashir M. Al-Hashimi, “Embedded Systems Modelling and Validation based on Extended Petri Nets”, in Proc. of the 1st U.K. ACM/SIGDA Workshop on Design Automation, London, UK, 10 September 2001 [Varea and Al-Hashimi, 2001b].
• Mauricio Varea, Bashir M. Al-Hashimi and Michael Leuschel, “Finite and Infinite Model Checking of Dual Transition Petri Net Models”, in Proc. of the 2nd International Workshop on Automated Verification of Critical Systems (AVoCS), Birmingham, UK, 15–16 April, 2002 [Varea et al., 2002a].
• Mauricio Varea, Bashir M. Al-Hashimi, Luis A. Cortés, Petru Eles and Zebo Peng, “Symbolic Model Checking of Dual Transitions Petri Nets”, in Proc. of the 10th Symposium on Hardware/Software Codesign (CODES), Colorado, USA, 6–8 May, 2002 [Varea et al., 2002b].
• Mauricio Varea, Michael Leuschel and Bashir M. Al-Hashimi, “Improving Compositional Verification of State-based Models by Reducing Modular Unbalance”, in Proc. of the 2nd International Workshop on Refinement and Critical Systems (RCS), Turku, Finland, 03 June 2003 [Varea et al., 2003].
• Mauricio Varea, Bashir M. Al-Hashimi, Luis A. Cortés, Petru Eles and Zebo Peng, “Dual Flow Nets: Modelling the Control/Data-Flow Relationship in Embedded Systems”, submitted to the ACM Transactions on Embedded Computing Systems (TECS), 2003.

1.4 Thesis Organisation

The remainder of this dissertation is organised following the taxonomy described in Section 1.2. Chapter 2 briefly reviews some concepts used throughout this work, namely different forms of design representation for embedded systems and some general knowledge of formal verification. In Chapter 3, a formal model for embedded system design is presented. The analysis carried out in Chapter 3 is both intuitive and mathematical, bringing out the features of the model that are relevant for the rest of the dissertation. Some examples illustrate the applicability of the model to embedded system modelling and help in understanding its semantics. Building up the design flow, Chapter 4 introduces a validation methodology for the proposed model. The full development and analysis of a four-module library for model checking is presented. The analysis is supported by real-life examples, which provide an insight into the methodology and its mechanism. This methodology is enhanced towards the end of the chapter, where a compositional method of validation and an estimator are introduced, aiming at reducing verification complexity. Chapter 5 presents a novel approach towards the synthesis of embedded systems based on the model developed in Chapter 3. This synthesis approach completes the design methodology shown in Figure 1.2, producing a final implementation in both hardware and software simultaneously. Finally, conclusions are drawn in Chapter 6, which summarises the work presented throughout this dissertation and outlines future directions for the ongoing research.

Chapter 2 Background and Related Work

Over the last decade, a considerable amount of research has been carried out in the area of embedded systems, addressing various aspects of the design flow shown in Chapter 1 (Figure 1.2). From the modelling point of view, newer representations [Cortés et al., 2000; Strehl et al., 2001] have emerged from well established research groups worldwide, in order to overcome the limitations of their own former models [Peng and Kuchcinski, 1994; Ziegenbein et al., 1999]. One of the major breakthroughs in the formal verification area has also taken place in the last decade or so [Bryant, 1992]. This breakthrough has resulted in whole new areas of research, such as symbolic model checking [McMillan, 1993]. Finally, innovations on the synthesis side have made it possible to extend existing techniques, such as high-level synthesis (HLS) [Gajski et al., 1992], into more advanced techniques, such as hardware/software co-design [Wolf, 1994; De Micheli and Gupta, 1997; Staunstrup and Wolf, 1997]. This chapter reviews relevant work that has been carried out in the field of embedded system design. In particular, the chapter focuses on the modelling and verification steps of the design methodology presented in Chapter 1 (Figure 1.2), raising the issues that require attention from both theoretical and pragmatic designers. Sections 2.2 to 2.4 review the modelling of embedded systems, illustrating the usability of several such models by means of examples of low complexity. Section 2.5 introduces the area of verification based on formal methods, with particular emphasis on its applicability to embedded system design. This section not only reviews the basic principles and different approaches within the field, but also depicts its current state of the art. Finally, Section 2.6 outlines some concluding remarks.

2.1 Preliminaries

This section introduces some concepts and notations that are used in the remainder of this dissertation. The related background is presented in three sub-sections: System Modelling, Temporal Logics, and Binary Decision Diagrams.

2.1.1 System Modelling

In order to capture the functionality of an embedded system, i.e., what the system does, several issues need to be considered. Firstly, the nature of time is an important aspect of a model. Time may be either continuous or discrete, according to the numbering system used by the model. Continuous-time models use $t \in \mathbb{R}$ to represent the flow of time, as opposed to discrete-time models, which are based on $k \in \mathbb{N}$ (or $k \in \mathbb{Z}$). In the framework of this dissertation, time is considered to be a discrete variable $k$. Secondly, it is equally important to characterise the system in such a way that it can be analysed. A state ($s$) is a unique characterisation of the system, which comprises enough information so that the entire functionality of the system can be represented by a set $S$ that only contains states. A system that is in state $s_a$ at a particular time instant $k_a$, and in state $s_b$ at time $k_b > k_a$, has evolved from $s_a$ to $s_b$. Since discrete-time models are used, there are only a finite number of time instants between $k_a$ and $k_b$ and, therefore, a finite number of states. A state transition is a particular case of state evolution, where $k_b$ is the time instant that immediately follows $k_a$, that is, $k_b = k_a + 1$. The set of all possible states of a system is called the state space ($S$), where state transitions ($s_i \rightarrow s_j$, such that $s_i$ occurs at time $k_a$ and $s_j$ occurs at $k_b$) represent the edges of a graph whose nodes are the states $s \in S$. State transitions can be of two types: either deterministic, when there is only one possible next state $s_j$ for a given state $s_i$, or nondeterministic, if there are multiple states which can be chosen after the occurrence of $s_i$. The number of states in a system may be either finite or infinite. Turing has demonstrated [Turing, 1936a; Turing, 1936b] that, for some systems with an infinite-state nature, there may be no algorithm capable of capturing their functionality. Such systems are referred to as undecidable. Decidability is not a trivial matter, and having a decidable system does not guarantee that the modelling problem can be solved within a reasonable amount of time or resources. Complexity theory deals with the amount of time and space (e.g., memory) required to solve a problem, whether tractable or not [Garey and Johnson, 1979]. An intractable problem, within this context, is one that no algorithm can solve in $O(p(n))$ time, where $p(n)$ is a polynomial over $n$.



The state space may contain several paths, i.e., sequences of states instantiated one after another. Going along these paths is also known as traversing the state space. The syntax for a path $\pi$ containing several states $s$ is $\pi_i = \{s_i, s_{i+1}, s_{i+2}, \ldots\}$. Thus, a path is also a sequence of state transitions, e.g., from $s_i$ to $s_{i+1}$, or from $s_{i+1}$ to $s_{i+2}$. If it is possible to traverse the entire state space by a single path, i.e., there exists $\pi = \pi_0$ such that all $s \in S$ are included in the path, and all state transitions in the path are deterministic, the system is said to be deterministic. Otherwise, the system is nondeterministic. In essence, having a nondeterministic system implies the existence of choice points, i.e., states where there is more than one possible state transition. In addition to choice points, embedded systems often have multiple threads of execution (cf. Chapter 1, Section 1.2.1). While software is mainly organised in a single-threaded fashion, i.e., sequentially, hardware tends to have a multi-threaded nature, i.e., it operates concurrently. Concurrency, however, should not be confused with nondeterminism. For instance, a Marked Graph [Murata, 1977] is a deterministic model which does allow concurrency, i.e., choice cannot be represented in Marked Graphs despite their multi-threaded characteristics. A widely accepted representation for finite-state systems, which has both concurrency and nondeterminism, is given by the Kripke structure [Emerson and Clarke, 1982] presented in Definition 1. This structure is also cited in the literature as a temporal structure, e.g. [Kern and Greenstreet, 1999], due to its applicability to temporal logic reasoning (cf. Section 2.1.2). The Kripke structure consists of a set of states $S$, a transition relation $R$, and a labelling function $L$, where the relation $R$ is often required to be total, that is, $\forall s \in S,\ \exists s' \in S : R(s, s')$. Thus, traversing the state space using $R$ may result in an infinite tree, as can be inferred from unfolding a relation $R(s, s')$ where both $s$ and $s'$ lie in $S$.

Definition 1 A Kripke structure is a tuple $M = \langle S, R, L \rangle$, where:
$S$ is a set of states;
$R \subseteq S \times S$ is a total relation, called the transition relation;
$L : S \mapsto 2^{AP}$ is a labelling function, which maps a state $s \in S$ to a set of atomic propositions (AP).

The Kripke structure defined above is, in general, nondeterministic. However, a deterministic version of the above structure is introduced in Definition 2, where the transition relation $R$ has been replaced by a bijective function in order to avoid choice points. Such a replacement leads to a unique path of infinite nature, which is associated with the evolution of time. Atomic propositions are defined later in Section 2.1.2.

Definition 2 A Linear-time structure is a tuple $M = \langle S, \pi, L \rangle$, where:
$S$ is a set of states;
$\pi = s_0, s_1, \ldots$ is an infinite path;
$L : S \mapsto 2^{AP}$ is a labelling function, which associates a state $s \in S$ with a set of atomic propositions (AP).
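Definition 1 can be made concrete with a small sketch. The class below is only an illustration, not part of the formal development: the names `Kripke`, `successors`, `is_total`, `choice_points` and `unfold` are hypothetical, and a two-state on/off signal is assumed as the example. It checks that the relation is total, lists the choice points, and unfolds the structure into a labelled tree of bounded depth.

```python
from dataclasses import dataclass


@dataclass
class Kripke:
    states: set      # S
    relation: set    # R, a set of (s, s') pairs
    labels: dict     # L, maps each state to a set of atomic propositions

    def successors(self, s):
        return {t for (u, t) in self.relation if u == s}

    def is_total(self):
        # R is total when every state has at least one successor.
        return all(self.successors(s) for s in self.states)

    def choice_points(self):
        # States with more than one outgoing transition.
        return {s for s in self.states if len(self.successors(s)) > 1}

    def unfold(self, s, depth):
        # Labelled tree of bounded depth rooted at s: (labels, children).
        if depth == 0:
            return (sorted(self.labels[s]), [])
        children = [self.unfold(t, depth - 1)
                    for t in sorted(self.successors(s))]
        return (sorted(self.labels[s]), children)


# Assumed example: a signal that is either off (s0) or on (s1) and may
# stay in its state or toggle at every step.
signal = Kripke(
    states={"s0", "s1"},
    relation={("s0", "s0"), ("s0", "s1"), ("s1", "s0"), ("s1", "s1")},
    labels={"s0": {"Off"}, "s1": {"On"}},
)
```

Because every state of this example has two successors, both states are choice points and the unfolding doubles in width at each level, which is why the full unfolding is an infinite tree.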

The simple Kripke structure depicted in Figure 2.1(a) exemplifies the principles introduced by Definition 1. The initial state, indicated in the figure by a curly arrow, refers to a state $s \in S$ used as a starting point for the analysis. This structure corresponds to a binary signal, i.e., a part of the circuit that can be in either of two possible states. Being in state $s_0$ ($s_1$), the signal can do one of two things: either it stays in the same state, or it changes to state $s_1$ ($s_0$). The atomic propositions “Off” and “On” are defined such that “Off” holds when the system is in state $s_0$ and, consequently, “On” holds when the system is in $s_1$. Figure 2.1(b) shows the result of unfolding the structure. The infinite labelled tree obtained reveals that the atomic propositions “Off” and “On” are mutually exclusive, since there is no node containing both APs.

[Figure: (a) Kripke structure with states S0 (Off) and S1 (On); (b) Unfolding into a tree of Off/On nodes.]

Figure 2.1: Binary signal

2.1.2 Temporal Logics

Logical systems (or simply logics) are particular methods of argument and reasoning about the relation and interpretation of a set of assertions, called properties, which have been applied to the verification of both computer programs [Pnueli, 1977; Burstall, 1974] and hardware designs [Wagner, 1977; Gordon, 1985]. The approach in [Pnueli, 1977] is a landmark in formal methods, since it is the first logic to include reasoning about concurrency. Atomic propositions (AP), which have been used in Definitions 1 and 2, are the most elementary expressions in logical systems. A logical system is composed of many APs, each of which evaluates to either true ($\top$) or false ($\bot$).

[Figure: diagram of logical systems, with regions labelled AP, PL, Modal Logics and First Order Logics.]

Figure 2.2: Hierarchical composition of logical systems

Figure 2.2 depicts the hierarchical composition of logical systems. A finite set of logical operators, e.g., $\neg$, $\wedge$, $\Longrightarrow$, etc., is used to combine APs in a propositional logic (PL). On the one hand, first order logics (FOL) include the notion of variable quantification, i.e., either the universal quantifier $\forall x$, which states that a property holds for all paths $\pi$, or the existential quantifier $\exists x$, meaning that there exists a path $\pi$ such that the property holds. On the other hand, modal logics (ML) provide a set of qualitative operators, raising the issues of necessity (i.e., $\Box x$) and possibility (i.e., $\Diamond x$). In particular, when concerned with the concept of time, these operators are known as temporal operators, and the logic is said to be a temporal logic [Emerson, 1990]. Temporal operators are summarised in Table 2.1.

Table 2.1: Temporal Operators

Temporal Quantifier    Notation      Semantics
Next                   $\circ$       $M, x \models \circ\varphi \iff M, x^{1} \models \varphi$
Eventually             $\Diamond$    $M, x \models \Diamond\varphi \iff \exists j\; M, x^{j} \models \varphi$
Always                 $\Box$        $M, x \models \Box\varphi \iff \forall j\; M, x^{j} \models \varphi$
Until                  $U$           $M, x \models \varphi_1\, U\, \varphi_2 \iff \exists j\; (M, x^{j} \models \varphi_2 \wedge \forall k < j\; M, x^{k} \models \varphi_1)$

Definition 3 Let $M$ be a Kripke structure and $s$ a state in $M$; then the property $\varphi$ holds (which is annotated as $M, s \models \varphi$) when $\varphi \in L(s)$. Conversely, $\varphi \notin L(s)$ means that $\varphi$ does not hold in $M$ at state $s$, and it is annotated as $M, s \not\models \varphi$.



Definition 3 formulates the meaning of the $\models$ operator, which is used in Table 2.1 to define the semantics of temporal operators. Since temporal operators allow reasoning about a property at a certain state as observed from a different state, Definition 3 can be applied to a set of states instead of a single state. Thus, the $\models$ operator is extended in Eq. (2.1) in order to support paths $\pi$.

$$M, \pi \models \varphi \iff \varphi \in L(s),\ \forall s \in \pi \tag{2.1}$$
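The semantics in Table 2.1 can be sketched in a few lines of code. The sketch below rests on stated assumptions: a path is given as a finite list of label sets whose last element is taken to repeat forever, the operators are applied to atomic propositions only, and the function names and the example `trace` are hypothetical choices, not taken from the original figures.

```python
def positions_from(trace, i):
    # Positions i, i+1, ..., last. The last label set is assumed to
    # repeat forever, so later positions add no new information.
    return range(min(i, len(trace) - 1), len(trace))


def holds(ap, trace, i):
    return ap in trace[min(i, len(trace) - 1)]


def next_(ap, trace, i):
    return holds(ap, trace, i + 1)


def eventually(ap, trace, i):
    return any(holds(ap, trace, j) for j in positions_from(trace, i))


def always(ap, trace, i):
    return all(holds(ap, trace, j) for j in positions_from(trace, i))


def until(a, b, trace, i):
    # b must hold at some position j >= i, with a holding before j.
    for j in positions_from(trace, i):
        if holds(b, trace, j):
            return True
        if not holds(a, trace, j):
            return False
    return False


# Assumed labelling for time instants k = 0, ..., 8.
trace = [set(), set(), {"a"}, {"a"}, {"a"},
         {"a", "b"}, {"a", "b"}, {"b"}, {"b"}]
```

With this assumed trace, `eventually("b", trace, 0)`, `until("a", "b", trace, 2)`, `next_("a", trace, 5)` and `always("b", trace, 5)` all evaluate to true, whereas `always("a", trace, 5)` does not, because `a` stops holding at k = 7.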

In terms of the underlying nature of time in a temporal logic (TL), two possible views exist. In the first one, the system assumes that only one possible next state exists for a given current state, in which case the TL is said to be linear, namely Linear Temporal Logic (LTL). In the second view, the state evolution considers more than one path, leading to a Branching Temporal Logic (BTL). Several logics are conceived as a restricted type of BTL. One of the most commonly used in both hardware and software systems is Computation Tree Logic (CTL), where each temporal operator must be preceded by a path quantifier. Other BTL types include, for example, TCTL (i.e., CTL augmented with time [Alur et al., 1993]) or ACTL (i.e., CTL without existential path quantifiers [Clarke et al., 1994]). A generalisation of these branching type logics, namely CTL*, is defined in [Emerson and Halpern, 1986]. Figure 2.3(a) shows some elementary properties expressed in LTL. These properties are annotated along a line, since the state evolution assumes only one state transition for each state. Assuming that the atomic propositions $a$ and $b$ hold at the time instants indicated, where a comma is read as a conjunction of APs, some properties that hold as well are drawn below the line. Firstly, at time $k = 0$, the structure satisfies $M, s_0 \models \Diamond b$, because at some point in the future the atomic proposition $b$ will become true. Later, at $k = 2$, the property $M, s_2 \models a\, U\, b$ holds, since $a$ holds until the satisfaction of $b$ is reached. Finally, at $k = 5$ two properties are shown, which depict the fact that $a$ will only hold for one more time instant, while $b$ will hold for all remaining time. In Figure 2.3(b), a labelled tree illustrates some basic CTL properties. Assume that the atomic proposition $p$ holds in the subset of states $\{s_0, s_1, s_3, s_6, s_7\}$, and that $q$ holds in $\{s_3\}$. The property $M, s_0 \models \exists\Box p$ holds because there is a path $\pi = \{s_0, s_1, s_3\}$ which satisfies $p$ along all its states. In state $s_2$ the property $M, s_2 \models \exists\Diamond p$ holds because of its paths leading to $s_6$ or $s_7$. It should be noted that, if $s_5$ satisfied $p$, the property in $s_2$ could be rewritten as $M, s_2 \models \forall\Diamond p$. Finally, $M, s_4 \models \forall\circ p$ holds because, for any next state chosen from $s_4$, property $p$ holds.

[Figure: (a) Linear Temporal Logic: a time line with instants 0 to 8, labelled with the atomic propositions a and b, and properties drawn below the line; (b) Computation Tree Logic: a labelled tree over states S0 to S7, annotated with the path quantifiers E and A.]

Figure 2.3: Temporal Logics

2.1.3 Binary Decision Diagrams

Digital electronic systems are composed of signals that have only two possible states, so digital system designers often apply Boolean functions in the design process. The development of efficient techniques for representing such functions has been an active area of research. Binary Decision Diagrams (BDDs) are a canonical representation for Boolean functions, and have been widely used [Hu, 1995] to represent the state space implicitly in symbolic methods, which are discussed later in Section 2.5.2. A BDD [Bryant, 1986] is a directed, acyclic graph with two types of vertices: internal vertices and leaves. Internal vertices of a BDD graph are labelled with the variables of the Boolean function, while leaves are labelled with either 0 or 1. Since each Boolean variable takes either the value 0 or the value 1, each internal vertex has exactly two outgoing edges, also labelled with 0 and 1. Ordered Binary Decision Diagrams (OBDDs) [Bryant, 1992] have, in addition, the property that no variable (of the Boolean function) is repeated in any path, i.e., from root to leaf. Bryant proved [Bryant, 1992] that the OBDD representing a Boolean formula is unique, provided that the variable order is known. As a consequence, BDDs, and particularly OBDDs, have attracted research in many fields, including symbolic simulation [Burch and Dill, 1994] and formal verification (see Section 2.5).

[Figure: (a) schematics: a register with outputs $\nu_0$, $\nu_1$ and $\nu_2$, driven by CLK; (b) next-state functions, as follows.]

$$\nu_0' = \neg\nu_0, \qquad \nu_1' = \nu_0 \oplus \nu_1, \qquad \nu_2' = (\nu_0 \wedge \nu_1) \oplus \nu_2$$

Figure 2.4: Modulo-8 counter

A simple digital circuit has been chosen from [Burch et al., 1994] in order to illustrate the BDD representation. Figure 2.4(a) shows the schematic diagram of a Modulo-8 counter, which consists of four gates, implementing the next-state function, and a register, which serves as a storage element. A clock pulse into the register loads the values present at the inputs of $\nu_0$, $\nu_1$ and $\nu_2$. Such values are calculated by means of the Boolean next-state functions shown in Figure 2.4(b), using the outputs of $\nu_0$, $\nu_1$ and $\nu_2$. Despite their apparent simplicity, these next-state functions are slightly more complicated than they appear at first sight, due to the inherent complexity added by the XOR function. For example, the third Boolean function, $\nu_2' = \phi(\nu_0, \nu_1, \nu_2)$, is represented in the minterm canonical form in Eq. (2.2).

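$$\phi = (\nu_0 \wedge \nu_1 \wedge \neg\nu_2) \vee (\neg\nu_0 \wedge \neg\nu_1 \wedge \nu_2) \vee (\nu_0 \wedge \neg\nu_1 \wedge \nu_2) \vee (\neg\nu_0 \wedge \nu_1 \wedge \nu_2) \tag{2.2}$$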
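That the minterm form of Eq. (2.2) agrees with the third next-state function can be checked by enumerating the eight assignments. The sketch below does this and also steps the counter through one full cycle. The function names are hypothetical, XOR is written as `!=` on Booleans, and treating $\nu_0$ as the least significant bit is an assumption made only to print the count as an integer.

```python
from itertools import product


def next_state(v0, v1, v2):
    # Next-state functions of the modulo-8 counter.
    return (not v0, v0 != v1, (v0 and v1) != v2)


def phi(v0, v1, v2):
    # Minterm canonical form of the third next-state function, Eq. (2.2).
    return ((v0 and v1 and not v2) or (not v0 and not v1 and v2)
            or (v0 and not v1 and v2) or (not v0 and v1 and v2))


# Compare both forms on all 2**3 assignments.
minterm_matches = all(phi(*v) == next_state(*v)[2]
                      for v in product([False, True], repeat=3))


def count_sequence(steps=8):
    # Simulate clock pulses from the all-zero state (v0 assumed LSB).
    state = (False, False, False)
    seq = []
    for _ in range(steps):
        v0, v1, v2 = state
        seq.append(int(v0) + 2 * int(v1) + 4 * int(v2))
        state = next_state(*state)
    return seq
```

Here `minterm_matches` is true, and `count_sequence()` visits the eight values 0 to 7 in order before wrapping around to 0.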



Figure 2.5 shows the BDD representation of Eq. (2.2), where the following order has been considered: $\nu_2 \prec \nu_1 \prec \nu_0$. There are six paths starting at node $\nu_2$, going through the other nodes, and ending up in one of the two leaves. These paths in the BDD graph correspond to instances of the Boolean function. For example, the assignment $\{\nu_2 \leftarrow 0, \nu_1 \leftarrow 1, \nu_0 \leftarrow 1\}$ leads to the leaf node labelled 1, hence the Boolean function $\phi$ holds for such an assignment.

[Figure: BDD with root $\nu_2$, two $\nu_1$ nodes, two $\nu_0$ nodes and the leaves 0 and 1; edges are labelled 0 and 1.]

Figure 2.5: BDD representation of Eq. (2.2)
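The path count quoted above can be reproduced with a small sketch that builds a reduced OBDD by recursive Shannon expansion. This is an illustrative construction under stated assumptions, not the algorithm of any particular BDD package: nodes are plain tuples `(variable, low, high)`, leaves are Python Booleans, and the names `build`, `count_paths` and `evaluate` are hypothetical.

```python
ORDER = ["v2", "v1", "v0"]   # variable order v2 < v1 < v0


def phi(env):
    # Same function as Eq. (2.2), written as (v0 AND v1) XOR v2.
    return (env["v0"] and env["v1"]) != env["v2"]


def build(f, order=ORDER):
    unique = {}   # (var, low, high) -> node, so equal sub-graphs are shared

    def mk(level, env):
        if level == len(order):
            return bool(f(env))
        var = order[level]
        low = mk(level + 1, {**env, var: False})
        high = mk(level + 1, {**env, var: True})
        if low == high:          # the test on var is redundant
            return low
        key = (var, low, high)
        return unique.setdefault(key, key)

    return mk(0, {}), unique


def count_paths(node):
    # Number of root-to-leaf paths below this node.
    if isinstance(node, bool):
        return 1
    _, low, high = node
    return count_paths(low) + count_paths(high)


def evaluate(node, env):
    # Follow the 0/1 edges selected by the assignment until a leaf.
    while not isinstance(node, bool):
        var, low, high = node
        node = high if env[var] else low
    return node


root, table = build(phi)
```

For this order the sketch yields five internal nodes and six root-to-leaf paths, and `evaluate(root, {"v2": False, "v1": True, "v0": True})` reaches the leaf for 1. The exhaustive enumeration in `build` is exponential in the number of variables, so it is only suitable for small illustrations such as this one.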

2.2 Modelling of Embedded Systems

As outlined in Chapter 1 (Section 1.2.1), there are two ways to capture the functionality of an embedded system: either a textual approach or a graphical representation. This section introduces the principles behind embedded system modelling based on graphical approaches and presents five models of computation, which constitute the foundation of many internal design representations (IDRs) used in current design methods. There are a number of items that may be considered in the semantics of an IDR:
• States
• Activities
• Physical composition
• Type of each object

Internal design representations driven by states or activities are more likely to be used to capture the functionality of a specification, while IDRs driven by physical composition are better suited to describing the structure at a lower level of abstraction. For information systems it is important to consider various aspects, such as the type of each object, its attributes, etc. Furthermore, since the complexity of embedded systems is rapidly increasing, heterogeneity is an important issue that needs to be taken into account. In the scope of this dissertation, at least two types of heterogeneity can be identified:
• at the IDR level, the control and data flow parts are separate descriptions of the embedded system functionality which are nevertheless tightly linked, and
• at the implementation level, the execution of the embedded system’s task is performed by a mixed hardware/software architecture.
As discussed in Chapter 1 (Sections 1.2.3 and 1.3), the first type of heterogeneity is due to the dichotomy between event-driven and computational parts (also known as reactive and transformational parts), while the second type is due to software processes allocated to hardware resources. The remainder of this section introduces five models of computation that have been proposed in order to model digital systems: finite-state machines, Petri nets, signal transition graphs, high-level Petri nets, and data flow graphs.

2.2.1 Finite-State Machines

The Finite-State Machine (FSM) model [Gill, 1962] has been widely used in control theory and is one of the fundamental pillars in the development of control-dominated representations for embedded systems. The FSM is a tuple $\langle S, I, O, f, h \rangle$, where $S$, $I$ and $O$ are the finite sets of states, inputs and outputs respectively, $f : S \times I \rightarrow S$ is the next-state function, and $h$ is the output function. The function $f$ takes the current state of the system, i.e., the state at time $k_i$, together with a subset of inputs, and produces a new state for the system at time $k_{i+1}$. The form of the output function depends on whether the FSM is a Mealy machine ($h : S \times I \rightarrow O$) or a Moore machine ($h : S \rightarrow O$). For example, Figure 2.6 shows a four-bit shift register that has a feedback connection in order to obtain a cyclic behaviour. Two modes of operation can be identified: either the system state changes cyclically among four distinct values, when the only input signal is its clock (CLK), or, on an event of the reset signal (RST), the system state goes immediately to a predefined value (0000).
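A Mealy machine in the sense of the tuple above can be sketched as two lookup tables, one for $f$ and one for $h$. The tables below are an assumed encoding of a four-state cyclic register, with RST as the only explicit input and one call to `step` per rising edge of CLK. The arc labels follow the Input/Output convention RST/Q3Q2Q1Q0 of the FSM model in Figure 2.7 as far as they can be read; in particular, the reset target is taken to be the state whose output is 0001, and the pair (s1, 1) is left undefined. The names `F`, `H`, `step` and `run` are hypothetical.

```python
# Next-state function f : S x I -> S (assumed transition table).
F = {
    ("s1", 0): "s2", ("s2", 0): "s3", ("s3", 0): "s4", ("s4", 0): "s1",
    ("s2", 1): "s1", ("s3", 1): "s1", ("s4", 1): "s1",
}

# Mealy output function h : S x I -> O, giving Q3Q2Q1Q0 as a string.
H = {
    ("s1", 0): "0010", ("s2", 0): "0100", ("s3", 0): "1000", ("s4", 0): "0001",
    ("s2", 1): "0001", ("s3", 1): "0001", ("s4", 1): "0001",
}


def step(state, rst):
    # One clock event: returns (next state, output).
    return F[(state, rst)], H[(state, rst)]


def run(state, inputs):
    # Apply a sequence of RST values, one per clock event.
    outputs = []
    for rst in inputs:
        state, out = step(state, rst)
        outputs.append(out)
    return state, outputs
```

For instance, `run("s1", [0, 0, 0, 0])` walks once around the cycle and returns to `s1`, while `run("s1", [0, 1, 0])` shows a reset taking the machine from `s2` back to `s1`. A Moore variant would key `H` on the state alone.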

[Figure: (a) schematics: a four-bit shift register with inputs CLK, RST and Sin, and outputs Q0 to Q3 and Sout; (b) timing diagram: CLK, RST and Q0 to Q3, with the state sequence S1, S2, S3, S4, S1, S2, S1, S2, S3.]

Figure 2.6: four-bit shift register

The FSM model of the four-bit shift register is depicted in Figure 2.7, which shows that if the system is in a certain state, e.g. $s_1$, there are as many next-state possibilities as outgoing arcs, i.e., one in this example. In this state, a rising edge of CLK causes the system to change to state $s_2$, without any choice. There are now two possibilities from state $s_2$ onwards: either going back to $s_1$ if RST is set to 1, or going to state $s_3$ otherwise. Similarly, the remaining states can be reached by successive events of the CLK signal.

[Figure: state diagram with states S1 to S4; arcs are labelled Input/Output, i.e., RST/Q3Q2Q1Q0, with the labels 0/0010, 0/0100, 0/1000, 0/0001 and 1/0001 (three arcs).]

Figure 2.7: FSM model of the four-bit Shift Register

2.2.2 Petri Nets

The Petri net model [Petri, 1962] is another type of state-oriented model, which can efficiently expose concurrency. A Petri net is a directed, weighted, bipartite graph $PN = \langle P, T, F, W \rangle$, where $P$ is a set of vertices called places (represented as circles), $T$ is another set of vertices called transitions (represented as bars), $P \cap T = \emptyset$ holds, $F \subseteq (P \times T) \cup (T \times P)$ is a set of arcs, describing the pre-set and post-set relationship, and $W : F \mapsto \mathbb{Z}$ is a weight function, where a set of $k$ arcs is represented by a single $k$-weighted arc [Murata, 1989]. It should be noted, however, that the Petri net literature sometimes shows a rather different formulation: instead of considering a weight function $W$, there are approaches where bag theory, i.e., set theory augmented to allow multiple occurrences of its elements, is used [Peterson, 1981].

[Figure: Petri net with places p1 to p6 and transitions t1 to t4.]

Figure 2.8: PN representation of a binary signal

Figure 2.8 shows the PN representation of the binary signal (Figure 2.1), where the states $S_1$ and $S_0$ are represented by places $p_2$ and $p_5$ respectively. Place $p_4$ represents the request for a change from $S_0$ to $S_1$, in other words a signal rise. Likewise, $p_3$ represents a signal fall. Places $p_1$ and $p_6$ show how the environment should consume the tokens in $p_2$ and $p_5$, without losing information about the present state of the system. In general, the concept of state is associated with tokens (pictorially represented as dark dots), which reside in places and circulate through the net by means of the firing of transitions. The mapping of tokens to places is given by the marking function $\mu : P \mapsto \mathbb{N}$. Thus, the following mappings define the $\mu$ function for this example: $\mu(p_1) = 0$, $\mu(p_2) = 0$, $\mu(p_3) = 0$, $\mu(p_4) = 1$, $\mu(p_5) = 1$, and $\mu(p_6) = 1$. A vector notation¹ is used to represent such an instance of the marking function as follows:

$$\vec{\mu} = (0, 0, 0, 1, 1, 1)^T$$

¹ The following notation is used hereafter:

$$\vec{\mu} = \begin{pmatrix} \mu(p_1) \\ \mu(p_2) \\ \vdots \\ \mu(p_n) \end{pmatrix} = (\mu(p_1), \mu(p_2), \cdots, \mu(p_n))^T$$

where the last $T$ denotes the transpose operation.


25

The dynamic evolution of µ is determined by the firing of transitions. A transition t ∈ T may fire only if it is enabled, i.e., if every place $p_i \in P$ in its pre-set contains at least $W(p_i, t)$ tokens. The firing of a transition t removes $W(p_i, t)$ tokens from each place $p_i$ of its pre-set and adds $W(t, p_j)$ tokens to each place $p_j$ of its post-set. Since a vector notation is assumed, these dynamics can be expressed using the incidence matrix defined in Definition 4.

Definition 4 Let n = |P| and m = |T|. Then the incidence matrix $D = [d_{ij}]$ of a PN is an n × m matrix such that $d_{ij} = d^+_{ij} - d^-_{ij}$, where $d^+_{ij} = W(t_j, p_i)$ and $d^-_{ij} = W(p_i, t_j)$.²

Having defined the incidence matrix, it can be used to generate the marking $\vec{\mu}_{k+1}$ from the marking $\vec{\mu}_k$, as shown in Eq. (2.3).

\[ \vec{\mu}_{k+1} = \vec{\mu}_k + D \cdot \vec{x} \qquad (2.3) \]

where $\vec{x}$ is the firing vector ($|\vec{x}| = |T|$), i.e., a vector that contains 1 in entry j if transition $t_j$ fires at step k, and 0 otherwise. For example, for the marking µ presented above, t2 and t4 are enabled. Suppose that t2 fires; the new marking (after removing tokens from the pre-set, and adding them to the post-set) would be $\vec{\mu}' = (0, 1, 0, 0, 0, 1)^T$. In terms of the incidence matrix, the firing of t2 may be expressed as follows:

\[
\begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}
=
\begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \\ 1 \\ 1 \end{pmatrix}
+
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & -1 & 0 \\
0 & 0 & -1 & 0 \\
0 & -1 & 0 & 0 \\
0 & -1 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\cdot
\begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}
\]

2.2.3

Signal Transition Graphs

Signal Transition Graphs (STG) [Rosenblum and Yakovlev, 1985] are a type of labelled Petri net, i.e., a PN where every transition is labelled with a symbol (from an alphabet), which have been widely used to design asynchronous circuits [Miyamoto and Kumagai, 1997; Yakovlev and Koelmans, 1998; Yakovlev et al., 2000]. The suitability of STGs for asynchronous designs comes from their capability to efficiently capture dependencies among events from different binary signals. Formally, an STG is a triple STG = ⟨PN, A, λ⟩, where PN is the underlying Petri net, A is a set of signals, and λ : T ↦ A × {+, −} is a labelling function. The set A captures all signals in the system, and the λ function assigns a signal change to transitions from the underlying PN, such that for a signal α ∈ A, there are two symbols, α+ and α−, assigned to two transitions, which represent the rising and falling edge of such a signal, respectively.

² Note the implicit matrix transposition given by the swapped indexes (footnote to Definition 4).

[Figure 2.9: STG model of a four-phase handshake protocol. (a) timing diagram of the signals Data, Req and Ack; (b) STG representation, with transitions Req+, D+, D−, Ack+, Req− and Ack−]

In order to aid the understanding of this model, Figure 2.9(a) shows a timing diagram for a four-phase handshake protocol [Furber and Day, 1996], which is modelled by means of the STG in Figure 2.9(b). It can be observed from (a) that, e.g., the rising edge of signal Ack happens only after the rising edge of signal Req, which corresponds in (b) to transition Ack+ being able to fire only after the firing of Req+. Furthermore, two threads can be seen due to the rising of Ack, which is associated with the parallel branches after firing Ack+.
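The role of the labelling function λ can be illustrated with a small Python sketch (not part of the original text). It parses labels of the form α+ and α− and checks that, along a firing sequence, every signal alternates between rising and falling edges. The trace used is the usual four-phase order Req+, Ack+, Req−, Ack−; the Data signal of Figure 2.9 is omitted for brevity, all signals are assumed to start low, and the names label and alternates are hypothetical.

```python
SIGNALS = {"Req", "Ack"}   # the set A (Data omitted in this sketch)


def label(text):
    """Parse a transition label such as 'Req+' into (signal, edge)."""
    signal, edge = text[:-1], text[-1]
    if signal not in SIGNALS or edge not in ("+", "-"):
        raise ValueError(f"not a valid label: {text!r}")
    return signal, edge


def alternates(trace):
    """True if each signal strictly alternates rising/falling edges.

    Assumption: every signal is low before the first transition fires.
    """
    level = {signal: 0 for signal in SIGNALS}
    for signal, edge in map(label, trace):
        if edge == "+" and level[signal] == 1:
            return False          # rising edge on a signal that is already high
        if edge == "-" and level[signal] == 0:
            return False          # falling edge on a signal that is already low
        level[signal] = 1 if edge == "+" else 0
    return True


print(alternates(["Req+", "Ack+", "Req-", "Ack-"]))  # True
print(alternates(["Req+", "Req+"]))                  # False
```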

Chapter 2: Background and Related Work

2.2.4

High Level Petri Nets

High Level Petri Nets (HLPN) [Jensen and Rozenberg, 1991] have been reported to tackle more sophisticated systems, where the semantics of classical PNs is not capable of representing the complete behaviour of the system, or at least not efficiently. HLPNs include two main approaches: Predicate/Transition Nets (PrT-Nets) [Genrich, 1987] and Coloured Petri Nets (CPN) [Jensen, 1992; Jensen, 1994; Jensen, 1997]. Both approaches have been applied to embedded system modelling [Kleinjohann et al., 1997; Grode et al., 1998], but CPN has been leading the way. The CPN model is based on a PN structure, but tokens are allowed to have different colours. Thus, by means of expressions attached to the arcs of the net, it is possible to capture the behaviour of systems in a more accurate way: the actions performed by a transition t ∈ T depend on the colour of the tokens at its input places.

Formally, a CPN is a tuple CPN = ⟨Σ, P, T, A, N, C, G, E, I⟩, where: Σ is a finite set of non-empty colour sets; P and T are, as expected, sets of places and transitions; A is a finite set of arcs; N : A ↦ P × T ∪ T × P is a node function; C : P ↦ Σ is a colour function; G : T ↦ 𝔹 is a guard function, such that Type(Var(G(t))) ⊆ Σ; E : A ↦ C(p) is an arc expression; and I : P ↦ C(p) is an initialisation function. A token element is a pair ⟨p, c⟩, where p ∈ P and c ∈ C(p). Each token on a place p of a CPN model must have a data value that belongs to the set C(p). Despite its good and friendly graphical simulation interface, CPN is still undergoing some improvements in terms of its formal verification [DAIMI, 2002].

2.2.5

Data Flow Graphs

The modelling of transformational embedded systems has been primarily based on data flow graphs (DFG) [Davis and Keller, 1982], due to their suitability for scheduling. This model consists of a directed graph G = (V, E), where nodes V = {ν1, ν2, · · · , νN}, N = |V|, represent computations and the arcs E ⊆ V × V describe the data dependency between operations and precedence constraints.

A multiplier based on iterative additions is used as an illustrative example. Figure 2.10(a) shows the schematic implementation of the multiplier, which takes two parameters (a and b) as inputs and produces a result that is stored in the output register (c). The functionality of the multiplier is to calculate the product c = a · b when enabled, i.e., when the en signal holds, and acknowledge with a ready signal. The acknowledgement takes place after several iterations, and it indicates that the result has been placed in the output. A pseudocode description for such a behaviour is presented in Figure 2.10(b).

[Figure 2.10: Multiplier. (a) diagram: block Multiplier with inputs a, b and en, and outputs c and ready; (b) pseudocode:]

    int mult(int a, int b) {
        int x, y, z;
        x ← a;
        y ← b;
        z ← 0;
        while (y > 0) {
            z ← z + x;
            y ← y - 1;
        }
        return z;
    }

    c = mult(a, b);

[Figure 2.11: DFG model of one cycle of the multiplier given in Figure 2.10. Inputs X, Z and Y; operations O1 (>), O2 (+) and O3 (−) in steps 1, 2 and 3; outputs X, Z and Y]

One cycle of the multiplier's functionality is captured in a data flow graph, which is depicted in Figure 2.11. This model shows three steps, each of which contains one operation. In the first step, a comparison is carried out in order to decide whether to perform or bypass the following two operations. The second step contains the addition operation which, on successive instances of the cycle, leads to the desired result. Finally, the third step contains the subtraction operation that controls the cycle execution by a descending counter.
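The three steps described above can be mimicked in a few lines of Python. This is only a sketch under stated assumptions, not part of the original text: one_cycle is a hypothetical name for a single pass through the comparison, addition and subtraction operations, and mult repeats that cycle in the same way as the pseudocode of Figure 2.10(b).

```python
def one_cycle(x, y, z):
    """One pass through the DFG: compare, then add and subtract (or bypass)."""
    perform = y > 0       # step 1: comparison decides perform/bypass
    if perform:
        z = z + x         # step 2: addition accumulates the partial result
        y = y - 1         # step 3: subtraction drives the descending counter
    return x, y, z, perform


def mult(a, b):
    """Iterate the cycle until the comparison bypasses the operations."""
    x, y, z = a, b, 0
    perform = True
    while perform:
        x, y, z, perform = one_cycle(x, y, z)
    return z


print(mult(3, 4))  # 12
print(mult(5, 0))  # 0
```

As in the pseudocode, a non-positive second operand bypasses the loop and yields 0.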

2.3

The Control/Data-Flow Heterogeneity

The impact that complexity and heterogeneity have on the burgeoning technology has been highlighted in many parts of this thesis. In particular, Section 2.2 has identified two types of heterogeneity: at the IDR level and at the implementation level. The aim of this section is to develop further the first of these two types, and to review the work carried out in the realm of modelling of embedded systems considering the control/data-flow heterogeneity.

In a simplistic view, a processor may be considered as a set of registers holding information about its current state, while some combinational logic transforms this information into the next state. This is also known as a sequential machine, depicted in Figure 2.12(a). However, the actual architecture of the processor is more complicated: some registers in the processor architecture are intended to hold intermediate data values relevant to the computation being performed, namely data registers, while others are strictly concerned with the state information (control registers). Therefore, since embedded systems contain one or more processors, a more realistic approach needs to take into account this division among registers, separating those belonging to the control part from those belonging to the data part. The combinational logic used to transfer and manipulate these data values comprises the datapath of the processor (or embedded system), while the part designed to compute the next state is called the controller. Figure 2.12(b) shows a more accurate model of a processor architecture, where the datapath and controller are identified as two separate but connected units [Ward and Halstead, Jr., 1994].

Finding a model that is capable of dealing with both control and data parts of the architecture is not an easy task, since the intrinsic features of each part are almost completely disjoint, i.e., there is little analogy between transformational and reactive elements of the embedded system.
For example, finite-state machines, Petri nets, and signal transition graphs are suitable for designing the controller part of an embedded system, but cannot directly handle the datapath. On the other hand, data flow graphs are suitable for capturing issues related to the datapath part of embedded systems, but they are difficult to manipulate in such a way that the controller can be generated from them.

[Figure 2.12: Processor architecture. (a) Sequential machine: combinational logic and registers, with Inputs, Outputs and Clock; (b) Controller+Datapath: control logic with control registers and datapath logic with data registers, connected by Signals and Conditions, with Inputs, Outputs and Clock]

Thus, the following section presents an overview of the field of embedded systems IDR that are suitable to cope with the control/data-flow heterogeneity.

2.4

Taxonomy of Embedded System Models

In this section we describe some relevant formal models that have been used when describing embedded systems [Lavagno et al., 1999]. In order to organise the findings of the literature review, the models presented in this section are classified according to the following taxonomy:

1. Models originally developed for control-dominated embedded systems, and later expanded to include data flow (for ease of reference, these models are called $M_{CD}$),
2. Models developed on a data-dominated basis, extended to support also control flow (referred to as $M_{DC}$), and
3. Unbiased models specifically developed to deal with combined control/data-flow interactions ($M_{\bar{B}}$).

2.4.1

$M_{CD}$ models

Intuitively, one way to make the FSM definition given in Section 2.2.1 suitable for supporting data type information is by extending each element of S, I and O from the Boolean to the integer domain. This has been treated in [Gajski, 1997], and given the name of Finite-State Machine with Datapath (FSMD). The FSMD model is a tuple ⟨S, V, I, O, f, h⟩ where S is the set of states, V is the set of datapath variables, $I : I_C \times I_D$ and $O : O_C \times O_D$. State transitions may include arithmetic and logic operations on the set of datapath variables.

The Co-design Finite State Machine (CFSM) [Chiodo et al., 1994] is an FSM that has been extended in order to support data handling and asynchronous communication. The CFSM model operates on a set of integer variables with arithmetic, relational and logical operators. This model is suitable for control-dominated applications of relatively low algorithmic complexity [Chiodo et al., 1993]. It has been used within the framework of the POLIS co-design environment [Lavagno et al., 1996; Balarin et al., 1997; UCB, 1999], where specifications are usually given in the ESTEREL language [Boussinot and Simone, 1991; Berry and Cosserat, 1984; Berry, 1991]. Moreover, CFSM might allow a different input specification, as pointed out by Bates et al. [1999], who used StateChart graphs instead [Harel, 1987; Drusinsky and Harel, 1989].

The Quenya Control/Data-Flow Graph (CDFG) is based on the coloured Petri net (CPN) model introduced in Section 2.2.4. A Quenya CDFG [Brage, 1993] is a directed graph where nodes represent data operations, control operations, communication and miscellaneous, and edges form the control/data-flow. Edges may contain two types of tokens, which are: d ∈ D for data-types, s ∈ S for value-less points of execution, or t ∈ T = D ∪ S for a combination of both. This model is the kernel of the LYCOS system [Madsen et al., 1997], which has been proposed for co-synthesis of embedded systems.
Unlike FSMD and CFSM, the CDFG model provides a mechanism to estimate hardware area, through the derivation of Basic Scheduling Blocks (BSB). The BSB hierarchical representation of CDFGs allows the designer to evaluate various hardware/software solutions.

Recently, Cortés et al. [2000] have proposed a Petri net based Representation for Embedded Systems (PRES+), which addresses not only the modelling but also the verification of embedded systems. The PRES+ model is based on a notation that is similar to CPN, where an integer value (v) and a time stamp (t) are associated with each token in the net, in order to handle data information. This model has been applied to formal verification of Timed CTL (TCTL) properties [Alur et al., 1993] using a transformation of PRES+ into Timed Automata [Alur and Dill, 1990]. Models constructed at different levels of abstraction are possible by means of hierarchy [Cortés et al., 2001]. Hierarchy is carried out by defining a set of super-transitions (Λ) that have the same semantics as ordinary transitions, but can be decomposed into other PRES+ models.

[Figure 2.13: PRES+ model of the multiplier. Places A, B, C, X, Y, Z, X1 to X3, Y1 to Y3 and Z1 to Z3; transitions t1 to t10, with guards [y>0] and [y=0] and transition functions 0, z+x and y−1]

In order to illustrate the underlying principles of the model, Figure 2.13 shows a PRES+ representation of the multiplier example (c.f. Figure 2.10). The inputs to the multiplier are places A and B, since they are only part of I ⊆ P × (Λ ∪ T), and not part of O ⊆ (Λ ∪ T) × P. Likewise, the output of the multiplier is bound to place C, because C ∈ O, C ∉ I. The initialisation of the multiplier is carried out by firing t1, t2, and t3, which transfers the tokens in A and B to X and Y respectively (for the first two transitions), and sets the value of the token in Z to 0 (for the firing of t3). The main loop of the multiplier is captured by t7, t8, and t9, where the guard function [y>0], and t9's transition function y−1, indicate that the loop will be repeated Y times. The result is taken out to place C, when the guard function [y=0] allows t10 to fire.

2.4.2

$M_{DC}$ models

The Flow Graph Model (FGM) [Gupta and De Micheli, 1994] is a graph representation based on the data flow graph model, where a polar acyclic graph is used to represent sequential threads of execution. Modelling an embedded system with this approach implies the use of a set Φ = {G1, G2, . . . , Gn} of FGMs, a set of timing constraints, and a set of resource constraints. This model has been used in the VULCAN system [Gupta and De Micheli, 1992], where a specification written in the HardwareC language [Ku and De Micheli, 1990a] is translated into FGM by a compiler called Hercules [Ku and De Micheli, 1990b]. There are two particular nodes in the FGM model, ν0 and νN, which are source and sink operations respectively; conjunction is indicated by the ∗ symbol, while + indicates disjunction. A loop in the specification is captured by an infinite repeat count of Gi, i.e., the source operation is unconditionally called on the completion of the sink operation. For instance, Figure 2.14 represents the multiplier example (c.f. Figure 2.10) by means of the FGM model. It can be observed that the while loop is captured in a sub-graph of the FGM model, which has been drawn at the right side of the figure. Also shown is the conjoined execution of three rd nodes, one for each initialisation, which has to be performed before the loop can be executed.

The System Property Intervals (SPI) model consists of processes communicating through unidirectional channels, and is defined in terms of a tuple ⟨P, C, E⟩, where P is a set of processes, C = Q ∪ R is a set of channels (composed of queues Q and registers R) and E = (P × C) ∪ (C × P) denotes a set of edges [Ziegenbein et al., 1999]. The execution of an SPI process depends on data availability, allowing non-constant data rates which are used to capture the conditional behaviour of the embedded system.

2.4.3

$M_{\bar{B}}$ models

The $M_{CD}$ and $M_{DC}$ models outlined in Sections 2.4.1 and 2.4.2 provide a good basis for the embedded system design flow, but they have not been developed specifically to cope with the intrinsic features of the control/data-flow interaction that exists in embedded system design. Combinations of such state- or activity-oriented approaches are the basis of $M_{\bar{B}}$ models, which undertake this interaction in a more equitable way.

[Figure 2.14: FGM model of the multiplier. Source and Sink nop nodes, three conjoined (∗) rd nodes, and join, loop, add, sub and wr nodes]

Extended Timed Petri Nets (ETPN) were introduced in the CAMAD high-level synthesis system [Peng and Kuchcinski, 1994] and have since been extended in order to be used in the framework of hardware/software co-design, yielding a model called PURE [Stoy, 1995]. The ETPN model [Peng, 1987] explicitly captures the intermediate result of a design, allowing accurate decisions to be made in the synthesis process. It consists of two separate but related parts: the control part and the datapath. The first part is captured as a Timed Petri Net (TPN) [Merlin and Faber, 1976], while the second part is represented as a directed graph where nodes are used to capture data manipulation and storage. By making data dependencies explicit, i.e., by means of the directed graph, communication requirements are easily identified. A set of labels defines the functionality of each node in the directed graph. For instance, R is used for registers, IP and OP for input and output ports respectively, C for conditions, + for adders, − for subtracters, and constants are enclosed between quotation marks.

The applicability of ETPN to the modelling of embedded systems is illustrated in Figure 2.15, where the control part and data path of the multiplier example (c.f. Figure 2.10) are captured in Figures 2.15(a) and 2.15(b) respectively. A change in the control part, i.e., changing the marking of the underlying PN, affects the data path in the sense that data is transferred through all edges of the data path that are labelled with a state s ∈ S that contains a token in the control part. For example, the firing of transition t1 allocates tokens in places s1, s2, and s3, which leads to data transferring from IPA to RX, from IPB to RY, and from "0" to RZ, in the data path. Then, iterative firings of t4 and t6 lead to the desired multiplication result, since places s4 and s5, which are associated with the subtract and the addition operations, change their marking according to such firings.

[Figure 2.15: ETPN representation of the multiplier. (a) Control part: places S0 to S8, transitions t1 to t6, conditions C1 and /C1; (b) Data path: input ports IPA and IPB, registers RX, RY and RZ, output port OP C, constants "0" and "1", operators +, − and >, and condition C1]

Functions driven by state machines (FunState) [Strehl et al., 2001] is a very expressive model, which aims to be a superset of some of the models introduced in this section. FunState explicitly separates control and data flow, by means of a finite-state machine M and a network N respectively. The network N = ⟨F, S, E⟩ is composed of: a set of functions F, a set of storage units S, and a set of arcs E ⊆ (F × S) ∪ (S × F), which underpins the fact that there is an underlying bipartite graph between storage units and functions. Figure 2.16 shows the multiplier example modelled by means of the FunState model. Like ETPNs, the FunState model of the multiplier has a separate control and data flow, as can be observed from the two parts that the model depicts. The top part of the model is related to the network N, while the other one contains the state machine M. In order to obtain the product of the two inputs i1 · i2, the underlying state machine M repeatedly performs a cycle of two transitions, one that leads to the execution of f2, while the other one executes f3 based upon the value stored in register r2. Thus, f2 and f3 are the addition and subtraction operations, respectively.
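The separation between a controlling state machine and the functions it triggers can be sketched in Python as follows. This is an illustration under assumptions, not the FunState semantics of [Strehl et al., 2001]: the roles given to f1 and f4 (loading the registers and writing the output), the use of r1 and r3 as operand and accumulator, and the state names init, test, add and done are all hypothetical; only the facts that f2 adds, f3 subtracts, and that the loop is guarded by the value of r2 are taken from the text.

```python
def f1(n):
    """Assumed: load the inputs into r1 and r2 and clear the accumulator r3."""
    n["r1"], n["r2"], n["r3"] = n["i1"], n["i2"], 0


def f2(n):
    """Addition (from the text); assumed to accumulate r1 into r3."""
    n["r3"] += n["r1"]


def f3(n):
    """Subtraction (from the text): descending counter held in r2."""
    n["r2"] -= 1


def f4(n):
    """Assumed: copy the accumulator to the output o."""
    n["o"] = n["r3"]


# State machine M: state -> list of (guard, function, next state).
M = {
    "init": [(lambda n: True, f1, "test")],
    "test": [(lambda n: n["r2"] > 0, f3, "add"),
             (lambda n: n["r2"] == 0, f4, "done")],
    "add":  [(lambda n: True, f2, "test")],
}


def run(i1, i2):
    """Let M drive the functions of the network until the output is written."""
    n = {"i1": i1, "i2": i2, "o": None}   # storage units of the network
    state = "init"
    while state != "done":
        for guard, func, nxt in M[state]:
            if guard(n):
                func(n)
                state = nxt
                break
        else:
            raise RuntimeError(f"no enabled transition in state {state!r}")
    return n["o"]


print(run(3, 4))  # 12
```

Each loop iteration takes two transitions of M, one executing f3 and one executing f2, which mirrors the two-transition cycle described above.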

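[Figure 2.16: FunState model of the multiplier. Network N with inputs i1 and i2, functions f1 to f4, registers r1 to r3 and output o; state machine M with transitions labelled /f1, r2$1>0/f3, /f2 and r2$1=0/f4]

2.4.4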

Summary of Models

Section 2.4 has outlined a variety of approaches, sorted in a structured taxonomy, that have been proposed in order to tackle the control/data-flow heterogeneity. $M_{CD}$ models are, in general, extensions to state-based approaches, such as Petri nets or FSMs. When based on PNs, $M_{CD}$ models consist of transitions that represent changes in the states and places associated with data activity. For instance, the CDFG model [Brage, 1993] uses places of a Coloured PN to represent functions in either data or control domain, while PRES+ models [Cortés et al., 2000] use these places for storage purposes. There are also $M_{DC}$ models that have an underlying PN semantics, e.g., the SPI model [Ziegenbein et al., 1999] uses a bipartite graph to represent processes and channels.

Control or data domains entirely ruled by Petri net semantics are commonplace in the literature of $M_{\bar{B}}$ models. However, what is not so common is a unified semantics for both domains. For instance, the ETPN model [Peng, 1987; Peng and Kuchcinski, 1994] uses a Timed PN to represent the control flow of the embedded system model, whereas the recently introduced FunState model [Strehl et al., 2001] utilises a Coloured PN structure in the data domain to represent activity by means of queues.

2.5

Formal Verification

Formal verification has been widely applied to validate both hardware [Kern and Greenstreet, 1999] and software [Bérard et al., 2001] parts of an embedded system architecture. This form of validation is rapidly gaining popularity among researchers, due to the successful results achieved in the verification of industrial embedded system products by formal methods. Examples of such achievements include the verification of the Futurebus+ cache protocol [Clarke et al., 1995], and the bug found in the floating-point division (FDIV) routine of the Pentium processor [Coe, 1995]. Although numerous formal verification methods exist, they all fall into one of two categories: deductive methods or state-exploration methods. Deductive methods, although not fully automated, are motivated by the analogy that exists between mathematical theorems and system properties. State-exploration methods, on the other hand, are completely automatic, and use either a Kripke structure [Emerson and Clarke, 1982] or a Labelled Transition System (LTS) [Keller, 1976] for the state manipulation. This section reviews the underlying principles of formal verification and the work carried out in the realm of both categories.

2.5.1

Deductive Methods

Proving the correctness of a design by deductive methods is not a trivial matter [Duffy, 1991]. These methods are based on interactive assistants, called theorem provers, which help with the formulation of axioms A, inference rules R, and theorems T, in order to prove that an implementation meets its specification. Axioms are logical formulae involving any type of temporal logic (c.f. Section 2.1.2). Inference rules are used to derive new theorems from existing ones, and have the following form:

\[ \frac{\alpha_1, \alpha_2, \ldots, \alpha_k}{\beta} \]

where $\alpha_i$, $\forall\, 1 \le i \le k$, are called premises and $\beta$ is the conclusion. Finally, theorems are a finite sequence of formulas Φ = {ϕ1, . . . , ϕn}, which can be either obtained from the set A or by applying R to φ ⊆ Φ. Formally, this is:

\[ \Gamma \vdash \varphi \]

where Γ ⊆ A. Well known theorem provers are, e.g., HOL [Gordon and Melham, 1993] and PVS [Owre et al., 1996]. For further information about deductive methods, the reader is referred to [Duffy, 1991].

2.5.2

State-exploration Methods

Unlike deductive methods, state-exploration approaches base their methodology on the unique characterisation introduced in Section 2.1.1, i.e., the state space of the system. Much research has been devoted to state-exploration methods in both industry and academia, since they are fully automatic and provide an answer to several aspects of the design, such as safety and robustness. One of the state-exploration methods that is gaining popularity is model checking. Based upon the concept introduced in Definition 3, the model checking problem [Clarke et al., 1999; Alur et al., 1993] is formulated in Definitions 5 and 6.

Definition 5 (CTL model checking) Given a Kripke structure M and the computation tree logic formula ϕ, does M, s ⊨ ϕ hold, for all states s ∈ S?

Definition 6 (LTL model checking) Given a linear-time structure M and the linear temporal logic formula ϕ, does M, π ⊨ ϕ hold, for a given path π?

The idea behind model checking is to decide whether or not a model satisfies a given specification, assuming that the state space of the design is finite, and the specification is composed of a finite set of temporal logic formulas, called properties. Properties can be either functional or timing related. Functional properties are most commonly expressed in either LTL or CTL (c.f. Section 2.1.2), as opposed to timing related properties, which need an explicit notation for time, such as TCTL. The debate on which temporal logic is more suitable to express functional properties for model checking systems goes as far back as [Pnueli, 1985; Emerson and Halpern, 1986; Kupferman and Vardi, 1995], and is still a matter of current investigation [Vardi, 2001]. The work presented in [Clarke et al., 1997] attempts to bridge the gap between these two logics and has allowed traditional CTL model checking tools to accept LTL properties as specification.
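To make Definition 5 more tangible, the following Python sketch (not part of the original text) checks two CTL formulas on a tiny Kripke structure by explicit state exploration. EF p is computed as a least fixpoint, and AG p as the complement of EF ¬p. The structure is an assumption made for illustration: its states and transitions loosely follow the four-state ring of Figure 2.7, and the atomic propositions q0 to q3 are hypothetical labels.

```python
def pre_exists(R, target):
    """States with at least one successor in target (the EX operator)."""
    return {s for (s, t) in R if t in target}


def check_EF(R, sat_p):
    """Least fixpoint of Z = p OR EX Z: states from which p is reachable."""
    reach = set(sat_p)
    while True:
        new = reach | pre_exists(R, reach)
        if new == reach:
            return reach
        reach = new


def check_AG(S, R, sat_p):
    """AG p = NOT EF NOT p."""
    return set(S) - check_EF(R, set(S) - set(sat_p))


# Assumed Kripke structure: states, total transition relation, labelling.
S = {"s1", "s2", "s3", "s4"}
R = {("s1", "s2"), ("s2", "s3"), ("s3", "s4"), ("s4", "s1"),
     ("s2", "s1"), ("s3", "s1")}
L = {"s1": {"q0"}, "s2": {"q1"}, "s3": {"q2"}, "s4": {"q3"}}


def sat(atom):
    """States whose label contains the atomic proposition."""
    return {s for s in S if atom in L[s]}


print(check_EF(R, sat("q3")) == S)   # True: EF q3 holds in every state
print(check_AG(S, R, sat("q0")))     # set(): AG q0 holds in no state
```

The sets manipulated here are explicit; a symbolic model checker would represent the same sets and the relation R implicitly rather than enumerating states.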


Symbolic Model Checking

Although automatic, early work on model checking [Clarke et al., 1986] did not make a great impact on embedded system verification, since the state exploration was carried out explicitly, i.e., constructing the entire state space. Clearly, the limitation of explicitly exploring the state space is that the designer is usually faced with a so-called state explosion problem, even for relatively small models. One way to carry out more efficient verifications is using an implicit representation of the state space, hence avoiding the search through the entire state space. Two directions for implicit-state model checking can be identified: symbolic and non-symbolic.

Model checking based on symbolic methods makes use of some canonical representation of the state space, which can cope better than explicit methods with scalability issues. Symbolic model checking [McMillan, 1993; Henzinger et al., 1994] is based on the OBDD representation introduced in Section 2.1.3. Since BDDs are, in general, much smaller than the state space they are representing, they can handle larger state spaces without running into state explosion [Burch et al., 1992]. However, depending on the Kripke (or linear-time) structure being used and the property being checked, the symbolic approach can still lead to large BDDs, especially in the data domain. Thus, alternative methods for model checking, not based on BDDs, are emerging as a viable solution to the state explosion problem. Examples of such methods are Boolean satisfiability (SAT) solvers [Biere et al., 1999] and Automatic Test Pattern Generation (ATPG) methods [Boppana et al., 1999], which are gaining popularity due to their "on-the-fly" mechanism, i.e., the state space is stored neither explicitly nor in a compact form; rather, the state transition is generated from one step of a branch-and-bound algorithm to the next one.

Both symbolic and non-symbolic methods have their limitations, which can be put forward as a trade-off between space (memory) and time. On the one hand, symbolic model checking may require big BDDs for some structures, which lead to large amounts of memory used for the verification process. On the other hand, SAT and ATPG based methods may not have enough information about the system to produce a result in an acceptable time, thus taking long paths through the state space and, consequently, being unable to reach a solution.


Model Checking Tools

Nearly a hundred tools based on formal methods are listed in [CAFM, 2003], which range from "light-weight" automation of tedious tasks, to "heavy-weight" tools that comply with industrial standards. Well-established tools include Murϕ, SPIN and SMV. At an early stage in the evolution of formal verification, Dill et al. [1992] introduced the Murϕ system [Stanford, 1996], which explicitly explores the state space of the system by both depth- and breadth-first searching techniques, in order to reason about properties. Despite its strength, explicit exploration is always limited compared to implicit model checking. Thus, Bell Labs and Carnegie Mellon University developed some robust and efficient tools for model checking, which are still amongst the most popular in the area. For example, the SPIN tool [Holzmann, 1997; Bell, 2003] developed at Bell Labs does on-the-fly LTL model checking [Holzmann, 1991] of systems specified in the PROcess MEta LAnguage (PROMELA). Another example is the SMV tool [McMillan, 1993; CMU, 1997] developed at Carnegie Mellon, which performs symbolic model checking of systems described in its own language. Cadence SMV [Cadence, 2001] is an enhanced version of the traditional SMV tool that is more suitable for larger designs, not only because it has a syntax which allows for more complex structures, but also because it exploits the existence of symmetry in scalar types. The Cadence SMV tool accepts specifications written in both CTL and LTL.

2.6

Concluding Remarks

Clearly, models that do not capture the intrinsic features of both control and data flow, such as those described in Section 2.2, are not suitable for current (and future) embedded system modelling. Thus, there has been an increasing need for research efforts towards the exploitation of such features in embedded systems design. One of the issues raised in such investigation is the intricate relationship between control and data flows. A common way to address this issue is by taking an insufficiently expressive model and extending its capabilities, in order to allow a combined control and data flow representation, hence overcoming such weakness in its semantics. Previous work [Brage, 1993; Peng and Kuchcinski, 1994; Mirkowski and Yakovlev, 1998; Cortés et al., 2000] has shown that Petri net based models are particularly suitable for internal design representation of embedded systems, which are normally composed of several tasks running in parallel, e.g., a digital filter may compute all the terms corresponding to the difference equation (i.e., an implementation of the $Z^{-1}$ transform of a transfer function) at the same time. This suitability is not surprising, since PNs are very intuitive and potentially applicable to construct highly concurrent models, which makes them suitable for embedded system specifications.

This thesis argues, however, that unbiased models specifically developed to target the interactions between control and data flow ($M_{\bar{B}}$) are the way forward in embedded system design, since they treat both control and data flow in an unbiased way. The models presented in Section 2.4.3 are based on a highly heterogeneous semantics, a consequence of being combinations of representations with very dissimilar aims. A conclusion that arises from the analysis provided in Section 2.4.4 is that a model with an ETPN-like notation for the control domain and a FunState-like representation for the data domain would be advantageous in the sense that the entire model is based on the same Petri net based semantics. Consequently, the next chapter formulates a new $M_{\bar{B}}$ model, which does have an underlying PN semantics in both control and data domain, hence a unified notation.

Chapter 3 Dual Flow Net model

The influence of modelling aspects on the overall cost of embedded systems is a rather intricate issue in the design. Chapter 1 has argued that incremental improvements in the modelling process do have a drastic influence in terms of the cost reductions achieved. A thorough analysis of the background material presented in Chapter 2 raises the motivation for further investigating one such modelling issue: the control/data-flow interrelation. A common shortcoming of the M_CD, M_DC, and M_B̄ models used as the core of a design flow for embedded systems lies in the unnatural treatment that the interaction between control and data flow receives.

The aim of this chapter is to present a new internal design representation that tackles the interrelation between control and data flow in embedded system designs. Unlike any other Petri net based representation in the realm of embedded system modelling, where the PN semantics is either circumscribed to the control domain [Brage, 1993; Cortés et al., 2000; Peng, 1987] or constitutes the basic structure of the data domain [Ziegenbein et al., 1999; Strehl et al., 2001], the model introduced in this chapter has a unified notation throughout the entire modelling process. That is, the same graphical notation and semantics are used to capture both control and data flow, leading to a uniform representation of the embedded system being modelled.

This chapter introduces the new Dual Flow Net (DFN) model as a well-defined formalism for the analysis of embedded systems, and also discusses its intrinsic features. The main contributions of the DFN model are:

• A conceptual modification in the first flush of PN theory: DFN uses a weighted, directed, tripartite graph, instead of the classical bipartite one.


• Capability to express changes in the state of the system by means of transitions, while arithmetic functions can be captured through a new set of vertices that are called hulls.
• A definition of the dynamics of the system that captures both control and data flows.

Both forms of Internal Design Representation (IDR), i.e., either textual format or graphical models, have their semantics classified according to the type of analysis they can perform: static or dynamic. In this way, the part of an IDR semantics that deals with properties of a static nature is called the structural model, while the part that deals with its dynamics is called the behavioural model. In textual languages, the interconnection of modules (or entities¹) sets up the structure of the model, while the behaviour is described in terms of a parallel and sequential execution of events. In graphical approaches, the structural model refers to the topology of the formalism, and the analysis of several instances of this topology brings out the behavioural model.

The rest of this chapter is organised as follows. Section 3.3 introduces the structure of the DFN model. The state space of such a model is defined in Section 3.4, while a definition of its behaviour follows in Section 3.5. Section 3.6 outlines an interesting analogy with the field of complex numbers, which serves as a basis for a graphical interpretation of DFN models. An example that illustrates the DFN principles is presented in Section 3.7, while Section 3.8 outlines a matrix analysis of DFN state equations. In Section 3.9, the question as to whether DFN models are decidable or not is tackled, in order to support the analysis provided in Chapter 4. Some concluding remarks are given in Section 3.10.

3.1 Preliminaries

As mentioned in Chapter 2 (Section 2.1.2), atomic propositions (AP) are variables from the Boolean domain, 𝔹 = {⊥, ⊤}, that express the results of elementary assertions. For the purpose of this dissertation, we assume that APs are composed of two parts: subject and predicate. The subject of an AP is the part that is not constant (e.g., “x” in “x > 0”), whereas the predicate is a property that the subject may or may not satisfy (e.g., “> 0”). Since APs express conditions, the predicate consists of a symbol taken from the finite set of conditional comparators, ] = {“=”, “≠”, “>”, “<”, “≤”}, and a constant (e.g., “0”). Thus, an AP can be defined in terms of a function that receives a triplet as an argument, such as the one described in Definition 7.

¹ Depending on whether the syntax is defined in the Verilog or the VHDL language.

Definition 7 An atomic proposition (AP) is a function of 3 arguments:

\[ AP : \underbrace{\mathbb{Z}}_{\text{subject}} \times \underbrace{\,]\, \times \mathbb{Z}}_{\text{predicate}} \mapsto \mathbb{B} \]

where the first argument is a variable x ∈ ℤ that represents the subject of the AP, while the second and third arguments are a symbol s ∈ ] and a constant K ∈ ℤ respectively, which constitute the predicate of the AP.

Definition 7 standardises the way APs are treated in the scope of this dissertation. As such, APs are considered to be a function that maps both the subject and the predicate into the Boolean domain 𝔹. For instance, the syntax AP(x, >, 0) needs to be specified in order to express the condition “x > 0”.
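As an illustration of Definition 7, the following minimal Python sketch treats an AP as a function of a subject, a comparator symbol and a constant. The dictionary `COMPARATORS` and its ASCII spellings of the comparator symbols are assumptions of this sketch, not notation from the thesis.

```python
import operator

# Hypothetical ASCII spellings for the set of conditional comparators;
# the mapping to Python operators is an assumption of this sketch.
COMPARATORS = {
    "=": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
    "<=": operator.le,
}


def AP(x: int, s: str, K: int) -> bool:
    """Evaluate the atomic proposition with subject x and predicate (s, K)."""
    return COMPARATORS[s](x, K)


# The condition "x > 0" from the text, evaluated for two values of x.
print(AP(3, ">", 0))
print(AP(0, ">", 0))
```

Keeping the comparator as data rather than code mirrors the role of the guard function introduced later, where a transition is labelled with a comparator symbol.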

3.2 Modelling the Control/Data-Flow paradigm

In general, systems with two flows of information, i.e., one control flow and one data flow, have three aspects to be considered by the underlying semantics. These are:

• the general state of the system, which characterises the system at a certain instant of time,
• the control flow, which is a composition of reactive elements, and
• the data flow, consisting of transformational elements.

This thesis argues that a system which has two flows, i.e., control flow and data flow, is best represented by a model which is isomorphic to a tripartite graph. Figure 3.1 depicts the interaction between the general state of the system and both sets of elements. Hereafter, we graphically represent a storage element by a circle, a reactive element by a bar, and a transformational element by a box. The DFN model and its elements are introduced in Definition 8. This definition is based upon the principles introduced in the next sections (cf. Section 3.3 for the concept of S, and Section 3.4 for the marking function µ).

Figure 3.1: A tripartite graph for control/data-flow systems (figure labels: Control Domain, Data Domain, Storage elements, Reactive elements, Transformational elements, Enabling signals, Conditions)

Definition 8 A Dual Flow Net is a pair N = ⟨S, µ0⟩, where:
S is a Dual Flow Structure;
µ0 is the initial marking.

3.3 DFN Structural model

Following Figure 3.1, the structure of the DFN model is based on a notation that comprises three types of vertices:

1. a set of vertices that captures the state of the system,
2. a set of vertices used to capture the changes in such states, i.e., the control flow, and to expose its influence over the third set of vertices, and
3. a set of vertices that captures all those transformations which are relevant to the data flow of the system, such as transferring information or performing arithmetical operations among registers.

Given a weighted, directed, tripartite graph, its vertices V = P ∪ T ∪ Q are used to represent storage, reactive, and transformational elements respectively². Storage elements (p ∈ P) relate to memory components in the system (e.g., registers, memory cells, latches, variables, etc.), reactive elements (t ∈ T) allude to components in the control part, and transformational elements (q ∈ Q) refer to arithmetic operations performed among storage elements (i.e., components in a data path).

² where P ∩ T = ∅, P ∩ Q = ∅, T ∩ Q = ∅, P ≠ ∅, and T ∪ Q ≠ ∅.

Definition 9 A Dual Flow Structure is a seven-tuple S = ⟨P, T, Q, F, W, G, H⟩, where:
P = {p1, p2, ⋯, pn} is a finite, non-empty set of places;
T = {t1, t2, ⋯, tm} is a finite set of transitions;
Q = {q1, q2, ⋯, qh} is a finite set of hulls;
F ⊆ (P × T) ∪ (T × P) ∪ (P × Q) ∪ (Q × P) ∪ (T × Q) ∪ (Q × T) is a binary relation, called the flow relation;
W : F ↦ ℤ⁺ ∪ ℤ⁻ is a weight function;
G : T ↦ ] ∪ {⊤} is a guard function, where ] is the set of conditional comparators (cf. Section 3.1), and ⊤ is the element of the binary set 𝔹 = {⊥, ⊤} that means true;
H : Q ↦ ℤ is an offset function.

The cardinality of each set P, T and Q is given by n, m and h respectively, where n > 0, m ≥ 0 and h ≥ 0 (also, m + h ≠ 0). For the sake of simplicity, this dissertation uses the notation W(x, y) to denote W((x, y)). From the pictorial point of view, each transition t ∈ T is labelled with a symbol from the set ], according to the guard function G(t), and hulls q ∈ Q are labelled with integers, corresponding to the offset function H(q). The purpose of these two functions is twofold: (a) the guard function G(t) provides a mechanism for the data flow to interfere in the control flow, i.e., to have conditional points in the control flow where the condition is taken from the data flow; and (b) the offset function H(q) completes the arithmetic used in the hull. The functionality of the hull, i.e., the action that takes place when a hull is executed, is detailed in Definition 18. Eq. (3.1) summarises that definition: the basic operation of a hull is to sum over the data domain, and the H(q) function is provided in order to cover those situations where a constant is needed.

\[ \mathrm{functionality}(q) = \overbrace{\sum_{q} S_D}^{\text{variable}} + \overbrace{H(q)}^{\text{constant}} \tag{3.1} \]

In order to reduce notational clutter, symbols ‘⊤’ and numbers ‘0’ (from G(t) and H(q) respectively) are not explicitly written across the net, i.e., transitions and hulls are only labelled in nontrivial cases.

The three disjoint sets of vertices in a DFN model, and their interactions, define the structure of the control and data domain shown in Figure 3.1. This means that there is a direct relation between the control (data) flow of the original embedded system specification and the control (data) domain of the S net. Both control and data domains are formalised in Definitions 10 through 13, where the concepts of pre- and post-sets, from classical Petri net notation, are extended in order to support any element p ∈ P, t ∈ T, and q ∈ Q of the new DFN model.

Definition 10 Given a certain p ∈ P, t ∈ T, and q ∈ Q, the following subsets are defined:
i. The control pre-set of a place, •p = {t ∈ T | (t, p) ∈ F};
ii. The control post-set of a place, p• = {t ∈ T | (p, t) ∈ F};
iii. The control pre-set of a transition, •t = {p ∈ P | (p, t) ∈ F};
iv. The control post-set of a transition, t• = {p ∈ P | (t, p) ∈ F};
v. The control pre-set of a hull, •q = {t ∈ T | (t, q) ∈ F};
vi. The control post-set of a hull, q• = {t ∈ T | (q, t) ∈ F}.

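Definition 11 The control domain of a Dual Flow Structure, S_C, is defined as follows:

\[ S_C = \Biggl[ \bigcup_{\forall p \in P} ({}^{\bullet}p \cup p^{\bullet}) \Biggr] \cup \Biggl[ \bigcup_{\forall t \in T} ({}^{\bullet}t \cup t^{\bullet}) \Biggr] \cup \Biggl[ \bigcup_{\forall q \in Q} ({}^{\bullet}q \cup q^{\bullet}) \Biggr] \]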

Definition 12 Given a certain p ∈ P, t ∈ T, and q ∈ Q, the following subsets are defined:
i. The data pre-set of a place, ◦p = {q ∈ Q | (q, p) ∈ F};
ii. The data post-set of a place, p◦ = {q ∈ Q | (p, q) ∈ F};
iii. The data pre-set of a transition, ◦t = {q ∈ Q | (q, t) ∈ F};
iv. The data post-set of a transition, t◦ = {q ∈ Q | (t, q) ∈ F};
v. The data pre-set of a hull, ◦q = {p ∈ P | (p, q) ∈ F};
vi. The data post-set of a hull, q◦ = {p ∈ P | (q, p) ∈ F}.

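Definition 13 The data domain of a Dual Flow Structure, S_D, is defined as follows:

\[ S_D = \Biggl[ \bigcup_{\forall p \in P} ({}^{\circ}p \cup p^{\circ}) \Biggr] \cup \Biggl[ \bigcup_{\forall t \in T} ({}^{\circ}t \cup t^{\circ}) \Biggr] \cup \Biggl[ \bigcup_{\forall q \in Q} ({}^{\circ}q \cup q^{\circ}) \Biggr] \]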



Inherited from classical PNs [Murata, 1989], a transition t such that •t = ∅ is called a source transition and a transition t such that t• = ∅ is called a sink transition. The same pattern is followed in the data domain, i.e., a hull with ◦q = ∅ is named a source hull, while its dual (q◦ = ∅) is called a sink hull. There are, however, some issues to be considered with regard to these four special cases. For instance, a sink hull q ∈ Q may only exist³ if there is at least one transition in its control post-set (q• ≠ ∅), and G(t) ≠ ⊤ holds for at least one of these transitions t ∈ q•.

The control and data flow of an embedded system are, of course, not completely independent. On the contrary, a model which deals with embedded systems should have a mechanism which allows the representation of inter-domain effects. By inter-domain effects we mean the influence of the control flow over the data flow and vice versa, e.g., the execution of a conditional branch (where the next operation to be executed is data dependent). Such influence is modelled in DFN by means of arcs in T × Q and Q × T, and the guard function G. The guard function G plays an important role in the behavioural DFN model, since it allows a transition t to have a functionality that depends not only on the control domain of the model, but also on its data domain. Section 3.5 elaborates on this concept in further detail.

3.3.1 Structural model example

The definitions introduced in Section 3.3 are illustrated by the example shown in Figure 3.2. This example consists of a DFN model of the Fibonacci algorithm [Séroul, 2000]. The structural part of the model is discussed in this section, while Section 3.5.1 analyses the behavioural part. This DFN model consists of three places, three transitions and three hulls. The places of the net, p1, p2, and p3, have the following pre- and post-sets:

                    p1            p2                p3
control pre-set     •p1 = {t1}    •p2 = {t3}        •p3 = {t2}
control post-set    p1• = {t2}    p2• = {t1}        p3• = {t3}
data pre-set        ◦p1 = {q3}    ◦p2 = {q2}        ◦p3 = {q1}
data post-set       p1◦ = {q1}    p2◦ = {q1, q3}    p3◦ = {q2}

Figure 3.2: DFN model of the Fibonacci algorithm (places p1, p2, p3; transitions t1, t2, t3; hulls q1, q2, q3)

Likewise, transition t1 has a control pre-set •t1 = {p2}, a control post-set t1• = {p1}, and a data post-set t1◦ = {q1}. For transition t2, •t2 = {p1}, t2• = {p3}, and t2◦ = {q3} hold. And transition t3 has •t3 = {p3}, t3• = {p2}, and t3◦ = {q2}. For this example, ◦t1 = ◦t2 = ◦t3 = ∅ holds, which means that there are no conditions from the data domain affecting the execution of these transitions. As to the three hulls q ∈ Q, the control pre-set of q1 is given by •q1 = {t1}, the data pre-set is ◦q1 = {p1, p2}, and its data post-set is q1◦ = {p3}. Similarly, •q2 = {t3}, ◦q2 = {p3}, and q2◦ = {p2}. And it can be observed for q3 that •q3 = {t2}, ◦q3 = {p2}, and q3◦ = {p1}. Again, this example has the particularity that q1• = q2• = q3• = ∅ holds, which implies the same absence of conditions mentioned above. Later, in Section 3.5.1, we will come back to this example in order to analyse its behavioural model.

³ Otherwise, it does not make sense to perform an operation (cf. Section 3.5) that will not be used, neither in the data domain (a place p ∈ q◦) nor in the control domain (a transition t ∈ q•).
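The pre- and post-sets listed for the Fibonacci example can be cross-checked mechanically. The sketch below encodes the flow relation F as a set of arcs, transcribed from the sets given in the text, and derives the sets of Definitions 10 and 12 from it. The helper names `pre` and `post` are hypothetical, chosen for this illustration only.

```python
# Vertices of the Fibonacci DFN (three places, transitions and hulls).
P = {"p1", "p2", "p3"}
T = {"t1", "t2", "t3"}
Q = {"q1", "q2", "q3"}

# Flow relation F, transcribed from the pre- and post-sets in the text.
F = {
    ("p2", "t1"), ("t1", "p1"),                # control arcs around t1
    ("p1", "t2"), ("t2", "p3"),                # control arcs around t2
    ("p3", "t3"), ("t3", "p2"),                # control arcs around t3
    ("t1", "q1"), ("t2", "q3"), ("t3", "q2"),  # transitions executing hulls
    ("p1", "q1"), ("p2", "q1"), ("q1", "p3"),  # q1 reads p1, p2 and writes p3
    ("p3", "q2"), ("q2", "p2"),                # q2 moves p3 into p2
    ("p2", "q3"), ("q3", "p1"),                # q3 moves p2 into p1
}


def pre(v, kind):
    """Vertices of the set `kind` that have an arc into v."""
    return {u for (u, w) in F if w == v and u in kind}


def post(v, kind):
    """Vertices of the set `kind` that v has an arc into."""
    return {w for (u, w) in F if u == v and w in kind}


# Control sets of a place use T, data sets of a place use Q, and so on.
print(pre("p3", T), post("p2", Q), pre("q1", P), pre("t1", Q))
```

Passing the target vertex set as a parameter lets one pair of helpers cover all twelve cases of Definitions 10 and 12, e.g. `pre(q, T)` is the control pre-set of a hull and `pre(q, P)` its data pre-set.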

3.4 A new concept in marking functions

The previous section has covered the definition of the DFN model from the structural point of view. Section 3.5 introduces the behaviour of DFN models, whilst this section tackles the dynamic analysis of embedded systems by proposing a new concept for the state evolution, i.e., we analyse the progress of the state taking into account two parameters: control and data information.

As explained in Chapter 2 (Section 2.2.2), the behavioural modelling that takes place in classical Petri nets is carried out by dynamically assigning tokens to places, i.e., using a marking function. In this way, tokens act as indivisible quanta of the control flow. However, a marking function defined in the classical way (µ : P ↦ ℕ) is not sufficient to capture both control and data flow of an embedded system specification. Since the DFN structure is based on a tripartite graph rather than a bipartite one, it is possible to extend the classical marking function to a domain which can handle the control/data-flow heterogeneity described in Chapter 2 (Section 2.3). This extension, in very simplistic terms, can be seen as having a marking function defined as µ : P ↦ ℕ × ℕ. However, this function does not take into account important modelling issues, e.g., how to cope with negative numbers in the data flow if both elements of the pair are in ℕ? Moreover, storage elements will have a finite number of bits in the final implementation, and negative numbers will in fact be mapped to unsigned numbers by two's complement techniques [Wakerly, 1994]. This means that it is more realistic to capture the data flow by a finite set of integers modulo n, ℤn = {0, 1, 2, …, n − 1}, as shown in Definition 14.

Definition 14 The DFN marking function is defined as follows⁴:

\[ \vec{\mu} : P \mapsto \mathbb{N} \times \mathbb{Z}_n \]

where the first element in the tuple (γ ∈ ℕ) is the number of control quanta that reside inside a place p, while the second element (α ∈ ℤn) is the number of data quanta. The following notation is used in order to obtain each part of the tuple ⟨γ, α⟩:

\[ \gamma = |\mu(p)|, \qquad \alpha = \angle\mu(p) \]

In addition to tokens for the control flow analysis, as in classic PNs, this new marking function scheme incorporates an indivisible quantum for the data flow. Since the data quantum is defined in the ℤn domain, it inherits the periodicity property. This means that:

\[ \cdots = \alpha - 2 \cdot n = \alpha - n = \alpha = \alpha + n = \alpha + 2 \cdot n = \cdots \tag{3.2} \]

where n = |ℤn|, which is a very natural behaviour for the final implementation of an embedded system. For instance, an ALU performing an operation which exceeds the capacity of a register will produce a truncated result. The same effect can be observed in the DFN model if the data part of the marked place overflows. Definition 15 introduces two more attributes associated with a place, which are used to bound the model in both control and data domains.

⁴ This marking function utilises the same vector notation described on page 24 (footnote) for classical PNs.

Definition 15 K(p) is the capacity of a place, which is the maximum number allowed in |µ(p)|; L(p) = log2(n) is the length of a place, where n = |ℤn|.
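The periodicity of Eq. (3.2) and the role of the length L(p) can be illustrated with a small Python sketch of a marked place. The names `Mark` and `make_mark` are hypothetical, and the choice L(p) = 4 in the example calls is an assumption made only for illustration.

```python
from typing import NamedTuple


class Mark(NamedTuple):
    gamma: int  # control quanta (tokens), i.e. |mu(p)|
    alpha: int  # data quanta, i.e. the argument of mu(p), kept in Z_n


def make_mark(gamma: int, alpha: int, length: int) -> Mark:
    """Build the marking of one place whose length is L(p) = length."""
    if gamma < 0:
        raise ValueError("the number of tokens must be a natural number")
    n = 2 ** length  # n = |Z_n|, since L(p) = log2(n)
    # Reducing modulo n maps negative values to their two's complement
    # encoding and truncates values that overflow the place.
    return Mark(gamma, alpha % n)


print(make_mark(1, -4, 4))      # a negative value stored in a 4-bit place
print(make_mark(0, 15 + 3, 4))  # an addition that overflows 4 bits
```

Both calls show the truncation described in the text: values that differ by a multiple of n are indistinguishable once stored in the place.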

3.5 DFN Behavioural model

Having introduced both the structure and the state space of DFN models, it is now possible to analyse the behaviour of such models using the principles introduced in this section. The behaviour of a DFN model is described in terms of enabling and firing transitions, as in classical Petri nets, in addition to a synchronised data-flow operation scheme. The following two definitions introduce the rules that ensue from modifying the classical enabling and firing rules, in order to allow the marking function defined in the previous section.

Definition 16 A transition t is said to be enabled, for a given marking µ, if the following two conditions are met:

i. all places in the pre-set, pi ∈ •t, contain at least W(pi, t) tokens, that is:

\[ \bigwedge_{p_i \in {}^{\bullet}t} \bigl( |\mu(p_i)| \ge W(p_i, t) \bigr) \]

ii. the following atomic proposition holds: “the relation between the data quanta that affect all hulls in the pre-set, qj ∈ ◦t, and 0, is given by the result of the guard function G(t)”. Formally:

\[ \bigwedge_{q_j \in {}^{\circ}t} AP\Biggl( \sum_{p_\ell \in {}^{\circ}q_j} \bigl( \angle\mu(p_\ell) \cdot W(p_\ell, q_j) \bigr) + H(q_j),\; G(t),\; 0 \Biggr) \]

Definition 16 states whether a transition of a DFN model is enabled or not. The influence of both control and data flow aspects in this evaluation can be observed from the combined form of the enabling condition. Thus, from the control flow point of view, the enabling of a transition depends on the token distribution throughout the DFN model, i.e., subpart (i) of the definition. From the data flow point of view, the dependence is established in subpart (ii), by the conjunction of atomic propositions AP, which take data quanta as an argument. The summation (over ℓ) in the argument of AP is further explained in Definition 18. An enabled transition may fire, producing the result described in Definition 17.

Definition 17 The firing of an enabled transition tj changes a marking µk into µk+1 by means of the following rules:

i. A finite number of tokens are removed from each pi ∈ •tj:

\[ |\mu_{k+1}(p_i)| = |\mu_k(p_i)| - W(p_i, t_j), \quad \forall p_i \in {}^{\bullet}t_j \]

ii. A finite number of tokens are added to each pi ∈ tj•:

\[ |\mu_{k+1}(p_i)| = |\mu_k(p_i)| + W(t_j, p_i), \quad \forall p_i \in t_j^{\bullet} \]

iii. Each hull q ∈ tj◦ is executed (cf. Definition 18).

Hulls capture the data flow behaviour of a DFN, as shown in Definition 18. In simple terms, the hull performs a summation of data quanta over the data domain. If the summation contains only one term, i.e., |◦q| = 1, it reduces to a simple move operation. From the behavioural point of view, the execution of hulls q ∈ Q is synchronised with transitions t ∈ T in the net, i.e., no hull q can fire nondeterministically.

Definition 18 The firing of any transition t ∈ •q produces the execution of the hull q, which changes a marking µk into µk+1 as follows:

\[ \angle\mu_{k+1}(p_j) = W(q, p_j) \cdot \Biggl( \sum_{p_i \in {}^{\circ}q} \bigl( \angle\mu_k(p_i) \cdot W(p_i, q) \bigr) + H(q) \Biggr) \tag{3.3} \]

where pj ∈ q◦. It is worth emphasising that, since ∠µ(p) ∈ ℤn, the sum operation in Eq. (3.3) is defined with additions modulo n. This means that the summation is nested with the mod operation, as follows:

\[ x_r = \bigl( \cdots \bigl( \bigl( (x_a + x_b) \bmod n \bigr) + x_c \bigr) \bmod n + \cdots \bigr) \bmod n \]
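The hull execution of Eq. (3.3), including the nested modulo-n additions, can be sketched as a single Python function. The function name `execute_hull` and its argument layout are assumptions of this illustration; the modulus n = 16 in the example calls corresponds to an assumed place length of 4 bits.

```python
def execute_hull(inputs, w_out, offset, n):
    """Return the new data quanta of one output place of a hull.

    inputs -- list of (alpha_i, W(p_i, q)) pairs, one per place in the
              data pre-set of the hull
    w_out  -- the weight W(q, p_j) of the arc to the output place
    offset -- the offset H(q)
    n      -- the modulus of Z_n
    """
    acc = 0
    for alpha, w_in in inputs:
        # Nested additions modulo n, as in the chain following Eq. (3.3).
        acc = (acc + alpha * w_in) % n
    return (w_out * (acc + offset)) % n


print(execute_hull([(1, 1), (2, 1)], 1, 0, 16))  # addition of two places
print(execute_hull([(7, 1)], 1, 0, 16))          # single term: a move
print(execute_hull([(0, 1)], 1, -1, 16))         # decrement that wraps around
```

Reducing after every addition gives the same result as reducing once at the end, which is why the final expression applies the modulus to the whole product as well.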

3.5.1 Behavioural model example

The application of Definitions 16 to 18 is illustrated in this section, using the example described in Figure 3.2. The Fibonacci algorithm [Séroul, 2000], which is commonplace in subjects related to programming algorithms, produces a sequence of numbers {si} such that si = si−1 + si−2. That is:

\[ \{s_i\} = 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, \cdots \tag{3.4} \]

where s0 = 0 and s1 = 1. The DFN model presented in Figure 3.2 (and reproduced in Figure 3.3) produces that sequence of numbers in the data part of p3, due to the addition that takes place in q1 and the movement of data quanta that the cyclic firing of {t1, t2, t3} produces. The state space of the algorithm needs to be initialised with two initial values, as the algorithm is defined by a second order recurrence relation [Séroul, 2000]. Thus, the initial marking is given by:

\[ \vec{\mu}_0 = \begin{pmatrix} \langle 0, s_{i-2} \rangle \\ \langle 1, s_{i-1} \rangle \\ \langle 0, s_i \rangle \end{pmatrix} = \begin{pmatrix} \langle 0, 0 \rangle \\ \langle 1, 1 \rangle \\ \langle 0, 0 \rangle \end{pmatrix} = (\langle 0, 0 \rangle, \langle 1, 1 \rangle, \langle 0, 0 \rangle)^T \]

where the vector µ⃗0 = (⟨0, −⟩, ⟨1, −⟩, ⟨0, −⟩)ᵀ indicates⁵ that transition t1 is allowed to fire. Figure 3.3 shows the initial marking µ⃗0 that initialises the state space of the Fibonacci algorithm.

According to the initial marking µ⃗0, only transition t1 is enabled. Firing t1 leads to µ⃗1 = (⟨1, 0⟩, ⟨0, 1⟩, ⟨0, 1⟩)ᵀ, where q1 has performed the addition ∠µ(p2) + ∠µ(p1) and placed the result in ∠µ(p3). This enables t2, leading to µ⃗2 = (⟨0, 1⟩, ⟨0, 1⟩, ⟨1, 1⟩)ᵀ after its firing. Subsequently, t3 is enabled and its firing produces the marking µ⃗3 = (⟨0, 1⟩, ⟨1, 1⟩, ⟨0, 1⟩)ᵀ, which contains the same information as µ⃗0 in the control part, while the data part has performed one step of the algorithm. Thus, the marking function starts to show a behaviour that, after a number of iterations, resembles the pattern imposed by the Fibonacci algorithm.

⁵ The notations ⟨γ, −⟩ or ⟨−, α⟩ indicate that only one component of the tuple is taken into consideration.

Figure 3.3: Initial marking for the DFN model in Figure 3.2

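\[
\begin{aligned}
\vec{\mu}_4 &= (\langle 1, 1 \rangle, \langle 0, 1 \rangle, \langle 0, 2 \rangle)^T \\
\vec{\mu}_5 &= (\langle 0, 1 \rangle, \langle 0, 1 \rangle, \langle 1, 2 \rangle)^T \\
\vec{\mu}_6 &= (\langle 0, 1 \rangle, \langle 1, 2 \rangle, \langle 0, 2 \rangle)^T \\
\vec{\mu}_7 &= (\langle 1, 1 \rangle, \langle 0, 2 \rangle, \langle 0, 3 \rangle)^T \\
\vec{\mu}_8 &= (\langle 0, 2 \rangle, \langle 0, 2 \rangle, \langle 1, 3 \rangle)^T \\
\vec{\mu}_9 &= (\langle 0, 2 \rangle, \langle 1, 3 \rangle, \langle 0, 3 \rangle)^T \\
\vec{\mu}_{10} &= (\langle 1, 2 \rangle, \langle 0, 3 \rangle, \langle 0, 5 \rangle)^T \\
&\;\;\vdots
\end{aligned}
\]

The combined effect of two separate dynamics changing the state of the same system can be visualised. Along with a token moving clockwise in the DFN model, data quanta are being added (and the result placed) in p3, in order to obtain the sequence shown in Eq. (3.4).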
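The marking sequence of this example can be reproduced with a short simulation of Definitions 16 to 18. The following Python sketch is an illustration under stated assumptions: all arc weights are taken to be 1 and all offsets 0 (consistent with the arithmetic in the text, but not read from the figure), the modulus `N` is chosen arbitrarily, and the dictionary names are hypothetical.

```python
N = 2 ** 8  # assumed modulus of Z_n (8-bit places)

# Structure of the Fibonacci DFN; unit weights are an assumption.
PRE_T = {"t1": {"p2": 1}, "t2": {"p1": 1}, "t3": {"p3": 1}}    # W(p, t)
POST_T = {"t1": {"p1": 1}, "t2": {"p3": 1}, "t3": {"p2": 1}}   # W(t, p)
HULLS_OF = {"t1": ["q1"], "t2": ["q3"], "t3": ["q2"]}          # data post-set of t
HULL_IN = {"q1": {"p1": 1, "p2": 1}, "q2": {"p3": 1}, "q3": {"p2": 1}}
HULL_OUT = {"q1": {"p3": 1}, "q2": {"p2": 1}, "q3": {"p1": 1}}
OFFSET = {"q1": 0, "q2": 0, "q3": 0}                           # H(q)


def enabled(t, marking):
    """Definition 16(i); part (ii) is vacuous here, as no transition has a guard hull."""
    return all(marking[p][0] >= w for p, w in PRE_T[t].items())


def fire(t, marking):
    """Definitions 17 and 18: move tokens, then execute the hulls of t."""
    new = dict(marking)
    for p, w in PRE_T[t].items():
        new[p] = (new[p][0] - w, new[p][1])
    for p, w in POST_T[t].items():
        new[p] = (new[p][0] + w, new[p][1])
    for q in HULLS_OF[t]:
        # Hulls read the data quanta of the marking before the firing.
        total = sum(marking[p][1] * w for p, w in HULL_IN[q].items()) + OFFSET[q]
        for p, w in HULL_OUT[q].items():
            new[p] = (new[p][0], (w * total) % N)
    return new


marking = {"p1": (0, 0), "p2": (1, 1), "p3": (0, 0)}  # initial marking
trace = [marking]
for _ in range(10):
    t = next(t for t in ("t1", "t2", "t3") if enabled(t, marking))
    marking = fire(t, marking)
    trace.append(marking)

for k, m in enumerate(trace):
    print(k, m["p1"], m["p2"], m["p3"])
```

Each printed row is one marking, with the pair (tokens, data quanta) of p1, p2 and p3; the data part of p3 after every firing of t1 follows the Fibonacci sequence.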

3.6 DFN Graphical interpretation: Analogy with Complex Numbers

Although the theory of complex numbers (ℂ) is consolidated in many areas of engineering, these numbers are not yet commonplace in areas where models have a discrete nature. Some work carried out in that respect is the field of Gaussian Integers [Dimiev and Markov, 2002], where ℤ[i] = {ξ + iη | ξ, η ∈ ℤ}. Since ℤ ⊂ ℝ, and a complex number is composed of two real numbers, the Gaussian Integers comply with ℤ[i] ⊂ ℂ. These numbers may be used in discrete models, where ξ and η are the quantised variables of a Cartesian system ⟨x, y⟩. The state space of a DFN model resembles a set of complex numbers in polar form rather than in Cartesian form; thus, Definition 19 presents an alternative notation that complies with both ℤ∗[i] ⊂ ℂ and being polar.

Definition 19 Let γ ∈ ℕ and α ∈ ℤn. Then, a Modified Gaussian Integer is a combined number ς ∈ ℤ∗[i] that has a polar form, where modulus and argument are defined in ℕ and ℤn respectively, as follows:

\[ \mathbb{Z}^{*}[i] = \{ \gamma e^{i \cdot \rho \cdot \alpha} \mid \gamma \in \mathbb{N},\ \alpha \in \mathbb{Z}_n \} \]

where: ρ ∈ ℝ is the resolution constant; and n ∈ ℕ is the period of the argument.

The resolution constant introduced in Definition 19 adjusts the inherent periodicity n of numbers in ℤn to the periodicity 2π of the argument of a complex number in polar form. This constant is defined in Eq. (3.5):

\[ \rho = 2^{1 - L(p)} \cdot \pi \tag{3.5} \]

where L(p) is the length of places, defined in Definition 15. Figure 3.4 exemplifies the principles hitherto introduced, by means of a three-place DFN model where each place has a length L(p) = 4. Thus, the resolution constant has been set to:

\[ \rho = 2^{1-4} \cdot \pi = \frac{\pi}{8} \]

A marking µ⃗ = (⟨2, 1⟩, ⟨3, 6⟩, ⟨1, −4⟩)ᵀ is mapped to the complex plane as three vectors, where the magnitude of each vector is obtained from the number of tokens in the marked place and there is a direct relation between the vector's angle and the number of data quanta in the marked place. That is, place p1 contains γ = 2 tokens and α = 1 data quantum, place p2 has γ = 3 tokens and α = 6 data quanta, and p3 has γ = 1 token and α = −4 data quanta. This means, analysing each marked place in the ℤ∗[i] field:

\[ \mu(p_1) = 2e^{i \cdot \frac{\pi}{8}}, \qquad \mu(p_2) = 3e^{i \cdot \frac{\pi}{8} \cdot 6}, \qquad \mu(p_3) = 1e^{-i \cdot \frac{\pi}{8} \cdot 4} = 1e^{-i \cdot \frac{\pi}{2}} \]

Figure 3.4: Complex plane mapping of the state space (axes Re and Im; concentric circles of radius 1, 2 and 3; 4-bit binary labels for the argument around the circles; vectors m1, m2 and m3)

Moreover, changes in the control quantum γ of a marked place can be visualised as moving radially along a line, while changes in the data quantum α are equivalent to a move along a concentric circle of constant radius, determined by the number of tokens γ.
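The mapping of a marked place onto the complex plane can be sketched with the Python standard library. The function names `resolution` and `to_complex` are hypothetical; the example values are those of the three-place marking discussed above, with L(p) = 4.

```python
import cmath
import math


def resolution(length: int) -> float:
    """Resolution constant of Eq. (3.5) for a place of length L(p)."""
    return 2 ** (1 - length) * math.pi


def to_complex(gamma: int, alpha: int, length: int) -> complex:
    """Map the pair (gamma, alpha) to gamma * exp(i * rho * alpha)."""
    return cmath.rect(gamma, resolution(length) * alpha)


# The marking (<2, 1>, <3, 6>, <1, -4>) with L(p) = 4.
for gamma, alpha in [(2, 1), (3, 6), (1, -4)]:
    z = to_complex(gamma, alpha, 4)
    print(gamma, alpha, abs(z), cmath.phase(z))
```

Because the resolution constant spreads the n values of the argument evenly over 2π, data quanta that differ by n land on the same point, e.g. α = −4 and α = 12 for L(p) = 4.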

3.7 DFN Analysis of the Multiplier example

In Sections 3.3.1 and 3.5.1, the separate structural and behavioural models of a system based on DFN have been developed, in order to illustrate how the previously introduced structural and behavioural definitions are applied. This section shows how to obtain a DFN internal design representation from an embedded system specification, and performs a functional analysis of the resulting model.

Based on the iterative multiplier introduced in Chapter 2 (Section 2.2.5), we have developed a DFN model for such a system in Figure 3.5. Places p1 and p2 are set to contain, in the argument of their markings, the multiplier operands a and b respectively. After b + 1 iterations, the argument of p6 should contain a value of c = a · b. As shown in Figure 2.10, the multiplier needs to check the readiness of both operands before it is able to start iterating. This condition is captured by t1, which is only allowed to fire when both p1 and p2 have a token each. The control flow of the multiplier is also constrained by the guard and offset functions (G and H respectively), where G(t2) = “>”, G(t3) = “=”, and H(q3) = −1 (the remaining G(t), ∀t ∈ T, and H(q), ∀q ∈ Q, are set to ⊤ and 0, as explained in Section 3.3).

Figure 3.5: DFN model of the multiplier (ports INA, INB and OUT; places p1 to p6 with data labels a, b, c, x, y and z; transitions t1, t2 and t3 with guard labels “>” and “=”; hulls q1 to q8 with offset label −1)

Should t1 fire, the following sequence shows how the product ∠µk(p6) = ∠µ0(p1) · ∠µ0(p2) is obtained:

1. Transition t1 fires. Therefore,
   (a) The two tokens are removed from p1 and p2, as stated in Definition 17.1.
   (b) One token is placed in p3, another in p4, and another in p5 (Definition 17.2).
   (c) The two hulls belonging to the set t1◦ = {q1, q2} are executed according to Eq. (3.3), as follows from Definition 17.3.

2. Hulls q1 and q2 execute, according to Definition 18, copying the arguments from p1 and p2 into p3 and p4 respectively, i.e., the application of Eq. (3.3) when |◦q| = |q◦| = 1. Thus,

\[
\begin{aligned}
\angle\mu_1(p_3) &= W(q_1, p_3) \cdot \bigl( \angle\mu_0(p_1) \cdot W(p_1, q_1) + H(q_1) \bigr) = 1 \cdot (a \cdot 1 + 0) = a \\
\angle\mu_1(p_4) &= W(q_2, p_4) \cdot \bigl( \angle\mu_0(p_2) \cdot W(p_2, q_2) + H(q_2) \bigr) = 1 \cdot (b \cdot 1 + 0) = b
\end{aligned}
\]

3. The new marking

\[ \vec{\mu}_1 = (0e^{i \cdot \rho \cdot a}, 0e^{i \cdot \rho \cdot b}, 1e^{i \cdot \rho \cdot a}, 1e^{i \cdot \rho \cdot b}, 1e^{i \cdot 0}, 0e^{i \cdot 0})^T \]

causes the transition t2 to fire repeatedly b times and, for each iteration, ∠µ(p5) is incremented by a. This leads to obtaining a · b in the final value of ∠µ(p5). More precisely:
   (a) Since µ1(p4) has an argument greater than 0, the transition t2 is enabled according to Definition 16. Note the influence of ∠µ1(p4) in the enabling rule of the definition, through an atomic proposition AP based on ◦t2 = {q7} and G(t2) = “>”.
   (b) Firing the enabled transition t2 produces the subsequent execution of two hulls: q3 and q4.
   (c) The execution of q3 always leads to a unitary decrement of the argument in p4, since |◦q3| = 1 and H(q3) = −1. That is:

\[ \angle\mu_{k+1}(p_4) = W(q_3, p_4) \cdot \bigl( \angle\mu_k(p_4) \cdot W(p_4, q_3) + H(q_3) \bigr) = 1 \cdot \bigl( \angle\mu_k(p_4) - 1 \bigr) = \angle\mu_k(p_4) - 1 \]

   for k = 1 ⋯ b + 1.
   (d) The execution of q4 provides the system with iterative additions within the scope of p5. That is:

\[
\begin{aligned}
\angle\mu_{k+1}(p_5) &= W(q_4, p_5) \cdot \bigl( \angle\mu_k(p_3) \cdot W(p_3, q_4) + \angle\mu_k(p_5) \cdot W(p_5, q_4) + H(q_4) \bigr) \\
&= 1 \cdot \bigl( \angle\mu_k(p_3) + \angle\mu_k(p_5) + 0 \bigr) = \angle\mu_k(p_3) + \angle\mu_k(p_5)
\end{aligned}
\]

   for k = 1 ⋯ b + 1.


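   (e) That is:

\[
\begin{aligned}
\vec{\mu}_1 &= (0e^{i \cdot \rho \cdot a}, 0e^{i \cdot \rho \cdot b}, 1e^{i \cdot \rho \cdot a}, 1e^{i \cdot \rho \cdot b}, 1e^{i \cdot 0}, 0e^{i \cdot 0})^T \\
\vec{\mu}_2 &= (0e^{i \cdot \rho \cdot a}, 0e^{i \cdot \rho \cdot b}, 1e^{i \cdot \rho \cdot a}, 1e^{i \cdot \rho \cdot (b-1)}, 1e^{i \cdot \rho \cdot a}, 0e^{i \cdot 0})^T \\
\vec{\mu}_3 &= (0e^{i \cdot \rho \cdot a}, 0e^{i \cdot \rho \cdot b}, 1e^{i \cdot \rho \cdot a}, 1e^{i \cdot \rho \cdot (b-2)}, 1e^{i \cdot \rho \cdot 2 \cdot a}, 0e^{i \cdot 0})^T \\
&\;\;\vdots \\
\vec{\mu}_{b-1} &= (0e^{i \cdot \rho \cdot a}, 0e^{i \cdot \rho \cdot b}, 1e^{i \cdot \rho \cdot a}, 1e^{i \cdot \rho \cdot 2}, 1e^{i \cdot \rho \cdot (b-2) \cdot a}, 0e^{i \cdot 0})^T \\
\vec{\mu}_{b} &= (0e^{i \cdot \rho \cdot a}, 0e^{i \cdot \rho \cdot b}, 1e^{i \cdot \rho \cdot a}, 1e^{i \cdot \rho \cdot 1}, 1e^{i \cdot \rho \cdot (b-1) \cdot a}, 0e^{i \cdot 0})^T \\
\vec{\mu}_{b+1} &= (0e^{i \cdot \rho \cdot a}, 0e^{i \cdot \rho \cdot b}, 1e^{i \cdot \rho \cdot a}, 1e^{i \cdot 0}, 1e^{i \cdot \rho \cdot b \cdot a}, 0e^{i \cdot 0})^T
\end{aligned}
\]

   (f) When ∠µ(p4) reaches the value of 0, only t3 is allowed to fire next, diverting the control flow towards the end of the procedure.

4. The firing of t3 not only puts a token in p6, due to p6 ∈ t3•, but also sets ∠µ(p6) to the value in ∠µ(p5), because of the execution of q5.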
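The data-path effect of the firing sequence above can be summarised by a small Python sketch. It is not a full DFN simulator: tokens are not tracked, the guards are written directly as Python conditions, and the function name `dfn_multiply`, the variable names and the default place length are assumptions of this illustration.

```python
def dfn_multiply(a: int, b: int, length: int = 8):
    """Follow the data quanta of p3, p4 and p5 through the multiplier.

    Returns the final data quanta of p6 and the number of transition
    firings. All arithmetic is modulo n = 2 ** length.
    """
    n = 2 ** length
    # t1 fires: q1 and q2 copy the operands into p3 and p4; p5 starts at 0.
    x, y, z = a % n, b % n, 0
    firings = 1
    # t2 carries the guard ">": it is enabled while the argument of p4 exceeds 0.
    while y > 0:
        # q3 and q4 both read the marking before the firing, hence the
        # simultaneous assignment.
        y, z = (y - 1) % n, (x + z) % n
        firings += 1
    # t3 carries the guard "=": it fires once the argument of p4 equals 0,
    # and q5 copies the argument of p5 into p6.
    firings += 1
    return z, firings


print(dfn_multiply(3, 5))
print(dfn_multiply(20, 20))  # the product exceeds 8 bits and is truncated
```

The second call illustrates that the result is only the true product modulo n, which is the overflow behaviour discussed in Section 3.4.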

3.8 DFN Matrix Equations

Deciding whether a marking µ⃗k can be obtained from the initial marking µ⃗0 and a sequence of transition firings is a fundamental issue in the theory of Petri nets, which is known as reachability analysis. The reachability analysis of classical Petri nets can be carried out in any of the following three ways:

• coverability tree
• state equations
• reduction rules

This section concentrates on the application of the second method to the reachability analysis of DFN models. Similar to Petri nets, the DFN's behaviour can be analysed by means of matrix equations but, unlike PNs, these equations need to consider the influence of both control and data flow. Thus, besides the classical analysis by vector addition systems [Nash, 1973], DFN models also need to consider an algebraic system that takes the data domain into account. This is achieved, in DFN models, by the use of complex numbers (and polar notation) in the matrices that define the state equations.

Before introducing this approach, it is necessary to define some matrices that are generated from the topology of the DFN model. For instance, the Cartesian product (P × T) generates the matrix D⁺ by associating W(tj, pi) to the element [d⁺_ij] of such matrix, and (T × P) generates D⁻ where [d⁻_ij] = W(pj, ti). These matrices have been explained in Chapter 2 (cf. Section 2.2.2), and are used in classic Petri nets to form the incidence matrix D = D⁺ − (D⁻)ᵀ. In addition, DFN models require another four matrices to define the topology of a tripartite graph, which are:

(Q × P) generates S⁺;
(P × Q) generates S⁻;
(T × Q) generates A⁺;
(Q × T) generates A⁻.

Also, by using the Ω-transformation (given in Appendix A), it is possible to define a matrix M, which contains the information from the marking function, such that M = Ω(µ⃗):

\[ \Omega(\vec{\mu}) = \Omega \begin{pmatrix} \mu(p_1) \\ \mu(p_2) \\ \vdots \\ \mu(p_n) \end{pmatrix} = \begin{pmatrix} \mu(p_1) & 0 & \cdots & 0 \\ 0 & \mu(p_2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \mu(p_n) \end{pmatrix} = M \]

Definition 14 establishes that each component µ(pi) in the diagonal of M has two parts, i.e., its modulus and argument; it is therefore possible to formulate the following decomposition:

$$M = \begin{pmatrix}\gamma_1 & 0 & \cdots & 0\\ 0 & \gamma_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \gamma_n\end{pmatrix}\cdot\begin{pmatrix}e^{i\rho\alpha_1} & 0 & \cdots & 0\\ 0 & e^{i\rho\alpha_2} & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & e^{i\rho\alpha_n}\end{pmatrix} = \Gamma\cdot\exp(\rho\cdot\Lambda)$$

where exp(.) is the exponential of a matrix, and matrices Γ and Λ are formed by elements $\gamma_i = |\mu(p_i)|$ and $\alpha_i = \angle\mu(p_i)$ in the diagonal, respectively. Should the notion of time be implicit in the equation, an index k may be used. Analogously to the state equations for classical PNs presented in [Murata, 1989], the DFN state equation in the control domain has the following form:

$$\Gamma_{k+1} = \Gamma_k + \Omega(D\cdot\vec{x}) \qquad (3.6)$$

where $D = D^+ - (D^-)^T$ is the incidence matrix and $\vec{x}$ is the firing vector. The data domain, on the other hand, follows a slightly more complicated approach. It is desirable to have the choice to either (a) change the state of, e.g., a data register, into a value that is unrelated to the previous one, or (b) keep the state unaltered. Therefore, Eq. (3.7) is composed of three terms: the first one for the changes, the second one for the places that remain unaltered, and a third one for the offset function.

$$\Lambda_{k+1} = S\cdot\Lambda_k\cdot S^T + \bar{S}\cdot\Lambda_k + S^+\cdot\Omega(\vec{y})\cdot H\cdot(S^+)^T \qquad (3.7)$$

where:

$$S = S^+\cdot\Omega(\vec{y})\cdot S^-, \qquad \bar{S} = I - \left\| S^+\cdot\Omega(\vec{y})\cdot(S^+)^T \right\|_1,$$

and H is a matrix which has $[h_{ii}] = H(q_i)$, ∀qi ∈ Q. The notation $\|\cdot\|_1$ indicates that a 1-norm of the result is taken, i.e., any number different from 0 is considered to be 1. The vector $\vec{y}$ indicates which hulls are available to execute, and is obtained as follows:

$$\vec{y} = \Omega^{-1}\left(A\cdot\Omega(\vec{x})\cdot A^T\right) \qquad (3.8)$$

The above formulae allow the calculation of $M_{k+1} = f(M_k)$. By induction $M_{k+2} = f(M_{k+1})$ can also be obtained, thus $M_{k+2} = f(f(M_k))$, and so forth. This leads to the formulation of the reachability problem for DFN models: $M_n = f(\ldots f(f(M_0))\ldots)$. However, Peterson [1981] pointed out that the major drawback of this method for reachability analysis is that it suffers from insufficiency, i.e., the solutions to (3.6) and (3.7) are necessary conditions for marking Mn to be reachable from M0, but sufficiency cannot be guaranteed. Therefore, this analysis is ruled out by the methodology presented in Chapter 4.
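As a small illustration of the control-domain part of this section, the Ω-transformation and Eq. (3.6) can be written out with plain lists. This is a hedged sketch in Python, not the formulation used later in the thesis: the helper names (`omega`, `control_step`, `marking_matrix`) are hypothetical, the data-domain equation (3.7) is not covered, and the diagonal entries of M are built directly as γ·e^{i·ρ·α}.

```python
import cmath


def omega(vector):
    """Omega-transformation as used in the text: a vector becomes a diagonal matrix."""
    n = len(vector)
    return [[vector[i] if i == j else 0 for j in range(n)] for i in range(n)]


def mat_vec(matrix, vector):
    """Ordinary matrix-vector product on nested lists."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def control_step(gamma_k, incidence, firing):
    """Eq. (3.6): Gamma_{k+1} = Gamma_k + Omega(D . x)."""
    delta = omega(mat_vec(incidence, firing))
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(gamma_k, delta)]


def marking_matrix(gammas, alphas, rho):
    """Diagonal matrix M whose i-th entry is gamma_i * e^{i*rho*alpha_i}."""
    return omega([g * cmath.exp(1j * rho * a) for g, a in zip(gammas, alphas)])


# Two places, one transition that moves a token from p1 to p2.
D = [[-1], [1]]
gamma_0 = omega([1, 0])
gamma_1 = control_step(gamma_0, D, [1])
print(gamma_1)
```

Note that a complex number with zero modulus carries no argument, so a marking such as 0e^{i·ρ·a} has to be stored as a (modulus, argument) pair whenever the argument must survive; the sketch above keeps Γ and the arguments separate for that reason.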

3.9 Decidability Issues for DFN

It is well known that boundedness is decidable for classical PNs [Karp and Miller, 1969]. Indeed, Esparza and Nielsen [1994] show the importance of the reachability problem for Petri nets, since many other decidability problems are proven to be recursively equivalent to reachability. This section proves that the reachability problem is decidable for DFN models, by decomposing DFN models into regular (finite) Petri nets.

The structure of DFN models consists of places, transitions, and hulls. Transitions (and their interaction with places) have the same semantics as in regular PNs, which means that the control part of a DFN model is already decidable, being semantically equivalent to a PN. However, it is still necessary to analyse the data part in order to reach a conclusion. Thus, four basic blocks built upon classic PNs are first introduced, and the section then discusses how to combine them in order to reproduce the semantics of a hull and its interaction.

Figure 3.6: Copying process (COPY)

Since ∠µ(p) ∈ Zn, the data part of a place may be decomposed into n places p1, p2, ..., pn such that a value of 0 data quanta is represented by a token in p1, 1 data quantum by a token in p2, etc. Figure 3.6 shows a PN model that copies the data information contained in {p1, p2, p3, p4} to {p7, p8, p9, p10}. The copying procedure is synchronised with the action of moving a token from p5 to p6, in order to avoid multiple tokens⁶. For example, assuming there is a token in p3, transition t3 will wait for a token in p5 to be enabled. After firing t3, the token is returned to p3 (so there are no changes in the marking of the places at the top of the diagram) and p9 is marked with a token (together with p6, indicating the end of the process).

⁶ Note that arcs sharing either a place or a transition are drawn using a simplified graphical notation, which resembles a bus or wire in an electrical schematic diagram, in order to avoid cluttering.

Figure 3.7 represents the process of summation. Places {p1, p2, p3, p4} and {p5, p6, p7, p8} represent two data inputs to a hull (i.e., two places p ∈ ◦q). There are n × n = 16 transitions in this Petri net, which represent the addition over the data domain by putting a token in the right place among {p11, p12, p13, p14}. For instance, t7 will only fire when performing the addition 2 + 1 = 3, which is represented as taking tokens from both p3 and p6, and placing one in p14. In general, the number of transitions (#t) needed for a hull q is:

#t = n^m

where n = |Zn| and m = |◦q|. Again, a token is needed in p9 in order to start the summation process, and p10 will be marked upon finishing the process.

Figure 3.7: Summation process (SUM)
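The count #t = n^m can be checked by enumerating one transition per combination of input values. The snippet below is an illustrative Python sketch; the function name and the representation of a transition as an (inputs, output) pair are assumptions, and values rather than the place numbers of Figure 3.7 are used.

```python
from itertools import product


def sum_transitions(n, m):
    """Enumerate one transition per combination of m input values taken from Z_n."""
    transitions = []
    for inputs in product(range(n), repeat=m):
        # Addition wraps around because the data domain Z_n is periodic.
        transitions.append((inputs, sum(inputs) % n))
    return transitions


table = sum_transitions(4, 2)
print(len(table))            # n**m transitions
print(((2, 1), 3) in table)  # the 2 + 1 = 3 case mentioned in the text
```

Because the sum is taken modulo n, the output group needs only n places, which is the saving attributed to the periodic data domain.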


Figure 3.8: Overwriting process (OWR)

As explained in Section 3.5, the execution of a hull overwrites a marking ∠µ(pj) with the result of a weighted sum. Figure 3.8 shows the effect of overwriting in the data domain, where a new value is placed in the output regardless of its previous content. For example, if there is a token in p2, transition t2 will only become enabled when p6 is marked (besides the usual µ(p5) = 1 for the beginning of the process). Place p6 can only be marked after one transition of {t5, t6, t7, t8} removes the token in {p7, p8, p9, p10}. Therefore, the data value formed by the markings of {p7, p8, p9, p10} is overwritten by a new marking where µ(p7) = 0, µ(p8) = 1, µ(p9) = 0, and µ(p10) = 0.


Figure 3.9: DFN hull represented by PN

Having explained the processes of copying (COPY), adding (SUM), and overwriting (OWR), Figure 3.9 combines them in order to illustrate the principles of a hull. It can be observed that a number of transitions (t1 and t2) interconnect the different processes in such a way that the summation process is activated only when the copying stage has been completed, i.e., all the copying processes have finished, and the overwriting process is activated at a later stage. For example, assume that {µ(p1) = 0, µ(p2) = 0, µ(p3) = 1, µ(p4) = 0} and {µ(p11) = 0, µ(p12) = 1, µ(p13) = 0, µ(p14) = 0}, which would be equivalent to having ∠µ(p1) = 2 and ∠µ(p2) = 1 in a DFN model. There will be a COPY stage which will put tokens in p9 and p18, keeping the tokens in p3 and p12. Then, t1 will eventually fire and start the SUM stage, where only one out of sixteen transitions will fire, putting a token in p26. After firing t2, place p32 will receive that token regardless of the marking {µ(p29), µ(p30), µ(p31), µ(p32)}, which completes the OWR stage.
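The three stages of the example can be followed on one-hot groups of places. This is a rough Python sketch under the assumption that each data value is stored as a list with exactly one token; the names `one_hot`, `decode` and `hull_step` are hypothetical and the synchronising places of Figure 3.9 are left out.

```python
def one_hot(value, n):
    """Encode a data value as n places, with a single token in position `value`."""
    return [1 if i == value else 0 for i in range(n)]


def decode(places):
    """Read back the data value from a one-hot group of places."""
    if sum(places) != 1:
        raise ValueError("exactly one token is expected per one-hot group")
    return places.index(1)


def hull_step(inputs, old_output):
    """COPY the inputs, SUM them over Z_n, then overwrite (OWR) the output group."""
    n = len(old_output)
    copies = [list(group) for group in inputs]           # COPY: inputs stay untouched
    total = sum(decode(group) for group in copies) % n   # SUM over the periodic domain
    return one_hot(total, n)                             # OWR: old_output is discarded


a, b = one_hot(2, 4), one_hot(1, 4)
print(hull_step([a, b], one_hot(0, 4)))   # token ends up in the fourth output place
```

With the values 2 and 1 of the example, the token lands in the last place of the output group whatever that group contained before, matching the role of p32 in the figure.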



When a (Q × T) arc exists, the guard function G(t) may indicate that the result of the summation performed in the hull has to be compared with 0. Because data information has been split into n PN places for each DFN place, comparing this value is just a matter of obtaining a token from the right transitions. For instance, Figure 3.10 shows the comparing (CMP) process used to capture the condition G(t) = “=”. The comparison of the result of the hull operation with 0 is signalled by a token placed in p7. In this way, depending on which transitions are connected to the output place, all conditions of the set of conditional operators can be met, e.g., the condition (∠µ(p) = 1) ∨ (∠µ(p) = 2) ∨ (∠µ(p) = 3) has been drawn using dashed lines.

Figure 3.10: Comparison process (CMP)

This section has shown that DFN models may be decomposed into classic Petri nets with a finite number of transitions, which implies that decidability issues in DFN models are homologous to those in PNs. Therefore, both boundedness and reachability problems are decidable for DFN. Indeed, this section has not only shown that the number of places and transitions used for a DFN translation into PNs remains finite, but an interesting issue has also arisen as a consequence of the underlying periodicity of DFN models in the data domain. The summation process (SUM) depicted in Figure 3.7 shows a network of 16 transitions that cover all possibilities for the summation of two places with Zn in the data domain. Had the data domain not been defined as periodic, the number of places needed at the output would have doubled. This puts DFN models in a better position when compared to other high level PNs.

3.10 Concluding Remarks

This chapter has presented the DFN model, a model that exploits the control/data-flow interaction in order to ease the design of embedded systems. The model has been introduced in a formal fashion, defining the syntax and semantics of both structural and behavioural parts. A few pragmatic realisations based on the DFN model have aided the understanding of the model’s principles.

The aim of the DFN model is to enhance the interaction between control and data flows of an embedded system specification. This chapter has addressed two important issues that concern the field of embedded system modelling. The first issue is the lack of control/data-flow interaction in traditional methods, which have a bipartite underlying graph; a tripartite graph is proposed in order to overcome this weakness. The second issue is the behavioural part of the model, where complex numbers have been incorporated in the state space of the model, due to their suitability for the extended structure resulting from the tripartite graph. Although the control/data-flow interaction has been addressed in some related work (cf. the MCD , MDC and MB¯ approaches in Chapter 2), the two issues mentioned above have provided a framework for the analysis of embedded systems that is capable of exploiting features inherent to this interaction. This is achieved by means of an analogy between the way hardware works and the implicit periodicity of complex numbers in polar representation.

Figure 3.11: The DFN philosophy

Taking into consideration only two of the three sets of vertices of the underlying tripartite graph in the DFN model leads to a purely control oriented (or purely data oriented) representation. Figure 3.11 shows that when only places and transitions are considered, a classical Petri net is obtained, which can be synthesised to a controller as in [Yakovlev and Koelmans, 1998], whereas if only hulls and places are considered, the model turns out to be similar to a directed graph, which can be easily synthesised into a datapath by, e.g., those techniques cited in [Kollig, 1998]. Thus, the DFN model may be considered as a generalisation of other proposed models for embedded system design which are based on high level PNs, due to its tripartite structure instead of a bipartite one. However, a question that still remains open is whether or not Coloured Petri Nets (CPN) have the ability to extend PNs to that extent. It is known that CPNs have pioneered the incorporation of data flow analysis to PNs, but what this chapter argues is that the relation between control and data flows has to be treated differently. The fact that DFN hulls do not fire asynchronously but are rather controlled by a transition has not been chosen arbitrarily, nor has the fact that hulls replace an existing value of a place by a new value, regardless of what the value was. Although it is still possible to get a replace effect in CPNs, the synchronised execution would be a daunting task to solve.

Chapter 4

Formal Verification of Dual Flow Nets

The semantics of a new model for embedded system design, called Dual Flow Nets, has been introduced in Chapter 3. Reasoning about the correctness of an embedded system model leads to an implementation which is less prone to design errors, hence it reduces the number of iterations performed at earlier stages of the design. This chapter focuses on the implementation of a set of algorithms based on existing model checking tools, in order to validate the design of embedded systems based on DFN models. The behavioural properties analysed are expressed either in a branching-time fashion, e.g., by means of CTL, or following a linear structure, e.g., by means of LTL. Furthermore, since model checking tools suffer from state explosion, this chapter also addresses a modular approach that considers the decomposition of DFN models into pieces of smaller complexity, guided by an estimator. A pragmatic way to illustrate both the verification methodology and its modular approach is also undertaken in this chapter.

The rest of this chapter is organised as follows. Section 4.1 reviews some related work in terms of model checking design representations based on Petri nets and introduces the principles of the tool used as a verification engine. The core of the proposed implementation for the framework given in Figure 4.1 is given in Section 4.2. In Section 4.3, two real-life examples are presented. Section 4.4 introduces a new method for estimating the complexity involved in the verification process, based on compositional verification. The applicability of both the framework and the estimation method is illustrated in Section 4.5 through an Ethernet coprocessor. Finally, Section 4.6 presents some concluding remarks for the chapter.


4.1 Background

Chapter 2 (Section 2.5.2) has introduced the principles of model checking. This section concentrates on the applicability of those principles to the DFN model. The model checking methodology proposed in this chapter is shown in Figure 4.1. It can be seen that the core of such a methodology is a verification engine, which in this case is the Cadence SMV model checker [Cadence, 2001]. However, this does not imply that the proposed methodology is restricted to this particular tool. The input to the methodology is an embedded system specification composed of both a DFN model and properties expressed in temporal logics. The DFN model is translated into source code understood by the model checker while, on the other hand, a library written in the tool’s language captures the DFN semantics according to the definitions introduced in the previous chapter. The verification is driven by a scheduler (sch), which determines a valid sequence of transition firings in order to analyse the resulting behaviour. The outcome of the verification is the result of evaluating the correctness of the DFN model with regard to the temporal logic properties.

Figure 4.1: Proposed model checking methodology

4.1.1 Model Checking of PN-based models

The symbolic manipulation of Petri nets has been widely studied [Pastor et al., 2001; Yoneda et al., 1996; Heljanko and Niemelä, 2001] for safe Petri nets. Such a manipulation is based on finding the relation $R_{k+1}$, which formalises the set of reachable markings that are obtainable from a set of markings $R_k$ through the transition relation N. Since PNs are assumed safe in such approaches, each marked place contains either 0 or 1 token, which can be expressed as $\vec{\mu} \subseteq \mathbb{B}^n$, where n = |P|. Formally, the set of reachable markings from $\vec{\mu}_0$ is given by:

$$R_1 = R_0 \cup \left\{ \vec{\mu}' \;\middle|\; \exists\vec{\mu}\,\left[\vec{\mu} \in R_0 \wedge (\vec{\mu},\vec{\mu}') \in N\right] \right\}$$

where $R_0 = \{\vec{\mu}_0\}$. Burch et al. [1994] show that such a set is associated with operations between binary decision diagrams (BDD), in the following way:

$$R_1(\vec{\mu}') = R_0(\vec{\mu}') \vee \exists\vec{\mu}\,\left[R_0(\vec{\mu}) \wedge N(\vec{\mu},\vec{\mu}')\right]$$

where $R_0(\vec{\mu})$ and $N(\vec{\mu},\vec{\mu}')$ are BDD representations of the boolean functions that produce $R_0$ and N respectively. This leads to the formulation of the set of reachable markings in at most k + 1 transition firings:

$$R_{k+1}(\vec{\mu}') = R_0(\vec{\mu}') \vee \exists\vec{\mu}\,\left[R_k(\vec{\mu}) \wedge N(\vec{\mu},\vec{\mu}')\right]$$
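The recurrence for R_{k+1} can be iterated until it stops growing. The sketch below does this in Python with explicit sets of markings instead of BDDs, so it only illustrates the fixed-point computation and not the symbolic encoding; the function names and the three-place ring used as an example are assumptions made for illustration.

```python
def reachable(initial, successors):
    """Iterate R_{k+1} = R_0 | image(R_k) until a fixed point is reached."""
    r0 = frozenset([initial])
    rk = r0
    while True:
        image = {nxt for marking in rk for nxt in successors(marking)}
        r_next = r0 | image
        if r_next == rk:        # no new marking was added: fixed point
            return rk
        rk = r_next


def ring_successors(marking):
    """Safe net with a ring of places: a token in p_i may move to an empty p_{i+1}."""
    n = len(marking)
    result = []
    for i in range(n):
        j = (i + 1) % n
        if marking[i] == 1 and marking[j] == 0:
            nxt = list(marking)
            nxt[i], nxt[j] = 0, 1
            result.append(tuple(nxt))
    return result


print(sorted(reachable((1, 0, 0), ring_successors)))
```

A symbolic model checker performs the same iteration, but represents each R_k and the relation N as boolean functions so that the sets are never enumerated marking by marking.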

Wimmel [1996] has proposed four approaches to model checking classical PNs by means of the SMV tool [CMU, 1997], which are: firstly, implementing transitions as SMV processes; secondly, a solution to the fairness problem; thirdly, utilising places as modules; and finally, a representation using TRANS and INIT. The first approach utilises SMV modules for the definition of transitions t ∈ T, while places p ∈ P are declared as boolean variables. By the instantiation of the process corresponding to the transition that fires, places in the argument of the module change their content, according to whether they are in the input or output part. The key aspect of such an approach is that SMV chooses nondeterministically which module to execute, hence capturing well the intrinsic nondeterminism that exists in the PN firing rule. However, one of the limitations of the first approach is that no fairness is assumed, i.e., there is no guarantee that SMV will eventually visit all transitions. Thus, Wimmel has reported a second approach that uses a global variable in order to keep track of transitions that have already been fired. In that way, the method can ensure that all transitions will fire infinitely often. Wimmel found that an approach that takes places as modules (third approach) leads to a more effective model checking, compared to the first and second ones. When implementing places as modules, a place has the possibility to be marked with either 1 or 0, depending on whether the firing transition belongs to the pre- or post-set of the place. Finally, Wimmel refers to [Corbett, 1996] for an efficient technique based on the two SMV directives TRANS and INIT. In spite of all the advantages mentioned in [Wimmel, 1996], the major drawback of such a technique is the lack of modularity, which makes the solutions generated hard to reuse.

4.2 Model Checking of DFN models

This section presents a library which captures the semantics of DFN, in order to perform the model checking of such models. This library consists of four modules (place(), transition(), hull() and guard()) and a scheduler (sch). The structure of the net is built by means of successive instantiations of these four modules. Since an enabled transition may or may not fire, the scheduler has been defined in such a way that the schedule generated for each step is restricted to the set of transitions which are enabled, i.e., the scheduler nondeterministically chooses, from the set of enabled transitions, the next transition to fire.

The nondeterministic scheduler code is presented in Figure 4.2, where M = |T| is the number of control transitions in the net. A variable sch in the range 1 ≤ sch ≤ M indicates which transition t[sch] is firing at time k. The firing of the transition t[sch] changes the marking µ of the system, which is shown in the last line of Figure 4.2. Note that pp is the output of the transition() module, i.e., p′ in Figure 4.3. As a consequence, there is a change in the enabled transitions, i.e., t[i].en=1, which influences the nondeterministic value assigned to sch = {i : i = 1..(M), t[i].en}.

    typedef TRANSITIONS 1..(MM);
    ...
    sch : TRANSITIONS;
    sch := { i : i = 1..(MM), t[i].en };
    ...
    forall (i in PLACES)
      next(p[i].modulus) := t[sch].pp[i];

Figure 4.2: Nondeterministic scheduler
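For readers less familiar with SMV set expressions, the choice made by the scheduler can be paraphrased in a general-purpose language. The following Python sketch is an assumption-laden analogue, not a translation of the library: `choose_transition` is a hypothetical name, and a random pick stands in for the nondeterministic choice that a model checker explores exhaustively.

```python
import random


def choose_transition(enabled, rng=random):
    """Pick the 1-based index of one enabled transition, or None if none is enabled."""
    candidates = [i for i, en in enumerate(enabled, start=1) if en]
    return rng.choice(candidates) if candidates else None


# t[1] and t[3] are disabled, t[2] is enabled: the only possible schedule is 2.
print(choose_transition([False, True, False]))
```

A simulation run follows a single sequence of such picks, whereas the model checker considers every value in the set at every step.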



In the following subsections, we introduce three modules of the library, where the place() module is a data structure, and the transition() and hull() algorithms are presented.

Places: Due to the underlying tuple in µ(p), the set P cannot be constructed as an array of integers as in classical PNs. In DFN, the modulus and argument of a marked place comprise a two-member structure, which is the basic data type for the array p[i]. The extraction of each part of the complex tuple, i.e., the application of the |µ(p)| and ∠µ(p) operators in order to obtain modulus and argument respectively, is then performed through a direct access to the member of the structure stored within the array p[i].

Transitions: The algorithm presented in Figure 4.3 consists of two main parts, checking whether the transition is enabled (s1) and the firing operation (s2), which refer to Definitions 16 and 17 (cf. Chapter 3), respectively. When a transition is selected by the scheduler, an assignment occurs in all places of the net. If the transition determined by the scheduler (t[sch]) is enabled, the assignment carried out in (s2) considers the effect of the firing rule presented above.

    Module: TRANSITION
    Input:
      - p: array of places
      - pre: array in control domain
      - post: array in control domain
      - guard: boolean
    Output:
      - p′: array of places

    s1: Check enabling condition:
          en ← guard ∧ ⋀_{i=1..|P|} ( pre[i] > 0 ⟹ |p[i]| ≥ pre[i] )
    s2: Perform Firing Operation:
          forall (p[i] ∈ P)
            if (en) then |p′[i]| ← |p[i]| − pre[i] + post[i]
            else |p′[i]| ← |p[i]|

For the sake of clarity, assume a transition() module instantiated as follows:

    t[1]: transition(p,[3,1,0,2],[0,1,2,0],1);

This means that:

- •t1 = {p1, p2, p4}; W(p1, t1) = 3, W(p2, t1) = 1, W(p4, t1) = 2;
- t1• = {p2, p3}; W(t1, p2) = 1, W(t1, p3) = 2; and
- no guard function exists (since 1 is the default value of the result of a guard() module).

With regard to Definition 16 (Chapter 3), which is implemented through the enabling condition part (s1), the following logical condition:

$$\mathit{guard} \wedge \bigwedge_{i=1}^{|P|} \big( \mathit{pre}[i] > 0 \implies |p[i]| \ge \mathit{pre}[i] \big)$$

is unfolded into:

$$\top \wedge |p[1]| \ge \mathit{pre}[1] \wedge |p[2]| \ge \mathit{pre}[2] \wedge |p[4]| \ge \mathit{pre}[4] \;\equiv\; \top \wedge |p[1]| \ge 3 \wedge |p[2]| \ge 1 \wedge |p[4]| \ge 2 \qquad (4.1)$$
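The enabling and firing steps of Figure 4.3 can be tried out on the instantiation above. This is a minimal Python sketch rather than the SMV library itself: the function name `transition` and the list-based marking of moduli are illustrative assumptions, and the enabling test is read as "greater than or equal to", as in classical Petri nets.

```python
def transition(moduli, pre, post, guard=True):
    """Return (en, next moduli) for one transition, following parts s1 and s2."""
    # s1: every input place with a positive weight must hold enough tokens.
    enabled = guard and all(m >= w for m, w in zip(moduli, pre) if w > 0)
    # s2: fire if enabled, otherwise keep the former content of every place.
    if enabled:
        nxt = [m - w_in + w_out for m, w_in, w_out in zip(moduli, pre, post)]
    else:
        nxt = list(moduli)
    return enabled, nxt


pre, post = [3, 1, 0, 2], [0, 1, 2, 0]
print(transition([3, 1, 0, 2], pre, post))   # enabled: tokens move to p2 and p3
print(transition([2, 1, 0, 2], pre, post))   # p1 holds too few tokens: unchanged
```

Only the moduli are touched here; the arguments of the marked places are the concern of the hull() module.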

The boolean result of Eq. (4.1) is used in part s2 to decide whether the next state of each marked place is assigned the result of the firing rule or its former content. Note the connection between the scheduler’s assignment and the output of the transition() module.

Hulls: Similarly, Figure 4.4 shows the algorithm for a hull q ∈ Q. The enabling part s1 checks whether or not the transition which is firing at a given step k, i.e., t[sch], has any connection to the hull itself. This conforms with Definition 17.3. The action described in Definition 18 is implemented in the execution part s2, which allows the next state of the argument in pi to be set to either (a) a sum of values coming from the preset, or (b) its previous content. Chapter 3 (Section 3.5) noted that special care has to be taken in the summation over the Z∗[i] domain; line 09 has been implemented accordingly.

    Module: HULL
    Input:
      - p: array of places
      - pre: array in data domain
      - post: array in data domain
      - FA: set of arcs in (T × Q)
    Output:
      - p′: array of places

    01 s1: Check enabling condition:
    02   if (output of the scheduler ∈ FA) & (there is no deadlock)
    03   then
    04     en ← ⊤
    05   else
    06     en ← ⊥
    07 s2: Perform the Execution Operation:
    08   forall (p[i] ∈ P)
    09     s ← ∑_{i=1..|P|} ∠p[i] · pre[i]
    10   forall (p[i] ∈ P)
    11     if (en) then ∠p′[i] ← post[i] · s
    12     else ∠p′[i] ← ∠p[i]

Figure 4.4: Algorithm for a DFN Hull

Guard Function: The crossing of information between control and data domains is partially achieved by the guard function G. This function compares a value in the argument (data domain) of a set of marked places, and produces, as a result, a decision that affects the modulus of another set of marked places (control domain). The comparison is carried out by the boolean evaluation of an atomic proposition (AP), as suggested in Definition 7. Figure 4.5 shows that the implementation of the guard() module mainly consists of a switch() statement that returns the evaluation of a condition, based upon the symbol used as its argument. Appendix B gives details about the implementation of these modules using the popular Cadence SMV language.

    Module: GUARD
    Input:
      - op: place
      - sgn: {=, ≠, >, ≥, [...]
    [...]
      if (’>’) then guard ← (∠op > val)
      if (’≥’) then guard ← (∠op ≥ val)
      if (’[...]

Figure 4.5: Algorithm for a DFN Guard

4.2.1 Implementation and Results

The library introduced in Section 4.2 has been applied to a number of examples, in order to demonstrate its validity. Without loss of generality, the implementation of the algorithms has been done using the Cadence SMV tool [Cadence, 2001] as verification engine for the methodology, i.e., the four-module library has been encoded using Cadence SMV syntax, for the following reasons:

• It is able to analyse both CTL and LTL properties.
• It can potentially reduce the BDD space by means of symmetry.
• It supports data type reduction.
• It is robust and well known within the research community.

The aim of this section is to show how the methodology described scales up when the complexity increases. For this, the multiplier presented in Chapter 2 (Figure 2.10) and later modelled by means of the DFN model (Figure 3.5) has been studied for various levels of capacities K and lengths L. The property to be verified is the generic functionality of a multiplier, i.e., for any inputs a and b the output of the multiplier should be the product a · b. Therefore, two abstract signals a and b are used and the following LTL property has to be assessed:

$$\varphi_c = \Diamond\,(\angle p_c = a\cdot b)$$

This means that ∠p6 will eventually reach the value of (a · b). Although the methodology can be applied to the formal verification of more complex problems, mainly in the area of reactive embedded systems, this simple example is concise enough to give an insight into the efficiency of the framework.

To illustrate the effects obtained on the results when the method is scaled up to realistic problems, Figure 4.6 shows a search through the state space of the multiplier example, introduced in Chapter 2 (Section 2.2.5) and modelled using DFN in Chapter 3 (Section 3.7). Two early indicators of complexity, in both control and data domains, are used in order to establish some parameters to foster the ongoing research. The solid line shows the state space exploration of the multiplier when the capacity K(p), ∀p ∈ P is fixed and the length L(p), ∀p ∈ P is variable, while the dotted line represents a space search using constant L(p) and variable K(p). As shown in the figure, increasing L(p) results in an exponential growth of verification time, while increasing K(p) results in a linear growth. Therefore, it can be said that larger ranges of the values have a stronger impact on the complexity of the methodology than an increased number of tokens. Within the context of the complex plane, this means that a further refinement in the quantisation of the modulus of the underlying complex number of a marked place is preferred, rather than a refinement in the quantisation of its argument.

Figure 4.6: Verification time depending on the capacity K and the length L (plot of time [sec] against Capacity / Length, with one curve for the data domain and one for the control domain)

4.2.2 Properties

There are two types of properties, structural and behavioural. While the first one only depends on the structure S of the model, the latter also depends on the initial marking µ0. The analysis of behavioural properties is of much interest in PN theory (and therefore in DFN). This section presents a brief analysis of some behavioural properties, such as reachability, safety and liveness, and relates them to both LTL and CTL notation in order to apply the proposed methodology.

Reachability: As outlined in Chapter 3 (Section 3.8), a marking µk is reachable from a marking µ0 if there is a sequence of transition firings which leads to µk. In spite of the work carried out by Mayr [1981], which proves that reachability analysis based on vector addition systems is decidable, there are serious limitations in terms of tractability of such approaches. The major drawback of reachability analysis based on vector addition systems comes from the sparsity of the matrices involved in the state equation. The DFN state equations presented in Section 3.8, Eqs. (3.6)-(3.8), are no exception. Indeed, because they contain data information, their complexity level is increased compared to classical PN state equations. Thus, a more efficient method for reachability analysis is used in the DFN methodology. A CTL formula is used instead of the algebra inherent to vector addition systems, where a marking µk is said to be reachable if the following property holds:

$$\varphi_R = \exists\Diamond \bigwedge_{i=1}^{|P|} \big(\mu(p_i) = c_i\big) \qquad (4.2)$$

where $c_i = b_i e^{i\cdot a_i}$ is the desired final marking µk(pi), ∀pi ∈ P at the time step k. Therefore, if both |µ(pi)| = bi and ∠µ(pi) = ai hold, then the state of the system has eventually reached a marking of µk(pi), ∀ 1 ≤ i ≤ n.

Safety: Safety properties are conditions that are verified along any execution path. This type of property is usually associated with some critical behaviour, and should therefore always hold. A particular type of safety property is known, in the context of Petri nets, as safeness. Classically, a safe PN allows, at most, one token in every place, for any reachable marking, which means that the following LTL formula holds:

$$\varphi_s = \Box \bigwedge_{i=1}^{|P|} \big(|\mu(p_i)| \le 1\big)$$
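On an explicitly enumerated set of reachable markings, the reachability and safeness properties reduce to simple searches. The Python sketch below is only meant to make the two formulae concrete: each marking is assumed to be a tuple of (modulus, argument) pairs, the function names are hypothetical, and a model checker would of course evaluate the temporal formulae symbolically instead.

```python
def holds_reachability(reachable_markings, target):
    """EF-style check: some reachable marking equals the target in every place."""
    return any(marking == target for marking in reachable_markings)


def holds_safeness(reachable_markings):
    """Safeness check: every place holds at most one token in every reachable marking."""
    return all(modulus <= 1
               for marking in reachable_markings
               for modulus, _argument in marking)


# Two places; each entry is a (modulus, argument) pair.
states = {((1, 0), (0, 0)), ((0, 0), (1, 3))}
print(holds_reachability(states, ((0, 0), (1, 3))))
print(holds_safeness(states))
```

The target marking plays the role of the constants c_i in Eq. (4.2), with the modulus b_i and the argument a_i given separately.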

Liveness: A DFN model that never changes its marking is likely to be of very little interest. Thus, liveness properties indicate that a certain DFN model would not get trapped in a single marking (or a particular cycle defined by a limited set of markings). The absence of deadlocks is a fundamental liveness property in the theory of Petri nets. For the model checking of a DFN model, the deadlock condition is expressed as follows:

$$\varphi_d = \neg \bigvee_{i=1}^{|T|} \Bigg( \bigwedge_{p_j \in {}^{\bullet}t_i} |\mu(p_j)| \ge W(p_j,t_i) \Bigg)$$

Here the disjunction states that there is “at least one transition enabled”, so ϕd holds exactly when no transition is enabled. The condition ϕd is used within s1 in the hull module as a part of the enabling condition, i.e., the ‘deadlock’ condition affects the hull’s member ‘en’, as illustrated in Figure 4.4.
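The deadlock condition and its use in the enabling part of a hull can be sketched as follows. This Python fragment is an interpretation rather than the thesis's SMV code: the function names and argument layout are hypothetical, the offset function H and the periodic wrap-around of the data domain are omitted, and places with a zero output weight are assumed to keep their previous argument.

```python
def is_deadlocked(moduli, pre_weights):
    """phi_d: no transition has all of its input places sufficiently marked."""
    return not any(
        all(m >= w for m, w in zip(moduli, pre) if w > 0)
        for pre in pre_weights
    )


def hull(arguments, pre, post, fired, connected, deadlocked):
    """One hull step: enable (s1), sum the preset, then overwrite the postset (s2)."""
    enabled = fired in connected and not deadlocked
    if not enabled:
        return list(arguments)
    s = sum(a * w for a, w in zip(arguments, pre))
    # Assumption: only places in the postset of the hull are overwritten.
    return [w * s if w != 0 else a for a, w in zip(arguments, post)]


moduli = [1, 0]
pre_weights = [[1, 0], [0, 1]]              # input weights of t1 and t2
dead = is_deadlocked(moduli, pre_weights)   # t1 is enabled, so no deadlock
print(hull([2, 1, 0], [1, 1, 0], [0, 0, 1], fired=1, connected={1}, deadlocked=dead))
```

In the example the hull adds the arguments of the first two places and writes the result into the third one, but only because the firing transition is connected to the hull and the net is not deadlocked.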

4.3 Experimental Results

All empirical results have been obtained using the Cadence SMV tool on a Sparc SunUltra 10 / 440 MHz with 512 Mb RAM running Solaris 8. Two benchmarks are presented in this section, in order to show the applicability of the DFN model checking methodology to real-life examples: (1) a VME-bus controller, and (2) an Ethernet coprocessor.

4.3.1 Example 1: verification of a VME-bus controller

The VME-bus controller [Peterson, 1998] is an interesting asynchronous circuit that has been widely studied. This bus is of much interest for industrial applications (e.g., avionics), since it provides a flexible multi-master bus arbitration scheme that is both simple and not vulnerable to noise. Figure 4.7 shows the interface of a generic device connected to a VME bus, and its timing diagram. The functionality of the controller is to regulate the reading and writing cycles of the device connected to the bus through a data transceiver. The controller opens the transceiver to transfer data from (to) the device (bus), by means of signal D. In the read cycle, a request-to-read signal DSr is propagated to LDS. Then, the device acknowledges with a LDTACK when the data is ready, which is taken into account to activate signal D. A falling edge in signal DSr, i.e., the end of the read cycle, causes D and all other outputs of the controller to go low. In the write cycle (started by signal DSw), the controller places the data in the device’s port and sends a request-to-read signal to the device through an LDS signal. When the device is ready, an LDTACK signal is produced, which closes the transceiver in order to isolate the device from the bus. The design of the complete VME-bus controller has been presented in [Yakovlev and Petrov, 1990], where signal transition graphs (STG) have been used to attain a complete gate netlist from the timing diagram specifications



of the model. The synthesis part of the design has been carried out through four next-state functions, which are replicated in Eqs. (4.3)-(4.6).

$$D = \mathit{LDTACK} \cdot \mathit{csc0} \tag{4.3}$$
$$\mathit{LDS} = D + \mathit{csc0} \tag{4.4}$$
$$\mathit{DTACK} = D \tag{4.5}$$
$$\mathit{csc0} = \mathit{DSr} \cdot \big(\mathit{csc0} + \overline{\mathit{LDTACK}}\big) \tag{4.6}$$

Figure 4.7: VME bus controller. (a) Interface diagram; (b) Read cycle.

This section describes the application of our model checking methodology to an extended version of the model, which includes not only the VME-bus controller but also the Data Transceiver. The goal of this section is to verify that the DFN model presented in Figure 4.7 corresponds to the behaviour inherent to the net description. To achieve this, we have developed a DFN model that takes into consideration the behaviour described above. Each control signal described has been bound to a pair of transitions, the first one capturing the rising edge and the second one capturing the falling edge. This binding is presented below:

DSr    ← {t3, t10}
LDS    ← {t1, t7}
LDTACK ← {t2, t4}
D      ← {t5, t9}
DTACK  ← {t8, t6}
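To make the next-state functions of Eqs. (4.3)-(4.6) concrete, the sketch below evaluates them as Boolean functions and drives them through one read cycle. It assumes that LDTACK is complemented in Eq. (4.6), as in the SMV definition of csc0 given later in this section; the function `settle`, the all-low initial state, and the stimulus sequence are illustrative assumptions rather than part of the original design flow.

```python
def settle(dsr, ldtack, state):
    """Re-evaluate Eqs. (4.3)-(4.6) until no signal changes (a simple fixpoint)."""
    current = dict(state)
    for _ in range(8):  # generous bound on the number of passes
        csc0 = bool(dsr and (current["csc0"] or not ldtack))        # Eq. (4.6)
        d = bool(ldtack and csc0)                                     # Eq. (4.3)
        nxt = {"csc0": csc0, "D": d, "LDS": d or csc0, "DTACK": d}   # Eqs. (4.4), (4.5)
        if nxt == current:
            return current
        current = nxt
    raise RuntimeError("signals did not settle")

state = {"csc0": False, "D": False, "LDS": False, "DTACK": False}
trace = []
# Assumed environment stimulus for one read cycle, as (DSr, LDTACK) pairs
for dsr, ldtack in [(True, False), (True, True), (False, True), (False, False)]:
    state = settle(dsr, ldtack, state)
    trace.append(state)
    print(f"DSr={dsr!s:5} LDTACK={ldtack!s:5} -> {state}")
```

Under these assumptions the trace follows the read cycle described above: DSr raises LDS, the acknowledgement LDTACK raises D and DTACK, and the fall of DSr drives all controller outputs low.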

For example, firing transition $t_1$ represents the rising edge of signal LDS and, consequently, firing $t_7$ represents its falling edge. Having introduced the binding used for all control signals, we may now capture the behaviour of the VME-bus controller's read cycle shown in Figure 4.7(b). The read cycle shows the temporal behaviour of each control signal, and their control dependencies. Thus, e.g., it can be observed that the rising edge of the LDS signal follows from a rising edge of DSr, and leads to the rising of LDTACK. This dependency is captured by a place $p \in P$, so that $p \in t_i^{\bullet}$ and $p \in {}^{\bullet}t_j$ iff the event represented by $t_j$ needs $t_i$ to occur beforehand. For instance, firing $t_3$, which corresponds to the rising edge of DSr, will put a token in a place that has to be shared by $t_1$ (the LDS signal). Figure 4.8 presents the DFN model of the given specification. This model is an extension of the one in [Kishinevsky et al., 1998], where an additional hull is used to capture the Data Transceiver operation, which was not taken into account in that work. The verification of the four next-state functions obtained in [Kishinevsky et al., 1998], with respect to the timing diagram, is carried out by means of the following three LTL formulas:

$$\varphi_1 = \Box\Diamond\,\big(\mathit{LDTACK} \wedge \mathit{csc0} \overset{\circ}{\Longrightarrow} D\big)$$
$$\varphi_2 = \Box\Diamond\,\big(D \overset{\circ}{\Longrightarrow} \mathit{DTACK}\big)$$
$$\varphi_3 = \Box\Diamond\,\big(D \vee \mathit{csc0} \overset{\circ}{\Longrightarrow} \mathit{LDS}\big)$$

These formulas express that the next-state functions derived in [Kishinevsky et al., 1998] should be satisfied infinitely often. The idea of "next-state" is implicit in the notation "$\overset{\circ}{\Longrightarrow}$", which means that if the cause of the implication holds then the consequence holds in the next time step.

Figure 4.8: DFN model of the VME bus controller

Eq. (4.6) is recursive and, therefore, it cannot be expressed in a linear or branching temporal logic, i.e., neither as an LTL nor as a CTL formula. Consequently, proving ϕ1, ϕ2, and ϕ3 is not sufficient to infer the correctness of the model w.r.t. the next-state functions given in Eqs. (4.3)-(4.6). Therefore, the recursive next-state function is presented as a fact, and defined in SMV in the following way:

    abstract csc0 : boolean;
    next(csc0) := DSr & ( csc0 | ~LDTACK );

Once proved true, the three LTL formulas (ϕ1, ϕ2 and ϕ3), together with the signal csc0, establish the correctness of the model in relation to the timing diagram of the read cycle. Additionally, the designer may be interested in checking that the model captures issues related to the data flow of the system. Thus, ϕD1 and ϕD2 are two CTL formulas that describe the behaviour in the data domain. That is, for any path taken by the execution, there is a point where a rise in signal D (D↑) will eventually cause q1 to copy the data content of p12 into p13, until the signal falls (D↓). Also, ϕD2 ensures that the negated condition occurs as a consequence of the fall of D (D↓), since the Data Transceiver should be in a high impedance mode whenever signal D is low. Formally, these CTL formulas are:

$$\varphi_{D1} = \forall\Diamond\,\Big(D{\uparrow} \Longrightarrow \forall\,\big(\exists\bigcirc\,(|\mu(p_{13})| = |\mu(p_{12})|)\ \mathcal{U}\ D{\downarrow}\big)\Big)$$
$$\varphi_{D2} = \forall\Diamond\,\Big(D{\downarrow} \Longrightarrow \forall\,\big(\neg\exists\bigcirc\,(|\mu(p_{13})| = |\mu(p_{12})|)\ \mathcal{U}\ D{\uparrow}\big)\Big)$$

Using the Cadence SMV model checker, the above five properties (three LTL and two CTL formulas) have been proved correct. This implies that it is possible to infer the correctness of the DFN model w.r.t. the timing diagram shown for the read cycle. The verification time for this case study was 3.57 seconds.

4.3.2 Example 2: verification of an Ethernet network coprocessor

The Ethernet network is an IEEE standard. The network coprocessor transmits and receives data frames over a network by means of the CSMA/CD protocol, which is defined in the IEEE 802.3 standard [ANSI/IEEE, 1991]. It has been used in previous publications in order to show an application whose complexity is closer to industrial standards than that of several other benchmarks. Gupta and De Micheli [1992] have used this coprocessor specification to test their hardware/software co-design tool on a real-life example. Narayan and Vahid [1992] have used it in order to compare their modelling technique. Naik and Sistla [1994] present an approach to formally verify the CSMA/CD protocol. This approach utilises the SMV tool¹ to verify both the asynchronous and the synchronous model [Weinberg and Zuck, 1992] of this protocol.

¹ Note the difference between the SMV tool they have used [CMU, 1997] and the Cadence SMV verification suite [Cadence, 2001] used here.

Figure 4.9: DMA tx/rx of the Ethernet coprocessor, and signals involved

The operation of the coprocessor is controlled by the execution unit, which sends the starting memory address to the transmit unit and then enables a DMA unit to operate straight into memory. The DMA unit (dma_xmit) directly reads from successive memory locations in order to obtain the destination address, data length, and the actual data, which are then sent to the xmit_frame unit. Figure 4.9 shows the top level diagram of the Ethernet coprocessor's DMA transmitter/receiver. Data flow has been marked with thick lines in order to distinguish it from the control flow. This unit has two modes of operation, dma_xmit_normal and dma_xmit_cancel: it normally stays in the first mode but, if a failure occurs in the transmission to xmit_frame, the DMA unit switches to the alternative mode, which sets the environment to restart the transmission process. It can be observed in the figure that only the dma_xmit_normal mode of operation deals with the data flow of the system.

The specification presented above is captured in the DFN model depicted in Figure 4.10. The work in [Narayan and Vahid, 1992] suggests that the specification should be divided into eight sub-processes, namely START, DEST1, DEST2, LENGTH, DATA, DATA2, END, and RESTART. In the DFN model, such sub-processes are initialised by placing a token in places p1 to p8, which are highlighted in the figure for further reference. A property that can be drawn from the model is that such places are mutually exclusive, which can be expressed as:

$$\Box \bigwedge_{i=1}^{8} \big(|\mu(p_i)| = 1\big) \wedge \big(|\mu(p_j)| = 0\big), \quad \forall j \ne i,\ 1 \le j \le 8 \tag{4.7}$$

Figure 4.10: DFN model of the Ethernet coprocessor

Section 4.5 provides more details about the verification properties that serve to prove the correctness of this model. The complete correctness of the DFN model for the Ethernet coprocessor has been proved after 4809.2 seconds, i.e., 1 hour and 20 minutes, of verification time. Although being able to prove the correctness of the entire dma_xmit unit of the Ethernet coprocessor (without reaching a state explosion) is quite an achievement, a methodology like the one presented in this section does not scale well to larger designs. Thus, the next section introduces a modular approach which allows the verification of larger designs based on the same framework.

4.4 Modular approach to the Verification of DFN models

In the design of large embedded systems, the designer often faces a trade-off as to which validation technique to use in order to guarantee that the final implementation will meet the specification. On the one hand, formal verification techniques (such as those analysed so far) aim at mathematically proving the correctness of a specification, but may suffer from state explosion. On the other hand, simulation [Bushnell and Agrawal, 2001] and testing [Abramovici et al., 1990] can cope with larger specifications, but they do not cover the entire state space. Chapter 1 has schematised this trade-off by means of Figure 1.4. The key aspect here is that simulation and testing are feasible for large models because they do not attempt to validate the entire specification at once. In other words, breaking down the specification into smaller parts, called modules, would make the entire system specification formally verifiable, i.e., the model checker will give an answer as to whether or not the specification satisfies a temporal logic property in a finite amount of time. This type of verification, known as modular or compositional verification, is a divide-and-conquer approach that uses natural divisions in the specification in order to partition the structure and, therefore, reduce the number of states involved in the process.

Compositional verification was introduced in [Clarke et al., 1989], aiming to reduce the complexity of validating large designs. Since then, the problem has been studied in several formalisms [Long, 1993; Grumberg and Long, 1994; Henzinger and Alur, 1995], but it has not often been applied in computer-aided verification of real-world applications. Only in recent years has there been an increasing research effort towards developing tools and techniques to validate real-life embedded systems by compositional verification [McMillan, 2000]. This effort includes a variety of areas, such as microprocessor architectures [McMillan, 1998; Jhala and McMillan, 2001], hardware protocols [McMillan, 2001], multimedia SoC [Roy et al., 2000] and multi-agent systems [Jonker and Treur, 2002].

4.4.1 Compositional Verification

It is known that a small increment in the size of the model structure $M$ results in a state space to be explored that is several orders of magnitude bigger [Kozen and Tiuryn, 1989; Kupferman et al., 2001]. Therefore, by decomposing $M$ into $n$ modules $M_1, M_2, \ldots, M_n$, in such a way that the parallel composition of all these modules $M_1 \parallel M_2 \cdots \parallel M_n$ is equivalent to the original structure $M$, it is possible to significantly reduce the number of states searched by the model checker, because the complexity is reduced. However, a model checker often uses some heuristic in order to avoid redundancy in larger state spaces and, by exploiting some features of BDDs, also reduces the number of BDD nodes allocated in memory. This means that a large number of very small modules $M_i$ does not necessarily lead to the most efficient way to perform model checking, since they would not share any path in the verification process. In fact, there is a trade-off between having a vast number of small modules, which can have quite a substantial redundancy factor, and having only a few modules that are larger in size, which consequently take memory resources at an exponential rate.

Mathematically, the compositional verification of a structure $M$ that has been constructed by the parallel composition, i.e., using the $\parallel$ operator, of modules $M_i$, is formulated as:

$$\frac{\begin{array}{c} M_1 \models \varphi_1 \\ M_2 \models \varphi_2 \\ \vdots \\ M_n \models \varphi_n \\ f(\varphi_1, \ldots, \varphi_n) \Longrightarrow \varphi \end{array}}{M_1 \parallel M_2 \cdots \parallel M_n \models \varphi}$$

which means that, by proving that each module $M_i$, $\forall\, 1 \le i \le n$, satisfies the temporal logic formula $\varphi_i$, and also proving that each formula $\varphi_i$ is related to the property $\varphi$ (through a logical function $f$), the conclusion that the entire system $M$ satisfies the property $\varphi$ can be drawn.

A key issue in compositional verification is the assume-guarantee paradigm [Misra and Chandy, 1981; Pnueli, 1984], which bases its reasoning on separating a system into two parts: the module $M'$ and the environment $M''$. Guarantees $G_i$ are properties of $M'$ which are verified assuming that $M''$ satisfies some assumptions $A_j$. By appropriately combining a set of assumptions $\mathcal{A}$ and a set of guarantees $\mathcal{G}$, it is possible to infer the correctness of the entire system $M$ without actually building the global state-transition graph.

Motivating Example

Suppose that a ten-stage pipeline system needs to be formally verified. The property to be verified is simply that some data put at the beginning of the pipeline will propagate and eventually reach the end. The minimum unit of functionality of the system is each stage itself, but it will be assumed that the system can only be partitioned into three separate modules. Table 4.1 shows that different ways of partitioning such a pipeline, i.e., grouping the pipeline stages, lead to different complexities in terms of BDD nodes allocated in memory. This raises the vexed question of how to identify the partitions that lead to better results, which is the aim of the next section.

Partition     BDD nodes
⟨1, 2, 7⟩     2706
⟨1, 3, 6⟩     2135
⟨2, 3, 5⟩     1912
⟨2, 4, 4⟩     1851
⟨3, 3, 4⟩     1830

Table 4.1: Sizes of BDD trees for a ten-stage pipeline

4.4.2 Estimation Method

Intuitively, a four-module system which has a partition $P = \langle 4, 4, 4, 4 \rangle$ is more balanced than the same system using another partition $P' = \langle 1, 9, 4, 2 \rangle$, where the elements in the tuple indicate the number of elementary units (transitions $t \in T$) in each module. More formally, Definition 20 defines the modular unbalance of a Kripke structure which has been partitioned under a partition scheme $P$.

Definition 20 The modular unbalance of a Kripke structure $M = M_1 \parallel M_2 \cdots \parallel M_n$ is given by:

$$\sigma = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (m_i - \bar{m})^2} \tag{4.8}$$

where $m_i$ is the size of module $M_i$ and $\bar{m}$ is the statistical mean (average) of the sizes of the modules, i.e., $\bar{m} = m/n$.

It is clear from Eq. (4.8) that the unbalance is defined as the standard deviation of the size of each module w.r.t. the average size $\bar{m} = m/n$. This makes balance analogous to volatility, since standard deviation is one of the most common ways to assess the volatility of a discrete variable. Thus, $\sigma \to 0$ corresponds to a very balanced system while $\sigma \to \infty$ corresponds to an extremely unbalanced system. A system with $\sigma = 0$ will be called perfectly balanced, and a system with the greatest possible $\sigma$ will be called perfectly unbalanced.

Proposition 1 Among all possible partitions of a system $M$, the one which minimises $\sigma$ will also minimise the verification complexity.

Intuition, again, serves to describe the feature outlined in Proposition 1. Suppose a system $M$ has a partition scheme $P$ which is perfectly balanced. Then, changing from $P$ to $P'$ implies that at least one transition $t \in T$ has to be removed from a module $M_i$ and added to another module $M_j$. This will lead to a decrement in the complexity of verifying $M_i$ and an increment in the complexity of verifying $M_j$. Given the exponential nature of the BDD state space, it is more likely that the improvement due to the reduction of $M_i$ is smaller than the worsening caused by the expansion of $M_j$. Therefore, an overall increase in the verification complexity is accompanied by an increase of $\sigma$, and vice versa.

Partition     σ         BDD nodes
⟨1, 2, 7⟩     2.6247    2706
⟨1, 3, 6⟩     2.0548    2135
⟨2, 3, 5⟩     1.2472    1912
⟨2, 4, 4⟩     0.9428    1851
⟨3, 3, 4⟩     0.4714    1830

Table 4.2: Estimating the pipeline complexity

Coming back to the example presented in Section 4.4.1, Table 4.2 now shows that the more balanced the partitioning scheme, the fewer BDD nodes are allocated in memory. That is, $\sigma \to 0$ is a desirable feature of the design, which leads to an improvement of 32% in the resources that the pipeline example has allocated to the BDD tree. Therefore, this has provided an answer to the vexed question raised earlier, and shows that it is important to include $\sigma$ as an estimation method, in order to guide the partitioning of a system which is being verified by compositional methods.

Due to the internal behaviour of the model checker, which unfolds cycles to perform the verification, the results shown assume that there are no cycles within the modules. Otherwise, well-balanced partitions may have a bigger unfolded structure than less balanced ones and, thus, more complex BDD trees. Figure 4.11 illustrates this principle, showing a system with cycles within its modules and another without them. In the cyclic case, it does not necessarily hold that, between two partitions $P$ and $P'$ with $\sigma < \sigma'$, the complexity of $P$ will be less than that of $P'$.

Figure 4.11: Cyclic and acyclic complexity (BDD nodes versus standard deviation, for acyclic and cyclic modules)
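As a worked check of the estimation method, the sketch below recomputes the modular unbalance for the five pipeline partitions using Python's standard `statistics` module. One observation, offered tentatively: the tabulated σ values appear to match the population form of the standard deviation (dividing by n) rather than the sample form (dividing by n − 1) written in Eq. (4.8); both are printed so that the two can be compared. The name `rank_by_unbalance` is hypothetical.

```python
from statistics import pstdev, stdev

# (partition, sigma as tabulated, BDD nodes) for the ten-stage pipeline
TABLE_4_2 = [
    ((1, 2, 7), 2.6247, 2706),
    ((1, 3, 6), 2.0548, 2135),
    ((2, 3, 5), 1.2472, 1912),
    ((2, 4, 4), 0.9428, 1851),
    ((3, 3, 4), 0.4714, 1830),
]

def rank_by_unbalance(partitions):
    """Order candidate partitions from most to least balanced (smallest sigma first)."""
    return sorted(partitions, key=pstdev)

for partition, sigma, nodes in TABLE_4_2:
    print(partition,
          "population:", round(pstdev(partition), 4),
          "sample:", round(stdev(partition), 4),
          "tabulated:", sigma,
          "BDD nodes:", nodes)

ranking = rank_by_unbalance([row[0] for row in TABLE_4_2])
print(ranking)
```

Whichever normalisation is used, the ordering of the partitions is the same, so the choice does not affect which partition the estimation method would select for a fixed number of modules.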

4.4.3 Towards an Automatic Approach

Hardy and Ramanujan [1918] have proven that the number of distinct partitions for a modular design is:

$$\#P(m) \approx \frac{\exp\!\left(\pi \sqrt{\frac{2m}{3}}\right)}{4m\sqrt{3}} \tag{4.9}$$

where $m$ is the number of modules and $\#P(m)$ the number of distinct combinations among them. However, not all $P_i$, $1 \le i \le \#P(m)$, are feasible. When the estimation method is applied to DFN models, $\#P(m)$ defines the absolute maximum number of partitions that the $m = |T|$ transitions may be grouped into. But knowledge of the DFN structure may significantly reduce this number. For instance, if there is a strong interaction between two transitions $t_i$ and $t_j$, mapping them to separate modules would lead to communication overhead, which is not desirable. Therefore, Definition 21 introduces the concept of breakpoint places.

Definition 21 Let $\Im_p \subseteq P$ be a set of places arbitrarily chosen to be inputs to the system, and let $O_p \subseteq P$ be a set of places called output places. A place $p$ is said to be a breakpoint place if:
i.- it has only one output transition: $|p^{\bullet}| = 1$.
ii.- it is not part of either $\Im_p$ or $O_p$: $p \notin \Im_p \wedge p \notin O_p$.
iii.- it is not part of the data domain: $p \notin S_D \equiv p \notin ({}^{\circ}q \cup q^{\circ}),\ \forall q \in Q$.

The idea behind breakpoint places is to capture the bottlenecks in threads of control flow. As outlined in Eq. (4.7), a set of breakpoint places should be mutually exclusive. This is not a consequence of the definition; it is another constraint to be taken into account. Thus, by grouping places, transitions, and hulls into modules, such that they can only communicate with each other through breakpoint places, it is possible to capture those $P_i$ that are suitable candidates for a compositional approach. Thus, the motivation for this work is that the structure of the DFN model defines a preliminary set of feasible partitions and later, by considering the modular unbalance, the designer may choose the one that produces the smallest number of BDD nodes.

Beyond the selection of the partitioning scheme $P_i$, the set $\Phi = \{\varphi_1, \varphi_2, \ldots, \varphi_n\}$ of properties originally specified for the model needs to be extended in order to include features that allow the consideration of a modular approach. Thus, a new set of properties $\Phi' = \{G_1, G_2, \ldots, G_n, A_1, A_2, \ldots, A_m\}$ is obtained, where $\varphi_i$ is mapped to $G_i$, and the set $\{A_1, A_2, \ldots, A_m\}$ is produced to assure liveness among breakpoint places.

$$A_j : \complement_1\big(\mu(p_{\ell_1})\big) \Longrightarrow \Diamond\, \complement_2\big(\mu(p_{\ell_2})\big) \tag{4.10}$$

In order to achieve liveness, each $A_j$ is constructed following Eq. (4.10). Condition $\complement_1$ evaluates either the number of tokens or data quanta from input places, while $\complement_2$ is concerned with the marking of output places. The principle here is that given a condition $\complement_1$, the condition $\complement_2$ should eventually become true. Thus, by proving that a certain output place $p_\ell$ of a module $M_i$ will be eventually marked with a token, all other modules $M_j$ that have $p_\ell$ as an input may eventually start. Then, there will be other $A_j$ which prove that this token is transferred to the output places of $M_j$, and so forth.
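The three conditions of Definition 21 amount to a simple structural filter over the places of a net. The sketch below assumes the net is available as a dictionary from each place to its set of output transitions, together with the sets of input places, output places, and places connected to hulls; this encoding, the function name `breakpoint_places`, and the five-place example are hypothetical.

```python
def breakpoint_places(postset, inputs, outputs, data_places):
    """Return the places satisfying conditions (i)-(iii) of Definition 21."""
    return {
        p
        for p, out_transitions in postset.items()
        if len(out_transitions) == 1                  # (i) exactly one output transition
        and p not in inputs and p not in outputs      # (ii) neither an input nor an output place
        and p not in data_places                      # (iii) not attached to any hull
    }

# Hypothetical net fragment
postset = {
    "p1": {"t1"},          # input place of the system
    "p2": {"t2"},          # candidate breakpoint
    "p3": {"t3", "t4"},    # two output transitions (a choice)
    "p4": {"t5"},          # data place read by a hull
    "p5": set(),           # output place of the system
}
candidates = breakpoint_places(postset, inputs={"p1"}, outputs={"p5"}, data_places={"p4"})
print(candidates)
```

The filter only yields candidates; the mutual exclusion constraint mentioned after Definition 21 still has to be checked separately.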

4.5 Real-life Example: the Ethernet coprocessor

This section applies the estimation method discussed in Section 4.4 in order to improve the model checking results obtained in Section 4.3. One of the earliest attempts to deal with the state explosion problem in the formal verification of an Ethernet coprocessor was presented by Dill and Wong-Toi [1995], who directly implemented some liveness properties in C and used approximations to cope with the state explosion problem. However, the ANSI/IEEE [1991] standard makes clear that CSMA/CD is a highly structured protocol, which makes the Ethernet coprocessor very suitable as a benchmark of real-life complexity, particularly if compositional methods are to be applied.

Figure 4.10 has shown that the complete DFN model of the Ethernet coprocessor consists of 49 places, 36 transitions and 16 hulls. A first step towards reducing these figures is by means of the Petri net model of the binary signal module, which has been introduced in Chapter 2 (Figure 2.8). Similar structures have been identified in the DFN model of the Ethernet coprocessor, and have been replaced by a diamond-shaped symbol, where p1 has been labelled with '1', p3 with ↓, p4 with ↑, and p6 with '0'. This leads to Figure 4.12.

By use of Eq. (4.9), it can be inferred that a model with 36 transitions would in fact have #P(36) = 19370 ways to group these transitions into different modules. However, by identifying groups of places p ∈ P, transitions t ∈ T and hulls q ∈ Q, such that any thread of execution going from one group to the other always goes through the same place (or set of places), it is possible to reduce the number of practically feasible modules from 35 to 8. This means that #P(m) is also dramatically reduced, from 19370 to 26. Indeed, these modules correspond to the eight modules identified in Section 4.3.2, where Eq. (4.7) is assumed.

In order to prove the correctness of the Ethernet coprocessor by compositional verification, 22 temporal formulas have been used, which are the original 9 guarantee properties G plus 13 additional assumptions A. Table 4.3 shows the verification results for each G-property.
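The two figures quoted from Eq. (4.9) can be reproduced with a few lines of arithmetic. The sketch below evaluates the Hardy-Ramanujan approximation and, for comparison, counts integer partitions exactly with a standard dynamic-programming recurrence. The exact count is included only as a sanity check and is an addition of this sketch, not part of the original method; the function names are hypothetical.

```python
import math

def hardy_ramanujan(m):
    """Asymptotic estimate of the number of partitions, as in Eq. (4.9)."""
    return math.exp(math.pi * math.sqrt(2 * m / 3)) / (4 * m * math.sqrt(3))

def partitions_exact(m):
    """Exact number of integer partitions of m (coin-change style recurrence)."""
    ways = [1] + [0] * m
    for part in range(1, m + 1):
        for total in range(part, m + 1):
            ways[total] += ways[total - part]
    return ways[m]

for m in (36, 8):
    print(m, round(hardy_ramanujan(m)), partitions_exact(m))
```

For m = 8 the estimate rounds to 26 while the exact number of partitions is 22, which suggests that the values quoted in the text come from the asymptotic formula rather than from an exact count.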

Figure 4.12: Ethernet coprocessor's model, using signal modules

Name     Property                                                                 Time [sec]   BDD Nodes   Module
Gacc     $(|\mu(p_{10})| = 1) \Longrightarrow \Diamond(|\mu(p_{21})| = 1)$                 8.48         10033       M1
Gfrom    $(|\mu(p_{10})| = 1) \Longrightarrow \Diamond(\angle\mu(p_{12}) = \angle\mu(p_9))$    9.2          10256       M1
Gs1      $(|\mu(p_{10})| = 1) \Longrightarrow \Diamond(\angle\mu(p_{13}) = \angle\mu(p_{12}))$ 9.11         10494       M1
Gs2      $\Diamond(\angle\mu(p_{13}) = \angle\mu_0(p_{12}) + 2)$                           10.82        141160      M5
Gs3      $\Diamond(\angle\mu(p_{13}) = \angle\mu_0(p_{12}) + 4)$                           10.85        140948      M5
Ghigh    $\Diamond(\angle\mu(p_{34}) = \mathrm{high}(\angle\mu(p_{14})))$                  8.12         5704        M3
Glow     $\Diamond(\angle\mu(p_{34}) = \mathrm{low}(\angle\mu(p_{14})))$                   7.68         3268        M4
Grel     $(|\mu(p_8)| = 1) \Longrightarrow \Diamond(|\mu(p_{26})| = 1)$                    8.14         7015        M8
Gfail    $(|\mu(p_{36})| = 1) \Longrightarrow \Diamond(sch = 23)$                          8.24         9029        M2

Table 4.3: Ethernet LTL properties: guarantees

Guarantee properties

Figure 4.13 shows the module M1, which produces the initialisation of the Ethernet controller. This part of the Ethernet controller is the one that requests access from the CPU when a txstart signal is sent to the DMA unit, i.e., upon receiving a token in place p10 the module will eventually raise the Bhold signal (cf. Gacc). By acknowledgement of the Bhold signal, i.e., as a consequence of Bholdak, the system reads from successive memory locations (cf. Gs1) starting from txaddress[16] (cf. Gfrom). For this proof, the inclusion of transition t37, which has •t37 = {p21} and t37• = {p22, p25}, is necessary, since it replaces the behaviour of the CPU in the event of a Bhold signal.

Figure 4.13: Initialisation: module M1

Figure 4.14: Sends destination address. (a) Module M3; (b) Module M4.

After its initialisation, the Ethernet coprocessor is ready to transmit Bdata[16]. Since the xmit_frame unit only accepts 8-bit words (bytes), the Bdata[16] signal has to be multiplexed in time. Thus, Ghigh and Glow are formulated in order to prove that both Bdata[15..8] and Bdata[7..0] are transferred to that unit. The two DFN modules shown in Figure 4.14 are devoted to the verification of Ghigh and Glow, respectively.

At this stage, the Ethernet coprocessor carries on operating normally, i.e., reading from successive memory locations (from address txaddress[16] onwards) and transferring the data contained in memory to the xmit_frame unit. Properties Gs2 and Gs3 guarantee that the program counter will eventually move on, fetching data consecutively from the next but one address. Figure 4.15 illustrates how the DFN model of the coprocessor captures this effect. In the event that a token is placed in p5, transition t7 becomes enabled, which allows q7 to execute if the transition fires. Thus, the firing of t7 has two consequences: (a) the token moves from p5 to p41, and (b) the argument ∠µ(p12) is incremented by two. The fact that t34 becomes enabled next ensures that q3 will eventually execute and copy ∠µ(p12) to ∠µ(p13), hence preparing the memory for the next fetch. Then, after fetching the data from memory through p14, a token in p19 indicates that the data is to be copied to p32, and the higher byte can consequently be transferred to the xmit_frame unit.

Figure 4.15: Setting up the length: module M5

Having sent the higher byte to the xmit_frame unit, the M5 module activates either M6 or M8, depending on the data copied to p44. If the M6 module is activated, the lower byte of p32 is transmitted, and a counter (p45) is incremented. If M8 is activated instead, the coprocessor finishes the transmission by setting the Bhold signal to 0 and letting the bus become free. Figure 4.16(a) shows how M6 is captured by means of the DFN model. It can be observed that the iterations of this module are controlled by the counter p45, which is incremented each time ∠µ(p45) − ∠µ(p44) < 0 or, in other words, when ∠µ(p45) < ∠µ(p44). When ∠µ(p45) reaches the same value as ∠µ(p44), transition t14 fires instead of t13, deviating the flow towards M8 instead of M7.

Property                                                                          Time [sec]   BDD Nodes   Module
[...]                                                                             8.5          10017       M1
[...]                                                                             7.41         4012        M3
[...]                                                                             7.01         2365        M4
[...]                                                                             8.14         7628        M5
[...]                                                                             9.53         69283       M5
[...]                                                                             8.37         5506        M5
[...]                                                                             8.07         2629        M5
[...]                                                                             8.14         3760        M5
[...]                                                                             8.19         10013       M6
[...]                                                                             8.23         9598        M6
[...]                                                                             8.29         9612        M6
[...] $\angle\mu_0(p_{45})) \Longrightarrow \Diamond(|\mu(p_6)| = 1)$                      9.97         30585       M7
$(\angle\mu_0(p_{44}) \le \angle\mu_0(p_{45})) \Longrightarrow \Diamond(|\mu(p_8)| = 1)$   9.66         30704       M7

Table 4.4: Ethernet LTL properties: assumptions

[...] possible due to a four-module library, which captures the semantics of the DFN model. Throughout the chapter, a number of CTL and LTL formulae have been given, in order to illustrate the type of properties that can be utilised within this verification framework. A generalised analysis of behavioural properties has been introduced, showing the capabilities of DFN models to cope with reachability, safety, and liveness formulations. Indeed, the reachability analysis proposed in this chapter (Section 4.2.2) is more efficient than the one proposed in Chapter 3 (Section 3.8), due to the absence of sparse matrices. The proposed methodology has been successfully applied to a number of examples, confirming its validity.

Furthermore, an insight into a verification methodology based on compositional methods is given. Such an approach towards breaking down the verification complexity has led to an estimation method, which is also presented in this chapter. The estimation method addresses the reduction of verification complexity by reducing the modular unbalance of a partition. Thus, the modular approach presented aims to reduce the complexity by carefully selecting the way the system is partitioned.

The results have been validated through an Ethernet coprocessor, showing that both the model checking library and the estimation method may be applied to models of reasonable complexity. By means of this benchmark example, both non-compositional and compositional verification have been compared. This comparison has shown that the compositional approach is approximately 25 times faster than the non-compositional one.
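The quoted speed-up can be checked against the reported run times. The sketch below assumes, since the text does not state how the factor was obtained, that the compositional cost is the sum of the individual verification times of all 9 guarantees and 13 assumptions, and compares it with the 4809.2 seconds reported for the non-compositional run.

```python
NON_COMPOSITIONAL_S = 4809.2  # whole-model verification time from Section 4.3.2

# Per-property verification times in seconds, as reported in Tables 4.3 and 4.4
guarantee_times = [8.48, 9.2, 9.11, 10.82, 10.85, 8.12, 7.68, 8.14, 8.24]
assumption_times = [8.5, 7.41, 7.01, 8.14, 9.53, 8.37, 8.07,
                    8.14, 8.19, 8.23, 8.29, 9.97, 9.66]

# Assumption of this sketch: the properties are verified one after another
compositional_s = sum(guarantee_times) + sum(assumption_times)
speedup = NON_COMPOSITIONAL_S / compositional_s

print(f"compositional total: {compositional_s:.2f} s, speed-up: {speedup:.1f}x")
```

Under that assumption the total is roughly 190 seconds, which is consistent with the factor of approximately 25 stated above.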

Chapter 5 Co-Synthesis of Dual Flow Nets As outlined in Chapter 1 (Figure 1.2), the final stage in the development of an embedded system is the synthesis step. This step provides a reduction in the level of abstraction, until a feasible physical implementation is reached. It has been argued that the synthesis process takes the internal design representation (IDR) of an embedded system, and a defined technology library, in order to produce the so-called embedded system implementation. The main contribution of this chapter is the creation of a co-synthesis method for the DFN model, i.e., the chapter demonstrates the applicability of DFN models in the context of a mixed hardware/software system design tool. Chapter 1 (Section 1.2.3), highlights the fact that, since the final implementation of embedded systems consist of both hardware and software, a significant proportion of research has been devoted to make more efficient designs, in the sense of parallelising aspects of both parts of the final implementation. A co-synthesis method refers to the synthesis of both hardware and software simultaneously. Such a method is studied in more detail in Section 5.1 and 5.2, where both background information and a co-synthesis tool developed at the University of Southampton, are introduced. Section 5.3 presents an algorithm capable of serving as an interface between the DFN representation and the co-synthesis tool. Finally, Section 5.4 outlines some conclusions.

5.1 Hardware/Software Co-synthesis

Co-synthesis studies the separate but interrelated synthesis of hardware and software for system design. Wolf [1994] defines co-synthesis as an extension to the high-level synthesis (HLS) approach [Gajski et al., 1992], where an extra dimension is considered in the problem formulation. In addition to the three phases of HLS design, i.e., allocation, mapping, and scheduling, co-synthesis also incorporates a fourth one, which is called partitioning. Thus, a co-synthesis tool identifies parts of the system description which can be broken down (i.e., partitioned) into fragments of functionality, allocates resources from a technology library to the desired architecture, maps each functional fragment onto these resources, and schedules the time at which fragments are executed on a set of technological resources.

[Figure 5.1: Typical co-synthesis design flow. Inputs: IDR and technology library. Co-synthesis tool: architecture allocation; functional partitioning and mapping; scheduling; guided by power, performance, cost, etc. Outputs: HW (synthesis) and SW (compilation), linked by co-simulation.]

Figure 5.1 shows a typical design flow within a co-synthesis tool, which takes an IDR and a technology library as inputs. It can be observed that the starting point is the architecture allocation, where important decisions in terms of cost and performance are made. These decisions range from the number and type of processing elements to determining whether one type of communication protocol or another should be used in the final implementation. At a later stage in the co-synthesis flow, the functional partitioning and mapping of the system binds subparts of the functionality (fragments) onto the finite set of resources obtained from the allocation step. Furthermore, the two steps described are followed by a third step called scheduling. In the scheduling step, the execution order of tasks and communications is determined. Knowing the duration of each task, the set of resources to which the tasks are bound, and the dependencies among them, it is possible to infer the schedule by which to execute such tasks. Owing to their NP-complete nature, these three steps are often approached by heuristic methods [Eles et al., 1997]. The outcome of the co-synthesis tool consists of a mixed hardware/software architectural description, which is fed into both a hardware synthesiser and a software compiler. The synthesiser transforms a behavioural description of the tasks mapped to hardware into the interconnections among a set of electronic components chosen from the technology library. The compiler, on the other hand, produces low-level machine code from the high-level language used to implement the tasks mapped to software.
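The scheduling step described above can be illustrated with a minimal list-scheduling sketch in Python. This is not the heuristic used by the tools cited in this chapter; it is a generic greedy scheme, communications are ignored, and the task names, durations and resource names are hypothetical.

```python
def list_schedule(durations, mapping, deps):
    """Greedy list scheduling (illustrative only).

    durations: task -> execution time
    mapping:   task -> resource the task is bound to
    deps:      list of (predecessor, successor) pairs
    Returns a dict task -> (start, finish).
    """
    preds = {t: set() for t in durations}
    for a, b in deps:
        preds[b].add(a)
    resource_free = {r: 0.0 for r in set(mapping.values())}
    schedule = {}
    remaining = set(durations)

    def earliest(t):
        # A task may start once its predecessors and its resource are done.
        data_ready = max((schedule[p][1] for p in preds[t]), default=0.0)
        return max(data_ready, resource_free[mapping[t]])

    while remaining:
        ready = sorted(t for t in remaining if preds[t].issubset(schedule))
        if not ready:
            raise ValueError("dependency cycle: not a task graph")
        # Pick the ready task with the earliest start time (ties by name).
        task = min(ready, key=lambda t: (earliest(t), t))
        start = earliest(task)
        finish = start + durations[task]
        schedule[task] = (start, finish)
        resource_free[mapping[task]] = finish
        remaining.remove(task)
    return schedule


# Hypothetical example: t2 depends on t0 (software) and t1 (hardware).
example = list_schedule(
    durations={"t0": 2, "t1": 3, "t2": 1},
    mapping={"t0": "cpu", "t1": "asic", "t2": "cpu"},
    deps=[("t0", "t2"), ("t1", "t2")],
)
```

In this example t0 and t1 run in parallel on different resources, and t2 starts only when the slower of its two predecessors has finished.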

5.2 Co-Synthesis of Embedded Systems

The data flow graph (DFG) introduced in Chapter 2 (Section 2.2.5) is commonplace in high-level synthesis. In the framework of hardware/software co-synthesis, however, a coarser representation is preferred. Thus, Definition 22 extends the aforementioned DFG model, allowing it to capture the dependencies among fragments of functionality, as opposed to single operations.

Definition 22 A task graph is a directed acyclic graph GS(T, C), where: T = {τ0, τ1, ..., τn} is a set of tasks; C ⊆ T × T is a flow relation describing the dependency among tasks.

Being able to transform task graphs GS(T, C) into DFN models is quite a useful front-end for the DFN modelling technique, since it allows the designer to produce a synthesised version of the DFN model. As to the realisation of the three steps described in Section 5.1, the designer also requires some technique that allows the final implementation to be optimised towards certain goals. Since mobility is a feature that is gaining importance in embedded systems, due to the increasing demand for more functionality in consumer and automotive electronics, reducing the energy consumption is often regarded as the main goal to achieve. Energy management techniques guide the optimisation of the three aforementioned steps, by means of an estimation of the total power that would be dissipated in certain periods of time in the final implementation. The power dissipated in the elements of the architecture selected in the first step of the co-synthesis flow is divided into static and dynamic power. Static power dissipation is due to, e.g., leakage currents, while dynamic power dissipation is caused mainly by the switching activity in the hardware when performing a computation. The aim of the following subsections is twofold: (1) to present an energy management technique that, despite its recent introduction, is very promising in terms of efficiency, and (2) to overview the principles of a co-synthesis tool that utilises this technique.

5.2.1 Dynamic Voltage Scaling

Dynamic Voltage Scaling (DVS) [Hong et al., 1999; Burd et al., 2000] is a powerful technique that dynamically trades system performance for reduced energy dissipation. This trade-off leads to task schedules that still meet the deadlines, but in a way that is more energy-efficient. The underlying motivation for such a trade-off lies in the fact that the power dissipated in processing elements depends on both frequency and supply voltage but, while variations in the frequency produce linear effects, variations in the supply voltage result in quadratic effects. A processing element capable of dynamically changing its supply voltage and operational frequency settings during runtime is called a dynamic voltage scalable processing element (DVS-PE). For instance, during timing-critical parts of the application, processes run at high frequency and hence consume more energy, while during less critical situations the processing elements are slowed down in order to reduce the energy dissipation. Thus, by means of an architecture consisting of several DVS-PEs, a co-synthesis tool can exploit the energy/delay trade-off in order to reduce the energy dissipation. The dynamic voltage selection is carried out by means of a DC/DC voltage converter, a specialised frequency register, and a voltage controlled oscillator (VCO). Examples of such processors are AMD's Athlon 4 [AMD, 2000] and Intel's XScale processor [Intel, 2001].
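The linear and quadratic dependencies mentioned above can be made concrete with the usual first-order CMOS model, in which dynamic power is the product of an effective switched capacitance, the square of the supply voltage, and the clock frequency. The Python sketch below assumes that model; the numerical values and the pairing of voltage with frequency are hypothetical and not taken from any processor cited here.

```python
def dynamic_power(c_eff, v_dd, freq):
    """First-order dynamic power: linear in frequency, quadratic in voltage."""
    return c_eff * v_dd ** 2 * freq


def execution_time(cycles, freq):
    """Time needed to execute a task of `cycles` clock cycles."""
    return cycles / freq


def task_energy(c_eff, v_dd, cycles):
    """Energy = power * time; the frequency cancels out of the product."""
    return c_eff * v_dd ** 2 * cycles


# Hypothetical operating points (assumption: the scaled setting halves
# both supply voltage and clock frequency).
C_EFF = 1e-9          # effective switched capacitance, in farads
CYCLES = 1000         # length of the task, in clock cycles
nominal = {"v_dd": 2.0, "freq": 200e6}
scaled = {"v_dd": 1.0, "freq": 100e6}

energy_ratio = (task_energy(C_EFF, scaled["v_dd"], CYCLES)
                / task_energy(C_EFF, nominal["v_dd"], CYCLES))
time_ratio = (execution_time(CYCLES, scaled["freq"])
              / execution_time(CYCLES, nominal["freq"]))
```

Under these assumptions the scaled setting takes twice as long but uses a quarter of the energy, which is the kind of slack-for-energy exchange that DVS exploits when a deadline leaves room for it.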

5.2.2 A Co-Synthesis Tool

The LOPOCOS (Low Power Co-Synthesis) tool has been developed at the University of Southampton [Schmitz et al., 2002] to tackle the allocation, mapping and scheduling problem described in Section 5.1, by considering energy minimisation objectives. Given a task graph, i.e., an extension of the DFG model (as described in Section 2.2.5) where each computation Vi may be a single computation or a group of them, the aim of the tool is to provide information as to which tasks are more suitable for being implemented in hardware, and which are more suitable for software. The architecture identification performed by LOPOCOS takes into account performance requirements and area constraints, while minimising the energy dissipation at the same time. The inputs to this tool are:

• an embedded system specification given as a task graph (single mode is considered, as opposed to the approach given in [Schmitz et al., 2003]),
• a technology library, such as the one shown in Chapter 1 (Figure 1.2), and
• an initial allocation of components, based on knowledge that the designer has about the final implementation.

As outlined in Chapter 1 (Section 1.2.2), having a validation technique within the design flow of embedded systems is convenient for reducing the overall cost of design, since errors in the design are detected early and fewer iterations are needed in order to obtain a final implementation that meets the specifications. Clearly, there is a mutual benefit if the tool is associated with the DFN model presented throughout this dissertation: on the one hand the LOPOCOS tool would be used as a front-end to the DFN model while, on the other, the DFN model would provide a mechanism to allow the validation of task graphs, prior to their co-synthesis in LOPOCOS. This is achieved by the three algorithms discussed in the next section: DFN2TG, scan data, and scan control.

5.3 Synthesising DFN models

This section introduces three linked algorithms that produce task graphs GS based on a DFN model. The translation is carried out by scanning the implicit dependencies that hulls q ∈ Q have among each other. This scanning is divided into two separate parts: first the algorithm maps each hull of the DFN model onto a task τi ∈ T, and then it searches throughout the control domain SC in order to determine the influence that one hull has over the others. Such influences are captured as edges γj ∈ C in the generated task graph GS.

Algorithm: DFN2TG
Input:  - DFN structure SC ∪ SD
Output: - Task Graph GS(T, C)
01  BINDING:
02    i ← 0
03    for each q ∈ Q do
04      if q◦ ≠ {∅} then
05        τi ← q
06        inc(i)
07  FINDING COMMUNICATION LINKS:
08    j ← 0
09    scan data(SD, j)
10    scan control(SC, j)

Figure 5.2: Algorithm that translates DFN models into Task Graphs

The translation from DFN models to task graphs is carried out by the algorithm shown in Figure 5.2. It can be observed that the algorithm consists of two main parts: binding and finding communication links. In the first part, the algorithm assigns hulls q ∈ Q that have an output in the data domain SD to tasks τ ∈ T, while in the second one, communication links in the task graph are obtained through a scan of both the data and control domains of a DFN structure.

Algorithm: scan data
Input:  - DFN data structure SD, index j
Output: - Communication links C
01  for each q ∈ Q do
02    Set qε ← q
03    for each p ∈ ◦q do
04      for each (q ∈ ◦p) ∧ (q ≠ qε) do
05        if path(qε, q) = {∅} then
06          γj ← (q, qε)
07          inc(j)

Figure 5.3: Finding communication links in SD
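The binding step and the data-domain scan can be sketched in Python as follows. This is only an illustrative reading of the two algorithms, not the implementation used in this work. The `Hull` record, the helper `has_path` and the example net are hypothetical, and `path(qε, q) = {∅}` is interpreted here, as an assumption, as "no path from the task of qε to the task of q exists among the links found so far", which keeps the resulting graph acyclic.

```python
from dataclasses import dataclass, field


@dataclass
class Hull:
    name: str
    reads: set = field(default_factory=set)    # data pre-set of the hull
    writes: set = field(default_factory=set)   # data post-set of the hull


def has_path(edges, src, dst):
    """Depth-first search over the links found so far."""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(b for (a, b) in edges if a == node)
    return False


def dfn2tg_data(hulls):
    """Bind hulls to tasks, then collect data-domain communication links."""
    # Binding: only hulls with a non-empty data post-set become tasks.
    tasks = {}
    for q in hulls:
        if q.writes:
            tasks[q.name] = len(tasks)
    # Data scan: a hull that writes a place read by q_eps precedes q_eps.
    edges = []
    for q_eps in hulls:
        if q_eps.name not in tasks:
            continue
        for p in sorted(q_eps.reads):
            for q in hulls:
                if q is q_eps or p not in q.writes or q.name not in tasks:
                    continue
                src, dst = tasks[q.name], tasks[q_eps.name]
                if (src, dst) not in edges and not has_path(edges, dst, src):
                    edges.append((src, dst))
    return tasks, edges


# Hypothetical net: qc reads the places written by qa and qb;
# qd has no data output and is therefore not bound to a task.
hulls = [
    Hull("qa", reads={"p1"}, writes={"p3"}),
    Hull("qb", reads={"p2"}, writes={"p5"}),
    Hull("qc", reads={"p3", "p5"}, writes={"p6"}),
    Hull("qd", reads={"p6"}),
]
tasks, edges = dfn2tg_data(hulls)
```

The control-domain scan is deliberately left out of this sketch, since it depends on details of the control structure that are not modelled by the simplified `Hull` record.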

In order to find the communication links, both data and control domain structures should be scanned separately, though there has to be a way to consider the interaction between these two domains. Figure 5.3 presents the algorithm that finds communication links in the data domain SD. This algorithm searches through SD, looking for places that are shared between the data pre-set ◦qε of a hull and the data post-set q◦ of another one. In this way, the outcome of each iteration of the loop will be an edge γj of the task graph, where the arrow goes from the task to which hull q is bound to the task that captures qε.

Algorithm: scan control
Input:  - DFN control structure SC, index j
Output: - Communication links C
01  for each q ∈ Q do
02    Set qε ← q
03    for each t ∈ •q do
04      Build a set p′ = {p | p ∈ •t}
05      for each p ∈ p′ do
06        for each t ∈ •p do
07          Build a set q′ = {q | q ∈ t◦, (q, qε) ∉ C}
08          for each q ∈ q′ do
09            if path(qε, q) = {∅} then
10              γj ← (q, qε)
11              inc(j)
12      for each q ∈ ◦t do
13        Build a set p″ = {p | p ∈ ◦q}
14        for each p ∈ p″ do
15          for each q ∈ ◦p do
16            if path(qε, q) = {∅} then
17              γj ← (q, qε)
18              inc(j)

Figure 5.4: Finding communication links in SC

The structure of the algorithm that scans through the control domain searching for communication links is slightly more complicated, since it needs to take into account what happens in the data domain as well. Figure 5.4 shows the algorithm for scanning in SC. The algorithm reaches its second outer loop, i.e., ∀t ∈ •q, in a way that is analogous to the algorithm presented in Figure 5.3. Nevertheless, the scanning through SC requires the algorithm to carry out two further loops, one (line 05) for the conditions in the control domain that would fire such t ∈ •q and another one (line 12) for the consideration of ∠µ(p) that would influence the guard function of the same t ∈ •q.

[Figure 5.5: Synthesising the DFN model of the multiplier. (a) DFN → GS(T, C): the DFN model (places p1 to p6, transitions t1 to t3, hulls q1 to q8) overlaid with tasks τ0 to τ5; (b) Multiplier's Task Graph over τ0 to τ5.]

The three algorithms described in this section have been applied to the multiplier example shown in Chapter 3 (Figure 3.5). This leads to the transformation of such a DFN model into the task graph shown in Figure 5.5(b). In order to obtain such a task graph, Figure 5.5(a) shows the DFN model for the multiplier in grey, and depicts the dependencies in black. Firstly, the binding of q1 onto τ0, q2 onto τ1, etc. takes place, according to the DFN2TG algorithm. Secondly, the scan data algorithm runs, capturing the data dependencies among hulls. For example, q1 modifies the argument of µ(p3) and q4 reads from the argument of µ(p3); therefore, task τ3 can only execute



when task τ0 has been executed. This link is captured in the fourth iteration of the outer loop (line 01), where ◦q = {p3, p5} and, since q1 and q8 comply with the condition in line 04, there are two communication links γj1 = (τ0, τ3) and γj2 = (τ5, τ3). Thirdly, the scan control algorithm completes the task graph GS(T, C) by scanning the DFN model for dependencies that are not captured by the scan data algorithm. For instance, when scanning hull q4 two links (γj1 and γj2) are found in the data domain. However, there is a further link (γj3) that can only be determined by the scan control algorithm, when scanning for dependencies in the control domain. This link corresponds to the set p″ = {p4} built in line 13, and q2 ∈ ◦p4, hence γj3 = (τ1, τ3). Combining γj1, γj2, γj3, and all other γj obtained from processing these algorithms, it is possible to obtain the task graph depicted in Figure 5.5(b).

5.3.1 Example: Co-Synthesis of the Ethernet coprocessor

Having introduced the three algorithms that allow DFN models to be converted into task graphs, this section considers the application of such algorithms to the Ethernet controller analysed in Chapter 4 (Section 4.3.2). Figure 5.6 shows the task graph that is obtained after executing the three algorithms with the complete Ethernet DFN model. Based on the original specifications [ANSI/IEEE, 1991], deadlines θ1 = 0.015, θ2 = 0.03, and θ3 = 0.03 have been introduced to the task graph. There are two different approaches for the execution of the LOPOCOS tool: DVS-enabled or DVS-disabled. The DVS-enabled approach considers the effects of reducing the energy dissipated in a processing element by means of a flag in the specification, while the DVS-disabled approach does not have this feature. Both approaches have been examined with the LOPOCOS tool, obtaining two different mappings that are shown in Figure 5.7¹.
The shaded nodes represent tasks that are mapped to hardware, whereas the other tasks are mapped to software. Since the LOPOCOS tool is geared towards energy-efficient implementations, Table 5.1 summarises the results from Appendix D that are used to guide the designer. Not all solutions reported by the tool are feasible. A solution is said to be feasible when neither the normalised area penalty nor the normalised timing penalty is greater than 1. Both solutions (with and without DVS) presented in Appendix D have been reported as feasible, since there is at least one solution (in each) that complies with the feasibility condition described here.

¹ The complete outcome of the tool is included in Appendix D.
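The feasibility condition stated above, together with the relative energy saving implied by the two total-energy figures of Table 5.1 (0.59 mJ without DVS and 0.26 mJ with DVS), can be written down as a short Python sketch. The function names are illustrative and do not correspond to any interface of the LOPOCOS tool.

```python
def is_feasible(area_penalty, timing_penalty):
    """A solution is feasible when neither normalised penalty exceeds 1."""
    return area_penalty <= 1.0 and timing_penalty <= 1.0


def relative_saving(baseline, improved):
    """Fraction of the baseline that is saved by the improved solution."""
    return 1.0 - improved / baseline


# Total energy dissipation reported in Table 5.1, in mJ.
ENERGY_WITHOUT_DVS = 0.59
ENERGY_WITH_DVS = 0.26

saving = relative_saving(ENERGY_WITHOUT_DVS, ENERGY_WITH_DVS)
```

With these figures the DVS-enabled mapping dissipates a little under half of the energy of the DVS-disabled one.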

[Figure 5.6: Task graph for the Ethernet coprocessor, comprising tasks τ0 to τ11 and deadlines θ1, θ2 and θ3.]

Parameter                  without DVS   with DVS
Total energy dissipation   0.59 mJ       0.26 mJ
Dynamic average power      18.35 mW      7.14 mW
Static power               1.58 mW       1.58 mW
Feasibility                √             √

Table 5.1: Low-power co-synthesis results of the Ethernet's DFN model

Besides the task mapping and scheduling obtained as a result of LOPOCOS, the tool also provides information about the scaled supply voltage that the tasks (executing on DVS-enabled processors) require, the dynamic and static power that is consumed in total, some metrics about the quality of the generated solution, e.g., cost, area violations, and timing penalties, and information about the optimisation process. Solutions that incur, e.g., an area violation can still be moved towards a feasible implementation by using a different set of resource constraints.

[Figure 5.7: Final Task Mapping. (a) without DVS; (b) with DVS. Both panels show tasks τ0 to τ11 with deadlines θ1, θ2 and θ3.]

5.4 Concluding Remarks

By employing an internal design representation (IDR), the methodology for embedded system design turns out to be more flexible. Validating the design prior to the application of the co-synthesis step is very valuable for embedded system designers. Being able to carry out verification in formal terms before synthesising a design is desirable, since it reduces the uncertainty about the correctness of an embedded system design. This may not be the case if no IDR is applied, for example, by direct use of task graphs.

This chapter has presented one potential application for DFN models. A methodology that performs co-synthesis of embedded systems, taking DFN models as inputs, has been presented. The methodology is based on three algorithms and an existing co-synthesis tool. Such algorithms transform DFN models into task graph representations, which are commonly used by this type of tool. The applicability of such a methodology has been illustrated in Section 5.3.1, where the Ethernet coprocessor example provided in Chapter 4 (Section 4.3.2) has been used as a benchmark for the three algorithms. Since DFN models are transformed into task graphs, which are the original inputs to the co-synthesis tool used, the quality of the generated solution is not compromised.

Off-the-shelf Ethernet coprocessors [Altima, 2001] consume approximately 280 mW in total, and have a DMA unit that accounts for about a tenth of that power. By inspection of Table 5.1, the Ethernet coprocessor implementation through the DFN model reveals an average total power of 20 mW (or even less than 9 mW if DVS-enabled elements are used) in the DMA unit, which is only about two thirds of the roughly 28 mW attributed above to the DMA unit of off-the-shelf devices. This improvement is reasonable, taking into account that the tendencies depicted in Figure 1.3 are severely affected by whether or not the design methodology used incorporates an internal design mechanism.

Chapter 6 Conclusions

Being able to keep pace with the ever-increasing demand for more complex functionality is certainly a very challenging goal that current researchers and designers are facing. The next generation of embedded systems will undoubtedly comprise functional specifications that go beyond our imagination, at reduced cost and time-to-market. Thus, relying only on the advancements that the silicon technology industry is able to provide would not be adequate to address such a challenge. Indeed, the ITRS [2002] affirms that one source of overall improvements, in terms of cost, comes from the methodologies used along the design flow, rather than from more powerful processing engines devoted to implementing the design.

This dissertation has tackled one of the issues that influence the design complexity: the interrelation between control and data flow. Embedded systems have separate but related control and data flows. Unlike most approaches in the realm of embedded systems modelling based on Petri Nets, which extend the classical weighted, directed, bipartite graph, Chapter 3 has presented the syntax and semantics of a new model called Dual Flow Net (DFN), which is based on a tripartite graph. Specifications consisting of both control and data flow were efficiently converted into an internal design representation (IDR) that kept a tight link between control and data domains. Such an innovation in the structure of an IDR brought along the necessity to use a different representation for capturing the state of the system in a behavioural analysis of the model. Thus, a marking function in the domain of the complex numbers has been defined, in order to cope with the duality of control and data information associated with a particular state. One of the main features of the DFN model is that the periodicity inherent to the analysis based on complex numbers using polar notation has a desirable effect in the representation, due to its similarity to the way real hardware works. Therefore, an aspect of the model that has been underlined in Chapter 3 is the considerable significance of this intrinsic relation between the control/data-flow interaction and the periodicity of the underlying representation.

Chapter 4 has addressed the validation of the DFN model, bringing forth the development of a library entirely constructed in the framework of a formal verification tool. This library is part of a verification engine which uses the Cadence SMV tool in order to reason about CTL and LTL properties. Various DFN behavioural properties have been described and generalised, such as reachability, safety and liveness. Furthermore, reachability analysis has been carried out in two different ways: with (Section 4.2.2) and without (Section 3.8) using temporal logics. Since the aim of the thesis is to target embedded systems, whose design complexity increases vastly as their functionality grows, it is desirable to have a verification methodology which is capable of scaling up towards more sophisticated designs. Thus, Chapter 4 has also presented a modular approach to the model checking problem, aiming at the validation of models that are comparable to real-life specifications in terms of complexity. In particular, the Ethernet coprocessor is used to illustrate the improvement that can be obtained by such an approach: a reduction of about 25 times in verification time.

In Chapter 5, the last step in the embedded system design flow has been implemented with the aid of existing co-synthesis tools. The flexibility gained by utilising an IDR as the backbone of the design flow does not affect by any means the quality of the solution generated by the co-synthesis tool, as shown in the benchmark example used to demonstrate the applicability of such an approach.
Indeed, Chapter 5 has concluded that the co-synthesis tool also benefits from this approach, in the sense that the DFN model allows the verification of embedded systems prior to their implementation phase.

Overall, this thesis has focused on the modelling and verification of embedded systems, where particular emphasis has been put on the interrelation between control and data flows. The design representation introduced and analysed throughout the dissertation has been shown to be capable of modelling the intrinsic features of an embedded system design, whereas the verification side also shows promising results, mainly when a modular approach is taken into account. Furthermore, this thesis has also investigated the applicability of a standard co-synthesis tool in order to complete the design process.

6.1 Future Work

In particular, this thesis has focused on improving the design of embedded systems, where the control/data-flow interaction has an impact on the design complexity. However, the principles introduced throughout this dissertation can be applied to many areas of design where the control/data-flow relation is likely to occur. This section explores some further areas of research to which the DFN model can be applied.

6.1.1 Infinite State Systems

Of the two formal verification methods described in Chapter 2 (Section 2.5), this dissertation has utilised a method based on state exploration, more precisely model checking. However, in order to support recursion and other types of structure leading to infinite state systems [Leuschel and Massart, 1999], deductive methods may be applied. Thus, Varea et al. [2002a] have investigated this issue and concluded that it is possible to realise a combined verification, where SMV can be used for finite state embedded systems and logic programming can be used for large (and infinite) state spaces. It would be interesting to develop this approach further and implement a logic programming based model checker, which would complement the existing verification methodology based on SMV and would also be able to cope with models of an infinite nature.

6.1.2 Energy-efficient Embedded Systems

The co-synthesis tool used in this thesis (Chapter 5) aims at optimising towards minimal energy dissipation. However, the tool does not take into account certain parameters which are intrinsic to DFN models and would lead to a reduction in the energy dissipated in the final implementation. It would be interesting to investigate the incorporation of DFN models as the native representation used by the co-synthesis tool, so that each loop in the process, i.e., scheduling, mapping, and allocation, could benefit from a tighter control/data-flow relationship. Performing a comparative analysis of the solutions generated with and without including this factor can provide the basis for a further investigation into this aspect of the design.

Appendix A Transformations

Chapter 3 (Section 3.8) makes use of the Ω-transformation, and its inverse, which are defined in this appendix. Given a vector $\vec{v}$, it is possible to obtain a matrix M, and vice versa, by means of the following two theorems.

THEOREM 1 (vector2matrix) Let $\vec{v}$ be an n-dimensional vector and M be an n × n matrix. Then, there exists a linear transformation Ω that transforms $\vec{v}$ into M, such that each component $v_i$ of $\vec{v}$ is allocated in $[m_{ii}]$ of the matrix M. That is:

\[ \Omega : \Theta^{(n \times 1)} \mapsto \Theta^{(n \times n)} \]

where $\Theta^{(n \times m)}$ denotes the Euclidean space of dimension n × m.

Proof: Let $\vec{\varepsilon}_i$ be an n-dimensional vector, where entry i is set to 1 and the rest are 0. Then:

\[ \vec{\varepsilon}_i^{\,T} \cdot \vec{v} = v_i \tag{A.1} \]

It is also known that:

\[ v_i \cdot \vec{\varepsilon}_i \cdot \vec{\varepsilon}_i^{\,T} = \begin{pmatrix} 0 & \cdots & 0 \\ \vdots & v_i & \vdots \\ 0 & \cdots & 0 \end{pmatrix} \tag{A.2} \]

Since all $\vec{\varepsilon}_i$, ∀i, are orthogonal w.r.t. each other, the superposition principle can be applied. Thus, the summation of the matrices in (A.2), ∀i, leads to a matrix with elements on its diagonal:

\[ \sum_{i=1}^{n} \begin{pmatrix} 0 & \cdots & 0 \\ \vdots & v_i & \vdots \\ 0 & \cdots & 0 \end{pmatrix} = \begin{pmatrix} v_1 & \cdots & 0 \\ \vdots & v_i & \vdots \\ 0 & \cdots & v_n \end{pmatrix} = M \]

which is, by definition, the M matrix. Therefore, by combining Eqs. (A.1) and (A.2), it is possible to infer:

\[ \sum_{i=1}^{n} \vec{\varepsilon}_i^{\,T} \cdot \vec{v} \cdot \vec{\varepsilon}_i \cdot \vec{\varepsilon}_i^{\,T} = M \]

Of course, the above equation represents the following transformation:

\[ M = \Omega(\vec{v}) \]

Q.E.D.
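As a small worked instance of Theorem 1, the case n = 2 can be written out in full. The expansion below only instantiates the construction of the proof; it adds no new assumption beyond choosing n = 2.

```latex
\[
\Omega\begin{pmatrix} v_1 \\ v_2 \end{pmatrix}
= v_1 \begin{pmatrix} 1 \\ 0 \end{pmatrix}\begin{pmatrix} 1 & 0 \end{pmatrix}
+ v_2 \begin{pmatrix} 0 \\ 1 \end{pmatrix}\begin{pmatrix} 0 & 1 \end{pmatrix}
= \begin{pmatrix} v_1 & 0 \\ 0 & 0 \end{pmatrix}
+ \begin{pmatrix} 0 & 0 \\ 0 & v_2 \end{pmatrix}
= \begin{pmatrix} v_1 & 0 \\ 0 & v_2 \end{pmatrix}
\]
```

Each term places one component of the vector on the diagonal, and the sum assembles the full diagonal matrix.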

THEOREM 2 (matrix2vector) Let $\vec{v}$ be an n-dimensional vector and M be an n × n matrix. Then, there exists a linear transformation $\Omega^{-1}$ that transforms M into $\vec{v}$, such that each component $[m_{ii}]$ on the diagonal of M is captured in $v_i$ of the vector $\vec{v}$. That is:

\[ \Omega^{-1} : \Theta^{(n \times n)} \mapsto \Theta^{(n \times 1)} \]

where $\Theta^{(n \times m)}$ denotes the Euclidean space of dimension n × m.

Proof: Since M is diagonal, it can be easily proven that:

\[ M \cdot \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} = \vec{v} \tag{A.3} \]

Moreover, the $(1, 1, \cdots, 1)^T$ vector can also be constructed by superposition, in the following way:

\[ \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} = \sum_{i=1}^{n} \vec{\varepsilon}_i \tag{A.4} \]

Thus, by combining Eqs. (A.3) and (A.4), it is concluded that:

\[ M \cdot \sum_{i=1}^{n} \vec{\varepsilon}_i = \vec{v} \]

which has the following form:

\[ \vec{v} = \Omega^{-1}(M) \]

Q.E.D.

Appendix B DFN Library

The four-module library for the verification of DFN models, described in Chapter 4, has been implemented using the Cadence SMV tool [Cadence, 2001]. This appendix shows such an implementation. To begin with, several defined types are needed in the methodology, which are shown in Figure B.1. From the Cadence SMV point of view, typedef is equivalent to a module, but without arguments. Each of these types relies on a constant which is passed in from the program that invokes the library, e.g., in order to define PLACES the user needs to provide NN.

typedef PLACES 1..(NN);
typedef TRANSITIONS 1..(MM);
typedef HULLS 1..(HH);
typedef AVAILABLE SOT;
typedef CTRL 0..(Kc-1);
typedef DATA -(Kd/2)..(Kd/2-1);
typedef DOUBLE -(Kd)..(Kd-1);

Figure B.1: Type definitions

Figure B.2 shows two macro-definitions, which are used in the DFN library. The first one is related to the definition of abstract signals, i.e., auxiliary variables which are part of the specification of the proof but cannot actually be verified. The second one, instead, is related to the set of assignments that are necessary in order to define the net in the sense of Definition 8. Thus, the initial marking in both control and data domains, as well as the structure S, are defined under this macro, which is to be invoked at the end of any DFN program.

#define asignal(n) {abstract n : 1..(Kd/2-1); next(n):=n;}
#define END \
  forall (i in PLACES) init(p[i].modulus) := ictrl(i);\
  forall (i in PLACES) init(p[i].phase) := idata(i);\
  forall (i in TRANSITIONS)\
    t[i] : transition(p,ctrl[0][i],ctrl[1][i],grd(i));\
  forall (i in HULLS)\
    q[i] : hull(p,data[0][i],bin0(i),data[1][i],bin1(i),\
                allow(i),offset(i),sch,deadlock);\
  } /* End Of File */

Figure B.2: Definitions

The three primary components of this library are those modules that capture the functionality of places, transitions and hulls. Figure B.3 presents the data type used for the definition of a place. A place p[i], defined as an instance of this module, is a structure composed of two parts, its modulus p[i].modulus and its argument p[i].phase.

typedef place struct{
  modulus : CTRL;
  phase : DATA;
}

Figure B.3: Place

In Figure B.4, the functionality of a transition is captured by means of four arrays of length |P| and a boolean variable g. The boolean variable captures the result from the guard function, implemented in Figure B.5, in order to influence the result of the enabling condition. The four arrays are: p, preset, postset, and pp, where p contains the marking of the entire set P before the firing of the transition t, preset and postset represent the sets •t and t• respectively, and the array pp contains information about the control part of the entire set P after the firing of t. As said in Chapter 4, this module is composed of two parts, the enabling condition and the firing operation.

In order to link the argument ∠µ(p) of a place p ∈ P with the firing of some transition t ∈ T, the module presented in Figure B.5 implements the guard function G(t) defined in Definition 9. As mentioned in Chapter 3, the guard function maps transitions to the set of conditional comparators ♯, which is captured in the switch operation. Thus, the output of the module, i.e., the boolean variable v, is assigned the evaluation of an atomic proposition AP that considers ∠µ(p), the symbol sgn from the set ♯, and a constant val. In other words, this module implements Definition 7.
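The enabling condition and the guard described above can be paraphrased in Python as follows. This is a hedged sketch rather than a transcription of the SMV modules: the comparator table covers only the symbols whose meaning is unambiguous, and the `fire` function assumes the standard Petri-net firing rule (remove the pre-set weights, add the post-set weights) for the control part of the marking.

```python
import operator

# Subset of the comparator symbols with an unambiguous reading.
COMPARATORS = {
    "EQ": operator.eq, "NEQ": operator.ne,
    "GT": operator.gt, "LT": operator.lt,
    "GEQ": operator.ge, "LEQ": operator.le,
}


def guard(phase, sgn, val):
    """Atomic proposition comparing the argument of a place with a constant."""
    return COMPARATORS[sgn](phase, val)


def enabled(modulus, preset, g):
    """Every input place holds at least its arc weight, and the guard holds."""
    return g and all(modulus[i] >= w for i, w in enumerate(preset) if w > 0)


def fire(modulus, preset, postset):
    """Assumed standard firing rule applied to the control marking."""
    return [m - pre + post for m, pre, post in zip(modulus, preset, postset)]


# Hypothetical three-place example.
modulus = [1, 0, 2]       # control marking before firing
preset = [1, 0, 1]        # weights of the arcs from the pre-set
postset = [0, 1, 0]       # weights of the arcs to the post-set
g = guard(phase=5, sgn="GT", val=3)
can_fire = enabled(modulus, preset, g)
after = fire(modulus, preset, postset) if can_fire else modulus
```

Keeping the guard as a separate boolean input mirrors the way the transition module receives g as an argument, so the data domain influences the control domain only through that single value.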

module transition(p,preset,postset,g){
  input preset : array PLACES of CTRL;
  input postset : array PLACES of CTRL;
  input g : boolean;
  pp : array PLACES of CTRL;
  en : boolean;
  en := &[ (preset[i] > 0) -> (p[i].modulus >= preset[i]) : i = 1..(NN)] & g;
  for(i=1;i val; < val; >= val; > preset[i]) : 0; default : 0; } mod Kd; for(i=1;i 0) ? (((+sum) + h) 0) ? (((+sum) + h) >> postset[i]) : 0; default : 0; }; else pp[i] := p[i].phase; }

Figure B.6: Hull

Figure B.7 illustrates the implementation of the binary signal presented in Chapter 2 (Figure 2.8, page 24). The macro definition signal is composed of two transitions and four places, as opposed to the one introduced in Chapter 2 where |T| = 4 and |P| = 6, because transitions t1 and t4 as well as places p1 and p6 are assumed to be the environment. This implementation has been carried out using a notation that is a consequence of the /*NET*/ section in the DFN code (cf. Appendix C). Thus, an arc in F is annotated as:

io, tj, pi : w    or    io, qj, pi : w


where io ∈ 𝔹 indicates whether it is an input or output arc, tj (or qj) refers to an active element of the net, pi refers to a passive element, and w ∈ W is the weight of the arc.

#define signal(tr,tf,pr,pf,p1,p0) \
  (0,tr,p0):1; (0,tr,pr):1; (1,tr,p1):1;\
  (0,tf,p1):1; (0,tf,pf):1; (1,tf,p0):1;

Figure B.7: Binary signal
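To make the arc notation concrete, the sketch below expands the signal macro into a list of arcs in Python and groups them by transition. It is an illustration only: the function names are hypothetical, and reading io = 0 as an arc from a place into the transition and io = 1 as an arc from the transition to a place is an assumption inferred from the macro (the rising transition consumes p0 and pr and produces p1), not something the text states explicitly.

```python
def signal(tr, tf, pr, pf, p1, p0):
    """Return the six arcs of the binary signal as (io, transition, place, weight)."""
    return [
        (0, tr, p0, 1), (0, tr, pr, 1), (1, tr, p1, 1),
        (0, tf, p1, 1), (0, tf, pf, 1), (1, tf, p0, 1),
    ]


def preset_postset(arcs, transition):
    """Split the arcs of one transition into preset and postset weight maps.

    Assumption: io == 0 marks an input arc, io == 1 an output arc.
    """
    pre = {p: w for io, t, p, w in arcs if t == transition and io == 0}
    post = {p: w for io, t, p, w in arcs if t == transition and io == 1}
    return pre, post


arcs = signal("tr", "tf", "pr", "pf", "p1", "p0")
print(preset_postset(arcs, "tr"))
print(preset_postset(arcs, "tf"))
```

Under this reading, the two transitions toggle the token between p0 and p1, with pr and pf acting as the triggers supplied by the environment.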

B.1

main( )

The main body of the library comprises a series of SMV code lines that declare the syntax to be used. Thus, Figure B.8 shows how the DFN model is declared by means of three arrays of module instantiations (p, t, and q) and two sets of symbols (symbol and sentido). Another declaration, which takes place in the scope of the program that calls this library, is the constant set SOT. This set provides an additional permission for a transition to fire (only transitions listed in SOT can be enabled), hence providing the basis for a modular approach.

module main(){
  p : array PLACES of place;
  t : array TRANSITIONS;
  q : array HULLS;
  abstract symbol : {NE,EQ,NEQ,GT,LT,GEQ,LEQ};
  abstract sentido : {NONE,IZQ,DER};

Figure B.8: Declarations

Certain conditions guide the verification process. Figure B.9 shows the implementation of two such conditions: enabled and deadlock. The enabled condition is defined in 𝔹^n, where each element enabled[i] is set to 1 if the transition ti is enabled and included in SOT.

enabled : array TRANSITIONS of boolean;
forall(i in TRANSITIONS)
  enabled[i] := (i in SOT) ? t[i].en : 0;
deadlock : boolean;
deadlock := ~(|enabled);

Figure B.9: Conditions

Chapter 4 (Section 4.2) has also outlined the main scheduler for this library. This scheduler is based on nondeterminism, and controls the firing of transitions such that, at each stage of the model checking execution trace, all enabled transitions are available. Figure B.10 shows the implementation of such a scheduler in the DFN library.

sch : AVAILABLE;
sch := { i : i = 1..(MM), enabled[i] };

Figure B.10: Scheduler

In Figure B.11, all assignments that form the core of the library are presented. The first forall statement indicates that the places from the control domain point of view, i.e., |µ(p)|, are assigned the value that results from instantiating a transition module indexed by the nondeterministic scheduler. The following three forall statements are the counterpart in the data domain.

act : array PLACES of boolean;
qq : array PLACES of array HULLS of DATA;
forall (i in PLACES)
  next(p[i].modulus) := t[sch].pp[i];
forall (i in PLACES)
  forall (j in HULLS)
    qq[i][j] := (q[j].en) ? q[j].pp[i] : 0;
forall (i in PLACES)
  act[i] := |[((q[j].postset[i] ~= 0) & (q[j].en)) : j = 1..(HH)];
forall (i in PLACES)
  next(p[i].phase) := (act[i]) ? +qq[i] : p[i].phase;

Figure B.11: Assignments

Finally, the structure of the net is captured by a three-dimensional data structure, i.e., an array x of arrays y, where each element of the array y is an array z of either control or data elements. This is shown in Figure B.12.

ctrl : array 0..1 of array TRANSITIONS of array PLACES of CTRL;
data : array 0..1 of array HULLS of array PLACES of DATA;

Figure B.12: Arrays containing the DFN structure
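The enabled and deadlock conditions (Figure B.9) and the nondeterministic scheduler (Figure B.10) can be paraphrased in a few lines of Python. This is only a sketch under stated assumptions: the function names are hypothetical, the per-transition enabling flags t[i].en are passed in as a plain dictionary, and nondeterminism is represented by returning the whole set of candidate transitions rather than picking one, which is roughly what a model checker explores.

```python
def conditions(en, sot):
    """Compute the enabled vector and the deadlock flag.

    en  : dict mapping transition index -> bool (stands in for t[i].en)
    sot : set of transition indices allowed to fire in this module
    """
    enabled = {i: (i in sot and flag) for i, flag in en.items()}
    deadlock = not any(enabled.values())
    return enabled, deadlock


def scheduler_choices(enabled):
    """All values the scheduler variable may take at this step."""
    return sorted(i for i, flag in enabled.items() if flag)


en = {1: True, 2: True, 3: False}
enabled, deadlock = conditions(en, sot={1, 3})
print(enabled, deadlock, scheduler_choices(enabled))
```

Transition 2 is enabled by its own condition but is filtered out because it is not in SOT, which is how restricting SOT confines the verification to one part of the net.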

Appendix C

DFN Examples

This appendix presents the DFN code for the examples presented in this dissertation. The first example (Appendix C.1) has been introduced in Section 3.7 and verified in Section 4.2.1. The next example (Appendix C.2) corresponds to the VME-bus controller described in Section 4.3.1. The following one (Appendix C.3) is the Ethernet coprocessor verified in Section 4.3.2. Finally, the last example shown in this appendix (Appendix C.4) consists of the modular approach to the Ethernet coprocessor analysed in Section 4.5.

C.1

Multiplier

#define Kc 4   /* capacity in the ctrl domain */
#define Kd 64  /* capacity in the data domain */
#define NN 7   /* |P| */
#define MM 5   /* |T| */
#define HH 6   /* |Q| */

#include "../lib/dfn.smv"
abstract a : 0..7; next(a):=a;
abstract b : 0..7; next(b):=b;
/* INITIAL MARK */
#define ictrl(i) {( (i=1)|(i=2) )}


#define idata(i) {(i=1)?a:((i=2)?b:((i=6)?b:10))}
/* GUARD FUNCTION and OFFSET*/
g1 : guard(p[4],GT,0);
g2 : guard(p[4],EQ,0);
#define grd(i) {(i=2)?g2:((i=3)?g1:1)}
#define offset(i) {(i=3)?(-1):(0)}
#define allow(i) {(i=3|i=4|i=5)?{i}:{1}}
/* NET */
for (w=0;w A((EX p[13].phase = dat) U Dm)); Dm -> A((!EX p[13].phase = dat) U Dp));

END

C.3

Ethernet coprocessor (flat representation)

/*****************************************/
/******** The Ethernet Controller ********/
/*****************************************/
/*                                       */
/*          FLAT REPRESENTATION          */
/*                                       */
/*****************************************/
/*      (C) 2002 by Mauricio Varea       */
/*****************************************/


#define Kc 2   /* capacity in the ctrl domain */
#define Kd 16  /* capacity in the data domain */
#define NN 48  /* |P| */
#define MM 38  /* |T| */
#define HH 12  /* |Q| */
#define SOT {1,2,3,4,5,6,7,8,9,10,11,22,23,26,27,28,29,34,37,38}  /* {t in T} */

#include "../lib/dfn.smv"
abstract pc : -(Kd/4)..(Kd/4-1); next(pc):=pc;
asignal(dat)
asignal(addr)
asignal(any)
abstract HIGH : array 3..0 of boolean; HIGH := (any & [1,1,0,0]) >> 1;
abstract LOW : array 3..0 of boolean; LOW:= (any & [0,0,1,1]);
/* INITIAL MARK */
#define ictrl(i) {switch(i){ 1:1; 10:{0,1}; 15:1; 19:1; 20:1; 26:1; 29:1; 35:1; 39:{0,1}; default:0;}}
/* initial marking + all signals to 0 + tx_start */
#define idata(i) {switch(i){9:addr; 12:pc; 14:dat; default: 0;}}
#define grd(i) {1}
#define offset(i) {switch(i) {7:2; 9:0; default:0;}}
#define allow(i) {switch(i){ 1:1; 2:{2,23}; 3:{3,7,15,34,35}; 4:{4,8,16}; 5:{5,9,17}; 6:{6,12}; 7:{7,8,16}; 8:8; 9:9; default:0;}}
#define bin0(i) {switch(i){5:DER; 6:IZQ; 8:DER; default:NONE;}}
#define bin1(i) {(i=6) ? DER : NONE}



/* NET */
for (w=0;w F (p[12].phase = p[9].phase));
successive1: assert G((p[10].modulus = 1) -> F (p[13].phase = p[12].phase));
successive2: assert G((p[5].modulus = 1) -> F (p[13].phase = pc+2));
successive3: assert G((p[5].modulus = 1) -> F (p[13].phase = pc+4));
send_high: assert G((p[33].modulus = 1) -> F (p[34].phase = HIGH));
send_low: assert G((p[4].modulus = 1) -> F (p[34].phase = LOW));
tx_fail: assert G((p[36].modulus = 1) -> F (sch = 23));

term_start: assert G((p[10].modulus = 1) -> F (p[3].modulus = 1));
term_dest1: assert G((p[5].modulus = 1) -> F (p[4].modulus = 1));
term_dest2: assert G((p[4].modulus = 1) -> F (p[5].modulus = 1));
term_len_ctrl: assert G((p[5].modulus = 1) -> F (p[43].modulus = 1));
term_len_data: assert G((p[5].modulus = 1) -> F (p[32].phase = p[14].phase));
len_intermediate: assert G((p[5].modulus = 1) -> F (p[44].phase = HIGH));
len_term1: assert G((p[44].modulus = 1 & p[44].phase ~= 0) -> F (p[6].modulus = 1));
len_term2: assert G((p[44].modulus = 1 & p[44].phase = 0) -> F (p[8].modulus = 1));
END
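Most of the properties above share the response pattern G(a -> F b): whenever a holds, b must hold at that point or later. The Python sketch below checks this pattern on a single finite trace, purely to illustrate how the formulae read. It is not how SMV verifies them (the model checker reasons symbolically over all infinite executions), and the function name and the dictionary-based state encoding are hypothetical.

```python
def holds_response(trace, trigger, response):
    """Check G(trigger -> F response) on one finite trace.

    A trigger is discharged by a response in the same state or any later
    state; an undischarged trigger at the end of the trace is a violation.
    """
    pending = False
    for state in trace:
        if trigger(state):
            pending = True
        if response(state):
            pending = False
    return not pending


# Shape of term_start: G((p[10].modulus = 1) -> F (p[3].modulus = 1)).
good = [{"p10": 1, "p3": 0}, {"p10": 0, "p3": 0}, {"p10": 0, "p3": 1}]
bad = [{"p10": 0, "p3": 1}, {"p10": 1, "p3": 0}]
trigger = lambda s: s["p10"] == 1
response = lambda s: s["p3"] == 1
print(holds_response(good, trigger, response))
print(holds_response(bad, trigger, response))
```

In the second trace the response occurs before the trigger and never again, so the pattern is violated on that trace.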

C.4

Ethernet coprocessor (modular verification)

/*****************************************/
/******** The Ethernet Controller ********/
/*****************************************/
/*                                       */
/*             State: START              */
/*                                       */
/*****************************************/
/*      (C) 2002 by Mauricio Varea       */
/*****************************************/
#define Kc 2   /* capacity in the ctrl domain */
#define Kd 16  /* capacity in the data domain */
#define NN 48  /* |P| */
#define MM 37  /* |T| */
#define HH 12  /* |Q| */
#define SOT {1,2,3,26,27,28,29,37}  /* {t in T} */



#include "../lib/dfn.smv"
asignal(addr)
/* INITIAL MARK */
#define ictrl(i) {switch(i){1:1; 10:{0,1}; 15:1; 20:1; 26:1; 29:1; 35:1; default:0;}}
/* initial marking + all signals to 0 + tx_start */
#define idata(i) {switch(i){9:addr; default: 0;}}
#define grd(i) {1}
#define offset(i) {0}
#define allow(i) {switch(i){1:1; 2:{2,23}; 3:{3,7,15}; default:0;}}
#define bin0(i) {NONE}
#define bin1(i) {NONE}
/* NET */
for (w=0;w F (p[12].phase = p[9].phase);
successive1: assert (p[10].modulus = 1) -> F (p[13].phase = p[12].phase);
term: assert (p[10].modulus = 1) -> F (p[3].modulus = 1);
END

/* Q1 */ /* Q2 */ /* Q3 */



/*****************************************/
/******** The Ethernet Controller ********/
/*****************************************/
/*                                       */
/*             State: DEST1              */
/*                                       */
/*****************************************/
/*      (C) 2002 by Mauricio Varea       */
/*****************************************/
#define Kc 2
#define Kd 16



/* |P| */ /* |T| */ /* |Q| */

#define SOT {4,5}  /* {t in T} */

#include "../lib/dfn.smv"
asignal(any)
abstract HIGH : array 3..0 of boolean; HIGH:=(any & [1,1,0,0]) >> 2;
/* INITIAL MARK */
#define ictrl(i) {switch(i){3:1; 19:1; 35:1; default: 0;}}
#define idata(i) {switch(i){14:any; default: 0;}}
#define grd(i) {1}
#define offset(i) {0}
#define allow(i) {switch(i){4:{4,8,16}; 5:{5,9,17}; default:0;}}
#define bin0(i) { (i=5) ? DER : NONE }
#define bin1(i) { NONE }
/* NET */
for (w=0;w F p[8].modulus = 1);
END

/* T12 */ /* T13 */ /* T14 */

/* Q6 */ /* Q10 */ /* Q11 */



/*****************************************/
/******** The Ethernet Controller ********/
/*****************************************/
/*                                       */
/*             State: DATA2              */
/*                                       */
/*****************************************/
/*      (C) 2002 by Mauricio Varea       */
/*****************************************/
#define Kc 2
#define Kd 16



/* |P| */ /* |T| */ /* |Q| */

#define SOT {13,14,15,16,17,24,25,30,31}  /* {t in T} */

#include "../lib/dfn.smv"
asignal(dat)
asignal(count)
asignal(len)
/* INITIAL MARK */
#define ictrl(i) {switch(i){7:1; 19:1; 35:1; default: 0;}}
#define idata(i) {switch(i){14:dat; 44:len; 45:count; default: 0;}}
/* GUARD FUNCTION and OFFSET*/
g3 : guard(p[46],LT,0);
g4 : guard(p[46],GEQ,0);
#define grd(i) {switch(i) {13:g3; 14:g4; default:1;}}
#define offset(i) {switch(i) {7:2; 11:1; default:0;}}
#define allow(i) {switch(i){ 3:{3,7,15}; 4:{4,8,16}; 5:{5,9,17}; 7:{7,8,16}; 10:{12,17}; default: 0;}}
#define bin0(i) { (i=5) ? DER : NONE }
#define bin1(i) { NONE }



/* NET */
for (w=0;w p[7].modulus = 1);
term_2: assert F ((count > len) -> p[8].modulus = 1);
END

/*****************************************/
/******** The Ethernet Controller ********/
/*****************************************/
/*                                       */
/*              State: END               */
/*                                       */
/*****************************************/
/*      (C) 2002 by Mauricio Varea       */
/*****************************************/
#define Kc 2
#define Kd 16


/* T13 */ /* T14 */ /* T15 */ /* T16 */ /* T17 */ /* rd */ /* Brd */

/* Q3 */ /* Q4 */ /* Q5 */ /* Q7 */ /* Q10 */


#define NN 48  /* |P| */
#define MM 35  /* |T| */
#define HH 12  /* |Q| */
#define SOT {18,19,26,27,28,29,35}  /* {t in T} */

#include "../lib/dfn.smv"
asignal(bw)
/* INITIAL MARK */
#define ictrl(i) {switch(i){8:1; 21:1; 27:1; 35:1; default:0;}}
#define idata(i) {(i=13)?bw:0}
#define grd(i) {1}
#define offset(i) {0}
/* q[12] has been included in this offset */
#define allow(i) {switch(i){12:19; default:0;}}
#define bin0(i) {NONE}
#define bin1(i) {NONE}
/* NET */
for (w=0;w
