Runtime Conformance Checking of Objects Using Alloy
by
Michelle Love Crane
A thesis submitted to the School of Computing in conformity with the requirements for the degree of Master of Science
Queen’s University Kingston, Ontario, Canada August 2003
Copyright © Michelle Love Crane, 2003

Abstract

Object model specifications are an important part of most object-oriented software development methodologies, where they play a central role during the specification and design phases. However, their usefulness is much more limited during the implementation phase. We demonstrate how confidence in source code can be increased by using runtime conformance checking to analyze the code with respect to an object model specification. More precisely, we use the Alloy Analyzer, developed at MIT, to determine automatically whether the runtime state of a program at certain user-specified locations conforms to a given specification. The design, implementation and analysis of Embee, a prototype runtime conformance checker for Java programs against Alloy object models, are presented.
Acknowledgments

To my mother, for your strength and sense of humour, and for proving that anything is possible. To Pat and Jim, for adopting me. To Valerie, for all of your support. To Juergen, for your patience and inspiration, and most especially, for your enthusiasm. To my friends, thanks for all the laughs, and all the lunches. I would also like to acknowledge the invaluable assistance of Ilya Shlyakhter and the rest of the Alloy folks at MIT.
Contents

1 Introduction
  1.1 Motivation
  1.2 Problem
  1.3 Objective
    1.3.1 Hypothesis
  1.4 Contributions
  1.5 Organization of Thesis

2 Background
  2.1 Conformance Checking
    2.1.1 Multiple Definitions
    2.1.2 Static vs. Runtime (Dynamic) Conformance Checking
    2.1.3 Our Definitions
  2.2 Related Work
    2.2.1 Static Conformance Checking
    2.2.2 Runtime (Dynamic) Conformance Checking

3 Alloy
  3.1 The Alloy Language
    3.1.1 Atoms and Relations
    3.1.2 Expressions, Operators and Quantifiers
  3.2 Alloy Specifications
    3.2.1 Descriptive Elements
    3.2.2 Analysis Elements
    3.2.3 Example
  3.3 Alloy Analysis
  3.4 Alloy Analyzer
    3.4.1 Using the Analyzer
    3.4.2 Example
  3.5 Analyzing the Analysis
    3.5.1 Decidability and Completeness
    3.5.2 The Boolean Array
    3.5.3 The Solution Space
    3.5.4 Small Scope Hypothesis
  3.6 Alloy Analyzer Source Code

4 Embee: User Perspective
  4.1 Preparation
  4.2 Embee Phases
    4.2.1 Phase 1: High-Level Static Mapping
    4.2.2 Phase 2: Dynamic Object Collection
    4.2.3 Phase 3: Conformance Checking
  4.3 Miscellaneous Capabilities
    4.3.1 Running Embee
    4.3.2 Context of Constraints and Breakpoints
    4.3.3 Exploring Nonconformance
    4.3.4 Higher-Arity Relations
    4.3.5 Mappings in the Configuration File
    4.3.6 Automatic Edit Instance Information
    4.3.7 Implementation Errors

5 Embee: Implementation and Analysis
  5.1 Implementation
  5.2 Phase 1
  5.3 Phase 2
    5.3.1 Objects at Breakpoints
  5.4 Phase 3
    5.4.1 Step 1: Hierarchy of Dynamic Mappings
    5.4.2 Step 2: Parsing
    5.4.3 Step 3: The Boolean Array
    5.4.4 Step 4: Checking Conformance with the Analyzer
  5.5 Code Analysis with Embee
    5.5.1 Testing vs. Continuous Monitoring
    5.5.2 Quality of the Analysis
    5.5.3 Decidability and Completeness
  5.6 Capabilities and Limitations of Embee
    5.6.1 Analysis Capabilities
    5.6.2 Miscellaneous Capabilities
    5.6.3 Implementation Limitations
    5.6.4 Fundamental Limitations
  5.7 Comparison with Related Work
  5.8 Complexity and Performance
    5.8.1 Definition of Terms
    5.8.2 Complexity Analysis

6 Summary and Conclusions
  6.1 Summary
  6.2 Future Work
  6.3 Conclusion

Bibliography

A Alloy Analyzer
  A.1 Diagrams
  A.2 Documentation

B Example 1 – List
  B.1 Input to Embee
    B.1.1 Alloy Specification
    B.1.2 Java Code
    B.1.3 Configuration File
  B.2 Output Files
    B.2.1 Static Map File
    B.2.2 Summary File
    B.2.3 Dump Files
  B.3 Screen Output
  B.4 Exploring Non-Conformance
    B.4.1 Lists Sharing a Node
    B.4.2 Cycle in a List

C Example 2 – Graph
  C.1 Input to Embee
    C.1.1 Alloy Specification
    C.1.2 Java Code
    C.1.3 Configuration File
  C.2 Output Files
    C.2.1 Static Map File
    C.2.2 Summary File
    C.2.3 Dump Files
  C.3 Screen Output
  C.4 Exploring Non-Conformance
    C.4.1 Cycle in a Graph

D Example 3 – Tree
  D.1 Input to Embee
    D.1.1 Alloy Specification
    D.1.2 Java Code
    D.1.3 Configuration File
  D.2 Output Files
    D.2.1 Static Map File
    D.2.2 Summary File
    D.2.3 Dump Files
  D.3 Screen Output
  D.4 Exploring Nonconformance
    D.4.1 Error of Omission
    D.4.2 Typographic Error

E Embee Class Diagrams
  E.1 embee
  E.2 embee.checker
  E.3 embee.mapper
  E.4 embee.parser
  E.5 embee.statedumper
  E.6 embee.util

F Additional Analysis
  F.1 Calculation of Arity
  F.2 Comparison of N
    F.2.1 Reasoning about N in terms of n
  F.3 Estimation of F
  F.4 Running Time
    F.4.1 Testing Methodology
    F.4.2 Test Series
    F.4.3 Comparison of Running Times
    F.4.4 Split between Embee and Alloy Analyzer

G Alloy Grammar

Glossary

Vita
List of Tables

5.1 Running times for each phase and total running time of Embee
B.1 Mapping from test1 12.dump to Analyzer identifiers
B.2 Mapping from test1 13.dump to Analyzer identifiers
F.1 Example arity calculations
F.2 Estimate of Boolean formula size
F.3 Test series for evaluating the running time of conformance checking
List of Figures

2.1 Usage of JML assertion checker
2.2 Overview of the MaC architecture
2.3 Java-MaC structure
2.4 Overview of JPaX runtime verification system
2.5 Excerpt of class diagram describing a meeting scheduling system
2.6 A visual constraint diagram for a Meeting object
2.7 The architecture of the VCD conformance checker
2.8 Basic TestEra framework

3.1 High-level view of Alloy analysis
3.2 Satisfying instance of specification in Listing 3.1
3.3 Activities performed while using the Alloy Analyzer
3.4 Satisfying instance of specification in Listing 3.2
3.5 Another satisfying instance of specification in Listing 3.2
3.6 Incorrect satisfying instance of specification in Listing 3.2
3.7 Edit Instance dialog box for specification in Listing 3.4
3.8 Edited instance, isomorphic to instance in Figure 3.6
3.9 Result of Edit Instance command

4.1 High-level view of Embee execution
4.2 Excerpt of high-level static mapping file
    (a) Default static mapping
    (b) Modified static mapping
4.3 Visualization showing two lists sharing a node
4.4 Visualization of list showing a cycle
4.5 Visualization of a directed graph with a cycle
4.6 Visualization of tree
    (a) Before deletion
    (b) After correct deletion
4.7 Visualization of tree after deletion of root (error of omission)
4.8 Visualization of tree after deletion of root (typographic error)

5.1 Hierarchy of Embee packages
5.2 Embee execution – focus on Phase 1 (High-level Static Mapping)
5.3 Embee execution – focus on Phase 2 (StateDumping)
5.4 Embee execution – focus on Phase 3 (Conformance Checking)
5.5 Sample specification and implementation
    (a) Specification of binary next relation
    (b) Implementation of binary relation in (a)
    (c) Specification of ternary next relation
5.6 Edit Instance dialog box
5.7 Running times for conformance checking phase as scope increases

A.1 Legend of UML class diagram components
A.2 embee.EmbeePrep and Alloy Analyzer
A.3 embee.mapper.StaticMapPreparer and Alloy Analyzer
A.4 embee.checker.ConformanceChecker and Alloy Analyzer
A.5 embee.checker.Transmogrifier and Alloy Analyzer

B.1 Visualization of two lists sharing a node
B.2 Using the Analyzer to pinpoint source of sharing non-conformance
B.3 Visualization of a cycle in the list
B.4 Using the Analyzer to pinpoint source of cycle non-conformance

C.1 Visualization of nonconforming state at second breakpoint
C.2 Using the Analyzer to pinpoint source of cycle non-conformance

E.1 Six main packages of Embee
E.2 Variable and operation access modifier legend
E.3 Class diagram for embee package
E.4 Class diagram for embee.checker package
E.5 Class diagram for embee.mapper package
E.6 Class diagram for embee.mapper.mappings package
E.7 Class diagram for embee.mapper.mappings.maps package
E.8 Class diagram for embee.parser package
E.9 Class diagram for embee.parser.attributes package
E.10 Class diagram for embee.parser.dumpTypes package
E.11 Class diagram for embee.parser.events package
E.12 Class diagram for embee.parser.variables package
E.13 Class diagram for embee.statedumper package
E.14 Class diagram for embee.statedumper.breakpoints package
E.15 Class diagram for embee.statedumper.exceptions package
E.16 Class diagram for embee.util package

F.1 Equations to compute arity of relations
F.2 N vs. scope for all three examples
F.3 Propositional formula size for List specification
F.4 Propositional formula size for Graph specification
F.5 Propositional formula size for Tree specification
F.6 Comparison of propositional formula sizes for all three examples
F.7 Running times of List specification
F.8 Running times of Graph specification
F.9 Running times of Tree specification
F.10 Running times of List, Graph and Tree specifications
F.11 Running time split for List specification with no facts
F.12 Running time split for List specification with 1 fact
F.13 Running time split for List specification with 2 facts
F.14 Running time split for Graph specification with no facts
F.15 Running time split for Graph specification with 1 fact
F.16 Running time split for Graph specification with 2 facts
F.17 Running time split for Graph specification with 3 facts
F.18 Running time split for Tree specification with no facts
F.19 Running time split for Tree specification with 1 fact
F.20 Running time split for Tree specification with 2 facts
F.21 Running time split for Tree specification with 3 facts
F.22 Running time split for Tree specification with 4 facts
List of Listings

3.1 Excerpt of a simple Alloy specification for a singly-linked list
3.2 More complex Alloy specification of a singly linked list
3.3 Alloy specification of a singly linked list with assertion
3.4 Complete Alloy specification of a singly linked list

4.1 Alloy specification of a singly-linked list using only binary relations
4.2 Excerpt of Java code for Node class
4.3 Excerpt of Java code for SimpleList class
4.4 Excerpt of ClientCode.java with a conforming execution
4.5 Configuration file for code in Listings 4.2, 4.3 and 4.4
4.6 Runtime state dump file for the first method exit breakpoint
4.7 Excerpted screen output of first nine, conforming, breakpoints
4.8 Excerpt of ClientCode.java causing two lists to share a node
4.9 Excerpted screen output of Embee program
4.10 Excerpt of ClientCode.java creating a cycle in the list
4.11 Excerpted screen output of Embee program
4.12 Excerpt of Alloy specification for directed acyclic graph
4.13 Excerpt of Java code for Node class
4.14 Excerpt of Java code for DAG class
4.15 Excerpt of ClientCode.java creating a cycle in the acyclic graph
4.16 Configuration file for code in Listings 4.13, 4.14 and 4.15
4.17 Automatically generated Edit Instance information
4.18 Excerpt of Alloy specification for a binary tree
4.19 Excerpt of BinaryTreeNode class
4.20 Excerpt of BST class
4.21 Configuration file for code in Listings 4.19 and 4.20
4.22 Excerpt of swapNodes() method with a typographic error in line 248

5.1 Excerpt from StaticMapPreparer.java
5.2 Excerpt from StateDumperThreads.java
5.3 Excerpt of code to retrieve and dump object information
5.4 Conclusion of excerpt in Listing 5.3
5.5 Excerpt of an implementation of the Graph signature
5.6 Excerpt of the Transmogrifier class
5.7 Excerpt of the ConformanceChecker class
5.8 Excerpt of Alloy specification for binary search tree
5.9 Simple specification to explore F
Chapter 1
Introduction

1.1 Motivation

Object-oriented analysis and design have become very popular in recent years. Several modelling notations have been developed for this paradigm, with the current de facto standard being the Unified Modeling Language (UML) [BRJ99]. UML includes a rich set of artifacts, including object models (class diagrams), use cases, and message sequence diagrams. The Object Constraint Language (OCL) [WK99] extends UML by providing the ability to specify constraints. Unfortunately, the semantics of UML and OCL are not completely formalized, which complicates the development of automatic analysis techniques and tools.

In order to overcome these shortcomings, researchers in the Software Design Group at the Massachusetts Institute of Technology (MIT) have developed Alloy. Alloy is a UML-compatible object modelling notation expressly designed with automatic analysis in mind.

An object model describes the objects that an object-oriented software system creates and uses during execution. It reveals the possible relationships between objects and expresses properties that they possess. The benefits of formally modelling software systems are well-documented [CW96, Jac02c, Win90]. By formalizing aspects of a system, developers are more likely to discover potential design flaws, as well as inconsistency, ambiguity and incompleteness. Object models support the precise and concise description of central, high-level system properties. Consequently, they form an integral part of the specification and design stages of most object-oriented software development efforts.
1.2 Problem

Unfortunately, the usefulness of object models during later stages of the development cycle is currently much more limited, for two reasons:

• Although object models provide the implementer with a formal description of central properties, there are currently few techniques or tools that allow the implementer to check that newly developed or modified source code satisfies the constraints of the object model. In other words, the conformance of the actual source code with respect to the specified model cannot be automatically checked.

• Object models and code are typically presented as two separate and distinct artifacts. Currently, little tool support is available for keeping the object model and the source code synchronized; hence, it is difficult to ensure that changes in one are reflected in the other. Therefore, the model often becomes out-of-date or even obsolete and ceases to be a useful development or documentation tool.
1.3 Objective
To address these issues, researchers have suggested combining the model and the code into one artifact [BHL94, LBR99]. This thesis proposes to bridge the gap between models and code with a different approach—that of runtime conformance checking. We focus our efforts on the Alloy language and its associated analysis tool.
1.3.1 Hypothesis

We hypothesize that it is possible to automatically check the execution of a Java implementation against structural invariants expressed in Alloy. Furthermore, we aim to demonstrate that such conformance checks can be performed both accurately and reasonably efficiently.

To this end, we make use of the Alloy Analyzer tool, which has the capability of checking specific instances for conformance against a specification. However, these instances are expressed in terms of Alloy constructs and are manually created and input into the Analyzer. Our goal is to automate this process, from executing the target code, to retrieving information about the execution’s runtime state, to translating that information into a format understandable by the Analyzer.

The main focus of our research is the design and implementation of a tool that automates the conformance checking of Java implementations against Alloy specifications. We view the successful creation of such a tool as a proof-of-concept; i.e., proof that it is possible to automate this type of runtime conformance checking.
1.4 Contributions
We have been successful in creating such a proof-of-concept. This document presents the design and implementation of a tool, called Embee, which captures the runtime state of a Java program at certain user-specified points. The Alloy Analyzer is then used to determine automatically whether or not the state conforms to an Alloy specification. We also present several test cases and analyses of the tool itself, which confirm that Embee can accurately and reasonably efficiently check conformance of mid-sized implementations. The tool can thus be used to help increase confidence in the correctness of a specification and related implemented or modified code, and to make it easier to keep the specification and code in sync.

During the implementation of the Embee prototype, we designed and implemented a separate tool, called StateDumper. This tool’s purpose is to extract the runtime state from the target implementation’s execution. Its implementation is explained in more detail in Section 5.3. In addition to its usefulness to Embee, the StateDumper program could be used in isolation to examine the runtime state of any Java implementation.

Although both the Alloy Analyzer and Embee (including the StateDumper) are implemented in Java, Embee could be adapted to check the conformance of other object-oriented languages. The target language must support the collection of an implementation’s runtime state and a StateDumper program would have to be implemented for that language. The new StateDumper would extract information from the target program’s execution and output it to a text file in an appropriate format. Embee would then be able to check the execution for conformance against the specification.
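To make the idea of extracting runtime state into a textual form more concrete, the following is a minimal sketch only. It assumes plain Java reflection and an invented line format ("atom" and "tuple" lines); it is not the StateDumper described in Section 5.3, whose mechanism and file format may differ considerably. The classes Node and SimpleList here are hypothetical stand-ins for a target implementation.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical target classes, for illustration only.
class Node {
    Node next;
}

class SimpleList {
    Node head;
}

// Illustrative sketch (an assumption, not Embee's StateDumper): walks the
// object graph reachable from a root object and records each object as an
// "atom" line and each non-null reference field as a "tuple" line.
class StateDumpSketch {

    static List<String> dump(Object root) {
        Map<Object, String> ids = new IdentityHashMap<>(); // object identity -> atom name
        List<String> lines = new ArrayList<>();
        Deque<Object> work = new ArrayDeque<>();
        register(root, ids, lines);
        work.add(root);
        while (!work.isEmpty()) {
            Object current = work.remove(); // FIFO: breadth-first traversal
            for (Field f : current.getClass().getDeclaredFields()) {
                Class<?> type = f.getType();
                // Skip static fields, primitives, arrays and JDK types to keep the sketch small.
                if (Modifier.isStatic(f.getModifiers()) || type.isPrimitive()
                        || type.isArray() || type.getName().startsWith("java.")) {
                    continue;
                }
                Object target;
                try {
                    f.setAccessible(true);
                    target = f.get(current);
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
                if (target == null) {
                    continue;
                }
                if (!ids.containsKey(target)) {
                    register(target, ids, lines);
                    work.add(target);
                }
                lines.add("tuple " + f.getName() + " " + ids.get(current) + " " + ids.get(target));
            }
        }
        return lines;
    }

    // Assigns a fresh atom name (class name plus running index) to an object.
    private static void register(Object o, Map<Object, String> ids, List<String> lines) {
        String id = o.getClass().getSimpleName() + "_" + ids.size();
        ids.put(o, id);
        lines.add("atom " + id);
    }

    public static void main(String[] args) {
        SimpleList list = new SimpleList();
        list.head = new Node();
        list.head.next = new Node();
        dump(list).forEach(System.out::println);
    }
}
```

The identity-based map matters in a sketch like this: two distinct objects with equal contents must remain distinct atoms, and an object reached twice (for example through a cycle) must be recorded only once.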
Finally, the Alloy Analyzer is typically used with very small scopes, i.e., with very few objects of any one type. With our test cases, we have experimented with the size of the scope and we have discovered two interesting facts. First, the Analyzer has been optimized for these small scopes, and there is a specific threshold beyond which its performance degrades markedly. Second, the Analyzer seems to have a second threshold, at which point its results become inaccurate. This behaviour does not invalidate our results, because the threshold observations are secondary to our main research objective. However, given the popularity of Alloy, the observations are interesting enough to warrant further research and are discussed in more detail in a separate report [CD03].
1.5 Organization of Thesis
We proceed by introducing conformance checking and discussing related work in the next chapter. We discuss the Alloy language and the Alloy Analyzer tool in Chapter 3. Chapter 4 describes our Embee tool, from the user’s perspective, with several running examples. Implementation details and the analysis of the tool are presented in Chapter 5. Chapter 6 concludes and outlines future work.
Chapter 2
Background

2.1 Conformance Checking

2.1.1 Multiple Definitions
The concept of conformance checking has become popular in the area of software engineering. However, there does not appear to be one generally accepted term and definition. For example, the terms conformance checking, architectural conformance (testing), monitoring and checking, runtime debugging and runtime verification are used almost interchangeably and are described by various authors as:

• checking “whether an implementation conforms to some given design” [Wuy01]
• ensuring “that the actual software (the detailed design and code) conforms to the architecture” [Flo02]
• “a testing method to see if an implementation and its executable specification are behaviorally equivalent relative to any interactions performed on the implementation” [BNS01]
• detecting “inconsistencies between an architectural model that has been verified and a system that has been implemented” [You98]
• comparing the source-code artifacts to the high-level architectural abstractions [MW99]
• testing “how closely the design matches the code” [GSR+99]
• checking “preconditions of methods” and checking “whether a class invariant holds at runtime” [LLP+00]
• “a means of debugging and of partially checking correctness” [Bho00]
• providing “assurance about the correct execution of target programs at runtime...based on a formal specification of system requirements” [KKL+01]
• “combining testing and formal methods” [HR01]

From these definitions, we conclude that the basis of conformance checking is a comparison between two artifacts in the software development process, such as the specification, design or implementation. The purpose of this comparison is to determine whether the two artifacts are consistent. Conformance checking is usually referred to in a ‘forward-engineering’ sense, i.e., an artifact created later in the development process is checked for conformance against one created earlier in the process. For example, conformance checking can be used to determine whether the design of a system is consistent with the original textual specification of that system. Similarly, conformance checking can be used to determine whether an object-oriented program’s implementation is consistent with its design, or even with its specification. In these cases, consistency means that the design or implementation follows the structure of the specification and meets all of the constraints of the specification.

Various techniques exist for conformance checking, including code inspection, testing and formal verification. The goal of each of these techniques is to check, or perhaps ensure, that the specification, design and/or implementation are consistent with each other.
2.1.2 Static vs. Runtime (Dynamic) Conformance Checking
Techniques and tools for conformance checking can be divided into two approaches— static and runtime, or dynamic, conformance checking. Static Conformance Checking Static conformance checking only makes use of the implementation’s source code. In other words, static conformance checking can be used to check the consistency of a program’s code against either its design or its specification. This type of checking can make use of any artifact available at compile-time, but cannot make use of runtime information. Static checking has the advantage of being able to cover the entire implementation, depending on the tool used. An additional advantage is the fact that checking a program before its execution means that the execution itself will not be slowed down. Static conformance checking may increase the confidence in an implementation, but the actual runtime execution is not checked. The amount of information that can be extracted from a program at compile-time with reasonable
accuracy is limited. For instance, it is possible in certain specific cases to determine at compile-time precisely what values a variable can hold at specific program locations. However, such information typically cannot be determined in the general case with sufficient accuracy. Therefore, a program may conform in a static context, but may still behave unexpectedly in operation. Similarly, potential nonconformance caused by inappropriate dynamic binding or by the violation of an invariant during execution may not be caught.

Runtime (Dynamic) Conformance Checking

Runtime conformance checking, in contrast, makes use of information available during the implementation’s execution; it is not limited to artifacts available at compile time. In fact, the entire purpose of runtime conformance checking is to check the actual execution of a program against its specification and/or design. However, an execution may not cover all functionality of an implementation; therefore, the resulting conformance check may be less comprehensive. In addition, runtime conformance checking interferes somewhat with the implementation’s execution; the execution will be slowed down by the checking itself.
2.1.3 Our Definitions

For our research, we are adopting the following definitions of conformance checking and runtime conformance checking:

Conformance checking is the process of comparing an object-oriented implementation with an object model, or structural specification, in order to determine whether the implementation conforms to the constraints of the specification.
Runtime conformance checking is the process of comparing an object-oriented program’s execution with an object model, or structural specification, in order to determine whether the implementation’s execution conforms to the constraints of the specification.

Specifically, we check the conformance of Java programs against Alloy object model specifications. It is important to note that we do focus on structural properties, such as the correspondence between classes and attributes in the implementation and signatures and fields in the specification. In other words, we focus our efforts on the structure of objects and relationships between these objects. We consider an implementation’s execution to be conforming if objects created during the execution obey the constraints of the specification at certain user-specified breakpoints.

Note that an implementation may be considered conforming in one execution, but nonconforming in another. For example, an implementation of a data structure may be conforming if only its add() method is called during an execution. However, if there is a flaw in its delete() method, then an execution calling on that method could be nonconforming.

We further refine our definition with the following caveats:
1. Our version of conformance checking is strictly runtime, i.e., we make use of information available during the execution of the implementation to check the conformance of the program against its specification.
2. We distinguish between checking and ensuring. The process of checking is passive, i.e., an implementation either conforms to its specification, or it does not. On the other hand, ensuring or steering is an aggressive activity, i.e., a nonconforming implementation might somehow be modified during execution
in order to ensure that it becomes conforming.
3. Finally, we do not check conformance of the entire program’s execution; checking is performed only at user-specified breakpoints.
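The add()/delete() scenario described above can be illustrated with a small Python sketch. It is not taken from our tool: the classes Node and DoublyLinkedList, the deliberately flawed delete() method and the conforms() check are all hypothetical, and the two calls to conforms() merely stand in for user-specified breakpoints. The structural constraint assumed here is that prev is the mirror image of next.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None


class DoublyLinkedList:
    """Hypothetical data structure whose delete() is deliberately flawed."""

    def __init__(self):
        self.first = None

    def add(self, value):
        node = Node(value)
        node.next = self.first
        if self.first is not None:
            self.first.prev = node
        self.first = node

    def delete(self, value):
        node = self.first
        while node is not None and node.value != value:
            node = node.next
        if node is None:
            return
        if node.prev is None:
            self.first = node.next
        else:
            node.prev.next = node.next
        # Flaw (intentional): the prev link of the following node is not updated.


def conforms(lst):
    """Structural constraint: the first node has no prev, and prev mirrors next."""
    if lst.first is not None and lst.first.prev is not None:
        return False
    node = lst.first
    while node is not None:
        if node.next is not None and node.next.prev is not node:
            return False
        node = node.next
    return True


lst = DoublyLinkedList()
for value in (1, 2, 3):
    lst.add(value)
after_add = conforms(lst)      # breakpoint 1: only add() has been called
lst.delete(2)
after_delete = conforms(lst)   # breakpoint 2: delete() has been called
print(after_add, after_delete)
```

The same implementation is judged conforming at the first breakpoint and nonconforming at the second, which is why conformance is a property of an execution rather than of the code alone.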
2.2 Related Work
The remainder of this chapter is devoted to related work in the area of conformance checking. We briefly discuss two static conformance checking tools and then focus on five runtime conformance checking tools in more detail. Comparisons between our work and the related runtime tools and techniques are presented in Section 5.7.
2.2.1 Static Conformance Checking

USE for OCL

The UML-based Specification Environment (USE) tool [RG00] offers the ability to animate a UML class diagram and check the conformance of various object states against a UML/OCL specification. The specification is written in UML with additional constraints written in OCL. The USE tool boasts a graphical user interface that allows the user to animate and inspect the UML/OCL specification. Scripts may also be written to create a set of objects and simulate the performance of operations on these objects. Any created state may be checked for conformance against the original specification. The creation of the specification and constraints, as well as the creation of the object states to be checked, are the responsibility of the user. The tool automatically displays the user-created object states graphically and checks their conformance against the specification.
ESC/Java

The Extended Static Checker for Java (ESC/Java) [DLNS98, FLL+02] is an experimental compile-time program checker developed by the Compaq Systems Research Center. Using a simple annotation language (similar to the Java Modeling Language discussed in Section 2.2.2), the user instruments Java code with constraints. These constraints are referred to as ‘design decisions’. As an example of such a design decision, consider an integer array called elements. If the user decides that this array should always be non-null, he or she would annotate the array declaration as follows:

    /*@non_null*/ int[] elements;

ESC/Java then uses verification-condition generation and automatic theorem-proving techniques to determine whether or not the actual code conforms to the design decisions as annotated. ESC/Java is also able to warn the user about possible runtime errors such as null dereferences and array bounds errors [FLL+02].
2.2.2 Runtime (Dynamic) Conformance Checking

Java Modeling Language (JML)

The Java Modeling Language is a notation that can be used to specify assertions about the detailed design of Java classes and interfaces, including pre- and post-conditions on methods [LBR99, LLP+00]. A runtime assertion checker has been developed at Iowa State University, which makes use of JML for runtime debugging and partial correctness checking [Bho00]. The use of JML and the Iowa State University checker tool requires the user to annotate the target code, i.e., the specification and code are combined in one artifact.
Compilation of the code causes the annotated constraints to be expanded into runtime checks, that is, the conformance checking is performed as the code is executed, and not by a separate process. Figure 2.1 shows the usage of the runtime assertion checker tool, which has two components. The translator takes as input the target program, e.g., Foo.java, which is annotated with JML specifications. The translator performs an abstract syntax tree transformation to create a second Foo.java file, which now contains extra Java code to perform the runtime checks. The instrumented target code is compiled with the standard Java compiler. The runtime system executes the program and assists the user with the assertion checker; for instance, the user can choose which types of assertions to check and how assertion failures should be handled.
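The idea of expanding annotated pre- and post-conditions into runtime checks can be sketched in Python. This is only an analogy under stated assumptions: the contract decorator and the integer_sqrt function are hypothetical, the conditions are ordinary Python functions rather than JML annotations, and nothing here reflects the actual translation performed by the Iowa State University tool.

```python
import functools


def contract(pre, post):
    """Wrap a function so that pre and post are checked on every call."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args):
            # The precondition is checked before the original body runs.
            if not pre(*args):
                raise AssertionError(f"precondition of {func.__name__} violated")
            result = func(*args)
            # The postcondition sees both the result and the arguments.
            if not post(result, *args):
                raise AssertionError(f"postcondition of {func.__name__} violated")
            return result
        return wrapper
    return decorate


@contract(pre=lambda x: x >= 0,
          post=lambda result, x: result * result <= x < (result + 1) ** 2)
def integer_sqrt(x):
    return int(x ** 0.5)


print(integer_sqrt(10))
```

As with the instrumented Foo.java, the checks run inside the same process as the target code; a call such as integer_sqrt(-1) fails the precondition check before the body executes.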
Figure 2.1: Usage of JML assertion checker [Bho00]
Monitoring

The Monitoring and Checking (MaC) architecture, developed at the University of Pennsylvania, provides the ability to continuously monitor a target program’s execution and to check the conformance of this execution against formal system requirements [KKL+01, LKK+99]. Users can formulate specifications using events and temporal operators. The code is instrumented automatically to emit a stream of events, which the monitor then checks with respect to the specification.
[Figure 2.2 diagram; labels: Informal Requirement Spec, Formal Requirement Spec (High-level Specification, Low-level Specification), Human, Program, Program Input, Automatic Instrumentation, Automatic Translation, Filter, Event Recognizer, Run-time Checker, low-level activity, high-level activity, Static Phase, Run-time Phase]

Figure 2.2: Overview of the Monitoring and Checking (MaC) architecture [KKL+01]

Figure 2.2 shows the structure of the generic MaC architecture, which consists of two main phases. The input to the static phase is the target program and a requirements specification. During the static phase, the specification is provided by the user in terms of a high-level and a low-level specification. In addition, the target program
is automatically instrumented, i.e., runtime components such as a filter, event recognizer and runtime checker are created. The runtime phase occurs while the target program is executing; during this phase, the filter extracts information about monitored objects. The event recognizer interprets events from the filtered information. Recognized events are then sent to the runtime checker, which determines whether the execution history, based on the events, conforms to the specification.

The user needs to learn two specification languages, one each for low-level and high-level specifications, although the target program itself is automatically instrumented and checked. The Primitive Event Definition Language (PEDL) is used to express the low-level specification; this is used to define the information to be sent from the program’s filter to the event recognizer. In addition, this specification defines how the filtered information is to be transformed into events to be used with the high-level specification. On the other hand, high-level requirements are expressed in the Meta Event Definition Language (MEDL). The use of two languages allows the separation of implementation-specific details from the high-level requirements checking [KKL+01].

Java-MaC [KKL+01] is a prototype implementation of the MaC architecture, designed for Java programs. Figure 2.3 on the next page shows the structure of this prototype.

Java PathExplorer

The Java PathExplorer (JPaX) is a separate monitoring tool developed at the National Aeronautics and Space Administration Ames Research Center [HR01]. It provides similar functionality to Java-MaC in that it has the ability to automatically
instrument Java byte code and monitor program execution.

Figure 2.3: Java-MaC structure [KKL+01]

The user needs to learn only one high-level logic language, which allows for the creation of temporal specifications. Conformance checking of an execution against high-level specifications is possible, as well as detection of low-level error conditions such as deadlock.

JPaX consists of three main components, as can be seen in Figure 2.4 on the next page. The instrumentation module automates instrumentation of the Java code to be observed; the byte code instrumentation is accomplished using a Java byte code engineering tool from Compaq, called Jtrek [SC]. When executed, the target code will then emit events through the interconnection module. The observer module processes these events and checks them against high-level requirement specifications. Events are passed to a set of observer rules; each rule performs a specific type of analysis, such as checking for deadlock or race conditions. One of the observer rules can check events against linear temporal logic (LTL) specifications using Maude [CDE+99], a high-performance system for rewriting and evaluating logic.
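Both MaC and JPaX follow the same broad pattern: the instrumented program emits low-level information, which is turned into events and checked against a specification. The following Python sketch illustrates that pattern only. The function names, the trace format and the monitored variable locked are assumptions made for this example; the sketch uses neither PEDL, MEDL nor LTL, and it is not based on the Java-MaC or JPaX interfaces.

```python
def filter_updates(trace, monitored):
    """Filter: keep only the updates to monitored variables."""
    return [(var, val) for var, val in trace if var in monitored]


def recognize_events(updates):
    """Event recognizer: map low-level updates to high-level events."""
    events = []
    for var, val in updates:
        if var == "locked":
            events.append("acquire" if val else "release")
    return events


def check(events):
    """Checker: acquire and release must alternate, starting with acquire."""
    held = False
    for event in events:
        if event == "acquire":
            if held:
                return False
            held = True
        else:
            if not held:
                return False
            held = False
    return True


# Hypothetical execution trace of (variable, new value) updates.
trace = [("locked", True), ("counter", 1), ("locked", False), ("locked", False)]
events = recognize_events(filter_updates(trace, {"locked"}))
print(events, check(events))
```

Separating the recognizer from the checker mirrors the separation of low-level and high-level specifications in MaC: only recognize_events() knows about program variables.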
Figure 2.4: Overview of Java PathExplorer (JPaX) runtime verification system [HR01]
UML-Based Runtime Conformance Checking

Researchers at Queen’s University have developed a tool that allows users to specify constraints on an object model using an extension of UML object diagrams. The user must create a set of Visual Constraint Diagrams (VCDs) [TGS+02] that represent nonconforming object states. For example, the partial UML class diagram in Figure 2.5 on the next page shows part of a meeting scheduling system design. Every Meeting object contains two primitive long values, i.e., startTime and endTime, as well as a ParticipantList object. The fact that a meeting’s ending time should occur after its starting time is impossible to express using only UML. Instead of making use of prose descriptions or OCL, a VCD can be created to express this constraint.

The VCD in Figure 2.6 on the next page expresses the constraint that a meeting’s endTime must be greater than its startTime. The bold line marked badTime indicates a defined relation over Meeting
objects; this means that badTime holds if and only if a meeting’s start time is greater than its end time. All constraints are expressed as exceptions, i.e., undesirable states.

[Figure 2.5 diagram: a Meeting class with long attributes startTime and endTime and a one-to-one association, participants, to ParticipantList]

Figure 2.5: Excerpt of class diagram describing a meeting scheduling system. A Meeting object will have a start and end time and a list of participants [TGS+02].
[Figure 2.6 diagram: a :Meeting object with startTime x : long and endTime y : long, where x > y, marked by the bold edge badTime]

Figure 2.6: A visual constraint diagram for a Meeting object. It is illegal for a meeting’s start time to follow its end time [TGS+02].

The checking tool makes use of these diagrams in order to check the conformance of target Java programs. A deductive graphical database language called GraphLog is used by the checker to compare runtime data snapshots against the VCDs. If a relation defined by a VCD is found to be true over some set of related objects from the runtime snapshot, then the constraint expressed by that VCD has been violated, i.e., the runtime state of that target program does not conform to its constraints. Constraints can also be expressed using path regular expressions, or defined over several VCDs.
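The way a VCD acts as a query for undesirable states can be sketched in Python. The snapshot format and the bad_time function below are assumptions made for illustration; the actual checker evaluates GraphLog queries over its own snapshot files, which this sketch does not attempt to reproduce.

```python
# Hypothetical runtime snapshot: object identifiers mapped to field values.
snapshot = {
    "m1": {"type": "Meeting", "startTime": 900, "endTime": 1000},
    "m2": {"type": "Meeting", "startTime": 1500, "endTime": 1400},
}


def bad_time(snapshot):
    """Return the ids of Meeting objects whose start time exceeds their end time."""
    return sorted(
        oid for oid, obj in snapshot.items()
        if obj["type"] == "Meeting" and obj["startTime"] > obj["endTime"]
    )


violations = bad_time(snapshot)
print(violations)
```

Because the constraint is expressed as an exception, a non-empty result means that the snapshot is nonconforming.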
Figure 2.7 shows the high-level architecture of this checker from the user’s perspective. Standard UML class diagrams and visual constraint diagrams (called ‘UML object diagrams expressing exceptions’ in the figure) make up a composite design. Information from the target Java program is extracted to create a core snapshot file. The core snapshot is compared against the composite design to determine whether or not conformance has been met; a checker report is output to the user.

[Figure 2.7 diagram; legible labels: Extract Information at Breakpoints, Run Checker]

Figure 2.7: The architecture of the Visual Constraint Diagram conformance checker from the point of view of a user [Tur03]
TestEra

TestEra is a framework designed for the automated testing of Java programs, which also makes use of the Alloy Analyzer [MK01]. Figure 2.8 on the next page shows a basic framework for TestEra. A specification is composed of both Java code and Alloy, which is used to describe the structural properties of a program’s input, as well as correctness properties of the program itself. Three source files are extracted
from the specification: an Alloy specification, which describes the inputs to the target program, another Alloy specification describing the correctness criteria and, finally, Java code. The code is responsible for translating Alloy instances into Java, running the target program with these inputs, and translating the Java output back into an Alloy format.
Figure 2.8: Basic TestEra framework [MK01]

TestEra’s analysis proceeds in two phases. In the first phase, the Alloy Analyzer generates all non-isomorphic test cases based on the structure of the target program’s input. In the second phase, TestEra uses the target program to execute each test case, translated into Java by the concretization step. The results of each test are translated back to Alloy by the abstraction step. The Alloy output is then compared against the correctness properties specification; if a check fails, it is output to the user as a counter-example. TestEra has been used to check interesting examples, including some classes in the java.util package and the naming scheme used by the Alloy Analyzer itself.
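The generate, concretize, run, abstract and check cycle can be sketched in Python. This is a loose illustration under several assumptions: plain enumeration replaces the Alloy Analyzer, no isomorphic inputs are eliminated, and the program under test (dedupe) and its correctness property are hypothetical.

```python
from itertools import product


def generate_inputs(atoms, max_len):
    """Phase 1 stand-in: enumerate every sequence over atoms up to max_len."""
    for length in range(max_len + 1):
        yield from product(atoms, repeat=length)


def dedupe(items):
    """Program under test (hypothetical, flawed): drops only adjacent repeats."""
    out = []
    for item in items:
        if not out or out[-1] != item:
            out.append(item)
    return out


def correct(inp, out):
    """Correctness property: same elements, and no element occurs twice."""
    return set(inp) == set(out) and len(out) == len(set(out))


counterexamples = []
for abstract_input in generate_inputs((0, 1), 3):
    concrete_input = list(abstract_input)       # concretization
    concrete_output = dedupe(concrete_input)    # run the target program
    abstract_output = tuple(concrete_output)    # abstraction
    if not correct(abstract_input, abstract_output):
        counterexamples.append(abstract_input)

print(counterexamples)
```

The two counter-examples found here differ only by a renaming of atoms, which is exactly the kind of redundancy that generating non-isomorphic test cases avoids.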
Chapter 3 Alloy

Alloy is an object-modelling language, developed by the Software Design Group at MIT [Jac00a, Jac02a], which has been specifically developed as a language to create micromodels, i.e., models which are usually much smaller than the systems they represent [Jac02c]. One advantage to using micromodels is that developers can choose to model only portions of the system, focussing on high-risk areas. These risky areas can be those which are most likely to fail, those which are most expensive to implement, those which are most likely to contain errors after implementation, etc. Building a small model of a system means that the important aspects of the system are handled, and not lost in the detail of a more complete model.

Models themselves are important artifacts in the software development process, whether or not they are micromodels. Models allow developers to document a software design, without having to implement it. Once code has been implemented, the underlying model remains useful: the model can be used to succinctly explain a difficult part of the implementation, to educate new members of the development team,
to determine how maintenance will affect the entire program, etc.
3.1 The Alloy Language

Alloy combines attributes of Z [Spi92], UML and OCL. It is a mathematically-based language, combining aspects of first-order logic and set theory. Alloy uses a small, intuitive, ASCII-only syntax [Jac00a]. Several characteristics distinguish Alloy from UML/OCL, including the following:
• Alloy provides transitive closure operations, which allow for the succinct expression of reachability properties such as the cycle-freedom of lists.
• Alloy models are declarative. Properties and constraints describe a system’s state; operations are specified by describing the relationships between the objects in the old and new state. A declarative language is well-suited to incremental modelling because complex specifications can be easily composed.
• Alloy models are analyzable. Alloy boasts a complete and formal semantics, making automatic analysis possible. In addition, the language was developed along with an automatic analysis tool [JSS00]. Models can be built incrementally, using the Alloy Analyzer for simulation and constraint checking.

We illustrate the use of Alloy through examples; however, several concepts must first be introduced. The following information has been extracted from the user’s manual [Jac02c]; see that reference for a complete discussion of the intricacies of the Alloy language.
3.1.1 Atoms and Relations
Object models, or specifications¹, described in Alloy are constructed from atoms and relations.

¹ The Alloy documentation refers exclusively to ‘object models’. Since our focus is on conformance checking of a program against a ‘specification’, we will refer to ‘Alloy specifications’.

Atoms

An atom is a primitive entity that is indivisible, immutable and uninterpreted [Jac02c]. In other words, an atom cannot be decomposed into smaller components and its properties do not change over time. In addition, an atom has no implicit properties or meaning, unlike a number or Boolean value. Alloy has no implicit concept of composition. To indicate that a consists of b and c, we model a, b and c as atoms and explicitly define relations between a and b, and a and c.

Relations

A relation is a structure, or mapping, that relates atoms. A relation can be considered a set of tuples, each containing a sequence of atoms. [Jac02c] also describes relations as tables, where each entry must contain an atom. The order of the columns is important, although the order of the rows is not. A relation must have at least one column; the arity of the relation is equal to the number of columns in the relation. A relation may be empty (no rows) or non-empty. Relations are typed, i.e., the entries in any column will all be of the same type; the type of the atoms in the first column is the left type and the type of the atoms in the last column is the right type.

For a binary relation, the atoms in the first column correspond to the domain and those in the last column correspond to the range. Note that the atoms in the first column (domain) do not have to correspond to the entire set of atoms of that type. If that were the case, then the relation would be total; otherwise, it would be a partial relation. Similarly, the atoms in the last column (range) do not have to include all of the atoms of that type. If that were so, then the relation would be surjective.

Sets and Scalars

Although Alloy is based on set theory, there are no actual sets in the language itself; every expression is a relation and sets of atoms are simply unary relations. Similarly, scalars are represented as singleton unary relations. Intuitively, however, it is more convenient to think in terms of sets and scalars. Therefore, when we discuss a set, we mean a unary relation; a scalar is a singleton unary relation; and a relation is a relation with arity of 2 or more.

Functions

A function is simply a binary relation that maps each atom in the left set to at most one atom in the right set.
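The notions of arity, totality, surjectivity and functions can be made concrete with a small Python sketch in which a relation is modelled as a set of tuples. The helper names below are our own and are not part of Alloy; types are represented simply as Python sets of atom names.

```python
def arity(rel):
    """Number of columns; None for an empty relation, which has no rows to inspect."""
    return len(next(iter(rel))) if rel else None


def is_total(rel, left):
    """Every atom of the left type occurs in the first column."""
    return {t[0] for t in rel} == left


def is_surjective(rel, right):
    """Every atom of the right type occurs in the last column."""
    return {t[-1] for t in rel} == right


def is_function(rel):
    """Binary relation in which each left atom maps to at most one right atom."""
    lefts = [t[0] for t in rel]
    return len(lefts) == len(set(lefts))


A = {"a0", "a1", "a2"}
B = {"b0", "b1"}
f = {("a0", "b0"), ("a1", "b0")}   # a binary relation from A to B

print(arity(f), is_total(f, A), is_surjective(f, B), is_function(f))
```

In this example f is a partial, non-surjective function: a2 has no image and b1 is never reached, but no atom of A maps to more than one atom of B.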
3.1.2 Expressions, Operators and Quantifiers

Expressions and Operators

All expressions in Alloy denote relations; every operator takes one or more relations and the result is also a relation. There are three types of operators: set operators, relational operators and logical operators.

Set operators include standard operators such as union (+), intersection (&) and difference (-). The arguments for these operators must be of the same type. The keyword in can be used for containment. For example, p in q means that every tuple in p is also in q. The = sign is used for equality, or containment in both directions.

Relational operators include operators such as join (.), transpose (~), transitive closure (^), reflexive transitive closure (*) and product (->). The join of two
relations p and q contains the join of every combination of a tuple in p with a tuple in q whose first atom matches the last atom of the tuple from p; the matching atom is dropped from the result. The right type of p must match the left type of q. If both p and q are binary relations, then p.q is the relational composition of p and q. If p is a set and q is a binary relation, then p.q gives the relational image of p under q. Intuitively, the join operator models navigation. As we will see, the join operator is used to navigate from a set of objects to the objects that are related through associations.

The transpose ~r of a relation r is its mirror image, reversing the order of the mapping. The transitive closure ^r of a binary relation r is the smallest relation that contains r and is transitive. The reflexive, transitive closure *r is the smallest relation that contains r and is both reflexive and transitive, i.e., it is like ^r but includes a mapping from every atom to itself. The product of two relations is similar to the join, but the intermediate atoms are not dropped. The result is the relation containing the concatenation of every tuple in the first relation with every tuple in the second relation.

Logical operators can be used to combine smaller formulas into larger ones. These operators include: negation (!), conjunction (&&), disjunction (||), implication (=>) and biimplication (<=>).

Quantifiers

Larger formulas can also be made by quantifying other formulas that contain free variables. Quantified variables are not bound by type, but rather by an expression. There are five quantifiers available in Alloy:
Quantifier        Meaning
all x : e | F     universal, F is true for every x in e
some x : e | F    existential, F is true for some x in e
no x : e | F      F is true for no x in e
sole x : e | F    F is true for at most one x in e
one x : e | F     F is true for exactly one x in e
Quantifiers can also be applied to expressions. For example, one e means that the set denoted by e contains exactly one tuple.
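The relational operators and quantifiers of this section can be mimicked in Python, again treating a relation as a set of tuples. This is a sketch under our own naming (join, transpose, closure, reflexive_closure, product, holds); it ignores typing, and it does not reject the join of two unary relations, which Alloy does not permit.

```python
def join(p, q):
    """p.q: pair tuples whose touching atoms match, dropping the matched atom."""
    return {s[:-1] + t[1:] for s in p for t in q if s[-1] == t[0]}


def transpose(r):
    """~r: reverse every pair of a binary relation."""
    return {(b, a) for (a, b) in r}


def closure(r):
    """^r: smallest transitive relation containing the binary relation r."""
    result = set(r)
    while True:
        bigger = result | join(result, result)
        if bigger == result:
            return result
        result = bigger


def reflexive_closure(r, atoms):
    """*r: ^r plus a mapping from every atom to itself."""
    return closure(r) | {(a, a) for (a,) in atoms}


def product(p, q):
    """p -> q: concatenate every tuple of p with every tuple of q."""
    return {s + t for s in p for t in q}


def holds(quantifier, e, formula):
    """Evaluate `quantifier x : e | formula(x)` for a set e (a unary relation)."""
    hits = sum(1 for (x,) in e if formula(x))
    return {
        "all": hits == len(e),
        "some": hits >= 1,
        "no": hits == 0,
        "sole": hits <= 1,
        "one": hits == 1,
    }[quantifier]


Node = {("n0",), ("n1",), ("n2",)}
next_ = {("n0", "n1"), ("n1", "n2")}

print(join({("n0",)}, next_))      # relational image of n0 under next_
print(sorted(closure(next_)))
# no n : Node | n in n.^next
print(holds("no", Node, lambda n: (n,) in join({(n,)}, closure(next_))))
# one n : Node | n has no successor
print(holds("one", Node, lambda n: not join({(n,)}, next_)))
```

Writing the scalar n as the singleton set {(n,)} follows the convention above that scalars are singleton unary relations.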
3.2 Alloy Specifications

There are several elements to an Alloy specification. In addition to descriptive elements, such as signatures and facts, others, such as assertions and commands, are used during the analysis of these specifications. This section will discuss only the basic aspects of these elements; see the user manual [Jac02c] for more detailed information.
3.2.1 Descriptive Elements
The basic element of an Alloy specification is the signature, with its fields. Models are further enhanced with the use of explicit facts. Alloy specifications are contained in modules. It is possible to maintain an entire specification in one module. However, reuse of particular elements is possible by importing, or using, other modules.
Signatures and Fields

The simple signature

    sig A {}

introduces A as a basic type with a set of atoms of that type. A refers to the set of atoms; the type is inferred by Alloy and cannot be referenced explicitly. A signature may include references to other signatures, or fields that represent relations, for example

    sig A {}
    sig B { f : A }

In this case, signature B contains field f, which relates atoms of B to atoms of A, i.e., a relation mapping the type of B to the type of A.

There may be implicit facts described by a specification. In this example, the two signatures, A and B, are disjoint: atoms are either in A, or they are in B. There may also be implicit facts defined in a signature. In the signature for B, the implicit fact that the left type of relation f is of type B means that the left set of f is a subset of B. Similarly, the fact that the right type of relation f is the type of A means that the right set of f is a subset of A. There is one additional implicit fact in this signature: that f is a total function. Every atom in B must map to exactly one atom in A.

Distinguishing Fields

Each signature in a specification represents a name space, so it is possible to have two fields with the same name. For example,

    sig A { f : B }
    sig B { f : A }

In this case, the dollar sign ($) is used to distinguish between the two fields by using their absolute names: A$f and B$f.

Set Multiplicity Keywords

It is possible to change the constraints on binary relations using multiplicity keywords. In the example above, f : A means that every atom of type B is associated to exactly one atom of type A, i.e., f is a total function. The use of the keyword option, as in f : option A, means that f can now be a partial function. For example,

    sig B { f : option A }

expresses that an atom in B maps to at most one atom in A. Similarly, the keyword set, as in f : set A, means that an atom in B maps to zero or more atoms in A, thereby removing any multiplicity constraint on the relation f.

Ternary and Higher-Arity Relations

Not all relations must be binary, as in the relation f discussed above. It is possible to have ternary relations, or relations with even higher arity (referred to as higher-arity relations). For example, in the specification

    sig A {}
    sig B {}
    sig C { f : A -> B }

field f represents a ternary relation. Given a scalar c from the set C, the expression c.f denotes a binary relation whose left set is a subset of A and whose right set is a subset of B. An example with a ternary relation is discussed in Section 3.4.2.
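The effect of the multiplicity keywords on a field such as f can be illustrated by checking a concrete relation against each of them. The Python sketch below is an assumption-laden illustration: the violations function is hypothetical, and the keyword "one" is our own label for a plain declaration such as f : A.

```python
def violations(owners, targets, field, keyword="one"):
    """Return the owner atoms that break `field : <keyword> Target`.

    keyword "one" stands for a plain declaration such as `f : A`.
    """
    limits = {"one": (1, 1), "option": (0, 1), "set": (0, None)}
    low, high = limits[keyword]
    bad = set()
    for owner in owners:
        image = {a for (b, a) in field if b == owner}
        too_few = len(image) < low
        too_many = high is not None and len(image) > high
        # The image must also stay within the declared target set.
        if too_few or too_many or not image <= targets:
            bad.add(owner)
    return bad


A = {"a0", "a1"}
B = {"b0", "b1", "b2"}
f = {("b0", "a0"), ("b1", "a0"), ("b1", "a1")}

for keyword in ("one", "option", "set"):
    print(keyword, sorted(violations(B, A, f, keyword)))
```

Here b1 maps to two atoms and b2 to none, so the same relation violates f : A for both atoms, violates f : option A only for b1, and satisfies f : set A.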
It is also possible to have fields representing relations with arbitrary arity, such as f : A -> B -> C -> D. This relation has arity 5 (the four columns listed, plus one for the signature in which f is declared). An Alloy model typically does not contain relations of arity greater than 3.

Multiplicity Markings on Relations

The multiplicity of a binary relation can be modified with the use of option and set keywords. It is also possible to change the multiplicity of higher-arity relations. The following multiplicity markings are available:

Multiplicity Marking    Meaning
?                       zero or one
!                       exactly one
+                       one or more
If m and n are multiplicity markings, then the signature

    sig C { f : A m -> n B }

with a scalar c from the set C means that c.f maps m atoms in A to each atom in B and each atom in A to n atoms in B. Making n = ? turns relation c.f into a function, i.e., atoms in A may or may not map to atoms in B, but there will be a maximum of one mapping for each atom in A. Making n = ! makes it a total function, i.e., every atom in A maps to exactly one atom in B. Making m = ? turns the relation c.f into an injection, i.e., at most one atom in A maps to each atom in B.

Multiplicity markings are only possible for relations with arity 3 or greater, i.e., there must be at least one product operator (->). In addition, these markings may only be placed on either side of one product operator. For example, the relation f : A ? -> B -> ! C would be illegal.
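The reading of A m -> n B in which m counts the atoms of A mapped to each atom of B, and n counts the atoms of B that each atom of A maps to, can be checked on a concrete binary relation. The Python sketch below is illustrative only; the MARKINGS table and the satisfies function are our own names, and the relation r stands for c.f for some fixed scalar c.

```python
MARKINGS = {
    "": lambda k: True,      # no marking: unconstrained
    "?": lambda k: k <= 1,   # zero or one
    "!": lambda k: k == 1,   # exactly one
    "+": lambda k: k >= 1,   # one or more
}


def satisfies(rel, A, B, m="", n=""):
    """Check a binary relation rel (standing for c.f) against `A m -> n B`."""
    # n constrains how many atoms of B each atom of A maps to.
    images = all(MARKINGS[n](sum(1 for (a, _) in rel if a == x)) for x in A)
    # m constrains how many atoms of A map to each atom of B.
    preimages = all(MARKINGS[m](sum(1 for (_, b) in rel if b == y)) for y in B)
    return images and preimages


A = {"a0", "a1"}
B = {"b0", "b1"}
r = {("a0", "b0"), ("a1", "b0")}

print(satisfies(r, A, B, n="!"))   # every atom of A has exactly one image
print(satisfies(r, A, B, m="?"))   # b0 is the image of two atoms of A
```

The relation r satisfies the marking on the right-hand side but not the one on the left, since both a0 and a1 map to b0.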
Extension

One signature may extend another, similar to class specialization (extension) in Java. For example,

    sig A {}
    sig B { b1 : A }
    sig C extends B { c1 : A }

specifies three signatures, but only two types. sig C extends B introduces C as a new subsignature, i.e., as a subset of signature B. Atoms of C have the same type as atoms of B. Also, atoms of C have their own fields, e.g., c1, and the fields of their supersignature, e.g., b1, but not vice versa. Finally, subsignatures may have their own subsignatures.

Functions

In addition to the definition of a function as a binary relation referred to by a field in a specification, Alloy also allows for the definition of explicit functions, or parameterized formulas. Most of the specifications that we use for our analysis will not contain functions, with the exception of a specialized function used to demonstrate a satisfying instance of the specification. Although this function could have any name, we will follow the convention of referring to it as the Show() function.

Facts

Signature declarations usually imply certain structural constraints; however, it is also possible to express explicit invariants, or facts, in Alloy specifications. A fact is a
constraint that must always be satisfied in any instance satisfying the specification. It may be placed in a separate paragraph in the specification, or it may be directly associated with a particular signature. Both types of facts are demonstrated by example in Section 3.2.3.
3.2.2 Analysis Elements

In addition to the descriptive elements, discussed above, which allow for richly defined specifications, assertions and commands are used for the analysis of these specifications.

Assertions

Assertions can be used during the analysis of Alloy specifications. They are always separate paragraphs in the specification and are formed in the same fashion as facts. While a fact represents a constraint that must be satisfied by an instance satisfying the specification, an assertion is used to check whether or not a specification satisfies some constraint. Assertions are demonstrated by example in Section 3.4.2.

Commands

There are two types of command that can be used during analysis:

run
    This command is used to find an instance of the specification that also satisfies a function in the specification. The only function that we will discuss in this context is the Show() function, which is used to find a solution, i.e., an instance that satisfies the constraints of the specification.
check
    This command is used to determine whether or not an assertion in a specification holds for all instances that satisfy the specification. In other words, the check command attempts to find a counter-example, i.e., an instance that satisfies the specification, and yet does not satisfy the assertion in question.
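The difference between searching for an instance and searching for a counter-example can be illustrated with a brute-force Python sketch over the two-signature specification sig A {} and sig B { f : A }. Exhaustive enumeration is used here purely for illustration and is not how the Alloy Analyzer works; the atom names, the fixed scope of exactly two atoms per signature and the functions instances(), run() and check() are all assumptions of this sketch.

```python
from itertools import product

A = ("a0", "a1")
B = ("b0", "b1")


def instances():
    """All instances of `sig B { f : A }` over these atoms:
    f maps each atom of B to exactly one atom of A."""
    for images in product(A, repeat=len(B)):
        yield dict(zip(B, images))


def run(predicate):
    """Return some instance that satisfies the predicate, or None."""
    return next((i for i in instances() if predicate(i)), None)


def check(assertion):
    """Return a counter-example to the assertion, or None if it holds."""
    return next((i for i in instances() if not assertion(i)), None)


def surjective(f):
    """Every atom of A is the image of some atom of B."""
    return set(f.values()) == set(A)


print(run(surjective))     # an instance in which f is surjective
print(check(surjective))   # an instance showing f need not be surjective
```

A result of None from check() would mean only that no counter-example exists among the instances enumerated, not that the assertion holds in general.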
3.2.3 Example

In order to demonstrate some of the basic elements of an Alloy specification, and its analysis, we shall introduce a simple example. An excerpt from an Alloy specification of a singly-linked list is presented in Listing 3.1.

Listing 3.1 Excerpt of a simple Alloy specification for a singly-linked list

    sig Node {
      next : option Node
    }

    sig List {
      first : Node
    }{
      all n : Node | n in first.*next
      no n : Node | n in n.^next
    }
This specification consists of two signatures, which declare two disjoint sets of objects—one of type Node and one of type List. The Node signature contains one binary relation (or partial function), next, which relates nodes to nodes. The option qualifier indicates that every node is related to zero or one nodes. The List signature also contains one binary relation, first, which relates every list to exactly one node. Unlike the Node signature, the List signature is followed by a second set of braces containing invariants, or facts, that must always be true for all lists. The first fact uses the reflexive, transitive closure operator * to specify that all nodes are reachable from the first node of a list, i.e., every node can be reached from
the first node by following the next relation zero or more times. The second fact uses the transitive closure operator ^ to specify that no node is its own successor, i.e., no node can be reached from itself by following the next relation one or more times. These facts have been expressed directly with the List signature. It would also be possible to express these facts as separate paragraphs in the specification, such as
fact Reachable { all l : List | all n : Node | n in (l.first).*next }
fact NotOwnSuccessor { all l : List | no n : Node | n in n.^next }
Notice the addition of an extra quantified variable in order to bring the fact out of the context of the List signature.
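To make the two closure operators concrete, the following Java sketch evaluates first.*next and n.^next on a small hand-built instance. It is an illustration only: atoms are represented as plain strings, next is a map from a node to its optional successor, and the names ClosureSketch, transitive, reflexiveTransitive, allReachable and acyclic are assumptions that are not part of Alloy or the Alloy Analyzer.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only: atoms are strings, and `next` maps a node to its
// optional successor, mirroring the declaration `next : option Node`.
final class ClosureSketch {

    // Nodes reachable from `start` by following `next` one or more times,
    // i.e., the transitive closure start.^next.
    static Set<String> transitive(String start, Map<String, String> next) {
        Set<String> seen = new HashSet<>();
        String current = next.get(start);
        // Stop at the end of the chain or when a node repeats (a cycle).
        while (current != null && seen.add(current)) {
            current = next.get(current);
        }
        return seen;
    }

    // Nodes reachable in zero or more steps, i.e., start.*next.
    static Set<String> reflexiveTransitive(String start, Map<String, String> next) {
        Set<String> result = transitive(start, next);
        result.add(start);
        return result;
    }

    // First fact of Listing 3.1: every node is in first.*next.
    static boolean allReachable(String first, Set<String> nodes, Map<String, String> next) {
        return reflexiveTransitive(first, next).containsAll(nodes);
    }

    // Second fact of Listing 3.1: no node is in its own n.^next.
    static boolean acyclic(Set<String> nodes, Map<String, String> next) {
        for (String n : nodes) {
            if (transitive(n, next).contains(n)) {
                return false;
            }
        }
        return true;
    }
}
```

For a chain Node_0, Node_1, Node_2 headed by Node_0, both checks succeed; adding a link from Node_2 back to Node_0 makes acyclic fail, because every node then appears in its own transitive closure.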
3.3
Alloy Analysis
A key advantage of using Alloy as a modelling language is that the resulting specifications can be analyzed fully automatically. Although first-order logic is undecidable [Chu36, Hun01, Tur36], it is possible to analyze Alloy models by restricting the search space to a certain finite scope. The Alloy analysis uses this scope to translate the specification into a propositional formula and employs one of a selection of off-the-shelf SAT solvers to determine whether or not there exists an instance within the scope that satisfies the formula [Jac00a]. Figure 3.1 on the next page illustrates a high-level view of the Alloy analysis [Jac02b]. The Alloy documentation refers to the propositional formula as the boolean formula; the alloy formula is the actual specification and the alloy instance
is an instance that satisfies the specification.
[Figure 3.1 diagram labels: alloy formula, scope, translate formula, boolean formula, SAT solver, boolean instance, mapping, translate model, alloy instance]
Figure 3.1: High-level view of Alloy analysis [Jac02b]
There are two ways of using the analysis. The first is to check the consistency of a specification, by finding an instance that satisfies it. This consistency check is essentially a simulation, or animation, of the model described by the specification. This can be accomplished using a run Show() command. Running the specification in Listing 3.1 on page 32 with a Show() command and a scope of 3 yields the satisfying instance in Figure 3.2 on the next page. This instance consists of one list with three nodes. Although atoms have no inherent meaning, or identification, the Alloy Analyzer assigns each atom a unique name when presenting a solution or counter-example. The name of an atom is the name of its signature, followed by some number between 0 and scope − 1. The list has a first node, and the three nodes are linked with the next relation. All nodes are reachable, but no node is reachable from itself. Many other satisfying instances are possible within this scope of 3; the simplest is actually an instance with no nodes and no lists.
Figure 3.2: Satisfying instance of specification in Listing 3.1
The second type of analysis is to check the constraints of the model. A desired property is asserted and the analysis attempts to find a counter-example, i.e., a satisfying instance of the model that does not satisfy the assertion. This can be accomplished with an assertion and a check command. These are demonstrated in Section 3.4.2. These two types of analysis are useful for determining if the model is under- or over-constrained and in supporting the incremental development of models [Jac02a].
3.4
Alloy Analyzer
The Alloy Analyzer has been publicly available since September 1999 [JSS00] and is used for educational purposes at over a dozen universities. It provides a well-designed and intuitive graphical user interface, as well as command-line functionality. The Analyzer automates the analysis described above, permitting the user to edit a specification, analyze it, and examine the resulting instances or counter-examples.
3.4.1
Using the Analyzer
As illustrated in Figure 3.3, the Alloy Analyzer is used as follows [JSS00]:
1. The user creates and compiles a specification with the Analyzer. This compilation takes a few seconds and finds errors in syntax and type.
2. The user chooses the mode of analysis, i.e., simulation/animation or constraint checking, and a finite scope.
3. The Analyzer responds with either an instance or a message that none could be found. If the analysis mode was simulation, then the instance is one that satisfies the specification. If the analysis mode was constraint checking, then the instance is one that satisfies the structure of the specification, but that violates the assertion that was being checked.
4. The user can continue developing the specification, or can investigate the same area of analysis by changing the scope or adjusting the model.
[Figure 3.3 diagram labels: write minimal model; pick analysis; generate instances; check property; some; none; loosen model; tighten model; extend model or stop]
Figure 3.3: Activities performed while using the Alloy Analyzer [Jac01]
We demonstrate the analysis elements of Alloy, as well as the use of the Alloy Analyzer, by example. The discussion in Section 3.4.2 offers a condensed view of the steps taken while creating a specification with the Analyzer.
3.4.2
Example
Listing 3.2 More complex Alloy specification of a singly linked list
 1  module List
 2
 3  sig Node {}
 4  {
 5    one l : List | this in (l.first).*(l.next)
 6  }
 7
 8  sig List {
 9    first : Node,
10    next : Node -> ? Node
11  }
12
13  fact NoNodeReachableFromSelf {
14    all l : List | no n : Node | n in n.^(l.next)
15  }
16
17  fun Show() {}
18
19  run Show for 4
The specification in Listing 3.1 on page 32 was a quick introduction to simple specifications. The Alloy specification shown in Listing 3.2 is a complete specification and includes a ternary relation. We will build upon this specification to include more facts, as well as an assertion and commands. This specification also consists of two signatures, which declare two disjoint sets of objects: one of type Node and one of type List.
Ternary vs. Binary Relation
The specification in this example and the one from Listing 3.1 on page 32 both describe singly-linked lists. However, this example makes use of a ternary relation in the List signature to represent the links between nodes. This is in contrast to the
previous example, where the Node signature contained only a binary relation, or field, which referred to the next node in the list. Both representations specify the same kind of linked list; however, there are some subtle differences:
• In Listing 3.1 on page 32, it is possible to refer to node n’s next neighbour by referencing the expression n.next. In other words, the next relation is implicitly given by indicating the neighbours of each node. However, in Listing 3.2 on the previous page, node n’s next neighbour can only be referenced by including a reference to the list that contains the nodes as well. In other words, the relation l.next of a list l is explicitly given and associated with list l itself. For example, if list l contains node n, then n’s next neighbour is n.(l.next).
• The specification in Listing 3.1 on page 32 has an implicit constraint, the option qualifier, that forces each node to have at most one next node. However, in Listing 3.2 on the previous page, it would otherwise be possible for each node to be linked to any number of other nodes with the list’s next relation. Because we want each linked list node to be associated with at most one other node, we are forced to include additional constraints in the model, such as the multiplicity constraint that has been placed on the next relation in line 10. This constraint forces each atom in the domain of the next relation to map to zero or one atoms in the range.
Reachability
Each example contains two reachability constraints: one to ensure that no node is reachable from itself, and one to ensure that a node is reachable in some, i.e., at least one, list. The first reachability constraint is almost the same for both. Listing 3.1 on page 32 uses the constraint
no n : Node | n in n.^next
as part of the List signature, while Listing 3.2 on page 37 uses
all l : List | no n : Node | n in n.^(l.next)
in the separate NoNodeReachableFromSelf fact. The phrasing of each constraint is the same; only the context and the exact name of the relevant next relation differ, according to the remainder of the specification. On the other hand, the second reachability constraint differs considerably between the two examples. The constraint in Listing 3.1 on page 32
all n : Node | n in first.*next
can be extracted from the List signature and expressed as
all l : List | all n : Node | n in (l.first).*next
Once the constraint is extracted, it becomes obvious that it is quite naïve, in that it actually requires that all nodes be reachable in all lists. In reality, we would prefer a more selective approach, i.e., every node must be reachable in exactly one list. This would mean that no lists would share nodes, and no nodes would exist without being part of a list. We attempt to specify this constraint by the addition of a fact to the Node signature, i.e., the fact in line 5
one l : List | this in (l.first).*(l.next)
which forces every node to be reachable from the first node of exactly one list.
Show Function and Run Command
At this point, let us make use of the Analyzer to find an instance that satisfies our current specification. Line 17 of the specification contains a Show function, i.e., a function whose sole purpose is to demonstrate a satisfying instance of the specification. The braces of the function are empty, but could contain additional constraints, such as the requirement to show at least one list (some List), etc. This function is called by the run command in Line 19, with a scope of 4. In other words, when the Analyzer is used to execute this specification, it will attempt to find an instance that satisfies the structure and constraints of the specification, with a maximum of 4 of any type of atom, e.g., no more than 4 Node atoms and no more than 4 List atoms. When we execute the specification in the Alloy Analyzer, the program returns almost immediately with a ‘Solution Found’ popup box, i.e., a satisfying instance has been found. The Analyzer also has the ability to show a visualization of the satisfying instance. One satisfying instance of the specification in Listing 3.2 on page 37 is shown in Figure 3.4.
Figure 3.4: Satisfying instance of specification in Listing 3.2
The Analyzer is able to find all non-isomorphic satisfying instances of a specification, and the ‘Next’ command may be used to return another instance, such as the one shown in Figure 3.5 on the next page.
Figure 3.5: Another satisfying instance of specification in Listing 3.2
Based on these successful instances, one might think that the specification was complete. However, iterating through various possible solutions quickly results in a satisfying instance such as the one in Figure 3.6.
Figure 3.6: Incorrect satisfying instance of specification in Listing 3.2
In this instance, there are two linked lists: List_2, whose first node is Node_3, and List_3, whose first node is Node_2. Note, however, that there is a next mapping between Node_3 and Node_2. Although at first glance it might appear that this next mapping belongs to List_2, it is actually part of List_3’s next relation. This extra mapping conflicts with our desire to have each node belonging to only one list; there is obviously a subtle flaw in the specification, which allows this type of extraneous mapping to exist.
No Solution Found
It may be the case that the Alloy Analyzer cannot find a solution to a specification. In this case, it is possible that the specification contains conflicting requirements, and that no solution is possible. On the other hand, it may
be the case that no solution can be found within the scope that was used to execute the Show() function. It should be noted that a solution found within a certain scope indicates that a satisfying instance of the specification does exist. The lack of such a solution within a certain scope does not indicate that a satisfying instance does not exist, merely that it does not exist within that scope. Increasing the scope may allow the Analyzer to find a solution. This concept is discussed further in Section 3.5.4.
Assertion and Check
One way of confirming possible flaws, or of checking that particular constraints are met by satisfying instances of the specification, is to create assertions that can be checked by executing the Alloy check command. Listing 3.3 on the next page lists our sample specification with the addition of a new assertion and check command. The assertion in Lines 17-20
assert NextOnlyForNodesOfSameList {
  all l : List | all n1, n2 : Node |
    n1 in n2.~(l.next) => n1 in (l.first).*(l.next)
}
represents an invariant that constrains mappings between nodes to occur only in the list in which those nodes exist. Remembering that ~ represents the transpose of a relation, this assertion states: for every pair of nodes n1 and n2 in a list l, the fact that n2 can be reached from n1 via the next relation in l implies that n1 is reachable in that list. This forces every mapping in a list’s next relation to be between two nodes that are reachable from the head of that list. This assertion represents a constraint that we want to maintain in the specification. To check this assertion, we execute the check command in the Alloy Analyzer. In this case, the ‘Solution Found’ popup box means that a counter-example has been
Listing 3.3 Alloy specification of a singly linked list with assertion
 1  module List
 2
 3  sig Node {}
 4  {
 5    one l : List | this in (l.first).*(l.next)
 6  }
 7
 8  sig List {
 9    first : Node,
10    next : Node -> ? Node
11  }
12
13  fact NoNodeReachableFromSelf {
14    all l : List | no n : Node | n in n.^(l.next)
15  }
16
17  assert NextOnlyForNodesOfSameList {
18    all l : List | all n1, n2 : Node |
19      n1 in n2.~(l.next) => n1 in (l.first).*(l.next)
20  }
21
22  check NextOnlyForNodesOfSameList for 4
23
24  fun Show() {}
25
26  run Show for 4
found, i.e., an instance that satisfies the constraints of the specification and yet fails the assertion, e.g., the instance shown in Figure 3.6 on page 41. Our assertion has failed, which means that we have to fix the specification to take the desired invariant into account. Fortunately, it is easy to modify the specification so that the assertion is always true; simply change that assertion into a fact. We have done this in Listing 3.4 on the next page, where Line 12 contains the new fact. Note that we have associated this fact with the List signature but that the essence of the constraint remains the same.
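The failed assertion can also be reproduced by hand. The following Java sketch encodes an instance in the spirit of Figure 3.6 and evaluates NextOnlyForNodesOfSameList against it. The representation (strings for atoms, and a map from each list to its own next relation standing in for the ternary relation) and the names AssertionSketch, addList, link and reachable are assumptions made for illustration; the Analyzer itself works on a propositional encoding rather than on such maps.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; not the Analyzer's representation.
final class AssertionSketch {
    // Binary relation List -> Node.
    final Map<String, String> first = new HashMap<>();
    // Ternary relation List -> Node -> Node: each list carries its own `next`.
    final Map<String, Map<String, String>> next = new HashMap<>();

    void addList(String list, String firstNode) {
        first.put(list, firstNode);
        next.put(list, new HashMap<>());
    }

    // Adds the mapping from -> to to the next relation of `list`.
    void link(String list, String from, String to) {
        next.get(list).put(from, to);
    }

    // (l.first).*(l.next): nodes reachable from the head of list l.
    Set<String> reachable(String list) {
        Set<String> seen = new HashSet<>();
        String current = first.get(list);
        while (current != null && seen.add(current)) {
            current = next.get(list).get(current);
        }
        return seen;
    }

    // The assertion: whenever n1 -> n2 is a mapping in l.next,
    // n1 must be reachable from l.first.
    boolean nextOnlyForNodesOfSameList() {
        for (String list : first.keySet()) {
            Set<String> inList = reachable(list);
            for (String n1 : next.get(list).keySet()) {
                if (!inList.contains(n1)) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

With List_2 headed by Node_3, List_3 headed by Node_2, and the mapping from Node_3 to Node_2 stored in List_3's next relation, the check returns false: Node_3 is in the domain of List_3's next relation but is not reachable from List_3's first node.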
Listing 3.4 Complete Alloy specification of a singly linked list
 1  module List
 2
 3  sig Node {}
 4  {
 5    one l : List | this in (l.first).*(l.next)
 6  }
 7
 8  sig List {
 9    first : Node,
10    next : Node -> ? Node
11  }{
12    all n1, n2 : Node | n1 in n2.~next => n1 in first.*next
13  }
14
15  fact NoNodeReachableFromSelf {
16    all l : List | no n : Node | n in n.^(l.next)
17  }
18
19  fun Show() {}
20
21  run Show for 3
No Counter-Example Found
It should be noted that any assertion that is found to be valid, i.e., any assertion for which the Alloy Analyzer cannot find a counter-example, is only valid within the scope used to check that assertion. In other words, if an assertion is valid in scope 3, there is no guarantee that it is valid within scope 4 or higher. This concept is discussed further in Section 3.5.4.
Edit Instance
The Alloy Analyzer offers one more command, which is useful when designing specifications; this is the Edit Instance command. This command can be used in conjunction with a specification, including scope, to test the specification against particular combinations of atoms and relations. In other words, instead of letting the Analyzer
find instances that satisfy the specification, or counter-examples that violate an assertion, the Edit Instance command can be used to test a user’s instance against the specification. Suppose we wish to determine whether an instance such as that shown in Figure 3.6 on page 41 could actually still be a satisfying instance of our specification, even after the new fact has been added. We can run the Edit Instance command on our specification, select those atoms and relations that represent the existence of two lists, with a mis-mapping between them, and then the Alloy Analyzer will check this instance against the specification. For purposes of simplicity, we have changed the scope of our specification from Listing 3.4 on the previous page to 3. The Edit Instance dialog box would be as shown in Figure 3.7.
Figure 3.7: Edit Instance dialog box for specification in Listing 3.4
Checkmarks have been placed against certain atoms and relations in order to simulate the erroneous instance of Figure 3.6 on page 41. The Alloy Analyzer is also
able to provide a visualization of the edited instance, as in Figure 3.8. Note how this instance is isomorphic to the one in Figure 3.6.
Figure 3.8: Edited instance, isomorphic to instance in Figure 3.6
Once the instance has been edited, the Analyzer determines whether or not the instance satisfies the specification. In this case, it does not, as evidenced by the display in Figure 3.9. The highlighted portion of the specification in the right panel shows which fact has not been met by the edited instance. The top left panel contains the Alloy propositional formula representing the specification; the expressions prefaced with ‘F’ indicate which portions of the formula are false.
Figure 3.9: Result of Edit Instance command, showing that the edited instance in Figure 3.8 does not satisfy the specification in Listing 3.4
3.5
Analyzing the Analysis
3.5.1
Decidability and Completeness
As briefly mentioned earlier in this chapter, the Alloy language is partially based on first-order logic, and is therefore undecidable [Chu36, Hun01, Tur36]. The analysis technique used to generate satisfying instances of specifications avoids this undecidability by using a finite scope. To say that a specification has a solution, or a satisfying instance, in a scope of k means that this solution has no more than k atoms of each type. Although the use of a finite scope makes the analysis decidable, it makes the analysis incomplete. If an instance satisfying the formula cannot be found within a certain scope, this does not imply that the specification is unsatisfiable; an instance may be found if the scope is increased. Likewise, the lack of a counter-example for an assertion does not imply that the asserted property holds in a larger scope.
3.5.2
The Boolean Array
As discussed in Section 3.3, the Alloy specification is combined with the finite scope into a relational formula, which is then transformed into a propositional, or Boolean, formula. This formula is presented to a commercial off-the-shelf SAT solver, which is capable of determining a solution to the formula, if one exists. Any solution that satisfies this formula represents a combination of atoms and relations that constitute a solution to the specification. Each variable in the propositional formula, i.e., each possible atom or relation between atoms, is represented in the Analyzer by an element in a Boolean array. An array of size N represents a
total of $2^N$ possible solutions to the propositional formula, as each element in the array may be either TRUE or FALSE, indicating that the element is either present or absent.
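One way to picture such an array is sketched below for a single binary relation over k atoms: the pair (i, j) is assigned the array element i*k + j, and that element is TRUE exactly when atom i is related to atom j. The indexing scheme and the names BooleanArraySketch, relate and related are illustrative assumptions; the Analyzer's actual variable layout may differ.

```java
// Illustrative sketch only: one boolean per possible pair of atoms.
final class BooleanArraySketch {
    private final int k;           // scope: number of atoms of the type
    private final boolean[] bits;  // element i*k + j stands for the pair (i, j)

    BooleanArraySketch(int k) {
        this.k = k;
        this.bits = new boolean[k * k];
    }

    // Marks the pair (from, to) as present in the relation.
    void relate(int from, int to) {
        bits[from * k + to] = true;
    }

    // Reports whether the pair (from, to) is present in the relation.
    boolean related(int from, int to) {
        return bits[from * k + to];
    }

    // Number of propositional variables used by this relation.
    int size() {
        return bits.length;
    }
}
```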
3.5.3
The Solution Space
These $2^N$ possible solutions represent the search space of the analysis, the size of which is at least exponential in the size of the scope. For example, a single binary relation in a specification with scope $k$ causes the array to contain at least $N = k^2$ elements. The analysis search space of such a specification thus contains $2^{k^2} = 2^{k \times k}$ possible values. Increasing the arity of the solitary relation to 3 causes the array to contain at least $N = k^3$ elements, causing the search space to expand to $2^{k^3} = 2^{k \times k \times k}$ possible values. As the arity of the relation increases, so does the exponent applied to $k$. The Alloy Analyzer has proved to be efficient at analyzing specifications with small scopes. The default scope seems to be 3; this scope is normally large enough to find most errors [Jac00a]. Empirical testing has been performed at MIT on specifications containing only three relational state components in this default scope; the solution space translated to about a billion states [JSS00]. Results demonstrate that specifications with the default scope of 3 can be analyzed in well under a minute [Jac00a].
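The growth described above is easy to tabulate. The following Java sketch computes the number of array elements and the number of truth assignments for a single relation of a given arity; the class and method names are hypothetical, and the sketch considers only the lower bound contributed by that one relation.

```java
import java.math.BigInteger;

// Illustrative sketch only: lower bounds for a single relation.
final class SearchSpaceSketch {

    // Number of array elements for one relation: scope^arity.
    static BigInteger variables(int scope, int arity) {
        return BigInteger.valueOf(scope).pow(arity);
    }

    // Number of truth assignments over those elements: 2^(scope^arity).
    static BigInteger assignments(int scope, int arity) {
        return BigInteger.valueOf(2).pow(variables(scope, arity).intValueExact());
    }
}
```

For a scope of 3 this gives 9 elements and 512 assignments for a binary relation, and 27 elements and 134,217,728 assignments for a ternary relation.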
3.5.4
Small Scope Hypothesis
This empirical testing with various specifications has led to the small scope hypothesis [Jac00b], which states that many errors can be detected by considering only a
small scope, such as 3. In other words, many errors, even subtle ones, in a specification can be detected by considering no more than 3 atoms of each type. This hypothesis is useful in practice when developing specifications; however, it is a purely empirical hypothesis. In general, for any scope k, there exists a specification that has no satisfying instance in k, but does have one in scope k + 1 [Jac00b].
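The dependence of satisfiability on the scope can be illustrated with a brute-force sketch. The code below is not how the Alloy Analyzer works (the Analyzer translates to a propositional formula and uses a SAT solver rather than enumerating candidates), and every name in it is hypothetical. It enumerates all candidate next relations over at most scope nodes and tests a predicate that stands in for a specification.

```java
import java.util.Arrays;
import java.util.function.Predicate;

// Illustrative sketch only: exhaustive search within a finite scope.
final class BoundedSearchSketch {

    // Tries every `next` partial function over 0..scope nodes and returns the
    // first candidate accepted by `spec`, or null if none exists in the scope.
    // next[i] == -1 means that node i has no successor.
    static int[] find(int scope, Predicate<int[]> spec) {
        for (int n = 0; n <= scope; n++) {
            int[] next = new int[n];
            Arrays.fill(next, -1);
            do {
                if (spec.test(next)) {
                    return next.clone();
                }
            } while (increment(next));
        }
        return null;
    }

    // Advances `next` to the following candidate, odometer style: each
    // position cycles through -1, 0, ..., n-1. Returns false after the last.
    private static boolean increment(int[] next) {
        for (int i = 0; i < next.length; i++) {
            if (next[i] < next.length - 1) {
                next[i]++;
                return true;
            }
            next[i] = -1;
        }
        return false;
    }

    // Example specification: the chain starting at node 0 visits at least
    // `length` distinct nodes before it ends or repeats a node.
    static Predicate<int[]> chainOfAtLeast(int length) {
        return next -> {
            boolean[] seen = new boolean[next.length];
            int count = 0;
            int current = next.length == 0 ? -1 : 0;
            while (current != -1 && !seen[current]) {
                seen[current] = true;
                count++;
                current = next[current];
            }
            return count >= length;
        };
    }
}
```

Searching for a chain of four distinct nodes returns null in scope 3 and succeeds in scope 4: the same specification has no satisfying instance in one scope but has one in the next.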
3.6
Alloy Analyzer Source Code
The Alloy Analyzer has been implemented in Java and consists of over 400 classes, not including plug-ins for the visualizations. The source code is publicly available, and while we examined it, we made no changes to the source code itself. While implementing Embee, we made use of the public methods in several Alloy classes in the alloy.api package, such as AlloyRunner and SolutionData. Appendix A contains some documentation on the Alloy Analyzer classes which we use.
Chapter 4
Embee: User Perspective
The purpose of this chapter is to discuss how the prototype Embee tool functions from the user’s perspective. We explain the phases of the Embee analysis and various capabilities of the tool through a series of examples. Detailed discussion of the implementation decisions, specific algorithms and analysis of the tool itself is presented in Chapter 5.
The purpose of Embee is to enable a user to specify an object model in Alloy, implement that specification in Java, and then dynamically check the conformance of that implementation against the original specification. The user also indicates breakpoints, i.e., points in the code where the user expects the code to conform. The conformance check is dynamic in that the runtime state of the program is captured during execution. The conformance check itself can take place immediately after the program terminates, or after each breakpoint.
4.1
Preparation
The first step in being able to check the conformance of a program is to develop a specification using Alloy. Our first example is similar to the specification in Listing 3.1 on page 32; however, we have modified the facts of the specification such that each node may only exist in exactly one list. The resulting specification is displayed in Listing 4.1. The fact NoCycle means that no node is reachable from itself and the fact NodeInOneList forces every node to be reachable in exactly one list. Appendix B contains the full listings of all input files and partial listings of output files and screen output for this example; selected portions are reproduced here as necessary.
Listing 4.1 Alloy specification of a singly-linked list using only binary relations
module List
sig Node {
  next : option Node
}
sig List {
  first : Node
}
fact NodeInOneList {
  all n : Node | one l : List | n in (l.first).*next
}
fact NoCycle {
  all n : Node | n !in n.^next
}
fun Show() {}
run Show for 4
In principle, conformance checking can be performed on any Java code purported to be an implementation of this specification. However, we do assume that each signature in the specification has a corresponding Java class. Conformance checking could be performed if a signature has not been implemented; however, any constraints placed on the atoms of that signature would cause the execution to be nonconforming. Listings 4.2 and 4.3 show excerpts of a simple implementation of the example specification. Note that attributes that do not correspond to fields in the specification, such as the nodeContents attribute in the Node class, are ignored during the conformance check. However, if an ignored attribute has a type that corresponds to a signature in the model, the total count of objects of that type will be increased, which in turn may increase the scope used during the conformance check.
Listing 4.2 Excerpt of Java code for Node class, implementing Node signature. The implementation contains an additional attribute, nodeContents.
public class Node {
    Object nodeContents;
    Node next;
    ...
}
An implementation must be executed in order to check that execution for conformance. Therefore, the user must ensure that at least one of the implemented classes has a main() method; one of these executable classes becomes the target class. If there is no target class, then the user must create one. The target class should use the code that is to be checked for conformance; we have created a target called ClientCode.java, the full listing of which is in Section B.1.2. This class has been created specifically to exercise the Node and SimpleList classes; we will demonstrate
Listing 4.3 Excerpt of Java code for SimpleList class, implementing List signature.
public class SimpleList {
    ...
    Node first;
    ...
    public void addNode(Node n) {
        Node temp = first;
        if (temp == null) {
            first = n;
        } else {
            while (temp.next != null) {
                temp = temp.next;
            }
            temp.next = n;
        }
    }
    ...
}
some of Embee’s capabilities with three separate segments of this code. Let us begin by demonstrating code execution that conforms to the specification, such as the code excerpt in Listing 4.4 on the next page. This client code creates four Node objects, adds them to a new SimpleList object, then creates four more Node objects and adds them to a second SimpleList object. We would expect this code to be in conformance with the specification, assuming that the Node and SimpleList classes have been correctly implemented. Finally, the user must create a simple configuration file, which contains the name of the target class and a list of desired breakpoints. The configuration file for our first example is displayed in Listing 4.5 on page 55. It requests breakpoints at the beginning of lines 34, 42 and 49 of the ClientCode class, as well as at the end of the addNode() method of the SimpleList class. The placement of breakpoints is not restricted to the target class; breakpoints could be placed at the beginning of any
executable line of code, or at the end of any method, regardless of the class.
Listing 4.4 Excerpt of ClientCode.java with a conforming execution
    ...
    System.out.println("Creating some grocery items...");
    Node item1 = new Node("milk");
    Node item2 = new Node("eggs");
    Node item3 = new Node("bread");
    Node item4 = new Node("apples");
    System.out.println("Creating my shopping list...");
    SimpleList myList = new SimpleList("My Shopping");
    myList.addNode(item1);
    myList.addNode(item2);
    myList.addNode(item3);
    myList.addNode(item4);
    System.out.println("Printing my shopping list...");
    myList.printList();
    System.out.println("Creating some more grocery items...");
    Node item5 = new Node("lettuce");
    Node item6 = new Node("peanut butter");
    Node item7 = new Node("carrots");
    Node item8 = new Node("bacon");
    System.out.println("Creating your shopping list...");
    SimpleList yourList = new SimpleList("Your Shopping");
    yourList.addNode(item5);
    yourList.addNode(item6);
    yourList.addNode(item7);
    yourList.addNode(item8);
34  System.out.println("Printing your shopping list...");
    yourList.printList();
    ...
Listing 4.5 Configuration file for code in Listings 4.2, 4.3 and 4.4
ClientCode
method : SimpleList : addNode : (Node)
line : ClientCode : 34
line : ClientCode : 42
line : ClientCode : 49
4.2
Embee Phases
The execution of Embee occurs in three distinct phases, as shown in Figure 4.1. As mentioned earlier, the purpose of this chapter is to discuss how Embee works from a user’s perspective, i.e., from a very high-level point of view. Chapter 5 will discuss the intricacies of each phase in much more detail.
[Figure 4.1 diagram labels: Alloy specification; Configuration file; Java class(es); Embee; Phase 1: High-Level Static Mapping; Phase 2: Dynamic Object Collection; Phase 3: Conformance Checking; High-level mapping; Summary file; state dump file(s); T or F]
Figure 4.1: High-level view of Embee execution
4.2.1
Phase 1: High-Level Static Mapping
In general, every signature specified in the object model should have a corresponding Java class; however, we do not require that the naming scheme of the Alloy model and the implementation be the same. The high-level static mapping is used to link signatures and fields in the model with the corresponding classes and attributes in the code. By default, Embee assumes that the specification signature and field names correspond exactly to the implementation class and attribute names. The user may specify alternate signature-class or field-attribute mappings in the configuration file. If no mappings are specified, Phase 1 simply generates the default static mapping and presents it to the user, as shown in Figure 4.2(a). If the default mapping is accurate, the user can continue with the Embee analysis. If not, the user can edit the map file and then continue. In our example, the List signature has been implemented as the SimpleList class. We have modified the map file as shown in Figure 4.2(b).
List = List
List$first = List.first
Node = Node
Node$next = Node.next
(a) Default static mapping
List = SimpleList
List$first = SimpleList.first
Node = Node
Node$next = Node.next
(b) Modified static mapping
Figure 4.2: Excerpt of high-level static mapping file before and after modification
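The default mapping and its modification can be sketched in a few lines of Java. This is not Embee's implementation: the class name StaticMappingSketch, the build method and the idea of passing class-name overrides as a map are assumptions made for illustration. Only the output format (Sig = Class and Sig$field = Class.field) is taken from Figure 4.2.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; not Embee's actual Phase 1 code.
final class StaticMappingSketch {

    // Builds the mapping: every signature maps to a class of the same name
    // unless `classOverrides` names a different class, and every field
    // Sig$field maps to the attribute Class.field.
    static Map<String, String> build(Map<String, List<String>> fieldsBySignature,
                                     Map<String, String> classOverrides) {
        Map<String, String> mapping = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> sig : fieldsBySignature.entrySet()) {
            String signature = sig.getKey();
            String className = classOverrides.getOrDefault(signature, signature);
            mapping.put(signature, className);
            for (String field : sig.getValue()) {
                mapping.put(signature + "$" + field, className + "." + field);
            }
        }
        return mapping;
    }
}
```

With no overrides, the signatures List (field first) and Node (field next) produce the entries of Figure 4.2(a); overriding List with SimpleList produces those of Figure 4.2(b).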
4.2.2
Phase 2: Dynamic Object Collection
Once the high-level mapping is satisfactory, execution of the process continues with the second phase. The heart of this phase is a stand-alone Java program called StateDumper, which is responsible for executing the target program, halting execution
at the desired breakpoints, iterating through the current objects at each breakpoint¹, and outputting that information into a series of dump files, a sample of which is shown in Listing 4.6. A summary file is also created, containing the names of all of the dump files, along with a count of the maximum number of objects of any one type at each breakpoint. This number will serve as the scope for the Alloy analysis. The summary file and excerpts of the dump files for this example are listed in Appendix B.
Listing 4.6 Runtime state dump file for the first method exit breakpoint
MethodExitEvent at end of method SimpleList.addNode(Node)
Current file is test1_1.dump
Current OBJECT: instance of SimpleList(id=62)
- Visible Local Variable: Node n = instance of Node(id=53)
- Visible Local Variable: Node temp = null
Dumping: instance of Node(id=53)
- java.lang.String Node.nodeContents = "milk"
Dumping: instance of SimpleList(id=62)
- java.lang.String SimpleList.listName = "My Shopping"
- Node SimpleList.first = instance of Node(id=53)
End dump.
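The scope recorded in the summary file is the maximum number of objects of any one type at a breakpoint. A minimal sketch of that computation is given below, assuming that the class name of every dumped object is already available as a list of strings; the names ScopeSketch and scopeFor are hypothetical and do not come from Embee.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; not Embee's actual summary-file code.
final class ScopeSketch {

    // Given the class name of every object dumped at a breakpoint, returns the
    // largest number of objects sharing one class (0 if nothing was dumped).
    static int scopeFor(List<String> dumpedObjectTypes) {
        Map<String, Integer> counts = new HashMap<>();
        int max = 0;
        for (String type : dumpedObjectTypes) {
            int count = counts.merge(type, 1, Integer::sum);
            max = Math.max(max, count);
        }
        return max;
    }
}
```

A dump containing one Node and one SimpleList would give a scope of 1 under this sketch, while a dump containing four Node objects and one SimpleList would give a scope of 4.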
The StateDumper program was developed using the Java Platform Debugger Architecture (JPDA), which is available from Sun Microsystems. The JPDA provides debugging support for the Java 2 platform, as well as the infrastructure for the creation of end-user debugger applications [Mic]. Our use of this infrastructure is very basic; the JPDA provides classes and methods that allow the StateDumper to execute the target in a second virtual machine, halt the execution at breakpoints, and retrieve information about objects at each breakpoint. Further discussion of the JPDA and how objects are retrieved from the target program is provided in Section 5.3.

The JPDA allows several different types of breakpoint. For example, execution of the target could be halted every time a class is prepared, every time a method is entered and/or exited, at specific locations or lines in the code, etc. Our implementation limits breakpoints to either the beginning of a specific line of code (line breakpoint) or the end of a specific method (method exit breakpoint). These desired breakpoints are listed in the configuration file.

In order to make Embee as user-friendly as possible, we deliberately avoided any requirement that the user instrument or otherwise modify the target code. All constraints are detailed in the Alloy specification, and the list of desired breakpoints is maintained in the configuration file. In the worst case, the user may have to insert an executable line of code if none exists where a line breakpoint is desired.

¹ The current implementation of Embee iterates through the objects that are available in the top-most level of the program’s stack. This distinction is discussed in more detail in Section 5.3.1.
4.2.3 Phase 3: Conformance Checking
Conformance checking is performed for each individual breakpoint. Information about the runtime state at that breakpoint is transformed into an internal representation akin to a propositional truth assignment. This representation is in the form of a Boolean array, comparable to the one used by the Alloy Analyzer as discussed in Section 3.5.2. The array is passed, along with the original object model, to the Alloy Analyzer. The Analyzer transforms the object model into a propositional formula, then uses a SAT solver to determine whether or not the truth assignment satisfies the formula. If so, then the objects at that particular breakpoint do indeed conform to the specification. In our first example, the first eight breakpoints occur at the end of the addNode() method in the SimpleList class; one breakpoint for each addition to a list. The
ninth breakpoint is at the beginning of line 34 in the ClientCode class. As can be seen in Listing 4.4 on page 54, this is the first executable line of code following the addition of four unique nodes to each of two lists. Up to this point, we expect that the execution of the implemented code should conform to the specification; as can be seen from the excerpted screen output shown in Listing 4.7, the execution does conform for the first nine breakpoints.

Listing 4.7 Excerpted screen output of first nine, conforming, breakpoints
    ...
    MethodExitEvent. Class: SimpleList. Method: addNode(Node).
    Objects contained in file test1_1.dump
    Conformance at this breakpoint :)
    ...
    LineBreakpointEvent. Line: 34. Class: ClientCode. ...
    ...Method: main(java.lang.String[]).
    Objects contained in file test1_9.dump
    Conformance at this breakpoint :)
    ...
4.3 Miscellaneous Capabilities
Up to this point, we have discussed a high-level view of how the Embee process checks the conformance of code against a specification. The remainder of this chapter presents miscellaneous features of the Embee process. These capabilities are briefly described and some are further illustrated with examples.
4.3.1 Running Embee
In general, Embee can be executed in three different modes:
batch
In batch mode, the target program is executed from start to finish, with dump files being created when necessary. When the execution has terminated, the runtime states logged in each dump file are checked for conformance, and the results are output to the user. This could be termed post-runtime analysis and is best suited for programs that always terminate.
runtime
In this mode, the thread of execution is passed between Phases 2 and 3. Once the runtime state has been dumped to a file, it is immediately checked for conformance. This is as close to true runtime analysis as is possible with our use of the JPDA, which must actually halt the target program in order to examine its state.
critical runtime
This mode is similar to the previous one; however, instead of processing all breakpoints, the analysis halts as soon as nonconformance occurs. In other words, as soon as a set of objects at some breakpoint does not conform to the specification, the entire process halts.

For the purposes of our discussion, we will use the batch mode for our examples; the choice of mode does not change how the basic process of conformance checking is handled.

Command-Line Arguments
Embee is executed from the command line. As discussed above, there are three main ways to execute Embee. These three modes differ in when conformance checking is performed in relation to the collection of object information from the target program’s execution. In practice, however, there are more than three ways to execute the program. Embee supports a variety of command-line arguments, which can be used to tailor the execution, depending on the nature of the task to be performed. For example, when executed normally, all three phases of the process are executed; however, it is possible to execute any one phase in isolation. Other command-line arguments can be used to alter the amount of information output to the user; further arguments have been included to assist in development of the code itself.
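The difference between the batch, runtime and critical runtime modes described above is only a matter of when each dump is checked. The following sketch makes that ordering explicit. It is an illustration under assumptions, not Embee's control flow: the enum Mode, the method run and the log strings are hypothetical, and the conformance check is abstracted as a predicate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class ModeSketch {

    enum Mode { BATCH, RUNTIME, CRITICAL_RUNTIME }

    // Returns a log of the order in which dumps and checks take place.
    static List<String> run(Mode mode, List<String> breakpoints, Predicate<String> conforms) {
        List<String> log = new ArrayList<>();
        List<String> pending = new ArrayList<>();
        for (String bp : breakpoints) {
            log.add("dump " + bp);
            if (mode == Mode.BATCH) {
                pending.add(bp); // defer all checks until the target has terminated
                continue;
            }
            boolean ok = conforms.test(bp);
            log.add("check " + bp + (ok ? " ok" : " FAIL"));
            if (!ok && mode == Mode.CRITICAL_RUNTIME) {
                return log; // halt the entire process at the first nonconformance
            }
        }
        for (String bp : pending) {
            log.add("check " + bp + (conforms.test(bp) ? " ok" : " FAIL"));
        }
        return log;
    }

    public static void main(String[] args) {
        List<String> bps = List.of("b1", "b2", "b3");
        Predicate<String> conforms = bp -> !bp.equals("b2");
        for (Mode mode : Mode.values()) {
            System.out.println(mode + ": " + run(mode, bps, conforms));
        }
    }
}
```

With three breakpoints of which the second does not conform, batch mode performs all dumps before any check, runtime mode interleaves them, and critical runtime mode stops after the failing check.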
4.3.2 Context of Constraints and Breakpoints
Returning to our running example, we have seen so far how Embee can be used to check the conformance of an implementation that matches the specification. However, we did include some code in the ClientCode class that deliberately violates the specification in Listing 4.1 on page 51. We return to the List example to demonstrate how Embee can be used to indicate a lack of conformance in the implemented code. Before doing so, however, it is necessary to discuss the concept of context with respect to both constraints and breakpoints.

Context of Constraints
Some constraints in a specification relate to every individual instance of a particular signature, such as the constraint on lists stating that every single list must have a first node. Other constraints relate to more than one instance, or to all instances, of a particular signature, such as the constraint that lists cannot share nodes. This
distinction between constraints is useful in determining where to place breakpoints in the target code. If a breakpoint is placed where only one instance of a signature is available for inspection, then constraints on more than one instance cannot be accurately checked. For example, in the addNode() method of the SimpleList class, only one instance of the List signature is available for inspection. Any constraint that requires the examination of more than one List instance would be vacuously true or false, depending on the constraint. In this case, the constraint that no two lists share nodes would be vacuously true because only one list would be examined at a time. If the constraint stated that two or more lists must share nodes, then the constraint would be vacuously false, again, simply because only one list could be examined. On the other hand, any breakpoint at a line, or at the end of a method, which has full access to all lists, will result in accurate conformance checking. For example, code in the main ClientCode class has full access to all List instances created in that class. It is therefore important to consider the context of a constraint when determining the placement of breakpoints to check that constraint. For instance, a breakpoint intended to check a constraint on the structure of any List could be placed anywhere in the SimpleList class, although the conformance check at that breakpoint would be accurate only for the instance of List that was being accessed in that method. This caveat leads us into a discussion on the context of breakpoints.

Context of Breakpoints
Every identifier (or variable) in a Java program has a scope, which defines where it may be used in the program. For example, instance variables (non-static fields) may be accessed anywhere in the class in which they are declared. On the
other hand, local variables are only accessible in the methods or code blocks in which they are declared [Mor00]. We can therefore state that every point in a Java program has what we define as a context, which encompasses all of the variable identifiers² that are accessible at that point in the program. Similarly, each breakpoint also has a context. The context of a breakpoint encompasses the objects that are accessible at that particular breakpoint. The exact method of accessing objects is discussed in detail in Section 5.3; for now, it suffices to say that the current implementation of Embee restricts the accessible objects to those that are available in the top level of the program’s stack. For example, the objects that are accessible at a particular breakpoint in a class include any objects referred to by that class’s instance variables, any objects referred to by local variables that exist at that particular breakpoint, and any objects referred to by arguments passed into the method. Inaccessible objects are any objects referred to by local variables in other regions of that class’s code, and objects referenced by other classes. There is no distinction between line and method exit breakpoints; the definition of their context is the same. Note that objects referred to by local variables defined in the method (but not within inner blocks of the method) are accessible at method exit breakpoints, even though the execution of that method has terminated.
4.3.3 Exploring Nonconformance
Now that we have introduced the concept of the context of constraints and breakpoints, we can return to our discussion of the first example. Listing 4.8 on the next page is another excerpt of the ClientCode class, where a new node is created and then added to two separate lists. We have deliberately created code that does not conform to the constraint that lists do not share nodes.

² We are not considering static variables in this discussion.

Listing 4.8 Excerpt of ClientCode.java causing two lists to share a node
        ...
        System.out.println("Putting a new item onto both lists...");
        Node item9 = new Node("dog food");
        myList.addNode(item9);
        yourList.addNode(item9);
    42  System.out.println("Printing both our lists...");
        myList.printList();
        yourList.printList();
        ...
The configuration file in Listing 4.5 on page 55 shows desired breakpoints at the end of the addNode() method, as well as at the beginning of lines 34, 42 and 49. As discussed earlier, there is conformance at the first nine breakpoints. The tenth and eleventh breakpoints occur when the newest node is added to each of the two lists, i.e., at the end of the addNode() method in the SimpleList class. The context at this breakpoint includes the objects that are referred to by local variables in the method, instance variables, and the arguments to the method itself. In this particular case, the only objects that exist at the breakpoint are the list object to which a new node is being added, the new node, and any other node objects accessible in the list itself. Other list objects are outside of the context of this breakpoint, and therefore the constraint that no lists share nodes will be vacuously true. In order to ensure that the sharing of a node is accurately caught by the conformance checker, we have ensured that there is another breakpoint (the twelfth) at some point in the client code where all lists are accessible. In this case, we have chosen line 42 of the ClientCode class. Because we have deliberately seeded nonconforming
Listing 4.9 Excerpted screen output of Embee program finding nonconforming code
    ...
    MethodExitEvent. Class: SimpleList. Method: addNode(Node).
    Objects contained in file test1_10.dump
    Conformance at this breakpoint :)
    ...
    MethodExitEvent. Class: SimpleList. Method: addNode(Node).
    Objects contained in file test1_11.dump
    Conformance at this breakpoint :)
    ...
    LineBreakpointEvent. Line: 42. Class: ClientCode. ...
    ...Method: main(java.lang.String[]).
    Objects contained in file test1_12.dump
    NO CONFORMANCE AT THIS BREAKPOINT!!!
code in our example, it was easy to decide where to place breakpoints to catch the discrepancies. In reality, we might simply place breakpoints at regular intervals in the top-level code. Listing 4.9 shows the screen output of the conformance checking for the tenth through twelfth breakpoints. It is possible to use the information from a dump file to manually edit an instance in the Alloy Analyzer. This allows us to visualize the objects that exist at a particular breakpoint, as well as to explore why these objects do not conform to the specification. Section B.4 discusses how to do this. For this example, we have taken the information from test1_12.dump and input it into the Edit Instance dialog box of the Analyzer. The diagram in Figure 4.3 is the resulting visualization.
Figure 4.3: Visualization showing two lists sharing a node
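Why the shared node goes unnoticed at the tenth and eleventh breakpoints but is caught at the twelfth can be illustrated with a small sketch of the no-sharing constraint evaluated over two different contexts. This is not how Embee performs the check (that is done by the Alloy Analyzer); the class name, method name and the string atoms used here are hypothetical.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class SharingSketch {

    // True if no node belongs to more than one of the lists in the given context.
    // With a single list in the context, the result is trivially (vacuously) true.
    static boolean noSharedNodes(Map<String, Set<String>> nodesByList) {
        Set<String> seen = new HashSet<>();
        for (Set<String> nodes : nodesByList.values()) {
            for (String node : nodes) {
                if (!seen.add(node)) {
                    return false; // node already belongs to a previously examined list
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Context inside addNode(): only the list being modified is visible.
        Map<String, Set<String>> methodExitContext = new LinkedHashMap<>();
        methodExitContext.put("yourList", Set.of("item5", "item9"));
        System.out.println("method exit: " + noSharedNodes(methodExitContext));

        // Context at a line in the client code: both lists are visible.
        Map<String, Set<String>> clientContext = new LinkedHashMap<>();
        clientContext.put("myList", Set.of("item1", "item9"));
        clientContext.put("yourList", Set.of("item5", "item9"));
        System.out.println("client line: " + noSharedNodes(clientContext));
    }
}
```

The first call reports conformance only because the second list is outside the context; the second call sees both lists and reports the violation.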
We have also deliberately seeded another error in the ClientCode class; we have created a cycle in one of the lists. Listing 4.10 is a final excerpt from this class, showing the creation of a cycle in the first list.

Listing 4.10 Excerpt of ClientCode.java creating a cycle in the list
        ...
        System.out.println("Creating a cycle in my list...");
        myList.addNode(item3);
    49  System.out.println("Printing my shopping list...");
        myList.printList();
        System.out.println("Done...");
        ...
The constraint that we were aiming to defy is the one that states that no list may have a cycle, i.e., that no node is reachable from itself. In this case, the constraint is applicable for any particular instance of a list object. Therefore, the conformance checking will discover nonconformance at the end of the addNode() method when it is called for the list containing the cycle. In addition, we expect that the nonconformance will be caught by line 49 of the ClientCode class, as that line occurs after the cycle is created. Listing 4.11 on the next page contains the screen output for the last two breakpoints. The nonconforming code is indeed caught by both the method exit breakpoint and the line breakpoint. Figure 4.4 on the next page is the visualization of the objects that exist at either of these breakpoints.
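The constraint that no node is reachable from itself concerns a single list, which is why it can be checked wherever that list is in context. A minimal sketch of the same property for a singly linked structure follows. It is only an illustration: the Node class below is a stand-in with a next link, not the thesis's Node class, and the check is written directly in Java rather than expressed in Alloy.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

public class CycleSketch {

    // Stand-in for a list node; only the next link matters for this check.
    static final class Node {
        final String contents;
        Node next;
        Node(String contents) { this.contents = contents; }
    }

    // True if following next links from first revisits some node, i.e. if
    // some node in the list is reachable from itself.
    static boolean hasCycle(Node first) {
        Set<Node> visited = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Node n = first; n != null; n = n.next) {
            if (!visited.add(n)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Node a = new Node("milk");
        Node b = new Node("eggs");
        Node c = new Node("bread");
        a.next = b;
        b.next = c;
        System.out.println("before: " + hasCycle(a));
        c.next = a; // link the last node back to the first
        System.out.println("after: " + hasCycle(a));
    }
}
```

Identity-based membership is used so that two distinct nodes with equal contents are never mistaken for one another.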
Listing 4.11 Excerpted screen output of Embee program finding nonconforming code
    ...
    MethodExitEvent. Class: SimpleList. Method: addNode(Node).
    Objects contained in file test1_13.dump
    NO CONFORMANCE AT THIS BREAKPOINT!!!
    ...
    LineBreakpointEvent. Line: 49. Class: ClientCode. ...
    ...Method: main(java.lang.String[]).
    Objects contained in file test1_14.dump
    NO CONFORMANCE AT THIS BREAKPOINT!!!
    ...
Figure 4.4: Visualization of list showing a cycle
4.3.4 Higher-Arity Relations
Our first example used only binary relations in the signatures, such as next : Node and first : Node. As discussed in Sections 3.2.1 and 3.4.2, it is possible to specify relations with arity higher than two. We have designed our second example specifically to make use of a ternary relation. Appendix C contains the full listings of all input files and excerpts of the output files and screen output for this example; selected portions are reproduced here as necessary. This second example focusses on a directed acyclic graph. Listing 4.12 on the next page contains an excerpt of the Alloy specification. Similar to the list specification, this one also has a Node signature; however, in this case, the next relation is not contained in the Node signature, but rather in the Graph signature. This relation is a
ternary relation, i.e., with arity 3, and is of type Graph -> Node -> Node. In other words, in every instance of Graph, there is a relation called next, which essentially contains a set of mappings from Node to Node. These mappings indicate the structure of the graphs. Other facts in the specification decree: that every node must be part of exactly one graph; that if there is a mapping between two nodes, then both nodes must be in the same graph; and that no node is reachable from itself.

Listing 4.12 Excerpt of Alloy specification for directed acyclic graph
    sig Node {} {
      one g : Graph | this in (g.first).*(g.next)
    }
    sig Graph {
      first : Node,
      next : Node -> Node
    }{
      //For every pair of nodes in a graph’s next relationship,
      //the left hand node (n1) must be in that graph
      all n1, n2 : Node | n1 in n2.~next => n1 in first.*next
    }
    fact NoCycle {
      all g : Graph | all n : Node | n !in n.^(g.next)
    }
Listings 4.13 and 4.14 on the next page show excerpts of a simple implementation of the example specification. The current implementation of Embee requires that ternary and other higher-arity relations be implemented as collections of linked lists. Reasons for requiring this particular implementation of higher-arity relations are discussed in Section 5.4.2. The Java Vector edges will contain Java LinkedList objects, each one representing one specific next mapping. For instance, consider an object Graph_0, consisting of two node objects, Node_0 and Node_1, with
an edge between the former and the latter. Then the object Graph_0 would have an attribute called edges of type Vector. In this vector, there would be one element, a LinkedList object. The first element in this LinkedList would be Node_0 and the second would be Node_1.

Listing 4.13 Excerpt of Java code for Node class, implementing Node signature. The implementation contains an additional attribute, nodeContents.
    public class Node {
        ...
        Object nodeContents;
        ...
    }
Listing 4.14 Excerpt of Java code for DAG class, implementing Graph signature. The first relation in the Graph signature has been renamed start. Likewise, the next relation is implemented as the vector edges.
    public class DAG {
        Node start;
        Vector edges;
        ...
        public void addEdge(Node start, Node finish) {
            //Implement the ternary next relation with a
            //LinkedList with two elements
            LinkedList newEdge = new LinkedList();
            newEdge.add(start);
            newEdge.add(finish);
            //Add the relation to the collection
            edges.add(newEdge);
        }
        ...
    }
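To make the correspondence between the edges vector and the ternary next relation concrete, the following sketch turns a vector of two-element linked lists into tuples of atom names. It is an illustration only: the class TernarySketch, the method tuples and the rule of numbering node atoms in order of first appearance are assumptions, not a description of how Embee assigns atom names.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Vector;

public class TernarySketch {

    // Stand-in for the Node class of Listing 4.13.
    static final class Node {
        final Object nodeContents;
        Node(Object nodeContents) { this.nodeContents = nodeContents; }
    }

    // Each two-element LinkedList in edges becomes one (graph, from, to) tuple.
    // Node atoms are numbered in order of first appearance (an assumption).
    static List<String> tuples(String graphAtom, Vector<LinkedList<Node>> edges) {
        Map<Node, String> atoms = new IdentityHashMap<>();
        int[] counter = {0};
        List<String> out = new ArrayList<>();
        for (LinkedList<Node> edge : edges) {
            String from = atoms.computeIfAbsent(edge.get(0), k -> "Node_" + counter[0]++);
            String to = atoms.computeIfAbsent(edge.get(1), k -> "Node_" + counter[0]++);
            out.add("[" + graphAtom + " " + from + " " + to + "]");
        }
        return out;
    }

    public static void main(String[] args) {
        Node a = new Node("A");
        Node b = new Node("B");
        LinkedList<Node> edge = new LinkedList<>();
        edge.add(a);
        edge.add(b);
        Vector<LinkedList<Node>> edges = new Vector<>();
        edges.add(edge);
        System.out.println(tuples("Graph_0", edges));
    }
}
```

For a graph with a single edge between two nodes, the sketch produces one tuple relating Graph_0, Node_0 and Node_1, matching the example in the text.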
The client code, an excerpt of which is shown in Listing 4.15 on the next page, exercises the Node and DAG classes by creating a graph containing several nodes. This
code then creates a cycle in the graph, creating a nonconforming execution.

Listing 4.15 Excerpt of ClientCode.java creating a cycle in the acyclic graph
    System.out.println("Creating some nodes...");
    Node A = new Node("A");
    ...
    Node D = new Node("D");
    Node E = new Node("E");
    System.out.println("Creating a new graph; adding Nodes...");
    DAG graph = new DAG(A);
    ...
    graph.addEdge(C,E);
    graph.addEdge(D,C);
    ...
    System.out.println("Creating a cycle...");
    graph.addEdge(E,D);
    ...
    graph.printGraph();
26 29
4.3.5 Mappings in the Configuration File
The configuration file is shown in Listing 4.16 on the next page and requests breakpoints at the beginning of lines 26 and 29 of the ClientCode class, i.e., before and after the cycle has been created. The file also contains a set of definitions, showing the mapping between signature and field names used in the specification and the class and attribute names used in the implementation. The inclusion of definition lines in the configuration file allows Embee to create the proper static mapping between the specification and implementation naming schemes. The user is therefore not required to manually edit the mapping file. Executing Embee with this configuration file, along with the specification file, creates a static mapping file, summary file, dump files and screen output, much of which is included in Appendix C. As expected, the execution of the ClientCode
program conforms to the specification at the first breakpoint, but not at the second.

Listing 4.16 Configuration file for code in Listings 4.13, 4.14 and 4.15. Notice the definition lines mapping Graph’s first relation to DAG’s start attribute and the next relation to the edges attribute.
    ClientCode
    define : Graph : DAG
    define : Graph$first : DAG.start
    define : Graph$next : DAG.edges
    line : ClientCode : 26
    line : ClientCode : 29
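The structure of such a configuration file can be illustrated with a small reader. This is a sketch under assumptions rather than Embee's parser: it relies only on the colon-separated line shapes visible in the listing, treats a line without a colon as the name of the target's main class (an assumption), and uses hypothetical names throughout.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ConfigSketch {

    // Groups the fields of each directive under its keyword ("define", "line",
    // and so on). A line without a colon is filed under the key "main".
    static Map<String, List<List<String>>> parse(List<String> lines) {
        Map<String, List<List<String>>> out = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            String[] parts = line.split("\\s*:\\s*");
            boolean bare = parts.length == 1;
            String keyword = bare ? "main" : parts[0];
            List<String> fields = new ArrayList<>();
            for (int i = bare ? 0 : 1; i < parts.length; i++) {
                fields.add(parts[i]);
            }
            out.computeIfAbsent(keyword, k -> new ArrayList<>()).add(fields);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> config = List.of(
                "ClientCode",
                "define : Graph : DAG",
                "define : Graph$first : DAG.start",
                "define : Graph$next : DAG.edges",
                "line : ClientCode : 26",
                "line : ClientCode : 29");
        parse(config).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

Grouping by keyword keeps the static-mapping definitions separate from the breakpoint requests, which are consumed by different phases of the process.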
4.3.6 Automatic Edit Instance Information
As discussed in Section 4.3.3, it is possible to use the information from a dump file to edit an instance in the Alloy Analyzer. Manually translating the information from a dump file into an equivalent instance in the Analyzer can be a time-consuming and error-prone activity. The dump files employ a unique identifier for each object dumped, which must be mapped to an equivalent Analyzer atom name. In addition, the dump file frequently contains much information, some of it extraneous to the instance to be explored. For these reasons, identifying the objects and relationships among them is a process that would be better performed automatically. Embee supports a series of command-line options, one of which can be used to output the information required to edit an instance in the Analyzer. The format of this information is sorted as it would appear in the Edit Instance dialog box, and the Analyzer’s naming scheme is used to make editing the instance as straightforward as possible. Listing 4.17 on the next page contains the edit instance information
automatically generated by Embee for the second breakpoint.
Listing 4.17 Automatically generated Edit Instance information for second breakpoint
     1  Graph=DAG =
     2  Graph$first=DAG.start [Graph->Node] [Graph_0 Node_3] =
     3  Graph$next=DAG.edges [Graph->Node->Node] [Graph_0 Node_0 Node_4] =
     4  Graph$next=DAG.edges [Graph->Node->Node] [Graph_0 Node_2 Node_0] =
     5  Graph$next=DAG.edges [Graph->Node->Node] [Graph_0 Node_2 Node_4] =
     6  Graph$next=DAG.edges [Graph->Node->Node] [Graph_0 Node_3 Node_2] =
     7  Graph$next=DAG.edges [Graph->Node->Node] [Graph_0 Node_3 Node_4] =
     8  Graph$next=DAG.edges [Graph->Node->Node] [Graph_0 Node_4 Node_1] =
     9  Node=Node =
    10  Node=Node =
    11  Node=Node =
    12  Node=Node =
    13  Node=Node =
Once the specification has been built by the Analyzer with the appropriate scope, we can edit an instance with this information. For example, line 1 indicates that a DAG object exists and its unique identifier is 64; it has been mapped to an Alloy atom called Graph_0. Therefore, in the Edit Instance dialog box, we indicate with a checkmark that the Graph_0 atom exists in the instance. Line 2 indicates that there is a binary relation, named first (or start in the target program), between the atom named Graph_0 and the one named Node_3. Similarly, line 3 indicates that there is a ternary relation between the atoms named Graph_0, Node_0 and Node_4. In other words, in Graph_0, there is a relation, called next, between these two nodes. Once all of the information has been inserted into the Edit Instance dialog box, the Analyzer can be used to determine whether or not this instance satisfies the specification. We have edited an instance in the Analyzer that relates to the objects that exist at the second breakpoint, i.e., from the information in test2_2.dump. The resulting
visualization is shown in Figure 4.5. Notice that the edges between nodes, or the next relations, are labelled with the name of the graph to which they belong; this makes it easier to decipher visualizations when there is more than one graph atom.
Figure 4.5: Visualization of a directed graph with a cycle; therefore, not a directed acyclic graph
4.3.7 Implementation Errors
The examples used thus far demonstrate that Embee can determine whether or not top-level code correctly uses well-implemented lower-level code. In other words, for the List and Graph examples, the implementation of the specification was correct, but the use of the relevant classes was not. In these examples, we manipulated nodes, lists and graphs to cause nonconforming executions, such as lists that shared nodes, or directed acyclic graphs that contained cycles. The purpose of this final example is to demonstrate that Embee is also sometimes
able to discover erroneously implemented code. Because Embee checks the conformance of a program at one particular point in the execution, the check is necessarily limited to whether or not the state at that point conforms to the specification. In other words, Embee cannot be used to check that an operation conforms to pre- and post-conditions. This type of check would require the ability to compare two states: one immediately preceding and one immediately following an operation. This example focusses on the specification of a binary tree, and the implementation of a binary search tree. Appendix D contains the full listings of all input files and partial listings of output files and screen output for this example; selected portions are reproduced here as necessary. Listing 4.18 on the next page contains the partial specification of the binary tree. Node instances are related to each other by the use of the left and right relations; the tree itself is specified by its root. The facts and functions are necessary to specify that this is a binary tree. For example, facts are necessary to specify that no node is accessible from more than one parent, that no node is reachable from itself and that each node belongs to exactly one tree. An additional fact specifies that the keys of the nodes must be unique.
Listing 4.18 Excerpt of Alloy specification for a binary tree
    sig Key {}
    sig Node {
      key : Key,
      left : option Node,
      right : option Node
    }
    sig Tree {
      root : Node
    }
    fact KeysUnique {
      all t : Tree | all n1, n2 : nodesInTree(t) | n1.key = n2.key => n1 = n2
    }
    fact EveryNodeInOneTree {
      all n : Node | one t : Tree | n in nodesInTree(t)
    }
    fact NoCycles {
      all n : Node | n !in descendants(n)
    }
    fact OnlyOneParent {
      all n : Node | sole (n.~left + n.~right)
    }
    fun descendants (n : Node) : set Node {
      result = n.^(left + right)
    }
    fun nodesInTree(t : Tree) : set Node {
      result = t.root + descendants(t.root)
    }
Listings 4.19 on the next page and 4.20 on page 77 contain the excerpts of the implementation of the Node and Tree signatures. Although the specification requires only that the tree be a binary tree, we have implemented it as a binary search tree. It is possible to modify the specification to include an ordering property for the keys
of the nodes; however, Embee is not currently able to check the conformance of such a specification. This limitation is discussed in more detail in Section 5.6.3.

Listing 4.19 Excerpt of BinaryTreeNode class, implementing Node signature. The implementation includes two extra attributes: data and parent. In addition, the left relation in the Node signature has been implemented as the leftChild attribute and the right relation has been implemented as rightChild.
    public class BinaryTreeNode {
        String key;
        String data;
        BinaryTreeNode parent;
        BinaryTreeNode leftChild;
        BinaryTreeNode rightChild;
        ...
    }

Exclusions in the Configuration File
The configuration file for this example is shown in Listing 4.21 on the next page and requests breakpoints at the beginning of line 12 of the ClientCode class, as well as at the end of the remove() method of the BST class. As well as static mapping definitions, the configuration file contains a list of exclusions, which specify attributes in the implementation that are to be dumped solely as comments. These attributes would normally be ignored by Embee as they have no counterparts in the specification; however, treating them as comments helps streamline the conformance checking process.
Listing 4.20 Excerpt of BST class, implementing Tree signature
         public class BST {
             BinaryTreeNode root;
             ...
             public void remove(String key) {
                 BinaryTreeNode nodeToDelete = search(key);
                 ...
                 BinaryTreeNode swapPos = nodeToDelete;
                 BinaryTreeNode remPos = swapPos.rightChild;
                 while (remPos.leftChild != null)
                     remPos = remPos.leftChild;
                 swapNodes(swapPos,remPos);
                 cleanUpAfterSwap(swapPos);
                 ...
             }
             ...
             private void swapNodes(BinaryTreeNode n1, BinaryTreeNode n2){
                 ...
    242          if(temp.parent != null)
    243              if(temp.parent.hasThisLeftChild(temp))
    244                  temp.parent.leftChild = n2;
    245              else
    246                  temp.parent.rightChild = n2;
    247          else
    248              root = n2;
                 ...
             }
         }

Listing 4.21 Configuration file for code in Listings 4.19 and 4.20
    ClientCode
    define : Node : BinaryTreeNode
    define : Node$left : BinaryTreeNode.leftChild
    define : Node$right : BinaryTreeNode.rightChild
    define : Tree : BST
    exclude : BinaryTreeNode.data
    exclude : BinaryTreeNode.parent
    line : ClientCode : 12
    method : BST : remove : (java.lang.String)
Initially, we had hoped to specify some balancing properties for binary trees. However, although the current implementation of the Alloy Analyzer supposedly supports the use of recursive functions, we were not able to specify the height of a tree or subtree³. We are, however, able to constrain a tree to being proper, i.e., every node having zero or two children, or perfect, i.e., all external nodes having the same depth, and all internal nodes having exactly two children. In addition, it is possible to use various functions to determine how many children a node has, or how many ancestors, but this still does not allow us to create reasonable balancing constraints. We therefore limited our specification to a simple binary tree.

To illustrate Embee’s capability of determining nonconformance due to erroneous implementation, we focus solely on code in the BST class that supports the removal of a node from the tree. This is a complicated operation and could conceivably contain many errors. Unfortunately, the only errors that Embee is able to identify will be those that result in a state in which the objects do not conform to the specification of a binary tree.

We have chosen two implementation errors to discuss, both focussing on line 248 of the BST class, in the swapNodes() method, which is called by the remove() method. The purpose of the swapNodes() method is to swap the position of two nodes, n1 and n2; n1 will later be removed to finalize the remove() method. A temporary node is used to facilitate the swap; this temp node’s links to other nodes are set to those of n1, then n1’s links are set to those of n2. Finally, n2’s links are set to those of the temp node. The tree’s root variable is reassigned to the current root of the tree, as appropriate. It is this last part of the algorithm into which we introduce errors.

³ Our attempts at defining recursive functions continually resulted in an ‘illegal invocation of nondeterministic function as expression’ error.

The code shown in Listing 4.20 on page 77 is the correct implementation of the swapNodes() method. If the temporary node’s parent is not null, i.e., the temporary node is not the root, then the temporary node’s parent is set to point to n2 as its left or right child, as appropriate. If the temporary node’s parent is null, then the temporary node is the root of the tree. Therefore, the tree’s root attribute must be set to n2.

Error of Omission
For our first error, we have decided to omit the changing of the tree’s root attribute by simply commenting out the else clause (lines 247 and 248). This error will only be noticeable if the root of the tree is deleted; otherwise, the program’s execution will perform as expected. We compiled the modified code, and ran Embee with the same specification and configuration file. Embee does detect the nonconformance caused by the seeded error at the second breakpoint, i.e., at the end of the remove() method when the tree’s root node is removed. We will not display the contents of the summary file, dump files, or the screen output of conformance checking; much of this output is available in Appendix D. We have, however, used the Edit Instance information produced by Embee (see Appendix D) to create visualizations of the tree before and after deletion of the root node. Figure 4.6 on the next page shows the tree before and after deletion, with correctly implemented code. Figure 4.7 on the next page shows the tree after deletion of the root, when the root = n2 statement is not executed.
Figure 4.6: Visualization of tree before and after correct deletion of the root node. (a) Before deletion; (b) After correct deletion.
Figure 4.7: Visualization of tree after deletion of the root node, with an error of omission in the code. Node 7 represents the temporary node in the swapNodes() method.

Relating back to the swapNodes() method, n1 would be Node 0 and n2 would be Node 2. For easier understanding of the visualizations, we have changed the instance to reflect a new Node 7 to represent the temporary node. This visualization is shown in Figure 4.7 and demonstrates that the tree’s root still points to the temporary node, Node 7, instead of to the new root of the tree, Node 2.

Typographic Error

Our second error is a more plausible one: a simple typographic error in the statement that sets the tree’s root attribute (line 248 in the swapNodes() method). As with the error of omission, this error will only become apparent if the tree’s root is deleted. Listing 4.22 shows the error that we have introduced; we have set the root to n1 instead of n2.

Listing 4.22 Excerpt of swapNodes() method with a typographic error in line 248

242    if(temp.parent != null)
243        if(temp.parent.hasThisLeftChild(temp))
244            temp.parent.leftChild = n2;
245        else
246            temp.parent.rightChild = n2;
247    else
248        root = n1;
We compiled the modified code, and ran Embee with the same specification and configuration file. Embee is also able to detect the nonconformance caused by this seeded error. The summary file, dump files and much of the conformance checking screen output can be found in Appendix D. Again, we have used the Edit Instance information to create a visualization of the tree at the second breakpoint, i.e., after the root node has been incorrectly deleted; this visualization is shown in Figure 4.8 on the next page.
Figure 4.8: Visualization of tree after deletion of the root node, with a typographic error in the code

Notice that there are still seven nodes, but this time, all of them are part of the original tree. The visualization demonstrates that the tree’s root still points to the first node; this was caused by execution of root = n1 instead of root = n2. The remaining six nodes are aligned in the shape of a binary tree, with the proper new root node, Node 2, at the root position; however, these nodes are no longer accessible by the tree object. In fact, if the tree were printed after this erroneous code had executed, the printTree() method would output a tree with only one node.
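The loss of accessibility can be illustrated with a small reachability count. The sketch below is a hand-built approximation of the state described above, not output produced by Embee: the class ReachabilitySketch, the method reachableFrom(), the reduced three-node subtree and the ids of the two children are all assumptions made for illustration. It also assumes that n1 holds no child links once it has been removed, which is consistent with the one-node output described above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration; none of these names come from the thesis code.
class ReachabilitySketch {
    static final class Node {
        final int id;
        Node leftChild, rightChild;

        Node(int id) {
            this.id = id;
        }
    }

    // Counts the nodes reachable from the given root by following child links.
    static int reachableFrom(Node root) {
        int count = 0;
        Deque<Node> stack = new ArrayDeque<>();
        if (root != null) {
            stack.push(root);
        }
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            count++;
            if (n.leftChild != null) {
                stack.push(n.leftChild);
            }
            if (n.rightChild != null) {
                stack.push(n.rightChild);
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Node n1 = new Node(0);          // removed node; assumed to have no children left
        Node n2 = new Node(2);          // the proper new root
        n2.leftChild = new Node(1);     // assumed child ids, for illustration only
        n2.rightChild = new Node(3);

        System.out.println(reachableFrom(n2)); // root = n2 (correct): prints 3
        System.out.println(reachableFrom(n1)); // root = n1 (typo): prints 1
    }
}
```

With root = n1, a traversal that starts at the root visits a single node, even though the rest of the structure still forms a well-shaped tree under n2.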
Chapter 5 Embee: Implementation and Analysis

The purpose of this chapter is to describe the Embee program in more detail, with a focus on implementation and analysis. Figure 4.1 on page 55 shows a high-level view of the entire process. The first half of this chapter examines each phase of this process in more detail, explaining how specific portions of each phase were implemented. This discussion will also explain how various implementation decisions affect the current capabilities and limitations of Embee. The second half of this chapter focusses on analysis of the tool, especially with respect to its capabilities and limitations. As well, we discuss the asymptotic complexity of the Embee program itself and conclude with the results of performance tests.
5.1 Implementation
The purpose of developing the Embee prototype was to create a program that could be linked with the Alloy Analyzer; we therefore implemented the prototype in Java. The development and implementation took approximately nine months, although the code continues to be modified slightly to improve its performance. The end result is a program that consists of 59 classes in 14 packages, for a total of approximately 4 800 source lines of code. A hierarchy of these packages is displayed in Figure 5.1.