A Visual Interactive Debugger Based on Symbolic Execution∗
Reiner Hähnle, Chalmers Univ. of Technology, Gothenburg, Sweden
Marcus Baum, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
ABSTRACT
We present the concepts, usage, and prototype implementation of a new kind of visual debugging tool based on symbolic execution of Java source code, called the visual symbolic state debugger. It allows debugging to start at any code location without the need to write a fixture, and it visualizes all possible symbolic execution paths and all symbolic states up to a finite depth. A code-based test generation facility is integrated.
Categories and Subject Descriptors D.2.4 [Software Engineering]: Software/Program Verification—Formal methods; D.2.5 [Software Engineering]: Testing and Debugging—Debugging aids, symbolic execution, testing tools; D.2.6 [Software Engineering]: Programming Environments—Graphical environments
General Terms algorithms, design, experimentation, verification
Keywords symbolic execution, state visualisation, reverse debugging, test generation
1. INTRODUCTION
Debugging is a central and unavoidable activity during software development, consuming a considerable fraction of the overall effort. Nevertheless, debugging has received little interest from the academic software research community. One of the few systematic treatments of debugging is Zeller’s textbook [11], whose terminology we adopt in this paper.
∗This work has been partially supported by the EU project FP7-ICT-2007-3 HATS: Highly Adaptable and Trustworthy Software using Formal Methods.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ASE’10, September 20–24, 2010, Antwerp, Belgium. Copyright 2010 ACM 978-1-4503-0116-9/10/09 ...$10.00.
Richard Bubel, Marcel Rothe, Chalmers Univ. of Technology, Gothenburg, Sweden
All software developers use debugging tools like interactive debuggers (e.g., the Eclipse debugger), which are used to watch, observe, and inspect computation states during a program run.¹ The goal is to narrow down the section of a program where a defect is located as much as possible. In this paper we argue that symbolic execution of programs, an idea from the 1970s [5], can overcome some of the most serious limitations of traditional interactive debuggers: (i) interactive debuggers must be started in a concrete program state that has to be specified by the user; (ii) the number of program states between the start of the debugger and the program section where the defect is suspected can be very high; (iii) in consequence, the data structures that need to be inspected when the debugger stops might be very large; (iv) it is generally necessary to trace back the occurrence of a failure to the defective statement(s) that caused it, but interactive debuggers do not record trace histories. This makes subsequent debugging sessions with modified breakpoints necessary.
Omniscient [7] or reverse debuggers address issue (iv) as they record the state history, but they are not yet widely used. One obstacle is the number and size of states that have to be visited. By optimizing trace generation and storage techniques it is possible to arrive at an acceptable execution overhead ([8], gdb≥7.0), but the difficulty of navigating and inspecting a large number of states and huge data structures remains.
All four issues pointed out above can be solved by generalizing program execution from concrete to symbolic execution states. In this paper we present a visual symbolic state debugger (vsdb) for Java Card based on symbolic execution. The vsdb has the usual functionality of interactive debuggers; in addition, there are a number of new features: it is possible to start debugging at any statement of interest, because no concrete state needs to be built up.
The initial state can be completely generic or only partially specified. The vsdb produces a symbolic execution tree that faithfully represents all possible concrete execution paths, up to a finite depth, that are feasible w.r.t. a given precondition. The tree is visualized and one can navigate (by point-and-click) to any node to inspect its contents. This implements omniscient debugging [7]. The heap in a symbolic state can be visualized as a symbolic object diagram, an extension of UML object diagrams where program variables and fields (i.e., memory locations) have symbolic primitive values and symbolic constraints on the values. It is possible to control execution with a new concept called semantic watchpoints, which represent complex semantic conditions. At each symbolic execution tree node one can automatically create a JUnit test case which guarantees that this node is reached [3]. This is important for generating regression tests after a defect has been fixed.
¹ Interactive debuggers are known in the literature as symbolic debuggers. We use the former term to avoid confusion with tools using symbolic representations of program states, which will figure in our presentation as well.
2. SYMBOLIC EXECUTION
We illustrate the principle of symbolic execution with a small example:

A.1   public class A {
        public int value;
A.3     public static int absDiffSwap(A a, A b) {
          int abs;
A.5       if (b.value > a.value) {
            a.value = a.value - b.value;
A.7         b.value = a.value + b.value;
            a.value = b.value - a.value;
A.9       }
          abs = a.value - b.value;
A.11      return abs;
      }}

This Java program declares a class A with an integer field value and a static method absDiffSwap() that computes the absolute difference of the arguments’ value fields and swaps their contents whenever the second argument is greater than the first.
In symbolic execution, program locations are initialized with a symbolic value s of the form

  s ::= lit | c | s ◦ s | −s

where lit ∈ Z ∪ Boolean is a literal (concrete value), c is a constant symbol or variable, and ◦ ∈ {+, −, ∗, . . .}. A symbolic state maps each program location (program variable, array/object access) to a symbolic value. We represent a symbolic state U by location-value pairs l0 := v0 | . . . | ln := vn. Finally, a symbolic execution configuration is a tuple (A; U; ⟨si; . . . ; sn⟩) where A is a set of assumptions (formulas encoding the current path condition plus preconditions) that restrict the symbolic values, U represents the symbolic state, and si; . . . ; sn is the sequence of statements remaining to be executed (a program counter). For brevity we refer to the program counter by the line number of si.

I = ({p}; a.value := a0 | b.value := b0; ⟨A.4⟩), where p abbreviates {a, b} ≠ null
[Tree: the root configuration I splits on the guard into a branch with condition b0 > a0, which continues through cfgtAsgn1 and cfgtAbs, and a branch with the negated condition b0 ≤ a0.]
cfgtAsgn1 : ({p, b0 > a0}; a.value := a0 | b.value := b0; ⟨A.6⟩)
cfgtAbs : ({p, b0 > a0}; a.value := b0 | b.value := a0; ⟨A.10⟩)
cfgtFin : see text
Figure 1: Symbolic execution of absDiffSwap

Let us execute method absDiffSwap symbolically with initial configuration I, see Fig. 1. Executing the declaration moves the program counter to A.5. To continue we evaluate the guard expression b.value>a.value, which requires access to a.value and b.value. This raises an exception when a or b is null. Symbolic execution looks at all four possibilities on different branches. On three of these either a = null or b = null occurs, which is infeasible with p in I and can be ignored. Showing such infeasibility requires integrating an automated theorem prover with symbolic execution. In configuration I the value b0 > a0 of the guard is symbolic and not reducible to true/false, so symbolic execution splits into two branches. Splitting adds the symbolic value of the guard expression to the assumptions of one new configuration and its negation to those of the other; the added formula is called the branch condition. The path condition is the conjunction of all branch conditions along a path and identifies a path uniquely. On the left branch symbolic execution continues in configuration cfgtAsgn1 with the branch condition added. After symbolically executing all assignments of the then-branch and the one at A.10, execution terminates in the final configuration cfgtFin = ({p, b0 > a0}; a.value := b0 | b.value := a0 | abs := b0 − a0; ⟨⟩). The else branch is left for the reader.
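The state updates on the then-branch can be replayed mechanically. The following is a minimal sketch, not the engine used by the vsdb: it assumes symbolic values are restricted to integer-linear terms over the constant symbols a0 and b0, and the names LinearSym and SymbolicAbsDiffSwap are hypothetical helpers introduced only for illustration. The path condition b0 > a0 is not tracked here; only the symbolic state is.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: a symbolic value limited to integer-linear terms such as b0 - a0. */
final class LinearSym {
    // Maps a constant symbol (e.g. "a0") to its coefficient; zero coefficients are dropped.
    private final TreeMap<String, Integer> coeffs = new TreeMap<>();

    static LinearSym symbol(String name) {
        LinearSym s = new LinearSym();
        s.coeffs.put(name, 1);
        return s;
    }

    LinearSym plus(LinearSym other) { return combine(other, 1); }

    LinearSym minus(LinearSym other) { return combine(other, -1); }

    private LinearSym combine(LinearSym other, int sign) {
        LinearSym r = new LinearSym();
        r.coeffs.putAll(coeffs);
        for (Map.Entry<String, Integer> e : other.coeffs.entrySet()) {
            int c = r.coeffs.getOrDefault(e.getKey(), 0) + sign * e.getValue();
            if (c == 0) r.coeffs.remove(e.getKey()); else r.coeffs.put(e.getKey(), c);
        }
        return r;
    }

    /** Prints the coefficient map, e.g. {a0=-1, b0=1} for b0 - a0. */
    @Override public String toString() { return coeffs.toString(); }
}

final class SymbolicAbsDiffSwap {
    /** Symbolic state after lines A.6 to A.10 on the path with condition b0 > a0. */
    static Map<String, LinearSym> thenBranch() {
        Map<String, LinearSym> state = new TreeMap<>();
        state.put("a.value", LinearSym.symbol("a0"));   // initial symbolic state of I
        state.put("b.value", LinearSym.symbol("b0"));
        state.put("a.value", state.get("a.value").minus(state.get("b.value"))); // A.6
        state.put("b.value", state.get("a.value").plus(state.get("b.value")));  // A.7
        state.put("a.value", state.get("b.value").minus(state.get("a.value"))); // A.8
        state.put("abs", state.get("a.value").minus(state.get("b.value")));     // A.10
        return state;
    }
}
```

Under these assumptions the resulting map assigns b0 to a.value, a0 to b.value, and b0 − a0 to abs, which agrees with the symbolic state of cfgtFin.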
3. SYMBOLIC STATE DEBUGGING
3.1 Program Control
A main feature of interactive debuggers is program control, which allows the user to control execution while identifying infected states. The basic mechanism for transferring control to the user is the breakpoint, which can be attached to lines or statements. When execution encounters a breakpoint it goes into break mode, where the user can continue program execution step-wise and inspect the current state. The vsdb supports the standard control commands. However, under Run/Resume, symbolic execution of loops and recursive programs does not terminate in general. Therefore, symbolic execution stops after a fixed (adjustable) number of statements have been executed, unless it encounters a breakpoint first. For stepwise program execution the commands Step Into/Step Over behave as expected. Symbolic execution stops as soon as its program counter points to a statement to which a statement breakpoint is attached. Going beyond the capabilities of interactive debuggers, our realization of symbolic execution permits adding general preconditions (see Fig. 1) specified in JML [6] to the initial symbolic state. Thus execution paths that contradict the precondition are automatically excluded.
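As an illustration of such a precondition, the sketch below attaches a JML requires clause to absDiffSwap from Sect. 2. It only shows where the annotation goes; how the vsdb reads it is not shown, and the class PreconditionDemo is a hypothetical driver added for this example. A plain Java compiler treats the JML annotation as an ordinary comment.

```java
final class A {
    public int value;

    /*@ requires a != null && b != null;
      @*/
    public static int absDiffSwap(A a, A b) {
        int abs;
        if (b.value > a.value) {
            // Swap the two fields without a temporary variable.
            a.value = a.value - b.value;
            b.value = a.value + b.value;
            a.value = b.value - a.value;
        }
        abs = a.value - b.value;
        return abs;
    }
}

/** Hypothetical driver: calls absDiffSwap with arguments that satisfy the precondition. */
final class PreconditionDemo {
    static int run(int first, int second) {
        A a = new A();
        a.value = first;
        A b = new A();
        b.value = second;
        return A.absDiffSwap(a, b);
    }
}
```

With this precondition in the initial symbolic configuration, the branches on which a or b is null contradict the assumptions and need not be explored.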
3.2 Semantic Watchpoints
Many debuggers support conditional breakpoints. These are statement breakpoints with a boolean-typed Java expression. Conditional breakpoints interrupt the control flow if execution reaches the statement they are attached to and their condition is true in the current state. Another useful feature is the watch expression: a boolean expression attached to a field (e.g., a==0). Whenever the value of a field with an attached watch expression changes, the watch expression is evaluated immediately afterwards and, if it is true, program execution is put on hold. With the introduction of semantic watchpoints we go a step further: a semantic watchpoint is a boolean (first-order) expression that is evaluated each time the symbolic state changes. Again, program execution halts when the boolean expression evaluates to true in the current symbolic state. Optionally, semantic watchpoints can be attached to a statement. In this case the semantic watchpoint expression is evaluated in the same scope as the statement and can access parameters and local variables.
Semantic watchpoints realize the functionality of both conditional breakpoints and watch expressions; in addition, they offer functionality for debugging and program comprehension that is not (easily) achievable with interactive debuggers. For a start, semantic watchpoints are not restricted to Java expressions, but may contain arbitrary first-order logic formulas including universal and existential quantifiers. This makes it possible to express complex semantic conditions such as “all elements of an array have value zero”. The caveat is that one possibly leaves the decidable realm and needs SMT solvers or theorem provers. Semantic watchpoints need not be attached to fixed program points and can be used to detect that a certain system configuration has been reached (system state discovery). E.g., if the watchpoint contains the negation of a safety invariant, one can detect when an unsafe system state is reached. As semantic watchpoints do not depend on a fixed code point, they are much more reusable than conditional breakpoints.
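The control-flow idea can be mimicked on concrete states. The following is a rough sketch under strong assumptions: the state is a plain Java array rather than a symbolic state, the watchpoint is an ordinary predicate rather than a first-order formula handed to a prover, and the names WatchpointDemo, runUntil, and allZero are hypothetical. It shows a condition that is checked after every state change instead of at a fixed statement.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

final class WatchpointDemo {
    /**
     * Applies the steps in order and returns the index of the first step after which
     * the watchpoint holds, or -1 if it never holds.
     */
    static <S> int runUntil(S state, List<Consumer<S>> steps, Predicate<S> watchpoint) {
        for (int i = 0; i < steps.size(); i++) {
            steps.get(i).accept(state);            // one state change
            if (watchpoint.test(state)) return i;  // checked after every change
        }
        return -1;
    }

    /** Concrete analogue of "all elements of an array have value zero". */
    static boolean allZero(int[] arr) {
        return Arrays.stream(arr).allMatch(x -> x == 0);
    }

    /** Example run: execution would halt after the fourth step (index 3). */
    static int demo() {
        int[] arr = {5, 0, 7};
        List<Consumer<int[]>> steps = List.of(
            s -> s[0] = 0,
            s -> s[1] = 1,
            s -> s[1] = 0,
            s -> s[2] = 0);
        return runUntil(arr, steps, WatchpointDemo::allZero);
    }
}
```

Because the predicate is independent of any particular statement, the same watchpoint can be reused unchanged for a different sequence of steps.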
4. VISUALIZATION
4.1 Visualization of Symbolic Execution
A valuable feature of the vsdb that goes beyond the capabilities of other debuggers is visualization (see Fig. 2). Visualization makes implicit knowledge or assumptions about the source code explicit and so can help to find erroneous code or specifications quickly. In the vsdb we render symbolic execution trees in a separate Eclipse view. To facilitate orientation in symbolic execution trees, the types of nodes are distinguished by color and shape:
Statement/Expression Nodes (light blue/magenta, rectangle) represent symbolic states whose program counter points to a Java statement/expression.
Method Invocation/Method Return Nodes (white, rectangle) represent symbolic states that refer to a method invocation/return. They are labeled with the method name and the symbolic values of its arguments or with the symbolic return value. Details are displayed as a tool tip.
Termination Nodes (green/red, circle) indicate termination of symbolic execution on a path. A green circle indicates normal termination, a red circle signals abrupt termination (uncaught exceptions). Selecting a red circle highlights the Java expression in the Eclipse editor that threw the exception. The exception type is shown as a tool tip.
Given the large amount of data potentially created during symbolic execution, it is necessary to provide suitable navigation mechanisms: double-clicking any statement/expression node highlights the corresponding statement/expression in the Eclipse source code editor. It is possible to hide/expand any node of the symbolic execution tree. One can also adjust the horizontal and vertical spacing of symbolic execution tree nodes. A more advanced navigation aid is the possibility to set an arbitrary node of the symbolic execution tree as the new root node. Further, it is possible to isolate individual symbolic execution paths and suppress the display of all others. This makes it easy to focus on the statements leading up to a given tree node.
Isolating paths is also a first step towards dynamic slicing, because only the statements that were actually executed before a given node are shown.
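Path isolation amounts to following parent links from the selected node back to the root. The sketch below is one possible representation, assumed for illustration only: the class SetNode, its Kind enumeration, and TreeDemo are hypothetical names, and the actual data structures of the vsdb may differ.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical node of a symbolic execution tree. */
final class SetNode {
    enum Kind { STATEMENT, EXPRESSION, METHOD_INVOCATION, METHOD_RETURN,
                NORMAL_TERMINATION, ABRUPT_TERMINATION }

    final Kind kind;
    final String label;
    final SetNode parent;
    final List<SetNode> children = new ArrayList<>();

    SetNode(Kind kind, String label, SetNode parent) {
        this.kind = kind;
        this.label = label;
        this.parent = parent;
        if (parent != null) parent.children.add(this);
    }

    /** Labels on the path from the root to this node; all other branches are left out. */
    List<String> isolatedPath() {
        Deque<String> path = new ArrayDeque<>();
        for (SetNode n = this; n != null; n = n.parent) path.addFirst(n.label);
        return new ArrayList<>(path);
    }
}

final class TreeDemo {
    /** Builds a small tree for absDiffSwap and isolates the path into the then-branch. */
    static List<String> thenPath() {
        SetNode root = new SetNode(SetNode.Kind.STATEMENT, "int abs;", null);
        SetNode guard = new SetNode(SetNode.Kind.EXPRESSION, "b.value > a.value", root);
        SetNode thenAssign =
            new SetNode(SetNode.Kind.STATEMENT, "a.value = a.value - b.value;", guard);
        new SetNode(SetNode.Kind.STATEMENT, "abs = a.value - b.value;", guard); // else-branch
        return thenAssign.isolatedPath();
    }
}
```

The isolated path contains only nodes that were executed before the selected node, which is why it can serve as a starting point for dynamic slicing.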
4.2 Visualising Symbolic Object Diagrams
Visualisation of the symbolic execution tree helps to obtain an overview of all possible execution paths and when they are taken. To really comprehend a program one also has to understand its data structures, i.e., the possible initial, intermediate and final states. Of particular interest is the heap structure of programs manipulating linked data structures. Textual representations are hard to read even for simple programs, while a graphical representation of the heap can directly show whether a linked data structure is as intended. We use a graphical representation called symbolic object diagram, see Fig. 3.a. Symbolic object diagrams follow the notation of UML object diagrams, but are generalized for symbolic heaps. Their two main components are symbolic objects and symbolic associations.
Symbolic objects are rendered as a rectangle with two compartments. The upper compartment contains the unique name and class type of the instance. The lower compartment lists attributes of primitive type and their symbolic values. Attributes of reference type are visualised as symbolic associations. An attribute a of object instance o may have a constraint and a value term. A constraint is a quantifier-free formula restricting the possible values of o.a. A value term is a symbolic representation of the value of o.a. In any feasible symbolic execution configuration the value term satisfies the attribute’s constraint. Attributes without value terms or constraints are suppressed in the compartment view to provide an uncluttered view of the symbolic heap.
Symbolic associations are directed binary associations. Let an attribute a of symbolic object o have object u as its value term. This is visualized with a binary association from o to u labeled with a. Attribute labels are white rectangles placed on top of association arrows, see Fig. 3. Due to the implicit heap representation, the possible symbolic object instances and associations are not uniquely determined. Therefore, we generate and visualize all possible symbolic object diagrams for a given symbolic configuration, see Fig. 3. This feature is used in Sect. 5 to detect undesired heap configurations.
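One reason why several diagrams can exist for the same configuration is aliasing: two reference variables may or may not denote the same object. The sketch below enumerates all aliasing possibilities for a set of reference names. It is an illustration only and makes several assumptions: all references are non-null, no constraints rule out any combination, and the class AliasPartitions is a hypothetical name. How the vsdb actually generates its diagrams is not described here.

```java
import java.util.ArrayList;
import java.util.List;

final class AliasPartitions {
    /** Returns every way to group the references into sets that point to the same object. */
    static List<List<List<String>>> enumerate(List<String> refs) {
        List<List<List<String>>> result = new ArrayList<>();
        extend(refs, 0, new ArrayList<>(), result);
        return result;
    }

    private static void extend(List<String> refs, int i,
                               List<List<String>> current,
                               List<List<List<String>>> out) {
        if (i == refs.size()) {
            // Store a deep copy, since 'current' is modified while backtracking.
            List<List<String>> copy = new ArrayList<>();
            for (List<String> block : current) copy.add(new ArrayList<>(block));
            out.add(copy);
            return;
        }
        String ref = refs.get(i);
        // Case 1: ref is an alias of an object that already exists.
        for (int k = 0; k < current.size(); k++) {
            List<String> block = current.get(k);
            block.add(ref);
            extend(refs, i + 1, current, out);
            block.remove(block.size() - 1);
        }
        // Case 2: ref points to a fresh object.
        List<String> fresh = new ArrayList<>();
        fresh.add(ref);
        current.add(fresh);
        extend(refs, i + 1, current, out);
        current.remove(current.size() - 1);
    }
}
```

For the two parameters a and b of absDiffSwap this yields two candidate heap shapes, one where a and b are the same object and one where they are distinct. In a real symbolic configuration the assumptions would rule out some of the candidates.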
5. USAGE
We implemented a prototype of the vsdb² [1, 9] as an Eclipse plugin (Fig. 2) to prove the feasibility of the concept. Our implementation is based on the symbolic execution engine of the KeY verification tool [2] and has Java Card as its target language. In this section we focus on features that go beyond interactive debuggers or are unique to the vsdb.
Overview of all execution paths: let us inspect method A.absDiffSwap() of Sect. 2. After starting the vsdb and “running” symbolic execution for a number of steps we obtain the tree in Fig. 2. As explained in Sect. 4.1, rectangular nodes represent symbolic execution of statements/expressions and circular nodes show where symbolic execution terminated.
Excluding unintended paths: a close look at all abruptly terminated branches shows that in their path conditions at least one of the method parameters a, b is null. Whether the exception is considered to be an error or expected behavior is up to the program developer. Let us assume that the observed behavior is the expected one. To avoid cluttering the analysis with expected failure cases, we add a!=null and b!=null as a precondition to the initial symbolic configuration. When the vsdb is run again, the explored tree has only the two leftmost branches in Fig. 2.
² Available at www.key-project.org/download/ase2010/
(a) Expected heap structure
(b) Alternative possible heap structures
Figure 2: All execution paths of absDiffSwap()
Generating tests: we alter absDiffSwap() slightly by weakening the conditional’s guard to “b.value>=a.value”. This seems harmless and might have been implemented that way from the start. To generate tests [3] with meaningful oracles, the functional property to be tested is specified in JML as a postcondition; here we can use

  b.value >= a.value ==> a.value==\old(b.value) && b.value==\old(a.value).

JUnit tests are automatically created and can be run from Eclipse. In the example one of the tests fails. The vsdb helps to find the bug by highlighting the symbolic execution tree path leading to the failure. A look at the symbolic state reveals that the failure occurs when a==b, a typical aliasing bug.
Symbolic state visualisation: consider a simple list class:

public class L {
  public /*@ nullable @*/ L next;
  public int content;
  /*@requires l!=null && sz>=0 && a.length>=sz;@*/
  public static void copy(L l, int[] a, int sz){
    L tail = l;
    for (int i=0; tail != null && i