In our case, each line of code is converted into dependencies, mainly composed of indexed variables. .... locations(nId) returns a list of location nodes of node nId if Id is a method identity. copy(Id) ..... [9] Raymond Reiter. A theory of diagnosis ...
Exploiting Alias Information to Fault Localization for Java Programs 1
Rong Chen1 2 and Franz Wotawa1 Technische Universit¨ at Graz, Institute for Software Technology, 8010 Graz, Inffeldgasse 16b/2, Austria {chen, wotawa}@ist.tugraz.at 2 Institute of Software Research, Zhongshan University, Xingangxilu 135, 570215 Guangzhou, China Abstract
In software system it is still a challenging task to automatically extract program model by capturing properties to release the full potential of model-based software debugging. This paper introduces a new model which can be used by a model-based diagnosis engine to localize faults of Java programs. The proposed model is a qualitative model where values of variables are ignored, and it can reason at a higher level of abstraction by using dependency information and aliasing information. To show how it works, we discuss the compilation of Java programs to a constrained value-flow graph, and the mapping from the graph to its logical representation, which can be used by a model-based diagnosis engine. Indeed our experimental results suggest that our approach can achieve better diagnosis results.
1 Introduction Model-based diagnosis [9, 3] is a process for locating faulty components in a technical system solely on the basis of its structural and behavioral model, and it has been successfully applied to compute possible fault locations of software systems [16, 7]. The basic idea behind model-based debugging is (1) to automatically build a logical model that describes the structure and the behavior of a program, and (2) to compute the diagnosis candidates by asking the model-based diagnosis engine why the program does not behave as expected. Each diagnosis candidate corresponds to a certain syntactic entity (e.g., a statement or expression) in the original source code. Diagnosis candidates can be further discriminated by using additional measurements, e.g., values of variables that are know at a specific position in the program’s source code, or assertions. In the last years our colleagues have developed two categories of models for debugging Java programs: (1) The functional dependency model (FDM) [7], and (2) the value-based model (VBM) [8]. Since the VBM can eliminate wrong diagnoses by using additional run-time information, e.g., the values of variables, the VBM achieves better results than the FDM [13] in most cases. However, whereas the VBM heavily relies on low-level information, the FDM can reason at a higher level of abstraction. Moreover, the dependency-based representation is expected to scale up well to large-sized programs, as is shown in [14, 2]. However, it is still a challenging task to automatically extract program model by capturing properties to release the full potential of model-based software debugging. In this paper, we focus on extending our previous research on model-based program debugging with the dependencybased representation. We illustrate our approach using a small Java program that is shown in Figure 1. The demo method comprises instance creations, assignment and a conditional statements. If we know some information about the expected value of a.value, b.value, c.value, and d.value, we can ask our JADE debugging tool to debug this program. For example, we may know that the values of b.value, c.value, d.value are correct after line 7 but a.value is wrong. If we compute the diagnoses for this test case with the JADE debugger using the FDM, we get 6 diagnosis candidates. Among these 6 diagnoses some of them are unreasonable like the conditional (line 1
Authors are listed in alphabetical order. The work presented in this paper was funded by the Austrian Science Fund (FWF) P15265-N04, and partially supported by the National Natural Science Foundation of China (NSFC) Project 60203015 and the Guangdong Natural Science Foundation Project 011162.
5) and assignments (line 2 and line 6), and some are imprecise like the assignment in line 1. The reason is that the functional dependency model that was introduced in [7] cannot capture the behavior of a statement block precisely. For example, the FDM cannot find that statements in line 1, 3, and 5 show a conditional alias relationship between a and c. Instead our approach can achieve this by using enriched dependencies with constraints produced by a points-to analysis which thus lead to improved results. The proposed approach can be seen as a standard qualitative reasoning approach. In contrast to the VBM which requires the knowledge of the variables’ values at specific lines in the code, the dependency-based models make only use of qualitative information like a value is correct or two variables are aliased. Hence, debugging using abstract models are good examples for the use of qualitative knowledge to solve real-world problems efficiently and as precise as required. . class ValueHolder { . int value ; ValueHolder(int v) { 0. value = v; } ... . static void demo(boolean choice) . { 1. ValueHolder a = new ValueHolder(1); /∗l1∗/ 2. ValueHolder b = new ValueHolder(2); /∗l2∗/ 3. ValueHolder c = a;
4. 5. 5.1 . 5.2 6. 7.
ValueHolder d = b; if ( choice) a = b; else b = new ValueHolder(3); /∗l3∗/ b.value = 4; c .value = 3; }
}
Figure 1: A Small Java Program
2 The Consistency-based Software Debugging The development and improvement of consistency-based diagnosis techniques are built on the strict logical foundations provided by Reiter (in [9]) and others (see [5]). Since Model-Based Software Debugging follows this framework, we rephrase the standard methodology in the setting of software debugging. A consistency-based diagnosis system is a tuple (SD, STMNTS , SPEC ), where SD is a system description modeling the program’s structure and behavior, STMNTS denotes a set of statements (the artifacts that are assumed to be either correct or faulty), and SPEC is a set of test cases. A diagnosis problem arises when the prediction made by the diagnostic system contradicts a test case. It is clear that correctly working statements contribute to the prediction. If we use ¬AB(S) to denote the assumption about a correctly working statement S, we can figure out the contradiction is supported by one or more sets of ¬AB(S)-like assumptions. We call these sets of assumptions conflicts. Formally, a set CO ⊆ STMNTS is a conflict if and only if SD ∪ SPEC ∪ {¬AB(S)|S ∈ CO} ` ⊥. Therefore, to avoid the contradiction, we cannot assume that all the statements in CO work correctly at the same time. For example, suppose a conflict CO compromising n statements S1 , S2 , .., Sn−1 , Sn , i.e., ¬AB(S1 ) ∧ ¬AB(S2 ) ∧ ... ∧ ¬AB(Sn−1 ) ∧ ¬AB(Sn ), we should assume that at least one of the assumptions should be false; that is, there must be a statement S i (1 ≤ i ≤ n) such that AB(Si ) is true. Thus for resolving the contradiction, a set of diagnostic candidates is generated by identifying sets of statements covering all conflicts. Formally, a diagnosis ∆ is a minimal subset of all assumptions such that SD ∪ STMNTS ∪ {¬AB(S)|S ∈ STMNTS − ∆} ∪ {AB(S)|S ∈ ∆} is consistent. This concept means that some subset of statements must be assumed to work incorrectly in order to avoid the contradiction (see [9] for the algorithm of the computation of diagnoses).
3 Modeling Programs for Debugging In this section, we first describe how to model Java programs by means of enriched dependencies. Java programs consist of class declarations with defined method and they are converted by successively converting all methods of all classes. And the functional dependency model is built on the basis of program variables. After that, we further show the conversion of these dependencies into a logical representation, which can be directly used by a standard model-based diagnosis engine. As a result we obtain diagnoses that correspond to syntactical parts of the program’s source code, i.e., statements and expressions. The functional dependency model is built on the basis of program variables. Since there may be many assignments to the same variable, variables may be defined (i.e. have different values) at different program points; for example, a has different values on statement 1 and 5.1 in the demo method ( Figure 1). To distinguish variables’ definition point, we assign a unique index, as in the FDM, to all occasions where a variable is defined in a method. From the technical perspective, we keep the history of indices for variables occurring in a method m that are being modeled. And whenever we reach a new definition point for a program variable, we can increase its current index and then get a fresh indexed variable for this definition. In this way, there is at most one assignment to every variable. And dependencies are computed on the basis of indexed variable. In our case, each line of code is converted into dependencies, mainly composed of indexed variables. Before we introducing our algorithm formally, we first illustrate the underlying ideas on simple examples. 3.1 Computing Enriched Dependency for Simple Assignment Statements The main idea behind the algorithm is the conversion of an input program into an enriched points-to graph. We assume that each variable regardless whether it is class, instance, or local has a corresponding node in the induced points-to graph. It is convenient to assume that each node n has an edge pointing to its location node n.lc because we always collect strong knowledge at the low-level location nodes. For example, a variable x induces a structure outlined in Figure 2 (a). A variable may refer to an object. The content of the object may point to other memory locations which can be accessed by field selector. For the sake of uniformity, we also regard objects as memory locations and the edges directing from nodes to their location nodes are uniformly called access edges (e.g., solid edges in Figure 2 (a)). We use also flow edges to depict basic dependencies that are produced by assignment statements. For example, consider a simple variable assignment, x = y. Firstly, the variable x and y induces a structure that is outlined in Figure 2 (b). Secondly, we create a dashed edge (i.e. flow edge) because the operand = implies that a value flows from y to x. This is depicted in Figure 2 (c). Finally, the assignment statement induces the unification of contents of x and y which is shown in Figure 2 (d). For the sake of clarity, each time a variable or a constant is used in an assignment statement, we create a source node (labeled by the variable or the constant, e.g., node x in Figure 2), a corresponding location node (represent the memory location, e.g., dashed circles in Figure 2). Some location nodes are called alias nodes since they represent the content that is referenced via different variables (e.g., node n in Figure 2 (d)). Because of the fact that we collect the knowledge from alias and location nodes with multiple ingoing flow edges we call them summary nodes. For example, the assignment x ≡ y is represented at the alias node n in Figure 2 (d), which means either “x and y refer to the same object” or simply “x equals to y”. Next we consider the more complicated case of how to convert statement 1 of method demo that is depicted in Figure 1 (i.e., ValueHolder a = new ValueHolder(1)) which is an assignment with a constructor on the right side. In this example we focus on the handling of the constructor.
x
x.lc
x.lc.lc
y
x
y.lc
x.lc
y.lc.lc
x.lc.lc
(a)
x
x
y
x.lc
x.lc
y.lc
x.lc.lc
y
y.lc
n
y.lc.lc
(b)
(c)
(d)
Figure 2: The graph induced by variables and a variable assignment statement. In order to model variables of reference type, we map them into abstract memory locations, and their attributes are mapped into the content of locations, represented by internal indexed variables. On the other hand, statements might change the status of objects, their effects are approximated by using a set of locations accessed by executing the given statement. For brevity, we use positive integers, 0, 1, ..., to denote possible abstract memory locations; and location 0 corresponds to the implicit parameter this. Regarding the Java example in Figure 1, we first convert the constructor of the statement. The result of this process is shown in Figure 3 (a). Because an object may influence several variables due to its state change we use named locations to represent a newly-created object; each location l has a type, i.e., the class type of the instance modeled by l, and a unique index, which non-ambiguously identifies location l within a modeled method. To record these object influences, we next link the named location to each instance/class variables of the object (shown in Figure 3 (b)). Afterwards we convert the rest of the assignment which is now simplified to be an assignment of a named location to a variable, i.e., a = l1 (see Figure 3 (c) for the induced graph). a
l1
.value
l1
1 .value
1
.value
1
1(0) 1(0)
(a)
(b)
1(0)
(c)
Figure 3: The induced graph for statement 1: ValueHolder a = new ValueHolder(1). Note that we can obtain some useful information from every alias node. For example, from node n1 in Figure 3 (c) we obtain the statement “a is an alias of l1” and from n 2 we obtain
a.value = 1. Moreover, from Figure 3 we observe that: 1: Alias nodes are those with multiple in-going edges and one out-going edge, whereas location
nodes have one in-going edge and one out-going edge. 2: Both, a source node and a location node can access their location nodes either via their
references or via their field selectors (e.g., a.value). 3: A source node can become a location node. For example, node n 1 in Figure 3 (c) is not only
a source node a.value but also a location node. The main idea of our algorithm is the conversion of input statements into an enriched pointsto graph, which comprises source nodes and summary nodes, as well as access edges and flow edges, that represent the flow of values through the program. The goal of this representation is to gain debugging knowledge easily by traversing the graph. Algorithm 1 convert(Expr, i) 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: 29: 30: 31: 32: 33: 34: 35: 36: 37: 38: 39: 40: 41: 42:
if ( Expr = Id ) then if (not (nId = lookup(Id))) then create a named source node nId, its location node nId.lc, and nId.lc ’s location node nId.lc.lc; create access edges directed from each node to its location node. return nId.lc end if else if ( Expr = Const ) then create a named source node nId, its location node nId.lc, and nId.lc ’s location node nId.lc.lc; create access edges directed from each node to its location node. return nId.lc else if (Expr = (Expr1)) then rN odes := convert(Expr1, i) else if (Expr = Id.f ) then return access(Id.f ) else if (Expr = Expr1 Op Expr2) then rN odes := convert(Expr = Expr1, i)∪ convert(Expr = Expr2, i) return rN odes else if ( Expr = new M ethodCall ) then create a named location node li ; if (not (Id = lookup(M ethodCall))) then rN odes := convert(M ethodDec(Id), i) for all node n ∈ rN odes do create an access directed from li to n end for else (nId = lookup(Id)) li := copy(nId) end if return (nId.lc) else if ( Expr = M ethodCall ) then if (not (Id = lookup(M ethodCall))) then rN odes := convert(M ethodDec(Id), i) else (nId = lookup(Id)) n := copy(nId) rN odes := locations(n) end if for all node nj ∈ input(Id) and its corresponding node n0j ∈ rN odes do create a flow edge directing from nj .lc to n0j .lc end for return output(nId) end if
3.2 Algorithm for Computing Enriched Dependencies In the last section, we informally introduced an enriched points-to graph. In this section, we describe an algorithm that converts Java programs to extended points-to graphs. The extension is due to the labeling of edges with constraints. We call the extended graph a constrained
value flow graph (CVFG). The constraints can be Boolean expression that occur as conditional expressions in loops and conditionals and also conditional assertions. The CVFG for a given program is constructed by calling two functions. The function convert compiles one statement into a basic CVFG. The function summarize summarizes the alias information and the conflicting dependencies in the induced CVFG. The function convert calls summarize whenever an assignment statement is processed. To simplify our discussion of these two functions, we introduce a number of functions and notations that are used throughout the discussion. For this purpose we use the following notational conventions: Id denotes either a variable name or a method identifier. Id.f denotes the field selector f of Id. nId denotes the corresponding location node for Id. n denotes a node in the induced graph. rN ode denotes the single returned node. rN odes denote the list of returned nodes. lc denotes the abstract reference. We use n.lc to follow access edges for each node n in the induced graph. M ethodCall denotes the method call. sumN odes(nId) denotes a set of summary nodes for node nId, which are used to record the information to be summarized. constraints(i) denotes the constraints associated with nId with respect to a statement index i. t:j.k(l) denotes the inner statement index, where j is the prime statement index (as shown in Figure 1); k the sub-index of statements in the block of selection statements; l the sub-index of statements in method calls or constructors; t is used to label the type of the modeling statement: t = 1, if it is a loop statement, otherwise t = 0 for modeling method declarations. The information in our indexing mechanism corresponds to the statement types; sometimes we will neglect the unimportant part if the context is unambiguous. Some auxiliary functions are defined to retrieve values of variables as well as nodes and their associated constraints: match(t : j.k(l), i) returns the value of t, j, k, l for a statement index i. access(Id.f ) goes down along the access edges of node Id and return a node corresponding its field f . lookup(Id) returns the corresponding location node nId for Id if it exists. lookup(i) returns the corresponding location node nId if statement i is in the form of “Id = Expr”. lookup(M ethodCall) determines the receiver(s) of the M ethodCall, returns the corresponding the method identities Id(s method(s).
MethodDec(Id) returns the declaration “Type Id(FormalParList) {S 1 , · · · , Sn } ” of the method Id. output(Id) returns nodes related to the returned values of a method call with respect to a method identity Id. input(Id) returns a list of nodes for the actual parameters of a method call with respect to a method identity Id. locations(nId) returns a list of location nodes of node nId if Id is a method identity. copy(Id) copies the induced graph, which can be done by traversing the graph from the starting node nId to its leafs. visible(nId, i) returns true if either (1) Id is an attribute of an object and statement i is one of the statements of a constructor, or (2) statement i is one of the statements in the declaration of a method call and Id is an formal parameter; otherwise returns f alse. The basic functionality of the conversion algorithm can be explained by explaining its behavior for different Java statements. We first describe the conversion of an assignment via the function convert. Algorithm 2 convert(Id = Expr, i) 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: 29:
tag := f alse; if (not (nId = lookup(Id))) then create a named source node nId and its location node nId.lc; tag := true end if rN odes := convert(Expr, i) for all node n ∈ rN odes do create a flow edge directed from n to node nId.lc, label this edge with the index i and constraints(i) if ((tag) ∧ (constraints(i)6= {})) then for all constraints conj ∈ constraints(i) do create a new location node nId.lc0 for node nId; create an access edge directed from nId to nId.lc0 , label with the index i and conj ; create an access edge directed from nId.lc0 to n.lc end for else create an access edge directed from nId.lc to n.lc end if sumN odes(nId) := sumN odes(n.lc) for all node n0 ∈ sumN odes(nId) do if ((constraints(i) 6= {}) ∨ (i = 0)) then summarize(i, n0 ) end if end for end for if visiable(nId, i) then return nId.lc else return nil end if
Algorithm 2 shows how to convert a single assignment statement. Algorithm 3 handles the assignment statement in case of a field access at the left-hand side of the assignment. Both algorithms call convert(Expr, i) which is defined in Algorithm 1 in order to convert the righthand side of the assignment. The Algorithm 4 converts method calls into CVFGs. Since method calls might modify external variables, we must consider side effects which stem from statements in the method to be
modeled. Algorithm 3 convert(Id.f = Expr, i) 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25:
nId := access(Id.f ) rN odes := convert(Expr, i) for all node n ∈ rN odes do create a flow edge directed from n to node nId.lc, label this edge with the index i and constraints(i) if ((tag) ∧ (constraints(i)6= {})) then for all constraints conj ∈ constraints(i) do create a new location node nId.lc0 for node nId; create an access edge directed from nId to nId.lc0 , label with the index i and conj ; create an access edge directed from nId.lc0 to n.lc end for else create an access edge directed from nId.lc to n.lc end if sumN odes(nId) := sumN odes(n.lc) for all node n0 ∈ sumN odes(nId) do if ((constraints(i) 6= {}) ∨ (i = 0)) then summarize(i, n0 ) end if end for end for if visiable(nId, i) then return nId.lc else return nil end if
Algorithm 4 convert(M ethodDec(Id), i) 1: 2: 3: 4: 5: 6:
create node nId for the method identity Id rN odes := convert({S1 , · · · , Sn }, 0 : i) for all node n ∈ rN odes do create an access directed from nId to n. end for return rN odes
A statement sequence is converted statement by statement in Algorithm 5. Algorithm 5 convert({S1 , · · · , Sn }, i) 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11:
rN odes := {} match(t : j.k(l), i) for all 1 ≤ r ≤ n do if ( t = 1 ) then rN odes := convert(Sr , 1 : r) ∪ rN odes end if if ( t = 0 ) then rN odes := convert(Sr , 0(r)) ∪ rN odes end if end for return rN odes
When modeling the selection statement and loop statement, we assume that we have preprocessed their corresponding expressions by introducing additional statements with auxiliary variables if expressions contain constructors like arithmetic operator or method calls. That is, we assume that expressions are simple conditions. Thus the selection statement, for example an if-then-else statement, is converted as described by Algorithm 6. The loop statement like a while-statement can be converted by calling Algorithm 7. Algorithms 2–7 define the compilation process of Java programs into CVFGs. The algorithms that correspond to the conversion of assignment statements make use of the function summarize. The basic idea behind summarize is the collection of strong knowledge at summary nodes that is represented by consistent constraints. The collected knowledge which is represented by equalities is said to be strong because it represents several dependencies coming from the nodes that are one level higher than the summary node. We can always create such summary nodes by creating the node and a flow edge from nodes to the summary node itself. For this purpose we
Algorithm 6 convert(if (Expr) {S1 , · · · , Sn } else {S10 , , Sn0 }, i) 1: 2: 3: 4:
constraints(i.1) := constraints(i) ∪ Expr rN odes := convert({S1 , · · · , Sn }, i.1) constraints(i.2) := constraints(i) ∪ ¬Expr 0 }, i.2) rN odes := convert({S10 , · · · , Sn
Algorithm 7 convert(while (Expr) {S 1 , · · · , Sn }, i) 1: constraints(1 : i) := constraints(i) ∪ Expr 2: rN odes := convert({S1 , · · · , Sn }, 1 : i)
must be sure that there are no constraints annotated and that there are no conflicts in related flow edges (as shown at the summary nodes in Figure 2). Whenever two flow edges are directed to the same summary node, we call them conflicting. To resolve any conflicts, we compare related statement indexes at a summary node and select the one that has been executed at the latest. Regarding the summary nodes of a while statement, we can also obtain strong knowledge by using some sort of fix-point computation, i.e., calling summarize until no new knowledge can be obtained. The functions convert and summarize define the conversion of Java programs into a constrained value-flow graph. Figure 4 shows the final CVFG of our example program demo from Figure 1. a 5 choice
5 choice 3
3
b
d
c
1
n1
1,3
n3
4
l1
1
2
1,3
1,3,7 choice
5 choice
2(0) 2,4,5.1 choice
2,4,5.1,6 choice
5
l2
2
4 2,4,5.1 choice
1(0) n2
5.1 choice
5.2 choice
choice
6 choice 6
2, 4
5.2 choice
l3
3
5.2 choice
2, 4
2, 4
choice
5.2(0) choice
5, 2.6 choice
Figure 4: The induced graph for the small Java program in Figure 1.
3.3 Constrained Value-Flow Model After computing all dependencies, i.e., the constrained value-flow graph, we map them to a logical representation, which can be directly used by a standard model-based diagnosis engine. For this purpose, we have to introduce a predicate ¬Ab(i) to denote that a statement i is assumed to be correct where i is the index of this statement. The mapping of dependencies to their logical model was described elsewhere [7]. In the context of CVFGs this mapping is similar to the mapping of dependencies at non-summary nodes to logical sentences. From the standard functional dependency model we receive the following (simplified) rules for variables a and c:
¬Ab(1) → ok(a1 ) ¬Ab(3) ∧ ok(a1 ) → ok(c3 ) ¬Ab(7) ∧ ok(c3 ) → ok(a7 ) ∧ ok(c7 ) where vi denotes the state of the variable v after executing the statement in line i. If we now assume that ok(c7 ) and ¬ok(a7 ) is valid, i.e., c is computed correctly by the program demo whereas a is not, then we can compute 3 single-fault diagnosis candidates: {1}, {3}, {7}, which is not a good result as explained previously. In order to improve this result we made some slight changes to the model. We only consider the following rules: ¬Ab(1) → ok(a1 ) ¬Ab(7) → ok(c7 ) and add some additional rules that are for handling the aliasing relationship between a and c. The first rule that can be easily extract from the CVFG captures aliasing. ¬Ab(3) ∧ ¬choice → a ≡ c We further add a some rules that allow us to derive new information from the aliasing knowledge. a ≡ c ∧ ok(ai ) → ok(ci ) a ≡ c ∧ ok(ci ) → ok(ai ) When using these rules and the above observations and the fact ¬choice, we now obtain the diagnosis {3} as the only single-fault diagnosis. In general the process of generating the aliasing rules can be expressed as follows. For each summary node n where there exists a flow edge between parent nodes add the following rule to the system description: ¬Ab(i) ∧ Cns(n) ⇒ v ≡ w where i denotes the maximum value of the statement indices IDX(n) that are associated with n, v and w denotes the variables that corresponds to the parent nodes, and Cns(n) denotes the constraints that are associated with n. This sentence basically means that v and w denotes the same memory location if the constraints are fulfilled. For example, the summary node n1 in Figure 4 will lead to the rule: ¬Ab(3) ∧ ¬choice → a ≡ c because IDX(n1 ) = {1, 3}, 3 = max{IDX(n1 )}, and Cns(n1 ) = {¬choice}. 4 Experimental Results The proposed algorithm has been implemented in Smalltalk using VisualWorks 7.1. In this section we present the experiments that evaluate two-tired dependencies with path conditions. Experiments are performed on on a sun Ultra 60 workstation (model 2450, 2.0 GB memory, 450MHz CPU speed).
Program
Faulty Version
M-Generation T-Diagnosing N-Diagnosis [sec.] [sec.] [num.] LinkedList f1 1.2 0.5 3 LinkedStack f1 0.6 0.4 6 f2 0.5 0.4 5 Ticket f1 2.1 0.3 2 f2 2.1 0.3 3 Library f1 1.3 0.5 4 Adder f1 0.1 0.0 9 f2 0.1 0.0 9 SumPowers f1 0.0 0.0 5 f2 0.0 0.0 4 f3 0.0 0.0 4 Table 1: Debugger’s performance for model generation and diagnosis correctness We use several Java Programs to measure our model-based debugger’s performance. For example, LinkedList is the implementation of linked lists, and their elements are ordered by an attribute of LinkedList objects. LinedStack is an list-based implementation of stacks. The elements in LinedStack are arbitrary objects. T icket implements a ticket office application. Their products are 6 types of railway tickets intended for travellers. Traveller’s age, duration time and number of zones determine the right ticket for a traveller. And the algorithmic part of this application relates to handling ticket sales. Moreover, Library is a small application that creates a sample library and then computes the author who has published the largest amount of books stored in the library. To evaluate the debugger’s capability, we intensionally inject a single fault into the method of a certain benchmark, and thus create several faulty version of the correct program. A set of diagnosis candidates is assumed to be sound if they can cover this fault. Moreover, the smaller is the size of diagnosis set, the better diagnosis we’ve achieved. Table 1 shows our debugger’s performance. It reports the versions of the program analyzed, the time for generating models and making diagnoses, and the number of diagnosis candidates. The results show the faulty statements of all programs can be covered by the diagnosis candidates. Compared with the FDM, the size of diagnosis candidates is indeed reduced for the first four programs in Table 1 because they involve intensive object accesses, and our new model can eliminate much more irrelevant statements. Moreover, our model can instrumentation the Java source code as the Smalltalk code, and in some cases (e.g., for example Adder, SumPowers), the run-time information is available, so by using them to check the path conditions, our debugger outperforms the FDM. 5 Related Work Models of technical systems are at the very heart of Model-Based Reasoning. Building models for a domain is difficult, time consuming and expensive. The VMBD project focuses on the on-board diagnosis prototype running on the demonstrator vehicle [12]. Our algorithm for dependency analysis is an extension of Das’s one level flow algorithm. Das’s algorithm [2] is a simple extension of Steensgaard’s unification-based approach [11], which is described as a set of type inference rules over a language of pointer related assignments. The focus of Das’s approach is to provide a practical method for obtaining better points-to information on large programs. Unfortunately, the Das’s algorithm cannot fully meet our needs that are required for debugging Java programs. For example, in some cases faults in programs rely heavily on the order of statements, however flow-insensitive analysis assume that statements can be execute in any order [2], i.e., the control structures of the languages are irrelevant [11]. Moreover, some flow-insensitive dependencies are possibly conflicting. Method calls may give
rise to implicit dependencies, and selection statements and loop statements produce inherent conditional dependencies. All these facts have to be taken into account for debugging. Hence, our algorithm extends the Das’s algorithm in this respect and presents a constrained value-flow graph to model Java programs in an appropriate way. In [7] the authors conclude that a model based purely on dependencies is too weak to discriminate between all possible program errors. As suggested by the experimental results, the new model really helps to improve the diagnosis results whenever aliasing occur within a program. Several automatic debugging approaches have been proposed so far to help programmers solving the debugging task. Among them are program slicing [15], algorithmic debugging [10], dependency-based techniques [6], and others (see [4] for an overview). Another approach that makes use of a model checker to produce error trace in order to find a fault is described in [1]. In this paper the authors discuss the localization of a fault in the source code of a program using error traces. In [17] Wotawa shows that model-based debugging in the context of functional dependencies provides at least the same capabilities than program slicing for locating bugs. Moreover, in the same paper the author proves that the model-based debugging approach can provide better results. 6 Conclusion In this paper, we try to find the tradeoff between exact dependencies and cost of modeling. Our new model is a purely qualitative model that captures not only the dependencies between variables but also the aliasing relationship between the variables. The proposed model provides more information that can be successfully used to reduce the number of computed diagnoses during debugging. To show how it works, we discuss the compilation of Java programs to a constrained value-flow graph, and the mapping from the graph to its logical representation, which can be used by a model-based diagnosis engine. Indeed our experimental results show that our approach can achieve better diagnosis results. Future research should include an analysis of the proposed model and a comparition with the existing value-based model. Moreover, we will consider augmenting the proposed model with an abstract program semantics and a light-weighted specification. 7 Acknowledgments The authors like to thank Roderick Bloem and Dr. Lin Li for their comments on the manuscript. References [1] T. Ball, M. Naik, and S.K. Rajamani. From symptom to cause: localizing errors in counterexample traces. In Proc. of the 30th ACM SIGPLAN-SIGACT symposium of programming languages (POPL), pages 97–105. ACM Press, 2003. [2] Manuvir Das. Unification-based Pointer Analysis with Directional Assignments. In Proceedings of the SIGPLAN’88 Conference on Programming Language Design and Implementation, Vancouver, BC, Canada, 2000. [3] Johan de Kleer and Brian C. Williams. Diagnosing multiple faults. Artificial Intelligence, 32(1):97–130, 1987. [4] Mireille Ducass´e. A pragmatic survey of automatic debugging. In Proceedings of the 1st International Workshop on Automated and Algorithmic Debugging, AADEBUG ’S93, Springer LNCS 749, pages 1–15, May 1993.
[5] W. Hamscher, Luca Console, and Johan de Kleer, editors. Readings in Model-Based Diagnosis. Morgan Kaufmann, 1992. [6] Daniel Jackson. Aspect: Detecting Bugs with Abstract Dependences. ACM Transactions on Software Engineering and Methodology, 4(2):109–145, April 1995. [7] Cristinel Mateis, Markus Stumptner, and Franz Wotawa. Modeling Java Programs for Diagnosis. In Proceedings of the European Conference on Artificial Intelligence (ECAI), Berlin, Germany, August 2000. [8] Cristinel Mateis, Markus Stumptner, and Franz Wotawa. A Value-Based Diagnosis Model for Java Programs. In Proceedings of the Eleventh International Workshop on Principles of Diagnosis, Morelia, Mexico, June 2000. [9] Raymond Reiter. A theory of diagnosis from first principles. Artificial Intelligence, 32(1):57– 95, 1987. [10] Ehud Shapiro. Algorithmic Program Debugging. MIT Press, Cambridge, Massachusetts, 1983. [11] B. Steensgaard. Points-to analusis in almost linear time. In Proc. Seventh ACM Symp. on Princ. of Prog. Lang. (POPL). ACM, 1996. [12] P. Struss, M. Sachenbacher, and C. Carl en. Insights from building a prototype for modelbased on-board diagnosis of automotive systems. In Proceedings of the Eleventh International Workshop on Principles of Diagnosis, 2000. [13] Markus Stumptner, Dominik Wieland, and Franz Wotawa. Comparing two models for software debugging. In Proceedings of the Joint German/Austrian Conference on Artificial Intelligence (KI), Vienna, Austria, 2001. [14] Markus Stumptner and Franz Wotawa. Using Model-Based Reasoning for Locating Faults in VHDL Designs. K¨ unstliche Intelligenz, 14(4):62–67, 2000. [15] Mark Weiser. Programmers use slices when debugging. Communications of the ACM, 25(7):446–452, July 1982. [16] Franz Wotawa. Debugging VHDL Designs using Model-Based Reasoning. Artificial Intelligence in Engineering, 14(4):331–351, 2000. [17] Franz Wotawa. On the Relationship between Model-Based Debugging and Program Slicing. Artificial Intelligence, 135(1–2):124–143, 2002.