Semi-automatic Bug Localization in Software Maintenance*
Nahid Shahmehri, Mariam Kamkar, Peter Fritzson
Dept. of Info. and Comp. Science, Linköping University
S-581 83 Linköping, Sweden

Abstract

Debugging is a large fraction of software maintenance activities, which makes it important to find methods and tools to support this activity. This paper presents a generalized version of algorithmic debugging, a semi-automatic bug localization method. This technique is generally applicable to procedural languages. The original form of algorithmic debugging, introduced by Shapiro [11], is however limited to small Prolog programs without side-effects, and usually requires a large number of interactions with the user during bug localization. To our knowledge, this work is the first generalization of algorithmic debugging for programs with side-effects written in imperative languages such as Pascal, C or Fortran. Also, we have improved the search method in a way that eliminates many irrelevant questions to the programmer during bug localization. This is achieved by using program slicing, a data flow analysis technique, to dynamically compute which parts of the program are relevant for the search. A prototype generalized algorithmic debugger for Pascal has been implemented in Pascal.

Introduction

Since the very first computer program was written, debugging has been an integral part of software development. The cost of debugging is well known. For example, telecommunications industry statistics show that removing programming defects accounts for 40 to 70 percent of the total expense [9]. Although this estimate also includes preventive efforts, debugging is the second largest expense category after new-feature introduction. Given the high cost of debugging, it is not surprising that attempts have been made to automate this task [2]. The Algorithmic Debugging Technique, introduced by Shapiro [11], was the first attempt to lay a theoretical framework for program debugging and to take this framework as a basis for a partly automatic debugger. In this system, the programmer supplies a partial specification of the program during the bug localization process by answering questions. However, Shapiro's model cannot handle side-effects or loops, and has only been applied to Prolog programs. This restriction prevents the system from being practically useful for programs written in imperative languages.

Furthermore, a major drawback of algorithmic debugging is the large number of user interactions during the debugging process. Thus, an important improvement would be to supply the debugging system with some information which can reduce this number.

A side-effect is an update to the value of a variable or a data structure of a program. A side-effect by a procedure or function, i.e. a global side-effect, is usually defined as an update of a global variable or data structure. Input and output statements also cause global side-effects, since they affect the input and output streams. Imperative languages have some form of assignment statement which allows side-effects of both kinds. Pure functional languages do not have an assignment statement. We extend the definition of global side-effects to include even references to global variables, i.e. variables declared outside the current procedure/function.

In this paper, we introduce an algorithmic debugging method for imperative languages which allows the constructive use of code with side-effects. We also demonstrate a major improvement in the bug-localization process by combining program slicing with algorithmic debugging. Program slicing, as presented in [12], is a method for automatically decomposing programs by analyzing their data flow and control flow. This method isolates individual computation threads within a program. The size of a slice is usually program- and input-dependent. However, in practice a slice is often much smaller than the original program, especially for block-structured languages.

In the remainder of this paper we first give a brief overview of our debugging system. Then, the principles of bug localization through algorithmic debugging are discussed. Afterwards, the principles of program slicing and its application in algorithmic debugging are discussed. The last three sections of the paper are dedicated, respectively, to the current implementation status, a discussion of possible future improvements to the system, and a summary.
* This work is supported by STU, the Swedish National Board for Technical Development
CH2921-5/90/0000/0030$01.00 © 1990 IEEE
Functional Overview of the Debugging System

We divide our algorithmic debugging methodology into three major phases: a transformation phase, a tracing phase and a debugging phase. The last phase consists of two major components: pure algorithmic debugging and program slicing.

Figure 1: The functional structure of the debugging system. Arrows denote information transfer.

Transformation Phase
This phase takes a program that may contain side-effects, written in an imperative language, and transforms it into an equivalent program without global side-effects. Trace-generating actions (statements) are added to the transformed program in this phase. Trace-generating actions are created for units of a program. Here, we define a unit to be a procedure, a function or a loop inside a procedure/function, i.e. a local loop.

Tracing Phase
This phase builds an execution tree of the transformed program. An execution tree of a program is a tree structure containing information about the program's actual execution. We refer to the actual program execution as the actual program behavior [2]. The execution tree created in this phase contains trace information about each unit of the original program, such as parameter values and the values of variables which cause global side-effects within the unit. Note that the execution semantics of the original and the transformed program are equivalent.

Debugging Phase
The goal of the debugging phase is to localize a bug through a dialogue with the user, with as few interactions as possible. Its components are described here.

Pure algorithmic debugging: This component interacts with the user through queries about the expected results of procedures, i.e. the intended program behavior, while traversing the execution tree. It continues until one of two things happens. Either a bug is localized by the debugger at the unit level, which ends the debugging, or program slicing is activated by the user due to an error indication. The latter case happens when a unit produces several output values and only some of these values are erroneous.

Program Slicing: This component is activated by an error indication from the user. The user has pointed out a certain variable whose value is incorrect at a certain program point. Slicing computes a slice of the program with respect to that variable at that point. This slice has a corresponding execution tree, which is returned to the pure algorithmic debugging component so that the debugging process can continue.

Bug Localization Through Algorithmic Debugging

In this section, we describe the principles of algorithmic debugging and give an example. The transformation of the source program into an internal representation is not shown in the example.

Principles of the Algorithmic Debugging Technique
Algorithmic program debugging, as originally defined by Shapiro [11], is an interactive process. In it, the debugging system acquires knowledge about the intended behavior of the debugged program and uses this knowledge to localize errors. The knowledge is collected by the system through a number of questions to the user. The user answers “yes” or “no”, or he/she can give an assertion about the intended behavior of the program [1]. An assertion is a predicate which expresses an input-output relation for a procedure.

We generalize the algorithmic debugging method to programs which may contain side-effects and which can be written in imperative languages, e.g. Pascal. Assertions in this model are expressed as Boolean expressions, which can refer to functions and procedures, parameters, and global variables. The current target and implementation language for our algorithmic debugging system is Pascal.

The algorithmic program debugger can be invoked by the user after noticing an externally visible symptom of a bug. The debugger executes the program and builds an execution tree at the unit level, while saving some useful trace information such as procedure names and input/output parameter values. Note that a separate node is created in the execution tree for each iteration of a loop. In other words, each iteration of a loop is treated like an execution (activation) of a procedure. The algorithmic debugger traverses the execution tree and interacts with the user by asking about the intended behavior of each procedure. The answers which the user supplies to the debugger guide the tree traversal. The search finally ends when a bug is localized in a unit p whose execution the user has judged incorrect. This happens when either unit p contains no procedure calls or units, or all unit executions performed in the body of unit p fulfill the user's expectations. Such unit executions are procedure calls or loop executions.

The output from the debugger shows that, given the input parameter values and the user's assertions about expected results, an error has been isolated to a certain unit.

As an example of algorithmic debugging, consider the program in Figure 2. It computes the square of the integer 3 in two ways and then checks whether the results from both computations are the same. One way is by multiplication, 3*3 = 9. The other is by summation, 1+2+3+2+1 = 9. The latter expression is split into two partial sums, 1+2+3 and 1+2. Then the formula for the sum of 1 through n, n*(n+1)/2, is used for each sum.

program Main;
var y: integer;
    isok: boolean;

procedure Test(r1, r2: integer; var isok: boolean);
begin isok := r1 = r2; end;

procedure Multiply(y: integer; var r2: integer);
begin r2 := y * y; end;

procedure Multiplication(y: integer; var r2: integer);
begin Multiply(y, r2); end;

procedure Add(s1, s2: integer; var r1: integer);
begin r1 := s1 + s2; end;

procedure SecondSum(y: integer; var s2: integer);
begin s2 := y * (y + 1) div 2; end;

procedure FirstSum(y: integer; var s1: integer);
begin s1 := y * (y + 1) div 2; end;

procedure PartialSums(y, x: integer; var s1, s2: integer);
begin
  FirstSum(y, s1);
  x := y - 1;
  SecondSum(x, s2);
end;

procedure Summation(y: integer; var r1: integer);
var s1, s2: integer;
begin
  PartialSums(y, y - 1, s1, s2);
  Add(s1, s2, r1);
end;

procedure ComputInTwoWays(y: integer; var r1, r2: integer);
begin
  Summation(y, r1);
  Multiplication(y, r2);
end;

procedure SqrTest(y: integer; var isok: boolean);
var r1, r2: integer;
begin
  ComputInTwoWays(y, r1, r2);
  Test(r1, r2, isok);
end;

begin (* Main *)
  y := 3;
  SqrTest(y, isok);
end.

Figure 2. An example program which computes the square of integer 3 in two ways and then checks if the results from both computations are the same.

By introducing a bug into the example program in Figure 2, we can follow the algorithmic debugging process. For example, writing y+y instead of y*y in the procedure Multiply makes the program result incorrect. Figure 3 shows the execution tree of the modified program with trace information:

SqrTest(In 3, Out false)
    ComputInTwoWays(In 3, Out 9, Out 6)
        Summation(In 3, Out 9)
            PartialSums(In 3, In 2, Out 6, Out 3)
                FirstSum(In 3, Out 6)
                SecondSum(In 2, Out 3)
            Add(In 6, In 3, Out 9)
        Multiplication(In 3, Out 6)
            Multiply(In 3, Out 6)
    Test(In 9, In 6, Out false)

Figure 3: The execution tree of the modified program.

The programmer notices that there is a bug somewhere in the program. The algorithmic debugger is activated in order to localize the cause of the incorrect behavior. The following interactions take place between the programmer and the debugger during a top-down traversal of the execution tree:

SqrTest(In 3, Out false)? no
ComputInTwoWays(In 3, Out 9, Out 6)? no
Summation(In 3, Out 9)? yes
Multiplication(In 3, Out 6)? no
Multiply(In 3, Out 6)? no
Error caused by procedure Multiply.

The debugger message states that an erroneous result has been produced inside the procedure Multiply with respect to the input value 3:

procedure Multiply(y: integer; var r2: integer);
begin r2 := y + y; end;

In order to reduce the number of user interactions, the user should be able to supply the debugging system with extra information during queries. Our debugging system supports this extended query in two ways. The first way is to allow the user to give assertions about intended program behavior. The other way is to allow the user to point out certain erroneous output parameter values from a parameter list during a query. This matter is discussed later in this paper.

Preparing Programs for Standard Algorithmic Debugging

The purpose of this section is to discuss program transformations in more detail. We start by describing two different approaches to the transformation phase. Then, some examples of our approach are given. One approach is to transform the subject program, which contains side-effects and loops, into a completely functional form, free from side-effects and loops. This approach is discussed in [10]. Similar transformations for a very small language are formally treated in [6]. Since such transformations increase the size of the transformed program, it is desirable to restrict them. Thus, another approach is to perform transformations only on the program constructs which conflict with the principles of algorithmic debugging. Our system follows the second approach.
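Before turning to the constructs that must be transformed, it may help to see the top-down search that the transformations are meant to support. Below is a minimal Python sketch of that search over the execution tree of Figure 3. It is an illustration under stated assumptions, not the authors' Pascal implementation. The names Node, localize, make, accepted and user are hypothetical. The user's yes/no answers are simulated by a fixed set of executions the user would accept, and a node's label stands in for its recorded input/output values.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    """One unit execution in the execution tree, e.g. 'Multiply(In 3, Out 6)'."""
    label: str
    children: List["Node"] = field(default_factory=list)


def localize(root: Node, is_correct: Callable[[Node], bool]) -> Optional[Node]:
    """Top-down algorithmic debugging (sketch).

    `is_correct` stands in for the user's yes/no answers. Returns the unit in
    which the bug is localized, or None if the root execution is accepted.
    """
    if is_correct(root):
        return None
    node = root  # invariant: the user has rejected `node`
    while True:
        for child in node.children:
            if not is_correct(child):
                node = child  # descend into the first rejected child
                break
        else:
            # No children, or every child execution fulfils the user's expectations.
            return node


def make(label: str, *children: Node) -> Node:
    return Node(label, list(children))


# The execution tree of Figure 3 (Multiply computes y+y instead of y*y).
tree = make(
    "SqrTest(In 3, Out false)",
    make(
        "ComputInTwoWays(In 3, Out 9, Out 6)",
        make(
            "Summation(In 3, Out 9)",
            make(
                "PartialSums(In 3, In 2, Out 6, Out 3)",
                make("FirstSum(In 3, Out 6)"),
                make("SecondSum(In 2, Out 3)"),
            ),
            make("Add(In 6, In 3, Out 9)"),
        ),
        make("Multiplication(In 3, Out 6)", make("Multiply(In 3, Out 6)")),
    ),
    make("Test(In 9, In 6, Out false)"),
)

# Simulated user: the executions whose results match the intended behavior.
accepted = {
    "Summation(In 3, Out 9)", "PartialSums(In 3, In 2, Out 6, Out 3)",
    "FirstSum(In 3, Out 6)", "SecondSum(In 2, Out 3)",
    "Add(In 6, In 3, Out 9)", "Test(In 9, In 6, Out false)",
}
asked: List[str] = []


def user(node: Node) -> bool:
    asked.append(node.label)  # record each question put to the user
    return node.label in accepted


bug = localize(tree, user)
print(bug.label)  # Multiply(In 3, Out 6)
```

With these simulated answers, the questions are asked in the same order as in the session for Figure 3. The search is SqrTest, ComputInTwoWays, Summation, Multiplication, Multiply. It never asks about the subtree under Summation, because that call was accepted.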
Here, we describe the program constructs which violate the principles of algorithmic debugging and are therefore subject to transformation. Then the transformations for these constructs are discussed.

Side-effect Freeness: The algorithmic debugging method is based on procedure abstraction. This means that the debugging process, and hence the user interactions, are performed at the procedure level. Thus, we can relax the condition of side-effect freeness at the statement level and impose it only at the procedure level. In a program written in an imperative language, constructs which cause any of the following effects are said to have global side-effects:
(1) Side-effects on variables not locally declared in the current procedure. Such a side-effect may be caused by a reference to, or a modification of, a variable value. The reference/modification may either be direct or indirect, i.e. through a pointer. This paper does not yet cover side-effects caused by pointers, due to the complex nature of that problem.
(2) GOTO-statements from a procedure to a label defined outside this procedure. Such statements cause global side-effects in the form of global transfers of control. Here we refer to such goto-statements as global gotos.

Loops inside a procedure do not prohibit the algorithmic debugging process. However, crucial computations are often performed inside loops. Thus, loops deserve to be treated in a similar way to procedures, i.e. as units for algorithmic debugging. A loop may be created explicitly by a language construct, e.g. a while-statement. A backward goto-statement may create an implicit loop.

In addition to the above-mentioned transformations, the intermediate program is augmented with trace-generating statements. The augmentation is very straightforward.

Examples of Program Transformations: The program transformations are guided by information from control and data flow analysis. The transformations here are twofold. The first kind are those transformations which eliminate global side-effects and global gotos. The second kind augments the intermediate program with calls to procedures which generate trace information during the tracing phase. Here we give examples of different cases of program transformations.

Conversion of global variables to parameters.
Original:
  procedure p(var y: ...);
  begin y := x + 1; z := y - x end;
Transformed:
  procedure p(var y: ...; in x: ...; out z: ...);
  begin y := x + 1; z := y - x end;

Breaking global gotos into several structured local gotos.
Original:
  procedure p(...);
  label 9;
    procedure q(...);
    begin (* q *)
      ...
      goto 9;
      ...
    end;
  begin (* p *)
    ...
    q(...);
    ...
  9:
  end;
Transformed:
  procedure p(...);
  label 9;
  var exitcond: integer;
    procedure q(...; var exitcond: integer);
    label exitlab;
    begin (* q *)
      exitcond := 0;
      ...
      exitcond := 1; goto exitlab;
      ...
    exitlab:
    end;
  begin (* p *)
    ...
    q(..., exitcond); if exitcond = 1 then goto 9;
    ...
  9:
  end;

Handling gotos inside a loop addressed outside the loop. Here a simple while-statement is considered. If the label 9 is declared outside the procedure surrounding the while-statement, then the new global goto is handled by a later transformation. The transformed program will contain declarations for the label whilelab and for the variable leave. These declarations are not shown here.
Original:
  while B do
  begin
    ...
    goto 9;
    ...
  end;
Transformed:
  leave := false; (* leave must be false before the loop is entered *)
  while B and not leave do
  begin
    ...
    leave := true; goto whilelab;
    ...
  whilelab:
  end;
  if leave then goto 9;

Generating trace-generating actions (functions) for procedures and loops.
  procedure p(var y: ...; in x: ...; out z: ...);
  begin
    create-exectree-rec;
    save-incoming-values(x, y);
    y := x + 1; z := y - x;
    save-outgoing-values(y, z);
  end;

Transparent Debugging Relative to the Original Program
Although the program is transformed into an internal form, the debugger still presents the original program when interacting with the user. This is important; otherwise it would be too hard for the user to answer questions from the algorithmic debugger. For example, a typical question regarding a procedure call could be: Is this call correct for these input parameters, these input values of global variables, these values of output parameters and these values of free global variables? Regarding loops, the debugger presents the original loop to the user and asks whether the relevant iteration variables are correct for iteration 1, iteration 2, etc. Likewise, some programs contain gotos and have been transformed into goto-less form. This transformation does not influence the procedure call structure or the execution tree, and thus creates no problems during algorithmic debugging. Non-local gotos are represented as integer values on specially added return parameters, together with a local goto. A question to the user in this case will be: Given these values of input parameters and free variables, is it correct to perform this non-local goto? Thus the non-local goto is treated as one of the results from the procedure call.
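The tracing phase can be approximated in Python as follows. This is a minimal sketch, not the system's Pascal implementation.

The sketch assumes the program has already been transformed so that each procedure receives all its inputs as arguments and returns all its outputs. That is roughly what the conversion of global variables to parameters achieves. Tracer, enter, leave and traced are hypothetical names. enter plays the role of create-exectree-rec plus save-incoming-values, and leave plays the role of save-outgoing-values. Each loop iteration becomes its own node, following the definition of a unit as a procedure, function or local loop.

```python
from dataclasses import dataclass, field
from functools import wraps
from typing import Any, Callable, List


@dataclass
class Node:
    """One unit execution: a procedure call or a single loop iteration."""
    name: str
    inputs: tuple
    outputs: Any = None
    children: List["Node"] = field(default_factory=list)


class Tracer:
    """Builds an execution tree while the (already transformed) program runs."""

    def __init__(self) -> None:
        self.root = Node("<program>", ())
        self._stack: List[Node] = [self.root]

    def enter(self, name: str, inputs: tuple) -> Node:
        # Role of create-exectree-rec + save-incoming-values.
        node = Node(name, inputs)
        self._stack[-1].children.append(node)
        self._stack.append(node)
        return node

    def leave(self, node: Node, outputs: Any) -> None:
        # Role of save-outgoing-values.
        node.outputs = outputs
        self._stack.pop()

    def traced(self, fn: Callable) -> Callable:
        """Wrap a procedure whose inputs are its arguments and outputs its result."""
        @wraps(fn)
        def wrapper(*args):
            node = self.enter(fn.__name__, args)
            try:
                result = fn(*args)
            except BaseException:
                self._stack.pop()  # keep the stack consistent if the call fails
                raise
            self.leave(node, result)
            return result
        return wrapper


tracer = Tracer()


@tracer.traced
def multiply(y):
    return y + y  # the seeded bug of the running example: y+y instead of y*y


@tracer.traced
def multiplication(y):
    return multiply(y)


@tracer.traced
def sum_to(n):
    total = 0
    for i in range(1, n + 1):
        it = tracer.enter("loop-iteration", (i, total))  # one node per iteration
        total += i
        tracer.leave(it, total)
    return total


def show(node: Node, depth: int = 0) -> None:
    """Print the recorded tree, one unit execution per line."""
    for child in node.children:
        print("  " * depth + f"{child.name}{child.inputs} -> {child.outputs}")
        show(child, depth + 1)


multiplication(3)
sum_to(3)
show(tracer.root)
```

Because the parent of each new node is whatever is on top of the stack, nested calls and loop iterations end up under the unit that performed them. This is the tree shape the search over execution trees needs.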
Principles of Program Slicing

procedure p(...);
begin
  p1(...);
  ...
  pn-1(...);
  pn(x, y);
  ...
end;

Figure 5. A program fragment to compute the value of the variable y.

program computing;
var i, fact: integer;
procedure sumproc(var sum: integer; i: integer);
begin sum := sum + i; end;
procedure factpro(var fact: integer; i: integer);
begin fact := fact * i; end;
begin (* main *)
  sum := 0; fact := 1; i := 1;
  ...

procedure factpro(var fact: integer; i: integer);
begin fact := fact * i end;
Assume that procedure pn, with variable x as input parameter and variable y as output parameter, computes the value of variable y using the value of variable x. Also assume that the value of variable x is independent of the procedure calls p1 to pn-1. The execution results in an incorrect value for the variable y, compared to the user's expectation. Procedures p1, p2, ..., pn-1, which execute before pn, are not involved in the computation of y, but the algorithmic debugger still asks about the behavior of all of them. Here, we consider a top-down traversal of the execution tree (Figure 6). However, in general it does not matter which traversal method is used, because the debugger has no knowledge about which procedures are relevant or irrelevant for the computation of a specific value.
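The effect of restricting the search to relevant calls can be illustrated with a small Python sketch of the Figure 5 scenario. This is a deliberate simplification, not the slicing method of [12] or of the debugger. For each call in execution order, it records which variables the call reads and which it definitely writes. It then walks backwards from the variable with the incorrect value, following data dependences only and ignoring control dependences. The names Call and relevant_calls, the choice n = 5, and the variables v1 to v4 are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Call:
    """A call in execution order, with the variables it reads and (definitely) writes."""
    name: str
    reads: FrozenSet[str]
    writes: FrozenSet[str]


def relevant_calls(calls: List[Call], criterion: str) -> List[Call]:
    """Backward walk over data dependences only (no control dependences).

    Starting from the variable with the wrong value, keep every call that
    writes a still-wanted variable, then want that call's inputs instead.
    """
    wanted: Set[str] = {criterion}
    relevant: List[Call] = []
    for call in reversed(calls):
        if call.writes & wanted:
            relevant.append(call)
            wanted -= call.writes  # these values are now explained by `call`
            wanted |= call.reads   # ...which in turn depends on its own inputs
    relevant.reverse()  # report in execution order
    return relevant


# Figure 5 with n = 5: p1..p4 touch only their own (hypothetical) variables
# v1..v4, while p5 computes y from x.
calls = [Call(f"p{k}", frozenset(), frozenset({f"v{k}"})) for k in range(1, 5)]
calls.append(Call("p5", frozenset({"x"}), frozenset({"y"})))

print([c.name for c in relevant_calls(calls, "y")])  # ['p5']
```

Under these assumptions, only p5 would need to be asked about. If p2 had instead computed x, the walk would keep both p2 and p5. This matches the intuition that only calls contributing to the erroneous value should be queried.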
(* main *) fact := 1; i := 1; while i