Application of Computational Redundancy in Dangling Pointers Detection

Zakarya Alzamil
Computer Technology Dept., Riyadh College of Technology, Riyadh, Saudi Arabia
[email protected]

Abstract — Many programmers manipulate dynamic data improperly, which may produce dynamic memory problems such as dangling pointers. Dangling pointers can occur when a function returns a pointer to an automatic variable, or when a program tries to access a deleted object. Dangling pointers cause programs to behave incorrectly. They are a common defect that is easy to introduce but difficult to discover. In this paper we propose a dynamic approach that detects dangling pointers in computer programs. Redundant computation is the execution of a program statement(s) that does not contribute to the program output. The notion of redundant computation is introduced as a potential indicator of defects in programs. We investigate the application of redundant computation in dangling pointers detection. The results of an initial experiment show that redundant computation detection can help debuggers localize the source(s) of dangling pointers. During the experiment, we found that our approach may also be capable of detecting other types of dynamic memory problems, such as memory leaks and inaccessible objects, which we plan to investigate in future research.

Keywords-redundant computation; debugging; memory leaks; dangling pointers

I. INTRODUCTION
Most programs that use dynamic data, such as linked lists, are very difficult to debug. Common bugs caused by the improper use of pointers include dangling pointers, inaccessible objects, and memory leaks. Dangling pointers can occur when a function returns a pointer to an automatic variable, or when a program tries to access a deleted object (one released with the delete operator). Inaccessible objects occur when a pointer is made to point to another object, either through the new operator or through a regular assignment, leaving the original object inaccessible. Memory leaks occur when dynamic data is allocated but never deallocated. Such dynamic memory problems consume system resources, which degrades system performance, and may cause the system to behave incorrectly and produce incorrect results. Dangling pointers are easy to introduce but very difficult to diagnose. In this paper we propose a dynamic approach that helps testers detect dangling pointers in computer programs. A new notion of computational redundancy is used to guide debuggers in bug localization. Redundant computation is

Proceedings of the International Conference on Software Engineering Advances (ICSEA'06) 0-7695-2703-5/06 $20.00 © 2006

the execution of a program statement(s) that does not affect the program output. Redundant computation has mainly been treated as a performance deficiency [3, 4] that may be caused by poor program design or inefficient coding. In some cases, however, redundant computation is caused by a defect in the program; detecting it may therefore guide debuggers to the potential source(s) of the defect. This paper is organized as follows: Section II briefly describes related work, Section III presents the concept of redundant computation, Section IV presents the algorithm that detects redundant computation, Section V describes the application of redundant computation in dangling pointers detection, and Section VI discusses future research.

II. RELATED WORK
Many methods have been proposed to support program debugging, including static program slicing [23], redundant code detection [15, 18], dynamic detection of data anomalies [5], and dynamic program slicing [17]. The techniques proposed for detecting dynamic memory problems can be categorized into static and dynamic methods. Static methods analyze the source code without executing it, whereas dynamic methods analyze the code based on its execution. Most existing methods address memory leak detection, and some address the detection of dangling pointers. For example, a static method has been proposed in [20] for detecting memory leaks that occur in arrays of objects in a garbage-collected environment. The detected leaks may be exposed to the garbage collector, which may allow it to collect more storage. In [19] a static analysis approach that detects memory leaks has been presented; it uses symbolic evaluation to analyze the program's behavior without executing the code.
In addition, a static analysis method is presented in [13] that can automatically find memory leaks and deletions of dangling pointers in large C and C++ programs. In [8] a static checking tool has been presented that can detect misuses of null pointers, uses of dead storage, memory leaks, and dangerous aliasing. Existing dynamic methods for detecting dynamic memory problems include memory leak detection [24, 12,

0-7695-2703-5/06/$20.00 (c) IEEE

22, 11, 6] and dangling pointers detection [14, 21]. In [24] a garbage collection technique has been proposed that may reduce the performance degradation caused by memory leaks in applications running on servers in a network environment; such a technique can be useful when neither source code nor object code is available. The memory leak detection tool Purify, described in [12], inserts checking logic into the object code to detect memory leaks and access errors. In [22] a method for detecting flaws in heap management has been presented; it uses execution traces collected from the application under analysis to identify memory leaks and redundant deallocations of the same heap segment. An approach to modeling the dynamic memory access properties of a program using amorphous program slicing has been presented in [11]. It creates a Dynamic Memory Model (DMM), a simplified version of the original program constructed from the code that is concerned with the program's dynamic memory access behavior. This approach focuses on analyzing the quantity of dynamic memory allocated in order to detect dynamic memory problems such as memory leaks. In [6] a reference pattern visualization methodology for Java programs has been proposed for finding the causes of memory leaks. This approach allows the programmer to identify a period of time in which temporary objects are expected to be created and released. This information may be used to identify objects that persist beyond this period and the references that are holding on to them. A Real-Time Specification for Java (RTSJ) approach has been presented in [14]; it provides an implementation that dynamically checks the assignment rules before each assignment statement to prevent the creation of dangling pointers, and thus maintains the pointer safety of Java.
In [21] a data management package has been proposed that translates data organizations into definitions of classes, pointers, and access functions, so that the translated data organization may be protected against dangling pointers. Many studies have shown that some program faults may exist as a result of a program deficiency such as redundant (dead) code. Redundant code [15, 10] is a statement that does not affect program outputs for any possible execution. Redundant code has been presented as a symptom of an error in the program behavior. Most compilers identify redundant code as a deficiency in the program, and it may therefore be deleted to produce a more reliable version of the same program. For example, a redundant code analysis method has been presented in [15], which may help in localizing and correcting an error without executing the program. This approach is based on the premise that the existence of redundant code might be a source of erroneous behavior. The use of redundancies to find errors has also been proposed in [25], in which checkers are written to flag potential redundancies that may be the cause of errors. This technique uses five types of checkers: idempotent


operators, redundant assignments, redundant code, redundant conditionals, and redundant null-checks. Although the existing methods detect dynamic memory problems, other types of program deficiency, such as redundant computation, may appear in programs as a result of a dynamic memory problem. The detection of redundant computation may help the debugger detect more dynamic memory problems, such as dangling pointers. In this paper we present an approach that uses computational redundancies to detect dangling pointers. The notion of redundant computation is presented in the following section.

III. REDUNDANT COMPUTATION
Redundant computation [1] is the execution of a program statement(s) that does not affect the program output. The existence of redundant computation reflects a situation in which the execution of a statement is unnecessary. The main idea is based on the premise that each execution of a program statement should affect at least one program output; otherwise the statement execution(s) is considered redundant computation. In addition, a redundant execution of a statement can never affect any non-redundant execution of another statement. Redundant (dead) code does not affect program outputs; therefore, its elimination does not change the functionality of the program. Redundant code always exhibits redundant computation, i.e., its execution is always redundant. However, a statement that exhibits redundant computation is not necessarily redundant code: the same statement may exhibit redundant computation on one execution and affect the program output on a different execution. Redundant computation thus represents a partial redundancy of a statement. Redundant computations are identified in a sequence of executed statements for some program input x. This sequence is referred to as an execution trace Tx.
Node (statement) Y at execution position p in execution trace Tx is written as Y^p and is referred to as an action. To present the concept of redundant computation informally, consider the compare function of Fig. 1. None of the statements in this function is redundant, i.e., there is no "dead" code. Suppose that the compare function is invoked with the input parameters x=5, y=2. Fig. 2 shows the execution trace of the compare function on this input. During the execution of this function the following sequence of statements is executed: 1, 2, 3, 5, 6, 7. In this execution, statement 2 assigns a value to variable f, but f is reassigned at statement 6. Clearly, the execution of statement 2 on this input is redundant because the value assigned to f at statement 2 is not used before f is assigned a new value. Therefore, the execution of statement 2 does not affect the output of the function, i.e., the return statement. On the other hand, when


the function is executed on a different input, e.g., x=2, y=7, the execution of statement 2 is not redundant because the value of f assigned by statement 2 is used by the return statement. Clearly, statement 2 exhibits a partial redundancy, referred to as redundant computation. In the following section we present the method used to detect redundant computation.

/* determines the difference between two numbers */
1 int compare(int x, int y) {
      int f;
2     f=-1;        //difference is negative
3     if (x==y)
4         f=0;     //difference is zero
5     if (x>y)
6         f=1;     //difference is positive
7     return f;
8 }
Figure 1. Sample C function.

// input x=5, y=2
1^1  Entry compare(x, y)
2^2  f=-1
3^3  x==y
5^4  x>y
6^5  f=1
7^6  return f
8^7  Exit compare
Figure 2. An execution trace of the code fragment in Figure 1. An action X^p denotes statement X at execution position p.
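The statement sequence of Fig. 2 can be reproduced with a small amount of manual instrumentation. The following C++ sketch is only an illustration: the helper names record and trace_of and the global trace vector are assumptions made for this example and are not part of the paper's tooling, which instruments programs automatically and also records memory addresses.

```cpp
#include <vector>

// Hypothetical instrumentation: every executed statement appends its
// number (as labelled in Fig. 1) to this trace.
std::vector<int> trace;

static void record(int stmt) { trace.push_back(stmt); }

int compare(int x, int y) {
    record(1);                  // 1: function entry
    int f;
    record(2); f = -1;          // 2: difference is negative
    record(3);                  // 3: predicate x==y is evaluated
    if (x == y) {
        record(4); f = 0;       // 4: difference is zero
    }
    record(5);                  // 5: predicate x>y is evaluated
    if (x > y) {
        record(6); f = 1;       // 6: difference is positive
    }
    record(7);                  // 7: return statement
    return f;
}

// Runs compare on one input, stores its result, and returns the
// recorded sequence of executed statement numbers.
std::vector<int> trace_of(int x, int y, int& result) {
    trace.clear();
    result = compare(x, y);
    return trace;
}
```

Calling trace_of(5, 2, r) yields the sequence 1, 2, 3, 5, 6, 7 discussed in the text, whereas trace_of(2, 7, r) skips statement 6, so the value assigned at statement 2 reaches the return statement.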

IV. REDUNDANT COMPUTATION DETECTION ALGORITHM
In this section, we describe the algorithm that detects redundant computations in the execution trace. The program under analysis is executed on some program input, and the execution trace is recorded during its execution. To record an execution trace, the program is automatically instrumented to capture information related to the executed statements, e.g., the statement id and the memory addresses of variables used and defined in statements. The algorithm is based on a modified version of a dynamic slicing algorithm [16]. It uses dynamic dependence analysis to identify data and control dependencies in the execution trace, which are then used to identify the actions that affect at least one output. This analysis is performed until all actions affecting at least one output are identified. Actions that do not affect any output exhibit redundant computation. The algorithm can compute redundant computations for arbitrary procedural programs (e.g., programs with procedures/functions, unstructured constructs, recursion, aliasing, etc.). Before the algorithm is described, we introduce the basic data and control dependence concepts it uses. A use of variable v is an action Y^p in which this variable is referenced. A definition of variable v is an action Y^p that


assigns a value to that variable. Data dependence captures a situation in which one action assigns a value to a variable (memory address) and another action uses that value. The last definition of a variable v at execution position k in execution trace Tx is the closest action Y^p that contains a definition of v such that p < k. For example, in Fig. 2, the last definition of variable x at execution position 4 is action 1^1, because the function entry at node 1 assigns a value to variable x and x is not modified between execution positions 1 and 4. Control dependence captures a situation in which the execution of a statement depends on the evaluation of a test node (i.e., a predicate) of a conditional statement. Control dependence [9] is defined as follows. Let Y and Z be two nodes and (Y, X) be a branch of Y. Node Z postdominates node Y iff Z is on every path from Y to the exit node. Node Z postdominates branch (Y, X) iff Z is on every path from Y to the program exit node e through branch (Y, X). Z is control dependent on Y iff Z postdominates one of the branches of Y and Z does not postdominate Y. For example, in the code segment of Fig. 1, node 4 is control dependent on node 3 because node 4 postdominates branch (3, 4) and node 4 does not postdominate node 3. The general outline of the algorithm that computes redundant computations in execution trace Tx is presented in Fig. 3. The algorithm starts by initializing all actions in Tx as unmarked and not visited. Then, the algorithm identifies all output actions (executed output statements) and sets them as marked and not visited. An output action may be an explicit output statement, e.g., printf. When stand-alone functions are analyzed, e.g., the compare function of Fig. 1, the output statement can be a return statement, or the exit from the function can be considered an output statement for parameters passed by reference. For example, the return statement (node 7) of Fig.
1 is considered an output statement of that code segment. In the while-loop (lines 3-10) of Fig. 3, the algorithm selects a marked but not visited action X^p and sets it as visited. In the next steps (lines 6-8), if the action X^p is not a function entry/call, the actions on which X^p is data dependent are identified by finding the last definition of each variable used in X^p. Procedure Find_LD(v, p) finds and marks the last definition of variable v starting from execution position p. The procedure traverses the trace backwards looking for an action that defines v. When such an action is found, it is marked. If the identified last definition is a function entry and v is a formal parameter of this function, then both the function entry and the function call, which is the predecessor of the function entry, are marked. At line 9, Procedure Mark_Function_Entry(X^p) identifies and marks the function entry to which node (statement) X belongs, together with the corresponding function call. At line 10, Procedure Find_CD(X^p) identifies, by moving backwards, the closest action Z^q on which action X^p is control dependent and marks it. If the marked action Z^q is a predicate of a loop, the subsequent executions of Z up to the last iteration of the loop are also marked. This is done to ensure


that the exit-loop action (i.e., a loop-predicate action at the exit from the loop) is marked. The while-loop iterates until all marked actions are visited.
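The postdominator-based definition of control dependence that Find_CD relies on can be checked on the compare function of Fig. 1. The C++ sketch below is a minimal illustration under stated assumptions: the control-flow edges in compare_cfg are inferred from the listing in Fig. 1, and the function names are hypothetical. It is not the paper's implementation.

```cpp
#include <map>
#include <set>
#include <vector>

using Node = int;
using Graph = std::map<Node, std::vector<Node>>;

// Control-flow graph of compare (Fig. 1); edges are assumed from the listing.
Graph compare_cfg() {
    return {{1, {2}}, {2, {3}}, {3, {4, 5}}, {4, {5}},
            {5, {6, 7}}, {6, {7}}, {7, {8}}, {8, {}}};
}

// pdom[n] holds the nodes that lie on every path from n to the exit node
// (n itself included), computed by iterating to a fixed point.
std::map<Node, std::set<Node>> postdominators(const Graph& g, Node exit) {
    std::set<Node> all;
    for (const auto& kv : g) all.insert(kv.first);
    std::map<Node, std::set<Node>> pdom;
    for (Node n : all) pdom[n] = (n == exit) ? std::set<Node>{exit} : all;
    bool changed = true;
    while (changed) {
        changed = false;
        for (Node n : all) {
            if (n == exit) continue;
            // Intersect the sets of all successors, then add n itself.
            std::set<Node> next = all;
            for (Node s : g.at(n)) {
                std::set<Node> tmp;
                for (Node z : next)
                    if (pdom[s].count(z)) tmp.insert(z);
                next = tmp;
            }
            next.insert(n);
            if (next != pdom[n]) { pdom[n] = next; changed = true; }
        }
    }
    return pdom;
}

// Z is control dependent on Y iff Z postdominates some branch (Y, X)
// of Y and Z does not postdominate Y.
bool control_dependent(const Graph& g, Node exit, Node z, Node y) {
    auto pdom = postdominators(g, exit);
    if (pdom[y].count(z)) return false;      // Z postdominates Y
    for (Node x : g.at(y))
        if (pdom[x].count(z)) return true;   // Z is on every path via (Y, X)
    return false;
}
```

With this graph, control_dependent(g, 8, 4, 3) holds, matching the example in the text that node 4 is control dependent on node 3, while node 5 is not control dependent on node 3 because it postdominates it.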

1  SET all actions in Tx to unmarked and not visited
2  SET all output actions as marked and not visited
   // starts from the end of Tx to go backward
3  WHILE there is marked and not visited action in Tx DO
4     SELECT a marked and not visited action X^p in Tx
5     SET X^p as visited
6     IF X^p is not a function entry/call THEN
7        FOR every used variable v at X^p DO
8           Find_LD(v, p) and SET it as marked   // mark the last definition of v at p
         ENDFOR
9     ELSE Mark_Function_Entry(X^p)              // mark a function entry to which X belongs
      ENDIF
10    Find_CD(X^p) and SET it as marked          // mark an action that has control dependence on X^p
   ENDWHILE
11 DISPLAY all unmarked and not visited actions as redundant computation
Figure 3. Redundant computation detection algorithm.

The algorithm marks actions in the execution trace until all marked actions are visited and no more actions can be identified for marking. All marked actions represent statement executions that affect at least one program output action; therefore, they do not represent redundant computation. All actions that are not marked represent statement executions that do not affect any program output action; therefore, these statement executions represent redundant computations. The set of redundant actions is computed by identifying all unmarked and not visited actions in the execution trace Tx. For example, in the execution trace of Fig. 2, the algorithm first marks the return action 7^6 of the compare function as an output action of the code segment. In the next step, the last definition of variable f, which is used at action 7^6, is identified as action 6^5. Action 6^5 is visited and, because no variables are used at this action, no further data dependence steps are taken for it. In addition, action 5^4, on which action 6^5 is control dependent, is marked. The algorithm continues visiting and marking actions until all marked actions are visited; in this process the function entry at action 1^1 is marked as the last definition of the variables x and y used at action 5^4. Actions 2^2 and 3^3 are not marked by the algorithm of Fig. 3 and are therefore shown in bold. All the remaining actions (not in bold) are marked by the algorithm. Unmarked actions represent redundant executions of statements. The detailed redundant computation algorithm, along with its proof of correctness and complexity, is discussed in [1].
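The marking process can be sketched in C++ as a backward worklist traversal over a recorded trace. This is a simplified illustration, not the algorithm of [1]: the Action fields and function names are assumptions, variables are identified by name rather than by memory address, the function-exit action is left out of the trace, and function entry/call marking (Mark_Function_Entry) and the loop-predicate handling of Find_CD are omitted.

```cpp
#include <string>
#include <vector>

// One executed statement (an "action"); field names are illustrative only.
struct Action {
    int stmt;                       // statement (node) number
    std::vector<std::string> defs;  // variables defined by this action
    std::vector<std::string> uses;  // variables used by this action
    int control_stmt;               // predicate it is control dependent on, 0 if none
    bool is_output;                 // true for output actions (e.g., return)
};

static bool contains(const std::vector<std::string>& v, const std::string& s) {
    for (const auto& e : v)
        if (e == s) return true;
    return false;
}

// Returns the 1-based execution positions of actions that affect no output.
std::vector<int> redundant_positions(const std::vector<Action>& trace) {
    const int n = static_cast<int>(trace.size());
    std::vector<bool> marked(n, false), visited(n, false);
    std::vector<int> worklist;
    for (int i = 0; i < n; ++i)
        if (trace[i].is_output) { marked[i] = true; worklist.push_back(i); }

    while (!worklist.empty()) {
        int p = worklist.back();
        worklist.pop_back();
        if (visited[p]) continue;
        visited[p] = true;
        // Data dependence: mark the last definition of every used variable.
        for (const auto& v : trace[p].uses)
            for (int q = p - 1; q >= 0; --q)
                if (contains(trace[q].defs, v)) {
                    if (!marked[q]) { marked[q] = true; worklist.push_back(q); }
                    break;
                }
        // Control dependence: mark the closest earlier execution of the predicate.
        if (trace[p].control_stmt != 0)
            for (int q = p - 1; q >= 0; --q)
                if (trace[q].stmt == trace[p].control_stmt) {
                    if (!marked[q]) { marked[q] = true; worklist.push_back(q); }
                    break;
                }
    }
    std::vector<int> redundant;
    for (int i = 0; i < n; ++i)
        if (!marked[i]) redundant.push_back(i + 1);
    return redundant;
}

// Trace of Fig. 2 (input x=5, y=2), without the function-exit action.
std::vector<Action> compare_trace() {
    return {
        {1, {"x", "y"}, {}, 0, false},   // 1^1 entry defines the parameters
        {2, {"f"}, {}, 0, false},        // 2^2 f=-1
        {3, {}, {"x", "y"}, 0, false},   // 3^3 x==y
        {5, {}, {"x", "y"}, 0, false},   // 5^4 x>y
        {6, {"f"}, {}, 5, false},        // 6^5 f=1, control dependent on node 5
        {7, {}, {"f"}, 0, true},         // 7^6 return f (output action)
    };
}
```

For the trace of Fig. 2, redundant_positions(compare_trace()) returns positions 2 and 3, i.e., the actions 2^2 and 3^3 that the text identifies as unmarked.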


V. APPLICATION OF REDUNDANT COMPUTATION IN DANGLING POINTERS DETECTION
Debugging is a major activity in software testing and is still a bottleneck in software development. The most difficult part of debugging is localizing the fault [7]. In debugging, programmers are interested in localizing the fault that caused an incorrect output during program execution on a particular program input. Among the faults that cause defects in programs are those related to dynamic memory allocation. Improper use of pointers is common among programmers, and dangling pointers are among the common problems that result from it. Dangling pointers can occur when a function returns a pointer to an automatic variable, or when a program tries to access a deleted object (one released with the delete operator). Such a dynamic memory problem may cause the system to behave incorrectly and produce incorrect results. It is a major cause of incorrect program behavior and is hard to detect. Understanding program behavior is frequently a major factor in efficient debugging [17]. For efficient debugging, debuggers are often interested in narrowing the space they must examine to localize a bug. One approach is to slice the code into small pieces on which debuggers can concentrate to find bugs. Another technique is to filter the amount of information the debugger still has to examine to localize bugs [7]. Redundant computation has been reported as a performance deficiency that may be eliminated to improve a program's performance [3, 4]. Our experience shows that the presence of redundant computation is not desirable. Although its existence may represent a performance defect, in some cases redundant computation is caused by an error in the program; detecting computational redundancy may therefore guide testers in debugging their software.
Redundant computation might be useful for program debugging because it filters the amount of information presented to debuggers as potential sources of program defects. Detecting computational redundancies guides debuggers to the locations in the code that are likely sources of program defects. The existence of redundant computation reflects a situation in which the developer believes that the statement's executions contribute to the functionality of the program, although they do not. In this paper we investigate the application of redundant computation in dangling pointers detection. We have conducted an initial experiment using the Redundant Computation Analysis Tool (RCAT) [2] to identify statement executions that exhibit redundant computation. RCAT computes redundant computations based on an extended version of the algorithm [1] presented in the previous section. To illustrate the application of redundant computation in detecting dangling pointers, consider the sample code in Fig. 4, which reads the students' marks,


computes the average of the marks, and displays the average on the screen. The execution trace for the input size=3, ArrPtr[0]=80, ArrPtr[1]=90, and ArrPtr[2]=70 is shown in Fig. 5. As can be seen in the execution trace of Fig. 5, when this sample code is analyzed using our redundant computation detection tool RCAT, action 3^24 (shown in bold), which represents the execution of the return statement of the function ComputeAvg(float total, int num), is detected as redundant computation. The function ComputeAvg(float total, int num) is responsible for computing the average of the students' marks and returns the memory address of an object that stores the computed average. The pointer average should therefore contain the memory address of an object that holds the computed average. However, the detected redundant computation indicates a problem in the source code: the pointer average contains a memory address that does not point to the object holding the average computed by ComputeAvg(float total, int num). As a result, action 3^24 does not influence the output and hence is redundant computation. When examining the code, we find that the function returns the address of a local (automatic) variable to a pointer at the function call. Such a practice causes a dangling pointer, in which the object is inaccessible to the pointer at the function call. Clearly, the debugger can localize the source of such a defect in the source code with the help of the detected redundant computations.
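The defect described above, returning the address of an automatic variable, can be illustrated in C++. Because the body of Fig. 4 is not reproduced here, the code below is a hypothetical reconstruction: only the signature ComputeAvg(float total, int num) comes from the text, and the function bodies and the names ComputeAvgByValue and ComputeAvgOnHeap are assumptions. The defective pattern is shown only as a comment, since dereferencing its result is undefined behavior.

```cpp
#include <memory>

// Defect pattern described in the text (kept as a comment on purpose):
//
//   float* ComputeAvg(float total, int num) {
//       float avg = total / num;   // automatic (local) variable
//       return &avg;               // address becomes invalid on return
//   }
//
// The caller's pointer then dangles, so the value computed inside the
// function cannot reliably reach the program output.

// Possible repair 1 (assumption): return the value itself.
float ComputeAvgByValue(float total, int num) {
    return total / num;
}

// Possible repair 2 (assumption): return an owning pointer to a heap
// object, which stays valid after the function returns.
std::unique_ptr<float> ComputeAvgOnHeap(float total, int num) {
    return std::unique_ptr<float>(new float(total / num));
}
```

For the input used in the text (marks 80, 90, and 70, so total 240 and num 3), both repaired variants deliver the average 80 to the caller, so the return statement would contribute to the output instead of being redundant.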

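Execution trace (Fig. 5), actions X^p in execution order:
5^1 6^2 7^3 8^4 9^5 10^6 11^7 12^8 13^9 9^10 10^11 11^12 12^13 13^14 9^15 10^16 11^17 12^18 13^19 9^20 14^21 1^22 2^23 3^24 4^25 15^26 16^27

Statements, in execution order: Entry StuAvg(int size); i=0; sum=0; ArrPtr= new int[size]; i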
