Relationship between the Verification Based Model and the Functional Dependences Model Using Program Specification

Safeeullah Soomro and Abdul Baqi

Yanbu University College, Department of Computer Science and Engineering, Yanbu Al-Sinaiyah, Kingdom of Saudi Arabia
[email protected],
[email protected]
Abstract. It is generally agreed that faults are difficult to find and expensive to locate and remove from programs. Hence, automating the debugging task is highly desirable. This paper builds on previous work in automated debugging based on a verification based model, which can detect and localize faults in programs using abstract dependencies. We empirically compare two models, the Functional Dependences Model (FDM) and the Verification Based Model (VBM). The VBM extracts dependencies from the source code of programs; these computed dependencies are compared with the program specification to detect a misbehavior. The functional dependences model uses dependencies to state the correctness of variables at the end of a program run; if a variable value is not correct, the model is used to localize the fault. Both models apply Model-Based Diagnosis to software debugging. In this paper we compare both models with respect to their capabilities of localizing faults in programs. We also present new results for the VBM on large programs, which further justify that the approach can be used in practice.

Keywords: Functional Dependences Model, Model-Based Software Debugging, Model-Based Diagnosis, Abstract Dependencies, Fault Detection and Localization.
1 Introduction
Software debugging is a very important part of software engineering. Without testing and verification, software cannot be reliable. People make mistakes when coding, and testing or verification is required in order to detect and finally remove these anomalies. Unfortunately, both testing and fixing faults are time-consuming and difficult tasks, which are mainly done by hand on the basis of a textual specification. Moreover, because of the increasing pace of the market, which pushes for ever faster product releases, not enough effort is assigned to debugging (including testing, fault localization, and repair). Hence, automation is required in order to provide better software quality.
Authors are listed in reverse alphabetical order.
D.-S. Huang et al. (Eds.): ICIC 2010, LNAI 6216, pp. 535–542, 2010. c Springer-Verlag Berlin Heidelberg 2010
// Specified Dependencies:
// {(a,b), (a,a), (a,i), (b,b), (b,a), (b,i), (c,a), (c,b), (c,i)}
public int FixedPoint () {
1.   int a = 0;        // D(1) = {}
2.   int b = 1;        // D(2) = {}
3.   int i = 0;        // D(3) = {}
4.   int c = a + b;    // D(4) = {(c,a), (c,b)}
5.   while (i > 10) {  // {(a,b), (a,i), (b,i), (c,a), (c,b), (c,i)}
6.     a = b;          // D(6) = {(a,b)}
7.     b = 10;         // should be b = c;  D(7) = {}
8.     c = a + b;      // D(8) = {(c,a), (c,b)}
     }
}
// Computed Dependencies:
// {(a,b), (a,i), (b,i), (c,b), (c,i)}
Fig. 1. The computed dependences for the Fixed Point Computation example
For several decades, automated debugging has been gaining importance in the research community, and various approaches and systems have emerged. Automated debugging has advantages over manual debugging, which is costly, time-consuming, and monotonous. A brief survey of automatic debugging approaches is given in [4]. The author of [4] divides existing automated debugging approaches into three categories and explains these techniques in her survey. All of the approaches have in common that they make use of program analysis.

A technique not mentioned in [4] is the application of Model-Based Diagnosis to software debugging. Model-Based Diagnosis (MBD) is a technique for diagnosing faults based on a model of the system and observations; it was first introduced in [10,5]. The basic idea behind MBD is to use a model of a system directly to compute diagnosis candidates. The prerequisite of MBD is the availability of a logical model of the system, which comprises different components. The outcome of diagnosis is a set of components that may cause an observed unexpected behavior. More recently it has been shown that MBD is not only of use for hardware diagnosis but is also a very effective technique for software debugging [1,2,6,3,11]. In this paper we follow previous research in software debugging based on MBD.

In order to motivate the paper, we briefly introduce the underlying basic approach. Consider the program given in Fig. 1, for which we also specify the expected dependencies. These expected dependencies can be seen as an abstract program specification. A dependency in our case is a relationship between variables: a variable x depends on a variable y if changing y can potentially cause a change in x. The underlying debugging technique makes use of a comparison between the dependencies computed from the program and the specified dependencies.
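To make the comparison idea concrete, the following is a small, hypothetical sketch (not the authors' implementation) of how abstract dependences of straight-line assignments can be collected and checked against a specification, using the first statements of Fig. 1; the function name `assignment_deps` and the abridged `spec` set are assumptions for illustration only.

```python
def assignment_deps(target, sources):
    """Dependences D(s) introduced by an assignment `target = f(sources)`."""
    return {(target, src) for src in sources}

# Statements 1-4 of Fig. 1: a = 0; b = 1; i = 0; c = a + b;
computed = set()
computed |= assignment_deps("a", [])          # D(1) = {}
computed |= assignment_deps("b", [])          # D(2) = {}
computed |= assignment_deps("i", [])          # D(3) = {}
computed |= assignment_deps("c", ["a", "b"])  # D(4) = {(c,a), (c,b)}

# A fault is flagged when the computed dependences do not cover the
# specified ones (an abridged specification, for illustration only).
spec = {("c", "a"), ("c", "b"), ("c", "i")}
missing = spec - computed
print(missing)  # → {('c', 'i')}
```

The difference set `missing` is exactly the information a debugger can report back to the user: a specified dependence the program fails to establish.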
If there is a difference between these two dependence sets, we know that the program is faulty. The rules for computing the dependencies for simple statements, multiple statements, loops, and procedure invocations are explained in more detail in [11,7,12]. For
while loops we make use of a fix-point computation. This is possible because the dependence computation is a monotonic function. Hence, by conceptually replacing a while loop with a large number of nested if-then-else statements, we are able to compute a set of dependencies for while statements.

When using our model [11] for debugging the example from Fig. 1, we obtain three single-fault diagnoses: statements 1, 2, and 7 can be the root cause of the detected differences between the dependences. Statement 4, however, cannot be responsible, for the following reason. Assume statement 4 to be abnormal, i.e., AB(4). From this we derive the dependence D(4) = {(c, ξ4)}, where ξ4 is a placeholder for all possible program variables. The replacement of a ξ-variable with program variables is called grounding. In this case we obtain the following dependence set after grounding: {(a, b), (c, b), (b, i), (a, i), (c, i)}. Hence, the dependencies still differ from the specification, and therefore 4 cannot be a diagnosis candidate. With similar computations we are able to establish 1, 2, and 7 as the diagnosis candidates. It is worth noting that there are other approaches to software debugging based on MBD; in particular, the spectrum of models ranges from abstract models relying on dependences [8] to concrete value-level models.

The paper is organized as follows. In Section 2 we recall our Verification Based Model and present the most important rule for computing dependencies from programs. The comparison between the VBM and the FDM is discussed in Section 3. In Section 4 we present case studies on the verification based model using the ISCAS85 benchmarks. Finally, we summarize the paper.
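The grounding argument used above to rule out statement 4 can be sketched as follows. This is a much-simplified, hypothetical illustration (the names `ground` and `may_explain` and the `others` set are assumptions): it ignores the re-composition of grounded dependences through the rest of the program, which the full VBM performs, and only checks whether a grounded ξ-dependence could possibly cover the specification.

```python
PROGRAM_VARS = {"a", "b", "c", "i"}

def ground(target):
    """All dependences obtainable by grounding (target, xi) over the program variables."""
    return {(target, v) for v in PROGRAM_VARS}

def may_explain(spec, deps_of_other_statements, target):
    """A statement with target `target` can only be a diagnosis candidate if,
    together with some grounding of its xi-dependence, the specified
    dependences can be covered."""
    return spec <= (deps_of_other_statements | ground(target))

# Statement 4 of Fig. 1 has target c. Grounding yields only (c, _) pairs, so
# specified dependences such as (a, a) or (b, b) remain unreachable.
spec = {("a", "b"), ("a", "a"), ("a", "i"), ("b", "b"), ("b", "a"),
        ("b", "i"), ("c", "a"), ("c", "b"), ("c", "i")}
others = {("a", "b"), ("a", "i"), ("b", "i")}  # illustrative remaining deps
print(may_explain(spec, others, "c"))  # False: statement 4 is ruled out
```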
2 Verification Based Model
In this section we briefly recall the previously defined verification based model [11] for fault localization, which is based on the abstract dependencies that are used by the Aspect system [8] for detecting faults. Abstract dependencies are relations between variables of a program. We say that a variable x depends on a variable y iff a new value for y may cause a new value for x. For example, the assignment statement x = y + 1; implies such a dependency relation: every time we change the value of y, the value of x changes after executing the statement.

The VBM uses this way of extracting dependencies from programs: it computes the dependencies from the source code and compares them with the specified ones. If there is a mismatch, the VBM detects a fault and also provides information about the real misbehavior through its automatic process. In comparison with the Aspect system [8], the VBM provides full information about faults and allows localizing those faults automatically, instead of just informing the user about missing dependencies in the program as suggested by [8].

Definition 1 (Composition). Given two dependence relations R1, R2 ∈ DEP on V and M, the composition of R1 and R2 is defined as follows:

R1 • R2 = {(x, y) | (x, z) ∈ R2 ∧ (z, y) ∈ R1}
        ∪ {(x, y) | (x, y) ∈ R1 ∧ ¬∃z : (x, z) ∈ R2}
        ∪ {(x, y) | (x, y) ∈ R2 ∧ ¬∃z : (y, z) ∈ R1}
Composition is used to compute the dependences for a sequence of statements, and the above definition ensures that no information is lost. The first line of the definition handles the case of a transitive dependence. The second line states that all dependences of R1 that are not re-defined in R2 are still valid. The third line keeps all dependences that are defined in R2 and for which there is no transitivity relation. Note that composition is not a commutative operation and that {} is its identity element. We advise the reader to consult the published work [11,7,12,9] for the full set of rules for extracting dependencies with the VBM, covering simple statements, compound statements, loop statements, procedure invocations, and others.
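Definition 1 can be transcribed almost literally for finite dependence relations represented as sets of (target, source) pairs. The following sketch (the function name `compose` is an assumption) takes R1 as the dependences of the earlier statement(s) and R2 as those of the later one:

```python
def compose(r1, r2):
    """R1 . R2 per Definition 1."""
    # Line 1: transitive dependences through a variable defined in R1.
    transitive = {(x, y) for (x, z) in r2 for (z1, y) in r1 if z == z1}
    # Line 2: dependences of R1 whose target is not re-defined in R2.
    keep_r1 = {(x, y) for (x, y) in r1 if all(x2 != x for (x2, _) in r2)}
    # Line 3: dependences of R2 whose source is not a target of R1.
    keep_r2 = {(x, y) for (x, y) in r2 if all(y2 != y for (y2, _) in r1)}
    return transitive | keep_r1 | keep_r2

# x = a; followed by y = x; gives y a transitive dependence on a,
# while (x, a) survives because x is not re-defined.
print(compose({("x", "a")}, {("y", "x")}))  # contains ('x','a') and ('y','a')
```

Composing in the opposite order yields {("y","x"), ("x","a")} instead, which illustrates the non-commutativity noted above; composing with the empty set returns the other argument unchanged, matching the identity-element claim.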
3 Comparison between VBM and FDM
In order to compare both models, we use some example programs where we show the differences between fault localization using the VBM and the FDM. Similar to Reiter's definition of a diagnosis problem [10], a debugging problem is characterized by the given program and its expected behavior. We compute our assumptions from the program specification to find the real causes of a misbehavior of a program.

The model comparison we present in the following relies on a couple of reasonable assumptions. First, for the FDM we need to have a test case judging the correctness of specific variables. In general, finding an appropriate test case revealing a misbehavior w.r.t. specific variables is a difficult task; however, the presence of such a single test case is a requirement for the applicability of the
1  proc (a, b, c, d) { ...
2    x = a + b;
3    y = x + c + d;  // instead of y = x + c
4    assert (y == x + c);
5  } ...
FDM:
¬AB(2) ∧ ok(a) ∧ ok(b) → ok(x)
¬AB(3) ∧ ok(x) ∧ ok(c) ∧ ok(d) → ok(y)
obs: ¬ok(y), ok(a), ok(b)
DIAG = {{AB(2)}, {AB(3)}}

VBM:
SPEC(proc) = {(y, a), (y, b), (y, c)}
dep(proc) = {(y, a), (y, b), (y, c), (y, d), (x, a), (x, b)}
dep(proc) ⊇ SPEC(proc)
DIAG = {}
Fig. 2. A Typical (Structural) Fault Inducing Additional Dependences
1  proc (a, b, c) { ...
2    x = a + b;
3    y = x;  // instead of y = x + c
4    assert (y == a + b + c);
5  } ...
FDM:
¬AB(2) ∧ ok(a) ∧ ok(b) → ok(x)
¬AB(3) ∧ ok(x) → ok(y)
obs: ¬ok(y), ok(a), ok(b)
DIAG = {{AB(2)}, {AB(3)}}

VBM:
SPEC(proc) = {(y, a), (y, b), (y, c)}
dep(proc) = {(y, a), (y, b)}
dep(proc) ⊉ SPEC(proc)
σ(ξ2) = {a, b, c}, σ(ξ3) = {a, b, c}
DIAG = {{AB(2)}, {AB(3)}}
Fig. 3. A Typical (Structural) Fault Inducing Fewer Dependences than Specified
FDM. For the VBM, we assume an underlying assertion language and a mechanism for deducing dependence specifications from this language. Dependences are further oriented according to last-assigned variables and specified in terms of inputs or input parameters rather than intermediate variables. For simplicity, we further assume that there are no disjunctive postconditions.

In the following we illustrate the strengths and weaknesses of the introduced models in terms of simple scenarios. In the figures, the left-hand side summarizes the FDM, including the observations obtained from running the test case, and the right-hand side outlines the VBM. For both columns we summarize the obtained diagnosis candidates in the set DIAG. Note that we only focus on single-fault diagnoses in the following discussion.

By reverting the correctness assumption about statement 2 we obviously can remove the contradiction. Moreover, reverting the assumption about statement 3 also resolves the contradiction. Thus, we obtain the two single-fault diagnoses AB(2) and AB(3). In contrast to this, if y never appears as a target variable, we cannot obtain dependences for the variable y, and thus the VBM cannot localize this kind of (structural) fault.

The next example points out that the VBM also fails in case the fault introduces additional dependences. In Figure 2 we assign x + c + d instead of x + c to the variable y. Our assertion indicates that y depends upon x and c; thus SPEC(proc) = {(y, a), (y, b), (y, c)}. Computing the program's actual dependences dep(proc), however, yields {(y, a), (y, b), (y, c), (y, d)} ⊇ {(y, a), (y, b), (y, c)}, and thus the VBM can neither detect this specific malfunctioning nor locate the misbehavior's cause. By employing the FDM under the assumption ¬ok(y), we obtain the two single-fault diagnoses AB(2) and AB(3).

Figure 3 illustrates an example where the fault manifests itself in inducing fewer dependences than specified.
Our specification is SPEC(proc) = {(y, a), (y, b), (y, c)}. Obviously, the computed dependences {(y, a), (y, b)} ⊉ SPEC(proc), so the VBM detects the fault; as the figure outlines, it obtains the two single-fault diagnosis candidates AB(2) and AB(3). In this case the FDM is also capable of delivering the misbehavior's real cause; it likewise returns the two single-fault diagnosis candidates AB(2) and AB(3).
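The FDM reasoning of Fig. 2 can be reproduced with a small consistency check. The following is a hypothetical sketch (the names `RULES`, `derive_ok`, and `single_fault_diagnoses` are assumptions): each statement s contributes a rule ¬AB(s) ∧ ok(inputs) → ok(target), and AB(s) is a single-fault diagnosis iff, under that assumption, the conflicting fact ok(y) is no longer derivable from the observations.

```python
RULES = {2: (["a", "b"], "x"),        # x = a + b
         3: (["x", "c", "d"], "y")}   # y = x + c + d

def derive_ok(abnormal):
    """Forward-chain ok() facts under the given abnormality assumptions."""
    ok = {"a", "b", "c", "d"}  # inputs observed to be correct
    changed = True
    while changed:
        changed = False
        for stmt, (ins, out) in RULES.items():
            if stmt not in abnormal and out not in ok and all(v in ok for v in ins):
                ok.add(out)
                changed = True
    return ok

def single_fault_diagnoses():
    """AB(s) is a diagnosis iff ok(y) is not derivable, matching obs ¬ok(y)."""
    return [s for s in RULES if "y" not in derive_ok({s})]

print(single_fault_diagnoses())  # [2, 3]
```

With no abnormality assumption, ok(y) is derived and contradicts the observation ¬ok(y); assuming either AB(2) or AB(3) breaks the derivation chain, giving exactly the DIAG sets shown in the figures.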
Fig. 4. Results Analysis from ISCAS85 Benchmarks
The examples outlined above should have made clear that a comparison of both models in terms of their diagnostic capabilities inherently depends on how we deduce observations from violated properties. Note that the FDM itself cannot detect any faults; rather, faults are detected by evaluating the assertions on the values obtained from a concrete test run.
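The deduction of a dependence specification from an assertion can itself be sketched: every variable occurring in the asserted expression is mapped back to the input parameters it was computed from. The function name `spec_from_assertion` and the mapping argument are assumptions for illustration, not the paper's actual mechanism.

```python
def spec_from_assertion(target, expr_vars, inputs_of):
    """inputs_of maps an intermediate variable to the inputs it depends on;
    input parameters simply map to themselves."""
    spec = set()
    for v in expr_vars:
        for inp in inputs_of.get(v, {v}):
            spec.add((target, inp))
    return spec

# Fig. 2: assert (y == x + c), where x was computed from inputs a and b.
print(spec_from_assertion("y", ["x", "c"], {"x": {"a", "b"}}))
```

For the Fig. 2 assertion this yields {(y, a), (y, b), (y, c)}, the SPEC(proc) used in the comparison, oriented toward inputs rather than the intermediate variable x.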
4 Case Studies from the ISCAS'85 Benchmark
We implemented the proposed modeling approach for the programming language Java employing an Eclipse plug-in based framework. We conducted our experimental evaluation solely with large programs of up to 8000 lines of code. We evaluated 10 programs and introduced a (single) structural fault on the right-hand side of an arbitrarily chosen statement in every program. We carried out the evaluation on an Intel Pentium 4 workstation (3 GHz, 520 MB). Whenever the given dependences point out erroneous behavior, our lightweight model localizes the real cause of the misbehavior; that is, the diagnosis candidates contain the statement where the fault has been introduced.

We converted the well-known VHDL programs from ISCAS 85 into Java for our experiments, respecting the required sequential order. In the conversion process we used a method that depends upon variable definition and usage: first we considered the definitions of variables and then their usage in the statements of the programs. Note that we use the superset operator for the comparison between computed dependences and program specification, COMP ⊇ SPEC, instead of the logical equivalence COMP = SPEC, to find faults in the programs.

Table 1 presents the first empirical results, obtained from the ISCAS 85 programs, which indicate that this model is also able to localize faults in large programs. The first column shows the program names. The second column shows the total possible fault locations for the programs c17 to c7552. The third column shows the
number of input variables of the programs. The fourth column is a multicolumn that reports the number of diagnosis candidates using output variables and all variables. The fifth column, also a multicolumn, shows the percentage of code that can be excluded by relying on this verification based model.

Table 1. Results Obtained From ISCAS85 Programs
Prog.     TL   INPV    No. diag         %
                      OUTV  ALLV   OUTV  ALLV
c17       17     11      7     4   58.8  76.4
c432     356    196     37     3   89.6  99.1
c499     445    243     29    28   93.4  93.7
c880     826    443     11     3   98.6  99.6
c1355   1133    587     69    64   93.9  94.3
c1908   1793    913      6     6   99.6  99.6
c2670   2620   1426      9     4   99.6  99.8
c3540   3388   1719     10     3   99.7  99.9
c5135   4792   2485     24     3   99.4  99.9
c6288   4862   2448    141   141   97.0  97.0
c7552   7231   3719     11     3   99.8  99.9
The columns of Table 1 are as follows: Prog is the name of the program, TL is the number of total possible fault locations, No. diag is the number of diagnosis candidates, INPV is the number of input variables, OUTV denotes results obtained using only the output variables, and ALLV denotes results obtained using all variables; the % columns give the amount of code that can be excluded.

In Fig. 4 we show, for all programs, the number of diagnoses depending on output variables and all variables, as presented in Table 1. We used only one iteration for every program, once with the output variables and once with all variables. The green bar shows the diagnoses obtained using the output variables, and the red bar shows the diagnoses obtained using all variables. Fig. 4 thus compares the diagnoses with respect to the output variables and all variables: the more variables are used in the specification, the smaller the number of diagnoses becomes, and vice versa. The results indicate that our approach is feasible for detecting and localizing the real cause of a misbehavior.
5 Conclusion and Future Research
In this article we presented a comparison of the verification based model with the well-known functional dependences model [6] for detecting faults using test cases and specification knowledge. This knowledge is extracted directly from the source code of programs. We gave several examples where both models can detect faults in programs, and we discussed the relationship and the limitations of both models using
example programs. After the comparison we presented the most recent empirical results for the verification based model. In future research we aim to extend the verification based model in order to handle object-oriented programs.
References

1. Console, L., Friedrich, G., Theseider, D.: Model-Based Diagnosis Meets Error Diagnosis in Logic Programs. In: Proceedings of IJCAI'93, Chambery, pp. 1494–1499 (August 1993)
2. Stumptner, M., Wotawa, F.: Debugging Functional Programs. In: Proceedings of IJCAI'99, Stockholm, Sweden, pp. 1074–1079 (August 1999)
3. Stumptner, M., Wotawa, F.: Jade – Java Diagnosis Experiments – Status and Outlook. In: IJCAI'99 Workshop on Qualitative and Model Based Reasoning for Complex Systems and their Control, Stockholm, Sweden (1999)
4. Ducassé, M.: A pragmatic survey of automatic debugging. In: Fritzson, P.A. (ed.) AADEBUG 1993. LNCS, vol. 749, pp. 1–15. Springer, Heidelberg (1993)
5. de Kleer, J., Williams, B.C.: Diagnosing multiple faults. Artificial Intelligence 32(1), 97–130 (1987)
6. Friedrich, G., Stumptner, M., Wotawa, F.: Model-based diagnosis of hardware designs. Artificial Intelligence 111(2), 3–39 (1999)
7. Wotawa, F., Soomro, S.: Using abstract dependencies in debugging. In: Proceedings of the Qualitative Reasoning Workshop QR-05 (2005)
8. Jackson, D.: Aspect: Detecting Bugs with Abstract Dependences. ACM Transactions on Software Engineering and Methodology 4(2), 109–145 (1995)
9. Peischl, B., Wotawa, F., Soomro, S.: Towards Lightweight Fault Localization in Procedural Programs. In: Proceedings of IEA/AIE (2006)
10. Reiter, R.: A theory of diagnosis from first principles. Artificial Intelligence 32(1), 57–95 (1987)
11. Soomro, S.: Using abstract dependences to localize faults from procedural programs. In: Proceedings of AIA, Innsbruck, Austria, pp. 180–185 (2007)
12. Soomro, S., Wotawa, F.: Detect and Localize Faults in Alias-free Programs using Specification Knowledge. In: Proceedings of IEA/AIE (2009)