Efficient Automatic Program Repair Using Function-Based Part-Execution
Yuhua Qi, Xiaoguang Mao, Ziying Dai
School of Computer, National University of Defense Technology, Changsha, China
{yuhua.qi, xgmao}@nudt.edu.cn, [email protected]
Yudong Qi
School of Computer, Naval Aeronautical and Astronautical University, Yantai, China
[email protected]
Abstract: As an emerging paradigm for automated debugging of software systems, automatic program repair plays an increasingly important role in software development. Although there are currently many approaches for automatically repairing errors, they do not work well because validation is time-consuming, especially when the faulty program comes with long-running test cases. To reduce the testing cost, we present the technique of function-based part-execution (FPE), by which only the key code, instead of the whole patched program, is executed. In addition, the invariant detection technique is applied to predict imminent program failure from an incomplete execution. A controlled experiment on a real bug in the PHP program shows that our approach performs much better than the original Genprog, a state-of-the-art approach to automatic program repair.

Keywords: program repair; validation; invariant

I. INTRODUCTION

Fixing bugs in deployed software is a tedious, time-consuming process. According to previous statistics, up to 90% of development resources are spent on software maintenance [1]. Some promising studies have been done on automatic program repair [2-4]. Generally, repair has three phases: first, locate the fault automatically or manually; second, automatically generate a patch for the fault; finally, validate the patch with test cases. Much research in recent years has focused on the first two phases, that is, on how to locate the fault and how to generate candidate patches. However, little attention has been paid to the third phase: how to validate a candidate patch efficiently. Traditional validation testing has two main components: recompiling code and executing test cases. Although in our previous work we used the weak recompilation (WR) technique to reduce the recompilation cost, the testing cost has still not been addressed efficiently. To check whether a candidate patch is valid, both the negative test cases, which encode the fault to be repaired, and the positive test cases, which characterize the required behavior of the program, must be executed. If some of these test cases are long-running, the time spent on validation may be excessive or even unacceptable.

In this paper, we present the function-based part-execution (FPE) technique to reduce the time spent executing long-running test cases during validation. Executing some long-running test cases can be very time-consuming. However, for the same test case, the behavior before the bug is activated is similar in the original program and the patched program. Thus, during validation we only need to run the program from the location where the code is changed, instead of running it all over again. We use the Checkpoint/Restart technique [5] to implement this insight. In addition, we apply the invariant detection technique [6] to report some impending failures without executing the program to the end. Before validation, we identify invariants that hold on the suspicious functions in the positive test cases. Then, during the execution of each test case, we monitor the program and kill it as soon as all these functions have been executed completely. Finally, we check whether the invariants are violated. If any violation is observed, the program fails the invariant testing; otherwise it passes, and we immediately continue with the next test case.

In summary, this paper makes the following contributions: (1) we leverage both the Checkpoint/Restart and the invariant detection techniques to implement FPE and to predict impending failures; (2) to verify our idea, we have implemented a new approach that can automatically repair programs with high efficiency.
II. FRAMEWORK IMPLEMENTATION

A. Overview

Figure 1 gives an overview of our technical approach, which is implemented on top of Genprog, and shows how it works. The framework has two stages.

1) Preparation: Before the repair, the following information must be prepared: suspicious functions, which narrow down and identify the scope of the defective code; context files, which save the states from which the program is restarted; and invariants, which represent the properties holding for all positive test cases. As shown in Figure 1, this information is obtained from the outputs of the corresponding components, which are described in detail in the sections where the information is used.

2) Automatic repair: Taking the outputs of the preparation stage as inputs, this stage repeatedly generates and validates candidate patches until a valid one is found. Following a genetic algorithm, the Genprog component first generates a candidate patch by modifying the suspicious functions, and the validating component then validates it. Because the valid patch is obtained by trial and error, this stage is labeled "Loop" in Figure 1.

[Figure 1 diagram; recoverable labels: Genprog, Loop, Validating, FPE, Suspicious Functions, Restart, Execution, Positive Testcases, Invariant Identification, Invariant Detect Tool.]
Figure 1. The framework of our approach.

978-1-4673-5000-6/13/$31.00 ©2013 IEEE

B. Fault localization

In the framework described in Figure 1, we do not prescribe a specific approach for the fault localization component. We assume that the defective code has already been tracked down with existing fault localization techniques. All we need to do is extract the functions in which the defective code is located.

C. Genprog

In our approach, we repeatedly generate candidate patches with Genprog, a generic approach to automatic software repair [7], until a valid patch is obtained. Based on the hypothesis that important functionality missing from a program at one location probably exhibits correct behavior at another location, Genprog has successfully fixed many bugs in deployed, legacy C programs (e.g., PHP, TIFF) by copying code from another location or modifying existing code. Because of the scalability problem of traditional validation, we use Genprog only to generate candidate patches.

D. Validating the patch

When the Genprog component generates a candidate patch, we need to check whether the patch is valid. As explained in the preceding sections, most current approaches to automatic program repair that rely on traditional validation suffer from a scalability problem. To address it, we apply two techniques in the validation process: WR, which reduces recompilation time, and FPE, which reduces the execution time of long-running test cases. This section mainly describes the concrete implementation of the FPE technique.

1) WR: To reduce the recompilation cost, we use the WR technique to recompile only the altered code, which makes recompilation efficient.

2) FPE: For long-running test cases, we reduce the execution time by executing only the key code, that is, the code with an important influence on the abnormal behavior of the program. We implement this with a combination of two methods: restarting the program from a state saved in advance at the point just before the suspicious functions execute, and killing the program when the key code finishes executing. The former is implemented with the Checkpoint/Restart technique; for the latter, we add an exit(0) statement at the end of the key code area in the source code.

a) Checkpoint/Restart: Checkpoint/Restart is an operating system component that provides checkpoint and restart services for programs: the checkpoint service creates a file describing a running program, and the restart service can later reconstruct the program from the contents of that file and continue executing it. We adopt this technique to reduce the execution time of long-running test cases during validation. Since our approach repairs the program by modifying only the suspicious functions, the patched program and the original program behave identically on the same test case until the suspicious functions execute. Consider, then, long-running test cases for large programs such as databases and web servers. For each such test case, in the preparation stage we use Checkpoint/Restart to save the program state at the point before the suspicious functions execute; when a candidate patch is validated against that test case in the automatic repair stage, we restart the program from the saved state and continue executing it, rather than executing it from the beginning.
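To make the save-once, resume-many idea concrete, here is a minimal sketch. It is not BLCR and does not use its API: it swaps in simple application-level checkpointing, in which the state the long-running prefix has built up (modeled by a hypothetical `prefix_state` struct) is written to a context file once and reloaded before each validation run. All names (`prefix_state`, `save_context`, `restore_context`, `suspicious_function`) are illustrative assumptions.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical state that the long-running prefix of a test case has
 * built up by the time the first suspicious function is called. */
struct prefix_state {
    long requests_served;
    int  input_value;
};

/* Preparation stage: after running the expensive prefix once, write its
 * state to a context file (an application-level stand-in for a BLCR
 * checkpoint, which would save the whole process image instead). */
static int save_context(FILE *ctx, const struct prefix_state *st)
{
    assert(ctx != NULL && st != NULL);
    if (fwrite(st, sizeof *st, 1, ctx) != 1)
        return -1;
    return fflush(ctx) == 0 ? 0 : -1;
}

/* Automatic repair stage: reload the saved state instead of re-running
 * the prefix for every candidate patch. */
static int restore_context(FILE *ctx, struct prefix_state *st)
{
    assert(ctx != NULL && st != NULL);
    rewind(ctx);
    return fread(st, sizeof *st, 1, ctx) == 1 ? 0 : -1;
}

/* Toy suspicious function: the only code a candidate patch modifies,
 * so it is the only code that has to run after the restore. */
static int suspicious_function(const struct prefix_state *st)
{
    return st->input_value * 2;
}
```

A harness would call `save_context` once in the preparation stage, then `restore_context` followed by the patched `suspicious_function` for each candidate. Because every candidate changes only the suspicious functions, the saved prefix state stays valid for all of them. A system-level tool such as BLCR avoids having to enumerate that state by hand, which is what makes it practical for large programs such as PHP.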
b) BLCR: For the FPE component, we use the Berkeley Lab Checkpoint/Restart (BLCR) library [8] to provide the checkpoint and restart services. To checkpoint the target program precisely at the point before the suspicious code executes, we have written a C file named "checkpoint_self.c" based on the API functions provided by BLCR; it provides an interface through which the running program can request a checkpoint of itself. Specifically, we add a call to checkpoint_self(), an interface function from checkpoint_self.c, in the source code immediately before the first suspicious function is called, and then recompile the modified program with the BLCR library. When checkpoint_self() is called, the program, running under Valgrind, terminates itself and saves its current state to a context file; later, when a patch is validated, we read the context file to restore this state and continue executing the program under Valgrind.
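The exit(0) placed after the key code means the validator only has to wait until the key code finishes. The POSIX sketch below imitates that behavior with `fork`, a pipe, and `_exit(0)` standing in for the instrumented program. It is a toy under stated assumptions, not the authors' harness: `key_function` and `run_key_code_only` are hypothetical names, and the child process plays the role of a program restarted just before the key code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Toy key code: a (possibly patched) suspicious function. */
static int key_function(int x) { return x + 1; }

/* Runs one test case in a child process. The child executes only the
 * key code, sends its result to the parent through a pipe, and then
 * calls _exit(0), playing the role of the exit(0) statement that FPE
 * adds after the key code; the long-running remainder of the test case
 * is therefore never executed. Returns false on any error. */
static bool run_key_code_only(int input, int *result)
{
    assert(result != NULL);
    int fds[2];
    if (pipe(fds) != 0)
        return false;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return false;
    }
    if (pid == 0) {                    /* child: the test execution */
        close(fds[0]);
        /* A restarted checkpoint would resume here, just before the key code. */
        int out = key_function(input);
        ssize_t w = write(fds[1], &out, sizeof out);
        _exit(w == (ssize_t)sizeof out ? 0 : 1);   /* stop after key code */
    }

    close(fds[1]);                     /* parent: the validator */
    int out = 0;
    ssize_t r = read(fds[0], &out, sizeof out);
    close(fds[0]);

    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return false;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return false;
    if (r != (ssize_t)sizeof out)
        return false;
    *result = out;
    return true;
}
```

The child uses `_exit` rather than `exit` so that stdio buffers inherited from the parent are not flushed a second time; in the real setup the instrumented program simply calls exit(0) in its own process.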
c) Kill program: Generally, some common properties should hold in a program; these properties are called invariants. We can obtain invariants by learning from the execution traces of positive test cases. When a failure occurs, it is very likely that one or more of these properties are violated, and many failures can be predicted by detecting invariant violations in the key code [9]. Therefore, by adding an exit(0) statement immediately after the key code (the suspicious functions) in the source code, we abort the program as soon as the key code has been completely executed, and then predict the impending failure by checking for invariant violations.

3) Check invariants: As described above, because the program is not executed to the end, a new way to predict impending failure is required. We address this by checking invariants for the functions included in the key code. Each invariant is represented as a logical formula that is always satisfied at the entry or exit of these functions, and these invariants can be identified by analyzing the execution traces of positive test cases.

a) Daikon: We use Daikon [6] to identify and check invariants. Daikon consists of two components: a front end that extracts execution trace data from a running program, and an inference engine that analyzes the trace data to infer invariants. Daikon can observe only the functions in which we are interested and ignore the rest of the program, which greatly improves the efficiency of the analysis. Furthermore, Daikon comes with a set of tools for printing, merging, and checking invariants, and for similar tasks that help manipulate invariants.

b) Invariant identification: Before invariants can be checked, the invariants holding at the suspicious code must first be identified; this is done in the preparation stage. First, decide which kinds of invariants to identify. Reporting expressive and useful invariants is Daikon's primary goal, but which invariants are reported depends on configurations that can be set manually. In our framework, since we are interested in the key code, Daikon only needs to check for invariants in the key code. In fact, for C programs Daikon checks for invariants only at function entries and exits. Hence, only invariants that hold at the entries and exits of the functions included in the key code are identified. Second, execute the program with the positive test cases and record the execution trace data with Kvasir, one of the C/C++ front ends provided by Daikon. For a long-running program, recording all the trace data is time-consuming and unnecessary, so to reduce performance overhead we record only the execution trace data of the key code. Finally, identify invariants by analyzing these execution trace data.

c) Predict failure: We predict whether a failure is about to occur by checking each sample in the data trace file against each of the invariants. As in invariant identification, we use Kvasir to record a data trace of the patched suspicious functions (the key code). Then the InvariantChecker program, which is released with Daikon, checks the consistency between the invariants and the trace: taking a set of invariants identified by Daikon and the data trace, it checks each sample in the trace against each invariant, and if any violation is found, it prints an error message to the standard output or to a specified output file. In the checking invariants component, if any error message is observed after a test case executes, the patched program fails that test case.

4) Assure: If the patched program passes all invariant testing, we assure the patch with the traditional testing used by Genprog. Because of the limitations of the invariant detection tool, some key invariants that hold on the positive test cases may not be identified. Violations of such invariants are then not reported by InvariantChecker, which yields false positives. Hence, to guarantee that a candidate patch that has passed the invariant testing is actually valid, the traditional testing is performed. If the patch passes all test cases again, it is assured; otherwise we discard it and repeatedly generate and validate new patches until a valid one is found.

III. CONTROLLED EXPERIMENT

PHP, an interpreter for a web-application scripting language, is popular and widely used. However, version 5.2.1 contains a bug [7]. We select three positive test cases and one long-running negative test case. The positive test cases are requests that the original PHP handles correctly, and the negative test case calls the defective function, which triggers an integer overflow fault. Moreover, to demonstrate that our approach scales well to long-running test cases, we use the sleep function to simulate long-running effects whose duration is set by the sleeptime variable.
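The paper does not show how the delay was injected. The sketch below is one minimal way, assuming a POSIX system, to model a prefix whose length is controlled by sleeptime; `long_running_prefix` and `time_prefix` are hypothetical names. Under full execution, every validation of every candidate patch pays this delay again (compare the Bad test case column for Genprog in Table 1), whereas with FPE the checkpoint is taken after it, so it is paid only once in the preparation stage.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical long-running prefix of the negative test case: the delay
 * stands in for all the work done before the defective function runs. */
static void long_running_prefix(unsigned sleeptime)
{
    unsigned left = sleeptime;
    while (left > 0)
        left = sleep(left);   /* sleep() returns the unslept seconds */
}

/* Wall-clock seconds spent in the prefix for a given sleeptime. */
static double time_prefix(unsigned sleeptime)
{
    struct timespec t0, t1;
    int rc = clock_gettime(CLOCK_MONOTONIC, &t0);
    assert(rc == 0);
    long_running_prefix(sleeptime);
    rc = clock_gettime(CLOCK_MONOTONIC, &t1);
    assert(rc == 0);
    (void)rc;
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```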
TABLE 1. Experimental results on PHP, a program with 764489 lines of code. We performed 100 trials for each row and report averages for the successful trials. Compile, Good test cases, Bad test case, and Total give the time to validate one patch.

| Sleeptime (s) | Approach | Compile (s) | Good test cases (s) | Bad test case (s) | Total (s) | Others (s) | Sum (s) |
|---|---|---|---|---|---|---|---|
| 0 | Our approach | 0.765 | 0.041 | 0.748 | 1.554 | 3.487 | 7.730 |
| 0 | Genprog | 19.422 | 0.128 | 0.051 | 19.601 | 5.218 | 76.663 |
| 5 | Our approach | 0.763 | 0.043 | 0.747 | 1.553 | 3.488 | 7.727 |
| 5 | Genprog | 19.422 | 0.130 | 5.082 | 24.634 | 5.220 | 94.981 |
| 10 | Our approach | 0.764 | 0.130 | 0.747 | 1.641 | 3.496 | 7.743 |
| 10 | Genprog | 19.448 | 0.132 | 10.111 | 29.691 | 5.235 | 113.403 |
| 15 | Our approach | 0.766 | 0.045 | 0.747 | 1.558 | 3.499 | 7.753 |
| 15 | Genprog | 19.440 | 0.132 | 15.140 | 34.712 | 5.229 | 131.677 |
| 20 | Our approach | 0.766 | 0.046 | 0.749 | 1.561 | 3.500 | 7.764 |
| 20 | Genprog | 19.462 | 0.133 | 20.157 | 39.752 | 5.228 | 150.058 |
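As a consistency check on Table 1, the Total column equals Compile + Good test cases + Bad test case in every row, and the ratios quoted in the discussion of the results follow from those totals. The sketch below recomputes them for the rows that discussion uses; the struct and function names are illustrative only.

```c
#include <assert.h>

/* Per-patch validation times from Table 1, in seconds. */
struct patch_time {
    double compile;
    double good;   /* good (positive) test cases */
    double bad;    /* bad (negative) test case */
};

/* The Total column: compile + good test cases + bad test case. */
static double total_time(struct patch_time t)
{
    assert(t.compile >= 0.0 && t.good >= 0.0 && t.bad >= 0.0);
    return t.compile + t.good + t.bad;
}

/* Ratio of two per-patch totals, e.g. Genprog time over our time. */
static double ratio(struct patch_time slow, struct patch_time fast)
{
    return total_time(slow) / total_time(fast);
}

static double abs_diff(double a, double b) { return a > b ? a - b : b - a; }

/* Rows of Table 1 used in the discussion of the results. */
static const struct patch_time OURS_SLEEP20    = { 0.766, 0.046, 0.749 };
static const struct patch_time GENPROG_SLEEP20 = { 19.462, 0.133, 20.157 };
static const struct patch_time GENPROG_SLEEP0  = { 19.422, 0.128, 0.051 };
```

Here `ratio(GENPROG_SLEEP20, OURS_SLEEP20)` is about 25.5 and `ratio(GENPROG_SLEEP20, GENPROG_SLEEP0)` about 2.03, which is where the "over 25 times" and "over 2 times" figures come from.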
To demonstrate the scalability of our approach, we assign the distinct values 0, 5, 10, 15, and 20 to the sleeptime variable, which simulate long-running test cases with different delays. For comparison, we also try to fix the bug with the original Genprog. The experimental context is the same as that described in Section 4.1. All the experimental results for this example are listed in Table 1. Clearly, our approach scales significantly better than Genprog. With our approach, the time to validate one patch is relatively stable (between 1.553 and 1.641 seconds). In contrast, the corresponding time for the original Genprog increases rapidly as sleeptime grows. In particular, when sleeptime is set to 20, Genprog takes approximately 39.752 seconds to validate one candidate patch, which is over 2 times its time when sleeptime is 0 (19.601 seconds) and over 25 times the time of our approach (1.561 seconds).

IV. RELATED WORK

A number of studies try to repair defective programs at the source code level in different ways. Guided by evolutionary computation, Genprog can repair programs without any specifications [10]. AutoFix-E can repair programs but requires contracts in the form of pre- and post-conditions [4]. JAFF tries to automatically correct faulty Java programs using an evolutionary approach; its repair effectiveness was not reported on real-world software with real bugs [11]. AFix focuses on repairing single-variable atomicity violations [12]. PHPRepair can automatically fix HTML generation errors in PHP applications through string constraint solving [13].

V. CONCLUSION

Current approaches to automatic program repair do not scale well when there are long-running test cases. To address this problem, we present the FPE technique to reduce the time spent executing long-running test cases, and we implement an automatic program repair approach that incorporates it. The controlled experimental results show that our approach performs much better than the original Genprog.

ACKNOWLEDGMENT

This research was supported in part by the National Natural Science Foundation of China (Grant Nos. 90818024, 91118007), the National High Technology Research and Development Program of China (Grant Nos. 2011AA010106, 2012AA011201), and the Program for New Century Excellent Talents in University.

REFERENCES

[1] R. C. Seacord, D. Plakosh, and G. A. Lewis, Modernizing Legacy Systems: Software Technologies, Engineering Processes, and Business Practices. Addison-Wesley Professional, 2003.
[2] W. Weimer, S. Forrest, C. Le Goues, and T. Nguyen, "Automatic program repair with evolutionary computation," Communications of the ACM, vol. 53, pp. 109-116, 2010.
[3] A. Arcuri, "On the automation of fixing software bugs," in International Conference on Software Engineering, 2008.
[4] Y. Wei, Y. Pei, C. A. Furia, L. S. Silva, S. Buchholz, B. Meyer, et al., "Automated fixing of programs with contracts," in International Symposium on Software Testing and Analysis, 2010.
[5] E. Roman, "A survey of checkpoint/restart implementations," Citeseer LBNL-54942, 2002.
[6] M. D. Ernst, J. H. Perkins, P. J. Guo, S. McCamant, C. Pacheco, M. S. Tschantz, et al., "The Daikon system for dynamic detection of likely invariants," Science of Computer Programming, vol. 69, pp. 35-45, 2007.
[7] C. Le Goues, T. Nguyen, S. Forrest, and W. Weimer, "GenProg: A generic method for automated software repair," IEEE Transactions on Software Engineering, vol. 38, pp. 54-72, 2012.
[8] P. H. Hargrove and J. C. Duell, "Berkeley Lab Checkpoint/Restart (BLCR) for Linux clusters," Journal of Physics: Conference Series, vol. 46, pp. 494-499, 2006.
[9] J. Perkins, S. Kim, S. Larsen, S. Amarasinghe, J. Bachrach, M. Carbin, et al., "Automatically patching errors in deployed software," in Symposium on Operating Systems Principles, 2009, pp. 87-102.
[10] W. Weimer, T. V. Nguyen, C. Le Goues, and S. Forrest, "Automatically finding patches using genetic programming," in International Conference on Software Engineering, 2009, pp. 364-374.
[11] A. Arcuri, "Evolutionary repair of faulty software," Applied Soft Computing, vol. 11, pp. 3494-3514, 2011.
[12] G. Jin, L. Song, W. Zhang, S. Lu, and B. Liblit, "Automated atomicity violation fixing," in the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, 2011.
[13] H. Samimi, M. Schäfer, S. Artzi, T. Millstein, F. Tip, and L. Hendren, "Automated repair of HTML generation errors in PHP applications using string constraint solving," in International Conference on Software Engineering, 2012, pp. 277-287.