Modeling the Reliability of Existing Software using Static Analysis

Walter W. Schilling, Jr., Member, IEEE
Electrical Engineering and Computer Science Department
The University of Toledo, Toledo, Ohio 43615
Email: [email protected]

Dr. Mansoor Alam, Member, IEEE
Electrical Engineering and Computer Science Department
The University of Toledo, Toledo, Ohio 43615
Email: [email protected]

Abstract— Software reliability represents an increasing risk to overall system reliability. As systems become larger and more complex, mission critical and safety critical systems have had increasing functionality controlled exclusively through software. This change, coupled with generally increasing reliability in hardware modules, has resulted in a shift of the root cause of system failure from hardware to software. Market forces, including time-to-market pressure, reduced development team sizes, and other factors, have encouraged projects to reuse existing software as well as to purchase COTS software solutions. This has made the usage of the more than 200 existing software reliability models increasingly difficult. Traditional software reliability models require significant testing data to be collected during software development in order to estimate software reliability. If this data is not collected in a disciplined manner or is not made available to software engineers, these modeling techniques cannot be applied. It is imperative that practical reliability modeling techniques be developed to address these issues. It is on this premise that an appropriate software reliability model combining static analysis of existing source code modules, limited testing with path capture, and Bayesian Belief Networks is presented. Static analysis is used to detect faults within the source code which may lead to failure. Code coverage is used to determine which paths within the source code are executed as well as how often they execute. Finally, a Bayesian Belief Network is used to combine these parameters and estimate the resulting software reliability.

I. INTRODUCTION

Traditional software reliability models require significant data collection during development and testing, including the operational time between failures, the severity of the failures, and other metrics. This data is then applied to the project to determine if an adequate software reliability has been achieved. While these methods have been applied successfully to many programs, there are occasions when the failure data has not been collected in an adequate fashion to obtain relevant results. This is often the case when reusing previously developed software or purchasing COTS components. This poses a dilemma for a software engineer wishing to reuse a piece of software or purchase from a vendor.

As software does not suffer from age-related failure, all faults which lead to failure are present when the software is released. In a purely theoretical sense, if all faults can be detected in the released software, and these faults can then be assigned a probability of manifesting themselves during software operation, an appropriate estimation of the software reliability can be obtained. The difficulty of this theoretical concept is reliably detecting the software faults and assigning the appropriate failure probabilities. While neither perfect nor guaranteed, static analysis has been shown to be practical and extremely effective at detecting faults. It is believed that it is possible to develop a reliability model based upon static analysis, limited testing, and Bayesian Belief Networks.

Static analysis is a technique commonly used during implementation and review to detect software implementation errors. It has been used in mission critical source code development, such as the aircraft [1] and rail transit [2] areas, and has been shown to reduce software defects by a factor of six [3], as well as detect 60% of post-release failures [4]. Static analysis has been shown to outperform other quality assurance methods, such as model checking [5], and can detect errors such as buffer overflows [6], security vulnerabilities [7] [8], memory leaks [9], timing anomalies [10], and dead or unused code [11], as well as other common programming mistakes. Giesen [12] provides an overview of the concept of static analysis, including the philosophy and practical issues related to static analysis. Nagappan et al. [13] discuss the application of static analysis to a large scale industrial project and demonstrate a statistically significant relationship between the faults detected during automated inspection and the actual number of field failures.

II. UNDERSTANDING FAULTS AND FAILURES

It is often the case that the terms fault and failure are used interchangeably. This is incorrect, as each term has a distinct and specific meaning. Unfortunately, there are multiple definitions for this relationship. For the purposes of this article, a fault represents a software defect injected during software development. The development of software is a labor intensive process, and as such, programmers make mistakes, resulting in faults being injected during development into each and every software product. The injection rate varies with each engineer, the implementation language, and the development process. These injected faults are removed principally through review and testing. A failure represents an unexpected departure of the software package from expected operational characteristics. A failure

can be attributable to one or more faults within the software package. Any fault can potentially cause a failure of a software package, but the probability of a fault manifesting itself as a failure is not uniform. Adams [14] indicates that, on average, one third of all software faults manifest themselves as a failure once every 5000 executable years, and only two percent of all faults lead to a MTTF of less than 50 years. Downtime is not evenly distributed either, as it is suggested that about 90 percent of the downtime comes from, at most, 10 percent of the faults. From this research, it follows that finding and removing a large number of defects does not necessarily yield a high reliability product. Instead, it is important to focus on the faults that have a short MTTF associated with them. Failures by their nature can only be detected through software testing. Certain tools have been created to combine static analysis with dynamic testing. The Check 'n' Crash tool [15] combines static analysis with testing to automatically detect and verify the presence of errors within developed software. ESC/Java is used to analyze the source code and detect programming errors, and test cases are then generated for the JCrasher [16] tool to determine if the detected fault will actually produce a failure.
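Returning to Adams' observation, a hypothetical C fragment (invented for illustration, not taken from the paper) shows how a fault can lie dormant: the off-by-one below corrupts memory only when len is exactly 16, an input that may essentially never occur in normal operation.

#define BUF_SIZE 16

static char buf[BUF_SIZE];

/* Copies len bytes into buf and NUL-terminates the result. */
void store(const char *src, unsigned int len)
{
    unsigned int i;
    if (len > BUF_SIZE)   /* Fault: should reject len == BUF_SIZE as well. */
    {
        return;
    }
    for (i = 0u; i < len; i++)
    {
        buf[i] = src[i];
    }
    buf[len] = '\0';      /* Out-of-bounds write, but only when len == 16. */
}

Detecting such a fault statically is straightforward; predicting how often it will actually fail requires knowing how often the boundary input occurs.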

 1:  typedef unsigned short uint16_t;
 2:
 3:  void update_average(uint16_t current_value);
 4:
 5:  #define NUMBER_OF_VALUES_TO_AVERAGE (11u)
 6:
 7:  static uint16_t data_values[NUMBER_OF_VALUES_TO_AVERAGE];
 8:  static uint16_t average = 0u;
 9:
10:  void update_average(uint16_t current_value)
11:  {
12:      static uint16_t array_offset = 0u;
13:      static uint16_t data_sums = 0u;
14:      array_offset = ((array_offset++) % NUMBER_OF_VALUES_TO_AVERAGE);
15:      data_sums -= data_values[array_offset];
16:      data_sums += current_value;
17:      average = (data_sums / NUMBER_OF_VALUES_TO_AVERAGE);
18:      data_values[array_offset] = current_value;
19:  }

Fig. 2: Sample source code exhibiting statically detectable faults.

III. WHAT CAUSES A FAULT TO BECOME A FAILURE

A. Example Statically Detectable Faults

As was discussed previously, a large majority of the faults present in a given software package rarely manifest themselves as a failure. The key to understanding software reliability based upon static analysis is to understand what causes a fault to become a failure and to be able to predict which faults will likely manifest themselves as a failure.

int32_t foo(int32_t a)
{
    int32_t b;
    if (a > 0)
    {
        b = a;
    }
    return ((b) ? 1 : 0);
}

Fig. 1: Sample source code exhibiting an uninitialized variable.

Uninitialized variables pose a significant problem for embedded source code programs. In the ISO C language, variables are not automatically initialized when defined. For an automatic variable which is allocated either on the stack or within a processor register, this results in an unknown value being assigned to that variable. Figure 1 shows an example of a function which has a potential uninitialized variable. If a > 0, b is initialized to the value of a. However, if a ≤ 0, the value of b is indeterminate, and therefore, the return value of the function is also indeterminate. This behavior is statically detectable, yet occurs once every 250 lines of source code for Japanese programs and once every 840 lines of code for US programs [17]. If this function is executed, the resulting behavior is entirely unpredictable.

Figure 2 provides another example of source code which contains statically detectable faults. The intent of the code is to calculate the running average of an array of variables. Values are stored in a circular buffer data_values of length NUMBER_OF_VALUES_TO_AVERAGE, defined at compile time to be 11. The average of the values stored is kept in the variable average, the sum of all data values is stored in the variable data_sums, and the current offset into the circular buffer is stored in array_offset. Each time the routine is called, a 16 bit value is passed in representing the current value that is to be added to the average. The array offset is incremented, the previous value is removed from the data_sums variable, the new value is added to the array and the data_sums variable, and the updated average is stored in average.

However, there are several potential problems with this simple routine associated with the array_offset variable. The intent of the source code is to increment the offset by one and then perform a modulus operation on this offset to place it within the range of 0 to 10. Based upon the behavior of the compiler, this may or may not be the case, as line 14 modifies array_offset twice between sequence points. If array_offset = 10, the resulting value of array_offset can be either 0 or 11. The value will be 0 if the postfix increment operator (++) is executed before the modulus operation occurs. However, if the compiler chooses to implement the logic so that the postfix increment occurs after the modulus operation, the array_offset variable will have a value of 11.

If array_offset is set to 11, the execution of line 15 results in an out of bounds access of the array. In the C language, reading from outside the array does not directly cause a processor exception. However, the value read is entirely invalid, potentially producing a wildly incorrect result if the value being subtracted is larger than the current data_sums value, since the unsigned subtraction wraps around. Line 18 may result in the average value being overwritten. Depending upon the compiler's word alignment, array padding, and other implementation behaviors, the average variable may be the next variable in RAM following the data_values array. If this is the case, writing to data_values[11] will result in the average value being overwritten. This behavior can vary from one compiler to another, from one compiler version to another, or be dependent upon compiler options passed on the command line, especially if compiler optimization is used.

From a software reliability standpoint, the probability of failure associated with this construct is easy to verify through testing. So long as the code has been exercised through this transition point and proper behavior has been obtained, proper behavior will continue until the code is recompiled, a different compiler version is used, or the compiler is changed.
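Alternatively, the fault can simply be removed. A minimal, well-defined rewrite of line 14 (a sketch; the paper does not prescribe a fix):

    /* array_offset is now modified exactly once between sequence points,
     * and the result is always in the range 0..10. */
    array_offset = (uint16_t)((array_offset + 1u) % NUMBER_OF_VALUES_TO_AVERAGE);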

B. When Does a Fault Manifest Itself as a Failure

In order to use static analysis for reliability prediction, it is important to understand what causes these faults to become failures. There are many reasons why a fault lies dormant and does not manifest itself as a failure. The first, and most obvious, deals with code coverage. If a fault does not execute, it cannot fail. While this is intuitively obvious, determining if a fault can be executed can be quite complicated and may require significant analysis.

#include "interface.h"

static uint16_t test_active_flags;
static uint16_t test_done_flags;

void do_walk(void)
{
    uint8_t announce_param;
    function_ptr_type test_param;

    if (TEST_BIT(test_active_flags, DIAG_TEST))
    {
        if (check_for_expired_timer(TIME_IN_SPK_TEST) == EXP)
        {
            start_timer();
            if (TEST_BIT(test_active_flags, RF_TEST))
            {
                announce_param = LF_MESSAGE;
                test_param = LF_TEST;
                SETBIT_CLRBIT(test_active_flags, LF_TEST, RF_TEST);
            }
            else if (TEST_BIT(test_active_flags, LF_TEST))
            {
                announce_param = LR_MESSAGE;
                test_param = LR_TEST;
                SETBIT_CLRBIT(test_active_flags, LR_TEST, LF_TEST);
            }
            else if (TEST_BIT(test_active_flags, LR_TEST))
            {
                announce_param = RR_MESSAGE;
                test_param = RR_TEST;
                SETBIT_CLRBIT(test_active_flags, RR_TEST, LR_TEST);
            }
            else if ((TEST_BIT(test_active_flags, RR_TEST)) &&
                     (get_ap_state(AUK_STATUS) != UNUSED_AUK))
            {
                announce_param = SUBWOOFER_MESSAGE;
                test_param = AUX1_TEST;
                SETBIT_CLRBIT(test_active_flags, SUBWOOFER1_TEST, RR_TEST);
            }
            else
            {
                announce_param = EXIT_TEST_MESSAGE;
                CLRBIT(test_active_flags, DIAG_TEST);
                SETBIT(test_done_flags, DIAG_TEST);
            }
            make_announcements(announce_param);
            *test_param();
        }
    }
}

Fig. 3: Source code exhibiting an uninitialized variable.

Figure 3 provides a second example of an uninitialized variable fault. In this case, there are eight distinct paths through the source code. Of these paths, seven of them do not contain any statically detectable faults. However, the eighth path fails to initialize a function pointer, resulting in the program jumping to an unknown address, and likely crashing the program. Assuming that we can consider these paths of having an equal probability of executing, pf = .125. The very presence of these problems may or may not immediately result in a failure. Returning a larger than expected number from a mathematical function may not immediately result in a software failure. A random jump to a pointer address will most likely result in an immediate and noticeable failure. Overwriting the stack return address is likely to result in the same behavior. For a fault to manifest itself as a failure, the code which contains the fault first must execute, and then the result of fault must be used in a manner that will result in a failure occurring.
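Tying this to the reliability model developed in Section IV, a worked sketch (using the stated equal-path assumption, and assuming the wild jump always produces an observable failure when taken):

$R_{call} = 1 - p_f = 1 - 0.125 = 0.875$, with an expected $1/p_f = 8$ calls before a failure.

If do_walk() really did take its faulty path this often, the fault would be found almost immediately; dormant faults are dangerous precisely because their real path probabilities are far smaller than a uniform assumption suggests.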

C. Measuring Code Coverage

The first and most important starting point for determining if a fault is to become a failure is source code coverage. If a fault is never encountered during execution, it cannot become a failure. Embedded systems are often designed with a few repetitive tasks that execute periodically at similar rates. Thus, the percentage of code which routinely executes is actually quite small, and the majority of the execution time is spent covering the same lines over and over again. With limited testing covering the normal use cases for the system, information about the "normal" execution path through the module can be obtained.

There are many different metrics and measurements associated with code coverage. Kaner [18] lists 101 different coverage metrics. Four of the more common methods are Statement Coverage, Block Coverage, Decision Coverage, and Path Coverage. Statement Coverage measures whether each executable statement is encountered. Block Coverage is an extension of statement coverage with the unit of code being a sequence of non-branching statements. Decision Coverage reports whether boolean expressions tested in control structures have evaluated to both true and false. Path Coverage reports whether each possible path in each function has been followed; the sketch at the end of this subsection illustrates how full statement coverage can still leave paths unexercised.

There has been significant study of the relationship between code coverage and the resulting reliability of the source code. Garg [19] and Del Frate [20] indicate that there is a strong correlation between code coverage obtained during testing and software reliability, but the exact extent of this relationship is unknown. A certain level of code coverage is often mandated by the software development process when evaluating the effectiveness of the testing phase. Extreme Programming advocates endorse 100% method coverage in order to ensure that all methods are invoked at least once, though exceptions are given for small functions [21]. Piwowarski, Ohba, and Caruso [22] indicate that 70% statement coverage is necessary to ensure sufficient test case coverage, that 50% statement coverage is insufficient to exercise the module, and that going beyond 70%-80% is not cost effective.

There are many tools that have been developed to aid in code coverage analysis, both commercial and open source. Known Java programs for code coverage include EMMA [23], InsectJ [24], JVMDI [25], Clover [26], JCover [27], JBlanket [28] and Quilt [29]. A further discussion of Java code coverage tools is available in [21].
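To make the distinction between the coverage metrics concrete, the following hypothetical fragment (invented for illustration, not from the paper) achieves 100% statement and decision coverage with two tests, yet exercises only two of its four paths:

#include <assert.h>

/* Two tests: full statement and decision coverage, half the paths. */
static int scale(int a, int b)
{
    int result = 0;
    if (a > 0)        /* decision 1 */
    {
        result += a;
    }
    if (b > 0)        /* decision 2 */
    {
        result += b;
    }
    return result;
}

int main(void)
{
    assert(scale(1, 1) == 2);    /* path: taken / taken     */
    assert(scale(-1, -1) == 0);  /* path: skipped / skipped */
    /* Paths taken/skipped and skipped/taken are never executed. */
    return 0;
}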

IV. MATHEMATICALLY MODELING A FAILURE

A. The Static Analysis Premise

The fundamental premise behind this model is that the resulting software reliability can be related to the number of statically detectable faults present within the source code, the number of paths which lead to the execution of the statically detectable faults, and the rate of execution of each path within the software package.

Figure 4 provides example source code for an embedded system timer routine which verifies whether a timer has or has not expired.

#include <stdint.h>

typedef enum {FALSE, TRUE} boolean;

extern uint32_t get_current_time(void);

typedef struct
{
    uint32_t starting_time;   /* Starting time for the system   */
    uint32_t timer_delay;     /* Number of ms to delay          */
    boolean  enabled;         /* TRUE if timer is enabled       */
    boolean  periodic_timer;  /* TRUE if the timer is periodic. */
} timer_ctrl_struct;

boolean has_time_expired(timer_ctrl_struct p_timer)
{
    boolean  t_return_value = FALSE;
    uint32_t t_current_time;

    if (p_timer.enabled == TRUE)
    {
        t_current_time = get_current_time();
        if ((t_current_time > p_timer.starting_time) &&
            ((t_current_time - p_timer.starting_time) > p_timer.timer_delay))
        {
            /* The timer has expired. */
            t_return_value = TRUE;
        }
        else if ((t_current_time < p_timer.starting_time) &&
                 ((t_current_time + (0xFFFFFFFFu - p_timer.starting_time)) >
                  p_timer.timer_delay))
        {
            /* The timer has expired and wrapped around. */
            t_return_value = TRUE;
        }
        else
        {
            /* The timer has not yet expired. */
            t_return_value = FALSE;
        }
        if (t_return_value == TRUE)
        {
            if (p_timer.periodic_timer == TRUE)
            {
                p_timer.starting_time = t_current_time;
            }
            else
            {
                p_timer.enabled = FALSE;
                p_timer.starting_time = 0;
                p_timer.periodic_timer = FALSE;
            }
        }
    }
    else
    {
        /* Timer is not enabled. */
    }
    return t_return_value;
}

Fig. 4: Sample source code to determine if a timer has expired.

To model reliability, the source code is first divided into statement blocks. A statement block represents a contiguous set of source code instructions uninterrupted by a conditional statement. By using this organization, the source code is translated into a set of blocks connected by decisions. Statically detectable faults can then be partitioned into the appropriate blocks. By doing this, the reliability for each block can be assigned based upon the statically detectable faults. For a block which has a single statically detectable fault, the reliability of that block can be expressed as

$R_{block} = 1 - p_f$   (1)

where

$p_f = p_{fp} \cdot p_{if}$   (2)

Here $p_{fp}$ represents the probability of a false positive static analysis detection occurring and $p_{if}$ represents the probability of an immediate failure for the fault occurring. If there is more than one statically detectable fault within a given block, then

$R_{block} = 1 - \sum_{i=1}^{n} p_{f,i}$   (3)

expresses the reliability for the block. The two parameters which affect a fault becoming a failure have been carefully chosen based upon the nature of static analysis tools. For each statically detectable fault, there is a set of variables which affect the probability of the fault becoming a failure. Every static analysis tool in existence generates false positive fault warnings; the percentage of these varies greatly. Thus, the first variable affecting a fault becoming a failure is the probability that the detected fault is valid. Once a statically detectable fault is known to be valid, the second parameter affects how probable that fault is of manifesting itself as a failure. This will vary based upon the exact statically detectable fault, the data range necessary to exploit the given fault, and other factors. For this experiment, when possible, $p_{fp}$ will ideally be selected based upon historical information and statistics, such as those published by Hovemeyer and Pugh [30], Artho and Havelund [31], and Wagner et al. [32].

If a statement block includes a call to an external method or function, then the reliability of the block will be multiplied by the reliability of the method which is being called. If this number is not available, a default value can be used. If multiple functions are called within a block, then the reliability will be the product of their discrete reliabilities.

Once the reliability for each block has been established, the reliability for each decision which leads into a block needs to be established. This is accomplished in the same manner, using statically detectable faults and the same mechanism that is used for source code blocks. When this stage is completed, the code structure diagram will look similar to the example shown in Figure 5.

Fig. 5: Example code structure diagram for a method showing reliabilities of each block and each link.
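As a worked sketch of Equations (1) to (3) with invented values (the paper does not supply these numbers): suppose a block contains two statically detectable faults with $p_{fp,1} = 0.4$, $p_{if,1} = 0.10$, $p_{fp,2} = 0.6$, $p_{if,2} = 0.05$, and the block calls one external function with reliability 0.99. Then

$p_{f,1} = 0.4 \cdot 0.10 = 0.040$
$p_{f,2} = 0.6 \cdot 0.05 = 0.030$
$R_{block} = (1 - (0.040 + 0.030)) \cdot 0.99 = 0.93 \cdot 0.99 \approx 0.921$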

B. Integration of Code Coverage Into the Model

At this point, the probability of failure for each static path through the source code has been established. However, the ultimate reliability for each method requires knowing how often each path will be taken through the source code.

The simplest method for establishing code coverage using the given model would be to assume that all paths through the function execute with equal probability. The function diagrammed in Figure 5 has ten possible paths through the source code; each path would have a probability $p_p = 1/10 = 0.10$ of executing. From this method, we can then calculate a reliability for the given function. However, it is known empirically that this assumption of uniform path coverage is incorrect. Many functions contain fault tolerance logic which rarely executes, while other functions contain source code which, given the calling parameters, never executes.

The next natural refinement of this probabilistic assignment would be to look at the discrete decisions which cause the execution of each path through the source code. To use this methodology, it is assumed that each conditional statement has an equal probability of being true or false. Thus, the statement if (p_timer.enabled == TRUE) has a probability $p_i = 0.50$ of taking the if condition and a probability $p_e = 0.50$ of taking the else condition. Using this same logic, and treating the two conditions as independent, the statement

    if ((t_current_time > p_timer.starting_time) &&
        ((t_current_time - p_timer.starting_time) > p_timer.timer_delay))

has a probability $p_i = p_{C1=TRUE} \cdot p_{C2=TRUE} = 0.5 \cdot 0.5 = 0.25$ of taking the if condition and a probability $p_e = p_{C1=FALSE} + p_{C2=FALSE} - p_{C1=FALSE} \cdot p_{C2=FALSE} = 0.5 + 0.5 - (0.5 \cdot 0.5) = 0.75$ of taking the else condition. This method is also problematic in that paths which encounter fewer decisions have a higher probability of executing. It is known, however, from studying developed source code that this assumption is not always true. In many instances, the first logical checks within a function exist for fault tolerance purposes, and since these conditions rarely occur, the paths resulting from this logic are rarely executed.

TABLE I: Execution coverage of various paths

    Path                    Coverage
    A                       0/2537    = 0
    B→C                     2415/2537 = 0.9519
    B→D→F ∪ B→E→F           9/2537    = 0.0035
    B→D→G ∪ B→E→G           113/2537  = 0.0445
    B→D→F ∪ B→D→G           5/2537    = 0.0020
    B→E→F ∪ B→E→G           117/2537  = 0.0461

Each method profiled thus far fails to take into account the user-provided data which has the greatest effect on which paths actually execute. Depending upon the user's preferences and use cases, the actual path behavior may vary greatly from the theoretical values. During limited operational testing, actual code coverage can be obtained from a coverage analysis tool such as gcov, as is shown in Table I.
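A sketch of the usual gcov workflow, using GCC's instrumentation flags (the source file names here are hypothetical):

    gcc -fprofile-arcs -ftest-coverage timer.c driver.c -o test_harness
    ./test_harness        # run the limited operational test cases
    gcov timer.c          # writes timer.c.gcov with per-line execution counts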

From this information, the experimental probability of each path being executed can be constructed, allowing a system of equations to be set up, as shown in Equation 4:

$p_{B→D→F} + p_{B→E→F} = 9/2537$
$p_{B→D→G} + p_{B→E→G} = 113/2537$
$p_{B→D→F} + p_{B→D→G} = 5/2537$
$p_{B→E→F} + p_{B→E→G} = 117/2537$   (4)

However, this system of equations is underdetermined and cannot be solved uniquely. Thus, the information captured by gcov is not entirely suitable for determining the paths which are executed during limited testing. This comes about from the very nature of most code coverage tools. Of the tools discussed previously, only Quilt [29] supports branch coverage, and that support is currently under development. However, it is possible to obtain branch coverage information using breakpoints, tracepoints, or system address bus captures.

Once the path coverage for limited testing has been obtained, this information can be combined with the theoretical branch probabilities, which can be statically constructed from the source code, through the use of weighting factors. For each path $i$, a probability is created which includes a portion of the actual test probabilities and a portion of the theoretical probabilities such that

$P_{p,i} = P_{tup,i} \cdot W_{tup} + P_{tuc,i} \cdot W_{tuc} + P_{a,i} \cdot W_a$   (5)
$1 = W_{tup} + W_{tuc} + W_a$   (6)
$1 > W_{tuc}, W_{tup}, W_a$   (7)
$1 = \sum_{i=1}^{n} P_{p,i}$   (8)

holds true. Here $n$ represents the number of static paths through the source code. $W_{tup}$, $W_{tuc}$, and $W_a$ are chosen based upon the confidence in the limited testing, as well as the desire to magnify potential failures. Choosing $W_a$ closer to 1 magnifies the effect of the paths which have actually been covered during limited testing, while increasing the values of $W_{tuc}$ and $W_{tup}$ magnifies the effect of the paths which have not yet been covered. It should be noted that the reason for including the theoretical paths which are not executed during routine testing is that these paths are more likely to have an undetected fault that will cause a failure. Malaiya et al. [33] indicate that rarely executed modules, such as error handlers and exception handlers, are notoriously difficult to test, and are highly critical to the resultant reliability of the system.

Once this has been completed, the probability of failure can be obtained by treating each code block as a segment which must complete successfully in order for successful operation to continue. By iterating this operation over all paths, the resulting reliability for a given function can be obtained. If a function calls another function, the reliability of the called function will be calculated in the same manner. If the called function is not included within the functions that are being analyzed, a default value will be assigned representing the reliability of that function.
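As an arithmetic sketch of Equation (5) with invented values (the paper does not fix specific weights or path probabilities): let $W_a = 0.8$, $W_{tuc} = W_{tup} = 0.1$, and suppose a path has measured probability $P_{a,i} = 0.95$ and theoretical probabilities $P_{tuc,i} = 0.05$ and $P_{tup,i} = 0.10$. Then

$P_{p,i} = (0.10)(0.1) + (0.05)(0.1) + (0.95)(0.8) = 0.775$

so the measured coverage dominates, while paths rarely seen in testing still retain non-zero weight.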

V. RESEARCH PLAN

In order to validate this model, an experiment will be conducted in which a set of software components will be

analyzed for statically detectable defects. Limited operational testing in a simulated environment will be performed. The components will then be deployed into an experimental operational environment and monitored for failure. The results of field failures will then be compared with the reliability estimated by the model, allowing validation of the model.

In order to easily apply this model, the SOSART (SOftware Static Analysis Reliability Tool) reliability analysis tool shall be developed. The SOSART tool serves as a bug finding meta tool, automatically combining and correlating statically detectable faults, as well as a reliability assessment tool, calculating software reliability based upon a structural analysis of the given source code, the program paths executed during limited testing, and a Bayesian Belief Network.

VI. SUMMARY

The problem of software reliability is vast and ever growing. As more and more complex electronic devices rely upon software for fundamental functionality, the impact of software failure becomes greater. Market forces, however, have made it more difficult to measure software reliability through traditional means. The reuse of previously developed components, the emergence of open source software, and the purchase of developed software have made delivering a reliable final product more difficult. One of the recent software engineering techniques to emerge with great promise is the static analysis tool. While static analysis does not represent fundamentally new technology, it is only recently that computing power has increased enough to allow advanced software analysis. Though by no means a "Silver Bullet", static analysis has been shown to be quite effective.

This paper has described a reliability model which uses limited testing as well as static analysis of the raw source code to estimate the reliability of an existing software module. The reliability is calculated through a Bayesian Belief Network incorporating the path coverage obtained during limited testing, the structure of the source code, and results from multiple static analysis tools combined using a meta tool.

REFERENCES

[1] K. J. Harrison, "Static code analysis on the C-130J Hercules safety-critical software," Aerosystems International, UK, Tech. Rep., 1999. [Online]. Available: www.damek.kth.se/RTC/SC3S/papers/Harrison.doc
[2] "PolySpace for C++," Product Brochure. [Online]. Available: http://www.polyspace-customer-center.com/pdf/cpp.pdf
[3] S. Xiao and C. H. Pham, "Performing high efficiency source code static analysis with intelligent extensions," in APSEC, 2004, pp. 346–355.
[4] QA Systems, "Overview large Java project code quality analysis," QA Systems, Tech. Rep., 2002.
[5] D. R. Engler, "Static analysis versus model checking for bug finding," in CONCUR, 2005, p. 1.
[6] D. Larochelle and D. Evans, "Statically detecting likely buffer overflow vulnerabilities," pp. 177–190. [Online]. Available: citeseer.ist.psu.edu/larochelle01statically.html
[7] J. Viega, J. T. Bloch, Y. Kohno, and G. McGraw, "ITS4: A static vulnerability scanner for C and C++ code," in ACSAC '00: Proceedings of the 16th Annual Computer Security Applications Conference. Washington, DC, USA: IEEE Computer Society, 2000, p. 257.

[8] V. B. Livshits and M. S. Lam, "Finding security vulnerabilities in Java applications with static analysis," in 14th USENIX Security Symposium, 2005.
[9] A. Rai, "On the role of static analysis in operating system checking and runtime verification," Stony Brook University, Tech. Rep. FSL-05-01, May 2005.
[10] C. Artho, "Finding faults in multi-threaded programs," Master's thesis, Federal Institute of Technology, 2001. [Online]. Available: citeseer.ist.psu.edu/artho01finding.html
[11] A. German, "Software static code analysis lessons learned," Crosstalk, 2004.
[12] D. Giesen, "Philosophy and practical implementation of static analyzer tools," QA Systems Technologies, Tech. Rep., 1998.
[13] N. Nagappan, L. Williams, M. Vouk, J. Hudepohl, and W. Snipes, "A preliminary investigation of automated software inspection," in IEEE International Symposium on Software Reliability Engineering, 2004, pp. 429–439.
[14] E. N. Adams, "Optimizing preventive service of software products," IBM J. Research and Development, vol. 28, no. 1, pp. 2–14, January 1984.
[15] C. Csallner and Y. Smaragdakis, "Check 'n' Crash: combining static checking and testing," in ICSE '05: Proceedings of the 27th International Conference on Software Engineering. New York, NY, USA: ACM Press, 2005, pp. 422–431.
[16] ——, "JCrasher: An automatic robustness tester for Java," Software: Practice & Experience, vol. 34, no. 11, pp. 1025–1050, Sept. 2004.
[17] "QAC clinic," Available online, 1998. [Online]. Available: http://www.toyo.co.jp/ss/customersv/doc/qac_clinic1.pdf
[18] C. Kaner, "Software negligence and testing coverage," Florida Tech, Tech. Rep., 1995.
[19] P. Garg, "Investigating coverage-reliability relationship and sensitivity of reliability to errors in the operational profile," in CASCON '94: Proceedings of the 1994 Conference of the Centre for Advanced Studies on Collaborative Research. IBM Press, 1994, p. 19.
[20] F. D. Frate, P. Garg, A. P. Mathur, and A. Pasquini, "On the correlation between code coverage and software reliability," in Proceedings of the Sixth International Symposium on Software Reliability Engineering, 1995, pp. 124–132.
[21] J. M. Agustin, "JBlanket: Support for extreme coverage in Java unit testing," University of Hawaii at Manoa, Tech. Rep. 02-08, 2002. [Online]. Available: citeseer.ifi.unizh.ch/605556.html
[22] P. Piwowarski, M. Ohba, and J. Caruso, "Coverage measurement experience during function test," in ICSE '93: Proceedings of the 15th International Conference on Software Engineering. Los Alamitos, CA, USA: IEEE Computer Society Press, 1993, pp. 287–301.
[23] "EMMA: a free Java code coverage tool." [Online]. Available: http://emma.sourceforge.net/
[24] A. Seesing and A. Orso, "A generic instrumentation framework for collecting dynamic information within Eclipse." [Online]. Available: http://insectj.sourceforge.net/
[25] "JVMDI code coverage analyser for Java." [Online]. Available: http://jvmdicover.sourceforge.net/
[26] "Clover: A code coverage tool for Java." [Online]. Available: http://www.thecortex.net/clover
[27] "JCover: Java code coverage testing and analysis." [Online]. Available: http://www.codework.com/JCover/product.html
[28] "JBlanket." [Online]. Available: http://csdl.ics.hawaii.edu/Tools/JBlanket/
[29] "JUnit Quilt." [Online]. Available: http://quilt.sourceforge.net/
[30] D. Hovemeyer and W. Pugh, "Finding bugs is easy," in OOPSLA '04: Companion to the 19th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications. New York, NY, USA: ACM Press, 2004, pp. 132–136.
[31] C. Artho and K. Havelund, "Applying Jlint to space exploration software," in VMCAI, 2004, pp. 297–308.
[32] S. Wagner, J. Jürjens, C. Koller, and P. Trischberger, "Comparing bug finding tools with reviews and tests," in Proceedings of Testing of Communicating Systems: 17th IFIP TC6/WG 6.1 International Conference, TestCom 2005. Montreal, Canada: Springer-Verlag, May–June 2005.
[33] Y. Malaiya, N. Li, J. Bieman, R. Karcich, and B. Skibbe, "The relationship between test coverage and reliability," in Proc. Int. Symp. Software Reliability Engineering, November 1994, pp. 186–195.

