bug/lús These warnings about my Java code are interesting.
Hugbúnaðarverkefni 2 (Software Project 2) - Static Analysis
Lectures 7 and 8: Comparing Java bug finding tools
20/03/2009
Dr Andy Brooks
comparison/samanburður
Case Study/Dæmisaga
Reference: A Comparison of Bug Finding Tools for Java, N Rutar, C B Almazan, and J S Foster, ISSRE'04, paper 4, ©IEEE
Static analysis tools: Bandera, ESC/Java 2, FindBugs, JLint, PMD
“We present what we believe is the first detailed comparison of several different bug finding tools for Java over a variety of checking tasks.”
trade-off/skipti
Are all bugs true?
• false positives – “warnings about correct code”
• false negatives – “failing to warn about incorrect code”
• Bug finding tools make different tradeoffs regarding both false positives and false negatives.
to generalize/að alhæfa too many warnings/of margar viðvaranir difficult to understand/erfitt að skilja
Research Validity/Rannsóknarréttmæti
• Java tools and code – C and C++ were not studied, but C and C++ tools use the same basic techniques.
• “...we did not exactly categorize every false positive and false negative...”
  – There were too many warnings from the tools.
  – The code being checked was written by others and difficult to understand.
• Results from the tools are cross-checked.
severity/harka serious logical error/ alvarleg rökvilla
Research Validity/Rannsóknarréttmæti
• Bugs were not rated according to severity.
  – Quantifying the severity of bugs is difficult.
  – How serious is it to leave out brackets in if...else... statements? PMD warns about missing brackets.
Without brackets:
int x = 2, y = 3;
if (x == y) if (y == 3) x = 3; else x = 4;
With brackets:
int x = 2, y = 3;
if (x == y) { if (y == 3) x = 3; } else { x = 4; }
This is the “dangling else problem”. Without brackets the else pairs with the inner if, so x keeps the value 2; with brackets x becomes 4. If a programmer believes x will have the value 4, this could be a serious logical error.
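The binding of the else can be checked by running both variants. The sketch below wraps the two slide snippets in methods; the class and method names are illustrative assumptions and do not come from the paper.

```java
// Illustrative sketch: DanglingElse and its method names are assumptions.
final class DanglingElse {

    // No brackets: the else pairs with the nearest if, i.e. if (y == 3).
    static int withoutBrackets(int x, int y) {
        if (x == y)
            if (y == 3) x = 3;
            else x = 4;
        return x;
    }

    // Brackets force the else to pair with the outer if.
    static int withBrackets(int x, int y) {
        if (x == y) {
            if (y == 3) x = 3;
        } else {
            x = 4;
        }
        return x;
    }

    public static void main(String[] args) {
        // Same starting values as on the slide: x = 2, y = 3.
        System.out.println("without brackets: x = " + withoutBrackets(2, 3));
        System.out.println("with brackets:    x = " + withBrackets(2, 3));
    }
}
```

With x = 2 and y = 3 the outer condition is false, so the unbracketed version executes nothing and the bracketed version executes its else branch.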
to generalize/að alhæfa runtime exception/inningarfrábrigði
Research Validity/Rannsóknarréttmæti
String s = new String("I'm not null...yet"); s = null; System.out.println(s.length());
• JLint, FindBugs, and ESC/Java 2 report this as a bug.
  – A runtime exception will occur and the program will halt if the exception is not caught. This bug will be very quickly caught when the program is run.
  – Is the null dereference really a more severe bug than the missing brackets in the if...else... statement?
    • The faulty logic might not be caught so quickly.
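A minimal sketch of why this bug surfaces quickly: the dereference fails the first time the statement runs. The class and method names are assumptions made for illustration.

```java
// Illustrative sketch: NullDereference and failsFast are assumed names.
final class NullDereference {

    // Returns true if dereferencing s after the null assignment throws at run time.
    static boolean failsFast() {
        String s = new String("I'm not null...yet");
        s = null;
        try {
            System.out.println(s.length()); // s is null here
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("NullPointerException thrown: " + failsFast());
    }
}
```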
compilation/vistþýðing
Figure 1 code compiles with no errors or warnings. But 4 of the 5 tools found at least one bug.
©IEEE
object/hlutur
Figure 1 code
• PMD warns y on line 8 is not used.
• FindBugs warns the result of the read on line 12 is ignored.
• FindBugs warns the method may fail to close a stream on exception.
• FindBugs warns about using == to compare String objects.
• ESC/Java 2 warns about an array index being possibly too large on lines 17-18.
  – Java arrays are indexed from 0 to length-1
overlap/skörun
Figure 1 code
• ESC/Java 2 warns about a possible null dereference on line 18 which is a false positive because b is initialized in the constructor.
• JLint warns about the String comparison on lines 17-18.
  – There is overlap with FindBugs.
• FindBugs looks for unused variables but does not discover that y is unused.
  – False negative.
• JLint can warn about indexing out of bounds, but does not recognise the bug on line 16.
  – False negative.
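Two of the warning types discussed for Figure 1 can be illustrated with a small sketch. This is not the paper's Figure 1 code; the class, the method names, and the loop are assumptions chosen to show a == comparison of String objects and a loop bound that runs one past the last valid index.

```java
// Illustrative sketch, not the paper's Figure 1: all names here are assumptions.
final class WarningPatterns {

    // == compares references, so two distinct String objects are never "equal".
    static boolean sameReference(String text) {
        return new String(text) == new String(text);
    }

    // equals() compares contents.
    static boolean sameContents(String text) {
        return new String(text).equals(new String(text));
    }

    // Valid indices are 0..length-1, so the bound i <= b.length overruns the array.
    static boolean overruns(byte[] b) {
        try {
            for (int i = 1; i <= b.length; i++) {
                byte ignored = b[i];
            }
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("== on equal strings:     " + sameReference("50"));
        System.out.println("equals on equal strings: " + sameContents("50"));
        System.out.println("loop overruns array:     " + overruns(new byte[4]));
    }
}
```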
Figure 2 The Bug Finding Tools
Name       | Version      | Input    | Interfaces        | Technology
Bandera    | 0.3b2 (2003) | source   | CL, GUI           | Model Checking
ESC/Java 2 | 2.0a7 (2004) | source   | CL, GUI           | Theorem Proving
FindBugs   | 0.8.2 (2004) | bytecode | CL, GUI, IDE, Ant | Syntax, Dataflow
JLint      | 3.0 (2004)   | bytecode | CL                | Syntax, Dataflow
PMD        | 1.9 (2004)   | source   | CL, GUI, IDE, Ant | Syntax
multithreading/fjölþráðavinnsla extensible/stækkanlegur detector/skynjari
FindBugs
• Checks that calls to wait() in multi-threaded Java programs are inside a loop, which is usually the correct usage.
• Uses dataflow analysis within a method (intraprocedural) to check for null pointer dereferences.
• FindBugs can be extended by writing custom bug detectors.
• In the study, FindBugs was set to report medium priority warnings.
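The usual form of a wait() inside a loop can be sketched as follows. The Latch class and its methods are hypothetical names for illustration; the point is only that the condition is re-checked after every wake-up.

```java
// Illustrative sketch: Latch, await, release and demo are assumed names.
final class Latch {
    private boolean ready = false;

    // wait() sits inside a loop, so the condition is re-checked after each wake-up.
    synchronized void await() throws InterruptedException {
        while (!ready) {
            wait();
        }
    }

    synchronized void release() {
        ready = true;
        notifyAll();
    }

    // Runs one waiting thread and one releasing thread; true if the wait completed.
    static boolean demo() {
        Latch latch = new Latch();
        Thread releaser = new Thread(latch::release);
        releaser.start();
        try {
            latch.await();
            releaser.join();
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("wait completed: " + demo());
    }
}
```

Because the flag is tested before waiting, the example works whichever thread runs first.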
deadlock/sjálfhelda
JLint
• JLint includes an interprocedural, inter-file component to find deadlocks.
• JLint 3.0 is not easily extended.
From the JLint manual: If, for one class, the locking scheme is such that it could lead to a cycle in the locking graph, this message is shown.
public void foo() { synchronized (a) { synchronized (b) { } }}
public void bar() { synchronized (b) { synchronized (a) { } }}
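The idea of a cycle in a locking graph can be sketched with a small directed graph in which an edge a -> b means "b is acquired while a is held". This is a hedged illustration of the concept only, not JLint's implementation; the LockGraph class and its methods are assumptions.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch of a lock-order graph; not how JLint itself is implemented.
final class LockGraph {
    private final Map<String, Set<String>> edges = new HashMap<>();

    // Record that lock `inner` is acquired while lock `outer` is held.
    void addNesting(String outer, String inner) {
        edges.computeIfAbsent(outer, k -> new HashSet<>()).add(inner);
        edges.computeIfAbsent(inner, k -> new HashSet<>());
    }

    // Depth-first search; reaching a node that is still on the current path means a cycle.
    boolean hasCycle() {
        Set<String> visited = new HashSet<>();
        Set<String> onPath = new HashSet<>();
        for (String node : edges.keySet()) {
            if (visit(node, visited, onPath)) return true;
        }
        return false;
    }

    private boolean visit(String node, Set<String> visited, Set<String> onPath) {
        if (onPath.contains(node)) return true;   // back edge: cycle found
        if (!visited.add(node)) return false;     // already explored, no cycle through it
        onPath.add(node);
        for (String next : edges.get(node)) {
            if (visit(next, visited, onPath)) return true;
        }
        onPath.remove(node);
        return false;
    }

    // The foo()/bar() pair from the slide: a then b, and b then a.
    static boolean fooBarExample() {
        LockGraph graph = new LockGraph();
        graph.addNesting("a", "b"); // foo()
        graph.addNesting("b", "a"); // bar()
        return graph.hasCycle();
    }

    public static void main(String[] args) {
        System.out.println("possible deadlock: " + fooBarExample());
    }
}
```

If both methods took the locks in the same order, the graph would have a single edge and no cycle.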
PMD
• No dataflow analysis.
• Many of PMD's warnings are about breaking style conventions.
  – What style conventions do you adhere to?
    • How many Python coding style standards are there?
    • How many C# coding style standards are there?
• PMD allows bug detectors to be turned on and off.
• PMD can be extended.
precondition/forskilyrði postcondition/eftirskilyrði loop invariant/lykkjuóbreyta comment/athugasemd specification/hönnunarlýsing
ESC/Java 2
• The programmer adds preconditions, postconditions, and loop invariants as special comments in the code.
• ESC/Java 2 uses a theorem prover to verify the program matches the specifications.
• In the study, ESC/Java 2 was used without specifications and without the annotations that would reduce the number of false positives.
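A sketch of what such special comments can look like is given below. The notation is JML-style (requires, ensures, loop_invariant), which is an assumption about the exact syntax; the method is hypothetical and the annotations have not been checked with ESC/Java 2.

```java
// Illustrative sketch: the annotations are JML-style comments and are unchecked assumptions.
final class Annotated {

    //@ requires values != null;
    //@ ensures \result >= 0;
    static int countPositives(int[] values) {
        int count = 0;
        //@ loop_invariant 0 <= i && i <= values.length;
        //@ loop_invariant 0 <= count && count <= i;
        for (int i = 0; i < values.length; i++) {
            if (values[i] > 0) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countPositives(new int[] {-1, 2, 3, 0}));
    }
}
```

To the Java compiler these lines are ordinary comments, so the annotated file compiles and runs unchanged.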
Figure 3 Bug Types/Lúsar Tegundir
Bug type                      | Example                                 | ESC/Java 2 | FindBugs | JLint | PMD
General                       | Null dereference                        | √*         | √*       | √*    | √
Concurrency                   | Possible deadlock                       | √*         | √        | √*    | √
Exceptions                    | Possible unexpected exception           | √*         |          |       |
Array                         | Length may be less than zero            | √          |          | √*    |
Mathematics                   | Division by zero                        | √*         |          | √     |
Conditional, loop             | Unreachable code due to constant guard  |            | √        |       | √*
String                        | Using == or !=                          |            | √        | √*    | √
Object overriding             | Equal objects must have equal hashcodes |            | √*       | √*    | √*
I/O stream                    | Stream not closed on all paths          |            | √*       |       |
Unused or duplicate statement | Unused local variable                   |            | √        |       | √*
Design                        | Should be a static inner class          |            | √*       |       |
Unnecessary statement         | Unnecessary return statement            |            |          |       | √*
* Means that the tool checks the specific example provided.
running time/inningartími
Experiments/Tilraunir
• Five mid-sized programs.
• Running times of the static analysis tools varied enormously:
  – ESC/Java 2 took a few hours
  – FindBugs and PMD took a few minutes
  – JLint took a few seconds
  – Timings were on Mac OS X v10.3.3 with a 1.25 GHz PowerPC G4 processor and 512MB RAM.
number of warnings/fjöldi viðvarana repeated warnings/endurteknar viðvaranir raw/óunnin
Figure 4 Raw* warning counts
Name                | NCSS (Lines) | Class Files | ESC/Java 2 | FindBugs | JLint | PMD
Azureus 2.0.7       | 35,549       | 1,053       | 5474       | 360      | 1584  | 1371
Art of Illusion 1.7 | 55,249       | 676         | 12813      | 481      | 1637  | 1992
Tomcat 5.019        | 34,425       | 290         | 1241       | 245      | 3247  | 1236
JBoss 3.2.3         | 8,354        | 274         | 1539       | 79       | 317   | 153
Megamek 0.29        | 37,255       | 270         | 6402       | 223      | 4353  | 536
* No attempt to remove repeated warnings about the same error/sama villan.
“For ESC/Java 2, the number of generated warnings is sometimes extremely high... FindBugs generally reports fewer warnings than the other tools. In general, we found this makes FindBugs easier to use...”
Reviewing one warning per minute:
• ESC/Java 2: 12000/60 = 200 hours.
• FindBugs: 480/60 = 8 hours.
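The review-time estimates on this slide divide a warning count by 60, which amounts to assuming one minute of review per warning. The sketch below makes that assumption an explicit parameter; the class and method names are hypothetical, and the counts in main are the Art of Illusion figures from Figure 4.

```java
// Illustrative sketch: ReviewEffort and hours are assumed names.
final class ReviewEffort {

    // Hours needed to review `warnings` warnings at `minutesPerWarning` minutes each.
    static double hours(int warnings, double minutesPerWarning) {
        return warnings * minutesPerWarning / 60.0;
    }

    public static void main(String[] args) {
        // Assumption: one minute per warning, as implied by dividing by 60.
        System.out.println("ESC/Java 2: " + hours(12813, 1.0) + " hours");
        System.out.println("FindBugs:   " + hours(481, 1.0) + " hours");
    }
}
```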
too many warnings/of margar viðvaranir category/flokkur dozen/tylft
Analysis/Greining
• There are too many warnings to review manually so the authors decide to analyse a few bug categories only.
• Even restricting the bug categories is not enough, so manual examination is limited to “several dozen warnings”.
Hmmm, looks like another false positive...
Figure 6 Warning counts by bug category
Bug category             | ESC/Java 2 | FindBugs | JLint | PMD
Concurrency              | 126        | 122      | 8883  | 0
Null dereferencing       | 9120       | 18       | 449   | 0
Null assignment          | 0          | 0        | 0     | 594
Index out of bounds      | 1810       | 0        | 264   | 0
Prefer Zero Length Array | 0          | 36       | 0     | 0
concurrent/samskeiða
Concurrency
• ESC/Java 2 reported 126 deadlock warnings. The authors looked at a handful of these and some appeared to be false positives. They found it difficult to investigate further because of the way the warnings were reported.
• FindBugs found three cases (true bugs) of the “double-checked locking bug” in Java but PMD produced no warnings (false negatives).
  – PMD was fooled by some other code mixed in with the bug pattern.
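For readers who have not met the double-checked locking pattern, a hedged sketch follows. It is not taken from the programs in the study; LazyConfig and its methods are hypothetical. The first method shows the bug pattern, the second a common repair that relies on a volatile field (valid under the Java 5 and later memory model).

```java
// Illustrative sketch: LazyConfig and its members are assumed names.
final class LazyConfig {
    private static LazyConfig instance;              // not volatile: unsafe publication
    private static volatile LazyConfig safeInstance; // volatile: safe since Java 5

    private LazyConfig() { }

    // The bug pattern: the unsynchronized first check may see a partially built object.
    static LazyConfig getBroken() {
        if (instance == null) {
            synchronized (LazyConfig.class) {
                if (instance == null) {
                    instance = new LazyConfig();
                }
            }
        }
        return instance;
    }

    // A common repair: same shape, but the field is volatile.
    static LazyConfig getSafe() {
        LazyConfig local = safeInstance;
        if (local == null) {
            synchronized (LazyConfig.class) {
                local = safeInstance;
                if (local == null) {
                    local = new LazyConfig();
                    safeInstance = local;
                }
            }
        }
        return local;
    }

    public static void main(String[] args) {
        System.out.println(getSafe() == getSafe());
    }
}
```

In a single-threaded run both methods behave identically, which is part of why the bug is hard to find by testing.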
Concurrency
• FindBugs was usually found to correctly indicate the presence of a concurrency bug pattern but it was not always clear that an actual error was present.
  – FindBugs has no interprocedural analysis so it will warn about a wait() not being in a loop even though the wait() might be in a method which is in a loop.
  – And not all uses of a wait() outside a loop are wrong.
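The situation described in the first sub-point can be sketched as below: the wait() call has no loop in its own method, but every call to that method is made from a loop that re-checks the condition. The Gate class and its method names are assumptions for illustration, and whether a particular tool version warns on this exact code is not claimed.

```java
// Illustrative sketch: Gate and its methods are assumed names.
final class Gate {
    private boolean open = false;

    // The loop that re-checks the condition lives in the caller...
    synchronized void awaitOpen() throws InterruptedException {
        while (!open) {
            pause();
        }
    }

    // ...so this wait() is not inside a loop within its own method,
    // which is all a purely intraprocedural check can see.
    private void pause() throws InterruptedException {
        wait(); // legal here: the caller already holds this object's monitor
    }

    synchronized void openGate() {
        open = true;
        notifyAll();
    }

    synchronized boolean isOpen() {
        return open;
    }

    // Runs one waiting thread and one opening thread; true if the wait completed.
    static boolean demo() {
        Gate gate = new Gate();
        Thread opener = new Thread(gate::openGate);
        opener.start();
        try {
            gate.awaitOpen();
            opener.join();
            return gate.isOpen();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("gate opened: " + demo());
    }
}
```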
Concurrency
• In some cases, JLint produced many warnings from the same underlying bug.
  – Sometimes several hundred.
• The duplication could be eliminated by making changes to JLint.
  – Reporting a cycle in a lock graph just once.
• The sheer volume of JLint warnings made it difficult to assess the false positive rate.
overlap/skörun
Null dereferencing
• Not a lot of overlap was found between ESC/Java 2, FindBugs, and JLint.
• In a fair number of cases, JLint's warnings were false positives.
  – To eliminate some of the false positives, a deep analysis of program logic is required, which is difficult for a static analysis tool.
Null dereferencing
• ESC/Java 2 often assumes objects might be null, so it reports more warnings than the other tools.
  – Too many warnings to be easily useful.
  – Annotations are needed to reduce them.
• ESC/Java 2 does not always report warnings in the same place as JLint.
  – Overlap is not 100%.
heuristics/brjóstvitsfræði
Null dereferencing and Null assignment
• FindBugs reports far fewer warnings because it uses heuristics to avoid reporting warnings when its dataflow analysis loses precision.
• PMD warns about setting certain objects to null, but the authors regarded this as not useful “for many common coding styles.”
  – False positives.
Array Bounds Errors
• Java throws a run-time exception if an array index is out of bounds.
• JLint and ESC/Java 2 check for out-of-bounds array indices and for the creation of arrays with a negative size.
• JLint and ESC/Java 2 were found not always to report the same warnings in the same places.
  – Overlap not 100%.
Array Bounds Errors
• ESC/Java 2 mainly reports warnings because parameters that are later used in array accesses may not be within range.
  – Annotations are needed to remove these warnings.
• JLint has several false positives and some false negatives.
  – Some information is not tracked interprocedurally in its dataflow analysis.
  – See the following JLint example.
Array Bounds Errors
JLint example
public class Foo {
  static Integer[] ary = new Integer[2];
  public static void assign() {
    Object o0 = ary[ary.length];
    Object o1 = ary[ary.length-1];
  }
}
• JLint warns that the access for o1 might be out of bounds because it thinks the length of the array might be zero. (False positive.)
• But JLint does not warn about the access for o0 which will always be out of bounds. (False negative.)
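What actually happens at run time for the two accesses can be checked with a short sketch. The array declaration matches the slide; the FooBounds class and the helper method are assumptions added so that the exception can be observed instead of halting the program.

```java
// Illustrative sketch: FooBounds and throwsOutOfBounds are assumed names.
final class FooBounds {
    static Integer[] ary = new Integer[2];

    // Returns true if reading ary[index] throws ArrayIndexOutOfBoundsException.
    static boolean throwsOutOfBounds(int index) {
        try {
            Object element = ary[index];
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("ary[ary.length] throws:     " + throwsOutOfBounds(ary.length));
        System.out.println("ary[ary.length - 1] throws: " + throwsOutOfBounds(ary.length - 1));
    }
}
```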
Array Bounds Errors
• FindBugs warns about returning null from a method that returns an array.
  – Might be better to use a 0-length array.
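The difference for callers can be sketched as follows. Inbox and its methods are hypothetical names; the first method returns null when there is nothing to return, the second returns a shared zero-length array.

```java
// Illustrative sketch: Inbox and its methods are assumed names.
final class Inbox {
    private static final String[] NO_MESSAGES = new String[0];

    // Returning null forces every caller to check before touching the result.
    static String[] unreadOrNull(boolean hasMail) {
        return hasMail ? new String[] {"hello"} : null;
    }

    // Returning a zero-length array lets callers loop or read length unconditionally.
    static String[] unread(boolean hasMail) {
        return hasMail ? new String[] {"hello"} : NO_MESSAGES;
    }

    public static void main(String[] args) {
        for (String message : unread(false)) {
            System.out.println(message); // loop body never runs, and no null check is needed
        }
        System.out.println("unread count: " + unread(false).length);
    }
}
```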
correlation/fylgni
Correlations
r = 0.34
• Regarding the total warning counts per class, no substantial correlations were found comparing JLint vs PMD (r = 0.15), JLint vs FindBugs (r = 0.33), or FindBugs vs PMD (r = 0.31).
• Regarding lines of code per class and warning counts, weak or no correlations were found.
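Assuming r here denotes Pearson's correlation coefficient, it can be computed from two equally long lists of per-class warning counts as sketched below. The class name is hypothetical and the numbers in main are made-up illustration data, not values from the study.

```java
// Illustrative sketch: assumes r is Pearson's correlation coefficient.
final class Pearson {

    // r = sum((x - meanX)(y - meanY)) / sqrt(sum((x - meanX)^2) * sum((y - meanY)^2))
    static double r(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two samples of equal length >= 2");
        }
        int n = x.length;
        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        // Made-up warning counts per class for two hypothetical tools.
        double[] toolA = {3, 0, 7, 2, 5};
        double[] toolB = {1, 4, 2, 0, 6};
        System.out.println("r = " + r(toolA, toolB));
    }
}
```

Values near 0 indicate that a class with many warnings from one tool does not tend to have many from the other.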
usability/nytsemi
Usability of Tools
• Tools with a GUI and/or IDE plugins were much more usable than tools that provided only textual output.
• Tools should avoid cascading errors:
  – JLint repeatedly warns about dereferencing a variable that may be null.
    • Warning about the first dereference should be sufficient.
Usability of Tools
• False positives are a problem for all the tools.
  – Only ESC/Java 2 supports annotations to eliminate false positives.
• Allowing developers to weigh the severity of types of warnings might be useful.
distinct/ólíkur
Conclusions/Niðurstöður
• While there was some overlap between tools, “mostly warnings are distinct”.
• The main usability problem is the sheer volume of warnings.
Conclusions/Niðurstöður
• All tools should support an annotation facility to reduce the number of false positives.
• Work is needed to determine actual faults in programs so that false positive and false negative rates can be precisely determined.
Conclusions/Niðurstöður
• The absence of warning does not imply the absence of error.
• We still do not know the right tradeoffs to make in bug finding tools.