A Comparison of Bug Finding Tools for Java

bug/lús These warnings about my Java code are interesting.

Software Project 2 (Hugbúnaðarverkefni 2) - Static Analysis, Lectures 7 and 8: Comparing Java bug finding tools

20/03/2009

Dr Andy Brooks


comparison/samanburður

Case Study/Dæmisaga

Reference: A Comparison of Bug Finding Tools for Java, N Rutar, C B Almazan, and J S Foster, ISSRE'04, paper 4, ©IEEE

Static analysis tools compared: Bandera, ESC/Java 2, FindBugs, Jlint, PMD

“We present what we believe is the first detailed comparison of several different bug finding tools for Java over a variety of checking tasks.”


trade-off/skipti

Are all bugs true?

• false positives – “warnings about correct code”

• false negatives – “failing to warn about incorrect code”

• Bug finding tools make different tradeoffs regarding both false positives and false negatives.


to generalize/að alhæfa too many warnings/of margar viðvaranir difficult to understand/erfitt að skilja

Research Validity Rannsóknarréttmæti

• Java tools and code – C and C++ were not studied, but C and C++ tools use the same basic techniques.

• “...we did not exactly categorize every false positive and false negative...”
– There were too many warnings from the tools.
– The code being checked was written by others and difficult to understand.

• Results from the tools are cross-checked.

severity/harka serious logical error/ alvarleg rökvilla

Research Validity Rannsóknarréttmæti

• Bugs were not rated according to severity.
– Quantifying the severity of bugs is difficult.
– How serious is it to leave out brackets in if...else... statements? PMD warns about missing brackets.

Without brackets, the else pairs with the nearest if (the “dangling else problem”), so x keeps the value 2:

    int x = 2, y = 3;
    if (x == y) if (y == 3) x = 3; else x = 4;

With brackets, the else pairs with the outer if, so x becomes 4:

    int x = 2, y = 3;
    if (x == y) { if (y == 3) x = 3; } else { x = 4; }

If a programmer believes x will have the value 4, this could be a serious logical error.
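The two bracket placements can be run side by side to see the different outcomes. The following is a minimal sketch for illustration only, not code from the paper; the class and method names (DanglingElse, withoutBraces, withBraces) are assumptions made for this example.

```java
public class DanglingElse {

    // Without braces, the else binds to the nearest if (y == 3),
    // so nothing runs at all when x != y.
    static int withoutBraces(int x, int y) {
        if (x == y)
            if (y == 3)
                x = 3;
            else
                x = 4;
        return x;
    }

    // With braces, the else belongs to the outer if (x == y).
    static int withBraces(int x, int y) {
        if (x == y) {
            if (y == 3) x = 3;
        } else {
            x = 4;
        }
        return x;
    }

    public static void main(String[] args) {
        System.out.println(withoutBraces(2, 3)); // prints 2
        System.out.println(withBraces(2, 3));    // prints 4
    }
}
```

Both versions compile without warnings from the Java compiler, which is why a style check such as PMD's can still be worth having.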


to generalize/að alhæfa runtime exception/inningarfrábrigði

Research Validity Rannsóknarréttmæti

    String s = new String("I'm not null...yet");
    s = null;
    System.out.println(s.length());

Jlint, FindBugs, ESC/Java 2 report this as a bug.

– A runtime exception will occur and the program will halt if the exception is not caught. This bug will be very quickly caught when the program is run.
– Is the null dereference really a more severe bug than the missing brackets in the if...else... statement?
• The faulty logic might not be caught so quickly.


compilation/vistþýðing

Figure 1 code compiles with no errors or warnings. But 4 of the 5 tools found at least one bug.

©IEEE


object/hlutur

Figure 1 code
• PMD warns that y on line 8 is not used.
• FindBugs warns that the result of the read on line 12 is ignored.
• FindBugs warns that the method may fail to close a stream on exception.
• FindBugs warns about using == to compare String objects.
• ESC/Java 2 warns about an array index being possibly too large on lines 17-18.
– Java arrays are indexed from 0 to length-1
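The warning about comparing String objects with == concerns reference comparison versus content comparison. The sketch below is a hedged illustration and does not reproduce the Figure 1 code; the class name StringComparison and its helper methods are hypothetical.

```java
public class StringComparison {

    // Reference comparison: true only if both variables point to the same object.
    static boolean sameReference(String a, String b) {
        return a == b;
    }

    // Content comparison: true if the character sequences are equal.
    static boolean sameContent(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        String literal = "stop";
        String built = new String("stop"); // a distinct object with the same characters
        System.out.println(sameReference(literal, built)); // prints false
        System.out.println(sameContent(literal, built));   // prints true
    }
}
```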


overlap/skörun

Figure 1 code
• ESC/Java 2 warns about a possible null dereference on line 18, which is a false positive because b is initialized in the constructor.
• JLint warns about the String comparison on lines 17-18.
– There is overlap with FindBugs.

• FindBugs looks for unused variables but does not discover that y is unused.
– False negative.

• JLint can warn about indexing out of bounds, but does not recognise the bug on line 16.
– False negative.

Figure 2 The Bug Finding Tools

Tool         Version        Input      Interfaces          Technology
Bandera      0.3b2 (2003)   source     CL, GUI             Model Checking
ESC/Java 2   2.0a7 (2004)   source     CL, GUI             Theorem Proving
FindBugs     0.8.2 (2004)   bytecode   CL, GUI, IDE, Ant   Syntax, Dataflow
JLint        3.0 (2004)     bytecode   CL                  Syntax, Dataflow
PMD          1.9 (2004)     source     CL, GUI, IDE, Ant   Syntax

multithreading/fjölþráðavinnsla extensible/stækkanlegur detector/skynjari

FindBugs

• Checks that calls to wait() in multi-threaded Java programs are inside a loop, which is usually the correct usage.
• Uses dataflow analysis within a method (intraprocedural) to check for null pointer dereferences.
• FindBugs can be extended by writing custom bug detectors.
• In the study, FindBugs was set to report medium priority warnings.
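The wait()-inside-a-loop usage that FindBugs checks for can be sketched as follows. This is an illustrative example, not code from the study; the class ReadyFlag and its methods are hypothetical names. The loop re-tests the condition after every wake-up, so a spurious wake-up or an early notification does not let the waiting thread proceed too soon.

```java
public class ReadyFlag {
    private boolean ready = false;

    // Guarded wait: the condition is re-checked after every wake-up.
    public synchronized void awaitReady() throws InterruptedException {
        while (!ready) {
            wait();
        }
    }

    // Sets the condition and wakes every thread waiting on this object.
    public synchronized void markReady() {
        ready = true;
        notifyAll();
    }

    public synchronized boolean isReady() {
        return ready;
    }

    public static void main(String[] args) throws InterruptedException {
        ReadyFlag flag = new ReadyFlag();
        Thread setter = new Thread(flag::markReady);
        setter.start();
        flag.awaitReady(); // returns once markReady() has run
        setter.join();
        System.out.println(flag.isReady()); // prints true
    }
}
```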


deadlock/sjálfhelda

JLint
• JLint includes an interprocedural, inter-file component to find deadlocks.
• JLint 3.0 is not easily extended.

From the Jlint manual: If, for one class, the locking scheme is such that it could lead to a cycle in the locking graph, this message is shown.

    public void foo() { synchronized (a) { synchronized (b) { } } }
    public void bar() { synchronized (b) { synchronized (a) { } } }
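The idea of a cycle in a locking graph can be sketched with a small directed graph and a depth-first search. This is an assumption-laden illustration of the general technique, not JLint's actual implementation; the class LockGraph and its methods are hypothetical. An edge from a to b records that lock b is acquired while lock a is held.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class LockGraph {
    // edges.get(a) holds every lock acquired while a is already held.
    private final Map<String, Set<String>> edges = new HashMap<>();

    public void addNestedAcquire(String held, String acquired) {
        edges.computeIfAbsent(held, k -> new HashSet<>()).add(acquired);
        edges.computeIfAbsent(acquired, k -> new HashSet<>());
    }

    // Depth-first search: reaching a lock that is still on the stack closes a cycle.
    public boolean hasCycle() {
        Set<String> done = new HashSet<>();
        Set<String> onStack = new HashSet<>();
        for (String lock : edges.keySet()) {
            if (visit(lock, done, onStack)) return true;
        }
        return false;
    }

    private boolean visit(String lock, Set<String> done, Set<String> onStack) {
        if (onStack.contains(lock)) return true;
        if (done.contains(lock)) return false;
        onStack.add(lock);
        for (String next : edges.get(lock)) {
            if (visit(next, done, onStack)) return true;
        }
        onStack.remove(lock);
        done.add(lock);
        return false;
    }

    public static void main(String[] args) {
        LockGraph g = new LockGraph();
        g.addNestedAcquire("a", "b"); // foo(): a, then b
        g.addNestedAcquire("b", "a"); // bar(): b, then a
        System.out.println(g.hasCycle()); // prints true
    }
}
```

Reporting each cycle once, rather than once per path that reaches it, is one way a tool could avoid repeated warnings for the same underlying problem.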


PMD
• No dataflow analysis.
• Many of PMD's bugs are about breaking style conventions.
– What style conventions do you adhere to?
• How many Python coding style standards are there?
• How many C# coding style standards are there?

• PMD allows bug detectors to be turned on and off.
• PMD can be extended.

precondition/forskilyrði postcondition/eftirskilyrði loop invariant/lykkjuóbreyta comment/athugasemd specification/hönnunarlýsing

ESC/Java 2

• The programmer adds preconditions, postconditions, and loop invariants as special comments in the code.
• ESC/Java 2 uses a theorem prover to verify that the program matches the specifications.
• In the study, ESC/Java 2 was used without specifications: no annotations were added to reduce the number of false positives.
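Specifications written as special comments might look like the following. This is a hedged sketch in JML-style notation, which ESC/Java 2 reads; the class ArrayReader, the method elementAt, and the particular clauses are assumptions chosen for illustration and do not come from the study. Because the annotations are comments, the file compiles and runs as ordinary Java.

```java
public class ArrayReader {

    // Preconditions: the array exists and the index is within bounds.
    //@ requires a != null;
    //@ requires 0 <= i && i < a.length;
    // Postcondition: the returned value is the element at index i.
    //@ ensures \result == a[i];
    public static int elementAt(int[] a, int i) {
        return a[i];
    }

    public static void main(String[] args) {
        int[] data = {10, 20, 30};
        System.out.println(elementAt(data, 1)); // prints 20
    }
}
```

With preconditions like these in place, a checker no longer has to assume that a might be null or that i might be out of range inside the method, which is how annotations can reduce false positives.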


Figure 3 Bug Types/Lúsar Tegundir (tools compared: ESC/Java 2, FindBugs, JLint, PMD; the per-tool check marks are not recoverable from the extracted slide)

Bug type                        Example
General                         Null dereference
Concurrency                     Possible deadlock
Exceptions                      Possible unexpected exception
Array                           Length may be less than zero
Mathematics                     Division by zero
Conditional, loop               Unreachable code due to constant guard
String                          Using == or !=
Object overriding               Equal objects must have equal hashcodes
I/O stream                      Stream not closed on all paths
Unused or duplicate statement   Unused local variable
Design                          Should be a static inner class
Unnecessary statement           Unnecessary return statement

* Means that the tool checks the specific example provided.

running time/inningartími

Experiments/Tilraunir
• Five mid-sized programs.
• Running times of the static analysis tools varied enormously:
– ESC/Java 2 took a few hours
– FindBugs and PMD took a few minutes
– JLint took a few seconds
– Timings were on Mac OS X v10.3.3 with a 1.25 GHz PowerPC G4 processor and 512 MB RAM.

number of warnings/fjöldi viðvarana repeated warnings/endurteknar viðvaranir raw/óunnin

Figure 4 Raw* warning counts

Program               NCSS (Lines)   Class Files   ESC/Java 2   FindBugs   JLint   PMD
Azureus 2.0.7         35,549         1,053         5,474        360        1,584   1,371
Art of Illusion 1.7   55,249         676           12,813       481        1,637   1,992
Tomcat 5.019          34,425         290           1,241        245        3,247   1,236
JBoss 3.2.3           8,354          274           1,539        79         317     153
Megamek 0.29          37,255         270           6,402        223        4,353   536

* No attempt was made to remove repeated warnings about the same error (sama villan).

“For ESC/Java 2, the number of generated warnings is sometimes extremely high... FindBugs generally reports fewer warnings than the other tools. In general, we found this makes FindBugs easier to use...”

At one minute per warning, reviewing about 12,000 ESC/Java 2 warnings takes 12000/60 = 200 hours, while reviewing about 480 FindBugs warnings takes 480/60 = 8 hours.

too many warnings/of margar viðvaranir category/flokkur dozen/tylft

Analysis/Greining

• There are too many warnings to review manually, so the authors decide to analyse only a few bug categories.
• Even restricting the bug categories is not enough, so manual examination is limited to “several dozen warnings”.

Hmmm, looks like another false positive...

Figure 6 Warning counts by bug category

Bug category               ESC/Java 2   FindBugs   JLint   PMD
Concurrency                126          122        8,883   0
Null dereferencing         9,120        18         449     0
Null assignment            0            0          0       594
Index out of bounds        1,810        0          264     0
Prefer Zero Length Array   0            36         0       0

concurrent/samskeiða

Concurrency
• ESC/Java 2 reported 126 deadlock warnings. The authors looked at a handful of these and some appeared to be false positives. They found it difficult to investigate further because of the way the warnings were reported.
• FindBugs found three cases (true bugs) of the 'double-checked locking bug' in Java but PMD produced no warnings (false negatives).
– PMD was fooled by some other code mixed in with the bug pattern.
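For readers unfamiliar with the double-checked locking bug, the pattern can be sketched as below. This is a generic illustration, not one of the three cases found in the study; the class Settings and its method names are hypothetical. The unsafe version reads a non-volatile field outside the lock, so another thread may see a reference to an object that is not yet fully constructed. Declaring the field volatile is one commonly cited repair under the Java 5 and later memory model.

```java
public class Settings {
    private static Settings unsafeInstance;        // not volatile: the classic bug
    private static volatile Settings safeInstance; // volatile: one common repair

    private Settings() { }

    // Broken pattern: the first null check happens without synchronization
    // on a field that is not volatile.
    static Settings getUnsafe() {
        if (unsafeInstance == null) {
            synchronized (Settings.class) {
                if (unsafeInstance == null) {
                    unsafeInstance = new Settings();
                }
            }
        }
        return unsafeInstance;
    }

    // Repaired variant: same shape, but the field is volatile.
    static Settings getSafe() {
        Settings local = safeInstance;
        if (local == null) {
            synchronized (Settings.class) {
                local = safeInstance;
                if (local == null) {
                    safeInstance = local = new Settings();
                }
            }
        }
        return local;
    }

    public static void main(String[] args) {
        System.out.println(getSafe() == getSafe()); // prints true
    }
}
```

The two methods differ only in a field modifier, which hints at why a purely syntactic matcher can miss the pattern when other code is mixed in.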


Concurrency
• FindBugs was usually found to correctly indicate the presence of a concurrency bug pattern, but it was not always clear that an actual error was present.
– FindBugs has no interprocedural analysis, so it will warn about a wait() not being in a loop even though the wait() might be in a method that is called from within a loop.
– And not all uses of a wait() outside a loop are wrong.

Concurrency
• In some cases, JLint produced many warnings from the same underlying bug.
– Sometimes several hundred.

• The duplication could be eliminated by making changes to JLint.
– Reporting a cycle in a lock graph just once.

• The sheer volume of JLint warnings made it difficult to assess the false positive rate.

overlap/skörun

Null dereferencing
• Not a lot of overlap was found between ESC/Java 2, FindBugs, and JLint.
• In a fair number of cases, JLint's warnings were false positives.
– To eliminate some of the false positives, a deep analysis of program logic is required, which is difficult for a static analysis tool.

Null dereferencing
• ESC/Java 2 often assumes objects might be null, so it reports more warnings than the other tools.
– Too many warnings to be easily useful.
– Annotations need to be added.

• ESC/Java 2 does not always report warnings in the same place as JLint.
– Overlap is not 100%.

heuristics/brjóstvitsfræði

Null dereferencing and Null assignment
• FindBugs reports far fewer warnings because it uses heuristics to avoid reporting warnings when its dataflow analysis loses precision.
• PMD warns about setting certain objects to null, but the authors regarded this as not useful “for many common coding styles.”
– False positives.

Array Bounds Errors
• Java throws a run-time exception if an array index is out of bounds.
• JLint and ESC/Java 2 check for array indexes that are out of bounds and for creating an array with a negative size.
• JLint and ESC/Java 2 were found not always to report the same warnings in the same places.
– Overlap not 100%.

Array Bounds Errors
• ESC/Java 2 mainly reports warnings because parameters that are later used in array accesses may not be within range.
– Annotations need to be added.

• JLint has several false positives and some false negatives.
– Some information is not tracked interprocedurally in its dataflow analysis.
– See the following JLint example.

Array Bounds Errors
JLint example

    public class Foo {
        static Integer[] ary = new Integer[2];
        public static void assign() {
            Object o0 = ary[ary.length];
            Object o1 = ary[ary.length-1];
        }
    }

• JLint warns that the access for o1 might be out of bounds because it thinks the length of the array might be zero. (False positive.)
• But JLint does not warn about the access for o0 which will always be out of bounds. (False negative.)

Array Bounds Errors
• FindBugs warns about returning null from a method that returns an array.
– Might be better to use a 0-length array.
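The suggestion to return a 0-length array instead of null can be sketched as follows. This is an illustrative example only; the class Matches, the method withPrefix, and the sample names passed to it are assumptions. Returning an empty array lets callers loop over the result or read its length without a null check.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Matches {

    // Returns the entries that start with the prefix; an empty array means "none".
    static String[] withPrefix(List<String> names, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String name : names) {
            if (name.startsWith(prefix)) {
                hits.add(name);
            }
        }
        return hits.toArray(new String[0]); // never null
    }

    public static void main(String[] args) {
        String[] none = withPrefix(Arrays.asList("Tomcat", "JBoss"), "Z");
        // No null check is needed before using the result.
        System.out.println(none.length); // prints 0
    }
}
```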


correlation/fylgni

Correlations

r = 0.34

• Regarding the total warning counts per class, no substantial correlations were found comparing JLint vs PMD (r = 0.15), JLint vs FindBugs (r = 0.33), or FindBugs vs PMD (r = 0.31).
• Regarding lines of code per class and warning counts, weak or no correlations were found.
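A correlation coefficient between two tools' per-class warning counts could be computed as sketched below. This assumes that r denotes Pearson's coefficient, which the slides do not state; the class Pearson, the method correlation, and the sample arrays are hypothetical and are not data from the study.

```java
public class Pearson {

    // Pearson correlation coefficient r for two equally long samples.
    // Returns NaN if either sample has zero variance.
    static double correlation(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two samples of equal length >= 2");
        }
        int n = x.length;
        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double cov = 0.0, varX = 0.0, varY = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        return cov / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        // Made-up warning counts per class for two hypothetical tools.
        double[] toolA = {1, 2, 3, 4};
        double[] toolB = {2, 4, 6, 8};
        System.out.println(correlation(toolA, toolB));
    }
}
```

Values of r near 0, such as those reported on the slide, indicate that a class drawing many warnings from one tool does not reliably draw many from another.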


usability/nytsemi

Usability of Tools
• Tools with a GUI and/or IDE plugins were much more usable than tools that provided only textual output.
• Tools should avoid cascading errors:
– JLint repeatedly warns about dereferencing a variable that may be null.
• Warning about the first dereference should be sufficient.

Usability of Tools
• False positives are a problem for all the tools.
– Only ESC/Java 2 supports annotations to eliminate false positives.

• Allowing developers to weigh the severity of types of warnings might be useful.

distinct/ólíkur

Conclusions/Niðurstöður
• While there was some overlap between tools, “mostly warnings are distinct”.
• The main usability problem is the sheer volume of warnings.

Conclusions/Niðurstöður
• All tools should support an annotation facility to reduce the number of false positives.
• Work is needed to determine actual faults in programs so that false positive and false negative rates can be precisely determined.

Even restricting the bug categories is not enough, so manual examination is limited to “several dozen warnings”.

Conclusions/Niðurstöður
• The absence of a warning does not imply the absence of an error.
• We still do not know the right tradeoffs to make in bug finding tools.
