DeepBugs: A Learning Approach to Name-Based Bug Detection

MICHAEL PRADEL, TU Darmstadt, Germany
KOUSHIK SEN, University of California, Berkeley, USA

Natural language elements in source code, e.g., the names of variables and functions, convey useful information. However, most existing bug detection tools ignore this information and therefore miss some classes of bugs. The few existing name-based bug detection approaches reason about names on a syntactic level and rely on manually designed and tuned algorithms to detect bugs. This paper presents DeepBugs, a learning approach to name-based bug detection, which reasons about names based on a semantic representation and which automatically learns bug detectors instead of manually writing them. We formulate bug detection as a binary classification problem and train a classifier that distinguishes correct from incorrect code. To address the challenge that effectively learning a bug detector requires examples of both correct and incorrect code, we create likely incorrect code examples from an existing corpus of code through simple code transformations. A novel insight learned from our work is that learning from artificially seeded bugs yields bug detectors that are effective at finding bugs in real-world code. We implement our idea into a framework for learning-based and name-based bug detection. Three bug detectors built on top of the framework detect accidentally swapped function arguments, incorrect binary operators, and incorrect operands in binary operations. Applying the approach to a corpus of 150,000 JavaScript files yields bug detectors that have a high accuracy (between 89% and 95%), are very efficient (less than 20 milliseconds per analyzed file), and reveal 102 programming mistakes (with 68% true positive rate) in real-world code.

CCS Concepts: • Software and its engineering → Software notations and tools; Software verification and validation; • Computing methodologies → Neural networks;

Additional Key Words and Phrases: Bug detection, Natural language, Machine learning, Name-based program analysis, JavaScript

ACM Reference Format:
Michael Pradel and Koushik Sen. 2018. DeepBugs: A Learning Approach to Name-Based Bug Detection. Proc. ACM Program. Lang. 2, OOPSLA, Article 147 (November 2018), 25 pages. https://doi.org/10.1145/3276517
1 INTRODUCTION

Source code written by humans contains valuable natural language information, such as the identifier names of variables and functions. This information often conveys insights into the semantics intended by the developer and therefore is crucial for human program understanding [Butler et al. 2010; Lawrie et al. 2006]. While the importance of identifier names for humans is widely recognized, program analyses typically ignore most or even all identifier names. For example, popular static analysis tools, such as Google Error Prone [Aftandilian et al. 2012], FindBugs [Hovemeyer and Pugh 2004], or lgtm (https://lgtm.com/), mostly ignore identifier names.
Authors' addresses: Michael Pradel, TU Darmstadt, Department of Computer Science, Germany; Koushik Sen, University of California, Berkeley, EECS Department, USA.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s). This work is licensed under a Creative Commons Attribution 4.0 International License.

© 2018 Copyright held by the owner/author(s).
2475-1421/2018/11-ART147
https://doi.org/10.1145/3276517
Table 1. Examples of name-related bugs detected by DeepBugs.

Example 1.
Buggy code:
  browserSingleton.startPoller(100, function(delay, fn) {
    setTimeout(delay, fn);
  });
Description: The setTimeout function expects two arguments: a callback function and the number of milliseconds after which to invoke the callback. The code accidentally passes these arguments in the inverse order.

Example 2.
Buggy code:
  for (j = 0; j < param.replace; j++) {
    if (param.replace[j].from === paramVal) paramVal = param.replace[j].to;
  }
Description: The header of the for-loop compares the index variable j to the array param.replace. Instead, the code should compare j to param.replace.length.

Example 3.
Buggy code:
  for (var i = 0; i < this.NR_OF_MULTIDELAYS; i++) {
    // Invert the signal of every even multiDelay
    outputSamples = mixSampleBuffers(outputSamples,
      this.multiDelays[i].process(filteredSamples),
      2%i==0, this.NR_OF_MULTIDELAYS);
  }
Description: The expression 2%i==0 is supposed to alternate between true and false while traversing the loop. However, the code accidentally swapped the operands and should instead be i%2==0.
As a result, analyzing a program that has meaningful identifiers chosen by human developers yields the same results as analyzing a variant of the program where identifiers are consistently replaced with arbitrary and meaningless names.

Ignoring identifier names causes existing bug detection tools to miss bugs that, in hindsight, may appear obvious to a human. Table 1 gives three examples of such bugs. All three are from real-world code written in JavaScript, a language where identifiers are particularly important due to the lack of static types. Example 1 shows a bug in Angular.js where the developer accidentally passes two function arguments in the wrong order. The first expected argument is a callback function, but the second argument is called fn, an abbreviation for "function". Example 2 shows a bug in the Angular-UI-Router project where the developer compares two values of incompatible types with each other. In the absence of statically declared types, this inconsistency can be spotted based on the unusual combination of identifier names. Finally, Example 3 shows a bug in the DSP.js library where the developer accidentally swapped the operands of a binary operation inside a loop. A human might detect this bug knowing that i is a common name for a loop variable, which suggests that the code does not match the intended semantics. As illustrated by these examples, identifier names convey valuable information that can help to detect otherwise missed programming mistakes.

One reason why most program analyses, including bug detection tools, ignore identifier names is that reasoning about them is hard. Specifically, there are two challenges for a name-based bug detector. First, a name-based analysis must reason about the meaning of identifier names. As a form of natural language information, identifier names are inherently fuzzy and elude the precise reasoning that is otherwise common in program analysis. Second, given an understanding of the meaning of identifier names, an analysis must decide whether a given piece of code is correct or incorrect. To be practical as a bug detection tool, the second challenge must be addressed in a way that yields a reasonably low number of false positives while detecting actual bugs.
Previous work on name-based bug detection [Høst and Østvold 2009; Liu et al. 2016; Pradel and Gross 2011; Rice et al. 2017] addresses these challenges by lexically reasoning about identifiers and through manually designed algorithms. To reason about the meaning of identifiers, these approaches use lexical similarities of names as a proxy for semantic similarity. For example, the existing approaches may find length and len to be similar because they share a common substring, but miss the fact that length and count are semantically similar even though they are lexically different. To make decisions about programs, e.g., to report a piece of code as likely incorrect, existing name-based analyses rely on manually designed algorithms that use hard-coded patterns and carefully tuned heuristics. For example, a name-based analysis that has been recently deployed at Google [Rice et al. 2017] comes with various heuristics to increase the number of detected bugs and to decrease the number of false positives. Designing and fine-tuning such heuristics imposes a significant human effort that is difficult to reuse across different analyses and different classes of bugs.

This paper tackles the problem of name-based bug detection with a machine learning-based approach. To address the problem of reasoning about the meaning of identifiers, we use a learned vector representation of identifiers. This representation, called embeddings, preserves semantic similarities, such as the fact that length and count are similar. Such embeddings have been successful for several natural language processing tasks, and adapting them to source code is a natural choice for name-based bug detection. To address the problem of deciding whether a piece of code is likely correct or incorrect, we formulate the problem as binary classification and train a model to distinguish correct from incorrect code. Because the classifier is learned without human intervention, the approach does not rely on designing and tuning heuristics.

Effectively learning a classifier that distinguishes correct from incorrect code requires training data that consists of both correct and incorrect examples. Examples of correct code are easily available due to the huge amounts of existing code, based on the common assumption that most parts of most code are correct. In contrast, large amounts of code examples that are incorrect for a specific reason are much harder to find. In particular, manually obtaining a sufficiently large data set would require a human to label thousands of bugs. To address this problem, we generate large amounts of training data via simple program transformations that insert likely bugs into existing, supposedly correct code. An important insight of our work is that learning from such artificially generated training data yields a learned model that is effective at identifying real-world bugs.

We implement our ideas into an extensible framework, called DeepBugs, that supports different classes of name-related bugs. The framework extracts positive training examples from a code corpus, applies a simple transformation to also create large amounts of negative training examples, trains a model to distinguish these two, and finally uses the trained model for identifying mistakes in previously unseen code. We present three bug detectors based on DeepBugs that find accidentally swapped function arguments, incorrect binary operators, and incorrect operands in binary operations. Creating a new bug detector consists of two simple steps.
First, provide a training data generator that extracts correct and incorrect code examples from a given corpus of code. Second, map each code example into a vector that the machine learning model learns to classify as correct or incorrect. For the second step, all bug detectors reuse the same embedding of identifier names, simplifying the task of creating a name-based bug detector.

Our approach differs from existing bug detectors that identify bugs as anomalies in a corpus of code [Engler et al. 2001; Hangal and Lam 2002; Monperrus et al. 2010]. These approaches infer information from existing code by learning only from correct examples and then flag any code as unusual that deviates from the norm. To reduce false positives, those approaches typically filter the detected anomalies based on manually designed heuristics. Instead, our approach learns from positive and negative examples, enabling the machine learning model to accurately distinguish
these two classes. DeepBugs differs from existing work on name-based bug detection [Høst and Østvold 2009; Liu et al. 2016; Pradel and Gross 2011; Rice et al. 2017] by reasoning about identifiers based on a semantic representation, by learning bug detectors instead of manually writing them, and by considering two new name-related bug patterns on top of the previously considered swapped function arguments [Liu et al. 2016; Pradel and Gross 2011; Rice et al. 2017]. Finally, our work is the first on name-based bug detection for dynamically typed languages, where name-related bugs may remain unnoticed due to the lack of static type checking.

We evaluate DeepBugs and its three instantiations by learning from a corpus of 100,000 JavaScript files and by searching for mistakes in another 50,000 JavaScript files. In total, the corpus amounts to 68 million lines of code. We find that the learned bug detectors have an accuracy between 89% and 95%, i.e., they are very effective at distinguishing correct from incorrect code. Manually inspecting a subset of the warnings reported by the bug detectors, we found 102 real-world bugs and code quality problems among the 150 inspected warnings. Even though we do not perform any manual tuning or filtering of warnings, the bug detectors have a reasonable precision of 68%, i.e., the majority of the reported warnings point to actual bugs.

In summary, this paper contributes the following:

• A learning approach to name-based bug detection, which differs from previous name-based bug detectors (i) by reasoning about identifier names based on a semantic representation, (ii) by learning bug detectors instead of manually writing them, (iii) by considering additional bug patterns, and (iv) by targeting a dynamically typed programming language.
• We formulate bug detection as a classification problem and present a framework to learn a classifier from examples of correct and incorrect code. To obtain large amounts of training data for both classes, we create training data through simple program transformations that yield likely incorrect code.
• We implement the idea into a general framework that can be instantiated into different kinds of name-based bug detectors. The framework is available as open-source, enabling others to build on our work, e.g., by adding further bug detectors: https://github.com/michaelpradel/DeepBugs
• We provide empirical evidence that the approach yields effective bug detectors that find various bugs in real-world JavaScript code.

2 A FRAMEWORK FOR LEARNING TO FIND NAME-RELATED BUGS

This section presents the DeepBugs framework for automatically creating name-based bug detectors via machine learning. The basic idea is to train a classifier to distinguish between code that is an instance of a name-related bug pattern and code that does not suffer from this bug pattern. By bug pattern, we informally mean a class of programming errors that are similar because they violate the same rule. For example, accidentally swapping the arguments passed to a function, calling the wrong API method, or using the wrong binary operator are bug patterns. Manually written bug checkers, such as FindBugs or Error Prone, are also based on bug patterns, each of which corresponds to a separately implemented analysis.

2.1 Overview

Given a corpus of code, creating and using a bug detector based on DeepBugs consists of several steps. Figure 1 illustrates the process with a simple example.

(1) Extract and generate training data from the corpus.
This step statically extracts positive, i.e., likely correct, code examples from the given corpus and generates negative, i.e., likely incorrect, code examples.
[Figure: flow diagram of the approach. A corpus of code, e.g., setSize(width, height), feeds (1) a training data generator, which yields positive examples, e.g., setSize(width, height), and negative examples, e.g., setSize(height, width); (2) embeddings for identifiers map the examples to vector representations; (3) a classifier is trained on these vectors, yielding a learned bug detector. Previously unseen code, e.g., setDim(y_dim, x_dim), passes through (4) the embeddings for identifiers, (5) a query to the classifier, and (6) a prediction of likely bugs and code quality problems, e.g., incorrectly ordered arguments in setDim(y_dim, x_dim).]

Fig. 1. Overview of our approach.
Because we assume that most code in the corpus is correct, the extracted positive code examples are likely to not suffer from the particular bug pattern. To also create negative training examples, DeepBugs applies simple code transformations that are likely to introduce a bug. (Step 1 in Figure 1.)

(2) Represent code as vectors. This step translates each code example into a vector. To preserve semantic information conveyed by identifier names, we learn an embedding that maps identifiers to a semantic vector representation via a Word2Vec neural network [Mikolov et al. 2013b]. (Step 2 in Figure 1.)

(3) Train a model to distinguish correct and incorrect examples. Given two sets of code that contain positive and negative examples, respectively, this step trains a classifier to distinguish between the two kinds of examples. The classifier is a feedforward neural network. (Step 3 in Figure 1.)

(4) Predict bugs in previously unseen code. This step applies the classifier obtained in the previous step to predict whether a previously unseen piece of code suffers from the bug pattern. If the learned model classifies the code to be likely incorrect, the approach reports a warning to the developer. (Steps 4 to 6 in Figure 1.)

The example in Figure 1 illustrates these steps for a bug detector aimed at finding incorrectly ordered function arguments. In Step 6, the bug detector warns about a likely bug where the arguments y_dim and x_dim should be swapped. The reason that the approach can spot such bugs is that the trained classifier generalizes beyond the training data based on the semantic representation of identifiers. For the example, the representation encodes that width and x_dim, as well as height and y_dim, are pairwise semantically similar, enabling DeepBugs to detect the bug.
2.2 Generating Training Data
An important prerequisite for any learning-based approach is a sufficiently large amount of training data. In this work, we formulate the problem of bug detection as a binary classification task and address it via supervised learning. To effectively address this task, our approach relies on training data for both classes, i.e., examples of both correct and incorrect code. As observed by others [Bielik et al. 2016; Nguyen and Nguyen 2015; Raychev et al. 2015], the huge amount of existing code provides ample examples of likely correct code. In contrast, it is non-trivial to obtain many examples of code that suffers from a particular bug pattern. One possible approach is to manually or semi-automatically search code repositories and bug trackers for examples of bugs that match a given bug pattern. However, scaling this approach to thousands or even millions of examples, as required for advanced machine learning, is extremely difficult.

Instead of relying on manual effort for creating training data, this work generates training data fully automatically from a given corpus of code. The key idea is to apply a simple code transformation τ that transforms likely correct code extracted from the corpus into likely incorrect code. Section 3 presents implementations of τ that apply simple AST-based code transformations.
Definition 2.1 (Training data generator). Let C ⊆ L be a set of code in a programming language L. Given a piece of code c ∈ C, a training data generator G : C → (2^{C_pos}, 2^{C_neg}) creates two sets of code snippets C_pos ⊆ C and C_neg ⊆ L, which contain positive and negative training examples, respectively. The negative examples are created by applying a transformation τ : C → C to each positive example:

  C_neg = { c_neg | c_neg = τ(c_pos), ∀ c_pos ∈ C_pos }

By code snippet we mean a single expression, a single statement, or multiple related statements. Each code snippet contains enough information to determine whether it contains a bug. For example, a code snippet can be a call expression with two arguments, which enables a bug detector to decide whether the arguments are passed in the correct order.

There are various ways to implement a training data generator. For example, suppose the bugs of interest are accidentally swapped arguments of function calls. A training data generator for this bug pattern gathers positive examples by extracting all function calls that have at least two arguments, and negative examples by permuting the order of the arguments of these calls. Under the assumption that the given code is mostly correct, the original calls are likely correct, whereas changing the order of arguments is likely to produce an incorrect call.

Our idea of artificially creating likely incorrect code relates to mutation testing [Jia and Harman 2011] and to work on artificially introducing security vulnerabilities [Dolan-Gavitt et al. 2016; Pewny and Holz 2016]. These existing techniques are intended for evaluating the effectiveness of test suites and of vulnerability detection tools, respectively. Our work differs by creating likely incorrect code for a different purpose, namely training a machine learning model, and by considering bug patterns that are amenable to name-based bug detection.
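To make the definition concrete, the following is a minimal sketch of a training data generator for the swapped-argument pattern just described. It is written in Python and assumes that call sites have already been extracted from ASTs into dictionaries; the field names and the dictionary encoding are illustrative assumptions, not the actual DeepBugs data format.

```python
# Hypothetical training data generator (Definition 2.1) for swapped arguments.
def generate_examples(call_sites):
    """Return (C_pos, C_neg) for the swapped-argument bug pattern."""
    positives, negatives = [], []
    for call in call_sites:
        args = call["arguments"]
        if len(args) < 2:  # only calls with at least two arguments qualify
            continue
        positives.append(call)
        # The transformation tau: swap the first two arguments to turn a
        # likely correct call into a likely incorrect one.
        swapped = dict(call)
        swapped["arguments"] = [args[1], args[0]] + args[2:]
        negatives.append(swapped)
    return positives, negatives

calls = [{"callee": "ID:setSize", "arguments": ["ID:width", "ID:height"]}]
pos, neg = generate_examples(calls)
print(neg[0]["arguments"])  # ['ID:height', 'ID:width']
```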
2.3 Embeddings for Identifiers and Literals
As machine learning relies on vector representations of the analyzed data, learning a bug detector requires vector representations of code snippets. An important challenge for a name-based bug detector is to reason about identifier names, which are natural language information and therefore inherently difficult to understand for a computer. Our goal is to distinguish semantically similar identifiers from dissimilar ones. For example, the bug detector that searches for swapped arguments may learn from examples such as done(error, result) that done(res, err) is likely to be wrong, because error ≈ err and result ≈ res, where ≈ refers to semantic similarity. In contrast, seq and sequoia are semantically dissimilar because they refer to different concepts, even though they share a common prefix of characters. As illustrated by these examples, semantic similarity does not always correspond to lexical similarity, as considered by prior work [Liu et al. 2016; Pradel and Gross 2011; Rice et al. 2017], and may even exist across type boundaries. To enable a machine learning-based bug detector to reason about identifiers and their semantic similarities, we require a representation of identifiers that preserves these semantic similarities.

In addition to identifiers, we also consider literals in code, such as true and 23, because they also convey relevant semantic information that can help to detect bugs. For example, true and 1 are similar (in JavaScript, at least) because both evaluate to true when being used in a conditional. To simplify the presentation, we say "identifier" to denote both identifiers and literals. Our implementation disambiguates tokens that represent identifiers and literals from each other and from language keywords by prepending the former with "ID:" and the latter with "LIT:".
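As an illustration of this disambiguation, the following sketch normalizes a stream of (type, value) token pairs, as produced by a JavaScript tokenizer such as esprima; the token type names and the helper itself are illustrative assumptions, not DeepBugs' actual implementation.

```python
# Hypothetical token normalization: prefix identifiers with "ID:" and
# literals with "LIT:", leaving keywords and punctuation unchanged.
LITERAL_KINDS = {"Numeric", "String", "Boolean", "Null", "RegularExpression"}

def normalize(tokens):
    out = []
    for kind, value in tokens:
        if kind == "Identifier":
            out.append("ID:" + value)
        elif kind in LITERAL_KINDS:
            out.append("LIT:" + value)
        else:  # keywords, punctuators, etc.
            out.append(value)
    return out

tokens = [("Keyword", "var"), ("Identifier", "len"), ("Punctuator", "="),
          ("Numeric", "23"), ("Punctuator", ";")]
print(normalize(tokens))  # ['var', 'ID:len', '=', 'LIT:23', ';']
```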
DeepBugs reasons about identifiers by automatically learning a vector representation, called embeddings, for each identifier based on a corpus of code:

Definition 2.2 (Embeddings). The embeddings are a map E : I → ℝ^e that assigns to each identifier in the set I of identifiers a real-valued vector in an e-dimensional space.

A naïve representation is a local, or one-hot, encoding, where e = |I| and where each vector returned by E contains only zeros except for a single element that is set to one and that represents the specific identifier. Such a local representation fails to provide two important properties. First, to enable efficient learning, we require an embedding that stores many identifiers in relatively short vectors. Second, to enable DeepBugs to generalize across non-identical but semantically similar identifiers, we require an embedding that assigns a similar vector to semantically similar identifiers.

Instead of a local embedding, we use a distributed embedding, where the information about an identifier is distributed across all elements of the vector returned by E. Our distributed embedding is inspired by word embeddings for natural languages, specifically by Word2Vec [Mikolov et al. 2013a]. The basic idea of Word2Vec is that the meaning of a word can be derived from the various contexts in which this word is used. In natural languages, the context of an occurrence of a word in a sequence of words is the window of words preceding and succeeding the word. We adapt this idea to source code by viewing code as a sequence of tokens and by defining the context of the occurrence of an identifier as its immediately preceding and succeeding tokens. Given a sequence of tokens t_1, ..., t_i, ..., t_k, where t_i is an identifier, the approach considers a window of w tokens around t_i, containing the w/2 tokens before t_i and the w/2 tokens after t_i. As a default, we choose w = 20 in our experiments.

We use the CBOW variant of Word2Vec [Mikolov et al. 2013a], which trains a neural network that predicts a token from its surrounding tokens. To achieve this goal, the network feeds the given information through a hidden layer of size e, from which the token is predicted. We use e = 200 for our experiments. Once trained, the network has learned a semantics-preserving representation of each identifier, and we use this representation as the embedding E of the identifier. For efficiency during training, we limit the vocabulary V of tokens, including identifiers and literals, to |V| = 10,000 by discarding the least frequent tokens. To represent tokens beyond V, we use a placeholder "unknown". Section 5.7 shows that this way of bounding the vocabulary size covers the vast majority of all occurrences of identifiers and literals.
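As an illustration, such embeddings could be learned with an off-the-shelf Word2Vec implementation such as gensim's, using the hyperparameters stated above; this is a sketch under stated assumptions, not the actual DeepBugs training code. Note that gensim's window parameter counts tokens on each side of the current token, so w = 20 corresponds to window=10, and gensim prunes rare tokens from the vocabulary rather than mapping them to an "unknown" placeholder.

```python
from gensim.models import Word2Vec  # requires gensim >= 4

# One normalized token sequence per source file (toy data for illustration).
token_sequences = [
    ["var", "ID:length", "=", "ID:list", ".", "ID:length", ";"],
    ["var", "ID:count", "=", "ID:items", ".", "ID:length", ";"],
]

model = Word2Vec(
    sentences=token_sequences,
    vector_size=200,         # embedding dimension e = 200
    window=10,               # 10 tokens before and after, i.e., w = 20
    sg=0,                    # CBOW: predict a token from its context
    min_count=1,             # keep rare tokens in this toy corpus
    max_final_vocab=10_000,  # bound the vocabulary size |V|
)

vector = model.wv["ID:length"]  # the embedding E(ID:length)
print(vector.shape)             # (200,)
# After training on a large corpus, semantically similar identifiers end up
# close to each other, e.g., model.wv.most_similar("ID:length") would rank
# names like "ID:count" highly.
```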
2.4 Vector Representations of Positive and Negative Code Examples

Given code snippets extracted from a corpus, our approach uses the embeddings for identifiers to represent each snippet as a vector suitable for learning:

Definition 2.3 (Code representation). Given a code snippet c ∈ C, its code representation v ∈ ℝ^n is an n-dimensional real-valued vector that contains the embeddings of all identifiers in c.

Each bug detector built on top of the DeepBugs framework chooses a code representation suitable for the specific kind of code snippet (explained in detail in Section 3). For example, to detect bugs related to function arguments, the code representation may contain the embeddings of the function name and the arguments.

All bug detectors share the same technique for extracting names of expressions. Given an AST node n that represents an expression, we extract name(n) as follows:

• If n is an identifier, return its name.
• If n is a literal, return a string representation of its value.
• If n is a this expression, return "this".
• If n is an update expression that increments or decrements x, return name(x).
• If n is a member expression base.prop that accesses a property, return name(prop).
• If n is a member expression base[k] that accesses an array element, return name(base).
• If n is a call expression base.callee(..), return name(callee).
• For any other AST node n, do not extract its name.
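A direct transcription of these rules into Python might look as follows, assuming ESTree-style AST nodes encoded as dictionaries (as produced, e.g., by esprima) and applying the "ID:"/"LIT:" prefixes from Section 2.3 at the leaves; the encoding is an assumption for illustration.

```python
# Hypothetical implementation of the name(n) extraction rules above.
def name(n):
    t = n["type"]
    if t == "Identifier":
        return "ID:" + n["name"]
    if t == "Literal":
        return "LIT:" + str(n["value"])
    if t == "ThisExpression":
        return "LIT:this"              # treated as a literal, cf. Table 2
    if t == "UpdateExpression":        # i++ or --i
        return name(n["argument"])
    if t == "MemberExpression":
        if n["computed"]:              # base[k]: name of the base
            return name(n["object"])
        return name(n["property"])     # base.prop: name of the property
    if t == "CallExpression":          # base.callee(..): name of the callee
        return name(n["callee"])
    return None                        # no name for other kinds of nodes

# db.allNames()[3] resolves to ID:allNames, as in Table 2:
node = {"type": "MemberExpression", "computed": True,
        "object": {"type": "CallExpression",
                   "callee": {"type": "MemberExpression", "computed": False,
                              "object": {"type": "Identifier", "name": "db"},
                              "property": {"type": "Identifier",
                                           "name": "allNames"}}},
        "property": {"type": "Literal", "value": 3}}
print(name(node))  # ID:allNames
```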
Table 2. Examples of identifier names and literals extracted for name-based bug detectors.

Expression          Extracted name
list                ID:list
23                  LIT:23
this                LIT:this
i++                 ID:i
myObject.prop       ID:prop
myArray[5]          ID:myArray
nextElement()       ID:nextElement
db.allNames()[3]    ID:allNames
Table 2 gives examples of names extracted from JavaScript expressions. We use the prefixes "ID:" and "LIT:" to distinguish identifiers and literals. The extraction technique is similar to that used in manually created name-based bug detectors [Liu et al. 2016; Pradel and Gross 2011; Rice et al. 2017], but omits heuristics that make the extracted name suitable for a lexical comparison of names. For example, existing techniques remove common prefixes, such as "get", to increase the lexical similarity between, e.g., getNames and names. Instead, DeepBugs identifies semantic similarities of names through learned embeddings.
2.5 Training and Querying a Bug Detector
Based on the vector representation of code snippets, a bug detector is a model that distinguishes between vectors that correspond to correct and incorrect code examples, respectively.

Definition 2.4 (Bug detector). A bug detector D is a binary classifier D : C → [0, 1] that predicts the probability that a code snippet c ∈ C is an instance of a particular bug pattern.

Training a bug detector consists of two steps. At first, DeepBugs computes for each positive example c_pos ∈ C_pos its vector representation v_pos ∈ ℝ^n, which yields a set V_pos of vectors. Likewise, the approach computes the set V_neg from the negative examples c_neg ∈ C_neg. Then, we train the bug detector D in a supervised manner by providing two kinds of input-output pairs: (v_pos, 0) and (v_neg, 1). The output of the learned model can be interpreted as the probability that the given code snippet is incorrect. That is, the model is trained to predict that positive code examples are correct and that negative code examples are incorrect.

In principle, a bug detector can be implemented by any classification technique. We use a feedforward neural network with an input layer of a size that depends on the code representation provided by the specific bug detector, a single hidden layer of size 200, and an output layer with a single element that represents the probability computed by D. We apply a dropout of 0.2 to the input layer and the hidden layer. As the loss function, we use binary cross-entropy, and we train the network with the RMSprop optimizer for 10 epochs with batch size 100.

Given a sufficiently large set of training data, the bug detector will generalize beyond the training examples, and one can query it with previously unseen code. To this end, DeepBugs extracts code snippets C_new in the same way as extracting the positive training data. For example, for a bug detector that identifies swapped function arguments, the approach extracts all function calls including their unmodified arguments. Next, DeepBugs computes the vector representation of each example c_new ∈ C_new, which yields a set V_new. Finally, we query the trained bug detector D with every v_new ∈ V_new and obtain for each code snippet a prediction of the probability that it is incorrect.
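As a sketch, the classifier described above could be set up with Keras as follows. The hidden layer size (200), dropout (0.2 on the input and hidden layers), loss (binary cross-entropy), optimizer (RMSprop), epochs (10), and batch size (100) are taken from the text; the input size n, the ReLU activation, and the random stand-in vectors are assumptions for illustration.

```python
import numpy as np
from tensorflow import keras

n = 400  # size of the code representation; depends on the bug detector

model = keras.Sequential([
    keras.layers.Dropout(0.2, input_shape=(n,)),  # dropout on the input layer
    keras.layers.Dense(200, activation="relu"),   # single hidden layer
    keras.layers.Dropout(0.2),                    # dropout on the hidden layer
    keras.layers.Dense(1, activation="sigmoid"),  # P(snippet is incorrect)
])
model.compile(optimizer="rmsprop", loss="binary_crossentropy")

# Stand-ins for the real example vectors: V_pos gets label 0, V_neg label 1.
V_pos = np.random.rand(1000, n)
V_neg = np.random.rand(1000, n)
X = np.vstack([V_pos, V_neg])
y = np.concatenate([np.zeros(len(V_pos)), np.ones(len(V_neg))])
model.fit(X, y, epochs=10, batch_size=100)

# Querying: predict bug probabilities for unseen snippets and rank them.
V_new = np.random.rand(5, n)
probabilities = model.predict(V_new).ravel()
ranking = np.argsort(-probabilities)  # most suspicious snippets first
```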
To report warnings about bugs to a developer, DeepBugs ranks all warnings by the predicted probability in descending order. In addition, one can control the overall number of warnings by omitting all warnings with a probability below a configurable threshold.

It is important to note that bug detectors built with DeepBugs do not require any heuristics or manually designed filters of warnings, as commonly used in existing name-based bug detectors [Liu et al. 2016; Pradel and Gross 2011; Rice et al. 2017]. For example, the state-of-the-art bug detector for accidentally swapped function arguments relies on a hard-coded list of function names for which swapping the arguments is expected, such as flip, transpose, or reverse [Rice et al. 2017]. Instead of hard-coding such heuristics, which is time-consuming and likely incomplete, learned name-based bug detectors infer these kinds of exceptions from the training data.

3 NAME-BASED BUG DETECTORS

This section presents three examples of name-based bug detectors built on top of the DeepBugs framework. The bug detectors address a diverse set of programming mistakes: accidentally swapped function arguments, incorrect binary operators, and incorrect operands in binary expressions. While the first bug pattern has been the target of previous work for statically typed languages [Liu et al. 2016; Pradel and Gross 2011; Rice et al. 2017], we are not aware of a name-based bug detector for the other two bug patterns. Implementing new bug detectors is straightforward, and we envision future work to create more instances of our framework, e.g., based on bug patterns mined from version histories [Brown et al. 2017; Hanam et al. 2016].

Each bug detector consists of two simple ingredients:

• Training data generator. A training data generator that traverses the code corpus and extracts positive and negative code examples for the particular bug pattern based on a code transformation (Definition 2.1). We find a simple AST-based traversal and transformation to be sufficient for all studied bug patterns.
• Code representation. A mapping of each code example into a vector that the machine learning model learns to classify as either benign or buggy (Definition 2.3). All bug detectors presented here build on the same embeddings of identifier names, allowing us to amortize the one-time effort of learning an embedding across different bug detectors.

Given these two ingredients and a corpus of training code, our framework learns a bug detector that identifies programming mistakes in previously unseen code. The remainder of this section presents three bug detectors built on top of DeepBugs.

3.1 Swapped Function Arguments

The first bug detector addresses accidentally swapped arguments. This kind of mistake can occur both in statically typed and in dynamically typed languages. In statically typed languages, it occurs for methods that accept multiple equally typed arguments. In dynamically typed languages, the problem is potentially more widespread, because all calls that pass two or more arguments are susceptible to the mistake due to the lack of static type checking. Example 1 in Table 1 shows a real-world example of this bug pattern.

Training Data Generator. To create training examples from given code, the approach traverses the AST of each file in the code corpus and visits each call site that has two or more arguments. For each such call site, the approach extracts the following information:

• The name n_callee of the called function.
• The names n_arg1 and n_arg2 of the first and second argument.
• The name n_base of the base object if the call is a method call, or an empty string otherwise.
• For arguments that are literals, the types t_arg1 and t_arg2 of the first and second argument, or empty strings if the arguments are values other than literals.
• The names n_param1 and n_param2 of the formal parameters of the called function, or empty strings if unavailable.

All names are extracted using the name function defined in Section 2.4. Type information is used only for literals because JavaScript is a dynamically typed language. The extracted types correspond to the built-in types of JavaScript (more specifically, ECMAScript 5 [ECMA 2011]): "string", e.g., for a literal "abc", "number", e.g., for a literal 23, "boolean" for the literals true and false, "RegExp" for regular expression literals, and "null" for null. In addition, we consider undefined values to have type "undefined". This information is available only for a small subset of all code locations analyzed by DeepBugs. For example, among all calls analyzed for swapped arguments, only 18% have a known type for one argument and only 4% have a known type for both arguments.

To obtain the names of the formal parameters of the called function, we resolve function calls heuristically, as sound static call resolution is non-trivial in JavaScript. If either n_callee, n_arg1, or n_arg2 is unavailable, e.g., because the name function cannot extract the name of a complex expression, then the approach ignores the call site.

From the extracted information, the training data generator creates for each call site a positive example x_pos = (n_base, n_callee, n_arg1, n_arg2, t_arg1, t_arg2, n_param1, n_param2) and a negative example x_neg = (n_base, n_callee, n_arg2, n_arg1, t_arg2, t_arg1, n_param1, n_param2). That is, to create the negative example, we simply swap the arguments w.r.t. the order in the original code.

Code representation. To enable DeepBugs to learn from the positive and negative examples, we transform x_pos and x_neg from tuples of strings into vectors. To this end, the approach represents each string in the tuple x_pos or x_neg as a vector. Each name n is represented as E(n), where E is the learned embedding from Section 2.3. To represent type names as vectors, we define a function T that maps each built-in type in JavaScript to a randomly chosen binary vector of length 5. For example, the type "string" may be represented by a vector T(string) = [0, 1, 1, 0, 0], whereas the type "number" may be represented by a vector T(number) = [1, 0, 1, 1, 0]. Finally, based on the vector representation of each element in the tuple x_pos or x_neg, we compute the code representation for x_pos or x_neg as the concatenation of the individual vectors.

3.2 Wrong Binary Operator

The next two bug detectors address mistakes related to binary operations. At first, we consider code that accidentally uses the wrong binary operator, e.g., i