An Evaluation of the Strategies of Sorting, Filtering, and Grouping API Methods for Code Completion

Daqing Hou
Electrical and Computer Engineering
Clarkson University, Potsdam, NY USA 13699
[email protected]

David M. Pletcher
Computer Science
Clarkson University, Potsdam, NY USA 13699
[email protected]

Abstract—Code Completion is one of the most popular IDE features for accessing APIs, freeing programmers from remembering specific details about an API and reducing keystrokes. We propose three ways to enhance the current code-completion systems to work more effectively with large APIs. First, we propose two methods for sorting APIs, by type hierarchy and by use count, and show that their use significantly reduces the number of API proposals a user must navigate while using Code Completion. Second, we show that context-specific filtering of inappropriate proposals can also reduce the number of proposals a user must navigate. Third, we propose to group API proposals by their functional roles, which can help maintain a well-ordered, meaningful list of API proposals in the presence of dynamic reordering. These functionalities are grouped into a research prototype, BCC (Better Code Completion). We evaluated fourteen configurations of BCC by simulating Code Completion nearly three million times on nine open-source Java projects that utilize AWT/Swing.

Keywords: Object-oriented Programming, Programming Environments, Code Completion.

I. INTRODUCTION

Software frameworks and libraries have clearly contributed substantially to programmer productivity and software quality. But this kind of reuse has also created a massive amount of information and artifacts that programmers have to deal with. For example, the Standard Edition of the Java Development Kit version 1.6 contains 3,777 classes and interfaces. Modern programmers must master the necessary skills to effectively work with this vast amount of information and artifacts and to overcome learning barriers [1]. This is far from a trivial task, and many programmers rely on tools and features such as IDEs (Integrated Development Environments) to help deal with the proliferation of APIs (Application Programming Interfaces).

One such standard feature in IDEs is Code Completion. Code Completion regularly saves IDE users time when they work with hard-to-remember or unfamiliar APIs. Since Code Completion is an interface through which programmers view and access APIs on a daily basis, it is important to design it to be highly usable. The usability issue is especially acute when the number of APIs to be viewed by a programmer becomes too large (say, more than twenty). In fact, many classes in AWT/Swing commonly contain more than 300 methods. In such circumstances, it is beneficial for a programmer to see the needed API as early as possible.

BCC (Better Code Completion) is a research prototype that demonstrates the feasibility of several ideas for improving Code Completion systems. In this paper, we report on an evaluation of BCC. Specifically, we show that ranking APIs based on type hierarchy and use counts can reduce the number of API proposals that a programmer has to browse before selecting the one that she or he needs. Based on the current coding context, BCC also filters out APIs irrelevant to the current task. Finally, BCC presents related APIs together as a group, potentially increasing API learnability.

The contributions of this paper include
1) The design and implementation of a set of new strategies for organizing API proposals in Code Completion, including type-based sorting, filtering, and grouping (see Section III);
2) A thorough evaluation and analysis of fourteen configurations of these strategies and the use-count-based sorting, using nine open-source, small to large frameworks/applications that use an industrial framework (AWT/Swing) (Section IV);
3) A set of design recommendations for future Code Completion systems (see Section IV-E).

BCC can be downloaded from the Internet.¹

II. BACKGROUND

Code Completion displays in a popup pane a list of class members that can be accessed from or invoked upon a specified “receiver” expression. Typically, hitting “.” after an identifier, expression, or the “this” or “super” keywords invokes the code-completion process. During this process, the Code Completion engine computes a list of completion proposals, which are either method invocations or field accesses that could be used to complete the current Java expression.
¹ www.clarkson.edu/~dhou/projects/BCC (All URLs verified 4/1/2011)

(a) Sorting alphabetically all completion proposals available for the receiver.
(b) Moving two relevant proposals to the top and sorting the remaining proposals alphabetically.
Figure 1: Eclipse Code Completion (ECC, as of version 3.4) sorts completion proposals either alphabetically (a) or by relevance (b).

If the user continues to type after the initial “.”, the list of proposals is automatically filtered such that only those whose names begin with the typed-in
letters are displayed. As shown in Figure 1, the completion proposals are listed one per line, showing the member name and argument names and types (if applicable), return type (or declared type for a field), and the first type in the type hierarchy on the path from the receiver type to java.lang.Object where the member was declared.

As of version 3.4, ECC (Eclipse’s Code Completion) has two sorting methods for completion proposals. The first, as shown in Figure 1a, is to sort the completion proposals by member name, in alphabetical order. This is usually ineffective for objects with large APIs because it ignores the level in the hierarchy where the member is defined. The second method is sorting by relevance. The completion proposal computer assigns each proposal a relevance score during the computation process. However, relevance only comes into play when the expected return type of the completion invocation target is non-trivial. For instance, if the completion process is invoked on the right side of an assignment statement, the type expected on the left side of the assignment statement is used to boost the relevance scores of proposals that return or resolve to that type. Proposals that are not relevant in this sense fall back to alphabetical order. An example of sorting by relevance is shown in Figure 1b.

With Code Completion, programmers are freed from having to remember all the specific details about each API. Instead, they can rely on Code Completion as a just-in-time reminder to help recall and access these details only as they are needed. For example, for Java 2D graphics, a programmer may need to remember only generally that java.awt.Graphics2D contains various APIs for painting, such as drawOval() and drawString(), but not the exact names, parameters, or the detailed semantics for each API. These details can be discovered using Code Completion on the fly.
In particular, Code Completion allows a programmer to browse and choose the relevant APIs, and to access the associated documentation, all in the context where the programmer is actively coding. Thus, Code Completion helps programmers avoid switching work contexts and the ensuing interruptions to their train of thought. In this way, Code Completion supports programmers in best utilizing their brain power so that they can focus on more important information, handle larger problems, and work more effectively. Over time, Code Completion helps a programmer incrementally learn the used APIs.

Two typical scenarios can be identified when a programmer uses Code Completion to complete an API usage expression. In the first scenario, the programmer may already know the exact spelling, or a good portion of the prefix, of the API name to be used. In this case, she or he can just type the name or prefix as usual without major interruption from Code Completion. In the second scenario, the programmer may know only a receiver expression and have a rough idea of what is to be achieved with the receiver. In this case, she or he may rely on Code Completion as a quick, within-context alternative for searching and browsing documentation, as shown by the JavaDoc in the right portion of Figure 1a. In this paper, we target mainly the second scenario for optimizing Code Completion.

III. BCC: BETTER CODE COMPLETION

At the core of any type-based Code Completion system is an engine capable of computing the static type of a receiver expression that a programmer is currently working on. All APIs from this type and its supertypes that are available for use in the current coding context, which are also known as completion proposals, are presented to the programmer in a popup pane in a certain order. The presentation of the popup pane can be customized by filtering and sorting the list of completion proposals in various ways. Other than the two sorting methods introduced in Section II, the default ECC offers limited options for user customization.

(a) One version of BCC (type) sorts proposals by position of declaration type in type hierarchy and then alphabetically within type.
(b) One version of BCC (ranking) sorts proposals by popularity.
Figure 2: BCC sorts proposals by the type hierarchy (a) or by popularity ranking (b).

The goal for BCC is to provide programmers
with more options for controlling how code completion proposals can be grouped, filtered, and sorted. In this section, we describe the main features of BCC before presenting its design and implementation in Section III-D.

A. Sorting APIs

BCC provides two more options for sorting completion proposals than ECC.

Type-hierarchy-based sorting. With this strategy, BCC sorts the list of completion proposals in the order from the declared type to the root in the type hierarchy (java.lang.Object). An example of this behavior is shown in Figure 2a, where methods from JButton are shown before AbstractButton, unlike the default Code Completion ECC shown in Figure 1, which mixes up the methods without considering their relative positions in the hierarchy.

Popularity-based sorting. The use counts (aka, frequencies, or popularity) by which APIs have been invoked statically in source code can be used to sort APIs.² The more frequently an API is used, the earlier it should appear in the popup pane. An example of this behavior is shown in Figure 2b, where two methods are shown at the top of the popup pane before all the other methods in JButton and AbstractButton. This is because these two methods were used and BCC sorted them according to their use counts.

² The scope by which API calls are counted can be a design variable.

B. Filtering APIs

BCC allows users to define context-sensitive filters to filter out completion proposals that are deemed certainly irrelevant in the current coding context. The following are three scenarios where API filters can be applied.

Private Filter. The class javax.swing.JComponent contains the public method updateUI(). This method is made public because the Swing classes that invoke it are located outside of the package javax.swing. Although public, this method is not intended to be directly accessed by client code. That is, it is not a client API. BCC can be configured to filter out such non-API public methods.

Private Filter With Receiver Exceptions. Single inheritance in Java has been shown to generally lead to cleaner design, but it can have negative consequences too. One example is that class javax.swing.JComponent inherits from java.awt.Container. The designers probably made this decision to avoid the code duplication that would result from having each “true container” class redefine the exact same Container methods. However, this means that non-container classes like JLabel also inherit from Container, even though JLabel is probably never intended to be a container. A filter is needed to make such methods inaccessible for some receiver types (e.g., JLabel), but accessible for certain other receiver types (e.g., JPanel).

Subclass Only. The public paint() method in class java.awt.Component should generally be called by an instance’s own repaint() method, or from within a subclass of Component that wishes to extend paint(). With BCC, paint() can be made accessible only from within subclasses of Component.

C. Grouping APIs

BCC allows programmers to manually specify that a set of methods belong to the same group.³ These methods will be displayed together in the Code Completion popup pane. For example, the add()/getComponent()/remove() methods in java.awt.Container can be grouped together as they control how child components are added to, and removed from, a container. A group can span multiple classes in the same type hierarchy. We hypothesize that the grouping mechanism makes it more effective for programmers to learn and use APIs, because it can be used to present APIs logically.

³ The effort of specifying filters and groups is low, provided that the specifier understands the semantics of the API. Once filters and groups are defined for each API, they can be used by all future projects.
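As a concrete illustration of the filter and group specifications described in Sections III-B and III-C, the following minimal Java sketch encodes the three filter scenarios and one group as plain data. It is only a sketch under stated assumptions: the class `ApiSpecSketch`, the `FilterKind` enum, the group name `children`, and the `isProposed` method are hypothetical names introduced here, not BCC's actual data structures, and receiver matching is simplified to exact type names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: hypothetical types, not BCC's implementation.
final class ApiSpecSketch {
    // One constant per filter scenario in Section III-B.
    enum FilterKind { PRIVATE, PRIVATE_WITH_RECEIVER_EXCEPTIONS, SUBCLASS_ONLY }

    static final class Filter {
        final FilterKind kind;
        // Receiver types that may still see the method
        // (consulted only for PRIVATE_WITH_RECEIVER_EXCEPTIONS).
        final Set<String> allowedReceivers;

        Filter(FilterKind kind, String... allowedReceivers) {
            this.kind = kind;
            this.allowedReceivers = new HashSet<>(Arrays.asList(allowedReceivers));
        }
    }

    // Filters keyed by fully-qualified method signature.
    static final Map<String, Filter> FILTERS = new HashMap<>();
    // Groups: a (hypothetical) group name mapped to its member method names.
    static final Map<String, List<String>> GROUPS = new HashMap<>();

    static {
        FILTERS.put("javax.swing.JComponent.updateUI()",
                new Filter(FilterKind.PRIVATE));
        FILTERS.put("java.awt.Container.add(java.awt.Component)",
                new Filter(FilterKind.PRIVATE_WITH_RECEIVER_EXCEPTIONS, "javax.swing.JPanel"));
        FILTERS.put("java.awt.Component.paint(java.awt.Graphics)",
                new Filter(FilterKind.SUBCLASS_ONLY));
        GROUPS.put("children", Arrays.asList("add", "getComponent", "remove"));
    }

    /** Returns true if the method should still be proposed in the given coding context. */
    static boolean isProposed(String qualifiedSignature, String receiverType,
                              boolean contextIsSubclassOfDeclaringType) {
        Filter f = FILTERS.get(qualifiedSignature);
        if (f == null) {
            return true; // no filter defined for this method
        }
        switch (f.kind) {
            case PRIVATE:
                return false;
            case PRIVATE_WITH_RECEIVER_EXCEPTIONS:
                return f.allowedReceivers.contains(receiverType);
            case SUBCLASS_ONLY:
                return contextIsSubclassOfDeclaringType;
            default:
                return true;
        }
    }
}
```

With these sample entries, `add` is hidden when the receiver is a `JLabel` but kept for a `JPanel`, and `paint` is proposed only when the code being edited is inside a subclass of `Component`. A fuller version would match receiver subtypes as well as exact types.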
D. BCC Design and Implementation

Algorithm 1: The BCC Code Completion process
  input  : Document cu
  input  : int pos
  effects: Performs Code Completion.
  1  (context, receiver) = dietParse(cu, pos);
  2  listOfProposals = calc(receiver);
  3  filter(listOfProposals, context, receiver);
  4  updateGroups(listOfProposals);
  5  sort(listOfProposals, BCCCompare);
  6  selectedProposal = display(listOfProposals);
  7  incrementUseCount(selectedProposal);

BCC is implemented as an Eclipse plugin. The BCC implementation for the most part is a straightforward customization of the modular design of ECC. Thus we will focus mainly on the key design and implementation issues.

The BCC Code Completion process is depicted in Algorithm 1. Lines 2 and 6, calculating and displaying the set of applicable proposals, respectively, reuse ECC’s implementation. The other lines are new to BCC. In particular, the dietParse() algorithm on line 1 is used to obtain the class where the current Code Completion is happening (context), and the receiver type (receiver). These two types are needed to filter APIs on line 3. Line 4 updates the API groups according to the content of listOfProposals. Line 5 sorts the list of proposals using a new comparator, BCCCompare() (Algorithm 2). Line 6 displays the proposals to the user, and returns the proposal that the user selects. Line 7 increments the use count for the selected proposal.

BCC needs additional data structures to keep track of the use count for each visited completion proposal as well as the composition of a group. It models a completion proposal using its declaration type and its signature. For a group, BCC models the member completion proposals as well as the largest use count and the most specific type for the group. When a proposal that is part of a group is compared for sorting, the type and the use count of its group are used. The type is needed because a group may be defined in terms of methods in higher level classes in the type hierarchy. If a completion proposal method overrides such a method, the overriding type will be used if it is more specific than the current group type. This is done at line 4 in Algorithm 1.

BCCCompare(), listed in Algorithm 2, compares two proposals for sorting. It takes two proposals, proposal0 and proposal1, and returns -1 if proposal0 < proposal1, 1 if proposal0 > proposal1, and 0 otherwise. The composite sorting key is made of use count, type, and signature, in descending order of priority.
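The composite key described above can be sketched in Java as follows. This is a hedged illustration rather than BCC's code: the classes `Proposal` and `Group` are hypothetical, the subtype test is approximated by an integer `depth` in the type hierarchy (larger means more specific, which assumes all declaring types lie on one inheritance chain), and a group's name is assumed to stand in for its signature. Members of the same group are ordered by their own signatures, which is one reading of the paper's prose.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: hypothetical types, not BCC's implementation.
final class BccCompareSketch {
    static final class Group {
        final String name;   // assumed to act as the group's sort signature
        final int useCount;  // largest use count among the group's members
        final int depth;     // depth of the group's most specific type
        Group(String name, int useCount, int depth) {
            this.name = name;
            this.useCount = useCount;
            this.depth = depth;
        }
    }

    static final class Proposal {
        final String signature;
        final int useCount;
        final int depth;     // depth of the declaring type; larger = more specific
        final Group group;   // null when the proposal belongs to no group
        Proposal(String signature, int useCount, int depth, Group group) {
            this.signature = signature;
            this.useCount = useCount;
            this.depth = depth;
            this.group = group;
        }
    }

    /** Orders by use count (descending), then type specificity, then signature. */
    static int compare(Proposal p0, Proposal p1) {
        // Members of one group stay adjacent and are ordered alphabetically.
        if (p0.group != null && p0.group == p1.group) {
            return p0.signature.compareTo(p1.signature);
        }
        // A grouped proposal is compared using its group's key.
        int use0 = p0.group != null ? p0.group.useCount : p0.useCount;
        int use1 = p1.group != null ? p1.group.useCount : p1.useCount;
        int depth0 = p0.group != null ? p0.group.depth : p0.depth;
        int depth1 = p1.group != null ? p1.group.depth : p1.depth;
        String sig0 = p0.group != null ? p0.group.name : p0.signature;
        String sig1 = p1.group != null ? p1.group.name : p1.signature;

        if (use0 != use1) {
            return use0 > use1 ? -1 : 1;       // more frequently used first
        }
        if (depth0 != depth1) {
            return depth0 > depth1 ? -1 : 1;   // more specific type first
        }
        return sig0.compareTo(sig1);           // alphabetical tie-break
    }

    /** Returns the signatures of the given proposals in display order. */
    static List<String> sortedSignatures(List<Proposal> proposals) {
        List<Proposal> copy = new ArrayList<>(proposals);
        copy.sort(BccCompareSketch::compare);
        List<String> out = new ArrayList<>();
        for (Proposal p : copy) {
            out.add(p.signature);
        }
        return out;
    }
}
```

Replacing the integer depth with a real subtype check, as `isSubtype()` does in Algorithm 2, would be needed once interfaces or unrelated declaring types enter the proposal list.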
Algorithm 2: BCCCompare(proposal0, proposal1)
  use0 = proposal0.useCount;
  use1 = proposal1.useCount;
  type0 = proposal0.type;
  type1 = proposal1.type;
  signature0 = proposal0.signature;
  signature1 = proposal1.signature;
  group0 = getGroup(proposal0);
  group1 = getGroup(proposal1);
  if group0 exists then
    use0 = group0.useCount;
    type0 = group0.type;
    signature0 = group0.signature;
  end
  if group1 exists then
    use1 = group1.useCount;
    type1 = group1.type;
    signature1 = group1.signature;
  end
  if use0 > use1 then return -1;
  if use0 < use1 then return 1;
  if isSubtype(type0, type1) then return -1;
  if isSubtype(type1, type0) then return 1;
  return AlphaCompare(signature0, signature1);

When a proposal has a group, the use count, type, and signature of the group, instead of the proposal itself, will be used for comparison. Two proposals in the same group
are compared alphabetically. The predicate isSubtype() returns true if the first parameter is a proper subtype of the second. The comparator AlphaCompare() compares two signatures alphabetically. The evaluation in Section IV was based on customizing BCCCompare() to produce the various configurations for BCC.

Algorithm 3: filter(listOfProposals, context, receiver)
  input  : listOfProposals
  input  : context
  input  : receiver
  effects: Filter completion proposals according to two sets of filtering rules, deny and allow.
  forall proposal in listOfProposals do
    qsignature = proposal.type + proposal.signature;
    if deny.contains(qsignature) then
      denyRules = deny.get(qsignature);
      forall (contextR, receiverR) in denyRules do
        if compatible(receiver, receiverR) && compatible(context, contextR) then
          remove proposal from listOfProposals;
          break;
        end
      end
    end
    if allow.contains(qsignature) then
      allowed = false;
      allowRules = allow.get(qsignature);
      forall (contextR, receiverR) in allowRules do
        if compatible(receiver, receiverR) && compatible(context, contextR) then
          allowed = true;
          break;
        end
      end
      if !allowed then
        remove proposal from listOfProposals;
      end
    end
  end

BCC’s filter() is listed in Algorithm 3. The predicate compatible() returns true if the first parameter is a member of, or an exact match of, the type(s) specified by the second parameter. Internally, BCC supports the three API filtering scenarios described in Section III-B with two sets of rules, which are specified through two maps deny and allow, both of the type Map. A method is represented by a fully-qualified signature (QSignature). It may be either explicitly prohibited (deny), or only allowed (allow), within a certain kind of classes (Context) and with a certain kind of receiver types (Target). BCC supports two kinds of contexts, sub (subtypes of a given type) and any classes; and four kinds of targets, a class, sub, this and super, and any (for any receiver type when this and super are not used). To illustrate how filtering works, consider the sample rules for add() and paint() below, where the fully-qualified name
for add() is java.awt.Container.add(java.awt.Component).

allow:
  QSignature for add => { , ...    (1) (2)
deny:
  paint => {}    (3)
  #add => {}     (4)
  ...

The rule at (1) says that it is allowed to invoke add on a JPanel anywhere. The rule at (2) says that it is allowed to invoke add in any subclass of Container and when the receiver is super. Rule (3) says that the paint method must
not be invoked anywhere. Rule (4) can serve as an alternative to rules (1) and (2) for JButton. It specifies that it is not allowed to invoke add on a JButton, anywhere.

IV. EVALUATION

The overall objective of our evaluation is to empirically answer two research questions about BCC:
1) Do BCC’s strategies outlined in Section III actually lead to significant improvement over the default ECC?
2) Which combination(s) of BCC’s strategies perform the best under different usage scenarios?

The evaluation process is described in Section IV-A. Our first question is answered by the evaluation results presented in Sections IV-B through IV-D. We answer the second question by offering design recommendations (Section IV-E).

A. Evaluation Metrics and Test Cases

We decided to evaluate BCC against AWT/Swing. Table I depicts the nine projects used in our evaluation.⁴ For each project, we gathered the number of AWT/Swing API calls as well as the number of distinct APIs that are called. Note that the total for the unique APIs (counted by method signatures) is not the sum of the per-project counts because the projects overlap.

Table I: The nine Java projects used in BCC evaluation.
Projects | #API Calls | #Unique Calls
jEdit | 5,339 | 605
JIDE-Common | 11,032 | 805
LAPIS | 1,944 | 460
OpenSwing | 6,865 | 578
SweetHome3D | 4,368 | 542
Swing Tutorial Examples | 2,629 | 455
SwingX | 8,233 | 951
Zeus | 1,036 | 196
NetBeans | 151,567 | 2,039
total | 193,013 | 2,685

The projects include both GUI frameworks and applications that extend or use AWT/Swing. In particular, JIDE-Common⁵, OpenSwing⁶, SwingX⁷, and Zeus⁸ are UI frameworks or Java Swing component libraries that extend AWT/Swing with new components and functionality. jEdit (a text editor)⁹, LAPIS (a text editor and web browser)¹⁰, SweetHome3D (an interior design tool)¹¹, Sun’s Swing Tutorial Examples¹², and NetBeans¹³ are applications that make use of Swing and AWT.

⁴ These projects were chosen independently of the filters and groups used.
⁵ http://www.jidesoft.com/products/oss.htm
⁶ http://oswing.sourceforge.net/
⁷ https://swingx.dev.java.net/
⁸ http://sourceforge.net/projects/zeus-jscl/
⁹ http://www.jedit.org/
¹⁰ http://groups.csail.mit.edu/uid/lapis/
¹¹ http://www.sweethome3d.eu/index.jsp
¹² http://download.oracle.com/javase/tutorial/uiswing/examples/components
¹³ http://netbeans.org/

We use the rank of an API in the popup pane as the evaluation metric, with a value of 0 for the top API, 1 for
Table II: Evaluating two ECC strategies (Columns 2 and 3) and seven BCC configurations (Columns 4 through 10) for rank reduction over nine Java projects. “Ranking” represents the popularity-based sorting strategy, and “type” represents the type-hierarchy-based sorting.
Projects | ECC alphabetical | ECC by-relevance | filtering | type | type + filtering | ranking | ranking + filtering | ranking + type | ranking + type + filtering
jEdit | 619,294 | 469,516 | 366,537 | 241,506 | 211,745 | 104,375 | 84,436 | 72,549 | 61,102
JIDE-Common | 824,731 | 568,139 | 457,608 | 399,099 | 344,017 | 163,536 | 135,446 | 133,308 | 114,654
LAPIS | 182,356 | 132,981 | 101,615 | 54,980 | 48,175 | 49,037 | 38,494 | 27,831 | 23,895
OpenSwing | 884,303 | 711,841 | 546,627 | 355,341 | 292,191 | 158,490 | 129,528 | 115,456 | 98,952
SweetHome3D | 370,844 | 286,319 | 224,531 | 123,051 | 105,494 | 75,136 | 58,720 | 47,438 | 39,157
Swing Tutorials | 395,301 | 340,554 | 262,921 | 117,792 | 103,546 | 71,544 | 57,918 | 35,188 | 31,136
SwingX | 698,256 | 522,218 | 404,729 | 301,722 | 237,940 | 164,982 | 135,875 | 123,036 | 105,812
Zeus | 174,136 | 141,989 | 110,504 | 65,154 | 55,774 | 34,844 | 27,535 | 16,556 | 14,137
NetBeans | 23,924,204 | 19,706,197 | 14,997,097 | 7,165,706 | 6,205,877 | 2,324,187 | 1,991,124 | 2,218,022 | 1,924,340
Total | 28,073,425 | 22,879,754 | 17,472,169 | 8,824,351 | 7,604,759 | 3,145,131 | 2,659,076 | 2,789,384 | 2,413,185
%reduction w.r.t alphabetical | | 18.50% | 37.76% | 68.57% | 72.91% | 88.80% | 90.53% | 90.06% | 91.40%
%reduction w.r.t relevance | | | 23.63% | 61.43% | 66.76% | 86.25% | 88.38% | 87.81% | 89.45%
Table III: Evaluating nine dynamic primers for rank reduction. The last row shows the results of comparing the regular dynamic ranking (Column 2) with each of the nine dynamic primers (Columns 3 through 11). A positive W means that the regular dynamic ranking is better than the corresponding dynamic primer; “S” means “statistically Significant” at the corresponding level of α beside “S”, and “I” “statistically Insignificant”. In each row, the largest and smallest ranks for a project are marked with > and ⊥, respectively. The comparison uses the directional Wilcoxon Signed-Rank Test, in terms of the percentages of rank reduction over ECC’s by-relevance. Rows are the projects under test; columns are the primer projects, and a project is not tested against itself (-).
Projects | regular dynamic ranking | NetBeans | jEdit | JIDE-Common | LAPIS | OpenSwing | SweetHome3D | Swing Tutorials | SwingX | Zeus
jEdit | 61,102 | 72,783 | - | 106,427> | 65,991 | 74,310 | 70,582 | 59,689⊥ | 87,768 | 61,589
JIDE-Common | 114,654 | 134,686> | 113,666 | - | 116,174 | 123,977 | 118,086 | 113,233⊥ | 127,851 | 113,516
LAPIS | 23,895 | 33,635 | 26,071 | 44,323> | - | 30,104 | 27,999 | 22,981⊥ | 34,962 | 25,503
OpenSwing | 98,952 | 132,236 | 105,105 | 165,231> | 104,767 | - | 114,748 | 100,982 | 137,699 | 100,897⊥
SweetHome3D | 39,157 | 71,795 | 49,445 | 79,697> | 46,204 | 62,070 | - | 41,527⊥ | 64,600 | 42,986
Swing Tutorials | 31,136 | 72,390 | 43,470 | 82,152> | 38,894 | 54,103 | 41,728 | - | 59,321 | 34,098⊥
SwingX | 105,812 | 116,481 | 108,569 | 140,937> | 109,399 | 115,697 | 109,397 | 100,006⊥ | - | 105,343
Zeus | 14,137 | 29,737 | 18,329 | 33,624> | 17,942 | 20,834 | 20,315 | 14,970⊥ | 25,043 | -
NetBeans | 1,924,340 | - | 1,908,993 | 2,132,046> | 1,903,556 | 1,984,466 | 1,920,695 | 1,901,194 | 2,022,430 | 1,892,942⊥
Wilcoxon (N=8) | | W=36, S 0.005 | W=30, S 0.025 | W=36, S 0.005 | W=34, S 0.01 | W=36, S 0.005 | W=34, S 0.01 | W=6, I | W=36, S 0.005 | W=20, I

the second API, and so on. The total rank for a project is the sum of the ranks for all of the API calls in the project.

We configured BCC to produce fourteen different combinations of strategies. For each configuration, we ran it over each project to gather the total rank for that project. Specifically, our tool traverses each source file in the project and invokes BCC to perform code completion on the receiver of each method call encountered. A list of completion proposals is computed for the method call. The rank of the method to be called in that list is added to the total rank for the project. Our evaluation was run on Java 1.6, invoking the code completion engine nearly three million times.

For each tested project, a project-wide total rank was also computed for ECC’s by-relevance strategy. We then computed the percentages of rank reduction from using
the BCC configuration over ECC’s by-relevance strategy. A strategy is considered better if it produces statistically significantly higher percentages of reduction, or equivalently, smaller project-wide total API ranks, over all the projects. All comparisons are performed using the directional Wilcoxon Signed-Rank Test, in terms of the percentage of rank reduction.

B. Evaluation 1: Rank Reduction with Sorting and Filtering

We propose three strategies to reduce API ranks in the popup pane: the popularity-based (or use-count-based) sorting, the type-hierarchy-based sorting, and filtering. In this section, we evaluate all seven combinations of these three strategies and compare them with ECC’s two strategies in terms of rank reduction.

Table II shows the result of this evaluation. The second and third columns contain the rank results for ECC’s alphabetical and by-relevance order. The next seven columns are data for all seven possible configurations with BCC’s two sorting methods and filtering. The bottom three rows show the total ranks that each configuration produces for all nine projects, as well as the percentages of rank reduction that each configuration achieves over the two ECC strategies.

Remarkably, in all of the rows in Table II, from left to right, the ranks decrease monotonically, with only two pairs of exceptions highlighted in bold. The first pair appears in columns type + filtering and ranking for LAPIS, and the second pair in columns ranking + filtering and ranking + type for NetBeans. To test whether the left column performs significantly worse than the right for the two pairs of columns involved above, we performed two directional Wilcoxon Signed-Rank Tests. The first test, from type + filtering to ranking, shows that the type + filtering configuration is statistically significantly worse than ranking in terms of percentages of rank reduction over ECC’s by-relevance (W=-43, N=9, significant at α = 0.005). The second test, from ranking + filtering to ranking + type, shows that ranking + type is better than ranking + filtering (W=-41, N=9, significant at α = 0.01).

Therefore, we conclude that the following order would hold for the nine Code Completion configurations in terms of the percentages of project-wide rank reduction over ECC’s by-relevance strategy: ECC’s alphabetical < ECC’s by-relevance < filtering < type < type + filtering < ranking < ranking + filtering < ranking + type < ranking + type + filtering. This shows that all of BCC’s seven configurations in Table II significantly improve over ECC’s alphabetical and by-relevance order in terms of project-wide total rank reduction.
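The rank metric of Section IV-A and the percentage rows of Table II follow from simple arithmetic, which the short Java sketch below reproduces. The class and method names (`RankMetricSketch`, `rankOf`, `percentReduction`) are hypothetical and introduced only for illustration; the totals used in the usage note are taken from Table II.

```java
import java.util.List;

// Illustrative only: a hypothetical helper, not part of BCC.
final class RankMetricSketch {
    /** Zero-based position of the selected API in the popup list (0 = top). */
    static int rankOf(List<String> proposals, String selected) {
        return proposals.indexOf(selected);
    }

    /** Percentage by which configTotal reduces the baseline's total rank. */
    static double percentReduction(long baselineTotal, long configTotal) {
        return 100.0 * (baselineTotal - configTotal) / baselineTotal;
    }
}
```

For example, `percentReduction(22_879_754L, 2_413_185L)` gives about 89.45 and `percentReduction(28_073_425L, 2_413_185L)` gives about 91.40, matching the two percentages reported for the ranking + type + filtering column.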
While ranking (the popularity-based sorting) outperforms both filtering and type (the type-hierarchy-based sorting) individually, it is also clear that both the type-hierarchybased sorting and filtering add additional, unique values to the popularity-based sorting and cannot be completely subsumed by the latter. In particular, Table II shows that the combination of the type-hierarchy-based sorting, filtering, and dynamic ranking yields the highest percentage of total rank reduction for all projects (the right most column, 91.40% and 89.45%, respectively). It also does so consistently across all projects. Therefore, this configuration is a highly promising candidate for improving Code Completion. We call this configuration regular dynamic ranking. The comparison of Columns ranking + filtering and ranking + type in Table II also underscores the importance of applying the right statistical methods. The highlighted percentages for total rank reduction (88.38% and 87.81%, respectively) might mislead some to conclude that ranking + filtering is better than ranking + type. A paired comparison between these two columns shows that ranking + filtering is better than ranking + type only for the NetBeans project.
For the other eight projects, it is always worse than ranking + type. As shown, this difference is statistically significant.

C. Evaluation 2: Further Rank Reduction with Primers

An important conclusion in Section IV-B is that the regular dynamic ranking is the most effective configuration in rank reduction. However, when a programmer first starts to use this configuration, the use counts for all APIs will be zero and, thus, the API popularity is not leveraged. A potential improvement could be using the API use counts from a “representative” project as a primer for the regular dynamic ranking configuration.

With the dynamic primer configuration, a project is first selected as the primer project.¹⁴ BCC then counts the API use in the primer project. During the initial actual code completion, BCC uses the API use counts from the primer to rank completion proposals. Every time this version of Code Completion is used, the use count for the selected API will be incremented by one. Thus, the order of APIs would change dynamically as they are called with Code Completion. That is why it is called dynamic.

To understand the effects of a primer and investigate what makes a “representative” project and an ideal primer, we evaluated two ways of priming the regular dynamic ranking: with a dynamic primer, and with an adjusted primer. Because we did not know which projects would make a good primer, we applied each of them as a primer and tested it against the other eight projects. For each primer and a project under testing, we collected the sum of API ranks for all relevant code completions in that project. We then compared the rank results of regular dynamic ranking with that of each primer, using the directional Wilcoxon Signed-Rank Test. Table III depicts the rank data and the results of the statistical testing.
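The W values reported in the tables are sums of signed ranks over paired per-project observations. A minimal Java sketch of that statistic is given below, assuming the common textbook conventions that zero differences are dropped and tied absolute differences share their average rank; the paper does not state which conventions it used, and the class name `SignedRankSketch` is hypothetical. The sketch computes only W, not the significance level.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative only: a hypothetical helper, not part of BCC.
final class SignedRankSketch {
    /** Returns W, the sum over pairs of sign(d) * rank(|d|) with d = a[i] - b[i]. */
    static double signedRankSum(double[] a, double[] b) {
        List<Double> diffs = new ArrayList<>();
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            if (d != 0.0) {
                diffs.add(d);                  // zero differences are dropped
            }
        }
        // Sort by absolute difference so that list position determines the rank.
        diffs.sort(Comparator.comparingDouble((Double d) -> Math.abs(d)));

        double w = 0.0;
        int i = 0;
        while (i < diffs.size()) {
            int j = i;
            // Extend j over a run of tied absolute differences.
            while (j + 1 < diffs.size()
                    && Math.abs(diffs.get(j + 1)) == Math.abs(diffs.get(i))) {
                j++;
            }
            double averageRank = ((i + 1) + (j + 1)) / 2.0; // ranks are 1-based
            for (int k = i; k <= j; k++) {
                w += Math.signum(diffs.get(k)) * averageRank;
            }
            i = j + 1;
        }
        return w;
    }
}
```

As a sanity check against Table III, every project tested with the NetBeans primer has a larger total rank than under regular dynamic ranking, so all eight differences share one sign and W reaches its maximum magnitude for N=8, namely 1 + 2 + ... + 8 = 36, as reported. The paper applies the test to percentages of rank reduction rather than to raw totals; the sign pattern is the same in this case.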
Perhaps a little surprisingly, Table III shows that seven of the nine projects are not good primers, in the sense that, when used as a primer, they perform statistically significantly worse than the regular dynamic ranking (see the last row of Table III). The other two projects also occasionally perform worse than the regular dynamic ranking, but the differences are not significant. In short, the dynamic primers failed to improve rank reduction beyond regular dynamic ranking.

We hypothesized that the failure was due to the way a dynamic primer exploits use counts in ranking APIs. Ranking the APIs directly by the use counts from the primer made the API order rigid and less sensitive to the way in which APIs were actually used in the project under test. The adjusted primers were designed to test this hypothesis. Instead of using the raw use counts to rank APIs, we used a normalized floating-point value between zero and one, obtained by dividing each use count by the largest one. In this way, APIs are initially still ranked by their popularity in the primer project.

Footnote 14: For example, a project to be maintained can be used as the primer.
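The difference between seeding with raw counts and seeding with normalized counts can be sketched as follows. This is an illustration only, not BCC's implementation: the class name PrimedRanker, the method names, and the primer counts are hypothetical, and we assume that the score of an API is its seed value plus the number of times it has been selected in the current project.

```python
from collections import Counter


class PrimedRanker:
    """Ranks API proposals by use count, seeded from a primer project."""

    def __init__(self, primer_counts, normalize):
        top = max(primer_counts.values(), default=0)
        if normalize and top > 0:
            # Adjusted primer: divide each count by the largest one,
            # so every seed value lies between zero and one.
            self.seed = {api: n / top for api, n in primer_counts.items()}
        else:
            # Dynamic primer: use the raw counts from the primer project.
            self.seed = dict(primer_counts)
        self.local = Counter()  # selections made in the current project

    def score(self, api):
        # Assumption of this sketch: seed value plus local selections.
        return self.seed.get(api, 0) + self.local[api]

    def rank(self, proposals):
        # Highest score first; sorted() is stable, so ties keep input order.
        return sorted(proposals, key=lambda api: -self.score(api))

    def select(self, api):
        self.local[api] += 1


# Hypothetical primer counts for three methods.
primer = {"setText": 120, "setVisible": 80, "setOpaque": 3}
proposals = ["setOpaque", "setText", "setVisible"]

dynamic = PrimedRanker(primer, normalize=False)
adjusted = PrimedRanker(primer, normalize=True)
for ranker in (dynamic, adjusted):
    ranker.select("setOpaque")
    ranker.select("setOpaque")

print(dynamic.rank(proposals))   # setOpaque stays last: 5 against 120 and 80
print(adjusted.rank(proposals))  # setOpaque moves first: about 2.03 against 1.0 and 0.67
```

With raw counts, two local selections cannot overcome a large primer count, which is the rigidity hypothesized above. With normalized seeds, a single local selection already outweighs any seed value below one.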
However, this ranking may affect only the first use of a ranked API. Once an API is used, its use count is greater than or equal to one, and the use pattern of the API in the current project overrides the primer’s ranking.

Table IV shows the rank data for the nine adjusted primers. These primers are compared with the regular dynamic ranking as a baseline, again using the directional Wilcoxon Signed-Rank Test. The result of the comparison is shown in the last row of Table IV. It indicates that only the Swing Tutorials project, when used as an adjusted primer, significantly reduces the ranks over the baseline configuration (W=-28, α = 0.05); JIDE-Common and SwingX perform significantly worse, indicating that they are probably not ideal candidates for an adjusted primer.

Table IV: Evaluating the nine adjusted primers for rank reduction (Columns 3 through 11). The Swing Tutorials column, with W=-28, outperforms the regular dynamic ranking (Column 2) whereas SwingX performs worse.

Projects          regular dynamic ranking  NetBeans    jEdit       JIDE-Common   LAPIS       OpenSwing   SweetHome3D  Swing Tutorials  SwingX        Zeus
jEdit             61,102                   54,121⊥     -           62,732        60,012      59,343      63,384       60,109           62,535        60,918
JIDE-Common       114,654                  106,269⊥    111,053     -             113,173     113,085     111,790      109,871          111,698       111,349
LAPIS             23,895                   21,932      22,641      28,420        -           23,041      24,123       21,867⊥          25,546        23,242
OpenSwing         98,952                   98,070      96,708      105,736       96,473⊥     -           100,679      97,091           102,707       97,643
SweetHome3D       39,157                   39,809      38,962⊥     45,074        40,170      40,021      -            40,008           42,824        40,876
Swing Tutorials   31,136                   32,483      31,832      40,436        32,348      33,392      32,670       -                34,504        29,477⊥
SwingX            105,812                  103,732     106,391     110,317       106,355     106,854     105,095      103,444⊥         -             104,847
Zeus              14,137                   15,542      13,797      18,001        14,364      13,646      14,740       12,590⊥          15,994        -
NetBeans          1,924,340                -           1,897,764   1,888,894     1,884,137   1,903,276   1,900,418    1,904,206        1,901,101     1,882,874⊥
Wilcoxon (N=8)    -                        W=-15, I    W=-24, I    W=34, S 0.01  W=-1, I     W=-8, I     W=14, I      W=-28, S 0.05    W=28, S 0.05  W=20, I

D. Evaluation 3: Impact of Structure on Rank Reduction

While ranking APIs dynamically can push a commonly used API to the top of the popup pane, thus reducing the time it takes to select the same API next time, dynamic ranking may also cause a usability problem, because changing the order of APIs in the popup pane may confuse some users. A promising remedy is to introduce some structure to the APIs. One structural mechanism is to group APIs logically according to their functional roles and always display together APIs that belong to the same group. Another option is to replace dynamic ranking with other configurations to avoid API reordering completely. In this way, we add structure to the API space, helping keep users better oriented.

Table V: Impact of grouping on rank reduction.

Projects                        grouping    type + grouping  type + filtering + grouping  regular dynamic ranking + grouping
jEdit                           465,859     260,561          222,458                      123,129
JIDE-Common                     563,802     415,044          355,760                      177,799
LAPIS                           126,658     71,768           60,231                       36,855
OpenSwing                       660,306     408,869          330,059                      198,688
SweetHome3D                     297,438     129,158          109,782                      69,593
Swing Tutorials                 344,897     125,896          118,160                      70,928
SwingX                          500,312     313,245          250,035                      150,380
Zeus                            133,023     66,472           55,694                       21,313
NetBeans                        19,635,350  8,634,093        7,253,067                    3,629,672
%reduction w.r.t. by-relevance  0.66%       54.44%           61.73%                       80.43%

Table V shows the rank results for grouping alone (Column 2) as well as three more configurations that combine grouping with other strategies (Columns 3, 4, and 5). Comparing the three configurations for grouping in Table V with the corresponding ones without grouping in Table II (Columns 5, 6, and 11), there is only one place where a grouping strategy slightly outperforms the corresponding without-grouping strategy, namely 55,694 versus 55,774 for the Zeus project under the type + filtering configuration. With that one exception, the adoption of grouping has increased the total API ranks across all nine projects. The likely reason is that APIs in the same group are not always used with the same frequency. For example, addListener() tends to be used more often than removeListener(). When a less frequently used API such as removeListener() is pushed to the top of the popup pane along with its more popular peers, it penalizes other APIs by pushing them further down in the popup pane and increasing their ranks.

Although grouping increases the total API ranks, its advantage in improving API usability may outweigh this drawback and appeal to some users. Furthermore, the best performer for grouping (the rightmost column in Table V, regular dynamic ranking + grouping) achieves an overall rank reduction of 80.43% over ECC’s by-relevance configuration, indicating that it may be a viable choice for users who would like to benefit from additional structure in APIs. A directional Wilcoxon Signed-Rank Test between the best performer for grouping and ranking shows no significant difference at the α = 0.05 level (W=-13, N=9). Additional tests indicate that the best performer for grouping performs statistically significantly better than type + filtering and type, but worse than ranking + filtering and ranking + type.
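The effect of grouping on ranks can be made concrete with a small sketch. It is not BCC's implementation: the function name group_proposals, the group label, and the proposal list are hypothetical, and we assume that a group is displayed at the position of its best-ranked member.

```python
def group_proposals(ranked, group_of):
    """Reorder a ranked proposal list so that members of a group stay adjacent.

    Each group is placed where its best-ranked member appears (an assumption
    of this sketch); proposals without a group act as singleton groups.
    """
    blocks = {}  # group key -> members, in ranked order
    for api in ranked:
        key = group_of.get(api, ("ungrouped", api))
        blocks.setdefault(key, []).append(api)
    # Dictionaries preserve insertion order, so blocks appear in the order
    # in which each group's first member was encountered.
    return [api for members in blocks.values() for api in members]


# Hypothetical proposals, already sorted by use count (most used first).
ranked = ["addListener", "setText", "setEnabled", "removeListener"]
group_of = {"addListener": "listeners", "removeListener": "listeners"}

grouped = group_proposals(ranked, group_of)
print(grouped)
print(ranked.index("setText") + 1, grouped.index("setText") + 1)
```

In this example removeListener() is pulled up next to addListener(), and setText moves from rank 2 to rank 3. Summed over many completions, shifts of this kind are the reason why the grouped configurations have higher total ranks than their ungrouped counterparts.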
The static primer can be a choice for completely avoiding dynamic ranking. In this configuration, as with a dynamic primer, BCC also counts API calls in the primer project and uses the API use counts to rank completion proposals during actual code completion. The difference is that the API use counts are not changed dynamically as the APIs are used in the project under test. As a result, when a static primer is used, the order of APIs remains stable. We tested the static primer the same way as we tested the dynamic primer. Table VI shows the rank data for all nine static primers and for type + filtering (Column 2). All static primers perform statistically significantly worse in terms of rank reduction than the regular dynamic ranking, whose data can be found in Table IV.

Table VI: Evaluating the nine static primers for rank reduction (Columns 3 through 11). Eight of them are better than the type + filtering configuration (Column 2), as indicated by the negative W values.

Projects          type + filtering  NetBeans        jEdit           JIDE-Common  LAPIS           OpenSwing       SweetHome3D     Swing Tutorials  SwingX          Zeus
jEdit             211,745           76,743⊥         -               205,359      104,795         110,483         125,224         120,304          133,703         120,901
JIDE-Common       344,017           178,455⊥        192,288         -            225,916         209,301         225,167         203,802          192,118         260,551
LAPIS             48,175            36,583⊥         37,633          71,613       -               40,758          38,974          40,629           43,597          40,889
OpenSwing         292,191           139,224⊥        142,908         279,337      168,034         -               207,607         201,565          204,241         185,992
SweetHome3D       105,494           80,480⊥         94,315          144,704      98,004          109,108         -               95,647           101,352         94,045
Swing Tutorials   103,546           102,228         94,943          139,172      96,462          106,644         97,425          -                119,968         88,978⊥
SwingX            237,940           123,668⊥        145,943         207,184      161,815         151,335         147,853         190,049          -               183,890
Zeus              55,774            38,437          25,454⊥         57,755       33,586          27,375          33,564          28,435           35,762          -
NetBeans          6,205,877         -               3,413,571⊥      6,685,630    4,179,927       4,376,037       4,587,796       4,045,753        5,135,225       4,342,409
Wilcoxon (N=8)    -                 W=-36, S 0.005  W=-36, S 0.005  W=18, I      W=-36, S 0.005  W=-30, S 0.025  W=-36, S 0.005  W=-36, S 0.005   W=-30, S 0.025  W=-36, S 0.005

To test the potential of static primers as an alternative structuring mechanism, we compared the performance of each primer pairwise with the best grouping configuration (regular dynamic ranking + grouping) in Table V, using a directional Wilcoxon Signed-Rank Test. The result (not shown) indicates that three of the static primers, jEdit, LAPIS, and NetBeans, may perform as well as the best grouping configuration, while the other six primers perform significantly worse.

But static primers can still be useful. Unlike the regular dynamic ranking + grouping, neither the static primer nor the type + filtering configuration reorders APIs dynamically. When compared pairwise with the type + filtering data shown in Table II, eight of the static primers (all except JIDE-Common) perform significantly better than type + filtering, and the JIDE-Common primer does not perform significantly worse either. Thus, static primers are the best of the static configurations.

E. Design Recommendations

Based on our evaluation, we recommend the following design options for future Code Completion systems:
• For programmers who are familiar with the design of the APIs they are using and are using Code Completion mainly as a reminder, provide them with both the regular dynamic ranking and the adjusted primers. API providers can provide a default adjusted primer, and a user should be allowed to choose their own.
• For programmers who are less familiar with, and expect to learn, the design of the APIs they are using, provide them with the regular dynamic ranking + grouping.
• For programmers who do not feel comfortable with dynamic API reordering but instead prefer a stable API order, provide them with the static primers + grouping and/or type + filtering + grouping.
Note that the type-based sorting complements dynamic ranking nicely, as it does not require use counts in order to work.

V. RELATED WORK
The two most closely related works are [2] and [3]. Both incorporate additional knowledge (program history and API association, respectively) to help API consumers select APIs more effectively. Both try to reduce the number of APIs that a programmer has to go through before settling on the one she or he needs. However, their goal is to evaluate the precision and recall with which the recommended APIs fall in the top n for a rather small value of n, whereas we measure the project-wide rank reduction to assess the overall performance of each configuration. New contributions of BCC include type-hierarchy-based sorting, grouping, and filtering, as well as our evaluation and our design recommendations. Mined API usage patterns can be a useful aid for programmers [3], [4].

Other related tools aimed at improving coding efficiency in the code editor include keyword programming, abbreviation-based completion, and automated method completion. Keyword programming takes words that may appear in an API as input to create an expression [5]. In this way, it frees a programmer from remembering specific API names and reduces the programmer’s memory load. Abbreviation-based completion speeds up coding by using abbreviated input to query syntactically valid code snippets [6]. Automated Method Completion exploits code similarity to recommend code templates similar to what the programmer is working on in the code editor [7].

BCC uses API popularity to sort the APIs for Code Completion. API popularity has been proposed for informing the consumption and production of APIs [8], [9], [4]. However, our evaluation of the combinations of popularity-based sorting and the other strategies is new. BCC’s API filtering mechanism filters out public methods that are not APIs as well as APIs that are meant to be used only in limited contexts, such as in a subclass. Interestingly, the issue of non-API public methods seems to be important enough that a proposal to extend Java with the so-called “superpackage” modularity mechanism is being worked on in the Java community [10].

Preliminary versions of BCC have been described elsewhere [11], [12]. The new content in this paper includes the first complete description of its design and implementation, the evaluation, and the design recommendations.

VI. THREATS TO VALIDITY

We evaluated BCC using the widely used AWT/Swing APIs and nine projects that range from small to large. Although not all APIs and applications are as large as AWT/Swing or exhibit the same use patterns, AWT/Swing are certainly not the only large APIs [3].

We measure BCC mainly in terms of the reduction of the total rank for a project. We believe that rank reduction can have a noticeable impact on coding speed and productivity, especially when the APIs used are large. The advantage of this metric is that it can be easily computed. Although this metric may not always reflect exactly how individual programmers write code with the APIs, it is a good first evaluation of the proposed strategies.

Total rank is not the only factor that can be used to assess Code Completion. API usability is arguably a more important factor that needs to be measured. For example, when APIs are small enough, say around twenty, it would be less critical to reduce ranks in the popup pane.
However, grouping should still be useful, as it may contribute to API learnability. When grouping is used, we can change the unit of navigation in the popup pane from individual proposals to groups, which would result in a noticeable reduction of ranks. It is also likely that users at different experience levels will prefer different BCC configurations. However, our usability discussions so far are mainly based on our personal experience with existing Code Completion systems and BCC itself. User studies will be necessary to fully explore these issues. Nonetheless, the current paper focuses on the BCC algorithms and their first evaluation in terms of ranks.

Our current evaluation of grouping and filtering was based on partial coverage of the AWT/Swing APIs. Although incomplete, the grouping and filter definitions are by no means trivial. In particular, the grouping definitions cover the two most important ancestor classes of AWT/Swing, Component and Container.
The filter definitions target 44 classes and 197 methods within Swing’s JComponent type hierarchy. Despite this incompleteness, both grouping and filtering still demonstrate significant improvement of API usability. BCC and the data used in this paper are available for download so that other researchers can verify our results.

VII. CONCLUSION AND FUTURE WORK

We proposed and implemented new strategies for sorting, filtering, and grouping APIs in the Code Completion popup pane. We evaluated fourteen different configurations of the proposed strategies using nine small to large frameworks/applications that make use of the AWT/Swing APIs. We measured these configurations mainly in terms of project-wide rank reduction, but we also considered their likely usability implications and offered new design options as potential remedies. We analyzed in detail how these configurations worked on the nine projects. Based on the evaluation, we recommended a set of design options for future Code Completion. We feel that these design options are promising for improving the state of the art.

As future work, it would be useful to further evaluate BCC with other APIs. Furthermore, user studies are needed to further understand and validate the usability issues discussed in this paper, for example, the effect of grouping APIs. Lastly, it would be interesting to investigate new schemes to make the dynamic primers work as an effective strategy.

REFERENCES

[1] A. J. Ko, B. A. Myers, and H. H. Aung, “Six Learning Barriers in End-User Programming Systems,” in VLHCC, 2004, pp. 199–206.
[2] R. Robbes and M. Lanza, “How Program History Can Improve Code Completion,” in ASE, 2008, pp. 317–326.
[3] M. Bruch, M. Monperrus, and M. Mezini, “Learning from Examples to Improve Code Completion Systems,” in FSE, 2009, pp. 213–222.
[4] M. Mooty, A. Faulring, J. Stylos, and B. A. Myers, “Calcite: Completing Code Completion for Constructors Using Crowds,” in VLHCC, 2010, pp. 15–22.
[5] G. Little and R. C. Miller, “Keyword Programming in Java,” Autom. Softw. Eng., vol. 16, no. 1, pp. 37–71, 2009.
[6] S. Han, D. Wallace, and R. Miller, “Code Completion from Abbreviated Input,” in ASE, 2009, pp. 332–343.
[7] R. Hill and J. Rideout, “Automatic Method Completion,” in ASE, 2004, pp. 228–235.
[8] R. Holmes and R. J. Walker, “A Newbie’s Guide to Eclipse APIs,” in MSR, 2008, pp. 149–152.
[9] ——, “Informing Eclipse API Production and Consumption,” in ETX, 2007, pp. 70–74.
[10] A. Buckley, “JSR 294: Improved Modularity Support In the Java Programming Language,” Sun Microsystems Inc., Tech. Rep., 2009, http://jcp.org/en/jsr/detail?id=294.
[11] D. M. Pletcher and D. Hou, “BCC: Enhancing Code Completion for Better API Usability,” in ICSM, 2009, pp. 393–394, Tool Demonstration.
[12] D. Hou and D. M. Pletcher, “Towards A Better Code Completion System by API Grouping, Filtering, and Popularity-Based Ranking,” in RSSE, 2010, 5 pp.