APISynth: A New Graph-Based API Recommender System

Chen Lv1,2, Wei Jiang1,2,3, Yue Liu1,2 and Songlin Hu2

1 University of Chinese Academy of Sciences, China
2 Institute of Computing Technology, Chinese Academy of Sciences, China
3 Greatwall Drilling Company R&D Academy of Well Logging, CNPC

{lvchen, jiangwei, liuyue01, husonglin}@ict.ac.cn

ABSTRACT

Current API recommendation tools yield either good recall ratio or good precision, but not both. This paper proposes a tool named APISynth that uses a new graph-based approach. Preliminary evaluation demonstrates that APISynth outperforms the state of the art with respect to both criteria.

Categories and Subject Descriptors
D.2.3 [SOFTWARE ENGINEERING]: Coding Tools and Techniques - Object-oriented programming

General Terms
Experimentation, Languages

Keywords
Code Assistant; API Recommender; Code Reuse

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ICSE ’14, May 31 - June 7, 2014, Hyderabad, India. Copyright 2014 ACM 978-1-4503-2768-8/14/06 ...$15.00.

1. INTRODUCTION

As a typical representative of software reuse, object instantiation requires correctly synthesizing APIs to answer a query given as a pair of Source and Destination types. Because APIs have sophisticated features and exist in enormous quantity, it is difficult and sometimes impractical to manually find the API Method Invoking Sequences (MISs) in time [1]. An MIS serves as the solution that instantiates the Destination type from the Source type.

Several automatic API recommendation tools are currently available to recommend such MISs to developers. These tools collect the source or example code related to a certain library and represent the relevant API dependencies (e.g., field access, method call or statement control flow) using different graph-based models. Based on such graphs, the subgraphs (corresponding to the MISs) from Source to Destination can then be inferred either by adopting a path-oriented graph search algorithm [1, 2, 3] or by selecting a best matching pattern (a sub-graph observed frequently) [4].

The problem, in our observation, is that current tools achieve either good recall ratio or good precision, but not both. For instance, relying on a connected graph built on the whole library, Prospector [2] can describe potential invoking relationships among APIs and forms a good basis for better recall ratio, but the single shortest path algorithm it uses cannot guarantee correct output of Directed Acyclic Graph (DAG)-like MISs, sometimes leading to false results. Unlike Prospector, the tools in [1, 3, 4] model the relevant API dependencies as a set of unconnected graphs, each of which represents only the internal API dependencies of either a class file [1] or a function [3, 4]. Although the reduced search space makes it easy to yield good precision, the isolation of individual graphs loses some potential invocation relationships among APIs in different graphs, and consequently decreases the recall ratio.

In this paper, we propose a tool, called APISynth, to help developers perform the object instantiation task. A new connected graph is designed to maintain a recall ratio as good as that of [2]; a new element called "tag" is added to the classical graph, together with a new reachability property, to avoid false invocation of APIs and thus enable correct searching of DAG-like MISs. To achieve good precision, APISynth models the MIS searching problem as a novel Top-K DAGs problem. Correspondingly, a Key-Path based Loose (KPL) algorithm is proposed to solve this problem. Preliminary evaluation demonstrates the benefits of APISynth.

2. METHODOLOGY

As shown in Figure 1, the process of APISynth consists of three phases: 1) Modeling: builds the graph model based on the given source code; 2) Searching: searches the DAGs on the graph for the given query; and 3) Ranking: sorts the final set of DAGs to help developers select the desired result efficiently.

[Figure 1: Architecture. Phases: Modeling, Searching, Ranking. Components: Code Analyzer, Graph Builder, Graph Model, Key-path Provider, Key Path, DAG Constructor, Candidate Results, Rank Processor. Input: Source Codes, Query. Output: Results.]

2.1 Modeling

APISynth leverages Spoon (http://spoon.gforge.inria.fr/) and Javassist (http://en.wikipedia.org/wiki/Javassist) to parse source code, and builds a so-called Weighted API Graph (WAG). WAG is a connected graph, but unlike the model used in Prospector, the nodes in WAG represent API methods, and a directed edge is built between two nodes whenever the output type of the former API matches any input type of the latter. Furthermore, the matching type is used as a "tag" of the edge, and the API usage frequency is retrieved and used as the Quality of Service (QoS) that assigns a weight to the corresponding node. Briefly, the API usage frequency is obtained by counting the number of times each API method appears among the relevant projects.

[Figure 2: Example]

A code example:
CompilationUnitEditor editor;
JavaPlugin jp = JavaPlugin.getDefault();
IWorkingCopyManager manager = jp.getWorkingCopyManager();
IEditorInput iei = editor.getEditorInput();
ICompilationUnit unit = manager.getWorkingCopy(iei);

Symbolic form:
A a;
B b = B.M1();
C c = b.M2();
D d = a.M3();
E e = c.M4(d);

WAG nodes (input types, output type, QoS): M1 (none, B, 4.5); M2 (B, C, 3.7); M3 (A, D, 7.2); M4 (C and D, E, 1.7). Source: A. Destination: E.

Let us consider the code example shown in Figure 2, which provides an MIS that instantiates the ICompilationUnit (Destination) type from an object of the CompilationUnitEditor (Source) type. For convenience, the MIS is also represented in a simplified symbolic form, and its WAG model is given in the same figure. Note that the Source and Destination nodes are added as special temporary nodes only when a query request is issued. The Source node has only an output and the Destination node has only an input. We also add edges with a "tag" between each temporary node and the original API nodes, e.g., A → M3.

In practice, an API can be invoked only when all of its input types are instantiated, which is also why MISs may take the form of DAGs. To reflect this feature, reachability in WAG is defined differently from that of the classical graph: a node is reachable (meaning that a specific API can be invoked) if and only if all of its input tags have been instantiated.
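The WAG construction and its reachability rule can be illustrated with a minimal sketch, assuming the symbolic example of Figure 2. The class `WagSketch`, the `Api` type, and the methods `taggedEdges` and `reachable` are hypothetical names introduced here for illustration; they are not part of APISynth, Spoon, or Javassist.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical WAG sketch; names and structure are illustrative only. */
class WagSketch {

    /** A WAG node: one API method with its input types, output type, and QoS weight. */
    static final class Api {
        final String name;
        final Set<String> inputs;
        final String output;
        final double qos;

        Api(String name, Set<String> inputs, String output, double qos) {
            this.name = name;
            this.inputs = inputs;
            this.output = output;
            this.qos = qos;
        }
    }

    /** The four API nodes of Figure 2 in symbolic form. */
    static List<Api> figure2() {
        return List.of(
            new Api("M1", Set.of(), "B", 4.5),
            new Api("M2", Set.of("B"), "C", 3.7),
            new Api("M3", Set.of("A"), "D", 7.2),
            new Api("M4", Set.of("C", "D"), "E", 1.7));
    }

    /** Tagged edges "from -[tag]-> to": the output type of from is an input type of to. */
    static List<String> taggedEdges(List<Api> apis) {
        List<String> edges = new ArrayList<>();
        for (Api from : apis) {
            for (Api to : apis) {
                if (to.inputs.contains(from.output)) {
                    edges.add(from.name + " -[" + from.output + "]-> " + to.name);
                }
            }
        }
        return edges;
    }

    /** Names of the APIs that can be invoked once the source type is available. */
    static Set<String> reachable(List<Api> apis, String sourceType) {
        Set<String> instantiated = new HashSet<>();
        instantiated.add(sourceType);
        Set<String> invocable = new HashSet<>();
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Api api : apis) {
                // A node is reachable only if ALL of its input tags are instantiated.
                if (!invocable.contains(api.name) && instantiated.containsAll(api.inputs)) {
                    invocable.add(api.name);
                    instantiated.add(api.output);
                    changed = true;
                }
            }
        }
        return invocable;
    }

    public static void main(String[] args) {
        List<Api> apis = figure2();
        System.out.println(taggedEdges(apis));
        System.out.println(reachable(apis, "A"));
    }
}
```

With Source type A all four APIs become reachable. With a source type that does not supply A, M4 stays unreachable even though C can be instantiated through M1 and M2, because its input D is missing.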

2.2 Searching

With the help of the "tags" and the new reachability property of WAG, only the APIs whose input types are all instantiated can be included in the results. This makes it possible for APISynth to directly support searching of DAG-like MISs. Based on that, we define a Top-K DAGs problem to meet the requirement of object instantiation: given a query with a pair of Source and Destination, the set DAG_All represents all the DAGs from Source to Destination, and the set DAG_TopK represents those DAGs whose overall QoS values are the top k among DAG_All. Details of the rules for computing overall QoS can be found in our previous work [5].

To solve this problem, a Key-Path based Loose (KPL) algorithm is designed. Unlike the existing path-oriented searching [2], the KPL algorithm is a DAG-oriented approach. The algorithm contains four steps. First, KPL conducts a forward search on WAG, starting from the Source node, to retrieve all the reachable nodes in a manner similar to Dijkstra's algorithm. Second, KPL conducts a backward search, starting from the Destination node, to retrieve the "optimal" key path hop by hop using the information stored during the forward search. It then pushes the retrieved "optimal" key path into a priority queue and repeats the following two steps. Third, KPL pops the key path that has the best overall QoS in the current priority queue and constructs all the corresponding DAGs with this popped key path (please refer to [5] for the details). If the number of DAGs constructed so far is less than k, it proceeds to the fourth step; otherwise, the algorithm terminates. Fourth, KPL generates new, worse key paths from the key path popped in the third step and pushes them into the priority queue. They are obtained by the "loose" operation presented in [5]. After that, KPL returns to the third step in order to construct more DAGs.
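The forward-then-backward pattern of the first two steps can be sketched as follows. This is deliberately not the KPL algorithm: it ignores QoS, key paths, the priority queue, and the "loose" operation, whose details are given in [5]. It only shows how information stored during a forward pass lets a backward pass from Destination recover one DAG. The names `ForwardBackwardSketch`, `Api`, `forward`, `backward`, and `search`, as well as the extra API M5, are assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; it is NOT KPL (no QoS, key paths, priority queue, or loose operation). */
class ForwardBackwardSketch {

    static final class Api {
        final String name;
        final List<String> inputs;
        final String output;

        Api(String name, List<String> inputs, String output) {
            this.name = name;
            this.inputs = inputs;
            this.output = output;
        }
    }

    /** The Figure 2 APIs plus M5, a hypothetical API that is irrelevant to type E. */
    static List<Api> sampleApis() {
        return List.of(
            new Api("M1", List.of(), "B"),
            new Api("M2", List.of("B"), "C"),
            new Api("M3", List.of("A"), "D"),
            new Api("M4", List.of("C", "D"), "E"),
            new Api("M5", List.of("A"), "F"));
    }

    /** Forward pass: for each type that becomes available, store the API that first produced it. */
    static Map<String, Api> forward(List<Api> apis, String source) {
        Map<String, Api> producer = new HashMap<>();
        Set<String> available = new HashSet<>();
        available.add(source);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Api api : apis) {
                if (!available.contains(api.output) && available.containsAll(api.inputs)) {
                    available.add(api.output);
                    producer.put(api.output, api);
                    changed = true;
                }
            }
        }
        return producer;
    }

    /** Backward pass: collect the APIs needed for the destination; empty if it is unreachable. */
    static Set<String> backward(Map<String, Api> producer, String destination) {
        Set<String> dag = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(destination);
        while (!pending.isEmpty()) {
            Api api = producer.get(pending.pop());
            if (api == null) {
                continue; // the source type (or an unreachable type): nothing to invoke
            }
            if (dag.add(api.name)) {
                for (String input : api.inputs) {
                    pending.push(input);
                }
            }
        }
        return dag;
    }

    static Set<String> search(List<Api> apis, String source, String destination) {
        return backward(forward(apis, source), destination);
    }

    public static void main(String[] args) {
        System.out.println(search(sampleApis(), "A", "E"));
    }
}
```

For the query (A, E) the backward pass keeps M1 through M4 and drops M5, which is reachable but does not contribute to E. A Top-K search would additionally need an ordering over alternative producers, which is where the key paths and overall QoS of [5] come in.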

2.3 Ranking

Our ranking considers four criteria for a DAG: overall QoS, number of nodes, number of edges, and length of its key path. The heuristic rule is that a smaller value on each of these four criteria yields a higher rank.
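One way to turn this heuristic into an ordering is sketched below. The text does not say how the four criteria are combined, so the lexicographic order used here (overall QoS first, then nodes, edges, and key-path length) is an assumption, as are the names `RankSketch`, `DagResult`, and `rank`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical ranking sketch; the lexicographic combination is an assumption. */
class RankSketch {

    static final class DagResult {
        final String id;
        final double overallQos;
        final int nodes;
        final int edges;
        final int keyPathLength;

        DagResult(String id, double overallQos, int nodes, int edges, int keyPathLength) {
            this.id = id;
            this.overallQos = overallQos;
            this.nodes = nodes;
            this.edges = edges;
            this.keyPathLength = keyPathLength;
        }
    }

    /** Smaller values rank higher on every criterion; ties fall through to the next one. */
    static final Comparator<DagResult> BY_RANK =
        Comparator.comparingDouble((DagResult d) -> d.overallQos)
                  .thenComparingInt(d -> d.nodes)
                  .thenComparingInt(d -> d.edges)
                  .thenComparingInt(d -> d.keyPathLength);

    /** Returns the candidate ids from highest rank to lowest rank. */
    static List<String> rank(List<DagResult> candidates) {
        List<DagResult> sorted = new ArrayList<>(candidates);
        sorted.sort(BY_RANK);
        List<String> ids = new ArrayList<>();
        for (DagResult d : sorted) {
            ids.add(d.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        // Made-up candidates, used only to exercise the comparator.
        List<DagResult> candidates = List.of(
            new DagResult("x", 5.0, 4, 3, 3),
            new DagResult("y", 5.0, 3, 2, 2),
            new DagResult("z", 2.0, 6, 6, 4));
        System.out.println(rank(candidates));
    }
}
```

A weighted sum of the four criteria would be an equally plausible reading of the rule; the lexicographic form is chosen here only because it needs no extra parameters.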

3. EVALUATION

We evaluated our tool on a 2.4GHz CPU machine with 4GB RAM running Microsoft Windows 7. We compared it against Prospector [2] and GraPacc [4]. Our tool worked on the same Graphical Editing Framework (used by Prospector) and Java SDK Utility (used by GraPacc). The 61 queries included in the above frameworks were used for evaluation. For each query, we employed seven human experts to judge, by majority vote, which result in the query's result list is correct. Precision, Recall and F-score were used to measure accuracy. The P@1 metric, which is widely used in information retrieval, is adopted to evaluate precision. P@1 is the percentage of queries whose top-1 result is correct; Recall is the percentage of queries for which a correct solution is returned; F-score is 2 / (1/P@1 + 1/Recall).

Table 1: Performance Comparison

             QuerySet 1                  QuerySet 2
           APIS   Prosp  Improve       APIS   GraP   Improve
Recall     1.00   0.90   +10%          1.00   0.40   +60%
P@1        0.95   0.61   +34%          0.90   1.00   -10%
F-score    0.97   0.73   +24%          0.95   0.57   +38%
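The F-score column of Table 1 can be reproduced from the Recall and P@1 columns with the formula above. The following is a minimal sketch; `MetricsSketch`, `fScore`, and `round2` are names introduced here for illustration only.

```java
/** Hypothetical helper that recomputes the F-score column of Table 1. */
class MetricsSketch {

    /** F-score as defined in the text: 2 / (1/P@1 + 1/Recall). */
    static double fScore(double pAt1, double recall) {
        return 2.0 / (1.0 / pAt1 + 1.0 / recall);
    }

    /** Rounds to two decimals, matching the precision used in Table 1. */
    static double round2(double value) {
        return Math.round(value * 100.0) / 100.0;
    }

    public static void main(String[] args) {
        System.out.println("APISynth, QuerySet 1: " + round2(fScore(0.95, 1.00)));
        System.out.println("Prospector:           " + round2(fScore(0.61, 0.90)));
        System.out.println("APISynth, QuerySet 2: " + round2(fScore(0.90, 1.00)));
        System.out.println("GraPacc:              " + round2(fScore(1.00, 0.40)));
    }
}
```

Because the F-score is a harmonic mean, GraPacc's perfect P@1 cannot compensate for its recall of 0.40, which is why its F-score stays at 0.57.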

Table 1 shows that APISynth gains a P@1 improvement of 34% (0.95 versus 0.61) over the connected-graph-based tool, Prospector. The reason behind the good P@1 of APISynth is that many top-1 results are DAG-like MISs, which APISynth can search correctly. Table 1 also shows that APISynth gains a Recall improvement of 60% (1.00 versus 0.40) over the isolated-graph-based tool, GraPacc, although its P@1 is 10% lower on that query set. GraPacc's recall is low because it cannot answer queries whose solutions are split among its isolated graph models. Moreover, APISynth achieves a higher F-score than the other approaches.

4. CONCLUSION AND FUTURE WORK

A tool for automatic object instantiation is introduced. We propose a new connected graph that can maintain good recall ratio. A new element of this graph, the "tag", together with a new reachability property, enables the output of correct DAG-like MISs and thus increases precision. Based on that, we design a DAG-oriented searching algorithm that can search DAG-like MISs correctly. In future work, we will extend APISynth to update the recommendations when the API library migrates to a new version.

5. ACKNOWLEDGMENTS

For the completion of this research, we would like to thank the National Natural Science Foundation of China under Grant No.61070027. This work is also supported by State Key Laboratory of Software Engineering (SKLSE2012-09-02).

6. REFERENCES

[1] N. Sahavechaphan and K. Claypool. Xsnippet: mining for sample code. ACM SIGPLAN Notices, 41(10):413–430, 2006.
[2] David Mandelin, Lin Xu, Rastislav Bodík, and Doug Kimelman. Jungloid mining: helping to navigate the api jungle. ACM SIGPLAN Notices, 40(6):48–61, 2005.
[3] Suresh Thummalapenta and Tao Xie. Parseweb: a programmer assistant for reusing open source code on the web. In Proc. ASE, pages 204–213, New York, NY, USA, 2007. ACM.
[4] A.T. Nguyen, H.A. Nguyen, T.T. Nguyen, and T.N. Nguyen. Grapacc: a graph-based pattern-oriented, context-sensitive code completion tool. In Proc. ICSE, pages 1407–1410. IEEE Press, 2012.
[5] Jiang Wei, Hu Songlin, and Liu Zhiyong. Top k query for qos-aware automatic service composition. IEEE Transactions on Services Computing, 2013 (preprint, DOI:http://doi.ieeecomputersociety.org/10.1109/TSC.2013.41).