different types of relations exist between APIs, for instance, the Facebook API ... APIs for users, including APIs provider, category, developer, tag, textual descrip-.
Personalized API Recommendation via Implicit Preference Modeling Wei Gao1 , Liang Chen2 , Jian Wu1 , Hai Dong2 , and Athman Bouguettaya2 1 2
College of Computer Science & Technology, Zhejiang University, China School of Computer Science & Information Technology, RMIT, Australia {gw, wujian2000}@zju.edu.cn {liang.chen, Hai.dong, athman.bouguettaya}@rmit.edu.au
Abstract. With a huge amount of APIs on the Internet, understanding users’ complex needs and preferences for APIs becomes an important task. In this paper, we aim to uncover users’ implicit needs for APIs and recommend suitable APIs for users. Specifically, first different similarity scores between APIs are computed according to heterogeneous functional aspects of APIs. Next, users’ preferences for APIs is combined with similarities of APIs measured with different functional aspects, and matrix factorization technique is used to learn the latent representation of users and APIs for each functional aspect. Then we use a personalized weight learning approach to combine the latent factors of different aspects to get the predicted preferences of users for APIs.
1
Introduction
With the development of Web 2.0 paradigm and mobile service computing, API (Application Programming Interface) as a form of REST-ful service has become prevalent in developing Web and mobile applications. Consumers can enjoy the web or mobile service by simply invoking the APIs provided by service developers. For example, a user may find his/her location on a map by invoking the Google Map API, which is an open mapping service developed by Google. Since API provides a convenient way to deliver complicated service and is easily programmable, it has become ubiquitous to access Web or mobile services through APIs. As a result, the number of services has been experiencing a dramatic increase over the past few years. And platforms like ProgrammbleWeb3 , Mashape4 , APIStore5 that collect, host and aggregate APIs on the Internet are emerged to help users find APIs that meet their requirements in an efficient manner. However, according to an analysis conducted by ProgrammableWeb, the number of APIs has exploded from 299 in Oct. 2006 to 10,632 in Oct. 2013, and the number has reached 15,012 by May 2016. Thus it is quite challenging for a consumer to find APIs that are satisfactory to her. The huge number of 3 4 5
www.programmableweb.com www.mashape.com apistore.baidu.com
2
APIs calls for effective and efficient API management. And there is a need to understand users’ implicit needs and preferences for APIs, predict what APIs they are likely to use and recommend suitable APIs for users. Currently, none of existing API aggregation and hosting platform provides personalized API recommendation for users. In ProgrammableWeb, it only suggests recently updated APIs under the same category if a user visits the webpage of an API. Therefore, our work aims to understand users’ needs and preferences for APIs, predict which APIs users will likely to follow and proactively suggest a list of APIs that can meet users potential requirements. However, the number of APIs is large and users’ demand for APIs are complex and diverse, which makes it challenging to find APIs that match users’ interest. A lot of methods have been proposed for service recommendation, including collaborative-filtering based method [5][12][13], content-based method [2][3][6] and a hybrid of both [8][10][11]. CF-based methods predict users interest on services based on previous user-service interaction records. Content-based methods recommend services by analyzing the functional description of services, such as structured WSDL files or free form textual description and representing them as vector space model or topic model. Our work can be regarded as a hybrid recommendation method that combines collaborative filtering and content-based method. Figure 1 illustrates a scenario for our API recommendation model. In this example, users and APIs are connected via the follow relations representing users historical preferences for APIs. Furthermore, APIs are linked with different aspects of functional information. For instance, each API is associated with its provider, category, tags, textual information or the mashups composing it. And different types of relations exist between APIs, for instance, the Facebook API and the Picasa API are linked to each other as they belong the same Social category, and they share a common Photo tag. Similarly, there exist heterogeneous relations between APIs and different functional aspects of APIs and these rich source of functional relations can be utilized in conjunction with user preference relations for hybrid API recommendation. In this paper, we explore the multiple aspects of API functionalities and their effects on users’ preferences, and propose a novel API recommendation approach. We identify several functional aspects of APIs that could impact the adoption of APIs for users, including APIs provider, category, developer, tag, textual description and mashups it composes. We propose to measure the similarities of APIs according to different functional aspects. Then we combine the users historical following data with different aspects of API similarities in a collaborative filtering way. That is, we propagate the observed user-API following relations with potential relatedness between APIs under different types of functionality. Matrix factorization technique is applied on the propagated user preference data and latent factor representation for users and APIs is calculated for each functional aspect of API accordingly. Then we combine these latent factors of different aspects with different weights which is personalized for each user. Further a weight learning method is proposed to learn a personalized recommendation model for each user. The final predicted scores of users’ preferences for APIs are computed
3
Fig. 1. An example of API Recommendation Scenario
with the learned weights and top scoring APIs are used for recommendation. Our contributions in this work are summarized as follows. 1. We study the problem of discovering users’ implicit preferences for APIs and make personalized recommendations by mining rich heterogeneous functional information of APIs with various aspects. 2. We combine matrix factorization based collaborative filtering as well as different types of functional aspects of APIs with personalized weights to perform API recommendation. 3. We conduct extensive experiments on the ProgrammableWeb dataset and the result demonstrate the effectiveness of our recommendation approach. The remainder of this paper is organized as follows. In Section 2, we give an overview of the related work of service recommendation approaches. In Section 3, we present details of the proposed API recommendation approach. In Section 4, we present and analyze our experimental results. Section 5 concludes this paper.
2
Related Work
In this section, we describe several representative related work. In service computing area, understanding users’ preferences for services and recommending suitable services for users has always been a hot research topic. Service recommendation can be broadly classified into three categories: Functional-based service recommendation, non-functional service recommendation and a hybrid of both. Non-functional based service recommendation focus on predicting the non-functional features (i.e., Qos) of services and recommend services that gives the best Qos performance. For example, [5][12] use neighborhood based collaborative filtering approach by calculating the similarities of users or services by Pearson Correlation Coefficient and predict service ratings by aggregating the
4
ratings of top K neighbor users or services. [13] use matrix factorization technique which is a model-based approach to predict service ratings. However, the Qos attributes of APIs are not available due to expensive access to all APIs over the web. There are several approaches proposed to model functionality of services. The functionality of services is usually analyzed by the functional description of services, such as structured WSDL files or free form textual description. For example, In [3], services with similar functionalities are clustered together based on WSDL files and clustering result is used for service recommendation. Specifically, when a user request a service, services that in the same functional cluster are retrieved and recommended to the user.[2] use LDA-based approach to model functionality of services using both WSDL files and tags. [6] propose a relational topic modeling (RTM) technique to model the functionality of mashups, services and their links. In [9] a category-based approach was proposed to identify the latent categories of APIs and APIs are recommended within each category. However, the functional aspects of APIs are more abundant and complex compared to traditional web services and heterogeneous aspects of APIs and whether it is informative to infer users’ preferences is unexplored. The hybrid service recommendation method combines both functionality and Qos attributes of services. Other available side information can also be utilized for recommendation. For example, [10] explores and analyzes the relations between users and functional topics and tags of API and mashup with a coupled matrix factorization (CMF) model. [8] use collaborative topic regression technique to integrate both users’ preferences and functional features of services by combining probabilistic matrix factorization and topic modeling to recommend services for users. [11] proposed to combine collaborative filtering and content-based recommendation for service recommendation. User-related latent variables are introduced to model the services preference of users and the functional descriptions. Parameters are learnt through Expectation Maximization (EM) algorithm and used for preference prediction. [4] proposed to incorporate three heterogeneous factors to recommend APIs for mashup: the functionality of an API, the usage history of the API by mashups and the popularity of the API, and integrate different sources of information using Bayes’ theorem.
3
User Preference Modeling
In this section, we propose to model users’ preferences for APIs in terms of different aspects of APIs and propose a novel personalized API recommendation model by combining heterogeneous functional information of APIs and collaborative information of other users. Figure 2 illustrates the overall framework of our proposed API recommendation method. First, six functional aspects of APIs, namely provider, category, developer, tag, textual descriptions and mashups, are extracted for each API from the repository. Similarities of APIs are evaluated under different aspects of API which is introduced in Section 3.3. To predict users’ preferences for un-
5
Fig. 2. API Recommendation Framework
followed APIs, the user-API interaction matrix R is constructed and combined with API similarities measured with six aspects respectively. For each aspect of API, the users’ preferences are propagated through the relations of API with the method introduced in Section 3.2. Then, the six augmented matrices which encode both user-API historical information and functional information of APIs are factored into user latent factor matrices and API latent factor matrices respectively using matrix factorization technique described in Section 3.1. To predict users’ preferences for APIs, the predicted score for each aspect is linearly combined with different weights for each user which is described in Section 3.4. The weight learning method is introduced in Section 3.5. After weights are learned, the predicted scores of users for APIs are calculated accordingly and APIs that receive the highest score for each user are recommended. 3.1
Matrix Factorization
Matrix Factorization method is a widely used technique to model users’ preferences in service recommendation for it is effective when the scale of users and services is large and it deals well with data sparsity issue. We use matrix factorization as our base method for API recommendation. With m users and n
6
APIs, we construct a user-API follow matrix R ∈ Rm×n as follows: if a user ui followed an API aj , then the corresponding entry of Rij is 1. Otherwise, it is 0. The goal is to identify APIs that are potentially of interest to a user. That is, we need to fill in the 0 entries by predicting the preference score for unobserved user-API pairs. Matrix factorization assumes that users’ preferences for APIs are affected by a small number of latent factors and it models users and APIs as low dimensional latent vectors. Mathematically, it seeks to approximate the matrix R with the product of two low-rank matrices of users and APIs: R ≈ U V T , where U ∈ Rm×d are latent factors representing users, and V ∈ Rn×d are latent factors representing APIs, and d is the number of latent factors. The matrices U and V are learnt by minimizing the following objective function: m
n
λU λV 1 XX 2 2 Iij (Rij − ui vj )2 + kU k + kV k . 2 i=1 j=1 2 2
(1)
in which Iij is an indicator function equals to 1 when the user-API follow relation is observed, λU and λV are regularization parameters. For minimizing the objective function (1), we use the widely adopted gradient descent method which iteratively updates U and V by computing the gradients. 3.2
User Preference Propagation
Matrix factorization only captures the collaborative effect of users’ preferences for APIs. In this section, we utilize the heterogeneous functional information of APIs and incorporate them into the recommendation process. We propose to e on the basis of R by adding functional aspects construct an enhanced matrix R of APIs. The functionalities of an API are characterized by a set of aspects or attributes, for instance, the category an API belongs to or the developer of an API. APIs are related with each other under different semantic aspects. For example, two APIs are related when they are produced by the same developer. Or they have a connection in that they are published by the same provider. The relations of two APIs can be characterized by different aspects of APIs. This heterogeneous relations can be utilized to help user discover novel APIs with different functional aspects. Specifically, if two APIs are related to each other under a certain functional aspect which the user is concerned about. Then one of APIs in highly recommendable if the other one is followed. For example, if a user followed Google Maps API, and Google Picasa API is related with Google Maps API as they share the same provider Google, thus it is advisable to recommend Google Picasa API to the user. In other words, there is an implicit preference between the user and Google Picasa API and it is built by propagating from the follow relation between the user and Google Maps API and the functional relations between two APIs provided by Google. It is an intuitive way to model users’ preferences in that it imitates the API discovery process by the user. Our problem is, under a certain functional aspect of API, how to learn the degree of users’ preferences for the unfollowed APIs from the heterogeneous relations including the follow relations between users and APIs as well as the functional
7
relations between APIs. To this end, we use the following equation to infer users’ implicit preferences: Pn k=1 Rik sim(aj , ak ) eij = P (2) R n k=1 sim(aj , ak ) where sim(aj , ak ) measures the similarity of APIs under a particular functional aspect. From equation (2) we can see that the users’ implicit preferences for unknown APIs are measured by two parts: (1) The observed user-API follow relations represented by R, and (2) the similarities between APIs for a certain aspect represented by sim(·) function. The predicted score of an unfollowed API is the aggregation of APIs followed by the user weighted by the similarity between them and the predicted API. In this way, the users’ preferences are propagated from the original user-API follow matrix R by incorporating the functional aspect of APIs represented by their similarities. 3.3
API Similarity Modeling
In reality, multiple types of relations exist between APIs, and different types of relations has a different impact on users’ preferences. In this section, we identify six aspects to model the functionalities of an API and their impact on users’ preferences. We describe the similarity measuring method for each aspect as follows: 1. Provider The provider of an API is usually a website that publishes APIs for sharing and development. For example, Google Maps API and Google Picasa API are of the same provider Google.com. Each API has a unique provider. If two APIs have the same developer, their similarity should be higher than those that do not share the same developer. Denote provider(a) as the provider of API a, the similarity of APIs measured by the provider is calculated as: 1 provider(ai ) == provider(aj ) sim(1) (ai , aj ) = (3) 0 provider(ai ) 6= provider(aj ) 2. Category Each API belongs to some categories curated by the ProgrammableWeb taxonomy. For example, the Facebook API belongs to the Social category and the Webhooks category. For measuring the relatedness of APIs with respect to category information. We assume that if two APIs belong to the same category, their similarity should be high. And if two APIs share more categories in common, their similarity should be higher. Denote the categories an API a belongs to by category(a), the similarity of APIs according to category information is: T |category(ai ) category(aj )| (2) S (4) sim (ai , aj ) = |category(ai ) category(aj )| 3. Developer APIs are usually developed, updated and maintained by various people. Developers are the ones who make a contribution to the development process of
8
an API, which usually contains a group of people. If two APIs have many developers in common, it means that they are likely to be developed by the same group of people, and their similarity should be high. Denote the developer of an API a by developer(a), the similarity of APIs with respect to developers can be calculated as: T |developer(ai ) developer(aj )| S (5) sim(3) (ai , aj ) = |developer(ai ) developer(aj )| 4. Tag Tagging is an effective method of annotating APIs with a list of keywords. Since in ProgrammableWeb APIs do not have tagging information, we make use of a tagging method proposed in [7]. In general APIs are assigned with tags according to the functionalities of APIs and the labeled tags contain 393 terms in total and each API contains at least 5 tags. If two APIs have more tags in common, they should be more functionally similar and should be assigned a higher similarity score. Denoting the tags of an API a by tag(a), the similarity of APIs with respect to tags is computed as: T |tag(ai ) tag(aj )| S (6) sim(4) (ai , aj ) = |tag(ai ) tag(aj )| 5. Textual Description Each API has a brief text describing its functions and usage. We propose to model the similarity of APIs by their textual descriptions. We use Latent Dirichlet Allocation (LDA [1]) method, which is a probabilistic topic modeling technique that models the textual descriptions of APIs as a distribution on latent topics. For evaluating the similarities of APIs, we compute the cosine similarity of topic vectors of APIs. Specifically, if the topic vector of an API a generated by LDA method is denoted as θi , the similarity of APIs is: sim(5) (ai , aj ) =
θi · θj kθi k kθj k
(7)
6. Mashup Besides functional similarity, different APIs may complement each other. Two APIs of different functionalities can be composed together to form a mashup. A mashup is an integrated value-added service created by composing APIs with various functionalities. The APIs in a mashup can be treated as functionally complementary instead of functionally similar. The more times two APIs are composed together by a mashup, the more complement the two services are with each other. Denote a list of mashups that compose the API a as mashup(a), the complementarity of APIs is measured as: T |mashup(ai ) mashup(aj )| S sim(6) (ai , aj ) = (8) |mashup(ai ) mashup(aj )| In this paper, only six types of relations between APIs are proposed. However, more similarity measures related to other aspects of APIs can be easily extended if additional information of APIs are available.
9
3.4
Personalized API Preference Modeling
In this section, we aim to learn a better low-rank approximation of users and APIs by combining matrix factorization technique with heterogeneous functional aspects of APIs. For each propagated user-API follow matrix according to a certain aspect of API, we use matrix factorization method to derive low rank matrices of users and APIs. Since the propagated matrix incorporates a specific type of similarity measurement of APIs, the latent representations of users and APIs are encoded with a certain aspect of API functionality. For each propagated mae(l) , we perform matrix factorization to generate a pair of user-factor matrix trix R (l) e U and item-factor matrix Ve (l) . Six pairs of latent factors of users and APIs e (l) , Ve (l) ) represents users factors and APIs factors are generated. Each pair (U according to a certain aspect of APIs. The users’ preferences for APIs under e(l) = U e (l) Ve (l)T . The overall predicted a specific aspect is then computed as R user-API preference score is computed by combining different aspects of APIs together. A simple approach to combine latent factors learned under different aspects e(l) . That is: of APIs is to perform a weighted aggregation of the respective R b= R
6 X
e (l) Ve (l)T θl U
(9)
l=1
where θl is the weight for each aspect of APIs. Different aspects of APIs have different importance for influencing the users’ preferences. Some aspects of APIs are more relevant for users when deciding which APIs to follow. For example, users are more likely to follow APIs that are of the same category rather than composed in the same mashup. The weight controls how much impact the corresponding aspect of APIs has on the preferences of users. However, different users may have different opinions on deciding which aspects of APIs are more crucial for her preference. For example, some users want to be recommended with APIs under the same or relevant category, while other users prefer APIs that are complementary to each other and want to be recommended with APIs that can be composed with the followed API. Clearly a single weight for each aspect of APIs can not distinguish preferences of different users. To achieve the effect of personalized recommendation, we need to assign different weights to different users instead of a global weight for all users. Specifically, for each user, a weight vector is assigned to measure her personal preference on different aspects of APIs. The preference score prediction of user ui to API aj is defined as follows: 6 X b i , vj ) = e (l) Ve (l)T R(u θil U (10) i j l=1
where θil is the weight for user i on the lth aspect of API. Compared with only 6 parameters for aggregating different aspects of APIs, the personalized preference prediction model has 6m parameters. The larger parameter space enables a more accurate representation of users interest for APIs.
10
3.5
Model Weight Learning
In this section, we introduce how to learn the weight parameter θ for the personalized preference prediction model. The basic idea of the weight learning approach is introduced as follows: The preference score of a user to a followed API should always be higher than the preference score of a user to a non-followed API. To be specific, for each user ui , if API aj is followed by user previously and API ak has never been followed, then the predicted preference score of ui b i , aj ) > R(u b i , ak ) always holds. to aj should be higher than to ak . That is, R(u Assume that the user-API follow relations are organized in the form of a series of triplets (ui , aj , ak ), which means ui follows API aj and did not follow ak . Thus for each triplet, we can define the loss function as: l(ui , aj , ak ) = σ(R(ui , ak ) − R(ui , aj ))
(11)
where σ is the sigmoid function σ(x) = 1+e1−x to constrain the value between 0 and 1. For all the triplets generated from the dataset, the overall objective function is X λ 2 (12) L= l(ui , aj , ak ) + kθk 2 where λ is a regularization parameter. By minimizing L we can learn the parameter θ from the triplets. Since Objective function L is differentiable, we employ the stochastic gradient descent (SGD) method to estimate the parameter θ. For each triplet (ui , aj , ak ), the objective function based on this triplet is l = l(ui , aj , ak ) +
λ 2 kθi k 2
(13)
The gradient of l with respect to θi can be calculated as: ∂l e−(∆R(ui ,ak ,aj )) = · (Ui VkT − Ui VjT ) + λθi ∂θi 1 + e−(∆R(ui ,ak ,aj )) (1)
(1)T
where ∆R(ui , ak , aj ) = R(ui , ak )−R(ui , aj ), and Ui VjT = [Ui Vj (6) (6)T ..., Ui Vj ]
Ui VkT
(1) (1)T (2) (2)T (6) (6)T [Ui Vk , Ui Vk , ..., Ui Vk ].
(14) (2)
(2)T
, Ui Vj
,
and = Then θi can be ∂l , where η is a learning rate parameter. The parameter updated as θi ← θi − η · ∂θ i update process is iterated for each triplet until convergence is reached. If we consider all the triplets that could be generated by the data, then the time complexity of the parameter learning process is O(mn2 ). However, as the number of APIs n is usually huge, it is infeasible for practical implementation. In reality, we only sample a small subset of triplets from data to make the learning process more efficient. We will discuss the effect of sampling in the next section.
11
4
Experiments and Evaluation
We conduct a set of experiments to evaluate our proposed approach for API recommendation. The experimental dataset is crawled from the ProgrammableWeb, the largest public online API repository on the Internet. It contains a comprehensive listing of available APIs online and provides detailed information for each API. We crawled all the APIs available on ProgrammableWeb up until March 2016 and their corresponding followers. We also crawled rich API content profiles, including API names, primary category and secondary categories, provider, textual description, developers. We also crawled profiles of mashups to obtain the use of APIs by mashups. We filter out those users that follow less than 10 APIs since we can not create enough training and test data for such users. In summary, our dataset contains 792 users and 9,650 APIs. And there are a total of 1,486 APIs used in 7,066 mashups. Since our goal is to recommend appropriate APIs to followers, we use APIs followed by users as the ground truth for evaluating the accuracy of recommendation results. We split 80% of the APIs each user followed into training set and the rest of the APIs as test set. We perform the recommendation methods in the training set. The recommendation results generated from the training set is then compared to the test set for evaluation. Precision and Recall are two commonly used metrics to evaluate the prediction accuracy of recommendation results. Precision is the ratio of the number of recommended APIs actually followed by the user to the total number of recommended APIs. Recall is the ratio of the number of recommended APIs actually followed by the user to the total number of APIs followed by the user. We evaluate Precision and Recall for each user when recommending top K APIs and average over all users to get the overall performance. To integrate both metrics of precision and recall, we also use F-score measure as an evaluation metric, which is computed as the harmonic mean of precision and recall. We compare our proposed recommendation algorithm with the following baseline methods: 1. Neighborhood-based Collaborative Filtering (NBCF). It performs APIs recommendation based on users with similar preferences. Specifically, we use cosine similarity to evaluate the similarity between users based on their followed APIs and recommend APIs for each user according to the most similar 50 users of hers. 2. Matrix Factorization (MF). It is the method introduced in Section 3.1. The user-API follow matrix R is directly used for learning and prediction. The number of latent factors d is set to 50 and λU = λV = 0.1 for it achieves the best performance under parameter tuning. 3. Matrix Factorization + Single (MFS). We calculate the similarities for each e(l) . Maaspect of APIs and propagated the original matrix R into matrix R (l) e for recommendation. We trix factorization is performed on each matrix R evaluate the performance of each of the six aspects one by one.
12
4. Matrix Factorization + Uniform Weight (MFU). It combines the learned factor matrices for each aspect of APIs. Each aspect is aggregated with the same weight. That is, θl is a constant equals to 1/6. 5. Matrix Factorization + Weight Learning (MFW). It learns different weights for each aspect of APIs by the weight learning method proposed in Section 3.5. However the weights are the same for all the users. We set the learning rate η to 0.001 and the regularization parameter λ to 0.01 by tuning the parameters. 6. Matrix Factorization + Personalized Weight (MFPW). It is the personalized API recommendation method proposed in the paper. The weights of each aspect of APIs are estimated for each user. The parameters are set the same as MFW.
0.3
0.22
NBCF MF MFS MFU MFW MFPW
0.28 0.26
NBCF MF MFS MFU MFW MFPW
0.4
0.35
NBCF MF MFS MFU MFW MFPW
0.21
0.2
0.24 0.3
0.2 0.18
0.19
F-score
Recall
Precision
0.22
0.25
0.18
0.17 0.16
0.2 0.16
0.14 0.15
0.12
0.15
0.1 0.1 5
10
15
20
25
30
K
35
40
45
50
0.14 5
10
15
20
25
30
K
35
40
45
50
5
10
15
20
25
30
35
40
45
50
K
Fig. 3. Performance Comparison of Competitive Methods
We evaluate the recommendation performance for top K recommended APIs. Figure 3 shows the recommendation accuracy of all the competitive methods as the number of recommended APIs K ranges from 5 to 50. For method MFS, we only plot the aspect that achieves the best performance to save space. From the experimental results we can draw some observations: With a varying number of K, MFPW consistently achieves the best recommendation performance among all the other methods on the three metrics, which demonstrates the effectiveness of our approach on API recommendation. For other baseline algorithms, Matrix factorization method outperforms neighborhood based method, which shows that latent factor model can capture user API preference relations more accurately. Furthermore, incorporating heterogeneous functional information of APIs can enhance the recommendation performance as MFS method outperforms MF method. And MFU achieves a better recommendation result than MFS, which proves that the idea of combining different aspects of APIs is more effective than only considering a single aspect. Still, MFU method can not yield a better result over MFW since the uniform combination method is too simple to model the difference of impact of API factors on the recommendation performance. MFPW learns a personalized weight for each user and model users’ preferences at a finer granularity, as evidenced by the superior performance compared to MFW. In summary, the experimental results validate our theoretical analysis and show its efficacy in API preference modeling.
13
MFPW
MFPW
MFW
MFW
MFU
MFU
MFPW
MFU
MF+Tag
MF+Tag
MF+Tag
MF+Category
MF+Category
MF+Category
MF+Description
MF+Description
MF+Description
MFW
MF+Mashup
MF+Mashup
MF+Mashup
MF+Developer
MF+Developer
MF+Developer
MF+Provider
MF+Provider
MF+Provider
MF 0.15
MF 0.155
0.16 Precision
0.165
0.17
0.25
MF 0.26
0.27 Recall
0.28
0.29
0.18
0.19
0.2 F-score
0.21
0.22
Fig. 4. Performance Comparison of each Aspect in MFS Method
In particular, we plot the recommendation performance for each of the six aspects of APIs in the MFS method as well as other competitive methods. The result is shown in Figure 4 with the number of recommended APIs K set to 20. Some observations can be drawn from the result: First, compared to the basic MF method, the performance of incorporating each aspect of API all shows an increase with varying degrees. This shows that each aspect of API proposed in the paper is helpful for user preference modeling and API recommendation. In particular, different aspect of API has a different impact on users following pattern. The API tags reports the best performance among all the six aspects. This indicates that tags are the most informative aspect for understanding users’ preferences for APIs. The reason could be that users tend to follow APIs that share many common tags together. And the API provider is the least informative information for characterizing users’ preferences. It is interesting that mashups factors have a positive impact on the recommendation results, which shows that users not only care about APIs that are similar with each other, but also APIs that are complementary to each other. Second, unifying each aspects of APIs together for recommendation (MFU, MFW, MFPW) performs better than only one single aspect is utilized for recommendation by a relatively large margin. This indicates that the idea of combining different aspects of APIs together is helpful in API recommendation. Finally, we evaluate the performance of the proposed weight learning approach. Each triplet is generated as follows: For an API followed by each user, we randomly draw P APIs that are not followed by the user. Thus P triplets are sampled for a user and an API that followed by the user. The number of triplets drawn by this method is mnu P , where nu is the number of APIs followed by a user u. While P and nu is much smaller than the number of APIs n, the time complexity of the algorithm is greatly reduced and weights can be learned more efficiently. We evaluate the effect of the number of sampled triplets represented by P on the weight learning process. Figure 5 (a) shows the training error measured by the loss function L over the number of triplets processed as the weight learning algorithm progresses. We can see that the loss function gradually converges as more training triplets are fed into the algorithm, and with more training triplets available, the training error also becomes smaller. Figure 5 (b) demonstrates the relationship between the number of triplets and the performance of weight learn-
14 0.216
K=10 K=20
0.215 0.214 0.213
F-score
0.212 0.211 0.21 0.209 0.208 0.207 0.206 1
5
10
50
100
500
P
(a)
(b)
Fig. 5. Performance Evaluation with Number of Triplets
ing algorithm measured by F-score under different values of K. We can see that with the number of triplets becomes larger, the recommendation performance experiences a slight increase. The performance reaches to stable state when the number of triplets becomes even bigger. This means that the sampling of triplets is effective in the weight learning process. There is no need to generate all triplets from the training data, only a small number of sampled triplets is enough. The number of sampled triplets should be set to an appropriate number as insufficient data could hurt the performance and too many triplets could render the recommendation process inefficient. We set P = 100 in our experiment.
5
Conclusion
In this paper, we propose a comprehensive approach for recommending APIs to users. We utilize the heterogeneous content information of APIs and explore the impact of different aspects of APIs on the preferences of users. We identify several aspects of APIs, including API’s provider, category, developer, tag, textual descriptions and mashups composing them. Then users’ preferences are propagated through different functional aspects of API and matrix factorization is employed for each propagated matrix to learn user and API latent factors under each aspect. To achieve personalized recommendation, latent factors of six functional aspects are combined with different weights for each user. The experimental results demonstrate that our approach outperforms other baseline approaches and proves its effectiveness for personalized API recommendation. In future work, we will investigate more aspects of APIs that may have an influence on users’ preferences, such as API’s social information like twitter homepage. Moreover, we conjecture that it is possible that user preference may be influenced by non-functional aspects of APIs and we will explore this idea further in order to produce more satisfactory recommendation results.
15
References 1. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent dirichlet allocation. Journal of Machine Learning Research 3, 993–1022 (2003) 2. Chen, L., Wang, Y., Yu, Q., Zheng, Z., Wu, J.: WT-LDA: user tagging augmented LDA for web service clustering. In: Service-Oriented Computing - 11th International Conference, ICSOC 2013, Berlin, Germany, December 2-5, 2013, Proceedings. pp. 162–176 (2013) 3. Elgazzar, K., Hassan, A.E., Martin, P.: Clustering WSDL documents to bootstrap the discovery of web services. In: IEEE International Conference on Web Services, ICWS 2010, Miami, Florida, USA, July 5-10, 2010. pp. 147–154 (2010) 4. Jain, A., Liu, X., Yu, Q.: Aggregating functionality, use history, and popularity of apis to recommend mashup creation. In: Service-Oriented Computing - 13th International Conference, ICSOC 2015, Goa, India, November 16-19, 2015, Proceedings. pp. 188–202 (2015) 5. Jiang, Y., Liu, J., Tang, M., Liu, X.F.: An effective web service recommendation method based on personalized collaborative filtering. In: IEEE International Conference on Web Services, ICWS 2011, Washington, DC, USA, July 4-9, 2011. pp. 211–218 (2011) 6. Li, C., Zhang, R., Huai, J., Sun, H.: A novel approach for API recommendation in mashup development. In: 2014 IEEE International Conference on Web Services, ICWS, 2014, Anchorage, AK, USA, June 27 - July 2, 2014. pp. 289–296 (2014) 7. Liang, T., Chen, L., Wu, J., Bouguettaya, A.: Exploiting heterogeneous information for tag recommendation in api management. In: 2016 IEEE 23rd International Conference on Web Services, San Francisco, USA, June 27 - July 2, 2016 (2016) 8. Liu, X., Fulia, I.: Incorporating user, topic, and service related latent factors into web service recommendation. In: 2015 IEEE International Conference on Web Services, ICWS 2015, New York, NY, USA, June 27 - July 2, 2015. pp. 185–192 (2015) 9. Xia, B., Fan, Y., Tan, W., Huang, K., Zhang, J., Wu, C.: Category-aware API clustering and distributed recommendation for automatic mashup creation. IEEE Trans. Services Computing 8(5), 674–687 (2015) 10. Xu, W., Cao, J., Hu, L., Wang, J., Li, M.: A social-aware service recommendation approach for mashup creation. In: 2013 IEEE 20th International Conference on Web Services, Santa Clara, CA, USA, June 28 - July 3, 2013. pp. 107–114 (2013) 11. Yao, L., Sheng, Q.Z., Segev, A., Yu, J.: Recommending web services via combining collaborative filtering with content-based features. In: 2013 IEEE 20th International Conference on Web Services, Santa Clara, CA, USA, June 28 - July 3, 2013. pp. 42–49 (2013) 12. Zheng, Z., Ma, H., Lyu, M.R., King, I.: Wsrec: A collaborative filtering based web service recommender system. In: IEEE International Conference on Web Services, ICWS 2009, Los Angeles, CA, USA, 6-10 July 2009. pp. 437–444 (2009) 13. Zheng, Z., Ma, H., Lyu, M.R., King, I.: Collaborative web service qos prediction via neighborhood integrated matrix factorization. IEEE Trans. Services Computing 6(3), 289–299 (2013)