This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TPDS.2015.2482980, IEEE Transactions on Parallel and Distributed Systems IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
Cost Performance Driven Service Mashup: A Developer Perspective

Shuiguang Deng, Member, IEEE, Hongyue Wu, Javid Taheri, Albert Y. Zomaya, Fellow, IEEE, and Zhaohui Wu, Senior Member, IEEE

Abstract—Service mashups are applications created by combining single-functional services (or APIs) dispersed over the web. With the development of cloud computing and web technologies, service mashups are becoming increasingly widespread, and a large number of mashup platforms have emerged. However, due to the proliferation of services on the web, selecting component services to create mashups has become a challenging issue. Most developers pay particular attention to the QoS (quality of service) and cost of services. Besides service selection, mashup deployment is another pivotal process, as the platform can significantly affect the quality of mashups. In this paper, we focus on creating service mashups from the perspective of developers. A genetic algorithm-based method, GA4MC (genetic algorithm for mashup creation), is proposed to select component services and deployment platforms in order to create service mashups with optimal cost performance. A series of experiments is conducted to evaluate the performance of GA4MC. The results show that GA4MC achieves mashups whose cost performance is extremely close to the optimum. Moreover, the execution time of GA4MC is low, and the algorithm scales well as the experimental scale increases.

Index Terms—Cost Performance, Mashup Deployment, Service Composition, Service Mashup, Service Selection
S. Deng, H. Wu, and Z. Wu are with the College of Computer Science and Technology, Zhejiang University. E-mail: [email protected], [email protected], [email protected]
J. Taheri is with the Department of Computer Science, Karlstad University, Karlstad, Sweden. E-mail: [email protected]
A. Zomaya is with the School of Information Technologies, The University of Sydney. E-mail: [email protected]

1 INTRODUCTION
Over the last few years, distributed computing technologies have gained tremendous popularity in industry and academia. In this context, Web services have emerged as a major technology for automating interactions between distributed and heterogeneous applications and for connecting business processes, which may span company boundaries [1]. However, in many cases no single Web service can fully satisfy a complex request, and service mashups have been proposed to solve this problem. In enterprise application development, a service mashup is an application created by combining single-functional services (or APIs) dispersed over the web. By combining several services, mashups can provide multiple functionalities to satisfy complex user requests. Service mashups are now widely used, and a large number of mashup platforms have appeared, such as ProgrammableWeb, Yahoo Pipes, and IBM Mashup Center. For example, ProgrammableWeb provides various APIs with different functionalities; developers can search for and invoke these APIs and compose them into mashups according to their requirements, and many mashups with complex functionalities are deployed on it.

The mashup development process consists mainly of two phases: service selection and mashup deployment. As shown in Fig. 1, service providers deploy their services on different platforms; mashup developers search the platforms for services and invoke them to create mashups according to their service plans, which specify the required services and the composition structure; and the mashups themselves are also deployed on platforms. During mashup development, service selection searches the platforms for a concrete service for each task in the service plan. The selected services are then composed into a service mashup according to the structure of the service plan. Finally, the newly created mashup is deployed on a selected platform. When a mashup is invoked, the user visits the platform where the mashup is deployed, and the mashup accomplishes its functionality by invoking its component services one after another. Service selection and service mashup deployment are considered the most pivotal steps in service mashup construction, as they determine the quality of service mashups.
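Fig. 1. Service mashup construction in cloud environment. [Figure: service providers deploy services on platforms; a service plan with tasks A, B, C, and D between a beginning node b and an ending node e is turned into a service mashup by deciding how to select one service per task and where to deploy the mashup.]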
In the process of service mashup construction, most developers aim to achieve mashups with optimal QoS. QoS is usually employed to describe the nonfunctional characteristics of Web services. With the development of cloud computing and web technologies, services on the web have proliferated. For a given function, there may be many services with different QoS provided by different enterprises (providers). In this circumstance, users pay increasing attention to QoS, so mashup developers should try to create mashups with optimal QoS. Among all QoS criteria, response time is one of the attributes that attracts the most attention. In this paper, we consider only response time and regard it as the QoS of mashups. The response time of a service is the expected delay between the moment when a request is sent and the moment when the results are received. It is the sum of the input parameter transmission time, the service execution time, and the output parameter transmission time.

Moreover, running a service mashup is not free, because each invocation of a component service incurs a cost. Developers aim to make profits, so they pay close attention to the cost of services and try to decrease the overall cost in order to increase their revenue. The cost of a service is the fee that the requester has to pay for invoking the service, and the execution cost of a mashup is the sum of the execution costs of all its component services. In addition, we take service packages into consideration in this paper. A service package means that the prices of two or more services are discounted when they are sold together. Service packages are very common in real life. For example, on Priceline (http://www.priceline.com/), one can search for flights, cars, and hotels separately, or search for vacation packages in which flight, car, and hotel booking services are bundled together at a discounted price. As another example, if the database services and storage services of Aliyun (http://www.aliyun.com/) are invoked together, a discount is given. Service packages have a great effect on the overall cost of mashups, but most researchers ignore them.

1045-9219 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Both optimal QoS and minimal cost are essential objectives of mashup construction. However, there is often a tradeoff between these two objectives. An effective way to combine them is through cost performance, defined as the quotient of the overall QoS and the total cost of the mashup, i.e., the performance that can be bought per unit of price. We therefore adopt cost performance as the optimization goal and aim to maximize the cost performance of mashups.

Service mashups are also deployed on cloud platforms, and the location of a mashup may have a big effect on its QoS. If a component service is located on the same platform as the mashup, the response time of the mashup is reduced, because the data transmission time for that service is saved. In this paper, we focus on creating service mashups from the perspective of developers. Since service selection and service mashup deployment determine the QoS and cost of service mashups, our objective is to select component services and a deployment platform so as to create service mashups with maximum cost performance. To achieve maximum cost performance, both optimal QoS and minimal cost must be considered, which makes the problem complex. For QoS, we should consider both the service execution time and the parameter transmission time, and try to save parameter transmission time. For cost, because of service packages, the traditional greedy algorithm [2, 3] that selects the services with the lowest price is no longer suitable.

In this paper, we focus on cost performance driven service mashup development and deployment from the perspective of mashup developers and propose a new approach to service and deployment platform selection, with the following contributions:
(1) We propose a new problem of creating service mashups in terms of cost performance.
(2) We select services and platforms while accounting for service packages and parameter transmission time saving, which have a great effect on cost performance but are ignored by most existing service selection methods.
(3) Besides service selection, we analyze the significant effect of platform selection on the cost performance of service mashups and propose a method to select platforms for mashups.
(4) We adopt the genetic algorithm to solve the problem and present the detailed algorithm and analysis. A series of experiments is performed to show the effectiveness and efficiency of our method.

The rest of the paper is organized as follows. Section 2 presents a specific example to make the problem clearer. Section 3 discusses related work. Section 4 presents basic definitions and formalizes the problem. Section 5 describes the main operations and algorithms in detail. Section 6 presents the evaluation experiments and analyzes the results. Finally, Section 7 concludes the paper and discusses challenges for future work.
2 MOTIVATION SCENARIO
In this section, we introduce a specific real-life example to make the problem clearer and to illustrate the necessity and significance of our work. The given conditions are shown in Fig. 2 and Table 1. In Fig. 2, rectangles denote tasks and circles denote services. The service plan contains three tasks: task k1 has three candidate services, task k2 has two, and task k3 has four. These services are located on four platforms, and their details are shown in Table 1. The RT column shows the average response time; the ET column shows the average service execution time; the IT and OT columns show the average input and output parameter transmission times, respectively, so that RT is the sum of ET, IT, and OT. The DC column shows the default price of each service; the PS column lists the other service(s) with which the service is packaged; and the PC column shows the price of the service package. For example, the cost of s11 is 10 and the cost of s31 is 22, but if they are invoked together, the total cost is discounted to only 24 instead of 32. That is, if packaged services are invoked in the same mashup, a discount on the execution cost is given. "n.a." denotes that the Web service has not been packaged. These data are set according to [2].
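As a quick sanity check of the figures quoted from Table 1, the time columns can be encoded and the relation RT = IT + ET + OT verified for every service, together with the saving of the s11/s31 package. The dictionary and function names below are hypothetical and exist only for this illustration; the values are copied from Table 1.

```python
# Table 1 time columns: service -> (IT, ET, OT, RT), in the table's own units.
TABLE1_TIMES = {
    "s11": (137, 43, 112, 292),
    "s12": (145, 41, 124, 310),
    "s14": (142, 40, 127, 309),
    "s22": (98, 52, 124, 274),
    "s23": (111, 54, 141, 306),
    "s31": (153, 37, 152, 342),
    "s32": (147, 47, 150, 344),
    "s33": (140, 53, 142, 335),
    "s34": (150, 45, 145, 340),
}

# Default prices (DC) of s11 and s31, and the price (PC) of their package.
DEFAULT_PRICE = {"s11": 10, "s31": 22}
PACKAGE_PRICE_S11_S31 = 24


def rt_matches_components(it: int, et: int, ot: int, rt: int) -> bool:
    """RT should equal IT + ET + OT for every row of Table 1."""
    return it + et + ot == rt


if __name__ == "__main__":
    for name, row in TABLE1_TIMES.items():
        print(name, "consistent" if rt_matches_components(*row) else "MISMATCH")
    separate = DEFAULT_PRICE["s11"] + DEFAULT_PRICE["s31"]
    print(f"s11 + s31: {separate} separately, {PACKAGE_PRICE_S11_S31} as a package")
```

Every row satisfies the relation, and the package saves 8 of the 32 units the two services would cost separately.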
Fig. 2. Service selection example. [Figure: sequential service plan b → k1 → k2 → k3 → e; candidate services s11, s12, s14 for k1; s22, s23 for k2; s31, s32, s33, s34 for k3.]
TABLE 1
DETAILED INFORMATION OF THE SERVICES
(Response Time columns: RT, IT, ET, OT; Execution Cost columns: DC, PS, PC)

| WS  | Platform  | RT  | IT  | ET | OT  | DC | PS       | PC   |
|-----|-----------|-----|-----|----|-----|----|----------|------|
| s11 | Oracle    | 292 | 137 | 43 | 112 | 10 | s31      | 24   |
| s12 | Amazon    | 310 | 145 | 41 | 124 | 8  | s22, s32 | 46   |
| s14 | Microsoft | 309 | 142 | 40 | 127 | 7  | n.a.     | n.a. |
| s22 | Amazon    | 274 | 98  | 52 | 124 | 17 | s12, s32 | 46   |
| s23 | IBM       | 306 | 111 | 54 | 141 | 15 | n.a.     | n.a. |
| s31 | Oracle    | 342 | 153 | 37 | 152 | 22 | s11      | 24   |
| s32 | Amazon    | 344 | 147 | 47 | 150 | 24 | s12, s22 | 46   |
| s33 | IBM       | 335 | 140 | 53 | 142 | 26 | n.a.     | n.a. |
| s34 | Microsoft | 340 | 150 | 45 | 145 | 23 | n.a.     | n.a. |

Given the service plan, the candidate services for each task, the detailed information of each service, and the platforms, the objective is to select one service from the candidates of each task and one platform for the new mashup so as to maximize the cost performance of the generated mashup. Specifically, we should select one service from s11, s12, and s14 to fulfill task k1, one from s22 and s23 to fulfill task k2, and one from s31, s32, s33, and s34 to fulfill task k3, compose them sequentially, and deploy the resulting service mashup on a platform selected from Oracle, Amazon, IBM, and Microsoft. Traditional methods usually adopt greedy-like strategies for service selection [3, 4]. If a mashup with minimum cost is expected, they select s14 for task k1, s23 for task k2, and s31 for task k3, because these services have the lowest default prices for their tasks. However, because of service packages, the combination of s11, s23, and s31 (24 + 15 = 39) is cheaper than that of s14, s23, and s31 (7 + 15 + 22 = 44). If a mashup with the shortest execution time is expected, they select s11, s22, and s33 and deploy the new mashup on IBM, since the parameter transmission time of s33 inside IBM can be safely omitted. In that case, the total response time of the mashup is 292 + 274 + 53 = 619. However, if we select s12, s22, and s32 and deploy the new mashup on Amazon, the response time becomes only 41 + 52 + 47 = 140, because the parameter transmission times of all three services are saved. Therefore, traditional greedy-like service selection methods are no longer suitable, owing to the influence of service packages and deployment platforms.

From the example, we can conclude that it is difficult to select services that maximize cost performance. To obtain the optimal mashup, we should select the most appropriate service for each task. However, because of service packages, the costs of some services are correlated with those of other services. Moreover, the response times of services depend on the platform where the mashup will be deployed, while the platform should in turn be selected according to the chosen services. It is therefore difficult to decide which services are most appropriate. In real-world cases the problem may be even harder, since the service plan may be more complex, there may be many candidate services for each task, and numerous packages may exist between services.

3 RELATED WORK
In recent years, service mashups have attracted intensive attention from both industry and academia, and much research on service mashups has been conducted from different perspectives. Many works focus on mashup composition patterns, as the composition model determines how components are orchestrated and integrated. Some researchers present a spreadsheet-based service mashup development framework that enables users to develop mashups in the popular spreadsheet environment [5]. Wang et al. [6] present an end-user-oriented programming environment called Mashroom, which takes the nested table as its data structure and formally defines a set of visual mashup operators to offer a spreadsheet-like programming experience. Maximilien et al. [7] propose a domain-specific language that unifies the most common service models, including data, services, and APIs, and facilitates service composition and integration into Web applications. Hristoskova et al. [8] present the WTE+ framework for automated construction and runtime adaptation of service mashups, including the design and validation of a management framework that supports automatic composition, execution, and dynamic adaptation of custom-made service mashups. Bertoli et al. [9] describe a novel planning framework for the automated composition of Web services, which can handle services specified in industrial standard languages for business process modeling and execution.

Several studies address privacy preservation in service mashups. Arafati et al. [10] provide a cloud-based framework for privacy-preserving mashups, which enables secure collaboration between providers to generate an anonymous dataset that supports data mining. Barhamgi et al. [11] propose a privacy-preserving approach for mashing up web services: they arrange services by defining a dependency graph and then insert privacy filters to generate the mashup data. Friedman et al. [12] propose an interactive algorithm for building a decision tree that satisfies ε-differential privacy. Mrissa [13] proposes a formal privacy model that extends DaaS descriptions with privacy capabilities; this model allows a service to define a privacy policy and a set of privacy requirements.

A large number of studies address the problem of mashup service selection. These approaches are mainly based on QoS: they use QoS to evaluate the non-functional properties of services and aim to select candidate services so that the mashup has optimal composite QoS [14, 15]. Various QoS-aware service selection approaches have been proposed for different problems. Some studies focus on computing the QoS attributes of mashups [16, 17]; in these methods, QoS attributes are divided into several categories, each with its own aggregation rules with respect to the structure of mashups. Many QoS-aware service selection studies account for several QoS attributes and describe QoS-aware service selection as a multidimensional, multiobjective, and multi-choice knapsack problem (MMMKP) [18]. Many researchers adopt QoS constraints to restrict service mashups during service selection [19, 20]. Some studies seek service mashups that not only satisfy QoS constraints but also optimize certain QoS criteria [16, 21, 22]. Moreover, some studies exploit decomposition-based approaches to service selection [22]: they compute the utility of a mashup from the utilities of its component services and derive the constraints on component services from the constraints of the target service mashup. In addition, we have previously addressed service selection in mobile environments, considering different optimization objectives including response time [23, 24], energy consumption [24, 25], and risk reduction [26].

Several studies consider correlations between services. Yu et al. [27] propose a backwards composition-context-based service selection approach for service composition. They consider several context factors, including cost policy and composition time, during service selection. However, they do not present how these factors are computed, and their aim is to obtain the composite service that best suits these context factors.
Several studies have addressed the optimal service selection problem with correlations taken into consideration [28, 29]. They consider the effect of service packages on cost and regard packages as correlations, and their objective is to obtain composite services with optimal QoS. Three major aspects distinguish our work from these studies. First, the cost computation differs: we regard the cost of a service package as a whole, whereas these works calculate the cost by discounting the cost of later services. Second, the objectives of these studies are to obtain the composite service with the best QoS [28, 29] or the best suitability regarding the context factors [27], whereas we aim to maximize the cost performance of the mashup from an economic perspective. Third, these studies focus only on service selection, whereas we consider both service selection and mashup deployment, mainly because the deployment platform can significantly affect the QoS of mashups.

All the studies mentioned above take the perspective of mashup users rather than developers; therefore, there are some differences between these studies and our work. 1) These studies pay attention only to the QoS of mashups and do not take the revenue of developers into consideration, while we consider both the QoS and the cost of mashups to obtain mashups with the highest cost performance. 2) Most existing studies do not account for service packages and parameter transmission time saving, which may have a significant impact on the QoS and cost of service mashups. We take these factors into account to improve the QoS and reduce the cost of service mashups. 3) These methods do not consider mashup deployment, although the deployment of mashups has a great impact on their QoS.
4 PROBLEM DEFINITION
In this section, we present some basic definitions and formally model the cost performance driven service selection problem. The main variables and symbols used in this section are listed in Table 2.

TABLE 2
SYMBOL EXPLANATIONS

| Symbol | Explanation |
|---|---|
| s / S | Service / service set |
| p / P | Platform / platform set |
| $p_s$ | The platform of service s |
| $default_s$ | The default value of the price of service s |
| $price_{package}$ | The price of a package |
| $S_{package}$ | The set of services involved in a package |
| t | Response time |
| $t_{input}$ | Input parameter transmission time |
| $t_{execution}$ | Service execution time |
| $t_{output}$ | Output parameter transmission time |
| k / K | Task / task set |
| c | Cost |
| cp | Cost performance |
| Q | The function used to integrate QoS values |
| $p_m$ | The deployed platform of mashup m |
4.1 Web Service and Service Mashups
Definition 1 (Web Service). A Web service is a 4-tuple 𝑠 = (𝑖, 𝑓, 𝑝, 𝑄𝑜𝑆), where: (1) i is the unique identifier of the service; (2) f is the functional description of the service, including its input, output, pre-condition, and result; (3) p is the platform where the service is deployed; (4) QoS is a set of attributes, including price, response time, reliability, availability, reputation, etc.

In Definition 1, each service is identified by i; f describes the function of the service and is used to gather candidate services for the tasks in service plans; and QoS describes the non-functional qualities of the service. Platforms are the locations where services and service mashups are deployed and executed. During a service invocation, the input parameters are first transmitted to the platform, the service is then executed on the platform, and finally the execution result is returned.

Price is one of the most important QoS attributes. The price of a service is the fee that a service requester has to pay for invoking the service. In some conditions, the price of a service is not fixed; for example, it is reasonable that a discount is given if a developer invokes two services from the same platform. We define the price as follows.

Definition 2 (Service Price). The price of a service is a tuple 𝑝𝑟𝑖𝑐𝑒 = (𝑑𝑒𝑓𝑎𝑢𝑙𝑡, 𝑝𝑎𝑐𝑘𝑎𝑔𝑒), where: (1) default is the default value of the price of the service; (2) package is the service package in which the service is involved, with 𝑝𝑎𝑐𝑘𝑎𝑔𝑒 = (𝑝𝑟𝑖𝑐𝑒, 𝑆), where price denotes the price of the package and 𝑆 is the set of services involved in the package.

In this paper, we only consider scenarios in which a service is involved in at most one service package. If a service is not packaged with other services, its package is null. The package price applies if and only if all of the services involved in the package are invoked by the same mashup; otherwise, the default price of each service applies.

The response time is another important QoS attribute. Various methods have been proposed to forecast the response time of a certain service for a certain user [2, 30]. In this paper, we assume that the response times of all services are pre-calculated.

Definition 3 (Service Response Time). The response time of a service is the expected delay between the moment when a request is sent and the moment when the results are received. It is calculated by
$$t = t_{input} + t_{execution} + t_{output} \tag{1}$$
where $t_{input}$ is the input parameter transmission time; $t_{execution}$ is the service execution time, i.e., the time between the server receiving the input parameters and sending out the corresponding response; and $t_{output}$ is the output parameter transmission time.

It follows that if a service requester invokes a service on the same platform, the parameter transmission time can be saved, as parameter transmission inside a platform takes orders of magnitude less time than transmission across the network.
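Definitions 1 to 3 can be mirrored as small data classes. This is a minimal sketch, not the authors' data model: the class and field names are hypothetical, only the fields used later in this paper are kept (the functional description f and the other QoS attributes are omitted), and the "at most one package per service" assumption of Definition 2 is reflected by a single optional field.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Package:
    """Definition 2: a service package (price, S)."""
    price: float
    services: FrozenSet[str]


@dataclass(frozen=True)
class Service:
    """Definitions 1 to 3, keeping only the fields used in this paper."""
    ident: str                          # i: unique identifier
    platform: str                       # p: platform the service is deployed on
    default_price: float                # default value of the price
    t_input: float                      # input parameter transmission time
    t_execution: float                  # service execution time
    t_output: float                     # output parameter transmission time
    package: Optional[Package] = None   # at most one package per service

    @property
    def response_time(self) -> float:
        """Equation (1): t = t_input + t_execution + t_output."""
        return self.t_input + self.t_execution + self.t_output


# Usage with the data of s11 and s14 from Table 1.
s11 = Service("s11", "Oracle", 10, 137, 43, 112, Package(24, frozenset({"s11", "s31"})))
s14 = Service("s14", "Microsoft", 7, 142, 40, 127)
```

Here `s11.response_time` evaluates to 292 and `s14.package` is `None`, matching the "n.a." entries of Table 1.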
Definition 4 (Service Plan). A service plan is a tuple 𝑝𝑙𝑎𝑛 = (𝐾, 𝐵), where: (1) $K = \{k_i\}$ is a set of tasks, including two special tasks: a beginning task b and an ending task e; (2) B provides the structural information of the service plan, which can be specified by XML-based languages such as BPEL.

A service plan is an abstract description of a business process. Each service plan begins with task b and ends with task e. Each task in a service plan can be realized by invoking an individual service, and there may be multiple services from different platforms and with different QoS that can fulfill each task.

Definition 5 (Service Mashup). A service mashup is a 6-tuple 𝑚 = (𝑆, 𝐵, 𝑝, 𝑐, 𝑄𝑜𝑆, 𝑐𝑝), where: (1) S is the set of Web services composed in the service mashup; (2) B provides the structural information of the service mashup, which can be specified by XML-based languages; (3) p is the platform the service mashup is deployed on; (4) c is the cost of the service mashup; (5) QoS expresses the QoS of the service mashup; (6) cp is the cost performance of the mashup, calculated by $cp = \frac{Q(QoS)}{c}$, where Q is a function integrating the values of all QoS attributes into a real number, with a higher value indicating better performance and a lower value indicating worse performance.

Function Q is used to integrate the values of all QoS attributes. Various methods have been proposed to transform multiple objectives into a single objective [31-33], and these methods suit different scenarios; therefore, function Q should be defined according to the specific problem. Service mashups can also be viewed as services deployed on platforms, obtained by service composition. We regard the service mashup with maximum cost performance as the optimal service mashup. For simplicity, we only consider execution time and regard it as the QoS of a service mashup in this paper.

Definition 6 (Service Mashup Deployment). Given a service plan 𝑠𝑝 = (𝐾, 𝐵) with $K = \{k_i\}_{i=1}^{n}$, the selected service for each task in the plan $\{s_i\}_{i=1}^{n}$, and a set of platforms P, service mashup deployment is to (1) select a platform p from P; (2) compose $\{s_i\}_{i=1}^{n}$ according to B and deploy the composed service mashup on p.

Service mashup deployment is an important step in service mashup construction, as the selected platform can affect the QoS of the generated service mashup. Through service mashup deployment, the component services are composed and enveloped by XML-based languages.

4.2 Cost Performance Driven Service Mashup
In this paper, we only consider service mashup development for service plans with a sequence structure, and we try to develop service mashups with maximum cost performance. The service price has been defined in Definition 2: each service has a default price and may belong to a service package. The cost of a service mashup is the sum of the prices of all the services invoked by the mashup, as defined in Definition 7.

Definition 7 (Service Mashup Cost). Given a service mashup with component service set S, its cost is calculated by
$$c = c_{package} + c_{default} \tag{2}$$
where
$$c_{package} = \sum_{i} price_{i}, \quad i \in \{\, i \mid S_i \subseteq S \,\} \tag{3}$$
$$c_{default} = \sum_{j} default_{j}, \quad j \in \{\, j \mid \neg\exists\, package_i \ \text{s.t.}\ s_j \in S_i \wedge S_i \subseteq S \,\} \tag{4}$$
and $price_i$ and $S_i$ denote the price and the service set of package i.

As shown in Definition 7, the cost of a service mashup can be divided into two parts: the cost of service packages and the cost of individual services. If all services involved in a service package are invoked by the mashup, the package price is applied to these services (Equation 3), and the other services are charged their default prices (Equation 4). As Definition 2 indicates, because a service does not belong to more than one service package, each service in the mashup is counted either through the package in which it is involved or at its default price.
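Equations (2) to (4) translate almost directly into code. The sketch below assumes, as Definition 2 does, that packages are disjoint (each service belongs to at most one package); the names `DEFAULT`, `PACKAGES`, and `mashup_cost` are hypothetical, and the data are the prices of Table 1.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

# Default prices (DC) and packages (PC, member set) from Table 1.
DEFAULT: Dict[str, int] = {"s11": 10, "s12": 8, "s14": 7, "s22": 17, "s23": 15,
                           "s31": 22, "s32": 24, "s33": 26, "s34": 23}
PACKAGES: List[Tuple[int, FrozenSet[str]]] = [
    (24, frozenset({"s11", "s31"})),
    (46, frozenset({"s12", "s22", "s32"})),
]


def mashup_cost(selected: Iterable[str],
                default: Dict[str, int] = DEFAULT,
                packages: List[Tuple[int, FrozenSet[str]]] = PACKAGES) -> int:
    """Equation (2): c = c_package + c_default, assuming disjoint packages."""
    s = set(selected)
    # Equation (3): a package applies only if its whole member set S_i is in S.
    applied = [(price, members) for price, members in packages if members <= s]
    c_package = sum(price for price, _ in applied)
    covered = set().union(*(members for _, members in applied))
    # Equation (4): every service not covered by an applied package pays its default.
    c_default = sum(default[j] for j in s - covered)
    return c_package + c_default
```

For the motivating example, `mashup_cost(["s11", "s23", "s31"])` gives 39, while the greedy choice `["s14", "s23", "s31"]` gives 44, and the all-Amazon choice `["s12", "s22", "s32"]` gives 46. If packages could overlap, a service could be covered twice and the rule would need a more careful (e.g., optimization-based) assignment.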
The response time of a mashup includes the data transmission time and the mashup execution time. Because users are in different places and invoke mashups at different times, we can only reduce the response time of mashups by minimizing their execution time.

Definition 8 (Service Mashup Execution Time). The execution time of a service mashup is the expected delay between the moment when a request is received and the moment when the results are received. Given a service mashup m with component service set $\{s_i\}_{i=1}^{n}$, its execution time is calculated by $\sum_{i=1}^{n} t'_{s_i}$, where
$$t'_{s_i} = \begin{cases} t_{execution,s_i}, & \text{if } p_{s_i} = p_m \\ t_{input,s_i} + t_{execution,s_i} + t_{output,s_i}, & \text{else} \end{cases} \tag{5}$$
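Equation (5) can be sketched as follows for a sequential plan. The mapping `SERVICES` and the function names are hypothetical; the platforms and time components are taken from Table 1, and the summation over component services assumes the sequence structure adopted in this paper.

```python
from typing import Dict, Iterable, Tuple

# service -> (platform, t_input, t_execution, t_output), from Table 1.
SERVICES: Dict[str, Tuple[str, int, int, int]] = {
    "s11": ("Oracle", 137, 43, 112),
    "s12": ("Amazon", 145, 41, 124),
    "s14": ("Microsoft", 142, 40, 127),
    "s22": ("Amazon", 98, 52, 124),
    "s23": ("IBM", 111, 54, 141),
    "s31": ("Oracle", 153, 37, 152),
    "s32": ("Amazon", 147, 47, 150),
    "s33": ("IBM", 140, 53, 142),
    "s34": ("Microsoft", 150, 45, 145),
}


def time_spent(service: str, mashup_platform: str) -> int:
    """Equation (5): only t_execution counts when p_s equals p_m."""
    platform, t_in, t_exec, t_out = SERVICES[service]
    if platform == mashup_platform:
        return t_exec
    return t_in + t_exec + t_out


def execution_time(selected: Iterable[str], mashup_platform: str) -> int:
    """Sequential plan: the mashup execution time is the sum over its services."""
    return sum(time_spent(s, mashup_platform) for s in selected)
```

This reproduces the numbers of Section 2: `execution_time(["s11", "s22", "s33"], "IBM")` is 619 and `execution_time(["s12", "s22", "s32"], "Amazon")` is 140.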
As defined in Definition 3, the time spent on each service includes its input parameter transmission time, service execution time, and output parameter transmission time. However, if a service and the service mashup are on the same platform, the parameter transmission time is saved, and only the service execution time is added to the execution time of the service mashup. Therefore, the execution time of a service mashup is significantly affected by its location.

Service mashup development is to select a service for each task in the service plan so as to obtain a service mashup with maximum cost performance. We formalize this problem as follows.

Definition 9 (Cost Performance Driven Service Mashup). Given a service plan with n tasks, its corresponding candidate service sets $\{S_i\}_{i=1}^{n}$, and the available platform set P, cost performance driven service mashup construction is to (1) select one service for each task and (2) select one platform for the service mashup so as to maximize the cost performance of the generated service mashup.

Definition 9 gives the objective of service mashup development. The cost performance reflects the quality per unit cost of the generated service mashup.
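For tiny instances such as the motivating example, Definition 9 can be solved by exhaustive enumeration, which is useful as a reference point for heuristic methods. This sketch is not the paper's method: it assumes a hypothetical Q(t) = 1/t (so a shorter execution time scores higher, as Definition 5 requires), and all names are illustrative. The data come from Table 1.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Tuple

# service -> (platform, response time RT, execution time ET, default price DC).
SERVICES: Dict[str, Tuple[str, int, int, int]] = {
    "s11": ("Oracle", 292, 43, 10), "s12": ("Amazon", 310, 41, 8),
    "s14": ("Microsoft", 309, 40, 7), "s22": ("Amazon", 274, 52, 17),
    "s23": ("IBM", 306, 54, 15), "s31": ("Oracle", 342, 37, 22),
    "s32": ("Amazon", 344, 47, 24), "s33": ("IBM", 335, 53, 26),
    "s34": ("Microsoft", 340, 45, 23),
}
PACKAGES: List[Tuple[int, FrozenSet[str]]] = [
    (24, frozenset({"s11", "s31"})), (46, frozenset({"s12", "s22", "s32"})),
]
CANDIDATES = [["s11", "s12", "s14"], ["s22", "s23"], ["s31", "s32", "s33", "s34"]]
PLATFORMS = ["Oracle", "Amazon", "IBM", "Microsoft"]


def cost(selected: Tuple[str, ...]) -> int:
    """Equations (2) to (4), assuming disjoint packages."""
    chosen = set(selected)
    applied = [(price, members) for price, members in PACKAGES if members <= chosen]
    covered = set().union(*(members for _, members in applied))
    return sum(p for p, _ in applied) + sum(SERVICES[s][3] for s in chosen - covered)


def exec_time(selected: Tuple[str, ...], platform: str) -> int:
    """Equation (5); RT already equals IT + ET + OT, so the else-branch is RT."""
    return sum(SERVICES[s][2] if SERVICES[s][0] == platform else SERVICES[s][1]
               for s in selected)


def cost_performance(selected: Tuple[str, ...], platform: str) -> float:
    """cp = Q(t) / c with the hypothetical choice Q(t) = 1 / t."""
    return (1.0 / exec_time(selected, platform)) / cost(selected)


def brute_force() -> Tuple[Tuple[str, ...], str, float]:
    """Enumerate every (service combination, platform) pair; only viable for tiny plans."""
    best = max(((combo, p) for combo in product(*CANDIDATES) for p in PLATFORMS),
               key=lambda pair: cost_performance(*pair))
    return best[0], best[1], cost_performance(*best)
```

Under this choice of Q, the search over all 3 × 2 × 4 × 4 = 96 (combination, platform) pairs returns s12, s22, and s32 deployed on Amazon (execution time 140, cost 46), which is neither the cheapest nor the greedy-fastest selection discussed in Section 2. The enumeration grows as the product of the candidate-set sizes times the number of platforms, which is why it does not scale.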
5 GA4MC APPROACH
In this section, we introduce the details of our approach, which adopts the well-known genetic algorithm for service and platform selection. We first present how the genetic algorithm is applied to service selection, then introduce the platform selection approach, and finally present the detailed algorithm.
5.1 Transformation into an Optimization Problem
According to Definition 9, cost performance driven service mashup construction selects candidate services and a platform for a mashup. It can be modeled as the following optimization problem:

$$\max\; cp(\theta, p) \quad \text{s.t.} \quad \theta_i \in [1, N_i],\ \theta_i \in \mathbb{Z}\ (i = 1, \dots, n); \quad p \in [1, m],\ p \in \mathbb{Z} \tag{6}$$

where cp(θ, p) is the cost performance of the service mashup generated according to θ and p; θ is an n-dimensional vector representing the selected services, where θ_i indicates which candidate service is selected for the i-th task; N_i is the number of candidate services for the i-th task; p is an integer representing the platform on which the mashup is deployed; and m is the number of candidate platforms. The constraints define the feasible solution set, which contains m·∏_{i=1}^{n} N_i solutions and therefore grows exponentially with n. The optimal solution (θ̂, p̂) must satisfy: (1) (θ̂, p̂) belongs to the feasible solution set; (2) for every (θ, p) in the feasible solution set, cp(θ, p) ≤ cp(θ̂, p̂).

This problem is therefore an integer programming problem, and it is NP-hard; in general, no known algorithm solves it exactly in polynomial time. We therefore propose a genetic algorithm-based method, because it can provide a near-optimal solution in polynomial time.

5.2 Genetic Algorithm Introduction
The genetic algorithm [34] is one of the most widely used heuristic algorithms. It is inspired by natural evolution, in which superior individuals that fit the environment survive and inferior individuals are eliminated. The genetic algorithm encodes a specific problem in a chromosome-like structure and simulates the evolution process, using a fitness function to judge how well a chromosome adapts to the environment. In the genetic algorithm, feasible solutions are modeled by chromosomes, and genes are the independent units of chromosomes. New individuals are generated through crossover and mutation. Crossover is an operation that recombines parent chromosomes to generate new child chromosomes, and it plays a central role in the genetic algorithm: the newly generated chromosomes may have higher fitness than their parents. The objective of crossover is to generate additional chromosomes that improve the best fitness found so far and raise the chance of obtaining chromosomes with higher fitness in the next generation. Mutation is an operator that changes chromosomes slightly in a random way. Like crossover, it aims to generate chromosomes with higher fitness; in addition, it helps avoid premature convergence. Which chromosomes are retained or removed is determined by selection, a process that retains the superior chromosomes and weeds out the inferior ones. The objective of selection is to pass the superior chromosomes to the next generation, both directly and indirectly via the mutation and crossover operations.

5.3 Service Selection Based on Genetic Algorithm
Our objective is to select component services to achieve a mashup with optimal cost performance, and this problem can be naturally transformed into a genetic algorithm problem. In the following, we illustrate how the genetic algorithm is
applied to solve our service selection problem.
1) Encoding
Table 3 presents the correspondence between the terms of the genetic algorithm and those of our service selection problem. In the genetic algorithm, feasible solutions are modeled by chromosomes; thus, chromosomes correspond to mashups. Genes are the independent units of chromosomes, so they correspond to component services. The locus of a gene in a chromosome corresponds to a task in the service plan. Fitness is used to judge chromosomes just as cost performance is adopted to evaluate mashups; therefore, fitness corresponds to cost performance. A chromosome with high fitness implies that the corresponding mashup has high cost performance. Fig. 3 shows an example of genetic encoding.

TABLE 3 TERM MATCHING BETWEEN THE GENETIC ALGORITHM AND SERVICE SELECTION
Genetic Algorithm    Service Selection
Chromosome           Service mashup
Gene                 Service
Locus                Task
Fitness              Cost performance

[Fig. 3: service plan b → t1 → t2 → … → tn → e, with candidate services s11, s12, … for t1, s21, s22, … for t2, …, and sn1, sn2, …, snm for tn; the selection is encoded as the chromosome (2, 1, …, 7), where gene i is the index of the candidate selected for task ti.]
Fig. 3. Genetic encoding scheme
2) Initialization
At the beginning of the initialization phase, we initialize the parameters of the genetic algorithm: the population size, namely the number of initial chromosomes cq, the maximum number of iterations it, and the numbers of crossovers ct and mutations mt per iteration. These parameters influence the result and the efficiency of the algorithm, and they should be set according to the scale of the selection problem. Then, the initial population is generated randomly: for each chromosome in the population, s = (s_1, s_2, ⋯, s_n) is generated randomly. From the service selection perspective, n corresponds to the number of tasks in the service plan, and s_i is an integer indicating which candidate service is selected for the i-th task.
3) Selection
Chromosomes are evaluated by fitness, which describes how well an individual fits the environment. In our model, a chromosome with high fitness implies that the corresponding service mashup has high cost performance. Fitness can be calculated according to Definitions 5, 7 and 8. We adopt the well-known roulette wheel method to perform selection. The probability of a chromosome s_j with fitness f_j being selected out of the population is calculated by

$$P(s_j) = \frac{f_j}{\sum_{k} f_k} \tag{7}$$

where the sum runs over all chromosomes in the population. Equation (7) guarantees that chromosomes with higher fitness values have a higher probability of being selected.
4) Crossover
In this paper, we adopt the standard single-point crossover operator. In the crossover process, a point in the chromosome is first chosen randomly; then, the genes of the two parent chromosomes before the point are kept unchanged and the genes after the point are interchanged to generate two new child chromosomes, as shown in Fig. 4.

[Fig. 4: parents p1 = (3, 4, 5, 3, 1) and p2 = (2, 7, 6, 2, 3) exchange the genes after the second locus, giving children c1 = (3, 4, 6, 2, 3) and c2 = (2, 7, 5, 3, 1).]
Fig. 4. Crossover example
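Initialization, roulette wheel selection (Eq. (7)) and single-point crossover can be sketched as below, reproducing the Fig. 4 example. This is an illustrative sketch under our own assumptions: genes are 1-based candidate indices as in the text, `random.Random.choices` serves as the roulette wheel (sampling with replacement), and the function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Chromosome = List[int]  # gene i = index (1..N_i) of the candidate chosen for task i


def random_chromosome(candidate_counts: Sequence[int], rng: random.Random) -> Chromosome:
    """Initialization: choose one candidate uniformly at random for every task."""
    return [rng.randint(1, n_i) for n_i in candidate_counts]


def roulette_select(population: Sequence[Chromosome], fitness: Sequence[float],
                    k: int, rng: random.Random) -> List[Chromosome]:
    """Eq. (7): draw k chromosomes, each with probability f_j / sum_k f_k.

    Assumes non-negative fitness values with a positive sum.
    """
    return rng.choices(list(population), weights=list(fitness), k=k)


def single_point_crossover(p1: Chromosome, p2: Chromosome,
                           point: int) -> Tuple[Chromosome, Chromosome]:
    """Keep the genes before `point` and swap the genes after it."""
    return p1[:point] + p2[point:], p2[:point] + p1[point:]


def random_crossover(p1: Chromosome, p2: Chromosome,
                     rng: random.Random) -> Tuple[Chromosome, Chromosome]:
    """Crossover at a random interior point (chromosomes need at least 2 genes)."""
    return single_point_crossover(p1, p2, rng.randint(1, len(p1) - 1))


# Fig. 4: the parents exchange the genes after the second locus.
c1, c2 = single_point_crossover([3, 4, 5, 3, 1], [2, 7, 6, 2, 3], point=2)
print(c1, c2)  # [3, 4, 6, 2, 3] [2, 7, 5, 3, 1]
```

A chromosome with zero fitness is never drawn by the wheel, which matches the proportional rule of Eq. (7).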
From the service selection perspective, crossover recombines two mashups by interchanging corresponding services, generating two new feasible mashups that may have higher cost performance than their parents. Crossover therefore provides additional candidate mashups, some of which may be superior, to select from.
5) Mutation
The mutation process randomly chooses a gene from a chromosome and randomly changes it to another feasible gene, thus generating a new chromosome. Fig. 5 shows an example of mutation, where two new chromosomes are generated.

[Fig. 5: a parent chromosome p and two mutated children c1 and c2.]
Fig. 5. Mutation example
From the service selection perspective, mutation randomly chooses a component service in the mashup and replaces it with another service randomly chosen from the same candidate service set. Through mutation, additional candidate mashups, some of which may be superior, are generated for selection.
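The mutation operator can be sketched in the same style. In this hypothetical helper (not the authors' code), genes are 1-based candidate indices, the replacement is drawn uniformly from the other candidates of the same task, and tasks with a single candidate are skipped because no other feasible gene exists for them.

```python
import random
from typing import List, Sequence


def mutate(chrom: Sequence[int], candidate_counts: Sequence[int],
           rng: random.Random) -> List[int]:
    """Return a mutated copy of `chrom`: one randomly chosen gene is replaced by a
    different candidate (1..N_i) from the same task's candidate set."""
    child = list(chrom)
    mutable = [i for i, n_i in enumerate(candidate_counts) if n_i > 1]
    if not mutable:
        return child  # every task has one candidate: no other feasible gene
    i = rng.choice(mutable)
    alternatives = [g for g in range(1, candidate_counts[i] + 1) if g != child[i]]
    child[i] = rng.choice(alternatives)
    return child


rng = random.Random(7)
parent = [3, 4, 5, 3, 1]
child = mutate(parent, [5, 5, 5, 5, 5], rng)
print(parent, child)  # the two differ at exactly one locus
```

The parent is left untouched, so the new chromosome can simply be added to the population as Table 4 prescribes.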
5.4 Platform Selection
Platform selection is an important step in the construction of service mashups, because it affects the QoS and the cost performance of the service mashups. If a service mashup is deployed on the same platform as one of its component services, the parameter transmission time of this service is saved. Therefore, when selecting platforms, we should save as much parameter transmission time as possible to improve the QoS of the mashup. Given the component services of a mashup {s_1, s_2, ⋯, s_n} and their platforms {p_1, p_2, ⋯, p_m} (m ≤ n), the transmitted data size of each platform can be calculated by

$$D(p_i) = \sum_{s_j \in S(p_i)} \left( s_j^{in} + s_j^{out} \right) \tag{8}$$

where S(p_i) is the set of component services located on platform p_i, s_j^{in} represents the data size of the input parameters of service s_j and s_j^{out} represents the data size of its output parameters. Based on Equation (8), we select the platform with the maximum transmitted data size for the service mashup to be deployed on.
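Equation (8) and the resulting deployment rule can be sketched as follows. The tuple layout `(platform, input size, output size)` and the function names are illustrative assumptions; ties are broken in favor of the first platform encountered, a choice the text does not specify.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence, Tuple

# (platform of s_j, input data size of s_j, output data size of s_j)
ComponentData = Tuple[Hashable, float, float]


def transmitted_data(services: Sequence[ComponentData]) -> Dict[Hashable, float]:
    """Eq. (8): D(p) = sum over component services hosted on p of (in + out)."""
    d: Dict[Hashable, float] = defaultdict(float)
    for platform, size_in, size_out in services:
        d[platform] += size_in + size_out
    return dict(d)


def select_platform(services: Sequence[ComponentData]) -> Hashable:
    """Deploy the mashup on the platform with the largest D(p)."""
    d = transmitted_data(services)
    return max(d, key=d.get)  # max() keeps the first platform among ties


components = [("A", 2, 3), ("B", 4, 4), ("A", 1, 1)]
print(transmitted_data(components))  # {'A': 7.0, 'B': 8.0}
print(select_platform(components))   # B
```

Deploying on B leaves 15 − 8 = 7 units of parameter data to transmit, versus 15 − 7 = 8 on A, so maximizing D(p) is the same as minimizing the data that still has to cross the network.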
Because of varying network conditions and user locations, the data transmission speed changes constantly, so a solution generated for the current conditions is only effective for the current state and is not necessarily optimal for later states. Therefore, we can only use the average data transmission speed to perform service selection and platform selection, as it best reflects the overall network condition. A minimum transmitted data size does not guarantee a minimum data transmission time, because network conditions may differ between platforms; in this paper, however, we do not take network conditions into consideration. Since the minimum transmitted data size most likely costs the least transmission time, we deploy the mashup on the platform with the maximum transmitted data size D(p): the parameters of the services on that platform need not be transmitted, which minimizes the data that remain to be transmitted and reduces the execution time of the mashup. In the genetic algorithm, platform selection is performed after the crossover and mutation processes. After platform selection, the fitness of each chromosome can be calculated based on Definition 5.
5.5 Algorithms and Analysis
Given a fitness function, the genetic algorithm executes iteratively and eventually reaches a near-optimal solution. New individuals are constantly generated by crossover and mutation; then, through selection, superior individuals are retained and inferior individuals are weeded out. Thus, superior genes are passed on to the next generations, which fit the environment increasingly well. The detailed algorithm is summarized in Table 4.

TABLE 4 GA4MC ALGORITHM
Input: quantity of chromosomes cq, iteration times it, crossover times ct and mutation times mt
Output: the chromosome with the highest fitness
1:  randomly compose cq chromosomes in ChrSet
2:  randomly choose one chromosome from ChrSet and mark it as OptChr
3:  for i = 1 to it
4:      for j = 1 to ct
5:          randomly choose two chromosomes, crossover them, and put the newly generated chromosomes into ChrSet
6:      for k = 1 to mt
7:          randomly choose one chromosome, mutate it, and put the newly generated chromosome into ChrSet
8:      select the platform for each chromosome
9:      compute the fitness of all of the chromosomes in ChrSet
10:     CurOptChr ← the chromosome with the highest fitness
11:     if f(CurOptChr) > f(OptChr)
12:         OptChr ← CurOptChr
13:     select cq chromosomes and remove others from ChrSet
14: return OptChr
The algorithm begins with initialization (line 1), where initial chromosomes are randomly generated and put into the chromosome set ChrSet. The variable OptChr, representing the optimal chromosome, is initialized (line 2). Then, the pivotal steps crossover (lines 4-5) and mutation (lines 6-7) are processed, generating new chromosomes. Afterwards, platform selection is performed for each mashup (line 8). The fitness of all of the chromosomes in the chromosome set is calculated (line 9), and the current optimal one is recorded (line 10). Next, the current optimal chromosome is compared with the best chromosome that has emerged so far in the evolutionary history, and the better one is assigned to OptChr (lines 11-12). After that, chromosomes are selected according to their fitness (line 13). This process is repeated it times. Finally, OptChr is returned as the optimal chromosome (line 14).
The time complexity of the algorithm is polynomial: each of the it iterations generates 2ct + mt new chromosomes and evaluates the fitness of cq + 2ct + mt chromosomes, so the running time grows polynomially with it, cq, ct, mt and n as long as a single fitness evaluation is polynomial in n. Moreover, we can adjust the efficiency of the algorithm through the initialization parameters. Increasing one or more of cq, it, ct and mt may improve the result but costs more time; decreasing them improves efficiency but may yield a suboptimal result. Furthermore, as shown in Table 4, the algorithm runs for multiple iterations, and each iteration involves multiple crossover, mutation and selection operations that are independent of each other. Therefore, the algorithm can easily be implemented on a distributed system, with the operations being completed by different servers, which would make it more efficient.
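To make Table 4 concrete, the following sketch runs the whole loop on a small synthetic instance and compares the result with exhaustive enumeration. It is a simplified illustration under stated assumptions, not the authors' Matlab code: candidate records use a hypothetical tuple layout, genes are 0-based indices, mashup cost is the plain sum of default costs (the package discounts of Definition 7 are omitted), fitness is assumed to take the form (t_max − t)/c used for cost performance in the experiments (Eq. (9)), and t_max is read as the per-task sum of the largest input transmission, execution and output transmission times.

```python
import itertools
import random
from collections import defaultdict

# Hypothetical candidate record:
# (platform, t_in, t_exec, t_out, size_in, size_out, default_cost)


def max_response_time(cands):
    """Assumed reading of t_max: per task, largest t_in + largest t_exec + largest t_out."""
    return sum(max(s[1] for s in cs) + max(s[2] for s in cs) + max(s[3] for s in cs)
               for cs in cands)


def evaluate(chrom, cands, t_max):
    """Return (fitness, platform) for a chromosome of 0-based candidate indices."""
    chosen = [cands[i][g] for i, g in enumerate(chrom)]
    data = defaultdict(float)
    for s in chosen:                               # Eq. (8)
        data[s[0]] += s[4] + s[5]
    platform = max(data, key=data.get)             # Table 4, line 8
    t = sum(s[2] if s[0] == platform else s[1] + s[2] + s[3] for s in chosen)  # Eq. (5)
    cost = sum(s[6] for s in chosen)               # assumption: no package discounts
    return (t_max - t) / cost, platform            # assumed form of Eq. (9)


def ga4mc(cands, cq, it, ct, mt, t_max, seed=0):
    """Table 4 with roulette selection; requires cq >= 2."""
    rng = random.Random(seed)
    n = len(cands)

    def fit(c):
        return evaluate(c, cands, t_max)[0]

    chr_set = [[rng.randrange(len(cs)) for cs in cands] for _ in range(cq)]  # line 1
    opt = rng.choice(chr_set)                                                # line 2
    for _ in range(it):                                                      # line 3
        for _ in range(ct):                                                  # lines 4-5
            p1, p2 = rng.sample(chr_set, 2)
            point = rng.randint(1, n - 1) if n > 1 else 0
            chr_set += [p1[:point] + p2[point:], p2[:point] + p1[point:]]
        for _ in range(mt):                                                  # lines 6-7
            child = list(rng.choice(chr_set))
            i = rng.randrange(n)
            if len(cands[i]) > 1:
                child[i] = rng.choice([g for g in range(len(cands[i])) if g != child[i]])
            chr_set.append(child)
        fits = [fit(c) for c in chr_set]                                     # lines 8-9
        cur = chr_set[max(range(len(chr_set)), key=fits.__getitem__)]        # line 10
        if fit(cur) > fit(opt):                                              # lines 11-12
            opt = cur
        weights = [f + 1e-9 for f in fits]  # fitness >= 0 here; epsilon avoids a zero sum
        chr_set = rng.choices(chr_set, weights=weights, k=cq)                # line 13
    return opt, evaluate(opt, cands, t_max)                                  # line 14


def brute_force(cands, t_max):
    """Enumerate all prod(N_i) selections, as the integer programming baseline does."""
    return max(evaluate(list(combo), cands, t_max)[0]
               for combo in itertools.product(*(range(len(cs)) for cs in cands)))


rng = random.Random(42)
cands = [[(rng.randrange(4), rng.randint(1, 5), rng.randint(1, 9), rng.randint(1, 5),
           rng.randint(1, 10), rng.randint(1, 10), rng.randint(5, 20))
          for _ in range(5)] for _ in range(4)]
t_max = max_response_time(cands)
best, (cp, platform) = ga4mc(cands, cq=10, it=30, ct=5, mt=5, t_max=t_max)
print(best, round(cp, 4), platform, round(brute_force(cands, t_max), 4))
```

Keeping OptChr outside the population (the elitism of lines 11-12) guarantees that the returned cost performance never decreases across iterations, even though roulette selection may discard the best chromosome.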
6 EXPERIMENTS
To evaluate the performance of GA4MC, we conduct three sets of experiments. The first set evaluates the effectiveness of GA4MC in terms of service selection by comparing its results with those of other methods. The second set evaluates the effectiveness of GA4MC in terms of platform selection by comparing it with a method that does not consider platform selection. The third set evaluates the efficiency of GA4MC, running the algorithms at different scales to examine their scalability.
6.1 Experiments Setup
We implemented the algorithms in Matlab. Experiments are run on a computer with a Pentium(R) Dual-Core 2.50 GHz CPU, 2 GB of RAM and the Windows 8 operating system. As no standard experimental platforms or test data sets are available, we generate the parameters of services and platforms automatically and use them as the experimental data sets. Following [2], each service is assigned three randomly generated integers as its input parameter transmission time, output parameter transmission time and service execution time, respectively. In addition, each service is assigned an integer randomly generated from 5 to 20 as its default cost. The service packages in the experiments are generated according to Definition 2. The candidate services are randomly distributed to the platforms, and service packages exist between services on the same platform: if more than one service is distributed
to the same platform, these services are packaged together. The ratio between the package price and the sum of the default costs of the involved services is generated randomly from 0.2 to 0.9, and the package price is obtained from this ratio and the default costs of the involved services. We focus on the following three variables in the experiments: (1) task number: the number of tasks in a service plan; (2) candidate service number: the number of candidate services for each task in a service plan; (3) platform number: the number of platforms on which the mashups can be deployed. As analyzed in Section 5, the effectiveness and efficiency of the algorithm are mainly related to these three variables, as they reflect the scales of the service plans, candidate service sets and candidate platforms, respectively, which are the main factors determining the results and efficiency of the algorithms. To examine the impact of these three variables on the cost performance of the generated service mashups, we use three sets of parameters. In each set, one of the three variables is varied and the other two are fixed, as shown in Table 5. To avoid the influence of the number of candidate services, we set the candidate service number to 0.5 × platform number in set 3.

TABLE 5 VARIABLE SETTINGS
Variable                    Set 1   Set 2   Set 3
Task number                 3-15    5       5
Candidate service number    5       1-10    0.5 × platform number
Platform number             10      10      3-15
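The data generation described in Section 6.1 can be sketched as follows. Only the cost range [5, 20], the random assignment of candidates to platforms, the one-package-per-shared-platform rule and the price ratio in [0.2, 0.9] come from the text; the timing range `t_range` is a placeholder because the paper takes it from [2], and the names `GeneratedService` and `generate` are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class GeneratedService:
    task: int
    platform: int
    t_in: int      # input parameter transmission time
    t_exec: int    # service execution time
    t_out: int     # output parameter transmission time
    cost: int      # default cost, uniform integer in [5, 20]


def generate(task_number: int, candidate_number: int, platform_number: int,
             seed: int = 0, t_range: Tuple[int, int] = (1, 10)):
    """Return (services, packages); each package is (member indices, price)."""
    rng = random.Random(seed)
    services = [GeneratedService(task, rng.randrange(platform_number),
                                 rng.randint(*t_range), rng.randint(*t_range),
                                 rng.randint(*t_range), rng.randint(5, 20))
                for task in range(task_number) for _ in range(candidate_number)]
    by_platform: Dict[int, List[int]] = {}
    for idx, s in enumerate(services):
        by_platform.setdefault(s.platform, []).append(idx)
    packages = []
    for members in by_platform.values():
        if len(members) > 1:  # services sharing a platform are packaged together
            ratio = rng.uniform(0.2, 0.9)
            packages.append((members, ratio * sum(services[i].cost for i in members)))
    return services, packages


# Set 1 of Table 5 at task number 5: 5 candidates per task, 10 platforms.
services, packages = generate(task_number=5, candidate_number=5, platform_number=10, seed=1)
print(len(services), len(packages))
```

Set 3 would additionally need a rounding rule for 0.5 × platform number when the platform number is odd; the text does not state one.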
6.2 Effectiveness of Service Selection
In this subsection, we compare the cost performance of mashups generated by GA4MC with that of three other methods widely used for service selection.
- Greedy algorithm. The greedy algorithm is a simple algorithm that makes the locally optimal choice at each stage, and it is widely used in the service selection field [3, 4]. We include it to show the difference between methods that consider service packages and parameter transmission time saving and those that do not; therefore, the greedy algorithm considers neither. It performs service selection by locally selecting the service with the best cost performance in the candidate service set for each task in the service plan. In this process, the cost performance of each service is calculated by dividing the response time by the cost. The response time is the sum of the input parameter transmission time, service execution time and output parameter transmission time of the service, because the greedy algorithm does not know on which platform the mashup will be deployed. The cost is the default cost of the service, because the algorithm does not know whether other services in the same package will be selected.
- Ant colony algorithm. The ant colony algorithm is another heuristic algorithm widely used for service selection [35, 36]. It assigns pheromone to each service and guides the selection through the pheromones. When the algorithm starts, a number of ants are assigned, and each of them selects a service for each task in the service plan to create a mashup. The ants select services according to their pheromone: services with more pheromone are more likely to be selected. After the ants finish their selection, the pheromones of their selected services are updated according to the performance of the corresponding mashups: the component services of mashups with high cost performance receive more pheromone, and vice versa. After that, the ants perform a new round of selection. Finally, the result is generated by selecting the service with the highest pheromone for each task in the service plan.
- Integer programming. Integer programming is also widely used for service selection [37, 38]. It adopts an enumeration-like algorithm to compare all possible mashups and select the best one, so it achieves the optimal solution.
All four methods select platforms according to the platform selection method in Subsection 5.4 for each candidate mashup generated during the execution of the algorithms. In the following experiments, the cost performance is calculated by

$$cp = \frac{t_{max} - t}{c} \tag{9}$$

where t_max is the maximum possible response time of the mashup, i.e., the sum of the maximum input parameter transmission time, maximum output parameter transmission time and maximum service execution time; t is the response time of the mashup, calculated according to Definition 8; and c is the cost of the mashup, calculated according to Definition 7.
1) Impact of task number
First, we examine the impact of task number on the cost performance of the generated service mashups. The parameters are set according to set 1 in Table 5. All four methods are run on 200 different data sets, and we report the average values to avoid the influence of any particular data set.
Results are shown in Fig. 6, from which we can see that the optimal cost performance does not change noticeably as the task number increases. This is because the total response time and the cost of the mashups increase together as the task number grows. The integer programming method gives the best result, as it achieves the optimal solution. The result of GA4MC is close to that of integer programming, which implies that our method can achieve a solution close to the optimal. The ant colony algorithm also generates results close to the optimal when the task number is small; however, its results become worse as the task number increases. The greedy algorithm performs worst, as it does not account for service packages and parameter transmission time saving. There is a large gap between its result and the optimal, which shows the importance of
service packages and parameter transmission time saving.
Fig. 6. Impact of task number on cost performance
2) Impact of candidate service number
In this experiment, we examine the impact of candidate service number on the cost performance of mashups. To this end, we set the parameters according to set 2 in Table 5 and run all four methods. The experiment is again repeated 200 times and we report the average values. The result is shown in Fig. 7, from which we can see that the cost performance of all four methods rises as the candidate service number increases. Because the number of platforms is fixed, more candidate services are aggregated on the same platforms as the candidate service number increases, which raises the proportion of services covered by service packages and therefore the cost performance of all four methods. In addition, the comparison of the four methods is similar to the previous experiment: integer programming gives the best result; the result of GA4MC is close to integer programming; the ant colony algorithm is inferior to GA4MC; and the greedy algorithm performs worst.
Fig. 7. Impact of candidate service number on cost performance
3) Impact of platform number
This experiment examines the impact of platform number on the cost performance of mashups. We set the experimental parameters according to set 3 in Table 5 and run the four methods. Similarly, this process is repeated 200 times and we report the average values. The results are shown in Fig. 8, which shows that the cost performance of the optimal mashup rises as the platform number increases. With more platforms, there are more platforms and services to choose from, so integer programming, the genetic algorithm and the ant colony algorithm can select them to obtain better mashups. However, the result of the greedy algorithm shows no obvious trend, because it selects services without considering the cost performance of mashups. Moreover, the comparison of the four methods agrees with the previous experiments: the result of GA4MC is close to integer programming, and both the ant colony algorithm and the greedy algorithm are inferior to GA4MC.
Fig. 8. Impact of platform number on cost performance
From these experiments, we conclude that GA4MC achieves mashups close to the optimal. The ant colony algorithm is inferior to GA4MC: although it obtains good results when the data scale is small, its results become worse as the experimental scale increases. The greedy algorithm performs worst, which shows that service packages and parameter transmission time saving have a great effect on the cost performance of mashups.
6.3 Effectiveness of Platform Selection
In this set of experiments, we examine the importance of platform selection for the cost performance of mashups. To this end, we design a new method that ignores the impact of the deployment platform on the execution time of mashups and separates service selection from platform selection. For service selection, it also adopts the genetic algorithm, but calculates the fitness differently: it simply regards the sum of the response times of the component services as the response time of the mashup. After service selection, it randomly selects the deployment platform for the mashup and recalculates the cost performance of the generated mashup. We call this method GA4MC with random platform.
1) Impact of task number
First, we examine the impact of platform selection as the task number varies. To this end, we set the experimental parameters according to set 1 in Table 5. Both methods are run 200 times and the average values are recorded. The result is shown in Fig. 9. As the task number increases, the result of the random platform selection method improves. This is because with the
task number increasing, more services are located on the same platform as the mashup, which reduces the execution time of the mashups. However, there is still a wide gap between the two methods, which demonstrates the importance of platform selection for mashups.
Fig. 9. Impact of platform selection ranging task number
2) Impact of candidate service number
In this experiment, we examine the impact of platform selection as the candidate service number varies. To this end, we set the experimental parameters according to set 2 in Table 5. Both methods are run 200 times and the average values are recorded. The result is shown in Fig. 10, from which we can see that the cost performance of both methods rises with the candidate service number, due to the increased proportion of service packages. Moreover, as the candidate service number increases, the gap between the two methods widens.
Fig. 10. Impact of platform selection ranging candidate service number
3) Impact of platform number
This experiment examines the impact of platform selection as the platform number varies. We set the experimental parameters according to set 3 in Table 5, run the two methods on 200 different data sets and record the average values. The results are shown in Fig. 11, which demonstrates that GA4MC significantly outperforms the random platform selection method and that the gap grows as the number of platforms increases. From the experiments in this subsection, we conclude that platform selection has a significant effect on the cost performance of mashups.
Fig. 11. Impact of platform selection ranging platform number
6.4 Efficiency Evaluation
In this subsection, we conduct experiments to examine the scalability of our method and compare it with the other methods in terms of execution time. Similarly, we vary the three parameters, i.e., task number, candidate service number and platform number.
1) Impact of task number
To examine the impact of task number on the execution time of the methods, we set the experimental parameters according to set 1 in Table 5. All methods are run 200 times to avoid the influence of particular data sets. As the execution times of these methods differ by orders of magnitude, we show them in two figures. Fig. 12 compares the execution time of GA4MC and the integer programming method: as the task number increases, the execution time of integer programming increases sharply, because it adopts an enumeration-like algorithm and the number of candidate composition plans grows exponentially with the task number. In contrast, the execution time of GA4MC is orders of magnitude lower. Fig. 13 shows that the execution times of GA4MC and the ant colony algorithm are similar, but the execution time of the ant colony algorithm increases more sharply as the task number grows.
Fig. 12. The execution time comparison with integer programming method ranging task number
Fig. 13. The execution time comparison with ant colony algorithm ranging task number
2) Impact of candidate service number
To examine the impact of candidate service number on the execution time of the methods, we set the experimental parameters according to set 2 in Table 5. Similarly, all four methods are run 200 times and we report the average values. The results are shown in Fig. 14 and Fig. 15.
Fig. 14. The execution time comparison with integer programming method ranging candidate service number
Fig. 15. The execution time comparison with ant colony algorithm ranging candidate service number
Fig. 14 compares the execution time of GA4MC and the integer programming method as the candidate service number increases. The execution time of GA4MC is orders of magnitude lower than that of integer programming, whose execution time increases sharply with the candidate service number. In contrast, the execution times of GA4MC and the ant colony algorithm remain almost unchanged, as shown in Fig. 15. Moreover, the execution time of the ant colony algorithm is slightly lower than that of GA4MC, but the two are very close.
3) Impact of platform number
This experiment examines the impact of platform number on the execution time of the methods. We set the experimental variables according to set 3 in Table 5. Similarly, all methods are run 200 times and we report the average values. The results in Fig. 16 and Fig. 17 lead to conclusions similar to those of the previous experiments: as the platform number increases, the execution time of integer programming increases sharply; the execution times of GA4MC and the ant colony algorithm are orders of magnitude lower and similar to each other; and the execution time of the ant colony algorithm grows quickly with the platform number, while that of GA4MC remains almost unchanged.
Fig. 16. The execution time comparison with integer programming method ranging platform number
Fig. 17. The execution time comparison with ant colony algorithm ranging platform number
From the experiments above, we conclude that the integer programming method can find the optimal mashup, but its exorbitant execution time makes it impractical in real-world settings. The execution time of the ant colony algorithm is close to that of GA4MC, but it grows faster as the numbers of tasks and platforms increase. Our GA4MC method shows the best scalability.
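The measurement protocol used in these experiments (each method executed 200 times, with the average execution time reported) can be sketched as a small timing harness. This is a minimal Python sketch, not the authors' experimental code: `average_runtime` and `dummy_solver` are hypothetical names, and each method is assumed to be callable with a single problem-scale argument.

```python
import time
from statistics import mean
from typing import Callable


def average_runtime(solve: Callable[[int], object], scale: int, runs: int = 200) -> float:
    """Call solve(scale) `runs` times and return the mean wall-clock time in seconds."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()  # monotonic, high-resolution clock
        solve(scale)
        samples.append(time.perf_counter() - start)
    return mean(samples)


def dummy_solver(scale: int) -> int:
    # Hypothetical stand-in for GA4MC, the ant colony algorithm, or the
    # integer programming method; its work simply grows linearly with `scale`.
    return sum(i * i for i in range(scale * 100))


if __name__ == "__main__":
    # Hypothetical experimental scales (e.g., numbers of platforms).
    for scale in (5, 10, 20):
        avg = average_runtime(dummy_solver, scale)
        print(f"scale={scale}: {avg * 1e3:.4f} ms")
```

Using `time.perf_counter` rather than `time.time` gives a monotonic clock suited to short intervals, and averaging over many runs smooths out noise from the operating system scheduler.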
7 CONCLUSION AND FUTURE WORK
In this paper, we focus on service selection and platform selection in the process of mashup creation. To improve the cost performance of service mashups, we take service packages and platform selection into consideration. This problem is formally modeled, and a genetic algorithm is tailored to achieve the objective. A set of experiments shows that GA4MC can achieve mashups whose cost performance is extremely close to the optimal. Moreover, the execution time of GA4MC is of a low order of magnitude, and GA4MC scales well as the experimental scale increases. However, several problems remain unsolved in this paper:
1) To reduce response time, we select platforms only according to the size of the transmitted data, assuming that the data transmission speeds between all platforms are identical. In fact, data transmission speeds vary widely between platforms.
2) We account for service packages to reduce cost. However, there are many other, more complex pricing policies, and how to model and address them remains an open problem.
3) We perform the selection with regard to the cost performance of mashups, which involves only the response time and cost of services. In fact, users care about many other QoS attributes, such as reliability and failure probability.
Therefore, in future work we will try to solve the selection problem with all these factors taken into consideration, which will bring the problem closer to the real world and make the method more practical.
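Limitation 1) can be made concrete with a small worked comparison. If every link has the same speed, ranking candidate platforms by transmitted data size yields the same choice as ranking them by transfer time; once speeds differ, the two rules can disagree. The Python sketch below assumes that transfer time is simply data size divided by link speed (no latency or contention), and the platform names, data sizes, and speeds are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    platform: str          # hypothetical platform name
    data_mb: float         # data to transmit if this platform is chosen
    speed_mb_per_s: float  # assumed link speed to this platform


def transfer_time(c: Candidate) -> float:
    # Simplified model: time = size / speed (no latency, no contention).
    return c.data_mb / c.speed_mb_per_s


def pick_by_size(cands: list[Candidate]) -> str:
    # Mirrors the simplification in 1): minimize transmitted data size only.
    return min(cands, key=lambda c: c.data_mb).platform


def pick_by_time(cands: list[Candidate]) -> str:
    # Speed-aware rule: minimize estimated transfer time.
    return min(cands, key=transfer_time).platform


uniform = [Candidate("P1", 40, 10), Candidate("P2", 60, 10)]
mixed = [Candidate("P1", 40, 2), Candidate("P2", 60, 20)]

print(pick_by_size(uniform), pick_by_time(uniform))  # P1 P1
print(pick_by_size(mixed), pick_by_time(mixed))      # P1 P2
```

In the second case, the platform with more data to transmit is still the faster choice because its link is ten times faster, which is exactly the effect the identical-speed assumption ignores.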
ACKNOWLEDGMENT
This research work was conducted while Shuiguang Deng was a visiting scholar at MIT. We thank Prof. Stuart Madnick for his valuable comments and suggestions. This research is supported in part by the National Natural Science Foundation of China under Grant No. 61170033 and the National High-Tech Research and Development Plan of China under Grant No. 2013BAD19B10.
REFERENCES
[1] X. Chen, Z. Zheng, Q. Yu, and M. R. Lyu, "Web service recommendation via exploiting location and QoS information," IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 7, pp. 1913-1924, 2014.
[2] Z. Zheng and M. R. Lyu, QoS Management of Web Services. Springer, 2013.
[3] P. Bartalos and M. Bieliková, "QoS aware semantic web service composition approach considering pre/postconditions," International Conference on Web Services, 2010, pp. 345-352.
[4] D. Chiu, S. Deshpande, G. Agrawal, and R. Li, "A dynamic approach toward QoS-aware service workflow composition," International Conference on Web Services, 2009, pp. 655-662.
[5] W. Kongdenfha, B. Benatallah, J. Vayssière, R. Saint-Paul, and F. Casati, "Rapid development of spreadsheet-based web mashups," International Conference on World Wide Web, 2009, pp. 851-860.
[6] G. Wang, S. Yang, and Y. Han, "Mashroom: end-user mashup programming using nested tables," International Conference on World Wide Web, 2009, pp. 861-870.
[7] E. Maximilien, A. Ranabahu, and K. Gomadam, "An online platform for web APIs and service mashups," IEEE Internet Computing, vol. 12, no. 5, pp. 32-43, 2008.
[8] A. Hristoskova, B. Volckaert, and F. De Turck, "The WTE+ framework: automated construction and runtime adaptation of service mashups," Automated Software Engineering, vol. 20, no. 4, pp. 499-542, 2013.
[9] P. Bertoli, M. Pistore, and P. Traverso, "Automated composition of web services via planning in asynchronous domains," Artificial Intelligence, vol. 174, no. 3, pp. 316-361, 2010.
[10] M. Arafati, G. Dagher, B. Fung, and P. Hung, "D-mash: A framework for privacy-preserving data-as-a-service mashups," International Conference on Cloud Computing, 2014, pp. 498-505.
[11] M. Barhamgi, D. Benslimane, C. Ghedira, S. Tbahriti, and M. Mrissa, "A framework for building privacy-conscious DaaS service mashups," International Conference on Web Services, 2011, pp. 323-330.
[12] A. Friedman and A. Schuster, "Data mining with differential privacy," International Conference on Knowledge Discovery and Data Mining, 2010, pp. 493-502.
[13] M. Mrissa, "Privacy-enhanced Web service composition," IEEE Transactions on Services Computing, vol. 7, no. 2, pp. 210-222, 2014.
[14] J. El Hadad, M. Manouvrier, and M. Rukoz, "TQoS: transactional and QoS-aware selection algorithm for automatic Web service composition," IEEE Transactions on Services Computing, vol. 3, no. 1, pp. 73-85, 2010.
[15] H. Zheng, W. Zhao, J. Yang, and A. Bouguettaya, "QoS analysis for web service compositions with complex structures," IEEE Transactions on Services Computing, vol. 6, no. 3, pp. 373-386, 2013.
[16] M. Dumas, L. García-Bañuelos, A. Polyvyanyy, Y. Yang, and L. Zhang, "Aggregate quality of service computation for composite services," International Conference on Service-Oriented Computing, 2010, pp. 213-227.
[17] S. Uludag, K. Lui, K. Nahrstedt, and G. Brewster, "Analysis of topology aggregation techniques for QoS routing," ACM Computing Surveys, vol. 39, no. 3, pp. 1-7, 2007.
[18] M. Jaeger, G. Muhl, and S. Golze, "QoS-aware composition of Web services: a look at selection algorithms," International Conference on Web Services, 2005, pp. 1-2.
[19] R. Zhang, C. Chai, and Y. Liang, "Joint beamforming and power control for multiantenna relay broadcast channel with QoS constraints," IEEE Transactions on Signal Processing, vol. 57, no. 2, pp. 726-737, 2009.
[20] W. Chen and J. Zhang, "An ant colony optimization approach to a grid workflow scheduling problem with various QoS requirements," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 39, no. 1, pp. 29-43, 2009.
[21] G. Zou, Q. Lu, Y. Chen, R. Huang, Y. Xu, and Y. Xiang, "QoS-aware dynamic composition of Web services using numerical temporal planning," IEEE Transactions on Services Computing, vol. 7, no. 1, pp. 18-31, 2014.
[22] S. Sun and J. Zhao, "A decomposition-based approach for service composition with global QoS guarantees," Information Sciences, vol. 199, pp. 138-153, 2012.
[23] S. Deng, L. Huang, D. Hu, J. Zhao, and Z. Wu, "Mobility-enabled service selection for composite services," IEEE Transactions on Services Computing, DOI 10.1109/TSC.2014.2365799.
[24] S. Deng, L. Huang, J. Taheri, and A. Y. Zomaya, "Computation offloading for service workflow in mobile cloud computing," IEEE Transactions on Parallel and Distributed Systems, DOI 10.1109/TPDS.2014.2381640.
[25] S. Deng, H. Wu, W. Tan, Z. Xiang, and Z. Wu, "Mobile service selection for composition: an energy consumption perspective," IEEE Transactions on Automation Science and Engineering, DOI 10.1109/TASE.2015.2438020.
[26] S. Deng, L. Huang, Y. Li, H. Zhou, Z. Wu, X. Cao, M. Kataev, and L. Li, "Towards risk reduction for mobile service composition," IEEE Transactions on Cybernetics, DOI 10.1109/TCYB.2015.2446443.
[27] H. Yu and S. Reiff-Marganiec, "A backwards composition context based service selection approach for service composition," International Conference on Services Computing, 2009, pp. 419-426.
[28] Q. Wu, Q. Zhu, and M. Zhou, "A correlation-driven optimal service selection approach for virtual enterprise establishment," J. Intelligent Manufacturing, vol. 25, no. 6, pp. 1441-1453, 2014.
[29] S. Deng, H. Wu, D. Hu, and J. Leon Zhao, "Service selection for composition with QoS correlations," IEEE Transactions on Services Computing, DOI 10.1109/TSC.2014.2361138.
[30] W. Ahmed, Y. Wu, and W. Zheng, "Response time based optimal Web service selection," IEEE Transactions on Parallel and Distributed Systems, vol. 26, no. 2, pp. 551-561, 2015.
[31] M. Alrifai, D. Skoutas, and T. Risse, "Selecting skyline services for QoS-based web service composition," International Conference on World Wide Web, 2010, pp. 11-20.
[32] L. Qi, Y. Tang, W. Dou, and J. Chen, "Combining local optimization and enumeration for QoS-aware Web service composition," International Conference on Web Services, 2010, pp. 34-41.
[33] J. Huang, C. Lin, and J. Wan, "Modeling, analysis and optimization of dependability-aware energy efficiency in services computing systems," International Conference on Services Computing, 2013, pp. 683-690.
[34] D. Goldberg, "Genetic algorithms in search, optimization and machine learning," Addison-Wesley Publishing Company, pp. 26-28, 1989.
[35] J. Cao, G. Zhu, X. Zheng, B. Liu, and F. Dong, "TASS: transaction assurance in service selection," International Conference on Web Services, 2012, pp. 472-479.
[36] W. Zhang, C. Chang, T. Feng, and H. Jiang, "QoS-based dynamic Web service composition with ant colony optimization," Computer Software and Applications Conference, 2010, pp. 493-502.
[37] T. Ambra, D. Parlanti, and D. Giuli, "A semantic-driven integer programming approach for QoS-aware dynamic service composition," IEEE Congress on FITCE, 2011, pp. 1-6.
[38] J. Yoo, S. Kumara, D. Lee, and O. Chan, "A Web service composition framework using integer programming with nonfunctional objectives and constraints," IEEE Conference on Enterprise Computing, E-Commerce and E-Services, 2008, pp. 347-350.
Shuiguang Deng received the BS and PhD degrees in computer science from Zhejiang University in 2002 and 2007, respectively. He is an associate professor in the College of Computer Science at Zhejiang University. He was a visiting scholar at MIT in 2014 and is currently a visiting scholar at Stanford University. His research interests include service computing, business process management, and data management. He is a member of the IEEE and the ACM.
Hongyue Wu received the B.S. and M.S. degrees in computer science and technology in 2010 and 2013, respectively. He is currently a PhD student in computer science and technology at Zhejiang University. His research interests focus on service computing.
Javid Taheri received his Bachelor's and Master's degrees in Electrical Engineering from Sharif University of Technology, Tehran, Iran, in 1998 and 2000, respectively. He received his Ph.D. in the field of Mobile Computing from the School of Information Technologies at the University of Sydney, Sydney, Australia. Since 2006, he has been actively working in several fields, including networking, bioinformatics, and parallel computing. He is currently a Senior Lecturer at Karlstad University, Sweden, where he works on designing scheduling algorithms for cloud and green computing.
Zhaohui Wu received the PhD degree in computer science from Zhejiang University in 1988. From 1991 to 1993, he was a member of a Sino-German jointly trained PhD program. Currently, he is a professor in the College of Computer Science, Zhejiang University. His research interests include service computing, intelligent systems, and ubiquitous computing. He is a senior member of the IEEE.
Albert Y. Zomaya is currently the Chair Professor of High Performance Computing & Networking in the School of Information Technologies, The University of Sydney. He is also the Director of the Centre for Distributed and High Performance Computing. Professor Zomaya has published more than 500 scientific papers and articles and is the author, co-author, or editor of more than 20 books. He is the Editor in Chief of the IEEE Transactions on Computers and Springer's Scalable Computing, and serves as an associate editor for 22 leading journals, such as ACM Computing Surveys and the Journal of Parallel and Distributed Computing.
Professor Zomaya is the recipient of the IEEE Technical Committee on Parallel Processing Outstanding Service Award (2011), the IEEE Technical Committee on Scalable Computing Medal for Excellence in Scalable Computing (2011), and the IEEE Computer Society Technical Achievement Award (2014). He is a Chartered Engineer and a Fellow of AAAS, IEEE, and IET (UK). Professor Zomaya's research interests are in the areas of parallel and distributed computing and complex systems.