2015 IEEE International Conference on Services Computing

Efficient Computing Composite Service Skyline with QoS Correlations

Yu Du∗, Hao Hu∗, Wei Song†, Junhua Ding‡ and Jian Lü∗
∗ State Key Laboratory for Novel Software Technology, Nanjing University, China
† School of Computer Science and Engineering, Nanjing University of Science and Technology, China
‡ Department of Computer Science, East Carolina University, USA

Abstract—In service-oriented architecture, component services can be composed to provide value-added services. The notion of a composite service skyline makes it easy to obtain a QoS-optimal service combination. The state-of-the-art approach to computing the composite service skyline assumes that the candidate services of different tasks are independent. However, real-world services are usually QoS-correlated. We therefore present a novel approach to computing the composite service skyline in the presence of QoS correlations. We also propose advanced pruning techniques to accelerate it. We conduct extensive experiments to evaluate the effectiveness and efficiency of our approach.

Keywords-composite service skyline; QoS correlation; service selection; pruning

978-1-4673-7281-7/15 $31.00 © 2015 IEEE DOI 10.1109/SCC.2015.16

I. INTRODUCTION

Web services are platform-independent, self-contained, and loosely coupled web applications. They are described, published, discovered, and invoked over the network using standard languages and protocols. Service-oriented architecture allows services to be composed to accomplish more complex tasks. Usually, an abstract workflow is constructed to represent and manage a composite service. In the rest of the paper, we use the term "service plan" (SP for short) to denote such a composition.

Quality of service (QoS) describes the non-functional characteristics of a Web service. More and more services share similar functionality but differ in QoS. Service composition therefore faces a new challenge: service selection must meet the user's functional requirements and also optimize the overall QoS of the composite service.

A large body of work addresses QoS-aware service composition, usually selecting an optimal composition with the Simple Additive Weighting (SAW) technique. However, these solutions have drawbacks in practice:
- First, if the weights change, the selection process must be performed again, which causes considerable computation [1].
- Second, different users may specify different weights, so the composer must search the whole composition space for every request.

Some researchers therefore use the dominance relationship to select a set of skyline services for each task [1][2][3]. Exploiting dominance significantly reduces the number of candidate services and thus mitigates the combination explosion problem.

Yu et al. [1] use dominance to select the set of "best" possible service compositions that are not dominated by each other. This result is referred to as the composite service skyline. Given the composite service skyline, the QoS-optimal composite service can be found conveniently and efficiently for any weighting of the QoS attributes. However, the approach in [1] assumes that the QoS values of different services are independent of each other. In practice, QoS correlations between services are common. For instance, if we choose Ctrip¹ as our travel service provider, we get a discount on China Eastern Airlines². If such correlations are ignored, some composite services will be missing from the composite service skyline.

In this paper, we focus on computing the composite service skyline while taking QoS correlations into account. To the best of our knowledge, we are the first to investigate this problem. A naive approach is to generate all possible composite services first and then keep those that are not dominated by any other. Unfortunately, this is time-consuming when the number of candidate services is large, because the number of combinations explodes.

To address this problem, we propose two advanced pruning algorithms. They remove redundant candidate services from each task before the composite service skyline is computed. We then propose a composite service skyline selection method that accounts for QoS correlations. It composes adjacent services one by one and removes redundant services from these intermediate results based on our pruning criteria. Note that we cannot prune with the dominance relationship alone, because doing so may erroneously remove services that have QoS correlations.

In summary, the contributions of our work are as follows.
• We formulate the problem of computing the composite service skyline with QoS correlations, and we present an approach to solve it.
• We present seven pruning criteria, on which we build two pruning algorithms that accelerate our approach.
• We conduct a series of experiments to evaluate the effectiveness and efficiency of our algorithms.

The rest of this paper is organized as follows. Section II reviews related work. Section III presents the background. Section IV describes our algorithms in detail. Section V evaluates our approach, and Section VI concludes the paper.

¹ http://www.ctrip.com
² http://www.ceair.com

II. RELATED WORK

As the number of services with similar functionality on the Internet grows quickly, the QoS-aware service composition problem has received a lot of attention in recent years. Zeng et al. [4] used mixed integer linear programming to find the optimal composition. To find a near-optimal composition, Yu et al. [5] proposed a global optimization algorithm based on integer programming. A hybrid method proposed in [6] combines local selection and global optimization to obtain the optimal QoS-based service composition. However, all of these works must search the whole space of candidate services to obtain the optimal composite service.

Some researchers therefore used skyline-based algorithms to reduce the number of candidate services. Skyline computation originated in early research problems such as the contour problem, maximum vectors, and the convex hull [3]. Börzsönyi et al. [7] first introduced it to the database domain and proposed three algorithms for computing the skyline. In [8], the authors proposed BBS, an R-tree-based algorithm for computing the skyline. Alrifai et al. [2] used the dominance relationship to prune the search space of candidate services. Benouaret et al. [3] proposed two novel concepts, α-dominance and σ-dominance. These extend Pareto dominance and make it possible to control the size of the skyline.

However, in all of these works the selection process must be run again whenever the user-specified weights change. In addition, different users may specify different weights for the same request, so the selection process runs once per user. Each run searches the whole space of candidate services and composes again, which is inefficient and time-consuming. Yu et al. [1] therefore proposed BUA, a novel method for computing the composite service skyline (CSKY). The CSKY is a small subset of all composite services, and all of its members are non-dominated. Whatever weights a user specifies, the optimal composite service is guaranteed to be in the CSKY.

These approaches, however, do not account for QoS correlations. The concept of QoS correlation was proposed in [9], whose authors also put forward a novel service selection algorithm that handles QoS correlations. In [10], the authors used DNF to describe the condition of a correlation, but the time complexity is high. Deng et al. [11] presented a pruning algorithm that is applied before global optimization. They then proposed two algorithms for service selection, one for correlations among non-adjacent tasks and one for correlations among adjacent tasks. However, they only consider a single QoS dimension. None of these studies gives a method for computing the composite service skyline with QoS correlations.

III. BACKGROUND

In this section, we first illustrate our motivation, then present the necessary definitions and notation, and finally define the problem.

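Section II states that the weighted-sum (SAW) optimum always lies in the skyline, whatever weights the user chooses. The short sketch below checks this numerically. It is an illustration under our own assumptions, not code from [1]:
- all attributes are already normalized so that smaller is better;
- every weight is strictly positive;
- the point set is random.

Under these conditions, the SAW minimizer cannot be dominated, because any point dominating it would have a strictly smaller weighted sum.

```python
import random

def dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere (smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(points):
    """Naive O(n^2) skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

def saw_best(points, weights):
    """Simple Additive Weighting: return the point with the smallest weighted sum."""
    return min(points, key=lambda p: sum(w * x for w, x in zip(weights, p)))

random.seed(42)
points = [(random.random(), random.random(), random.random()) for _ in range(300)]
sky = skyline(points)
print(len(points), "points,", len(sky), "skyline points")

# For any strictly positive weights, the SAW optimum is a skyline point, so the
# skyline can be computed once and reused for every user's weighting.
for _ in range(1000):
    w = [random.uniform(0.01, 1.0) for _ in range(3)]
    assert saw_best(points, w) in sky
```

The loop only confirms the property for the sampled weights. The guarantee itself follows from the argument above, which is why a precomputed skyline can serve every weighting.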
A. Motivation

We use an example to illustrate our motivation. Consider the 3 tasks in Table I, each with 4 candidate services. For simplicity, we consider only 2 QoS attributes, "price" and "response time", and we assume that QoS correlations affect only the price. We use "select()" to denote a QoS correlation between services in the table. For instance, if s24 and s34 are selected together, the QoS of s34 improves because of the QoS correlation between them. Each correlated quality value is listed after its select() entry.

Table I
EXAMPLE OF WEB SERVICE SET
QoS is given as (price, response time); "select(s): (...)" is the correlated QoS when s is also selected.
task1:
  s11: ($10, 0.7s)
  s12: ($4, 0.5s)
  s13: ($5, 0.3s)
  s14: ($4, 0.3s)
task2:
  s21: ($12, 0.4s)
  s22: ($15, 0.5s)
  s23: ($12, 0.4s); select(s12): ($11, 0.4s); select(s13): ($10, 0.4s)
  s24: ($17, 0.6s)
task3:
  s31: ($7, 1.0s); select(s11): ($6, 1.0s); select(s21): ($5, 1.0s)
  s32: ($8, 0.7s); select(s21): ($5, 0.7s)
  s33: ($6, 0.7s)
  s34: ($10, 0.9s); select(s24): ($5, 0.9s)

Given a set of services with d-dimensional QoS, a service skyline query [2] selects the services that are not dominated by any other service. Before defining the composite service skyline, we recall the dominance relationship. Consider two services a and b from the same task. If a is as good as or better than b in all QoS attributes and strictly better in at least one, we say that a dominates b, denoted a ≺ b. For example, s14 ≺ s12 in Table I. The services that are not dominated form the service skyline (SSKY for short). In Table I, these are s14 in task1, s21 and s23 in task2, and s33 in task3.

The definition of dominance also applies to composite services. A composite service (CS) is an instantiation of an SP, obtained by binding one candidate service to each task. The composite service skyline (CSKY for short) [1] is the set of composite services that are not dominated by any other composite service.

Example 1 (CSKY): Suppose a complex task is accomplished by composing task1 to task3 in a sequential structure, as shown in Table I. The CSKY in this case is {{s13, s23, s33}, {s14, s21, s32}}, and both composite services have QoS ($21, 1.4s). If we use the algorithm BUA [1] to compute the CSKY in Example 1, we obtain {{s14, s21, s33}, {s14, s23, s33}}, which is completely different from the actual result. The reason is that BUA does not consider QoS correlations.

Yu et al. [1] propose a local search strategy: they first select the SSKY among the candidate services of each task and then compute the CSKY over these SSKYs. This is very useful for avoiding combination explosion. However, as shown above, BUA does not suit settings with QoS correlations. The naive enumeration method is not suitable either, because it may lead to combination explosion. Our goal is to compute the CSKY with QoS correlations both efficiently and effectively.

B. Preliminaries

Consider an SP with m tasks, SP = {t1, t2, · · ·, tm}. Each task ti has n candidate services, Si = {si1, si2, · · ·, sin}. The QoS attributes are Q = {q1, q2, · · ·, qd}, and the i-th quality value of a service s is qi(s). There are two types of quality:
- A "negative" quality is one for which a smaller value means higher quality (e.g., price).
- A "positive" quality is one for which a larger value means higher quality (e.g., availability).

Definition 1 (Correlation). A correlation c = ⟨q, cv, s⟩ is a 3-tuple, where q is the correlated quality, cv is the correlated quality value, and s is the service that affects the quality of the current service. Because a quality may be influenced by more than one service, we keep these correlations in a correlation set C. As shown in Table I, C(s23) is {⟨price, $11, s12⟩, ⟨price, $10, s13⟩}.

Definition 2 (Web Service). A web service is s = ⟨id, InSSet, NoInSSet, OutSSet⟩, where:
- id is the identifier of s;
- InSSet is the set of services that affect the QoS of s;
- NoInSSet is the set of services that cannot be composed with s;
- OutSSet is the set of services whose QoS is affected by s.

Table II defines the notation used in the rest of this paper.

Table II
FREQUENTLY USED NOTATIONS
Symbol              Explanation
getCorS(c)          Get the service in the 3-tuple correlation c.
bind(s, c)          Return the service with the correlated QoS obtained by binding s with c.
isFST(S)            Judge whether the services in S are from the same task.
getToSs(S)          Get the tasks of the services in S.
composite(s1, s2)   Return the composite service consisting of s1 and s2.

C. Problem Definition

We now formally define the problem of computing the composite service skyline with QoS correlations.

Problem Definition. We are given a service plan of m tasks, each task with n candidate services and d-dimensional QoS. There are also several QoS correlations among candidate services from different tasks. The problem is to compute all composite services that are not dominated by any other composite service.

As discussed above, BUA [1] cannot compute this result correctly, and the enumeration method does not scale when the number of candidate services is large. To avoid combination explosion, we reduce the number of candidate services using our pruning criteria. We focus only on sequential service plans, since other structures can be reduced or transformed into the sequential structure [12].

IV. COMPUTING COMPOSITE SERVICE SKYLINE WITH QOS CORRELATIONS

In this section, we introduce the main idea of our methods for computing the CSKY while considering QoS correlations. We first define the pruning criteria and then describe our algorithms in detail.

A. Pruning Criteria

The inspiration comes from the local search strategy proposed in [1]. An intuitive observation is given in Lemma 1.

Lemma 1. Let s ∈ S with InSSet(s) = OutSSet(s) = ∅. If there exists s′ ∈ S with s′ ≺ s, then CS ∉ CSKY for any CS containing s.

Proof. Assume that a CS containing s is a skyline composite service. There is another service s′ ∈ S with s′ ≺ s. Replacing s in CS with s′ yields a new composite service CS′. Since s has no QoS correlations, the replacement does not change the QoS of the rest of CS, so CS′ ≺ CS. This contradicts the assumption.

Example 2 (Lemma 1): s22 should be removed from task2 in Table I, because s22 is not a skyline service and has no QoS correlation with any other service.

Note that in Lemma 1, s can also be a sub composite service (SUBCS for short). For instance, {s13, s23}, {s13, s33}, and {s23, s33} are all sub composite services of {s13, s23, s33}. Lemma 1 does not remove services with QoS correlations, but this does not mean they can never be removed. The following criteria remove correlations between services. Once these correlations are gone, some services can be removed according to Lemma 1.

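The naive approach and Example 1 can be reproduced with a short brute-force sketch over Table I. The sketch works as follows:
- It enumerates all 4 × 4 × 4 = 64 composite services.
- It applies a correlated value whenever the partner service is also selected.
- It sums prices and response times, as in a sequential plan, and keeps the non-dominated results.

Response times are stored in milliseconds so that integer arithmetic avoids floating-point noise. When more than one correlation applies to the same service (only s31 can be in that situation), the sketch takes the lexicographically smallest correlated value. The paper does not specify this case, so this is our assumption. `per_task_skyline_then_compose` is a hypothetical correlation-blind stand-in: it takes the per-task SSKY and then composes, mirroring the behavior Example 1 attributes to BUA. It is not an implementation of BUA.

```python
from itertools import product

# Table I: default QoS as (price in $, response time in ms); integers avoid float noise.
QOS = {
    "s11": (10, 700), "s12": (4, 500), "s13": (5, 300), "s14": (4, 300),
    "s21": (12, 400), "s22": (15, 500), "s23": (12, 400), "s24": (17, 600),
    "s31": (7, 1000), "s32": (8, 700), "s33": (6, 700), "s34": (10, 900),
}
# (affected service, partner) -> correlated QoS, applied when the partner is selected.
CORR = {
    ("s31", "s11"): (6, 1000), ("s31", "s21"): (5, 1000), ("s32", "s21"): (5, 700),
    ("s23", "s12"): (11, 400), ("s23", "s13"): (10, 400), ("s34", "s24"): (5, 900),
}
TASKS = [["s11", "s12", "s13", "s14"], ["s21", "s22", "s23", "s24"], ["s31", "s32", "s33", "s34"]]

def dominates(a, b):
    """a dominates b: no worse in every attribute and strictly better in one (smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def effective_qos(s, selected):
    """QoS of s inside a composition; assumption: if several correlations apply, take the smallest."""
    options = [CORR[(s, p)] for p in selected if (s, p) in CORR]
    return min(options) if options else QOS[s]

def aggregate(cs, use_corr=True):
    """Sequential plan: sum prices and response times over the services in cs."""
    qs = [effective_qos(s, cs) if use_corr else QOS[s] for s in cs]
    return tuple(sum(col) for col in zip(*qs))

def skyline(items, key):
    """Keep the items whose key is not dominated by any other item's key."""
    return [a for a in items if not any(dominates(key(b), key(a)) for b in items)]

def naive_csky():
    """Naive approach: enumerate every composite service, then take the skyline."""
    return skyline(list(product(*TASKS)), key=aggregate)

def per_task_skyline_then_compose():
    """Hypothetical correlation-blind baseline: per-task SSKY on default QoS, then compose."""
    ssky = [skyline(t, key=lambda s: QOS[s]) for t in TASKS]
    return skyline(list(product(*ssky)), key=lambda cs: aggregate(cs, use_corr=False))

print(naive_csky())                     # [('s13', 's23', 's33'), ('s14', 's21', 's32')]
print(per_task_skyline_then_compose())  # [('s14', 's21', 's33'), ('s14', 's23', 's33')]
```

Both skyline composite services aggregate to (21, 1400), that is, ($21, 1.4s). The correlation-blind results aggregate to (22, 1400). Even this toy example needs 64 aggregations, which is exactly the growth that the pruning criteria aim to avoid.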

Lemma 2. Let s1, s2 ∈ S with s1 ∈ SSKY, s2 ∉ SSKY, InSSet(s2) ≠ ∅, and OutSSet(s2) = ∅. If there exists c ∈ C(s2) with s1 ≺ bind(s2, c), then CS ∉ CSKY for any CS containing getCorS(c) and s2.

Proof. Assume that a CS containing s2 and getCorS(c) is a skyline composite service. Replacing s2 in CS with s1 yields a new composite service CS′ with CS′ ≺ CS, which contradicts the assumption.

Lemma 3. Let s1, s2 ∈ S with InSSet(s1) ∩ InSSet(s2) ≠ ∅ and OutSSet(s2) = ∅. Suppose there exist c1 ∈ C(s1) and c2 ∈ C(s2) such that getCorS(c1) = getCorS(c2) ∈ InSSet(s1) ∩ InSSet(s2) and bind(s1, c1) ≺ bind(s2, c2). Then CS ∉ CSKY for any CS containing s2 and the service related to c2.

Proof. The proof is similar to that of Lemma 2.

By Lemma 2 and Lemma 3, we can remove the QoS correlation c between s2 and getCorS(c), because c is useless in these circumstances. These two lemmas thus remove useless correlations from the InSSet point of view. Example 3 and Example 4 show how correlations are removed in these two situations, respectively.

Example 3 (Lemma 2): The correlation c between s11 and s31 in Table I should be removed, because s33 ≺ bind(s31, c). Here ($6, 0.7s) dominates ($6, 1.0s).

Example 4 (Lemma 3): Let c1 be the correlation between s21 and s31, and c2 the correlation between s21 and s32. We should remove c1, because bind(s32, c2) ≺ bind(s31, c1). Here ($5, 0.7s) dominates ($5, 1.0s).

Lemma 4. Let s1, s2 ∈ S with s1 ∈ SSKY, s1 ≺ s2, InSSet(s2) = ∅, OutSSet(s2) ≠ ∅, and isFST(OutSSet(s2)) = true. Then for every s ∈ OutSSet(s2) with composite(s1, s) ≺ composite(s2, s), CS ∉ CSKY for any CS containing s2 and s.

Proof. Assume that a CS containing s2 and s is a skyline composite service. Replacing s2 in CS with s1 yields a new composite service CS′ with CS′ ≺ CS, which contradicts the assumption.

Lemma 5. Let s1, s2 ∈ S satisfy the following conditions:
- OutSSet(s1) ≠ ∅;
- InSSet(s2) = ∅ and OutSSet(s2) ≠ ∅;
- isFST(OutSSet(s2)) = true;
- getToSs(OutSSet(s1)) ∩ getToSs(OutSSet(s2)) ≠ ∅.

Then for every s ∈ OutSSet(s1) ∩ OutSSet(s2) with composite(s1, s) ≺ composite(s2, s), CS ∉ CSKY for any CS containing s2 and s.

Proof. The proof is similar to that of Lemma 4.

By Lemma 4 and Lemma 5, we can remove the QoS correlation c between s2 and s, because c is useless in these circumstances. These two lemmas thus remove useless correlations from the OutSSet point of view. Example 5 and Example 6 show how correlations are removed in these two situations, respectively.

Example 5 (Lemma 4): s23 is a skyline service, and s24 has a correlation c with s34. c should be removed, because composite(s23, s34) ≺ composite(s24, s34). The QoS of these two composite services is ($22, 1.3s) and ($22, 1.5s), respectively.

Example 6 (Lemma 5): Let c1 be the correlation between s12 and s23, and c2 the correlation between s13 and s23. We should remove c1, because composite(s13, s23) ≺ composite(s12, s23). The QoS of composite(s13, s23) is ($15, 0.7s), and that of composite(s12, s23) is ($15, 0.9s).

After pruning according to Lemmas 2 to 5, the correlated QoS values of the remaining correlations cannot be dominated by others. However, some cases still need attention. For instance, s32 is only worth composing with s21 in task2. If we compose another service, such as s23, with s32, we can replace s32 with s33 and obtain a better composite service. To handle such cases, we propose Lemma 6 and Lemma 7, based on Lemmas 1 to 5.

Lemma 6. Let s ∈ S with InSSet(s) ≠ ∅, OutSSet(s) = ∅, and s ∉ SSKY. Then every CS ∈ CSKY that contains s also contains at least one service s′ ∈ InSSet(s).

Proof. Assume that some CS ∈ CSKY contains s but no service in InSSet(s). Since s is not a skyline service in S, there is s′ ∈ S with s′ ≺ s. Replacing s with s′ yields a new composite service CS′ ≺ CS, which contradicts the assumption.

From Lemma 6, if InSSet(s) ≠ ∅, OutSSet(s) = ∅, and s is not a skyline service, then s must be composed with a service in InSSet(s). The default QoS value of s is therefore never used. Example 7 illustrates Lemma 6.

Example 7 (Lemma 6): s32 is not a skyline service, InSSet(s32) ≠ ∅, and OutSSet(s32) = ∅. Therefore s32 must be composed with s21 in task2, and the default QoS value of s32 is never used.

Lemma 7. Let s ∈ S with InSSet(s) = ∅, OutSSet(s) ≠ ∅, and s ∉ SSKY. Then every CS ∈ CSKY that contains s also contains at least one service s′ ∈ OutSSet(s).

Proof. Assume that some CS ∈ CSKY contains s but no service in OutSSet(s). Since s is not a skyline service in S, there is s′ ∈ S with s′ ≺ s. Replacing s with s′ yields a new composite service CS′ ≺ CS, which contradicts the assumption.

Example 8 (Lemma 7): s13 must be composed with s23 in task2, because s13 is not a skyline service, InSSet(s13) = ∅, and OutSSet(s13) ≠ ∅.

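The dominance checks behind Examples 3 to 6 can be recomputed directly from Table I. In the sketch below, `bind` and `composite` are hypothetical helpers modelled on the notation of Table II:
- `bind(s, p)` returns the QoS of s when its correlated partner p is selected, or the default QoS if no such correlation exists.
- `composite(s1, s2)` sums the QoS of both services and applies correlations in either direction.

Response times are in milliseconds, and smaller is better for both attributes.

```python
# Table I: default QoS as (price in $, response time in ms).
QOS = {
    "s11": (10, 700), "s12": (4, 500), "s13": (5, 300), "s14": (4, 300),
    "s21": (12, 400), "s22": (15, 500), "s23": (12, 400), "s24": (17, 600),
    "s31": (7, 1000), "s32": (8, 700), "s33": (6, 700), "s34": (10, 900),
}
# (affected service, partner) -> correlated QoS.
CORR = {
    ("s31", "s11"): (6, 1000), ("s31", "s21"): (5, 1000), ("s32", "s21"): (5, 700),
    ("s23", "s12"): (11, 400), ("s23", "s13"): (10, 400), ("s34", "s24"): (5, 900),
}

def dominates(a, b):
    """Smaller is better in every attribute; strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def bind(s, partner):
    """QoS of s when `partner` is selected (default QoS if they are not correlated)."""
    return CORR.get((s, partner), QOS[s])

def composite(s1, s2):
    """Sequential composition of two services with any correlation between them applied."""
    return tuple(x + y for x, y in zip(bind(s1, s2), bind(s2, s1)))

# Example 3 (Lemma 2): s33 beats s31 even with the s11 discount applied.
print(dominates(QOS["s33"], bind("s31", "s11")))          # True
# Example 4 (Lemma 3): for the shared partner s21, s32's correlated QoS beats s31's.
print(dominates(bind("s32", "s21"), bind("s31", "s21")))  # True
# Example 5 (Lemma 4): the skyline service s23 paired with s34 beats s24 paired with s34.
print(composite("s23", "s34"), composite("s24", "s34"))   # (22, 1300) (22, 1500)
# Example 6 (Lemma 5): s13 paired with s23 beats s12 paired with s23.
print(composite("s13", "s23"), composite("s12", "s23"))   # (15, 700) (15, 900)
```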
B. Algorithm 1: Offline Preprocessing Algorithm of Candidate Services for Each Task (OFPA)

Algorithm 1 runs concurrently on each service broker before any request arrives. Each broker holds many candidate services with similar functionality. The inputs of Algorithm 1 are the services that either have QoS correlations or are skyline services; by Lemma 1, the other services need not be considered. The skyline services can be computed with the BBS algorithm [8], and the services with correlations can be collected while building the R-tree used in BBS. The QoS space is specified by user preferences [13].

The service brokers are independent of each other, and no broker knows the service information held by the others. Algorithm 1 can therefore only remove correlations from the InSSet point of view. That is, it implements Lemma 2 (Lines 3–6) and Lemma 3 (Lines 7–11). In Example 1, Algorithm 1 removes the correlation c1 between s31 and s11 and the correlation c2 between s31 and s21.

However, s11 still remains in task1 even though it no longer has any correlation. This is because the broker that manages s31 does not tell the broker of s11 that the correlation has been removed. Hence, Algorithm 1 stores the removed correlations, and the next subsection shows how they are used to update the affected services in each broker.

The time complexity of Algorithm 1 is O(mnc + (kc)²), where:
- m is the number of skyline services;
- n is the number of services with QoS correlations (excluding skyline services);
- c is the number of correlations per correlated service;
- k is the number of correlated services (including skyline services with correlations).

Algorithm 1 Offline Preprocessing Algorithm of Candidate Services for Each Task (OFPA)
Input: Candidate service set S containing the skyline services and the services with QoS correlations.
Output: The candidate services after pruning, and the removed correlations.
1: DelCorrSet ← ∅
2: for all s1, s2 ∈ S do
3:   if s1 ∈ SSKY & s2 ∉ SSKY & InSSet(s2) ≠ ∅ & OutSSet(s2) = ∅ then
4:     for all c ∈ C(s2) do
5:       if s1 ≺ bind(s2, c) then
6:         remove c and add c to DelCorrSet
7:   else if InSSet(s1) ∩ InSSet(s2) ≠ ∅ & OutSSet(s2) = ∅ then
8:     iSet ← InSSet(s1) ∩ InSSet(s2)
9:     for all c1 ∈ C(s1), c2 ∈ C(s2) do
10:      if getCorS(c1) = getCorS(c2) ∈ iSet & bind(s1, c1) ≺ bind(s2, c2) then
11:        remove c2 and add c2 to DelCorrSet
12:  if InSSet(s2) = ∅ & OutSSet(s2) = ∅ & s2 ∉ SSKY then
13:    remove(s2, S)
14: return S, DelCorrSet

Algorithm 2 Online Preprocessing Algorithm of Candidate Services for Each Task (ONPA)
Input: Service plan SP, candidate service set S of each task in SP, deleted correlation set DelCorrSet.
Output: The candidate services after pruning.
1: l ← length(SP)
2: for Sl to S1 do
3:   remove the correlations in DelCorrSet
4:   remove the correlations related to the tasks not in SP
5:   OFPA(S)
6: for S1 to Sl do
7:   for all s1, s2 ∈ Si do
8:     if InSSet(s2) = ∅ & OutSSet(s2) ≠ ∅ & isFST(OutSSet(s2)) then
9:       for all s ∈ OutSSet(s2) do
10:        if s1 ∈ SSKY & s1 ≺ s2 & composite(s1, s) ≺ composite(s2, s) then
11:          remove the correlation between s2 and s
12:    else if getToSs(OutSSet(s1)) ∩ getToSs(OutSSet(s2)) ≠ ∅ then
13:      for all s ∈ OutSSet(s1) ∩ OutSSet(s2) do
14:        if composite(s1, s) ≺ composite(s2, s) then
15:          remove the correlation between s2 and s
16:    if InSSet(s2) = OutSSet(s2) = ∅ & s2 ∉ SSKY then
17:      remove(s2, Si)
18: return candidate service set S of each task in SP

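Algorithm 1 for a single broker can be sketched compactly. This is a simplification under stated assumptions, not the authors' Java implementation:
- Each service is a hypothetical `Service` record. It holds the default QoS, the correlations C(s) as a map from partner service (its InSSet) to correlated QoS, and the OutSSet.
- Smaller values are better.
- A service with a non-empty OutSSet is skipped as s2, since Lines 3 and 7 both require OutSSet(s2) = ∅.
- Uncorrelated non-skyline services (Lines 12–13) are removed in one pass after the loop.

Run on task3 of Table I, the sketch removes both correlations of s31 and then s31 itself, which matches the discussion of Example 1 above.

```python
from dataclasses import dataclass, field

@dataclass
class Service:
    """Hypothetical per-service record; QoS values are (price in $, response time in ms)."""
    qos: tuple
    corr: dict = field(default_factory=dict)  # C(s): InSSet partner -> correlated QoS
    out: set = field(default_factory=set)     # OutSSet(s)

def dominates(a, b):
    """Smaller is better in every attribute; strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def ofpa(S):
    """Sketch of Algorithm 1 on one task; returns (kept services, removed correlations)."""
    ssky = {i for i, s in S.items()
            if not any(dominates(t.qos, s.qos) for t in S.values())}
    deleted = set()  # DelCorrSet, stored as (affected service, partner) pairs
    for i1, s1 in S.items():
        for i2, s2 in S.items():
            if i1 == i2 or s2.out:  # both lemmas require OutSSet(s2) to be empty
                continue
            if i1 in ssky and i2 not in ssky and s2.corr:
                # Lemma 2: skyline s1 beats s2 even with correlation c applied.
                for partner, cq in list(s2.corr.items()):
                    if dominates(s1.qos, cq):
                        del s2.corr[partner]
                        deleted.add((i2, partner))
            else:
                # Lemma 3: for a shared partner, s1's correlated QoS beats s2's.
                for partner in set(s1.corr) & set(s2.corr):
                    if dominates(s1.corr[partner], s2.corr[partner]):
                        del s2.corr[partner]
                        deleted.add((i2, partner))
    # Lines 12-13 (Lemma 1): drop non-skyline services left without any correlation.
    kept = {i: s for i, s in S.items() if i in ssky or s.corr or s.out}
    return kept, deleted

task3 = {
    "s31": Service((7, 1000), {"s11": (6, 1000), "s21": (5, 1000)}),
    "s32": Service((8, 700), {"s21": (5, 700)}),
    "s33": Service((6, 700)),
    "s34": Service((10, 900), {"s24": (5, 900)}),
}
kept, deleted = ofpa(task3)
print(sorted(kept), sorted(deleted))
# ['s32', 's33', 's34'] [('s31', 's11'), ('s31', 's21')]
```

The sketch mutates the input records in place, so the removed correlations disappear from `task3` as well. The returned `deleted` set plays the role of DelCorrSet, which Algorithm 2 uses to notify the other brokers.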
C. Algorithm 2: Online Preprocessing Algorithm of Candidate Services for Each Task (ONPA)

When a new composition request arrives, the service composer starts running. Before composing, we still need to remove some redundant services, such as s11. Algorithm 2 runs on the service composer broker, which can obtain the information of all services.

First, we remove the correlations deleted by Algorithm 1, together with the correlations related to services that do not belong to the tasks of the SP. Algorithm 1 ignores services whose OutSSet is not empty, so it must be run again after these correlations are removed. Then, we remove the useless correlations from the OutSSet point of view; this part implements Lemma 4 and Lemma 5.

The input of Algorithm 2 consists of a service plan and the output of Algorithm 1 from each related broker. After Algorithms 1 and 2 run, the candidate services in Example 1 are further reduced:
- only s13 and s14 remain in task1;
- only s21 and s23 remain in task2;
- only s32 and s33 remain in task3.

The time complexity of Algorithm 2 is O(tkmc), where:
- t is the number of tasks in the service plan;
- k is the number of correlated services per task;
- m is the number of candidate services remaining after Algorithm 1;
- c is the number of correlations per correlated service.

D. Algorithm 3: Computing Composite Service Skyline Algorithm with QoS Correlation (CCSKYAC)

In this subsection, we use the candidate services obtained from Algorithm 1 and Algorithm 2 to compute the CSKY while considering QoS correlations. The main idea comes from Yu et al. [1]: an expansion lattice and a min-heap are used for progressive enumeration.

Example 9 (Service lattice): Consider a service plan SP = {t1, t2} in which each task contains two services, ti = {si1, si2}. The services of each candidate set are sorted by score. The score of a service is the aggregation of its QoS [6]. With positive qualities q1 to qk and negative qualities qk+1 to qd, the score is computed by formula (1) below. The lattice of SP is shown in Figure 1.

$$\mathrm{score}(s) = \sum_{i=1}^{k} \frac{q_i^{\max} - q_i(s)}{q_i^{\max} - q_i^{\min}} + \sum_{j=k+1}^{d} \frac{q_j(s) - q_j^{\min}}{q_j^{\max} - q_j^{\min}} \qquad (1)$$

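Formula (1) can be sketched in a few lines. We make two assumptions that the formula itself does not state:
- q^min and q^max are taken over the candidate services being ranked;
- an attribute whose values are all equal contributes 0, which avoids division by zero.

The example ranks task1 of Table I, where price and response time are both negative qualities (k = 0).

```python
def score(q, qmin, qmax, k):
    """Formula (1): normalized score, smaller is better.
    Attributes 0..k-1 are positive (larger is better); the rest are negative."""
    total = 0.0
    for i, value in enumerate(q):
        span = qmax[i] - qmin[i]
        if span == 0:
            continue  # assumption: a constant attribute adds nothing
        if i < k:
            total += (qmax[i] - value) / span
        else:
            total += (value - qmin[i]) / span
    return total

def rank(services, k):
    """Sort service ids by score, normalizing over the given services (assumption)."""
    cols = list(zip(*services.values()))
    qmin = [min(c) for c in cols]
    qmax = [max(c) for c in cols]
    return sorted(services, key=lambda sid: score(services[sid], qmin, qmax, k))

# Task1 of Table I as (price in $, response time in ms).
task1 = {"s11": (10, 700), "s12": (4, 500), "s13": (5, 300), "s14": (4, 300)}
print(rank(task1, k=0))  # ['s14', 's13', 's12', 's11']
```

Each term grows monotonically as quality worsens. A service that dominates another is strictly better on at least one attribute, and that attribute necessarily has a nonzero range. The dominating service therefore always receives a strictly smaller score, which is the property the lattice enumeration relies on.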
With the lattice, we always obtain a parent node pn before its child node cn. This matters for two reasons: the score of a parent is smaller than those of its children, and a service with a smaller score cannot be dominated by a service with a higher score [1]. False-positive skyline services are therefore avoided.


Figure 1. Expansion lattice for Example 9. The root is {s11, s21}; its children are {s12, s21} and {s11, s22}, which are both parents of {s12, s22}.

Algorithm 3 Computing Composite Service Skyline Algorithm with QoS Correlation (CCSKYAC)
Input: Service plan SP, candidate service set S of each task in SP.
Output: The composite service skyline.
1: for all S in SP do
2:   decompose(S)
3:   sort(S)
4: CSKY ← S1
5: taskNum ← the number of tasks in SP
6: for i = 2 to taskNum do
7:   CSKY ← ECSKYC(SP, CSKY, Si)
8: return CSKY

Algorithm 4 Enumeration to Get Composite Service Skyline With QoS Correlation (ECSKYC)
Input: Service plan SP, service sets S1, S2.
Output: The sub composite service skyline.
1: Initialize lattice L and parent table PT for S1 and S2
2: sub composite service skyline sCSKY ← ∅
3: min-heap mHeap ← {s11, s21}
4: while mHeap ≠ ∅ do
5:   sub composite service sCS ← pop(mHeap)
6:   if validateComposition(sCS) then
7:     if (∄ sCS′ ∈ sCSKY, sCS′ ≺ sCS) || (OutSSet(sCS) ≠ ∅) then
8:       push(sCS, sCSKY)
9:   cSet ← generateChildren(sCS, L)
10:  for all cCS in cSet do
11:    PT(cCS) ← PT(cCS) − 1
12:    if PT(cCS) = 0 then
13:      push(cCS, mHeap)
14: return sCSKY

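Algorithm 4 relies on a particular enumeration order, built from an expansion lattice, a min-heap, and a parent table. The sketch below shows that order for two score-sorted candidate lists. It is a simplified illustration, not the full algorithm: it omits validateComposition and the skyline test. It only shows that every pair is visited exactly once and in non-decreasing order of combined score. For illustration, we assume that the score of a pair is the sum of the two service scores, and we use integer scores to keep the example exact.

```python
import heapq

def lattice_enumerate(scores1, scores2):
    """Visit index pairs (i, j) of two non-empty, ascending score lists in
    non-decreasing order of scores1[i] + scores2[j]."""
    n1, n2 = len(scores1), len(scores2)
    # Parent table: node (i, j) has parents (i-1, j) and (i, j-1) when they exist.
    pt = {(i, j): (i > 0) + (j > 0) for i in range(n1) for j in range(n2)}
    heap = [(scores1[0] + scores2[0], (0, 0))]  # start from the lattice root
    order = []
    while heap:
        _, (i, j) = heapq.heappop(heap)
        order.append((i, j))
        for child in ((i + 1, j), (i, j + 1)):
            if child in pt:
                pt[child] -= 1
                if pt[child] == 0:  # every parent has been visited
                    ci, cj = child
                    heapq.heappush(heap, (scores1[ci] + scores2[cj], child))
    return order

print(lattice_enumerate([1, 5], [2, 3]))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Use the lattice of Example 9 with the illustrative scores [1, 5] and [2, 3]. Node (1, 1), which corresponds to {s12, s22}, enters the heap only after both of its parents have been popped, so it is never enumerated twice. The heap handles siblings whose relative order is not implied by the lattice.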
However, ordering the siblings of cn by score is not straightforward, because a sibling's score may be either larger or smaller than that of cn. We therefore put sibling nodes in a min-heap so that nodes are retrieved in order. One issue remains: different parent nodes may share the same child, so a child node may be enumerated multiple times. For example, {s12, s22} has two parent nodes, {s12, s21} and {s11, s22}. To avoid this problem, we introduce a parent table PT that records how many parents each child node has. Whenever one of a child's parents is enumerated, the child's count in PT is decremented. The child node can be accessed only when the count reaches zero.

Our method differs from that of Yu et al. [1] in three ways:
- First, our score computation in formula (1) normalizes the QoS attribute values, so that qualities are measured uniformly regardless of their ranges and units. This improves pruning efficiency compared with a method without normalization.
- Second, we remove services based not only on the dominance relationship but also on the other pruning criteria.
- Third, we apply a verification step to check the validity of each composition.

The decompose() function decomposes a service whose InSSet is not empty. If the service is not a skyline service and its OutSSet is empty, we can remove its default QoS value according to Lemma 6.

Example 10 (Decompose): In task2, s23 can be decomposed into two services once the correlation between s12 and s23 has been removed. We denote these services as s23_default and s23_1. Now InSSet(s23_default) = ∅, NoInSSet(s23_default) = {s13}, and InSSet(s23_1) = {s13}.

The validateComposition() function returns false if two conflicting services appear in the same composition. Conflicts are determined based on Lemma 6 and Lemma 7.

V. EVALUATION

In this section, we conduct two sets of experiments to evaluate the effectiveness and the efficiency of our algorithms, respectively. The experiments were run on a Windows 7 (64-bit) machine with an Intel Core i5-3210M CPU at 2.50 GHz and 4.00 GB of RAM. Our algorithms are implemented in Java. The experiments address two research questions:
• RQ1 (effectiveness): Does the pruning process affect the result?
• RQ2 (efficiency): Is the response time of our algorithms lower than that of the algorithms without pruning?

Data Sets. There are no standard data sets with QoS correlations, so we generate correlations among services synthetically. We use two data sets: the QWS³ data set and a synthetically generated data set. In the synthetic data set, quality values are floating-point numbers ranging from 0.2 to 0.9. When a correlation exists between two services, we assume that the quality value of a service depends on the preceding service in the service plan.

A. The effectiveness evaluation

Experiment 1. Verify the accuracy of our algorithms. In this experiment, we evaluate the effectiveness of our methods by checking whether the algorithm computes the CSKY correctly. We use the synthetic data set: the number of QoS qualities is set to 3, the number of tasks in SP is set to 4, and the number of candidate services is varied from 500 to 2000 in steps of 500. For different numbers of candidate services, we compare the accuracy of our algorithm (denoted as CA) with that of BUA considering QoS correlations (denoted as BUA, Figure 2(a)) and of BUA without considering QoS correlations (denoted as NCA, Figure 2(b)). The accuracy rate is computed by formula (2). We first compute the composite service skyline by enumerating all possible composite services and selecting the skyline from them; the result is kept in R1. Then we compute the composite service skyline using each of the other methods and store the result in R2. Because R1 is correct by construction, we take it as the standard and use the proportion of R1 that also appears in R2 as the accuracy rate:

$$\mathrm{AccuracyRate} = \frac{\mathrm{size}(R_1 \cap R_2)}{\mathrm{size}(R_1)} \qquad (2)$$

We can see that only our algorithm computes the CSKY accurately. Moreover, the results of BUA and NCA show that the accuracy rate decreases as the correlation percentage (the percentage of correlated services per task) rises, so the advantage of our method is more obvious when the correlation percentage is large.

3 http://www.uoguelph.ca/~qmahmoud/qws/

Figure 2. The accuracy of our algorithms. (a) CA and BUA with correlations; (b) CA and BUA without correlations. [Accuracy Rate against Correlation Percentage for 500 to 2000 candidate services.]

Figure 3. The size of candidate services after pruning.

Figure 4. The performance of Pruning Algorithm with different correlation percentages. (a) Response time of algorithms using the QWS data set; (b) Response time of algorithms using the synthetic data set. [Response Time (ms) against Correlation Percentage, Algorithm With Pruning vs. Algorithm Without Pruning.]

Figure 5. The size of candidate services after pruning.

Figure 6. The performance of Pruning Algorithm with different quantities of candidate services. (a) Response time of Algorithms 2 and 3; (b) Response time of Algorithm 1.

Figure 7. The performance of Pruning Algorithm compared with BUA. [Response Time (ms) of BUA and CA.]

Experiment 2. Verify the size of candidate services after our pruning algorithms. In this experiment, we evaluate the effectiveness of our pruning methods by measuring the number of candidate services that remain after pruning; these counts are taken after the decompose process. We use the synthetic data set: the number of candidate services is set to 1000 for each task, the number of tasks in SP is set to 4, and the number of qualities is set to 3. The result is shown in Figure 3: the number of candidate services in each task is significantly reduced by the pruning algorithms.

Experiment 3. The performance of Pruning Algorithm with different correlation percentages. In this experiment, we evaluate the efficiency of

our algorithms by comparing the response time with and without the pruning process under different correlation percentages. First, we conduct the experiment on the QWS data set, with the number of tasks in SP set to 4, the number of candidate services set to 300 for each task, the correlation percentage varied from 0.1 to 0.9, and the number of qualities set to 3 (Figure 4(a)). Then we repeat the experiment on the synthetically generated data set with the same settings; the result is shown in Figure 4(b).

B. The efficiency evaluation

Experiment 4. The performance of Pruning Algorithm with different numbers of quality attributes. In this experiment, we analyze the efficiency of our algorithm for different numbers of quality attributes of the candidate services. We use the synthetic data set: the number of tasks is set to 4, the number of candidate services is set to 1000, and the number of qualities is varied from 2 to 4. The result is shown in Figure 5. For each preference length (number of qualities), the number of candidate services grows with the correlation percentage, and for each correlation percentage, it grows with the preference length. Thus both the correlation percentage and the number of qualities affect the algorithms' efficiency.

Experiment 5. The performance of Pruning Algorithm with different quantities of candidate services. In this experiment, we analyze the efficiency of our algorithm for different numbers of candidate services. We use the synthetic data set: the number of qualities is set to 3, the number of tasks is set to 4, and the number of candidate services is varied from 500 to 2000 in steps of 500. Figure 6(a) shows the total response time of Algorithms 2 and 3: for the same correlation percentage, the response time grows with the number of candidate services. Figure 6(b) shows the response time of Algorithm 1, which remains acceptable. Thus the number of candidate services is also a factor that affects the efficiency of our algorithm.

Experiment 6. Performance comparison of Pruning Algorithm and BUA. In this experiment, we compare the response time of our algorithm (denoted as CA) with that of BUA; the result is shown in Figure 7. We use the synthetic data set: the number of QoS qualities is set to 3, the number of tasks in SP is set to 4, and the number of candidate services is set to 500. In BUA, the candidate services at each step are only the skyline services computed without considering correlations, whereas in our algorithm the candidate services include not only the skyline services but also some correlated services. Therefore our algorithm takes more time than BUA, but the extra cost is acceptable and reasonable, given that BUA does not compute the CSKY accurately when QoS correlations exist (Experiment 1).

VI. CONCLUSION

In this paper, we formulate the problem of computing the composite service skyline with QoS correlations. To compute the composite service skyline efficiently, we propose pruning criteria and, based on them, advanced pruning techniques that reduce the search space. We also propose a composite service skyline selection method based on these pruning criteria, combined with a service lattice and a min-heap. Experimental results show the effectiveness and efficiency of our approach. In our current work, we only consider QoS correlations on one QoS dimension. In the future, we plan to extend our work to handle QoS correlations on multiple dimensions.

ACKNOWLEDGMENT

This work is partially supported by the National High-Tech Research and Development Plan of China under Grant No. 2013AA01A213, and by the National Natural Science Foundation of China under Grant Nos. 91318301, 61321491, 61202003, and 61472174.

REFERENCES

[1] Q. Yu and A. Bouguettaya, “Efficient Service Skyline Computation for Composite Service Selection,” IEEE Trans. Knowledge and Data Engineering, 25(4), 2013, pp. 776-789.
[2] M. Alrifai, D. Skoutas, and T. Risse, “Selecting skyline services for QoS-based Web service composition,” in Proc. of WWW, 2010, pp. 11-20.
[3] K. Benouaret, D. Benslimane, and A. HadjAli, “WS-Sky: An Efficient and Flexible Framework for QoS-Aware Web Service Selection,” in Proc. of SCC, 2012, pp. 146-153.
[4] L. Zeng, B. Benatallah, A. H. H. Ngu, M. Dumas, J. Kalagnanam, and H. Chang, “QoS-aware middleware for Web services composition,” IEEE Trans. Software Engineering, 30(5), 2004, pp. 311-327.
[5] T. Yu, Y. Zhang, and K. Lin, “Efficient algorithms for Web services selection with end-to-end QoS constraints,” ACM Trans. Web, 1(1), 2007, pp. 1-26.
[6] M. Alrifai, T. Risse, and W. Nejdl, “A Hybrid Approach for Efficient Web Service Composition with End-to-End QoS Constraints,” ACM Trans. Web, 6(2), 2012, pp. 7-7.
[7] S. Börzsönyi, D. Kossmann, and K. Stocker, “The Skyline Operator,” in Proc. of ICDE, 2001, pp. 421-430.
[8] D. Papadias, Y. Tao, and G. Fu, “An optimal and progressive algorithm for skyline queries,” in Proc. of SIGMOD, 2003, pp. 467-478.
[9] S. Ye, J. Wei, L. Li, and T. Huang, “Service-Correlation Aware Service Selection for Composite Service,” Chinese Journal of Computers, 31(8), 2008, pp. 1383-1397.
[10] L. Barakat, S. Miles, and M. Luck, “Efficient correlation-aware service selection,” in Proc. of ICWS, 2012, pp. 1-8.
[11] S. Deng, H. Wu, D. Hu, and J. L. Zhao, “Service Selection for Composition with QoS Correlations,” IEEE Trans. Services Computing, DOI 10.1109/TSC.2014.2361138.
[12] J. Cardoso, A. P. Sheth, J. A. Miller, J. Arnold, and K. Kochut, “Quality of service for workflows and Web service processes,” Journal of Web Semantics, 1(3), 2004, pp. 281-308.
[13] S. Zhang, W. Dou, and J. Chen, “Selecting Top-k Composite Web Services Using Preference-Aware Dominance Relationship,” in Proc. of ICWS, 2013, pp. 75-82.