A Probabilistic QoS Model and Computation Framework for Web Services-Based Workflows

San-Yih Hwang1,2,*, Haojun Wang2,**, Jaideep Srivastava2, and Raymond A. Paul3

1 Department of Information Management, National Sun Yat-sen University, Kaohsiung 80424, Taiwan
2 Department of Computer Science, University of Minnesota, Minneapolis 55455, USA
{haojun,srivasta}@cs.umn.edu
3 Department of Defense, United States
Abstract. Web services promise to become a key enabling technology for B2B e-commerce. Several languages have been proposed to compose Web services into workflows. The QoS of Web services-based workflows may play an essential role in choosing constituent Web services and in determining service level agreements with their users. In this paper, we identify a set of QoS metrics in the context of Web services and propose a unified probabilistic model for describing the QoS values of (atomic/composite) Web services. In our model, each QoS measure of a Web service is regarded as a discrete random variable with a probability mass function (PMF). We describe a computation framework to derive the QoS values of a Web services-based workflow. Two algorithms are proposed to reduce the sample space size when combining PMFs. The experimental results show that our computation framework is efficient and yields PMFs that are very close to those of the real model.
* San-Yih Hwang was supported in part by a Fulbright Scholarship.
** Haojun Wang was supported in part by the NSF under grant ISS-0308264.

P. Atzeni et al. (Eds.): ER 2004, LNCS 3288, pp. 596–609, 2004. © Springer-Verlag Berlin Heidelberg 2004

1 Introduction

Web services have become a de facto standard for achieving interoperability among business applications over the Internet. In a nutshell, a Web service can be regarded as an abstract data type that comprises a set of operations and data (or message types). Requests to and responses from Web service operations are transmitted through SOAP (Simple Object Access Protocol), which provides XML-based message delivery over an HTTP connection. The existing SOAP protocol uses synchronous RPC for invoking operations in Web services. However, in response to an increasing need to support long-running activities, new proposals have been made to extend SOAP to allow asynchronous message exchange (i.e., requests and responses are not synchronous). One notable proposal is ASAP (Asynchronous Service Access Protocol) [1], which allows the execution of long-running Web service operations,
and also non-blocking Web service invocation, in less reliable environments (e.g., wireless networks). In the following discussion, we use the term Web service to refer to an atomic activity, which may encompass either a single Web service operation (in the case of asynchronous Web services) or a pair of invoke/respond operations (in the case of synchronous Web services), and the term WS-workflow to refer to a workflow composed of a set of Web service invocations threaded into a directed graph.

Several languages have been proposed to compose Web services into workflows. Notable examples include WSFL (Web Services Flow Language) [13] and XLANG (Web Services for Business Process Design) [16]. The ideas of WSFL and XLANG have converged and been superseded by the BPEL4WS (Business Process Execution Language for Web Services) specification [2]. Such Web services-based workflows may subsequently become (composite) Web services, thereby enabling nested WS-workflows. While the syntactic description of Web services can be specified through WSDL (Web Services Description Language), their semantics and quality of service (QoS) are left unspecified.

The concept of QoS has been introduced and extensively studied in computer networks, multimedia systems, and real-time systems. In those fields, QoS was mainly treated as an overload management problem that measures non-functional aspects of the target system, such as timeliness (e.g., message delay ratio) and completeness (e.g., message drop percentage). More recently, the concept of QoS has been finding its way into application specification, especially in describing the level of service provided by a server. Typical QoS metrics at the application level include throughput, response time, cost, reliability, and fidelity [12]. Some work has been devoted to the specification and estimation of workflow QoS [3, 7].
However, previous work on workflow QoS estimation has either focused on the static case (e.g., computing the average or worst-case QoS values) or relied on simulation to compute workflow QoS in a broader context. While the former has limited applicability, the latter requires substantial computation before reaching stable results. In this paper, we propose a probability-based QoS model for Web services and WS-workflows that allows for efficient and accurate QoS estimation. Such an estimation serves as the basis for dealing with the Web services selection problem [11] and the service level agreement (SLA) specification problem [6]. The main contributions of our research are:
1. We identify a set of QoS metrics tailored for Web services and WS-workflows and give an anatomy of these metrics.
2. We propose a probability-based WS-workflow QoS model and its computation framework. This computation framework can be used to compute the QoS of a complete or partial WS-workflow.
3. We explore alternative algorithms for computing the probability distribution functions of WS-workflow QoS, and we compare their efficiency and accuracy.

This paper is organized as follows. In Section 2 we define the QoS model in the context of WS-workflows. In Section 3 we present the QoS computation framework for WS-workflows. In Section 4 we describe algorithms for efficiently computing the QoS values of a WS-workflow. Section 5 presents preliminary results of our performance evaluation. Section 6 reviews related work. Finally, Section 7 concludes this paper and identifies directions for future research.
2 QoS Model for Web Services

2.1 Web Services QoS Metrics

Many workflow-related QoS metrics have been proposed in the literature [3, 7, 9, 11, 12]. Typical categories of QoS metrics include performance (e.g., response time and throughput), resources (e.g., cost and memory/CPU/bandwidth consumption), dependability (e.g., reliability, availability, and time to repair), fidelity, transactional properties (e.g., ACID properties and commit protocols), and security (e.g., confidentiality, non-repudiation, and encryption).

Some of the proposed metrics relate to the system capacity for executing a WS-workflow. For example, metrics that measure the power of servers, such as throughput, memory/CPU/bandwidth consumption, time to repair (TTR), and availability, fall into the category called system-level QoS. However, the capacities of servers for executing Web services (e.g., manpower for manual activities and computing power for automatic activities) are unlikely to be revealed due to autonomy considerations, and may change over time without notification. These metrics might be useful in some workflow contexts, such as intra-organizational workflows (for determining the amount of resources to spend on executing workflows). For inter-organizational workflows, where a needed Web service may be controlled by another organization, QoS metrics in this category generally cannot be measured, and are thus excluded from further discussion.

Other QoS metrics require all instances of the same Web service to share the same values. In this case, it is better to view these metrics as service classes rather than quality of service. Metrics of service class include those categorized as transactional properties and security. In this paper we focus on the WS-workflow QoS metrics that measure a WS-workflow instance and whose values may change across instances. These metrics, called instance-level QoS metrics, include response time, cost, reliability, and fidelity rating.
Note that cost is a complicated metric and could be a function of service class and/or other QoS values. For example, a Web service instance that imposes weaker security requirements or incurs a longer execution time might be entitled to a lower cost. Some services may adopt a different pricing scheme that charges based on factors other than usage (e.g., a membership fee or monthly fee). In this paper, we consider the pay-per-service pricing scheme, which allows us to include cost as an instance-level QoS metric. In summary, our work considers four metrics: Response time (i.e., the time elapsed from the submission of a request to the receipt of the response), Reliability (i.e., the probability that the service is successfully completed), Fidelity (i.e., reputation rating), and Cost (i.e., the amount of money paid for executing an activity). These metrics are equally applicable to atomic Web services and to WS-workflows (also called composite Web services), and they are defined such that different instances of the same Web service may have different QoS values.
2.2 Probabilistic Modeling of Web Services QoS

We use a probability model for describing Web service QoS. In particular, we use a probability mass function (PMF) on a finite scalar domain as the QoS probability model. In other words, each QoS metric of a Web service is viewed as a discrete random variable, and the PMF gives the probability that the QoS metric assumes a particular value. For example, the fidelity $F$ of a Web service rated on five grades (1-5) may have the following PMF:

$$f_F(1) = 0.1,\quad f_F(2) = 0.2,\quad f_F(3) = 0.3,\quad f_F(4) = 0.3,\quad f_F(5) = 0.1$$

It is natural to describe Reliability, Fidelity rating, and Cost as random variables and to model them as PMFs whose domains are {0 (fail), 1 (success)}, a set of distinct ratings, and a set of possible costs, respectively. However, it is less intuitive to use a PMF to describe response time, whose domain is inherently continuous. By viewing response time at a coarser granularity, it is possible to model response time as a discrete random variable. Specifically, we partition the range of response time into a finite sequence of sub-intervals and use a representative number (e.g., the mean) to stand for each sub-interval. For example, suppose that the probabilities of a Web service being completed in one day, two to four days, and five to seven days are 0.2, 0.6, and 0.2, respectively. The PMF of its response time $X$ is then

$$f_X(1) = 0.2,\quad f_X(3) = 0.6\ (3 \text{ is the mean of } [2, 4]),\quad f_X(6) = 0.2\ (6 \text{ is the mean of } [5, 7])$$

As expected, a finer granularity of response time yields a more accurate estimation at the price of higher overhead in representation and computation. We explore these tradeoffs in our experiments.

2.3 WS-Workflow Composition

For an atomic Web service, its QoS PMFs can be derived from its past invocation records. For a newly developed WS-workflow composed of a set of atomic Web services, we need a way to determine its QoS PMFs.
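As a minimal sketch of how the PMFs of Section 2.2 might be estimated from past invocation records, the Python below stores a PMF as a dictionary mapping values to probabilities. The helper names `empirical_pmf` and `binned_pmf` are hypothetical, and the ten records are invented for illustration. `binned_pmf` represents each sub-interval by its midpoint, which coincides with the interval means used in the response-time example of Section 2.2; it assumes the bins cover every sample.

```python
from collections import Counter

def empirical_pmf(samples):
    """Relative frequency of each observed value (e.g. fidelity ratings)."""
    counts = Counter(samples)
    n = len(samples)
    return {value: c / n for value, c in sorted(counts.items())}

def binned_pmf(samples, bins):
    """Discretize continuous observations such as response times in days.

    `bins` is a list of inclusive (lo, hi) sub-intervals assumed to cover all
    samples; each sub-interval is represented by its midpoint (lo + hi) / 2.
    """
    n = len(samples)
    pmf = {}
    for lo, hi in bins:
        k = sum(1 for s in samples if lo <= s <= hi)
        if k:
            rep = (lo + hi) / 2
            pmf[rep] = pmf.get(rep, 0.0) + k / n
    return pmf

# Ten hypothetical invocation records.
ratings = [1, 2, 2, 3, 3, 3, 4, 4, 4, 5]
days = [1, 1, 2, 3, 3, 4, 4, 4, 5, 7]
print(empirical_pmf(ratings))                      # {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.3, 5: 0.1}
print(binned_pmf(days, [(1, 1), (2, 4), (5, 7)]))  # {1.0: 0.2, 3.0: 0.6, 6.0: 0.2}
```

With these records the two calls reproduce the fidelity and response-time PMFs of the Section 2.2 examples. Narrower bins would enlarge the domain in exchange for a closer approximation.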
Different workflow composition languages may provide different constructs for specifying the control flow among constituent activities (e.g., see [14, 15] for a comparison of the expressive power of various workflow and Web services composition languages). Kiepuszewski et al. [8] define a structured workflow model that consists of only four constructs (sequential, or-split/or-join, and-split/and-join, and loop) and allows for the recursive construction of larger workflows. Although it has been shown that this structured workflow model is unable to model arbitrary workflows [8], it is nevertheless powerful enough to describe many real-world workflows. In fact, some commercial workflow systems, such as SAP R/3 and FileNet Visual WorkFlo, support only structured workflows. In this paper, as an initial step of the study, we focus our attention on structured workflows. To distinguish between exclusive OR and (multiple-choice) OR, which is crucial in deriving WS-workflow QoS, we extend the structured workflow model to include five constructs:
1. sequential: a sequence of activities (a1, a2, …, an).
2. parallel (and-split/and-join): multiple activities (a1, a2, …, an) that can be executed concurrently and are merged with synchronization.
3. conditional (exclusive split/exclusive join): multiple activities (a1, a2, …, an), among which exactly one activity is executed.
4. fault-tolerant (and-split/exclusive join): multiple activities (a1, a2, …, an) that can be executed concurrently but are merged without synchronization.
5. loop: a block of activities a guarded by a condition "LC". We adopt the while loop in the following discussion.
3 Computing QoS Values of WS Compositions

We now describe how to compute the WS-workflow QoS values for each composition construct introduced earlier. We identify five basic operations for manipulating random variables, namely (i) addition, (ii) multiplication, (iii) maximum, (iv) minimum, and (v) conditional selection. Each of these operations takes as input a number of random variables characterized by PMFs and produces a random variable characterized by another PMF. The first four operations are quite straightforward, and their detailed descriptions are omitted here due to space limitations; interested readers are referred to [5] for their formal definitions. The conditional selection, denoted $\mathrm{CS}_{1 \le i \le n}(X_i, p_i)$, is defined as follows¹. Let $X_1, X_2, \ldots, X_n$ be $n$ random variables, with $p_i$, $1 \le i \le n$, being the probability that $X_i$ is selected by the conditional selection operation CS. Note that the selection of any random variable is exclusive, i.e., exactly one of them is selected, so $\sum_{i=1}^{n} p_i = 1$. The result of $\mathrm{CS}_{1 \le i \le n}(X_i, p_i)$ is a new random variable $Z$ with $\mathrm{Dom}(Z) = \bigcup_{1 \le i \le n} \mathrm{Dom}(X_i)$. Specifically, the PMF $f_Z$ of $Z$ is

$$f_Z(z) = \sum_{j:\ z \in \mathrm{Dom}(X_j)} p_j \cdot f_{X_j}(z), \qquad z \in \mathrm{Dom}(Z).$$
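The definition of CS translates almost directly into code. The Python sketch below (PMFs as dictionaries; the function names are our own) implements CS and, for contrast, the PMF of a weighted sum $\sum_i w_i X_i$. The weighted sum is computed under an independence assumption that we add here; it is the operation footnote 1 warns should not be confused with CS, although Table 1 does use weighted sums for sequential and parallel fidelity.

```python
from itertools import product

def conditional_selection(pmfs, probs):
    """CS(X_i, p_i): Z takes the value of X_i with probability p_i.

    Dom(Z) is the union of the Dom(X_i), and
    f_Z(z) = sum over j with z in Dom(X_j) of p_j * f_{X_j}(z).
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("exactly one X_i is selected, so the p_i must sum to 1")
    f_z = {}
    for f_x, p in zip(pmfs, probs):
        for z, pz in f_x.items():
            f_z[z] = f_z.get(z, 0.0) + p * pz
    return f_z

def weighted_sum(pmfs, weights):
    """PMF of sum_i w_i * X_i, assuming the X_i are independent."""
    f_s = {}
    for combo in product(*(f.items() for f in pmfs)):
        value = sum(w * x for w, (x, _) in zip(weights, combo))
        prob = 1.0
        for _, px in combo:
            prob *= px
        f_s[value] = f_s.get(value, 0.0) + prob
    return f_s

x1, x2 = {1: 1.0}, {3: 1.0}
print(conditional_selection([x1, x2], [0.5, 0.5]))  # {1: 0.5, 3: 0.5}
print(weighted_sum([x1, x2], [0.5, 0.5]))           # {2.0: 1.0}
```

CS keeps the two original values 1 and 3, whereas the weighted sum collapses them into the single value 2, which lies outside both input domains.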
For each activity $a$, we consider four QoS metrics, namely response time, cost, reliability, and fidelity, denoted $T(a)$, $C(a)$, $R(a)$, and $F(a)$, respectively². A WS-workflow composed of activities $a_1, a_2, \ldots, a_n$ using some composition construct is denoted $w(a_1, a_2, \ldots, a_n)$. The QoS values of $w$ under the various composition constructs are shown in Table 1. We assume that the fidelity of $w$ under sequential or parallel composition is a weighted sum of the fidelities of its constituent activities. The fidelity weight of each activity can be either manually assigned by the designer or automatically derived from past history, e.g., by using linear regression. For the conditional construct, exactly one activity is selected at run time. Thus, the fidelity of $w$ is the conditional selection of the fidelities of its constituent activities with the associated probabilities. For the fault-tolerant construct, the fidelity of the activity that completes first becomes the fidelity of $w$. Thus,

$$F(w) = \mathrm{CS}_{1 \le i \le n}\big(F(a_i), p_f(a_i)\big), \quad \text{where } p_f(a_i) = \prod_{k \ne i} P\big(T(a_k) > T(a_i)\big).$$

¹ Do not confuse the conditional selection with the weighted sum $\sum_i p_i \cdot X_i$. The weighted sum results in a random variable whose domain may not be the union of the domains of the constituent activities. While the weighted sum is used for computing the average of a set of scalar values, it should not be used to compute the PMF resulting from the conditional selection of a set of random variables.
² Note that each QoS metric of an activity is NOT a scalar value but a discrete random variable characterized by a PMF.
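The fault-tolerant rules of Table 1 can be sketched as follows, assuming independent branches and PMFs stored as dictionaries (all names are hypothetical). `first_finisher_weights` evaluates the product formula for $p_f(a_i)$ given above and then renormalizes the weights so they can serve as CS probabilities. The renormalization is our own addition: ties, and with more than two branches the dependence of the pairwise events through $T(a_i)$, mean the raw products may not sum to one.

```python
from itertools import product

def prob_greater(f_a, f_b):
    """P(A > B) for independent A ~ f_a and B ~ f_b."""
    return sum(pa * pb
               for (a, pa), (b, pb) in product(f_a.items(), f_b.items())
               if a > b)

def first_finisher_weights(time_pmfs):
    """p_f(a_i) = prod_{k != i} P(T(a_k) > T(a_i)), renormalized to sum to 1."""
    raw = []
    for i, t_i in enumerate(time_pmfs):
        p = 1.0
        for k, t_k in enumerate(time_pmfs):
            if k != i:
                p *= prob_greater(t_k, t_i)
        raw.append(p)
    total = sum(raw)
    if total == 0.0:  # e.g. all branches always tie: split evenly (our choice)
        return [1.0 / len(raw)] * len(raw)
    return [p / total for p in raw]

def conditional_selection(pmfs, probs):
    """CS(X_i, p_i) over PMF dictionaries."""
    f_z = {}
    for f_x, p in zip(pmfs, probs):
        for z, pz in f_x.items():
            f_z[z] = f_z.get(z, 0.0) + p * pz
    return f_z

def fault_tolerant_fidelity(time_pmfs, fidelity_pmfs):
    """F(w) = CS(F(a_i), p_f(a_i))."""
    return conditional_selection(fidelity_pmfs, first_finisher_weights(time_pmfs))

def fault_tolerant_reliability(rel_pmfs):
    """R(w) = 1 - prod(1 - R(a_i)) for independent PMFs over {0, 1}."""
    all_fail = 1.0
    for f in rel_pmfs:
        all_fail *= f.get(0, 0.0)  # f_{1-R}(1) = f_R(0)
    return {0: all_fail, 1: 1.0 - all_fail}

# Hypothetical Email/Phone notification branches.
email_t, phone_t = {1: 0.5, 3: 0.5}, {2: 1.0}
print(first_finisher_weights([email_t, phone_t]))                          # [0.5, 0.5]
print(fault_tolerant_fidelity([email_t, phone_t], [{5: 1.0}, {3: 1.0}]))  # {5: 0.5, 3: 0.5}
```

With two branches and no ties, as here, the raw weights already sum to one and the renormalization has no effect.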
A loop construct is defined as a repetition of a block guarded by a condition "LC", i.e., the block is repeatedly executed until the condition "LC" no longer holds. Cardoso et al. assumed a geometric distribution on the number of iterations [3]. However, the memoryless property of the geometric distribution fails to capture a common phenomenon: a block that has already been executed repeatedly usually has a better chance of exiting the loop. Gillmann et al. [7] assumed the number of iterations to be uniformly distributed, which again may not hold in many applications. In this paper, rather than assuming a particular distribution, we simply describe the number of iterations by a PMF with a finite scalar domain. Let $f_{L(a)}(l)$, $0 \le l \le c$, be the PMF of the number of iterations of a loop structure $L$ defined on a block $a$, where $c$ is the maximum number of iterations. Let $T(a)$, $C(a)$, $R(a)$, and $F(a)$ denote the PMFs of the response time, cost, reliability, and fidelity of $a$, respectively. If $a$ is executed $l$ times, the response time $T_a(l)$ is

$$T_a(l) = \sum_{1 \le i \le l} T(a).$$

The response time of $L$ is the conditional selection on $T_a(l)$ with probabilities $f_{L(a)}(l)$, $0 \le l \le c$. Thus, the response time of $L$ is

$$T(L) = \mathrm{CS}_{0 \le l \le c}\big(T_a(l), f_{L(a)}(l)\big).$$

Similar arguments can be applied to the computation of cost and reliability. Regarding fidelity, let $p_T$ be the probability of executing at least one iteration (i.e., $p_T = 1 - f_{L(a)}(0)$) and $p_F = 1 - p_T$. When $a$ is executed at least once, the fidelity of a loop structure, in our view, is determined simply by the last execution of $a$. Let $F_a(T)$ denote the fidelity when $a$ is executed at least once (i.e., $F_a(T) = F(a)$) and $F_a(F)$ the fidelity when $a$ is not executed. The fidelity of $L$ is therefore computed as follows:

$$F(L) = \mathrm{CS}_{i \in \{F, T\}}\big(F_a(i), p_i\big).$$
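A minimal sketch of the loop rules follows, assuming that the iterations of block $a$ are independent and that zero iterations take no time and cannot fail ($T_a(0) = 0$ and $R_a(0) = 1$, the empty sum and empty product). `loop_time`, `loop_reliability`, and `loop_fidelity` are hypothetical names, and the argument `skipped_fid` stands for the PMF $F_a(F)$, which the caller must supply.

```python
def add_pmf(f, g):
    """PMF of X + Y for independent X ~ f and Y ~ g (the addition operation)."""
    out = {}
    for x, px in f.items():
        for y, py in g.items():
            out[x + y] = out.get(x + y, 0.0) + px * py
    return out

def loop_time(body_time, iter_pmf):
    """T(L) = CS over 0 <= l <= c of (T_a(l), f_L(a)(l)), T_a(l) = l-fold sum of T(a)."""
    result = {}
    t_l = {0: 1.0}  # T_a(0): an empty sum, i.e. zero time
    for l in range(max(iter_pmf) + 1):
        if l > 0:
            t_l = add_pmf(t_l, body_time)  # T_a(l) = T_a(l-1) + T(a)
        w = iter_pmf.get(l, 0.0)
        if w == 0.0:
            continue
        for t, pt in t_l.items():
            result[t] = result.get(t, 0.0) + w * pt
    return result

def loop_reliability(body_rel, iter_pmf):
    """R(L) = CS(R_a(l), f_L(a)(l)); l runs succeed only if all succeed."""
    r = body_rel.get(1, 0.0)
    success = sum(p * r ** l for l, p in iter_pmf.items())
    return {0: 1.0 - success, 1: success}

def loop_fidelity(body_fid, skipped_fid, iter_pmf):
    """F(L) = CS over i in {F, T} of (F_a(i), p_i), with p_F = f_L(a)(0)."""
    p_false = iter_pmf.get(0, 0.0)
    out = {}
    for f, p in ((skipped_fid, p_false), (body_fid, 1.0 - p_false)):
        if p == 0.0:
            continue
        for v, pv in f.items():
            out[v] = out.get(v, 0.0) + p * pv
    return out

print(loop_time({1: 0.5, 2: 0.5}, {1: 0.5, 2: 0.5}))
# {1: 0.25, 2: 0.375, 3: 0.25, 4: 0.125}
```

Building $T_a(l)$ incrementally from $T_a(l-1)$ needs one addition per iteration count instead of recomputing each $l$-fold sum from scratch.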
4 Efficient Computation of WS-Workflow QoS

4.1 High Level Algorithm

A structured WS-workflow can be constructed recursively by using the five basic constructs. Figure 1 shows an example WS-workflow, namely PC order fulfillment, which is designed to build customized personal computers and deliver them at a customer's request. At the highest level, the WS-workflow is a sequential construct that consists of Parts procurement, Assembly, Test, Adjustment, Shipping, and Customer notification. Parts procurement is a parallel construct that comprises CPU procurement, HDD procurement, and CD-ROM procurement. CPU procurement, in turn, is a conditional construct composed of Intel CPU procurement and AMD CPU procurement. Adjustment is a loop construct on Fix&Test, which is iteratively executed until the quality of the PC is ensured. Customer notification is a fault-tolerant construct that consists of Email notification and Phone notification; the success of either notification marks the completion of the entire WS-workflow.

Table 1. The QoS values of a WS-workflow w under various composition constructs

$$
\begin{array}{l|l|l|l|l}
\text{Construct} & \text{Response time } T(w) & \text{Cost } C(w) & \text{Reliability } R(w) & \text{Fidelity } F(w) \\
\hline
\text{Sequential} & \sum_{i=1}^{n} T(a_i) & \sum_{i=1}^{n} C(a_i) & \prod_{i=1}^{n} R(a_i) & \sum_{i=1}^{n} w_i F(a_i) \\
\text{Parallel} & \max_{1 \le i \le n} \{T(a_i)\} & \sum_{i=1}^{n} C(a_i) & \prod_{i=1}^{n} R(a_i) & \sum_{i=1}^{n} w_i F(a_i) \\
\text{Conditional} & \mathrm{CS}_{1 \le i \le n}(T(a_i), p_i) & \mathrm{CS}_{1 \le i \le n}(C(a_i), p_i) & \mathrm{CS}_{1 \le i \le n}(R(a_i), p_i) & \mathrm{CS}_{1 \le i \le n}(F(a_i), p_i) \\
\text{Fault-tolerant} & \min_{1 \le i \le n} \{T(a_i)\} & \sum_{i=1}^{n} C(a_i) & 1 - \prod_{i=1}^{n} (1 - R(a_i)) & \mathrm{CS}_{1 \le i \le n}(F(a_i), p_f(a_i)) \\
\text{Loop} & \mathrm{CS}_{0 \le l \le c}(T_a(l), f_{L(a)}(l)) & \mathrm{CS}_{0 \le l \le c}(C_a(l), f_{L(a)}(l)) & \mathrm{CS}_{0 \le l \le c}(R_a(l), f_{L(a)}(l)) & \mathrm{CS}_{i \in \{F, T\}}(F_a(i), p_i)
\end{array}
$$

In the fault-tolerant row, $f_{1-R}(0) = f_R(1)$ and $f_{1-R}(1) = f_R(0)$, and $p_f(a_i) = \prod_{k \ne i} P(T(a_k) > T(a_i))$. In the loop row, $T_a(l) = \sum_{1 \le i \le l} T(a)$, $C_a(l) = \sum_{1 \le i \le l} C(a)$, $R_a(l) = \prod_{1 \le i \le l} R(a)$, $F_a(T) = F(a)$, and $F_a(F)$ is the fidelity when $a$ is not executed.

[Figure: PC order fulfillment as a sequence of Parts procurement - Parallel (CPU procurement - Conditional: Intel CPU Proc / AMD CPU Proc; HDD Proc.; CD-ROM Proc), Assembly, Test, Adjustment - Loop (Fix&Test, with an OK? Y/N decision), Shipping, and Customer notification - Fault-tolerant (Email Notification / Phone Notification)]
Fig. 1. An example WS-workflow PC order fulfillment

The QoS of the entire WS-workflow can be computed recursively; the pseudocode is listed in Figure 2. SequentialQoS(A.activities), ParallelQoS(A.activities), ConditionalQoS(A.activities), FaultTolerantQoS(A.activities), and LoopQoS(A.activities) compute the four QoS metric values for the sequential, parallel, conditional, fault-tolerant, and loop constructs, respectively. Their pseudocode follows directly from the discussion in Section 3 and is omitted here for brevity.

ComputeQoS(A: a WS-workflow activity) {
    IF A.type ≠ ATOMIC THEN {
        FOR (each activity t ∈ A.activities) DO
            ComputeQoS(t);
        IF A.construct = SEQUENTIAL THEN
            A.QoS = SequentialQoS(A.activities);
        ELSEIF A.construct = PARALLEL THEN
            A.QoS = ParallelQoS(A.activities);
        ELSEIF A.construct = CONDITIONAL THEN
            A.QoS = ConditionalQoS(A.activities);
        ELSEIF A.construct = FAULT_TOLERANT THEN
            A.QoS = FaultTolerantQoS(A.activities);
        ELSE // A.construct = LOOP
            A.QoS = LoopQoS(A.activities);
    }
    ELSE
        Estimate the QoSs of A and put them in A.QoS;
}
Fig. 2. Pseudo code for computing QoS of a WS-workflow

4.2 Sample Space Reduction

When PMFs of discrete random variables are combined under a given operation, the sample space of the resulting random variable may become huge. Consider adding $k$ discrete random variables, each having $n$ elements in its domain. In the worst case, the sample space size of the resulting random variable is of the order of $n^k$. To keep the domain of a PMF at a reasonable size after each operation, we propose to group the elements of the sample space. In other words, several consecutive scalar values in the sample space are represented by a single value, and their aggregated probability is computed. The problem is formally described below. Let the domain of $X$ be $\{x_1, x_2, \ldots, x_s\}$, where $x_i < x_{i+1}$, $1 \le i < s$, and let the PMF of $X$ be $f_X$. We call another random variable $Y$ an aggregate random variable of $X$ if there exists a partition $(j_0, j_1, j_2, \ldots, j_m)$ of $(x_1, x_2, \ldots, x_s)$, where 1=j01 and pair_error(x’, xi+2) if i
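A runnable, response-time-only analogue of the recursive ComputeQoS procedure in Fig. 2 is sketched below in Python; cost, reliability, and fidelity would follow the other columns of Table 1 in the same way. The dictionary node layout and all function names are hypothetical, constituent response times are assumed independent, and the parameter `m` caps the domain size after each operation. The `aggregate` step illustrates the grouping idea of Section 4.2 with a simple equal-count partition whose groups are represented by their probability-weighted means; it is not either of the two reduction algorithms proposed in this paper.

```python
from itertools import product

# PMFs are dicts {value: probability}; all constituents are assumed independent.

def combine(f, g, op):
    """PMF of op(X, Y): add (sequential), max (parallel) or min (fault-tolerant)."""
    out = {}
    for (x, px), (y, py) in product(f.items(), g.items()):
        z = op(x, y)
        out[z] = out.get(z, 0.0) + px * py
    return out

def cs(pmfs, probs):
    """Conditional selection CS(X_i, p_i)."""
    out = {}
    for f, p in zip(pmfs, probs):
        for z, pz in f.items():
            out[z] = out.get(z, 0.0) + p * pz
    return out

def aggregate(f, m):
    """Illustrative grouping: merge consecutive values into at most m groups,
    each represented by its probability-weighted mean (keeps E[X] unchanged)."""
    xs = sorted(f)
    if len(xs) <= m:
        return dict(f)
    size = -(-len(xs) // m)  # ceiling division
    out = {}
    for start in range(0, len(xs), size):
        group = xs[start:start + size]
        p = sum(f[x] for x in group)
        rep = sum(x * f[x] for x in group) / p if p > 0 else group[0]
        out[rep] = out.get(rep, 0.0) + p
    return out

def add(a, b):
    return a + b

def response_time(node, m=50):
    """Recursive analogue of ComputeQoS (Fig. 2), restricted to response time."""
    kind = node["type"]
    if kind == "atomic":
        return node["T"]                     # estimated from past invocations
    if kind == "loop":
        body = response_time(node["body"], m)
        iters = node["iters"]                # PMF of the iteration count, 0..c
        t_l, branches, probs = {0: 1.0}, [], []
        for l in range(max(iters) + 1):
            if l > 0:
                t_l = aggregate(combine(t_l, body, add), m)
            if iters.get(l, 0.0) > 0:
                branches.append(t_l)
                probs.append(iters[l])
        return aggregate(cs(branches, probs), m)
    kids = [response_time(child, m) for child in node["children"]]
    if kind == "conditional":
        return aggregate(cs(kids, node["probs"]), m)
    op = {"sequential": add, "parallel": max, "fault_tolerant": min}[kind]
    result = kids[0]
    for k in kids[1:]:
        result = aggregate(combine(result, k, op), m)
    return result

def leaf(t):
    return {"type": "atomic", "T": t}

wf = {"type": "sequential", "children": [
    {"type": "parallel", "children": [leaf({1: 0.5, 2: 0.5}), leaf({1: 0.5, 2: 0.5})]},
    {"type": "fault_tolerant", "children": [leaf({1: 0.5, 3: 0.5}), leaf({2: 1.0})]},
]}
print(response_time(wf))  # {2: 0.125, 3: 0.5, 4: 0.375}
```

Using the probability-weighted mean as each group's representative preserves the expected value of the PMF exactly; other statistics, such as the variance, are only approximated.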