
Supporting the Dynamic Evolution of Web Service Protocols in Service-Oriented Architectures

SEUNG HWAN RYU, University of New South Wales, Australia
FABIO CASATI, University of Trento, Italy
HALVARD SKOGSRUD, ThoughtWorks, Australia
BOUALEM BENATALLAH, University of New South Wales, Australia
RÉGIS SAINT-PAUL, University of New South Wales, Australia

In Service-Oriented Architectures, everything is a service and everyone is a service provider. Web services (or simply services) are loosely coupled software components that are published, discovered, and invoked across the Web. As the use of Web services grows, interacting correctly with them requires understanding their business protocols, which tell clients how to interact with a service. In dynamic Web service environments, service providers must constantly adapt their business protocols to reflect the restrictions and requirements introduced by new applications, new business strategies, and new laws, or to fix problems found in the protocol definition. Managing such protocol evolution effectively raises critical problems; one of the most critical is how to handle instances that are running under the old protocol when it changes. Simple solutions, such as aborting these instances or letting them continue under the old protocol, are often inapplicable (e.g., because of the loss of work already done or the critical nature of the work). In this paper, we present a framework that supports service managers in managing business protocol evolution. It provides several protocol change impact analyses that automatically determine which ongoing instances can be migrated to the new protocol version, and data mining techniques that infer interaction patterns used to classify ongoing instances as migrateable to the new protocol. To support the protocol evolution process, we have also developed database-backed GUI tools on top of our existing system. The proposed approach and tools help service managers manage the evolution of ongoing instances when the business protocols of the services with which they interact have changed.

Author’s address: S. H. Ryu, B. Benatallah, R. Saint-Paul, CSE, University of New South Wales, Sydney NSW 2052, Australia; e-mail: {seungr,boualem,regiss}@cse.unsw.edu.au; F. Casati, DIT, University of Trento, Via Sommarive 14, 38050, Povo (Trento), Italy; e-mail: [email protected]; H. Skogsrud, ThoughtWorks, 16 O’Connell Street, Sydney, NSW 2000, Australia; e-mail: [email protected]

Permission to make digital/hard copy of all or part of this material without fee for personal or classroom use provided that the copies are not made or distributed for profit or commercial advantage, the ACM copyright/server notice, the title of the publication, and its date appear, and notice is given that copying is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists requires prior specific permission and/or a fee.
© 2007 ACM 1529-3785/2007/0700-0001 $5.00
ACM Transactions on the Web, Vol. V, No. N, November 2007, Pages 1–43.


Categories and Subject Descriptors: D.2.7 [Software Engineering]: Distribution, Maintenance, and Enhancement; H.3.5 [Information Storage and Retrieval]: Online Information Services
General Terms: Management
Additional Key Words and Phrases: Business protocols, Change impact analysis, Dynamic evolution, Ongoing instances, Web services, Decision trees

1. INTRODUCTION

Web services, and more generally service-oriented architectures (SOAs), are quickly becoming the preferred choice for distributed application development and application integration. Web service interfaces today are described using the Web Services Description Language (WSDL). Besides interfaces, business protocols are rapidly gaining recognition as a necessary part of the service description [Benatallah et al. 2006]. A business protocol¹ for a service specifies the sequence of messages that a service and its clients exchange to achieve a certain business goal [Alonso et al. 2004], e.g., booking flight tickets. Business protocols play an important role in Web service environments. They tell developers how to write clients that correctly interact with a given service, and they allow development tools and runtime middleware to deliver functionality that simplifies the service development lifecycle (e.g., automatically generating code skeletons) [Benatallah et al. 2004b].

One of the main motivations behind the adoption of SOA is the need for dynamic applications, which can be quickly adapted to changes in business needs and/or regulations. Correspondingly, services must be able to evolve, and the impact of such business-driven evolution must be minimized in terms of i) the development effort needed to implement the changes and ii) the disruption to the services provided to clients while the change is applied. When a service changes, its externally visible behavior, and in particular the protocols on which services base their interactions, can also evolve.

As an example scenario, used for illustration throughout this paper, consider a service providing working visas in Australia. The immigration department is the service provider, and tens of thousands of protocol instances (conversations) are active at any given time.
The completion of the entire service protocol, corresponding to the approval or rejection of the work permit, takes months. At a certain point, changes in immigration laws (quite frequent these days) may require changes in the service protocol. For example, new documents must be provided by the applicant, or the order in which documents have to be provided is modified. The dynamic protocol evolution problem is that of managing the ongoing conversations in the context of a protocol evolution.

¹ In this paper we use “business protocol”, “protocol”, and “Web service protocol” interchangeably.

Dynamic protocol evolution is an important and challenging problem. From a “business continuity” perspective, in most cases we cannot abort all conversations and ask clients (users) to restart the service invocation from the beginning. In the visa example, we cannot ask immigrants to repeat the application and begin from scratch by resubmitting all the documents. In addition, continuing and completing ongoing conversations under the old protocol may be unacceptable, because the changes are introduced for a reason (e.g., a digital passport must now be submitted as well) and such conversations would not comply with the new regulation and the required protocol changes. Hence, the problem lies in finding an acceptable “middle ground” for each active conversation, especially when the number of such conversations is so large that individual, manual analysis is infeasible.

In this paper we propose a set of methods and a tool for managing dynamic protocol evolution. The end goal is to provide techniques for understanding which conversations are affected by a protocol evolution and, for those that are, for facilitating the definition and enactment of criteria for managing them. We are not concerned with how a protocol is changed, or with whether the protocol changes are syntactically correct. Rather, we focus on making ongoing conversations dynamically comply with protocol changes. In particular, this paper makes the following contributions:

—It presents a method to automatically classify conversations based on the impact that protocol evolution has on them. We do so by studying two properties, called forward and backward protocol compatibility. These properties serve as requirements for determining which conversations can be migrated to a new protocol when an old protocol has been changed. Migrating a conversation to a protocol P means that, in the future, the conversation will have to obey the rules and constraints of protocol P. Then, based on these properties, we define operators for analyzing how protocol changes impact ongoing conversations and for classifying them into migrateable and non-migrateable conversations (Section 3).
—For cases where the analysis above cannot be applied (e.g., we do not have a formal description of the protocol followed by the clients), we analyze service interaction logs recorded by a Web service monitoring tool and infer, using data mining techniques, the interaction patterns of conversations that completed their execution under the old protocol. From these patterns, we infer whether conversations are likely to proceed without errors under the new protocol (Section 4).
—We provide management tools for modifying protocols, for supporting change impact analysis based on protocol models and data mining-based migration analysis, and for assisting users in determining migration strategies. These tools are crucial to facilitate evolution, particularly when the number of ongoing conversations is high (Section 6).

Besides presenting the technical contributions, we discuss related work in Section 7 and conclude with a summary of results and directions for future work in Section 8.

2. VERSIONING AND EVOLUTION FRAMEWORK

In this section we first present an example protocol model that is used throughout this paper to illustrate our approach. We then introduce protocol evolution concepts: the possible migration strategies and the properties that should be preserved during protocol migration. Although our presentation is based on protocols modeled as finite state machines, the concepts are generic and can be applied regardless of the modeling formalism.

2.1 Business Protocols Modeling

A business protocol specifies which message exchange sequences are supported by a service. Following our previous work [Benatallah et al. 2006], we model protocols as finite state machines (FSMs). We use FSMs because they are a well-known paradigm, based on a formalism that is easy for non-expert users to understand and that is appropriate for representing reactive behaviors. An FSM consists of states and transitions. States represent the different phases that a service may go through during its interaction with clients, while transitions are triggered by messages sent by the clients to the service provider. Thus, each transition is labeled with a message, corresponding to the invocation of a service operation.

Fig. 1. Initial business protocol for an Australian working visa application service.

Figure 1 shows a graphical representation of a protocol for an Australian working visa application service. State names, such as Eligible, are logical and do not affect the actual usage of a service. The visa application service is initially in the Start state, and service usage begins when a client sends a checkEligibility message, upon which the service moves to the Eligible state. In general, clients seeking to work in Australia can proceed to state Lodged by filling in the application, submitting their work experience, and testing their English ability, while clients re-applying for the working visa after visa expiry can reach the same state by only filling in the application and providing an employer reference letter. Overseas students who complete eligible studies in Australia proceed to state S-Lodged by filling in the application for overseas students and submitting their graduation certificate and passport. Then, they check their application status and complete the application. Although, in reality, the protocol could be much more complex, for simplicity we omit the other stages necessary for the visa application service.

Fig. 2. Changed protocol for an Australian working visa application service.

An instance or conversation of the visa application service corresponds to a particular visa application process initiated by a particular client. Several instances of the service may be active at the same time, and each instance may be in any of the states defined by the protocol. It is important to observe that the entire procedure takes weeks or months to complete. Therefore, any time a modification is applied, it is certain that there will be thousands of active conversations that need to be handled.

Formally, a business protocol is defined as follows:

Definition 2.1. (Business protocol) A business protocol is a tuple $P = (S, s_0, F, M, R)$ which consists of the following elements:
—$S$ is a finite set of states.
—$s_0 \in S$ is the initial state.
—$F$ is a set of final states.
—$M$ is a finite set of messages. In our model, we assume that $M$ is a set of operation names.


—$R \subseteq S^2 \times M$ is the transition relation. Each transition $(s, s', m)$ identifies a source state $s$, a target state $s'$, and a message $m$ that is consumed during this transition.

2.2 Changed Visa Application Protocol

The initial version of the visa application protocol (Figure 1) may change for various reasons. For example, suppose that immigration laws are amended as follows:
—Applicants reapplying for the working visa after their visa expires must submit an employer reference letter as well as the result of a medical examination to lodge their application.
—Applicants can no longer ask for a review of the application result.

The service protocol must be modified to meet the new requirements. The new protocol is depicted in Figure 2 (changed parts are in bold). In this protocol, a state and a transition have been added: the new state RLSubmitted after state ApplicationReady, and the transition reportMedicalExamination between RLSubmitted and Lodged. In addition, the state Reviewed and the transition reassess have been removed.

2.3 Protocol Version and Migration Strategy

Within a dynamic Web service environment, service managers can define different versions of a protocol over time to meet a variety of new requirements. In this subsection we explain the possible strategies for migrating the instances running under an old protocol. We adopt the strategies suggested in [Skogsrud et al. 2004].
—Continue: Active instances are allowed to continue to run according to the old protocol, while new instances follow the new version. Service managers apply this strategy to ongoing instances when it is acceptable to complete them according to the old protocol. However, in some cases this strategy is inapplicable because letting the instances complete under the old protocol is not acceptable, e.g., when there are security holes in the old immigration protocol.
—Migration to the new protocol: Active instances are migrated to the new protocol version. Whenever there is a migration, the state in the new protocol from which the conversation will resume must also be defined. This strategy is the most appealing, but it is not always applicable, as we will see in detail. For example, migration to a new version of the visa application protocol may not make sense if continuing with the new protocol means that certain legal requirements (e.g., submission of medical documents, to be done at the beginning of a conversation in the new protocol) are not met. Another possible problem is that clients’ implementations may be unable (unless changes are applied) to interact with the new protocol and to send and receive messages as required.
—Migration to an ad hoc protocol: Service managers may define ad hoc protocols for the instances that cannot be migrated to the new protocol. Ad hoc protocols are defined to manage those active conversations for which the other strategies are not applicable. They mediate between the need to capture the changes prescribed by the new protocol and the need to allow active conversations to continue. The details are described in Section 5.

2.4 Compatibility Properties to be Considered When Migrating Conversations

To determine whether a conversation can be migrated to the new protocol, we identify two degrees of compatibility between conversations and protocols: forward and backward compatibility. They correspond to different requirements that a service manager may impose on the migration process to assess which migration strategy should be applied.
—Forward compatibility refers to the ability of clients of active conversations to continue to interact correctly (without runtime errors) with a given service after the conversation is migrated to the new protocol. In some cases, protocol changes may cause active conversations to fail to send a required message when needed, because the clients were developed to interact with the old protocol and are not prepared to interact with the new one.
Example 2.1. Consider a conversation at state Eligible in the initial protocol (Figure 1). If the conversation is migrated to state Eligible of the new protocol (Figure 2), forward compatibility might be violated because the applicant (client) might take one of the changed paths in the new protocol (Eligible.fillInApplication().ApplicationReady.submitReferenceLetter().RLSubmitted.reportMedicalExamination(). ...), with which the client cannot interact.
—Backward compatibility means that, after migration to the new protocol, the backward path (also called history) of an instance (i.e., the message sequence followed by the instance so far) must be compatible with the new protocol. This property is not concerned with the possible future progression of the conversation, but with whether the past interactions, up to the time of evolution, correspond to a valid interaction as defined in the new protocol version.
Example 2.2.
Consider again the old protocol (Figure 1) and the new protocol (Figure 2). Assume that an applicant is currently in state Lodged of the old protocol. This conversation cannot be migrated directly so that it continues from the same state of the new protocol, since the applicant might have followed the message sequence (Start. ... .Eligible.fillInApplication(). ... .submitReferenceLetter().Lodged) in the old protocol, which is not allowed by the new protocol. Hence, the migration would violate the backward compatibility property.

These two properties are important in managing dynamic protocol evolution. Forward compatibility is necessary to guarantee successful future interaction between clients and services, while backward compatibility is required if every conversation, at every instant (including at its completion), must be a valid instance of the protocol to which it is migrated, that is, its execution is one of the executions allowed by that protocol. We next define these properties formally.
—Let $P = (S, s_0, F, M, R)$ be an old business protocol and $P' = (S', s'_0, F', M', R')$ be a new business protocol.


—$State_P^I$ denotes the current state of an instance $I$ in protocol $P$.
—Let an execution path $p = \langle s_1.m_0.m_1.\ldots.m_{k-2}.m_{k-1}.s_2 \rangle$ be a sequence of messages, from a state $s_1$ to a state $s_2$, such that for $0 \le i \le k-1$, $m_i \in M$, $s_1 \in S \cup \{s_0\}$, and $s_2 \in S \cup F$.
—$History_I^{s_0,s}$ denotes the execution path actually taken by instance $I$ that starts from the initial state $s_0$ and ends at the state $s = State_P^I$, $s \in S$, in protocol $P$.
—$PathsFromStart_P^{s_0,s}$ denotes the set of execution paths that start from the initial state $s_0$ and end at the state $s = State_P^I$, $s \in S$, in protocol $P$.
—$PathsToCompletion_P^{s,f}$ denotes the set of execution paths that start from the state $s = State_P^I$, $s \in S$, and end at a final state $f \in F$ in protocol $P$.
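The notation above can be made concrete with a small sketch. The Python below is an illustration only, not the authors' implementation: the `Protocol` type and the helpers `paths_between`, `paths_from_start`, and `paths_to_completion` are hypothetical names, paths are enumerated as simple paths (no state visited twice) so that cyclic protocols terminate, and the transition list is a partial reconstruction of Figures 1 and 2 from the description in Sections 2.1 and 2.2, with `Processed` assumed to be the only final state. The last lines replay the history of Example 2.2.

```python
from typing import NamedTuple

class Protocol(NamedTuple):
    """P = (S, s0, F, M, R); S and M are implied by the transitions here."""
    initial: str            # s0
    finals: frozenset       # F
    transitions: frozenset  # R, as (source, target, message) triples

def paths_between(p, source, targets, visited=frozenset()):
    """Message sequences of simple paths from `source` to a state in `targets`."""
    visited = visited | {source}
    found = set()
    for src, dst, msg in p.transitions:
        if src != source or dst in visited:
            continue  # not an outgoing edge, or it would revisit a state
        if dst in targets:
            found.add((msg,))
        else:
            found |= {(msg,) + rest
                      for rest in paths_between(p, dst, targets, visited)}
    return found

def paths_from_start(p, state):
    """PathsFromStart: paths from the initial state s0 to `state`."""
    if state == p.initial:
        return {()}  # only the empty history leads to the initial state
    return paths_between(p, p.initial, {state})

def paths_to_completion(p, state):
    """PathsToCompletion: paths from `state` to a final state."""
    if state in p.finals:
        return {()}  # already complete
    return paths_between(p, state, p.finals)

# Assumed fragment shared by the old (Figure 1) and new (Figure 2) protocols.
SHARED = {
    ("Start", "Eligible", "checkEligibility"),
    ("Eligible", "ApplicationReady", "fillInApplication"),
    ("ApplicationReady", "WESubmitted", "submitWorkExperience"),
    ("WESubmitted", "Lodged", "testEnglishAbility"),
    ("Lodged", "Checked", "checkApproval"),
    ("Checked", "Processed", "confirm"),
}
OLD = Protocol("Start", frozenset({"Processed"}), frozenset(
    SHARED | {("ApplicationReady", "Lodged", "submitReferenceLetter")}))
NEW = Protocol("Start", frozenset({"Processed"}), frozenset(SHARED | {
    ("ApplicationReady", "RLSubmitted", "submitReferenceLetter"),
    ("RLSubmitted", "Lodged", "reportMedicalExamination"),
}))

# Example 2.2: an instance in Lodged that re-applied with a reference letter.
history = ("checkEligibility", "fillInApplication", "submitReferenceLetter")
print(history in paths_from_start(OLD, "Lodged"))  # True
print(history in paths_from_start(NEW, "Lodged"))  # False
```

Representing each path as a tuple of messages keeps set membership and set inclusion, which the definitions that follow rely on, down to ordinary Python set operations.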

Definition 2.2. (Forward compatibility) In the migration of instance $I$ from protocol $P$ to protocol $P'$, the forward paths that can be taken by instance $I$ in protocol $P$ are forward compatible in the context of protocol $P'$ iff $PathsToCompletion_P^{s,f} \subseteq PathsToCompletion_{P'}^{s',f'}$, where $s'$ is a corresponding state of $s$.

This definition states that, if the new protocol includes all the possible forward paths between the instance's current state and the final states in the old protocol, the instance can correctly interact with the new protocol.

Definition 2.3. (Backward compatibility) In the migration of instance $I$ from protocol $P$ to protocol $P'$, the execution history of instance $I$ in protocol $P$ is backward compatible in the context of protocol $P'$ iff $History_I^{s_0,s} \in PathsFromStart_{P'}^{s'_0,s'}$, where $s'$ is a corresponding state of $s$.

This definition states that, if the actual history of an instance belongs to the set of possible paths from the initial state to the state corresponding to the instance's current state in the new protocol, the history is compatible in the context of the new protocol.

Definition 2.4. (Safe migration) Migration of instance $I$ from protocol $P$ to protocol $P'$ is safe iff the forward paths of $I$ satisfy Definition 2.2 and the history of $I$ satisfies Definition 2.3.

2.5 Protocol Evolution Process

Protocol evolution management can be seen as a multi-step process, illustrated in Figure 3. The first step is the modification, by service managers, of the protocol model and its implementation as a service. New instances of the service can interact directly with the new protocol version and thus do not create migration problems (assuming, of course, that client implementations are updated to interact with the new version of the service). Ongoing conversations that were initiated under the obsolete version of the protocol are more problematic.
Service managers have to decide on an appropriate migration strategy for each instance. This involves classifying instances as either migrateable or non-migrateable. To achieve this classification, we propose a three-step approach as follows:

Fig. 3. Process of managing the business protocol evolution.

(1) Model-based analysis. This analysis is done at the model level (Section 3). The old and new protocols are compared to check their replaceability (that is, whether they can support the same set of message exchanges). There are different ways in which two protocols (old and new) of a service can be replaceable, with different implications for the migrateability of conversations:
—Full replaceability. In this case, the new protocol can support all the message exchanges the old protocol supports. This means that the protocol evolution problem is in fact trivial: all instances can be safely migrated. The change is transparent to the clients: they can continue their conversations, as the service will support them.
—State replaceability. If the new protocol cannot replace the old one in the general case, we need to compute which states of the old protocol are not affected by the changes. Conversations in the unaffected states can be safely migrated to the new version of the protocol.
—Path replaceability. For the states that are affected by changes, we look at the paths from the initial state to these states (backward paths) and from these states to the final states (forward paths) in order to determine to or from which states the changes are transparent to the clients. Conversations not in change-transparent states are further analyzed on the basis of their past and future interactions in the following analysis steps.
—Replaceability with respect to a history. We examine the past interactions of conversations to identify the migrateable ones among those in states that have different backward paths in the old and new protocols. In general, the past interactions of conversations are known (e.g., documents for a visa application are stored in the database and time-stamped with the date of submission).
—Replaceability analysis based on client protocols. The future interaction analysis, based on client protocols, is applied to the conversations in states that have different forward paths in the two protocols. This analysis enables service managers to identify which conversations can take an unaffected forward path from their current state to the end. The future interaction may or may not be known, depending on the service considered. In Section 3.6, we discuss the case where the future interaction, i.e., the client protocol, is known to the service manager. The result of this step is a conclusive (deterministic) classification of instances as either migrateable or non-migrateable.
(2) Future interaction analysis by inferring interaction patterns. As outlined above, the client protocol might not always be known. To overcome this situation, we propose to apply data mining techniques to infer the interaction patterns of already terminated instances and then use these patterns to predict the future interactions of ongoing instances. Service interactions are typically logged by a monitoring tool. Interaction logs contain valuable information, e.g., data sent by clients during service invocation, which can be used to induce a classification model of instances. In a nutshell, instances that completed their interaction can be automatically classified as compatible or incompatible with the new version of the protocol. Their features can be used to train an automatic classification engine that builds classification models; the underlying assumption is that the patterns generated from the models can serve as predictors for the future interaction of clients. If such predictors can be found, they allow instances for which the future interaction is not known to be further classified (although only probabilistically). Section 4 details this approach.
(3) Handling non-migrateable conversations. After the best efforts of the two previous steps have been applied to migrate instances, some may remain non-migrateable. We propose two approaches to handle these instances: protocol adapters and ad hoc protocols.
Protocol adapters bridge the differences between the old and new protocols so that non-migrateable instances can continue to interact with the new protocol as if they were interacting with the old one. As another solution, we develop ad hoc protocols, which enable non-migrateable instances to satisfy the requirements introduced by the new protocol without being aborted. Ad hoc protocols are defined to handle the cases for which adapters cannot meet the new requirements. Such protocols vanish when no active instances remain under them (Section 5).

3. MODEL-BASED ANALYSIS

In this section, we show how to perform change impact analysis on ongoing instances, based on protocol models, and describe how to classify the active instances as migrateable or non-migrateable using the result of the analysis. We perform different kinds of analysis at different levels of detail and complexity to identify the largest possible number of conversations that can be migrated.

3.1 Full replaceability

This analysis checks whether a new protocol can support all the message exchanges that are supported by the old protocol [Benatallah et al. 2004a]. When this is the case, all instances can be safely migrated and continue their interaction with the new protocol. This situation typically occurs when changes to the protocol are additive, e.g., when a new protocol path is added without discarding any of the existing ones.


Definition 3.1. (Full replaceability) Let $P = (S, s_0, F, M, R)$ and $P' = (S', s'_0, F', M', R')$ be two protocols. Let a complete execution path be a path that starts from an initial state and ends at a final state. A protocol $P'$ can replace $P$ iff $P'$ supports all the complete execution paths that $P$ supports.

Algorithm 1: F-replaceability
Input: P = (S, s0, F, M, R) and P' = (S', s0', F', M', R').
Output: Replaceable or not.
begin
 1: Let Replaceability := true;
 2: Let CompletePaths := ∅ and CompletePaths' := ∅;
 3: CompletePaths := Recursive-ComputePaths(P, s0, F);
 4: CompletePaths' := Recursive-ComputePaths(P', s0', F');
 5: foreach path ∈ CompletePaths do
 6:   if path ∉ CompletePaths' then
 7:     Replaceability := false;
 8:     break;
 9: endfor
10: return Replaceability;
end

Function Recursive-ComputePaths(P, s, F)
begin
31: Let Path := "";
32: Let PathSet := ∅ and ReturnPathSet := ∅;
33: foreach m ∈ outgoing messages of s do
34:   s' := ending state of m;
35:   if s' ∉ F then
36:     ReturnPathSet := Recursive-ComputePaths(P, s', F);
37:     foreach RPath ∈ ReturnPathSet do
38:       Path := m + "." + RPath;
39:       PathSet := PathSet ∪ Path;
40:     endfor;
41:   else
42:     PathSet := PathSet ∪ m;
43: endfor
44: return PathSet;
end

We present an algorithm, called F-replaceability, that implements the full replaceability analysis. Informally, we obtain the set of complete paths from the initial state to each final state in the old and new protocols (lines (3) to (4)). The procedure Recursive-ComputePaths(P, s, F) computes all the paths from a state s to a final state; it is called with the start state of each protocol. If one of the complete paths of the old protocol does not belong to the set of complete paths of the new protocol, the new protocol cannot replace the old one (lines (5) to (9)).
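The F-replaceability check can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the `Protocol` type and the function names are hypothetical, the `on_path` guard is an addition that skips states already on the current path (the recursion as written in Algorithm 1 would not terminate on a protocol with a cycle), and the three protocols are assumed fragments of the visa example with `ADDITIVE` invented to show an additive change.

```python
from typing import NamedTuple

class Protocol(NamedTuple):
    initial: str            # s0
    finals: frozenset       # F
    transitions: frozenset  # R, as (source, target, message) triples

def compute_paths(protocol, state, on_path=frozenset()):
    """Message sequences from `state` to the first final state reached.

    Follows Recursive-ComputePaths; `on_path` (an addition) prevents a
    state from being visited twice on the same path.
    """
    on_path = on_path | {state}
    paths = set()
    for source, target, message in protocol.transitions:
        if source != state or target in on_path:
            continue
        if target in protocol.finals:
            paths.add((message,))
        else:
            for rest in compute_paths(protocol, target, on_path):
                paths.add((message,) + rest)
    return paths

def f_replaceable(old, new):
    """True iff every complete path of `old` is a complete path of `new`."""
    return compute_paths(old, old.initial) <= compute_paths(new, new.initial)

# Assumed fragment of the visa protocol shared by all three versions.
SHARED = {
    ("Start", "Eligible", "checkEligibility"),
    ("Eligible", "ApplicationReady", "fillInApplication"),
    ("ApplicationReady", "WESubmitted", "submitWorkExperience"),
    ("WESubmitted", "Lodged", "testEnglishAbility"),
    ("Lodged", "Checked", "checkApproval"),
    ("Checked", "Processed", "confirm"),
}
OLD = Protocol("Start", frozenset({"Processed"}), frozenset(
    SHARED | {("ApplicationReady", "Lodged", "submitReferenceLetter")}))
# Section 2.2 change: the reference letter now leads to RLSubmitted.
NEW = Protocol("Start", frozenset({"Processed"}), frozenset(SHARED | {
    ("ApplicationReady", "RLSubmitted", "submitReferenceLetter"),
    ("RLSubmitted", "Lodged", "reportMedicalExamination"),
}))
# Hypothetical additive change: a cancel path is added, nothing is removed.
ADDITIVE = Protocol("Start", frozenset({"Processed", "Cancelled"}), frozenset(
    OLD.transitions | {("Eligible", "Cancelled", "cancel")}))

print(f_replaceable(OLD, NEW))       # False
print(f_replaceable(OLD, ADDITIVE))  # True
```

Enumerating paths explicitly keeps the sketch close to Algorithm 1; for protocols with cycles, an automaton-based language inclusion check would be the more general alternative to the simple-path restriction used here.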


3.2 State replaceability

The previous analysis is a “black and white” replaceability analysis, since it does not consider the current states of instances. In fact, even if the protocols are not replaceable, some instances can still be classified as migrateable depending on their current state. In particular, this class of analysis determines the states of the old protocol such that conversations in those states are not affected by the changes. These states have the same forward and backward paths in the old and new protocols, which means that the instances in these states followed an unaffected backward path from the initial state to their current state and will follow an unaffected forward path from their current state to a final state. Thus, all the instances in these states can be safely migrated to the new protocol without violating any migration property. We call these states replaceable states. To perform this analysis, we develop a function that takes as input two protocols (old and new) and generates as output the replaceable states.

Example 3.1. In Figure 1, the analysis function returns four states, S-ApplicationReady, GCSubmitted, S-Lodged, and Cancelled, since their backward paths and forward paths are not changed in the new protocol (Figure 2). We can migrate all the instances in these states to the corresponding states of the new protocol.

Once we know that an instance in a certain state is migrateable, we have to determine to which state of the new protocol it is migrated. There are several ways to find the state in a new protocol that corresponds to a state of an old protocol:
—Name-based mapping. The common situation is the one in which protocol changes add or remove states, but do not change state names. In this case we can simply migrate instances to the new protocol and keep them in the same state as they were in the old protocol.
—Path-based mapping. This is the general case.
A corresponding state can be determined by looking at the paths leading to the state from the initial state. In this case, the corresponding state is the one reached by following the same path from the initial state in the old and new protocols. Note that in this definition we assume state replaceability, so it is irrelevant which path we follow: if two paths (sequences of message exchanges) p1 and p2 lead to the same state in the old protocol, they also lead to the same state in the new protocol.
- User-defined mapping. Another approach is for users (e.g., service managers) to manually choose which state of the old protocol corresponds to which state of the new protocol. This approach is not recommended, because arbitrary state mappings that are not computed via path-based mapping cannot guarantee that the migration properties hold; in particular, they do not guarantee backward compatibility, which path-based mapping does guarantee.

Definition 3.2. (Corresponding state) Let $P = (S, s_0, F, M, R)$ and $P' = (S', s'_0, F', M', R')$ be two protocols. The corresponding state of a state $s \in S$ is a state $s' \in S'$ if any of the following holds:
- $\Psi(s) = s'$, where $\Psi : S \to S'$ is a partial function with $\Psi = \{(x, y) \mid x \in S \wedge y \in S' \wedge \mathit{Name}(x) = \mathit{Name}(y)\}$, and $\mathit{Name}(x)$ denotes the name of state $x$.
- $\bigwedge_{i=1}^{n} \mathit{EqualPath}(p_i, p'_i)$, with $p_1, p_2, \ldots, p_n \in \mathit{PathsFromStart}_{P}^{s_0,s}$ and $p'_1, p'_2, \ldots, p'_n \in \mathit{PathsFromStart}_{P'}^{s'_0,s'}$, where $\mathit{EqualPath}(p_1, p_2)$ means that $p_1$ is equal to $p_2$ in terms of message sequences.
- $\Phi(s) = s'$, where $\Phi : S \to S'$ is a partial function with $\Phi = \{(x, y) \mid x \in S \wedge y \in S' \wedge \mathit{MAP}_{USER}(x, y)\}$, and $\mathit{MAP}_{USER}(x, y)$ denotes that $x$ is mapped to $y$ by the user.

Using the corresponding state concept, we can formalize state replaceability as follows.

Definition 3.3. (State replaceability) Let $P = (S, s_0, F, M, R)$ and $P' = (S', s'_0, F', M', R')$ be two protocols. The replaceable states are the set of states $s \in S$ such that, for $\forall b \in \mathit{PathsFromStart}_{P}^{s_0,s}$ and $\forall f \in \mathit{PathsToCompletion}_{P}^{s,f}$, we have $b \in \mathit{PathsFromStart}_{P'}^{s'_0,s'}$ and $f \in \mathit{PathsToCompletion}_{P'}^{s',f'}$, where $s'$ is a corresponding state of $s$.

Algorithm 2: S-replaceability
Input: P = (S, s0, F, M, R) and P' = (S', s0', F', M', R').
Output: A set of states.
begin
 1: Let Candidates := ∅;
 2: foreach s ∈ S do
 3:   s' := findCorrespondingState(P, P', s);
 4:   if s' ≠ null then
 5:     PathsFromStart := GetBackwardPaths(P, s);
 6:     PathsFromStart' := GetBackwardPaths(P', s');
 7:     foreach b ∈ PathsFromStart do
 8:       if b ∉ PathsFromStart' then
 9:         go to 18;
10:     endfor
11:     PathsToComplete := GetForwardPaths(P, s);
12:     PathsToComplete' := GetForwardPaths(P', s');
13:     foreach f ∈ PathsToComplete do
14:       if f ∉ PathsToComplete' then
15:         go to 18;
16:     endfor
17:     Candidates := Candidates ∪ {s};
18:   endif
19: endfor
20: return Candidates;
end
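As an illustration only, the check performed by Algorithm 2 can be sketched in Python. The sketch rests on assumptions that are not part of the original text: protocols are acyclic, states are matched by name (name-based mapping), and a protocol is encoded as a dictionary with the hypothetical keys `states`, `initial`, `finals`, and `transitions`. The toy protocols `OLD` and `NEW` are invented for the example and are not the protocols of Figures 1 and 2.

```python
from collections import defaultdict

def paths(transitions, source, targets):
    """Message sequences leading from `source` to any state in `targets`.

    Assumes an acyclic protocol; a loop would make the recursion endless.
    """
    outgoing = defaultdict(list)
    for src, msg, dst in transitions:
        outgoing[src].append((msg, dst))
    found = set()

    def walk(state, prefix):
        if state in targets:
            found.add(prefix)
        for msg, dst in outgoing[state]:
            walk(dst, prefix + (msg,))

    walk(source, ())
    return found

def replaceable_states(old, new):
    """States of `old` whose backward and forward paths all exist in `new`."""
    result = set()
    for s in old["states"]:
        if s not in new["states"]:
            continue  # no corresponding state under name-based mapping
        back_old = paths(old["transitions"], old["initial"], {s})
        back_new = paths(new["transitions"], new["initial"], {s})
        fwd_old = paths(old["transitions"], s, old["finals"])
        fwd_new = paths(new["transitions"], s, new["finals"])
        if back_old <= back_new and fwd_old <= fwd_new:
            result.add(s)
    return result

# Hypothetical toy protocols: the change inserts state E between D and C.
OLD = {"states": {"A", "B", "C", "D"}, "initial": "A", "finals": {"C"},
       "transitions": [("A", "x", "B"), ("B", "y", "C"),
                       ("A", "z", "D"), ("D", "w", "C")]}
NEW = {"states": {"A", "B", "C", "D", "E"}, "initial": "A", "finals": {"C"},
       "transitions": [("A", "x", "B"), ("B", "y", "C"),
                       ("A", "z", "D"), ("D", "v", "E"), ("E", "w", "C")]}

print(replaceable_states(OLD, NEW))  # → {'B'}
```

Only B is reported, because it is the only state whose paths to and from it avoid the changed branch. Set inclusion replaces the explicit loops and `go to` jumps of the pseudocode but performs the same test.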

The S-replaceability algorithm implements the identification of replaceable states. Given the old and new protocols, the algorithm first obtains the corresponding state of each state by calling the procedure findCorrespondingState(P, P', s) (line (3)). Then, if the corresponding state exists in the new protocol, it computes the two sets of backward paths from the initial state to the state in the two protocols (lines (5) to (6)). Next, the algorithm checks whether the set of backward paths in the new protocol includes the set of backward paths in the old protocol (lines (7) to (9)).


The forward paths are checked in the same way (lines (11) to (16)). If all the backward paths and forward paths to/from a state in the old protocol also exist in the new protocol, the state is added to the variable Candidates (line (17)).

Algorithm 3: findCorrespondingState
Input: P = (S, s0, F, M, R), P' = (S', s0', F', M', R') and a state s.
Output: a state.
begin
 1: Let FoundState := null;
 2: foreach s' ∈ S' do
 3:   if name of s = name of s' then
 4:     FoundState := s';
 5:     go to 17;
 6: foreach p ∈ GetBackwardPaths(P, s) do
 7:   foreach s' ∈ S' do
 8:     foreach p' ∈ GetBackwardPaths(P', s') do
 9:       if p = p' then
10:         FoundState := s';
11:         go to 17;
12: endfor
13: foreach s' ∈ S' do
14:   if s is mapped to s' then
15:     FoundState := s';
16:     go to 17;
17: return FoundState;
end

Algorithm 4: GetBackwardPaths
Input: protocol P = (S, s0, F, M, R) and state s
Output: a set of paths from s0 to s.
begin
 1: Let executionPaths := ∅;
 2: Let executionPath := ε (the empty path);
 3: if s = s0 then
 4:   executionPaths := executionPaths ∪ executionPath;
 5: else
 6:   parentStates := parent states of s;
 7:   incomingMessages := incoming messages of s;
 8:   foreach parentState ∈ parentStates and message ∈ incomingMessages do
 9:     parentPaths := getParentPaths(parentState);
10:     foreach parentPath ∈ parentPaths do
11:       executionPath := parentPath + "." + message;
12:       executionPaths := executionPaths ∪ executionPath;
13:     endfor
14:   endfor
15: return executionPaths;
16: end
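The recursion in Algorithm 4 can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: it assumes an acyclic protocol given as (source, message, target) triples, represents a path as a tuple of message names instead of a dot-separated string, and iterates over incoming transitions so that each message is paired only with the parent state it actually leaves. The transition list is a simplified, hypothetical fragment loosely modelled on the visa example, not the full protocol of Figure 1.

```python
def get_backward_paths(transitions, initial, state):
    """Return every message sequence leading from `initial` to `state`.

    `transitions` is a list of (source, message, target) triples. The
    protocol is assumed to be acyclic; a cycle would recurse forever.
    """
    if state == initial:
        return {()}  # only the empty path reaches the initial state
    result = set()
    for source, message, target in transitions:
        if target == state:
            # Extend every path that reaches the parent state by `message`.
            for parent_path in get_backward_paths(transitions, initial, source):
                result.add(parent_path + (message,))
    return result

# Hypothetical fragment: two alternative routes from ApplicationReady to Lodged.
FRAGMENT = [
    ("Start", "checkEligibility", "Eligible"),
    ("Eligible", "fillInApplication", "ApplicationReady"),
    ("ApplicationReady", "submitWorkExperience", "WESubmitted"),
    ("WESubmitted", "testEnglishAbility", "Lodged"),
    ("ApplicationReady", "submitReferenceLetter", "Lodged"),
]

for path in sorted(get_backward_paths(FRAGMENT, "Start", "Lodged")):
    print(".".join(path))
```

Forward paths can be enumerated symmetrically by following outgoing transitions until a final state is reached.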


We have omitted the details of the other algorithms for space reasons; interested readers are referred to [Ryu 2007] for further details.

3.3 Path replaceability

After conducting the two analyses (full replaceability and state replaceability), we focus on the states that are not replaceable, and specifically on the paths from the start state to such a state (backward paths) and from the state to a final state (forward paths). We observe that states may have two properties with respect to such paths: forward path replaceability and backward path replaceability. Together, these properties guarantee state replaceability; however, it often happens that only one of them holds.

3.3.1 Forward path replaceability. We identify the states from which the new protocol can replace the old one, so that the changes are transparent to the clients. The function performing this analysis takes as input two protocols and returns the states whose forward paths are the same in the old and new protocols but whose backward paths are not. Some of the instances in these states might have followed an unaffected path (a compatible backward path, in conformance with the new protocol), while others might have taken a path affected by the changes (an incompatible backward path). So, if we migrate all the active instances in these states to the new protocol, the migration could violate backward compatibility.

Example 3.2. In our example protocol, the state Processed is of this kind, as one of the three backward paths leading to it is affected by the protocol changes (i.e., the addition of the state RLSubmitted and the message reportMedicalExamination in Figure 2).

Definition 3.4. (Forward path replaceability) Let $P = (S, s_0, F, M, R)$ and $P' = (S', s'_0, F', M', R')$ be two protocols. The forward path replaceable states are the set of states $s \in S$ such that, for $\exists b \in \mathit{PathsFromStart}_{P}^{s_0,s}$ and $\forall f \in \mathit{PathsToCompletion}_{P}^{s,f}$, we have $b \notin \mathit{PathsFromStart}_{P'}^{s'_0,s'}$ and $f \in \mathit{PathsToCompletion}_{P'}^{s',f'}$, where $s'$ is a corresponding state of $s$.

3.3.2 Backward path replaceability. In contrast to forward path replaceability, this analysis identifies the states that might cause a violation of forward compatibility if conversations in these states are migrated. Namely, the new protocol can replace the old one only up to these states. These states have the same backward paths, but different forward paths, in the old and new protocols. Hence, some of the instances in these states would follow an unaffected forward path, while others would take a forward path affected by the changes (an incorrect interaction with the new protocol). In this case, not all instances in these states can be safely migrated, but they are guaranteed to be backward compatible in the context of the new protocol. To filter out the migrateable ones, we need to predict the forward paths that the instances in these states will take in the future (Section 3.6).

Example 3.3. In Figure 1, this analysis returns the states Start, Eligible, ApplicationReady, and WESubmitted, which have the same backward paths but changed forward paths in the new protocol (Figure 2).
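The two path-level properties can be combined into a simple per-state classifier. The Python sketch below is illustrative only and makes the following assumptions: protocols are acyclic, states are matched by name, and a protocol is a dictionary with the hypothetical keys `initial`, `finals`, and `transitions`. The toy protocols are invented and do not reproduce Figures 1 and 2.

```python
def paths(protocol, source, targets):
    """Message sequences from `source` to any state in `targets` (acyclic only)."""
    found = set()

    def walk(state, prefix):
        if state in targets:
            found.add(prefix)
        for src, msg, dst in protocol["transitions"]:
            if src == state:
                walk(dst, prefix + (msg,))

    walk(source, ())
    return found

def classify_state(old, new, s):
    """Classify state `s` of `old` by which of its paths survive in `new`."""
    back_ok = paths(old, old["initial"], {s}) <= paths(new, new["initial"], {s})
    fwd_ok = paths(old, s, old["finals"]) <= paths(new, s, new["finals"])
    if back_ok and fwd_ok:
        return "replaceable"
    if fwd_ok:
        return "forward path replaceable"   # some backward path was changed
    if back_ok:
        return "backward path replaceable"  # some forward path was changed
    return "neither"

# Hypothetical toy protocols: the change inserts state E between D and C.
OLD = {"initial": "A", "finals": {"C"},
       "transitions": [("A", "x", "B"), ("B", "y", "C"),
                       ("A", "z", "D"), ("D", "w", "C")]}
NEW = {"initial": "A", "finals": {"C"},
       "transitions": [("A", "x", "B"), ("B", "y", "C"),
                       ("A", "z", "D"), ("D", "v", "E"), ("E", "w", "C")]}

for state in "ABCD":
    print(state, classify_state(OLD, NEW, state))
```

In this toy example, C is forward path replaceable (one of its two histories runs through the changed branch), while A and D are backward path replaceable (their histories are intact, but a future path is changed).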


Definition 3.5. (Backward path replaceability) Let $P = (S, s_0, F, M, R)$ and $P' = (S', s'_0, F', M', R')$ be two protocols. The backward path replaceable states are the set of states $s \in S$ such that, for $\forall b \in \mathit{PathsFromStart}_{P}^{s_0,s}$ and $\exists f \in \mathit{PathsToCompletion}_{P}^{s,f}$, we have $b \in \mathit{PathsFromStart}_{P'}^{s'_0,s'}$ and $f \notin \mathit{PathsToCompletion}_{P'}^{s',f'}$, where $s'$ is a corresponding state of $s$.

3.4 Non replaceability

The last overall analysis determines the states whose instances cannot be safely migrated as a whole and for which neither forward nor backward compatibility is guaranteed. In Section 3.6.3, for the instances in these states, we analyze whether they have followed a compatible backward path and, at the same time, will take a compatible forward path in the context of the new protocol. In addition, removed states can be identified by this analysis.

Example 3.4. The analysis function returns the states Lodged, Checked, and Reviewed.

Definition 3.6. (Non replaceability) Let $P = (S, s_0, F, M, R)$ and $P' = (S', s'_0, F', M', R')$ be two protocols. The non-replaceable states are the set of states $s \in S$ such that, for $\exists b \in \mathit{PathsFromStart}_{P}^{s_0,s}$ and $\exists f \in \mathit{PathsToCompletion}_{P}^{s,f}$, we have $b \notin \mathit{PathsFromStart}_{P'}^{s'_0,s'}$ and $f \notin \mathit{PathsToCompletion}_{P'}^{s',f'}$, where $s'$ is a corresponding state of $s$.

3.5 Replaceability with respect to a history

To filter out migrateable instances, it is not sufficient to classify active instances based on the overall replaceability analysis: by comparing the old and new protocols alone, we cannot extract the migrateable instances from the change-affected states, where migrateable and non-migrateable instances coexist (e.g., the states identified by forward path replaceability). Hence, we describe how to further categorize the active instances in these states. This analysis considers the actual backward path (history) that an instance took up to the evolution time. It plays an important role in filtering out migrateable instances from the states identified by forward path replaceability.

Definition 3.7. (Replaceability with respect to a history) An instance $I$ is migrateable to protocol $P'$, with respect to its history $\mathit{History}_{I}^{s_0,s}$, iff $\mathit{History}_{I}^{s_0,s}$ satisfies Definition 2.3.

Example 3.5. Consider the state Processed of the old protocol in Figure 1. If all the instances in this state are migrated to the state Processed of the new protocol, backward compatibility might be violated, since some of them might have followed the backward path (Start. ... .ApplicationReady.submitReferenceLetter().Lodged. ... .confirm().Processed), which cannot be simulated in the new protocol because they have not provided the result of the medical examination. So, to filter out migrateable instances, we need to analyze the actual execution path taken by the individual instances in the state. By doing this, the service manager learns that some of them followed path 1 (Start. ... .ApplicationReady.submitWorkExperience(). ... .Lodged. ... .confirm().Processed), while the others followed path 2 (Start. ... .ApplicationReady.submitReferenceLetter().Lodged. ... .confirm().Processed). In this case, only the instances that followed path 1 can be migrated to the state Processed of the new protocol.

Fig. 4. Client protocols interacting with the old Australian working visa application service: (a) a client protocol Pc1; (b) a client protocol Pc2.

3.6 Replaceability analysis based on client protocols

After performing all the replaceability analyses described above, there remain instances that followed compatible backward paths but are not guaranteed to interact correctly with the new protocol. However, if we know the clients' protocols, we can assess whether the protocol changes leave a client unaffected because it always (as per its protocol) takes certain paths (possibly untouched by the modifications) rather than others. Hence, virtually all the replaceability analyses discussed above can be modified to take the client protocols into account, when such definitions are available.

3.6.1 Replaceability with respect to a client protocol. In our previous work [Benatallah et al. 2004a; 2006], this class of analysis was proposed for identifying whether a new version of a protocol can replace an old one when interacting with clients supporting a protocol Pc. If every legal message sequence between an old protocol Po and a client protocol Pc is also supported between a new protocol Pn and Pc, we say that Pn can replace Po with respect to Pc.

Example 3.6. Protocol Pn of Figure 2 can replace protocol Po of Figure 1 when interacting with client protocol Pc1 (Figure 4(a)). So, the instances having such a client protocol can be safely migrated to protocol Pn. The formal definition of this replaceability analysis is given in [Benatallah et al. 2006].
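The informal condition of Section 3.6.1 can be approximated in a few lines of Python. The sketch is a deliberate simplification and not the formal definition of [Benatallah et al. 2006]: it assumes acyclic protocols, ignores message polarity, and treats a "legal message sequence between two protocols" as a complete message sequence accepted by both. The dictionary encoding and the toy protocols `PO`, `PN`, `PC1`, and `PC2` are hypothetical.

```python
def complete_paths(protocol):
    """All message sequences from the initial state to a final state (acyclic only)."""
    found = set()

    def walk(state, prefix):
        if state in protocol["finals"]:
            found.add(prefix)
        for src, msg, dst in protocol["transitions"]:
            if src == state:
                walk(dst, prefix + (msg,))

    walk(protocol["initial"], ())
    return found

def can_replace(old, new, client):
    """True if every conversation shared by `old` and `client` is also accepted by `new`."""
    conversations = complete_paths(old) & complete_paths(client)
    return conversations <= complete_paths(new)

# Hypothetical toy protocols (not those of Figures 1, 2, and 4).
PO = {"initial": "A", "finals": {"C"},
      "transitions": [("A", "x", "B"), ("B", "y", "C"),
                      ("A", "z", "D"), ("D", "w", "C")]}
PN = {"initial": "A", "finals": {"C"},
      "transitions": [("A", "x", "B"), ("B", "y", "C"),
                      ("A", "z", "D"), ("D", "v", "E"), ("E", "w", "C")]}
PC1 = {"initial": "1", "finals": {"3"},           # only ever sends x then y
       "transitions": [("1", "x", "2"), ("2", "y", "3")]}
PC2 = PO                                          # may also take the changed branch

print(can_replace(PO, PN, PC1))  # → True
print(can_replace(PO, PN, PC2))  # → False
```

The first client never uses the changed branch, so the change is transparent to it; the second client may send z followed by w, which the new protocol no longer accepts.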


3.6.2 Replaceability with respect to a state and a client protocol. Here, we propose a more fine-grained replaceability analysis that considers the current states of instances as well as the client protocols. Using the state and client protocol information, we may be able to see that a client will never follow a changed forward path from its current state, in which case there is no problem in migrating the client's instance to the corresponding state in the new protocol.

Example 3.7. As an example of using state information and a client protocol to identify future interactions, assume that client 1 is in state Eligible of Figure 1, client 2 is in state WESubmitted of the same protocol, and both of them have client protocol Pc2 of Figure 4. According to the replaceability analysis that considers only client protocols, the clients' instances are classified as non-migrateable, as Pn of Figure 2 allows a client to report a medical examination (message reportMedicalExamination at state RLSubmitted) while Po does not support this message. However, if we consider the states and the client protocols at the same time, we get different results. Namely, client 1's instance in state Eligible cannot be migrated to the corresponding state of Pn, whereas client 2's instance in state WESubmitted can be migrated to the corresponding state of Pn, since, according to its protocol, the client will never take the changed forward path from its current state. The migration therefore does not violate forward compatibility.

Definition 3.8. (Replaceability with respect to a state and a client protocol) Let $P$ and $P'$ be two business protocols, and $CP$ be a client protocol. Let $\mathit{CommonPaths}^{s,f}(P, CP)$ be the set of common execution paths between $P$ and $CP$, from a state $s = \mathit{State}_{I}^{P}$ (the current state of instance $I$ in $P$) to a final state $f \in F$. An instance $I$ is migrateable to protocol $P'$, with respect to client protocol $CP$, iff $\forall p \in \mathit{CommonPaths}^{s,f}(P, CP)$, $p \in \mathit{PathsToCompletion}_{P'}^{s',f'}$, where $s'$ is a corresponding state of $s$.
It should be noted that, when performing the forward path analysis using client protocols and state information, we consider an instance migrateable only if the whole set of forward paths that the instance can follow from its current state to final states in the old protocol also exists in the new protocol.

3.6.3 Replaceability with respect to a state, a client protocol and a history. This analysis is performed on the instances in the states returned by the non-replaceability analysis. To extract migrateable instances, the analysis function examines which backward paths the instances followed in the old protocol as well as which forward paths they will take according to the protocols of the clients. In our example, service managers apply the analysis to the instances in states Lodged and Checked of the old protocol.

Definition 3.9. (Replaceability with respect to a state, a client protocol and a history) Let $P$ and $P'$ be two business protocols, and $CP$ be a client protocol. An instance $I$ is migrateable to protocol $P'$, with respect to client protocol $CP$ and its history $\mathit{History}_{I}^{s_0,s}$, iff $\mathit{History}_{I}^{s_0,s}$ satisfies Definition 2.3 and $\forall p \in \mathit{CommonPaths}^{s,f}(P, CP)$, $p \in \mathit{PathsToCompletion}_{P'}^{s',f'}$, where $s'$ is a corresponding state of $s$.
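A per-instance check in the spirit of Definition 3.9 might look as follows in Python. This is a sketch under explicit assumptions: Definition 2.3 is not restated in this section, so the history condition is approximated here as "the new protocol can replay the instance's history and ends in the corresponding state"; protocols are deterministic; states are matched by name; and the set of common forward paths between the old protocol and the client protocol is assumed to be computed beforehand. All names are hypothetical.

```python
def run(transitions, start, messages):
    """Follow `messages` from `start`; return the state reached, or None if stuck."""
    step = {(src, msg): dst for src, msg, dst in transitions}  # deterministic protocol
    state = start
    for msg in messages:
        state = step.get((state, msg))
        if state is None:
            return None
    return state

def is_migrateable(new, state, history, common_forward_paths):
    """Check the history and client-protocol conditions for one instance.

    `state` is the instance's current state, `history` the messages exchanged
    so far, and `common_forward_paths` the forward paths the client may still
    take according to its own protocol.
    """
    # History condition (assumed reading of Definition 2.3).
    if run(new["transitions"], new["initial"], history) != state:
        return False
    # Forward condition: every path the client may take must complete in `new`.
    return all(run(new["transitions"], state, path) in new["finals"]
               for path in common_forward_paths)

# Hypothetical new protocol: state E was inserted between D and C.
NEW = {"initial": "A", "finals": {"C"},
       "transitions": [("A", "x", "B"), ("B", "y", "C"),
                       ("A", "z", "D"), ("D", "v", "E"), ("E", "w", "C")]}

print(is_migrateable(NEW, "B", ("x",), {("y",)}))  # → True
print(is_migrateable(NEW, "D", ("z",), {("w",)}))  # → False
```

The second instance has a compatible history but would next send w, which the new protocol does not accept in state D, so it is not migrateable.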


4. FUTURE INTERACTION ANALYSIS BY INFERRING INTERACTION PATTERNS

In Section 3, we explained how to classify ongoing conversations as either migrateable or non-migrateable by performing model-based analysis on protocols and conversations. However, this analysis (i) assumes that the protocol models of clients are available, and (ii) is conservative, in that when a client may take multiple possible future paths through the protocol, it assumes a worst-case scenario (forward compatibility is not guaranteed unless all possible paths are compatible). In this section, to overcome these limitations, we present an approach for supporting service managers in further classifying conversations when the client protocols are unknown, based on (statistically motivated) assumptions about the possible future behavior of a client.

4.1 Approach

To carry out future interaction analysis, we propose to apply data mining techniques to audit trails of completed conversations (recorded in logs) and to derive from them a set of predictive models that predict the future behavior of each conversation, or at least its forward compatibility. The data mining technique chosen for the interaction analysis is that of decision trees [Quinlan 1986; Witten and Frank 2005], because they can derive both classification and prediction rules and because they can be easily read by users who want to understand the classification logic. A decision tree is constructed from a training set of records, each of which has a set of attributes of an instance (e.g., an applicant in the working visa application service) and a class label (e.g., "permit" or "reject" the applicant). Each node in a tree identifies a set of instances that satisfy the node condition (their attributes have certain values) as well as all the conditions on the path from the node to the root. Non-leaf nodes in a decision tree involve a test comparing a particular attribute with a constant value.
Leaf nodes are associated with a class label that characterizes (classifies) all instances in the leaf. Hence, to classify an instance with a label, we start from the root node and traverse the tree, based on the instance attributes and the node conditions, until a leaf is reached. Note that the conditions of nodes at the same level identify a partition of the attribute values, so there is always one and only one path that can be taken. The combination of the conditions involved in all the tests on a path from the root node to a leaf produces a classification pattern (rule) for determining the class label assigned to the leaf. For example, a classification pattern can be: applicants who have more than 10 years of experience and speak fluent English can obtain a visa permit.

We map the interaction analysis problem to a decision tree (classification) problem where the conversations are the objects to be classified, the classification features are selected attributes of conversations, and the classes are "migrateable" (interaction compatible with the new protocol) and "non-migrateable" (interaction not compatible with the new protocol). To this end, the approach we follow is inspired by the ones adopted in business process prediction [Grigori et al. 2001; Castellanos et al. 2005], with some modifications to make it applicable and practical for the problem at hand. The idea is to generate trees for each stage in the process (in our case, for each state in the protocol) where predictions need to be made. In particular, the idea is to generate one tree per state and per different path (modulo loops) that can be taken to reach that state. In a nutshell, we look at completed conversations and build the tree by first labeling the conversations based on the forward path they took from the state, and hence based on whether they are migrateable or not (this can be determined by checking whether the forward path exists in the new protocol). In building the tree, we only use features of conversations (e.g., parameters of messages) that took the path and that are in the state for which the tree is computed, so that node conditions only include features that are known (defined) for conversations that took that path and are in that state. At migration time, we look at the active conversations and select the tree based on the path and state of each conversation. Then we traverse the tree using the conversation features and classify the instance as migrateable or not. We next detail the algorithm and the approach we took to make the problem tractable, in particular in terms of reducing the (potentially unlimited) number of features considered for building the trees. Specifically, we discuss how the states are identified, which attributes are considered for the analysis, which trees are generated, and how the results are used to determine the migration properties of conversations.

4.2 Feature Selection and Data Preparation

Trees are generated from service interaction logs (logs of the messages exchanged among services). We assume that the logs are free of noise and that we have logs of conversations (logs of messages where it is known which message belongs to which conversation). We refer the reader to [Motahari et al. 2007; Nezhad et al. 2007] for approaches to obtain such logs when they are not available in the first place. We also assume that the service interaction logs contain: (i) the invoked message name, (ii) information on sender, receiver, and timestamp, (iii) the SOAP header, and (iv) the SOAP body.
This assumption is in agreement with the data logged by commercial service monitoring tools (e.g., HP SOA Manager).

4.2.1 Identifying Candidate States. In our problem domain, the goal of the classification patterns generated by the decision tree is to identify which attributes can be exploited for predicting the likelihood that a conversation actually follows a certain forward path (i.e., a path not affected by the protocol changes). To build the decision trees, we first determine the most relevant states, as there is no need to build trees for every state, as discussed below. The candidate states are identified by Algorithm 5:
- Exclude the initial state from the candidate states, because there is no information that can be used for constructing a decision tree, and the final states, where the already completed conversations stay. In addition, disregard the states returned by the state replaceability analysis (lines (3) to (4)). In our visa application protocol (Figure 1), those states are Start, Cancelled, Reviewed, Processed, S-ApplicationReady, GCSubmitted, and S-Lodged.
- Exclude the states that have only one forward path to a final state, regardless of whether the path has been changed in the new protocol, since we can easily predict the forward path that the ongoing conversations in those states will take.
- For the remaining states, determine whether there is more than one forward path to final states and at least one of the paths is changed in the new protocol. If a state satisfies these conditions, we put it into the set of candidate states, because we have to decide whether the ongoing conversations in that state might take the changed forward path in the future (lines (8) to (10)). In the example, the identified candidate states are Eligible, ApplicationReady, WESubmitted, Lodged, and Checked.

Algorithm 5: Identifying candidate states for building decision trees
Input: protocol P = (S, s0, F, M, R).
Output: a set of states.
begin
 1: Let Candidates := { };
 2: foreach s ∈ S do
 3:   if s = s0 or s ∈ F or s ∈ ReplaceableStates(P) then
 4:     continue;
 5:   PathsToCompletion := GetForwardPaths(P, s);
 6:   if |PathsToCompletion| = 1 then
 7:     continue;
 8:   else if |PathsToCompletion| ≥ 2 and one of PathsToCompletion is changed then
 9:     Candidates := Candidates ∪ {s};
10:     continue;
11: endfor
12: return Candidates;
end
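The three filtering steps above can be sketched in Python as follows. The sketch is illustrative and relies on assumptions beyond the text: protocols are acyclic, a forward path counts as "changed" when it does not exist from the same-named state of the new protocol, and the set of replaceable states is passed in as a precomputed argument. The dictionary encoding and the toy protocols are hypothetical.

```python
def forward_paths(protocol, state):
    """Message sequences from `state` to any final state (acyclic protocols only)."""
    found = set()

    def walk(current, prefix):
        if current in protocol["finals"]:
            found.add(prefix)
        for src, msg, dst in protocol["transitions"]:
            if src == current:
                walk(dst, prefix + (msg,))

    walk(state, ())
    return found

def candidate_states(old, new, replaceable):
    """States of `old` for which a decision tree is worth building."""
    candidates = set()
    for s in old["states"]:
        if s == old["initial"] or s in old["finals"] or s in replaceable:
            continue
        fwd_old = forward_paths(old, s)
        if len(fwd_old) < 2:
            continue  # a single forward path is trivially predictable
        fwd_new = forward_paths(new, s) if s in new["states"] else set()
        if not (fwd_old <= fwd_new):  # at least one forward path was changed
            candidates.add(s)
    return candidates

# Hypothetical toy protocols: the change inserts state S3 between S2 and F1.
OLD = {"states": {"S0", "S1", "S2", "F1"}, "initial": "S0", "finals": {"F1"},
       "transitions": [("S0", "a", "S1"), ("S1", "b", "F1"),
                       ("S1", "c", "S2"), ("S2", "d", "F1")]}
NEW = {"states": {"S0", "S1", "S2", "S3", "F1"}, "initial": "S0", "finals": {"F1"},
       "transitions": [("S0", "a", "S1"), ("S1", "b", "F1"), ("S1", "c", "S2"),
                       ("S2", "e", "S3"), ("S3", "d", "F1")]}

print(candidate_states(OLD, NEW, replaceable=set()))  # → {'S1'}
```

S2 is dropped although its only forward path changed, because every conversation in S2 necessarily takes that path and no prediction is needed.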

4.2.2 Identifying Candidate Attributes. Feeding all possible attributes to a decision tree algorithm would not be a good approach, because (i) the table may contain undefined (NULL) values that lower the accuracy of the classification patterns and (ii) building the tree with all attributes is computationally heavy. Therefore, based on the candidate states computed by the technique described in the previous subsection, we identify which of the many instance attributes (Table I) in the input data table should be selected as the interesting attributes for building the decision trees that tell us which future paths the conversations in these states will take. Our approach uses the path information as follows:
- Given a candidate state (CS), we select only the attributes of the messages that belong to a backward path from the initial state to CS. We do not need to consider the attributes of the messages exchanged from CS to the final states, regardless of whether the conversations in CS will follow those paths or not. If we selected attributes of messages exchanged after the CS for which the state-based decision tree is derived, the values of some of these attributes would be undetermined for the conversations that have progressed only up to that point, and the classification rules generated by a decision tree that includes such attributes might not work well (lines (4) to (6)).


Table I. Some of the classification attributes

Attribute        Value                                  Related Message
age              Nominal (e.g., {20-29, 30-39, ...})    checkEligibility
visaType         Nominal (e.g., {New, Renew, ...})      checkEligibility
odl              Nominal (e.g., {Yes, No})              fillInApplication
relevantLicense  Nominal (e.g., {Yes, No})              fillInApplication
sponsorship      Nominal (e.g., {Yes, No})              fillInApplication
workExperience   Numeric (e.g., 1, 2, 3, ...)           submitWorkExperience
englishTest      Nominal (e.g., {LimitedUser, ...})     testEnglishAbility
referenceResult  Nominal (e.g., {Satisfaction, ...})    submitReferenceLetter
checkResult      Nominal (e.g., {accept, review})       checkApproval
universityName   Nominal (e.g., {UNSW, ...})            fillInApplicationForOverseasStudent
country          Nominal (e.g., {EU, ...})              fillInApplicationForOverseasStudent

Algorithm 6: Identifying Attributes as Input to Decision Tree
Input: candidate state CS, input data table T, and protocol P = (S, s0, F, M, R).
Output: a set of identified attributes.
begin
 1: Let TotalAttr := an attribute set of T;
 2: Let IdentifiedAttr := {} and AttrSubset := {};
 3: PathsFromStart := GetBackwardPaths(P, CS);
 4: if |PathsFromStart| = 1 then
 5:   AttrSubset := an attribute set of backPath ∈ PathsFromStart;
 6:   IdentifiedAttr := AttrSubset ∩ TotalAttr;
 7: else if |PathsFromStart| ≥ 2 then
 8:   if PathsFromStart contains incompatible path then
 9:     foreach backPath ∈ PathsFromStart do
10:       AttrSubset := an attribute set of backPath;
11:       IdentifiedAttr := IdentifiedAttr ∪ AttrSubset;
12:     M := findMessagesOfIncompatiblePaths(PathsFromStart);
13:     foreach m ∈ M do
14:       IdentifiedAttr := IdentifiedAttr - attributes of m;
15:   else
16:     foreach backPath ∈ PathsFromStart do
17:       AttrSubset := an attribute set of backPath;
18:       IdentifiedAttr := IdentifiedAttr ∪ AttrSubset;
19: return IdentifiedAttr;
end

- If there is more than one backward path from the initial state to CS, the instances in CS might have taken different paths to reach that state, which means that the instances could have different sets of attributes depending on the path they took. Hence, we divide the input data table into two or more tables, each having a subset of the attributes (based on the path) of the data table. In this case, if there are N paths leading to CS, we generate N tables (eventually, N different trees will be generated from them) (lines (16) to (18)).
- In particular, when one of the backward paths is incompatible in the context of the new protocol, we can prune the attributes of the messages involved in that path. The pruning is based on the assumption that the conversations in CS for which future interaction analysis is needed have already followed a compatible backward path, and thus we do not need to look at the attributes of the messages M of the incompatible path. Specifically, the messages M should not belong to the intersection between the incompatible and compatible backward paths. For example, if state Lodged in Figure 1 is the CS, we can prune the attributes of message submitReferenceLetter (lines (8) to (14)).

Therefore, this approach identifies the attributes relevant to the goal of the decision tree construction and prevents undefined values from being included in the input data table.

4.2.3 Labeling Instances. After identifying the candidate attributes, we provide the instances in the data table with class label information. We classify the completed instances into two classes: compatible (C) and non-compatible (NC) instances. An instance is compatible if the message sequence it took from the candidate state to a final state is compatible in the context of the new protocol. The process of labelling instances is carried out by Algorithm 7:

Algorithm 7: Labelling instances for constructing state-based trees
Input: candidate state CS, input data table T, and protocol P = (S, s0, F, M, R).
Output: labelled training set based on a state, which also includes candidate attributes.
begin
 1: Let CandidateRecords := { };
 2: Let CM := null;
 3: foreach t ∈ T do
 4:   CM := correlated message attribute of t;
 5:   if CM includes all the incoming and outgoing messages to/from CS then
 6:     if CM, from s0 to CS, is compatible then
 7:       if CM, from CS to f ∈ F, is compatible then
 8:         t is labelled with C;
 9:       else
10:         t is labelled with NC;
11:       CandidateRecords := CandidateRecords ∪ {t};
12: endfor
13: return CandidateRecords;
end

- We extract the instances that have passed through the candidate state (line 5).
- From these instances, we exclude those that have an incompatible backward path to the state (line 6).
- The remaining instances are labelled with C or NC, depending on whether the message sequences they took from the candidate state to a final state are compatible in the context of the new protocol (lines (8) to (11)).
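The labelling steps above can be sketched in Python. The sketch is a minimal illustration under stated assumptions: each completed conversation is available as a tuple of message names, protocols are deterministic and acyclic, states are matched by name, and "compatible" means that the new protocol can replay the corresponding part of the message sequence. The dictionary encoding and the toy data are hypothetical.

```python
def trace(transitions, start, messages):
    """States visited when replaying `messages` from `start`, or None if stuck."""
    step = {(src, msg): dst for src, msg, dst in transitions}
    states = [start]
    for msg in messages:
        nxt = step.get((states[-1], msg))
        if nxt is None:
            return None
        states.append(nxt)
    return states

def label_instances(old, new, cs, conversations):
    """Label completed conversations that passed through candidate state `cs`."""
    labelled = []
    for msgs in conversations:
        visited = trace(old["transitions"], old["initial"], msgs)
        if visited is None or cs not in visited:
            continue  # the conversation never reached the candidate state
        cut = visited.index(cs)
        backward, forward = msgs[:cut], msgs[cut:]
        new_back = trace(new["transitions"], new["initial"], backward)
        if new_back is None or new_back[-1] != cs:
            continue  # incompatible backward path: excluded from training
        new_fwd = trace(new["transitions"], cs, forward)
        compatible = new_fwd is not None and new_fwd[-1] in new["finals"]
        labelled.append((msgs, "C" if compatible else "NC"))
    return labelled

# Hypothetical toy protocols: the change inserts state S3 between S2 and F1.
OLD = {"initial": "S0", "finals": {"F1"},
       "transitions": [("S0", "a", "S1"), ("S1", "b", "F1"), ("S1", "c", "S2"),
                       ("S2", "d", "F1"), ("S0", "g", "F1")]}
NEW = {"initial": "S0", "finals": {"F1"},
       "transitions": [("S0", "a", "S1"), ("S1", "b", "F1"), ("S1", "c", "S2"),
                       ("S2", "e", "S3"), ("S3", "d", "F1"), ("S0", "g", "F1")]}
LOG = [("a", "b"), ("a", "c", "d"), ("g",)]

print(label_instances(OLD, NEW, "S1", LOG))
```

In practice each labelled record would also carry the candidate attributes selected for the state, so that the result can be fed directly to a decision tree learner.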


Fig. 5.

Simplified state-based decision tree for state Lodged

The number of instances in the training data set may differ from one candidate state to another, since the message sequence followed by an instance can be compatible with respect to one candidate state but incompatible with respect to another.

Example 4.1. Suppose we create the labelled training set for the candidate state Lodged (Figure 1). The instances that did not pass through the state Lodged are excluded from the dataset; namely, we disregard the instances that have taken the paths (Start. ... .Eligible. fillInApplicationForOverseasStudent(). ... Processed()). Among the remaining instances, we also exclude those that have taken the backward path (Start. ... .Eligible. fillInApplication().ApplicationReady. submitReferenceLetter().Lodged), which is incompatible in the context of the new protocol. In the end, the instances labelled with C are the ones that followed the message sequence (Start. ... Eligible.fillInApplication(). ... Lodged. ... confirm().Processed), while the instances labelled with NC are the ones that followed the message sequence (Start. ... Eligible.fillInApplication(). ... Lodged. ... reassess().Reviewed).

4.3 Tree Generation and Usage

4.3.1 Generating Tree and Interaction Patterns. Finally, we build a state-based decision tree for each of the candidate states by feeding it the attributes from the labelled training data set. In our protocol example (Figure 1), we build five state-based decision trees to classify the ongoing conversations that are migrateable to the new protocol, instead of constructing twelve trees, one per state. The first tree, corresponding to the state Eligible, is constructed using only the data related to the checkEligibility message that visa applicants sent to the service provider. For the last tree, corresponding to the state Checked, the decision tree algorithm exploits the data obtained from the messages exchanged from the initial state Start to the state Checked.
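The way the input of each tree grows with the position of its candidate state can be illustrated with a short Python sketch. It is only an illustration under assumptions: the transition list is a simplified, hypothetical fragment of the visa protocol, and the mapping from messages to attributes is invented for the example (the attribute names themselves appear in the paper, but which message carries which attribute is assumed here).

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified protocol fragment: (source state, message, target state).
TRANSITIONS: List[Tuple[str, str, str]] = [
    ("Start", "checkEligibility", "Eligible"),
    ("Eligible", "fillInApplication", "ApplicationReady"),
    ("ApplicationReady", "submitReferenceLetter", "Lodged"),
    ("Lodged", "checkApproval", "Checked"),
    ("Checked", "confirm", "Processed"),
]

# Assumed mapping from messages to the attributes extracted from them.
MESSAGE_ATTRIBUTES: Dict[str, List[str]] = {
    "checkEligibility": ["age", "workExperience"],
    "fillInApplication": ["odl", "englishResult"],
    "submitReferenceLetter": ["referenceResult"],
}


def messages_before(state: str, transitions: List[Tuple[str, str, str]]) -> List[str]:
    """Messages on transitions that can lead, directly or indirectly, to `state`."""
    reached = {state}
    frontier = [state]
    while frontier:  # walk the protocol graph backwards from the candidate state
        current = frontier.pop()
        for src, _msg, dst in transitions:
            if dst == current and src not in reached:
                reached.add(src)
                frontier.append(src)
    return [msg for _src, msg, dst in transitions if dst in reached]


def candidate_attributes(state: str) -> List[str]:
    """Attributes available to the decision tree built for `state`."""
    attributes: List[str] = []
    for msg in messages_before(state, TRANSITIONS):
        for attr in MESSAGE_ATTRIBUTES.get(msg, []):
            if attr not in attributes:
                attributes.append(attr)
    return attributes


print(candidate_attributes("Eligible"))
print(candidate_attributes("Checked"))
```

With these assumptions, the tree for Eligible only sees the attributes of checkEligibility, whereas the tree for Checked sees the attributes of every message exchanged from Start up to Checked.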


Example 4.2. Figure 5 depicts the state-based tree for the state Lodged. It shows which attributes the conversations in that state should have in order to continue to interact correctly with the service after being migrated to the new protocol. The paths from the root to the middle leaves of the tree show that, if the applicants' number of years of experience is greater than 4, their skills belong to the occupations in demand list (odl), and their English test result is ModestUser or GoodUser, their application instances can be classified as migrateable because they could follow a compatible forward path in the context of the new protocol. The combination of branching conditions on these paths can be converted to classification (interaction) patterns in the form of IF-THEN statements. The generated patterns are:

. IF workExperience > 4 AND odl = "yes" AND englishResult = "ModestUser" THEN forward path = "C"
. IF workExperience > 4 AND odl = "yes" AND englishResult = "GoodUser" THEN forward path = "C"

For simplicity, we do not show the actual decision tree and some details of the results.

4.3.2 Applying Interaction Patterns to Ongoing Conversations. Once a decision tree for a certain candidate state (CS) has been built and interaction patterns have been generated from the tree, we use the patterns to predict which future paths the ongoing conversations in CS will take. Namely, when we cannot classify conversations in CS (because the client protocols are unavailable), the interaction (classification) patterns for the state are retrieved and applied to the live conversation data; that is, we assess whether the values of the conversation attributes satisfy the IF condition of the patterns, which stipulate the features of instances that took compatible forward paths in the context of the new protocol. Furthermore, each rule is also associated with the probability that the examined conversation will take a compatible future interaction.
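Applying such patterns to live conversation data can be sketched as follows. This is a minimal sketch under assumptions: the two patterns are those of Example 4.2, but the probabilities attached to them (0.90 and 0.95) are invented for illustration, and the names Pattern and is_migrateable are hypothetical. The threshold stands for the value chosen by the service manager.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Pattern:
    """One IF-THEN interaction pattern predicting a compatible forward path."""
    conditions: Dict[str, Callable[[object], bool]]
    probability: float  # assumed to be estimated from the training data

    def matches(self, conversation: Dict[str, object]) -> bool:
        # A pattern matches when every attribute is present and satisfies its test.
        return all(
            attr in conversation and test(conversation[attr])
            for attr, test in self.conditions.items()
        )


# Patterns of Example 4.2; the probability values are illustrative assumptions.
PATTERNS: List[Pattern] = [
    Pattern({"workExperience": lambda v: v > 4,
             "odl": lambda v: v == "yes",
             "englishResult": lambda v: v == "ModestUser"}, probability=0.90),
    Pattern({"workExperience": lambda v: v > 4,
             "odl": lambda v: v == "yes",
             "englishResult": lambda v: v == "GoodUser"}, probability=0.95),
]


def is_migrateable(conversation: Dict[str, object],
                   patterns: List[Pattern],
                   threshold: float) -> bool:
    """True if some matching pattern has a probability above the threshold."""
    return any(p.matches(conversation) and p.probability > threshold
               for p in patterns)


ongoing = {"workExperience": 6, "odl": "yes", "englishResult": "GoodUser"}
print(is_migrateable(ongoing, PATTERNS, threshold=0.8))
```

A conversation that matches no pattern, or only patterns whose probability does not exceed the threshold, stays unclassified and is left to the handling described for non-migrateable conversations.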
When the data of conversations in CS satisfy the rules with a probability above a threshold set by service managers, those conversations can be classified as migrateable.

5. HANDLING NON-MIGRATEABLE CONVERSATIONS

Conversations for which no satisfactory migration solution has been found by the analysis methods described in the previous sections are doomed to fail when interacting with the new version of the protocol. In this section, we examine the solutions that can be applied to avoid interrupting these conversations. In a nutshell, since the conversations concerned are non-migrateable to the new protocol, we have to examine whether it is possible to adapt the new protocol temporarily, only for the sake of these remaining conversations. We first detail the methodology for adapting the protocol and then present how our tool supports service managers in building the adaptation.

5.1 Protocol adaptation methods

We examine two methods that can be used to temporarily modify the way clients interact with the new service protocol:

(1) Developing service adapter. An adapter is a mediator service. Its role is to make the new protocol "look like" the old protocol, so that any client interacting correctly using the old protocol can continue to interact with the new one (Figure 6). In our previous work [Benatallah et al. 2005], we proposed an approach for developing adapters based on the concept of mismatch patterns (e.g., message ordering mismatch, extra message mismatch, and missing message mismatch). Mismatch patterns capture the possible differences between the two protocols to adapt. Once the patterns to use have been identified, the adapter code can be generated automatically. However, the approach has the limitation of being dependent on the semantics of the protocol differences, which means that a given adaptation is effective only when the differences do not affect or change the functionality of the old protocol. We refer the interested reader to [Benatallah et al. 2005] for a detailed description.

Fig. 6. Adapters bridging the protocol differences. [Diagram labels: Clients, Old service protocol, Interactions based on old protocol, Adapter, New service protocol]

(2) Ad hoc protocols. To cope with the cases where we cannot develop adapters (e.g., an operation from the old protocol is removed in the new one and its features are not offered by any alternative operation), it is possible to construct an ad hoc protocol whose aim is to meet the new requirements without canceling ongoing conversations. For example, consider the working visa application protocol (Figure 1). There might still be some non-migrateable conversations in state Lodged after classifying conversations based on the model-based analysis or the inferred interaction patterns because, although their forward paths are compatible in the context of the new protocol, they followed the incompatible backward path. In this case, it is not necessary to cancel them in order to meet the updated visa law. It could be more efficient to make them fulfill the backward compatibility.
To do so, we define the ad hoc protocol in Figure 7 by adding the transition reportMedicalExamination before the final states. In the next section, we present how the development of ad hoc protocols can be facilitated using the protocol evolution tool.

Fig. 7. Example of an ad hoc protocol. [Diagram labels: checkEligibility, Start, Eligible, fillInApplication, Application Ready, cancel, Cancelled, submitReferenceLetter, Lodged, checkApproval, Checked, confirm, Confirmed, reportMedicalExamination, Processed]

5.2 Supporting ad hoc protocol development

In order to adapt their service with an ad hoc protocol, service managers need to answer a few questions: (i) which types of conversations will need ad hoc protocol development, i.e., what is the minimum set of adaptations to develop in order to satisfy as many conversations as possible? (ii) what features does the ad hoc protocol have to fulfill for a given group of conversations? Regarding the first question, the tool provides support in the form of conversation clustering. Non-migrateable conversations are grouped with respect to their current state, their history, and the forward path they will be taking (when known). For example, consider the state Lodged (Figure 1), for which the tool computed the replaceability status (in this case, non-replaceability). Non-migrateable conversations that are currently in that state can be classified into three groups: (i) conversations that have a non-compatible history and will take a compatible forward path; (ii) conversations that have a compatible history and will take a non-compatible forward path; (iii) conversations for which both history and forward path are non-compatible.

In order to assist the development of ad hoc protocols, each group of conversations, obtained as detailed above, is associated with a template that can be used as a guideline during the ad hoc protocol definition phase. A template is a tuple composed of an intersection protocol and a set of mismatch types, where the intersection protocol consists of the states and transitions that exist in the common execution paths from the initial state to the final states in the old and new protocols, passing through the state (e.g., Lodged) in which the conversations of a given group stay. The mismatch types specify the differences between the two protocols that relate only to the intersection protocol. In particular, service managers can automatically construct the intersection protocols of templates using protocol management operators [Benatallah et al. 2004a; 2006] and then further (manually) refine the intersection protocols by looking at the mismatch type information [Benatallah et al. 2005], e.g., the missing message mismatch type describing that the clients of group 1 (in the previous example) have to provide the medical examination (reportMedicalExamination) to satisfy the new visa law.

6. IMPLEMENTATION, USAGE, AND EXPERIMENTS

This section describes our prototype implementation and presents experimental results. The experiment also aims at illustrating, through a scenario, how the tool can help service managers handle the dynamic evolution of business protocols.

Fig. 8. Service Mosaic platform. [Diagram labels: Development environment; Business protocol editor; Trust negotiation protocol editor; Composition editor; Mismatch-pattern editor; Code generator from protocol model; Adapter generator; Analysis and management components; Protocol Evolution Manager; Mining Engine; Analysis and management interface; Protocol analysis and manipulation operators; Model representation, storage, and manipulation components; Repositories; Mismatch-patterns templates; Service descriptions and models; Protocol instances; Interaction logs; Classification patterns; Service manager; Calls from external clients via SOAP interface]

6.1 Prototype implementation

The tool for dynamic protocol evolution is part of a larger project called ServiceMosaic (http://servicemosaic.isima.fr), which is a CASE tool set for Web service life-cycle management. The ServiceMosaic platform has been implemented using Java and J2EE technologies as an Eclipse plugin. It is organized in three modules (see Figure 8): a development environment, model representation and manipulation components, and analysis and management components. The development environment provides means for building and editing protocol models. The model representation and manipulation components are a collection of methods for storing and managing service descriptions and protocols. The analysis and management components assist users in performing several types of analysis, such as protocol discovery or log analysis. A detailed description of the platform can be found in [Benatallah et al. 2006]. This article is concerned with three components (darkly shaded in Figure 8): the business protocol editor, the protocol evolution manager (PEM), and the mining engine.

Business Protocol Editor. The protocol editor offers a visual environment that allows service managers to create or edit protocol definitions. It has been implemented by modifying the tool that was developed to model security policies (i.e., trust negotiation) in our previous work [Skogsrud et al. 2004]. The editor allows service managers to build state machine diagrams and set properties for states (e.g., state ID, state name) and transitions (e.g., transition name). The protocol editor uses the XML representations of models to generate control tables, which provide the information required to correlate conversation instances with the protocol's states and to make sure that messages are being exchanged as specified by the protocol definitions.

Protocol Evolution Manager.
The Protocol Evolution Manager (PEM) is the GUI front-end used by service managers to carry out the model-based analysis and migrate active conversations to a new protocol. It presents the result of the analysis, that is, for each group of conversations, the possible migration strategies that can be applied. Using this tool, service managers are able to:

- Load and show the old and new protocols, and the current active conversations from the DB.
- Choose a particular state and show the conversations at that state.
- Choose a certain conversation and show its current state and the history taken by it.
- Show a client's protocol, if possible, and the interaction path between the service and client protocols.
- Perform different types of model-based analysis, classify ongoing conversations as migrateable or non-migrateable, and show the migrateable ones.
- Migrate classified conversations to the new protocol, and show several migration statistics, e.g., the percentage of migration.
- When the protocols of clients are unavailable, ask the mining engine to analyze Web service interaction logs and infer interaction patterns that will be applied to ongoing conversations.

Mining Engine. It takes as input a service interaction log and produces as output the interaction patterns inferred by the state-based decision trees. The mining engine consists of two modules, namely, the preprocessor and the decision tree builder. The preprocessor provides facilities for cleaning interaction logs, correlating messages, and extracting attributes from exchanged documents. It produces a table representation of the messages exchanged during the service execution, grouped by conversation. This table serves as input to the decision tree builder.
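The table produced by the preprocessor can be pictured with the following Python sketch. It is a simplified illustration under assumptions, not the prototype's code: log entries are assumed to be dictionaries that already carry a conversation identifier, a timestamp, a message name, and a payload of attributes, and entries without an identifier are simply dropped as the cleaning step. The field names and the function build_conversation_table are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical interaction log; field names are assumptions for illustration.
LOG: List[Dict[str, object]] = [
    {"conversation_id": "c1", "timestamp": 2, "message": "fillInApplication",
     "payload": {"odl": "yes", "englishResult": "GoodUser"}},
    {"conversation_id": "c1", "timestamp": 1, "message": "checkEligibility",
     "payload": {"age": 31, "workExperience": 6}},
    {"conversation_id": None, "timestamp": 3, "message": "checkEligibility",
     "payload": {"age": 40}},  # cannot be correlated, removed during cleaning
    {"conversation_id": "c2", "timestamp": 4, "message": "checkEligibility",
     "payload": {"age": 25, "workExperience": 2}},
]


def build_conversation_table(log: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Group log entries by conversation and flatten them into one row each."""
    grouped = defaultdict(list)
    for entry in log:
        if entry.get("conversation_id") is None:  # cleaning step
            continue
        grouped[entry["conversation_id"]].append(entry)

    table = []
    for conversation_id, entries in grouped.items():
        entries.sort(key=lambda e: e["timestamp"])  # restore message order
        row = {"conversation_id": conversation_id,
               "messages": [e["message"] for e in entries]}
        for e in entries:
            row.update(e["payload"])  # extracted attributes become columns
        table.append(row)
    return table


for row in build_conversation_table(LOG):
    print(row)
```

Each resulting row holds the correlated message sequence of one conversation together with the attribute values extracted from its messages, which is the shape of input a decision tree learner expects.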
The decision tree builder proceeds as follows:

(1) The states of the old protocol that can be used for state-based decision trees are first selected (Section 4.2.1);
(2) For each candidate state identified, a collection of attributes is then selected to characterize conversation instances (see Section 4.2.2);
(3) Historical conversation instances are labeled, and the labeled conversation instances form the training set for the candidate state considered (Section 4.2.3);
(4) The training set, characterized by the attributes selected and the label information obtained in the previous steps, is used to build a decision tree for the considered state. In our prototype implementation, we use the Weka software [Witten and Frank 2005], which implements a decision tree algorithm based on C4.5 [Quinlan 1993] (Section 4.2.4);
(5) From the decision tree, interaction patterns are generated. This last step of the algorithm is not yet implemented. It only affects the presentation of the results shown to the user.

6.2 Usage Scenario of the tool

We propose the following scenario as an illustration of the support provided to service managers. First, the service manager creates the new version of the protocol by adding or removing states and transitions (Figure 9(a) presents a screenshot of the editor). After modifying the old protocol, she starts the state replaceability analysis to identify the states such that conversations in these states can be safely migrated to the new protocol without any conditions (Figure 9(b)). These states, i.e., SApplicationReady, GCSubmitted, S-Lodged, and Cancelled, are displayed in green. In Figure 9(b), the old protocol appears on the center left and the new one on the center right. Current active conversations are displayed in the left pane, where highlighted conversations correspond to the conversations currently in the selected states. Conversations that are migrated are moved from the left pane to the right pane. The results of analysis and actions are shown in the bottom part of the window.

Fig. 9. Usage scenario for classifying migrateable conversations (1): (a) Modifying old protocol; (b) State replaceability

After migrating all compatible conversations, the service manager can investigate the remaining ones. She identifies the states of the old protocol that have compatible forward paths but different backward paths in the context of the new protocol (Figure 10(c)) and then classifies migrateable conversations from the computed states by looking at their histories (Figure 10(d)). The conversations in the identified states (i.e., Processed), colored in yellow, are highlighted in the left pane. She can migrate these conversations to the new protocol.

Fig. 10. Usage scenario for classifying migrateable conversations (2): (c) Forward path replaceability; (d) Replaceability w.r.t a history

Fig. 11. Usage scenario for classifying migrateable conversations when client protocols are known (3): (e) Backward path replaceability; (f) Replaceability w.r.t state and client protocol

Next, the service manager can investigate the states with compatible backward paths whose forward path is affected by the protocol modification (Figure 11(e)). The states identified by this analysis are Start, Eligible, ApplicationReady, and WESubmitted (colored in purple). If some of the conversations in these states have compatible forward paths, they can be migrated. Figure 11(f) shows the history of a migrated conversation in the center left window, while the client protocol corresponding to this conversation appears in the pop-up window and its current state in the new protocol is displayed in the center right window. For the conversations whose client protocols are unavailable, the service manager identifies the candidate states for inferring interaction patterns (Figure 12(g)). Then, for example, to classify the conversations in the state WESubmitted, she builds the decision tree from which the interaction patterns (rules) can be generated. On the basis of the interaction patterns, conversations in the WESubmitted state can be further classified (Figure 12(h)).

Fig. 12. Usage scenario for classifying migrateable conversations when client protocols are unknown (4): (g) Identifying branching states; (h) Decision tree w.r.t state WESubmitted


6.3 Adapting a Service Implementation to Evolving Protocols

Using top-down development approaches [Baina et al. 2004], the external specifications of a Web service (i.e., the protocol specification) can be automatically translated into internal specifications (i.e., service implementation templates/skeletons) that developers can extend with business logic. In particular, since the skeletons generated by the approach are BPEL-compatible, a BPEL execution engine such as IBM's BPWS4J (www.alphaworks.ibm.com/tech/bpws4j) can be used to execute the skeletons, including the operation implementation logic. When service managers modify the existing protocol specification, it is desirable to adapt the service implementation without regenerating the implementation skeletons from scratch and repeating the enhancement of the skeletons with business logic. In our previous work [Kongdenfha et al. 2006], we proposed a framework, based on aspect-oriented programming (AOP) [Courbis and Finkelstein 2005], for simplifying the adaptation of a service implementation in response to protocol mismatches or changes by separating the adaptation logic from the business logic. Such a separation helps to maintain the internal specifications without destructively modifying them, since only the separated adaptation logic needs to evolve when the protocol specification changes. We identified the mismatches between the old and new protocols and wove the adaptation logic related to the mismatches into the internal specifications (service implementation). After the implementation has been modified, a service manager migrates the classified ongoing conversations from the old version of the protocol to the new version controlled by the modified system.

6.4 Experiments

We implemented the prototype in Java using PostgreSQL 8.1 as the database engine. All the experiments were performed on a notebook machine with a 1.73 GHz CPU and 1 GB of memory, running Microsoft Windows XP.

6.4.1 Model-based analysis performance. In this experiment, we test the scalability of the model-based analysis (Section 3) in terms of protocol complexity and number of ongoing conversations. To this end, we defined 5 pairs of protocols (each pair consisting of an old and a new protocol) with a varying number of states (i.e., 10, 20, 30, 40, and 50 states). We populated the system with a number of artificial conversations (i.e., 1000, 2500, 5000, 7500, and 10000). Each conversation was generated by randomly choosing a path from the initial state of the old protocol to an arbitrary state, which is considered the current state of the conversation. In Figure 13, graphs (a)-(c) show the performance of the replaceability analysis. Graph (a) shows the time needed to complete the state replaceability (SR) analysis and the forward path replaceability (FPR) analysis for protocols with a varying number of states. Graph (b) shows the time needed to compute the replaceability analysis w.r.t a history (R w.r.t H) when carried out for a varying number of conversations. For comparison, graph (c) indicates the time that would be required if the analysis were performed conversation by conversation rather than directly at the protocol level. As can be seen from the graphs, the time taken to complete these analyses grows linearly with respect to the number of states and the number of conversations.
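The generation of artificial conversations described above can be approximated with a random walk over the protocol. The sketch below is one possible reading, under assumptions: the walk length is drawn uniformly up to a hypothetical max_steps bound, the protocol is assumed deterministic (one target per state and message), and the transition list is a small invented fragment rather than one of the protocols used in the experiment.

```python
import random
from typing import List, Tuple

Transition = Tuple[str, str, str]  # (source state, message, target state)

# Hypothetical protocol fragment used only to exercise the generator.
TRANSITIONS: List[Transition] = [
    ("Start", "checkEligibility", "Eligible"),
    ("Eligible", "fillInApplication", "ApplicationReady"),
    ("Eligible", "cancel", "Cancelled"),
    ("ApplicationReady", "submitReferenceLetter", "Lodged"),
    ("Lodged", "checkApproval", "Checked"),
]


def random_conversation(transitions: List[Transition], initial_state: str,
                        rng: random.Random, max_steps: int = 50):
    """Random walk from the initial state; returns (message history, current state)."""
    outgoing = {}
    for src, msg, dst in transitions:
        outgoing.setdefault(src, []).append((msg, dst))
    state, history = initial_state, []
    for _ in range(rng.randint(0, max_steps)):
        if state not in outgoing:  # no outgoing transition: stop here
            break
        msg, state = rng.choice(outgoing[state])
        history.append(msg)
    return history, state


def replay(transitions: List[Transition], initial_state: str,
           history: List[str]) -> str:
    """Recompute the current state of a conversation from its message history."""
    step = {(src, msg): dst for src, msg, dst in transitions}
    state = initial_state
    for msg in history:
        state = step[(state, msg)]
    return state


rng = random.Random(7)  # fixed seed so that runs are repeatable
conversations = [random_conversation(TRANSITIONS, "Start", rng) for _ in range(1000)]
print(len(conversations))
```

Replaying a generated history with replay always yields the recorded current state, which gives a cheap consistency check on the synthetic data before it is loaded into the system.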

Fig. 13. Results of experiments.
(a) Time taken to perform State Replaceability (SR) and Forward Path Replaceability (FPR). [Axes: Time (ms) vs. No. states]
(b) Time taken to perform Replaceability w.r.t a history (R w.r.t H). [Axes: Time (ms) vs. No. conversations; one series per protocol size (10, 20, 30, 40, 50 states)]
(c) Time taken to compare two protocols and time taken to individually look at conversations. [Axes: Time (ms) vs. No. conversations]
(d) Precision and Recall for different sizes of training data (# instances = 15000). [Series: Precision, Recall, Accuracy; horizontal axis: Percentage of instances used as training data]
(e) Percentage of excluding irrelevant instances out of input dataset. [Axes: Percentage of labeled instances vs. No. instances]

6.4.2 Future interaction analysis evaluation. In a second experiment, we evaluated the applicability of the future interaction analysis method (Section 4). The actual accuracy of the method necessarily depends on the specificities of the business process considered (some processes may be more predictable than others). The evaluation we performed only aims at testing, in an artificial setting, whether the method can be used with large datasets containing between 1000 and 20,000 instances. The experiments have been conducted on a synthetic dataset obtained by simulating the working visa application protocol described in Figure 1. Messages corresponding to the records of each dataset are correlated, and the attributes of the correlated messages are extracted, e.g., SequenceID, Timestamp, Age, RelevantLicense, EnglishResult, WorkExperience, ReferenceResult, and so on. As a preprocessing step, attribute values, when needed, were discretized, and a training set of conversation instances was labeled C or NC for each candidate state identified (see Section 4.2.3). For each candidate state, the corresponding decision tree is built from the training set using the Weka software [Witten and Frank 2005]. We then measured the accuracy of the inferred interaction patterns in terms of precision and recall, varying the size of the training and validating data. In these tests, the accuracy value is based on the proportion of correctly classified instances. Precision and recall can be defined as follows: given a set X of conversation instances having a message sequence compatible with the new protocol and a set Y of instances classified as compatible by the inferred interaction patterns, the recall corresponds to the ratio |X ∩ Y|/|X| and the precision to the ratio |Y ∩ X|/|Y|. Graph (d) shows these metrics computed from the same dataset using different proportions of data for training and testing.
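The two ratios can be computed directly from sets of conversation identifiers, as in the short Python sketch below. It takes X as the instances whose message sequence is actually compatible and Y as the instances that the patterns classify as compatible; the identifiers are made up for the example.

```python
from typing import Set, Tuple


def precision_recall(actual_compatible: Set[str],
                     predicted_compatible: Set[str]) -> Tuple[float, float]:
    """Return (precision, recall) = (|X ∩ Y| / |Y|, |X ∩ Y| / |X|)."""
    hits = len(actual_compatible & predicted_compatible)
    precision = hits / len(predicted_compatible) if predicted_compatible else 0.0
    recall = hits / len(actual_compatible) if actual_compatible else 0.0
    return precision, recall


X = {"c1", "c2", "c3", "c4"}  # truly compatible instances (toy data)
Y = {"c2", "c3", "c5"}        # instances classified as compatible (toy data)
precision, recall = precision_recall(X, Y)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the three predicted instances are truly compatible (precision 2/3), and two of the four compatible instances are found (recall 1/2).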
From this graph, we can see that the size of the training and testing data affects the accuracy. As expected, using a larger training dataset increases the precision and recall of the interaction patterns. Graph (e) shows that the algorithm used for labeling instances (Algorithm 7) filters out about 75% of the instances, namely those that have undefined (NULL) or irrelevant attribute values. Eliminating irrelevant instances is important because the cost of computing trees from a reduced number of instances is much lower than that of computing trees from the whole set of instances.

7. RELATED WORK

Business protocol evolution is related to other evolution problems: database schema evolution, software component evolution, software refactoring, workflow evolution, and protocol evolution.

Database schema evolution: The database community has considered the problem of managing schema evolution, mainly in the field of object-oriented databases [Andany et al. 1991; Bertino and Martino 1993; Lautemann 1996; Ferrandina et al. 1994; Estublier and Nacer 2000]. To meet the new requirements of database applications, the schema definition is changed over time by adding or removing schema elements. The work in this area has developed several techniques that support the mapping of schema elements from the old schema to the new one. Such approaches include class versioning, which allows applications started under the old schema to continue to use it, and conversion, which transforms the data of the database to make them comply with the modified schema. However, business protocol evolution differs from database schema evolution in two significant ways. First, in the case of class versioning, it is acceptable for old applications to run according to the old schema, whereas there might be situations where it is not possible for ongoing conversations to continue to run according to the old protocol, e.g., when there are security holes in the definition of a security protocol. Second, the database cannot be accessed by any application during the database reorganization (conversion) and, after the conversion completes, the applications implemented against the old schema have to be updated, compiled, and restarted against the new schema.
In contrast to this technique, protocol evolution needs to migrate ongoing conversations to the new protocol without adapting and restarting them. In addition, unlike in database conversion, it is not possible to migrate all the conversations to the new protocol. If, instead of the conversion technique, we adopt the approach of simulating schema evolution by object-oriented views [Bratsberg 1992; Tresch and Scholl 1993], which enables different applications to use different views (seen as different schemas), there is no need for conversion. However, applying this approach to protocol evolution causes a problem similar to that of class versioning. Therefore, we are unable to apply the techniques for database schema evolution in this context.

Software component evolution: Software component evolution has been considered important for obtaining the benefits of component-based software development, such as component reuse, easy maintenance, and greater flexibility. Components evolve because they are rarely flawless. The evolution is a result of satisfying new application requirements, ranging from software structure changes to problem and bug fixes. Most solutions to this problem are based on versioning mechanisms [Englander 2001; Rakic and Medvidovic 2001; Eisenbach et al. 2003; Stuckenholz 2005], which provide the ability to distinguish the various versions of a software component evolving over time. To give the version information needed for compatibility checks, they support enhancing the filenames of the libraries with version numbers, or the libraries themselves with special meta-files (e.g., manifest files in an XML format). Such mechanisms thus enable multiple versions of a component to exist in a system and allow applications to use different versions of one component. However, these approaches are not applicable to our problem, because the objective of our work differs from theirs.

Software refactoring: In real-world environments, software is modified, improved, and adapted to meet new requirements by adding new features or fixing bugs. Refactoring is the process that reorganizes the internal structure of a software system without altering its external behavior [Fowler 1999]. Software refactoring has been successful in restructuring the source code of a software system to improve the quality of the software (e.g., reusability, complexity, modularity, etc.) [Bergstein 1991; Mens and Tourwé 2004; Tokuda and Batory 2001]. To facilitate adaptations and extensions for software evolution, refactoring performs edit functions, such as adding classes, variables, and methods, or moving variables up the class hierarchy. However, our work differs in that we perform change impact analysis (based on protocol models) and data mining-based analysis on conversations, and classify the migrateable ones in order to deal with the protocol evolution problem.

Workflow evolution: Protocol evolution has some similarities with workflow evolution [Ellis et al. 1995; Casati et al. 1998; Sadiq 2000; Kradolfer and Geppert 1999; Agostini and Michelis 2000; van der Aalst 2001; Vieira and Silva 2005]. Ellis et al. raised the issues of dynamic workflow change in their work [Ellis et al. 1995]. They exploited a Petri net abstraction for modeling dynamic change, that is, change "on the fly" while workflow instances are running.
Their approach is based on "a change region" that contains the parts of the Petri net directly affected by the change. However, the change region has to be calculated manually. In versioning mechanisms [Joeris and Herzog 1998; Kradolfer and Geppert 1999; Vieira and Silva 2005], every change causes the creation of a new version of the workflow, and each instance is bound to a particular version. The active instances bound to a certain version of the workflow continue to run according to this version [van der Aalst 2001]. They are not affected by workflow evolution because that version is not altered to reflect the changes. The MILANO workflow management system [Agostini and Michelis 2000] provides techniques for determining whether an active instance can be migrated to the new workflow, and for automatically calculating the states in which migrateable instances reside. Van der Aalst [van der Aalst 2001] proposes an approach for tackling the dynamic change bug, which refers to errors caused by migrating an instance from the old workflow to the new one. As in MILANO, the approach determines the "change region", the part of the workflow that is affected by changes. If instances are in the change region, their migration is postponed until they exit the region. However, in protocol evolution, it might be necessary to transfer the instances in the change region to the new protocol immediately, without delaying their migration (e.g., when required by law). To do so, we look at the properties of conversations within the region and identify the ones that are migrateable to the new protocol. Some works [Casati et al. 1998; Sadiq 2000; Rinderle et al. 2003] propose mechanisms for checking the compliance of all active instances with the new workflow,


using the information on the state and execution trace of an instance. In WIDE [Casati et al. 1998], Casati et al. present a set of basic operations that allow the modification of a workflow schema while preserving structural and behavioral correctness. The workflow developer has to manually group instances according to the impact of the workflow evolution. In contrast, in our framework the grouping and classification of instances can be done automatically through the replaceability analysis, or through the interaction patterns inferred by analyzing service interaction logs. Sadiq [Sadiq 2000] introduces a three-phase modification process consisting of defining, conforming, and enacting the workflow modification. That work provides two types of grouping methods, while our framework enables service managers to conduct a more fine-grained classification and to choose from a wider variety of migration strategies. Also, these mechanisms [Casati et al. 1998; Sadiq 2000; Rinderle et al. 2003] are not sufficient for handling business protocol evolution, since they cannot guarantee the correct future interaction of the migrated instances in the new protocol, which is one of the important requirements in determining the migrateability of instances.

Web service versioning: In the context of Web services, some recent works [Brown and Ellis 2004; Kaminski et al. 2006] have proposed versioning techniques for managing the problem of Web service evolution. Brown and Ellis proposed an approach based on the use of version namespaces and of version numbers in UDDI entries. The approach allows multiple versions of a Web service to coexist in order to support client services that depend on earlier versions of the service, as in database schema evolution and software component evolution. Kaminski et al.
presented a design technique called Chain of Adapters to handle the problem of managing Web service versions and to achieve backward compatibility with clients written to work with older versions of the Web service. Their approach has the limitation that, as new versions of a service come out, the chain of adapters becomes long, and it takes much more time for clients compatible with an earlier version of the service to interact with the most recent version.

Protocol evolution: Several approaches have been developed to support the dynamic evolution of protocols, e.g., communication protocols or security protocols. Ryan and Wolf [Ryan and Wolf 2004] explored the problem of keeping a distributed application running when the inter-component protocols on which its distributed components base their communication evolve, and they proposed a technology called event-based translation as a solution. When a new protocol has been defined, event-based translation techniques avoid the need to alter the application code by making the application handle the semantic concepts of the elements in the protocol rather than its syntactic details. However, our approach differs in that we deal with business protocol evolution at the higher layers of the interoperability stack supporting interaction among services, namely, business-level protocols, rather than at the lower layers, such as the communication or transport layer (SOAP) [Alonso et al. 2004]. Ahmed [Ahmed 2006] proposed an approach for helping clients adapt to evolving business protocols. That work defined a set of operations used for modifying protocols, and provided an algorithm for calculating new client protocols compatible with the changed provider protocol by considering the list of operations applied


to change the old provider protocol. The newly created client protocols are propagated to the clients so that they can adjust their systems to continue to interact successfully with the new provider protocol. In contrast to this "client-side" adaptation mechanism, our approach of adapting conversations to the new protocol is performed transparently to clients. Their solution also has to compute a new client protocol for each client interacting with the provider service. This may make the computation expensive because there could be a very large number of clients.

Relationship with Our Previous Work: This article presents our vision for resolving the dynamic protocol evolution problem in SOAs. This work is part of the ServiceMosaic project [Benatallah et al. 2006], which has been carried out by the Service Oriented Computing (SOC) group at the University of New South Wales. The ServiceMosaic platform (http://servicemosaic.isima.fr) is a CASE toolset for modelling, analyzing, and managing Web service models, including business protocols, orchestrations, and adapters. Some of the replaceability analyses described in the model-based analysis (Section 3) were presented in [Benatallah et al. 2004a; 2006; Ryu et al. 2007]; this article extends the previous work with a more fine-grained analysis of protocol evolution. In addition, this work is based on the methodology developed in our previous work [Skogsrud et al. 2007], which supports the evolution management of security protocols (i.e., trust negotiation protocols). That approach presents constraints specific to security protocols and provides analysis methods for the change impact of security protocols.
This methodology is not directly applicable to business protocols, and hence the following changes are required: (i) as the security protocol model specifies the set of credentials (signed statements describing attributes of a client) disclosed to proceed to a state, rather than a sequence of messages, the analysis of security protocol change impacts has to be modified to operate on conversations, in terms of message sequences; (ii) the violation of security-specific constraints occurs only when the condition for proceeding to a state is restricted (i.e., additional credentials are required), whereas in the management of business protocol evolution the violation detection technique has to be extended to consider not only adding messages (in the security protocol, adding credentials) but also removing messages; (iii) to guarantee correct interaction after migrating conversations to the new business protocol, it is important to predict the forward paths that can be taken by each conversation, whereas the management of security protocol evolution does not consider which forward path a conversation can take in the new protocol; (iv) in addition, the approach of this paper employs data mining techniques to handle the situation in which the analysis methods cannot be performed because client protocols are unavailable. Although the methodology is similar to that used for security protocol evolution, the solutions and results for the problem are different.

8. DISCUSSION AND FUTURE WORK

This paper provided an approach to tackle the problem of dynamic evolution of business protocols. In particular, we identified properties that can be used as requirements in determining which conversations can be migrated to a new protocol when an old one has been changed. In addition, to analyze the impact of protocol


changes, we presented the overall change impact analysis, such as protocol replaceability and the other types of path- and state-based analysis, and the detailed change impact analysis, based on additional knowledge (i.e., forward paths and backward paths). According to the analysis, the grouping (classification) of ongoing conversations is performed automatically by our tool, rather than manually by service managers. Automatic classification plays an important role in supporting flexibility in Service-Oriented Architectures, where there are large numbers of interacting services and it is necessary to adapt dynamically to the new requirements and opportunities that arise over time. In addition, we proposed a data mining approach that can be applied when client protocols are unknown.

The main result of this paper is a comprehensive approach to dynamic protocol evolution management, in which we provide a formal model, approaches for change impact analysis, techniques for inferring interaction patterns used for predicting the forward paths of conversations, and tool support for migrating active conversations from the old to the new protocol without causing problems such as violations of the identified properties.

The approach presented in this paper is not specific to business protocols; it can be applied to a large extent to processes and to traditional services as well, as long as a trace of the execution is available. The only part that is specific is the analysis of the protocols of clients. The proposed approach might block the execution of ongoing conversations during the change impact analysis in order to guarantee the consistency of the analysis results with the current states of conversations.
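To make the idea of trace-based classification more concrete, the following is a minimal sketch, not the analysis implemented in our tool. It assumes that a protocol is modeled as a deterministic finite state machine and that each ongoing conversation is available as a list of message names. The class `Protocol`, the function `classify`, and all state and message names are hypothetical and introduced only for illustration. A conversation is treated as migrateable here if its trace can be replayed in the new protocol and at least one forward path from the reached state leads to a final state; the replaceability analysis and the use of client protocols are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Protocol:
    """Hypothetical protocol model: a deterministic finite state machine."""
    initial: str
    finals: set
    transitions: dict  # maps (state, message) to the next state

    def replay(self, trace):
        """Return the state reached by the trace, or None if a message is not allowed."""
        state = self.initial
        for message in trace:
            state = self.transitions.get((state, message))
            if state is None:
                return None
        return state

    def can_complete(self, state):
        """Return True if some forward path leads from `state` to a final state."""
        seen, stack = set(), [state]
        while stack:
            current = stack.pop()
            if current in self.finals:
                return True
            if current in seen:
                continue
            seen.add(current)
            stack.extend(dst for (src, _msg), dst in self.transitions.items()
                         if src == current)
        return False


def classify(conversations, new_protocol):
    """Split conversation ids into (migrateable, non_migrateable) lists."""
    migrateable, non_migrateable = [], []
    for conv_id, trace in conversations.items():
        state = new_protocol.replay(trace)
        if state is not None and new_protocol.can_complete(state):
            migrateable.append(conv_id)
        else:
            non_migrateable.append(conv_id)
    return migrateable, non_migrateable


# Illustrative new protocol: login is mandatory before searching,
# and "Browsing" is a dead end with no forward path to a final state.
new_protocol = Protocol(
    initial="Start",
    finals={"Paid"},
    transitions={
        ("Start", "login"): "LoggedIn",
        ("Start", "browse"): "Browsing",
        ("LoggedIn", "search"): "Searching",
        ("Searching", "order"): "Ordered",
        ("Ordered", "pay"): "Paid",
    },
)

# Traces of ongoing conversations that were started under the old protocol.
conversations = {
    "c1": ["login", "search"],
    "c2": ["search"],            # not replayable: search without login
    "c3": ["login", "search", "order"],
    "c4": ["browse"],            # replayable, but no forward path to "Paid"
}

print(classify(conversations, new_protocol))
# prints (['c1', 'c3'], ['c2', 'c4'])
```

Because the check needs only the execution trace and the new protocol definition, the same sketch would apply to processes or traditional services for which a trace is available.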
In addition, we need to discuss the degree of protocol complexity supported by the approach, which is an important consideration in the management of dynamic protocol evolution, as excessive complexity might affect the performance of the approach. While protocol complexity can be measured in terms of state complexity (how many states a protocol has), control-flow complexity, data-flow complexity, and resource complexity [Cardoso 2007], we restrict the complexity discussion to the first two perspectives, which are most related to our problem context. First, our approach scales well for complex protocols with many states, as the time taken to perform the change impact analysis increases linearly with the number of states. The other complexity aspect is related to service interaction patterns [Barros et al. 2005] or workflow patterns [van der Aalst et al. 2003]. Our change impact analysis supports only a subset of these patterns (e.g., sequence, loop, or-split). Therefore, the proposed approach should be extended in order to achieve comprehensive pattern support in protocol evolution management. We leave this issue to our future research. In addition, with respect to performance, we believe that performance is influenced more by how much a protocol is changed and where the changes happen in the protocol than by the protocol complexity per se.

In future work, first, the semantic equivalence of protocol changes will be addressed, since, when comparing two protocols, we currently examine only the syntactic differences between them, rather than the semantic changes. To do so, we will present a variety of protocol change operations (e.g., adding a message sequentially or in parallel, removing a message sequentially or in parallel, merging two messages into one) and will consider change logs (i.e., which operations are applied) in analyzing the compatibility properties. A good example is that, when two messages are merged into one, a client that followed the protocol using the previous two messages is, from the semantic point of view, in the same state as a new client that uses the merged message. This means that the state located after the two merged messages can be regarded as semantically equivalent. We also plan to extend the change impact analysis to what-if analysis and other types of analysis that help service managers or business users to plan protocol changes and improve the quality of their services to their business partners. For example, such analyses could determine how many clients cannot be migrated to the new protocol after a change and, in case the protocol is relevant to business transactions, how the business profit is affected by the change. Second, we plan to identify areas of improvement in business protocol definitions and exploit the knowledge generated by the analysis in the context of service optimization. Finally, providing a variety of these analyses is not straightforward, since there exist many different types of analyses for users to conduct, and it is difficult to satisfy their needs with predefined queries. Hence, we plan to provide OLAP-style functionalities that allow service managers to perform the analyses that fit their needs.

REFERENCES

Agostini, A. and Michelis, G. 2000. Improving Flexibility of Workflow Management Systems. In Business Process Management.
Ahmed, A. 2006. Management of the impact of change of web service protocols. Internship report, National Institute of Applied Sciences, Lyon.
Alonso, G., Casati, F., Kuno, H., and Machiraju, V. 2004. Web Services - Concepts, Architectures and Applications. Springer-Verlag.
Andany, J., Leonard, M., and Palisser, C. 1991. Management Of Schema Evolution In Databases. In Proc. 17th Conf. Very Large Data Bases (VLDB’91).
Baina, K., Benatallah, B., Casati, F., and Toumani, F.
2004. Model-Driven Web Service Development. In Proc. 16th Int’l Conf. Advanced Information Systems Eng. (CAiSE’04).
Barros, A. P., Dumas, M., and ter Hofstede, A. H. M. 2005. Service interaction patterns. In Business Process Management. 302–318.
Benatallah, B., Casati, F., Grigori, D., Nezhad, H. M., and Toumani, F. 2005. Developing Adapters for Web Services Integration. In Proc. 17th Int’l Conf. Advanced Information Systems Eng. (CAiSE’05).
Benatallah, B., Casati, F., and Toumani, F. 2004a. Analysis and Management of Web Service Protocols. In Proc. 23rd Int’l Conf. Conceptual Modeling (ER 2004).
Benatallah, B., Casati, F., and Toumani, F. 2004b. Web Service Conversation Modeling: A Cornerstone for e-Business Automation. In IEEE Internet Computing.
Benatallah, B., Casati, F., and Toumani, F. 2006. Representing, analysing, and managing web service protocols. Data and Knowledge Eng. 58, 3 (Sept.).
Benatallah, B., Casati, F., Toumani, F., Ponge, J., and Nezhad, H. 2006. Service Mosaic: A Model-Driven Framework for Web Services Life-Cycle Management. In IEEE Internet Computing.
Bergstein, P. 1991. Maintenance of object-oriented systems during structural evolution. Theory and Practice of Object Systems 3, 3.
Bertino, E. and Martino, F. 1993. Object-Oriented Database Systems: Concepts and Architecture. Addison-Wesley.
Bratsberg, S.-E. 1992. Unified Class Evolution by Object-Oriented Views. In Proc. 11th International Conf. on the Entity-Relationship Approach.
Brown, K. and Ellis, M. 2004. Best practices for web services versioning. IBM Technical Report.


Cardoso, J. 2007. Complexity analysis of bpel web processes. Software Process: Improvement and Practice 12, 1, 35–49.
Casati, F., Ceri, S., Pernici, B., and Pozzi, G. 1998. Workflow evolution. Data and Knowledge Eng. 24.
Castellanos, M., Casati, F., Shan, M., and Dayal, U. 2005. iBOM: A Platform for Intelligent Business Operation Management. In Proc. 21st Int’l Conf. Data Eng. (ICDE’05).
Courbis, C. and Finkelstein, A. 2005. Towards aspect weaving applications. In Proc. 27th Int’l Conf. Software Eng. (ICSE’05).
Eisenbach, S., Jurisic, V., and Sadler, C. 2003. Managing the evolution in .net programs. In Proc. of the 6th IFIP International Conference on Formal Methods for Open Object-based Distributed Systems.
Ellis, C., Keddara, K., and Rozenberg, G. 1995. Dynamic change within workflow systems. In Proc. of the Conference on Organizational Computing Systems.
Englander, R. 2001. Developing Java Beans. O’Reilly.
Estublier, J. and Nacer, M. 2000. Schema evolution in software engineering databases – a new approach in adele environment. CAI Computer and Artificial Intelligence Journal 19.
Ferrandina, F., Meyer, T., and Zicari, R. 1994. Implementing lazy database updates for an object database system. In Proc. 20th Conf. Very Large Data Bases (VLDB’94).
Fowler, M. 1999. Refactoring: Improving the Design of Existing Code. Addison-Wesley.
Grigori, D., Casati, F., Dayal, U., and Castellanos, M. 2001. Improving Business Process Quality through Exception Understanding, Prediction, and Prevention. In Proc. 27th Conf. Very Large Data Bases (VLDB’01).
Joeris, G. and Herzog, O. 1998. Managing Evolving Workflow Specifications. In Proc. the 3rd IFCIS International Conf. on Cooperative Information Systems.
Kaminski, P., Muller, H., and Litoiu, M. 2006. A design for adaptive web service evolution. In Workshop on Software Engineering for Adaptive and Self-Managing Systems.
Kongdenfha, W., Saint-Paul, R., Benatallah, B., and Casati, F. 2006. An Aspect-Oriented Framework for Service Adaptation. In Proc. of the 4th International Conference on Service Oriented Computing.
Kradolfer, M. and Geppert, A. 1999. Dynamic Workflow Schema Evolution Based on Workflow Type Versioning and Workflow Migration. In CoopIS.
Lautemann, S.-E. 1996. An introduction to schema versioning in OODBMS. In DEXA Workshop.
Mens, T. and Tourwé, T. 2004. A Survey of Software Refactoring. In Proc. 20th Int’l Conf. Data Eng. (ICDE’04).
Motahari, H., Saint-Paul, R., Benatallah, B., and Casati, F. 2007. Protocol Discovery from Imperfect Service Interaction Logs. In Proc. 23rd Int’l Conf. Data Eng. (ICDE’07).
Nezhad, H. M., Saint-Paul, R., Benatallah, B., Casati, F., and Andritsos, P. 2007. Message correlation for conversation reconstruction in service interaction logs. Technical Report, UNSWCSE-TR-0709, University of New South Wales.
Quinlan, J. R. 1986. Induction of decision trees. In Machine Learning. Morgan Kaufmann.
Quinlan, J. R. 1993. C4.5: Programs for Machine Learning.
Rakic, M. and Medvidovic, N. 2001. Increasing the confidence in off-the-shelf components: a software connector-based approach. In Proc. 2001 symposium on Software reusability.
Rinderle, S., Reichert, M., and Dadam, P. 2003. Supporting workflow schema evolution by efficient compliance checks. Technical Report 2003-02, University of Ulm.
Ryan, N. and Wolf, A. 2004. Using Event-Based Translation to Support Dynamic Protocol Evolution. In Proc. 26th Int’l Conf. Software Eng. (ICSE’04).
Ryu, S. H. 2007. A framework for managing the evolving web service protocols in service-oriented architectures. Master Thesis.
Ryu, S. H., Saint-Paul, R., Benatallah, B., and Casati, F. 2007. A framework for managing the evolution of business protocols in web services. In APCCM ’07: Proceedings of the fourth Asia-Pacific conference on Conceptual modelling. Australian Computer Society, Inc., Darlinghurst, Australia, 49–59.

Sadiq, S. 2000. Handling Dynamic Schema Change in Process Models. In Proc. 11th Australian Database Conference.
Skogsrud, H., Benatallah, B., and Casati, F. 2004. Trust-Serv: Model-Driven Lifecycle Management of Trust Negotiation Policies for Web Services. In Proc. 13th World Wide Web Conf. (WWW2004).
Skogsrud, H., Benatallah, B., Casati, F., and Toumani, F. 2007. Managing Impacts of Security Protocol Changes in Service-Oriented Applications. In Proc. 29th Int’l Conf. Software Eng. (ICSE’07).
Stuckenholz, A. 2005. Component evolution and versioning state of the art. In ACM SIGSOFT Software Engineering Notes.
Tokuda, L. and Batory, D. 2001. Evolving Object-Oriented Designs with Refactorings. In Automated Software Engineering.
Tresch, M. and Scholl, M. 1993. Schema Transformation without Database Reorganization. In SIGMOD Record, 22(1).
van der Aalst, W. 2001. Exterminating the Dynamic Change Bug: A Concrete Approach to Support Workflow Change. In Information Systems Frontiers, 3(3).
van der Aalst, W. M. P., ter Hofstede, A. H. M., Kiepuszewski, B., and Barros, A. P. 2003. Workflow patterns. Distributed and Parallel Databases 14, 1, 5–51.
Vieira, P. and Silva, A. 2005. Adaptive Workflow Management in WorkSCo. In DEXA Workshop.
Witten, I. and Frank, E. 2005. Data Mining: Practical machine learning tools and techniques. Morgan Kaufmann Publishers.

Received July 2007; Accepted November 2007

ACM Transactions on the Web, Vol. V, No. N, November 2007.
