SOCA (2010) 4:109–136 DOI 10.1007/s11761-010-0060-9
ORIGINAL RESEARCH PAPER
Protocol contracts with application to choreographed multiparty collaborations Ashley McNeile
Received: 15 April 2009 / Revised: 13 April 2010 / Accepted: 3 May 2010 / Published online: 16 May 2010 © Springer-Verlag London Limited 2010
Abstract E-commerce collaborations and cross-organizational workflow applications are increasingly attractive given the universal connectivity provided by the Internet. Such applications are inherently concurrent and non-deterministic, so standard software engineering practices are inadequate, and we need new techniques to design extended collaborations and ensure that the implemented designs will behave correctly. The emerging technique for achieving this is to use a choreography, a global description of the possible sequencing of message exchange between the participants, as the basis for both the design of the collaboration and verification of its behavior. We describe a new technique that uses compositions of partial descriptions to define a choreography and show how the technique can be used to model the use of data and computation in the rules of the collaboration. We define conditions for correctness and show that they can be applied separately to each partial description. We demonstrate the expressive power of the technique with examples and discuss how it improves on previously published approaches. Keywords Behavior contract · Verification · Service collaboration · Service choreography · Process algebra
1 Introduction With networks now providing universal connectivity across organizational boundaries, business-to-business e-commerce and cross-organizational workflow applications are increasingly common. In such applications, two or more organizations arrange for their systems to collaborate in a shared A. McNeile (B) Metamaxim Ltd., London, UK e-mail:
[email protected]
business process, communicating by message exchange across a network infrastructure. The interaction between the systems engaged in the services can become complex, involving multiple message types and many different possible patterns of message exchange. Current software engineering practice does not offer a generally accepted approach to the design of extended multiparty collaborations and consequently industry practice generally resorts to two expedients:
– The use of multiple simple collaborations, each involving only a pair of participants and invoking manual intervention when the collaboration enters a state that, because of the fragmentation of the process, the software is not designed to handle. While this ensures that the complexity of the software is bounded to a level that can be validated using conventional software engineering techniques, in general it results in business processes that are slower, less scalable and more expensive than they need be. – The use of “packaged” solutions that require business processes to adhere to predefined proprietary templates. The templates are provided by specialist vendors who are able to invest in evolving and validating standard solutions over time. This works for some well-standardized applications but sometimes requires that processes are forced to fit and, more importantly, does not address cases where competitive advantage stems from innovative custom solutions.
This attests to the value in obtaining a general approach to designing extended collaborations, simple and practical enough to be used routinely in the context of custom solution design.
123
110
1.1 The challenge Designing services that can engage successfully in extended collaborations is a challenge. Unlike a program or process that follows a single procedural description, a collaboration involves the concurrent interaction of independent participants and is subject to various unpredictable factors at execution time that can affect its behavior, progress and outcome; in particular: – Differences in the relative speed or scheduling of the hardware/software at each participant. – Variations in latency of message transfer by asynchronous messaging systems. – Execution choices made by the infrastructure, independently of the application, when handling synchronization across distributed components.1 Considered as a whole, the behavior of a collaboration is the emergent behavior of the interaction between its participants and is non-deterministic in nature. This non-determinism renders conventional testing techniques ineffective, as it is very hard to design a test strategy that provides adequate coverage of the execution possibilities and it is not generally possible to repeat runs to recreate errors. As Owicki and Lamport observe: “There is a rather large body of sad experience to indicate that a concurrent program can withstand very careful scrutiny without revealing its errors. The only way we can be sure that a concurrent program does what we think it does is to prove rigorously that it does it.” [16]. The emerging approach to building extended service collaborations that are provably reliable is to base the design on a global contract that specifies the allowable patterns of message exchange between the participants from a global perspective. Such a global contract is known as a choreography. Building a collaboration based on a choreography has three steps as depicted in Fig. 1: 1. At design-time, a choreography is developed which describes all meaningful sequences of message exchange between the participants. 2. Using some kind of mechanical process, a specification of the behavior of each participant is extracted from the overall choreography.2 3. At run time, the participants interact, each behaving according to its own behavior specification.
1
Such as using a 2-phase commit mechanism for synchronous message exchange.
2
In the jargon of choreography theory this is called “end-point projection”.
123
SOCA (2010) 4:109–136
At run time (step 3), the participants behave independently. Each is free to send and receive messages according to its own behavioral specification and without any central orchestrating or controlling entity to enforce conformance to the original choreography. This procedure guarantees a degree of compatibility between the designs of the behaviors of the participants but, unfortunately, this is not enough by itself to guarantee that the behavior of the collaboration at step 3 does not depart from the choreography. It may be that the collaboration deadlocks leaving messages unprocessed or allows patterns of message exchange not envisaged in the original choreography definition and which result in combinations of states of the participants that were not intended and have no meaning. As the choreography specifies the meaningful sequences of exchange, it is clearly important that departure from it is not possible, as such departure represents entering an unintended and meaningless state. The behavioral rules embedded in the participants must therefore be strong enough to prevent such departure and this, in turn, requires that the choreography conforms to certain structural conditions. The problem of determining general conditions on the form a choreography which are sufficient to guarantee that the interaction of extracted behaviors is bound to adhere to the choreography is known as the realizability problem. 1.2 Choreography realizability Realizability of a choreography is determined by a combination of: – The communication model used for message exchange between participants. This model describes the communication channels (or queues) between the participants and is characterized by the number of the queues, message ordering discipline of the queues, queue size bounds etc. – Conditions on the form of the choreography itself. – The mechanical process used to extract the participant behavior specifications from it. Investigations into choreography realizability address the question: Given a particular communication model (e.g., unbounded FIFO queues in each direction between each pair of participants), and a process for extracting participant behaviors from a choreography, what are sufficient conditions on the form of the choreography to guarantee that the collaboration will always succeed? This is the key theoretical research challenge in choreography theory, and the focus of this paper is an investigation into this question. The idea is that if sufficiently simple and robust conditions on the form a choreography can be formulated, along with
SOCA (2010) 4:109–136
111
Fig. 1 Extracting participant designs from a choreography
a simple procedure for extracting participant behavior definitions, then designing extended collaborations can become routine. If the choreography is built to obey the rules, the resulting collaboration is bound to work.
it is permissable to engage in a message exchange can be determined by the result of a calculation. This is in contrast to most other published approaches, which only guarantee realizability on the basis of patterns of message exchange.
1.3 Contribution Over the last few years, a number of approaches to the realization problem have been proposed: some of the main ones are discussed toward the end of this paper in Sect. 11. We claim that the approach we define in this paper makes a significant advance over other proposals in three respects: – We use composition in the articulation of a choreography, so that a single choreography can be expressed as a composition of partially synchronized descriptions. This gives our approach the expressive power to model cases where the ordering and/or certainty of message receipt is unknown in advance and cannot be constrained by the choreography. These kinds of collaboration are hard or impossible to handle without composition. – We demonstrate that realizability of a choreography defined as a composition only needs to be established individually for the components of the composition. Determining realizability for a single component is easier than analyzing the choreography as a whole, thus simplifying the analysis. – We allow the use of data and computation in the definition of a choreography, so that the decision on whether
1.4 Organization of this paper The paper is organized as follows: Section 2 provides background on Protocol Modeling (PM) whose formalisms provide the basis for the rest of the paper. Section 3 introduces the concept of a protocol contract together with how its satisfaction is defined and established. Section 4 discusses how protocol contracts can be composed and decomposed. Section 5 extends the basic PM and protocol contract formalisms to describe choreographies and participant contracts. Sections 6, 7, 8 and 9 investigate the question of realizability and give sufficient conditions for realizability in both synchronous and asynchronous collaborations. Section 10 gives an example of a three party collaboration, showing the projected participant contracts for asynchronous collaboration. Section 11 relates our work to other research into choreography realization and Sect. 12 concludes with our view of the significance of this work.
123
112
SOCA (2010) 4:109–136
2 Protocol modeling
– A2 expresses the fact that the account may only be closed if it is in credit. Instead of having its state stored as part of the local storage, this machine has its state represented by a state function (shown in the A2 cartouche in the diagram) that returns an enumerated type, allowing values {in credit, overdrawn}. We describe this as a machine with a derived state. Because their states are derived rather than driven by transitions, protocol machines that use a derived state can be topologically disconnected. For graphical emphasis, the state icons in a machine with a derived state are shown with a double outline. – A3 expresses the fact that the balance is limited to the range −2 to +2. The machine does this by constraining the deposit and withdraw events on the basis that their ending states must be in this range. In PM, this is called a post-state constraint. A machine is unable to accept an event that violates a post-state constraint, so in CSP terms it is a refusal exactly as if there was no outgoing transition for the event from the current state of the machine.
Protocol Modeling (PM) is a technique that aims to combine the strengths of process algebra with the ability to model data. PM borrows from process algebra in establishing a formal relationship between the states of a process and the events it can handle, and in supporting process composition. It departs from traditional process algebra in allowing processes to maintain data and allowing states to be computed from stored data. The following small example introduces PM and illustrates how a PM representation relates to pure algebraic representation. This account of PM is brief and intended to provide a basis for the rest of this paper; and a complete account of can be found in [10]. 2.1 Protocol model example Consider the model shown in Fig. 2. The left-hand side (the “algebraic” description) shows a state machine for a very simple (and limited) bank account. The account can have a balance of {−2, −1, 0, 1, 2} represented by states {B−2, B−1, B0, B1, B2}. It has an alphabet {O pen, D1, D2, W 1, W 2, Close} of events that it understands, where D1, W 1 deposit and withdraw one unit respectively and D2, W 2 deposit and withdraw two units respectively. The left-hand side process can be described in algebraic form (using Milner’s CCS [14] notation) as follows: ACC OU N T = O pen.B0 B0 = D1.B1 + D2.B2 + W 1.B−1 + W 2.B−2 + Close.Closed B1 = D1.B2 + W 1.B0 + W 2.B−1 + Close.Closed B2 = W 1.B1 + W 2.B0 + Close.Closed B−1 = D1.B0 + D2.B1 + W 1.B−2 B−2 = D1.B−1 + D2.B0
The right-hand side of Fig. 2 shows the same bank account rendered in PM notation. The PM description comprises three machines, A1, A2 and A3. These are called protocol machines. – A1 provides the basic account behavior, showing that the O pen must happen first, followed by any number of deposits {D1, D2} and withdraws {W 1, W 2}, followed by the Close. This machine maintains a balance attribute for the account using the updates indicated in the bubbles attached to the transitions. The balance is a numeric attribute owned by A1 and only A1 may alter its value, although other composed machines may access it. The current state is also an attribute of the machine, with an enumerated type allowing values {dot, active, closed}.
123
These three machines are in parallel composition using the parallel composition operator of Hoare’s CSP [5], so the behavior of the bank account is given by A1 A2 A3. This composition is also a protocol machine and has a behavior that is indistinguishable, by black-box testing, from the algebraic description. For instance, the Close can only take place if A1 is in state active and A2 is in state in credit, mirroring the fact that the left-hand diagram only has outgoing transitions for Close from the states {B0, B1, B2}. While this approach is based on parallel composition of conceptually concurrent machines, it is important to appreciate that there is no requirement or implication that multiple virtual processors or threads are used. We will assume that the assembly of composed machines, such as A1 A2 A3, is single threaded.3 Clearly, the PM model on the right-hand side of Fig. 2 can be modified to handle balances ranging −∞ to +∞ (assuming no limit on the ability of the balance attribute to represent any numeric value) by: – Replacing {D1, D2, W 1, W 2} with generic Deposit and Withdraw events carrying an amount field,4 and using this field to update the balance appropriately in A1. – Removing A3. Making the equivalent change to the algebraic model would require an infinite state space, as the data owned and main3
The ModelScope tool [11] that supports Protocol Modeling uses a single threaded implementation of CSP composition.
4
We use the term field in the context of messages and attribute in the context of machine local storage.
SOCA (2010) 4:109–136
113
Fig. 2 Bank account as algebraic and protocol model
tained by the system has to be encoded in the state space of the process.
2.2 Terms and definitions This section provides some further definitions and terminology related to protocol machines that are germane to the paper.
Alphabet. Every protocol machine divides the universe of possible events into those that it understands (and will either accept or refuse) and those that it does not (and ignores). This is done on the basis of the alphabet of a process: the set of event types for which the behavior of the process is defined. Thus, the alphabet, denoted α(A1), of the machine A1 in the example (once amended to handle arbitrary balance values) is the set {Open, Deposit, Withdraw, Close}.
Attributes. A machine owns a set of attributes, each of which is either stored or derived. Only the machine that owns a stored attribute can alter its value, and only when moving to a new state in response to an event. The updates for stored attributes are shown as bubbles attached to transitions. A derived attribute is not stored and has its value returned on demand by a derivation (calculation) function using the attributes of the owning machine and other composed machines.
Dependent and independent machines. types of machine:
We distinguish two
– A dependent machine is one that accesses the values of attributes that it does not own (i.e., that belong to other machines composed with it). Both A2 and A3, considered as “stand-alone” machines, are dependent as they use balance, which they do not own, to compute their state. – An independent machine is one that makes no such access. A1 is independent; as is the protocol machine formed as the composition of A1 and A2 as the reference to the balance attribute used by A2 to determine its state is resolved by A1, which is within the composition. While derived state calculations are the source of dependency in the machines A2 and A3, the definition extends to any inter-machine reference whereby one machine uses the attribute of another to: – compute the value of a derived state, or – compute the value of a derived attribute, or – compute the new value of a stored attribute in response to accepting an event. Note that two machines having elements of their alphabets in common is not a source of dependency between them.
123
114
Thus, A1 is independent even though event types in its alphabet are also in the alphabets of A2 and A3. Composition. We will use the symbol to indicate CSP parallel composition, with the semantics defined by Hoare in [5]. We use the term assembly to describe a machine created by parallel composition of other machines; and we allow an attribute in one component of an assembly to resolve a reference in another, as balance in A1 resolves the references in A2 and A3. Trace. A sequence of event instances, t, is a trace of an independent machine, M, if there is an execution of M starting from the machine’s initial state that accepts t. Each element of a trace completely specifies both the type of the event instance and the values of all fields carried by the instance. Thus, a trace of A1 (once amended to handle arbitrary balance values) might be Open, Deposit 100, Withdraw 80, Deposit 50. The set of traces of M is denoted by traces(M). If M is a dependent machine, t is trace of M iff ∃ X such that M X is independent and t ∈ traces(M X ) α(M) . In other words, there is a trace of M X which, if all entries that are not in the alphabet of M are removed, is equal to t. Determinism. Protocol machines are deterministic, in that the set of events that a given independent machine can accept at any point is fully determined by the trace accepted by the machine up to that point. More formally: If X is an independent machine (i.e., one with no unresolved references) and two executions of X behave as follows: – In one execution, X accepts a trace, t, of event instances and then accepts (refuses) a further event instance, e++,5 – In the other, X accepts t but then refuses (accepts) e++. Then X is not a protocol machine. Determinism of a dependent machine is defined by the requirement that any composition of it with another protocol machine that resolves its references must yield an independent machine that qualifies as a protocol machine according to the above. Because the component machines are deterministic, so are assemblies. As Hoare himself notes: “…the concurrent operator by itself does not introduce non-determinism” [6]. Equivalence. A consequence of determinism is that if two independent machines have the same set of traces, no blackbox experiment can determine the difference between them, 5
We use the notation ++ throughout this paper as a visible reminder that a trace element is an instance, carrying values in all its fields.
123
SOCA (2010) 4:109–136
and they are therefore considered to be the same machine. Two dependent machines are considered to be the same if one can be substituted for the other in any independent assembly without affecting the trace set of the assembly. 2.3 Topological representation Much of the later parts of this paper concern reasoning based on the topology of protocol machines represented as state transition diagrams. As a basis for this we define a representation of a stored state machine as a tuple P = (Λ P , Σ P , Γ P , Δ P ) where: – Λ P is the alphabet of P, the set of events which P will either accept or refuse. So Λ P = α(P). Elements range over a, b, …. – Σ P is a finite set of states of P. Elements range over σi , σ j , …. – Γ P ⊆ (Λ P × Σ P ) is a binary relation. (a, σi ) ∈ Γ P means that a can be accepted by P when P is in state σi . / Γ P means that a is refused by P when P is in (a, σ j ) ∈ state σ j . – Δ P is a total mapping Γ P → Σ P that defines for each member of Γ P the new state that P adopts as a result of accepting an event. So Δ P (a, σk ) = σl means that if P accepts a when in state σk it will then adopt state σl . Clearly, this has a direct translation into a graphical state transition diagram. Note that this kind of representation is defined for stored state machines, where the new state that a machine adopts as the result of accepting an event is driven by topological rules (embodied in the function Δ). It is not defined for derived state machines as their behavior cannot be depicted in pure topological form. Suppose we have two stored state machines, P and Q. Using the semantics of CSP composition we can create a representation of P Q as follows: – Λ PQ = Λ P ∪ Λ Q – Σ PQ = Σ P × Σ Q (the Cartesian product of Σ P and ΣQ ) – Given a ∈ Λ PQ and a state (σ p , σq ) ∈ Σ PQ , we determine whether (a, (σ p , σq )) ∈ Γ PQ as follows: – (a ∈ Λ P ∧ (a, σ p ) ∈ / Γ P ) ⇒ (a, (σ p , σq )) ∈ / Γ PQ – (a ∈ Λ Q ∧ (a, σq ) ∈ / Γ Q ) ⇒ (a, (σ p , σq )) ∈ / Γ PQ – Otherwise, (a, (σ p , σq )) ∈ Γ PQ – And we construct Δ PQ (a, (σ p , σq )) as follows: – (a ∈ Λ P ) ∧ (a ∈ / Λ Q ) ⇒ Δ PQ (a, (σ p , σq )) = (Δ P (a, σ p ), σq ) – (a ∈ / Λ P ) ∧ (a ∈ Λ Q ) ⇒ Δ PQ (a, (σ p , σq )) = (σ p , Δ Q (a, σq ))
SOCA (2010) 4:109–136
115
– (a ∈ / Λ P ) ∧ (a ∈ / Λ Q ) ⇒ Δ PQ (a, (σ p , σq )) = (σ p , σq ) = Δ P (a, σ p ), – Otherwise Δ PQ (a, (σ p , σq )) Δ Q (a, σq )) We will be using this construction in Sect. 8, in the context of choreography analysis.
3 Protocol contracts We now introduce the concept of a protocol contract, which is a partial specification of the behavior of a protocol machine. Later in the paper, we use protocol contracts as the vehicle to capture required participant behaviors extracted from a choreography. 3.1 Definition of a protocol contract A protocol contract C is a pair [C, F] where:
An independent protocol machine M satisfies a contract C, written M C, iff: (3.1)
and: ∀ e ∈ F ∃ X such that M = X C and X can never refuse an event of type e
(3.2)
The first condition, (3.1), has two implications: – The alphabet of M is a superset of that of C. This is natural, as you would not expect a design to satisfy a contract if its alphabet does not include all the event types in the alphabet of the contract. – Every trace of M, when restricted by eliminating those events whose type is not in the alphabet of C, must be a trace of C. This condition is described as M being Observationally Consistent 6 with C. The second condition, (3.2), requires that an X can be found so that either e is not in the alphabet of X or every state of X allows e to be accepted. If an event type is fully constrained by the contract, acceptability by the contract machine is a necessary and sufficient condition for acceptability by M. 6
As defined by Ebert and Engels in [2].
3.2 Protocol contract example Consider Fig. 3. The left-hand side specifies a contract for the behavior of a bank account. The right-hand side shows the design of a bank account, Account = A1 A2 A3. We now show that the design satisfies the contract. Account clearly obeys the upper part (state machine) of the contract, as the state diagram A1 of the Account is the same as the state diagram, C, of the contract. This means that Account C = Account, thus meeting (3.1). We now wish to show that Account meets the lower part of the contract, which lists the events that are fully constrained by the contract. To do this, consider the Account reformulated as shown in Fig. 4 so that: Account = A1 A1 A2 A3
– C is an independent protocol machine, called the contract machine of C. – F is a set of event types, F ⊆ α(C), called the fully constrained event types of C.
M C =M
If an event is not fully constrained by the contract, acceptability by the contract machine is a necessary but not a sufficient condition for acceptability by M. We give an example in the next section.
In this reformulation, the machine A1 has been split into two machines, A1 and A1 . The first of these specifies the event sequencing on Open, Deposit, Withdraw and Close; and the second maintains the balance but does not constrain event sequencing. The machine A1 is now identical to C, so: Account = C A1 A2 A3 The machine A1 A2 A3 plays the role of X for both Open and Deposit in part (3.2) of the contract definition above. As none of A1 , A2 or A3 can ever refuse the events Open and Deposit, these events are fully constrained by the contract. However, as A2 can refuse Close and A3 can refuse Withdraw, these events are not fully constrained by the contract. 3.3 Discussion Suppose I have a bank account and I know, because the bank tells me, that the account conforms to the contract shown on the left of Fig. 3. Suppose also that I know that the account has been opened but not closed. Then I know that, in terms of the contract, it is in the state active. From this I can deduce that: – I can do a Deposit and the account will accept this in its current state. This is because Deposit is fully constrained by the contract. – I may or may not be able to Withdraw or Close, as the contract does not fully constrain these and so does not guarantee that these will be accepted when in the contract is in the active state. In other words, the bank has
123
116
SOCA (2010) 4:109–136
Fig. 3 Bank account: contract and design
Fig. 4 Bank account reformulated
rules about when a withdraw can be made and when an account may be closed, but the contract does not include these rules. – If I close the account, and the bank permits this operation, I cannot then do any further deposits or withdraws. This is because the sequencing rules of the contract cannot be violated.
123
In this way, a contract allows determination of what is possible for a machine, based on partial knowledge of its state. 3.4 Dependency in contracts Suppose that we want to enhance the contract for the Bank Account to fully constrain Withdraw events, with the rule that
SOCA (2010) 4:109–136
117
a withdraw must not take the account overdrawn beyond its limit. The obvious way to do this is the include the machine A3 in the contract, so that the protocol machine of the contract becomes C A3 where C is the state diagram shown on the left of Fig. 3. However, A3 has a derived state based on balance, and the contract needs to know what this is as the contract machine must be independent. This means that balance has to be defined within the contract and so we add it as an attribute to C, along with the update rules (as shown in the bubbles in A1) that maintain its value. This means that C is now identical to A1. In general, we allow the use of dependent machines in the definition of a contract provided that they are in composition with other machines so that the contract as a whole is independent. This requires that any inter-machine references used in the contract are fully resolved within the contract.
and the decomposition, giving C = C1 ⊕ C2 , is:
4 Composition and decomposition of contracts
traces(A) ⊇ traces(A B)
Protocol contracts can be composed and sometimes decomposed, as we describe below.
C1 = [C1 , F ∩ α(C1 )] and C2 = [C2 , F ∩ α(C2 )] The reason for the stricture (4.1) is as follows. Suppose that P C and that e ∈ (F ∩ α(C1 ) ∩ α(C2 )). While the pattern of occurrences of e in P is fully described (i.e., fully constrained) by C1 C2 , it may not be fully described by either C1 or C2 alone because it depends on the way the two processes synchronize on e. In this case e cannot belong to the set of the events fully constrained by a contract defined in terms of either C1 or C2 alone, so the decomposition fails. Suppose that stricture (4.1) is met and that P C. We have: P = C P = C1 C2 P
As it is a property of CSP that for any A, B with α(B) ⊆ α(A):
traces(P) ⊇ traces(P C1 ) ⊇ traces(P C1 C2 ) = traces(P) so traces(P) = traces(C1 P), and hence:
If C1 = [C1 , F1 ] and C2 = [C2 , F2 ], then their composition is defined as follows:
P = C1 P
It is easy to show that if a machine P satisfies both C1 and C2 , it also satisfies C1 ⊕ C2 : 1. P C1 ⇒ P = C1 P P C2 ⇒ P = C2 P so P = (C1 C2 ) P.
(4.3)
we have by (4.2) and (4.3):
4.1 Composition
C1 ⊕ C2 = [C1 C2 , F1 ∪ F2 ]
(4.2)
(4.4)
Suppose e ∈ F ∩ α(C1 ). As P C, ∃ X s.t. P = C X and X never refuses e. So, P = C1 (C2 X ). As, by stricture (4.1), e ∈ / α(C2 ), X = (C2 X ) can never refuse e, and we have: P = C1 X and X can never refuse e
(4.5)
(4.4) and (4.5) ⇒ P C1 . Similarly, P C2 . Finally, combining the results for composition and decomposition we note that:
2. Suppose e ∈ F1 . Then ∃ X s.t. P = C1 X and X never refuses e. As P C2 , P = C2 P so P = (C2 C1 ) X .
If F1 ∩ F2 = ∅, (P C1 ⊕ C2 ) ⇔ (P C1 ∧ P C2 )
4.2 Decomposition
In this section, we explore the use of Protocol Modeling ideas to specify a choreography for a collaboration. Our approach uses end-point projection, whereby the behavior of the participants is abstracted from the choreography. This is conceptually similar to projecting a WS-CDL choreography into BPEL end-point (participant) definitions, for instance as described by Mendling et al. [13]. The difference in our scheme is that, rather than projecting into any particular design language, a choreography is projected to define protocol contracts for the end-points. As protocol contracts can be composed, it is possible for a participant to engage in
Suppose that C = [C, F] and that C = C1 C2 . It is reasonable to ask whether C can be decomposed into two contracts, C1 and C2 , the former using C1 and the latter using C2 , in such a way that: P C ⇒ (P C1 ) ∧ (P C2 ) This is possible provided that: F ∩ α(C1 ) ∩ α(C2 ) = ∅
(4.1)
5 Choreography and collaboration
123
118
multiple choreographies by complying simultaneously with the different contracts they impart. 5.1 Choreography definition We define a choreography, C, as a choreography protocol machine. Instead of an alphabet of event types as we had in in Sect. 2, a choreography protocol machine has an alphabet of message exchanges. The upper cartouche in Fig. 5 shows a choreography for collaboration between two participants, a Customer (Cust) and a Supplier (Supp), in the context of order processing. The transitions are labeled with the message type being exchanged, with a prefix indicating the sender and receiver of the message. So a label: Supp > Cust:Accept Order represents the exchange of an Accept Order message sent by participant Supp to participant Cust. These exchanges, defined by the combination of sender, receiver and message type, are the individuals that form the alphabet of the choreography. Roughly speaking, a choreography represents the set of possible orderings of message exchange between the participants of the collaboration; the exact definition will be given later, in Sect. 6.2. The choreography in Fig. 5, shown in the upper cartouche of the figure, is represented as two machines which are taken to be in CSP composition, so that they synchronize on message exchanges that appear in the alphabet of both but are otherwise independent. The set of possible orderings of exchanges between Customer and Supplier given by this choreography is the set of traces of the composition of these two machines. This idea, that CSP compositions are used to generate the set of possible exchanges in a collaboration, is central to the approach described in this paper. If t is a sequence of entries of the form P>Q:m++ we define t to be a trace of an independent choreography machine if it is accepted by the machine. Note that the machines of a choreography can have attributes and derived states, so a trace has to be defined in terms of a sequence of message instances, loaded with field values (as indicated by the ++ in the entry). The choreography of a collaboration is well constructed if it has a set of traces that covers all possible useful interaction scenarios. We will assume that choreographies are, in this sense, well constructed. The statement The Choreography Fully Constrains the Receivers specified as part of the choreography means that it fully describes the circumstances under which a message can be received. This assumption is required to establish realizability, and we will assume, as standard, that choreographies fully constrain message reception in the participants.
123
SOCA (2010) 4:109–136
5.2 Collaboration contracts We define the behavior of the participants in our collaboration using protocol contracts. The elements of the alphabet of the contracts represent sending and receiving messages, and we refer to these as actions. An element of the alphabet of a contract is of the form: – !>Q:m1 indicating a send action that sends a message of type m1 to participant Q; or – ?P:m++, where means concatenation7 and m++ means a message instance of type m along with the values of all fields it carries, is also a trace iff both of the following hold: – t brings M to a state (in terms of its protocol) in which it allows a transition for the action !>P:m – t brings M to a state (in terms of its attributes) in which the derivation function for any field of m that depends on M returns the same value as that field has in m++. We will create the contracts for participants by projection from the choreography. The lower cartouche of Fig. 5 shows 7
So if t = then t c = .
SOCA (2010) 4:109–136
the projected contract for Cust. The prefixes on the transition labels in the choreography are changed in the obvious way, so in constructing the Cust contract: – a prefix “Cust >Supp:” in the choreography becomes “!>Supp:” in the contract; and – a prefix “Supp>Cust:” in the choreography becomes “?Q:m++ in t having P as sender becomes !>Q:m++ in t P , and an entry such as Q>P:n++ in t having P as receiver becomes ?Q:m) ∈ α(C j ), P >Q:m is possible for C j after executing s j . Consider one of these machines, Ck . We know that sk P>Q:m++ ∈ traces(Ck ). By (F1) we have:
This device gives a global sequence, sE , of the sends executed in E. We denote the restriction of this global sequence of sends to the alphabet of a given choreography machine, Ci , by siE . So a send of m++ by P to Q of the global sequence is included as an entry P>Q:m++ in siE iff (P >Q:m) ∈ α(Ci ). The machines CiP j are trace matched iff, ∀ collaborations E: (M1) siE ∈ traces(Ci ). siE is called the implied trace of Ci in E. (M2) ∀ j the trace of CiP j in E is a prefix of siE P j . Informally, trace matching means that the sequencing of sends in a collaboration uniquely defines the traces (sends and receives) of the projected machines. Trace matching is a property of both a choreography and the way it is projected to the participants, and we will be considering the conditions under which this property is guaranteed later, in Sect. 8. The ability to identify the state of the choreography machines on the basis of the observable sequence of sends meets the first condition, (C1), for realizability. We now go on to show that, if we have faithful projection and trace matching, we also meet conditions (C2) and (C3). We show this for first for synchronous and then asynchronous collaboration. In these arguments we take faithful projection and trace matching as given. 6.5 Synchronous realizability We show that, in a synchronous collaboration, the combination of faithful projection and trace matching guarantees realizability of the choreography. We assume that the choreography is projected as described in (6.1) and that ∀i, j CiP j are faithfully projected and ∀i CiP j are trace matched. We may assume that the participants’ designs contain their respective contract machines as, if not, we may add the contract machines without altering the behavior of the participant. As this is a synchronous collaboration, every exchange in any trace of a choreography machine Ci must be fully represented (both send and receive entries present) in the traces of the projections. This means that no entries can be removed by the prefixing in (M2). So, if the sequence of exchanges so
123
(sk P>Q:m++) P = (sk P !>Q:m++) ∈ traces(CkP ) By (M2) sk P is the trace so far of CkP , so !>Q:m++ is possible for CkP . This applies to all machines in P required to participate in the send of m++ under CSP composition, so the send is possible for P as a whole. In addition, (M2) means that a send that is not allowed by the choreography is not allowed by any participant, so we have (C2). Receiving (C3). An identical argument shows that ?Q:m++. As the receipt by Q is fully constrained by the choreography contract, no other (non contract) machine in Q can refuse (block) it. This means that the choreography C is realizable. 6.6 Asynchronous realizability We now show that realization of the choreography is also guaranteed in an asynchronous collaboration between faithfully projected trace matched machines. As in the synchronous case, we assume that the choreography is projected as described in (6.1) and that ∀i, j CiP j are faithfully projected and ∀i CiP j are trace matched. Again, we may assume that the participants’ designs contain their respective contract machines as, if not, we may add the contract machines without altering the behavior of the participant. The method of argument is similar to the synchronous case. Note that, in the asynchronous case, while the traces of CiP j may be matched for a given i, the traces for two different values of i may be interleaved differently at different participants so that we do not have trace matching on the contracts considered as a whole. Sending (C2). We suppose an asynchronous execution E that has executed a sequence, s, of sends. By (M1), this implies a trace si of each choreography machine Ci . Consider an exchange P >Q:m involving a message m++ and suppose that ∀ j for which (P >Q:m) ∈ α(C j ), P >Q:m is
SOCA (2010) 4:109–136
123
possible for C j after executing s j . Consider one of these machines, Ck . By (F1), (sk !>Q:m++) P ∈ traces(CkP ). This means that if the send is possible in the choreography, it is also possible in the collaboration. Note that, unlike the synchronous case, (M2) only guarantees that current trace of CkP is a prefix of (sk !>Q:m++) P and there may be outstanding messages for CkP to receive before !>Q:m++ becomes available to execute. So, unlike the synchronous case, the send may not be available to P immediately. This is inherent to asynchronous collaboration. In addition, (M2) means that a send that is not allowed by the choreography is not allowed by any participant, so we have (C2). Receiving (C3). Now we suppose that there is at least one message that fails: in that it cannot be successfully received by the participant to which it has been sent and so remains permanently “in flight”. In the global sequence of sends in E we assume, without loss of generality, that the first failed message is m++ f ail of type m sent by P to Q. We argue to show that this failure is impossible. The two machines that govern the sending (by P) and receiving (by Q) of m++ f ail are as below: C Pm = CiP and C Qm = CiQ (!>Q:m) ∈ α(CiP )
(?Q:m++ f ail . So the implied trace, tkf ail , of Ck must contain P>Q:m++ f ail and the restricted trace tkf ail Q must contain ?Q:m++ is added to the trace of CiP ; and – ?P:m++. As all exchanges
SOCA (2010) 4:109–136
131
Fig. 12 Failure of reduction to pairs
involving P in t’ have entries in t P , no message to P sent before m++ is still queued; and because the queues are FIFO, no message sent after m++ by Q to P can overtake it. So it must become available to P. As we have trace matching so far, P must be able to receive it. We need to show that C P is bound to stay matched to the trace t when it consumes a message. There are two possible causes for departure: – C P consumes the message m++ from Q but advances to a state that does not correspond to the state that C has after the Q>P:m++ exchange. – Another message, n++ sent by R in an exchange that occurred in t after Q>P:m++ is accepted by C P before m++, causing it to receive messages in an order that does not correspond to t and hence depart from match to t. An example of this situation is shown in Fig. 12. Note that such a message cannot come from Q, as it would then not be available to P before m++ because of the FIFO queueing discipline. The first cause would require that there is a trace t f alse of C such that t f alse P = t P + ?P:m++ in t f alse is different from its state after Q>P:m++ in t. This requires that the exchange Q>P:m++ uses different transitions in the graph of C in the context of the two traces. However, in C P these transitions are both available to P after it has executed t P and this would make this reduction non-deterministic, which is not allowed by (A1). The second cause would require that there is a trace t f alse of C containing R>P:n++ followed, as the next send by Q to P, Q>P:m++. We consider the graph of C from the state after t’ to the state where it executes R>P:n++. The path used by t has a send from Q to P and the path used by t f alse does not. So these paths must lead to different states, otherwise C would not be reducible to {P, Q}, as required by projection rule (A1). If they are different states, then the two
transitions used by R>P:n++ are different in the two paths. In the reduction of C to {P, R} these would both be available in the state after t, and this would make this reduction non-deterministic, which is not allowed by (A1). Induction begins when the choreography and all participants are at their initial states and all traces are empty. This establishes trace matching and realizability.
10 Example As an example, we take the Shared Book Buying Collaboration described by Honda et al. [7]. In this example, there are three participants: a Seller (S) and two buyers (B1 and B2). Buyer1 can afford a portion of a book’s price and the balance has to be paid by Buyer2, who also makes the buy or no buy decision. The choreography is shown at the top of Fig. 13 described as two composed machines. It is easy to see that the choreography is relay, so adheres to projection conditions (A2) and (A3). Moreover, as a given exchange only appears once in each choreography machine, any reduction must be deterministic so the choreography adheres to projection condition (A1). So asynchronous realizability is guaranteed. Below the choreography in Fig. 13 is shown the extraction of the contract for the participant Buyer1. Similar extraction produces the three contracts shown in Fig. 14.
11 Related work A number of other authors have looked at choreography definition and the rules required to ensure realizability. In this section, we discuss related work and provide a commentary on how this work relates to this paper. We focus on work on asynchronous choreographed collaborations.
123
132
SOCA (2010) 4:109–136
Fig. 13 Shared book buying: choreography and contract for Buyer1
Session typing. Work by Honda et al. [7] addresses the question of realizability of asynchronous multiparty collaborations using a behavioral typing discipline called Session Typing. The scheme defines global types, which represent choreography; and local types, which correspond to our participant contracts. The local types are abstracted from the global type by a projection calculus, similar to that we describe in Sect. 7. The conditions we give for asynchronous projectability are captured in their notion of linearity, which aims to enforce the discipline that in two communications, sending actions and receiving actions should respectively be ordered temporally, so that no confusion arises. Linearity is established by a process called causality analysis based on the dependencies between the exchanges of the global type.
123
This analysis requires that there are subsequences in the choreography that observe a similar discipline to that we require in relay machines. Where there is indeterminacy of ordering in the global type, sequencing is imposed using a device called a channel. In their solution to the Shared Book Buying Collaboration, which we discussed in Sect. 10, the channel device is used to address the indeterminacy in the ordering of receipt of the Quote and Share messages by B2 by imposing an ordering: Quote followed by Share. This is not required in the solution we present, and it seems that channels are used to remove the natural concurrency of the collaboration which in our approach is expressed explicitly using composition in the choreography. This means that the Session Typing approach,
SOCA (2010) 4:109–136
133
Fig. 14 Shared book buying: participant contracts
at least as described, is not able to describe collaborations that necessarily entail a race. An example is given in Fig. 15 which describes a simple competition in which two contestants, X and Y , are asked a question and a prize is awarded for correct answers. A choreography expressed as four composed protocol machines is shown in Fig. 15, and we note the following: – The collaboration involves a race, because the sequencing of the receipt of answers from X and Y is unknown in advance and cannot be constrained by the choreography. This is handled using a composition of two machines in the choreography, shown one above the other at the bottom left of the figure, allowing the ordering of receipt of the answers from X and Y to be unspecified. – The provision of the Decision by Q back to P and then by P on to X and Y is depicted by the two machines at the bottom right of the figure. Note that it is possible for P to send notification of the decision to X or Y before they have responded, as it is possible that either or both failed to respond quickly enough. – The four machines in the choreography are independent, relay and deterministic when reduced to pairs. So the choreography is projectable and therefore realizable under our theory. In attempting to create of a global type for this collaboration, we have a choice about how to express the sending
of the question to X and Y and the subsequent receipt of their answers by Q. One possibility is to use two session types composed in parallel, following the same strategy that is used in Fig. 15. However, while the syntax of global types allows parallel composition, the projection of a global type that involves parallel composition is undefined if a given participant is involved in both the types being composed, either as the sender or the receiver of a message, so this approach is excluded. The other possibility is to serialize the two exchanges whereby X and Y send their answers to Q in a single global type. In this case, we need to address the question: Should these two exchanges use the same or different channels? Neither choice appears to be satisfactory. If we choose to use different channels then P is forced to receive the answers in a defined order, as the receives on the channels must be serialized in P’s logic. This clearly violates the intention of the collaboration. If we choose to use the same channel then the global type fails to obey the condition for linearity10 (the Session Typing equivalent of our projectability condition), and this means that realizability is not guaranteed. As some collaborations, such as e-auctions, are likely to involve such intentional race conditions the difficulty of modeling this using Session Types seems a weakness. 10
This is because a chain of two receipts on the same channel from different senders, as we would have here, is considered “non-causal” and makes the global type non-linear. See Honda et al. [7]
123
134
SOCA (2010) 4:109–136
Fig. 15 Competition example
More recent work is underway, [1], to extend the Session Typing to include a treatment of data. This work aims to allow contractual assertions to be included in the definition of global types and thus bring a “Design by Contract” [12] capability to protocol construction. At present, this work is not aimed at allowing data and computation to be used for the definition of choreographical behavior constraints and so does not compare directly with our work.
IOC/POC. Work by Lanese et al. [9] define an approach similar to that used in Session Typing, but without the use of channels. The scheme defines interaction-oriented choreography (IOC), which represents choreography; and process-oriented choreography (POC), which corresponds to our participant contracts. A POC for each participant is abstracted from the IOC by a projection calculus. The authors give realizability conditions, which they call connectedness conditions, for both synchronous and asynchronous collaborations. These conditions correspond very closely to those
123
we describe for projectability of a single stored state choreography machine. The IOC/POC scheme allows parallel composition of processes in a choreography and, unlike Session Typing, allows the same participants and message types to appear in different composed processes. This means that the IOC/POC scheme has the ability to model concurrency involving races and could be used to express the choreography for the Competition Example in Fig. 15. However there is no concept of CSP style synchronization or of the use of data and derived states, so it is not possible to articulate a choreography as a set of separate partially synchronized descriptions. This means that the approach would not, as currently described, be able to describe a choreography that requires a combination of stored and derived state spaces such as the Instalments Example in Fig. 11 and, as published so far, the IOC/POC approach does not use data or computation at all. Conversation protocols. In their work, Fu et al. [3] describe an approach for the realization of asynchronous collaborations
SOCA (2010) 4:109–136
based on Conversation Protocols. A conversation protocol corresponds to a choreography and is projected to give peer (= participant) behavior. Both the choreography and the projected peer behaviors are modeled as Finite State Automata (FSA). The authors identify three conditions on a conversation protocol that must be met for realizability: Lossless Join which requires that the protocol should be complete when projected to individual peers, Synchronous Compatible which ensures that the protocol does not have illegal states, and Autonomy which implies that at any point in the execution each peer is determined on the choice to send, or to receive, or to terminate. Although stated in different terms, these correspond quite closely to our asynchronous projectability conditions. The Autonomy condition allows more than one peer (participant) to be in send mode at the same point in the choreography so is more relaxed than our projectability condition (A3) which limits sending to a single participant. This more relaxed condition is balanced by the stricter requirement of Lossless Join to ensure realizability. As we point out in Sect. 8.3, different machines in our compositional approach can have different senders, and this gives our approach equivalent expressive power to Conversation Protocols. This formalism used for Conversation Protocols does not use parallel composition at all. This does limit expressive power. For instance, if:
135
accumulated amounts11 such as is required for the Instalments Example in Fig. 11. The authors describe an algorithm for checking realizability of a choreography described as a GFSA, and this bears some similarities to the technique described in Sect. 8.4 but a detailed comparison is beyond the scope of this paper. BPMN2. The Business Process Model and Notation12 (BPMN2) [15] developed by the Object Management Group (OMG) is the latest in a number of attempts to create an industry standard language to describe choreography.13 The new standard is notable in that it is the first to our knowledge to attempt to include rules for realizability, with the following statement: There are constraints on how Choreography Activities can be sequenced (through Sequence Flow) in a Choreography ... The basic rule of Choreography Activity sequencing is this: – The Initiator of a Choreography Activity MUST have been involved (as Initiator or Receiver) in the previous Choreography Activity. This captures our asynchronous projectability condition (A2), but is clearly an incomplete treatment of the conditions for realizability. The BPMN choreography language has no compositional capability of the kind discussed and advocated in this paper.
(P > Q:a) (Q > P:b) is expressed as a Conversation Protocol (i.e., with no composition) then it violates Autonomy as P and Q are both sending and receiving in the initial state of the choreography. However, both components are clearly asynchronously projectable, so the choreography is realizable under our theory. The same issue would arise in attempting to model the Customer and Supplier choreography in Fig. 5 as a Conversation Protocol as, for instance, there is a state where the Customer can send a Request Cancel and also receive an Invoice. In more recent work, [4], the authors have added some capability to model data and computations. This is done by using Guarded Finite State Automata (GFSA) for both choreography and projected peer behaviors. The general idea is similar to that we describe, with data and computation being used to specify rules in the choreography. However, the scheme is significantly less ambitious than ours in that the guard of a transition only expresses the relationships between the message that is being sent and the last message of each class sent or received by the sender; so there is no notion of a choreography owning and maintaining attributes as we have. Updates are confined to changes to message fields, and it would not be possible to compute a guard condition on
12 Conclusion We believe that the development of advanced distributed collaborations will require techniques that enable designers to construct solutions that are transparently correct so that, as far as possible, realizability is guaranteed. The modeling formalisms used to represent choreographies and the resultant behavior contracts must play the key role in this, and the use of compositional modeling appears to be the natural paradigm to express and analyze the inherent parallelism of distributed behavior. If choreography-based approaches are to succeed they must provide the expressive power to capture real business collaborations, but also allow realizability to be verified using techniques within the competence of mainstream software engineers. We believe that the ideas we discuss here take us significantly nearer achieving this combination and we hope 11
At least not without including such amounts as extra fields in the messages. But message content is normally fixed by business considerations or standards, and not open to this kind of change.
12
At the time of writing at the “OMG Adopted Beta Specification” stage.
13
Previous attempts include WSCI and WS-CDL.
123
136
that this work will help inform future directions in the design of choreography modeling languages. In particular, we suggest that composition capabilities should be at the heart of such languages.
References 1. Bocchi L, Honda K, Tuosto E, Yoshida N (2009) A theory of design-by-contract for distributed multiparty interactions. Technical Report, University of Leicester, University of London and Imperial College London 2. Ebert J, Engels G (1994) Observable or invocable behaviour—you have to choose. Technical Report 94-38. Department of Computer Science, Leiden University 3. Fu X, Bultan T, Su J (2004) Conversation protocols: a formalism for specification and verification of reactive electronic services. Theor Comput Sci 328(1–2):19–37 4. Fu X, Bultan T, Su J (2004) Realizability of conversation protocols with message contents. In: ICWS ’04: Proceedings of the 2004 IEEE international conference on Web services. IEEE Computer Society, Washington, DC, pp 96–103. doi:10.1109/ICWS.2004.92 5. Hoare C (1985) Communicating sequential processes. PrenticeHall International, Englewood Cliffs, NJ 6. Hoare C (2006) Why ever CSP? Electronic Notes Theor Comput Sci 162:209–215. doi:10.1016/j.entcs.2006.01.031 7. Honda K, Yoshida, N Carbone M, (2008) Multiparty asynchronous session types. SIGPLAN Not 43(1):273–284. doi:10.1145/ 1328897.1328472
123
SOCA (2010) 4:109–136 8. Kazhamiakin R, Pistore M (2006) Analysis of realizability conditions for web service choreographies. In: Najm E et al (ed) Proceedings of FORTE 2006, 26th IFIP WG 6.1 international conference, Paris. Lecture Notes in Computer Science, vol 4229. Springer, pp 61–76 9. Lanese I, Guidi C, Montesi F, Zavattaro G (2008) Bridging the gap between interaction- and process-oriented choreographies. In: Proceedings of SEFM’08, 6th IEEE international conferences on software engineering and formal methods. IEEE Computer Society, Washington, DC, pp 323–332 10. McNeile A, Simons N (2006) Protocol modelling: a modelling approach that supports reusable behavioural abstractions. J Softw Syst Model 5(1):91–107 doi:10.1007/s10270-005-0100-7 11. McNeile A, Simons N (2008) Metamaxim Website: ModelScope Tool. http://www.metamaxim.com/ 12. Meyer B (2000) Object-oriented software construction. Prentice Hall PTR, Englewood Cliffs, NJ 13. Mendling J, Hafner M, (2005) From inter-organizational workflows to process execution: generating BPEL from WS-CDL. In: Meersman R, Tari Z et al (eds) OTM workshops. Lecture Notes in Computer Science, vol 3762. Springer, pp 506–515 14. Milner R (1980) A calculus of communicating systems. Lecture Notes in Computer Science, vol 92. Springer 15. Object Management Group (2009) Business process model and notation (BPMN): FTF Beta 1 for Version 2.0. OMG Document Number: dtc/2009-08-14. Technical Report, Object Management Group 16. Owicki S, Lamport L (1982) Proving liveness properties of concurrent programs. ACM Trans Program Lang Syst 4(3): 455– 495. doi:10.1145/357172.357178