Choreographies, Divided and Conquered Luís Cruz-Filipe and Fabrizio Montesi University of Southern Denmark {lcf,fmontesi}@imada.sdu.dk
arXiv:1602.03729v1 [cs.PL] 11 Feb 2016
Abstract Choreographic Programming is a paradigm for developing concurrent software that is correct by construction, by syntactically disallowing mismatched I/O operations in programs, called choreographies. Due to their benefits, choreographies have been largely adopted for the writing of business processes and communication protocols. However, current choreography language models cannot capture many kinds of communication structures, limiting their applicability. In this paper, we present Procedural Choreographies (PC), a new language model that includes the novel feature of reusable choreographic procedures, parameterised on the processes they use. PC also combines, for the first time in choreographies, general recursion with the ability to create new processes at runtime. The combination of these features yields a powerful framework where we can write divide-and-conquer concurrent algorithms based on message passing. This enhanced expressivity makes it possible to write new behaviours that cannot be faithfully implemented (unrealisability); to tackle this issue, we endow PC with a new typing discipline that supports both decidable type checking and type inference. PC is equipped with an EndPoint Projection (EPP) that, from a well-typed choreography, synthesises a correct-by-construction distributed implementation in a process calculus. Extending a previous line of work on choreographies, our model supports two important properties wrt the programming of concurrent algorithms: implicit parallelism and transparent projection.
1 Introduction
Background. Choreographic Programming [17] is a paradigm for programming concurrent systems that are deadlock-free by construction, by using an “Alice and Bob” notation [20] for preventing mismatched I/O communications syntactically. EndPoint Projection (EPP) is then used to synthesise distributed implementations in process models, which are guaranteed to be deadlock-free by construction [3, 4, 25]. Choreographies are found in standards [2, 28], language implementations [7, 12, 21, 27], and many formal models for behaviour specification [1, 3, 4, 14, 15]. They are widely used as a design tool in the fields of service-oriented computing, business processes and communication protocols, since they give a succinct and unambiguous view of the communications performed in a system [2, 21, 27, 28]. Driven by the practical benefits brought about by choreographies, research on choreographic programming has recently gained in breadth, with the aim of exploring its applicability and theoretical foundations. The paradigm has been investigated in the settings of service programming [3, 4], runtime adaptation [23, 24], modular development [19], and formal logics [5, 6]. In [8], we developed Minimal Choreographies (MC), a foundational model that contains exactly the necessary choreographic primitives to achieve Turing completeness. MC is a representative model of choreographies, in the sense that it can be readily embedded in other models for choreographic programming [8]. However, Turing completeness of MC (and the other choreography calculi) only guarantees that all computable functions can be somehow implemented. What about the algorithms that can be expressed with choreographies? Surely, if we want to realise the promise of using choreographic programming as a full-fledged programming paradigm, we must also find a way to program interesting concurrent algorithms with choreographies. 
Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany
A way to do that is to look at the established techniques for the design of algorithms and develop a language model with the necessary
primitives to support them. In this paper, we focus on providing full functional abstraction, i.e., allowing for the definition of procedures and their invocation from any point in a program. In particular, this requires primitives for general recursion and parametric procedures in choreographies, which we develop here for the first time. We exemplify the potential of these features by showing how they can be used to implement divide-and-conquer algorithms – a particularly interesting class of algorithms in the field of concurrency.

Motivation: A representative example. We illustrate the kind of concurrent algorithms that we are interested in, by discussing a distributed version of merge sort.

Example 1. We make the standard assumption that we have concurrent processes with local storage and computational capabilities. In this example, each process stores a list and can use the following local functions: split1 and split2, respectively returning the first or second half of a list; is_small, which tests if a list has at most one element; and merge, which combines two sorted lists into one. The following (choreographic) procedure, MS, implements merge sort on the list stored at process p.

    MS(p) = if p.is_small then 0
            else p start q1,q2;
                 p.split1 -> q1; p.split2 -> q2;
                 MS<q1>; MS<q2>;
                 q1.c -> p; q2.c -> p.merge
Procedure MS starts by checking whether the list at process p is small, since in that case it does not need to be sorted (0 denotes termination); otherwise, p starts two sub-processes q1 and q2 (p start q1,q2), to which it respectively communicates the first and the second half of the list (p.split1 -> q1 and p.split2 -> q2). The procedure is recursively reapplied to processes q1 and q2, which can independently (concurrently) proceed to ordering their respective sub-lists. When this is done, we store the first ordered half from q1 to p (q1.c -> p, where c is a placeholder for the data stored in q1) and then merge it with the ordered sub-list from q2 (q2.c -> p.merge).

Our merge sort example showcases the key features necessary for our development:

General recursion. The ability to invoke a procedure and then proceed with arbitrary code.

Parameterised procedures. Procedures should be parametric on the processes that they use (p in MS), enabling their reuse with different arguments (as in MS<q1> and MS<q2>).

Process spawning. The ability of starting new processes dynamically. There must be no bound on how many processes can be started, since this is decided at runtime. (In our example, the number of spawned processes depends on the size of the original list.)

Are there other properties that we should look for in a choreography model? First and foremost, we should have the typical correctness-by-construction results of choreographic programming: the distributed implementations generated from choreographies should be deadlock-free and follow precisely the high-level description given in the choreography [17]. Furthermore, in the setting of concurrent algorithms, we identify two other desiderata:

Transparent projection. The distributed code projected from a choreography should implement exactly the processes and communications described therein; otherwise, the choreography misrepresents the actual efficiency and behaviour of the algorithm written by the programmer. For example, projection should not introduce extra communications.

Implicit parallelism. Non-interfering behaviour should always run in parallel. For example, in our merge sort procedure, the recursive calls MS<q1> and MS<q2> involve only separate processes and should therefore be executed concurrently.
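The data flow of procedure MS from Example 1 can be traced with a small sequential simulation. The sketch below is only an illustration under our own assumptions: the dictionary `store`, the helper `ms` and the fresh-name counter are hypothetical and not part of the choreography model, and the simulation runs the two recursive calls one after the other, so it does not exhibit implicit parallelism.

```python
from itertools import count

def merge(xs, ys):
    """Combine two sorted lists into one sorted list (local function merge)."""
    out, i, j = [], 0, 0
    while i < len(xs) and j < len(ys):
        if xs[i] <= ys[j]:
            out.append(xs[i]); i += 1
        else:
            out.append(ys[j]); j += 1
    return out + xs[i:] + ys[j:]

def ms(store, p, fresh):
    """Sequential simulation of MS(p); store maps process names to lists."""
    data = store[p]
    if len(data) <= 1:                      # p.is_small: nothing to sort
        return
    q1, q2 = f"q{next(fresh)}", f"q{next(fresh)}"   # p start q1,q2
    mid = len(data) // 2
    store[q1] = data[:mid]                  # p.split1 -> q1
    store[q2] = data[mid:]                  # p.split2 -> q2
    ms(store, q1, fresh)                    # MS<q1>
    ms(store, q2, fresh)                    # MS<q2>
    store[p] = store[q1]                    # q1.c -> p
    store[p] = merge(store[p], store[q2])   # q2.c -> p.merge

store = {"p": [5, 2, 4, 1, 3]}
ms(store, "p", count(1))
print(store["p"])     # the list stored at p, now sorted
print(len(store))     # p plus every process spawned along the way
```

The number of entries in `store` grows with the length of the input list, which is the reason why no static bound on the number of processes can be assumed.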
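\[
\begin{array}{rcl}
C & ::= & \eta; C \mid I; C \mid \mathbf{0} \\
\eta & ::= & p.e \mathbin{\texttt{->}} q.f \mid p \mathbin{\texttt{->}} q[l] \mid p\ \mathsf{start}\ q^T \mid p : q \mathbin{\texttt{<->}} r \\
D & ::= & X(\widetilde{q^T}) = C, D \mid \emptyset \\
I & ::= & \mathsf{if}\ p.e\ \mathsf{then}\ C_1\ \mathsf{else}\ C_2 \mid X\langle\tilde{p}\rangle \mid \mathbf{0}
\end{array}
\]
Figure 1 Procedural Choreographies, Syntax.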
Contributions. We report our main contributions.

Procedural Choreographies. We introduce Procedural Choreographies (PC), a choreography language model that supports all the features discussed above (§ 2). PC supports divide-and-conquer concurrent algorithms. In particular, we show that it captures not only our distributed merge sort example, but also a more involved parallel downloader that makes heavy use of implicit parallelism to deal with the parallelisation of multiple streams. General recursion and parameterised procedures that support runtime instantiation of parameters are new features for choreographic programming, and represent the major departure from previous work. Their interplay with process spawning requires careful tracking of the connections among processes, formalised in our semantics for PC.

Typing. Process spawning enables writing “wrong” choreographies, where processes that are supposed to interact are not properly connected. Thus, we introduce a typing discipline (§ 3) to rule out such wrong choreographies. This type system tracks the connections required by each procedure defined by the programmer, which is the major novelty wrt previous typing disciplines for choreographies. It also checks that the data stored inside processes is well-typed wrt the local functions and communications that use it. The typing discipline for PC supports decidable type checking (Theorem 6) and type inference (Theorems 7 and 8).

Endpoint Projection. We define an EndPoint Projection (EPP) procedure that, given a choreography, synthesises a distributed implementation in a process calculus (§ 4). This projection is transparent: it guarantees that the synthesised implementation faithfully follows the behaviour of the originating choreography (Theorem 11). As a corollary, all generated implementations are deadlock-free by construction (Corollary 12).
2 Procedural Choreographies
We now introduce the language model of Procedural Choreographies (PC).

Syntax. Figure 1 introduces the syntax of PC; a procedural choreography is a pair ⟨D, C⟩, where C is a choreography and D is a set of procedure definitions. Process names, ranged over by p, q, r, . . ., identify processes that execute concurrently. Each process is equipped with a memory cell that stores a single value of a fixed type. Specifically, we consider a fixed set 𝒯 of datatypes (numbers, lists, etc.); each process p stores only values of type T_p ∈ 𝒯. Statements in a choreography can either be communication actions (η) or compound instructions (I), and both can have continuations. Term 0 is the terminated choreography, which we sometimes omit. The term 0; C is needed at runtime to capture the termination of procedure calls with continuations. Processes communicate via direct references (names) to each other. In a value communication p.e -> q.f, process p sends the result of evaluating expression e to q; the expression e can contain the placeholder c, which is replaced at runtime with the data stored at process p. When q receives the value from p, it applies to it the (total) function f (of the form λx.e′). The intended semantics is that the parameter x will be replaced with the value sent by p, and that q will store the result of computing f in its memory. The expression e′, the body of f, can also contain the placeholder c, allowing it to read the contents of q’s memory.
The selection term p -> q[l] is standard, as in session types [13]: p communicates to q its choice of label l. Labels l range over a fixed enumerable set, whose precise definition is immaterial for the properties of the calculus (as long as it has at least two elements, see [8]). In term p start q^T, process p spawns the new process q, which will store data of type T. Process name q is bound in the continuation C of p start q^T; C.

Process spawning introduces the need for another kind of action. Since we want to model real-world communicating systems, we have to assume that, after executing p start q^T, process p is the only process that knows the name of process q. Any other process wanting to communicate with q must therefore be first informed of its existence, as would typically happen, e.g., in object- and service-oriented computing [9, 11, 18]. This is done by p : q <-> r, read “p introduces q and r”. We require p, q and r to be distinct. As its double-arrow syntax suggests, this action represents two communication steps – one where p communicates q’s name to r, and another where p communicates r’s name to q. This will become explicit in § 4, when we will show our EPP procedure for synthesising implementations.

In a conditional term if p.e then C1 else C2, process p evaluates expression e to choose between the possible continuations C1 and C2. The set D defines global procedures that can be invoked in choreographies. Term X(q̃^T) = C defines a procedure X with body C, which can be used anywhere in ⟨D, C⟩ – in particular, inside the definitions of X and other procedures. The names q̃ are bound to C, and they are assumed to be exactly the free process names in C; we also assume that D contains at most one definition for each procedure name. Term X⟨p̃⟩ then invokes procedure X, instantiating its parameters with the processes p̃. We work up to α-equivalence in choreographies, assuming the Barendregt convention. In particular, we rename bound variables as needed when expanding procedure definitions.

Example 2. Recall procedure MS from our merge sort example in the Introduction (Example 1). If we annotate the parameter p and the started processes q1 and q2 with a type, e.g., List(T) for some T (the type of lists containing elements of type T), then MS is a valid procedure definition in PC, as long as we allow two straightforward syntactic conventions: first, that p start q̃^T stands for the sequence p start q1^T1; . . . ; p start qn^Tn; second, that a communication of the form p.e -> q stands for p.e -> q.id, where id is the identity function in our setting: it simply sets the content of q to the value received from p.

Semantics. We define a reduction semantics →_D for PC, which is parameterised over the set of procedure definitions D. To model the state of processes, we use a total state function σ, where σ(p) denotes the value stored in process p. We assume that each type T ∈ 𝒯 has a special value ⊥_T, representing an uninitialised process state. The semantics of PC also includes a connection graph G, keeping track of which processes know each other. In the rules, p ←G→ q denotes that G contains an edge between p and q, and G ∪ {p ↔ q} denotes the graph obtained from G by adding an edge between p and q (if missing). Executing a communication action p.e -> q.f requires that: p and q are connected in G; e is well typed; and that the type of e matches that expected by the receiver. The last two conditions are implicit in the premises of rule ⌊C|Com⌉, as v and w are not defined otherwise. Choreographies can therefore deadlock (be unable to reduce) because of errors in the programming of communications; this issue is addressed by our typing discipline in § 3. Rule ⌊C|Cond⌉ needs the auxiliary operator # in order to generate only well-formed terms.

\[
\eta \mathbin{\#} C = \eta; C \qquad I \mathbin{\#} C = I; C \qquad (C_1; C_2) \mathbin{\#} C = C_1; (C_2 \mathbin{\#} C) \tag{1}
\]
The operator # extends the scope of bound names. The Barendregt convention guarantees that it is capture-avoiding, while keeping our presentation simple.
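The three clauses of (1) can be read as a recursive function on terms. The following sketch is a hypothetical encoding of ours, not part of the calculus: a sequence `A; C` is the tuple `("seq", A, C)` and every other term is an opaque string.

```python
def fatsemi(c, cont):
    """Return c # cont, following the three clauses of equation (1)."""
    if isinstance(c, tuple) and c[0] == "seq":
        # (C1; C2) # C = C1; (C2 # C)
        _, head, rest = c
        return ("seq", head, fatsemi(rest, cont))
    # eta # C = eta; C   and   I # C = I; C   (0 is an instance of I)
    return ("seq", c, cont)

# The branch chosen by a conditional, followed by the continuation "C".
branch = ("seq", "p.e -> q.f", ("seq", "q -> r[l]", "0"))
result = fatsemi(branch, "C")
print(result)
```

In this encoding the continuation ends up after the final 0 of the branch, giving a term of the form 0; C at the end of the sequence.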
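\[
\frac{p \xleftrightarrow{G} q \qquad v = e[\sigma(p)/c] \qquad w = f[\sigma(q)/c](v)}
     {G,\ p.e \mathbin{\texttt{->}} q.f; C,\ \sigma \to_D G,\ C,\ \sigma[q \mapsto w]}\ \lfloor\mathrm{C|Com}\rceil
\]
\[
\frac{p \xleftrightarrow{G} q}
     {G,\ p \mathbin{\texttt{->}} q[l]; C,\ \sigma \to_D G,\ C,\ \sigma}\ \lfloor\mathrm{C|Sel}\rceil
\qquad
\frac{p \xleftrightarrow{G} q \qquad p \xleftrightarrow{G} r}
     {G,\ p : q \mathbin{\texttt{<->}} r; C,\ \sigma \to_D G \cup \{q \leftrightarrow r\},\ C,\ \sigma}\ \lfloor\mathrm{C|Tell}\rceil
\]
\[
\frac{}{G,\ p\ \mathsf{start}\ q^T; C,\ \sigma \to_D G \cup \{p \leftrightarrow q\},\ C,\ \sigma[q \mapsto \bot_T]}\ \lfloor\mathrm{C|Start}\rceil
\]
\[
\frac{i = 1 \text{ if } e[\sigma(p)/c] = \mathsf{true},\quad i = 2 \text{ otherwise}}
     {G,\ (\mathsf{if}\ p.e\ \mathsf{then}\ C_1\ \mathsf{else}\ C_2); C,\ \sigma \to_D G,\ C_i \mathbin{\#} C,\ \sigma}\ \lfloor\mathrm{C|Cond}\rceil
\]
\[
\frac{C_1 \preceq_D C_2 \qquad G,\ C_2,\ \sigma \to_D G',\ C_2',\ \sigma' \qquad C_2' \preceq_D C_1'}
     {G,\ C_1,\ \sigma \to_D G',\ C_1',\ \sigma'}\ \lfloor\mathrm{C|Struct}\rceil
\]
Figure 2 Procedural Choreographies, Semantics.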
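To make the role of the connection graph concrete, the rules for communication actions in Figure 2 can be sketched as a small interpreter. This is a minimal sketch under our own assumptions: choreographies are restricted to straight-line lists of actions, the tuple encoding and the names `step` and `run` are hypothetical, and `None` stands in for the uninitialised value ⊥_T.

```python
def connected(G, p, q):
    """True if the connection graph G has an edge between p and q."""
    return frozenset((p, q)) in G

def step(G, C, sigma):
    """Execute the first action of the non-empty list C, or raise if stuck."""
    act, rest = C[0], C[1:]
    kind = act[0]
    if kind == "com":                                  # p.e -> q.f
        _, p, e, q, f = act
        if not connected(G, p, q):
            raise RuntimeError(f"{p} and {q} are not connected")
        v = e(sigma[p])                                # v = e[sigma(p)/c]
        w = f(sigma[q], v)                             # w = f[sigma(q)/c](v)
        return G, rest, {**sigma, q: w}
    if kind == "start":                                # p start q
        _, p, q = act
        return G | {frozenset((p, q))}, rest, {**sigma, q: None}
    if kind == "tell":                                 # p : q <-> r
        _, p, q, r = act
        if not (connected(G, p, q) and connected(G, p, r)):
            raise RuntimeError(f"{p} cannot introduce {q} and {r}")
        return G | {frozenset((q, r))}, rest, sigma
    raise ValueError(f"unsupported action: {kind}")

def run(G, C, sigma):
    """Reduce C until no actions are left; return the final graph and state."""
    while C:
        G, C, sigma = step(G, C, sigma)
    return G, sigma

identity = lambda own, received: received              # the function id
prog = [
    ("start", "p", "q"),
    ("start", "p", "r"),
    ("com", "p", lambda c: c, "q", identity),          # p.c -> q
    ("tell", "p", "q", "r"),                           # p : q <-> r
    ("com", "q", lambda c: c + 1, "r", identity),      # q.(c+1) -> r
]
G, sigma = run(frozenset(), prog, {"p": 41})
print(sigma["r"])
```

Removing the `tell` action from `prog` makes the last communication get stuck, since q and r have never been introduced; this is the kind of error that the typing discipline is meant to rule out.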
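\[
\frac{}{\mathbf{0}; C \preceq_D C}\ \lfloor\mathrm{C|End}\rceil
\qquad
\frac{pn(\eta) \cap pn(\eta') = \emptyset}{\eta; \eta' \equiv_D \eta'; \eta}\ \lfloor\mathrm{C|Eta\text{-}Eta}\rceil
\]
{p, q} ∩ pn(η) = ∅ [...]

\[
\frac{\cdots}{\Gamma; G \vdash p \mathbin{\texttt{->}} q[l]; C \triangleright G'}\ \lfloor\mathrm{T|Sel}\rceil
\]
\[
\frac{p \xleftrightarrow{G} q \qquad p \xleftrightarrow{G} r \qquad \Gamma; G \cup \{q \leftrightarrow r\} \vdash C \triangleright G'}
     {\Gamma; G \vdash p : q \mathbin{\texttt{<->}} r; C \triangleright G'}\ \lfloor\mathrm{T|Tell}\rceil
\]
\[
\frac{\Gamma \vdash p : T \qquad c : T \vdash_{\mathcal{T}} e : \mathsf{bool} \qquad \Gamma; G \vdash C_i \triangleright G_i \qquad \Gamma; G_1 \cap G_2 \vdash C \triangleright G'}
     {\Gamma; G \vdash (\mathsf{if}\ p.e\ \mathsf{then}\ C_1\ \mathsf{else}\ C_2); C \triangleright G'}\ \lfloor\mathrm{T|Cond}\rceil
\]
\[
\frac{\Gamma \vdash X(\widetilde{q^T}) : G_X \triangleright G_X' \qquad \Gamma \vdash p_i : T_i \qquad G_X[\tilde{p}/\tilde{q}] \subseteq G \qquad \Gamma; G \cup (G_X'[\tilde{p}/\tilde{q}]) \vdash C \triangleright G'}
     {\Gamma; G \vdash X\langle\tilde{p}\rangle; C \triangleright G'}\ \lfloor\mathrm{T|Call}\rceil
\]
⌊T|Start⌉ [...] ⌊T|Com⌉ [...]
Figure 4 Procedural Choreographies, Typing Rules.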
Therefore, our typing judgements have the form Γ; G ⊢ C ▷ G′, which reads “C is well-typed according to the typings in Γ, and when executed from a connection graph that contains G it will produce a connection graph that includes G′”. Typing environments Γ are used to keep track of the types of processes and procedures; they are defined as follows:

    Γ ::= ∅ | Γ, p : T | Γ, X : G ▷ G′

A typing p : T states that process p stores values of type T, whereas a typing X : G ▷ G′ records the effect of the body of X on the graph G. The rules for deriving typing judgements for PC are given in Figure 4. We assume that we can use standard typing judgements for functions and expressions. Thus, we write c : T ⊢_𝒯 e : T and c : T1 ⊢_𝒯 f : T2 → T3 with the usual meaning, respectively “e has type T assuming that c has type T” and “f has type T2 → T3 assuming that c has type T1”. The verification that all communications respect the expected types is straightforward, using the connection graph G to keep track of which processes have been introduced to each other. In rule ⌊T|Start⌉, we are implicitly using the fact that q does not appear yet in G, which is another consequence of using the Barendregt convention.

To type a procedural choreography, we need to type its set of procedure definitions D. We write Γ ⊢ D if: for each X(q̃^T) = C_X ∈ D, there is exactly one typing X(q̃^T) : G_X ▷ G′_X ∈ Γ, and this typing is such that q̃ : T̃; G_X ⊢ C_X ▷ G′_X. We say that Γ ⊢ ⟨D, C⟩ if Γ, Γ_D; G_C ⊢ C ▷ G′ for some Γ_D such that Γ_D ⊢ D and some G′, where G_C is the full graph whose nodes are the free process names in C. The choice of G_C is motivated by observing that (i) all top-level processes should know each other and (ii) eventual visibilities between processes not occurring in C do not affect its typability. Our type system provides the following main property.

Theorem 5 (Deadlock freedom and Subject reduction). Given a choreography C and a set D of procedure definitions, if Γ ⊢ D and Γ; G1 ⊢ C ▷ G′1 for some Γ, G1 and G′1, then either: C ⪯_D 0; or, for every σ, there exist G2, C′ and σ′ such that G1, C, σ →_D G2, C′, σ′ and Γ′; G2 ⊢ C′ ▷ G′2 for some Γ′ ⊇ Γ and G′2. In other words, if C is well-typed, then C either terminates or diverges.
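The connection-tracking part of the judgement Γ; G ⊢ C ▷ G′ can be sketched as a static walk over a choreography. The sketch below rests on several assumptions of ours: it ignores data types and procedure calls, it treats process spawning as adding the edge between parent and child (mirroring rule ⌊C|Start⌉ of the semantics), and the encoding and the name `check` are hypothetical.

```python
def edge(p, q):
    """Undirected edge between p and q."""
    return frozenset((p, q))

def check(G, C):
    """Return a graph G' reached after C from G, or raise TypeError."""
    G = set(G)
    for act in C:
        kind = act[0]
        if kind in ("com", "sel"):            # p.e -> q.f   or   p -> q[l]
            _, p, q = act
            if edge(p, q) not in G:
                raise TypeError(f"{p} and {q} are not connected")
        elif kind == "start":                 # p start q
            _, p, q = act
            G.add(edge(p, q))
        elif kind == "tell":                  # p : q <-> r
            _, p, q, r = act
            if edge(p, q) not in G or edge(p, r) not in G:
                raise TypeError(f"{p} cannot introduce {q} and {r}")
            G.add(edge(q, r))
        elif kind == "if":                    # if p.e then C1 else C2
            _, p, c1, c2 = act
            # The continuation may rely only on edges created in both branches.
            G = check(G, c1) & check(G, c2)
        else:
            raise ValueError(f"unsupported action: {kind}")
    return G

setup = [("start", "p", "q"), ("start", "p", "r")]
intro = [("tell", "p", "q", "r")]
c_ok = setup + [("if", "p", intro, intro), ("com", "q", "r")]
c_bad = setup + [("if", "p", intro, []), ("com", "q", "r")]

print(edge("q", "r") in check(set(), c_ok))
try:
    check(set(), c_bad)
except TypeError as err:
    print("rejected:", err)
```

The second choreography is rejected because only one branch of the conditional introduces q and r, so the intersection G1 ∩ G2 used for the continuation lacks the edge that the final communication needs.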
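\[
\begin{array}{rcl}
B & ::= & q!e; B \mid p?f; B \mid q!!r; B \mid p?r; B \mid q \oplus l; B \mid p \& \{l_i : B_i\}_{i \in I}; B \mid \mathbf{0} \\
  & \mid & \mathsf{start}\ q^T \triangleright B_2; B_1 \mid \mathsf{if}\ e\ \mathsf{then}\ B_1\ \mathsf{else}\ B_2; B \mid X\langle\tilde{p}\rangle; B \mid \mathbf{0}; B \\
N, M & ::= & p \triangleright_v B \mid N \mid M \mid \mathbf{0} \\
\mathcal{B} & ::= & X(\tilde{q}) = B, \mathcal{B} \mid \emptyset
\end{array}
\]
Figure 5 Procedural Processes, Syntax.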
Checking that Γ ⊢ ⟨D, C⟩ is not trivial, as it requires “guessing” Γ_D. However, this set can be algorithmically determined from ⟨D, C⟩. As a consequence, we can also derive type inference properties for PC.

Theorem 6. Given Γ, D and C, it can be decided whether Γ ⊢ ⟨D, C⟩.

Theorem 7. There is an algorithm that, given a procedural choreography ⟨D, C⟩, outputs: a set Γ such that Γ ⊢ ⟨D, C⟩, if such a Γ exists; NO, if no such Γ exists.

Theorem 8. The types of arguments in procedure definitions and the types of freshly created processes can be inferred automatically.

Theorems 7 and 8 allow us to omit type annotations in choreographies completely, assuming that we know the types of local functions and expressions at processes (given by ⊢_𝒯). Therefore, in practice, programmers can write choreographies as in Examples 1 and 4.
4 Endpoint Projection
In this section, we present our EndPoint Projection procedure (EPP), which compiles a choreography to a distributed implementation represented in terms of a process calculus.
4.1 Procedural Processes
We introduce our target process model, called Procedural Processes (PP), an extension of the SP calculus [8]. The new key elements are: arbitrary expressions and functions for value communications; the possibility to start new processes; parameterised recursive procedures; general recursion; and the communication of process names (references).

Syntax. The syntax of PP is reported in Figure 5. A term p ▷_v B is a process, where p is its name, v is the value stored in its memory cell, and B is its behaviour. Networks, ranged over by N, M, are parallel compositions of processes, where 0 is the inactive network. Finally, ⟨𝓑, N⟩ is a procedural network, where 𝓑 defines the procedures that the processes in N may invoke. Values, expressions and functions are as in PC. We comment on the syntax of behaviours. A send term q!e; B sends the evaluation of expression e to process q, and then proceeds as B. Term p?f; B is the dual receiving action: it receives a value from process p, combines it with the value in the memory cell of the process executing the behaviour as specified by f, and then proceeds as B. Term q!!r sends process name r to q and process name q to r, making q and r “aware” of each other. The dual action is p?r, which receives a process name from p that replaces the bound variable r in the continuation. Term q ⊕ l; B sends the selection of a label l to process q. Selections are received by the branching term p&{l_i : B_i}_{i∈I}, which can receive a selection for any of the labels l_i and proceed according to B_i. Branching terms must offer at least one branch. Term
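\[
\frac{u = (f[w/c])(e[v/c])}
     {p \triangleright_v q!e; B_1 \mid q \triangleright_w p?f; B_2 \to_{\mathcal{B}} p \triangleright_v B_1 \mid q \triangleright_u B_2}\ \lfloor\mathrm{P|Com}\rceil
\]
\[
\frac{}{p \triangleright_v q!!r; B_1 \mid q \triangleright_w p?r; B_2 \mid r \triangleright_u p?q; B_3 \to_{\mathcal{B}} p \triangleright_v B_1 \mid q \triangleright_w B_2 \mid r \triangleright_u B_3}\ \lfloor\mathrm{P|Tell}\rceil
\]
\[
\frac{j \in I}
     {p \triangleright_v q \oplus l_j; P \mid q \triangleright_w p \& \{l_i : Q_i\}_{i \in I} \to_{\mathcal{B}} p \triangleright_v P \mid q \triangleright_w Q_j}\ \lfloor\mathrm{P|Sel}\rceil
\]
\[
\frac{q' \text{ fresh}}
     {p \triangleright_v (\mathsf{start}\ q^T \triangleright B_2; B_1) \to_{\mathcal{B}} p \triangleright_v B_1[q'/q] \mid q' \triangleright_{\bot_T} B_2}\ \lfloor\mathrm{P|Start}\rceil
\]
\[
\frac{i = 1 \text{ if } e[v/c] = \mathsf{true},\quad i = 2 \text{ otherwise}}
     {p \triangleright_v \mathsf{if}\ e\ \mathsf{then}\ P_1\ \mathsf{else}\ P_2 \to_{\mathcal{B}} p \triangleright_v P_i}\ \lfloor\mathrm{P|Cond}\rceil
\]
\[
\frac{N \to_{\mathcal{B}} N'}{N \mid M \to_{\mathcal{B}} N' \mid M}\ \lfloor\mathrm{P|Par}\rceil
\qquad
\frac{N \preceq_{\mathcal{B}} M \qquad M \to_{\mathcal{B}} M' \qquad M' \preceq_{\mathcal{B}} N'}{N \to_{\mathcal{B}} N'}\ \lfloor\mathrm{P|Struct}\rceil
\]
Figure 6 Procedural Processes, Semantics.

\[
\frac{}{\mathbf{0}; B \preceq_{\mathcal{B}} B}\ \lfloor\mathrm{P|End}\rceil
\qquad
\frac{}{N \mid p \triangleright_v \mathbf{0} \preceq_{\mathcal{B}} N}\ \lfloor\mathrm{P|AZero}\rceil
\qquad
\frac{}{N \mid \mathbf{0} \preceq_{\mathcal{B}} N}\ \lfloor\mathrm{P|NZero}\rceil
\]
\[
\frac{X(\widetilde{q^T}) = B_X \in \mathcal{B}}
     {X\langle\tilde{p}\rangle; B \preceq_{\mathcal{B}} B_X[\tilde{p}/\tilde{q}] \mathbin{\#} B}\ \lfloor\mathrm{P|Unfold}\rceil
\]
Figure 7 Procedural Processes, Structural precongruence ⪯_𝓑.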
start q^T ▷ B2; B1 starts a new process (with a fresh name) executing B2, proceeding in parallel as B1. The other terms are standard (conditionals, procedure calls, and termination), while procedural definitions are stored globally as in PC. Some terms bind names: start q^T ▷ B2; B1 binds q in B1, and p?r; B binds r in B.

Semantics. The rules defining the reduction relation →_𝓑 for PP are shown in Figure 6. As in PC, they are parameterised on the set of behavioural procedures 𝓑. Rule ⌊P|Com⌉ models value communication: a process p executing a send action towards a process q can synchronise with a receive-from-p action at q; in the reductum, f is used to update the memory of q by combining its contents with the value sent by p. The placeholder c is replaced with the current value of p in e (resp. q in f). Rule ⌊P|Tell⌉ establishes a three-way synchronisation, allowing a process to introduce two others. Since the received names are bound at the receivers, we rely on α-conversion to make the receivers agree on each other’s name, as done in session types [13]. (Differently from PC, we do not assume the Barendregt convention here.) Rule ⌊P|Sel⌉ is standard selection, where the sender process selects one of the branches offered by the receiver. In rule ⌊P|Start⌉, we require the name of the created process to be globally fresh. All other rules are standard. In rule ⌊P|Struct⌉, structural precongruence ⪯_𝓑 is the smallest precongruence satisfying commutativity of the parallel operator | and the rules in Figure 7. Rule ⌊P|Unfold⌉ expands procedure calls. It uses again the # operator, defined as in (1) but where terms are now in the PP language.

Remark. Our three-way synchronisation in rule ⌊P|Tell⌉ could be easily encoded with two standard communications of names (as in the π-calculus [26]). Our choice has no effect on our results, but will have the advantage of giving a clearer formulation of our EPP.
4.2 EndPoint Projection (EPP)
We now show how to compile procedural choreographies in PC to processes in PP.
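\[
[\![p.e \mathbin{\texttt{->}} q.f; C]\!]_r =
\begin{cases}
q!e; [\![C]\!]_r & \text{if } r = p \\
p?f; [\![C]\!]_r & \text{if } r = q \\
[\![C]\!]_r & \text{otherwise}
\end{cases}
\qquad
[\![p \mathbin{\texttt{->}} q[l]; C]\!]_r =
\begin{cases}
q \oplus l; [\![C]\!]_r & \text{if } r = p \\
p \& \{l : [\![C]\!]_r\} & \text{if } r = q \\
[\![C]\!]_r & \text{otherwise}
\end{cases}
\]
\[
[\![p : q \mathbin{\texttt{<->}} r; C]\!]_s =
\begin{cases}
q!!r; [\![C]\!]_s & \text{if } s = p \\
p?r; [\![C]\!]_s & \text{if } s = q \\
p?q; [\![C]\!]_s & \text{if } s = r \\
[\![C]\!]_s & \text{otherwise}
\end{cases}
\qquad
[\![p\ \mathsf{start}\ q^T; C]\!]_r =
\begin{cases}
\mathsf{start}\ q^T \triangleright [\![C]\!]_q; [\![C]\!]_r & \text{if } r = p \\
[\![C]\!]_r & \text{otherwise}
\end{cases}
\]
\[
[\![X\langle\tilde{p}\rangle; C]\!]_r =
\begin{cases}
X_i\langle\tilde{p}\rangle; [\![C]\!]_r & \text{if } r = p_i \\
[\![C]\!]_r & \text{otherwise}
\end{cases}
\]
\[
[\![\mathsf{if}\ p.e\ \mathsf{then}\ C_1\ \mathsf{else}\ C_2; C]\!]_r =
\begin{cases}
\mathsf{if}\ e\ \mathsf{then}\ [\![C_1]\!]_r\ \mathsf{else}\ [\![C_2]\!]_r; [\![C]\!]_r & \text{if } r = p \\
([\![C_1]\!]_r \sqcup [\![C_2]\!]_r); [\![C]\!]_r & \text{otherwise}
\end{cases}
\]
\[
[\![\mathbf{0}]\!]_r = \mathbf{0}
\qquad
[\![\mathbf{0}; C]\!]_r = [\![C]\!]_r
\]
Figure 8 Procedural Choreographies, Behaviour Projection.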
Behaviour Projection. We start by defining how to project the behaviour of a single process p, a partial function denoted [[C]]_p. The rules defining behaviour projection are given in Figure 8. Each choreography term is projected to the local action of the process that we are projecting. For example, for a communication term p.e -> q.f, we project a send action if we are projecting the sender process p, a receive action if we are projecting the receiver process q, or we just proceed with the continuation otherwise. The rule for projecting a conditional uses the standard (and partial) merging operator ⊔: B ⊔ B′ is isomorphic to B and B′ up to branching, where the branches of B or B′ with distinct labels are also included [3]. Merging allows the process that decides a conditional to inform the other processes of its choice later on, using selections [15].

Building on behaviour projection, we define how to project the set D of procedure definitions. We need to consider two main aspects. The first is that, at runtime, the choreography may invoke X multiple times, but potentially passing r at different argument positions each time. This means that r may be called to play different “roles” in the implementation of the procedure. For this reason, we project the behaviour of each possible process parameter q_i as the local procedure X_i. The second aspect is: depending on the role that r is called to play by the choreography, it will need to know the names of the other processes that it is supposed to communicate with in the choreographic procedure. We deal with this by simply passing all arguments, which means that some of them may even be unknown to the process invoking the procedure. This substantially simplifies the development of the theory, and does not essentially change it: it is straightforward to annotate the EPP by analysing which parameters of each recursive definition are actually used in each of its projections, and instantiating only those. We can now define [[D]] as the component-wise extension of

\[
[\![X(\widetilde{q^T}) = C]\!] = X_1(\tilde{q}) = [\![C]\!]_{q_1}, \ldots, X_n(\tilde{q}) = [\![C]\!]_{q_n}
\qquad \text{where } \tilde{q} = q_1, \ldots, q_n.
\]

Definition 9 (EPP from PC to PP). Given a procedural choreography ⟨D, C⟩ and a state σ, the endpoint projection [[D, C, σ]] is defined as the parallel composition of the processes in C with all global definitions derived from D:

\[
[\![D, C, \sigma]\!] = \langle [\![D]\!], [\![C, \sigma]\!] \rangle = \Big\langle [\![D]\!], \prod_{p \in pn(C)} p \triangleright_{\sigma(p)} [\![C]\!]_p \Big\rangle
\]

where [[C, σ]], the EPP of C wrt state σ, is independent of D. Since the σs are total, if [[C, σ]] is defined for some σ, then [[C, σ′]] is defined also for all other σ′. When [[C, σ]] = N is defined for any σ, we say that C is projectable and that N is the projection of C, σ. Similar considerations apply to [[D, C, σ]].
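The behaviour projection rules for communication, introduction and spawning can be sketched as a recursive function. This is an illustration under our own assumptions, not the full EPP: only straight-line choreographies are handled (no selections, conditionals or procedure calls), behaviours are rendered as lists of strings, and the tuple encoding and the name `project` are hypothetical.

```python
def project(C, r):
    """Behaviour of process r in the straight-line choreography C."""
    if not C:
        return ["0"]
    act, rest = C[0], C[1:]
    kind = act[0]
    if kind == "com":                      # p.e -> q.f
        _, p, e, q, f = act
        if r == p:
            return [f"{q}!{e}"] + project(rest, r)
        if r == q:
            return [f"{p}?{f}"] + project(rest, r)
    elif kind == "tell":                   # p : q <-> s
        _, p, q, s = act
        if r == p:
            return [f"{q}!!{s}"] + project(rest, r)
        if r == q:
            return [f"{p}?{s}"] + project(rest, r)
        if r == s:
            return [f"{p}?{q}"] + project(rest, r)
    elif kind == "start":                  # p start q
        _, p, q = act
        if r == p:
            # The parent carries the behaviour of the child it spawns.
            return [("start", q, project(rest, q))] + project(rest, r)
    else:
        raise ValueError(f"unsupported action: {kind}")
    return project(rest, r)                # r is not involved in this action

chor = [
    ("start", "p", "q"),
    ("com", "p", "split1", "q", "id"),     # p.split1 -> q
    ("com", "q", "c", "p", "merge"),       # q.c -> p.merge
]
print(project(chor, "p"))
```

For the spawning process p, the result nests the projection of the continuation onto the new process q inside the `start` term, followed by the actions of p itself.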
Language          Parameterised   General      Process    Transparent   Implicit
                  recursion       sequencing   spawning   EPP           parallelism
MC [8]            no              no           no         yes           yes
IOC [15]          no              yes          no         yes           no
DIOC [23]         no              yes          no         no            no
GC [3]            no              no           partial    partial       no
CC/Chor [4, 17]   no              no           yes        yes           yes
PC (this work)    yes             yes          yes        yes           yes

Table 1 Expressive capabilities of representative choreography languages.
Example 10. We show the EPP of procedure MS in our merge sort example from § 1. Below, L is the list type List(T) for some T and id is the identity function (Example 2).

    MS_p(p^L) = if is_small then 0
                else start q1^L ▷ (p?id; MS_p⟨q1⟩; p!c; 0);
                     start q2^L ▷ (p?id; MS_p⟨q2⟩; p!c; 0);
                     q1!split1; q2!split2; q1?id; q2?merge; 0

Properties. EPP guarantees the following operational correspondence, which is the hallmark correctness-by-construction property of choreography languages.

Theorem 11 (EPP Theorem). If Γ ⊢ D and Γ; G ⊢ C ▷ G′, then, for all σ:
(Completeness) G, C, σ →_D G″, C′, σ′ implies [[C, σ]] →_[[D]] ≻ [[C′, σ′]];
(Soundness) [[C, σ]] →_[[D]] N implies G, C, σ →_D G″, C′, σ′ for some G″ and σ′ such that [[C′, σ′]] ≺ N.

Above, the (standard, from [3, 4]) pruning relation ≺ eliminates the branches introduced by the merging operator ⊔ when they are not needed anymore to follow the originating choreography (we write N ≻ N′ when N′ ≺ N). Pruning does not alter reductions, since the eliminated branches are never selected, as shown in [3, 15, 22]. Combining Theorem 11 with Theorem 5 we get that the projections of typable PC terms never deadlock.

Corollary 12 (Deadlock-freedom by construction). Let N = [[C, σ]] for some C and σ, and assume that Γ; G ⊢ C ▷ G′ for some Γ such that Γ ⊢ D and some G and G′. Then, either: N ⪯_[[D]] 0 (N has terminated); or there exists N′ such that N →_[[D]] N′ (N can reduce).
5 Related Work and Discussion
We start by comparing PC to previous choreography models, focusing on the features that motivated our development (cf. § 1, Motivation). The comparison is summarised in Table 1; we comment on some interesting items. The language of Minimal Choreographies (MC) [8] is the most similar to ours, but is also remarkably less expressive: it captures only programs with a fixed (static) number of processes that use basic local computation primitives (zero and successor for natural numbers). MC is a minimal choreography language for general computation (Turing completeness), which makes PC Turing complete by extension (MC is included in PC). The language IOC [15] is not Turing complete: it captures only systems with a fixed number of processes and has no support for local process computation or procedures. IOC provides general sequencing, but in a form that does not support implicit parallelism. DIOC [23] is an extension of IOC that supports the dynamic replacement
of parts of a choreography via runtime adaptation; however, updates cannot contain new processes (called roles in DIOC), and this extension is made at the expense of transparent EPP (coordination of adaptation requires hidden communications). The language GC [3] supports the spawning of processes at runtime inside of services; however, this is kept hidden from the programmer and processes are spawned whenever a service is contacted on a special channel, called service channel. EPP in GC is thus not transparent wrt processes; as a consequence, GC requires additional machinery to guarantee correctness by construction. The calculus CC [4] and its implementation Chor [17] support explicit process spawning, which is achieved via special public channels, similarly to GC and differently from our PC (where channels are not needed). Both GC and CC support recursive procedures, but these do not support invocations with different arguments at runtime as in PC [3, 17]. CC supports asynchronous communications, whereas here we focused on synchronous communications for simplicity of presentation; it would be straightforward to add asynchrony to our development by following the idea of rule ⌊C|Async⌉ found in [4]. None of the examples we presented can be written in previous choreography models, as discussed in § 1. A major distinguishing feature of PC is the management of connections among processes using graphs that can be manipulated at runtime. The only other choreography model that supports something similar is CC, via channel passing, but it is much less expressive since a process that introduces two other processes cannot communicate with them thenceforth. Choreographies are not only used as implementation languages in the literature, but also as specifications. Multiparty Session Types (MPST) [14] is a typing discipline where choreography-like descriptions are used as behavioural types to verify implementations given in (variants of) the π-calculus. 
Differently from PC and all other models for choreographic programming, choreographies in MPST do not model computation and are not Turing complete. MPST type multiparty sessions, which are guaranteed to be “locally” deadlock-free, in the sense that bad composition of different sessions can still lead to deadlocks. By contrast, PC guarantees deadlock-freedom for the whole system, regardless of how procedures are composed. Recent work investigated how to extend MPST to capture protocols where the number of participants and communications is known only when a session is started at runtime [29], or the number of participants in a session can grow during execution [10]. These results are achieved by introducing ad-hoc primitives and “middleware” terms in the process calculi to be typed, e.g., for tracking and polling the current number of participants in a session [10]. In PC we do not need such machinery: our programming of connections among processes, which arises naturally in models dealing with dynamic process structures (e.g., mobility in the π-calculus [16]), is general enough for our purposes. Programming connections also yields better precision: in MPST, the graph of connections in a session is always assumed to be complete, whereas in PC we only require the connections that we actually need. This makes PC a suitable model for reasoning about different kinds of topologies. In the future, it would be interesting to see whether our type system and connection graphs can be used to enforce pre-defined network structures (e.g., hypercubes or butterflies), making PC a candidate for the programming of choreographies that account for hardware restrictions. Acknowledgements. Montesi was supported by CRC (Choreographies for Reliable and efficient Communication software), grant no. DFF–4005-00304 from the Danish Council for Independent Research.
References
[1] Samik Basu, Tevfik Bultan, and Meriem Ouederni. Deciding choreography realizability. In John Field and Michael Hicks, editors, Proc. of POPL, pages 191–202. ACM, 2012.
[2] Business Process Model and Notation. http://www.omg.org/spec/BPMN/2.0/.
[3] Marco Carbone, Kohei Honda, and Nobuko Yoshida. Structured communication-centered programming for web services. ACM Trans. Program. Lang. Syst., 34(2):8, 2012.
[4] Marco Carbone and Fabrizio Montesi. Deadlock-freedom-by-design: multiparty asynchronous global programming. In Roberto Giacobazzi and Radhia Cousot, editors, Proc. of POPL, pages 263–274. ACM, 2013.
[5] Marco Carbone, Fabrizio Montesi, and Carsten Schürmann. Choreographies, logically. In Paolo Baldan and Daniele Gorla, editors, Proc. of CONCUR, volume 8704 of LNCS, pages 47–62. Springer, 2014.
[6] Marco Carbone, Fabrizio Montesi, Carsten Schürmann, and Nobuko Yoshida. Multiparty session types as coherence proofs. In Luca Aceto and David de Frutos-Escrig, editors, Proc. of CONCUR, volume 42 of LIPIcs, pages 412–426. Schloss Dagstuhl, 2015.
[7] Chor. Programming Language. http://www.chor-lang.org/.
[8] Luís Cruz-Filipe and Fabrizio Montesi. Choreographies, computationally. CoRR, abs/1510.03271, 2015.
[9] Frank S. de Boer, Mohammad Mahdi Jaghoori, Cosimo Laneve, and Gianluigi Zavattaro. Decidability problems for actor systems. Logical Methods in Computer Science, 10(4), 2014.
[10] Pierre-Malo Deniélou and Nobuko Yoshida. Dynamic multirole session types. In Thomas Ball and Mooly Sagiv, editors, Proc. of POPL, pages 435–446. ACM, 2011.
[11] Maurizio Gabbrielli, Saverio Giallorenzo, and Fabrizio Montesi. Applied choreographies. CoRR, abs/1510.03637, 2015.
[12] Kohei Honda, Aybek Mukhamedov, Gary Brown, Tzu-Chun Chen, and Nobuko Yoshida. Scribbling interactions with a formal foundation. In Raja Natarajan and Adegboyega K. Ojo, editors, Proc. of ICDCIT, volume 6536 of LNCS, pages 55–75. Springer, 2011.
[13] Kohei Honda, Vasco Vasconcelos, and Makoto Kubo. Language primitives and type disciplines for structured communication-based programming. In Chris Hankin, editor, Proc. of ESOP, volume 1381 of LNCS, pages 22–138. Springer, 1998.
[14] Kohei Honda, Nobuko Yoshida, and Marco Carbone. Multiparty asynchronous session types. In George C. Necula and Philip Wadler, editors, Proc. of POPL, pages 273–284. ACM, 2008.
[15] Ivan Lanese, Claudio Guidi, Fabrizio Montesi, and Gianluigi Zavattaro. Bridging the gap between interaction- and process-oriented choreographies. In Antonio Cerone and Stefan Gruner, editors, Proc. of SEFM, pages 323–332. IEEE, 2008.
[16] Robin Milner, Joachim Parrow, and David Walker. A calculus of mobile processes, I and II. Information and Computation, 100(1):1–40, 41–77, September 1992.
[17] Fabrizio Montesi. Choreographic Programming. Ph.D. thesis, IT University of Copenhagen, 2013. http://fabriziomontesi.com/files/m13_phdthesis.pdf.
[18] Fabrizio Montesi and Marco Carbone. Programming services with correlation sets. In Gerti Kappel, Zakaria Maamar, and Hamid R. Motahari Nezhad, editors, Proc. of ICSOC, volume 7084 of LNCS, pages 125–141. Springer, 2011.
[19] Fabrizio Montesi and Nobuko Yoshida. Compositional choreographies. In Pedro R. D’Argenio and Hernán C. Melgratti, editors, Proc. of CONCUR, volume 8052 of LNCS, pages 425–439. Springer, 2013.
[20] Roger M. Needham and Michael D. Schroeder. Using encryption for authentication in large networks of computers. Commun. ACM, 21(12):993–999, December 1978.
[21] PI4SOA. http://www.pi4soa.org, 2008.
[22] Mila Dalla Preda, Maurizio Gabbrielli, Saverio Giallorenzo, Ivan Lanese, and Jacopo Mauro. Deadlock freedom by construction for distributed adaptive applications. CoRR, abs/1407.0970, 2014.
[23] Mila Dalla Preda, Maurizio Gabbrielli, Saverio Giallorenzo, Ivan Lanese, and Jacopo Mauro. Dynamic choreographies – safe runtime updates of distributed applications. In Tom Holvoet and Mirko Viroli, editors, Proc. of COORDINATION, volume 9037 of LNCS, pages 67–82. Springer, 2015.
[24] Mila Dalla Preda, Saverio Giallorenzo, Ivan Lanese, Jacopo Mauro, and Maurizio Gabbrielli. AIOCJ: A choreographic framework for safe adaptive distributed applications. In Benoît Combemale, David J. Pearce, Olivier Barais, and Jurgen J. Vinju, editors, Proc. of SLE, volume 8706 of LNCS, pages 161–170. Springer, 2014.
[25] Zongyan Qiu, Xiangpeng Zhao, Chao Cai, and Hongli Yang. Towards the theoretical foundation of choreography. In Carey L. Williamson, Mary Ellen Zurko, Peter F. Patel-Schneider, and Prashant J. Shenoy, editors, Proc. of WWW, pages 973–982. ACM, 2007.
[26] D. Sangiorgi and D. Walker. The π-calculus: a Theory of Mobile Processes. Cambridge University Press, 2001.
[27] Savara. JBoss Community. http://www.jboss.org/savara/.
[28] W3C WS-CDL Working Group. Web services choreography description language version 1.0. http://www.w3.org/TR/2004/WD-ws-cdl-10-20040427/, 2004.
[29] Nobuko Yoshida, Pierre-Malo Deniélou, Andi Bejleri, and Raymond Hu. Parameterised multiparty session types. In C.-H. Luke Ong, editor, Proc. of FOSSACS, volume 6014 of LNCS, pages 128–145. Springer, 2010.
A Appendix
We include detailed proofs of the theorems in this paper. We start with some technical lemmas about typing.

Lemma 13 (Monotonicity). Let $\Gamma$ and $\Gamma'$ be typing contexts with $\Gamma \subseteq \Gamma'$, $G_1$, $G_1'$ and $G$ be visibility graphs such that $G_1 \subseteq G$, and $C$ be a choreography. If $\Gamma; G_1 \vdash C \triangleright G_1'$, then $\Gamma'; G \vdash C \triangleright G \cup G_1'$.

Proof. Straightforward by induction on the derivation of $\Gamma; G_1 \vdash C \triangleright G_1'$. ∎

Lemma 14 (Sequentiality). Let $\Gamma$ be a typing context, $G_1$, $G_1'$, $G_2$ and $G_2'$ be visibility graphs such that $G_2 \subseteq G_1'$, and $C_1$, $C_2$ be choreographies. If $\Gamma; G_1 \vdash C_1 \triangleright G_1'$ and $\Gamma; G_2 \vdash C_2 \triangleright G_2'$, then $\Gamma; G_1 \vdash C_1 \fatsemi C_2 \triangleright G_1' \cup G_2'$.

Proof. Straightforward by induction on the derivation of $\Gamma; G_2 \vdash C_2 \triangleright G_2'$. ∎
Lemma 15 (Substitution). Let $\Gamma$ be a typing context, $G$ and $G'$ be visibility graphs, and $C$ be a choreography. Let $\tilde p$ be a set of process names that are free in $C$ and $\tilde q$ be a set of process names that do not occur (free or bound) in $C$. If $\Gamma; G \vdash C \triangleright G'$, then $\Gamma[\tilde p/\tilde q]; G[\tilde p/\tilde q] \vdash C[\tilde p/\tilde q] \triangleright G'[\tilde p/\tilde q]$.

Proof. Straightforward by induction on the derivation of $\Gamma; G \vdash C \triangleright G'$, as all typing rules are valid when substitutions are applied. ∎

We are now ready to start proving Theorem 5. The following lemma takes care of the base cases, and is required for one of the inductive steps.

Lemma 16. Let $\Gamma$ be a set of typing judgements, $D$ a set of procedure definitions, $G_1$ and $G_1'$ visibility graphs, and $C$ a choreography that does not start with $0$ or a procedure call. Assume that $\Gamma \vdash D$ and $\Gamma; G_1 \vdash C \triangleright G_1'$. For every state $\sigma$, there exist $\Gamma'$, $\sigma'$, $C'$, $G_2$ and $G_2'$ such that $G_1, C, \sigma \to_D G_2, C', \sigma'$ and $\Gamma'; G_2 \vdash C' \triangleright G_2'$.

Proof. By case analysis on the last step of the proof of $\Gamma; G_1 \vdash C \triangleright G_1'$. By hypothesis, this proof cannot end with an application of rules ⌊T|End⌉, ⌊T|EndSeq⌉ or ⌊T|Call⌉; we detail all cases for completeness, but the only non-trivial one is the last.

⌊T|Start⌉: then $C$ is $p\ \mathsf{start}\ q^T; C^\circ$ and by hypothesis $\Gamma, q : T; G_1 \cup \{p \leftrightarrow q\} \vdash C^\circ \triangleright G_1'$. Since $G_1, p\ \mathsf{start}\ q^T; C^\circ, \sigma \to_D G_1 \cup \{p \leftrightarrow q\}, C^\circ, \sigma[q \mapsto \bot_T]$ by rule ⌊C|Start⌉, taking $\Gamma' = \Gamma, q : T$, $\sigma' = \sigma[q \mapsto \bot_T]$, $C' = C^\circ$, $G_2 = G_1 \cup \{p \leftrightarrow q\}$ and $G_2' = G_1'$ establishes the thesis.

⌊T|Com⌉: then $C$ is $p.e \mathbin{\texttt{->}} q.f; C^\circ$ and by hypothesis $p \xleftrightarrow{G_1} q$, $f[\sigma(q)/c](e[\sigma(p)/c])$ is a valid expression of type $T_q$, and $\Gamma; G_1 \vdash C^\circ \triangleright G_1'$. Then all the preconditions of ⌊C|Com⌉ are met, so taking $\Gamma' = \Gamma$, $\sigma' = \sigma[q \mapsto f[\sigma(q)/c](e[\sigma(p)/c])]$, $C' = C^\circ$, $G_2 = G_1$ and $G_2' = G_1'$ establishes the thesis.

⌊T|Sel⌉: then $C$ is $p \mathbin{\texttt{->}} q[l]; C^\circ$ and by hypothesis $p \xleftrightarrow{G_1} q$ and $\Gamma; G_1 \vdash C^\circ \triangleright G_1'$. By ⌊C|Sel⌉, $G_1, p \mathbin{\texttt{->}} q[l]; C^\circ, \sigma \to_D G_1, C^\circ, \sigma$, so taking $\Gamma' = \Gamma$, $\sigma' = \sigma$, $C' = C^\circ$, $G_2 = G_1$ and $G_2' = G_1'$ again establishes the thesis.

⌊T|Tell⌉: then $C$ is $p : q \mathbin{\texttt{<->}} r; C^\circ$ and by hypothesis $p \xleftrightarrow{G_1} q$, $p \xleftrightarrow{G_1} r$, and $\Gamma; G_1 \cup \{q \leftrightarrow r\} \vdash C^\circ \triangleright G_1'$. Since the preconditions of rule ⌊C|Tell⌉ are met, taking $\Gamma' = \Gamma$, $\sigma' = \sigma$, $C' = C^\circ$, $G_2 = G_1 \cup \{q \leftrightarrow r\}$ and $G_2' = G_1'$ establishes the thesis.
⌊T|Cond⌉: then $C$ is $\mathsf{if}\ p.e\ \mathsf{then}\ C_1\ \mathsf{else}\ C_2; C^\circ$ and by hypothesis $e[\sigma(p)/c]$ is a valid Boolean expression, $\Gamma; G_1 \vdash C_i \triangleright G_i^\circ$ for $i = 1, 2$, and $\Gamma; G_1^\circ \cap G_2^\circ \vdash C^\circ \triangleright G_1'$. Suppose $e[\sigma(p)/c] = \mathsf{true}$ (the other case is similar). Then $G_1, \mathsf{if}\ p.e\ \mathsf{then}\ C_1\ \mathsf{else}\ C_2; C^\circ, \sigma \to_D G_1, C_1 \fatsemi C^\circ, \sigma$. Since $G_1^\circ \cap G_2^\circ \subseteq G_1^\circ$, Lemma 14 allows us to conclude that $\Gamma; G_1 \vdash C_1 \fatsemi C^\circ \triangleright G_1' \cup G_1^\circ$, whence the thesis follows by taking $\Gamma' = \Gamma$, $C' = C_1 \fatsemi C^\circ$, $G_2 = G_1$ and $G_2' = G_1' \cup G_1^\circ$. ∎

Proof (Theorem 5). If $C \preceq_D 0$, then the first case holds. Assume that $C \not\preceq_D 0$; we show that the second case holds by induction on the proof of $\Gamma; G_1 \vdash C \triangleright G_1'$. By hypothesis, the last rule applied in this proof cannot be ⌊T|End⌉; the cases where the last rule applied is ⌊T|Start⌉, ⌊T|Com⌉, ⌊T|Sel⌉, ⌊T|Tell⌉ or ⌊T|Cond⌉ follow immediately from Lemma 16, while the case of rule ⌊T|EndSeq⌉ is straightforward from the induction hypothesis.

We focus on the case of rule ⌊T|Call⌉. In this case, $C$ has the form $X\langle\tilde p\rangle; C^\circ$, and we know that $\Gamma \vdash X(\widetilde{q^T}) : (G_X \triangleright G_X')$, $\Gamma \vdash \widetilde{p : T}$, $G_X[\tilde p/\tilde q] \subseteq G_1$ and $\Gamma; G_1 \cup (G_X'[\tilde p/\tilde q]) \vdash C^\circ \triangleright G_1'$. From the hypothesis that $\Gamma \vdash D$ we also know that $\Gamma_X; G_X \vdash C_X \triangleright G_X'$, where $C_X$ is the body of $X$ as defined in $D$. By Lemma 15, $\Gamma_X[\tilde p/\tilde q]; G_X[\tilde p/\tilde q] \vdash C_X[\tilde p/\tilde q] \triangleright G_X'[\tilde p/\tilde q]$, whence by Lemma 13 also $\Gamma; G_1 \vdash C_X[\tilde p/\tilde q] \triangleright G_X'[\tilde p/\tilde q] \cup G_1$. By applying rule ⌊C|Unfold⌉, we conclude that $X\langle\tilde p\rangle; C^\circ \preceq_D C_X[\tilde p/\tilde q] \fatsemi C^\circ$, and Lemma 14 allows us to conclude that $\Gamma; G_1 \vdash C_X[\tilde p/\tilde q] \fatsemi C^\circ \triangleright G_1'$. If $C_X$ does not begin with a procedure call, then Lemma 16 establishes the thesis. If this is not the case, we repeat the unfolding and the typing argument; well-foundedness of $D$ guarantees that this will only be done a finite number of times. ∎

We now shift to Theorem 6.

Proof (Theorem 6). The proof of this result proceeds in several stages. We first observe that deciding whether $\Gamma; G \vdash C \triangleright G'$ is completely mechanical, as the typing rules are deterministic.
Furthermore, those rules can also be used to construct $G'$ from $G$ and $C$; therefore, the key step of this proof is showing, given $\Gamma$ and $\langle D, C\rangle$, how to find a “canonical typing” for the recursive definitions, the set $\Gamma_D$, such that $\Gamma_D \vdash D$ and $\Gamma, \Gamma_D; G_C \vdash C \triangleright G'$ (with $G'$ inferred) iff $\Gamma, \Gamma'; G_C \vdash C \triangleright G''$ for some $\Gamma'$ and $G''$. More precisely, we need to find graphs $G_X$ and $G_X'$ for each procedure $X$ defined in $D$.

Our proof proceeds in three steps. First, for each $X$ we compute an underapproximation $G_X^\circ$ of the output graph $G_X'$, containing all the relevant connections that executing $X$ can add. Using this, we are able to compute the input graph $G_X$ and the output graph $G_X' = G_X \cup G_X^\circ$. Both these steps are achieved by computing a minimal fixpoint of a monotonic operator in the set of all graphs whose vertices are the parameters of $X$. Finally, we argue that the typing $X : G_X \triangleright G_X'$ is minimal, and therefore the set $\Gamma_D$ of all such typings fulfills the property we require. Throughout the remainder of this proof, we assume that $D = \{X_i(\tilde q_i) = C_i \mid i = 1, \ldots, n\}$.

1. In order to compute $G_{X_i}^\circ$, we define an auxiliary function $\mathsf{fwd}$ with intended meaning as follows: $\mathsf{fwd}^{\tilde G_i}_{C_j}(G)$ computes the communication graph obtained from $G$ after one execution of the body of $X_j$, assuming that $X_i(\tilde q_i) : \emptyset \triangleright G_i$ for all $i$ and ignoring newly created processes. We use a conditional union operator $\uplus$ where $G \uplus \{e\}$ denotes $G \cup \{e\}$ if $e$ is an edge connecting two vertices in $G$, and $G$ otherwise. The function $\mathsf{fwd}$ is defined as follows.

\[
\begin{aligned}
\mathsf{fwd}^{\tilde G_i}_{0}(G) &= G \\
\mathsf{fwd}^{\tilde G_i}_{0;C}(G) &= \mathsf{fwd}^{\tilde G_i}_{C}(G) \\
\mathsf{fwd}^{\tilde G_i}_{p.e \mathbin{\texttt{->}} q.f;C}(G) &= \mathsf{fwd}^{\tilde G_i}_{C}(G) \\
\mathsf{fwd}^{\tilde G_i}_{p \mathbin{\texttt{->}} q[l];C}(G) &= \mathsf{fwd}^{\tilde G_i}_{C}(G) \\
\mathsf{fwd}^{\tilde G_i}_{p\ \mathsf{start}\ q^T;C}(G) &= \mathsf{fwd}^{\tilde G_i}_{C}(G) \\
\mathsf{fwd}^{\tilde G_i}_{p : q \mathbin{\texttt{<->}} r;C}(G) &= \mathsf{fwd}^{\tilde G_i}_{C}(G \uplus \{q \leftrightarrow r\}) \\
\mathsf{fwd}^{\tilde G_i}_{\mathsf{if}\ p.e\ \mathsf{then}\ C_1\ \mathsf{else}\ C_2;C}(G) &= \mathsf{fwd}^{\tilde G_i}_{C}\bigl(\mathsf{fwd}^{\tilde G_i}_{C_1}(G) \cap \mathsf{fwd}^{\tilde G_i}_{C_2}(G)\bigr) \\
\mathsf{fwd}^{\tilde G_i}_{X_i\langle\tilde p\rangle;C}(G) &= \mathsf{fwd}^{\tilde G_i}_{C}(G \uplus G_i[\tilde p/\tilde q_i])
\end{aligned}
\]

Using $\mathsf{fwd}$, we define an operator $T_{\mathsf{fwd}}$ over the set $\mathcal{G}$ of tuples of graphs over the parameters of $X_i$, i.e. $\mathcal{G} = \{\widetilde{G_i} \mid G_i \text{ is a graph over } \tilde q_i\}$. Observe that $\mathcal{G}$ is a complete lattice wrt componentwise inclusion.

\[
T_{\mathsf{fwd}}(\widetilde{G_i}) = \widetilde{\mathsf{fwd}^{\tilde G_i}_{C_i}(G_i)}
\]
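The definitions of fwd and of the operator Tfwd above can be read as a small program. The following Python sketch is only an illustration under assumptions that are not part of the paper: choreographies are encoded as lists of instruction tuples (an implicit 0 ends each list), graphs as sets of two-element frozensets, and all names (`edge`, `cond_union`, `rename`, `fwd`, `least_fixpoint_fwd`, `EXAMPLE_DEFS`) are hypothetical.

```python
from typing import Dict, FrozenSet, Sequence, Tuple

Edge = FrozenSet[str]
Graph = FrozenSet[Edge]
# Assumed encoding: procedure name -> (formal parameters, body).
# A body is a list of instruction tuples:
#   ("com", p, q), ("sel", p, q), ("start", p, q), ("tell", p, q, r),
#   ("if", p, then_body, else_body), ("call", X, args)
Defs = Dict[str, Tuple[Sequence[str], list]]


def edge(a: str, b: str) -> Edge:
    return frozenset((a, b))


def cond_union(g, vertices, edges) -> Graph:
    """Conditional union: keep only edges whose two endpoints are vertices of G."""
    return frozenset(g | {e for e in edges if len(e) == 2 and e <= vertices})


def rename(g, params, args):
    """Substitution G[p~/q~]: replace each formal parameter by its argument."""
    sub = dict(zip(params, args))
    return {frozenset(sub[v] for v in e) for e in g}


def fwd(chor, g, vertices, approx, defs: Defs) -> Graph:
    """Edges over `vertices` present after one execution of `chor` from `g`."""
    for ins in chor:
        kind = ins[0]
        if kind == "tell":
            _, _p, q, r = ins
            g = cond_union(g, vertices, {edge(q, r)})
        elif kind == "if":
            _, _p, c1, c2 = ins
            # Only edges added by both branches are guaranteed afterwards.
            g = fwd(c1, g, vertices, approx, defs) & fwd(c2, g, vertices, approx, defs)
        elif kind == "call":
            _, x, args = ins
            g = cond_union(g, vertices, rename(approx[x], defs[x][0], args))
        # "com", "sel" and "start" leave the graph unchanged.
    return g


def least_fixpoint_fwd(defs: Defs) -> Dict[str, Graph]:
    """Iterate the operator from the tuple of empty graphs until it is stable."""
    approx = {x: frozenset() for x in defs}
    while True:
        nxt = {x: fwd(body, approx[x], frozenset(params), approx, defs)
               for x, (params, body) in defs.items()}
        if nxt == approx:
            return approx
        approx = nxt


# Hypothetical example: X introduces b and c; Y calls X in one branch only,
# and then unconditionally with permuted arguments.
EXAMPLE_DEFS: Defs = {
    "X": (("a", "b", "c"), [("tell", "a", "b", "c")]),
    "Y": (("p", "q", "r"), [
        ("if", "p", [("call", "X", ("p", "q", "r"))], [("com", "p", "q")]),
        ("call", "X", ("r", "p", "q")),
    ]),
}
```

In this example the edge between q and r is added in only one branch of the conditional in Y, so it is discarded by the intersection; only the edge added by the unconditional call survives in the approximation for Y.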
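The operator $T_{\mathsf{fwd}}$ is monotonic, since $\mathsf{fwd}$ only adds edges to its argument, and thus has a least fixpoint that can be computed by iterating $T_{\mathsf{fwd}}$ from the tuple of empty graphs over the right sets of vertices. Furthermore, since $\mathcal{G}$ is finite (each graph has a finite number of vertices) this fixpoint corresponds to a finite iterate, and can thus be computed in finite time. We denote this fixpoint by $\widetilde{G^\circ_{X_i}}$.

2. The construction of the input graphs $G_{X_i}$ follows the same idea: we go through the $C_i$s noting the edges that are required for all communications to be able to take place. It is however slightly more complicated, because we have to keep track of edges that the choreography adds to the graph; we therefore need a function $\mathsf{bck}$ that manipulates two graphs instead of one. More precisely, $\mathsf{bck}^{\tilde G_i}_{C}(G, G)$ returns the graph extending $G$ that is needed for correctly executing $C$ (ignoring newly created processes); the first argument keeps track of the edges that need to be added to $G$, and the second argument keeps track of edges added by executing $C$. This function uses the graphs $G^\circ_{X_i}$ computed earlier, which explains why it has to be defined afterwards. We use the same notational conventions as above, and let $\mathsf{fst}\langle a, b\rangle = a$ and $\mathsf{snd}\langle a, b\rangle = b$.

\[
\begin{aligned}
\mathsf{bck}^{\tilde G_i}_{0}(G_a, G_b) &= \langle G_a, G_b\rangle \\
\mathsf{bck}^{\tilde G_i}_{0;C}(G_a, G_b) &= \mathsf{bck}^{\tilde G_i}_{C}(G_a, G_b) \\
\mathsf{bck}^{\tilde G_i}_{p.e \mathbin{\texttt{->}} q.f;C}(G_a, G_b) &=
\begin{cases}
\mathsf{bck}^{\tilde G_i}_{C}(G_a, G_b) & \text{if } p \leftrightarrow q \in G_b \\
\mathsf{bck}^{\tilde G_i}_{C}(G_a \uplus \{p \leftrightarrow q\}, G_b \uplus \{p \leftrightarrow q\}) & \text{otherwise}
\end{cases} \\
\mathsf{bck}^{\tilde G_i}_{p \mathbin{\texttt{->}} q[l];C}(G_a, G_b) &=
\begin{cases}
\mathsf{bck}^{\tilde G_i}_{C}(G_a, G_b) & \text{if } p \leftrightarrow q \in G_b \\
\mathsf{bck}^{\tilde G_i}_{C}(G_a \uplus \{p \leftrightarrow q\}, G_b \uplus \{p \leftrightarrow q\}) & \text{otherwise}
\end{cases} \\
\mathsf{bck}^{\tilde G_i}_{p\ \mathsf{start}\ q^T;C}(G_a, G_b) &= \mathsf{bck}^{\tilde G_i}_{C}(G_a, G_b) \\
\mathsf{bck}^{\tilde G_i}_{p : q \mathbin{\texttt{<->}} r;C}(G_a, G_b) &=
\begin{cases}
\mathsf{bck}^{\tilde G_i}_{C}(G_a, G_b \uplus \{q \leftrightarrow r\}) & \text{if } p \leftrightarrow q, p \leftrightarrow r \in G_b \\
\mathsf{bck}^{\tilde G_i}_{C}(G_a \uplus \{p \leftrightarrow q\}, G_b \uplus \{p \leftrightarrow q, q \leftrightarrow r\}) & \text{if } p \leftrightarrow q \notin G_b,\ p \leftrightarrow r \in G_b \\
\mathsf{bck}^{\tilde G_i}_{C}(G_a \uplus \{p \leftrightarrow r\}, G_b \uplus \{p \leftrightarrow r, q \leftrightarrow r\}) & \text{if } p \leftrightarrow q \in G_b,\ p \leftrightarrow r \notin G_b \\
\mathsf{bck}^{\tilde G_i}_{C}(G_a \uplus \{p \leftrightarrow q, p \leftrightarrow r\}, G_b \uplus \{p \leftrightarrow q, p \leftrightarrow r, q \leftrightarrow r\}) & \text{if } p \leftrightarrow q, p \leftrightarrow r \notin G_b
\end{cases} \\
\mathsf{bck}^{\tilde G_i}_{\mathsf{if}\ p.e\ \mathsf{then}\ C_1\ \mathsf{else}\ C_2;C}(G_a, G_b) &= \mathsf{bck}^{\tilde G_i}_{C}\bigl(\mathsf{fst}(\mathsf{bck}^{\tilde G_i}_{C_1}(G_a, G_b)) \cup \mathsf{fst}(\mathsf{bck}^{\tilde G_i}_{C_2}(G_a, G_b)),\ \mathsf{snd}(\mathsf{bck}^{\tilde G_i}_{C_1}(G_a, G_b)) \cap \mathsf{snd}(\mathsf{bck}^{\tilde G_i}_{C_2}(G_a, G_b))\bigr) \\
\mathsf{bck}^{\tilde G_i}_{X_i\langle\tilde p\rangle;C}(G_a, G_b) &= \mathsf{bck}^{\tilde G_i}_{C}\bigl(G_a \uplus (G_i[\tilde p/\tilde q_i] \setminus G_b),\ G_b \uplus G_i[\tilde p/\tilde q_i] \uplus G_i^\circ[\tilde p/\tilde q_i]\bigr)
\end{aligned}
\]

(This definition could be simplified, but this formulation is sufficient for our purposes.) Again we define a monotonic operator over the same $\mathcal{G}$ as above.

\[
T_{\mathsf{bck}}(\widetilde{G_i}) = \widetilde{\mathsf{fst}(\mathsf{bck}^{\tilde G_i}_{C_i}(G_i, G_i))}
\]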
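The backward pass can be sketched in the same style as a program. The Python code below is a hedged illustration, not the authors' implementation: it reuses an assumed encoding of choreographies as lists of instruction tuples, takes the graphs computed in step 1 as a given dictionary `out`, and all names (`bck`, `least_fixpoint_bck`, `EXAMPLE_DEFS`, `EXAMPLE_OUT`) are hypothetical. The case analyses for communications, selections and introductions are folded into a set difference against the second graph.

```python
from typing import Dict, FrozenSet, Sequence, Tuple

Edge = FrozenSet[str]
Graph = FrozenSet[Edge]
# Assumed encoding: procedure name -> (formal parameters, body), where a body
# is a list of tuples such as ("com", p, q), ("sel", p, q), ("start", p, q),
# ("tell", p, q, r), ("if", p, then_body, else_body), ("call", X, args).
Defs = Dict[str, Tuple[Sequence[str], list]]


def edge(a: str, b: str) -> Edge:
    return frozenset((a, b))


def cond_union(g, vertices, edges) -> Graph:
    """Conditional union: keep only edges whose two endpoints are vertices of G."""
    return frozenset(g | {e for e in edges if len(e) == 2 and e <= vertices})


def rename(g, params, args):
    """Substitution G[p~/q~]: replace each formal parameter by its argument."""
    sub = dict(zip(params, args))
    return {frozenset(sub[v] for v in e) for e in g}


def bck(chor, ga, gb, vertices, inp, out, defs: Defs):
    """Return (edges required before running chor, edges known while running)."""
    for ins in chor:
        kind = ins[0]
        if kind in ("com", "sel"):
            _, p, q = ins
            need = {edge(p, q)} - gb          # required unless already known
            ga = cond_union(ga, vertices, need)
            gb = cond_union(gb, vertices, need)
        elif kind == "tell":
            _, p, q, r = ins
            need = {edge(p, q), edge(p, r)} - gb
            ga = cond_union(ga, vertices, need)
            gb = cond_union(gb, vertices, need | {edge(q, r)})
        elif kind == "if":
            _, _p, c1, c2 = ins
            a1, b1 = bck(c1, ga, gb, vertices, inp, out, defs)
            a2, b2 = bck(c2, ga, gb, vertices, inp, out, defs)
            ga, gb = a1 | a2, b1 & b2         # need either, know only both
        elif kind == "call":
            _, x, args = ins
            gi = rename(inp[x], defs[x][0], args)
            go = rename(out[x], defs[x][0], args)
            ga = cond_union(ga, vertices, gi - gb)
            gb = cond_union(gb, vertices, gi | go)
        # "start" changes neither graph (new processes are ignored).
    return ga, gb


def least_fixpoint_bck(defs: Defs, out) -> Dict[str, Graph]:
    """Iterate the operator from empty graphs; `out` is the result of step 1."""
    inp = {x: frozenset() for x in defs}
    while True:
        nxt = {x: bck(body, inp[x], inp[x], frozenset(params), inp, out, defs)[0]
               for x, (params, body) in defs.items()}
        if nxt == inp:
            return inp
        inp = nxt


# Hypothetical example: X needs a-b and a-c but creates b-c itself.
EXAMPLE_DEFS: Defs = {
    "X": (("a", "b", "c"),
          [("com", "a", "b"), ("tell", "a", "b", "c"), ("com", "b", "c")]),
    "Y": (("p", "q", "r"), [("call", "X", ("p", "q", "r")), ("com", "q", "r")]),
}
# Graphs assumed to come from the forward pass for these definitions.
EXAMPLE_OUT = {"X": {edge("b", "c")}, "Y": {edge("q", "r")}}
```

In the example, the communication between b and c inside X requires no input edge because the preceding introduction already added it, which is exactly the role of the second graph argument.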
Observe that we do not need to recompute $G_i^\circ$, since these graphs contain all edges that can possibly be added by executing $C_i$. The least fixpoint of $T_{\mathsf{bck}}$ can again be computed by finitely iterating this operator, and it is precisely $\widetilde{G_{X_i}}$. We then define $G_{X_i}' = G_{X_i} \cup G_{X_i}^\circ$.

3. We now show that $\Gamma_D = \{X_i(\tilde q_i) : G_{X_i} \triangleright G_{X_i}'\}$ is a minimal typing of $D$, in the sense explained earlier. Observe that it is possible that $\Gamma_D \nvdash D$, in particular if the $X_i$ are ill-formed choreographies.

Suppose that $\Gamma, \Gamma'; G_C \vdash C \triangleright G$ for some $\Gamma'$ and $G$. We argue that $\Gamma, \Gamma_D; G_C \vdash C \triangleright G'$, where $G'$ is inferred from the typing rules. For each procedure $X_i(\tilde q_i) = C_i$, there must be a unique typing $X_i(\tilde q_i) : G^{*}_{X_i} \triangleright G^{**}_{X_i}$ in $\Gamma'$. By a simple inductive argument one can show that $G^\circ_{X_i} \subseteq G^{**}_{X_i}$ (since $\emptyset \subseteq G^{**}_{X_i}$ and $T_{\mathsf{fwd}}$ preserves inclusion in $G^{**}_{X_i}$). Similarly, one shows that $G_{X_i} \subseteq G^{*}_{X_i}$ and that $G^{**}_{X_i} \setminus G^{*}_{X_i} \subseteq G'_{X_i} \setminus G_{X_i}$. As a consequence, the typing derivation for $\Gamma, \Gamma'; G_C \vdash C \triangleright G$ can be used for $\Gamma, \Gamma_D; G_C \vdash C \triangleright G'$, as all applications of rule ⌊T|Call⌉ are guaranteed to be valid (their preconditions hold) and to produce the same results (they change the communication graph in the same way). ∎

A consequence of this result is that we can obtain a type inference algorithm (Theorem 7), as we only need to “guess” types for the processes in the choreography. As a corollary of the proof, we can also infer the types for parameters of procedural definitions and freshly created processes.

Proof (Theorem 7). Construct $\Gamma$ by going through $C$ and adding $p : T_p$ every time there is an action that depends on $p$’s type (i.e. $p$ is a sender or receiver in a communication, or an argument of a procedure call). If $\Gamma$ contains two different types for any process, then output NO, else output $\Gamma$. This algorithm will not necessarily assign a type to all processes in $C$, in case $C$ contains processes whose memory is never accessed. ∎

Proof (Theorem 8).
Inferring the types of freshly created processes is analogous to the previous proof. As for parameters of procedure definitions, we omit the details of the proof, as it repeats ideas previously used. Define an operator $T_T$ over tuples of typing contexts (one for each $X_i$ defined in $D$) that generates a typing context for each $X_i$ in the same way as in the previous proof. If any contradictions are found, then fail. Iterate $T_T$ until either failure occurs (in which case the $X_i$s are not properly defined) or a fixpoint is reached. Finally, assign an arbitrary type (e.g. $\mathbb{N}$) to each process variable that has not received a type during this procedure. The algorithm readily extends to infer the types of processes created inside procedure definitions. ∎

Proof Sketch (Theorem 11). The structure of the proof is standard, from [17], so we only show the most interesting differing details. In particular, we need to be careful about how we deal with connections, which is a new key ingredient in PC wrt previous work. We demonstrate this point for the direction of (Completeness); the direction for (Soundness) is proven similarly. The proof proceeds by induction on the derivation of $G, C, \sigma \to_D G'', C', \sigma'$. The interesting cases are reported below.

⌊C|Tell⌉: From the definition of EPP we get:
\[
[\![p : q \mathbin{\texttt{<->}} r; C^\circ, \sigma]\!] \preceq p \triangleright_{\sigma(p)} q!!r; [\![C^\circ]\!]_p \mid q \triangleright_{\sigma(q)} p?r; [\![C^\circ]\!]_q \mid r \triangleright_{\sigma(r)} p?q; [\![C^\circ]\!]_r \mid N
\]
By ⌊P|Tell⌉ we get:
\[
[\![p : q \mathbin{\texttt{<->}} r; C^\circ, \sigma]\!] \to p \triangleright_{\sigma'(p)} [\![C^\circ]\!]_p \mid q \triangleright_{\sigma'(q)} [\![C^\circ]\!]_q \mid r \triangleright_{\sigma'(r)} [\![C^\circ]\!]_r \mid N
\]
which proves the thesis, since we can assume that the projection of $C'$ remains unchanged for the other processes ($N$ stays the same).

⌊C|Start⌉: This is the most interesting case. From the definition of EPP we get:
\[
[\![p\ \mathsf{start}\ q^T; C^\circ, \sigma]\!] \preceq p \triangleright_{\sigma(p)} \mathsf{start}\ q^T \triangleright [\![C^\circ]\!]_q; [\![C^\circ]\!]_p \mid N
\]
From the semantics of PP we get:
\[
[\![p\ \mathsf{start}\ q^T; C^\circ, \sigma]\!] \to p \triangleright_{\sigma'(p)} ([\![C^\circ]\!]_p)[q'/q] \mid q' \triangleright_{\bot_T} [\![C^\circ]\!]_{q'} \mid N
\]
By the Barendregt convention, we know that $C \preceq (p\ \mathsf{start}\ q^T; C^\circ)[q'/q]$. We now have to prove that:
\[
[\![C^\circ[q'/q], \sigma]\!] \preceq p \triangleright_{\sigma'(p)} ([\![C^\circ]\!]_p)[q'/q] \mid q' \triangleright_{\bot_T} [\![C^\circ]\!]_{q'} \mid N
\]
We observe that this is true only if process $q$ does not occur free in $N$, i.e., $q$ appears in $N$ only inside the scope of a binder. The latter must be of the form $r?q; B$. This is guaranteed by the fact that $C$ is well-typed, since the typing rules prevent other processes in $N$ from communicating with $q$ without being first introduced. ∎
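As a complement to the proof of Theorem 7 earlier in this appendix, the inference procedure described there (collect a type for each process from the actions that depend on it, and answer NO on a clash) can be sketched as follows. This is a minimal Python sketch under loud assumptions: the paper does not fix a concrete syntax for type annotations, so here each communication tuple is assumed to carry the types that its expressions require of the sender and receiver, procedure signatures are given as a dictionary, and the names `infer_types` and `signatures` are hypothetical.

```python
from typing import Dict, Optional, Sequence


def infer_types(chor: list, signatures: Dict[str, Sequence[str]]) -> Optional[Dict[str, str]]:
    """Return a typing context for the processes in chor, or None for NO.

    Assumed instruction encoding:
      ("com", p, type_of_p, q, type_of_q)   communication from p to q
      ("call", X, args)                     call, typed via signatures[X]
      ("if", p, then_body, else_body)       conditional, visited recursively
    Any other instruction is taken not to depend on a process type.
    """
    gamma: Dict[str, str] = {}

    def visit(body: list) -> bool:
        for ins in body:
            kind = ins[0]
            if kind == "com":
                _, p, tp, q, tq = ins
                pairs = [(p, tp), (q, tq)]
            elif kind == "call":
                _, x, args = ins
                pairs = list(zip(args, signatures[x]))
            else:
                pairs = []
            for proc, ty in pairs:
                # setdefault records the first type seen; a later different
                # type for the same process is a contradiction.
                if gamma.setdefault(proc, ty) != ty:
                    return False
            if kind == "if":
                _, _p, c1, c2 = ins
                if not (visit(c1) and visit(c2)):
                    return False
        return True

    return gamma if visit(chor) else None
```

As in the proof, a process whose memory is never accessed simply does not appear in the returned context.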