grammer to write asynchronous code in a sequential manner, producing code ..... Figure 2.1 shows an example with three processes {p, q, r} and six events. {e1,eâ² ..... Netcharts are distributed versions of HMSCs that have the structure of a .... guage with advanced features and libraries like Java or C#, the faster per-.
Design and Programming of Asynchronous Concurrent Systems A Natural Verifiable Approach
Thesis submitted in partial fulfilment of the
Degree of Doctor of Philosophy (Ph.D.)
by
Prakash Chandrasekaran Chennai Mathematical Institute
November 2009
Dedicated to my Parents
i
DECLARATION
The work in this thesis is based on research carried out by me under the supervision and guidance of Prof. Madhavan Mukund. Most of the research reported in this thesis was carried out at the Chennai Mathematical Institute. The work reported in Chapters 5 and 6 was initiated during a summer internship at Microsoft Research India, Bangalore, in 2006. No part of this thesis has been submitted elsewhere for any other degree or qualification.
November 2009
Chennai Mathematical Institute Plot H1, SIPCOT IT Park Padur PO, Siruseri TN, INDIA. PIN: 603 103.
Prakash Chandrasekaran
ii
CERTIFICATE
This is to certify that the thesis entitled “Design and Programming of Asynchronous Concurrent Systems — A Natural Verifiable Approach” submitted by Mr. Prakash Chandrasekaran is a bona fide record of the research work carried out by him under my supervision and guidance. The contents of the thesis in full or parts have not been submitted to any other institute or university for the award of any degree or diploma.
November 2009
Madhavan Mukund Thesis Supervisor
Chennai Mathematical Institute Plot H1, SIPCOT IT Park Padur PO, Siruseri TN, INDIA. PIN: 603 103.
iii
Abstract In this thesis, we address the need for a formal model for the design and implementation of Asynchronous Concurrent Systems, and also the need for a programming language to enable analyzable design of Asynchronous software components. In the first part of the thesis we present Coordinated Concurrent Scenarios (ccs) — a formal model that combines the rich visual notation of High level Message Sequence Charts (HMSC) with the expressive power of Message Passing Automata (MPA), and also show that verification of ccs models can be automated using the model checker Uppaal, under certain bounds. In the second part of the thesis we present clarity — a programming language that extends C and introduces new features that enable the programmer to write asynchronous code in a sequential manner, producing code that is more amenable to static analysis.
iv
Acknowledgements I would like to thank Prof. Madhavan Mukund for his guidance, collaboration, encouragement and support through all these years, since my undergraduate days, at the Chennai Mathematical Institute (CMI). I would like to thank all the Computer Science faculty in CMI and at the Institute of Mathematical Sciences (IMSc) for the enjoyable and interesting courses that they had taught me since my UG. Also, thanks especially to all the Madhavan, Kumar, Deepak, KV, Bharat, Suresh, Samir, Jam, Meena, Kamal, Raj, Sripathy, Clare and Vanchi for all the warmth, support and encouragement all these years. I thank Samir for being there when I needed someone to talk to, when the going got tough. Thanks to Sriram for the internship at Microsoft Research that helped clarity take-off. Thanks to Sriram, Joseph and Chris for their collaboration on clarity. Thanks to all my colleagues and friends at CMI and IMSc, especially Rahul, Somnath, Swati, Harish, Koushik, Naru, Saket, Baskar, Naga, Gayathri, Suman, Neel, Somdeb, Sandeep, Srikant and Puneet for making life at CMI and IMSc interesting and fun. I thank the CMI administration for all their administrative support. I would also like to thank my parents and my sister for their support, and my wife Suneetha for being there with me during all the highs and lows of the last one year of my PhD.
Contents 1 Introduction
1
1.1
Background . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1
1.2
Formal Models . . . . . . . . . . . . . . . . . . . . . . . . . .
3
1.3
Verifiable Implementation . . . . . . . . . . . . . . . . . . . .
3
1.4
Contributions of this Thesis . . . . . . . . . . . . . . . . . . .
4
2 Background 2.1
Communicating Systems . . . . . . . . . . . . . . . . . . . . . 2.1.1
2.2
2.3
I
7
Collection of MSCs . . . . . . . . . . . . . . . . . . . . 10
High level models based on MSCs . . . . . . . . . . . . . . . . 11 2.2.1
Alternating Bit Protocol . . . . . . . . . . . . . . . . . 12
2.2.2
High-level Compositional MSC . . . . . . . . . . . . . 14
2.2.3
Causal HMSC . . . . . . . . . . . . . . . . . . . . . . . 16
2.2.4
Netcharts . . . . . . . . . . . . . . . . . . . . . . . . . 17
Program Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 20 2.3.1
2.4
7
Dealing with asynchrony . . . . . . . . . . . . . . . . . 21
Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Design and Specification
25
3 Coordinated Concurrent Scenarios
27
3.1
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.2
Coordinated Concurrent Scenarios . . . . . . . . . . . . . . . . 28 v
vi
CONTENTS 3.2.1
Informal ccs Semantics . . . . . . . . . . . . . . . . . 29
3.2.2
Interleaving Transactions . . . . . . . . . . . . . . . . . 30
3.3 ABP in ccs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 3.4 Formal Semantics . . . . . . . . . . . . . . . . . . . . . . . . . 32 3.4.1
Adding Local Concurrency . . . . . . . . . . . . . . . . 34
3.4.2
Specifying Blocked Threads . . . . . . . . . . . . . . . 37
3.4.3
Formal Semantics of MSC Programs . . . . . . . . . . 40
3.4.4
Relaxing Global Behavior on await . . . . . . . . . . . 40
3.5 Execution Semantics . . . . . . . . . . . . . . . . . . . . . . . 40 3.5.1
if , atomic and waitfor . . . . . . . . . . . . . . . . . . 41
3.5.2
Complete Semantics . . . . . . . . . . . . . . . . . . . 42
3.6 Expressiveness . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 3.6.1
Encoding message sequence graphs . . . . . . . . . . . 45
3.6.2
Encoding message passing automata . . . . . . . . . . 46
3.6.3
Netcharts . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 4 Formal Verification
49
4.1 Scenarios as ccs Specification . . . . . . . . . . . . . . . . . . 50 4.2 Modelling ccs in Uppaal . . . . . . . . . . . . . . . . . . . . 51 4.2.1
Component FSMs . . . . . . . . . . . . . . . . . . . . . 52
4.2.2
p-local MSCs as Uppaal Templates . . . . . . . . . . . 52
4.2.3
Buffered Channels . . . . . . . . . . . . . . . . . . . . 53
4.2.4
Triggering p-local MSCs . . . . . . . . . . . . . . . . . 55
4.3 Verification . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56 4.3.1
Using Specification Templates . . . . . . . . . . . . . . 57
4.3.2
Scenario Embedding . . . . . . . . . . . . . . . . . . . 58
4.3.3
Matching modulo weak embedding . . . . . . . . . . . 58
4.3.4
Modelling Transactions . . . . . . . . . . . . . . . . . . 59
4.4 Optimizing the translation into Uppaal . . . . . . . . . . . . 59 4.4.1
Controlling Interleavings with committed locations . . . 60
4.4.2
Guiding Uppaal using priorities . . . . . . . . . . . . 60
CONTENTS
vii
4.4.3
Meta Variables . . . . . . . . . . . . . . . . . . . . . . 61
4.5
II
Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
Implementing with clarity
5 Introducing CLARITY
63 65
5.1
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.2
Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 5.2.1
Coords in clarity . . . . . . . . . . . . . . . . . . . . 70
5.2.2
Linearity Annotations . . . . . . . . . . . . . . . . . . 74
5.3
Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
5.4
Operational Semantics . . . . . . . . . . . . . . . . . . . . . . 80 5.4.1
Semantic Rules . . . . . . . . . . . . . . . . . . . . . . 82
5.5
Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
5.6
Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
6 Analyzing CLARITY 6.1
Static Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 89 6.1.1
6.2
89
Sequential analysis . . . . . . . . . . . . . . . . . . . . 90
Concurrency analysis . . . . . . . . . . . . . . . . . . . . . . . 92 6.2.1
Guarantees and limitations . . . . . . . . . . . . . . . . 92
6.3
Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . 93
6.4
Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
7 Epilogue
101
7.1
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
7.2
Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
viii
CONTENTS
List of Figures 2.1
An MSC (a) over {p, q, r}, the corresponding MPA (b). . . . .
2.2
Basic interactions in the alternating bit protocol . . . . . . . . 13
2.3
Message-passing automaton for the alternating bit protocol . . 13
2.4
An arbitrarily long MSC for abp. . . . . . . . . . . . . . . . . 14
2.5
HCMSC model for Alternating Bit Protocol . . . . . . . . . . 15
2.6
Basic interactions in the Causal MSG for the alternating bit
9
protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 2.7
Netchart for Alternating Bit Protocol . . . . . . . . . . . . . . 19
3.1
Crossing Transactions
3.2
The alternating bit protocol specified in ccs . . . . . . . . . . 32
3.3
A simple client-server . . . . . . . . . . . . . . . . . . . . . . . 34
3.4
A more realistic client-server . . . . . . . . . . . . . . . . . . . 36
3.5
A more realistic client-server, with local guards . . . . . . . . 37
3.6
Print-Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.1
Translating p-local MSC into an automaton. . . . . . . . . . . 53
4.2
Translating p-local MSC into an automaton using explicit buffers. 54
4.3
Translating ccs components to Uppaal templates. . . . . . . 56
5.1
Sending packets and pausing in a C driver . . . . . . . . . . . 68
5.2
Sending packets and pausing in clarity . . . . . . . . . . . . 70
5.3
Coord for gate . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
5.4
Automaton (with guards) described by the gate coord protocol. 73
. . . . . . . . . . . . . . . . . . . . . . 31
ix
x
LIST OF FIGURES 5.5 Network file server with asynchronous reading and serialized sending . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 5.6 Coord for chute . . . . . . . . . . . . . . . . . . . . . . . . . . 76 5.7 Alternate, recursive network file server . . . . . . . . . . . . . 78 5.8 clarity syntax . . . . . . . . . . . . . . . . . . . . . . . . . . 79 6.1 clarity simulation environment . . . . . . . . . . . . . . . . 99
Chapter 1 Introduction 1.1
Background
Communicating systems are multi-component systems whose behaviour is defined by the communication among the components. We call a communicating system concurrent when a component in the system can participate in more than one communication scenario at the same time. For example, a print-server along with a client and a printer could form a concurrent communicating system, wherein the server can simultaneously communicate with the client and the printer. More often than not, these systems also (not coincidentally) exhibit asynchronous behavior. For example, a client might make a request to the server and proceed to do other computation without waiting for a response from the server. And the server might, at a later point, indicate the completion of the request by a signal or an event. We use and interact with various communicating systems, from our mobile phone that is always communicating with the base station, to a humble remote-control that communicates with the television. There are also systems that we don’t directly interact with, like the communication between software components hidden under the hood of the operating system on our computers, or the interaction between various sensors and on-board con1
2
CHAPTER 1. INTRODUCTION
trollers on all our modern automobiles. With the presence of communicating systems being so ubiquitous, it becomes essential to ensure that they do what they are designed for and also do not cause unintended side effects. In other words we can say that we want to verify that the systems fulfill their specifications, and satisfy all safety constraints. A formal specification is a mathematical description of a system (software or hardware) that is used to design and implement it. The goal of a formal specification is to describe what the system should do, and not necessarily how it should do it. A correctness specification specifies what is expected of the system under different input conditions. A safety specification specifies what the system should not do under any circumstances. In the print-server system, a correctness specification could describe the protocol the client should use to submit a print request to the server, and the server’s communication with the printer to print the request. And, a safety specification could assert that the server will never send a new job to a printer when it is busy printing an earlier job. Formal verification of a system is the process of confirming whether the given design or implementation of the system conforms to its specification. Before we can verify that a system does what it is designed to do, we need to be able to specify what it is desired of the system. While designing and specifying these asynchronous concurrent communicating systems, it might seem convenient to use descriptive text in the specification. For example, in the telecom industry the ITU Z120 specification is used to describe communication scenarios, and structured English text is used to describe the control flow between the various scenarios. But, this can give rise to ambiguities and conflicts while specifying large systems. Hence, the need arises for a rigorous formal model to help in specification and verification to ensure correctness and safety. While a rigorous mathematical model might be the most desirable in terms of verification, the steep learning curve will impede adoption by the industry. An ideal formal model for specification should have a nice visual representation with formal semantics that is intuitive and easy to understand.
1.2. FORMAL MODELS
1.2
3
Formal Models
Although there are many formal models available, they all have their own limitations. Message Passing Automata (MPA) have a clear operational interpretation, but they are difficult to work with as they lack a mechanism to specify global interactions. On the other hand, more visual notations like Message Sequence Charts (MSCs), that are often used in the (telecommunications) industry in conjunction with descriptive text, are not expressive enough to capture many of the behaviors of these concurrent systems. MSCs, although useful in describing the global interactions, can express only limited aspects of control flow, and are hence difficult to execute and implement. In the next chapter, we introduce some of the well known formal models for communicating systems, and discuss them. We then motivate the need for a visual model that can capture the concurrency and asynchrony in the behavior of these systems in a natural way.
1.3
Verifiable Implementation
Although essential, using formal models only helps till the design stage. The final components will be implemented in some programming language. Such implementations also bring with them their own challenges, depending on the language and platform used. It becomes essential that we are able to verify and certify the correctness of the program as well. There are many program analysis tools that can analyze software and identify bugs. Programs can be analyzed both statically and dynamically. Static analysis means analyzing the source code and verifying properties about it, without having to actually simulate (or execute) it. Whereas, in dynamic analysis one executes or simulates the program so as to verify it. Dynamic analysis is not always a feasible option, as it requires that the program be executed with sufficient test inputs for the results to be reliable. Static program analysis also has its own limitations, when dealing with dynamic memory, pointer arithmetic, and concurrency.
4
CHAPTER 1. INTRODUCTION In the next chapter we discuss these program analysis techniques and
their limitations, and explain the need for a new programming paradigm for describing asynchronous communicating systems so as to make the programs themselves easy to verify.
1.4
Contributions of this Thesis
In this thesis, we address the above mentioned problems in the design and implementation of Asynchronous Concurrent Systems in two parts. We use both the terms Asynchronous Systems and Concurrent Systems interchangeably, to denote Asynchronous Concurrent Systems, depending on which aspect is focussed upon. In the first part, we address the need for a formal model for specifying these systems by presenting our new model Coordinated Concurrent Scenarios (ccs). ccs provides a natural visual notation for specification that is also easily verifiable. ccs, based on High-level Message Sequence Charts, introduces the notion of asynchronous transitions and uses local specifications of components. ccs uses global transaction declarations to relate the various local views into a single global communication scenario. We then describe an approach to automate verification of the specification using existing model checking tools. In the later part, we describe clarity, a programming language that extends C and enables analyzable design of Asynchronous Components. clarity introduces three novel features: 1. Nonblocking function calls which allow event-driven code to be written in a sequential style. If a blocking statement is encountered during the execution of such a call, the call returns and the remainder of the operation is automatically queued for later execution. 2. Coords, a set of high-level coordination primitives, which encapsulate common interactions between asynchronous components and make high-level coordination protocols explicit.
1.4. CONTRIBUTIONS OF THIS THESIS
5
3. Linearity annotations, which delegate coord protocol obligations to exactly one thread at each asynchronous function call, transforming a concurrent analysis problem into a sequential one. In Chapter 2 of the thesis we motivate our model by discussing various existing formal models and comparing them on some classical examples from the literature. Also, we describe the problems with the current approach to programming these systems and motivate the need for a new programming language. In Chapter 3 we introduce our formal model ccs with a few examples, and present the formal syntax and semantics. We describe an approach to automate verification of a ccs specification using existing model checkers in Chapter 4. In Chapter 5, we describe the new language features of clarity with a few examples. Also, we present the formal syntax and operational semantics of clarity programs. We describe static analysis of clarity programs and present the details of the clarity prototype implementation along with verification and performance results in Chapter 6. Chapter 7 concludes the thesis with a brief discussion on possible future work in bridging the gap between ccs and clarity and (semi-)automatic program synthesis.
6
CHAPTER 1. INTRODUCTION
Chapter 2 Background In this chapter we introduce related work and motivate the need for a new formal model for specifying asynchronous communicating systems, as well as a new programming language for implementing asynchronous code. In section 2.1, we formally define some basic formal models to describe communicating systems. In section 2.2, we introduce various higher level models that add additional features and expressibility to the basic formalisms. As we introduce these high-level models we also compare their expressiveness by using them to represent a classical example, the alternating bit protocol (ABP). In section 2.3, we describe various types of program analysis, and the difficulties in verifying asynchronous C programs. We conclude this chapter with a motivation for the need of a new formal model and programming paradigm to overcome some of the challenges described in this chapter.
2.1
Communicating Systems
We begin with an introduction to message sequence charts (MSCs). A widely used formalism for modelling communication, MSCs visually describe a set of messages exchanged by components in a system. Formally, let P = {p, q, r, . . .} be a finite set of processes that communicate via messages sent over reliable FIFO point-to-point channels. We assume a finite set of message types M. Each p ∈ P can perform three types 7
8
CHAPTER 2. BACKGROUND
of actions. • p!q(m): p sends message m to q ∈ P. • p?q(m): p receives message m from q ∈ P. • Local actions, denoted {a, b, . . .}. Let Σp denote the set of actions performed by p and Σ =
S
p∈P
Σp . By ∆p
we denote the set of actions Σp ∪ {q!p(m), q?p(m) | m ∈ M, q ∈ P, q 6= p}. Thus, ∆p extends Σp with send and receive actions on other processes that refer to messages sent to or received from p. Labelled posets A Σ-labelled poset is a structure M = (E, ≤, λ) where (E, ≤) is a partially ordered set with a labelling function λ : E → Σ. For e ∈ E, ↓e = {e′ | e′ ≤ e} and for X ⊆ E, ↓X = ∪e∈X ↓e. We call X ⊆ E a prefix of M if X = ↓X. For p ∈ P and a ∈ Σ, we set Ep = {e | λ(e) ∈ Σp } and Ea = {e | λ(e) = a}, respectively. For p, q ∈ P, p 6= q, we define the relation sendLock);
else {
CompletePacket(a, p, Status);
Status = NicPausing;
DEC_REF_CNT(a);
return STATUS_PENDING;
} else { Status = STATUS_PENDING;
} }
ListAddEnd(a->pSendList, p); ReleaseSpinLock(&a->sendLock);
void DoPendingSend(Adapter *a) {
}
assert(HW_IS_AVAIL(a->pHwCsr));
return Status;
AcquireSpinLock(&a->sendLock); Packet *p = ListRemoveHead(
}
a->pSendList ); Status = NICSendPacket(a, p);
void ReleaseBuffers() { ...
ReleaseSpinLock(&a->sendLock);
if (a->AdapterState == NicPausing
CompletePacket(a, p, Status); DEC_REF_CNT(a);
&& REF_CNT(a) == 0) { a->AdapterState = NicPaused;
}
PauseComplete(a); } ... }
Figure 5.1: Sending packets and pausing in a C driver
69
5.2. OVERVIEW
it has been added to this queue, since there is no control dependency between SendPacket and the code that processes the queue. Packets from this queue are removed and transmitted at several places in the driver code—the logical operation “send packet” is “manually scheduled” across several functions. One such example is shown in function DoPendingSend. A property we might want to check is that every non-failing call to SendPacket is followed by a matching call to CompletePacket.
In the
SendPacket code, this readily holds if the hardware is immediately available (the HW IS AVAIL check). If it is not, the situation is more complicated: the packet is put into a queue from which it is retrieved and completed at a later time, in another function. Because of the difficulty of tracking heap objects and non-sequential control flow, sequential error detection tools are unable to check if every packet is completed along all execution paths. The function Pause in Figure 5.1 demonstrates another difficulty—ad-hoc coordination between asynchronous operations. This function is the entry point for a “pause” operation, which needs to wait until all outstanding sends are finished before calling PauseComplete. The driver maintains a reference count, REF CNT(a), which tracks the number of outstanding sends in progress. In several unrelated places in the code, inside and outside the Pause function, the reference count is checked, updated, and PauseComplete is called (e.g., in the function ReleaseBuffers). Suppose we wish to automatically check that the pause operation completes only after all other pending operations (like a blocked SendPacket) have been completed. This is possible only by doing a global analysis that considers all possible interleavings between pause and other operations, taking into account all of the implicit control dependencies, the reference counts, and the heap objects involved. Such a check is beyond the reach of today’s analysis technology. Figure 5.2 shows a clarity implementation of the send and pause operations. The function SendPacket now represents the entirety of the logical operation of sending a packet. Inside SendPacket, clarity’s waitfor primitive is used to logically wait until the hardware becomes ready and then transmit the packet. Calls to the SendPacket function from the operating
70
CHAPTER 5. INTRODUCING CLARITY
STATUS SendPacket(Adapter *a,
STATUS Pause(Adapter *a) {
Packet *p) {
if(!(a->sendGate->Close()))
if(!(a->sendGate->Enter()))
return STATUS_FAILED;
return STATUS_FAILED;
waitfor(STATUS_PENDING,
waitfor(STATUS_PENDING,
[a->sendGate->e],
HW_IS_AVAIL(a->pHwCsr),
a->sendGate->IsEmpty());
[]);
a->AdapterState = NicPaused;
Status = NICSendPacket(a, p);
PauseComplete(a->AdapterHandle);
CompletePacket(a, p); a->sendGate->Exit();
return STATUS_SUCCESS; }
return Status ; }
Figure 5.2: Sending packets and pausing in clarity system are nonblocking—if the hardware is not ready, the caller is returned the value STATUS PENDING immediately (the first argument to waitfor); the remainder of the computation is automatically converted into a closure and put into a queue. In this case, the final return statement is of no consequence as it will return to the clarity runtime, and will be ignored. The last argument to waitfor is a set of events that can cause the waitfor to continue execution (from a blocked state, even if the wait condition is not met). For example, we might want a blocked SendPacket call to fail a transmit action (irrespective of hardware state), if Pause had been invoked in the meantime. We leave this set empty, in the code show in Figure 5.2. The semantics of the clarity code are similar to those of the C code in Figure 5.1, but the programmer does not have to manually schedule the code or manage the persistent state. Moreover, a sequential analysis tool can now easily check that every packet is completed on all execution paths before the SendPacket function exits, without doing any heap analysis.
5.2.1
Coords in clarity
clarity uses higher level abstractions called coords to express coordination between different asynchronous operations. The code in Figure 5.2 uses
71
5.2. OVERVIEW
sendGate, an instance of the gate coord (that is declared as part of the Adapter structure), the interface of which is given in Figure 5.3. The interface has four functions: the first two, Enter and Exit, are used by “client” threads when they begin and end operations that are controlled by the gate; the second two, Close and IsEmpty, are used by “control” threads. Close is used to prevent new operations from beginning and IsEmpty is used to check whether pending operations have completed. Waiting for a collection of asynchronous processes to complete is a common pattern in asynchronous systems programming. This pattern, commonly referred to as “asynchronous rundown”, is modelled by the gate coord. A gate can be implemented in a few dozen lines of code using an atomic counter and a boolean flag. In Figure 5.2, all send operations first call sendGate->Enter() and then call sendGate->Exit() before returning.
The function Pause calls
sendGate->Close(), then waits for sendGate->IsEmpty() to become true and returns. Unlike Figure 5.1, there is only one place in the code (inside the body of Pause) where the pause operation is completed. At runtime, Pause may need to wait asynchronously for pending send operations to complete, but the programmer does not have to worry about these details. Significantly, clarity enables the programmer to make the high-level contract between Pause and the other operations explicit. Consequently, it is possible to perform simple compositional analysis automatically and check that the coordination has been implemented and used properly. A coord declaration has three parts — variable declaration, function declaration and protocol declaration. The variable and function declarations are similar to a class declaration in C++, and describe the coord implementation. The protocol declaration specifies the sequence of calls by which every logical thread accesses the coord. A coord implementation is just a C (or C++) program (library) that provides the coord’s functionality. An implementation in C++ would be a class that has provides the variables and functions declared in the coord declaration, and whose behavior conforms to the coord protocol specification.
72
CHAPTER 5. INTRODUCING CLARITY
coord gate
Exit.return {
{
if(state==s1) state= done; /* VARIABLE DECLARATION */
else abort();
event e;
}
/* FUNCTION DECLARATION */
Close.return {
bool Enter();
if(state==init && $ret)
void Exit();
state = s2;
bool Close();
else if(state==init && !$ret)
bool IsEmpty();
state = done; else abort();
/* PROTOCOL DECLARATION */
}
protocol{ /* variables */
waitfor {
enum state {init,s1,
if(state==s2 && $2 =~ [e]
s2,done,
&& $1 =~ ‘IsEmpty()‘)
final} = init;
state = done; else abort();
/* triggers */
}
Enter.return { if(state==init && $ret)
ThreadDone {
state = s1;
if (state!=done && state!=init)
elseif (state==init && !$ret)
abort();
state = done;
}
else abort(); }
} }
Figure 5.3: Coord for gate
73
5.2. OVERVIEW Enter,Exit,Close,ThreadDone
abort
waitfor,Exit
s2
Close/[ret]
init
Enter,Exit,Close,waitfor
Enter,Close,ThreadDone,waitfor
Enter/[ret]
s1 Exit
Enter/[!ret]
waitfor/[...]
done
Close/[!ret]
Figure 5.4: Automaton (with guards) described by the gate coord protocol. We use the low-level specification language slic [6] — a Specification Language for Interface Checking of C — to specify the coordination protocol. The protocol declares a set of variables and then defines transitions caused by triggers, e.g., a function call return (Enter.return), the evaluation of a waitfor statement (waitfor), or thread termination (ThreadDone). A transition may inspect and update the values of the protocol variables. A call return transition may inspect the return value using the $ret variable. A waitfor or call transition may inspect the argument list using positional variables $1, $2, etc. A transition to an error state is represented by a call to abort. In the case of the gate coord, the protocol declaration (Figure 5.3) defines the state machine shown in Figure 5.4 as part of its protocol declaration. Any thread that uses the gate coord must satisfy this automaton. That is, any thread should use the coord in only one of the following ways: 1. call Enter first and, if the call returns true, then later call Exit — this is the typical behavior of a “client” thread that would just pass through the gate, if it is open. 2. call Close first and, if the call returns true, then wait until IsEmpty() returns true — this is the typical behavior of a “control” thread that at
74
CHAPTER 5. INTRODUCING CLARITY some point closes the gate and waits for all threads (that have entered) to exit.
Using the protocol specification, a gate implementation can be compositionally checked for correct concurrent behavior: assuming that threads using the gate obey the protocol, we can verify that the gate implementation is deadlock-free.
5.2.2
Linearity Annotations
We can check that the SendPacket and Pause threads in Figure 5.2 satisfy the protocol for gate by using a per-thread sequential analysis. The compositional reasoning in this case is simplistic, since no threads are created dynamically. If new threads are created, compositional analysis of coord protocol conformance becomes more complicated. We make a particular design choice — at each asynchronous call, for every coord (protocol) instance in progress its completion needs to be delegated to exactly one of the two threads. This delegation (or hand-off) is specified using additional annotations of the form “@ coord1 , . . ., coordk ” in the source code. We call these linearity annotations, as they help in linearizing the verification of coord usage. Linearity annotations are appended to the end of a statement containing a fork or a non-blocking call. We illustrate this with another example. Network File Server: Consider the network file server shown in Figure 5.5. To read and transmit a large file, the file server launches a set of parallel thread, one to read each block of the file. The threads coordinate to send the blocks in sequence over the network. The code creates “reader” threads using a fork call to read block. It uses fileChute, an instance of the chute coord, to do the necessary synchronization. The interface for the coord chute is shown in Figure 5.6. The protocol declaration specifies that each thread using the chute must: first call Enter, which returns an integer token k; then call waitfor(IsMyTurn(k),[e]), where e is the event field of the chute; and, finally, call Exit. This protocol can be understood as a variation of Lamport’s bakery algorithm [37] where
5.2. OVERVIEW
75
void read(FILE *fp, int n) { chute fileChute; for(i = 0; i < n; i++) { /* Enter the chute before spawning thread, to ensure ordering. */ int token = fileChute.Enter(); /* The annotation @fileChute in the call below indicates that the remainder of the protocol in chute fileChute will be carried over by the callee */ fork read_block(fp,i,token,&fileChute)@fileChute; } } void read_block(FILE *fp, block i, int token, chute *fileChute) { FileBlock fb; fb = fs_read(fp,i); /* Synch before sending block on the network. We omit the return value argument, since the return type is void. */ waitfor( fileChute->IsMyTurn(token), [fileChute->e] ); /* Send and exit. */ net_send(fb); fileChute->Exit(); }
Figure 5.5: Network file server with asynchronous reading and serialized sending
76
CHAPTER 5. INTRODUCING CLARITY
coord chute { /* Sent when a thread exits. */ event e; /* Called by a thread to "get on line" in the chute. Returns an integer token (the thread’s "ticket"). */ int Enter(); /* Called by a thread to check if it is "first in line" given its token. */ bool IsMyTurn(int); /* Called by a thread to exit the chute. Sends the event e. */ void Exit(); protocol{ enum state {init,s1,s2,done,final} = init; int k; Enter.return { /* function exit transition */ if(state==init) { k = $ret; state = s1; } else abort(); } waitfor { /* invocation transition */ if(state==s1 && $1 =~ ‘IsMyTurn(k)‘ && $2 =~ [e]) state=s2; else abort(); } Exit.return { /* function exit transition */ if(state==s2) state=done; else abort(); } ThreadDone { /* thread exit transition */ if(state!=done && state!=init) abort(); } }
Figure 5.6: Coord for chute
5.2. OVERVIEW
77
the thread may enter a non-critical section after “taking a number” (entering the chute). Unlike gate, there is only one correct usage pattern for a chute—there is no distinction between “client” and “control” threads. The Exit method does not take a token argument and the protocol forbids any thread to call Exit except when IsMyTurn returns true. The protocol also forbids a thread from trying to “spoof” a token and steal a turn — for each thread the argument to IsMyTurn must match the return value of Enter. Note that the thread executing read in Figure 5.5 calls Enter, but never calls IsMyTurn or Exit; likewise, each read block thread calls IsMyTurn and Exit without first calling Enter. How, then, can we verify the chute protocol using sequential reasoning? Whenever a logical thread makes an fork call, it effectively creates two logical threads of execution. We require that each coordination protocol in progress be handed off to exactly one of the two threads; each fork call is annotated with those instances of the protocol that will be handled by the callee (i.e., the new thread). Note that the fork call to read block in Figure 5.5 is annotated with @fileChute. The annotation indicates that the callee, read block, is responsible for completing the protocol for fileChute. An empty annotation would imply that the caller — and not the callee — is responsible for completing the protocol. In general, the callee is responsible only for the coords mentioned in the annotation, and the caller for the rest of the coords in progress. Since exactly one logical thread is responsible for carrying out the remainder of the protocol at every asynchronous call, the sequential analysis merely follows one of the two continuations at the call and ignores the other, depending on which instance of the protocol is currently being analyzed, as described in Chapter 6. Recursive Network File Server: Our final example is a recursive implementation of the network file server shown in Figure 5.7. Instead of generating “readers” from a “master” thread, the recursive implementation has a chain of recursive fork calls to read block; each new “reader” thread
78
CHAPTER 5. INTRODUCING CLARITY
void read(FILE *fp, int n) { chute fileChute; read_block(fp,0,n,&fileChute); } void read_block(FILE *fp, int i, int max, chute *fileChute) { FileBlock fb; if( i==max ) return; /* Enter the chute before spawning thread, to ensure ordering. */ int token = fileChute->Enter(); /* parallel call to the next file block reader. */ fork read_block(fp,i+1,max,fileChute); /* asynchronous part, can execute without any ordering */ fb = fs_read(fp,i); /* Synch before sending block on the network. */ waitfor( fileChute->IsMyTurn(token), [fileChute->e] ); /* Send and exit. */ net_send(fb); fileChute->Exit(); }
Figure 5.7: Alternate, recursive network file server spawns its own successor. Note that the calls to Enter and Exit now always happen in the same thread context. Note also that the fork call to function read block in Figure 5.7 does not contain the annotation @fileChute. This indicates that the calling thread continues to be responsible for the protocol on fileChute. Again, checking if each thread follows the protocol can be done using purely sequential analysis, one thread at a time. Separately, the correctness of the chute implementation can be established once and for all, assuming that all the client threads conform to the protocol.
79
5.3. SYNTAX Stmt
::=
(Send | CallStmt | WaitFor) ;
Send
::=
(send | sendall) EventId
CallStmt
::=
Fork | NonBlock | Block
Fork
::=
fork CallExpr Annot?
Nonblock
::=
(Lvalue =)? nonblock CallExpr Annot?
Block
::=
(Lvalue =)? block? CallExpr
CallExpr
::=
FuncId ( (CExpr List)? )
Annot
::=
@ ProtocolId List
WaitFor
::=
waitfor( (CExpr ,)? WaitCond List )
WaitCond
::=
(LabelId :)? CExpr , [ (EventId List)? ]
A List
::=
A (, A)∗
Figure 5.8: clarity syntax
5.3
Syntax
clarity is an extension of ANSI/ISO C [36]. clarity’s extensions to C syntax are given in Figure 5.8. The terminal symbols EventId, FuncId, ProtocolId, and LabelId represent alpha-numeric identifiers with event, function, protocol, and label types, respectively. The terminal Lvalue represents a standard C lvalue expression : that is, an expression that can occur on the left hand side of an assignment statement. The terminal CExpr represents a standard C expression. A clarity Stmt may appear anywhere a statement is allowed in standard C (e.g., in the bodies of loops and if-then-else statements). The new statement types are Send, CallStmt, and WaitFor. Send statements (both send or sendall) use an event identifier. There are three types of call statements: Fork, Nonblock, and Block. Fork and Nonblock calls can take an optional linearity annotation. Block and Nonblock calls can assign their return value to an optional Lvalue. Calls not specified as fork, nonblock, or block are understood to be blocking by default. The WaitFor statement uses an expression (a return value) and a (non-empty) list of WaitCond records (wait conditions). If the return type of the function in which the statement appears is void, the return value may be omitted. A WaitCond record is tagged using an optional wait label and uses an expression (the wait predi-
80
CHAPTER 5. INTRODUCING CLARITY
cate) and a (possibly empty) list of event identifiers (wait events) enclosed in square brackets. The label is used by the runtime to identify the wait condition that enabled execution.
5.4
Operational Semantics
We now present the full operational semantics for the new statements and expressions in clarity. Notation Let A be a set. We use 2A to denote the powerset of A and A∗ to denote the set of multisets (or bags) of elements from A. We use ∪ for set union, ⊎ for multiset sum, and {{a1 , . . . , an }} for a multiset of elements a1 , . . . , an . We elide braces from singleton sets and multisets, when the meaning is clear. Let Var, Expr, Stmt, Lab be sets of variables, expressions, statements, and wait labels, respectively, appearing in the program. Stmt includes compound statements, i.e., statements of the form S1 ; S2 where S1 , S2 ∈ Stmt. Let Locs be a set of locations, Vals ⊆ Expr a set of values, and Evts a set of events. Let false and true be elements of Vals such that false 6= true. Let M = Locs → Vals be a set of memory states, functions from locations to values. Let M : Expr → Vals be a function from expressions to values in memory state M ∈ M. We require that M (v) = v for all v ∈ Vals. Let C be a set of continuations. A continuation is either blk x.S (a blocking continuation) or nbl x.S (a non-blocking continuation), where x ∈ Var is the variable that will receive the return value, and S ∈ Stmt is the continuation block. Let K be a set of continuation stacks. A continuation stack is either • (the empty stack ) or k; K, where k is a continuation and K is a continuation stack. Let W = Lab × Expr × 2Evts be a set of wait conditions. Each hℓ, b, Ei ∈ W represents a wait condition with label ℓ, wait predicate b, and wait events E. Let B = 2W × Stmt × K × 2Evts be the set of blocked thread descriptors.
81
5.4. OPERATIONAL SEMANTICS
Each hW, S, K, Li ∈ B represents a thread that has blocked at a waitfor statement with wait conditions W , next statement S, continuation stack K, and local events L. Let R = Stmt × K × 2Evts be a set of running thread descriptors. Each hS, K, Li ∈ R represents a thread that is currently executing with next statement S, continuation stack K, and local events L. Let S = M × 2Evts × B∗ × R∗ be the set of system configurations. Each hM, E, Q, P i ∈ S represents a system configuration with state M, global events E, multiset of blocked thread descriptors Q (the blocked thread list), and multiset of running thread descriptors P (the active thread list). Semantic rules are of the form
C D
, representing the evolution of the sys-
tem from configuration C to configuration D. A configuration is a tuple hM, E, Q, P i representing a system state with memory state M, set of global events E, multi-set of blocked threads Q, and multiset of active threads P . A blocked thread is a tuple hb, E, S, Ki, representing a thread that has blocked at a waitfor statement with wait predicate b, wait events E, next statement S, and continuation stack K. An active thread is a tuple hS, Ki, representing a thread that is currently executing with next statement S and continuation stack K. To keep the semantics clean, we make several simplifying assumptions. First, since clarity statements require only trivial intraprocedural control flow, we assume that each statement is of the form S1 ; S2 , where S1 is a clarity statement and S2 is an arbitrary C statement. Second, we treat functions as if they have no arguments. Function arguments can be handled as assignments from actuals to formals; we assume that rules not shown have evaluated these assignments, leaving only the function invocation. Finally, we assume the existence of rules that are not shown which reduce the arguments to return, send, and waitfor from syntactic expressions to values, as necessary: we write return v, send e, and waitfor r b E, where v and r are arbitrary values, e is an event, b is a boolean expression, and E is a set of events (the pair (b, E) represents a single unlabeled wait condition). Rules for C language statements not given are as in ANSI/ISO C.
82
5.4.1
CHAPTER 5. INTRODUCING CLARITY
Semantic Rules
The semantics is nondeterministic — if a configuration matches the left-hand side of more than one semantic rule, the system may evolve according to any one of the matched rules. Semantic rules are evaluated atomically. Although more than one process may execute in parallel, the set of global events and the blocked and active thread lists will remain consistent. However, the memory state component of a configuration is shared between processes: race conditions can occur if processes access the same location without using a safe coordination scheme. Fork, Call and Return A fork call (Call-Fork) creates a new running thread descriptor and invokes the called function. Call-Fork
hM, E, Q, P ⊎ hfork f(); S, K, Lii hM, E, Q, P ⊎ {{hS, K, Li, hf(), •, ∅i}}i
A blocking call (Call-Blk) adds a blocking continuation (blk) to the stack. Call-Blk
hM, E, Q, P ⊎ hx = block f(); S, K, Lii hM, E, Q, P ⊎ hf(), (blk x.S); K, Lii
A nonblocking call (Call-Nbl) adds a nonblocking continuation (nbl) to the stack. Call-Nbl
hM, E, Q, P ⊎ hx = nonblock f(); S, K, Lii hM, E, Q, P ⊎ hf(), (nbl x.S); K, Lii
Once the stack has been updated, a called function f is expanded into the statement representing its body. Call
hM, E, Q, P ⊎ hf(), K, Lii
S is the body of f
hM, E, Q, P ⊎ hS, K, Lii
83
5.4. OPERATIONAL SEMANTICS
The behavior of return is the same for both synchronous (Return-Blk) and asynchronous (Return-Nbl) continuations on the stack. The return value is stores in the variable x and the continuation S is executed. Return-Blk
hM, E, Q, P ⊎ hreturn v, (blk x.S); K, Lii hM, E, Q, P ⊎ hx = v; S, K, Lii
Return-Nbl
hM, E, Q, P ⊎ hreturn v, (nbl x.S); K, Lii hM, E, Q, P ⊎ hx = v; S, K, Lii
If the continuation stack is empty, current operation halts, and a different operation may be scheduled from the queue. Return-Empty
hM, E, Q, P ⊎ hreturn v, •, Lii hM, E, Q, P i
Sending Events The statement send e results in the event e being added to the set of global events (Send). This event e will can be consumed by exactly one of the processes in the queue. On the other hand, the broadcast statement sendall e adds the event e to the set of local events of each process in the queue.
Send
hM, E, Q, P ⊎ hsend e; S, K, Lii hM, E ∪ {e}, Q, P ⊎ hS, K, Lii
SendAll
hM, E, Q, P ⊎ hsendall e; S, K, Lii hM, E, Q′ , P ′ ⊎ hS, K, L ∪ {e}ii
where hb1 , E1 , S1 , K1 , L1 ∪ {e}i ∈ Q′ ⇐⇒ hb1 , E1 , S1 , K1 , L1 i ∈ Q and hS2 , K2 , L2 ∪ {e}i ∈ P ′ ⇐⇒ hS2 , K2 , L2 i ∈ P WaitFor The waitfor statement does not block if the wait evaluation of the predicate is not false and the wait events are available. If the waitfor statement
84
CHAPTER 5. INTRODUCING CLARITY
blocks, the behavior differs depending on whether or not there is a nbl continuation on the stack. If all continuations on the stack are blk continuations, the next statement and the stack are added to the blocked process list—every function in the call stack is blocked until the wait condition is satisfied.
hM, E1 ∪ E2 , Q, P ⊎ Wsat i, WaitFor-Sat
M (b) 6= false
Wsat = hwaitfor r (W ∪ (ℓ, b, L1 ∪ E1 )); S, K, L1 ∪ L2 i hM, E2 , Q, P ⊎ hwaitevent = ℓ; S, K, L2 ii hM, E1 , Q, P ⊎ hwaitfor r W ; S, k1 ; . . . ; kn ; •, Lii ki = blk xi .Si′ , 1 ≤ i ≤ n
WaitFor-Blk
∀hℓ, b, Ei ∈ W : M(b) = false ∨ E 6⊆ E1 ∪ L hM, E1 , Q ⊎ hW, S, k1 ; . . . ; kk ; •, Li, P i
If there is a nbl continuation on the stack, the next statement and portion of the stack preceding the nbl continuation (the blocking prefix ) are added to the blocked process list, but the return value argument to waitfor is passed to the nbl continuation and the non-blocking caller remains active—control returns to the most recent non-blocking context. Note that the return type of all of the functions in the blocking prefix must match—this can be checked using a simple type analysis. hM, E1 , Q, P ⊎ hwaitfor r W ; S1 , k1 ; . . . ; kn ; (nbl x.S2 ); K, Lii ki = blk xi .Si′ , 1 ≤ i ≤ n WaitFor-Nbl
∀hℓ, b, Ei ∈ W : M (b) = false ∨ E 6⊆ E1 ∪ L hM, E1 , Q ⊎ hW, S1 , k1 ; . . . ; kn ; •, ∅i, P ⊎ hx = r; S2 , K, Lii
When the wait condition of a blocked thread descriptor is satisfied, the thread consumes its wait events and moves from blocked to running. hM, E1 ∪ E2 , Q ⊎ hW ∪ hℓ, b, L1 ∪ E1 i, S, K, L1 ∪ L2 i, P i Unblock
M (b) 6= false hM, E2 , Q, P ⊎ hwaitevent = ℓ; S, K, L2 ii
5.5. RELATED WORK
85
When an external (i.e., non-clarity) caller invokes a clarity function f, a new thread is created for f. If the external caller invokes a clarity function using a blocking call, and a blocking waitfor statement is encountered, the caller also blocks. But, if the external caller invokes a clarity function using a nonblocking call (for instance, consider the call to sendpacket from Figure 5.2), and a blocking waitfor statement is encountered, the external caller is allowed to continue with the first argument of waitfor as return value and the remainder of the operation is put in a queue and scheduled later. We assume that the thread scheduler is fair, i.e., that a blocked thread whose wait condition is infinitely often satisfied will eventually move to the active thread list (by application of Unblock) and that every active thread will eventually execute (by evaluation of its next statement). Note that this does not preclude threads blocking indefinitely: there is no guarantee that a wait condition will ever be satisfied (or, indeed, is satisfiable). It is up to the programmer to design the clarity program in such a way that deadlock is avoided and wait conditions are eventually satisfied. The use of coords and clarity’s static analysis can help avoid many concurrency errors.
5.5
Related Work
The merits of the event-driven programming style have been the subject of controversy for decades (e.g., [38; 39; 48; 56]). Recent work, e.g., the Capriccio project [57] and Adya et al [1], has focused on capturing the performance of the event-driven style in a more thread-like idiom. Li and Zdancewic have demonstrated how this approach can be incorporated into a language like Haskell [42]. Some of the techniques presented in the above papers (e.g., [56; 57]) could be used to optimize the clarity compiler and runtime. However, none of the above efforts address inter-operation coordination in a way that allows for simple compositional reasoning. The Message Passing Interface (MPI) [43] — a message-passing library interface specification — is a widely used event-driven API for parallel com-
86
CHAPTER 5. INTRODUCING CLARITY
puting. Siegel and Avrunin [51] describe techniques for model checking MPI programs which we believe could be applicable to verifying coord implementations. Strout et al. [54] formulate a data-flow analysis framework for MPI programs. Our emphasis here is on simplifying the analysis problem for event-driven code. Lee [41] discusses the difficulties of writing correct concurrent software using the threaded model and calls for the use of design patterns for concurrent computation (cf. [40; 49]). We believe coords are exactly these kinds of design patterns. To our knowledge, patterns like gate and chute have not previously been described in the literature. Further, clarity allows programmers to write their own coords, which allows the development of customized coordination schemes. For example, our tinynetapi driver (see Section 6.3) uses a gchute coord which combines elements from both the gate and chute coords. The language primitives of clarity used for sending and waiting for events are derived from Hoare and Brinch Hansen’s monitors [29; 33] and from process calculi such as CCS [44], CSP [34], and the π-calculus [45]. The distinctive feature of clarity is the compositional analysis enabled by protocol specifications on coords and the linear hand-off at asynchronous calls. Coord protocols are similar to De Alfaro and Henzinger’s interface automata [15], but are restricted to describing only the input constraints of a single component. Halbwachs et al. [27], Erlingsson and Schneider [20], and Sekar et al. [50] describe protocol enforcement through runtime monitoring. Coord protocols are intended to provide purely static checking. Our compilation strategy relies in part on a transformation to continuationpassing style (CPS) [3], which requires collecting the execution environment of the current function in a form that can be stored and resumed. Influential work on this problem includes Scheme’s closures [55], Algol’s thunks [35], and Hewitt’s actors [32]. Several systems provide light-weight threads or “fibers”, which allow programmers to create, save and resume closures efficiently [1; 56; 57]. These mechanisms can been used to mimic the behavior
5.6. DISCUSSION
87
of clarity’s blocking primitives, but they do not constitute a full solution to the difficulties of asynchronous programming—because of the lack of high-level coordination patterns like coords, state still needs to be managed manually and kept on the heap, limiting the analyzability of the code. Demsky [16] and Fischer et al. [22] describe CPS transformations from threaded to event-driven code similar to the one implemented in the clarity compiler. Demsky’s transformation creates a new continuation at every blocking I/O call and relies on the scheduler to invoke the continuation when the I/O is complete. Fischer et al. introduce a wait primitive similar to our waitfor and require potentially blocking functions to have a special type modifier. Our key contribution is to move consideration of blocking vs. non-blocking call behavior to the caller and leverage the simpler sequential semantics of the source program to perform precise program analysis. Simpler programming models for concurrency have been tried before in specialized domains. In the hardware domain, synchronous programming languages like Esterel [8] enforce deterministic concurrency by design and statically schedule the concurrent operations. For cache coherence protocols, Teapot presents a domain specific high-level language that can be both analyzed using model checking and compiled to an implementation [10]. Languages like Cilk [9] and MultiLisp [28] include parallel execution primitives similar to fork, but have focused primarily on efficient multiprocessor implementations rather than analyzability.
5.6
Discussion
In this chapter we have presented clarity, a language that allows development of asynchronous components as sequential clarity code. clarity introduces three new language features: non-blocking calls, coords with protocol specifications, and linearity annotations to delegate protocol obligations. In the next chapter we present how these new features help in efficient verification of safety properties.
88
CHAPTER 5. INTRODUCING CLARITY
Chapter 6 Analyzing CLARITY In this chapter we describe the analysis of clarity programs, and our prototype implementation of a clarity runtime environment and compiler. We present static analysis of clarity programs in Section 6.1, and concurrency analysis in Section 6.2. We then present our implementation of a clarity runtime and compiler, and our verification results in Section 6.3.
6.1
Static Analysis
The primary goal of clarity’s static analysis is to check if coords are implemented and used correctly. We want to check that assertions in the implementation of the coord never fail during execution and that no deadlocks can occur due to the use of coords (i.e., no thread waits for an event that is never sent). One way to verify this is to run a model checker on all of the threads together with the coord implementation and explore the states that arise from all possible interleavings. This approach scales poorly. We exploit the protocol specifications of coords to do compositional analysis: (1) Using sequential analysis (ignoring concurrency), we use the slam tool [5] to check that each thread of execution uses coords according to each coord’s protocol; (2) Assuming that each thread obeys the coord’s protocol, we use the zing model checker [2] to check that the implementation of the coord is correct. 89
90
CHAPTER 6. ANALYZING CLARITY
6.1.1
Sequential analysis
Coord protocol declarations are slic safety properties. For example, we might want to verify that every thread that calls enter() on the gate coord also calls exit(). Recall that we require each coordination protocol in progress to be handed off to exactly one of the two threads at each fork call site. This enables the static analysis to transform a clarity program with annotations at the fork calls into a nondeterministic sequential program. The transformation merely picks one of the two continuations at each parallel call, depending on which protocol is currently being analyzed.

This transformation assumes that linearity annotations are consistent with the code. We assume that the programmer does not continue to use a coord after a hand-off to another thread, either explicitly or through an alias. For example, the following code illustrates such an inconsistency.

1: read(FILE *fp, int n, chute *c) {
2:   for(i = 0; i < n; i++){
3:     int token = c->Enter();
4:     chute *c2 = c;
5:     fork read_and_send_block(fp, i, token, c) @c;
6:     waitfor(c->isMyTurn(token), [c->e]);
7:     waitfor(c2->isMyTurn(token), [c2->e]);
8:   }
9: }
The read method creates an alias c2 for the chute coord c before handing off c to the forked thread in line 5. It then attempts to use the same chute in lines 6 and 7 through c and c2, violating the linearity annotation and possibly the chute protocol as well. We can use existing techniques [21; 52] to enforce linearity.

Typestate [53] is an elegant framework for specifying a class of temporal safety properties. Typestates can encode correct usage rules for many common libraries and application programming interfaces (APIs). For example, typestate can express the property that a gate should be entered before it can be exited.
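As an illustration, the gate rule can be encoded as a small finite-state machine. The following C sketch (the enum, function, and operation names are ours, purely for exposition; this is not slam’s or slic’s actual representation) tracks the typestate of one gate instance:

#include <string.h>

/* Typestate for a single gate instance: enter() is legal only when
   outside the gate, exit() only after a matching enter(). */
typedef enum { GATE_IDLE, GATE_ENTERED, GATE_ERROR } gate_state;

gate_state gate_step(gate_state s, const char *op) {
    if (s == GATE_IDLE && strcmp(op, "enter") == 0)
        return GATE_ENTERED;   /* entering from outside: legal */
    if (s == GATE_ENTERED && strcmp(op, "exit") == 0)
        return GATE_IDLE;      /* exiting after entering: legal */
    return GATE_ERROR;         /* any other order violates the rule */
}

A typestate analyzer reports a violation on any path along which GATE_ERROR becomes reachable.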
In sequential typestate analyzers such as slam, a typestate property is checked independently on every statically identifiable distinct instance of the given type. There is an internal variable (say curfsm) that holds the current (typestate property) instance being checked, which is equal to NULL until an instance is detected, e.g., at a variable declaration. We transform a clarity program P to a sequential program C(P) such that we can analyze C(P) instead of P for conformance to the protocol specification ϕ. The transformation syntactically translates every fork call fork foo(args) @c1, ..., cn, which hands off coords c1, ..., cn to foo, to the program segment shown below.

if(*) {
    assume( curfsm == NULL ∨ [⋁_{1≤i≤n} curfsm == ci] );
    foo(args);
    ThreadDone();
    assume(false);
} else {
    assume( curfsm == NULL ∨ [⋀_{1≤i≤n} curfsm != ci] );
}
We use if(*) to represent a nondeterministic choice. In the if branch, the assume statement allows the analysis to proceed only if the current property being checked is NULL or one of the handed-off coords. The call to foo in the transformed program is a regular sequential call and not a fork call. After the call returns, the statement assume(false) forces the analysis to stop. In the else branch, the assume statement allows the analysis to proceed only if the current property being checked is not one of the handed-off coords.

In addition to coords, protocols can be stated on other objects as well. For example, we might want to check the completion property for each packet p that is passed to SendPacket in the network driver shown in Figure 5.2. We can check this property too using a sequential analysis, as long as we follow the programming discipline that at each fork only one of the continuations is responsible for completing the protocol, and use linearity annotations to guide the analysis.
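To make the transformation concrete, the following C sketch (the helpers nondet, assume, and ThreadDone, and the variable c1, are illustrative names, not part of the clarity toolchain) shows the expansion of a single call fork foo(args) @c1 that hands off one coord:

extern int  nondet(void);        /* analyzer's nondeterministic choice */
extern void assume(int cond);    /* prune paths where cond is false    */
extern void ThreadDone(void);    /* marks the end of the forked thread */
extern void *curfsm, *c1;        /* tracked instance; handed-off coord */
extern void foo(void *args);

void fork_foo_transformed(void *args) {
    if (nondet()) {
        /* follow the forked continuation: legal only if the tracked
           instance is NULL or was handed off to foo */
        assume(curfsm == NULL || curfsm == c1);
        foo(args);               /* ordinary sequential call, not a fork */
        ThreadDone();
        assume(0);               /* stop exploring this path */
    } else {
        /* follow the parent continuation: the tracked instance must
           not be one of the handed-off coords */
        assume(curfsm == NULL || curfsm != c1);
    }
}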
6.2 Concurrency analysis
The objective of the concurrency analysis is to check the implementation of the coords against the coord protocol. We assume that each thread obeys the protocol specified by the coord and use the model checker zing, which can handle concurrency, to check whether the implementation of the coord works correctly under these assumptions. We automatically convert the protocol specification of the coord into a nondeterministic thread that exercises the coord implementation in the ways allowed by the protocol. Then, we launch a bounded number of these threads (based on usage patterns of the coord) in parallel and check the implementation for errors (assertion violations and deadlocks) using zing. The checks we describe here prove that the implementation of the coord is correct only for a bounded number of threads. A more general proof is possible, e.g., using parameterized verification [4].
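For example, a client thread derived from the gate protocol might look like the following clarity-style sketch (the loop shape and the nondet() choice are our illustration of the idea; the generated threads may differ in detail):

/* Nondeterministic client derived from the gate protocol: every
   enter() is matched by an exit() before the thread finishes. */
void gate_client(gate *g) {
    while (nondet()) {    /* model checker chooses how many rounds */
        g->enter();       /* protocol: enter before exit           */
        /* ... arbitrary non-coord work may happen here ...        */
        g->exit();        /* protocol obligation discharged        */
    }
}

Launching several such clients in parallel with the coord implementation lets zing explore exactly the interactions the protocol permits.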
6.2.1 Guarantees and limitations
Our analysis offers the following guarantee. For any clarity program P with one coord c, and coord protocol ϕ, if each of the threads in the transformed program C(P) satisfies ϕ under the sequential analysis and the coord implementation passes the concurrency analysis, then during execution of P it is guaranteed that (1) there will be no assertion violations in the implementation of c and (2) if a thread in P waits for an event e associated with the coord c, then some thread is guaranteed to send e before exiting.

We support this claim with the following argument. Suppose both the sequential analysis and the concurrency analysis pass, and still the program P either fails an assertion inside coord c, or deadlocks on an event in c. Consider the run r that leads to the assertion failure or deadlock. Suppose there is some
thread that violates the coord protocol in r. This contradicts the assumption that the sequential analysis has certified all threads as individually obeying the coord protocol. Thus, all threads must obey the coord protocol for c in r. Now consider the calls made to the coord by all the threads in r. Since every thread satisfies the protocol, such a test should have been exercised by the concurrency analysis (assuming a large enough concurrency bound), contradicting the assumption that the concurrency analysis passes.

Our static analysis has two main limitations. The first is that it can detect deadlocks only in programs that use coords for synchronization, and then only for coords used independently. If the programmer uses low-level synchronization primitives or multiple coords in the same block of code, the order in which each thread performs blocking waitfor operations can result in deadlocks that we will not detect. The second limitation is that we check only safety properties. Thus, if a thread t1 is waiting for an event through a coord and thread t2 is obliged to send the event, we can say only that along all code paths, before t2 exits, the event is indeed sent. We cannot guarantee that t2 exits, and thus we cannot guarantee that the event will be sent.
6.3 Implementation
We wish to demonstrate the viability of our approach in building asynchronous system components with realistic levels of complexity using clarity. Towards this goal, we have implemented prototype clarity development tools and a clarity driver for a simple network card, which we have tested in an emulated environment.

Compiler and runtime. The clarity compiler transforms a clarity source program into C target code. The send, sendall, and fork primitives are implemented as calls into a clarity coordination library. However, translation of the waitfor primitive requires more extensive compiler support: if a thread blocks, the clarity runtime must be able to restart the thread at a later time, perhaps in the context of a different physical thread,
with all of its local state preserved. The compilation uses continuation passing style (CPS) transformations.

Waiting functions. We will use the following definitions: a waiting function is any function that contains a waitfor statement or (transitively) calls a function that contains a waitfor statement; a waiting call is any call to a waiting function. We will treat waitfor itself as a waiting function that returns void. At a waiting call site, it is possible for execution to be suspended and resumed later in a different context. Therefore, at each waiting call site, the clarity runtime environment must collect enough information about the current context to resume execution: in particular, we need to preserve the values of function-local variables and the next statement to execute when the waiting call returns. These values are normally stored in an activation record on the call stack; since execution may resume in another physical thread, we cannot assume that the call stack will remain available unmodified.

Moving local variables to the heap. In event-driven programs, state is commonly managed using heap-allocated control blocks. These blocks are normally pre-allocated, fixed-size structures containing all of the information needed to resume execution at a later time, and the programmer must design and manage these structures himself. clarity provides the programmer similar functionality in an automated fashion. The compiler transforms each waiting function to declare a structure called locals containing its local variables. All references to local variables are transformed to refer to the locals structure; e.g., the assignment “p = &x” (where p and x are local variables) is transformed to “locals->p = &(locals->x)”. Each waiting call is augmented with a locals parameter; the structures are chained in a list, creating a shadow image of the call stack. When a thread blocks, a pointer to the current locals structure is saved on the wait queue. (Note that the locals structure does not contain the return address of the waiting call, because the address is not explicitly available at the C source code level.)
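The following C sketch illustrates the locals transformation on a hypothetical waiting function f (the struct layout and all names are ours; the compiler’s actual output differs in detail):

/* Conceptual clarity source:
     void f(int n) { int i; int *p; p = &i; ... waitfor(...); ... } */

/* Transformed C target: f's local state lives on the heap. */
struct f_locals {
    struct f_locals *caller;  /* chains locals structures into a
                                 shadow image of the call stack    */
    int  n;                   /* parameter, hoisted to the heap    */
    int  i;                   /* local variable, hoisted           */
    int *p;                   /* local pointer, hoisted            */
};

void f_start(struct f_locals *locals) {
    locals->p = &(locals->i); /* "p = &i" rewritten as described   */
    /* at a waiting call, `locals` is passed along; if the thread
       blocks, a pointer to it is saved on the wait queue          */
}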
Continuation-passing transformation. The clarity compiler augments each waiting call with a continuation parameter, representing the next statement to be executed when control returns from the call. The continuation argument from the caller becomes part of the local environment for the waiting function. Since C does not directly support continuations, we modify the procedural structure of the source program by splitting a waiting function at each waiting call. For example, a function f with a waitfor call, “f() { A; waitfor(...); B }”, will be translated into two functions f0 and f1 (we elide the locals structure argument, which is necessary to maintain the function-local state):

f0() { A; clarity__waitfor(..., f1); }
f1() { B }

Here clarity__waitfor is the implementation in the clarity runtime environment that accepts the continuation function as one of its arguments. In addition, any return statement in a waiting function must be transformed to instead invoke a continuation on the caller’s locals structure. This transformation is applied recursively to each waiting call in a function. The precise details of the transformation, including the handling of local and inter-procedural control flow, are standard (see, e.g., Appel [3]) and we omit them.

Taken together, the local variable and continuation-passing transformations automate the translation from a threaded execution model to event-driven code: they allow a blocked thread to be resumed at any time, from any calling context, by simply invoking the thread’s continuation on the thread’s locals structure.
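Combining the two transformations, a fuller C sketch of the split (with an assumed signature for the runtime’s clarity__waitfor, which the text does not spell out) looks like this:

typedef void (*cont_t)(void *locals);

/* Assumed runtime entry point: records the continuation k and the
   locals pointer; when the awaited event arrives, k(locals) runs,
   possibly on a different physical thread. */
extern void clarity__waitfor(/* condition, events, */
                             cont_t k, void *locals);

void f1(void *locals) {
    /* B: the code after the waitfor, with all state in `locals` */
}

void f0(void *locals) {
    /* A: the code before the waitfor */
    clarity__waitfor(f1, locals);  /* suspend; f1 resumes later */
    /* control returns to the scheduler here */
}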
Device driver implementation. We have written a network device driver in clarity for an emulated device we call tinynic, comprising about 1,300 lines of code. The target C code produced by the clarity compiler is about 2,500 lines.

tinynic is closely modeled after hardware such as the Intel E100 network
card. We have preserved many of the sources of concurrency and asynchrony, as well as some defining features and idiosyncrasies of network hardware, such as maskable interrupts, memory-mapped device registers, and reads and writes via shared memory buffers. We have eliminated most other features that are irrelevant with respect to concurrent and asynchronous behavior (e.g., tinynic does not support multicast address filters). We have a software implementation of the tinynic hardware specification that supports concurrent behavior.

Static analysis. We were able to establish properties of the tinynic driver by transforming it as described in Section 6.1.1 and running a sequential analysis on the transformed program. Our tinynic driver uses a gchute coord that combines the properties of both the gate and the chute: as in Figure 5.2, we support the asynchronous rundown of sending packets, but we also use the chute protocol to ensure that packets are transmitted in the same order in which they were submitted. All packets call Enter and Exit on the gchute. The pause code closes the gate and waits for all pending sends to complete. The sendpacket code uses waitfor to wait until the hardware becomes available and uses the gchute to enforce packet ordering.

The results for two properties are shown in Table 6.1. The first property is the protocol for the gchute, which is a mixture of both the gate and chute protocols. slam was able to check the property on the transformed clarity program in 17.77 seconds, after 9 iterations of iterative refinement, introducing 25 predicates. The second property, “packet completion”, states that for every packet passed to SendPacket, the code gets the size of the packet, transmits at least one fragment of the packet, and calls CompletePacket. (Note that this is a weak notion of correctness for SendPacket, in that we do not require every fragment of the packet to be transmitted, nor that the data transmitted match the packet data.) slam was able to check this property on the transformed clarity program in 6.82 seconds, after 2 iterations of
iterative refinement, introducing 5 predicates.

clarity driver
Property             Time(s)   Iters   Preds   Result
gchute protocol      17.77     9       25      PASS
packet completion    6.82      2       5       PASS

C driver
Property             Time(s)   Iters   Preds   Result
gchute protocol      *         *       *       *
packet completion    *         *       *       *

Table 6.1: Sequential checking results
For both properties, slam could not finish checking these directly on a hand-coded event-driven C driver. The C driver puts packets that cannot complete immediately into a queue, implemented as a linked heap structure, making analysis difficult. However, the clarity code for the tinynic driver does not use any queues (though the clarity target code and runtime do). It simply keeps the packet as a local variable in the logical thread and uses waitfor to block in case the packet cannot be processed. Thus, using an interprocedural analysis (and without reasoning about heap structures), slam is able to prove the two properties on the clarity code, but it is unable to prove them directly on the event-driven C code.

Concurrency analysis. We were able to verify the implementations of gate, chute and gchute for a small number of threads, as shown in Table 6.2, using the model checker zing. The coord protocols were used to automatically derive a nondeterministic thread that uses the coord. In our gate implementation (code not shown), the model checker found the following bug: if the gate is closed (by calling Close) when there are no pending client threads that have entered but not exited, then the subsequent call to waitfor(IsEmpty(),[e]) deadlocks, since there is no client thread to send the event e. We were able to fix this bug and verify the modified implementation.
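The bug pattern, as we reconstruct it (the gate’s fields and the guard below are our own illustration; the thesis does not show the gate implementation), is that Close waits for an event that only an exiting client sends:

/* Hypothetical gate rundown code exhibiting the reported deadlock. */
void Close(gate *g) {
    g->closed = true;
    if (g->inside == 0)
        return;                 /* the fix: nothing to wait for        */
    waitfor(IsEmpty(), [e]);    /* only an exiting client sends e, so
                                   without the guard above this blocks
                                   forever when the gate is empty      */
}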
coord    Result   num threads   States explored   Time(s)
gate     PASS     3             9133              1
gate     PASS     5             1165393           74
chute    PASS     3             775               0.4
chute    PASS     5             26431             2
chute    PASS     7             1241923           103
gchute   PASS     3             11458             1
gchute   PASS     5             1827952           119

Table 6.2: Concurrency checking results
Table 6.2 shows the number of states explored by the model checker and the time taken in each case. For the gate and gchute, the numbers in the table were obtained after fixing the bug mentioned above.

Runtime testing. We have built a virtual test environment to provide thorough runtime testing of code generated by the compiler and of the clarity runtime. Our test environment consists of a virtual network hardware implementation, tinynic, and a runtime execution environment, tinynetapi, that serves as a host for the device driver. The environment is implemented in over 10,000 lines of C and C++ code; a block diagram can be found in Figure 6.1.

tinynetapi, the execution environment, implements a subset of the kernel-mode network driver interface in our target operating system. As with tinynic, most sources of concurrency are preserved: concurrent and asynchronous sending and receiving of packets; interrupt and tasklet handling; support for pausing, halting and unloading the driver; and so on. In addition, numerous dynamic checks have been put in place to validate proper driver behavior. For example, uses of spinlocks and memory allocations are individually tracked. The environment includes utilities that make it easy to write simple test programs which can, for example, submit concurrent streams of packets and pause the driver midstream.

The clarity-generated driver processes 15,000 packets per second on a 2GHz single-processor Pentium machine.
[Figure 6.1: clarity simulation environment. Block diagram with components: Test Code, TINYNETAPI Interface, TINYNIC Driver, Miniport Instance, CLARITY Runtime, Simulated NETAPI Runtime, Emulated Hardware Instances, Simulated TINYNIC Hardware.]

The driver passes the following tests:

1. Ability to initialize and shut down, including appropriately initializing and resetting the hardware.
2. Ability to handle concurrent sends, pausing the driver midstream.
3. Ability to handle concurrent sends and receives.

We have kept track of the bug fixes we have needed to make in order to pass all tests. It is encouraging to note that none of the errors have been issues of concurrency or asynchrony, but rather logical errors with respect to the hardware specification. We have, of course, had to fix concurrency-related errors in the clarity runtime environment while it was under development, but none in the clarity driver code; we appear to have made some progress toward our goal of simplifying driver development in the areas of concurrency and asynchrony.
6.4 Discussion
In this chapter we have described static analysis and concurrency analysis of clarity programs and coords. We have also presented our prototype implementation of clarity along with our verification results.
Chapter 7

Epilogue

In this chapter we summarize the main contributions of this thesis and point to areas of future work.
7.1 Summary
In this thesis, we have introduced ccs, a new formal model for specifying asynchronous concurrent systems, and clarity, a new programming language that enables analyzable design of asynchronous components.

ccs is a formal model that combines the rich visual notation of HMSCs with the expressive power of MPAs. ccs introduces a mechanism to overlap phases of communication that allows complex interactions to be specified while helping one maintain the logical structure of the constituent communication scenarios. We have also shown that verification of ccs models can be automated using the model checker Uppaal, provided that we impose bounds on resources like channels and transactions.

clarity is a programming language that extends C and introduces three novel language features: non-blocking function calls, coords and linearity annotations. clarity enables the programmer to write asynchronous code in a sequential manner, producing code that is more amenable to static analysis. The use of coords helps in verifying concurrency properties of clarity programs, and proper use of linearity annotations helps in concurrency analysis and in avoiding concurrency-related bugs.
7.2 Future Work
Though developed for different purposes, ccs and clarity share the same underlying philosophy: use a framework with few, but powerful, features to enhance the analyzability of the models or software being developed. A natural way to proceed is to develop a framework that bridges the gap between ccs and clarity. Such a framework would give us a complete solution from design to implementation of asynchronous concurrent systems. This integrated framework could then serve as a basis for fully automated (or user-guided) synthesis of software from the formal specification. We now present a few interesting areas for future work that would help harness the full power of ccs and clarity.

1. Add timing to ccs to model timed asynchronous concurrent systems. Adding timing constraints to MSGs quickly leads to undecidability, even for problems known to be decidable in the untimed case [23]. For ccs, constructs such as await and async pose additional challenges, starting with defining their semantics in the presence of timing constraints. On the other hand, since timing constraints naturally occur in “real” specifications, it is important to identify a reasonable way to integrate them into the notation while preserving the possibility of formal analysis.

2. Integrate clarity constructs into an operating system kernel and core C libraries, and the clarity compiler into a C compiler. By integrating the clarity compiler into a C compiler (like gcc), one can reduce overheads such as storing local variables on the heap. Integrating the clarity runtime into the C libraries and the operating system kernel would enhance performance by eliminating overheads, such as maintaining queues, inside the clarity runtime.
Also, an operating system scheduler will, for example, know better when to schedule queued-up clarity tasks.

3. Bridge ccs and clarity to develop an integrated framework for code synthesis. It would be ideal if a model in ccs could be automatically (or with user guidance) synthesized into a software implementation. In this context, given that clarity removes the hassles of asynchronous programming and allows one to write asynchronous code sequentially, it is only natural to look at synthesizing ccs models into clarity programs.
Publications

• P. Chandrasekaran and M. Mukund. Matching scenarios with timing constraints. In FORMATS ’06: Proceedings of the 4th International Conference on Formal Modelling and Analysis of Timed Systems, volume 4202 of LNCS, pages 98–112. Springer, 2006.

• P. Chandrasekaran, C. L. Conway, J. M. Joy, and S. K. Rajamani. Programming asynchronous layers with clarity. In ESEC-FSE ’07: Proceedings of the 6th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, pages 65–74. ACM, 2007.

• P. Chandrasekaran and M. Mukund. Specifying interacting components with coordinated concurrent scenarios. In SEFM ’09: Proceedings of the 7th IEEE International Conference on Software Engineering and Formal Methods, 2009. (to appear)
Bibliography

[1] A. Adya, J. Howell, M. Theimer, W. J. Bolosky, and J. R. Douceur. Cooperative task management without manual stack management. In ATEC ’02: Proceedings of the General Track of the USENIX Annual Technical Conference, pages 289–302. USENIX Association, 2002.

[2] T. Andrews, S. Qadeer, S. Rajamani, J. Rehof, and Y. Xie. Zing: Exploiting program structure for model checking concurrent software. In CONCUR ’04: Proceedings of the 15th International Conference on Concurrency Theory, pages 1–15. Springer, 2004.

[3] A. W. Appel. Compiling with Continuations. Cambridge University Press, 1992.

[4] T. Arons, A. Pnueli, S. Ruah, J. Xu, and L. D. Zuck. Parameterized verification with automatically computed inductive assertions. In CAV ’01: Proceedings of the 13th International Conference on Computer Aided Verification, pages 221–234. Springer-Verlag, 2001.

[5] T. Ball and S. K. Rajamani. Automatically validating temporal safety properties of interfaces. In SPIN ’01: Proceedings of the 8th International SPIN Workshop on Model Checking of Software, pages 103–122. Springer-Verlag, 2001.

[6] T. Ball and S. K. Rajamani. SLIC: A specification language for interface checking of C. Technical Report MSR-TR-2001-21, Microsoft Research, Jan. 2001.
[7] G. Behrmann, A. David, and K. G. Larsen. A tutorial on Uppaal. In Formal Methods for the Design of Real-Time Systems, International School on Formal Methods for the Design of Computer, Communication and Software Systems (SFM-RT 2004), pages 200–236, 2004.

[8] G. Berry and G. Gonthier. The Esterel synchronous programming language: Design, semantics, implementation. Science of Computer Programming, 19(2):87–152, 1992.

[9] R. D. Blumofe, C. F. Joerg, B. C. Kuszmaul, C. E. Leiserson, K. H. Randall, and Y. Zhou. Cilk: An efficient multithreaded runtime system. ACM SIGPLAN Notices, 30(8):207–216, 1995.

[10] S. Chandra, B. Richards, and J. R. Larus. Teapot: A domain-specific language for writing cache coherence protocols. IEEE Transactions on Software Engineering, 25(3):317–333, 1999.

[11] P. Chandrasekaran, C. L. Conway, J. M. Joy, and S. K. Rajamani. Programming asynchronous layers with clarity. In ESEC-FSE ’07: Proceedings of the 6th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, pages 65–74. ACM, 2007.

[12] P. Chandrasekaran and M. Mukund. Matching scenarios with timing constraints. In FORMATS ’06: Proceedings of the 4th International Conference on Formal Modelling and Analysis of Timed Systems, volume 4202 of LNCS, pages 98–112. Springer, 2006.

[13] P. Chandrasekaran and M. Mukund. Specifying interacting components with coordinated concurrent scenarios. In SEFM ’09: Proceedings of the 7th IEEE International Conference on Software Engineering and Formal Methods, 2009. (to appear)

[14] M. Das, S. Lerner, and M. Seigle. ESP: Path-sensitive program verification in polynomial time. ACM SIGPLAN Notices, 37(5):57–68, 2002.
[15] L. de Alfaro and T. A. Henzinger. Interface automata. In ESEC/FSE-9: Proceedings of the 8th European Software Engineering Conference held jointly with the 9th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pages 109–120. ACM, 2001.

[16] B. C. Demsky. An empirical study of technologies to implement servers in Java. Master’s thesis, Massachusetts Institute of Technology, 2001.

[17] V. Diekert. The Book of Traces. World Scientific Publishing Co., Inc., 1995.

[18] D. D’Souza and M. Mukund. Checking consistency of SDL + MSC specifications. In SPIN ’03: Proceedings of the 10th International SPIN Workshop, pages 151–165. Springer, 2003.

[19] D. Engler, B. Chelf, A. Chou, and S. Hallem. Checking system rules using system-specific, programmer-written compiler extensions. In OSDI ’00: Proceedings of the 4th Conference on Symposium on Operating System Design & Implementation, pages 1–1. USENIX Association, 2000.

[20] U. Erlingsson and F. B. Schneider. SASI enforcement of security policies: A retrospective. In NSPW ’99: Proceedings of the 1999 Workshop on New Security Paradigms, pages 87–95. ACM, 2000.

[21] M. Fahndrich and R. DeLine. Adoption and focus: Practical linear types for imperative programming. In PLDI ’02: Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation, pages 13–24. ACM, 2002.

[22] J. Fischer, R. Majumdar, and T. Millstein. Tasks: Language support for event-driven programming. In PEPM ’07: Proceedings of the 2007 ACM SIGPLAN Symposium on Partial Evaluation and Semantics-based Program Manipulation, pages 134–143. ACM, 2007.
[23] P. Gastin, M. Mukund, and K. Narayan Kumar. Reachability and boundedness in time-constrained MSG graphs. In K. Lodaya, M. Mukund, and R. Ramanujam, editors, Perspectives in Concurrency, pages 157–183. Universities Press, 2008.

[24] T. Gazagnaire, B. Genest, L. Hélouët, P. S. Thiagarajan, and S. Yang. Causal message sequence charts. Theoretical Computer Science, 410(41):4094–4110, 2009.

[25] B. Genest, D. Kuske, and A. Muscholl. A Kleene theorem for a class of communicating automata with effective algorithms. In DLT ’04: Proceedings of the 8th International Conference on Developments in Language Theory, volume 3340 of LNCS, pages 30–48. Springer-Verlag, 2004.

[26] E. L. Gunter, A. Muscholl, and D. Peled. Compositional message sequence charts. International Journal on Software Tools for Technology Transfer, 5(1):78–89, 2003.

[27] N. Halbwachs, F. Lagnier, and P. Raymond. Synchronous observers and the verification of reactive systems. In AMAST ’93: Proceedings of the Third International Conference on Methodology and Software Technology, pages 83–96. Springer-Verlag, 1994.

[28] R. H. Halstead, Jr. Multilisp: A language for concurrent symbolic computation. ACM Transactions on Programming Languages and Systems, 7(4):501–538, 1985.

[29] P. B. Hansen. The programming language Concurrent Pascal. In P. B. Hansen, editor, The Origin of Concurrent Programming: From Semaphores to Remote Procedure Calls, pages 297–318. Springer-Verlag New York, Inc., 2002.

[30] J. Henriksen, M. Mukund, K. Narayan Kumar, M. Sohoni, and P. Thiagarajan. A theory of regular MSC languages. Information and Computation, 202(1):1–38, 2005.
[31] T. A. Henzinger, R. Jhala, R. Majumdar, and G. Sutre. Lazy abstraction. In POPL ’02: Proceedings of the 29th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 58–70. ACM, 2002.

[32] C. Hewitt and B. Smith. A PLASMA Primer. MIT Artificial Intelligence Laboratory, Oct. 1975. Draft.

[33] C. A. R. Hoare. Monitors: An operating system structuring concept. Communications of the ACM, 17(10):549–557, 1974.

[34] C. A. R. Hoare. Communicating sequential processes. Communications of the ACM, 21(8):666–677, 1978.

[35] P. Z. Ingerman. Thunks: A way of compiling procedure statements with some comments on procedure declarations. Communications of the ACM, 4(1):55–58, 1961.

[36] ISO Standard - Programming Languages - C, Dec. 1999. ISO/IEC 9899:1999.

[37] L. Lamport. A new solution of Dijkstra’s concurrent programming problem. Communications of the ACM, 17(8):453–455, 1974.

[38] J. R. Larus and M. Parkes. Using cohort-scheduling to enhance server performance. In ATEC ’02: Proceedings of the General Track of the USENIX Annual Technical Conference, pages 103–114. USENIX Association, 2002.

[39] H. C. Lauer and R. M. Needham. On the duality of operating system structures. SIGOPS Operating Systems Review, 13(2):3–19, 1979.

[40] D. Lea. Concurrent Programming in Java. Addison-Wesley, second edition, 2000.

[41] E. A. Lee. The problem with threads. Computer, 39(5):33–42, 2006.
[42] P. Li and S. Zdancewic. Combining events and threads for scalable network services: Implementation and evaluation of monadic, application-level concurrency primitives. In PLDI ’07: Proceedings of the 2007 ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 189–199. ACM, 2007.

[43] Message Passing Interface Forum. MPI: A message-passing interface standard. http://www.mpi-forum.org.

[44] R. Milner. A Calculus of Communicating Systems. Springer-Verlag New York, Inc., 1982.

[45] R. Milner, J. Parrow, and D. Walker. A calculus of mobile processes, Parts 1–2. Information and Computation, 100(1):1–77, 1992.

[46] M. Mukund, K. Narayan Kumar, and P. Thiagarajan. Netcharts: Bridging the gap between HMSCs and executable specifications. In CONCUR ’03: Proceedings of the 14th International Conference on Concurrency Theory, volume 2761 of LNCS, pages 293–307. Springer, 2003.

[47] A. Muscholl, D. Peled, and Z. Su. Deciding properties for message sequence charts. In FoSSaCS ’98: Proceedings of the 1st International Conference on Foundations of Software Science and Computation Structures, pages 226–242. Springer-Verlag, 1998.

[48] J. Ousterhout. Why threads are a bad idea (for most purposes). Presentation given at the 1996 USENIX Annual Technical Conference, 1996.

[49] D. C. Schmidt, M. Stal, H. Rohnert, and F. Buschmann. Pattern-Oriented Software Architecture: Patterns for Concurrent and Networked Objects. Addison-Wesley, 2000.

[50] R. Sekar, M. Bendre, D. Dhurjati, and P. Bollineni. A fast automaton-based method for detecting anomalous program behaviors. In SP ’01: Proceedings of the 2001 IEEE Symposium on Security and Privacy, pages 144–144. IEEE Computer Society, 2001.

[51] S. F. Siegel and G. S. Avrunin. Verification of MPI-based software for scientific computation. In S. Graf and L. Mounier, editors, SPIN ’04: Proceedings of the 8th International SPIN Workshop on Model Checking of Software, volume 2989 of LNCS, pages 286–303. Springer-Verlag, 2004.

[52] F. Smith, D. Walker, and J. G. Morrisett. Alias types. In G. Smolka, editor, ESOP ’00: Proceedings of Programming Languages and Systems, 9th European Symposium on Programming, Held as Part of the European Joint Conferences on the Theory and Practice of Software, volume 1782 of LNCS, pages 366–381. Springer, 2000.

[53] R. E. Strom and S. Yemini. Typestate: A programming language concept for enhancing software reliability. IEEE Transactions on Software Engineering, 12(1):157–171, 1986.

[54] M. M. Strout, B. Kreaseck, and P. D. Hovland. Data-flow analysis for MPI programs. In ICPP ’06: Proceedings of the 2006 International Conference on Parallel Processing, pages 175–184. IEEE Computer Society, 2006.

[55] G. J. Sussman and G. L. Steele, Jr. Scheme: An interpreter for extended lambda calculus. Higher-Order and Symbolic Computation, 11(4):405–439, 1998.

[56] R. von Behren, J. Condit, and E. Brewer. Why events are a bad idea (for high-concurrency servers). In HOTOS ’03: Proceedings of the 9th Conference on Hot Topics in Operating Systems, pages 4–4. USENIX Association, 2003.

[57] R. von Behren, J. Condit, F. Zhou, G. C. Necula, and E. Brewer. Capriccio: Scalable threads for internet services. In SOSP ’03: Proceedings of
the 19th ACM Symposium on Operating Systems Principles, pages 268–281. ACM, 2003.