APPLICATIONS OF ABSTRACTION FOR CONCURRENT PROGRAMS

by James Wheelis Riely

A dissertation submitted to the faculty of the University of North Carolina at Chapel Hill in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Computer Science.

Chapel Hill 1999

Approved by

Advisor: Professor Jan Prins

Advisor: Professor Rance Cleaveland

Reader: Professor James Anderson

ABSTRACT

JAMES WHEELIS RIELY: Applications of Abstraction for Concurrent Programs
(Under the direction of Jan Prins and Rance Cleaveland)

We study the use of abstraction to reason operationally about concurrent programs. Our thesis is that abstraction can profitably be combined with operational semantics to produce new proof techniques. We study two very different applications:

• the implementation of nested data-parallelism, and
• the verification of value-passing processes.

In the first case, we develop a typing system for a nested data-parallel programming language and use it to prove the correctness of flattening, an important compilation technique. In the second, we demonstrate that abstract interpretations of value domains can be applied to process description languages, extending the applicability of finite-state methods to infinite-state processes.


For Lucia


TABLE OF CONTENTS

1 Introduction
  1.1 Implementing Nested Data-Parallelism
  1.2 Verifying Value-Passing Processes
  1.3 Unifying themes

I Implementing Nested Data-Parallelism

2 Nested Data-Parallelism and Flattening
  2.1 A Nested-Sequence Language
  2.2 Flattening
  2.3 The Step/Work Cost Model
  2.4 Containment and Typing
  2.5 Segment Vectors and References
  2.6 Overview of Part I

3 A Nested-Sequence Language and its Implementation
  3.1 The Source Language
  3.2 The Intermediate and Target Languages
  3.3 The Semantics
  3.4 The Transformations
  3.5 Proving the Transformations Correct

4 A Typing System for Containment
  4.1 Types and Subtypes
  4.2 Typing
  4.3 Properties of Typing

5 Correctness of the Reference Implementation
  5.1 Asymptotic Improvement
  5.2 Costed Semantics
  5.3 Strong Improvement
  5.4 Transformation implies Strong Improvement

6 A Segment-Vector Implementation
  6.1 The Segment-Vector Implementation of Nested Sequences
  6.2 Adapting the Results

7 Discussion

II Verifying Value-Passing Processes

8 Verification, State-Explosion and Abstraction
  8.1 Success of Model-Based Verification
  8.2 Process Algebras
  8.3 State Explosion and Value Passing
  8.4 Structure of Part II

9 Processes and Verification
  9.1 Value Signatures, Interpretations and Terms
  9.2 VPL
  9.3 Ready Simulation
  9.4 Logics

10 Abstraction
  10.1 Abstraction and Concretization
  10.2 Soundness
  10.3 Quality of Abstract Semantics
  10.4 An Example
  10.5 Testing
  10.6 Alternative Semantics and Extensions
    10.6.1 Alternative Semantics for Output and Conditional
    10.6.2 Alternative Value Sets for Booleans
    10.6.3 Call-By-Value Recursion
    10.6.4 Errors in Value Semantics

11 Discussion

12 Bibliography

LIST OF TABLES

3.1 Values, Expressions and Runtime Environments
3.2 Primitives: Typing Rules and Reference Semantics
3.3 Reference Semantics
3.4 Transformations: Context Rules
3.5 Transformations: let Introduction and Elimination
3.6 Transformations: Map Rules
3.7 Termination Metric
4.1 Types and Type Environments
4.2 Typing Rules: Part I
4.3 Typing Rules: Part II
5.1 Cost Functions
5.2 Costed Reference Semantics
5.3 Internal distance
6.1 Primitives: Segmented Semantics
6.2 Segmented Semantics
9.1 Typical Meaning of Metavariables
9.2 VPL Process Signature
9.3 Transition Rules for VPLI
9.4 Semantics of µHML and µRL
10.1 Hennessy and Ingólfsdóttir's semantics

Chapter 1

Introduction

Correct concurrent programs are difficult to write. This fact has been evident since the 1950s, and today it is almost a cliché to point it out. An enormous amount of work in computer science has been devoted to improving the art of creating concurrent programs: basic algorithms have been studied that provide building blocks for programs; paradigms have been proposed to provide programmers with a clean semantics of concurrency; formal methods have been developed for both step-wise refinement and formal verification. Correct concurrent programs are still difficult to write, but not nearly so much as three decades ago.

This dissertation describes work centered on further improving the practice of concurrent program development. We address two distinct problems that arise from different applications of concurrency.

Concurrency may be introduced into a program for a variety of reasons. In many cases the notion of multiple processes arises naturally from the task, as in a multi-threaded user interface or a communication protocol. Such systems are inherently reactive¹ and nondeterministic: reactive because processes must cooperate during execution, and nondeterministic because the relative progress of processes is at best only approximately known when the program is written. Whereas humans are reasonably good at predicting the outcome of a sequential program, we are often incapable of anticipating every possible interaction and interleaving of a concurrent system. Thus, proofs of correctness are essential, even for small programs. In Part II of the dissertation, we show how programs can be automatically verified by using abstractions of the values manipulated by processes.

¹ In traditional sequential computing, once a program is started (with some input) it need not communicate with the outside world until it terminates (with some output). This batch model of execution is well understood and has been formalized using Turing machines, the lambda-calculus, and the RAM model, just to name a few. Reactive programs, instead, are required to interact with the environment as they compute. The classic example of a reactive program is a text editor; one cannot specify such a program as a function from input to output.

In other cases, concurrency is introduced into a non-reactive (or batch) program in order to speed up execution; this is the case for almost all applications of multi-processor supercomputers. Here, portability and execution speed are as important as the program's correctness; supercomputer architecture changes rapidly, and slow programs might as well be replaced by their sequential counterparts. A large body of recent work has been devoted to developing high-level languages that are expressive, portable and have efficient implementations. Languages such as SETL and APL have long used nested data-parallel notations adapted from mathematics. These languages are very expressive and allow for high-level specification of algorithms operating on both regular and irregular data structures such as sparse matrices and trees. Not until the late 1980s, however, was it clear that nested data-parallel constructs could be efficiently implemented in parallel [11].

In designing algorithms, it is important to have a cost metric that accurately predicts the relative performance of different codes on an actual parallel machine; only with such a metric can one make an informed choice between two alternate algorithms. To be usable, a metric must be designed to match the constructs of the programming language rather than the operations of the underlying machine. In Part I of the dissertation, we present cost metrics for nested data-parallel languages and study the conditions under which they accurately reflect performance on today's supercomputers.

Thus, the dissertation presents techniques for reasoning about reactive and data-parallel concurrent programs, in two parts. The first part analyzes cost metrics and compilation techniques for data-parallel programming languages and proves that the cost metrics are preserved by compilation. The second part develops a method for the automatic verification of distributed algorithms based on abstractions of the values that these systems manipulate. The two parts of the dissertation address different audiences, and therefore we defer detailed introductions to later chapters. In the rest of this introduction we briefly summarize the results and underscore the common themes in the work.

1.1 Implementing Nested Data-Parallelism

In the first part of the dissertation we discuss data-parallel programming languages and compilers for supercomputers. Specifications of supercomputer applications are seldom (if ever) reactive or nondeterministic; these concerns arise in some implementation languages, but not data-parallel languages, which have simple, functional semantics. In addition, data-parallel languages provide a simple cost measure to programmers. The key issue is performance: one must ensure that the high-level cost measures are realized in efficient implementations. In this part of the dissertation, we study compilation techniques for nested data-parallel languages and prove that they preserve high-level measures of program running time.

There are at least three approaches to writing parallel programs. One can simply use a sequential program, relying on a parallelizing compiler to extract processes that run on the actual machine. The sequential semantics, however, often encourage the development of algorithms that are not readily parallelized. At the opposite extreme, one may write a program with an explicit decomposition of tasks into processes using notations developed for reactive systems; this approach is commonly called process or control parallelism. While such codes have the potential to maximize performance, they are difficult to develop and often fail to be portable to machines with different numbers of processors or different communication topologies. An intermediate approach is to write a data-parallel program in which parallelism is specified using parallel operations on aggregate datatypes, with the eventual mapping of operations to processors left to the compiler or run-time system. Data-parallel languages provide a simple programming model, and therefore programs are relatively easy to develop. When nesting is added to a data-parallel language, the resulting language is also very expressive. Blelloch et al. describe these languages as follows [10]:

  Nested data-parallel languages combine aspects of both strict data-parallel languages and control-parallel languages. Nested data-parallel languages allow recursive data structures, and also the application of parallel functions to multiple sets of data in parallel. For example, a sparse array can be represented as a sequence of rows, each of which is a subsequence containing the nonzero elements in that row (each subsequence may be of a different length). A parallel function that sums the elements of a sequence can be applied in parallel to sum all the rows of our sparse matrix.

In the language we consider, parallel function calls are written using an iterator construct, which is similar to the familiar set-former construct of mathematics. For example, the sequence ⟨2, 6, 10, 14, 18⟩ can be described by the iterator [i ⇐ ⟨1, 3, 5, 7, 9⟩ : 2 ∗ i]. By replacing the iterator expression (2 ∗ i) with another iterator, or with a parallel function call, nested parallelism can be described.

Blelloch [6, 7] developed a simple cost metric for nested-parallel languages by adapting the step and work measures commonly used in the analysis of parallel programs. Roughly speaking, the step complexity gives the running time under the assumption that all specified parallelism is realized; the work complexity counts the total number of (scalar) operations performed. The step complexity of an iterator is simply the maximum of the steps taken by the iterator expression (2 ∗ i in the example) under each binding of the iterator variable (i in the example); whereas the work complexity is the sum of the work under each binding. These metrics are very useful in designing algorithms.

In compiling a nested data-parallel language to run on SIMD and vector machines, the key step is flattening [11]; in our language this step replaces iterators with simpler operations on nested sequences, which are implemented as vectors. For some programs, flattening is "incorrect" in the sense that the running time of the flattened code is substantially worse than the running time of the original code. We say that flattening is "correct" for a program if the work/step metrics for the flattened code can be bounded with respect to the work/step metrics for the original code.

In his thesis [6], Blelloch defined a criterion, called containment, that could be used to distinguish programs that could be correctly implemented. He then proved that contained programs could be "correctly" implemented on a vector machine. In applying Blelloch's results, however, one confronts two difficulties. First, containment is a semantic condition and, as such, is undecidable. Second, Blelloch's correctness proof involves a simulation technique which is not directly related to any particular implementation strategy, such as flattening; that is, although Blelloch defines flattening, he does not prove it correct, even for contained programs.

We address both of these difficulties. First, we define a typing system that identifies (a subset of) contained programs. Our typing system divides programs into three classes, which abstract the dynamic function-call structure of programs: cnst programs have constant call structures; flat programs have regular, but variable, call structures; exp programs have arbitrary call structures. Second, we show that for flat programs, flattening preserves the step/work cost metrics — in other words, if flattening is used to implement a source program, then the implementation will run in the time predicted when applying the work/step metrics to the source program. Key to our proof are two techniques. The first is the aforementioned typing system.
The second is an efficiency preorder which orders programs based on how fast they run. Both are novel.
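To make the flavour of this programming style concrete, here is a minimal sketch in Haskell (which is not the language studied in Part I); the names sparse, rowSums and doubled are ours, chosen only for illustration.

  -- Nested sequences modelled as lists of lists; the iterator corresponds
  -- to Haskell's map.
  type Row    = [Double]        -- one row: its nonzero elements
  type Sparse = [Row]           -- a sparse matrix: a sequence of rows

  -- The iterator [i <= <1,3,5,7,9> : 2*i] corresponds to:
  doubled :: [Int]
  doubled = map (\i -> 2 * i) [1, 3, 5, 7, 9]   -- = [2,6,10,14,18]

  -- Summing every row of a sparse matrix "in parallel" corresponds to:
  rowSums :: Sparse -> [Double]
  rowSums a = map sum a
  -- Step complexity: the maximum over the rows; work: the sum over the rows.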


1.2 Verifying Value-Passing Processes

Since the mid-1980s, a number of tools have been developed for verifying finite-state concurrent systems. These tools typically work by "compiling" a program into a state graph (which includes all reachable states of the program) and then analyzing the graph. This approach has been very successful for small and medium-sized systems; however, practical data-types often throw a wrench into the works. Consider a simple routing protocol in which the router examines the packet header and does a table lookup to determine its destination; we wish to verify that each packet is sent to the correct destination. Even in a system with small 53-byte packets, the number of packets is 2⁴²⁴, far more than the finite-state tools can hope to deal with directly.

In Part II of the dissertation, we propose a method that allows one to build an abstract state graph directly from the program text. For example, for our routing protocol, a symbolic abstraction (defined in Chapter 10) can be used to reduce the number of possible packets to two; a similar abstraction can be used to reduce the size of the routing table. Using the abstract version of packets, we may now have a system small enough that verification can be performed using the tools mentioned above.

In general, it is the burden of the user of our technique to choose a value abstraction and to provide an abstract semantics for each of the value operations found in a program. In many cases, however, it is sufficient to supply only the abstraction function; a usable abstract semantics can be automatically generated. In these cases, our technique is very easy to use. We comment further on this in Chapter 10.

With the theorems presented in Chapter 10, one can conclude that if a universal property can be proven to hold using the abstract state graph, then that property will also hold—in an appropriate sense—for the original system. Roughly, a universal property specifies that some predicate must hold for every execution of a system.

The idea of using abstract value sets to verify processes is not new. All previous work, however, is based on simple state-transformer languages in which a program is simply a collection of guarded actions that specify a relation on states. In such languages, there is a straightforward translation from programs to state graphs; however, the possibilities for compositional verification are quite limited due to the difficulty of controlling interaction among the composed processes. For compositional verification using abstraction, the only published results are for specific forms of parallel composition [63, 62].

Our work, instead, is based on the framework of process algebras [66, 54, 3, 49]. In this formalism, all interactions between processes are specified explicitly. There is no "shared state" that processes can modify without inter-process communication; shared objects (such as memory) are themselves modeled as processes. This restriction allows compositional verification with respect to a rich set of process combinators, including numerous forms of parallel composition, guarded choice, channel hiding and channel renaming.

A second novel aspect of our work is the notion of α-independence. The exact definition will be given in Chapter 10; for now, it suffices to say that α-independence generalizes data-independence as studied by Wolper [98].
A data-independent program is one in which the control flow of the program does not depend on the data that are received; for example, queue operations are data-independent in that the behavior of the queue does not depend on the specific values it stores. α-independence weakens this restriction, allowing for data-dependent behavior that "respects" the abstraction function α. For example, the router discussed at the beginning of this section is α-independent, though it is not data-independent.

In addition to proving that (for any program) abstraction preserves all universal properties, we also prove that for any α-independent program, abstraction preserves all mixed properties. Recall that a universal property specifies that some predicate must hold for every execution of a system. The negation of a universal property is an existential property, which specifies that a predicate must hold for some execution. Mixed properties permit the alternation of universal and existential quantification. While universal and existential properties have equal discriminating power (p ⊨ ϕ iff p ⊭ ¬ϕ), mixed properties can distinguish processes that satisfy exactly the same universal formulae. For example, a mixed property can determine whether a process "after receiving a message, is always capable of replying"; this requirement is not expressible without alternating universal and existential quantification.

In addition to Wolper, other authors have proposed conditions under which abstractions preserve mixed properties; however, these are either overly restrictive [22, 62] or require that the user provide two abstract semantics for each value operation (one for universal properties and one for existential properties) [35]—requiring the user to provide one abstract semantics is already burden enough. α-independence is broadly applicable, providing a usable framework for verifying mixed properties.
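To make the idea of a value abstraction concrete, here is a minimal sketch in Haskell (not the notation of Part II). The names and the abstract lookup below are ours: the lookup shown is only the obvious over-approximation relative to one tracked packet, not the symbolic abstraction defined in Chapter 10.

  type Packet = Int                       -- concrete packets: astronomically many
  type Dest   = Int

  -- Two abstract packets and two abstract destinations, relative to a
  -- chosen packet p0 and its proper destination d0.
  data APacket = P0 | OtherP  deriving (Eq, Show)
  data ADest   = D0 | OtherD  deriving (Eq, Show)

  alphaP :: Packet -> Packet -> APacket   -- abstraction of packets
  alphaP p0 p = if p == p0 then P0 else OtherP

  alphaD :: Dest -> Dest -> ADest         -- abstraction of destinations
  alphaD d0 d = if d == d0 then D0 else OtherD

  -- A sound abstract table lookup must return every abstract destination
  -- that some concrete packet mapping to the given abstract packet could
  -- reach; for the tracked packet this is exact, for the rest we simply
  -- over-approximate.
  absLookup :: Packet -> Dest -> (Packet -> Dest) -> APacket -> [ADest]
  absLookup p0 d0 table P0     = [alphaD d0 (table p0)]
  absLookup _  _  _     OtherP = [D0, OtherD]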

1.3 Unifying themes

The goal of our work is to make concurrent programming easier. In the case of process-parallel languages, we have developed a technique that widens the applicability of existing tools for automatic verification. For data-parallel languages, we have established a cost measure for nested parallelism that both provides a workable programming model and allows for efficient implementation.

The two parts of this dissertation treat very different problems, but use very similar techniques. In both parts we are concerned with program correctness: in the first part, of compilers; in the second part, of user-defined programs. We reason about correctness in both cases using operational semantics and semantic preorders. One theme of this work is that operational semantics provide a valuable tool for reasoning about concurrent systems, both functional and reactive. Another is the key role of abstraction in establishing usable proof techniques.

Operational semantics allow a language to be described simply and intuitively. Structural operational semantics, which originated with Plotkin's lectures in 1981 [78], has the additional advantage of an elegant mathematical theory which lends itself to intuitive proofs. Plotkin's work has been tremendously influential, particularly in the theories of reactive and functional programming. This dissertation extends existing work in both these areas. The first part adds a novel form of asymptotic efficiency analysis to standard functional programming methods. The second extends the state of the art in reactive programming theory by incorporating techniques from Abstract Interpretation.

Abstraction is crucial to the development of usable proof techniques. In the first part of the dissertation, we use abstraction to statically identify programs that satisfy certain dynamic properties. In the second part, we use abstraction to simplify models of reactive programs, making analysis of the programs feasible.


Part I

Implementing Nested Data-Parallelism

Chapter 2

Nested Data-Parallelism and Flattening

In this half of the dissertation, we study flattening, a technique for implementing nested data-parallelism. We present the flattening transformations for a simple functional language and show that the transformations are "correct" for a class of programs. The transformations define a translation from a source language to a target language. Both source and target are equipped with metrics that estimate the running times of programs. Our notion of correctness is quite strong: a correct implementation must compute the same value using approximately the same amount of computation time. We call this intensional correctness, to be contrasted with extensional correctness, where computation time is ignored. The flattening transformations are interesting in that they are not intensionally correct for all programs in our language, although they are always extensionally correct. In order to identify those programs for which the transformations preserve intension, we use a typing system that provides a static, abstract estimate of the computational structure, and running time, of a program.

We make the following contributions which, to our knowledge, are all novel:

• We define the notion of "asymptotic" intensional correctness and provide a proof technique for it which uses an improvement preorder [83] defined over an explicitly parallel language.
• We define a sufficient syntactic condition for the correctness of flattening, i.e., a static characterization of "containment" [6].
• We present a proof that the flattening transformations are intensionally correct for "contained" programs.

In the rest of this chapter, we develop these ideas further. First we describe the use of nested data-parallelism in parallel programming. We then describe the flattening transformations and their importance. In order to discuss the intensional aspects of flattening, we then turn to the work/step metrics commonly used to estimate the computation time of parallel programs. This leads us to a discussion of containment and a summary of our typing system. We conclude with an overview of the remainder of this part.

2.1 A Nested-Sequence Language

In this section, we describe, in informal terms, the extensional semantics of a strict first-order functional language with sequences. The language being functional, parallelism is not observable in extension (as it would be in an imperative or reactive language). As such, we do not discuss parallelism here at all, deferring the topic until Section 2.3. The material is quite standard; it is included for review and to establish some notation.

The following example establishes the basic syntax of the language, defined formally in the following chapter. We presuppose scalar primitives equal, minus and mult, which test equality and perform subtraction and multiplication, respectively. (Henceforth, we use such self-explanatory primitives without introduction; we also occasionally use standard infix notation for binary primitives.) The following expression computes xⁿ for n ≥ 0:

  letrec pow(x₀, n₀) ⇐ if equal(n₀, 0) then 1
                        else let z ⇐ pow(x₀, minus(n₀, 1)) in mult(x₀, z)
  in pow(x, n)

Let E refer to this expression. E defines a recursive function named pow, then invokes the function on two variables. The evaluation relation gives the operational interpretation of expressions. For E, we have

  x ⇐ 2, n ⇐ 8 ⊢ E −→ 256

which states that in the environment where x is bound to 2 and n is bound to 8, the expression E evaluates to 256. This is to be expected, since 2⁸ = 256.

In addition to scalars, nested data-parallel languages include data types for sequences. (For simplicity, we avoid product types, although we occasionally use them in examples.) We use two notations for sequence values, angle brackets and overlines; thus, ⟨1, 2, 3⟩ and 1 2 3 (the latter written with an overline) both represent the three-element sequence whose ith element is the integer i. The empty sequence is written ⟨⟩ or •. Sequences are uniform — all elements must have the same type — giving rise to types of the form:

  V ::= int | bool | V¹

Here V¹ is the type of sequences of type V. Note that sequences may contain other sequences. We write Vⁿ⁺¹ to abbreviate (Vⁿ)¹. A typical element of int³ is:

  ⟨⟨⟨1, 2⟩, ⟨3, 4⟩, ⟨5, 6⟩⟩, ⟨⟨7⟩, ⟨8⟩⟩⟩

This is a sequence of two elements, the first element of which is a sequence of three elements, the second a sequence of two elements. Sequences differ from both cons and cat lists. The basic constructors are a family of primitives buildℓ that build an ℓ-element list from ℓ arguments; for example,¹ build₂(1, 2) = ⟨1, 2⟩. cons and cat are derivable using the flat primitive, which "flattens" a nested sequence; for example,

  flat ⟨⟨1, 2⟩, ⟨3, 4⟩⟩ = ⟨1, 2, 3, 4⟩.

  cat(xs, ys) ⇐ flat build₂(xs, ys)
  cons(x, ys) ⇐ flat build₂(build₁ x, ys)

¹ In this chapter, we use the equality symbol '=' informally; the formal semantics of primitives is given in the next chapter.
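For readers who find it helpful to experiment, the constructors above can be rendered in a few lines of Haskell; the definitions and names below are ours, modelling a sequence of type V¹ as a Haskell list.

  -- build_l builds an l-element sequence from l arguments:
  build2 :: a -> a -> [a]
  build2 x y = [x, y]

  build1 :: a -> [a]
  build1 x = [x]

  -- flat removes one level of nesting:
  flat :: [[a]] -> [a]
  flat = concat

  -- cat and cons, derived exactly as in the text:
  cat :: [a] -> [a] -> [a]
  cat xs ys = flat (build2 xs ys)

  cons :: a -> [a] -> [a]
  cons x ys = flat (build2 (build1 x) ys)
  -- e.g. flat [[1,2],[3,4]] == [1,2,3,4] and cons 0 [1,2] == [0,1,2]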


The basic destructor is the elt primitive, which selects an element from a sequence; for example, elt(2, ⟨5, 6, 7⟩) = 6. Other important primitives include rstr, which restricts a sequence based on a sequence of booleans, merge, which merges two sequences based on a sequence of booleans, and part, which partitions a sequence according to the structure of a different sequence. Let t and f be the boolean values true and false respectively, and let a through e be arbitrary values; then:

  rstr(⟨t, f, t⟩, ⟨1, 2, 3⟩) = ⟨1, 3⟩
  merge(⟨1, 2, 3⟩, ⟨f, t, f, f, t⟩, ⟨8, 9⟩) = ⟨1, 8, 2, 3, 9⟩
  part(⟨⟨a, b⟩, ⟨c, d, e⟩⟩, ⟨1, 2, 3, 4, 5⟩) = ⟨⟨1, 2⟩, ⟨3, 4, 5⟩⟩

Let i be a natural number between 1 and ℓ. Let as be a sequence and let bs be a boolean sequence of equal length, with b̂s its elementwise logical complement. Let ass be a sequence of sequences. The primitives satisfy the following equations:

  elt(i, buildℓ(a₁, .., aℓ)) = aᵢ                          (2.1)
  merge(rstr(b̂s, as), bs, rstr(bs, as)) = as               (2.2)
  part(ass, flat ass) = ass                                (2.3)

In addition to primitives, our language includes a second-order construct for operations on sequences. This is the well-known "map" or "apply-to-each" construct. As in NESL and PROTEUS, we adopt a special notation for this construct, called the iterator. (We use the terms "iterator" and "map" interchangeably.) The basic form is [x ⇐ as : A], which binds the variable x in expression A. This maps the function λx.A over the sequence as. In a language such as HASKELL, this might be written 'map (λx.A) as'. More generally, the syntax is:

  [x₁ ⇐ B₁, .., xℓ ⇐ Bℓ : A]

To produce a value, each expression Bᵢ must evaluate to a sequence ⟨bᵢ¹, .., bᵢⁿ⟩, for some n. Note that all of these sequences must have the same length, namely n. The value of the iterator is then a sequence of n elements whose jth element is determined by evaluating A with each xᵢ bound to bᵢʲ. For example, the sum of two sequences is computed by

  [x ⇐ ⟨1, 2, 3⟩, y ⇐ ⟨4, 5, 6⟩ : add(x, y)]

which evaluates to ⟨5, 7, 9⟩. More generally, of course, the sum of two equal-length integer sequences, xs and ys, is computed using the expression:

  [x ⇐ xs, y ⇐ ys : add(x, y)]

The notation for iterators is sometimes cumbersome; thus we adopt some abbreviations. We often write '[x₁ ⇐ B₁, .., xℓ ⇐ Bℓ : A]' as '[x̃ ⇐ B̃ : A]'. We allow extensions of this "tilde notation" that violate the abstract syntax, but allow for much abbreviated expressions. For example, we may write '[x₁ ⇐ rstr(zs, B₁), .., xℓ ⇐ rstr(zs, Bℓ) : A]' as:

  [x̃ ⇐ rstr(zs, B̃) : A]

In examples, we also use the following "filter" notation. As an example of the notation,

  [x ⇐ ⟨1, 2, 3, 4, 5, 6⟩ | odd x : square x]

evaluates to the sequence ⟨1, 9, 25⟩. Here, odd x is an expression that filters the values over which the iterator is applied. In general, '[x̃ ⇐ B̃ | F : A]' is shorthand for:

  let zs ⇐ [x̃ ⇐ B̃ : F] in [x̃ ⇐ rstr(zs, B̃) : A]
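The sequence primitives above can be pinned down with a small Haskell sketch; the definitions are ours and are meant only to capture the behaviour suggested by the examples and equations (they assume well-formed arguments).

  -- rstr keeps the elements at positions where the boolean sequence is True:
  rstr :: [Bool] -> [a] -> [a]
  rstr bs as = [ a | (b, a) <- zip bs as, b ]

  -- merge interleaves two sequences under the control of a boolean sequence:
  -- a False takes the next element of the first argument, a True the next
  -- element of the second.
  merge :: [a] -> [Bool] -> [a] -> [a]
  merge (x:xs) (False:bs) ys     = x : merge xs bs ys
  merge xs     (True :bs) (y:ys) = y : merge xs bs ys
  merge _      []         _      = []

  -- part reshapes a flat sequence to the nesting structure of another:
  part :: [[b]] -> [a] -> [[a]]
  part []     _  = []
  part (s:ss) as = take n as : part ss (drop n as)  where n = length s

  -- These satisfy merge (rstr (map not bs) as) bs (rstr bs as) == as and
  -- part ass (concat ass) == ass, matching equations (2.2) and (2.3).

  -- The iterator corresponds to map/zipWith, and the filter notation to a
  -- list comprehension with a guard, e.g.
  example :: [Int]
  example = [ x * x | x <- [1, 2, 3, 4, 5, 6], odd x ]   -- = [1, 9, 25]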

2.2 Flattening

Flattening is a transformation that can be used to remove all instances of the iterator construct from an expression. After flattening, an expression can easily be mapped to a vector machine, or indeed many other architectures. This has been the primary implementation strategy for NESL on many architectures.

As a simple example, suppose that our machine supports vector addition and multiplication. We represent these operations using sequence primitives add¹ and mult¹; for example, add¹(⟨1, 2, 3⟩, ⟨4, 5, 6⟩) = ⟨5, 7, 9⟩. Then we can implement the expression [x ⇐ xs, y ⇐ ys, z ⇐ zs : add(x, mult(y, z))] as:

  add¹(xs, mult¹(ys, zs))

As another example, we can implement [x ⇐ xs : add(x, 1)] as:

  add¹(xs, prom(xs, 1))

where prom is a primitive that "promotes" its second argument to the length of its first argument; for example, prom(⟨1, 2, 3⟩, a) = ⟨a, a, a⟩.

Flattening was introduced by Blelloch and Sabot [11] in their compiler for PARALATION LISP. They described flattening as a set of transformations on expressions. A typical transformation rule is the following rule for let-expressions. Given that variable zs does not occur free in A, '[x ⇐ xs : let z ⇐ B in A]' rewrites to:

  let zs ⇐ [x ⇐ xs : B] in [x ⇐ xs, z ⇐ zs : A]          (2.4)
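Before continuing with the transformation rules, the lifted primitives used in the two examples at the start of this section can be sketched in Haskell; the definitions are ours and only illustrate the intended elementwise behaviour.

  add1, mult1 :: [Int] -> [Int] -> [Int]
  add1  = zipWith (+)          -- elementwise addition
  mult1 = zipWith (*)          -- elementwise multiplication

  prom :: [a] -> b -> [b]
  prom xs y = map (const y) xs          -- prom [1,2,3] a == [a,a,a]

  -- The source iterator and its flattened form agree extensionally:
  source, target :: [Int] -> [Int] -> [Int] -> [Int]
  source xs ys zs = [ x + y * z | (x, y, z) <- zip3 xs ys zs ]
  target xs ys zs = add1 xs (mult1 ys zs)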

As the let-rule (2.4) implies, the basic strategy is to "push" the iterator expressions through the abstract syntax until each can be replaced, either by a variable or by a promoted constant. The elimination rules allow '[x ⇐ xs : x]' to be rewritten simply as 'xs', and '[x ⇐ xs : A]' to be rewritten as

  prom(xs, A)                                              (2.5)

as long as x does not appear free in A.

We have already seen an instance of the transformation rule for application; for primitive p, '[x ⇐ xs : p x]' rewrites to 'p¹ xs', and '[xs ⇐ xss : p¹ xs]' rewrites to:

  part(xss, p¹(flat xss))                                  (2.6)

The same strategy is used for user-defined functions. The extensional correctness of (2.6) follows easily from basic algebraic properties of the map functional. Indeed, denotational and categorical arguments have been used to prove the extensional correctness of many transformations, including the flattening transformations.

As a final example, the transformation of conditional expressions specifies that if z does not appear free in A or C, then '[z ⇐ zs, x ⇐ xs : if z then A else C]' rewrites to:

  merge([x ⇐ rstr(zs, xs) : A], not¹ zs, [x ⇐ rstr(not¹ zs, xs) : C])       (2.7)

To avoid confusion, let us mention that in the main text we use a different transformation for conditionals. In fact, in the next chapter we adopt a non-standard interpretation of the conditional 'if B then A else C' that is strict in both A and C; under the standard interpretation, the conditional is strict only in B. Throughout this introduction we stick to the standard interpretation; the differences are purely technical and are not relevant to our motivations.
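The conditional rule can likewise be rendered in the Haskell sketch; the definitions are ours and are meant only to show that the rewrite of (2.7) is extensionally correct while always evaluating both branches (rstr and merge are repeated from the earlier sketch).

  rstr :: [Bool] -> [a] -> [a]
  rstr bs as = [ a | (b, a) <- zip bs as, b ]

  merge :: [a] -> [Bool] -> [a] -> [a]
  merge (x:xs) (False:bs) ys     = x : merge xs bs ys
  merge xs     (True :bs) (y:ys) = y : merge xs bs ys
  merge _      []         _      = []

  not1 :: [Bool] -> [Bool]
  not1 = map not

  -- Source form: map a conditional over paired sequences zs and xs.
  condSource :: (a -> b) -> (a -> b) -> [Bool] -> [a] -> [b]
  condSource a c zs xs = [ if z then a x else c x | (z, x) <- zip zs xs ]

  -- Flattened form, following rule (2.7): run all then-instances, then all
  -- else-instances, and merge the results back into position.
  condTarget :: (a -> b) -> (a -> b) -> [Bool] -> [a] -> [b]
  condTarget a c zs xs =
    merge (map a (rstr zs xs)) (not1 zs) (map c (rstr (not1 zs) xs))
  -- condSource and condTarget compute the same sequence, but condTarget
  -- evaluates both branch maps, which matters for the cost analysis below.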

2.3 The Step/Work Cost Model

We are interested in proving that the flattening transformations preserve the running time of programs. In this section, we describe the metrics that we use to estimate running time.

A cost model is an abstract model that provides an estimate of the running time of a program. In the case of sequential programs, the canonical model is the RAM, which is taught to every undergraduate computer scientist. In the RAM model, each application of a primitive is counted as one step. For example, the program

  i := 1; sum := 0;
  while i ≤ n do (sum := sum + i; i := i + 1)

computes the sum of the numbers between 1 and n. It does so using approximately 5 + 3n steps, depending upon what, exactly, one counts. In practice, constant factors, such as 5 and 3 here, are usually ignored; one simply says that the code is in "big-O" of n, written O(n). This simplification is justified by the fact that for "sufficiently large" n, the constant factor 5 is negligible. Such approximations are called asymptotic approximations. We will revisit asymptotics in Section 3.5.

The RAM model is useful precisely because it abstracts from many machine details, allowing comparison of algorithms in a simpler setting. In particular, it ignores memory hierarchy and pipeline effects. When such effects are of interest, more detailed models must be used.

For programs with parallelism, the Parallel RAM (PRAM) model has been developed as a natural extension of the RAM model. Whereas the RAM model assumes that all instructions are executed sequentially on a single processor, the PRAM model allows for the execution of several threads in parallel. The question then arises: How many threads should one allow? The appropriate answer, unfortunately, varies widely. Thus the PRAM model is parameterized with respect to p: the maximal number of concurrently executing threads, or, simply, the number of processors.

A somewhat more abstract view is provided by the Vector RAM (VRAM) model. In this model, processors are entirely abstract. As such, explicit process parallelism cannot be specified. In its place, the VRAM model embodies parallelism using vector operations, as its name suggests. In the VRAM model, vector operations are executed in parallel over the data in the vector; all other instructions are sequential. Two metrics are used to determine the eventual running time of a program, steps and work. The step-count gives the number of steps taken by the program assuming that all available parallelism is realized. The work-count gives the number of steps taken assuming that no parallelism is realized. If a program has step-count s and work-count w, we say that it takes s parallel steps and w sequential ones. For example, the operation add¹(⟨1, 2, 3⟩, ⟨4, 5, 6⟩) takes approximately 1 parallel step and 3 sequential ones.

We use a variant of the VRAM known as the scan VRAM. (In the sequel, all references to the VRAM refer to its scan variant.) In this model reductions are assumed to take a single (parallel) step. For example, the function which sums the elements of a vector takes one step and has work proportional to the length of the vector. Using Brent's scheduling theorem [17], Blelloch has shown that an expression that executes on the VRAM in t steps with w work can be executed on a p-processor PRAM in O(w/p + t log p) time. The log p factor is required to account for reductions. When the data-set is large, the work factor (w/p) typically dominates the execution time.

The PRAM and VRAM have variants with different assumptions about memory access. The EREW variants (exclusive read, exclusive write) assume that there is no memory contention during reads or writes; that is, on each step the programmer must guarantee that no two processes attempt to read or write the same location. The CREW variants (C for concurrent) relax this assumption, allowing concurrent reads. The CRCW variants additionally allow concurrent writes. The EREW variants accurately predict eventual performance on parallel machines with sufficient memory bandwidth. If memory contention is small, the CREW variants are equally accurate [8].

Models such as the RAM, PRAM and VRAM are attractive because they are quite simple and provide reasonable abstractions of widely-used hardware. On the other hand, these models do not capture many features of modern programming languages. In particular, it is difficult to reason about programs with higher-order functions using these models. Even our simple iterator construct has no obvious parallel implementation using these models.

Because authors of parallel programs are concerned about the performance of their code, they typically write programs in languages that closely model the hardware on which those programs are implemented. The main reason for adopting such languages is that they provide useful (often informal) metrics that programmers can use to tune their code. Of course, such low-level coding makes program development harder, since useful abstractions, such as iterators, cannot be used. It also creates portability problems; these are particularly significant given the instability of parallel platforms. Thus, substantial effort has been spent to discover useful metrics for higher-level parallel programming languages.


To be successful, a metric must be closely tied to the syntax of the language and must accurately predict performance. Perhaps the most successful attempt to marry a high-level language to a performance metric has been Blelloch's NESL, which is the basis of the language used in this dissertation. The NESL metrics are an extension of the step/work model of the VRAM. Most of the constructs of the language are shared with the VRAM. For example, the let construct is representable in the VRAM using sequencing and assignment. The only substantially new construct is the iterator.

Iterators can be assigned a very intuitive costing. Since all elements of an iterator are defined independently, they may be evaluated independently. Consider the iterator [x ⇐ ⟨b₁, .., bₙ⟩ : A]. If this evaluates, it evaluates to a sequence ⟨a₁, .., aₙ⟩, where for each j between 1 and n, aⱼ is determined by evaluating A with x bound to bⱼ. Suppose that evaluating A with x bound to bⱼ takes tⱼ steps and wⱼ work. Then the step-count of the iterator is (maxⱼ tⱼ), and the work-count is (∑ⱼ wⱼ); that is, the step-count is the maximum of the step-counts of the subevaluations, and the work-count is the sum of the work-counts of the subevaluations. This account of the cost of an iterator is very appealing, largely because it is simple and easy to work with. There are some difficulties in implementing this model, however, which we discuss in the next section.

We finish this section with some simple examples showing the use of the step/work metrics. As a first example, suppose that a is a sequence of n sequences, each of which is a sequence of n reals; i.e., a represents an n × n matrix. If xs is an n-element sequence of reals, then the inner product a · xs can be written:

  [i ⇐ iota n : sum [j ⇐ iota n : mult(elt(i, elt(j, a)), elt(j, xs))]]

Here iota is a primitive that takes an integer n and returns a vector consisting of the integers between 1 and n; for example, iota 4 = ⟨1, 2, 3, 4⟩. The sum primitive computes the sum of an n-element vector in a single step and with O(n) work. The example expression directs the evaluation of n such summations in parallel. Thus the expression has step complexity O(1) and work complexity O(n²).

Alternatively, we might choose to represent each row of a as a sequence of (index, value)-pairs where only the non-zero values are listed. The matrix a, then, is a sequence of such rows. Now we may write the inner product as:

  [i ⇐ iota n : sum [(j, v) ⇐ elt(a, i) : mult(v, elt(xs, j))]]

If m is the number of non-zero elements, this expression has step complexity O(1) and work complexity O(n + m).

Expressions within an iterator may include recursive function calls. Suppose that we have a sequential implementation binsearch of a binary search algorithm, and we wish to search the vector xs in parallel for each of the keys in the vector keys. This can be specified:

  [k ⇐ keys : binsearch(k, xs)]

The step complexity is O(log n), where n is the length of xs. The work complexity is O(k log n), where k is the length of keys.
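The bookkeeping behind these examples can be made concrete with a small Haskell sketch; the Costed representation below is ours and simply records the step/work rules just described.

  -- A result paired with its step- and work-counts.
  data Costed a = Costed { value :: a, steps :: Int, work :: Int }

  scalarOp :: a -> Costed a           -- one scalar primitive application
  scalarOp v = Costed v 1 1

  -- An iterator's step-count is the maximum of its element evaluations;
  -- its work-count is their sum.
  iter :: (a -> Costed b) -> [a] -> Costed [b]
  iter body xs = Costed (map value rs)
                        (maximum (0 : map steps rs))
                        (sum (map work rs))
    where rs = map body xs

  -- e.g. iter (\i -> scalarOp (2 * i)) [1,3,5,7,9] has step-count 1 and
  -- work-count 5: the five multiplications proceed in parallel but still
  -- cost five units of work.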

One can also write parallel divide-and-conquer algorithms using iterators. For example, the key step in the parallel quicksort of a sequence xs (with no duplicate values) can be written:

  let les = [x ⇐ xs | x < elt(1, xs) : x]
      gtr = [x ⇐ xs | x ≥ elt(1, xs) : x]
  in flat [ys ⇐ build₂(les, gtr) : quicksort(ys)]

If n is the length of xs, then the expected step-count is log n and the expected work is n log n.
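For concreteness, a complete (sequential) Haskell rendering of this quicksort is sketched below; the text shows only the key step, so the base case and the handling of the pivot are our own additions.

  quicksort :: [Int] -> [Int]
  quicksort xs
    | length xs <= 1 = xs
    | otherwise      = sles ++ [p] ++ sgtr
    where
      p   = head xs
      les = [ x | x <- tail xs, x <  p ]
      gtr = [ x | x <- tail xs, x >= p ]
      -- both halves sorted by the same function, conceptually in parallel:
      [sles, sgtr] = map quicksort [les, gtr]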

2.4 Containment and Typing

Given the step/work metric for our language, one must ask if these metrics can be implemented. More precisely, can we translate expressions in the source language to some target machine language so that the target machine code runs in the time predicted by the high-level metrics? Additionally, we must ask: Is the final performance acceptable, or are the "constant factors" introduced in translation unacceptably large? The first question is largely a theoretical one and can be answered using pure mathematics. The second question is pragmatic and is best answered by measuring the performance of an actual implementation on a variety of problems.

Various techniques can be used to construct an implementation. For example, a simplistic thread-based implementation might simply spawn a thread for each element of an iterator's iteration-space. In order to put less strain on the runtime system, one might use various analyses to control the number of threads and their granularity, or to schedule them statically. It is informative to view the flattening transformations in this light. Iterator-free expressions in our language are very easy to schedule; the only sources of parallelism are primitive calls, and each of these can be optimized independently, usually as a function of the input vector size. By restructuring the code, flattening greatly simplifies the dynamic scheduling of threads.

However, in general flattening does not preserve the running time of programs. In fact, flattening can transform a program that takes O(n) steps into one that takes O(2ⁿ) steps. The main reason for this is the effect of flattening the conditional. Consider the function f defined:

  f(x) ⇐ if x ≤ 1 then 1 else if even x then f(x − 1) else f(x − 1)          (2.8)

A call to f(x) takes O(x) steps, evaluating to 1. Note that each invocation of f is directly responsible for at most one subsequent call to f. If n = max {x | x ∈ xs}, then [x ⇐ xs : f(x)] takes O(n) steps. However, the transformed code, f¹(xs), takes O(2ⁿ) steps. The reason is that each invocation of f¹ may be directly responsible for as many as two subsequent calls to f¹. To see this, consider the flattening transformation for conditionals given in Equation 2.7. Note that an iterator — [z ⇐ zs, x ⇐ xs : if z then A else C] — may perform several conditionals in parallel; some of these may take the then-branch, others may take the else-branch, but none take both. In the transformed code, however, none of the else-branches — [x ⇐ rstr(not¹ zs, xs) : C] — evaluate until all of the then-branches — [x ⇐ rstr(zs, xs) : A] — have finished evaluating; thus a single invocation of the transformed code may execute both branches of the original conditional, one after the other. We revisit this example in Section 3.5.
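The two step-count recurrences described above can be written down directly; the Haskell functions below (ours) merely tabulate them. In the source program each call to f is responsible for at most one further call, so the sequential chain grows linearly; in the flattened program the two branch maps run one after the other, so in the worst case an invocation is responsible for two further invocations.

  stepsSource :: Int -> Int
  stepsSource n | n <= 1    = 1
                | otherwise = 1 + stepsSource (n - 1)          -- O(n)

  stepsFlattened :: Int -> Int
  stepsFlattened n | n <= 1    = 1
                   | otherwise = 1 + stepsFlattened (n - 1)    -- then-branch
                                   + stepsFlattened (n - 1)    -- else-branch, run afterwards
  -- stepsFlattened n grows like 2^n.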

To address this issue, Blelloch introduced the notion of containment. He then argued that contained iterator expressions have a cost-preserving implementation on the VRAM; i.e., contained iterator expressions can be implemented on the VRAM in a way that respects the costs predicted by the high-level metrics. We shall refer to this result as the containment theorem.

Containment is a semantic condition based on the set of functions invoked by an expression when it evaluates. Intuitively, a contained expression is one that always makes the same function calls, in the same order, regardless of its input parameters. Let us give a slightly more detailed definition. Assuming that all functions are named, one may construct a labelled call tree that represents, in order, the function calls that an expression performs on a given input; each node is labelled with the name of the function or primitive called. For example, the function f defined in (2.8), when executed, would produce a tall, thin tree with a spine labelled 'f'; dangling off the spine would be other nodes labelled 'leq', 'even', and 'minus'. The tree set of an expression is the set of call trees generated by evaluating the expression on all possible inputs. A contained expression, then, is one whose tree set is totally ordered under the subtree relation.

Note that f is a contained function, although flattening does not "work" for it. This does not, however, provide a counterexample to Blelloch's result. The containment theorem does not prove the correctness of flattening; rather, Blelloch's proof uses an entirely different simulation technique. As far as we are aware, ours is the first proof of the intensional correctness of flattening.

It is easy to rewrite f in such a way that flattening "works". This suggests that one may be able to refine the flattening transformations so that they work for such expressions. While this would clearly be useful, it is not obvious how it could be done. We do not pursue this approach here. Rather, we provide a method to identify such programs statically, offering them to the programmer for rewriting.

In order to prove flattening "correct", therefore, we must first narrow attention to programs for which the result is actually true. As we have seen, containment is not sufficient. Containment has another important defect; it is a semantic property, and thus is undecidable. One of the major contributions of this work is to define a syntactic condition that is sufficient to establish the correctness of flattening. We do this via a typing system which, in effect, abstracts the tree set of a program. The typing system divides expressions into three categories:

cnst : expressions with a singleton tree set, i.e., expressions with a constant call tree,
flat : expressions with a totally ordered tree set, i.e., (a subset of) contained expressions, and
exp : all other expressions.

The expressions in flat are those that we consider flattenable. Given that containment is undecidable and that membership in flat is decidable, flat is obviously not precisely the set of contained expressions; in light of f, this would also be undesirable, since f is contained but not correctly flattened using our transformations. Nonetheless, we call our typing system "a typing system for containment", as it captures the essential property of containment.
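The call-tree view of containment can be sketched as follows; the Haskell encoding is ours and checks only the total-ordering condition on a given finite set of trees.

  -- A node is labelled with the name of the function or primitive called.
  data CallTree = Node String [CallTree]  deriving Eq

  -- t1 `isSubtreeOf` t2 holds when t1 occurs somewhere inside t2.
  isSubtreeOf :: CallTree -> CallTree -> Bool
  isSubtreeOf t1 t2@(Node _ kids) = t1 == t2 || any (isSubtreeOf t1) kids

  -- A (finite approximation of a) tree set is contained when any two of
  -- its trees are comparable under the subtree relation.
  contained :: [CallTree] -> Bool
  contained ts = and [ s `isSubtreeOf` t || t `isSubtreeOf` s | s <- ts, t <- ts ]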

2.5 Segment Vectors and References

The VRAM does not directly support (nested) sequences as a datatype. Thus an additional hurdle in the implementation process is to define a representation of sequences on the VRAM. In confronting this problem, Blelloch and Sabot introduced the segment-vector encoding of sequences, which is described at the beginning of Chapter 6.

Using segment vectors, however, the flattening transformations again fail to correctly implement the high-level metrics, even for contained programs. The culprit this time is the rule for constant expressions (2.5). Using this rule, the iterator '[x ⇐ xs : y]' is rewritten as:

  prom(xs, y)

Suppose the length of xs is n. Looking at the source term, '[x ⇐ xs : y]' takes work proportional to n. However, 'prom(xs, y)' takes work proportional to n times the size of y. If y refers to a non-scalar value, its size may easily dominate n. More important, the size of y depends on the environment; thus we cannot bound the work of the target expression with respect to the work of the source expression, not even asymptotically.

The solution adopted by Blelloch (in the appendix of his reference manual for NESL) is to change the costing function for iterators to include the size of free variables. Instead, we adopt a different representation of sequences based on references, outlined in Chapter 3. Under this representation, prom(xs, y) takes work proportional to n; it simply creates n references to y. This representation allows us to prove the transformations correct with respect to the simpler high-level metric. However, it also weakens our ability to claim that our results apply to a "pragmatic" language. We believe that a reference-based implementation will perform well; however, we have no evidence to support this belief. As far as we are aware, no reference-based implementation of sequences has ever been used to implement flattening; work is underway to amend this.

We do not abandon segment vectors entirely, however. In Chapter 6, we present Blelloch's solution and prove its correctness. Indeed the proof is only a minor variation on the one that precedes it.
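The difference between the two work bounds for prom can be made concrete with a small sketch (Haskell); the size accounting below is ours and is only meant to illustrate the contrast.

  data Val = Scalar Int | Seq [Val]

  sizeOf :: Val -> Int
  sizeOf (Scalar _) = 1
  sizeOf (Seq vs)   = 1 + sum (map sizeOf vs)

  -- Reference-based prom: n units of work, independent of y, because every
  -- element simply shares (points to) the same value.
  promRef :: [a] -> Val -> [Val]
  promRef xs y = map (const y) xs

  -- If the promoted value is instead copied element by element, the work
  -- also depends on the size of y:
  promCopyWork :: [a] -> Val -> Int
  promCopyWork xs y = length xs * sizeOf y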

2.6 Overview of Part I

Part I includes five subsequent chapters. Chapter 3 defines the source and target languages; these are unified in an intermediate language that includes both the source and target as sublanguages. We sketch the implementation of the target language on the VRAM; however, this aspect of the implementation is treated informally. The flattening transformations are formalized as a relation A ⤳ A′, meaning that expression A can be rewritten as A′. We prove that every source term can be transformed into a target term using the transformations. Evaluation is formalized as a partial relation 'σ ⊢ A −→[t, w] a', meaning that in the runtime environment σ, expression A evaluates to value a with step-count t and work-count w. The last section of this chapter outlines the correctness problem and motivates our solution.

Chapter 4 presents the typing system and proves that both evaluation and transformation preserve typing. We also prove a few key properties of the typing system.

Chapter 5 contains the main results. We formalize the notion of correctness using the asymptotic improvement relation (notation A ⊵≈ A′). Roughly, A is asymptotically improved by A′ if the two expressions compute the same functions and A′ runs faster than A, up to a constant factor. In more detail (but still roughly), A is asymptotically improved by A′ if there exist constant factors u and v such that for any runtime environment σ,

  σ ⊢ A −→[t, w] a   implies   σ ⊢ A′ −→[t′, w′] a  with t′ ≤ u · t and w′ ≤ v · w

or in words: if σ ⊢ A evaluates to a value a with step-count t and work-count w, then σ ⊢ A′ must also evaluate to a with some step-count t′ and work-count w′ such that t′ ≤ u · t and w′ ≤ v · w. The main result can then be stated. Roughly, A ⤳ A′ implies A ⊵≈ A′. In words:

  Every (well-typed) source expression A can be translated to a target expression A′ such that A is asymptotically improved by A′.

The proof of this result is quite a bit more difficult than one might expect. Crucial to the proof is the use of a strong improvement relation (notation A ⊵∼ A′) defined over an alternative semantics. We motivate this approach in Section 5.1.

Chapter 6 outlines Blelloch and Sabot's segment-vector implementation of sequences and replays the results of Chapter 5 in this setting. Chapter 7 provides an overview of related work.

Chapter 3

A Nested-Sequence Language and its Implementation

In this chapter, we introduce the language and transformation rules, proving some properties of the transformations. We start by defining the source language, which includes the iterator (or map) construct. In Section 3.2, we then define the target language, which includes parallel primitives such as add¹, and an intermediate language, which includes constructs that are used during translation. Section 3.3 presents the evaluation relation that defines the operational semantics of expressions. Section 3.4 is devoted to the transformation rules; we prove that every source term can be rewritten to a target term. Finally, in Section 3.5 we sketch the correctness problem, presenting a few examples that motivate the work in the chapters ahead.

3.1

The Source Language

The syntax is parameterized with respect to the following syntactic sets, which we assume to be disjoint: • • • •

Bool = {t, f}, of boolean values, ranged over by bv, Int = {.., −1, 0, 1, ..}, of integer values, ranged over by letters h through n, Prim, of primitive names, ranged over by p, and Var, of variable names, ranged over by f , x, y, z, xs, xss, etc.

We often use primes and subscripts two distinguish metavariables. D EFINITION 3.1 (VALUES , E XPRESSIONS , C ONFIGURATIONS , S UBSTITUTION ). The syntax of values, a, b, as, bs, . . . , expressions, A, B, . . . , and runtime environments, σ, ρ, is given in Table 3.1. We use the words “term” and “expression” synonymously. A configuration is a pair consisting of a runtime environment and an expression, written ‘σ ` A’. The variable x is bound in ‘let x ⇐ B in A’, the scope is A. The variables xi are bound in the map ‘[x1 ⇐ B1 , .., x` ⇐ B` : A]’, the scope is A. The variable f is bound in the definition ‘letrec f ⇐ (x1 , .., x` ) D,E in A’, the scope is D, E and A; the variables xi are also bound in the definition ‘letrec f ⇐ (x1 , .., x` ) D,E in A’, the scope is D and E. We identify expressions up to renaming of bound variables. Certain constraints on the formation of expressions and configurations are given in the table. For example, in the map construct, all bound variables must be unique.

Table 3.1 Values, Expressions and Runtime Environments a, b, c, d, e ::= n bv ha1 , .., an i

Values Integer Boolean Vector

A, B, C, D, E ::= a x p B (A1 , .., A` ) letrec f ⇐ (x1 , .., x` ) D,E in A let x ⇐ B in A [x1 ⇐ B1 , .., x` ⇐ B` : A] if B then A else C

Expressions Value Variable Primitive Application Function definition (xi distinct) Sequencing Map A over vectors Bi (xi distinct) Conditional

σ, ρ ::= 0/ f ⇐ (x1 , .., x` ) D,E x⇐a σ, ρ

Runtime environments Empty Definition of f as function Definition of x as value  Union dom(σ) disjoint dom(ρ)

We use a simple form of substitution to allow explicit renaming of variables. A variable substitution is a map from Var to Var. We write A{|ey/ex|} for the term resulting from the simultaneous capture-avoiding substitution of yi for xi in A.  The syntax is mostly standard. Note that there is no construct for parallel application — p1 or f 1 — in the source language. In the expression ‘letrec f ⇐ (x1 , .., x` ) D,E in A’, the expressions D and E give definitions for f . Essentially D gives the sequential implementation of f , whereas E gives the parallel implementation of f 1 . In the next chapter we will enforce a relation between these two; for the time being we allow them to be arbitrary. In source terms, there is no construct for parallel application, and thus E may safely be dropped; in examples, we usually write function declarations simply as ‘letrec f ⇐ (x1 , .., x` ) D in A’. The expression E can be generated automatically when translating into the intermediate language.1 We include E at the outset to avoid having two variants of the syntax. N OTATION . We sometimes write the sequence ‘ha1 , .., an i’ as ‘a1 , .., an ’ or ‘hai ini=1 ’. We also occasionally write ‘f ⇐ e x D,E’ as ‘fe x ⇐ D,E. Finally, we may write ‘let x ⇐ B in A’ as ‘let x ⇐ B A’.  1 The

e : Ai’ introduced generation of E could be formalized as a transformation rule, using the e-map construct ‘he x ⇐ xs in the next section. letrec f ⇐ e x D in A letrec f ⇐ e x D,he y ⇐e x : D{|ey/ex|}i in A

19

D EFINITION 3.2 (D EPTH , S IZE , L ENGTH ). The depth function D and the size function S are defined for all values. The length function L is defined only for sequences. All of these functions map values to natural numbers. def

Sn=1

def

S bv = 1

Dn=0 D bv = 0 def

D haj inj=1 = 1 + maxj D aj

def

def def

S haj inj=1 = 1 + ∑j S aj

def

L haj inj=1 = n



Note that the depth and size of the empty sequence hi is 1; its length is 0. As another example, let: c = 1 234 56

78



Then D c = 3, S c = 16 and L c = 2.

3.2

The Intermediate and Target Languages

In order to implement the language on a vector machine, the compilation process eliminates all occurrences of the map construct from source terms. In order to achieve this, the target language is enriched with a new construct for parallel application. The compilation process is formalized using a modified map construct, the e-map, forming an intermediate language that includes both source-language maps and target-language parallel applications. The intermediate and target languages also include an alternative syntax for the conditional. This “e-conditional” is added solely to facilitate proofs (e.g. Proposition 5.12); in implementation it is identical to the standard conditional. D EFINITION 3.3 (I NTERMEDIATE , S OURCE AND TARGET E XPRESSIONS ). The intermediate language extends the source language with constructs for parallel application, evaluated conditionals (e-conds) and evaluated maps (e-maps). A, B, C, D, E ::= .. B1 (A1 , .., A` ) ife B then A else C e : Ai he x ⇐ xs

Intermediate expressions Parallel application Evaluated conditional Evaluated map (xi distinct, A a source term)

For an expression E of the intermediate language we say: • E is source term if E is formed using the syntax of Table 3.1; i.e. E contains no parallel applications, evaluated conditionals or evaluated maps, and • E is target term if E contains no maps or evaluated maps. The variables xi are bound in the e-map ‘hx1 ⇐ xs1 , .., x` ⇐ xs` : Ai’, the scope is A. We require that in e : Ai, A be a source term. every e-map, he x ⇐ xs 

In the sequel all references to terms will be to intermediate terms unless explicitly noted. Parallel application is used in the target language to implement maps. E-maps are used as an intermediate form during compilation. Note that in an e-map, the binding expressions must all be variables. E-conditionals are included as an accounting trick to distinguish conditionals which are 20

subject to transformation from those which are not. The utility of this distinction will be made clear in Section 5.2, where we define an alternative semantics of expressions that assigns different costs to the two constructs. For now, the reader can safely ignore the distinction between conditionals and e-conditionals. Throughout this chapter and the next they are treated identically in every detail. The target language has a very direct mapping to the VRAM. Variables in the target language correspond to variables in the VRAM, scalar values correspond to scalar values. Sequence values correspond to vectors, where a sequence of sequences is represented as a vector of references to other vectors. All of the primitives we consider have straightforward sequential and parallel implementations on the VRAM; for example add is represented as scalar addition and add1 is represented as vector addition. Function declarations, function application, let-expressions and the conditional all have direct analogs in the VRAM. The most significant feature of this mapping is the use of references to form nested sequences, i.e., a sequences of sequences is represented as a vector of memory addresses that “point” to other vectors. We believe that this implementation is fairly intuitive and, therefore, do not describe it further. In Chapter 6 we describe a different implementation of nested sequences on the VRAM, which uses segment vectors; this implementation is less obvious and therefore is described in greater detail. Whereas the segment vector implementation has been used in interpreters for NESL and PROTEUS, to our knowledge the reference implementation has never been realized. There is reason to believe that the reference implementation may perform well on the current crop of parallel computers, but this remains to be verified experimentally. The choice of implementation is reflected in our language in the semantics of primitive calls, in particular in the costing of primitive calls. The next section describes the costing metrics for primitives under the reference implementation of nested sequences. In Chapter 6, we describe the costing metrics for the segment-vector implementation. We rely on the reader’s intuitions to establish that the costings we propose are reasonable; we present no proof that these costings can be realized. For the reference implementation, we believe that the costings we assign are “obviously” implementable. For the segment vector implementation, our costings are consistent with those adopted by NESL and PROTEUS. In some respects it would be satisfying to have a formal proof that our semantics for the target language is correctly implementable on the VRAM. Simply to state the result, however, would require several pages worth of definitions, all for a proof we believe to be straightforward. We concentrate on proving the less obvious results.

3.3

The Semantics

The semantics of primitives and terms are given in Tables 3.2 and 3.3, respectively. Table 3.2 also defines the function δ which maps primitive names to types; this is included for reference in the next chapter and can be ignored safely for now. The semantic judgments are of the form: t pe a −→ w d t σ ` A −→ w a

Primitive p evaluates to d under e a. Expression A evaluates to a under σ.

21

Here, t and w indicate the step-count and work-count required for the evaluation. We drop these scripts from the arrow when they are uninteresting. Table 3.2 gives the semantics of all primitives used in the transformation rules. It also gives the semantics of some other primitives so that the reader may see how these might be specified. The table is not exhaustive; we use additional primitives in examples, defining them as necessary. Let us briefly describe the primitives included in Table 3.2. All primitives take a constant number of steps. The functions not and add are self-explanatory operations on scalars. The function all is a reduction that tests to see if all elements of a boolean sequence are “true”. Similarly, sum is a reduction that computes the sum of the elements in an integer sequence. The p-sum primitive is a scan operation that computes the sum over every prefix of an integer sequence. These vector operations take work proportional to the length of their sequence arguments. We describe the remaining “structural” operations in somewhat more detail. empty : Determines whether its argument is an empty sequence; takes constant work. 1 For example: empty a b −→ 1 f. len :

Determines the length of its argument; takes constant work. 1 For example: len a b c d e −→ 1 5.

prom : Promotes its second argument to the length of its first; takes work proportional to the length of its first argument.  1 For example: prom 1 2 3 4, a b −→ 2 a b a b. build` : Builds an `-element sequence from ` arguments; takes work proportional to `. 1 For example: build3 (a, b, c) −→ 3 abc elt :

Selects an element of a sequence; takes constant work.  1 For example: elt 2, a b c d e −→ 1 b

read :

Selects several elements from a sequence; takes work proportional to the number of elements selected.  1 For example: read 2 4, a b c d e −→ 2 bd

rstr :

Restricts a sequence, based on a boolean sequence; takes work proportional to the length of the original sequence.  1 For example: rstr t f t t f, a b c d e −→ 5 a c d.

write : Merges one sequence into another, based on a third boolean sequence; takes  1 work proportional to the length of the original sequence. For example: write a b c, t f t, d e −→ 3 d b e. flat :

Flattens a sequence of sequences; takes work proportional to the length of its sequence argument plus the sum of the lengths of the subsequences. For example: 1 flat a b c d e −→ 7 a b c d e.

part :

Partitions a sequence, based on a sequence of sequences; takes work similar to the flattening primitive.  1 For example: part 1 2 3 4 5, a b c d e −→ 7 a b c d e. 22

Table 3.2 Primitives: Typing Rules and Reference Semantics Scalar operations: δ(not) = bool  cnst bool

1 not bv −→ 1 (¬bv)

δ(add) = (int, int)  cnst int

1 add(m, n) −→ 1 (m + n)

Vector length operations: δ(empty) = α1  cnst bool

1 empty haj inj=1 −→ 1 (n = 0)

δ(len) = α1  cnst int

1 len haj inj=1 −→ 1 n

Reductions and prefix scans: δ(all) = bool1  cnst bool

1 all hbvj inj=1 −→ n (∀j : bvj )

δ(sum) = int1  cnst int

1 n sum hkj inj=1 −→ n ∑j=1 kj

j n 1 p-sum hkj inj=1 −→ ∑i=1 ki j=1 n

δ(p-sum) = int1  cnst int1 Vector constructors: ` z }| { δ(build` ) = (α, .., α)  cnst α1

1 ` build` (a1 , .., a` ) −→ ` hai ii=1  1 n prom hcj inj=1 , a −→ n haij=1

δ(prom) = (β1 , α)  cnst α1

Vector destructors: δ(elt) = (int, α1 )  cnst α  1 elt k, haj inj=1 −→ 1 ak

if 1 ≤ k ≤ n

δ(read) = (int1 , α1 )  cnst α1  1 n read hki im → haki im i=1 i=1 , haj ij=1 −m

if ∀i : 1 ≤ ki ≤ n

Restriction and merging operations: δ(rstr) = (bool1 , α1 )  cnst α1  1 m rstr hbvj inj=1 , haj inj=1 −→ n haji ii=1 δ(write) = (α1 , bool1 , α1 )  cnst α1  1 n write haj inj=1 , hbvj inj=1 , hci im n hdj ij=1 i=1 −→

if ∀1 ≤ i ≤ m : bvji = t and ∀m < i ≤ n : bvji = f if bvj = f implies dj = aj and bvj = t implies dj = c

j

∑i=1 bvi



Nested vector operations: δ(flat) = α2  cnst α1

m 1 i flat haj ikj=1+k −−− → haj inj=1 i−1 i=1 m+n

if k0 = 0 and km = n and ∀i : ki−1 ≤ ki

δ(part) = (β2 , α1 )  cnst α2  1

m m i i part hcj ikj=1+k , haj inj=1 −− −→ haj ikj=1+k m+n i−1 i=1 i−1 i=1

23

if k0 = 0 and km = n and ∀i : ki−1 ≤ ki

As described in Section 2.1, the elt and build primitives are complementary, as are part and flat. Let i be a natural number between 1 and `. Let ass be a sequence of sequences. Roughly, we have:  elt i, build` (a1 , .., a` ) = ai  part ass, flat ass = ass

(2.1) (2.3)

We adopt the write primitive in lieu of the merge primitive described in the previous chapter. Let as be a sequence and let bs be a boolean sequence of equal length. Corresponding to (2.2), we have:  write as, bs, rstr(bs, as) = as

(3.1)

We now turn to the semantics of expression, defined in Table 3.3. For the moment, let us ignore the costing annotations t and w. Extensionally, all of the constructs of the language are standard but for the conditional, which is discussed below. The map and parallel application constructs were described informally in Sections 2.1 and 2.2, respectively; Table 3.3 presents a straightforward formalization. Note that if f is defined ‘letrec f (x) ⇐ D, E’, then D is used in evaluating the application f y, whereas E is used in evaluating the parallel application f 1 ys. We adopt the interpretation that the expression ‘if B then A else C’ is strict in both B and A. (Of course the same applies to the e-conditional, ‘ife B then A else C’.) Thus if ⊥ is a non-terminating function then ‘if f then ⊥ else 42’ also fails to terminate. In effect, our conditional is a one-armed conditional, with a default value. We might better write ‘if B then A else C’ as ‘A unless B then C’, but we find the former more readable. A standard two-armed conditional can be derived: if B then if not B then 42 else A else C

(3.2)

This interpretation of the conditional allows us to simplify the transformation rule for conditionals, described in Equation 2.7. Using our interpretation, if z does not appear free in A or C, then ‘[z ⇐ zs, x ⇐ xs : if z then A else C]’ rewrites, roughly, to:  write [x ⇐ xs : A], not1 zs, [x ⇐ rstr(not1 zs, xs) : C]

(3.3)

We emphasize that we have chosen this non-standard interpretation purely to simplify the transformations; our results extend smoothly to the standard interpretation of conditionals, although the proofs are somewhat longer. Since we a primarily interested in expressions that are to be flattened, the choice is reasonable. As we shall see in the next chapter, in order for ‘if B then A else C’ to be typed as a flat expression (i.e., to be flattenable), either A or C must have constant step complexity; we arbitrarily fix A in order to streamline the presentation. Thus for well-typed expressions of type flat, our interpretation and the standard interpretation are extensionally indistinguishable. t Intentionally, we view the evaluation σ ` A −→ w a as an operation on the runtime environment, σ, which resembles a computer store. Thus, the rule (E-VAR) states that a variable x evaluates to σ(x) with in zero steps. Intuitively, x has already been computed; therefore no work must be done

24

Table 3.3 Reference Semantics (E-VAL)

(E-VAR)

a σ ` a −D − S→ a a

0 σ ` x −→ 0 σ(x)

(E-LET) tB σ ` B −− b w→ B

(E-LETREC)

tA σ, x ⇐ b ` A −− a w→ A

tA σ, f ⇐ e x D,E ` A −− a w→ A

tB +tA σ ` let x ⇐ B in A −w− −−→ a B +wA

tA σ ` letrec f ⇐ e x D,E in A −− a w→ A (E-IF F )

(E-IF T )

tB σ ` B −− f w→ B

tB σ ` B −− t w→ B

tA σ ` A −− a w→ A

tA σ ` A −− a w→ A

tC σ ` C −− c w→ C

1+tB +tA σ ` if B then A else C −1+w − −−B− −→ a +w A

1+tB +tA +tC σ ` if B then A else C −1+w − −− −−−−→ c B +wA +wC

(E-APP P )

(E-APP F )

ti { σ ` Ai −w→ ai }`i=1 i

{

ti σ ` Ai −w→ ai }`i=1 i tD σ,e x⇐e a ` D −− d w→ D

p pe a −wt→ d p

ti +1)+tp e −−−−1+( σ ` pA −∑− −−−−−−→ d 1+(∑w− i +1)+max(1,wp )

1+(∑ti +1)+tD e −− σ `fA −− −−−−−→ d 1+( ∑wi +1)+wD

(E-PAPP P )

(E-PAPP F )

ti { σ ` Ai −w→ haji inj=1 }`i=1 i

{

{

p p aej −wt→ dj j

}nj=1

∑ti +1)+tp e −−−−−2+( σ ` p1 A −−−−−−−−→ hdj inj=1 1+n+(∑− w− i +1)+max(1,∑wj )

n≥1

(E-EMAP)

σ` { σ,e x ⇐ bej ` {

σ(f ) = e x D,E

ti σ ` Ai −w→ ai }`i=1 i tE σ,e x⇐e a ` E −− e w→ E

1+(∑ti +1)+tE e −− σ ` f 1A −− −−−−−→ e 1+( ∑wi +1)+wE

σ(f ) = e x D,E

(E-MAP) 0 n xsi −→ 0 hbji ij=1 tj A −w→ aj j

ti σ ` Bi −w→ hbji inj=1 }`i=1 i tj { σ,e x ⇐ bej ` A −w→ aj }nj=1 j   1+(∑ti )+(max tj ) e : A −− σ` e x⇐B −−−∑− −−−→ hainj=1 1+n+( w− i )+(∑wj )

}`i=1 }nj=1

tj e : Ai −1+max σ ` he x ⇐ xs − −− haj inj=1 n+ w→ ∑− j

{ n≥1

25

in order to determine its value. This should be contrasted with (E-VAL) which states that evaluation of a constant a, requires steps proportional to the depth of a and work proportional to the size of a. Intuitively, the value a must be stored before evaluation can terminate, and this requires some work. Explicit sequencing, via the let construct, incurs no cost. Intuitively, sequencing is essential to any computation, and thus it makes no sense to charge extra when it happens to be represented syntactically. Roughly, we wish to ensure that certain equations from Moggi’s monadic metalanguage hold true, such as: f A = let x ⇐ A in f x In order to compute f A, one must first compute A. In let x ⇐ A in f x the sequence events is simply made explicit, it is not changed. Function declaration also incurs no cost. This interpretation is justified by the typing rules given in the next chapter. Roughly, functions must be explicitly parameterized, i.e. they cannot refer to free variables (except to other functions). This means that all function declarations can be processed statically, with no runtime cost. The rules for the conditional should be easily understandable given the discussion above. (The rules for e-conditionals are identical and are not explicitly stated.) The conditional charges one step in order to select whether or not expression C is evaluated. The rules (E-APP P ) and (E-PAPP P ) give the semantics for primitive application. Note that, because of the side condition n ≥ 1, parallel application is not defined on empty sequences; for example, ‘σ ` not1 hi’ does not evaluate. Our transformation rules guarantee that parallel application is only introduced on non-empty sequences. In addition to charging for the evaluation of the arguments and of the primitive itself, we charge one step for storing each argument and one step for storing the result. The parallel version requires that we store n + 1 results — the n results of the subproblems plus one for the cumulative result. In both cases, we are careful to make sure that the work complexity is never less than the step complexity. A bit of care is required since the primitive semantics does not enforce 1 this. For example, sum hi −→ 0 hi. This explains the max term in the work count. The rules (E-APP F ) and (E-PAPP F ) give the semantics of function application. Again we charge one step for each argument and one step for the result. The rules differ only in which function body is chosen, the sequential body D or the parallel body E. Finally, (E-EMAP) and (E-MAP) give the semantics of e-maps and maps. These correspond closely to the intuitive semantics given in Section 2.3. The expressions determining the iteration space are evaluated sequentially; the map operation itself is evaluated in parallel. Parallelism is manifest in the use of the max function to determine the step count. In the treatment of “arguments” and “results”, maps are somewhere between let-expressions and applications. We charge for storing the result, but we do not charge for iterating through the iteration space. As for parallel primitive application, e-maps are only defined over non-empty sequences; for example if σ(xs) = hi, then σ ` hx ⇐ xs : Ai does not evaluate. Again, our transformation rules guarantee that e-maps are only introduced on non-empty sequences. We conclude this section by presenting two results concerning the semantics. Lemma 3.4 states that the work complexity of an expression is always greater than the step complexity of an expression. 26

Proposition 3.5 states that the evaluation relation is deterministic; i.e., each configuration has at most one evaluation. t L EMMA 3.4. σ ` D −→ w d implies w ≥ t ≥ 0 t Proof. By induction on the judgment σ ` D −→ w d, using the rules in Table 3.3. We provide a few t examples. Suppose that σ ` D −→ w d via ( E - VAL ) because D = d, t = D d and w = S d. One can easily establish that for all d, S d ≥ D d, thus establishing the result. t Suppose that σ ` D −→ w d via ( E - MAP ). Then for some ` ≥ 0, n ≥ 0, 1 ≤ i ≤ ` and 1 ≤ j ≤ n, we have

t = (∑ti ) + (max tj ) + 1

e : A] D = [e x⇐B d=

w = (∑wi ) + (∑wj ) + 1 + n

hainj=1

where ti σ ` Bi −w→ hbji inj=1 i

tj aj σ,e x ⇐ bej ` A −w→ j

by shorter inference. By induction, we have that for all i and j, wi ≥ ti ≥ 0 and wj ≥ tj ≥ 0. Therefore (∑wi ) ≥ (∑ti ) and (∑wj ) ≥ (max tj ). In addition, (1 + n) ≥ 1. By monotonicity of addition, the result holds. t Suppose that σ ` D −→ w d via ( E - APP P ). We make use of another auxiliary result: t pe a −→ w d implies max(w, 1) ≥ t ≥ 0

The result then follows by induction as in the previous example.



P ROPOSITION 3.5 (D ETERMINISM ). t t0 If σ ` D −→ → d0 then t = t0 , w = w0 and d = d0 . w d and σ ` D − w0 Proof. First we note similar results for the depth and size functions and for the primitives: 0

t t 0 0 0 0 pe a −→ a −w→ 0 d imply t = t and w = w and d = d w d and p e

t The result then follows by induction on the judgment σ ` D −→ w d. t For example, suppose σ ` C −→ w d via ( E - APP F ). Then C = f (A1 , .., A` ) for some terms A1 , . . . , A` and variable f , where for some D and E:

σ(f ) = (x1 , .., x` ) D,E Let i range between 1 and `. Then it must be that t = 1 + (∑ti + 1) + tD

w = 1 + (∑wi + 1) + wD

because ti σ ` Ai −w→ ai i

tD σ,e x⇐e a ` D −− d w→ D

27

Table 3.4 Transformations: Context Rules (X-CTXT A ) (X-CTXT L1 ) (X-CTXT L2 ) (X-CTXT C1 ) (X-CTXT C2 ) (X-CTXT E1 ) (X-CTXT E2 ) (X-CTXT R1 ) (X-CTXT R2 ) (X-CTXT R3 )

e B0 A let x ⇐ B0 in A let x ⇐ B in A0 if B then A0 else C if B then A else C0 ife B then A0 else C ife B then A else C0 letrec f ⇐ e x D0 ,E in A letrec f ⇐ e x D,E0 in A letrec f ⇐ e x D,E in A0

e BA let x ⇐ B in A let x ⇐ B in A if B then A else C if B then A else C ife B then A else C ife B then A else C letrec f ⇐ e x D,E in A ⇐ letrec f e x D,E in A ⇐ letrec f e x D,E in A

if if if if if if if if if if

B0 B0 A0 A0 C0 A0 C0 D0 E0 A0

B B A A C A C D E A

Table 3.5 Transformations: let Introduction and Elimination (X-ILET A ) (X-ILET P ) (X-ILET C ) (X-ILET E ) (X-ILET M )

(X-ELET 1 ) (X-ELET 2 )

B (A1 , .., Ai , .., A` ) let x ⇐ Ai in B (A1 , .., x, .., A` ) 1 B (A1 , .., Ai , .., A` ) let x ⇐ Ai in B1 (A1 , .., x, .., A` ) if B then A else C let x ⇐ B in if x then A else C ife B then A else C let x ⇐ B in ife x then A else C   x1 ⇐ B1 , .., xi ⇐ Bi , .., x` ⇐ B` : A   let xsi ⇐ Bi in x1 ⇐ B1 , .., xi ⇐ xsi , .., x` ⇐ B` : A let x ⇐ y in A let x ⇐ B in A

A{|y/x|} A

Ai ∈ / Var Ai ∈ / Var B ∈ / Var B ∈ / Var Bi ∈ / Var

x∈ / fv(A)

0

t e then σ ` C −→ by shorter inference. Since C = f A, d0 must also be derived via (E-APP F ); no other w0 rule applies to this syntactic form. Therefore: 0

ti 0 σ ` Ai −w→ 0 ai

0 t0 = 1 + (∑ti0 + 1) + tD 0

w =

i

σ,e x⇐e a0

1 + (∑w0i + 1) + w0D

`D

0 tD −− → w0D

d0

By induction ti0 = ti , w0i = wi and a0i = ai . This last fact allows us to apply induction again, concluding 0 = t , w0 = w and d 0 = d. Thus: that tD D D D d0 = d

3.4

t0 = 1 + (∑ti + 1) + tD

w0 = 1 + (∑wi + 1) + wD



The Transformations

The transformation relation A A0 is defined in three tables. The context rules are in Table 3.4. The transformations for let introduction and elimination are given in Table 3.5. The main rules are in Table 3.6. In all of the rules, variables introduced on the right-hand-side of the transformation must be fresh, that is, they may not appear free in any subexpression given anywhere in the rule.

28

Table 3.6 Transformations: Map Rules e : A] [e x ⇐ xs

(X-IMAP)

fv(A) \e x = {y1 , .., y` } A is a source term

ife empty xsh then hi else let ys1 ⇐. prom(xsh , y1 ) .. let ys` ⇐ prom(xsh , y` ) e e e : Ai in he x ⇐ xs, y ⇐ ys

x∈ / fv(A)

prom(xs, A)

x∈ / fv(A)

(X-CONST)

e x ⇐ xs, ez ⇐ zs e : Ai he y ⇐ ys, hx ⇐ xs : Ai

e ez ⇐ zs e : Ai he y ⇐ ys,

(X-VAR)

hx ⇐ xs : xi

xs

(X-APP)

e : B (xi1 , .., xi` )i he x ⇐ xs

B1 (xsi1 , .., xsi` )

e : if xh then A else Ci he x ⇐ xs

e : Ai let ys ⇐ he x ⇐ xs in ife all xsh then ys else let zs ⇐ not1 xsh let zs0 ⇐ rstr(zs, zs) in write(ys, zs, hz ⇐ zs0 : Ci)

(X-EMAP)

(X-LET) (X-IF 1 )

e : let z ⇐ B in Ai he x ⇐ xs

e : Bi in he e z ⇐ zs : Ai let zs ⇐ he x ⇐ xs x ⇐ xs,

fv(C) = 0/

(X-IF 2 )

e : if xh then A else Ci he x ⇐ xs fv(C) = {xi1 , .., xi` } 6= 0/

(X-MAP 1 )

e : [y1 ⇐ xi1 , .., ym0 ⇐ xim0 : A]i he x ⇐ xs

e : Ai let ys ⇐ he x ⇐ xs in ife all xsh then ys else let zs ⇐ not1 xsh let xs0i1 ⇐. rstr(zs, xsi1 ) .. 0 ⇐ let xsi` rstr(zs, xsi` ) e 0 : Ci) in write(ys, zs, he x ⇐ xs

fv(A) = 0/

(X-MAP 2 )

e : [y1 ⇐ xi1 , .., ym0 ⇐ xim0 : A]i he x ⇐ xs  fv(A) = nxk1 , .., xkp o ∪ yk10 , .., ykq0

6= 0/

ife all empty1 xsh then prom(xsh , hi)) e : Ai) else prom1 (xsh , he x ⇐ xs ife all empty1 xsh then prom(xsh , hi)) else let xs0k1 ⇐. flat(prom1 (xsih0 , xsk1 )) .. let xs0kp ⇐ flat(prom1 (xsih0 , xskp )) let ys0k0



let ys0kq0



1

.. .

flat(xsik0 ) 1

flat(xsik0 ) q

e 0, e e 0 : Ai) in part(xsih0 , he x ⇐ xs y ⇐ ys

29

The general transformation strategy is as follows. The context and let introduction rules are used to isolate a map expression. In certain contexts, let introduction must be used to isolate a map expression. For example, in order to isolate B in if B then A else C, one must first use (X-ILET C ). Once a map expression is found, the let introduction rule (X-ILET M ) is applied until the iteration space of the map is described entirely by variables. At this point (X-IMAP) is used to remove the map construct, replacing it with an e-map. The remaining rules of Table 3.6 are then used to “push” the e-map through the syntax until it can be removed using (X-CONST), (X-VAR) or (X-APP). The rule (X-IMAP) enforces two properties of e-maps which the other transformations preserve. First, it guarantees that e-maps are only invoked dynamically on non-empty sequences. Second, it guarantees that e-maps have no free variables. All free variables in a map are explicitly bound before the map is replaced with an e-map. The transformation rules have a different flavor than those of [11] which we described in Section 2.2.  e : [e e: Blelloch and Sabot’s rules work “inside-out”. That is, confronted with a nested map e x ⇐ xs y ⇐ ys  ⇐ e : A] to arrive at an expression D, then flatten the A] , they would first flatten the inner map [e y ys ⇐ e : D]. Our transformations, instead work from the “outside-in”, as in [79]. The subsequent map [e x xs difference is a technical one. “Outside-in” transformations introduce less redundant code; as a result we found them easier to work with in proofs. They may also produce more efficient code (or more efficient compilers), but this remains to be studied. We will make only a few comments about the transformation rules. The context and let rules are self explanatory, and the other rules are motivated in Section 2.2. The reader may find it useful to compare the rules given in Section 2.2 with the corresponding rules in Table 3.6. The rules (X-ELETi ) and (X-EMAP) allow for the elimination of useless let and e-map binders. Note that (X-ELET 2 ) may reduce the cost of an expression, while all of the other let rules have no effect on cost. The transformation rules for conditionals (X-IFi ) and maps (X-MAPi ) come in two flavors, depending upon the free variables in the expression. In both cases, the more important of the two rules is the second; the first rule is applied only when the second would make no sense. ?

We conclude this section by defining the multi-step transformation relation D D0 and by proving ? that every source expression can be transformed to a target expression via . This relation is the ? reflexive and transitive closure of . That is, D D0 if D can be rewritten to D0 using zero or more transformation steps. D EFINITION 3.6. For each n ≥ 0, we define

D We write D

?

D0 if D

n

0

n

to be the least relation that satisfies the following: D D0 n D0 D00

D

D

n+1

D00

D0 for some n.



P ROPOSITION 3.7. For any expression A, there exists a series of transformations A A0 is a target expression. Proof. An easy consequence of the following two lemmas. 30

?

A0 such that 

Table 3.7 Termination Metric χ(x) χ(a) χ(p) e χ(BA) 1 e χ(B A)

= = = = = = = = = = =

0 0 0 χ(B) + ∑ χ(Ai ) + ∑[Ai ∈ / Var] χ(B) + ∑ χ(Ai ) + ∑[Ai ∈ / Var] χ(D) + χ(E) + χ(A) χ(B) + χ(A) + [B ∈ Var] + [x ∈ / fv(A)] / Var] + 15 · |A| ∑ χ(Bi ) + χ(A) + ∑[Bi ∈ χ(A) + 14 · |A| χ(B) + χ(A) + χ(C) + [B ∈ / Var] χ(B) + χ(A) + χ(C) + [B ∈ / Var]

|x| |a| |p| e |BA| 1 e |B A|

= = = = = = = = = = =

1 1 1 1 + |B| + ∑ |Ai | 1 + |B| + ∑ |Ai | 1 + |D| + |E| + |A| 1 + |B| + |A| 1 + ∑ |Bi | + |A| 1 + |A| 1 + |B| + |A| + |C| 1 + |B| + |A| + |C|

χ(letrec f ⇐ e x D,E in A) χ(let x ⇐ B in A) e : A]) χ([e x⇐B e : Ai) χ(he x ⇐ xs χ(if B then A else C) χ(ife B then A else C)

|letrec f ⇐ e x D,E in A| |let x ⇐ B in A| e : A]| |[e x⇐B e : Ai| |he x ⇐ xs |if B then A else C| |ife B then A else C|

L EMMA 3.8. Any non-target expression can be transformed. Proof. By induction on the syntax. For example, if the expression is a map, it can be transformed using (X-ILET M ) and (X-IMAP). If it is an evaluated map, it can be transformed using at least one rule in Table 3.6. If it has any other form, it can be transformed using a mix of let rules (Table 3.5) and context rules (Table 3.4, applying induction.  L EMMA 3.9. Transformations are terminating. Proof. By induction on the judgment A A0 , one can show that A A0 implies χ(A) > χ(A0 ), where for any expression C, χ(C) is a natural number — the metric χ is defined in Table 3.7. Thus the transformations are terminating. Intuitively, the metric requires that we “pay” for iterators and for “complex” expressions that can be simplified using the let rules. In Table 3.7, we write [A ∈ Var] to be 1 if A ∈ Var and 0 otherwise; likewise for other predicates. The definition of χ(A) uses the multipliers 14 and 15 for e-maps and maps. These factors allow the e-map transformation rules to be written in a readable fashion: transformations introduce code that subsequently requires let transformations. The number 31

14 is chosen because no e-map transformation introduces code that requires more than 14 subsequent let transformations. The rule (X-MAP 2 ) requires one additional transformation for each free variable in an iterator — thus we use 15 for maps rather than 14. 

3.5

Proving the Transformations Correct

In order to demonstrate the extensional correctness of the transformations, one can show the following (a variation on this is a corollary of Theorem 5.2): if D

?

D0 then σ ` D −→ d implies σ ` D0 −→ d

(3.4)

This states that transformation preserves the extensional meaning of programs, in so far as it is defined. Here we allow programs to be “improved” in translation; a program that never terminates can be transformed into one that does. This allowance for improvement is mandated by the rule (X-ELET 2 ), which invalidates the stronger bi-implication (σ ` D −→ d iff σ ` D0 −→ d), as the following example shows. E XAMPLE 3.10. (X-ELET 2 ) may convert a non-terminating program into one that terminates. For example, consider the following expressions: D = letrec f () ⇐ f () in let x ⇐ f () in 42 D0 = letrec f () ⇐ f () in 42 D

D0 , but D never evaluates, whereas D0 evaluates to 42 in one step.



We wish to show something stronger than (3.4). Our goal is to show that the transformations preserve computational cost, in some sense, not just extensional meaning. That is, we wish to show ? that D D0 implies D  D0 , for some interesting semantic relation . We would like D  D0 to capture the intuition that if D reduces to a value, then D0 reduces to the same value and does so as fast or faster. 6t 0 0 N OTATION . We write “σ ` D −6− → w d” to abbreviate: “there exists t ≤ t and w ≤ w such that σ ` t0 D −w→  0 d.”

As a first attempt, we might take D  E if for all σ: t 6t σ ` D −→ − →d w d implies σ ` E − 6w

(3.5)

This relation is known as strong improvement. We will revisit this relation in Section 5.3, using an alternative semantics. For now, we note that this relation is too strong to be useful directly. The transformations do not preserve this notion of strong improvement. E XAMPLE 3.11. Consider the rule (X-CONST), which states that if x ∈ / fv(A) then: hx ⇐ xs : Ai

prom(xs, A)

Here the target takes four steps more than the source, providing a counter example to the claim that transformation implies strong improvement. Computing effort is also increased by the transformations (X-IMAP), (X-IFi ) and (X-MAPi ).  32

If our transformations increase computational costs, why are they useful? To answer this question we recall our discussion of implementations in Section 2.4. Our operational semantics is defined primarily for programmers; it does not give rise directly to a pragmatic implementation of the language on any actual parallel computer. Flattening, therefore, is of great practical interest because it has been demonstrated to work reasonably well on real machines. Essentially, flattening simplifies the problem of scheduling threads by limiting synchronization to the implementation of parallel primitives. Our semantics is far too abstract to capture such implementation details. One might argue that the semantics is too abstract, advocating that we change the costing of expressions to more accurately reflect the implementation. Yet the semantics, and in particular the costing function, is useful precisely because is abstract. Tying the semantics closely to a particular implementation means that whenever the implementation changes the semantics must change as well. Given several implementations of a language, which semantics should we chose? High-level program constructs and abstract costing functions allow programmers to create code that is both portable and efficiently executable. More detailed cost models can be difficult to understand and may have surprising properties, making them harder for programmers to master and use. Rather than abandon our costing functions, we introduce a bit of “slack” into the correctness relation. While we cannot prove that flattening strictly improves performance with respect to our operational semantics, we can prove that it does so up to a constant factor. Formally, we will define D  E if there exist constants u and v such that for all σ: t 6 u·t σ ` D −→ − −→ d w d implies σ ` E − 6 v·w

(3.6)

This relation is called asymptotic improvement. We study its properties in Chapter 5. Let us explain our use of the word “asymptotic”. Traditionally, the first step in the analysis of a program is to map the input domain of the program to some numeric domain; this is done in order to determine an ordering on inputs. For example, we say that the running time of insertion sort on array  xs is O n2 , where n is the length of xs. Here we have implicitly defined a map on the input domain which maps each input environment σ to the length of σ(xs). The analysis relies on the notion that one input is larger than another, allowing us to say that “for sufficiently large inputs” certain factors, such as additive constants, can be ignored. Thus the term “asymptotic”. Such orderings, however, are highly problem-specific. For example in analyzing a function over sequences, the running time may depend on the length of the sequence, the magnitude of the first element, or any other metric one can conceive of. We are interesting in showing that the flattening transformations preserves running time “asymptotically” for all programs. We cannot, of course, presuppose any particular metric on the input domain. Thus we are lead to the ordering defined in (3.6), referring to it as “asymptotic” even though no asymptote is mentioned. As discussed in Section 2.4, flattening does not imply asymptotic improvement for all programs in our language. We recall the example given in (2.8).

33

E XAMPLE 3.12. Suppose that f (x) is defined as follows: if x ≤ 1 then 1 else if even x then if not even x then 42 else f (x − 1) else f (x − 1) Here we have used (3.2) to encode (2.8) under our non-standard interpretation of the conditional. After transformation, f 1 (xs) is defined roughly as follows. To improve readability, we have simplified the expression somewhat while retaining its essential qualities. We also adopt an intuitive infix notation  for parallel primitive application. For example, we write ‘leq1 xs, prom(xs, 1) ’ as ‘xs ≤1 1’. ife all (xs ≤1 1) then prom(xs, 1) else let ys ⇐ f 1 (xs −1 1) let zs ⇐ not1 even1 xs in ife all even1 xs then ys else write(ys, zs, f 1 (rstr(zs, xs) −1 1) The expression [x ⇐ hn, n − 1i : f x] evaluates to h1, 1i in O (n) steps. The transformed expression f 1 hn, n − 1i evaluates to the same result, but takes O (2n ) steps.  To eliminate such programs, we introduce a typing system in the following chapter. Then, in Chapter 5, we prove that for well-typed expressions, transformation implies asymptotic improvement.

34

Chapter 4 A Typing System for Containment We introduce a typing system that captures the essential properties of containment using three complexity annotations, cnst, flat and exp. We prove that evaluation and transformation preserve typing. We also prove an important property of cnst expressions, described below. Containment is described in Section 2.4. For a formal definition, see [6]; we do not give one here, nor do we directly prove any properties of flat expressions (which correspond to contained expressions). The significance of flat expressions is made clear in the proof of Proposition 5.12 where the typing rules for flat are used in conjunction with Proposition 4.4 to prove the flattening transformations correct.

4.1 Types and Subtypes The syntax of types is parameterized with respect to a set TVar of type variable names, α, β. D EFINITION 4.1 (T YPES , T YPE E NVIRONMENTS , T YPE S UBSTITUTIONS ). The syntax of complexity annotations, Φ, Ψ, value types, U, V, types, S, T, and type environments, Γ, ∆, is given in Table 4.1. The type variables occurring in Ui are bound in the function type ‘(U1 , .., U` )  Φ V’, the scope is V. A type substitution is a map from type variables α to value types V. Let ϖ range over type substitutions. We use notations similar to those for value substitutions; thus, we write T{|Ve/αe|} for the term resulting from the simultaneous capture-avoiding substitution of Vi for αi in T.  The type language is stratified between value types U and types T. The latter include function types. Functions are constrained to act over values. The type (U1 , .., U` )  Φ V is assigned to a function that takes ` arguments of the appropriate types and evaluates to a value of type V; additionally, the function body is constrained to be an expression with complexity Φ. Thus, a functions type tells us something of how it evaluates. The complexity annotation cnst refers to constant-step expressions, flat refers to contained expressions, and exp refers to all expressions. Every constant-step expression is contained, and every contained expressions is an expression. This gives rise to a natural ordering on complexity annotations and, by extension, to types. D EFINITION 4.2 (S UBCOMPLEXITY, S UBTYPING ). The subcomplexity relation (notation Φ > >> > @ I

α

γ

δ



A The significance of the corollary is twofold. First, it states that intermediate abstractions can be used to prove more abstract properties. Second, it states that properties that hold for the most abstract model also hold for models at intermediate levels of abstraction. This suggests, for example, that an interpretation that distinguishes some values may be used to prove properties that ignore values altogether. Thus, in order to prove properties of a concrete system, users of our framework may employ many abstraction functions, starting with the most abstract; if the desired result can be proven at the most abstract level, then the task is done, otherwise more concrete models may be used.

10.2

Soundness

In the previous section we showed how to abstract the model of a process in such a way that properties of the abstract model hold also for the original. This technique, however, requires that the concrete model be constructed, an impossibility in the case that the concrete model is infinite state. In this section we advocate an alternative method: rather than abstracting the concrete model, one simply constructs a model using the abstract semantics of Table 9.3. We show that if the value abstraction is sound, then properties of the resulting abstract model will also hold for the concrete model. The advantage of this approach is clear: the concrete model need never be constructed. Not every abstract semantics is sound. Consider the expression d = pos + neg. Is {pos} a reasonable meaning for d? Intuitively, this interpretation is unsound because it allows us to conclude that the result of the expression is always positive, yet we know that 1 + –2 is negative. Thus, soundness requires that the abstract semantics must provide a meaning that is general enough to capture every possible concrete interpretation of an expression. In other words, a value interpretation A is sound with respect to interpretation C if for all expressions, A yields no more precise a result than C. D EFINITION 10.11 (S OUNDNESS ). A (non-symbolic) value abstraction α: C → A is sound iff for all e ∈ VExprC : JαeKA ⊇ αJeKC A symbolic value abstraction αc1 ,.., cn : C → A is sound iff for all vi ∈ Jξi KC , αv1 ,.., vn is sound. Diagrammatically, the condition for soundness can be drawn: VExprC (Soundness)

α



VExprA

J·KC

J·KA

93

/ ValC 

α

/ ⊇ ValA



There are several alternative characterizations of soundness that can help in understanding the definition. We present two. By Proposition 10.5, the soundness requirement is equivalent to γJαeKA ⊇ JeKC , which has the diagram: VExprC (Soundness)

α

J·KC

/ ⊆ ValC O γ



VExprA

J·KA

/ ValA

Using Propositions Proposition 10.7 and Proposition 10.6 the soundness criterion can also be written γJdKA ⊇ JγdKC , with the corresponding diagram: VExpr O C

J·KC

γ

(Soundness)

VExprA

/ ⊆ ValC O γ

J·KA

/ ValA

Returning to our example expression d = pos+neg soundness requires that JdKA ⊇ {neg, zero, pos} . The abstract semantics can yield no more precise an answer; if it did, it would be “wrong” for at least one of the following concrete expressions: 2 + –1, 2 + –2, 2 + –3. One important quality of soundness is that for compositional languages a sound semantics can be constructed by considering each language construct separately. P ROPOSITION 10.12 ([57]). If C and A are value interpretations with compositional semantics (defined in Section 9.1) and for every vop ∈ Vop and v1 , .., vn ∈ ValC Jvop(αv1 , .., αvn )KA ⊇ αJvop(v1 , .., vn )KC then for every e ∈ VExprC : JαeKA ⊇ αJeKC



This gives a clue as to how one may go about constructing a useful sound semantics, but it does not solve the problem. Let us limit our discussion to compositional languages. Before one can use the techniques advocated here, one must “invent” a semantics for each value operation and prove that it is sound. Given a ground semantics for vop, the abstraction function α induces two abstract semantics: the trivial and the optimal versions of vop. The trivial operator is defined: Jvopξ (αv1 , .., αvn )KA = JξKA The optimal operator, instead, is defined: Jvopξ (αv1 , .., αvn )KA = αJvop(γv1 , .., γvn )KC The trivial semantics is always automatically generable; however this semantics often does not suffice to prove interesting properties of the system. In some cases the optimal semantics of an operator is also automatically generable; for example, when the concrete value domain is finite. If the optimal semantics can be generated automatically, then all the user needs to do to define an optimal abstract 94

interpretation is to provide the abstraction function α. (For finite domains, Boolean decision diagrams (BDDs) have been put to good use in speeding the automatic generation of optimal semantics for value operators [22]; the use of BDDs for this purpose is entirely compatible with our work.) If the optimal semantics is not automatically generable, one may be able to use techniques from the theory of abstract interpretation [32, 57] to aid the construction of a usable sound semantics; it remains to future work to fully explore this alternative. We now state and prove the main theorem of this chapter: if an abstract value-language semantics is sound, then the corresponding abstract process-language semantics is sound. It is worth noting that the proof presented here is much simpler than the proof of a weaker result that we presented in [29]. T HEOREM 10.13 (S OUNDNESS OF P ROCESS S EMANTICS ). If α : C → A is a sound (non-symbolic) value abstraction, then for all p ∈ PExprC : F JαpKαF A @ + A αJpKC

If αc1 ,.., cn : C → A is a sound symbolic value abstraction, then for all p ∈ PExprC and for all vi ∈ Jξi KC : F @ Jαv1 ,.., vn pKαF A + A αv1 ,.., vn JpKC Proof. It suffices to show that the theorem holds for non-symbolic abstractions. For symbolic abstractions, the result then holds by quantifying over the possible abstractions αv1 ,.., vn and applying the result for non-symbolic abstractions in each case. Recall that −→C and −→A are defined in Table 9.3. To prove the theorem, it is enough to show the following, for all r, r0 in PExprC : av aαv 0 0 (1) r −→ C r implies αr −→A αr

(2)

a a αr −→ A implies r −→C

The proof of each property proceeds by rule induction, with case analysis on the structure of r. The rules (VPL-OMEGA) and (VPL-INT) are trivial. Suppose that the rule (VPL-IN) is used to derive a transition from r = cξ ?x.P. (VPL-IN) states: c?v v cξ ?x.P −→ I P{| /x|} , for all v ∈ JξKI

For (1), consider an arbitrary v ∈ JξKC . Then we must demonstrate that: c?αv v cξ ?x.αP −→ A α(P{| /x|}) c?αv αv By Proposition 10.3, it suffices to show that cξ ?x.αP −→ A α(P){| /x|}. But this follows immediately from (VPL-IN), since αv ∈ JξKA . For (2), it is sufficient to note that both JξKA and JξKC are nonempty; c? c? therefore both αr −→ A and r −→C are always derivable from this rule. Suppose (VPL-OUT) is used to derive a transition from r = c!e.p. (VPL-OUT) states: c!v c!e.p −→ I p , for all v ∈ JeKI c!αv For (1), we show that c!αe.αp −→ A αp for each v in JeKC . This follows from the fact that the value interpretation is sound, therefore JαeKA ⊇ αJeKC . For (2), note that both JeKA and JeKC are nonempty; c! c! therefore both αr −→ A and r −→C are always derivable from this rule.

95

Suppose (VPL-COND) derives a transition for r = if e then p else q. We treat the first clause of (VPL-COND); the case for ff is similar: τ if e then p else q −→ I p , if tt ∈ JeKI

τ (2) is trivial. For (1), we show that if tt ∈ JeKC , then if αe then αp else αq −→ A αp. This follows from the fact that the value interpretation is sound, therefore JαeKA ⊇ αJeKC . The cases for the rules (VPL-EXT), (VPL-PAR), (VPL-RES), and (VPL-REN) are very similar to each other. As examples, we discuss (VPL-PAR 1 ) and (VPL-PAR 2 ). (VPL-PAR 1 ) states: av 0 p −→ I p av 0 p|q −→ I p |q τ()

τ Recall that τ actions are of sort unit and that we write −→ I to abbreviate −→I ; therefore the “pattern” av 0 av in the rule matches any label, internal or external. For (1), assume that p|q −→ C p and therefore av aαv aαv p −→C p0 by a shorter inference. By induction, αp −→A αp0 ; therefore, applying the rule, αp|αq −→ A 0 αp |αq as required. For (2) the argument is similar. Suppose that (VPL-PAR 2 ) derives a transition for r = p|q. Rule (VPL-PAR 2 ) states: c?v 0 p −→ I p

c!v 0 q −→ I q

τ 0 0 p|q −→ I p |q c?v c!v τ 0 0 0 0 For (1), assume that p|q −→ C p |q , and therefore p −→C p and q −→C q by shorter inferences. c?αv c!αv τ 0 0 By induction, αp −→A αp0 ; and αq −→A αq0 ; therefore, applying the rule, αp|αq −→ A αp |αq as required. Again, the argument is very similar for (2). Finally, let us consider the case in which (VPL-REC) is used to derive a transition for r = f (e). In this case, the rules for C and A are distinct, since A relies on an abstracted function declaration αF. We can write the rules as follows: av 0 Pi {|e/x|} −→ Cp av

fi (e) −→C

p0

def

fi (x) = Pi in F

av 0 (αPi ){|e/x|} −→ Ap av

fi (e) −→A

p0

def

fi (x) = Pi in F

av av 0 0 e For (1), assume that fi (e) −→ C p and therefore Pi {| /x|} −→C p by a shorter inference. By induction, aαv aαv 0 0 α(Pi {|e/x|}) −→C p , and therefore by Proposition 10.3 (αPi ){|αe/x|} −→ C p . Applying the rule, we have aαv 0 fi (αe) −→  A p as required. Again, the argument is very similar for (2).

10.3

Quality of Abstract Semantics

While soundness is a necessary condition for an abstract semantics to be useful, it is not sufficient. For example, for any α, one can define a trivial abstract semantics for values: Jeξ KA = JξKA . (Another way of achieving the same result is to use the trivial interpretation defined in Section 9.1.) While clearly sound, this semantics does not convey useful information about the concrete expression e that it is supposed to approximate. For our example abstraction posneg, the semantic function that maps all expressions to {neg, zero, pos} would be an example of such a trivial, yet sound, semantics. Of course, the quality of the abstract semantics of our process language is dependent on the quality of the abstract value language semantics that is provided. In this section, we argue that our process 96

semantics preserves as much information as possible from the value semantics. To begin with, we note that improvements in the value semantics result in improved process semantics. P ROPOSITION 10.14. Let A and B be value interpretations with ValA = ValB (therefore VExprA = VExprB and PExprA = PExprB .) Further, let A be a sound abstraction of B. If there exists some e such that JeKA ) JeKB , then there exists some q such that JqKA @ X+ JqKB . Proof. Since A is a sound abstraction of B, we have (by Theorem 10.13) that for all p, JpKA @ + JpKB . Let e be an expression such that JeKA ) JeKB , and let q = c!e.0. Then JqKA @  X+ JqKB . At the opposite extreme from the trivial semantics—described at the top of this section—is the most precise sound semantics, called the optimal or induced semantics [33]. The optimal semantics is traditionally used as the standard by which any abstract interpretation must be compared. α

P ROPOSITION 10.15 ([33]). Given a Galois insertion C −→ ←− γ A, the optimal semantics is induced by the equation ∀d ∈ VExprA : JdKA = αJγdKC , or diagrammatically: VExpr O C (Optimality)

/ ValC

J·KC

γ

VExprA

J·KA



α



/ = ValA

Unfortunately, the definition of optimality does not give a terminating algorithm for computing optimal abstract semantics for even a single operator, much less for the entire language. In addition, whereas Proposition 10.12, states that a sound semantics can be derived by constructing a sound implementation of each operator in the language, the same does not hold true for optimality. Consider the integer operators one and dec, defined as follows: one(v) = 1 dec(v) = v − 1 In the posneg abstraction, the optimal abstract operators are: one(v) = {pos}  {zero, pos} dec(v) = {neg}

, if v = pos , otherwise

Under this interpretation, the expression dec(one(zero)) evaluates to dec(pos) = {zero, pos}. The induced semantics, however, requires that dec(one(zero)) be equal to α(dec(one(0)) which evaluates to α(dec(1)) = {pos}. Thus, optimal semantics may be inherently non-compositional. If optimal semantics are so hard to come by, why have we bothered to discuss optimality at all? First, optimality is important in the theory of Abstract Interpretation because it is the only formal criterion for demonstrating that an abstract semantics is “of high quality”. Thus, we would like to be able to prove that, in some cases, our abstract process semantics is optimal. Second, it may be 97

possible (as described in the previous section) to automatically generate optimal versions of each value operator. While the resulting value-language semantics will not in general be optimal (due to the fact that optimality is not a compositional property), it may be sufficient for many proofs. Finally, many of the value languages that occur in distributed systems are quite simple, and therefore optimal semantics are easy to define. For example, let us say that we wish to verify the correctness of a bitonic sorting network. To do this, we can use the symbolic abstraction splitc defined at the beginning of this chapter:  0 if i ≤ c splitc i = 1 otherwise To implement the sorting network, the only operator that we require on values is the comparison operator leq(u, v) = u ≤ v which has the following optimal abstract semantics:  {tt} , if u = 0 and v = 1 leq(u, v) = {tt, ff} , otherwise Optimal semantics are difficult to define if the language contains operators that can discriminate between values that are same under abstraction. For concrete expressions, posneg(dec(v)) gives different results for the arguments v = 1 and v = 2, which are equivalent under the abstraction. On the other hand leq(u, v) gives the same result for any two concrete values that are equivalent under splitc . The difference between the two can be expressed using the notion α-independence. α

D EFINITION 10.16 (α- INDEPENDENCE ). Given value interpretations C −→ ←− γ A, we say that C is α-independent if for every d, e ∈ VExprC : αd = αe implies αJdKC = αJeKC For a symbolic abstraction, αc1 ,.., cn , we say that C is α-independent if it is α-independent for each αv1 ,.., vn .  Thus leq is splitc -independent, but dec is not posneg-independent. α-independence generalizes the notion of data independence [98]. A language is data independent if it has no conditional expressions that depend on values (i.e. if the trivial interpretation as defined in Section 9.1 is optimal.) Many applications, such as sorting, are not data independent, but are α-independent for an appropriate α. It turns out that for α-independent languages, optimality coincides with the stronger notion of exactness up to α, or simply exactness. D EFINITION 10.17 (E XACTNESS UP TO α). Let α: C → A be a (non-symbolic) abstraction function. We say that A is exact up to α iff for all e ∈ VExprC : JαeKA = αJeKC For a symbolic value abstraction αc1 ,.., cn : C → A we say that A is exact up to α if A is exact for every αv1 ,.., vn .  98

PROPOSITION 10.18. Let C and A be value interpretations with abstraction α and concretization γ, and let C be α-independent. Then A is optimal iff A is exact up to α.

Proof. To see that exactness implies optimality, suppose that for all e, JαeKA = αJeKC. By substituting γd for e, we have JαγdKA = αJγdKC. By Proposition 10.6, αγd = d; therefore JdKA = αJγdKC as required. Conversely, suppose that A is optimal, and therefore for all d, JdKA = αJγdKC. By substituting αe for d, we have JαeKA = αJγαeKC. Because C is α-independent, we know that αJeKC = αJγαeKC; therefore JαeKA = αJeKC as required.

Recall that A is sound if JαeKA ⊇ αJeKC; therefore exactness up to α is a much stronger property than soundness. Diagrammatically, the condition for exactness up to α says that the following square commutes:

    (Exactness up to α)

        VExprC ---α---> VExprA
           |               |
         J·KC            J·KA
           v       =       v
         ValC  ---α--->  ValA

To achieve exactness up to α, one must find an α such that the ground language is α-independent, then define an optimal semantics for that abstraction. Note that the trivial interpretation (Section 9.1) is α-independent and optimal for every sort but bool. If α-independence and optimality can also be achieved for bool, then a variation on the trivial interpretation will often suffice, but not always. (Note that if one uses the more general notion of value interpretation suggested in Section 10.6.2, then the trivial interpretation is always α-independent and optimal for all sorts; however, in this case Theorem 10.20 does not hold.)

Exactness is very sensitive to the set of operators allowed in the language. For example, in a language with integer comparison and successor operations, we speculate that every exact interpretation must include an infinite set of values for sort int; the only exact abstractions have value sets that are isomorphic to the original value set! In this case one must be content with a sound, but not exact, abstract interpretation.

While stronger than soundness, exactness up to α is much weaker than “exactness” as it is used by Clarke et al. in [22]. (In [62], roughly the same property is termed “consistency”.) To avoid confusion, when a semantics is exact in the sense of [22] we will say that it is equivalent to the concrete semantics. An equivalent semantics preserves all of the information available using the concrete semantics. In our context, the requirement can be written γJαeKA = JeKC, or diagrammatically:

    (Equivalence)

        VExprC ---α---> VExprA
           |               |
         J·KC            J·KA
           v       =       v
         ValC  <---γ---  ValA

Thus, if two models are distinguishable using the concrete semantics, they must also be distinguishable using an equivalent abstract semantics. As a result, the abstract value set must be approximately the same size as the concrete value set. As noted by Clarke et al., such abstractions are of little use in reducing the complexity of verification. In fact, one questions the use of the term “abstraction” in this case.³ To be “exact up to α”, instead, an abstract semantics must accurately reflect the behavior of the concrete semantics when viewed modulo α. Processes which are distinguishable in the concrete semantics may fail to be distinguishable using a semantics that is exact up to α.

To see that equivalence is stronger than exactness up to α, note that γJαeKA = JeKC implies αγJαeKA = αJeKC, and JαeKA = αγJαeKA. Thus equivalence implies exactness. Conversely, however, JαeKA = αJeKC implies γJαeKA = γαJeKC, but in general γαJeKC ⊇ JeKC. Thus exactness up to α implies soundness (the second characterization in Section 10.2), but not equivalence.

Unlike optimality, exactness is a compositional property. (In the sequel, we use “exactness” as an abbreviation of “exactness up to α.”) The following proposition is immediate from the definitions.

PROPOSITION 10.19. If C and A are value interpretations with compositional semantics and, for every vop ∈ Vop and v1, .., vn ∈ ValC,

    Jvop(αv1, .., αvn)KA = αJvop(v1, .., vn)KC

then for every e ∈ VExprC:

    JαeKA = αJeKC
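Proposition 10.19 says that exactness lifts compositionally from operators to expressions. The following is a minimal, illustrative sketch of that lifting, not the dissertation's formal machinery: the expression encoding, the choice of operators (which are posneg-independent, so per-operator exact abstract versions exist), and the abstraction are all assumptions made for this example.

```python
from itertools import product

def evaluate(e, ops, const):
    """Set-valued compositional evaluation: leaves go through `const`,
    operator nodes ('name', e1, ..., en) through the set-valued table `ops`."""
    if not isinstance(e, tuple):
        return {const(e)}
    name, *args = e
    arg_sets = [evaluate(a, ops, const) for a in args]
    out = set()
    for combo in product(*arg_sets):
        out |= ops[name](*combo)
    return out

def posneg(v):
    return 'neg' if v < 0 else ('zero' if v == 0 else 'pos')

# Concrete operators (deterministic, so singleton result sets) and abstract
# operators that are exact for them, one operator at a time.
conc_ops = {'neg':    lambda v: {-v},
            'double': lambda v: {2 * v}}
abs_ops  = {'neg':    lambda a: {{'neg': 'pos', 'zero': 'zero', 'pos': 'neg'}[a]},
            'double': lambda a: {a}}

e = ('neg', ('double', ('neg', 7)))              # a sample ground expression
lhs = evaluate(e, abs_ops, posneg)                          # [[alpha e]]_A
rhs = {posneg(v) for v in evaluate(e, conc_ops, lambda v: v)}  # alpha([[e]]_C)
assert lhs == rhs == {'pos'}
print(lhs)
```

Per-operator exactness is the hypothesis of the proposition; the assertion checks its conclusion on one expression.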



We now state the main result of this section: if the abstract value semantics is exact, then so is the resulting process semantics. Note that we use bisimulation equivalence ∼ in the theorem rather than ready-simulation equivalence; ∼ is the stronger relation. Since we prove the result up to bisimulation, we have that for all abstract HML formulas ϕ, JαpK^{αF}_A |= ϕ iff αJpK^F_C |= ϕ. The proof is deferred to the end of the section. (Although ∼ is stronger than ready-simulation equivalence, we have not found a weaker condition than exactness of the value semantics that will suffice to prove exactness of the process semantics up to ready-simulation equivalence.)
THEOREM 10.20 (EXACTNESS OF PROCESS SEMANTICS). If C and A are value interpretations with abstraction α and concretization γ, and A is exact, then for every p ∈ PExprC:

    JαpK^{αF}_A ∼_A αJpK^F_C

Proof. To show that αJrK^F_C ∼_A JαrK^{αF}_A it is enough to show the following, for all r, r′ in PExprC:

    (1) r −[a v]→_C r′ implies αr −[a αv]→_A αr′

    (2) αr −[a u]→_A r′ implies ∃v: u = αv: ∃s′: r′ = αs′: r −[a v]→_C s′

³It is also worth noting that Clarke et al. define a “congruence” requirement for abstractions that is sufficient to ensure equivalence of the abstract semantics. Not surprisingly, it is a much stronger requirement than α-independence. In our setting their congruence requirement can be stated as follows (compare Definition 10.16). An abstraction α is a congruence for C if for all d, e ∈ VExprC:

    αd = αe implies ∀u: u ∈ JdKC implies u ∈ JeKC

or, more directly,

    αd = αe implies JdKC = JeKC

This means that expressions that are equivalent under α must have the same concrete meaning; any such α is not much of an abstraction.

We proved (1) in Theorem 10.13. Like the proof of (1), the proof of (2) proceeds by rule induction with case analysis on the structure of r. However, in order to get the case for (VPL-PAR 2) to go through, we need a stronger hypothesis:

    (2a) αr −[a u]→_A r′ implies ∃v: u = αv: ∃s′: r′ = αs′: r −[a v]→_C s′, and

    (2b) αr −[c?u]→_A r′ implies ∀v: u = αv: ∃s′: r′ = αs′: r −[c?v]→_C s′

Most of the cases are very similar to the cases in the proof of Theorem 10.13, thus we present only three: input, output, and parallel composition.

Suppose that the rule (VPL-IN) is used to derive the transition c_ξ?x.αP −[c?u]→_A r′. Then r′ = (αP){|u/x|}. By Proposition 10.3, r′ = α(P{|v/x|}) for any v such that αv = u. From the semantics, clearly (b) holds. In addition, (a) holds because JξKC is nonempty.

Suppose (VPL-OUT) is used to derive the transition c!αe.αp −[c!u]→_A αp for some u ∈ JαeKA. Since JαeKA = αJeKC, we know that there is some v ∈ JeKC such that αv = u. Therefore c!e.p −[c!v]→_C p. The same argument works for (VPL-COND).

Suppose that (VPL-PAR 2) derives the transition

    αp1 | αp2 −[τ]→_A r1′ | r2′

and therefore, for some u,

    αp1 −[c?u]→_A r1′   and   αp2 −[c!u]→_A r2′

• c!ff

•

>> >> c?pos >>  0 r ?? ??c!ff ?? c!tt ? • •

c?zero

• c!ff

•

101

whereas JγrKC is

γr

··· ··· ··· ···

c!ff

c?–2

• •

c!ff

and thus αJγrKC is:

~ @@ ~~ c?0 @ c?1@@ c?–1 @ ~~ •~~ • •

•

c!ff

•

•

c?2

• c!tt

•

··· c!ff

··· ··· ···

γr

···

c?neg

•

~ @@ ~~c?zero @ c?neg c?pos @@@ ~~ •~~ • •

c?pos

···

•

··· ··· c!tt · · · c!ff c!ff c!ff c!ff · · ·      ··· • • • • • ··· c!pos 0 Note that r −→ A r , but there is no c!pos-derivative of γr in αJγrKC that can simulate r0 . Thus JrKA @ X+ αJγrKC . In general, suppose that J·KC is deterministic and e is an expression such that JeKA ) αJeKC . Suppose that the values v1 , .., vn occur in e. Then let E be the term derived from e by replacing each value occurrence vi with a variable xi , and let r be the process c?x1 .c?x2 . .. c?xn .c!E. Then in JrKA we have the transitions c?αv c?αv c?αv r −→1 −→2 · · · −→n r0 but there is no state in αJγrKC that can simulate r0 . To see this, note that r0 has more than one outgoing transition, whereas every state in αJγrKC has but a single outgoing transition. The same argument holds if one considers optimality up to “weak” simulation relations such as copy/refusal simulation [91, 92]. However for trace-based relations, such as the testing preorder of Section 10.5, the argument does not hold. It may be possible that there are non-exact value semantics that induce an optimal process semantics up to testing.

10.4

An Example

In this section we give a small example illustrating the utility of our results. Consider the following system consisting of a router and two processing units. The router waits for a value, which is a natural number, to arrive on its in channel; it then routes the (halved) value to the “left” processing unit if the original value is even and to the right otherwise. (Thus the least significant bit of the value may be thought of as an “address”.) Assume that the value interpretation C is the standard one for natural numbers. The VPL process describing this system may be given as follows. def

Router = in?(v).if ((vmod2) = 0) then left!(v/2).Router else right!(v/2).Router def

Unit0 = in?(v).out!(op1 v).Unit0 def

Unit1 = in?(v).out!(op2 v).Unit1 def

System = (Router | Unit0 [left/in] | Unit1 [right/in])\ {left, right}

102

We would like to determine whether the above system is deadlock-free. Unfortunately, its state space is infinite, and naive state-space enumeration techniques would not terminate. The results in this dissertation suggest, however, that if we can come up with a safe abstraction on values and establish that the resulting abstracted process is deadlock-free, then so is the original system. That is, letting A be the abstract interpretation and α the abstraction from C to A, it follows from the fact that JpKA @ ∼A αJpKC that JpKC deadlocks if and only if αJpKA does. Consider the trivial abstract interpretation T in which all concrete values are collapsed into a single abstract value (), every expression evaluates to (), and every boolean evaluates to the set {ff, tt}. The abstraction function α that maps the concrete interpretation into this interpretation is clearly safe. When we apply this abstraction to the above system, we get a system that is semantically equivalent to the following. def

RouterT = in?().(left!().RouterT  right!().RouterT ) def

UnitT = in?().out!().UnitT def

SystemT = (RouterT | UnitT [left/in] | UnitT [right/in])\ {left, right} This system is finite-state, and using reachability analysis one may determine that it is deadlock-free. Accordingly, it follows that the original system is also deadlock-free.

10.5 Testing In [50], Hennessy and Ing´olfsd´ottir defined a precongruence for VPLI based on testing as introduced in [37, 49]. Our modifications to “traditional” ready simulation (Definition 9.12) closely follow their modifications to traditional testing. We used a variant of Hennessy and Ing´olfsd´ottir’s relation in [29]. In this section, we show that @ + refines the testing preorder of [29]. For motivation and further details about testing, we refer the reader to the above references. The testing preorder is defined with respect to traces, which are sequences of non-τ labels (ext(LabelI ))∗ . We write ε for the empty trace and use juxtaposition to denote prefixing. For example, in?1 in?5 out!7 is a trace consisting of three labels. Let s range over traces. We use TracesI to denote the set of all traces over label set LabelI . In order to define the testing preorder we introduce the following definitions, which borrow heavily from [27, 49]. D EFINITION 10.21 (T RACE R ELATIONS ). Let (˙p ∈) P = (SP , p˙ 0 , 7 7−→) be an LTS in LTSI . • The weak transition relation, (Z Z=⇒) ⊆ (SP × TracesI × SP ), is defined inductively on traces: ε

τ ∗ τ ∗ τ (Z Z=⇒) = (7 7−→) , where (7 7−→) is the transitive and reflexive closure of 7 7−→. κs

ε

κ (Z Z=⇒) = (Z Z=⇒) · (7 7−→) · (Z Z=⇒) , where · denotes relational composition. s

• The divergence relation, ↑ ⊆ (SP × TracesI ), is defined inductively on traces: τ p˙ ↑ ε iff there exists an infinite sequence (˙pi )i≥0 with p˙ = p˙ 0 and p˙ i 7 7−→ p˙ i+1 . κ

p˙ ↑ κs iff p˙ ↑ ε or (˙p Z Z=⇒ p˙ 0 and p˙ 0 ↑ s). We write p˙ ↓ s for the convergence relation ¬(˙p ↑ s). 103

• The set of initial (external) actions that a process may perform is given as: init(˙p) = {a | a 6= τ ∧ p˙ 7 7−av →} τ If p˙ 7 7− X− →, we say that p˙ is stable. An acceptance set A is a set of subsets of ext(Act), i.e. ext(Act) A⊆2 . The acceptance set of p˙ after a trace s defined as follows.

 s τ acc(˙p, s) = init(˙p0 ) | p˙ Z Z=⇒ p˙ 0 ∧ p˙ 0 7 7− X− → • Let A and B be acceptance sets. We define an ordering c on acceptance sets as follows: A c B iff ∀B ∈ B : ∃A ∈ A : A ⊆ B.  s

Thus p˙ Z Z=⇒ p˙ 0 holds if p˙ can perform the completed actions listed in s with any number of intervening s s τ actions and end up as p˙ 0 . We abbreviate (∃˙p0 : p˙ Z Z=⇒ p˙ 0 ) as p˙ Z Z=⇒. The predicate p˙ ↓ s holds if p˙ is incapable of infinite internal computation at any point during its “executions” of s. The set init(˙p) is the set of initial actions of p˙ ; we emphasize that this set includes no reference to values. The acceptance set acc(˙p, s) represents the set of “action capabilities” of p˙ after s. Note that the set of actions does not depend on the value interpretation. Each set A in acc(˙p, s) corresponds to a state that p˙ may reach by executing s and contains the set of next possible actions in that state. The fact that acc(˙p, s) may contain more than one such set indicates that nondeterministic choices may occur during the execution of s; the more sets acc(˙p, s) contains, the more nondeterministic p˙ is in its execution of s. Finally, the ordering c relates acceptance sets on the basis of their relative nondeterminism; intuitively, A c B if A represents a “less deterministic” set of processes. Note that according to the definitions the predicate (acc(˙p, s) c acc(˙q, s)) is true if and only if τ τ q˙ Z Z=⇒ q˙ 0 7 7− X− → implies (∃˙p0 : p˙ Z Z=⇒ p˙ 0 7 7− X− → ∧ init(˙p0 ) ⊆ init(˙q0 )). s

s

This states that if q˙ can reach a stable state q˙ 0 , then p˙ must be able to reach a stable state p˙ 0 by executing the same trace, and, in addition, the initial actions available to p˙ 0 must be a subset of those available to q˙ 0 . We can now define the testing preorder @ ∼. D EFINITION 10.22 (T ESTING P REORDER ). Let P and Q be q˙ 0 . Then P @ ∼I Q iff for all s: s

LTS s

in LTSI with initial states p˙ 0 and

s

• q˙ 0 Z Z=⇒ implies p˙ 0 Z Z=⇒, and • p˙ 0 ↓ s implies (˙q0 ↓ s and acc(˙p0 , s) c acc(˙q0 , s)). As with ready simulation, we can related process expressions using the transition relation defined in @ Table 9.3. For processes p and q, p @  ∼ q iff JpK ∼ JqK. Our use the term “testing preorder” is non-standard. In the conventional parlance, our testing preorder would be called the “must/may-inverse testing preorder”. As with ready simulation, we prefer the shorter name.

104

P ROPOSITION 10.23 (R EADY S IMULATION R EFINES T ESTING ). Let P and Q be an @ Then P @ + I Q implies P ∼I Q.

LTS s

in LTSI .

Proof. Let p˙ 0 and q˙ 0 be the initial states of P and Q and assume that p˙ 0 @ ˙ 0 . To show that P @ +q ∼ Q, it suffices to show for all s: s

s

q˙ 0 Z Z=⇒ implies p˙ 0 Z Z=⇒

(1)

q˙ 0 ↑ s implies p˙ 0 ↑ s

(2)

τ τ (3) q˙ 0 Z Z=⇒ q˙ 0 7 7− X− → implies (∃˙p0 : p˙ 0 Z Z=⇒ p˙ 0 7 7− X− → ∧ init(˙p0 ) ⊆ init(˙q0 )) s

s

To facilitate the proof of these properties, let us define an extension of traces that allow τ actions to appear. A τ-trace t is a sequence of actions in Act∗ . We can then extend the transition relation to τ-traces by taking: ε p˙ 7 7−→ p˙ 0 if p˙ = p˙ 0 κ t p˙ 7 7−κt → p˙ 0 if ∃˙p00 : p˙ 7 7−→ p˙ 00 7 7−→ p˙ 0

By induction on t, it is easy to show that if p˙ @ ˙ , then for all t: +q t t (a) q˙ 7 7−→ q˙ 0 implies ∃˙p0 : p˙ 7 7−→ p˙ 0 and p˙ 0 @ ˙0 +q

(b) init(˙p) = init(˙q) τ τ (c) p˙ 7 7− X− → iff q˙ 7 7− X− →

(1) follows immediately from (a). (2) also follows from (a) since for all n, q˙ 7 7−τ→ implies p˙ 7 7−τ→ . Finally, (3) follows from (a)–(c).  n

n

There are other weak preorders in the literature that are refined by traditional ready simulation [94]. In each case, the definition of the preorder can readily be modified to accommodate value passing so that the modified relation is refined by @ + . Of these, one of the better studied is the copy/refusal simulation of Ulidowski [91, 92]. This preorder is based on an alternative formulation of testing due to Abramsky and Phillips [1, 76]. We do not mention this preorder further, but note that it may be useful for purposes that require a relation that is weaker than @ + , yet stronger than the testing preorder described below. (Copy/refusal simulation is more sensitive to the branching structure of processes than testing.)

10.6

Alternative Semantics and Extensions

In this section, we comment on some of the choices we have made in defining the semantics and discuss some alternative possibilities and extensions. 10.6.1

Alternative Semantics for Output and Conditional

In [29], we gave a different semantics for output in which an explicit evaluation step was included. One advantage of this approach is that most of our results can then be stated using the “usual” definition of ready simulation (compare Definition 9.12), in which p˙ @ ˙ implies: +q κ κ (20 ) ∀κ ∈ LabelI : p˙ 7 7−→ implies q˙ 7 7−→

105

Table 10.1 Hennessy and Ing´olfsd´ottir’s semantics All rules but (O UT) and (C OND) from Table 9.3 with ,→ replacing −→

(HI-OUT)

JeKI = {v} c!v

c!e.p ,→ p

(HI-COND 1 )

(HI-COND 2 )

JeKI = {tt} λ p ,→ p0

JeKI = {ff} λ q ,→ q0 λ

if e then p else q ,→ p0

λ

if e then p else q ,→ q0

The semantics of [29] requires some modification to be suitable for this version of ready simulation. Basically, we must ensure that an evaluation step is always done before a value is output; in [29] this was not the case for output expressions with simple values such as c!v.0. One way to define the semantics is to add states to the transition system corresponding to evaluated outputs. Let use write these special states c!v.p. The semantics for output can then be given using the two rules: τ c!e.p −→ c!v.p , if v ∈ JeK c!v c!v.p −→ p

Most of our results hold using this definition for output and ready simulation, except for Theorem 10.9 and the theorems that depend on it. (We have been unable to define a Galois insertion with respect to the stronger definition of ready simulation, although for preorders that are insensitive to τs—such as testing—a Galois insertion can be defined as in [29]). A disadvantage of this approach is that it forces the introduction of a τs before each output action, increasing the size of the LTS and making it impossible to describe τ-less systems that engage in output. Using the testing preorder, we can also compare our semantics with the one given in [50]. There, value interpretations are assumed to be deterministic. Table 10.1 gives their formulation of the semantics of output and the conditional. This definition of ,→ may be substituted into our definitions for J·KI to generate new transition systems for processes and hence a new testing preorder on processes that we denote I . If the valuation functions are deterministic, then the two formulations of testing relate exactly the same terms. P ROPOSITION 10.24. Let I be a value interpretation such that the range of J·KI is {{v} | v ∈ ValI }. Then p @  ∼I q iff p I q. 10.6.2

Alternative Value Sets for Booleans

We have defined value interpretations and process semantics in such a way that JboolK must have exactly the two elements tt and ff. It is possible to allow other value sets for JboolK. In this case, we require the user to provide predicates istrue and isfalse on value sets. The rule for conditional expressions (Table 9.3) can then be written: τ (VPL-COND) if e then p else q −→ p , if istrueJeK τ if e then p else q −→ q , if isfalseJeK

106

In particular, one might want to use a trivial interpretation of bool with a single element ()bool that is both “true” and “false”. In this case, the conditional construct if e then p else q is equivalent to the internal choice p  q. Not all of our results hold, however, with this weaker semantics. The soundness theorem (Theorem 10.13) holds; however the exactness theorem (Theorem 10.20) does not. The problem lies in the fact that using this semantics, exactness is a weaker requirement for value languages. Intuitively, it is important not to “trivialize” Booleans since they are used to determine the control flow of a process. 10.6.3

Call-By-Value Recursion

Another possible change to our semantics would be to use “call-by-value” semantics for recursion: κ P{|JeK/x|} −→ p0 κ

f (e) −→

p0

def

f (x) = P in F

We speculate that all of our results can be adapted to this call-by-value semantics as long as the value language is compositional. 10.6.4

Errors in Value Semantics

Note that errors can be accommodated in our simple framework by choosing a value constant errξ for each sort ξ. However, our process semantics doesn’t follow the intuition that all language constructs should be strict in err; errors are not treated specially by the output construct and conditional expressions must evaluate either to tt or ff, thus making it impossible for them to be strict in err. A better treatment of errors is achieved by adding a distinguished internal action; for simplicity, let us simply call it err. In this case, the semantics for output and conditional must be modified as follows: err c!e.p −→ 0 , if err ∈ JeK

c!v c!e.p −→ p , if v ∈ JeK and v 6= err

err if e then p else q −→ 0 , if err ∈ JeK τ if e then p else q −→ p , if tt ∈ JeK τ if e then p else q −→ q , if ff ∈ JeK

Note that “less defined” expressions may have errors in cases where their more defined counterparts do not. To accommodate this we must weaken the definition of ready simulation to allow the appearance of spurious error actions in smaller processes. The second rule for the ready simulation p˙ @ ˙ then becomes (compare Definition 9.12): +q a a (200 ) ∀a ∈ (ActI \ {err}) : p˙ 7 7−→ implies q˙ 7 7−→

The main results of the Chapter 10, Theorems 10.13 and 10.20, hold using this alternative semantics.

107

Chapter 11 Discussion We described a method for verifying value-passing processes using abstract interpretations of values. We defined ready simulation, a semantic preorder on processes, and studied its properties. We demonstrated that safe value abstractions induce safe process abstractions, up to ready simulation. We also presented a sufficient condition for exactness, call α-independence; this condition is more widely applicable than all other conditions we have found in the literature. Abstractions based on the operational behavior of programs have been studied at least since the Cousots [32, 33] introduced Abstract Interpretation (AI). AI has been extensively studied and widely applied; for a survey see [57]. Only in the last few years, however, has AI been applied to reactive programs. The work most closely related to ours has all been based on state-transformer languages and Kripke structures (unlabelled graphs) rather than process algebras and LTSs [60, 22, 47, 63, 4, 62, 44, 45, 35].1 In state-transformer languages, a program is simply a predicate P on pairs of states, where the program counter is included in the state. Programs have a simple relationship with their models; if P(s, s0 ) is true, then there is a transition s → s0 in the model of the program. Some of the specific languages considered in the above references include operations that allow one to generate a new program via parallel composition; however, none include other operations on processes such as choice and conditional execution.2 Although some state-transformer languages, such as TLA [61] and UNITY [20], have a well-developed theory, none of the papers cited above make any attempt to capitalize on this existing work. (It would be particularly interesting to consider the relationship between abstraction and program transformations such as those studied in [99]). In the setting of state-transformer languages, abstractions on the values that form the program state naturally induce an abstraction on the states of the corresponding Kripke structure. In most of the papers cited above, the authors are (like us) primarily concerned with the preservation of universal properties; in this case one includes a transition αs → αs0 between two abstract states whenever s → s0 in the original structure. Of papers cited, those from Clarke’s group at CMU [22, 47] give the best treatment of abstract 1 In fact, [62] allows labels on transitions, but these do not contain values and are not used in the definition of abstraction;

their sole purpose is to identify transitions that synchronize during parallel composition. 2 In his thesis, Long [63] does describe a more sophisticated language with conditionals and other operators; however, he does not provide a congruence result that would allow these operators to be used for compositional verification.

semantics that preserve universal properties, based on CTL∗ . For the most part we will concentrate our comparison on their work, particularly on [22]. Sifakis and Graf’s group at IMAG has done very similar (and slightly more general) work based on µ-calculi. While the IMAG work has a less useful notion of abstract semantics, they have treated a more general form of parallel composition than CMU [4, 62]. Dams, Grumberg and Gerth [35] have presented a different result for abstractions that preserve existential properties [35]. We will discuss this further below. First, we note that the difference between Kripke structures and LTSs is not a fundamental one. One can transform a Kripke structure into an “equivalent” LTS with only a linear increase in the number of states and transitions, and vice-versa [39, 38]. Thus, at the model level, our results are very similar to those from CMU. Recall, however, that one does not want to construct an abstract model by first constructing a concrete model, then abstracting it. The goal of abstraction, after all, is to avoid constructing the concrete model. Thus, the abstract semantics—which allows one to generate an abstract model directly from the program text—is of critical importance. At this linguistic level our work stands apart. In [22], Clarke et al. give the semantics of primitive operations as predicates on values. Thus the binary operation + must be given a meaning via the predicate ψ+ (v1 , v2 , v3 ), with the meaning that v1 plus v2 equals v3 . State-transformers can then be built up using the primitives and predicate logic. For example, the transformer x0 = 3 ∗ y + x can be represented by the predicate: ∃z : ψ∗ (3, y, z) ∧ ψ+ (z, x, x0 ) ˆ of each of the primitive predicate ψ, they then show how one can construct Given an abstract version ψ an abstract version of each state transformer. Thus, their semantics is a compositional abstract value semantics. We expect that our value sublanguages will be compositional as well; however, we do not ˆ be optimal (in fact, they define require this. They also require that the abstract primitive predicates ψ them to be optimal, as do the other references); therefore, our results are doubly more general. The additionally generality, however, is not particularly important. In practice, all usable semantics are compositional, and Clarke et al. are not concerned with non-optimal semantics because they assume that concrete value sets are finite and thus that the optimal semantics can always be computed from the abstraction function. A major advantage of our approach, however, is that it allows for the mixture of abstraction and compositional verification. While [63, 62] present results showing that abstraction semantics can be used in conjunction with parallel composition of processes, these papers do not treat other operators such as choice and restriction. Our approach neatly separates concerns into two theorems: a congruence theorem (Theorem 9.14) and a soundness theorem (Theorem 10.13). Since bisimulation is a congruence for VPLI (in addition to ready simulation), our results also allow for compositional verification of mixed properties in the case that the value abstraction is exact (Theorem 10.20), something the other authors do not show. A second novel aspect of our work is α-independence. As stated in Section 10.3, Clarke et al. only consider the preservation of mixed properties in the limited case that the abstract semantics is equivalent to the ground semantics. 
Our result is far more useful since α-independence often allows for substantial reductions in the size of systems. As we have noted, not all value programs admit 109

abstractions that achieve α-independence. Nonetheless, the class of such programs is broad enough to be useful. A different approach is taken to the preservation of mixed properties in [35]. There, the authors construct an abstract transition system with two transition relations: one for universal properties which is as described above for Kripke structures, and a second for existential properties. In constructing the transition relation for existential properties, one includes a transition αs → αs0 whenever for all states t, t0 such that αt = αs and αt0 = αs0 , t → t0 in the original structure. Mixed formulae can be verified on the resulting structure by using the different transition relations when considering different types of quantification in the logic. This approach has the advantage that it is sound for any program and for any abstraction function α; one need not limit attention to programs that are α-independent. One the other hand, it requires the definition of two abstract semantics. If the abstract semantics are automatically generable, this is not a problem; in other cases, however, it is a significant burden on the user to provide two semantics for every value operator. This approach has also been followed in [28] where the issue of optimality in such abstract transition systems is addressed. Several authors have treated data-independent programs [58, 98]; this class of programs, however, is rather limited. Our work on α-independence is more general. The goals of our work are also similar to those of Hennessy and Lin in their work on symbolic bisimulations [51, 84]. Central to their work is the notion of a symbolic transition system, which is a transition system with sets of free variables as states and guarded expressions as edges. Symbolic transition systems are often finite, but even trivial recursive processes whose arguments vary from call to call may have infinite symbolic transition systems, rendering the technique ineffective. The advantage of using a symbolic transition system, of course, is that one need not explicitly define the abstraction function. A different approach has been taken by Groote in his work on µCRL [46]. He and Ponse define a value passing language with algebraic datatypes based on ACP [5]. Processes are verified using an axiomatization of bisimulation equivalence and a general-purpose theorem prover. This approach has the advantage that it allows one to, for example, reason inductively on the structure of datatypes, thus proving properties that are not provable using abstractions. The disadvantages are those that come with general-purpose theorem provers: the work is highly interactive and requires a firm understanding both of the data domains under investigation and of the inner workings of the theorem prover. We have limited our attention to first-order process languages, in which processes are values are treated distinctly from processes. In higher-order process languages [68, 80, 90], the verification problem is much more difficult. Our work is related, however, to work on effect systems for higherorder languages. Effect systems are extensions of conventional type systems that include information about the side effects that a program may engage in [64]. The side-effect information is usually approximate because actions may be inferred by the effect system that can never occur in practice. In this sense they are a kind of sound abstraction. In [70], Nielson and Nielson have devised an effect system for the higher-order language CML [80]. 
Our trivial interpretation provides an effect system for process in the spirit of the Nielsons’ work. Their language is much more complex, supporting higher-order

110

communication. However, our abstractions preserve more of the behavior of the original process than do theirs; for example, their abstractions reduce external to internal non-determinism. It remains to be seen if more refined effect systems can be developed for languages like CML. Finally, we note that our results are compatible with other techniques to speed program verification, including algorithms that perform on-the-fly verification [88, 31, 41] and BDD-based model checking [18, 36].


Bibliography [1] Samson Abramsky. Observation equivalence as a testing equivalence. Theoretical Computer Science, 53:225–241, 1987. [2] Brad Alexander, Dean Engelhardt, and Andrew Wendelborn. An overview of the Adl language project. In Wim Bohm and John Feo, editors, High Performance Computer Architecture, pages 73–82, 1995. Available from ftp://sisal.llnl.gov/pub/hpfc/index.html. [3] J. C. M. Baeten and W. P. Weijland. Process Algebra, volume 18 of Cambridge Tracts in Theoretical Computer Science. Cambridge University Press, 1990. [4] S. Bensalem, A. Bouajjani, C. Loiseaux, and J. Sifakis. Property preserving simulations. In G. van Bochmann and D.K. Probst, editors, Proceedings of the 4th Workshop on ComputerAided Verification, volume 663 of Lecture Notes in Computer Science, pages 260–273, Montreal, July 1992. Springer-Verlag. [5] J. A. Bergstra and J. W. Klop. An introduction to process algebra. In J.C.M. Baeten, editor, Applications of Process Algebra, volume 17 of Cambridge Tracts in Theoretical Computer Science, pages 1–21. Cambridge University Press, 1990. [6] Guy E. Blelloch. Vector Models for Data-Parallel Computing. MIT Press, 1990. [7] Guy E. Blelloch. NESL: A nested data-parallel language (version 3.0). Technical report, Carnegie-Mellon University, Department of Computer Science, 1994. [8] Guy E. Blelloch, Phillip B. Gibbons, Yossi Matias, and Marco Zagha. Accounting for memory bank conetention and delay in high-bandwidth multiprocessors. In Proceedings of the ACM Symposium on Parallel Algorithms and Architectures, pages 84–94, Santa Barbara, CA, July 1995. ACM Press. [9] Guy E. Blelloch and John Greiner. A provable time and space efficient implementation of NESL. In International Conference on Functional Programming, 1996. [10] Guy E. Blelloch, Jonathan C. Hardwick, Jay Sipelstein, Marco Zagha, and Siddhartha Chatterjee. Implementation of a portable nested data-parallel language. Journal of Parallel and Distributed Computing, 21(1):4–14, April 1994. [11] Guy E. Blelloch and Gary W. Sabot. Compiling collection-oriented languages onto massively parallel computers. Journal of Parallel and Distributed Computing, 8:119–134, 1990. [12] B. Bloom and R. Paige. Computing ready simulations efficiently. In S. Purushothaman and Amy Zwarico, editors, Proceedings of the North American Process Algebra Workshop, Workshops in Computing, pages 119–134, Stony Brook, August 1992. Springer-Verlag. [13] Bard Bloom, Sorin Istrail, and Albert R. Meyer. Bisimulation can’t be traced. In Conference

112

[14] [15]

[16] [17] [18]

[19]

[20] [21]

[22] [23] [24]

[25] [26]

[27] [28]

Record of the ACM Symposium on Principles of Programming Languages, pages 229–239, San Diego, January 1988. ACM Press. Full version available as Cornell CS-TR-90-1150; to appear in JACM. Bard Bloom, Sorin Istrail, and Albert R. Meyer. Bisimulation can’t be traced. Technical Report TR 90-1150, Cornell University, Department of Computer Science, 1990. To appear in JACM. J. Bradfield and C. Stirling. Verifying temporal properties of processes. In J.C.M. Baeten and J.W. Klop, editors, CONCUR ’90: Theories of Concurrency—Unification and Extension, volume 458 of Lecture Notes in Computer Science, pages 115–125, Amsterdam, August 1990. Springer-Verlag. J. Bradfield and C. Stirling. Local model checking for infinite state spaces. Theoretical Computer Science, 96(1):157–74, 1992. R. P. Brent. The parallel evaluation of generic arithmetic expressions. Journal of the ACM, 21(2):201–206, 1974. J. R. Burch, E. M. Clarke, K. L McMillan, D. L. Dill, and L. J. Hwang. Symbolic model checking: 1020 states and beyond. Information and Computation, 98(2):142–170, 1992. Preliminary version in LICS ’90. U. Celikkan and R. Cleaveland. Generating diagnostic information for behavioral preorders. In G. van Bochmann and D.K. Probst, editors, Proceedings of the 4th Workshop on ComputerAided Verification, volume 663 of Lecture Notes in Computer Science, Montreal, July 1992. Springer-Verlag. Full version to appear in Distributed Computing. K. Mani Chandy and Jayadev Misra. Parallel Program Design: A Foundation. Addison-Wesley, 1988. E. M. Clarke, E. A. Emerson, and A. P. Sistla. Automatic verification of finite-state concurrent systems using temporal logic specifications. ACM Transactions on Programming Languages and Systems, 8(2):244–263, April 1986. Edmund M. Clarke, Orna Grumberg, and David E. Long. Model checking and abstraction. ACM Transactions on Programming Languages and Systems, 16(5):1512–1542, September 1994. R. Cleaveland. Tableau-based model checking in the propositional mu-calculus. Acta Informatica, 27(8):725–747, September 1990. R. Cleaveland, J. Parrow, and B. Steffen. The concurrency workbench: A semantics-based tool for the verification of finite-state systems. ACM Transactions on Programming Languages and Systems, 15(1):36–72, January 1993. R. Cleaveland and B. Steffen. A linear-time model-checking algorithm for the alternation-free modal mu-calculus. Formal Methods in System Design, 2:121–147, 1993. Rance Cleaveland. On automatically explaining bisimulation inequivalence. In K.G. Larsen and A. Skou, editors, Proceedings of the 3rd Workshop on Computer-Aided Verification, volume 575 of Lecture Notes in Computer Science, Alborg, Denmark, July 1991. Springer-Verlag. Rance Cleaveland and Matthew Hennessy. Testing equivalence as a bisimulation equivalence. Formal Aspects of Computing, 5:1–20, 1993. Rance Cleaveland, S. Purushothaman Iyer, and Daniel Yankelovich. Optimality in abstractions of model checking. In Static Analysis Symposium, volume 983 of Lecture Notes in Computer 113

[29]

[30]

[31]

[32]

[33]

[34]

[35]

[36]

[37] [38] [39] [40] [41]

Science. Springer-Verlag, 1995. Rance Cleaveland and James Riely. Testing-based abstractions for value-passing systems. In B. Jonsson and J. Parrow, editors, CONCUR ’94: Concurrency Theory, volume 836 of Lecture Notes in Computer Science, pages 417–432, Uppsala, Sweden, August 1994. Springer-Verlag. Rance Cleaveland and Bernhard Steffen. Computing behavioural equivalences, logically. In J. Leach Albert, B. Monien, and M. Rodriguez Artalejo, editors, Proceedings of the International Colloquium on Automata, Languages and Programming, volume 510 of Lecture Notes in Computer Science, pages 127–138, Madrid, July 1991. Springer-Verlag. C. Courcoubetis, M. Vardi, P. Wolper, and M. Yannakakis. Memory efficient algorithms for the verification of temporal properties. In E.M. Clarke and R.P. Kurshan, editors, Proceedings of the 2nd Workshop on Computer-Aided Verification, volume 531 of Lecture Notes in Computer Science, pages 233–2424, New Brunswick, NJ, June 1990. Springer-Verlag. P. Cousot and R. Cousot. Abstract interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints. In Conference Record of the ACM Symposium on Principles of Programming Languages, pages 238–252. ACM Press, January 1977. P. Cousot and R. Cousot. Systematic design of program analysis frameworks. In Conference Record of the ACM Symposium on Principles of Programming Languages. ACM Press, January 1979. P. Cousot and R. Cousot. Inductive definitions, semantics, and abstract interpretation. In Conference Record of the ACM Symposium on Principles of Programming Languages, pages 83–90. ACM Press, January 1992. D. Dams, O. Grumberg, and R. Gerth. Abstract interpretation of reactive systems: Abstractions preserving ∀CTL∗ , ∃CTL∗ and CTL∗ . In E.-R. Olderog, editor, Proceedings of the IFIP WG2.1/WG2.2/WG2.3 Working Conference on Programming Concepts, Methods and Calculi (PROCOMET), IFIP Transactions, San Miniato, Italy, June 1994. North-Holland. Full version available as EUT Computing Science Note 94/24. R. De Nicola, A. Fantechi, S. Gnesi, and G. Ristori. An action-based framework for verifying logical and behavioral properties of concurrent systems. Computer Networks and ISDN Systems, 25:761–778, 1993. R. De Nicola and M. C. B. Hennessy. Testing equivalences for processes. Theoretical Computer Science, 34:83–133, 1984. Rocco De Nicola and Frits Vaandrager. Three logics for branching bisimulations. Journal of the ACM, 42(2):458–487, 1995. E. A. Emerson and C. L. Lei. Model checking under generalized fairness constraints. Technical Report TR-84-20, University of Texas, Department of Computer Science, 1984. E. A. Emerson and C. L. Lei. Modalities for model checking: Branching time strikes back. Science of Computer Programming, 8:275–306, 1987. Preliminary version in POPL ’85. J.-C. Fernandez and L. Mounier. ‘On the fly’ verification of behavioural equivalences and preorders. In K.G. Larsen and A. Skou, editors, Proceedings of the 3rd Workshop on ComputerAided Verification, volume 575 of Lecture Notes in Computer Science, pages 181–191, Alborg, 114

[42]

[43] [44]

[45]

[46]

[47] [48]

[49] [50]

[51] [52] [53] [54] [55]

[56] [57]

[58]

Denmark, July 1991. Springer-Verlag. Andrew D. Gordon. Bisimilarity as a theory of functional programming. In Mathematical Foundations of Programming Semantics, volume 1 of Electronic Notes in Theoretical Computer Science (http://www.elsevier.nl/locate/entcs). Elsevier, 1995. J. Goyer. Communications protocols for the B-HIVE multicomputer. Master’s thesis, North Carolina State University, 1991. S. Graf and C. Loiseaux. Program verification and abstraction. In M.-C. Gaudel and J.-P. Jouannaud, editors, TAPSOFT ’93: Theory and Practice of Software Development, volume 668 of Lecture Notes in Computer Science, Orsay, France, April 1993. Springer-Verlag. S. Graf and C. Loiseaux. A tool for symbolic program verification and abstraction. In Costas Courcoubetis, editor, Proceedings of the 5th Workshop on Computer-Aided Verification, volume 697 of Lecture Notes in Computer Science, Elounda, Greece, July 1993. Springer-Verlag. J. F. Groote and A. Ponse. Proof theory for µCRL: A language for processes with data. In Proceedings of the International Workshop on Semantics of Specification Language, Workshops in Computing. Springer-Verlag, 1994. Orna Grumberg and David E. Long. Model checking and modular verification. ACM Transactions on Programming Languages and Systems, 16(3):843–871, 1994. E. Harcourt, J. Mauney, and T. Cook. Specification of instruction-level parallelism. In Bard Bloom, editor, Proceedings of the North American Process Algebra Workshop, Ithaca, NY, August 1993. Available as Technical Report TR93-1369, Cornell University. M. C. B. Hennessy. Algebraic Theory of Processes. MIT Press, Boston, 1988. M. C. B. Hennessy and A. Ing´olfsd´ottir. A theory of communicating processes with valuepassing. Information and Computation, 107:202–236, December 1993. Preliminary version in ICALP ’90. M. C. B. Hennessy and H. Lin. Symbolic bisimulations. Computer Science Technical Report 92/01, University of Sussex, 1992. Available from http://www.cogs.susx.ac.uk/. M. C. B. Hennessy and R. Milner. Algebraic laws for nondeterminism and concurrency. Journal of the ACM, 32(1):137–161, January 1985. Jonathan M. D. Hill. Vectorizing a non-strict functional language for a data-parallel “spineless (not so) tagless G-machine”: DRAFT, 1993. C. A. R. Hoare. Communicating Sequential Processes. Prentice-Hall, 1985. Douglas J. Howe. Equality in lazy computation systems. In Proceedings of the Symposium on Logic in Computer Science, pages 198–203, Pacific Grove, CA, June 1989. IEEE Computer Society Press. Douglas J. Howe. Proving congruence of bisimulation in functional programming languages. Information and Computation, 124(2):103–112, February 1996. N. D. Jones and F. Nielson. Abstract interpretation: a semantics-based tool for program analysis. In Samson Abramsky, Dov M. Gabbay, and T.S.E. Maibaum, editors, Handbook of logic in computer science. Oxford University Press, 1994. Bengt Jonsson and Joachim Parrow. Deciding bisimulation equivalences for a class of non-finitestate programs. Information and Computation, 107(2):272–, 1993. 115

[59] P. Kanellakis and S. A. Smolka. CCS expressions, finite state processes, and three problems of equivalence. Information and Computation, 86(1):43–68, May 1990. [60] R. P. Kurshan. Analysis of discrete event coordination. In J.W. de Baker, W.-P. de Roever, and G. Rozenberg, editors, Stepwise Refinement of Distributed Systems, Proceedings of the REX Workshop, volume 430 of Lecture Notes in Computer Science. Springer-Verlag, June 1989. [61] Leslie Lamport. The temporal logic of actions. Technical Report 79, Digital Equipment Corporation Stanford Research Center, 1993. To appear in TOPLAS. [62] C. Loiseaux, S. Graf, J. Sifakis, A. Bouajjani, and S. Bensalem. Property preserving abstractions for the verification of concurrent systems. Formal Methods in System Design, 6(1):11–44, 1995. [63] David E. Long. Model Checking, Abstraction, and Compositional Verification. PhD thesis, Carnegie-Mellon University, 1993. [64] John M. Lucassen and David K. Gifford. Polymorphic effect systems. In Conference Record of the ACM Symposium on Principles of Programming Languages, pages 47–57, San Diego, January 1988. ACM Press. [65] K. L. McMillan and J. C. Schwalbe. Formal verification of the gigamax cache consistency protocol. CMU Tech Report, June 1991. [66] Robin Milner. Communication and concurrency. Prentice-Hall, 1989. [67] Robin Milner. Operational and algebraic semantics of concurrent processes. In J. van Leeuwen, editor, Handbook of Theoretical Computer Science, volume B. Formal Models and Semantics, chapter 19. Elsevier/MIT Press, 1990. [68] Robin Milner, Joachim Parrow, and David Walker. A calculus of mobile processes, Parts I and II. Information and Computation, 100:1–77, September 1992. [69] Jayadev Misra. Axioms for memory access in asynchronous hardware systems. ACM Transactions on Programming Languages and Systems, 8(1):142–153, January 1986. [70] Flemming Nielson and Hanne Riis Nielson. From CML to process algebras. In Eike Best, editor, CONCUR: Proceedings of the International Conference on Concurrency Theory, volume 715 of Lecture Notes in Computer Science, pages 495–508, Hildesheim, Germany, August 1993. Springer-Verlag. [71] R. Paige and R. E. Tarjan. Three partition refinement algorithms. SIAM Journal of Computing, 16(6):973–989, December 1987. [72] Daniel W. Palmer, Jan F. Prins, and Stephen Westfold. Work-efficient nested data-parallelism. In Frontiers ’95, 1995. [73] Daniel William Palmer. Efficient Execution of Nested Data Parallel Programs. PhD thesis, University of North Carolina, 1996. [74] David Park. Concurrency and automata on infinite sequences. In Theoretical Computer Science, volume 104 of Lecture Notes in Computer Science, pages 167–183. Springer-Verlag, 1981. [75] Joachim Parrow. Verifying a CSMA/CD-protocol with CCS. In Proceedings of the IFIP Symposium on Protocol Specification, Testing and Verification, pages 373–387, Atlantic City, New Jersey, June 1988. North-Holland. [76] Iain Phillips. Refusal testing. Theoretical Computer Science, 50:241–284, 1987. [77] A. M. Pitts. Operationally-based theories of program equivalence. In P. Dybjer and A. M. 116

[78] [79]

[80] [81]

[82]

[83]

[84] [85] [86] [87]

[88] [89]

[90] [91] [92] [93]

Pitts, editors, Semantics and Logics of Computation, Publications of the Newton Institute, pages 241–298. Cambridge University Press, 1997. G.D. Plotkin. Structural operational semantics. Lecture Notes DAIMI FN-19, Aarhus University, 1981. Jan F. Prins and Daniel W. Palmer. Transforming high-level data-parallel programs into vector operations. In Proceedings of the Symposium on Principles and Practice of Parallel Programming, pages 119–128, San Diego, May 1993. (ACM SIGPLAN Notices, 28(7), July, 1993). John Hamilton Reppy. Higher-Order Concurrency. PhD thesis, Cornell University, June 1992. Available as Computer Science Technical Report 92-1285. J. Richier, C. Rodgriguez, J. Sifakis, and J. Voiron. Verification in XESAR of the sliding window protocol. In Proceedings of the IFIP Symposium on Protocol Specification, Testing and Verification, pages 235–250, Zurich, May 1987. North-Holland. V. Roy and R. de Simone. Auto/Autograph. In E.M. Clarke and R.P. Kurshan, editors, Proceedings of the 2nd Workshop on Computer-Aided Verification, volume 3 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pages 235–250, New Brunswick, NJ, June 1990. AMS Press. David Sands. Total correctness by local improvement in the transformation of functional programs. ACM Transactions on Programming Languages and Systems, 18(2):175–234, March 1996. Marcel Zvi Schreiber. Value-Passing Process Calculi as a Formal Method. PhD thesis, Imperial College of Science, University of London, January 1994. Thomas J. Sheffler and Siddhartha Chaterjee. An object-oriented approach to nested data parallelism, June 1994. D. B. Skillicorn. Models for practical parallel computation. International Journal of Parallel Programming, 20(2):133–158, 1991. Jr. Steele, G. L. and W. D. Hillis. Connection machine Lisp: Fine-grained parallel symbolic processing. In Proceedings of the ACM Conference on LISP and Functional Programming, pages 279–297, Cambridge, MA, August 1986. ACM Press. C. Stirling and D. Walker. Local model checking in the modal mu-calculus. Theoretical Computer Science, 89:161–177, 1991. Dan Suciu and Val Tannen. Efficient compilation of high-level data parallel algorithms. In Proceedings of the ACM Symposium on Parallel Algorithms and Architectures. ACM Press, June 1994. Bent Thomsen. Plain CHOCS: A second generation calculus for higher order processes. Acta Informatica, 30:1–59, 1993. Irek Ulidowski. Equivalences on observable processes. In Proceedings of the Symposium on Logic in Computer Science. IEEE Computer Society Press, June 1992. Irek Ulidowski. Local Testing and Implementable Concurrent Processes. PhD thesis, Imperial College of Science, University of London, 1994. R. J. van Glabbeek. The linear time – branching time spectrum. In J.C.M. Baeten and J.W. Klop, editors, CONCUR ’90: Theories of Concurrency—Unification and Extension, volume 117

[94]

[95]

[96]

[97]

[98]

[99]

458 of Lecture Notes in Computer Science, pages 178–297, Amsterdam, August 1990. SpringerVerlag. R. J. van Glabbeek. The linear time – branching time spectrum II. In Eike Best, editor, CONCUR: Proceedings of the International Conference on Concurrency Theory, volume 715 of Lecture Notes in Computer Science, pages 66–81, Hildesheim, Germany, August 1993. SpringerVerlag. Moshe Vardi and Pierre Wolper. An automata-theoretic approach to automatic program verification. In Proceedings of the Symposium on Logic in Computer Science, pages 332–345, Cambridge, MA, June 1986. IEEE Computer Society Press. G. Winskel. A note on model checking the modal ν-calculus. In G. Ausiello, M. DezaniCiancaglini, and S. Ronchi della Rocca, editors, Proceedings of the International Colloquium on Automata, Languages and Programming, volume 317 of Lecture Notes in Computer Science, pages 761–772, Stresa, Italy, July 1989. Springer-Verlag. Martin Wirsing. Algebraic specification. In J. van Leeuwen, editor, Handbook of Theoretical Computer Science, volume B. Formal Models and Semantics, chapter 13. Elsevier/MIT Press, 1990. Pierre Wolper. Expressing interesting properties of programs in propositional temporal logic. In Conference Record of the ACM Symposium on Principles of Programming Languages, St. Petersburg, FL, January 1986. ACM Press. Shengzong Zhou, Rob Gerth, and Ruurd Kuiper. Transformation preserving properties and properties preserved by transformationas in fair transition systems. In Eike Best, editor, CONCUR: Proceedings of the International Conference on Concurrency Theory, volume 715 of Lecture Notes in Computer Science, pages 353–367, Hildesheim, Germany, August 1993. SpringerVerlag.

118