Program Units as Higher-Order Modules

Matthew Flatt

Matthias Felleisen

Rice University

Abstract We have designed a new module language called program units. Units support separate compilation, independent module reuse, cyclic dependencies, hierarchical structuring, and dynamic linking. In this paper, we present untyped and typed models of units.

1 Program Fragments and Units

Programmers consume code fragments to create programs, and produce code fragments for other programs. When managing fragments becomes mechanistic, programmers write programs that assemble and execute these fragments. Some of these programs launch fragments as separate processes. Other programs link several fragments together to produce a new program fragment. And some programs dynamically link fragments into an already-running program. Especially in this last case, the distinction between the program and the fragments that it manages begins to blur. Unfortunately, current programming languages cannot clearly express the interaction between these levels.

Programming with fragments requires a two-phase view of execution: linking followed by evaluation. This separation of phases is important because it enables the separate compilation, analysis, and optimization of fragments. It also suggests separate programming languages: a core language for implementing fragments and a module language for linking them. Much of the recent work on modularity (in particular, work on ML modules [2,10,11,15,16,18,23]) has taken this two-language view and focused on making the linking language flexible.

For MzScheme [7], we designed and implemented an extension of Scheme that carefully combines the core programming language with the linking language. In this language, code fragments called units are first-class values. The only primitive operations on units are linking and invocation, which preserves the phase separation for an individual unit, but programmers can exploit the full flexibility of the core language for the application of these operations.

In this paper we present a typed language of program units that supports

• units that import and export type definitions as well as value definitions;
• compound units that link several units together and hide selected details of the constituent units;
• procedure and type definitions with mutual references across unit boundaries;
• dynamic linking of units into a running program;
• separate compilation of units; and
• flexible linking that allows multiple instances of a unit in a single program.

Units accommodate a variety of core languages, such as ML, Ada, Modula-3, Java, Scheme, or C. Each of these languages can benefit by incorporating the unit language: ML's modules disallow mutually recursive type or function definitions; linking in Ada, Modula-3, and Java is inflexible because linking is specified within a package and relies on a global package namespace [footnote 2]; Scheme has no standard module system; and C, like most languages, lacks a standard mechanism for dynamic linking.

Section 2 provides an overview of programming with units, and Section 3 defines the precise type checking and semantics of units. Section 4 briefly considers compilation issues. The last two sections relate our unit language to existing module languages, and put this work into perspective.

Footnote 1: This research was partially supported by an NSF Graduate Research Fellowship, NSF grants CCR-9619756, CDA-9713032, and CCR-9708957, and a Texas ATP grant.

2 Programming with Units

The following subsections illustrate the basic design elements of our unit language using an informal, semi-graphical language. The examples assume a core language with lexical blocks and a sub-language of types. The syntax used for the core language mimics that of ML.

2.1 Defining Units

Figure 1 defines a unit called Database. In the graphical notation, each unit is drawn as a box with three sections:

• The top section lists the unit's imported types and values. The Database unit imports the type info (with the kind Ω) for data stored in the database, and the function error (with the type str→void) for error handling.
• The middle section contains the unit's type and value definitions and an initialization expression. The latter performs startup actions for the unit at run-time. The Database unit defines the type db and the functions new, insert, and delete (plus some other definitions that are not shown). Database entries are keyed by strings, so Database initializes a hash table for strings with the expression strTable := makeStringHashTable().
• The bottom section enumerates the unit's exported types and values. The Database unit exports the type db and the functions new, insert, and delete.

Footnote 2: With class loaders, the meaning of the global namespace can be adjusted, but this adjustment must be described indirectly via a loader object rather than directly in the language. Even then, using hardwired names for imported packages prevents linking a single package in multiple contexts.

Database
  imports:
    info::Ω
    error : str→void
  definitions and expressions:
    type db = ...
    fun new():db = ...
    fun insert(d:db, key:str, v:info) = ...
    fun delete(d:db, key:str) = ...
    (strTable := makeStringHashTable()):void
  exports:
    db::Ω
    new : void→db
    insert : db×str×info→void
    delete : db×str→void

Fig. 1. A simple database unit
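As a rough illustration of the three-part box in Figure 1, the following Python sketch models a unit as a value that records its import and export names and keeps its body unevaluated until imports are supplied. The `Unit` class, the dictionary representation of db, and the duplicate-key call to error are assumptions made for this sketch; types and kinds are not modeled at all.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass(frozen=True)
class Unit:
    """A unit value: interface names plus an unevaluated body (hypothetical model)."""
    imports: Tuple[str, ...]
    exports: Tuple[str, ...]
    # body(import_values) -> (definitions, init_thunk); nothing runs until it is called
    body: Callable[[Dict[str, Any]], Tuple[Dict[str, Any], Callable[[], Any]]]


def database_body(imports):
    error = imports["error"]            # imported function, str -> None
    state = {"strTable": None}

    def new():
        return {}                       # db is modeled as a plain dict (assumption)

    def insert(d, key, v):
        if key in d:
            error("duplicate key: " + key)   # assumed use of the imported error
        else:
            d[key] = v

    def delete(d, key):
        d.pop(key, None)

    def init():
        state["strTable"] = {}          # strTable := makeStringHashTable()
        return None

    return {"new": new, "insert": insert, "delete": delete}, init


Database = Unit(imports=("error",),
                exports=("new", "insert", "delete"),
                body=database_body)

# Resolve the import by hand, then run the initialization expression.
errors = []
defs, init = Database.body({"error": errors.append})
init()
d = defs["new"]()
defs["insert"](d, "alice", 5551234)
defs["insert"](d, "alice", 5550000)     # reported through the imported error
```

Constructing `Database` evaluates nothing; definitions and the initialization expression only run once `body` is applied to values for the imports, which mirrors the distinction between a unit and a record of values.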

In a statically typed language, all imported and exported values have a type, and all imported and exported types have a kind. Imported and defined types can be used in the type expressions for imported and exported values. All exported variables must be defined within the unit, and the type expression for an export must use only imported and exported types. In Database, both the imported type info and the exported type db are used in the type expression for insert: db×str×info→void.

A unit is specifically not a record of values (as ML structures are usually described). A unit encapsulates unevaluated code, much like the ".o" file created by compiling a C module. Before a unit's definitions and initialization expression can be evaluated, it must first be linked with other units to resolve all of its imports.

2.2 Linking Units

In the graphical notation, units are linked together via arrows connecting the exports of one box with the imports of another. Linking units together creates a compound unit, as illustrated in Figure 2 with the PhoneBook unit. This unit links Database with NumberInfo, a unit that implements the info type for phone numbers. The error function is not yet determined, so the PhoneBook unit imports error and passes the imported value on to Database. All of the exported types and values of Database and NumberInfo are re-exported by PhoneBook, except the delete function from Database. Since the delete function is not exported, it is hidden from clients of PhoneBook.

PhoneBook (compound unit)
  imports:
    error : str→void
  NumberInfo
    definitions:
      type info = ...
      fun numInfo(n:int):info = ...
    exports:
      info::Ω
      numInfo : int→info
  Database
    imports:
      info::Ω             (from NumberInfo)
      error : str→void    (from PhoneBook's import)
    exports:
      db::Ω
      new : void→db
      insert : db×str×info→void
      delete : db×str→void
  exports:
    db::Ω
    new : void→db
    insert : db×str×info→void
    info::Ω
    numInfo : int→info

Fig. 2. Linking units to form a compound unit
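The linking in Figure 2 can be approximated in Python as follows. This is only a sketch under several assumptions: units are dictionaries with a hypothetical `body(lookup)` protocol, linking is by name, imports are resolved lazily through `lookup`, and types are ignored. The helper names `make_unit` and `compound` are invented for the example.

```python
def make_unit(imports, exports, body):
    """body(lookup) -> (definitions, init); lookup(name) resolves an import lazily."""
    return {"imports": set(imports), "exports": set(exports), "body": body}


def compound(imports, exports, first, second):
    """Link two units by name into a new unit value; nothing is evaluated here."""
    imports, exports = set(imports), set(exports)
    # Each constituent may import from the compound's imports or from the other unit.
    assert first["imports"] <= imports | second["exports"]
    assert second["imports"] <= imports | first["exports"]
    # The compound may only re-export what its constituents export.
    assert exports <= first["exports"] | second["exports"]

    def body(outer_lookup):
        env = {}

        def lookup(name):
            return env[name] if name in env else outer_lookup(name)

        defs1, init1 = first["body"](lookup)
        defs2, init2 = second["body"](lookup)
        env.update(defs1)
        env.update(defs2)

        def init():
            init1()                 # initialization expressions run in sequence
            return init2()

        # Only the listed exports are visible to clients of the compound unit.
        return {name: env[name] for name in exports}, init

    return make_unit(imports, exports, body)


def number_info_body(lookup):
    return {"numInfo": lambda n: ("info", n)}, lambda: None


def database_body(lookup):
    def insert(d, key, v):
        if key in d:
            lookup("error")("duplicate key: " + key)   # assumed use of error
        else:
            d[key] = v

    defs = {"new": dict, "insert": insert,
            "delete": lambda d, key: d.pop(key, None)}
    return defs, lambda: None


NumberInfo = make_unit([], ["numInfo"], number_info_body)
Database = make_unit(["error"], ["new", "insert", "delete"], database_body)
PhoneBook = compound(["error"], ["new", "insert", "numInfo"], NumberInfo, Database)

# Resolve PhoneBook's single import by hand to inspect what it exposes.
errors = []
defs, init = PhoneBook["body"](lambda name: {"error": errors.append}[name])
init()
book = defs["new"]()
defs["insert"](book, "alice", defs["numInfo"](5551234))
```

Because `delete` is omitted from the compound's export list, clients of `PhoneBook` cannot reach it, even though `Database` still defines it internally.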

A complete program is a unit (either simple or compound) without imports. Figure 3 defines a complete interactive phone book program, InteractivePhoneBook, which links PhoneBook with a graphical interface implementation Gui and an error unit ErrorHandler. The Main unit [footnote 3] contains an initialization expression that creates a database and an associated graphical interface. A complete program is analogous to an executable file in Unix: invoking the unit evaluates the definitions in all of the program's units and then executes their initialization expressions in sequence. Thus, when InteractivePhoneBook is invoked, a new phone book database is created and a phone book window is opened by Main. The return value of the whole program is the value of the last initialization expression, which is a bool value in InteractivePhoneBook (assuming Main's expression is evaluated last) [footnote 4].

Footnote 3: Main is not a special name.
Footnote 4: Our informal graphical notation does not specify the order of units in a compound unit, but a textual notation covers this aspect of the language.

InteractivePhoneBook (compound unit, no imports)
  ErrorHandler
    definitions:
      fun error(s:str) = ...
    exports:
      error : str→void
  PhoneBook
    imports:
      error : str→void    (from ErrorHandler)
    exports:
      db::Ω, new : void→db, insert : db×str×info→void, info::Ω, numInfo : int→info
  Gui
    imports (from PhoneBook):
      db::Ω, insert : db×str×info→void, info::Ω, numInfo : int→info
    definitions:
      fun openBook(pb:db) = ...
    exports:
      openBook : db→bool
  Main
    imports:
      db::Ω, new : void→db    (from PhoneBook)
      openBook : db→bool      (from Gui)
    initialization expression:
      (let pb = new() in openBook(pb)):bool

Fig. 3. Linking units to define a complete program

Since linking and invocation are separate phases, linking can connect mutually recursive functions across units. Figure 4 defines a slightly revised version of the phone book program, IPB, where error is part of the Gui unit. Links flow both from PhoneBook to Gui and from Gui to PhoneBook. Thus, the insert function in PhoneBook may call error in Gui, which could in turn call PhoneBook's insert again to handle the error.

IPB (compound unit, no imports)
  PhoneBook
    imports:
      error : str→void    (from Gui)
    exports:
      db::Ω, new : void→db, insert : db×str×info→void, info::Ω, numInfo : int→info
  Main
    imports:
      db::Ω, new : void→db, openBook : db→bool
    initialization expression:
      (let pb = new() in openBook(pb)):bool
  Gui
    imports (from PhoneBook):
      db::Ω, insert : db×str×info→void, info::Ω, numInfo : int→info
    definitions:
      fun openBook(pb:db) = ...
      fun error(s:str) = ...
    exports:
      openBook : db→bool, error : str→void

Fig. 4. Cycles in the linking graph are allowed

A compound unit's links must satisfy the type requirements of the constituent units. For example, in InteractivePhoneBook, Main imports the type db from the PhoneBook unit and also the function openBook:db→bool from Gui. The two occurrences of db must refer to the same type. A type checker can verify this by proving that the two occurrences have the same source, which is the db exported by PhoneBook. In contrast, Figure 5 defines a "program" Bad in which inconsistent imports are provided to Main. Specifically, db and openBook:db→bool refer to types named db that originate from different units. The type checker will correctly reject Bad due to this mismatch.

Bad (compound unit)
  PhoneBook
    imports:
      error : str→void
    exports:
      db::Ω, new : void→db, insert : db×str×info→void, info::Ω, numInfo : int→info
  OtherDatabase
    definitions:
      type db = ...
    exports:
      db::Ω
  Gui
    imports:
      db::Ω, insert : db×str×info→void, info::Ω, numInfo : int→info
    definitions:
      fun openBook(pb:db) = ...
      fun error(s:str) = ...
    exports:
      openBook : db→bool, error : str→void
  Main
    imports (marked "Mismatch": the db types originate from different units):
      db::Ω, new : void→db, openBook : db→bool
    initialization expression:
      (let pb = new() in openBook(pb)):bool

Fig. 5. Illegal linking due to a type mismatch

2.3 Programs that Link and Invoke Other Programs

The IPB unit has a fixed set of constituent units: Main, PhoneBook, and Gui. But it is often useful to define the shape of a compound unit without immediately specifying all of its constituent units. For example, the interactive phone book can be implemented for different graphical platforms, e.g., Macintosh and Windows, by defining different GUI units. Every GUI unit will have the same set of imports and exports, so the linking required to produce the complete interactive phone book is independent of the specific GUI unit. In short, the IPB compound unit should be abstracted with respect to its Gui unit.

Since units are values, this form of abstraction can be achieved with a function. Figure 6 defines MakeIPB, a function that takes a GUI unit and returns an interactive phone book unit. The dashed boxes for aGui and MakeIPB indicate that the actual GUI and interactive phone book units are not yet determined. MakeIPB can be applied to different GUI implementations to produce different interactive phone book programs.

The type associated with MakeIPB's argument is a unit type, called a signature, that contains all of the information needed to verify its linkage in MakeIPB. In the graphical notation, a signature corresponds to a box with imports, exports, and an initialization expression type, but no definitions or expressions. The signature for aGui is defined by its dashed box, with :void indicating the type of the initialization expression. Using only this signature, the linking specified in MakeIPB can be completely verified, and the signature of the resulting compound unit is determined.

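fun MakeIPB(aGui) =
  (compound unit, drawn as a dashed box, no imports)
  PhoneBook
    imports:
      error : str→void    (from aGui)
    exports:
      db::Ω, new : void→db, insert : db×str×info→void, info::Ω, numInfo : int→info
  Main
    imports:
      db::Ω, new : void→db, openBook : db→bool
    initialization expression:
      (let pb = new() in openBook(pb)):bool
  aGui (signature only, drawn as a dashed box)
    imports:
      db::Ω, insert : db×str×info→void, info::Ω, numInfo : int→info
    initialization expression type:
      :void
    exports:
      openBook : db→bool, error : str→void

Fig. 6. Abstracting a compound unit over one of its constituent units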

The MakeIPB function is used to create an interactive phone book, but is not intended to be a function within the interactive phone book program. Instead, MakeIPB is part of a linking and invoking program that is written in the same language as the program it links. Invocation is expressed in this language by writing "invoke" next to a unit. For example, Starter in Figure 7 is a program that uses invoke to run an interactive phone book with a Macintosh GUI.

Starter
  fun MakeIPB(aGui) = ...
  val MacintoshGui =
    (unit)
    imports:
      db::Ω, insert : db×str×info→void, info::Ω, numInfo : int→info
    initialization expression:
      ... : void
    exports:
      openBook : db→bool
  (invoke MakeIPB(MacintoshGui)):bool

Fig. 7. A program that links and invokes an interactive phone book
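The idea behind MakeIPB and Starter, an ordinary function that takes a unit and returns a linked program which is then invoked, can be sketched in Python. Everything here is an assumption for illustration: units are modeled as functions from a lazy `lookup` to a pair of definitions and an initialization thunk, `link` is an n-ary compound that links by name, `make_gui` stands in for writing separate Macintosh and Windows GUI units, and signatures are not checked.

```python
def link(*units):
    """Compound of several units, linked by name; returns a new unit."""
    def body(outer):
        env = {}
        lookup = lambda name: env[name] if name in env else outer(name)
        inits = []
        for unit in units:
            defs, init = unit(lookup)
            env.update(defs)
            inits.append(init)

        def init_all():
            result = None
            for init in inits:
                result = init()
            return result        # value of the last initialization expression

        return env, init_all
    return body


def invoke(unit, **imports):
    """Run a unit, taking its imports (if any) from the keyword arguments."""
    defs, init = unit(lambda name: imports[name])
    return init()


def phone_book(lookup):
    def insert(db, key, info):
        if key in db:
            lookup("error")("duplicate key: " + key)   # assumed use of error
        else:
            db[key] = info
    return {"new": dict, "insert": insert}, lambda: None


def main(lookup):
    # Initialization expression: let pb = new() in openBook(pb)
    return {}, lambda: lookup("openBook")(lookup("new")())


def make_gui(platform, log):
    """Build a GUI unit for a hypothetical platform; log records what it shows."""
    def gui(lookup):
        def open_book(pb):
            log.append(platform + ": opened book with %d entries" % len(pb))
            return True
        return {"openBook": open_book, "error": log.append}, lambda: None
    return gui


def make_ipb(a_gui):
    # Main is listed last so that its expression yields the program's result.
    return link(phone_book, a_gui, main)


log = []
mac_result = invoke(make_ipb(make_gui("Macintosh", log)))
win_result = invoke(make_ipb(make_gui("Windows", log)))
```

`make_ipb` is called twice with different GUI units, producing two independent programs from the same linking shape; this is the reuse that a fixed, global linking namespace would rule out.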

2.4 Dynamic Linking

The invoke form also works on units that are not complete programs. In this case, the unit's imports are satisfied by types and values from the lexical environment of the invoke expression in the invoking program. This generalized form of invocation implements dynamic linking for units.

For example, the phone book program can exploit dynamic linking to support third-party "plug-in" extensions that load phone numbers from a foreign source. Each loader extension is implemented as a unit that is dynamically linked with the phone book program. The core language must provide a syntactic form that retrieves a unit value from an archive, such as the Internet, and checks that the unit matches a particular signature [footnote 5]. Then, a phone book user can install a loader extension at run-time.

Figure 8 defines a Gui unit that supports loader extensions. The function addLoader consumes a loader extension as a unit and dynamically links it into the program using invoke. The extension unit imports types and functions that enable it to modify the phone book database. These imports are satisfied in the invoke expression with types and variables that were originally imported into Gui, plus the error function defined within Gui. The result of invoking the extension unit is the value of the unit's initialization expression, which is required (via signatures) to be a function with the type db×file→void. This function is then installed into the interface's table of loader functions.

Footnote 5: Type-checking in the load expression's context ensures that dynamic linking is type-safe. Java's dynamic class loading is broken because it checks types in a type environment that may differ from the environment where the class is used [19].



Gui
  imports:
    db::Ω, insert : db×str×info→void, info::Ω, numInfo : int→info
  definitions:
    fun error(s:str) = ...
    fun registerLoader(format:str, loader:db×file→void) = ...
    fun addLoader(format:str, aLoader) =
      registerLoader(format, (invoke aLoader):db×file→void)
      where aLoader (dashed box) imports
        db::Ω, insert : db×str×info→void, info::Ω, numInfo : int→info, error : str→void
  exports:
    openBook : db→bool, error : str→void

Fig. 8. Dynamic linking with invoke
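A loose Python analogue of addLoader is shown below. It assumes that an extension unit is a function from a lazy `lookup` to a pair of definitions and an initialization thunk, and that `invoke` takes the unit's imports as keyword arguments; the "csv" format, the `csv_loader` extension, and its line syntax are hypothetical. No signature check is performed, so the type-safety guarantee discussed in the text is not captured.

```python
def invoke(unit, **imports):
    """Run a unit, satisfying its imports from the caller's environment."""
    defs, init = unit(lambda name: imports[name])
    return init()


loaders = {}          # the interface's table of loader functions
errors = []


def insert(db, key, info):
    db[key] = info


def num_info(n):
    return ("info", n)


def error(msg):
    errors.append(msg)


def add_loader(fmt, a_loader):
    # The unit's initialization expression must yield a function (db, file) -> None.
    loaders[fmt] = invoke(a_loader, insert=insert, numInfo=num_info, error=error)


# A hypothetical third-party extension unit for a made-up "csv" format.
def csv_loader(lookup):
    def load(db, file):
        for line in file:
            name, sep, number = line.strip().partition(",")
            if not sep or not number.isdigit():
                lookup("error")("bad line: " + line.strip())
            else:
                lookup("insert")(db, name, lookup("numInfo")(int(number)))
    return {}, lambda: load


add_loader("csv", csv_loader)
book = {}
loaders["csv"](book, ["alice,5551234", "oops"])
```

The extension never names the phone book program directly; it only sees the values that `add_loader` chooses to pass at the `invoke` site.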

3 The Structure and Interpretation of Units

In this section we develop a precise account of the unit language design in three stages. We start in Section 3.1 with units as an extension of a dynamically typed language to introduce the basic syntax and semantics for units. In Section 3.2, we enrich this language with definitions for constructed types (like classes in Java or datatypes in ML). Finally, in Section 3.3 we consider arbitrary type definitions (like type equations in ML). For all three sections, we only consider those parts of the core language that are immediately relevant to units.

The rigorous description of the unit language, including its type structure and semantics, relies on well-known type checking and rewriting techniques for Scheme and ML [5,11,25]. In the rewriting model of evaluation, the set of program expressions is partitioned into a set of values and a set of non-values. Evaluation is the process of rewriting a non-value expression within a program to an equivalent expression, repeating this process until the whole program is rewritten to a value.

For example, a simple unit expression (represented in the graphical language by a box containing text code) is a value, while a compound unit expression is not. A compound unit expression can be rewritten to an equivalent unit expression by merging the text of the constituent units, as demonstrated in Figure 9. Invocation for a unit is similar: an invoke expression is rewritten by extracting the invoked unit's definitions and initialization expression, and then replacing references to imported variables with values supplied for the imports. Otherwise, the standard rules for functions, assignments, and exceptions apply.

PhoneBook (compound unit, before reduction)
  imports:
    error : str→void
  NumberInfo
    definitions:
      type info = ...
      fun numInfo(n:int):info = ...
    exports:
      info::Ω, numInfo : int→info
  Database
    imports:
      info::Ω, error : str→void
    definitions and expressions:
      type db = ...
      fun new():db = ...
      fun insert(d:db, key:str, v:info) = ...
      fun delete(d:db, key:str) = ...
      (strTable := makeStringHashTable()):void
    exports:
      db::Ω, new : void→db, insert : db×str×info→void, delete : db×str→void
  exports:
    db::Ω, new : void→db, insert : db×str×info→void, info::Ω, numInfo : int→info

      reduces to

PhoneBook (simple unit, after reduction)
  imports:
    error : str→void
  definitions and expressions:
    type info = ...
    type db = ...
    fun numInfo(n:int):info = ...
    fun new():db = ...
    fun insert(d:db, key:str, v:info) = ...
    fun delete(d:db, key:str) = ...
    (strTable := makeStringHashTable()):void
  exports:
    db::Ω, new : void→db, insert : db×str×info→void, info::Ω, numInfo : int→info

Fig. 9. Graphical reduction rule for a compound unit

3.1 Dynamically Typed Units

Figure 10 defines the syntax of Unit_d, an extension of a dynamically typed core language. The unspecified expression forms of the core language are extended with three unit-specific forms: a unit form for creating units, a compound form for linking units into a compound unit, and an invoke form for invoking units. The core language must provide two forms that are used in the process of linking and invoking: an expression sequence form (";") and a letrec form for lexical blocks containing mutually recursive definitions.

  e                    = unit imports exports definitions e
                       | compound imports exports link e linkage and e linkage
                       | invoke e with invoke-linkage
                       | e ; e
                       | letrec value-defn* in e
  imports              = import value-var-decl*
  exports              = export value-var-decl*
  definitions          = value-defn*
  value-defn           = val value-var-decl = e
  linkage              = with value-var-decl* provides value-var-decl*
  invoke-linkage       = value-invoke-linkage*
  value-invoke-linkage = value-var-decl = e
  value-var-decl       = x
  x                    = value variable

Fig. 10. Syntax of Unit_d, units for a dynamically typed core language

The unit form consists of a set of import and export declarations followed by internal definitions and an initialization expression. The variables specified in the imports section of the unit are bound in the definition and initialization expressions. All variables listed in the exports section must be defined within the unit. In each definition, the expression on the right-hand side of = must be a valuable expression in the sense of Harper and Stone [11] (i.e., evaluating the expression must not incur any computational effects), with the restriction that imported and defined variables are not considered valuable [footnote 6]. The scope of a defined variable includes all of the definition expressions in the unit as well as the initialization expression.

A unit expression is a first-class value. There are only two operations on this value: linking the unit and invoking the unit. There is no way to "look inside" a unit value to extract any information about its definitions or initialization expression. In particular, there is no "dot notation" for accessing parts of a unit, since a unit contains only unevaluated definitions and expressions.

The compound form links two constituent units together into a new unit [footnote 7]. Like unit, the compound form starts with a list of imported and exported variables. The imported variables can be supplied as imports to the compound unit's constituents. The exported variables must be a subset of the constituents' exports. The constituent units are determined by two subexpressions: one after the link keyword and another after the and keyword. Along with each constituent unit expression, the variables that the unit is expected to import are listed after the with keyword, and the variables that the unit is expected to export are listed after the provides keyword.

Variables are linked within a compound unit by name. Thus, the set of variables listed after with for the first constituent unit must be a subset of the variables imported by the compound expression plus the variables listed after provides for the second constituent unit. Similarly, the variables exported by the compound expression must be a subset of the combined set of variables listed after provides for each of the constituent units.

A compound expression is not a value. It evaluates to a unit value that is indistinguishable from a unit created by unit with the same imports and exports. This unit's initialization expression is the sequence of the first constituent unit's initialization expression followed by the second constituent unit's.

The invoke form consumes a unit, determined by a single expression, and invokes it. If the unit requires any imported values, they can be provided in the invoke-linkage section of the invoke expression, which associates values with names for the unit's imports. An invoke expression evaluates to the value of the invoked unit's initialization expression.

To simplify the presentation, Unit_d does not allow α-renaming for a unit's imported and exported variables. In MzScheme's implementation of units, imported and exported variables have separate internal and external names, so all bound variables within a unit can be α-renamed. Also, MzScheme's compound form links imports and exports via source and destination name pairs, rather than requiring the same name at both ends of a linkage.

Footnote 6: This restriction simplifies the presentation of the formal semantics, but it can be lifted for an implementation, as in MzScheme, where accessing an undefined variable is detected as a run-time error.
Footnote 7: Linking an arbitrary number of units together in a single compound expression (as in MzScheme's implementation) is a simple generalization.

3.1.1 Unit_d Context-sensitive Checking

The rules in Figure 11 enforce the context-sensitive properties that were informally described in the previous section. The checks ensure that a variable is not multiply defined, imported, or exported, and that the link clause of a compound expression is locally consistent.

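\[
\frac{\overline{x_w}\ \text{distinct} \qquad \Gamma \vdash e_u \qquad \Gamma \vdash \overline{e_w}}
     {\Gamma \vdash \mathbf{invoke}\ e_u\ \mathbf{with}\ \overline{x_w = e_w}}
\]

\[
\frac{\overline{x_i} \cup \overline{x}\ \text{distinct} \qquad \overline{x_e} \subseteq \overline{x} \qquad
      \Gamma \cup \overline{x_i} \cup \overline{x} \vdash \overline{e} \qquad
      \Gamma \cup \overline{x_i} \cup \overline{x} \vdash e_b}
     {\Gamma \vdash \mathbf{unit}\ \mathbf{import}\ \overline{x_i}\ \mathbf{export}\ \overline{x_e}\
      \mathbf{val}\ \overline{x = e}\ \mathbf{in}\ e_b}
\]

\[
\frac{\begin{array}{c}
      \overline{x_i} \cup \overline{x_{p1}} \cup \overline{x_{p2}}\ \text{distinct} \qquad
      \overline{x_e}\ \text{distinct} \qquad
      \overline{x_{w1}} \subseteq \overline{x_i} \cup \overline{x_{p2}} \qquad
      \overline{x_{w2}} \subseteq \overline{x_i} \cup \overline{x_{p1}} \\
      \overline{x_e} \subseteq \overline{x_{p1}} \cup \overline{x_{p2}} \qquad
      \Gamma \vdash e_1 \qquad \Gamma \vdash e_2
      \end{array}}
     {\begin{array}{l}
      \Gamma \vdash \mathbf{compound}\ \mathbf{import}\ \overline{x_i}\ \mathbf{export}\ \overline{x_e} \\
      \qquad \mathbf{link}\ e_1\ \mathbf{with}\ \overline{x_{w1}}\ \mathbf{provides}\ \overline{x_{p1}}\
      \mathbf{and}\ e_2\ \mathbf{with}\ \overline{x_{w2}}\ \mathbf{provides}\ \overline{x_{p2}}
      \end{array}}
\]

The notation $\overline{x}$ indicates either a set or sequence of variables x, depending on the context. The notation val $\overline{x = e}$ indicates the sequence val x = e where each x is taken from the set $\overline{x}$ with a corresponding e from the set $\overline{e}$.

Fig. 11. Checking the form of Unit_d expressions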

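\[
\begin{array}{l}
\mathbf{invoke}\ (\mathbf{unit}\ \mathbf{import}\ \overline{x_i}\ \mathbf{export}\ \overline{x_e}\
  \mathbf{val}\ \overline{x = e}\ \mathbf{in}\ e_b)\ \mathbf{with}\ \overline{x_w = v_w} \\
\qquad \longrightarrow\ [\overline{v_w}/\overline{x_w}](\mathbf{letrec}\ \mathbf{val}\ \overline{x = e}\ \mathbf{in}\ e_b)
\qquad \text{if } \overline{x_i} \subseteq \overline{x_w}
\end{array}
\]

\[
\begin{array}{l}
\mathbf{compound}\ \mathbf{import}\ \overline{x_i}\ \mathbf{export}\ \overline{x_e} \\
\quad \mathbf{link}\ (\mathbf{unit}\ \mathbf{import}\ \overline{x_{i1}}\ \mathbf{export}\ \overline{x_{e1}}\
  \mathbf{val}\ \overline{x_1 = e_1}\ \mathbf{in}\ e_{b1})\ \mathbf{with}\ \overline{x_{w1}}\ \mathbf{provides}\ \overline{x_{p1}} \\
\quad \mathbf{and}\ (\mathbf{unit}\ \mathbf{import}\ \overline{x_{i2}}\ \mathbf{export}\ \overline{x_{e2}}\
  \mathbf{val}\ \overline{x_2 = e_2}\ \mathbf{in}\ e_{b2})\ \mathbf{with}\ \overline{x_{w2}}\ \mathbf{provides}\ \overline{x_{p2}} \\
\qquad \longrightarrow\ \mathbf{unit}\ \mathbf{import}\ \overline{x_i}\ \mathbf{export}\ \overline{x_e}\
  \mathbf{val}\ \overline{x_1 = e_1}\ \mathbf{val}\ \overline{x_2 = e_2}\ \mathbf{in}\ e_{b1} ; e_{b2} \\
\qquad \text{if } \overline{x_1} \cup \overline{x_2} \cup \overline{x_i}\ \text{distinct},\
  \overline{x_{i1}} \subseteq \overline{x_{w1}},\ \overline{x_{p1}} \subseteq \overline{x_{e1}},\
  \overline{x_{i2}} \subseteq \overline{x_{w2}},\ \text{and } \overline{x_{p2}} \subseteq \overline{x_{e2}}
\end{array}
\]

Fig. 12. Reducing Unit_d expressions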
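The two reduction rules of Figure 12 can be mimicked as syntactic rewrites on a small term representation. The sketch below assumes expressions are nested Python tuples with variables written as `("var", name)`; the `Unit` dataclass, `subst`, `reduce_compound`, and `reduce_invoke` are names chosen for this example, and only the side conditions of the reduction rules are checked (not the context-sensitive checks of Figure 11).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    imports: frozenset
    exports: frozenset
    defs: tuple      # (("val", x, e), ...)
    init: tuple      # the initialization expression e_b


def fs(*names):
    return frozenset(names)


def defined(u):
    return frozenset(d[1] for d in u.defs)


def subst(expr, values):
    """Replace ("var", x) by values[x] wherever x is a supplied import."""
    if isinstance(expr, tuple):
        if expr[:1] == ("var",):
            return values.get(expr[1], expr)
        return tuple(subst(part, values) for part in expr)
    return expr


def reduce_compound(imports, exports, u1, w1, p1, u2, w2, p2):
    """compound ... link u1 with w1 provides p1 and u2 with w2 provides p2 --> unit"""
    d1, d2 = defined(u1), defined(u2)
    # Side conditions: merged definitions are distinct, each unit needs no more
    # than its with-clause and provides at least its provides-clause.
    assert not (d1 & d2) and not ((d1 | d2) & imports)
    assert u1.imports <= w1 and p1 <= u1.exports
    assert u2.imports <= w2 and p2 <= u2.exports
    return Unit(imports, exports, u1.defs + u2.defs, ("seq", u1.init, u2.init))


def reduce_invoke(u, values):
    """invoke u with x_w = v_w --> [v_w/x_w](letrec defs in init)"""
    assert u.imports <= set(values)
    return subst(("letrec", u.defs, u.init), values)


a = Unit(fs("n"), fs("get"),
         (("val", "get", ("lambda", "_", ("var", "n"))),),
         ("app", ("var", "get")))
b = Unit(fs("get"), fs(),
         (("val", "use", ("lambda", "_", ("app", ("var", "get")))),),
         ("app", ("var", "use")))

program = reduce_compound(fs("n"), fs("get"), a, fs("n"), fs("get"), b, fs("get"), fs())
term = reduce_invoke(program, {"n": ("const", 21)})
```

After the compound step, `program` is an ordinary unit whose definitions are the concatenation of both constituents; the invoke step then yields a letrec term in which the imported variable n has been replaced by the supplied value.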


3.1.2 Unit_d Evaluation

The unit-specific reduction rules for Unit_d are defined in Figure 12. These rules are a modification of those for Scheme [5]. The first rule shows that an invoke expression reduces to a letrec expression containing the invoked unit's definitions and initialization expression. In this letrec expression, imported variables are replaced with values supplied for the imports. The variables supplied by invoke's with clause must cover all of the imports required by the unit.

The second rule defines how the compound expression combines two units: the definitions from each unit are merged and the initialization expressions are sequenced. The compound rule requires that the constituent units provide at least the expected exports (according to the provides clauses) and need no more than the expected imports (according to the with clauses). The reduction rule also requires that unexported definitions in the two units have been appropriately α-renamed to avoid collisions when the definitions are merged.

3.2 Units with Constructed Types

Figure 13 extends the language in Figure 10 for a statically typed language with programmer-defined constructed types, such as ML datatypes. In the new language, Unit_c, the imports and exports of a unit expression include type variables as well as value variables. All type variables have a kind [footnote 8] and all value variables have a type. The compound and invoke expressions are extended in the natural way to handle imported and exported types.

The definition section of a unit expression contains both type and value definitions. Type definitions of the form type t = x_l τ_l | x_r τ_r ▷ x_s are similar to ML datatype definitions. For simplicity, every type defined in Unit_c has exactly two variants. Instances of the first variant are constructed with the x_l function, which takes a value of type τ_l and constructs a value of type t. Instances of the second variant are constructed with x_r given a value of type τ_r. The x_s function is the standard selector function for a datatype.

The type of a unit expression is a signature of the form sig imports exports τ end, where imports specifies the kinds and types of a unit's imports and exports describes the kinds and types of its exports. As in unit, types in either imports or exports can be used in the type expressions within the signature. The type expression τ is the type of the unit's initialization expression, which cannot depend on type variables listed in exports.

The type checking and evaluation rules for Unit_c are natural extensions to those of Unit_d. The interested reader is referred to Appendix A for details.

Footnote 8: The only kind in this language is Ω, which is the kind of types for values. We declare explicit kinds in anticipation of future work that handles type constructors and polymorphism, which require kinds such as Ω→Ω.

  imports             = import type-var-decl* value-var-decl*
  exports             = export type-var-decl* value-var-decl*
  definitions         = datatype-defn* value-defn*
  datatype-defn       = type t = x τ | x τ ▷ x
  linkage             = with type-var-decl* value-var-decl* provides type-var-decl* value-var-decl*
  invoke-linkage      = type-invoke-linkage* value-invoke-linkage*
  type-invoke-linkage = type-var-decl = τ
  type-var-decl       = t :: κ
  value-var-decl      = x : τ
  τ, σ                = t | τ → τ | signature
  signature           = sig imports exports τ end
  t                   = type variable
  κ                   = type kind

Fig. 13. Syntax extensions for Unit_c, units for a core language with constructed types

3.3 Units with Type Equations

Unit_c is sufficient to extend languages where a new constructor is associated with every defined type. Other languages support type equations of the form type t = τ, which defines t as an abbreviation for the type τ. If the complete program is known, the variable t can be replaced everywhere with τ. Otherwise, the expansion of t must be delayed until the program is fully assembled.

Unit_e extends Unit_c with type equations. Since two units can contain mutually recursive definitions, naively linking two units with type equations may result in a cyclic type definition. To prevent cyclic definitions created by linking, signatures in Unit_e include information about type dependencies.

  definitions = type-defn* datatype-defn* value-defn*
  type-defn   = type t :: κ = τ
  signature   = sig imports exports depends dependency* τ end
  dependency  = t ⇝ t

Fig. 14. Syntax extensions for Unit_e, units for a language with type definitions

Figure 14 defines syntax extensions for Unit_e, including a new signature form that contains a depends clause. The dependency declaration t_e ⇝ t_i means that an exported type t_e depends on an imported type t_i. When two units are linked with a compound expression, the unit system traces the set of dependencies to ensure that linking does not create a cyclic type definition. The signature for a compound expression propagates dependency information for types imported into and exported from the compound unit.

The type checking and evaluation rules for Unit_e are natural extensions to those of Unit_c. The interested reader is referred to Appendix B for details.
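One way to picture the dependency tracing described above is as cycle detection in a directed graph whose edges are the depends declarations of the linked units. The Python sketch below is an assumption about how such a check could look, not the formal rule from the appendix: it takes (exported type, imported type) pairs whose names have already been matched up by the linkage and reports whether some type would end up depending on itself.

```python
def creates_cycle(dependencies):
    """dependencies: iterable of (exported_type, imported_type) pairs collected
    from all linked units. Returns True if some type depends on itself."""
    graph = {}
    for exported, imported in dependencies:
        graph.setdefault(exported, set()).add(imported)

    visiting, done = set(), set()

    def visit(t):
        if t in done:
            return False
        if t in visiting:
            return True            # reached a type that is still being expanded
        visiting.add(t)
        cyclic = any(visit(n) for n in graph.get(t, ()))
        visiting.discard(t)
        done.add(t)
        return cyclic

    return any(visit(t) for t in list(graph))


# Unit A exports a, defined in terms of an imported b;
# unit B exports b, defined in terms of an imported a.
bad_link = creates_cycle([("a", "b"), ("b", "a")])

# Here b depends on c, which has no further dependencies.
good_link = creates_cycle([("a", "b"), ("b", "c")])
```

In the first case, expanding a would require expanding b, which requires a again, so a linker following this scheme would reject the compound expression; the second linkage is acyclic and would be accepted.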

4 Implementation

Closed units can be compiled separately in the same way as closed functors in ML. When compiling a unit, imported types are not yet determined and thus have unknown representations. Hence, expressions involving imported types must be compiled like polymorphic functions in ML [22,14]. Otherwise, the restrictions implied by a unit's interface allow inter-procedural optimizations within the unit (such as inlining, specialization, and dead-code elimination). Furthermore, since a compound unit is equivalent to a simple unit that merges its constituent units, intra-unit optimization techniques naturally extend to inter-unit optimizations when a compound expression has known constituent units.

In MzScheme's implementation of Unit_d, units are compiled by transforming them into procedures. The unit's imported and exported variables are implemented as first-class reference cells that are externally created and passed to the procedure when the unit is invoked. The procedure is responsible for filling the export cells with exported values and for remembering the import cells for accessing imports later. The return value of the procedure is a closure that evaluates the unit's initialization expression. Figure 15 illustrates this transformation on an atomic unit. A compound unit encapsulates a list of constituent units (instead of definitions) and a procedure that propagates import and export cells to the constituent units, creating new cells to implement variables in the constituents that are not exposed by the compound unit.
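The cell-passing strategy described above can be transcribed into Python to show how a linker might wire two mutually dependent units together. The `Cell` class and the `even_unit` partner are assumptions added for this sketch (the paper's Figure 15 shows only the odd unit), and the hand-written linking code at the bottom stands in for what a compound unit's procedure would do.

```python
class Cell:
    """A first-class reference cell for one imported or exported variable."""
    def __init__(self):
        self.value = None


def odd_unit(even_cell, odd_cell):
    # Fill the export cell; the closure remembers the import cell for later use.
    odd_cell.value = lambda x: False if x == 0 else even_cell.value(x - 1)
    # The return value is a closure for the initialization expression, (odd 13).
    return lambda: odd_cell.value(13)


def even_unit(odd_cell, even_cell):
    # Hypothetical partner unit that imports odd and exports even.
    even_cell.value = lambda x: True if x == 0 else odd_cell.value(x - 1)
    return lambda: None


# A linker creates the cells, passes them to both procedures, and then runs
# the initialization closures in order.
even_cell, odd_cell = Cell(), Cell()
init_even = even_unit(odd_cell, even_cell)
init_odd = odd_unit(even_cell, odd_cell)
init_even()
result = init_odd()
```

Because each procedure only stores references to cells, neither unit needs the other's definition to exist when it is instantiated; the cells are filled before any initialization closure runs, which is what allows cyclic links.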

  (unit (import even)
        (export odd)
    (define odd
      (lambda (x)
        (if (zero? x) #f (even (sub1 x)))))
    (odd 13))

  =>

  (lambda (even-cell odd-cell)
    (set-cell! odd-cell
      (lambda (x)
        (if (zero? x) #f ((cell-value even-cell) (sub1 x)))))
    (lambda () ((cell-value odd-cell) 13)))

Fig. 15. An example of the basic compilation strategy for Scheme units

5 Related Module Languages

Our unit language incorporates ideas that have evolved in distinct language communities:

• Traditional languages like C have relied on the filesystem as the language of modules. Programs (makefiles) manipulate ".o" files to select the modules that are linked into a program, and module files are partially linked to create new ".o" or library files. Modern linking systems such as ELF [21] support dynamic linking. However, even the most advanced linking systems rely on a global namespace of function names and module (i.e., file) names, so that modules can only be linked and invoked once in a program.

• Languages such as Ada [1], Modula-2 [24], Modula-3 [9], Haskell [12], Common Lisp [20], and Java [8] have established the "packages" approach to modularity, in which type and value definitions are grouped into packages that explicitly import parts of other packages. The package system delineates the boundaries of each module and forces the specification of static dependencies between modules. Linking and invocation are clearly separated, which allows mutually recursive function definitions across package boundaries. The main weakness of a package system is its reliance on a global namespace of packages with importing connections hardwired into each package. In contrast to our unit language, package systems do not permit the reuse of a single package for multiple invocations in a program or the external selection of connections between packages [footnote 9]. There is also no way to merge several packages into a new package that hides parts of the constituent packages. These shortcomings make packages less reusable than units. Among the languages with packages, only Java provides a mechanism for dynamic linking. This mechanism is similar to the dynamic linking for units, but it is expressed indirectly via the language of class loaders, and is not fully general due to the constraints of a global package namespace.

• ML's functor system [17] is the most notable example of a language that lets a programmer describe abstractions over modules and gives a programmer direct control over assembling modules. Programmers can create modules that are completely private to other modules by instantiating functors anonymously as arguments to other functors. The ML community has produced a large body of work exploring variations on the basic module system, especially variations for higher-order modules [2,10,15,16,18,23]. Unfortunately, the standard mechanism for combining modules prevents the definition of mutually recursive types or procedures across module boundaries. And unlike units, ML provides no mechanism for dynamic linking, since the module language is distinct from the evaluation language with a strict phase separation. Duggan and Sourelis have investigated "mixins" as a solution to the recursion problem [4]. Their approach is radically different from ours and does not address the problems of higher-order modules or dynamic linking.

In addition, Cardelli [3] anticipated the unit language's emphasis on module linking as well as module definition. Our unit model is more concrete than his proposal and addresses many of his suggestions for future work. Kelsey's proposed module system for Scheme [13] captures most of the organizational properties of units, but does not address static typing or dynamic linking. In short, our unit model fuses the best parts of existing module systems in a novel, compact way that is applicable to many core languages.

Footnote 9: Modula-3's generics allow the former but not the latter.

6 Conclusion

Encapsulating program fragments is only half the story for modular programming. The other half is linking and invoking these encapsulations, sometimes in the context of a program that is already executing. A promising approach to giving programmers control over the latter half is to integrate program fragments into the core programming language. We have shown how this can be accomplished with program units to give the programmer a flexible language for combining program fragments without sacrificing the distinct phases of linking and evaluation.

The unit language was originally implemented to simplify the development of DrScheme [6], Rice's Scheme programming environment. Units simplify DrScheme's implementation as a large and dynamic program. DrScheme supports multiple language dialects and third-party extensions that hook into its complex graphical interface. DrScheme also acts as a kind of operating system for client programs that are being developed, launching client programs by dynamically linking them into the system while maintaining the boundaries between clients. Units express DrScheme's extensibility and OS-nature directly and elegantly.

Our proposal does not necessitate a tight integration of units into the core language. A weaker form of integration (using a separate language for defining and linking units) can achieve similar benefits if essential design features are kept intact: compound units, a mechanism to inject units into the set of run-time values, and a core expression for invoking units. In future work, we intend to explore linking languages that are separate from the core language to determine the optimal level of integration.

References

[1] Barnes, J. G. P. Programming in Ada 95. Addison-Wesley, 1996.
[2] Biswas, S. K. Higher-order functors with transparent signatures. In Proc. ACM Symposium on Principles of Programming Languages (1995), pp. 154–163.
[3] Cardelli, L. Program fragments, linking, and modularization. In Proc. ACM Symposium on Principles of Programming Languages (1997), pp. 266–277.
[4] Duggan, D., and Sourelis, C. Mixin modules. In Proc. ACM International Conference on Functional Programming (1996), pp. 262–273.
[5] Felleisen, M., and Hieb, R. The revised report on the syntactic theories of sequential control and state. Tech. Rep. 100, Rice University, June 1989. Theoretical Computer Science, volume 102, 1992, pp. 235–271.
[6] Findler, R. B., Flanagan, C., Flatt, M., Krishnamurthi, S., and Felleisen, M. DrScheme: A pedagogic programming environment for Scheme. In Proc. International Symposium on Programming Languages: Implementations, Logics, and Programs (1997), pp. 369–388.
[7] Flatt, M. PLT MzScheme: Language manual. Tech. Rep. TR97-280, Rice University, 1997.
[8] Gosling, J., Joy, B., and Steele, G. The Java Language Specification. The Java Series. Addison-Wesley, Reading, MA, USA, June 1996.
[9] Harbison, S. P. Modula-3. Prentice Hall, 1991.
[10] Harper, R., and Lillibridge, M. A type-theoretic approach to higher-order modules with sharing. In Proc. ACM Symposium on Principles of Programming Languages (1994), pp. 123–137.
[11] Harper, R., and Stone, C. A type-theoretic semantics for Standard ML 1996. Submitted for publication, 1997.
[12] Hudak, P., and Wadler, P. (Eds.). Report on the programming language Haskell. Tech. Rep. YALE/DCS/RR777, Yale University, Department of Computer Science, Aug. 1991.
[13] Kelsey, R. A. Fully-parameterized modules or the missing link. Tech. Rep. 97-3, NEC Research Institute, 1997.
[14] Leroy, X. Unboxed objects and polymorphic typing. In Proc. ACM Symposium on Principles of Programming Languages (1992), pp. 177–188.
[15] Leroy, X. Manifest types, modules, and separate compilation. In Proc. ACM Symposium on Principles of Programming Languages (1994), pp. 109–122.
[16] Leroy, X. Applicative functors and fully transparent higher-order modules. In Proc. ACM Symposium on Principles of Programming Languages (1995), pp. 142–153.
[17] MacQueen, D. Modules for Standard ML. In Proc. ACM Conference on Lisp and Functional Programming (1984), pp. 198–207.
[18] MacQueen, D. B., and Tofte, M. A semantics for higher-order functors. In European Symposium on Programming (Apr. 1994), Springer-Verlag, LNCS 788, pp. 409–423.
[19] Saraswat, V. Java is not type-safe, Aug. 1997. URL: www.research.att.com/vj/bug.html.
[20] Steele Jr., G. L. Common Lisp: The Language, second ed. Digital Press, 1990.
[21] SunSoft. SunOS 5.5 Linker and Libraries Manual, 1996.
[22] Tarditi, D., Morrisett, G., Cheng, P., Stone, C., Harper, R., and Lee, P. TIL: A type-directed optimizing compiler for ML. In Proc. ACM Conference on Programming Language Design and Implementation (1996), pp. 181–192.
[23] Tofte, M. Principal signatures for higher-order program modules. In Proc. ACM Symposium on Principles of Programming Languages (1992), pp. 189–199.
[24] Wirth, N. Programming in Modula-2. Springer-Verlag, 1983.
[25] Wright, A., and Felleisen, M. A syntactic approach to type soundness. Tech. Rep. 160, Rice University, 1991. Information and Computation, volume 115(1), 1994, pp. 38–94.

Appendix

A Unitc Type Checking and Evaluation

For economy, we introduce the following unusual abbreviation, which summarizes the content of a signature with the indices used on names:

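\[
\mathrm{sig}[i, e, b] \;\equiv\; \mathbf{sig}\ \mathbf{import}\ \overline{t_i :: \kappa_i}\ \overline{x_i : \tau_i}\ \mathbf{export}\ \overline{t_e :: \kappa_e}\ \overline{x_e : \tau_e}\ \ \tau_b
\]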

To allow the use of specialized units in place of more general units, signatures have a subtype relation (see Figure A.1): a specific signature $\tau_s$ is a subtype of a more general signature $\tau_g$ if 1) $\tau_s$ has fewer imports and more exports, 2) the type of each imported name in $\tau_g$ is a subtype of the one in $\tau_s$, 3) the type of each exported name in $\tau_s$ is a subtype of the one in $\tau_g$, and 4) the type of the body expression in $\tau_s$ is a subtype of the body expression type in $\tau_g$.

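\[
\frac{\begin{array}{c}
\tau_{b1} \le \tau_{b2} \\
\overline{t_{i1} :: \kappa_{i1}} \subseteq \overline{t_{i2} :: \kappa_{i2}} \qquad
\overline{t_{e1} :: \kappa_{e1}} \supseteq \overline{t_{e2} :: \kappa_{e2}} \\
\forall\, x_{i1} : \tau_{i1} \in \overline{x_{i1} : \tau_{i1}},\ \exists\, x_{i1} : \tau_{i2} \in \overline{x_{i2} : \tau_{i2}}\ \mbox{s.t.}\ \tau_{i2} \le \tau_{i1} \\
\forall\, x_{e2} : \tau_{e2} \in \overline{x_{e2} : \tau_{e2}},\ \exists\, x_{e2} : \tau_{e1} \in \overline{x_{e1} : \tau_{e1}}\ \mbox{s.t.}\ \tau_{e1} \le \tau_{e2}
\end{array}}{\mathrm{sig}[i1, e1, b1] \le \mathrm{sig}[i2, e2, b2]}
\]

\[
\frac{\Gamma \vdash e : \tau' \qquad \tau' \le \tau}{\Gamma \vdash_s e : \tau}
\]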

Fig. A.1. Subtyping and subsumption in Unitc signatures
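As a small illustration of the "fewer imports, more exports" part of this relation, consider the following instance. It is only a sketch: the base type int and the names x, w, y, and z are hypothetical, and reflexivity of the subtype relation on int and on the body type is assumed.

\[
\mathbf{sig}\ \mathbf{import}\ x : \mathrm{int}\ \ \mathbf{export}\ y : \mathrm{int}\ \ z : \mathrm{int}\ \ \tau_b
\;\;\le\;\;
\mathbf{sig}\ \mathbf{import}\ x : \mathrm{int}\ \ w : \mathrm{int}\ \ \mathbf{export}\ y : \mathrm{int}\ \ \tau_b
\]

The left-hand signature imports a subset of the right-hand imports ($\{x\} \subseteq \{x, w\}$) and exports a superset of the right-hand exports ($\{y, z\} \supseteq \{y\}$). The types of the shared names and of the body are identical on both sides, so the remaining conditions hold trivially. A unit with the left-hand signature can therefore be used wherever a unit with the right-hand signature is expected: the extra import w is simply never consulted, and the extra export z is ignored.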

The typing rules for Unitc are shown in Figure A.2. These rules are typed extensions of the rules from Section 3.1.1. The special rule $\vdash_s$ is used when subsumption is allowed on an expression's type. Subsumption is used carefully so that type checking is deterministic. For example, full subsumption is not allowed in the expression $e_u$ for the invoke rule because the initialization expression type $\tau_b$ in $e_u$'s signature supplies the type of the entire invoke expression.

The reduction rules for Unitc in Figure A.3 resemble the reductions in Section 3.1.2. The only difference for Unitc is that the invoke and compound reductions propagate type definitions as well as val definitions.

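\[
\frac{\begin{array}{c}
\Gamma' = \Gamma, \overline{t_i :: \kappa_i}, \overline{t_e :: \kappa_e} \qquad FTV(\tau_b) \cap \overline{t_e} = \emptyset \\
\Gamma' \vdash \tau_i :: \Omega \qquad \Gamma' \vdash \tau_e :: \Omega \qquad \Gamma' \vdash \tau_b :: \Omega
\end{array}}{\Gamma \vdash \mathrm{sig}[i, e, b] :: \Omega}
\]

\[
\frac{\begin{array}{c}
\overline{t}, \overline{x}\ \mbox{distinct} \qquad \Gamma \vdash \tau_b :: \Omega \qquad \Gamma \vdash \sigma :: \kappa \qquad \Gamma \vdash_s e : \tau \\
\Gamma \vdash e_u : \mathrm{sig}[i, e, b] \qquad
\mathrm{sig}[i, e, b] \le \mathbf{sig}\ \mathbf{import}\ \overline{t :: \kappa}\ \overline{x : \tau}\ \mathbf{export}\ \ \tau_b
\end{array}}{\Gamma \vdash \mathbf{invoke}\ e_u\ \mathbf{with}\ \overline{t :: \kappa = \sigma}\ \overline{x : \tau = e} : \tau_b}
\]

\[
\frac{\begin{array}{c}
\overline{t_i}, t, \overline{x_i}, x_l, x_r, x_s, \overline{x}\ \mbox{distinct} \\
\overline{t_e :: \kappa_e} \subseteq \{ t :: \Omega \} \qquad
\overline{x_e : \tau_e} \subseteq \overline{x : \tau} \cup \{ x_l : \tau_l,\ x_r : \tau_r,\ x_s : \tau_s \} \\
\Gamma \vdash \mathrm{sig}[i, e, b] :: \Omega \qquad
\Gamma' = \Gamma, \overline{t_i :: \kappa_i}, t :: \Omega \\
\Gamma' \vdash \tau_l :: \Omega \qquad \Gamma' \vdash \tau_r :: \Omega \qquad \Gamma' \vdash \tau :: \Omega \\
\Gamma'' = \Gamma', \overline{x_i : \tau_i},\ x_l : \tau_l \to t,\ x_r : \tau_r \to t,\ x_s : t \to (\tau_l \to \tau) \to (\tau_r \to \tau) \to \tau,\ \overline{x : \tau} \\
\Gamma'' \vdash_s e : \tau \qquad \Gamma'' \vdash e_b : \tau_b
\end{array}}{\begin{array}{l}
\Gamma \vdash \mathbf{unit}\ \mathbf{import}\ \overline{t_i :: \kappa_i}\ \overline{x_i : \tau_i}\ \mathbf{export}\ \overline{t_e :: \kappa_e}\ \overline{x_e : \tau_e} \\
\qquad \mathbf{type}\ t = x_l\ \tau_l \mid x_r\ \tau_r \triangleright x_s\ \ \overline{\mathbf{val}\ x : \tau = e}\ \mathbf{in}\ e_b \ :\ \mathrm{sig}[i, e, b]
\end{array}}
\]

\[
\frac{\begin{array}{c}
\overline{t_i}, \overline{t_{p1}}, \overline{t_{p2}}, \overline{x_i}, \overline{x_{p1}}, \overline{x_{p2}}\ \mbox{distinct} \qquad \overline{t_e}, \overline{x_e}\ \mbox{distinct} \\
\overline{t_{w1} :: \kappa_{w1}} \subseteq \overline{t_i :: \kappa_i} \cup \overline{t_{p2} :: \kappa_{p2}} \qquad
\overline{x_{w1} : \tau_{w1}} \subseteq \overline{x_i : \tau_i} \cup \overline{x_{p2} : \tau_{p2}} \\
\overline{t_{w2} :: \kappa_{w2}} \subseteq \overline{t_i :: \kappa_i} \cup \overline{t_{p1} :: \kappa_{p1}} \qquad
\overline{x_{w2} : \tau_{w2}} \subseteq \overline{x_i : \tau_i} \cup \overline{x_{p1} : \tau_{p1}} \\
\overline{t_e :: \kappa_e} \subseteq \overline{t_{p1} :: \kappa_{p1}} \cup \overline{t_{p2} :: \kappa_{p2}} \qquad
\overline{x_e : \tau_e} \subseteq \overline{x_{p1} : \tau_{p1}} \cup \overline{x_{p2} : \tau_{p2}} \\
\Gamma \vdash \mathrm{sig}[i, e, b2] :: \Omega \qquad
\Gamma \vdash \mathrm{sig}[w1, p1, b1] :: \Omega \qquad
\Gamma \vdash \mathrm{sig}[w2, p2, b2] :: \Omega \\
\Gamma \vdash e_1 : \mathrm{sig}[i1, e1, b1] \qquad \Gamma \vdash e_2 : \mathrm{sig}[i2, e2, b2] \\
\mathrm{sig}[i1, e1, b1] \le \mathrm{sig}[w1, p1, b1] \qquad \mathrm{sig}[i2, e2, b2] \le \mathrm{sig}[w2, p2, b2]
\end{array}}{\begin{array}{l}
\Gamma \vdash \mathbf{compound}\ \mathbf{import}\ \overline{t_i :: \kappa_i}\ \overline{x_i : \tau_i}\ \mathbf{export}\ \overline{t_e :: \kappa_e}\ \overline{x_e : \tau_e} \\
\qquad \mathbf{link}\ e_1\ \mathbf{with}\ \overline{t_{w1} :: \kappa_{w1}}\ \overline{x_{w1} : \tau_{w1}}\ \mathbf{provides}\ \overline{t_{p1} :: \kappa_{p1}}\ \overline{x_{p1} : \tau_{p1}} \\
\qquad \mathbf{and}\ e_2\ \mathbf{with}\ \overline{t_{w2} :: \kappa_{w2}}\ \overline{x_{w2} : \tau_{w2}}\ \mathbf{provides}\ \overline{t_{p2} :: \kappa_{p2}}\ \overline{x_{p2} : \tau_{p2}} \ :\ \mathrm{sig}[i, e, b2]
\end{array}}
\]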

Fig. A.2. Type checking for Unitc

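\[
\begin{array}{l}
\mathbf{invoke}\ (\mathbf{unit}\ \mathbf{import}\ \overline{t_i :: \kappa_i}\ \overline{x_i : \tau_i}\ \mathbf{export}\ \overline{t_e :: \kappa_e}\ \overline{x_e : \tau_e} \\
\qquad\quad \overline{\mathbf{type}\ t = x_l\ \tau_l \mid x_r\ \tau_r \triangleright x_s}\ \ \overline{\mathbf{val}\ x : \tau = e}\ \mathbf{in}\ e_b) \\
\quad \mathbf{with}\ \overline{t_w :: \kappa_w = \sigma_w}\ \overline{x_w : \tau_w = v_w} \\
\longrightarrow\ [\overline{\sigma_w / t_w}, \overline{v_w / x_w}](\mathbf{letrec}\ \overline{\mathbf{type}\ t = x_l\ \tau_l \mid x_r\ \tau_r \triangleright x_s}\ \ \overline{\mathbf{val}\ x : \tau = e}\ \mathbf{in}\ e_b)
\end{array}
\]

\[
\begin{array}{l}
\mathbf{compound}\ \mathbf{import}\ \overline{t_i :: \kappa_i}\ \overline{x_i : \tau_i}\ \mathbf{export}\ \overline{t_e :: \kappa_e}\ \overline{x_e : \tau_e} \\
\quad \mathbf{link}\ (\mathbf{unit}\ \mathbf{import}\ \overline{t_{i1} :: \kappa_{i1}}\ \overline{x_{i1} : \tau_{i1}}\ \mathbf{export}\ \overline{t_{e1} :: \kappa_{e1}}\ \overline{x_{e1} : \tau_{e1}} \\
\qquad\qquad \overline{\mathbf{type}\ t_1 = x_{l1}\ \tau_{l1} \mid x_{r1}\ \tau_{r1} \triangleright x_{s1}}\ \ \overline{\mathbf{val}\ x_1 : \tau_1 = e_1}\ \mathbf{in}\ e_{b1}) \\
\qquad \mathbf{with}\ \overline{t_{w1} :: \kappa_{w1}}\ \overline{x_{w1} : \tau_{w1}}\ \mathbf{provides}\ \overline{t_{p1} :: \kappa_{p1}}\ \overline{x_{p1} : \tau_{p1}} \\
\quad \mathbf{and}\ (\mathbf{unit}\ \mathbf{import}\ \overline{t_{i2} :: \kappa_{i2}}\ \overline{x_{i2} : \tau_{i2}}\ \mathbf{export}\ \overline{t_{e2} :: \kappa_{e2}}\ \overline{x_{e2} : \tau_{e2}} \\
\qquad\qquad \overline{\mathbf{type}\ t_2 = x_{l2}\ \tau_{l2} \mid x_{r2}\ \tau_{r2} \triangleright x_{s2}}\ \ \overline{\mathbf{val}\ x_2 : \tau_2 = e_2}\ \mathbf{in}\ e_{b2}) \\
\qquad \mathbf{with}\ \overline{t_{w2} :: \kappa_{w2}}\ \overline{x_{w2} : \tau_{w2}}\ \mathbf{provides}\ \overline{t_{p2} :: \kappa_{p2}}\ \overline{x_{p2} : \tau_{p2}} \\
\longrightarrow\ \mathbf{unit}\ \mathbf{import}\ \overline{t_i :: \kappa_i}\ \overline{x_i : \tau_i}\ \mathbf{export}\ \overline{t_e :: \kappa_e}\ \overline{x_e : \tau_e} \\
\qquad\quad \overline{\mathbf{type}\ t_1 = x_{l1}\ \tau_{l1} \mid x_{r1}\ \tau_{r1} \triangleright x_{s1}}\ \ \overline{\mathbf{type}\ t_2 = x_{l2}\ \tau_{l2} \mid x_{r2}\ \tau_{r2} \triangleright x_{s2}} \\
\qquad\quad \overline{\mathbf{val}\ x_1 : \tau_1 = e_1}\ \ \overline{\mathbf{val}\ x_2 : \tau_2 = e_2}\ \mathbf{in}\ e_{b1};\ e_{b2} \\
\mbox{if } \overline{t_1}, \overline{t_2}, \overline{t_i}, \overline{x_1}, \overline{x_2}, \overline{x_i}\ \mbox{distinct}
\end{array}
\]

Fig. A.3. Reduction rules for Unitc

B Unite Type Checking and Evaluation

The following abbreviation expresses a Unite signature:

\[
\mathrm{sig}[i, e, di, de, b] \;\equiv\; \mathbf{sig}\ \mathbf{import}\ \overline{t_i :: \kappa_i}\ \overline{x_i : \tau_i}\ \mathbf{export}\ \overline{t_e :: \kappa_e}\ \overline{x_e : \tau_e}\ \mathbf{depends}\ \overline{t_{de} \leadsto t_{di}}\ \ \tau_b
\]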

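The subtyping rule in Figure B.1 accounts for the new dependency declarations by requiring that a specific signature declares more dependencies than a more general signature.

\[
\frac{\begin{array}{c}
\tau_{b1} \le \tau_{b2} \\
\overline{t_{i1} :: \kappa_{i1}} \subseteq \overline{t_{i2} :: \kappa_{i2}} \qquad
\overline{t_{e1} :: \kappa_{e1}} \supseteq \overline{t_{e2} :: \kappa_{e2}} \qquad
\overline{t_{de1} \leadsto t_{di1}} \supseteq \overline{t_{de2} \leadsto t_{di2}} \\
\forall\, x_{i1} : \tau_{i1} \in \overline{x_{i1} : \tau_{i1}},\ \exists\, x_{i1} : \tau_{i2} \in \overline{x_{i2} : \tau_{i2}}\ \mbox{s.t.}\ \tau_{i2} \le \tau_{i1} \\
\forall\, x_{e2} : \tau_{e2} \in \overline{x_{e2} : \tau_{e2}},\ \exists\, x_{e2} : \tau_{e1} \in \overline{x_{e1} : \tau_{e1}}\ \mbox{s.t.}\ \tau_{e1} \le \tau_{e2}
\end{array}}{\mathrm{sig}[i1, e1, di1, de1, b1] \le \mathrm{sig}[i2, e2, di2, de2, b2]}
\]

Fig. B.1. Subtyping and subsumption in Unite signatures

The type checking rules for Unite are defined in Figure B.2. To calculate type equation dependencies for the signature of a simple unit, the type checking rules rely on the $\propto_D$ relation, which associates a type expression with each of the type variables it references from the set of type equations $D$:

\[
\tau \propto_D t \quad \mbox{iff} \quad t \in FTV(\tau)\ \ \mbox{or}\ \ (\exists \langle t' = \tau' \rangle \in D\ \mbox{s.t.}\ t' \in FTV(\tau)\ \mbox{and}\ \tau' \propto_D t)
\]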

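$FTV(\tau)$ denotes the set of type variables in $\tau$ that are not bound by the import or export clause of a sig type. Types in a set of type equations $D$ can be eliminated from a type expression with the $|\cdot|_D$ operator, as follows:

\[
|\tau|_D = \left\{ \begin{array}{ll}
t & \mbox{if } \tau = t \mbox{ and } t \notin D \\[4pt]
|\tau'|_D & \mbox{if } \tau = t \mbox{ and } \langle t = \tau' \rangle \in D \\[4pt]
|\tau'|_D \to |\tau''|_D & \mbox{if } \tau = \tau' \to \tau'' \\[4pt]
\begin{array}{l}
\mathbf{sig}\ \mathbf{import}\ \overline{t_i :: \kappa_i}\ \overline{x_i : |\tau_i|_{D'}} \\
\quad \mathbf{export}\ \overline{t_e :: \kappa_e}\ \overline{x_e : |\tau_e|_{D'}} \\
\quad \mathbf{depends}\ \overline{t_{de} \leadsto t_{di}} \\
\quad |\tau_b|_{D'}
\end{array}
&
\begin{array}{l}
\mbox{if } \tau = \mathrm{sig}[i, e, di, de, b] \\
\mbox{and } D' = \{ \langle t = \tau' \rangle \mid \langle t = \tau' \rangle \in D \mbox{ and } t \notin \overline{t_i} \cup \overline{t_e} \}
\end{array}
\end{array} \right.
\]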

and similarly expanded in a value expression, sketched as follows:

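\[
|e|_D = \left\{ \begin{array}{ll}
x & \mbox{if } e = x \\[4pt]
\begin{array}{l}
\mathbf{unit}\ \mathbf{import}\ \overline{t_i :: \kappa_i}\ \overline{x_i : \tau_i} \\
\quad \mathbf{export}\ \overline{t_e :: \kappa_e}\ \overline{x_e : \tau_e} \\
\quad \overline{\mathbf{type}\ t_a :: \kappa_a = |\tau_a|_{D'}} \\
\quad \overline{\mathbf{type}\ t = x_l\ |\tau_l|_{D'} \mid x_r\ |\tau_r|_{D'} \triangleright x_s} \\
\quad \overline{\mathbf{val}\ x : |\tau|_{D'} = |e|_{D'}}\ \mathbf{in}\ |e_b|_{D'}
\end{array}
&
\begin{array}{l}
\mbox{if } e = \mathbf{unit}\ \mathbf{import}\ \overline{t_i :: \kappa_i}\ \overline{x_i : \tau_i} \\
\qquad\quad \mathbf{export}\ \overline{t_e :: \kappa_e}\ \overline{x_e : \tau_e} \\
\qquad\quad \overline{\mathbf{type}\ t_a :: \kappa_a = \tau_a} \\
\qquad\quad \overline{\mathbf{type}\ t = x_l\ \tau_l \mid x_r\ \tau_r \triangleright x_s} \\
\qquad\quad \overline{\mathbf{val}\ x : \tau = e}\ \mathbf{in}\ e_b \\
\mbox{and } D' = \{ \langle t = \tau' \rangle \mid \langle t = \tau' \rangle \in D \mbox{ and } t \notin \overline{t_i} \cup \overline{t_e} \cup \overline{t_a} \cup \overline{t} \}
\end{array} \\[4pt]
\vdots &
\end{array} \right.
\]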
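The dependency relation and the expansion operator defined above can be sketched in Scheme for a simplified type language. The representation is an assumption made only for illustration: a type is either a symbol (a type variable) or a list (-> t1 t2) for a function type, sig types are omitted, and the set of equations D is an association list that is assumed to be acyclic. The names ftv, references?, and expand are hypothetical.

```scheme
;; Free type variables of a simplified type (sig types are not modeled).
(define (ftv tau)
  (cond ((symbol? tau) (list tau))
        ((and (pair? tau) (eq? (car tau) '->))
         (append (ftv (cadr tau)) (ftv (caddr tau))))
        (else '())))

;; Dependency relation: tau references t directly, or through an
;; equation in D whose left-hand side occurs in tau.
;; D is an association list ((t . tau) ...), assumed acyclic.
(define (references? tau t D)
  (let ((vars (ftv tau)))
    (or (if (memq t vars) #t #f)
        (let loop ((vs vars))
          (cond ((null? vs) #f)
                ((let ((eqn (assq (car vs) D)))
                   (and eqn (references? (cdr eqn) t D)))
                 #t)
                (else (loop (cdr vs))))))))

;; Expansion: replace every variable defined in D by its expanded
;; right-hand side, leaving all other variables untouched.
(define (expand tau D)
  (cond ((symbol? tau)
         (let ((eqn (assq tau D)))
           (if eqn (expand (cdr eqn) D) tau)))
        ((and (pair? tau) (eq? (car tau) '->))
         (list '-> (expand (cadr tau) D) (expand (caddr tau) D)))
        (else tau)))

;; Example equations: t1 = t2 -> t3 and t2 = t4.
(define example-D '((t1 . (-> t2 t3)) (t2 . t4)))

(references? '(-> t1 t5) 't4 example-D) ; #t, via t1 and then t2
(expand '(-> t1 t5) example-D)          ; (-> (-> t4 t3) t5)
```

The acyclicity assumption matters: both procedures follow equations in D recursively, so a cyclic set of equations would make them loop forever.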

The subscript $D$ is left off of $|\cdot|$ when $D$ is clear from context.

The Unite reductions in Figure B.3 differ only slightly from the Unitc reductions. Type abbreviations are immediately expanded away in the invoke

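\[
\frac{\begin{array}{c}
\Gamma' = \Gamma, \overline{t_i :: \kappa_i}, \overline{t_e :: \kappa_e} \qquad FTV(\tau_b) \cap \overline{t_e} = \emptyset \qquad
\overline{t_{de}} \subseteq \overline{t_e} \qquad \overline{t_{di}} \subseteq \overline{t_i} \\
\Gamma' \vdash \tau_i :: \Omega \qquad \Gamma' \vdash \tau_e :: \Omega \qquad \Gamma' \vdash \tau_b :: \Omega
\end{array}}{\Gamma \vdash \mathrm{sig}[i, e, di, de, b] :: \Omega}
\]

\[
\frac{\begin{array}{c}
\overline{t_i}, t, \overline{x_i}, x_l, x_r, x_s, \overline{x}\ \mbox{distinct} \\
\overline{t_e :: \kappa_e} \subseteq \overline{t_a :: \kappa_a} \cup \{ t :: \Omega \} \qquad
\overline{x_e : \tau_e} \subseteq \overline{x : \tau} \cup \{ x_l : \tau_l,\ x_r : \tau_r,\ x_s : \tau_s \} \\
D = \overline{\langle t_a = \tau_a \rangle} \qquad
\tau_a \propto_D t'_a \Rightarrow \tau'_a \not\propto_D t_a\ \ \mbox{for}\ \langle t_a = \tau_a \rangle, \langle t'_a = \tau'_a \rangle \in D \\
\overline{t_{de} \leadsto t_{di}} = \{ t_a \leadsto t_i \mid \langle t_a = \tau_a \rangle \in D\ \mbox{and}\ t_i \in \overline{t_i}\ \mbox{and}\ t_a \in \overline{t_e}\ \mbox{and}\ \tau_a \propto_D t_i \} \\
\Gamma \vdash \mathrm{sig}[i, e, di, de, b] :: \Omega \qquad
\Gamma' = \Gamma, \overline{t_i :: \kappa_i}, t :: \Omega \qquad
\Gamma'_a = \Gamma', \overline{t_a :: \kappa_a} \qquad \Gamma'_a \vdash \tau_a :: \kappa_a \\
\Gamma' \vdash |\tau_l| :: \Omega \qquad \Gamma' \vdash |\tau_r| :: \Omega \qquad \Gamma' \vdash |\tau| :: \Omega \\
\Gamma'' = \Gamma', \overline{x_i : |\tau_i|},\ x_l : |\tau_l| \to t,\ x_r : |\tau_r| \to t,\ x_s : t \to (|\tau_l| \to \tau) \to (|\tau_r| \to \tau) \to \tau,\ \overline{x : |\tau|} \\
\Gamma'' \vdash_s |e| : |\tau| \qquad \Gamma'' \vdash |e_b| : \tau_b
\end{array}}{\begin{array}{l}
\Gamma \vdash \mathbf{unit}\ \mathbf{import}\ \overline{t_i :: \kappa_i}\ \overline{x_i : \tau_i}\ \mathbf{export}\ \overline{t_e :: \kappa_e}\ \overline{x_e : \tau_e} \\
\qquad \overline{\mathbf{type}\ t_a :: \kappa_a = \tau_a}\ \ \mathbf{type}\ t = x_l\ \tau_l \mid x_r\ \tau_r \triangleright x_s\ \ \overline{\mathbf{val}\ x : \tau = e}\ \mathbf{in}\ e_b \ :\ \mathrm{sig}[i, e, di, de, b]
\end{array}}
\]

\[
\frac{\begin{array}{c}
\overline{t_i}, \overline{t_{p1}}, \overline{t_{p2}}, \overline{x_i}, \overline{x_{p1}}, \overline{x_{p2}}\ \mbox{distinct} \qquad \overline{t_e}, \overline{x_e}\ \mbox{distinct} \\
\overline{t_{w1} :: \kappa_{w1}} \subseteq \overline{t_i :: \kappa_i} \cup \overline{t_{p2} :: \kappa_{p2}} \qquad
\overline{x_{w1} : \tau_{w1}} \subseteq \overline{x_i : \tau_i} \cup \overline{x_{p2} : \tau_{p2}} \\
\overline{t_{w2} :: \kappa_{w2}} \subseteq \overline{t_i :: \kappa_i} \cup \overline{t_{p1} :: \kappa_{p1}} \qquad
\overline{x_{w2} : \tau_{w2}} \subseteq \overline{x_i : \tau_i} \cup \overline{x_{p1} : \tau_{p1}} \\
\overline{t_e :: \kappa_e} \subseteq \overline{t_{p1} :: \kappa_{p1}} \cup \overline{t_{p2} :: \kappa_{p2}} \qquad
\overline{x_e : \tau_e} \subseteq \overline{x_{p1} : \tau_{p1}} \cup \overline{x_{p2} : \tau_{p2}} \\
\Gamma \vdash \mathrm{sig}[i, e, di, de, b2] :: \Omega \qquad
\Gamma \vdash \mathrm{sig}[w1, p1, di1, de1, b1] :: \Omega \qquad
\Gamma \vdash \mathrm{sig}[w2, p2, di2, de2, b2] :: \Omega \\
\Gamma \vdash e_1 : \mathrm{sig}[i1, e1, di1, de1, b1] \qquad \Gamma \vdash e_2 : \mathrm{sig}[i2, e2, di2, de2, b2] \\
\mathrm{sig}[i1, e1, di1, de1, b1] \le \mathrm{sig}[w1, p1, di1, de1, b1] \qquad
\mathrm{sig}[i2, e2, di2, de2, b2] \le \mathrm{sig}[w2, p2, di2, de2, b2] \\
\overline{\langle t_{di1}, t_{de1} \rangle} \cap \overline{\langle t_{de2}, t_{di2} \rangle} = \emptyset \\
\overline{t_{de} \leadsto t_{di}} = \{ t_e \leadsto t_i \mid t_i \in \overline{t_i}\ \mbox{and}\ t_e \in \overline{t_e}\ \mbox{and}\ t_e \leadsto t_i \in \overline{t_{de1} \leadsto t_{di1}} \cup \overline{t_{de2} \leadsto t_{di2}} \}
\end{array}}{\begin{array}{l}
\Gamma \vdash \mathbf{compound}\ \mathbf{import}\ \overline{t_i :: \kappa_i}\ \overline{x_i : \tau_i}\ \mathbf{export}\ \overline{t_e :: \kappa_e}\ \overline{x_e : \tau_e} \\
\qquad \mathbf{link}\ e_1\ \mathbf{with}\ \overline{t_{w1} :: \kappa_{w1}}\ \overline{x_{w1} : \tau_{w1}}\ \mathbf{provides}\ \overline{t_{p1} :: \kappa_{p1}}\ \overline{x_{p1} : \tau_{p1}} \\
\qquad \mathbf{and}\ e_2\ \mathbf{with}\ \overline{t_{w2} :: \kappa_{w2}}\ \overline{x_{w2} : \tau_{w2}}\ \mathbf{provides}\ \overline{t_{p2} :: \kappa_{p2}}\ \overline{x_{p2} : \tau_{p2}} \ :\ \mathrm{sig}[i, e, di, de, b2]
\end{array}}
\]

Fig. B.2. Type checking for Unite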

reduction, and the compound reduction preserves and merges type equations when linking.

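\[
\begin{array}{l}
\mathbf{invoke}\ (\mathbf{unit}\ \mathbf{import}\ \overline{t_i :: \kappa_i}\ \overline{x_i : \tau_i}\ \mathbf{export}\ \overline{t_e :: \kappa_e}\ \overline{x_e : \tau_e} \\
\qquad\quad \overline{\mathbf{type}\ t_a :: \kappa_a = \tau_a}\ \ \overline{\mathbf{type}\ t = x_l\ \tau_l \mid x_r\ \tau_r \triangleright x_s}\ \ \overline{\mathbf{val}\ x : \tau = e}\ \mathbf{in}\ e_b) \\
\quad \mathbf{with}\ \overline{t_w :: \kappa_w = \sigma_w}\ \overline{x_w : \tau_w = v_w} \\
\longrightarrow\ [\overline{\sigma_w / t_w}, \overline{v_w / x_w}](\mathbf{letrec}\ \overline{\mathbf{type}\ t = x_l\ |\tau_l| \mid x_r\ |\tau_r| \triangleright x_s}\ \ \overline{\mathbf{val}\ x : |\tau| = |e|}\ \mathbf{in}\ |e_b|) \\
\mbox{where } D = \overline{\langle t_a = \tau_a \rangle}
\end{array}
\]

\[
\begin{array}{l}
\mathbf{compound}\ \mathbf{import}\ \overline{t_i :: \kappa_i}\ \overline{x_i : \tau_i}\ \mathbf{export}\ \overline{t_e :: \kappa_e}\ \overline{x_e : \tau_e} \\
\quad \mathbf{link}\ (\mathbf{unit}\ \mathbf{import}\ \overline{t_{i1} :: \kappa_{i1}}\ \overline{x_{i1} : \tau_{i1}}\ \mathbf{export}\ \overline{t_{e1} :: \kappa_{e1}}\ \overline{x_{e1} : \tau_{e1}} \\
\qquad\qquad \overline{\mathbf{type}\ t_{a1} :: \kappa_{a1} = \tau_{a1}}\ \ \overline{\mathbf{type}\ t_1 = x_{l1}\ \tau_{l1} \mid x_{r1}\ \tau_{r1} \triangleright x_{s1}}\ \ \overline{\mathbf{val}\ x_1 : \tau_1 = e_1}\ \mathbf{in}\ e_{b1}) \\
\qquad \mathbf{with}\ \overline{t_{w1} :: \kappa_{w1}}\ \overline{x_{w1} : \tau_{w1}}\ \mathbf{provides}\ \overline{t_{p1} :: \kappa_{p1}}\ \overline{x_{p1} : \tau_{p1}} \\
\quad \mathbf{and}\ (\mathbf{unit}\ \mathbf{import}\ \overline{t_{i2} :: \kappa_{i2}}\ \overline{x_{i2} : \tau_{i2}}\ \mathbf{export}\ \overline{t_{e2} :: \kappa_{e2}}\ \overline{x_{e2} : \tau_{e2}} \\
\qquad\qquad \overline{\mathbf{type}\ t_{a2} :: \kappa_{a2} = \tau_{a2}}\ \ \overline{\mathbf{type}\ t_2 = x_{l2}\ \tau_{l2} \mid x_{r2}\ \tau_{r2} \triangleright x_{s2}}\ \ \overline{\mathbf{val}\ x_2 : \tau_2 = e_2}\ \mathbf{in}\ e_{b2}) \\
\qquad \mathbf{with}\ \overline{t_{w2} :: \kappa_{w2}}\ \overline{x_{w2} : \tau_{w2}}\ \mathbf{provides}\ \overline{t_{p2} :: \kappa_{p2}}\ \overline{x_{p2} : \tau_{p2}} \\
\longrightarrow\ \mathbf{unit}\ \mathbf{import}\ \overline{t_i :: \kappa_i}\ \overline{x_i : \tau_i}\ \mathbf{export}\ \overline{t_e :: \kappa_e}\ \overline{x_e : \tau_e} \\
\qquad\quad \overline{\mathbf{type}\ t_{a1} :: \kappa_{a1} = \tau_{a1}}\ \ \overline{\mathbf{type}\ t_{a2} :: \kappa_{a2} = \tau_{a2}} \\
\qquad\quad \overline{\mathbf{type}\ t_1 = x_{l1}\ \tau_{l1} \mid x_{r1}\ \tau_{r1} \triangleright x_{s1}}\ \ \overline{\mathbf{type}\ t_2 = x_{l2}\ \tau_{l2} \mid x_{r2}\ \tau_{r2} \triangleright x_{s2}} \\
\qquad\quad \overline{\mathbf{val}\ x_1 : \tau_1 = e_1}\ \ \overline{\mathbf{val}\ x_2 : \tau_2 = e_2}\ \mathbf{in}\ e_{b1};\ e_{b2} \\
\mbox{if } \overline{t_{a1}}, \overline{t_{a2}}, \overline{t_1}, \overline{t_2}, \overline{x_1}, \overline{x_2}\ \mbox{distinct}
\end{array}
\]

Fig. B.3. Reduction rules for Unite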
