Symbol Binding and Resolution in Programming Languages

David Lomet
Digital Equipment Corporation
Cambridge Research Lab
One Kendall Square, Bldg. 700
Cambridge, MA 02139

November 15, 2002
Abstract

We propose a new explanation for the "binding" process in programming languages. Current explanations [6] rely on a run-time search of an environment that consists of <symbol, value> pairs. The nature of this search and the ordering of the pairs are the critical factors in determining the meanings and scopes of the symbols. Instead of this, we propose explicit operations to perform "binding". Further, these operations are frequently performed at compile time, thus capturing the notion of early binding that characterizes most languages. The sequence in which these operations are performed replaces the previously mentioned searching as the determiner of the meanings and scopes of symbols.
1 Introduction
Current programming languages, and the operating systems in which they are embedded, provide many forms of binding. Intuitively, binding is a process by which a symbolic name becomes associated with a value. This binding encompasses the normal declaration/use forms that are present in the programming language itself. Thus, declarations of local variables, own variables, heap variables, external variables, and files are all examples in current languages of the binding process. In addition to these, however, there are a number of operations that are important to the meaning of programming language symbols but that are traditionally outside the province of programming languages. These are vital to an understanding
of programs. These "binding" operations include those performed by the linkage editor, in which external names are given a definition; those performed by a command or shell language, in which persistent objects, e.g. files, are linked to names declared within a program; and, finally, those performed by the compiler itself. The purpose of this brief and incomplete survey in this introduction is to emphasize that the binding process is fundamental, diverse, and pervasive. It is also frequently outside of the domain of programs written in conventional programming languages. What we attempt to do is to provide a framework in which the complexities recounted above can be more clearly understood. This will pave the way for the inclusion of explicit binding operations into high level programming languages.

We begin by examining binding as it exists in mathematics. Consider the notation for summing a sequence of integers:
     25
     Σ  i
    i=1
The term being summed, i.e. i, is a free variable, considered in isolation. Its meaning is as yet unspecified and its symbolic name will be used to define it. The summation notation serves to bind i. What we mean by this is that the symbolic name of the term becomes irrelevant when the entire summation expression is considered. The use of the bound symbol i as the term being summed functions as a pure placeholder. The i in the summation notation is the definition (or declaration) for, and hence the binding occurrence for, i. Systematically changing the names of both the declaration and all uses of the variable does not change the meaning of the summation. If j were substituted for i throughout the summation, the meaning of the summation would remain exactly the same.

From a programming viewpoint, there is another aspect to the previous expression. If the summation expression is to be executed so as to calculate the sum, then during this execution, the variable denoted by i must be successively replaced by the integers from one to twenty-five. We call this process symbol resolution. When the summation program is executing, no variables are present. Rather, during each iteration of the sum, a different integer takes the place of i.

Let us explore another example. Consider a term in the lambda calculus, e.g. t(x), that includes x as a free symbol. When we precede this term with a lambda variable list that contains the symbol x, thus creating the formula
λ(x) t(x)
we bind the free x in t(x) by means of the occurrence of x in the lambda variable list. This has been explained as follows: When λ(x) t(x) is encountered in a
position in which it is to be applied to an argument, the existing environment of <symbol, value> pairs is extended by adding the pair <x, arg> to it, where "arg" denotes the evaluated argument. All symbols encountered in the evaluation of the body t(x) are given meaning by some form of search of this environment. Thus, with this explanation, a runtime search is always required at every occurrence of every symbol.

We propose a new explanation as follows. When the expression λ(x) t(x) is formed, the free occurrences of x in t(x) are bound. That is, they are replaced by anonymous placeholders and the symbolic nature of x completely disappears during the construction of this lambda expression. We say that the free symbol x has been bound. Note that an argument has not yet been specified for it. This argument specification, i.e. the association of parameter to argument, is again a symbol resolution process and is performed when the lambda expression is finally applied. The bound variable is replaced at this time by the result of evaluating the argument.

In this view, then, runtime searching of an environment is not required in order to bind symbols. The binding process has been completed before the body of the lambda expression is evaluated (executed). Binding, then, usually occurs at program construction (compilation) time, and resolution sometimes occurs then also. Thus, the notion that most programming languages perform early binding is captured by our explanation. Our view is that this early binding is intrinsic. Thus, we view programs written in a typical programming language as the specification for computing, via a series of binding and resolution operations, a program which is to be executed. A user thus specifies both the program construction process and the algorithm which is to solve his problem when he writes a program using concrete syntax.
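To make this concrete, the following Python sketch (our own illustrative model; none of these names come from any real system) shows binding as substitution: constructing a lambda replaces the named symbol with an anonymous placeholder, and apply later resolves that placeholder to the argument, with no environment search at use sites.

```python
# Illustrative sketch: binding as substitution, not environment search.
# Terms are nested tuples; a free symbol is a string, a bound symbol is
# a Placeholder instance whose name has been discarded.

class Placeholder:
    """An anonymous stand-in left behind once a symbol is bound."""
    pass

def bind(term, name, slot):
    """Replace every free occurrence of `name` in `term` with `slot`."""
    if term == name:
        return slot
    if isinstance(term, tuple):
        return tuple(bind(t, name, slot) for t in term)
    return term

def lam(name, body):
    """Build a lambda: the symbolic name disappears at construction time."""
    slot = Placeholder()
    return ('lambda', slot, bind(body, name, slot))

def apply_(fn, arg):
    """Resolution: substitute the argument value for the placeholder."""
    _, slot, body = fn
    def subst(t):
        if t is slot:
            return arg
        if isinstance(t, tuple):
            return tuple(subst(s) for s in t)
        return t
    return subst(body)

# λ(x) add(x, 1) applied to 41: the name 'x' plays no role after binding.
f = lam('x', ('add', 'x', 1))
print(apply_(f, 41))   # ('add', 41, 1)
```

Renaming the bound variable changes nothing: `lam('y', ('add', 'y', 1))` applied to 41 yields the same term, exactly as the paper's alpha-renaming argument for the summation predicts.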
We have thus distinguished two kinds of processes that are involved in the intuitive notion of "binding", which we have called binding and resolution. In the remainder of the paper, we examine several forms of binding in programming languages in the light of this distinction. Binding and resolution operations are introduced that perform these processes by explicit direction. It is the sequence in which these operations are executed that serves to control the meaning and scope of variables. Thus, an inner procedure, one lexically nested within an outer procedure, must be subjected to both its own private binding operations and, in addition, the binding operations of the outer procedure. This sequence of operations replaces the usual runtime environment search in other descriptions of binding in programming languages. Rather than binding from the outside-in by extending environments, we will be binding from the inside-out by converting free symbols into bound ones in the innermost scopes first, and then proceeding outward. Outside-in binding requires
an environment to be present that precisely specifies what a symbol means, and hence can only be performed at the last possible moment. Inside-out binding only requires that we identify which symbols are no longer free. Even when we resolve the symbols by specifying a method by which a value will be generated that is to replace them, this does not require that the replacement be made at this time. Thus, we explicitly provide for the precomputation of a large part of the replacement process, and therefore can describe, via explicit operations, the compilation process itself, not in terms of generating assembly code, but rather as a process in which the program is constructed via symbol binding and resolution operations.

In the following sections, we introduce operations that are used in symbol binding and resolution. These operations describe the program construction process, which is not the usual realm for operations of the programming language itself. Hence, these operators can be interpreted as meta-language operators. That is, they are helpful in describing the meaning of programs but are not to be considered as part of the programming language itself. Our long range desire, however, is to extend the definition of programming languages to include program construction. Thus, we would like to eventually include these operators as part of the programming language. As will become clear in the following sections, this extension will require that program fragments, types, and symbols all become values in the programming language.
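A small sketch of the inside-out discipline, in illustrative Python (the term representation and all names are our own assumptions): each scope, innermost first, removes its own symbols from the free set; no enclosing environment is consulted, only the knowledge of which symbols are no longer free.

```python
# Sketch of inside-out binding: a scope node binds its own names, and
# computing what remains free requires no outer environment at all.

def free_symbols(term, bound=frozenset()):
    """Free (still-unbound) symbols of a nested-tuple term."""
    if isinstance(term, str):
        return set() if term in bound else {term}
    if isinstance(term, tuple) and term and term[0] == 'scope':
        _, names, body = term
        return free_symbols(body, bound | set(names))
    if isinstance(term, tuple):
        out = set()
        for t in term[1:]:   # term[0] is an operator tag, not a symbol
            out |= free_symbols(t, bound)
        return out
    return set()

# The inner scope binds x, the outer scope then binds y; z stays free
# and would be handled by a still-later binding operation.
prog = ('scope', ('y',), ('scope', ('x',), ('add', 'x', ('mul', 'y', 'z'))))
print(free_symbols(prog))   # {'z'}
```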
2 Symbols
In describing binding and resolution operations, it is necessary to manipulate values that are or contain symbolic quantities that have yet to be bound or resolved. For this reason, (free) symbols are introduced. Such symbols will, when bound and subsequently resolved, be transformed into other values. However, a symbol does not itself designate some other value but rather is a placeholder for it. A name is an important component of a symbol, as it is by means of its name that its occurrences are bound by binding operations. We call a symbol's name an identifier. Identifiers in (free) symbols are matched with identifiers in binding specifications (declarations) in order to designate which symbols are bound by the declaration. It is useful for a symbol to have a second component, a type. This permits type information to be available in program construction at a very early stage. Type checking can then be performed on program fragments prior to the time that the symbol is bound by a declaration, and this paves the way for program construction using binding and resolution operations on the fragment.
Hence, we denote symbols by pairs <identifier, type>. Associating types with symbols in the binding process has been suggested in [1]. This two-part conception of a symbol need not imply anything about the concrete syntax of a programming language. Clearly, declarations can still be employed to associate an identifier with a type to form a symbol. But everywhere that the identifier (symbol) occurs, it should be regarded as carrying this type information. ALGOL 68 provides an example of this. In ALGOL 68, declarations consist, in the strict language, of a left-hand side (the formal declarer) and a right-hand side (either a value or a generator of a value). Occurrences of identifiers in ALGOL 68, in our view, are regarded as carrying the formal declarer so that all uses of the identifiers can be type checked independently of the remainder of the program. To pursue our agenda of being able to construct programs incrementally with explicit operators, an operator must be provided that creates a symbol. This is provided via the sym operator as follows:
sym(identifier, type) → symbol

The identifier is typically some prescribed form of an ordinary character string. The type operand is just that, a type. The type restricts the bindings that are permitted for the symbol to declarations that provide a matching type. This treats types as first class values. It is hard to see how one can construct programs in this incremental fashion without first class types. Again, one can view what we write here as meta-language, in which first class types are unexceptional, or as part of the programming language itself, in which first class types are a great adventure.
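A minimal Python sketch of the sym operator under these assumptions (Symbol, sym, and binds are our own illustrative names; types are stood in for by strings):

```python
# A symbol is an <identifier, type> pair; the type constrains which
# declarations may later bind it.

from dataclasses import dataclass

@dataclass(frozen=True)
class Symbol:
    identifier: str
    type: str          # types are first-class values; a string stands in here

def sym(identifier, type_):
    """sym(identifier, type) -> symbol"""
    return Symbol(identifier, type_)

def binds(declaration_type, symbol):
    """A declaration may bind a symbol only if the types match."""
    return declaration_type == symbol.type

x = sym('x', 'int')
print(binds('int', x))    # True
print(binds('real', x))   # False
```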
3 Binding and Resolution
In section 1, we informally discussed the separation of the binding process into two components, binding and resolution. What we wish to do here is to be more precise as to the role of binding and resolution in programming languages and to prepare the ground for the introduction of explicit binding operations and resolution operations.
3.1 Working with Free Variables

Consider a section of executable code that contains free symbols, which we denote by C(x, y) because the symbols x and y are free in this code. From the previous section, we know symbols have types, and we denote the types involved by Tx and Ty. The method by which these symbols are to be resolved is, however, not yet
specified. In particular, we do not know whether these symbols will be parameters, local variables, own variables, etc. In ALGOL 68 [9] terms, while formal declarers for the symbols have been provided, the actual declarers and generators have not. The specification of this information is done by means of binding operations.

We now characterize the binding operations more precisely. A binding operation requires the following:

1. a section of code containing zero or more free symbols
2. a specification of the symbols that are to be bound
3. a specification of how each such symbol is to be resolved

The result of a binding operation is a new section of code in which the occurrences of the free symbols of (1) that are specified in (2) are bound such that the bound occurrences are resolved as indicated in (3). Within a binding operation, type checking may be performed in order to ensure that the free symbols are bound in ways that are compatible with their types.

ALGOL 68 illustrates this process in an obvious fashion with its identity declaration. As indicated previously, the left-hand side of such a declaration, i.e. the formal declarer, associates a type (mode in ALGOL 68) with the identifier, resulting in a symbol. The right-hand side of an identity declaration specifies either a value or a means of generating a value that is to replace the symbol when the code is executed. In either case, the type in the formal declarer must be satisfied by the value produced by the right-hand side of the declaration. For example:
ref real x = loc real;

describes a symbol x whose type is a reference to a real variable (pointer to a real variable). [This is what, in most languages, would be thought of as declaring the variable. In ALGOL 68, an identifier that denotes a real variable is "identified" with a reference value to that variable (a memory location or cell that contains real values).] The symbol x is bound by specifying on the right-hand side a loc generator that has as an actual declarer the type real. This generator must return a value that satisfies the formal declarer, and in this case it clearly does, since it generates a real variable on the stack local to some scope and returns a reference to it. This checking is part of the analysis performed at compile time. The ALGOL 68 designers have insisted on this. Hence this checking is part of the binding operation. When the code within the scope of this declaration is executed, all occurrences of the symbol x are replaced by references to the variable generated by loc, which is only executed when control passes into the scope of the declaration. This run-
time substitution does not require type checking, as the necessary checks have already been performed. That is, the result of executing loc real must be a reference to a variable that contains only reals. This step is called resolution, and we say that x has been resolved. Once the above declaration is made, i.e., the binding operation is performed, the identifier component of the symbol x no longer plays any role in determining the meaning of the code within the scope of the declaration. That is, systematic replacement of x with, for example, the symbol z (which avoids name clashes with other variables) does not alter the meaning of the code. In summary then, a binding operation performs the following:
- binds symbols in a section of code by converting them into anonymous placeholders in the resulting code
- performs type checking to ensure that the type associated with each symbol will be satisfied when the bound symbol (placeholder) is replaced by a value during execution
- specifies how each symbol will be resolved at run time, usually when control enters the scope of the binding declaration
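The three requirements and the summary above can be sketched as one operation, again in illustrative Python (bind_op, Bound, and the tuple code representation are our own assumptions, not a real system):

```python
# Sketch of a binding operation: it takes a code fragment, the symbols
# to bind, and a resolution specification; it type checks, then returns
# new code in which the identifiers have become anonymous placeholders.

class Bound:
    """Anonymous placeholder carrying only its resolution rule."""
    def __init__(self, resolution):
        self.resolution = resolution   # e.g. ('param', 0) or ('local', 'int')

def bind_op(code, symbols, resolutions):
    """Bind each (identifier, type) symbol in `code`, type-checking first."""
    slots = {}
    for (ident, ty) in symbols:
        res_ty = resolutions[ident][1]      # type the resolution will supply
        if res_ty != ty:                    # the type-checking step
            raise TypeError(f'{ident}: {ty} expected, {res_ty} supplied')
        slots[ident] = Bound(resolutions[ident])
    def walk(t):
        if isinstance(t, tuple):
            return tuple(walk(s) for s in t)
        return slots.get(t, t)              # the identifier vanishes here
    return walk(code)

code = ('assign', 'x', ('add', 'x', 1))
out = bind_op(code, [('x', 'int')], {'x': ('local', 'int')})
print(isinstance(out[1], Bound))   # True: x is now an anonymous placeholder
```

A mismatched resolution type, e.g. resolving an int symbol to a real local, fails at binding time, mirroring the compile-time checking described above.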
3.2 Working with Bound Variables

Bound symbols (variables) differ significantly from free symbols in ways other than being "anonymous". In particular, bound symbols are not values. There is no way to access them and they are not subject to any explicit manipulation. Bound symbols are always replaced by other values before the code in which they are embedded is executed. We illustrate this with the following two cases:

Parameters: Symbols may be bound by specifying that they are to be parameters, as in the lambda calculus. This was described informally in the preceding. In procedural languages, procedures have their parameters determined at compile time. However, despite the fact that the parameters are bound, the association between symbol and argument has not yet been made. The bound symbols (parameters) are not resolved until the procedure is called and the arguments are "passed". The passing of arguments represents the resolution step and it is normally part of the effect of a procedure call operation. What occurs at compile time, when the parameters are bound, is that each previously free occurrence of a parameter symbol is replaced by an anonymous
placeholder, usually an index into the argument list that designates the correct parameter/argument resolution when the environment for the procedure is established and the arguments are passed.

Local Variables: Symbols can also be bound by declaring that they be local variables of a procedure, e.g. loc in ALGOL 68. As with parameters, symbols declared as local variables are bound at compile time. In this case, the placeholder (bound symbol) frequently takes the form of a pair <display element, offset> that identifies the location of the local variable when storage is allocated at procedure entry for all the local variables. The display element is set to the address of this local storage and the offset is added to this base address to compute the address of the resolved symbol. Once again, however, the bound symbols are not resolved at compile time, but only at run time, during procedure call, when all local variables are instantiated.
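The <display element, offset> placeholder can be sketched as follows; the function names and the single-frame display are illustrative assumptions on our part:

```python
# Sketch of the <display element, offset> placeholder for locals:
# binding assigns each local an offset at compile time; resolution at
# procedure entry adds the offset to the frame's base address.

def bind_locals(names):
    """Compile time: map each local to a (display_element, offset) pair."""
    return {name: ('frame0', offset) for offset, name in enumerate(names)}

def resolve(placeholder, display):
    """Run time: base address of the display element plus the offset."""
    element, offset = placeholder
    return display[element] + offset

placeholders = bind_locals(['i', 'sum'])
display = {'frame0': 1000}          # frame allocated at procedure entry
print(resolve(placeholders['i'], display))    # 1000
print(resolve(placeholders['sum'], display))  # 1001
```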
3.3 Wrap-up

What can be seen from the above examples is that, in most 3rd generation languages, binding usually takes place at compile time. Hence, the intuitive notion that these languages perform early binding is correct. Resolution, on the other hand, may occur multiple times, e.g. during procedure calls, and frequently occurs at run time. Thus, the resolution process involves:

- a source of values or a means of generating them
- a section of code containing bound variables
The result of resolution is a new section of code in which bound symbols have been replaced by the values to which they have been resolved, i.e. those provided as arguments or local variables. If explicit resolution operations are involved, then the resulting code can be further manipulated as a value and perhaps be the subject of further binding and resolution operations. Alternatively, resolution may be performed implicitly as a result of procedure invocation. Then, the resulting code is available only for immediate execution as the executable body of the procedure activation. Like the binding process, resolution may involve type checking. For instance, ensuring that an argument satisfies the type requirement of a parameter to which it is passed is done during resolution.
4 Our View of Languages
Exposing binding and resolution in terms of explicit operations makes possible a new view of languages. We can use these operations as a means of constructing programs without recourse to exposing the program as some form of extended data structure that is subject to manipulation by ordinary data handling operations, e.g. assignment. Rather, we can treat program material as scalar values, like, for example, integers. Like integers, which have operations like plus and minus defined on them, program type data has binding and resolution operations defined on its values. In neither case need the internal representation of the scalar value be exposed.

We view the concrete representation of a source program, not as the program itself, but instead as a character string that, when interpreted correctly, tells us what operations to use in constructing the program, as well as telling us what the constructed program is. This is not a new view. Lisp [5] has embodied this view from its inception. The Lisp syntax is a specification of the list structure of the program. In the Vienna definition of PL/1 [8], the character string representation of the program is converted into a form called the abstract syntax. This abstract syntax is an internal data structure that is manipulated by the data handling operations of the interpreter.

In both the Lisp and the PL/1 cases above, the program is exposed to manipulation by ordinary data handling operations, and the type or constraint checking that is done during program construction can be undone by subsequent use of more of these operations. Thus, were Lisp to support some form of type checking, the checking could not be effectively performed when the program was constructed. Only at the time that a program is secure against further data manipulation can the checking be performed in a way that guarantees that its effects can be depended upon when the program is executed. This happens in most Lisp programs only when the compiler is invoked.
One cannot, however, consider a compiled program to be the list structure that is used to represent interpreted programs. Unlike Lisp, 3GL languages do not ordinarily expose their programs as data structures that are accessible to users of the language, although their definition methods might treat the programs as data structures, as in the case of PL/1. Aside from, perhaps, being able to invoke their compiler, passing it a character string, or being able to assign procedure values to procedure variables, these languages do not possess operations that can manipulate program material.

Our view of type checking is that type consistency is required in order for a program to be well-formed. This is captured in the notion that the type checking goes on during the time that the program is being constructed, and that the program
will not be successfully constructed unless the type constraints are satisfied. Type checking at compile time is not an optimization of run time checks. Rather, it is a necessary ingredient in the construction of the program. It does not typically occur at run time at all. What we sometimes call type checking at execution time may involve some testing of values, but the program elsewhere does not rely on the results of this test. For example, one might have a language construct that tests which of the possible types the current value of a union type possesses. Each possibility can then be expressed as an alternative of a case style construct. But the code for the alternatives is compiled independently of the results of the test. Rather, each alternative is compiled such that if the value present is of a particular flavor in the union type, then the type of that value, currently a union type, is "narrowed" to be the simple type.

For example, consider union(int, real). When one wants to use the value of the union in, e.g., an arithmetic expression, then a case construct involving the variable is introduced. The case construct is responsible for determining which alternative is present in the union, "narrowing" it to that more restrictive type, and then executing the appropriate alternative. The kind of "type checking" done at execution time by case is not part of the well-formedness of the program. In one of the case alternatives, the symbol introduced is resolved to an int; in the other, to a real. Both cases can be strongly type checked during program construction, in which the symbols introduced are declarations that bind free occurrences in the case alternatives. Thus, this case is a binding operation like lambda or loc in section 5, binding the free symbols in its body, and specifying how they are to be resolved during execution. How then do we go about constructing programs?
We start with the character string or vector of character strings that are in a form described by the concrete syntax. The character string is parsed and the internal, abstract form of the program is generated. The parsing is done using ordinary string manipulation operations. The abstract program is an ordinary data structure, not a program value. It is constructed using ordinary data handling operations. The abstract form of the program is then converted into program values by means of program construction operations, essentially our binding and resolution operations. This view requires then that the concrete syntax of a program tell us both what the program is and how we should go about constructing it. For example, a procedure header is not an executable part of the program that is constructed. Rather, it is a directive as to how we should construct the program.
5 Program Construction
5.1 Overview

The construction of program values begins with the bits and pieces that can be manipulated as values within our language. This is why symbol was introduced as a type of value in our language system. In addition, we need to have operators be values of our language system. They are already, in many languages, a form of procedure value. A procedure value is characterized by its signature, i.e. the types of its arguments and the type of its result. Hence, we begin to build program values from symbol values and procedure values. Assignment statements are expressions in which the assignment procedure value is the controlling operation. There are similar operations for conditional statements, compound statements, indeed all relevant statement forms of the language. For each primitive operation of the language, arithmetic, boolean, character or bit string, there is a corresponding procedure value that is, like the assignment procedure value, the starting place for incorporating the operation into program values.
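As an illustrative sketch of procedure values characterized by signatures (the ProcValue name and the string-typed signatures are our own assumptions, not part of any real language):

```python
# Sketch: primitive operations as procedure values, each characterized
# by a signature (argument types, result type) plus an implementation.

from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass(frozen=True)
class ProcValue:
    name: str
    arg_types: Tuple[str, ...]
    result_type: str
    impl: Callable

# integer addition and assignment as the starting building blocks
iadd = ProcValue('iadd', ('int', 'int'), 'int', lambda a, b: a + b)
assign = ProcValue('assign', ('ref int', 'int'), 'void',
                   lambda cell, v: cell.__setitem__(0, v))

print(iadd.arg_types, iadd.result_type)   # ('int', 'int') int
print(iadd.impl(1, 2))                    # 3
```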
5.2 Notation

Before introducing and describing program construction operations, we need to introduce some notational conventions. We are not concerned with the concrete or character string form of programming languages. Nonetheless, in order to discuss program construction, we need a notation to describe the effects of our operations. The notation is introduced for purposes of explanation in this paper. It is not intended as a notation for any programming language. We need to distinguish between an expression (procedure) and the result of evaluating the expression. In addition, we must distinguish the expression from the "expression" that was used to create it as part of a program. This is actually simply another example of the first distinction, but applied to the expressions that produce expressions as results. The notation is introduced as we proceed with the discussion of the operators. Most of our effort is concerned with providing a descriptive notation for expressions that are the result of evaluating other expressions.
5.3 Some Program Construction Operations

We do not provide a complete set of program construction operations here. Rather, we illustrate our approach for some cases that are common to essentially all 3GL languages. These are functions/expressions, local variables, and external variables
as presented by PL/1 [2]. Extensions to own variables should be straightforward. Extensions to file systems and other forms of persistent objects take us out of the program construction business and into the world of true dynamic binding, which we discuss briefly in the Discussion section.

5.3.1 Functions and Expressions
The important characteristics of an operator (procedure), so far as program construction is concerned, reside in its signature (type), i.e. the types of its parameters and the type of its result. Consider the addition operation. We represent its signature as
λ(x, y) iadd(<x, int>, <y, int>) int

which describes it as taking two arguments which are integers, and returning an integer result. The two free symbols in iadd are bound as parameters by the λ-list. In providing arguments to the above operator, we must distinguish between the expression that causes the arguments to be passed and the expression that results from passing the arguments. The operation that passes the arguments involves the apply operation. This name was chosen because of our view that in purely applicative languages such as the lambda-calculus, application is nothing more than substituting arguments for parameters. Thus,
apply(λ(x, y) iadd(<x, int>, <y, int>) int, (<a, int>, <b, int>)) → iadd(<a, int>, <b, int>)

or
apply(+, (a, b)) → a + b

The expression preceding the → will, when evaluated, resolve the parameters of the iadd operation to the argument list provided to it. The result of the evaluation of the expression with the resolution operation apply is described following the →. This is described in two equivalent ways and represents the substitution of arguments for parameters. In this example, the iadd operation's bound variables have become the free symbols <a, int> and <b, int>. It is also possible to resolve the parameters to arguments that are constants. For example,
apply(+, (1, 2)) = apply(λ(x, y) iadd(<x, int>, <y, int>), (1, 2)) → iadd(1, 2) = 1 + 2

where 1 + 2 is an expression containing no free symbols and which, when evaluated, produces the result 3. More complex expressions can be built by applying an operator (procedure) to an argument list whose entries are themselves expressions. Note, however, that to be precise we must describe the symbols in the body of our operators so that expressions can be substituted for them. Assuming that this is done, we can create, for example,
apply(+, (a, b·c)) → +(a, b·c) = a + (b·c)

As it is possible, using apply, to substitute arguments for parameters, it is also possible to turn free symbols, such as the substituted arguments' free symbols <a, int> and <b, int> above, into parameters. This is done with the lambda operator. Thus:
lambda((x, y), expr(<x, Tx>, <y, Ty>)) → λ(x, y) expr(<x, Tx>, <y, Ty>)

where the free symbols <x, Tx> and <y, Ty> have become parameters of the resulting function and are no longer free. We could subsequently apply this function to other arguments such that the resulting expression had different free symbols, or constants, as we did above for iadd.

5.3.2 Local Variables
Variables considered to be local are usually "local" to a block. So it is useful to exploit two operators here, one that forms a block, and a second that defines local variables. We call ';' the operation that concatenates two statements (procedures with no parameters) into a compound statement. Thus, we have

;(stm1, stm2) → stm1; stm2
Obviously, we can form arbitrary sequences of statements in this way. We then treat the sequence as if it were itself a statement. One can bind free variables of a statement to become local variables by use of the local bind operation. local is similar to lambda in that it binds free variables. However, while lambda specifies that the bound symbols are to be resolved by associating them with arguments, local specifies that the bound symbols are to be resolved to local variables that are created at the time that the block is entered. An example of this is:
local((x, y), stm(<x, int>, <y, int>)) → begin(x: int, y: int, stm)
where the begin item denotes the resulting statement block with local variables x and y . [It is important to keep in mind that the operation local must be executed to produce the begin statement group. Subsequently, at execution time for the program, the begin statement group is itself executed. When its execution starts, the local variables denoted by x and y are created and exist for the duration of the begin block’s execution. Thus, again, variables are bound prior to the time at which we know completely what they are intended to identify. That is, the storage for local variables does not exist at binding time. At bind time, the symbolic nature of the symbols disappears and the resulting “bound” symbols become anonymous placeholders. We know, however, that at execution time, these placeholders are to be replaced by (resolved to) variables local to the begin block. 5.3.3
5.3.3 External Variables
A final example of binding and resolution that we consider is an explanation in our terms of the handling of external variables. External variables are variables that are defined by the linker such that like-named external variables in independently compiled programs are associated with the same storage location or value. The high-level explanation that we seek is that the external variables are free variables that are bound by the linker using our binding operations, perhaps even the binding operation local, executed on the collection of separate programs. External variables can be nested within other blocks, and their names can be used as local variables in these other blocks. Nonetheless, the external variables do not become bound to these local variables. We solve this problem by distinguishing external variables from their similarly named local variables. The declaration of an external variable in some programming language, e.g. PL/1, is interpreted as a directive to name the variable in a special way. Thus, a variable X declared to be external becomes, e.g., the symbol <ext_X, TX> where ext_ is a prefix reserved for external symbols. Then, by convention, only the linker issues binding commands for these special symbols. Like other symbols, external symbols remain free variables until a bind command causes them to become placeholders.
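As a rough illustration of this convention, here is a toy Python linker (Cell, Module, and link are invented names for this sketch, not part of any real toolchain) that binds every like-named ext_-prefixed free symbol across independently constructed modules to a single shared storage cell:

```python
class Cell:
    """One storage location shared by every module naming the external."""
    def __init__(self):
        self.value = None

class Module:
    """An independently compiled program with some free external symbols."""
    def __init__(self, free_externals):
        self.free_externals = free_externals
        self.bindings = {}           # filled in by the linker, not by the module

def link(modules):
    """Bind each distinct ext_-prefixed symbol to a single shared Cell."""
    cells = {}
    for module in modules:
        for name in module.free_externals:
            cell = cells.setdefault(name, Cell())   # reuse the cell if one exists
            module.bindings[name] = cell
    return cells

a = Module(["ext_X"])
b = Module(["ext_X", "ext_Y"])
link([a, b])

a.bindings["ext_X"].value = 42       # write through module a ...
print(b.bindings["ext_X"].value)     # ... visible through module b: prints 42
```

The reserved prefix keeps these symbols out of reach of the ordinary lambda and local binds, so they stay free until the linker's bind command runs.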
5.3.4 Naming Persistent Variables
What we have tried to show is that most “binding” in 3GL languages is really early binding that is done at compile time. The binding at compile time is not simply an optimization, but rather is part of the construction of the program. The treatment of external variables demonstrates how some of the binding can be left until link time. In all cases, our free symbols carry type information with them, and hence the program can be type checked during program construction, ensuring that the program is well formed.

There are, however, symbols that need to be left as free symbols until execution time. Symbols that denote files are the most common of these in conventional practice. The binding and resolution of these symbols usually occurs at the time that a main program begins to execute, so that the meaning of these symbols cannot be changed by the program itself. Thus, it is the command or shell language that usually provides the binding and resolution of these symbols. Mechanisms like environments, as described in [6], are useful for this dynamic binding and resolution. In this case, a symbol is bound and resolved simultaneously. This is possible because, at binding time, we already have in existence the persistent variable to which the symbol is to be bound.

We give a brief description here of this mechanism; a more complete description is given in [6]. An environment is an ordered sequence of <symbol, persistent variable> pairs. A symbol that remains free in a program is bound and resolved by searching the environment for a like-named symbol and then resolving it to the persistent variable with which it is paired. We think the notion of an environment is useful, but do not believe it needs to be primitive. Instead, one can build an environment as an ordinary data structure. The search of the environment can be done with the normal operations provided by a language. Then one can apply lambda to the symbol name in the program for which we are doing the binding. This symbol can then be resolved by applying the resulting function to an argument that is the persistent variable associated with the symbol in the environment data structure.
This works if we know the symbols that remain free in the program that we wish to bind in this way, and if we can do an equality test on symbol values, which seems reasonable.
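A minimal sketch of this scheme in Python, with the environment built as an ordinary list of pairs and all names (lookup, program, environment) assumed for illustration:

```python
def lookup(environment, symbol):
    """Search the ordered <symbol, persistent variable> pairs front to back."""
    for name, persistent_var in environment:
        if name == symbol:           # the equality test on symbol values
            return persistent_var
    raise KeyError(symbol)

# A program with one remaining free symbol, "infile", already lambda-bound:
# the function parameter is the placeholder left by the bind operation.
def program(infile):
    return f"reading from {infile}"

# The shell's environment as an ordinary data structure; earlier pairs
# shadow later ones because the search stops at the first match.
environment = [("infile", "/data/input.txt"), ("log", "/tmp/run.log")]

# Bind and resolve in one step: the persistent variable already exists,
# so applying the function resolves the placeholder immediately.
result = program(lookup(environment, "infile"))
print(result)                        # reading from /data/input.txt
```

Nothing here is primitive: the environment is a plain list, the search is a plain loop, and the binding itself is ordinary function application.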
6 Discussion
6.1 Summary

Our view is that treating static binding as an optimization of dynamic binding is very unsatisfying. It is like treating strong static type checking as an optimization of dynamic type checking. The value of strong type checking is the firm knowledge that it cannot be bypassed during the dynamic execution of programs. Hence it is declarative in nature, i.e. a statement about all executions of the program. The same is true for most binding as well.
Our notion of binding captures the mathematical notion of binding as removing the significance of variable names from the meaning of a formula. In programming languages, the names of local variables are, in fact, not significant in that a systematic renaming of them would not change the meaning of the program. Since arguments are not yet passed and local variables do not yet exist at the time that variables are bound, the placeholder bound variables are not yet associated with programming language entities, be they variables or values. We call this second association step symbol resolution.
6.2 Very Dynamic Languages

There are very dynamic programming languages for which our static binding model does not work. Smalltalk may be such a language. The notion of passing messages to objects and then having the objects sometimes respond appropriately and sometimes responding with a “huh?” certainly delays the connection of meanings to programming language symbols. Even Smalltalk programs must be constructed, of course. And we would argue that this construction is the result of binding and resolution operations. The types involved, however, are so general as to not be very helpful. Runtime interpretation of much of the program will still be required. And this runtime interpretation will involve dealing with symbolically named quantities. In this case, it is not so much that our model of binding and resolution doesn’t apply; it is more that after applying it, much of the meaning of Smalltalk programs has not been captured.
6.3 Concluding Remarks

This paper was started in 1985. Indeed, the current version is a modest re-write of the draft version of 1985. I worked intermittently in programming languages in the 1970’s and early 1980’s. The views expressed here permeated much of that work. The bibliographic entries [3, 4] point to the more relevant and accessible parts of this work. The language examples come mostly from Algol68 and PL/1. One of the reasons for this is the time at which the paper was initially written. Another reason is that these very rich languages provide a large collection of features “all in one place”. It would be interesting to have comments from readers concerning the generality and relevance of this work to languages that are more “au courant”.
References

[1] Burstall, R. and Lampson, B. A Kernel Language for Abstract Data Types and Modules. Symposium on the Semantics of Data Types, Sophia-Antipolis, France (1984).
[2] IBM. OS and DOS PL/1 Language Reference Manual. International Business Machines (Sept. 1981).
[3] Lomet, D.B. A Data Definition Facility Based on a Value-Oriented Storage Model. IBM Journal of Research and Development 24,6 (November 1980), 764-782.
[4] Lomet, D.B. Objects and Values: the Basis of a Storage Model for Procedural Languages. IBM Journal of Research and Development 20,2 (March 1976), 157-167.
[5] McCarthy, J., Abrahams, P., Edwards, D., Hart, T., and Levin, M. LISP 1.5 Programmer’s Manual. The MIT Press, Cambridge, MA (1966).
[6] Morrison, R., Atkinson, M., Brown, A., and Dearle, A. Binding in Persistent Programming Languages. SIGPLAN Notices 23,4 (April 1988), 27-32.
[7] Strachey, C. Fundamental Concepts in Programming Languages. Oxford University Press, Oxford, UK (1967).
[8] Vienna Definition of PL/1.
[9] Van Wijngaarden, A., Mailloux, B., Peck, J., and Koster, C. Revised Report on the Algorithmic Language ALGOL 68. Acta Informatica 5, 1-236 (1975).
Contents

1 Introduction
2 Symbols
3 Binding and Resolution
   3.1 Working with Free Variables
   3.2 Working with Bound Variables
   3.3 Wrap-up
4 Our View of Languages
5 Program Construction
   5.1 Overview
   5.2 Notation
   5.3 Some Program Construction Operations
       5.3.1 Functions and Expressions
       5.3.2 Local Variables
       5.3.3 External Variables
       5.3.4 Naming Persistent Variables
6 Discussion
   6.1 Summary
   6.2 Very Dynamic Languages
   6.3 Concluding Remarks