Executable Assertions and Separate Compilation*

K John Gough¹ and Herbert Klaeren²

¹ Queensland University of Technology, Box 2434 Brisbane 4001, Australia
² University of Tübingen, Sand 13, 72076 Tübingen, Germany

Abstract. The use of executable assertions is widely recognised as a useful programming technique for complex systems. In many cases static analysis of programs allows such assertions to be removed at compile time, thus removing the overhead of the test. The use of interprocedural analysis would often allow a larger number of such tests to be statically removed; intermodular analysis could even improve the overall safety of the system. In general, however, such analysis is antithetical to separate compilation and extensible systems. In this paper we offer a partial solution to this dilemma: we propose that preconditions become part of the interface definition of an encapsulated object. The implementation consequences of this technique are explored.

Keywords: Executable assertions, separate compilation, value propagation, elimination of redundant checks.

* Joint Modular Languages Conference, Linz 1997

1 Introduction

The use of preconditions and postconditions in the analysis of programs is by now firmly established. In a somewhat related development, many software engineers have found it useful to incorporate programmer-specified executable assertions into their software. This goes hand in hand with the call for secure languages as described recently by Strom [11], who contrasts insecure languages (where a bug in module A can damage data structures used by a presumably non-buggy module B in such a way that B seems the culprit in a later system crash) with secure languages, which would either make module A non-compilable or, at the least, raise an exception at the moment when A corrupts the data structure. Besides the usual measures taken by a type-safe language (such as overflow and array bounds checks and the like), programmer-defined pre- and postconditions can be a great help in making a language secure. Note that in the scenario described by Strom, we must be able to deal with such conditions across module boundaries.

For example, an assertion that a particular list is non-empty would prevent an algorithm depending on such a precondition from erroneously continuing in cases where the precondition isn't met. There are two consequences of this. Firstly, failure of the assertion at runtime can provide an explicit error message which highlights the exact nature of the programmer's misconception rather than allowing some arbitrary failure of later code, possibly even residing in a different module. Perhaps more importantly, the addition of such assertions keeps code evolution later in the software lifecycle from invalidating a necessary condition for correctness of some existing part of the code.

Some programming languages provide compiler support for such assertions; Algol-W [12] already had such provisions. Eiffel [8] introduced executable pre- and postcondition assertions as an intrinsic part of the programming by contract [9] paradigm. Oberon-2 now incorporates such assertions, and Gardens Point Modula (gpm) [10] has always had such a feature, although technically it is not part of the Modula-2 language [6].

In the case of gpm, the Assert procedure takes either one or two arguments. The first is an arbitrary expression of type Boolean. If the optional second argument is present, it specifies an associated error message. If no message is specified, the compiler generates a default message which reports the compilation unit in which the assertion failed, and the respective line number in the source. The gpm compilers generate inline code for the Boolean evaluation, and do not even incur the overhead of a taken branch when the assertion evaluates to true. All generation of error messages, and the trapping to the runtime system, is done out of line of the main control flow; thus, the overhead of an assertion is never more than that of evaluating the Boolean expression plus a non-taken branch. It is possible to use a command line flag to turn off the generation of code for assertions, but we have never found a circumstance in which we have thought the modest saving in execution time was worthwhile. Our own code makes very substantial use of these assertions.
Examination of the code which arises from the use of the Assert facility shows that occasionally the dead-code removal algorithms of the compiler can eliminate the test entirely. This happens, for example, when one assertion is logically implied by another, or when properties of the type system imply the truth of the assertion. We have some experience with elimination techniques for redundant checks that are automatically introduced by type-safe language systems, such as array bounds checks [3] or dynamic type tests [1]; obviously, all techniques used for this purpose are also suitable for elimination of redundant programmer-defined assertions. Markstein et al. [7] already observed that, in addition to elimination of redundant checks, it usually pays off to move the non-redundant ones to regions of lower execution frequency. Later, Gupta [4, 5] observed that interprocedural analysis would enable an even larger number of tests to be eliminated. For this reason, he proposes to move tests from the called procedure to the call site. The gain arises because the caller of a procedure always has more static information about the arguments than the called procedure.³ For example, the value of a literal constant argument is known to the calling procedure, but is just another variable to the callee. Thus the use of interprocedural analysis, or the use of procedure integration ("inlining") together with advanced intraprocedural analysis, might be expected to yield significant gains.

³ Note the traditional abuse of terminology here: when we speak of what a procedure "knows" we use this as code to refer to the information that can be computed during compilation of the procedure.

Unfortunately, the use of either procedure integration or whole program analysis is antithetical to separate compilation, and simply has no place in extensible systems. Indeed, we do not favour the use of such analysis either to propagate value information, or to patch up holes in the type system (as is required by some languages which allow covariant specialisation). Instead we are interested in exploring the strengthening of the static analysis which is available in the presence of separate compilation and modular decomposition.

The rest of this paper is set out as follows. Section 2 introduces a motivating example. Section 3 sets out the essence of our proposal, while Section 4 discusses the semantic constraints on the method and Section 5 discusses some implementation details. In the remaining sections, we discuss the extension to postconditions and procedure variables and draw some conclusions.

2 A Motivating Example

We consider the example of a generic matrix multiplication routine, using open arrays. The interface to such a routine might be

    PROCEDURE matmul (VAR res : ARRAY OF ARRAY OF REAL;
                      VAR lOp : ARRAY OF ARRAY OF REAL;
                      VAR rOp : ARRAY OF ARRAY OF REAL);

Of course, the correctness of any call to this routine will depend on the proper conformance of the dimensions of the three arguments. In particular, if lOp is an l × m matrix, and rOp is m × n, then the result matrix res must be l × n. Using x_1, x_2 to denote the first and second upper bounds on parameter x, these conformance assertions are

    lOp_2 = rOp_1        (1)
    res_1 = lOp_1        (2)
    res_2 = rOp_2        (3)

All of this is known to every student of algebra. What is not so well known is that the conformance assertions are sufficient to allow a compiler to statically eliminate all index bounds checks in the implementation of matrix multiplication. A typical procedure body might be

    BEGIN
      FOR i := 0 TO lOp_1 DO
        FOR j := 0 TO rOp_2 DO
          sum := 0.0;
          FOR k := 0 TO lOp_2 DO
            sum := sum + lOp[i, k] * rOp[k, j]
          END;
          res[i, j] := sum
        END
      END
    END matmul;

The reasoning now is as follows. It is almost trivial for a compiler front end to compute that if the FOR loop index k ranges over [0 .. lOp_2] then the bounds check on k in lOp[i, k] is never necessary. The bound on k, stated in terms of the dimensions of lOp, does not help by itself to eliminate the check on k in rOp[k, j]. Equation (1), however, taken together with the FOR loop bounds on k, eliminates the second test on k in the summation line. Equations (2) and (3), taken together with the other FOR loop bounds, eliminate both bounds checks on the final assignment. All of the other index bounds checks are directly eliminated by the FOR loop bounds, without the use of the conformance assertions.

It may be noted that the elimination of these bounds checks is quite influential, since the presence of the checks makes the loops harder to optimise. Once the tests are eliminated, the full power of induction variable analysis and invariant code motion is available to give very significant speedups in the code.

In our previous work [3] we have shown how a symbolic computation of range information for variables can be used for removal and relocation of overflow and array bounds checks. With a small extension of these techniques it is possible for our compilers to automatically eliminate all index checks in this example. If we begin the body of the procedure with three executable Assert statements equivalent to the conformance relations above, then the compiler links the values together. Whenever a new range interval is computed for (say) k in terms of lOp_2, a check against rOp_1 is known to be equivalent, according to equation (1). The runtime cost of performing the Boolean evaluations is regained by the elimination of the index tests alone.
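As an illustration only, the placement of the three conformance assertions at the head of the procedure can be sketched in Python. This is not the gpm code: the name matmul and its parameters follow the example above, matrices are assumed to be non-empty rectangular lists of rows, and list lengths stand in for the open-array upper bounds. Python still performs its own index checks, so the sketch shows only where the assertions sit, not the resulting optimisation.

```python
def matmul(res, l_op, r_op):
    """Store the product l_op * r_op in res (all are non-empty lists of rows)."""
    # Conformance assertions (1)-(3), evaluated once on entry.
    assert len(l_op[0]) == len(r_op), "lOp_2 = rOp_1 violated"
    assert len(res) == len(l_op), "res_1 = lOp_1 violated"
    assert len(res[0]) == len(r_op[0]), "res_2 = rOp_2 violated"
    # Given the assertions, every index below is within bounds by construction.
    for i in range(len(l_op)):
        for j in range(len(r_op[0])):
            total = 0.0
            for k in range(len(l_op[0])):
                total += l_op[i][k] * r_op[k][j]
            res[i][j] = total
    return res
```

A call with non-conforming dimensions fails at the first violated assertion, before any element is touched.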
This leaves us with the question of whether the called procedure should really be responsible for ensuring conformance with the dimension constraints. Meyer, in his "Design by Contract" paper [9], says that such constraints are part of the "contract" between client and server and continues by stating that there is a wide range of possible styles,

    "ranging from 'demanding' ones where the precondition is strong (putting the responsibility on clients) to 'tolerant' ones where it is weak (increasing the routine's burden). [...] The experience with Eiffel, in particular the design of the libraries, suggests that the systematic use of a demanding style can be quite successful. In this approach, every routine concentrates on doing a well-defined job so as to do it well rather than attempting to handle every imaginable case."

This is in perfect accordance with our own experience. In most cases, the code which calls the procedure knows the dimensions of the matrices, and is able with more or less (compiler) effort to establish the assertions without incurring any runtime cost. On the other hand, it is the matrix multiplication routine which can make the most valuable use of the information. Thus the calling side should be responsible for such checking, since more information is known about the actual arguments on the calling side. We also conclude that the called routine should be able to use the semantic content of any such assertions in the elimination of runtime checks, or in the computation of other static semantic properties; in other words, it should be allowed to assume that the client guarantees the precondition, and to take advantage of this knowledge.

3 Assertions in the Definition

Consideration of the example in the previous section suggests the introduction of a formal mechanism for passing assertions between a library interface and its clients. We propose that assertions on the values of the arguments to a routine should form part of the declared interface. The rules of the contract are as follows:

– The client of an interface is responsible for ensuring that the arguments passed to each procedure obey the Boolean conditions stated in the interface definition. The client may either eliminate these tests if statically computed information proves them redundant, or must otherwise perform the tests at runtime.
– A procedure which is guarded by an advertised precondition on the values of its actual arguments may assume without tests that these preconditions hold. The procedure may use this information in the computation of static properties of the procedure body.

Note that at worst the runtime tests have been transferred from the body of the called procedure back to the client. This can never use more machine cycles, but may cause an increase in code size proportional to the number of call sites, due to code replication. At best, the assertion may be statically satisfied in the caller, and may enable useful optimisations in the called procedure. All of this occurs without compromising the type-checked separate compilation which is at the heart of modular languages.

Indeed, it is possible to use these interface assertions as extensions to the type system. Consider the case of languages such as the Oberon family, which do not have subrange and enumeration types. In such languages integer types are used to carry values which are semantically of restricted range. Thus a variable which holds a day-of-the-week value would be declared as an integer. At runtime the valid values would be only [0..6], and the interface to a procedure WriteDay might be declared with a precondition

    PROCEDURE WriteDay (day : INTEGER);
      PRE (0 ≤ day) ∧ (day ≤ 6);

In this case, the WriteDay procedure will certainly be able to use the assertion to eliminate the index test on the name-table access. The client of the procedure will often be able to eliminate the test, and when that is not possible will simply perform the prescribed range test. This example shows that using an integer type together with the range assertion leads to exactly the same tests and test-eliminations as would have occurred in Modula-2 for the call of a procedure

    PROCEDURE WriteDay (day : WeekDays);

where WeekDays is either a subrange, or perhaps an enumeration type. All of this is cognate with the point of view in which the notion of datatype is equivalent to a set of assertions on the permissible values of objects of that type.
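The division of labour in this contract can be sketched in Python as follows. Everything here is a hypothetical stand-in rather than the proposed compiler mechanism: the Interface class, the call helper, the statically_proven flag (standing in for a compile-time proof) and the choice of Monday as day 0 are assumptions made for illustration.

```python
class Interface:
    """A procedure body together with its advertised precondition (hypothetical)."""

    def __init__(self, body, pre):
        self.body = body
        self.pre = pre


DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]  # assumed ordering


def _write_day_body(day):
    # The body assumes 0 <= day <= 6 and performs no range test of its own.
    return DAY_NAMES[day]


write_day = Interface(_write_day_body, pre=lambda day: 0 <= day <= 6)


def call(proc, *args, statically_proven=False):
    """Client-side call: test the precondition unless it was proven beforehand."""
    if not statically_proven and not proc.pre(*args):
        raise AssertionError("precondition violated at call site")
    return proc.body(*args)
```

A client that cannot prove the range writes call(write_day, d) and pays for one test; a client passing a literal such as 6 could skip it, which the sketch models with statically_proven=True.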

4 What can be Asserted

In the case of Modula-2, assertions on exported procedures would occur in the definition part of a module. In this case, the scope in which the assertions are embedded is the scope of declarations of the formal parameters. Thus, the visible identifiers which can occur in the assertions include

– names of objects exported from this module,
– formal parameters of the procedure in question,
– identifiers explicitly imported into the definition.

Fortuitously, all of these names refer to values or types which are already available to all possible clients of the interface. Clearly, all of the objects exported from the definition are available to the client. Furthermore, any modules explicitly imported into a definition must also be indirectly imported by all clients of the definition, in order to check interface conformance. Thus the introduction of assertions according to these constraints does not require any broadening of the name scope in the client.

In languages such as Oberon, which do not have a separate definition part, the precondition assertions need to be treated with a little care. It would be simple for an assertion in an Oberon program to be perfectly well formed from the point of view of the compiler of the procedure, but to be ill-formed from the point of view of the client of the module. This would occur if the assertion referred to any identifier which was not publicly available to the client. The differences between the cases of Oberon and Modula thus relate to the name scopes in which semantic analysis of the definitions of the assertions must take place.

4.1 Handling opaque types

One of the benefits of encapsulation is that it is possible for the implementation of an abstract data type to ensure the integrity of all data values of that type. Nevertheless, it may sometimes be the case that a procedure requires an assertion on value(s) of some opaque type, other than the implicit assertion that it is a valid value of the type. When such cases do arise, the assertions should not and need not break the encapsulation of the type. In such cases it seems that the module which encapsulates the type must also export the necessary procedure(s) to perform the value checking. Thus

    DEFINITION MODULE Foo;
      TYPE FooType; (* opaque *)

      PROCEDURE Thing (x : FooType);
        PRE IsOk(x);
      PROCEDURE IsOk (x : FooType) : BOOLEAN;
    END Foo.

Of course, in this case there is no real possibility of the client being able to eliminate this opaque test on the opaque type. After all, the use of an opaque type is intended to prevent the user of the code from making assumptions about the implementation. This is an important aspect of data abstraction, since it ensures that users are insulated from possible changes in the implementation of types. If the full benefit of separate compilation is to be maintained, then even the compiler of a module which uses an abstract type should not break the abstraction. Thus the compiler is prevented from performing optimisations based on implementation details.

Even in this case of assertions on the values of opaque types, when such an assertion is violated there is at least the advantage that the error message is sheeted home to the calling site, rather than requiring a stack trace to identify the culprit.

5 Implementation Issues

There are several important issues of implementation which require some consideration. Firstly, the format of the assertions themselves must be determined. We currently propose a syntax of the form given in Figure 1, which has the EBNF (leaning on [13])

    ProcedureDeclaration → PROCEDURE ident [FormalParameters] [";" PreCondition].
    PreCondition → PRE expression.

In effect, the identifier PRE is treated as a reserved word.

[Figure 1: two syntax diagrams, labelled ProcedureHeading (elements: PROCEDURE, ident, FormalParameters, ";", PreConditions) and PreConditions (elements: PRE, expression).]

Fig. 1. Syntax diagram for procedure headings with optional assertions

It suffices to provide for a single expression at this place because multiple assertions can be formed into a conjunction. The expression in the assert statement is parsed and statically checked within the scope of the current declarations, in exactly the same way that a constant declaration would be. The expression must be syntactically correct, and must be of the Boolean type. As pointed out earlier, because an Oberon compiler would have names in the scope of the assertions which would not be visible to the client of the module, we believe that some additional analysis would be required in that case.

5.1 Symbol file representation

In most implementations of modular languages, information is passed between modules and their clients in a symbol file. These files are a compact representation of the static semantics of the interface. Typically, information such as the names of formal parameters is elided,⁴ and constant expressions, no matter how complicated, are reduced to literals. This parsimony brings advantages, since it is a simple matter for a smart recompilation utility to detect when the source of a definition has changed but the symbol file has not. It is axiomatic that if a symbol file has not changed, then the interface is unchanged. Thus unnecessary recompilation of client modules can be avoided. Recent work [2] at ETH shows how a finer-grained view of change may reduce unnecessary recompilation even further.

⁴ For Oberon-like languages the names of formal parameters may be retained, so that a browser can reconstruct a "virtual definition module" from the symbol file.

In this context the representation of assertions in the symbol file becomes an issue. Clearly, some extension of the symbol file format is required to encode the additional information. In gpm the symbol file format completely folds constant expressions to literals, so there is no defined syntax for expressions or their operators. In our current experimental implementation we have dumped a simple preorder representation of the precondition expression as an addendum to the procedure type information. Globally visible names appear explicitly in the file, but the names of formal parameters, which are meaningless to the client, are replaced by positional indices. A simple extension of the symbol file parser reconstructs the precondition expression trees, and attaches them as attributes of their corresponding type descriptors.
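To make the preorder idea concrete, the following Python sketch flattens an expression tree into a token list, replaces formal parameter names by positional indices, and rebuilds the tree on the reading side. The token layout, the tuple encoding of trees and the ARITY table are assumptions chosen for this illustration; the actual gpm symbol file format is not described in enough detail here to reproduce it.

```python
# Arity of each operator in this toy expression language (an assumption).
ARITY = {"and": 2, "<=": 2, "=": 2, "not": 1}


def encode(expr, formals):
    """Flatten an expression tree into a preorder token list.

    expr is an int literal, a name (str), or a tuple (op, operand, ...).
    Names of formal parameters become ("param", position).
    """
    if isinstance(expr, tuple):
        tokens = [("op", expr[0])]
        for operand in expr[1:]:
            tokens.extend(encode(operand, formals))
        return tokens
    if isinstance(expr, str):
        if expr in formals:
            return [("param", formals.index(expr))]
        return [("name", expr)]  # globally visible name, kept explicitly
    return [("lit", expr)]


def decode(tokens):
    """Rebuild the expression tree from a preorder token list."""

    def parse(pos):
        kind, value = tokens[pos]
        if kind != "op":
            return (kind, value), pos + 1
        operands = []
        pos += 1
        for _ in range(ARITY[value]):
            operand, pos = parse(pos)
            operands.append(operand)
        return (value, *operands), pos

    tree, end = parse(0)
    assert end == len(tokens), "trailing tokens in symbol file entry"
    return tree
```

Because each operator has a fixed arity, the preorder list needs no brackets or length fields, which keeps the addendum compact.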

5.2 Client-side processing

Gardens Point compilers create abstract syntax trees for the whole of a compilation unit at once. For some languages the compilers bind names during tree creation, while for others, names are bound during a static semantic traversal of the tree. In either case, during the static semantic traversal of the abstract syntax tree the referent of each procedure call is identified, and its type descriptor is retrieved. Each actual argument expression-tree is then type-evaluated, and tested for conformance against the corresponding formal parameter. The type-evaluator is also capable of quite aggressive constant folding, so that in many cases expressions are reduced to literals.

During static semantic checking, procedure call nodes with preconditions are modified by local tree rewriting. A new parent node is interposed above such nodes. This parent has an elaborated instance of the precondition attached to it. The precondition expression instance is type-checked in the normal way, and any constant expressions are folded to literals. For our motivating example, the dimensions of statically declared arrays will probably be known, and the preconditions will be trivially satisfied at compile time.

When code is generated for the procedure call, intermediate language code to evaluate all arguments is emitted. If the precondition expression did not fold to the Boolean truth value, then code is emitted to evaluate the precondition at runtime, and to trap if the assertion fails. In gpm compilers all global dataflow analysis and global constant propagation is performed in the backend code generators.⁵ Whether or not value range analysis is performed, there is further value propagation and dead code elimination in the backends, and further assertion tests will be optimised away.
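The call-site step can be sketched in Python as follows: the precondition is instantiated with the actual arguments, constants are folded, and a runtime test is needed only if the result is not the literal truth value. The tree encoding, the fold and needs_runtime_check functions and the two supported comparison operators are hypothetical simplifications, not the Gardens Point tree rewriting itself.

```python
import operator

OPS = {"<=": operator.le, "=": operator.eq}

# Precondition of WriteDay with the formal parameter as positional index 0.
PRE = ("and", ("<=", ("lit", 0), ("param", 0)), ("<=", ("param", 0), ("lit", 6)))


def fold(expr, actuals):
    """Instantiate a precondition with call-site actuals and fold constants.

    expr: ("lit", v), ("param", i), or (op, left, right) with op in OPS or "and".
    actuals: an int where the argument value is known at compile time,
    otherwise a str naming a runtime variable.
    """
    kind = expr[0]
    if kind == "lit":
        return expr
    if kind == "param":
        actual = actuals[expr[1]]
        return ("lit", actual) if isinstance(actual, int) else ("var", actual)
    left, right = fold(expr[1], actuals), fold(expr[2], actuals)
    if kind == "and":
        # A conjunct known to be true is dropped; a false one decides the result.
        for a, b in ((left, right), (right, left)):
            if a == ("lit", True):
                return b
            if a == ("lit", False):
                return a
        return ("and", left, right)
    if left[0] == "lit" and right[0] == "lit":
        return ("lit", OPS[kind](left[1], right[1]))
    return (kind, left, right)


def needs_runtime_check(pre, actuals):
    """True unless the precondition folds to the literal truth value."""
    return fold(pre, actuals) != ("lit", True)
```

With a literal argument such as 3 the test disappears entirely; with an unknown variable the residual tree is what would be compiled into a runtime test and trap.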

⁵ Note however that backends, since they are language independent, do not understand the type system, but only untyped values. Thus we need to perform analysis in both front ends and backends.

6 Extension to Post-conditions

In this paper, we have concentrated on preconditions for purposes of exposition. It is, however, quite clear that postconditions can be treated symmetrically. We have noted earlier that preconditions on the arguments to a function call can often be statically satisfied by the caller of a procedure, but are semantically relevant to the called function. Conversely, it is often the case that a function is able to statically ensure that some postcondition on its return value is satisfied. This assertion may be of value to the caller who receives the returned value.

Thus we propose that postconditions place an obligation on the compiler of a procedure either to prove the test unnecessary statically or to generate code for a runtime test. The code which calls the procedure is then able to use the assertion of the postcondition to remove tests in the same way as for any other assertion. This establishes a symmetry in the client-server "contract": in the same way that the server relies on the client to check the preconditions, the client can expect that the server has verified its postconditions before returning a result.
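The symmetric obligation can be sketched in Python with a decorator that makes the procedure itself establish its postcondition before returning. The names with_post and day_of_week are hypothetical, and the sketch always tests at runtime, whereas a compiler following the proposal would first try to prove the postcondition statically.

```python
def with_post(post):
    """Attach a postcondition which the procedure itself must establish."""

    def wrap(body):
        def checked(*args):
            result = body(*args)
            # Server-side obligation: verify before returning to the client.
            if not post(result, *args):
                raise AssertionError("postcondition violated in " + body.__name__)
            return result

        checked.post = post  # advertised in the interface for clients to rely on
        return checked

    return wrap


@with_post(lambda result, n: 0 <= result <= 6)
def day_of_week(n):
    # Python's % yields a non-negative result for a positive modulus.
    return n % 7
```

A client that indexes a seven-element table with day_of_week(n) can rely on the advertised range and needs no test of its own.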

7 Treating Procedure Variables

Obviously, in a language like Modula-2, where we have procedures as datatypes and therefore assignable procedure variables, the scenario described so far is not completely satisfactory. Introducing pre- and postconditions to procedure types poses two interesting problems:

1. The FormalTypeList [13] used in the declaration of procedure types doesn't assign names to formal parameters but only types. However, the most natural way to express any assertions for procedure types is to use formal parameter names. Having both procedures with and without assertions means that we have to allow a mixture of the FormalTypeList and FormalParameters syntaxes; this will violate the LL(1) property. Of course, this poses no serious problem since an ad-hoc lookahead by a second symbol will resolve the ambiguity.
2. If we assign a procedure (constant) with given assertions to a procedure variable whose type has specified assertions, the question arises as to how to deal with the compatibility of these assertions. There are several tar pits lurking here, one of them being the theoretical incomputability of expression equivalence, another the practical impossibility of incorporating a theorem prover into a compiler. In this case we resort to the KISS ("keep it simple, stupid") principle by allowing the said assignment only if (a) the procedure constant has no associated assertions at all (in which case it inherits those of the type it is being assigned to) or (b) the abstract syntax trees of the assertions belonging to the procedure constant and the left hand type are identical.

A small problem occurring in the implementation is that a precondition is expressed as a condition possibly referencing named formal parameter values, but is evaluated by referencing the corresponding actual values. These actual values may be mentioned multiple times, but must be evaluated only once because there may be side effects hidden in function calls. Furthermore, all such values are "used" at least twice, once by the parameter passing mechanism, and at least once by the evaluation. Our current solution to this is an implementation which defines local, internal names for the evaluated actual parameters, and converts all uses to references to those named "variables".
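Both points can be sketched in Python under simplifying assumptions. In the first function, preconditions are nested tuples (or None for "no assertion") and tuple equality stands in for identity of abstract syntax trees. In the second, zero-argument callables stand in for actual argument expressions that may have side effects. The names assignable and checked_call are hypothetical.

```python
def assignable(proc_pre, type_pre):
    """KISS rule for assigning a procedure constant to a procedure variable."""
    if proc_pre is None:
        return True  # (a) the procedure inherits the assertions of the type
    return proc_pre == type_pre  # (b) structurally identical syntax trees


def checked_call(body, pre, *actual_thunks):
    """Evaluate each actual exactly once, then reuse the stored values.

    actual_thunks are zero-argument callables standing in for argument
    expressions; temps plays the role of the internal named "variables".
    """
    temps = [thunk() for thunk in actual_thunks]
    if not pre(*temps):
        raise AssertionError("precondition violated at call site")
    return body(*temps)
```

The temporaries guarantee that an argument expression with a side effect runs once even when the precondition mentions the corresponding formal parameter several times.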


8 Conclusions

We claim that the technique described here has many advantages. Firstly, it creates enhanced safety, in that declaratively specified assertions form part of a module's contract, and are enforced by the compiler. It does so without losing any of the advantages of separate compilation, and is thus applicable to extensible systems which cannot use whole-of-program analysis even in principle. Furthermore, we have shown that placing the assertions in the interface definition allows the preconditions to be checked by the caller, which has the greatest amount of information about the actual arguments. In addition, precondition assertions are processed by the compiler of the called procedure, and can often eliminate code in the procedure without any runtime cost. Finally, we note that the addition of assertions to an interface may be seen as strengthening the type system of the language, by allowing constraints to be specified which the type system itself cannot express.

We have a working prototype compiler which tests the viability of the concept. Failure to honour the contract of the precondition of any procedure leads to a trap in this system. The increase in compiler size for the programmer-defined assertions is negligible; it amounts only to the small syntax additions and to the sketched enhancements of the symbol file reader and writer. All of the other ingredients were already present in the compiler, which makes the proposed extension a very economical and sensible one.

Of course, the next step would be to obtain precise data about the impact on code size and execution speed. Since both of these quantities depend largely on properties of the specific program to be compiled, this could probably only be done by a statistical analysis of a large number of programs from different application areas and programmer teams. The same can be said about the increase in program safety due to a thoughtful specification of pre- and postconditions; here, we still have some significant work in front of us.

Acknowledgement

The idea of moving preconditions into the interface definitions so as to achieve interprocedural code motion in the presence of separate compilation first arose while discussing [3] with Clemens Szyperski. His contribution is gratefully acknowledged.

References

1. Diane Corney and John Gough. Type test elimination using type flow analysis. In Jürg Gutknecht, editor, Proceedings Int. Confr. Programming Languages and System Architectures, volume 782 of Lecture Notes in Computer Science, pages 137–150. Springer Verlag, 1994.
2. Régis Crelier. Separate Compilation and Module Extension. PhD thesis, Swiss Federal Institute of Technology, Zürich, Switzerland, 1994. Diss. ETH No. 10650.
3. K John Gough and Herbert Klaeren. Eliminating range checks using static single assignment form. In Proceedings ACSC19, Melbourne, Australia. Australian Computer Science Society, 1996.
4. Rajiv Gupta. A fresh look at optimizing array bound checking. In Proc. ACM SIGPLAN'90 Confr. Programming Language Design and Implementation, volume 25(6) of SIGPLAN Notices, pages 272–282, 1990.
5. Rajiv Gupta. Optimizing array bound checks using flow analysis. ACM Letters on Programming Languages and Systems, 2(1-4):135–150, 1993.
6. ISO. Information Technology - Programming Languages - Modula-2. IS 10154-1. International Standards Organisation, June 1996.
7. Victoria Markstein, John Cocke, and Peter Markstein. Optimization of range checking. In Proc. of ACM '82 Symposium on Compiler Construction, pages 114–119, 1982.
8. Bertrand Meyer. Eiffel: The Language. Englewood Cliffs, 1991.
9. Bertrand Meyer. Applying "Design by Contract". IEEE Computer, 25(10):40–51, 1992.
10. QUT. Gardens Point Modula Home Page. http://www.fit.qut.edu.au/CompSci/PLAS/GPM. Information on Gardens Point compilers, their availability, and documentation.
11. Rob Strom. Do programmers need seat belts? SIGPLAN Notices, 31(3):6–7, 1996.
12. N. Wirth and C. A. R. Hoare. A contribution to the development of Algol. Communications of the ACM, 9:413–431, 1966.
13. Niklaus Wirth. Programming in Modula-2. Springer, 3rd edition, 1985.

http://www-pu.uni-tuebingen.de/users/klaeren/jmlc97.ps.gz

This article was processed using the LATEX macro package with the LLNCS document class. Version of December 19, 1996
