1 Motivation - Semantic Scholar

In Proc. Intl. Conf. Object-Oriented Technology (WOON’97), 252-266, 1997

Analysing and Optimizing Strongly Typed Object-oriented Languages: A Generic Approach and its Application to Oberon-2 Jens Knoop

Falk Schreiber

Abstract

Object-oriented programming is going to replace the imperative programming paradigm as the state-of-the-art paradigm for large industrial software projects. The benets oered, however, come at the price of intricate new problems for optimizing compilers. In contrast to the imperative paradigm, where abstract interpretation is well-established as a uniform and theoretically wellfounded framework for analysis and optimization, for the object-oriented paradigm the situation is still characterized by approaches being quite specic or ad-hoc. Here, we show how to enhance the framework of abstract interpretation to the object-oriented setting considering an Oberon-2like language for illustration. We arrive at a generic framework, whose generality and exibility is demonstrated by a hierarchy of type analysis algorithms of dierent power and precision. Their results can directly be used for various optimizations like inlining type-bound procedure calls or eliminating unnecessary type tests and assertions. Moreover, we show how a previously proposed approach for type analysis by Corney and Gough, which is restricted to the intraprocedural setting, can be modeled in our approach, and even interprocedurally be enhanced, underlining thus the generality and exibility of our approach. Keywords: Object-oriented programming, strongly typed languages, Oberon-2, type analysis, type test elimination, abstract interpretation, optimization, generic analysis framework.

1 Motivation Object-oriented programming is gaining more and more importance in both computer science and industrial practice. A major reason is that it is commonly expected to provide a major contribution for overcoming the so-called \software crisis" by supporting faster and more reliable programming, by allowing to model application scenarios more naturally, and particularly, by supporting and simplifying the reuse of software components. On the other hand, the benets oered on the application side by the object-oriented paradigm raise subtle and intricate new problems for the construction of optimizing compilers. For imperative languages the theory of abstract interpretation (cf. CC77, Mar93]) provides a uniform and theoretically well-founded platform for the construction of powerful analysis and optimization algorithms capturing the full range of imperative language features (cf. Kno93]). In contrast, the situation for object-oriented languages is still characterized by approaches being quite specic or ad-hoc. This may be due to the fact that (1) language features and implementation techniques being typical for object-oriented languages as for instance late procedure binding introduce inherently new problems, for which the classical imperative analysis techniques do not (directly) apply, and (2) due to the broad conceptual diversity of object-oriented languages which, starting historically with Simula, ranges now from purely untyped languages like Smalltalk over hybrid languages like C++ to strongly typed single inheritance languages like Oberon-2. Fakultat fur Mathematik und Informatik, Universitat Passau, Innstrae 33, D-94032 Passau, Germany (e-mail:

fknoop,[email protected]

In this article we show how to improve on this situation by transferring the abstract interpretation based approach for analysing and optimizing imperative programs to the object-oriented paradigm considering an Oberon-2-like language, called SimpleOberon, for illustration, whose intraprocedural part was originally introduced by Corney and Gough CG94]. Resulting from this transfer we get a generic abstract interpretation based framework applying to Oberon-2 like languages. The generality and exibility resulting from the genericity of our approach is demonstrated by a hierarchy of type analysis algorithms of dierent power and precision. Their results can directly be used for a variety of optimizations like inlining type-bound procedure calls or eliminating unnecessary type tests and assertions representing a major source of ineciency in Oberon-2 programs. All analyses result from concise specications consisting of only 4 elementary components, which can directly be fed into the generic algorithm of our analysis framework supporting both rapid prototyping and easy trading of eciency against power and vice versa supporting user-customized solutions. In particular, we show how a previously proposed approach by Corney and Gough (cf. CG94]), which is restricted to the intraprocedural analysis of single procedures, can be modeled in our approach, and even interprocedurally be enhanced. Related work: Type analysis as the common footing of optimizing object-oriented programs is quite an active eld of research for years. The large majority of approaches proposed (cf. Age94, Age95, OPS92, PC94, PS91, VHU92]), however, focuses on untyped object-oriented languages like Smalltalk, and does not (directly) apply to statically typed languages like Oberon-2. In particular, the complex interplay of pointer variables and type-bound procedures in Oberon-2 lacks a direct counterpart in the settings considered there. Most closely related to our approach are the approaches of Corney and Gough CG94], and Knoop and Golubski KG96]. Corney and Gough proposed an intraprocedural type analysis algorithm for SimpleOberon Knoop and Golubski demonstrated how to extend the classical abstract interpretation framework to untyped object-oriented languages. While in essence the extension to statically typed languages is quite similar, its application to type analysis must be organized dierently in order to capture the interplay of pointer variables and type-bound procedures. Structure of the Article. After the preliminary Section 2, in which we present our Oberon-2-like programming language SimpleOberon, we introduce in Section 3 program models and interprocedural program models as representations of SimpleOberon-programs which are appropriate for the formal development of our approach. Central are then Sections 4 and 5, where we adapt the framework of abstract interpretation to the object-oriented setting under consideration, and subsequently demonstrate its application to type analysis and optimization presenting several variants, whose power is demonstrated by an illustrating example. Section 6, nally, contains our conclusions.

2 The Language SimpleOberon

Like in CG94] we consider an Oberon-2-like programming language (cf. MW91, Rei91, Mos95]), called SimpleOberon, which omits complex data structures like arrays or multilevel pointer structures, but is complex enough to demonstrate the essential features of our approach in particular, of how to resolve the interplay of pointer variables and type-bound procedures. The terminology of Oberon-2 varies from common object-oriented terminology (cf. Wir88]). Record-types, i.e., types of records, correspond to classes. Basis - and extension-types correspond to super - and subclasses in conventional object-oriented terminology. Variables of a record-type are the counterparts of objects (or instances of a class ), and type-bound procedures are the equivalent of methods. The invocation of a type-bound procedure corresponds to a message-send. Finally, the concept of inheritance is modeled by type-extension.

Program Decl Type PT RT StatSeq Stat

= f Decl g f ProcDecl g BEGIN StatSeq END. = VAR Ident:Type. = PreDefT j PT j RT. = POINTER TO RT. = RECORD ( Ident ) ]f Ident : PreDefT g. = Stat f Stat g. = Ident := Expr j Ident ( f ActPar ] g ) j WHILE Expr IF Expr THEN StatSeq ELSE StatSeq j : : : ProcDecl = ProcHead ProcBody Ident .. ProcHead = PROCEDURE Receiver ] Ident ( f FormalPar ] g ). ProcBody = f Decl g BEGIN StatSeq END. FormalPar = VAR] Ident : Type.

DO

StatSeq j

Figure 1: Part of the SimpleOberon syntax. rt0 Prt0

rt1

rt2

Prt1

Prt2

rt3 Prt3

Figure 2: Type extension and type-bound procedures. The syntax of SimpleOberon is essentially given by the contextfree-like grammar of Figure 1. A program consists of a set of declarations of variables and (type-bound and ordinary) procedures followed by the main program encapsulated by BEGIN and END. There are three kinds of types : predened types like INTEGER, REAL, CHARACTER, : : : record-types combining components of predened types, and pointer-types. Each pointer-type corresponds to a record-type, each variable of a pointertype refers to a record variable. Figure 2 illustrates the concept of type-extension. It is a means for creating new record-types by adding new elds to existing ones. In the example of Figure 2, rt0 is the basis-type of the record-types rt1 and rt2 . Analogously, rt1 and rt2 are (direct) extension-types of rt0. The hierarchy of record-types carries over to the corresponding pointer-types of pointer variables. The static type of a pointer variable is the type of its declaration. In contrast, its dynamic type is runtime dependent. It is the pointer type corresponding to the type of the record variable the pointer variable points to. This must be a variable of the static type of the pointer variable or of any of its extension types. For example, the set of dynamic types of the pointer variable pv1 :POINTER TO rt1 consists of the elements POINTER TO rt1 and POINTER TO rt3 , in the following abbreviated by pt1 and pt3 , respectively. The value of pv" is the record variable pv is pointing to. The statement pv(pt) is called a type assertion (or type guard ) testing whether the dynamic type of pv coincides with pt or some of its

extension types. Type assertions are typically used in order to control access to record components missing in types dierent from pt and its extension types. A type assertion does not aect the dynamic type of a pointer variable, but if its dynamic type does not coincide with pt or an extensiontype of pt the program execution aborts. A type test of the form pv IS pt evaluates to true, if the dynamic type of pv is pt or an extension-type of pt, otherwise it evaluates to false. The semantics of the remaining statements like IF THEN ELSE, WHILE DO, etc. coincides with the semantics of their counterparts of conventional imperative languages. For object-oriented languages it is characteristic that classes do not only contain the data, but also the methods for manipulating them. In SimpleOberon this is reected in that procedures can be bound to record types. For example, the procedure header PROCEDURE (rt1 ) P (

:::

formal parameters

:::

) BEGIN

:::

END P.

declares a procedure P bound to the record type rt1 . A procedure bound to a record-type is also accessible in all its extension-types, unless it is overwritten (replaced) by an equally-named procedure. For example, if pv is a pointer variable, and P the identier of a type-bound procedure, then pv.P() represents a call of the particular procedure P, which is bound to the dynamic type of pv. We remark that type-bound procedures are implemented by means of dynamic (\late") binding. In SimpleOberon, we consider procedures with value parameters (of predened, pointer- and record-types) and reference parameters (of predened and record-types). We do not consider pointer variables as arguments for reference parameters in order not to burden the formal development of our approach with multilevel aliasing. We remark, however, that our approach is not limited to this setting. Pointer variables as arguments of reference parameters can be dealt with for example after some preprocess for computing alias information (cf. KRS96]). Conventions. As usual we assume that inheritance is removed from the argument program. This can be achieved by a simple pre-compilation step, which, in essence, expands all record-type denitions occurring in a program by adding all inherited components (variables and type-bound procedures) except for type-bound procedures being overwritten. Thus, in the resulting program a record-type denition contains all information including the inherited one explicitly. This simplies the formal development of our approach and its application to type analysis, but is not a prerequisite for it. Moreover, we introduce a special record-type rtNIL corresponding to the common pointer-type ptNIL (the type of the NIL object). Every record-type rt is an extension type of rtNIL. Dually, this holds for the corresponding pointer-types as well. In particular, every type-bound procedure P gets an equally-named \empty" implementation bound to rtNIL. Additionally, for type-bound procedures and type-bound procedure calls we add the receiver parameter as an ordinary parameter to the parameter list, i.e., the header PROCEDURE (receiver:pt) P(v1 :t1 ,...,vn:tn) is replaced by PROCEDURE (receiver:pt) P(receiver:pt, v1 :t1 ,...,vn :tn )

Accordingly, the type-bound procedure call pv.P(a1,...,an) is replaced by pv.P(pv,a1,...,an). Finally, statements containing a type assertion are split into two parts: the type assertion and the term without the assertion. Thus, the statement x:= : : : pv(pt) : : : is replaced by the sequence pv(pt) x:= : : : pv : : : . Again, we remark that this simplies the formal development of our approach by separating concerns, but leaves the result of type analysis invariant. Recall that a type assertion does not aect the dynamic type of a pointer variable pv. Goal of Type Analysis. In the context of SimpleOberon the goal of type analysis is to compute for every occurrence of a pointer variable the set of dynamic types, which can actually occur at runtime as precisely as possible. Central for accomplishing this is the semantics of elementary statements concerning the eect on the type of a variable, which is summarized in the following table.

Statement NEW(pv)

pv:=pv0

Semantics

Eect on the dynamic type of the pointer variable pv Create: Creates a new variable The dynamic type of pv is equal to of the record-type corresponding to its static type. the static pointer-type of pv, which can be referenced by pv.

Assign: (i) Assigns pointer variable The dynamic type of pv is set to the

pv:=NIL

to pv. dynamic type of pv0 . (ii) Assigns the special value NIL to The dynamic type of pointer variable pv. ptNIL .

pv(pt)

Test: (i) Type assertion. Tests the No eect on the dynamic type of

pv IS pt

pv0

dynamic type of pv.

pv

is set to

pv,

if it is equal to pt or any of its extension-types. Otherwise the program execution is aborted. (ii) Type test. Evaluates to true, if No eect on the dynamic type of pv. the dynamic type of pv is pt or any of its extension-types, otherwise it evaluates to false.

3 Program Models We represent SimpleOberon-programs by means of directed edge-labeled ow graphs G0 : : : Gk , where every Gi represents an ordinary or type-bound procedure of (the main program is considered a parameterless procedure). Each Gi consists of a set of nodes Ni and edges Ei having a unique start node si and end node ei , which are assumed to be free of incoming and outgoing edges, respectively. The system G = fG0 : : : Gk g is called a program model. The edges of G represent both the statements and the nondeterministic control ow of the underlying procedures, while the nodes represent just program points (cf. Figure 4). Unlabeled edges are assumed to represent the empty statement \skip". For every ow graph Gi , let predGi (n) =df fmj(m n) 2 E g and succGi = fmj(n m) 2 E g denote the sets of immediate predecessors and sucessors of a node n 2 N , and let source(e) and dest(e) denote the source and destination node of an edge e 2 E . A nite path in Gi is a sequence he1 : : : eq i of edges such that dest(ej ) = source(ej +1 ) for j 2 f1 : : : q ; 1g. It is a path from m to n, if source(e1 ) = m and dest(eq ) = n. PGi m n] denotes the set of all nite paths from m to n in Gi . The function pg(n) yields for every node n the graph Gi containing it. Similarly, start(Gi ) and end(Gi ) yield the start node and end node of Gi . Without loss of generality we assume that every node n 2 N lies on a path from a start(pg(n)) to end(pg(n)). Moreover, by EC we denote the set of edges labeled by a procedure call, and by ETPC EC the subset of edges labeled by a typebound procedure call. The function callee maps every edge representing a procedure call to the set of procedures it may invoke. If e represents an ordinary procedure call, callee(e) yields the singleton set containing this procedure. If it represents a type-bound procedure call, callee(e) yields the set of all procedures which are possibly called at runtime, i.e., the set of all equally-named procedures bound to the static type or any of the extension types of the pointer variable at the call site or to rtNIL, for which the list of formal parameters matches the list of actual parameters. Dually, the function caller yields the set of all call sites of a procedure Gi . Interprocedural program models. Note that a program model G does not explicitly represent the control ow caused by procedure calls. This, however, is required for the operational characterization

of our approach (cf. Section 4.2). Therefore, we introduce the interprocedural program model G of G, which results from G by replacing every transition e 2 EC by call edges leading from source(e) to the start node of every procedure of callee(e), and by return edges connecting the end nodes of these procedures with dest(e). These edges are labeled by assignments reecting the parameter transfer (cf. KG96]). We denote the set of call and return edges of G by EC and ER , respectively. We remark that the notion of a nite path introduced above applies also to the interprocedural program models G of G. However, there are paths in G , which do not correspond to a runtime execution of the underlying program because they do not respect the call/return behaviour of procedure calls. This leads to the notion of valid paths, which are a subset of the set of all nite paths of G . Validity imposes a consistency constraint on nite paths excluding those violating the call/return behaviour of procedure calls (cf. SP81, KS92]). Intuitively, identifying matching call and return transitions with \(" and \)", respectively, valid paths correspond to the prex-closed language of balanced parentheses. In the following we denote the set of all valid paths of G from m to n by IP m n].

4 Abstract Interpretation Abstract interpretation has proved to be a powerful and theoretically well-founded framework for static program analysis (cf. CC77, Mar93]). In this section we adapt this approach to the objectoriented setting of this article following the lines of KS92] and KG96].

4.1 Local Abstract Semantics

The point of abstract interpretation is to replace the \full" semantics of a program by a simpler, more abstract version, which is tailored to deal with a specic problem. In the interprocedural setting, the (global) abstract semantics is typically induced by a local abstract semantics ] 0 : E ! (C ! C ), which gives abstract meaning to every transition of the interprocedural program model G of G in terms of a transformation on a complete lattice (C u v ? >) with least element ? and greatest element >. Its elements are assumed to represent the data ow information of interest. Fundamental in order to deal with local variables of recursive procedures is the introduction of stacks of lattice elements as in KS92]. This is required by the fact that in a program with recursive procedures and local variables there are potentially innitely many copies of a local variable at runtime. Intuitively, abstract stacks model the ordinary runtime stacks and record the part of the program history which will become relevant after returning from (nested) procedure calls. Abstract stacks can as usual be manipulated by the following operations, where STACK denotes the set of all abstract stacks: newstack : C ! STACK for creating a new stack push : STACK C ! STACK for pushing a new element on top of the stack pop : STACK ! STACK for removing the top element of the stack top : STACK ! C for accessing its top element. The local abstract semantics on stacks induced by an (ordinary) local abstract semantics is given below. ] :E

8 ! (STACK !0 STACK ) : >>push(pop(stk) e ] (top(stk))) push(pop(pop(stk)) >: 0

R(e)(top(pop(stk)) e ] (top(stk))))

if e 2 E n (EC ER ) if e 2 EC if e 2 ER

Important are the return functions R(e) : C C ! C , e 2 ER . Intuitively, they model the eect of returning from a procedure call, which requires to maintain the eects on global variables but to reset the eects on local ones. This is achieved by appropriately combining the information valid before entering the procedure with that being valid immediately before leaving it. We remark that these

informations are available as the rst and second argument of the return functions in any reasonable analysis context. The global abstract semantics of a program results then from one of the following two globalization approaches: the \operational" meet over all paths (MOP) approach leading to the MOP-solution, and the \denotational" maximal xed point (MFP) approach leading to the MFP-solution in the sense of Kam and Ullman KU77], respectively.

4.2 First Globalization: The Meet Over All Paths Approach (MOP)

The operational MOP-approach globalizes a local abstract semantics by directly mimicing possible program executions: it \meets" (intersects) all information, which belong to a program path reaching the program point under consideration. The variant of this approach required here is as follows:

The MOP-Solution: 8c0 2 C: 8n 2 N : MOP( ]]

c0) (n)

=

uf p ] (newstack(c0 )) j p 2 IP sG n]g: 0

Here, ] denotes the straightforward extension of a local abstract semantics to paths. The MOPsolution, dened as the greatest lower bound of the top elements of the stacks of f p ] (newstack(c0 )) j p 2 IP sG0 n]g, reects directly our desires. Unfortunately, it does not specify an eective computation procedure in general (think for example of loops or of recursive procedure calls in a program). This is the central concern of the second globalization approach.

4.3 Second Globalization: The Maximal Fixed Point Approach (MFP)

Like for its interprocedural counterpart, the key of the maximal xed point approach required for the setting of this article is a preprocess for computing the meanings of procedure call transitions in terms of the meaning of the called procedures. This requires the introduction of an auxiliary semantic functional ]] giving meaning to whole procedures. Intuitively, n ]] transforms data ow information which is assumed to be valid at the entry of the procedure containing n into the corresponding data ow information being valid at n. The meaning function of a procedure Gi is then essentially given by the meaning function of its end node end(Gi ) ]]. Formally, this is captured by Denition 4.1:

Denition 4.1 (Global Semantics of Program Models)

The functionals ]] : N ! (STACK ! STACK ) and ] : E ! (STACK ! STACK ) are dened as the greatest solutions of the equations system: n ]] = and

id STACK uf (m n) ] m ]] j m 2 predpg(n) (n)g

e] e ] = uf e (G ) ] R i

if n 2 fsG0 : : : sGk g otherwise

end(Gi ) ]] eC (Gi ) ] j Gi 2 callee(e)g

if e 2 E n EC if e 2 EC

where idSTACK denotes the identity on STACK , eC (Gi ) the call edge from source(e) to start(Gi ), and eR (Gi ) the corresponding return edge from end(Gi) to dest(e) with Gi 2 callee(e). The equation system of Denition 4.2 then characterizes the MFP-approach.

Denition 4.2 (The MFP-Equation System)

8 newstack(c ) < 0 d (n) = : uf eC (pg(n)) ] (d (source(eC ))) j eC 2 caller(pg(n))g uf (m n) ] (d (m)) j m 2 predpg(n) (n)g

if n = sG0 if n 2 fsG1 : : : sGk g otherwise

Denoting the greatest solution of the equation system of Denition 4.2 by d c0 , the MFP-solution is dened by: The MFP-Solution: 8c0 2 C: 8n 2 N: MFP( ]] c0) (n) = d c0 (n) As usual, the practical relevance of the MFP-approach stems from the fact that it induces an iterative computation procedure, which is eective, if the function lattice on C satises the descending chain condition, and if the local semantic functions e ] 0 , e 2 E , and the return functions R(e), e 2 ER , are monotonic (cf. KS92, KRS96, KG96]). In the following section we establish the relation between the MOP- and the MFP-solution. Important are the correctness and coincidence theorem.

4.4 Correctness and Coincidence

The correctness and coincidence theorems give a sucient condition for the correctness (or safety) and the coincidence (or precision) of the MFP-solution with respect to the MOP-solution. Along the lines of KS92, SP81] we can prove:

Theorem 4.3 (Correctness (Safety) Theorem)

The MFP-solution is a correct approximation of the MOP-solution, i. e. 8 c0 2 C: 8 n 2 N: MFP( ]] c0 ) (n) v MOP( ]] c0 ) (n), if the local semantic functions e ] 0 , e 2 E , and the return functions R(e), e 2 ER , are all monotonic.

Theorem 4.4 (Coincidence (Precision) Theorem)

The MFP-solution and the MOP-solution coincide, i. e. 8 c0 2 C: 8 n 2 N: MFP( ]] c0 ) (n) = MOP( ]] c0 ) (n), if the local semantic functions e ] 0 , e 2 E , and the return functions R(e), e 2 ER , are all distributive.

5 Type Analysis

Resolving the Interplay of Pointer Variables and Type-bound Procedures

Central for type analysis of Oberon-2-like programs is to resolve the complex interplay of pointer variables and type-bound procedures. In contrast to imperative languages, where procedures occurring as procedure parameters cause similar phenomena, they are not orthogonal, but inherently interweaved: the dynamic types of pointer variables and the procedures called by type-bound procedure calls depend mutually on each other. Here, we resolve these interdependencies by decomposing the analysis into two components dealing with (1) the computation of dynamic types of pointer variables, and (2) the computation of potentially called procedures, respectively. Both steps are repeated until a common xed point is reached, i.e., both the sets of dynamic types of pointer variables and the sets of potentially called procedures of type-bound procedure calls are invariant under further applications of the component analyses. We remark that both components rely on information computed by the other component. Fortunately, this (apparent) deadlock-situation can be resolved by means of the function callee. Based on the static type declarations of the program under consideration, it provides a safe approximation of the sets of potentially called procedures of type-bound procedure calls. This information is initially fed into the type analysis of step 1 returning an approximation of the sets of dynamic types of pointer variables according to this information. Vice versa, the type information on pointer variables induces now an improved approximation of the sets of potentially called procedures of type-bound procedure calls. The repetition stops, if the new approximation provided by step 2 coincides with the former one. It is worth noting that step 2 does not require an interprocedural analysis. In essence, it amounts to evaluating the type analysis information delivered by the analysis of step 1 at call sites of typebound procedures. In contrast, the type analysis of step 1 requires an interprocedural analysis. In

practice, this is considerably simplied as the implementation of this analysis results automatically from instantiating the generic xed point algorithm computing the MFP-solution of the data ow problem under consideration (cf. Section 4). The exibility resulting from this is demonstrated in Section 5.3 presenting dierent type analyses of varying precision and eciency. We remark that step 2 is not aected. It is commonly shared by all analyses. Together, this illustrates how to easily trade power against eciency and vice versa in our approach according to a user's individual needs.

5.1 Computing Dynamic Types of Pointer Variables

In this section we demonstrate how to apply the framework of Section 4 in order to compute a safe approximation of the sets of dynamic types of pointer variables. According to Section 4, this reduces to specifying an appropriate abstract interpretation. This only requires to x 4 elementary components: the data domain, the local abstract semantics, the return functional, and the start information of interest. As mentioned previously, these components can directly be fed into the generic computation procedure returning the implementation of the type analysis computing the MFP-solution (cf. KS92, KG96]). Note that the denition of the local abstract semantics and the return functions require a safe approximation of the procedures potentially called by type-bound procedure calls. This information is made available by the function calleeta provided by the component of step 2. Initially, i.e., for the rst application of step 1, calleeta is given by the function callee, which is based on static, and hence, safe type information. In the specication below, the index \ta" being a short-hand for \type analysis" reminds to this fact. Specication of the Basic Type Analysis: The Local Abstract Semantics. Data domain. The data domain of our basic type analysis is given by the powerset lattice (C u v ? >)=df (P (F ) F ) where F = PVar ! PT] denotes the set of all functions from pointer variables to pointer types including ptNIL. Local abstract semantics. The local abstract semantics is given by the functional ] : Eta ! (P (F ) ! P (F )), which is dened by: 8e 2 Eta : 8P 2 P (F ): e ] (P ) = f e ] (f ) j f 2 P g where ] : Eta ! (F ! F ) is given by: 8 >>f statTyp(pv)npv] if e NEW(pv) >>f pt npv] if e pv := NIL >: NIL f otherwise Here, extTyp(pt) denotes the set of extension types of pt including pt, and f : n : ] a substitution dened as follows. If f 2 F pv 2 PVar pt 2 PT, then f ptnpv] is the unique function of F dened by: ( pt if pv = pv0 0 0 8pv 2 PVar: f ptnpv](pv ) = f (pv0 ) otherwise Return functions. Central is the denition of the return functional R : ERta ! (P (F )P (F ) ! P (F )). It is dened by: 8e 2 ERta : 8P1 P2 2 P (F ): R(eR )(P1 P2 ) = fR0 (f1 f2 ) j f2 = eR ] source(eR ) ]] eC ] (f1 ) f1 2 P1 f2 2 P2 g

where eC denotes the call edge corresponding to eR , and source(eR ) the end node of the procedure callee under consideration. The function R0 : (F F ) ! F , nally, is dened by: 8 (f1 f2 ) 2 F F: R0 (f1 f2 ) = f3 where 8pv 2 PVar: f3 (pv) =

(

f1 (pv) if pv 2 LocVar(pg(source(eR ))) f2 (pv) otherwise

Here, LocVar is a function, which maps a procedure to its set of local pointer variables and value parameters. Intuitively, the denition of the return functional R reects that a pointer variable being local to the procedure callee under consideration must be reset to its value before entering the procedure, and be maintained otherwise. Start information. The start information completes the specication of our basic type analysis. It is given by the singleton set F0 =df ff0 g 2 P (F ), where the function f0 is dened by: 8pv 2 PVar: f0 (pv) = ptNIL

Actually, the denition of the basic type analysis is rather straightforward. Thus, we only discuss the case dealing with type assertions, i.e., e pv(pt), in the denition of the local semantic functional in more detail. Note, in case of f (pv) 2 extTyp(pt), the type assertion holds, and consequently the local abstract semantics has no eect as expected. On the other hand, if f (pv) 2 extTyp(pt) is violated, the test on the assertion fails indicating a possible program abortion at runtime. However, instead of immediately nishing the analysis with indicating this failure, we proceed by propagating the type information pt provided by the assertion. Thus, the analysis is continued using the information originally intended by the programmer, which allows us to provide him a program being completely annotated with type information, and not just partially up to detecting the rst point of a possible failure. This is important in practice. Note, however, that the information of possible failures is retained by this proceeding. After termination, the existence of possible failures can simply be checked by comparing the type assertions with the set of dynamic types computed for the source node of the edge the type assertion under consideration is attached to. We recall that the specication above can directly be fed into the generic computation procedure returning the implementation computing the maximal xed point solution of the basic type analysis. The termination of the instantiated xed point algorithm, and the correctness of the type information computed is essentially a consequence of Theorem 5.1. Correctness and Termination of the Type Analysis. The rst part of Theorem 5.1 follows immediately from the niteness of PVar and PT. The second part can easily be proved by means of the denitions of the local semantic functions and return functions. We have:

Theorem 5.1 (Descending Chain Condition and Monotonicity)

1. The function lattice P (F ) satises the descending chain condition. 2. All local semantic functions and return functions of the basic type analysis are monotonic. Theorem 5.1 directly yields the eectivity of the instantiated xed point algorithm. Moreover, because of part 2, the Safety Theorem 4.3 is applicable yielding the correctness of the results of our type analysis with respect to the MOP-solution of the basic type analysis.

Theorem 5.2 (Termination and Correctness)

The MFP-solution of the basic type analysis is a safe approximation of its MOP-solution.

Denoting the result of the basic type analysis, i.e., its MFP-solution, by MFPFta0 , we obtain as approximation of the sets of dynamic types of pointer variables: dynTypta (pv n) = ff (pv) j f 2 top(MFPFta0 (n))g

This information can now be fed into step 2, inducing a new approximation for the sets of procedures possibly called by type-bound procedure calls.

5.2 Computing Potentially Called Procedures of Type-bound Procedure Calls

By means of the type information computed by step 1, we get a new approximation of the sets of procedures potentially called by type-bound procedure calls. It is given by: 8e 2 ETPC

: ppta (e) = pwts(tpc-tp(e) dynTypta (tpc-id(e) source(e)))

where tpc-id and tpc-tp are two functions mapping an edge e labeled by a type-bound procedure call pv.P() to pv, and the set of procedures with identier P bound to the static type of pv or any of its extension types, respectively. pwts(fG1 : : : Gm g fpt1 : : : ptl g), nally, denotes the set of type-bound procedures of G1 : : : Gm being bound to some type of pt1 , : : : ,ptl . Intuitively, ppta induces a \dynamic" counterpart of callee being more precise as it prots from type analysis information, whereas callee (cf. Section 3) is based on static type information only: 8e 2 EC : calleeta (e) =

(

ppta (e) if e 2 ETPC callee(e) if e 2 EC nETPC

The function calleeta can now be (re-)fed into step 1 oering the type analysis the chance of computing an improved approximation of the sets of dynamic types of pointer variables. After the termination of the global analysis, i.e., after reaching a xed point for both steps, the sets of procedures possibly called by type-bound procedure calls is given by: 8e 2 ETPC

: potProc(e) = calleetafix (e)

where calleetafix denotes the nal value of calleeta . Correctness. In essence, the correctness of step 1, and, together with Theorem 5.2, of the complete two-step approach, is a consequence of the validity of the inclusion 8e 2 ETPC

: actProc(e) callee(e)

where actProc is assumed to denote the sets of type-bound procedures which can actually be called at runtime. Hence, the rst application of step 1, based on callee, and thus of all subsequent ones because of a monotonicity argument, relies on a safe approximation of actProc.

5.3 Variants

Dierent type analyses can easily be obtained by modifying the rst step of the global analysis procedure only. In fact, changing or modifying the local abstract semantics suces for obtaining new type analyses or variants of the basic type analysis of Section 5.1 in order to meet the individual requirements of a user concerning precision and eciency. Enhancing Precision. Precision can be enhanced, e.g., in the fashion of KG96] by means of introducing lter functions in order to evaluate type tests and to arrive at an almost deterministic treatment of program branches. Technically, this is realized by introducing lter functions, which rely on the simple program transformation illustrated in Figure 3. In essence, introducing lter

econd pv true

false

IS pt

pv IS pt

: (pv IS pt)

Figure 3: Transformation of our program model to deal with deterministic treatment of type tests. functions amounts to expanding the local semantic functional of Section 5.1 to edges labeled by branch information. For the basic type analysis of Section 5.1 this is accomplished as follows (we remark that we do not touch here the lattice or the semantic functional ] ):

8 >:f e ] (f ) j =f extTyp 2 Pg

if e pv IS pt if e :(pv IS pt) otherwise Note that the local abstract semantics of a type test performs in fact like a lter: functions f 2 F are propagated along the true-branch, if the test pv IS pt evaluates to true, i.e., if the dynamic type of pv is pt or an extension-type of pt. Otherwise, the function propagation of the lter is blocked. This holds dually for the lter applying to the false-branch. Enhancing eciency. Eciency can be enhanced, e.g., by basing the type analysis of step 1 onto (function) lattices, where the maximal length of a descending chain is \short". An important variant following this approach is to consider the type-extension (inheritance) tree as the lattice of interest as proposed by Corney and Gough for an intraprocedural setting (cf. CG94]). It is straightforward to adapt the local semantic functions of our basic type analysis accordingly in order to model their algorithm in our approach. Actually, this even holds for the denition of the return functional, which together is sucient to enhance their algorithm even interprocedurally.

5.4 Illustrating Example

In this section we demonstrate the power of the basic type analysis of Section 5 enhanced by lter functions according to Section 5.3 by means of the example of Figure 4(a), which is assumed to obey the type hierarchy of Figure 2. In particular, we assume that pvi is a pointer variable of the type POINTER TO pti , and that Prti are equally-named type-bound procedures corresponding to the record types rti. Following Section 5.3, the usage of lter functions requires to replace the label of edge 13 by skip, and to attach the labels pv0 IS pt1 and : (pv0 IS pt1 ) to the edges 14 and 15, respectively, assuming that edge 14 represents the true-branch and edge 15 the false-branch. According to Section 5, the global analysis starts with xing the initial approximation of the set of procedures possibly called by type-bound procedure calls (cf. Section 5.2). Initially, it returns for both type-bound procedure calls pv0 .P() at the edges 12 and 19 the procedures Prt0 , Prt1 , Prt2 , Prt3 and PrtNIL . This information xes the program model for the rst application of step 1 of the global analysis computing a rst approximation of the set of potential types of pointer variables (cf. Section 5.1). For the occurrence of variable pv0 at node source(e12 ) it returns the types pt2 and pt3, and for the occurrence at node source(e19 ) the types pt2 and pt3 . Repeating step 2, we get Prt2 and Prt3 as possibly called procedures for the call site at edge 12, Prt2 and Prt3 at edge 19. This information xes a new program model for the second application of step 1. We obtain pt2 and pt3 as possible types of the variable pv0 at node source(e12 ), and pt2 at node source(e19 ). After updating the sets of

(a)

(b)

1

NEW(pv0)

1

NEW(pv0)

2

NEW(pv1)

2

NEW(pv1)

3

NEW(pv2)

3

NEW(pv2)

4

NEW(pv3)

4

NEW(pv3)

5

x 10?

5

x 10?