The Static and Dynamic Semantics of C: Preliminary ... - CiteSeerX

The Static and Dynamic Semantics of C: Preliminary Version James K. Huggins and Wuwei Shen Abstract

Montages are a semi-visual formalism for de ning the static and dynamic semantics of a programming language using Gurevich's Abstract State Machines (ASMs). We describe an application of Montages to describe the static and dynamic semantics of the C programming language.

1 Introduction In this paper we present the semantics of the C programming language by Montages, which is a new method for giving the semantics of a programming language. In general the only comprehensive description of a programming language is likely its reference manual, which can be informal and open to misinterpretation. Formal approaches are therefore sought. As an attempt in this eld, Gurevich's Abstract State Machines(or ASMs) have been successfully used to model the dynamic semantics of programming languages such as Prolog [4], Occam[1, 2], C[7], C++[11], Java[5] and Oberon[10]. But how to represent the static semantics of a programming language remains a problem when we want to give the complete semantics for a programming language. Montages[9] provide a way to describe the static semantics and dynamic semantics of a programming language. A language speci cation (i.e. the description of its syntactical and semantical aspects) is given as a collection of Montages, each of which is associated with a syntax rule. The semantics of such a collection is given by an ASM. Such an ASM is composed of two parts: the rst part de ning the static analysis and semantics of the language, the second part de ning the dynamic semantics of the language. Each program of the speci ed language de nes an initial state for the ASM, which contains the program parse tree. After we have given the semantics of a programming language L and a program written in L, the corresponding ASM M begins by performing the static analysis for each node of the parse tree. The static analysis decorates the leaves of the parse tree with control and data ow information, building the sequence of tokens that are needed by the dynamic semantics. Next, M executes the dynamics semantics using the sequence of tokens generated by the rst step. Thus, M can be seen as an interpreter for L. In this paper we present the static and dynamic semantics of C by using Montages. As a continuation of [7], we also use the ANSI standard for C as described in [8]. We also concentrate on the following four topics respectively: statements, expressions, memory allocation and variable initialization, and function invocation and return. L

L

L

1

The paper is organized as follows. In the next section, we give a brief introduction to ASMs and Montages. In the following sections we give the Montages for the above four topics.

2 ASMs and Montages 2.1 Introduction to ASMs

In the following we describe an ASM model [6] which is sucient to represent the semantics of C [8]. (ASMs have many features not presented here; see [6] for details.) The signature of an ASM A is a nite collection of function names, each name having a xed arity. A state of A is a set, the superuniverse, together with interpretations of the function names in the signature. These interpretations are called basic functions of the state. A basic of function of arity r is an r?ary operation on the superuniverse. When r = 0, such a basic function is called a distinguished element. The superuniverse does not change as A evolves; the basic functions may. The superuniverse contains some distinct elements true, false and undef which are used to describe relations and partial functions. They are logical constants, whose names do not appear in the signature. In addition, we use equality as a logical constant. A universe U is an important concept in ASMs. It is a special type of basic function: a unary relation usually identi ed with the set fx : U (x)g. ASMs provide some built-in universes such as the logic constant Boolean = ftrue; falseg. When we de ne a function f from a universe U to a universe V , and write f : U ! V , we mean that f is a unary operation on the superuniverse such that f (a) 2 V for all a 2 U and f (a) = undef otherwise. We can extend this notation to notations such as f : U1 U2 ! V and f : V , which means the distinguished element f belongs to V. In addition the expression f (a) can be written in the form a:f . For the general case, the expression f (a1; : : :an ) can be written in the form a1 :f (a2; : : : :; an ). There are three kinds of functions in ASMs. A function f is dynamic if f can be changed as the ASM evolves. Functions which are not dynamic are called static. External functions are syntactically static, but have their values determined by an oracle(that is, the outside world). In principle, a program of A is a nite collection of rules, which are de ned inductively in the following:

Update Rules: f (s) := t

is a rule with head f . Here s is a tuple (s1 ; : : :; sr ) of terms where r is the arity of f and r 0. If f is relational, then the term t must be Boolean. To re such a rule, change the value of f at the value of term s to the value of t. Conditional Rules: if g is a Boolean term and R1; R2 are rules then

if g then R1 else R2

2

endif

is a rule. To re this rule at a given state A, examine the guard g . If g 's value is true at A, then re R1; otherwise, re R2 . Block: If R1; R2 are rules then

do in ? parallel R1 R2 enddo is a rule with components R1 ; R2. Do-in-parallel rules are called blocks. Let r1 and r2 be update rules of the following forms: r1 : f1 (s1 ) := t1 r2 : f2 (s2) := t2

r1 and r2 are said to be mutually inconsistent at a given state A if f1 = f2; and the values of s1 and s2 are equal but the values of t1 and t2 are not equal. Otherwise they are

mutually consistent. To re a block R at a given state A, determine rst if the update rules which will be red in R1 and R2 are mutually consistent. If yes, then re them simultaneously. If not, do nothing; R is inconsistent at A. Do-forall Rules: If v is a variable, g(v) is a Boolean term and R0(v) is a rule, then do forall v : g(v) R0(v)

enddo is a rule with head variable v; guard g (v ) and body R0. A do-forall rule is similar

to the do-in-parallel rule, except that the components are not listed explicitly. Suppose R is the do-forall rule above. At a state A which maps every variable in R to a value, the components of R are the rules R0 (a) where a is any element in the state A satisfying g(a) = true. To re R at A, re simultaneously all these R0(a) unless they are mutually inconsistent. In the latter case, do nothing.

2.2 Introduction to Montages

Montages[9] are a semi-visual formalism that allow uni ed and coherent speci cation of syntax, static analysis and semantics, and dynamic semantics. Generally speaking, for every syntax rule there is one corresponding Montage. In every Montage there can be four parts. The rst three parts de ne the static aspects of the language, which refers to the work which can be done at compile time (such as static analysis), and the fourth part de nes the dynamic semantics of the language. 3

2.3 Structure of Montages

The rst part of a Montage is the syntax rule. In general a syntax rule has the form n ::= E where n is a nonterminal symbol and E is a string of nonterminal and terminal symbols. We say this rule is associated with the nonterminal n. Terminal symbols are the symbols which do not appear on the left-hand side of any syntax rule. The terminal symbols are also called tokens. All nonterminal and terminal symbols are called grammar entities in the following. The universe Node consists of all grammar entities; according to the elements in the universe Node, it can be divided into two subuniverses Token and Nontoken. The grammar entities on the right hand side of the \::=" symbol are called the descendants of n. During the static analysis phase, we use the distinguished element CN to denote the token which is currently being analyzed. The second part of a Montage is the data and control ow diagram, also called the static analysis graph, where the Montage describes the data and control ow functions among the nonterminal and terminal symbols de ned in the syntax rule. Control ow functions describe the execution order among the grammar entities and data ow functions describe how values ow through these grammar entities. Control ow functions are represented by dotted arrows and data ow functions are represented by solid arrows. For brevity, we call dotted arrows control arrows and the solid arrows data arrows. Every data and control function has a name associated with it. Montages provide some special names for certain control functions. I and T represent the initial and terminal control ow for a Montage. They are used to connect local control

ow information to global control ow. NT, another important control ow function, represents the token to be executed in the next step during the dynamic semantics computation. Boxes represent nonterminal symbols and ovals represent terminal symbols in the syntax rule. All of the names appearing in boxes or ovals start with \S-" followed by the corresponding grammar entities' name. They represent selector functions which allow one to select that grammar entity from its parent when a program is parsed. Besides these attributes one can de ne other static aspects which can not be represented directly by the ow diagram. The third part of a Montage is the static condition; the conditions in this part should be satis ed during the static analysis phase, otherwise a syntax error will be reported by the static analysis phase. The fourth part of a Montage is the dynamic semantics for this syntax rule, written by ASM transition rules.

2.4 An Example

Consider the following example:

4

while ::= WHILE Expr DO Statementlist END I

S-Expr NT

cond NT

S-Statementlist

S-DO

T

TrueTask

condition CN.S-Expr.StaticType=Boolean @"DO": if CT.cond.value=true then CT:=CT.TrueTask; else CT:=CT.NT endif end@

In the above Montage, the rst part is the syntax rule for a while statement in some programming language. The second part is the data and control ow diagram. Here there are two boxes which represent the nonterminal symbols named Expr and StatementList in the syntax rule. There is one oval which represents the terminal symbol named DO in the syntax rules. The dotted arrows give the execution order among these grammar entities. The third part is the static condition for this syntax rule. For this rule the condition states that the type of Expr should be Boolean. If the type of Expr is not Boolean a syntax error will be reported during the static analysis phase. The last part is the dynamic semantics for this syntax rule. The dynamic semantics is always presented by giving a set of ASM rules for some terminal symbol(s). In the above example the dynamic semantics of the while statement is presented by giving the ASM rules for the token \DO". Here the keywords @\DO" and end@ and the statements between them form a block for this syntax rule. The dynamic semantics will be executed by rst checking the value of the expression(using the data arrow named cond). If the value of the expression is true, then control passes to the Statementlist of this while statement; otherwise we exit this while statement.

2.5 Informal Meaning of Montages

2.5.1 Compact Derivation Tree

A Montage speci cation of a language L de nes an ASM M that for a given program P of L analyzes and de nes the control and data ow, checks the static semantics, and then executes the dynamic semantics. For each program P , there is a dierent initial state I of the ASM L

P

5

which encodes the syntax of P . L?programs are strings generated by a context free grammar G . Each program generated by G is represented in I as a compact derivation tree, which is derived from a parse tree by repeatedly collapsing each node n with a single child c to a single node with the labels of both n and c, and having only the children of c as children, until no such nodes remain. The initial state I associated with a program P is given by universes and functions representing the nodes and branches of the compact derivation tree. The functions are selector functions which are used to select nodes in some branch of the compact derivation tree. The initial state I contains the compact derivation tree of the program P . Because it is possible for one node to have multiple disjoint labels such as n and c mentioned above, the universes are not disjoint and the selector function is used to select descendants from more than one category of nodes. Here the notation S- is used to distinguish the selector function names from the universe function names. After obtaining the compact derivation tree, we introduce how to generate the control and data ow information which makes the ASM program executable during the dynamic semantics computation. The control and data ow information is provided as attributes of the tokens after the static analysis. These attributes are illustrated as data arrows and control arrows among the tokens. Montages provide a way to de ne control and data ow starting from the syntax of the program. When the parse tree for a program P is generated by a grammar L, each syntax rule such as n ::= E has multiple occurrences in the tree; in particular we can imagine the nonterminal n matching subtrees whose descendants are structured according to the right-hand side E of the syntax rule. The root of such a subtree is a node corresponding to the left-hand side n. Nonterminal and terminal symbols in E represent the direct descendants of that root. Linking together these symbols results in ow arrows among internal nodes of the derivation tree. In order to create normal ow graphs we have to move the ow arrows down to the leaf level. This move can be done in a simple but universal way. Every node can be associated with a subtree of the parse tree, namely the tree rooted at that node. The local control ow among the leaves of the subtree is de ned by the corresponding Montage. The rst and the last leaves of this local control ow are called initial and terminal leaves. Incoming control ow is always connected to the initial leaf and outgoing to the terminal leaf. The function Initial connects a node to its initial leaf and the function Terminal connects a node to its terminal leaf, which is shown in the following: L

L

P

P

P

Terminal T

Initial I . . . Initial

Terminal Terminal Initial

T0

Tn

T0’

6

Tn’

2.5.2 Control and Data Functions

For a given parse tree of a program, the initial and terminal leaves of a node n can be de ned graphically by means of Montage. For this node n, if there is a syntax rule associated with it, then the symbol to which the dotted arrow named I in the data and control ow diagram of its Montage points is called the I-symbol of this node n; the symbol from which the dotted arrow named T points is called the T-symbol of this node n. For the above while statement example, Expr is called the I-symbol of the node while and DO is called the T-symbol of the node while in a parse tree generated by that syntax rule. If the I-symbol of n is a leaf in the parse tree then this I-symbol is called the I-leaf of n; otherwise the I-leaf of n is the I-leaf of n's I-symbol. We can de ne a T-leaf for a node in the same way. Thus Expr is not the I-leaf of the node while but DO is the T-leaf of the node while. Then we can de ne the Initial and Terminal functions over every node as follows: they map a node to its I-leaf and T-leaf respectively. Based on the de nition of initial and terminal leaf, we can now de ne which leaves are linked by control and data arrows as follows: - A dotted control arrow links the terminal leaf of its source with the initial leaf of its target. - A solid data ow arrow links the terminal leaf of its source with the terminal leaf of its target. Using the above de nition we can derive a control and data ow graph of a program from its derivation tree. Here all the nodes appearing in the control and data ow graph are leaves(i.e. tokens) in the compact derivation tree. We call these nodes tasks where the dynamic semantics is given. The last part of a Montage is the dynamic semantics, which is given by means of ASM transition rules. A transition rule is a description of how the current state evolves. Given an initial state, the dierent states of an ASM are reached by iteratively triggering the transition rules in parallel until the ASM is in a state which cannot evolve anymore. The initial state of the dynamic semantics of the program is the result of the static analysis, which is the control and data ow graph. The transition rule characterizing the dynamic semantics is the union of all transition rules given in the lowest part of the Montages. In order to represent the program's run, we use a nullary function CT to point to the current task in the control and data ow graph. Generally speaking, in each step CT is set to the next task in the control and data ow graph. This can often be derived from the unary function NT which takes implicitly the current task CT as argument.

2.5.3 Lists

Montages also provide for the processing of lists with which most languages are concerned. If the right hand side of a syntax rule contains a symbol enclosed in f g, a list of descendants is generated. An additional node, called a list node, is generated as well. In order to access the elements and needed information about the list, Montages provide an attribute ListLength of 7

the list node which is set to the length of the generated list and a binary in x function [ ] : ListNode Integer ! Node which can be used to retrieve the elements of the list. Moreover, a function

Position : Node ! Nat returns the physical position of an element within a list. The initial leaf of the list node is the initial leaf of the rst element in the list, and the terminal leaf is the terminal leaf of the last element. For the control functions de ned in the list rule, the corresponding terminal leaf of each list element is connected to the initial leaf of its successor in the list by them. For data functions related to lists, there are two special cases. One case is an arrow de ned from all the elements within a list to an outside node. In this case, we can regard the function as a group of arrows which connect all the elements within the list to the outside node respectively. As an example, the following Montage gives the static semantics for a list of variable declarations in some programming language. The list is graphically represented by a box, which is labeled in its upper right corner with the keyword LIST. The data ow arrows StaticType speci es a group of data ow arrows, one from each variable to the node type, all labeled StaticType. In addition, all variable objects are linked sequentially by a control function NT .

VarDec ::= Var f"," Var g ":" Type I T

LIST S-Var

StaticType NT

S-Type

The other special case is a function de ned from an outside node to the elements within the list. In this case the data function is regarded as set of arrows which connect the outside node to all the elements within the list respectively by using an in x function. Let us look at another example. The following Montage gives the semantics for a function call in some programming language. The data function ActualParameters in the following Montage de nes a binary function ActualParameters : Node Integer ! Node which maps a call task to all the actual parameters of this task. In order to implement the type check between the corresponding parameters of the function call and the function de nition, we need to get the positions of all the parameters in the function call parameter's list. To get an actual parameter position in the list, we can use the projection function which is de ned as follows: ActualParametersPosition : Token ! Integer which gives the position number for every Expr referenced by the function ActualParameters in the list of Expr. The semantics of the data function ActualParameters is the following: 8x 2 list of Expr 8

CN.S-Call.ActualParameters(x.Position):=x.Terminal x.ActualParametersPosition:=x.Position; The projection function name of a data function, like ActualParametersPosition in the following Montage, is generated by the data function name (ActualParameters) followed by Position. The reason that we use the projection function is that the position of an item within a nesting of list boxes may be relative to the source of the arrow which points to it.

ProcCall ::= Call "(" [Expr f"," Exprg] ")" ActualParameters(.) LIST NT

I

S-Expr

S-Call

T

2.5.4 Some notation In order to make it easier to write our Montages, we use the following abbreviations. Vary statements can be used to iterate over a list of nodes. There are three kinds of vary statements which have the following forms: 1. vary over(n,L) R(n) endvary \L" is a universe of nodes, and \n" iterates over the elements of \L". The above rule is equivalent to the following ASM rule:

do forall n : L(n) R(n) enddo 2. vary over ind(i,L) R(i) endvary \L" is a list node, and \i" iterates over the indices of \L". The elements of \L" can be accessed using the notation \L[i]". The above rule is equivalent to the following ASM rule:

do forall i : 1 i ListLength(L) R(i) enddo 3. vary over ind(i,I) R(i) endvary 9

\I" is an integer. This rule is similar to the previous one and it is equivalent to the following ASM rule:

do forall i : 1 i I R(i) enddo 4. vary over child(i,n) R(i) endvary \i" and \n" are both nodes. Before we give the equivalent ASM rule, we de ne a new static function isChild : Node Node ! Boolean. A function isChild(a; b) = true if and only if: a is a direct descendant of b, or a is a direct descendant of some node s and isChild(s ; b) = true. We call a an indirect descendant of b. Then the above rule is equivalent to the following ASM rule: do forall i : isChild(i; n) R(i) 0

0

enddo

3 Semantics of C using Montages

3.1 Some Basic Functions

Before giving the semantics of C, we introduce the following basic functions and notations. The constructs of C can be divided into statements, expressions, and types. In the Montages approach, tasks are those tokens which are chosen to execute the dynamic semantics of the construct. The universe Token is partitioned into (sub)universes StatementTask, ExpressionTask and TypeTask. The universe StatementTask consists of all the tokens which are chosen to execute the dynamic semantics for the statement. The universe ExpressionTask consists of all the tokens for expressions and TypeTask consists of all the tokens for types. CV alue represents the universe of data values that can be represented in a particular C program. A static function NodeV alue : Token ! CV alue indicates the value of a node. In addition we de ne a function Name : Token ! String to indicate the name for the node.

3.2 if-then-else Statement

The if statement includes two forms, i.e. the if-then statement and the if-then-else statement. To execute an if statement, begin by evaluating the guard expression. If the resulting 10

value is nonzero, execute the statements following the guard expression. Otherwise, execute the statements following the else (if any). The guard type must be of arithmetic or pointer type. We use the two functions isPointer:TypeTask ! Boolean and isArithmeticType:TypeTask! Boolean in the condition of the Montage to implement the type restriction. isArithmeticType(s) is used to indicate whether a token s belongs to the universe ArithmeticTask. The de nition of ArithmeticTask will be shown in the type section. Similarly, isPointer(s) is also used to indicate whether the token is a pointer or not. Because there are two nonterminals named statement in this syntax rule, we use \S1-" and \S2-" followed by the nonterminal's name to distinguish them in the data and control ow diagram. That is, \S1-statement" selects the rst statement and \S2-statement" selects the second statement in this syntax rule. In the data and control ow diagram we set the control

ow functions TrueTask and FalseTask : StatementTask ! StatementTask for the token if to the two dierent statements respectively. We set the data ow function Value:Token! ExpressionTask for the token if to the expression exp. The dynamic semantics of this Montage is given by the rule of the token if and the token \)" respectively. The rule for the token if shows that if the value of the expression exp (using the data ow function Value) is true, then the statement before the else is performed; otherwise, the statement after the else is performed (using the control ow functions TrueTask and FalseTask respectively). The rule for the token \)" shows that after executing the if statement, control is passed to the next task. We can regard the if-then statement as a special case of the if-then-else statement, where the statements following the else are omitted. In this case the function FalseTask of the token if is directly connected to the token \)". The reason for doing this is that there is a unique terminal function in every Montage and control can go to this token after it executes the dierent statements. The Montage for if-then-else statements is shown below.

11

if else statement ::= "if" "(" exp ")" statement "else" statement Value

I

S-exp

NT

TrueTask

S1-statement

S-")"

S-"if" FalseTask

NT

S2-statement

T

NT

condition isArithmeticType(CN.S-exp.StaticType) or isPointer(CN.S-exp.StaticType)

@"if": if(CT.Value.NodeValue6=0) then CT := CT.TrueTask; else CT := CT.FalseTask; endif end@ @")": CT := CT.NT end@

Figure 1. Semantics of if-then-else statement.

3.2.1 While Statement The while statement, the do statement and the for statement are three kinds of iteration statements in C. In the while statement the substatements in it are executed repeatedly so long as the value of the guard expression remains unequal to 0; the guard expression in the while statement must be of arithmetic or pointer type. This restriction is given in the condition part of its Montage. The semantics of the while statement becomes complicated when there is a jump statement within it. For example, if a break statement appears in an iterative statement or a switch statement, its execution terminates the execution of the smallest enclosing such statement; control passes to the statement following the terminated statement. A continue statement may appear only within an iteration statement. It causes control to pass to the loop-continuation portion of the smallest enclosing such statement. In order to implement the semantics of the while statement, we set the data and control

ow functions as shown below. In addition, in order to deal with jump statements, we de ne some additional static semantics in the static analysis part. A new static function isclosest : Node String Node ! Boolean is used to indicate whether the rst parameter node in this 12

function has the same label as the second parameter, and there is no other node on all the paths from the third parameter node to the rst parameter node whose name is while, for or do. By using the function isclosest, the static analysis sets control for the closest break or continue statements inside the loop to the token \)" or the guard exp respectively. The dynamic semantics of the while statement is given by the rules for the tokens \(" and \)". Here the rule for the token \(", which is the key part of the dynamic semantics of the while statement, shows that if the value of the expression exp is non-zero then statement in this while statement is performed, otherwise control is passed to the token \)" so that it will execute the statement following the while statement. The Montage for while statements is shown below.

while statement ::= "while" "(" exp ")" statement I

S-exp NT

Value NT

S-"("

FalseTask

TrueTask

S-statement

S-")" T

do forall s: IsChild(s,CN.S-statement) if s.isclosest("break",CN.S-statement) then s.NT := S-")"; elseif s.isclosest("continue",CN.S-statement) then s.NT := S-"exp"; endif enddo condition isArithmeticType(CN.S-exp.StaticType) or isPointer(CN.S-exp.StaticType) @"(": if(CT.Value.NodeValue6=0) then CT := CT.TrueTask; else CT := CT.FalseTask; endif end@ @")": CT := CT.NT; end@

Figure 2. Semantics of while Statement.

13

3.3 Do Statement

The do statement is quite similar to the while statement. The dierence is that we need to execute the substatement in the do statement rst and then evaluate the expression's value. So in the data and control ow diagram we set the initial function I to s tatement rather than e xp. All the other parts of the Montage are the same as those of the while statement. The Montage for do statements is shown below.

do statement ::= "do" statement "while" "(" exp ")" S-exp NT

Value NT

S-"("

S-statement

FalseTask

TrueTask

S-")"

T

I

vary over child(s,CN.S-statement) if s.isclosest("break",CN.S-statement) s.NT := S-")"; else if s.isclosest("continue",CN.S-statement) s.NT := S-"exp"; endif endif endvary

condition isArithmeticType(CN.S-exp.StaticType) or isPointer(CN.S-exp.StaticType)

@"(": if(CT.Value.NodeValue6=0) then CT := CT.TrueTask; else CT := CT.FalseTask; endif end@ @")": CT := CT.NT; end@

Figure 3. Semantics of the Do Statement.

3.4 For Statement

The for statement is slightly dierent from the other two iteration statements. Its syntax is shown in the rst part of its Montage in the following. In the for statement, the rst expression is evaluated once, and thus speci es initialization for the loop. The second expression is evaluated 14

before each iteration, and if it becomes equal to 0, the for statement is terminated. The third expression is evaluated after each iteration, and thus speci es a re-initialization for the loop. The second expression's type must be arithmetic type or pointer type, which is shown in the condition part of its Montage. We deal with jump statements during the static analysis phase as with the while statement. The dynamic semantics of the for statement is given by the rules for the tokens for, \(" and \)" respectively. The rule for the token \(" is the key part for this rule. It rst checks the value of the second expression in the for statement and if the value is non-zero then the statement in the for statement will be executed, otherwise control will go to the token \)" so that it will execute the statement following this for statement. The Montage for for statements is shown below.

for statement ::= "for" "(" exp ";" exp ";" exp ")" statement S1-exp S3-exp

I

NT

S2-exp NT

S3-exp

Value FalseTask S-"(" S-")" NT TrueTask NT

S-statement

vary over child(s,CN.S-statement) if s.isclosest("break",CN.S-statement) s.NT := S-")"; else if s.isclosest("continue",CN.S-statement) s.NT := S3-"exp"; endif endif endvary condition isArithmeticType(CN.S2-exp.StaticType) or isPointer(CN.S2-exp.StaticType) @"(": if(CT.Value.NodeValue6=0) then CT := CT.TrueTask; else CT := CT.FalseTask; endif end@ @")": CT := CT.NT; end@

Figure 4. Semantics of the for Statement. 15

T

3.5 Label Statement

A label statement consists of a label followed by a statement. The only use of an identi er label is as a target of a goto statement. In order to consider unconditional transfer of goto statements, we de ne a new function LabelTable : String ! Token, which is used to indicate the speci c token for every label appearing in label statements. When a label statement is met, we rst need to check whether the same name of the label statement has been de ned before. If there is no such statement, the static analysis sets the token \:" to the LabelTable for this given identi er label, shown in the following Montage. Otherwise an error is reported. The dynamic semantics passes control to the statement following this label statement. The Montage for label statements is shown below.

label statement ::= Ident ":" statement S-Ident I

S-":"

NT

S-statement

T

LabelTable(CN.S-Ident.Name):=CN.S-":";

condition LabelTable(CN.S-Ident.Name)=undef @":": CT:=CT.NT; end@

Figure 5. Semantics of the Label Statement.

3.6 Jump Statement

Jump statements transfer control unconditionally. There are three kinds of jump statement which we consider here: break, continue and goto. (The return statement will be considered in the section on C functions.) A continue statement may appear only within an iteration statement. It causes control to pass to the loop-continuation portion of the smallest enclosing iteration statement. A break statement may appear in an iteration statement or a switch statement, and terminates execution of the smallest enclosing such statement; control passes to the statement following the terminated statement. A goto statement may appear in any part of the program. It causes control to go to the corresponding label statement. First let us discuss how to implement the continue statement; we deal with the break 16

statement in an identical way. In the static analysis part of the continue statement, we set the initial function for the statement and omit the terminal function. Because at this time we don't know where control should go, we postpone setting the terminal function until we have met the enclosing iterative statement containing this continue statement, as shown previously. At that point, the rules for that iteration statement will set the NT function for this token. The dynamic semantics of the continue statement sets control to goto the appropriate token de ned in the static analysis of the iteration statement. The Montages for continue and break statements are shown below.

cont statement ::= "continue" ";" I

S-"continue"

@"continue": CT := CT.NT; end@

Figure 6. Semantics of the Continue Statement.

break statement ::= "break" ";" I

S-"break"

@"break": CT := CT.NT; end@

Figure 7. Semantics of the Break Statement. The Montage for a goto statement is shown in the following. Here we de ne two new functions GotoNT : StatementTask ! StatementTask and GotoTable : String ! Token. The rst function is used to indicate where control goes when a goto statement is met. The second function is used to keep the goto label so that it can be used to nd the corresponding label statement. The static analysis sets the token \goto" to the entry in GotoTable whose index name is given by Ident. When static analysis of a C-function ends, we will compare GotoTable 17

with LabelTable which is used to indicate the token for the label in the label statement. If all the names in GotoTable are in the LabelTable, the static analysis sets GotoNT for the token in GotoTable to the corresponding token in LabelTable. This comparison is done when a syntax rule of a C-function has been reduced. This appears in the Montage for C-functions. The dynamic semantics of the goto statement passes control to the appropriate label statement, as de ned in the static analysis discussed above. The Montage for goto statements are shown below.

goto statement ::= "goto" Ident S-"goto"

I

GotoTable(CN.S-Ident) := CN.S-"goto"; @"goto": CT:=CT.GotoNT; end@

Figure 8. Semantics of the goto Statement.

3.6.1 Switch Statement

The switch statement causes control to be transferred to one of several statements depending on the value of an expression, which must have integral type. The C primitive types char and int are called integral type. We de ne a new function isIntegralType : TypeTask ! Boolean to check whether a (type) token is an integral type. Any statement within the substatement may be labeled with one or more case labels. There may also be at most one default label associated with a switch. When the switch statement is executed, its expression is evaluated, and compared with each case constant. If one of the case constants is equal to the value of the expression, control passes to the statement of the matched case label. If no case constant matches the expression, and if there is a default statement label, control passes to the labeled statement. If no case matches, and if there is no default, then none of the substatements of the switch statement is executed. In the static analysis part of the Montage for the switch statement, we set the data and control ow functions shown in the following. In order to deal with a break statement which may appear within it, we give the same rules as we did in the other iteration statements. In addition, we use the function SwitchTable to keep the token for the constant expression in the enclosed case statements. In order to get the value of the constant expression, we de ne the function LexicalV alue : Token ! CV alue to denote the value of constant expressions. This 18

function is part of the initial state and is generated during the lexical analysis of the input program. SwitchTable is used to nd the appropriate case statement to which control will be transferred during the dynamic semantics computation. The dynamic semantics rst nds whether there is a case statement for the value of the current expression. If there is one, then control goes to that case statement, otherwise control goes to the statement following the keyword \default". The statement following \default" is optional. If there is no such statement, then DefaultTask function points to the token \)". The Montage for switch statements are shown below.

switch statement ::= "switch" "(" exp ") Case f Case g [ "default" ":" statement ] I

S-exp S-Case

Value NT

LIST

NT

S-")"

S-"switch" DefaultTask

S-statement

T

NT

vary over child(s,CN.S-statement) if s.isclosest("break",CN.S-statement) then s.NT := S-")"; endif endvary vary over(s,CN.S-Case) SwitchTable(CN.S-"switch",s.S-exp.LexicalValue):= s.S-":"; endvary condition isIntegralType(CN.S-exp.StaticType) @"switch": if(CT.SwitchTable(CT.Value.NodeValue)6=undef) then CT:=CT.SwitchTable(CT.Value.NodeValue); else CT:=CT.DefaultTask; endif end@ @")": CT := CT.NT; end@ Figure 9. Semantics of Switch Statement.

Because we set SwitchTable in the Montage for the 19

switch

statement, the Montage for

statements become quite simple. The condition of the Montage for the case statement guarantees that the expression in case statement is a constant. The dynamic semantics is given by rules for the token \:" which passes control to the following token. The Montage for case statements is shown below.

case

Case ::= "case" exp ":" statement I

S-":"

NT

S-statement

S-exp

condition isConst(CN.S-exp) @":": CT:=CT.NT; end@

Figure 10. Semantics of Case Statement.

3.6.2 Statement List The last we discuss is the list of statements. Except for jump statements, statements are executed in sequence. The Montage for statement list are shown below.

ListStatement ::= statement f";" statement g LIST I

S-statement

T

Figure 10. Semantics of statement list.

4 Expressions All expressions in C are statically typed. The function StaticType : Token ! TypeTask is used to denote the (static) type of an expression throughout this section. An expression is said 20

to be an lvalue if this expression refers to an object which is a named region of memory. An obvious example of an lvalue expression is an identi er with a suitable type and storage class. A function lvalue : Token ! Boolean is used to indicate whether or not a node represents an lvalue expression.

4.1 Some New Functions and a Macro

C is not a strongly-typed language and so one may (re)cast types. For example, one may cast a pointer to a structure into a pointer to an array of characters. Thus, one can access the individual bytes of most values which might exist during the execution of the program. The universe address comprises the positive integers, which is used to model memory. In order to access the memory address corresponding to a variable, we de ne a new partial function Memory 1 : Token ! address which is used to set up the relation between a variable and its location in memory. A static function length : TypeTask ! Integer indicates how many bytes are used by a particular value type in memory. A dynamic function MemoryByteV alue : address ! bytes indicates the values stored in memory at a given address. Since most values of interest are larger than one byte, we need a means for storing members of CValue as individual bytes. A static partial function ByteToResult: TypeTaskbyten ! CValue converts the memory representation of a value of the speci ed basic type into its corresponding value in the CValue universe. Here n is the maximum number of bytes used by the memory representation of any particular basic type(and is implementation-dependent). For types whose memory representations are less than n bytes in length, we ignore any unused parameters. A static partial function ResultToByte : CV alue Integer TypeTask ! byte yields the speci ed byte of the memory representation of the speci ed value from the speci ed universe. This function is the inverse of ByteToResult. In order to get the value of the speci ed type being stored in memory starting at the indicated address easily, we de ne an abbreviation MemoryV alue : address TypeTask ! CV alue. MemoryV alue(addr; type) abbreviates ByteToResult(type,MemoryByteValue(addr), MemoryByteValue(addr+1),: : :; MemoryByteValue(addr+length(type)-1)). Because we use the byte to model memory, the rules for assignment to memory are a little complicated because a given assignment may require an arbitrary large number of updates to the MemoryByteV alue function. We use do-forall rules which perform a those arbitrary large number of updates in a systematic fashion. The transition rule for copying to memory is shown in the following: DoAssign(address,value,type) do forall i: 0 i length(type) ? 1 1

We use this name for a dierent purpose than that in [7].

21

MemoryByteValue(address+i):= ResultToByte(value, i, type); enddo

4.1.1 Comma Operator A comma expression has the following form, comma-expression ! expr1, expr2

where expr1 and expr2 are expressions. To evaluate a comma expression, evaluate expr1 and expr2 , left to right, returning the value of expr2 as the value of the whole expression. In the static analysis part of the comma expression, we set the data and control ow functions as shown in the following. The type of the result is the one of the right operand and the resulting expression is not an lvalue expression. In the dynamic semantics of the comma expression, we set the value of the second expression to the value of the token \,", which represents the value for the comma expression.

comma expr ::= expr1 "," expr2 I

S-expr1

NT

OnlyValue

S-expr2

NT

S-","

CN.S-",".StaticType := CN.S-expr2.StaticType CN.S-",".lvalue := false; @",": CT.NodeValue := CT.OnlyValue.NodeValue; CT := CT.NT; end@

Figure 11. Semantics of Comma Expression.

4.1.2 Conditional Expressions A conditional expression has the following form: conditional-expression ! expr1 ? expr2 : expr3

22

T

To evaluate a conditional expression, evaluate expr1. If the resulting value is non-zero, evaluate expr2 and set its value to the value of the whole conditional expression, otherwise evaluate expr3 and set its value to the value of the whole conditional expression. We represent the semantics for the conditional expression in a manner similar to that of conditional statements, as shown previously. In the static analysis part of the conditional expression, we set the function CondV alue : ExpressionTask ! ExpressionTask which is used to allow the token \:" to get the value of the token \?". The type of the resulting conditional expression depends on the types of the second and third operands' types. If the second and the third operands are arithmetic, then the type of the result is a common type to them. If both are structures or unions of the same type, or pointers to variables of the same type, the result has the common type. If one is a pointer and the other is the constant 0, the result is the pointer type. The function CommonType : TypeTask TypeTask ! TypeTask is used to implement this type conversion. The dynamic semantics of the conditional expression is given by the rules for the tokens \?" and \:". The rule for the token \?" decides which expression will be evaluated. The rule for the token \:" gets and keeps the nal result for this conditional expression.

23

condition expr ::= expr1 "?" expr2 ":" expr3 TrueTask

Value

S-expr1

NT

TrueValue NT

CondValue

S-"?"

I

S-expr2

FalseTask

S-expr3

S-":"

T

FalseValue NT

CN.S-"?".StaticType := CommonType(CN.S-expr2.StaticType,CN.S-expr3.StaticType); CN.S-"?".lvalue := false;

condition CommonType(CN.S-expr2.StaticType,CN.S-expr3.StaticType)6=undef @"?": if(CT.Value.NodeValue 6= 0) then CT := CT.TrueTask; else CT := CT.FalseTask; endif end@ @":": if(CT.CondValue.Value.NodeValue6=0) then CT.NodeValue := CT.TrueValue.NodeValue; else CT.NodeValue := CT.FalseValue.NodeValue; endif CT:=CT.NT; end@

Figure 12. Semantics of Condition Expression.

4.2 Logical OR Expression

A logical OR expression has the following form: logical-or-expression ! expr1 k expr2

where expr1 and expr2 are expressions. To evaluate a logical OR expression, start by evaluating 24

expr1 . If the result is non-zero, the value of the logical OR expression is 1 and expr2 is not evaluated. Otherwise, the value of the logical OR expression is the value of expr2, with non-zero

values coerced to 1. The operands need not have the same type, but each must have arithmetic type or be a pointer. The result type is int. The condition of the following Montage guarantees the operands' type restriction. In the static analysis of the logical OR expression, we set the function NT of expr1 to the token \k" and de ne a new function RightInitial : ExpressionTask ! ExpressionTask for the token \k". These two functions help to implement the order of evaluation for the logical OR expression. The resulting expression is not an lvalue expression. Because the result type is int, the static analysis sets the result static type to INT, which represents the token whose name is \int". The dynamic semantics of the logical OR expression is given by the rule for the token \k". Because the function NT for expr1 and expr2 are both set to the token \k", the rule for the token \k" rst checks the value of expr1. If the value of expr1 is non-zero, the dynamics semantics sets the value of expr1 to the value of the token \k". Otherwise, dynamics semantics checks whether expr2 has been evaluated by using the function RightV isited : ExpressionTask ! Boolean. If expr2 has been evaluated, then dynamics semantics sets the value of \k" to the value of the token expr2. Otherwise, dynamics semantics evaluates expr2.

25

logic or exp ::= expr1 "jj" expr2 Left

I

S-expr1

Right NT

S-expr2

S-"k"

T

RightInitial

NT

CN.S-" j j".StaticType := INT; CN.S-" j j".lvalue := false;

condition (isArithmeticType(CN.S-expr1.StaticType) or

isPointer(CN.S-expr1.StaticType)) and (isArithmeticType(CN.S-expr2.StaticType) or isPointer(CN.S-expr2.StaticType))

@" j j": if (CT.Left.NodeValue 6= 0) then CT.NodeValue := 1; CT:=CT.NT; else if (CT.RightVisited=true) then if(CT.Right.NodeValue6=0) then CT.NodeValue := 1; else CT.NodeValue:=0 endif; CT.RightVisited := false; CT := CT.NT; else CT.RightVisited := true CT := CT.RightInitial; endif endif end@

Figure 13. Semantics of OR expression. For AND expressions, we can give a Montage in the same way as for OR expressions(with the obvious change in value) and we omit it here.

4.3 Assignment Expression

A simple assignment has the following form: assignment-expression ! expr1 = expr2

where expr1 and expr2 are expressions. To evaluate a simple assignment expression, copy 26

the value of epxr2 into the memory location given by expr1 , returning that value as the value of the whole assignment expression. Here we de ne an external function ChooseTask : ExpressionTask ExpressionTask ! ExpressionTask to indicate which one between expr1 and expr2 is executed rst2 In C, most of the expressions do not give a xed order for evaluating its subexpressions. So we use ChooseTask to choose the appropriate subexpression to be executed rst. The type of an assignment expression is that of its left operand and the resulting expression is no longer an lvalue expression. The static analysis decides which expression is executed rst. In addition, we de ne a new function LeftV isited : Token ! Integer to indicate whether the left subexpression of an expression has been executed or not. Because C is not a strongly typed language, the two operand types may not be the same; however the right-hand-side operand type should be assignable to the left-hand-side operand type. More details about the abbreviation a AssignableTo b will be introduced in the types section. In order to access dierent subexpressions in this whole assignment expression, we de ne some new functions LeftTask; RightTask; LeftV alue; RightV alue : ExpressionTask ! ExpressionTask. The dynamic semantics of the Montage executes the rst expression which is decided during the static analysis, then executes the other expression and last assigns the value of expr2 into the memory location given by expr1. 2 Here we assume that the order of the subexpressions is decided at compile time. In addition, the evaluation of subexpressions may not be interleaved. See section 8.

27

eq assign expr ::= expr1 "=" expr2 I

RightValue LeftValue S-"=" LeftTask RightTask

S-expr1

S-expr2 T CN.S-"=".StaticType := CN.S-expr1.StaticType; CN.S-"=".lvalue := false; CN.S-"=".FirstTask:= ChooseTask(CN.S-expr1,CN.S-expr2); CN.S-"=".LeftVisited := false; CN.S-"=".RightVisited := false;

condition CN.S-expr1.lvalue=true and CN.S-expr2.StaticType expr1.StaticType

AssignableTo

CN.S-

@"=": if(CT.LeftVisited=false and CT.RightVisited=false) then CT:=CT.FirstTask; else if(CT.LeftVisited=true AND CT.RightVisited=true) then DoAssign(CT.LeftValue.Memory, CT.RightValue.NodeValue,CT.LeftValue.StaticType); CT.NodeValue:=CT.Right.NodeValue; CT.LeftVisited:=false; CT.RightVisited:=false; CT:=CT.NT; else if(CT.LeftVisited=true) then CT.RightVisited:=true; CT:=CT.RightTask; else if(CT.RightVisited=true) CT.LeftVisited:=true; CT:=CT.LeftTask; endif; endif endif endif end@

Figure 14. Semantics of the assignment expression. Within C, there are other assignment operators(\+=", \*=", etc.) which perform a mathematical operation on the value of expr2 and the value stored in the memory location given by expr1 . The result is copied into the memory location given by expr1. For example, \i*=2" has the same value and eect as \i=i * 2". However, it is not true in general that \a op= b" can be seen as an abbreviation for \a=a op b", since the expression a is evaluated only once in the 28

former expression, but twice in the latter. Except for the additive and subtractive assignment operators, all the other assignment operators have Montages like that shown in the following for the multiplicative assignment operator(\*=").

multi assign expr ::= expr1 "*=" expr2 I

Left

LeftTask NT S-expr1

S-"*="

Right RightTask NT S-expr2

T CN.S-"*=".StaticType := CN.S-expr1.StaticType; CN.S-"*=".lvalue := false; CN.S-"*=".FirstTask:= ChooseTask(CN.S-expr1,CN.S-expr2);

condition CN.S-expr1.lvalue=true and

CN.S-expr2.StaticType AssignableTo CN.S-expr1.StaticType

@"=": f(CT.LeftVisited=False and CT.RightVisited=False) then CT:=CT.FirstTask; else if(CT.LeftVisited=True and CT.RightVisited=True) then DoAssign(CT.Left.Memory, CT.Right.NodeValue*MemoryValue(CT.Left.Memory), CT.Left.StaticType); CT.NodeValue := CT.Right.NodeValue*MemoryValue(CT.Left.Memory); CT.LeftVisited:=False; CT.RightVisited:=False; else if(CT.LeftVisited=True) then CT.RightVisited:=True; CT:=CT.RightTask; else if(CT.RightVisited=True) CT.LeftVisited:=True; CT:=CT.LeftTask; endif; endif; endif endif end@

Figure 15. Semantics of the multiplicative assignment. The additive and subtractive assignment operators are two exceptions to this general format. 29

In C, one may add an integer i to a pointer expression p, with the result being a pointer which is i units forward in memory from p. Thus, to process `a += b", we must perform dierent actions if a is a pointer variable. In the static analysis part of the additive assignment, we need to check whether the left operand is a pointer or not. If it is a pointer then we record the size of the type to which the left operand points. Otherwise we do nothing. The dynamic semantics of the additive assignment is given by the rule for \+=". The rule for \+=" rst checks whether the left operand of the assignment expression is a pointer. Then it will execute the dierent actions accordingly.

30

add assign expr ::= expr1 "+=" expr2 I

LeftValue LeftTask

S-expr1

RightValue

S-"+=" T

RightTask NT

S-expr2

CN.S-"+=".StaticType := CN.S-expr1.StaticType; CN.S-"+=".lvalue := false; if(IsPointer(CN.S-expr1.StaticType)) then CN.oset:=CN.S-expr1.StaticType.PointedType.length; endif CN.S-"+=".FirstTask:= ChooseTask(CN.S-expr1,CN.S-expr2); condition CN.S-expr1.lvalue=true and CN.S-expr2.StaticType AssignableTo CN.S-expr1.StaticType) @"=": if(CT.LeftVisited=False and CT.RightVisited=False) then CT:=CT.FirstTask; else if(CT.LeftVisited=True and CT.RightVisited=True) then if(IsPointer(CT.LeftValue.StaticType)6=true) then DoAssign(CT.LeftValue.Memory, CT.RightValue.NodeValue+MemoryValue(CT.LeftValue.Memory), CT.ValueLeft.StaticType); CT.NodeValue:=MemoryValue(CT.LeftValue.Memory)+CT.RightValue.NodeValue; else DoAssign(CT.LeftValue.Memory, MemoryValue(CT.LeftValue.Memory)+ CT.RightValue.NodeValue*CT.oset,CT.LeftValue.StaticType); CT.NodeValue:=MemoryValue(CT.LeftValue.Memory)+CT.RightNode*CT.oset; endif CT.LeftVisited:=False; CT.RightVisited:=False; CT:=CT.NT; else if(CT.LeftVisited=True) then CT.RightVisited:=True; CT:=CT.RightTask; else if(CT.RightVisited=True) CT.LeftVisited:=True; CT:=CT.LeftTask; endif endif endif endif end@

Figure 16. Semantics of the additive assignment. 31

For the subtractive assignment operator, we can give its Montage in the same way and we omit it here.

4.4 The sizeof Operator

An expression having the sizeof operator has one of the following forms: sizeof-expression ! sizeof expression sizeof-expression ! sizeof (type-name)

For the rst case the data and control functions are shown in the following Montage. The dynamic semantics gets the value by using the function V alue to get the expression's type. For the second case we can directly get the value from the type name; the corresponding Montage is thus similar and we omit it.

sizeof exp ::= "sizeof" expr S-exp

I

S-"sizeof"

T

Value

CN.S-"sizeof".StaticType := CN.S-exp.StaticType @"sizeof": CT.NodeValue:= CT.Value.StaticType.length; CT:=CT.NT; end@

Figure 17. Semantics of sizeof operator.

4.5 General Mathematical Expressions

There are many mathematical expressions in C involving binary operators (\*",\+",\?", etc.) whose behaviors are similar. (We treat the bit-wise operators (e.g.,|,&) as ordinary mathematical operators.) To evaluate one of these expressions, evaluate both operand expressions and apply the appropriate function. We present the Montage for multiplication in the following as a representative of this category of expressions. Another important thing related to mathematical expressions is the usual arithmetic conversions whose eect is to bring operands into a common type. To implement this conversion, we de ne a new function TypeConvert : TypeTask TypeTask ! TypeTask, whose de nition 32

is based on the following rules:

if either operand is long double, the resulting type is long double. otherwise, if either operand is double, the resulting type is double. otherwise, if either operand is float, the resulting type is float. otherwise, the integral promotions are performed on both operands; then, if either operand is unsigned long int, the resulting type is unsigned long int. otherwise, if one operand is long int and the other is unsigned int, the eect depends on whether a long int can represent all values of an unsigned int; if so, the unsigned int operand is converted to long int; if not, both are unsigned long int. otherwise, if one operand is longit, the resulting type is long int. otherwise, if either operand is unsigned int, the resulting type is unsigned int. otherwise, convert char and short to int and the resulting type is int.

33

time multi expr ::= expr1 "*" expr2 I LeftTask

RightValue RightTask

S-"*"

LeftValue

NT

S-expr1

T

NT

S-expr2

CN.S-"*".StaticType := TypeConvert(CN.S-expr1.StaticType,CN.S-expr2.StaticType); CN.S-"*".lvalue := false; CN.S-"*".FirstTask:= ChooseTask(CN.S-expr1,CN.S-expr2); condition IsArithmeticType(CN.S-expr1) and IsArithmeticType(CN.S-expr2) @"=": if(CT.LeftVisited=False and CT.RightVisited=False) then CT:=CT.FirstTask; else if(CT.LeftVisited=True and CT.RightVisited=True) then CT.NodeValue := CT.RightValue.NodeValue*CT.LeftValue.NodeValue; CT.LeftVisited:=False; CT.RightVisited:=False; CT:=CT.NT; else if(CT.LeftVisited=True) then CT.RightVisited:=True; CT:=CT.RightTask; else if(CT.RightVisited=True) CT.LeftVisited:=True; CT:=CT.LeftTask; endif endif; endif endif end@

Figure 18. Semantics of the multiplication. Montages for addition and subtraction operators are a little complicated, since we must consider the case when one of the operands is a pointer. The way we deal with this case is similar to additive assignment. One may add an integer i to a pointer variable p with the result being a pointer which is i units forward in memory from p. According to the semantics of the additive expression, the types of the both operands can 34

be of arithmetic type or pointer type; but they can not be both pointer types. So we give this constraint in the condition part of the Montage for additive expressions, shown in the following. In the static analysis part of the Montage, we de ne two new functions Lindirect; Rindirect : ExpressionTask ! Boolean to indicate whether the left and right operands of an operator are pointers respectively. They are initially set to be false. If they are pointers then the corresponding functions are set to be true, and at the same time we compute the oset for the type to which the pointer points. The dynamic semantics of the additive expression is given by the rule for the token \+", which is shown in the following.

35

add expr ::= expr1 "+" expr2 S-expr1

LeftValue LeftTask NT

I S-"+" T

RightValue RightTask NT

S-expr2

CN.S-"+".StaticType:= TypeConvert(CN.S-expr1.StaticType,CN.S-expr2.StaticType); CN.S-"+".lvalue := false; if(IsPointer(CN.S-expr1.StaticType)) then CN."+".LIndirect:=true; CN.oset:=CN.S-expr1.StaticType.PointedType.length; else if(IsPointer(CN.S-expr2.StaticType)) then CN."+".RIndirect:=true; CN.oset:=CN.S-expr2.StaticType.PointedType.length; endif endif CN.S-"+".LeftVisited:=false; CN.S-"+".RightVisited:=false; CN.S-"+".FirstTask:= ChooseTask(CN.S-expr1,CN.S-expr2);

condition (IsArithmeticType(CN.S-expr1.StaticType) or

IsPointer(CN.S-expr1.StaticType)) AND(IsArithmeticType(CN.S-expr2.StaticType) or IsPointer(CN.S-expr2.StaticType)) AND (not(IsPointer(CN.S-expr1.StaticType) and IsPointer(CN.S-expr2.StaticType)))

@"+": if(CT.LeftVisited=false and CT.RightVisited=false) then CT:=CT.FirstTask; else if(CT.LeftVisited=true and CT.RightVisited=true) then if(CT.LIndirect=true) then CT.NodeValue:= CT.LeftValue.NodeValue+CT.RightValue.NodeValue*CT.oset; else if(CT.RIndirect=true) then CT.NodeValue:=CT.LeftValue.NodeValue*CT.oset +CT.RightValue.NodeValue; else CT.NodeValue:=CT.LeftValue.NodeValue +CT.RightValue.NodeValue; endif CT.LeftVisited:=false; CT.RightVisited:=false; CT:=CT.NT; endif else if(CT.LeftVisited=true) then CT.RightVisited:=true; CT:=CT.RightTask; else if(CT.RightVisited=true) CT.LeftVisited:=true; CT:=CT.LeftTask; endif endif; endif; endif 36 end@

Figure 19. Semantics of the addition. For subtractive expressions, we can give the Montage in the same way as for additive expressions. However, both of the two subexpressions' types can be of pointer type. The result is that the dierence between the two pointers is divided by the size of the type to which they point. The following is the Montage for subtractive expressions.

37

substract expr ::= expr1 "-" expr2 I

LeftValue

S-expr1

LeftTask NT

S-"-" T

RightValue RightTask NT

S-expr2

CN.S-"-".StaticType:=TypeConvert(CN.S-expr1.StaticType,CN.S-expr2.StaticType); CN.S-"-".lvalue := false; if(IsPointer(CN.S-expr1.StaticType)) then CN."-".LIndirect:=true; CN.oset:=CN.S-expr1.StaticType.PointedType.length; endif if(IsPointer(CN.S-expr2.StaticType)) then CN."-".RIndirect:=true; CN.oset:=CN.S-expr2.StaticType.PointedType.length; endif CN.S-"-".FirstTask:= ChooseTask(CN.S-expr1,CN.S-expr2); CN.S-"-".LeftVisited:=0; CN.S-"-".RightVisited:=0; condition (IsArithmeticType(CN.S-expr1.StaticType) and IsArithmeticType(CN.S-expr2.StaticType)) OR ((IsPointer(CN.S-expr1.StaticType) and IsArithmeticType(CN.S-expr2.StaticType)) OR (IsPointer(CN.S-expr1.StaticType) and IsPointer(CN.S-expr2.StaticType) and CN.S-expr1.PointedType.StaticType=CN.S-expr2.PointedType.StaticType)) @"-": if(CT.LeftVisited=False and CT.RightVisited=False) then CT:=CT.FirstTask; else if(CT.LeftVisited=True and CT.RightVisited=True) then if(CT.LIndirect=true AND CT.RIndirect=true) then CT.NodeValue := (CT.LeftValue.NodeValue-CT.RightValue.NodeValue)/CT.LeftValue.PointedType.length; else if(CT.LIndirect=true) then CT.NodeValue := CT.LeftValue.NodeValueCT.RightValue.NodeValue*CT.oset; else CT.NodeValue:=CT.LeftValue.NodeValue-CT.RightValue.NodeValue; endif CT.LeftVisited:=False; CT.RightVisited:=False; CN:=CN.NT; endif else if(CT.LeftVisited=True) then CT.RightVisited:=True; CT:=CT.RightTask; else if(CT.RightVisited=True) then CT.LeftVisited:=True; CT:=CT.LeftTask endif endif endif endif end@

38

Figure 20. Semantics of the subtraction operator.

4.6 Mathematical Unary Operators

Unary operator expressions have one of the following forms: unary-expression ! + expression unary-expression ! ? expression unary-expression ! ~ expression unary-expression ! ! expression Evaluating these expressions takes a form similar to that for binary mathematical operators and all the unary operators' computations are similar. Here we present the Montage for the minus operator. The operand of the unary minus operator must have arithmetic type, and the result is the negative value of the operand. So we present the Montage for the negation operator in the following.

minus exp ::= "-" exp I

S-exp

NT

S-"-"

T

Value

CN.S-"-".StaticType := CN.S-exp.StaticType

condition isArithmeticType(CN.S-exp.StaticType) @"+": CT.NodeValue:=-CT.Value.NodeValue; CT:=CT.NT: end@

Figure 21. Semantics of minus operator.

4.7 Casting Expression

A unary expression preceded by the parenthesized name of a type causes conversion of the value of the expression to the named type: case-expression ! ( type-name ) expression

39

In order to implement the conversion from the old type to the new type, we de ne a new function CastConvert : TypeTask TypeTask CV alue ! CV alue. So the Montage for the cast expression can be represented as follows.

cast expr ::= "(" type ")" expr StaticType

Value

I

S-expr

NT

S-")"

T

S-type

StaticType

@")": CT.NodeValue:= CastConvert(CT.Value.StaticType, CT.CastType,CT.Value.NodeValue); CT:=CT.NT; end@

Figure 22. Semantics of cast expression.

4.8 Pre-Increment and Pre-Decrement expression

A pre-increment or pre-decrement expression has the following form: pre-incr-expression ! ++ expression pre-decr-expression ! ?? expression

The operand's type can be of arithmetic type or pointer type. The operand must be an lvalue. The condition of the Montage for pre-increment expressions guarantees this restriction. The result is not an lvalue. The operand is incremented(++) or decremented(??) by 1. The result of this update is the resulting value of the operand. But when the operand is of pointer type, the value in memory is incremented or decremented by the size of the object to which the pointer points.(as with normal pointer addition and subtraction.) The Montage for the pre-increment expression is shown in the following.

40

pre incr exp ::= "++" expr OnlyValue

I

S-expr

NT

S-"++"

T

CN.S-"++".StaticType := S-expr.StaticType; CN.S-"++".lvalue := false if isPointer(CN.S-expr.StaticType) then CN.S-"++".Indirect := true CN.S-"++".oset := CN.S-expr.StaticType.PointedType.length; endif condition lvalue(S-unary exp)=true AND (isArithmeticType(CN.S-expr.StaticType) OR isPointer(CN.S-expr.StaticType)) @"++": if(CT.Indirect 6= true) then DoAssign(CT.OnlyValue.Memory, MemoryValue(CT.OnlyValue.Memory)+1, CT.OnlyValue.StaticType); CT.NodeValue := CT.MemoryValue(CT.OnlyValue.Memory) + 1; else DoAssign(CT.OnlyValue.Memory, MemoryValue(CT.OnlyValue.Memory)+CT.oset; CT.OnlyValue.StaticType); CT.NodeValue := MemoryValue(CT.OnlyValue.Memory)+CT.oset; endif end@

Figure 22. Semantics of pre-increment expression. The Montage for the pre-decrement expression is similar to the pre-increment expression and thus omitted.

4.9 Post-Increment and Post-Decrement Expressions

A post-increment or post-decrement expression has the following form: post-incr-expression ! expression ++

41

post-decr-expression ! expression ??

The post-increment or post-decrement operators are handled in the same way as pre-increment or pre-decrement operators except that the sequence of operations is reversed: i.e., the value of the whole expression is computed before the incrementing or decrementing takes place. The Montage for the post-increment expression is shown in the following.(As before, the Montage for the post-decrement operator is similar and thus omitted.)

post incr exp ::= expr "++" I

S-expr

OnlyValue NT

S-"++"

CN.S-"++".StaticType := CN.S-expr.StaticType; CN.S-"++".lvalue := false; if IsPointer(CN.S-expr.StaticType) then CN.Indirect := true; CN.oset := CN.S-expr.StaticType.PointedType.length; endif

condition lvalue(S-expr)=true AND

(isArithmeticType(CN.S-expr.StaticType) OR (isPointer(CN.S-expr.StaticType))

@"++": CT.NodeValue := MemoryValue(CT.OnlyValue.Memory); if(CT.Indirect 6= true) then DoAssign(CT.OnlyValue.Memory, MemoryValue(CT.OnlyValue.Memory)+1, CT.OnlyValue.StaticType); else DoAssign(CT.OnlyValue.Memory, MemoryValue(CT.OnlyValue.Memory)+CT.oset, CT.OnlyValue.StaticType); endif end@

Figure 23. Semantics of post-increment expression.

42

T

4.10 Address

The address expression has the following form: addr-exp : ! & expr1

where expr1 is an expression. The & operator in C passes back as its result the address of the memory location given by the argument expression. The Montage for the address expression is shown in the following. The condition of the Montage guarantees that the expression should be an lvalue which means that it refers to a location in memory. But this overall expression cannot be an lvalue.

addr unary exp ::= "&" unary exp OnlyValue

I

S-unary exp

NT

S-"&"

T

CN.S-"&".StaticType:=POINTER; CN.S-"&".StaticType.PointedType:=S-unary exp.StaticType; lvalue(CN.S-"&") := false;

condition lvalue(CN.S-unary exp)=true @"&": CT.NodeValue := CT.OnlyValue.Memory; CT := CT.NT; end@

Figure 24. Semantics of Address expression.

5 Type and Variable Declarations Before discussing variable declarations, we rst discuss types in C. In the following section the universe named TypeTask consists of all tokens related to the type construct.

43

5.1 Types

There are many kinds of types in C, which include the ground types, structure or union types, enumerated types and others. Ground types include all the basic types including char, short, int, long, float, double, signed, and unsigned. The semantics for all these ground types are similar and we choose the int type as a representative of this category of ground types. The arithmetic type includes double, float, long and int and other types such as char, short, unsigned etc. and their combinations. We use a string of capital letters to represent the token with the same non-capital letter's name. For example, INT represents a token whose name is \int". The universe ArithmeticTask is de ned as fINT, LONG, FLOAT, DOUBLE, . . . ,g. The function isArithmeticType : TypeTask ! Boolean indicates whether a token belongs to one of the corresponding elements in the universe ArithmeticTask. In addition, the universe IntegralTypeTask is de ned as fINT, CHARg to satisfy the type requirement for switch statements. In the Montage for the int type, the static analysis sets the number of bytes to be INTLENGTH . The value of INTLENGTH is a constant dependent on the machine implementation. The Montages for other types are similar.

int type = "int" S-"int" CN.length := INTLENGTH;

Figure 25. Semantics of int type. The other types of C are more complicated. The structure type consists of a sequence of named members of various types. For brevity, these members are called elds of the structure type. The names of members may not con ict with each other. The condition of the Montage for the struct type guarantees this restriction. In order to access the elds of the structure, we de ne a new function field : TypeTask String ! TypeTask. For every token named \struct" and a eld name, the function field gives the corresponding eld node (token). As with arithmetic types, STRUCT is used to represent the token \struct". A new function isStruct : TypeTask ! Boolean indicates whether a given token is a STRUCT. In the static analysis part of the Montage for structure types, we de ne two new external functions. One is AllocateField : token ! address, which is used to indicate the eld's relative oset in the structure. The other is AllocateLength : TypeTask ! Integer which is used to give the total length for this structure when a variable with this type is declared. By using 44

these two functions, the static analysis sets the oset for every eld for the structure type and the total length for this type. The new function eldoset: TypeTask ! Integer indicates the oset for every eld in this structure type. In addition, the static analysis sets the terminal function T for the token \struct" so that other Montages which use struct types may access the \struct" node with data ow functions.

struct type ::= "struct" "f" [FieldDecl f ";" FieldDecl g ] "g" FieldDecl = type FieldVar f"," FieldVar g FieldVar

=

Ident

I

StaticType

LIST S-FieldVar

S-"struct" T

LIST

S-type

S-FieldDecl

do forall i: 1 ( ? ) do forall j: 1 ( ? [] ? CN.S-FieldDecl[i].S-FieldVar[j]. eldoset:= AllocateField(CN.S-FieldDecl[i].S-FieldVar[j]); enddo enddo CN.S-"struct".length:=AllocateLength(CN.S-"struct"); do forall s: s 2 CN.S-FieldDecl.S-FieldVar eld(CN,s.String):=s; enddo i

ListLength C N:S

j

F ieldDecl

Listlength C N:S

F ieldDecl i :S

F ieldV ar

)

condition 8 d1,d22 CN.S-FieldDecl.S-FieldVar d1 = 6 d2 implies d1.Name = 6 d2.Name Figure 26. Semantics of structure type. The union type is similar to the structure type but it contains any one of several members of various types at dierent times. It may be thought of as a structure all of whose members begin in oset 0 and whose size is sucient to contain any of its members. Similarly, the special TypeTask UNION is used to denote the token \union" and the function isUnion(s) is used to indicate whether the token s is a UNION. The static analysis of the Montage for union type sets the length for this union type by using the external function AllocateLength. Similarly, the condition of the Montage for union type guarantees that all the eld names do not con ict with each other. 45

union type ::= "union" "f" [FieldDecl f ";" FieldDecl g ] "g FieldDecl = type FieldVar f"," FieldVar g FieldVar

=

Ident

LIST

StaticType

S-"union"

LIST S-FieldVar

T

S-type

S-FieldDecl

vary over ind(i,CN.S-FieldDecl) vary over ind(j,CN.S-FieldDecl[i].S-FieldVar) CN.S-FieldDecl[i].S-FieldVar[j]. eldoset:=0; endvary endvary CN.S-"union".length := AllocateLength(CN.S-"union"); vary(s, CN.S-FieldVar) eld(CN, s.String):=s; endvary condition 8 d1,d22 CN.S-FieldDecl.S-FieldVar d1 6= d2 implies d1.Name 6= d2.Name

Figure 27. Semantics of union type.

5.2 Pointers

When we declare a variable which points to another type, for example, int * a; the variable a may store a value which points to something of type int. A new function PointedType : TypeTask ! TypeTask connects the token \*" to the type being pointed to by the parameter. The token \*" is used to represent the pointer type, which is an element in the universe Pointer. The function isPointer : TypeTask ! Boolean indicates whether a given token is an element in the universe Pointer. The static analysis sets the length for the pointer type to be POINTERLENGTH, which is dependent on the machine implementation.

46

pointer type ::= type "*" T

PointedType

S-"*"

S-type

CN.S-"*".length := POINTERLENGTH;

Figure 28. Semantics of pointer type.

5.3 A Macro: AssignableTo

Having introduced types in C, we de ne the macro AssignableTo, which represents an assignable relation between two types. As mentioned above, because C is not a strongly typed language, the two operands types of an assignment operator may not be the same; however the righthand-side operand type should be assignable to the left-hand side operand. This relation can be represented by the macro AssignableTo as follows: t2 AssignableTo t1 = t1 != undef and t2 != undef and ( isArithmeticType(t1) and isArithmeticType(t2) or isStruct(t1) and isStruct(t2) and ListLength(t1)=ListLength(t2) and 8i 2 ListLength(t1:field) t2. eld[i] AssignableTo t1. eld[i] or isUnion(t1) and isUnion(t2) and ListLength(t1)=ListLength(t2) and 8i 2 ListLength(t1:field) t2. eld[i] AssignableTo t1. eld[i] or isPointer(t1) and isPointer(t2) and t2.PointedType AssignableTo t1.PointedType )

5.4 Variable Declaration and Initialization

Before we use a variable we need to declare it. C distinguishes between so-called static variables and other variables. The dierence between static and non-static variables arises when control is passed to the declaration for a variable. If the variable is not static, new memory is always allocated to the variable and its initializing expression(if it exists) is evaluated with the value of the expression being assigned to the new memory location. If the variable is static, then the above allocation and initialization is performed only the rst time that the declaration is 47

executed; should the declaration become the focus of control again, the same memory segment is allotted to the variable.

5.4.1 Non-static Variable Declaration The Montage for non-static variable declarations is shown in the following. The condition of the Montage guarantees that the new variable name is not multiply de ned. In order to set up the relation between the variable name and its corresponding node represented in the Montage, we de ne a new function V arTable : String ! Token. The static analysis part sets the V arTable for the new variable name to its node(token) so that we can nd its corresponding node when this variable is visited elsewhere in its scope. The data and control functions are shown in the following Montage. The dynamic semantics rst checks whether an initializing expression exists. If it does not exist, the dynamic semantics just allocates the memory for this variable by using the new external function CurAddr; or it exists and it has been evaluated, then it allocates the new memory address for the new variable by using the new external function CurAddr and assigns the initializing expression's value to the corresponding memory. If an initializing expression exists but has not been evaluated, then dynamic semantics passes control to the initializing expression.

48

non decl

direct decl

::= type direct decl ["=" exp] =

S-exp

Ident

Value ChildTask NT

S-Ident I

StaticType

S-type

T

VarTable(CN.S-Ident.Name):=CN.S-direct decl;

condition VarTable(CN.S-Ident.Name)=undef @direct decl: if(CT.Value.Node6=undef OR CT.ChildTask=undef) then CT.Memory := CurAddr; if(CT.Value.NodeValue6=undef) then DoAssign(CurAddr, CT.Value.NodeValue, CT.StaticType); endif CT:=CT.NT; else if(CT.Value.NodeValue=undef AND CT.ChildTask6=undef) then CT := CT.ChildTask; endif endif end@

Figure 29. Semantics of non-static variable declaration.

5.4.2 Static Variable Declaration If the variable is static, the above allocation and initialization is performed only the rst time that the declaration is executed; should the declaration become the focus of control once again, the same memory segment is allotted to the variable. The condition of the Montage for static variable declarations is similar to that for non-static variable declaration. We de ne a new function visited : Token ! Boolean to indicate whether this variable declaration has been visited before. The function visited is initially set to be false. The dynamic semantics rst checks whether this declaration has been visited before or not. If it has not been visited, it will allocate the new memory address for the new static variable by using the external function CurAddr. Otherwise because the Memory function for this token 49

contains the address for the static variable, nothing further needs to be done and the dynamic semantics transfers control to the next token.

static decl ::= "static" type direct decl [ "=" exp ] direct decl

=

Ident

Value

S-exp

ChildTask NT

I S-Ident

StaticType

S-type

T VarTable(CN.S-Ident.Name):=CN.S-direct decl; CN.S-direct decl.visited:=false;

condition VarTable(CN.S-Ident.name)=undef @Ident: if(CT.visited = false) then if(CT.Value.NodeValue6=undef OR CT.ChildTask=undef) then CT.Memory := CurAddr; if(CT.Value.NodeValue6=undef) then DoAssign(CurAddr,CT.Value,CT.StaticType) endif CT:=CT.NT; CT.visited:=true; else CT:=CT.ChildTask; endif else CT:=CT.NT; endif end@

Figure 30. Semantics of static variable declaration.

5.4.3 Declarations Related to Functions There are two kinds of declarations related to C-functions. The rst is a function variable declaration, which represents a function returning the type de ned in the beginning of its declaration. The other is a function pointer declaration, which can dynamically point to dierent C-functions during program execution. For these two cases, we give the following Montages respectively in the following. For function variables, the condition part of the Montage guarantees that this name should not be previously de ned. The static analysis sets this token to the entry in V arTable. Here we need to distinguish C-function variables from C-function names. For the tokens representing 50

C-function variables we just allocate the memory for this C-function; however for the tokens representing C-function names we need to store this token because it denotes the statements to be executed in this C-function. Here we de ne a new function FuncTable : String ! Token to connect a C-function name to its token. In addition, in order to set up the relation between the memory and the token denoting C-functions, we de ne a new function AddrToFunc : CV alue ! Token. From the value in a memory cell we can get the corresponding token for the function. The dynamic semantics allocates a memory address for it and then sets function AddrtoFunc for this C-function variable to the token to denote the corresponding token for the C-function name. Then it passes control to the next task. The Montage is shown in the following.

func decl ::= type fname "( [type f"," typeg] )" fname

=

I

Ident

T S-fname

StaticType

S1-type LIST

Varparam(.)

S2-type

VarTable(CN.S-Ident.Name):=CN.S-fname;

condition VarTable(CN.S-Ident.Name)=undef @Ident: CT.Memory:=CurAddr; CurAddr.AddrToFunc:=FuncTable(CT.Name); CT:=CT.NT; end@

Figure 29. Semantics of function variable declaration. For the function pointer, we give the following Montage for it. The declaration for function pointers is dierent from other variables. For example, in the declaration \ int (*f)()" f represents a pointer to a function returning \int" type. In order to treat a function pointer as a variable, the static analysis sets the function StaticType for the token of the function pointer to token \*". And the static analysis sets the function PointedType for token \*" to the token representing the function return type. Since it is possible for a function pointer to have parameters, we de ne a new function V arparam : Token Integer ! Token. The static analysis sets 51

V arparam for the token representing the function pointer as shown in the following Montage.

The Montage for function pointers is shown as follows.

funcp decl ::= type "(" "*" Ident ")(" [type f"," typeg] ")" I

StaticType

S-Ident T

S-"*"

PointedType

S1-type

LIST S2-type

Varparam

VarTable(CN.S-fname.Name):=CN.S-Ident;

condition VarTable(CN.S-Ident.Name)=undef @Ident: CT.Memory:=CurAddr; CT:=CT.NT; end@

Figure 29. Semantics of function pointer declaration.

5.4.4 Array Declaration Array variables can be treated as pointer variables in C, Thus we treat arrays as pointers, and use the function PointedType to represent the type of elements stored in an array. Before giving the Montage for array variables, we look at the following example. An array variable represents a set of variables with the same type. C permits a form of multi-dimensional array. For example, int x3d[3][5][7];

declares a three-dimensional array of integers, with rank 3 5 7. x3d denotes an array of three items; each item is an array of ve arrays; each of the latter arrays is an array of seven integers. In C, any of the expressions x3d, x3d[i], x3d[i][j], x3d[i][j][k] may reasonably appear in an expression. The rst three have type POINTER under the above assumptions; the last has type int. The following two Montages show the semantics for array declarations. In order to implement the semantics for multi-dimensional arrays, we de ne the token \[" to be another element in the universe Pointer. In order to allow the multi-dimensional array variable to get its static type, we de ne a function ArrayElement : Token Integer ! TypeTask. The element of the array 52

can be derived from the second parameter of ArrayElement. See more details in the following. In the following Montage, PointedType of the token Ident for array variable declarations is set to the rst token \[" by the static analysis. All the tokens \[" are connected in the same order as they appear in the array variable declaration. Because of this, we set the control functions I and T shown in the Montage for array variable dimension declarations. For the last token \[", the static analysis sets the function StaticType to the type declared at the beginning of this variable declaration. It becomes more complicated when initializing expressions are present. Because the order for evaluating the initializing expressions is not decided by the order in the program, we use the external function ChooseTask to set the order for all these initializing expressions during compile-time. In addition the following two array declarations are equivalent in C.

oat y[4][3]=f f1,3,5g,f2,4,6g,f3,5,7g g and

oat y[4][3]=f1,3,5,2,4,6,3,5,7g Here we assume that the preprocessor of C translates the rst declaration into the second declaration and we give the Montage for the second array declaration. Based on this assumption, we use the index to get the elements in a multi-dimensional array. That is why we give the de nition for the function ArrayElement. Furthermore, the following declaration:

oat y[3][2]; can be translated into the array declaration with initializing expressions, whose value is set in advance, and this is done by the preprocessor. In order to get the type for array variables, we de ne a new function ResultType : Token ! Token; the static analysis sets the function ResultType for the token Ident shown below. In C the element E1[E2] is identical to (E1 + E2). Here to compute (E1 + E2 ), we need to convert it to an address oset by multiplying it by the size of the object. Function length is used to implement this. For multi-dimensional arrays, we assume that processor translates it into one dimensional array so that we can get all the element in the array by using the function length. In addition, the condition part guarantees that the number of initializing expressions is equal or less than the number of the cells in the array variable. The dynamic semantics for array variables is given by the rules for the token Ident. Like other variable declarations, we allocate memory for the variable by using the external function CurAddr. If the initializing expression exists, we can assign the values into the corresponding memory address. Because memory allocation is done by the external function CurAddr, we use another external function ArrayElementSize : Token Integer ! Memory to indicate the memory address for every element in the array variable given by the token Ident and its position in the array. 53

array decl ::= Type Ident constlist fconstlistg [ "=" "f" expr f"," exprg ] ArrayVar

T

ArrayElement(.)

S-Ident NT

I

LIST S-constlist

S-Type

ResultType

InitialValue(.)

S-expr

StaticType

LIST

VarTable(CN.S-Ident.Name):=CN.S-Ident; CN.S-Ident.StaticType:=CN.S-Ident.ArrayElement[0]; CN.S-Ident.ArrayElement[CN.S-constlist.ListLength].length:=1; vary over ind(i,CN.S-constlist-1) CN.S-Ident.ArrayElement[i].PointedType:= CN.S-Ident.ArrayElement[i+1]; endvary; vary over(expinit,CN.S-expr) exprinit.NT:=ChooseTask(expinit,CN.S-Ident); endvary; CN.S-Ident.ArrayElement[CN.S-constlist.ListLength].StaticType :=CN.S-Type; CN.S-Ident.ArrayElement[0].length:= AllocateLength(CN.S-Ident.ArrayElement[0]); condition CN.S-constlist.ListLength CN.S-expr.ListLength @S-Ident: CT.Memory := CurAddr; vary over ind(i,CT.InitialValue) if(CT.InitialValue[i]6=undef) then DoAssign(CurAddr+(i * CT.ResultType.length), CT.InitialValue[i].NodeValue, CT.InitialValue[i].ResultType); endif endvary CT := CT.NT; end@

Figure 31. Semantics of array variable declaration. In order to provide the osets for elements in a array variable, the static analysis in the following Montage sets the constant value by using the function LexicalV alue used in case 54

statements. The length value for an array variable is determined by the previous length values and the value for this token \[". Given the address of an array variable and the oset for an element, we can compute the real address for this element. In addition, the static analysis sets the oset for every token \[". The static analysis uses the function ArrayV ar to reset the values for all previous token \[" shown in the following Montage. More details can be seen from the following Montage.

constlist ::= "[" Const "]" I

S-"["

T

CN.S-"[".ConstValue:=CN.S-Const.LexicalValue; vary over ind(i,CN.ArrayElementPosition-1) CN.ArrayVar[i].length:= CN.ArrayVar[i].length * CN.S-"[".ConstValue; endvary

Figure 32. Semantics of array variable dimension declaration.

5.5 Variable Access

After a variable has been declared, we can read and write the memory location corresponding to the variable.

5.5.1 Primary Variables First we consider the case when the variable is just a simple identi er. The entry VarTable(CN.Ident) denotes the token of the corresponding declaration. The condition of the Montage for primary variables guarantees the existence of this token. The static analysis sets Decl : Token ! Token to this token and sets the static type for the corresponding entry in V arTable to this token. The dynamic semantics gets the corresponding memory for this variable and passes control to the next token.

55

primary id = Ident CN.Decl :=VarTable[CN.S-Ident.Name]; CN.StaticType := VarTable[CN.S-Ident.Name].StaticType; CN.lvalue:=true; condition VarTable[CN.S-Ident.Name]6=undef @primary id: CT.Memory:= CT.Decl.Memory; CT.NodeValue:= MemoryValue(CT.Decl.Memory,CT.Decl.StaticType); CT := CT.NT endif

Figure 33. Semantics of primary variable.

5.5.2 Pointer Variables A pointer variable is used to access a given memory location given by the current pointer variable. The condition of the Montage for pointer variables guarantees that primary's type is a pointer type. The static analysis sets the PointedType to which the current variable points to the static type of the current variable.

primary pointer ::= "*" primary I

S-primary

Argument NT

T

S-"*"

CN.S-"*".StaticType := CN.S-primary.StaticType.PointedType; CN.S-"*".lvalue := true;

condition CN.S-primary.StaticType.PointedType6=undef @"*": CT.Memory:= MemoryValue(CT.Argument.Memory,CT.Argument.StaticType); CT.NodeValue:= MemoryValue(MemoryValue(CT.Argument.Memory, CT.Argument.StaticType), CT.Argument.StaticType.PointedType); CT := CT.NT: end@

56

Figure 34. Semantics of pointer variable.

5.5.3 Structure Variables The structure variable is used to access the elds of structures. The condition of the Montage for structure variables guarantees that primary's type should be of structure type and that the corresponding eld name in this type exists. The static analysis sets the static type of the corresponding eld to the current variable's static type. The dynamic semantics gets the corresponding memory location for the current variable by using the function eldoset.

primary rec ::= primary "." FieldIdent FieldIdent

=

Ident

Argument

I

S-primary

NT

S-FieldIdent

T

CN.S-FieldIdent.StaticType := CN.S-primary.StaticType. eld[CN.S-FieldIdent.Name].StaticType; CN.S-FieldIdent.lvalue:=true;

condition isStruct(CN.Argument.StaticType) or

isUnion(CN.Argument.StaticType) and (CN.S-primary.StaticType. eld[CN.SFieldIdent]6=undef)

@FieldIdent: CT.Memory:=CT.Argument.Memory + CT.Argument.StaticType. eld[CT.Name]. eldoset; CT.NodeValue:=MemoryValue(CT.Argument.Memory +CT.Argument.StaticType. eld[CT.Name]. eldoset, CT.Argument.StaticType. eld[CT.Name].StaticType); CT := CT.NT; end@

Figure 35. Semantics of structure variable.

5.5.4 Array Expressions Array variables in C are closely tied to pointer variables. The static analysis, using the function

PointedType, checks whether the type of primary is a pointer. The data and control functions

can be set in the way shown in the following Montage. 57

The dynamic part of the Montage for array variables is quite dierent from the previous variables. For an array expression a[b], the value of this expression is computed by (a + b). The computation for a + b is similar to the computation of additive expressions involving pointers, which is shown in the following Montage.

58

primary array ::= primary "[" exp "]" I LeftValue LeftTask

S-primary

S-"]"

RightValue RightTask

S-exp

T NT CN.S-"]".StaticType := CN.S-primary.PointedType; CN.S-"]".lvalue:=true; if(IsPointer(CN.S-primary.StaticType)) then CN."]".LIndirect:=true;CN.oset:= CN.S-primary.StaticType.PointedType.length; else if(IsPointer(CN.S-exp.StaticType)) then CN."]".RIndirect:=true;CN.oset:=CN.S-exp.StaticType.PointedType.length;endif endif CN.S-"]".LeftVisited:=False; CN.S-"]".RightVisited:=False; CN.S-"]".FirstTask:= ChooseTask(CN.S-primary,CN.S-exp); condition CN.S-primary.StaticType.PointedType 6=undef NT

@"]": if(CT.LeftVisited=False and CT.RightVisited=False) then CT:=CT.FirstTask; else if(CT.LeftVisited=True and CT.RightVisited=True) then if(CT.LIndirect=true) then CT.Memory := CT.LeftValue.NodeValue+CT.RightValue.NodeValue*CT.oset; CT.NodeValue := MemoryValue(CT.LeftValue.NodeValue+CT.RightValue.NodeValue*CT.oset) else if(CT.RIndirect=true) then CT.Memory:=CT.Left.NodeValue*CT.oset+CT.RightNode; CT.NodeValue:=MemoryValue(CT.Left.NodeValue*CT.oset+CT.RightNode); else CT.Memory:=CT.LeftValue.NodeValue+CT.Right.NodeValue; CT.NodeValue:=MemoryValue(CT.LeftValue.NodeValue+CT.Right.NodeValue); endif CT.LeftVisited:=False; CT.RightVisited:=False; CT:=CT.NT; endif else if(CT.LeftVisited=True) then CT.RightVisited:=True; CT:=CT.RightTask; else if(CT.RightVisited=True) CT.LeftVisited:=True; CT:=CT.LeftTask; endif endif; endif endif end@

Figure 36. Semantics of array variable.

59

6 Functions Now we turn to an important concept in C: functions.

6.1 Some New ASM Functions

A function is a very important and useful concept in C. Generally speaking, functions are blockstructured; the variables which are declared in a C-function can only be referenced within that function. The value of Scope for program global variables is 1. The value of Scope for C-function variables is 2 (C does not permit functions to be nested). After a C-function is exited, the value of Scope returns to 1. First we extend the function V arTable : String Integer ! Token whose second argument is used to choose a scope. During the static analysis, we guarantee that the function Scope denotes the scope of the currently analyzed program section. Thus all the entries in V arTable(name) shown in the previous Montages will be rewritten as V arTable(name; Scope). At the same time, C functions may have several active incarnations at a given time during their execution. Thus we must have some means for storing multiple values of an ASM function for a given token. The universe Stack comprises the positive integers. A dynamic distinguished element StackTop:Integer is used to indicate the current top of the stack. To store state-associated information on the stack, we modify various functions such as NodeValue, Left, Right, Value, Cond, OnlyValue, etc. to be binary functions from Token Stack to CV alue. This leads to rewriting all the previous Montages we have given. We simplify matters by stating that every previous reference to V (X ) should be replaced by V (X; StackTop), where V is one of the functions listed above. Similarly, we modify address to be a binary function from Token Stack to address. The semantics for the function are given in the following two parts. One is the caller part and the other is the callee part. The callee part is given by the following three Montages.

6.2 Callee Part

The static analysis of a function declaration rst analyzes its function name. The Montage for function names is shown in the following. In C, functions may not be de ned within other functions. But variables can be de ned in a block-structured fashion. Declarations of variables may follow the left brace that introduces any compound statement, not just the one that begins a function. Variables declared in this way hide any identically named variables in outer blocks, and remain in existence until the matching right brace. We de ne a new function Scope : Integer to indicate the concept of \block level". The initial block level Scope is 1. If a function is entered the block level Scope is incremented by 1; if a function is exited then the block level Scope is decremented by 1. The static analysis of a function name increments the scope. In fact, in the Montages for function names, Scope can be replaced with 1 and scope+1 can be replaced with 2. and copies the declaration tables of the previous scope. The reason for copying previous declaration tables 60

is that C permits one to use variables declared at outside levels if there is no appropriate local declaration. In addition, like a variable, C permits one to declare a function variable before its de nition. For those function names which are not de ned in the declaration, we assume that the preprocessor adds these function names as function variables in the declaration part. We use the function FuncTable : String ! Token to set up the relation between the function name and its token. From this token, we can get the address for this function.

funcid = Ident Scope := Scope +1; FuncTable(CN.S-Ident.Name) := CN; FuncTK := CN; do forall n: String(n) if(n 6= CN.S-Ident.Name) then VarTable(n, Scope+1):=VarTable(n,Scope); endif enddo

Figure 37. Semantics of function name.

The condition of the Montage for C-function declarations guarantees that all local variable names not be the same as parameter names. In addition, the condition guarantees that all the names in goto statements within the C-function should have the corresponding name in label statements within the C-function. The static analysis of the Montage for C-function declarations decrements the increased scope to the previous value since the whole C-function, including the parameter declaration, local variable declaration and the statement, has been analyzed(reduced); at this time the static scope should return to the previous value. The static analysis sets all the tokens in the GotoTable to the corresponding ones in the LabelTable. But because GotoTable and LabelTable are shared among all the C-functions we need to clear both of them when a C-function has been analyzed so as to avoid transferring control across C-functions. In addition we de ne a new function FirstParam : Token ! Token which will be used to pass control to the rst parameter when this C-function is called. In addition, the static analysis sets the data control function FormalParam : Token Integer ! Token as shown in the following Montage. FormalParam is used to connect the formal parameters and the actual parameters when this C-function is called. To get the speci c position of a formal parameter in the parameter list from the token funcid, we will use the projection function FormalParamPosition : Token Integer ! Token in the following Montages. The dynamic semantics of the Montage decrements the the value of the s tack(StackTop) to the previous one because the C-function execution has nished. In order to make control transfer to the corresponding token after a C-function has been executed, we de ne a new function Caller : Integer ! Token to store the token that calls this C-function at a given recursion level. We set the value of Caller when a C-function is called, which is shown in 61

the Montage for C-function calls. Then the dynamic semantics transfers control to the token following the function call by using the function Caller. The other way to make control transfer to the token following the C-function call is the return statement, which is shown later.

FuncDecl ::= type funcid "(" [ formalparlist f"," formalparlistg] ")" "f" declarationlist ";" statementlist ";" "g" StaticType

S-type

S-funcid I

FormalParam(.) NT

S-declarationlist

NT

FirstPara

T LIST

S-formalid S-formalparlist S-statementlist

NT

S-"g"

do forall name: GotoTable(name) 6= undef GotoTable(name).GotoNT:=LabelTable(name); GotoTable(name):=undef LabelTable(name):=undef; enddo Scope:=Scope-1;

condition 8 p1,p2 2 CN.S-formalparlist.S-formalid: (p16=p2 implies p1.Name 6= p2.Name) AND 8 name 2 String: (GotoTable(name)6=undef implies LabelTable(name)= 6 undef) AND 8 i ListLength(CN.S-funcid.Varparam):

(CN.S-funcid.Decl.Varparam[i].StaticType AssignableTo CN.S-funcid.FormalParam[i].StaticType)

@"g": StackTop := StackTop - 1; CT := Caller[StackTop].NT; end@

Figure 38. Semantics of function declaration. The static analysis of the Montage for formal parameter declarations is quite similar to that for variable declarations. The data and control functions are set in the following Montage. In order to nd the variable de ned in the parameter declarations, the static analysis sets the 62

corresponding token to the entry in V arTable. The dynamic semantics of the Montage allocates the corresponding memory address for every parameter, assigns the appropriate argument value, and passes control to the next token.

formalpar ::= type formalid f"," formalidg formalid

=

Ident

StaticType

LIST S-Ident I

S-type

T

VarTable(CN.S-Ident.Name,Scope):=CN.S-formalid;

condition 8 p1,p22 CN.S-formalid holds p1= 6 p2 implies p1.Name 6= p2.Name @Ident: CT.Memory[StackTop] := CurAddr; DoAssign(CurAddr, Caller[StackTop].Func.ActualPara[CT.FormalParamPosition].NodeValue[StackTop1], CT.StaticType); CT := CT.NT; end@

Figure 39. Semantics of formal parameter declaration. If the parameter is a pointer to a C-function, then we can give the Monage for it as follows. Like the Montage dealing with the pointer, the static analysis sets the function VarTable for the parameter's name and the types for this parameter are set in the following Montage, which are similar to pointer variables. Like the declaration for function pointers, the static semantics sets the StaticType for function parameters and sets PointedType for token \*". The dynamic semantics rst allocates the corresponding memory address for this pointer function name. Then, like the other function parameters, we copy the real function address in the caller part into this new address for function pointers.

63

formalfunpar ::= type "(" "*" Ident ")()" I

S-Ident StaticType

S-"*"

PointedType

S-type

T VarTable(CN.S-Ident.Name,Scope):=CN.S-Ident; @Ident: CT.Memory[StackTop]:=CurAddr; DoAssign(CurAddr, Caller[StackTop].Func.ActualPara[CT.FormalParamPosition].Memory[StackTop1], CT.StaticType); end@

Figure 40. Semantics of pointer to function declaration.

6.3 Block Structure

Because the way we give the semantics for block structures is similar to function calls, we give the Montage for block structures before discussing the semantics for a caller part of functions. Because variables can be de ned in a block-structured fashion within a function, we use the function Scope to represent the block level. If a block statement is entered the block level Scope is incremented by 1; if a block statement is exited then the block level Scope is decremented by 1. Similar to function de nitions, we copy all the variables in the previous levels into the current level table. So the Montages for block statements are as follows:

new block ::= blockbegin statementlist blockend I

S-statementlist

T

64

Figure 36. Semantics of block statements.

blockbegin ::= "f" Scope:=Scope+1; vary over(n,String) VarTable(n, Scope+1):=VarTable(n,Scope); endvary

Figure 36. Semantics of the beginning of block statements.

blockend ::= "g" Scope:=Scope-1;

Figure 36. Semantics of the end of block statements.

6.4 Caller Part

Although a function can be called by a function variable or a function pointer, we regard it as an expression whose value is a memory address from which we use the function AddrToFunc to get the corresponding task for this C-function. The static analysis of the Montage for the function call sets the data and control functions as follows. The condition part of the Montage guarantees that all the parameters in the callee function correspond with arguments in the function declaration in their type and position in the function parameter list. The order of evaluation of expressions in the function call is not decided in advance, which is similar to assignment expressions. Any expressions in the C-function parameters can be possibly evaluated rst rather than the rst parameter in the caller expression. Here we use the external function ChooseTask to set the next task for every expression in the function call.3 The dynamic semantics of the Montage increments the value of the stack. The dynamic semantics sets the value for the function Caller in order to transfer control to the token next to this function call. In addition, because we set the function AddrToFunc for the memory 3

As with expressions, we perform this choice at compile-time.

65

of function names, here the dynamic semantics of the Montage can nd the the appropriate C-function and make control transfer to its rst parameter.

funccall ::= exp "(" [exp f "," exp g] ")" LIST ActualPara(.) I

S2-exp

NT

S1-exp

Func NT

S-")" T

vary over(expvar,CN.S-exp) expvar.NT:=ChooseTask(expvar,CN.S-exp); endvary; CN.S-")".StaticType := CN.S1-exp.StaticType

condition CN.S2-exp.ListLength = CN.S2-exp.StaticType.FormalParam.ListLength and 8 e 2 CN.S2-exp

e.StaticType AssignableTo CN.S2-exp.StaticType.FormalParam(e.Position)

@")": StackTop := StackTop + 1; Caller[StackTop+1] := CT; CT := CT.Func.Memory[StackTop].AddrToFunc.FirstParam; end@

Figure 41. Semantics of function call.

6.5 Return statement

The static analysis of the Montage for return statements sets the data and control functions as follows. The condition part of the Montage guarantees that the type of the expression returned be consistent with the return type de ned in the function declaration. The dynamic semantics of the Montage sets the value of the expression to the corresponding token and passes the control to the token just following the function call by using Caller.

66

return statement ::= "return" exp Argument

I

S-exp

NT

S-"return"

condition CN.S-exp.StaticType

T

AssignableTo

FuncTK.StaticType

@"return": StackTop:=StackTop-1; Caller[StackTop].NodeValue[StackTop-1]:= CT.Argument.NodeValue[StackTop]; CT := Caller[StackTop].NT; end@

Figure 42. Semantics of return statement.

7 Discussion As a continuation of [7], this work gives static semantics and dynamic semantics of the entire programming language C(shown more fully in [?]). By using Montages, a clear and complete semantics of C has been given. In [7], the static functions are not completely speci ed. There, the authors assume that the function NextTask is given before the corresponding ASM program is executed. Compared with [7], this work gives the static semantics as well as the dynamic semantics for C under the same framework so that all these Montages can become a complete manual for the programming language C. We give the dynamic semantics for any syntax rule by using the ASMs for the tokens appearing in the syntax rule. Because we re-use these tokens to give the dynamic semantics, it is unnecessary to de ne some new tasks such as \branch" to give the dynamics semantics, as done in [7]. At the same time a tool called GEM/MEX[?], which supports Montages, has been used to translate all of these Montages into an executable programming language. Because GEM/MEX uses ASMs to give the static and dynamic semantics for a programming language, the resulting speci cations are executable. Up to now, many of the formal methods used for programming language semantics are not executable. But using MEX, after we give the Montages for a programming language like the above for C, we can run all the Montages on a given C program. 67

Thus, Montages can be used to debug the program by executing them and observing the result. In addition, we can verify the validity of our Montage descriptors by testing on the C programs. We used the rst version of GEM/MEX to verify many of our Montages; in the meantime, a new version of the GEM/MEX tool has been released with many new features for the graphical language of Montages. We are currently working on converting our work to the new version for comprehensive testing. Several other languages have been given semantics by using Montages. These languages include Oberon[10] and Java[12]. Compared with the programming language Oberon, this work gives more details about the low level features of memory. In fact this is because C has many low-level operations that other programming languages don't have. We de ne a set of functions regarding the memory such as Memory; MemoryByteV alue; ByteToResult; etc:. This shows that we can use Montages to give the semantics for these low-level features of a programming language. The semantics for Oberon and Java do not need to give these kinds of functions, because they are higher-level languages and provide fewer memory operations. In addition, compared with Oberon and Java, C gives more exibility to the order of the evaluation of subexpressions. According to the semantics of C, the evaluation of the subexpressions in an expression in C can be arbitrary. That is why we use the external function ChooseTask to implement this. Here we assume that the execution of the subexpression is atomic, which is not allowed to be interrupted by the other subexpressions(in [7], this choice was made dynamically at run time; it seems reasonable to force this choice to be made at compile-time).

References [1] E. Borger and I. Durdanovic. Correctness of compiling Occam to Transputer code. Computer Journal, 39(1):52-92, 1996. [2] E. Borger, I. Durdanovic, and D.Rosenzweig. Occam: Speci cation and Compiler Correctness. Part I: Simple Mathematical Interpreters. In U. Montanari and E.R. Olderog, editors, Proc. PROCOMET'94 (IFIP Working Conference on Programming Concepts, Methods and Calculi), pages 489-508. North-Holland, 1994. [3] E. Borger, U. Glasser, and W. Muller. The Semantics of Behavioral VHDL'93 Descriptions. In EURO-DAC'94. European Design Automation Conference with EURO-VHDL'94, pages 500-505, Los Alamitos, California, 1994. IEEE CS Press. [4] E. Borger and D. Rosenzweig. A Mathematical De nition of Full Prolog. In Science of Computer Programming, volume 24, pages 249-286. North-Holland, 1994. [5] E. Borger and W. Schulte. Programmer Friendly Modular De nition of the Semantics of Java. In J. Alves-Foss, editor, Formal Syntax and Semantics of Java, LNCS. Springer, 1998. [6] Y. Gurevich. Evolving Algebras 1993: Lipari Guide. In E.Borger, editor, Speci cation and Validation Methods, pages 9-36. Oxford University Press, 1995. 68

[7] Y. Gurevich and J.Huggins. The Semantics of the C Programming Language. In E. Borger, H. Kleine Buning, G. Jager, S.Martini, and M.M. Richter, editors, Computer Science Logic, volume 702 of LNCS, pages 274-309. Springer, 1993. [8] B. Kernighan, W. Brian, The C programming Language, 2nd edition, Prentice Hall, 1988. [9] P. Kutter and A. Pierantonio. Montages: Speci cations of Realistic Programming Languages. Journal of Universal Computer Science, 3(5):416-442, 1997. [10] P. Kutter and A. Pierantonio. The Formal Speci cation of Oberon. Journal of Universal Computer Science, 3(5):443-503, 1997. [11] C. Wallace. The Semantics of the C++ Programming Language. In E.Borger, editor, Speci cation and Validation Methods, pages 131-164. Oxford University Press, 1995. [12] C. Wallace. The Semantics of the Java Programming Language: Preliminary Version. Technical Report CSE-TR-355-97, EECS Dept., University of Michigan, December 1997.

69

The Static and Dynamic Semantics of C: Preliminary ... - CiteSeerX

The Static and Dynamic Semantics of C: Preliminary ... - CiteSeerX

Suggest Documents

Static and Dynamic Semantics. Preliminary Report

The Static and Dynamic Semantics of C 1 Introduction - CiteSeerX

The Static and Dynamic Semantics of C 1 Introduction - CiteSeerX

Static and Dynamic Semantics Processing - CiteSeerX

Static and Dynamic Semantics of NoSQL Languages

Static and Dynamic Semantics of NoSQL Languages

Static and Dynamic Strategies to Understand C Programs ... - CiteSeerX

Static Semantics of Message Sequence Charts - CiteSeerX

The Semantics of Dynamic Conjunction - CiteSeerX

Static and Dynamic Analysis of the Internet's ... - C. Lee Giles

Static and Dynamic Analysis of the Internet's ... - C. Lee Giles

Static and Dynamic Light Scattering - CiteSeerX

Static and Dynamic Structural Symmetry Breaking - CiteSeerX

Realism in Dynamic, Static-Sequential, and Static ... - CiteSeerX

Evolutionary Algorithms for Static and Dynamic ... - CiteSeerX

Dynamic Semantics

Dynamic Denotational Semantics of Java - CiteSeerX

STATIC, DYNAMIC AND INTEGRATED

The Perception of Dynamic and Static Facial

An Examination of the Static and Dynamic

Dynamic and static contributions of the

Identification of the Quasi-Static and Dynamic

Static and dynamic lengthscales in glassforming liquids C. Patrick ...

The Semantics of the C Programming Language 0 ... - CiteSeerX