Algebraic definition of programming languages

Teodor Rus
Department of Computer Science
The University of Iowa
Iowa City, Iowa 52242
[email protected]

Abstract

This paper provides an algebraic definition of programming languages. It presents a methodology for the construction of the syntax and semantics of a programming language as similar algebras specified by the same specification rules. We then show that the syntax and semantics algebras of a programming language form a Galois connection. Applications to programming language processing are sketched, and homomorphism computation is suggested as an alternative to automata for machine computation and for system software development.

Keywords: algebra, Galois-connection, homomorphism, language

1 Introduction

A language processing environment is a set of integrated tools that support the three tasks of computer language processing: specification, implementation, and usage. The development of the tools that belong to a language processing environment requires language specification mechanisms that formally define all components of a programming language (the syntax, the semantics, and their association) by the same specification rules. This motivates the following algebraic definition of a programming language:

Definition 1: a programming language is a tuple $L = \langle Sem, Syn, \mathcal{L} : Sem \to Syn \rangle$ where $Sem$ and $Syn$ are algebras of the same similarity class and $\mathcal{L}$ is a carrier-mapping such that there exists a homomorphism $\mathcal{E} : Syn \to Sem$ and $\mathcal{L} \circ \mathcal{E} = 1_{Sem}$ (composition is written in application order: first $\mathcal{L}$, then $\mathcal{E}$).

Observation: $\mathcal{L}$ tells that any semantic element has at least one syntactic representation and $\mathcal{E}$ tells that any syntax element has a unique semantic value.

We use equations of the form $r : A_0 = t_0 A_1 t_1 A_2 t_2 \ldots A_n t_n$, called BNF notation [Nau63], as specification rules for the similarity class of language algebras, where $A_i$, $0 \le i \le n$, are parameters called nonterminals and $t_i$, $0 \le i \le n$, are fixed strings. For a given specification rule $r : A_0 = t_0 A_1 t_1 A_2 t_2 \ldots A_n t_n$, $A_0$ is the left hand side of the rule and is denoted by $lhs(r)$, and $t_0 A_1 t_1 \ldots t_{n-1} A_n t_n$ is the right hand side of the rule and is denoted by $rhs(r)$. We assume here that a programming language is specified by a finite set, $R$, of specification rules, $N$ is the collection of parameters used in $R$, $T$ is the collection of fixed strings used by rules in $R$, and $W = \langle (N \cup T)^{*}, \cdot, \epsilon \rangle$ is the associative semigroup of words with unity $\epsilon$ generated by concatenation $\cdot$ over the alphabet $N \cup T$.

Appropriate interpretations of the parameters $P \in N$ and rules $r \in R$ provide the mechanisms that allow one to construct $Syn$ and $Sem$ as algebras of the same similarity class and $\mathcal{L}$ and $\mathcal{E}$ as homomorphic maps. For a set of specification rules $R$ the syntax algebra, $Syn(R)$, is unique (up to isomorphism) while the semantics algebra, $Sem(R)$, may be a language user choice. The algebraic tools for language processing require both $Syn(R)$ and $Sem(R)$ to be specified by concrete notations such that $Syn(R)$ and $Sem(R)$ are naturally associated with each other and are sound and well understood by the people involved in the three tasks of language processing. Since $Syn(R)$ is unique, the major problem rests with the notation used to express $Sem(R)$.

2 Syntax algebra

The syntax algebra of the language specified by $R$ is $Syn(R) = \langle \{[A_n] \mid A_n \in N\}, \{[r] \mid r \in R\} \rangle$ where $[A_n]$, $A_n \in N$, and $[r]$, $r \in R$, are the syntax interpretations of the parameters and specification rules and are defined as follows:

• Each parameter $A_n \in N$ is interpreted as a family of well-formed expressions called a syntactic domain, denoted by $[A_n]$, that may be used to denote computations.

• Each rule $r \in R$, $r : A_0 = t_0 A_1 t_1 \ldots t_{n-1} A_n t_n$, is interpreted as an algebraic operation $[r] : [A_1] \times \ldots \times [A_n] \to [A_0]$ which constructs the elements $w_0 \in [A_0]$ from the elements $w_i \in [A_i]$, $1 \le i \le n$, by the rule: $[r](w_1, w_2, \ldots, w_n) = w_0 = t_0 w_1 t_1 w_2 \ldots t_{n-1} w_n t_n$.

We illustrate this definition with the specification rules in Table 1.

$r_1$:    St = Id ":=" Ex
$r_2$:    St = "if" Ex "then" St "else" St "fi"
$r_3$:    St = "while" Ex "do" St "od"
$r_4$:    St = Bl
$r_5$:    Sl = St
$r_6$:    Sl = St ";" Sl
$r_7$:    Dc = Id Tp
$r_8$:    Dl = Dc
$r_9$:    Dl = Dc ";" Dl
$r_{10}$: Bl = "begin" Dl ";" Sl "end"

Table 1: The specification rules of a simple language

The syntax interpretation of the parameters used in these rules is defined by the equations in Table 2.

$[Id]$ = set of identifier and constant representations;
$[Ex]$ = set of expressions generated by identifiers and constants;
$[Tp]$ = set of given (predefined or defined) type names;
$[St]$ = $\{id := exp \mid id \in [Id], exp \in [Ex]\}$ $\cup$ $\{\text{if } e \text{ then } s_1 \text{ else } s_2 \text{ fi} \mid e \in [Ex], s_1, s_2 \in [St]\}$ $\cup$ $\{\text{while } e \text{ do } s \text{ od} \mid e \in [Ex], s \in [St]\}$ $\cup$ $\{\text{begin } dl; sl \text{ end} \mid dl \in [Dl], sl \in [Sl]\}$;
$[Sl]$ = $[St] \cup \{h; t \mid h \in [St], t \in [Sl]\}$;
$[Dc]$ = $\{id\ t \mid id \in [Id], t \in [Tp]\}$;
$[Dl]$ = $[Dc] \cup \{h; t \mid h \in [Dc], t \in [Dl]\}$;
$[Bl]$ = $\{\text{begin } dl; sl \text{ end} \mid dl \in [Dl], sl \in [Sl]\}$;

Table 2: Syntax domains of the rules in Table 1

The syntax algebra of the language specified by the rules in Table 1 is $Syn(R) = \langle \{[Id], [Ex], [Tp], [St], [Sl], [Dc], [Dl], [Bl]\}, \{[r_1], [r_2], [r_3], [r_4], [r_5], [r_6], [r_7], [r_8], [r_9], [r_{10}]\} \rangle$ where the operations $[r_1]$ through $[r_{10}]$ are provided by the syntax interpretation of the specification rules defined by the equations in Table 3.

$[r_1]$:    $[Id] \times [Ex] \to [St]$;             $[r_1](id, e) = id := e$
$[r_2]$:    $[Ex] \times [St] \times [St] \to [St]$;  $[r_2](e, s_1, s_2) = \text{if } e \text{ then } s_1 \text{ else } s_2 \text{ fi}$
$[r_3]$:    $[Ex] \times [St] \to [St]$;             $[r_3](e, s) = \text{while } e \text{ do } s \text{ od}$
$[r_4]$:    $[Bl] \to [St]$;                         $[r_4](b) = b$
$[r_5]$:    $[St] \to [Sl]$;                         $[r_5](s) = s$
$[r_6]$:    $[St] \times [Sl] \to [Sl]$;             $[r_6](h, t) = h; t$
$[r_7]$:    $[Id] \times [Tp] \to [Dc]$;             $[r_7](id, t) = id\ t$
$[r_8]$:    $[Dc] \to [Dl]$;                         $[r_8](d) = d$
$[r_9]$:    $[Dc] \times [Dl] \to [Dl]$;             $[r_9](h, t) = h; t$
$[r_{10}]$: $[Dl] \times [Sl] \to [Bl]$;             $[r_{10}](dl, sl) = \text{begin } dl; sl \text{ end}$

Table 3: Syntax interpretation of the rules in Table 1
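As a rough illustration of Table 3, the syntax operations can be sketched in Python as string constructors. The `Rule` class, its field names, and the space-joined rendering of words are assumptions made for this sketch, not notation from the paper.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Rule:
    """A specification rule r : A0 = t0 A1 t1 ... An tn (hypothetical encoding)."""
    lhs: str                  # A0
    params: Tuple[str, ...]   # A1 .. An
    fixed: Tuple[str, ...]    # t0 .. tn, always one more entry than params

    def apply(self, *words: str) -> str:
        """Syntax operation [r](w1, ..., wn) = t0 w1 t1 ... wn tn."""
        if len(words) != len(self.params):
            raise ValueError("wrong number of arguments for rule")
        parts = [self.fixed[0]]
        for word, fixed_string in zip(words, self.fixed[1:]):
            parts.extend([word, fixed_string])
        # Empty fixed strings are dropped; pieces are joined with one space.
        return " ".join(p for p in parts if p)


# r1: St = Id ":=" Ex   and   r3: St = "while" Ex "do" St "od"
r1 = Rule("St", ("Id", "Ex"), ("", ":=", ""))
r3 = Rule("St", ("Ex", "St"), ("while", "do", "od"))

program = r3.apply("x", r1.apply("x", "x - 1"))
```

Here `program` is the word `while x do x := x - 1 od`, obtained by applying $[r_3]$ to an expression and to the result of $[r_1]$.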

Fact 1: $Syn(R)$ is embedded in the semigroup $W = \langle (N \cup T)^{*}, \cdot, \epsilon \rangle$ by derived operations.

A component relation, $\le_{Syn}$, on $Syn(R)$ can now be defined by: for all $w, w' \in Syn(R)$, $w \le_{Syn} w'$ iff $w = w'$ or there is $r \in R$ such that $w' = [r](w_1, \ldots, w_k)$ and $w \le_{Syn} w_i$ for some $i$, $1 \le i \le k$. Using this relation we can introduce the context of $w \in Syn(R)$ as the set $w[\,]$ of strings over the alphabet of the language that contain a hole, denoted by $[\,]$, such that replacing the hole by $w$ yields an element $w'[w] \in Syn(R)$, i.e., $w[\,] = \{w_1 [\,] w_2 \mid w_1 w w_2 \in Syn(R)\}$. Hence, for $w' \in w[\,]$, $w'[w] = w_1 w w_2$ such that $w \not\le_{Syn} w_{\ldots}$.
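The component relation above can be sketched in Python on term trees rather than on strings. The `Term` encoding (a rule name plus argument terms, with lexical elements as argument-free terms) is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Term:
    """A syntax term: a rule name (or a lexeme for leaves) with its arguments."""
    head: str
    args: Tuple["Term", ...] = ()


def is_component(w: Term, w2: Term) -> bool:
    """w <=_Syn w2: w equals w2, or w is a component of an argument of w2."""
    return w == w2 or any(is_component(w, arg) for arg in w2.args)


x = Term("x")
one = Term("1")
assignment = Term("r1", (x, one))           # x := 1
statement_list = Term("r5", (assignment,))  # the list consisting of x := 1
```

With these definitions `is_component(x, statement_list)` holds while `is_component(statement_list, x)` does not, and every term is a component of itself, which matches the reflexive case $w = w'$ of the definition.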

3 Semantics algebra

The semantics algebra of the language specified by $R$ is $Sem(R) = \langle \{[[A_n]] \mid A_n \in N\}, \{[[r]] \mid r \in R\} \rangle$ where $[[A_n]]$, $A_n \in N$, and $[[r]]$, $r \in R$, are the semantics interpretations of the parameters and specification rules and are defined as follows:

• Each parameter $A_i \in N$ is interpreted as a family of computations called a semantic domain, denoted by $[[A_i]]$, that may be expressed in the language.

• Each rule $r \in R$, $r : A_0 = t_0 A_1 t_1 \ldots t_{n-1} A_n t_n$, is interpreted as an algebraic operation $[[r]] : [[A_1]] \times \ldots \times [[A_n]] \to [[A_0]]$ which constructs the elements $c_0 \in [[A_0]]$ from the elements $c_i \in [[A_i]]$, $1 \le i \le n$, by the rule $[[r]](c_1, c_2, \ldots, c_n) = c_0$.

Since the language we contemplate is machine-independent, the computations expressed in this language are machine-independent and, as such, a computation process does not necessarily mean the process of a program execution performed by a machine. Each computation is however specified in terms of three generic elements [Rus98]: a universe of discourse defined by a collection of types, a state defined as a mapping assigning values of types given in the universe to a finite set of names (variables and constants), and a state transition that maps the state before computation into the state after computation. The universe of types contains the type computation process whose values are processes performing given computations. We assume that a computation operates on a set $D$ of typed data variables and constants, which denote values in the universe of types, and a set $C$ of control variables and constants that denote processes in the universe of types. To simplify the presentation we use here only two control variables: $\uparrow$, which identifies a computation process before its execution, and $\downarrow$, which identifies a computation process after its execution. To describe formally the machine independent computations we use the following concepts:

1. The universe of types $T$ is composed of two sets: $PT$ is the set of predefined types which are available to every computation, and $TC$ is a set of type constructors that can be used by a computation to construct new types in $T$ from the available types. The scope of the types constructed by type constructors is determined by the computation which defines them. For a type $t$ we denote by $V(t)$ the set of values of type $t$. We assume that there is a universal value @ which denotes any value of any type and, for each type $t \in T$, $@_t$ is the universal value of type $t$, $@_t \in V(t)$.

2. The states $\sigma \in \Sigma$ of a computation are mappings $\sigma = (\sigma_d, \sigma_c)$, $\sigma_d : D \to V(t)$, $t \in T$, $\sigma_c : \{\uparrow, \downarrow\} \to L$ where $L$ is a set of process labels, such that for $x \in D$ and $t \in PT$ or $t = \theta(t_1, \ldots, t_k)$ where $\theta \in TC$, $\sigma_d(x) \in V(t)$, $\sigma_c(\uparrow)$ identifies the process that performs next, and $\sigma_c(\downarrow)$ identifies the process that performed previously.

3. The transitions $\tau \in \Gamma$ are mappings $\langle T, \sigma : D \cup \{\uparrow, \downarrow\} \to (V(t), t \in T) \cup L \rangle \stackrel{\tau}{\longrightarrow} \langle T', \sigma' : D' \to (V(t), t \in T') \cup L' \rangle$. Assuming that $\uparrow$ and $\downarrow$ enclose the expression $w$ of the transformation performed by $\tau$, the expression of $\tau$ becomes $\langle T, \sigma, \uparrow w \rangle \stackrel{\tau(w)}{\longrightarrow} \langle T', \sigma', w \downarrow \rangle$ where $\tau(w)$ denotes the process of performing the transformation $w$ with variables bound to values defined in the state $\sigma$. Abstracting the expression of the transformation performed by a transition $\tau$ we may represent it by $\langle T, \sigma \rangle \stackrel{\tau}{\longrightarrow} \langle T', \sigma' \rangle$.

Footnote 1: A process is a tuple $P = \langle Agent, Action, Status \rangle$ where $Agent$ is specified by the set of data types it recognizes, $Action$ is an expression in the language of the $Agent$ describing a computation activity, and $Status$ is the status of the computation performed by $Agent$.

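A machine independent computation is a sequence of transitions

$\langle T_1, \sigma_1 \rangle \stackrel{\tau_1}{\longrightarrow} \langle T_2, \sigma_2 \rangle \stackrel{\tau_2}{\longrightarrow} \ldots \stackrel{\tau_{i-1}}{\longrightarrow} \langle T_i, \sigma_i \rangle \stackrel{\tau_i}{\longrightarrow} \langle T_{i+1}, \sigma_{i+1} \rangle \stackrel{\tau_{i+1}}{\longrightarrow} \ldots$

The initial state of a computation is a state whose variables and constants satisfy an initial condition $\alpha : D \cup \{\uparrow, \downarrow\} \to \{true, false\}$; the final state of a computation is a state whose variables and constants satisfy a final condition $\omega : D \cup \{\uparrow, \downarrow\} \to \{true, false\}$.

Note: The transformation performed by $\tau$ applied on $\langle T, \sigma \rangle$ may define new types, declare new variables, and construct a new state. The state $\langle T_1, \sigma_1 \rangle$ of the computation

$\langle T_1, \sigma_1 \rangle \stackrel{\tau_1}{\longrightarrow} \langle T_2, \sigma_2 \rangle \stackrel{\tau_2}{\longrightarrow} \ldots \stackrel{\tau_{i-1}}{\longrightarrow} \langle T_i, \sigma_i \rangle \stackrel{\tau_i}{\longrightarrow} \langle T_{i+1}, \sigma_{i+1} \rangle \stackrel{\tau_{i+1}}{\longrightarrow} \ldots$

is initial if $\alpha(\sigma_1) = true$; this computation terminates if there is a state $\langle T_i, \sigma_i \rangle$ such that for every transition $\tau$, $\tau(\langle T_i, \sigma_i \rangle) = \langle T_i, \sigma_i \rangle$ and $\omega(\sigma_i) = true$.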

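Fact 2: The set $\Gamma$ of transitions over a universe of types, $T$, a universe of data and control variables and constants, $D \cup C$, and a universe of computation processes, $P$, forms an associative semigroup with unity.

Proof: we define the composition of two transitions $\tau_1$ and $\tau_2$, denoted $\tau_1 \circ \tau_2$, by $\langle T_1, \sigma_1 \rangle \stackrel{\tau_1 \circ \tau_2}{\longrightarrow} \langle T_3, \sigma_3 \rangle$ iff $\langle T_1, \sigma_1 \rangle \stackrel{\tau_1}{\longrightarrow} \langle T_2, \sigma_2 \rangle \stackrel{\tau_2}{\longrightarrow} \langle T_3, \sigma_3 \rangle$. It is obvious that $\tau_1 \circ (\tau_2 \circ \tau_3) = (\tau_1 \circ \tau_2) \circ \tau_3$. Now we observe that the transition $\iota$ defined by: $\forall T \in UT, D \in DV, C \in CV, L \in UP \wedge \sigma : D \cup C \to V(T) \cup L$. $\langle T, \sigma \rangle \stackrel{\iota}{\longrightarrow} \langle T, \sigma \rangle$ is both a left and a right unity for the transition operation $\circ$, i.e., $\forall \tau \in \Gamma$. $\iota \circ \tau = \tau \circ \iota = \tau$. Hence, $\mathcal{T} = \langle \Gamma, \circ, \iota \rangle$ is an associative semigroup with unity.

The semantics algebra $Sem(R)$ of a particular language is obtained by interpreting each parameter $P \in N$ as a set of transitions of the semigroup $\mathcal{T}_P = \langle \Gamma(P), \circ, \iota \rangle$ where $\tau \in \Gamma(P)$ represents a type, a state, or a state mapping, while each specification rule $r \in R$ is interpreted as an algebraic operation on transitions. That is, for $r \in R$, $r : A_0 = t_0 A_1 t_1 \ldots t_{n-1} A_n t_n$, $[[A_i]]$, $0 \le i \le n$, is a set of transitions $[[A_i]] = [\langle T_i, \sigma_i \rangle \stackrel{\tau_i}{\longrightarrow} \langle T_i', \sigma_i' \rangle]$, and for every $w_i \in [A_i]$, $0 \le i \le n$, and $w_0 = [r](w_1, \ldots, w_n) \in [A_0]$, the transition $[[r]]$ performed by $w_0$ is expressed in terms of the transitions $[[r_i]]$ performed by the components $w_i$ of $w_0$, $1 \le i \le n$. Hence, $[[r]] : [[A_1]] \times \ldots \times [[A_n]] \to [[A_0]]$, $[[r]] = E_r([[r_1]], \ldots, [[r_n]], \circ, \iota)$, where $E_r$ is a particular composition of the transitions $[[r_i]]$, $1 \le i \le n$, using the laws $\circ$ and $\iota$. That is, $Sem(R) = \langle \{\mathcal{T}_P; P \in N\}, \{E_r; r \in R\}, \circ, \iota \rangle$.

To construct the semantics algebra we first observe that the computational interpretation of the parameters $N$ used in $R$ is one of: type, state, and mapping. Type parameters $t \in N$ represent families of sets $V(t) = ([[t]])$, $t \in N$; state parameters $\sigma \in N$ represent functions mapping sets of names (of variables and constants), $D$, to values in the universe of their types, $\sigma : D \to ([[t]])$, $t \in N$; mapping parameters $\tau \in N$ represent state-transitions, mapping functions $\sigma_t : D_t \to [[t]]$, $t \in N$, into functions $\sigma'_{t'} : D'_{t'} \to [[t']]$, $t' \in N$. These interpretations of the parameters in $N$ can be unified as transitions, as follows:

$[[p]] = \langle p, \emptyset \rangle \stackrel{\iota}{\longrightarrow} \langle p, \emptyset \rangle$, if $p$ denotes a type;
$[[p]] = \langle t, \sigma : D \to V(t) \rangle \stackrel{\iota}{\longrightarrow} \langle t, \sigma : D \to V(t) \rangle$, if $p$ denotes a state;
$[[p]] = \langle t, \sigma : D \to V(t) \rangle \stackrel{\tau}{\longrightarrow} \langle t', \sigma' : D' \to V(t') \rangle$, if $p$ denotes a transformation.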
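The semigroup structure stated in Fact 2 can be illustrated with a minimal Python sketch. It assumes, for illustration only, that a state is a dict from names to integers and that a transition is a total function on states; the type component and the control variables are left out.

```python
from typing import Callable, Dict

State = Dict[str, int]
Transition = Callable[[State], State]


def compose(t1: Transition, t2: Transition) -> Transition:
    """Composition in application order: perform t1 first, then t2."""
    return lambda state: t2(t1(state))


def unit(state: State) -> State:
    """The unit transition: leaves every state unchanged."""
    return state


def increment(state: State) -> State:
    return {**state, "x": state["x"] + 1}


def double(state: State) -> State:
    return {**state, "x": state["x"] * 2}


start = {"x": 1}
left = compose(increment, compose(double, increment))(start)
right = compose(compose(increment, double), increment)(start)
```

Both `left` and `right` are `{"x": 5}`, and composing with `unit` on either side leaves a transition's effect unchanged. This checks associativity and the unit law on one sample only; it is not a proof.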

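The rules $r \in R$, $r : A_0 = s_0 A_1 s_1 \ldots s_{i-1} A_i s_i \ldots s_{n-1} A_n s_n$, are interpreted as computation steps defined as follows:

1. If $n = 0$ then $rhs(r)$ is a lexical element and $[[r]] : \emptyset \to [[A_0]]$ is defined by the following formulas: if $rhs(r)$ is a lexical constant, $c$, and $lhs(r) = t$ then $[[r]](\emptyset) = \langle c, \emptyset \rangle \stackrel{t}{\mapsto} \langle t, \emptyset \rangle$; if $rhs(r)$ is a lexical variable, $X$, and $lhs(r)$ is a syntax category, $V$, then $[[r]](\emptyset) = \langle \emptyset, \{x \mapsto @ \mid x \in [X]\} \rangle \stackrel{V}{\mapsto} \langle V, \{x \mapsto @_V \mid x \in [X]\} \rangle$. For example, if $r$ is $V = id$ then $[[r]](\emptyset) = \langle \emptyset, \{x \mapsto @ \mid x \in [id]\} \rangle \stackrel{V}{\mapsto} \langle V, \{x \mapsto @_V \mid x \in [id]\} \rangle$; if $r$ is $t = integer$ then $[[r]](\emptyset) = \langle integer, \emptyset \rangle \stackrel{t}{\mapsto} \langle t, \emptyset \rangle$; if $r$ is $LP = ($ then $[[r]](\emptyset) = \langle (, \emptyset \rangle \stackrel{LP}{\mapsto} \langle LP, \emptyset \rangle$.

2. For $n \ge 1$ the interpretation of $[[r]]$ is obtained by composing one or more transitions representing operations of type construction, state definition, and state transformation.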

In conclusion, the parameters used in the specification rules $r \in R$ are interpreted as transitions representing sets of values, sets of functions, and sets of operators, i.e., function transformations. The specification rules are then semantically interpreted as transitions representing algebraic operations of set, function, and operator construction. A particular language is defined by a particular set of specification rules $R$ whose parameters are semantically interpreted as a particular family of sets of values, state functions, and operators, and whose specification rules are interpreted as a particular set of algebraic operations of set, function, and operator construction.

Each transition of the semantic domains of $Sem(R)$ performs at least one of the actions: type definition, state variable declaration, and state mapping. These actions are of different natures. Type definitions use type constructors to define new kinds of values in the universe of types, hence they can be mathematically expressed by products and co-products of sets; state variable declarations choose appropriate names and bind them to values of given types, hence they are expressed by functions from a universe of names to a universe of values; state transitions change the values of a state to the values of another state, and are usually expressed by operators that map functions into functions. Though the nature of these actions is different, they are used to express the semantics of specification rules by lumping them together, which results in mathematical difficulties for semantic manipulation. We handle these difficulties by showing these actions explicitly in the algebraic operations on transitions. That is, the semantic domain $[[lhs(r)]]$ of the computations denoted by constructs $w \in [lhs(r)]$ is specified by the set of transitions $\{(T_w, D_w, \uparrow w) \stackrel{\tau_w}{\mapsto} (T_w', D_w', w \downarrow)\}$, where $\tau_w$ is the transition defined by the specification rule of the construct $w$, $T_w$ and $D_w$ are the set of types and variable bindings, respectively, accessible to the transition $\tau_w$ in order to perform the computation expressed by $w$, and $T_w'$ and $D_w'$ are $T_w$ and $D_w$, respectively, enriched with (or deprived of) the types and variable bindings created (or destroyed) by the computation expressed by $w$. A transformation that operates on variables in $D_w$ of types $T_w$ and does not create (or destroy) types or variables is expressed by a transition of the form $\{(T_w, D_w, \uparrow w) \stackrel{\tau_w}{\mapsto} (T_w, D_w, w \downarrow)\}$. If $\tau_w$ does not create (destroy) types and variable bindings and $T_w$ coincides with the predefined types then the computation is expressed by a transition of the form $\{(D_w, \uparrow w) \stackrel{\tau_w}{\mapsto} (D_w, w \downarrow)\}$.

$[[St]]$ =

$\forall w \in [St]$, if $w = id := e$ then
$[[St]] \cup ((D, \uparrow id := e) \stackrel{:=}{\mapsto} (D', id := e \downarrow))$, where $\sigma'(x) = \sigma(x)$ if $x \neq id$ else $[[e]]$

$\forall w \in [St]$, if $w = \text{if } e \text{ then } s_1 \text{ else } s_2 \text{ fi}$ and $[[e]] = true$ then
$[[St]] \cup ((D, \uparrow \text{if } e \text{ then } s_1 \text{ else } s_2 \text{ fi}) \stackrel{e}{\mapsto} (D', \text{if } e \text{ then} \uparrow s_1 \text{ else } s_2 \text{ fi})$
$\circ\ (D', \text{if } e \text{ then} \uparrow s_1 \text{ else } s_2 \text{ fi}) \stackrel{s_1}{\mapsto} (D'', \text{if } e \text{ then } s_1 \downarrow \text{else } s_2 \text{ fi})$
$\circ\ (D'', \text{if } e \text{ then } s_1 \downarrow \text{else } s_2 \text{ fi}) \mapsto (D'', \text{if } e \text{ then } s_1 \text{ else } s_2 \text{ fi} \uparrow))$

$\forall w \in [St]$, if $w = \text{if } e \text{ then } s_1 \text{ else } s_2 \text{ fi}$ and $[[e]] = false$ then
$[[St]] \cup ((D, \uparrow \text{if } e \text{ then } s_1 \text{ else } s_2 \text{ fi}) \stackrel{e}{\mapsto} (D', \text{if } e \text{ then } s_1 \text{ else} \uparrow s_2 \text{ fi})$
$\circ\ (D', \text{if } e \text{ then } s_1 \text{ else} \uparrow s_2 \text{ fi}) \stackrel{s_2}{\mapsto} (D'', \text{if } e \text{ then } s_1 \text{ else } s_2 \downarrow \text{fi})$
$\circ\ (D'', \text{if } e \text{ then } s_1 \text{ else } s_2 \downarrow \text{fi}) \mapsto (D'', \text{if } e \text{ then } s_1 \text{ else } s_2 \text{ fi} \uparrow))$

$\forall w \in [St]$, if $w = \text{while } e \text{ do } s \text{ od}$ and $[[e]] = true$ then
$[[St]] \cup ((D, \uparrow \text{while } e \text{ do } s \text{ od}) \stackrel{e}{\mapsto} (D', \text{while } e \text{ do} \uparrow s \text{ od})$
$\circ\ (D', \text{while } e \text{ do} \uparrow s \text{ od}) \stackrel{s}{\mapsto} (D'', \text{while } e \text{ do } s \downarrow \text{od})$
$\circ\ (D'', \text{while } e \text{ do } s \downarrow \text{od}) \mapsto (D'', \uparrow \text{while } e \text{ do } s \text{ od}))$

$\forall w \in [St]$, if $w = \text{while } e \text{ do } s \text{ od}$ and $[[e]] = false$ then
$[[St]] \cup ((D, \uparrow \text{while } e \text{ do } s \text{ od}) \stackrel{e}{\mapsto} (D', \text{while } e \text{ do } s \downarrow \text{od})$
$\circ\ (D', \text{while } e \text{ do } s \downarrow \text{od}) \mapsto (D', \text{while } e \text{ do } s \text{ od} \uparrow))$

Table 4: Semantic domain specification

We illustrate the construction of the algebra $Sem(R)$ using the specification rules given in Table 1, assuming that $[[Id]]$ is the universe of names, $[[Ex]]$ is the universe of values, and $[[Tp]]$ is the set of types. For $x \in [Id]$, $e \in [Ex]$ and $t \in [Tp]$ we denote by $\tau_x$, $\tau_e$, and $\tau_t$ the corresponding transitions in $[[Id]]$, $[[Ex]]$, and $[[Tp]]$, respectively. In addition, $[[e]] \in [[Ex]]$ is the value of the expression $e$ computed in the current state, i.e., obtained by replacing each variable that occurs in $e$ with its value in the current state; $[[St]]$ is shown in Table 4; $[[Sl]]$, $[[Dc]]$, $[[Dl]]$, $[[Bl]]$ are shown in Table 5. The semantics algebra of the language specified by the rules in Table 1 is $Sem(R) = \langle \{[[Id]], [[Ex]], [[Tp]], [[St]], [[Sl]], [[Dc]], [[Dl]], [[Bl]]\}, \{[[r_1]], [[r_2]], [[r_3]], [[r_4]], [[r_5]], [[r_6]], [[r_7]], [[r_8]], [[r_9]], [[r_{10}]], \circ, \iota\} \rangle$, where the operations $[[r_1]]$ through $[[r_{10}]]$ are in Table 6.

Fact 3: $Sem(R)$ is embedded in the semigroup $\langle \{[[A]] \mid A \in N\}, \circ, \iota \rangle$ by derived operations.

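$[[Sl]]$ =
$\forall w \in [Sl]$, if $w = w'; w''$ and $w' \in [St]$, $w'' \in [Sl]$ then
$[[Sl]] \cup ((D, \uparrow w'; w'') \stackrel{w'}{\mapsto} (D', w' \downarrow; w'') \circ (D', w' \downarrow; w'') \mapsto (D', w'; \uparrow w''))$

$[[Dc]]$ =
$\forall w \in [Dc]$, if $w = id; t$ and $id \in [Id]$, $t \in [Tp]$ then
$[[Dc]] \cup ((D, \uparrow id : t) \stackrel{w}{\mapsto} (D \cup \{id \mapsto @_t\}, id : t \uparrow))$

$[[Dl]]$ =
$\forall w \in [Dl]$, if $w = w'; w''$ and $w' \in [Dc]$, $w'' \in [Dl]$ then
$[[Dl]] \cup ((D, \uparrow w'; w'') \stackrel{w'}{\mapsto} (D', w' \downarrow; w'') \circ (D', w' \downarrow; w'') \mapsto (D', w'; \uparrow w''))$

$[[Bl]]$ =
$\forall w \in [Bl]$, if $w = \text{begin } d; s \text{ end}$, $d \in [Dl]$, $s \in [Sl]$ then
$[[Bl]] \cup ((D, \uparrow \text{begin } d; s \text{ end}) \mapsto (D, \text{begin} \uparrow d; s \text{ end})$
$\circ\ (D, \text{begin} \uparrow d; s \text{ end}) \stackrel{d}{\mapsto} (D', \text{begin } d \downarrow; s \text{ end})$
$\circ\ (D', \text{begin } d \downarrow; s \text{ end}) \mapsto (D', \text{begin } d; \uparrow s \text{ end})$
$\circ\ (D', \text{begin } d; \uparrow s \text{ end}) \stackrel{s}{\mapsto} (D'', \text{begin } d; s \downarrow \text{end})$
$\circ\ (D'', \text{begin } d; s \downarrow \text{end}) \mapsto (D, \text{begin } d; s \text{ end} \uparrow))$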

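Table 5: Semantic domain specification

$[[r_1]]$:    $[[Id]] \times [[Ex]] \to [[St]]$;              $[[r_1]](\tau_x, \tau_e) = \langle x \mapsto \sigma(x), \uparrow e \rangle \stackrel{e}{\mapsto} \langle x \mapsto [[e]], e \downarrow \rangle$
$[[r_2]]$:    $[[Ex]] \times [[St]] \times [[St]] \to [[St]]$; $[[r_2]](\tau_e, \tau_1, \tau_2) = \tau_1$ if $[[e]] = true$ else $\tau_2$
$[[r_3]]$:    $[[Ex]] \times [[St]] \to [[St]]$;              $[[r_3]](\tau_e, \tau) = \tau \circ [[r_3]]((\tau_e \circ \tau), \tau)$ if $[[e]] = true$ else $\iota$
$[[r_4]]$:    $[[Bl]] \to [[St]]$;                            $[[r_4]](\tau) = \tau$
$[[r_5]]$:    $[[St]] \to [[Sl]]$;                            $[[r_5]](\tau) = \tau$
$[[r_6]]$:    $[[St]] \times [[Sl]] \to [[Sl]]$;              $[[r_6]](\tau_1, \tau_2) = \tau_1 \circ \tau_2$
$[[r_7]]$:    $[[Id]] \times [[Tp]] \to [[Dc]]$;              $[[r_7]](\tau_x, \tau_t) = \langle x \mapsto \sigma(x), \ldots \rangle \stackrel{t}{\mapsto} \langle x \mapsto @_t, \ldots \rangle$
$[[r_8]]$:    $[[Dc]] \to [[Dl]]$;                            $[[r_8]](\tau) = \tau$
$[[r_9]]$:    $[[Dc]] \times [[Dl]] \to [[Dl]]$;              $[[r_9]](\tau_1, \tau_2) = \tau_1 \circ \tau_2$
$[[r_{10}]]$: $[[Dl]] \times [[Sl]] \to [[Bl]]$;              $[[r_{10}]](\tau_1, \tau_2) = \tau_1 \circ \tau_2$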

Table 6: Semantics interpretation of the rules in Table 1

A component relation, $\le_{Sem}$, on $Sem(R)$ can now be defined by: for all transitions $\tau_1 = (T_1, D_1, \uparrow w) \stackrel{\tau_w}{\mapsto} (T_1', D_1', w \downarrow)$ and $\tau_2 = (T_2, D_2, \uparrow w') \stackrel{\tau_{w'}}{\mapsto} (T_2', D_2', w' \downarrow)$, $\tau_1 \le_{Sem} \tau_2$ iff $\tau_{w'} = E(\tau_w, \circ, \iota)$. Note that if $w' \in w[\,]$ then $\tau_w \le_{Sem} \tau_{w'}$. If we denote by $w(\tau)$ the expression of the computation performed by a transition $\tau$ then one can see that $w(\tau_1) \le_{Syn} w(\tau_2)$ implies $\tau_1 \le_{Sem} \tau_2$ and vice-versa.
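The operations of Table 6 can be read as combinators on state transformers. The sketch below is a simplification under stated assumptions: a state is a Python dict from names to integers, the control markers and the type component are omitted, expressions are modelled as functions from states to values, and the recursive definition of $[[r_3]]$ is unfolded as a loop. The function names `sem_r1`, `sem_r2`, `sem_r3`, and `sem_r6` are hypothetical.

```python
from typing import Callable, Dict

State = Dict[str, int]
Transition = Callable[[State], State]
Expression = Callable[[State], int]


def sem_r1(name: str, expr: Expression) -> Transition:
    """Assignment: bind name to the value of expr in the current state."""
    return lambda state: {**state, name: expr(state)}


def sem_r2(cond: Expression, then_t: Transition, else_t: Transition) -> Transition:
    """Conditional: choose a branch from the value of cond."""
    return lambda state: then_t(state) if cond(state) else else_t(state)


def sem_r3(cond: Expression, body: Transition) -> Transition:
    """Loop: repeat body while cond holds; the unit transition otherwise."""
    def run(state: State) -> State:
        while cond(state):
            state = body(state)
        return state
    return run


def sem_r6(first: Transition, rest: Transition) -> Transition:
    """Sequencing: composition of the two transitions, first then rest."""
    return lambda state: rest(first(state))


# while n do s := s + n; n := n - 1 od
body = sem_r6(sem_r1("s", lambda st: st["s"] + st["n"]),
              sem_r1("n", lambda st: st["n"] - 1))
loop = sem_r3(lambda st: st["n"] != 0, body)
final = loop({"n": 3, "s": 0})
```

Starting from `n = 3` and `s = 0`, the loop body runs three times and `final` is `{"n": 0, "s": 6}`.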

4 Galois connection of a programming language

To construct the mappings $\mathcal{L}_R : Sem(R) \to Syn(R)$ and $\mathcal{E}_R : Syn(R) \to Sem(R)$ used in the language definition we first observe that: $Sem(R)$ is freely generated by $Sem_0(R) = \{\langle c, \emptyset \rangle \stackrel{t}{\mapsto} \langle t, \emptyset \rangle \mid \exists r \in R, lhs(r) = t, rhs(r) = c\} \cup \{\langle \emptyset, \{x \mapsto @ \mid x \in [X]\} \rangle \stackrel{V}{\mapsto} \langle V, \{x \mapsto @_V \mid x \in [X]\} \rangle \mid \exists r \in R, lhs(r) = V, rhs(r) = X\}$; $Syn(R)$ is the unique term algebra generated by the signature $\{[r] \mid r \in R\}$ whose generators are the lexical elements in $Syn(R)$.

The mapping $\mathcal{L}_R$ is defined as follows:

1. We first define the function $\mathcal{L}_R^0 : Sem_0(R) \to Syn(R)$ by the equality: $\forall \tau \in Sem_0(R) \wedge \tau = \langle w, \sigma \rangle \stackrel{w'}{\mapsto} \langle w', \sigma' \rangle$ we set $\mathcal{L}_R^0(\tau) = w'$. Note that by construction $w'$ is the token name of the class of lexical elements specified by $r \in R$. Hence, for $\tau_1, \tau_2 \in Sem_0(R)$ defined by $\langle w_1, \sigma_1 \rangle \mapsto \langle w_1', \sigma_1' \rangle$ and $\langle w_2, \sigma_2 \rangle \mapsto \langle w_2', \sigma_2' \rangle$, $\tau_1 \neq \tau_2$ implies $w_1' \neq w_2'$. Thus, $\mathcal{L}_R^0$ is a surjection.

2. The mapping $\mathcal{L}_R : Sem(R) \to Syn(R)$ is now defined by the equality:
$\mathcal{L}_R(\tau) = \mathcal{L}_R^0(\tau)$, if $\tau \in Sem_0(R)$;
$\mathcal{L}_R(\tau) = [r](\mathcal{L}_R(\tau_1), \ldots, \mathcal{L}_R(\tau_n))$, if $\tau = [[r]](\tau_1, \ldots, \tau_n)$.

3. Since $Sem(R)$ and $Syn(R)$ are similar and moreover $Sem(R)$ is freely generated by $Sem_0(R)$, the unique extension lemma [BL69] ensures that this function is the unique extension of $\mathcal{L}_R^0 : Sem_0(R) \to Syn(R)$ to the homomorphism $\mathcal{L}_R : Sem(R) \to Syn(R)$. Note that since $\mathcal{L}_R^0$ is a surjection, $\mathcal{L}_R$ is a surjection as well.

The mapping $\mathcal{E}_R$ is defined as follows:

1. For each lexical element $w \in Syn_0(R)$, let $r$ be the rule that specifies $w$, that is, $lhs(r) = t$ for some token $t$, and $rhs(r)$ matches $w$. If $rhs(r)$ is a constant then there is a transition $\tau \in Sem_0(R)$, $\tau = \langle w, \emptyset \rangle \stackrel{rhs(r)}{\longrightarrow} \langle lhs(r), \emptyset \rangle$ and $\mathcal{E}_R^0(w) = \tau$; if $rhs(r)$ is a lexical variable then there is a transition $\tau \in Sem_0(R)$, $\tau = \langle \emptyset, \{x \mapsto @ \mid x \in [rhs(r)]\} \rangle \stackrel{lhs(r)}{\longrightarrow} \langle lhs(r), \{x \mapsto @_{lhs(r)} \mid x \in [rhs(r)]\} \rangle$ such that $w \in [rhs(r)]$ and $\mathcal{E}_R^0(w) = \tau$. Note that this function maps classes of lexical elements into transitions, therefore it defines an equivalence relation on $Syn_0(R)$. That is, two lexical elements $w, w' \in Syn_0(R)$ are equivalent, $w \equiv w'$, if $\mathcal{E}_R^0(w) = \mathcal{E}_R^0(w')$. Hence, for $\tau \in Sem_0(R)$, $Syn_0(R)/\tau = \{w \in Syn_0(R) \mid \mathcal{E}_R^0(w) = \tau\}$ is an equivalence class on $Syn_0(R)$. Denote by $Syn_0(R)/\!\equiv$ the quotient of $Syn_0(R)$ with respect to $\equiv$. Then $\mathcal{E}_R^0 : Syn_0(R)/\!\equiv\ \to Sem(R)$ defined by $\mathcal{E}_R^0(Syn_0(R)/\tau) = \tau$ is an injection.

2. Now the mapping $\mathcal{E}_R : Syn(R) \to Sem(R)$ is defined by the equality:
$\mathcal{E}_R(w) = \mathcal{E}_R^0(w)$, if $w \in Syn_0(R)$;
$\mathcal{E}_R(w) = [[r]](\mathcal{E}_R(w_1), \ldots, \mathcal{E}_R(w_n))$, if $w = [r](w_1, \ldots, w_n)$.

3. Since $Syn(R)$ is freely generated by $Syn_0(R)$ and $Syn(R)$ is similar to $Sem(R)$, the function $\mathcal{E}_R^0 : Syn_0(R) \to Sem(R)$ has a unique extension to the homomorphism $\mathcal{E}_R : Syn(R) \to Sem(R)$. The equivalence $\equiv$ on $Syn_0(R)$ defined by $\mathcal{E}_R^0 : Syn_0(R) \to Sem(R)$ is extended to an equivalence $\equiv$ on $Syn(R)$ defined by: $\forall w, w' \in Syn(R)$. $w \equiv w'$ iff $\mathcal{E}_R(w) = \mathcal{E}_R(w')$. That is, $\mathcal{E}_R : Syn(R)/\!\equiv\ \to Sem(R)$ is an injection.
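The unique-extension construction used for both mappings can be sketched as structural recursion over terms. In the Python sketch below the term encoding (a string for a lexical element, a tuple headed by a rule name otherwise), the function name `extend`, and the small arithmetic example are all illustrative assumptions; the example is not the language of Table 1.

```python
from typing import Any, Callable, Dict


def extend(on_generators: Callable[[str], Any],
           operations: Dict[str, Callable[..., Any]]) -> Callable[[Any], Any]:
    """Extend a map on generators to a homomorphism on all terms."""
    def hom(term: Any) -> Any:
        if isinstance(term, str):       # lexical element: use the base map
            return on_generators(term)
        rule, *args = term              # composite term [r](w1, ..., wn)
        return operations[rule](*(hom(arg) for arg in args))
    return hom


# Hypothetical semantic operations, one per syntax operation.
evaluate = extend(int, {"add": lambda a, b: a + b, "neg": lambda a: -a})

value = evaluate(("add", "2", ("neg", "5")))
same_meaning = evaluate(("add", "1", "2")) == evaluate(("add", "2", "1"))
```

Here `value` is `-3`, and the two distinct terms `("add", "1", "2")` and `("add", "2", "1")` receive the same semantic value, so they fall in the same class of the equivalence induced by the evaluation map.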

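Fact 4: $\mathcal{L}_R$ and $\mathcal{E}_R$ define a Galois connection.

Proof: Let $\langle P, \le_P \rangle$ and $\langle Q, \le_Q \rangle$ be two preorders (i.e., $\le_P$ and $\le_Q$ are reflexive and transitive). Then the mappings $f : P \to Q$ and $g : Q \to P$ form a Galois connection [Bir48] if $\forall p \in P$ and $\forall q \in Q$, $f(p) \le_Q q$ iff $p \le_P g(q)$. The relations $\le_{Sem}$ on $Sem(R)$ and $\le_{Syn}$ on $Syn(R)$ have been defined component-wise. Since $\le_{Sem}$ and $\le_{Syn}$ are reflexive and transitive, $\langle Sem(R), \le_{Sem} \rangle$ and $\langle Syn(R), \le_{Syn} \rangle$ are preorders. Since $\mathcal{L}_R$ is a surjection and $\mathcal{E}_R$ is an injection it follows that $\mathcal{L}_R \circ \mathcal{E}_R = 1_{Sem(R)}$. Assume that $\mathcal{L}_R(\tau) \le_{Syn} w$, i.e., $\mathcal{L}_R(\tau) = w'$ and $w \in w'[\,]$. This means that there is $r \in R$ such that $w = [r](\ldots, w', \ldots)$. Hence, $\mathcal{E}_R(w) = [[r]](\ldots, \mathcal{E}_R(\mathcal{L}_R(\tau)), \ldots)$. Since $\mathcal{L}_R \circ \mathcal{E}_R = 1_{Sem(R)}$ we have $\mathcal{E}_R(w) = [[r]](\ldots, \tau, \ldots)$, i.e., $\tau \le_{Sem} \mathcal{E}_R(w)$. Now assume that $\tau \le_{Sem} \mathcal{E}_R(w)$ and let $w' = \mathcal{L}_R(\tau)$. Then there is $r \in R$ such that $w = [r](\ldots, w', \ldots)$. That is, $w \in w'[\,]$ and thus $w' = \mathcal{L}_R(\tau) \le_{Syn} w$. Hence, $\forall \tau \in Sem(R)$ and $\forall w \in Syn(R)$ we have $\mathcal{L}_R(\tau) \le_{Syn} w$ iff $\tau \le_{Sem} \mathcal{E}_R(w)$, and thus $\mathcal{L}_R$ and $\mathcal{E}_R$ define a Galois connection.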
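The defining condition of a Galois connection used in the proof can be checked exhaustively on finite preorders. The Python sketch below does this for a toy pair of maps on small integer ranges; the helper `is_galois_connection` and the choice of maps are assumptions for illustration and are unrelated to the mappings of the paper.

```python
from typing import Callable, Iterable


def is_galois_connection(P: Iterable[int], Q: Iterable[int],
                         f: Callable[[int], int], g: Callable[[int], int],
                         leq_p: Callable[[int, int], bool],
                         leq_q: Callable[[int, int], bool]) -> bool:
    """Check f(p) <=_Q q iff p <=_P g(q) for all p in P and q in Q."""
    Q = list(Q)
    return all(leq_q(f(p), q) == leq_p(p, g(q)) for p in P for q in Q)


def leq(a: int, b: int) -> bool:
    return a <= b


def ceil_half(p: int) -> int:
    return -(-p // 2)   # ceiling of p / 2 for integers


def floor_half(p: int) -> int:
    return p // 2


def twice(q: int) -> int:
    return 2 * q


good = is_galois_connection(range(20), range(10), ceil_half, twice, leq, leq)
bad = is_galois_connection(range(20), range(10), floor_half, twice, leq, leq)
```

The pair `ceil_half`, `twice` satisfies the condition on every tested pair, so `good` is `True`. Replacing the ceiling by the floor breaks it (for example at `p = 3`, `q = 1`), so `bad` is `False`.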

5 Programming language implementation

The syntax and semantics of a programming language are both algebras embedded in semigroups. This allows their treatment by the same approach and creates a mathematical framework for compiler generation from specifications. In addition, the semantics algebra can be seen as a semantics expression language, thus creating a mathematical framework for generating compiler back-ends from specifications. The Galois connection of a programming language provides the mathematical foundation for the use of homomorphisms as an alternative to automata for machine computation and can be employed as a formal criterion for compiler correctness. Since homomorphism computation is naturally parallel and compositional, this provides a foundation for system software development for parallel architectures.

The notation used to manipulate $Sem(R)$ defines a standard machine independent language called a semantics expression language, SEL. SEL can itself be seen as an algebraic language, i.e., $SEL = \langle SELSyn, SELSem, \mathcal{L}_{SEL} : SELSem \to SELSyn \rangle$ where $SELSyn$ is the syntax algebra of SEL, $SELSem$ is the semantics algebra of SEL, and there is a homomorphism $\mathcal{E}_{SEL} : SELSyn \to SELSem$ such that $\mathcal{L}_{SEL} \circ \mathcal{E}_{SEL} = 1_{SELSem}$ and $\mathcal{L}_{SEL}$ and $\mathcal{E}_{SEL}$ form a Galois connection.

The expression in $SELSyn$ of the transition semantics $Sem(R)$ of a language $L(R)$ defines an isomorphic image of $Sem(R)$ embedded in $SELSyn$. For a given set $R$ of specification rules which use parameters given in the set $N$, the embedding $Embed_{SEL} : Sem(R) \to SELSyn$ is constructed by the following procedure:

1. For each parameter $p \in N$ let $SEL(p)$ be a SEL macro-operation that specifies the SEL constructs denoting transitions $\tau_p \in [[p]]$, called the images of $p$ in SEL. If $p$ is parameterized in terms of $p_1, p_2, \ldots, p_k \in N$ then $SEL(p)$ is parameterized in terms of SEL constructs specified by $SEL(p_i)$, $1 \le i \le k$. We denote these parameters by $@_i$, $1 \le i \le k$.

2. For each specification rule $r \in R$ let $SEL(r) = \langle def(r), dec(r), app(r) \rangle$ be a tuple of three SEL macro-operations specifying SEL expressions that denote transitions $\tau_r \in [[r]]$. Here $def(r)$ specifies SEL constructs employed to denote types used by $\tau_r$, $dec(r)$ specifies SEL constructs employed to denote states used by $\tau_r$, and $app(r)$ specifies SEL constructs employed to denote state transitions performed by $\tau_r$. Since type, state, and transition are optional actions in $\tau_r$, any of $def(r)$, $dec(r)$, $app(r)$ may be an empty macro-operation. If $r$ is parameterized in terms of $p_1, p_2, \ldots, p_n \in N$ then $def(r)$, $dec(r)$, $app(r)$ are parameterized in terms of SEL constructs specified by $SEL(p_i)$, $1 \le i \le n$. We denote these parameters by $@_i$, $1 \le i \le n$.

3. Let $M_{SEL}$ be a SEL macro-processor that takes as input the SEL macro-operations $SEL(p)$, $p \in N$, $def(r)$, $dec(r)$, $app(r)$, $r \in R$, and the SEL constructs used as parameters, and expands them into the SEL constructs specified by the SEL macro-operations.

4. $Embed_{SEL}(Sem(R)) = \langle \{M_{SEL} : SEL(p) \times SELSyn \to SELSyn \mid p \in N\}, \{M_{SEL} : SEL(r) \times SELSyn \to SELSyn \mid r \in R\} \rangle$ is a subalgebra of $SELSyn$ whose carrier is the family $\{M_{SEL}(SEL(p)(w_1, w_2, \ldots, w_k)) \mid p \in N, w_1, w_2, \ldots, w_k \in SELSyn\}$ of $SELSyn$ constructs and whose operations are the set of derived (polynomial) operations $\{M_{SEL}(SEL(r)(w_1, w_2, \ldots, w_n)) \mid r \in R, w_1, w_2, \ldots, w_n \in SELSyn\}$ obtained by the process of macro-expansion performed by $M_{SEL}$.
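The macro-expansion step of the procedure above can be sketched in Python as positional substitution of `@i` parameters. The concrete SEL syntax is not given in this paper, so the templates `store(@1, @2)` and `loop(@1, @2)` and the function name `expand` are purely hypothetical.

```python
import re


def expand(macro: str, *args: str) -> str:
    """Replace each @i in the macro body by the i-th argument (1-based)."""
    def substitute(match: "re.Match[str]") -> str:
        index = int(match.group(1))
        if not 1 <= index <= len(args):
            raise IndexError(f"macro parameter @{index} has no argument")
        return args[index - 1]
    # Single pass over the macro body; substituted text is not rescanned.
    return re.sub(r"@(\d+)", substitute, macro)


# Hypothetical app(r) macro-operations for the assignment and loop rules.
APP_ASSIGN = "store(@1, @2)"
APP_WHILE = "loop(@1, @2)"

image = expand(APP_WHILE, "x", expand(APP_ASSIGN, "x", "sub(x, 1)"))
```

Here `image` is `loop(x, store(x, sub(x, 1)))`: the image of the loop is built from the already expanded image of its body, which mirrors how the derived operations in step 4 are obtained from macro-expansions of their arguments.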

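Definition 2: a translator implementing the language $L(R) = \langle Sem(R), Syn(R), \mathcal{L}_R : Sem(R) \to Syn(R) \rangle$ in SEL is a tuple $T(L) = (H_{SEL}, C_{SEL})$, $H_{SEL} : Sem(R) \to SELSem$ and $C_{SEL} : Syn(R) \to SELSyn$, that makes the diagram in Figure 1 commutative.

[Figure 1 (commutative diagram): the top row is $Sem(R) \stackrel{\mathcal{L}_R}{\longrightarrow} Syn(R) \stackrel{\mathcal{E}_R}{\longrightarrow} Sem(R)$; the bottom row consists of $SELSem$, $SELSyn$, $SELSem$ connected by $\mathcal{L}_{SEL}$ and $\mathcal{E}_{SEL}$; the vertical maps are $H_{SEL}$, $C_{SEL}$, $H_{SEL}$; the diagonal maps are $Embed$.]

Figure 1: Language implementation using SEL

Definition 3: a translator $T(L) = \langle H_{SEL}, C_{SEL} \rangle$ is a correct implementation of the language $L$ in SEL if $T(L)$ preserves the Galois connection of $L$ in SEL.

Fact 5: the tuple $T(L) = \langle Embed(R) \circ \mathcal{E}_{SEL}, \mathcal{E}_R \circ Embed(R) \rangle$ defines a correct implementation of $L(R)$ in SEL.

Proof: Let $(\tau_o, w_o) \in Sem(R) \times Syn(R)$ such that $w_o \in \mathcal{L}_R(\tau_o)$ and $\tau_o = \mathcal{E}_R(w_o)$. Mapping the tuple $(\tau_o, w_o)$ by the translator $T(L)$ we obtain $(\tau^o_{SEL}, w^o_{SEL})$ where $\tau^o_{SEL} = \mathcal{E}_{SEL}(Embed(\tau_o))$ and $w^o_{SEL} = Embed(\mathcal{E}_R(w_o))$. Since $\mathcal{E}_R(w_o) = \tau_o$ by assumption we have $w^o_{SEL} = Embed(\tau_o)$. That is, $\tau^o_{SEL} = \mathcal{E}_{SEL}(w^o_{SEL})$. On the other hand, since $\tau^o_{SEL} = \mathcal{E}_{SEL}(Embed(\tau_o))$ we have $\mathcal{L}_{SEL}(\tau^o_{SEL}) = \mathcal{L}_{SEL}(\mathcal{E}_{SEL}(Embed(\tau_o)))$. Since $\mathcal{L}_{SEL}$ and $\mathcal{E}_{SEL}$ form a Galois connection we have $\mathcal{L}_{SEL}(\mathcal{E}_{SEL}(Embed(\tau_o))) = Embed(\tau_o)$, that is, $\mathcal{L}_{SEL}(\tau^o_{SEL}) = w^o_{SEL}$, which completes the proof.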

6 Comparisons with other approaches

Earlier research that approaches compiler correctness using algebraic methods [Jan98] does not rely on an algebraic concept of a programming language in which the syntax and semantics of a language are similar algebras related by a Galois connection. In that research, a compiler is seen as a syntax to syntax mapping, $Comp : Syn_{SL} \to Syn_{TL}$, where SL stands for source language and TL stands for target language, and compiler correctness is expressed by the commutativity of the diagram in Figure 2, where $Sem_{SL}$, $Sem_{TL}$ are the semantics algebras of the source and target languages, respectively, and $Int_{SL}$, $Int_{TL}$, $Dec$, and $Enc$ are homomorphisms.

[Figure 2 (diagram): the top row is $Syn_{SL} \stackrel{Comp}{\longrightarrow} Syn_{TL}$; the vertical maps are $Int_{SL} : Syn_{SL} \to Sem_{SL}$ and $Int_{TL} : Syn_{TL} \to Sem_{TL}$; the bottom row connects $Sem_{SL}$ and $Sem_{TL}$ by $Dec$ and $Enc$.]

Figure 2: Compiler correctness diagram

Since the corners of the diagram in Figure 2 are not necessarily similar algebras, the mappings connecting them are not necessarily homomorphisms; therefore, although Janssen's analysis is not always correct, his conclusions are. Using embeddings [Rus98], as done in this paper, the difficulties signaled by Janssen are removed. We show this in Figure 3, the diagram of a correct compiler for Example 2 in [Jan98], obtained from Figure 1 by removing the SEL details.

[Figure 3 (diagram): the top row is $\{0\} \stackrel{\mathcal{L}_{SL}}{\longrightarrow} \{-0, +0\} \stackrel{\mathcal{E}_{SL}}{\longrightarrow} \{0\}$; the bottom row consists of three copies of $\{-0, +0\}$ connected by $\mathcal{L}_{TL}$ and $\mathcal{E}_{TL}$; the vertical maps are $H$, $C$, $H$; the diagonal maps are $Embed$.]

Figure 3: The diagram of a correct compiler for Example 2 in [Jan98]

Here we have: $\mathcal{L}_{SL}(0) = \{-0, +0\}$, $\mathcal{E}_{SL}(-0) = \mathcal{E}_{SL}(+0) = 0$, $\mathcal{L}_{TL}(-0) = \mathcal{E}_{TL}(-0) = -0$, $\mathcal{L}_{TL}(+0) = \mathcal{E}_{TL}(+0) = +0$, $Embed(0) = \{-0, +0\}$. One can see that the class of source language constructs $\{-0, +0\}$, representing the source meaning 0, is mapped into the target constructs $\{-0, +0\}$ and the source language meaning 0 is mapped into the target language $\{-0, +0\}$, as expected.

SEL images are similar to the attributes used in attribute grammars [KW94] for the semantics specification of programming language constructs. However, while attributes are properties of source language constructs, SEL images are well defined constructs of the SEL language. In addition, attributes are processed by tools that belong to the source language processing environment while images are processed by tools that belong to SEL.

References

[Bir48] G. Birkhoff. Lattice Theory. American Mathematical Society, 1948.
[BL69] R.M. Burstall and P.J. Landin. Programs and their proofs: an algebraic approach. Machine Intelligence, 4:17-43, 1969.
[Jan98] T.M.V. Janssen. Algebraic translations, correctness and algebraic compiler construction. Theoretical Computer Science, 199:25-56, 1998.
[KW94] U. Kastens and W.M. Waite. Modularity and reusability in attribute grammars. Acta Informatica, 31:601-627, 1994.
[Nau63] P. Naur. Revised report on the algorithmic language ALGOL 60. Communications of the ACM, 6(1):1-17, 1963.
[Rus98] T. Rus. Algebraic processing of programming languages. Theoretical Computer Science, 199:105-143, 1998.
