Rigal — a programming language for compiler ... - Springer Link

RIGAL - a Programming Language for Compiler Writing

Mikhail AUGUSTON
Institute of Mathematics and Computer Science
The University of Latvia
Rainis boulevard 29, Riga, Latvia, SU - 226250

Abstract.

A new programming language for compiler writing is described. The main data structures are atoms, lists and trees. The control structures are based on advanced pattern matching. All phases of compilation, including parsing, optimization and code generation, can be programmed in this language in short and readable form. A sample compiler written in RIGAL is presented.

1. Introduction

Programming language RIGAL is intended to be a tool for parsing (context checking, diagnosing and neutralization of errors included), for code optimization, code generation and static analysis of programs, as well as for programming of preprocessors and convertors.

Almost all the systems envisaged to solve the problems of compiler construction contain means to describe the context-free grammar of the source language. Earlier tools, like the Floyd-Evans language [1], work with the parsing stack. Parsing methods for limited grammar classes (usually LL(1) or LR(1)) are implemented in the languages and systems of later generations. Such systems as YACC (Johnson [2]), CDL-2 (Koster [3]), SHAG (Agamirzyan [4]) and many others make use of synchronous computations, i.e., different semantic actions, e.g., formation of tables, context checking, code generation, etc., are performed during parsing. Usually these actions are implemented by call of semantic subroutines, written in some universal programming language (e.g., in Pascal or C).


Attribute grammars advanced by Knuth [5] have greatly influenced the development of systems for compiler construction. Systems like SUPER (Serebryakov [6]), ELMA (Vooglaid, Lepp, Lijb [7]) and MUG2 (Wilhelm [9]) are based on the use of attribute grammars not only for parsing, but for code generation as well.

Pattern matching is a convenient tool for programming of parsing, optimization and code generation. The REFAL programming language [10], acknowledged for translator writing, may serve as a good example. The Vienna method for defining the semantics of programming languages [11] suggests the usage of labelled trees to represent the abstract syntax of programs. Representation of intermediate compilation results in tree form has become usual (see [12]).

Dependence of the control structures of a program on the data structures it works with is one of the basic principles in programming.

The recursive descent method could be considered an application of this dependence principle.

The above-mentioned ideas and methods were taken into account when creating the RIGAL language. The language possesses few basic notions. Data structures contain atoms, lists and trees. An advanced mechanism of pattern matching lies at the basis of the control structures.

The fact that RIGAL is a closed language makes RIGAL distinctive. That means that almost all the necessary computations and input-output can be executed by internal means, and there is no need to use external semantic subroutines. Therefore the portability of RIGAL programs to other computers is increased. Means for work with patterns and trees enable programming of both different parsing algorithms and code generation, including optimization phases.

The language supports the design of multipass translators. Trees are used as intermediate data. The language allows splitting the program into small modules (rules) and presents various means to arrange interaction of these modules. Pattern matching is used for parameter passing. RIGAL supports the attribute translation scheme, and easy implementation of synthesized and inherited attributes is possible. The problem of global attributes is solved by special usage of references.

Lexical analysis is a separate task and requires special language facilities for its description, as it is, for example, in the LEX/YACC [2] system. In the current implementation of RIGAL two scanners are included that accept the lexics of Pascal and RIGAL.

2. Implementation

RIGAL was designed and implemented in the Computing Center of Latvia University in 1987-1988. The first implementation was for the PDP-11 under RSX-11. At the present stage a RIGAL interpreter has been developed, and an optimizing compiler RIGAL -> Pascal has been implemented by means of RIGAL itself. The interpreter and the compiler have been ported to the VAX/VMS and IBM PC AT/MS DOS environments.

3. Lexical Rules

The text of a RIGAL program is a sequence of tokens: atoms (e.g., identifiers and integers), keywords (e.g., if, return), special symbols (e.g., +, ##), and names of variables and rules (e.g., $A, #L). Tokens may be surrounded by any number of blanks. A comment is any string of symbols that begins with two consecutive symbols '-' (minus). The end of the comment is the end of the line. For example,

   #Sum                     -- rule for addition of two numbers
      $N1                   -- the first number
      $N2                   -- the second number
      / return $N1 + $N2 /  -- return of the result
   ##
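To make the token classes above concrete, here is a minimal Python sketch of a scanner for them. It is not the RIGAL scanner mentioned in Section 1: the keyword set, the regular expression and the name `tokenize` are assumptions, integers are scanned unsigned (a leading minus comes out as a separate '-' token), and quoted atoms lose their quotes, since 'ABC' and ABC denote the same atom.

```python
import re

# Assumed keyword set: only keywords that appear in this paper's examples.
KEYWORDS = {"if", "elsif", "fi", "return", "fail", "forall", "in", "do",
            "od", "loop", "end", "break", "save", "load", "open"}

TOKEN_RE = re.compile(r"""
    (?P<comment>--[^\n]*)                # '--' starts a comment up to end of line
  | (?P<var>\$[A-Za-z][A-Za-z0-9_]*)     # variable name: $ followed by identifier
  | (?P<rule>\#[A-Za-z][A-Za-z0-9_]*)    # rule name: # followed by identifier
  | (?P<int>\d+)                         # unsigned integer (sign scanned as '-')
  | (?P<ident>[A-Za-z][A-Za-z0-9_]*)     # identifier: atom or keyword
  | (?P<quoted>'[^']*')                  # quoted atom
  | (?P<special>\#\#|;;|:=|->|!!|!\.|\+\+|<>|>=|<=|\(\.|\.\)|<\.|\.>|\(\*|\*\)
                |[-+*/;:.,()\[\]<>=])
  | (?P<blank>\s+)                       # blanks between tokens
""", re.VERBOSE)


def tokenize(text):
    """Split RIGAL-like source text into (kind, value) pairs."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected symbol {text[pos]!r} at offset {pos}")
        kind, value = m.lastgroup, m.group()
        pos = m.end()
        if kind in ("comment", "blank"):
            continue                            # comments and blanks are dropped
        if kind == "ident":
            kind = "keyword" if value in KEYWORDS else "atom"
        elif kind == "quoted":
            kind, value = "atom", value[1:-1]   # 'ABC' and ABC are the same atom
        elif kind == "int":
            kind, value = "atom", int(value)    # numerical atom
        tokens.append((kind, value))
    return tokens
```

Applied to the #Sum rule, the comments vanish and the rule reduces to ten tokens, starting with ('rule', '#Sum') and ending with ('special', '##').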


4. Data

4.1 Atoms

An atom is a string of symbols. If the atom is an identifier (the first symbol is a letter, followed by letters, digits or underscore symbols), it is written in the text of a RIGAL program directly:

   AABC   x25   total_number

Numerical atoms are integers, for instance, 2, 187, 0, -25. Some identifiers are reserved as keywords in the language; if they are used as RIGAL atoms, they should be quoted: 'if', 'return'. In other cases the atom is quoted, for instance, ':=', '+' or '1st'. It should be noted that 25 and '25' are different atoms: the latter is just a string of the symbols '2' and '5'. Besides, any atom which is an identifier can also be quoted: ABC and 'ABC' represent one and the same atom.

Two special atoms are distinguished in RIGAL. Atom NULL is frequently yielded as a result of different operations if something was incorrect in the process of computations. This atom also represents the empty list, the empty tree and the Boolean value "false". Atom T is usually yielded by logical operations and represents the value "true".

4.2 Variables

The name of a variable must begin with the symbol $, followed by an identifier. A value can be assigned to a variable by the help of the assignment statement, for example,

   $E := A

In this case the atom A becomes the value of the variable $E. In RIGAL variables have no types: the same variable may have an atom, a list or a tree as its value at different time moments.
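The atom rules of 4.1 can be summarized by a small Python sketch that converts between the written form of an atom and its value. Numerical atoms become Python integers and all other atoms become strings, which is exactly what keeps 25 and '25' apart while ABC and 'ABC' coincide. The helper names and the reserved-word subset are hypothetical, chosen for illustration only.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")
INTEGER = re.compile(r"-?\d+")
# Hypothetical subset of reserved words, for illustration only.
RESERVED = {"if", "elsif", "fi", "return", "fail"}


def atom_from_source(text):
    """Value of an atom as written in a program (25 -> 25, '25' -> "25")."""
    if len(text) >= 2 and text[0] == text[-1] == "'":
        return text[1:-1]              # quoted atom: just a string of symbols
    if INTEGER.fullmatch(text):
        return int(text)               # numerical atom
    if IDENTIFIER.fullmatch(text) and text not in RESERVED:
        return text                    # identifier written directly
    raise ValueError(f"{text!r} must be quoted to be used as an atom")


def atom_to_source(atom):
    """Write an atom back, quoting it only when it is not a plain identifier."""
    if isinstance(atom, int):
        return str(atom)
    if IDENTIFIER.fullmatch(atom) and atom not in RESERVED:
        return atom
    return "'" + atom + "'"
```

Using distinct Python types for numerical and symbolic atoms is a design choice of this sketch; it makes the "25 and '25' are different atoms" rule fall out of ordinary equality.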

4.3 Lists

Ordered sequences, i.e., lists, can be composed from atoms and from other lists and trees as well. A special function, the list constructor, serves for list formation. For instance,

   (. A B C .)

forms a list of three atoms A, B and C. Arguments of the list constructor may be expressions. The sample

   $E := (. (. 8 14 7 .) (. A B .) .)

could be rewritten as follows:

   $A := (. 8 14 7 .); $B := (. A B .); $E := (. $A $B .);

Separate elements of the list can be selected by indexing. Hence, $B[1] is atom A, $A[2] is atom 14, $E[2] is list (. A B .), but $E[10] is atom NULL. If the value of the index is a negative number, -N, then the N-th element, beginning from the end of the list, is selected. For instance, $A[-1] is atom 7.

The necessity to add one more element to the list is quite common. Operation !. is envisaged for this purpose. For example, (. A B .) !. C yields (. A B C .). To link two lists in a new list, the operation !! (list concatenation) is applied. For instance, (. A B .) !! (. C D .) yields the list (. A B C D .).
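The list operations of 4.3 can be sketched in Python as follows, modelling lists as Python lists and NULL as None (both assumptions). The function names are hypothetical; what the sketch pins down is the indexing convention: indexes start at 1, -N counts from the end, and an index outside the list yields NULL instead of an error. The text does not say what index 0 gives; NULL is assumed here.

```python
NULL = None  # assumption: RIGAL's NULL modelled as Python None


def index(lst, i):
    """$L[i]: 1-based; -N counts from the end; out of range yields NULL."""
    if not isinstance(lst, list) or i == 0 or abs(i) > len(lst):
        return NULL
    return lst[i - 1] if i > 0 else lst[i]


def add_element(lst, element):
    """lst !. element: a new list with one more element at the end."""
    return (lst or []) + [element]     # NULL behaves as the empty list


def concatenate(lst1, lst2):
    """lst1 !! lst2: list concatenation."""
    return (lst1 or []) + (lst2 or [])
```

With A = [8, 14, 7] and B = ["A", "B"], the calls index(B, 1), index(A, 2) and index(A, -1) reproduce the values A, 14 and 7 from the text.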

4.4 Trees

Tree constructor is used to create a tree. For example,

   <. A : B, C : D .>

A tree in RIGAL is a graph, the nodes and arches of which are marked by some objects. Objects given before the ':' in the tree constructor are selectors. In the graphical representation of the tree, selectors correspond to named arches of the graph. In the current implementation only atoms - identifiers (except NULL) may serve as selectors. All selectors of one and the same level in the tree must be different. Any object (atom, list or tree), except atom NULL, may correspond to the terminal nodes ("leaves") of the tree graph. Hence, multilayer trees can be built, in which a leaf is itself a tree.

Each pair "selector : object" in the tree constructor is a named branch of the tree. Branches are unordered in the tree. Likewise the list constructor, the tree constructor may be described by expressions (in both selector and object places), for instance,

   $X := D; $B := <. A : (. 2 8 .), $X : K .>;

Consequently, $B is the tree <. A : (. 2 8 .), D : K .>.

It is possible to select a tree component by the operation T . sel, where T is some tree and sel is a selector, whose value must be an atom - identifier. If there is no branch with a given selector, the result is atom NULL. For example, $B . A is the list (. 2 8 .), $B . D is the atom K, and $B . E is the atom NULL.

The operation of tree union is performed as well:

   T1 ++ T2

where T1 and T2 are trees. Tree T2 branches are added to the tree T1. If there already exists a branch with the same selector in tree T1, this branch is substituted by the new one. It should be pointed out that the operation '++' is not commutative. The tree constructor with three branches, <. S1 : O1, S2 : O2, S3 : O3 .>, gives the same result as the expression ((NULL ++ <. S1 : O1 .>) ++ <. S2 : O2 .>) ++ <. S3 : O3 .>.
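Modelling trees as Python dictionaries and NULL as None (assumptions; the names `union`, `tree` and `select` are hypothetical), the tree constructor can be written literally as the left fold of `++` given in 4.4. The sketch also shows why `++` is not commutative: on a shared selector the right operand wins.

```python
NULL = None  # assumption: NULL modelled as None; trees modelled as dicts


def union(t1, t2):
    """T1 ++ T2: T2's branches are added to T1, replacing same-selector branches."""
    result = dict(t1 or {})      # NULL acts as the empty tree
    result.update(t2 or {})
    return result


def tree(*branches):
    """<. s1 : o1, s2 : o2, ... .> as ((NULL ++ <. s1 : o1 .>) ++ <. s2 : o2 .>) ++ ..."""
    result = NULL
    for selector, obj in branches:
        result = union(result, {selector: obj})
    return result


def select(t, selector):
    """t . selector: the object on that branch, or NULL if there is none."""
    return t.get(selector, NULL) if isinstance(t, dict) else NULL
```

Because selectors are ordinary values here, a computed selector works as in the text: with X = "D", tree(("A", [2, 8]), (X, "K")) builds the tree with branches A and D.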

5. Expressions

Operations = and <> serve for the comparison of objects. The result of the comparison is either T ("true") or NULL ("false"). Atoms are matched directly, for instance, a = b gives NULL, 25 = 25 gives T, 17 <> 25 gives T. Lists are considered equal iff they contain an equal number of components and these components are equal respectively. Trees are considered equal iff they contain an equal number of branches and, if one of the trees contains the branch "S : OB", then the other tree also contains the branch "S : OB1" and OB = OB1.

Arithmetical operations +, -, *, div, mod are admitted for numerical atoms. The essence of these operations is similar to those in Pascal. The result of an arithmetical operation is a numerical atom. Atom NULL is also admitted as an argument of arithmetical operations; in this case its value is supposed to be 0. Under matching operations these atoms are considered to be different, i.e., NULL = 0 gives NULL. Besides the operations = and <>, numerical values can be compared by the help of >, <, >= and <=.
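The comparison rules of Section 5 can be captured in a short Python sketch. The representation (NULL as None, T as the string "T", lists as lists, trees as dicts) and the function names are assumptions. Two details from the text drive the code: atoms are compared together with their kind, so 25 = '25' and NULL = 0 both give NULL, while arithmetic treats NULL as 0.

```python
NULL, T = None, "T"   # assumption: NULL as None, the atom T as the string "T"


def _normalize(x):
    # NULL also represents the empty list and the empty tree.
    return NULL if isinstance(x, (list, dict)) and not x else x


def _equal(a, b):
    a, b = _normalize(a), _normalize(b)
    if isinstance(a, list) and isinstance(b, list):
        # equal number of components, equal respectively
        return len(a) == len(b) and all(_equal(x, y) for x, y in zip(a, b))
    if isinstance(a, dict) and isinstance(b, dict):
        # same selectors, and equal objects on each selector
        return a.keys() == b.keys() and all(_equal(a[s], b[s]) for s in a)
    if isinstance(a, (list, dict)) or isinstance(b, (list, dict)):
        return False
    return type(a) is type(b) and a == b   # atoms: 25 vs '25', NULL vs 0 differ


def eq(a, b):     # a = b
    return T if _equal(a, b) else NULL


def ne(a, b):     # a <> b
    return NULL if _equal(a, b) else T


def plus(a, b):   # a + b, with NULL taken as 0
    return (0 if a is NULL else a) + (0 if b is NULL else b)
```

The same NULL-as-0 conversion would apply to the other arithmetical operations; div and mod are left out because their Pascal semantics for negative operands would need more care than this sketch warrants.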

7)) may be $Num V'($Num > 7)

8. Statements

8.1 Assignment Statement

In the left side of an assignment statement a variable may be indicated, which is followed by an arbitrary number of list indexes and/or tree selectors. For example,

   $X := (. A B C .);
   $Y := <. D : E, F : G .>;

After the assignment $X[2] := T the value of $X is (. A T C .). After the assignment $Y.D := 17 the value of $Y is <. D : 17, F : G .>. The execution of the statement $Y.A := T yields a run time error message. The necessary result is obtained the following way:

   $Y ++:= <. A : T .>

A branch is deleted by assigning the empty object to the corresponding selector:

   $Y.D := NULL;
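A minimal Python sketch of the assignment forms in 8.1, assuming lists are Python lists, trees are dicts and NULL is None. The helper names are hypothetical, and each returns the modified object only to make checking easy. As in the text, assigning to a selector that is not yet a branch is an error, `++:=` is the way to add a branch, and assigning NULL deletes a branch. Negative indexes on the left side are not covered by the text and are rejected here.

```python
NULL = None  # assumption: NULL modelled as None; lists as lists, trees as dicts


def assign_index(lst, i, value):
    """$X[i] := value for an existing 1-based position i."""
    if not 1 <= i <= len(lst):
        raise IndexError(f"run time error: no element {i}")
    lst[i - 1] = value
    return lst


def assign_selector(tree, selector, value):
    """$Y.sel := value: changes an existing branch; assigning NULL deletes it."""
    if selector not in tree:
        raise KeyError(f"run time error: no branch {selector}")
    if value is NULL:
        del tree[selector]           # the empty object removes the branch
    else:
        tree[selector] = value
    return tree


def union_assign(tree, branches):
    """$Y ++:= <. ... .>: add branches, replacing same-selector ones."""
    tree.update(branches)
    return tree
```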

8.2 Conditional Statement

Conditional statement has the following form:

   if expression -> statements
   elsif expression -> statements
   ...
   fi

The elsif branches may follow one by one (they are not compulsory). Conditional statement ends with keyword fi. The expressions described in the branches are computed one by one, until a value different from NULL is obtained. Then the statements described in this branch are executed.
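The evaluation order of 8.2 can be sketched in Python by passing each condition and statement list as a zero-argument function, so that later conditions are genuinely not computed once a branch is chosen. The name `if_statement` and the convention of returning the branch's result are assumptions made for testing. Note the test `is not NULL` rather than Python truthiness: in RIGAL only NULL is "false", so a condition that yields 0 still selects its branch.

```python
NULL = None  # assumption: NULL modelled as None


def if_statement(*branches):
    """if e1 -> s1 elsif e2 -> s2 ... fi.

    Each branch is a pair (condition, statements) of zero-argument callables.
    Conditions are computed one by one until one yields a value other than
    NULL; only that branch's statements run. Returns what they return
    (a convenience for testing), or NULL if no branch was chosen.
    """
    for condition, statements in branches:
        if condition() is not NULL:   # any non-NULL value counts, even 0
            return statements()
    return NULL
```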

8.3 Fail Statement

Fail statement finishes the execution of the rule branch with failure.

Example. In order to repair errors in the parsing process, a sequence of tokens should quite frequently be skipped, for instance, until the semicolon symbol. It is done the following way.

   #statement
         ...                        -- branches for statement analysis
      ;; (* #Not_semicolon *) ';'   -- no statement is recognised
   ##

   #Not_semicolon
         $E / if $E = ';' -> fail fi /
   ##
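The recovery branch above can be mirrored in Python to show how `fail` inside #Not_semicolon is what stops the iteration `(* ... *)`. This is a hand-written analogue, not generated from RIGAL: tokens are plain strings, the `FAIL` marker and both function names are hypothetical, and each function returns a (value, next position) pair.

```python
FAIL = object()  # hypothetical marker standing for rule failure


def not_semicolon(tokens, pos):
    """#Not_semicolon: accept one token, but fail if it is ';'."""
    if pos >= len(tokens) or tokens[pos] == ";":
        return FAIL, pos
    return tokens[pos], pos + 1


def skip_to_semicolon(tokens, pos):
    """(* #Not_semicolon *) ';' : skip tokens up to and including ';'."""
    skipped = []
    while True:                        # (* ... *): repeat while the rule succeeds
        value, next_pos = not_semicolon(tokens, pos)
        if value is FAIL:
            break
        skipped.append(value)
        pos = next_pos
    if pos < len(tokens) and tokens[pos] == ";":
        return skipped, pos + 1        # parsing resumes after the semicolon
    return FAIL, pos                   # no ';' left: the whole branch fails
```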

8.4 Loop Statements

Statement of the type

   forall $VAR in expression do statements od

loops over a list or a tree. The value of the expression must be either a list or a tree. The value of the current list element (if the loop is over the list) or the value of the current selector (if the loop is over the tree) is assigned to the variable $VAR one by one. Statements of the loop body may use the current value of $VAR.

Loop statement of the type

   loop statements end;

repeats the statements of the loop body until one of the statements break, return or fail is executed.
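A small Python sketch of which values the loop variable takes in `forall`, assuming lists are Python lists, trees are dicts and NULL is None; the name `forall_values` is hypothetical. Treating NULL as an empty loop follows from NULL representing the empty list and tree, but is an assumption. The `loop ... end` form corresponds to a Python `while True:` body left by `break`, `return` or an exception standing in for `fail`.

```python
NULL = None  # assumption: NULL modelled as None


def forall_values(obj):
    """Successive values of $VAR in 'forall $VAR in obj do ... od'."""
    if obj is NULL:
        return []                     # NULL is the empty list/tree: no iterations
    if isinstance(obj, list):
        return list(obj)              # list: its elements, in order
    if isinstance(obj, dict):
        return list(obj)              # tree: its selectors (branches are unordered)
    raise TypeError("forall needs a list or a tree")
```

Since branches are unordered, code iterating over a tree should not rely on the order of the selectors it receives.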


8.5 Rule Call

If a rule is called just to execute the statements described in it, and the value returned by the rule is not necessary, the rule call is written down as a statement. It is analogous to a procedure call in traditional programming languages. Success or failure of the rule and the value returned by it are disregarded in such a call.
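To illustrate the difference between using a rule's result and calling it as a statement, here is a Python analogue of the #Sum rule from Section 3. It is a sketch under stated assumptions: a rule is modelled as a function returning a (success, value) pair, pattern matching binds $N1 and $N2 from the argument sequence, and requiring exactly two integers is an assumption made so the sketch fails cleanly. The function names are hypothetical.

```python
def rule_sum(args):
    """Analogue of #Sum: pattern $N1 $N2, then / return $N1 + $N2 /.

    Returns (success, value). Requiring exactly two integers is an
    assumption of this sketch, not a rule stated in the paper.
    """
    if len(args) != 2 or not all(isinstance(a, int) for a in args):
        return False, None          # the pattern does not match: rule fails
    n1, n2 = args                   # matching binds $N1 and $N2
    return True, n1 + n2


def call_as_statement(rule, args):
    """Rule call written as a statement: success/failure and value are ignored."""
    rule(args)
```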

9. Input and Output

9.1 Save and Load Statements

Objects created by a RIGAL program (atoms, lists, trees) can be saved in a file and loaded back into memory. Statement

   save $Var file-specification

unloads the object which is the value of the variable $Var to the file with the given specification. A file formed by the save statement contains precisely one object (atom, list or tree). We can load the object from the file into memory by executing the statement:

   load $Var file-specification
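The one-object-per-file contract of save and load can be illustrated in Python with the standard `pickle` module. This swaps in Python's own serialization; the paper does not describe RIGAL's file format, and the helper `round_trip` exists only to exercise the pair on a temporary file.

```python
import os
import pickle
import tempfile


def save(obj, file_specification):
    """save $Var file-specification: the file holds exactly one object."""
    with open(file_specification, "wb") as f:
        pickle.dump(obj, f)


def load(file_specification):
    """load $Var file-specification: read that single object back."""
    with open(file_specification, "rb") as f:
        return pickle.load(f)


def round_trip(obj):
    """Save obj to a temporary file and load it again."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "object.sav")
        save(obj, path)
        return load(path)
```

Because each file holds one object, saving a whole intermediate tree between compiler passes amounts to a single save in one pass and a single load in the next.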

9.2 Text Output

To output several texts (messages, generated object codes, etc.), text files can be opened in the RIGAL program. The text file FFF is opened by the statement:

   open FFF file-specification

File-specification may be an expression. It presents the name of the file on the device. Statement of the type FFF

syntax tree - is
during parsing messages on errors in file REP can be output.

   open GEN 'A.BAL';      -- if the tree is created,
                          -- then file is opened to output the generated BAL text
   #G_PROGRAM($S_TREE)    -- 2nd phase - code generation
   elsif T -> REP