RIGAL - a Programming Language for Compiler Writing
Mikhail AUGUSTON
Institute of Mathematics and Computer Science, The University of Latvia, Rainis boulevard 29, Riga, Latvia, SU - 226250
Abstract. A new programming language for compiler writing is described. The main data structures are atoms, lists and trees. The control structures are based on advanced pattern matching. All phases of compilation, including parsing, optimization and code generation, can be programmed in this language in short and readable form. A sample compiler written in RIGAL is presented.
1. Introduction
Programming language RIGAL is intended to be a tool for parsing (context checking, diagnosing and neutralization of errors included), for code optimization, code generation, static analysis of programs, as well as for the programming of preprocessors and convertors.
Almost all the systems envisaged to solve compiler construction problems contain means to describe the context-free grammar of the source language. Earlier systems, like the Floyd-Evans language [1], present tools to work with the stack, which is used for parsing. Parsing methods for limited grammar classes (usually LL(1) or LR(1)) are implemented in the languages and systems of later generations.
Such systems as YACC (Johnson [2]), CDL-2 (Koster [3]), SHAG (Agamirzyan [4]) and many others make use of synchronous implementation of parsing and different computations, e.g., formation of tables, context checking, code generation, etc. Usually these actions are performed by call of semantic subroutines, written in some universal programming language (e.g., in Pascal or C).
Attribute grammars advanced by Knuth [5] have greatly influenced the development of systems for compiler construction. Systems like SUPER (Serebryakov [6]), ELMA (Vooglaid, Lepp, Lijb [7]), MUG2 (Wilhelm [9]) are based on the use of attribute grammars not only for parsing, but for code generation as well.
Pattern matching is a convenient tool for programming of parsing, optimization and code generation. The REFAL programming language [10], acknowledged for translator writing, may serve as a good example.
The Vienna method for defining semantics of programming languages [11] suggests the usage of labelled trees in order to present the abstract syntax of programs. Representation of compilation intermediate results in the tree form has become usual (see [12]).
Dependence of control structures in the program on data structures used for the program's work is one of the basic principles in programming. The recursive descent method could be considered to be an application of this dependence principle.
The above mentioned ideas and methods were taken into account when creating the RIGAL language.
The language possesses few basic notions. Data structures contain atoms, lists and trees. An advanced mechanism of pattern matching lies at the basis of control structures.
The fact that RIGAL is a closed language makes RIGAL distinctive. That means that almost all the necessary computations and input-output could be executed by internal means and there is no need to use external semantic subroutines. Therefore the portability of RIGAL programs to other computers is increased.
Means for work with trees, including different patterns, enable programming of parsing algorithms as well as of optimization phases and code generation.
The language supports design of multipass translators. Trees are used as intermediate data.
The language allows splitting the program into small modules (rules) and presents various means to arrange interaction of these modules. Pattern matching is used for parameter passing.
RIGAL supports the attribute translation scheme, and easy implementation of synthesized and inherited attributes is possible. The problem of global attributes is solved by usage of special references.
Lexical analysis is a separate task and requires special language facilities for description, as is the case, for example, in the LEX/YACC system [2]. In the current implementation of RIGAL two scanners are included that accept lexics of Pascal and RIGAL.
2. Implementation
RIGAL was designed and implemented in the Computing Center of Latvia University in the years 1987-1988. The first implementation was for PDP-11 in RSX-11. At the present stage a RIGAL interpreter has been developed and an optimizing compiler RIGAL -> Pascal has been implemented by means of RIGAL itself. The interpreter and the compiler have been ported to VAX/VMS and IBM PC AT / MS DOS environments.
3. Lexical Rules
The text of a RIGAL program is a sequence of tokens - atoms (e.g., identifiers and integers), keywords (e.g., if, return), special symbols (e.g., +, ##), names of variables and rules (e.g., $A, #L). Tokens may be surrounded by any number of blanks. A comment is any string of symbols that begins with two consecutive symbols '-' (minus). The end of the comment is the end of the line. For example,
  #Sum                      -- rule for addition of two numbers
     $N1                    -- the first number
     $N2                    -- the second number
     / return $N1 + $N2 /   -- return of the result
  ##
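The comment rule above can be illustrated outside RIGAL. The following is a minimal Python sketch, not part of any RIGAL implementation; the function name `strip_comment` is hypothetical, and the decision to ignore `--` inside a quoted atom is an assumption of the sketch, since the text does not state how quoted atoms interact with comments.

```python
def strip_comment(line: str) -> str:
    """Drop a RIGAL-style comment: text from the first '--' to the end of line.

    Assumption of this sketch: '--' inside a single-quoted atom does not
    start a comment, and quoted atoms contain no escaped quotes.
    """
    in_quote = False
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "'":
            in_quote = not in_quote          # entering or leaving a quoted atom
        elif not in_quote and line.startswith("--", i):
            return line[:i].rstrip()         # comment found: keep only the code part
        i += 1
    return line.rstrip()                     # no comment on this line
```

Applied to the line `$N1 -- the first number`, the sketch keeps only `$N1`.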
4. Data
4.1 Atoms
An atom is a string of symbols. If the atom is an identifier (the first symbol is a letter followed by letters or digits or underscore symbols), it is written directly in the text of the RIGAL program:
  AABC   total_number   x25
Numerical atoms are integers, for instance,
  2, 187, 0, -25
In other cases the atom is quoted:
  '+'   ':='   '1st'
Some identifiers are reserved as keywords in the language. If they are used as RIGAL atoms, they should be quoted. For example, 'if', 'return'. Besides, any atom which is an identifier can also be quoted - ABC and 'ABC' represent one and the same atom.
It should be noted that 25 and '25' are different atoms: the latter is just a string of the symbols '2' and '5'.
Two special atoms are distinguished in RIGAL.
NULL - this atom is frequently yielded as a result of different operations, if something was incorrect in the process of computations. This atom also represents an empty list, an empty tree and the Boolean value "false".
T - usually this atom is yielded by logical operations and represents the value "true".
4.2 Variables
The name of a variable must begin with the symbol $, followed by an identifier. A value can be assigned to a variable, for example, by means of the assignment statement:
  $E := A
In this case the atom A becomes the value of the variable $E. In RIGAL variables have no types; the same variable may have an atom, a list or a tree as its value at different moments.
4.3 Lists
Ordered sequences, i.e., lists, can be composed from atoms and from other lists and trees as well. A special function - the list constructor - serves for list formation. For instance,
  (. A B C .)
forms a list of three atoms A, B and C. Arguments of the list constructor may be expressions. The sample
  $E := (. (. 8 14 7 .) (. A B .) .)
could be rewritten as follows:
  $A := (. 8 14 7 .); $B := (. A B .); $E := (. $A $B .);
Separate elements of the list can be selected by indexing. Hence, for instance, $B[1] is atom A, $A[2] is atom 14, $E[2] is list (. A B .), but $E[10] is atom NULL. If the value of the index is a negative number, -N, then the N-th element, beginning from the end of the list, is selected. For example, $A[-1] is atom 7.
The necessity to add one more element to the list is quite common. Operation !. is envisaged for this purpose. For instance,
  (. A B .) !. C
yields the list (. A B C .).
To link two lists in a new list the operation !! is applied (list concatenation). Example.
  (. A B .) !! (. C D .)
yields (. A B C D .).
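The list operations described above can be modelled in a few lines of Python. This is only an illustrative sketch under stated assumptions: NULL is represented by `None`, the helper names `index`, `append` and `concat` are hypothetical, the behaviour for index 0 (returning NULL) is a guess that the text does not cover, and both `!.` and `!!` are modelled as returning new lists.

```python
NULL = None  # stand-in for the RIGAL atom NULL (assumption of this sketch)

def index(lst, i):
    """1-based indexing; a negative i counts from the end; otherwise NULL."""
    if 0 < i <= len(lst):
        return lst[i - 1]
    if i < 0 and -i <= len(lst):
        return lst[i]          # Python's negative indexing matches the -N rule
    return NULL                # out of range (index 0 is treated the same way)

def append(lst, x):
    """Model of `lst !. x`: a list with x added at the end."""
    return lst + [x]

def concat(a, b):
    """Model of `a !! b`: the two lists linked into a new list."""
    return a + b

# The lists from the text: $A, $B and $E
A = [8, 14, 7]
B = ["A", "B"]
E = [A, B]
```

With these definitions `index(E, 10)` gives `NULL` and `index(A, -1)` gives 7, mirroring the examples in the text.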
4.4 Trees
Tree constructor is used to create a tree, that is, a graph whose nodes and arcs are marked by some objects. A tree constructor is written between the brackets <. and .> and consists of pairs of the form "selector : object".
Objects before the ':' in the tree constructor are named selectors. In the given implementation solely atoms which are identifiers (except NULL) may serve as selectors. In graphical representation selectors correspond to named arcs of the graph. All selectors of one and the same level in the tree must be different.
Any object - atom, list or tree, except atom NULL - may correspond to terminal nodes of the graph ("leaves" of the tree). Hence, multilayer trees can be built.
A pair "selector : object" is named a branch of the tree. Branches are unordered in the tree.
As with the list constructor, the components of the tree constructor may be described by expressions (in both selector and object places), for instance,
  $X := D; $B := (. 2 8 .); $C := <. ... .>;
The following operation serves to select a tree component:
  $C . sel
where $C is some tree and sel is an expression whose value must be an atom-identifier. If there is no branch with the given selector, the result is atom NULL.
The tree "addition" operation is provided as well:
  T1 ++ T2
where T1 and T2 are trees. Branches of tree T2 are added to tree T1. If T1 already contains a branch with the same selector, that branch is substituted by the new one. Hence the operation '++' is not commutative. A tree constructor with several branches gives the same result as the expression
  ((NULL ++ <. ... .>) ++ <. ... .>) ++ <. ... .>
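The behaviour of component selection and of the `++` operation can be sketched with Python dictionaries, whose keys are unordered and unique like the selectors of one tree level. This is a model under assumptions, not RIGAL code: `select` and `add` are hypothetical helper names, NULL is represented by `None` (which also stands for the empty tree here), and the sample selectors and values in the usage note are invented for illustration.

```python
NULL = None  # stand-in for atom NULL, which also represents the empty tree

def select(tree, sel):
    """Model of `tree . sel`: the object of the branch, or NULL if absent."""
    return (tree or {}).get(sel, NULL)

def add(t1, t2):
    """Model of `T1 ++ T2`.

    Branches of t2 are added to a copy of t1; a branch of t1 with the same
    selector is replaced by the branch from t2.
    """
    result = dict(t1 or {})     # NULL is treated as the empty tree
    result.update(t2 or {})
    return result
```

For two hypothetical trees `{"A": 1, "B": 2}` and `{"B": 3, "C": 4}`, `add` keeps `B: 3` in one order of operands and `B: 2` in the other, which is why the operation is not commutative.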
5. Expressions
Operations = and <> serve for the comparison of objects. The result of the comparison is either T ("true") or NULL ("false"). Atoms are matched directly, for instance, a = b gives NULL, 25 = 25 gives T, 17 <> 25 gives T.
Lists are considered equal iff they contain an equal number of components and these components are respectively equal.
Trees are considered equal iff they contain an equal number of branches and, if one of the trees contains the branch "S : OB", then the other tree also contains the branch "S : OB1", and OB = OB1.
Arithmetical operations +, -, *, div, mod are assigned for numerical atoms. The essence of these operations is similar to those in Pascal. The result of an arithmetical operation is a numerical atom. Atom NULL is also admitted as the argument of an arithmetical operation; in this case integer 0 is supposed to be its value. In comparison, however, these atoms are considered to be different, i.e., NULL = 0 gives NULL.
Besides the operations = and <>, numerical values could be compared by means of >, <, >= and <=.
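A small Python model may help to see how comparison and arithmetic treat NULL differently. It is a sketch under assumptions: T is modelled by the string "T", NULL by `None`, the names `eq`, `ne`, `num` and `plus` are hypothetical, and the identification of NULL with the empty list and the empty tree is deliberately not modelled.

```python
NULL = None   # stand-in for atom NULL ("false")
T = "T"       # stand-in for atom T ("true")

def eq(a, b):
    """Model of `a = b`: T if the objects are equal, NULL otherwise.

    Python's == already compares lists element by element and dicts
    branch by branch regardless of order, as described for lists and trees.
    """
    return T if a == b else NULL

def ne(a, b):
    """Model of `a <> b`."""
    return NULL if eq(a, b) is T else T

def num(x):
    """Arithmetic view of an operand: NULL counts as integer 0."""
    return 0 if x is NULL else x

def plus(a, b):
    """Model of `a + b` on numerical atoms, with NULL admitted as 0."""
    return num(a) + num(b)
```

Note the asymmetry: `plus(NULL, 5)` evaluates to 5, while `eq(NULL, 0)` yields `NULL`.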
7)) may be
SNum
V'(SNum > 7)
8. Statements
8.1 Assignment Statement
In the left side of the assignment statement a variable may be indicated, which is followed by an arbitrary number of list indexes and/or tree selectors. For example,
  $X := (. A B C .);
  $Y := <. ... .>;
After assignment $X[2] := T the value of $X is (. A T C .).
After assignment $Y.D := 17 the value of $Y is <. ... .>.
The execution of the statement $Y.A := T yields a run time error message. The necessary result is obtained the following way:
  $Y ++:= <. ... .>
The branch is deleted by assigning an empty object to the corresponding selector:
  $Y.D := NULL;
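The assignment rules can be modelled in Python as follows. This is a sketch under assumptions: the helper names are hypothetical, NULL is `None`, and the run time error for `$Y.A := T` is read here as "the selector A does not yet exist in the tree", which is an interpretation because the tree literal itself did not survive in the text.

```python
NULL = None  # stand-in for atom NULL

def assign_index(lst, i, value):
    """Model of `$X[i] := value` for an existing 1-based position."""
    lst[i - 1] = value

def assign_selector(tree, sel, value):
    """Model of `$Y.sel := value`.

    Assumed reading: the branch must already exist, otherwise a run time
    error is raised. Assigning NULL deletes the branch.
    """
    if sel not in tree:
        raise KeyError(f"no branch with selector {sel!r}")
    if value is NULL:
        del tree[sel]
    else:
        tree[sel] = value

def add_assign(tree, other):
    """Model of `$Y ++:= other`: add (or replace) the branches of other."""
    tree.update(other)
```

A new branch is therefore introduced with `add_assign`, while `assign_selector` only changes or deletes branches that are already present.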
8.2 Conditional Statement
Conditional statement has the following form:
  if expression -> statements
Then branches of the form
  elsif expression -> statements
may follow (it is not compulsory). Conditional statement ends with keyword fi.
In a conditional statement the branch expressions are computed one by one, until a value different from NULL is obtained. Then the statements described in this branch are executed.
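The evaluation order of the conditional statement can be sketched in Python. The function `conditional` and the representation of branches as pairs of zero-argument callables are assumptions of this sketch; NULL is modelled by `None` only, so the empty list and the empty tree are not treated as NULL here.

```python
NULL = None  # stand-in for atom NULL

def conditional(branches):
    """Run the first branch whose condition is different from NULL.

    `branches` is a list of (condition, action) pairs of zero-argument
    callables. Conditions are computed one by one and lazily, so those
    after the chosen branch are never evaluated.
    Returns True if some branch was executed, False otherwise.
    """
    for condition, action in branches:
        if condition() is not NULL:
            action()
            return True
    return False
```

Passing callables rather than values is what keeps later conditions from being computed once a branch has been chosen.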
8.3 Fail Statement
Fail statement finishes the execution of the rule branch with failure.
Example. In order to repair errors in the parsing process, the sequence of tokens should be skipped quite frequently, for instance, until the semicolon symbol. It is done the following way.
  #statement
     ...                        ;; -- branches for statement analysis
     (* #Not_semicolon *) ';'   -- no statement is recognised
  ##
  #Not_semicolon
     $E / if $E = ';' -> fail fi /
  ##
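The same recovery idea, skipping tokens up to and including the next semicolon, can be written as a short Python sketch. The token list representation and the names `not_semicolon` and `skip_statement` are assumptions made for illustration; returning `None` when no semicolon is left is this sketch's stand-in for failure of the pattern.

```python
def not_semicolon(token):
    """Model of #Not_semicolon: succeeds on any token except ';'."""
    return token != ";"

def skip_statement(tokens, pos):
    """Model of the pattern (* #Not_semicolon *) ';'.

    Skips tokens starting at pos while they are not ';', then consumes
    the ';' itself. Returns the position after the ';', or None if the
    input ends before a ';' is found.
    """
    while pos < len(tokens) and not_semicolon(tokens[pos]):
        pos += 1
    if pos < len(tokens):
        return pos + 1      # step over the ';'
    return None             # no ';' left: the pattern does not match
```

A parser built this way can resume statement analysis at the returned position after reporting the error.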
8.4 Loop Statements
Statement of the type
  forall $VAR in expression do statements od
loops over a list or a tree. The value of the expression must be either a list or a tree. The value of the current list element (if the loop is over the list) or the value of the current selector (if the loop is over the tree) is assigned to the loop variable $VAR one by one. Statements describing the body of the loop may use the current value of the variable $VAR.
Loop statement of the type
  loop statements end;
repeats statements of the loop body, until one of the statements break, return or fail is executed.
8.5 Rule Call
If a rule is called just to execute statements described in it, and the value returned by the rule is not necessary, the rule call is written down as a statement. It is analogous to a procedure call in traditional programming languages. Success/failure of the rule and the value returned by it are disregarded in such a call.
9. Input and Output
9.1 Save and Load Statements
Objects created by a RIGAL program (atoms, lists, trees) can be saved in a file and loaded back to the memory. Statement
  save $Var file-specification
unloads the object, which is the value of the variable $Var, to the file with the given specification. A file formed by the save statement contains precisely one object (atom, list or tree). The object can be loaded from the file back into the memory by executing the statement:
  load $Var file-specification
9.2 Text Output
To output texts (messages, generated object codes, etc.) several text files can be opened in the RIGAL program. The text file FFF is opened by the statement:
  open FFF file-specification
File-specification may be an expression. It presents the name of the file on the device.
Statement of the type FFF
syntax tree - is
during parsing messages
errors in file REP can be output.
open GEN 'A.BAL';
-- if the tree is created,
-- then file is opened to output the generated BAL text #G_PROGRAM($S_TREE) elsif
T
->
-- 2nd phase - code generation
REP