STARLET: An affix-based compiler compiler designed as a logic ...

STARLET: An Affix-Based Compiler Compiler Designed as a Logic Programming System

Jean BENEY (1), Jean-François BOULICAUT (1, 2)

(1) Institut National des Sciences Appliquées de Lyon, Laboratoire d'Informatique Appliquée, Bâtiment 502, F-69621 Villeurbanne Cedex (France)

(2) Ecole Centrale de Lyon, Département Mathématiques-Informatique-Systèmes, BP 163, F-69131 Ecully Cedex (France)

Abstract :

We present STARLET, a new compiler compiler which compiles Extended Affix Grammars defining a translation into an executable program: the translator. We look at its operational semantics and we focus on the points which are close to or different from Prolog procedural semantics. We discuss two interwoven issues: Program Reliability (due to many static checks) and Program Efficiency (optimizations at compile time). Both are reached through a systematic use of grammatical properties.

I Introduction

Our research group has been working on the development of grammatical programming, i.e. an approach to software construction based on compiling methods [Beney-86]. Compiler compilers are designed as high-level programming environments and, although compiler writing is the major application field, we also investigate other application fields [Frecon-89].

Within the grammatical programming framework, the specification step must produce abstract language definitions. Just as context-free grammars are non-algorithmic descriptions of the syntax of languages, two-level grammars are grammatical formalisms which enable the semantic part to be defined. Among the best known types of two-level grammars are the W-grammars devised by A. van Wijngaarden and the Attribute Grammars devised by D.E. Knuth. To help language prototyping and also to improve final implementations, the idea of analysis-oriented two-level grammars (and compiler compilers) leads to restricted classes of attribute grammars (L-attributed systems [E2S-84], SNC attribute grammars [Jourdan-88], ...) or restricted classes of W-grammars (Affix Grammars, Extended Affix Grammars, RW-grammars and transparent W-grammars devised respectively by C.H.A. Koster [Koster-70], D.A. Watt [Watt-74], M. Simonet [Simonet-81] and J. Maluszynski [Maluszynski-84]). CDL was the first compiler compiler based on affix grammars [Koster-77].

Our group has worked on affix grammars and on one CDL-like implementation based on its own set of well-formedness conditions to deal with a deterministic top-down analysis [Beney-80]. This system is called LET (trademark of INSA de Lyon). Then, we needed to extend the class of grammars accepted as well as to improve the translator writing facilities. Independently of the research in Berlin or in Nijmegen (e.g. EAGLE [Franzen-77] or PROGRAMMAR [Meijer-86]), this has led to a shift from Affix Grammars to Extended Affix Grammars and a shift from an algorithmic language to a logic programming language. We called this grammatical system and its metalanguage STARLET, while the first implementation is called STARLET/GL.

II Background: Affix Grammars and related formalisms

Like W-grammars, Affix Grammars have two grammatical levels, but there is a clear distinction between the notions (i.e. the non-terminals) and their parameters: the so-called affix positions, which are variables. It introduces structural constraints on hypernotions and an underlying context-free grammar (UCFG). The affix level (affix rules) assigns domains to affix positions. The referencing problem, known to be undecidable for W-grammars, is then decidable and one can make straightforward extensions of context-free parsers. As the formalism is oriented towards analysis and metacompilation, mode assignment is introduced to specify the data flow. Another useful concept for programming purposes is the "primitive predicate". The primitive predicates, which in CDL and LET are described in some programming language, are used to define input/output operations (reading or writing of terminal symbols) or any algorithmic processing (e.g. symbol table management).

The class of grammars accepted by CDL or even LET is rather poor since the analysis method is an adaptation of recursive descent (without any automatic backtracking and with a one-pass left to right evaluation of every affix value). These systems are not good enough for general programming purposes since they look like algorithmic programming languages, where coding and concrete data structure processing always interfere with higher-level analysis tasks.

D.A. Watt proposed Extended Affix Grammars (EAG) [Watt-74] in order to eliminate primitive predicates and return to a more generative formalism. An EAG is "extended" since affix positions can be affix expressions, and this allows all the relationships between the affixes in each rule to be implicitly defined. Thus, explicit evaluation rules (which are often simple copies) and constraints become unnecessary, since the Uniform Replacement Rule combined with the type discipline (following the affix rule specifications for affix positions) are enough. Therefore, one of the major problems in designing a compiler compiler which processes EAG is to implement a grammatical unification algorithm [Maluszynski-84] (see § III.2).

The question arises of the differences between Affix Grammars (Extended or not) and Attribute Grammars. Some answer that there is no distinction: EAG has become either Extended Affix Grammars or Extended Attribute Grammars in [Watt-83], assuming that the original W-grammar mechanism for context-sensitive requirements by predicative hyperrules could be applied to attribute grammars as constraints associated to each rule. We also notice that the affix grammars which are processed by the LET compiler compiler are 1L-Attribute Grammars (see for example the MIRA compiler compiler [E2S-84] or [Deransart-88a] for a survey). This arises from the fact that in LET, affix rules (and thus two-level grammar mechanisms) are not really used, since the user has to choose concrete data structures for affix variables and use them via the primitive predicates. In both cases, semantic values can be used to drive syntactic analysis (values may be computed during parsing).

However, there are differences between affix and attribute formalisms when no context-free basis is apparent. An attribute grammar ought to be defined from an underlying context-free language (which is independent from the attribute values). This means that the underlying language must be defined by a well-formed context-free grammar according to a particular parsing method (LL, LR ...).

Within the affix framework, the UCFG may have unwanted properties (e.g. ambiguities) that affix-driven parsing can allow because of the contextual analysis method. This problem has already been informally considered in [Beney-89].

On the other hand, many grammatical formalisms have been proposed under the generic term of "Logic Grammars". Metamorphosis Grammars (MG) [Colmerauer-77] or Definite Clause Grammars (DCG) [Pereira-80] are grammatical formalisms designed for natural language processing which are easily compiled into (or interpreted as) logic programs. Furthermore, it is interesting to consider MG or DCG as programming languages and implement with them more sophisticated grammar processors. S. Saidi and J.F. Boulicaut are working on such an implementation of transparent two-level grammars [Saidi-90]. Logic Programming features such as the tree data structure of terms, variable instantiation by unification and the systematic backtracking mechanism are useful to handle multiple analyses and thus ambiguities in language processing. Given SLD-resolution, interpreters turn Prolog into a real generator of non-deterministic top-down parsers. The well-known shortcomings of top-down parsing (prefix sharing and local ambiguities, ε-productions, ambiguities) can easily be managed, but it costs too much for many applications of practical interest. The efficiency of Prolog programs must be improved by means of explicit control over resolution (e.g. cut, freeze, diff, wait...), mode assignments or the informed use of unification (e.g. taking concatenation associativity into account). On the other hand, the production of more reliable logic programs is made easier by certain systems which have explicit type definitions [Borland-86].

EAG provided with operational semantics (given a compiler compiler) are very close to logic programs. We decided to execute an EAG defining an unambiguous left to right translation as a logic program. Thus, the rules must be selected by means of affix values to check that the tried productions do not lead to blind alleys (we want only programs which deliver at least one result). This means that the affix values (or at least some of them) have to be computed while parsing. Local ambiguities are allowed since the right decision can be taken by inspecting known affix values (contextual LL(k) parsing). Each rule call can only have one successful completion, as if there were an implicit "cut" at the end of each rule (see the recursive back-up scheme in [Koster-74]). Two-level grammar features found in STARLET extend logic programming facilities for this class of problem (left to right unambiguous translations) by a well-motivated introduction of types, modes and implicit control of computations.

Other well-formedness conditions ensure the grammar computability, i.e. the ability to recognize the axiom of the grammar while instantiating every affix variable. Therefore, our work on the reliability and the efficiency of translators developed with STARLET/GL contributes to the software engineering of logic programming on this restricted class of problem.

III The STARLET/GL operational semantics

The reference manual of the STARLET/GL implementation is [Beney-87]. The STARLET/GL programming environment PLEIADE runs on a BULL DPX5000 under SPIX and the compiler can be used on other UNIX-like systems. It has been implemented by J. Beney (STARLET/GL compiler), A.N. Benharkat (Syntactic Editor, Pretty-Printer and Application Management: [Benharkat-90]) and H. Harti (interpreter [Harti-89]).

III.1 Defining languages, translations and programs

The STARLET grammatical formalism is a set of conventions to specify EAG (see references for formal definitions). Our notations for EAG have been influenced by the van Wijngaarden style and also by our practical experience with the LET language.

Lexical features bring STARLET grammars very close to the natural language style of W-grammars. A non-terminal symbol is made up of letters, spaces, quotes, minus signs and dollar signs (e.g. "do $ times $"). The number of dollar signs is the arity of the symbol (the number of affix positions of the symbol). A hypernotion is written with parenthesized affix sentential forms in place of the dollar signs (e.g. "do (one N) times (ABC)"). We say that STARLET/GL allows split identifiers. We illustrate the notations on a classic example: L1 = { a^n b^n c^n, n ≥ 1 }

Example 1 :

    ROOT : anbncn.      $ EAG axiom $

    AFFIXES :           $ The metarules $
    ABC : alpha ; beta ; gamma.   $ A rule which enumerates the available values for ABC $
    N : zero ; one N.             $ An affix rule which defines the natural numbers $

    NOTIONS :           $ Rules defining the non-terminals $
    anbncn :
        do (N) times (alpha), do (N) times (beta), do (N) times (gamma).
    do ( : zero ) times ( : ABC ) : TRUE.
    do ( : one N ) times ( : ABC ) : an (ABC), do (N) times (ABC).
    an ( : alpha ) : object ("a").
    an ( : beta ) : object ("b").
    an ( : gamma ) : object ("c").

STARLET/GL keywords are in upper case. Modes are lexically defined with the symbol ":" in front of (resp. behind) a parameter to define the inherited mode (resp. the synthesized mode). An underscore in front of an affix variable denotes that its value will be unused by the computation.

This EAG is a specification of the language L1. We can consider this EAG as a generative grammar (assuming that "object" plays the role of "protonotions ending with symbol" in the two-level grammar scheme and forgetting mode assignment or underscores), but it is worth using it either to generate sentences of L1 or to parse some strings which must belong to L1. In fact, "object" is an external function which outputs its string parameter. A slight change to Example 1 gives a correct STARLET/GL program which generates sentences. We can give a value to N by introducing an inherited affix position to "anbncn": anbncn ( : N ). With "ROOT : anbncn ( one one one zero )" the output is "aaabbbccc".

A recognizer of L1 could easily be implemented too. The effective terminal symbols are recognized by an external function "symbol". This function manages a buffer which allows source backtracking when necessary. The function "object" generates the object code, with backtracking synchronized to source backtracking. A recognizer of L1 (with the affix rules of Example 1) could be:

    ROOT : parser.

    NOTIONS :
    $ "write string" and "needed EOF" are external self-explanatory functions $
    parser : anbncn, needed EOF, write string ("OK") ;
             write string ("KO").
    ?anbncn / N :
        accept and count (N) times (alpha), accept (N) times (beta), accept (N) times (gamma).
    ?accept and count ( one N : ) times ( : alpha ) :
        there is an (alpha) on input, accept and count (N) times (alpha).
    accept and count ( zero : ) times ( : alpha ) : TRUE.
    accept ( zero : ) times ( : ABC ) : TRUE.
    ?accept ( one N : ) times ( : ABC ) :
        there is an (ABC) on input, accept (N) times (ABC).
    ?there is an ( : alpha ) on input : symbol ("a").
    ?there is an ( : beta ) on input : symbol ("b").
    ?there is an ( : gamma ) on input : symbol ("c").

Character "?" is used to specify that the rule is supposed to be a Test (vs. Action). There is a (static) check of the algorithmic consistency of this specification: a test may fail while an action never fails. It is useful to help error recovery during parsing.

Character "/" is followed by local variables used in the right hand side of a rule.
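The two readings of the L1 grammar (generator and recognizer) can be mimicked outside STARLET. The following Python sketch is only an illustration under stated assumptions: the affix value N is encoded as nested tuples, the function names (`peano`, `generate`, `recognize`, ...) are hypothetical, and the recognizer rejects the empty string because L1 is defined with n ≥ 1. It does not reproduce the buffering or backtracking of the external functions "symbol" and "object".

```python
LETTER = {"alpha": "a", "beta": "b", "gamma": "c"}

def peano(n):
    """Encode n as the affix value 'one one ... zero' with nested tuples."""
    value = "zero"
    for _ in range(n):
        value = ("one", value)
    return value

def do_times(n, abc):
    """Mirror of 'do (N) times (ABC)': emit one letter per 'one' in n."""
    out = []
    while n != "zero":
        out.append(LETTER[abc])
        n = n[1]
    return "".join(out)

def generate(n):
    """Mirror of the generating variant 'anbncn ( : N )'."""
    return "".join(do_times(n, abc) for abc in ("alpha", "beta", "gamma"))

def accept_and_count(text, pos):
    """Count the leading 'a's from pos; return (Peano count, new position)."""
    count = 0
    while pos < len(text) and text[pos] == "a":
        count += 1
        pos += 1
    return peano(count), pos

def accept(text, pos, n, abc):
    """Consume n letters of kind abc; return the new position or None."""
    while n != "zero":
        if pos >= len(text) or text[pos] != LETTER[abc]:
            return None
        pos += 1
        n = n[1]
    return pos

def recognize(text):
    n, pos = accept_and_count(text, 0)
    if n == "zero":              # assumption: enforce n >= 1 as in L1
        return False
    for abc in ("beta", "gamma"):
        pos = accept(text, pos, n, abc)
        if pos is None:
            return False
    return pos == len(text)      # plays the role of "needed EOF"
```

For instance, `generate(peano(3))` builds the sentence quoted above for "one one one zero", and `recognize` accepts exactly the strings that `generate` can produce for n ≥ 1.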

III.2 Grammatical unification and Uniform Replacement Rule

To execute a STARLET/GL program is to try to recognize the root of the EAG for a given input text. Its results are available as an object text. Data types are considered as intermediate languages defined by means of the context-free affix rules. Affix sentential forms parameterize the non-terminal symbols and are generally manipulated via their derivation tree in this grammar of affixes. Thus, the general case of grammatical unification is the tree confrontation, which is used to build and look at affix sentential form structures. Variables are considered as logic variables: the program tries to instantiate them (at present, every variable of a STARLET/GL program has to be instantiated).

A notion rule defines an analysis method for the non-terminal which is its left hand side. There may be several rules defining a given non-terminal. Alternatives in a rule development are considered from left to right.

A rule call introduces a confrontation between formal parameters and arguments, i.e. an oriented type-sensitive grammatical unification. When the confrontation succeeds, the result is either a tree construction or a tree split (expression in front of expression is not allowed). A rule call is a success if: the input confrontation is a success, the development of the rule is a success and finally the output confrontation is a success. Thus a rule call may fail either because of the confrontations or because of its development.

The Uniform Replacement Rule is responsible for the correct propagation of the values already known. The confrontation of a variable and an affix expression may succeed only if this expression and one of the legal structures for the variable data type are compatible (e.g. an affix expression 'e' is compatible with the structure of an affix 'a' if 'e' can be derived from 'a' according to the affix rules).
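The compatibility condition above (an expression 'e' can be derived from an affix 'a') can be illustrated with a small derivability check. The Python sketch below is not the STARLET/GL algorithm: the table `AFFIX_RULES`, the token-list encoding of affix expressions and the function names are assumptions made for illustration, and the search only terminates for affix rules without left recursion (it would loop on a rule such as "absy : absy op absy").

```python
# Hypothetical encoding of affix rules: each affix maps to its alternatives.
AFFIX_RULES = {
    "N": [["zero"], ["one", "N"]],
    "L": [["empty"], ["X", "L"]],
    "X": [["INTEGER"]],
}

def ends(symbol, tokens, pos):
    """Positions reachable after deriving a prefix of tokens[pos:] from symbol."""
    result = set()
    if symbol == "INTEGER":                      # predefined data type
        if pos < len(tokens) and isinstance(tokens[pos], int):
            result.add(pos + 1)
        return result
    if symbol not in AFFIX_RULES:                # terminal affix
        if pos < len(tokens) and tokens[pos] == symbol:
            result.add(pos + 1)
        return result
    # A variable of this type stands for any value of the type.
    if pos < len(tokens) and tokens[pos] == symbol:
        result.add(pos + 1)
    for alternative in AFFIX_RULES[symbol]:
        positions = {pos}
        for part in alternative:
            positions = {e for p in positions for e in ends(part, tokens, p)}
        result |= positions
    return result

def compatible(expression, affix):
    """True if the whole expression can be derived from the affix."""
    return len(expression) in ends(affix, expression, 0)
```

With these assumed rules, `compatible(["one", "one", "N"], "N")` holds because N derives "one N" and then "one one N", whereas `compatible(["one", "alpha"], "N")` does not.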

A tree construction happens whenever an affix sentential form appears as an argument to be unified with an inherited variable affix position or as a synthesized affix position. A tree split happens whenever an affix sentential form appears as an inherited affix position or as an argument to be unified with a synthesized affix position.

There are special cases of confrontation since we need predefined data types (INTEGER for integers, CHARAC for characters and STRING for character strings) or enumerated types (every value is a single terminal affix: it allows the definition of the implicit intersection of types).

When possible, tests which are needed during the confrontations are carried out at compile time and therefore rules which are able to succeed are statically sorted. It avoids useless attempts at confrontation (see § IV).

Application of the Uniform Replacement Rule and the confrontation, combined with the split identifier facilities, give an idea of the concision and readability of the programs. With these affix rules defining lists of integers:

    L : empty ; X L.
    X : INTEGER.

Test for the equality of two lists:

    If ( : L ) is ( : L ) : TRUE.

Extract the first element of a list:

    The First of ( : X _L ) is ( X : ) : TRUE.

Test the first element of a list:

    If the first of ( : X _L ) is ( : X ) : TRUE.
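The three rules above rely only on confrontation: a tree split on inherited positions, plus the requirement that every occurrence of a variable denotes the same value. A minimal Python sketch of that behaviour follows. The tuple encoding of lists, the pattern notation and all function names are assumptions for illustration; type checks (X being an INTEGER, L being a list) are supposed to have been done statically and are not modelled.

```python
def cons_list(items):
    """Encode [2, 4, 3] as the affix value '2 4 3 empty' with nested pairs."""
    value = "empty"
    for item in reversed(items):
        value = (item, value)
    return value

def confront(pattern, value, bindings):
    """One-way match of an inherited pattern against a known value.

    Returns the extended bindings, or None on failure. A variable that is
    already bound must meet an equal value (Uniform Replacement Rule).
    """
    if pattern == "empty":
        return bindings if value == "empty" else None
    if isinstance(pattern, str):                 # an affix variable
        if pattern.startswith("_"):              # unused value
            return bindings
        if pattern in bindings:
            return bindings if bindings[pattern] == value else None
        return {**bindings, pattern: value}
    if not isinstance(value, tuple):             # tree split impossible
        return None
    head, tail = pattern
    bindings = confront(head, value[0], bindings)
    if bindings is None:
        return None
    return confront(tail, value[1], bindings)

def if_is(a, b):
    """If ( : L ) is ( : L ) : TRUE."""
    first = confront("L", a, {})
    return first is not None and confront("L", b, first) is not None

def first_of(lst):
    """The First of ( : X _L ) is ( X : ) : TRUE.  None models failure."""
    found = confront(("X", "_L"), lst, {})
    return None if found is None else found["X"]

def if_first_is(lst, x):
    """If the first of ( : X _L ) is ( : X ) : TRUE."""
    found = confront(("X", "_L"), lst, {})
    return found is not None and confront("X", x, found) is not None
```

The design point this sketch tries to convey is that no explicit equality test or copy appears in the rule bodies: reusing the name L or X is what expresses the constraint.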

Unused variables are a convenient means of imposing type constraints over the values. We illustrate the use of STARLET as a logic programming language on a complete program that reads an integer list, builds it in memory and outputs it in the reverse order.

Example 2 :

    $ Input "3 4 2" outputs "2 4 3" $

    ROOT : Reverse, next line.
    $ "next line", "read integer", "write integer", "write charac" are external functions $

    AFFIXES :
    L : empty ; X L.
    X : INTEGER.
    CAR : CHARAC.

    NOTIONS :
    Reverse / L, L1 :
        Read (L), Reversed list of (L) in front of (empty) is (L1), Write (L1).
    Reversed list of ( : X L ) in front of ( : L1 ) is ( L2 : ) :
        Reversed list of (L) in front of (X L1) is (L2).
    Reversed list of ( : empty ) in front of ( : L ) is ( L : ) : TRUE.
    ?Read ( X L : ) : read integer (X), write integer (X), write charac (' '), Read (L).
    Read ( empty : ) : next line.
    Write ( : X L ) : write integer (X), write charac (' '), Write (L).
    Write ( : empty ) : TRUE.
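The rule "Reversed list of (L) in front of (L1) is (L2)" is the usual accumulator scheme: each element split off the first list is constructed in front of the second. A Python sketch of the same computation follows, under the assumption that lists are encoded as nested pairs ending with "empty"; the function names are hypothetical, and the echo of the input performed by "Read" in Example 2 is left out.

```python
def read(text):
    """Build the affix value 'X L' from whitespace-separated integers."""
    value = "empty"
    for item in reversed(text.split()):
        value = (int(item), value)
    return value

def reversed_in_front_of(lst, acc):
    """Mirror of 'Reversed list of (L) in front of (L1) is (L2)'."""
    while lst != "empty":
        x, lst = lst          # tree split of 'X L'
        acc = (x, acc)        # tree construction of 'X L1'
    return acc

def write(lst):
    """Mirror of 'Write': each integer is followed by one space."""
    out = []
    while lst != "empty":
        x, lst = lst
        out.append(f"{x} ")
    return "".join(out)

def reverse_program(text):
    return write(reversed_in_front_of(read(text), "empty"))
```

Because "Write" emits a space after every integer, the sketch keeps the trailing space in its result.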

III.3 A translator written in STARLET/GL

We illustrate STARLET programming with the example of expression translation. We obtain a list of identifiers and then an assignment instruction where the right hand side is an infix expression (the input device is the standard one). We parse it and check the identifier use: each used identifier must have been declared once only. Then the tree representation of a correct "program" is translated into its postfix representation on the standard output device. If the expression turns out to be constant (each of its operands is a constant), the translator evaluates it.
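The pipeline described above (declarations, identifier checks, infix parsing with the usual precedence, constant evaluation, postfix output) can be sketched in Python as follows. This is a hand-written recursive descent written for illustration, not output of STARLET/GL: the class and function names are hypothetical, the token syntax is an assumption, errors are raised as exceptions instead of being recovered from, and output tokens are simply joined with single spaces.

```python
import re

TOKEN = re.compile(r"\s*(:=|[A-Za-z_]\w*|\d+|[+*();,])")

def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected character at {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

class Translator:
    def __init__(self, text):
        self.tokens, self.pos, self.symtab = tokenize(text), 0, []

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def need(self, expected):
        if self.peek() != expected:
            raise SyntaxError(f"{expected!r} expected")
        self.pos += 1

    def ident(self):
        tok = self.peek()
        if tok is None or not (tok[0].isalpha() or tok[0] == "_"):
            raise SyntaxError("identifier expected")
        self.pos += 1
        return tok

    def declarations(self):
        while True:
            name = self.ident()
            if name in self.symtab:
                raise NameError("identifier already declared")
            self.symtab.append(name)
            if self.peek() != ",":
                break
            self.pos += 1
        self.need(";")

    def reference(self):
        name = self.ident()
        if name not in self.symtab:
            raise NameError("identifier not declared")
        return name

    def factor(self):
        tok = self.peek()
        if tok is not None and tok.isdigit():
            self.pos += 1
            return int(tok)
        if tok == "(":
            self.pos += 1
            tree = self.expression()
            self.need(")")
            return tree
        return self.reference()

    def term(self):
        tree = self.factor()
        while self.peek() == "*":
            self.pos += 1
            tree = (tree, "*", self.factor())
        return tree

    def expression(self):
        tree = self.term()
        while self.peek() == "+":
            self.pos += 1
            tree = (tree, "+", self.term())
        return tree

def fold(tree):
    """Value of the tree if every operand is a constant, else None."""
    if isinstance(tree, int):
        return tree
    if isinstance(tree, tuple):
        left, right = fold(tree[0]), fold(tree[2])
        if left is None or right is None:
            return None
        return left + right if tree[1] == "+" else left * right
    return None

def postfix(tree):
    if isinstance(tree, tuple):
        return postfix(tree[0]) + postfix(tree[2]) + [tree[1]]
    return [str(tree)]

def translate(text):
    t = Translator(text)
    t.declarations()
    target = t.reference()
    t.need(":=")
    rhs = t.expression()
    if t.peek() is not None:
        raise SyntaxError("end of input expected")
    value = fold(rhs)
    return " ".join(postfix((target, "=", rhs if value is None else value)))
```

As in the description, the right hand side is replaced by its value only when the whole expression is constant; no partial folding is attempted.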

Example 3 :

    $ input "a, b, c; c := b + c * a" outputs "c b c a * + =" $
    $ input "a ; a := 2 + 3 * 4" outputs "a 14 =" $

    ROOT : translate.

    WITH : lexico.
    $ This program uses notions from this module : needed EOF, error, symbol, ident, constant... $

    AFFIXES :
    $ Abstract syntax $
    absy : absy op absy ; cste ; name.
    cste : INTEGER.
    name : STRING.
    op : plus ; mult ; ass.
    symbtab : name symbtab ; empty.      $ A symbol table is a list of identifiers $

    VARIABLES : symbtab.                 $ Global variable to contain the symbol table $

    NOTIONS :
    translate : init (symbtab), declarations, instruction, needed EOF.
    declarations : identifier list, needed symbol (";").
    identifier list : one variable, others identifiers.
    others identifiers : symbol (","), identifier list ; TRUE.
    one variable / name1 :
        ident (name1), check and enter (name1) in (symbtab) gives (symbtab) ;
        error ("identifier expected").
    instruction / absy : assignment (absy), postfix (absy).
    assignment ( name1 ass absy1 : ) / absy :
        variable reference (name1), needed symbol (":="), expression (absy),
        optimization of (absy) gives (absy1).
    ?variable reference ( name : ) : ident (name), check for (name).
    variable reference ( "foo" : ) : error ("identifier expected").
    $ "foo" replaces the missing identifier in symbtab $
    expression ( absy1 : ) / absy2 : term (absy2), rest of expression (absy2, absy1).
    ?rest of expression ( : absy, absy1 : ) / absy2 :
        symb ("+"), term (absy2), rest of expression (absy plus absy2, absy1).
    rest of expression ( : absy, absy : ) : TRUE.
    term ( absy : ) / absy1 : factor (absy1), rest of term (absy1, absy).
    ?rest of term ( : absy1, absy : ) / absy2 :
        symb ("*"), factor (absy2), rest of term (absy1 mult absy2, absy).
    rest of term ( : absy, absy : ) : TRUE.
    ?factor ( cste : ) : constant (cste).
    ?factor ( absy : ) : symb ("("), expression (absy), needed symbol (")").
    ?factor ( name : ) : variable reference (name).
    factor ( 0 : ) : error ("operand is needed").
    postfix ( : cste ) : write integer (cste).
    postfix ( : name ) : write string (name).
    postfix ( : absy1 op absy ) : postfix (absy1), postfix (absy), write op (op).
    write op ( : plus ) : write string (" + ").
    write op ( : mult ) : write string (" * ").
    write op ( : ass ) : write string (" = ").

    $ Optimization $
    ?optimization of ( : absy ) gives ( absy1 : ) :
        if expression (absy) is constant its value is (absy1).
    optimization of ( : absy ) gives ( absy : ) : TRUE.
    ?if expression ( : absy1 op absy2 ) is constant its value is ( absy30 : ) / absy10, absy20 :
        if expression (absy1) is constant its value is (absy10),
        if expression (absy2) is constant its value is (absy20),
        eval (absy10, op, absy20, absy30).
    if expression ( : cste ) is constant its value is ( cste : ) : TRUE.
    ?eval ( : absy1, : plus, : absy2, cste : ) : add (cste1, cste2, cste).
    ?eval ( : absy1, : mult, : absy2, cste : ) : mul (cste1, cste2, cste).

    $ Symbol table management $
    init ( empty : ) : TRUE.
    ?check and enter ( : name ) in ( : symbtab ) gives ( symbtab : ) :
        look for (name) in (symbtab), error ("identifier already declared").
    check and enter ( : name ) in ( : symbtab ) gives ( name symbtab : ) : TRUE.
    look for ( : name ) in ( : name _symbtab ) : TRUE.
    ?look for ( : name ) in ( : _name1 symbtab ) : look for (name) in (symbtab).
    check for ( : name ) :
        look for (name) in (symbtab) ;
        error ("identifier not declared").

IV STARLET/GL for grammatical debugging

Optimizations consist of cutting useless pieces of the generated program. During this process, the translation of some rule calls or some rules may become impossible. Therefore, optimizations can lead to error messages that help to debug the program without having to execute it. We introduce some of the optimizations made by the STARLET/GL compiler which are related to grammatical properties. They help either language design (grammatical debugging) or translator debugging.

First, for each rule call, we select only the rules whose parameters are of the "right type". Note that with Prolog this cannot be done and that some rules are always rejected when tried during execution. This optimization may reveal logical errors, since we only work on programs which have at least one solution: if, for a call, not a single rule has suitable parameters, the call will always fail, so that the rule that contains this call is a systematic blind alley; likewise, a rule may never be used because of its parameter types.

This is checked at compile time by means of affix rule parsing. Sorting the rules before run time enables the user to give the same name to rules which process objects of different types, without useless attempts at confrontation.

Example : Given the affix rules

    LE : empty ; e LE.
    e : INTEGER.
    LN : empty ; n LN.
    n : STRING.

    write ( : e LE ) : ...
    write ( : n LN ) : ...
    write ( : empty ) : ...
    A : ... write (LE), write (LN) ...

In trying the development of "A", the first "write" call will not use the second "write" rule, while the second will not use the first "write" rule. A single rule can be used to process values which are not of the same type but which share some structures (e.g. "empty"). The third rule will be used for each of the "write" calls in the development of "A".
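The static selection just described can be pictured as a table lookup performed before run time. The Python sketch below is an assumption-laden illustration, not the compiler's algorithm: affix rules are encoded as tuples of symbols, each "write" rule is reduced to the structure of its inherited parameter, and the names `AFFIX_RULES`, `WRITE_RULES` and `candidate_rules` are hypothetical.

```python
# Legal structures of each affix, taken from the affix rules of the example.
AFFIX_RULES = {
    "LE": [("empty",), ("e", "LE")],
    "LN": [("empty",), ("n", "LN")],
}

# Structure of the inherited parameter of each "write" rule, in source order.
WRITE_RULES = [("e", "LE"), ("n", "LN"), ("empty",)]

def candidate_rules(arg_type, rule_heads):
    """1-based indices of the rules that may succeed for an argument type."""
    structures = AFFIX_RULES[arg_type]
    selected = [i for i, head in enumerate(rule_heads, start=1)
                if head in structures]
    if not selected:
        # No rule has suitable parameters: the call is a systematic blind alley.
        raise ValueError(f"call with a {arg_type} argument always fails")
    return selected
```

With this encoding, the call "write (LE)" keeps rules 1 and 3 and the call "write (LN)" keeps rules 2 and 3, which matches the selection stated above.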

Static checks ensure that tree construction will always succeed. At run time, in the case of tree split, one must also check that the actual values of variables have the right structures. Thus, for the rule call "write (LE)", there is a test which tells if the structure of the variable "LE" is "e LE" (the first "write" rule is tried) or "empty" (the third "write" rule is tried). Note that it may reveal problems when the affix rules define an ambiguous grammar [Maluszynski-84] (as the "absy" affix rule in Example 3). As only one successful parse is retained (parse of affix sentential forms given the affix rules), one may not split some trees. However, Example 3 implements an unambiguous translation since the grammar which is used to parse infix expressions is unambiguous (operator precedence is well-defined).

Secondly, we reject some rules by checking the context of a call, i.e. the type or value constraints fixed by the previous and the following calls in the development. It enables optimizations, but if every possibility is eliminated it shows that there are logical errors. Here we also find some errors: a succession of calls may always fail, or a rule is no longer used because of the context of its calls.

Example : Given the affix rule

    L : x L ; empty.

    Read ( x L : ) : ....
    Read ( empty : ) : ....
    Write ( : x L ) : ....
    Write ( : empty ) : ....
    Try / L : Read (L), Write (L).

Inside the "Try" development, if the first rule "Read" is a success (resp. the second), we should try only the first rule "Write" (resp. the second).
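This context check amounts to pairing the structure a producer rule synthesizes with the structure a consumer rule expects for the same variable. A minimal Python sketch, assuming each rule is summarized by the structure of its single affix position (the names `READ_RULES`, `WRITE_RULES` and `feasible_pairs` are hypothetical):

```python
# Structure synthesized by each "Read" rule, in source order.
READ_RULES = [("x", "L"), ("empty",)]
# Structure expected (inherited) by each "Write" rule, in source order.
WRITE_RULES = [("x", "L"), ("empty",)]

def feasible_pairs(producers, consumers):
    """1-based (producer, consumer) rule pairs that agree on the structure."""
    pairs = [(i, j)
             for i, produced in enumerate(producers, start=1)
             for j, expected in enumerate(consumers, start=1)
             if produced == expected]
    if not pairs:
        # Every possibility is eliminated: the succession of calls always fails.
        raise ValueError("the succession of calls can never succeed")
    return pairs
```

For the "Try" rule this yields the pairs (1, 1) and (2, 2), so after a successful first "Read" rule only the first "Write" rule needs to be tried.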

The declarations of the possible failure of the rules (test rules) allow another optimization to be made when unreachable rule calls can be cut. This can lead to another logical error detection when a rule is always unreachable because of a previous rule that always succeeds (action rules).
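One way to picture this check: an alternative made only of actions never fails, so any alternative written after it can never be reached. The Python sketch below rests on that reading and on a hypothetical encoding (alternatives as lists of called notions, a `kinds` table saying whether each notion is a test or an action); it is not taken from the STARLET/GL compiler.

```python
def unreachable_alternatives(alternatives, kinds):
    """0-based indices of alternatives placed after one that never fails."""
    for index, calls in enumerate(alternatives):
        # An empty alternative (TRUE) or one made only of actions never fails.
        if all(kinds[call] == "action" for call in calls):
            return list(range(index + 1, len(alternatives)))
    return []

# Hypothetical rule: two alternatives in the style of the L1 recognizer,
# followed by a third one that can never be tried.
KINDS = {"anbncn": "test", "write string": "action"}
ALTERNATIVES = [["anbncn", "write string"], ["write string"], ["anbncn"]]
```

Here `unreachable_alternatives(ALTERNATIVES, KINDS)` reports the third alternative (index 2), which a compiler could either cut silently or flag as a probable logical error.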

Lastly, dataflow checking is very strict in the current implementation since every variable must receive a value. Global variables are not considered as logical variables: they are assigned by a side-effect and their use is not checked.

V Conclusion

We are working on an efficient implementation of Extended Affix Grammars which provides high-level debug facilities (grammar debugging) through numerous static checks. Previous experiments with an algorithmic implementation of affix grammars (LET) were made and we already appreciate the ease of use of the STARLET predicative interpretation, since it eliminates many of the coding difficulties previously encountered.

We have a few experimental results, since STARLET/GL is currently used in a research program on the automatic analysis of French (3000 lines, M. De Brito) and has been used for an expert system generator (5000 lines, L. Coudouneau). Moreover, our students learn compiling techniques with this compiler compiler (150 per year).

On the other hand, we still have a lot of work with the associated programming tools. It includes not only further developments and maintenance on the PLEIADE environment but also the design of new tools as soon as we are able to make grammatical programming methodologies explicit (method-driven programming tools).

References

Many references are only available in French (see "-F-"). INSAL denotes Institut National des Sciences Appliquées de Lyon. BULLET is a local journal (6 issues since 1986) which can be obtained upon request to the authors. Most of the LET and STARLET applications are reported in it.

Beney-80 -F-
BENEY (J.), FRECON (L.). Langage et Systèmes d'Ecriture de Transducteurs. RAIRO Informatique, 1980, Vol. 14, n° 4, p. 379-394.

Beney-86 -F-
BENEY (J.). La notion de programmation grammaticale. BULLET, Juin 1986, n° 1, p. 16-21.

Beney-87
BENEY (J.). Présentation de STARLET/GL. Rapport Interne INSAL, Laboratoire d'Informatique Appliquée, 1987.

Beney-89
BENEY (J.), BOULICAUT (J.F.). Transfer of Expertise between Two-Level Grammars and Logic Programs Through an Affix Implementation. in : Proceedings of Informatika'89, (E. Tyugu Ed.), Tallinn (USSR), May 29th - June 2nd 1988, 18 p.

Benharkat-90 -F-
BENHARKAT (A.N.). Atelier Logiciel PLEIADE : Edition des modules et suivi des applications STARLET. Ph.D. thesis : INSAL, 1990, 228 p.

Borland-86
BORLAND International. Turbo Prolog Owner's Handbook. Scotts Valley, USA, 1986.

Colmerauer-77
COLMERAUER (A.). Metamorphosis grammars. in : Natural Communication with Computer, Springer-Verlag, 1977, LNCS 63, p. 133-189.

Deransart-88a
DERANSART (P.), JOURDAN (M.), LORHO (B.). Attribute Grammars : Definitions, Systems and Bibliography. Springer-Verlag, LNCS 323, August 1988, 232 p.

Deransart-88b
DERANSART (P.), MALUSZYNSKI (J.). Grammatical view of Logic. in : Proceedings of PLILP 88, Orléans, France, May 1988, Springer-Verlag, LNCS 348, 1989, p. 219-251.

E2S-84
Expert Software System. MIRA user's guide. E2S Building "de Schelde", Moutstraat 100, B-9000 GHENT.

Franzen-77
FRANZEN (H.), HOFFMANN (B.), POHL (B.), SCHMIEDECKE (I.R.). The EAGLE Parser Generator : an Experimental Step towards a Practical Compiler-Compiler using Two Level Grammars. in : 5th Annual III Conference, Guidel, France, May 1977, p. 397-420.

Frecon-89 -F-
FRECON (L.). Grammaires affixes : applications et questions ouvertes. Document de travail, Atelier Lyon-Nimègue sur les grammaires affixes, 24-26 juin 1989 (Organisation INSAL), 25 p.

Harti-89 -F-
HARTI (H.). Exploitation prédicative des grammaires affixes : interprète et machine virtuelle STARLET. Thèse de Doctorat : INSAL, 1989, 168 p.

Jourdan-88
JOURDAN (M.), PARIGOT (D.). The FNC-2 System : advances in attribute grammars technology. INRIA, April 1988, RR-834, 28 p.

Koster-70
KOSTER (C.H.A.). Affix Grammars. in : ALGOL 68 Implementation, J.E.L. PECK Ed., North-Holland, 1970, p. 95-109.

Koster-74
KOSTER (C.H.A.). A technique for parsing ambiguous grammars. in : Proceedings of GI-4 Jahrestagung, (D. Siefkes Ed.), Springer-Verlag, 1975, LNCS 26, p. 233-246.

Koster-77
KOSTER (C.H.A.). CDL : a Compiler Implementation Language. in : Methods of Algorithmic Language Implementation, Springer-Verlag, 1977, LNCS 47, p. 341-351.

Maluszynski-84
MALUSZYNSKI (J.). Towards a programming language based on the notion of two-level grammars. Theoretical Computer Science, 1984, Vol. 28, p. 13-43.

Meijer-86
MEIJER (H.). PROGRAMMAR : a translator generator. Ph.D. thesis : University of Nijmegen, 1986, 225 p.

Pereira-80
PEREIRA (F.C.N.), WARREN (D.H.D.). Definite Clause Grammars for Language Analysis : a survey of the formalism and a comparison with Augmented Transition Networks. Artificial Intelligence, 1980, Vol. 13, p. 231-278.

Saidi-90
SAIDI (S.), BOULICAUT (J.F.). Checking and Debugging Transparent Two-Level Grammars. ECL : Research Report 90-10, July 1990, 24 p.

Simonet-81 -F-
SIMONET (M.). W-grammaires et logique du premier ordre pour la définition et l'implantation des langages. Thèse d'Etat : Grenoble, 1981, 329 p.

Watt-74
WATT (D.A.). Analysis-oriented two-level grammars. Ph.D. thesis : University of Glasgow, January 1974.

Watt-83
WATT (D.A.), MADSEN (O.L.). Extended Attribute Grammars. The Computer Journal, 1983, Vol. 26, n° 2, p. 142-153.
