A FORMAL SEMANTICS FOR COMPUTER-ORIENTED LANGUAGES ...

11 downloads 220703 Views 3MB Size Report
programming course. Also of great value was the assistance of many members of the .... languages contains all the information needed in a translator for that.
A FORMAL SEMANTICS FOR COMPUTER-ORIENTED LANGUAGES

by

Jerome A. Feldman

Submitted to the Carnegie Institute of Technology in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Pittsburgh, Pennsylvania 196&

/

. _ j-

/

.}

/I _-_///

PREFACE

In this thesis oriented

languages

practical

results

which

aspects

of the formalization

are considered. made

such languages. compiler

several

possible

The most has been

The emphasis

has been

by a formalization

important

of these

implemented

using

of computerplaced

on the

of the semantics

results

of

is the compiler-

the formal

systems

described

here as a base.

The author guidance

is deeply

in this work.

He would

tions made by the students course.

indebted

to Professor

also like to acknowledge

of his undergraduate

Also of great value

Tech Computation

by Messrs.

and R. MacEwan

The research Projects

Agency

the Carnegie

reported

Center,

of Defense

of Technology.

iv@

for his

the contribuprogramming

of many members especially

and by Miss

here was supported

of the Department

Institute

advanced

was the assistance

staff of the Carnegie D. Blocher

Alan Perlis

Nancy

that provided

Lintelman.

by the Advanced under

of the

the Grant

Research SD-1A6

to

TABLE OF CONTENTS Page Preface. • • • • • • • • • • • • • • • • • • • • • •

iv

Table of Contents. • • • • • • • • • • • • • • • • •

v

Abstract • • • • • • • • • • • • • • • • • • • • • •

vi

CHAPTER I.

INTRODUCTION.

• • • • • • • • • • • • •

1

CHAPTER II. SYNTAX. • • • • • • • • • • • • • • • • A. The Production Language........... B. Formal Properties..............

9 9 21

CHAPTER A. B. C. D. E. F. G. H.

33 33 37 A2 51 63 66 73 78

IIl. SE_L_NTICS............... Introduction............... Storage Declarations . . . . . . Storage Operations . . . . . . . Operands and Primaries . . . . . Expressions................. Statements . . . . . . . . . . . Subroutines................. Program Structure..............

CHAPTER IV.

• . . . . . . . . . . . . . . . . . . . . . . . .

IMPLEMENTATION.............

83

CHAPTER V. CONCLUSION............... A. Review and Critique............. B. F_ture Research...............

95 95 102

APPENDICES . . . . . . . . . . . . . . . . . . . . . Notes on the Appendices........... A. Backus Normal Form Syntax of the Small Language.............. B. Production Language Syntax of the Small _2nguage.............. C. Semantics of the Small Language....... D. Backus Normal Form Syntax of FSL...... E. Production Language Syntax of FSL...... F. Sample Small Language Programs . . . . . . .

109 109

BIBLIOGRAPHY....................

153

V.

ll5 ll7 127 133 137 l&7

ABSTRACT

This dissertation

presents

study of the formalization oriented

languages.

which is capable large

there are several

sections

so that the thesis

detailed

where we also formalizing

establish others

Chapter guage which

is a computer

language,

in a program

any of a

in this class

are all the

in the thesis

is based

to theoretical

development.

Although

questions,

This organization

as a userts

guide

on the

these

was chosen

to the compiler-

language.

to the paper

is given

some of the philosophical

semantics.

Chapter

II contains

between

in Chapter

questions

a discussion

used in the compiler-compiler.

relationships

appearing

of computer

the compiler-compiler.

devoted

introduction

consider

s_vntax language

called

could also serve

as a computer

attained

languages.

of the results

are set off from the main

A more

Included

problem-oriented

of this program,

result

into machine

languages.

The presentation

of results

properties

important

of translating

usual high level

compiler

of certain

The most

class of source

structure

a number

In this

this formalization

raised

I, by

of a formal chapter

we

of syntax and

in the literature.

III is a complete is the main

discussion

contribution

we show how the two formal

systems

of the Formal

of this thesis. were

vi

combined

Semantic

In Chapter

LanIV

to form the basis

for a useful computer technique.

The final chapter contains a

discussion of the strengths and weaknesses of our system as well as several suggestions for future research.

The appendices form an integral part of the thesis.

The examples

contained there include a record of the development of a translator for one small language from a formal definition of the language to examples of resultant machine code.

vii

i@

I.

This thesis of an attempt

is a study

to formalize

Among the languages guages

INTRODUCTION

of some theoretical the semantics

in this class

such as ALGOL

such as Markov

processes,

but are not usually

the semantics not consider guage.

definitions

The emphasis

which

for that

language.

machine

will

semantics.

which

will

construct,

appropriately

languages.

This program

would

for the complete

formal

description

program,

based

it in the compilation

we will usually

in later

Formalized

interesting

results

application

a translator

semantics,

syntax,

of computer

(compiler) for the

combined

provides

languages.

on the formalisms

of compilers

lan-

chapters.

be a compiler-compiler

for formalizing

Since

for any computer-oriented

described,

it is implemented.

a computer

computational

on the practical

The most

lan-

we include

code as a computer-oriented

be placed

techniques

have used

order

in this paper

one of the known

structed

to be computer

this situation

has been

on which

specify

will be said about

program

language

In addition,

[30] which

considered

machine

from a formalized

is in a computer

[_3].

languages.

problem-oriented

are to be processor-independent,

a particular

Much more

attainable

Algorithms

consequences

of computer-oriented

are the existing

bO [31] and COMIT

formalisms

and practical

the facility We have

described

for several

with

con-

below,

well-known

and

lan-

guages.

The entire distinction

formal

between

scheme depends

syntax

on a particular

and semantics.

definition

The definitions

of the

we have chosen

@

differ

from those used by most mathematical

to the intuitive

We define

notions

an alphabet,

any string of symbols symbols. finite sidered

strings

_

of symbols

an uninterpreted

can be uniquely

from

of the uninterpreted

effective

_,

there

_.

select

called the s.yntactic meta-lan_uage of _may

strings

in _or

an algorithm

_,

_

of

in which

can be con-

is called

a formal

mentioned

consider

above

need

only languages

the test is specified

an algorithm

for deciding

of

A specification

used to specify _.

be either

set, _,

_

_.

The process we will

such that

into its component

is an infinite

the subset _of

The language

(syntax)

decomposed

over the alphabet

language _

process

set of symbols

Any subset,

[lO], but in this paper

tests.

but do correspond

and semantics.

, to be a finite

language

which will

not be effective with

in

_

For any such alphabet,

of a process syntax

of syntax

logicians,

is

The selection

for generating

of a string

all the

in _whether

or

not it is in _.

Any other well-defined semantical. processor

An uninterpreted will be

interesting only with

interpreted

it is possible us here.

language

connected

in this

system used

_.

in _will with

thesis,

at least

languages, referring

semantic

one semantic

there

are many

we will deal

to them

a semantic

meta-language

uninteresting

be considered

Although

to describe

is called a semantic to define

together

with uninterpreted

languages

A formal

some lan_=uage,_,

concern

on strings

called an interpreted

problems

as laDA_ua_es.

that

operation

simply

processor

of doperators

on

The fact will not

.

In our formulation, a specification is what

the syntax

of which

logicians

have

strings

called

There are good reasons

are permissible

syntax

logic,

it is usual

the difference to refer

a system as syntactical. ence

example

rithm describes

according

a method

of these operations

are considered

tactic. on what formed

which

chooses

However,

formulas,

language.

then

calculus

(quantifier-free

for selecting

strings

One algorithm

specifies

rules.

the set of theorems

is once

selection

logic.

results.

again

purely

The syn-

depends

the set of welloperator.

in the predicate is a formal

algo-

are theorems.

algorithm

is a semantic

selector

which

in mathematical

If_is

selector

from

A second

strings which

strings

within

out the differ-

up with the following

to be the language,_.

the theorem

operation

point

syntactical

of the theorem

then the theorem

choose to consider

the language,_,

the well-formed

the nature

one considers

one comes

those

out, there

In mathematical

formalized

to the formation

for choosing

Using our definitions, operation

algorithms

over the alphabet.

are well-formed

This

[_, p.58].

[9] points

may help

of the propositional

strings

only

language.

sense

is not essential.

In any formulation

there are two well-known

in that

As Church

of syntax.

the set of all strings

we

so.

includes

chosen not to use our defini-

in the two notions

logic)

Both

have

to any completely

A simple

language

in the narrow

why logicians

tion of syntax and why we have done is a sense in which

of a formal

calculus

syntax

If as

of that

A good discussion of why the wider definition of syntax is useful in logic may be found in Church [9, PP. _7-68].

The principle motivation

behind our definitions is their close correspondence with the operation of compilers operating on digital computers.

As we will show, the specifi-

cation of the formal syntax and formal semantics of a computer-oriented languages contains all the information needed in a translator for that language.

In Chapter II we will discuss two types of syntactic meta-languages, with the emphasis on a particular recognizer-oriented language which has proved useful in building translators.

Section II-A contains a description

of this language as a programming language.

In Section II-B we consider

the formal properties of this syntactic meta-language in comparison with some well-known formal systems.

Chapter III contains a complete description of the semantic metalanguage, FSL (Formal Semantic Language), which is the principle contribution of this thesis.

The discussion treats FSL as a programming language

and contains frequent references to language given in Appendix C.

the semantic description of a small

In Chapter IV we describe a computer progr_n

which builds, from a formal description of a language, a translator for that language.

Although this program is highly machine dependent, we

require the formal descriptions themselves to be independent of the machine on which the translator is run.

In setting up such a scheme we have fol-

lowed the semiotic of C. S. Pierce, introduced into programming by Gorn [21].

.

The semiotic concept of language is that it consists of three parts: syntax, semantics, and pragmatics.

Syntax is concerned with the selection

of those strings which are members of the language.

Semantics consists

of the meanings of a construct in the language independent of the interpreter involved.

Those aspects of language introduced by the consid-

eration of what is interpreting the language are called pragmatic.

The

interesting point is that we have been able to devise a useful trichotomy between these three aspects of computer-oriented languages.

As we have mentioned, the syntax and semantics of computer languages are to be described in machine-independent formal systems.

Machine depen-

dence (pragmatics) occurs in the programs which utilize the formal descriptions.

The following block diagram should help illustrate how formal

syntax and semantics can be used in a compiler-compiler.

& Source Code in L

T Syntax of L -_

Syntax Loader

Fixed Part of Compiler Basic

Compiler

Semantic Semantics of L Loader i_-_Object Code Meta-Compiler

Compiler

Figure 1.

.

_Wnena translator for some language, L, is required, the following steps are taken.

First the formal syntax of L, written in the syntax

meta-language, is fed into the Syntax Loader.

This program builds tables

which will control the recognition and parsing of programs in the language L.

Then the semantics of L, written in the Formal Semantic Language,

is fed to the Semantic Loader.

This program builds another table which

in turn contains a description of the semantics of L.

The Basic Compiler is a table driven translator based on a recognizer using a single pushdown stack.

Each element in this stack consists of two

machine words -- one for a syntactic construct and one holding the semantics of the construct.

The syntax tables determine how the compiler will

parse a source language program.

When a construct is recognized, the

semantic tables are used to determine what actions are to be taken.

The entire system is based on the formalization of syntax and semantics, but not on the particular representations chosen.

The remainder of

this thesis will be largely devoted to a discussion of the two meta-languages used in the compiler-compiler at Carnegie Institute of Technology. The system has been running since early in 196A and translators for such languages as ALGOL, FORTRAN, LISP and MARKOV ALGORITHMS have been written.

Chapter II of this thesis discusses the syntax meta-language used in our system.

This language, Production Language, is similar to systems

used for different purposes by Floyd [15], Evans [13] and Graham [22].

0

The most interesting feature of these systems is the absence of the explicitly recursive processes which are found in most syntax directed compilers.

The Formal Semantic Language (FSL) is described in Chapter III. The discussion there is basically a userts guide to the FSL system. Part of the FSL description deals with the semantic loader of Figure 1. Each of the three basic parts of Figure l;

syntax loader, semantic loader,

and basic compiler, is a complex computer program.

In Chapter IV we

describe how these were combined to form a compiler-compiler.

In Chapter V we review and criticize the first four chapters. Among the topics discussed are the strengths and weaknesses of our system, the quality of translators produced and opportunities for future research.

The appendices form an important part of the paper.

In Appendices

A-C we describe a subset of ALGOL 60 called the "small language'r. Appendix A consists of the Backus Normal Form syntax of the small language. Its syntax in Production Language is Appendix B and Appendix C contains its formal semantics written in FSL.

Throughout Chapters II and III

there are references to examples taken from these appendices.

Appendices

B and C were used just as they appear to build a compiler for the small language.

Appendices D and E contain a formal syntactic definition of the Formal Semantic Language itself.

The Backus Normal Form syntax is

Appendix D and the Production Language syntax is AppendixE.

The question

of writing the FSL semantics in FSL is discussed in the section of Chapter V which deals with weaknesses of the system.

Appendix F contains several examples of small language programs and their machine language equivalents.

Thus Appendices A, B, C, and F enable

one to follow the operation of the compiler-compiler from abstract syntax to machine code for the small language.

II.

SYNTAX

9.

A, .... The Production Language

For most computer scientists, formal syntax is synonymous with Backus Normal Form or BNF.

The use of BNF to represent the formal syntax

of ALGOL 60 [31] had a marked effect on the field.

Its simple and precise

specification of ALGOL syntax was a great improvement over the unwieldy English descriptions associated with previous programming languages. Besides improving human communication, the use of formal syntax has led to new formal developments in the theory and practice of programming.

Theoreticians were able to show that BNF grammars were capable of generating the same class of languages as those generated by several other formalisms.

From the theory of natural languages came the Context

Free Grammars of Chomsky [6] and the Simple Phase Structure Grammars of Bar Hillel Ill, both equivalent to BNF.

A mathematician, Ginsburg showed

that BNF was equivalent to a formalism he called Definable Sets [20]. In addition, many results on the formal properties of such systems were attained.

The most interesting practical development was the syntax-directed compiler.

If the syntax of programming languages can be represented

formally, it should be possible to construct a language-independent syntax processor for a translator.

As early as 1960, Irons [25] was able to

write a translator whose syntax phase was independent of the source

10.

language being translated.

This work and that of other early contributors

such as Brooker and Morris E2_ led to speculation that the entire translation process can be automated.

This dissertation can be viewed as an effort

in this direction.

With this weight of accomplishment behind BNF one might well question the usefulness of developing other syntax languages.

Unfortunately, for

several reasons BNF is inadequate as a universal syntax meta-language. We will present the theoretical weakness first and then discuss some practical difficulties and how they may be overcome.

As Floyd E183 has shown, Backus Normal Form is not capable of representing the language ALGOL 60 as defined by the English description in the report E31_.

That is, not only is the BNF syntax in the ALGOL report

inadequate to represent that language, but there is no BNF grammar that can do so.

The concepts not expressible in BNF include relations imposed by

the data types of declared variables and the number of parameters of subroutines.

Since at least one of these constructs occurs in almost all

programming languages, BNF is in some sense too weak to represent programming languages.

To make matters worse, there is a sense in which BNF grammars are too powerful.

The languages generated by BNF grammars are sufficiently complex

to have recursive undecidability play a significant role.

llo

A problem

in artificial

no single algorithm any language lowing

which will

in the class

problems

[i0].

generate

2)

Is the language

generated

of the possible In addition, i.e.,

are questions languages

no proofs

of such

that recursive

on computers things,

recognizers

Intuitively,

details

incurred

translator

a single

BNF grammar

in more

indicate

that,

is ambiguous,

than one way [A_]. about

These

programming

in general,

this cannot

means

are slower

the speed

convenience

stem from their

of the recursive

notation

slow processing.

it is generally

in recursive

human

design

in the

its alphabet?

on BNF grammars

The elegance

below.

greater

over

sentence

above

contained

all (or all but a finite number)

whether

1Lmitations

nature. which

that the fol-

be

with BNF grammars.

The practical

processing,

time for

grammar?

like to be able to answer

and the results

done for languages

recursive

strings

the same

one would

[I] has shown

by one grammar

generate

it is undecidable

if it generates

in a finite

is

the same language?

by a second

Does a BNF grammar

if there

for BNF grammars:

Do two grammars

3)

is undecidable

a solution

Bar Hillel

l)

generated

theory

produce

are undecidable

language

where

language

agreed

processors

difference processing. compensates

arises

inherently

indicates

Although

(cf. [5], [22], than

implied

there

are

[A2])

the type described

from the housekeeping

There are many for this

does not seem to be one of these.

applications

difference, Furthermore,

but error

12.

detection and recovery is more difficult in recursive recognizers [5].

A

survey of the relation of formal syntax to programming may be found in an article by Floyd [17].

It was also Floyd who introduced an alternative notational technique, that of productions, into language analysis [16].

Like BNF, this produc-

tion notation was originally devised to improve communication between people.

It was so useful in representing symbol manipulation algorithms

that it was natural to make a programming language from it.

When an effort

to implement ALGOL 60 was undertaken at Carnegie Tech, this notation was chosen for the coding of the syntax phase of the translator [13].

A good deal is now known about the theoretical and practical aspects of Production Language and other recognizer-oriented grammars.

The theory

[15] and practice [22] of bounded context grammars has received the greatest attention.

Our Production Language system differs from those described

elsewhere and thus will be presented in some detail below.

The remainder

of this section deals with the use of our system while its formal properties are discussed in Section II-B.

The Production Language, PL, is based on the model of a recognizer with a single pushdown stack.

The syntax of a source language, when writ-

ten in PL, will specify the behavior of the recognizer as it scans an input text in that source language. the stack of the recognizer.

As a character is scanned it is brought into The symbol configuration at the top of this

13. stack is compared with the specification of the source language being translated.

When the characters in the stack match one of the

specified configurations for the source language, certain actions are performed on the stack.

If a match is not attained the stack is

compared with the next PL statement.

This process is repeated until

a match is found or until the stack is discovered to be in an error state.

The functioning of the recognizer will be discussed in connection

with the example at the end of this section.

The format of a statement (production) in PL is given below.

Label

L5

L_

L3

L2

L11 _

R3

R2

R1 I

Action

Next

In this format the first vertical bar from the left represents the top of the stack in the recognizer.

The symbols LS...L1 in a given pro-

duction will be characters which might be encountered in translating the source language. Only the top _ blank.

The symbol in position L1 is at the top of the stack. characters (1 ___

__5) need be present, the others being

If only three symbols are specified in some production, it will

be compared with the three top positions of the stack during translation. The symbols LS...L1 constitute the "specified configuration" mentioned in the preceding paragraph.

Any of LS---L1 may be the character t_,

which

represents any symbol and is always matched.

A match occurs when the actual characters in the stack during translation are the same as LS...L1 in the production being compared.

At this

time the remainder of the information in that production is used.

The

t--_, following the bar specifies that the stack is to be changed.

The

symbols R3...RI specify the symbols which must occupy the top of the stack after the change.

If no t___, occurs then the stack will not be changed

and _3...R1 must be blank.

Thus LS...L1 refer to the top of the stack

before and R3...R1 to the top of the stack after a production is executed which changes the stack.

An example of a production containing a '_-_' is found on line II of the fragment example at the end of this section.

If this production is

matched the characters on top of the stack will be

T

The execution

of line

11 will

*

leave

the

P

top

of stack

in

the

configuration

T

Such a production would be used in a language such as ALGOL to recognize a multiplication.

In the 'action_ field of the production on line ll there occurs tEXEC 2'. above.

All actions are performed after adjusting the stack as described

The execution of EXEC 2 initiates the call of a semantic routine.

In this case the semantic routine would cause the multiplication of the fourth and second elements in the stack.

The value of the result would

then be put in the second position in the stack and translation continued.

15.

The semantic routines themselves are written in a special purpose language discussed in Section III.

The last part of the production in line ll is the symbol 'T2' in the next field.

This means that the next production to be compared with

the stack is the one labeled 'T2'.

(It is on line 13.)

The production on line 13 is an example of a production in which the stack is not changed.

If this line is matched the symbol at the top

of the stack is ,.t denoting multiplication.

At this point the translator

requires another operand before the multiplication can be performed. Since no action can be taken, the production on line 13 does not change the stack and its action field is blank. production is t*Pl'.

In the label field of this

The asterisk is this position is a special symbol

of the Production Language and specifies that a new character is to be brought into the stack before transferring to the production labeled P1. The asterisk is equivalent to the command "SCAN" which may occur in the action field of a production.

These are the only ways that a new char-

acter can be introduced into the stack.

Among the other commands which can appear in the action field of a production are EXEC, ERROR, and HALT.

The action EXEC is the main link

between syntax and semantics in our system and will be discussed in detail below.

The action ERROR causes its parameter to be used in print-

ing an error message for the program being translated.

An execution of

16. HALT causes translation to end and, if there have been no errors, the running of the compiled code.

There is one feature of Production Language which does not appear in the syntax of the language fragment.

It is possible to specify classes of

symbols and use the class name in a production.

This technique allows the

user to replace many similar productions with one production. B the class

_TP_

In Appendix

consists of the reserved words REAL, BOOLEAN, and LABEL.

In the productions under D3, the class name

_TP_

is used in processing

declarations in the small language, each production using

_TP_

replacing

three productions.

The remainder of this section will be a discussion of the example on page 20.

The language fragment description in the example is a subset of

the syntax of arithmetic expressions in a language such as ALGOL 60.

The mnemonic significance of the letters in the example is related to the ALGOL syntax and is given in the following table

Letter

May be Read

I

Identifier

P

Primary

T

Term

E

Expression

117.

The parentheses, asterisk and minus sign have their usual meanings. There are no productions to recognize identifiers because this is done in a pre-scan phase.

We will assume that the production syntax of the

rest of the language is compatible with the fragment.

The productions under the label P1 (lines 7-10) are used when a primary is expected. unary minus sign.

Line 7 is matched if the primary is preceded by a

In this case the minus is left in the stack where it

will be processed by line 15.

Similarly, the parenthesis in line 8 is

left in the stack to be processed by line 18 when its mate is scanned.

In line 9 an identifier is recognized and changed to a primary. Then EXEC 1 will initiate a call of a semantic routine which will see if the identifier is properly defined and, if so, will assign the value associated with that identifier to the primary.

Any other character appearing in the stack at this point is a syntactic error.

This is expressed in line 10 where the action field contains

an error message.

Line ll recognizes a multiplication and has been discussed earlier. Line 12 (also 16) only changes the name of the symbol in the stack.

This

device is used to enable the productions to realize the rule of arithmetic that requires multiplication to bind more tightly than subtraction. The production in line 13 embodies this implementation. followed by an t.v, there is a multiplicationwhich

If a term is

must be performed

18.

before that term is used in any subtraction.

For this reason line 13

brings a new symbol into the stack and causes a transfer to PI to pick up the other operand for that multiplication.

The productions under E1 (lines l&-16) recognize subtractions and form the final expressions.

Those under E2 (line 17-20) process an expression

once it is recognized.

If the expression is followed by a minus, it is

part of a larger expression so the remainder.

line 17 returns control to PI to process

The production on line 18 embodies the rule that all opera-

tions within parentheses must be executed before the parenthesized expression acts as an operand.

If the fragment were imbedded in a full language

(e.g., Appendix B) other

uses of arithmetic expressions would be checked

for under E2.

In the fragment, expressions are used in just the two ways

described above so that any other character in the stack leads to an error conaition.

The label QI is assumed to lead to an error recovery routine.

For an

example of a sophisticated error recovery scheme using production language see reference [13].

The production syntax for a complete programming language is included as Appendix B of this paper. language is Appendix A. BNF and a PL grammar.

The Backus Normal Form syntax oi the same

A word is in order regarding the equivalence of a Nowhere in this paper is there an attempt to present

a formal proof of the equivalence of two such grammars.

The writing of

19.

productions is in essence a constructive programming task and discrepancies should be treated as tbugs' in a program.

To the best of our knowledge

the examples are all debugged.

The task of writing productions is not a difficult one.

The produc-

tions for the small language (Appendix B) were assigned, with good results, as a homework problem in an undergraduate programming course.

In fact,

one of the students wrote a program in IPL-V which converts a BNF specification of a language into productions for a large class of languages. This work will be reported in a forthcoming paper.

20.

BNF Syntax of a Language Fragment

1.

< I >

:: =



2.

< P >

:: =

< I >

3.

< T >

:: =

< P>

_.

(



> ) *

< P >



Production Syntax for the Same Fragment

, LABEL

6.

7.

LEFT

n

io. TI

T

*

P

J

_i

f

61-'-_

T _i

i4.

T

_i E -

_i-e

15.

T _'i--->

16.

T

(

RIGHT

-f

i_. T2

18.

I < I >

: i l

5.

ii.

i < I >

T

_I

ACTION

*n

_oRi

Qi

EXEC 2

T2

f E

NEXT

*Pi

61

EXEC3

E2

E _l

FxEc 4

E2

_I-_ _,_I

I

I

_ )I-+

PI

Figure 2.

_,_ ExEc 5

_,Ti

21.

B,

Formal

Properties

The Production languages

Language

is a formal

grammar.

mathematical

properties

will discuss

the relationship

known

There

the syntax

has been much

of such formal between

grammars

of various

research

source

done on the

and in this section

Production

Language

we

and some well-

grammars.

Most

of the formal

generator-oriented. a set of rules are applied initial

languages

grammar of L.

of characters,

are

L, is

The generation

rules

with

to classify

of the languages

of Chomsky

studied

for a language,

starting

have been devised

to the complexity

the terminology

a fixed these

they specify.

[6] in describing

paper,

Chomsky

describes

each is capable

of generating

in the hierarchy.

The most

Type O, which transforming

allows

as a grammar

character

restrictions

strings.

We

a hierarchy

of

of grammar

complex

languages

general

grammar

he considers

essentially The other

the restriction

rule does not decrease

four types

more

on the permissible

In TYPE 1 grammars, generation

schemes

have been

types.

shows that

various

properties

the sentences

to strings

Various

In his definitive

successor

whose

A generator-oriented

recursively

according

follow

grammars

for generating

symbol.

grammars will

used to describe

any finite

types

than

and its

in

set of rules

are obtained

by placing

rules.

is that each application

the length

of the sentence

for

being

of a

22.

generated. that

The TYPE 2 grammars

each rule specifies

dent of its context. replacements tenee being types,

the replacement

The weakest

which effect generated.

establishes

some well-known

may be characterized

grammars,

gives

the hierarchy,

mathematical

machine

of executing

Turing machines,

TYPE i grammars rules may depend have been property

will

these

of those

four

formalisms

to

and thus

of the character

i.e.,

for determining

on a Turing

equivalent

TYPE 0 grammars,

(cf. Davis

types,

to the class

performable

perform

on the context

that they are decidable,

string

are capable

[lO])

because being

the replacement

replaced.

These

but do have the interesting

for any TYPE 1 language

if a given

string

there

in its alphabet

is

of that language.

tion in the literature. Structure

Form [32J.

which

context-sensitive

The TYPE 2 or context-free

Phrase

definition

is shown to be equivalent

function.

used less than the other

a sentence

a precise

are called

exists an algorithm

those

on only one end of the sen-

i.e., for any computations

any computable

only

indepen-

constructs.

there is a TYPE 0 grammar

transformations.

character,

TYPE 3, allow

and then relates

The class of TYPE 0 grammars of all Turingmachines,

of a single

a transformation

Chomsky

by the restriction

These

Grammars

It is this

languages

have received

are equivalent

Ill, Definable

last equivalence

Sets which

the widest

atten-

to the formalisms:

Simple

[20], and to Backus

Normal

has made

TYPE 2 grammars

23. interesting to workers in programming languages.

Despite its theoretical

inadequacies, Backus Normal Form has established itself as a canonical form for the description of syntax of artificial languages.

In this

section, we will discuss the formal properties of the Production Language as compared with those of Backus Normal Form.

In this discussion we will make use of results by Chomsky [7] and Evey

[1A].

In independent efforts, they attempted to construct automata

which would be equivalent to Backus Normal Form.

We will follow Chomsky

in presenting the definition of a Push Down Store Automaton.

This automa-

ton is recognizer-oriented and will, in fact, be shown equivalent to the Production Language.

A pushdown store automaton, M, consists of a finite

state control unit and two unbounded tapes, an input and a storage tape. The control unit can read symbols of the input alphabet AI from the input tape and the symbols of the output alphabet A0 _ AI from the storage tape. It can also write the symbols of A0 on the storage tape.

If M is in the

state Si scanning the symbols a on the input tape and b on the storage tape, we say that is in the situation (a, Si, b).

There is an element _

which cannot appear on the input tape nor be written on the storage tape. At the beginning of a computation, M is in the state SO, with the storage tape containing _o in every square and the input tape containing a string bI ... bn symbol _ state SO .

in the first n squares. in every square.

After bn, the input tape has a special

A computation ends when the M returns to the

If at this time M is in the situation ( _c, SO, _)

we say it

2A.

accepts

(recognizes)

A computation

the string

bI ... bn.

of M is controlled

(1)

by a set of rules

(a, Si, b) --_

where a

E A I, b @A 0 ;

control of X.

unit.

If

X = _o

We can assume,

S i = SO and K LEFT2)_

ZO_GONST{LEFT2J_

COMT _ ; TEMPLOC

RIGHT2_

_

LEFT2

|

SyMB[LEFT2,,$,]=BOOL*RIGHT2_SYMB[LEFT2,$,,]; SET(RIGHT2,LOGIC] ; TEMPLOC * T3 Zl¢ Z2_

RIGNTI

CODE( LEFTS* TEMPLOC* LEFT41 TEMPLOC* RIGHT2_ TEMPL. OC; SETIRIGHT2,LOGIC] I IALLY[TEMPLOC] SYMB[LEFT4,,$,I=BOOL4 CODE( COMT 2* LEFT2}:

COMT FAULT

Z4_

COMT 4_ SYMB(LEFT4,$,,]I CONST(LEFT2] 4 COMT 2*

LEFT2:

SN

_

SYMB[LEFTI,,$,] _ LABE _ FAULT COMT 2_ LOC[SYMB[LEFTI,$,,]]; COMT CODE(

3_ ; JUMP[CHAIN[

_7_

ASSIGN[FLAP2]

_B*

CODE(

_9_

MTNUS{LEV];

50.

ENTER[SYMB;LEFT2,0,LABE,

3_

CODE(

VALUe2

STOP}

* LEFI4

LEFT2$)

COMI

2_

SYMBILEFT2,$,,]:

_0

4 CODE(

6: $_

_ LEFT2)

POP[STR,STORLOC];

POP[SYM,LOC[SYMB]} 0

;

6:

SYMB[LEFTI,,,$] COMT 2] ]) $$

SYMB[LEFT2,,$,] _ LABE _ FAULT SYMB[O,,,$] * I ; ASSI_N[LOC [ SYMS[ LEFT2,$,,]]]

TAPE

355_

2_ SYMB[LEFT4,$,,]; 45

SYMB[LEFT2,,$,]=SYMB(LEFT4,,$,] FAULT 5 $ $; TEMPLOC _ T3; CODE( COMT 4- COMT 2)

_5_

: FAULI

* LEFT2

23_

_5_

SYMB[LE_T4,$_,]; _ T3 I FAULT 35

}

JtJMP[COMT

3])

S

137. Appendix E Productions for the Formal Semantic Language * *

SYMBOLS 18 INTERNAL

SYMBOLS ! OP AP AF AT AE TL LO BP BS BF BE IC UN UQ S SQ +

/

/

c >

< >

v

v

A

A

e

t

Q



|

( { j )

$ ( [ } )

!

t

_EGIN END STOP STACK CELL TITLE TABLE DATA

BEGN END STOP STAK CELL NAME _ABL DATA

]..'38.

,

JUMP MARKJUMP EXECUTE ENTER PUSH POP GHAIN ASSIGN CODE FAULT TALLY MINUS SET TEST |NT _OC CODELOC STORLOC TEMPLOC GOMT HUNT VALUE_ VALUE2 VALUE3 LEPTI LEPT2 LEFT3 LEFT4 LEFT5 NIGHTI RIGHT2 NIGHT3 FLADi FLAD2 FLAD3 FLAD4 SIGNAL HSTAK RTABL CONST CLEAR OK PLACE DEPTH SAR SIN COS _XP LN BQRI ARCTAN SIGN TRUE FALSE METACHARACTERS

TR TM X E ST NS CH UC CD. FALT TAL MIN SET

rEST IN L A W

FLOG CT RT Vl V2 V3 Y1 Y2 Y3 Y4 Y5 Zi Z2 Z3 FLI FL2 FL3 FL4 SIG RSTK RTAB CON CLE OK PLAC DEP SAR SIN COS EXP LN SORT ARC SIGN TRUE FALS

ILO.

DO

BEGN

DZ

D2

TA TAt TA2



I

TABL I

! .

Sl

EX1

LAB OPt

I_

'

L

TL

B1

CD APt

[ [ [

END I I • I ; [ I . I ] TABL . TABL ; TABL E CD. FALT SlOP ' ; I _ _ OK w [LOC SIG TEST I ' OP OP OP I ] I ] I I . I [ I CT I RT I CD. ( ( I IN L

_ _ * * *



* * _ _

EXEC I ERROR 0 SCAN

*D$ Q2 *D2

HALT EXEC ERROR EXEC EXEC SCAN ERROR SCAN EXEC

D1 *D% Q2 D_ *D% .TA Q2 *TA1 *TA2 DI *D1 D1 Q2 *API *SP1 ,EX1 ,CD *F1 *UN *LAB *SI *S% *EXI *BSZ .OPt lOP1 *BS1 *AP1 AP1 *S1 AF1 BSI AFI Q1 .OPI AE3 AE2 TO .APt OP1 Q1 .OP1 *OPI 01 *S1 Q1 ,EX1 *ID *AP1 *API ,AP1

2 I 3 3 2 4

TAHL ERROR 3 SCAN SCAN SCAN ExEC 5

_

UN

_ _

_ _ [ - * _ TL _ _

19

ERROR

4 _ _ _

_ _ _ _

EXEC 57 SCAN

BP OP OP BP

EXEC EXEC EXEC EXEC SCAN

6 15 16 17

EXEC 7 AP 8P AP



AE AE AE TL OP

OP ] . [

_ _

OP OP

_

CD.

ERROR 5 EXEC 58 EXEC 8 EXEC 8 EXEC 9 EXEC 10 EXEC 11 ERROR 6 EXEC 12 EXEC 13 ERROR 7 ERROR

SCAN SCAN

4

ILl.

AFt

AF

_

AF2 ATI

AT



AT2 AEi

AE

+ -

AE2
) • _ , ] ) ] ] ] ] } ] ] ] ] ] ] } ] }

SCAN

_ _ _

AP AP AP

_ _ _

AE OP AP

_ _

AE AP AP

EXEC EXEC EXEC SCAN EXEC EXEC ExEC

14 15 16

EXEC EXEC EXEC ERROR EXEC

49 22 66 18 21

18 19 20

_

AF



_

AF



_ _

AT AT



EXEC

23

_ 4 _ _

AE AE AE AE



EXEC

24

EXEC

25

ERROR

_ _ e TL _ _ _ _ _ _ _ _ * * * * * _ _ _ * * * _ _

AE E 8P UN UN BP

8

OP

EXEC

26

AP , AE TL AE UN UN AP UN UN UN AE UN BP UN UN UN UN BP UN

EXEC

27

EXEC EXEC EXEC EXEC EXEC

27 28 29 30 31

EXEC EXEC EXEC EXEC EXEC EXEC EXEC EXEC EXEC EXEC EXEC EXEC EXEC EXEC EXEC EXEC

32 33 34 35 36 37 38 39 40 64 65 41 42 54 56 43

*APZ *B1 *B1 *AF1 .AF1 *AF1 *AP1 *AE3 *OP1 .AF1 *AP1 *AE2 .AF1 *AF1 Q1 AF2 *EX1 AF2 *AP1 AT2 AT2 *AP1 AE2 AE2 AE2 AE2 Ol *AP1 AE3 *OP1 *API ,AF1 TO ,AP1 *EX1 BS1 UN UN BS1 *AE2 *UN *UN *AFZ *UN *UN *UN *AE2 *UN *BSZ *UN *UN *UN *UN *BSZ *UN

1L2.

851

_

BF1

BF

^

8F2 BE1

BE

v

BE2

AP

( _ _

UN

BP BP BS BS BF BF BF BE BE BE BE BE UN

S9 IG

SQ1 IG

SQ SQ

SQ

; s CD.

l CD. IC IC

S S S S SQ SQ SQ SQ SO SO

SP1 SP2

E

[ TL

TO

TL TL

AE AE

T1

02

F1 •

OTHbR I% 12 13 14 15 1 6

FALT END

STUFF 4 0 0 0 0 20

^ v _ ) $ ) $ ) _ = $ ; ] ; • $ • ]

I I I ! I I I I I I I I I I I I I I I I I I I I I I I ! I I I I I I I I I

END _ END

I I _ I I I _ I _ I

_ _ * _

8S BS BF 8F

_ _

BE BE

_ * * *

UN UN

_

S

* _ * _

SQ

SQ

_ _ _

_ _

E

_ _ _

TL

i I I I I t I I I I IC I BP I I I I I I I S I S I I I S I S I I I S I t I TL I TL I I AE I OP I I I I



UN

I I I I I I I

ERROR 9 EXEC 44 EXEC 45

EXEC 46 ERROR 10 EXEC 47 EXEC 48 EXEC 31 EXEC 50 ERROR 11 ERROR 12 EXEC 53 EXEC 52 ERROR 13 EXEC 53 EXEO 52 EXEC 59 EXEC 55 EXEO 51 ERROR 15 EXEC 63 ERROR 16 EXEO 60 EXEC 61

ERROR 17 HALT

HALT EXEC 62 ERROR 20

Ol BF1 BFI BF2 BF2 ,EX1 BEZ BE2 Ol *EX1 *$1 *BS1 UN UN Q1 $9 Q1 SQI *S9 .S9 SO1 Q1 *S9 ,S9 *D1 *S1 ,S9 *S% Q1 _SP2 *EX1 Q1 *TO .TI *Ti *OP1 Ol Q1 tQ1 D1 Q1 *Q2 ,UN Ol

1L3. 1 7 1 8 I g 110 IZi 112 113 114 1:1.5 116 117 118 1:1.9 I20 I2% J 0 J t J J J J J

62 67 0 I 67 6 15 3 0 36 2 565 •

0 28 16273 492 0 65 67 3 _1 18

3 4 5 6 7

LABEL

% 2 3 4 5 5 7 B 9 10 11 12 13

LABEL NAME DO DI 02 D2 $I TA TA1 TA2 APl SP1 EX1 CO F1

TABLE VALUE 0 4 481 14 48 26 31 37 136 452 67 131 481

14

UN

400

15 15

LAB BSt

81 350

17

OP1

85

18 19 20 2% 22 23 24 25 26 27

AF1 01 ID AE3 AE2 TO B1 AF2 ATI AT2

176 475 97 268 219 461 123 i87 190 198

IA? • Appendix

This appendix small language and C. rather

consists

and translated

to illustrate

an example

plex code.

program

of a semantic reader

by the system

involved

structures

may yield an unrealistically

programs built

written

in the

from Appendices

to do useful

computations,

B but

of the compiler.

involving

statements.

run with several

The interested

code produced of really

the functioning

and arithmetic

pler, but has been contains

are meant

1 is a correct

of conditional

sample

by the compiler

None of the programs

Example

mance.

of three

F

fairly

Example

trace

error,

options as well

complicated

uses

2 is somewhat

sim-

on.

Example

3

as some fairly

com-

can get a good idea of the type

from these

examples.

(procedures, optimistic

etc.)

picture

However,

of

the lack

in the small of the sytem's

language perfor-

125. b0407 654/7 65507 65517 65527 655_! 65547 65501 65567 65517 60607 65&J7 656Z7 65037 65641 656_7 656b7 •

0003UU00210 00070000321 00030000000 00130000010 n0070000143 00020000002 0003t)000000 00030000206 00030000001 00070000120 00060000001 0007U000155 00030000210 00010000031 000t0000033 00060000001 00070000435 TAPE 210 RECORD

00030000000 0U020000004 00010000023 00070000435 00060000001 00030000204 00050000024 00030000000 00030000000 00060000001 00070000143 00_20000002 OOO300n0000 000600(10001 0U070000402 00070000056 00060000001

000100t}0020 0n030000205 D00700U032_, 00060000001 00070000J4,_ UO03OO0000U 0007001)032_ 00070000327 0001000002_ 0007000036b 0006000000_ 00030000204 U0010000030 00070000120 00020000005 00020000004 0007000012(J

TYPE

NUMBER

SYMBOL TABLE HIERARCHY TABLE PRODUCTION TAeLE METACHARACTER LISTS TIME

USED|

00=02|43

00070u00120 OU030',OnO00 OOO?O,,O_JO04 000701_01)276 00020"0_1002 00010_,0n010 01}060,,00001 00020,,0r_003 O0070uOff327 00060_'0n001 00070_100143 O0030_,Or)O00 00070u0_)120 00130,=00011 00030f_00211 00030h00220 001401100000 OF RECORDS 1 1 I i

PA_ES

USEDI

00020000004 00010000022 00030000205 00020000001 00030000201 00060000001 00070000306 00030000207 00020000004 00070000276 00020000002 00010000010 001300000_0 U0070000435 00030000000 00010000035 00070000435 STARTING 21 21 22 22

8

00030000205 00070000321 00030000000 00060000001 00030000000 00070000143 00060000005 00010000025 00030000210 00020000005 00030000201 00060000001 00070000435 00020000005 00010000033 00070000430

RECORD .+01 =+01 .+01 .+01

19114109

00030000000 00020000004 00010000021 00070000143 00010000010 U0022000U02 00070000276 00070000102 00030000000 00060000001 00030000000 00070000143 U0020000002 00030000001 00070000402 00020000001

00010000021 00030000205 00070000321 00060000001 00070000155 00030000205 00020000002 00020000005 00010000027 00070000_43 00010000010 00020000004 00030000210 00030000000 00020000002 00010000037