Automatic OO Parser Generation using Visitors for Ada 2005
Martin C. Carlisle
US Air Force Academy
2354 Fairchild Dr, Ste 6G101, USAFA, CO 80840-6234
+1-719-333-3590
[email protected]

ABSTRACT
We describe AdaGOOP 2005 (the Ada Generator of Object-Oriented Parsers). AdaGOOP 2005 takes a specification of tokens and an LALR(1) grammar and creates a lexer, a parser that automatically creates a parse tree, and a traversal of the parse tree using the visitor pattern. AdaGOOP generates output that is similar to that of the Java tool SableCC. It takes advantage of the new interface feature available in Ada 2005 to create a visitor.

Categories and Subject Descriptors
D.3.4 [Programming Languages]: Translator writing systems and compiler generation; D.1.5 [Programming Techniques]: Object-oriented programming

General Terms
Languages, Theory

Keywords
AdaGOOP, automatic parser generation, visitor pattern, Ada 2005.

Copyright 2006 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the U.S. Government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only. SIGAda’06, November 12–16, 2006, Albuquerque, New Mexico, USA. Copyright 2006 ACM 1-59593-563-0/06/0011...$5.00.

1. INTRODUCTION
A large number of tools have been developed to automate lexer and parser generation [1, 2, 3, 4, 5, 6, 7, 8, etc.]. Probably the best known of these are Lex and Yacc [4,5] and their GNU cousins Flex and Bison [3,7]. Although these tools were developed for C programmers, Ada versions (Aflex and Ayacc) were developed at the University of California, Irvine [1].

Using these tools, the programmer provides a set of regular expressions and an LALR(1) grammar, and obtains a completely functional lexer and parser. AdaGOOP 2005 is an Ada 2005 automatic parser generator built on top of the SCATC versions of aflex and ayacc [2]. Like SableCC, AdaGOOP automatically generates code to build a parse tree, and a traversal of the parse tree using the visitor pattern. This tool is distributed in the public domain as a service of the United States Air Force, but it is unsupported and without any warranty. It is currently available at: ftp://ftp.usafa.af.mil/pub/dfcs/carlisle/usafa/adagoop.

2. USING THE TOOL
AdaGOOP is distributed as Ada 2005 source code. To use the tool, simply download and compile the source (adagoop.adb contains the main procedure). The files that AdaGOOP outputs are then run through SCAFLEX and SCAYACC. Source code for these tools is distributed in subfolders of the same names (scaflex.adb and scayacc.adb are the main procedures).

AdaGOOP is executed from a command prompt as follows:

    adagoop input_file prefix

(e.g. adagoop ada05.g ada05)

Prefix should be a simple name for the project. This name is used in several places in the output files.

The input file to AdaGOOP consists of three parts: token_macros, tokens, and the grammar. Token macros are regular expressions that form parts of tokens. For example,

    token_macros
    DIGIT [0-9]
    INTEGER ({DIGIT}+)

defines two macros, DIGIT and INTEGER, that can be used to create other tokens. Tokens are formed in the same fashion; however, their regular expressions can only contain token macros, not other tokens. The regular expression operators supported are given in Table 1. Comments should be defined with the token “ignore”, e.g.

    ignore "--"[^\n]*

Tokens and token macros should each be specified one per line. Comments, introduced by --, may be interspersed. This is why the ignore token above must have the -- in quotation marks. The final section, “grammar”, gives an LALR(1) grammar. It should begin with a %start, indicating which non-terminal symbol is the starting symbol of the grammar, e.g.

    grammar
    %start A
    A : A token1 B | token2 | ;
    B : token2;
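As a rough illustration of the token macro mechanism described in this section, the sketch below expands a {MACRO} reference by plain textual substitution. It is only a sketch under stated assumptions: the procedure Expand_Macros and the helper Expand are hypothetical names, not part of AdaGOOP, and the paper does not say how AdaGOOP or scaflex actually carries out the substitution.

```ada
with Ada.Text_IO;           use Ada.Text_IO;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

procedure Expand_Macros is

   --  Hypothetical helper (not part of AdaGOOP): returns Pattern with
   --  every occurrence of "{Name}" replaced by Definition.
   function Expand (Pattern, Name, Definition : String) return String is
      Key    : constant String := "{" & Name & "}";
      Result : Unbounded_String;
      I      : Positive := Pattern'First;
   begin
      while I <= Pattern'Last loop
         if I + Key'Length - 1 <= Pattern'Last
           and then Pattern (I .. I + Key'Length - 1) = Key
         then
            --  Found a macro reference: copy its definition instead.
            Append (Result, Definition);
            I := I + Key'Length;
         else
            --  Ordinary character: copy it unchanged.
            Append (Result, Pattern (I));
            I := I + 1;
         end if;
      end loop;
      return To_String (Result);
   end Expand;

   --  The two macros from the example: DIGIT and INTEGER.
   Digit_Def   : constant String := "[0-9]";
   Integer_Def : constant String :=
     Expand ("({DIGIT}+)", "DIGIT", Digit_Def);

begin
   Put_Line ("INTEGER => " & Integer_Def);  --  prints INTEGER => ([0-9]+)
end Expand_Macros;
```

Compiled as a single unit (for example with GNAT), this prints the fully expanded regular expression for INTEGER. A token that references {INTEGER} could be expanded the same way by a second call to Expand.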


Each grammar rule consists of a non-terminal name, followed by a colon, followed by one or more sequences of terminals and nonterminals, separated by ‘|’. Note that a null production is specified by simply having one of these sequences be empty. The rule is terminated with a semicolon. White space is irrelevant. A complete sample file is given in Appendix A.

    Operator     Meaning
    [0-9A-FZ]    Any 1 of 0-9, A-F, Z
    [^xyz]       Any 1 character except x, y, z
    \n           New line
    .            Any 1 character
    \.,\(        Period, left paren
    X*           0 or more X's
    X+           1 or more X's
    X?           0 or 1 X
    X|Y          Either X or Y
    "X*"         Literally X* (ignore operators)
    {MACRO}      Replace with macro definition
    ()           Precedence

Table 1. Regular expression operators of AdaGOOP

3. ADAGOOP OUTPUT
Throughout this section, “prefix” refers to the command-line argument passed to AdaGOOP. AdaGOOP generates prefix.l, an input file for scaflex. This file contains the specifications of tokens. The output file prefix.y is an input file for scayacc. It contains the grammar, plus code to build an object-oriented parse tree.

Once AdaGOOP has been run, you can create the lexer and parser by running (for our example):

    scaflex prefix.l
    gnatchop -w prefix.a
    gnatchop -w prefix_io.a
    gnatchop -w prefix_dfa.a
    scayacc prefix.y
    gnatchop -w prefix.a
    gnatchop -w prefix_goto.a
    sed -e 1d prefix_tokens.a > prefix_modified_tokens.a
    gnatchop -w prefix_modified_tokens.a
    gnatchop -w prefix_shift_reduce.a

Note that gnatchop is needed only if your compilation environment (such as GNAT) requires package specifications and bodies to appear in separate files. sed is used only to delete the first line of prefix_tokens.a. We needed to add with context clauses to this file, but they appeared after the package declaration. To resolve this, we added the context clauses, another package declaration, and within a batch script used sed to delete the first package declaration.

For the object-oriented parse tree, AdaGOOP creates an abstract type, Parseable, that serves as the root of our hierarchy (it has no fields):

    type Parseable is abstract tagged null record;

We also create a single type, Parseable_Token, to handle all terminal symbols.

    type Parseable_Token is new Parseable with record
       Line         : Natural;
       Column       : Natural;
       Token_String : String_Ptr;
       Token_Type   : Token; -- enumeration type of all tokens
    end record;

Both Parseable and Parseable_Token are found in the package Prefix_Model. A production A => B C, where B and C are nonterminals, corresponds to a node in the parse tree containing pointers to the subtrees derived through B and C. We therefore create the following class declaration (which will appear in the package specification a_model.ads):

    type A_nonterminal is new Parseable with record
       B_part : access B_nonterminal'Class;
       C_part : access C_nonterminal'Class;
    end record;

Since we have a single class for all terminal symbols, if we have A => B terminal1 C, we simply add a pointer to a Parseable_Token as the second field of the record. The parse tree then contains all of the information needed to completely recreate the original source text, with the exception of comment tokens (which are ignored).

To address the issue of multiple productions for a single nonterminal, we insert an abstract class for the non-terminal, and have concrete classes corresponding to each possible production. For example, if we have A => B terminal1 C | D, then AdaGOOP outputs the following record declarations in the package A_Model:

    type A_nonterminal is abstract new Parseable with null record;
    type A_nonterminal_Ptr is access all A_nonterminal'Class;


    type A_nonterminal1 is new A_nonterminal with record
       B_part         : access B_nonterminal'Class;
       Terminal1_Part : Parseable_Token_Ptr;
       C_part         : access C_nonterminal'Class;
    end record;

    type A_nonterminal2 is new A_nonterminal with record
       D_part : access D_nonterminal'Class;
    end record;

The tool also automatically numbers the fields when a non-terminal or terminal appears more than once on the right hand side of a production. So, if we had A => B terminal1 B instead, the record would look like:

    type A_nonterminal1 is new A_nonterminal with record
       B_part1        : access B_nonterminal'Class;
       Terminal1_Part : Parseable_Token_Ptr;
       B_part2        : access B_nonterminal'Class;
    end record;

AdaGOOP also generates code for traversing the parse tree using the Visitor pattern [9]. Each class in the parse tree hierarchy has an acceptor method, which is used to dispatch to the appropriate visitor method:

    procedure Acceptor
      (This : access Parseable;
       I    : access prefix_Visitor_Interface.Visit_prefix_Interface'Class)
       is abstract;

The automatically generated interface Visit_prefix_Interface specifies visitor methods that take no additional parameters and do not return a value. It is output to the file prefix_visitor_interface.ads.

As can be seen from the following sample body (automatically generated), Acceptor merely calls the appropriate visitor method found in the interface:

    procedure Acceptor
      (This : access A_nonterminal2;
       I    : access Visit_prefix_Interface'Class) is
    begin
       I.Visit_A_nonterminal2 (This);
    end Acceptor;

Visit_prefix_Interface contains one visitor method for each leaf of the hierarchy, and also supports both pre-order and post-order depth first traversals by providing “before” and “after” methods that can be overridden:

    limited with A_Model;
    limited with prefix_Model;
    package prefix_Visitor_Interface is
       type Visit_prefix_Interface is interface;

       procedure Visit_Parseable_Token
         (I : access Visit_prefix_Interface;
          T : access prefix_Model.Parseable_Token'Class) is null;

       procedure Before_A_nonterminal1
         (I : access Visit_prefix_Interface;
          N : access A_Model.A_nonterminal1'Class) is null;
       procedure Visit_A_nonterminal1
         (I : access Visit_prefix_Interface;
          N : access A_Model.A_nonterminal1'Class) is abstract;
       procedure After_A_nonterminal1
         (I : access Visit_prefix_Interface;
          N : access A_Model.A_nonterminal1'Class) is null;

       procedure Before_A_nonterminal2
         (I : access Visit_prefix_Interface;
          N : access A_Model.A_nonterminal2'Class) is null;
       procedure Visit_A_nonterminal2
         (I : access Visit_prefix_Interface;
          N : access A_Model.A_nonterminal2'Class) is abstract;
       procedure After_A_nonterminal2
         (I : access Visit_prefix_Interface;
          N : access A_Model.A_nonterminal2'Class) is null;
       …
    end prefix_Visitor_Interface;

A depth first traversal class, DFS, is generated into the package prefix_DFS. DFS implements the visitor interface, and its visit methods perform a depth first traversal of the parse tree. For example, the following is the visitor method corresponding to the grammar rule A → A PROCEDURE B:

    procedure Visit_A_nonterminal1
      (I : access DFS;
       N : access A_Model.A_nonterminal1'Class)
    is
       I_Classwide : access DFS'Class := I;
    begin
       I_Classwide.Before_A_nonterminal1 (N);
       N.A_part.Acceptor (I);
       I_Classwide.Visit_Parseable_Token (N.PROCEDURE_part);
       N.B_part.Acceptor (I);
       I_Classwide.After_A_nonterminal1 (N);
    end Visit_A_nonterminal1;
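The acceptor and visitor mechanics described in this section can be condensed into a small self-contained program. The sketch below is hand-written, not AdaGOOP output: the names Visitor_Sketch, Tree, Leaf, Pair, Client and Printer are hypothetical, and the tree is a toy one with a single leaf type and a single two-child node type. It assumes an Ada 2005 (or later) compiler.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Visitor_Sketch is

   package Tree is
      type Visitor is interface;

      type Node is abstract tagged null record;
      type Node_Ptr is access all Node'Class;
      procedure Acceptor
        (This : access Node; V : access Visitor'Class) is abstract;

      type Leaf is new Node with record
         Text : Character;
      end record;
      overriding procedure Acceptor
        (This : access Leaf; V : access Visitor'Class);

      type Pair is new Node with record
         Left, Right : Node_Ptr;
      end record;
      overriding procedure Acceptor
        (This : access Pair; V : access Visitor'Class);

      --  Visitor methods: "before"/"after" hooks default to null.
      procedure Visit_Leaf
        (V : access Visitor; N : access Leaf'Class) is null;
      procedure Before_Pair
        (V : access Visitor; N : access Pair'Class) is null;
      procedure Visit_Pair
        (V : access Visitor; N : access Pair'Class) is abstract;
      procedure After_Pair
        (V : access Visitor; N : access Pair'Class) is null;

      --  Depth first traversal that calls the hooks around the children.
      type DFS is new Visitor with null record;
      overriding procedure Visit_Pair
        (V : access DFS; N : access Pair'Class);
   end Tree;

   package body Tree is
      procedure Acceptor (This : access Leaf; V : access Visitor'Class) is
      begin
         V.Visit_Leaf (This);
      end Acceptor;

      procedure Acceptor (This : access Pair; V : access Visitor'Class) is
      begin
         V.Visit_Pair (This);
      end Acceptor;

      procedure Visit_Pair (V : access DFS; N : access Pair'Class) is
         --  Classwide view, so the hook calls below dispatch to overrides.
         V_Classwide : access DFS'Class := V;
      begin
         V_Classwide.Before_Pair (N);
         N.Left.Acceptor (V);
         N.Right.Acceptor (V);
         V_Classwide.After_Pair (N);
      end Visit_Pair;
   end Tree;

   package Client is
      --  User-defined visitor: prints leaves, then '+' after each pair.
      type Printer is new Tree.DFS with null record;
      overriding procedure Visit_Leaf
        (V : access Printer; N : access Tree.Leaf'Class);
      overriding procedure After_Pair
        (V : access Printer; N : access Tree.Pair'Class);
   end Client;

   package body Client is
      procedure Visit_Leaf
        (V : access Printer; N : access Tree.Leaf'Class) is
      begin
         Put (N.Text);
      end Visit_Leaf;

      procedure After_Pair
        (V : access Printer; N : access Tree.Pair'Class) is
      begin
         Put ('+');
      end After_Pair;
   end Client;

   Root : constant Tree.Node_Ptr :=
     new Tree.Pair'
       (Left  => new Tree.Leaf'(Text => 'a'),
        Right => new Tree.Pair'
          (Left  => new Tree.Leaf'(Text => 'b'),
           Right => new Tree.Leaf'(Text => 'c')));
   P : aliased Client.Printer;
begin
   Root.Acceptor (P'Access);  --  prints abc++
   New_Line;
end Visitor_Sketch;
```

Because Printer only overrides Visit_Leaf and After_Pair, the traversal order itself stays in DFS, which is the division of labor the generated prefix_DFS package is meant to provide.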


The DFS visit method calls the “before” method (for pre-order traversals), then calls Acceptor for each of the children to dispatch to the appropriate visitor, and finally calls the “after” method (for post-order traversals). Tokens don’t require dispatching, so the visit method can be called directly. It is interesting to note that Ada, unlike Java, does not perform dynamic dispatching by default. Therefore, it is necessary to convert the parameter I to a classwide type, so that the calls made through it dispatch to any overriding methods in a child class.

To create your own visitor, you simply create a child class of DFS. In this child class, you override the “before” and “after” methods for those parse tree classes that you wish to process:

    limited with A_Model;
    with Prefix_DFS;
    package DFS_Example is
       type DFS is new Prefix_DFS.DFS with null record;

       overriding procedure After_A_nonterminal1
         (I : access DFS;
          N : access A_Model.A_nonterminal1'Class);
       overriding procedure Before_A_nonterminal2
         (I : access DFS;
          N : access A_Model.A_nonterminal2'Class);
       …
    end DFS_Example;

Creating a main program for a tool generated with AdaGOOP is simple. You need only call the “run” procedure in the prefix_parser package with a filename, then use get_parse_tree to get the root of the resulting parse tree. The depth first traversal is started by calling the method Acceptor on the root of the tree.

    with Ada.Command_Line; use Ada.Command_Line;
    with ada.text_io; use ada.text_io;
    with prefix_parser;
    with prefix_model;
    with DFS_Print;
    with a_model;

    procedure tool is
       type DFS_Access is access all DFS_Print.DFS'Class;
       Parse_Tree : prefix_model.parseable_ptr;
       Visitor    : DFS_Access := new DFS_Print.DFS;
    begin
       if argument_count /= 1 then
          put_line("usage: tool filename");
          return;
       end if;

       prefix_parser.run(argument(1));
       parse_tree := prefix_parser.get_parse_tree;
       parse_tree.Acceptor(Visitor);
    end tool;

4. CONCLUSIONS AND FUTURE WORK
AdaGOOP allows a programmer to quickly create a parser that generates an object-oriented parse tree. As a result, it is quite useful for generating simple language translators. Fagin [10] used the previous version of AdaGOOP to generate a translator from a subset of Ada 95 to NQC [11]. This allows developers to use Ada to program the Lego Mindstorms robots. Pedersen and Constantinides [12] created an Aspect Ada language using AdaGOOP. Future projects using AdaGOOP should be even easier with the new enhancements.

One possibility for future work is to determine how to include comments in the parse tree (since they do not appear in the grammar, they are not currently included). This would allow AdaGOOP to be used to create a code reformatter. We would welcome additional contributions to this project (either grammars or improvements to the tool).

5. REFERENCES
[1] Arcadia Project. “Aflex and Ayacc.” http://www.ics.uci.edu/~arcadia/Aflex-Ayacc/aflex-ayacc.html
[2] Conn, Richard. 1997. “The Source Code Analysis Tool Construction Project.” Proceedings of Tri-Ada ’97, 141-148. See also http://unicoi.kennesaw.edu/ase/support/cardcatx/scatcdsk.htm
[3] Free Software Foundation. 1999. “Bison-GNU Project-Free Software Foundation.” http://www.gnu.org/software/bison/bison.html
[4] Johnson, S.C. 1975. “Yacc—yet another compiler compiler.” C.S. Technical Report #32. Murray Hill, NJ: Bell Telephone Laboratories.
[5] Lesk, M.E., and Schmidt, E. 1975. “Lex—a lexical analyzer generator.” In Unix Programmer’s Manual 2. Murray Hill, NJ: AT&T Bell Laboratories.
[6] Mauney, Jon, and Fischer, Charles N. 1981. “An improvement to immediate error detection in Strong LL(1) parsers.” Information Processing Letters 12(5):211-12.
[7] Paxson, Vern. 1990. “Flex users manual.” Ithaca, NY: Cornell University.
[8] SableCC Home Page. http://www.sablecc.org
[9] Martin, Robert C. 2002. “The Visitor Family of Design Patterns.” In The Principles, Patterns and Practices of Agile Software Development. Prentice Hall. ISBN: 0135974445. http://objectmentor.com/resources/articles/visitor
[10] Fagin, B. 2000. “Using Ada-based robotics to teach computer science.” In Proceedings of the 5th Annual SIGCSE/SIGCUE Conference on Innovation and Technology in Computer Science Education (Helsinki, Finland, July 11 - 13, 2000), 148-151.
[11] NQC Home Page. http://bricxcc.sourceforge.net/nqc.
[12] Pedersen, K. H. and Constantinides, C. 2005. “AspectAda: aspect oriented programming for Ada 95.” Proceedings of SIGAda 2005 (Atlanta, GA, USA, November 13 - 17, 2005), 79-92.

A. SAMPLE ADAGOOP GRAMMAR

    token_macros
    DIGIT           [0-9]
    INTEGER         ({DIGIT}*)
    EXPONENT        ([eE](\+?|-){INTEGER})
    DECIMAL_LITERAL {INTEGER}\.{DIGIT}{INTEGER}{EXPONENT}?

    tokens
    -- Reserved Words
    OPEN      [oO][pP][eE][nN]
    PROCEDURE [pP][rR][oO][cC][eE][dD][uU][rR][eE]
    WRITE     [wW][rR][iI][tT][eE]
    --Other
    NUMBER    ({INTEGER}|{DECIMAL_LITERAL})

    grammar
    -- testing
    %start A
    A : A PROCEDURE B | B;
    B : B OPEN C | C;
    C : WRITE NUMBER;
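To make the language defined by the sample grammar concrete, the sketch below checks token sequences by hand. It is an illustration only, not AdaGOOP output and not an LALR(1) parser: it relies on the observation that the three rules derive one or more WRITE NUMBER pairs separated by OPEN or PROCEDURE. The names Sample_Language, Token, Token_Array and Accepts are hypothetical, and the token literals carry a _Tok suffix because PROCEDURE is a reserved word in Ada.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Sample_Language is

   type Token is (Open_Tok, Procedure_Tok, Write_Tok, Number_Tok);
   type Token_Array is array (Positive range <>) of Token;

   --  True if Input is WRITE NUMBER, optionally followed by any number
   --  of (OPEN | PROCEDURE) WRITE NUMBER groups.
   function Accepts (Input : Token_Array) return Boolean is
      I : Positive := Input'First;
   begin
      loop
         --  C : WRITE NUMBER
         if I + 1 > Input'Last
           or else Input (I) /= Write_Tok
           or else Input (I + 1) /= Number_Tok
         then
            return False;
         end if;
         I := I + 2;
         exit when I > Input'Last;

         --  Separator from A : A PROCEDURE B or B : B OPEN C
         if Input (I) not in Open_Tok .. Procedure_Tok then
            return False;
         end if;
         I := I + 1;
      end loop;
      return True;
   end Accepts;

begin
   --  A single statement: accepted.
   Put_Line (Boolean'Image (Accepts ((Write_Tok, Number_Tok))));

   --  Three statements joined by OPEN and PROCEDURE: accepted.
   Put_Line (Boolean'Image (Accepts
     ((Write_Tok, Number_Tok, Open_Tok,
       Write_Tok, Number_Tok, Procedure_Tok,
       Write_Tok, Number_Tok))));

   --  Trailing separator with nothing after it: rejected.
   Put_Line (Boolean'Image (Accepts ((Write_Tok, Number_Tok, Open_Tok))));
end Sample_Language;
```

The program prints TRUE, TRUE and FALSE on successive lines. A parser generated from the grammar would additionally build the left-nested A and B nodes, which this flat check does not attempt.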
