Component-Based Grammars

Xiaoqing Wu, Barrett R. Bryant, Jeffrey G. Gray¹
Department of Computer and Information Sciences, University of Alabama at Birmingham, U.S.A.

Marjan Mernik²
Faculty of Electrical Engineering and Computer Science, University of Maribor, Slovenia

Abstract

The Component-Based Context-Free Grammar (CBCFG) is presented as a new formalism that can modularize a programming language grammar as independent and pluggable components. The key benefit of using a CBCFG is reduced complexity in the design of a large language grammar and improved reuse when constructing programming language implementations. CBCFG enriches the properties of formal context-free grammars by adopting software engineering constructs and techniques such as object-orientation, aspect-orientation and macro definition from programming languages. Using the Java programming language as a case study, it is shown how each of these techniques may be used to improve programming language grammar design. The language implementation based on CBCFG and its tool support on the Eclipse platform are also described.

1. Introduction

According to software engineering principles concerned with cohesion and coupling, it is generally considered poor practice to have hundreds of lines of code tangled and lumped together within a single module [1]. Therefore, modern programming languages all provide constructs for supporting modularity (e.g., methods, classes, packages, and even aspects). Based on these constructs, developers are able to reduce the complexity of building a large system by composing several smaller components in a bottom-up fashion. However, within the domain of grammar design, the idea of writing grammars in a modular fashion has not been given much attention. Although grammar specifications can be quite complicated and large, most parser/compiler generation tools are based on pure Context-Free Grammar (CFG) or its variations, which do not provide a facility for modularization beyond the production rule. For large language implementations (e.g., the Java 1.4 grammar in CUP [2] is 879 lines of code with 670 lines of pure CFG productions), the developer has to design the whole grammar by manipulating hundreds of grammar symbols and productions at the same time. In terms of information hiding and encapsulation [1], this kind of language design provides poor comprehensibility, changeability and independent development. Table 1 highlights some of the problems endemic to current grammar design practice.

Table 1. The problems endemic to current grammar design (Parnasian objectives and the reasons they are violated).

Comprehensibility:
- A single namespace generates long variable names.
- 670 lines of CFG productions result in poor readability.

Changeability:
- A change to any name may propagate to hundreds of other lines.
- It is hard to locate all the related changes.

Independent development:
- The grammar has to be designed as a whole.
- The intertwined relations between productions make it impossible for multiple people to cooperate on designing the same grammar.

¹ Email: {wuxi, bryant, gray}@cis.uab.edu
² Email: [email protected]

To address the aforementioned problems, this paper describes an approach that applies software engineering constructs and techniques (e.g., object-orientation, aspect-orientation and macro definition) from programming languages to develop a new formalism called Component-Based Context-Free Grammar (CBCFG). A key contribution of our approach is the ability to modularize a grammar as independent and pluggable components to decrease the complexity in the specification of large grammars. The characteristics of a CBCFG component as an independent language and as a pluggable component are presented in Table 2.

Table 2. Characteristics of CBCFG.

Language-independent properties:
- It has its own namespace, nonterminal symbols and a group of related productions.
- It must be large enough in scale to be developed and tested individually.
- For a General Grammar Component (GGC), a start symbol acts as the entry point of a language.

Pluggable component properties:
- A grammar component may inherit from other grammar components or aggregate other components by importing them.
- Grammar symbols imported from other components are visible to the current component through high-order grammar symbols.
- All language components share the terminals library, global nonterminals and productions via a unified component.

An important concept of CBCFG is macro definition, which can be used to adapt the imported symbols or global symbols as local symbols. Grammar macros can also be used as placeholder symbols to make the language complete when some imported outer language components are not implemented yet. In addition to general language components, CBCFG also provides abstract components and aspect components, which are described in Section 2.1.

The paper is organized as follows. Section 2 defines the grammar component types and internal structure. It also describes the key contribution and salient features of the CBCFG approach. The advantage of using CBCFG is demonstrated in Section 3, which provides an example definition of the syntax of Java using CBCFG. The language implementation based on grammar components is presented in Section 4. Related work is introduced in Section 5. A summary and conclusion are provided in Section 6.

2. Component-Based Context-Free Grammar

Within a grammar specification, it is common to find a group of productions that are tightly coupled to each other (e.g., expression-related productions). Inside such a group, most variables have nothing to do with the outside productions, whereas only one variable (e.g., expression) is constantly referred to by other groups. That variable can be regarded as the point of entry to the group, in other words, the start symbol of this sub-language. CBCFG takes advantage of this structure by providing a convenient notation for grouping related productions inside a component and extracting the root symbol as the hook to interface with other components.

A typical CBCFG specification is composed of one or more grammar components and a corresponding environment. The environment defines terminal symbol declarations, global nonterminal symbols, and production declarations. The component types and the internal structure of a component are described in the following subsections.

2.1 Component types

There are three kinds of components in CBCFG: General Grammar Component (GGC), Abstract Grammar Component (AGC) and ASpect Grammar Component (ASGC).

- General grammar component: The specification of each general grammar component describes one complete language (i.e., it must have one start symbol which can derive a language based on the productions). A GGC can inherit from multiple GGCs or AGCs, which means all the variables and productions specified in the super component are available for the sub component to reuse.

- Abstract grammar component: An abstract grammar component is used to capture the common variables and productions of several language components. It does not represent a concrete language and has no start symbol. The variable declarations and productions of the environment in a CBCFG application can be treated as a global AGC that is inherited by all grammar components of the destination language by default.

- Aspect grammar component: This mechanism provides a second dimension of separation of concerns, in addition to the object-oriented fashion of grammar modularization. Aspect grammar components are typically added after the specification of other grammar components, and GGC and AGC components are oblivious to ASGC components. An ASGC adds new behavior to existing productions, which is analogous to the corresponding property of aspect-oriented programming [3]. Aspect grammar components have no start symbol. They are intended for language evolution, where new language constructs (such as arrays and pointers) are added whose related productions crosscut several other components. They are also useful for incremental language design: the new features to be added to a language can be specified in ASGCs.

2.2 Component structure

    [ abstract ] language | aspect Identifier-1 [ extends Identifier-2, …, Identifier-n ] {
        import declarations;
        macro definitions;
        variable declarations;
        CFG productions;
    }

Listing 1. The abstract syntax definition of a CBCFG component. Bold words stand for keywords of the formalism.

Table 3. The different roles of the four elements inside a grammar component.

- import declarations: Declare outer language components that could be used in this component. They are an important artifact for indicating the composition relationships between language modules and for directing variable and production reuse among different components.
- macro definitions: Adapt the imported symbols as local symbols, or serve as placeholder symbols that make the language complete when some imported outer language components are not yet implemented.
- variable declarations: A list of local nonterminals that can be used in the language component.
- CFG productions: Specify the syntax rules of the component by CFG productions with high-ordered grammar symbols.

As illustrated in Listing 1, the syntax definition of one grammar component follows an object-oriented fashion. Each component is identified by a language name followed by its super language components (if available). Inside each language component, there are four parts of a specification: the import declarations, the macro definitions, the variable declarations and the CFG productions (Table 3 outlines the role of each part). A sample component for a simple expression language that handles integer addition and subtraction is provided in Listing 2. Based on this example, the following subsections detail the salient features of CBCFG and its internal structures.

    language Expression {
        // no import declarations in this example
        // macro definition
        define term = INTEGER;
        // variable declaration
        variable expression, binary_expression, sum, difference, term;
        // CFG productions
        main ::= expression;
        expression ::= term | binary_expression;
        binary_expression ::= sum | difference;
        sum ::= expression PLUS expression;
        difference ::= expression MINUS expression;
    }

Listing 2. The CBCFG specification for an expression component. Bold words stand for keywords, where “main” indicates the start symbol of the language. Tokens INTEGER, PLUS, MINUS are terminal symbols defined in the global environment.

2.2.1 High-order grammar symbol in CBCFG

The difference between a grammar symbol in CBCFG and the symbols in a general CFG is that the grammar symbol can be high-ordered; in other words, a grammar symbol itself can have its own grammar symbols as well. In CBCFG, once a component A is imported into component B, A can be used as a nonterminal symbol inside the component B. The grammar symbols inside A are also visible to B and are referenced in the form A.*, where * represents an internal grammar symbol of A. A reference to A in B is equivalent to a reference to the start symbol of A. For example, if the Expression component is imported by a component named Statement, then both Expression and Expression.sum will be valid symbols in the Statement component. The reference to Expression has the same effect as the reference to Expression.expression. In summary, the valid symbols inside a component are:

- the terminal symbols and global nonterminals defined in the environment;
- the nonterminals (variables) defined inside the component;
- the nullable macro symbols;
- the imported component names and their internal nonterminals.
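The visibility rules listed above can be sketched as a small lookup routine. The sketch is written in Python purely for illustration (the toolchain described in this paper is Java-based); the Component class, the resolve function and the contents of the Statement component are hypothetical, and macro symbols are left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """A grammar component with local nonterminals and a start symbol."""
    name: str
    start: str                                    # nonterminal that `main` derives
    variables: set = field(default_factory=set)   # local nonterminals
    imports: dict = field(default_factory=dict)   # component name -> Component


def resolve(component, symbol, terminals):
    """Return a (kind, qualified_name) pair for a symbol used inside `component`."""
    if symbol in terminals:
        return ("terminal", symbol)
    if symbol in component.variables:
        return ("local", f"{component.name}.{symbol}")
    if symbol in component.imports:
        # A bare component name stands for that component's start symbol.
        target = component.imports[symbol]
        return ("imported", f"{target.name}.{target.start}")
    if "." in symbol:
        # High-order reference of the form A.x into an imported component A.
        owner, inner = symbol.split(".", 1)
        target = component.imports.get(owner)
        if target is not None and inner in target.variables:
            return ("imported", f"{target.name}.{inner}")
    raise KeyError(f"{symbol!r} is not visible in component {component.name}")


# Terminals assumed to be declared in the environment.
TERMINALS = {"INTEGER", "PLUS", "MINUS", "SEMICOLON"}

# The Expression component of Listing 2.
expression = Component(
    name="Expression",
    start="expression",
    variables={"expression", "binary_expression", "sum", "difference", "term"},
)

# A hypothetical Statement component that imports Expression.
statement = Component(
    name="Statement",
    start="statement",
    variables={"statement"},
    imports={"Expression": expression},
)
```

Under these assumptions, resolve(statement, "Expression", TERMINALS) and resolve(statement, "Expression.expression", TERMINALS) name the same symbol, while a bare reference to sum from Statement is rejected because it is local to Expression.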

The high-order grammar symbol mechanism makes the symbol name reflect modular information, which is more valuable than a pure string name.

2.2.2 Name space

Traditional grammar design is based on pure context-free grammars, in which all the productions share the same namespace. For a language specification that has a large number of symbols, it is very hard to invent different names to distinguish the symbols that have similar roles. Moreover, since each symbol’s role is described entirely by its name, explicit long names have to be created to capture the necessary information for improved comprehensibility, such as local_variable_declaration_statement defined in the Java specification [4]. In CBCFG, however, each component has its own namespace, which does not interfere with other components. This enables the grammar designer to reuse symbol names that have been defined in other components to represent different entities. Therefore, the same variable name can be used in several components to represent package_name, type_name and method_name. The identification and role of each grammar symbol is defined by the pair of component name and variable name, so the variable names inside a component can be precise and unique.

2.2.3 CFG productions

The CFG productions inside a grammar component are similar to general CFG productions, except that a grammar symbol in CBCFG can be high-ordered. In CBCFG, union is denoted by the “|” symbol and concatenation by blank spaces. Some regular expression symbols, such as the Kleene star “*”, parentheses “()”, and the “?” and “+” operators, are also available. For general components and abstract components, the left-hand side (LHS) of a production must be a local nonterminal (the productions of a component can only define its internal symbols), and the right-hand side (RHS) can be high-ordered, as a component id attached with a variable id. For aspect components, both LHS and RHS symbols of a production can be high-ordered, because the aspect component can extend the productions that have already been defined by other components.

2.2.4 Macro definition

It is common for one variable to have different roles in different components. For example, an identifier (or name) can act as a type id, a class id or a variable id according to the context. In traditional grammar design methods, the designer has to either use the same variable (e.g., identifier) at different places to play the different roles, or use a unit production to rename the variable.
The first approach harms the exposition of the grammar because the variable names do not convey their roles, and once the referred variable’s name is changed, all the references must be changed. In the case of the CUP specification for Java [2], the developer has to manually search 800 lines of code and change the names one by one. The second approach increases the complexity of the grammar by adding unnecessary productions, which can confuse the grammar user when the unit production is placed at an inappropriate place. Another important problem of this approach is that the new unit productions may well generate reduce-reduce conflicts, which is explored in Section 3.5.

In our component-based approach, the user is able to use a macro to redefine the imported grammar symbols in the form of A = B or A = B.C. Once the macro is defined, it cannot be redefined by CFG productions. The macro name can be used as an RHS variable inside the component to help the user understand the grammar. In actual parser generation, the macro (A) is expanded to the original imported grammar symbol (B or B.C), which avoids new reduce-reduce conflicts. This approach therefore eliminates the drawbacks of both methods introduced above. Any change to a grammar symbol will only affect the productions inside the component and the macro definitions in related components. Moreover, the macro definition (combined with the import declaration) is also useful for directing component composition. For example, if the macro definition in Listing 2 is changed to:

    import Matrix;
    define term = Matrix;

where Matrix is another component for matrix definition, the integer expression language is changed to a matrix expression language by composition of the two components.

A macro name can also be defined as a placeholder symbol in the form of A = null, which makes the symbol referable although it actually contains nothing. The nullable macro definition provides the ability to run or test the grammar component as an independent language when some imported outer language components are not yet implemented.

Another advantage of macro variables is that a macro definition need not be a single grammar symbol; it can also be a collection of grammar symbols connected with logical combination operators: logical disjunction (+), logical difference (-), and logical conjunction (blank space). Logical disjunction has the same functionality as union in CFG productions, and logical conjunction has the same functionality as concatenation in CFG productions. Logical difference A = B - C means that A can derive all of B’s derivations except C. The precondition of this operator is that there must be a production in the form of B ::= C | D. This operator enables partial reuse of grammar symbols in the current or another component.
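The renaming and placeholder uses of macros can be sketched as a substitution pass over the productions before parser generation. The sketch below is in Python for illustration only; expand_macros and its data layout are hypothetical, and treating a null macro as contributing no symbols is an assumption about how the placeholder behaves.

```python
def expand_macros(productions, macros):
    """Substitute macro names on the right-hand sides of productions.

    productions: dict mapping a nonterminal to a list of alternatives,
                 each alternative being a list of symbols.
    macros:      dict mapping a macro name to its target symbol, or to
                 None for a `define A = null` placeholder.
    """
    expanded = {}
    for lhs, alternatives in productions.items():
        if lhs in macros:
            raise ValueError(f"macro {lhs!r} cannot be redefined by a production")
        new_alternatives = []
        for alternative in alternatives:
            symbols = []
            for symbol in alternative:
                if symbol not in macros:
                    symbols.append(symbol)
                elif macros[symbol] is not None:
                    symbols.append(macros[symbol])
                # A null macro contributes nothing (assumed to act as empty).
            new_alternatives.append(symbols)
        expanded[lhs] = new_alternatives
    return expanded


# Productions of the Expression component from Listing 2.
EXPRESSION = {
    "main": [["expression"]],
    "expression": [["term"], ["binary_expression"]],
    "binary_expression": [["sum"], ["difference"]],
    "sum": [["expression", "PLUS", "expression"]],
    "difference": [["expression", "MINUS", "expression"]],
}

# define term = INTEGER;  versus  import Matrix; define term = Matrix;
integer_language = expand_macros(EXPRESSION, {"term": "INTEGER"})
matrix_language = expand_macros(EXPRESSION, {"term": "Matrix"})
```

Only the macro table differs between the two calls, which mirrors how swapping a single definition turns the integer expression language into a matrix expression language.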

3. Case Study on Java Grammar Specification

Using the CBCFG approach, language grammars can be designed in a very modular way. To illustrate the benefits of using this approach, we fully modularize the Java language grammar in a component-based manner (as illustrated in Figure 1) and compare it to the classical Java syntax grammar introduced in [4].

3.1 General grammar components for modular and readable specification

The design is based on the Java specifications introduced in [4]. First, a Java component is created to capture an instance of the Java language (i.e., a compilation unit). A compilation unit is composed of a sequence of package_declarations, import_declarations and type_declarations. The specifications of package_declarations and import_declarations are judged short enough to be placed directly in the Java component. However, type_declaration, which can be either class_declaration or interface_declaration, tends to be lengthy and independent, so class_declaration and interface_declaration are decomposed into two individual languages for modular development. Applying the same strategy, productions related to statements and expressions are extracted as general grammar components, of which the expression component is further decomposed into the binary_expression, unary_expression, primary and new expression components. Type-related productions are also encapsulated as a component.

[Figure 1: diagram of the grammar components Java, Class, Interface, Object_type, Type, Statement, Expression, Binary_expr, Unary_expr, Primary and Array.]

Figure 1. CBCFG for Java specification. GGC Java acts as the start component. AGC Object_type encapsulates the common behavior of Interface and Class definitions. ASGC Array describes productions related to array structure, which crosscuts various components.

3.2 Abstract grammar component for common property reuse

Since Java classes and interfaces share quite a few constructs (such as field declaration and method signature), an abstract language component named Object_type is created to capture the common variables and productions so that the interface and class components can focus only on the parts that are different from each other.

3.3 Aspect grammar component for crosscutting productions

Certain language constructs (e.g., arrays) have a global effect throughout the language grammar. Their productions are functionally related to each other but physically distributed all over the grammar (as illustrated by the dotted lines in Figure 1). This kind of construct is difficult to extract as a component in the general approach, and an aspect component is designed to capture such constructs. In an aspect component, a grammar rule can use symbols of any regular component (i.e., not only the RHS symbols but also the LHS symbol can be high-ordered). The general form is A.a ::= B, where symbol a must already be defined in component A. This production extends the derivation of the symbol a inside component A to the union of the original RHS and B. In parser generation, an aspect weaving process is applied to weave these productions into the productions defined in A. Besides arrays, there are other language constructs that can be specified as aspect grammar components, such as the pointer structure in C++.

The ASGC is also useful in grammar version control. For example, Figure 2 illustrates how the productions in the aspect Java_1_4 are woven in to add a new assert statement to the original Statement component for a version update.

Figure 2. The productions in the aspect component Java_1_4 are woven into the general component Statement to upgrade the Java 1.3 grammar to the Java 1.4 grammar. ASSERT and SEMICOLON are terminals specified in the environment. empty_statement, try_statement and assert_statement are shortened to empty, try and assert, respectively.
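The union semantics of an aspect production A.a ::= B can be sketched as follows. This is an illustrative Python sketch, not the weaving process of the CBCFG compiler: the weave function, the data layout and the Statement fragment are hypothetical (only the shortened names empty, try and assert come from the Figure 2 caption), and the sketch does not cover how an aspect introduces new nonterminals.

```python
def weave(components, aspect_productions):
    """Weave aspect productions of the form A.a ::= B into their targets.

    components:         dict component name -> {nonterminal: [alternatives]}
    aspect_productions: list of (component, nonterminal, alternative) triples
    Returns a new dict; the input components are left untouched.
    """
    woven = {
        name: {lhs: [list(alt) for alt in alts] for lhs, alts in prods.items()}
        for name, prods in components.items()
    }
    for component, nonterminal, alternative in aspect_productions:
        # The extended symbol must already be defined in the target component.
        if component not in woven or nonterminal not in woven[component]:
            raise KeyError(f"{component}.{nonterminal} is not defined")
        alternatives = woven[component][nonterminal]
        if alternative not in alternatives:   # union, so no duplicates
            alternatives.append(list(alternative))
    return woven


# Hypothetical fragment of the Statement component before the upgrade.
java_1_3 = {"Statement": {"statement": [["empty"], ["try"]]}}

# Hypothetical content of the aspect Java_1_4: Statement.statement ::= assert
java_1_4_aspect = [("Statement", "statement", ["assert"])]

java_1_4 = weave(java_1_3, java_1_4_aspect)
```

Because weave copies the productions first, the original Statement component is never edited, which reflects the obliviousness of general components to aspects.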

3.4 Namespace for producing precise yet meaningful variable names

As described in Section 2, the stand-alone namespace of each grammar component enables the same variable name to be used in different components for different roles. For example, the same variable name is used as package_name in the component Java, type_name in the component Type and expression_name in the component Expression. Since the identification and role of each grammar symbol is defined by the pair of component name and variable name, the variable names inside a component can be quite precise (e.g., in the Statement component, the variables block_statement, empty_statement and local_variable_declaration_statement can be shortened to block, empty and local_variable_declaration).

3.5 Macro definition for renaming symbols and eliminating parsing conflicts

For better comprehensibility, macro definitions are used in multiple components in the Java grammar to rename an outer grammar symbol so that it acts in a different role in the component, as described in Section 2.2.4. Moreover, macro definition is used as a mechanism for eliminating the reduce-reduce conflicts in the original Java specification [4]. Five conflict problems are described in [4], and the solutions provided by the authors are all based on the same strategy, which is to unify the different variable names and productions and to require a later stage of compiler analysis for type-checking or sorting out the precise role of each symbol. In CBCFG, macro definitions with logical operators can resolve these problems without generating any side effects, as the following example shows.

The modifiers of different language constructs (e.g., class, interface, field, method) generate reduce-reduce conflicts. The solution in [4] is to eliminate all six of the nonterminals class_modifier, field_modifier, method_modifier, constructor_modifier, interface_modifier, and constant_modifier from the grammar, replacing all of them by a single nonterminal modifier defined as the union of modifier keywords:

    modifier ::= PUBLIC | PROTECTED | PRIVATE | STATIC | ABSTRACT | FINAL | NATIVE | SYNCHRONIZED | TRANSIENT | VOLATILE

However, since the modifiers of each language construct are only a subset of all the modifiers listed above, a later stage of compiler analysis must sort out the precise role of each modifier to determine whether it is permitted in a given context. The CBCFG solution is to make modifier a global nonterminal, and then to use macro definitions with the logical difference operator to derive symbols such as class_modifier and field_modifier from it. For example, the macro definition of method_modifier will be:

    define method_modifier = modifier - TRANSIENT - VOLATILE;
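The effect of the logical difference operator on the modifier example can be sketched as a filter over the alternatives of the base symbol. The Python function difference below is a hypothetical illustration, not part of the CBCFG tooling; its check mirrors the precondition that each removed symbol must be an alternative of the base production.

```python
# Alternatives of the global nonterminal `modifier`.
MODIFIER = [
    "PUBLIC", "PROTECTED", "PRIVATE", "STATIC", "ABSTRACT", "FINAL",
    "NATIVE", "SYNCHRONIZED", "TRANSIENT", "VOLATILE",
]


def difference(alternatives, *removed):
    """Evaluate `B - C1 - C2 ...` for a symbol B whose alternatives are given.

    Each removed symbol must be an alternative of B, mirroring the
    precondition that a production B ::= C | D exists.
    """
    for symbol in removed:
        if symbol not in alternatives:
            raise ValueError(f"{symbol} is not an alternative of the base symbol")
    return [symbol for symbol in alternatives if symbol not in removed]


# define method_modifier = modifier - TRANSIENT - VOLATILE;
method_modifier = difference(MODIFIER, "TRANSIENT", "VOLATILE")
```

The resulting method_modifier keeps the eight remaining keywords, so the restriction is expressed in the grammar itself rather than deferred to a later analysis phase.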

3.6 Component self-testing and independent development

In CBCFG, since each grammar component has its own namespace and start symbol, it can be treated as an independent language that can be developed and tested. The import declarations and macro definitions act as hooks between different grammar components (as in Figure 3(a)). If imported components are not yet developed, the null macro variable introduced in Section 2.2.4 can be used to complete the language (as in Figure 3(b)). Once related components are defined, the component composition can be realized by replacing the null macro names with real macro definitions (as in Figure 3(c)).

Figure 3. The Expression component in the Java Grammar. (a): the imported components are implemented except the Type component. (b): using null macro symbol to provide a false hook and make the language complete. (c): after the type component is implemented, compose two components together.

4. Language Implementation Based on Grammar Components

Figure 4. Language implementation based on CBCFG.

Language implementation using CBCFG is illustrated in Figure 4. We use the Java programming language for AST building and AspectJ [5], a seamless aspect-oriented extension to Java, for semantics implementation. First, grammar components are compiled to generate the parser and Abstract Syntax Tree (AST) nodes in Java. Then, concrete semantics given as AspectJ specifications are woven into the AST nodes without any change to the generated AST classes. The workload of developing a large language is thereby reduced to developing a number of smaller languages. Tool support for CBCFG has also been developed on the Eclipse platform [6].

4.1 Automatic parser generation and tree building

Figure 5. The automatic parser generation and tree building process in CBCFG.

As shown in Figure 5, the grammar components are compiled by the CBCFG compiler and a parser that is built on top of CUP is generated. The lexer is generated based on the JLex specification described in the global component (the environment), which is activated by the parser. Meanwhile, the AST is constructed. Each grammar component is generated as a Java package while the grammar symbol inside a component is generated as a Java class. The whole generation process is detailed in [7].

4.2 Aspect-oriented programming for semantics implementation

In order to freely attach concrete semantics to generated AST nodes, the visitor pattern is utilized in our language implementation approach [8]. In the visitor pattern, all the methods pertaining to one operation of the nodes are encapsulated into a single visitor class, which is independent of other node classes and can be freely added or deleted from the system. The conventional visitor pattern is implemented in an object-oriented programming language, where the accept() methods use the explicit delegation mechanism (redirection of operation calls) to activate the behaviors of the elements which have been defined in the visitor classes. This tends to make the programs difficult to understand, increases communication between objects and introduces dependencies that can prevent some evolution of the software [9].

Aspect-orientation when applied to the visitor pattern can isolate crosscutting behavior in a more explicit way [10]. By applying AOP concepts, the visitor operations can be written so that the AST nodes are oblivious to semantics operations (the accept() methods defined in each separated node class are no longer needed). Listing 3 is a sample aspect that is used for value evaluation for the expression language described in Section 2.2. More detail of the aspect-oriented semantics implementation is described in [11].

    aspect ValueEval {
        public abstract Double ASTNode.valueEval();
        public Double Integer.valueEval(){
            return Double.valueOf(lexeme);
        };
        public Double Difference.valueEval(){
            Double value1 = expression1.valueEval();
            Double value2 = expression2.valueEval();
            return new Double (value1.doubleValue()-value2.doubleValue());
        };
        public Double Sum.valueEval(){
            Double value1 = expression1.valueEval();
            Double value2 = expression2.valueEval();
            return new Double (value1.doubleValue()+value2.doubleValue());
        };
    }

Listing 3. AspectJ specification for value evaluation. The generated AST node has no awareness of the existence of these semantics operations.

4.3 Eclipse tool support

To carry out the whole language implementation process in a user-friendly manner, Eclipse [6] is chosen to build a CBCFG IDE. Eclipse is a Java IDE offering a platform for building and integrating application development tools delivered via plug-ins. This IDE provides a user-friendly environment for editing and navigation of grammar component specifications, as well as parsing and type-checking at the source code level. The AspectJ plug-in is reused in this IDE to enable interpreter generation for programming languages.

The language development unit in this IDE is a CBCFG project, which contains one or more grammar components and a global environment. The specification files are classified into three categories: a language configuration file (.pj) for the global environment, a number of grammar component files (.lang) and a number of aspect component files (.al). One grammar component must be specified in one file, but one .lang (or .al) file can be used to store several grammar components (aspects).

The CBCFG project management module handles the loading of CBCFG source files from the file system as well as editing and saving of files in the editor view. The CBCFG compiler will parse and type-check the CBCFG specifications whenever a file is loaded, changed, or saved in the editing environment. The symbol table is generated after the parsing phase and contains all the parsed information such as language names, their parent components, import declarations, macro definitions, variable declarations and CFG productions. Other modules such as the task view and the outline view, as well as the interpreter generator, refer to this symbol table to retrieve the related information.

5. Related Work The different techniques inside CBCFG approach are related to many different works. The grammatical operators proposed by Dave Wile [12] are used to transform one grammar into another. The functionality of the operators such as rename, abstract, remove, extend and integrate in [12] are all implemented in CBCFG by inheritance, import declaration, macro definition as well as abstract component and aspect component. Composable Attribute Grammars (CAG) [13]

proposed by Farrow consists of component attribute grammars and glue grammar. In component attribute grammar phrase structure and its semantics are expressed in terms of abstract, language independent context free grammar. The concrete syntactic structure is specified only in a glue grammar, whereas in our approach, the composition is naturally specified by import declarations and macro definition inside each grammar component. Simple tree attribution described in [14] has the property of descriptional composition, which allows a complex tree transformation to be built up from simpler ones. The composition idea is similar to CBCFG except that CBCFG works at the grammar level while simple tree attribution is about tree composition. Multiple inheritance on attribute grammar is utilized in the LISA system [15] for incremental language implementation, which is analogous to the component inheritance in our work. The semantic weaving technique used on top of AST node generated by CBCFG is related to JastAdd [16] which weaves aspect-oriented Java code to Reference Attributed Grammars (RAGs) for semantic analysis. There are also other works aimed at modularization of language implementation. The most widely used Java Compiler Compiler (JavaCC) [17] is a Java parser generator which can be combined with tree generator JJTree [18] to generate object-oriented interpreter/compilers. The syntax definition of JavaCC follows an object-oriented fashion: each production is encapsulated as an object-like structure. However, this kind of small-scale modularization and the fact that all the lexical rules and syntax rules have to be specified in one .jj file doesn't address the modularization problem more than pure CFG productions. The ASF+SDF Meta-Environment [19] is an environment for the development of language definitions and tools in a modular fashion. It combines the syntax definition formalism SDF with the term rewriting language ASF. 
SDF is supported by Generalized LR (GLR) parsing technology. ASF is a rather pure executable specification language that allows rewrite rules to be written in concrete syntax. ASF+SDF encapsulates the syntax and semantics of one or more productions inside a module to increase the reusability of language constructs. Modularization in ASF+SDF focuses mainly on semantic reuse. Each module in the ASF+SDF Meta-Environment is an object-like structure that is relatively small in scale and cannot be developed or run independently.

The CBCFG approach is distinguished from all of the above work in that the language is modularized at an early stage of language implementation, i.e., at the grammar level. The decomposition of a language is based on grammar components that take different forms for different modularization purposes. The appropriate granularity and tight cohesion of general grammar components enable each decomposed unit to be developed and tested independently. The aspect component presented in this work is the first use of aspect-orientation in grammar design.

6. Conclusion

To decrease the complexity of specifying large grammars and to improve reusability in programming language implementations, this paper presents CBCFG as a new formalism that modularizes a programming language grammar into independent and pluggable components. CBCFG enriches formal grammar specification by applying software engineering constructs and techniques (e.g., object-orientation, aspect-orientation and macro definition) from programming languages. In terms of software engineering principles, CBCFG improves comprehensibility, changeability, independent development and reusability.

Comprehensibility. By decomposing a large grammar into small grammar components, the number of intertwined symbols and productions is greatly reduced, and the name of each symbol tends to be short and precise, resulting in a specification that is easy to understand and maintain.

Changeability. Because the CBCFG approach provides information hiding, a design change to one component is isolated inside that component and does not propagate to others.

Independent development. The appropriate granularity and tight cohesion of general grammar components enable each decomposed unit to be developed and tested independently. This greatly decreases the complexity of language implementation. As each component does a smaller job, it is less likely to contain major errors.

Reusability. As each component is an independent entity, components can be freely composed and reused to build new languages. For example, the Binary_expression component in the Java specification, which has more than 50 lines of CFG productions, can be reused in its entirety to build the C or C++ language.
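The reusability point can be illustrated with a minimal Java sketch. It is not the CBCFG tool's implementation: the class GrammarComposition, its Component type, the compose method, and the toy MiniJava and MiniC productions are all hypothetical names introduced here for illustration. The sketch only assumes that a component carries its own productions and a list of imported components, and that composing a language means collecting the productions of a root component and everything it transitively imports.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative model only: these classes are hypothetical and are not the CBCFG tool's API.
class GrammarComposition {

    /** A grammar component: a name, the components it imports, and its own productions. */
    static final class Component {
        final String name;
        final List<String> imports = new ArrayList<>();
        final List<String> productions = new ArrayList<>();

        Component(String name) { this.name = name; }

        Component imports(String other) { imports.add(other); return this; }

        Component production(String rule) { productions.add(rule); return this; }
    }

    /** Flattens a root component and its transitive imports into one list of productions. */
    static List<String> compose(String root, Map<String, Component> registry) {
        List<String> out = new ArrayList<>();
        collect(root, registry, new LinkedHashSet<>(), out);
        return out;
    }

    private static void collect(String name, Map<String, Component> registry,
                                Set<String> visited, List<String> out) {
        if (!visited.add(name)) {
            return; // already included: each component contributes once, and import cycles stop here
        }
        Component c = registry.get(name);
        if (c == null) {
            throw new IllegalArgumentException("Unknown component: " + name);
        }
        for (String rule : c.productions) {
            out.add(name + "." + rule); // tag each production with its owning component
        }
        for (String imported : c.imports) {
            collect(imported, registry, visited, out);
        }
    }

    /** Two toy languages sharing one expression component (toy rules, not the real Java grammar). */
    static Map<String, Component> sampleRegistry() {
        Component expr = new Component("Binary_expression")
                .production("expr ::= expr '+' term")
                .production("expr ::= term")
                .production("term ::= NUMBER");
        Component miniJava = new Component("MiniJava")
                .imports("Binary_expression")
                .production("stmt ::= expr ';'");
        Component miniC = new Component("MiniC")
                .imports("Binary_expression")
                .production("stmt ::= 'return' expr ';'");
        Map<String, Component> registry = new LinkedHashMap<>();
        for (Component c : List.of(expr, miniJava, miniC)) {
            registry.put(c.name, c);
        }
        return registry;
    }

    public static void main(String[] args) {
        Map<String, Component> registry = sampleRegistry();
        compose("MiniJava", registry).forEach(System.out::println);
        compose("MiniC", registry).forEach(System.out::println);
    }
}
```

In this sketch both toy languages obtain the same Binary_expression productions through an import, so a change to the expression rules is made in one place. The flattening step is a simplification: it does not model inheritance, macro definitions or aspect components.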

References

[1] D. Parnas. On the Criteria To Be Used in Decomposing Systems into Modules. Communications of the ACM, December 1972, pp. 1053-1058.
[2] CUP: Parser Generator for Java. http://www.cs.princeton.edu/~appel/modern/java/CUP/
[3] G. Kiczales, J. Lamping, A. Mendhekar, C. Maeda, C. Lopes, J. Loingtier, and J. Irwin. Aspect-Oriented Programming. In Proc. 11th European Conf. Object-Oriented Programming (ECOOP), Springer-Verlag, LNCS 1241, 1997, pp. 220-242.
[4] Sun Microsystems. The Java Language Specification. http://java.sun.com/docs/books/jls
[5] G. Kiczales, E. Hilsdale, J. Hugunin, M. Kersten, J. Palm, and W. G. Griswold. An Overview of AspectJ. In Proc. 15th European Conf. on Object-Oriented Programming (ECOOP), Springer-Verlag, LNCS 2072, 2001, pp. 327-355.
[6] Object Technology International, Inc. Eclipse Platform Technical Overview, February 2003.
[7] X. Wu, B. R. Bryant, and M. Mernik. Object-Oriented Pattern-Based Language Implementation. Technical Report, University of Alabama at Birmingham, http://www.cis.uab.edu/wuxi/paper/acta.pdf
[8] E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, 1995.
[9] O. Hachani and D. Bardou. Using Aspect-Oriented Programming for Design Patterns Implementation. In Proc. Workshop Reuse in Object-Oriented Information Systems Design, 2002.
[10] J. Hannemann and G. Kiczales. Design Pattern Implementation in Java and AspectJ. In Proc. Object-Oriented Programming, Systems, and Applications (OOPSLA), 2002, pp. 161-173.
[11] X. Wu, S. Roychoudhury, B. Bryant, J. Gray, and M. Mernik. A Two-Dimensional Separation of Concerns for Compiler Construction. In ACM Symposium on Applied Computing (SAC), Programming for Separation of Concerns Track, to appear March 2005.
[12] D. Wile. Integrating Syntaxes and Their Associated Semantics. In Proc. 36th Hawaii Int. Conf. System Sciences (HICSS), 2003.
[13] R. Farrow, T. J. Marlowe, and D. M. Yellin. Composable Attribute Grammars: Support for Modularity in Translator Design and Implementation. In Proc. 19th ACM Symposium on Principles of Programming Languages (POPL), ACM Press, 1992, pp. 223-234.
[14] J. Boyland and S. L. Graham. Composing Tree Attributions. In Proc. 21st ACM Symposium on Principles of Programming Languages (POPL), ACM Press, 1994, pp. 375-388.
[15] M. Mernik, V. Žumer, M. Lenič, and E. Avdičaušević. Implementation of Multiple Attribute Grammar Inheritance in the Tool LISA. ACM SIGPLAN Notices, 34(6): 68-75, June 1999.
[16] G. Hedin and E. Magnusson. JastAdd - a Java-based system for implementing front ends. In M. van den Brand and D. Parigot, eds., Electronic Notes in Theoretical Computer Science, vol. 44, Elsevier Science Publishers, 2001.
[17] JavaCC: Java Compiler Compiler, Sun Microsystems, Inc. https://javacc.dev.java.net/
[18] Introduction to JJTree. http://www.j-paine.org/jjtree.html
[19] M. G. J. van den Brand, J. Heering, P. Klint, and P. A. Olivier. Compiling Language Definitions: The ASF+SDF Compiler. ACM Transactions on Programming Languages and Systems, 24(4): 334-368, 2002.