Object-Oriented Language Specifications: Current Status and Future Trends

Marjan Mernik¹,², Xiaoqing Wu¹, Barrett R. Bryant¹

¹ Department of Computer and Information Sciences, University of Alabama at Birmingham, USA, Email: {mernik, wuxi, bryant}@cis.uab.edu

² Faculty of Electrical Engineering and Computer Science, University of Maribor, Slovenia, Email: [email protected]

Abstract

It is well known that programming language definitions are hard to modularize efficiently. Moreover, new programming languages are hard to build simply by combining different language components, because of the complex interactions among language features. This is true for general-purpose programming languages as well as for domain-specific languages. Here, object-oriented techniques and concepts, such as encapsulation and inheritance, have much to offer: they improve language specification languages towards better modularity, reusability and extensibility. This position paper presents our view of, and experience with, object-oriented specification languages.

1. Introduction

The challenge in programming language definition is to support modularity and abstraction in a manner that supports reusability and extensibility. The language designer wants to include new language features incrementally as the programming language evolves. This is especially true when developing domain-specific languages, which change more frequently than general-purpose programming languages do [Mernik 2003]. Ideally, a language designer would like to build a language simply by reusing different language definition modules (language components), such as modules for expressions, declarations, etc., regardless of the different formal methods that may be used to specify such language components. This approach is common in component-based programming, where components can simply be plugged in. These reusable components should be straightforwardly extensible to reflect language design changes.

This cannot be done now, even if we restrict ourselves to just one of the formal methods (abstract state machines [Gurevich 1993], action semantics, algebraic specifications, attribute grammars, denotational semantics, operational semantics, two-level grammars, etc. [Slonneger 1995]), since different compiler-compilers (automatic compiler generation systems) use different and incompatible specification languages (e.g. although Eli [Gray 1992] and FNC-2 [Jourdan 1990] both rely on attribute grammars, language definition modules written for one system cannot be used in the other). Moreover, the same is usually true even within a single specification language, since syntax entities (nonterminals and terminals) and semantic entities (e.g. attributes and semantic rules in the case of attribute grammars) are neither part of the hidden part of a module nor parameters of language definition modules. For example, when a module for expressions is imported, some of its nonterminals may clash with existing non-hidden nonterminals, producing undesirable effects. Such a module can be parameterized, using nonterminals as parameters, to solve renaming problems. But modules with dozens of parameters are hard to use.

Compared to modern programming languages, such as object-oriented or functional languages, the language specification languages of the 1980s and early 1990s were far less advanced, specifically in their provisions for abstraction, modularization, extensibility and reusability. Therefore, in recent years concepts from general programming languages have been successfully incorporated into language specification languages. Among them, object-oriented techniques are among the most successful. This has brought several benefits to language specification languages, some of which are reported in this paper.

Our position statement is that the use of object-oriented techniques and concepts, such as encapsulation and inheritance, improves the modularity, reusability and extensibility of language specification languages to a much greater extent than any other technique. Two such examples are shown in this paper. However, to achieve modularity, extensibility and reusability to the full extent, these techniques need to be combined with aspect-oriented techniques, since semantic aspects also crosscut many language features (components). Moreover, special algorithms (e.g. forwarding) have to be invented to improve the modularity of the underlying formal methods. At this moment we have almost reached the point where programming languages can be composed simply by incorporating different language components that use the same formal method.

2. Current status

Object-oriented notations were integrated with attribute grammars long ago [Paakki 1995]. In this case context-free grammars define the class hierarchy. Nonterminals act as abstract superclasses, and productions act as specialized concrete subclasses that specify the syntactic structure, attributes and semantic rules. All these elements can be inherited, specialized, and overridden in subclasses. One shortcoming of this approach is that right-hand nonterminals cannot have inherited attributes; another is that only small features can be added to the language. In other words, the language cannot evolve dramatically. A further problem is that the class hierarchy defines the modularization based on language syntax constructs, whereas the language developer also wants modules based on different aspects (e.g. name analysis, type checking, code generation, etc.). To overcome this problem aspect-oriented techniques have been proposed. Special modules for describing aspects are woven into context-free grammar classes using aspect-oriented techniques [Hedin 2003]. In addition, some special algorithms have been invented that implicitly improve the modularity of attribute grammars, such as remote attribute access [Hedin 2000].

The goal of intentional programming (IP) [de Moor 2001] was also a modular language implementation system, in which intentions are plug-and-play components. Achieving independence of components (intentions) was the main technical challenge. Modularity and reusability were achieved using forwarding [Van Wyk 2002], a variation of inheritance in attribute grammars, and using aspects. A programming language can be built simply by importing an appropriate set of such components, or can be extended with a rich set of features, each of which is a reusable component. The IP project failed despite achieving state-of-the-art modularity of language specifications. In our opinion the reason is that it was too ambitious, expecting the death of programming languages. Moreover, it showed once again how complex the interactions among different language features are.

Two-Level Grammar (TLG, also called W-grammar) was developed as a specification language for programming language syntax and semantics [van Wijngaarden 1974]. The name "two-level" comes from the fact that TLG consists of one level of context-free grammars defining the set of type domains and one level of context-free grammar templates defining the set of function definitions operating on those domains. TLG domain declarations and associated functions may be encapsulated into a class hierarchy supporting multiple inheritance [Bryant 2002]. In this case, the type domains can be used to define the instance variables of the class, and the function definitions can be used to define the methods of the class, as in object-oriented languages such as Java.
The class hierarchy, which is resident in TLG, is a small forest of built-in classes, such as integers, lists, etc. The most significant feature of TLG specifications is that they are wide-spectrum. The natural-language nature of a TLG specification makes it very understandable, while its Turing-computable property and built-in classes make it detailed enough for implementation and directly translatable into executable object-oriented code. TLG serves as a communication medium between language designers and implementers. A shortcoming of TLG specifications is that the reusability of TLG classes is reduced when low-level semantic details have to be dealt with.

Modularity and reusability can also be achieved using techniques that are not object-oriented. One recent achievement regarding better reusability and modularity of action semantics is reported in [Doh 2003]. The authors propose a finer modular structure in which a new semantic equation module is constructed for each production. The final language definition module is obtained simply by importing these modules together, assuming that the symbols they share correspond to common features. It is our belief that such a fine modular structure is not feasible for real programming languages, just as a monolithic structure is infeasible, since the optimal granularity lies somewhere between the two extremes. Modularity and extensibility of specifications based on denotational semantics are much harder to achieve. Some attempts were made in [Liang 1996, Vaidyanathan 1996].

Despite their usefulness, language specification languages are not popular. Among the reasons are the classical ones: they are hard to understand, modify and maintain. Many of these problems can be attributed to the non-modularity, non-extensibility and non-reusability of language specification languages. Another problem is that formal methods have fallen short when low-level implementation details need to be specified. Therefore several works have tried informal or semi-formal approaches to the implementation of programming languages. Again, object-oriented techniques have proven useful. Examples are:

• JTS [Batory 1998], where syntax is specified formally and semantics informally. For each nonterminal a class is generated. Language designers then provide the semantics using the object-oriented language Java. These classes are reusable components which are composed together using type equations (GenVoca components). Here inheritance again plays an important role: language extensions (components) are composed together by inheritance.

• JJForester [Kuipers 2001], which combines the syntax definition formalism SDF, and the associated components that support generalized LR parsing, with the object-oriented language Java. It generates class structures from SDF grammar definitions, and the generated class structures implement a number of object-oriented design patterns [Gamma 1995] to facilitate construction and traversal of parse trees represented by object structures. For example, apart from Java code for constructing and representing syntax trees, JJForester generates visitor classes that facilitate generic traversal of these trees.
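The class structures discussed in this section can be sketched in plain Java: a nonterminal becomes an abstract superclass, each production becomes a concrete subclass, and semantics is supplied separately through a visitor. This is a hand-written illustration only; it is not output of JTS or JJForester, and all names (SyntaxClassesSketch, Expr, Num, Add, Evaluator) are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class SyntaxClassesSketch {

    /** Visitor with one method per production (hypothetical naming). */
    interface Visitor {
        void visitNum(Num n);
        void visitAdd(Add a);
    }

    /** Nonterminal Expr, modelled as an abstract superclass. */
    abstract static class Expr {
        abstract void accept(Visitor v);
    }

    /** Production Expr ::= NUMBER, modelled as a concrete subclass. */
    static final class Num extends Expr {
        final int value;
        Num(int value) { this.value = value; }
        @Override void accept(Visitor v) { v.visitNum(this); }
    }

    /** Production Expr ::= Expr "+" Expr, modelled as a concrete subclass. */
    static final class Add extends Expr {
        final Expr left;
        final Expr right;
        Add(Expr left, Expr right) { this.left = left; this.right = right; }
        @Override void accept(Visitor v) { v.visitAdd(this); }
    }

    /** Semantics kept outside the syntax classes: evaluation with an explicit stack. */
    static final class Evaluator implements Visitor {
        private final Deque<Integer> stack = new ArrayDeque<>();

        @Override public void visitNum(Num n) { stack.push(n.value); }

        @Override public void visitAdd(Add a) {
            a.left.accept(this);
            a.right.accept(this);
            stack.push(stack.pop() + stack.pop());
        }

        int result() { return stack.peek(); }
    }

    /** Traverses the tree with a fresh Evaluator and returns the computed value. */
    static int eval(Expr e) {
        Evaluator ev = new Evaluator();
        e.accept(ev);
        return ev.result();
    }

    public static void main(String[] args) {
        Expr tree = new Add(new Num(1), new Add(new Num(2), new Num(3)));
        System.out.println(eval(tree)); // prints 6
    }
}
```

Adding a new kind of analysis means writing another Visitor implementation, while adding a new production means adding a subclass and a visitor method; this trade-off is one reason the text argues for combining class-based modules with aspect-based ones.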

3. Our Approaches

In this section two of our latest approaches are briefly described. More examples will be provided at the workshop.

3.1 LISA

In the LISA project [Mernik 2000, Mernik 2004], one of the main goals was to enable incremental language development. It was soon recognized that inheritance can be very helpful, since it is a language mechanism that allows new definitions to be based on existing ones. A new specification can inherit the properties of its ancestors, and may introduce new properties that extend, modify or override its inherited properties. In object-oriented languages the properties subject to modification consist of instance variables and methods. But what are the corresponding properties in language definitions based on attribute grammars? Since semantic rules in attribute grammars are tightly coupled with particular production rules, properties in attribute grammars consist of: lexical regular definitions, attribute definitions, rules (generalized syntax rules that encapsulate semantic rules), and operations on semantic domains. In this approach the attribute grammar as a whole is subject to inheritance, employing the "Attribute grammar = Class" paradigm. We call this multiple attribute grammar inheritance. With our approach, the language designer is able to add new features (syntax constructs and/or semantics) to the language in a simple manner by extending lexical, syntax and semantic specifications.

One shortcoming of this approach is that it does not help if programming languages have similar semantics but different syntax. This is because generalized rules encapsulate both syntax and semantics. Further, when studying semantic specifications for various programming languages, common patterns can be noticed (e.g. value distribution, value construction, propagation, etc.). Such patterns are independent of the structure of production rules and of the number of attribute occurrences. To overcome both problems, templates in attribute grammars have been invented [Mernik 2000]. With templates we are able to describe semantic rules that are independent of grammar production rules (syntax). Figure 1 presents only a small piece of a LISA specification, due to the page limit.

language ExprEnv extends Expr {
    lexicon {
        Identifier [a-z]+
    }
    attributes Hashtable *.inEnv;
    rule extends Expression1 {
        compute { valueDistribution };
    }
    rule extends Expression2 {
        compute { valueDistribution };
    }
    rule Term2 {
        TERM ::= #Identifier compute {
            TERM.val = ((Integer)TERM.inEnv.get(#Identifier.value())).intValue();
        };
    }
}
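As a rough analogy in plain Java (not LISA, and not code generated by LISA), the idea of extending a whole language definition by inheritance can be sketched as follows. The sketch assumes a hypothetical base language whose terms are integer literals and an extension that, like ExprEnv above, adds identifiers looked up in an inherited environment. The class and method names (ExprLanguage, ExprEnvLanguage, termVal, sumVal) are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class LisaInheritanceSketch {

    /** Hypothetical stand-in for the base language: terms are integer literals only. */
    static class ExprLanguage {
        /** Semantic rule for a term: computes the synthesized value. */
        int termVal(String token, Map<String, Integer> inEnv) {
            return Integer.parseInt(token);
        }

        /** Semantic rule for a sum: the environment is handed unchanged to both operands. */
        int sumVal(String left, String right, Map<String, Integer> inEnv) {
            return termVal(left, inEnv) + termVal(right, inEnv);
        }
    }

    /** Hypothetical extension: inherits every rule and adds identifier terms. */
    static class ExprEnvLanguage extends ExprLanguage {
        @Override
        int termVal(String token, Map<String, Integer> inEnv) {
            if (token.matches("[a-z]+")) {          // same pattern as the Identifier lexicon entry
                Integer value = inEnv.get(token);   // lookup in the inherited environment
                if (value == null) {
                    throw new IllegalArgumentException("undefined identifier: " + token);
                }
                return value;
            }
            return super.termVal(token, inEnv);     // fall back to the inherited rule
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> env = new HashMap<>();
        env.put("x", 40);
        System.out.println(new ExprEnvLanguage().sumVal("x", "2", env)); // prints 42
    }
}
```

The analogy is deliberately loose: in LISA the inherited units are lexical definitions, attributes, generalized rules and templates, whereas here ordinary methods stand in for semantic rules.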

Figure 1. LISA Specification

The LISA approach has proven successful in developing various general-purpose as well as domain-specific languages (e.g. SODL, COOL, AspectCOOL, PLM). Our experience with these non-trivial examples shows that multiple attribute grammar inheritance is useful in managing the complexity, reusability and extensibility of language definitions. Specifications become much easier to read, maintain and modify.

3.2. Object-Oriented and Pattern-based TLG in Language Implementation

In this project, we combine object-oriented TLG specifications with Java to implement domain-specific languages. Figure 2 illustrates the control flow of this framework. We define the lexical and syntax rules in the TLG type domain, specify abstract semantics in the function domain, and encapsulate the lexical and syntax rules of each terminal or nonterminal grammar entity, together with their associated abstract semantics, into a class. The lexical and syntax rules are generated directly from the TLG specification and are compiled by the lexer generator JLex¹ and the parser generator CUP² to produce the corresponding lexer and parser in Java, respectively. Meanwhile, the Java code for abstract semantics is generated from the TLG specification. We implement the concrete semantics by user-supplied Java classes.

[Figure 2 diagram, labels only: TLG Specification; TLG Compiler; JLex Specification (Lexical rules); CUP Specification (Syntax rules); JLex; CUP; Lexer in Java; Parser in Java; Semantics in Java; User-supplied Java Code; javac; Interpreter in Java byte code; Input term; JVM; Output term]

Figure 2. Control Flow of Language Implementation by TLG Specification

¹ JLex: Java Lexical Analyzer Generator. http://www.cs.princeton.edu/~appel/modern/java/JLex/
² CUP: Parser Generator for Java. http://www.cs.princeton.edu/~appel/modern/java/CUP/

Here, abstract semantics refers to the semantics of a nonterminal that describes the composition of this nonterminal from other grammar symbols. This kind of semantics is domain-independent and can easily be specified formally using TLG. Concrete semantics, on the other hand, refers to semantics whose implementation is very low-level or operating-system related, such as a calculation involving two complex objects (e.g., two matrices) or I/O operations. Such semantics is closely related to the domain of the language and is difficult to specify by formal methods.

We also apply several object-oriented design patterns to improve the modularity and abstraction level of TLG specifications and so enhance the reusability of formal specifications in language implementation. The Interpreter pattern is used to treat each nonterminal and terminal symbol as a class. The generated parse tree is therefore an instance of the Composite pattern, with the terminal classes as leaves and the nonterminal classes as composites. Most importantly, the Chain of Responsibility pattern is applied in the parse tree to recursively pass the responsibility for implementing concrete semantics from the upper nodes to the lower nodes until it reaches the leaf nodes, i.e., the nodes for terminal symbols.

The use of these design patterns has the following benefits for reusability. Firstly, it is easy to change and extend the grammar. As each grammar is composed of a number of terminals and nonterminals, the designer can always replace a class definition to change the grammar or use inheritance to extend the grammar [Gamma 1995]. Secondly, because we pass the responsibility for implementing concrete semantics to the leaf nodes, all the middle nodes (nodes for nonterminals) remain abstract and domain-independent and can be reused directly.
Changing a language from one domain to another can be achieved simply by rewriting the leaf classes. A by-product of using the Chain of Responsibility pattern is that the interpreter designer can freely add concrete semantics to the generated leaf-node classes, using the abundant libraries and I/O capabilities provided by Java, and can avoid complicated formal methods for specifying low-level semantics.
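A minimal Java sketch of this arrangement is given below. It is written by hand for illustration and is not code produced by the TLG compiler. Nonterminal nodes are domain-independent composites that only pass the request down, while leaf classes carry the concrete semantics, so switching domains means swapping the leaf class. All names (TlgPatternsSketch, Node, Nonterminal, UpperCaseLeaf, LengthLeaf) and the two toy "domains" are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

final class TlgPatternsSketch {

    /** Component type shared by composites and leaves. */
    interface Node {
        void interpret(StringBuilder out);
    }

    /** Nonterminal node: domain-independent, passes the request down to its children. */
    static final class Nonterminal implements Node {
        private final List<Node> children = new ArrayList<>();

        Nonterminal add(Node child) {
            children.add(child);
            return this;
        }

        @Override
        public void interpret(StringBuilder out) {
            for (Node child : children) {
                child.interpret(out);
            }
        }
    }

    /** Leaf for one toy domain: emits the token in upper case. */
    static final class UpperCaseLeaf implements Node {
        private final String token;
        UpperCaseLeaf(String token) { this.token = token; }

        @Override
        public void interpret(StringBuilder out) {
            out.append(token.toUpperCase(Locale.ROOT)).append(' ');
        }
    }

    /** Leaf for another toy domain: emits the token length. */
    static final class LengthLeaf implements Node {
        private final String token;
        LengthLeaf(String token) { this.token = token; }

        @Override
        public void interpret(StringBuilder out) {
            out.append(token.length()).append(' ');
        }
    }

    /** Builds the same tree shape for any leaf class; only the leaves differ per domain. */
    static Node tree(Function<String, Node> leaf) {
        return new Nonterminal()
                .add(leaf.apply("move"))
                .add(new Nonterminal().add(leaf.apply("left")));
    }

    /** Runs the interpretation from the root and returns the collected output. */
    static String run(Node root) {
        StringBuilder out = new StringBuilder();
        root.interpret(out);
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(run(tree(UpperCaseLeaf::new))); // prints MOVE LEFT
        System.out.println(run(tree(LengthLeaf::new)));    // prints 4 4
    }
}
```

The Nonterminal class is identical for both domains, which is the reuse the text describes; a real interpreter would of course pass richer requests than a StringBuilder.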

4. Current Shortcomings and Future Trends

As already mentioned, the use of object-oriented techniques and concepts greatly improves language specification languages towards better modularity, reusability and extensibility. To achieve modularity, extensibility and reusability to the full extent, these techniques need to be combined with aspect-oriented techniques, since semantic aspects also crosscut many language components. Moreover, special algorithms have to be invented to improve the modularity of the underlying formal methods.

Another shortcoming of current approaches is a lack of scalability, since they do not fully support grammatical operators such as those described in [Wile 1999]: rename, export, abstract, remove, extend, and integrate. The purpose of the rename operator is to explicitly rename different entities in language specifications. The export operator hides all entities except those explicitly exported. The abstract operator simply ignores the definitions of entities. The remove operator removes all traces of entities from specifications. The extend operator extends previous entities with new ones. The integrate operator puts two language specifications together. Such an operator suite would greatly improve the reusability of specifications. In our opinion this is an open issue in current object-oriented specification languages.

The question is in which directions language specification languages will continue to improve. In our opinion, the main future advances in programming languages will soon be incorporated into language specification languages once they have proven useful for general programming (as in the case of aspect-oriented programming). Furthermore, specifications will be extended so that other language-based tools (e.g. editors, debuggers, animators, etc.) can also be generated automatically from language specifications. One recent attempt is described in [Henriques 2002]. However, the ideal solution, in which a language designer can freely combine language components based on different formal methods, is unlikely to appear in the coming years.
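To make the operator suite from this section more concrete, the following Java sketch implements simplified versions of rename, remove and integrate over a bare context-free grammar. The semantics are assumptions derived only from the one-sentence descriptions given here, not from the definitions in [Wile 1999], and semantic entities are ignored altogether. The representation and all names (GrammarOperatorsSketch, Grammar, rule) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GrammarOperatorsSketch {

    /** A grammar maps each nonterminal to its alternatives; each alternative is a symbol list. */
    static final class Grammar {
        final Map<String, List<List<String>>> rules = new LinkedHashMap<>();

        Grammar rule(String lhs, String... rhs) {
            rules.computeIfAbsent(lhs, k -> new ArrayList<>())
                 .add(new ArrayList<>(Arrays.asList(rhs)));
            return this;
        }
    }

    /** rename (assumed semantics): replace a symbol everywhere, on both sides of rules. */
    static Grammar rename(Grammar g, String from, String to) {
        Grammar result = new Grammar();
        for (Map.Entry<String, List<List<String>>> e : g.rules.entrySet()) {
            String lhs = e.getKey().equals(from) ? to : e.getKey();
            for (List<String> alt : e.getValue()) {
                String[] rhs = alt.stream()
                                  .map(s -> s.equals(from) ? to : s)
                                  .toArray(String[]::new);
                result.rule(lhs, rhs);
            }
        }
        return result;
    }

    /** remove (assumed semantics): drop the symbol's rules and every alternative mentioning it. */
    static Grammar remove(Grammar g, String symbol) {
        Grammar result = new Grammar();
        for (Map.Entry<String, List<List<String>>> e : g.rules.entrySet()) {
            if (e.getKey().equals(symbol)) {
                continue;
            }
            for (List<String> alt : e.getValue()) {
                if (!alt.contains(symbol)) {
                    result.rule(e.getKey(), alt.toArray(new String[0]));
                }
            }
        }
        return result;
    }

    /** integrate (assumed semantics): union of both rule sets, keeping each alternative once. */
    static Grammar integrate(Grammar a, Grammar b) {
        Grammar result = new Grammar();
        for (Grammar g : Arrays.asList(a, b)) {
            for (Map.Entry<String, List<List<String>>> e : g.rules.entrySet()) {
                for (List<String> alt : e.getValue()) {
                    List<List<String>> existing =
                            result.rules.getOrDefault(e.getKey(), Collections.emptyList());
                    if (!existing.contains(alt)) {
                        result.rule(e.getKey(), alt.toArray(new String[0]));
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Grammar expr = new Grammar().rule("EXPR", "EXPR", "+", "TERM").rule("TERM", "num");
        Grammar names = new Grammar().rule("TERM", "id");
        // Renaming first avoids the nonterminal clash on TERM when the modules are combined.
        Grammar merged = integrate(expr, rename(names, "TERM", "NAME"));
        System.out.println(merged.rules.keySet()); // prints [EXPR, TERM, NAME]
    }
}
```

The example in main mirrors the clash scenario from the introduction: two modules both define TERM, and an explicit rename keeps them apart before integration.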

References

[Batory 1998] D. Batory, B. Lofaso, Y. Smaragdakis. JTS: Tools For Implementing Domain-Specific Languages. Proc. Fifth Int. Conf. Software Reuse, pp. 143-153, 1998.

[Bryant 1998] B. R. Bryant, V. Vaidyanathan. Object-Oriented Software Specification in Programming Language Design and Implementation. Proc. COMPSAC '98, Twenty-Second Ann. Int. Computer Software and Applications Conf., pp. 387-392, 1998.

[Bryant 2002] B. R. Bryant, B.-S. Lee. Two-Level Grammar as an Object-Oriented Requirements Specification Language. Proc. 35th Hawaii Intl Conf. System Sciences, 2002. http://www.hicss.hawaii.edu/HICSS_35/HICSSpapers/PDFdocuments/STDSL01.pdf

[Doh 2003] K. Doh, P. Mosses. Composing Programming Languages by Combining Action-Semantics Modules. Sci. Comput. Programming, Vol. 47, No. 1, pp. 3-36, 2003.

[Gamma 1995] E. Gamma, R. Helm, R. Johnson, J. Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, 1995.

[Gurevich 1993] Y. Gurevich. Evolving Algebras: An Attempt to Discover Semantics. Current Trends in Theoretical Computer Science, World Scientific, pp. 266-292, 1993.

[Gray 1992] R. W. Gray, V. P. Heuring, S. P. Levi, A. M. Sloane, W. M. Waite. Eli: A Complete, Flexible Compiler Construction System. Communications of the ACM, Vol. 35, No. 2, pp. 121-131, 1992.

[Hedin 2000] G. Hedin. Reference Attributed Grammars. Informatica, Vol. 24, No. 3, pp. 301-318, 2000.

[Hedin 2003] G. Hedin, E. Magnusson. JastAdd - An Aspect-Oriented Compiler Construction System. Sci. Comput. Programming, Vol. 47, No. 1, pp. 37-58, 2003.

[Henriques 2002] P. Henriques, M. Varanda Pereira, M. Mernik, M. Lenič, E. Avdičaušević, V. Žumer. Automatic Generation of Language-based Tools. Electron. Notes Theor. Comput. Sci., Vol. 65, No. 3, 2002.

[Jourdan 1990] M. Jourdan, D. Parigot, C. Julie, O. Durin, C. Le Bellec. Design, Implementation and Evaluation of FNC-2 Attribute Grammar System. Proc. of the ACM Sigplan'90 Conference on Programming Language Design and Implementation, pp. 209-222, 1990.

[Kuipers 2001] T. Kuipers, J. Visser. Object-Oriented Tree Traversal with JJForester. Electron. Notes Theor. Comput. Sci., Vol. 44, 2001.

[Liang 1996] S. Liang, P. Hudak. Modular Denotational Semantics for Compiler Construction. Proc. 6th European Symp. Programming, Springer-Verlag LNCS Vol. 1058, pp. 219-234, 1996.

[Mernik 2000] M. Mernik, M. Lenič, E. Avdičaušević, V. Žumer. Multiple Attribute Grammar Inheritance. Informatica, Vol. 24, No. 3, pp. 319-328, 2000.

[Mernik 2003] M. Mernik, J. Heering, T. Sloane. When And How To Develop Domain-Specific Languages. CWI Technical Report, SEN-E0309, 2003. http://ftp.cwi.nl/CWIreports/SEN/SEN-E0309.pdf

[Mernik 2004] M. Mernik, V. Žumer. Incremental Programming Language Development. To appear in Computer Languages, Systems and Structures, 2004.

[de Moor 2001] O. de Moor. Intentional Programming. Invited talk at British Computer Society, 2001. http://web.comlab.ox.ac.uk/oucl/work/oege.demoor/talks/ip.pdf.gz

[Paakki 1995] J. Paakki. Attribute Grammar Paradigms - A High-Level Methodology in Language Implementation. ACM Computing Surveys, Vol. 27, No. 2, pp. 196-255, 1995.

[Slonneger 1995] K. Slonneger, B. L. Kurtz. Formal Syntax and Semantics of Programming Languages. Addison-Wesley, 1995.

[Van Wyk 2002] E. Van Wyk, O. de Moor, K. Backhouse, P. Kwiatkowski. Forwarding in Attribute Grammars for Modular Language Design. Proc. 11th Int. Conf. Compiler Construction, pp. 128-142, 2002.

[van Wijngaarden 1974] A. van Wijngaarden. "Revised Report on the Algorithmic Language ALGOL 68." Acta Inf., Vol. 5, pp. 1-236, 1974.

[Wile 1999] D. Wile. Integrating Syntaxes And Their Associated Semantics. Technical Report, USC/Information Science Institute, Marina del Rey, 1999.