Expressive Programs Through Presentation Extension
Andrew D. Eisenberg, Gregor Kiczales
Department of Computer Science, University of British Columbia
{ade, gregor}@cs.ubc.ca
Abstract
Most approaches to programming language extensibility have worked by pairing syntactic extension with semantic extension. We present an approach that works through a combination of presentation extension and semantic extension. We also present an architecture for this approach, an Eclipse-based implementation targeting the Java programming language, and examples that show how presentation extension, both with and without semantic extension, can make programs more expressive.
Categories and Subject Descriptors D.3.4 [Programming Languages]: Processors—Code generation; Run-time environments; D.2.3 [Software Engineering]: Coding Tools and Techniques—Program editors
General Terms Languages, Design
Keywords Annotations, Metadata, Metaobject Protocol, MOP, Expressiveness
1. Introduction
Many factors contribute to making a program expressive. Good modularity makes the units of comprehension natural and clearly interconnected [39, 40]. The use of elegant and standard style makes a program easier to read, verify, and maintain [37, Chaps. 18, 19]. Tailorability of the programming language makes it possible to express general or domain-specific functionality using specialized constructs [25, Chap. 9].
Our current work on program expressiveness focuses on the tailorability of programming languages. Within this space there has been a range of approaches, from making it easier to define domain-specific languages (DSLs) [26, 50, 51, 52] to making general-purpose languages (GPLs) extensible, so that programmers can define new language constructs and semantics that better suit their needs [9, 11, 13, 30, 49]. Use of these extension mechanisms has itself ranged from defining fairly general-purpose constructs [19] to more domain-specific constructs [8, 10, 32].
Most approaches to language extensibility have worked by pairing syntactic extension with semantic extension: the programmer defines a new bit of special syntax, to which the new semantics is attached. The Common Lisp defmacro facility is a typical example of this approach [45]. This combination of syntactic and semantic extension has worked most easily in the Lisp languages, where the syntactic constraints are simple [3]. In the larger set of languages with Algol-like syntax (including C and Java), enabling syntactic extension is more complex, although systems like Dylan, JSE, and Maya have shown that it is workable [3, 4, 5].
Figure 1. Screenshot of our extended editor
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. AOSD 07, March 12-16, 2007, Vancouver, Canada. Copyright © 2007 ACM 1-59593-615-7/07/03...$5.00
In this paper we present another approach to language extensibility, one that works using presentation extension, with or without semantic extension. In presentation extension, the stored form of the program remains in a standard syntax; there is no syntactic extension. Instead, the integrated development environment (IDE) incorporates a metaobject protocol [28] (or framework) that makes it possible for the programmer to customize the way the program is read, edited, or browsed. Like syntactic extensions, presentation extensions rely on semantic extensions to deliver the static and dynamic semantics that affect the program's runtime behavior.
This paper makes four contributions:
• We propose presentation extension as separate from semantic extension.
• We present an architecture for coupling presentation extension with semantic extension.
• We show an Eclipse-based implementation targeting the Java programming language that provides good expressiveness and works compatibly with existing source repositories, tools, and semantic extension mechanisms.
• We use the implementation to show that presentation extensibility can be used to make programs more expressive and that the extensions themselves are straightforward to create.
The next section introduces a simple example of our approach. Section 3 describes our architecture in more detail. Section 4 describes how we implement the architecture with Java as the target language. Several example uses of our editor and preprocessor are described in Section 5. We introduce the concept of indexical referencing in Section 6. Related work and a summary are in Sections 7 and 8.
2. Simple Example
Figure 1 is a screenshot of editing a simple Java program using our approach. In this world, the programmer works with existing primitives, such as class and method declarations, but also has access to an open-ended set of language extensions. The visual form of the program is simple and elegant in ways that are not possible with ordinary Java-like languages. In this example, five different extensions are used to allow the concise definition of getter and setter methods, field guards, and pre- and postconditions. The screenshot says that the fields x and y both have getter and setter methods, named getX, setX, getY, and setY respectively. Additionally, the field uid has a getter method called uid, a setter method called setUid, and a guard, which says that the field must be non-negative (the $$ represents the new value to be set). Finally, the move method has both a pre- and a postcondition.
Compiler warnings and appropriate IDE features are integrated into the presentation extensions. The error in the precondition appears underlined in red in the code, as well as on the overview ruler on the right. The message associated with the error is specific to the extension; it clearly states that the precondition is malformed. Typing and cursor movement behave as expected, and blocks of text can be selected and deleted, even though, as shown in Figure 2, text does not always flow in straight lines and there may be graphical elements such as vertical bars. Context menus and keyboard shortcuts provide commands, including traditional ones such as copy, paste, undo, redo, and code completion. Clicking an entry in other Eclipse views, such as the Problems View, opens an extended editor with the expected region of text selected.
Figure 2. Zooming in on Figure 1: text does not necessarily flow in straight lines and can be impeded by graphical elements such as vertical lines
Extension-specific behavior can be accessed in a variety of ways. For example, the regions to the right of the getter and setter extensions can be typed into directly. Additionally, extensions can be added, removed, or configured through context menus, an extension palette (Section 4.6), and keyboard shortcuts.
To provide presentation extensibility, a metaobject protocol (MOP) [28] runs in the editor. Semantic extensibility is provided by another MOP that runs as a preprocessor before compilation; other semantic processors can also be used. The program is stored as Java code that uses JSR 175 annotations [7], as shown in Figure 3.

    package ca.ubc.ships;

    public class Ship {
        @ca.ubc.cs.etmop.annotations.Getter("get")
        @ca.ubc.cs.etmop.annotations.Setter("set")
        private int x;
        // @Previous: y shares the metaobject of x, so it also gets getY() and setY()
        @ca.ubc.cs.etmop.annotations.Previous()
        private int y;

        @ca.ubc.cs.etmop.annotations.Getter("")
        @ca.ubc.cs.etmop.annotations.Setter("set")
        @ca.ubc.cs.etmop.annotations.Guard("$$ >= 0")
        private int uid;

        // deliberately malformed (missing ')'): this is the error shown in Figure 1
        @ca.ubc.cs.etmop.annotations.Precondition("Ocean.inBounds(x, y")
        @ca.ubc.cs.etmop.annotations.Postcondition("this.getX() == x && this.getY() == y")
        public void move(int x, int y) {
            this.x = x;
            this.y = y;
        }

        public Ship(int uid) {
            this.uid = uid;
        }

        public String toString() {
            return "Ship " + uid() + " is currently at (" + getX() + ", " + getY() + ")";
        }
    }

Figure 3. Persistent store of Figure 1 is ordinary Java language source.
3. Architecture
As shown in Figure 4, the architecture consists of two metaobject protocols (MOPs) [28] that run at separate times but work synergistically to enable the implementation of language extensions. The edit-time MOP (ETMOP) is responsible for presentation extensibility and the implementation of the visual form of the program. The compile-time MOP (CTMOP) is responsible for semantic extensibility and the implementation of traditional static and dynamic language semantics (warnings, errors, and execution behavior). In Section 4.4 we discuss how other semantic processors can be used together with our CTMOP.
This section describes the two MOPs by focusing on the data structures they produce and how they are produced. Both MOPs start with a program AST annotated with metaobjects: edit metaobjects (EMOs) for the ETMOP, and compile metaobjects (CMOs) for the CTMOP.¹ These EMOs and CMOs are responsible for driving the display and semantic extensions. The relationship between AST nodes and metaobjects is n to 1: a single metaobject may be shared by several AST nodes. In a typical program, most AST nodes simply have default EMOs and CMOs; a smaller number of AST nodes have a custom EMO and/or CMO. To handle the common case where we want a single AST node to have multiple EMOs or CMOs (e.g., Getter and Setter), we provide a collection of composite metaobjects, discussed in more detail in Sections 4.2 and 5.1. As shown in Figure 1, the n-to-1 sharing of EMOs allows the vertical bar to cover multiple fields. Composite EMOs allow, for example, a single field declaration or a cluster of field declarations to have both a getter and a setter.
3.1 ETMOP Architecture
The ETMOP implements a program editor and has a standard model-view-controller (MVC) architecture [31].
¹ We defer the discussion of how MOs are created and stored to Section 4.1.
Figure 4. Architecture of ETMOP and CTMOP
3.1.1 Model
The model is made of two sub-structures: the AST + EMOs structure (described above), and boxes. Boxes form the part of the model that defines an abstraction of the relative layout of the final display. They are produced by the ETMOP running a simple code walker over the AST + EMOs structure. The code walker asks each EMO to produce a box for the AST nodes it is attached to. The default EMO, which is attached to the majority of AST nodes, makes a recursive call and gathers the boxes of child nodes into the canonical layout of the program text. A sub-tree of nodes that all have a default EMO simply returns a box structure containing the unaltered program text. Specialized EMOs may change the canonical layout by adding, removing, or replacing notation, graphics, and text.
Different kinds of boxes produce different kinds of containment, which an EMO can use to assemble a customized layout. Text boxes lay out a string of characters. Code boxes know how to lay out source code. Row and column boxes lay out structure horizontally and vertically. Graphics boxes display graphics, though the actual drawing of the graphics is configurable and deferred until the view is created. A simple example of a graphics box is the vertical line box, which produces the lines in Figure 1. Boxes are essentially immutable: most configuration happens at construction time, including setting their contents, how they respond to events, how they serialize themselves back to the persistent store, and, for graphics boxes, the graphics they produce.
3.1.2 View
The view is a hierarchical set of figures (graphical objects) drawn on the screen. The figure hierarchy is almost identical to the box hierarchy, except that graphics boxes can draw any number or kind of figures. The view is produced by walking the box structure and drawing the figures for each box as it is visited. The view is thus a standard part of an MVC architecture and describes the physical layout of the display in terms of pixel coordinates.
3.1.3 Controllers
Controllers react to user-generated keyboard and mouse events by creating commands that affect the state of the editor. There is a 1-to-1 correspondence between boxes and controllers; controllers are created by the boxes at the same time as the figures. Commands created by the controllers change some part of the model and trigger a refresh of the ETMOP view. Which part of the view is refreshed, and how much of it, depends on the scope of the command. Commands such as text editing and selection changes have a local scope: they change only the box directly associated with the controller and, potentially, its children, and hence perform a local refresh. Commands that add, remove, or change extensions update the AST directly and can have a global effect on the view. These commands trigger a complete refresh, recreating the boxes, figures, and controllers from the updated AST after execution. Like the view, the controllers are a standard part of the MVC architecture; they handle updates to the model after a user interacts with the display.
3.1.4 Traceability
The ETMOP maintains a simple correspondence between the boxes and the program text. Each box knows its offset into the program text, so any region of the display can be mapped back to the text and vice versa.
3.2 CTMOP Architecture
The CTMOP implements the static and dynamic semantics of the program through a process similar to the expansion of syntactic macros [34]. It uses data structures and concepts similar to those of the ETMOP. A simple code walker traverses the AST, asking each AST node's CMO to produce the expanded AST for each node it is attached to. The default CMO continues the walk recursively, in a process similar to a deep copy. Customized CMOs can return an alternate expansion.
This protocol is similar to Common Lisp's defmacro in both its mechanism and its usage patterns. Most customized CMOs do their work by picking out relatively high-level sub-parts of the AST nodes they expand and then calling the walk recursively, allowing nested CMOs to do what they would like. But a CMO can customize or entirely override the recursive walk if required. This allows a CMO to give context-sensitive behavior to nested AST nodes that have default or special CMOs. In contrast to typical syntactic macro systems, the protocol allows CMOs not only to expand the current node but also to create additional top-level nodes, or sibling nodes where the abstract syntax allows them. This allows a CMO associated with a method declaration not only to customize that method but also, for example, to produce additional method declarations. Similarly, a CMO associated with a method parameter can add new parameters, or even new methods. Once the AST is expanded, a normal compiler is called to complete the compilation.
3.2.1 Traceability of warnings and errors
To facilitate traceability of warning and error messages, the CTMOP provides several kinds of built-in behavior. AST nodes that are simply copied or relocated in the expanded code retain their original source locators. Synthetic AST nodes are assigned a source locator by the CMO that created them. By default, this is the source locator associated with the CMO itself when it is created, but the CMO can explicitly assign different source locators to the synthetic nodes it creates when the default is not appropriate. After compilation, the CTMOP gathers all errors and warnings generated during expansion and compilation and uses the source locators to translate each error in the expanded AST into an error in the original AST.
Syntactic errors can be detected at expansion time. For these kinds of errors, the CTMOP can produce messages meaningful to the programmer in the specific context of the extension involved. This is why, in Figure 1, we see an error message that is specific to the precondition extension, and why we do not see error messages for uses of the getter methods (getX, etc.) that are implicitly defined. Other kinds of errors, including semantic errors such as name binding and typing errors, are deferred to the actual Java compiler. For these errors, the compiler's own messages are shown, and they are not specific to the extension that caused the error. However, the traceability of source locators, together with other handling of the error's context, still lets these errors be integrated into the extension's display, as shown in Figure 1.
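The locator bookkeeping described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the CTMOP's code: SourceLocator, ExpandedNode, and ErrorTranslator are hypothetical names, locators are modeled as character ranges in the original source, and an error is attributed to the innermost expanded node that contains its offset.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical source locator: a character range in the ORIGINAL, annotated source.
final class SourceLocator {
    final int start;
    final int length;

    SourceLocator(int start, int length) {
        this.start = start;
        this.length = length;
    }

    @Override
    public String toString() {
        return "[" + start + ", " + (start + length) + ")";
    }
}

// Hypothetical node of the EXPANDED program: its range in the expanded text plus the
// locator that points back into the original program. A copied node keeps its own
// locator; a synthetic node receives the locator of the CMO that created it.
final class ExpandedNode {
    final int expandedStart;
    final int expandedEnd; // exclusive
    final SourceLocator origin;

    ExpandedNode(int expandedStart, int expandedEnd, SourceLocator origin) {
        this.expandedStart = expandedStart;
        this.expandedEnd = expandedEnd;
        this.origin = origin;
    }

    boolean covers(int offset) {
        return expandedStart <= offset && offset < expandedEnd;
    }

    int size() {
        return expandedEnd - expandedStart;
    }
}

final class ErrorTranslator {
    private final List<ExpandedNode> nodes = new ArrayList<>();

    void record(ExpandedNode node) {
        nodes.add(node);
    }

    // Translates an error offset reported against the expanded code into the locator of
    // the innermost (smallest) expanded node containing it; null if no node covers it.
    SourceLocator translate(int expandedOffset) {
        ExpandedNode best = null;
        for (ExpandedNode n : nodes) {
            if (n.covers(expandedOffset) && (best == null || n.size() < best.size())) {
                best = n;
            }
        }
        return best == null ? null : best.origin;
    }
}
```

For instance, if a synthetic getX method occupies expanded offsets 100 to 160 and carries the locator of its @Getter annotation, a compiler error reported at offset 120 is shown on that annotation rather than on code the programmer never wrote. A linear scan keeps the sketch short; an interval tree would scale better.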
4. Implementation
Our implementation uses Java as the target language; we have implemented the two MOPs as Eclipse plugins. This section describes some of the rationale behind our design choices. One of the key properties of our approach is its compatibility with existing source repositories, tools, and semantic extension mechanisms. This section focuses on the ETMOP and mentions the CTMOP only where needed for comparison.
4.1 Persistent Store
The architecture is neutral with respect to the form and concrete syntax of the persistent store. In our implementation, we use JSR 175 annotated Java code. The metadata stating that an extension is used at a specific location in the AST is stored as annotations (in other words, metaobjects are serialized to annotations). The architecture does not prevent storing this metadata in other ways, such as XML, comment strings, or some kind of binary file, but we have chosen annotations because they are compatible with programmers, tools, and code bases. In particular:
• Annotations are already familiar to developers.
• Annotations work with existing tools in modern IDEs.
• They are valid Java syntax.
• They are attached to source code.
• They can be edited by hand.
• The Java compiler checks annotations for syntax and some type correctness.
To handle AST nodes that share extensions, we use a special annotation, @Previous, which designates that an AST node shares the metaobject of its previous sibling. None of our extensions so far suggest the need for more sophisticated sharing.
Relying on annotations does restrict us, because annotation syntax is somewhat limiting: annotations can only be attached to declaration nodes of an AST, annotations are not subclassable, and a declaration cannot hold multiple annotations of the same type unless they are wrapped in an array. To partially address these limitations, metaobjects can also be created based on simple structural patterns in the AST; we show an example of this in Section 5.4.
EMOs are registered with the ETMOP by extending an Eclipse extension point² that associates an annotation class with an EMO class; there is a similar mechanism for CMOs.
4.2 Combining Multiple Metaobjects
Although each AST node has exactly one EMO and one CMO, there are situations where the effects of multiple extensions should be combined on a single AST node or a group of nodes. For example, the field declaration uid in Figure 1 has a getter, a setter, and a guard. We define a generic suite of composite EMOs and CMOs that can combine the effects of multiple metaobjects on a single AST node or a group of them, and extension writers can provide their own. These composite EMOs and CMOs are created implicitly during the EMO and CMO creation process and do not need to be stored as annotations. Which composite metaobject to create is specified when the EMO or CMO is registered. An example of this is given in Section 5.1.
4.3 Hooks into Eclipse
Each instance of an ETMOP editor is connected to a hidden instance of a JDT editor. Through the traceability mechanisms described in Sections 3.1.4 and 3.2.1, there is a reversible mapping between JDT editor locations (lines and offsets), ETMOP editor locations (boxes), and the expanded AST. Thus, the ETMOP editor can hook into existing Eclipse infrastructure in the following ways:
• Syntax highlighting is applied to the ETMOP editor by listening for syntax highlighting events on the hidden JDT editor and applying them to the correct boxes (or portions of boxes).
• Similarly, clicks on the overview ruler, outline view, class hierarchy view, call hierarchy view, etc. open an ETMOP editor with the expected text highlighted. We do this by having the ETMOP editor listen for the proper events.
• Content assist works in the ETMOP editor. The content assist proposals include code from the expanded source. For example, if a field x has a getter extension applied to it, the getX() method is returned as one of the proposals when the request warrants it.
• Errors and warnings on files persist between sessions; they are linked to Eclipse resources using standard APIs.
² The use of the term Eclipse extension here is an unfortunate overloading of the term extension. When used in this context, we use the full term Eclipse extension to differentiate it from our own language extensions.
Our integration with Eclipse is an ongoing process, facilitated by our architecture. Our strategy for integrating new features into the ETMOP editor and CTMOP preprocessor leverages the traceability between a program's persistent store, display, and expanded code. Integrating a new feature takes two general steps: 1) determine which Eclipse platform events to listen to and how they apply to the persistent store, and 2) convert each event from coordinates in the persistent store to the appropriate part of the display or expanded source. In this way we have been able to integrate many Eclipse features into our MOPs, and there seem to be few barriers to integrating more.
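Step 2 above is essentially a lookup from a source offset to the box that displays it. The following is a minimal sketch, assuming each leaf box records its start offset and length in the persistent store and that leaf boxes do not overlap; TextSpanBox and BoxIndex are hypothetical names, not the ETMOP's classes.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a leaf ETMOP box: only its span in the stored source matters here.
final class TextSpanBox {
    final String name;
    final int offset; // start offset in the persistent store (the .java file)
    final int length;

    TextSpanBox(String name, int offset, int length) {
        this.name = name;
        this.offset = offset;
        this.length = length;
    }
}

// Maps persistent-store offsets to boxes and back (cf. Section 3.1.4).
final class BoxIndex {
    private final TreeMap<Integer, TextSpanBox> byOffset = new TreeMap<>();

    void add(TextSpanBox box) {
        byOffset.put(box.offset, box);
    }

    // Routes an event reported at a source offset (a problem marker, a highlighting
    // range, a click in another view) to the box that displays that offset.
    TextSpanBox boxAt(int sourceOffset) {
        Map.Entry<Integer, TextSpanBox> entry = byOffset.floorEntry(sourceOffset);
        if (entry == null) {
            return null; // offset precedes every box
        }
        TextSpanBox box = entry.getValue();
        return sourceOffset < box.offset + box.length ? box : null;
    }

    // The reverse direction: a caret position inside a box maps back to the source.
    int sourceOffset(TextSpanBox box, int offsetInBox) {
        return box.offset + offsetInBox;
    }
}
```

A TreeMap keeps each lookup logarithmic in the number of boxes. Because boxes are recreated on a global refresh (Section 3.1.3), an index like this would presumably be rebuilt at the same time.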
One caveat may be the text compare editor, which shows the differences between two versions of the same file. Text compare introduces its own graphical notation to link differences between the files, and it is an open question how that notation should be overlaid on top of ETMOP notation. Indeed, similar issues arise with any view that adds custom notation to the text of a program.
4.4 Other Semantic Processors
A different semantic processor can be used, either together with the CTMOP or separately. Our current implementation of the CTMOP is a source-to-source translation, so if the CTMOP is used, it must receive source code with the proper annotations applied to it. Subject to this requirement, other processors may run before or after the CTMOP. Section 5.5 shows an example of this approach. By default, the CTMOP carries annotations over to the expanded code, where a built-in Java mechanism (the annotation's retention policy) controls whether they are carried further into the class file and to runtime [7].
4.5 Drawing Framework
We use the Graphical Editing Framework (GEF) [18] to build the figures (Section 3.1.2) and controllers (Section 3.1.3) of the editor. In addition to providing an editor framework, GEF also allows us to hook into built-in undo/redo, copy/paste, selection, highlighting, and cursor movement functionality.
4.6 Extensions Palette
We provide a drag-and-drop palette (Figure 5) from which a programmer can add instances of language extensions to a program. An extension can only be dropped onto parts of the program where it is syntactically valid to have that extension.
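Because extensions are stored as annotations, one plausible way to decide where a palette entry may be dropped is to consult the Java @Target meta-annotation of its annotation type. The sketch below assumes the extension annotations declare @Target (the @Setter definition in Section 5.1 does not show one), and PaletteDropCheck is a hypothetical helper; the paper does not describe how the palette actually performs this check.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Target;
import java.util.Arrays;

// Hypothetical extension annotations that declare where they may appear.
@Target(ElementType.FIELD)
@interface Setter {
    String value();
}

@Target(ElementType.METHOD)
@interface Precondition {
    String value();
}

final class PaletteDropCheck {
    // True if an annotation of type 'extension' may be attached to a declaration of
    // kind 'target', according to the annotation type's @Target meta-annotation.
    static boolean canDrop(Class<? extends Annotation> extension, ElementType target) {
        Target allowed = extension.getAnnotation(Target.class);
        if (allowed == null) {
            // No @Target: Java's default applicability rules apply, which this sketch
            // does not model, so it conservatively refuses the drop.
            return false;
        }
        return Arrays.asList(allowed.value()).contains(target);
    }
}
```

With these declarations, a Setter can be dropped on a field but not on a method, and a Precondition only on a method. The appeal of this design is that the compiler and the palette would enforce the same rule.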
5. Examples
In this section, we walk through several examples that use our current implementation to show that presentation extensibility can make programs more expressive and that the extensions themselves are straightforward to write.
5.1 Setter
The complete setter extension consists of four parts:
1. The @Setter annotation:

    public @interface Setter {
        String value();   // the method-name prefix, e.g. "set"
    }

2. Registration of the EMO and CMO. This links the disparate parts of their definition to their annotation and describes how they can be used in the IDE. The registration also defines how the extension can combine with others and which composite EMOs and CMOs are created to handle this composition.
3. The SetterEMO class (Figure 6).
4. The SetterCMO class (Section 5.2).
Figure 5. Palette from which extensions can be dragged and dropped into appropriate locations.
The complete code for the SetterEMO is in Figure 6. Following the numbered comments in the figure, we describe the code:
1. SetterEMO implements ICombinableRight. This interface states that an instance of this EMO can combine with other EMO instances that also implement the interface. Getter, Setter, and Guard all implement ICombinableRight, which is what allows them to combine and appear together to the right of a vertical line.
2. By the EMO constructor protocol, the constructor receives the annotation as well as the list of AST nodes that share the metadata. Some EMOs save both values, but in this case the setter EMO only needs to store the annotation, not the AST nodes.
3. The prefixConfiguration and noteBoxConfiguration methods configure the boxes' controllers. The configurations used here are standard ETMOP library configurations that add extra commands to the controller. UpdateAnnotationConfiguration ensures that anything typed into the figure it is attached to automatically updates the annotation. RightNoteBoxConfiguration ensures that other ICombinableRight EMOs can be added and removed via the context menu.
4. The logicalLayout method implements a simple layout in which the field declarations go on the left, then a vertical line, and then the note box for this EMO. This method is part of the box construction protocol, and all EMOs must implement it to customize the layout. As the code walker constructs the boxes, it calls logicalLayout for every EMO that is directly attached to AST nodes.
5. The logicalLayoutNoteBox method does the part of the layout that corresponds to the setter behavior of the EMO, which consists of the text "Setter:", followed by the prefix in bold, blue letters. This method is part of the setter extension's combining protocol. All EMOs whose extensions combine on the right side of a declaration (e.g., guards, getters, and setters) must implement ICombinableRight and implement this method, which ensures that all of these EMOs combine together.

    // 1: can combine with other EMOs to the right of a vertical line
    public class SetterEMO extends AbstractEMO implements ICombinableRight {
        private final Annotation annotation;

        // 2: receives the annotation and the AST nodes that share it
        public SetterEMO(Annotation annotation, List<ASTNode> asts) {
            this.annotation = annotation;
        }

        private Expression prefix() {
            return annotation.getExpression();
        }

        // 3: controller configurations
        private IConfiguration prefixConfiguration() {
            return new UpdateAnnotationConfiguration(annotation);
        }

        private IConfiguration noteBoxConfiguration() {
            return new RightNoteBoxConfiguration();
        }

        // 4: fields on the left, a vertical line, then the note box
        @Override
        public Box logicalLayout(LogicalLayoutVisitor v, List<ASTNode> passedInASTs) {
            return makeRow(super.logicalLayout(v, passedInASTs),
                           makeVLine(),
                           logicalLayoutNoteBox(v, passedInASTs));
        }

        // 5: the "Setter:" label followed by the editable prefix
        public Box logicalLayoutNoteBox(LogicalLayoutVisitor v, List<ASTNode> asts) {
            return makeCenteredRow(noteBoxConfiguration(),
                                   emphasizeBlack(makeReadOnlyTextBox("Setter: ")),
                                   makePrefixBox(v, asts));
        }

        private Box makePrefixBox(LogicalLayoutVisitor v, List<ASTNode> asts) {
            return emphasizeBlue(makeJavaBox(prefixConfiguration(), prefix()));
        }
    }

Figure 6. Complete code for the SetterEMO
We have described how presentation extensibility is achieved for the setter extension using only a modest amount of code. The semantic extensibility for setters is described next.
5.2 Quasiquote
Most semantic extension mechanisms have some sort of template-based generation facility, such as source-level code generation in Dylan [3], mayans in Maya [5], and define-syntax in Scheme [30]. In the non-Lisp languages, it is particularly difficult to make the use of such facilities expressive: to make it easy to see what code will be generated, and what is constant versus evaluated. Our CTMOP has a similar facility. The raw form of the code that generates the setter method is moderately expressive, but it still takes more work than we would like for the programmer to see just what will be generated:³

    @Quasiquote()
    private MethodDeclaration makeSetter(String typeName, String varName) {
        return makeMethod(makeModifiers(PUBLIC_KEYWORD),
                          makeVoid(),
                          makeSetterName(varName),
                          makeParameters(makeParameter(typeName + " " + varName)),
                          makeBlock(makeStatement(makeThis() + "." + varName + " = " + varName + ";")));
    }

³ MOP annotations and inserted library code are fully qualified in the generated code by default. Typically, the programmer never sees the fully qualified names, since the ETMOP hides them. To make the underlying code easier to read in this paper, however, we have removed all name qualifications in this and the following examples.
Figure 7. Screenshot of the quasiquote extension. This code snippet generates a setter method and is called during code expansion. Code in boxes is quasiquoted from meta-code, and all other code is generated as is.
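The generator above delegates method naming to makeSetterName, whose body is not shown. A string-based sketch of the same idea follows. It emits source text instead of building an AST with the CTMOP library, and its naming rule (prefix plus capitalized field name, or the bare field name when the prefix is empty) is inferred from the accessors in Figure 1 (getX, setUid, uid) rather than taken from the implementation. AccessorSketch and its methods are hypothetical, as is the choice to throw IllegalArgumentException when a guard fails.

```java
// Hypothetical, string-based counterpart to the AST-building makeSetter above.
final class AccessorSketch {
    // Naming rule inferred from Figure 1: "get"/"set" + capitalized field name,
    // or the bare field name when the prefix is empty (e.g. @Getter("") on uid).
    static String accessorName(String prefix, String fieldName) {
        if (prefix.isEmpty()) {
            return fieldName;
        }
        return prefix + Character.toUpperCase(fieldName.charAt(0)) + fieldName.substring(1);
    }

    // Same shape of method that makeSetter builds: public void setX(int x) { this.x = x; }
    static String makeSetter(String prefix, String typeName, String varName) {
        return "public void " + accessorName(prefix, varName)
                + "(" + typeName + " " + varName + ") { this." + varName + " = " + varName + "; }";
    }

    static String makeGetter(String prefix, String typeName, String varName) {
        return "public " + typeName + " " + accessorName(prefix, varName)
                + "() { return this." + varName + "; }";
    }

    // A guarded setter: "$$" in the @Guard expression stands for the new value (Section 2).
    // Throwing IllegalArgumentException on failure is an assumption of this sketch.
    static String makeGuardedSetter(String prefix, String typeName, String varName, String guard) {
        String condition = guard.replace("$$", varName);
        return "public void " + accessorName(prefix, varName)
                + "(" + typeName + " " + varName + ") { if (!(" + condition + ")) "
                + "throw new IllegalArgumentException(\"guard failed\"); this."
                + varName + " = " + varName + "; }";
    }
}
```

For example, makeSetter("set", "int", "uid") yields public void setUid(int uid) { this.uid = uid; }, the method that the toString in Figure 3 relies on indirectly through the generated accessors.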
Using a simple presentation extension (the quasiquote), we can make this code more expressive, as shown in Figure 7. Code in boxes is quasiquoted from the meta-code, and all other code is generated as is. Thus, the screenshot says that the setter method is created using the variables typeName and varName and by calling makeSetterName from the meta-code. The extension also provides syntax highlighting. Any boxed text is directly editable, and text outside the boxes can be edited using keyboard commands and context menus.
Only EMOs are necessary to implement this extension, no CMOs: the Java code in the persistent store already has the full semantics. A library of abstract syntax generators is used to generate code at compile time, and the EMO uses this same library to generate the display, converting library calls that create abstract syntax into the appropriate, visually simpler quasiquote display.
5.3 Uses
Some objects, such as file readers, have complex initialization behavior, involving construction inside a try/catch block and corresponding cleanup. In Java syntax, this behavior requires a significant amount of code. Figure 8 shows the use of a simple extension designed to clean up this kind of code. The uses extension declares that a method uses an instance of a specified type; the extension has knowledge about the initialization and cleanup behavior for a set of types. This code says that the getFileContents method uses the BufferedReader called file. The reader is instantiated and closed, with simple error handling. The extension provides what looks like a new keyword, uses, but the underlying grammar does not have to change to accommodate it. The expanded code for this example is below.

    @Uses(type = BufferedReader.class, name = "file",
          init = "new BufferedReader(new FileReader(fName))")
    public String getFileContents(String fName) {
        BufferedReader file = null;
        try {
            // initialization, taken from the init attribute
            file = new BufferedReader(new FileReader(fName));
            {
                // original method body
                StringBuffer sb = new StringBuffer();
                String l;
                while ((l = file.readLine()) != null) {
                    sb.append(l + "\n");
                }
                return sb.toString();
            }
        } catch (java.io.IOException e) {
            throw new RuntimeException(e);
        } finally {
            // cleanup; exceptions while closing (or a null file) are ignored
            try {
                file.close();
            } catch (Throwable t) {
            }
        }
    }

Figure 8. Screenshot of the uses extension
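The expansion follows a fixed template: declare the resource, run the init expression and the original body inside try, translate IOException, and close in finally. Below is a runnable sketch of that template under two assumptions not in the paper: the reader is built from a caller-supplied Reader, so the example needs no file, and the cleanup checks for null, a case the expansion above leaves to its catch-all handler. UsesExpansionSketch is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical, runnable rendition of the expansion template produced by the uses CMO.
final class UsesExpansionSketch {
    static String getContents(Reader source) {
        BufferedReader file = null;                 // resource declared outside try
        try {
            file = new BufferedReader(source);      // stands in for the @Uses init expression
            StringBuilder sb = new StringBuilder(); // original method body starts here
            String l;
            while ((l = file.readLine()) != null) {
                sb.append(l).append('\n');
            }
            return sb.toString();
        } catch (IOException e) {
            throw new RuntimeException(e);          // simple error handling, as in the expansion
        } finally {
            if (file != null) {                     // skip cleanup if init never completed
                try {
                    file.close();
                } catch (IOException ignored) {
                    // failures while closing are ignored, as in the expansion
                }
            }
        }
    }
}
```

For example, getContents(new StringReader("a\nb")) returns "a\nb\n": each line read is followed by a newline, exactly as in the expanded getFileContents.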
This kind of extension is application- or domain-specific, in that the CMO only knows how to expand a limited set of types (although an extension implementor could certainly add to this set).
5.4 Equations
Figure 9 shows the use of an extension that supports writing code that is displayed in the usual mathematical notation. New equation parts can be added using the extension palette (Section 4.6), and equations can be configured through context menus to use integer or floating-point arithmetic. The persistent store for this snippet of display is:

    @Equation(DOUBLE)
    public double mean(final List<Integer> x) {
      return Equ.div(1, x.size()) * new Summer() {
        public int loop(int i) {
          return x.get(i);
        }
      }.sum(0, x.size() - 1);
    }

Figure 9. Screenshot of the equation extension
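Because the stored form is ordinary Java, it needs some runtime support to compile before expansion. The following is a minimal sketch of what that support might look like. The bodies of Summer.sum and Equ.div are assumptions inferred from how the stored code uses them (the paper does not show them), and MeanDemo is a hypothetical harness. The expanded code's SummerD is presumably a floating-point counterpart of Summer, which is likewise an assumption.

```java
import java.util.List;

// Hypothetical runtime support for the stored (unexpanded) equation code.
// Summer's shape is inferred from its use; the method bodies are assumed.
abstract class Summer {
    // The i-th term of the sum, supplied by the anonymous subclass.
    public abstract int loop(int i);

    // Returns loop(lo) + ... + loop(hi), mirroring the summation sign.
    public int sum(int lo, int hi) {
        int total = 0;
        for (int i = lo; i <= hi; i++) {
            total += loop(i);
        }
        return total;
    }
}

final class Equ {
    private Equ() {}

    // Division marker: displayed as a fraction bar, expanded to the / operator.
    public static double div(double a, double b) {
        return a / b;
    }
}

final class MeanDemo {
    // The persistent-store form of mean() from Figure 9, without the annotation.
    static double mean(final List<Integer> x) {
        return Equ.div(1, x.size()) * new Summer() {
            public int loop(int i) {
                return x.get(i);
            }
        }.sum(0, x.size() - 1);
    }

    public static void main(String[] args) {
        System.out.println(mean(List.of(1, 2, 3)));
    }
}
```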
The method is annotated with @Equation, which signifies that the code attached to it may contain an equation; DOUBLE means that the expanded code should use floating-point arithmetic. During EMO creation, this annotation spawns an EquationEMO, which creates a new visitor that searches for code patterns matching parts of equations and assigns EMOs to AST nodes based on those patterns. For example, the Equ.div(a, b) method invocation is assigned a DivisionEMO, which then adds the horizontal bar in the appropriate location. The CTMOP creates CMOs similarly, assigning an EquationCMO based on the annotation and searching for patterns below the annotated AST node. The CTMOP converts all invocations of methods in the Equ class to use integer or floating-point arithmetic, as specified in the annotation. The expansion also optimizes the code by converting method calls into operators: the Equ.div(a, b) invocation above is, as expected, converted into a / b.⁴ In the expanded code for Figure 9, every location where there may be an integer reference has been explicitly cast to double:

    @Equation(DOUBLE)
    public double mean(final List<Integer> x) {
      return (double) ((double) 1) / ((double) x.size()) * new SummerD() {
        public double loop(int i) {
          return x.get(i);
        }
      }.sum(0, x.size() - 1);
    }
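The operator-conversion step can be sketched as a tree rewrite. The following is a hedged illustration built on a miniature, hypothetical expression AST (Expr, Num, Name, Call, BinOp, Cast); the actual CTMOP operates on the Java compiler's AST and handles many more patterns. It shows a single rule: Equ.div(a, b) becomes a / b, with both operands cast to double in DOUBLE mode. It assumes Java 16 or later for records and instanceof patterns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical miniature AST for illustration only; the real CTMOP rewrites
// the Java compiler's own AST nodes.
interface Expr {}
record Num(String text) implements Expr {}
record Name(String id) implements Expr {}
record Call(String target, String method, List<Expr> args) implements Expr {}
record BinOp(String op, Expr left, Expr right) implements Expr {}
record Cast(String type, Expr operand) implements Expr {}

// Arithmetic mode taken from the @Equation annotation (enum name assumed).
enum Arith { INTEGER, DOUBLE }

// One CMO-style rule: Equ.div(a, b) becomes a / b, widened in DOUBLE mode.
final class EquationRewriter {
    private final Arith mode;

    EquationRewriter(Arith mode) {
        this.mode = mode;
    }

    Expr rewrite(Expr e) {
        if (e instanceof Call div && div.target().equals("Equ") && div.method().equals("div")) {
            // Cast both operands so DOUBLE mode never performs integer division.
            return new BinOp("/", widen(rewrite(div.args().get(0))), widen(rewrite(div.args().get(1))));
        }
        if (e instanceof Call call) {
            List<Expr> args = new ArrayList<>();
            for (Expr a : call.args()) {
                args.add(rewrite(a));
            }
            return new Call(call.target(), call.method(), args);
        }
        if (e instanceof BinOp bin) {
            return new BinOp(bin.op(), rewrite(bin.left()), rewrite(bin.right()));
        }
        if (e instanceof Cast cast) {
            return new Cast(cast.type(), rewrite(cast.operand()));
        }
        return e; // literals and names are unchanged
    }

    private Expr widen(Expr e) {
        return mode == Arith.DOUBLE ? new Cast("double", e) : e;
    }

    // Prints an expression as Java source, parenthesizing nested operators.
    static String print(Expr e) {
        if (e instanceof Num num) return num.text();
        if (e instanceof Name name) return name.id();
        if (e instanceof Cast cast) return "((" + cast.type() + ") " + print(cast.operand()) + ")";
        if (e instanceof BinOp bin) return operand(bin.left()) + " " + bin.op() + " " + operand(bin.right());
        if (e instanceof Call call) {
            StringJoiner args = new StringJoiner(", ", call.target() + "." + call.method() + "(", ")");
            for (Expr a : call.args()) {
                args.add(print(a));
            }
            return args.toString();
        }
        throw new IllegalArgumentException("unknown node: " + e);
    }

    private static String operand(Expr e) {
        return e instanceof BinOp ? "(" + print(e) + ")" : print(e);
    }

    public static void main(String[] args) {
        // Equ.div(1, x.size()) as it appears in the persistent store of Figure 9.
        Expr stored = new Call("Equ", "div", List.of(new Num("1"), new Call("x", "size", List.of())));
        System.out.println(print(new EquationRewriter(Arith.DOUBLE).rewrite(stored)));
        System.out.println(print(new EquationRewriter(Arith.INTEGER).rewrite(stored)));
    }
}
```

Running main prints ((double) 1) / ((double) x.size()) for DOUBLE and 1 / x.size() for INTEGER; the first matches the division in the expanded code above.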
5.5
Code Style vs. Annotation Style AspectJ
Another use of our approach is to display annotation-style AspectJ as code-style AspectJ. A programmer can view and edit standard code-style AspectJ as it appears in Figure 10, and this can be combined with other extensions, such as an equation. This code is stored in annotation style in a text file, as shown below:

    @Equation(FLOAT)
    @After(value = "changes(p)")
    public void myAdvice(Point p) {
      Screen.logDistanceFromOrigin(
          Equ.root(Equ.power(p.getX(), 2) + Equ.power(p.getY(), 2), 2));
    }

    @Pointcut("this(p) && execution(void Point.set*(int))")
    void changes(Point p) {}

Figure 10. Screenshot of the AspectJ extension
⁴ The use of Equ.div(a, b) distinguishes division whose display should be altered from division whose display should not.
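For the named pointcut in the stored code above, the mapping an AspectJ EMO maintains can be sketched as a round trip between the two styles. This is a hypothetical, string-level illustration: the real EMOs walk ASTs and box structures, whereas here a regular expression stands in for parsing, so only simple named pointcuts are handled. AspectStyleMapper and PointcutDecl are invented names, and Java 16 or later is assumed for records.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical string-level sketch of the code-style / annotation-style round
// trip for a named pointcut. A regex stands in for real AST-based parsing.
final class AspectStyleMapper {
    // A named pointcut: its name, parameter list, and pointcut expression.
    record PointcutDecl(String name, String params, String expr) {}

    private static final Pattern CODE_STYLE =
            Pattern.compile("pointcut\\s+(\\w+)\\s*\\(([^)]*)\\)\\s*:\\s*(.*);\\s*");

    private AspectStyleMapper() {}

    // Display direction: annotation-style data shown as code-style AspectJ.
    static String toCodeStyle(PointcutDecl p) {
        return "pointcut " + p.name() + "(" + p.params() + "): " + p.expr() + ";";
    }

    // Editing direction: parse the code-style text the programmer typed.
    static PointcutDecl parseCodeStyle(String text) {
        Matcher m = CODE_STYLE.matcher(text.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("not a simple pointcut declaration: " + text);
        }
        return new PointcutDecl(m.group(1), m.group(2).trim(), m.group(3).trim());
    }

    // Serialization direction: write the persistent, annotation-style Java form,
    // escaping the expression so it is a valid Java string literal.
    static String toAnnotationStyle(PointcutDecl p) {
        String escaped = p.expr().replace("\\", "\\\\").replace("\"", "\\\"");
        return "@Pointcut(\"" + escaped + "\") void " + p.name() + "(" + p.params() + ") {}";
    }

    public static void main(String[] args) {
        PointcutDecl stored = new PointcutDecl("changes", "Point p",
                "this(p) && execution(void Point.set*(int))");
        String shown = toCodeStyle(stored);
        System.out.println(shown);
        System.out.println(toAnnotationStyle(parseCodeStyle(shown)));
    }
}
```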
We define one EMO metaclass for each annotation type used in annotation-style AspectJ. These EMOs all work in a similar way:
1. During box creation, walk the annotation-style AST, creating boxes that look like code-style AspectJ.
2. Allow the code-style form to be edited as text.
3. When serializing back to text:
   • read the display by walking the box structure;
   • create a code-style AspectJ AST;
   • transform it into an annotation-style AST, which also happens to be syntactically correct Java and becomes the persistent store.
Figure 10 shows how different semantic processors can be used together in our architecture. Here, the standard CTMOP executes, creating the expanded code for the myAdvice method, which includes expanding the contained equation. This code is then sent to the AspectJ compiler, which uses its own semantic processor.
5.6
Finite Automaton
We have also implemented a finite automaton extension. This example shows how our tool can support more complex visualizations. Figure 11 shows a simple finite automaton that accepts input matching the regular expression c(ad)*r and rejects all other input. This example and its semantics are similar to those described in [33, Chap. 37]. The presentation of the automaton is a simplified version of UML's state chart.⁵ The automaton can be called in the following manner:

    try {
      // result is the list of characters read from the input
      // stream and accepted by the automaton.
      List<Character> result = new AcceptCadrString().accept(inputStream);
    } catch (InvalidTransitionException e) {
      ...
    } catch (UnexpectedStreamEndException e) {
      ...
    }
⁵ The UML specification can be found at http://www.uml.org/.
Figure 11. Finite Automaton that accepts c(ad)*r
Each element of the input stream transitions the machine to a new state. To determine which state to go to, all possible transitions are examined. Each transition has a label, which contains the transition event (an expression) and, in brackets, an optional guard (a method that returns a boolean). The next value is taken from the stream and compared to all transitions from the current state. If the value and the transition event are equal, the guard method is executed; if it returns true, the transition is taken, the current state is changed, and the process continues. The automaton succeeds if the stream ends when the automaton is in an end state. It fails and throws an exception if no transition can be taken, or if the stream ends when the automaton is not in an end state. This is our most ambitious extension in terms of enhancements to presentation and compilation. Our intention with this example is to strike a balance between making the visual form powerful and keeping it intuitive.
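The transition semantics just described can be sketched as a small interpreter. This is a minimal, deterministic sketch under stated assumptions: the input stream is modeled as an Iterator<Character>, guards are BooleanSupplier values standing in for guard methods, and the first matching transition is taken. The exception names come from the calling code above, but their definitions, like GuardedAutomaton itself, are hypothetical. The sample machine reads c(ad)*r literally: c, then zero or more ad pairs, then r. Java 16 or later is assumed for records.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.function.BooleanSupplier;

// Exception names follow the calling code above; these definitions are assumed.
class InvalidTransitionException extends Exception {
    InvalidTransitionException(String message) { super(message); }
}

class UnexpectedStreamEndException extends Exception {
    UnexpectedStreamEndException(String message) { super(message); }
}

// Hypothetical runtime for the guarded-transition semantics described above.
final class GuardedAutomaton {
    // from --event [guard]--> to; a BooleanSupplier stands in for a guard method.
    record Transition(String from, char event, BooleanSupplier guard, String to) {}

    private final String start;
    private final Set<String> endStates;
    private final List<Transition> transitions;

    GuardedAutomaton(String start, Set<String> endStates, List<Transition> transitions) {
        this.start = start;
        this.endStates = endStates;
        this.transitions = transitions;
    }

    // Runs the machine; returns the accepted characters or throws on failure.
    List<Character> accept(Iterator<Character> input)
            throws InvalidTransitionException, UnexpectedStreamEndException {
        String current = start;
        List<Character> accepted = new ArrayList<>();
        while (input.hasNext()) {
            char value = input.next();
            String next = null;
            for (Transition t : transitions) {
                // The guard runs only after the event has matched the input value.
                if (t.from().equals(current) && t.event() == value && t.guard().getAsBoolean()) {
                    next = t.to();
                    break;
                }
            }
            if (next == null) {
                throw new InvalidTransitionException("no transition from " + current + " on '" + value + "'");
            }
            current = next;
            accepted.add(value);
        }
        if (!endStates.contains(current)) {
            throw new UnexpectedStreamEndException("input ended in non-end state " + current);
        }
        return accepted;
    }

    // Literal reading of c(ad)*r: c, then zero or more "ad" pairs, then r.
    static GuardedAutomaton cadr() {
        BooleanSupplier always = () -> true;
        return new GuardedAutomaton("start", Set.of("end"), List.of(
                new Transition("start", 'c', always, "pairs"),
                new Transition("pairs", 'a', always, "sawA"),
                new Transition("sawA", 'd', always, "pairs"),
                new Transition("pairs", 'r', always, "end")));
    }

    // Adapts a String to the character stream the automaton consumes.
    static Iterator<Character> chars(String s) {
        return s.chars().mapToObj(c -> (char) c).iterator();
    }
}
```

With this machine, cadr and cr are accepted, cdr raises InvalidTransitionException at the d, and ca raises UnexpectedStreamEndException because the input ends outside an end state.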
6.
Spatial Indexical Referencing
Now that we have introduced some examples of presentation extension, we discuss the approach we have been using to choose the style of presentation extensions. Consider two alternative presentations of a getter extension:
and
In the first case, the referent of each getter declaration (the fields x and y) is determined indexically: the getter keyword is part of the field declaration for which it declares a getter. In the second version, the referent of each getter declaration is determined by naming: a defgetter declaration defines a getter for the field with the corresponding name. Comparing the two, the indexical alternative is more concise, but the name-based alternative supports physical separation between the declarations. That separation, in turn, enables different grouping structures; for example, all the getters could be clustered separately from all the setters. Now consider the presentation used in Figure 1:
In this case, the referent of the getter declaration is also determined indexically, but the indexical reference works spatially rather than textually. This enables the getter declaration to range over more than one field: it applies to all the fields to the left of the line. The following analysis accounts for our decisions about when to use which kind of presentation extension (textually indexical reference, spatially indexical reference, or name-based reference):
• When we want an extended construct to have a single referent, we use the textually indexical form, as in quasiquote, before/after/around, named pointcut declarations, uses, and equations.
• When we want an extended construct to have multiple but co-located referents, we use the spatially indexical form, as in getter, setter, and guard. This gives us some separation, and also captures the sharing among co-located referents. An additional case for spatially indexical reference is when there is a single referent, but the extended construct is so bulky that we want more separation, as in the automaton and pre/post-conditions.
• When we want the extended declaration to be physically separate from its referents, or when the sharing structure is not a simple n-to-1, we use name-based reference, as in pointcut designators.
Interestingly, some extensions involve multiple reference techniques. For example, we use spatially indexical reference for automata because of their bulk. But between the automaton notation and the rest of the class, we use name-based reference to express the relationship between the transitions and the input characters, as well as between the transitions and the guard methods. Similarly, in the AspectJ case, the before extension uses textually indexical reference to say that the declaration that includes before is before advice, and it uses name-based reference in the pointcut language, because a pointcut may refer to multiple, distant locations. Thus, the spatially indexical referencing provided by our presentation extension mechanism is a kind of middle ground: it allows program elements to have multiple referents and spatial separation, without the full power or complexity of name-based reference. Spatially indexical referencing appears to work well in two general cases: when the programmer wants to highlight the connection between several co-located referents (getter, setter, guard), and when the programmer wants an indexical connection but the presentation is too cumbersome to inline (pre/post-condition, automaton).
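The difference between spatially indexical and name-based referents can be made concrete by looking at how each getter form selects the fields it applies to. The sketch below is hypothetical: GetterReferents and FieldDecl are invented names, and fields are modeled as plain records rather than AST nodes. A spatial declaration receives every field in its region, while a defgetter-style declaration looks up exactly one field by name. Java 16 or later is assumed for records.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch contrasting how the two getter presentations pick their
// referents. Names are illustrative, not the tool's API.
final class GetterReferents {
    record FieldDecl(String type, String name) {}

    private GetterReferents() {}

    // Spatially indexical: one declaration covers every field in its region.
    static List<String> spatialGetters(List<FieldDecl> fieldsLeftOfLine) {
        List<String> methods = new ArrayList<>();
        for (FieldDecl f : fieldsLeftOfLine) {
            methods.add(getter(f));
        }
        return methods;
    }

    // Name-based: a defgetter-style declaration finds its single referent by name,
    // wherever that field happens to be declared.
    static String namedGetter(List<FieldDecl> allFields, String fieldName) {
        for (FieldDecl f : allFields) {
            if (f.name().equals(fieldName)) {
                return getter(f);
            }
        }
        throw new IllegalArgumentException("no field named " + fieldName);
    }

    // Generates the getter source text for one field.
    private static String getter(FieldDecl f) {
        String capitalized = Character.toUpperCase(f.name().charAt(0)) + f.name().substring(1);
        return "public " + f.type() + " get" + capitalized + "() { return " + f.name() + "; }";
    }
}
```

For fields int x and int y to the left of the line, spatialGetters yields getX() and getY() from a single declaration, whereas namedGetter needs one declaration per field but places no constraint on where that declaration appears.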
7.
Related Work
The general approach of tailoring languages to make code more expressive (more concise, more declarative, more comprehensible, etc.) has gone by many names: meta-linguistic abstraction [1], Language-Oriented Programming [53], Intentional Programming [42], and Model Driven Development [21], to name just a few. While the distinction is not crisp, these approaches tend to fall into two broad categories: language-based, in which the extensions define programming languages ranging from integrated extensions to the main general-purpose language to stand-alone domain-specific languages; and model-based, which involves not just a shift to a higher level of modeling or abstraction, but also explicit manipulation of those models [23, 47, 48]. Successful applications have been described in both categories, covering a wide range of domains including electronics, aerospace, automotive, and many others [35, 38, 46, 50, 51]. Our current approach falls into the language-based category.
In the language-based category, many mechanisms have been developed to support language extension and tailorability, including syntactic and hygienic macros [30], reflection and metaobject protocols [13, 28, 44], generative programming [15], staged computation [16], attribute grammars [27, 52], and structure editors [6]. Approaches that use these mechanisms add some combination of presentation, syntax, and semantic extensibility to a language. We describe several below.
Metaprogramming System (MPS)
MPS⁶ is a development environment intended to simplify the definition of new languages and their interactions with existing languages. In addition to the languages themselves, MPS aids in creating editors and tool support for those languages, such as code completion and debuggers.
The Domain Workbench
The Domain Workbench [43] is an instance of intentional programming (IP) [41]. Like MPS, the Domain Workbench assists in the development of domain-specific languages that interact with each other. Code is stored in a proprietary data structure called the intentional tree, much as MPS stores its code.
Unlike MPS, the Domain Workbench enables the creation of graphical languages and multiple views of a single program by applying syntactic stylesheets to present the intentional tree.
DrScheme
DrScheme⁷ is an example of a system that incorporates language-based extensibility to tailor PLT Scheme. DrScheme also provides presentation extensibility through boxes that can be added to a program and evaluated by the Scheme interpreter. For example, DrScheme has included syntax for fractions, comment boxes, images as values, XML with a quasiquote-like escape, and test boxes that allow students to write test cases within their programs. More recently, DrScheme has included picture boxes for WYSIWYG layout of slide content that is otherwise created programmatically [20]. This presentation extensibility is in addition to support for standard Scheme hygienic macros. There are several differences between this approach and ours. First, there is no documented protocol for creating new kinds of boxes. Second, any program that uses boxes is stored as ASCII-encoded binary and is difficult to edit outside of DrScheme. Last, the semantics of the boxes is driven by the display itself rather than by the persistent store; therefore, a box-aware editor is required to compile, interpret, or edit the program.
Presentation Extension
There are many development environments that enable visual programming and, specifically, allow programmers to extend the environment with new visual constructs. We discuss three. Barista [29] provides presentation extensibility in terms of graphical pretty-printing and formatting for Java code, but provides no mechanism for semantic extensibility. Redwood [55] also provides presentation extensibility for code without semantic extensibility. It allows programmers to define code snippets that have customized display and editing capabilities; snippets can be reused and nested inside other snippets. The Boxer programming environment and language [17] allows programmers to describe computation declaratively in terms of a hierarchical set of boxes. The language is extensible to some degree: programmers can define new types of boxes and new interactions with them (e.g., creating widgets that respond to mouse events).
⁶ MPS is a commercial product currently in beta and is available through JetBrains at http://jetbrains.com/mps.
⁷ http://www.drscheme.org
Fortress
The Fortress programming language [2] specifically targets computational algorithms, so the ability to display mathematical equations crisply and succinctly is essential. To this end, the Fortress editor takes advantage of several syntactic stylesheets so that code can be displayed in ASCII, in Unicode, or in standard mathematical notation. The language itself is meant to be open and extensible through libraries, and through mathematical notation for using the functions in those libraries.
Lisp
An important body of related practice comes from the Lisp community, where large systems like the Lisp Machine [54], Interlisp-D [12], Macsyma [36], and different versions of Emacs [22] include hundreds of small Lisp extensions and domain-specific languages (DSLs). In these systems, macros and other metalinguistic technology are used to provide the semantics. Some of these systems include Integrated Development Environment (IDE) extension points that give annotations modest control over the display and editing of the language extensions and DSLs. This includes the ability for macros to say how they are pretty-printed and for language extensions to control how they appear in the debugger.
Metaobject Protocols (MOPs)
Prior work on metaobject protocols in Smalltalk [24] and CLOS [28] demonstrated that layering could be used to balance power and ease of use. That work also showed a small range of different times at which a MOP could run. Later work in C++, Java, and other languages [13, 14] showed that a MOP could run at compile time. We have built on this work with a MOP that is layered, addresses a very different set of issues, and runs at edit time. The work on Explicit Programming [11] uses compile-time MOP technology to implement the semantics of annotations, so it is analogous to, but different from, our work.
8.
Summary
We have shown how presentation extensions, with and without semantic extensions, can increase program expressiveness. Language extensions can be created that express concepts in ways that are not possible using standard syntax extension techniques. We presented an architecture for presentation and semantic extension that comprises two metaobject protocols: the ETMOP and the CTMOP. The more novel part of our architecture is the ETMOP, which runs in the program editor and enhances the presentation of a program. The CTMOP is a standard code expander that provides semantic extensibility and incorporates traceability from expanded code back to the original. Our implementation of this architecture targets the Java programming language and is an Eclipse plugin, so it works in coordination with existing tools, code repositories, and semantic extenders. We have also introduced several sample uses of our approach and shown how they increase the expressiveness of a program. Future work includes enhancing our code generation techniques so that they are safer and easier to implement, testing our technique on a large code base, and looking for more domains that can make use of our presentation extension technique.
References
[1] H. Abelson and G. J. Sussman. Structure and Interpretation of Computer Programs. MIT Press, Cambridge, MA, USA, 1996.
[2] E. Allen, D. Chase, V. Luchangco, J.-W. Maessen, S. Ryu, G. L. Steele Jr., and S. Tobin-Hochstadt. The Fortress language specification. Technical Report Version 0.866, Sun Microsystems, Inc., Feb. 2006.
[3] J. Bachrach and K. Playford. D-expressions: Lisp power, Dylan style. Technical report, Massachusetts Institute of Technology, 1999. http://www.ai.mit.edu/people/jrb/Projects/dexprs.pdf.
[4] J. Bachrach and K. Playford. The Java syntactic extender (JSE). In Proceedings of the 16th ACM SIGPLAN conference on Object oriented programming, systems, languages, and applications, pages 31–42. ACM Press, 2001.
[5] J. Baker and W. C. Hsieh. Maya: multiple-dispatch syntax extension in Java. In PLDI ’02: Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation, pages 270–281, New York, NY, USA, 2002. ACM Press.
[6] D. R. Barstow. Overview of a display-oriented editor for Interlisp. In Proceedings of the 7th International Joint Conference on Artificial Intelligence, pages 927–929, 1981.
[7] J. Bloch. JSR-175: A Metadata Facility for the Java™ Programming Language. Technical Report Final Release, Sun Microsystems Inc., Sept. 2004.
[8] C. Brabrand, A. Moller, and M. I. Schwartzbach. The project. ACM Trans. Inter. Tech., 2(2):79–114, 2002.
[9] C. Brabrand and M. I. Schwartzbach. Growing languages with metamorphic syntax macros. In PEPM ’02: Proceedings of the 2002 ACM SIGPLAN workshop on Partial evaluation and semantics-based program manipulation, pages 31–40, New York, NY, USA, 2002. ACM Press.
[10] M. Bravenboer and E. Visser. Concrete syntax for objects. Domain-specific language embedding and assimilation without restrictions. In OOPSLA ’04: Proceedings of the 19th ACM SIGPLAN conference on Object-Oriented Programming, Systems, Languages, and Applications, Vancouver, Canada, October 2004. ACM SIGPLAN.
[11] A. Bryant, A. Catton, K. D. Volder, and G. C. Murphy. Explicit programming. In AOSD ’02: Proceedings of the 1st international conference on Aspect-oriented software development, pages 10–18. ACM Press, 2002.
[12] R. R. Burton, R. M. Kaplan, L. M. Masinter, B. Sheil, A. Bell, D. G. Bobrow, L. P. Deutsch, and W. S. Haugeland. Papers on Interlisp-D. Technical report, Palo Alto Research Center, Xerox Corporation, Sept. 1980.
[13] S. Chiba. A metaobject protocol for C++. In OOPSLA ’95: Proceedings of the tenth annual conference on Object-oriented programming systems, languages, and applications, pages 285–299, New York, NY, USA, 1995. ACM Press.
[14] S. Chiba. Load-time structural reflection in Java. In ECOOP ’00: Proceedings of the 14th European Conference on Object-Oriented Programming, pages 313–336, London, UK, 2000. Springer-Verlag.
[15] K. Czarnecki and U. W. Eisenecker. Generative programming: methods, tools, and applications. ACM Press/Addison-Wesley Publishing Co., New York, NY, USA, 2000.
[16] R. Davies and F. Pfenning. A modal analysis of staged computation. In POPL ’96: Proceedings of the 23rd Symposium on Principles of Programming Languages, pages 258–270, 1996.
[17] A. A. diSessa and H. Abelson. Boxer: a reconstructible computational medium. Commun. ACM, 29(9):859–868, 1986.
[18] Eclipse Tools Project. The Graphical Editing Framework. http://www.eclipse.org/gef/.
[19] J.-C. Fabre and S. Chiba, editors. Proceedings of Workshop on Reflective Programming in C++ and Java, Oct. 1998. http://www.csg.is.titech.ac.jp/~chiba/oopsla98ws.html.
[20] R. B. Findler and M. Flatt. Slideshow: functional presentations. J. Funct. Program., 16(4-5):583–619, 2006.
[21] D. Frankel. Model Driven Architecture: Applying MDA to Enterprise Computing. John Wiley & Sons, Inc., New York, NY, USA, 2002.
[22] Free Software Foundation. Programming in Emacs Lisp (Second Edition). Free Software Foundation, Jan. 2002.
[23] M. P. J. Fromherz and V. A. Saraswat. Model-based computing: Using concurrent constraint programming for modeling and model compilation. In Proceedings of the 1st International Conference on Principles and Practice of Constraint Programming, pages 629–635, London, UK, 1995. Springer-Verlag.
[24] A. Goldberg and D. Robson. Smalltalk-80: The Language and its Implementation. Addison Wesley, Xerox Palo Alto Research Center, 1983.
[25] J. Greenfield and K. Short. Software factories: assembling applications with patterns, models, frameworks and tools. In Companion of the 18th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications, pages 16–27, New York, NY, USA, 2003. ACM Press.
[26] JetBrains. Meta Programming System. http://www.jetbrains.com/mps/.
[27] K. Kennedy and S. K. Warren. Automatic generation of efficient evaluators for attribute grammars. In POPL ’76: Proceedings of the 3rd ACM SIGACT-SIGPLAN symposium on Principles of programming languages, pages 32–49, New York, NY, USA, 1976. ACM Press.
[28] G. Kiczales, J. des Rivières, and D. G. Bobrow. The Art of the Metaobject Protocol. MIT Press, Cambridge, MA, USA, 1991.
[29] A. J. Ko and B. A. Myers. Barista: An implementation framework for enabling new tools, interaction techniques and views in code editors. In CHI ’06: Proceedings of the SIGCHI conference on Human Factors in computing systems, pages 387–396, New York, NY, USA, 2006. ACM Press.
[30] E. Kohlbecker, D. P. Friedman, M. Felleisen, and B. Duba. Hygienic macro expansion. In LFP ’86: Proceedings of the 1986 ACM conference on LISP and functional programming, pages 151–161, New York, NY, USA, 1986. ACM Press.
[31] G. E. Krasner and S. T. Pope. A cookbook for using the model-view controller user interface paradigm in Smalltalk-80. J. Object Oriented Program., 1(3):26–49, 1988.
[32] S. Krishnamurthi. Automata as macros. Journal of Functional Programming, 2005.
[33] S. Krishnamurthi. Programming Languages: Application and Interpretation. Self-Published, Jan. 2006. http://www.cs.brown.edu/~sk/Publications/Books/ProgLangs/.
[34] B. M. Leavenworth. Syntax macros and extended translation. Communications of the ACM, 9(11):790–793, 1966.
[35] E. Long, A. Misra, and J. Sztipanovits. Application of model-integrated computing in manufacturing execution systems. In Proceedings of the 6th Conference and Workshop on the Engineering of Computer Based Systems, pages 53–59, 1999.
[36] Mathlab Group. The MACSYMA papers 1970. Massachusetts Institute of Technology, Mathlab Group, Cambridge, MA, USA, 1971.
[37] S. McConnell. Code complete: a practical handbook of software construction. Microsoft Press, Bellevue, WA, USA, 1993.
[38] A. Misra, J. Sztipanovits, G. Karsai, M. Moore, and Á. Lédeczi. Integration of information systems in large-scale enterprises using model-integrated computing. In Proceedings of the 1st International Conference on Enterprise Information Systems, pages 485–492, 1999.
[39] H. Ossher and P. Tarr. Multi-dimensional separation of concerns in hyperspace. In C. V. Lopes, A. Black, L. Kendall, and L. Bergmans, editors, International Workshop on Aspect-Oriented Programming (ECOOP 1999), June 1999.
[40] D. L. Parnas. On the criteria to be used in decomposing systems into modules. Commun. ACM, 15(12):1053–1058, 1972.
[41] C. Simonyi. The death of computer languages, the birth of intentional programming. Technical Report MSR-TR-95-52, Microsoft Corporation, Sept. 1995.
[42] C. Simonyi. Intentional programming: Innovation in the legacy age. In Proceedings of the IFIP Working Group 2.1, June 1996.
[43] C. Simonyi, M. Christerson, and S. Clifford. Intentional software. In OOPSLA ’06: Proceedings of the 21st annual ACM SIGPLAN conference on Object-oriented programming languages, systems, and applications, pages 451–464, New York, NY, USA, 2006. ACM Press.
[44] B. C. Smith. Procedural Reflection in Programming Languages. PhD thesis, Massachusetts Institute of Technology, Jan. 1982. MIT-LCS-TR-272.
[45] G. L. Steele. Common Lisp the Language, 2nd edition. Digital Press, 1990.
[46] G. J. Sussman and J. Wisdom. Structure and Interpretation of Classical Mechanics. MIT Press, Cambridge, MA, USA, 2001.
[47] J. Sztipanovits, G. Karsai, C. Biegl, T. Bapty, Á. Lédeczi, and A. Misra. MULTIGRAPH: an architecture for model-integrated computing. In Proceedings of the 1st International Conference on Engineering of Complex Computer Systems, pages 361–368, 1995.
[48] J. Sztipanovits, G. Karsai, and H. Franke. Model-integrated program synthesis environment. In Proceedings of the 3rd Conference and Workshop on the Engineering of Computer Based Systems, pages 348–355, 1996.
[49] M. Tatsubori, S. Chiba, K. Itano, and M.-O. Killijian. OpenJava: A class-based macro system for Java. In Proceedings of the 1st OOPSLA Workshop on Reflection and Software Engineering, pages 117–133, London, UK, 2000. Springer-Verlag.
[50] USENIX, editor. Proceedings of the Conference on Domain-Specific Languages, New York, NY, USA, 1997. ACM Press.
[51] USENIX, editor. Proceedings of the 2nd conference on Domain-specific languages, New York, NY, USA, 1999. ACM Press.
[52] E. Visser. Program transformation with Stratego/XT: Rules, strategies, tools, and systems in StrategoXT-0.9. In C. Lengauer et al., editors, Domain-Specific Program Generation, volume 3016 of Lecture Notes in Computer Science, pages 216–238. Springer-Verlag, June 2004.
[53] M. Ward. Language oriented programming. Software - Concepts and Tools, 15(4):147–161, 1994.
[54] D. Weinreb and D. Moon. The Lisp machine manual. ACM SIGART Bulletin, 1(78):10–10, 1981.
[55] B. Westphal, J. F. C. Harris, and S. Dascalu. Snippets: Support for drag-and-drop programming in the Redwood environment. Journal of Universal Computer Science, 10(7):859–871, 2004. http://www.jucs.org/jucs_10_7/snippets_support_for_drag.