Matching Objects

Joost Visser∗
Universidade do Minho, Braga, Portugal

Ralf Lämmel
Free University Amsterdam, The Netherlands
ABSTRACT
Pattern matching is a powerful programming concept that has proven its merits in declarative programming paradigms. We propose a light-weight approach to support pattern matching in object-oriented languages. Our technique requires no language extension. Instead, we provide a generic, reflective matching algorithm that switches to type-specific behavior where possible. We detail the realization of our approach within Java. Additionally, we generalize the pattern-matching idiom with active patterns and matching at arbitrary depth. This is achieved through an elegant integration of our pattern-matching approach with visitor combinator programming. Previously, we introduced generic visitor combinators as a technique for modeling term rewriting strategies in an object-oriented setting. The result is a light-weight, flexible, but quite powerful technique for navigation and manipulation of object graphs.
Categories and Subject Descriptors D.1.5 [Programming Techniques]: Object-oriented Programming; D.1.m [Programming Techniques]: Strategic Programming, Adaptive Programming; D.3.3 [Programming Languages]: Language Constructs and Features
General Terms Algorithms, Design, Languages, Performance
Keywords Pattern matching, Visitor pattern, Traversal, Reflection
1. INTRODUCTION
Pattern matching is a programming concept that plays a central role in declarative paradigms, such as term rewriting and functional programming. Given a term and a term pattern, i.e. a term in which variables may occur, the pattern-matching problem consists in finding a substitution for the variables in the pattern such that the pattern and the term become equal. For instance, consider the following pattern and term:

pattern:  F(x, G(x, y))
term:     F(H(A), G(H(A), B))

where x and y are variables. Pattern matching yields the following substitution:

x := H(A)
y := B

Note that the variable x occurs twice in the pattern. When variables are allowed to occur more than once in a pattern, the pattern-matching problem is called non-linear. When the pattern is matched not against a closed term but against another pattern, a more general problem, called unification, must be solved. Unification plays a central role in logic programming, in type inference for functional programming, and in some artificial intelligence approaches. A comprehensive survey of unification can be found in [3].

As we will see below, pattern matching is, not surprisingly, meaningful and useful within the object-oriented paradigm as well. Rather than matching tree-shaped terms, we will be matching objects arranged in object graphs. Instead of extending an object-oriented language with built-in matching functionality, we will employ a mixture of techniques, including reflection, to offer fully generic, but customizable, pattern-matching functionality within an existing language (we will use Java).

Additionally, we will enrich the pattern-matching functionality with generic traversal functionality as provided by visitor combinators, another technique borrowed from declarative paradigms. In particular, we will allow visitors to appear in patterns, just like variables, and we will allow patterns to be used as visitor building blocks. The former gives us active patterns, i.e. patterns with behavior encapsulated at their leaves. The latter allows us to do pattern matching at arbitrary depths inside object graphs. As a result, a powerful generic programming technique emerges for the navigation and manipulation of object graphs.

In Section 2 we explain our solution to object matching. In Section 3 we assess the performance of the algorithm. In Section 4 we introduce additional sophistication by blending pattern matching with generic visitors. Section 5 concludes with a discussion of related work and of relative merits and weaknesses.

[Figure 1: a pattern graph built from F and G nodes that contains variable objects, next to an object graph built from F, G, H, A and B nodes.]
Figure 1: The basic idea of matching object graphs. The pattern is an object graph that contains variable objects, indicated by empty dashed circles. After matching, these variables are bound to corresponding objects in the term. The dotted arrows indicate object references that represent bindings.

∗ Work carried out with support from Fundação para a Ciência e a Tecnologia, Portugal, ref. SFRH.BPD.11609.2002.
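Returning to the introductory example, the substitution for F(x, G(x, y)) can be computed by a short recursive procedure over tree-shaped terms. The sketch below is ours and is not the object-matching engine developed in this paper: it represents terms with a hypothetical Java record Term (Java 16 or later), identifies variables by name, and handles non-linearity by requiring every later occurrence of a variable to match a term structurally equal to the one bound at its first occurrence.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical first-order term: a constructor symbol with arguments, or a named variable.
record Term(String name, boolean isVar, List<Term> args) {
    static Term variable(String name) { return new Term(name, true, List.of()); }
    static Term app(String name, Term... args) { return new Term(name, false, List.of(args)); }

    @Override public String toString() {
        if (args.isEmpty()) return name;
        StringJoiner j = new StringJoiner(", ", name + "(", ")");
        for (Term a : args) j.add(a.toString());
        return j.toString();
    }
}

final class TermMatcher {
    // Returns the substitution for the pattern's variables, or null if there is no match.
    static Map<String, Term> match(Term pattern, Term term) {
        Map<String, Term> subst = new HashMap<>();
        return matchInto(pattern, term, subst) ? subst : null;
    }

    private static boolean matchInto(Term p, Term t, Map<String, Term> subst) {
        if (p.isVar()) {
            Term earlier = subst.putIfAbsent(p.name(), t);  // bind on the first occurrence
            return earlier == null || earlier.equals(t);    // non-linear: later occurrences must agree
        }
        if (!p.name().equals(t.name()) || p.args().size() != t.args().size()) return false;
        for (int i = 0; i < p.args().size(); i++) {
            if (!matchInto(p.args().get(i), t.args().get(i), subst)) return false;
        }
        return true;
    }
}
```

Matching F(x, G(x, y)) against F(H(A), G(H(A), B)) yields x := H(A) and y := B. Replacing the inner H(A) by A makes the match fail, because x cannot stand for two different terms.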
2. GENERIC OBJECT MATCHING
The basic idea of matching object graphs is illustrated in Figure 1. A pattern is an object graph that contains designated variable objects. These variables have no name; they are identified by reference, as are all objects. In the figure, they are indicated by empty dashed circles. The pattern is matched against an object graph that does not contain variable objects. Matching involves a comparison between the structure of both object graphs as well as between the types of the objects that appear in them. During matching, references are created from variables to corresponding objects. These references, indicated by dotted arrows in the figure, represent variable bindings. Figure 2 shows some examples in Java.
2.1 The algorithm
Starting from this basic idea, an algorithm for object-graph pattern matching can be designed in a fairly straightforward way. The outline of such an algorithm is shown as a decision diagram in Figure 3. Two things must be noted about this algorithm before we explain it. Firstly, it is recursive: when the pattern and the object against which it is matched are composite, the matching algorithm is invoked recursively on their components (cf. the match connector at the bottom of the diagram). Secondly, the algorithm is generic, i.e. it works for all possible types of objects (which can be achieved by reflection), but it includes the possibility of switching to a type-specific matching algorithm (cf. the specific match connector at the right side of the diagram). The rationale behind this will become evident below.

Is the pattern a variable? If the pattern is a variable that is not yet bound, then the object against which it is matched is bound to that variable, and the algorithm terminates successfully. If the pattern is a variable that has been bound to a specific object before, then this object is retrieved and checked for equality with the object against which the variable is being matched. If both are equal, the matching algorithm terminates successfully; if they are unequal, it terminates with a match failure.

Both implement the Matchable interface? If the pattern is not a variable, but some other object, then both objects are inspected to establish whether they implement the Matchable interface. As will be explained below, this interface provides a type-specific matching function, which may be implemented in a more efficient or more accurate manner than the generic one. If this is the case, the generic algorithm switches to the specific one. This specific matching algorithm might invoke the generic one again on a subgraph.

Are the types the same? If the concrete classes of the pattern and of the object against which it is matched are different, the match fails.

Are both composite? An object is said to be composite if it has components. There are several ways to decide what the components of an object are:
• fields as components: A generic, but simplistic, approach is to regard the fields of an object as its components. Specifically, our algorithm takes non-static public fields as components by default.
• iterable and component interface: The default view of components can be refined by letting the object itself decide which are its components. Our algorithm inspects whether the object either implements a Composite interface, which allows access to components by index, or provides an iterator method that allows iteration over its components. The latter method is implemented by most collection classes in Java's standard libraries.
• array components: Finally, arrays are given a special treatment. The elements of an array are clearly to be viewed as its components.
If none of these perspectives on the object yields components, it is considered to be non-composite. If neither the pattern nor the matched object is composite, the equals method is used to decide whether the match succeeds.

Are both composites with similar shape? If only one of them (either the pattern or the matched object) is composite, the match fails. Similarly, the match fails when both are composites but of different shape, i.e. with a different number of components.

Iterate over pairs of components. If both are composites with similar shapes, we can pair up their respective components and invoke the generic matching algorithm on these pairs.

```java
import java.util.ArrayList;
import java.util.List;
import static org.junit.Assert.assertTrue;

// Term, F, G, H, A, B and TermVariable are the example classes of Figure 1.
void test() throws MatchFailure {
    TermVariable x = new TermVariable();
    TermVariable y = new TermVariable();
    Object pattern = new F(x, new G(x, y));
    Term h = new H(new A());
    Term b = new B();
    Object term = new F(h, new G(h, b));
    MatchEngine.match(pattern, term);
    assertTrue(x.getValue() == h);   // both occurrences of x refer to the shared H(A)
    assertTrue(y.getValue() == b);
}

Object matchSingleton(Object o) throws MatchFailure {
    ObjectVariable x = new ObjectVariable();
    List<Object> pattern = new ArrayList<>();
    pattern.add(x);                  // pattern: a singleton list holding a variable
    MatchEngine.match(pattern, o);
    return x.getValue();
}
```
Figure 2: Examples of use. The upper snippet shows a test method that replays the match illustrated in Figure 1. The matchSingleton method constructs a pattern consisting of a singleton list with a variable as element. The incoming object o is matched against this pattern. If o is indeed a singleton list, the match succeeds, and its element is bound to the variable x.
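The decision procedure can be condensed into a compact reflective sketch. It is a hypothetical simplification, not the authors' MatchEngine: SketchMatchEngine, Pair and the Matchable signature used here are ours; bindings are tracked in a plain list rather than a Bound object; Composite-interface support is omitted for brevity; and an object counts as iterable when it implements java.lang.Iterable, whereas the engine described above looks for an iterator method reflectively. The checks follow the order of Figure 3: variable, Matchable, same class, composite, similar shape, pairwise recursion, with equals as the base case.

```java
import java.lang.reflect.Array;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;

class MatchFailure extends Exception {
    MatchFailure(Object pattern, Object term) { super("cannot match " + pattern + " against " + term); }
}

// Nameless variable, identified by reference; a null value means unbound.
class ObjectVariable {
    private Object value;
    boolean isBound() { return value != null; }
    Object getValue() { return value; }
    void setValue(Object value) { this.value = value; }
    void unBind() { value = null; }
}

// Hook for type-specific matching (hypothetical signature for this sketch).
interface Matchable {
    void match(Object term, List<ObjectVariable> bound) throws MatchFailure;
}

// Demo class whose public fields are its components, standing in for F and G.
class Pair {
    public final Object left, right;
    Pair(Object left, Object right) { this.left = left; this.right = right; }
    @Override public String toString() { return "Pair(" + left + ", " + right + ")"; }
}

final class SketchMatchEngine {
    // Entry point: true on success; on failure, all bindings made so far are undone.
    static boolean matches(Object pattern, Object term) {
        List<ObjectVariable> bound = new ArrayList<>();
        try {
            match(pattern, term, bound);
            return true;
        } catch (MatchFailure e) {
            for (ObjectVariable v : bound) v.unBind();
            return false;
        }
    }

    static void match(Object p, Object t, List<ObjectVariable> bound) throws MatchFailure {
        if (p instanceof ObjectVariable v) {                      // Is the pattern a variable?
            if (!v.isBound()) { v.setValue(t); bound.add(v); }    // bind the term to the variable
            else if (!Objects.equals(v.getValue(), t)) throw new MatchFailure(p, t);
            return;
        }
        if (p instanceof Matchable m && t instanceof Matchable) { // switch to the specific match
            m.match(t, bound);
            return;
        }
        if (p == null || t == null || p.getClass() != t.getClass()) { // Are the types the same?
            if (p != t) throw new MatchFailure(p, t);
            return;
        }
        List<Object> pc = components(p), tc = components(t);
        if (pc == null && tc == null) {                           // neither composite: use equals
            if (!p.equals(t)) throw new MatchFailure(p, t);
            return;
        }
        if (pc == null || tc == null || pc.size() != tc.size()) { // similar shape?
            throw new MatchFailure(p, t);
        }
        for (int i = 0; i < pc.size(); i++) match(pc.get(i), tc.get(i), bound); // pairwise recursion
    }

    // Components of o (array elements, iterated elements, or non-static public fields),
    // or null if o is not composite.
    static List<Object> components(Object o) {
        List<Object> result = new ArrayList<>();
        if (o.getClass().isArray()) {
            for (int i = 0; i < Array.getLength(o); i++) result.add(Array.get(o, i));
            return result;
        }
        if (o instanceof Iterable<?> it) {
            for (Object c : it) result.add(c);
            return result;
        }
        Field[] fields = o.getClass().getFields();
        Arrays.sort(fields, Comparator.comparing(Field::getName)); // getFields() order is unspecified
        for (Field f : fields) {
            if (Modifier.isStatic(f.getModifiers())) continue;
            try { result.add(f.get(o)); }
            catch (IllegalAccessException e) { throw new IllegalStateException(e); }
        }
        return result.isEmpty() ? null : result;
    }
}
```

For instance, matching new Pair(x, new Pair(x, y)) against new Pair(h, new Pair(h, "B")) binds x to h and y to "B", while a failed match leaves every variable unbound.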
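[Figure 3 (decision diagram): Is the pattern a variable? If so: is the variable bound? If not, bind the term to the variable; if so, get its value and compare it using equals(). If the pattern is not a variable: do both implement Matchable? If so, switch to the specific match. Otherwise: are the types the same? If not, no match. Are both composite? If neither is, compare using equals(). Are both composites with similar shape? If not, no match; if so, iterate over pairs of components and match them.]
Figure 3: Outline of the matching algorithm.

2.2 Class collaborations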
The generic matching algorithm just outlined can be encapsulated in a MatchEngine class. Figure 4 shows a structure diagram for such a class and its collaborators.

MatchEngine. The MatchEngine makes extensive use of reflection. It uses runtime type inspection (in Java: instanceof) to determine whether objects are of type Variable, or implement the Matchable or Composite interface. Reflection is also used to determine whether an object is an array and, if so, what its element type is. Also, the non-static public fields of an object are discovered via reflection, as is the existence of an iterator method.

Variable. As mentioned before, variables have no names. Initially, they are unbound, meaning that isBound returns false and getValue returns a null reference. When a variable is bound, through the setValue method, it stores a reference to the bound object. A binding can also be undone with the unBind method. The class ObjectVariable provides an implementation of the Variable interface. If, however, a pattern is constructed with variables in the place of more specific types than Object, other, type-specific implementations of the interface must be provided. Examples will be given below.

MatchFailure. Matching failure can be modeled in various ways. We have chosen to do so with a specific exception, called MatchFailure. An alternative would be to give all match methods a boolean return value, but this results in some code clutter, because return values must be explicitly combined in logical expressions and propagated upward in the call chain.

Bound. In general, matching failure will be discovered after some submatches have already succeeded and resulted in the binding of variables to objects. We prefer to bind either all variables (in case of success) or none (in case of failure). For this reason, the MatchEngine maintains a set of bound variables. Each time a variable is bound, it is registered in this set.
When match failure occurs, the MatchFailure exception calls the undo method, which takes care of unbinding all the registered variables.

Matchable. The generic matching engine switches to type-specific matching when it encounters objects that implement the Matchable interface. This interface consists of a match method. In the next section we provide details about the implementation obligations for this method.

Composite. The generic matching engine also checks whether objects implement the Composite interface. In this case, it does not delegate the responsibility for the entire matching task to these objects, but only for identifying their components. The Composite interface offers the methods getChildCount and getChildAt to access components by indexing.

Iterable. As an alternative to the Composite interface, an object may expose its composite structure via an Iterator, which can be obtained with the iterator method. In the diagram, an explicit Iterable interface with this method is shown. However, the generic matching engine checks for the presence of the method, not for the implementation of this explicit interface. In fact, such an explicit interface was not present in the Java collection classes until quite recently.¹

We have implemented the MatchEngine class and its collaborators in Java. The code is available from the authors' webpages.

[Figure 4 (structure diagram): MatchEngine, with methods match(Object,Object) and match(Object,Object,Bound), checks for instances of Matchable (match(Object,Bound)), Composite (getChildCount(), getChildAt(int)), Iterable (iterator()) and Variable (isBound(), unBind(), setValue(Object), getValue()); ObjectVariable (value : Object) implements Variable. MatchEngine maintains a Bound (variables : Set; register(Variable), undo()) and raises MatchFailure, a subclass of Exception, which undoes the Bound.]
Figure 4: Structure diagram for the generic matching engine. The match method of class MatchEngine implements the generic matching algorithm outlined in Figure 3.
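The all-or-nothing binding discipline of Bound and MatchFailure can be sketched as follows. The interface and class names follow Figure 4, but the method bodies, the LinkedHashSet registry and the hypothetical Bindings helper are our assumptions; in particular, undoing the bindings from within the MatchFailure constructor is one possible reading of "the MatchFailure exception calls the undo method".

```java
import java.util.LinkedHashSet;
import java.util.Set;

interface Variable {
    boolean isBound();
    void unBind();
    void setValue(Object value);
    Object getValue();
}

class ObjectVariable implements Variable {
    private Object value;                      // null means unbound
    public boolean isBound() { return value != null; }
    public void unBind() { value = null; }
    public void setValue(Object value) { this.value = value; }
    public Object getValue() { return value; }
}

// Registry of the variables bound during one top-level match.
class Bound {
    private final Set<Variable> variables = new LinkedHashSet<>();
    void register(Variable v) { variables.add(v); }
    void undo() {
        for (Variable v : variables) v.unBind();
        variables.clear();
    }
}

class MatchFailure extends Exception {
    MatchFailure(Object pattern, Object term, Bound bound) {
        super("cannot match " + pattern + " against " + term);
        bound.undo();                          // roll back every binding made so far
    }
}

final class Bindings {
    // The variable branch of the algorithm: bind if unbound, otherwise compare with equals.
    static void bindOrCheck(Variable v, Object term, Bound bound) throws MatchFailure {
        if (!v.isBound()) { v.setValue(term); bound.register(v); }
        else if (!v.getValue().equals(term)) throw new MatchFailure(v, term, bound);
    }

    // Simulates a sequence of submatches; returns false if any of them fails.
    static boolean bindAll(Variable[] vars, Object[] terms) {
        Bound bound = new Bound();
        try {
            for (int i = 0; i < vars.length; i++) bindOrCheck(vars[i], terms[i], bound);
            return true;
        } catch (MatchFailure e) {
            return false;                      // the constructor has already undone the bindings
        }
    }
}
```

Binding x, y, x to "a", "b", "c" fails at the third step, and afterwards neither x nor y is bound, even though the first two submatches had succeeded.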
2.3 Type-specific matching
The generic matching algorithm can be supplemented with type-specific code to (i) enable construction of patterns with variables at type-specific places, and (ii) hook type-specific matching behavior into the generic engine. We discuss both possibilities. Some illustrative example code is provided in Figure 5.

Matching behavior. As explained above, type-specific behavior can be hooked into the generic matching algorithm in several ways:
1. Implement the Composite interface. This exposes the structure of the object to the MatchEngine as an indexable list of components.
2. Implement an iterator method. Most Java collection classes implement such a method. It exposes the structure of the object via an Iterator object, which allows iterating over the components with the hasNext and next methods.
3. Implement the Matchable interface. This means that a type-specific matching engine is implemented in the object itself.

¹ The Iterable interface is new in the Java 2 platform version 1.5.0, which is in beta release at the time of writing.
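Options 1 and 2 differ only in how an object hands its components to the engine. The sketch below is illustrative: the shape of the Composite interface is inferred from the method names getChildCount and getChildAt used in the text; Fork, Pair and Components are hypothetical classes; and, unlike the engine described above, which looks for an iterator method reflectively, the helper simply tests for java.lang.Iterable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical shape of the Composite interface (method names from the text).
interface Composite {
    int getChildCount();
    Object getChildAt(int index);
}

// Option 1: expose components by index. The fields stay private, so the
// default field-based view would not see them.
class Fork implements Composite {
    private final Object left, right;
    Fork(Object left, Object right) { this.left = left; this.right = right; }
    public int getChildCount() { return 2; }
    public Object getChildAt(int index) {
        if (index == 0) return left;
        if (index == 1) return right;
        throw new IndexOutOfBoundsException(index);
    }
}

// Option 2: expose components through an iterator method.
class Pair implements Iterable<Object> {
    private final Object first, second;
    Pair(Object first, Object second) { this.first = first; this.second = second; }
    public Iterator<Object> iterator() { return Arrays.asList(first, second).iterator(); }
}

final class Components {
    // Collects the components an engine could pair up; empty if none are exposed.
    static List<Object> of(Object o) {
        List<Object> result = new ArrayList<>();
        if (o instanceof Composite c) {
            for (int i = 0; i < c.getChildCount(); i++) result.add(c.getChildAt(i));
        } else if (o instanceof Iterable<?> it) {
            for (Object x : it) result.add(x);
        }
        return result;
    }
}
```

Either way the engine obtains the same list of components without touching private state, which is why these two options require less programming effort than a full Matchable implementation.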
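```java
// Matchable, Bound, MatchFailure and Variable are the collaborators of Figure 4.
abstract class Tree implements Matchable {
}

class Leaf extends Tree {
    private int value;

    Leaf(int value) { this.value = value; }

    public void match(Object term, Bound bound) throws MatchFailure {
        // Succeeds only for a Leaf with the same value.
        if (!(term instanceof Leaf) || ((Leaf) term).value != value) {
            throw new MatchFailure(this, term, bound);
        }
    }
}

class Fork extends Tree {
    private Tree left;
    private Tree right;

    Fork(Tree left, Tree right) { this.left = left; this.right = right; }

    public void match(Object term, Bound bound) throws MatchFailure {
        if (term instanceof Fork) {
            Fork fork = (Fork) term;
            left.match(fork.left, bound);    // private fields are accessible here,
            right.match(fork.right, bound);  // but not to the generic engine
        } else {
            throw new MatchFailure(this, term, bound);
        }
    }
}

class TreeVariable extends Tree implements Variable {
    private Tree value;

    public boolean isBound() { return value != null; }
    public void unBind() { value = null; }
    public void setValue(Object value) { this.value = (Tree) value; }
    public Object getValue() { return value; }

    public void match(Object term, Bound bound) throws MatchFailure {
        if (!(term instanceof Tree)) {
            throw new MatchFailure(this, term, bound);   // wrong type for this location
        }
        if (value != null) {
            if (!value.equals(term)) {                   // already bound: check equality
                throw new MatchFailure(this, term, bound);
            }
        } else {
            value = (Tree) term;                         // unbound: bind and register
            bound.register(this);
        }
    }
}
```
Figure 5: Implementation of the Matchable and Variable interfaces for a simple set of Tree classes.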
A type-specific matching engine of this kind does not need to make extensive use of reflection, because it can already make certain assumptions about the pattern. Also, it can observe the internal state of the object, while the external generic matching engine can only make use of observable state. Clearly, the third alternative involves more programming effort than the first two.

Variables. Apart from type-specific match behavior, one may also need to provide type-specific variables as an alternative to the generic ObjectVariable. As mentioned above, to construct a pattern with variables in the place of more specific types than Object, type-specific implementations of the Variable interface need to be provided.

Example. In Figure 5, some example code is provided to illustrate type-specific implementations of the Variable and Matchable interfaces. There are four classes involved in this example: an abstract class Tree, and three concrete subclasses Leaf, Fork, and TreeVariable. In each of the concrete classes, a type-specific match method is implemented. In the case of TreeVariable, this method implements the part of the decision diagram of Figure 3 starting from Variable is bound?. Additionally, the method tests whether the object to be matched is of type Tree, i.e. whether its type matches the type of the location in the pattern to which it will be bound. In the case of Leaf and Fork, the match method implements the decision diagram starting at Are the types the same?. Note that these methods access private attributes of the respective classes, something the generic matching algorithm cannot do.

Generation. Type-specific matching functionality can be provided by manual implementation, but it can also be provided through code generation. In fact, we implemented a generator that supports generation of match methods, using the OpenJava precompilation system [8].

3. PERFORMANCE
Our prime objective is to provide programmers with convenient expressiveness, allowing them to write shorter and more comprehensible programs. But not at all costs, of course. The extensive use of reflection implies a potentially significant performance hit. In this section we analyze performance issues. To this end we conducted several experiments. To assess the effectiveness of the various optimizations built into the generic matching engine, we compared the run times for tree structures of different kinds:
• vanilla trees: forks and leaves where the subtrees and integer values are non-static public fields. Matching uses reflection only.
• composite trees: forks and leaves that implement the Composite interface. Matching makes use of the getChildAt method, in addition to reflection.
• matchable trees: forks and leaves that implement the Matchable interface (see Figure 5). Generic matching switches immediately to type-specific matching.
• nested lists: lists with exactly two elements, where each element is again a list or an integer object. Matching makes use of the iterator method, in addition to reflection.

In each case, the pattern was a balanced binary tree without variables. It was matched against a distinct, but isomorphic, tree; thus, the matches were all successful. Because of the absence of variables, the matching problem degenerates to an equivalence test, making it possible to use the equals method as a baseline against which to compare the matching functionality.

Figure 6 shows the running times in milliseconds of the matching for patterns of various sizes. Each plotted point is the average runtime over 1,000,000 matches. The pattern size is the number of nodes in the balanced tree. The range of sizes (between 1 and 25) is reasonable, because patterns constructed by a programmer will generally fall within these bounds. The object against which the pattern is matched will in general be (much) larger, but since the matching follows the structure of the pattern, this is not relevant for performance.
We collected six data sets:

| data set | tree type | called method |
|---|---|---|
| equals | vanilla | equals |
| Matchable∗ | matchable | Matchable.match |
| Matchable | matchable | MatchEngine.match |
| Composite | composite | MatchEngine.match |
| nested lists | nested lists | MatchEngine.match |
| vanilla | vanilla | MatchEngine.match |
The difference between Matchable and Matchable∗ is that in the first case the generic matching engine is called, which switches to the type-specific matching behavior, while in the second case the type-specific matching behavior is invoked directly. These data sets are plotted in three charts in Figure 6. The upper chart contains all six data sets, the middle one shows only the first four, and the bottom one shows only the first three. The selective charts allow a closer inspection of the data sets with smaller runtime values (see caption). Linear trend lines have been fitted through the data, and the corresponding formulas are shown in the charts. For instance, the trend line for vanilla trees shows that matching has a constant cost of 0.0411 milliseconds and a variable cost of 0.246 milliseconds per pattern node.

Interpretation. Several conclusions can be drawn from our performance measurements.
• All flavors of matching exhibit linear time behavior with respect to the pattern size.
• As expected, the purely reflective matching (cf. vanilla trees) is by far the slowest.
• Implementation of the Composite interface pays off, because it reduces both the constant and the variable runtime costs by approximately 98%.
• The runtimes of matching the nested lists (using the iterator method) do not warrant many conclusions relative to the tree experiments, but they at least suggest that the performance remains within reasonable bounds.
• Implementation of the Matchable interface has an even greater impact: runtime is reduced by approximately 99.97% with respect to vanilla trees, and by 97.90% with respect to composite trees.
• In fact, if we take a detailed look at the runtimes of matchable trees switching from generic to specific behavior, of matchable trees calling specific behavior directly, and of the equals method, we see that type-specific matching is almost as fast as the equals method. The variable costs for both flavors of type-specific matching are only a fraction higher than for the equals method. A constant cost of $6.6 \times 10^{-4}$ milliseconds separates the type-specific matching algorithm itself from the equals method, and another $1.4 \times 10^{-3}$ milliseconds is incurred by the switch via the generic matching engine.

The overall conclusion is that the generic matching algorithm is costly. Depending on the demands of a particular application, one must decide whether it is worth investing in the implementation, or generation, of type-specific matching functionality.
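Each trend line has the form runtime = constant + slope × size, so it can be read as a cost model. For vanilla trees, a 25-node pattern is predicted to take 0.0411 + 0.246 × 25 ≈ 6.19 ms. The sketch below shows one way such a line can be obtained: an ordinary least-squares fit. That the charts were fitted with ordinary least squares is our assumption (the text only says that linear trend lines were fitted), and TrendLine is a hypothetical helper.

```java
// Ordinary least-squares fit of y = intercept + slope * x.
final class TrendLine {
    final double intercept;   // constant cost, e.g. milliseconds per match
    final double slope;       // variable cost, e.g. milliseconds per pattern node

    private TrendLine(double intercept, double slope) {
        this.intercept = intercept;
        this.slope = slope;
    }

    static TrendLine fit(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need at least two (x, y) pairs");
        }
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) { meanX += x[i]; meanY += y[i]; }
        meanX /= n;
        meanY /= n;
        double sxy = 0, sxx = 0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        if (sxx == 0) throw new IllegalArgumentException("x values must not all be equal");
        double slope = sxy / sxx;
        return new TrendLine(meanY - slope * meanX, slope);
    }

    // Predicted runtime for a pattern with the given number of nodes.
    double predict(double patternSize) { return intercept + slope * patternSize; }
}
```

Feeding the fit synthetic points generated from the vanilla-tree formula for sizes 1 to 25 recovers the intercept 0.0411 and slope 0.246. These points are not the measurements behind Figure 6.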
4. MATCH AND VISIT
An interesting characteristic of our approach to object matching is that patterns are first-class entities. Like any other object, they can be passed as arguments, targeted by references, and returned as method results. This opens possibilities for doing more with patterns than is customary in declarative languages. In this section, in particular, we investigate some possibilities of combining generic pattern matching with generic visitor combinators. Generic visitor combinators are small, reusable classes that capture basic functionality, and can be combined in different constellations to construct more complex behavior [11]. We will generalize our pattern-matching approach by combining patterns with visitors in two complementary ways:
• patterns in visitors: What if we want to match a pattern not at the root of an object graph, but at a deeper node? If we capture the pattern inside a visitor combinator, it can be combined with a visitor combinator that performs traversal, to realize pattern matching at arbitrary depths.
• visitors in patterns: What if we want to apply a visitor to a subgraph that matches a variable in our pattern? We can eliminate the intermediate steps of binding to the variable and retrieving its value to be visited, simply by integrating the visitor into the pattern at the place where the variable would normally occur. Since visitors encapsulate behavior, this makes our patterns active.

Before explaining these two generalizations in detail, we provide a brief exposition of the concept of visitor combinators. For a more elaborate account, the reader is referred elsewhere [11, 4, 1, 12].

Figure 6: Performance measurements of various flavors of matching. The horizontal axis shows the pattern size in number of nodes. The vertical axis shows the run time in milliseconds. Linear trend lines have been fitted through the scatter plots, and the corresponding formulas are shown. The upper chart shows all measurements, but only those for vanilla trees and nested lists are distinguishable, because they dwarf the runtimes of the others. The middle chart shows all measurements except those for vanilla trees and nested lists, making the runtimes for composite trees distinguishable, but not the others. The bottom chart shows only the runtimes for the matchable trees (called directly, as indicated with ∗, or via the matching engine) and for the equals method.
4.1 Generic visitor combinators
Visitor combinator programming was introduced in [11] and is supported by JJTraveler: a combination of a framework and a library that provides generic visitor combinators for Java. Figure 7 shows the architecture of JJTraveler (upper half) and its relationship with an application that uses it (lower half). JJTraveler consists of a framework and a library. The application consists of a class hierarchy, an instantiation of JJTraveler's framework for this hierarchy, and the operations on the hierarchy implemented as visitors.

The JJTraveler framework offers two generic interfaces, Visitor and Visitable. The latter provides the minimal interface for nodes that can be visited. In fact, it is an extension of our Composite interface with a setChildAt method. The Visitor interface provides a single visit method that takes any visitable node as argument. Each visit can succeed or fail, which can be used to control traversal behavior. Failure is indicated by a VisitFailure exception.

The library of JJTraveler consists of a number of predefined visitor combinators. These rely only on the generic Visitor and Visitable interfaces, not on any specific underlying class hierarchy. An overview of the library combinators is shown in Figure 8.

[Figure 7 (architecture diagram). Upper half, JJTraveler: the Framework, with interfaces Visitor (visit(Visitable)) and Visitable (getChildCount, getChildAt, setChildAt), and the Library of generic combinators. Lower half, the application: a Hierarchy of classes A and B implementing HVisitable (accept(HVisitor)); an Instantiation consisting of HVisitor (visitA(A), visitB(B)) and the Fwd combinator, whose visitA(A a) calls fwd.visit(a) and whose visit(v) calls v.accept(this); and Operations implemented as visitors.]
Figure 7: The architecture of JJTraveler. Rounded boxes indicate interfaces; square boxes are classes. Inheritance is indicated by lines with triangular connectors. These are dashed if the inheritance relation is implementation of an interface rather than specialization of a class. Dashed boxes indicate implementation notes.

| Name | Args | Description |
|---|---|---|
| Identity | | Do nothing |
| Fail | | Raise VisitFailure exception |
| Not | v | Fail if v succeeds, and vice versa |
| Sequence | v1, v2 | Do v1, then v2 |
| Choice | v1, v2 | Do v1; if it fails, do v2 |
| All | v | Apply v sequentially to all immediate children until it fails |
| One | v | Apply v sequentially to all immediate children until it succeeds |
| IfThenElse | c, t, f | If c succeeds, do t, otherwise do f |
| Try | v | Choice(v, Identity) |
| TopDown | v | Sequence(v, All(TopDown(v))) |
| BottomUp | v | Sequence(All(BottomUp(v)), v) |
| OnceTopDown | v | Choice(v, One(OnceTopDown(v))) |
| OnceBottomUp | v | Choice(One(OnceBottomUp(v)), v) |
| AllTopDown | v | Choice(v, All(AllTopDown(v))) |
| AllBottomUp | v | Choice(All(AllBottomUp(v)), v) |

Figure 8: JJTraveler's library (excerpt).

To use JJTraveler, one needs to instantiate the framework for the class hierarchy of a particular application. This first of all requires specializing the visitor and visitable interfaces to hierarchy-specific ones, called HVisitor and HVisitable in Figure 7. In particular, the
HVisitor interface contains distinct visit methods for each class in the hierarchy. Secondly, a default implementation of the extended visitor interface is provided in the form of a visitor combinator Fwd. This combinator forwards every specific visit call to a generic default visitor given to it at construction time. Concrete visitors are built by providing Fwd with the proper default visitor (typically Identity, if nothing needs to be done for most nodes) and overriding some of the specific Fwd methods to obtain the required behavior for selected node types. Finally, the class hierarchy must be made visitable. To turn a class into a visitable class, it must implement the hierarchy-specific HVisitable interface. In addition to the generic visitable methods, this interface provides an accept method, which calls the appropriate visit method in the hierarchy-specific HVisitor. The accept method realizes the so-called double-dispatch functionality of the Visitor pattern: it selects a visit method to be executed based on both the visitor object and the object being visited. Though instantiation of JJTraveler's framework can be done manually, automated support is provided by a generator, called JJForester [4]. This generator takes a grammar as input. From this grammar, it generates a class hierarchy to represent the parse trees corresponding to the grammar, the hierarchy-specific HVisitor and HVisitable interfaces, and the Fwd combinator. After instantiation, the application programmer can implement operations on the class hierarchy by specializing, composing, and applying visitors.

Figure 8 shows high-level descriptions for an excerpt of JJTraveler's library of generic visitor combinators. A full overview of the library can be found in the online documentation of JJTraveler. Two sets of combinators can be distinguished: basic combinators, and defined combinators, which can be described in terms of the basic ones as indicated in the overview.
Note that some of these definitions are recursive. Basic combinators provide the primitive building blocks for visitor combination. They include the nullary combinators Identity and Fail, as well as binary combinators such as Sequence and Choice. An example of a recursively defined visitor is TopDown(v), which in Figure 8 is defined as

TopDown(v) = Sequence(v, All(TopDown(v)))

Thus, TopDown first applies v to the current node and then recursively applies the top-down strategy to each of the children of the current node, yielding a depth-first traversal of the visited tree. Visitor combinators can be used to build recursive visitors with all sorts of sophisticated traversal behavior.
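To make the definitions of Figure 8 concrete, the sketch below re-implements a handful of combinators. These classes are hypothetical stand-ins written for illustration, not JJTraveler's source. The interface shapes follow the description above: a visit method that returns the visited node and signals failure with VisitFailure, and a Visitable with getChildCount, getChildAt and setChildAt. The Node class and all method bodies are our assumptions. Recursive definitions such as TopDown(v) = Sequence(v, All(TopDown(v))) are tied by passing this as the recursive occurrence.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class VisitFailure extends Exception {
    VisitFailure(String message) { super(message); }
}

interface Visitable {
    int getChildCount();
    Visitable getChildAt(int i);
    void setChildAt(int i, Visitable child);
}

interface Visitor {
    Visitable visit(Visitable x) throws VisitFailure;
}

// A minimal labeled tree node, used only for demonstration.
class Node implements Visitable {
    final String label;
    private final List<Visitable> kids = new ArrayList<>();
    Node(String label, Visitable... kids) { this.label = label; Collections.addAll(this.kids, kids); }
    public int getChildCount() { return kids.size(); }
    public Visitable getChildAt(int i) { return kids.get(i); }
    public void setChildAt(int i, Visitable child) { kids.set(i, child); }
}

class Identity implements Visitor {          // do nothing
    public Visitable visit(Visitable x) { return x; }
}

class Fail implements Visitor {              // always fail
    public Visitable visit(Visitable x) throws VisitFailure { throw new VisitFailure("Fail"); }
}

class Sequence implements Visitor {          // do v1, then v2
    private final Visitor v1, v2;
    Sequence(Visitor v1, Visitor v2) { this.v1 = v1; this.v2 = v2; }
    public Visitable visit(Visitable x) throws VisitFailure { return v2.visit(v1.visit(x)); }
}

class Choice implements Visitor {            // do v1; if it fails, do v2
    private final Visitor v1, v2;
    Choice(Visitor v1, Visitor v2) { this.v1 = v1; this.v2 = v2; }
    public Visitable visit(Visitable x) throws VisitFailure {
        try { return v1.visit(x); } catch (VisitFailure f) { return v2.visit(x); }
    }
}

class All implements Visitor {               // apply v to every child; fail on the first failure
    private final Visitor v;
    All(Visitor v) { this.v = v; }
    public Visitable visit(Visitable x) throws VisitFailure {
        for (int i = 0; i < x.getChildCount(); i++) x.setChildAt(i, v.visit(x.getChildAt(i)));
        return x;
    }
}

class One implements Visitor {               // apply v to the children until one succeeds
    private final Visitor v;
    One(Visitor v) { this.v = v; }
    public Visitable visit(Visitable x) throws VisitFailure {
        for (int i = 0; i < x.getChildCount(); i++) {
            try { x.setChildAt(i, v.visit(x.getChildAt(i))); return x; }
            catch (VisitFailure f) { /* try the next child */ }
        }
        throw new VisitFailure("One: no child succeeded");
    }
}

class TopDown implements Visitor {           // TopDown(v) = Sequence(v, All(TopDown(v)))
    private final Visitor v;
    TopDown(Visitor v) { this.v = v; }
    public Visitable visit(Visitable x) throws VisitFailure {
        return new Sequence(v, new All(this)).visit(x);
    }
}

class OnceTopDown implements Visitor {       // OnceTopDown(v) = Choice(v, One(OnceTopDown(v)))
    private final Visitor v;
    OnceTopDown(Visitor v) { this.v = v; }
    public Visitable visit(Visitable x) throws VisitFailure {
        return new Choice(v, new One(this)).visit(x);
    }
}
```

Applied to the tree a(b, c(d)), TopDown with a label-collecting visitor visits a, b, c, d in that order. OnceTopDown stops at the first node on which its argument succeeds, and fails if there is none.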
4.2 Patterns inside visitors
Figure 9: Mixing patterns and visitors. (Class diagram: Match, holding pattern : ..., implements Visitor with visit(Visitable); Visit, holding visitor : Visitor, implements Matchable with match(Object,Bound).)

class Visit implements Matchable {
  private Visitor visitor;
  public void match(Object term, Bound bound) throws MatchFailure {
    // Only visitable objects can be visited.
    if (term instanceof Visitable) {
      try {
        visitor.visit((Visitable) term);
      } catch (VisitFailure e) {
        // Report a failing visit as a failing match.
        throw new MatchFailure(this, term, bound);
      }
    } else {
      throw new MatchFailure(this, term, bound);
    }
  }
}

Figure 11: The pattern combinator Visit implements the match method by visiting the object against which it is matched with the visitor to which it holds a reference. If a visit failure occurs, it catches it and throws a match failure instead. Before the visitor is applied, a run-time type test determines whether the object to be matched is actually visitable.

Figure 9 shows a structure diagram that illustrates the combination of patterns and visitors. The class Match implements the Visitor interface and holds a reference to a pattern. This pattern may either be an Object, if we intend to use the generic MatchEngine to match it, or an object that implements the Matchable interface, if we intend to use a type-specific matching algorithm. Figure 10 shows the corresponding Java code. Thus, Match implements the visit method by invoking the generic or the type-specific matching functionality. If a match failure occurs, it catches this exception and throws a visit failure instead. An example of using the Match combinator is to find the innermost fork with equal left and right branches:

TreeVariable x = new TreeVariable();
Visitor v = new OnceBottomUp(new Match(new Fork(x,x)));
v.visit(tree);
class Match implements Visitor {
  private Object pattern;
  public Visitable visit(Visitable term) throws VisitFailure {
    try {
      // Generic variant: delegate to the reflective matching engine.
      MatchEngine.match(pattern, term, new Bound());
      return term;
    } catch (MatchFailure e) {
      throw new VisitFailure(e.getMessage());
    }
  }
}

class Match extends Tree implements Visitor {
  private Tree pattern;
  public Visitable visit(Visitable term) throws VisitFailure {
    try {
      // Type-specific variant: the pattern performs its own matching.
      pattern.match(term, new Bound());
      return term;
    } catch (MatchFailure e) {
      throw new VisitFailure(e.getMessage());
    }
  }
}

Figure 10: The visitor combinator Match implements the visit method by matching the pattern to which it holds a reference against the visitable that it visits. A generic and a type-specific implementation variant are shown.
If the object graph rooted at tree contains, at any depth, a fork with equal branches, that common branch will be bound to variable x. Otherwise, the visit fails with a VisitFailure, into which Match has converted the underlying match failures.
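The Match example can be reproduced in a self-contained sketch. To keep it short, the sketch replaces the reflective MatchEngine with type-specific match methods (in the style of the second variant of Figure 10), works on a hypothetical Tree/Leaf/Fork hierarchy, and simplifies visitors to Tree-to-Tree functions. It also assumes, purely for illustration, that a TreeVariable keeps its binding after a successful match and that Bound records which variables were bound in the current attempt, so that Match can undo partial bindings on failure. OnceBottomUp is defined here as Choice(One(OnceBottomUp(v)), v).

```java
import java.util.ArrayList;
import java.util.List;

class VisitFailure extends Exception { VisitFailure(String m) { super(m); } }
class MatchFailure extends Exception { MatchFailure(String m) { super(m); } }

// Hypothetical: records which variables were bound during one matching attempt.
class Bound { final List<TreeVariable> vars = new ArrayList<>(); }

abstract class Tree {
    // Type-specific matching: 'this' is the pattern, 'term' is the subject.
    abstract void match(Tree term, Bound bound) throws MatchFailure;
    int childCount() { return 0; }
    Tree childAt(int i) { throw new IndexOutOfBoundsException(); }
    Tree withChild(int i, Tree c) { throw new IndexOutOfBoundsException(); }
}

class Leaf extends Tree {
    final int value;
    Leaf(int value) { this.value = value; }
    void match(Tree term, Bound b) throws MatchFailure {
        if (!equals(term)) throw new MatchFailure("leaf mismatch");
    }
    @Override public boolean equals(Object o) { return o instanceof Leaf && ((Leaf) o).value == value; }
    @Override public int hashCode() { return value; }
}

class Fork extends Tree {
    final Tree left, right;
    Fork(Tree left, Tree right) { this.left = left; this.right = right; }
    void match(Tree term, Bound b) throws MatchFailure {
        if (!(term instanceof Fork)) throw new MatchFailure("not a fork");
        left.match(((Fork) term).left, b);
        right.match(((Fork) term).right, b);
    }
    @Override int childCount() { return 2; }
    @Override Tree childAt(int i) { return i == 0 ? left : right; }
    @Override Tree withChild(int i, Tree c) { return i == 0 ? new Fork(c, right) : new Fork(left, c); }
    @Override public boolean equals(Object o) {
        return o instanceof Fork && left.equals(((Fork) o).left) && right.equals(((Fork) o).right);
    }
    @Override public int hashCode() { return 31 * left.hashCode() + right.hashCode(); }
}

// Hypothetical pattern variable; its binding stays readable after a successful match.
class TreeVariable extends Tree {
    Tree value;
    void match(Tree term, Bound b) throws MatchFailure {
        if (b.vars.contains(this)) {              // repeated occurrence: must agree
            if (!value.equals(term)) throw new MatchFailure("inconsistent binding");
        } else {                                  // first occurrence: bind
            value = term;
            b.vars.add(this);
        }
    }
}

interface Visitor { Tree visit(Tree t) throws VisitFailure; }

// Match turns a match failure into a visit failure.
class Match implements Visitor {
    private final Tree pattern;
    Match(Tree pattern) { this.pattern = pattern; }
    public Tree visit(Tree term) throws VisitFailure {
        Bound bound = new Bound();
        try {
            pattern.match(term, bound);
            return term;
        } catch (MatchFailure e) {
            for (TreeVariable v : bound.vars) v.value = null;  // undo partial bindings
            throw new VisitFailure(e.getMessage());
        }
    }
}

// One(v): apply v to the leftmost child on which it succeeds.
class One implements Visitor {
    private final Visitor v;
    One(Visitor v) { this.v = v; }
    public Tree visit(Tree t) throws VisitFailure {
        for (int i = 0; i < t.childCount(); i++) {
            try { return t.withChild(i, v.visit(t.childAt(i))); } catch (VisitFailure f) { /* next child */ }
        }
        throw new VisitFailure("One: no child succeeded");
    }
}

// OnceBottomUp(v) = Choice(One(OnceBottomUp(v)), v): search below first, then here.
class OnceBottomUp implements Visitor {
    private final Visitor v;
    OnceBottomUp(Visitor v) { this.v = v; }
    public Tree visit(Tree t) throws VisitFailure {
        try { return new One(this).visit(t); } catch (VisitFailure f) { return v.visit(t); }
    }
}
```

On Fork(Leaf(3), Fork(Leaf(7), Leaf(7))), the visitor OnceBottomUp(Match(Fork(x, x))) succeeds and leaves x bound to Leaf(7); on Fork(Leaf(1), Leaf(2)) it throws a VisitFailure.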
4.3 Visitors inside patterns
The class Visit in Figure 9 implements the Matchable interface and holds a reference to a visitor. Figure 11 shows the corresponding Java code. Thus, Visit implements the match method by visiting the object against which it is matched with the visitor to which it holds a reference. If a visit failure occurs, it catches it and throws a match failure instead. Before the visitor is applied, a run-time type test determines whether the object to be matched is actually visitable. An example of using the Visit combinator is to match a fork that contains at least one leaf with value 42 in its right branch:

TreeVariable x = new TreeVariable();
Visitor v = new OnceTopDown(new Match(new Leaf(42)));
Object pattern = new Fork(x, new Visit(v));
MatchEngine.match(pattern,tree,new Bound());
Note that the visitor fed to Visit was itself constructed using the visitor combinator Match. As we will explain in the related work section, this style of mixing traversal strategies with first-class matching patterns was inspired by the capabilities of the strategic term rewriting language Stratego.
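A visitor embedded in a pattern can also be sketched in self-contained form. The code below reproduces the example pattern Fork(x, Visit(OnceTopDown(Match(Leaf(42))))) on a hypothetical Tree hierarchy. The pattern's own match method stands in for MatchEngine.match, every term is a Tree (so the visitability test of Figure 11 is unnecessary), the variable here handles only a single occurrence, and OnceTopDown is defined for this sketch as Choice(v, One(OnceTopDown(v))).

```java
class VisitFailure extends Exception { VisitFailure(String m) { super(m); } }
class MatchFailure extends Exception { MatchFailure(String m) { super(m); } }
class Bound { }  // bookkeeping placeholder, unused in this sketch

interface Visitor { Tree visit(Tree t) throws VisitFailure; }

abstract class Tree {
    // 'this' is the pattern, 'term' is the subject being matched.
    abstract void match(Tree term, Bound bound) throws MatchFailure;
    int childCount() { return 0; }
    Tree childAt(int i) { throw new IndexOutOfBoundsException(); }
    Tree withChild(int i, Tree c) { throw new IndexOutOfBoundsException(); }
}

class Leaf extends Tree {
    final int value;
    Leaf(int value) { this.value = value; }
    void match(Tree term, Bound b) throws MatchFailure {
        if (!(term instanceof Leaf) || ((Leaf) term).value != value) throw new MatchFailure("leaf mismatch");
    }
}

class Fork extends Tree {
    final Tree left, right;
    Fork(Tree left, Tree right) { this.left = left; this.right = right; }
    void match(Tree term, Bound b) throws MatchFailure {
        if (!(term instanceof Fork)) throw new MatchFailure("not a fork");
        left.match(((Fork) term).left, b);
        right.match(((Fork) term).right, b);
    }
    @Override int childCount() { return 2; }
    @Override Tree childAt(int i) { return i == 0 ? left : right; }
    @Override Tree withChild(int i, Tree c) { return i == 0 ? new Fork(c, right) : new Fork(left, c); }
}

// Single-occurrence variable: binds whatever it meets (no consistency check).
class TreeVariable extends Tree {
    Tree value;
    void match(Tree term, Bound b) { value = term; }
}

// Visit: a pattern that matches iff its visitor succeeds on the subject.
class Visit extends Tree {
    private final Visitor visitor;
    Visit(Visitor visitor) { this.visitor = visitor; }
    void match(Tree term, Bound b) throws MatchFailure {
        try { visitor.visit(term); }
        catch (VisitFailure e) { throw new MatchFailure(e.getMessage()); }
    }
}

// Match: a visitor that succeeds iff its pattern matches the visited node.
class Match implements Visitor {
    private final Tree pattern;
    Match(Tree pattern) { this.pattern = pattern; }
    public Tree visit(Tree term) throws VisitFailure {
        try { pattern.match(term, new Bound()); return term; }
        catch (MatchFailure e) { throw new VisitFailure(e.getMessage()); }
    }
}

// One(v): apply v to the leftmost child on which it succeeds.
class One implements Visitor {
    private final Visitor v;
    One(Visitor v) { this.v = v; }
    public Tree visit(Tree t) throws VisitFailure {
        for (int i = 0; i < t.childCount(); i++) {
            try { return t.withChild(i, v.visit(t.childAt(i))); } catch (VisitFailure f) { /* next child */ }
        }
        throw new VisitFailure("One: no child succeeded");
    }
}

// OnceTopDown(v) = Choice(v, One(OnceTopDown(v))): try here first, then below.
class OnceTopDown implements Visitor {
    private final Visitor v;
    OnceTopDown(Visitor v) { this.v = v; }
    public Tree visit(Tree t) throws VisitFailure {
        try { return v.visit(t); } catch (VisitFailure f) { return new One(this).visit(t); }
    }
}
```

The pattern matches Fork(Leaf(1), Fork(Leaf(7), Leaf(42))), binding x to Leaf(1), but fails on a fork whose right branch contains no leaf with value 42.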
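class IfNotRewrite implements Visitor {
  public Visitable visit(Visitable term) throws VisitFailure {
    ExprVar b = new ExprVar();
    StatVar t = new StatVar();
    StatVar e = new StatVar();
    // Left-hand side: if not(b) then t else e
    Object pat = new If(new Not(b), t, e);
    try {
      MatchEngine.match(pat, term, new Bound());
    } catch (MatchFailure f) {
      throw new VisitFailure(f.getMessage());
    }
    // Right-hand side: if b then e else t (branches swapped)
    return new If(b, e, t);
  }
}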
Figure 12: The visitor combinator IfNotRewrite implements a rewriting rule for if statements with negative conditions.
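The rewrite rule if not(x) then t else e → if x then e else t, applied under an innermost strategy, can be sketched without the paper's generated classes. This version rests on several assumptions: programs are represented by a hypothetical uniform Term class (a label plus children), a small hypothetical Matcher replaces MatchEngine, and Innermost is implemented as a simple bottom-up normalizer (normalize the children, then apply the rule at the root as long as it succeeds). JJTraveler's own Innermost combinator may be defined differently.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class VisitFailure extends Exception { VisitFailure(String m) { super(m); } }
class MatchFailure extends Exception { MatchFailure(String m) { super(m); } }

// Hypothetical uniform term representation: a constructor label plus children.
class Term {
    final String label;
    final Term[] kids;
    Term(String label, Term... kids) { this.label = label; this.kids = kids; }
    @Override public boolean equals(Object o) {
        return o instanceof Term && label.equals(((Term) o).label) && Arrays.equals(kids, ((Term) o).kids);
    }
    @Override public int hashCode() { return 31 * label.hashCode() + Arrays.hashCode(kids); }
    @Override public String toString() { return kids.length == 0 ? label : label + Arrays.toString(kids); }
}

// Hypothetical pattern variable, distinguished from ordinary terms by its class.
class Var extends Term {
    Var(String name) { super("?" + name); }
}

class Matcher {
    // Match a pattern against a term, recording variable bindings in env.
    static void match(Term pat, Term term, Map<Var, Term> env) throws MatchFailure {
        if (pat instanceof Var) {
            Term old = env.putIfAbsent((Var) pat, term);
            if (old != null && !old.equals(term)) throw new MatchFailure("inconsistent " + pat);
            return;
        }
        if (!pat.label.equals(term.label) || pat.kids.length != term.kids.length)
            throw new MatchFailure(pat + " vs " + term);
        for (int i = 0; i < pat.kids.length; i++) match(pat.kids[i], term.kids[i], env);
    }
    // Instantiate a pattern by replacing each variable with its binding.
    static Term build(Term pat, Map<Var, Term> env) {
        if (pat instanceof Var) return env.get(pat);
        Term[] kids = new Term[pat.kids.length];
        for (int i = 0; i < kids.length; i++) kids[i] = build(pat.kids[i], env);
        return new Term(pat.label, kids);
    }
}

interface Visitor { Term visit(Term t) throws VisitFailure; }

// if not(b) then t else e  ->  if b then e else t
class IfNotRewrite implements Visitor {
    private static final Var B = new Var("b"), T = new Var("t"), E = new Var("e");
    private static final Term LHS = new Term("If", new Term("Not", B), T, E);
    private static final Term RHS = new Term("If", B, E, T);  // branches swapped
    public Term visit(Term term) throws VisitFailure {
        Map<Var, Term> env = new HashMap<>();
        try { Matcher.match(LHS, term, env); }
        catch (MatchFailure f) { throw new VisitFailure(f.getMessage()); }
        return Matcher.build(RHS, env);
    }
}

// Innermost(v): normalize all children first, then apply v at the root;
// whenever v succeeds, normalize its result again. Stops when v fails.
class Innermost implements Visitor {
    private final Visitor v;
    Innermost(Visitor v) { this.v = v; }
    public Term visit(Term t) throws VisitFailure {
        Term[] kids = new Term[t.kids.length];
        for (int i = 0; i < kids.length; i++) kids[i] = visit(t.kids[i]);
        Term reduced = new Term(t.label, kids);
        Term next;
        try { next = v.visit(reduced); } catch (VisitFailure f) { return reduced; }
        return visit(next);
    }
}
```

With this sketch, a doubly negated condition is rewritten twice: If(Not(Not(x)), A, B) first becomes If(Not(x), B, A) and then normalizes to If(x, A, B).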
4.4 Rewriting
Matching and visitors can also be combined to implement rewriting. Consider the following rewrite rule:

if not(x) then t else e → if x then e else t

This law of programming [2] states that an if statement with a negative condition can be replaced by an if statement with the corresponding positive condition and swapped branches. Figure 12 shows how this rewrite rule can be implemented as a visitor that employs our matching engine. This visitor builds the left-hand side of the rewrite rule as a pattern object, matches it against its argument, and returns the right-hand side of the rule if the match is successful. In case of a match failure, the visitor throws a visit failure. To apply this rewrite rule to an entire program, we can feed the IfNotRewrite visitor to a traversal visitor combinator:

Visitor v = new Innermost(new IfNotRewrite());
v.visit(program);

In this case, we selected the Innermost combinator, which captures the leftmost-innermost rewriting strategy. This combinator is part of the JJTraveler library.

5. CONCLUDING REMARKS

5.1 Contributions

The transposition of the concept of pattern matching from its declarative setting to an object-oriented one is fairly straightforward, and we do not consider it to be a significant accomplishment by itself. However, the particular approach we have taken combines a number of desirable traits.

generic: Matching is fully generic in the sense that it applies to objects of any type. Also, there is no limitation to tree-shaped object graphs (though patterns should be acyclic to ensure termination).
extensible: When new classes are added to an application, the existing patterns remain valid.
customizable: The generic matching behavior can be altered for specific types by implementing particular interfaces. This gives programmers control over the functional and non-functional characteristics of matching. Generative approaches may be used to provide these type-specific implementations.
optimizable: Through such customizations, optimizations can be realized. There is a smooth transition path from prototypical to optimized implementations.
first class: Patterns are first-class entities. This allows an elegant integration of matching with visiting, such that arbitrary-depth matching and active patterns are realized.

These characteristics are achieved without extending the Java language. The elements of our solution are:
• A generic, reflective matching engine.
• Some interfaces that allow type-specific behavior to be hooked into the generic engine.
• A generator that can optionally be used to provide implementations of these interfaces.

Since we do not rely on extending an existing language, adopting our approach does not interfere with the use of existing tools, development environments, libraries, modeling methods, refactoring aids, etc.
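The first two elements, a generic reflective engine and interfaces for type-specific hooks, can be illustrated with a minimal sketch. All names (MatchEngine.match, Matchable, Variable, Leaf, Fork, LeafAbove) are hypothetical, and the signatures differ from the paper's: bindings are kept in a plain Map rather than a Bound object. Objects of the same class are matched field by field via java.lang.reflect; strings, numbers, booleans, characters and enums are compared with equals; arrays are matched element-wise; and a pattern implementing Matchable replaces the generic algorithm with its own logic.

```java
import java.lang.reflect.Array;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Map;

class MatchFailure extends Exception { MatchFailure(String m) { super(m); } }

// Hypothetical pattern variable (identity-based, so it can serve as a map key).
class Variable { }

// Hypothetical hook: a pattern implementing this interface does its own matching.
interface Matchable {
    void match(Object term, Map<Variable, Object> bound) throws MatchFailure;
}

class MatchEngine {
    static void match(Object pattern, Object term, Map<Variable, Object> bound) throws MatchFailure {
        if (pattern instanceof Variable) {
            if (!bound.containsKey(pattern)) { bound.put((Variable) pattern, term); return; }
            // Repeated occurrence: the earlier binding must match the new term.
            match(bound.get(pattern), term, bound);
            return;
        }
        if (pattern instanceof Matchable) { ((Matchable) pattern).match(term, bound); return; }
        if (pattern == null || term == null) {
            if (pattern != term) throw new MatchFailure("null mismatch");
            return;
        }
        Class<?> c = pattern.getClass();
        if (c != term.getClass()) throw new MatchFailure(c.getName() + " vs " + term.getClass().getName());
        if (pattern instanceof String || pattern instanceof Number
                || pattern instanceof Boolean || pattern instanceof Character || c.isEnum()) {
            if (!pattern.equals(term)) throw new MatchFailure(pattern + " vs " + term);
            return;
        }
        if (c.isArray()) {
            int n = Array.getLength(pattern);
            if (n != Array.getLength(term)) throw new MatchFailure("array length");
            for (int i = 0; i < n; i++) match(Array.get(pattern, i), Array.get(term, i), bound);
            return;
        }
        // Generic case: match all instance fields, including inherited ones.
        // Intended for application classes; JDK classes may refuse setAccessible.
        for (Class<?> k = c; k != Object.class; k = k.getSuperclass()) {
            for (Field f : k.getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) continue;
                f.setAccessible(true);
                try { match(f.get(pattern), f.get(term), bound); }
                catch (IllegalAccessException e) { throw new MatchFailure(e.toString()); }
            }
        }
    }
}

// Hypothetical application classes; fields are typed Object so that
// variables and other pattern objects can appear in any position.
class Leaf { final int value; Leaf(int value) { this.value = value; } }
class Fork { final Object left, right; Fork(Object left, Object right) { this.left = left; this.right = right; } }

// Type-specific matching: accepts any Leaf whose value exceeds a threshold.
class LeafAbove implements Matchable {
    private final int min;
    LeafAbove(int min) { this.min = min; }
    public void match(Object term, Map<Variable, Object> bound) throws MatchFailure {
        if (!(term instanceof Leaf) || ((Leaf) term).value <= min)
            throw new MatchFailure("no leaf above " + min);
    }
}
```

LeafAbove behaves much like an active pattern: its matching logic is arbitrary code rather than a structural comparison, yet it can be nested inside an otherwise generic pattern such as Fork(new LeafAbove(3), x).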
5.2 Related work
Java language extensions. The Pizza language [7] is an extension of the Java language with three concepts taken from functional programming: higher-order functions, parametric polymorphism (type parameters), and algebraic data types with pattern matching. JMatch [5] is also an extension of the Java language with pattern matching. In this case, matching is not based on algebraic data types, but on a generalization of constructor methods, called pattern constructors. These pattern constructors have some similarity with our type-specific matching behavior. Tom [6] also provides support for algebraic data types with pattern matching. Tom is not a full-fledged language extension, but rather a precompiler.

In comparison to our approach, which stays completely within Java, language extensions provide several advantages. They allow optimizing compilation of matching expressions. They allow static checking beyond the capabilities of the Java type system. They may provide more concise syntax. On the flip side, the introduction of a new language is more intrusive on the existing programming practices of potential users. Also, modeling pattern matching within Java makes it easier to introduce variations, to add sophistication, or to combine the approach with other techniques.

Strategic pattern matching. The Stratego language [10, 9] is a term rewriting language that does not offer a single fixed rewriting strategy, but allows the programmer to compose different rewriting strategies from basic strategy combinators. The notion of visitor combinator can be seen as an object-oriented counterpart of Stratego's strategies. Stratego's pattern-matching features include congruences, which are term patterns that appear in strategy positions and that can contain strategies in subterm positions. The semantics of a congruence is to match against an incoming term and apply the strategies in subterm positions to the corresponding subterms of the incoming term. Thus, congruences allow mixing of traversal and pattern matching, in a similar vein as our Match and Visit combinators.
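For comparison with Match and Visit, a congruence can be rendered as a visitor in a few lines of Java. The sketch below is a hypothetical ForkCongruence over a hypothetical Leaf/Fork hierarchy, not part of Stratego or JJTraveler: it succeeds only on a Fork, applies one visitor to each subterm, and rebuilds the fork from the results, failing if either sub-visitor fails.

```java
class VisitFailure extends Exception { VisitFailure(String m) { super(m); } }

// Hypothetical tree classes for this sketch.
interface Tree { }
class Leaf implements Tree { final int value; Leaf(int value) { this.value = value; } }
class Fork implements Tree {
    final Tree left, right;
    Fork(Tree left, Tree right) { this.left = left; this.right = right; }
}

interface Visitor { Tree visit(Tree t) throws VisitFailure; }

class Identity implements Visitor { public Tree visit(Tree t) { return t; } }

// Succeeds only on leaves, returning a leaf with the value incremented.
class IncLeaf implements Visitor {
    public Tree visit(Tree t) throws VisitFailure {
        if (!(t instanceof Leaf)) throw new VisitFailure("not a leaf");
        return new Leaf(((Leaf) t).value + 1);
    }
}

// Congruence for Fork: match the constructor, then apply the sub-visitors
// to the corresponding subterms and rebuild the fork from the results.
class ForkCongruence implements Visitor {
    private final Visitor onLeft, onRight;
    ForkCongruence(Visitor onLeft, Visitor onRight) { this.onLeft = onLeft; this.onRight = onRight; }
    public Tree visit(Tree t) throws VisitFailure {
        if (!(t instanceof Fork)) throw new VisitFailure("not a fork");
        Fork f = (Fork) t;
        return new Fork(onLeft.visit(f.left), onRight.visit(f.right));
    }
}
```

For instance, ForkCongruence(IncLeaf, Identity) turns Fork(Leaf(1), Leaf(5)) into Fork(Leaf(2), Leaf(5)), whereas ForkCongruence(IncLeaf, IncLeaf) fails on any fork whose right branch is itself a fork.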
6. REFERENCES
[1] A. v. Deursen and J. Visser. Building program understanding tools using visitor combinators. In Proceedings 10th Int. Workshop on Program Comprehension, IWPC 2002, pages 137–146. IEEE Computer Society, 2002.
[2] C. A. R. Hoare. Laws of programming. Communications of the ACM, 30(8):672–686, Aug. 1987.
[3] K. Knight. Unification: a multidisciplinary survey. ACM Comput. Surv., 21(1):93–124, 1989.
[4] T. Kuipers and J. Visser. Object-oriented tree traversal with JJForester. Science of Computer Programming, 47(1):59–87, Nov. 2002.
[5] J. Liu and A. C. Myers. JMatch: Iterable abstract pattern matching for Java. In Proceedings of the 5th International Symposium on Practical Aspects of Declarative Languages, Jan. 2003.
[6] P.-E. Moreau, C. Ringeissen, and M. Vittek. A pattern matching compiler for multiple target languages. In 12th Conference on Compiler Construction, volume 2622 of LNCS, pages 61–76. Springer-Verlag, May 2003.
[7] M. Odersky and P. Wadler. Pizza into Java: Translating theory into practice. In Proceedings of the 24th ACM Symposium on Principles of Programming Languages, Paris, France, Jan. 1997.
[8] M. Tatsubori, S. Chiba, M.-O. Killijian, and K. Itano. OpenJava: A class-based macro system for Java. In Reflection and Software Engineering, volume 1826 of Lecture Notes in Computer Science, pages 117–133. Springer-Verlag, 2000.
[9] E. Visser. Strategic pattern matching. In Rewriting Techniques and Applications (RTA'99), volume 1631 of Lecture Notes in Computer Science, pages 30–44, July 1999.
[10] E. Visser, Z. Benaissa, and A. Tolmach. Building program optimizers with rewriting strategies. ACM SIGPLAN Notices, 34(1):13–26, January 1999. Proceedings of the International Conference on Functional Programming (ICFP'98).
[11] J. Visser. Visitor combination and traversal control. ACM SIGPLAN Notices, 36(11):270–282, Nov. 2001. OOPSLA 2001 Conference Proceedings.
[12] J. Visser. Generic Traversal over Typed Source Code Representations. PhD thesis, University of Amsterdam, 2003.