Software Tools for Technology Transfer manuscript No. (will be inserted by the editor)
Manual and Automated Performance Optimization of Model Transformation Systems
Tamás Mészáros, Gergely Mezei, Tihamér Levendovszky, Márk Asztalos
Budapest University of Technology and Economics, Department of Automation and Applied Informatics, 1111 Budapest, Goldmann György tér 3, Hungary
Received: date / Revised version: date
Abstract. Model-Based Development is one of the most promising solutions to several problems of industrial software engineering. Graph transformation is a proven method for processing domain-specific models. However, to be usable by domain experts without the help of graph transformation experts, it must be fast even when it is not manually tuned for speed based on knowledge available only to the implementers of the transformation system. In this paper, we compare the performance of such manual optimizations with a solution using automated optimization based on sharing matches between overlapping left-hand sides of sequentially independent rules. This yields an 11% improvement in our scenario, although our prototype implementation exploits overlaps between at most two rules, and the analyzed benchmark does not contain many cases where the optimization is applicable.
{mesztam, gmezei, tihamer, asztalos}@aut.bme.hu

1 Introduction

In the past few years, modeling has become an inevitable part of software engineering. Providing the background for Model-Based Development (MBD) was one of the highest-priority research issues. The formal results need to be transferred to prototypes and, later on, to applications and technological know-how. The same steps must happen to the methods of processing models. The bold formal background of graph transformation [1] became integrated with systems configurable for Domain-Specific Modeling Languages (DSMLs), such as GReAT [2], FUJABA [3], AToM3 [4], VMTS [5], and many other tools. Moreover, certain features have been added to graph transformation systems to address the practical issues prevalent in model processing.

The vision of integrated environments for Domain-Specific Languages is as follows. The DSML is developed in tight cooperation with the domain experts. There are two ways to do this: (i) a software engineer is trained by a domain expert, or (ii) the domain expert is trained to use these environments. In our experience, the second option is more efficient and cost-effective, since later modifications can be made by the customer, and at most a consultation with the software engineers is needed. This is possible only with intuitive environments and appropriately chosen modeling techniques. Nowadays, metamodeling is one of the most popular techniques to fulfill these requirements. The success of DSMLs inspired research efforts that aim to express the DSML processing steps in DSMLs as well. The expected benefit is the same: the domain experts can write and modify DSML processors themselves, or at the price of consultations.

This vision is close to reality. There are a few existing prototypes integrating metamodeling systems with visual transformation capabilities. Our tool, VMTS, is one of them. Obviously, there are disadvantages of handing the DSML and the DSML processing activities over to the domain experts. The DSML processing application executing the model transformation must be generated automatically and cannot be fine-tuned manually by software engineers. Among the tuning issues, the most important one is performance.

This paper contributes methods (i) to fine-tune a transformation when manual manipulation is assumed, and (ii) to optimize a transformation when manual tuning is not allowed. Moreover, it (iii) compares the two cases and provides numerical measurements to show the performance penalty, and finally (iv) describes novel methods for further optimization in both cases.
Fig. 1. Ant World illustration

Fig. 2. The control flow of the submitted solution (blocks (1), (2) and (3), connected by succ/fail branches; rules shown: CreateMap, CreateAnts, OnFood, MoveBack, NewAnt, MoveFromCenter, Move6, Move4, AntProcessed, BorderCheck, CreateAxis, CreateExtension, CreateCorner, MapProcessed, FinishLevel, UpdatePheromon)

2 Motivation
This paper was created in connection with the contest organized by the 4th International Workshop on Graph-Based Tools [6]. The workshop presented a case study called Ant World [?], the aim of which was to compare the local searching capabilities of different model transformation tools. The submitted solutions, including the one [?] based on VMTS, are detailed in [6].

The exercise was to simulate an anthill (Figure 1) using model transformation. The map on which the ants move is built from nodes connected by concentric circles of edges. The nodes are also connected radially. The anthill itself is in the center of the map. Two concentric circles and eight ants in the center are created at the beginning of the simulation. When the simulation starts, the ants begin searching for food. The movement of the ants can be considered random in this state. If an ant reaches the border of the map, another concentric circle must be created, and 100 pieces of food are created on every 10th new node. If an ant reaches a node with food, it picks up one piece of food and moves back towards the center. On its way back, the ant drops 1024 pieces of pheromone on each node. If the ant reaches the center and drops the food, a new ant is born in the center. Pheromone placed on the map influences the movement of the ants searching for food, as they prefer nodes with at least ten pieces of pheromone to nodes with fewer or none. The simulation is executed in separate rounds: each ant moves once in a round to an adjacent node. The amount of pheromone on each node evaporates by 5% at the end of each round.

Our motivation for this work was twofold. Firstly, we received feedback that our former solution contained too many imperative parts and heavily exploited knowledge about the internal behavior of the model transformation engine. The former transformation engine also required these "tweaks" to work efficiently; thus, it would have been impossible for a domain expert to implement an equally efficient solution in the tool. Therefore, we tried to improve both the transformation engine and the solution to avoid these tweaks. Secondly, we discovered similarities between the rules, and we came to the idea that similarity between rules could be exploited to improve the performance of the transformation, not only in the case study, but in general as well.

VMTS uses a control flow-driven [7] model transformation engine. Fig. 2 depicts the control flow of our former Ant World solution. Rules in block (1) build the initial map, including the two levels of the map and the eight ants in the center. Rules in block (2) perform the ant movement, in this order: picking up food, moving an ant with food back towards the center, creating a new ant in the center, moving an ant without food out of the center, moving an ant that stands on a node with six adjacent nodes, moving an ant that stands on a node with four adjacent nodes (not the center), and enabling the movement of the ants in the next round. The BorderCheck rule checks whether there is already an ant at the edge of the map. If not, a new round begins; otherwise, a new level of the map is built in block (3). In block (3), we create the upper right corner (CreateAxis), then build one side of the map (CreateExtension), and finally try to build another corner of the map. If we can, we continue with another side; otherwise, we finish building the new level. The UpdatePheromon rule evaporates the pheromone placed on the map nodes at the end of each round.
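Two of the numeric rules of the case study, the 5% evaporation and the preference for nodes with at least ten pieces of pheromone, can be illustrated with a minimal Python sketch. It is not part of the VMTS solution; the function names, the dictionary-based map and the rounding down to whole pieces are assumptions made for the example.

```python
import random

PHEROMONE_DROP = 1024      # pieces dropped on each node on the way back
PHEROMONE_THRESHOLD = 10   # nodes with at least this many pieces are preferred


def evaporate(pheromone):
    """End-of-round rule: reduce every pheromone count by 5%.

    Rounding down to a whole number of pieces is an assumption of this sketch.
    """
    return {node: amount * 95 // 100 for node, amount in pheromone.items()}


def choose_next(neighbours, pheromone, rng):
    """Searching ant: prefer neighbours with enough pheromone, else pick any."""
    marked = [n for n in neighbours
              if pheromone.get(n, 0) >= PHEROMONE_THRESHOLD]
    return rng.choice(marked or neighbours)


pheromone = {"n1": PHEROMONE_DROP, "n2": 9}
after_round = evaporate(pheromone)
target = choose_next(["n1", "n2"], pheromone, random.Random(0))
```

Here `n1` is the only neighbour above the threshold, so it is always chosen; when no neighbour qualifies, the choice falls back to a uniformly random neighbour.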
3 Solutions with optimization techniques

Recall that the motivation for providing a new solution was to reduce the manual optimization of the transformation as much as possible. This also involved modifying the transformation engine to support the declarative definition of rules. In this section, we compare the solutions with and without manual modifications of the model transformation engines.

Fig. 3. The metamodel of the Ant World (GridNode with attributes isProcessed, isAxis, level, food, isCenter, isBorder, counter, pheromone; GridEdge between GridNodes with multiplicity 0..* at both ends; Ant with attributes hasMoved, carriesFood; IsAnt between Ant and GridNode with multiplicities 1 and 0..*)

3.1 Metamodel

Both solutions use the same metamodel, which is depicted in Fig. 3. The individual nodes of the map are represented by GridNode nodes that are connected by GridEdge edges. Each node has the following properties: isAxis (indicates whether the node is on either of the diagonals), level (the distance from the center), food (the number of food items on the node), isCenter (whether the node is the center node), isBorder (whether the node is on the border of the current map), counter (the order ID of the node; the center node has order ID 0), pheromone (the number of pheromone items on the node) and isProcessed, which marks whether the node has already been processed when building a new level of the map. Each GridNode can contain several Ants via IsAnt edges. An Ant can either search for food or carry food back to the anthill. These two possibilities are modeled with the carriesFood attribute. The hasMoved attribute marks whether the ant has already been moved within a round.

3.2 Static indexes

We achieved a noticeable performance gain by exploiting three implementation details: (i) the data structures used to store the incoming and outgoing edges of nodes are directly indexable like an array, (ii) the map of the Ant World has a well-defined regular structure, and (iii) during the definition of a rule we were able to influence the order in which new elements are created and, thus, their positions in the container data structures as well. If we know that a specific edge must exist, and its index in the data structure is also known, then instead of performing a search operation, we can directly access it at a statically known index.
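The difference between a searched edge and a statically indexed edge can be sketched in Python. This is only an illustration under stated assumptions: the class layout follows the metamodel of Fig. 3, but the `neighbours` list (standing in for the indexable edge container) and both helper functions are hypothetical and do not reflect the VMTS data layer.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class GridNode:
    level: int = 0
    is_center: bool = False
    is_border: bool = False
    is_axis: bool = False
    is_processed: bool = False
    food: int = 0
    pheromone: int = 0
    counter: int = 0
    # Targets of the outgoing GridEdge edges, in creation order
    # (hypothetical stand-in for the indexable edge container).
    neighbours: List["GridNode"] = field(default_factory=list)


@dataclass(eq=False)
class Ant:
    node: GridNode             # target of the ant's single IsAnt edge
    has_moved: bool = False
    carries_food: bool = False


def inward_by_search(node):
    """Generic matcher: scan the outgoing edges for a node one level closer."""
    for candidate in node.neighbours:
        if candidate.level == node.level - 1:
            return candidate
    return None


def inward_by_static_index(node):
    """Static index: rely on the inward edge having been created first."""
    return node.neighbours[0]


center = GridNode(is_center=True)
outer = GridNode(level=1, neighbours=[center])
ant = Ant(node=outer, carries_food=True)
```

The static variant performs no comparison and leaves nothing to backtrack over, but it is only correct as long as the creation order of the edges is guaranteed, which is exactly the engine-specific knowledge the text refers to.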
In general, the matching of each edge creates a possible backtracking point (matching a node as the endpoint of an edge is unambiguous), which must be checked in case of backtracking even if there are no other edges left to check. Using static indexes, we can prevent the creation of a backtracking point in the search space. This feature is illustrated in Fig. 4, which presents the LHS of the rule moving ants back towards the center if they already carry food (the static index is described with the StatIdx attribute). StatIdx = 0 can be set for e, as an Ant node has exactly one outgoing IsAnt edge, which is found at index 0. We also know (because of the systematic structure of the map) that a node is always connected by the GridEdge edge at index 0 to another GridNode one level closer to the center. This means that we can also index that edge directly. Given an ant, we can perform the entire match without creating a backtracking point in the matcher.

Fig. 4. The LHS of the rule moving ants with food back to the center (a:Ant with hasMoved = false, carriesFood = true; e:IsAnt with StatIdx = 0; act:GridNode with isCenter = false; e4:GridEdge with StatIdx = 0; prev:GridNode)

However, static indexes are not generally applicable. We admit that they rely heavily on knowledge of the internal behavior of the transformation engine, and they also require the match order to be known already at design time.

3.3 Search plan

In the solution using manual coding of the transformation engine, we did not use a sophisticated method to determine the matching order of the elements in the LHS of the rule. One was able to manually define the LHS element where the matching starts (the pivot node), but did not have any influence on the order of the subsequent elements to be matched. We applied breadth-first search to match the elements of the rule: the matcher always extended the existing partial match with a new element, but the selection of the new element was fairly arbitrary (in fact, it followed the order of the elements in the data layer). This did not cause performance issues, because the LHS pattern in the manually optimized solution could be matched in exactly one way after selecting the pivot node. The LHS pattern of a typical ant movement rule (MoveFromCenter) can be seen in Fig. 5. We set the pivot node of the rule to the Ant; thus, the matching starts there. The next element that can be matched is e, then n0, then e1, e2, e3, e4 together with n1, n2, n3 and n4. The only condition that must be satisfied by n1, n2, n3, n4 is that they have to be different nodes.
In addition, the selection of e, e1, e2, e3, e4 is performed using static indexing (the center node always has exactly four adjacent nodes). Thus, a match for this pattern can succeed in exactly one way.

Fig. 5. LHS of the MoveFromCenter ant movement rule (a:Ant with hasMoved = false, carriesFood = false; e:IsAnt; n0:GridNode with isCenter = true; e1, e2, e3, e4:GridEdge; n1, n2, n3, n4:GridNode)

In the new solution, instead of using pivot nodes, we implemented the BacktrackingOnly heuristic published in [8]. It introduces a sophisticated cost model for primitive matching operations and tries to minimize the overall cost of executing a rule by minimizing the possible backtracks when matching an element. The idea of the approach is to build a special plan graph for the pattern to be matched. The plan graph is a directed graph whose nodes correspond to the elements (both nodes and edges) of the pattern graph, and whose edges reflect the possible matching orders in the pattern graph. For example, if an edge is connected to a node in the pattern graph, then in the plan graph one edge leads from the node corresponding to the pattern node to the node corresponding to the pattern edge, and another edge leads in the opposite direction. The two edges represent the two cases: either the node was found first by the matcher and the match is extended with a connecting edge, or the edge was found first and the matcher extends it with the appropriate endpoint. A weight is assigned to each edge of the plan graph. The weight reflects the number of possible backtracks when extending a partial match with the element the edge points to. The weights are calculated offline based on the statistics of the existing models. The algorithm then calculates an optimal matching order by finding the minimal multiplicative spanning tree in the plan graph. By applying this heuristic, we obtained the same match order for the rules as before without having to define pivot nodes manually.

3.4 Nondeterministic choice

The case study by its definition requires the random selection of the adjacent node to which an ant moves. The original solution used the imperative part of the rules to describe the random selection and to check the pheromone level on the adjacent nodes. To replace this solution, we introduced an additional flag called IsRandomChoice for LHS elements, which indicates whether an element should be selected in a deterministic order or using a random permutation.
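A random permutation of candidate elements can be produced lazily, so that a matcher which succeeds on the first candidate pays for a single random draw only. The following Python sketch shows one way to do this, assuming an incremental Durstenfeld (Fisher-Yates) shuffle over a separate index array so that the order of the source sequence is preserved. The name `lazy_shuffle` and its parameters are hypothetical; this is not the VMTS implementation.

```python
import random
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def lazy_shuffle(items: Sequence[T], rng: random.Random) -> Iterator[T]:
    """Yield the items in uniformly random order, one swap per requested item.

    The source sequence is never modified; only a private index array is
    permuted, and only as far as the consumer actually iterates.
    """
    n = len(items)
    index = list(range(n))          # allocated once per permutation
    for i in range(n):
        j = rng.randrange(i, n)     # pick among the not yet yielded positions
        index[i], index[j] = index[j], index[i]
        yield items[index[i]]


edges = ["e1", "e2", "e3", "e4"]
first_candidate = next(lazy_shuffle(edges, random.Random(1)))
full_order = list(lazy_shuffle(edges, random.Random(1)))
```

The price of keeping the source order intact is the index array that has to be allocated for every permutation, even if only the first element is consumed.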
For performance reasons, we created a lazy version of the Durstenfeld [9] algorithm, modified to keep the original order of the source array. This solution has some overhead compared to the previous, purely imperative version, because we have to allocate memory for an index array each time we need a random permutation. However, we were able to greatly reduce the imperative code of the rules.

Fig. 6. Original and refined movement rule with random selection (A: LHS of the original rule; B: new rules)

Fig. 6 illustrates how we have rewritten the LHS of the movement rule that moves the ants from the center node. The original version was able to match all adjacent nodes at the same time; however, the remaining part of the matching (pheromone checking, random selection) and the reconnection of the right endpoint of the i edge had to be written in imperative code. We were not able to express the preference for nodes with pheromone declaratively within a single rule, so we split the rule into two different ones: (i) the first rule processes ants on nodes that have an adjacent node with more than 9 pieces of pheromone, and (ii) the second rule is executed after the first one and processes the remaining ants, which have no adjacent node with more than 9 pieces of pheromone. Although we had to split the rule into two parts, we could almost completely eliminate the imperative code assigned to the rules: it is limited to setting the hasMoved attribute of the ants. We also had to modify the original Move6 and Move4 rules for similar reasons. However, due to the more expressive declarative capabilities, we were able to express the same behavior with two rules instead of four (which splitting both rules into two would have required). The LHS of the resulting rules is also depicted in Fig. 10 (MoveGeneral1 and MoveGeneral2 rules).

3.5 Measurement results

Table 1 summarizes the measurement results of the original and the rewritten transformation. The transformation was executed on a PC with an Intel Core2 Duo 3.0 GHz CPU and 2 GB of memory. As can be seen, we face a performance loss of about 70% (54303 ms versus 32042 ms at round 1000) if we avoid manual optimizations and rely on a fairly naive generated matcher.

4 Overlapped pattern matching

The most time-consuming part of a round in the transformation is the execution of the ant movement rules.
        Manually opt.                Automatically opt.
Round   Level   Ant     Time (ms)    Level   Ant     Time (ms)
100     16      1565    46           19      1956    124
200     39      7981    312          44      8487    826
300     69      16178   1045         73      16732   2527
400     102     25163   2496         105     25960   5444
500     131     34968   4945         137     35149   9703
600     163     44526   8408         170     45120   15334
700     197     54717   12636        201     55448   22620
800     230     65156   18064        234     65998   31387
900     260     75692   24320        267     76395   42323
1000    294     86453   32042        299     86631   54303

Table 1. Original and rewritten transformation results
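The relative performance loss can be recomputed from the Time columns of Table 1. The short Python sketch below does this per round; the variable and function names are chosen for the example, and the comparison is approximate because the two runs are random simulations with slightly different ant counts.

```python
rounds = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
manual_ms = [46, 312, 1045, 2496, 4945, 8408, 12636, 18064, 24320, 32042]
auto_ms = [124, 826, 2527, 5444, 9703, 15334, 22620, 31387, 42323, 54303]


def relative_loss(manual, auto):
    """Extra running time of the automatic version, relative to the manual one."""
    return (auto - manual) / manual


loss_per_round = [relative_loss(m, a) for m, a in zip(manual_ms, auto_ms)]

for r, loss in zip(rounds, loss_per_round):
    print(f"round {r:4d}: {loss:+.0%}")
```

The loss is considerably larger in the first rounds and approaches the quoted figure of about 70% at round 1000.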
We have discovered that the ant movement rules show a certain degree of similarity. For example, each movement rule contains an Ant node connected to a GridNode node by an IsAnt edge. The GridNode usually has at least one outgoing GridEdge and a connected GridNode. Rules are executed independently; thus, each match starts from scratch, even if exactly the same elements were already matched in a previous rule by isomorphic subgraphs of the two patterns.

Overlapped matching builds on the simple idea of interlacing the matching of different rules: the common parts are matched once, and these partial matches are then extended with the remaining parts of each rule. If a common match can be extended to a complete match, then the rewriting operations for the completed match are executed. For this purpose, the first step is to analyze whether the executed rules can be overlapped and to decide, based on available statistics, whether it is worth executing them in an overlapped way at all. These checks can be performed offline, and a specialized matcher can be generated, which can then be executed on an arbitrary number of host models without any further cost of runtime analysis or optimization.

4.1 Applicability

VMTS supports three ways of executing a rewriting rule: (i) In first fit execution, the matcher runs until a match is found, and the rewriting operations are executed once. (ii) Exhaustive execution means that the same rule is executed repeatedly. If we find a valid match, the rewriting operations are performed, and we restart the matching from the beginning. Otherwise, the rule is finished. (iii) Iterative execution means that one or several elements of the match are bound in turn to each element of a set (e.g. all Ants or a subset of them), and the rule is executed for each element of the set, regardless of the success of a previous execution.
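The three execution modes can be contrasted in a small Python sketch. The interface is an assumption made for illustration: a rule is given as a pair of functions, `find(model, pivot)` returning one match or `None`, and `rewrite(model, match)` updating the model in place. Neither the names nor the toy rule come from VMTS.

```python
def first_fit(find, rewrite, model):
    """Apply the rule to the first match found; report whether one existed."""
    match = find(model, None)
    if match is None:
        return False
    rewrite(model, match)
    return True


def exhaustive(find, rewrite, model):
    """Restart the matching after every rewrite until no match is left."""
    applied = 0
    while first_fit(find, rewrite, model):
        applied += 1
    return applied


def iterative(find, rewrite, model, pivots):
    """Try the rule once per pivot, whether or not earlier tries succeeded."""
    applied = 0
    for pivot in pivots:
        match = find(model, pivot)
        if match is not None:
            rewrite(model, match)
            applied += 1
    return applied


# Toy rule (hypothetical): "decrement a positive counter".
def find_positive(model, pivot):
    candidates = [pivot] if pivot is not None else sorted(model)
    return next((key for key in candidates if model[key] > 0), None)


def decrement(model, key):
    model[key] -= 1


m1 = {"a": 2, "b": 0, "c": 1}
first_fit(find_positive, decrement, m1)
m2 = {"a": 2, "b": 0, "c": 1}
n_exhaustive = exhaustive(find_positive, decrement, m2)
m3 = {"a": 2, "b": 0, "c": 1}
n_iterative = iterative(find_positive, decrement, m3, ["a", "b", "c"])
```

On the same starting model, first fit performs one rewrite, exhaustive execution continues until every counter is zero, and iterative execution performs at most one rewrite per pivot.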
In this paper, we deal with the basic use cases: (i) overlapping rules executed in a first fit manner; (ii) overlapping rules executed in an exhaustive manner; (iii) overlapping rules executed in an iterative way.

4.2 Definitions

Let the rules $p_1, p_2, \dots, p_n$ be executed sequentially. For the sake of simplicity, we assume that $\nexists\, p_j$ executed between $p_i$ and $p_{i+1}$ for $i \in 1..n-1$, where $n$ is the number of examined rules. We denote the LHS of $p_i$ by $L_i$, and the RHS by $R_i$.
Definition 1. (common subgraph) Let $L_\cap^{(k)}$ denote a graph having an isomorphic subgraph in $\forall L_i$, $i = 1..n$, and let the injective morphism $map_i^{(k)} : L_\cap^{(k)} \to L_i$ define the mapping between $L_\cap^{(k)}$ and $L_i$. (The index $(k)$ means that one such graph is selected out of all possibilities.)

Definition 2. (complementer graph) Let $L_{i/\cap}^{(k)}$ denote the complementer graph of $L_\cap^{(k)}$ in $L_i$. If $V_i$ denotes the gluing points (the common points) between the two graphs, then $L_{i/\cap}^{(k)} = L_i / (L_\cap^{(k)} / V_i)$.

Definition 3. (match set) Let $Match(L, G)$ denote the set of valid matches for $L$ in the host graph $G$, where $m \in Match(L, G)$ and $m : L \to G$ defines an injective morphism between the elements of $L$ and the host graph $G$.

Definition 4. Let $Match(m, L, G)$ denote the set of valid matches in the host graph $G$ for $L$ having the initial match $m$ already bound. Necessarily, the common elements of $dom(m)$ (the domain of $m$) and $L$ must be mapped to the same elements in $G$, but only the common elements.

Definition 5. Let $EXECUTE_{p_i}(m_i, G)$ denote the result graph of the execution of rule $p_i$ on match $m_i$ in the host graph $G$.

Definition 6. (OLRA susceptibility) Two rewriting rules $p_i$ and $p_j$ are Overlapped Rewriting Algorithm (OLRA) susceptible if $G \stackrel{p_i, m_i}{\Longrightarrow} H_1 \stackrel{p_j, m_j}{\Longrightarrow} X$ are sequentially independent [11] for $\forall G, m_i, m_j$, with the extension that the attribute transformations of $p_i$ do not enable or disable the preconditions of $p_j$ and vice versa for $\forall G, m_i, m_j$. By attribute transformation we mean changing the values of the attributes.

Intuitively, the definition means that the execution of a rule does not enable or disable the execution of another rule on a match, regardless of the match itself or the host graph.

Definition 7. (commutative attribute transformations) The attribute transformations performed by two sequentially independent graph transformations $G \stackrel{p_i, m_i}{\Longrightarrow} H_1 \stackrel{p_j, m_j}{\Longrightarrow} X$ are commutative if the attribute configuration of the resulting graph ($X$) does not depend on the execution order of the productions.
Definition 8. (strong OLRA susceptibility) Two graph productions $p_i$ and $p_j$ are strongly OLRA susceptible if they are OLRA susceptible and the attribute transformations performed by $G \stackrel{p_i, m_i}{\Longrightarrow} H_1 \stackrel{p_j, m_j}{\Longrightarrow} X$ are commutative for $\forall m_i, m_j$.

Intuitively, this means that the execution of a rule does not enable or disable the execution of another rule on a match, and the resulting attribute configuration does not depend on the execution order, regardless of the match itself or the host graph.

4.3 Algorithms

In the following, we present the three algorithms for the first fit, the iterative and the exhaustive case. In each case, we first present the control flow pattern to which the algorithm can be applied, together with additional application conditions. Then we introduce the algorithm, and finally we provide a semantic analysis to prove the correctness of the approach.

4.3.1 Overlapping first fit-executed rules

Algorithm 1 is applicable to rules that are executed sequentially in the first fit manner: $EXECUTE(p_1)$, $EXECUTE(p_2)$, ..., $EXECUTE(p_n)$, where $p_i$ is skipped if it is not applicable. The rules $p_i$ and $p_j$ are required to be OLRA susceptible for each $i \neq j$, and it is also necessary that the rules have an intersection: $\exists L_\cap^{(k)}$.

The algorithm searches for a match $m_\cap$ for the common pattern $L_\cap^{(k)}$, then tries to extend the common match to a complete match for each rule $p_i$. The $matchSucceeded_i$ flag is used to ensure that only one match is searched for each $p_i$. Only after the algorithm has found a match for each rule, or has traversed the entire search space, does it execute the rules, keeping the original order of execution.
Algorithm 1 Overlapped rewriting rule for first fit execution
  $matchSucceeded_i$ := false, $\forall i = 1..n$
  // Enumerating matches for $L_\cap^{(k)}$
  for all $m_\cap \in Match(L_\cap^{(k)}, G)$ do
    for i = 1 to n do
      if not $matchSucceeded_i$ then
        if $\exists m_i \in Match(m_\cap, L_i, G)$ then
          $matchSucceeded_i$ := true
    if $\nexists i : \neg matchSucceeded_i$ then
      break for
  for i := 1 to n do
    G := $EXECUTE_{p_i}(m_i, G)$
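Algorithm 1 can be transcribed into a short, runnable Python sketch. The interface is an assumption: each rule is a pair `(extend, execute)`, where `extend(m_common, graph)` returns a complete match or `None`, `execute(match, graph)` returns the rewritten graph, and `common_matches(graph)` enumerates the matches of the shared pattern. Skipping rules for which no match was found is also an assumption, following the remark that a non-applicable rule is skipped. The toy host graph and the two toy rules are invented for the example.

```python
def overlapped_first_fit(rules, common_matches, graph):
    """Match all rules over shared common matches, then rewrite in order."""
    found = [None] * len(rules)
    for m_common in common_matches(graph):
        for i, (extend, _) in enumerate(rules):
            if found[i] is None:              # matchSucceeded_i is still false
                found[i] = extend(m_common, graph)
        if all(m is not None for m in found):
            break
    for (_, execute), match in zip(rules, found):
        if match is not None:                 # assumption: skip unmatched rules
            graph = execute(match, graph)
    return graph


# Toy host graph: labelled edges (source, label, target).
HOST = frozenset({("i", "a", "ii"), ("ii", "a", "iii"),
                  ("iii", "b", "iv"), ("v", "c", "i")})


def common_matches(graph):
    """Shared pattern: a single edge labelled "a"."""
    return sorted((s, t) for (s, label, t) in graph if label == "a")


def extend_rule1(m, graph):
    """Rule 1 additionally needs a "b" edge leaving the target node."""
    s, t = m
    return next(((s, t, z) for (x, label, z) in sorted(graph)
                 if x == t and label == "b"), None)


def extend_rule2(m, graph):
    """Rule 2 additionally needs a "c" edge entering the source node."""
    s, t = m
    return next(((w, s, t) for (w, label, x) in sorted(graph)
                 if x == s and label == "c"), None)


def mark(tag):
    """Toy rewriting step: attach a marker loop to the middle matched node."""
    return lambda match, graph: graph | {(match[1], tag, match[1])}


rules = [(extend_rule1, mark("done1")), (extend_rule2, mark("done2"))]
result = overlapped_first_fit(rules, common_matches, HOST)
```

In this toy run, the first common match can only be completed for the second rule, the next common match completes the first rule, and the rewriting steps are then executed in the original rule order.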
Given the two LHS models ($L_1$ and $L_2$, for rules $p_1$ and $p_2$) and the host graph depicted in Fig. 7, we first discover that (1), (2), (a) and (1'), (2'), (a') are isomorphic. The algorithm starts searching for the common pattern isomorphic with (1), (2) and (a). A possible match for it consists of the nodes (i) and (ii) and the edge (J). These can be mapped to (1), (2), (a) and to (1'), (2'), (a') as well. Next, the algorithm tries to finish the matching of $L_1$. Since (b) and (3) cannot be mapped to (K) and (iii), the matching of $L_1$ is suspended. The matcher attempts to finish $L_2$, and (I) is found as a match for (b') and (v) for (3'). Now, $matchSucceeded_2$ is set to true. As only two rules are overlapped, another common match is searched for, and (ii), (K), (iii) is found. The matcher extends this common match with (L) and (iv) for (b) and (3), and $matchSucceeded_1$ is also set to true. As each matchSucceeded flag is now set to true, the matcher exits, and the algorithm executes first $p_1$, then $p_2$. Note that (1), (2), (a) and (1'), (2'), (a') do not necessarily match the same elements in the host graph. The benefit of this approach is that the matches for the common pattern do not have to be generated separately for both rules (as opposed to sequential execution).

Fig. 7. Overlapped matching of rules ($L_1$ with elements (1), (2), (3), (a), (b); $L_2$ with elements (1'), (2'), (3'), (a'), (b'); host graph with nodes (i)-(v) and edges (I)-(L))

Proposition 1. The algorithm generates a result equal to the one provided by the sequential execution.

Proof. As the rules $p_1, p_2, \dots, p_n$ are OLRA susceptible, $p_i$ cannot enable or disable a match for $p_j$, $i \neq j$. Thus, it is not required to keep the original, sequential execution order when performing the matching phase of the different rules. However, since a different execution order might result in a different attribute configuration of the resulting graph, we preserve the execution order of the rewriting phase of the rules.
4.3.2 Overlapping iteratively executed rules

Algorithm 2 is applicable to rules that are executed in a sequential order and in the iterative way: forall I in S do $EXECUTE(I, p_1)$, ..., forall I in S do $EXECUTE(I, p_n)$, where $EXECUTE(I, p_i)$ denotes that the rule $p_i$ is executed on a match based on the initial match $I$. To be able to overlap iteratively executed rules, in addition to satisfying the conditions of strong OLRA susceptibility, it is also expected that the initially bound elements of each rule can be found in the intersection of the LHS graphs as well, and that the bound elements are iterated over the same initial match set.

Algorithm 2 Overlapped rewriting rule for iterative execution
  for all $m^I$ in ($M^I$ initial match set) do
    $matchSucceeded_i$ := false, $\forall i = 1..n$
    // Enumerating matches for $L_\cap^{(k)}$
    for all $m_\cap \in Match(m^I, L_\cap^{(k)}, G)$ do
      // Completing matches $L_{i/\cap}^{(k)}$
      for i = 1 to n do
        if not $matchSucceeded_i$ then
          if $\exists m_i \in Match(m_\cap, L_i, G)$ then
            $matchSucceeded_i$ := true
            G := $EXECUTE_{p_i}(m_i, G)$
      if $\nexists i : \neg matchSucceeded_i$ then
        break forall
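A runnable Python sketch of Algorithm 2 is given below under the same assumed interface as before: rules are `(extend, execute)` pairs, and `common_matches(m_init, graph)` enumerates the common matches that contain the initial match. The toy rules only write attributes that no rule reads, so they are strongly OLRA susceptible by construction; all names and data are hypothetical.

```python
def overlapped_iterative(rules, initial_matches, common_matches, graph):
    """For every initial match, run each rule at most once over shared matches."""
    for m_init in initial_matches:
        done = [False] * len(rules)               # matchSucceeded_i
        for m_common in common_matches(m_init, graph):
            for i, (extend, execute) in enumerate(rules):
                if not done[i]:
                    match = extend(m_common, graph)
                    if match is not None:
                        done[i] = True
                        graph = execute(match, graph)   # rewrite immediately
            if all(done):
                break
    return graph


def tagging_rule(condition, tag):
    """Toy rule: if `val` satisfies the condition, set a separate flag."""
    def extend(m_common, graph):
        return m_common if condition(graph[m_common]["val"]) else None

    def execute(match, graph):
        graph[match][tag] = True      # writes an attribute that no rule reads
        return graph

    return extend, execute


def pivot_only(m_init, graph):
    """Toy common pattern: just the initially bound element itself."""
    return [m_init] if m_init in graph else []


items = {"x": {"val": 4}, "y": {"val": 3}, "z": {"val": 0}}
rules = [tagging_rule(lambda v: v > 0, "positive"),
         tagging_rule(lambda v: v % 2 == 0, "even")]
tagged = overlapped_iterative(rules, ["x", "y", "z"], pivot_only, items)
```

In contrast to the first fit variant, each completed match is rewritten immediately, which is why the stronger susceptibility condition is required.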
Algorithm 2 extends Algorithm 1 at two points: (i) the initial matches $m^I$ are selected in a loop from the initial match set $M^I$, and the first fit approach is executed for each of them; (ii) the matches for $m_\cap$ are searched after we have set the elements covered by $L^I$. The $matchSucceeded_i$ flag is used again to ensure that $p_i$ is executed at most once for the same initial match $m^I$.

Proposition 2. In case of the overlapped execution of iteratively executed rules, the set of possible resulting host graphs is the same as in case of the sequential execution of iterative rules.

Proof. (i) When the algorithm is executed on one rule with several initial matches $m^I$, the execution reduces to the iterative execution of a single rule. (ii) If the algorithm is applied to several rules and an arbitrary initial match $m^I$ is selected, the execution reduces to the first fit execution, the correctness of which has already been proved. (iii) In case of executing the algorithm on several initial matches $m^I$ and several graph productions $p_i$:
1. The enumeration order of the initial matches $m^I$ is the same for each production $p_i$ as in case of the sequential execution. This is an important property of the algorithm, because, although $p_i$ and $p_j$ are strongly OLRA susceptible for each $i \neq j$, $p_i$ applied at the initial match $m^I_k$ may influence the execution of $p_i$ at the initial match $m^I_l$.
2. The execution of graph production $p_i$ may precede the execution of $p_j$ ($i > j$) on an initial match $m^I$; however, because of the strong OLRA susceptibility of $p_i$ and $p_j$, this is equivalent to the reversed execution.
Consequently, the execution of the rules $p_i$ on the initial matches $m^I$ in an overlapped way can always be reordered to match the execution order of the non-overlapped processing.

4.3.3 Overlapping exhaustively executed rules

Here we assume that the rules to be overlapped are executed sequentially and in the exhaustive way: do $execute(p_1)$ while success, ..., do $execute(p_n)$ while success. Thus, a rule is executed as long as a match is found for it, and the next rule is executed only when no further match can be found. Overlapping exhaustively executed rules also requires the rules to be strongly OLRA susceptible, as the execution order of the rules will probably differ from the original, sequential order. Algorithm 3 describes a method to perform overlapped matching of exhaustively executed rules.

Algorithm 3 Overlapped rewriting rule for exhaustive execution
  $matchSucceeded_i$ := true, $\forall i = 1..n$
  while $\exists i : matchSucceeded_i$ do
    $actMatchSucceeded_i$ := false, $\forall i = 1..n$
    // Enumerating matches for $L_\cap^{(k)}$
    for all $m_\cap \in Match(L_\cap^{(k)}, G)$ do
      // Completing matches $L_{i/\cap}^{(k)}$
      for i := 1 to n do
        if $matchSucceeded_i$ then
          while $\exists m_i \in Match(m_\cap, L_i, G)$ do
            G := $EXECUTE_{p_i}(m_i, G)$
            $actMatchSucceeded_i$ := true
    $matchSucceeded_i$ := $actMatchSucceeded_i$, $\forall i = 1..n$
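The control structure of Algorithm 3 can be sketched in Python as follows. As in the previous sketches, the `(extend, execute)` rule interface and the toy rules are assumptions. In addition, the common matches are enumerated from a snapshot taken at the start of each pass, which is safe here only because the toy rewriting steps never add or remove elements of the common pattern.

```python
def overlapped_exhaustive(rules, common_matches, graph):
    """Repeat passes over the common matches until no rule fires any more."""
    active = [True] * len(rules)                  # matchSucceeded_i
    while any(active):
        succeeded = [False] * len(rules)          # actMatchSucceeded_i
        for m_common in list(common_matches(graph)):
            for i, (extend, execute) in enumerate(rules):
                if active[i]:
                    match = extend(m_common, graph)
                    while match is not None:
                        graph = execute(match, graph)
                        succeeded[i] = True
                        match = extend(m_common, graph)
        active = succeeded
    return graph


def draining_rule(attr, log):
    """Toy rule: while an element's counter `attr` is positive, decrement it."""
    def extend(name, graph):
        return name if graph[name][attr] > 0 else None

    def execute(name, graph):
        graph[name][attr] -= 1
        log.append((attr, name))
        return graph

    return extend, execute


log = []
counters = {"x": {"a": 2, "b": 1}, "y": {"a": 0, "b": 3}}
rules = [draining_rule("a", log), draining_rule("b", log)]
final = overlapped_exhaustive(rules, sorted, counters)
```

The log shows that the executions of the two rules are interleaved per common match instead of running one rule to exhaustion before the other, which is the reordering that strong OLRA susceptibility has to justify.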
Proposition 3. When Algorithm 3 finishes, $Match(L_i, G) = \emptyset$ $\forall i = 1..n$.
Proof. The flag $matchSucceeded_i$ is set to false when none of the common matches $m_\cap \in Match(L_\cap^{(k)}, G)$ can be completed to a match $m_i \in Match(L_i, G)$. After this point, $p_i$ is not attempted to be matched and executed again. As $p_i$ and $p_j$ ($i \neq j$) are pairwise strongly OLRA susceptible, the execution of no rule $p_j$ can generate a match for $p_i$ ($i \neq j$) and thereby re-enable it.

Proposition 4. The resulting host graph after the overlapped execution of exhaustive rules is one of the possible resulting graphs for the sequential execution of the same rules.

Proof. (1) As $p_i$ and $p_j$, $i \neq j$, are strongly OLRA susceptible, the execution order of $p_i$ and $p_j$ (with arbitrary matches $m_i$ and $m_j$) does not influence the resulting graph. (2) In Proposition 3 we have already proven that $\nexists m_i \in Match(L_i, G)$ $\forall i = 1..n$ when the algorithm finishes. (3) As $p_i$ and $p_j$ are strongly OLRA susceptible, the execution of the different $p_i$-s on different matches can
[Figure residue: example rule graphs over nodes n1:N, n2:N with weight conditions and edges e1:E, e2:E with value conditions.]
[Figure panels: NewAnt, MoveFromCenter2, MoveGeneral1, MoveGeneral2, MoveBack; elements a:Ant, i:IsAnt, act/next/prev: GridNode, e: GridEdge; conditions on hasMoved, carriesFood, IsCenter, Pheromone and Level.]
         Automatically opt.               Overlapped
Round    Level   Ant      Time (ms)      Level   Ant      Time (ms)
100      19      1956     124            18      1085     78
200      44      8487     826            42      6779     561
300      73      16732    2527           68      15183    1918
400      105     25960    5444           99      24311    4383
500      137     35149    9703           130     34208    8112
600      170     45120    15334          162     42024    13228
700      201     55448    22620          196     54358    19671
800      234     65998    31387          227     65082    27721
900      267     76395    42323          261     75832    37097
1000     299     86631    54303          295     86898    48672

Table 2. Rewritten and overlapped transformation results
Fig. 10. LHS parts of overlappable rewriting rules
conditions (IsCenter = true and IsCenter = false) are applied to the different act nodes, we can overlap them as well, but the differing conditions are evaluated when the common match is extended with the rule-specific remaining parts.

4.6 Heuristics to estimate cost of total match

We try to find an optimal overlapped match by calculating the estimated cost $C_0^j$ of the common match based on [8] and the estimated costs $C_i^j$ of the remaining parts of the rules for each candidate common match $j$, and we produce the aggregated estimated cost as $C^j = C_0^j \cdot (1 - n + \sum_i C_i^j)$. The $1 - n$ addend expresses that the cost of the common match ($C_0^j$) has to be counted only once, and not once for each of the $n$ rules. If the minimum overlapped cost is smaller than the cost of executing the rules sequentially, $\min_j C^j < \sum_i C_i$, then we choose the overlapped execution $OE_j$ instead of the sequential one.

4.7 Application to the Ant World case study

We have created a reference implementation for the results introduced previously. For implementation reasons, it can overlap at most two rules at once. Fig. 10 illustrates the LHS models of the rules that can be overlapped. Each rule was executed in the iterative manner in the original implementation (we iterate through the ants and try to complete the matches based on them); therefore, Algorithm 2 is used to optimize the execution. As the problem requires preferring nodes with pheromone in outer map circles, we had to split the movement rules into two parts each: MoveFromCenter1, MoveFromCenter2, MoveGeneral1, MoveGeneral2. The MoveFromCenter rules match ants in the center of the map and move them one level further from the center. We have removed the replacement edge that models the reconnection of
the edge i for the sake of simplicity in Fig. 6. Similarly, the MoveGeneral rules move the ant when it is not in the center: MoveGeneral1 tries to move the ant towards pheromones one level further from the center, while MoveGeneral2 moves the ant in an arbitrary random direction. If successfully executed, each rule sets the hasMoved attribute of the Ant element to true, which means that the ant should not be processed by another rule. Because of this, each Ant node in the LHS models is dependent on an Ant node in another RHS. However, according to the metamodel, the right multiplicity of the IsAnt edge is 1..1, which means that an Ant always has exactly one GridNode connected to it. The act nodes can therefore also be involved in the independence evaluation, and as they are pairwise exclusive (MoveFromCenter1 - MoveFromCenter2, MoveGeneral1 - MoveGeneral2, NewAnt - MoveBack), the Ant nodes are independent. Therefore, the entire rule set is also pairwise independent. To ensure the exclusiveness of the act nodes, we have added additional negative and positive application conditions to them. The positive application conditions could also be derived from the topology of the rules ($\exists next : next.Pheromone > 9$), but we are not able to discover them at the moment. As a result, we were able to combine the six rules into three overlapped rules.
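Whether a pair of rules is overlapped at all is decided by the cost comparison of Section 4.6. The following Python sketch shows that decision under one reading of the aggregated cost, $C^j = C_0^j \cdot (1 - n + \sum_i C_i^j)$, compared against the sequential cost $\sum_i C_i$. The function names and all numeric estimates are hypothetical and serve only to make the arithmetic concrete; real estimates would come from the cost model of [8].

```python
from typing import List, Optional, Sequence, Tuple


def overlapped_cost(common_cost: float, remaining_costs: Sequence[float]) -> float:
    """Aggregated estimate C^j = C_0^j * (1 - n + sum_i C_i^j) for one common match j."""
    n = len(remaining_costs)
    return common_cost * (1 - n + sum(remaining_costs))


def choose_overlap(
    candidates: Sequence[Tuple[float, Sequence[float]]],
    sequential_costs: Sequence[float],
) -> Optional[int]:
    """Return the index j of the cheapest overlap, or None to stay sequential.

    `candidates` must be non-empty; each entry is (C_0^j, [C_1^j, ..., C_n^j]).
    """
    costs: List[float] = [overlapped_cost(c0, rest) for c0, rest in candidates]
    best = min(range(len(costs)), key=costs.__getitem__)
    return best if costs[best] < sum(sequential_costs) else None


# Made-up estimates for two candidate common matches of n = 2 rules.
candidates = [(4.0, [3.0, 5.0]), (2.0, [6.0, 7.0])]
expensive_sequential = choose_overlap(candidates, sequential_costs=[15.0, 20.0])
cheap_sequential = choose_overlap(candidates, sequential_costs=[10.0, 12.0])
```

With these invented numbers the second candidate is the cheaper overlap; it is selected when the sequential estimate is high and rejected when the sequential estimate is already lower than every overlapped estimate.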
4.8 Measurement results

Table 2 summarizes the measurement results of the automatically optimized transformation and its overlapped version. After 1000 rounds of execution, we measure a difference of about 11% in favor of the overlapped version. We have been able to achieve this difference even though (i) the current version of the matcher is able to overlap only two rules at once, and (ii) the rules of the Ant World case study are not ideal for overlapping, as only a few elements (an edge and its two endpoints) of the LHS models can be matched at once.
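The size of the difference can be recomputed from the Time (ms) columns of Table 2. The short Python sketch below does so for every measured round. The text does not state which baseline the percentage refers to, so the sketch reports both the saving relative to the automatically optimized version and the ratio of the two running times; the variable and function names are our own.

```python
# Time (ms) columns of Table 2, indexed by round.
rounds = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
auto_ms = [124, 826, 2527, 5444, 9703, 15334, 22620, 31387, 42323, 54303]
overlapped_ms = [78, 561, 1918, 4383, 8112, 13228, 19671, 27721, 37097, 48672]


def relative_saving(baseline: float, improved: float) -> float:
    """Fraction of the baseline running time saved by the improved version."""
    return (baseline - improved) / baseline


# Saving per round, relative to the automatically optimized transformation.
savings = {
    r: relative_saving(a, o) for r, a, o in zip(rounds, auto_ms, overlapped_ms)
}

final_saving = savings[1000]                   # saving after 1000 rounds
final_ratio = auto_ms[-1] / overlapped_ms[-1]  # automatic time / overlapped time
```

After 1000 rounds the saving relative to the automatically optimized version is roughly 10%, while the automatically optimized version takes roughly 12% longer than the overlapped one; the reported figure of about 11% lies between these two readings.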
5 Related work

Varró et al. have published a technique called incremental pattern matching [12] [13], an online technique that stores parts of matches and reuses them in other rewriting rules within the same transformation. Their approach uses sophisticated data structures and algorithms to store matches and to update them efficiently during the transformation. The drawback of this approach is that it is an online method: for transformations that intensively modify the host graph, the overhead of maintaining these data structures not only eliminates the benefits of the technique but also slows the transformation down compared to the non-optimized version. However, it may perform well when the transformation is used for verification purposes or when the host graph is modified relatively rarely compared to the number of matches. The technique has been implemented in VIATRA2 [14]. To the best of our knowledge, this is the only existing solution that exploits the similarity of the executed rules to share partial matches between them. However, as already mentioned, it uses an online technique and targets a different application domain than our solution.

VIATRA2 uses a sophisticated cost model [15] to generate a search plan for the execution of a rule. Cost calculation is based on the creation of a search graph, which models the basic operations of the matching (element enumeration, navigation) as nodes. Nodes in the search graph are connected with weighted edges, whose weight is derived from the number of potential backtracks caused by matching an element. Search plan generation is based on finding a minimum directed spanning tree (DST) in the search graph using the Edmonds algorithm [16].

GrGen.NET [17] [8] is a graph transformation tool implementing the SPO (single pushout) [11] approach. 
The cost model of GrGen.NET extends the solution of VIATRA2 by also considering the cost of performing a primitive step in the matching. Furthermore, instead of searching for a minimum directed spanning tree in the plan graph, it searches for a minimum multiplicative directed spanning tree, thus minimizing the product of the costs instead of their sum, which allows it to reach more accurate solutions. The cost model utilizes not only static information and information derived from the metamodel, but also statistics derived from the current host model. GrGen.NET does not use fixed matching strategies; instead, it optimizes the matching strategy at runtime depending on the actual host graph.

In FUJABA (From UML to Java and Back Again) [3], a combination of activity diagrams and collaboration diagrams is used to express control structures. A story diagram [18] is written in a visual programming language that facilitates the specification of complex application-specific object structures. FUJABA uses
a breadth-first search strategy to find a match in the host graph. The matching starts from a pivot node fixed by the designer. The matching strategy is generated at compile time for each rule.

PROGRES [19] uses a sophisticated operation graph to describe each primitive step of the matching, including the enumeration of nodes, navigation along edges, and even the verification of attribute constraints. PROGRES also provides a cost model to estimate the overall cost of valid search plans. However, the cost calculation does not take into account the statistics of the current host graph or the properties of its metamodel; instead, it relies on assumptions about a typical domain in which the tool is planned to be used.

AGG [20] is a development environment for attributed graph transformation systems supporting algebraic approaches to graph transformation. It aims at specifying and rapidly prototyping applications with complex, graph-structured data. AGG graphs may be attributed by Java objects and types. Graph rules may be attributed by Java expressions, which are evaluated during rule application. Additionally, rules may have attribute conditions, which are boolean Java expressions. AGG implements critical pair analysis.

The Graph Rewriting and Transformation (GReAT) [2] framework is a transformation system for domain-specific languages (DSL) built on metamodeling and graph rewriting concepts. The control structure of GReAT allows specifying an initial context for the matching to reduce the complexity of the general matching case. The pattern matcher returns all possible matches to avoid the inherent non-determinism in the matching process. The execution engine chooses a path nondeterministically, and the chosen path is executed completely before the next path is selected. The attribute transformation is specified by a proprietary attribute mapping language whose syntax is close to C. 
The LHS of the rules can contain OCL constraints to refine the structure, but postconditions are not supported. GReAT uses breadth-first traversal to find a match. The traversal starts from the initially matched nodes (referred to as pivoted pattern matching).
6 Conclusions

We have presented methods to fine-tune a transformation when manual manipulation is assumed, and methods for a fully automatic solution. We have compared the benefits and the drawbacks of the two approaches and elaborated them with the help of the Ant World case study. We have shown that fully automated model transformation requires less manual implementation and less knowledge about the internal mechanisms of the model transformation system. We can define transformation steps in a clearly declarative way, and we do not have to precisely define the match order or even
the element where the matching starts in a rule; the environment generates this information automatically using search plans. The price of this convenience is a loss of transformation speed. This performance penalty was also shown in the paper with numerical measurements.

We have presented novel methods to evaluate pattern matches of different rewriting rules in an overlapped way, thus increasing the overall performance of the transformation. Our technique discovers common parts of different rewriting rules and tries to evaluate them only once. We have analyzed the applicability of the approach: we have found criteria under which the matching of different rules can be overlapped without influencing the final output, and we have also examined when it is worth overlapping rules at all. We provided heuristics to discover the parts of rules that are worth combining. We have created a reference implementation of our algorithms and demonstrated them on the Ant World case study. We have achieved a measurable difference even though it turned out that the applied rewriting rules are not ideal for this improvement. Note that the presented approach is not specific to this case study but is generally applicable to any model transformation that contains similar but independent rules. The gain achieved by the overlapped execution increases with the level of similarity.

Future work includes a more sophisticated mechanism to discover dependencies between rewriting rules, especially for analyzing attribute constraints and the influence of rewriting rules on them more accurately. We intend to improve the overlapped matcher generation technique to be able to (i) overlap the matching phase of more than two rules and (ii) apply the technique to several rules in a hierarchical way. It is also an open issue how to infer the applicability of a rule from the successful or unsuccessful application of another one.

Acknowledgements. 
The Bolyai Research Scholarship of the Hungarian Academy of Sciences has supported this paper. Infragistics has partly supported the activities described in this paper.
References

1. Heckel, R., Graph Transformation in a Nutshell, Language Engineering for Model-Driven Software Development, Dagstuhl Seminar Proceedings 04101, Internationales Begegnungs- und Forschungszentrum für Informatik (IBFI), Schloss Dagstuhl, Germany, 2004.
2. Vizhányó, A., Agrawal, A., Shi, F., Towards generation of efficient transformations, Proc. of 3rd Int. Conf. on Generative Programming and Component Engineering (GPCE 2004), LNCS 3286, pp. 298-316, 2004.
3. Klein, T., Nickel, U., Niere, J., Zündorf, A., From UML to Java and back again, Technical report, University of Paderborn, 2000.
4. de Lara, J., Vangheluwe, H., AToM3: A Tool for Multi-Formalism and Meta-modeling, in Fundamental Approaches to Software Engineering, LNCS 2306, pp. 174-188, 2002.
5. Visual Modeling and Transformation System homepage: http://www.aut.bme.hu
6. 4th International Workshop on Graph-Based Tools homepage: http://www.fots.ua.ac.be/events/grabats2008/
7. Lengyel, L., Levendovszky, L., Mezei, G., Model Transformation with a Visual Control Flow Language, International Journal of Computer Science (IJCS) 1, pp. 45-53, 2006.
8. Batz, G. V., Kroll, M., Geiß, R., A First Experimental Evaluation of Search Plan Driven Graph Pattern Matching, Proc. of the 3rd Intl. Workshop on Applications of Graph Transformation with Industrial Relevance (AGTIVE '07), Springer, 2008.
9. Durstenfeld, R., Communications of the ACM 7, pp. 420, 1964.
10. Heckel, R., Küster, J. M., Taentzer, G., Confluence of Typed Attributed Graph Transformation Systems, Proc. ICGT 2002, LNCS 2505, pp. 161-176, 2002.
11. Ehrig, H., Ehrig, K., Prange, U., Taentzer, G., Fundamentals of Algebraic Graph Transformation, Monographs in Theoretical Computer Science, Springer, 2006.
12. Varró, G., Varró, D., Graph Transformation with Incremental Updates, in Proc. Graph Transformation and Visual Modelling Techniques (GT-VMT 2004), Barcelona, Spain, 2004.
13. Varró, G., Varró, D., Schürr, A., Incremental Graph Pattern Matching: Data Structures and Initial Experiments, in Graph and Model Transformation (GraMoT), Electronic Communications of the EASST 4, 2006.
14. Varró, D., Automated Model Transformations for the Analysis of IT Systems, PhD Thesis, http://www.inf.mit.bme.hu/FTSRG/Publications/ varro/2004/phd thesis.zip, 2004.
15. Varró, G., Varró, D., Friedl, K., Adaptive Graph Pattern Matching for Model Transformations using Model-sensitive Search Plans, in Proc. of Int. Workshop on Graph and Model Transformation (GraMoT'05), ENTCS 152, pp. 191-205, Elsevier, 2005.
16. Edmonds, J., Optimum branchings, Journal of Research of the National Bureau of Standards, pp. 233-240, 1967.
17. Geiß, R., Batz, G. V., Grund, D., Hack, S., Szalkowski, A. M., GrGen: A Fast SPO-Based Graph Rewriting Tool, Graph Transformations (ICGT 2006), pp. 383-397, Springer, 2006.
18. Fischer, T., Niere, J., Torunski, L., Zündorf, A., Story Diagrams: A new Graph Rewrite Language based on the Unified Modeling Language, in Proc. of the 6th International Workshop on Theory and Application of Graph Transformation (TAGT), LNCS 1764, pp. 296-309, Springer, 1998.
19. Zündorf, A., Graph pattern-matching in PROGRES, Proc. of the 5th Int. Workshop on Graph Grammars and their Application to Computer Science, LNCS 1073, pp. 454-468, 1996.
20. Taentzer, G., AGG: A Graph Transformation Environment for Modeling and Validation of Software, Application of Graph Transformations with Industrial Relevance (AGTIVE 2004), Springer, 2004.