Efficient Retrieval of Similar Business Process Models Based on Structure (Short Paper)

Tao Jin (1,2), Jianmin Wang (2), and Lijie Wen (2)

(1) Department of Computer Science and Technology, Tsinghua University, China
(2) School of Software, Tsinghua University, China

Abstract. As business process management technology is used more and more widely, the number of business process models, which are typically graphical, keeps growing, and querying such a large number of models efficiently is challenging. In this paper, we address the problem of efficiently querying models that are similar to a given model in structure. We use an index named TaskEdgeIndex for query processing. During query processing, we estimate from the given similarity threshold the minimum number of task edges that a model must contain, and then obtain the candidate models through the index. We then compute the structural similarity between the query condition model and every candidate model using the maximum common edge subgraph based similarity, and discard the candidate models that do not actually satisfy the similarity requirement. Since the number of candidate models is usually much smaller than the size of the repository, query efficiency is improved.

1 Introduction

The wide use of business process management technology results in a large number of business process models, which are typically graphical. For example, China CNR Corporation Limited has more than 200,000 models. Querying such a large number of models efficiently is challenging. For example, before a designer creates a new business process model, if s/he can retrieve the models that nearly contain her/his (always incomplete) draft model as a subgraph and continue working on them instead of starting from scratch, much time can be saved and the result is less error-prone. Since the number of models is large, the efficiency of similarity retrieval is very important.

The problem addressed in this paper can be described as follows. Given a query condition model q, quickly find all the models (denoted S) in the model repository R such that, for every model m in S, there is a subgraph sub of m whose structural similarity to q is not less than a specified threshold θ.

There are many methods to compute the similarity between graphs based on their structure, such as methods based on graph edit distance or on the maximum common subgraph. In this paper, we use the maximum common edge subgraph (MCES) based similarity, which is widely used; for example, it was used in [1,2]. MCES based similarity is superior to graph edit distance based similarity in that no particular edit operations or costs need to be defined. Since computing the MCES based similarity is NP-hard, scanning the repository sequentially and computing the similarity between the query condition model and every model in the repository would be very time-consuming. In this paper, we use a filtering-verification framework to reduce the number of MCES based similarity computations. Our contributions can be summarized as follows.

– We apply the MCES based similarity algorithm to business process models.
– We use an index named TaskEdgeIndex to speed up query processing.
– We implement our approach in the BeehiveZ system and conduct experiments to evaluate it.

There are many different notations for capturing business processes, such as BPMN, BPEL, XPDL, EPC, YAWL, and PNML. Among them, Petri nets have a sound formal foundation and a simple graphical notation, so they are not only easy to understand and use but also amenable to analysis. Many researchers have worked on transforming other notations into Petri nets; see [3] for an overview. To deal with business process models in different formats uniformly, we assume that all the models in the repository are represented as, or transformed into, Petri nets.

R. Meersman, T. Dillon, and P. Herrero (Eds.): OTM 2011, Part I, LNCS 7044, pp. 56–63, 2011. © Springer-Verlag Berlin Heidelberg 2011

2 Preliminaries

Petri nets were introduced into the business process management area for modeling, verification, and analysis in [4]. Details of Petri nets can be found in [5].

Definition 1 (Petri net). A Petri net is a triple $N = (P, T, F)$, where $P$ and $T$ are finite disjoint sets of places and transitions ($P \cap T = \emptyset$), and $F \subseteq (P \times T) \cup (T \times P)$ is a set of arcs (flow relation). We write $X = P \cup T$ for the set of all nodes of a Petri net. For a node $x \in X$, ${\bullet}x = \{y \in X \mid (y, x) \in F\}$ and $x{\bullet} = \{y \in X \mid (x, y) \in F\}$.

In this paper, we measure the similarity between two Petri nets using MCES based similarity. To obtain the MCES of two given graphs, we first build the line graphs of the original graphs, then build the modular product graph of the two line graphs, and finally find the maximum clique of the modular product graph. Projecting the maximum clique back onto the original graphs yields the MCES of the two original graphs. Details can be found in [1]. In the following, we give the corresponding definitions for Petri nets.

Definition 2 (Task edge). Given a Petri net $N = (P, T, F)$, a task edge is a pair $te = \langle t_1, t_2 \rangle$ such that $t_1, t_2 \in T \wedge \exists p \in P\,(p \in t_1{\bullet} \wedge p \in {\bullet}t_2)$. We denote the source and target of a task edge by $s(te) = t_1$ and $t(te) = t_2$, and the set of all task edges of $N$ by $TE(N)$.
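Definitions 1 and 2 can be made concrete with a small sketch. The Python below is a minimal illustration and not the authors' implementation; the class name `PetriNet`, its field names, and the use of plain strings as node identifiers are assumptions. It computes presets and postsets directly from F and derives $TE(N)$ straight from Definition 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PetriNet:
    """N = (P, T, F) from Definition 1 (hypothetical representation)."""

    places: frozenset
    transitions: frozenset
    arcs: frozenset  # F: set of (source, target) pairs

    def __post_init__(self):
        if self.places & self.transitions:
            raise ValueError("P and T must be disjoint")
        for x, y in self.arcs:
            # F ⊆ (P × T) ∪ (T × P): every arc connects a place and a transition
            if not ((x in self.places and y in self.transitions)
                    or (x in self.transitions and y in self.places)):
                raise ValueError(f"invalid arc {(x, y)}")

    def preset(self, x):
        """•x = {y | (y, x) ∈ F}"""
        return {y for (y, z) in self.arcs if z == x}

    def postset(self, x):
        """x• = {y | (x, y) ∈ F}"""
        return {z for (y, z) in self.arcs if y == x}

    def task_edges(self):
        """TE(N) per Definition 2: <t1, t2> such that some p ∈ t1• ∩ •t2."""
        return {
            (t1, t2)
            for t1 in self.transitions
            for t2 in self.transitions
            if self.postset(t1) & self.preset(t2)
        }
```

For a sequence A → p1 → B → p2 → C, `task_edges()` yields {("A", "B"), ("B", "C")}. Checking every pair of transitions is quadratic in |T|, which is one reason to traverse places instead, as Algorithm 1 does.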


Definition 3 (Task edge graph). The task edge graph of a Petri net $N = (P, T, F)$ is the directed graph $TEG(N) = (V, E)$ with $V = TE(N)$, where for all $te_1, te_2 \in V$, $te_1$ is adjacent to $te_2$ iff $t(te_1) = s(te_2)$. In that case we write $adj(te_1, te_2) = t(te_1) = s(te_2)$ for the incident task, i.e., the task shared by $te_1$ and $te_2$, and $\langle te_1, te_2 \rangle \in E$. We denote the set of vertices of a graph $G$ by $V(G)$ and its set of edges by $E(G)$.

Definition 4 (Task edge graph modular product). The modular product of two task edge graphs $TEG_1$ and $TEG_2$, $MPG(TEG_1, TEG_2)$, is defined on the vertex set $V(TEG_1) \times V(TEG_2)$, restricted to the pairs whose vertex labels are compatible. Two vertices $(u_i, v_i)$ and $(u_j, v_j)$ of the modular product, with $u_i \neq u_j$ and $v_i \neq v_j$, are adjacent when either $(u_i, u_j) \in E(TEG_1) \wedge (v_i, v_j) \in E(TEG_2) \wedge w(u_i, u_j) = w(v_i, v_j)$, or $(u_i, u_j) \notin E(TEG_1) \wedge (v_i, v_j) \notin E(TEG_2)$. Here $w(u_i, u_j) = w(v_i, v_j)$ indicates that the labels of $adj(u_i, u_j)$ and $adj(v_i, v_j)$ are compatible. Labels are compatible when they are equal (if label similarity is not considered) or similar (if label similarity is considered).

Definition 5 (Sub-similarity). Given two business process models represented as Petri nets $N_1$ and $N_2$, we first construct the task edge graph modular product and then find the maximum clique in it, which corresponds to the maximum common subgraph of the two task edge graphs and is denoted $\omega(TEG(N_1), TEG(N_2))$. The MCES based similarity between $N_1$ and $N_2$ is measured as:

$$subSim(N_1, N_2) = \frac{|V(\omega(TEG(N_1), TEG(N_2)))|}{\min(|TE(N_1)|, |TE(N_2)|)} \tag{1}$$
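The chain from task edge graph to modular product to maximum clique to Equation (1) can be sketched as follows. This is a hedged illustration under assumptions the paper does not state: tasks are identified by their labels, so a task edge is a pair of labels; two task edges form a product vertex when both endpoint labels are compatible; and, since cliques are undirected, the Definition 4 condition is checked in both arc directions. The names `equal_labels`, `task_edge_graph`, `modular_product`, `maximum_clique`, `sub_sim`, and the `compatible` parameter are hypothetical. The clique search is plain Bron–Kerbosch with pivoting, which is exponential in the worst case.

```python
from itertools import combinations


def equal_labels(a, b):
    """Default label compatibility: exact equality (no label similarity)."""
    return a == b


def task_edge_graph(task_edges):
    """Definition 3: vertices are task edges; te1 -> te2 iff t(te1) == s(te2).
    Self-loops are omitted because they never matter for cliques."""
    vertices = set(task_edges)
    arcs = {(a, b) for a in vertices for b in vertices if a != b and a[1] == b[0]}
    return vertices, arcs


def modular_product(teg1, teg2, compatible=equal_labels):
    """Definition 4 (assumed reading for directed graphs); returns an adjacency map."""
    (verts1, arcs1), (verts2, arcs2) = teg1, teg2
    # A product vertex (u, v) exists only if both endpoint labels are compatible.
    nodes = [(u, v) for u in verts1 for v in verts2
             if compatible(u[0], v[0]) and compatible(u[1], v[1])]

    def agrees(ui, uj, vi, vj):
        in1, in2 = (ui, uj) in arcs1, (vi, vj) in arcs2
        if in1 and in2:
            # w(ui, uj) = w(vi, vj): shared tasks t(ui) and t(vi) must be compatible
            return compatible(ui[1], vi[1])
        return not in1 and not in2  # non-adjacent in both graphs

    adjacency = {n: set() for n in nodes}
    for (ui, vi), (uj, vj) in combinations(nodes, 2):
        if (ui != uj and vi != vj
                and agrees(ui, uj, vi, vj) and agrees(uj, ui, vj, vi)):
            adjacency[(ui, vi)].add((uj, vj))
            adjacency[(uj, vj)].add((ui, vi))
    return adjacency


def maximum_clique(adjacency):
    """Bron–Kerbosch with pivoting, keeping the largest maximal clique found."""
    best = set()

    def expand(r, p, x):
        nonlocal best
        if len(r) + len(p) <= len(best):
            return  # this branch cannot beat the current best
        if not p and not x:
            best = r  # r is maximal and larger than best
            return
        pivot = max(p | x, key=lambda n: len(adjacency[n] & p))
        for n in list(p - adjacency[pivot]):
            expand(r | {n}, p & adjacency[n], x & adjacency[n])
            p = p - {n}
            x = x | {n}

    expand(set(), set(adjacency), set())
    return best


def sub_sim(te1, te2, compatible=equal_labels):
    """Equation (1): |maximum clique| / min(|TE(N1)|, |TE(N2)|)."""
    denominator = min(len(te1), len(te2))
    if denominator == 0:
        return 0.0
    product = modular_product(task_edge_graph(te1), task_edge_graph(te2), compatible)
    return len(maximum_clique(product)) / denominator
```

With exact label matching, q = {(A, B), (B, C)} against {(A, B), (B, D)} gives 0.5; treating C and D as similar labels through `compatible` raises it to 1.0, because (B, C) can then be matched with (B, D).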

3 Index Construction and Query Processing

To improve query efficiency, we use an index named TaskEdgeIndex as a filter to obtain a set of candidate models, each of which contains at least a specific number of the task edges of the query condition model. Then, in the verification stage, we calculate the MCES based similarity between the query condition model and every candidate model, and discard all candidate models whose similarity is less than the specified threshold θ. Since the number of candidate models is usually much smaller than the size of the repository, query efficiency is improved. Moreover, the task edges of the models in the repository are computed when TaskEdgeIndex is constructed, which saves time during query processing.

3.1 Index Construction

The TaskEdgeIndex sets up the relation between task edges and models and has two parts (we only discuss the index at the logical level here; implementation details can be found in Section 4). One part is a forward index (denoted FI), which stores the mapping from models to task edges. Its entries have the form (m, TE), where m is a model represented as a Petri net and TE is the set of task edges of that model. FI can be used to obtain the task edges of a model. The other part is an inverted index (denoted II), which stores the mapping from task edges to models. Its entries have the form (te, te.list), where te is a task edge and te.list is the set of models in which te appears. II can be used to obtain all the models that contain a specific task edge.

Given a model represented as a Petri net, Algorithm 1 extracts all of its task edges: every place in the model is traversed, and the task edges passing through that place are extracted.

Algorithm 1. Task edges extraction (getTaskEdges)
input: a model m = (P, T, F) represented as a Petri net
output: all the task edges in the given model m
  TE = ∅;
  foreach p in P do
    foreach tpre ∈ •p do
      foreach tsuc ∈ p• do
        TE.add(⟨tpre, tsuc⟩);
  return TE;

Based on Algorithm 1, the TaskEdgeIndex can be constructed as described in Algorithm 2. As Algorithm 2 shows, when a new model is added to the repository, the index can be updated incrementally. When a model is deleted from the repository, the mappings between the model and its task edges can be deleted from TaskEdgeIndex directly.

Algorithm 2. Add a model to TaskEdgeIndex
input: a model m represented as a Petri net
  TE = getTaskEdges(m);
  foreach te in TE do
    FI.add(m, te);
    II.add(te, m);
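A minimal in-memory sketch of Algorithms 1 and 2, plus the deletion described above, follows. It works only at the logical level and is not the implementation described in Section 4. Representing a model as a set of places plus a set of arcs, using string model identifiers, and the names `get_task_edges`, `TaskEdgeIndex`, `add_model`, `remove_model`, `task_edges_of`, and `model_set` are all assumptions for illustration.

```python
from collections import defaultdict


def get_task_edges(places, arcs):
    """Algorithm 1: for every place p, pair each t_pre in •p with each t_suc in p•."""
    pre = defaultdict(set)   # pre[p]  = •p
    post = defaultdict(set)  # post[p] = p•
    for src, dst in arcs:
        if dst in places:    # transition -> place arc
            pre[dst].add(src)
        elif src in places:  # place -> transition arc
            post[src].add(dst)
    task_edges = set()
    for p in places:
        for t_pre in pre[p]:
            for t_suc in post[p]:
                task_edges.add((t_pre, t_suc))
    return task_edges


class TaskEdgeIndex:
    """Logical TaskEdgeIndex: forward index FI and inverted index II."""

    def __init__(self):
        self.fi = {}                # FI: model id -> set of task edges
        self.ii = defaultdict(set)  # II: task edge -> set of model ids (te.list)

    def add_model(self, model_id, places, arcs):
        """Algorithm 2: incremental insertion of one model."""
        self.remove_model(model_id)  # assumed policy: replace any older version
        task_edges = get_task_edges(places, arcs)
        self.fi[model_id] = task_edges
        for te in task_edges:
            self.ii[te].add(model_id)

    def remove_model(self, model_id):
        """Deletion: drop the FI entry and the model's postings in II."""
        for te in self.fi.pop(model_id, set()):
            self.ii[te].discard(model_id)
            if not self.ii[te]:
                del self.ii[te]

    def task_edges_of(self, model_id):
        """FI lookup (FI.getTaskEdges in Algorithm 3)."""
        return set(self.fi.get(model_id, ()))

    def model_set(self, te):
        """II lookup (II.getModelSet in Algorithm 3)."""
        return set(self.ii.get(te, ()))
```

Re-adding an existing model identifier first removes its old postings; this is an assumed policy that keeps FI and II consistent when a model is revised. Deletion touches only the posting lists of the removed model's own task edges, so it stays cheap.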

3.2 Query Processing

Based on TaskEdgeIndex, query processing is divided into two stages: the filtering stage and the refinement stage. First, in the filtering stage, we extract all the task edges from the query condition model q using Algorithm 1, and use the inverted index (II) to obtain the set of candidate models in which at least θ × |TE(q)| task edges of the query condition model appear. Second, in the refinement stage, we calculate the MCES based similarity between every candidate model and the query condition model using Equation 1. If the similarity is less than the specified threshold θ, the candidate model is removed from the candidate set. Finally, all the models satisfying the user's requirement are returned.

Algorithm 3. Retrieve the similar models
input: a query condition model q, and the model similarity threshold θ
output: all the models satisfying the requirement
  // filtering stage
  qTE = getTaskEdges(q);
  ret = ∅;
  foreach te in qTE do
    ret.add(II.getModelSet(te));
  foreach c in ret do
    mTE = FI.getTaskEdges(c);
    if |mTE ∩ qTE| < θ × |qTE| then
      ret.remove(c);
  // refinement stage
  foreach c in ret do
    if subSim(c, q) < θ then
      ret.remove(c);
  return ret;
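Algorithm 3 can be sketched in Python as follows, assuming the forward and inverted indexes are plain dictionaries (FI: model id → task-edge set, II: task edge → model-id set) and that the similarity of Equation (1) is supplied as a callable over task-edge sets. The function name `retrieve_similar` and its parameters are hypothetical. Building new sets rather than calling `remove` while iterating sidesteps the error Python raises when a set changes size during iteration.

```python
from typing import Callable, Dict, Hashable, Set, Tuple

TaskEdge = Tuple[str, str]


def retrieve_similar(
    q_te: Set[TaskEdge],
    theta: float,
    fi: Dict[Hashable, Set[TaskEdge]],
    ii: Dict[TaskEdge, Set[Hashable]],
    sub_sim: Callable[[Set[TaskEdge], Set[TaskEdge]], float],
) -> Set[Hashable]:
    """Filtering-refinement retrieval in the spirit of Algorithm 3."""
    # Filtering stage, step 1: union of the II posting lists of the query's task edges.
    candidates: Set[Hashable] = set()
    for te in q_te:
        candidates |= ii.get(te, set())
    # Filtering stage, step 2: keep models sharing at least theta * |qTE| task edges.
    candidates = {c for c in candidates if len(fi[c] & q_te) >= theta * len(q_te)}
    # Refinement stage: verify the survivors with the (expensive) similarity function.
    return {c for c in candidates if sub_sim(fi[c], q_te) >= theta}
```

Any similarity function with the signature `sub_sim(model_te, query_te)` can be plugged in, for example an MCES based one; the expensive refinement runs only on the candidates that survive the task-edge count filter.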
