Workflow Similarity Measure for Process Clustering in Grid
Yi Wang, Minglu Li, Jian Cao, Xinhua Lin, Feilong Tang
Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
{wangsuper, li-ml, cao-jian, lin-xh, tang-fl}@cs.sjtu.edu.cn
Abstract
In a grid environment, a workflow process can be seen not only as a cooperative approach to using grid services and resources, but also as reusable and sharable knowledge for solving a specific problem. Research on grid workflow process clustering can promote knowledge discovery and reuse in the grid. In this paper, we put forward a grid workflow process design method using Event-Condition-Action (ECA) rules, and propose a new process similarity measure approach. We then use a case to demonstrate the feasibility of the approach and briefly show how to revise an existing clustering algorithm with the similarity measure.
1. Introduction
Grid workflow plays an increasingly important role in emerging grid technology. With the trend of merging grid and service-oriented technology, the grid has become a distributed Problem Solving Environment (PSE) shared among different users, and a grid workflow process can be seen not only as a cooperative approach to using grid services and resources, but also as sharable knowledge for solving a specific problem. It is therefore necessary to cluster grid workflow processes, reducing the large number of raw processes by categorizing them into smaller sets of similar items. We give a novel approach for calculating the process similarity of ECA rule-based grid workflows, and introduce a similarity-based algorithm for grid workflow clustering.
The remainder of this paper is organized as follows. Section 2 reviews related work. Section 3 briefly introduces the ECA rule and analyzes how it supports typical workflow patterns. Section 4 proposes a new process similarity measure approach based on the comparison of ECA rules. Section 5 uses a similarity measure case to demonstrate the feasibility of our approach. The last section concludes the paper and briefly points out future work.
2. Related Works
The approach proposed in [1] clusters the execution traces or logs of processes using k-means clustering. Ref. [2] puts forward a process similarity measure based on both domain classification and pattern analysis. Ref. [3] converts each workflow dependency graph into a binary branch vector and takes the distance between the binary branch vectors as the distance between two processes. An inexact process matching approach is introduced in [4]; it uses ontology paths to calculate the distance between two activities and gives some rules for similarity comparison. A weighted graph is introduced in [5] for comparing processes, where the graph similarity is the weighted sum of the similarity between sets of services and the similarity between sets of service links.
3. Process Design Based on ECA Rule
The Event-Condition-Action (ECA) rule was put forward in the research field of active databases [6]. Such rules make data repositories react to internal or external events and trigger a chain of activities that includes notifying users and applications or performing database updates. This behavior is similar to that of a business process, so we developed a grid workflow management system based on ECA rules [7].
Figure 1. ECA Rule
As shown in Figure 1, an ECA rule essentially consists of two parts: an event and a list of condition-action pairs. When the event occurs, the list of conditions is evaluated; for each condition that is satisfied, the corresponding action is executed. A formal definition of ECA rule-based workflow can be found in [7].
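The structure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the formal definition from [7]: the names `ECARule`, `fire`, and the dictionary-based context are hypothetical, and events are represented as plain strings such as "EndOf(a)".

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: the workflow state seen by conditions and actions.
Context = Dict[str, object]
Condition = Callable[[Context], bool]
Action = Callable[[Context], None]


@dataclass
class ECARule:
    """One event plus an ordered list of condition-action pairs (assumed layout)."""
    event: str                                   # e.g. "EndOf(a)"
    pairs: List[Tuple[Condition, Action]] = field(default_factory=list)

    def fire(self, occurred: str, ctx: Context) -> int:
        """Evaluate every condition when the rule's event occurs.

        Runs the action paired with each satisfied condition and returns
        the number of actions that were executed.
        """
        if occurred != self.event:
            return 0
        executed = 0
        for condition, action in self.pairs:
            if condition(ctx):
                action(ctx)
                executed += 1
        return executed


# Usage: an exclusive choice after activity a, written as two pairs.
started: List[str] = []
rule = ECARule(
    event="EndOf(a)",
    pairs=[
        (lambda ctx: ctx["x"] > 0, lambda ctx: started.append("b")),
        (lambda ctx: ctx["x"] <= 0, lambda ctx: started.append("c")),
    ],
)
rule.fire("EndOf(a)", {"x": 5})   # only the first condition holds
```

Keeping the pairs in a list mirrors the "list of condition-action pairs" wording; whether the original system evaluates them in order or in parallel is not stated in the text.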
Fourth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2007) 0-7695-2874-0/07 $25.00 © 2007
Table 1 shows the ECA rules for the typical workflow patterns. It only summarizes the ECA rules triggered by the EndOf(a) event. We also defined other events, such as "BeginOf(a)" and "ErrorOf(a)".
Table 1. ECA Rules for Basic Workflow Patterns
4.2. Event Similarity Measure
4.2.1 Atomic Event Similarity Measure
Events are divided into atomic events and complex events. Table 3 shows how to evaluate the similarity between two atomic events ae and ae', where a and a' are the activities related to ae and ae' respectively, and AESim(ae, ae') represents the similarity of ae and ae'. If the category (such as Begin, End, Error) of ae is the same as the category of ae', then AESim(ae, ae') = ASim(a, a'); otherwise AESim(ae, ae') = 0.
Table 3. Atomic Event Similarity Measure
Cate(ae) is same as Cate(ae')    AESim(ae, ae')
Yes                              ASim(a, a')
No                               0
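As a small illustration of Table 3, the rule can be written as a function that receives the activity similarity ASim as a parameter. The class name `AtomicEvent`, its fields, and the function name `ae_sim` are hypothetical and introduced only for this sketch.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AtomicEvent:
    category: str   # "Begin", "End" or "Error"
    activity: str   # name of the activity the event refers to (assumed to be a string)


def ae_sim(ae: AtomicEvent, ae2: AtomicEvent,
           asim: Callable[[str, str], float]) -> float:
    """AESim(ae, ae'): ASim of the related activities if the categories match, else 0."""
    if ae.category != ae2.category:
        return 0.0
    return asim(ae.activity, ae2.activity)


# Usage with a toy activity similarity (an assumption, not the paper's ASim).
def toy_asim(a: str, b: str) -> float:
    return 1.0 if a == b else 0.5


same_cat = ae_sim(AtomicEvent("End", "a"), AtomicEvent("End", "b"), toy_asim)
diff_cat = ae_sim(AtomicEvent("End", "a"), AtomicEvent("Begin", "a"), toy_asim)
```

Passing ASim in as a callable keeps the event-level rule independent of how the activity similarity itself is computed.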
4.2.2 Complex Event Similarity Measure
4. Process Similarity Measure
4.1. Activity Similarity Measure
The activity distance measure estimates the functional dissimilarity between two activities. Here we borrow the idea from [4] with some modifications. We use ADis(a, a') to represent the distance between activities a and a'.
Table 2. Activity Distance Measure
Cate(a)       Cate(a')      LinkNumber(a, a')    ADis(a, a')
Start         Start         no ontology          0
End           End           no ontology          0
Delay         Delay         no ontology          0
Assign        Assign        no ontology          0
Service       Service       n                    n
Other cases                                      +∞

Table 2 shows the value of ADis(a, a') in the different cases. If the categories of a and a' are the same and are not Service, ADis(a, a') is 0. If a and a' are both service activities, we count the minimal number of links n from a to a' in the ontology tree; in particular, if a is the same as a', ADis(a, a') = 0. In all other cases, ADis(a, a') = +∞. After obtaining the distance between activities a and a', we can calculate the activity similarity as:

$$\mathrm{ASim}(a, a') = \frac{1}{\mathrm{ADis}(a, a') + 1} \tag{1}$$
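Table 2 and equation (1) can be sketched as follows. This is an illustrative sketch under stated assumptions: the ontology tree is given as a hypothetical child-to-parent dictionary, each service activity carries an ontology concept name, and the names `Activity`, `link_number`, `adis`, and `asim` are not from the paper.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional

NON_SERVICE = {"Start", "End", "Delay", "Assign"}


@dataclass(frozen=True)
class Activity:
    category: str                  # "Start", "End", "Delay", "Assign" or "Service"
    concept: Optional[str] = None  # ontology concept, used only for Service activities


def link_number(a: str, b: str, parent: Dict[str, str]) -> float:
    """Number of links on the tree path between concepts a and b (inf if unconnected)."""
    depth_from_a = {}
    node, d = a, 0
    while node is not None:            # record every ancestor of a with its distance
        depth_from_a[node] = d
        node, d = parent.get(node), d + 1
    node, d = b, 0
    while node is not None:            # climb from b until a shared ancestor is met
        if node in depth_from_a:
            return depth_from_a[node] + d
        node, d = parent.get(node), d + 1
    return math.inf


def adis(a: Activity, b: Activity, parent: Dict[str, str]) -> float:
    """ADis(a, a') following the rows of Table 2."""
    if a.category == b.category and a.category in NON_SERVICE:
        return 0.0
    if a.category == "Service" and b.category == "Service":
        return link_number(a.concept, b.concept, parent)
    return math.inf                    # "other cases"


def asim(a: Activity, b: Activity, parent: Dict[str, str]) -> float:
    """Equation (1): ASim = 1 / (ADis + 1); an infinite distance gives 0."""
    return 1.0 / (adis(a, b, parent) + 1.0)


# Usage with a made-up ontology fragment (child -> parent).
ontology = {"Blur": "Filter", "Sharpen": "Filter", "Filter": "ImageOp", "Resize": "ImageOp"}
blur, sharpen = Activity("Service", "Blur"), Activity("Service", "Sharpen")
sim = asim(blur, sharpen, ontology)    # two links apart, so 1 / (2 + 1)
```

With this mapping, identical activities get similarity 1, and activities of incompatible categories get similarity 0, which matches the limits implied by equation (1).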
A complex event can be expressed as a sequence of atomic events connected by the logic nodes "∧" and "∨". A complex event has exactly one principal disjunctive normal form (PDNF). So, before calculating the similarity between two complex events, we always first transform them into atomic event sequences in PDNF. Assume ce and ce' are two complex events, and let CESim(ce, ce') represent the similarity of ce and ce'. We proceed as follows:
a. Transform ce and ce' into PDNF of atomic event sequences. Assume that ce and ce' have $n_{cc}$ and $n'_{cc}$ clauses respectively, with $n_{cc} \ge n'_{cc}$.

$$ce = \bigvee_{i=1}^{n_{cc}} \Big( \bigwedge_{j=1}^{k_i} ce.cc_i.ae_j \Big) \tag{2}$$

$$ce' = \bigvee_{i=1}^{n'_{cc}} \Big( \bigwedge_{j=1}^{k'_i} ce'.cc'_i.ae'_j \Big) \tag{3}$$

In (2), $ce.cc_i$ is the $i$th clause of ce, $ce.cc_i.ae_j$ is the $j$th atomic event in $ce.cc_i$, and $k_i$ is the number of atomic events in $ce.cc_i$. The symbols in (3) have analogous meanings.
b. Take $n'_{cc}$ clauses from ce arbitrarily; we can get $P_{n_{cc}}^{n'_{cc}}$ disjunctive forms as

$$ce^*[p] = \bigvee_{i=1}^{n'_{cc}} \Big( \bigwedge_{j=1}^{k^*_i} ce^*[p].cc_i.ae_j \Big) \tag{4}$$

where $P_{n_{cc}}^{n'_{cc}}$ denotes the number of permutations of $n'_{cc}$ clauses taken from $n_{cc}$ clauses, 0 [...] = $n'_r$. Take $n'_r$ rules from Pr arbitrarily; we can get $P_{n_r}^{n'_r}$ rule permutations as
Figure 3. Image Processing Workflow B
$$Pr^*[p] = \big( Pr^*[p].r_1, \ldots, Pr^*[p].r_i, \ldots, Pr^*[p].r_{n'_r} \big) \tag{18}$$

where $P_{n_r}^{n'_r}$ denotes the number of permutations of $n'_r$ rules taken from process Pr, 0