Process Mining: Fuzzy Clustering and Performance Visualization

B.F. van Dongen and A. Adriansyah

Eindhoven University of Technology, P.O. Box 513, NL-5600 MB, Eindhoven, The Netherlands

Abstract. The goal of performance analysis of business processes is to gain insights into operational processes, for the purpose of optimizing them. To show intuitively which parts of the process might be improved, performance analysis results can be projected onto process models. This way, bottlenecks can quickly be identified and resolved. Unfortunately, for many operational processes, good models that describe the process accurately and intuitively are unavailable. Process mining, or more precisely, process discovery, aims at deriving such models from events logged by information systems. However, many mining techniques assume that all events in an event log are logged at the same level of abstraction, which in practice is often not the case. Furthermore, many mining algorithms produce results that are hard for process specialists to understand. In this paper, we propose a simple clustering algorithm to derive a model from an event log, such that this model contains only a limited set of nodes and edges. Each node represents a set of activities performed in the process, and the relation is many-to-many: a node can refer to many activities and an activity can be referred to by many nodes. Using the discovered model, which represents the process at a potentially high level of abstraction, we present two different ways to project performance information onto it. Using these performance projections, process owners can gain insights into the process under consideration in an intuitive way. To validate our approach, we apply our work to a real-life case from a Dutch municipality.

S. Rinderle-Ma et al. (Eds.): BPM 2009 Workshops, LNBIP 43, pp. 158–169, 2010. © Springer-Verlag Berlin Heidelberg 2010

1 Introduction

The goal of performance analysis of business processes is to gain insights into operational processes, for the purpose of optimizing them. Traditionally, such analysis is presented to problem owners in the form of simple pictures, such that the results are easily interpretable. Many Business Process Intelligence (BPI) tools use event logs to derive performance information. Typical performance statistics include throughput times of activities, utilization rates of departments, and so on.

Within the research domain of process mining, process discovery aims at constructing a process model as an abstract representation of an event log. The goal is to build a model (e.g., a Petri net, an EPC, etc.) that provides insights into the control flow captured in the log. Both research tools like ProM and industrial tools like Protos support the construction of Petri nets from event logs.

A recent Master project by Riemers [19] in a large Dutch hospital has shown that combining process discovery and performance analysis is far from trivial. In his work, Riemers designed a method for using a combination of process discovery and performance analysis techniques to improve processes in a healthcare setting. When applying this method to the logs of the hospital, he identified some problems regarding the applicability of process discovery techniques. The medical specialists from the hospital found it very hard to understand the complex process models constructed by various process discovery techniques. Instead, they thought of their treatment processes as simple, sequential processes, with very little deviation from a main path. Therefore, when presented with performance information projected onto the discovered models, they were unable to interpret it directly.

Using the experience of Riemers, as well as our own experiences in applying process mining to real-life logs taken from industry [4], we identified the main reason why combining process mining and performance analysis is difficult: process discovery algorithms are not capable of identifying events that occurred at different levels of abstraction. In the hospital, for example, a large number of events were all different, but should be considered as one activity called “clinical chemistry”.

In this paper, we present an algorithm for clustering events automatically to a desired level of abstraction. After discussing some related work in Section 2 and preliminaries in Section 3, Section 4 presents a clustering algorithm for the discovery of Simple Precedence Diagrams (SPDs) at a given level of abstraction. In Section 5, we show how to project performance information onto SPDs in two different ways. In order to show that our approach is applicable to real-life situations, we used a dataset called “bezwaar WOZ” from a Dutch municipality [18, 20]. The process described in this log is the process of handling objections filed against real estate taxes.¹ We conclude the paper with some conclusions and future work in Section 6.

2 Related Work

Many researchers have investigated the process discovery problem, i.e. the problem of how to discover a process model from event logs. We refer to [3,6] and the process mining website www.processmining.org for a complete overview of the whole research domain. Interestingly, most current discovery techniques derive process models from a log assuming that all events in the log refer to an activity and that these activities occur at the same level of abstraction.

¹ The approach presented in this paper is implemented in the pre-release version of the ProM framework 2008, which can be obtained from www.processmining.org.

The so-called fuzzy miner [15] was the first attempt to let go of the assumption that activities occur at the same level of abstraction. The graph-based models produced by the fuzzy miner have two types of nodes, namely nodes that refer to one activity and nodes that refer to several activities (clusters). Therefore, the model is able to provide a high-level view of a process by abstracting from undesired details. However, the fuzzy miner still assumes that each event in the log belongs to one of these nodes, i.e. there is a one-to-many relation: each node can represent many activities, but each activity is represented by exactly one node. Our approach relaxes this restriction to a many-to-many relation.

In [5], a technique is presented in which the relation between nodes and activities is truly many-to-many. However, this technique first constructs a state space from the log and then uses the theory of regions [9,10,12,13,14] to construct a Petri net from that state space. As the theory of regions has a worst-case complexity that is exponential in the size of the state space, the second step is clearly the bottleneck and therefore this technique is less applicable to real-life logs.

Currently, very few techniques are available to project performance-related information onto discovered process models. Instead, a comparison of commercial process monitoring tools in [16] showed that performance values are either (1) measured with the requirement of having a user-defined process model directly linking events in the log to parts of the model or (2) measured totally independently of any process model. An exception is the work presented in [2], where performance indicator values are derived from timed workflow logs. Before the performance measures are calculated, a process model in the form of a colored workflow net is extracted from the logs using an extension of the α algorithm [7]. Then, the logs are replayed in the resulting net to derive performance measurements. Unfortunately, this approach relies on the discovered model to fit the log, i.e. each case in the log should be a trace in the discovered Petri net. On complex or less-structured logs, this often implies that the Petri net becomes “spaghetti-like”, showing all details without distinguishing what is important and what is not [1]. Hence, it is difficult for process owners to obtain any useful insights from these models.

For the fuzzy models mentioned earlier, animation techniques are available to visually represent the execution captured in the log in the model. Using this animation, possible deadlock activities can be identified. Even so, no performance values can be obtained from the proposed animation approach. In the performance visualization we propose, measurements are dependent on the chosen process model and projected onto that model to provide intuitive insights into the process’ performance.

3 Preliminaries

In this section, we formally introduce some concepts we use in the remainder of this paper. We start by defining event logs. An event log is a collection of events, such that each event occurred at a given point in time. Furthermore, each event relates to an instance of an activity and occurred within the context of a specific case.


Definition 3.1 (Event Logs). An event log $W$ is defined as $W = (E, I, A, C, t, i, a, c)$, where:

– $E$ is the set of events,
– $I$ is the set of activity instances,
– $A$ is the set of activities,
– $C$ is the set of cases,
– $t : E \to \mathbb{R}^+_0$ is a function assigning a timestamp to each event,
– $i : E \to I$ is a function relating each event to an activity instance,
– $a : I \to A$ is a function relating each activity instance to an activity, and
– $c : I \to C$ is a function relating each activity instance to a case.
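As an illustration of Definition 3.1, the following minimal sketch shows one way such a log could be held in memory. The class and field names (`Event`, `EventLog`, `traces`) and the sample data are assumptions made for this example; they are not taken from ProM. The functions i, a and c are stored as plain dictionaries.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: str
    timestamp: float  # t(e), assumed non-negative


@dataclass
class EventLog:
    events: list       # E
    instance_of: dict  # i : event id -> activity instance
    activity_of: dict  # a : activity instance -> activity
    case_of: dict      # c : activity instance -> case

    def activity(self, e):
        return self.activity_of[self.instance_of[e.event_id]]

    def case(self, e):
        return self.case_of[self.instance_of[e.event_id]]

    def traces(self):
        """Activities per case, ordered by timestamp (total order assumed)."""
        by_case = defaultdict(list)
        for e in self.events:
            by_case[self.case(e)].append(e)
        return {
            case: [self.activity(e) for e in sorted(es, key=lambda ev: ev.timestamp)]
            for case, es in by_case.items()
        }


# Hypothetical log: instance "i1" produced two events (e.g. start and complete).
log = EventLog(
    events=[Event("e1", 0.0), Event("e2", 1.0), Event("e3", 2.0), Event("e4", 0.5)],
    instance_of={"e1": "i1", "e2": "i1", "e3": "i2", "e4": "i3"},
    activity_of={"i1": "register", "i2": "decide", "i3": "register"},
    case_of={"i1": "case-1", "i2": "case-1", "i3": "case-2"},
)
print(log.traces())
```

Keeping the event-to-instance mapping separate from the instance-to-activity mapping mirrors the definition: several events may belong to the same activity instance.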

It is important to realize that in practical applications, event logs rarely adhere to Definition 3.1. Instead, references to activity instances are often missing, or events cannot even be related to cases directly. However, for the formalizations in this paper, we assume that logs do adhere to Definition 3.1. Furthermore, we assume that the timestamps define a total ordering on the events relating to the same case.

In this paper, we use a case study taken from a Dutch municipality [18, 20]. From the log that originally contains 1982 cases, we only kept those cases that started after the beginning of the measurement period. This resulted in a log containing 1448 cases. In total, 37,470 events were recorded, relating to 16 activities.

Another important concept for this paper is the notion of a process model. Recall from the introduction that we later want to project performance information onto such a process model, where the process model can be a discovered model, but also a given model. Because of the latter requirement, we need to define process models at a very high level. Therefore, we introduce the concept of a Simple Precedence Diagram (SPD).

Definition 3.2 (Simple Precedence Diagram). A Simple Precedence Diagram is defined as S = (N, L), where N is the set of nodes in the model and L ⊆ N × N is a set of edges linking the nodes.

An SPD is simply a directed graph consisting of nodes and edges and should be seen as a conceptual model of a process. The nodes in an SPD identify activities in a very loose way, i.e. these nodes do not correspond one-to-one with activities in an event log. Furthermore, the edges define some notion of control flow, without specifying their semantics formally. The reason for using such a conceptual model is that in performance analysis, it is often interesting to ask the domain expert to sketch a model of the process s/he wants to analyze. The resulting model is rarely an exact copy of the process model as it is being executed. Instead, the resulting model will be a high-level view of the actual process. Nonetheless, we want to project performance characteristics onto such a model. Therefore, it is the relation between a log and an SPD that makes SPDs useful for performance analysis. For this purpose, we define a connected SPD (cSPD).


Definition 3.3 (Connected Simple Precedence Diagram). Let $W = (E, I, A, C, t, i, a, c)$ be an event log and $S = (N, L)$ an SPD. We say that $S_c = (W, S, l_a, l_n)$ is a connected SPD, where $l_a : A \to \mathcal{P}(N) \setminus \{\emptyset\}$ and $l_n : N \to \mathcal{P}(A) \setminus \{\emptyset\}$, such that for all $a \in A$ and $n \in N$ it holds that $n \in l_a(a) \Leftrightarrow a \in l_n(n)$.

A cSPD is a combination of a log and an SPD, such that each activity in the log is represented by one or more nodes in the SPD and each node in the SPD refers to one or more activities in the log. It is important to realize here that the connection between a log and an SPD is made at the level of activities, not at the level of activity instances.

Obviously, for each log, a trivial cSPD can be constructed that contains only one node and no edges. Although this model can be interesting, especially for projecting performance information onto, we are typically interested in slightly more elaborate models. Therefore, we introduce an algorithm for the discovery of SPD models.
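The consistency condition of Definition 3.3 can be checked mechanically. The sketch below assumes that the two mappings are given as Python dictionaries from names to sets; the function names `is_connected_spd` and `invert` and the sample activities and nodes are hypothetical.

```python
def invert(l_a, nodes):
    """Derive l_n from l_a: each node maps to the activities that refer to it."""
    return {n: {a for a, ns in l_a.items() if n in ns} for n in nodes}


def is_connected_spd(activities, nodes, l_a, l_n):
    """Check the requirements of Definition 3.3 on the two mappings."""
    # Every activity maps to a non-empty set of nodes, and vice versa.
    if any(not l_a.get(a) for a in activities):
        return False
    if any(not l_n.get(n) for n in nodes):
        return False
    # n in l_a(a) must hold exactly when a in l_n(n).
    return all((n in l_a[a]) == (a in l_n[n]) for a in activities for n in nodes)


activities = {"A", "B", "C"}
nodes = {"n1", "n2"}
l_a = {"A": {"n1"}, "B": {"n1", "n2"}, "C": {"n2"}}  # B belongs to both nodes
l_n = invert(l_a, nodes)

print(is_connected_spd(activities, nodes, l_a, l_n))  # consistent mappings
broken = {"n1": {"A"}, "n2": {"B", "C"}}              # n1 no longer lists B
print(is_connected_spd(activities, nodes, l_a, broken))
```

Because the two mappings determine each other, an implementation would typically store only one of them and derive the other, as `invert` does here.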

4 Discovering SPDs

When we introduced SPDs, we noted that it is possible for an expert to sketch such a model of the process under consideration. However, we feel that it is also necessary to provide a simple algorithm to construct cSPDs from event logs. For this purpose, we use a straightforward fuzzy clustering algorithm.

The goal of clustering algorithms is to divide observations over a number of subsets (or clusters), such that the observations in each of these clusters are similar in some sense. The clustering algorithm we use follows this concept in a very simple way. First, we define a similarity metric on activities. Then, we choose a number of clusters and use a fuzzy k-medoids algorithm to divide the activities over the clusters, while maximizing the similarity of activities in each cluster.

Such a fuzzy k-medoids algorithm requires two metrics, namely (1) a measure for the (dis)similarity of objects (activities in our case) and (2) a measure for the probability that an object belongs to a cluster of which another object is the medoid. We define both metrics based on direct succession of events.

Definition 4.1 (Event Succession). Let $W = (E, I, A, C, t, i, a, c)$ be an event log. We define $>_W : A \times A \to \mathbb{N}$ as a function counting how often events from two activities directly succeed each other in all cases, i.e. for $a_1, a_2 \in A$, we say that

$$>_W(a_1, a_2) = \#_{e_1, e_2 \in E} \big( t(e_1) < t(e_2) \wedge a(i(e_1)) = a_1 \wedge a(i(e_2)) = a_2 \wedge c(i(e_1)) = c(i(e_2)) \wedge \nexists_{e_3 \in E} ( c(i(e_3)) = c(i(e_1)) \wedge t(e_1) < t(e_3) < t(e_2) ) \big).$$

We use the notation $a_1 >_W a_2$ to denote $>_W(a_1, a_2) > 0$.

The similarity between activities is defined by looking at how often events relating to these activities follow each other directly in the log. If events relating to these activities follow each other more often, then the similarity increases. Note that if two activities $a_1, a_2$ are different, their similarity is never equal to 1 as $>_W(a_1, a_2) \neq >_W(a_2, a_1)$.
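The direct-succession count of Definition 4.1 can be sketched as follows. The sketch assumes a simplified log in which every event is given as a `(case, activity, timestamp)` tuple and timestamps are totally ordered within a case, so that counting consecutive pairs per case is equivalent to the definition. The function name `succession_counts` and the sample events are hypothetical.

```python
from collections import Counter, defaultdict


def succession_counts(events):
    """Count how often an event of a1 is directly followed by an event of a2
    within the same case. `events` holds (case, activity, timestamp) tuples."""
    by_case = defaultdict(list)
    for case, activity, timestamp in events:
        by_case[case].append((timestamp, activity))

    counts = Counter()
    for trace in by_case.values():
        trace.sort(key=lambda pair: pair[0])  # order the case by timestamp
        for (_, a1), (_, a2) in zip(trace, trace[1:]):
            counts[(a1, a2)] += 1
    return counts


events = [
    ("c1", "C", 3), ("c1", "A", 1), ("c1", "B", 2),  # deliberately unsorted
    ("c2", "A", 5), ("c2", "B", 6), ("c2", "B", 7),
]
counts = succession_counts(events)
print(dict(counts))
```

Pairs that never occur are simply absent from the counter, which corresponds to a count of 0.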


Definition 4.2 (Activity Similarity). Let $W = (E, I, A, C, t, i, a, c)$ be an event log. We define the similarity $\sigma : A \times A \to (0, 1]$ between two activities $a_1, a_2 \in A$, such that if $a_1 = a_2$ then $\sigma(a_1, a_2) = 1$, otherwise

$$\sigma(a_1, a_2) = \frac{>_W(a_1, a_2) + >_W(a_2, a_1) + 1}{1 + 2 \cdot \max_{a_3, a_4 \in A} \left( >_W(a_3, a_4) \right)}.$$
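To make the similarity measure of Definition 4.2 concrete, the sketch below computes it from a dictionary of direct-succession counts. The function name `similarity` and the counts are made up for illustration.

```python
def similarity(counts, a1, a2):
    """Similarity of two activities from direct-succession counts, where
    `counts` maps (a1, a2) pairs to the number of direct successions."""
    if a1 == a2:
        return 1.0
    max_count = max(counts.values(), default=0)
    both_directions = counts.get((a1, a2), 0) + counts.get((a2, a1), 0)
    return (both_directions + 1) / (1 + 2 * max_count)


# Hypothetical counts: A is followed by B three times, B by A once, B by C twice.
counts = {("A", "B"): 3, ("B", "A"): 1, ("B", "C"): 2}

print(similarity(counts, "A", "B"))  # (3 + 1 + 1) / (1 + 2 * 3) = 5/7
print(similarity(counts, "A", "C"))  # (0 + 0 + 1) / 7 = 1/7
```

The added 1 in the numerator keeps the similarity strictly positive for activities that never follow each other, which matters later when its inverse is used as a dissimilarity.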

As stated before, we also need a measure for the probability that an activity belongs to a cluster of which another activity is the medoid. For this purpose, we use the FCM membership model from [11].

Definition 4.3 (Cluster Membership Probability). Let $W = (E, I, A, C, t, i, a, c)$ be an event log. Furthermore, let $A^k \subseteq A$ with $|A^k| = k$ be a set of medoids, each being the medoid of a cluster. For all $a_1 \in A^k$ and $a_2 \in A$ we define $u(a_1, a_2)$ to denote the probability that $a_2$ belongs to the cluster of which $a_1$ is the medoid, i.e. $u : A^k \times A \to [0, 1]$, where

$$u(a_1, a_2) = \frac{\sigma(a_1, a_2)^{\frac{1}{m-1}}}{\sum_{a_3 \in A^k} \sigma(a_3, a_2)^{\frac{1}{m-1}}}.$$

Note that $m \in [1, \infty)$ here denotes the so-called “fuzzifier”, which for this paper we fixed at $m = 2$, i.e. $u(a_1, a_2) = \frac{\sigma(a_1, a_2)}{\sum_{a_3 \in A^k} \sigma(a_3, a_2)}$.
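The membership probability can be sketched in a few lines. The sketch assumes that the similarity is available as a function `sigma(a1, a2)` and that m > 1, so that the exponent 1/(m−1) is defined. The similarity table `SIGMA` is a made-up example, not data from the case study.

```python
def membership(sigma, medoids, a, m=2.0):
    """Return u(medoid, a) for every medoid: the similarity to that medoid,
    raised to 1 / (m - 1) and normalised over all medoids. Requires m > 1."""
    exponent = 1.0 / (m - 1.0)
    weights = {med: sigma(med, a) ** exponent for med in medoids}
    total = sum(weights.values())
    return {med: w / total for med, w in weights.items()}


# Hypothetical symmetric similarity table for three activities.
SIGMA = {
    frozenset(("A", "B")): 5 / 7,
    frozenset(("A", "C")): 1 / 7,
    frozenset(("B", "C")): 3 / 7,
}


def sigma(x, y):
    return 1.0 if x == y else SIGMA[frozenset((x, y))]


u_b = membership(sigma, ["A", "C"], "B")  # B is not a medoid
u_a = membership(sigma, ["A", "C"], "A")  # A is itself a medoid
print(u_b, u_a)
```

With m = 2 the memberships are the similarities divided by their sum, so for each activity they add up to 1. A medoid does not get membership 1 in its own cluster, only the largest share.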

Using the cluster membership and the similarity functions, we can introduce the fuzzy k-medoids algorithm.

Definition 4.4 (Fuzzy k-Medoids Algorithm). Let $W = (E, I, A, C, t, i, a, c)$ be an event log. Furthermore, let $0 < k \leq |A|$ be the desired number of clusters. We search a set of medoids $A^k \subseteq A$ with $|A^k| = k$, such that this set minimizes

$$\sum_{a \in A} \sum_{a^k \in A^k} \left( u(a^k, a)^m \cdot \sigma(a, a^k)^{-1} \right).$$

For our implementation, we implemented the algorithm presented in [17] in ProM 2008.² This algorithm does not guarantee to find the global minimum. Furthermore, the result depends on an initial random selection of medoids, which could result in non-determinism. However, the algorithm is fast, which is more important for our purpose.

² ProM 2008 is not released yet, but available from www.processmining.org.

Finally, after the medoids have been found, we need to construct an SPD. Obviously, the found clusters correspond to the nodes in the SPD model, thereby also providing the mapping between activities in the log and nodes in the model. The edges, however, are again constructed using the succession relation defined earlier.

Definition 4.5 (cSPD Mining Algorithm). Let $W = (E, I, A, C, t, i, a, c)$ be an event log and let $A^k \subseteq A$ be a set of medoids. We define the mined cSPD model $M = (W, (N, L), l_a, l_n)$ such that:

– $N = A^k$, i.e. the nodes of the SPD model are identified by the cluster medoids,
– $l_a : A \to \mathcal{P}(A^k)$, such that $l_a(a) = \{a^k \in A^k \mid u(a^k, a) \approx \max_{a^k_1 \in A^k} u(a^k_1, a)\}$,
– $l_n : A^k \to \mathcal{P}(A)$, such that $l_n(a^k) = \{a \in A \mid a^k \in l_a(a)\}$,
– $L = \{(a^k_1, a^k_2) \in A^k \times A^k \mid \exists_{a_1 \in l_n(a^k_1) \setminus l_n(a^k_2)} \exists_{a_2 \in l_n(a^k_2) \setminus l_n(a^k_1)} \; a_1 >_W a_2\}$.

According to Definition 4.5, an activity refers to a node (and vice versa) if the probability that the activity belongs to the cluster represented by the node is approximately the same as the maximum probability over all clusters. This implies that each medoid belongs to its own cluster. Furthermore, all other activities belong to at least one cluster, namely the one for which the function u is maximal. However, an activity can belong to multiple clusters. Note that we do not use equality of probabilities, as this would require the number of direct successions in the log to be the same for multiple pairs of activities, and this is rarely the case in practice.

The edges of the connected SPD are determined using the direct succession relation. Basically, two nodes are connected if there is an activity referred to by the first node, but not by the second, that is at least once directly succeeded by an activity referred to by the second node, but not by the first. It is important to realize that an SPD does not have executable semantics. Instead, one should interpret an SPD as a high-level description that a process owner would draw of his process.

Fig. 1. cSPD with 5 clusters, clustering the 16 activities of the “WOZ bezwaar” process. Note that all clusters overlap.

Figure 1 shows an example of a cSPD. In this case, our example “bezwaar WOZ” process was used and we clustered the 16 activities into 5 clusters. Interestingly, all of these clusters overlap, as they all contain the activity “OZ14 Plan. taxeren”. All the other activities appear in at most one cluster. In the following section, we show how to project performance information onto a cSPD in two different ways.
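Definitions 4.4 and 4.5 can be sketched together on a toy example. Note the assumptions: the medoid search below is an exhaustive search over all subsets of size k, used only as a simple stand-in for the heuristic of [17]; the "approximately equal" test is modelled by a hypothetical tolerance parameter `tol`, whose value the text does not specify; and all function names and succession counts are invented for illustration.

```python
from itertools import combinations


def make_sigma(counts):
    """Activity similarity from direct-succession counts (Definition 4.2)."""
    max_count = max(counts.values(), default=0)

    def sigma(a1, a2):
        if a1 == a2:
            return 1.0
        both = counts.get((a1, a2), 0) + counts.get((a2, a1), 0)
        return (both + 1) / (1 + 2 * max_count)

    return sigma


def membership(sigma, medoids, a, m=2.0):
    """Cluster membership probabilities of activity a (Definition 4.3), m > 1."""
    weights = {med: sigma(med, a) ** (1.0 / (m - 1.0)) for med in medoids}
    total = sum(weights.values())
    return {med: w / total for med, w in weights.items()}


def objective(sigma, activities, medoids, m=2.0):
    """Value minimised in Definition 4.4 for a given set of medoids."""
    total = 0.0
    for a in activities:
        u = membership(sigma, medoids, a, m)
        total += sum(u[med] ** m / sigma(a, med) for med in medoids)
    return total


def best_medoids(sigma, activities, k, m=2.0):
    """Exhaustive search (assumption: a stand-in for the heuristic of [17])."""
    return min(combinations(sorted(activities), k),
               key=lambda meds: objective(sigma, activities, meds, m))


def mine_cspd(sigma, counts, activities, medoids, m=2.0, tol=0.05):
    """Build l_a, l_n and the edge set L of Definition 4.5."""
    l_a = {}
    for a in activities:
        u = membership(sigma, medoids, a, m)
        top = max(u.values())
        l_a[a] = {med for med in medoids if top - u[med] <= tol}  # "approximately max"
    l_n = {med: {a for a in activities if med in l_a[a]} for med in medoids}
    edges = set()
    for n1 in medoids:
        for n2 in medoids:
            only1, only2 = l_n[n1] - l_n[n2], l_n[n2] - l_n[n1]
            if any(counts.get((x, y), 0) > 0 for x in only1 for y in only2):
                edges.add((n1, n2))
    return l_a, l_n, edges


# Hypothetical succession counts for four activities.
counts = {("A", "B"): 4, ("B", "C"): 1, ("C", "D"): 3}
activities = {"A", "B", "C", "D"}
sigma = make_sigma(counts)

medoids = best_medoids(sigma, activities, k=2)
l_a, l_n, edges = mine_cspd(sigma, counts, activities, medoids)
print(medoids, l_n, edges)
```

In this example a larger tolerance (for instance `tol=0.4`) makes activity C belong to both clusters, which illustrates how the many-to-many relation arises; the edge between the two nodes then disappears, because C no longer belongs to exactly one of them.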

5 Performance Analysis in cSPDs

In Section 4, we proposed SPDs, which are capable of describing events at different levels of abstraction, and we presented an algorithm to derive such models from event logs. The next step of our work is to project performance information onto the SPDs. For this purpose, we propose two diagrams: the Fuzzy Performance Diagram and the Aggregated Activities Performance Diagram.

Fig. 2. Fuzzy Performance Diagram of the SPD of Figure 1. [Figure annotations: frequency of edge occurrence (width); time spent transferring control from source to target (color); relative frequency of activity instances referred to by this node; relative frequency of cases in which this node occurs; frequency of activity instances referred to by this node; average throughput time of this node; waiting time performance indicator (color).]

5.1 Fuzzy Performance Diagrams

A Fuzzy Performance Diagram (FPD) is a visualization of an SPD, an example of which is shown in Figure 2. It is designed to show both performance information and control flow of a process in an easily interpretable manner. In an FPD, this information is projected onto each node of the SPD (or cluster of activities from the log) and onto each edge of the SPD. The way this projection is done is highly influenced by both the fuzzy model [15] and the extended Petri nets in [2, 16].

In order to obtain performance information for cSPDs, a replay algorithm is used. It is beyond the scope of this paper to introduce the replay algorithm in detail; instead, we refer to [8]. However, we do mention that the replay algorithm we use is based on the replay algorithms in [2] and [15].

The size of each FPD element indicates the relative importance of that element in the overall process, based on its occurrence frequency. The more often the activities to which a node refers were executed (i.e. the more activity instances the node refers to), the bigger the node. The same principle is applied to edges, i.e. the thicker an edge from a source node to a target node, the more often cases were routed from the source to the target. The colors of all elements indicate whether the time spent on these elements is relatively high (red), medium (yellow) or low (green).

From the sizes and colors of nodes and edges alone, a human analyst can easily recognize important activities and paths in the process under consideration. In addition, we provide insights into the types of node splits and joins, by indicating to what extent these tend to be XOR, AND or OR.

Figure 2 shows performance information of our example log projected onto the cSPD of Figure 1. From this diagram, we can immediately see that the node at the center bottom position has the worst throughput. Indeed, the average throughput time here is 84 days, whereas the bottom right node’s throughput time is 39 days and the other nodes have a throughput time of 7-12 days. Furthermore, we see that the waiting times in front of the center nodes are relatively high as well.

Although we think that FPDs provide intuitive insights into the performance of a process, projected onto a given model, we also feel that there is a need to focus on a single cluster of activities. For this purpose, we propose the Aggregated Activities Performance Diagram.
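The color coding described above can be illustrated with a small sketch. The text does not state how "relatively high, medium or low" is decided, so the rule used here is purely an assumption: the observed range of values is split into three equal bands. The node names are placeholders, and the throughput times reuse the figures mentioned above (84 days, 39 days, and the two ends of the 7-12 day range).

```python
def color_for(value, low, high):
    """Map a time value to green/yellow/red by its position in [low, high].
    Assumption: three equal-width bands; the actual thresholds are not given."""
    if high == low:
        return "green"
    relative = (value - low) / (high - low)
    if relative < 1 / 3:
        return "green"
    if relative < 2 / 3:
        return "yellow"
    return "red"


# Placeholder node names with average throughput times in days.
throughput_days = {"center-bottom": 84, "bottom-right": 39, "other-1": 12, "other-2": 7}

low, high = min(throughput_days.values()), max(throughput_days.values())
colors = {node: color_for(days, low, high) for node, days in throughput_days.items()}
print(colors)
```

Under this assumed rule the node with 84 days is colored red, the node with 39 days yellow, and the remaining nodes green.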

5.2 Aggregated Activities Performance Diagram

An Aggregated Activities Performance Diagram (AAPD) is a simple diagram consisting of bar elements, an example of which is shown in Figure 3. Each bar has a one-to-one relationship with a node in the FPD of Figure 2; hence, an AAPD bar indirectly refers to a cSPD node, and to one or more activity instances in the log. An AAPD is designed to show the time spent between activities in a process and to show activities which often run in parallel. It is complementary to an FPD.

Every AAPD has one focus element, which determines the cases that are being considered. Only cases which contain at least one activity instance referred to by the focus element are considered in the AAPD. Besides the focus element, each AAPD contains the other relevant nodes of the FPD from which it is constructed. In our example of Figure 3, we selected the node referring to OZ12, OZ14, OZ16 and OZ18 as our focus element.

Each relevant FPD node is shown in the AAPD as a rectangle, such that the width indicates the sum of the average waiting time and the average service time for all corresponding activities in the selected cases. The height of the rectangle is determined by the percentage of cases in which any of the represented activities occurs, relative to the cases determined by the focus element. The position of each node along the horizontal axis is determined by the average start times of the activities represented by that node (note that the scale is not linear but logarithmic, with the start time of the focus node being 0).

Another indicator in each element of the AAPD is a horizontal line inside the big rectangle. This indicator shows the frequency of activity instances which are represented by the element, relative to the frequency of activity instances which are represented by the focus element. In the example of Figure 3, the height of the line in the element OZ14, OZ15, OZ20 and OZ24 is approximately 55%, indicating that the number of activity instances belonging to this node is about 55% of the activity instances belonging to the focus node.

Finally, a bar below the rectangle shows the average time during which activities of each element are performed in parallel with activities of the focus element. Note that in our example log no activity instances belonging to two different nodes ever overlap in time. Therefore, the bar shown in the bottom right corner of Figure 3 is artificial and serves as an example only.
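The two relative frequencies described above (element height and line height) can be computed directly from the activity instances. The following sketch assumes a simplified input in which each activity instance is a `(case, activity)` pair and each node is given by its set of activities. The function name `aapd_metrics` and the sample instances are hypothetical; only the activity codes are borrowed from the example.

```python
def aapd_metrics(instances, l_n, focus):
    """For each node return (element height, line height) relative to the focus
    node, considering only cases that contain a focus activity instance."""
    focus_cases = {case for case, act in instances if act in l_n[focus]}
    if not focus_cases:
        raise ValueError("the focus node does not occur in any case")
    in_scope = [(case, act) for case, act in instances if case in focus_cases]
    focus_count = sum(1 for _, act in in_scope if act in l_n[focus])

    metrics = {}
    for node, acts in l_n.items():
        node_cases = {case for case, act in in_scope if act in acts}
        node_count = sum(1 for _, act in in_scope if act in acts)
        metrics[node] = (len(node_cases) / len(focus_cases), node_count / focus_count)
    return metrics


# Hypothetical activity instances; OZ14 belongs to both nodes.
l_n = {
    "focus": {"OZ12", "OZ14", "OZ16", "OZ18"},
    "other": {"OZ14", "OZ15", "OZ20", "OZ24"},
}
instances = [
    ("c1", "OZ12"), ("c1", "OZ14"), ("c1", "OZ20"),
    ("c2", "OZ12"), ("c2", "OZ15"),
    ("c3", "OZ02"),  # no focus activity: this case is ignored
    ("c4", "OZ16"),
]
metrics = aapd_metrics(instances, l_n, "focus")
print(metrics)
```

In this toy log the second node occurs in two of the three focus cases and accounts for three activity instances against four for the focus node. Because OZ14 belongs to both nodes, its instance is counted for each of them.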

Fig. 3. AAPD example. [Figure annotations: the focus element determines which cases are used for the performance metrics in the other bars; the horizontal position shows the relative distance between the start times of activities OZ14, OZ15, OZ20 and OZ24 and the start times of activities belonging to the focus node, in all cases in which the focus node was instantiated; the element height indicates the percentage of cases where instances of activities belonging to a node occur, relative to the number of cases in which instances of activities belonging to the focus node occur; the line height indicates the frequency of activities OZ14, OZ15, OZ20 and OZ24 compared to the frequency of activities from the focus node, in all cases in which the focus node was instantiated; the element color indicates the throughput time performance of activity instances belonging to each node in all cases where they occur; further labels mark the activity names, the average waiting time of activities OZ02, OZ04, OZ06, OZ08, OZ09, OZ10 and OZ14, and the average waiting, processing and throughput times of activities OZ12, OZ14, OZ16 and OZ18, each in all cases in which the focus node was instantiated; the bar below an element shows the average time during which an activity is performed in parallel with activities of the focus node (as an example only, as in the log they never occur in parallel).]

As with an FPD, the AAPD visualization helps a human analyst to distinguish important clusters of activities from unimportant ones. Furthermore, it provides an indication of control flow, i.e. which activities often come before or after one another, and how they are conducted (in sequence or in parallel).

6 Conclusions and Future Work

In this paper, we introduced a mining algorithm to derive process models at different levels of abstraction from event logs. These so-called Simple Precedence Diagrams (SPDs) consist of nodes that have a many-to-many relation with activities observed in the log, as well as edges between the nodes indicating a notion of control flow. Using the discovered SPDs, or man-made SPDs provided by a process owner, we can replay a given log file in order to obtain performance information about the operational process. In this paper, we presented two approaches to project this performance information onto SPDs, using so-called Fuzzy Performance Diagrams (FPDs) and Aggregated Activities Performance Diagrams (AAPDs). FPDs provide insights into the bottlenecks of a process, by coloring the nodes and edges of an SPD according to their relative performance. AAPDs are used to see the performance of certain nodes with respect to a single focus node. We argue that both FPDs and AAPDs provide intuitive performance information of a process to a human analyst.

Although we use a real-life log as an example in this paper, we propose to validate this approach more thoroughly using the case study presented in [19], as we have access to the problem owner to help with the validation, especially since he can provide us with SPDs at different levels of abstraction. Finally, better insights need to be obtained into the performance of our approach in a setting where logs do not follow Definition 3.1, for example because no information is provided about activity instances.

References

1. van der Aalst, W.M.P.: Trends in Business Process Analysis: From Verification to Process Mining. In: Cordeiro, J., Cardoso, J., Filipe, J. (eds.) Proceedings of the 9th International Conference on Enterprise Information Systems (ICEIS 2007), pp. 12–22. Institute for Systems and Technologies of Information, Control and Communication, INSTICC, Madeira (2007)
2. van der Aalst, W.M.P., van Dongen, B.F.: Discovering Workflow Performance Models from Timed Logs. In: Han, Y., Tai, S., Wikarski, D. (eds.) EDCIS 2002. LNCS, vol. 2480, pp. 45–63. Springer, Heidelberg (2002)
3. van der Aalst, W.M.P., van Dongen, B.F., Herbst, J., Maruster, L., Schimm, G., Weijters, A.J.M.M.: Workflow Mining: A Survey of Issues and Approaches. Data and Knowledge Engineering 47(2), 237–267 (2003)
4. van der Aalst, W.M.P., Reijers, H.A., Weijters, A.J.M.M., van Dongen, B.F., Alves de Medeiros, A.K., Song, M., Verbeek, H.M.W.: Business Process Mining: An Industrial Application. Information Systems 32(5), 713 (2007)
5. van der Aalst, W.M.P., Rubin, V., Verbeek, H.M.W., van Dongen, B.F., Kindler, E., Günther, C.W.: Process mining: a two-step approach to balance between underfitting and overfitting. Software and Systems Modeling (2009)
6. van der Aalst, W.M.P., Weijters, A.J.M.M. (eds.): Process Mining. Special Issue of Computers in Industry, vol. 53(3). Elsevier Science Publishers, Amsterdam (2004)
7. van der Aalst, W.M.P., Weijters, A.J.M.M., Maruster, L.: Workflow Mining: Discovering Process Models from Event Logs. IEEE Transactions on Knowledge and Data Engineering 16(9), 1128–1142 (2004)
8. Adriansyah, A.: Performance Analysis of Business Processes from Event Logs and Given Process Models. Master’s thesis, Eindhoven University of Technology, Eindhoven (2009)
9. Badouel, E., Bernardinello, L., Darondeau, P.: The Synthesis Problem for Elementary Net Systems is NP-complete. Theoretical Computer Science 186(1-2), 107–134 (1997)
10. Badouel, E., Darondeau, P.: Theory of regions. In: Reisig, W., Rozenberg, G. (eds.) APN 1998. LNCS, vol. 1491, pp. 529–586. Springer, Heidelberg (1998)
11. Bezdek, J.C.: Pattern Recognition with Fuzzy Objective Function Algorithms. Kluwer Academic Publishers, Norwell (1981)
12. Cortadella, J., Kishinevsky, M., Lavagno, L., Yakovlev, A.: Synthesizing Petri Nets from State-Based Models. In: Proceedings of the 1995 IEEE/ACM International Conference on Computer-Aided Design (ICCAD 1995), pp. 164–171. IEEE Computer Society, Los Alamitos (1995)
13. Cortadella, J., Kishinevsky, M., Lavagno, L., Yakovlev, A.: Deriving Petri Nets from Finite Transition Systems. IEEE Transactions on Computers 47(8), 859–882 (1998)
14. Ehrenfeucht, A., Rozenberg, G.: Partial (Set) 2-Structures - Part 1 and Part 2. Acta Informatica 27(4), 315–368 (1989)
15. Günther, C.W., van der Aalst, W.M.P.: Fuzzy Mining: Adaptive Process Simplification Based on Multi-perspective Metrics. In: Alonso, G., Dadam, P., Rosemann, M. (eds.) BPM 2007. LNCS, vol. 4714, pp. 328–343. Springer, Heidelberg (2007)
16. Hornix, P.T.G.: Performance Analysis of Business Processes through Process Mining. Master’s thesis, Eindhoven University of Technology, Eindhoven (2007)
17. Krishnapuram, R., Joshi, A., Yi, L.: A fuzzy relative of the k-medoids algorithm with application to web document and snippet clustering. In: Proc. IEEE Intl. Conf. Fuzzy Systems - FUZZIEEE 1999, Snippet Clustering, Korea (1999)
18. de Medeiros, A.K.A.: Genetic Process Mining. PhD thesis, Eindhoven University of Technology, Eindhoven (2006)
19. Riemers, P.: Process Improvement in Healthcare: a Data-Based Method Using a Combination of Process Mining and Visual Analytics. Master’s thesis, Eindhoven University of Technology, Eindhoven (2009)
20. van der Aalst, W.M.P., Dumas, M., Ouyang, C., Rozinat, A., Verbeek, E.: Conformance checking of service behavior. ACM Trans. Internet Technol. 8(3), 1–30 (2008)
