IEEE TRANS. ON KNOW. AND DATA ENG., VOL. ??, NO. ??, ??? 2017
Automated Discovery of Process Models from Event Logs: Review and Benchmark
arXiv:1705.02288v1 [cs.SE] 5 May 2017
Adriano Augusto, Raffaele Conforti, Marlon Dumas, Marcello La Rosa, Fabrizio Maria Maggi, Andrea Marrella, Massimo Mecella, Allar Soo
Abstract—Process mining methods allow analysts to exploit logs of historical executions of business processes in order to extract insights regarding the actual performance of these processes. One of the most widely studied process mining operations is automated process discovery. An automated process discovery method takes as input an event log, and produces as output a business process model that captures the control-flow relations between tasks that are observed in or implied by the event log. Several dozen automated process discovery methods have been proposed in the past two decades, striking different tradeoffs between scalability, accuracy and complexity of the resulting models. So far, automated process discovery methods have been evaluated in an ad hoc manner, with different authors employing different datasets, experimental setups, evaluation measures and baselines, often leading to incomparable conclusions and sometimes unreproducible results due to the use of non-publicly available datasets. In this setting, this article provides a systematic review of automated process discovery methods and a systematic comparative evaluation of existing implementations of these methods, using an open-source benchmark covering nine publicly-available real-life event logs and eight quality metrics. The review and evaluation results highlight gaps and unexplored tradeoffs in the field, including the lack of scalability of several proposals and a strong divergence in the performance of different methods with respect to different quality metrics. The proposed benchmark allows researchers to empirically compare new automated process discovery methods against existing ones in a unified setting.

A. Augusto, M. Dumas, F.M. Maggi and A. Soo are with the University of Tartu, Estonia. E-mail: {adriano.augusto,marlon.dumas,f.m.maggi}@ut.ee,
[email protected] A. Augusto, R. Conforti and M. La Rosa are with Queensland University of Technology, Australia. E-mail: {a.augusto,raffaele.conforti,m.larosa}@qut.edu.au A. Marrella and M. Mecella are with Sapienza University of Rome, Italy. E-mail: {marrella,mecella}@diag.uniroma1.it Manuscript received April 19, 2005; revised August 26, 2015.
Index Terms—Process Mining, Automated Process Discovery, Literature Review, Benchmark
I. INTRODUCTION

Modern information systems maintain detailed trails of the business processes they support, including records of key process execution events, such as the creation of a case or the execution of a task within an ongoing case. Process mining techniques allow analysts to extract insights about the actual performance of a process from collections of such event records, also known as event logs [1]. In this context, an event log consists of a set of traces, each trace itself consisting of the sequence of events related to a given case.

One of the most widely studied process mining operations is automated process discovery. An automated process discovery method takes as input an event log, and produces as output a business process model that captures the control-flow relations between tasks that are observed in or implied by the event log. In order to be useful, such automatically discovered process models must accurately reflect the behavior recorded in or implied by the log. Specifically, the process model discovered from an event log should: (i) parse the traces in the log; (ii) parse traces that are not in the log but are likely to belong to the process that produced the log; and (iii) not parse other traces. The first property is called fitness, the second generalization and the third precision. In addition, the discovered process model should be as simple as possible, a property that is usually quantified via complexity measures.

The problem of automated discovery of process models from event logs has been intensively researched in the past two decades. Despite a rich set of proposals, state-of-the-art automated process
discovery methods suffer from two recurrent deficiencies when applied to real-life logs [2]: (i) they produce large and spaghetti-like models; and (ii) they produce models that either poorly fit the event log (low fitness) or grossly over-generalize it (low precision or low generalization). Striking a tradeoff between these quality dimensions in a robust manner has proved to be a difficult problem.

So far, automated process discovery methods have been evaluated in an ad hoc manner, with different authors employing different datasets, experimental setups, evaluation measures and baselines, often leading to incomparable conclusions and sometimes unreproducible results due to the use of non-publicly available datasets. This article aims to fill this gap by: (i) providing a systematic review of automated process discovery methods; and (ii) providing a comparative evaluation of six implementations of representative methods, using an open-source benchmark consisting of eleven publicly-available real-life event logs and eight quality metrics covering all four dimensions mentioned above (fitness, precision, generalization and complexity) as well as execution time.

The outcomes of this research are a classified inventory of automated process discovery methods and a benchmark designed to enable researchers to empirically compare new automated process discovery methods against existing ones in a unified setting. The benchmark is provided as an open-source command-line Java application to enable researchers to replicate the reported experiments with minimal configuration effort.

The rest of the article is structured as follows. Section II presents the systematic literature review methodology. Section III presents the outcomes of the literature review, while Section IV classifies the approaches identified by the review. Next, Section V introduces the experimental benchmark and results.
Section VI discusses the overall findings, Section VII acknowledges the threats to validity of the study, and Section VIII refers to previous reviews and comparative studies in the field. Finally, Section IX draws conclusions and outlines future work directions.

II. METHODOLOGY

To understand what research has been done in the field of process discovery, we conducted
a Systematic Literature Review (SLR) following the scientific, rigorous and replicable approach specified by Kitchenham in [3]. Accordingly, we first specified the research questions (cf. Section II-A), after which search strings were created (cf. Section II-B) and data sources were selected (cf. Section II-C). Then, we identified inclusion and exclusion criteria (cf. Section II-D) and methods for quality assessment (cf. Section II-E). Finally, we performed the selection of research studies (cf. Section II-F) and the data extraction and analysis (cf. Section II-G).

A. Research questions formulation

The objective of our SLR is to analyse research studies related to (business) process discovery. Specifically, we focused on approaches that produce process models from event logs. To this aim, the following research questions were formulated:
RQ1 Which are the existing approaches dealing with process discovery?
RQ2 Which kinds of process model are discovered by the existing approaches?
RQ3 Which language constructs can be identified in the discovered process models?
RQ4 What tools exist to perform process discovery?
RQ5 How have the existing process discovery approaches been evaluated?
RQ6 In which domains have existing process discovery approaches been applied?
RQ1 aims at identifying the existing approaches to process discovery in the research literature. RQ2 categorizes the output of a process discovery approach on the basis of the kind of process model discovered (i.e., imperative, declarative or hybrid). RQ3 investigates which language constructs (e.g., exclusive choice, concurrency, etc.) can be identified in the discovered process models. RQ4 explores what tool support the different approaches have, while RQ5 investigates how they have been empirically evaluated and RQ6 analyzes in which application domains they have been applied.

B. Search strings

In order to perform the search over the selected data sources (cf. Section II-C), we elaborated four
search strings by deriving keywords from our knowledge of the subject matter. We first determined that the term “process discovery” covers the majority of the research field investigated. Furthermore, since “learning” and “workflow” are often used as quasi-synonyms of “discovery” and “process”, respectively, we constructed four search strings by combining the above terms, as follows: “process discovery”, “workflow discovery”, “process learning”, “workflow learning”. The search was done separately for each of the above search strings. If a query on a specific data source returned more than one thousand results, we refined it by combining the selected search string with the term “business” or “process mining”, obtaining more elaborate search strings, e.g., “business process discovery” or “(process learning AND process mining)”, which returned more focused results.

C. Data sources selection

We applied the four identified search strings to seven popular academic databases: (i) Scopus; (ii) Web of Science; (iii) IEEE Xplore; (iv) ACM Digital Library; (v) SpringerLink; (vi) ScienceDirect; (vii) Google Scholar. Research studies were retrieved based on the presence of the search strings in the title, the keywords and the abstract of the studies. In addition, to ensure that all relevant studies were identified in our search, we performed a further search on Google Scholar by retrieving any study whose full text contained at least one of the search strings. We noted that this additional search did not return any relevant study that was not already discovered in our primary search.

D. Inclusion and exclusion criteria

In order to identify the relevant studies for our SLR, we defined the following inclusion and exclusion criteria.

Inclusion Criteria
IN1 The study is related to process discovery from event logs and describes an approach that:
– is published in 2011 or later;
– is not already investigated in [2];
– is an improvement of an approach investigated in [2].

Exclusion Criteria
EX1 The study describes an existing approach to process discovery without introducing any relevant improvement to it.
EX2 The study tackles process mining in general, but is mainly focused on conformance checking or process enhancement.
EX3 The study refers to a non-peer-reviewed publication.
EX4 The study is not presented in English.
EX5 The study is not electronically available or freely accessible through the standard university libraries proxy service.
EX6 The study describes an approach that is declared as not implemented. Conversely, if the study claims the existence of an implementation, it is provisionally included in the SLR, even if the implementation is not electronically available.
EX7 If several studies refer to the same approach, then all studies except the most complete and general one are excluded.

A study that matches at least one of the above exclusion criteria is excluded from the SLR. Notice that we apply a restriction with respect to the publication date of a study. In fact, since the most recent survey on process discovery was published in early 2012 [2] and partially covers discovery approaches published in 2011, we exclude from our primary search all the approaches published before 2011 or already investigated in [2].

E. Quality assessment

We performed a quality assessment to evaluate the maturity and robustness of a subset of the discovery approaches underlying the studies included in our SLR. The criteria for selecting the relevant approaches, the list of quality criteria used for classifying them and the results of the quality assessment are discussed in Section V.

F. Studies selection

We applied the four search strings elaborated in Section II-B to each of the seven electronic
libraries mentioned in Section II-C. The search was performed in December 2016. As shown in Table I, the queries returned a total of 2165 studies satisfying the inclusion criteria. Starting from this set, each of the studies was reviewed through an iterative approach to determine its relevance for the SLR.

Database            “process discovery”  “workflow discovery”  “process learning”  “workflow learning”
Scopus                      331                  10                  222                  21
Web of Science              175                   4                   97                   7
IEEE Xplore                  68                   4                   35                   2
ACM Digital Library          21                   1                   47                   1
SpringerLink                335                  29                   47                  17
ScienceDirect               110                  13                   29                  12
Google Scholar               79                 182                  126                 140
Total                      1119                 243                  603                 200

Table I: Number of studies matching the inclusion criteria (December 2016).

First, for each reviewed study, we checked whether it matched the exclusion criteria EX1-5 by analyzing the title, the abstract, the introduction and the conclusion sections. Furthermore, we performed a backward reference search by considering the literature cited by the studies themselves, and we removed duplicate studies. After this filtering, we obtained 331 studies.1 Secondly, we analysed each of the 331 studies in depth, focusing on the exclusion criterion EX6. Specifically, if an approach was not implemented and tested, or lacked sufficient data to perform the quality assessment, it was discarded from the SLR. This second filtering returned 98 studies. Finally, we applied the exclusion criterion EX7: if an approach did not provide any novelty with respect to a previous study describing the same approach, or if it was a variant of a more general and complete precedent approach, it was excluded from the SLR. Overall, this resulted in 32 primary studies describing 32 distinct discovery approaches, as shown in Table II.

1 The list of studies obtained at this stage is available at https://doi.org/10.6084/m9.figshare.4975130

G. Data extraction and analysis

We applied a data extraction process to each of the 32 primary studies to answer the research questions defined in Section II-A. For each study, we extracted the following information:
1) General information: title, authors, year of publication, number of citations.
2) Kind of process model (procedural, declarative, hybrid) and corresponding modeling languages (Petri nets, BPMN, Declare, etc.) discovered.
3) Specific language constructs identified by the discovery approach (exclusive choice, concurrency, etc.).
4) Implementation details (e.g., programming language) related to the tool implementing the approach.
5) Type of validation performed: size and type of logs tested (real-life, artificial, synthetic), availability of the tool, etc.
Table II provides an aggregated view of the most relevant information extracted from each of the selected primary studies. This information was used to answer the research questions, as detailed in Section III.

III. RESULTS OF THE SLR

In this section, the results obtained from the SLR are presented. Figure 1 shows how the primary studies are distributed over the years. It can be seen that from 2014 onwards the number of studies has grown quickly, indicating that the field of process discovery has gained interest over the years. In the remainder of this section we present the detailed results of our SLR, answering each research question separately (from Section III-A to Section III-F).

A. Approaches to process discovery (RQ1)

We classified the 32 primary studies according to their underlying approaches to process discovery (cf. Table II). Section IV provides details on each approach identified in the SLR.

B. Discovered process models (RQ2)

Concerning the type of process models discovered, Table II shows that the majority of the approaches (25 out of 32) underlying the primary studies produce procedural models. On the other hand, five approaches (cf. [6], [7], [27], [30], [32]) discover declarative models in the form of Declare constraints, while [12] produces declarative models formalized through the WoMan formalism.
Table II: Overview of the 32 primary studies identified ({year, author} ordered).

Name                          Authors                         Year  Model type   Model language           Implementation
Process Spaceship             Motahari et al. [4]             2011  Procedural   State machines           Eclipse
HK                            Huang and Kumar [5]             2012  Procedural   Petri nets               Standalone
Declare Miner                 Maggi et al. [6]                2012  Declarative  Declare                  ProM
MINERful                      Di Ciccio and Mecella [7]       2013  Declarative  Declare                  ProM, Standalone
Inductive Miner - Infrequent  Leemans et al. [8]              2013  Procedural   Process trees            ProM
Process Skeletonization       Abe and Kudo [9]                2014  Procedural   Fuzzy maps               Standalone
Evolutionary Tree Miner       Buijs et al. [10]               2014  Procedural   Process trees            ProM
Updated Heuristics Miner      De Cnudde et al. [11]           2014  Procedural   Heuristics nets          ProM
WoMan                         Ferilli [12]                    2014  Declarative  WoMan                    Standalone
Hybrid Miner                  Maggi et al. [13]               2014  Hybrid       Declare + Petri nets     ProM
Competition Miner             Redlich et al. [14]             2014  Procedural   BPMN                     Standalone
Directed Acyclic Graphs       Vasilecas et al. [15]           2014  Procedural   Directed Acyclic Graphs  Standalone
Decomposed Process Miner      Verbeek and van der Aalst [16]  2014  Procedural   Petri nets               ProM
MVPM Mine                     Folino et al. [17]              2015  Procedural   Heuristics nets          Standalone
CNMining                      Greco et al. [18]               2015  Procedural   Causal nets              ProM
alpha$                        Guo et al. [19]                 2015  Procedural   Petri nets               ProM
Maximal Pattern Mining        Liesaputra et al. [20]          2015  Procedural   Petri nets               ProM
Supervised Polyhedra          Ponce de Leon et al. [21]       2015  Procedural   Petri nets               Standalone
DGEM                          Molka et al. [22]               2015  Procedural   BPMN                     Standalone
LocalizedLogs                 van der Aalst et al. [23]       2015  Procedural   Petri nets               ProM
HybridILPMiner                van Zelst et al. [24]           2015  Procedural   Petri nets               ProM
ProDiGen                      Vazguez et al. [25]             2015  Procedural   Heuristics nets          Standalone
Structured Miner              Augusto et al. [26]             2016  Procedural   BPMN                     ProM, Standalone
Non-Atomic Declare Miner      Bernardi et al. [27]            2016  Declarative  Declare                  ProM
RegPFA                        Breuker et al. [28]             2016  Procedural   Petri nets               Standalone
BPMN Miner                    Conforti et al. [29]            2016  Procedural   BPMN                     ProM, Apromore
TB-MINERful                   Di Ciccio et al. [30]           2016  Declarative  Declare                  Standalone
PGminer                       Mokhov et al. [31]              2016  Procedural   Partial Order Graphs     Workcraft, Standalone
SQLMiner                      Schönig et al. [32]             2016  Declarative  Declare                  Standalone
ProM-D                        Song et al. [33]                2016  Procedural   Petri nets               Standalone
CSMMiner                      van Eck et al. [34]             2016  Procedural   State machines           ProM
Proximity Miner               Yahya et al. [35]               2016  Procedural   Heuristic nets           ProM
Figure 1: Distribution of primary studies by publication year.
Lastly, the approach presented in [13] is the only one able to discover hybrid models, presented as a combination of Petri nets and Declare constraints.

C. Detectable language constructs (RQ3)

Regarding procedural discovery approaches, which are all able to identify simple sequences in the discovered models, we also take into account the ability to detect the following constructs: concurrency, exclusive choice, inclusive choice and loops. So far, only four approaches are able to discover inclusive choices: [8], [10] are able to directly
identify block-structured inclusive choices (thanks to the use of process trees), while [16], [29] can detect this construct only when used on top of [8] or [10] (i.e., indirectly). The remaining 21 procedural approaches can discover constructs for concurrency, exclusive choice and loops, with the following exceptions: [9], [17] can detect exclusive choice and loops but not concurrency, and [15] can discover only exclusive choice. This is mostly due to the nature of their outputs; indeed, Fuzzy maps, Heuristics nets, Directed Acyclic Graphs and Causal nets do not natively allow concurrency to be represented. However, other approaches returning the same kinds of output (cf. [18], [25], [35]) are able to discover concurrency patterns thanks to variants of the output modelling language.

As for the declarative discovery approaches, the same constructs as in the case of procedural models cannot be determined. In this case we considered the discoverable declarative patterns. In particular, the approaches [6], [7], [27], [30], [32] are able to identify the whole set of Declare constraints, while [12] detects constructs belonging to the WoMan formalism.

D. Existing tools (RQ4)

More than 50% of the approaches (17 out of 32) provide an implementation as a plugin of the ProM framework. The reason behind the
popularity of ProM can be explained by its open-source and portable framework, which allows researchers to easily develop and test new discovery algorithms. Among the latter, two are also available as standalone applications (cf. [7], [26]), while [29] provides a further implementation as an Apromore plugin. The remaining 15 approaches have been developed as standalone applications or as plugins of Eclipse (cf. [4]) or the Workcraft platform (cf. [31]).
E. Evaluation of the approaches (RQ5)

To answer RQ5, we analyzed the “nature” of the event logs used for evaluating the 32 discovery approaches. Three kinds of logs are possible: (i) artificial, i.e., generated automatically by replaying artificial process models; (ii) synthetic, i.e., generated automatically by replaying real-life process models; (iii) real-life, i.e., containing real-life process execution data. According to the above classification, our analysis found that the majority of the approaches (29 out of 32) were tested against real-life logs. Among them, 10 approaches (cf. [4]–[7], [15], [18], [23], [27], [28], [33]) were further tested against synthetic logs, and 9 approaches (cf. [10]–[12], [19], [20], [22], [29]–[31]) against artificial logs. Of the three remaining approaches, two were tested on both synthetic and artificial logs (cf. [25], [26]), while [24] was tested only on artificial logs.
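To make the distinction between log kinds concrete, an artificial log in sense (i) can be obtained by randomly replaying a process model. The sketch below replays a toy transition-system model; the model, activity names and function are invented for illustration and are not taken from any of the surveyed approaches:

```python
import random

# Toy process model as a transition system: state -> list of (task, next state).
# Purely illustrative, not one of the surveyed benchmark models.
MODEL = {
    "start": [("register", "p1")],
    "p1":    [("check", "p2"), ("skip check", "p2")],
    "p2":    [("decide", "end"), ("redo", "p1")],   # "redo" creates a loop
}

def replay(model, initial="start", final="end", rng=random):
    """Generate one trace by randomly walking the model from start to end."""
    state, trace = initial, []
    while state != final:
        task, state = rng.choice(model[state])
        trace.append(task)
    return trace

# An artificial log is simply a collection of such replayed traces.
artificial_log = [replay(MODEL) for _ in range(100)]
```

Replaying a model recovered from a real-life system instead of an invented one would, by the same mechanism, yield a synthetic log in sense (ii).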
F. Application domains (RQ6)

Concerning the application domains where the approaches have been evaluated, we notice that several approaches exploited the real-life logs made available by the Business Process Intelligence Challenge (BPIC), which is held annually in the context of the BPM conference. The event logs extracted from the various editions of the BPIC belong to the following application domains:
• BPIC11 - this log is from a Dutch academic hospital and describes the diagnosis and treatment of patients in the Gynaecology department (used by [7], [8], [20], [32]).
• BPIC12 - this log is from a Dutch financial institute and describes a loan application process (used by [7], [8], [11], [13]–[16], [22], [28], [30], [32], [34]).
• BPIC13 - this log is from Volvo IT Belgium and describes the incident and problem management process within the VINST system (used by [11], [17], [27], [28]).
• BPIC14 - this log is from Rabobank Group ICT and describes the work of the service desk, namely the interaction management, incident management, and change management processes (used by [30]).
Not every discovery approach was validated using the event logs provided by the BPIC. For example, the approach in [4] was tested against a log extracted from an on-line game service, while [5] was tested against a public log describing patent application procedures in the United States. The approaches [6], [8], [10] were evaluated through non-public logs of a building permit approval procedure held in several Dutch municipalities. Finally, logs coming from several other domains, such as logistics [33], [35], traffic congestion dynamics [18], employee habits [12], [34], finance [23], automotive [19], administration [21], health care [22], telecom [31], and project management and insurance [9], [29], were used as well.

IV. CLASSIFICATION OF APPROACHES

The 32 papers selected in Section II were classified based on the type of representation they use for process models.
A. Petri nets

In [5], the authors describe an algorithm to extract block-structured Petri nets from event logs. The algorithm works by first building an adjacency matrix between all pairs of tasks and then analyzing the information in it to extract block-structured models consisting of basic sequence, choice, parallel, loop, optional and self-loop structures as building blocks. The approach has been implemented in a standalone application called HK.
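The first step of such algorithms, building an adjacency matrix of directly-follows relations between tasks, can be sketched as follows. This is a minimal illustration, not the HK implementation itself; the toy log and task names are invented:

```python
from collections import defaultdict

def directly_follows(log):
    """Count how often task b directly follows task a across all traces."""
    matrix = defaultdict(int)
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            matrix[(a, b)] += 1
    return dict(matrix)

# Toy event log: each trace is a sequence of task labels.
log = [["a", "b", "c", "d"], ["a", "c", "b", "d"], ["a", "e", "d"]]
df = directly_follows(log)
# df[("b", "c")] == 1 and df[("c", "b")] == 1: b and c occur in both
# orders, which an analysis step could interpret as a parallel block.
```

Analyzing which pairs occur in one order only (sequence/choice) versus both orders (parallelism) is what lets the algorithm assemble block structures from this matrix.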
In [16], the authors propose an improvement of the algorithm implemented in the ILP Miner [36], whose complexity is linear in the size of the event log and exponential in the number of distinct activities. The approach is based on splitting up the distinct activities over multiple event logs, which alleviates the complexity. An implementation is available as a ProM plug-in.

The approach presented in [19] is based on an algorithm called α$, which is able to discover invisible tasks involved in non-free-choice constructs. The algorithm is an extension of the α algorithm, originally presented in [1].

In [20], the authors propose a technique for process discovery using Maximal Pattern Mining, where they construct patterns based on the whole sequence of events seen in the traces. Starting from these patterns they construct process models in the form of Workflow nets.

In [21], Ponce de Leon et al. present Supervised Polyhedra, a standalone application based on a process discovery approach that takes negative information into account. This feature guarantees that the discovered process models are not only simple, fitting and precise, but also good at generalizing the behavior underlying an event log.

In [23], the authors present an approach for the discovery of Petri nets, implemented in a ProM plug-in available in a package called LocalizedLogs. In a log, events are localized by assigning a non-empty set of regions to each event. Regions can only interact through shared events. The approach is based on a genetic algorithm that exploits localized event logs to improve the quality of the discovered models.

In [24], the authors propose an improvement of the ILP Miner [36] based on the concept of hybrid variable-based regions. Through hybrid variable-based regions, it is possible to vary the number of variables used within the ILP problems being solved.
Using a different number of variables has an impact on the average computation time of solving the ILPs during ILP-based process discovery.

The technique presented in [28] allows for the discovery of Petri nets using the theory of grammatical inference. The technique has been implemented as a standalone application called RegPFA.

The approach proposed in [33] is based on the
observation that activities with no dependencies in an event log can be executed in parallel. In this way, this technique is able to discover processes with concurrency even if the logs fail to meet the completeness criteria. The approach has been implemented in a tool called ProM-D.

B. Process trees

The Inductive Miner [8], [37]–[40] and the Evolutionary Tree Miner [10] are both based on the extraction of process trees from an event log. Concerning the former, many different variants have been proposed over the years; its first appearance was in [37]. Since that technique was not able to deal with infrequent behavior, an upgraded version was proposed in [8], which efficiently drops infrequent behavior from logs while still ensuring soundness and a high fitness of the discovered models. Another variant of the Inductive Miner is presented in [38]. This variant is able to minimize the impact of the incompleteness of the input logs. Finally, the variant presented in [39], [40] combines scalability with quality guarantees. It can be used to mine large event logs and produces sound models.

In [10], Buijs et al. introduce the ProM plug-in Evolutionary Tree Miner. This plug-in implements a discovery approach based on a genetic algorithm that allows the user to drive the discovery process based on preferences with respect to the four quality dimensions of a process model (fitness, precision, generalization and simplicity).

C. Heuristics nets

In [41], the authors present a ProM plug-in called Flexible Heuristics Miner. The plug-in is able to mine process models that contain non-trivial constructs and are lowly structured and, at the same time, is able to deal with noisy event logs. The process models are a specific type of heuristics nets where the semantics of splits and joins are represented using split/join frequency tables.
This results in process models that are easy to understand, even in the case of non-trivial constructs, lowly structured domains and the presence of noise. The discovery algorithm is based on the one originally presented in [42]. In [11], the technique presented in [41] has been improved, as anomalies were
found concerning the validity and completeness of the resulting process model. The improvements have been implemented in a ProM plug-in called Updated Heuristics Miner.

In [17], Folino et al. propose a two-phase clustering-based process discovery approach, where the clusters are inherently defined through logical decision rules over context data. The approach has been implemented in a standalone application.

ProDiGen, a standalone miner by Vazguez et al. [25], allows users to discover heuristics nets from event logs using a genetic algorithm. The algorithm is based on a fitness function that takes into account completeness, precision and simplicity, and on specific crossover and mutation operators.

In [35], the authors present a discovery method to obtain heuristics nets from event logs based on event relations and enhanced using the background knowledge of domain experts. The approach has been implemented in a ProM plug-in called Proximity Miner.

D. Causal nets

The approach presented in [18] is based on causal nets. It encodes the information gathered from the input log and the (possibly) given background knowledge in terms of precedence constraints over the topology of the resulting process models. A mining algorithm is formulated in terms of reasoning problems over these precedence constraints.

E. State machines

An approach called Process Spaceship has been implemented as an Eclipse plug-in by Motahari et al. [4]. The authors start from the observation that information about process executions is often scattered across several systems and data sources. They investigate the different ways in which process-related events can be correlated in service interaction logs and propose a mechanism to discover event correlations (semi-)automatically from them. The data collected through event correlations is mined to discover process models in the form of state machines.
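The core idea of mining a state machine from a log can be sketched with a deliberately simple state abstraction, where a state is just the last executed activity. This is an invented minimal illustration; the actual approaches in this section use far richer event correlation and multi-perspective state notions:

```python
def discover_transition_system(log):
    """Discover a state machine where a state is the last executed activity.

    Each observed pair (current state, next activity) becomes a transition;
    this simple abstraction generalizes: it accepts any interleaving of the
    observed steps, not only the exact traces in the log.
    """
    transitions = set()
    for trace in log:
        state = "initial"
        for activity in trace:
            transitions.add((state, activity))  # edge labelled by the activity
            state = activity                    # the activity becomes the new state
    return transitions

log = [["a", "b", "c"], ["a", "c", "b"]]
ts = discover_transition_system(log)
# ts == {("initial", "a"), ("a", "b"), ("b", "c"), ("a", "c"), ("c", "b")}
```

Choosing a coarser or finer state abstraction (e.g., the full trace prefix instead of the last activity) trades generalization against precision, which is exactly the tension discussed in the introduction.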
In [34], the authors present an approach to discover process models where different facets,
or perspectives, of the process can be identified. Instead of focusing on the events or activities that are executed in the context of a particular process, they concentrate on the states of the different perspectives and discover how they are related. These relations are expressed in terms of Composite State Machines. The approach has been implemented as a ProM plug-in called CSM Miner and provides an interactive visualization of the discovered multi-perspective state-based models.

F. BPMN

In [43], Conforti et al. present a technique for the automated discovery of BPMN models containing sub-processes, interrupting and non-interrupting boundary events, and activity markers. The approach has been improved in [29] to make it robust to noise. It has been implemented in an application called BPMN Miner, which is available as a standalone application as well as a ProM plug-in and an Apromore plug-in.

Another approach to discover BPMN models is presented in [26]. Differently from other approaches to process discovery, this technique separates the concern of producing accurate models from that of ensuring their structuredness, without sacrificing the former to ensure the latter. The approach has been implemented in an application called Structured Miner, which is available both as a standalone application and as a ProM plug-in.

An approach to discover BPMN models is also presented in [14]. This approach has been implemented in an application called Dynamic Constructs Competition Miner, developed as an extension of the Constructs Competition Miner presented in [44]. The approach is based on a divide-and-conquer algorithm which discovers block-structured processes from event logs possibly containing exceptional behavior.

G. Declare

In [6], the authors describe a two-phase approach for mining declarative process models expressed using Declare [45], [46]. The first phase is based on an Apriori algorithm used to identify frequent sets of correlated activities.
A list of candidate constraints is built on the basis of the correlated
activity sets. During the second phase, the candidate constraints are checked by replaying the log on specific automata, each accepting only those traces that are compliant with one constraint. The Declare constraints satisfied in a percentage of traces higher than a user-defined threshold are discovered. MINERful [7] is an approach to discover Declare constraints consisting of two phases. The first phase computes statistics describing the occurrences of activities and their interplay in the log. The second checks the validity of Declare constraints by querying this statistical data structure (knowledge base). The approach presented in [27] is based on the use of discriminative rule mining to determine how the characteristics of the activity lifecycles in a business process influence the validity of a Declare constraint in that process. The approach has been implemented as a plug-in of ProM. In [30], the authors present an approach to discover a class of Declare constraints called target-branched Declare. A Declare constraint is target-branched when one of its parameters (the target) is the disjunction of two or more activities, expressing that one activity out of a set of activities can occur. The approach has been implemented as a standalone application. The SQLMiner presented in [32] is based on a mining approach that works directly on relational event data by querying the log with standard SQL. By leveraging database performance technology, the mining procedure is fast, without being limited to detecting certain control-flow constraints. Queries can be customized and can cover process perspectives beyond control flow, as illustrated in [47].

H. Hybrid models
In [13], Maggi et al. present a technique for discovering hybrid process models from event logs. A hybrid process model is hierarchical: each of its sub-processes may be specified in either a declarative or a procedural way.
They use Petri nets for representing procedural sub-processes and Declare for representing declarative sub-processes. The approach has been implemented as a ProM plug-in called Hybrid Miner.
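The two-phase Declare mining scheme described in Section G (candidate generation over activity sets, then checking each candidate against the log with a support threshold) can be sketched as follows. This is our own illustrative sketch, not the Declare Miner or MINERful implementation: the trace encoding and the restriction to the `Response(a, b)` template ("every occurrence of a is eventually followed by b") are simplifying assumptions.

```python
from itertools import permutations

def holds_response(trace, a, b):
    """Response(a, b): every occurrence of a is eventually followed by b."""
    expecting_b = False
    for event in trace:
        if event == a:
            expecting_b = True
        elif event == b:
            expecting_b = False
    return not expecting_b

def mine_response_constraints(log, threshold=0.9):
    """Keep Response(a, b) candidates satisfied in >= threshold of traces."""
    activities = {e for trace in log for e in trace}
    discovered = []
    for a, b in permutations(sorted(activities), 2):  # candidate constraints
        support = sum(holds_response(t, a, b) for t in log) / len(log)
        if support >= threshold:
            discovered.append(("Response", a, b, support))
    return discovered

log = [["a", "b", "c"], ["a", "c", "b"], ["a", "b"]]
print(mine_response_constraints(log))  # → [('Response', 'a', 'b', 1.0)]
```

The actual approaches replay the log on constraint automata ([6]) or query a precomputed statistical knowledge base ([7]) rather than re-scanning traces per candidate, which is what makes them efficient on large logs.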
I. Other languages
In [9], the authors introduce a monitoring framework for process discovery. A monitoring context is used to extract traces from relational event data and to attach different types of metrics to them. Based on these metrics, traces with certain characteristics can be selected and used for the discovery of process models expressed as Fuzzy maps. In [12], Ferilli presents WoMan, a framework for workflow management. Its core is an automatic approach that learns and refines process models from event logs. The approach is based on First-Order Logic, which guarantees incremental learning and adaptation of the models, the ability to express triggers and conditions on process tasks, and efficiency. In [15], Vasilecas et al. present an approach for extracting Directed Acyclic Graphs from event logs. Starting from these graphs it is possible to generate Bayesian belief networks, one of the most common probabilistic models, which can be used to analyze business processes in an efficient manner. In [31], the authors show how Conditional Partial Order Graphs, a compact representation of families of partial orders, can be used to address the problem of a compact and easy-to-comprehend representation of event logs with data. They present algorithms for extracting both the control flow and the relevant data parameters from a given event log, and show how Conditional Partial Order Graphs can be used for efficient and effective visualization of the obtained results. The technique has been implemented as a Workcraft plug-in and as a standalone command-line application called PGminer.

V. Benchmark
In this section, we present a benchmark performed over some of the discovery approaches analyzed in the previous sections.

A. Selection of the discovery approaches
Assessing all the approaches discussed so far would not be possible due to the heterogeneous nature of the inputs required and the outputs produced. Hence, we decided to focus our attention
on a subset of those approaches. We applied the following criteria to decide whether an approach should be assessed: (i) an implementation of the approach has to be available; (ii) the implementation must take as input a simple event log (i.e. an event log with no data payload); (iii) the implementation must produce as output a Petri net or a model seamlessly convertible into a Petri net; (iv) the corresponding research paper has more than 10 citations OR provides a strong evaluation of the proposed discovery approach. Criteria (i), (ii), and (iii) are necessary conditions to run the benchmark, since it is based on metrics which can only be computed on Petri net and BPMN models. The last criterion is a quality filter for the discovery approaches. An evaluation is assessed against six quality criteria, each of which can score low, medium, or high. The six quality criteria are: (a) log type: assesses the origin of the event logs used for the evaluation of the discovery approach (i.e., artificial, synthetic or real-life). A low is assigned if only artificial logs were used, a medium if artificial and/or synthetic logs were used, and a high if real-life logs were used. (b) number of logs: assesses the number of logs used for the evaluation. A low is assigned if no real-life logs were used and the total number of logs is less than 15, a medium if the evaluation included between one and three real-life logs or the total number of logs is less than 27, and a high in the remaining cases. The thresholds of these score bands were determined by taking the maximum number of logs ever used in an evaluation of a discovery approach (excluding outliers) and applying the following percentages: 33% and 66%. (c) log size: assesses the size (in terms of number of traces) of the logs used.
A low is assigned if the largest size of the logs used is less than 33% of the maximum log size ever found in an evaluation of a discovery approach, a
medium is assigned if it is less than 66%, and a high otherwise. (d) type of experiments: assesses the heterogeneity of the dataset used for the evaluation. A low is assigned if the dataset contained only noise-free or complete logs, a medium if the dataset included noisy logs, incomplete logs, or logs of various sizes (for scalability testing), and a high if noise-free, noisy, complete, and incomplete logs of various sizes were used. (e) comparison to the state of the art: according to related previous work [2], the Heuristics Miner was the most reliable and efficient discovery approach. Therefore, if the evaluation includes experimental tests comparing the novel discovery approach with the Heuristics Miner, this criterion scores high; otherwise low. (f) type of validation: assesses the quality of the validation. A high is assigned if a comparison with other discovery approaches was performed; otherwise low. We grouped these six criteria into three macro criteria: logs quality (criteria a, b, and c), experiments quality (criteria d and e), and validation quality (criterion f). These three macro criteria were finally used to assess the strength of an evaluation. We considered an evaluation to be a strong evaluation if it scored at least two highs in logs quality, one high in experiments quality, and a fourth high anywhere else.2 On the basis of these criteria we selected the following approaches: Inductive Miner [48] (IM), CNMining [18], Alpha$ [19] (A$), Evolutionary Tree Miner [10] (ETM), Hybrid ILP Miner [24] (HILP), Structured Miner on Heuristics Miner 5.2 [26] (S-HM5.2), BPMN Miner [29], Decomposed Process Mining [16], Maximal Pattern Mining [20], ProDiGen [25], and Heuristics Miner 6.0 [49] (HM6). The latter was included since it was the most efficient among those evaluated in [2].
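The strong-evaluation rule above can be expressed as a small decision function. The following is our own sketch of that logic; the dict-based encoding of the per-criterion scores is an assumption for illustration, not part of any cited tool:

```python
def is_strong_evaluation(scores):
    """scores maps criteria 'a'..'f' to 'low', 'medium' or 'high'.

    Strong evaluation = at least two highs in logs quality (a, b, c),
    one high in experiments quality (d, e), and a fourth high anywhere.
    """
    highs = {c for c, s in scores.items() if s == "high"}
    logs_quality = len(highs & {"a", "b", "c"})
    experiments_quality = len(highs & {"d", "e"})
    return logs_quality >= 2 and experiments_quality >= 1 and len(highs) >= 4

# Real-life logs (a), many logs (b), heterogeneous experiments (d),
# comparative validation (f): four highs, correctly distributed.
print(is_strong_evaluation({"a": "high", "b": "high", "c": "low",
                            "d": "high", "e": "low", "f": "high"}))  # → True
```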
A posteriori, we excluded Decomposed Process Mining, Maximal Pattern Mining, ProDiGen, CNMining and BPMN Miner. We excluded Decomposed Process Mining as it is a divide-and-conquer approach which can be applied on top of any other discovery approach, and can thus be used to further improve the results of the remaining discovery approaches. We had to exclude Maximal Pattern Mining and ProDiGen since we were not able to find an implementation, despite one being mentioned in the respective papers, and we received no reply from the authors. We held out CNMining because during the experiments it was not possible to convert any of its output causal nets into Petri nets. Lastly, we excluded BPMN Miner since it produces the same results as Structured Miner in the case of flat process models (such as the ones used for this benchmark), but while the latter can only be used in conjunction with the Heuristics Miner, the former can be used on top of other discovery approaches. To conclude, the approaches selected for the benchmark are: A$, IM, ETM, HM6, S-HM5.2 and HILP.

2. Evaluations performed in more than one research paper, but regarding the same discovery approach, were assessed as a unique evaluation.

B. Datasets and Setup
To guarantee the reproducibility of our benchmark and to provide the community with a tool for comparing new approaches against the ones evaluated in this paper, we developed a command-line Java application which measures accuracy and complexity metrics over the outputs of the selected discovery approaches. The tool, available as open-source software,3 can be easily extended to incorporate new discovery approaches or metrics. The dataset used for our benchmark is the collection of real-life event logs publicly available at the “4TU Centre for Research Data”, as of March 2017.4 We included all the BPI Challenge (BPIC) logs, plus the Road Traffic Fines Management Process (RTFMP) log and the SEPSIS Cases log. These logs record executions of business processes in different domains, i.e. healthcare, finance, government and IT service management. We held out those logs that do not explicitly capture business processes (e.g. the BPIC 2011 and 2016 logs) as well as those contained in other logs (e.g.
the Environmental permit application process log).

3. The source code is available at https://github.com/raffaeleconforti/ResearchCode.
4. https://data.4tu.nl/repository/collection:event_logs_real.
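The benchmark workflow just described (run each discovery method on each log under a time-out, then compute quality metrics on the discovered model) can be sketched as follows. This is a simplified sketch of ours: the `miners`/`metrics` callable maps are hypothetical interfaces, not the API of the released tool, and the one-hour time-out mirrors the setup reported below.

```python
import multiprocessing

def fscore(fitness, precision):
    """Harmonic mean of fitness and precision (see Section V-C)."""
    if fitness + precision == 0:
        return 0.0
    return 2 * fitness * precision / (fitness + precision)

def run_with_timeout(func, args, timeout_s=3600):
    """Run one discovery/measurement step, aborting after timeout_s seconds."""
    pool = multiprocessing.Pool(processes=1)
    try:
        return pool.apply_async(func, args).get(timeout=timeout_s)
    except multiprocessing.TimeoutError:
        return None  # reported as "t/o" in the result tables
    finally:
        pool.terminate()

def benchmark(miners, logs, metrics, timeout_s=3600):
    """miners/metrics are name->callable maps; returns a nested result dict."""
    results = {}
    for log_name, log in logs.items():
        for miner_name, discover in miners.items():
            model = run_with_timeout(discover, (log,), timeout_s)
            row = {m: (run_with_timeout(fn, (model, log), timeout_s)
                       if model is not None else None)
                   for m, fn in metrics.items()}
            results[(log_name, miner_name)] = row
    return results
```

Running each step in a separate worker process is one way to enforce a hard time-out on discovery methods that may otherwise hang on large logs.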
Table III reports the selected logs and their characteristics. We can observe that the collection is widely heterogeneous, containing both simple and very complex real-life logs. The log size ranges from 681 traces (for the BPIC152f log) to 150370 traces (for the RTFMP log). A similar variety can be observed in the percentage of distinct traces, ranging from 0.2% to 80.6%, and in the number of event classes (i.e. activities executed within the process), ranging from 7 to 82. The length of a trace also varies, from very short (one event) to very long (185 events). Finally, we ran our benchmark tool on a 6-core Intel Xeon CPU E5-1650 v3 @ 3.50GHz with 128GB RAM. We set up the Java VM 8 with 16GB of heap space, applying a time-out of one hour for the discovery time and of one hour for the measurement of each metric evaluated.5

5. Each discovery approach was executed applying its default settings.

C. Evaluation metrics
For all selected discovery approaches we measured the following accuracy and complexity metrics: recall (a.k.a. fitness), precision, generalization, complexity, and soundness. Fitness is the ability of a model to reproduce the behavior contained in a log. Under trace semantics, a fitness of 1 means that the model can reproduce every trace in the log. In this paper, we use the fitness measure proposed in [50], which measures the degree to which every trace in the log can be aligned with a trace produced by the model. Precision represents the ability of a model to generate only the behavior found in the log. A score of 1 indicates that any trace produced by the model is contained in the log. In this paper, we use the precision measure defined in [51], which is based on similar principles as the above fitness measure. Recall and precision can be combined into a single measure known as the F-score, which is the harmonic mean of the two measurements: 2 · (Fitness · Precision) / (Fitness + Precision). Generalization measures the ability of an automated discovery algorithm to capture behavior that is not present in the log but that can be produced
Log Name   | Total Traces | Distinct Traces (%) | Total Events | Distinct Events | Trace Length (min/avg/max)
BPIC12     | 13087        | 33.4                | 262200       | 36              | 3/20/175
BPIC13cp   | 1487         | 12.3                | 6660         | 7               | 1/4/35
BPIC13inc  | 7554         | 20.0                | 65533        | 13              | 1/9/123
BPIC14f    | 41353        | 36.1                | 369485       | 9               | 3/9/167
BPIC151f   | 902          | 32.7                | 21656        | 70              | 5/24/50
BPIC152f   | 681          | 61.7                | 24678        | 82              | 4/36/63
BPIC153f   | 1369         | 60.3                | 43786        | 62              | 4/32/54
BPIC154f   | 860          | 52.4                | 29403        | 65              | 5/34/54
BPIC155f   | 975          | 45.7                | 30030        | 74              | 4/31/61
BPIC17f    | 21861        | 40.1                | 714198       | 41              | 11/33/113
RTFMP      | 150370       | 0.2                 | 561470       | 11              | 2/4/20
SEPSIS     | 1050         | 80.6                | 15214        | 16              | 3/14/185

Table III: Descriptive statistics of the event logs.
by the process under observation. To measure generalization we use 3-fold cross-validation [52]: we divide the log into 3 parts, discover a model from 2 parts (i.e. we hold out 1 part), and measure the fitness of the discovered model against the hold-out part. This is repeated for every possible hold-out part. Generalization is the mean of the fitness values obtained for each hold-out part. A generalization of 1 means that the discovered models produce traces of the observed process, even traces that are not in the log from which the model was discovered, and that the discovered models are accurate and do not introduce extra behaviour (i.e. do not over-generalize). Complexity quantifies how difficult it is to understand a model. Several complexity metrics have been shown to be (inversely) related to understandability [53], including Size (number of nodes), Control-Flow Complexity (CFC) (the amount of branching caused by gateways in the model) and Structuredness (the percentage of nodes located directly inside a well-structured single-entry single-exit fragment). Lastly, soundness assesses the semantic quality of a process model by reporting whether the model violates one of the three soundness criteria [54]: i) option to complete, ii) proper completion, and iii) no dead transitions.

D. Results
Table IV displays the results obtained in our benchmark. We highlighted in bold the best score for each measure on each log, and used “-” to report that a given accuracy or complexity measurement could not be reliably obtained due to syntactical or semantic issues in the discovered
model (e.g. the model is disconnected or deadlocks in some state). The first conclusion we can draw from the results is that four out of six discovery approaches were not able to systematically discover sound models. A$, HILP and HM experienced severe difficulties in producing useful outputs. A$ showed scalability issues: it discovered only one model within the time-out, and that model did not stand out in accuracy (despite being very balanced in fitness and precision) nor in complexity. HILP did markedly better than A$, producing a model ten times out of twelve. However, its models were not workflow nets (i.e. they contained multiple end places or disconnected transitions). For this reason, we could assess only the complexity of its models, which shows moderately simple outputs for the BPIC13cp, BPIC13inc and RTFMP logs, and highly complex outputs for the remaining logs, with size ranging from 80 to 433 and CFC from 59 to 829. Lastly, HM6 shows results similar to HILP's: it always delivered an output within the time-out, faster than any other discovery approach, but only two times out of twelve were its outputs sound. Specifically, for BPIC13inc, HM6 delivered a very simple and accurate model with maximum scores for precision, F-score, size, CFC and structuredness, while for BPIC153f, HM6 delivered a highly fitting model with the best fitness, F-score and generalization, though a very complex one. As for the remaining approaches, S-HM5.2 led to slightly better results than HM6 thanks to its ability to repair soundness, being able to output five sound models out of twelve. Of these five models, S-HM5.2 scored the best fitness, F-score
Log Name
BPIC12
BPIC13cp
BPIC13inc
BPIC14f
BPIC151f
BPIC152f
BPIC153f
BPIC154f
BPIC155f
BPIC17f
RTFMP
SEPSIS
Discovery Method A$ IM ETM HM6 S-HM5.2 HILP A$ IM ETM HM6 S-HM5.2 HILP A$ IM ETM HM6 S-HM5.2 HILP A$ IM ETM HM6 S-HM5.2 HILP A$ IM ETM HM6 S-HM5.2 HILP A$ IM ETM HM6 S-HM5.2 HILP A$ IM ETM HM6 S-HM5.2 HILP A$ IM ETM HM6 S-HM5.2 HILP A$ IM ETM HM6 S-HM5.2 HILP A$ IM ETM HM6 S-HM5.2 HILP A$ IM ETM HM6 S-HM5.2 HILP A$ IM ETM HM6 S-HM5.2 HILP
Fitness 0.98 0.33 0.82 0.99 0.83 0.92 0.84 0.91 0.83 0.89 0.68 0.71 0.97 0.57 1.00 0.93 0.62 0.95 0.66 0.95 0.96 0.66 0.94 0.58 1.00 0.98 0.72 0.99 0.79 0.92 0.99 0.71 -
Accuracy Precision 0.50 0.98 1.00 0.76 0.79 0.54 0.80 0.96 0.95 0.64 0.94 0.73 0.57 0.89 0.75 0.56 0.90 0.55 0.88 0.67 0.58 0.95 0.18 0.89 0.70 0.70 1.00 0.70 0.98 0.88 0.45 0.84 -
F-score 0.66 0.49 0.90 0.86 0.81 0.68 0.82 0.93 0.89 0.74 0.79 0.76 0.71 0.69 0.86 0.70 0.73 0.70 0.75 0.79 0.73 0.78 0.30 0.70 0.82 0.82 0.84 0.82 0.87 0.90 0.62 0.77 -
Gen. (3-Fold) 0.98 0.38 0.82 0.99 0.86 0.92 0.88 0.91 0.83 0.89 0.57 t/o 0.96 0.56 1.00 0.94 0.57 0.95 0.64 0.95 0.96 0.63 0.94 0.56 1.00 0.98 0.82 0.99 0.81 0.77 0.96 0.70 -
Size 59 69 85 128 300 9 11 12 17 10 13 28 9 16 24 31 22 43 35 80 219 164 73 150 145 282 193 78 194 196 159 78 157 184 433 162 74 156 155 364 134 82 166 167 35 31 29 195 222 34 46 47 70 57 50 30 81 52 87
Complexity CFC Struct. 37 1.00 10 1.00 99 0.05 177 0.10 460 4 1.00 17 1.00 6 0.67 13 1.00 3 7 1.00 24 1.00 4 1.00 11 1.00 9 18 1.00 15 1.00 51 24 59 91 0.22 108 1.00 21 1.00 98 89 0.39 322 123 1.00 19 1.00 158 0.11 154 0.11 108 1.00 26 1.00 151 0.07 183 0.06 829 111 1.00 17 1.00 127 0.13 123 0.16 593 95 1.00 26 1.00 124 0.15 123 0.20 20 1.00 5 1.00 10 0.45 114 0.68 330 20 1.00 33 1.00 50 0.06 55 1.00 53 32 1.00 15 1.00 132 0.17 42 0.58 129 -
Table IV: Accuracy and complexity results of the benchmark.
Sound yes yes no no no yes yes no yes yes yes yes yes yes yes yes yes no no no yes yes yes no yes no yes yes no no yes yes yes no no yes yes no no no yes yes no yes yes yes no no no yes yes no yes no yes yes no no no
Exec. Time(s) t/o 6.6 3,600 2.5 341.0 772.2 t/o 0.1 3,600 0.1 1.5 0.1 t/o 1.0 3,600 0.8 12.8 2.5 t/o 3.4 3,600 3.3 47.9 7.3 3,545.9 0.6 3,600 0.5 481.5 4.4 t/o 0.7 3,600 0.7 119.8 t/o t/o 1.3 3,600 0.8 1,246.6 1,062.9 t/o 0.7 3,600 0.5 301.7 14.7 t/o 1.5 3,600 1.2 304.0 t/o t/o 13.3 3,600 6.5 390.9 384.5 t/o 10.9 3,600 7.8 260.9 3.5 t/o 0.4 3,600 0.03 312.3 1.6
and generalization on BPIC151f and BPIC155f, being second in precision only to ETM. It also scored the best F-score for the RTFMP model, and delivered useful, though not best-scoring, outputs for the two BPIC13 logs. Finally, IM and ETM were the only two approaches able to systematically produce sound process models, since they discover process trees, which by construction are fully structured and sound. Furthermore, models discovered by IM scored the highest values in fitness and generalization for nine models out of twelve, while models discovered by ETM scored the best results in precision, also for nine models. ETM delivered the most accurate models (highest F-score) five times, against two times for IM: IM's lack of precision did not let it outperform ETM. It is worth mentioning that the high quality of ETM's outputs comes at the cost of the long execution times required by its genetic algorithm. As for the complexity of the output models, ETM and IM again performed best. First, both always output fully structured process models. Second, their size and CFC are very small compared to the other approaches. However, ETM produced more compact models than IM, having the smallest size and CFC seven times out of twelve. To conclude, IM and ETM proved to be the most useful and reliable discovery approaches, followed by S-HM5.2 and HM6, which were able to produce suitable models only for the simpler logs.

VI. Discussion
Our review highlights a growing interest in the field of process discovery, and confirms the existence of a large and heterogeneous number of proposals addressing the problems of this area of the literature. Despite the variety of proposals, we can clearly identify two main streams: approaches which output declarative process models, and approaches which output procedural process models.
Further, while the former rely only on declarative statements to represent a process, the latter provide various alternatives, though most procedural discovery approaches output Petri nets. The predominance of this process modeling language is driven by the expressive power of
Petri nets, and by the requirements of the methods used to assess the quality of the discovered process models. Indeed, as reported in Section V, quality assessment tools require Petri nets as input. Although some modeling languages have a straightforward conversion to Petri nets, the strict requirements of these quality assessment tools represent a limitation for the proposals in this research field. For the same reason, it was not possible to compare the two main streams, and we therefore decided to focus on the evaluation and comparison of procedural approaches, since the latter have a higher practical relevance than the declarative ones. Finally, our benchmark shows gaps, limitations and unexplored trade-offs of the state-of-the-art approaches, such as a lack of scalability and strong divergences in performance with respect to the different quality metrics. Regarding the latter, we observed divergences between different approaches, whereby, given the same input, the quality of their outputs differs greatly (e.g. on the BPIC12 log), but also divergences among outputs of the same approach, whereby the majority of the approaches were not able to guarantee any minimum quality for their outputs, with the exception of the Inductive Miner and the Evolutionary Tree Miner, which guarantee soundness and full structuredness. Further, our evaluation shows that existing approaches do not integrate sufficiently robust noise-filtering methods to be used standalone on large-scale real-life event logs. In some cases (i.e. the BPIC14, BPIC15 and BPIC17 logs), in order to produce any output for which fitness and precision could be measured, or to avoid extreme over-fitting (in the case of the Inductive Miner), we had to apply a noise-filtering technique as a pre-processing step. To conclude, even if many proposals are available in this research area, there is as yet no definitive solution to the problem of process discovery. First, most of the approaches are difficult, if not impossible, to evaluate.
Second, among the approaches we assessed, we were not able to identify a best one, since the field clearly lacks a discovery approach able to deliver process models striking a balance across all quality metrics. Despite this, it can be noted that there has been significant progress in the field in the past five years. Three of the methods proposed during this period (i.e.
Inductive Miner, Evolutionary Tree Miner and Structured Miner) clearly outperform those that had been developed in the 2000s.

VII. Threats to Validity
The first threat to validity refers to the potential selection bias and inaccuracies in data extraction and analysis typical of literature reviews. In order to minimize such issues, our review carefully adheres to the guidelines outlined in [3]. Concretely, we used well-known literature sources and libraries in computer science to extract relevant works on the topic of process discovery. Furthermore, we performed a backward reference search in order to avoid the exclusion of potentially relevant papers. A second threat to validity derives from the fact that the search for relevant works on process discovery was performed in December 2016. This means that any work published after that date was not included in the SLR for evaluation. Moreover, only 32 primary studies were included in the SLR, with only 1-2 studies for each approach containing information regarding the research questions. This might endanger the overall accuracy of an approach's representation in the study. Finally, to avoid threats to the reliability of our review, we ensured that the search process could be replicated by other researchers. However, the search may produce different results, as the internal processes of the source libraries are frequently updated. Additionally, since the process of creating a literature review also involves subjective factors (e.g., subjective interpretations of the inclusion criteria), other researchers may come to different conclusions and may not obtain exactly the same results as presented here. The experimental evaluation, on the other hand, is limited in scope to techniques that produce Petri nets (or languages, such as BPMN, that can be directly translated into Petri nets). Also, it only considers major studies identified in the SLR, with at least 10 citations and an available implementation.
When a more recent technique of a given type has been shown to outperform others in previous work, we took only the more recent one – e.g. we only considered A$, as it was shown in previous work to outperform the
α and α+ algorithms. Also, we used existing implementations of each retained technique with their default configuration. Since in every case the default configuration has been chosen by the authors of the approach, we have assumed that this configuration generally delivers good results. In order to compensate for these shortcomings, we have published the benchmarking toolset as open-source software, so as to enable researchers both to reproduce the results reported herein and to run the same evaluation for other methods or for alternative configurations of the evaluated methods. Another limitation is the use of only 12 event logs, which to some extent limits the generalizability of the conclusions. However, the event logs included in the evaluation are all real-life logs of different sizes and characteristics. To mitigate this limitation, we have structured the released benchmarking toolset in such a way that the benchmark can be seamlessly re-run with additional datasets.

VIII. Related Work
A previous survey and empirical evaluation of automated process discovery methods has been reported by De Weerdt et al. [2]. That survey covered 27 approaches altogether, all of which are included in the 331 studies we identified during our systematic literature review (prior to filtering). The empirical evaluation reported in [2] includes seven approaches, namely AGNEsMiner, α+, α++, Genetic Miner (and a variant thereof), Flower Heuristics Miner and ILP Miner. In comparison, our benchmark includes A$ (an improved version of α+ and α++), the Heuristics Miner, and HILP (an improved version of the ILP Miner). We did not include AGNEsMiner and the Genetic Miner due to their very long execution times (in the order of several hours to several days, as reported in [2]).
We did, however, include ETM in our benchmark: a genetic algorithm postdating the evaluation in [2], which achieves better execution times by focusing on block-structured process models and simpler transformation rules. Another difference with respect to [2] is that in this paper we restricted the evaluation to publicly available event logs in order to ensure reproducibility, whereas the evaluation in [2] is based
on closed datasets, due to the unavailability of public datasets at the time of that study. In terms of findings, [2] found that the Heuristics Miner achieved a better F-score than other approaches and generally produced simpler models, while the ILP Miner achieved the best fitness at the expense of low precision and high model complexity. Our results show that techniques postdating these ones, specifically ILP and S-HM, achieve an even better F-score and lower model complexity than the Heuristics Miner, which in addition suffers from the fact that it often produces unsound process models. It therefore appears that progress in the field in recent years has been achieved by focusing on the discovery of block-structured process models. Another previous survey in the field is now outdated [55], and a more recent one [56] is not intended to be comprehensive, but rather focuses on plug-ins available in the ProM toolset. Another related effort is CoBeFra, a tool suite for measuring fitness, precision and model complexity of automatically discovered process models [57]. The methods for evaluating fitness and precision used in our benchmark are taken from this suite.

IX. Conclusion
This article has presented a Systematic Literature Review (SLR) of automated process discovery methods and a comparative evaluation of existing implementations of these methods, using a benchmark covering 12 publicly-available real-life event logs and eight quality metrics. The toolset used in this benchmark is available as open-source software at https://github.com/raffaeleconforti/ResearchCode, and all the event logs, sourced from the 4TU Centre, are available at https://data.4tu.nl/repository/collection:event_logs_real. The benchmarking toolset has been designed in such a way that it can be seamlessly extended with additional methods, event logs, and evaluation metrics. The SLR evidenced a vast number of automated process discovery methods (331 relevant papers were identified).
Traditionally, many of these proposals produce Petri nets, but more recently, we observe an increasing number of methods that produce models in other notations,
including BPMN and declarative process modeling notations. We also observe a recent emphasis on methods that produce block-structured process models. The results of the empirical evaluation show that the methods that seek to produce block-structured process models (ETM, IM and S-HM) achieve the best performance in terms of fitness, precision and simplicity. Methods that do not restrict the topology of the generated process models (A$, Heuristics Miner and HILP) generally produce unsound process models when applied to real-life event logs. We have also observed that in the case of very complex event logs, it is necessary to apply a filtering method prior to applying existing automated process discovery methods: none of the methods appears to have a sufficiently powerful and adaptive filtering method to cope with the complexity of these event logs. Finally, the study has shown that there is still significant room for improvement in the field. One of the two top-performing automated process discovery methods (IM) is problematic because it often generates “flower” structures, i.e. fragments where a set of activities can be performed any number of times and in any order. These structures lead to low precision and generalization. The other relatively well-performing method (S-HM) is not robust: it sometimes produces unsound process models due to its reliance on the Heuristics Miner.

Acknowledgments
This research is partly funded by the Australian Research Council (grant DP150103356) and the Estonian Research Council (grant IUT20-55).

References
[1] W. M. P. van der Aalst, T. Weijters, and L. Maruster, “Workflow mining: Discovering process models from event logs,” IEEE Trans. Knowl. Data Eng., vol. 16, no. 9, pp. 1128–1142, 2004.
[2] J. De Weerdt, M. De Backer, J. Vanthienen, and B. Baesens, “A multi-dimensional quality assessment of state-of-the-art process discovery algorithms using real-life event logs,” Information Systems, vol. 37, no. 7, pp. 654–676, 2012.
[3] B.
Kitchenham, “Procedures for performing systematic reviews,” Keele, UK, Keele University, vol. 33, no. 2004, pp. 1–26, 2004.
[4] H. R. Motahari-Nezhad, R. Saint-Paul, F. Casati, and B. Benatallah, “Event correlation for process discovery from web service interaction logs,” The VLDB Journal, vol. 20, no. 3, pp. 417–444, 2011.
[5] Z. Huang and A. Kumar, “A study of quality and accuracy trade-offs in process mining,” INFORMS Journal on Computing, vol. 24, no. 2, pp. 311–327, 2012.
[6] F. M. Maggi, R. P. J. C. Bose, and W. M. P. van der Aalst, “Efficient discovery of understandable declarative process models from event logs,” in Advanced Information Systems Engineering - 24th International Conference, CAiSE 2012, Gdansk, Poland, June 25-29, 2012. Proceedings, 2012, pp. 270–285.
[7] C. Di Ciccio and M. Mecella, “A two-step fast algorithm for the automated discovery of declarative workflows,” in Computational Intelligence and Data Mining (CIDM), 2013 IEEE Symposium on. IEEE, 2013, pp. 135–142.
[8] S. J. J. Leemans, D. Fahland, and W. M. P. van der Aalst, “Discovering block-structured process models from event logs containing infrequent behaviour,” in Business Process Management Workshops - BPM 2013 International Workshops, Beijing, China, August 26, 2013, Revised Papers, 2013, pp. 66–78.
[9] M. Abe and M. Kudo, “Business monitoring framework for process discovery with real-life logs,” in International Conference on Business Process Management. Springer, 2014, pp. 416–423.
[10] J. C. Buijs, B. F. van Dongen, and W. M. van der Aalst, “Quality dimensions in process discovery: The importance of fitness, precision, generalization and simplicity,” International Journal of Cooperative Information Systems, vol. 23, no. 1, p. 1440001, 2014.
[11] S. De Cnudde, J. Claes, and G. Poels, “Improving the quality of the Heuristics Miner in ProM 6.2,” Expert Systems with Applications, vol. 41, no. 17, pp. 7678–7690, 2014.
[12] S. Ferilli, “WoMan: logic-based workflow learning and management,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 44, no. 6, pp. 744–756, 2014.
[13] F. M. Maggi, T. Slaats, and H. A. Reijers, “The automated discovery of hybrid processes,” in International Conference on Business Process Management. Springer, 2014, pp. 392–399.
[14] D. Redlich, T. Molka, W. Gilani, G. Blair, and A. Rashid, “Dynamic constructs competition miner - occurrence- vs. time-based ageing,” in International Symposium on Data-Driven Process Discovery and Analysis. Springer, 2014, pp. 79–106.
[15] O. Vasilecas, T. Savickas, and E. Lebedys, “Directed acyclic graph extraction from event logs,” in International Conference on Information and Software Technologies. Springer, 2014, pp. 172–181.
[16] H. Verbeek and W. M. van der Aalst, “Decomposed process mining: The ILP case,” in International Conference on Business Process Management. Springer, 2014, pp. 264–276.
[17] F. Folino, M. Guarascio, and L. Pontieri, “On the discovery of explainable and accurate behavioral models for complex lowly-structured business processes,” in ICEIS (1), 2015, pp. 206–217.
[18] G. Greco, A. Guzzo, F. Lupia, and L. Pontieri, “Process discovery under precedence constraints,” ACM Transactions on Knowledge Discovery from Data (TKDD), vol. 9, no. 4, p. 32, 2015.
[19] Q. Guo, L. Wen, J. Wang, Z. Yan, and S. Y. Philip, “Mining invisible tasks in non-free-choice constructs,” in International Conference on Business Process Management. Springer, 2015, pp. 109–125.
[20] V. Liesaputra, S. Yongchareon, and S. Chaisiri, “Efficient process model discovery using maximal pattern mining,” in International Conference on Business Process Management. Springer, 2015, pp. 441–456.
[21] H. Ponce-de León, J. Carmona, and S. K. vanden Broucke, “Incorporating negative information in process discovery,” in International Conference on Business Process Management. Springer, 2015, pp. 126–143.
[22] T. Molka, D. Redlich, M. Drobek, X.-J. Zeng, and W. Gilani, “Diversity guided evolutionary mining of hierarchical process models,” in Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation. ACM, 2015, pp. 1247–1254.
[23] W. M. van der Aalst, A. Kalenkova, V. Rubin, and E. Verbeek, “Process discovery using localized events,” in International Conference on Applications and Theory of Petri Nets and Concurrency. Springer, 2015, pp. 287–308.
[24] S. J. van Zelst, B. F. van Dongen, and W. M. P. van der Aalst, “ILP-based process discovery using hybrid regions,” in International Workshop on Algorithms & Theories for the Analysis of Event Data, ATAED 2015, ser. CEUR Workshop Proceedings, vol. 1371. CEUR-WS.org, 2015, pp. 47–61.
[25] B. Vázquez-Barreiros, M. Mucientes, and M. Lama, “ProDiGen: Mining complete, precise and minimal structure process models with a genetic algorithm,” Information Sciences, vol. 294, pp. 315–333, 2015.
[26] A. Augusto, R. Conforti, M. Dumas, M. La Rosa, and G. Bruno, “Automated discovery of structured process models: Discover structured vs. discover and structure,” in Conceptual Modeling - 35th International Conference, ER 2016, Gifu, Japan, November 14-17, 2016, Proceedings. Springer, 2016, pp. 313–329.
[27] M. L. Bernardi, M. Cimitile, C. Di Francescomarino, and F. M. Maggi, “Do activity lifecycles affect the validity of a business rule in a business process?” Information Systems, 2016.
[28] D. Breuker, M. Matzner, P. Delfmann, and J. Becker, “Comprehensible predictive models for business processes,” MIS Quarterly, vol. 40, no. 4, pp. 1009–1034, 2016.
[29] R. Conforti, M. Dumas, L. García-Bañuelos, and M. La Rosa, “BPMN Miner: Automated discovery of BPMN process models with hierarchical structure,” Information Systems, vol. 56, pp. 284–303, 2016.
[30] C. Di Ciccio, F. M. Maggi, and J. Mendling, “Efficient discovery of target-branched Declare constraints,” Information Systems, vol. 56, pp. 258–283, 2016.
[31] A. Mokhov, J. Carmona, and J. Beaumont, “Mining conditional partial order graphs from event logs,” in Transactions on Petri Nets and Other Models of Concurrency XI. Springer, 2016, pp. 114–136.
[32] S. Schönig, A. Rogge-Solti, C. Cabanillas, S. Jablonski, and J. Mendling, “Efficient and customisable declarative process mining with SQL,” in International Conference on Advanced Information Systems Engineering. Springer, 2016, pp. 290–305.
[33] W. Song, H.-A. Jacobsen, C. Ye, and X. Ma, “Process discovery from dependence-complete event logs,” IEEE
Transactions on Services Computing, vol. 9, no. 5, pp. 714–727, 2016.
[34] M. L. van Eck, N. Sidorova, and W. M. van der Aalst, “Discovering and exploring state-based models for multi-perspective processes,” in International Conference on Business Process Management. Springer, 2016, pp. 142–157.
[35] B. N. Yahya, M. Song, H. Bae, S.-o. Sul, and J.-Z. Wu, “Domain-driven actionable process model discovery,” Computers & Industrial Engineering, 2016.
[36] J. M. E. M. van der Werf, B. F. van Dongen, C. A. J. Hurkens, and A. Serebrenik, “Process discovery using integer linear programming,” Fundam. Inform., vol. 94, no. 3-4, pp. 387–412, 2009.
[37] S. J. J. Leemans, D. Fahland, and W. M. P. van der Aalst, “Discovering block-structured process models from event logs - A constructive approach,” in Application and Theory of Petri Nets and Concurrency - 34th International Conference, PETRI NETS 2013, Milan, Italy, June 24-28, 2013. Proceedings, 2013, pp. 311–329.
[38] S. J. Leemans, D. Fahland, and W. M. van der Aalst, “Discovering block-structured process models from incomplete event logs,” in International Conference on Applications and Theory of Petri Nets and Concurrency. Springer, 2014, pp. 91–110.
[39] ——, “Scalable process discovery with guarantees,” in International Conference on Enterprise, Business-Process and Information Systems Modeling. Springer, 2015, pp. 85–101.
[40] ——, “Scalable process discovery and conformance checking,” Softw. Syst. Model., 2017, to appear.
[41] A. J. M. M. Weijters and J. T. S. Ribeiro, “Flexible heuristics miner (FHM),” in Proceedings of the IEEE Symposium on Computational Intelligence and Data Mining, CIDM 2011, part of the IEEE Symposium Series on Computational Intelligence 2011, April 11-15, 2011, Paris, France, 2011, pp. 310–317.
[42] A. J. M. M. Weijters and W. M. P. van der Aalst, “Rediscovering workflow models from event-based data using Little Thumb,” Integrated Computer-Aided Engineering, vol. 10, no. 2, pp. 151–162, 2003.
[43] R. Conforti, M. Dumas, L. García-Bañuelos, and M. La Rosa, “Beyond tasks and gateways: discovering BPMN models with subprocesses, boundary events and activity markers,” in International Conference on Business Process Management. Springer, 2014, pp. 101–117.
[44] D. Redlich, T. Molka, W. Gilani, G. S. Blair, and A. Rashid, “Constructs competition miner: Process control-flow discovery of BP-domain constructs,” in Business Process Management - 12th International Conference, BPM 2014, Haifa, Israel, September 7-11, 2014. Proceedings, 2014, pp. 134–150.
[45] M. Pesic, H. Schonenberg, and W. M. P. van der Aalst, “DECLARE: full support for loosely-structured processes,” in 11th IEEE International Enterprise Distributed Object Computing Conference (EDOC 2007), 15-19 October 2007, Annapolis, Maryland, USA, 2007, pp. 287–300.
[46] M. Westergaard and F. M. Maggi, “Declare: A tool suite for declarative workflow modeling and enactment,” in Proceedings of the Demo Track of the Ninth Conference on Business Process Management 2011, Clermont-Ferrand, France, August 31st, 2011, 2011.
[47] S. Schönig, C. Di Ciccio, F. M. Maggi, and J. Mendling, “Discovery of multi-perspective declarative process models,” in Service-Oriented Computing - 14th International Conference, ICSOC 2016, Banff, AB, Canada, October 10-13, 2016, Proceedings, 2016, pp. 87–103.
[48] S. J. Leemans, D. Fahland, and W. M. van der Aalst, “Discovering block-structured process models from event logs containing infrequent behaviour,” in International Conference on Business Process Management. Springer, 2013, pp. 66–78.
[49] A. Weijters and J. Ribeiro, “Flexible heuristics miner (FHM),” in Computational Intelligence and Data Mining (CIDM), 2011 IEEE Symposium on. IEEE, 2011, pp. 310–317.
[50] A. Adriansyah, B. van Dongen, and W. van der Aalst, “Conformance checking using cost-based fitness analysis,” in Proc. of EDOC. IEEE, 2011.
[51] A. Adriansyah, J. Munoz-Gama, J. Carmona, B. F. van Dongen, and W. M. P. van der Aalst, “Alignment based precision checking,” in Proc. of BPM Workshops, ser. LNBIP, vol. 132. Springer, 2012, pp. 137–149.
[52] R. Kohavi, “A study of cross-validation and bootstrap for accuracy estimation and model selection,” in International Joint Conference on Artificial Intelligence, IJCAI. Morgan Kaufmann, 1995, pp. 1137–1145.
[53] J. Mendling, Metrics for Process Models: Empirical Foundations of Verification, Error Prediction, and Guidelines for Correctness. Springer, 2008.
[54] W. M. P. van der Aalst, Verification of Workflow Nets. Berlin, Heidelberg: Springer, 1997, pp. 407–426.
[55] W. M. P. van der Aalst, B. F. van Dongen, J. Herbst, L. Maruster, G. Schimm, and A. J. M. M. Weijters, “Workflow mining: a survey of issues and approaches,” Data Knowl. Eng., vol. 47, no. 2, pp. 237–267, 2003.
[56] J. Claes and G. Poels, “Process Mining and the ProM Framework: An Exploratory Survey,” in Business Process Management Workshops. Springer, 2012, pp. 187–198.
[57] S. K. L. M. vanden Broucke, J. De Weerdt, J. Vanthienen, and B. Baesens, “A comprehensive benchmarking framework (CoBeFra) for conformance analysis between procedural process models and event logs in ProM,” in IEEE Symposium on Computational Intelligence and Data Mining, CIDM. IEEE, 2013, pp. 254–261.
Adriano Augusto is a joint PhD student at the University of Tartu (Estonia) and the Queensland University of Technology (Australia). He graduated in Computer Engineering at the Polytechnic of Turin (Italy) in 2016, with a Master's thesis in the field of process mining.
Raffaele Conforti is a Postdoctoral Research Fellow at the Information Systems school of the Queensland University of Technology, Brisbane, Australia. He conducts research in the area of business process management, with an emphasis on business process automation and process mining. In particular, his research focuses on automated process discovery and on the automatic detection, prevention and mitigation of process-related risks during the execution of business processes.
Marlon Dumas is Professor of Software Engineering at the University of Tartu, Estonia. Prior to this appointment, he was a faculty member at Queensland University of Technology and a visiting researcher at SAP Research, Australia. His research interests span the fields of software engineering, information systems and business process management. He is co-author of the textbook “Fundamentals of Business Process Management” (Springer, 2013).
Marcello La Rosa is Professor of Business Process Management (BPM) and academic director for corporate programs and partnerships at the Information Systems school of the Queensland University of Technology, Brisbane, Australia. His research interests include process consolidation, mining and automation, on which he has published over 80 papers. He leads the Apromore initiative (www.apromore.org), a strategic collaboration between various universities for the development of an advanced process model repository. He is co-author of the textbook “Fundamentals of Business Process Management” (Springer, 2013).
Fabrizio Maria Maggi is a Senior Researcher with the Software Engineering Group at the University of Tartu (Estonia). He received his Ph.D. degree in Computer Science in 2010 from the University of Bari and has worked as a postdoctoral researcher at the Architecture of Information Systems (AIS) research group, Department of Mathematics and Computer Science, Eindhoven University of Technology. His research interests span the fields of business process management, data mining and service-oriented computing.
Andrea Marrella is a Postdoctoral Research Fellow at the Department of Computer, Control, and Management Engineering at Sapienza University of Rome. His research interests include human-computer interaction, user experience design, knowledge representation, reasoning about action, automated planning, and business process management. He has published over 40 research papers and articles and 1 book chapter on the above topics, among others in ACM Transactions on Intelligent Systems and Technology, IEEE Internet Computing and the Journal on Data Semantics.
Massimo Mecella is Associate Professor (with the national qualification for a Full Professorship) at Sapienza Università di Roma, Department of Engineering in Computer, Control and Management Sciences. His research focuses on service-oriented computing, business process management, cyber-physical systems (CPSs) and the Internet of Things (IoT), advanced interfaces and human-computer interaction, with applications in the fields of eGovernment, eBusiness, smart houses and smart spaces, healthcare, disaster/crisis response and management, and cyber-security. He has published more than 150 research papers in prestigious journals and top conferences. He has chaired, in various roles, several conferences in the above areas, and has extensive experience in research projects, especially in the European FP5, FP6, FP7 and Horizon 2020 programmes.
Allar Soo is a student in the Master's programme in Software Engineering at the University of Tartu. His Master's thesis focuses on automated process discovery and its use in practical settings.