Conformance Checking Based on Multi-Perspective Declarative Process Models
A. Burattin^a, F. M. Maggi^b, A. Sperduti^a
arXiv:1503.04957v1 [cs.SE] 17 Mar 2015
^a University of Padua, Italy
^b University of Tartu, Estonia
Abstract
Process mining is a family of techniques that aim at analyzing business process execution data recorded in event logs. Conformance checking is a branch of this discipline embracing approaches for verifying whether the behavior of a process, as recorded in a log, is in line with some expected behavior provided in the form of a process model. The majority of these approaches require the input process model to be procedural (e.g., a Petri net). However, in turbulent environments, characterized by high variability, the process behavior is less stable and predictable. In these environments, procedural process models are less suitable to describe a business process. Declarative specifications, working under an open world assumption, allow the modeler to express several possible execution paths as a compact set of constraints. Any process execution that does not contradict these constraints is allowed. One of the open challenges in the context of conformance checking with declarative models is the capability of supporting multi-perspective specifications. In this paper, we close this gap by providing a framework for conformance checking based on MP-Declare, a multi-perspective version of the declarative process modeling language Declare. The approach has been implemented in the process mining tool ProM and has been evaluated on three real-life case studies.
Keywords: Process Mining, Conformance Checking, Linear Temporal Logic, Business Constraints, Declare
Preprint submitted to Elsevier, March 18, 2015
1. Introduction
The need to develop information systems able to fully support the business processes of companies, and organizations in general, is becoming more and more urgent because of the fast pace of change in markets. Such dynamic markets impose frequent modifications and updates to business processes, steadily shortening the life-cycle of a business process definition. In this context, one very important functionality that any process-aware information system should support is conformance checking, i.e., the ability to verify whether the actual flow of work conforms to the intended business process model. This is especially true in the case
of very complex processes, where adopting an imperative formalism such as Petri nets [1] or BPMN [2] may lead to workflows so intricate (so-called “spaghetti”-like workflows) that the process becomes practically impossible to even visualize properly for human inspection. Early works in conformance checking (e.g., [3, 4, 5]) mainly focused on the control-flow perspective in the context of imperative models, i.e., on the functional dependencies among the activities/tasks performed in the process, while abstracting from time constraints, data dependencies, and resource assignments. These works were mainly based on replaying the log on the model to compute, according to the proposed approach, the fraction of events or traces in the log that can be replayed by the model. An evolution of these approaches is given by alignment-based approaches, where conformance checking is performed by aligning the modeled behavior with the behavior observed in the log (e.g., [6]). Only recently have approaches able to deal with multiple perspectives been developed [7, 8], as well as approaches that aim at being computationally efficient via a problem decomposition strategy [9, 10, 11, 12]. When the process under consideration is complex, however, it is preferable to use a declarative formalism, such as Declare [13, 14, 15], to represent a set of constraints that must be satisfied throughout the process execution. In this way, “spaghetti”-like workflows are avoided, and the obtained model is flexible enough to allow all behaviors that do not violate the defined constraints. Conformance checking approaches based on the control-flow perspective have been defined for declarative models as well (e.g., [16, 17, 18]). More recently, the data perspective has also been considered in [19, 20], although in these works it is not fully integrated with the control-flow perspective.
Efficient and fully integrated multi-perspective conformance checking proposals for declarative models, however, are still missing. In this paper, we aim at closing this gap by proposing a multi-perspective approach based on Declare in which constraints can jointly consider the data, temporal, and control-flow perspectives. To this end, we formally define Multi-Perspective Declare (MP-Declare), an augmented version of Declare where, thanks to the use of Metric First-Order Linear Temporal Logic, it is possible to define activation, correlation, and time conditions to build constraints over traces. A useful feature of MP-Declare is that, by construction, it allows the user to efficiently perform conformance checking over event logs. In fact, we show that it is possible to define a conformance checking algorithmic framework operating on constraint templates that is linear in the number of traces, in the number of constraints, and in the number of events of each trace. Conformance checking for a specific template is then obtained by defining template-dependent procedures within the framework, whose time complexity depends on the actual template. Overall, however, the time complexity is upper bounded in the worst case by a quadratic function. We assess the validity of the proposed approach on both artificial and real event logs. Controlled artificial data, involving logs containing up to 5 million events, are used to assess the scalability of the proposed approach, while real
event logs generated by three real business processes are used to demonstrate the expressivity and flexibility of constraints defined via MP-Declare.
2. Related Work
The scientific literature reports several works in the field of conformance checking [21]. Typically, the term conformance checking refers to the comparison of observed behaviors, as recorded in an event log, with a process model. In the past, most conformance checking techniques were based on procedural models. State-of-the-art examples of these approaches are reported in [7, 22, 11, 12]. In recent years, an increasing number of researchers have focused on conformance checking with respect to declarative models. For example, in [16], an approach for compliance checking with respect to reactive business rules is proposed. Rules, expressed using ConDec [23], are mapped to Abductive Logic Programming, and Prolog is used to perform the validation. The approach has been extended in [17] by mapping constraints to LTL and evaluating them using automata. The entire work has been contextualized in the service choreography scenario. Runtime monitoring for compliance checking has also been studied based on MFOTL, as reported in [24, 25]. In these cases, the focus is on security policy monitoring. On the one hand, the authors try to enforce security policies; on the other, they perform monitoring. In order to enforce security policies, it is necessary to distinguish between controllable and observable activities and, under specific circumstances, terminate the system in order to prevent policy violations. Concerning monitoring, the authors identify fragments of the logic that describe security policies insensitive to the ordering of actions with equal timestamps. The authors assume that monitoring is performed in distributed systems whose clocks are synchronized with limited precision.
Another application domain that researchers have used to assess the applicability of conformance checking techniques is the medical domain. In particular, Grando et al. [26, 27] used Declare to model medical guidelines and to provide semantic (i.e., ontology-based) conformance checking measures. However, in this analysis neither the data nor the time perspective is taken into account. In [18], the authors report an approach that can be used to evaluate the conformance of a log with respect to a Declare model. In particular, their algorithms compute, for each trace, whether a Declare constraint is violated or fulfilled. Using these statistics, the approach allows the user to evaluate the “healthiness” of the log. The approach is based on the conversion of Declare constraints into automata and, using a so-called “activation tree”, it is able to identify violations and fulfillments. That approach does not take into account the data and time perspectives; only the control flow is analyzed. The work described in [28, 29] consists in converting a Declare model into an automaton and performing conformance checking of a log with respect to the
generated automaton. The conformance checking approach is based on the concept of “alignment” and, as a result of the analysis, each trace is converted into the most similar trace that the model accepts. In a recent work, reported in [20], the data perspective for conformance checking with Declare is expressed in terms of conditions on global variables disconnected from the specific Declare constraints expressing the control flow. This work does not take the temporal perspective into account. In contrast, we provide a formal semantics in which the data perspective, the temporal perspective, and the control flow are connected with each other.
3. Preliminaries
In this section, we present the fundamental concepts required to understand the rest of the paper.
3.1. Process Mining and XES
The basic idea behind process mining is to discover, monitor and improve processes by extracting knowledge from data that is available in today’s systems [5]. The starting point for process mining is an event log. XES (eXtensible Event Stream) [30, 31] has been developed as the standard for storing, exchanging and analyzing event logs. Each event in a log refers to an activity (i.e., a well-defined step in some process) and is related to a particular case (i.e., a process instance). The events belonging to a case are ordered with respect to their execution times. Hence, a case (i.e., a trace) can be viewed as a sequence of events. Event logs may store additional information about events such as the resource (i.e., person or device) executing or initiating the activity, the timestamp of the event, or data elements recorded with the event. In XES, data elements can be event attributes, i.e., data produced by the activities of a business process, and case attributes, namely data that are associated with a whole process instance. In this paper, we assume that all attributes are globally visible and can be accessed/manipulated by all activity instances executed inside the case.
3.2. Metric First Order Temporal Logic
In this paper, we use Metric First Order Temporal Logic (MFOTL), first introduced in [32]. MFOTL extends propositional metric temporal logic [33] to merge the expressivity of first-order logic with the MTL temporal modalities. We deal with a fragment of MFOTL where all traces are finite. In the following, we call “structure” a triple D = (∆, σ, ι). ∆ is the domain of the structure, i.e., an arbitrary set. σ is the signature of the structure, i.e., a triple σ = (C, R, a), where C is a set of constant symbols, R is a set of relational symbols, and a is a function that specifies the arity of each relational symbol. ι is the interpretation function of the structure that assigns a meaning to all the symbols in σ over the domain ∆.
Definition 1 (Timed temporal structure). A timed temporal structure over the signature σ = (C, R, a) is a pair (D, τ) where D = (D_1, ..., D_n) is a finite sequence of structures and τ = (τ_1, ..., τ_n) is a finite sequence of timestamps with τ_i ∈ N.^1 D is assumed to have constant domains, i.e., ∆_i = ∆_{i+1}, for all 1 ≤ i < n. Each constant symbol in C has an interpretation that does not vary over time. The sequence of timestamps τ is monotonically non-decreasing, i.e., τ_i ≤ τ_{i+1}, for all 1 ≤ i < n.
^1 Note that every timestamp available in a XES log can be translated into an integer.
We indicate with I = [a, b) an interval, where a ∈ N and b ∈ N ∪ {∞}, and with V a set of variables. To express MFOTL formulas, we use the following syntax.
Definition 2 (MFOTL Syntax). Formulas of MFOTL over a signature σ = (C, R, a) are given by the grammar
\[
\phi ::= t_1 \approx t_2 \mid r(t_1, \ldots, t_{a(r)}) \mid \neg\phi \mid \phi_1 \wedge \phi_2 \mid \exists x.\phi \mid X_I\,\phi \mid \phi_1\, U_I\, \phi_2 \mid Y_I\,\phi \mid \phi_1\, S_I\, \phi_2
\]
where φ, φ_1, φ_2 ∈ MFOTL, I = [a, b) is an interval, r is an element of R, x ranges over V, and t_1, t_2, ... belong to V ∪ C.
A valuation is a mapping v : V → ∆. With abuse of notation, if c is a constant symbol in C, we say that v(c) = c. For a valuation v, a variable x ∈ V, and d ∈ ∆, v[x/d] is the valuation that maps x to d and leaves unaltered the valuation of the other variables.
Definition 3 (MFOTL Semantics). Given a timed temporal structure (D, τ) over the signature σ = (C, R, a) with D = (D_1, ..., D_n) and τ = (τ_1, ..., τ_n), a formula φ over σ, a valuation v, and 1 ≤ i ≤ n, we define (D, τ, v, i) ⊨ φ as follows:
\begin{align*}
(D,\tau,v,i) \models t \approx t' &\quad\text{iff}\quad v(t) = v(t') \\
(D,\tau,v,i) \models r(t_1,\ldots,t_{a(r)}) &\quad\text{iff}\quad (v(t_1),\ldots,v(t_{a(r)})) \in \iota(r) \\
(D,\tau,v,i) \models \neg\phi_1 &\quad\text{iff}\quad (D,\tau,v,i) \not\models \phi_1 \\
(D,\tau,v,i) \models \phi_1 \wedge \phi_2 &\quad\text{iff}\quad (D,\tau,v,i) \models \phi_1 \text{ and } (D,\tau,v,i) \models \phi_2 \\
(D,\tau,v,i) \models \exists x.\phi_1 &\quad\text{iff}\quad (D,\tau,v[x/d],i) \models \phi_1 \text{ for some } d \in \Delta \\
(D,\tau,v,i) \models Y_I\,\phi_1 &\quad\text{iff}\quad i > 1,\ \tau_i - \tau_{i-1} \in I, \text{ and } (D,\tau,v,i-1) \models \phi_1 \\
(D,\tau,v,i) \models X_I\,\phi_1 &\quad\text{iff}\quad i < n,\ \tau_{i+1} - \tau_i \in I, \text{ and } (D,\tau,v,i+1) \models \phi_1 \\
(D,\tau,v,i) \models \phi_1\, S_I\, \phi_2 &\quad\text{iff}\quad \text{for some } j \le i,\ \tau_i - \tau_j \in I,\ (D,\tau,v,j) \models \phi_2 \text{ and } (D,\tau,v,k) \models \phi_1 \text{ for all } k \in [j+1, i+1) \\
(D,\tau,v,i) \models \phi_1\, U_I\, \phi_2 &\quad\text{iff}\quad \text{for some } j \ge i,\ \tau_j - \tau_i \in I,\ (D,\tau,v,j) \models \phi_2 \text{ and } (D,\tau,v,k) \models \phi_1 \text{ for all } k \in [i, j)
\end{align*}

Table 1: Semantics for some Declare templates.
Template | LTL semantics | Activation
responded existence | G(A → (OB ∨ FB)) | A
response | G(A → FB) | A
alternate response | G(A → X(¬AUB)) | A
chain response | G(A → XB) | A
precedence | G(B → OA) | B
alternate precedence | G(B → Y(¬BSA)) | B
chain precedence | G(B → YA) | B
not responded existence | G(A → ¬(OB ∨ FB)) | A
not response | G(A → ¬FB) | A
not precedence | G(B → ¬OA) | B
not chain response | G(A → ¬XB) | A
not chain precedence | G(B → ¬YA) | B

We add syntactic sugar for the normal connectives, such as true ≡ ∃x.x ≈ x, φ_1 ∨ φ_2 ≡ ¬(¬φ_1 ∧ ¬φ_2), ∀x.φ ≡ ¬∃x.¬φ, φ_1 → φ_2 ≡ (¬φ_1) ∨ φ_2 and
φ_1 ↔ φ_2 ≡ (φ_1 → φ_2) ∧ (φ_2 → φ_1). We also add temporal syntactic sugar: F_I ψ ≡ true U_I ψ (timed future operator), G_I ψ ≡ ¬(F_I (¬ψ)) (timed globally operator), O_I ψ ≡ true S_I ψ (timed once operator), and H_I ψ ≡ ¬(O_I (¬ψ)) (timed historically operator). The non-metric variants of the temporal operators are obtained by specifying I = [0, ∞).
3.3. Declare
Declare is a declarative process modeling language originally introduced by Pesic and van der Aalst in [13, 14, 15]. Instead of explicitly specifying the flow of the interactions among process activities, Declare describes a set of constraints that must be satisfied throughout the process execution. The possible orderings of activities are implicitly specified by constraints and anything that does not violate them is possible during execution. In comparison with procedural approaches that produce “closed” models, i.e., all that is not explicitly specified is forbidden, Declare models are “open” and tend to offer more possibilities for the execution. In this way, Declare enjoys flexibility and is very suitable for highly dynamic processes characterized by high complexity and variability due to the turbulence and the changeability of their execution environments. A Declare model consists of a set of constraints applied to activities. Constraints, in turn, are based on templates. Templates are patterns that define parameterized classes of properties, and constraints are their concrete instantiations (we indicate template parameters with capital letters and concrete activities in their instantiations with lower case letters). They have a graphical representation understandable to the user and their semantics can be formalized using different logics [34], the main one being LTL over finite traces, making them verifiable and executable. Each constraint inherits the graphical representation and semantics from its template. Table 1 summarizes some Declare
templates (the reader can refer to [13] for a full description of the language). The responded existence template specifies that if A occurs, then B should also occur (either before or after A). The response template specifies that when A occurs, then B should eventually occur after A. The precedence template indicates that B should occur only if A has occurred before. Templates alternate response and alternate precedence strengthen the response and precedence templates respectively by specifying that activities must alternate without repetitions in between. Even stronger ordering relations are specified by templates chain response and chain precedence. These templates require that the occurrences of A and B are next to each other. Declare also includes some negative constraints to explicitly forbid the execution of activities. The not responded existence template indicates that if A occurs in a process instance, B cannot occur in the same instance. According to the not response template any occurrence of A cannot be eventually followed by B, whereas the not precedence template requires that any occurrence of B is not preceded by A. Finally, according to the not chain response and not chain precedence, A and B cannot occur one immediately after the other. The major benefit of using templates is that analysts do not have to be aware of the underlying logic-based formalization to understand the models. They work with the graphical representation of templates, while the underlying formulas remain hidden. Declare is very suitable for specifying compliance models that are used to check if the behavior of a system complies with desired regulations. The compliance model defines the constraints related to a single process instance, and the overall expectation is that all instances comply with the model. Consider, for example, the response constraint G(a → Fb). This constraint indicates that if a occurs, b must eventually follow. 
Therefore, this constraint is satisfied for traces such as t1 = ⟨a, a, b, c⟩, t2 = ⟨b, b, c, d⟩, and t3 = ⟨a, b, c, b⟩, but not for t4 = ⟨a, b, a, c⟩ because, in this case, the second instance of a is not followed by a b. Note that, in t2, the considered response constraint is satisfied in a trivial way because a never occurs. In this case, we say that the constraint is vacuously satisfied [35]. In [18], the authors introduce the notion of behavioral vacuity detection according to which a constraint is non-vacuously satisfied in a trace when it is activated in that trace. An activation of a constraint in a trace is an event whose occurrence imposes, because of that constraint, some obligations on other events (targets) in the same trace. For example, a is an activation for the response constraint G(a → Fb) and b is a target, because the execution of a forces b to be executed, eventually. In Table 1, for each template the corresponding activation is specified. An activation of a constraint can be a fulfillment or a violation for that constraint. When a trace is perfectly compliant with respect to a constraint, every activation of the constraint in the trace leads to a fulfillment. Consider, again, the response constraint G(a → Fb). In trace t1, the constraint is activated and fulfilled twice, whereas, in trace t3, the same constraint is activated and fulfilled only once. On the other hand, when a trace is not compliant with respect to a constraint, an activation of the constraint in the trace can lead to a fulfillment but also to a violation (at least one activation leads to a violation). In trace t4,
for example, the response constraint G(a → Fb) is activated twice, but the first activation leads to a fulfillment (eventually b occurs) and the second activation leads to a violation (b does not occur subsequently). An algorithm to discriminate between fulfillments and violations for a constraint in a trace is presented in [18]. Table 1 reports the activations for the main Declare templates. In [18], the authors define two metrics to measure the conformance of an event log with respect to a constraint in terms of violations and fulfillments, called violation ratio and fulfillment ratio of the constraint in the log. These metrics are valued 0 if the log contains no activations of the considered constraint. Otherwise, they are evaluated as the percentage of violations and fulfillments of the constraint over the total number of activations. Tools implementing process mining approaches based on Declare are presented in [36]. The tools are implemented as plug-ins of the process mining framework ProM.
Figure 1: Fulfillment and violation scenarios for the response constraint between activities A and B. (a) reports a typical fulfillment scenario. In (b), the violation is due to the violation of the correlation condition ϕc. In (c), the violation is due to the violation of the time condition ϕτ.
4. MFOTL Semantics for Multi-Perspective Business Constraints
In this section, we introduce a multi-perspective version of Declare (MP-Declare). The version is similar to the ones in [37, 38], but we enrich it by allowing both the time and the data perspective. To do this, we use Metric First-Order Linear Temporal Logic (MFOTL). While many reasoning tasks are clearly undecidable for MFOTL, this logic is appropriate to unambiguously describe the semantics of the MP-Declare constraints we can use for conformance checking in our proposed algorithms.
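As a point of reference before data and time conditions are added, the control-flow-only classification of activations for the response constraint G(a → Fb) discussed in Section 3.3 can be sketched as follows. This is a minimal illustration, not the algorithm of [18]: the function names `classify_response` and `log_ratios` are hypothetical, the scan is naive, and the ratios are pooled over all activations in the log, which is an assumption about how the metrics are aggregated.

```python
from typing import List, Sequence, Tuple


def classify_response(trace: Sequence[str], activation: str,
                      target: str) -> Tuple[List[int], List[int]]:
    """Split the activations of response(activation, target) into
    fulfillments and violations (0-based positions in the trace)."""
    fulfillments: List[int] = []
    violations: List[int] = []
    for i, event in enumerate(trace):
        if event != activation:
            continue  # only occurrences of the activation impose an obligation
        # Assumes activation != target, so searching strictly after i suffices.
        if target in trace[i + 1:]:
            fulfillments.append(i)
        else:
            violations.append(i)
    return fulfillments, violations


def log_ratios(log: Sequence[Sequence[str]], activation: str,
               target: str) -> Tuple[float, float]:
    """Return (violation ratio, fulfillment ratio) over all activations
    in the log; both are 0 when the constraint is never activated."""
    n_ful = n_vio = 0
    for trace in log:
        ful, vio = classify_response(trace, activation, target)
        n_ful += len(ful)
        n_vio += len(vio)
    total = n_ful + n_vio
    if total == 0:
        return 0.0, 0.0
    return n_vio / total, n_ful / total


# The four example traces t1..t4 from Section 3.3.
log = [list("aabc"), list("bbcd"), list("abcb"), list("abac")]
for name, trace in zip(("t1", "t2", "t3", "t4"), log):
    print(name, classify_response(trace, "a", "b"))
print(log_ratios(log, "a", "b"))
```

On these traces the sketch reproduces the classification given in the text: t1 has two fulfillments, t2 has no activation, t3 has one fulfillment, and t4 has one fulfillment followed by one violation.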
Figure 2: Fulfillment and violation scenarios for the alternate response constraint between activities A and B. (a) reports a typical fulfillment scenario. In (b), the violation is due to the violation of the correlation condition ϕc. In (c), the violation is due to the violation of the time condition ϕτ. The activation in (d) is a fulfillment because the second occurrence of A does not satisfy the activation condition. In contrast, (e) reports a violation since, in this case, the second occurrence of A satisfies the activation condition.
To define the new semantics for Declare, we have to contextualize the definitions given in Section 3.2 in XES. Consider, for example, that the execution of an activity pay is recorded in an event log and, after the execution of pay at timestamp τi, the attributes originator, amount, and z have values John, 100, and July. In this case, the valuation of (activityName, originator, amount, z) is (pay, John, 100, July) in τi. Considering that in XES, by definition, the activity name is a special attribute always available, if (pay, John, 100, July) is the valuation of (activityName, originator, amount, z), we say that, when pay occurs, two special relations are valid: event(pay) and p_pay(John, 100, July). In the following, we identify event(pay) with the event itself pay and we call (John, 100, July) the payload of pay. The semantics for MP-Declare is shown in Table 2. Note that all the templates here considered have two parameters, an activation and a target (see also
Table 1). As an example, we consider the response constraint “activity pay is always eventually followed by activity get discount” having pay as activation and get discount as target. The timed semantics of Declare, introduced in [37], is extended by requiring two additional conditions on data, i.e., the activation condition ϕa and the correlation condition ϕc. The activation condition is a relation (over the variables corresponding to the global attributes in the event log) that must be valid when the activation occurs. If the activation condition does not hold, the constraint is not activated. In the case of the response template, the activation condition has the form p_A(x) ∧ r_a(x), meaning that when A occurs with payload x, the relation r_a over x must hold. For example, we can say that whenever pay occurs and client type is gold then eventually get discount must follow. In case pay occurs but client type is not gold, the constraint is not activated. The correlation condition is a relation that must be valid when the target occurs. It has the form p_B(y) ∧ r_c(x, y), where r_c is a relation involving, again, variables corresponding to the (global) attributes in the event log but, in this case, relating the valuation of the attributes corresponding to the payload of A and the valuation of the attributes corresponding to the payload of B. In our example, we can say that whenever pay occurs and client type is gold then eventually get discount must follow and the due amount corresponding to activity get discount must be lower than the one corresponding to activity pay.
Figure 3: Fulfillment and violation scenarios for the chain response template between activities A and B. (a) reports a typical fulfillment scenario. Note that, in this case, the two events are contiguous. In (b), the violation is due to the violation of the correlation condition ϕc. In (c), the violation is due to the violation of the time condition ϕτ.
In the following, with abuse of notation we specify the interval characterizing the time perspective of a MP-Declare constraint (I = [a, b)) with ϕτ .
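To make the interplay of the three conditions concrete, the following sketch checks an MP-Declare response constraint on a single trace. It is an illustration under stated assumptions, not the conformance checking algorithm of this paper: the `Event` class, the function `check_mp_response`, the attribute names `client_type` and `amount`, and the integer timestamps (read here as hours) are all hypothetical. The activation condition ϕa is modeled as a predicate on the payload x, the correlation condition ϕc as a predicate on the pair (x, y), and the time condition ϕτ as the half-open interval [a, b) on the distance between the two timestamps.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Sequence, Tuple

Payload = Dict[str, Any]


@dataclass
class Event:
    name: str                      # activity name
    timestamp: int                 # integer timestamp (assumed unit: hours)
    payload: Payload = field(default_factory=dict)


def check_mp_response(trace: Sequence[Event], activation: str, target: str,
                      phi_a: Callable[[Payload], bool],
                      phi_c: Callable[[Payload, Payload], bool],
                      interval: Tuple[float, float]) -> Tuple[List[int], List[int]]:
    """Return (fulfillments, violations) as 0-based positions of activations."""
    lo, hi = interval
    fulfillments: List[int] = []
    violations: List[int] = []
    for i, ev in enumerate(trace):
        # The constraint is activated only if A occurs and phi_a holds on x.
        if ev.name != activation or not phi_a(ev.payload):
            continue
        # F_I ranges over positions j >= i; the target must satisfy both the
        # time condition (tau_j - tau_i in [lo, hi)) and the correlation
        # condition phi_c(x, y).
        fulfilled = any(
            t.name == target
            and lo <= t.timestamp - ev.timestamp < hi
            and phi_c(ev.payload, t.payload)
            for t in trace[i:]
        )
        (fulfillments if fulfilled else violations).append(i)
    return fulfillments, violations


trace = [
    Event("pay", 0, {"client_type": "gold", "amount": 100}),
    Event("pay", 2, {"client_type": "silver", "amount": 50}),   # not activated
    Event("get discount", 5, {"amount": 80}),
    Event("pay", 20, {"client_type": "gold", "amount": 100}),
    Event("get discount", 25, {"amount": 120}),                 # amount too high
]
is_gold = lambda x: x.get("client_type") == "gold"
lower_amount = lambda x, y: y["amount"] < x["amount"]

print(check_mp_response(trace, "pay", "get discount", is_gold, lower_amount, (0, 10)))
```

In this example the first pay is a fulfillment, the second pay does not activate the constraint because the client type is not gold, and the third pay is a violation because the only candidate target fails the correlation condition. Shrinking the interval to [0, 5) would turn the first activation into a violation as well, since its target occurs exactly 5 time units later.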
Table 2: Semantics for MP-Declare constraints.
Template | MFOTL Semantics
responded existence | G(∀x.((A ∧ ϕa(x)) → (O_I(B ∧ ∃y.ϕc(x, y)) ∨ F_I(B ∧ ∃y.ϕc(x, y)))))
response | G(∀x.((A ∧ ϕa(x)) → F_I(B ∧ ∃y.ϕc(x, y))))
alternate response | G(∀x.((A ∧ ϕa(x)) → X(¬(A ∧ ϕa(x)) U_I (B ∧ ∃y.ϕc(x, y)))))
chain response | G(∀x.((A ∧ ϕa(x)) → X_I(B ∧ ∃y.ϕc(x, y))))
precedence | G(∀x.((B ∧ ϕa(x)) → O_I(A ∧ ∃y.ϕc(x, y))))
alternate precedence | G(∀x.((B ∧ ϕa(x)) → Y(¬(B ∧ ϕa(x)) S_I (A ∧ ∃y.ϕc(x, y)))))
chain precedence | G(∀x.((B ∧ ϕa(x)) → Y_I(A ∧ ∃y.ϕc(x, y))))
not responded existence | G(∀x.((A ∧ ϕa(x)) → ¬(O_I(B ∧ ∃y.ϕc(x, y)) ∨ F_I(B ∧ ∃y.ϕc(x, y)))))
not response | G(∀x.((A ∧ ϕa(x)) → ¬F_I(B ∧ ∃y.ϕc(x, y))))
not precedence | G(∀x.((B ∧ ϕa(x)) → ¬O_I(A ∧ ∃y.ϕc(x, y))))
not chain response | G(∀x.((A ∧ ϕa(x)) → ¬X_I(B ∧ ∃y.ϕc(x, y))))
not chain precedence | G(∀x.((B ∧ ϕa(x)) → ¬Y_I(A ∧ ∃y.ϕc(x, y))))
Graphical representations of three MP-Declare templates are reported in Figures 1, 2 and 3. In particular, these figures report the semantics for response, alternate response and chain response constraints. Each figure shows possible scenarios of violations and fulfillments for the corresponding constraint. A scenario is described by drawing events as circles. Each circle is associated with an activity (A, B, or C) and a data condition (either an activation condition ϕa or a correlation condition ϕc). The time condition ϕτ is reported above the horizontal curly bracket. Crossed data or time conditions indicate violated conditions. Red circles indicate events that are violations, green circles indicate fulfillments. The response constraint in Figure 1 indicates that, if A occurs at time τA with ϕa holding true, B must occur at some point τB ∈ [τA + a, τA + b) with ϕc holding true. The alternate response constraint in Figure 2 specifies that, if A occurs at time τA with ϕa holding true, B must occur at some point τB ∈ [τA + a, τA + b) with ϕc holding true. A is not allowed in the interval [τA, τB] if ϕa is true. Any event different from A is allowed and, also, A is allowed if ϕa is false. The chain response constraint in Figure 3 indicates that, if A occurs at time τA with ϕa holding true, B must occur next at some point τB ∈ [τA + a, τA + b) with ϕc holding true.
5. Conformance Checking Algorithms
As stated in the previous section, with MP-Declare, it is possible to express Declare constraints taking into account also the temporal and the data perspectives. As an example, it is possible to express constraints like:
• activity A must occur between 10 and 11 hours before activity B;
• if activity A writes a variable x with value = 10 000
-
Id | Constraint | 1st param. | 2nd param. | Activation condition | Correlation condition | Time condition
6 | Response | A SUBMITTED | A ACCEPTED | A.AMOUNT REQ < 10 000 | - | -
7 | Response | W Valideren aanvraag-SCHEDULE | W Valideren aanvraag-START | - | - | -
8 | Response | W Valideren aanvraag-SCHEDULE | W Valideren aanvraag-START | - | A.org:resource != T.org:resource | -
9 | Response | W Valideren aanvraag-SCHEDULE | W Valideren aanvraag-START | - | A.org:resource != T.org:resource | 0,7,d
10 | Response | W Valideren aanvraag-SCHEDULE | W Valideren aanvraag-START | - | A.org:resource != T.org:resource | 0,24,h
11 | Response | W Valideren aanvraag-START | W Valideren aanvraag-COMPLETE | - | - | -
12 | Response | W Valideren aanvraag-START | W Valideren aanvraag-COMPLETE | - | A.org:resource == T.org:resource | -
13 | Response | W Valideren aanvraag-START | W Valideren aanvraag-COMPLETE | - | A.org:resource == T.org:resource | 0,1,h
14 | Response | W Valideren aanvraag-START | W Valideren aanvraag-COMPLETE | - | A.org:resource == T.org:resource | 0,15,m
Table 11: Conformance checking results using the log from the BPI challenge 2012.
Id | Act.no. | Viol.no. | Fulfill.no. | Avg.act.sparsity | Avg.viol.ratio | Avg.fulfill.ratio
3 | 13 087 | 7 974 | 5 113 | 0.8596 | 0.6093 | 0.3907
4 | 13 087 | 9 036 | 4 051 | 0.8596 | 0.6905 | 0.3095
5 | 6 847 | 3 601 | 3 246 | 0.9585 | 0.5259 | 0.4741
6 | 6 240 | 4 373 | 1 867 | 0.9211 | 0.7008 | 0.2992
7 | 5 023 | 51 | 4 972 | 0.9909 | 0.0102 | 0.9898
8 | 5 023 | 236 | 4 787 | 0.9909 | 0.047 | 0.953
9 | 5 023 | 263 | 4 760 | 0.9909 | 0.0524 | 0.9476
10 | 5 023 | 2 897 | 2 126 | 0.9909 | 0.5767 | 0.4233
11 | 7 891 | 2 | 7 889 | 0.9863 | 0.0003 | 0.9997
12 | 7 891 | 6 | 7 885 | 0.9863 | 0.0008 | 0.9992
13 | 7 891 | 228 | 7 663 | 0.9863 | 0.0289 | 0.9711
14 | 7 891 | 3 355 | 4 536 | 0.9863 | 0.4252 | 0.5748
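The ratios in Table 11 can be cross-checked against the reported counts with a few lines of arithmetic. The sketch below copies four rows of the table and recomputes the violation and fulfillment ratios as violations/activations and fulfillments/activations. For these rows the pooled ratios match the reported averages to four decimals; how the tool actually averages (for example per trace) is not stated in this section, so the pooled computation is an assumption, and the helper name `ratios` is hypothetical.

```python
# (activations, violations, fulfillments) copied from Table 11
table_11 = {
    3: (13087, 7974, 5113),
    7: (5023, 51, 4972),
    10: (5023, 2897, 2126),
    14: (7891, 3355, 4536),
}


def ratios(activations: int, violations: int, fulfillments: int):
    """Return (violation ratio, fulfillment ratio), rounded to 4 decimals.

    Both ratios are 0 when the constraint is never activated.
    """
    if activations == 0:
        return 0.0, 0.0
    return (round(violations / activations, 4),
            round(fulfillments / activations, 4))


for constraint_id, (act, vio, ful) in table_11.items():
    # Every activation is either a violation or a fulfillment.
    assert vio + ful == act
    print(constraint_id, ratios(act, vio, ful))
```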
particular, when the requested amount is lower than 10 000 the acceptance rate is almost 30%. The acceptance rate is higher if the requested amount is greater or equal to 10 000 (almost half of the applications is accepted in this case). With constraints 7-14, we analyze the validation of the applications. With constraint 7, we can see that almost 99% of the scheduled validations are eventually started. In 95% of the cases, the resource that schedules the validation is not the same resource that starts this activity (see constraint 8). In addition, in around 94% of the cases, a scheduled validation is started within 7 days from the scheduling (constraint 9) and in almost half of the cases the validation is started only 24 hours after the scheduling. Constraint 11 indicates that almost 100% of the validations that have been started are also completed, and almost in all the cases the resource that starts the validation is the same resource that Table 12: Execution times using the log from the BPI challenge 2012. Id
Avg.execution time (milliseconds)
3 4 5 6 7 8 9 10 11 12 13 14
2 772 3 220 3 261 3 205 3 196 3 100 3 212 3 146 2 176 3 210 3 241 3 258
Figure 10: Example of fulfillment for constraint 13.
(a) Example of fulfillment: W Valideren aanvraag-START at position 35.
(b) A correlated target: W Valideren aanvraag-COMPLETE at position 36, executed by the same resource.

Figure 11: Example of violation for constraint 13; W Valideren aanvraag-COMPLETE occurs outside the required time interval (too late).
(a) Example of violation: W Valideren aanvraag-START at position 37.
(b) A possible target: W Valideren aanvraag-COMPLETE occurs more than 1 hour later.

Figure 12: Example of fulfillment for constraint 13; W Valideren aanvraag-START at position 39 is followed by W Valideren aanvraag-COMPLETE within the required time interval.
(a) Example of fulfillment: W Valideren aanvraag-START at position 39.
(b) Corresponding target executed by the same resource.
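
The fulfillment and violation examples of Figures 10-12 can be mimicked on a toy trace. The Python sketch below checks a Response constraint with a correlation condition (same resource) and a time window, in the spirit of constraint 13. It is a naive quadratic scan, not the algorithm implemented in ProM. The event layout (dictionaries keyed by the XES-style names `concept:name`, `org:resource` and `time:timestamp`), the names `check_response` and `ev`, and the toy timestamps are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def check_response(trace, activation, target, correlation, max_delay):
    """Classify the activations of a timed Response constraint.

    Returns two lists of trace positions: fulfilled and violated activations.
    """
    fulfilled, violated = [], []
    for i, a in enumerate(trace):
        if a["concept:name"] != activation:
            continue
        # An activation is fulfilled if some later target event satisfies
        # both the correlation condition and the time window.
        ok = any(
            t["concept:name"] == target
            and correlation(a, t)
            and timedelta(0) <= t["time:timestamp"] - a["time:timestamp"] <= max_delay
            for t in trace[i + 1:]
        )
        (fulfilled if ok else violated).append(i)
    return fulfilled, violated


def ev(name, resource, hour, minute):
    # Hypothetical helper: builds an event on an arbitrary, invented day.
    return {"concept:name": name, "org:resource": resource,
            "time:timestamp": datetime(2012, 1, 2, hour, minute)}


START = "W Valideren aanvraag-START"
COMPLETE = "W Valideren aanvraag-COMPLETE"
trace = [
    ev(START, "r1", 9, 0),
    ev(COMPLETE, "r1", 9, 20),   # same resource, 20 minutes later
    ev(START, "r2", 10, 0),
    ev(COMPLETE, "r2", 11, 30),  # same resource, but 90 minutes later
]


def same_resource(a, t):
    return a["org:resource"] == t["org:resource"]


print(check_response(trace, START, COMPLETE, same_resource, timedelta(hours=1)))
# ([0], [2])
```

The first activation is fulfilled because its target follows within one hour and is executed by the same resource. The second is violated because the only candidate target arrives too late, which is the situation depicted in Figure 11.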
Table 13: Reference constraints used to analyze the log from the BPI challenge 2014.

Id   Constraint     1st param.   2nd param.   Activation condition        Correlation condition              Time condition
15   Not response   Open         Reopen       -                           -                                  -
16   Not response   Open         Reopen       -                           A.org:resource != T.org:resource   -
17   Response       Open         Closed       -                           -                                  -
18   Response       Open         Closed       -                           -                                  0,12,h
19   Response       Open         Closed       A.KMnumber == 'KM0000611'   -                                  0,12,h
20   Response       Open         Closed       A.KMnumber == 'KM0002043'   -                                  0,12,h
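
The time conditions in Table 13 are written as triples such as 0,12,h. A minimal Python sketch of how such a triple could be turned into a concrete interval is given below. The reading of the three fields as lower bound, upper bound and unit, the unit letters (s, m, h, d) and the names `parse_time_condition` and `within` are assumptions based on the values that appear in the tables of this section (0,15,m and 0,12,h), not a description of the ProM implementation.

```python
from datetime import timedelta

# Assumed unit letters; only "m" and "h" appear in the tables of this section.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_time_condition(text):
    """Parse 'min,max,unit' into a (lower, upper) pair of timedeltas.

    A lone '-' means that no time condition is given and yields None.
    """
    text = text.strip()
    if text == "-":
        return None
    low, high, unit = (part.strip() for part in text.split(","))
    key = _UNITS[unit]
    lower = timedelta(**{key: int(low)})
    upper = timedelta(**{key: int(high)})
    if lower > upper:
        raise ValueError(f"empty interval in time condition {text!r}")
    return lower, upper


def within(delay, condition):
    """True if a delay satisfies a parsed condition (None always holds)."""
    return condition is None or condition[0] <= delay <= condition[1]


window = parse_time_condition("0,12,h")      # constraints 18-20
print(within(timedelta(hours=5), window))    # True
print(within(timedelta(hours=13), window))   # False
print(within(timedelta(days=3), parse_time_condition("-")))  # True
```

Keeping the parsed interval separate from the check makes it easy to reuse the same window for several constraints, as constraints 18, 19 and 20 do.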
Table 14: Conformance checking results using the log from the BPI challenge 2014.

Id   Act.no.   Viol.no.   Fulfill.no.   Avg.act.sparsity   Avg.viol.ratio   Avg.fulfill.ratio
15   46 607    2 121      44 486        0.8468             0.0455           0.9545
16   46 607    510        46 097        0.8468             0.0109           0.9891
17   46 607    449        46 158        0.8468             0.0096           0.9904
18   46 607    24 392     22 215        0.8468             0.5234           0.4766
19   446       386        60            0.9993             0.8655           0.1345
20   773       48         725           0.9969             0.0621           0.9379
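
Constraints 15 and 16 in Table 14 are negative constraints: an activation counts as fulfilled when no matching target follows it. The Python sketch below illustrates this for Not response, with and without a correlation condition. It is a simplified illustration on an invented trace, not the ProM implementation, and the function name `check_not_response` and the dictionary-based event layout are assumptions.

```python
def check_not_response(trace, activation, target, correlation=lambda a, t: True):
    """Classify the activations of a Not response constraint.

    Returns two lists of trace positions: fulfilled and violated activations.
    An activation is violated if a later target event satisfies the
    correlation condition.
    """
    fulfilled, violated = [], []
    for i, a in enumerate(trace):
        if a["concept:name"] != activation:
            continue
        forbidden = any(
            t["concept:name"] == target and correlation(a, t)
            for t in trace[i + 1:]
        )
        (violated if forbidden else fulfilled).append(i)
    return fulfilled, violated


# Invented toy trace: a call opened by r1 and later reopened by r1 itself.
trace = [
    {"concept:name": "Open", "org:resource": "r1"},
    {"concept:name": "Reopen", "org:resource": "r1"},
    {"concept:name": "Closed", "org:resource": "r2"},
]


def different_resource(a, t):
    return a["org:resource"] != t["org:resource"]


# Constraint 15 style: any Reopen after Open is forbidden.
print(check_not_response(trace, "Open", "Reopen"))                      # ([], [0])
# Constraint 16 style: only a Reopen by a different resource is forbidden.
print(check_not_response(trace, "Open", "Reopen", different_resource))  # ([0], [])
```

Adding a correlation condition can only shrink the set of forbidden targets, so every violation of constraint 16 is also a violation of constraint 15. This agrees with the lower violation count reported for constraint 16 (510 versus 2 121).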
completes this activity (see constraint 12). In 97% of the cases, the validation is done in at most 1 hour (constraint 13), and in more than half of the cases it is completed in less than 15 minutes (constraint 14). Figures 10 and 12 show two fulfillments for constraint 13 (the activations with the correlated targets), while Figure 11 shows a violation of the same constraint. Table 12 reports the execution times needed for checking the constraints in this case study. As in the first case study, the execution time is low (roughly between 2 and 3 seconds on average).

7.3. Rabobank

The case study we illustrate in this section has been provided for the BPI challenge 2014 by Rabobank Netherlands Group ICT [41]. The log we use pertains to the management of calls or mails from customers to the Service Desk concerning disruptions of ICT-services. The log contains 46 616 cases and 466 737 events referring to 39 different event classes. There are 242 originators and domain-specific event attributes like KM number, Interaction ID and IncidentActivity Number. For this case study, we have used the constraints shown in Table 13. As shown in Table 14, constraint 15 has 46 607 activations and 44 486 fulfillments. This tells us that around 95% of open calls are not reopened afterwards. This percentage is even higher if we only forbid an open call from being eventually reopened by a different resource (see constraint 16). Indeed, this holds in almost 99% of the cases. Around 99% of the open calls are eventually closed (see constraint 17). Around half of them are closed within 12 hours (constraint 18). The “KM number” in this case study identifies the characteristics of a call to understand how urgent the corresponding problem is. The checks on constraints 19 and 20 show that the calls corresponding to the number KM0002043 are, in general, more urgent than the ones corresponding to the number KM0000611. Indeed, out of 446 calls corresponding to the KM number KM0000611, only 60 are closed within 12 hours. On the other hand, out of 773 calls corresponding to the KM number KM0002043, 725 are closed within 12 hours. Figure 13 shows a violation for constraint 16. The selected event Open is followed by a forbidden event Reopen (associated to a different resource). Table 15 shows that the average execution times for this case study range from about 4.3 to 5.4 seconds.

Table 15: Execution times using the log from the BPI challenge 2014.

Id   Avg.execution time (milliseconds)
15   4 294
16   5 093
17   5 240
18   5 055
19   4 861
20   5 398

Figure 13: Example of violation for constraint 16; Open is followed by an event Reopen associated to a different resource.
(a) Example of violation: Open at position 1.
(b) A forbidden event Reopen occurs after Open.

8. Conclusion and Future Work

In this work, we propose a framework for checking the conformance of event logs with respect to MP-Declare models. MP-Declare is an extension of the declarative process modeling language Declare that allows the modeler to specify constraints over the data associated to the control-flow and over the “time dimension” of a business process. We describe and discuss in detail how the proposed framework can be used to define algorithms for conformance checking based on MP-Declare. Our proposal has been implemented in the process mining tool ProM. The implemented software covers the entire set of MP-Declare templates. In addition, the conformance checker can also be used with standard Declare. Extensive experimentation has been carried out using both real-life and synthetic logs. These case studies demonstrate the applicability of our implementation in realistic settings. Although it is extremely important to recognize deviations a posteriori, in some contexts it would also be useful to detect violations on-the-fly, as they occur. To this aim, in the near future we plan to make the proposed framework suitable for online settings.

References

[1] T. Murata, Petri nets: Properties, analysis and applications, in: Proceedings of the IEEE, 1989, pp. 541–580.
[2] Object Management Group (OMG), Business Process Model and Notation (BPMN) Version 2.0, Tech. rep. (Jan 2011).
[3] J. E. Cook, A. L. Wolf, Software process validation: Quantitatively measuring the correspondence of a process to a model, ACM Trans. Softw. Eng. Methodol. 8 (2) (1999) 147–176.
[4] A. Rozinat, W. M. P. van der Aalst, Conformance checking of processes based on monitoring real behavior, Inf. Syst. 33 (1) (2008) 64–95.
[5] W. M. P. van der Aalst, Process Mining: Discovery, Conformance and Enhancement of Business Processes, 1st Edition, Springer Publishing Company, Incorporated, 2011.
[6] A. Adriansyah, B. F. van Dongen, W. M. P. van der Aalst, Conformance checking using cost-based fitness analysis, in: Proceedings of the 15th IEEE International Enterprise Distributed Object Computing Conference, EDOC 2011, 2011, pp. 55–64.
[7] M. de Leoni, W. M. van der Aalst, Aligning Event Logs and Process Models for Multi-Perspective Conformance Checking: An Approach Based on Integer Linear Programming, in: International Conference on Business Process Management, Springer Berlin Heidelberg, 2013, pp. 113–129.
[8] F. Mannhardt, M. de Leoni, H. A. Reijers, W. M. van der Aalst, Balanced multi-perspective checking of process conformance, Tech. Rep. BPM-14-07, BPM Center (2014).
[9] W. M. P. van der Aalst, Decomposing process mining problems using passages, in: Application and Theory of Petri Nets - 33rd International Conference, Petri Nets 2012, 2012, pp. 72–91.
[10] W. M. P. van der Aalst, Decomposing petri nets for process mining: A generic approach, Distributed and Parallel Databases 31 (4) (2013) 471–507.
[11] M. de Leoni, J. Munoz-Gama, J. Carmona, W. M. P. van der Aalst, Decomposing alignment-based conformance checking of data-aware process models, in: On the Move to Meaningful Internet Systems: OTM 2014 Conferences - Confederated International Conferences: CoopIS, and ODBASE 2014, 2014, pp. 3–20.
[12] J. Munoz-Gama, J. Carmona, W. M. P. van der Aalst, Single-entry single-exit decomposed conformance checking, Inf. Syst. 46 (2014) 102–122.
[13] W. van der Aalst, M. Pesic, H. Schonenberg, Declarative Workflows: Balancing Between Flexibility and Support, Computer Science - R&D (2009) 99–113.
[14] Declare (2008). URL http://declare.sf.net
[15] M. Pesic, H. Schonenberg, W. van der Aalst, DECLARE: Full Support for Loosely-Structured Processes, in: EDOC 2007, pp. 287–298.
[16] F. Chesani, P. Mello, M. Montali, F. Riguzzi, M. Sebastianis, S. Storari, Checking Compliance of Execution Traces to Business Rules, in: Business Process Management Workshops, 2009, pp. 134–145.
[17] M. Montali, M. Pesic, W. M. van der Aalst, F. Chesani, P. Mello, S. Storari, Declarative specification and verification of service choreographies, ACM Transactions on the Web 4 (1) (2010) 1–62.
[18] A. Burattin, F. M. Maggi, W. M. P. van der Aalst, A. Sperduti, Techniques for a Posteriori Analysis of Declarative Processes, in: 2012 IEEE 16th International Enterprise Distributed Object Computing Conference, IEEE, 2012, pp. 41–50.
[19] M. Montali, F. Chesani, P. Mello, F. M. Maggi, Towards data-aware constraints in declare, in: Proceedings of the 28th Annual ACM Symposium on Applied Computing, SAC ’13, 2013, pp. 1391–1396.
[20] D. Borrego, I. Barba, Conformance checking and diagnosis for declarative business process models in data-aware scenarios, Expert Systems with Applications 41 (11) (2014) 5340–5352.
[21] W. M. P. van der Aalst, Process Mining: Discovery, Conformance and Enhancement of Business Processes, Springer Berlin / Heidelberg, 2011.
[22] A. Adriansyah, Aligning observed and modeled behavior, PhD thesis, Technische Universiteit Eindhoven (2014).
[23] M. Pešić, W. M. P. van der Aalst, A Declarative Approach for Flexible Business, in: Business Process Management, Springer Berlin Heidelberg, 2006, pp. 169–180.
[24] D. Basin, V. Jugé, F. Klaedtke, E. Zălinescu, Enforceable Security Policies Revisited, ACM Transactions on Information and System Security 16 (1) (2013) 1–26.
[25] D. Basin, M. Harvan, F. Klaedtke, E. Zălinescu, Monitoring Data Usage in Distributed Systems, IEEE Transactions on Software Engineering 39 (10) (2013) 1403–1426.
[26] M. A. Grando, W. M. P. van der Aalst, R. S. Mans, Reusing a Declarative Specification to Check the Conformance of Different CIGs, in: Business Process Management Workshops, Springer Berlin Heidelberg, 2012, pp. 188–199.
[27] M. A. Grando, M. H. Schonenberg, W. M. P. van der Aalst, Semantic-Based Conformance Checking of Computer Interpretable Medical Guidelines, in: International Joint Conference, BIOSTEC, Vol. 273 of Communications in Computer and Information Science, Springer Berlin Heidelberg, Berlin, Heidelberg, 2013, pp. 285–300.
[28] M. de Leoni, F. M. Maggi, W. M. P. van der Aalst, Aligning Event Logs and Declarative Process Models for Conformance Checking, in: Business Process Management, Springer Berlin / Heidelberg, 2012, pp. 82–97.
[29] M. de Leoni, F. M. Maggi, W. M. van der Aalst, An alignment-based framework to check the conformance of declarative process models and to preprocess event-log data, Information Systems (2014) 1–20.
[30] IEEE Task Force on Process Mining: XES Standard Definition, 2013.
[31] H. M. W. Verbeek, J. C. A. M. Buijs, B. F. van Dongen, W. M. P. van der Aalst, XES, XESame, and ProM 6, in: Information Systems Evolution CAiSE Forum, Vol. 72, 2010, pp. 60–75.
[32] J. Chomicki, Efficient checking of temporal integrity constraints using bounded history encoding, ACM Trans. Database Syst. 20 (2) (1995) 149–186.
[33] R. Koymans, Specifying real-time properties with metric temporal logic, Real-Time Systems 2 (4) (1990) 255–299.
[34] M. Montali, M. Pesic, W. M. P. van der Aalst, F. Chesani, P. Mello, S. Storari, Declarative Specification and Verification of Service Choreographies, ACM Transactions on the Web 4 (1).
[35] O. Kupferman, M. Vardi, Vacuity Detection in Temporal Model Checking, Int. Journal on Software Tools for Technology Transfer (2003) 224–233.
[36] F. M. Maggi, Declarative process mining with the declare component of prom, in: BPM (Demos), 2013.
[37] M. Westergaard, F. M. Maggi, Looking into the future: Using timed automata to provide a priori advice about timed declarative process models, in: Proc. of CoopIS, LNCS, Springer, 2012.
[38] R. De Masellis, F. M. Maggi, M. Montali, Monitoring data-aware business constraints with finite state automata, in: International Conference on Software and Systems Process 2014, ICSSP, 2014, pp. 134–143.
[39] 3TU Data Center, BPI Challenge 2011 Event Log (2011). doi:10.4121/uuid:d9769f3d-0ab0-4fb8-803b-0d1120ffcf54.
[40] 3TU Data Center, BPI Challenge 2012 Event Log (2012). doi:10.4121/uuid:3926db30-f712-4394-aebc-75976070e91f.
[41] 3TU Data Center, BPI Challenge 2014 Event Log (2014). doi:10.4121/uuid:c3e5d162-0cfd-4bb0-bd82-af5268819c35.