A Bottom-up Workflow Mining Approach for Workflow Applications Analysis
Walid Gaaloul1, Karim Baïna2, and Claude Godart1
1 LORIA - INRIA - CNRS - UMR 7503, BP 239, F-54506 Vandœuvre-lès-Nancy Cedex, France
2 ENSIAS, Université Mohammed V - Souissi, BP 713 Agdal - Rabat, Morocco
[email protected],
[email protected],
[email protected]
Abstract. Engineering workflow applications are becoming more and more complex, involving numerous interacting business objects within large processes. Analysing the interaction structure of these complex applications enables them to be well understood, controlled, and redesigned. Our contribution to workflow mining is a statistical technique for discovering workflow patterns from event-based logs. Our approach is characterised by a "local" discovery of workflow patterns, which allows partial results to be covered through a dynamic programming algorithm. These locally discovered workflow patterns are then composed iteratively until the global workflow model is discovered. Our approach has been implemented in our prototype WorkflowMiner.
Keywords: workflow mining, workflow patterns, business process analysis, business process intelligence.
1 Introduction
With technological improvements and continuously increasing market pressures and requirements, collaborative information systems are becoming more and more complex, involving numerous interacting business objects. Analysing the interactions of these complex systems enables them to be well understood, controlled, and redesigned. Our paper is a contribution to this problem in the particular context of workflow applications analysis through workflow mining. Our approach (a) starts by collecting log information from workflow process instances as they took place (event collectors and adapters component). Then, (b) it builds, through statistical techniques, a graphical intermediary representation modelling elementary dependencies over the executions of workflow activities (events analyser component). These dependencies are then (c) refined to discover workflow patterns (patterns analyser component). Besides workflow patterns analysis, some workflow performance metrics are computed (performance analyser component), but they are outside the scope of this paper.
This paper is structured as follows. Section 2 explains our workflow log model. Section 3 details our structural workflow patterns mining algorithm. Section 4 discusses related work, implementation and perspectives, before concluding.
2 Workflow Log Model
The workflow specification might not be concerned with the details of the activities; however, it has to deal, at least, with the externally visible completion events of activities (such as aborted, failed, and completed). Currently, most WfMSs log all events occurring during process execution. We expect the activities to be traceable, meaning that the system should somehow keep track of ongoing and past executions. As shown in the UML class diagram of figure 1, a WorkflowLog is composed of a set of EventStreams (definition 1). Each EventStream traces the execution of one case (instance). It consists of a set of events (Event) that captures the life cycle of the activities performed in a particular workflow instance. An Event is described by the identifier of the activity that it concerns, the current activity state (aborted, failed, completed or compensated) and the time when it occurs (TimeStamp). A Window defines a set of Events over an EventStream. Finally, a Partition builds a set of partially overlapping Windows over an EventStream.
Fig. 1. Workflow Log Model
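To make the Window and Partition notions more concrete, here is a minimal Python sketch. It assumes fixed-width windows that advance one event at a time, which is only one possible way of obtaining partially overlapping windows; the function name `partition` and the `width` parameter are illustrative assumptions and are not part of the model of figure 1.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def partition(events: Sequence[T], width: int) -> List[List[T]]:
    """Split an ordered event sequence into partially overlapping windows.

    Assumption (not from the paper): every window holds `width` consecutive
    events and each window starts one event after the previous one, so two
    neighbouring windows share `width - 1` events.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    if len(events) <= width:
        # A short stream yields a single window (or none if it is empty).
        return [list(events)] if events else []
    return [list(events[i:i + width]) for i in range(len(events) - width + 1)]


# A short sequence of activity identifiers, used only for illustration.
windows = partition(["A1", "A2", "A4", "A3", "A5"], width=3)
```

With a width of 3, each window shares two events with its neighbour, so every pair of consecutive events appears in at least one window.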
Definition 1. (EventStream) An EventStream represents the history of the events of a workflow instance as a tuple EventStream = (begin, end, sequenceLog, SOccurrence) where:
- begin : TimeStamp is the moment the log begins;
- end : TimeStamp is the moment the log ends;
- sequenceLog : Event* is an ordered set of Events belonging to a workflow instance;
- SOccurrence : int is the activity instance number.
A WorkflowLog is a set of EventStreams: WorkflowLog = (workflowID, {EventStream_i, 0 ≤ i < number of workflow instances}), where EventStream_i is the event stream of the i-th workflow instance.
Here is an EventStream related to an instance of the workflow in figure 2. This EventStream was filtered to keep only the events whose state is completed.
L = EventStream((13/5/2005, 5:42:12), (14/5/2005, 14:01:54),
  [Event("A1", completed, (13/5/2005, 5:42:12)),
   Event("A2", completed, (13/5/2005, 11:11:12)),
   Event("A4", completed, (13/5/2005, 14:01:54)),
   Event("A3", completed, (14/5/2005, 00:01:54)),
   Event("A5", completed, (14/5/2005, 5:45:54)),
   Event("A7", completed, (14/5/2005, 10:32:55)),
   Event("A9", completed, (14/5/2005, 14:01:54))])
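Definition 1 and the example stream L can be mirrored by a small Python sketch. The field names transliterate the tuple components; the use of `datetime` for TimeStamp and the helpers `ts` and `completed_only` are assumptions made for illustration. SOccurrence is left unset because the example does not give a value for it.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Event:
    activity: str        # activity identifier
    state: str           # aborted, failed, completed or compensated
    timestamp: datetime


@dataclass
class EventStream:
    begin: datetime
    end: datetime
    sequence_log: List[Event]
    s_occurrence: Optional[int] = None  # not given for the example stream


def ts(text: str) -> datetime:
    """Parse a timestamp written as in the example, e.g. '13/5/2005 5:42:12'."""
    return datetime.strptime(text, "%d/%m/%Y %H:%M:%S")


def completed_only(stream: EventStream) -> EventStream:
    """Hypothetical helper: keep only the events whose state is 'completed'."""
    kept = [e for e in stream.sequence_log if e.state == "completed"]
    return EventStream(stream.begin, stream.end, kept, stream.s_occurrence)


# The example EventStream L, transcribed from the text.
L = EventStream(
    ts("13/5/2005 5:42:12"),
    ts("14/5/2005 14:01:54"),
    [Event(name, "completed", ts(when)) for name, when in [
        ("A1", "13/5/2005 5:42:12"),
        ("A2", "13/5/2005 11:11:12"),
        ("A4", "13/5/2005 14:01:54"),
        ("A3", "14/5/2005 00:01:54"),
        ("A5", "14/5/2005 5:45:54"),
        ("A7", "14/5/2005 10:32:55"),
        ("A9", "14/5/2005 14:01:54"),
    ]],
)

# Order of activities after filtering on the completed state.
order = [e.activity for e in completed_only(L).sequence_log]
```

Since L already contains only completed events, the filter leaves it unchanged; on a raw stream it would drop the aborted, failed and compensated events.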
Fig. 2. Workflow running example
3 Mining structural workflow patterns
As stated before, we start by collecting a WorkflowLog from workflow instances as they took place. Then we build, through statistical techniques, a graphical intermediary representation modelling elementary dependencies over workflow logs (section 3.2). These dependencies are then refined into advanced structural workflow patterns (section 3.3). An elementary dependency is an "immediate" dependency (see footnote 3) linking two activities, in the sense that the termination of the first causes the activation of the second. Thus, the termination event of the first activity is considered as the pre-condition of the activation of the second and, reciprocally, the activation of the second is considered as a post-condition of the termination of the first. An advanced structural workflow pattern, in contrast, is a set of elementary dependencies together with an advanced structure that expresses a specific control flow behaviour linking these dependencies. A pattern is the abstraction from a concrete form which keeps recurring in specific non-arbitrary contexts. Thus, a workflow pattern [1] can be seen as an abstract description of a recurrent class of interactions based on a (primitive) activation dependency.
3.1 Overview
As illustrated in figure 3, our approach is applied in a bottom-up manner:
1. Discovering activities dependencies: First, we specify the dependencies linking workflow activities during execution. We divide these dependencies into two kinds: causal and non-causal. A causal dependency between two activities expresses that the occurrence of an event of one activity involves the activation of an event of another activity, while a non-causal dependency specifies any other behavioural dependency between activities.
2. Computing statistical behavioural properties: Secondly, we compute the statistical behavioural properties from the logs. These properties tailor the main behavioural features of the patterns chosen for discovery. We define three types of properties: sequential, concurrent and choice. The sequential and choice properties inherit from causal dependency. The first expresses an exclusive causal dependency between two activities, while the second specifies a causal dependency between one activity on the one hand and a set of activities on the other hand. The concurrent property inherits from non-causal dependency and characterises the concurrent behaviour of a set of activities.
3. Discovering workflow patterns: Finally, we use a set of rules to discover a set of the most useful patterns. These rules are expressed using the statistical properties and specify an indicator function (which could be expressed as a first-order logic predicate, for instance) that defines a pattern in a unique manner. In this work, we have chosen to discover the most useful patterns, but the adopted approach allows this set of patterns to be enriched by specifying new statistical dependencies and their associated properties, or by using the existing properties in new combinations to discover new patterns.
Footnote 3. The terms immediate and direct are used interchangeably in the remainder of the paper.
Fig. 3. Hierarchical view of workflow patterns mining approach
3.2 Discovering elementary dependencies
The aim of this section is to explain our algorithm for discovering elementary dependencies in a WorkflowLog and building an intermediary model representing those dependencies: the statistical dependency table (or SDT).
Discovering direct dependencies. In order to discover direct dependencies from a WorkflowLog, we need an intermediary representation of this WorkflowLog obtained through statistical analysis. We call this intermediary representation the statistical dependency table (or SDT); it is based on the notion of frequency table [2]. As workflow patterns are described only by control flow dependencies, this table captures direct control flow dependencies, which relate exclusively to the "terminated" state of activities in "correct" (i.e., without "exceptions") executions. There is no need to use the other EventStreams, which relate to failed executions containing failed, aborted or compensated states. In fact, these cases concern only transactional behaviour and dependencies, which tailor the mechanisms for failure handling and recovery. These issues are out of the scope of our paper. Nevertheless, in [3, 4] we use these events to discover and improve workflow transactional behaviour.

P(x/y)  A1    A2    A3    A4    A5    A6    A7    A8    A9
A1      0     0     0     0     0     0     0     0     0
A2      0.54  0     0     0.46  0     0     0     0     0
A3      0     0.69  0     0.31  0     0     0     0     0
A4      0.46  0.31  0.23  0     0     0     0     0     0
A5      0     0     0.77  0.23  0     0     0     0     0
A6      0     0     0     0     1     0     0     0     0
A7      0     0     0     0     1     0     0     0     0
A8      0     0     0     0     0     0     0     0     0
A9      0     0     0     0     0     0.38  0.62  0     0
#A1 = #A2 = #A3 = #A4 = #A5 = #A9 = 100, #A6 = 38, #A7 = 62, #A8 = 0
Table 1. Initial Statistical Dependencies Table (P(x/y)) and activities Frequencies (#)

Consequently, to mine workflow patterns, we need to filter the analysed WorkflowLog and keep only the EventStreams of instances executed "correctly". Basically, the SDT is built through a statistical calculus that extracts elementary dependencies between the activities of a WorkflowLog that are executed without "exceptions" (i.e. they successfully reached their completed state). We denote this selection of the workflow log by WorkflowLog_completed. Thus, the only necessary condition for discovering elementary dependencies is to have workflow logs containing at least the completed event states. This feature makes it possible to mine control flow from "poor" logs which contain only completed event states. Moreover, any information system using transactional systems or workflow management systems offers this information in some form. For each activity A, we extract from WorkflowLog_completed the following information into the statistical dependency table (SDT): (i) the overall number of occurrences of this activity (denoted #A) and (ii) the elementary dependencies on previous activities Bi (denoted P(A/Bi)). The size of the SDT is n × n, where n is the number of workflow activities. The (m,n) table entry (notation P (A0≤i