Towards Goal-driven Access to Process Warehouse: Integrating Goals with Process Warehouse for Business Process Analysis Khurram Shahzad
Jelena Zdravkovic
Department of Software and Computer Systems (SCS) Royal Institute of Technology (KTH) Forum 120, SE 164 40 Kista, Stockholm, Sweden
[email protected]
Department of Computer and Systems Sciences (DSV) Stockholm University (SU) Forum 100, SE 164 40 Kista, Stockholm, Sweden
[email protected]
Abstract - Analysis and improvement of business processes, one of the core phases in the BPM life cycle, is becoming on top of the agenda for many enterprises. An emerging approach to analyze business processes is to use the business intelligence approaches that attempt to facilitate the analytical capabilities of business process management systems by implementing process-oriented data warehouse and mining techniques. However, very little work has been done on employing business orientation in the design and utilization of process warehouse, which greatly limit the number of users who are able to fully exploit the potential of warehouse technology. The approach presented in this paper, addresses that limitation by attempting to integrate business goals with warehouse. The proposed approach to integration spans across two levels, conceptual and implementation; the first defines the concepts that are used to relate goals and process warehouse; the latter is used to implement the concepts and the relations, to link warehouse data and goal records in relational databases. The proposed approach enhances the way users access warehouse data and interpret the data. To facilitate the use of the proposed approach we have implemented a prototype. The evaluation results, though preliminary, provides evidence of the usefulness of our approach. Keywords- Business Process Analysis, Goals Integration, Process Data Warehouse, Business-orientation of Process Data Warehouse.
I.
INTRODUCTION
Business Process Management (BPM) is being established as an important instrument to help organizations in meeting their business goals and achieving competitive advantages. Analysis and improvement of business processes, one of the core phases in the BPM life cycle, is becoming on top of the agenda for many enterprises. This requires an understanding of business processes, as well as approaches for continuously managing and changing them to facilitate sustainable process improvement. An emerging approach to analyze business processes is to utilize business intelligence techniques. Some of these techniques attempt to boost the analytical capabilities of BPM systems by employing process-oriented data warehouse [1], or simply Process Warehouse (PW). These data storage structures are aimed to capture information about executed processes to enable their efficient analyses and thereby provide basis for possible improvements [2]. Two major issues in capturing data for the analysis of a business process using a warehouse, concern a) the
requirement to load the process-relevant data in the warehouse, and b) to identify and retrieve problem-relevant data in an efficient way. The first issue has been initially addressed by introducing the concept of Process Warehouse; the latter is even more a subject for consideration. Due to large size of process warehouses and lack of approaches for their efficient use, users commonly consider many irrelevant data, miss vital information, and as a consequence, fail in the process analysis task. In order to address the outlined problems, a starting point may be the consideration of the needs of business process analysts and decision makers. They are accustomed to the business perspective in the process analysis, and thereby far of the capability to directly utilize low-level data contained in warehouses [3, 4, 5]. It has been argued that these users prefer to analyze business performance with regards to, for instance, business goals [5, 6]. In the absence of the contextual information such as business process goals and their fulfilment criteria, the retrieved values are difficult to interpret, or can be interpreted incorrectly, consequently making the analysis inefficient. Consider a healthcare process in which the performance is analyzed in terms of the number of patients attended per day. Given the number is 80, it is not clear whether the process is efficient or not - due to absence of the information about business goals and their fulfillment criteria, a user may consider the process efficient, whereas another user may find it inefficient, thereby making a different decision. Our approach to address the discussed issues is based on a business orientation in the design of PW and its utilization, by integrating goals with the warehouse data. By the integration we aim to serve several purposes: a) to retrieve relevant information, b) support for a goal-based navigation through PW and hence an improvement of the way users access warehouse data, c) defining contextual information in the form of goals and satisfaction conditions and hence enhancing the way users interpret data, i.e. ensuring correct data interpretation. The proposed approach to integration spans across two levels, conceptual and implementation. At the conceptual level, we define the concepts needed to relate goals and PW, whereas at the implementation level, we present the extensions to the PW conceptualization, needed to model the relation between goal and PW at the data level. In addition to that, we have implemented a prototype that facilitates the navigation through PW using goals.
Rest of the paper is organized as follows: Section 2 outlines the related work. Section 3 presents a generic meta-model for process warehouse and extensions to integrate goals with PW at the conceptual level. In section 4 integration of goal with PW is defined at the implementation level, and the implemented prototype is presented. Section 5 presents an evaluation of the proposed approach, followed by conclusions in Section 6. II.
RELATED WORK
The related work to this study has been divided into two categories followed by a brief discussion. These categories are: a) the use of data warehouse technology for process analysis, b) business orientation of warehouse. A. Using PW for Business Process Analysis For more than a decade the data warehouse technology (together with analysis tools like Online Analytical Process Systems [7]) has been employed in many domains, such as healthcare and agriculture in order to enhance the analytical capabilities of enterprises. However, the technology does not provide adequate bases for detailed analyses of business processes. It is because the warehouse data is extracted from Online-Transactional Processing systems and thereby, lack in providing the facts concerning business process executions, such as idle time and execution time. As a consequence, credible decisions on potential process improvements cannot be taken [1, 2, 8]. To overcome this problem the concept of process data warehouse [10], or simply process warehouse (PW) [9] has been introduced. Studies related to business process analysis have focused on presenting toolsets like Business Process Cockpit [2] and their architectures like Business Process Intelligence System [11]. These toolsets allow users to monitor, analyze and optimize processes. Despite the fact that these studies are conducted at different geographical locations and at different time intervals a common thing among such solutions is the use of process warehouse that captures information about running processes. Specific studies on process warehouses have been mainly focused on proposing their multidimensional data model, generic [8], or specific to a certain domain, like healthcare [12] and chemical engineering [13]. Although some of these studies explicitly state that they present an excerpt of the data model, analyses of PW design proposals conclude that the presented data models are incomplete [14, 15]. Brandt et. al. [13, 16] presented an extended design of PW that is capable of capturing two types of traces to facilitate the support of creative design processes. The types are, product and process traces. Product traces describe the properties and relationships of concrete design objects and process traces describe the history of actions that led to the creation of the product objects. From the discussion it can be concluded that the use of data warehouse technology is an established way for business process analysis.
B. Business Orientation of Warehouse During the last decade a number of efforts have been reported on business orientation of data warehouses with an aim to facilitate users in using data warehouses. At first Sarda et al. [5] discussed the use of business metadata in data warehouse systems. Following that, Stefanov et. al. [4] extended Sarda’s work and introduced an integrated view of data warehouse and enterprise models using model weaving links. The model weaving approach was also used to make the relationship between data warehouse and enterprise goals visible and accessible [6]. The study focuses on providing this information to users within analysis tool to support and improve data interpretation. However, the details about what extensions should be made to data warehouse to capture information about goals and their relationship are not discussed. Addition to that, the benefit of the approach is limited to improve interpretation of data i.e. it does not facilitate the way users navigate data warehouse. In line with the business orientation of warehouse, Xie et. al. [17] reported the work being done at IBM China to build business-friendly data warehouse using semantic web technology. The approach aims to make business semantics explicit in the form of a Business Concept Model by using the concepts familiar to the data warehouse users such as classes, attributes and relationship. Then, a mapping from the business metadata to the schema of the data warehouse is made to enhance usability of data warehouse. Although the efforts to formally model a responsive warehouse are done in the context of data warehouse, such an effort has not been done in the context of process warehouse. On the other hand, recent efforts [18, 19, 20] related to process warehouse focus on developing methods and tools that can guide process managers for business process improvement using process warehouse. In contrast to that, this study facilitates these users in using PW, by proposing a goal-driven access to the warehouse data. In particular, we propose to integrate the goals related to the performance of business processes with process warehouse. Our solution differs from the discussed studies in a number of ways: a) in order to integrate goals with PW, we consider both the conceptual and implementation level; b) our approach allows navigation through PW using goals; c) retrieval of relevant information from PW. III.
INTEGRATING GOALS WITH PW: CONCEPTUAL LEVEL
In this section, we first present a conceptual metamodel for process warehouse that can guide designers in the specification of PW. The metamodel encompasses existing PW design proposals by capturing common design aspects. Additionally, the model is extendable to instantiate specific cases. Secondly, we extend the metamodel to integrate goals with PW. Intuitively, a generic metamodel for PW requires consideration of common business process concepts and frameworks. Therefore, we employ a process design framework [22], a generic metamodel for business processes [23], to design a common metamodel for PW. In the following paragraphs, we describe them briefly:
The process design framework proposed by Curtis [22], is a conceptual framework that represents a business process in terms of four distinguished yet interrelated perspectives for analyzing and presenting process information [24]. The framework has been used in the past (for instance in [25]) as a reference to achieve a comprehensive representation of process concepts and also to ensure that basic building blocks are covered. The four perspectives of the framework are: functional, informational, organizational, and behavioral. Functional perspective concerns the elements of a process models that are performed i.e. the activities executed during the course of process execution. Behavioral perspective concerns the time and the order in which the elements process are performed i.e. through looping, advance branching and decision conditions etc. Organizational perspective concerns the place where process elements are performed and the participants who perform them i.e. distribution of responsibility for executing process activities. The main focus here is on the notion of the actor which can be an organization unit, a human or software. Informational perspective regards the resources imparted, consumed and produced by elements of a process. A resource can be traditional, such as a product, or informational, such as data and/or artifacts. A metamodel for business processes is said to be generic if it represents core concepts of business processes independent of any process modeling language. Figure 1 shows a generic metamodel for business processes taken from a previous work [23]. It is defined that a business process consists of a set of activities, and a process may also have of sub-processes [24]. The execution of a business process is initiated by an event called start event and terminated by an event called end event. During process execution, activities execute in a specific order. Control flow defines the order in which activities can be executed and it can be of two types [25], operators and connection. During execution, an activity can consume, produce, or transfer a resource between participants, where, participants are the entities that can execute a task to take/transfer control of a resource [23].
Step1 - Basic Elements: Information in a data warehouse is structured into facts and dimensions [26, 27]. Since PW is a special type of data warehouse, therefore, fact and dimension are the two fundamental elements of the PW metamodel and are used as a starting point for designing the metamodel for PW; where, fact is an item of interest and it is described through attributes called measures [27]. Dimension describes the context in which a fact is to be analyzed [7]. A dimension may be shared between facts and may have one or more aggregation levels which forms the aggregation hierarchy [6, 27]. All these elements form the core warehouse metamodel and are represented by the unshaded elements in Figure 2. Step2 - Classification of Measures: As a next step, we employ the process design framework to extend the metamodel with the four perspectives, functional, behavioral, organizational and informational. The perspectives explicitly represent the types of measures that should be captured in a PW. While some existing approaches (like the one in [28]) either ignore facts or present a few measures in their PW design, these types can help by guiding users in identifying measures and ensures the identification of a more exhaustive set of measures. Obviously, the classification cannot guarantee that a complete set of measures will be identified. However, we assume that the use of classification reduces the complexity of analyzing a process (as argued by Curtis et. al. [22]) and therefore yields an exhaustive list of measures. Also, it has been argued in literature [25] that these perspectives are relevant to adequately organize information about a process. These extensions to the metamodel are represented by shaded elements with thin boundaries in figure 2. The functional perspective represents the facts related to the functions performed in a process e.g. duration of activity and idle time of an activity. Organizational perspective represents the facts related to entities in an organization that are relevant to the process e.g. number of actor types used in the process and the place where the activity was realized. Informational perspective represents the facts related to the data, objects/ resources imparted, transferred produced or consumed within a process e.g. amount of resource types used. Behavioral perspective represents the facts related to the event that launched activities, execution order or activity, cycletime, anomalous behavior, path, exception e.g. idle time between activities and waiting time. Step3 - Classification of Dimensions: As the next design step, we employ the generic business process metamodel for defining the warehouse dimension types based on the elements of business process metamodel. In addition to those, time and location dimensions are added in order to keep the time-variant property of warehouse and to keep information about the place in the enterprise where the process is executed. Also, from the study of an existing approach [14], it is found that time dimension is modeled in all the studied existing PW proposals which additionally justify the use of the time dimension.
Figure 1: A generic metamodel for Business Processes [23] A. The Metamodel Below, we describe the concepts of the metamodel and the steps used to develop the metamodel.
Figure 2: Conceptual level metamodel for PW
By the definition, a process is a composed set of activities. Due to this hierarchical relation between process and activity, these two elements together form the activity dimension of PW. The event concept from the business process metamodel is directly added as a dimension, whereas two sub-classes (informational and tangible) are defined for resource concept. It is due to the reason that, in an earlier study [14] about the comparison of existing approaches we found that informational resources have not been modeled by most of the existing approaches. Therefore, the purpose of defining sub-classes is to emphasis the two major types of resources. For the same reason, two sub-classes (passive and active) are also defined for participant, where active participant in our case is the one who is responsible to perform an activity and passive is the one who is not responsible to perform the activity but involved in the activity. All these extensions to the metamodel are represented by shaded elements with solid boundary in Figure 2. In the surgery example, the doctor who is responsible for the surgery is an active participant, whereas patient whose surgery is performed is a passive participant. In the proposed metamodel we have identified classes of dimensions, however to generate case specific dimensions, user can build several dimensions out of one dimension and/or merge some dimensions. Similar to facts, classification of dimension doesn’t guarantee complete set of dimension, however it works as a guideline to identify dimensions, reduces the complexity of identifying dimensions and likely yields an exhaustive list of dimensions. Step4 - Integrating goals with PW metamodel: As a next step, we integrate goals with the metamodel. In the context of this study, a goal is a measurable state of a process or a segment of a process, in terms of QoS (quality of service) property, that is intended to achieve [20]. Examples of goals are ‘efficient registration’ and ‘timely treatment’, where ‘efficient’ and ‘timely’ are QoS properties and ‘registration’ and ‘treatment’ are sub-processes of a healthcare process. From the definition, a goal is measurable therefore a set of indicators can be defined for each goal, where indicators are a set of criteria in terms of which a goal can be measured. For each indicator a desired value called satisfaction condition (S) can also be defined, where satisfaction conditions of all indicators of a goal forms the satisfaction condition (Sg) of the goal, also referred to as fulfillment criteria of the goal. A satisfaction condition S is a coupling of desired value and context,
represented by S (V, C). Desired value is the value that is intended to achieve and context represent the settings in which the value is valid. Consider an example of satisfaction condition, ‘one staff member should register patient’. In this example, the value ‘one’ is the desire value and context is ‘staff member’ (a participant) and ‘register patient’ (an activity). The desired value ‘one’ is not meaningful in the absence of the settings in the the information is valid. Based on the discussion we add three elements to the metamodel. These elements are goal, indicator and satisfaction condition. The self loop on goal represents that a goal can participate in a goal structure, where the goal structure is build along functional decomposition of a process on which it is defined [29]. It is build along by recursively analyzing a process until the level of atomic actions are derived, where no further process sub-elements can be identified [29]. A goal, through its indicators, is related with the measure and aggregation level – two elements of PW metamodel. i) The relationship between measure and indicator represents that an indicator of a goal can be directly represented as a measure (i.e. the concrete items of interest) or a function can be defined on top of measure/s to represent an indicator. Consider a fabricated example of a fact table Treatment [QtyofMedicatedPatients, QtyofOperatedPatients] and two indicators ‘Medicated Patients’ and ‘Treated Patients’ represented by I1 and I2 respectively, where I2 represents the total number of patients treated. I1 is represented by QtyofMedicatedPatients, whereas a single measure to represent I2 is not given in the Treatment fact. However, a SUM function can be defined on top of the two measures (QtyofMedicatedPatients, QtyofOperatedPatients) to represent I2. Earlier in the section, we proposed to use a classification of measures (functional, behavioral, organizational, informational) to yield an exhaustive list of measures, therefore it is reasonable to assume that for each indicator one or more corresponding measures can be identified from PW. ii) The relationship between aggregation level of dimension and indicator serves two purposes, a) To scope measures by aggregate (rollup) or detail (drilldown) operations [27]. For that, it is important to first identify what aggregation levels (of dimensions) can be used to analyze the measures related with an indicator. Secondly, the relationship is important to correctly scope which instances of measures corresponds to the indicator. b) It helps to identify what contextual values can be used in the definition of satisfaction condition. As stated above, satisfaction condition of each indicator is a coupling of context and desired value. The desired value is a possible instance of a measure and context is a set of values of dimension levels. For the Treatment fact table, consider a Time dimension with aggregation level Year, Month and Week. If the aggregation level is set of Year 2009 the QtyofMedicatedPatients gives the medicated patients quantity for the year 2009. IV.
INTEGRATING GOALS WITH PW: IMPLEMTATION LEVEL
The metamodel describe the elements and their relationships which are used to integrate goals with PW, at the conceptual level. Given that a PW is implemented in the form
of tables (dimension and fact tables) and a relationship between them, the conceptual level does not describe what changes should be made to dimension and fact tables of a PW to represent the records that are related with goals. To capture the discussed information in PW we present what changes should be made to PW description. Our solution is a refined form of Bebel’s approach [30] to share data between versions of a data warehouse. Their approach suggests addition of a set of bitmaps to a given record in order to capture information about all versions the record belongs to. Value of the bitmap attribute could be 1 or 0, where the values represent relevance and non-relevance of the complete record, respectively. By using the approach, as presented in [30], it is not possible to represent relevance of a segment of record, whereas in our case a goal is related to a segment of record in most cases i.e. not to the whole record of dimension or fact table. Consider a simple example of a fact table with two measures, Patient [TreatmentDelay, RegistredPatientsQty] and two goals ‘timely treatment’ and ‘efficient registration’. The goals are respectively measured in terms of indicators, ‘Delay in Treatment’ and ‘Amount of Registered Patients’. In order to analyze a goal ‘timely treatment’, the records in TreatmentDelay are needed whereas the records in RegisteredPatientsQty are not needed. Similarly, to analyze a goal ‘efficient registration’ the records in RegistredPatientsQty are needed whereas records in TreatmentDelay are not needed. This leads us to propose changes to PW description at two levels, Schema level and Data level. The details are as follows: Schema level: Schema level defines the tables and segment of a table (in terms of attributes) that is relevant to a goal. This level captures what tables (dimension, fact) and attributes are related with goals through its indicators. Our approach to capture schema level relationship is based on addition of a metatables structure that captures information about goals, indicators, satisfaction conditions and their relationship with dimension, fact tables. Figure 3 shows the data model for the metatable structure that we have hardcoded in our prototype (see prototype details in next section). The Goal and Goal_Structure tables capture information about goals and their relationship with other goals. The indicators and their relationship with goals are captured by Goal_Indicator and Indicator tables respectively. The satisfaction condition and their relationship with indicators and goals are captured in S_Condition and Indicator_SC tables respectively. Indicator_fact table captures the information about relevant fact tables and attributes, whereas Indicator_Dim captures information about dimensions and their attributes which corresponds to indicators.
Figure 3: Data model for metatable structure
Data level: Data level defines the instances (in terms of records) of the relevant tables (dimension and fact) that are related to the goal. Our approach to relate a record with multiple goals is based on addition of bitmaps to the dimension and fact tables. What is efficient about addition of bitmaps is that, we suggest addition of a bitmap to the PW tables that are defined relevant to a goal at schema level i.e. we negate the addition of bitmaps to each table for each goal to avoids unnecessary increase in size of PW. As the relationship is defined at two levels, Schema and Data, the value of the bitmap (1 or 0) does not represent the relevance or irrelevance of the complete record. Instead the value 1 of bitmap for a goal represents the relevance of the record to the segment of the table defined at schema level. For the Patient fact table example, two goals ‘timely treatment’ and ‘efficient registration’ are related with the fact table. So, add two bitmap attributes G1 and G2 corresponding to ‘timely treatment’ and ‘efficient registration’ respectively. The value 1 or 0 of G1 or G2 does not represent the relevance or irrelevance of the complete record i.e. [TreatmentDelay, RegistredPatientsQty]. Instead the value 1 of G1 represents that the record of TreatmentDelay is related to related goal which in this case is ‘timely treatment’. A. The Prototype To facilitate the use of the proposed approach, we have implemented a prototype. The prototype supports the concept of integrating goal with process warehouse and navigating PW using goals.
For the Patient fact table example, the name of goals ‘efficient registration’ and ‘timely treatment’ are captured in Goal table whereas name of related indicators and measures are stored in Indicator and Indicator_Fact tables. Due to primary-foreign key relationships in the metatables structure, goals and their related measures can be traced to each other through indicators. Figure 4: Use case diagram (an excerpt)
Figure 5: User Interface for the Goal Module
An excerpt of the use case diagram is presented in figure 4 which provides bases for designing the prototype. An interface (see screenshot in figure 5) allows defining and changing goals, indicators and satisfaction conditions. Another interface allows definition of relation between goal, indicators, satisfaction condition, measures and aggregation levels. On defining the relations, the prototype extracts and store information about the defined relationship in an internal repository that implements the metatable structure shown in figure 3. In order to allow navigation of PW using goals, we have implemented a third interface. Figure 6 shows the screenshot of the interface using which a goal structure/s is available to users for navigating PW. On choosing a goal from the goal structure, the prototype automatically extracts the satisfaction condition and the data about executed process. As discussed in the introduction section, business users are accustomed to their own vocabularies and concepts [5, 6], therefore, navigating PW with classical dimensions to identify the information useful for analysis can be a complex task. Providing access to PW using goals simplifies the task of navigating PW by business users due to the familiarity of these users with goals. In classical approach, due to the potential size of process warehouse, business users may consider irrelevant information or can miss important points, consequently, making the analysis error pruned [20, 29]. Whereas, in goaldriven approach due to the relationship with PW only related data is retrieved, which decreases the possibility to consider irrelevant information and hence decreases the possibility of making the analysis error pruned. The approach also enhances the way users interpret data, due to the presence of context in the form of goal. Generally, input to warehouse is a set of dimension values whereas the output is a set of fact values. It has been argued before that the real meaning of the values can be missed [29] and in the absence of information about desired values identification of bottlenecks is not a straight forward task. Whereas, in our approach, the addition of goals and satisfaction conditions simplify the task because the target values (captured and provided as satisfaction conditions) can be compared with the values extracted from PW to conclude on achievement or not achievement of goals. Therefore, the approach increases the usefulness of the data by enhances the way users use PW data.
Figure 6: User Interface of the Analysis Module
V.
EVALUATION
In this section we describe the results from the part of the study that we have carried out for evaluating the proposed approach. In an earlier part of the paper, we claim that the conceptual level metamodel encompasses existing PW proposals. Therefore, for conceptual level, we compare the proposed metamodel with existing PW proposals to evaluate whether or not it encompasses existing PW proposals. For implementation level, we negate the addition of bitmaps to each table for each goal by proposing changes to PW description at two levels, to avoid unnecessary increase in size of PW. Therefore, for implementation level, we study the changes in size of PW. Finally, to study the effect of goal-based navigation of PW, we study navigation effectiveness. The study results are discussed below: A. Comprehensiveness The choice of the existing approaches used for comparison is based on a comprehensive search through major databases (like Springerlink, ACM DL, IEEE Xplore) by using keywords and phrases like process warehouse, process data warehouse, process-oriented data warehouse, data warehouse for business processes, process analysis, designing process warehouse, data warehouse for workflows etc. A total of 34 studies related to process warehousing were selected during the search. From these approaches, there were 10 studies that present a multidimensional data model for PW. While some of them were repeating, we selected nine distinct models that are considered in this study along with our proposed PW metamodel. These are: a) multidimensional data model (MD) [28], b) Goal-driven design of a DW (GD) [31], c) generic warehouse for business process data (GW) [8], d) Performance warehouse (PH) [32], e) real time process data store (PD) [33], f) business oriented development of DW structure (PS) [34], g) DW for audit trail analysis (DT) [35], h) Data warehouse for workflow logs (DL) [36], i) Warehousing workflow data (WD) [37] and our proposed j) PW metamodel (MM). Because these approaches do not present a metamodel of PW, they rather present PW design, therefore we have focused on the comparison of concepts they have used for designing PW.
Table 5 shows the results of integrating goals in process warehouse. From our study we found no effort has been made to integrate goals to PW. Although, some approaches use goaldriven approach to design warehouse, but they do not include goals as a part of the PW design produced by the approach. Therefore, to explain such cases a small explanation about involvement of goals by each approach is given in table 5. B. Size In order to study the size of PW and effectiveness of goaldriven access to PW data, we implemented a PW for a Stroke process. The aim is to study the effect of integrating goals with PW through a controlled experiment, where the developed prototype is used. As a first task to design the experiment a PW was implemented for the Stroke process that contains 6 fact tables, 19 measures and 10 dimension tables and populated with small sample of fabricated data, due to privacy reason. Secondly, another PW was implemented with the same specifications but this time a goal structure that contains 11 goals were integrated with the PW. This was done by adding goal definitions to meta-table structure (refer to figure 3) using the prototype and extensions to PW schema by adding bitmap attributes to dimension and facts tables. Below we discuss the results from the experiment. Graph 7 (a) shows a comparison of the original size of fact tables (in terms of number of attributes) with the extended size of fact tables, whereas graph 7 (b) shows a comparison of the original size of dimension tables with the extended size of dimension tables. From graph 7 (b), we observe a significant increase in size of dimension tables. For instance, number of attributes of dimension D10 increased to more 400%. It is because the dimension has fewer attributes and it is related to several goals and by our proposed approach one bitmap attribute corresponding to each relevant goal was added.
% of Size
Some of the approaches (PH, PD and PS) explicitly discuss that the presented model is not complete however from the comparison results given in table 3, we see that existing approaches do not contain at least one measure for each classification. Certainly, identification of a complete list of generic measures for processes is not a straight forward task due to the domain specific requirements. However, a larger set of measures is expected (due to classification) compared to the fewer measures modeled in existing approaches. From table 3 given in appendix it can be seen that a few approaches have modeled some measures like perception, expectation and share of enrolled students, which cannot be classified. Similarly, from table 4 it can be seen that existing approaches do not contain at least one dimension for each classification.
200 180 160 140 120 100 80 60 40 20 0
Origional Size
F1
F2
Extended Size
F3 F4 Fact Tables
F5
F6
Figure 7 (a): % of increase in size of fact tables
% of Size
From the comparison results given in table 2 it can be seen that existing approaches with an exception of WD [37] use all core concepts of warehouse metamodel. These concepts are also common in data warehouses and our proposed metamodel also covers these concepts. From the comparison it is affirmed that PW uses key concepts of a classical data warehouse.
Although the results shows significant increase in size of PW, in terms of number of attributes, however the fact that ‘CHAR’ - a data type in DB2, can take more than 50 bytes and it is a data type for most of the dimension attributes in our PW. Therefore, it can be argued that addition of several bitmaps will not increase the size of PW exponentially.
500 450 400 350 300 250 200 150 100 50 0
Origional Size
D1
D2
D3
Extended Size
D4 D5 D6 D7 Dimension Tables
D8
D9
D10
Figure 7 (b): % of increase in size of dimension tables
Graph 8 (a) shows a comparison of the total number of dimension tables in PW with the number of dimension tables related with goals, whereas graph 8 (b) shows a comparison of total number of measures in PW with the number of measures related with goals. From the graphs we observe that the number of dimensions and measures related to goals are significantly less than the total size of PW. As discussed before, our prototype provides goal-related data to its users. Therefore, it can be argued that a reduction in the information available to users of PW will take place due to goal-driven access to PW. Total Dimensions Related Dimensions
Table 2 to 5 given in the appendix shows the results of the comparison.
Goals Related
110 100 90 80 70 60 50 40 G1
G2
G3
G4
G5
G6 G7 Goals
G8
G9 G10 G11
Figure 8 (a): dimensions related to goals
Goal Related
110 100 90 80 70 60 50 40
7
0,19
0,4
8
0,19
0,364
Avg.
0,158
0,323
Classical Approach
0,3
G1
G2
G3
G4
G5
G6 G7 Goals
G8
Figure 8 (b): measures related to goals
C. Navigation Effectiveness In order to study the effect of goal-based navigation of PW we calculate the navigation effectiveness. Navigation effectiveness is the degree of accuracy with which a user finds relevant data from PW by using goals. It is measured in terms of F-measure - the harmonic mean of precision and recall [21, 38]; where precision is the fraction of retrieved data that are relevant, and recall is the fraction of relevant data retrieved. The steps taken for measuring F-measure are as follows: 1.
Let RR be the number of relevant measures retrieved, IR be the number of irrelevant measures retrieved, and RN be the number of relevant measures in the PW that have not been retrieved. The precision PR and recall RC is, PR = RR / [RR + IR] and RC = RR / [RR + RN]
2.
The navigation effectiveness measured in terms of Fmeasure can then be calculated by, F-Measure = 2* PR * RC / [PR + RC].
For the experiment we collected eight problems related to the process that can trigger the need for process analysis. Primarily, they are based on an earlier study [39] in which one of the authors participated together with practitioners from healthcare domain in Stockholm, Sweden. The problem definitions were used for process analysis using the PW with goals and without goals (classical approach). Based on the problem definitions (represented as questions in table 1), one or more goals were identified which were used for process analysis using the PW with goals. Table 1 shows a comparison of F-measure using a classical and goal based approach. The results provide evidence that the goal-driven access affects the navigation effectiveness positively. Also, by a comparison of precision results presented in figure 9 we see the increase in precision which affirms the retrieval of relevant data from PW while using goal-driven approach. TABLE I.
COMPARISON OF NAVIGATION EFFECIVENSS (F-MEASURES) Q
Classical Approach
Goal-Driven access
1
0,19
0,4
2
0,1
0,2
3
0,19
0,364
4
0,1
0,222
5
0,19
0,4
6
0,1
0,2
Goal-Based Approach
0,25
G9 G10 G11 Precision
Related Facts
Total Measures
0,2 0,15 0,1 0,05 0 Q1
Q2
Q3
Q4 Q5 Questions
Q6
Q7
Q8
Figure 9: Comparison of precision
VI.
CONCLUSION
Business processes are vital for efficiently managing the activities of enterprises. Therefore, it is important to constantly analyze and improve them. Lately, different approaches for the process analysis have been initiated, such as the use of processoriented data warehouse. In this study, we have argued that the way users utilization process warehouse can be enhanced by employing business orientation in the design and utilization of process warehouse. To achieve that, we have proposed integration of business goals in a method for warehouse-based process analysis. The integration spans across two levels, conceptual and implementation: the first defines the concepts employed in the integration, and implementation describes the extensions to dimension and fact tables of a PW to represent the records related to goals. Below, we summarize how our contribution addresses the current limitations in a data-driven process analysis outlined in the introduction section. - Magnitude of the data needed for an analysis is commonly substantially smaller compared to the size of the warehouse. Therefore the problem arises, how to identify and retrieve the necessary and sufficient data. Our solution to this problem is based on the traceability between goals and content of process warehouse. On choosing and refining a goal for analysis, based on the relation between goals and the warehouse, only the relevant information is retrieved. The experimental results presented in Section 5 demonstrate this ability. - Business analysts are accustomed to business concepts like goals - therefore if the goal related information is not available during analysis it requires additional effort to use the information. Our approach simplifies the task of navigating through the warehouse by allowing the navigation using goals. - Typically, the data retrieved from warehouses are numerical, which may not be interpreted correctly due to the absence of contextual information. In our approach, goals are integrated with the process warehouse and their satisfaction conditions. Thus, during a process analysis
the contextual information is available to users to ensure correct interpretation of the retrieved data. Feasibility of the proposed approach has been studied in three different ways: -
-
-
At conceptual level we have defined metamodel for a process warehouse, by considering the required perspectives of process design [22]. Based on the results of the comparison discussed in Section 5.1 it can be concluded that the proposed metamodel encompasses the existing PW proposals that we have found. Overall effectiveness of the approach is studied in terms of the navigation effectiveness. The obtained results (Section 5 - C), provide evidence that the goal-driven access affect the navigation efficiency positively. In order to integrate multiple goal structures with a single PW at implementation level, we have refined Bebel’s approach [30]. Based on the results of the preliminary experiments, it can be argued that addition of several bitmaps will not increase the size of PW exponentially.
Regarding future work, we are working to empirically investigate the usefulness of the proposed approach over traditional warehouse utilization, using a larger experiment. For that, we are working to refine the prototype and to extend the implemented process warehouse with and without integrated goals for a large healthcare process. Following that, we plan to conduct an empirical study to determine user satisfaction with process analyses using the proposed method and traditional way of analyzing processes. REFERENCES [1] [2] [3]
[4]
[5] [6]
[7] [8]
[9]
D. Grigori, F. Casati, M. Castellanos, U. Dayal, M. Sayal, and M.C. Shan, Business Process Intelligence. Computers in Industry vol 53, issue 3, pp 321-343, 2004. M. Sayal, F. Casati, U. Dayal,and M.C. Shan: Business Process Cockpit. In Proceedings of the 28th International Conference on Very Large Databases (VLDB’02), Hong Kong, China. V. Stefanov. Conceptual Models and Model-Based Business Metadata to Bridge the Gap between Data Warehouses and Organizations, PhD Thesis, Vienna University of Technology, November 2007. http://wit.tuwien.ac.at/people/stefanov/documents/iwmec2006_stefanov _list.pdf last access on 19 December 2010. V. Stefanov and B. List: Explaining Data Warehouse Data to Business Users - A Model-Based Approach to Business Metadata, In Proceedings of the 15th European Conference on Information Systems (ECIS 2007), June 2007, 2062-2073, Switzerland N.L. Sarda. Structuring Business Metadata in Data Warehouse Systems for Effective Business Support. http://arxiv.org/abs/cs/0110020 last access on 19 Dec 2010. V. Stefanov and Beate List: Business Metadata for the Data Warehouse Weaving Enterprise Goals and Multidimensional Models. In Proceedings of the International Workshop on Models for Enterprise Computing (IWMEC) at the EDOC’06, Hong Kong, China. S. Chaudhuri, and U. Dayal. An Overview of Data Warehousing and OLAP Technology. ACM-SIGMOD, Vol. 26, No. 1, 65-74 (1997). F. Casati, M. Castellanos, U. Dayal, and N. Salazar. A Generic Solution for Warehousing Business Process Data. Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB’07), pp. 1128-1137, Vienna, Austria. B. List, J. Schiefer, M. Tjoa, and G. Quirchmayr. The Process Warehouse: A Data Warehouse Approach for Business Process Management. Selected Aspects of Knowledge Discovery for Business Information Systems 2000, Boston, USA.
[10] F. Casati, M. Castellanos, N. Salazar, and U. Dayal. Abstract Process Data Warehousing. Proceedings of 23rd IEEE International Conference on Data Engineering (ICDE’07), pp. 1387-1389, 2007, Istanbul, Turkey. [11] L. An, J. Yan, L. Tong. Business Process Intelligence System: Architecture and Data Models. Proceedings of the 6th Wuhan International Conference on e-business, 2007, China. [12] S. Mansmann, T. Neumuth, M. Scholl. Multidimensional data modeling for business process analysis. In Proceedings of 26th International Conference on the Entity Relationship Approach (ER’07), Springer LNCS, Vol. 4801, pp. 23-38 (2007). [13] S. Brandt, M Schluter, M. Jarke. Process data warehouse models for cooperative engineering processes. Proceedings of the 9th IFAC Symposium on Automated Systems Based on Human Skill And Knowledge, 2006, Nancy, France. [14] K. Shahzad, P. Johannesson. An Evaluation of Process Warehousing Approaches for Business Process Analysis. In Proceedings of the 5th ACM-EOMAS in conjunction with CAiSE'09, ACM Press, & CEURWS Proceedings, Volume 458, pp 1-14, Amsterdam, Netherland. [15] K. Shahzad. Extending REA Ontology for Evaluation of Process Warehousing Approaches. Proceedings of the 2nd World Summit of the Knowledge Society (WSKS'09), Springer LNCS/LNAI Volume 5736, pp. 196-206, Crete, Greece. [16] S. Brandt, M Schluter, M. Jarke. A Process Data Warehouse for Tracing and Reuse of Engineering Design Processes. Proceedings of the Second International Conference on Innovations in Information Technology, Dubai, 2005. [17] G. Xie, Y. Yang, S. Liu, Z. Qiu, Y. Pan and X. Zhou. EIAW: Towards a Business-friendly Data Warehouse Using Semantic Web Technologies. Proceeding 6th international 2nd Asian conference on Asian semantic web conference (ASWC'07), Busan, Korea. [18] A. Pourshahid, D. Amyot, L. Peyton, S. Ghanavati, P. Chen, M. Weiss, and J.A. Forster. Business process management with the user requirements notation. Electronic Commerce Research, Vol. 9, No. 4, 269-316 (2009). [19] A. Pourshahid, D. Amyot, L. Peyton, S. Ghanavati, P. Chen, M. Weiss, and A. Forster. Toward an integrated User Requirements Notation framework and tool for Business Process Management. In Proceedings of the International MCeTech Conference on e-Technologies, 2008, Canada. [20] K. Shahzad, and J. Zdravkovic. A Decision-model Based Approach to Business Process Improvement. Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics (SMC'10), pp. 810-818, Istanbul,Turkey. [21] A. H. Huang. Effects of multimedia on document browsing and navigation: an exploratory empirical investigation. Information and Management, vol 41 (2), pp 189-198, 2003. [22] B. Curtis, M.I. Kellner, and J. Over. Process Modeling. Communications of ACM, Vol. 35, No. 9, pp. 75-90 (1992). [23] K. Shahzad, M. Elias, and P. Johannesson. Towards Cross Language Process Model Reuse – A Language Independent Representation of Process Models. Proceedings of the 2nd IFIP WG8.1 Working Conference on the Practice of Enterprise Modeling (PoEM'09), Springer LNBIP Vol. 39, pp.176-190, Stockholm, Sweden. [24] B. Korherr. Business Process Modelling - Languages, Goals and Variabilities. PhD Thesis at Vienna University of Technology, 2008 http://www.wit.at/people/korherr/phd-thesis.pdf last access on 19 December 2010. [25] B. List, and B. Korherr. An Evaluation of Conceptual Business Process Modelling Languages. Proceedings of the 21st ACM Symposium on Applied Computing (SAC'06), Dijon, France. [26] J. Silva, A. Oliveira, R. Fidalgo, A. Salgodo, and V. Time. Modelling and querying geographical data warehouses. Information Systems, Vol. 35, No 5. Pp. 592 – 614, 2010. [27] N. Prat, J. Akoka, I.C. Wattiau. A UML-based data warehouse design method. Decision Support Systems, Vol. 42, No. 3, pp. 1449 - 1473. [28] S. Mansmann, T. Neumuth, M. Scholl. Multidimensional data modeling for business process analysis. In Proceedings of 26th International Conference on the Entity Relationship Approach (ER’07), Springer LNCS, Vol. 4801, pp. 23-38 (2007). [29] K. Shahzad, and J. Zdravkovic. Process Warehouse in Practice: A GoalDriven Method for Business Process Improvement. Software Process
[30]
[31]
[32]
[33]
[34]
Improvement and Practice Journal, Special Issue on: Business Process Life-Cycle: Design, Deployment, Operation & Evaluation. B. Bebel, J. Eder, C. Koncilia, T. Morzy, and R. Wrembel. Creation and management of versions in multiversion data warehouse. In Proceedings of the ACM Symposium on Applied Computing (SAC’04), pp. 717723, Nicosia, Cyprus. L. Niedrite, and D. Solodovnikova. Goal-Driven Design of a data warehouse based business process analysis system. In: Proceedings of the 6th WSEAS International Conference on Artificial Intelligence, Knowledge Engineering and Database, pp. 243-249, (2007). P. Kueng, T. Wettstein, B. List. A holistic process performance analysis through a performance data warehouse. In Proceedings of the 7th Americas conference on information systems (AMCIS’01), pp. 349-356 (2001). J. Schiefer, B. List, and R. Bruckner. Process data store: A real-time data store for monitoring business processes. In: Proceedings of Database and Expert Systems Applications (DEXA’03), Springer LNCS, Vol. 2736, pp. 760-770 (2003). M. Boehnlein, and A.U. Achim. Developing Data Warehouse Structures from Business Process Models. Otto-Friedrich-Universität Bamberg. 2000 Technical Report.
[35] [36]
[37]
[38]
[39]
http://www.ceushb.de/forschung/downloads/bb57.pdf last accessed on 19 Dec 2010. K. Pau, Y. Si, M. Dumas. Data warehouse model for audit trial analysis in workflows. In: Proceedings of IEEE International Conference on eBusiness Engineering (ICEBE’07). J. Eder, G. Olivotto, W. Gruber. A data warehouse for workflow logs. In Proceedings of the International Conference on Engineering and Deployment of Cooperative Information Systems, Springer LNCS, Vol. 2480, pp. 1-15, 2002, China. A. Bonifati, F. Casati, U. Dayal, M.C. Shan. Warehousing Workflow Data: Challenges and Opportunities. In Proceedings of the 27th International Conference on Very Large Scale Database (VLDB’01) 2001. M. Kandefer, S.C Shapiro. An F-Measure for Context-based Information Retrieval. In Proceedings of the Ninth International Symposium on Logical Formalizations of Commonsense Reasoning, Toronto, CA, pp. 79-84 (2009). M. Hägglund, M. Henkel. J. Zdravkovic, P. Johannesson, I. Rising, I. Krakau, S. Koch. A New Approach for Goal-oriented Analysis of Healthcare Processes. Accepted for publication in Stud Health Technol Inform. Vol. 160, No. 2, pp. 1251-1255, 2010.
APPENDIX TABLE II.
BASIC ELEMENTS
Basic Elements
MD
GD
GW
PH*
PD*
PS*
DT
DL
WD
MM†
Measure
++
++
++
++
++
++
++
++
++
++
Fact
++
++
++
++
++
++
++
++
++
++
Dimension
++
++
++
++
++
++
++
++
++
++
Aggregation Level
++
++
++
++
++
++
++
++
--
++
‘++’ if the element of our metamodel is used in the design. ‘--’ if the element is not used in the design. † is our proposed metamodel TABLE III. Facts Classification
CLASSIFICATION OF MEASURES
MD
GD
GW
PH*
PD*
PS*
DT
DL
WD
MM
Functional
+
+
-
-
+
+
-
+
+
+
Behavioral
-
-
-
+
-
-
+
+
+
+
Organizational
-
-
-
-
-
-
-
-
-
+
Informational
-
+
+
-
-
-
+
-
-
+
Other Measures
-
-
-
+
-
+
-
-
-
+ if there is at least one measure that can be classified in the given class. - if there is no measure that can be classified in the given class. * is added to the name of approach to distinguish the ones which represent an exceprt PW model TABLE IV.
CLASSIFICATION OF DIMENSIONS
Dimensions Classification MD GD GW PH* PD* PS* DT DL WD MM Activity
+
+
+
+
+
+
+
+
-
+
Event
+
-
-
-
-
-
+
-
-
+
Time
+
+
+
+
+
+
+
+
+
+
Participant
+
+
+
+
+
+
+
+
-
+
Passive
+
+
-
+
-
+
-
-
-
+
Active
+
+
+
+
+
-
+
+
-
+
+
+
+
-
-
+
-
-
+
+
Ordinary
+
-
-
-
-
-
-
-
-
+
Informational
+
+
+
-
-
+
+
-
-
+
Process
-
-
+
+
+
-
+
+
+
+
Location
-
-
-
-
-
+
+
+
-
+
+
-
+
+
+
-
-
+
-
Resource
Other Dimensions
+ if there is at least one measure that can be classified in the given class. - if there is no measure that can be classified in the given class. * is added to the name of approach to distinguish the ones which represent an exceprt PW model TABLE V. Approach
GOALS Goals Analysis
MD
The concept of goal is completely ignored during the design of PW. Also, the concept of goal is not included as a part of PW design.
GD
Goals driven approach is used for designing process warehouse. Goals and subgoals are defined by GQ(I)M [28] and analyzed to define measurement goals. Measurement goals are latterly used for notational model from which dimensions and facts are derived. However, goals are not included in the final process warehouse design.
GW
The concept of goal is completely ignored during the design of PW. Also, the concept of goal is not included as a part of PW design.
PH
According to this approach, goals are included as a part of auxiliary data, in the form of a goal tree. Individual goals are specified by key performance indicators and for each indicator target and actual performance [23].
PD
The concept of goal is completely ignored during the design of PW. Also, the concept of goal is not included as a part of PW design.
PS
This approach consists of four steps. In the first step, goals are identified followed by deep understanding of each goal in the form of subgoals and their relationships. Also, services and subservices are identified that fulfill these goals. In the remaining steps, business processes are analyzed conceptual object schema is derived and finally DW structures are defined.
DT
The concept of goal is completely ignored during the design of PW. Also, the concept of goal is not included as a part of PW design.
DL
The concept of goal is completely ignored during the design of PW. Also, the concept of goal is not included as a part of PW design.
WD
The concept of goal is completely ignored during the design of PW. Also, the concept of goal is not included as a part of PW design.
MM
The concept of goal is completely ignored during the design of PW. Also, the concept of goal is not included as a part of PW design.