Information Systems Vol. 24, No. 2, pp. 159-184, 1999 Copyright 1999 Elsevier Sciences Ltd. All rights reserved Printed in Great Britain 0306-4379/99 $20.00 + 0.00
A Meta Modeling Approach To Workflow Management Systems Supporting Exception Handling† DICKSON K.W. CHIU1, QING LI2 and KAMALAKAR KARLAPALEM1 1
Department of Computer Science, University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong 2
Department of Computer Science, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong (Received 15 February 1998; in final revised form 10 March 1999)
Abstract – Workflow Management Systems (WFMSs) facilitate the definition of the structure and decomposition of business processes and assist in coordinating, scheduling, executing and monitoring such activities. Most current WFMSs are built on traditional relational database systems and/or object-oriented database systems for storing the definition and run-time data of the workflows. However, a WFMS requires advanced modeling functionalities to support adaptive features, such as on-line exception handling. This article describes our advanced meta-modeling approach using various enabling technologies (such as object orientation, roles, rules, active capabilities) supported by an integrated environment, the ADOME, as a solid basis for a flexible WFMS involving dynamic match making, migrating workflows and exception handling. Copyright © 1999 Elsevier Science Ltd
Key words: Meta-modeling, Object-Orientation, Workflow Management, Match-Making, Exception Handling, Workflow Evolution
1. INTRODUCTION
Many application domains, including office automation, decision support systems and flexible manufacturing systems, involve complex interactions among humans and technical sub-systems to facilitate day-to-day business processes. Patterns behind these business processes are neither routine nor totally ad hoc but somewhere in between. This calls for new types of information support that can streamline the support of such application domains [19]. Workflow Management Systems (WFMSs) form one part of such support. These are systems that can assist in specification, decomposition, coordination, scheduling, execution, and monitoring of business activities. Besides streamlining and improving routine business processes, WFMSs can help in documenting and reflecting upon business processes. In this article, we describe a novel WFMS where effective and flexible coordination of human activities (as opposed to computer-based ones in existing WFMSs for routine office operations) is possible.
In this article, we model business processes as workflows (activities) that are executed by a set of problem solving agents. We use the terms activity and workflow interchangeably. A Problem Solving Agent (PSA) is a hardware/software system, or a human being, with an ability to execute a finite set of tasks in an application domain. An activity is typically decomposed recursively into sub-activities and eventually down to the unit level, called tasks. A task is usually handled by a single PSA. The execution of activities is driven by events. Upon an exception, appropriate events are raised so that the exception manager of the WFMS can take control of its resolution. The WFMS schedules and selects the PSAs for executing the tasks. We match tasks with PSAs by using a capability-based token/role approach [17], where the main criterion is that the set of capability tokens of a chosen PSA should match the requirement of the task.
A token embodies certain capabilities of a PSA to execute certain functions / procedures / tasks, e.g., programming, database-administration, Japanese-speaking, while a role represents a set of responsibilities, which usually correspond to a job-function in an organization, e.g., project-leader, project-member, programmer, analyst, etc. Each PSA can play a set of PSA-roles and hold a set of extra capabilities. For example, John is a Japanese analyst-programmer who is leading a
† Recommended by Kalle Lyytinen and Richard Welke
small project; thus he may play all the above-mentioned roles (project-leader, project-member, programmer, analyst, etc.), and in addition holds an extra capability (token) of Japanese-speaking.
Since it is not possible to specify all possible outcomes and alternatives in a workflow (especially with various special cases and unanticipated possibilities), exceptions can occur frequently during the execution of a business process. A WFMS must therefore provide exception mechanisms that help reallocate resources (data / object update) or amend current workflows by adding alternatives (workflow evolution). Further, frequent occurrences of similar exceptions have to be incorporated into workflows as expected exceptions. Such workflow evolution can help avoid unnecessary exceptions by eliminating error-prone activities, adding alternatives, or enhancing the operation environment. In order to support such WFMS features, we employ the following three major measures:
1. Meta-Modeling - Many of the earlier WFMSs [24] were built on top of traditional database technologies (e.g., relational databases). They fall short in flexibility of modeling, ease of implementation, and/or in handling dynamic run-time requirements. Advanced features of object-oriented database systems (objects, rules, roles, active capability and flexibility) are needed to facilitate the development of a versatile WFMS [10], especially with meta-modeling features. In ADOME-WFMS, we advocate a three-level meta-modeling approach wherein workflows, capabilities, exceptions, and handlers are defined at a meta-level as depicted in Figure 1:
• Workflow templates are defined at the meta-level so that actual workflows can be instantiated for specific applications. For example, a generic requisition workflow template can be declared at the meta-level, so that specific requisition workflows (e.g., for requesting computers, or for requesting supplies) having customized rules and sub-activities can be instantiated (cf. Section 4.3.1).
• Capability tokens are defined at the meta-level so that they can be combined to form PSA-roles, which capture the requirements of task classes (cf. Section 4.1.2).
• Exceptions (which are events) and handlers (which correspond to conditions and actions) are defined at the meta-level. Exceptions are associated with handlers in the form of meta-Event-Condition-Action rules (meta-ECA-rules). Specific ECA-rules can then be bound to workflows for versatile exception handling (cf. Section 5.2).
2. Integrated Environment - There are several frameworks for developing advanced features of a WFMS: loosely-coupled systems that extend and patch up existing databases inside the WFMS, and integrated environments that provide a separate layer of advanced features for the WFMS. The ADOME (ADvanced Object Modeling Environment) discussed in this article is an integrated environment that enhances the knowledge-level modeling capabilities of OODBMS models [22]. It provides a suitable environment for developing a versatile WFMS.
3. Effective Management of Human Resources – A common cause of failure to execute a task is that either no agent is available or the wrong agent is assigned. Human resources are the major cost in organizations. Thus, effective management of human resources is a key to the success of a WFMS and that of the organization. The ADOME-WFMS discussed in this article employs an event-driven approach to coordinate agents and a capability-based approach to match tasks and agents.
1.1. Related Work
There has been advancement in using meta-modeling approaches in Computer Aided Software Engineering (CASE), such as [23]. As CASE is a more open-ended area than WFMS, our standpoint is that it is more effective and beneficial to adopt the meta-modeling techniques for WFMS development.
Meanwhile, notable advanced WFMSs have been developed in the past years [1, 2, 3, 12, 16, 17, 18, 24, 25, 27]. None of them, however, has employed a meta-modeling approach, let alone used one to address exception handling in a WFMS.
Fig. 1: Three-Level Meta-Modeling for ADOME-WFMS
[Figure not reproduced. Levels: MetaClass Level, Class Level, Instance Level. Elements: Exceptions, Handlers, Capability Tokens, Meta-Workflows, Meta-ECA-Rules, Workflow Classes, PSA Roles, ECA-Rules, Workflow Instances, PSA Objects. Link labels: Requirement, Capability Matching, Bind. Legend: One-to-Many, Many-to-Many.]
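The three levels of Fig. 1 can be illustrated with a minimal Python sketch. It is not the ADOME-WFMS implementation: the class names `MetaWorkflow`, `WorkflowClass` and `WorkflowInstance`, the method names, and the task names are all hypothetical, chosen only to mirror the requisition example given in the Introduction.

```python
from dataclasses import dataclass, field


@dataclass
class MetaWorkflow:
    """Meta-level template, e.g. a generic requisition workflow (hypothetical)."""
    name: str
    default_tasks: list

    def instantiate_class(self, name, extra_tasks=()):
        # Class level: a specific workflow with customized sub-activities.
        return WorkflowClass(name, self, self.default_tasks + list(extra_tasks))


@dataclass
class WorkflowClass:
    name: str
    template: MetaWorkflow
    tasks: list

    def start(self, requester):
        # Instance level: one running workflow for one request.
        return WorkflowInstance(self, requester)


@dataclass
class WorkflowInstance:
    workflow_class: WorkflowClass
    requester: str
    done: list = field(default_factory=list)


# Task names below are illustrative assumptions, not taken from the paper.
requisition = MetaWorkflow("requisition", ["request", "approve", "purchase"])
computer_req = requisition.instantiate_class("computer-requisition", ["install"])
run = computer_req.start("John")
```

The template is left untouched when a workflow class customizes its task list, so several specific requisition workflows can share one meta-level definition.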
Among these WFMSs, TriGSflow [16] is perhaps the closest system to ours in that it adopts an OO design and utilizes rules and roles. However, it uses roles only for associating PSAs with tasks, not for modeling the capabilities needed in matching agents with tasks. Moreover, it has little support for handling exceptions. Kumar et al. [18] describe a framework that uses roles, an event-based execution model and exception handling rules in a WFMS. However, they neither provide an organization model nor address a variety of exception conditions. Further, they do not support activity decomposition or capability matching to facilitate workflow definition and execution. WIDE [3] uses object and rule modeling techniques and suggests some simple basic measures for handling exceptions. However, systematic analysis and classification of exceptions and their handling approaches are not addressed adequately. OASIS [24] models an organization as a network of MOAP (micro-organization) nodes and provides a model for organizing and structuring organizational knowledge. Thus, its structure is different from the other systems, which all use a composition of tasks. OASIS uses class and object concepts to assign tasks in individual MOAPs. Flowmark [1] uses Sagas and flexible transactions for modeling workflow exception handling. To avoid system failures, Flowmark uses a replicated database and clustered workflow servers. As an extension of Flowmark, Exotica/FMDC [2] handles disconnected agents. These additional techniques are relevant to handling exceptions, since managing manual tasks off the computer system (e.g., meeting, typing, etc.) is not much different from handling disconnected agents. Since Flowmark only finds all possible candidates for task execution and then lets them volunteer for the execution instead of using capability matching, effectiveness and fairness may be impaired. WAMO [12] uses Sagas and flexible transactions for supporting workflow exception handling.
It also offers a preliminary classification of exceptions, which we have extended in our taxonomy of exception categories (cf. Section 3.1). ConTract [26] focuses on activity control and execution but not on organization modeling and activity specification. The ConTract net protocol is for communication among nodes in a distributed problem-solving environment, where nodes engage each other in discussions that resemble contract negotiation for PSA / task assignment. Similarly, other works like [13] focus on transactional aspects and thus on lower-level issues. CapBasED-AMS [17] is implemented on a relational system with active capability. The modeling of workflows is not straightforward, since almost all designs (which are in essence object-oriented) need to be converted into a traditional relational model. Moreover, the system can only weakly support actions requiring advanced OODBMS features, such as dynamic schema modification (a basic requirement for workflow evolution). In summary, other workflow systems either do not address exception problems comprehensively or concentrate only on extended transaction models. Furthermore, few systems have advocated (let alone supported) an extensive meta-modeling approach (covering, e.g., PSAs, match-making, and exception handling).
1.2. Organization and Contribution of this Article The objective and contribution of this article include: • a meta-modeling approach for WFMS, • a taxonomy of different types of exceptions and their handling approaches in WFMS, • an augmented solution for exception handling based on workflow evolution, • an effective PSA modeling and management based on capability token approach, • a demonstration of the feasibility of ADOME-WFMS for effective support of exception handling. The rest of this article is organized as follows. In Section 2, we present several challenges for developing a WFMS. Section 3 presents a taxonomy of different types of exceptions and their handling approaches, with a special emphasis on exception handling and workflow evolution. Section 4 details the meta-modeling approach employed by the ADOME-WFMS. Section 5 describes the framework for supporting exception handling in ADOME-WFMS. We conclude our article with a plan for on-going research in Section 6. 2. ISSUES AND FUNCTIONS OF A WFMS 2.1. Requirements and Functions of a WFMS From a conceptual (meta-) modeling perspective, a WFMS should address and/or support the following essential aspects [4, 17, 22]: • Organization and Resource model – describes the constituency of an organization. This model includes permanent organization units and their compositions (such as departments and divisions), temporary units (such as project teams or committees), employees with their ‘roles’ (such as authority, responsibilities) and capabilities, physical facilities (such as hardware and software), and problem solving agents (PSAs) which can be human or systems. • Activity and Execution model – captures the dynamic aspects of the organization, i.e., how various states of the organization proceed. One way to model activities is to decompose a complex activity into sub-activities and finally down to atomic tasks. 
In addition, an execution model is required for specifying activity ordering, coordination of actual task execution and data exchange among PSAs.
• Match Making model – captures the policies for selecting PSAs for the execution of tasks, and
• Exception model – captures the dynamic handling of exceptions including approaches for workflow evolution.
From a functional point of view, a WFMS should include the following features [4,17]:
• Modeling problem solving agents (PSA), especially their capabilities, states and knowledge
• Activity specification and management
• Dynamic coordination and event-based interaction for task execution
• Collaboration by capability-based matching of PSA and tasks, and
• Effective and efficient handling of exceptions
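Capability-based matching, as introduced with the token/role approach in the Introduction, amounts to a subset test: a PSA qualifies for a task when the tokens it holds (through its roles plus any extra tokens) cover the task requirement. The sketch below is only an illustration under that reading. The role-to-token table, the dictionary layout of a PSA, and the inclusion of Tom as a second programmer here are assumptions, not the paper's data model.

```python
# Hypothetical mapping from roles to the capability tokens they bundle.
ROLES = {
    "programmer": {"programming"},
    "analyst": {"analysis"},
    "project-leader": {"project-management"},
}


def capabilities(psa):
    """All tokens a PSA holds: tokens of every role played plus extra tokens."""
    tokens = set(psa["extra_tokens"])
    for role in psa["roles"]:
        tokens |= ROLES[role]
    return tokens


def candidates(task_requirement, psas):
    """Names of PSAs whose tokens cover the task requirement (subset test)."""
    return [p["name"] for p in psas if task_requirement <= capabilities(p)]


john = {
    "name": "John",
    "roles": ["project-leader", "programmer", "analyst"],
    "extra_tokens": ["Japanese-speaking"],
}
tom = {"name": "Tom", "roles": ["programmer"], "extra_tokens": []}

print(candidates({"programming", "Japanese-speaking"}, [john, tom]))
print(candidates({"programming"}, [john, tom]))
```

A real match-making policy would also rank the candidates (for example by cost or availability); the subset test only decides who is eligible.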
Consequently, the following requirements can be set for a versatile WFMS [21, 22]:
• Heterogeneous object types – for various data objects in the WFMS, procedural operations and declarative rules.
• Flexible abstraction – for modeling objects at various levels of abstraction, and for abstract objects such as roles and capabilities.
• Flexible and dynamic composition – for permanent and temporary organization units, and workflows.
• Flexible and dynamic binding – for binding employees to positions, as PSA, and for binding rules to different entities and events.
• Data relativism – for modeling multi-faceted PSAs, and semantic-relativism (multiple perspectives) among different interacting PSAs.
2.2. Enabling Technologies for WFMS
To facilitate the development of a WFMS that can accommodate these requirements, a number of enabling (e.g., meta-modeling) technologies are needed. Table 1 illustrates several components / enabling technologies that are useful for various aspects of a WFMS:
Object-oriented database (OODB) – The use of an OODB for modeling and processing complex objects and their relationships is instrumental in building an advanced WFMS. For example, the composition hierarchy of activities, sub-activities and tasks, and the isa hierarchy of PSAs, not only capture more semantics than traditional relational models, but also help in the reuse of their definitions. They also enable easier maintenance, understanding and evolution than a large number of inter-related tables as in [14]. Moreover, the OO paradigm enables flexible passing of different forms of data among agents and tasks, while the OODB provides convenient general-purpose persistent storage for the data that need to be recorded.
Roles – Roles are essentially virtual classes, which are similar to ordinary object classes, except that they do not create or delete objects, but only include or exclude role players from existing object classes. Roles enable PSA objects to be dynamically associated with one or more functions, responsibilities and authorities. They also capture attributes, states, methods and knowledge specific to individual roles. With roles extended by multiple inheritance, capabilities and roles for PSAs can be much better represented in a hierarchy, as detailed in Section 4.
Rules – The capability of Event-Condition-Action (ECA) rules enables event-based automatic execution and exception handling. Rules can also be used for processing declarative knowledge such as organization policies, agent selection criteria, exception handling criteria, etc.
Flexibility of objects and schema – This facilitates exception handling since real-time modifications to objects, roles, rules and workflow are required.
Table 1: Enabling Technologies Relevant to WFMS Modeling
Models (columns): Organization & Resource; Activity & Execution; Match Making; Exception
• Object Orientation – Organization & Resource: unit composition, resources hierarchy; Activity & Execution: activity composition; Match Making: methods provided by PSA and tasks; Exception: rules and events as objects.
• Roles – Organization & Resource: positions, PSA role/capabilities; Match Making: role/capabilities.
• Rules – Organization & Resource: organization rules; Activity & Execution: event-driven execution; Match Making: declarative policies, cost; Exception: ECA exception handlers.
• Flexibility of Objects & Schema – real-time update and dynamic bindings of objects, rules and roles; schema and workflow evolution.
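To make the role of ECA rules more concrete, the following sketch shows one possible shape of an Event-Condition-Action rule and of its dynamic binding. It is a simplified illustration, not the ADOME rule facility: `ECARule`, `RuleEngine`, the event name `deadline_missed` and the context keys are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ECARule:
    event: str                          # E: name of the triggering event
    condition: Callable[[dict], bool]   # C: predicate over the event context
    action: Callable[[dict], str]       # A: resolution to carry out


class RuleEngine:
    def __init__(self):
        self.rules = []

    def bind(self, rule):
        # Rules can be bound at run time (dynamic binding).
        self.rules.append(rule)

    def raise_event(self, event, context):
        # Fire every bound rule whose event matches and whose condition holds.
        return [r.action(context) for r in self.rules
                if r.event == event and r.condition(context)]


engine = RuleEngine()
engine.bind(ECARule(
    event="deadline_missed",                      # hypothetical event name
    condition=lambda c: c["budget_left"] > 0,
    action=lambda c: f"hire extra PSA for {c['task']}",
))

print(engine.raise_event("deadline_missed", {"task": "coding", "budget_left": 100}))
```

Keeping the condition separate from the action is what lets the same handler be reused under different policies, for instance a different budget threshold per workflow.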
3. OVERVIEW OF EXCEPTION HANDLING IN WFMS Exceptions and failures are basic features that need to be addressed by effective WFMSs. However, currently there is no established framework for exception handling in WFMSs. Motivated by [12], we classify exceptions along the two dimensions, namely: 1. Exception Source: external or workflow 2. Exception Type: expected or unexpected. We also classify exception handlers into the three dimensions, namely: 1. Exception Handling Mode: trivial, automatic, cooperative, manual, or failure 2. Re-execution criteria: optional, critical, repeatable or replaceable 3. Exception Handler Type: procedural or declarative.
3.1. Taxonomy of Exceptions
Exception Source – This dimension classifies the source of an exception: External exceptions arise from external components participating in the WFMS, such as the operating system, DBMS, software applications, machines and equipment, or from operations within external organizations. The internal mechanisms of these “black box” components are not known to the WFMS. Workflow (internal) exceptions are those relating to workflow management issues, such as being unable to find a PSA or to get some resources required for executing a task, a missed deadline or an exceeded budget, special-case outcomes from a finished task, etc. We may further divide workflow exceptions into inter-workflow exceptions, which relate to two or more activities (e.g., due to competition for resources), and intra-workflow exceptions, which occur within a workflow instance. In this article we shall focus on intra-workflow exceptions.
Exception Type – This dimension classifies the knowledge the WFMS has about a particular occurrence of an exception: Expected exceptions are those anticipated and already planned for with explicit exception handlers, which may have been supplied by the WFMS, the workflow administrator, or the user. Examples are situations listed in emergency procedures, trouble-shooting guidelines, etc. Unexpected exceptions, on the contrary, require human intervention since they are unanticipated. Administrative rules can specify who should be informed to take action, though the solution itself is not known to the WFMS. The WFMS can then provide suggestions to the human expert for planning and selecting alternatives to overcome unexpected exceptions; otherwise, they will cause failure of the activity. The human expert can also choose to save the exception and its resolution into a database (as a case example) by amending existing activity definitions, so that this exception becomes expected and can be handled automatically if it occurs again.
3.2. Taxonomy of Exception Handlers
Exception Handling Mode – This dimension represents the difficulty of handling an exception with respect to the knowledge and the effort required by the WFMS:
• Trivial – the exception (usually external and expected) is so basic that it is handled transparently by the relevant system components, and the WFMS is not even aware of its occurrence.
• Automatic (system driven) – the WFMS attempts to handle the exception (with an optional message to the target agents) automatically by finding an explicit exception handler or by applying a pre-specified resolution. If the exception is not resolved, the targeted agents will be requested to take action.
• Cooperative (system assisted) – a human expert is informed of the exception and is requested to provide an instruction or an approval so that the WFMS can carry out subsequent exception handling tasks.
• Manual (user driven) – a human expert is requested to handle the exception, and the progress of this activity is put on hold until the expert fixes the problem.
• Failure – the exception cannot be handled by the WFMS or by the available human experts, thus raising an exception in the activity that initiated this task.
By default, automatic handling is attempted. However, if an exception handler cannot be determined, the WFMS will request human intervention so that cooperative or manual handling can be carried out. If that fails too, the activity fails completely. To ensure a proper degree of human intervention, the WFMS should allow users to override the intrinsic exception handling mode for important tasks and activities. For example, even if an exception is expected (e.g., hiring more workers when construction work fails to meet the deadline), when manual handling is specified for a critical task (e.g., in a multi-million dollar project), the WFMS should still inform the top management and suggest manual intervention.
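The default escalation just described (automatic first, then human intervention, then failure, with an optional user override) can be sketched as follows. This is an illustrative reading of the text rather than the ADOME-WFMS exception manager: the function `handle`, its parameters, and the convention that a handler returns `None` when it cannot resolve the exception are assumptions. The trivial mode is omitted because the WFMS never sees such exceptions.

```python
from enum import Enum


class Mode(Enum):
    AUTOMATIC = "automatic"
    COOPERATIVE = "cooperative"
    MANUAL = "manual"
    FAILURE = "failure"


def handle(exception, handlers, ask_human, override=None):
    """Return (mode used, resolution); resolution is None on failure."""
    # Automatic handling is the default unless the user overrides the mode.
    if override in (None, Mode.AUTOMATIC):
        handler = handlers.get(exception)
        if handler is not None:
            resolution = handler()
            if resolution is not None:
                return Mode.AUTOMATIC, resolution
    # Otherwise request human intervention (cooperative unless manual is forced).
    human_mode = override if override in (Mode.COOPERATIVE, Mode.MANUAL) else Mode.COOPERATIVE
    resolution = ask_human(exception)
    if resolution is not None:
        return human_mode, resolution
    # Nobody could resolve it: the activity fails.
    return Mode.FAILURE, None


handlers = {"deadline_missed": lambda: "hire more workers"}
print(handle("deadline_missed", handlers, lambda e: None))
print(handle("deadline_missed", handlers, lambda e: "renegotiate deadline",
             override=Mode.MANUAL))
```

With the manual override, the expected handler is bypassed even though it exists, which corresponds to the multi-million dollar project example above.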
Re-execution Criteria – Users can specify the following re-execution criteria (or re-execution patterns) for tasks to assist in automatic exception resolution:
• Optional – whether or not the task succeeds does not affect the next step of the transaction or the success of the parent activity.
• Critical – once a task is selected for execution but fails, it cannot be repeated, nor can alternate solutions be determined. Unless there is human intervention, it causes failure of the parent activity.
• Repeatable – once a task is selected for execution but fails, it can be re-executed by the same agent, by alternate agents, or using alternate resources; but the WFMS should not try alternate paths (e.g., because of the set-up overhead of a production line or a process).
• Replaceable – should the task fail, the WFMS can either redo the task with a free choice of agents and resources or, if necessary, choose any alternate path to work around the problematic task.
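The four re-execution criteria above can be read as a small decision function that tells the WFMS what to try after a task fails. The sketch below is one such reading. The function `next_step`, its string results, and the `retries_left` counter are hypothetical; the article does not state a retry limit, so that parameter is purely an assumption for illustration.

```python
from enum import Enum


class ReExec(Enum):
    OPTIONAL = "optional"
    CRITICAL = "critical"
    REPEATABLE = "repeatable"
    REPLACEABLE = "replaceable"


def next_step(criterion, retries_left, has_alternate_path):
    """Decide what to do after a task fails (illustrative policy only)."""
    if criterion is ReExec.OPTIONAL:
        return "continue"            # failure does not affect the parent activity
    if criterion is ReExec.CRITICAL:
        return "escalate"            # no repeat, no alternative: needs a human
    if retries_left > 0:
        return "retry"               # same or alternate agent / resources
    if criterion is ReExec.REPLACEABLE and has_alternate_path:
        return "alternate-path"      # work around the problematic task
    return "escalate"


print(next_step(ReExec.REPEATABLE, retries_left=0, has_alternate_path=True))
print(next_step(ReExec.REPLACEABLE, retries_left=0, has_alternate_path=True))
```

The only difference between the last two criteria is visible in the fourth branch: a repeatable task never switches to another path, even when one exists.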
Exception Handler Type – Users can specify the following types of exception handlers for different tasks and sub-activities to assist in automatic exception resolution:
• Procedural – These are extra branches of activity decomposition for exception handling. Each procedural handler is specific to a particular context for handling specific outcomes.
• Declarative – These are ECA rules specifying resolution actions under certain events and conditions.
3.3. Exception-Handling Resolutions From an exception handling resolution point of view, different levels of resolution are possible during manual or automatic exception handling, such as: Level 1. Resolution by maintaining execution behavior Level 2. Resolution by modifying execution behavior Level 3. Resolution by evolving workflow We shall discuss Level 1 and Level 2 in this section, which are case-by-case resolutions selected by users manually upon each unexpected exception; Level 3 is discussed in the next section, as it aims at avoiding exceptions and/or subsequent automatic exception handling. The detailed resolution techniques described below are those supported by ADOME-WFMS. 3.3.1. Resolution by Maintaining Execution Behavior Updating data objects like PSAs, resources, etc., and changing activity constraints (e.g., budgets and/or deadlines) are the simplest and most fundamental solutions for handling basic workflow exceptions such as inadequate PSA availability or activity constraint violations. The user does not need to understand every detail of the workflow, but probably only the specific task causing the exception in order to make a sensible decision. Possible resolution decisions include the following: Changing capability or resource requirement for a task instance - under exceptional cases when there are no suitable PSAs or adequate resources, the management may decide to temporarily relax the requirements for a task. Thus, the WFMS can automatically (or the management can manually) select a less capable PSA to carry out a task. For example, because a certain job is small, a PC instead of a workstation can be used; a junior programmer, instead of a senior programmer, may take full responsibility of the job. On the other hand, the management may decide to increase the capability requirement upon failure of a task when the cause is due to inadequate experience or capability. 
For example, because a certain job is very complicated and the chosen programmer fails to finish the job, the management may request a senior programmer to redo the job. Changing constraints for an activity - an activity or a sub-activity usually has various global constraints such as deadlines or budgets. There may be other constraints imposed such as total number of
PSAs allowed for a particular project (instance), or a requirement that a whole sub-activity (class) be carried out by a single PSA. Many of these constraints reflect preferences rather than absolutely critical requirements. The management would judge and probably prefer relaxing such constraints to letting an activity fail (which means loss of the work done), especially when there are no other cost-effective alternatives.
Amendment of capabilities for one or more PSAs - through processes like training of personnel, accumulation of experience, and upgrading of machines, the capabilities of PSAs can increase over time. For example, an organization may decide to train its LAN administrators so that they can set up and manage its new web site instead of hiring additional staff. Capability specifications are often incomplete, and thus require amendment when an exception occurs. For example, when the art designer needs assistants for a major web-authoring project, colleagues with art skills will be called to volunteer, and hence the capability “art skills” will be added for the volunteers.
Adding PSA or other resources - if all the PSAs that can carry out a task are occupied or not available, the management may consider adding/hiring more PSAs, especially if the organization is short of that type of PSA. For example, when there is a shortage of programmers and there are new projects, a simple solution would be to hire more programmers. Moreover, if there are actually no PSAs capable of doing the task, adding a suitable PSA is a natural solution. For instance, if an organization needs to set up and maintain a comprehensive web site of its own, it may want to hire a web administrator. In some cases, adding PSAs may speed up the tasks (e.g., hiring more programmers for coding), and thus help in meeting deadlines. The factors for acquiring other additional resources are similar to those for handling inadequate PSAs.
3.3.2. Resolution by Modifying Execution Behavior
Upon exceptions, the management may manually specify alternative and/or additional steps of a workflow instance, instead of the specified flow, to resolve the exception. However, this requires a deeper understanding of the semantics of the affected workflow (at least of the affected sub-activity). The exception handling resolutions include:
Waiting – hold the progress of an activity and wait for some other events before resuming execution. Typically, this occurs when the required PSAs are unavailable or there are inadequate resources, but the task is not so urgent and thus can afford waiting. The management may also specify waiting before re-execution of the failed task or an alternate solution, so that the environment may change in favor of the task (e.g., wait for the price of a raw material to return to normal before purchasing it).
Repeating current task – this corresponds to a manual decision for the “repeatable” execution criterion. The management may choose the same PSA, an alternate PSA, or alternate resources to repeat a failed task. Choosing to repeat a task instead of other alternatives often helps reduce the cost of preparation and/or setting up the work.
Switching PSA assignment for tasks – a new complicated task may require an expert PSA with many capabilities while some of those PSAs are engaged in simpler tasks that do not actually require all their capabilities. In this case, an expert PSA may be swapped out of his/her assigned task to perform the new complicated task, letting another PSA with less but sufficient capability take over the simpler task. For example, suppose the programming team has only two programmers: John can program in Prolog and C, but Tom only knows C. Let John be engaged in project t1 while Tom is free. Now a new project t2 needs a Prolog programmer, and thus Tom replaces John in the old project so that John can contribute to the new project.
Figure 2 intuitively illustrates such a desired dynamic PSA assignment swapping.
Fig. 2: Dynamic Switching PSA Assignment for Tasks
[Figure not reproduced. It shows the programmers John and Tom and the tasks t1 and t2.]
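The John/Tom swap can be traced with a short sketch. It is a naive illustration, not the ADOME-WFMS scheduler: the function `switch_assignment` and its data layout are hypothetical, and the assumption that project t1 only needs C is inferred from the example rather than stated in it.

```python
def switch_assignment(new_task, required, assignments, skills, task_needs):
    """Free an expert for new_task by handing the expert's current task
    to an idle PSA who is capable enough; return the new assignment map,
    or None when no such swap exists. The input map is not modified."""
    busy = set(assignments.values())
    idle = [p for p in skills if p not in busy]
    for task, expert in assignments.items():
        if not required <= skills[expert]:
            continue                      # this PSA cannot do the new task
        for substitute in idle:
            if task_needs[task] <= skills[substitute]:
                updated = dict(assignments)
                updated[task] = substitute
                updated[new_task] = expert
                return updated
    return None


skills = {"John": {"Prolog", "C"}, "Tom": {"C"}}
task_needs = {"t1": {"C"}, "t2": {"Prolog"}}   # t1 needing only C is assumed
assignments = {"t1": "John"}

print(switch_assignment("t2", task_needs["t2"], assignments, skills, task_needs))
```

A production scheduler would also weigh the cost of interrupting t1; this sketch only checks that both tasks remain covered after the swap.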
Choosing an existing alternate branch for execution – if the required PSA or resources are not available for execution of a task, the management may instead choose other feasible alternate branches (if any). This will not cause loss of work but may result in a higher cost or a less satisfactory result. For example, although it is better to hire an in-house administrator to set up and maintain a comprehensive web site, it is still feasible to accomplish the task by passing the job to an external web-hosting service company. Similarly, if a suitable web-hosting company cannot be found for setting up some web pages, a web administrator may be hired instead, although a higher cost may be incurred. On the other hand, some tasks may fail due to a wrong initial judgement of feasibility or due to changes in the environment, and thus they should not be repeated. The correct solution is then to attempt another execution path.
Skipping current task – this corresponds to the manual decision for the “optional” execution criterion. The management may decide whether the skipped tasks should be re-executed later or just ignored. In any case, the next step of the workflow will be executed, since this optional task does not affect the progress of the activity.
Aborting the sub-activity – in situations where a task fails and there are no feasible alternatives, or there is a severe shortage of resources or agents, or the sub-activity is seriously over budget or beyond its deadline, the management may choose to abort the execution of the whole sub-activity and seek a solution at a higher-level activity. The extreme case is that the whole activity or project is aborted (e.g., when aborting the project can stop further loss of money).
Aborting other tasks - the management may decide to abort other less important tasks to release resources or PSAs for more urgent and important ones.
3.4. 
Workflow Evolution While the conventional approach to exception handling is often manual, off-line and on a case by case basis, a more radical approach would be to allow at run-time various kinds of changes of workflow to be specified and accommodated, i.e., modification of workflow definitions during work progress. With this approach, more exceptions could be avoided and/or handled automatically should they occur repeatedly. This reflects the accumulation of experience and knowledge by the WFMS, which may lead to better solutions [24]. Note that there are issues of correctness of the modified workflow that must be ensured. In this section, we describe the different categories of workflow evolution useful and relevant to exception handling. 3.4.1. Schema Evolution Schema evolution can be supported only if the underlying OODBMS has such functionality; examples include the following: Changing capability or resource requirement for a task (class) – instead of temporarily changing the capability or resources requirement for an instance, after getting enough experience and observations, the management may decide to permanently change them at the schema level to avoid further exceptions of this kind. This may be considered some form of migration of experience to declarative knowledge [24]. Moreover, the management may decide to change these requirements for other reasons such as cost, quality, and/or speed. Changing capability token / role hierarchy – this is necessary due to incomplete specifications and unanticipated requirement changes. Composition links and /or Isa links may sometimes be missed during the initial specification and thus may result in failures in finding a suitable PSA (although one might be actually available). Upon failure of tasks, the management may deduce that a new kind of capability is crucial to the success of a task. 
Thus, the new capability definition should be added to the WFMS, and the task requirement should be updated, along with other details (such as which PSAs possess this new capability) to be discovered and specified. Changing organizational structures – The management may exercise this type of control while encountering exceptions that caused by a shortage of resource or a PSA. Eliminating redundant structures and combining small project teams may sometimes result in more efficient use of resources.
3.4.2. Patching Workflow Definition to Avoid Exceptions

With a thorough understanding of the problem after several occurrences of the same exception, the user may discover the actual reason for the exception. If the exception can be avoided, it is better to avoid it than to let it occur, because of the overhead of exception handling. This is especially useful if there is more than one possible branch of the workflow (see below). Typical modifications include:
• Changing pre-conditions for transitions – extra checks and/or pre-requisites are added before the execution of a workflow. For example, to avoid a shortage of Prolog programmers during the project, a condition check of “more than 3 Prolog programmers available in team” is added before the transition to the task of “selecting Prolog as the programming language” for a project.
• Eliminating problematic branches and adding new branches – some unreliable procedures may be removed altogether to avoid exceptions, if there are alternatives that are more reliable (though they may incur higher costs). New alternatives can be added, and less reliable alternatives may be downgraded to serve only as back-ups. For example, because external services are often late, the management may decide to hire more programmers and not to contract out small software development jobs.
• Extra preparation work – extra preparation work may improve the working environment and eliminate adverse conditions leading to exceptions. For example, after a severe data loss due to a computer virus attack, all systems delivered to clients include virus checking software.

3.4.3. Explicit Procedural Exception Handler

With experience gathered from the manual handling of exceptions, it may be possible to specify explicit procedural exception handlers that perform automatic exception handling. In this way, unexpected exceptions now become expected ones and the system becomes more versatile. Possible procedural handlers include:
• Procedural exception handlers before main task – these procedural handlers are quite similar to preparation work but are executed only upon a specified exception condition. For example, if no vehicles are available for delivering the goods, hire a vehicle. Moreover, actions like changing the PSA assignment and aborting other tasks to release required PSAs/resources can be added, along with suitable conditions guiding such executions.
• Re-execution pattern – these include changes in execution behavior such as skipping or repeating the current task, choosing another existing branch for execution, or aborting the current sub-activity (as discussed in Section 3.2). Once such actions are stabilized or become permanent, subsequent exceptions can be handled in the same way by re-executing these actions, so that further human intervention is not required.
• Compensation or additional action – upon failure of some tasks, a compensation activity and/or an additional procedure may need to be executed. For example, if a file containing the source code of a program was deleted from the hard disk, the programmer may need to load it back from a backup tape before he can continue to work on it.

3.4.4. Declarative ECA Rules for Exception Handling

For different situations (events and conditions) that may directly or indirectly cause exceptions (or may be unrelated to exceptions at all), declarative ECA rules are useful for specifying generic actions in bulk, thereby enhancing the modeling power of the WFMS significantly.† For exception handling specifically, ECA rules are useful for:
• Handling violations of global constraints – this represents a more specialized use of ECA rules, where exceptions relating to violations of a global constraint can probably be handled only within a specific workflow context (e.g., postponing a deadline, adding budget, etc.).
• Specifying summarized rules for exception handling, such as re-execution patterns, criteria for adding PSAs and resources, aborting sub-activities, PSA reassignment, etc.

† With ECA rules, desired parallel actions which may or may not be related to exceptions (such as logging and reporting) can be specified.
It should be emphasized that ECA rules can be at a high level and can cover different dimensions of exception handling, including:
• Association of actions with a class of tasks / sub-activities occurring in any context or in a specified context upon exception. This spares exception handlers from being tediously specified individually for each task of every activity.
• Association of certain PSA classes with actions. E.g., all tasks of trainees should be logged, all exceptions relating to trainees should be reported, etc.
• Association of actions with certain events in general. E.g., upon a hard disk failure, report to the Information Systems manager.

3.4.5. Other Resolutions

Other possible resolutions include drastic amendments to the structures of tasks and sub-activities at run-time. The minimal sub-activity containing all modifications can be restarted or resumed at a user-specified point, owing to the encapsulating properties of the composition hierarchy of activities and tasks. Changes in data / objects and re-executions are relatively simple solutions which can be easily supported. Such a change can be effected either manually or by automatic exception handlers. However, schema and workflow evolution must proceed with human intervention.

4. A META-MODELING FOR EXCEPTION HANDLING

In order to support dynamic exception handling capabilities, we employ a meta-modeling approach, applying contemporary enabling technologies as summarized in Section 2.2.

4.1. Organization and Resource Model

The main components of our organization and resource models for a WFMS include the following entities:
• Organization composition hierarchy – specifies the subsidiaries, divisions, departments, employees, assets, accounts, customers, etc.†
• PSA object hierarchy – specifies and records the PSA objects, their attributes and their methods.
• Token/role multiple-inheritance hierarchy – specifies and records the capability tokens, roles and positions that a PSA object can possess.
• Token Derivation Theory – for determining equivalent capabilities of a PSA to increase the chance of finding a suitable PSA for a task.

† We omit here further elaboration on this aspect as it is relatively standard; details relevant to ADOME-WFMS on this part can be found in [5, 21].

4.1.1. PSA Object Hierarchy

The PSA object hierarchy uses an object-oriented inheritance mechanism so that the PSA sub-classes can capture more structural and behavioral semantics of the agents. A set of generic PSA classes/subclasses like Person, Computer Agent, etc. can be pre-defined and provided in a class library.

4.1.2. Role/Token Multiple Inheritance Hierarchy

The token/role multiple-inheritance hierarchy is used for modeling and reasoning about capabilities. A token can be a composite token, which is equivalent to a set of simple tokens, i.e., the composite token inherits all the capabilities of the simple tokens. The higher levels of the inheritance hierarchy have highly complex tokens that correspond to the capabilities of PSA-roles. Thus, capability-tokens and roles form a unified multiple-inheritance hierarchy, as shown in Figure 3. In this way, tokens and roles are not only symbols but also have attributes and methods, which can be inherited too. More semantics about the capabilities and roles of the PSA can thus be captured. Moreover, we can combine roles and/or tokens to form complex roles. Users may take a bottom-up approach to the specification process when they are considering it from the perspective of capabilities, or a top-down approach when they are considering it from the perspective of task requirements or positions.

[Figure 3 (diagram not reproduced). Labels: Metaclass, Class, Object; PSA Hierarchy: PSA, Person, Computer Agents, Software, Hardware, Machines; Tokens/Role Hierarchy: Tokens (Install System, Write Program, Teach Prolog, Teach C++), PSA Roles (CS Graduate, Programmer, Junior Programmer with “Exclude: Install System”); objects Peter and John.]
Fig. 3: Meta-Class and Classes for PSAs

A PSA can acquire capabilities by playing PSA-roles and/or tokens directly. Usually, we let a PSA play high-level PSA-roles (capabilities) because of its job function or position (e.g., Programmer, Manager), education / knowledge background (e.g., Computer-science graduate), and/or responsibility (e.g., project member). Moreover, we can also let a PSA play low-level capability tokens to represent extra skills (such as Japanese speaking, Chinese typing, etc.), as illustrated by the definition of an example PSA John in Figure 3.

Role AMS_role
  Class_attributes
    Role_Description: string;
    Date_Created: date;
  Attributes:
    Date_played: date;
End

Role PSA_Role isa AMS_role /* noble role of all PSA */
  played_by PSA
  attributes ...
end

role Token isa AMS_role /* root class for different Tokens */
  played_by PSA
  class_attributes
    Exclude_Tokens: set of Token;
    Reverse_derivation: boolean; (default false)
    ...
  methods
    ... /* other methods and attributes to be declared by the user */
end
Fig. 4: Sample Meta-Level Declarations for PSA-Roles and Tokens
Note that a more specific role does not necessarily have more capabilities. For instance, while a programmer can in general be expected to perform some system work, a junior programmer may be a programmer who is not allowed to install system software. Therefore, the attribute exclude_token (cf. Figure 4) is required to record which tokens / roles the more specific role should not inherit from its ancestors.
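The exclusion mechanism can be illustrated with a minimal Python sketch. It is not ADOME code: the class `Role`, its fields, and the function `inherited_tokens` are hypothetical names chosen for illustration, and the example hierarchy is only loosely based on the labels of Figure 3.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Role:
    """Hypothetical stand-in for a token or role in the hierarchy."""
    name: str
    inherits: list[Role] = field(default_factory=list)  # simpler tokens/roles
    exclude: set[str] = field(default_factory=set)      # names not to inherit


def inherited_tokens(role: Role) -> set[str]:
    """Collect the role itself plus everything it inherits, minus exclusions."""
    tokens = {role.name}
    for simpler in role.inherits:
        tokens |= inherited_tokens(simpler)
    return tokens - role.exclude


write_program = Role("Write_Program")
install_system = Role("Install_System")
programmer = Role("Programmer", inherits=[write_program, install_system])
# A junior programmer is a programmer who may not install system software.
junior = Role("Junior_Programmer", inherits=[programmer],
              exclude={"Install_System"})

print(sorted(inherited_tokens(programmer)))
print(sorted(inherited_tokens(junior)))
```

In this sketch the exclusion is applied after the inherited tokens have been collected, so the more specific role ends up with fewer capabilities than its ancestor.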
4.1.3. Token Derivation Theory

Token Derivation Theory is developed in addition to simple inheritance to maximize the chance of finding a capable agent or alternative agents for a task. A PSA has not only its own tokens/roles, but also those obtained through inheritance. (In particular, it also owns all simpler tokens down the hierarchy, except those specified in the attribute exclude_token.) Figure 6 presents the algorithms PSA_Role_Token and PSA_token, which recursively compute the inherited tokens, and derive all possible tokens of a PSA, respectively.
[Figure 5 (diagram not reproduced). Tokens: Fluent_C, Fluent_C++, Fluent_Prolog, Use_Debugger, Teach_Pgmming, AI_expert, Debug_C, Debug_C++, Teach_C++, Teach_Prolog. Link types: Specialization, Composite, Reverse Derivation Link.]
Fig. 5: Token Example: Programming
Based on this algorithm, one can, for instance, conclude that PSA John, who is playing the role of programmer, can also write programs and debug programs. The reverse direction (as denoted by the attribute reverse_derivation in Figure 4) may or may not hold. Considering the programming example in Figure 5, one may conclude that the capability of teaching C++ (Teach_C++) can be derived if a PSA is fluent in C++ (Fluent_C++) and knows how to teach programming (Teach_Pgmming). However, if we only record that a programmer is fluent in C++ (Fluent_C++), we cannot conclude the reverse relationship that s/he is also fluent in C. In ADOME-WFMS, our derivation approach is similar to the derivation approach of Datalog rules, where the derivation stops when no additional tokens can be derived. In fact, the reverse links can be specified as non-recursive Datalog-like rules to reflect these additional tokens:

Teach_C++(X) :- Teach_Pgmming(X), Fluent_C++(X).
Debug_C++(X) :- Use_Debugger(X), Fluent_C++(X).
Debug_C(X) :- Use_Debugger(X), Fluent_C(X).
Teach_Prolog(X) :- Teach_Pgmming(X), Fluent_Prolog(X), AI_expert(X).
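To make the stopping condition concrete, the following Python sketch applies the four rules above repeatedly until no new token can be added. It is only an illustration of the fixpoint idea, not the ADOME implementation; the dictionary `RULES` and the function `derive` are hypothetical names, and the input is assumed to already contain the tokens obtained through inheritance.

```python
# Each rule maps a derivable token (head) to the tokens it requires (body).
RULES = {
    "Teach_C++": {"Teach_Pgmming", "Fluent_C++"},
    "Debug_C++": {"Use_Debugger", "Fluent_C++"},
    "Debug_C": {"Use_Debugger", "Fluent_C"},
    "Teach_Prolog": {"Teach_Pgmming", "Fluent_Prolog", "AI_expert"},
}


def derive(tokens, rules=RULES):
    """Return the tokens plus everything derivable from them by the rules."""
    derived = set(tokens)
    changed = True
    while changed:  # stop once a full pass adds nothing new
        changed = False
        for head, body in rules.items():
            if head not in derived and body <= derived:
                derived.add(head)
                changed = True
    return derived


# Tokens a PSA might hold after inheritance (illustrative input).
held = {"Teach_Pgmming", "Use_Debugger", "Fluent_C++", "Fluent_C"}
print(sorted(derive(held) - held))
```

With this input, the sketch adds Teach_C++, Debug_C++ and Debug_C, but not Teach_Prolog, because Fluent_Prolog and AI_expert are missing.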
As an example, suppose John can teach C++ (Teach_C++). We can conclude from the inheritance hierarchy (of Figure 5) that he also has the tokens Teach_Pgmming, Use_Debugger, Fluent_C++, and Fluent_C. By the above derivation rules, we can also conclude, for instance, that John has the tokens Debug_C and Debug_C++ (and also Teach_C if this token type is later defined). The combination of inheritance links and reverse derivation links for the tokens and roles forms a unified Token Derivation Network in our meta-model, which improves the reasoning about capabilities / roles and thus maximizes the chance of finding a capable agent for a task. The algorithm for deriving tokens for a PSA is illustrated in Figure 6. In the algorithm of Figure 6, the method PSA_token computes the set of tokens of a PSA O (without derivation), which is the union of all the hierarchies of the roles that O is playing, excluding those specified by the exclude_token attributes of individual components. This result can then be processed by the method PSA_Token_Derived, which works similarly to Datalog rules, to carry out the derivation of all other tokens through the reverse-derivation links.
METHOD PSA_Token(O: PSA)
/* to derive all tokens of a PSA with derivation down the inheritance hierarchy */
  token_set := {}
  FOREACH r IN roles(O) DO
    token_set := token_set ∪ PSA_Role_Token(r)
  RETURN token_set
END

METHOD PSA_Role_Token(P: PSA_role)
/* recursively compute the tokens deriving down the inheritance hierarchy */
  token_set := {P}
  FOREACH r IN sub_role(P) DO
    token_set := token_set ∪ PSA_Role_Token(r)
  RETURN (token_set - P.Exclude_Token)
END

METHOD PSA_Token_Derived(O: PSA)
  token_set := PSA_Token(O)
  inc := token_set;
  WHILE (inc ≠ {}) DO BEGIN
    inc2 := {}
    FOREACH t IN inc DO
      FOREACH s IN Super_Role(t) DO
        IF (s NOT IN token_set) AND (s.reverse_derivation) THEN
          IF (subclass(s) ⊆ token_set) THEN BEGIN
            token_set := token_set ∪ {s}
            inc2 := inc2 ∪ {s}
          END
    inc := inc2
  END
  RETURN token_set
END
Fig. 6: Token Derivation Algorithms
4.2. Match Making Model

To execute a task, we must find a PSA that possesses all the tokens required by the task. If there is more than one capable PSA, we must decide on a matching policy (or match making model) to select a PSA for the task. In [14], several match-making policies are described, viz., first find first match, last find last match, best match and worst match. However, these matching policies do not consider the cost for a certain PSA to accomplish the task. A cost function with both the task and the PSA as parameters not only allows us to represent the basic cost, but can also be designed to incorporate any factors that the user may want to optimize, such as efficiency, resource utilization (with best match), workload, etc.

METHOD Match_Making(T: Task, FC: Cost_function, S: Selection_Criteria)
  Found := NULL; Cost := MAX_Cost;
  FOREACH PSA satisfies S DO
    IF (role(PSA) ⊇ T.token) AND PSA.Available THEN BEGIN
      /* roles and tokens are all roles in ADOME */
      C := FC(PSA, T)
      IF (C < Cost) THEN BEGIN
        Found := PSA; Cost := C;
      END
    END
  RETURN Found
END
Fig. 7: Matching PSA and Tasks
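The cost-based selection of Figure 7 can be sketched in Python as follows. This is an illustrative reading of the pseudocode rather than ADOME code: the class `Agent`, its `rate` field, and the cost function `by_rate` are assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    """Hypothetical stand-in for a PSA with its (already computed) tokens."""
    name: str
    tokens: frozenset
    rate: float            # assumed cost attribute, e.g. an hourly rate
    available: bool = True


def match_making(need, agents, cost, satisfies=lambda agent: True):
    """Return the cheapest available agent holding all needed tokens, or None."""
    found, best = None, float("inf")
    for agent in agents:
        if not satisfies(agent):
            continue
        if agent.available and need <= agent.tokens:
            c = cost(agent, need)
            if c < best:
                found, best = agent, c
    return found


agents = [
    Agent("John", frozenset({"Write_Program", "Debug_C"}), rate=50.0),
    Agent("Peter", frozenset({"Debug_C"}), rate=30.0),
    Agent("Mary", frozenset({"Debug_C"}), rate=20.0, available=False),
]


def by_rate(agent, need):
    """Assumed cost function: the agent's rate, independent of the task."""
    return agent.rate


chosen = match_making(frozenset({"Debug_C"}), agents, by_rate)
print(chosen.name if chosen else "no capable PSA")
```

Because the cost function receives both the agent and the requirement, other factors such as workload could be folded into it without changing the selection loop.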
METHOD Partial_Match(T: Task, S: Selection_Criteria, FC: Cost_function)
  PSA_Set := {}; Rest_Token_Set := T.task_need;
  FOREACH PSA satisfies S ordered by FC DO BEGIN
    IF (PSA.token ⊆ Rest_Token_Set) THEN
      IF PSA.available THEN BEGIN
        Rest_Token_Set := Rest_Token_Set - PSA.token;
        PSA_Set := PSA_Set ∪ {PSA}
        IF (Rest_Token_Set = {}) THEN /* done */
          RETURN PSA_Set
      END
  END
  RETURN {} /* not found */
END
Fig. 8: Algorithm for Partial Matching
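A greedy reading of partial matching can be sketched in Python as below. Two points are assumptions of this sketch rather than statements of Figure 8: an agent is added when it holds at least one of the still-missing tokens, and the candidates are visited in increasing order of a per-agent cost. The names `Agent` and `partial_match` are hypothetical.

```python
from typing import NamedTuple


class Agent(NamedTuple):
    """Hypothetical stand-in for a PSA."""
    name: str
    tokens: frozenset
    cost: float
    available: bool = True


def partial_match(need, agents):
    """Greedily assemble a team whose combined tokens cover `need`.

    Returns the list of chosen agent names, or an empty list on failure.
    The result is not guaranteed to be the cheapest possible team.
    """
    remaining = set(need)
    team = []
    for agent in sorted(agents, key=lambda a: a.cost):
        if agent.available and remaining & agent.tokens:
            team.append(agent.name)
            remaining -= agent.tokens
            if not remaining:
                return team
    return []


agents = [
    Agent("Ann", frozenset({"Write_Program"}), cost=2.0),
    Agent("Bob", frozenset({"Speak_Japanese"}), cost=1.0),
    Agent("Cho", frozenset({"Write_Program", "Speak_Japanese"}), cost=5.0),
]
team = partial_match({"Write_Program", "Speak_Japanese"}, agents)
print(team)
```

Here the two cheaper agents are combined before the single more expensive agent is considered, which shows why such a heuristic orders the candidates but does not explore all combinations.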
METHOD Match_Cost_Derived(T: Task, FC: Cost_function, S: Selection_Criteria)
  Found := NULL; Cost := MAX_Cost;
  FOREACH PSA DO
    IF PSA satisfies S THEN
      IF (PSA_Token_Derived(PSA) ⊇ T.token) AND PSA.Available THEN BEGIN
        C := FC(PSA, T)
        IF (C < Cost) THEN BEGIN
          Found := PSA; Cost := C;
        END
      END
  RETURN Found
END
Fig. 9: Matching Derived PSA Tokens with Tasks
METHOD PSA_for_Activity(A: Activity, F: Cost_Function, S: Selection)
  PSA_set := {};
  IF A is Task THEN BEGIN
    PSA_set := Match_Cost(A, F, S)
    IF (PSA_set = {}) THEN BEGIN /* fail matching */
      PSA_set := Match_Cost_Derived(A, F, S)
      IF (PSA_set = {}) AND (A.Allow_Partial_Match) THEN BEGIN
        PSA_set := Partial_Match(A, S, F)
        IF (PSA_set = {}) THEN
          (Human_Intervention) OR (Claim Task cannot be done)
      END
    END
  END
  ELSE
    FOREACH sa IN A.Sub_Activities DO
      PSA_set := PSA_set ∪ PSA_for_Activity(sa, F, S)
  RETURN PSA_set
END
Fig. 10: Algorithm to find a set of PSA to carry out an activity
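The fallback order of Figure 10 (cheaper matching first, more expensive matching only on failure, recursion over sub-activities) can be sketched in Python. The sketch is deliberately simplified and makes several assumptions: the matching procedures are passed in as a list of interchangeable strategies, the tables `DIRECT` and `DERIVED` stand for precomputed token sets, and failure is signalled by a `LookupError` in place of human intervention.

```python
from dataclasses import dataclass, field


@dataclass
class Activity:
    """Hypothetical activity node; a node without sub-activities is a task."""
    name: str
    need: frozenset = frozenset()
    sub_activities: list = field(default_factory=list)


def by_tokens(table):
    """Build a strategy that picks the first agent (by name) covering the need."""
    def strategy(task):
        for name in sorted(table):
            if task.need <= table[name]:
                return [name]
        return []
    return strategy


def psa_for_activity(activity, strategies):
    """Return the union of agents chosen for all tasks under `activity`."""
    if not activity.sub_activities:            # leaf: try each strategy in order
        for strategy in strategies:
            team = strategy(activity)
            if team:
                return set(team)
        raise LookupError(f"no PSA found for task {activity.name!r}")
    team = set()
    for sub in activity.sub_activities:
        team |= psa_for_activity(sub, strategies)
    return team


DIRECT = {"John": {"Teach_C++"}, "Peter": {"Write_Program"}}
DERIVED = {"John": {"Teach_C++", "Debug_C"}, "Peter": {"Write_Program"}}

project = Activity("project", sub_activities=[
    Activity("code", frozenset({"Write_Program"})),
    Activity("debug", frozenset({"Debug_C"})),
])
strategies = [by_tokens(DIRECT), by_tokens(DERIVED)]
print(sorted(psa_for_activity(project, strategies)))
```

In this example the task "debug" fails the first strategy and succeeds only with the derived tokens, so the derived lookup is consulted for that task alone.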
The basic algorithm for match making is illustrated in Figure 7. Sometimes we may not be able to find any single PSA that satisfies all the task requirements. Furthermore, certain types of tasks, such as meetings and interviews, by their nature require multiple PSAs for cooperative work. To this end, we have also developed a partial match algorithm, as shown in Figure 8. Because of the possible explosion of PSA combinations, one may not be able to minimize the cost in this case. Heuristics for improving partial match are therefore needed to sort the candidates and select among them. Because computing all tokens of a PSA with the derivation algorithm may be costly, it is conducted only if the matching procedure based on the inheritance hierarchy fails. Note that the modification of Match_Making to Match_Making_Derived (named Match_Cost_Derived in Figure 9) is simple: only change the clause role(PSA) to PSA_Token_Derived(PSA), as depicted in Figure 9. (Similarly, Partial_Match can be modified to Partial_Match_Derived.) For example, let there be a need for a programmer to debug a C module, i.e., the required token is Debug_C. As explained in Section 4.1.3, the senior programmer John with token Teach_C++ can be matched if Match_Making_Derived is executed instead of Match_Making, since Teach_C++ can be derived to include Debug_C. The main algorithm to find a set of PSAs to carry out an activity is presented in Figure 10, where the result denotes the union of the PSAs for all the constituent tasks.

4.3. Event-Driven Activity Execution Model

We use an event-driven activity execution model to provide a unified approach for both activity execution and exception handling. It consists of an activity meta-model and an execution model, as explained in the following subsections.

4.3.1. Activity Meta-Model
The activity composition hierarchy specifies and records the activities and their constituent sub-activities recursively down to the task level. Tasks are atomic activities with no sub-activities. Non-atomic activities made up of tasks are called complex activities. The recursive activity decomposition is user-driven. Each activity needs to be decomposed only once by the user, after which this decomposition is stored and made available for reuse. Thus, a sub-activity may be part of many other activities. Users can easily compose complicated activities based on existing ones. Figure 11 illustrates an example workflow (template) of a requisition procedure.

[Figure 11 (diagram not reproduced). Panel (a) Requisition (repeatable): Begin, Purchase Request, Procurement, Payment Arrangement (Replaceable), Receive and check goods, Wait till Payment Due, End; arc labels “Supplier not found”, “COD”, “Credit”. Panel (b) Purchase Request: Begin, Get product information (Optional), Fill in PR form, Budget Check, PR approval, End; annotations Repeatable, Critical. Panel (c) Payment Arrangement (Critical, Manual): Begin, Match PR, PO and invoice, Check available funding, Payment Authorization, Prepare Cheque, End. Legend: Activity, Task, transition.]
Fig. 11: Example Workflow of Requisition Procedures
Matching the input events and output events of the sub-activity nodes specifies the arcs of the decomposition graph (or coordination plan). Any user-input decomposition must be checked for incomplete specifications. Different types of arcs such as and/or, joins/splits, parallel/serial, etc., as specified in [27], can be supported. This hierarchical composition is important for encapsulating details of activities and sub-activities so as to facilitate: (a) reuse of task and sub-activity definitions in other new activities; (b) stepwise refinement by decomposing an activity into sub-activities representing more elementary tasks if this is required; (c) easy maintenance of activity definitions and exception handler definitions; (d) scoping for exception handlers; (e) the use of nested transaction models for execution control; (f) localizing failures and thus limiting the loss of work done; (g) capturing exceptions between task executions. Furthermore, we propose that workflow templates can be defined at the meta-level so that actual workflows can be instantiated for specific applications. For example, a generic requisition workflow template can be declared at the meta-level, so that specific requisition workflows (e.g., for computers or supplies) having customized rules and sub-activities can be instantiated.

Class AMS_class
  Class_attributes:
    Class_Description: string;
    Class_Date_Created: date;
  Attributes:
    Instance_Date_Created: date;
    Name: string;
    Instance_Description: string;
End

class Activity isa AMS_class
/* all activities are sub-classes of this meta-activity class because each different activity class can have multiple instances */
  class_attributes: /* specification of an activity class */
    Sub_Activities: set of Activity Classes;
    Backlink: set of Activity Classes; /* to parents of composition hierarchy */
    Input_Events: set of Events;
    Output_Events: set of Events;
    Input_Parameters: set of Parameters;
    Output_Parameters: set of Parameters;
    Reexecution_Pattern: (optional, repeatable, replaceable…)
    Activity_Type: (User_defined, System, Manual...);
  Attributes: /* parameters for execution of instance */
    Priority: integer;
    PSA_chosen: set of PSA;
  Methods
    Decomposition, PSA_for_Activity, Execution, ....
End

class Task isa Activity
  class_attributes:
    Task_Need: set of Token;
    Allow_Partial_Match: bool;
    ...
  constraints
    Sub_Activities = {};
  methods
    Match_Cost, Partial_Match, Match_Cost_Ext...
End
Fig. 12: Meta-level Specification of Activities and Tasks
The meta-level design of the activity objects is shown in Figure 12, and the following is noted:
• AMS_class serves as the root class of all class definitions in the WFMS for easy maintenance of the objects.
• Class-attributes (a feature supported in many advanced object-oriented systems) are used for storing either attributes of the class object (such as the class description) or attributes that have the same value among all objects of the class (such as the decomposition of an activity class). All sub-class objects inherit the definition attributes of the super-class object, but each class can have its own values for the class attributes.
• Since activity instances can be created and arranged for execution as many times as needed, all activity classes are treated as sub-classes of the meta-class Activity according to the same activity decomposition, but with different input/output parameter values and different sets of PSAs for the actual execution.

4.3.2. Execution Model

We concentrate on a centralized control and coordination execution model centered on the activity executor of the WFMS. The activity executor initiates the PSAs selected by the match maker to carry out their assigned tasks and gets the response (if any) from each PSA upon task completion. The activity executor also monitors task execution status and enforces deadlines. In addition, if tasks return exception conditions or do not respond within their deadlines (i.e., time out), the activity executor will arrange for the exception manager to handle them. An activity consists of multiple inter-dependent tasks that need to be coordinated, scheduled and executed. The data dependency, temporal dependency and external input dependency can be expressed by means of a uniform framework of events, such as Data operations, Workflow, Clock Time, External Notification, and Abstract Events [6]. Besides primitive events, any (recursive) combination of conjunction, disjunction, or sequence of other events [6, 7] can define a composite event.

E: Task.execute(t)
A: BEGIN /* prepare and execute a task */
     REPEAT
       p := Findpsa(t);
       IF p ≠ {} THEN
         IF (p.Request(t) = “accepted”) THEN BEGIN
           accepted := true; /* PSA accepts and starts working */
           t.Impose_deadline; /* a time-out event will be raised if the deadline (if any) is not met */
         END
         ELSE accepted := false;
     UNTIL (p = {} or accepted);
     IF (p = {}) THEN Raise(No_PSA(t)); /* exception */
   END

E: Begin_activity.execute(a) /* first (dummy) sub-activity of an activity must be called “begin” */
A: FOREACH s IN a.Successor DO BEGIN /* start the next sub-activity(ies) */
     Raise(Activity.execute(s));
     s.Impose_deadline;
   END;

E: End_activity.execute(a) /* last (dummy) sub-activity of an activity must be called “end” */
A: Raise(Activity.finish(Parent(a))); /* this sub-activity done */

E: Activity.execute(a) /* execute a composite activity by executing its first sub-activity “begin” */
A: BEGIN
     Raise(Activity.execute(a.Begin));
     (a.Begin).Impose_deadline;
   END

E: Activity.finish(a)
A: BEGIN
     Cancel_deadline(a);
     IF Task(a) THEN Free(a.PSA_chosen); /* free PSA from task */
     FOREACH s IN a.Successor DO BEGIN /* start the next sub-activity(ies) */
       Raise(Activity.execute(s));
       s.Impose_deadline;
     END
   END
Fig. 13: Meta ECA Rules for the ADOME-WFMS Execution Manager
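The event-driven style of Figure 13 can be illustrated with a small Python sketch in which every step is triggered by an event taken from a queue. It is a strongly simplified, single-level approximation and not the ADOME-WFMS execution manager: deadlines, PSA acceptance and nested activities are omitted, tasks are assumed to finish immediately once a PSA is found, and the names `Executor`, `raise_event` and `find_psa` are hypothetical.

```python
from collections import deque


class Executor:
    """Toy event-driven executor for one flat activity (begin ... end)."""

    def __init__(self, successors, find_psa):
        self.successors = successors      # node -> list of next nodes
        self.find_psa = find_psa          # task name -> PSA name or None
        self.queue = deque()
        self.log = []

    def raise_event(self, kind, node):
        self.queue.append((kind, node))

    def run(self, start="begin"):
        self.raise_event("execute", start)
        while self.queue:
            kind, node = self.queue.popleft()
            self.log.append((kind, node))
            if kind == "execute":
                if node in ("begin", "end"):          # dummy sub-activities
                    self.raise_event("finish", node)
                elif self.find_psa(node) is None:     # exception event
                    self.raise_event("No_PSA", node)
                else:                                 # assumed instant success
                    self.raise_event("finish", node)
            elif kind == "finish":
                for nxt in self.successors.get(node, []):
                    self.raise_event("execute", nxt)
            # Other events (e.g. No_PSA) would go to an exception manager;
            # this sketch only records them in the log.
        return self.log


successors = {"begin": ["fill_form"], "fill_form": ["approve"], "approve": ["end"]}
assigned = {"fill_form": "John"}          # nobody can perform "approve"
log = Executor(successors, assigned.get).run()
for entry in log:
    print(entry)
```

Because no plan is computed in advance, the missing PSA for "approve" is discovered only when that task's execute event is processed, at which point an exception event is raised in place of a finish event.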
Since ECA-rules are supported by many contemporary systems, Figure 13 illustrates the event-driven activity execution model with meta-ECA-rules. In descriptive terms, the operations of the execution manager are as follows. There is an activity decomposer module which generates ECA rules for automatic coordination during the execution of a workflow and stores them in the database. Users and external applications can trigger the corresponding start-events to start work. Upon a start-event, if the activity is a composite one, the activity executor will raise a start-event for the first sub-activity. This process continues recursively down the composition hierarchy until a leaf task is reached. The activity executor invokes the match maker to select the appropriate PSA(s) for the task and then initiates the task†. The selected PSA will acknowledge or reject the assignment by raising a corresponding reply event. After finishing the assigned task successfully, the PSA replies to the activity executor by raising a finish-event. The activity executor then carries on with the next step according to the result passed back. Upon failure or time-out, the PSA or the system will raise an appropriate exception event to invoke the exception manager.

5. EXCEPTION HANDLING FRAMEWORK FOR ADOME-WFMS
[Figure 14 (diagram not reproduced). WFMS meta-models: Organization & Resource Model (positions, roles/capabilities, units, resources, organization rules); Activity & Execution Model (activity composition, event-driven execution); Match Making Model (roles/capabilities, cost policies); Exception Model (exception handlers, events & rules). ADOME components: role mechanism, rule base, OODB, procedure base.]
Fig. 14: Mapping ADOME to Meta-Models of a WFMS
We have been developing an experimental WFMS called ADOME-WFMS, which aims to support most (if not all) of the functions and features mentioned above. As the name suggests, ADOME-WFMS is built upon an integrated advanced object-oriented modeling environment (ADOME). In particular, the ADOME system was developed to enhance the knowledge-level modeling capabilities of OODBMS models [22], to allow them to deal more adequately with the data and knowledge management requirements of advanced information management applications, such as WFMSs. ADOME can thus serve effectively as a basic layer that provides most of the aforementioned important features (cf. Section 2) necessary for building a comprehensive WFMS (viz., ADOME-WFMS). Instead of patching up and extending an existing OODB inside the WFMS, our integrated approach shields the WFMS from low-level details and provides a rich set of modeling features and constructs for building the WFMS. This approach also facilitates testing, maintenance and evolution from a software engineering point of view. Figure 14 illustrates how the various meta-models are supported by and centered around ADOME.
† This approach is different from that of most other WFMSs, which usually compute the whole execution plan before starting execution. Our approach can handle dynamic resource allocation, on-line modification of workflow and exceptions in a rather flexible manner.
5.1. The Architecture

The architecture of ADOME, as illustrated in the lower part of Figure 15, is characterized by the seamless integration of a rule base, an OODBMS and a procedure base. A role extension to the OO model has been employed to accommodate the dynamic nature of object states and to capture more application semantics at the knowledge level. Roles also act as “mediators” for bridging the gap between the semantics of the database, knowledge base and procedure base and for binding them dynamically [20]. Further, support for advanced ECA rules is also provided [6, 7].

[Figure 15 (diagram not reproduced). Upper part, WFMS: Exception Model (Exception Manager); Activity & Execution Model (Log Manager, Activity Executor, Activity Decomposer); Match Making Model (Match Maker); Organization & Resource Model (Organizational Database). Lower part, ADOME: ADOME’s Workspace & Interface; ADOME’s Upper Layer Facilities & Role Mechanism (role / token hierarchy); ADOME’s Base Level: OODB (class hierarchy: PSA + activities), Rule Base (rule-set hierarchy), Procedure Base (procedure pool).]
Fig. 15: Workflow Management System and ADOME
The ADOME prototype has been built by integrating an OODBMS (ITASCA [15]) and a production inference engine (CLIPS [11]). A WFMS can therefore be implemented on top of it with relative ease. The architecture and functional aspects of the resultant ADOME-WFMS (as depicted in the upper part of Figure 15) are as follows:
• ADOME active expert OODBMS provides a unified enabling technology for the WFMS, viz., object and role database, event specification and execution, and rule / constraint specification and processing.
• Activity Decomposer facilitates the decomposition of activities into tasks. The user provides the knowledge and related data for decomposing activities into tasks through a user interface.
• Organizational Database manages data objects for the organization, as well as PSA classes, instances and their capability token (role) specifications. Besides maintaining user-specified extensional tokens / roles systematically, it also supports intensional token / role derivation for a PSA.
• Activity Executor coordinates execution by means of user-raised and database-generated events.
• Log Manager keeps track of task execution states and results.
• Match Maker selects PSAs for executing tasks of an activity according to some selection criteria.
• Exception Manager handles various exceptions by re-executing failed tasks or their alternatives (either resolved by the WFMS or determined by the user) while maintaining forward progress.

5.2. Exception Handlers in ADOME-WFMS and Scoping

In ADOME-WFMS, exception handlers can conceptually be:

• Procedural: There are extra branches (i.e., a kind of (meta-)sub-activities) for exception handling. Each procedural handler is specific to a certain task or sub-activity under a particular context for handling specific outcomes. For example, the arc labeled supplier not found in the example workflow template of requisition procedures (cf. Figure 11) represents a procedural handler.
• Declarative: Exceptions (which are events) and handlers (which correspond to conditions and actions) are defined at the meta-level. Exceptions are associated with handlers in the form of meta-ECA-rules. Specific ECA-rule instances can then be bound to workflows for versatile exception handling at different activity and sub-activity scopes. Thus, an exception handler applies not only to the body of the target activity but also to all its sub-activities and tasks. For example, a meta-ECA-rule (E: program_error; A: inform(programmer)) can be declared so that specific rule instances such as (E: systema.program_error, A: inform(systema.programmer_in_charge)) can be bound to a specific sub-activity class or instance.
Both kinds of exception handlers can be added, deleted and modified at activity definition time before execution, or when an exception occurs at run-time (viz. workflow evolution); both cases are adequately supported by the dynamic schema evolution capability of ADOME.

When an exception occurs, the search for a handler proceeds from the current task to its parent, then progressively up to the global activity (unless it is stopped by an explicit declaration). This also allows special exception handlers to override default exception handlers if necessary. For example, a declarative exception handler (rule) can specify that all exceptions in the recruitment activity should notify the personnel manager, while a global exception handler specifies that all PC failures should be reported to the EDP department. In this case, the failure of a PC in the personnel department that causes an exception in a sub-activity of the recruitment activity will trigger both rules, informing the personnel manager and the EDP department. However, the user can refine the exception handler for some non-critical sub-activities / tasks so that, e.g., a personnel officer on duty is informed instead of the personnel manager.

If an exception is not handled in a sub-activity (owing to a decision of either the WFMS or an expert), that sub-activity fails and triggers an exception in its parent activity. This process may propagate up the composition hierarchy until the exception is handled. This approach localizes exceptions and thus reduces the loss of work done. Similarly, human intervention requirements for exception handling (automatic, warning, semi-automatic and manual) and re-execution patterns (optional, critical, repeatable and replaceable) for sub-activities and tasks are specified within the scope of this composition hierarchy, with the specification at the lowest level taking priority and thus overriding those of higher levels.
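The scoping behaviour described above can be illustrated with a minimal Python sketch. It is not the ADOME-WFMS implementation: the `Scope` class, the rule layout (a rule name mapped to an event / action pair), the `"*"` wildcard for "all exceptions" and the `stop_search` flag are assumptions introduced purely for illustration. The sketch walks from a task up to the global activity, lets the nearest binding of a rule override bindings higher up, and resolves the human-intervention mode with the lowest level taking priority.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Scope:
    """A node of the activity composition hierarchy (activity, sub-activity or task)."""
    name: str
    parent: Optional["Scope"] = None
    # Hypothetical layout: rule name -> (exception type, action description).
    rules: Dict[str, Tuple[str, str]] = field(default_factory=dict)
    # Optional override of the human-intervention mode at this level.
    mode: Optional[str] = None
    # Stands for the "explicit declaration" that stops the upward search.
    stop_search: bool = False

    def path_to_root(self) -> List["Scope"]:
        """Scopes from this node up to the global activity, nearest first."""
        path, node = [], self
        while node is not None:
            path.append(node)
            if node.stop_search:
                break
            node = node.parent
        return path


def effective_mode(scope: Scope, default: str = "manual") -> str:
    """The lowest level that specifies a mode overrides the higher levels."""
    for node in scope.path_to_root():
        if node.mode is not None:
            return node.mode
    return default


def applicable_actions(scope: Scope, exception: str) -> List[str]:
    """Collect the actions triggered by an exception raised in `scope`."""
    chosen: Dict[str, Tuple[str, str]] = {}
    for node in scope.path_to_root():
        for rule_name, rule in node.rules.items():
            chosen.setdefault(rule_name, rule)  # keep the nearest binding only
    return [action for event, action in chosen.values()
            if event in (exception, "*")]


# The recruitment example from the text, with hypothetical names.
company = Scope("global", rules={"pc": ("pc_failure", "inform EDP department")},
                mode="manual")
recruitment = Scope("recruitment", parent=company,
                    rules={"notify": ("*", "inform personnel manager")})
interview = Scope("interview", parent=recruitment)
filing = Scope("filing", parent=recruitment, mode="automatic",
               rules={"notify": ("*", "inform personnel officer on duty")})

print(applicable_actions(interview, "pc_failure"))
print(applicable_actions(filing, "pc_failure"))
```

In this sketch a PC failure in the interview sub-activity triggers both the recruitment rule and the global rule, whereas in the non-critical filing sub-activity the locally bound rule replaces the notification of the personnel manager.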
5.3. Reusing Exception Handlers

Since exceptions can be rather common in a WFMS, reusing exception handlers is vital to the effectiveness, user-friendliness and efficiency of the WFMS. In ADOME-WFMS, mechanisms for the reuse of exception handlers follow from its structure:

• For procedural exception handlers, arcs from more than one peer task / sub-activity at the same level (siblings inside the same parent activity) can lead to the same exception handler, allowing some degree of sharing.
• Because of scoping, only one declarative exception handler is required for each exception type in each activity composition hierarchy (as explained in the previous sub-section).
• For declarative exception handlers, an ECA-rule object r (i.e., an instance of a meta-ECA-rule) can be associated with more than one scope by repeated binding (e.g., bind r to activity1, activity2).
• Since exceptions are events (which are first-class objects in ADOME), exception classes are also arranged into an ‘isa’ hierarchy. Thus, an exception handler for a super-class will also handle an exception of a sub-class (e.g., an exception handler for program_error will also handle subscript_out_of_range).
• Extending the event part with ‘or’ event composition can generalize exception handlers (e.g., E: program_error ∨ computer_breakdown, A: inform(EDP)) and increase their applicability.

5.4. ADOME-WFMS Exception Manager
[Figure 16, diagram not reproduced. Recoverable node labels: BEGIN; Notify User; Find / Resolve Exception Handler; Execute Exception Handler; Manual / Cooperative Human Intervention Manager; Dispatch next: Redo / Resume / Jump / Abort; END. Recoverable arc labels: notification required; notification not required; automatic handling; exception handler found or resolved; exception handler not resolved; no exception handler; manually specified handler; user requests automatic handling.]

Fig. 16: Meta-Activity for ADOME-WFMS Exception Manager
Internally, the activity decomposition module rewrites user-supplied ECA rules and error-handling procedure definitions so that, when an exception occurs, an event is triggered to invoke the exception manager. The exception manager then takes control and acts according to the different exception handling modes discussed in Section 3.2†. Relevant PSAs / users are notified unless the automatic handling mode is specified. Figure 16 depicts the control flow of the ADOME-WFMS exception manager. Note that if human intervention is required, the human intervention manager is invoked. Thus, various exception-handling activities can be specified and executed (and the user can also request automatic resolution). If there is an explicit, a resolved or a manually specified handler, the chosen handler is executed; otherwise human intervention is (again) called upon. In some other cases, the exception handler (or even the human handler) may abort the parent (sub-)activity if it determines that a failure has occurred and the parent (sub-)activity cannot be continued.
† Note that exception source is not as important here since all exceptions are just treated as events [8].
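The control flow of Figure 16 can be approximated by the following Python sketch. It is a simplified reading of the figure rather than the actual exception manager: the function signature, the callback names (`find_handler`, `ask_human`, `notify`), the treatment of the semi-automatic and manual modes as "human first", and the final fallback to abort are all assumptions made for illustration.

```python
from typing import Callable, Optional

# A handler takes the exception name and returns a dispatch decision.
Handler = Callable[[str], str]

DISPATCH = {"redo", "resume", "jump", "abort"}


def exception_manager(
    exception: str,
    mode: str,
    find_handler: Callable[[str], Optional[Handler]],
    ask_human: Callable[[str], Optional[Handler]],
    notify: Callable[[str], None],
) -> str:
    """One simplified pass through the control flow of Fig. 16."""
    if mode != "automatic":
        notify(exception)  # notification is skipped only in automatic mode

    handler: Optional[Handler] = None
    if mode in ("semi-automatic", "manual"):
        # Human intervention comes first; returning None here stands for
        # "the user requests automatic resolution".
        handler = ask_human(exception)
    if handler is None:
        handler = find_handler(exception)  # explicit or resolved handler
    if handler is None:
        handler = ask_human(exception)  # no handler: call on the human (again)
    if handler is None:
        return "abort"  # assumption: nothing left to try, so fail the parent

    decision = handler(exception)
    if decision not in DISPATCH:
        raise ValueError(f"unknown dispatch decision: {decision}")
    return decision


# Usage with hypothetical callbacks: a handler is found automatically.
notified = []
outcome = exception_manager(
    "program_error",
    "warning",
    find_handler=lambda exc: (lambda e: "redo"),
    ask_human=lambda exc: None,
    notify=notified.append,
)
print(outcome, notified)
```

The returned decision corresponds to the "Dispatch next" node of the figure; how redo, resume, jump and abort are carried out is left outside the sketch.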
5.5. Supporting Exception Handling in ADOME-WFMS

In Section 3, we described different types of exceptions; here we describe how ADOME-WFMS handles them.
5.5.1. Handling Expected Workflow Exceptions (EWE)

For generic exceptions, ADOME-WFMS has built-in exception handlers:

• If a PSA rejects a task assignment or the best candidate PSA is not available, the WFMS will find the next available PSA.
• If all PSAs capable of executing the task are busy or the required resources are occupied, the WFMS will either wait or choose alternate execution paths.

For more specific exceptions, the workflow administrator or users have to anticipate them and input exception handlers so that the WFMS can handle these workflow exceptions for them.

ADOME-WFMS supports many exception handling resolutions relating to PSA assignment based on capability matching, such as amending the capabilities of PSAs and changing the capability requirements for a task instance. This is important because a significant proportion of internal (workflow) exceptions are due to failures in finding suitable PSA(s) for the execution of tasks. Moreover, ADOME supports advanced analysis of PSA capabilities, termed “capability role/token multiple inheritance hierarchy” and “token derivation network theory” (cf. Section 4.1). This increases the chance of finding suitable PSA(s) automatically (thus avoiding the no-PSA exception), and also of finding alternate ones for repeating a task upon an exception. These advanced capability processing features are vital to the success of automatically switching PSA assignments among tasks. It should be noted that quite a number of traditional WFMSs, such as Flowmark [1] and OASIS [24], do not readily support or employ the notion of capability matching for PSA assignment to tasks.

ADOME-WFMS supports different re-execution criteria / patterns so that failed tasks may be re-executed or alternate tasks may be executed instead. Moreover, ADOME-WFMS can automatically resolve and decide on the correct alternative PSAs or alternate execution branch if the re-execution pattern for a task has been specified. To the best of our knowledge, no other WFMS (except WAMO [12]) supports these features.

5.5.2. Handling Expected External Exceptions (EEE)

There can be several kinds of expected external exceptions, for which declarative ECA handlers are specified along with the appropriate actions to be taken:

• Some of these exceptions can be handled by the corresponding system-level components participating in the WFMS, such as the DBMS, operating system and network. Also, recovery techniques (e.g., rollback and roll-forward for a DBMS) can be used to handle these failures. Some of these exceptions are handled silently by the external agents, so that ADOME-WFMS does not even need to be aware of them (trivially handled), while others can notify ADOME-WFMS by raising an appropriate event so that further actions can be taken (e.g., for logging and audit trail).
• Other exceptions, though expected, cannot be handled by the external system components. Appropriate events are raised to inform ADOME-WFMS so that the explicit exception handlers (which have been specified by the workflow administrator or users) can be executed.

5.5.3. Conventional Approaches for Unexpected Exceptions

For an unexpected exception, the treatment of workflow and external exceptions is similar. Upon unexpected exceptions, the human intervention manager sub-module of the Exception Manager assists the user by providing a list of possible resolutions with some evaluations. Moreover, all recent case-by-case resolutions are kept in the database for user reference. Since every scenario can be different, probably only the user can determine the most appropriate actions.
The following table summarizes the suggested resolutions for some exception cases from a situation point of view†.

Situation                      Suggested resolutions
Cannot find PSA                Wait, skip / postpone task, change requirement for a task, amend capabilities for PSA, abort other tasks to release PSA, switch PSA assignment, add PSA
Not enough resources           Wait, skip / postpone task, change resources requirement for a task, add resources, abort other tasks to release resources
Cannot meet deadline           Postpone deadline, add PSA, expedite activity execution
Activity constraint violation  Change constraint (e.g. budgets), change relevant data objects
Task fail                      Skip / postpone / repeat task, alternate branches of execution, change PSA / resources requirement
Activity fail                  Abort current sub-activity, alternate branches for execution
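As a rough illustration of how a human intervention manager might present such suggestions, the table can be encoded as a lookup structure. The Python sketch below is an assumption-laden example, not part of ADOME-WFMS: the names `SUGGESTED_RESOLUTIONS` and `suggest` are hypothetical, and ranking suggestions by how often they were chosen in recent cases is only one conceivable form of the "evaluations" mentioned in Section 5.5.3.

```python
from collections import Counter
from typing import Dict, Iterable, List

# The table above, keyed by situation; entries keep the table's wording.
SUGGESTED_RESOLUTIONS: Dict[str, List[str]] = {
    "cannot find psa": [
        "wait", "skip / postpone task", "change requirement for a task",
        "amend capabilities for PSA", "abort other tasks to release PSA",
        "switch PSA assignment", "add PSA",
    ],
    "not enough resources": [
        "wait", "skip / postpone task",
        "change resources requirement for a task", "add resources",
        "abort other tasks to release resources",
    ],
    "cannot meet deadline": [
        "postpone deadline", "add PSA", "expedite activity execution",
    ],
    "activity constraint violation": [
        "change constraint (e.g. budgets)", "change relevant data objects",
    ],
    "task fail": [
        "skip / postpone / repeat task", "alternate branches of execution",
        "change PSA / resources requirement",
    ],
    "activity fail": [
        "abort current sub-activity", "alternate branches for execution",
    ],
}


def suggest(situation: str, recent_choices: Iterable[str] = ()) -> List[str]:
    """Return the suggested resolutions for a situation.

    Resolutions chosen more often in recent cases come first (an assumed
    ranking heuristic); ties keep the order of the table.
    """
    key = " ".join(situation.lower().split())
    options = SUGGESTED_RESOLUTIONS.get(key, [])
    counts = Counter(recent_choices)
    return sorted(options, key=lambda resolution: -counts[resolution])


print(suggest("Task fail", recent_choices=["alternate branches of execution"]))
```

An unknown situation yields an empty list, which matches the observation that in such cases only the user can determine the appropriate action.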
In short, ADOME-WFMS provides adequate support for a variety of conventional exception resolutions as studied in Section 3, not only manually but, for many of them, also automatically. For example, the resolution of aborting the current sub-activity or other tasks / sub-activities is well supported, since such an exception can be handled in a unified manner at a higher level (the root activity in the activity composition hierarchy) either automatically or manually. In addition, ADOME-WFMS also supports radical solutions based on workflow evolution, as discussed below.

5.5.4. Workflow Evolution for Unexpected Exceptions

ADOME-WFMS has the required facilities for supporting workflow evolution (cf. Sections 4 and 5.1). In particular, besides conventional exception handling resolutions, the human intervention manager sub-module also accepts on-line updates of the workflow. In contrast, few current WFMSs have facilities for supporting the whole spectrum of exception-handling resolutions, especially those relating to workflow evolution. In ADOME-WFMS, the user can choose to make any of the suggested resolutions persistent, or enter any schema evolution operation, workflow update and/or new ECA rules.

As workflow evolution requires modifying workflow definitions or adding ECA rules to the system while work is in progress, advanced schema evolution capability is required at run-time. Many WFMSs are based on relational databases and can hardly support schema evolution, which severely restricts workflow evolution resolutions. Owing to ADOME’s support of dynamic schema evolution [22], ADOME-WFMS readily provides exception resolutions based on schema evolution (cf. Section 3.4). It should be noted that the resolutions based on schema evolution are general-purpose ones, which can help reduce the occurrence of further exceptions.
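The idea of making a user-chosen resolution persistent by adding an ECA rule while work is in progress can be sketched as follows. This is a toy in-memory illustration in Python, not ADOME's rule base: the `RuleBase` class, the `resolve` function, the `persist` flag and the event name used in the example are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Context = Dict[str, Any]
Condition = Callable[[Context], bool]
Action = Callable[[Context], str]


class RuleBase:
    """A toy rule base that can be extended at run-time."""

    def __init__(self) -> None:
        self._rules: List[Tuple[str, Condition, Action]] = []

    def add_rule(self, event: str, condition: Condition, action: Action) -> None:
        self._rules.append((event, condition, action))

    def fire(self, event: str, context: Context) -> List[str]:
        """Run the action of every rule whose event and condition match."""
        return [action(context) for ev, cond, action in self._rules
                if ev == event and cond(context)]


def resolve(rules: RuleBase, event: str, context: Context,
            ask_user: Callable[[str, Context], Action],
            persist: bool = True) -> List[str]:
    """Handle an exception event, falling back to the user if no rule applies."""
    outcomes = rules.fire(event, context)
    if outcomes:
        return outcomes
    action = ask_user(event, context)  # human intervention
    if persist:
        # The chosen resolution becomes a new rule for later occurrences.
        rules.add_rule(event, lambda ctx: True, action)
    return [action(context)]


# Usage: the second occurrence is handled without asking the user again.
asked: List[str] = []


def ask_user(event: str, context: Context) -> Action:
    asked.append(event)
    return lambda ctx: "postpone deadline"


rule_base = RuleBase()
print(resolve(rule_base, "deadline_missed", {}, ask_user))
print(resolve(rule_base, "deadline_missed", {}, ask_user))
print(asked)
```

The sketch deliberately ignores schema evolution of the workflow definitions themselves, which requires the database-level support discussed in the text.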
Furthermore, advanced ECA rule support in ADOME [3, 4] greatly facilitates the reuse of exception-handling rules and the flexibility of associating these rules with different targets such as tasks / sub-activities, PSAs and events [9].

Because ADOME-WFMS uses activity decomposition, upon workflow evolution (i.e., modification of a certain sub-activity class definition), the side effects on other activities containing this sub-activity are very much confined. At the time of the workflow evolution, only those activities having the same sub-activity in execution are affected. Other activities having the same sub-activity but not in execution are unaffected, since the sub-activity is encapsulated and behaves as a black box to activities at a higher level.

† A comprehensive system assisting such decision-making is beyond the scope of this article.

6. CONCLUSION

Workflow management system technology, though fairly recent, has been regarded as one of the main types of next-generation information systems. It is perceived that workflow technology requires not only support for complex data model functionality, but also flexibility for dynamically modifying workflow specifications in cases of exception handling. However, in the current state-of-the-art research, there is no comprehensive framework and/or enabling technology that adequately meets the functionality requirements of supporting workflow technology. This article addresses this shortcoming by presenting a meta-modeling approach for a WFMS based on an integrated environment called ADOME. The meta-modeling approach is embodied by a three-level framework (cf. Figure 1), in which the highest level defines the generic templates necessary for modeling various types of WFMSs, the middle level defines for each type of WFMS the useful facilities that constitute the systems, and the bottom level instantiates the actual WFMS instances for a specific type. Such a meta-modeling approach allows us to isolate various issues at different levels, while it also promotes adaptiveness and reusability. The base system, ADOME, provides an improved (meta-)environment for developing a WFMS which can adapt to changing requirements. In particular, the resultant system (i.e., ADOME-WFMS) supports effective PSA modeling and management through its capability/token approach, a rich taxonomy of different types of exceptions and their handling approaches, and a novel augmented solution for exception handling based on workflow evolution. ADOME-WFMS is thus able to provide better support for flexible and advanced functions, including versatile match making, online modification of workflow instances and definitions, and dynamic exception handling. ADOME-WFMS is currently being constructed upon the ADOME prototype system.
Further research work currently being carried out includes: a cost model for effective PSA allocation and alternatives in exception handling; issues in cascaded and nested exception handling; maintenance of previous cases of human intervention for future reference and automatic case-based exception handling; a more comprehensive methodology for handling exceptions systematically; and enhancement of the prototype system with the comprehensive exception handling features and workflow evolution support.

Acknowledgements: This research has been funded by HKSAR CERG 747/96E.
7. REFERENCES

[1] G. Alonso, et al. Failure Handling in Large Scale Workflow Management Systems. IBM Research Report RJ9913 (1994).
[2] G. Alonso, et al. Exotica/FMDC: a workflow management system for mobile and disconnected clients. Distributed & Parallel Databases, 4(3):229-247 (1996).
[3] F. Casati, S. Ceri, B. Pernici and G. Pozzi. Workflow evolution. In Proceedings of the 15th ER’96 International Conference, pp. 438-455, Springer-Verlag Lecture Notes in Computer Science, Cottbus, Germany (1996).
[4] S. Chakravarthy, K. Karlapalem, S.B. Navathe, A. Tanaka. Database Supported Cooperative Problem Solving. International Journal of Intelligent and Cooperative Information Systems, 2(3):249-287 (1993).
[5] L. C. Chan, D. K. W. Chiu and Q. Li. A versatile bridging mechanism with an experimental user interface for an expert OODBMS. Technical Report HKUST-CS95-35, Computer Science Dept., Hong Kong University of Science and Technology (1995).
[6] L. C. Chan and Q. Li. Devising a flexible event model on top of a common data / knowledge storage manager. In Proceedings of 6th International Workshop on Information Technologies and Systems (WITS ’96), Cleveland, Ohio, pp. 182-191, Texas A&M University (1996).
[7] L. C. Chan and Q. Li. An extensible approach to reactive processing in an advanced object modeling environment. In Proceedings of 8th International Conference on Database and Expert Systems Applications (DEXA ’97), Toulouse, France, LNCS(1308), pp. 38-47, Springer-Verlag (1997).
[8] D. K. W. Chiu, K. Karlapalem and Q. Li. Developing a workflow management system in an integrated object-oriented modeling environment. In Proceedings of 6th International Conference On Software Engineering and Knowledge Engineering (SEKE’98), San Francisco, pp. 71-78, Knowledge Systems Institute, U.S.A. (1998).
[9] D. K. W. Chiu, K. Karlapalem and Q. Li. Exception handling with workflow evolution in ADOME-WFMS: a taxonomy and resolution techniques. In Proceedings of CSCW-98 Workshop: Towards Adaptive Workflow Systems, Seattle, Washington, (to appear) (1998).
[10] D. K. W. Chiu and Q. Li. A three-dimensional perspective on integrated management of rules and objects. International Journal of Information Technology, 3(2):98-118 (1997).
[11] http://www.ghg.net/clips/CLIPS.html
[12] J. Eder and W. Liebhart. The workflow activity model WAMO. In Proceedings of CoopIS-95, Vienna, Austria, pp. 97-98, University of Toronto (1995).
[13] D. Georgakopoulos, M. F. Hornick and F. Manola. Customizing transaction models and mechanisms in a programmable environment supporting reliable workflow automation. IEEE Transactions on Knowledge and Data Engineering, 8(4):630-649 (1996).
[14] P. C. K. Hung. A Capability-Based Activity Specification and Decomposition for an Activity Management System. M. Phil. Thesis, Computer Science Dept., Hong Kong University of Science and Technology (1995).
[15] Ibex Corporation. http://www.ibex.ch/
[16] G. Kappel, et al. Workflow management based on objects, rules, and roles. IEEE Bulletin of the Technical Committee on Data Engineering, 18(1):11-18 (1995).
[17] K. Karlapalem, H. P. Yeung and P. C. K. Hung. CapBaseED-AMS - A framework for capability-based and event-driven activity management system. In Proceedings of COOPIS ’95, Vienna, Austria, pp. 205-219, University of Toronto (1995).
[18] A. Kumar, et al. A framework for dynamic routing and operational integrity controls in a workflow management system. In Proceedings of the Twenty-Ninth Hawaii International Conference on System Sciences, 3:492-501, IEEE Computer Society Press (1996).
[19] Q. Li and F. H. Lochovsky. An approach to integrating data and knowledge management in next generation information systems. In Proceedings of International Workshop on Next Generation Information Technologies and Systems, pp. 59-66, Technion – Israel Institute of Technology, Israel (1993).
[20] Q. Li and F. H. Lochovsky. Roles: extending object behaviour to support knowledge semantics. In Proceedings of International Symposium on Advanced Database Technologies and Their Integration, pp. 314-322, Nara Institute of Science and Technology, Nara, Japan (1994).
[21] Q. Li and F. H. Lochovsky. Advanced database support facilities for CSCW systems. Journal of Organizational Computing & Electronic Commerce, 6(2):191-210 (1996).
[22] Q. Li and F. H. Lochovsky. ADOME: an advanced object modeling environment. IEEE Transactions on Knowledge and Data Engineering, 10(2):255-276 (1998).
[23] K. Lyytinen, P. Kerola. MetaPHOR: Metamodeling, Principles, Hypertext, Objects and Repositories. Computer Science and Information Systems, Technical Report TR-7, University of Jyväskylä (1994).
[24] C. Martens and C.C. Woo. OASIS: An integrative toolkit for developing autonomous applications in decentralized environments. Journal of Organizational Computing, New Jersey: Ablex Publishing Corporation, 7(2&3):227-251 (1997).
[25] Proceedings of the NATO ASI on Workflow Management Systems and Interoperability, A. Dogac, L. Kalinichenko, M.T. Ozsu, and A. Sheth, editors, Istanbul, Turkey, Springer Verlag (1997).
[26] A. Reuter and F. Schwenkreis. ConTracts - A low-level mechanism for building general-purpose workflow management systems. IEEE Bulletin of the Technical Committee on Data Engineering, 18(1):4-10 (1995).
[27] Workflow Management Coalition Members. Glossary - A Workflow Management Coalition Specification, WFMC-TC1011, version 2.0, http://www.aiim.org/wfmc/standards/docs/glossary.pdf (1994).