Articulation: An Integrated Approach to the Diagnosis ... - CiteSeerX

Articulation: An Integrated Approach to the Diagnosis, Replanning, and Rescheduling of Software Process Failures1

Peiwei Mi and Walt Scacchi

Information and Operations Management Department University of Southern California2 Los Angeles, CA 90089-1421 fpmi,[email protected]

Abstract The papers presents an integrated approach to articulate software process plans that fail. The approach, called articulation, repairs a plan when a diagnosed failure occurs and reschedules changes that ensure the plan's continuation. In implementing articulation, we combine diagnosis, replanning, and rescheduling into a powerful mechanism supporting process-based software development. Use of articulation in plan execution supports recovery and repair of unanticipated failures, as well as revising and improving the plans to become more eective. In this paper, we also describe how a prototype knowledge-based system we developed implements the articulation approach.

Revised version to appear in Proc. 8th. Annual Knowledge-Based Software Engineering Conf., IEEE Computer Society, Chicago, IL (September 1993). 2 Acknowledgements: This work has been supported in part by contracts and grants from AT&T, Hewlett-Packard, and Northrop Inc. No endorsement implied. 1

Contents 1 Introduction

1

2 An Overview of Articulation

2

3 Plan Representation and An Example Plan

3

4 The Problem Space and Diagnosis

6

4.1 The Problem Space : 4.2 Diagnosis : : : : : :

: : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : :

5 The Solution Space and Problem-Solving Heuristics 5.1 The Solution Space : : : : : 5.2 Problem-Solving Heuristics

11

: : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : :

6 Reschedule the Changed Allocation

6.1 Rescheduling Constraints and Heuristics 6.2 The Rescheduling Process : : : : : : : :

6 9

11 13

14 : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : :

7 Discussion and Conclusion

14 16

18

List of Figures 1 2 3 4 5 6 7 8

An Overview of Articulation : : : : : : : : : : : : : : : : Partial de nition for Develop-Change-and-Test-Unit : The Problem Space : : : : : : : : : : : : : : : : : : : : : The Solution Space : : : : : : : : : : : : : : : : : : : : : Algorithmic Description of Redo-and-Review : : : : : : The Modi ed Develop-Change-and-Test-Unit : : : : : Schedule for Develop-Change-and-Test-Unit : : : : : The Modi ed Develop-Change-and-Test-Unit : : : : :

i

: : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : :

2 5 8 11 13 14 17 19

1 Introduction Software process plans specify the resources needed for the enactment of a software process model. They also specify the relationships between the resources to process steps, the products produced by these steps, and any resource or enactment constraints. Process plans guide the instantiation and use of process models by users or automated mechanisms. The use of knowledge-based plans and planning mechanisms for this domain was rst introduced by Hu[HL88]. Plans that direct software development with limited resources can fail due to unexpected events. Planning mechanisms that must handle plan failure face two closely related problems: repairing the failed plan and recovering the broken plan execution. While the former is a dynamic planning or replanning problem, the latter involves dynamic or reactive scheduling. Most advanced planning and scheduling mechanisms tackle these two problems separately. For example, CHEF[Ham90b] is a plan repair mechanism that explains and repairs failed plans in the domain of Szechwan cooking. SIPE[Wil88] also has a replanner to do the same thing in general problem-solving. However, they do not deal with rescheduling resources to meet changed arrangements. Dynamic schedulers such as [SDS90], on the other hand, primarily schedule resources incrementally in order to avoid unexpected events and backtracking. However, they are unable to identify arrangements that have been changed. In this paper, we present an integrated approach to the diagnosis, replanning, and rescheduling of failed software process plans called articulation3 . When a process plan fails during execution, articulation identi es problems based on knowledge of dierent types of plan failures. Articulation retrieves failure repair mechanisms from a knowledge base of problem-solving heuristics and implements them. Then articulation reschedules the necessary resources so overall plan execution is able to continue according to the modi ed plan. Our articulation approach is prototyped in a multi-agent knowledge-based system, called the Articulator [MS90, MS91b, Mi92]. In what follows, we rst give an overview of articulation in Section 2 which integrates diagnosis, replanning, and rescheduling. In Section 3, we brie y discuss plan representation in the Articulator. We present these issues and specify the problem space, the solution space, and the rescheduling mechanism in Section 4, 5, and 6 respectively. In Section 7, we conclude the paper with a short discussion and summarize the unique characteristics found in the Articulator. Articulation originally refers to a kind of development work that people perform in order to continue their productive activities after a process breakdown occurs. It was rst identi ed in a number of empirical studies of software development [BS87, BS89, CKI88, KS82, OSS88, Str88]. The results of these studies are summarized and mapped into articulation heuristics elsewhere [MS91b, Mi92]. 3

1

Original Plan

Original Plan

New Plan Plan Repair Problem Space

Solution Space

Selection Instatiation

Repair Suggestion Diagnosis Schedule

Recovery

Execution Failure

Execution

* Instance

Finished Instance Reschedule Resources Articulation

Figure 1: An Overview of Articulation

2 An Overview of Articulation An overall view of the articulation approach that integrates diagnosis, replanning, and rescheduling is shown in Figure 1. Plan execution starts when a speci c process plan is instantiated. After the scheduling of its resources is complete, execution starts to symbolically execute the activities step by step. Normally, it continues according to the speci ed execution order in the plan until all the activities are nished. However, the ongoing execution of a plan is blocked once a failure occurs. In the Articulator environment, articulation mechanisms would now take over. The inputs for articulation are an executed plan with allocated resources, a broken activity in the plan, and an identi ed failure. The process of articulation consists of several stages: diagnosis, selection of problem-solving methods, recovery, and rescheduling. In diagnosis, the Articulator identi es possible causes for a failure based on its symptoms and the other related information. The problem space in the Articulator stores domain-speci c knowledge of failures that helps during diagnosis. From this point of view, the purpose of diagnosis is to locate a given failure in the problem space according to the situation where the failure occurs. Once a diagnosis is identi ed, a solution must be created to resolve it. This stage is accomplished in the Articulator through selection of a pre-de ned set of problem-solving heuristics (PSHs). However, all PSHs are not applicable at a given time, due to other constraints on resource availability. 2

There are also other preferences that limit use of a particular PSH. All such issues of applicability and preference are abstracted into a set of selection heuristics. Selection heuristics are therefore applied during PSH selection to determine which PSH best matches the identi ed diagnosis and the current process enactment context. The next stage is to apply the chosen PSH to recover from the failure. During this stage, the PSH is executed with a set of parameters, which de ne the current enactment context, gathered from diagnosis. Changes are then made in the broken activity, the plan, and/or the related resource allocation. At this point, the failure is resolved as far as the executed plan is concerned. However, this modi ed plan can not be executed directly because the original resource allocation as scheduled may not be valid due the changes. Rescheduling should then be performed to re-arrange the timing of the modi ed resource allocation. Otherwise, such changes may cause other failures. Two outputs resulting from articulation are produced by the Articulator. First, a modi ed plan instance with allocated resources is put into execution. Second, a repair suggestion is forwarded to a user whose responsibility is to study sets of repair suggestions gathered from repeated plan executions and realize necessary modi cations or repairs to the original process model. The Articulator is implemented in the form of a knowledge-based system[MS90]. The knowledge and experience of articulation abstracted from empirical studies of human articulation behavior are represented as knowledge schemata and inference rules[MS91b, Mi92]. The Articulator has 4 modules that together perform support articulation:

The problem space that embodies knowledge of failure and a diagnosis mechanism based on the problem space to interpret failures;

The solution space that embodies knowledge of problem-solving methods in form of PSHs;

A set of selection heuristics that bridges diagnosis to PSHs for given circumstances and application preferences; 4

A rescheduling mechanism that dynamically creates a global or partial resource allocation for a speci ed duration in response to changes in plans and related resources;

3 Plan Representation and An Example Plan Before we get into the details of articulation, we brie y present our representation of software process plans and resource allocation. Also, we provide an example plan which will be referenced Due to space limitations, we will not detail the selection process except to point out it is determined by applicability of PSHs to diagnosis as well as users preferences on use of de ned PSHs. The details can be found in [Mi92]. 4

3

frequently in later sections. In the Articulator, a software process plan is represented as a combination of several objects: An activity hierarchy represents the decomposition of a plan into a hierarchy of smaller process steps called tasks and activities. Multiple levels of decomposition can occur depending on the complexity of a plan. At the bottom of this hierarchy are activities, which represent the smallest observable steps. We use subtasks to refer to both tasks and activities since they are components in a plan. Within a level of a decomposition there exists a pair of relations to specify precedence order. These relations can de ne several types of execution ordering, such as linear, parallel, iteration, and multi-choice. Resource requirements specify descriptions of the resources necessary to perform a subtask and produce its expected outputs. These requirements specify what resources, or types of resources, are needed in a subtask, but do not refer to particular instances of resources. Resource requirements for a subtask include: (i) information about the timing of subtask execution and expected duration; (ii) knowledge and skills required by the agents (or roles) performing the subtask; (iii) speci cation of tools and raw materials; and (iv) products that are created or enhanced during the subtask. Resource possession includes 'real' resource instances that are physically allocated and used during plan execution. It binds real resources to execution, while resource requirements do not. In order to have such a match of resource requirements and possession, each of the resource instances in resource possession should correspond to a particular entry in the resource requirement. In addition, these two resource relationships should be functionally equivalent in terms of their knowledge and skill requirements. As such, with these objects in mind, we provide an example. The example describes a software development process plan called Develop-Change-and-Test-Unit. This process modi es a software system's design and source code as part of a software development project5. It focuses on designing, coding, unit testing, and management of a localized change to a software system. Figure 2 gives a graphic display of the activity hierarchy of the Develop-Change-and-Test-Unit process as well as the de nition of activity Modify-Design. Develop-Change-and-Test-Unit is decomposed into a set of 7 activities shown as icons in the gure. Also the ow of process execution among the activities is shown. For example, once Assign-Task is done, three activities, Monitor-Progress, Modify-Test-Plan, and Modify-Design can be started. Resource requirements are speci ed at the level of activities. In Figure 2, resource requirements This example was created by the organizing committee of the 6th International Software Process Workshop in 1990[KFFe90]. The original problem served as a standard modeling problem for comparing various existing approaches to software process modeling. We made minor changes to the original problem and reuse it here as an example of articulation. 5

4

Modify-Design

Modify-Code Test-Unit

Assign-Task

Modify-Test-Plan Modify-Test-Pk

Monitor

ffModify-Design

instance: task-component-of: task-has-predecessor: task-has-successor: task-has-agent-role-spec: task-has-required-resource-spec: task-has-provided-resource-spec:

activity Develop-Change-and-Test-Unit Assign-Task Modify-Code agent-for-modify-design required-for-modify-design provided-for-modify-designgg

ffagent-for-modify-design

instance: agent-role-spec-in-task: resource-requirement: maximum-quantity: resource-possession:

agent-role-spec Modify-Design Design-engineer Software-engineer 11 John Douggg

ffrequired-for-modify-design

instance: required-resource-spec-in-task: resource-requirement: maximum-quantity: resource-possession:

required-resource-spec Modify-Design System-Design-doc Requirements-change 11 System-Design-doc Requirements-changegg

ffprovided-for-modify-design

instance: provided-resource-spec-in-task: resource-requirement: maximum-quantity: resource-possession:

provided-resource-spec Modify-Design System-Design-Doc 1 System-Design-Docgg

Figure 2: Partial de nition for Develop-Change-and-Test-Unit

5

for activity Modify-Design is given. It states that Modify-Design requires two kinds of agents: Design-engineer and Software-engineer as well as some other kinds of required-resources. In de ning these requirements, we assume that object classes, such as Design-engineer, have been previously de ned in the knowledge base. Let us assume that a team of software engineers and other related resources are allocated for Develop-Change-and-Test-Unit. For Modify-Design, John is assigned as a Design-engineer and Doug as a Software-engineer. For other classes of resources, we use the same name to represent instances within the classes for this example. Although it is an oversimpli ed example, it is easy to anticipate that given dierent sets of resource possession, plan execution will be dierent. It is also easy to see that whenever resource possession changes, plan execution changes accordingly. These relations give rise to the necessity for articulation, and therefore, the role of the Articulator is to handle unexpected changes or failures during process plan execution.

4 The Problem Space and Diagnosis The problem space in the Articulator is an abstraction of problems caused by various types of process breakdowns [Mi92]. In this section, we describe the problem space. Once such a failure occurs, the Articulator interprets it by nding its corresponding position in the problem space. This diagnosis process is also described in this section.

4.1 The Problem Space The problem space represents the types of plan failure that the Articulator recognizes. The problem space consists of three dimensions: (i) type of failure, (ii) use of a problematic resource, and (iii) existence of the task chain for a problematic resource. Each dimension has a set of values. Along with each value, some other information may be attached. We explain these dimensions, their values, and the implications as follows: Type of failure describes the nature of failures. Its values are failure types most commonly seen during plan execution and recognized by the Articulator. These values represent domain failures for plans. A domain failure [Ham90b] is a problem that can be characterized in terms of the operators and states in a domain. In the domain of software development, the operators are activities and the states are status of resources such as agents, tools, and software documents [MS90, MS91b, Mi92]. Currently, we have identi ed seven failure types, each of which represents a particular class of problems in a poorly de ned plan: 6

Unfinished-Resource:

Products are not created, or subtasks are not nished when plan execution is complete. Consider the Develop-Change-and-Unit-Test. If activity Test-Unit is left un nished when its execution is complete, this is considered as a failure of unfinished-resource.

Unnecessary-Resource:

Unnecessary products are created, extra required resources are allocated, or unnecessary subtasks are included. Here unnecessary means resources exist for no apparent use in a plan. In this sense, unnecessary products are often by-products of a plan and unnecessary tasks are tasks that only produce unnecessary resources. For example, if it turns out that only one Design-engineer is needed to complete Modify-Design, the extra requirement for a Software-engineer causes an unnecessary-resource failure to occur.

No-Resource-Requirement:

a resource is used in plan execution which was not speci ed in resource requirements of the executed activity. For example, if the agents choose to bring one more Design-engineer into activity Modify-Design, this is considered a no-resource-requirement failure since no resource requirement is set for the extra Design-engineer and therefore no allocation has been made.

No-Resource-Possession:

the required resources are not in possession when the activity starts. This situation is the opposite of the previous one. If Modify-Design starts without any Software-engineer, this violates the resource requirements in Modify-Design.

Inadequate-Match:

a resource possession does not match its corresponding resource requirement. We use both class match and detailed quali cation test to make sure that a match is adequate. Quali cation tests are often domain-speci c and should be provided by domain experts if possible. For example, this type of failure occurs if the process planner assigns a QA-engineer instead of Software-engineer in Modify-Design. The latter role requires a dierent set of professional skills than the former.

Not-Work-Match-Occupied:

a possessed resource assigned to an activity is currently being used when the activity starts. Therefore, the resource instance is not available for use. Let us say that when Modify-Design starts, Doug, the assigned Software-engineer, is actually doing another emergent assignment that prevents him from joining the activity. A major issue to resolve here is to provide an incrementally updated working schedule for the assigned agents, to better arrange these activities.

Not-Work-Match-Broken:

a possessed resource assigned to an activity does not function correctly when the activity starts. Therefore, the resource instance is not available for use. Consider the example of Modify-Design again. Say this time John is sick and will not come 7

Failure Types

Usage of Resource

Task Chain

Un nished-Resource Unnecessary-Resource No-Resource-Requirement No-Resource-Possession Inadequate-Match Not-Work-Match-Occupied Not-Work-Match-Broken

Activity Init Agent Inter Tool Required-Resource Provided-Resource

Figure 3: The Problem Space to work for some unknown time. In comparison to the previous case, a major issue here is to nd a replacement for John within organizational limitations. The next dimension is the usage of a problematic resource or resource requirement in a broken activity. A problematic resource is one that causes the failure and becomes the object to be repaired or replaced. When modeled in the Articulator, a resource may be used by an activity in ve dierent ways: It can be the activity itself, an agent that performs the activity, a tool used in it, a requiredresource, or a provided-resource created by the activity. These ve basic types of uses not only determine the way a resource is used, but also in uence the way it is xed when a failure occurs. A dierent usage of a resource requires a dierent treatment during articulation. For example, agents normally arrange the access rights for tools with their owners, while they negotiate directly with other agents about work assignment. Often, nding the designated agent (a person) can be more dicult than nding a required information object or tool. The last dimension is the existence of a task chain with a problematic resource. A task chain consists of activities that are supposed to create or manipulate a resource before it becomes problematic. Therefore, if something goes wrong with the resource, chances are that something may be wrong within its task chain. Consider the Develop-Change-and-Test-Unit. The original System-Design-Doc must be modi ed in Modify-Design before it can be used subsequently in Modify-Code. If a Software-engineer nds that System-Design-Doc remains unchanged, he needs to check with the Design-engineer who is responsible for Modify-Design. It may turn out the Design-Engineer forgot to save the new code and must redo Modify-Design. It is also possible that no task chain exists within a plan where a problematic resource belongs, so there must be an external task chain that introduces the problematic resource into the plan. For a problematic resource, value inter represents the existence of an internal task chain, and init means no internal task chain exists. Figure 3 is a list of all values for the three dimensions in the problem space. The values for all three dimensions in the problem space give a clear description of what a problem is and therefore 8

makes it possible to suggest solutions to x it. In total combination, 70 types of problems are represented in the problem space: 7 failure types times 5 problematic resource usages times 2 problematic task chain types. Next, we explain the process for how to determine these values in the problem space for a given failure.

4.2 Diagnosis Once a failure occurs, diagnosis locates a position of the failure in the problem space. At the same time, a necessary explanation is gathered to support the location. To this end, diagnosis searches for a causal explanation that will be formatted as directed by the problem space. The position of a failure will then be used as a pointer to the solution space, which identi es a set of possible PSHs. Findings in diagnosis are summarized into a class of object called diagnosis, which serve as index to potential problem-solving heuristics. When diagnosis starts, it rst identi es the type of failure by determining: Is the plan execution complete? Is there a problematic resource associated with a failure? Is there a problematic resource requirement? Is there a match between the problematic resource and its requirement? What resources does the broken activity provide?

These questions are answered through both direct information retrieval and deductive reasoning, as provided by the Articulator[MS90]. For example, answers to the rst question can be determined by gathering the development status of each subtask in a plan. On the other hand, answers to the fourth question must reason about relationships between the problematic resource and the related resource requirements, as well as testing for quali cations. Answers to these questions classify the type of the failure as one of the seven recognized types. Information gathered from these questions is also saved for later use. The second dimension concerns the intended use of a problematic resource (or its requirement if the resource does not exist). This can be achieved by retrieving the speci ed relationship between the problematic resource and the broken activity. This is presented as a question: How are the broken activity and the problematic resource related?

Answers to this question must relate activity, agent, tool, required-resource, and providedresource according to the de ned relationship between the problematic activity, the problematic resource, and the related resource requirement. Also, other related information is collected such as problematic-resource-class indicating the class in which it occurs and current-co-users indicating other users of the problematic resource if they exist. The third dimension is then identi ed by asking: 9

Is the problematic resource (supposed to be) produced in the same plan?

The answer can be found through inferencing on a producer activity of the problematic resource and the relationship between the producer activity and the broken activity. If the producer activity can not be found, or is outside the plan where the broken activity belongs, the Articulator generally will not try to nd its producers and redo them unless all other possibilities prove ineective. If the producer activity is found, diagnosis then retrieves the task chain that leads to the current status of the problematic resource. This task chain consists of activities that manipulated the problematic resource in their execution order, the developers involved in those producer activities, and other critical resources. Such a task chain provides a causal explanation about what happened to the problematic resource and thus gives a clue for how it should be repaired. We will not go into a detailed description of the implementation of diagnosis, but simply point out that it is similar to those suggested by Schank [Sch86]. In total, there are 25 diagnosis strategies with their corresponding values in the problem space [Mi92]. Let us now consider an example failure as follows: As we see in Develop-Change-and-Test-Unit (Figure 2), Modify-Design is immediately followed by Modify-Code. When execution starts Modify-Design begins with two assigned agents, John and Doug. Modify-Design creates a new version of System-Design-Doc that is supposed to satisfy the changes in system design. The execution then initiates Modify-Code which is assigned to Doug and Chris. While modifying the source code, Doug and Chris may nd that the modi ed design document has

aws that create new problems for the software system, subsequently making System-Design-Doc inappropriate to follow when modifying the system's source code. Therefore, they identify an Inadequate-match and hand it over the Articulator. Diagnosis of this failure creates a diagnosis as follows: fffailure-12+11

instance: related-activity: identi ed-diagnosis:

failure Modify-Code diagnosis-12+12gg

ffdiagnosis-12+12

instance: related-failure: failure-type: problematic-resource: problematic-resource-requirement: problematic-resource-usage: problematic-resource-class: other-instances-of-req: problematic-resource-source: latest-provider:

diagnosis failure-12+11 Inadequate-Match System-Design-Doc required-for-modify-code required-resource System-Design-Doc nil Inter Modify-Designgg

10

Type of Operation

Replace-Instance Replace-Class Restructure Modify Redo-and-Review Split/Merge Others

Target Usage

activity agent tool required-resource provided-resource

Operand Usage

Operand

activity existing-self agent existing-other tool new required-resource provided-resource

Figure 4: The Solution Space

5 The Solution Space and Problem-Solving Heuristics Once the failure is located in the problem space, the Articulator acts to resolve the problem. The solution space in the Articulator is an abstraction of methods that recover failures. In the solution space, a recovery method is called a problem-solving heuristic (PSH). In this section, we rst explain these dimensions as characteristics of PSHs and then discuss the functionality and implementation of PSHs.

5.1 The Solution Space The solution space contains various PSHs that x or avoid failures. These PSHs are characterized by four dimensions that identify their operations and operand objects. When a PSH is selected, it is important to know both the type of the operation and the operand objects for the operation. The four dimensions in the solution space are designed to make these characteristics explicit. The four dimensions for the solution space as displayed in Figure 4 are: (i) type of operation; (ii) target resource usage; (iii) operand resource usage; and (iv) operand. First, the type of operation indicates operations used to repair a failure. These types represent abstracted methods commonly used in practice, such as nding alternative resources or adding new activities. They involve structural or non-structural changes in a plan. An operation that modi es the de nition of the original plan is called structural, while an operation that only involves re-assignment of resource instances to the plan is non-structural. The consequences of selecting one of these operations will be discussed in the next section, but now we focus on the operations in the solution space. Currently, seven types of operation are de ned:

Replace-instance: replace a problematic resource instance by another instance of the same class. This method is useful when a resource instance is not able to be used during its assigned subtask.

Replace-class: replace a problematic resource by an instance of another functionally equivalent

11

class. Its use is similar to the rst method above, but involves search for a functionally equivalent resource.

Restructure: change the task hierarchy so that a failure can be eliminated. This includes operations such as add, delete, and recon gure activities and/or resources. Some of these operations include combinations of replanning actions, as suggested in [Ham90b, Wil88].

Modify-Value: change the attribute values of aected resources or activities. An example of this type of operation is to change the work span of an activity.

Redo-and-Review: redo a problematic activity or its predecessor activity, then add a review activity to check it. This is useful when an activity does not produce its provided-resources as expected.

Split-or-Merge: decompose a task into a set of subtasks to create another layer of detail, or merge a group of tasks into a larger task.

Others: other methods that do not involve plan changes or resource con guration. They include 1) wait, 2) give up a problematic resource requirement and its associated possession, and 3) re-negotiate the use of a problematic resource instance.

The second dimension identi es the intended resource, called a target usage, to be xed by the chosen operation. It is the objective of the chosen operation. The Articulator supports ve resource uses to de ne this dimension (see Figure 3). The last two dimensions of the solution space concern an operation's operands. The Operand usage gives the class of the operand and the operand provides the exact operand used in an operation. Values for an operand usage are the ve types of resource usage: activity, agent, tool, required-resource, and provided-resource. Values for an operand are 1) existing-self for a problematic resource or a problematic resource requirement itself, 2) existing-other for using another existing instance/class, and 3) new for creating a new instance/class. In sum, a template of all four dimension values can be used to index a particular PSH, which is used to link to a particular diagnosis. Consider the following example:

and

12

1) 2) 3) 4)

Create another instance of the activity to be redone; Create a new instance of activity Review and add resource requirements; Link the two as activity 1) proceeds activity 2); Insert 3) into the plan just before the broken activity.

Figure 5: Algorithmic Description of Redo-and-Review This example identi es two PSHs that can be used to repair the example diagnosis given in the previous section. The rst PSH searches for another copy of System-Design-Doc to replace the problematic one. The second PSH adds more activities to modify the problematic System-Design-Doc. Without going into the details of PSH selection, we simply point out that the selection process uses global and local constraints to identify an applicable PSH among all de ned PSHs. The applicable PSH is then executed to x a given diagnosis. In this example, the second PSH is chosen to be applicable because it is a more feasible solution for the given circumstance.

5.2 Problem-Solving Heuristics Each legal position in the solution space represents a recognized class of articulation methods in the Articulator and speci es its necessary inputs. However, all combinations do not represent possible ways to x process plan failures. Only a subset of these combinations have been implemented as legal combinations of the solution space, each of which represents a PSH that is used to recover failures. All together, 35 PSHs are de ned and implemented as sets of inference rules[MS91b, Mi92]. These PSHs represent experiences that expert developers often use. When a PSH is selected to recover a failure, information recorded in the related diagnosis is fed into the set of inference rules and the rules solve the failure. The Articulator's PSHs owe a great deal to previous work on plan repair ([Ham90b, Wil88]). However, PSHs here are oriented to practical methods that are gathered from the empirical studies[MS91b], rather than a collection of single replanning actions. To this end, a PSH contains a set of changes that prove eective to a particular problem. Also, PSHs are organized such that a single failure can have multiple PSHs applied to it. As an example, consider the chosen PSH

It represents a combination of replanning actions that creates a pair of activities and inserts them into the broken plan. Figure 5 gives an algorithmic description of this PSH. Other de ned PSHs are implemented in similar fashion as inference rules. 13

Modify-Design

Modify-Design

Review-Design

Modify-Code Test-Unit

Assign-Task

Modify-Test-Plan

Modify-Test-Pk

Monitor

Figure 6: The Modi ed Develop-Change-and-Test-Unit When this PSH is executed on our example, a modi ed plan is created as in Figure 6. When the PSH is executed, the rst half of the plan has been completed and the second half is in progress. Furthermore, this modi ed plan is not yet ready for execution since its resource allocation is not yet done for the new part. As such, we turn to explain the rescheduling process that completes the remaining part before execution re-starts.

6 Reschedule the Changed Allocation Articulation requires scheduling to be reactive and partial. First of all, scheduling is called in response to changed resource arrangements. The previous arrangement must be abandoned and new arrangements asserted. Second, only part of the whole schedule needs to be modi ed while others remain the same. As such, changes should be limited as much as possible. In the light of these two requirements, a rescheduling mechanism has been implemented in the Articulator that is interfaced to the other mechanisms. In this section, we present both schema de nitions for the rescheduling constraints and heuristics, and the rescheduling process.

6.1 Rescheduling Constraints and Heuristics The Articulator's rescheduling mechanism is based on heuristic constraint-directed search[FSe89, KY89, MS91a]. It has a similar structure as those just cited, but it is designed to be reactive and partial. The rescheduling mechanism represents plans, resource allocation, and other conditions as rescheduling constraints and selection criteria as rescheduling heuristics. Currently, it supports two types of rescheduling constraints: 14

Unary rescheduling constraints that de ne a property for an object. For example, start time 3 for Review-Design is a unary rescheduling constraint and represented as: {{modify-design-start-time instance: comparison: related-object: value:

unary-constraint smaller Review-Design 3}}

Binary rescheduling constraints that de ne relationship between objects. For example, the successor of Modify-Design, Modify-Code, is represented as a binary rescheduling constraint: {{review-design-successor

instance: binary-constraint first-operand: Modify-Design second-operand: Modify-Code}}

The rescheduling mechanism uses two types of heuristics to reduce the search space and backtracking:

Search heuristics direct the selection of activities among all available candidates. Examples of selection heuristics are: Minimum-slack-time-first which is equivalent to the Critical Path Method; Earliest-start-time-first which selects activities in order of their speci ed start time (see the de nition below); and Maximum-number-of-resource-requirement-first which chooses the activities with larger number of resource requirements rst (see the de nition below). A set of selection heuristics can be used as a group to lter the necessary selection. {{earliest-st-first

instance: operation: operation-type: operand-constraint:

heuristic min value start-time}}

{{max-number-of-reqs


heuristic max number resource-requirements}}

Resource heuristics direct the selection of resource to satisfy resource requirements of activities. An example is highest-use-preference which selects the resources with the highest levels of preference. {{highest-use-preference


heuristic max value use-preference}

In addition to the set of rescheduling constraints and heuristics mentioned above, users can specify their own types of rescheduling constraints and heuristics as subclasses of these. 15

6.2 The Rescheduling Process The rescheduling mechanism takes over once a PSH is executed. First, the rescheduling mechanism decides which part of the previous schedule to abandon and which resources to reschedule. This part is very important since it acts as a link between replanning and rescheduling. It is to our advantage that this is done with reference to the executed PSH and changes made during PSH execution. When deciding the part to abandon, there are several alternatives. One strategy is to discard only those resource possessions for the broken activity and all its successors. This strategy is intended not to disturb part of the old schedule that binds the resources to other unaected activities, and therefore limits the scope of changes. Another possible strategy is just to abandon all resource allocations that have not been used and schedule all remaining activities altogether. This second strategy creates more potential changes to the original plan, but may have a better result. We will give an example to show their dierences later in this section. The part of the original plan to be rescheduled is summarized in the form of search constraints called Activity-Constraints and Resource-Constraints. In addition, Activity-Heuristics and Resource-Heuristics are given initially as selection criteria for activities and resources. The rescheduling mechanism starts to select an initial group of activities that satisfy the ActivityConstraints. These activities are then ordered according to the Activity-Heuristics. The rst activity is used to create a demand window which lists its resource requirements and time span. According to the demand window, the rescheduling mechanism selects a set of available resources through Resource-Constraints and Resource-Heuristics as it selected the chosen activity. If the demand window can be satis ed with the chosen resources, the chosen activity is allocated with these resources, then changes on activities and resources are propagated. After this, the rescheduling mechanism re-starts by selecting the next activity to be allocated. If the demand window can not be satis ed due to insucient resources at the moment, the chosen activity is postponed until the next time it is selected for resource allocation. The scheduling process continues until one of the following conditions occurs: 1) all activities are allocated; 2) the resources are used up in their available time; or 3) scheduling iteration reaches a pre-de ned step or number of activities to schedule. The rescheduling mechanism can be applied on either a complete plan or part of it over a speci ed time span, depending on given Activityconstraints and Resource-constraints. In the latter case, one only speci es activities constraints where some activities are allocated and others are not. Resource constraints are set accordingly, so only those to be allocated are available for scheduling. Now, let us consider our example of Develop-Change-and-Test-Unit for rescheduling (see Figure 6 for its activity hierarchy). Part (a) in Figure 7 is the original agent schedule. In this schedule, we assume that most activities take 1 time unit to complete while Monitor-Progress 16

Time

Mary

Peter

Project-Manager

1 2 3 4 5 1 2 3 4 5 6 7 1 2 3 4 5 6

QA-Engineer

John

Doug

Chris

Design-Engineer

Software-Engineer

Software-Engineer

(a) The Initial Schedule

Assign-Task Monitor-Progress Modify-Test-Plan Modify-Design Monitor-Progress Modify-Test-Plan Monitor-Progress Modify-Test-Package Monitor-Progress Unit-Test Unit-Test (b) The Modi ed Schedule (I) Assign-Task Monitor-Progress Modify-Test-Plan Modify-Design Monitor-Progress Modify-Test-Plan Modify-Design Monitor-Progress Modify-Test-Package Monitor-Progress Review-Design Review-Design Monitor-Progress Monitor-Progress Unit-Test Unit-Test (c) The Modi ed Schedule (II) Assign-Task Monitor-Progress Modify-Test-Plan Modify-Design Monitor-Progress Modify-Test-Plan Modify-Design Monitor-Progress Review-Design Review-Design Monitor-Progress Modify-Test-Package Monitor-Progress Unit-Test Unit-Test

Modify-Design Modify-Code

Modify-Code

Modify-Design Modify-Design Review-Design Modify-Code

Review-Design Modify-Code

Modify-Design Modify-Design Review-Design Modify-Code

Review-Design Modify-Code

Figure 7: Schedule for Develop-Change-and-Test-Unit and Modify-Test-Plan take longer. We also have an agent-to-role binding according to their skills. Part (b) in Figure 7 re ects a modi ed schedule created for Modify-Design and its successor activities. In this new schedule, we assume that Review-Design is assigned to four agents, Peter, John, Doug, and Chris. Their presence is required for the activity to start. Part (c) in Figure 7 is created by rst discarding part of the schedule for all unstarted activities at time 2 and scheduling them together with the newly added Modify-Design and Review-Design activities. Accordingly, this shows the nal schedule (c) takes fewer time steps to complete than (b) because of a better task arrangement for Peter. Finally, at this time, the rescheduling mechanism does not optimize modi ed schedules, but it is equipped with the capability that allows users to try out dierent strategies.

17

7 Discussion and Conclusion Before we conclude this paper, let us examine our example to see what kind of repair the discussed failure brought. Initially, there is no iteration de ned in the plan (Figure 2). After a failure of Inadequate-match is identi ed, the Articulator introduces a repeated Modify-Design and a new Review-Design to the plan (Figure 6). Subsequently, the Articulator suggests an iteration to encapsulate the two activities in the generated repair suggestion. If a manager accepts this suggestion and incorporates it into the original plan, this new plan, displayed in Figure 8, takes care of a problem that was not addressed in the original. Therefore, similar failures could be avoided in future instantiations of this process. Currently, the articulation approach is applied in both modeling and designing software development processes[MS90, MS91b, MS92, Mi92]. On the one hand, it supports simulation of software process plans as a means to debug and tailor them. On the other hand, it enables developers to recover from process breakdowns during software development which were not anticipated in software process plans. To this end, we have built two separate systems for these two purposes respectively. The Articulator system provides the capability for handling the dynamic evolution of software engineering processes. The second system, our process-driven CASE environment, accepts a software process plan from the Articulator as its input, displays it in a graphic form to its assigned developers, and supports data management, tool integration, and project management during software development[MS92]. One user view is shown in Figure 8. As such, when users in their assigned agent roles experience a process plan breakdown while using this CASE environment, they can then forward the identi ed failure back to the Articulator for resolution. After articulation is completed, the Articulator then outputs a revised process plan which is then input to this CASE environment. In conclusion, articulation of failed plans in execution is an integrated mechanism for diagnosizing, replanning, and rescheduling software process plans that fail during enactment. Our approach was in uenced by works on both planning and scheduling [Ham90a, HR85, Ste81, Wil88, FSe89, SFO86], Further, CHEF[Ham90a] and SIPE[Wil88] provided us with insight on the depth of plan repair and some valuable replanning actions. However, compared with these approaches, the Articulator and its articulation mechanisms are unique in three aspects: First, it is designed to handle both plan repair and rescheduling. This results in an integrated system of replanning and rescheduling which is powerful and convenient to users. Second, the Articulator relies heavily upon knowledge and skills for repairing problems in the domain of software development. The kind of knowledge and skills is abstracted from a number of empirical studies by us and others. It is also implemented as an open system so that more heuristics can be added easily. Finally, our 18

Figure 8: The Modi ed Develop-Change-and-Test-Unit articulation mechanisms were conceived to help solve process failure problems, which will likely become more prevalent as both conventional and knowledge-based software engineering environments evolve to support process-centered methods for software development. As such, our strategy for organizing articulation problem and solution spaces, together with problem solving hueristics and PSH selection hueristics, provides a viable and extensible foundation for further exploring how to better support knowledge-based software development.

References [BS87]

S. Bendifallah and W. Scacchi. Understanding Software Maintenance Work. IEEE Trans. on Software Engineering, 13(3):311{323, Mar 1987.

[BS89]

S. Bendifallah and W. Scacchi. Work Structures and Shifts: An Empirical Analysis of Software Speci cation Teamwork. In Proc. of the 11th International Conference on Software Engineering, pages 260{270, Pittsburgh, PA, May 1989.

[CKI88] B. Curtis, H. Krasner, and N. Iscoe. A Field Study of the Software Design Process for Large Systems. Communications of ACM, 31(11):1268{1287, Nov 1988. 19

[FSe89]

M.S. Fox, N. Sadeh, and etc. Constrained Heuristic Search. In Proc. of Joint International Conference on Arti cial Intelligence, pages 309{315, 1989.

[Ham90a] K.J. Hammond. Case-Based Planning: A Framework for Planning from Experience. Cognitive Science, 14(4):385{443, 1990. [Ham90b] K.J. Hammond. Explaining and Repairing Plans that Fail. Arti cial Intelligence, 45(3):173{228, 1990. [HL88]

K.E. Hu and V.R. Lesser. A Plan-Based Intelligent Assistant That Supports the Process of Programming. ACM SIGSOFT Software Engineering Notes, 13:97{106, Nov 1988.

[HR85]

B. Hayes-Roth. A Blackboard Architecture for Control. Arti cial Intelligence, 26(3):251{ 321, July 1985.

[KFFe90] M. Kellner, P.H. Feiler, A. Finkelstein, and etc. Software Process Modeling Example Problem. In The 6th International Software Process Workshop. Japan, Oct 1990. [KS82]

R. Kling and W. Scacchi. The Web of Computing: Computer Technology as Social Organization. In Advances in Computers, Vol.21, pages 1{90. Academic Press, Inc., 1982.

[KY89]

N. Keng and D.Y.Y. Yun. A Planning/Scheduling Methodology for the Constrained Resource Problem. In Proc. of Joint International Conference on Arti cial Intelligence, pages 998{1003, 1989.

[Mi92]

P. Mi. Modeling and Analyzing the Software Process and Process Breakdowns. PhD thesis, Computer Science Dept. University of Southern California, 1992. September.

[MS90]

P. Mi and W. Scacchi. A Knowledge-based Environment for Modeling and Simulating Software Engineering Processes. IEEE Trans. on Knowledge and Data Engineering, 2(3):283{294, Sept 1990.

[MS91a] C. Meng and M. Sullivan. Logos: A Constraint-Directed Reasoning Shell for Operations Management. IEEE Expert, 6(1):20{28, Feb 1991. [MS91b] P. Mi and W. Scacchi. Modeling Articulation Work in Software Engineering Processes. Proc. of the 1st International Conference on the Software Process, pages 188{201, Oct 1991. [MS92]

P. Mi and W. Scacchi. Process Integration in CASE Environments. IEEE Software, 9(2):45{53, March 1992. 20

[OSS88] G.M Olson, S. Sheppard, and E. Soloway. Empirical Studies of Programmers, I and II. Ablex Publishing Corporation, 1987 and 1988. [Sch86]

R.C. Schank. Explanation Patterns: Understanding Mechanically and Creatively. Erlbaum Hillsdale, NJ, 1986.

[SDS90] M.J. Shah, R. Damian, and J. Silverman. Knowledge Based Dynamic Scheduling in a Steel Plant. IEEE Sixth Conference on Arti cial Intelligence Applications, pages 108{ 113, 1990. [SFO86] S.F. Smith, M.S. Fox, and P.S. Ow. Constructing and Maintaining Detailed Production Plans: Investigations into the Development of Knowledge-based Factory Scheduling Systems. AI Magazine, pages 45{61, Fall 1986. [Ste81]

M. Ste k. MOLGEN Part 1: Planning with Constraints. Arti cial Intelligence, 16(2):111{139, May 1981.

[Str88]

A. Strauss. The Articulation of Project Work: An Organizational Process. The Sociological Quarterly, 29(2):163{178, Apr 1988.

[Wil88]

D.E. Wilkins. Practical Planning: Extending the Classical AI Planning Paradigm. Morgan Kaufmann Publishers, Inc., 1988.

21