Fundamenta Informaticae 79 (2007) 1–45


IOS Press

A Logic-Based Approach to Finding Explanations for Discrepancies in Optimistic Plan Execution

Thomas Eiter∗ Institut für Informationssysteme, TU Wien Favoritenstr. 9-11, A-1040 Wien, Austria [email protected]

Esra Erdem† Faculty of Engineering and Natural Sciences, Sabancı University Orhanli, Tuzla, Istanbul 34956, Turkey [email protected]

Wolfgang Faber Department of Mathematics, University of Calabria 87030 Rende (CS), Italy [email protected]

Ján Senko Institut für Informationssysteme, TU Wien Favoritenstr. 9-11, A-1040 Wien, Austria [email protected]

Abstract. Consider an agent executing a plan with nondeterministic actions, in a dynamic environment, which might fail. Suppose that she is given a description of this action domain, including specifications of effects of actions, and a set of trajectories for the execution of this plan, where each trajectory specifies a possible execution of the plan in this domain. After executing some part of the plan, suppose that she obtains information about the current state of the world, and notices that she is not at a correct state relative to the given trajectories. How can she find an explanation (a point of failure) for such a discrepancy? An answer to this question can be useful for different purposes. In the context of execution monitoring, points of failure can determine some checkpoints that specify when to check for discrepancies, and they can sometimes be used for recovering from discrepancies that cause plan failures. At the modeling level, points of failure may provide useful insight into the action domain for a better understanding of the domain, or reveal errors in the formalization of the domain. We study the question above in a general logic-based knowledge representation framework, which can accommodate nondeterminism and concurrency. In this framework, we define a discrepancy and an explanation for it, and analyze the computational complexity of detecting discrepancies and finding explanations for them. We introduce a method for computing explanations, and report about a realization of this method using DLVK , which is a logic-programming based system for reasoning about actions and change.

Keywords: knowledge representation, reasoning about actions, logic-based planning, explanations, execution monitoring, computational complexity

∗ Address for correspondence: Institut für Informationssysteme, TU Wien, Favoritenstr. 9-11, A-1040 Wien, Austria
† Main part of the work carried out while visiting TU Wien.


1. Introduction

In many planning frameworks, a plan for achieving a goal consists of a sequence of actions which, if taken, will bring about a goal state. In real-world domains, however, committing to such a plan unconditionally is a fragile strategy, since nondeterminism might cause unintended effects of actions, and thus the execution of an action might fail. Such effects may, for instance, prevent the rest of the plan from reaching a goal state. Imagine an agent who goes shopping and, at some point, realizes that she does not have enough money to pay for the milk because she accidentally picked a similar, but more expensive one. Then she cannot complete her original shopping plan, which would have resulted in having milk at home.

To cope with this problem, special notions of plans have been defined. For instance, a conformant plan is a sequence of actions which enforces that the goal is reached under any circumstances, whatever the effects of the actions taken are. However, this notion is very strong, and conformant plans often simply do not exist. A conditional plan has a tree-like structure, with nodes denoting observed fluents and edges denoting actions. Conditional plans are more powerful than conformant plans since they enable branching to different subplans depending on the values of observed fluents; to obtain these values, special sensing actions have been introduced. However, conditional plans are difficult to generate in general, and storing them might require exponential space because of recursive branching. Moreover, continuous or frequent sensing usually comes at a cost, slows down the plan execution, and makes planning more involved because of a larger action repertoire.

Although conditional planning is very expressive, there are scenarios in which no conditional plan exists, simply because outcomes of nondeterministic actions might prevent successful execution. For a drastic example, imagine an agent situated in a hotel which is on fire. Her only chance to reach safety is to jump from the window of her room to a small balcony one floor below and then to rush down the fire escape from there. However, jumping has an uncertain outcome: the agent might miss the balcony, with a tragic effect. Here, neither a conformant plan nor a conditional plan exists, but there is still some credulous plan which might work out successfully, and, given the circumstances, it seems reasonable to adopt it.

In other, less dramatic scenarios, when some action effects are not as expected, the goal might still be achievable by means of some branch of a conditional plan, but the path taken might not be the one the agent desires. In our shopping example, the agent might achieve the goal of having milk at home by accidentally buying the more expensive milk; but she might not be happy with it once she notices the mistake, since she desired to have her cheap milk as usual. Also here, a conditional plan for achieving the goal in the desired way might not exist (e.g., when no cheap milk is available). Even if such a plan exists, it may be rather complex, since in most of its parts it may have to take care of exceptional cases; in such a case, it is more natural for the agent to stick to a reasonable credulous plan, which might not guarantee success but establish it with high likelihood (we discuss this further in Section 2.3). Adopting plans which might not establish the goal leads to execution monitoring.
The idea of execution monitoring is to check, from time to time, whether a plan execution is “on track” and the state reached in the end complies with the agent’s intentions, rather than to enforce a priori that any possible execution will be successful as in conformant and conditional planning. If the execution is found not to be on track, then the agent might take appropriate steps in order to recover from such a situation. This, however, calls for suitable methods to first detect failures and then to recover from them. In our shopping example, the agent might have the simple plan of picking the milk (intentionally, the cheap one), paying
for it, and then carrying it home. If she checks the label of the milk before paying and discovers that she picked the more expensive one, she can analyze the situation and then figure out what to do next (put the milk back, be happy with the more expensive one, etc.).

Motivated by such "insecure" plan execution, we consider in this paper a plan execution framework based on logical theories of actions. In this framework, the agent can tell when things go wrong by detecting discrepancies between the actual world at a certain time stamp (also referred to as a stage) and the agent's internal representation of it. Furthermore, she can find out what goes wrong by making diagnoses of the discrepancies, which offer explanations in terms of actions whose outcomes have unfortunate effects, based on a rationality principle.

Such explanations may be used for different purposes. As for online plan execution, the agent may recover from a discrepancy to achieve her goals, using the information obtained from the diagnoses. For instance, if in the previous example the agent detects with monitoring that she picked a more expensive milk (e.g., by checking the brand of the milk immediately after she picked it), then she could put it back and pick a cheaper milk instead. Another usage of discrepancy explanations is checkpointing for future executions of the same plan, which may be done offline: if the agent gets alerted that a certain action in the plan may not have the desired effects, then in execution monitoring a checkpoint may be installed right after this action, in order to detect execution failure as soon as possible. In our example, the agent might always check the label of the milk she picked whenever she goes shopping to this store to buy milk in the future. Furthermore, explanations of a discrepancy elicit some implicit knowledge about the logical structure of the planning domain, and may hint at errors that have been made in its formalization or unveil misunderstandings of the domain formalization by the agent. For instance, in the previous example, the agent believed that picking the milk is deterministic, in the sense that the cheapest milk will be taken. Explanations of the observed discrepancy will help the agent to revise her view of the domain, according to which, for instance, the action of picking the milk is nondeterministic.

In this paper, we study detecting discrepancies and generating diagnoses for them. The last step, plan recovery, is not formally addressed here; a detailed study of plan recovery in the monitoring framework described below can be found in [12]. The main contributions of this paper are summarized as follows.

• We study plan execution and monitoring in the logic-based action representation framework of Turner [46]. In this framework, an action description is a set of formulas in second-order propositional logic; the meaning of an action description can be represented by a transition diagram—a directed graph whose nodes correspond to states and whose edges correspond to action occurrences. A planning problem is described as a satisfiability problem, in the spirit of "satisfiability planning" [28]; see Section 2.3 for further discussion. Our choice of this action representation framework is mainly due to three reasons. First, it can accommodate nondeterminism, concurrent actions, and dynamic worlds.
Second, it is a general action representation framework: action domains encoded in this framework can be obtained from domain descriptions in STRIPS-based or more expressive action description languages, such as C+ [24] or K [15]. This allows us to use systems such as CCALC [33], CPLAN [23, 19], and DLVK [15] for reasoning about actions. These specific action languages are in fact closely related to the framework considered here, and the notions and concepts can easily be transferred to them. Third, the logic-based representation is convenient for studying properties of execution monitoring.


• We formalize the notion of a discrepancy between the run-time execution of a plan and a set of trajectories, in the above framework. These trajectories describe some possible executions of the plan that are intended or preferred, as in [41]. It is possible to determine these trajectories by meta-level constraints which are not represented (or cannot be expressed) in the framework.

• For a diagnosis of discrepancies, we introduce the notion of a "point of failure", at which the execution of the scheduled action yields a state that is not intended. There may be many deviation points, at which the expected evolution of the current state where a discrepancy is detected deviates from the given preferred trajectories. In this paper, we consider a point of failure to be the deviation point which is closest to the current stage, so that the effort for undoing the execution of the plan until the point of failure, in order to re-establish a state on track, is kept small. The idea is then, from such a state on track, to reuse the rest of the plan (either the original one, or a variant thereof). (This kind of plan-rollback recovery can be particularly useful when the remaining plan is very long.) We discuss alternative notions of points of failure later in Section 4. It is important to note that, in our approach, to detect a discrepancy and to find an explanation for it, we do not require that the runtime execution of the plan be saved.

• We describe a general execution monitoring algorithm which incorporates discrepancy detection and diagnosis as optional steps (Section 6).

• We analyze the complexity of several main computational problems in plan execution: detecting a discrepancy, determining the relevancy of a discrepancy, verifying a diagnosis for a discrepancy, and finding a diagnosis for a discrepancy. For each of these four problems, which can be solved in polynomial time with the help of an NP oracle (as for relevancy, under one of the considered notions), we give a precise complexity characterization in terms of completeness for a complexity class. In particular, we show that computing a diagnosis for a discrepancy is complete for the not widely known class FNP//OptP[O(log n)] from [8], which lies between computability in nondeterministic polynomial time and deterministic polynomial time with an NP oracle. Few problems in AI have been shown to be complete for this class so far. We also note that the complexity results for the problems above, derived for Turner's action representation framework, carry over to similar representation frameworks such as those of CCALC and DLVK.

• We introduce a generic algorithm to generate diagnoses for a discrepancy. Based on this algorithm, we describe a realization of our approach to detecting discrepancies and finding explanations for them using the DLVK planning system.

Among other logic-based frameworks for execution monitoring are [10, 42, 43] and [20]. These frameworks differ from ours with respect to discrepancy detection and diagnosis mainly as follows. In [10, 42, 43], a discrepancy is detected when the remaining plan is not successful, considering all possible trajectories; diagnosis of discrepancies is not considered. In [20], a discrepancy is detected when an action is not executable or when the effects of an action are not as intended; diagnosis of discrepancies is provided in terms of some abnormality predicates. No complexity analysis for discrepancy detection and diagnosis is provided in these works.
More discussion of related work is given in Section 10.

In the following, we first present the action representation framework and the planning framework we consider (Section 2). After that, we precisely describe discrepancies (Section 3) and diagnoses of
discrepancies (Section 4), and analyze some semantic properties of these notions (Section 5). We then give an example in the context of execution monitoring to show how explanations for discrepancies can be useful (Section 6). After we analyze the computational complexity of the problems of detecting discrepancies and finding diagnoses for them (Section 7), we present algorithms to solve these problems and report on a prototype implementation on top of the DLVK system (Sections 8 and 9). We then conclude with a discussion of related work (Section 10) and future work (Section 11). Proofs of theorems are presented in the Appendix.

2. Preliminaries

We briefly introduce Turner's action representation and planning framework [46].

2.1. Action representation framework

We begin with a set A of action symbols and a disjoint set F of fluent symbols. Let state(F) be a formula in which the only nonlogical symbols are elements of F. This formula encodes the set of states, which correspond to its models. Let act(F, A, F′) be a formula in which the only nonlogical symbols are elements of F ∪ A ∪ F′, where F′ is obtained from F by priming each element of F. Then the models of the formula

    state(F) ∧ act(F, A, F′) ∧ state(F′)        (1)

correspond to the set of transitions. That is,

• the start state corresponds to an interpretation of (the symbols in) F,¹
• the set of actions executed corresponds to an interpretation of A, and
• the end state corresponds to an interpretation of F′.

Formula (1) is abbreviated as tr(F, A, F′).

Example 2.1. [24] Putting a puppy into water makes the puppy wet, and drying a puppy with a towel makes it dry. With the fluents F = {inWater, wet} and the actions A = {putIntoWater, dryWithTowel}, the states can be described by the formula

    state(F) = inWater ⊃ wet.

Since there are three interpretations of F satisfying state(F),

    {inWater, wet}, {¬inWater, wet}, {¬inWater, ¬wet},

there are three states: {inWater, wet}, {wet}, {}.

¹ In the rest of the paper, we sometimes say "interpretation of S" to mean an interpretation of the symbols in a set S.


The action occurrences can be defined as follows:

    act(F, A, F′) =   (inWater′ ≡ inWater ∨ putIntoWater)
                    ∧ (wet′ ≡ (wet ∧ ¬dryWithTowel) ∨ putIntoWater)
                    ∧ (dryWithTowel ⊃ (¬inWater ∧ ¬putIntoWater))

The last line of the formula above expresses that dryWithTowel is executable only when inWater is false, and that it is not executable concurrently with putIntoWater. For instance, the interpretation

    {¬inWater, wet, dryWithTowel, ¬putIntoWater, ¬inWater′, ¬wet′}

satisfies tr(F, A, F′); therefore it describes a transition:

    ⟨{wet}, {dryWithTowel}, {}⟩.        (2)

Note that the interpretation of A above describes the occurrence of dryWithTowel. □

The meaning of a domain description can be represented by a transition diagram—a directed graph whose nodes correspond to states and whose edges correspond to action occurrences. In a transition diagram, a "trajectory" of length n is obtained by finding a model of the formula

    ⋀_{t=0}^{n−1} tr(F_t, A_t, F_{t+1})

where each Fi (resp., each Ai) is the set of fluents (resp., actions) obtained from F (resp., A) by adding time stamp i to each fluent symbol (resp., each action symbol). The trajectory is the alternating sequence of states and action occurrences that correspond to the interpretations of the fluents and actions, respectively. It is of the form

    S_0, A_0, S_1, . . . , S_{n−1}, A_{n−1}, S_n

where each Si is the state that corresponds to the interpretation of Fi, and each Ai is the set of action occurrences that corresponds to the interpretation of Ai.

Example 2.2. The transition diagram for the action description of Example 2.1 is as follows:

[Figure: transition diagram for the puppy domain, with states {inWater, wet}, {wet}, and {}, and edges labeled {putIntoWater}, {dryWithTowel}, and {} (the empty action occurrence).]


In this diagram, paths of length 1 describe transitions, and paths of length n describe trajectories. For instance, the path {wet}, {} describes transition (2), and the path {wet}, {}, {inWater, wet} describes the trajectory

    {wet}, {dryWithTowel}, {}, {putIntoWater}, {inWater, wet}        (3)

where a wet puppy is first dried with a towel and then put into the water. This trajectory is obtained from the following interpretation of F0 ∪ A0 ∪ F1 ∪ A1 ∪ F2,

    { ¬inWater_0, wet_0, ¬putIntoWater_0, dryWithTowel_0,
      ¬inWater_1, ¬wet_1, putIntoWater_1, ¬dryWithTowel_1,
      inWater_2, wet_2 },

which satisfies the conjunction tr(F0, A0, F1) ∧ tr(F1, A1, F2). □
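To make the transition semantics concrete, the following Python sketch (our own illustration, not part of the original framework) enumerates the transitions of the puppy domain of Example 2.1 by brute force: it checks state(F) ∧ act(F, A, F′) ∧ state(F′) over all interpretations of the fluent and action symbols.

```python
from itertools import product

FLUENTS = ["inWater", "wet"]
ACTIONS = ["putIntoWater", "dryWithTowel"]

def state_ok(s):
    # state(F): inWater implies wet
    return (not s["inWater"]) or s["wet"]

def act_ok(s, a, s2):
    # act(F, A, F') of Example 2.1
    return (s2["inWater"] == (s["inWater"] or a["putIntoWater"]) and
            s2["wet"] == ((s["wet"] and not a["dryWithTowel"]) or a["putIntoWater"]) and
            (not a["dryWithTowel"] or (not s["inWater"] and not a["putIntoWater"])))

def interpretations(symbols):
    for bits in product([False, True], repeat=len(symbols)):
        yield dict(zip(symbols, bits))

def transitions():
    """Models of formula (1), returned as (S, A, S') triples of true atoms."""
    for s, a, s2 in product(interpretations(FLUENTS), interpretations(ACTIONS),
                            interpretations(FLUENTS)):
        if state_ok(s) and act_ok(s, a, s2) and state_ok(s2):
            yield ({f for f in FLUENTS if s[f]},
                   {x for x in ACTIONS if a[x]},
                   {f for f in FLUENTS if s2[f]})

if __name__ == "__main__":
    for (S, A, S2) in transitions():
        print(S, A, S2)   # e.g. {'wet'} {'dryWithTowel'} set() is transition (2)
```

Trajectories of length n then correspond to chains of such triples in which the end state of each transition is the start state of the next, exactly as in the displayed conjunction above.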

2.2. Plans

In a planning problem, an initial state is described by a formula init(F) such that init(F) |= state(F), and the goal is described by a formula goal(F). A credulous (or optimistic) plan of length n for the planning problem is obtained from any model of the formula

    init(F_0) ∧ ⋀_{t=0}^{n−1} tr(F_t, A_t, F_{t+1}) ∧ goal(F_n)        (4)

as the sequence ⟨A0, . . . , An−1⟩, where each Ai is the action occurrence that corresponds to the interpretation of Ai. Note that such a plan is not always "valid," in the sense that it is always executable and guaranteed to reach the goal (such plans are also called conformant). In fact, it is valid if the initial state is completely specified and the action domain is "deterministic"—execution of every action sequence leads to a unique state. Planning under such conditions (i.e., complete information and determinism) is generally referred to as "classical planning". We do not enforce validity of plans in our planning framework; see Section 2.3 for a discussion of why we do not enforce this condition. In the rest of this paper, we refer to a credulous plan simply as a plan.

A feasible trajectory for a plan P = ⟨A0, . . . , An−1⟩ is a trajectory

    S_0, A′_0, S_1, . . . , S_{n−1}, A′_{n−1}, S_n

such that Ai = A′i (0 ≤ i < n), S0 |= init(F0), and Sn |= goal(Fn). In the following, we denote by TP the set of all feasible trajectories for a plan P.
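As a concrete reading of formula (4), the following sketch (again our own illustration) extracts a credulous plan by naively enumerating candidate models of init(F0) ∧ ⋀ tr(Ft, At, Ft+1) ∧ goal(Fn); a real implementation would hand the propositional formula to a SAT solver instead. The successor function succ re-encodes the act formula of Example 2.1, and the small planning problem (make a wet puppy dry) is posed here only for illustration.

```python
def succ(S, A):
    """Set of states reachable from state S (a frozenset of fluents) via action occurrence A."""
    if "dryWithTowel" in A and ("inWater" in S or "putIntoWater" in A):
        return set()                       # precondition of dryWithTowel violated
    in_water = "inWater" in S or "putIntoWater" in A
    wet = ("wet" in S and "dryWithTowel" not in A) or "putIntoWater" in A
    return {frozenset(f for f, v in [("inWater", in_water), ("wet", wet)] if v)}

ACTION_OCCURRENCES = [frozenset(), frozenset({"putIntoWater"}),
                      frozenset({"dryWithTowel"}),
                      frozenset({"putIntoWater", "dryWithTowel"})]

def credulous_plans(init_states, goal, n):
    """Yield (plan, trajectory) pairs corresponding to models of formula (4)."""
    def extend(traj):
        if len(traj) == 2 * n + 1:
            if goal(traj[-1]):
                yield (traj[1::2], traj)   # action occurrences sit at odd positions
            return
        for A in ACTION_OCCURRENCES:
            for S2 in succ(traj[-1], A):
                yield from extend(traj + [A, S2])
    for S0 in init_states:
        yield from extend([S0])

if __name__ == "__main__":
    init = [frozenset({"wet"})]              # init(F): the puppy is wet, not in water
    goal = lambda S: "wet" not in S          # goal(F): the puppy is dry
    plan, traj = next(credulous_plans(init, goal, 1))
    print([set(A) for A in plan])            # [{'dryWithTowel'}]
```

The extracted sequence of action occurrences is exactly the plan read off a model of (4); feasible trajectories are the full state–action alternations produced alongside it.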


We can talk about more specific states, transitions, or trajectories by applying some "substitutions" to the formulas describing them. Consider, for instance, Example 2.2. To find states reachable from the state S = {wet}, we need to find the transitions of the form ⟨S, A, S′⟩. These transitions can be described by substituting S for F in tr(F, A, F′), that is, by the formula tr(S, A, F′) obtained from tr(F, A, F′) by replacing every atom p ∈ F with ⊤ if p ∈ S, and with ⊥ otherwise. Similarly, trajectories S0, A0, S1, A1, S2 of length 2 with the initial state S0 = {} and the action occurrence A1 = {dryWithTowel} at time stamp 1 can be expressed by tr(S0, A0, F1) ∧ tr(F1, A1, F2). Here, tr(S0, A0, F1) is the formula obtained from tr(F0, A0, F1) by replacing every atom p0 ∈ F0 with ⊤ if p ∈ S0, and with ⊥ otherwise. Similarly, tr(F1, A1, F2) is the formula obtained from tr(F1, A1, F2) by replacing every action atom p1 ∈ A1 with ⊤ if p ∈ A1, and with ⊥ otherwise.

Example 2.3. Consider a version of the blocks world where a block being moved to a location may end up at a different location because the agent might not grip it properly. With the fluents on(B, L) ("block B is on location L") and the actions move(B, L) ("move block B to location L"), the states, i.e., state(F), can be defined by the conjunction of the following formulas:²

• every block should be on some location:

    ⋁_L on(B, L);

• if a block is on some location then it is not anywhere else:

    on(B, L) ⊃ ⋀_{L≠L1} ¬on(B, L1);

• a block cannot have more than one block on itself:

    on(B1, B) ⊃ ⋀_{B1≠B2} ¬on(B2, B);

• every block is "supported" by the table (here supported(B) is an auxiliary propositional variable defined in terms of on(B, L)):

    supported(B).

Formula act(F, A, F′) can be defined by the conjunction of the following formulas:

• the preconditions of move(B, L):

    move(B, L) ⊃ ⋀_{B1} ¬on(B1, B);

• the effects of move(B, L), and inertia:

    on(B, L1)′ ⊃ (on(B, L1) ∨ ⋁_L move(B, L));

• no-concurrency:

    move(B, L) ⊃ ⋀_{B1≠B, L1} ¬move(B1, L1).


² The symbols B, B1, B2, L, L1, L2 are schematic variables: B, B1, B2 range over a finite set of block constants, and L, L1, L2 range over the set of location constants, which consists of the block constants and the constant table.

Consider, in this domain, the blocks a, p, x, i, r, s, and a planning problem P with the initial state and the goal state as follows:

[Figure: the initial state (p on a, r on i, with x and s on the table) and the goal state of the planning problem P.]

A solution to the planning problem P is the following plan of length 6:³

    P = ⟨move(r, x), move(i, s), move(r, i), move(p, x), move(a, r), move(p, a)⟩.

The feasible trajectories TP are of the form

    S0, {move(r, x)}, S1, {move(i, s)}, S2, {move(r, i)}, S3, {move(p, x)}, S4, {move(a, r)}, S5, {move(p, a)}, S6

where each Si is a state that corresponds to an interpretation of Fi. The states S0 and S6 are the initial state and the goal state, respectively, presented in the figure above; S1, . . . , S5 are states such that

    tr(S0, {move(r, x)}, S1) ∧ tr(S1, {move(i, s)}, S2) ∧ tr(S2, {move(r, i)}, S3) ∧
    tr(S3, {move(p, x)}, S4) ∧ tr(S4, {move(a, r)}, S5) ∧ tr(S5, {move(p, a)}, S6).

These feasible trajectories can be represented by the paths from 0 to 6 in the following graph:

[Figure: the feasible trajectories TP , shown as paths from stage 0 to stage 6 in a graph whose nodes are block configurations.] □
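The state constraints of Example 2.3 can be checked mechanically. The following Python sketch is our own illustration (not part of the paper); states are given as sets of on(B, L) pairs, the initial configuration below is the one shown in the figure (an assumption read off the example), and the recursive supported check is our reading of the auxiliary supported(B) variable.

```python
BLOCKS = ["a", "p", "x", "i", "r", "s"]
LOCATIONS = BLOCKS + ["table"]

def is_state(on):
    """Check the state(F) constraints of Example 2.3 for a set `on` of (block, location) pairs."""
    # every block is on exactly one location
    for b in BLOCKS:
        locs = [l for (b1, l) in on if b1 == b]
        if len(locs) != 1:
            return False
    # a block cannot have more than one block on itself
    for b in BLOCKS:
        if len([b1 for (b1, l) in on if l == b]) > 1:
            return False
    # every block is supported by the table (no block floats or sits on a cycle)
    def supported(b, seen=()):
        (loc,) = [l for (b1, l) in on if b1 == b]
        if loc == "table":
            return True
        return loc not in seen and supported(loc, seen + (b,))
    return all(supported(b) for b in BLOCKS)

if __name__ == "__main__":
    initial = {("p", "a"), ("a", "table"), ("r", "i"), ("i", "table"),
               ("x", "table"), ("s", "table")}
    print(is_state(initial))                  # True
    print(is_state(initial | {("p", "x")}))   # False: p would be on two locations
```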

In the following sections, an expression of the form F ≡ F′ denotes the conjunction ⋀_{f∈F} f ≡ f′.

³ We sometimes drop curly brackets from singleton action occurrences in a plan. For instance, in P, singleton action occurrences of the form {move(B, L)} are written as move(B, L).


2.3. Credulous vs. "valid" plans

Our description of a planning problem above is in the spirit of "satisfiability planning" [28], since a planning problem is described by a propositional theory whose models correspond to plans. On the other hand, it differs from satisfiability planning in that it allows incomplete information (i.e., initial states may not be completely described) and nondeterminism (i.e., the action domain may not be deterministic). In our planning framework, what is called classical planning is essentially satisfiability planning.

In the presence of incomplete information and nondeterminism, a valid plan can be obtained by conformant planning or conditional planning, if such a plan exists (cf. [46] for formal definitions). In conformant planning, the idea is to find a sequence of actions that would lead to a goal state whatever the initial state is. Consider, for instance, an agent placed in a room with four walls. Suppose that the agent does not know what color each wall is; the goal is to have all walls painted red. A conformant plan would be to paint all walls red. In conditional planning, the idea is to construct a plan taking into account each contingency that may arise. Actions in such a plan are conditional on the current state of the world, so execution of the plan is basically directed via observations of the agent. For instance, for the planning problem above, a conditional plan would involve, for each wall, first checking the color of the wall, and then painting it red if it is not already red.

We remark that weak and strong solutions to a planning problem as defined by Cimatti et al. [9] are related to credulous and valid plans, respectively. Roughly, a weak solution consists of a table π of pairs (s, a), where s is a state and a is a uniquely determined action (i.e., (s, a), (s, a′) ∈ π implies a = a′), which satisfies the following property: from every initial state s0, a goal state sk can be reached by a finite execution e = s0, a0, s1, a1, . . . , ak−1, sk, where for each i = 0, . . . , k − 1, (si, ai) is in the table π and si+1 is a possible outcome of taking ai in state si. A strong solution is a weak solution such that every execution e from s0 terminates in some state sk (i.e., no pair (sk, ak) exists in π at stage k) which is a goal state.

Note that credulous and conformant plans, unlike weak and strong solutions, have a linear plan structure and do not allow goal-establishing trajectories of different length. On the other hand, credulous and conformant plans allow taking different actions at different stages i and j of the execution with the same state (i.e., Si = Sj), which is not permitted in weak or strong solutions. Intuitively, feasible trajectories for a credulous plan correspond to executions of a weak solution. In case there is a single initial state s0, a weak solution π can be obtained from any feasible trajectory for P by removing all cyclic parts (i.e., subtrajectories beginning and ending in the same state) from it. However, in the case of multiple initial states, a credulous plan may exist but no weak solution, since from some of the initial states no goal state might be reachable. In turn, each weak solution induces some (not unique) credulous plan, which can be extracted from a successful execution that reaches the goal from some initial state.
From every conformant plan, a corresponding strong solution can be constructed from its feasible trajectories, while it may happen that a strong solution exists but no conformant plan (e.g., if different actions are needed in different branches of the evolution). Thus, strong solutions are more expressive than conformant plans.

Although the validity of plans can be ensured by conformant or conditional planning, we do not enforce this condition on plans, mainly for the following reasons. First, as already discussed in the Introduction, for some planning problems there may exist no conformant plan, no conditional plan, or no plan which has only intended trajectories, while some credulous plan exists. Consider, for instance, two coins, Coin 1 and Coin 2, which are initially tails. Suppose that the goal is to have heads
for at least one of them. If tossing Coin 1 and tossing Coin 2, each with the nondeterministic outcome of heads or tails (but not both), are the only possible actions, then there is no conformant or conditional plan whatsoever that would guarantee a goal state when executed. On the other hand, a credulous plan for this problem is to toss Coin 2. This plan is not valid, but there is still some possibility of reaching a goal state by executing it. If we have no information about the odds, then this plan is perfectly fine; otherwise another plan could be more promising, as we discuss in a moment.

Second, in general, conformant planning is Σ^p_3-complete for plans of polynomially bounded length (Theorem 2 of [46]) and conditional planning (more precisely, deciding plan existence) is PSPACE-complete (Theorem 2 of [46]). Even for plan length fixed to a constant k > 0, conformant planning is Σ^p_3-complete and conditional planning is Π^p_{2k+1}-complete (Theorems 2 and 4 of [46]). On the other hand, credulous planning as described in the previous section is NP-complete for plans of polynomially bounded length (like classical planning, Theorem 1 of [46]). Thus, credulous planning is "cheaper" than conformant and conditional planning, and one might be willing (or forced) to trade validity for feasibility if limited resources play a role. (For further discussion on the complexity of planning, see [46], [39], [32].)

In execution monitoring, we let the agent execute a not necessarily valid but still reasonable credulous plan, and help her reach a goal state by monitoring the execution of that plan. (Note that there would be no need for execution monitoring had we enforced validity of the adopted plan.) Such a reasonable plan can be selected among the set of all possible plans obtained from the models of formula (4), according to some given criterion which makes this plan attractive. For example, if we have knowledge about failure probabilities of actions, then this could be a plan which has the highest probability of success; if we have plausibility information relative to some domain-specific information, then this could be a plan with the most plausible trajectories; etc. For instance, consider the planning problem above. Suppose that we know that tossing Coin 1 leads to heads with probability 0.9, whereas tossing Coin 2 leads to heads with probability 0.1, and that the probabilities are the same on each toss. Then tossing Coin 1 is a more reasonable plan than tossing Coin 2, and is also a most reasonable one, since no other plan has a higher probability of success. The selection of a (most) reasonable plan among others is a separate issue which we discuss only briefly, since it is beyond the scope of this paper. If probabilistic knowledge is available, one could select, e.g., a plan which has the highest probability of success, one whose probability of success is above a threshold, or one having a goal-establishing trajectory with (sufficiently high or maximum) probability (assuming that action failures are unlikely).
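To illustrate how such a selection criterion might be applied, the following sketch (our own illustration; the coin domain and the probabilities 0.9 and 0.1 are those of the example above) scores candidate plans by their probability of reaching the goal, assuming independent action outcomes.

```python
# Outcome distributions for the two toss actions (Coin 1 lands heads with
# probability 0.9, Coin 2 with probability 0.1, as in the example above).
OUTCOME_PROB = {"tossCoin1": {("coin1", "heads"): 0.9, ("coin1", "tails"): 0.1},
                "tossCoin2": {("coin2", "heads"): 0.1, ("coin2", "tails"): 0.9}}

def success_probability(plan, state, goal):
    """Probability that executing `plan` from `state` ends in a state satisfying `goal`."""
    if not plan:
        return 1.0 if goal(state) else 0.0
    action, rest = plan[0], plan[1:]
    total = 0.0
    for (coin, side), p in OUTCOME_PROB[action].items():
        next_state = dict(state, **{coin: side})
        total += p * success_probability(rest, next_state, goal)
    return total

if __name__ == "__main__":
    init = {"coin1": "tails", "coin2": "tails"}
    goal = lambda s: "heads" in s.values()
    for plan in (["tossCoin1"], ["tossCoin2"], ["tossCoin1", "tossCoin2"]):
        print(plan, success_probability(plan, init, goal))
    # ['tossCoin1'] 0.9   ['tossCoin2'] 0.1   ['tossCoin1', 'tossCoin2'] 0.91
```

Under this criterion, tossing Coin 1 beats tossing Coin 2, matching the discussion above; other criteria (thresholds, most plausible trajectories) can be plugged in analogously.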

3. Discrepancies

Suppose that we are given a planning problem whose domain is represented in the action representation framework of Section 2, and whose initial and goal conditions are described by the formulas init(F) and goal(F), respectively. Let T be a set of trajectories of the form S0, A0, S1, . . . , Sn−1, An−1, Sn described by a formula

    ⋀_{t=0}^{n−1} tr(F_t, A_t, F_{t+1}) ∧ φ(F_0, A_0, . . . , A_{n−1}, F_n)        (5)


such that each Si corresponds to an interpretation of Fi, and each Ai corresponds to an interpretation of Ai. Here, intuitively, the formula φ(F0, A0, . . . , An−1, Fn) specifies some "intended" or "preferred" conditions on trajectories of length n, and thus T describes some "intended" or "preferred" trajectories. In the following, formula (5) will be denoted by traj^T(F0, A0, . . . , An−1, Fn).

Let P = ⟨A0, . . . , An−1⟩ be a plan of length n for the given planning problem. Let Si be the state reached at stage i from an initial state after action occurrences A0, . . . , Ai−1. We say that there is a discrepancy between Si and T relative to P if there is no trajectory

    S′_0, A_0, S′_1, . . . , S′_{n−1}, A_{n−1}, S′_n

in T such that S′n is a goal state and S′i = Si. In our framework, this can be expressed as follows. Let us denote by discrepancy^{P,T}_i(Fi) the formula

    ∀F′_0, . . . , F′_n ( traj^T(F′_0, A_0, . . . , A_{n−1}, F′_n) ∧ goal(F′_n) ⊃ ¬(F′_i ≡ F_i) ).

There is a discrepancy between Si and T relative to P if discrepancy^{P,T}_i(Si) holds. Note that in general not all trajectories in T are feasible for P.

Example 3.1. (Example 2.3 continued) Consider the planning problem P. Assume that the agent prefers to put a block onto a block instead of onto the table; so she considers the set T of trajectories S0, A0, S1, . . . , S5, A5, S6 such that move_i(B, table) ∉ Ai for any block B. The set T can be described by the formula

    traj^T(F_0, A_0, . . . , A_5, F_6) = ⋀_{t=0}^{5} tr(F_t, A_t, F_{t+1}) ∧ ⋀_{i=0,...,5; B∈{a,p,x,i,r,s}} ¬move_i(B, table)

and it can be presented by the following graph:

[Figure: the trajectories in T, shown as paths from stage 0 to stage 6 in a graph whose nodes are block configurations.]

Assume that, during the execution of P , at time stamp 4, the following state S4 is observed:
[Figure: the observed state S4: p on r on i on s, with x and a on the table.]
Since this state differs from the state at time stamp 4 of every trajectory in T, there is a discrepancy between S4 and T relative to P. □
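When the intended trajectories are available explicitly (e.g., as an enumerated set rather than as the formula traj^T), checking discrepancy^{P,T}_i(Si) amounts to a simple membership test. The following Python sketch is our own illustration of this check; trajectories are lists alternating states and action occurrences, and goal is a predicate on states.

```python
def has_discrepancy(S_i, i, T, goal):
    """discrepancy_i^{P,T}(S_i): no goal-establishing trajectory in T passes through S_i at stage i.

    A trajectory is a list [S_0, A_0, S_1, ..., A_{n-1}, S_n]; the state at stage t sits at index 2*t.
    """
    for traj in T:
        if goal(traj[-1]) and traj[2 * i] == S_i:
            return False          # S_i is on track with some intended trajectory
    return True

if __name__ == "__main__":
    # Tiny illustration with one intended trajectory of length 1 in the puppy domain.
    T = [[frozenset({"wet"}), frozenset({"dryWithTowel"}), frozenset()]]
    goal = lambda S: "wet" not in S
    print(has_discrepancy(frozenset({"wet"}), 1, T, goal))   # True: drying failed, puppy still wet
    print(has_discrepancy(frozenset(), 1, T, goal))          # False: on track
```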


The above notion of discrepancy handles the case where the current state is compatible with the possible states of the intended trajectories at this stage in a credulous manner, since there is some possibility to reach a goal state on some intended trajectory by executing the rest of the plan. If we required, in a cautious manner, that a goal state must necessarily be reached, then we would essentially do conformant planning, where monitoring would not be necessary. One may consider, however, notions of discrepancy in between the credulous and the cautious view. For example, in a probabilistic setting, in the case of state compatibility it should be considered a discrepancy if the rest of the plan cannot establish the goal with sufficiently high probability. However, this can be emulated using the above notion of discrepancy by adapting the set T of intended trajectories. Indeed, if for state Si at stage i some discrepancy should be detected according to some notion of discrepancy, just prune from T all trajectories with state Si at stage i; no such trajectory should be intended. This can be easily accomplished by adapting the formula traj^T accordingly.

Relevancy of discrepancies. While monitoring an execution of a plan, the agent might want to check whether a detected discrepancy is relevant to the rest of the plan; if it is not relevant, she may continue with executing the rest of the plan, ignoring the discrepancy. Different notions of relevancy might be considered, depending on the agent's stance on the uncertainty of goal achievement by intended trajectories. We mention here two notions of relevancy, the latter being more cautious.

• A discrepancy between Si and T relative to P is weakly k-irrelevant (1 ≤ k ≤ n − i) to the rest of the plan, Ai, . . . , An−1, with respect to T, if the following formula holds:

    ∃F_{i+1}, . . . , F_n, F'_0, . . . , F'_{i+k−1} ( tr(S_i, A_i, F_{i+1}) ∧ ⋀_{t=i+1}^{n−1} tr(F_t, A_t, F_{t+1}) ∧ goal(F_n) ∧
        traj^T(F'_0, A_0, F'_1, . . . , F'_{i+k−1}, A_{i+k−1}, F_{i+k}, . . . , F_{n−1}, A_{n−1}, F_n) ).        (6)

In the formula above, the conjuncts in the first line express that some execution of Ai, . . . , An−1 at state Si leads to a goal state; the conjunct in the second line expresses that this execution coincides with one of the intended trajectories in T after k steps. When k = 1 and T is the set of all feasible trajectories for P, our definition of weak relevancy is like the definition of relevancy in [10, 42, 43].

• A discrepancy between Si and T relative to P is strongly k-irrelevant to the rest of the plan, Ai, . . . , An−1, with respect to T, where 1 ≤ k < n − i, if the goal can be reached from the current state by all executions of the rest of the plan, and every execution coincides with one of the intended trajectories in T after k steps, that is, if the following formula holds:

    ∀F_{i+1}, . . . , F_n ∃F'_i, . . . , F'_n ∃F''_0, . . . , F''_{i+k−1} ( tr(S_i, A_i, F'_{i+1}) ∧
        ⋀_{t=i+1}^{n−1} ( tr(S_i, A_i, F_{i+1}) ∧ ⋀_{u=i+1}^{t−1} tr(F_u, A_u, F_{u+1}) ⊃ tr(F_t, A_t, F'_{t+1}) ) ∧
        ( tr(S_i, A_i, F_{i+1}) ∧ ⋀_{t=i+1}^{n−1} tr(F_t, A_t, F_{t+1}) ⊃
          ( goal(F_n) ∧ traj^T(F''_0, A_0, F''_1, . . . , F''_{i+k−1}, A_{i+k−1}, F_{i+k}, . . . , F_{n−1}, A_{n−1}, F_n) ) ) ).        (7)

In the formula above, the conjuncts in the first two lines express that the rest of the plan, Ai , . . . , An−1 , is executable at state Si ; the formula in the third and fourth line expresses that all executions
of Ai, . . . , An−1 at Si lead to a state in which the goal is established, and that each of these executions coincides with one of the desired trajectories in T after k steps. This definition of irrelevancy requires that Ai, . . . , An−1 be a conformant plan for the goal starting from Si; for k = 1, only desired trajectories are guaranteed to occur. This notion of relevance may be suitable if the agent decides to switch from a credulous to a cautious plan execution. This may happen, for instance, if the effort that has been spent on monitoring and recovery exceeds a limit, perhaps because actions repeatedly did not work out as expected by default (which, e.g., for a robot might hint at a technical problem). If strong k-irrelevancy holds (which may well be the case if merely "uncritical" actions remain), replanning might be avoided.

Clearly, weak k-irrelevancy is implied by strong k-irrelevancy, but not vice versa. While the latter is more desirable, it will apply less frequently than the former. Furthermore, there is a complexity trade-off between weak and strong k-irrelevancy, since the latter is computationally harder; we will discuss this in Section 7. Other notions of (ir)relevancy might be conceived and employed, though. A simple approach would be to cautiously assume, for a given discrepancy, that it is relevant. This does not incur further cost for a relevancy analysis, and might be attractive for online testing if such a cost would be high.
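For small, explicitly given domains, weak k-irrelevancy can again be checked by enumeration. The following sketch is our own illustration, not part of the paper: succ is a successor function as in the earlier sketches, trajectories are lists alternating states and action occurrences, and the function searches for one execution of the remaining plan from Si whose states from stage i+k onward coincide with a goal-establishing intended trajectory, mirroring formula (6).

```python
def weakly_k_irrelevant(S_i, i, k, plan, succ, T, goal):
    """Check formula (6) by enumerating executions of plan[i:] from S_i.

    `plan` is the full plan [A_0, ..., A_{n-1}]; `T` holds intended trajectories
    [S_0, A_0, S_1, ..., A_{n-1}, S_n]; the state at stage t sits at index 2*t.
    """
    n = len(plan)

    def executions(state, t):
        # all state sequences S_i, ..., S_n reachable by executing plan[i:] from S_i
        if t == n:
            yield [state]
            return
        for s_next in succ(state, plan[t]):
            for rest in executions(s_next, t + 1):
                yield [state] + rest

    for states in executions(S_i, i):         # states[j] is the state at stage i + j
        if not goal(states[-1]):
            continue
        for traj in T:
            if goal(traj[-1]) and all(traj[2 * t] == states[t - i] for t in range(i + k, n + 1)):
                return True                   # this execution rejoins an intended trajectory
    return False
```

Strong k-irrelevancy (formula (7)) would instead require every such execution to succeed and rejoin an intended trajectory, which is why it is computationally harder, as discussed in Section 7.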

4. Diagnosis of Discrepancies

In this section, we formalize the notion of a diagnosis for an observed discrepancy between a state Si and a set of trajectories T relative to a plan P = ⟨A0, . . . , An−1⟩, in terms of a point of deviation in the execution. Informally, such a point of deviation is given by a stage k, k < i, and a state Sk such that the execution was on track at stage k, being in state Sk, but no longer after taking Ak; that is, every "evolution" of Si "deviates" from every trajectory in T after stage k. Any such point of deviation provides an explanation for the observed discrepancy (in terms of the action that was executed), and in general many such points will exist. One could specify a plausibility measure to be applied to these points in order to single out the most plausible ones as points of failure. Which measure to apply will in general depend on the particular application and the information available. In the following we focus on a particular measure, which does not depend on the domain to be modeled; we discuss other possibilities at the end of this section.

In this paper, we focus on selecting the "latest" points of deviation as points of failure. As explained in the Introduction, the intuition is that, within a plan-rollback recovery scenario, the source of the problem is close to the current stage, and the effort for undoing the plan execution to re-establish a state on track is kept small. From there, the plan can be reused (and the redo effort is also small). One possible latest time stamp as described is the one at which an evolution of Si "matches" a trajectory in T; another possible latest time stamp is the one until which an evolution of Si "matches" a trajectory in T. A point of failure k provides, for instance, the execution of a sequence ⟨Ak, . . . , Ai−1⟩ of actions at time stamp k as an explanation for a discrepancy observed at time stamp i (i > k). Therefore, for a discrepancy observed at an initial state, at time stamp 0, there is no point of failure. In the following, we first define an evolution of a state from an initial state relative to a plan, and then precisely describe the diagnoses of discrepancies mentioned above.


An evolution of a state Si reached at time stamp i from an initial state S0 after action occurrences A0, . . . , Ai−1 of a plan P is a trajectory S0, A0, S1, . . . , Si−1, Ai−1, Si obtained by finding a model of the formula

    init(F_0) ∧ ⋀_{t=0}^{i−1} tr(F_t, A_t, F_{t+1}),

where every Si is the state that corresponds to an interpretation of Fi. We will denote this formula by traj^P(F0, . . . , Fi).

Example 4.1. (Example 3.1 continued) The evolutions of the state S4 are:
[Figure: the evolutions of S4, shown as paths from stage 0 to stage 4 in a graph whose nodes are block configurations.] □
A point of failure for a discrepancy between Si and T relative to P is characterized by a state Sk and a time stamp k such that the following holds:

(S1) (at-point of deviation) Some evolution of Si "matches" a goal-establishing trajectory in T "at" time stamp k (0 ≤ k < i ≤ n) in state Sk and deviates from the trajectory at time stamp k + 1.

(S2) (at-maximality) No evolution of Si matches a goal-establishing trajectory in T at a time stamp greater than k.

Towards a formalization, we say that an evolution of Si matches a goal-establishing trajectory in T at time stamp k (0 ≤ k ≤ i ≤ n) if the sentence matchState^{P,T}_{k,i}(Si), defined by

    ∃F_0, . . . , F_{i−1}, F'_0, . . . , F'_n  matchAt^{P,T}_{k,i}(F_0, . . . , F_{i−1}, S_i, F'_0, . . . , F'_n),        (8)

holds, where matchAt^{P,T}_{k,i}(F_0, . . . , F_i, F'_0, . . . , F'_n) stands for the formula

    traj^P(F_0, . . . , F_k, F_{k+1}, . . . , F_i) ∧
    traj^T(F'_0, A_0, . . . , F'_k, A_k, F'_{k+1}, . . . , A_{n−1}, F'_n) ∧ goal(F'_n) ∧ F'_k ≡ F_k.

In the formula above, the first conjunct describes an evolution of a state reached at time stamp i (i.e., a state described by an interpretation of Fi) while executing the plan P. The second and the third conjuncts
describe an intended trajectory in T that achieves the goal. The fourth conjunct expresses that, at time stamp k, the corresponding states of the evolution and the intended trajectory, given by the interpretations of Fk and F'k, respectively, are identical.

Condition (S1) is expressed by deviateState^{P,T}_{k,i}(Si, Sk), defined as the sentence

    ∃F_0, . . . , F_{k−1}, F_{k+1}, . . . , F_{i−1}, F'_0, . . . , F'_n
        ( matchAt^{P,T}_{k,i}(F_0, . . . , F_{k−1}, S_k, F_{k+1}, . . . , F_{i−1}, S_i, F'_0, . . . , F'_n) ∧ ¬(F'_{k+1} ≡ F_{k+1}) ),        (9)

and condition (S2) is expressed by the sentence

    ⋀_{j=k+1}^{i} ¬matchState^{P,T}_{j,i}(Si).        (10)

A state-oriented point of failure (or a state-oriented diagnosis) for a discrepancy between Si and T relative to P at time i is then a pair (Sk, k) of a state Sk and a time stamp k (0 ≤ k < i ≤ n) such that the sentence (9) ∧ (10), denoted by diagnosisState^{P,T}_{k,i}(Si, Sk), holds.
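When the domain, the intended trajectories, and the evolutions are explicitly given (as in the earlier sketches), the latest point of deviation can be computed by scanning for the largest stage k at which some evolution of Si still agrees with a goal-establishing intended trajectory but disagrees at k + 1. The following Python sketch is our own illustration of this search; it is not the DLVK-based procedure of Sections 8 and 9, and it assumes states are hashable (e.g., frozensets of atoms) and that a discrepancy at stage i has already been detected.

```python
def state_oriented_diagnoses(S_i, i, evolutions, T, goal):
    """Return the state-oriented points of failure (S_k, k) for a discrepancy at stage i.

    `evolutions` are the evolutions of S_i and `T` the intended trajectories, both given as
    lists [S_0, A_0, S_1, ..., S_m]; the state at stage t sits at index 2*t.  Since a
    discrepancy was detected, no evolution can match a goal trajectory at stage i itself.
    """
    goal_trajs = [traj for traj in T if goal(traj[-1])]

    def matches_at(k):
        # pairs (evolution, trajectory) that agree on the state at stage k
        return [(e, t) for e in evolutions for t in goal_trajs if e[2 * k] == t[2 * k]]

    # at-maximality (S2): take the largest k < i at which some evolution matches
    for k in range(i - 1, -1, -1):
        pairs = matches_at(k)
        if pairs:
            # at-point of deviation (S1): matched at k, but the states differ at k + 1
            return {(e[2 * k], k) for (e, t) in pairs if e[2 * (k + 1)] != t[2 * (k + 1)]}
    return set()        # no point of failure exists (e.g., discrepancy at the initial state)
```

Note that all returned diagnoses share the same time stamp k, in line with Proposition 5.1 below.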
Example 4.2. In the setting of Example 4.1, consider the following evolution of S4:

[Figure: an evolution of S4 from stage 0 to stage 4, following the plan P up to stage 3, with block p ending up on r at stage 4,]
and the following intended trajectory in T
[Figure: an intended trajectory in T from stage 0 to stage 6.]
Let S3 be the state:

[Figure: the state S3: p on a, x on the table, and the stack r, i, s.]

Since the states of the evolution and the intended trajectory at time stamp 3 are identical to S3, and since the states of the evolution and the intended trajectory at time stamp 4 are different, the formula deviateState^{P,T}_{3,4}(S4, S3) holds. Also, since there is no evolution of S4 and no intended trajectory in T whose states at time stamp 4 are identical, matchState^{P,T}_{4,4}(S4) does not hold. Therefore, diagnosisState^{P,T}_{3,4}(S4, S3) holds, and (S3, 3) is a state-oriented point of failure.

Moreover, since matchState^{P,T}_{3,4}(S4) holds, ⋀_{j=k+1}^{4} ¬matchState^{P,T}_{j,4}(S4) does not hold for k = 1, 2. Then diagnosisState^{P,T}_{k,4}(S4, Sk) does not hold for k = 1, 2. Therefore, (S3, 3) is the only state-oriented point of failure. □


Example 4.3. Consider a variation of Example 2.1 where sometimes the action dryWithTowel does not make the puppy dry. Let T be the singleton consisting of the trajectory (3) presented in Example 2.2. Suppose that the state reached at time stamp 1, from the state S0 = {wet}, by executing the plan P = ⟨dryWithTowel, putIntoWater⟩, is {wet}, which may have evolved as

    {wet}, {dryWithTowel}, {wet}.        (11)

Then (S0, 0) is the only state-oriented point of failure. Indeed, the initial states of the evolution (11) and the intended trajectory (3) are identical, whereas their next states (at time stamp 1) are different. □

Alternative notions of points of failure. As mentioned earlier, the definition of points of failure as those points of deviation which are "closest" to the current stage is motivated by minimizing the amount of work that has to be spent for backtracking to the point of failure and then re-executing the plan. In a generalized framework, diagnoses can be singled out using a preorder, i.e., a reflexive and transitive relation (Sk, k) ⪯ (Sk′, k′) on the points of deviation (expressing plausibility, preference, etc.), possibly depending on parameters such as (Si, i), P, and T. Accordingly, the at-maximality condition (S2) selects those (Sk, k) such that no evolution of Si matches a goal-establishing trajectory in T at some stage k′ with state Sk′ such that (Sk′, k′) ≺ (Sk, k) holds, i.e., (Sk′, k′) ⪯ (Sk, k) and (Sk, k) ⋠ (Sk′, k′).

For the notion of point of failure considered in this paper, (Sk, k) ⪯ (Sk′, k′) is understood as k ≥ k′. If knowledge about the undo costs to reach a particular state is available, then (Sk, k) ⪯ (Sk′, k′) could be defined in terms of the negative undo costs. Note that this notion coincides with the former if the undo cost increases monotonically with the number of steps to undo (which is a reasonable assumption in many settings). On the other hand, in a Markov-decision style framework with transition probabilities, under a single-error assumption, (Sk, k) ⪯ (Sk′, k′) might be defined as P(Sk, k; Si, i) ≥ P(Sk′, k′; Si, i), where P(Sj, j; Si, i) is the probability that the evolution of Si at stage i passed through Sj at stage j. We do not further discuss the issue of selecting points of failure from points of deviation, but we point out that, depending on the preorder ⪯, the task of singling out points of failure might become computationally very hard (e.g., for the above preorder based on probabilities, recognizing points of failure is #P-hard and thus much harder than NP-complete problems).

5. Semantic Properties of Diagnoses

In this section, we discuss several properties of the notion of diagnosis defined above. In particular, we verify that this notion complies with the principle of parsimony, and we discuss issues concerning the existence of diagnoses.

As discussed in Sections 1 and 4, according to the recovery policy of undoing the plan execution until a point of failure and then reusing the plan from that point on, a desirable feature of diagnoses of discrepancies is that they identify points of failure according to which recovering from the detected discrepancies is possible with the smallest effort. Therefore, ideally, without any further information
about the domain, these points of failure should be at a minimal distance from the state of the observed discrepancy, or, equivalently, at a maximal distance from the initial state. The following proposition shows that this is indeed the case according to our definitions.

Proposition 5.1. For any discrepancy, if there is a state-oriented point of failure (Sk, k), then there does not exist any state-oriented point of failure (Sj, j) such that j > k.

We point out that if there is a state-oriented point of failure (Sk, k), then also no state-oriented deviation exists for any time stamp j > k. This can be seen in the proof of Proposition 5.1.

It is not always the case that, when a discrepancy is detected, a diagnosis can be found for it. For instance, if a discrepancy is observed at time stamp 0, then no diagnosis exists for it, as discussed at the beginning of Section 4. On the other hand, for a discrepancy detected at a state Si at time stamp i (i > 0), if no evolution of the observed state Si matches any of the goal-establishing intended trajectories at an earlier time stamp, there is no state-oriented diagnosis for the discrepancy. Concerning our monitoring framework, as described in Section 6, since in such a situation a recovery by backtracking is never feasible (even backtracking to the initial state does not recover), one has to resort to other means of recovery, such as replanning. Observe that the absence of a diagnosis implies that none of the initial states of all possible evolutions of the observed state is on a goal-establishing trajectory in T. So, conversely, if the goal-establishing trajectories in T contain all initial states, a diagnosis must exist.

Proposition 5.2. Let P be a plan ⟨A0, . . . , An−1⟩ and let T be a set of trajectories such that for any state S0 for which init(S0) holds, T contains a trajectory S0, A0, . . . , An−1, Sn such that goal(Sn) holds. Then, for any discrepancy, a point of failure exists.

An important class of action occurrences are deterministic action occurrences, which always result in a unique state. More formally, an action occurrence A is deterministic if

    ∀F, F′, F′′ ( tr(F, A, F′) ∧ tr(F, A, F′′) ⊃ F′ ≡ F′′ );

an action domain is deterministic if each action occurrence A is deterministic. An important aspect is that deviations cannot occur after a deterministic action occurrence. Since a deviation is by definition a necessary condition for any diagnosis, we have the following property.

Proposition 5.3. Suppose that (Sk, k) is a point of failure for some discrepancy between Si and T relative to the plan P = ⟨A0, . . . , An−1⟩. Then Ak is not deterministic.

This property may prove valuable also for computing diagnoses, provided knowledge about deterministic action occurrences is available: any time stamp k such that Ak is deterministic can be excluded as a point of failure (Sk, k). As an immediate corollary of the preceding proposition, we obtain the following result for deterministic action domains.

Corollary 5.1. If the action domain is deterministic, then no diagnosis exists for a discrepancy.


This means that if an action domain is known to be deterministic, then the monitoring system can skip the test for diagnoses, as it will always be negative, and resort to a recovery method different from backtracking.

History-oriented diagnosis. Another way of finding explanations for discrepancies is discussed in a longer version of this paper [11]. According to that approach, a point of failure for a discrepancy between Si and T relative to P is characterized by a state Sk and a time stamp k which satisfy the following conditions:

(H1) (until-point of deviation) Some evolution of Si "matches" a goal-establishing trajectory in T "until" time stamp k in state Sk and deviates from the trajectory at time stamp k + 1.

(H2) (until-maximality) No evolution of Si matches a goal-establishing trajectory in T until time stamp k + 1.

Note that, with this approach, unlike the state-oriented one, the monitoring agent is concerned with whether the agent "obeys" a goal-establishing trajectory in T while executing the plan. To guarantee that the execution exactly follows one of the intended trajectories until the point of failure, the monitoring agent must check for a discrepancy at every time stamp, since otherwise not one particular evolution of Si may be considered. It can be shown [11] that history-oriented diagnosis is equivalent to state-oriented diagnosis if each trajectory in T starts with an initial state S0 for the respective plan execution, i.e., init(S0) holds, but it is a different concept in general. Furthermore, state-oriented diagnoses exist whenever history-oriented diagnoses exist. For further discussion and details, we refer to [11].
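As a side note on exploiting Proposition 5.3 computationally, the following sketch (our own illustration, reusing the brute-force transitions() enumeration of the first sketch) tests whether a given action occurrence is deterministic; stages whose action occurrence passes this test can be skipped when searching for points of failure.

```python
def is_deterministic(A, transitions):
    """Check whether the action occurrence A leads to at most one successor from every state.

    `transitions` is an iterable of (S, A, S') triples, e.g. as produced by the
    transitions() generator of the earlier puppy-domain sketch.
    """
    successors = {}
    for (S, B, S2) in transitions:
        if B != A:
            continue
        S_key = frozenset(S)
        successors.setdefault(S_key, set()).add(frozenset(S2))
        if len(successors[S_key]) > 1:
            return False
    return True

# Example (assumption: puppy domain of Example 2.1):
# is_deterministic({"dryWithTowel"}, transitions())  ->  True
```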

6. A Monitoring Example

Consider a monitoring framework according to which, in the process of monitoring the execution of a plan of length k relative to a given set of trajectories, the monitoring agent, from time to time (say, at most m times), does the following:

1. (a) checks whether there is a discrepancy between the current state and the corresponding states of the given trajectories relative to the plan;
   (b) if there is a discrepancy, then may check whether it is relevant to the execution of the rest of the plan;

2. (a) if no (relevant) discrepancy is detected, then continues with the execution of the plan;
   (b) otherwise, may try to find a diagnosis of discrepancies by examining the given trajectories against possible evolutions of the current state from the initial state;

3. recovers from the discrepancies to reach a goal state, by doing one of the following (possibly with the help of a decision support):
   (a) finds a new plan (of length less than k) from the current state to reach a goal state, and executes the new plan;


   (b) undoes the plan until the point of failure with respect to a given reverse plan library (as described in [13, 14]) if possible, and re-executes the rest of the plan to reach a goal state.⁴ Here, the undoing is assumed to be safe in the sense that it will not end up in an unexpected state. We refer to [13, 14] for more discussion.

Steps 1–3 above are described further in Figure 1.

Consider, for instance, the blocks world described in Example 2.3. Suppose that an agent is executing the plan P. At time stamp 3, at a state S3, the agent does not grip block p properly while she is moving it from the table onto block x, and the block ends up on block r. If this effect of the action of moving p onto x is not intended at time stamp 4 with respect to the given set of trajectories, a discrepancy is observed between the actual state and the intended one. When such a discrepancy is detected, the agent can look for an explanation, i.e., a point of failure. In this case, the point of failure for the observed discrepancy can be characterized by the state S3 and the time stamp 3. If the agent executes this plan often, then she may want to check for a discrepancy at time stamp 4 every time she executes P. Therefore, she may determine time stamp 4 as a checkpoint. On the other hand, since such a discrepancy may prevent the execution of the plan later on, the agent may want to recover from it. For that, the agent can undo the plan execution until the diagnosed point of failure (to reach S3) by moving block p onto the table, and then continue with the plan execution. Undoing effects of actions is a method for recovering from plan failures, and has been studied, e.g., in [26, 27, 13].

As described in the example above, at Step 2, the diagnosis of the detected discrepancies found by the agent can be helpful in at least three ways.

• It can be used to improve future plan executions. A diagnosis provides an explanation for some weaknesses of the plan being executed, which materializes differently from the expectation. As for monitoring, this explanation can be used to determine a checkpoint specifying when to check for discrepancies in an execution, based on the experience gained from the execution and in a way such that the checkpoint is "close" to the potential point of failure, and thus failure latency is kept short. Checkpoints set up in this way can be very useful, especially when a plan is executed many times. The diagnostic process for setting up checkpoints for future plan executions can be detached from Step 2 and performed offline.

• A diagnosis can also provide valuable information about the user's understanding of the planning domain formalization – for example, if it turns out that the action executed at some point of failure may have different outcomes, contrary to her belief that the action has deterministic effects. Like setting up checkpoints, such analysis of the formalization may be performed offline.

• Another way to use a diagnosis for a discrepancy is for online plan recovery, at Step 3. For instance, if the diagnosis provides a possible point of failure, then, at Step 3, the monitoring agent can backtrack to this diagnosed point of failure and execute the plan from that point on, as described in [12].

In a more general framework, at Step 2, the agent does not have to find a diagnosis of discrepancies every time a discrepancy is detected. When to look for a diagnosis can be handled by a decision support model. We refer to [17] for a partial implementation of the approach.

Note that, with the help of the decision support unit, the agent can avoid re-executing a void action repeatedly (at most m times).


Figure 1. Steps 1–3 of the monitoring framework described in Section 6. (The flowchart connects the decision points “do a check for a discrepancy?”, “is there a discrepancy?”, “is the discrepancy relevant?”, “want an explanation for the discrepancy?”, “was a point of failure found?”, and “want to backtrack to a point of failure?” with the boxes CHECK FOR A DISCREPANCY, CHECK FOR RELEVANCY, FIND A POINT OF FAILURE FOR DISCREPANCIES, RECOVER FROM DISCREPANCIES WITHOUT BACKTRACKING, RECOVER FROM DISCREPANCIES BY BACKTRACKING TO A POINT OF FAILURE, and CONTINUE WITH THE PLAN EXECUTION.)


We refer to [17] for a partial implementation of the approach. Also, Step 3 of the framework can be refined further. Depending on the diagnosis of discrepancies, the length of the plan executed so far, the length of the remaining plan, and possibly some other criteria, the agent can pick, with the help of a decision support model, a plan recovery method among many, including replanning, backtracking, and patch planning. Such extensions and refinements of the framework above are possible. However, we do not study them here, since this is beyond the scope of this paper; we mention, however, that besides the logic-based approaches like [10], [42], and [43], many other works (e.g., [21], [1], [7], [25], [36], [29]) integrate planning and execution monitoring with various plan recovery methods. It is worth pointing out [47], which shows that plan repair is in practice much better than planning from scratch, in spite of a theoretical result in [35].
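For concreteness, the following Python sketch (ours, not part of the framework or its implementation) mirrors the control flow of Figure 1 for a linear plan. All arguments are hypothetical placeholders for the components discussed above: execute performs one action, observe returns the current state, and the remaining callbacks stand for discrepancy detection, relevancy checking, diagnosis, and recovery; for simplicity, the decisions “want an explanation?” and “want to backtrack?” are always answered with yes.

    # Illustrative sketch of the monitoring loop of Figure 1 (hypothetical helpers).
    def monitor(plan, execute, observe, checkpoints, is_discrepancy, is_relevant,
                find_point_of_failure, backtrack_to, recover_without_backtracking):
        i = 0
        while i < len(plan):
            execute(plan[i])
            i += 1
            if i not in checkpoints:                 # do a check for a discrepancy?
                continue
            state = observe()
            if not is_discrepancy(state, i):         # is there a discrepancy?
                continue
            if not is_relevant(state, i):            # is the discrepancy relevant?
                continue
            pof = find_point_of_failure(state, i)    # may return None
            if pof is None:
                recover_without_backtracking(state, i)
            else:
                k, _state_k = pof                    # diagnosed point of failure (S_k, k)
                backtrack_to(k)                      # undo actions i-1, ..., k
                i = k                                # re-execute the plan from stage k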

7. Computational Complexity

In this section, we address the complexity of the three main computational problems in execution monitoring: detecting a discrepancy, verifying a diagnosis for a discrepancy, and finding a diagnosis for a discrepancy. Suppose that a background B of execution monitoring is formed by

• an action domain description, specified by a set F of fluents, a set A of actions, and formulas state(F) and act(F, A, F′),
• a planning problem P with the initial states specified by a formula init(F), and with the goal conditions specified by a formula goal(F),
• a plan P = hA0 , . . . , An−1 i of length n for the planning problem P, and
• a set T of intended trajectories specified by a formula traj T (F0 , A0 , . . . , An−1 , Fn ),

where each formula above is an arbitrary propositional formula under classical semantics, like in Example 2.1.

The generic setting above accommodates nondeterminism and incomplete information (e.g., there may be several initial states). Dealing with nondeterminism and incomplete information in action execution and planning is computationally complex in general, and leads to intractability even in plain settings [3, 16, 39, 46]; intuitively, nondeterminism causes branchings in the execution of a plan, and thus enlarges the search space progressively. Due to this generic setting, one may expect that the computational tasks we are interested in are intractable in general, and at least NP-hard or coNP-hard, respectively. However, as shown by the results below, the complexity of the main problems in discrepancy detection and diagnosis is not much above the well-known complexity of propositional logic (NP for satisfiability and coNP for consequence, respectively) underlying the basic framework, and is in fact only “mildly” harder than NP and coNP.

We start with the problem of discrepancy detection. Here, we have to decide whether a state Si observed at time stamp i is compatible with some trajectory in T . The following theorem shows that this problem is intractable in general.

Theorem 7.1. Given a background B, a reached state Si , and a time stamp i (i ≤ n), deciding whether a discrepancy exists between Si and T relative to P at time stamp i is coNP-complete.


The next problem which we consider is deciding irrelevancy of discrepancies. Here, we assume that a discrepancy has already been found.

Theorem 7.2. Given a background B, a reached state Si , a time stamp i (i ≤ n) such that a discrepancy exists between Si and T relative to P at time stamp i, and an integer k, i ≤ k < n, deciding (i) whether the discrepancy is weakly k-irrelevant is NP-complete and (ii) whether it is strongly k-irrelevant is Πp2 -complete.5

Recall that Πp2 = coNP^NP is the class of problems whose opposite is nondeterministically decidable in polynomial time with an oracle for NP. The membership parts of this result are immediate from the formal specifications (6) and (7), respectively, since the respective QBFs can be constructed in polynomial time. The hardness parts can be easily shown by reductions from the satisfiability problem and from testing whether a given plan is valid (i.e., conformant), which are NP-complete and Πp2 -complete problems, respectively.

The problem of recognizing a diagnosis for a discrepancy is slightly harder than discrepancy detection. Intuitively, to recognize that a state Sk and a stage k characterize a point of failure for a discrepancy, we need to verify two conditions: (i) some goal-establishing trajectory in T and an evolution of the observed state Si pass through Sk at stage k, and (ii) no goal-establishing trajectory in T and no evolution of the observed state Si pass through a state at a stage later than k. The verification of (i) amounts to a satisfiability problem, and the verification of (ii) amounts to an independent unsatisfiability problem. Problems like the verification of both (i) and (ii) are captured by the complexity class Dp = {L × L′ | L ∈ NP, L′ ∈ coNP}, which can be viewed as the “conjunction” of NP and coNP.

Theorem 7.3. Given a background B, a state Sk , a reached state Si , and time stamps k and i (k < i ≤ n), such that there is a discrepancy between Si and T relative to P at time stamp i, deciding whether (Sk , k) is a point of failure for the discrepancy is Dp -complete.

The last computational problem we consider here is the computation of a point of failure (Sk , k) for a discrepancy. A naive way of solving this problem is simply to guess a state Sk and a time stamp k, and then to check whether (Sk , k) is a point of failure for the discrepancy. Since the second part of this method involves a coNP test (cf. Theorem 7.3), this naive way of problem solving is at the second level of the Polynomial Hierarchy. In another way, some point of failure (Sk , k) for a discrepancy can be computed in (deterministic) polynomial time with the help of an NP oracle, using a technique similar to the one used for computing an optimal tour in the Traveling Salesman Problem (TSP) in [37]: first compute the stage k, and then construct Sk fluent by fluent, by making suitable calls to the NP oracle. A more refined analysis yields the following result.

Theorem 7.4. Given a background B, a reached state Si , and a time stamp i (i ≤ n), such that there is a discrepancy between Si and T relative to P at time stamp i, computing some point of failure (Sk , k) for the discrepancy is FNP//OptP[O(log n)]-complete.

The complexity class FNP//OptP[O(log n)] (for short, FNP//log) is from [8]. Intuitively, FNP//log contains all problems such that a solution for an instance I can be nondeterministically computed by a

5 Strictly speaking, these are promise problems since a valid problem input can not be recognized in polynomial time.


transducer in polynomial time, if the result opt(I) of an NP optimization problem on I is known. Here, opt(I) is an integer having O(log |I|) bits; an NP optimization problem is the problem of computing the maximum value of any solution for an instance I, given that deciding opt(I)≥k is in NP, and recognizing solutions is polynomial. For example, computing the largest set S of pairwise connected nodes in a given graph G (i.e., a maximum clique) is a problem in FNP//log (observe that different maximum cliques may exist). Indeed, computing the size of a maximum clique in G is an NP-optimization problem with O(log |G|) output bits, since testing whether a set S is a clique is easy (just check whether G has an edge between each pair of nodes in S), and deciding whether opt(G) ≥ k is in NP (guess a clique of size ≥ k). Note, however, that this problem is not known to be FNP//log-complete. We conclude this subsection with some remarks on variants and special cases of the problems above. (1) The complexity results above hold in a general setting where there may be several initial states (in particular, when there is incomplete information about some fluents), and where actions can have nondeterministic effects. These results remain the same if there is a single initial state (i.e., there is no incompleteness). For deterministic actions, even in the presence of several initial states, diagnosis of a discrepancy is trivial due to Corollary 5.1. (2) In a background B for execution monitoring, instead of state(F), act(F, A, F 0 ), and init(F), one can consider formulas evols Pi (F0 , . . . , Fi ) (i ∈ {0, . . . , n}) that describe evolutions of states reached at time stamp i by executing P . Alternatively, one can consider a single multi-sorted formula evols P (I, F0 , F1 , . . . , Fn ) where I is an integer variable, describing the evolutions. In such background settings, the computational problems we are interested in do not become harder, and we obtain the same complexity results. (3) Similarly, the complexity results remain the same if the formulas in B contain hidden variables, i.e., existentially quantified propositional variables that are projected off. For example, in the formula φ(X, Y ) = ∃Z(X ⊃ Z) ∧ (Z ⊃ Y ), variable Z is a hidden variable; φ(X, Y ) is logically equivalent to the quantifier-free formula X ⊃ Y . Hidden variables are useful as auxiliary variables in problem representations. On the other hand, using polynomially many such variables, the transition-based semantics of systems such as CCALC and DLVK can be emulated by classical semantics. In fact, this applies to any action representation framework with a transition-based semantics, in which the constituents state(F), act(F, A, F 0 ) can be decided in non-deterministic polynomial time. In such frameworks, the results above provide upper bounds on the complexity of the computational problems we consider. On the other hand, if the action representation framework above can be polynomially expressed in a particular representation framework, like that of CCALC [33] and DLVK [15], then the results above provide lower bounds as well. Putting things together, the complexity results above hold if the background is described for CCALC and DLVK .
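The oracle-guided computation of a point of failure mentioned before Theorem 7.4 (first determine the maximal stage k, then fix Sk fluent by fluent) can be illustrated by the following rough Python sketch. It is ours, not part of the paper's implementation, and it assumes a hypothetical procedure oracle(k, fixed) that decides, as an NP query in the setting above, whether some deviating state at stage k extends the partial truth assignment fixed.

    # Illustrative only: compute a point of failure with an abstract NP oracle.
    def compute_point_of_failure(i, fluents, oracle):
        # Largest stage k < i at which some deviating state exists
        # (one oracle call per stage; a binary search would also do).
        k = max((j for j in range(i) if oracle(j, {})), default=None)
        if k is None:
            return None
        # Fix S_k fluent by fluent, keeping the partial assignment
        # extendable to a witness after every decision.
        fixed = {}
        for f in fluents:
            fixed[f] = True
            if not oracle(k, dict(fixed)):
                fixed[f] = False
        return k, {f for f, value in fixed.items() if value}

The greedy fluent-by-fluent construction is sound because, whenever the current partial assignment is extendable to a witness, at least one of the two extensions of the next fluent is extendable as well.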

8. Computing Explanations

Algorithms for computing diagnoses have already been encountered in the discussion of the computational complexity in the previous section. However, these algorithms do not take into account possible functionalities of the existing systems for action representation and reasoning. For this reason, we consider here the computation of diagnoses at a generic level, which may be refined for the existing systems.


We assume that the description of the action domain as well as the set of trajectories in T (5) is fixed. Furthermore, we assume that a function match T, which returns the set of all trajectories in T satisfying some given conditions, is available. The inputs of match T are

• a sequence of actions hA0 , . . . , An−1 i of length n;
• a list [M0 , M1 , . . . , Mm ] (0 ≤ m ≤ n) of sets of fluents, such that each Mi corresponds to an interpretation of Fi ; and
• a formula φ(F) describing some conditions on F.

The output of match T is the set of prefixes

S0 , A0 , S1 , . . . , Sm−1 , Am−1 , Sm    (12)

of all trajectories S0 , A0 , S1 , . . . , Sn−1 , An−1 , Sn in T such that the following hold:

(a) Prefix (12) matches M0 , A0 , . . . , Am−1 , Mm , i.e., Si = Mi . A wild card symbol “?” may replace any Mi , matching any Si ;
(b) φ(Sn ) holds.

For example, consider the action domain description presented in Example 2.1. Suppose that trajectory (3), i.e., {wet}, {dryWithTowel}, {}, {putIntoWater}, {inWater, wet}, belongs to T , and suppose that P = h{dryWithTowel}, {putIntoWater}i and φ(F) = inWater ∧ wet. Then, one of the prefixes returned by match T(P, [{wet}, ?], φ) is

{wet}, {dryWithTowel}, {}.    (13)

Indeed, prefix (13) matches trajectory (3), and φ(F) holds at state {inWater, wet}. On the other hand, prefix (13) is not a part of the output of match T(P, [{wet}], φ(F)), because the single prefix returned by this call is {wet}.

We assume that a similar function match Plan(P, [M0 , M1 , . . . , Mm ]) is available, which returns, for the given plan P = hA0 , . . . , An−1 i, the set of all evolutions S0 , A0 , . . . , Am−1 , Sm of states Sm according to P that match M0 , A0 , M1 , . . . , Am−1 , Mm . For example, consider the plan P = h{dryWithTowel}, {putIntoWater}i for the planning problem with the initial state {wet} and the goal condition wet ∧ inWater . Then, match Plan(P, [{wet}, ?]) returns the single evolution (13), whereas match Plan(P, [{wet}, ?, ?]) returns the two evolutions (3) and {wet}, {dryWithTowel}, {}, {putIntoWater}, {}.

The functions match T and match Plan can be easily implemented on top of the systems CCALC and DLVK , by pushing the values of fluents and actions into a planning problem, and then by returning all trajectories.
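To illustrate the prefix-matching semantics of match T, the following Python sketch is ours and purely illustrative (in the paper the function is realized on top of CCALC and DLVK , not in this way); it assumes that the trajectories in T are given explicitly as alternating lists [S0, A0, S1, ..., A_{n-1}, Sn] with states and actions as frozensets, and that the wild card is the string "?".

    # Illustrative sketch of match_T over explicitly given trajectories.
    def match_T(T, actions, pattern, phi):
        m = len(pattern) - 1
        prefixes = []
        for traj in T:                       # traj = [S0, A0, S1, ..., A_{n-1}, Sn]
            states, acts = traj[0::2], traj[1::2]
            if list(acts) != list(actions):  # the trajectory must follow the given actions
                continue
            if not phi(states[-1]):          # condition (b): phi holds in S_n
                continue
            if all(p == "?" or p == s for p, s in zip(pattern, states)):
                prefixes.append(traj[:2 * m + 1])   # condition (a): prefix up to S_m
        return prefixes

    # The towel example above: this call returns the prefix {wet}, {dryWithTowel}, {}
    # of trajectory (3), as in (13).
    wet, dry, wet_in_water = frozenset({"wet"}), frozenset(), frozenset({"inWater", "wet"})
    T = [[wet, frozenset({"dryWithTowel"}), dry, frozenset({"putIntoWater"}), wet_in_water]]
    P = [frozenset({"dryWithTowel"}), frozenset({"putIntoWater"})]
    print(match_T(T, P, [wet, "?"], lambda s: {"inWater", "wet"} <= s))

A match Plan analogue can be sketched in the same way over the set of possible evolutions instead of T.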


Algorithm STATE DIAGNOSIS(Si , i)

Input:  Observed state Si at stage i satisfying discrepancy^{P,T}_i(Si ), with respect to plan P = hA0 , . . . , An−1 i and trajectories T .
Output: All state-oriented diagnoses (Sk , k) for the discrepancy.

PoF States := ∅; k := 0;
for each S0 , A0 , . . . , Ai−1 , Si ∈ match Plan(P, [?, . . . , ?, Si ]) do begin
    j := i − 1;
    while j ≥ k do begin
        if match T(P, [?, . . . , ?, Sj ], goal(F)) ≠ ∅ then
            if j > k then begin k := j; PoF States := {Sk } end
            else PoF States := PoF States ∪ {Sk };
        j := j − 1
    end
end;
output {(S, k) | S ∈ PoF States }.

Figure 2. A generic algorithm to compute all state-oriented diagnoses.

The problem of detecting a discrepancy between a state Si and T relative to P at stage i is then solved by testing whether match T(P, [?, . . . , ?, Si ], goal(F)) = ∅. Exploiting the functions match T and match Plan, the state-oriented diagnoses for a discrepancy between an observed state Si and T relative to P at a time stamp i can be computed by the generic algorithm STATE DIAGNOSIS presented in Figure 2. This algorithm computes all state-oriented points of failure iteratively, by finding, for each evolution of Si , the maximum time stamp k and a set of states Sk such that the evolution deviates from every goal-establishing trajectory in T at state Sk at stage k.
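As a further illustration, the algorithm of Figure 2 can be transcribed directly into Python as follows. The functions match_plan and match_T_nonempty are assumed to have the semantics of match Plan and match T described above (with states as hashable frozensets); this sketch is ours and is not the DLVK -based implementation reported in Section 9.

    # Illustrative transcription of Figure 2 (assumed helpers match_plan, match_T_nonempty).
    def state_diagnosis(S_i, i, P, match_plan, match_T_nonempty, goal):
        pof_states, k = set(), 0
        for evolution in match_plan(P, ["?"] * i + [S_i]):   # S0, A0, ..., A_{i-1}, S_i
            states = evolution[0::2]
            j = i - 1
            while j >= k:
                S_j = states[j]
                if match_T_nonempty(P, ["?"] * j + [S_j], goal):
                    if j > k:
                        k, pof_states = j, {S_j}
                    else:
                        pof_states.add(S_j)
                j -= 1
        return {(S, k) for S in pof_states}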

9. Implementation

We have implemented the algorithms described in the previous section for the K action representation language on top of the DLVK planning system. In this section, we report on the system itself and on experiments that have been performed with it.

9.1. System Description

Our program, which is available at http://www.kr.tuwien.ac.at/research/plan diagnosis/, is a Unix script that calls DLVK to compute match T and match Plan, and a C++ program to compute the points of failure. The inputs to our program are

• a domain description D,
• a planning problem P,
• a set T of trajectories,
• a plan P , and
• a state Si .

Both the domain description and the planning problem are described in the action language K [16]. The set of trajectories can be explicitly stated, or alternatively they can be computed from a given planning problem. In the latter case, we denote by PT the new planning problem, possibly with some constraints.

To compute match Plan(P, [?, . . . , ?, Si ]), first a new file is generated from the given plan P of length n, containing the “causal laws” Penf :

caused istime(j) after istime(i)        (j = i + 1, 0 ≤ i < n)    (14)
caused false after not Ai , istime(i)   (0 ≤ i < n)               (15)
initially: istime(0).                                              (16)

Here the fluent istime(X) specifies the time stamps. By rule (14), the time stamp is incremented by 1. Rule (15) makes sure that at time stamp i the corresponding action Ai in P occurs. Finally, (16) makes sure that istime(0) holds in the initial state. (See [16] for the syntax and the semantics of the action language K, and [15] for a description of the input and output of DLVK .) This file is used to make sure that the actions in P are executed in the same order. Then we obtain a new planning problem P′ from P by replacing the original goal with the description of Si . After that, match Plan(P, [?, . . . , ?, Si ]) is computed by calling DLVK with files containing D, P′, and Penf .

Instead of computing match T(P, [?, . . . , ?, Sj ], goal(F)) at each iteration of the inner loop of the algorithm in Figure 2, we compute match T(P, [?, . . . , ?], goal(F)) once at the very beginning by calling DLVK with files containing D, Penf and PT . Having computed the set of possible evolutions leading to Si , i.e., match Plan(P, [?, . . . , ?, Si ]), and the set of goal-establishing preferred trajectories, i.e., match T(P, [?, . . . , ?], goal(F)), we compute points of failure by comparing these two sets. This comparison is done by a C++ program that searches for a maximal time index x, at which some possible evolution and preferred trajectory contain the same state Sx . A flowchart of our program is shown in Figure 3.

Example 9.1. Consider the blocks world domain and the planning problem of Example 2.3. We describe the blocks world domain in the action language K, and present it with the planning problem to DLVK in file problem.plan, where the background knowledge describing blocks and locations is described in file problem.dl. For instance, the direct effects of moving a block are presented to DLVK by

total on(B,L) after move(B,L1).

and the inertia by

inertial on(B,L).


Figure 3. The flowchart of our implementation of the algorithm from Figure 2. (The trajectories T may be specified by a new planning problem, by a constraint on P, or by explicitly describing each trajectory; the inputs D, P, P , and Si are translated into PT , Penf , and P′, and two calls to DLVK compute match T(P, [?, . . . , ?], goal(F)) and match Plan(P, [?, . . . , ?, Si ]), whose outputs are compared by a C++ program to obtain the points of failure.)

We describe the planning problem P of Example 2.3 as follows:

initially: on(x,table). on(a,table). on(p,a). on(r,i). on(i,table). on(s,table).
goal: on(x,table), on(p,a), on(a,r), on(r,i), on(i,s), on(s,table) ? (6)

Here (6) describes that the maximum plan length is 6. To describe the set T of trajectories as in Example 3.1, we describe the constraint which expresses that the agent prefers to put a block onto a block instead of the table, in file constraint.plan:

always: caused false if on(B,table) after move(B,L), L != table.

This constraint with the planning problem P describes PT . The state Si is the state S4 from Example 3.1, and we describe it in file statefile:

STATE 4: on(p,r), on(r,i), on(i,s), on(s,table), on(a,table), on(x,table)

We take P as the plan from Example 2.3, and describe it in file planfile:


PLAN: move(r,x); move(i,s); move(r,i); move(p,x); move(a,r); move(p,a)

To compute the points of failure, we call the diagnosis program diagnose, implemented in C++, as follows:

diagnose problem.plan problem.dl planfile statefile -add constraint.plan

Then this program calls DLVK twice:

dlv problem.plan problem.dl constraint.plan _enforcing.plan _enforcing.dl -FPopt
dlv problem.plan problem.dl _enforcing.plan _enforcing.dl _goal.plan -FPopt

Here the files _enforcing.plan and _enforcing.dl contain Penf , and the file _goal.plan contains the goal created from Si . The first call to DLVK produces the list of all goal-establishing preferred trajectories (the two trajectories shown in Example 3.1), and the second call to DLVK produces the list of all possible evolutions of the diagnosed state (the three evolutions shown in Example 4.1). The output of diagnose is the only point of failure (S3 , 3) shown in Example 4.2:

Point of failure: 3
on(x,table), on(p,a), on(a,table), on(r,i), on(i,s), on(s,table)

For further examples and information about the implementation, see http://www.kr.tuwien.ac.at/research/plan diagnosis/.

9.2. Experiments

In order to assess the practicability of our approach, we have conducted experiments using our simple, unoptimized prototype implementation. The benchmarks have been performed on the blocks world domain as presented in Example 9.1 and modifications thereof. All experiments have been run on a modest platform, namely an AMD Athlon 1.2 GHz with 256 KB cache and 256 MB main memory running Linux 2.4.21, using DLVK version 2006-01-12.

In a first experiment, we have directly taken over the setting of Example 9.1 and considered the task of inverting a stack of blocks onto a designated block. When the stack contains 15 blocks, finding a diagnosis for randomly chosen states at various time stamps took between 41 and 83 seconds. These timings include the computation of preferred trajectories (consuming 40 seconds in each case), the computation of evolutions (consuming between milliseconds and 42 seconds, depending on the time stamp), and the computation of the point of failure (consuming less than one second in each case). Note that the set of preferred trajectories could also be computed in advance, and stored in an efficient data structure.

The domain description in Example 9.1, which we kept simplistic for illustration purposes, is not very realistic for the following reason: it allows arbitrary outcomes of move actions, as blocks may fall anywhere. In our example of restacking blocks, a very large number of states can be reached by executing a credulous plan, so the number of evolutions of a state is excessively large.


Figure 4. Planning problem — small size.

Figure 5. Planning problem — larger size.

Towards a more realistic setting, we have next considered a modification of the domain, in which blocks may fall only on several designated locations. An equivalent way of viewing this domain would be considering several tables, each of which may hold at most one stack of blocks. We have chosen the planning problem reported in Figure 4 and the following plan:

PLAN: move(e,b); move(d,e); move(c,table3); move(d,c); move(e,table2); move(d,e); move(b,c); move(a,d)

In this setting, preferred trajectory calculation took 120 ms, and computing evolutions and points of failure was in the same order of magnitude for various states and time stamps. We made similar observations for slight variations of the number of blocks and the plan lengths; for example, in a similar setting with a plan of length 11, the overall computation time was still below one second. Therefore, for problems of this size, even our simple prototype could be used for online diagnosis.

We have also considered larger settings, including restacking 12 blocks as shown in Figure 5, using a plan of length 29. We have considered states at various time stamps, for which we have computed diagnoses using our prototype implementation. Here, computing the preferred trajectories took about 70 seconds (in each case), while computing the evolutions took from within one second up to 66 seconds, depending on the time stamp considered (see Table 1); the increase in computation time is intuitively explained by the rapidly growing number of possible evolutions. Subsequent computation of the points of failure took less than one second in each case.

In order to understand the impact of branching by nondeterminism in early stages of plan execution, we have increased in a final set of tests the number of tables in the domain (more tables mean more possibilities where a block may fall to). We have considered a task similar to the one reported in Figure 4, but with 6 blocks and a plan of length 11. We have computed diagnoses for two states at time stamps 4 and 10, respectively. In a setting with 4 tables, computing a diagnosis took 0.8 and 7.0 seconds, respectively, with 5 tables it increased to 2.3 and 17.8 seconds, and with 6 tables we measured 9.4 and 38.3 seconds.

Time stamp           3      7      10     11     12     13
runtime (seconds)    0.41   3.05   6.94   9.03   18.66  66.04

Table 1. Timings for computing evolutions at various time stamps for the planning problem reported in Figure 5.

It is quite evident that the dominating factors are the number of possible evolutions, together with the time stamp.

In conclusion, we can observe that the reported implementation can, despite its prototype status, deal with problems of reasonable size. As a tool for analyzing domain descriptions (where response time is less critical), it is useful already in this form. Note that, like in the blocks world, versions of a domain might be considered with a small number of objects to understand the (proper) working of the domain axioms. However, the results are encouraging also for online monitoring, because replanning using DLVK , that is, finding a (good) plan from the given state to the goal state, consumes much more time than computing diagnoses, especially at earlier time stamps. Therefore, in a hypothetical plan execution and monitoring framework, diagnosing and backtracking may well have advantages over replanning in this setting. Furthermore, there is a lot of potential for performance improvements, by more efficient realization of the generic functions described, as well as by other implementations. Note also that the time for online diagnosis has to be related to the actual action execution time; if, in our scenario, the blocks were large concrete blocks on a construction site, even tens of seconds can be considered fast with respect to the duration of plan execution.

Our results indicate that the expensive operations concern the computation of preferred trajectories and the computation of the possible evolutions (performing diagnosis is fast, given the evolutions). The former is disconnected from the actual diagnosis process and can be performed offline. In our prototype the trajectories are stored in simple data structures; here, considerable improvements can be made. The computation of evolutions depends mostly on the time stamp at which the diagnosis is performed and on the branching factor of the evolutions. Also in this case, both the data structures for evolutions in our prototype and the interface to the DLVK system (which produces the evolutions) are unoptimized, and considerable improvements are possible.
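One conceivable refinement of the simple trajectory storage mentioned above is sketched below in Python; it is an assumption of ours, not the prototype's actual data structure. Indexing the goal-establishing preferred trajectories by (stage, state) makes the test "does some preferred trajectory pass through state S at stage j?", which the diagnosis loop performs repeatedly, a single dictionary lookup.

    # Illustrative sketch: index preferred trajectories by (stage, state).
    from collections import defaultdict

    def index_trajectories(trajectories):
        index = defaultdict(list)
        for traj in trajectories:                    # traj = [S0, A0, S1, ..., Sn]
            for stage, state in enumerate(traj[0::2]):
                index[(stage, frozenset(state))].append(traj)
        return index

    def passes_through(index, stage, state):
        # Does some goal-establishing preferred trajectory contain `state` at `stage`?
        return bool(index.get((stage, frozenset(state))))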

10. Related Work

Other frameworks that describe execution monitoring in a logic-based framework for reasoning about actions, in which a planning problem can be formulated and solved, are [10, 42, 43] and [20]. In the former three, execution monitoring is described in the situation calculus as in [38], which makes them applicable to Golog programs [31]. In [20], the authors describe execution monitoring in the fluent calculus [44] for FLUX programs [45]. Our framework is different from these logic-based frameworks with respect to discrepancy detection and diagnosis as follows. First of all, we assume that the current state is observed by the agent, but we do not discuss how the description of this state is obtained. In [10], the authors assume that the agent knows which exogenous actions the environment has executed, so the current situation can be identified easily. In [42, 43] and in [20], the authors introduce sensing actions so that the monitoring agent can find the truth values of some fluents at the current situation.


In our framework, the monitoring agent detects a discrepancy when the current state and the corresponding states of the given trajectories are different. In [10, 42, 43], the agent detects a discrepancy when the remaining plan (or, in general, program) is not successful, considering all possible trajectories. In [20], the agent detects a discrepancy when the action is not executable or when the effects of an action are not as intended, considering all possible transitions, i.e., trajectories of length 1. A precise description of a discrepancy is, however, missing in that paper. In both [20] and our framework, the detected discrepancies may not be relevant to the successful execution of the rest of the plan: the effects of actions may not be as intended but the execution of the rest of the plan may lead to a goal state.

Our notion of diagnosis can provide explanations for detected discrepancies. In [10, 42, 43], the problem of finding explanations for detected discrepancies is not considered. In [20], explanations for action failures are specified by introducing an abnormality predicate for each action. For instance, if the action of picking milk at the grocery store fails because the agent can not pay for the milk, then one possible explanation for this failure is that the agent might have picked a more expensive milk. To take into account this abnormality, the authors introduce an abnormality predicate for the action of picking milk, and modify the state update axioms accordingly. Note that, later on, if the agent decides to consider some other possible abnormalities, he needs to modify the domain description again. In our framework, the agent can specify such abnormalities by a formula, whose models describe the intended or preferred trajectories, and thus does not have to modify the domain description for various abnormalities.

Note that another advantage of describing an intended or preferred behavior of an agent with a set of trajectories, instead of abnormality predicates, is that it can be formulated separately from the action domain description. For instance, we can formalize the action domain in an action language such as C+ [24] or K [15], and specify the intended or preferred behavior of an agent with a logic program. This kind of modular representation allows us to talk about intended or preferred behavior of an agent beyond single transitions if the action domain is described in a language, like the action languages [22], where we can not easily describe the effects of a sequence of several actions, like moving several blocks consecutively onto the table, without introducing new auxiliary atoms. Also, this kind of modularity gives the agent the flexibility of not having to determine all possible abnormalities while describing the action domain, thus leading to more elaboration tolerant representations [34].

Another difference of our work from [10, 42, 43, 20] is that we have analyzed, in our framework, the computational complexity of detecting discrepancies and finding diagnoses of discrepancies.

Another line of work that is related to diagnosis of discrepancies is the work on diagnostic reasoning in the sense of [4] and [2], where action descriptions can be represented by transition diagrams. The kind of diagnosis discussed in [4] and [2] is different from ours in that they detect faulty components of a system by observing its behavior. For instance, if the light is not on when the lamp is turned on, it may be because the bulb is broken.
To diagnose such faulty components, the authors introduce an abnormality predicate for each component, and describe the effects of actions by default rules. For instance, instead of saying that light is on after the lamp is turned on, they would say that light is on after the lamp is turned on if the bulb is not broken. To obtain an explanation, they also introduce an exogenous action for each abnormality of a component, like, for instance, the action of breaking the bulb. Like in [20], if the agent decides to consider some other possible abnormalities of components, he needs to modify the domain description, and modularity is lost. On the other hand, some abnormalities can not be identified from the given set of trajectories. For instance, if the light is not on, we can find out, with respect to a given set of trajectories, that there is


something wrong with the execution of the action of switching the lamp on; however, we can not pinpoint that it is because the bulb is broken if this abnormality is not specified in the action domain description. Fortunately, our modular representation framework allows us to use abnormality predicates in the action domain description to express such abnormalities, and, at the same time, use a set of trajectories to express the intended or preferred behavior of the agent.

There are also some systems that integrate planning with monitoring which are not described in a logic-based framework. PLANEX [21] is the earliest one. It first calls the STRIPS planner to generate plans, and then executes the plan. In case of an execution failure, it continues to execute the parts of the plan that are independent; if there is no such independent part, a new plan is generated. IPEM [1] is the first system integrating partial-order planning with plan execution. It starts with an incomplete plan and tries to reduce the “flaws” (i.e., a list of things to be done, such as “unexecuted action”, to complete the plan) at each step. XRFM [7] provides the continual modification of plans during their execution, using a rich collection of failure models and plan repair strategies. It projects a default plan into its possible executions, diagnoses failures of these projected plans by classifying them into a taxonomy of predefined failures, and then revises the default plan by following the pointers from the predefined failures to predefined plan repair strategies. Some other systems that integrate planning with execution monitoring are ROGUE [25], SIPE [48], SOAR [40], and SPEEDY [6].

Compared to these works, we formally describe the monitoring of a plan execution. For that, we provide definitions for the basic notions, such as a discrepancy, a point of failure, and a recovery. These definitions allow us to study properties of monitoring, including the computational complexity analysis of related problems.

11. Conclusion

We have investigated three problems related to plan execution: detection of discrepancies, assessment of their relevancy, and finding explanations for detected discrepancies. We have described some solutions to these problems in a logic-based framework, different from those suggested in the literature, considering the current state information without a log of the past states. According to our solution to the first problem, one checks the compatibility of the plan execution with a given set of intended trajectories. For the second problem, different notions of relevancy have been considered, for which a trade-off between complexity and goal enforcement as well as applicability has been found. For the last problem, points of failure are obtained by comparing the possible evolutions of the observed state with the given trajectories, and singling out certain points of deviation, which are stages in the execution where the action outcome is different from a desired one, following some rationale.

These points of failure can describe some weaknesses of the plan being executed, and can be used, e.g., for setting checkpoints specifying for a monitoring agent when to check for discrepancies, which can be especially useful if the plan is executed many times. They might also prove useful for the agent’s understanding of the domain formalization, and can be used for recovering from the detected discrepancies. For instance, a monitoring agent can roll back the plan execution until the point of failure from which the remainder of the plan can be (re)executed, and thus can avoid an expensive (re)planning step. This kind of plan recovery with backtracking can be useful, in particular, when the remaining plan is very long.

Finally, we have analyzed the complexity of computing explanations for discrepancies, and we have addressed the computation of diagnoses on top of existing action and planning frameworks. An implementation of our approach for the DLVK action and planning framework is discussed in Section 9.


Several issues remain for further research. One issue is the extension of the results in this paper to action descriptions with partially observable states. Another issue is to explore alternative notions of points of failure in more depth, as well as to develop refined notions of irrelevancy of discrepancies. A further line of research is to monitor the execution of non-linear plans, such as conditional plans, with respect to a set of “intended” or “preferred” trajectories. The main difference to the setting here is that alternative sequences of actions must be respected which, according to the plan, might be executed in the future or have been executed in the past (unless the action history is known). A different direction is the usage and integration of these results for plan recovery in a more general monitoring framework, as discussed in Section 6. This is part of our future work.

Acknowledgements

We thank Matthias Fichtner and Mikhail Soutchanski for useful discussions and other colleagues for helpful comments. We are grateful to the reviewers for their comments which helped to better position this work, and for their suggestions to consider notions of relevant discrepancies and alternative notions of points of failure. This work was supported by FWF (Austrian Science Funds) under project P16536N04. The work of Wolfgang Faber was funded in part by an APART grant of the Austrian Academy of Sciences.

A. Proofs of Theorems

A.1. Proof of Proposition 5.1

Proposition 5.1 For any discrepancy, if there is a state-oriented point of failure (Sk , k), then there does not exist any state-oriented point of failure (Sj , j) such that j > k.

Proof:
Suppose that there is a state-oriented point of failure (Sk , k) for a discrepancy between a state Si and a set T of trajectories relative to a plan P at a time stamp i, i.e., diagnosisState^{P,T}_{k,i}(Si , Sk ) holds. Then, by definition, for every j (0 ≤ k < j ≤ i), ¬matchState^{P,T}_{j,i}(Si ) holds. That is, for every j (0 ≤ k < j ≤ i), there exist no states S0 , . . . , Si−1 , S′0 , . . . , S′n such that matchAt^{P,T}_{j,i}(S0 , . . . , Si−1 , Si , S′0 , . . . , S′n ) holds. Then, by definition, there does not exist a time stamp j (0 ≤ k < j < i) such that, for some state Sj , deviateState^{P,T}_{j,i}(Si , Sj ) holds. Therefore, by the definition of a state-oriented point of failure, there is no time stamp j (0 ≤ k < j < i) such that, for some state Sj , diagnosisState^{P,T}_{j,i}(Si , Sj ) holds. The uniqueness of a history-oriented point of failure can be shown similarly (see Proposition 1 of [11]). ⊓⊔

A.2. Proof of Proposition 5.2

Proposition 5.2 Let P be a plan hA0 , . . . , An−1 i and let T be a set of trajectories such that for any state S0 , for which init(S0 ) holds, T contains a trajectory S0 , A0 , . . . , An−1 , Sn , such that goal(Sn ) holds. Then, for any discrepancy, a point of failure exists.


Proof:
Suppose that a discrepancy is detected between a state Si and T relative to P at a time stamp i (i ≤ n), i.e., discrepancy^{P,T}_i(Si ) holds.

First observe that i > 0 must hold. Otherwise an initial state S0 would exist such that ∀F′0 , . . . , F′n traj^T(F′0 , A0 , . . . , An−1 , F′n ) ∧ goal(F′n ) ⊃ ¬(F′i ≡ Si ), i.e., no goal-establishing trajectory starting in S0 would exist in T , contrary to our assumption.

Consider history-oriented diagnosis [11]. Since at least one evolution S0 , . . . , Si exists for Si (for which init(S0 ) necessarily holds), by assumption also some trajectory S0 , A0 , S′1 , . . . , An−1 , S′n in T exists, and hence matchUntil^{P,T}_{0,i}(S0 , . . . , Si , S′1 , . . . , S′n ) holds. Then, for some k (0 ≤ k < i), deviateHistory^{P,T}_{k,i}(Si , Sk ) must hold, as otherwise, for all j (0 ≤ j ≤ i), matchUntil^{P,T}_{j,i}(S0 , . . . , Si , S′j+1 , . . . , S′n ) would hold, contradicting discrepancy^{P,T}_i(Si ).

Without loss of generality, assume that this k is maximal for the evolution S0 , . . . , Si among all evolutions. Then, also ¬matchHistory^{P,T}_{k+1,i}(Si ) holds, and hence (Sk , k) is a history-oriented point of failure. By Proposition 3 of [11], also some state-oriented point of failure exists. ⊓⊔

A.3. Proof of Proposition 5.3

Proposition 5.3 Suppose that (Sk , k) is a point of failure for some discrepancy between Si and T relative to the plan P = hA0 , . . . , An−1 i. Then Ak is not deterministic.

Proof:
Given that (Sk , k) is a point of failure, the sentence diagnosisState^{P,T}_{k,i}(Si , Sk ) holds, and hence also deviateState^{P,T}_{k,i}(Si , Sk ) holds. By definition of the latter and of matchAt^{P,T}_{k,i}, trajectories S0 , A0 , S1 , . . . , Ai−1 , Si and S′0 , A0 , S′1 , . . . , An−1 , S′n exist such that S′k = Sk and S′k+1 ≠ Sk+1 . Thus, tr(Sk , A, Sk+1 ) ∧ tr(S′k , A, S′k+1 ) ⊃ S′k+1 ≡ Sk+1 is false, and hence Ak is not deterministic. ⊓⊔

A.4. Proof of Theorem 7.1

Theorem 7.1 Given a background B, a reached state Si , and a time stamp i (i ≤ n), deciding whether a discrepancy exists between Si and T relative to P at time stamp i is coNP-complete.

Proof:
For a background B, a state Si , and a time stamp i (i ≤ n), we want to show that the problem of deciding whether discrepancy^{P,T}_i(Si ) holds is coNP-complete.

Membership. By definition, discrepancy^{P,T}_i(Si ) is false if there exist states S′0 , S′1 , . . . , S′n such that (i) S′i = Si , and (ii) traj^T(S′0 , A0 , . . . , An−1 , S′n ) ∧ goal(S′n ) is true. Such states S′0 , . . . , S′n can be guessed in polynomial time, and (i), (ii) can be verified in polynomial time. Hence, deciding that discrepancy^{P,T}_i(Si ) is false is in NP, which means that deciding whether discrepancy^{P,T}_i(Si ) holds is in coNP.

Hardness. Given a propositional formula φ(X ) on atoms X = {X 1 , . . . , X m }, deciding whether φ(X ) is unsatisfiable is a well-known coNP-complete problem. To show coNP-hardness, we need a background B, a state Si , and a time stamp i obtainable from φ(X ) and X in polynomial time such that discrepancy^{P,T}_i(Si ) holds iff φ(X ) is unsatisfiable.


Such a background B, a state Si , and a time stamp i can be obtained from φ(X ) and X in polynomial time as follows. Let F = {Y, X 1 , . . . , X m } and let A = {D}. Take state(F) = act(F, A, F′) = init(F) = goal(F) = ⊤, take P = h{D}i, and take traj^T(F0 , A0 , F1 ) = Y1 ⊃ φ(X0 ). Notice that in this domain, D is a dummy action which does not have any effects. The formula traj^T(F0 , A0 , F1 ) describes transitions according to which a state S1 is reachable from a state S0 iff Y is false in S1 or the formula φ(X ) evaluates to true in S0 . Let i = 1 and S1 = {Y }.

Now we need to show that, with respect to B defined above, Si , and i, discrepancy^{P,T}_1(S1 ) holds iff φ(X ) is unsatisfiable. For the if-direction, suppose that φ(X ) is unsatisfiable. Then, by definition, for every trajectory S′0 , {D}, S′1 in T , it holds that Y is false in S′1 and thus S′1 ≠ S1 . Hence, discrepancy^{P,T}_1(S1 ) holds. For the only-if direction, suppose that φ(X ) is satisfiable. Then some state S0 exists such that φ(X ) evaluates to true in it. Hence, S0 , {D}, S1 is a trajectory in T . Consequently, discrepancy^{P,T}_1(S1 ) does not hold. Therefore, deciding whether discrepancy^{P,T}_i(Si ) holds is coNP-hard. ⊓⊔

A.5. Proof of Theorem 7.2

Theorem 7.2 Given a background B, a reached state Si , a time stamp i (i ≤ n) such that a discrepancy exists between Si and T relative to P at time stamp i, and an integer k, i ≤ k < n, deciding (i) whether the discrepancy is weakly k-irrelevant is NP-complete and (ii) whether it is strongly k-irrelevant is Πp2 -complete.

Proof:
Membership. This is immediate from (6) and (7), respectively, since the respective QBFs can be constructed in polynomial time, and their evaluation is in NP and Πp2 , respectively.

Hardness. For weak k-irrelevancy, NP-hardness is shown by a simple reduction from the satisfiability problem. Let φ(X ) be a propositional formula on atoms X = {X 1 , . . . , X m }, and define a background B, similar to the proof of Theorem 7.1, as follows. Let F = {Y, X 1 , . . . , X m } and let A = {D}. Take state(F) = init(F) = goal(F) = >, take P = h{D}i, and take act(F, A, F 0 ) = Y ⊃ φ(X 0 ), traj T (F0 , A0 , F1 , A1 , F2 ) = ¬Y1 . Let i = 1, S1 = {Y }, and k = 1. Then, discrepancy P,T 1 (S1 ) holds, and clearly formula (6) holds for i = 1, S1 = {Y }, and k = 1 iff φ(X ) is satisfiable. That is, the discrepancy between S1 and T relative to the plan P is weakly 1-irrelevant iff φ(X ) is satisfiable. This proves NP-hardness. For strong k-irrelevancy, we show that deciding whether a given plan P = hA0 , . . . , An−1 i, n ≥ 1, for a planning problem specified by init(F) and goal(F) in an action domain description with a set F of fluents, set A of actions, and formulas state(F) and act(F, A, F 0 ), is valid, i.e., is a conformant plan, can be polynomially reduced to deciding strong k-irrelevancy of a discrepancy between a state Si at some stage i, 1 ≤ i < n, and a set T of trajectories relative to P in some background B, as follows.


Without loss of generality, we assume that init(F) describes a single state, S∗. Let Y be a new fluent, and let the new action description comprise the set of fluents F̂ = F ∪ {Y }, the set of actions Â = A, the formulas state(F̂) = state(F) and act(F̂, Â, F̂′) = Y ⊃ (Y′ ∧ act(F, A, F′)). Let init(F̂) = ⊤ and let goal(F̂) = goal(F). Let P̂ = hÂ0 , . . . , Ân+1 i = h∅, A0 , . . . , An i and let traj^T(F̂0 , Â0 , . . . , Ân+1 , F̂n+2 ) = ¬Y1 .

Then, it is easily seen that P̂ is a credulous plan for init(F̂) and goal(F̂), and that a discrepancy between S1 = S∗ ∪ {Y } and T exists at stage 1 relative to P̂. Furthermore, this discrepancy is strongly 1-irrelevant if and only if the remainder of P̂ (i.e., P ) is always executable and leads to a state Ŝn+2 such that goal(Ŝn+2 ) holds. This is equivalent to P being a valid (conformant) plan for init(F) and goal(F). Deciding the latter is Πp2 -complete, as follows from results in [46, 16], even under the stated restriction on init(F). Therefore, deciding strong 1-irrelevancy of S1 is Πp2 -hard. This proves the result. ⊓⊔

A.6. Proof of Theorem 7.3

Theorem 7.3 Given a background B, a state Sk , a reached state Si , and time stamps k and i (k < i ≤ n), such that there is a discrepancy between Si and T relative to P at time stamp i, deciding whether (Sk , k) is a point of failure for the discrepancy is Dp -complete.

Proof:
For a background B, states Sk and Si , and time stamps k and i (k < i ≤ n), we want to show that the problem of deciding whether diagnosisState^{P,T}_{k,i}(Si , Sk ) holds is Dp -complete.

Membership. By definition, diagnosisState^{P,T}_{k,i}(Si , Sk ) holds iff (S1) deviateState^{P,T}_{k,i}(Si , Sk ) holds, and (S2) for all j (k < j ≤ i), ¬matchState^{P,T}_{j,i}(Si ) holds. Deciding whether (S1) holds is in NP, since proper values for the existential variables in the definition (9) of deviateState^{P,T}_{k,i}(Si , Sk ) can be guessed, and then the formula can be evaluated in polynomial time. Deciding whether (10) does not hold is in NP, since a proper j and values for the existential variables in the definition (8) of matchState^{P,T}_{j,i}(Si ) can be guessed, and then the formula can be evaluated in polynomial time. Hence, deciding that (S2) holds is in coNP. Since (S1) and (S2) can be decided independently of each other, it follows that deciding whether diagnosisState^{P,T}_{k,i}(Si , Sk ) holds is in Dp .

Hardness. Given propositional formulas φ(X ) and ψ(X ) on atoms X = {X 1 , . . . , X m }, deciding whether φ(X ) is satisfiable and ψ(X ) is unsatisfiable is a Dp -complete problem. To show Dp -hardness, we need a background B, states Si and Sk , and time stamps i and k (k < i ≤ n) obtainable from φ(X ), ψ(X ), and X in polynomial time such that φ(X ) is satisfiable and ψ(X ) is unsatisfiable iff diagnosisState P,T k,i (Si , Sk ) holds. We construct such a background B, states Si and Sk , and stages k and i, in polynomial time, as follows. Let F = {X 1 , . . . , X m , Y }, and let A = {D}. Take state(F) = init(F) = >, and define the other formulas in B as follows:


Figure 6. The evolutions of states reached at time step 2 by executing the plan P = hD, ∅i (shown by the paths from 0 to 2) and the trajectory in T (shown by the path from 0 to 2 with thick edges), as described in the proof of Theorem 7.3.

• act(F, A, F′) = (¬Y ∧ D ⊃ Y′) ∧ (Y ∧ D ∧ Y′ ⊃ φ(X′)) ∧ (¬Y ∧ ¬D ∧ Y′ ⊃ ψ(X ))
• goal(F) = ¬Y
• traj^T(F0 , A0 , F1 , A1 , F2 ) = tr(F0 , A0 , F1 ) ∧ tr(F1 , A1 , F2 ) ∧ Y0 ∧ ¬Y1 ∧ ¬Y2 .

Take P = hD, ∅i, i = 2, k = 0, S2 = {Y }, and S0 = {Y }. The possible evolutions of a state reached at time step 2, with respect to P , are represented by paths from 0 to 2 in Figure 6. The path with thick edges shows the goal-establishing trajectory in T . Informally, from an initial state in which Y is false, all states in which Y is true and only those can be reached by executing D; from an initial state in which Y is true, all states can be reached except those in which Y is true and φ(X ) does not hold. From a state S1 at stage 1, all states can be reached by doing nothing, except those in which Y is true, if Y is false in S1 and ψ(X ) does not hold in S1 . All trajectories in T are goal-establishing; they start in those states in which Y is true, and have Y false at stage 1.

We claim that φ(X ) is satisfiable and ψ(X ) is unsatisfiable iff diagnosisState^{P,T}_{0,2}(S2 , S0 ) holds, i.e., (S1) deviateState^{P,T}_{0,2}(S2 , S0 ) holds and (S2) ¬matchState^{P,T}_{1,2}(S2 ) ∧ ¬matchState^{P,T}_{2,2}(S2 ) holds.

For the only-if direction, suppose that φ(X ) is satisfiable and ψ(X ) is unsatisfiable. Then there is an evolution

S0 , {D}, S1 , ∅, S2    (17)

of S2 , according to P , such that Y is true at S1 . The evolution (17) of S2 matches every goal-establishing trajectory

S0 , {D}, S′1 , ∅, S′2    (18)

in T at time stamp 0, i.e., matchAt^{P,T}_{0,2}(S0 , S1 , S2 , S0 , S′1 , S′2 ) is true. Since Y is not true at state S′1 , we have S′1 ≠ S1 , so (S1) holds. Furthermore, since ψ(X ) is unsatisfiable, in every evolution (17) of S2 according to P , Y is true at S1 . Since in each trajectory (18) in T , Y is false in S′1 , matchAt^{P,T}_{1,2}(S0 , S1 , S2 , S0 , S′1 , S′2 ) is false (and similarly for stage 2, since Y is false at stage 2 in every trajectory in T ); hence, (S2) holds. Therefore, diagnosisState^{P,T}_{0,2}(S2 , S0 ) holds.

For the if-direction, suppose that diagnosisState^{P,T}_{0,2}(S2 , S0 ) holds, i.e., (S1) and (S2) hold, but either φ(X ) is unsatisfiable or ψ(X ) is satisfiable. Since (S1) holds, there exists an evolution (17) of S2 , according to P , such that S1 |= Y ∧ φ(X ). Consequently, by assumption, ψ(X ) must be satisfiable. This implies that there exists an evolution (17) of S2 , according to P , such that Y is false in S1 . Since the goal-establishing trajectory S0 , {D}, S1 , ∅, S′2


is in T , the sentence matchAt P,T 1,2 (S2 ) is true, and hence (S2) fails, which contradicts our assumption. This proves the claim, and hence Dp -hardness of recognizing a state-oriented diagnosis. u t

A.7. Proof of Theorem 7.4

Theorem 7.4 Given a background B, a reached state Si , and a time stamp i (i ≤ n), such that there is a discrepancy between Si and T relative to P at time stamp i, computing some point of failure (Sk , k) for the discrepancy is FNP//OptP[O(log n)]-complete.

Proof:
For a background B, a state Si , and a time stamp i (i ≤ n), such that there is a discrepancy between Si and T relative to P at time stamp i, we want to show that computing a state-oriented point of failure (Sk , k) for the discrepancy is FNP//OptP[O(log n)]-complete.

Membership. The computation of k in a state-oriented diagnosis (Sk , k) can be viewed as the optimization problem of finding the largest stage k such that the sentence

∃Fk deviateState^{P,T}_{k,i}(Si , Fk )    (19)

holds. Given a stage k, deciding whether (19) holds is in NP. Here, k has O(log |I|) many bits, where I is the problem input (B, Si , i). When k is known, the state Sk in a diagnosis (Sk , k) can be obtained in polynomial time by nondeterministically generating proper values S0 , . . . , Si−1 , S′0 , . . . , S′n for the existentially quantified variables in (19) so that the formula evaluates to true. Hence, computing a state-oriented diagnosis is in FNP//log.

Hardness. We show FNP//log-hardness by a polynomial-time reduction from X -MAXIMAL MODEL: Given a Boolean formula φ(Y) on atoms Y = {Y 1 , . . . , Y m } and a subset X ⊆ {Y 1 , . . . , Y m }, compute the X -part of a model M of φ(Y) such that M ∩ X is maximal, i.e., no model M 0 of φ(Y) exists such that M 0 ∩ X ⊃ M ∩ X , where a model M is identified with the set of atoms that are mapped to true. Completeness of this problem for FNP//log is shown in [8]. We will reduce X -MAXIMAL MODEL to computing some diagnosis in polynomial time in two parts, according to [8]. In Part 1, we will show that, for any instance φ(Y) of X -MAXIMAL MODEL, an instance f (φ(Y)) = (B, Si , i) of our problem input is constructible in polynomial time, such that f (φ(Y)) has some diagnosis. In Part 2, we will show that, from every diagnosis (Sk , k) for f (φ(Y)) and φ(Y), some X -maximal model M of φ(Y) can be constructed in time polynomial in the size of (Sk , k) and φ(Y) (provided φ(Y) is satisfiable). Without loss of generality, we assume that φ(Y) is satisfiable, and that in each model of φ(Y) some atom in X is true. We first prove that computing some history-oriented diagnosis for a discrepancy is FNP//log hard in two parts as follows.

Part 1. Informally, we construct (B, Si , i) as follows. Suppose that every evolution of a state Sn reached at time stamp n by executing the plan P starts at an initial state that corresponds to a model M of φ(Y). Consider two phases of such an evolution. In Phase 1, a counter C, which is set at the initial state to the number of atoms from X which are true in M , is decreased step by step until it reaches 0, and the model


Figure 7. The evolutions of states reached at time stamp n by executing the plan P = h∅, . . . , ∅i (shown by paths from 0 to n), where the dashed and the dotted edges symbolize the commitment of B to false and true, respectively, as described in the proof of Theorem 7.4. (In Phase 1 the counter C is positive and decreases step by step from its initial value; Phase 2 begins when C = 0.)

M is propagated by inertia. When C reaches 0, Phase 2 begins. At the beginning of Phase 2, a special fluent B is committed to be either true or false. Throughout Phase 2, i.e., until the end of the evolution, C is stuck at 0, and the value of B is propagated by inertia. The observation is made at stage n (i.e., i = n). In the observed state S_i, B is false, while in the trajectories in T, B is true at all stages.

With the construction above, for a history-oriented point of failure (S_k, k) between S_i and T relative to P at stage i, we consider an evolution whose Phase 1 is as long as possible, i.e., in which C is as large as possible, and whose Phase 2 has B mapped to true. That is, a model M corresponding to an initial state has a maximum number C of atoms of X set to true. Then, from (S_k, k), such a model M can be easily extracted.

More formally, let n = |X| > 0 and let

    F = {Y_1, . . . , Y_m} ∪ {C_{i,j} : i ∈ {0, . . . , n}, j ∈ {0, . . . , i}} ∪ {B}.

Intuitively, C_{i,j} expresses in the initial state that exactly j atoms among X_1, . . . , X_i of X are true, such that C_{n,j} is true iff the initial value of the counter C is j. Let A be the set consisting of the dummy action symbol D. We define the background B as follows:

• state(F) = ⊤.

• init(F) = φ(Y) ∧ set_counter_C, where

    set_counter_C = C_{0,0} ∧ ⋀_{i=1}^{n} (C_{i,0} ≡ C_{i−1,0} ∧ ¬X_i) ∧ ⋀_{i=1}^{n} ⋀_{j=1}^{i} (C_{i,j} ≡ ((C_{i−1,j−1} ∧ X_i) ∨ (C_{i−1,j} ∧ ¬X_i))).

  Here, formula set_counter_C defines inductively the value of C_{i,j} and thus of C.
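The following small sketch (illustrative Python; the toy assignment is an assumption for the example only) evaluates the C_{i,j} fluents exactly along this inductive definition and confirms that, at level n, C_{n,j} holds precisely for j equal to the number of true atoms among X_1, . . . , X_n:

    def counter_fluents(x_values):
        """x_values[i-1] is the truth value of X_i; returns {(i, j): bool},
        computed bottom-up as in set_counter_C."""
        n = len(x_values)
        C = {(0, 0): True}
        for i in range(1, n + 1):
            C[(i, 0)] = C[(i - 1, 0)] and not x_values[i - 1]
            for j in range(1, i + 1):
                C[(i, j)] = ((C.get((i - 1, j - 1), False) and x_values[i - 1]) or
                             (C.get((i - 1, j), False) and not x_values[i - 1]))
        return C

    # Example: X_1 and X_3 true, X_2 false; only C_{3,2} holds at level n = 3.
    C = counter_fluents([True, False, True])
    assert [j for j in range(4) if C[(3, j)]] == [2]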


• act(F, A, F′) = dec_C ∧ inertia, where

    dec_C = ¬C_{n,0} ⊃ (¬C′_{n,n} ∧ ⋀_{j=0}^{n−1} (C′_{n,j} ≡ C_{n,j+1}))

    inertia = (¬C_{n,0} ⊃ ⋀_{i=1}^{m} (Y′_i ≡ Y_i)) ∧ (C_{n,0} ⊃ (C′_{n,0} ∧ (B′ ≡ B))).

Formula dec_C decrements counter C in Phase 1 (when C_{n,0} is false, i.e., C > 0) by shifting the values of all C_{n,j}. Formula inertia carries on the model M in Phase 1, and propagates the values of C_{n,0} and B in Phase 2 (when C = 0).

• goal(F) = B.

• traj_T(F_0, A_0, . . . , F_n) = ⋀_{t=0}^{n−1} tr(F_t, A_t, F_{t+1}) ∧ init(F_0) ∧ ⋀_{t=0}^{n} B_t.

That is, T contains all trajectories from initial states in which B is always true.

• P = ⟨∅, . . . , ∅⟩.

We set S_i = S_n = {C_{n,0}} and i = n. With this construction of B, S_i, and i, there is a discrepancy between S_n and T relative to P at time stamp n, because B is false at S_n, whereas B is true at every state of every trajectory in T. Now we want to show that there is a point of failure for this discrepancy.

First, note that there is an evolution

    S_0, ∅, S_1, ∅, . . . , ∅, S_n    (20)

of S_n = {C_{n,0}} according to P. Indeed, from any model M of φ(Y), we can obtain a corresponding state S_0 such that init(S_0) holds, by assigning B the value true and all fluents C_{i,j} their intended values; in particular, C_{n,j} is true (i.e., counter C = j) iff j = |M ∩ X|. According to dec_C and inertia, the value of C is decreased in S_1, S_2, etc., while the value of each Y_i stays the same as in S_1, until C = 0 (i.e., C_{n,0} is true) at some state S_l, where l = |M ∩ X| and l > 0; the values of the C_{i,j}, i < n, are set to false in these states S_1, . . . , S_l; B is set to true in S_1, . . . , S_{l−1} and to false in S_l. In the states S_{l+1}, . . . , S_n, C_{n,0} is true while all other fluents (including B) are set to false.

Next, note that the trajectory

    S′_0, ∅, S′_1, ∅, . . . , ∅, S′_n,    (21)

where S′_i = S_i for i ∈ {0, . . . , l−1} and S′_i = S_i ∪ {B} for i ∈ {l, . . . , n} (i.e., B is made true in all states S_i where it is false), is a goal-establishing trajectory in T.

Then, due to the evolution (20) of S_n and the presence of the goal-establishing trajectory (21) in T, it is easy to see that the sentence matchUntil^{P,T}_{l−1,n}(S_0, . . . , S_n, S′_l, . . . , S′_n) is true. Thus the formula deviateHistory^{P,T}_{l−1,n}(S_{l−1}, S_n) holds. This implies that there is some history-oriented diagnosis (S_k, k) for the discrepancy detected at time stamp n such that k ≥ l − 1.
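The two-phase behaviour enforced by dec_C and inertia can be traced with a toy simulation (a sketch only; the compact state representation and the particular choice of making B false once C reaches 0 are assumptions mirroring the evolution (20), not consequences of the action description, which leaves B unconstrained in Phase 1):

    def evolve(counter, B, Y, steps):
        """States are triples (value of C, value of B, frozen Y-part of M)."""
        states = [(counter, B, Y)]
        for _ in range(steps):
            if counter > 0:            # Phase 1: dec_C decrements C, inertia keeps the Y_i;
                counter -= 1           # this run sets B false exactly when C reaches 0.
                B = (counter > 0)
            # Phase 2 (counter == 0): C stays 0, B is propagated by inertia.
            states.append((counter, B, Y))
        return states

    for t, (c, b, _) in enumerate(evolve(2, True, frozenset({"Y1", "Y3"}), 4)):
        print(t, c, b)   # 0 2 True, 1 1 True, 2 0 False, 3 0 False, 4 0 False

With |M ∩ X| = 2, the trace reaches C = 0 at stage l = 2 with B false from then on, which is exactly the shape of evolution used above; the goal-establishing trajectory (21) differs from it only by making B true from stage l onwards.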

Part 2. We claim that, for any history-oriented diagnosis (S_k, k) for S_n, the model

    M = {Y_j | Y_j is true in S_k, 1 ≤ j ≤ m}

has a maximal X-part among the models of φ(Y).


To prove this claim, note that, since the evolution (20) and the goal-establishing trajectories (21) are constructed for an arbitrary model M of φ(Y), k ≥ l* must hold, where l* = max_{M ⊨ φ(Y)} |M ∩ X| − 1.

We first show that (i) C_{n,0} is false in S_k and (ii) k ≤ l*, which means k = l*. The sentence deviateHistory^{P,T}_{k,i}(S_i, S_k) holds, witnessed by some evolution (20) of S_n according to P and a trajectory (21) in T such that S_j = S′_j for j ∈ {0, . . . , k}. Suppose that C_{n,0} is true in S_k. Since B must be true in S_k, it follows from inertia that B is also true in S_n, which is a contradiction. Therefore, (i) holds. For (ii), suppose that k > l*. Then S_{l*} = S′_{l*} holds. Since init(S_0) holds, it follows from dec_C and inertia that C_{n,0} is true in S_k, which contradicts (i). Thus, k = l* holds.

Since C_{n,0} is false in S_k and, by the assumption about φ(Y), false in S_0, it follows from dec_C and inertia that each Y_j has in S_k the same value as it has in S_0, for j ∈ {1, . . . , m}. Hence M is a model of φ(Y) such that |M ∩ X| − 1 = l*; equivalently, |M ∩ X| = max_{M′ ⊨ φ(Y)} |M′ ∩ X|. Clearly, such an M has a maximal X-part among the models of φ(Y). Since M can be easily extracted from (S_k, k), this completes Part 2 of the reduction.

We have thus proven FNP//log-hardness of computing some history-oriented diagnosis (S_k, k). Since, in the reduction above, init(S_0) holds for all initial states S_0 of trajectories in T, Proposition 2 of [11] implies FNP//log-hardness of computing some state-oriented diagnosis (S_k, k). □

References

[1] Ambrose-Ingerson, J. A., Steel, S.: Integrating planning, execution and monitoring, Proc. of the 7th National Conference on Artificial Intelligence (AAAI-88), St. Paul, MN, August 21-26, 1988.
[2] Balduccini, M., Gelfond, M.: Diagnostic reasoning with A-Prolog, Journal of Theory and Practice of Logic Programming, 3(4–5), 2003, 425–461.
[3] Baral, C., Kreinovich, V., Trejo, R.: Computational complexity of planning and approximate planning in the presence of incompleteness, Artificial Intelligence, 122(1-2), 2000, 241–267.
[4] Baral, C., McIlraith, S. A., Son, T. C.: Formulating diagnostic problem solving using an action language with narratives and sensing, Proceedings Seventh International Conference on Principles of Knowledge Representation and Reasoning (KR-00), Breckenridge, Colorado, USA (A. Cohn, F. Giunchiglia, B. Selman, Eds.), Morgan Kaufmann, 2000.
[5] Baral, C., Tran, N., Tuan, L.: Reasoning about actions in a probabilistic setting, Proc. of the Eighteenth National Conference on Artificial Intelligence (AAAI-02), July 28 - August 1, 2002, Edmonton, Alberta, Canada, 2002.
[6] Bastié, C., Régnier, O.: SPEEDY: Monitoring the execution in dynamic environments, Proc. Workshop on Reasoning about Actions and Planning in Complex Environments, International Conference on Formal and Applied Practical Reasoning (FAPR'96), Bonn, Germany, 1996. Available as Technical Report AIDA-9611, Fachgebiet Intellektik, Technische Hochschule Darmstadt, Germany.


[7] Beetz, M., McDermott, D.: Improving robot plans during their execution, Proc. of the Second International Conference on Artificial Intelligence Planning Systems (AIPS-94), June 13-15, 1994, University of Chicago, 1994.
[8] Chen, Z.-Z., Toda, S.: The complexity of selecting maximal solutions, Information and Computation, 119, 1995, 231–239.
[9] Cimatti, A., Pistore, M., Roveri, M., Traverso, P.: Weak, strong, and strong cyclic planning via symbolic model checking, Artificial Intelligence, 147(1-2), 2003, 35–84.
[10] De Giacomo, G., Reiter, R., Soutchanski, M.: Execution monitoring of high-level robot programs, Proceedings 6th International Conference on Principles of Knowledge Representation and Reasoning (KR-98), Trento, Italy, 1998.
[11] Eiter, T., Erdem, E., Faber, W.: Diagnosing plan execution discrepancies in a logic-based action framework, Technical Report INFSYS RR-1843-04-03, Vienna University of Technology, 2004.
[12] Eiter, T., Erdem, E., Faber, W.: Plan reversals for recovery in execution monitoring, Proceedings 10th International Workshop on Nonmonotonic Reasoning (NMR-2004), Action and Causality Track (J. P. Delgrande, T. Schaub, Eds.), 2004.
[13] Eiter, T., Erdem, E., Faber, W.: Undoing the Effects of Action Sequences, Technical Report INFSYS RR-1843-04-05, Institut für Informationssysteme, Technische Universität Wien, A-1040 Vienna, Austria, December 2004.
[14] Eiter, T., Erdem, E., Faber, W.: On reversing actions: Algorithms and complexity, Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI-07), Hyderabad, India, January 6-12, AAAI Press/IJCAI, 2007.
[15] Eiter, T., Faber, W., Leone, N., Pfeifer, G., Polleres, A.: A logic programming approach to knowledge-state planning, II: The DLVK system, Artificial Intelligence, 144(1–2), 2002, 157–211.
[16] Eiter, T., Faber, W., Leone, N., Pfeifer, G., Polleres, A.: A logic programming approach to knowledge-state planning: Semantics and complexity, ACM Transactions on Computational Logic, 5(2), 2004, 206–263.
[17] Eiter, T., Fink, M., Senko, J.: KMonitor – A tool for monitoring plan execution in action theories, Proceedings of the 8th International Conference on Logic Programming and Nonmonotonic Reasoning (LPNMR 2005) (C. Baral, G. Greco, N. Leone, G. Terracina, Eds.), number 3662 in LNCS, Springer, 2005.
[18] Eiter, T., Lukasiewicz, T.: Probabilistic reasoning about actions in nonmonotonic causal theories, Proceedings Nineteenth Conference on Uncertainty in Artificial Intelligence (UAI-2003), August 7-10, 2003, Acapulco, Mexico (C. Meek, U. Kjærulff, Eds.), Morgan Kaufmann Publishers, San Francisco, CA, 2003.
[19] Ferraris, P., Giunchiglia, E.: Planning as satisfiability in nondeterministic domains, Proceedings of the Seventeenth National Conference on Artificial Intelligence (AAAI'00), July 30 – August 3, 2000, Austin, Texas, USA, AAAI Press / The MIT Press, 2000.
[20] Fichtner, M., Großmann, A., Thielscher, M.: Intelligent execution monitoring in dynamic environments, Fundamenta Informaticae, 57(2–4), 2003, 371–392.
[21] Fikes, R. E., Hart, P. E., Nilsson, N. J.: Learning and executing generalized robot plans, Artificial Intelligence, 3(4), 1972, 251–288.
[22] Gelfond, M., Lifschitz, V.: Action languages, Electronic Transactions on AI, 3, 1998, 195–210.
[23] Giunchiglia, E.: Planning as satisfiability with expressive action languages: Concurrency, constraints and nondeterminism, Proceedings of the Seventh International Conference on Principles of Knowledge Representation and Reasoning (KR 2000), April 12-15, Breckenridge, Colorado, USA (A. G. Cohn, F. Giunchiglia, B. Selman, Eds.), Morgan Kaufmann, 2000.


[24] Giunchiglia, E., Lee, J., Lifschitz, V., McCain, N., Turner, H.: Nonmonotonic causal theories, Artificial Intelligence, 153(1-2), 2004, 49–104.
[25] Haigh, K. Z., Veloso, M.: Interleaving planning and robot execution for asynchronous user requests, Autonomous Robots, 5(1), 1998, 79–95.
[26] Hayashi, H., Cho, K., Ohsuga, A.: Mobile agents and logic programming, Proceedings of the Sixth International Conference on Mobile Agents (MA 2002), Barcelona, Spain, October 22-25, 2002 (N. Suri, Ed.), 2535, Springer, 2002.
[27] Hayashi, H., Cho, K., Ohsuga, A.: A new HTN planning framework for agents in dynamic environments, Computational Logic in Multi-Agent Systems, 4th International Workshop (CLIMA IV), Fort Lauderdale, FL, USA, January 6-7, 2004, Revised Selected and Invited Papers (J. Dix, J. A. Leite, Eds.), 3259, Springer, 2004.
[28] Kautz, H., Selman, B.: Planning as satisfiability, Proc. 10th European Conference on Artificial Intelligence (ECAI-92), Vienna, Austria, August 3-7, 1992.
[29] Koenig, S., Furcy, D., Bauer, C.: Heuristic search-based replanning, Proceedings of the Sixth International Conference on Artificial Intelligence Planning and Scheduling (AIPS), 2002.
[30] Kushmerick, N., Hanks, S., Weld, D. S.: An algorithm for probabilistic planning, Artificial Intelligence, 76(1-2), 1995, 239–286.
[31] Levesque, H. J., Reiter, R., Lesperance, Y., Lin, F., Scherl, R. B.: GOLOG: A logic programming language for dynamic domains, Journal of Logic Programming, 31(1-3), 1997, 59–83.
[32] Littman, M. L.: Probabilistic propositional planning: Representations and complexity, Proc. AAAI-97, 1997.
[33] McCain, N., Turner, H.: Satisfiability planning with causal theories, Proceedings Sixth International Conference on Principles of Knowledge Representation and Reasoning (KR-98) (A. G. Cohn, L. Schubert, S. C. Shapiro, Eds.), Morgan Kaufmann Publishers, 1998.
[34] McCarthy, J.: Elaboration tolerance, 1999, In progress.
[35] Nebel, B., Koehler, J.: Plan reuse versus plan generation: A theoretical and empirical analysis, Artificial Intelligence, 76(1-2), 1995, 427–454.
[36] Onaindia, E., Sapena, O., Sebastia, L., Marzal, E.: SimPlanner: An execution-monitoring system for replanning in dynamic worlds, Proceedings 10th Portuguese Conference on Artificial Intelligence (EPIA-2001), 2001.
[37] Papadimitriou, C. H.: Computational Complexity, Addison-Wesley, 1994.
[38] Reiter, R.: Knowledge in Action: Logical Foundations for Specifying and Implementing Dynamical Systems, MIT Press, 2001.
[39] Rintanen, J.: Constructing conditional plans by a theorem-prover, Journal of Artificial Intelligence Research, 10, 1999, 323–352.
[40] Rosenbloom, P., Laird, J., Newell, A.: The SOAR Papers: Readings on Integrated Intelligence, MIT Press, 1993.
[41] Son, T. C., Pontelli, E.: Planning with preferences using logic programming, Proceedings of the 7th International Conference on Logic Programming and Nonmonotonic Reasoning (LPNMR 2004) (I. Niemelä, V. Lifschitz, Eds.), number 2923 in LNCS, Springer, 2004.
[42] Soutchanski, M.: Execution monitoring of high-level temporal programs, Proc. of IJCAI Workshop on Robot Action Planning, 1999.


[43] Soutchanski, M.: High-level robot programming and program execution, Proc. of ICAPS Workshop on Plan Execution, 2003.
[44] Thielscher, M.: The concurrent, continuous Fluent Calculus, Studia Logica, 67(3), 2001, 315–331.
[45] Thielscher, M.: FLUX: A logic programming method for reasoning agents, Theory and Practice of Logic Programming, 5(4-5), 2005, 533–565.
[46] Turner, H.: Polynomial-length planning spans the polynomial hierarchy, Proc. of Eighth European Conf. on Logics in Artificial Intelligence (JELIA'02), 2002.
[47] van der Krogt, R., de Weerdt, M.: Plan repair as an extension of planning, Proc. Fifteenth International Conference on Planning and Scheduling (ICAPS-05), 2005.
[48] Wilkins, D.: Practical Planning: Extending the Classical AI Planning Paradigm, Morgan Kaufmann Publishers, 1988.
