ICAGENT: Balancing between reactivity and deliberation

Vangelis Kourakos Mavromichalis¹, George Vouros²

¹ Dept. of Information and Communication Systems, University of the Aegean
² Dept. of Mathematics, University of the Aegean
Karlovassi, Samos, Greece
{emav, georgev}@aegean.gr
Abstract. The aim of this paper is to present a framework for developing intelligent agents that act in dynamic and unpredictable environments by balancing between deliberation and reaction. Acting reactively, an agent executes routine tasks without considering its options and thinking about alternatives. Acting deliberatively, an agent deals with complex goals that need careful, although not detailed, planning, and reconciles its desires with its intentions. This paper emphasizes the utility of plan management techniques, as well as the integrated treatment of agents' mental structures during reaction and deliberation. The framework is thoroughly explained and the behavior of an agent in a specific environment is examined.
1. Introduction

To achieve their goals successfully in dynamic and unpredictable environments, agents must be able to generate plans towards their goals, monitor changes in their environment and change their plans accordingly. Generally, planning reactively, an agent builds and/or adjusts its plans in response to changes in the environment at execution time. Reactive planning, as mentioned in [4], is concerned with the difficulties of direct interaction with a changing world. However, for agents to exhibit robust behavior, they must also be able to deliberatively evaluate their options, consider all their desires and intentions, form strategic plans, and reason about the consequences of their actions [4,2]. Planning deliberatively, an agent may deal with goals that need careful, although in general not detailed, planning. During deliberation the agent considers all the options it has and prioritizes alternatives by reconciling desires and intentions. For agents to plan and act effectively in dynamic and unpredictable environments, they must decide in which situations they should react or deliberate, balancing and intermixing reaction and deliberation. Let us for instance assume that, while driving to work, an agent intends to withdraw money from the bank, do some shopping and pass through a specific point. Let us further assume that this agent plans her/his route deliberatively: the agent examines and reconciles alternative plans, and decides which alternative to pursue. Notice that this does not mean that the agent has a detailed plan for achieving her/his intentions. The
agent may have a partial plan that is further extended towards fine-grained (basic-level) actions. While the agent extends and refines the plan, she/he must decide which segments of the plan to form reactively and which ones deliberatively. As already noted, the agent chooses a route deliberatively, and starts following it. Certain parts of the route can be followed in a purely reactive way. This means that the agent does not consider other options she/he may have. The agent tries to achieve her/his goals by following the planned route and by reacting to changes in the environment. However, obstacles that are unpredictable (given the agent's abilities) and prevent her/him from following the planned route, such as heavy traffic due to an accident, may force her/him to re-schedule the route, change the sequence in which she/he pursues the goals, or even abandon some goals, deliberatively. Choosing which goal to abandon depends on the relative strength of the agent's desires and intentions, on her/his future (later in the day) intentions, as well as on contextual constraints.

The aim of this paper is to report on progress towards our efforts to establish a generic framework for developing agents that reason about their plans and balance between deliberation and reaction. Key issues towards this aim are the following:

• Provide the necessary reasoning tasks for agents to plan towards achieving their intentions, to monitor changes happening in their environment, to manage their desires and intentions, to assess alternatives, to cooperate with other agents and to change/extend their plans accordingly. This requires agents to be equipped with advanced plan management reasoning tasks [10].

• Provide a clear distinction between deliberation and reaction in terms of agents' reasoning tasks and management of agents' mental state. This requires considering agents' mental attitudes and reasoning tasks in an integrated way, which in turn involves distinguishing between deliberative and reactive planning in terms of desires, intentions and goals, as well as in terms of beliefs and constraints.

• Provide an explicit and as detailed as possible representation of agents' mental state, so as to support effective plan management and effective reactive, deliberative and cooperative behavior.

This paper considers that, planning reactively, an agent adjusts its plans during execution time, taking into account its current mental state and the intentions it has formed towards achieving a particular desire. Planning deliberatively, an agent plans towards achieving complex goals, and considers alternatives and further desires it may have. To distinguish between deliberative and reactive planning, this paper conjectures that deliberation requires agents to reconcile their desires among themselves as well as with their intentions. Reconciliation forces agents to assess their options and form a consistent set of intentions. This is in contrast to reactive planning, which does not consider desires at all: in this case the agent forms intentions and pursues its plans further by forming new intentions, without considering other options. Consequently, the distinction between deliberation and reaction is not based on whether an agent plans or invokes a hard-wired procedure, but on how plan operators (recipes) are utilized during planning. Whether a recipe will be utilized in a reactive or in a deliberative way depends on the agent developer, or on the agent's mental state.
To achieve the objectives and tackle the above-mentioned key issues, the proposed agent development framework is based on the collaborative planning framework of [5], which is further extended with advanced plan management tasks [10]. Note that although the proposed framework is based on a collaborative planning framework, it does not deal with social deliberation, since it does not currently handle shared plans. The paper focuses on the principles and reasoning tasks that distinguish between reaction and deliberation, and provides the basis for dealing with social deliberation.
2. Related work

As Pollack and Horty [10] indicated, for agents to achieve their goals in uncertain, unpredictable and dynamic environments, they must be able to form plans and manage them in an effective way. Plan management aims to tackle problems stemming from the classical assumptions that agents are omniscient, act in deterministic and instantaneous ways, have fixed goals, and inhabit a static environment. As proposed in [10], to manage plans effectively agents should be equipped with the following reasoning tasks (summarized as an interface sketch at the end of this section):

1. Plan generation: Agents must be able to perform classical planning.
2. Environment monitoring: Agents must be able to monitor their environment and focus their attention on specific environmental changes.
3. Commitment management: Agents must be able to reason about the relative strength of their commitments and to decide which to give up.
4. Alternative assessment: Given contextual constraints and their capabilities, agents must be able to assess the costs and the benefits of the options that are presented to them.
5. Plan elaboration: Agents must be able to interleave planning and execution in order to elaborate their partial plans. In dynamic and unpredictable environments agents may not have all the knowledge needed to complete their plans at planning time.
6. Meta-level control: For some activities agents may need to do a lot of careful planning, while for others they may settle for a less-than-optimal solution.
7. Cooperation with other agents: Agents acting in a multi-agent environment may need to cooperate (or at least interact) with other agents in order to realize their tasks.

Balancing between deliberation and reaction is a plan management problem because, as already noted, agents have to reason about the relative strength of their desires and intentions, assess their options, and determine how to generate and elaborate their plans. Consequently, balancing between deliberation and reaction, agents determine how they shall change their focus of attention to environmental changes. For instance, during deliberation agents are forced to consider all their options, while during reaction they commit to the achievement of a specific goal by forming intentions towards it.

Major agent frameworks that intermix deliberative with reactive behavior are dMARS [7] and InteRRaP [9]. dMARS provides a framework for developing agents capable of performing in dynamic and unpredictable environments. dMARS does not support distinguishing in a comprehensive way plans (or plan portions) that are generated in a reactive or in a deliberative way. The abilities of the framework to support balancing between deliberation and reaction are further constrained, since agents' mental attitudes are not represented and handled in an explicit way. InteRRaP uses a hybrid layered architecture, which combines reactive and deliberative behavior and incorporates the ability to interact with other agents. This architecture distinguishes reactive and deliberative behavior using two different control
and knowledge layers: the behavior-based layer, incorporating reactive and procedural knowledge for routine tasks, and the local planning layer, which provides the facilities for means-ends reasoning towards the achievement of complex actions, producing goal-directed behavior. InteRRaP distinguishes between deliberation and reaction by goal classification (reaction, local planning, cooperative) and by distinguishing the above-mentioned layers. It does not consider reconciling desires and intentions, which would involve dealing with incompatibilities between plans. Furthermore, InteRRaP does not arbitrarily intermix deliberative and reactive behavior: the local planning layer can invoke procedures from the behavior-based layer, but procedures cannot invoke reactors from the behavior-based layer or establish a goal state in the local planning layer.

HAC [1] is a framework for building agents that act in dynamic environments. HAC, among others, deals with the problems of reacting to changing environments in a timely manner, integrating reactive and cognitive processes to achieve abstract goals, and interleaving planning with execution. However, it does not clearly distinguish between deliberation and reaction in terms of agents' reasoning tasks and management of their mental states. It assumes that, in a hierarchical plan, high-level actions are of a deliberative character, while lower-level actions are more of a reactive character. Consequently, it assumes that reaction and deliberation do not need extra mechanisms to implement them.

Another approach is the one used for developing AT-Humboldt [8]. Intentions in this approach correspond to long-term plans (which can be partial) for achieving a chosen goal. The execution of each intention is then split into short-term plans that are precompiled plan skeletons. In this approach, each new piece of sensor information triggers a complete deliberation process, which results in new plans being created. Therefore, although agents built this way react to changes in the environment, they do not balance between reaction and deliberation effectively. The major problem arises from the fact that new commitments are formed independently from previous intentions.

A hybrid reactive-deliberative system is presented in [2]. The reactive component of the system, Hap, starts with a predefined set of goals and hand-coded plans for achieving them. In case this component has no stored plan for handling a situation, it calls the deliberative planner Prodigy. This system neither balances between reaction and deliberation, nor does it intermix these planning modes. Furthermore, the conditions under which Prodigy is called are contained in pre-defined Hap productions.

As already mentioned, to support agents in balancing between planning deliberatively and planning reactively, and in deliberating in an effective way, an agent development framework needs to provide an explicit and as detailed as possible representation of agents' mental state. Furthermore, although not a goal of this paper, to support social deliberation the framework should provide collaborative planning facilities, so that agents can ascribe attitudes to each other, share goals, plan towards these goals and integrate their individual plans towards their shared goal. Towards these directions, we propose using the framework for collaborative planning proposed by B. Grosz and S. Kraus in [5].
This framework supports constructing agents with advanced plan generation and elaboration abilities, and provides the basis for designing agents with advanced plan management facilities.
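To make the scope of the seven plan-management reasoning tasks of [10] concrete, the following minimal Python sketch (our illustration, not code from [10] or from ICAGENT; all names are hypothetical) casts them as an abstract agent interface:

```python
from abc import ABC, abstractmethod

class PlanManagingAgent(ABC):
    """Hypothetical interface for the seven plan-management
    reasoning tasks of Pollack and Horty [10]."""

    @abstractmethod
    def generate_plan(self, goal):
        """1. Plan generation: classical planning towards a goal."""

    @abstractmethod
    def monitor_environment(self, focus):
        """2. Environment monitoring: attend to specific changes."""

    @abstractmethod
    def manage_commitments(self):
        """3. Commitment management: rank commitments, drop the weakest."""

    @abstractmethod
    def assess_alternatives(self, options, context):
        """4. Alternative assessment: weigh costs/benefits of options."""

    @abstractmethod
    def elaborate_plan(self, partial_plan):
        """5. Plan elaboration: refine partial plans while executing."""

    @abstractmethod
    def meta_control(self, activity):
        """6. Meta-level control: decide how much planning effort to spend."""

    @abstractmethod
    def cooperate(self, other_agents, task):
        """7. Cooperation: interact with other agents on shared tasks."""
```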
3. The Tile-world example

The tile-world example proposed by M. Pollack and M. Ringuette [11] provides a dynamic and unpredictable environment that abstracts many real-world tasks in which an agent needs to balance between deliberation and reaction, such as the "driving to work" example presented in section 1. However, for the purposes of this paper a variant of this example is utilized: the tile-world is a chessboard-like environment with empty squares, obstacles, holes and an agent that carries a number of tiles with it. The agent is a unit square. The goal of the agent is to put all its tiles into the holes, making the fewest possible moves. Obstacles are blocks that the agent has to move in order to reach the holes. Holes and blocks change their position on the board randomly. Each hole can be filled by only one tile. It is not possible to have a block over a hole or to have two blocks one over the other. The agent is able to move in all directions, except diagonally, by one square per move. The agent is also able to move an obstacle to a neighboring empty square, except the square that is behind the agent. This environment is dynamic, because it continuously changes, and non-deterministic, because at each time point the next state of the environment cannot be determined by its current state and the actions performed by the agent. Figure 1 shows a configuration of a 5x5 chessboard with 8 obstacles and 2 holes. The agent must fill the two holes with its tiles, performing the fewest possible moves. We assume that the agent is fully aware of the changes in the environment and therefore, at any time point, it knows the complete chessboard configuration. For each existing hole the agent forms a desire to fill it with a tile.
[Figure: a 5x5 grid (columns A-E, rows 1-5) showing the agent, two holes and eight obstacles. Notation: Agent, Hole, Obstacle.]
Figure 1: A configuration of the tile-world chessboard
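To make the setting concrete, here is a minimal Python sketch of the tile-world just described (a hypothetical illustration of ours, not ICAGENT's implementation; obstacle pushing and the random relocation of holes and blocks are omitted):

```python
import random

class TileWorld:
    """Chessboard-like grid with an agent, holes and obstacles.
    Relocation of holes/blocks and obstacle pushing are not modeled
    in this sketch."""
    MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

    def __init__(self, size=5, holes=2, obstacles=8, tiles=2):
        self.size = size
        self.agent = (0, 0)            # agent position (row, col)
        self.tiles = tiles             # tiles the agent carries
        cells = [(r, c) for r in range(size) for c in range(size)
                 if (r, c) != self.agent]
        random.shuffle(cells)
        self.holes = set(cells[:holes])
        self.obstacles = set(cells[holes:holes + obstacles])

    def step(self, direction):
        """Move the agent one square (never diagonally); fill a hole
        with one tile on arrival. Returns False if the move is blocked."""
        dr, dc = self.MOVES[direction]
        r, c = self.agent[0] + dr, self.agent[1] + dc
        if not (0 <= r < self.size and 0 <= c < self.size):
            return False               # off the board
        if (r, c) in self.obstacles:
            return False               # blocked (pushing not modeled here)
        self.agent = (r, c)
        if self.agent in self.holes and self.tiles > 0:
            self.holes.remove(self.agent)   # each hole takes exactly one tile
            self.tiles -= 1
        return True
```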
Given the configuration depicted in Figure 1, the agent plans deliberatively and commits to filling the hole that is nearest to it, i.e. the hole at D5. Towards filling this hole, the agent finds the shortest route to D5 (i.e. D2, D3, D4, D5) without planning in detail all the actions necessary to reach that hole. Doing so, the agent has a partial plan to D5, which it must further refine. This behavior is justified by the fact that the environment is dynamic and unpredictable: the agent may not have all the necessary information to complete the plan. For this reason the agent refines the partial plan and moves along the selected path mainly in a reactive way, interleaving planning with execution. Let us suppose that the configuration of the chessboard does not change until the agent reaches D3. At this point the agent forms the belief that it cannot move the obstacle at D4, and proceeds deliberatively to reconcile its intention to fill D5 with the desire to fill A3, given the new evidence about its route to hole D5. Therefore, the agent balances between reactive and deliberative planning taking into
account its mental state and contextual parameters. The agent further intermixes reactive and deliberative planning in an arbitrary way. For example, it first plans its route to D5 deliberatively, then it tries to refine that plan reactively and finally, given the new evidence about its route to D5, it deliberates again, re-thinking its intentions with respect to its desires. At this point we must notice that reaction does not require reconciliation between agent intentions and desires. For instance, during reaction the agent may simply react to an empty hole and form an intention to fill it. Doing so, when that hole changes its position, the agent will simply "chase" it without considering its options. On the contrary, during deliberative planning the agent plans its actions carefully, resolving all conflicts between desires, intentions and possible actions towards achieving the formed intentions. Doing so, when the agent finds a hole that is nearer to it, it will "re-think" its options by reconciling its desires and intentions.
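The walkthrough above amounts to the following control loop, sketched in Python on top of the hypothetical TileWorld class given earlier: the agent deliberates to choose the nearest reachable hole, follows the chosen route reactively without reconsidering its options, and returns to deliberation only when the route becomes blocked or the committed hole disappears.

```python
from collections import deque

def shortest_path(world, start, goal):
    """BFS over free squares; returns a list of cells or None."""
    frontier, seen = deque([[start]]), {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path
        r, c = path[-1]
        for dr, dc in TileWorld.MOVES.values():
            nxt = (r + dr, c + dc)
            if (0 <= nxt[0] < world.size and 0 <= nxt[1] < world.size
                    and nxt not in world.obstacles and nxt not in seen):
                seen.add(nxt)
                frontier.append(path + [nxt])
    return None

def run(world):
    while world.holes and world.tiles:
        # Deliberation: reconcile desires (one per hole) into a single
        # intention by committing to the hole with the cheapest route.
        routes = {h: shortest_path(world, world.agent, h) for h in world.holes}
        routes = {h: p for h, p in routes.items() if p}
        if not routes:
            break                       # no reachable hole: give up
        target = min(routes, key=lambda h: len(routes[h]))
        # Reaction: follow the committed route step by step, without
        # reconsidering options, until it succeeds or becomes blocked.
        for cell in routes[target][1:]:
            dr, dc = cell[0] - world.agent[0], cell[1] - world.agent[1]
            direction = next(d for d, v in TileWorld.MOVES.items()
                             if v == (dr, dc))
            if not world.step(direction):
                break                   # blocked: deliberate again
            if target not in world.holes:
                break                   # committed hole filled (or gone)
```

The key design point, mirroring the prose: options are re-assessed only inside the outer (deliberative) loop; the inner (reactive) loop commits blindly to the formed intention.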
4. The ICAGENT Framework

As Figure 2 shows, the overall ICAGENT architecture comprises two main modules: the Deliberation Control Unit (DCU) and the Plan Elaboration and Realization Control Unit (PERCU). Both modules can access and update the agent's knowledge base. The knowledge base includes the agent's mental state in terms of beliefs. Beliefs concern the agent's physical environment, agent capabilities, desires, intentions and goals, as well as recipes for achieving goals. Situation rules represent the conditions under which the agent must act. Furthermore, the context [5] includes all operators, functions and predicates on actions, as well as the plans that are formed for achieving goal states. Agents use the context to track their individual and collaborative plans, as well as the constraints that should hold during their activity.
Figure 2: Overall ICAGENT architecture. [Diagram: the environment feeds facts, via Perception, into the Deliberation Control Unit (Situation Recognition and Reconciliation) and the Plan Elaboration and Realization Control Unit (Plan Elaboration and Intention Realization); both units read and update the knowledge base, which holds facts, situation rules, beliefs, desires, goals, intentions, capabilities, recipes and the context.]

Notation for the states of an action in a plan:
• The agent has a desire for the action.
• The agent has a desire and an intention to find a recipe for that action.
• The agent has a desire for the action and a recipe for that action.
• The desire for the action is transformed into a goal.
• The agent has an intention for the action.
• The agent has executed the action.

Numbered checks in the figure:
1: Checks the mental conditions of situation recognition rules.
2: Checks the mental conditions of a relevant recipe.
3: Checks the capability constraints.
4: Checks the context constraints.
5: Checks the context constraints of all ancestor recipes.
6: The action has been reconciled successfully.
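Read as pseudocode, the figure corresponds to the following cycle (a hypothetical Python rendering of the Figure 2 data flow, using our own names for the knowledge-base and environment interfaces; it is not ICAGENT's actual API):

```python
def icagent_cycle(kb, environment):
    """One pass through the Figure 2 data flow.

    Hypothetical sketch: `kb`, `environment` and all of their
    methods are assumed interfaces, not ICAGENT's actual API."""
    # --- Deliberation Control Unit ---
    facts = environment.perceive()               # Perception (check 1)
    kb.assert_beliefs(facts)                     # beliefs about new facts
    desires = kb.match_situation_rules()         # Situation Recognition (3,4,5)
    goals = kb.reconcile(desires)                # Reconciliation (6): desires
                                                 # made consistent with intentions
    # --- Plan Elaboration and Realization Control Unit ---
    for goal in goals:
        recipe = kb.select_recipe(goal)          # Plan Elaboration (2,3,4,5)
        if recipe is not None:
            kb.form_intention(goal, recipe)      # commit to the action
    for intention in kb.executable_intentions():
        environment.execute(intention)           # Intention Realization
```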
4.1 Knowledge Base

Beliefs: Beliefs represent knowledge about the physical environment and agents' mental state. Beliefs have the following form:

bel(T, Agent, C, fact(FT, Prop))    (1)

where Agent is an agent id, C is a certainty factor about the occurrence of fact(FT, Prop), FT is the time point at which fact(FT, Prop) occurred, T is the time point at which the agent learned about the occurrence of fact(FT, Prop), and Prop is a proposition of one of the following forms:

• bel(T', Agent', C', fact(FT', Prop')), i.e. the agent may have beliefs about other agents' mental attitudes.

• desire_to(Agent, Action), where Agent is an agent id and Action may be a complex or a basic-level action. A desire is an abstract notion that specifies a preference to achieve a particular mental or external-world state or to take over an action. The agent may hold an inconsistent set of desires.

• goal_to(Agent, Action), where Agent and Action are as specified above. Following Cohen and Levesque [3], goals are desires that have passed the reconciliation process. Therefore the goals of an agent are always consistent among themselves as well as with the agent's intentions.

• intent_to(Agent, Action), where Agent and Action are as specified above. Intentions represent the commitment of the agent to perform the Action. intent_to corresponds to the Int.To mental attitude proposed in [5], i.e. the agent either has a complete plan to perform Action, or has a partial plan but knows a way to elaborate that plan in order to complete it. Intentions are consistent with the goals of the agent as well as among themselves: this holds only for the intentions that correspond to actions that the agent plans deliberatively.

• cap(Agent, Action). Capabilities represent the agent's ability to perform Action, where Action is a basic-level action.

Beliefs persist. Moreover, the agent remembers all the beliefs it held at previous time points.

Recipes: Recipes have the general form

rec(Action, Mental Condition, Type, Capability Constraints, Contextual Constraints, Actions List, Effects)

where:

• Action is of the form Action_Name(List of Action arguments). Action arguments may be either constants or variables.

• Mental Condition is a logical proposition that combines beliefs using the and, or and not logical connectives. Mental conditions are checked when the agent tries to find a recipe that is relevant to achieving a goal. Mental conditions may also determine whether a goal will be achieved deliberatively or reactively.

• Type is of the form (Recoverability, Deliberation/Reaction). (a) Recoverability denotes whether the recipe is towards a recoverable or an irrecoverable action. For recoverable actions, the agent interleaves planning with execution. On the contrary, for irrecoverable actions, it forms the complete plan (deliberatively or reactively) before starting to execute it. This argument is specified by the agent developer.
(b) Deliberation/Reaction denotes whether the recipe must be elaborated in a deliberative or in a reactive way. This argument can either be specified by the agent developer or be determined by the recipe's mental conditions. It must be noticed that during planning, each recipe inherits the deliberation/reaction characterization of its parent recipe. However, this is subject to change due to the mental conditions of the new recipe. It is this argument that specifies whether the elaboration of this recipe should form further desires (deliberation) or form intentions that need no reconciliation (reaction). Furthermore, it is this argument that determines whether the capability and contextual constraints and effects of the recipe will be reconciled with the corresponding constraints and effects of the other plan recipes. An example recipe is the following (shown as in the source, which breaks off within the contextual constraints):

rec( fill(T, hole(H)),                    % Action
     bel(true),                           % Mental Condition
     type(recoverable, deliberative),     % Type
     bel(true),                           % Capability Constraints
     bel(path(H, Path, Cost)) AND
     bel(path(H1, Path1, Cost1)) AND
     bel(Cost ...                         % Contextual Constraints (text breaks off here in the source)
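For illustration, the belief and recipe forms above can be transcribed into plain data structures. The following Python sketch is ours, not the paper's notation; the final constraint of the fill recipe is truncated in the source, so its completion below (Cost <= Cost1) is an assumption drawn from the "fill the nearest hole" behaviour of section 3, and kb.path_cost is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List

@dataclass
class Belief:
    """bel(T, Agent, C, fact(FT, Prop)): at time T, Agent learned with
    certainty C that Prop held at time FT."""
    t: int                 # time the agent learned about the fact
    agent: str             # agent id
    certainty: float       # certainty factor C
    fact_time: int         # time FT the fact occurred
    prop: Any              # a nested Belief, desire_to, goal_to, intent_to, ...

@dataclass
class Recipe:
    """rec(Action, MentalCond, Type, CapConstr, CtxConstr, Actions, Effects)."""
    action: str
    mental_condition: Callable[..., bool]
    recoverability: str                    # "recoverable" or "irrecoverable"
    mode: str                              # "deliberative" or "reactive"
    capability_constraints: Callable[..., bool]
    contextual_constraints: Callable[..., bool]
    actions: List[str] = field(default_factory=list)   # sub-actions (plan body)
    effects: List[str] = field(default_factory=list)

# The fill(T, hole(H)) recipe of the truncated example. The comparison
# Cost <= Cost1 is our guess at the elided constraint ("fill the nearest
# hole"); actions and effects are elided in the source.
fill_nearest = Recipe(
    action="fill(T, hole(H))",
    mental_condition=lambda kb: True,              # bel(true)
    recoverability="recoverable",
    mode="deliberative",
    capability_constraints=lambda kb: True,        # bel(true)
    contextual_constraints=lambda kb, h, h1:
        kb.path_cost(h) <= kb.path_cost(h1),       # H is the cheapest hole
)
```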