$-Calculus Bounded Rationality = Process Algebra + Anytime Algorithms

Eugene Eberbach
Jodrey School of Computer Science, Acadia University, Wolfville, NS B0P 1X0, Canada
[email protected]
Abstract
$-calculus is a higher-order polyadic process algebra with a utility (cost) that integrates deliberative and reactive approaches to action selection in real time, and allows bounded optimization and metareasoning to be captured in distributed interactive AI systems. In this paper we present the basic notions of $-calculus and demonstrate the versatility of its kΩ meta-control for planning the behaviors of single and multiple agents. The approach can be understood as a generic proposal for a computational theory of AI based on search and optimization under bounded resources.

Keywords: bounded rationality, process algebras, anytime algorithms
1 Introduction

Decision-making can be understood as a search problem whose goal is to arrive at the best possible decision. Recently, there has been a shift from the consideration of optimal decisions in games to the consideration of optimal decision-making programs for dynamic, inaccessible, complex environments such as the real world. Perfect rationality is impossible in such environments because of prohibitive deliberation complexity. Anytime algorithms, e.g., Dean, Horvitz, Russell, Zilberstein [1, 7, 10, 12], attempt to trade off result quality for the time or memory needed to generate results. Bounded rational agents are ones that always take the actions that are expected to optimize their performance measure, given the percept sequence they have seen so far and the limited resources they have. Process algebras, e.g., Hoare's CSP, Milner's CCS or the π-calculus [8], with basic programming operators, have been used to study the behaviors of interactive multi-agent systems, leading to models more expressive than Turing Machines, e.g., Interaction Machines [11]. By extending process algebra operators with von Neumann/Morgenstern costs/utilities, anytime algorithms can be viewed as a basis for a general theory of computation. As a result, we shift the computational paradigm from the design of agents achieving one-time goals to agents that persistently attempt to optimize their happiness. We call this approach the $-calculus (pronounced "cost calculus"), which is a higher-order polyadic process algebra with a utility (cost) allowing bounded optimization and metareasoning to be captured in distributed interactive AI systems. $-calculus extends performance measures beyond time to include answer quality and uncertainty, using kΩ-optimization to deal with spatial and temporal constraints in a flexible way. This is a very general model, just as neural networks or genetic algorithms are, leading to a new programming paradigm (cost languages) and a new class of computer architectures (cost-driven computers). In this paper, we present the basic notions of $-calculus and demonstrate the versatility of its kΩ-optimization for the case of single and multiple agents.

(Research partially supported by a grant from NSERC No. OGP0046501.)
2 Bounded Rationality

After years of often successful but divergent research, we are back at the roots. A unified view of AI as an area designing intelligent agents currently dominates (see, e.g., [9]). An agent is an embodied (e.g., robot or animal) or disembodied (e.g., softbot) creature that perceives its environment through sensors and acts upon that environment through effectors in order to survive and prosper (i.e., living a natural or artificial life). We can distinguish the following agent classes with a growing level of intelligence (i.e., complexity of problems being tackled at the price of effectiveness):
- Reactive agents respond immediately either to percepts only, or to percepts and their internal state.
- Goal-based agents act so that they will achieve their goal(s).
- Utility-based agents try to maximize their own happiness.
A rational (intelligent) agent is one that does the right thing, using some performance measure to determine how successful it is [10, 9]. Reactive agents use condition-action (if-then) rules encoded in software (e.g., expert systems or logic clauses) or hardware (e.g., behavior-based robots like Brooks' subsumption architecture or Arbib/Arkin's motor-schema robots). They work in sense-act cycles. Goal-based agents use goal states to describe situations which are desirable. Search and planning are used to find actions that lead to the agent's goal. In other words, they use some kind of deliberation about the future (i.e., will I achieve a goal if I choose a given action?). GPS and Strips/Shakey represent that approach. They work in sense-deliberate-act cycles. Goals provide a more flexible way to deal with changing situations than the "hardwired" answers of reflexive/reactive agents, applying to a larger class of problems at the price of lower efficiency for each specific problem instance. However, goals do not capture that the same result can be achieved more quickly, more reliably, or more cheaply than others. Goals just provide a crude distinction between "happy" and "unhappy" states, in the sense of having higher utility for the agent. Utility-based agents allow rational decisions where goals are hard or impossible to specify:
1. when there are conflicting goals, only some of which can be achieved (for example, speed and safety), the utility function specifies the appropriate trade-off,
2. when there are multiple goals that the agent can aim for, none of which can be achieved with certainty, utility provides a way in which the likelihood of success can be weighed against the importance of the goals,
3. they allow the price of deliberation to be taken into account - a part of the utility function expresses the costs of meta-reasoning (i.e., the deliberation length can be flexible).
Utility is a function that maps a state onto a real number, which describes an associated degree of happiness. Utility-based agents have roots in heuristic search algorithms (e.g., A*, IDA*, SMA*, hill climbing, simulated annealing, minimax, alpha-beta, game theory, utility theory, dynamic programming). Newer approaches include evolutionary computation (with fitness as the utility function), anytime algorithms, decision-theoretic metareasoning, dynamic belief networks, dynamic decision networks, various forms of machine learning, and $-calculus (with a cost function as the utility measure). An ideal rational (intelligent) agent is one that always takes the action that is expected to optimize its performance measure, given the percept sequence it has seen so far.
However, this requires the unrealistic assumption that an agent has unlimited resources (e.g., infinite speed of computation, or infinite memory). Unfortunately, most of the real problems requiring intelligence are computationally intractable (either unsolvable or exponentially complex), thus the perfect optimization required for intelligence is out of the question. A limited (bounded) rational agent is one that always takes the action that is expected to optimize its performance measure, given the percept sequence it has seen so far and the limited resources it has. A bounded rational agent solves constraint optimization problems, taking into account both the quality of a solution and the costs of the resources used to obtain it. Bounded optimality fits the intuitive idea of intelligence and provides a bridge between theory and practice. Herbert Simon states that organisms adapt well enough to "satisfice" (i.e., limited optimization); they do not, in general, "optimize" (i.e., perfect optimization).
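This trade-off can be made concrete with a toy anytime performance profile. The following sketch is our own illustration, not part of the calculus: the quality curve and the price per deliberation step are invented for the example. It shows an agent that keeps deliberating only while one more step of deliberation is expected to pay for itself, i.e., it optimizes solution quality minus resource cost rather than quality alone:

    def quality(t):
        # Hypothetical anytime profile: quality improves with diminishing
        # returns as deliberation time t grows.
        return 1.0 - 0.9 ** t

    def resource_cost(t, price_per_step=0.02):
        # Hypothetical linear price of deliberation (time as a resource).
        return price_per_step * t

    def deliberate():
        # Stop when one more step no longer pays for itself: the
        # constraint-optimization view of bounded rationality above.
        t = 0
        while (quality(t + 1) - resource_cost(t + 1)) > (quality(t) - resource_cost(t)):
            t += 1
        return t, quality(t) - resource_cost(t)

    steps, net_utility = deliberate()
    print(steps, net_utility)  # the bounded-optimal deliberation length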
3 $-Calculus Basic Notions

$-calculus (pronounced "cost calculus") is a formalization of resource-bounded computation. It was developed at Acadia University in the 1990s [3, 5, 6]. $-calculus is a higher-order polyadic process algebra with a utility (cost), integrating deliberative and reactive approaches to action selection in real time, and allowing bounded optimization and metareasoning to be captured in distributed interactive AI systems. Its formal roots are Milner's π-calculus [8] (a basic theory of concurrency) and Wegner's interaction machines [11] (an interactive agent model). Regarding expressiveness, it can express formalisms having richer behaviors than Turing Machines, including the π-calculus, cellular automata, interaction machines, neural nets, and random automata networks [5]. $-calculus leads to a new programming paradigm (cost languages) [2] and a new class of computer architectures (cost-driven computers). It has been applied to the Office of Naval Research SAMON robotics testbed to derive GBML (Generic Behavior Message-passing Language) for behavior planning, control and communication of heterogeneous Autonomous Underwater Vehicles (AUVs) [4]. It was also used in the DARPA Reactive Sensor Networks Project at ARL Penn State for empirical cost profiling. In general, it is applicable to robotics, software agents, neural nets, and evolutionary computation. Potentially it could be used for the design of cost languages, cellular evolvable cost-driven hardware, DNA-based computing and molecular biology, electronic commerce, and quantum computing. In $-calculus everything is a cost expression: agents, environment, communication/interaction links, inference engines, modified structures, data, code, and meta-code. The syntax of $-expressions (in prefix notation) is the smallest set P which includes the following cost expressions (assuming P, Q, Pi are arbitrary cost expressions):
Composite (interruptible) $-expressions C:

    C ::= ( ◦ i∈I Pi )          sequential composition
        | ( ‖ i∈I Pi )          parallel composition
        | ( ∪ i∈I Pi )          cost choice
        | ( ⊔ i∈I (αi Pi) )     general choice
        | ( f Q~ )              call of a (user) defined $-expression (application)
        | ( := (f X~) P )       recursive definition (abstraction)

Simple (contract) $-expressions α (considered to be executed in one atomic, indivisible step):

    α ::= ( $ P )               cost
        | ( → i∈I (ai Q~i) )    send
        | ( ← i∈I (ai X~i) )    receive
        | ( ↦ i∈I (ai Q~i) )    mutation
        | ( a Q~ )              call of a (user) defined simple $-expression
        | ( ¬a Q~ )             negation of a (user) defined simple $-expression
Sequential composition is used when $-expressions are evaluated in textual order. Parallel composition is used when expressions run in parallel; it picks a subset of non-blocked elements at random. Cost choice is used to select the cheapest alternative according to a cost metric. General choice picks one non-blocked element at random. General choice is different from cost choice: it uses guard satisfiability, whereas cost choice is based on cost functions. A wrong choice can be corrected in cost choice, but not in general choice. Call and definition encapsulate expressions in a more complex form (like procedure or function definitions in programming languages). In particular, they specify recursive or iterative repetition of $-expressions. An empty sequential composition (for an empty indexing set I) is denoted by ε, and an empty parallel composition, general choice and cost choice are denoted by ⊥. Simple cost expressions execute in one atomic step. Cost functions are used for optimization and adaptation. They currently consist of standard, probabilistic, and fuzzy cost functions. The user is free to define his/her own cost metrics. Send and receive perform handshaking message-passing communication, and inferencing. Mutation is like sending through a noisy channel. Additionally, a user is free to define her/his own simple $-expressions, which may or may not be negated.
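To make the two choice operators concrete, here is a minimal sketch in Python (our own toy encoding, not the paper's operational semantics) of how cost choice and general choice select among alternatives represented as (cost, result) pairs:

    import random

    def cost_choice(alternatives):
        # Cost choice: deterministically select the cheapest alternative,
        # as judged by the cost function; in the full calculus a wrong pick
        # can later be corrected by cost backtracking.
        return min(alternatives, key=lambda alt: alt[0])

    def general_choice(alternatives, blocked=lambda alt: False):
        # General choice: pick one non-blocked alternative at random;
        # guard satisfiability, not cost, drives the selection.
        enabled = [alt for alt in alternatives if not blocked(alt)]
        return random.choice(enabled)

    branches = [(5.0, "slow-safe"), (2.0, "fast-risky"), (3.5, "medium")]
    print(cost_choice(branches))     # always (2.0, 'fast-risky')
    print(general_choice(branches))  # any enabled branch, at random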
The operational cost semantics of $-calculus is defined in the way traditional for process algebras, using inference rules and a Labeled Transition System (LTS). The new aspect is that the LTS looks not for just any action to execute, but for the action with minimal cost. Strong and weak (observation) congruence (bisimulation) defines an equivalence of totally or partially observable $-expressions. The difference is that for observation bisimulation the cost of invisible $-expressions is neutral (equal to 0), while for strong bisimulation it is estimated (i.e., not equal to zero, thus influencing optimization directly).
4 Anytime Meta-Control: Cost Functions as Performance Measures, a Simple Modifying Algorithm, and kΩ-Optimization

Cost (utility) functions $ represent a uniform criterion of adaptation and control. They have their roots in von Neumann/Morgenstern utility theory. They can represent anything, in particular uncertainty, time, or available resources. In $-calculus there are predefined crisp, probabilistic and fuzzy cost functions, but the user is free to define his/her own $-functions. To compute the cost, we use the following principle: "the cost of a problem consisting of subproblems is a function of the costs of the subproblems", which in the terms of utility theory is called separability of costs. This function may be difficult to find, so a cost may also be assigned to the program as a whole. Let P be the set of $-expressions. The user specifies the initial costs of all simple terminals, functions, and modifications (costs of simple cost expressions - elements from T, F and M), or they are derived on the basis of statistical profiling. They can be learned using methods from adaptive dynamic programming, temporal difference learning, or Q-learning [9]; i.e., costs are dynamic entities. $-calculus has three predefined typical cost functions: crisp, probabilistic and fuzzy. Probabilistic and fuzzy cost functions are instances of normalized cost functions (defined on the unit interval). In particular, asymptotic cost functions (by analogy to asymptotic complexity) can be defined. The domain of the cost function is a Cartesian product of states S (capturing costs/rewards of past and present $-expressions) and P (future $-expressions started from a given state), returning real-valued costs R∞ (real numbers with added infinity), i.e., $ : S × P → R∞. Let v : S × A_ε → R∞ be the costs of simple cost expressions, including the silent expression. They are context dependent, i.e., they depend on states. In particular, the cost of ε may depend on which cost expression is made invisible by ε. Note that the value of the cost function (or its estimate) can change after each step (evaluation of a simple cost expression).

Definition 4.1 (A standard crisp cost function) For every state S and $-expression P:

    ($ (S P)) = ($ S) + ($ P), where

    ($ S) = ($0 S)                                    history off (captures current rewards/penalties of being in S)
          = ($ S') + ($ S_{S'S}) + ($0 S)             history on (parent's rewards/penalties included)
          = Σ_{S''} ($ S'') + ($ S_{S'S}) + ($0 S)    total history on (all visited states included)

where S' is the parent state of S, S_{S'S} is a cheapest multiset of simple $-expressions leading from S' to S, S'' ranges over all states expanded so far (and perhaps executed), and ($0 S) denotes the immediate rewards/penalties of being in S. ($ P) is defined below:
1. ($ ⊥) = +∞
2. ($ ε) = 0 for observation congruence, and ($ ε) = (v ε) for strong congruence
3. ($ α) = c_α + (v α), where c_α = 0 if α does not block and c_α = +∞ if α blocks; ($ (¬α)) = c_{¬α} + (v (¬α))
4. ($ (⊔ i∈I (αi Pi))) = Σ_{i∈I} p_i (($ αi) + ($ Pi')), where p_i is the probability of choosing the i-th branch
5. ($ (∪ i∈I Pi)) = ($ αj) + ($ (∪_{i∈I, i≠j} Pi ∪ Pj')) = min_{i∈I} ($ Pi)
6. ($ (‖ i∈I Pi)) = Σ_{J⊆I} p_J (($ {αj}_{j∈J}) + ($ (‖_{i∈I−J, j∈J} Pi Pj'))), where p_J is the probability of choosing the J-th multiset
7. ($ (f Q~)) = ($ P{Q~/X~}), where (:= (f X~) P)

Anytime meta-control performs kΩ-optimization and is responsible for local and emerging global optimization in time and space. Meta-control takes the form of a simple modifying algorithm which attempts to minimize cost; solution quality improves if more time (more resources) is available. Formally, it is the following $-expression:

    (:= MA (◦ (init T F) (loop P)))

where loop is the select-examine-execute cycle performing kΩ-optimization:

    (:= (loop P) (⊔ (◦ (¬goal P) (sel P) (exam P) (exec P) (loop P)) (◦ (goal P) MA)))

Deliberation occurs in the form of select-examine-execute cycles. An empty examine phase produces a reactive algorithm. Short (long) deliberation is natural for interruptible (contract) algorithms. Interruptible algorithms can be interrupted down to the level of atomic expressions. Interruptibility is controlled at two levels: the choice of atomic expressions (Ω) and the length (k) of the deliberation phase. We will judge the intelligence of an agent on the basis of the cleverness of its search strategy/algorithm for the best (least costly) solution. Search algorithms are judged on the basis of their:
- Completeness: is the strategy guaranteed to find a solution when there is one?
- Search Cost: how many resources (e.g., time and space complexity: how long does it take and how much memory) are used to find a solution?
- Optimality: does the strategy find the highest quality solution when there are several different solutions?
- Total Optimality: does the strategy find the highest quality solution with optimal search cost?
Search can be cooperative or competitive and can involve:
- a single agent: with algorithms like depth-first, breadth-first, uniform cost, iterative deepening, A*, IDA*, SMA*, hill climbing, simulated annealing, and models of computation like Turing Machines, the λ-calculus, the RAM-machine,
- two agents: using algorithms like minimax, alpha-beta, expectiminimax, and models of computation like the π-calculus, Sequential Interaction Machines,
- multiple agents: with algorithms like kΩ-optimization, and models of computation like cellular automata, neural networks, automata networks, Multistream Interaction Machines, PRAM, MPRAM, $-calculus.
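To illustrate the recursion of Definition 4.1, the following simplified sketch is our own: it hardwires a crisp cost, assumes uniform branch probabilities p_i, and omits parallel composition, recursion and blocking for brevity. Nested Python tuples stand in for $-expressions:

    import math

    def cost(expr):
        op = expr[0]
        if op == "simple":                # ($ a) = c_a + (v a), non-blocking: c_a = 0
            return expr[1]
        if op == "bottom":                # ($ bottom) = +infinity
            return math.inf
        if op == "eps":                   # ($ eps) = 0 under observation congruence
            return 0.0
        if op == "seq":                   # sequential composition: costs add up
            return sum(cost(p) for p in expr[1:])
        if op == "cost_choice":           # cost choice: minimum over branches
            return min(cost(p) for p in expr[1:])
        if op == "gen_choice":            # general choice: expected cost, uniform p_i
            branches = expr[1:]
            return sum(cost(p) for p in branches) / len(branches)
        raise ValueError(op)

    e = ("seq", ("simple", 2.0),
                ("cost_choice", ("simple", 5.0),
                                ("gen_choice", ("simple", 1.0), ("simple", 9.0))))
    print(cost(e))  # 2.0 + min(5.0, (1.0 + 9.0) / 2) = 7.0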
Thus kΩ-optimization provides a flexible approach to local and/or global optimization in time or space. Technically this is done by replacing parts of $-expressions with the invisible $-expression ε, which removes part of the world from consideration. For observation bisimilarity/congruence, silent $-expressions have neutral cost (0 for the crisp cost function) and do not participate directly in optimization. They remain indirectly an integral part of $-expressions, however, and influence the results.
An Agent and an Environment Consisting of Zero or More Other Agents

Given:
- A - the alphabet of simple $-expressions of the universe, consisting of an enumerable number of agents Ai (including an environment): A = ∪i Ai, Ai ∩ Aj = ∅ for i ≠ j.
- Ai - the alphabet of simple $-expressions of the i-th agent; Pi0 ∈ P - the initial $-expression; and optional (explicit) goals Qi ⊆ P; an implicit goal is min ($ Pi0).
- Ωi ⊆ A - the scope of deliberation/interests of the i-th agent, i.e., a subset of the universe's simple $-expressions chosen for optimization. All elements of A − Ωi represent the environment or an irrelevant part of an agent, and will become invisible (replaced by ε), thus either ignored or unreachable for a given agent (this makes optimization spatially local). Expressions over A − Ωi will be treated as observationally congruent (the cost of ε becomes neutral, e.g., equal to 0 for the crisp cost function). All expressions over Ωi − Ai will be treated as strongly congruent: they will be replaced by ε and, although invisible, their cost will be estimated using the best available knowledge of the agent (it may take arbitrary values from the cost function domain).
- $i - the cost function (selected or user defined).
- hi - a threshold value of the cost function allowing simple $-expressions to be executed.
- ki = 0, 1, 2, ..., ∞ - the depth of deliberation, i.e., the number of steps in the derivation tree selected for optimization in the examine phase (decreasing ki prevents combinatorial explosion, but can make optimization local in time). ki = ∞ is shorthand for "to the end, until a goal is reached" (which may not require an infinite number of steps). ki = 0 means omitting optimization (i.e., empty deliberation), leading to reactive behaviors. Steps consist of multisets of simple $-expressions, i.e., a parallel execution of one or more simple $-expressions constitutes one elementary step.
- ni = 0, 1, 2, ..., ∞ - the number of steps selected for execution in the execute phase. For ni > ki, steps beyond ki will be executed without optimization, in a reactive manner. For ni = 0, execution is postponed until the goal with minimal cost is reached.
- gi = 0, 1, 2 - history off for gi = 0, history on for gi = 1, and total history on for gi = 2.

Find: a $-expression with min ($ Pi0).
The kΩ-Optimization Algorithm (part of the loop of the simple modifying algorithm; a code sketch follows the description):

1. Select Phase (sel Pi): Find a partial derivation tree at most ki steps deep over the alphabet A, starting from the current state Pi and using all choices in cost choice ∪. In the construction of the derivation tree, replace (perhaps by exchanging messages with other agents) the $-expressions hidden by ε in the previous select phase by their real values, and learn their costs. In the expansion, replace by ε all simple $-expressions which do not belong to Ωi (i.e., hide them - make them invisible - they will be treated as observationally congruent, with neutral cost equal to 0 for the crisp cost function). Replace by ε also those $-expressions which lie at the boundary (at depth larger than ki) or belong to Ωi − Ai, and estimate their costs, i.e., treat them as strongly congruent. Initially Pi = Pi0.
2. Examine Phase (exam Pi): Skip the examine phase if ki = 0. Otherwise, for strong congruence, use estimates or precise values of the costs of the $-expressions hidden by ε. For observation congruence, use ($ ε) = 0 (for the crisp cost function). Select the paths in nondeterministic choices whose cost is minimal (using the inference rule Cif). Ties are broken randomly.
3. Execute Phase (exec Pi): If ni = 0 and a goal state with minimal cost has not been reached in the examine phase, append the tree generated so far to the existing tree and pick its most promising leaf node (the one with minimal cost) for expansion, i.e., make it the current node and return to the select phase. If ni = 0 and a goal state with minimal cost has been reached in the examine phase, prune the execution tree by removing the branches of cost choices which do not lead to the goal state (where cost backtracking occurred), and execute Pi up to the goal node. If ni ≠ 0, execute Pi up to at most ni steps.

If execution is interrupted, decrease ni by a value proportional to the remaining steps if ni ≠ 0, or decrease ki if ni = 0; otherwise increase ni provided 0 < ni < ki. If ni = ki, increase ki. If execution is not possible, increase the threshold hi. If the cost of search is too large, decrease ki; otherwise increase it. Modify the other parameters of the meta-system, including costs, if they have changed. If the termination condition is not satisfied (the goal is not reached), return to the select phase with a new current state. If the goal with minimal cost is reached, work on a new goal by returning to Init.
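The select-examine-execute cycle can be sketched compactly. The sketch below is our own simplification: plain states and a successor function stand in for $-expressions and simple steps, estimate() plays the role of the estimated cost of the part of the tree hidden beyond depth k (strong congruence), n ≥ 1 is assumed (in the calculus n = 0 postpones execution entirely), a reachable goal is assumed, and the adaptive parameter updates above are omitted:

    import random

    def k_omega(state, k, n, is_goal, successors, step_cost, estimate):
        trace = [state]
        while not is_goal(state):
            # Select phase: build a derivation tree at most k steps deep.
            plans = [([state], 0.0)]
            for _ in range(k):
                expanded = [(path + [s2], c + step_cost(path[-1], s2))
                            for path, c in plans
                            for s2 in successors(path[-1])]
                plans = expanded or plans
            # Examine phase: minimal cost-so-far plus frontier estimate
            # (skipped when k = 0, which yields purely reactive behavior).
            if k > 0:
                best, _ = min(plans, key=lambda pc: pc[1] + estimate(pc[0][-1]))
            else:
                best = [state, random.choice(successors(state))]
            # Execute phase: commit to at most n steps, then re-deliberate.
            for s2 in best[1:n + 1]:
                state = s2
                trace.append(s2)
                if is_goal(state):
                    break
        return trace

Decreasing k trades optimality for speed (down to reactive behavior at k = 0), while decreasing n interleaves planning and acting more tightly; this is exactly the interruptibility knob described above.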
5 An Illustrative Example of Bounded Rationality by kΩ-Search

We will present single-agent A* heuristic search [9], which, besides Minimax, is one of the best known informed search algorithms. A* selects the node with the minimal value of the evaluation function f(n) = g(n) + h(n), where g(n) gives the path cost from the start node to node n, and h(n) is the estimated cost of the cheapest path from n to the goal. A* is complete on locally finite graphs and path-optimal if the cost estimate is admissible (i.e., does not overestimate), with exponential time and space search cost. Compared to uninformed search, A* allows the search space to be cut considerably; however, it still has exponential time and space complexity. Usually, for more realistic domains, the programmer runs out of available memory before the optimum can be found. Resource-bounded optimization can help exactly where the problem of A* lies, i.e., by taking into account directly the search cost plus the solution quality. Let us present the work of A* using a special case of $-calculus kΩ-optimization, with A as the root, D, F, I, J as goal states (marked * below), edge costs on the branches, and the values of f(n) written next to the states:

                            A (0+12=12)
                      10 /             \ 8
               B (10+5=15)              G (8+5=13)
             10 /     \ 10            8 /      \ 16
      C (20+5=25)   D (20+0=20)*  H (16+2=18)   I (24+0=24)*
     10 /    \ 10                8 /    \ 8
    E (30+5=35)  F (30+0=30)*  J (24+0=24)*  K (24+5=29)
We need only 3 operators from $-calculus to describe the work of A*: cost ($), cost choice (∪) and sequential composition (◦). The system consists of one agent only, which is interested in everything, i.e., Ω = A = A1, and it uses a crisp cost function $. The number of steps in the derivation tree selected for optimization in the examine phase is k = 1, and the number of steps selected for execution in the execute phase is n = 0, i.e., execution is postponed until the optimum is found. The threshold is h = ∞, and the total history is on, g = 2, i.e., the cost of a state adds to the future actions the cost history of its parent, the cost of the action leading from the parent, and the cost of search. In the example presented, the time search cost is proportional to the number of nodes expanded, i.e., 5, and the space search cost is proportional to the number of created nodes, i.e., 9 (A* has to keep all generated nodes in memory). $-calculus can incorporate the search cost by optimizing the total cost rather than the path cost only (totalCost = pathCost + searchCost). Let us assume that we are interested in the space-complexity search cost, measured as the number of created nodes multiplied by the cost of node creation, and that each created node has cost 2:

1. ($ (◦ A ε)) = (2 + 0 + 12) - initialization, A selected for expansion
2. ($ (∪ (◦ B ε) (◦ G ε))) = min((4 + 10 + 5), (6 + 8 + 5)) - both have the same cost; let us assume G is selected
3. ($ (∪ (◦ B ε) (◦ H ε) (◦ I ε))) = min((4 + 10 + 5), (8 + 16 + 2), (10 + 24 + 0)) - B selected
4. ($ (∪ (◦ C ε) (◦ D ε) (◦ H ε) (◦ I ε))) = min((12 + 20 + 5), (14 + 20 + 0), (8 + 16 + 2), (10 + 24 + 0)) - H selected
5. ($ (∪ (◦ C ε) (◦ D ε) (◦ J ε) (◦ K ε) (◦ I ε))) = min((12 + 20 + 5), (14 + 20 + 0), (16 + 24 + 0), (18 + 24 + 5), (10 + 24 + 0)) - D or I can be selected; these are goals, and the search ends. The total optimum is at node D or I, with total cost (path cost + memory cost) 34.
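The derivation can be replayed mechanically. The sketch below is our own illustration: it runs A* over the tree above with the total cost 2·(nodes created so far) + g(n) + h(n), where the first term is the memory search cost fixed at node-creation time, and it reports a goal with total cost 34, matching steps 1-5:

    import heapq

    # Tree and heuristic values transcribed from the figure in Section 5.
    edges = {"A": [("B", 10), ("G", 8)], "B": [("C", 10), ("D", 10)],
             "G": [("H", 8), ("I", 16)], "C": [("E", 10), ("F", 10)],
             "H": [("J", 8), ("K", 8)]}
    h = {"A": 12, "B": 5, "G": 5, "C": 5, "D": 0, "H": 2, "I": 0,
         "E": 5, "F": 0, "J": 0, "K": 5}
    goals = {"D", "F", "I", "J"}

    created = 1                              # node A exists from the start
    frontier = [(2 + 0 + h["A"], 0, "A")]    # entries: (total cost, g, node)
    while frontier:
        total, g, node = heapq.heappop(frontier)
        if node in goals:
            print(node, total)               # prints a goal with total cost 34
            break
        for child, w in edges.get(node, []):
            created += 1                     # memory cost 2 per created node
            heapq.heappush(frontier,
                           (2 * created + g + w + h[child], g + w, child))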
6 Conclusions

Because of space limitations, we did not present the appropriate theorems showing how multiple agents can achieve a global optimum (global total optimum) despite working with limited time (k) and space (Ω) horizons. In [6] several other examples have been presented, illustrating the versatility of $-calculus as a computational theory of AI based on search and optimization under bounded resources. Those examples described AI search for single, two and multiple agents, dynamic programming, evolutionary computation, neural networks and artificial life in terms of $-calculus kΩ-search. All of them used an optimization mechanism to trade off solution quality for resource availability.
References

[1] Dean T., Boddy M., An Analysis of Time-Dependent Planning, in Proc. Seventh National Conf. on Artificial Intelligence, Minneapolis, Minnesota, 1988, 49-54.
[2] Eberbach E., SEMAL: A Cost Language Based on the Calculus of Self-modifiable Algorithms, Intern. Journal of Software Engineering and Knowledge Engineering, vol. 4, no. 3, 1994, 391-408.
[3] Eberbach E., A Generic Tool for Distributed AI with Matching as Message Passing, Proc. of the Ninth IEEE Intern. Conf. on Tools with Artificial Intelligence TAI'97, Newport Beach, California, 1997, 11-18.
[4] Eberbach E., Brooks R., Phoha S., Flexible Optimization and Evolution of Underwater Autonomous Agents, in: New Directions in Rough Sets, Data Mining, and Granular-Soft Computing, Proc. of the 7th Intern. Workshop on Rough Sets, Fuzzy Sets, Data Mining and Granular-Soft Computing RSFDGrC'99, Yamaguchi, Japan, LNAI 1711, Springer-Verlag, 1999, 519-527.
[5] Eberbach E., Expressiveness of $-Calculus: What Matters?, in: Advances in Soft Computing, Proc. of the 9th Intern. Symp. on Intelligent Information Systems IIS'2000, Bystra, Poland, Physica-Verlag, 2000, 145-157.
[6] Eberbach E., $-Calculus: Flexible Optimization as Search under Bounded Resources in Interactive Systems, School of Computer Science, Acadia University, 2000.
[7] Horvitz E., Reasoning about Beliefs and Actions under Computational Resources Constraints, in Proc. of the 1987 Workshop on Uncertainty in Artificial Intelligence, Seattle, Washington, 1987.
[8] Milner R., Parrow J., Walker D., A Calculus of Mobile Processes, I & II, Information and Computation 100, 1992, 1-77.
[9] Russell S., Norvig P., Artificial Intelligence: A Modern Approach, Prentice-Hall, 1995.
[10] Russell S., Wefald E., Do the Right Thing: Studies in Limited Rationality, The MIT Press, 1991.
[11] Wegner P., Interactive Foundations of Computing, Theoretical Computer Science, 1997 (also http://www.cs.brown.edu/people/pw).
[12] Zilberstein S., Operational Rationality through Compilation of Anytime Algorithms, Ph.D. Thesis, Univ. of California at Berkeley, 1993.