The kΩ-Optimization Distributed Meta-Level Control for Cooperation and Competition of Bounded Rational Agents

Eugene Eberbach*

Dept. of Engineering and Science, Rensselaer Polytechnic Institute,
275 Windsor Street, Hartford, CT 06120
[email protected]

* Research supported in part by ONR under grant N00014-06-1-0354. On leave from University of Massachusetts Dartmouth.
Abstract. The $-calculus process algebra for problem solving applies cost performance measures to converge to optimal solutions with minimal problem-solving costs. The same meta-level kΩ-optimization control can be used to find the best-quality solutions (expressed as optimization problems), the most effective solutions (expressed as search optimization problems), or solutions representing the tradeoff between the best-quality and the least costly solutions (expressed as total optimization problems). Total optimization is described as an instance of multiobjective optimization. In this paper we demonstrate that cooperation and competition in multiagent systems can be naturally investigated as multiobjective optimization too.

Keywords: bounded rational agents, problem solving, process algebra, optimization under bounded resources, cooperation and competition, multiobjective optimization
1 Introduction
Resource-bounded reasoning [10, 12], also called anytime algorithms, trading off the quality of solutions for the amount of resources used, seems to be particularly well suited to the solution of hard computational problems in real time and under uncertainty. On the other hand, process algebras [11] are currently the most mature approach to concurrent and distributed systems, and seem to be the appropriate way to formalize multiagent systems. We combine the best of both worlds by using the $-calculus process algebra for problem solving, inspired by ideas from anytime algorithms. Technically, the $-calculus is a process algebra derived from Milner's π-calculus [11], extended by von Neumann/Morgenstern's costs/utilities [13] and a very general meta-search method called the kΩ-optimization [7]. This novel search method allows the simulation of many other search algorithms (of course, not all), including A*, minimax, expectiminimax, hill climbing, dynamic programming, evolutionary algorithms, and neural networks. The search tree can be infinite; this allows solving in the limit, non-recursively, some undecidable problems (for instance, the halting problem of the Universal Turing Machine) and approximating a nonexistent universal search algorithm [9]. For solutions of intractable problems, total optimization is utilized to provide an automatic way to deal with intractability, by optimizing together the quality of solutions and the search costs. This paper concentrates on cooperation and competition in distributed multiagent systems. Section 2 contains a primer on the $-calculus, needed for understanding the following sections; the syntax and the semantics based on the kΩ-optimization meta-control are presented. Section 3 illustrates the $-calculus approach to problem solving using the kΩ-optimization meta-search. Section 4 presents new results on cooperation and competition of multiagent systems expressed as an instance of multiobjective optimization. Section 5 contains conclusions and problems to be solved in the future.
2 The $-Calculus Process Algebra of Bounded Rational Agents
The $-calculus is a mathematical model of processes capturing both the final outcome of problem solving and the interactive, incremental way in which problems are solved. It is a process algebra of bounded rational agents for interactive problem solving targeting intractable and undecidable problems, introduced in the late 1990s [2, 5, 7]. The $-calculus (pronounced "cost calculus") formalizes resource-bounded computation (also called anytime algorithms), an approach proposed by Dean, Horvitz, Zilberstein and Russell in the late 1980s and early 1990s [10, 12]. Anytime algorithms are guaranteed to produce better results if more resources (e.g., time, memory) become available. The standard representative of process algebras, the π-calculus [11], is believed to be the most mature approach to concurrent systems. The $-calculus rests upon the primitive notion of cost in a similar way as the π-calculus was built around the central concept of interaction. The cost and interaction concepts are interrelated in the sense that cost captures the quality of an agent's interaction with its environment. The unique feature of the $-calculus is that it supports problem solving by incrementally searching for solutions and using cost to direct the search. The basic $-calculus search method used for problem solving is called the kΩ-optimization. The kΩ-optimization represents this "impossible" to construct, but "possible to approximate indefinitely," universal algorithm. It is a very general search method, allowing the simulation of many other search algorithms, including A*, minimax, dynamic programming, tabu search, and evolutionary algorithms. Each agent has its own Ω search space and its own limited horizon of deliberation, with depth k and width b. Agents can cooperate by selecting actions with minimal costs, can compete if some of them minimize and some maximize costs, and can be impartial (irrational or probabilistic) if they do not attempt to optimize (evolve, learn) from the point of view of the observer.
It can be understood as another step in the never-ending dream of universal problem-solving methods recurring throughout the history of computer science. Because the $-calculus, similar to the Turing machine, is a generic tool for problem solving, its applications are numerous. The $-calculus is applicable to robotics [4], software agents, neural nets, and evolutionary computation [3]. In particular, it has been applied to the design of the special-purpose CCL language to control groups of Undersea Autonomous Vehicles [4] and of the general-purpose CO$T language for problem solving [6]. It has been used to simulate, by one algorithm (the kΩ-optimization), both single and multiple, global and local sequence alignment algorithms [1], and to model polymorphic viruses [8]. Potentially, the $-calculus could be used for the design of cost languages [6], cellular evolvable cost-driven hardware, DNA-based computing and bioinformatics, electronic commerce, data mining, machine vision, and quantum computing [7]. The $-calculus leads to a new programming paradigm (cost languages [6]) and a new class of computer architectures (cost-driven computers), and belongs to the new super-Turing models of computation [5, 9].
2.1 The $-Calculus Syntax
In the $-calculus everything is a cost expression: agents, environment, communication, interaction links, inference engines, modified structures, data, code, and meta-code. $-expressions can be simple or composite. Simple $-expressions α are considered to be executed in one atomic, indivisible step. Composite $-expressions P consist of distinguished components (simple or composite ones) and can be interrupted. Throughout the paper we use prefix notation. Data, functions, and meta-code are written as (f_{i∈I} x_i), where f is a function/data/meta-code name and x_1, x_2, ... is a vector of its parameters (possibly countably infinite).

Definition 1 (The $-calculus). The set P of $-calculus process expressions consists of simple $-expressions α and composite $-expressions P (including indexed cost expressions P_i), and is defined by the following syntax:

α ::= ($_{i∈I} P_i)       cost
    | (→_{i∈I} c P_i)     send P_i with evaluation through channel c
    | (←_{i∈I} c X_i)     receive X_i from channel c
    | (0_{i∈I} P_i)       suppress evaluation of P_i
    | (a_{i∈I} P_i)       defined call of simple $-expression a with parameters P_i
    | (ā_{i∈I} P_i)       negation of defined call of simple $-expression a

P ::= (◦_{i∈I} α P_i)     sequential composition
    | (‖_{i∈I} P_i)       parallel composition
    | (∪$_{i∈I} P_i)      cost choice
    | (∪+_{i∈I} P_i)      adversary choice
    | (⊔_{i∈I} P_i)       general choice
    | (f_{i∈I} P_i)       defined process call f with parameters P_i, and its
                          associated definition (:= (f_{i∈I} X_i) R) with body R
The indexing set I is possibly countably infinite. When I is empty, we write the empty parallel composition and the empty general, cost and adversary choices as ⊥ (blocking), and the empty sequential composition (I empty and α = ε) as ε (the invisible transparent action, which is used to mask, i.e., make invisible, parts of $-expressions). Adaptation (evolution/upgrade) is an essential part of the $-calculus, and all $-calculus operators are infinitary (the indexing set I is unbounded). The $-calculus agents interact through the send-receive pair as the essential primitives of the model. Sequential composition is used when $-expressions are evaluated in textual order. Parallel composition is used when expressions run in parallel; it picks a subset of non-blocked elements at random. Cost choice is used to select the cheapest alternative according to a cost metric. Adversary choice is used to select the most expensive alternative according to a cost metric. General choice picks one non-blocked element at random. General choice is different from cost and adversary choices: it uses guard satisfiability, whereas cost and adversary choices are based on cost functions. Call and definition encapsulate expressions in a more complex form (like procedure or function definitions in programming languages); in particular, they specify recursive or iterative repetition of $-expressions. Simple cost expressions execute in one atomic step. Cost functions are used for optimization and adaptation; the user is free to define his/her own cost metrics. Send and receive perform handshaking message-passing communication and inferencing. The suppression operator suppresses the evaluation of the underlying $-expressions. Additionally, a user is free to define his/her own simple $-expressions, which may or may not be negated.
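To make the grammar of Definition 1 concrete, it can be rendered as a small abstract syntax tree. Below is a minimal sketch in Python; the class and field names are our own illustration (they are not part of the $-calculus definition), and only the operator tags matter:

from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Simple:
    # A simple $-expression: executes in one atomic step.
    # 'name' is "$", "->", "<-", "0", or a user-defined name; 'negated' marks the ā form.
    name: str
    params: List["Expr"] = field(default_factory=list)
    negated: bool = False

@dataclass
class Composite:
    # A composite $-expression; 'op' is one of:
    # "seq" (◦), "par" (‖), "cost" (∪$), "adv" (∪+), "gen" (⊔), or a defined process name.
    op: str
    children: List["Expr"] = field(default_factory=list)

Expr = Union[Simple, Composite]

# The cost choice (∪$ (◦ a P) (◦ b Q)) as a tree:
example = Composite("cost", [
    Composite("seq", [Simple("a"), Simple("P")]),
    Composite("seq", [Simple("b"), Simple("Q")]),
])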
2.2 The $-Calculus Semantics: The kΩ-Optimization Meta-Search
In this section we define the operational semantics of the $-calculus using the kΩ-search, which captures the dynamic nature and incomplete knowledge associated with the construction of the problem solving tree. The basic $-calculus problem solving method, the kΩ-optimization, is a very general search method providing meta-control and allowing the simulation of many other search algorithms, including A*, minimax, dynamic programming, tabu search, or evolutionary algorithms [12]. Problem solving works iteratively, through select, examine and execute phases. In the select phase, the tree of possible solutions is generated up to k steps ahead, and the agent identifies its alphabet of interest for optimization, Ω. This means that the tree of solutions may be incomplete in width and depth (to deal with complexity). However, incomplete (missing) parts of the tree are modeled by silent $-expressions ε, and their cost is estimated (i.e., not all information is lost). Consequently, if some conditions are satisfied, the kΩ-optimization can be complete and optimal. In the examine phase, the trees of possible solutions are pruned to minimize the cost of solutions, and in the execute phase up to n instructions are executed. Moreover, because the $ operator may capture not only the cost of solutions but also the cost of the resources used to find a solution, we obtain a powerful tool to avoid methods that are too costly, i.e., the $-calculus directly minimizes the search cost.
This basic feature, inherited from anytime algorithms, is needed to tackle hard optimization problems directly, and allows the solution of total optimization problems (the best-quality solutions with minimal search costs). The variable k refers to the limited horizon for optimization, necessary due to the unpredictable dynamic nature of the environment. The variable Ω refers to a reduced alphabet of information. No agent ever has reliable information about all factors that influence the behavior of all agents. To compensate, we mask factors where information is not available from consideration, reducing the alphabet of variables used by the $-function. By using the kΩ-optimization to find the strategy with the lowest $-function, the meta-system finds a satisficing solution, and sometimes the optimal one. This avoids wasting time trying to optimize behavior beyond the foreseeable future. It also limits consideration to those issues where relevant information is available. Thus the kΩ-optimization provides a flexible approach to local and/or global optimization in time or space. Technically this is done by replacing parts of $-expressions with invisible $-expressions ε, which remove part of the world from consideration (however, they are not ignored entirely: the cost of invisible actions is estimated). The kΩ-optimization meta-search procedure can be used both for single and multiple cooperative or competitive agents working online (n ≠ 0) or offline (n = 0). $-calculus programs consist of multiple $-expressions for several agents.

Let us define several auxiliary notions used in the kΩ-optimization meta-search (a sketch of these parameters as a simple data structure is given after the list). Let:

– A be an alphabet of $-expression names for an enumerable universe of agent population (including an environment, i.e., one agent may represent an environment). Let A = ∪_i A_i, where A_i is the alphabet of $-expression names (simple or complex) used by the i-th agent, i = 1, 2, ..., ∞. We assume that the names of $-expressions are unique, i.e., A_i ∩ A_j = ∅ for i ≠ j (this can always be satisfied by indexing each $-expression name with a unique agent index; it is needed so that an agent executes only its own actions). The agent population size is denoted by p = 1, 2, ..., ∞.
– x_i[0] ∈ P be an initial $-expression for the i-th agent, and kΩ_i[0] its initial search procedure.
– min($_i(kΩ_i[t], x_i[t])) be an implicit default goal, and Q_i ⊆ P an optional (explicit) goal. The default goal is to find a pair of $-expressions, i.e., any pair (kΩ_i[t], x_i[t]) minimizing $_i(kΩ_i[t], x_i[t]) = $1_i($2_i(kΩ_i[t]), $3_i(x_i[t])), where $3_i is a problem-specific cost function, $2_i is a search algorithm cost function, and $1_i is an aggregating function combining $2_i and $3_i. This is the default goal for total optimization, looking for the best solutions x_i[t] with minimal search costs kΩ_i[t]. It is also possible to look for the optimal solution only, i.e., the best x_i[t] with the minimal value of $3_i, or for the best search algorithm kΩ_i[t] with minimal cost $2_i. The default goal can be overwritten or supplemented by any other termination condition (in the form of an arbitrary $-expression Q), like the maximum number of iterations, the lack of progress, etc.
– $_i be a cost function performance measure (selected from the library or user defined). It consists of the problem-specific cost function $3_i, a search algorithm cost function $2_i, and an aggregating function $1_i. Typically, a user provides the costs of simple $-expressions, or an agent can learn such costs (e.g., by reinforcement learning). The user also selects or defines how the costs of composite $-expressions are computed. The cost of the solution tree is a function of its components: the costs of nodes (states) and edges (actions). This allows expressing both the quality of solutions and the search cost.
– Ω_i ⊆ A be a scope of deliberation/interests of the i-th agent, i.e., a subset of the universe's $-expressions chosen for optimization. All elements of A − Ω_i represent irrelevant or unreachable parts of an environment, of a given agent, or of other agents, and become invisible (replaced by ε), thus either ignored or unreachable for the given agent (this makes optimization spatially local). Expressions over A_i − Ω_i are treated as observationally congruent (the cost of ε is neutral in optimization, e.g., typically set to 0). All expressions over Ω_i − A_i are treated as strongly congruent: they are replaced by ε and, although invisible, their cost is estimated using the best available knowledge of the agent (it may take arbitrary values from the cost function domain).
– b_i = 0, 1, 2, ..., ∞ be a branching factor of the search tree, i.e., the maximum number of generated children for a parent node. For example, hill climbing has b_i = 1, a binary tree has b_i = 2, and b_i = ∞ is a shorthand meaning generate all children (possibly infinitely many).
– k_i = 0, 1, 2, ..., ∞ represent the depth of deliberation, i.e., the number of steps in the derivation tree selected for optimization in the examine phase (decreasing k_i prevents combinatorial explosion, but can make optimization local in time). k_i = ∞ is a shorthand meaning search to the end to reach a goal (which may not require an infinite number of steps). k_i = 0 means omitting optimization (i.e., the empty deliberation), leading to reactive behaviors; similarly, a branching factor b_i = 0 leads to an empty deliberation too. Steps consist of multisets of simple $-expressions, i.e., a parallel execution of one or more simple $-expressions constitutes one elementary step.
– n_i = 0, 1, 2, ..., ∞ be the number of steps selected for execution in the execute phase. For n_i > k_i, the steps beyond k_i are executed without optimization, in a reactive manner. For n_i = 0, execution is postponed until the goal is reached. For the depth of deliberation k_i = 0, the kΩ-search works in the style of imperative programs (reactive agents), executing up to n_i consecutive steps in each loop iteration. For n_i = 0 the search is offline; otherwise, for n_i ≠ 0, it is online.
– gp, reinf, strongcong, update be auxiliary flags used in the kΩ-optimization meta-search procedure.
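As announced above, these auxiliary notions can be collected into one per-agent parameter record. A minimal Python sketch follows; the record and field names are ours, chosen to mirror the list (∞ is represented by float("inf")):

from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional

INF = float("inf")  # stands for the shorthand ∞ used above

@dataclass
class KOmegaParams:
    # Per-agent parameters of the kΩ-optimization meta-search.
    omega: FrozenSet[str]             # Ω_i: scope of interests, a subset of the alphabet A
    k: float = INF                    # k_i: depth of deliberation (0 = reactive, ∞ = to the goal)
    b: float = INF                    # b_i: branching factor (1 = hill climbing, ∞ = all children)
    n: float = 1                      # n_i: steps executed per cycle (0 = offline, > 0 = online)
    cost3: Optional[Callable] = None  # $3_i: problem-specific cost function
    cost2: Optional[Callable] = None  # $2_i: search algorithm cost function
    cost1: Optional[Callable] = None  # $1_i: aggregating function combining $2_i and $3_i
    gp: bool = False                  # auxiliary flags of the meta-search
    reinf: bool = False
    strongcong: bool = False
    update: bool = False

# Example: a reactive agent (empty deliberation, one action executed per cycle).
reactive = KOmegaParams(omega=frozenset(), k=0, n=1)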
Each agent has its own kΩ-search procedure kΩ_i[t] used to build the solution x_i[t]; it takes other agents' actions into account (by selecting its alphabet of interests Ω_i accordingly). Thus each agent constructs its own view of the whole universe, which only sometimes is the same for all agents (this is an analogy to the subjective view of the "objective" world by individuals having possibly different goals and a different perception of the universe).

Definition 2 (The kΩ-optimization meta-search procedure). The kΩ-optimization meta-search procedure kΩ_i[t] for the i-th agent, i = 0, 1, 2, ..., from an enumerable universe of agent population, working in time generations t = 0, 1, 2, ..., is a complex $-expression (meta-procedure) consisting of simple $-expressions init_i[t], sel_i[t], exam_i[t], goal_i[t], $_i[t] and complex $-expressions loop_i[t] and exec_i[t], and constructing its solution (its input) x_i[t] from predefined and user-defined simple and complex $-expressions. For simplicity, we will skip the time and agent indices in most cases if it does not cause confusion, and write init, loop, sel, exam, goal_i and $_i. Each i-th agent performs the following kΩ-search procedure kΩ_i[t] in the time generations t = 0, 1, 2, ...:

(:= (kΩ_i[t] x_i[t])
  (◦ (init (kΩ_i[0] x_i[0]))   // initialize kΩ_i[0] and x_i[0]
     (loop x_i[t+1])))          // basic cycle: select, examine, execute

where the loop meta-$-expression takes the form of the select-examine-execute cycle performing the kΩ-optimization until the goal is satisfied. At that point, the agent re-initializes and works on a new goal in the style of a never-ending reactive program:

(:= (loop x_i[t])                        // loop recursive definition
  (⊔ (◦ (goal_i[t] (kΩ_i[t] x_i[t]))    // goal not satisfied; default goal min($_i(kΩ_i[t] x_i[t]))
        (sel x_i[t])                     // select: build problem solution tree k steps deep, b wide
        (exam x_i[t])                    // examine: prune problem solution tree in cost and adversary choices
        (exec (kΩ_i[t] x_i[t]))          // execute: run optimal x_i n steps and update kΩ_i parameters
        (loop x_i[t+1]))                 // return back to loop
     (◦ (goal_i[t] (kΩ_i[t] x_i[t]))     // goal satisfied: re-initialize search
        (kΩ_i[t] x_i[t]))))

More details on the kΩ-search, including the inference rules of the Labeled Transition System, observation and strong bisimulations and congruences, the standard cost function definition, and more details on init, goal, sel, exam and exec, can be found in [7].
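Read imperatively, Definition 2 is a plain initialize-then-cycle procedure. The following schematic Python sketch assumes (our assumption, for illustration) that init, goal, sel, exam and exec are supplied as ordinary functions:

def k_omega_search(x, params, init, goal, sel, exam, exec_):
    # Schematic select-examine-execute cycle of the kΩ-optimization.
    # x: the agent's solution $-expression; params: a KOmegaParams record as above.
    x, params = init(x, params)
    t = 0
    while True:
        if goal(x, params, t):             # goal satisfied: stop and re-initialize on a new goal
            return x
        tree = sel(x, params, t)           # select: build the tree k steps deep, b wide
        x = exam(tree, params, t)          # examine: prune in cost (∪$) and adversary (∪+) choices
        x, params = exec_(x, params, t)    # execute: run up to n steps, update kΩ parameters
        t += 1                             # next time generation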
3 An Illustrative Example of Problem Solving by the kΩ-Optimization Meta-Search
To give some intuition about how the $-calculus solves specific problems by applying the kΩ-optimization meta-search, we will consider the case of two competitive agents using the minimax search. The example is for illustration only,
because the kΩ-optimization is a very general search method indeed (able to simulate many existing search methods and to create and investigate new search algorithms). The minimax search was proposed by von Neumann in 1944 [13] and used by Claude Shannon in 1950 in a chess-playing program. This two-player search method assumes perfect information, and it can determine the best move for a player (assuming that the opponent plays perfectly) by enumerating the entire game tree. The player maximizes the payoff, and its opponent minimizes it in successive moves. Taking the maximum over the player's moves and the minimum over the opponent's moves in consecutive plies, going from the leaves towards the root, the best move for the player maximizing the payoff is determined. Minimax is complete on locally finite graphs, with exponential time and quadratic space search cost, and is path-optimal under the perfect-information assumption (each agent knows exactly the moves of its opponent). For illustration, we will use the same example of minimax as in [12]. "Our" agent performs minimization, and its opponent maximization. The payoff/goal states are S11, S12, S13, S21, S22, S23, S31, S32, S33. The payoff is represented by negative costs.

Fig. 1. A complete minimax search tree: the root S0 (a MIN node) has moves a1, a2, a3 leading to the MAX nodes S1, S2, S3; moves a11, ..., a33 lead to the leaf states S11, ..., S33 with payoffs −3, −12, −8, −2, −4, −6, −14, −5, −2, respectively.

A = Ω_1 = Ω_2 = A_1 ∪ A_2, where
A_1 = {∪$, ◦, tS0, S0, S11, S12, S13, S21, S22, S23, S31, S32, S33, a1, a2, a3},
A_2 = {∪+, ◦, tS1, tS2, tS3, S1, S2, S3, a11, a12, a13, a21, a22, a23, a31, a32, a33}.
Their subsets, the alphabets of simple $-expressions of the two agents, are B_1 = {S0, S11, S12, S13, S21, S22, S23, S31, S32, S33, a1, a2, a3} and B_2 = {S1, S2, S3, a11, a12, a13, a21, a22, a23, a31, a32, a33}, respectively. Both agents use a standard cost function; however, the first agent performs minimization (it uses a cost choice ∪$; minimizing negative cost maximizes its payoff), and the second agent uses an adversary choice ∪+ (by maximizing the negative cost of
the first agent, it minimizes the first agent's payoff). If we do not count the meta-system itself (the kΩ-optimization), then to simulate minimax we need only four operators from the $-calculus: ◦, ∪$ and ∪+ for constructing possible solutions, and the $ operator for optimization. For both agents k = ∞ (optimize until the payoffs/leaves are reached) and n = 1 (execute only one action after optimization). Note that cooperating agents would both use ∪$, i.e., they would minimize costs in unison. Note also that actions/edges do not have any costs (neutral cost 0); only nodes (more precisely, the goal nodes) give payoffs. The flags are gp = reinf = strongcong = update = 0. The user also provides the costs of simple $-expressions and the structure of the minimax tree in the form of the subtrees with roots/parents S0, S1, S2, S3, i.e., subtrees tS0, tS1, tS2, tS3:

(:= tS0 (∪$ (◦ S0 a1 tS1) (◦ S0 a2 tS2) (◦ S0 a3 tS3))),
(:= tS1 (∪+ (◦ S1 a11 S11) (◦ S1 a12 S12) (◦ S1 a13 S13))),
(:= tS2 (∪+ (◦ S2 a21 S21) (◦ S2 a22 S22) (◦ S2 a23 S23))),
(:= tS3 (∪+ (◦ S3 a31 S31) (◦ S3 a32 S32) (◦ S3 a33 S33))).

This structure is used in the sel phase to build the tree incrementally, by applying the structural congruence inference rule from the LTS.

Optimization: only $3 is used, i.e., $ = $3; an optimal move is sought, and the costs of the minimax search are ignored. In other words, the costs of the kΩ-search simulating minimax are ignored. The goal is the minimum of $3. We present the tree for the first ("minimum") agent. The tree for the second agent is practically the same (only the second agent looks for the maximum of the costs in this zero-sum game). Of course, each agent executes its own actions only, and in this sense both trees are fully synchronized, i.e., agent 1 will not execute actions a11, ..., a33: it waits until agent 2 executes one of them on its own tree (and the move is observed and registered by agent 1). The consecutive steps (cycles) of the kΩ-optimization then simulate precisely the minimax algorithm for the first agent:

0. t = 0, initialization phase init: The root tS0 = (◦ S0 εS0) is created with an empty continuation εS0, and the parameters of the kΩ-search are selected: k1 = b1 = ∞, n1 = 1, Ω1 = A1 ∪ A2; the standard cost function is used with $ = $3 (optimization only, not total optimization, i.e., the cost of search is ignored); $3 for S0, S1, S2, S3, a1, a2, a3, a11, ..., a33 is 0, and $3(S11) = −3, $3(S12) = −12, $3(S13) = −8, $3(S21) = −2, $3(S22) = −4, $3(S23) = −6, $3(S31) = −14, $3(S32) = −5, $3(S33) = −2. S0 is not the goal state and is selected for expansion. Because k1 = ∞, a complete problem solution tree will be built in the select phase.

1. t = 1, first loop iteration, sel, exam, exec:
select phase sel:
εS0 = (∪$ (◦ a1 tS1) (◦ a2 tS2) (◦ a3 tS3))
tS1 = (∪+ (◦ S1 a11 S11 ε) (◦ S1 a12 S12 ε) (◦ S1 a13 S13 ε))
tS2 = (∪+ (◦ S2 a21 S21 ε) (◦ S2 a22 S22 ε) (◦ S2 a23 S23 ε))
tS3 = (∪+ (◦ S3 a31 S31 ε) (◦ S3 a32 S32 ε) (◦ S3 a33 S33 ε))
examine phase exam:
$3(tS0) = $3((∪$ (◦ S0 a1 (∪+ (◦ S1 a11 S11 ε) (◦ S1 a12 S12 ε) (◦ S1 a13 S13 ε)))
(◦ S0 a2 (∪+ (◦ S2 a21 S21 ε) (◦ S2 a22 S22 ε) (◦ S2 a23 S23 ε)))
(◦ S0 a3 (∪+ (◦ S3 a31 S31 ε) (◦ S3 a32 S32 ε) (◦ S3 a33 S33 ε)))))
= min(0+0+max(0+0−3+0, 0+0−12+0, 0+0−8+0),
0+0+max(0+0−2+0, 0+0−4+0, 0+0−6+0),
0+0+max(0+0−14+0, 0+0−5+0, 0+0−2+0))
= min(−3, −2, −2) = −3
The optimal path is S0, a1, S1, a11, S11, ε. Thus action a1 is the optimal move (n1 = 1: only one action is executed). It is on the path leading to state S11, guaranteeing the best payoff of at least −3 pending perfect play of the opponent.
execute phase exec: the move a1 is executed (online algorithm, n1 = 1), the opponent performs a move, and the minimax algorithm would normally be repeated again (with the new root being the current state after execution); however, because the ply depth is one, the minimax is over, and the kΩ-search is called again to solve another problem.
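The exam-phase computation above reduces to a minimum of maxima over the leaf payoffs, since all action and intermediate-state costs are 0. A few lines of Python (our rendering of the Fig. 1 data) reproduce it:

# Leaf payoffs of Fig. 1, grouped by the MAX-level subtrees reached by a1, a2, a3.
payoffs = {
    "a1": {"a11": -3,  "a12": -12, "a13": -8},   # subtree tS1
    "a2": {"a21": -2,  "a22": -4,  "a23": -6},   # subtree tS2
    "a3": {"a31": -14, "a32": -5,  "a33": -2},   # subtree tS3
}

# Adversary choice (∪+): the opponent maximizes; cost choice (∪$): we minimize.
values = {move: max(leaves.values()) for move, leaves in payoffs.items()}
best_move = min(values, key=values.get)
print(best_move, values[best_move])   # -> a1 -3, as in the exam phase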
Because of its time complexity, minimax is impractical (intractable) in most cases. For real games, only a partial tree is considered: the utility function is replaced by an evaluation function, and the goal function by a cutoff function (decreasing the k value). The alpha-beta algorithm, by pruning the search tree (decreasing the b value), alleviates somewhat the problem of the search complexity, and indirectly minimizes the search cost. The resource-bounded approach of the $-calculus solves directly the total optimization problem (minimizing the search cost and the negative payoff simultaneously): it adds the costs of actions and the search cost (the cost of deliberation) directly.

Total optimization: $ = $1($2, $3) = $2 + $3; an optimal move is sought that does not exceed the maximum time limit to analyze the next move. This means that $2 operates on the kΩ-optimization $-expression, where the cost of exam for each analyzed/examined move (accepted or rejected) is 1. The costs of init, goal, sel and exec are ignored. Let us assume that every move takes 1 unit of time to analyze, and that the minimax depth-first search starts from the right branch. The consecutive steps (cycles) of the kΩ-optimization are then similar to those of the "regular" optimization (however, the conclusion is different):

0. t = 0, initialization phase init: The root tS0 = (◦ S0 εS0) is created with an empty continuation εS0, and the parameters of the kΩ-search are selected: k1 = b1 = ∞, n1 = 1, Ω1 = A1 ∪ A2; the standard cost function is used for $2 and $3, with $ = $1($2, $3) = $2 + $3 (total optimization); $3(S0) = ... = $3(S3) = 0, $3(S11) = −3, ..., $3(S33) = −2. S0 is not the goal state and is selected for expansion with cost 0. Because k1 = ∞, a complete problem solution tree will be built in the select phase.

1. t = 1, first loop iteration, sel, exam, exec:
select phase sel:
εS0 = (∪$ (◦ a1 tS1) (◦ a2 tS2) (◦ a3 tS3))
tS1 = (∪+ (◦ S1 a11 S11 ε) (◦ S1 a12 S12 ε) (◦ S1 a13 S13 ε))
tS2 = (∪+ (◦ S2 a21 S21 ε) (◦ S2 a22 S22 ε) (◦ S2 a23 S23 ε))
tS3 = (∪+ (◦ S3 a31 S31 ε) (◦ S3 a32 S32 ε) (◦ S3 a33 S33 ε))
examine phase exam:
$(kΩ[0] tS0) = $2(kΩ) +
$3((∪$ (◦ S0 a1 (∪+ (◦ S1 a11 S11 ε) (◦ S1 a12 S12 ε) (◦ S1 a13 S13 ε)))
(◦ S0 a2 (∪+ (◦ S2 a21 S21 ε) (◦ S2 a22 S22 ε) (◦ S2 a23 S23 ε)))
(◦ S0 a3 (∪+ (◦ S3 a31 S31 ε) (◦ S3 a32 S32 ε) (◦ S3 a33 S33 ε)))))
= $2(exam(a1, a11, ..., a33)) + min($3(S0) + $3(a1) + max($3(S1) + $3(a11) + $3(S11) + $3(ε), 0+0−12+0, 0+0−8+0), 0+0+max(0+0−2+0, 0+0−4+0, 0+0−6+0), 0+0+max(0+0−14+0, 0+0−5+0, 0+0−2+0))
= 12 + min(0+0+max(0+0−3+0, 0+0−12+0, 0+0−8+0), 0+0+max(0+0−2+0, 0+0−4+0, 0+0−6+0), 0+0+max(0+0−14+0, 0+0−5+0, 0+0−2+0))
= 12 + min(−3, −2, −2) = 9
The optimal path is the same as in the "regular" optimization, i.e., S0, a1, S1, a11, S11, ε; thus action a1 would be the optimal move (n = 1). It is on the path leading to state S11, guaranteeing the best total cost 9 (payoff −3 with the cost of thinking/search 12).
execute phase exec: the move a1 is executed (online algorithm, n1 = 1), the opponent performs a move, and the minimax algorithm would normally be repeated again; however, because the ply depth is one, the minimax is over, and the kΩ-search is called again to start working on a new problem.
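In the total optimization the only change is to add the search cost $2 (one unit per examined move; 3 + 9 = 12 moves for the complete tree) to the $3 value computed above. Continuing the previous sketch:

best_payoff = -3        # $3: the min-of-maxes value from the previous sketch
search_cost = 3 + 9     # $2(exam(a1, ..., a33)): 3 first-level + 9 second-level moves, 1 unit each
total_cost = search_cost + best_payoff
print(total_cost)       # -> 9, matching the exam-phase computation above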
The question is: can we do better, i.e., can we get a better result than 9 and a different recommended move? The answer is yes. We can get it in a way similar to A*: by decreasing the amount/cost of search, i.e., by decreasing the width b or the depth k of the search and using correct estimates. For example, if we set the width of search to b = 1, assume that we analyze moves from right to left, and take the estimates $3(εS3) = −4 and $3(εS0) = −1, we get in the examine phase:
$(kΩ[0] tS0) = $2(kΩ) + $3((∪$ (◦ S0 a3 (∪+ (◦ S3 a33 S33 ε) εS3)) εS0))
= $2(exam(a3, a33)) + min($3(S0) + $3(a3) + max($3(S3) + $3(a33) + $3(S33) + $3(ε), $3(εS3)), $3(εS0))
= 2 + min(0 + 0 + max(0 + 0 − 2 + 0, −4), −1) = 2 − 2 = 0
The optimal path would be S0, a3, S3, a33, S33, ε; thus action a3 would be the recommended move (n = 1) for total optimization. It is on the path leading to state S33, guaranteeing the best total cost 0 (payoff −2 with the cost of thinking 2). Of course, changing the order of analysis may result in a different recommended move. Note that for different cost estimates of ε (we overestimated the cost of εS0), the recommended move could be different from a3. Underestimating the cost of εS0 may result in a continuation of the optimization in the next cycle, which might reach the more attractive payoff −3 for a1, but would also increase the cost of search by at least two, so the total cost would not be better.
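The pruned computation can be checked the same way. A sketch under the same assumptions (b = 1, right-to-left order, and the ε estimates $3(εS3) = −4, $3(εS0) = −1):

# Only a3 and a33 are examined, so $2 = 2; the unexpanded branches are replaced
# by silent ε-expressions whose costs are estimated, not ignored.
eps_S3, eps_S0 = -4, -1             # cost estimates of the invisible subtrees
search_cost = 2                     # $2(exam(a3, a33))
value_S3 = max(-2, eps_S3)          # adversary choice at S3: payoff of a33 vs. the ε estimate
root_value = min(value_S3, eps_S0)  # cost choice at S0: the a3 branch vs. the ε estimate
print(search_cost + root_value)     # -> 0: total cost of the recommended move a3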
Because of the limited space, we skip the general results on $-calculus optimization under bounded resources: completeness, optimality, search optimality, and total optimality. In particular, search optimality and total optimality are crucial for the solution of real-world intractable problems, because both types of optimization take into account the amount of resources available to an agent; for more details, the reader is referred to [7]. Instead, we will present some new results about cooperation and competition of multiagent systems, expressed as a special case of the $-calculus multiobjective optimization.
4 Cooperation and Competition of Multiagent Systems as an Instance of the $-Calculus Multiobjective Optimization
Note that a specific j-th agent may take other agents' actions into account (by setting its scope of interests Ω_j to include them), but it truly optimizes and cares only about its local goal/cost function; in this sense, such an agent is "selfish." The case of "altruistic" agents that try to optimize a cost function representing the whole population is considered in this section.

Let $(kΩ[t], X[t]) = $1($2(kΩ[t]), $3(X[t])), where kΩ[t] = {kΩ_1[t], ..., kΩ_p[t]} and X[t] = {x_1[t], ..., x_p[t]}. We define a problem-specific cost function $3 for the whole population,

$3(X[t]) = $1_3($3_1(x_1[t]), ..., $3_p(x_p[t])),

where $1_3 is an aggregating function for $3_1, ..., $3_p, and $3_j is the problem-specific cost function of the j-th agent with meta-search procedure kΩ_j and solution x_j, j = 1, ..., p. Similarly, we define a meta-search algorithm cost function $2 for the whole population,

$2(kΩ[t]) = $1_2($2_1(kΩ_1[t]), ..., $2_p(kΩ_p[t])),

where $1_2 is an aggregating function for $2_1, ..., $2_p, and $2_j is the cost function of the j-th agent's meta-search algorithm kΩ_j, j = 1, ..., p; the meta-search algorithm kΩ_j is responsible for the evolution of x_j. We present definitions of cooperation and competition of the population of agents for the problem-specific cost function $3; similar definitions can be provided for the cost functions $ and $2.

Definition 3 (Cooperation of a single agent with the population). We say that the j-th individual x_j cooperates at time t with the whole population with respect to a problem-specific cost function $3 iff $3[t] > $3[t+1] and the other agents' cost functions are fixed, i.e., $3_i[t] = $3_i[t+1] for i ≠ j.

Definition 4 (Cooperation of the whole population). We say that the whole agent population cooperates at time t with respect to a problem-specific cost function $3 iff $3[t] > $3[t+1].

Definition 5 (Competition of a single agent with the population). We say that the j-th agent x_j competes at time t with the whole population with respect to a problem-specific cost function $3 iff $3[t] < $3[t+1] and the other agents' cost functions are fixed, i.e., $3_i[t] = $3_i[t+1] for i ≠ j.

Definition 6 (Competition of the whole population). We say that the whole population competes at time t with respect to a problem-specific cost function $3 iff $3[t] < $3[t+1].

In other words, if an agent improves (worsens) the cost function of the whole population, then it cooperates (competes) with it. If the cost function of the whole population improves (deteriorates), then the population exhibits cooperation (competition) as a whole (independently of what its individuals are doing). If an individual (a population) cooperates (competes) at all moments of time, then it is always cooperative (competitive); otherwise, it may sometimes cooperate and sometimes compete (as in the Iterated Prisoner's Dilemma problem).

Let us consider the design and analysis problems for a population of p agents and a problem-specific cost function $3 (searching for the best-quality solutions only, but of the whole population).

Definition 7 (Analysis problem for $3). Given $3_1[0], ..., $3_p[0] for the agents' solutions x_1[0], ..., x_p[0] from the solution population X[0], and given $1_3[0]: what will be the (emerging, limit) behavior of $3[t] for X[t]? In particular, will the optimum of $3 be reached?

Definition 8 (Synthesis/design problem for $3). Given $3[0] for the population of agents' solutions X[0], find corresponding agents with initial solutions x_1[0], ..., x_p[0], their problem-specific cost functions $3_1[0], ..., $3_p[0], and an aggregating function $1_3[0], such that $3[t] converges to the optimum.

Theorem 1 (Optimality for cooperating agents: sufficient conditions to solve the design/analysis problems for $3). For a given agent population with the set of initial solutions X[0], if the agents' meta-search procedures satisfy the five conditions
1. the termination condition is set to the optimum of the problem-specific cost function $3(X[t]) for the whole population, with optimum $3*,
2. the search is complete,
3. the population is cooperative at all times t = 0, 1, 2, ...,
4. the agents' scope of interests covers the whole population, i.e., Ω_j = A, j = 1, ..., p, and
5. the search is admissible,
then the population of agents is guaranteed to find the optimum X* of $3(X[t]) in an unbounded number of generations t = 0, 1, 2, ..., and that optimum will be maintained thereafter.

Note that cooperation replaces elitism in the sufficient conditions for convergence of cooperating members of a population looking for the optimum of the fitness of the whole population, and not of a single individual. Of course, a solution optimal for the whole population is not necessarily optimal from the point of view of a single agent, i.e., agents may have to sacrifice their own interests for the "good" of the whole population. And there is no surprise in the following: if the whole population competes all the time, then the optimum will not be found despite completeness.

Theorem 2 (No free lunch for competing agents: inability to solve the design/analysis problems for $3). For a given agent population with the set of initial solutions X[0], if the agents' meta-search procedures satisfy the five conditions
1. the termination condition is set to the optimum of the problem-specific cost function $3(X[t]), with optimum $3*,
2. the search is complete,
3. the population is competing at all times t = 0, 1, 2, ...,
4. the agents' scope of interests covers the whole population, i.e., Ω_j = A, j = 1, ..., p, and
5. the search is admissible,
then the population of agents is not guaranteed to find and maintain the optimum X* of $3(X[t]), even in an unbounded number of generations t = 0, 1, 2, ....

If the population is sometimes competing and sometimes cooperating, then the optimum sometimes will be found and sometimes not, but convergence and its maintenance are not guaranteed. Analogous results can be derived for search optimization problems and total optimization problems.
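Definitions 3-6 are direct comparisons of the population-level cost $3 at consecutive generations. A small Python sketch of these tests follows (using summation as the aggregating function $1_3 is our illustrative assumption; the $-calculus leaves it user-defined):

from typing import List

def population_cost(agent_costs: List[float]) -> float:
    # $3(X[t]) = $1_3($3_1(x_1[t]), ..., $3_p(x_p[t])); here $1_3 is a plain sum.
    return sum(agent_costs)

def population_cooperates(costs_t: List[float], costs_t1: List[float]) -> bool:
    # Definition 4: the whole population cooperates at time t iff $3[t] > $3[t+1].
    return population_cost(costs_t) > population_cost(costs_t1)

def agent_cooperates(j: int, costs_t: List[float], costs_t1: List[float]) -> bool:
    # Definition 3: agent j cooperates at time t iff $3[t] > $3[t+1]
    # while all other agents' costs are fixed ($3_i[t] = $3_i[t+1] for i != j).
    others_fixed = all(a == b for i, (a, b) in enumerate(zip(costs_t, costs_t1)) if i != j)
    return others_fixed and population_cooperates(costs_t, costs_t1)

# Agent 0 lowers its own cost from 5 to 3 while agent 1 stays at 2: cooperation.
assert agent_cooperates(0, [5.0, 2.0], [3.0, 2.0])
# The aggregated cost grows from 7 to 8: the population competes as a whole.
assert not population_cooperates([5.0, 2.0], [6.0, 2.0])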
5 Conclusions
In this paper we presented a generic kΩ-optimization meta-control that allows experimenting with various search algorithms (old and new ones) and investigating cooperation and competition in populations of agents. The approach seems very promising, because it allows capturing and investigating, formally and uniformly, cooperation and competition of multiagent systems under bounded resources. Of course, much more research is needed. In particular, we did not investigate necessary and/or sufficient conditions allowing a common population goal to be achieved despite the presence of competing agents. The tradeoffs between agent and population goals, and among multiple subpopulation goals, should be investigated as well. What seems very promising is that all these extra requirements, we believe, can be expressed uniformly as different instances of multiobjective optimization under the same kΩ meta-search control.

The $-calculus is not only a pure theoretical model; it is implementable. Implementations include CO$T, a general-purpose multi-agent programming environment and language (a generic unified tool for AI and computer science [6]; so far only designed, in Java), and CCL, a specialized Common Control Language for mission planning and control of multiple Autonomous Underwater Vehicles (AUVs) (a generalization of evolutionary robotics with online control algorithms [4]; a prototype implementation in C/C++ at NUWC/ONR). For portability reasons, and similarly to Java or Parle from the ESPRIT Fifth Generation SPAN project, the implementation was foreseen in two steps: as a compiler and an interpreter. To implement the compiler and interpreter phases (scanning, parsing, semantic analysis, intermediate/target code generation, and optimization), standard compiler generator tools were used. The kΩ-search has been intended to be an essential part of the interpreter's optimizer phase. The output of the compiler is an intermediate $-expression "bytecode", and the optimal output of the interpreter (solving optimization, search optimization, or total optimization problems) is to be downloaded and executed in the select-examine-execute loop controlled by the kΩ meta-search. The author has learned that the Adaptive Computing Group at the Pardes Institute in Switzerland intends to implement an interpreter to test the $-calculus, using formal methods for software model checking and automated theorem proving.
References

1. Atallah L., Eberbach E., The $-Calculus Process Algebra for Problem Solving and Its Support for Bioinformatics, Proc. 2nd Indian Intern. Conf. on Artificial Intelligence IICAI'05, Pune, India, 2005, 1547-1564.
2. Eberbach E., $-Calculus Bounded Rationality = Process Algebra + Anytime Algorithms, in: (ed. J.C. Misra) Applicable Mathematics: Its Perspectives and Challenges, Narosa Publishing House, New Delhi, Mumbai, Calcutta, 2001, 213-220.
3. Eberbach E., Evolutionary Computation as a Multi-Agent Search: A $-Calculus Perspective for its Completeness and Optimality, Proc. 2001 Congress on Evolutionary Computation CEC'2001, Seoul, Korea, 2001, 823-830.
4. Eberbach E., Duarte Ch., Buzzell Ch., Martel G., A Portable Language for Control of Multiple Autonomous Vehicles and Distributed Problem Solving, Proc. 2nd Intern. Conf. on Computational Intelligence, Robotics and Autonomous Systems CIRAS'03, Singapore, Dec. 15-18, 2003.
5. Eberbach E., Goldin D., Wegner P., Turing's Ideas and Models of Computation, in: (ed. Ch. Teuscher) Alan Turing: Life and Legacy of a Great Thinker, Springer-Verlag, 2004, 159-194.
6. Eberbach E., Eberbach A., On Designing CO$T: A New Approach and Programming Environment for Distributed Problem Solving Based on Evolutionary Computation and Anytime Algorithms, Proc. 2004 Congress on Evolutionary Computation CEC'2004, vol. 2, Portland, OR, 2004, 1836-1843.
7. Eberbach E., $-Calculus of Bounded Rational Agents: Flexible Optimization as Search under Bounded Resources in Interactive Systems, Fundamenta Informaticae, vol. 68, no. 1-2, 2005, 47-102.
8. Eberbach E., Capturing Evolution of Polymorphic Viruses, Proc. IX National Conf. on Evolutionary Computation and Global Optimization KAEiOG'06, Murzasichle, Prace Naukowe, Elektronika, z. 156, Politechnika Warszawska, Warsaw, Poland, 2006, 125-138.
9. Eberbach E., Expressiveness of the π-Calculus and the $-Calculus, Proc. 2006 World Congress in Comp. Sci., Comp. Eng., & Applied Computing, The 2006 Intern. Conf. on Foundations of Computer Science FCS'06, Las Vegas, Nevada, 2006, 24-30.
10. Horvitz E., Zilberstein S. (eds), Computational Tradeoffs under Bounded Resources, Artificial Intelligence 126, 2001, 1-196.
11. Milner R., Parrow J., Walker D., A Calculus of Mobile Processes, I & II, Information and Computation 100, 1992, 1-77.
12. Russell S., Norvig P., Artificial Intelligence: A Modern Approach, Prentice-Hall, 1995 (2nd ed. 2003).
13. Von Neumann J., Morgenstern O., Theory of Games and Economic Behavior, Princeton University Press, 1944.