Representing Coordination Relationships with Influence Diagrams

A. Zunino and A. Amandi

ISISTAN Research Institute, Faculty of Sciences, UNICEN University
Campus Universitario, Paraje Arroyo Seco, (B7001BBO) Tandil, Bs. As., Argentina
{azunino, amandi}@exa.unicen.edu.ar

Abstract. The necessity of managing relationships among agents in a multi-agent system to achieve coordinated behavior is well known. One approach to managing such relationships consists of using an explicit representation of them, allowing each agent to choose its actions accordingly. Previous work in the area has considered ideal situations, such as fully known environments, static relationships and shared mental states. In this paper we propose to represent relationships among agents and entities in a multi-agent system by using influence diagrams. The advantages of the representation are twofold. First, it enables agents to reason better about how to achieve their goals in an uncertain environment inhabited by multiple agents. Second, it can be used to learn new coordination relationships from past experiences.

1 Introduction

An autonomous agent is an intelligent entity acting rationally and intentionally with respect to its own goals and the current state of its knowledge [38]. Multi-agent systems (MAS) are concerned with the study of autonomous agents living and interacting in a common environment, sharing resources, negotiating and collaborating to achieve their goals, forming coalitions, having conflicts, etc. Coordination is the process by which an agent reasons about its local actions and the anticipated actions of others to try to ensure that the community acts in a coherent manner [18].

A very important issue in designing coordination techniques for multi-agent systems is managing situations where the actions of one agent affect the operation of other agents' actions [27]. For example, one agent's action can facilitate the actions of other agents, or can enable other agents to perform different activities. The study of these types of non-local effects among agents' activities is central to coordinating agents in a MAS [9,2]. Not taking these relationships into account can cause multi-agent activities to waste resources, perform badly, or even fail [23]. On the other hand, agents with information about these relationships can choose their actions so as to exploit beneficial relationships and avoid conflicts. Unfortunately, accurate knowledge of coordination relationships (CR) is difficult to maintain [12], especially when agents act in an uncertain environment where interactions occur very often. In addition, developers may not fully know every detail of the environment or of the interactions among agents, making the problem harder to solve.

Some attempts at using an explicit representation of CRs have shown its usefulness for achieving coordination in MAS. However, these studies have considered fully known environments, static/stable relationships [23] and little or no learning [19]. This paper deals with the problem of representing and managing coordination relationships in uncertain environments in order to achieve coordinated behavior. We claim that each agent should be able to infer and explain coordination relationships in order to behave coherently. For example, an agent pursuing a goal should be able to predict conflicts, determine whether another agent's actions can facilitate or hinder the achievement of the goal, etc. Furthermore, each agent should be able to learn CRs.

This paper describes an approach to representing coordination relationships under the assumption that agents inhabit an uncertain environment. We represent beliefs about CRs, utilities and actions by using influence diagrams (ID), an extension of Bayesian networks (BN). In this way, agents are able to represent and infer how their activities affect other agents' activities, use this information to achieve coordinated behavior, and learn new CRs. For example, agents may represent the probability of having conflicts while performing a specific activity, collaborating in a given task, etc. Then, agents can use this knowledge to decide their actions. In addition, agents can learn these relationships, improving themselves with experience.

The rest of the paper starts with an overview of coordination relationships, Bayesian networks and influence diagrams. In section 3 we show how to represent CRs with IDs. In section 4 we describe the usage of IDs to infer and predict CRs, and to make decisions taking utilities and costs into account. Section 5 describes how IDs are used within a MAS. Section 6 addresses the problem of learning CRs. In section 7 we describe related work. Finally, section 8 presents the conclusions and directions for further work.

2 Context

2.1 Coordination Relationships

Let us consider a simple MAS with two agents interacting in a shared environment. Each agent is able to affect the environment by executing actions. An agent's action can influence the execution of other actions, either its own actions or the actions of other agents. For example, when an agent takes objects from the environment, it inhibits both itself and other agents from doing so. Interactions between tasks being worked on by a single agent are called non-local effects [9]. If they occur between tasks being worked on by different agents, they are called coordination relationships. It is worth noting that a coordination relationship is a special case of a non-local effect in which two or more agents are involved. Several works have shown that non-local effects and coordination relationships are useful not only for representing domains consisting of multiple agents, but also for reasoning about, scheduling and coordinating MAS [10,33]. Coordination relationships are crucial to the design and analysis of coordination mechanisms [11]. A common classification of coordination relationships includes the following [7]:

– Basic Domain Relationships: relationships such as inhibits, cancels, constrains, facilitates, causes, enables and parent task/subtask. For example, task B is a subtask of task A if B is required by some method of achieving A; task A facilitates task B if information about the solution of A is useful for the solution of B but not necessary; if task A enables task B, then A must be completed before B can begin.
– Graph Relationships: generalized coordination relationships such as overlaps, necessary, sufficient, extends, subsumes and competes. For example, task A overlaps B if there exists a task G such that A is a parent task of G and B is a parent task of G.
– Temporal Relationships: these depend on the timing of tasks, such as their start and finish times, estimates of these, and real and estimated durations. They include before, equal, meets, overlaps, during, starts, finishes, and their inverses.
– Non-computational Resource Constraints: a final type of relationship is the use of physical, non-computational resources. This is the major coordination relationship in some domains, such as factory scheduling and office automation [36].

Several works on coordination [23,8,33] have shown that only a subset of these relationships is necessary to achieve coordination. For example, [10] uses enables, facilitates and mutex to coordinate a MAS in charge of hospital patient scheduling. In this paper we represent some of the CRs described above with influence diagrams, an extension of Bayesian networks. In the next section we briefly introduce how to use Bayesian networks to represent relationships between agents.

2.2 Using Bayesian networks

A Bayesian network is a graphical representation of the joint probability distribution for a set of variables, encoding uncertain relationships among parameters in a given domain [20]. BNs provide a compact representation of joint probability distributions by taking advantage of conditional independencies, thus providing an efficient mechanism not only for representing a joint distribution, but also for performing operations such as inference [5] and learning [29]. A BN consists of two components. The first is a directed acyclic graph in which each vertex corresponds to a random variable. Each variable is probabilistically independent of its non-descendants in the graph given the state of its parents. This graph captures the qualitative structure of the probability distribution and is optimized for efficient inference and decision making. The second component is a collection of local interaction models that describe the conditional probability p(X_i | Pa_i) of each variable X_i given its parents Pa_i (Fig. 1). Together, these two components represent a joint probability distribution over the set of variables X = {X_1, ..., X_n}:

p(X) = \prod_{i=1}^{n} p(X_i \mid Pa_i) \qquad (1)

where each variable X_i is conditionally independent of its non-descendants given its parents, or conditioning variables.
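To make the factorization of equation (1) concrete, the following minimal Python sketch (our own illustration, not from the paper) stores a small BN as a dictionary of conditional probability tables and evaluates the joint by the chain rule. The variables echo the B, C, E of the example that follows; all probabilities are invented, except that the entries for E mirror Table 1.

```python
# A minimal sketch of equation (1): the joint of a BN is the product of
# one local conditional per node. All numbers are illustrative.

parents = {"B": (), "C": (), "E": ("B", "C")}

# p(node=True | parent assignment), keyed by the parents' truth values
cpt = {
    "B": {(): 0.6},   # assumed prior
    "C": {(): 0.5},   # assumed prior
    "E": {(False, False): 0.5, (False, True): 0.8,
          (True, False): 0.7, (True, True): 0.9},
}

def local(node, value, assignment):
    """p(node=value | Pa_node) read from the CPT."""
    p_true = cpt[node][tuple(assignment[p] for p in parents[node])]
    return p_true if value else 1.0 - p_true

def joint(assignment):
    """Equation (1): p(X) = prod_i p(X_i | Pa_i)."""
    p = 1.0
    for node, value in assignment.items():
        p *= local(node, value, assignment)
    return p

print(joint({"B": True, "C": False, "E": True}))  # 0.6 * 0.5 * 0.7 = 0.21
```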

For example, let us consider two agents interacting in a shared environment. The first one is able to execute three actions A, B and C, while the second one is able to execute D and E. There are causal relationships between some of these actions, depicted in Fig. 1. Each action is represented as a node, and an arc between two nodes models a relationship. For example, C → E can be understood as "the execution of C facilitates the execution of E". The lack of an arc between two nodes such as B and D is an assertion that our belief in whether or not D occurs is unchanged if we notice that B has been executed.

[Figure 1: a BN whose nodes are the actions A, B, C of Agent 1 and D, E of Agent 2, with arcs between related actions.]

Figure 1. Relationships between actions represented as a BN

Associated with each arc there is a conditional probability distribution modeling the strength of the relationship. For example, the distribution associated with the arcs B → E and C → E is shown in Table 1. From the table we can conclude that E is more likely to be executed if either B or C has been executed previously. As a consequence, B and C facilitate the execution of E.

Table 1. P(E | B, C)

B C | E=F E=T
F F | .5  .5
F T | .2  .8
T F | .3  .7
T T | .1  .9
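To make the reading of Table 1 concrete, a short sketch (again our own illustration; the prior on C is an invented assumption) compares P(E=T | B) with and without B executed, marginalizing C out:

```python
# P(E=T | B, C) as read from Table 1
p_e = {(False, False): 0.5, (False, True): 0.8,
       (True, False): 0.7, (True, True): 0.9}
p_c = 0.5  # assumed prior probability that C is executed

def p_e_given_b(b):
    # marginalize C out: P(E=T | B=b) = sum_c P(E=T | b, c) P(c)
    return p_e[(b, True)] * p_c + p_e[(b, False)] * (1 - p_c)

print(p_e_given_b(False))  # 0.65
print(p_e_given_b(True))   # 0.80: executing B makes E more likely
```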

The usefulness of BNs has been shown in many real-world applications, for example, Microsoft's Office Assistant [17], the Banter Bayesian network tutoring shell [15] and the Lockheed Martin unmanned underwater vehicle [25]. Despite these experiences, their usage in the field of MAS is still limited. The next section describes an extension of Bayesian networks which, in addition to representing uncertain knowledge, can use this knowledge to make decisions.

2.3 Using influence diagrams

Bayesian networks are useful for representing beliefs, providing a framework for inference and reasoning. In this section we describe an extension of BNs named influence diagrams (ID) which, in addition to uncertain knowledge, models actions and utilities in order to take the decisions with the highest expected utility. In this paper we treat decision problems in the framework of utility theory: decisions are taken because they may be useful in some way, so the various options should be evaluated on the basis of the usefulness of their consequences. We assume that "usefulness" is measured on a numerical scale called a utility scale, and that if several kinds of utilities are involved in the same decision problem, then the scales have a common unit. The utility of an action may depend on the state of some variables called determining variables. For example, the utility of a treatment with penicillin depends on the type of infection and on whether the patient is allergic to penicillin.

To model costs/utilities and actions we use influence diagrams. IDs [20, ch. 6] extend BNs with action nodes and utility nodes to solve decision problems. Action nodes can be further classified into two types, namely intervening actions, which force a change of state for some variables in the model, and non-intervening actions (observations or tests), whose impact is not part of the model. In general, intervening action nodes are depicted as squares, non-intervening actions as triangles, and costs/utilities as diamonds, indicating that we attach costs/utilities to their parents. The idea of IDs is to predict the expected distribution for a set of variables using the inference mechanisms of BNs; based on this information, it is then possible to calculate the expected utility of executing a set of actions.

IDs impose a restriction on the ordering of actions. Let the set of decision variables be U_d = {D_1, ..., D_n}, with the decisions made in the order of their index, and let I_0, ..., I_n be pairwise disjoint sets of random variables such that, for 0 < k < n, I_k is the set of variables that will be observed between decisions D_k and D_{k+1}, I_0 is the set of initial evidence variables, and I_n is the set of variables that will never be observed or will be observed only after the last decision. This induces a partial order ≺ over the set U = U_d ∪ U_r, where U_r is the set of random variables of the ID:

I_0 \prec D_1 \prec I_1 \prec \cdots \prec D_n \prec I_n

In addition, we impose a restriction on the decision problem, namely that a decision cannot have an impact on a variable already observed:

P(I_k \mid I_0, \ldots, I_{k-1}, D_1, \ldots, D_n) = P(I_k \mid I_0, \ldots, I_{k-1}, D_1, \ldots, D_k)

Thus we can calculate the joint distribution for I_k without knowledge of the states of the future decisions D_{k+1}, ..., D_n. In the next section we describe an approach to representing coordination relationships with IDs in order to infer and predict how a MAS may behave given a set of hypotheses.
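Before moving on, here is a toy expected-utility computation in the style of the penicillin example above. It is only a sketch: the decision, probabilities and utilities are all our invention for illustration.

```python
# Toy decision: treat or not, when the utility of "treat" depends on the
# (uncertain) infection type and allergy. All numbers are invented.

P_bacterial = 0.7   # assumed belief that the infection is bacterial
P_allergic = 0.1    # assumed belief that the patient is allergic

def utility(action, bacterial, allergic):
    if action == "no-treatment":
        return -40 if bacterial else 0
    # penicillin helps bacterial infections but harms allergic patients
    base = 10 if bacterial else -5
    return base - 60 if allergic else base

def expected_utility(action):
    eu = 0.0
    for bacterial in (True, False):
        for allergic in (True, False):
            p = ((P_bacterial if bacterial else 1 - P_bacterial)
                 * (P_allergic if allergic else 1 - P_allergic))
            eu += p * utility(action, bacterial, allergic)
    return eu

best = max(("treat", "no-treatment"), key=expected_utility)
print(best, expected_utility("treat"), expected_utility("no-treatment"))
```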

3 Modeling coordination relationships with influence diagrams

In this section we show how to model coordination relationships with influence diagrams. We assume that agents do not have accurate information about their environment. As a consequence, agents represent their beliefs as probability distributions encoded in IDs. Then, they act and reason based on this probabilistic information. In addition, agents can exchange information, shared assumptions, commonly developed viewpoints, commonly accepted social and cultural conventions, and norms in order to learn in a communal way [37]. It is worth noting that the coordination relationships described in section 2 can be represented by IDs [30]. In particular, in this paper we model the following types of CRs: non-computational resource constraints, enables, facilitates and task/subtask.

To clarify the concepts introduced in previous sections, let us consider a MAS consisting of four agents: Factory, Lab-1, Lab-2 and Consultant. The goal of the Factory is to determine the quality of its products while maximizing its profits. To do so, the products have to pass two types of evaluations, namely E1 and E2. Both Lab-1 and Lab-2 are able to perform these two types of evaluations, so the Factory delegates these activities to them. The Factory must decide whether to ask Lab-1 (costing $10) or Lab-2 (costing $7) to perform E1, and whether to ask Lab-1 (costing $14) or Lab-2 (costing $9) to perform E2. If both evaluations are successfully performed, the Factory gets a utility of $30 minus the costs. The evaluations of the products are very complex, so the laboratories may fail. This fact is modeled by associating probabilities with each laboratory. In addition, the laboratories have different processes of different qualities for evaluating products. For example, P(E1 = q | AE1 = Lab-1) represents the probability that Lab-1 succeeds in the evaluation E1 using a process of quality q, given the request of the Factory. The price of an analysis depends on the quality of the process. Therefore, the Factory has to maximize its utility considering the probabilities of the model and the costs, while at the same time minimizing its risks. The results of an evaluation include a report of the different aspects that the laboratory has taken into account. Based on this report, the Factory evaluates the quality of each laboratory, so in future situations the laboratories will be chosen based on their quality. Moreover, the Factory has a Consultant who is able to recommend a laboratory for a given type of evaluation. The Consultant charges $15 for his opinion, so the Factory has to carefully consider whether to consult him, based on its trust in him.

The influence diagram representing the application is depicted in Fig. 2. The node G represents the top-level goal, namely, evaluating a product while minimizing the costs. E1 and E2 represent the results of the evaluations. These nodes are connected to G in order to model two subtask relationships, that is, both E1 and E2 have to be achieved in order to accomplish G. The nodes AE1 and AE2 represent the action "ask L to evaluate a product with a quality q", where L can be Lab-1 or Lab-2 and q can be low, medium or high. These nodes are connected to E1 and E2 because they cause the results of E1 and E2.

The arc between AE1 and AE2 is necessary because IDs require the existence of a linear temporal ordering of the actions (this constraint can be relaxed by using more advanced algorithms for inference [28]). Finally, the lower part of the ID models the relationships between the Consultant and the rest of the MAS. Recommendations are represented as non-intervening actions [21]. This type of action is useful to model situations where there is a possibility of acquiring information in order to reduce uncertainty.

[Figure 2: the influence diagram, with random variables drawn as circles, utility nodes as diamonds, observation (non-intervening) action nodes as triangles and intervening action nodes as squares. Node legend: E1: Evaluation 1; E2: Evaluation 2; AE1: Ask for E1; AE2: Ask for E2; U: Utility function; CE1: Consultant's recommendation for E1; CE2: Consultant's recommendation for E2; CT: Consultant is trustworthy; C: Cost of the Consultant's recommendation.]

Figure 2. An influence diagram representing the relationships of the MAS

In addition to the influence diagram shown in Fig. 2, there is one conditional probability distribution for each relationship and two utility functions. These elements are not detailed in this paper. As stated above, modeling CRs with IDs is useful because the uncertain knowledge encoded in the ID can be used to infer efficiently and to make decisions. For example, the Factory can determine whether Lab-1 or Lab-2 is more convenient, given its previous experiences and the Consultant's opinion. In the next section we describe the usage of IDs to perform inference.
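Before turning to inference proper, a back-of-the-envelope sketch of the decision the Factory faces may help. It ignores the quality dimension and the Consultant; the costs and the $30 reward come from the text above, while the success probabilities are invented assumptions.

```python
# Enumerate the Factory's lab choices and compare expected utilities.
from itertools import product

cost = {("Lab-1", "E1"): 10, ("Lab-2", "E1"): 7,
        ("Lab-1", "E2"): 14, ("Lab-2", "E2"): 9}
p_success = {("Lab-1", "E1"): 0.90, ("Lab-2", "E1"): 0.80,   # assumed
             ("Lab-1", "E2"): 0.85, ("Lab-2", "E2"): 0.75}   # assumed

REWARD = 30  # utility if both evaluations succeed

def expected_utility(lab_e1, lab_e2):
    p_both = p_success[(lab_e1, "E1")] * p_success[(lab_e2, "E2")]
    return REWARD * p_both - cost[(lab_e1, "E1")] - cost[(lab_e2, "E2")]

# under these assumed numbers, (Lab-2, Lab-2) comes out best
for choice in product(("Lab-1", "Lab-2"), repeat=2):
    print(choice, round(expected_utility(*choice), 2))
```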

4 Inference

To this point we have an expressive, concise representation that is easy to acquire because of a good cognitive match. This representation provides us with mechanisms to:

– determine the probability of a proposition based on prior uncertain knowledge and current observations;
– encode causal relationships in an efficient representation optimized to make predictions [5];

– learn causal relationships [4,16]; this is useful to gain understanding about a problem domain;
– update the belief or the probability of occurrence of a particular event given some evidence [26].

For example, we can determine the joint probability distribution for the variables AE1 and AE2 given that the goal G is achieved, namely P(AE1, AE2 | G = t), which means "the probability of asking Lab-n to perform E1 with a process of quality q1 and Lab-m to perform E2 with a process of quality q2", and then use this information to determine the expected utility of every option and select the most convenient one.

Let us consider the probability P(AE1, AE2 | G = t). This probability is not stored directly in the model, and hence needs to be computed. In general, the computation of a probability of interest given a model is known as probabilistic inference [5]. The general method to compute a probability given a BN consists of using the fundamental rule of probability calculus to recover the full joint probability distribution and answer questions:

P(A \mid B) = \frac{P(A, B)}{P(B)} \qquad (2)

Thus the query P(AE1, AE2 | G = t) can be calculated by applying the chain rule (equation 1):

P(AE_1, AE_2 \mid G = t) = \frac{P(AE_1, AE_2, G = t)}{P(G = t)} = \frac{\sum_{E_1, E_2, CE_1, CE_2, CT} P(G = t, E_1, \ldots, CT, AE_1, AE_2)}{\sum_{E_1, E_2, CE_1, CE_2, CT, AE_1, AE_2} P(G = t, E_1, \ldots, AE_2)} \qquad (3)

Note that the numerator of equation 3, P(AE1, AE2, G = t), can be obtained by marginalizing over E1, E2, CE1, CE2 and CT with G fixed to t. In order to do this it is necessary to reconstruct the full joint probability distribution from the factored representation by applying equation 1. Such an explicit reconstruction is feasible only for toy problems [5], because of the number of operations involved. To avoid it we rewrite the expressions by using the normal associativity, commutativity and distributivity properties of real numbers and matrices. There are many algorithms for doing so, such as the set-factoring heuristic [24], variable elimination [6] and junction trees [26]. In this paper we have applied a simple method based on the set-factoring heuristic developed by Li [24]. By applying this heuristic to the numerator of the right-hand side of equation 3, the expression can be simplified in ways that reflect the structure of the Bayesian network itself. In other words, the equation should be simplified taking into account the probabilistic independence among variables:

\sum_{E_1, E_2, CE_1, CE_2, CT} P(G = t, E_1, \ldots, CT, AE_1, AE_2) =
= \sum_{E_1, E_2, CE_1, CE_2, CT} P(G = t \mid E_1, E_2) \, P(E_1 \mid AE_1) \, P(E_2 \mid AE_2) \, P(AE_2 \mid AE_1, CT) \, P(AE_1 \mid CT) \, P(CE_1 \mid AE_1) \, P(CE_2 \mid AE_2) \, P(CT)

This expression can be further simplified by grouping factors so as to reduce the number of operations needed to calculate the distribution. The same procedure can be applied to the denominator of equation 3. Table 2 shows the results obtained by solving the simplified expressions.

Table 2. P(AE1, AE2 | G = t)

              AE1   AE2
Lab-1 low     0.17  0.15
Lab-1 medium  0.13  0.12
Lab-1 high    0.09  0.09
Lab-2 low     0.25  0.23
Lab-2 medium  0.20  0.21
Lab-2 high    0.16  0.20
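For toy models, posteriors like those in Table 2 can be computed by the brute-force joint reconstruction described before equation 3. The following sketch (our illustration, reusing the small B/C/E network of section 2.2 with invented priors) does exactly that; realistic models require the factored algorithms cited above.

```python
from itertools import product

# Toy network from section 2.2: B, C are roots; E depends on (B, C).
parents = {"B": (), "C": (), "E": ("B", "C")}
cpt = {"B": {(): 0.6}, "C": {(): 0.5},  # assumed priors
       "E": {(False, False): 0.5, (False, True): 0.8,
             (True, False): 0.7, (True, True): 0.9}}

def joint(a):
    """Equation (1): product of the local conditionals."""
    p = 1.0
    for node, val in a.items():
        pt = cpt[node][tuple(a[q] for q in parents[node])]
        p *= pt if val else 1.0 - pt
    return p

def posterior(query, evidence):
    """P(query | evidence) by summing the full joint (equations 2-3)."""
    hidden = [v for v in parents if v != query and v not in evidence]
    dist = {True: 0.0, False: 0.0}
    for values in product((False, True), repeat=len(hidden) + 1):
        a = dict(zip([query] + hidden, values))
        a.update(evidence)
        dist[a[query]] += joint(a)
    total = dist[True] + dist[False]
    return {k: v / total for k, v in dist.items()}

print(posterior("B", {"E": True}))  # belief in B after observing E
```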

According to Table 2 it is more convenient for the Factory to ask Lab-2 for both evaluations, since its probabilities of success are higher than those for Lab-1. This example shows one of the advantages of using IDs to represent CRs, namely their inference capability. The model is useful because it allows us to reason about generic coordination relationships in a MAS. For example, we can infer new relationships or determine the probability of achieving a goal given a set of relationships between agents. As a consequence, it is possible to determine whether an agent would help in some activity or not, or whether some relationship would lead to conflicts.

To this point we have centered our analysis on inference, without taking utilities into account. In addition to inference, by using IDs we can answer queries such as:

– Which is the best choice of actions in order to maximize the expected utility?
– If G = t, what is the expected utility, P(U | G = t)?

The first query can be easily calculated since there are arcs AE1 → U and AE2 → U. The expected utility of performing an action a ∈ AE1 is:

EU(a) = \sum_{G, AE_2} U(G, a, AE_2) \, P(G) \, P(AE_2)

Therefore, let A_1^* be the action of asking for an evaluation E1 that maximizes the expected utility:

A_1^* = \arg\max_a EU(a) = \arg\max_a \sum_{G, AE_2} U(G, a, AE_2) \, P(G) \, P(AE_2)

This can be easily solved, since all the probabilities are known. The second query, namely P(U|G = t), can be solved by applying the fundamental rule (equation 2) and the chain rule (equation 1) in a similar manner to the first example of inference.
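A sketch of the EU(a)/argmax computation above follows; the utilities and marginals are invented here, whereas in the real model P(G) and P(AE2) would come from inference on the ID.

```python
# Expected utility of each assignment of AE1, per the formula above.
P_G = {True: 0.7, False: 0.3}            # assumed marginal of the goal
P_AE2 = {"Lab-1": 0.4, "Lab-2": 0.6}     # assumed marginal of AE2

def U(g, a1, a2):
    # invented utility: reward when the goal holds, minus the costs
    costs = {"Lab-1": 10, "Lab-2": 7}[a1] + {"Lab-1": 14, "Lab-2": 9}[a2]
    return (30 if g else 0) - costs

def expected_utility(a1):
    # EU(a) = sum over G and AE2 of U(G, a, AE2) P(G) P(AE2)
    return sum(U(g, a1, a2) * P_G[g] * P_AE2[a2]
               for g in P_G for a2 in P_AE2)

best = max(("Lab-1", "Lab-2"), key=expected_utility)  # the argmax step
print(best, expected_utility(best))
```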

5 Using influence diagrams in MAS

In previous sections we described how to represent CRs with IDs. This section uses that representation in multi-agent systems, with the goal of permitting each agent to infer and explain relationships with other entities of the system in an uncertain environment. We assume that agents are able to communicate their local views of CRs. Each agent keeps an ID with its own view of the relationships in the MAS. Typically, these relationships only involve entities directly related to the agent, not transitive relationships. The idea is to keep the local ID of each agent as simple as possible by distributing the knowledge across the whole MAS. For example, in the factory domain each laboratory can be represented by an agent; therefore each agent would have an ID with the relationships in which it is directly involved.

As a consequence of this distribution of the ID across the agents of the MAS, the algorithms for inferring and explaining with IDs are also distributed across the MAS. Each agent uses its local information to infer. When the local information is not enough, the agent asks another agent for the necessary information. In these interactions agents do not exchange their IDs, but only the high-level information necessary to reason. For example, these interactions often involve questions that trigger reasoning tasks (inferring or explaining some variable) in the agent that receives the message. In this way, the global ID of the CRs is actually mapped onto the MAS as a distributed ID, in which each agent is in charge of a specific part of the knowledge. Reasoning with the ID is also distributed, since each agent reasons with its part of the ID. We use a variation of the local conditioning (LC) algorithm [13] for inferring in a distributed manner. LC is an extension of the classical algorithm for evidence propagation in singly-connected networks [22]. The basis of LC is the construction of a tree associated with the ID. The tree is built by doing a depth-first search in the ID as a way to detect and break the loops in the network. Then, in the resulting tree, a simple variation of the distributed LC algorithm [13] for evidence propagation in distributed Bayesian networks is applied.
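The following sketch illustrates only the interaction style just described, not the LC algorithm itself: each agent holds its local fragment and exchanges high-level answers (probabilities), never the IDs themselves. All names and numbers are our invention.

```python
# Each agent answers probability queries from its local knowledge,
# requesting missing inputs from peers; only numbers cross agent boundaries.

class Agent:
    def __init__(self, name, local_query, needs=()):
        self.name = name
        self.local_query = local_query  # function: inputs dict -> probability
        self.needs = needs              # (peer name, variable) pairs to ask for
        self.peers = {}

    def query(self, variable):
        # Gather remote beliefs first, then combine them locally.
        inputs = {var: self.peers[peer].query(var) for peer, var in self.needs}
        return self.local_query(inputs)

# Each lab knows only its own (assumed) success probability.
lab1 = Agent("Lab-1", lambda _: 0.90)
lab2 = Agent("Lab-2", lambda _: 0.75)
# The Factory combines the labs' answers into a belief about the goal G.
factory = Agent("Factory",
                lambda inp: inp["E1"] * inp["E2"],  # P(G) = both succeed
                needs=[("Lab-1", "E1"), ("Lab-2", "E2")])
factory.peers = {"Lab-1": lab1, "Lab-2": lab2}

print(factory.query("G"))  # 0.675
```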

6 Learning problem

As we have mentioned, the purpose of learning coordination relationships is to obtain IDs that can predict interactions, possible conflicts, opportunities for collaboration, etc. These predictions will lead the learning agent to arrive at the best decision in terms of maximizing its expected utility.

In order to learn CRs an agent may follow several strategies, such as monitoring the execution of its actions, its interactions with other agents, the evolution of organizations, etc. In the framework of CRs represented as IDs we can classify all these types of learning into two groups, namely, learning the conditional probability table associated with a CR and learning new CRs (the structure of the influence diagram). For example, consider the MAS described in section 3. After an interaction between the Factory and the Consultant, the former may decide to ask Lab-1 for a certain task given the Consultant's recommendation. If Lab-1 fails to perform the task, the Factory's trust in the Consultant will decrease. This will be reflected in the ID by updating the conditional probability table of CT.

In the rest of the section we will consider that the qualitative relationships among agents are known. The problem is then to determine the quantitative information among them, which is represented as conditional probabilities. The problem of learning conditional probability distributions has been studied extensively; however, most existing methods need a database containing a large number of cases with the values of all variables deterministically specified. In a MAS we do not have such a database of cases, so we need to update the conditional probability tables sequentially from observations. We assume that the goal of learning in this case is to find the values of the parameters of each conditional probability distribution that maximize the likelihood of the training data, which contains N independent cases. The normalized log-likelihood of the training set D is a sum of terms, one for each node:

L = \frac{1}{N} \sum_{i=1}^{n} \sum_{l=1}^{N} \log P(X_i \mid Pa_i, D_l)

We see that the log-likelihood scoring function decomposes according to the structure of the graph; hence we can maximize the contribution of each node to the log-likelihood independently (assuming the parameters of each node are independent of those of the other nodes). Consider estimating the conditional probability table of a boolean node. If we have a set of training data, we can just count the number of times the variable is true given each configuration of its parents, and the number of times it is false. Given these counts, we can find the maximum likelihood estimate of the conditional probability table as:

P(X_i \mid Pa_i) \simeq \frac{N(X_i, Pa_i)}{N(Pa_i)}

where N(X_i, Pa_i) is the number of cases in which X_i and its parents take the given joint configuration, and N(Pa_i) counts the parent configuration alone. Thus learning just amounts to counting (in the case of multinomial distributions). For Gaussian nodes, we can compute the sample mean and variance, and use linear regression to estimate the weight matrix. For other kinds of distributions, more complex procedures are necessary [34,4,31]. As is well known from the hidden Markov model literature, maximum-likelihood estimates of conditional probability tables are prone to sparse-data problems, which can be mitigated by using (mixtures of) Dirichlet priors or pseudo-counts, resulting in a maximum a posteriori estimate. For Gaussians, we can use a Wishart prior, etc.
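A small sketch of this counting estimate for a boolean node E with parents B and C; the handful of training cases is invented.

```python
# Maximum-likelihood CPT entries by counting, per the formula above.
from collections import Counter

# training cases: (B, C, E) observations of whether each task executed
data = [(True, True, True), (True, False, True), (True, False, False),
        (False, True, True), (False, False, False), (True, True, True)]

n_parents = Counter()   # N(Pa_i): times each parent configuration was seen
n_joint = Counter()     # N(X_i, Pa_i): times E=True with that configuration

for b, c, e in data:
    n_parents[(b, c)] += 1
    if e:
        n_joint[(b, c)] += 1

for pa in n_parents:
    print(pa, "P(E=T | B,C) ~=", n_joint[pa] / n_parents[pa])
```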

There is another factor that makes the learning of a conditional probability table more difficult: we often are not able to provide complete information to the learner, since agents only have a partial view of the whole MAS and its variables. The learning problem we are facing, where the structure of the network is known and there is partial observability, is usually solved by the expectation maximization (EM) algorithm. The basic idea behind EM is that if we knew the values of all the nodes, learning (the M step) would be easy, as we saw above. Thus in the E step we compute the expected values of all the nodes using an inference algorithm, and then treat these expected values as if they were observed. For example, in the case of the node X_i we replace the observed counts of the events with the number of times we expect to see each event:

P(X_i \mid Pa_i) = \frac{E(N(X_i, Pa_i))}{E(N(Pa_i))}

where E(N(X_i)) is the expected number of times the event occurs in the whole training set, given the current guess of the parameters. These expected counts can be computed as follows:

E(N(X_i)) = E\left[\sum_k I(X_i \mid D(k))\right] = \sum_k P(X_i \mid D(k))

where I(X_i \mid D(k)) is an indicator function:

I(X_i \mid D(k)) = \begin{cases} 1 & \text{if } X_i \text{ occurs in training case } k \\ 0 & \text{otherwise} \end{cases}

Given the expected counts, we maximize the parameters, then recompute the expected counts, and so on. This iterative procedure is guaranteed to converge to a local maximum of the likelihood surface. It is also possible to do gradient ascent on the likelihood surface [3,34], but EM is usually faster, since it uses the natural gradient, and simpler, since it has no step-size parameter and takes care of parameter constraints. In any case, we see that when nodes are hidden, inference becomes a subroutine called by the learning procedure. We still have to do further research in this area to determine, both theoretically and experimentally, which learning method is best suited for learning coordination relationships. Moreover, we have been extending the algorithms for parameter learning in BNs with heuristics more specific to CRs that would increase the efficiency of the learner as well as its accuracy.
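A minimal EM-style sketch for one CPT entry when the child is sometimes unobserved. In this degenerate toy the posterior over a missing value is just the current parameter; in general the E step runs full BN inference to obtain P(X_i | D(k)). The data and the starting guess are invented.

```python
# EM-style update of P(E=T | B, C): missing E values (None) contribute
# their expected counts instead of observed ones.

data = [(True, True, True), (True, True, None),   # E unobserved in some cases
        (True, False, False), (True, False, None)]

theta = {(True, True): 0.5, (True, False): 0.5}   # current parameter guess

for _ in range(10):                               # EM iterations
    exp_joint = {k: 0.0 for k in theta}
    exp_parent = {k: 0.0 for k in theta}
    # E step: expected counts; a missing E contributes P(E=T | B, C)
    for b, c, e in data:
        p = theta[(b, c)] if e is None else float(e)
        exp_parent[(b, c)] += 1.0
        exp_joint[(b, c)] += p
    # M step: re-estimate parameters as expected relative frequencies
    theta = {k: exp_joint[k] / exp_parent[k] for k in exp_parent}

print(theta)
```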

7 Related Work

The approaches related to the usage of Bayesian networks in MAS lie in the areas of the recursive modeling method [35] and trustworthiness in negotiation [1]. [35] describes a MAS where each agent models other agents' beliefs, desires and intentions as an influence diagram. The purpose of learning there is to predict other agents' behavior based on past observations. The main shortcoming of the approach is the large volume of information needed to model a MAS composed of several agents. In contrast, in our approach learning, as well as inference, is distributed over the whole MAS. The approach presented in [1] uses a static ID to model relationships among negotiating agents. These relationships are not updated, so the IDs are used only for acting, not for learning.

There are several attempts to learn to coordinate; however, most of them are designed for specific types of coordination or specific domains. For example, in [32] agents use a Q-learning algorithm and a classifier to develop coordination in two different domains (robot navigation and resource sharing). The convergence of this type of algorithm is often achieved only after several hundred trials, so the approach is likely to be more effective in domains where agents repeatedly perform similar tasks. In the MAS discussed in [14] each agent learns cooperative procedures which are then used by a planner, assuming complete and accurate knowledge of the agents and the environment.

8 Conclusions

In this paper we have presented an approach to representing coordination relationships in an uncertain multi-agent environment. The proposed mechanism involves a number of agents which select their actions by maximizing the expected utility. At the same time, they refine their models of CRs by monitoring the execution of their actions, thus achieving better coordination and discovering new relationships. By using BNs and IDs we have taken advantage of their well-known features, namely a compact representation of uncertain knowledge optimized for inference, action selection and learning. Therefore a MAS can enhance itself, not only at the level of its individual agents, but also at the MAS level.

Currently we are developing a MAS based on the model described in this paper. The MAS is being developed with a Java framework for MAS named Brainstorm/J [39]. In addition, we are comparing the performance of the learner component of our work with a mechanism based on reinforcement learning. We plan to expand the model proposed in this paper to incorporate other types of relationships. For example, we plan to represent knowledge about organizations, thus allowing the system to form coalitions and evolve as needed.

References

1. Bikramjit Banerjee, Sandip Debnath, and Sandip Sen. Using Bayesian networks to aid negotiations among agents. In Working Notes of the AAAI-99 Workshop on Negotiation: Settling Conflicts and Identifying Opportunities (also available as AAAI Technical Report WS-99-12), pages 44–49, 1999.
2. M. Barbuceanu and M. Fox. The design of a coordination language for multi-agent systems. In J. P. Müller, M. J. Wooldridge, and N. R. Jennings, editors, Proceedings of the ECAI'96 Workshop on Agent Theories, Architectures, and Languages: Intelligent Agents III, volume 1193 of LNAI, pages 341–356, Berlin, August 12–13 1996. Springer.
3. John Binder, Daphne Koller, Stuart Russell, and Keiji Kanazawa. Adaptive probabilistic networks with hidden variables. Machine Learning, 29:213–244, 1997.

4. W. L. Buntine. Operations for learning with graphical models. Journal of Artificial Intelligence Research (JAIR), 2:159–225, December 1994.
5. Bruce D'Ambrosio. Inference in Bayesian networks. AI Magazine, 20(2):21–35, 1999.
6. Rina Dechter. Bucket elimination: A unifying framework for probabilistic inference. In Eric Horvitz and Finn Jensen, editors, Proceedings of the 12th Conference on Uncertainty in Artificial Intelligence (UAI-96), pages 211–219, San Francisco, August 1–4 1996. Morgan Kaufmann Publishers.
7. Keith Decker. TÆMS: A framework for environment centred analysis and design of coordination mechanisms. In Greg O'Hare and Nick Jennings, editors, Foundations of Distributed Artificial Intelligence, chapter 16. John Wiley and Sons, 1996.
8. Keith Decker and Victor Lesser. Generalizing the partial global planning algorithm. International Journal of Intelligent and Cooperative Information Systems, 1(2):319–346, June 1992.
9. Keith Decker and Victor Lesser. Designing a family of coordination algorithms. In Proceedings of the Thirteenth International Workshop on Distributed AI, pages 65–84, Seattle, WA, July 1994. AAAI Press Technical Report WS-94-02. Also UMass CS-TR-94-14. Also in Proceedings of the First International Conference on Multi-Agent Systems, San Francisco, AAAI Press, 1995.
10. Keith Decker and Jinjiang Li. Coordinating mutually exclusive resources using GPGP. Autonomous Agents and Multi-Agent Systems, 3(2):133–157, June 2000.
11. Keith S. Decker. Environment Centered Analysis and Design of Coordination Mechanisms. PhD thesis, Department of Computer Science, University of Massachusetts, 1995.
12. Keith S. Decker. Task environment centered simulation. In M. Prietula, K. Carley, and L. Gasser, editors, Simulating Organizations: Computational Models of Institutions and Groups. AAAI Press/MIT Press, 1996.
13. Francisco Javier Díez. Local conditioning in Bayesian networks. Artificial Intelligence, 87:1–20, 1996.
14. Andrew Garland and Richard Alterman. Learning cooperative procedures. In AIPS Workshop on Integrating Planning, Scheduling and Execution in Dynamic and Uncertain Environments, 1998.
15. P. Haddawy, J. Jacobson, and Charles E. Kahn. Banter: A Bayesian network tutoring shell. Artificial Intelligence in Medicine, 10(2):177–200, 1997.
16. David Heckerman. A tutorial on learning Bayesian networks. Technical Report MSR-TR-95-06, Microsoft Research, March 1995.
17. Eric Horvitz, Jack Breese, David Heckerman, David Hovel, and Koos Rommelse. The Lumière project: Bayesian user modeling for inferring the goals and needs of software users. In Gregory F. Cooper and Serafín Moral, editors, Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (UAI-98), pages 256–265, San Francisco, July 24–26 1998. Morgan Kaufmann.
18. N. R. Jennings. Coordination techniques for distributed artificial intelligence. In G. M. P. O'Hare and N. R. Jennings, editors, Foundations of Distributed Artificial Intelligence, pages 187–210. John Wiley & Sons, 1996.
19. David Jensen, Michael Atighetchi, Régis Vincent, and Victor Lesser. Learning quantitative knowledge for multiagent coordination. In Proceedings of the 16th National Conference on Artificial Intelligence (AAAI-99); Proceedings of the 11th Conference on Innovative Applications of Artificial Intelligence, pages 24–31, Menlo Park, Cal., July 18–22 1999. AAAI/MIT Press.
20. Finn V. Jensen. An Introduction to Bayesian Networks. UCL Press, 1996.
21. Finn V. Jensen. Influence diagrams. In W. Piegorsch and A. El-Shaarawi, editors, Encyclopedia of Environmetrics. John Wiley & Sons, 2001.

22. J. H. Kim and J. Pearl. A computational model for causal and diagnostic reasoning in inference engines. In International Joint Conference on Artificial Intelligence, pages 190–193. Morgan Kaufmann, 1983.
23. V. Lesser, K. Decker, N. Carver, D. Neiman, M. Nagendra Prasad, and T. Wagner. Evolution of the GPGP domain-independent coordination framework. Technical Report UM-CS-1998-005, University of Massachusetts, Amherst, Computer Science, July 1998.
24. Z. Li and B. D'Ambrosio. Efficient inference in Bayes nets as a combinatorial optimization problem. International Journal of Approximate Reasoning, 10(5), 1994.
25. Lockheed-Martin. Lockheed-Martin autonomous control logic to guide unmanned underwater vehicle. Press release, Lockheed-Martin Missiles and Space Communications Office, Palo Alto, CA, April 1996. http://lmms.external.lmco.com/newsbureau/pressreleases/1996/9604.html.
26. Anders L. Madsen and Finn V. Jensen. Lazy propagation: A junction tree inference algorithm based on lazy evaluation. Artificial Intelligence, 113(1–2):203–245, 1999.
27. Thomas W. Malone and Kevin Crowston. The interdisciplinary study of coordination. ACM Computing Surveys, 26(1):87–119, March 1994.
28. Thomas D. Nielsen and Finn V. Jensen. Welldefined decision scenarios. In Kathryn B. Laskey and Henri Prade, editors, Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence (UAI-99), pages 502–511, S.F., Cal., July 30–August 1 1999. Morgan Kaufmann Publishers.
29. Judea Pearl. Bayesian networks. Technical Report 980002, University of California, Los Angeles, Computer Science Department, March 31, 1998.
30. Judea Pearl. Reasoning with cause and effect. In Dean Thomas, editor, Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI-99), volume 2, pages 1437–1449, S.F., July 31–August 6 1999. Morgan Kaufmann Publishers.
31. Stuart Russell, John Binder, and Daphne Koller. Adaptive probabilistic networks. Technical Report CSD-94-824, University of California, Berkeley, July 1994.
32. Sandip Sen and Mahendra Sekaran. Individual learning of coordination knowledge. Journal of Experimental and Theoretical Artificial Intelligence, 10(3):333–356, 1998.
33. Munindar P. Singh. Synthesizing coordination requirements for heterogeneous autonomous agents. Autonomous Agents and Multi-Agent Systems, 3(2):107–132, June 2000.
34. David J. Spiegelhalter and Steffen L. Lauritzen. Sequential updating of conditional probabilities on directed graphical structures. Networks, 20:579–605, 1990.
35. Dicky Suryadi and Piotr J. Gmytrasiewicz. Learning models of other agents using influence diagrams. In Proceedings of the 7th International Conference on User Modeling, pages 223–232, New York, July 1999. Springer.
36. Frank von Martial. Interactions among autonomous planning agents. In Y. Demazeau and J.-P. Muller, editors, Decentralized AI, pages 105–119. North Holland, 1990.
37. Gerhard Weiss. Adaptation and learning in multi-agent systems: Some remarks and a bibliography. In Gerhard Weiss and Sandip Sen, editors, Adaptation and Learning in Multi-Agent Systems, volume 1042 of Lecture Notes in Artificial Intelligence, chapter 1, pages 1–21. Springer-Verlag, 1996.
38. M. Wooldridge. Intelligent agents. In Gerhard Weiss, editor, Multiagent Systems, chapter 1. The MIT Press, 1999.
39. Alejandro Zunino and Analía Amandi. Building multi-agent systems from reusable software components. In Luis Otavio Alvares, editor, Proceedings of the 3rd Workshop in Distributed Artificial Intelligence and Multi-Agent Systems (3WDAIMAS), held in conjunction with the 7th Iberoamerican Conference on Artificial Intelligence (IBERAMIA 2000) and the 15th Brazilian AI Symposium (SBIA 2000), Atibaia, São Paulo, Brazil, November 2000.