A Learning-Based Coalition Formation Model for Multiagent Systems

Leen-Kiat Soh and Xin Li

Department of Computer Science and Engineering
University of Nebraska-Lincoln, 115 Ferguson Hall
Lincoln, NE 68588-0115 USA
001-402-472-6738

[email protected], [email protected]

ABSTRACT

In this paper, we present a learning-based coalition formation model that forms sub-optimal coalitions among agents to solve real-time, constrained allocation problems in a dynamic, uncertain, and noisy environment. The model consists of three stages (coalition planning, coalition instantiation, and coalition evaluation) and an integrated learning framework. An agent first derives a coalition formation plan via case-based reasoning (CBR). Guided by this plan, the agent instantiates a coalition through negotiations with other agents. When the process completes, the agent evaluates the outcomes. The integrated learning framework involves multiple levels embedded in the three stages. At a low level, concerning strategic and tactical details, the model allows an agent to learn how to negotiate. At the meta-level, an agent learns how to improve its planning and the actual execution of the plan. The model uses an approach that synthesizes reinforcement learning (RL) and case-based learning (CBL). We have partially implemented the model and conducted experiments on CPU allocation in a multisensor target-tracking domain.

Keywords
Coalition formation, reinforcement learning, case-based learning, negotiation

1. INTRODUCTION

In this paper we present a learning-based coalition formation model for a cooperative multiagent system. The model consists of three stages: coalition planning, coalition instantiation, and coalition evaluation. At the coalition planning stage, the coalition-initiating agent generates a plan using case-based reasoning (CBR). Then, the agent performs the actual act of coalition formation during the instantiation stage: allocating tasks or resources, identifying and ranking potential candidates, approaching the candidates and soliciting their help, coordinating agreements, and so on. If a coalition is formed, the agent executes it. The execution may be a success or a failure. Thereafter, the agent evaluates its coalition formation process at both of the previous stages to identify areas for improvement. The evaluation is based on two groups of parameters: one for the efficiency of the coalition formation (planning and instantiation) process, and one for the outcome of the coalition execution.

The integrated learning framework involves multiple levels embedded in the three stages. At a low level, concerning strategic and tactical details, the model allows an agent to learn how to negotiate with other agents to form a coalition. At the meta-level, an agent learns how to improve its planning and the actual execution of the plan. The model uses an approach that synthesizes reinforcement and case-based learning.

Our research domain is coalition formation among cooperative agents operating in a dynamic, real-time, uncertain, and noisy environment. Each agent has only a partial view of the environment and cannot afford to make fully rational, optimal decisions due to time constraints. Moreover, the environment changes dynamically: agents within or outside a coalition may affect the environment, which can render a carefully rationalized coalition formation plan useless (or less useful) by the time the coalition is formed. These concerns motivate our coalition formation model.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. AAMAS '03, July 1-2, 2003, Melbourne, Australia. Copyright 2003 ACM 1-58113-000-0/00/0000…$5.00.

2. COALITION FORMATION MODEL

Briefly, when an agent encounters a task that it does not have the capability to accomplish satisfactorily, it invokes the coalition formation process. First, it composes a plan via case-based reasoning (CBR). Equipped with the plan, the agent proceeds to the actual coalition instantiation, interacting with other agents and soliciting help. The tactical design of this process is aided by dynamic profiles of the environment and of other agents. Within this stage, learning modules allow the agent to learn from its interaction activities. If the coalition formation is successful, the agent executes the coalition. Either way, the agent arrives at the third stage, which evaluates the outcome of the coalition activities. The agent then feeds the analysis back into the casebase, completing the learning cycle.

In our system, there are two levels of learning for an agent: (1) a planning level, where an agent learns plans that form better coalitions, and (2) a strategic and tactical level, where an agent learns how to carry out the plan. By dividing learning into two levels, we do not overburden either level. In addition, there are two levels of learning involvement: (1) the learning module acts as an observer, and (2) the learning module acts as an adaptor. As an observer, the learning module monitors the coalition formation process and its outcome; after the process terminates, the agent refines the plan, so the learning affects only future coalition formation activities. As an adaptor, the learning module reacts to current information, activities, and the environment to increase the chance of a successful coalition formation. We use the observer at the planning level, and the adaptor at the strategic/tactical level.
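To make the observer/adaptor distinction concrete, the following minimal sketch (in Python) shows how the two kinds of learning module could hook into an episode; all class names, data structures, and heuristics here are our own illustrative assumptions, not the paper's specification:

```python
# Illustrative sketch of the two levels of learning involvement.
# All names and heuristics are hypothetical, not from the paper.

class ObserverModule:
    """Planning-level learner: monitors an episode and refines the
    plan only after the episode terminates (affects future episodes)."""

    def __init__(self):
        self.events = []

    def record(self, event):
        self.events.append(event)  # passive monitoring during the episode

    def refine_plan(self, plan, succeeded):
        # Post-hoc refinement: e.g., if the episode failed and the
        # message budget was exhausted, relax the budget next time.
        messages = sum(1 for e in self.events if e == "message")
        if not succeeded and messages >= plan["max_messages"]:
            plan["max_messages"] += 2
        return plan


class AdaptorModule:
    """Strategic/tactical-level learner: reacts to current information
    while the episode is still running."""

    def adapt(self, strategy, neighbor_response):
        # Mid-episode reaction to raise the chance of success.
        if neighbor_response == "rejected":
            strategy["concession_step"] *= 1.5  # concede faster
        return strategy


plan = {"max_messages": 4}
observer = ObserverModule()
for e in ["message", "message", "message", "message"]:
    observer.record(e)
plan = observer.refine_plan(plan, succeeded=False)

strategy = {"concession_step": 0.1}
strategy = AdaptorModule().adapt(strategy, "rejected")
```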

2.1 Coalition Planning


At this stage, the coalition-initiating agent derives a plan for the required coalition formation process via case-based reasoning (CBR). A case consists of a problem description, a solution, and an outcome. The problem description refers to the task to be accomplished; the solution refers to the coalition formation plan.

A task description consists of the following: (1) the priority of the task, (2) the set of subtasks, (3) the temporal relationships (a partial order) among the subtasks, (4) the time requirement of the task, (5) the resource requirements of the task (communication channels, CPU usage, disk storage, database access, etc.), (6) the noise factor associated with the task (the success rates and solution qualities, with probabilities), and (7) the set of acceptable outcomes and their corresponding utility values.

A plan consists of the following: (1) the number of coalition candidates, (2) the number of expected coalition members, (3) the time allocated for coalition instantiation, (4) the allocation algorithm, (5) the recommended number of messages, (6) the expected coalition instantiation outcome, and (7) the expected coalition execution outcome.

The learning mechanism employed here is case-based learning (CBL), tightly coupled with the coalition evaluation stage. Basically, we use CBR to retrieve the best case and adapt its solution to the current coalition problem. The retrieval is based on the similarity of the problem descriptions and the value of the case; this is how we infuse the element of reinforcement learning into CBR and, subsequently, CBL. The value of a case is based on the Q-value of applying its plan to its problem description. The goal of learning at this stage is manifold: (1) to obtain a good coalition, (2) to obtain a satisficing coalition, (3) to reduce the communication cost, (4) to reduce the processing cost, and (5) to reduce the formation time. The last three objectives aim to improve the coalition formation process, whereas the first two aim at coalition quality.
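As an illustration of retrieval that combines problem-description similarity with a learned case value, consider the following sketch; the similarity measure, the 0.7/0.3 weighting, and all field names are assumptions for illustration, not the paper's actual scheme:

```python
# Hypothetical sketch of Q-value-biased case retrieval. The weights
# and task features are illustrative assumptions.

def similarity(task_a, task_b):
    # Toy similarity over two numeric task features.
    dp = abs(task_a["priority"] - task_b["priority"]) / 10.0
    dt = abs(task_a["time_req"] - task_b["time_req"]) / 100.0
    return max(0.0, 1.0 - dp - dt)

def retrieve_best_case(casebase, new_task, w_sim=0.7, w_q=0.3):
    # Score = weighted mix of description similarity and the case's
    # Q-value learned from past coalition evaluations.
    return max(casebase,
               key=lambda c: w_sim * similarity(c["task"], new_task)
                             + w_q * c["q_value"])

casebase = [
    {"task": {"priority": 8, "time_req": 40}, "plan": "plan-A", "q_value": 0.9},
    {"task": {"priority": 3, "time_req": 90}, "plan": "plan-B", "q_value": 0.4},
]
best = retrieve_best_case(casebase, {"priority": 7, "time_req": 50})
print(best["plan"])  # the plan to adapt for the current problem
```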

2.2 Coalition Instantiation

The coalition instantiation stage consists of three components: (1) coalition initialization, where the agent evaluates and ranks the potential utility of each neighbor, (2) coalition finalization, where the agent conducts multiple concurrent 1-to-1 negotiations with the top-ranked neighbors, and (3) coalition acknowledgement, where the agent decides whether a coalition has been formed and whether it is to proceed as agreed, and informs all neighbors who have agreed to help. At this stage, we use both case-based learning (CBL) and reinforcement learning (RL): the former to learn better negotiation strategies and the latter to choose better negotiation partners. Readers are referred to [1] for details.
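A minimal sketch of this three-component flow, assuming a stubbed negotiation call and a simple utility score (the paper does not specify these interfaces):

```python
import concurrent.futures
import random

# Hypothetical sketch of coalition instantiation: rank neighbors by
# estimated utility, negotiate with the top-ranked ones concurrently,
# then decide whether the coalition is formed. The utility model and
# the negotiate() stub are illustrative assumptions.

def negotiate(neighbor):
    # Stand-in for a 1-to-1 negotiation; True if the neighbor agrees.
    return random.random() < neighbor["willingness"]

def instantiate_coalition(neighbors, members_needed, candidates):
    # (1) Initialization: rank neighbors by estimated utility.
    ranked = sorted(neighbors, key=lambda n: n["utility"], reverse=True)
    top = ranked[:candidates]
    # (2) Finalization: multiple concurrent 1-to-1 negotiations.
    with concurrent.futures.ThreadPoolExecutor() as pool:
        results = list(pool.map(negotiate, top))
    agreed = [n for n, ok in zip(top, results) if ok]
    # (3) Acknowledgement: decide whether the coalition is formed.
    formed = len(agreed) >= members_needed
    return formed, agreed

neighbors = [{"name": f"a{i}", "utility": random.random(),
              "willingness": 0.6} for i in range(6)]
formed, members = instantiate_coalition(neighbors, members_needed=2,
                                        candidates=4)
print(formed, [m["name"] for m in members])
```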

2.3 Coalition Evaluation

When a coalition instantiation process completes, (1) if it is a failure, the agent immediately evaluates the status of the coalition at the time it is terminated, or (2) if it is a success, the agent waits until the coalition is executed or discarded due to inter-coalition competition, and then evaluates both the coalition instantiation and execution outcomes.

Here, we learn from the evaluation through RL and CBL. The role of RL is to increase the likelihood of a good plan being selected for the next coalition formation task. The role of CBL is to provide the basis for studying the suitability of a plan for a particular task. The performance metric we use for each coalition is based on: (1) the time taken by the coalition formation process (a small value is generally preferred), (2) the number of messages (a small value is generally preferred), (3) the number of neighbors approached (depending on the robustness and reliability requirements, a certain value is preferred), (4) the outcome of the coalition formation process and its associated utilities, (5) the number of coalition members, (6) the fairness of the load distribution among members (in general, the load should be evenly distributed), (7) the execution outcome of the coalition (a set of weighted vectors of the quality of the execution outcome), and (8) the number of coalitions aborted due to inter-coalition competition (a small value is generally preferred).
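The paper does not give the update rule that connects these metrics to a case's Q-value; the following sketch shows one standard possibility, collapsing a subset of the metrics into a scalar reward and nudging the applied case's Q-value toward it. The weights, the metric normalization, and the learning rate are all assumptions:

```python
# Hypothetical sketch of the evaluation-to-learning step. Weights,
# normalizations, and the learning rate are illustrative assumptions;
# the paper does not specify this update rule.

def reward(metrics, weights):
    # Higher reward is better, so "small is preferred" metrics
    # (time, messages, aborted coalitions) enter negatively.
    return (weights["success"] * metrics["success"]
            - weights["time"] * metrics["time"]
            - weights["messages"] * metrics["messages"]
            - weights["aborted"] * metrics["aborted"]
            + weights["fairness"] * metrics["fairness"])

def update_q(case, metrics, weights, alpha=0.2):
    # Exponential moving average toward the observed reward, so good
    # plans become more likely to be retrieved next time.
    r = reward(metrics, weights)
    case["q_value"] += alpha * (r - case["q_value"])
    return case

case = {"plan": "plan-A", "q_value": 0.5}
metrics = {"success": 1.0, "time": 0.3, "messages": 0.2,
           "aborted": 0.0, "fairness": 0.8}
weights = {"success": 1.0, "time": 0.2, "messages": 0.1,
           "aborted": 0.5, "fairness": 0.3}
update_q(case, metrics, weights)
print(round(case["q_value"], 3))
```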

3. RESULTS AND CONCLUSIONS

Thus far, we have designed and implemented the coalition instantiation stage and several allocation algorithms, and we have conducted several experiments on CPU reallocation. In this application domain, when an agent detects a CPU shortage, it needs to form a coalition with other agents to solve the resource shortage problem while continuing to track targets. Conversely, when an agent has unused CPU, it is willing to help others by transferring part of its CPU allocation to them. Our experiments show that the agents are able to reduce CPU shortages and adapt coherently to various resource constraints, and that learning plays a significant role in the effectiveness of coalition formation among agents. An initiating agent with less experience does not form coalitions well, even though it may become quite good at joining other agents' coalitions.

We have presented a learning-based coalition formation model. The integrated learning framework involves a low level on strategic and tactical details and a meta-level on planning and execution. We have implemented and tested the coalition instantiation stage of this model, and the initial results are promising.

4. ACKNOWLEDGMENTS

This work is partially supported by a grant from the DARPA ANTS project, subcontracted from the University of Kansas (contract number 26-0511-0026-001).

5. REFERENCES

[1] Soh, L.-K., and Tsatsoulis, C. Reflective negotiating agents for real-time multisensor target tracking. In Proceedings of IJCAI'01, 1121–1127, Seattle, WA, 2001.
