Design and Evaluation of Distributed Role Allocation Algorithms in Open Environments

Ranjeev Mittu, Naval Research Laboratory, [email protected], Fax: 202-404-1191
William Chao, Naval Research Laboratory, [email protected], Fax: 202-767-1191
Myriam Abramson, Naval Research Laboratory, [email protected], Fax: 202-404-1191

Abstract— Role allocation has emerged as one of the key issues in teamwork involving communication. While direct point-to-point communication is expensive and uncertain, access to neighbors is reliable and efficient in scale-free networks such as those found in open environments and social networks. This suggests decentralized approaches to role allocation based on communication between neighbors. This paper adapts and evaluates some of the basic types of distributed role allocation algorithms in open environments, using a novel coordination measure in the prey/predator domain.

Index Terms— distributed problem-solving, role allocation, coordination, open environments
I. INTRODUCTION

Open environments, such as peer-to-peer (P2P) networks and wireless or Mobile Ad Hoc Networks (MANET), are characterized by dynamic team formation, multiple and changing goals, non-deterministic state transitions, variable resources, and unreliable communication. The key issue in teamwork theory is role allocation based on the communication of beliefs and intentions. This issue is complicated by the characteristic factors of open environments. Role allocation can be viewed as a constraint satisfaction problem, similar to resource allocation problems, where agents are variables and roles are the values that those variables can take. When a utility or preference is attached to an assignment, role allocation also entails a constraint optimization problem. When multiple teams coexist, an agent can take on multiple non-conflicting roles. Over-constrained situations, where a team cannot be formed because of an insufficient number of agents, are not relevant here. Rather, an agent has to assume multiple roles in those situations. The capability of agents to
Fig. 1. Multi-hop routing in MANET. Communication can extend beyond the immediate neighbors, effectively extending the communication range of the agents.
handle multiple roles is an important aspect of the coordination of intelligent agents.

Mobile ad hoc networks are characterized by a dynamic topology due to node mobility and limited energy life. In addition, the limited communication range provides only partial knowledge of the global environment, although communication is not necessarily restricted to the immediate neighbors (see Fig. 1). These constraints make it advantageous for agents to self-organize within their communication range, and the absence of centralized control requires a distributed control policy. The uncertainty that a message will arrive at its destination in a finite amount of time violates one of the basic communication assumptions of distributed constraint satisfaction algorithms [1]. How to extend those algorithms to open and uncertain environments is still an active area of research [2].

This paper is organized as follows. Section II introduces our evaluation methodology and motivates the experiments in the prey/predator domain. Section III presents the results and analysis of our evaluation. Finally, Sections IV and V conclude with a summary of related work and recommendations for future work.

Published in the Proceedings of the 2005 International Conference on Artificial Intelligence (IC-AI 2005), June 2005. Performing organization: Naval Research Laboratory, 4555 Overlook Avenue SW, Washington, DC 20375. Approved for public release; distribution unlimited.

II. METHODOLOGY

Our methodology consists of adapting three distinct types of distributed role allocation algorithms and evaluating the results in the prey/predator domain, a concurrent and asynchronous environment.

A. Prey/Predator Problem

The prey/predator pursuit game (Fig. 2) is a canonical example in the teamwork literature [3] because a single predator cannot capture a prey alone. Practical applications of the prey/predator pursuit game include, for example, target acquisition by unmanned ground/air vehicles, distributed sensor networks for situation awareness, and rescue operations. Because the global reward decomposes into a sum of local rewards, the original problem can be extended to multiple teams by including additional prey. Prey and predators can sense each other if they are in proximity but do not otherwise communicate. Predators communicate with other predators individually or can broadcast messages through their neighbors.

Four predators are needed to capture a prey by filling four different roles: surrounding the prey to the north, south, east and west. Those roles are independent of each other and can be started at any time, obviating the need for scheduling. The only requirement is that they terminate at the same time, either successfully when a capture occurs or unsuccessfully if no team can be formed. The predator agents are homogeneous and can assume any role, but heterogeneity can be introduced by restricting the role(s) an agent can assume.

The prey and predators move concurrently and asynchronously at different time steps. In addition to the four orthogonal navigational steps, the agents can opt to stay in place. In case of collision, the agents are held back to their previous position. The prey moves furthest away from the closest predator in its perception range or does not move if no predators are in sight.
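As a minimal sketch of the prey's movement rule just described, the following Python fragment picks, among the five allowed moves, the free cell furthest from the nearest visible predator. The function names, the use of Manhattan distance for both perception and evasion, and the square grid of side `size` are assumptions made for illustration; they are not taken from the original implementation.

```python
# Hypothetical sketch of the prey's movement rule; all names are illustrative.
MOVES = [(0, 0), (0, 1), (0, -1), (1, 0), (-1, 0)]  # stay, plus four orthogonal steps


def toroidal_manhattan(a, b, size):
    """Manhattan distance between cells a and b on a size x size torus."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    return min(dx, size - dx) + min(dy, size - dy)


def prey_move(prey, predators, size, perception_range):
    """Return the prey's next cell: move away from the nearest visible predator."""
    visible = [p for p in predators
               if toroidal_manhattan(prey, p, size) <= perception_range]
    if not visible:
        return prey  # no predator in sight: stay in place
    nearest = min(visible, key=lambda p: toroidal_manhattan(prey, p, size))
    candidates = [((prey[0] + dx) % size, (prey[1] + dy) % size)
                  for dx, dy in MOVES]
    # Skip cells occupied by predators (a collision would undo the move anyway).
    free = [c for c in candidates if c not in predators]
    return max(free, key=lambda c: toroidal_manhattan(c, nearest, size))
```

Ties between equally distant cells are broken by the order of `MOVES`, which is an arbitrary choice of this sketch.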
This strategy is referred to as the Move Away From Nearest Predator (MAFNP) strategy [4]. It has been shown to lead to certain capture situations [5]. The preference of a predator agent for a role is
Fig. 2. Prey/predator pursuit game on a toroidal grid
inversely proportional to the Manhattan distance required to achieve the role. When assigned a role, a predator moves in the direction of its target; otherwise it explores the space according to a memory-based scheme over its last few steps. The decision space for the role allocation of predators and prey is […], where […] is the number of teams of size […].¹ This problem belongs to the most difficult class of problems for constraint satisfaction in multi-agent systems [6] due to the dynamic nature of the environment and the mutually exclusive property of role allocation.

¹ […] for distinct teams and […] for non-distinct teams where agents assume multiple roles.

B. Distributed Role Allocation Algorithms

There are three basic types of distributed role allocation algorithms involving communication between agents. Typically, each agent runs the same algorithm to decide whether to change its value, and the agents arrive at a solution after exchanging messages in an iterative fashion. In the first type, the distributed stochastic algorithm (DSA) [7], [8], changes are made stochastically by the agents and communicated until a consistent solution is obtained (see Section II-B.1). In the second type, illustrated by the simple distributed improvement algorithm (SDI) (see Section II-B.2), the changes are communicated and "voted" on in a distributed fashion before being incorporated into a solution. A third type of distributed algorithm (DCO) involves the communication of state information, or beliefs, between agents to infer their respective roles using a constraint optimization method. In these algorithms, agents only broadcast communications without waiting for responses, which eliminates deadlocks such as the coordinated attack dilemma [9].

The basic sense-think-act agent loop needs to be modified to augment perception with communication from peers. A distributed sense-think-act-communicate agent loop follows:

1) Read messages from neighbors
2) Sense environment
3) Modify internal state and select next action
4) Act
5) Send messages to neighbors
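The five steps above can be sketched as a small Python class. This is only an illustrative skeleton: the class name, the dictionary used as environment, and the trivial decision rule in `think` are hypothetical and stand in for the domain-specific logic of each algorithm.

```python
from collections import deque


class Agent:
    """Hypothetical agent skeleton; method names are illustrative only."""

    def __init__(self, name):
        self.name = name
        self.inbox = deque()       # messages delivered by neighbors
        self.neighbors = []        # agents within communication range
        self.state = {"heard": [], "seen": None}
        self.log = []              # records the order of executed steps

    def read_messages(self):       # step 1
        while self.inbox:
            self.state["heard"].append(self.inbox.popleft())
        self.log.append("read")

    def sense(self, environment):  # step 2
        self.state["seen"] = environment.get(self.name)
        self.log.append("sense")

    def think(self):               # step 3: placeholder decision rule
        self.log.append("think")
        return "stay" if self.state["seen"] is None else "move"

    def act(self, action):         # step 4
        self.log.append("act:" + action)

    def send(self, message):       # step 5: broadcast, no reply is awaited
        for neighbor in self.neighbors:
            neighbor.inbox.append((self.name, message))
        self.log.append("send")

    def step(self, environment):
        """Run steps 1 to 5 once, in order."""
        self.read_messages()
        self.sense(environment)
        action = self.think()
        self.act(action)
        self.send(action)
```

Because `send` only appends to the neighbors' inboxes and never blocks on a reply, each agent can run `step` on its own clock, which matches the broadcast-only communication described above.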
In this context, a round or cycle involving communication consists of executing steps 4 and 5 and then, assuming all messages have been simultaneously put in the appropriate channels, steps 1, 2 and 3. In synchronous environments, those two alternating modes, active and passive, are executed in lockstep. In an asynchronous environment, the time required to execute a round depends on the internal clock of the agent. Depending on how fast the internal clock of the agent is relative to the other agents and the environment, a discrepancy between the external world and the internal state of the agent will occur. In those algorithms, exploring is another role that the agent can assume. Since the possible roles are not known a priori but obtained through sensory information, they have to be disseminated as a separate message.

1) Distributed Stochastic Algorithm: The DSA algorithm (Algorithm 1) is a stochastic algorithm in which each agent changes its assignment concurrently, with a probability p, if doing so improves the overall solution quality in terms of constraint violations. The stochasticity of the algorithm should reduce the number of possible messages without affecting the overall performance. The role communicated is the high-level role (e.g. surround the prey to the north). To avoid cycles, a tabu search is conducted when searching for a new role to minimize conflicts. An agent selects a new role according to its preference for that role and the number of constraint violations associated with the new role:
Algorithm 1 Distributed Stochastic Algorithm (DSA)
  select p, a probability threshold
  set initial role to explore
  active ← true
  while (no termination condition) do
    if (active) then
      if (a new role is selected) then
        broadcast the new role to neighbors
      endif
      act according to role
      sense environment
      broadcast prey information to neighbors
      active ← false
    endif
    sense environment
    collect neighbors' new roles, if any
    generate a random number r
    if (r < p) then
      select the next best role to reduce conflicts
    endif
    active ← true
  end while
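[Eq. (1): role-selection expression in terms of the agent's preference for a role and the number of constraint violations associated with that role; the symbols are not recoverable from the extracted text.]   (1)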
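The passive phase of Algorithm 1 might be sketched as follows. The `score` function is a hypothetical stand-in for Eq. 1 (preference discounted by the number of violations) and is an assumption of this sketch, as are the function and parameter names; a conflict is taken to mean that a neighbor has announced the same role.

```python
import random


def score(preference, violations):
    # ASSUMPTION: a stand-in for Eq. 1, not the paper's formula.
    return preference / (1.0 + violations)


def dsa_select_role(current, preferences, neighbor_roles, tabu, p, rng=random):
    """One stochastic role update for a single agent.

    preferences:    dict role -> this agent's preference for the role
    neighbor_roles: roles currently announced by neighbors
    tabu:           set of roles this agent recently abandoned
    p:              probability threshold for changing role
    """
    def violations(role):
        # A conflict is a neighbor claiming the same (mutually exclusive) role.
        return sum(1 for r in neighbor_roles if r == role)

    if current != "explore" and violations(current) == 0:
        return current                      # consistent: nothing to change
    if rng.random() >= p:
        return current                      # stochastically keep the role
    candidates = [r for r in preferences if r not in tabu and r != current]
    if not candidates:
        return "explore"                    # no admissible role: explore
    return max(candidates, key=lambda r: score(preferences[r], violations(r)))
```

The caller would add the abandoned role to `tabu` and broadcast the returned role only when it differs from `current`, so that a message is sent only on a change.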
The best role according to Eq. 1 that is not tabu is selected. If a new role cannot be obtained, the agent switches to exploration. Each agent maintains a view of the intent of the other agents but does not know or infer their role preferences. If there are conflicts, the agent stochastically selects a different role, but not necessarily a better role in terms of preference. A role might not have a feasible plan if the location of the prey is unknown, in which case the agent has to switch to exploration until the prey is located. A message is sent only if the agent selects a new role or switches to exploration. In a closed and static environment, termination occurs after n consistent rounds, where n is the number of neighbors and is assumed fixed. At this stage, the agents have common knowledge of their respective roles. Termination entails commitment to a role. In an open and dynamic environment, the agents can only weakly commit to a role and no termination is ensured.

2) Simple Distributed Improvement Algorithm: A similar algorithm, based on distributed local improvement [10] and the min-conflict heuristic [11], exchanges messages among neighbors and attempts to find a conflict-minimizing solution before settling on a role (Algorithm 2). In a leader election approach [9], an external consensus based on preference is sought before settling on a role.
Algorithm 2 Simple Distributed Improvement (SDI)
  set initial role to explore
  active ← true
  while (no termination condition) do
    if (active) then
      if (a new role is selected) then
        broadcast the new role to neighbors
      endif
      act according to role
      sense environment
      broadcast prey information to neighbors
      active ← false
    endif
    sense environment
    selectively collect roles from neighbors with maximal improvement
    if (conflicts occur) then
      select a new role
    endif
    active ← true
  end while
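The conflict-handling step of Algorithm 2 might be sketched as follows, under the assumption that an agent yields a contested role only to a neighbor announcing a higher preference for it, and otherwise picks the non-tabu alternative with the fewest conflicts. The function name, the `claims` dictionary, and the tie-breaking by preference are hypothetical choices made for illustration.

```python
def sdi_resolve(current, my_prefs, claims, tabu):
    """Return the role to follow after one exchange with neighbors.

    current:  role this agent is tentatively following (a key of my_prefs)
    my_prefs: dict role -> this agent's preference for the role
    claims:   dict role -> highest preference announced by a neighbor
    tabu:     set of roles this agent recently gave up
    """
    rival = claims.get(current)
    if rival is None or rival <= my_prefs[current]:
        return current                     # no conflict, or this agent wins it

    def conflicts(role):
        return 1 if role in claims else 0

    options = [r for r in my_prefs if r != current and r not in tabu]
    if not options:
        return current                     # no alternative: keep current role
    # Min-conflict choice; ties are broken in favor of higher preference.
    return min(options, key=lambda r: (conflicts(r), -my_prefs[r]))
```

A role is compared against the best neighboring claim only, which keeps the sketch short; a fuller version would track which neighbor made each claim.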
Algorithm 3 Distributed Constraint Optimization (DCO)
  set initial role to explore
  active ← true
  while (no termination condition) do
    if (active) then
      act according to role
      sense environment
      broadcast local state to neighbors
      active ← false
    endif
    collect neighbors' new information, if any
    estimate possible roles with a constraint optimization algorithm
    select role
    active ← true
  end while
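The role-estimation step of Algorithm 3 can be illustrated with a small assignment solver. The sketch below deliberately swaps in an exhaustive search over the 24 permutations of four roles in place of an optimized assignment algorithm, and uses total toroidal Manhattan distance as the cost; the cost function, the mapping of "north" to increasing y, and all names are assumptions of this sketch.

```python
from itertools import permutations


def toroidal_manhattan(a, b, size):
    """Manhattan distance between cells a and b on a size x size torus."""
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    return min(dx, size - dx) + min(dy, size - dy)


def assign_roles(predators, prey, size):
    """Return {agent_name: role} minimizing total distance to the role cells.

    predators: dict name -> (x, y); exactly four predators are assumed
    prey:      (x, y) location of the prey as believed by this agent
    """
    targets = {
        "north": (prey[0], (prey[1] + 1) % size),
        "south": (prey[0], (prey[1] - 1) % size),
        "east": ((prey[0] + 1) % size, prey[1]),
        "west": ((prey[0] - 1) % size, prey[1]),
    }
    names = sorted(predators)   # identical ordering on every agent
    roles = sorted(targets)
    # Brute-force stand-in for an assignment algorithm: try every matching.
    best = min(
        permutations(roles),
        key=lambda perm: sum(
            toroidal_manhattan(predators[n], targets[r], size)
            for n, r in zip(names, perm)
        ),
    )
    return dict(zip(names, best))
```

Sorting the agent names and roles makes the search deterministic, so agents holding the same beliefs would compute the same assignment without exchanging their intended roles.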
TABLE I
DISTINCTIVE FEATURES OF DISTRIBUTED ROLE ALLOCATION ALGORITHMS

              DSA   SDI   DCO
belief*                   ✓
intent*       ✓     ✓
preference*         ✓
stochastic    ✓

* Description of what is being communicated

In an open and dynamic environment, a role is tentatively followed without reaching a consensus, keeping the option of readjusting when conflicts occur. Upon conflict, an agent will attempt to change its role only if the role is taken by another agent with a higher preference for the role. As in the DSA algorithm, a tabu list is kept to avoid cycles. If a new role cannot be obtained, the current role is kept. The role communicated is the high-level role (i.e. intent) relative to a uniquely identified prey (e.g. surround prey to the north), along with the agent's preference for the role. A message is sent only if the agent selects a new role.

3) Distributed Constraint Optimization Algorithm: An optimization algorithm can be run in parallel by each agent, based on sensed information and on information communicated by the other agents in the group, to autonomously determine which role to assume (see Algorithm 3). It is assumed that the other agents reach the same conclusions because they use the same optimization algorithm [12] and the same payoff function. The Hungarian algorithm [13] is used as the optimization method by each agent. The information necessary to determine the payoff of each role needs to be communicated. Therefore, it is the current local state within the perception range, possibly augmented with second-hand information, that is communicated to the neighbors instead of just the intended role. What is being communicated is a location on the grid. The Hungarian algorithm, also known as the bipartite weighted matching algorithm, solves constraint optimization problems such as the job assignment problem in polynomial time. The implementation of this algorithm follows Munkres' assignment algorithm [14]. The algorithm is run over a utility matrix of !7698;:.= + -?:A@"B