Information-Based Multi-agent Exploration

M. Baglietto, M. Paolucci, L. Scardovi, and R. Zoppoli
Department of Communications, Computer and System Sciences, DIST–University of Genoa, Via Opera Pia 13, 16145 Genova, Italy.
E-mail: {mbaglietto, paolucci, lucas, rzop}@dist.unige.it

Abstract

This paper deals with the problem of mapping an unknown environment by a team of robots. A discrete grid map of the environment is considered, in which each cell is marked as free or not free depending on the possible presence of obstacles. The multi-agent exploration is performed by a team of autonomous robots that can communicate with each other and coordinate their actions. A new information-based exploration heuristic that exploits both the information-gain and the frontier concepts is proposed. Experimental results show the effectiveness of the approach.

1 Introduction

Exploration and mapping is a fundamental task that autonomous mobile agents must perform whenever they operate in unknown domains about which only partial information, or no information at all, is available (examples are robotic extra-planetary exploration, undersea terrain mapping, and exploration of telecommunication networks). A promising strategy to rapidly determine a map of an unknown environment is to use a team of autonomous agents. According to such a strategy, each single agent performs a local exploration, and a team-coordination policy is adopted in order to make the agents cooperate in reaching the global exploration goal more efficiently. In this way, the whole exploration task is decomposed into a set of simpler subtasks that are accomplished by the single agents on the basis of their local information and of the information exchanged with the other agents. The unknown environment is modelled with a grid defined by partitioning the domain into a finite number of squares (cells). The environment is unknown due to the possible presence of obstacles located in unknown positions; hence, two distinct states,

free and not free, are associated with the cells of the grid, depending on the presence of obstacles on them. The agents must explore the environment by moving from a free cell to an adjacent free one, with the purpose of identifying the state of all the cells reachable from the starting one. At each exploration step, an agent improves its knowledge of the model by updating, on its private map, the state of the cells it has encountered along its path, and communicates such pieces of information to the other agents in order to speed up the whole exploration. As a grid graph can be associated with the discrete grid model adopted, the on-line exploration considered here corresponds to the off-line determination of a minimum connected dominating set for such a graph, that is, a minimum set of vertices which, together with their neighbors, constitute all the vertices of the graph. Such a problem is NP-hard [1]. This paper proposes a new multi-agent exploration heuristic based on the concepts of entropy and frontier. Entropy is used to quantify the information gain obtained during the exploration process, making the agents move toward the places where information is less certain. Such a strategy has been used by MacKay [2] for data selection and analysis. Similar techniques in an exploration framework have appeared in [3], [4], and [5]. The concept of frontier, introduced in [6], has been widely used in experimental robotic frameworks. The approach proposed here, however, aims at using these two concepts jointly, by defining the frontier as a function of the entropy. The multi-agent exploration of unknown environments has also been studied in several works in the literature (for example, see [7, 8, 9]), but, to the authors' best knowledge, it does not seem to have been considered in connection with a frontier exploration driven by information gain. The approach presented here is based on the minimization of a penalty function that includes a measure of the entropy associated with the cells of the model, and acts as a greedy heuristic that moves

the agents toward the frontier cells with the greatest information gains.

2 The problem of mapping an unknown environment

Let us consider a set of N autonomous vehicles DM_1, . . . , DM_N that are assigned the task of exploring a bidimensional environment. The exploration problem consists in constructing a map of the terrain by identifying its obstacle-free part. In order to model the environment, we choose a discrete formalization, dividing the terrain into regular squares (cells) denoted by their coordinates (i, j). Without loss of generality, we assume the portion of terrain to be explored to be rectangular, i.e., described by G = {(i, j) : i = 1, . . . , n_i, j = 1, . . . , n_j}.

More generally, any compact set of cells G can be considered. In order to model the presence of obstacles on a cell, we introduce the following function:

$$S(i, j) = \begin{cases} 1 & \text{if there is an obstacle on } (i, j) \\ 0 & \text{if there is no obstacle on } (i, j). \end{cases}$$

Let us define x_k(t) = [i_k(t), j_k(t)], k = 1, . . . , N, t = 0, 1, . . ., as the position of DM_k on G at time t (we have adopted a discrete-time setting). In order to describe the set of feasible moves of each DM_k at time t, we define Q = {(a, b) : a, b ∈ {−1, 0, 1}} \ {(0, 0)}. The movements of the DMs on the terrain are then defined as x_k(t + 1) = x_k(t) + u_k(t), u_k(t) ∈ Q.

Definition 1 A path P_k = {x_k(t), t = 0, 1, . . .} is admissible if and only if S[x_k(t)] = 0, ∀t = 0, 1, . . .

We now give some definitions which formalize the explorability concept. Given N agents and their N starting vertices x_1(0), . . . , x_N(0):

Definition 2 A cell x_g is reachable if and only if there exist an index k and an admissible path P_k such that x_g ∈ P_k.

Definition 3 A cell x_e is explorable if and only if it is adjacent to at least one reachable cell x_g.
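For concreteness, the following Python sketch illustrates the occupancy function S, the move set Q, and the admissibility test of Definition 1. It is an illustrative sketch of our own, not part of the original formulation; the names `Grid` and `is_admissible_path` are ours.

```python
import numpy as np

class Grid:
    """Rectangular grid G with occupancy function S(i, j)."""
    def __init__(self, obstacle_map):
        # S[i, j] = 1 if cell (i, j) holds an obstacle, 0 otherwise.
        self.S = np.asarray(obstacle_map, dtype=int)

    def in_bounds(self, x):
        i, j = x
        return 0 <= i < self.S.shape[0] and 0 <= j < self.S.shape[1]

    def free(self, x):
        return self.in_bounds(x) and self.S[x] == 0

# Q: the eight feasible moves (a, b) with a, b in {-1, 0, 1}, excluding (0, 0).
Q = [(a, b) for a in (-1, 0, 1) for b in (-1, 0, 1) if (a, b) != (0, 0)]

def is_admissible_path(grid, path):
    """Definition 1: every visited cell is obstacle-free, and consecutive
    positions differ by a move in Q."""
    for t, x in enumerate(path):
        if not grid.free(x):
            return False
        if t > 0:
            step = (x[0] - path[t - 1][0], x[1] - path[t - 1][1])
            if step not in Q:
                return False
    return True
```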

When moving on the terrain, the DMs acquire new information on the environment through their sensors. At each time t = 0, 1, . . ., each DM has its own representation of the environment:

$$M_k[(i, j), t] = \begin{cases} 1 & \text{if } DM_k \text{ knows that } S(i, j) = 1 \\ 0 & \text{if } DM_k \text{ knows that } S(i, j) = 0 \\ 0.5 & \text{if } (i, j) \text{ is an unexplored cell.} \end{cases} \qquad (1)$$

For every DM_k, k = 1, . . . , N, the mapping M_k[(i, j), t] represents, at time t, its knowledge of the environment. Obviously, the complete knowledge of the map is achieved when M_k[(i, j), t] = 0 or 1, ∀(i, j) ∈ G. Updating an entry of M_k corresponds to acquiring information on the corresponding cell (in the next section, we shall give some details on this aspect). In order not to burden the notation, in the formulation of the exploration problem and in the next section (where a technique to solve it efficiently is proposed), we shall consider a single agent (thus dropping index k). We introduce the following problem:

Problem 1 Given a compact set G of cells (i, j) and an initial position x(0) with S[x(0)] = 0, find a sequence of moves u(0), . . . , u(T − 1) such that the path P = {x(0), . . . , x(T)} is feasible and M[(i, j), T] = 0 or 1 for every explorable cell (i, j) ∈ G.
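A minimal sketch of the three-valued subjective map (1) and of its deterministic update when cells are sensed is given below. The class name `SubjectiveMap` and the helpers are our own; `grid` is the `Grid` object of the previous sketch, and `visible_cells` stands for the set of cells currently within sensor range.

```python
import numpy as np

UNKNOWN = 0.5  # prior value assigned to unexplored cells

class SubjectiveMap:
    """M[(i, j), t]: 1 = known obstacle, 0 = known free, 0.5 = unexplored."""
    def __init__(self, shape):
        self.M = np.full(shape, UNKNOWN)

    def update_from_sensor(self, grid, visible_cells):
        # Deterministic sensor: every seen cell is labelled with its true state S(i, j).
        for x in visible_cells:
            self.M[x] = grid.S[x]

    def fully_explored(self, explorable_cells):
        # Problem 1 is solved when every explorable cell is labelled 0 or 1.
        return all(self.M[x] in (0.0, 1.0) for x in explorable_cells)
```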

3 An entropy-based technique

In order to introduce our exploration algorithm, let us consider the mapping M[(i, j), t]. A possible interpretation is the following: at time t, M[(i, j), t] represents the probability that vertex (i, j) is occupied by an obstacle, given the information acquired by the DM. We shall assume the DM to have a limited sensor range (call it R_v). Moreover, to take into consideration the possibility of "seeing" a cell x′ when the DM occupies the vertex x, we shall denote by N(x) the set of vertices the DM can see from x (note that, when R_v > 1, obstacles may inhibit the vision of some x′, even if ||x′ − x|| < R_v).

Remark 1 In this paper, by considering only three possible values for M[(i, j), t] (see (1)), we assume the sensor to be "deterministic", i.e., when a vertex is "seen" by the DM, the DM can label it as obstacle-free or not, without uncertainty. One could easily take non-deterministic sensors into consideration by letting M[(i, j), t] ∈ [0, 1] and using a Bayesian rule to update it.

As the exploration task can be seen as an information-gain problem, we recall some concepts from information theory.

Let us consider an information source X emitting a series of messages, chosen from a set m_1, . . . , m_L. Every message m_i can be considered as a symbol emitted with probability P_i and carrying an information quantity I_i = −log_b P_i, where b is the logarithmic base. Obviously, the message set must satisfy the condition $\sum_{i=1}^{L} P_i = 1$. Assuming a stationary source and statistically independent messages (i.e., a discrete memoryless source), the entropy of the source can be expressed as $H(X) = \sum_{i=1}^{L} P_i \log_b \frac{1}{P_i}$. This quantity will be useful to define a suitable cost function to be minimized during the exploration.

Figure 1: A map

The basic idea of our work is the following: if we see the exploration task as an information-gain one, we can use the entropy of the map to guide the agent in generating suitable trajectories. In our formalization, every vertex (i, j) can be considered as an information source that can send to the agent two messages, m_1: [S(i, j) = 1] or m_2: [S(i, j) = 0], with the same probability. At time t, we can express the information gain achieved by the DM when moving to vertex x as

$$B(x, t) = \sum_{x' \in N(x)} \sum_{k=0,1} P\{S(x') = k\}\,\log_2 \frac{1}{P\{S(x') = k\}} \qquad (2)$$

where P{S(x′) = 1} = M[x′, t] and P{S(x′) = 0} = 1 − M[x′, t].

We are now able to give a formal quantitative description of the knowledge of the environment for the DM, in terms of the above-introduced definitions. Let us define the "subjective entropy" of the map as the quantity

$$E_k(G, t) = \sum_{(i,j) \in G} \sum_{n=0,1} P_k\{S(i, j) = n\}\,\log_2 \frac{1}{P_k\{S(i, j) = n\}} \qquad (3)$$

where P_k{S(i, j) = 1} = M_k[(i, j), t] (here we have made index k explicit).
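As an illustration, the information gain (2) and the subjective map entropy (3) can both be computed directly from the subjective map, since they are sums of per-cell binary entropies. The sketch below assumes the map M of the previous sketches; the function names and the `neighborhood` callable (standing for the visibility set N(x)) are our own.

```python
import numpy as np

def cell_entropy(p):
    """Binary entropy (in bits) of a cell whose obstacle probability is p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def information_gain(M, x, neighborhood):
    """Eq. (2): B(x, t) = sum of the cell entropies over the cells visible from x."""
    return sum(cell_entropy(M[xp]) for xp in neighborhood(x))

def map_entropy(M):
    """Eq. (3): E(G, t) = sum of the cell entropies over the whole grid."""
    return sum(cell_entropy(p) for p in np.ravel(M))
```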

Thanks to this definition, Problem 1 can now be reformulated (under the same hypotheses) as follows:

Problem 1′ Find a feasible path P such that E_k(G) is minimized.

Before the beginning of the exploration procedure, an impulsive pdf is associated with the state of every vertex. If we consider a totally unknown environment, such a pdf is characterized by two impulses placed at 0 and 1, each with area 1/2 (M[(i, j), 0⁻] = 1/2, ∀(i, j) ∈ G).

Let us now formulate a first algorithm to solve Problem 1. Of course, one could consider an infinite-horizon procedure (see [10]) or at least a receding-horizon one. In this paper, we propose a simple "one-step-horizon" strategy. Even if suboptimal, this strategy has turned out to be effective in the solution of the exploration problem. At each exploration step, we try to maximize the information gain achievable by selecting one of the adjacent vertices.

Procedure 1 At time t = 0, 1, . . ., choose u⋆(t) ∈ Q such that

$$u^\star(t) = \arg\max_{u \in Q}\; B(x + u, t), \quad \text{s.t. } S(x + u) = 0. \qquad (4)$$

To reduce the previous constrained mathematical programming problem to an unconstrained one, we introduce the following penalty cost:

$$C(x, t) = \alpha\, M(x, t). \qquad (5)$$

We modify (4) as

$$u^\star(t) = \arg\min_{u \in Q}\; \big[ C(x + u, t) - B(x + u, t) \big] \qquad (6)$$

where the coefficient α must satisfy the relation α ≥ 2 B_max.

At each exploration step, when visiting (at time t + 1) a node x, eq. (2) expresses the information gain the agent achieves in the following sense. In our "deterministic sensor" framework, the subjective map M is updated as M[(i, j), t + 1] = S(i, j), ∀(i, j) ∈ N(x(t + 1)). Denote by t̄ the time (number of steps) needed by the DM to visit a node x(t + t̄) with B(x(t + t̄), t) > 0; then, obviously, E(G, t + t̄) − E(G, t) = −B(x(t + t̄), t) < 0.

In order to assure the termination of the procedure, we introduce the frontier concept [6]. If R_v = 1, a frontier vertex at time t can be defined as a "known" free vertex (x_f : M(x_f, t) = 0) adjacent to an unexplored one. However, if the sensor radius is extended (R_v > 1), a more general definition can be given. Making use of the previously introduced quantities, we define the set of frontier vertices through the following relation:

$$X_f(t) = \{\, x : C(x, t) - \sigma B(x, t) < 0 \,\}. \qquad (7)$$

To consider only vertices that are not occupied by obstacles, we must bound the parameter σ: 0 < σ < … ; Eq. (6) completes the proof.
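The penalized one-step choice (6) and the frontier set (7) can be sketched as follows, building on the `information_gain` helper above. The parameter names `alpha` and `sigma` correspond to eqs. (5) and (7); the `free` predicate, enforcing the constraint S(x + u) = 0 of Procedure 1, is an assumption of ours.

```python
def penalty(M, x, alpha):
    """Eq. (5): C(x, t) = alpha * M(x, t), large for likely obstacles."""
    return alpha * M[x]

def greedy_move(M, x, moves, neighborhood, free, alpha):
    """Eq. (6): among obstacle-free neighbours, pick the move minimizing C - B."""
    best_u, best_cost = None, float("inf")
    for u in moves:
        xn = (x[0] + u[0], x[1] + u[1])
        if not free(xn):          # constraint S(x + u) = 0 of Procedure 1
            continue
        cost = penalty(M, xn, alpha) - information_gain(M, xn, neighborhood)
        if cost < best_cost:
            best_u, best_cost = u, cost
    return best_u, best_cost

def frontier_vertices(M, cells, neighborhood, alpha, sigma):
    """Eq. (7): X_f(t) = {x : C(x, t) - sigma * B(x, t) < 0}."""
    return [x for x in cells
            if penalty(M, x, alpha) - sigma * information_gain(M, x, neighborhood) < 0]
```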

4 Extension to a multi-agent framework

In this section, Procedure 3 will be extended to the case where a set of N agents DM1 , . . . , DMN cooperate on the exploration task. By using a team of mobile robots (instead of a single one) to explore an unknown environment, a performance benefit can be gained. Moreover, using several “simple” agents can be easier, cheaper and more fault-tolerant than using a

single "powerful" agent. When dealing with a multi-robot control problem, two approaches are possible: centralized or decentralized control. We choose to focus on a decentralized exploration framework. Some of the reasons for such a choice are fault tolerance, natural parallelism, reliability, and scalability. Each agent DM_k, k = 1, . . . , N, has its own sensorial capability and keeps a "personal" representation of the environment M_k(·, t), as well as other "local" information (its own position, frontier vertex, etc.) constituting the information set I_k(t), which can (or cannot) be shared with other DMs on the basis of their communication capabilities, their distances, the positions of the obstacles, etc. DM_1, . . . , DM_N do not share a common information set but have a common goal, the exploration of the environment, so they can be viewed as cooperative members of a team (see [11]). In this framework we must take into consideration several aspects which did not arise in the single-agent case. The strategy described in Section 3 also applies, with minor changes, to a multi-agent approach, thanks to the locality and simplicity in the use of information. In this framework, we make the following hypotheses:

- the agents can communicate only within a distance R_c;
- when the exploration has been completed, each agent must return to its starting vertex;
- every agent DM_k makes its decisions considering its own information set and shares it with any DM_l in its communication range (i.e., if ||x_k − x_l|| < R_c).

To adapt the strategy proposed in the previous section to a multi-agent framework, it is slightly modified with a different choice of the frontier vertices. A penalty term is added to force a "distributed" behavior in the exploration. The frontier vertices are chosen so as to satisfy the following condition:

$$x_f^k = \arg\min_{x \in X_f(t)} \Big[ D_k(x) - \gamma F(x) - \sum_{i \neq k} \| x - x_f^i \| \Big], \quad k = 1, \ldots, N. \qquad (8)$$

Here a penalty cost has been introduced to increase the distances among the frontier vertices chosen by different agents. Moreover, the collision problem must be faced. If the grid is dense (for example, if every square has about the same dimensions as the agents), two or more agents choosing the same vertex might create a deadlock (even if a low-level collision-avoidance module were implemented). Then, at each step, Procedure 1 is modified as follows:

Procedure 1′

$$u_k^\star = \arg\min_{u \in Q} \big[ C(x_k + u, t) + U(x_k + u, t) - B(x_k + u, t) \big], \quad k = 1, \ldots, N \qquad (9)$$

where U[(i, j), t] equals infinity if (i, j) is occupied by a vehicle at time t, and 0 otherwise. If the cost defined in eq. (9) is zero, the following steps, which extend Procedure 2, apply:

Procedure 2′
1. Choose a frontier vertex using eq. (8);
2. Choose the following direction:
$$u_k^\star = \arg\min_{u \in Q} \big[ L_k(x_k + u, t) + U(x_k + u, t) \big]$$
where L_k(x, t) is the distance between x and the chosen frontier vertex.

To sum up, the following procedure is applied:

Procedure 3′ If possible, use Procedure 1′. Otherwise, if a frontier vertex exists, use Procedure 2′. If no frontier vertex is present, return to the starting vertex following the minimum path. Share, if possible, the personal information {x_k(t), M_k(t), x_f^k}.
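The multi-agent step can be sketched as follows (again with names of our own choosing, in Python). D_k and F in eq. (8) are passed as callables, since their precise definitions are not reproduced above; the collision term U of eq. (9) is modelled by excluding cells currently occupied by other vehicles; and, as a simplification, the straight-line distance is used in place of L_k. This is a sketch of the decision logic of Procedure 3′, not the authors' implementation.

```python
import math

def choose_frontier_vertex(frontier, D_k, F, gamma, other_targets):
    """Eq. (8), sketched: minimize D_k(x) - gamma*F(x) - sum_i ||x - x_f^i||."""
    if not frontier:
        return None
    return min(frontier,
               key=lambda x: D_k(x) - gamma * F(x)
                             - sum(math.dist(x, xf) for xf in other_targets))

def procedure_3prime_step(x, start, frontier_vertex, M, S, occupied,
                          moves, neighborhood, alpha):
    """One decision step of Procedure 3' for one agent; returns the next cell.
    x: current cell; start: starting vertex; frontier_vertex: target from eq. (8)
    or None; occupied: cells held by other vehicles; S: true occupancy."""
    def info_gain(c):
        # Eq. (2) restricted to the cells visible from c.
        g = 0.0
        for xp in neighborhood(c):
            p = M[xp]
            if 0.0 < p < 1.0:
                g -= p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p)
        return g

    def feasible(c):
        return (0 <= c[0] < M.shape[0] and 0 <= c[1] < M.shape[1]
                and S[c] == 0 and c not in occupied)

    neighbours = [(x[0] + a, x[1] + b) for (a, b) in moves]
    # Procedure 1': evaluate the cost of eq. (9) over the feasible neighbours.
    costs = [(alpha * M[c] - info_gain(c), c) for c in neighbours if feasible(c)]
    if costs:
        best_cost, best_cell = min(costs)
        if best_cost < 0.0:          # some information can still be gained locally
            return best_cell
    # Procedure 2': head toward the frontier vertex chosen with eq. (8);
    # if no frontier vertex exists, head back toward the starting vertex.
    target = frontier_vertex if frontier_vertex is not None else start
    reachable = [c for c in neighbours if feasible(c)]
    if not reachable:
        return x                     # blocked: stay put for this step
    return min(reachable, key=lambda c: math.dist(c, target))
```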

5 Experimental results

In this section, some examples are given to illustrate the effectiveness of the methodology described in Sections 3 and 4. Different kinds of maps have been considered, in order to show how the efficiency in solving the exploration task improves when the proposed entropy-based technique is applied. To this end, our algorithm has been compared with one that uses only the frontier definition [6] (i.e., in which Procedure 1′ is never applied and only Procedure 2′ is used). The map shown in Fig. 1 has been considered as representative of an artificial structured environment. Elements such as rooms, doors, and passages are present, as well as an unexplorable part (which must be identified in order to end the exploration task). As we have seen in Section 3, the entropy concept is sufficient to establish whether the exploration task has terminated, and thus whether all the explorable parts of the environment have been detected. In the simulations, we have considered a vision range R_v = 4 and a communication range R_c = 10. We have selected four starting

locations, one for each room. In Fig. 2, the mean number of steps required to complete the exploration task is given as a function of the number N of agents (the continuous and the dashed lines correspond to the use of Procedure 3′ and Procedure 2′, respectively). Note that, for small N, a significant improvement in the performance is obtained by increasing the number of exploring agents. The best performance is achieved with N = 6, when Procedure 3′ is used (i.e., when the procedure making use of entropy is exploited). When N is increased further, the exploration time grows again owing to "congestion", as too many agents are moving in this particular environment. To show how the exploration task evolves, the entropy of the map (1) has been computed over time. The minimum map entropy is reached by using either of the two procedures, but the introduction of the entropy term significantly increases the speed of information acquisition (see Fig. 3). To study the behavior of our technique in a "natural" unstructured environment, we have considered 30 scattered maps, each generated by placing a random number of obstacles (between 0 and 300) uniformly at random on the grid. As a performance index, we have considered the mean number of steps required to complete the exploration task for N = 1, . . . , 7 (see Fig. 4). In this situation, too, a performance improvement is obtained when Procedure 3′ is used. We have also considered the following limited-autonomy framework: the agents must return to their starting points whenever the minimum distance from the starting point equals the distance still reachable with the current fuel quantity. As shown in Fig. 5, a better performance has been achieved (i.e., a large part of the information is acquired in a shorter time). This fact is crucial, for example, whenever a target has to be found in an unknown environment.

Figure 2: Mean of the number of exploration steps for the map in Fig. 1.

Figure 3: Entropy evolution.

Figure 4: Mean of the exploration steps.

Figure 5: Number of exploration steps for different numbers of agents (with limited autonomy).

References

[1] M. R. Garey and D. S. Johnson, Computers and Intractability – A Guide to the Theory of NP-Completeness, Freeman, San Francisco, 1979.

[2] D. MacKay, "Information-based objective functions for active data selection", Neural Computation, vol. 4, no. 4, pp. 590–604, 1992.

[3] P. Whaite and F. P. Ferrie, "Autonomous exploration: Driven by uncertainty", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 3, pp. 193–205, 1997.

[4] R. Sim, "To boldly go: Bayesian exploration for mobile robots", Ph.D. proposal, June 2000.

[5] S. Moorehead, R. Simmons, and W. L. Whittaker, "A multiple information source planner for autonomous planetary exploration", in i-SAIRAS, June 2001.

[6] B. Yamauchi, "A frontier based approach for autonomous exploration", in IEEE International Symposium on Computational Intelligence in Robotics and Automation, Monterey, CA, 1997.

[7] W. Burgard, D. Fox, M. Moors, R. Simmons, and S. Thrun, "Collaborative multi-robot exploration", 2000.

[8] R. Simmons, D. Apfelbaum, M. Moors, S. Thrun, H. Younes, and D. Fox, "Coordination for multi-robot exploration and mapping", in AAAI/IAAI, 2000, pp. 852–858.

[9] I. Wagner, M. Lindenbaum, and A. Bruckstein, "MAC vs. PC – determinism and randomness as complementary approaches to robotic exploration of continuous unknown domains".

[10] D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, 1996.

[11] R. Radner, "Team decision problems", Ann. Math. Statist., vol. 33, no. 3, pp. 857–881, 1962.