Order Formation in Search Task based on Evolutionary Game Theory Mamoru Saito, Takeshi Hatanaka and Masayuki Fujita
Abstract— In this paper we consider a search problem in which a group of agents has to find a target that appears randomly and stays for a fixed time interval. Each agent has two options for its search area and chooses either of them. For this problem, we propose a probabilistic decision-making model of the search strategy based on evolutionary game theory and the Win-Stay, Lose-Shift rule. We then prove convergence of the expected value of the population share programmed to each strategy. Finally, simulation results show the validity of the proposed method.
I. INTRODUCTION

Search theory studies how to deploy an agent in order to find a target with finite resources, and it has been extended to cooperative search for multi-agent systems [1], [2], [3], [4]. Each search problem assumes how the target appears and moves, e.g., stationary [5] or stochastically generated [6]. In this paper, we address the case in which the target appears randomly and stays for a fixed time interval, as arises, e.g., in surveillance and ubiquitous service delivery. In [4], for such a target, we formulated an optimal search control problem maximizing the probability of finding the target while minimizing the energy consumption, and gave an approximate solution. Moreover, we showed a necessary and sufficient condition for the agent's state and control input to converge to periodic trajectories.

In swarm robotics [7], [8], game theory has been applied in order to achieve ordered configurations and behaviors for multi-robot systems [9], [10]. In particular, evolutionary game theory [11] receives a lot of attention in this research area: it investigates how a group decision evolves under interaction, given individual decision-making models. This theory is useful not only for analyzing collective behaviors in biology but also for designing decision-making models in robotic networks, and it is also applied in other research areas such as economics and cognitive science. It should be noted that in recent years decision making has also gained increasing interest in the context of mixed human/robot teams, where the main focus is on the exploration-exploitation dilemma of human decision making in two-alternative forced-choice tasks, and several human decision-making models have been introduced. References [12] and [13] analyze the evolution of human decisions, and [13] moreover presents maps from complex tasks that a human/robot team should achieve to simple decision-making models.

In this paper, we consider the situation where a group of agents searches two areas in order to find a target.

Mamoru Saito, Takeshi Hatanaka and Masayuki Fujita are with the Department of Mechanical and Control Engineering, Tokyo Institute of Technology, Tokyo 152-8552, JAPAN
[email protected]
Each agent selfishly chooses either area at a prescribed time step; this decision is made by the agent autonomously. We propose a decision-making model of the search strategy based on evolutionary game theory and the Win-Stay, Lose-Shift rule. Moreover, we investigate order formation from both a macroscopic and a microscopic perspective: at the macro level the population share reaches an ordered configuration, and at the micro level each robot converges to a periodic motion.

The organization of this paper is as follows. In Section III we describe a probabilistic decision-making model of the search strategy and discuss convergence of the population share. Next, following [4], a necessary and sufficient condition for the agent's behavior to converge to a periodic trajectory is given in Section IV. In Section V the above ordered structures are shown by numerical simulations.

II. PROBLEM SETTING

Let the search area E ⊂ R^n (n ∈ {1, 2, 3}) be a bounded set. In this paper we mainly consider the planar case, i.e., n = 2. We assume that the target to be found appears randomly and stays for gh [s] (g ∈ {1, 2, ...}, where h [s] is a sampling period). Let φ(z) be a density function satisfying ∫_E φ(z) dz = 1, which represents the probability that the target appears at z ∈ E.

Suppose that the agent is equipped with a sensor and makes an observation of the target at prescribed time steps t_k, k = 1, 2, ..., called observation times. For simplicity we assume that the observation time t_k is given by t_k = kh. Here, y(t) ∈ R^n is the agent's position, and the position at time t_k is denoted by y_k := y(t_k). Let the observation point set from time t_P to t_Q (t_0 < t_P ≤ t_Q) be denoted by Y_{P:Q} := {y_k}_{k=P,P+1,...,Q}. In a similar way, the observation point sequence from t_P to t_Q is written as Y_{P:Q} := (y_k)_{k=P,P+1,...,Q}. Furthermore, when the agent observes z ∈ E from the observation point y_k, the sensor accuracy generally decays with ||z − y_k||. We represent the sensor by a monotonically increasing differentiable function p : R_+ → [0, 1), where p(||z − y_k||) is the detection error probability.

Suppose that the target appears at time t_t (t_j ≤ t_t < t_{j+1}, j ∈ {0, 1, ...}); then the target exists in E at t_k, k = j+1, j+2, ..., j+g. The search level S(Y_{j+1:j+g}, E), which represents the probability of missing the target when observing from the points y_k ∈ Y_{j+1:j+g}, is defined by

\[
S(Y_{j+1:j+g}, E) := \int_E \phi(z) \prod_{y_k \in Y_{j+1:j+g}} p(\|z - y_k\|)\, dz.
\]
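Since the search level is used repeatedly below, it may help to see how it can be evaluated numerically. The following sketch approximates S by a Riemann sum on a grid, assuming a uniform density φ and an illustrative error model p(d) = 1 − exp(−d²/(2σ²)); these specific choices, as well as the function names, are assumptions made for illustration and are not prescribed by the paper.

import numpy as np

def detection_error(d, sigma=1.0):
    # Illustrative sensor model: monotonically increasing in distance, values in [0, 1).
    return 1.0 - np.exp(-d**2 / (2.0 * sigma**2))

def search_level(obs_points, xmax, ymax, grid=100):
    """Riemann-sum approximation of S(Y, E) for a uniform density phi on E = [0, xmax] x [0, ymax]."""
    xs = np.linspace(0.0, xmax, grid)
    ys = np.linspace(0.0, ymax, grid)
    X, Y = np.meshgrid(xs, ys)
    phi = 1.0 / (xmax * ymax)              # uniform density, integrates to 1 over E
    miss = np.ones_like(X)
    for yk in obs_points:                  # product of error probabilities over the observation points
        d = np.hypot(X - yk[0], Y - yk[1])
        miss *= detection_error(d)
    cell = (xs[1] - xs[0]) * (ys[1] - ys[0])
    return float(np.sum(phi * miss) * cell)

# Example: the four observation points of search strategy 1 in Section V.
print(search_level([(1.25, 1), (0.625, 2), (1.25, 3), (1.875, 2)], xmax=5.0, ymax=4.0))

The returned value lies in (0, 1) and decreases as observation points cover more of E, which is consistent with its interpretation as a missed-detection probability.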
However, the agent knows nothing about the time when the target appears, i.e., about the index j above. In [4], we showed that a cyclic path satisfying Y*_{1:g} = argmin S(Y_{1:g}, E) and y_{k+g} = y_k, k = 1, 2, ..., keeps the probability of missing the target at the constant minimum S(Y*_{1:g}, E) whenever the target appears in E.
Fig. 1. Directed graph for search.
For simplicity, two subareas E1 and E2 (E = E1 ∪ E2) and a directed graph for the observation point transition are given as shown in Fig. 1. Each node of the directed graph represents an observation point. The directed graph is composed of two subgraphs that share the nodes. Each subgraph consists of a cyclic path to search Ei with a g-step period and of switching paths, which bring the agent's position onto the cyclic path. The search behavior is realized by simply choosing a subgraph at a prescribed time step Δh [s]. Let the strategy chosen by the agent at time t be denoted by v(t), where v(t) = 1 if E1 is chosen and v(t) = 2 if E2 is chosen. Furthermore, the target detection probability of strategy i is denoted by Pi.

Throughout this paper we assume that the agent motion is represented by a state space model. Consider the second order mass-spring-damper system

\[
\dot{x}(t) = Ax(t) + Bu(t), \qquad x(t) = \begin{bmatrix} y(t) \\ \dot{y}(t) \end{bmatrix}, \tag{1}
\]

where x(t) ∈ R^{2n} is the state, ẏ(t) ∈ R^n is the velocity, u(t) ∈ R^m is the control input, and the pair (A, B) is controllable. Here, the state and the velocity at time tk are represented by xk := x(tk) and ẏk := ẏ(tk), respectively.
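For concreteness, the planar model used in the simulations of Section V is an instance of (1) with unit damping and no spring term. The short sketch below, given as an illustration rather than as part of the original development, builds those matrices and checks controllability of (A, B) numerically.

import numpy as np

# State x = [y1, y2, y1_dot, y2_dot]: planar position and velocity (n = 2).
# Damped double integrator used in the simulation section (unit damping, no spring term).
A = np.array([[0., 0., 1., 0.],
              [0., 0., 0., 1.],
              [0., 0., -1., 0.],
              [0., 0., 0., -1.]])
B = np.array([[0., 0.],
              [0., 0.],
              [1., 0.],
              [0., 1.]])

# Controllability matrix [B, AB, A^2 B, A^3 B]; full rank (= 4) means (A, B) is controllable.
ctrb = np.hstack([np.linalg.matrix_power(A, i) @ B for i in range(A.shape[0])])
print(np.linalg.matrix_rank(ctrb))  # -> 4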
TABLE I
PAYOFF TABLE

  A \ B     | v(t) = 1  | v(t) = 2
  v(t) = 1  | ψ11, ψ11  | ψ12, ψ21
  v(t) = 2  | ψ21, ψ12  | ψ22, ψ22

(Each cell lists the payoff to agent A followed by the payoff to agent B.)
III. DECISION-MAKING MODEL OF SEARCH STRATEGY

A. Evolutionary Game and Evolutionarily Stable Strategy

We first explain evolutionary games briefly (see, e.g., [11]). Evolutionary game theory models the evolutionary process that arises from interactions between agents. Suppose that agents are repeatedly drawn at random to play a symmetric 2 × 2 game. Given a payoff table as in Table I, the payoff matrix is defined by

\[
\Psi := \begin{bmatrix} \Psi_1 \\ \Psi_2 \end{bmatrix}
= \begin{bmatrix} \psi_{11} & \psi_{12} \\ \psi_{21} & \psi_{22} \end{bmatrix}.
\]

Moreover, let the population state be denoted by ξ := [ξ1, ξ2]^T, where each component ξi, i = 1, 2, represents the population share of agents programmed to pure strategy i, and ξ ∈ Ω0 := {ξ | 0 ≤ ξ1 ≤ 1, 0 ≤ ξ2 ≤ 1, ξ1 + ξ2 = 1}. Then the replicator dynamics describing how each population share changes is given by

\[
\dot{\xi}_i = r_i(\xi, \Psi)\,\xi_i, \quad i = 1, 2, \tag{2}
\]
\[
r_i(\xi, \Psi) := \Psi_i \xi - \xi^T \Psi \xi. \tag{3}
\]

Ψiξ and ξ^TΨξ represent the expected payoff to strategy i and the population average payoff, in other words the payoff to an agent drawn at random from the population, respectively. Hence the subpopulations associated with better-than-average strategies grow, while those associated with worse-than-average strategies decline. We can analyze the stable equilibrium points of (2), the so-called evolutionarily stable strategies. In this paper, the payoff matrix Ψ describes a search scenario; examples are shown in Section III-C.

B. Probabilistic Decision-making Model

Here we present a probabilistic decision-making model of the search strategy based on the replicator dynamics (2) and the Win-Stay, Lose-Shift rule. Let the state of strategy i at time t be defined by

\[
\eta(i, t) :=
\begin{cases}
\mathrm{WIN} & \text{if } r_i(\xi(t), \Psi) \geq 0, \\
\mathrm{LOSE} & \text{if } r_i(\xi(t), \Psi) < 0,
\end{cases}
\]

where η(i, t) = WIN and η(i, t) = LOSE respectively mean that strategy i is better-than-average and worse-than-average at time t. We propose the following decision-making model.

[Decision-making Model of Search Strategy] The search strategy of each agent programmed to strategy i at time t − Δh is chosen at time t with the following probabilities:

\[
\begin{aligned}
\Pr\bigl[v(t) = i \mid \eta(i, t) = \mathrm{WIN}\bigr] &= 1, \\
\Pr\bigl[v(t) = \mathrm{switch}(i) \mid \eta(i, t) = \mathrm{WIN}\bigr] &= 0, \\
\Pr\bigl[v(t) = i \mid \eta(i, t) = \mathrm{LOSE}\bigr] &= 1 + \Delta h\, r_i\bigl(\xi(t), \Psi\bigr), \\
\Pr\bigl[v(t) = \mathrm{switch}(i) \mid \eta(i, t) = \mathrm{LOSE}\bigr] &= -\Delta h\, r_i\bigl(\xi(t), \Psi\bigr),
\end{aligned} \tag{4}
\]

where switch(i) = 2 if i = 1 and switch(i) = 1 if i = 2. Each agent keeps the current strategy if η(i, t) = WIN, and switches with probability −Δh ri(ξ(t), Ψ) if η(i, t) = LOSE. Here, Δh is assumed to satisfy −Δh ri(ξ(t), Ψ) < 1.
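To make the update rule (4) concrete, the following minimal sketch implements one decision step for a single agent. The relative fitness follows (3); the helper names and the 0/1 strategy indexing (instead of the paper's 1/2) are illustrative choices, not part of the original formulation.

import numpy as np

def relative_fitness(i, xi, Psi):
    """r_i(xi, Psi) = Psi_i xi - xi^T Psi xi, cf. (3)."""
    return Psi[i] @ xi - xi @ Psi @ xi

def decide(i, xi, Psi, dh, rng):
    """One Win-Stay, Lose-Shift decision (4) for an agent currently using strategy i (0 or 1)."""
    r = relative_fitness(i, xi, Psi)
    if r >= 0:                  # WIN: keep the current strategy
        return i
    p_switch = -dh * r          # LOSE: switch with probability -dh * r_i (assumed < 1)
    return 1 - i if rng.random() < p_switch else i

For example, with rng = np.random.default_rng(0), Psi = np.array([[0, 0.25], [0.1, 0]]) (the Example 1 payoffs below with P1 = 0.5, P2 = 0.2) and xi = np.array([0.5, 0.5]), repeated calls to decide for every agent reproduce the stochastic strategy switching used in the simulations.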
Under the above model, the expected value of the population state is updated by

\[
\begin{bmatrix} \xi_1(t+\Delta h) \\ \xi_2(t+\Delta h) \end{bmatrix} =
\begin{cases}
\begin{bmatrix} \xi_1(t) - \Delta h\, r_2\bigl(\xi(t), \Psi\bigr)\, \xi_2(t) \\[0.5ex] \xi_2(t) + \Delta h\, r_2\bigl(\xi(t), \Psi\bigr)\, \xi_2(t) \end{bmatrix}
& \text{if } \eta(1,t) = \mathrm{WIN},\ \eta(2,t) = \mathrm{LOSE}, \\[3ex]
\begin{bmatrix} \xi_1(t) + \Delta h\, r_1\bigl(\xi(t), \Psi\bigr)\, \xi_1(t) \\[0.5ex] \xi_2(t) - \Delta h\, r_1\bigl(\xi(t), \Psi\bigr)\, \xi_1(t) \end{bmatrix}
& \text{if } \eta(1,t) = \mathrm{LOSE},\ \eta(2,t) = \mathrm{WIN}.
\end{cases} \tag{5}
\]

Lemma 1: ξ(t) = [1, 0]^T and [0, 1]^T are equilibrium points of (5).
Proof: Immediate from (3).

Hence the population state does not change if ξ(0) ∈ {[1, 0]^T, [0, 1]^T}. We therefore assume ξ(0) ∈ Ω := {ξ | 0 < ξ1 < 1, 0 < ξ2 < 1, ξ1 + ξ2 = 1}.

Lemma 2: For the system (5), the expected population state ξ(t) satisfies ξ(0) ∈ Ω ⇒ ξ(t) ∈ Ω, t = 0, Δh, 2Δh, ....
Proof: Suppose ξ(t) ∈ Ω. In the case of η(1, t) = WIN and η(2, t) = LOSE, we have 0 < ξ1(t) − Δh r2(ξ(t), Ψ) ξ2(t) < ξ1(t) + ξ2(t) = 1 and 0 = ξ2(t) − ξ2(t) < ξ2(t) + Δh r2(ξ(t), Ψ) ξ2(t) < 1, since 0 < −Δh r2(ξ(t), Ψ) < 1 and ξ(t) ∈ Ω. In the case of η(1, t) = LOSE and η(2, t) = WIN, we have 0 = ξ1(t) − ξ1(t) < ξ1(t) + Δh r1(ξ(t), Ψ) ξ1(t) < 1 and 0 < ξ2(t) − Δh r1(ξ(t), Ψ) ξ1(t) < ξ2(t) + ξ1(t) = 1, since 0 < −Δh r1(ξ(t), Ψ) < 1 and ξ(t) ∈ Ω. Therefore 0 < ξ1(t+Δh) < 1 and 0 < ξ2(t+Δh) < 1 under (5), and ξ1(t+Δh) + ξ2(t+Δh) = 1 yields ξ(t+Δh) ∈ Ω. This completes the proof.

Furthermore, r1(ξ(t), Ψ) and r2(ξ(t), Ψ) satisfy r2(ξ(t), Ψ) ξ2(t) = −r1(ξ(t), Ψ) ξ1(t) for ξ(t) ∈ Ω, so (5) can simply be rewritten as

\[
\xi_i(t + \Delta h) = \bigl(1 + \Delta h\, r_i\bigl(\xi(t), \Psi\bigr)\bigr)\, \xi_i(t), \quad i = 1, 2. \tag{6}
\]
This corresponds to a discrete replicator dynamics obtained by a first order approximation.

C. Search Scenarios and Convergence of Population State

We show two examples of the payoff matrix Ψ and the corresponding asymptotic properties of the expected population state.

Example 1 (competition): We consider the case where the agents compete with each other. Let the payoff to strategy i be Pi(1 − ξi), so that the payoff becomes larger with a higher target detection probability and a smaller population share of strategy i. Then the payoff matrix Ψ, r1(ξ(t), Ψ) and r2(ξ(t), Ψ) are given by

\[
\Psi = \begin{bmatrix} 0 & \dfrac{P_1}{2} \\[1ex] \dfrac{P_2}{2} & 0 \end{bmatrix}, \tag{7}
\]
\[
r_1\bigl(\xi(t), \Psi\bigr) = \frac{P_1}{2}\xi_2(t) - \frac{P_1+P_2}{2}\xi_1(t)\xi_2(t)
= -\frac{P_1+P_2}{2}\Bigl(\xi_1(t) - \frac{P_1}{P_1+P_2}\Bigr)\xi_2(t), \tag{8}
\]
\[
r_2\bigl(\xi(t), \Psi\bigr) = \frac{P_2}{2}\xi_1(t) - \frac{P_1+P_2}{2}\xi_1(t)\xi_2(t)
= -\frac{P_1+P_2}{2}\Bigl(\xi_2(t) - \frac{P_2}{P_1+P_2}\Bigr)\xi_1(t). \tag{9}
\]

Here, for ξ(t) ∈ Ω,

\[
-\frac{P_2^2}{8(P_1+P_2)} \leq r_1\bigl(\xi(t), \Psi\bigr) < \frac{P_1}{2}, \qquad
-\frac{P_1^2}{8(P_1+P_2)} \leq r_2\bigl(\xi(t), \Psi\bigr) < \frac{P_2}{2}.
\]

In order to satisfy −Δh ri(ξ(t), Ψ) < 1, we assume

\[
\Delta h < \min\Bigl\{\frac{8(P_1+P_2)}{P_1^2},\ \frac{8(P_1+P_2)}{P_2^2}\Bigr\}. \tag{10}
\]
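As a quick consistency check, substituting ξ^e := [P1/(P1+P2), P2/(P1+P2)]^T into (8) and (9) makes both relative fitnesses vanish, so ξ^e is the only fixed point of (6) in the interior set Ω; Theorem 1 below shows that it is in fact the limit of the expected population share:

\[
r_1(\xi^e, \Psi) = -\frac{P_1+P_2}{2}\Bigl(\frac{P_1}{P_1+P_2} - \frac{P_1}{P_1+P_2}\Bigr)\frac{P_2}{P_1+P_2} = 0, \qquad
r_2(\xi^e, \Psi) = -\frac{P_1+P_2}{2}\Bigl(\frac{P_2}{P_1+P_2} - \frac{P_2}{P_1+P_2}\Bigr)\frac{P_1}{P_1+P_2} = 0.
\]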
We give the following theorem concerning the convergence of the population state.

Theorem 1: Given the payoff matrix Ψ by (7), each expected population share converges to

\[
\xi_1(t) = \xi_1^e := \frac{P_1}{P_1+P_2}, \qquad \xi_2(t) = \xi_2^e := \frac{P_2}{P_1+P_2}
\]

as t → ∞ if ξ(0) ∈ Ω and

\[
\Delta h < \frac{16}{P_1+P_2}. \tag{11}
\]
Proof: Consider the Lyapunov function candidate V(ξ(t)) := (ξ1(t) − ξ1e)² + (ξ2(t) − ξ2e)², for which V([ξ1e, ξ2e]^T) = 0 and V(ξ(t)) > 0, ξ(t) ∈ Ω \ {[ξ1e, ξ2e]^T}, are satisfied. Let us now define U11(t), U12(t) and U2(t) by

\[
U_{11}(t) := \xi_1(t) - \xi_1^e, \qquad U_{12}(t) := \xi_2(t) - \xi_2^e, \qquad
U_2(t) := -\frac{\Delta h (P_1+P_2)}{2}\, \xi_1(t)\xi_2(t),
\]

respectively. From (6), (8) and (9),

\[
\begin{aligned}
V\bigl(\xi(t+\Delta h)\bigr) - V\bigl(\xi(t)\bigr)
&= \bigl(\xi_1(t+\Delta h) - \xi_1^e\bigr)^2 + \bigl(\xi_2(t+\Delta h) - \xi_2^e\bigr)^2
 - \bigl(\xi_1(t) - \xi_1^e\bigr)^2 - \bigl(\xi_2(t) - \xi_2^e\bigr)^2 \\
&= \bigl(U_{11}(t) + U_{11}(t)U_2(t)\bigr)^2 + \bigl(U_{12}(t) + U_{12}(t)U_2(t)\bigr)^2
 - U_{11}^2(t) - U_{12}^2(t) \\
&= \bigl(U_{11}^2(t) + U_{12}^2(t)\bigr)\, U_2(t)\, \bigl(2 + U_2(t)\bigr).
\end{aligned}
\]

Since U11²(t) + U12²(t) > 0, U2(t) < 0 and 0 < ξ1(t)ξ2(t) ≤ 1/4 if ξ(t) ∈ Ω \ {[ξ1e, ξ2e]^T}, a necessary and sufficient condition for V(ξ(t+Δh)) − V(ξ(t)) < 0 is 2 + U2(t) > 0. Since

\[
2 + U_2(t) \geq 2 - \frac{\Delta h (P_1+P_2)}{2}\cdot\frac{1}{4},
\]

this holds whenever Δh < 16/(P1+P2), which is exactly condition (11). Therefore, Lyapunov stability theory implies lim_{t→∞} ξ1(t) = ξ1e and lim_{t→∞} ξ2(t) = ξ2e if ξ(0) ∈ Ω and Δh < 16/(P1+P2). This completes the proof.

This theorem means that each population share converges to ξ1(t) = P1/(P1+P2) and ξ2(t) = P2/(P1+P2) if the decision is made at an interval satisfying (10) and (11).

Example 2 (penalty avoidance): Next, we consider the case where a penalty is imposed on the agents in the search task. Let the payoff to strategy i be −Pi(1 − ξi), so that the regret becomes larger with a smaller population share in spite of a higher target detection probability. Then the payoff matrix Ψ, r1(ξ(t), Ψ) and r2(ξ(t), Ψ) are given by

\[
\Psi = \begin{bmatrix} 0 & -\dfrac{P_1}{2} \\[1ex] -\dfrac{P_2}{2} & 0 \end{bmatrix}, \tag{12}
\]
\[
r_1\bigl(\xi(t), \Psi\bigr) = -\frac{P_1}{2}\xi_2(t) + \frac{P_1+P_2}{2}\xi_1(t)\xi_2(t)
= \frac{P_1+P_2}{2}\Bigl(\xi_1(t) - \frac{P_1}{P_1+P_2}\Bigr)\xi_2(t), \tag{13}
\]
\[
r_2\bigl(\xi(t), \Psi\bigr) = -\frac{P_2}{2}\xi_1(t) + \frac{P_1+P_2}{2}\xi_1(t)\xi_2(t)
= \frac{P_1+P_2}{2}\Bigl(\xi_2(t) - \frac{P_2}{P_1+P_2}\Bigr)\xi_1(t). \tag{14}
\]

Here, for ξ(t) ∈ Ω,

\[
-\frac{P_1}{2} < r_1\bigl(\xi(t), \Psi\bigr) \leq \frac{P_2^2}{8(P_1+P_2)}, \qquad
-\frac{P_2}{2} < r_2\bigl(\xi(t), \Psi\bigr) \leq \frac{P_1^2}{8(P_1+P_2)}.
\]

In order to satisfy −Δh ri(ξ(t), Ψ) < 1, we assume

\[
\Delta h \leq \min\Bigl\{\frac{2}{P_1},\ \frac{2}{P_2}\Bigr\}.
\]

Theorem 2: Given the payoff matrix Ψ by (12), each expected population share converges to

\[
\begin{cases}
\xi_1(t) = 1,\ \xi_2(t) = 0 & \text{if } \dfrac{P_1}{P_1+P_2} < \xi_1(0) < 1,\ 0 < \xi_2(0) < \dfrac{P_2}{P_1+P_2}, \\[1.5ex]
\xi_1(t) = 0,\ \xi_2(t) = 1 & \text{if } 0 < \xi_1(0) < \dfrac{P_1}{P_1+P_2},\ \dfrac{P_2}{P_1+P_2} < \xi_2(0) < 1,
\end{cases}
\]

as t → ∞ if ξ(0) ∈ Ω \ {[P1/(P1+P2), P2/(P1+P2)]^T}.

Proof: From (6), (13) and (14), we have

\[
\begin{cases}
\xi_1(t+\Delta h) > \xi_1(t),\ \xi_2(t+\Delta h) < \xi_2(t)
& \text{if } \dfrac{P_1}{P_1+P_2} < \xi_1(t) < 1,\ 0 < \xi_2(t) < \dfrac{P_2}{P_1+P_2}, \\[1.5ex]
\xi_1(t+\Delta h) < \xi_1(t),\ \xi_2(t+\Delta h) > \xi_2(t)
& \text{if } 0 < \xi_1(t) < \dfrac{P_1}{P_1+P_2},\ \dfrac{P_2}{P_1+P_2} < \xi_2(t) < 1, \\[1.5ex]
\xi_1(t+\Delta h) = \xi_1(t),\ \xi_2(t+\Delta h) = \xi_2(t)
& \text{if } \xi_1(t) = \dfrac{P_1}{P_1+P_2},\ \xi_2(t) = \dfrac{P_2}{P_1+P_2}.
\end{cases} \tag{15}
\]

Since, by (15), the expected population shares evolve monotonically in each region and the only fixed points of (6) are [1, 0]^T, [0, 1]^T and [P1/(P1+P2), P2/(P1+P2)]^T, the shares converge to the limits stated above; thus (15) completes the proof.
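The discrete update (6) is easy to simulate directly. The sketch below iterates (6) for both scenarios, using the payoff matrices (7) and (12); with P1 = 0.5, P2 = 0.2 and Δh = 0.02 as in Section V it reproduces the limits predicted by Theorems 1 and 2. The function names and the step count are illustrative choices only.

import numpy as np

def r(xi, Psi):
    """Relative fitness vector [r_1, r_2] from (3)."""
    return Psi @ xi - xi @ Psi @ xi

def simulate(Psi, xi0, dh, steps=5000):
    """Iterate the expected population-share update (6)."""
    xi = np.array(xi0, dtype=float)
    for _ in range(steps):
        xi = (1.0 + dh * r(xi, Psi)) * xi
    return xi

P1, P2 = 0.5, 0.2
comp = np.array([[0.0,  P1 / 2], [ P2 / 2, 0.0]])   # Example 1, eq. (7)
pen  = np.array([[0.0, -P1 / 2], [-P2 / 2, 0.0]])   # Example 2, eq. (12)

print(simulate(comp, [0.5, 0.5], dh=0.02))  # -> approx [P1/(P1+P2), P2/(P1+P2)] = [0.714, 0.286]
print(simulate(pen,  [0.6, 0.4], dh=0.02))  # -> approx [0, 1], since xi1(0) < P1/(P1+P2)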
IV. AGENT'S BEHAVIOR IN CONTINUOUS SPACE

A. Control Law for Observation Point Transition

In this subsection, we show that, given the current state xk and the observation point sequence Y_{k+1:k+f} (f ∈ {1, 2, ...}) from the directed graph, the optimal control input u(t) ∈ PC^m, t ∈ [tk, tk+f], where PC denotes the set of all piecewise continuous functions, minimizing the cost

\[
J_{k:k+f}\bigl(u(t)\bigr) := \sum_{i=k}^{k+f-1} J_u\bigl(u(t), t_i\bigr), \tag{16}
\]
\[
J_u\bigl(u(t), t_i\bigr) := \int_{t_i}^{t_{i+1}} u^T(t) R\, u(t)\, dt, \quad R > 0, \tag{17}
\]

is given in an explicit form [4]. For fixed xi and xi+1, the optimal input u*(t) minimizing (17), the corresponding state trajectory x*(t), t ∈ [ti, ti+1], and the minimum value of (17), Ju*(u*(t), ti), are given by

\[
\begin{bmatrix} u^*(t) \\ x^*(t) \end{bmatrix} = Z(t) \begin{bmatrix} x_i \\ x_{i+1} \end{bmatrix}, \quad t \in [t_i, t_{i+1}], \tag{18}
\]
\[
J_u^*\bigl(u^*(t), t_i\bigr) = \begin{bmatrix} x_i \\ x_{i+1} \end{bmatrix}^T M \begin{bmatrix} x_i \\ x_{i+1} \end{bmatrix}, \tag{19}
\]

respectively (see, e.g., [14]). Here, Y and Ẏ denote the vertical concatenations [y_{k+1}^T, y_{k+2}^T, ..., y_{k+f}^T]^T and [ẏ_{k+1}^T, ẏ_{k+2}^T, ..., ẏ_{k+f}^T]^T, respectively. Then (16) can be rewritten as

\[
J_{k:k+f}\bigl(u(t)\bigr) =
\begin{bmatrix} y_k \\ Y \\ \dot{y}_k \\ \dot{Y} \end{bmatrix}^T
\begin{bmatrix} H_1 & H_2 \\ H_2^T & H_3 \end{bmatrix}
\begin{bmatrix} y_k \\ Y \\ \dot{y}_k \\ \dot{Y} \end{bmatrix}, \tag{20}
\]
where H1, H2, H3 are matrices of dimensions n(f+2) × n(f+2), n(f+2) × nf and nf × nf, respectively. Hence the optimal vector Ẏ* minimizing (20) is given by

\[
\dot{Y}^* = \begin{bmatrix} \dot{y}^*_{k+1} \\ \dot{y}^*_{k+2} \\ \vdots \\ \dot{y}^*_{k+f} \end{bmatrix}
= -H_3^{-1} H_2^T \begin{bmatrix} y_k \\ Y \\ \dot{y}_k \end{bmatrix}. \tag{21}
\]

Therefore, from (18) and (21), the optimal control input u*(t), t ∈ [tk, tk+f], is given in an explicit form including the initial state xk and the observation point sequence Y_{k+1:k+f} as parameters.
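Numerically, (21) amounts to a single linear solve. The fragment below sketches that step, assuming the blocks H2 and H3 have already been assembled offline from the cost decomposition (20) as in [4]; the function and variable names are illustrative.

import numpy as np

def optimal_waypoint_velocities(H2, H3, y_k, Y, ydot_k):
    """Compute Ydot* = -H3^{-1} H2^T [y_k; Y; ydot_k], cf. (21).

    H2: (n(f+2), nf) block and H3: (nf, nf) block of the cost matrix in (20);
    y_k, ydot_k: current position and velocity (length n); Y: stacked waypoints (length nf).
    """
    rhs = np.concatenate([y_k, Y, ydot_k])      # known part of the stacked vector in (20)
    return -np.linalg.solve(H3, H2.T @ rhs)     # solve H3 x = H2^T rhs rather than inverting H3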
A receding horizon policy is used for the control of the agent, that is, at time tk the optimal control u(t), t ∈ [tk, tk+f], is computed, and u(t) is applied only for t ∈ [tk, tk+1).

B. Convergence to Periodic Trajectory

In this subsection, we provide a necessary and sufficient condition for the agent to converge to a periodic trajectory with period T = gh [s] [4]. Let us now define the matrix G by

\[
G := -\begin{bmatrix} I_n \\ O_n \\ \vdots \\ O_n \end{bmatrix}^T
H_3^{-1} H_2^T
\begin{bmatrix} O_n \\ \vdots \\ O_n \\ I_n \end{bmatrix},
\]

where In and On are the identity matrix and the zero matrix of dimension n × n, respectively. Roughly speaking, this matrix represents the effect of ẏk on ẏk+1.

Theorem 3: Under the proposed method, the trajectories of the agent's state and control input converge respectively to periodic ones with period T = gh [s] as t → ∞ if and only if maxi |λi| < 1, where λi, i = 1, 2, ..., n, are the eigenvalues of G.

Proof (sketch, see [4]): Let the error vector e[k] be defined by e[k] := ẏk+g − ẏk; then e[k+1] = Ge[k]. Therefore, the discrete-time linear system e[k+1] = Ge[k] is asymptotically stable if and only if maxi |λi| < 1, in which case limk→∞ e[k] = limk→∞ (ẏk+g − ẏk) = 0. Additionally, limk→∞ (xk+g − xk) = 0 follows from yk+g = yk, k = 1, 2, .... Since u(t) and x(t) are given in an explicit form including the waypoints xk as parameters from (18), this completes the proof.
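Theorem 3 reduces to a spectral-radius test on the small n × n matrix G. A minimal sketch of that check is given below; as above, H2 and H3 are assumed to be available from the cost decomposition (20), and the index slicing simply picks out the ẏ_{k+1} rows and the ẏ_k columns of −H3^{-1} H2^T.

import numpy as np

def periodicity_matrix(H2, H3, n, f):
    """Form G = -[I_n 0 ... 0] H3^{-1} H2^T [0; ...; 0; I_n] and return it with its spectral radius."""
    K = -np.linalg.solve(H3, H2.T)   # K = -H3^{-1} H2^T, shape (n*f, n*(f+2))
    G = K[:n, -n:]                   # rows of ydot_{k+1}, columns of ydot_k
    rho = max(abs(np.linalg.eigvals(G)))
    return G, rho

# Theorem 3: the agent's trajectory converges to a T = g*h periodic one iff rho < 1.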
Fig. 2. Example 1 (competition). (a)-(i): positions of the agents at t = 1, 2, 3, 4, 6, 10, 15, 23 and 32 [s]; (j): population share of each strategy versus time.
V. SIMULATIONS
We consider the situation where a team of 200 agents searches E = [0, 5] × [0, 4]. For search strategy 1, the subarea E1 = [0, 2.5] × [0, 4], the four observation points [1.25, 1], [0.625, 2], [1.25, 3], [1.875, 2] (g = 4) and the target detection probability P1 = 0.5 are given. For search strategy 2, the subarea E2 = [2.5, 5] × [0, 4], the four observation points [3.75, 1], [3.125, 2], [3.75, 3], [4.375, 2] and the target detection probability P2 = 0.2 are given. The directed graph is given as shown in Fig. 1. The dynamics (1) of each agent is described by

\[
\dot{x}(t) =
\begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \end{bmatrix} x(t)
+ \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} u(t).
\]

The observation interval is h = 1 [s], the decision-making interval is Δh = 0.02 [s], the prediction step is f = 10, and the cost parameter is R = diag(1, 1). In this case, λi = −0.2534 (multiplicity 2), so each agent's behavior converges to a periodic trajectory with period T = 4 [s] by Theorem 3. The initial states of the agents are given randomly, and y1 is the observation point nearest to the initial position.

For Example 1, Figs. 2(a)-2(i) and Fig. 2(j) respectively show the agents' positions and the population share for
ξ1(0) = ξ2(0) = 0.5, where the blue square markers, the red × markers and the green nodes respectively denote the agents programmed to strategy 1, the agents programmed to strategy 2, and the observation points. As shown in Fig. 2(j), each population share converges to ξ1(t) = P1/(P1+P2) ≈ 0.7143 and ξ2(t) = P2/(P1+P2) ≈ 0.2857, as predicted by Theorem 1. In a similar way, Fig. 3 shows the simulation result of Example 2 for ξ1(0) = 0.6 and ξ2(0) = 0.4. As shown in Fig. 3(j), each population share converges to ξ1(t) = 0 and ξ2(t) = 1, since ξ1(0) < P1/(P1+P2) ≈ 0.7143 and ξ2(0) > P2/(P1+P2) ≈ 0.2857, in accordance with Theorem 2. This means that all agents eventually choose strategy 2 to avoid the penalty, even though strategy 1 has a higher target detection probability and a larger initial population share than strategy 2. We also see from both Fig. 2 and Fig. 3 that the position trajectory of each agent converges to a periodic one.
Fig. 3. Example 2 (penalty avoidance). (a)-(i): positions of the agents at t = 1, 2, 3, 4, 6, 10, 15, 23 and 32 [s]; (j): population share of each strategy versus time.
VI. CONCLUSIONS

In this paper, we have considered a search problem in which a swarm of agents searches two areas for a target that appears randomly and stays for a fixed time interval. First, we have presented a probabilistic decision-making model of the search strategy based on evolutionary game theory and the Win-Stay, Lose-Shift rule, and we have investigated convergence of the population share in two specific scenarios. Next, we have derived a necessary and sufficient condition for the agent's behavior to converge to a periodic trajectory. Finally, the two ordered formations have been shown through numerical simulations.

REFERENCES
[1] J.R. Riehl, G.E. Collins, and J.P. Hespanha. Cooperative Graph-Based Model Predictive Search. Proc. IEEE Conf. on Decision and Control, pp. 2998-3004, 2007.
[2] M.M. Polycarpou, Y. Yang, and K.M. Passino. A cooperative search framework for distributed agents. Proc. IEEE International Symposium on Intelligent Control, pp. 1-6, 2001.
[3] E. Wong, F. Bourgault, and T. Furukawa. Multi-vehicle Bayesian Search for Multiple Lost Targets. Proc. IEEE International Conf. on Robotics and Automation (ICRA), pp. 3169-3174, 2007.
[4] M. Saito, T. Hatanaka, and M. Fujita. Periodic Optimal Search Control for a Target Appearing Randomly. Proc. IEEE Conf. on Decision and Control, 2009.
[5] J.R. Riehl and J.P. Hespanha. Cooperative Graph Search Using Fractal Decomposition. Proc. IEEE American Control Conf., pp. 2557-2562, 2007.
[6] J.J. Enright, E. Frazzoli, K. Savla, and F. Bullo. On Multiple UAV Routing with Stochastic Targets: Performance Bounds and Algorithms. Proc. AIAA Conf. on Guidance, Navigation and Control, 2005.
[7] M. Dorigo, V. Trianni, E. Sahin, et al. Evolving Self-Organizing Behaviours for a Swarm-Bot. Autonomous Robots, 17, pp. 223-245, 2004.
[8] D. Yingying, H. Yan and J. Jing-ping. Self-Organizing Multi-Robot System Based on Personality Evolution. Proc. IEEE International Conf. on Systems, Man and Cybernetics, vol. 5, 4 pp., 2002.
[9] S.N. Givigi Jr and H.M. Schwartz. Evolutionary Swarm Intelligence Applied to Robotics. Proc. IEEE International Conf. on Mechatronics & Automation, pp. 1005-1010, 2005.
[10] S.N. Givigi Jr and H.M. Schwartz. Swarms of Robots based on Evolutionary Game Theory. Proc. 9th IASTED International Conf. on Control and Applications, pp. 1-7, 2007.
[11] J.W. Weibull. Evolutionary game theory. MIT Press, 1995.
[12] L. Vu and K.A. Morgansen. Modeling and analysis of dynamic decision making in sequential two-choice tasks. Proc. IEEE Conf. on Decision and Control, pp. 1121-1126, 2008.
[13] M. Cao, A. Stewart and N.E. Leonard. Integrating human and robot decision-making dynamics with feedback: Models and convergence analysis. Proc. IEEE Conf. on Decision and Control, pp. 1127-1132, 2008. [14] R.W. Brockett. Finite dimensional linear systems. New York, Wiley, 1970.