2010 International Conference on Emerging Security Technologies
A Game-Theoretical Model Applied to an Active Patrolling Camera

Nicola Basilico, Davide Rossignoli, Nicola Gatti, and Francesco Amigoni
Dipartimento di Elettronica e Informazione, Politecnico di Milano, Milano, Italy
Email: {amigoni, basilico, ngatti}@elet.polimi.it, [email protected]

Abstract— In patrolling, an agent perceives portions of an environment to detect the presence of an intruder. Usually, the agent cannot perceive the whole environment at once, but it can change the observed portion over time. Finding an optimal patrolling strategy that minimizes the possibility of intrusions constitutes one of the main scientific problems in this field. Game theoretical models have recently been employed to compute effective patrolling strategies that explicitly consider the presence of a rational intruder. In this paper, we present a general game theoretical framework for computing patrolling strategies for different kinds of patrollers. In particular, we study the case of an active patrolling camera.
I. INTRODUCTION

In a patrolling task, an agent is required to perceive portions of a known environment with the objective of detecting malicious intrusions. Usually, the agent cannot perceive the whole environment at once and exploits its actuators to observe different portions of it at different times. Hence, the agent executes a patrolling task by exploiting its sensors and its actuators. Sensors allow it to monitor a portion of the environment, checking for the presence of an intruder. Actuators allow the agent to change the current view by focusing the sensors on different portions of the environment. Finding effective patrolling strategies for moving sensors around the environment has received remarkable attention in the literature [1], [2]. Originally, the problem was investigated in non-adversarial scenarios [1], [3]. Emerging approaches focus on adversarial scenarios, where the interaction with a rational intruder agent is explicitly taken into account and the patrolling strategy is computed by exploiting game theoretical techniques [2], [4], [5]. These approaches have addressed patrolling settings involving fixed checkpoints [2] or mobile robots [4], [5]. We note that similar strategic problems have been addressed in the pursuit-evasion field (e.g., [6], [7]). However, some assumptions, including the fact that the evader’s goal is only to avoid capture and not to enter an area of interest, make pursuit-evasion strategies not directly applicable to patrolling scenarios.

In this paper, we present a theoretical framework that computes optimal patrolling strategies in general adversarial settings. An instance of the framework has already been applied to autonomous patrolling robots in [4]. Here, we show its generality by defining and experimentally evaluating another instance of the framework for patrolling with active cameras. The basic idea of the framework is to introduce state spaces for the patroller and the intruder agents. Broadly speaking, the patroller’s states are associated with the sensed portions of the environment and depend on its sensors, while its actuators define the transitions between states. Following a worst-case stance, the intruder is assumed to be in the position to observe the patroller and to exploit an exact knowledge of the patrolling strategy in deciding when and where to attack. The interaction between these two agents is represented by a game model, and the optimal patrolling strategy is computed by determining the leader-follower equilibrium of the game [8].

We present the application of the proposed framework to the situation in which the patrolling agent is an active camera. In this case, the patrolling strategy can be defined in terms of pan, tilt, and zoom (PTZ) commands. Works in the field of camera scheduling have focused on effective PTZ controls to detect events or objects of interest. A relevant example is [9], where a technique to deal with an exploration-exploitation trade-off in a surveillance scenario is proposed. Unobserved events (intrusions) are assumed to occur independently according to a Poisson probability distribution. In our approach we consider a more complex model of a rational intruder. The main original contributions of this paper are the definition of a general framework for strategic patrolling, which accounts for different kinds of patrollers, and its application to an active camera-based patrolling setting. To the best of our knowledge, the use of game theoretical models for patrolling with active cameras is a novel contribution.

The paper is organized as follows. Section II and Section III describe the general framework and its active camera-based instance, respectively. Section IV reports experiments. Section V concludes the paper.
II. A GAME THEORETICAL PATROLLING MODEL

We model a scenario characterized by an environment to be patrolled by two rational agents: a patroller and an intruder. The patroller controls a set of resources that can be dynamically allocated to monitor different portions of the environment, in the attempt to reduce the risk of intrusions. For instance, a resource can be a mobile patrolling robot (as in [4]) or, as we will show in Section III, an active camera. On its side, the intruder tries to perform an intrusion from outside the environment without being detected. Time is discrete, starting at turn 0, and at each turn the agents must undertake an action. The environment is represented with a bidimensional grid, as shown in Fig. 1, that is composed of free (white) and obstacle (black) cells. Free cells constitute a set V = {v1, v2, ..., vn}. A subset T ⊆ V defines targets, i.e., cells that have some value for the agents and where an intrusion can occur (in Fig. 1 they are marked with small black circles). Notice that usually, due to its limited sensors and actuators, the patroller is not able to monitor all targets at once. A mobile robot moving between two targets can also traverse (and monitor) non-target cells. Given a target vt ∈ T, ϑp(vt) and ϑi(vt) are vt’s values for the patroller and the intruder, respectively. An attempted intrusion in a target vt requires a number d(vt) of turns to be successfully completed; d(vt) is called the penetration time of vt.
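To make the environment model concrete, the grid data can be captured with a few plain structures; the following Python snippet is only an illustrative encoding with made-up cell indices, values, and penetration times (not taken from the paper's experiments).

```python
# Illustrative encoding of a grid environment; all numbers are hypothetical.
free_cells = list(range(1, 17))                 # V = {v1, ..., v16}: free cells of a 4x4 grid
targets = {                                     # T ⊆ V, with ϑp(v), ϑi(v), and d(v)
    1:  {"theta_p": 0.3, "theta_i": 0.3, "d": 2},
    9:  {"theta_p": 0.1, "theta_i": 0.1, "d": 2},
    16: {"theta_p": 0.2, "theta_i": 0.2, "d": 2},
}
```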
Fig. 1. An example of a grid environment.

A. Defining the model

Adopting an approach inspired by stochastic games [10], we model the situation by specifying the agents’ states and transition models. We denote with Sp the state space of the patroller and with Si the state space of the intruder. Let us start by defining Sp = (Sp, ap(·), τp(·, ·)). Sp is the set of states sp, each one representing a specific allocation of resources, e.g., the position of a robot or the PTZ configuration of a camera. Function ap : Sp → 2^Ap, where Ap is the set of possible actions for the patroller, returns the available actions in a given state, e.g., the possible movements of a robot starting from its current position or the possible next camera configurations, given its current one. Function τp : Sp × Ap → Sp specifies the possible transitions, mapping a state-action pair to another state. We assume that the transition function τp is deterministic, i.e., given the current state at turn k, denoted with s_p^k, and an available action in that state a ∈ ap(s_p^k), then τp(s_p^k, a) = s_p^{k+1}, with a unique arrival state s_p^{k+1} at turn k + 1. It is convenient to introduce a function ωp : Sp → 2^V that returns the set of cells monitored at a given state, assigning each state a subset of V. For example, if ωp(sp) = V, then, in sp, the allocation of resources is such that the patroller can monitor the whole environment. Note that, in general, ωp is not injective, because different allocations of resources (i.e., states) can yield the same set of observed cells. When ωp is injective, we can identify a state with the corresponding set of observed cells. In Fig. 2 we report an example of a portion of a patroller’s state space (composed of three states) where we assumed ωp injective and we represented each state sp with ωp(sp) (parallel lines mark the observed cells).

Fig. 2. A portion of a patroller’s state space.

The intruder’s state space Si = (Si, ai(·), τi(·, ·)) is similarly defined. The intruder agent can wait and observe the environment, staying hidden, for an undefined number of turns and then decide to attempt an intrusion. At each turn, it can choose to continue waiting or to attack a target. If it decides to attack a target vt at turn k, then it is forced to stay there for d(vt) turns, during which it will be exposed to the patroller’s detection capabilities. (It is also possible to consider the path followed by the intruder to reach vt; please refer to [11] for details.) The set of states is Si = {waiting} ∪ {(vt, k) | vt ∈ T, k ∈ {1, ..., d(vt)}}. State si = (vt, k) represents the situation in which the intruder has been in target vt for k turns. Available actions are defined in the following way: ai(waiting) = {wait, attack(vt)} and ai((vt, k)) = {wait} ∀ vt ∈ T, k ∈ {1, ..., d(vt) − 1}. Similarly to the patroller’s case, the transition function τi : Si × Ai → Si is deterministic:

    τi(waiting, wait) = waiting
    τi(waiting, attack(vt)) = (vt, 1)
    τi((vt, k), wait) = (vt, k + 1)    ∀ vt ∈ T, k ∈ {1, ..., d(vt) − 1}

In Fig. 3 an example of an intruder’s state space is reported for the environment of Fig. 1. The presence of the intruder in a target is denoted with a black triangle.

Fig. 3. A portion of an intruder’s state space.
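As a minimal illustration of this formalism (not the authors' implementation), the two state spaces can be encoded as follows; the 2×2 toy grid, the state names, and the single target are hypothetical.

```python
# Sketch of the two state spaces for a toy 2x2 grid with a single target v1;
# all names and the grid layout are illustrative, not taken from the paper.
d = {1: 2}                                   # penetration time d(v1) = 2

# Patroller: states are resource allocations; omega_p maps a state to its monitored cells.
S_p = ["s_left", "s_right"]                  # hypothetical camera/robot configurations
omega_p = {"s_left": {1, 3}, "s_right": {2, 4}}
a_p = {"s_left": ["stay", "go_right"], "s_right": ["stay", "go_left"]}
tau_p = {("s_left", "stay"): "s_left", ("s_left", "go_right"): "s_right",
         ("s_right", "stay"): "s_right", ("s_right", "go_left"): "s_left"}

# Intruder: the waiting state plus (target, turns-elapsed) states.
S_i = ["waiting"] + [(v, k) for v in d for k in range(1, d[v] + 1)]

def a_i(s):
    """Actions available to the intruder in state s."""
    if s == "waiting":
        return ["wait"] + [("attack", v) for v in d]
    v, k = s
    return ["wait"] if k < d[v] else []      # (v, d(v)) ends the game (successful intrusion)

def tau_i(s, a):
    """Deterministic intruder transition function."""
    if s == "waiting":
        return "waiting" if a == "wait" else (a[1], 1)
    v, k = s
    return (v, k + 1)
```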
Our objective is to compute the optimal strategy for the patroller agent, explicitly accounting for the adversarial nature of the scenario given by the presence of the intruder. In general, a strategy is a probability distribution over the patroller’s actions available in a state. In this paper, we consider only Markovian patrolling strategies, which can be defined as a matrix {αi,j} where αi,j is the probability for the patroller of going from state si ∈ Sp to state sj ∈ Sp (due to the determinism of the transition function we can consider the corresponding action as implicit).

B. Computing the optimal patrolling strategy

To compute the optimal strategy, we build a game model. We define a two-player partially observable stochastic game [10]. Each state of the game is defined as s^k = (s_p^k, s_i^k), where s_p^k ∈ Sp and s_i^k ∈ Si are the agents’ states at turn k. Therefore, the actions available to the players are ap(s_p^k) for the patroller and ai(s_i^k) for the intruder. There are three possible classes of terminal states (i.e., outcomes) in this stochastic game:
• intruder-capture: the intruder plays action attack(vt) at turn k and the patroller detects it in target vt in the time interval [k + 1, k + d(vt)];
• penetration-vt: the intruder successfully completes the intrusion in target vt;
• no-attack: for every turn k the intruder plays the wait action and never enters the environment; in this case the game does not end in a finite number of turns.

In Table I we report the utilities the players obtain in each outcome of the game (with a slight abuse of notation, ϑp(T) = Σ_{vt ∈ T} ϑp(vt) and ϑp(T−i) = ϑp(T \ {i})). For each outcome, the patroller gets an amount corresponding to the total value of the preserved targets. Hence, it obtains the same utility in the intruder-capture and no-attack outcomes (in both cases no target is violated). Instead, the intruder gets the value of a target when successfully completing an intrusion in it, or a penalty ε ∈ R+ in case it is detected. In case of intruder-capture or penetration-vt the game ends in at most k̄ + d(vt) turns, where k̄ is the turn at which the intruder attacked vt. This explains why we omit the arc from state (vt, d(vt)) to state waiting in Fig. 3. In case of no-attack the game is repeated for an infinite number of turns. Given this game, the optimal patrolling strategy is the strategy that the patrolling agent plays at the equilibrium.

TABLE I
PLAYERS’ UTILITIES FOR EACH GAME OUTCOME

    Game outcome        Patroller    Intruder
    intruder-capture    ϑp(T)        −ε
    penetration-vt      ϑp(T−i)      ϑi(vt)
    no-attack           ϑp(T)        0
The “strength” of the intruder influences the selection of the solution concept adopted for computing the game equilibrium. As already mentioned, the intruder can stay hidden before attacking and, by observing the patroller, can derive a correct belief over its patrolling strategy. Adopting a worst-case stance, we assume that the intruder exactly knows the patrolling strategy and considers this knowledge when computing its own strategy (this assumption is common in the literature [2]). (The fact that the intruder knows the probability distribution with which the patroller will select its actions does not amount to assuming that the intruder knows the actual action the patroller will play.) For this reason, the equilibrium of the game is determined by using the leader-follower solution concept [8]. In a leader-follower game, the leader commits to a strategy and the follower considers such commitment when computing its strategy, which, at the equilibrium, is a best response. In our setting, the patroller is the leader and the intruder is the follower.

We now show how a leader-follower equilibrium can be calculated for our setting. It is convenient to define the possible best responses for the intruder as stay-out, i.e., waiting forever from the first turn of the game, and enter-when(vt, sp), i.e., waiting until the patroller is in state sp and then attacking target vt. Our algorithm is composed of two steps. In the first one, we try to find a patrolling strategy such that stay-out is the best response of the intruder. If such a strategy exists, then it is the optimal one (recall Table I) and we can stop. If such a strategy cannot be found, we search for the optimal patrolling strategy such that the best response of the intruder will be enter-when(vt, sp) for some target vt and patroller’s state sp. The leader-follower equilibrium can be computed by resorting to mathematical programming (as in [2]). We introduce the variable γ^{w,t}_{i,j}, referring to the probability that the patroller goes in state sj in w steps, starting from state si and not detecting a possible intruder in target vt. For the sake of simplicity, we assume here that the detection process is error-free, i.e., if the intruder is in a cell vt ∈ ωp(sp) then the patroller detects it with certainty. We will relax this hypothesis later. In order to find a patrolling strategy inducing a stay-out best response, we solve the following feasibility problem:

    α_{i,j} ≥ 0                                                                ∀ s_i, s_j ∈ S_p                                                  (1)
    Σ_{s_j ∈ S_p} α_{i,j} = 1                                                  ∀ s_i ∈ S_p                                                       (2)
    α_{i,j} ≤ f(i, j)                                                          ∀ s_i, s_j ∈ S_p                                                  (3)
    γ^{1,t}_{i,j} = α_{i,j}                                                    ∀ v_t ∈ T, s_i, s_j ∈ S_p, v_t ∉ ω_p(s_j)                         (4)
    γ^{w,t}_{i,j} = Σ_{s_x ∈ S_p | v_t ∉ ω_p(s_x)} ( γ^{w−1,t}_{i,x} α_{x,j} )  ∀ v_t ∈ T, w ∈ [2, d(v_t)], s_i, s_j ∈ S_p, v_t ∉ ω_p(s_j)        (5)
    p(v_t, s_i) = Σ_{s_z ∈ S_p | v_t ∉ ω_p(s_z)} γ^{d(v_t),t}_{i,z}            ∀ v_t ∈ T, s_i ∈ S_p                                              (6)
    ϑ_i(v_t) p(v_t, s_i) − ε (1 − p(v_t, s_i)) ≤ 0                             ∀ v_t ∈ T, s_i ∈ S_p                                              (7)

Constraints (1)-(2) express that the probabilities αi,j are well defined. Function f(·, ·) is such that f(i, j) = 1 if the transition from si to sj is possible (i.e., there exists an action a ∈ ap(si) such that τp(si, a) = sj), and f(i, j) = 0 otherwise. Hence, constraints (3) limit the patroller’s state transitions to those allowed by its transition function; constraints (4)-(5) impose the Markov property on the patrolling strategy; constraints (6) define p(vt, si) as the probability that action enter-when(vt, si) results in a successful intrusion; constraints (7) express that no action enter-when(vt, si) gives the intruder an expected utility larger than that of stay-out. A solution of this problem is a set of probabilities {αi,j} such that the best response of the intruder is stay-out.

When the above feasibility problem does not admit any solution (i.e., there is no patroller’s strategy such that stay-out is a best response for the intruder), we find the best response of the intruder such that the patroller can maximize its expected utility. This problem is formulated as a multi-bilinear programming problem. We have a single bilinear problem for each enter-when(vt, si), which is assumed to be the intruder’s best response:
    max   ϑ_p(T_{−i}) p(v_t, s_i) + ϑ_p(T) (1 − p(v_t, s_i))
    s.t.  constraints (1)-(5)
          ϑ_i(v_t) p(v_t, s_i) − ε (1 − p(v_t, s_i)) ≥ ϑ_i(v_z) p(v_z, s_j) − ε (1 − p(v_z, s_j))    ∀ v_z ∈ T, s_j ∈ S_p    (8)
The objective function maximizes the patroller’s expected utility. Constraints (8) express that no action enter-when(vz, sj) gives a larger value to the intruder than action enter-when(vt, si). If a single bilinear problem is feasible, its solution is a set of probabilities {αi,j} that define a possible patrolling strategy. From the solutions of all feasible problems, we pick out the one that gives the patroller the maximum expected utility.

A particularly interesting situation arises when the game is strictly competitive [12], i.e., when both the patroller and the intruder have the same preference ordering over the targets, namely when, for any pair of targets vi, vj ∈ T, ϑp(vi) ≥ ϑp(vj) if and only if ϑi(vi) ≥ ϑi(vj). If the game is strictly competitive, the equilibrium patrolling strategy is the minimax strategy (i.e., the strategy that minimizes the maximum intruder’s expected utility) and we can compute it with a single mathematical programming problem. We call u the maximum expected utility of the intruder and solve the following problem:

    min   u
    s.t.  constraints (1)-(5)
          ϑ_i(v_z) p(v_z, s_j) − ε (1 − p(v_z, s_j)) ≤ u    ∀ v_z ∈ T, s_j ∈ S_p    (9)

Similarly to the original formulation, the solution of this problem is given by a set of probabilities {αi,j} representing the optimal patrolling strategy. Therefore, in a strictly competitive setting, the two steps (mathematical programming problems) of the algorithm reduce to solving a single optimization problem.
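To make the role of the γ recursion and of constraints (7)-(9) tangible, the following sketch evaluates a given Markovian strategy {αi,j}: it computes p(vt, si) as in constraints (4)-(6) and then the intruder's best response among stay-out and the enter-when actions. The three-state instance, the uniform strategy matrix, and the helper names are hypothetical; the paper solves for {αi,j} with a nonlinear solver (SNOPT), which is not reproduced here.

```python
import numpy as np

def success_prob(alpha, omega_p, v_t, d_t, s_i):
    """p(v_t, s_i): probability that an intrusion in v_t, started when the patroller
    is in state s_i, is NOT detected within d(v_t) turns (constraints (4)-(6),
    error-free detection)."""
    n = alpha.shape[0]
    safe = np.array([v_t not in omega_p[j] for j in range(n)], dtype=float)
    gamma = alpha[s_i, :] * safe            # (4): one step, only undetected arrivals count
    for _ in range(2, d_t + 1):             # (5): propagate for w = 2, ..., d(v_t)
        gamma = (gamma @ alpha) * safe
    return gamma.sum()                      # (6)

def intruder_best_response(alpha, omega_p, d_times, theta_i, eps=1.0):
    """Best response among stay-out (utility 0) and every enter-when(v_t, s_i),
    whose utility is ϑi(v_t) p − ε (1 − p), cf. constraints (7)-(9); ε = 1 as in Section IV."""
    best, best_u = "stay-out", 0.0
    for v_t, d_t in d_times.items():
        for s_i in range(alpha.shape[0]):
            p = success_prob(alpha, omega_p, v_t, d_t, s_i)
            u = theta_i[v_t] * p - eps * (1.0 - p)
            if u > best_u:
                best, best_u = ("enter-when", v_t, s_i), u
    return best, best_u

# Hypothetical 3-state instance with a uniform (made-up) Markovian strategy.
omega_p = {0: {1}, 1: {2}, 2: {3}}          # state index -> cells in its fov
alpha = np.full((3, 3), 1.0 / 3.0)
print(intruder_best_response(alpha, omega_p, d_times={1: 2, 3: 2}, theta_i={1: 0.5, 3: 0.5}))
```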
III. PATROLLING WITH ACTIVE CAMERAS

In previous works, we applied the game theoretical patrolling model introduced in the previous section to situations in which the patroller is a mobile robot [4], [13]. In this section, we describe another instance of the above general framework that considers an active camera as the patroller. The intruder’s behavior is that described in the previous section. The patroller’s behavior is defined by instantiating a suitable state space Sp, as we now show. We define an over-simplified model of an active camera as a sensor able to take pictures of portions of the environment and to analyze them in order to detect the presence of an intruder. Our camera is equipped with motion actuators of pan (rotation about the vertical axis), tilt (rotation about the horizontal axis), and zoom, which allow the camera to change its field of view (fov). When considering an active camera as the patroller, the goal is to control the three movements (pan, tilt, zoom, or PTZ) in an optimal way to detect intrusions. At each turn, the camera has a particular fov (i.e., it observes a particular subset of the environment’s cells). We assume that the function ωp is injective, so we can identify the states in Sp with the corresponding fovs. We assume that the camera has only two types of zoom: zoom-in (observing a single cell) or zoom-out (observing an L × L group of cells). For the sake of simplicity, we make the strong assumption that the cells have the same size, independently of their distance from the physical location of the camera. Hence, each state sp can be of two different types: zoom-out or zoom-in. In an L × L zoom-out state, a possible intruder in a target vt ∈ ωp(sp) is detected with probability u, namely 1 − u is the detection error for a zoom-out state. On the other hand, we assume that in a zoom-in state the camera is able to detect the intruder with certainty, namely with a null detection error (these assumptions are consistent with those in [9]). As described in the previous section, actuators define the transitions between states. In our case, each transition corresponds to particular PTZ commands. We can consider a complete state space, where actuators allow the camera to move between any two states, or more realistic models of actuators, for example a τp defined as follows:

• transitions between a zoom-out state s_out and a zoom-in state s_in can happen only if ωp(s_in) ⊆ ωp(s_out);
• transitions between two zoom-out states s_out and s′_out can happen only if ωp(s_out) ∩ ωp(s′_out) ≠ ∅.

These rules force the camera’s movements to maintain a certain overlap between past and current fovs.

Fig. 4. An example of transitions from a zoom-out state.

Fig. 5. An example of transitions from a zoom-in state.

In Fig. 4 we show an example of transitions from a zoom-out state, using the more realistic model. In this example, the environment is a 4 × 4 grid while the maximum fov of the camera is 2 × 2 (L = 2). As in the previous section, sensed cells are marked with parallel lines. Fig. 5 shows an example of transitions from a zoom-in state. As can be easily seen, under our more realistic transition model, in order to go from a zoom-in state to another one, at least two turns are needed, since a common zoom-out state must be traversed. Note that τp can be defined to capture other possible movement capabilities of the camera (e.g., τp can be defined to allow transitions between zoom-in states of adjacent cells).
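A possible construction of such a camera state space and of the allowed-transition function f(i, j) under the two rules above is sketched below; the grid size, the value of L, and the treatment of self-loops are assumptions made for illustration (obstacle cells are ignored for brevity).

```python
from itertools import product

def camera_states(rows, cols, L):
    """Enumerate zoom-out states (L x L fovs) and zoom-in states (single cells).
    Each state is encoded as ('out' | 'in', frozenset of monitored cell ids)."""
    cell = lambda r, c: r * cols + c + 1
    states = []
    for r, c in product(range(rows - L + 1), range(cols - L + 1)):   # zoom-out fovs
        fov = frozenset(cell(r + dr, c + dc) for dr in range(L) for dc in range(L))
        states.append(("out", fov))
    for r, c in product(range(rows), range(cols)):                   # zoom-in fovs
        states.append(("in", frozenset({cell(r, c)})))
    return states

def allowed(si, sj):
    """f(i, j): 1-turn transition permitted under the two rules of Section III."""
    (zi, fi), (zj, fj) = si, sj
    if zi == "out" and zj == "out":
        return len(fi & fj) > 0            # overlapping zoom-out fovs
    if {zi, zj} == {"out", "in"}:
        small, big = (fi, fj) if zi == "in" else (fj, fi)
        return small <= big                # zoom-in cell contained in the zoom-out fov
    return si == sj                        # assumption: self-loops allowed, no direct in-to-other-in moves

states = camera_states(4, 4, 2)            # a 4x4 grid with L = 2, as in Fig. 4 and Fig. 5
```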
In order to account for the detection error, we relax the assumption of Section II that, if the intruder is in a cell vt ∈ ωp(sp), then the patroller detects it with certainty. We introduce a function ρ(sp, t) denoting the probability that the camera in state sp detects the intruder in target vt:
    ρ(s_p, t) =  1    if s_p is a zoom-in state and v_t ∈ ω_p(s_p)
                 u    if s_p is a zoom-out state and v_t ∈ ω_p(s_p)
                 0    otherwise
With this function, constraints (4), (5), and (6) can be rewritten as constraints (10), (11), and (12), respectively:

    γ^{1,t}_{i,j} = α_{i,j} (1 − ρ(s_j, t))                                     ∀ v_t ∈ T, s_i, s_j ∈ S_p                     (10)
    γ^{w,t}_{i,j} = Σ_{s_x ∈ S_p} ( γ^{w−1,t}_{i,x} α_{x,j} (1 − ρ(s_j, t)) )    ∀ v_t ∈ T, w ∈ [2, d(v_t)], s_i, s_j ∈ S_p    (11)
    p(v_t, s_i) = Σ_{s_z ∈ S_p} γ^{d(v_t),t}_{i,z}                              ∀ v_t ∈ T, s_i ∈ S_p                          (12)
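Under the zoom-in/zoom-out detection assumptions above, ρ and the modified recursion can be sketched as a small variant of the earlier error-free computation; the ('in'/'out', fov) state encoding is the one assumed in the previous sketch, and alpha is a NumPy strategy matrix of matching size.

```python
import numpy as np

def rho(state, v_t, u):
    """ρ(s_p, t): probability that the camera in `state` detects an intruder in target v_t."""
    zoom, fov = state                          # ('in' | 'out', frozenset of cells), as assumed above
    if v_t not in fov:
        return 0.0
    return 1.0 if zoom == "in" else u          # certain detection when zoomed in, u when zoomed out

def success_prob_with_error(alpha, states, v_t, d_t, s_i, u):
    """p(v_t, s_i) under constraints (10)-(12): probability of surviving d(v_t) turns undetected."""
    miss = np.array([1.0 - rho(s, v_t, u) for s in states])   # 1 - ρ(s_j, t) for every state
    gamma = alpha[s_i, :] * miss                               # (10)
    for _ in range(2, d_t + 1):                                # (11), w = 2, ..., d(v_t)
        gamma = (gamma @ alpha) * miss
    return gamma.sum()                                         # (12)
```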
With the definition of Sp, ap(·), and τp(·, ·) and with this slight adaptation of the model, we can apply the same algorithms presented in Section II to compute the optimal patrolling strategy for the camera. The algorithms return a set {αi,j} of probabilities, which can be easily translated into the corresponding PTZ movements.
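At runtime, executing the strategy simply amounts to sampling the next configuration (and hence the PTZ command) from the row of {αi,j} associated with the camera's current state; a minimal sketch of this step, with the state list and indices being placeholders:

```python
import numpy as np

def next_ptz_state(alpha, states, current_idx, rng=np.random.default_rng()):
    """Sample the camera's next configuration according to the patrolling strategy {αi,j}."""
    next_idx = rng.choice(len(states), p=alpha[current_idx])
    return next_idx, states[next_idx]      # the caller maps the new state to PTZ commands
```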
IV. EXPERIMENTAL RESULTS

We implemented our framework in Matlab, exploiting SNOPT [14] for the resolution of the mathematical programming problems. We made tests on a Linux computer equipped with a 2.33 GHz CPU and 8 GB RAM. We applied our algorithm to compute the optimal patrolling strategy in different grid environments. We decided to define environments such that for every target vt, ϑp(vt) = ϑi(vt), so that the associated game is strictly competitive. In this way, we can compute the patrolling strategy employing both the general multiple mathematical programming formulation and the more specific strictly competitive one. We exploit this possibility to cross-check the consistency of the two formulations by comparing the obtained strategies and, at the same time, to get feedback about the potential improvement in computational time when using the strictly competitive formulation. Moreover, we assigned, without affecting the significance of our results, the same penetration time d to each target.

Let us consider the environments of Figs. 6 and 7, where the values for the parameters L, u, and d are reported. Target cells are filled with their corresponding values. In all cases, we considered ε = 1 (the larger ε, the more likely the intruder plays stay-out as best response). The environments of Fig. 6 are a small grid with a relatively large number of targets (Fig. 6(a)) and a larger one with a majority of non-target cells (Fig. 6(b)). We consider Sp as fully connected, namely the camera’s actuators allow it to move between any two PTZ configurations in one turn. The value L = 1 in the environment of Fig. 6(a) constrains the zoom modality to be always zoom-in. Despite this limitation, the obtained patrolling strategy is such that the intruder’s best response is stay-out. In this case, the strong capabilities of the actuators compensate for the small field of view of the camera, allowing it to obtain the maximum utility. In the environment of Fig. 6(b) the strong capabilities of the actuators still play a fundamental role, making again stay-out the intruder’s best response. In order to identify the states that are visited more frequently by the camera according to the optimal patrolling strategy found by our approach, we calculate the steady state of the Markov chain defined by the computed patrolling strategy. In the case of Fig. 6(b), zoom-out states are preferred because of the relatively large value of u and the good views they offer over the targets (indeed, with a 3 × 3 field of view, up to 4 targets can be viewed in a single state).

Fig. 6. Environments tested with a fully connected state space: (a) L = 1, d = 3; (b) L = 3, u = 0.4, d = 2.

The environments of Fig. 7 are similar to those of Fig. 6, but we evaluate them considering a state space with a more realistic transition function, defined according to the two rules of Section III.
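The steady state used in the analysis of Fig. 6(b) can be obtained as the left eigenvector of the strategy matrix associated with eigenvalue 1; this is a hypothetical helper (not the authors' Matlab code) and assumes the chain induced by {αi,j} has a unique stationary distribution.

```python
import numpy as np

def steady_state(alpha):
    """Stationary distribution of the Markov chain induced by the strategy {αi,j}."""
    w, v = np.linalg.eig(alpha.T)                      # right eigenvectors of alpha.T = left eigenvectors of alpha
    pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])     # eigenvector for the eigenvalue closest to 1
    pi = np.abs(pi)
    return pi / pi.sum()                               # normalize to a probability distribution
```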
Fig. 7. Environments tested with a partially connected state space: (a) L = 2, u = 0.4, d = 2; (b) L = 3, u = 0.5, d = 2.

It is expected that the optimal patrolling strategy performs worse in this state space than in the fully connected one. This is what happens in the environment of Fig. 7(a), where the optimal patrolling strategy induces the intruder’s best response enter-when(1, s9), i.e., to attack target 1 when the camera is observing cell 9 in zoom-in modality. It is easy to see that, when zoomed in on cell 9, the camera needs at least 2 turns (equal to the penetration time) to reach a zoom-out state from which target 1 can be viewed. Reasonably, this is the most advantageous situation for the intruder. In the environment of Fig. 7(b) we obtain again the maximum expected utility for the patroller with a stay-out best response. As can be seen, in this environment the large value of L permits maintaining a comprehensive view of all the targets. Steady-state probabilities confirm this intuition, the most probable states being the two zoom-out states with ωp(s1) = {1, 2, 3, 5, 6, 7, 9, 10, 11} and ωp(s2) = {5, 6, 7, 9, 10, 11, 13, 14, 15}.

Currently, one of the main limitations of our algorithm is the long computational time needed to solve the nonlinear mathematical programming problems and obtain the optimal patrolling strategy. When using the general formulation, we are able to solve in a “reasonable” time (less than 2 hours) settings with a maximum size of 15 cells. As shown in Fig. 8(a), when using the strictly competitive formulation a remarkable amount of time can be saved, at the price of slightly reducing the scope of applicability of the model because of the constraints on the targets’ values (note that the time axis of Fig. 8(a) has a logarithmic scale). With the strictly competitive formulation we are able to solve environments with up to 45 cells in a relatively small amount of time. However, as shown in Fig. 8(b), the computational time still exhibits an exponential behavior in the number of cells. Note, anyway, that the optimal patrolling strategies are calculated off-line, before being executed by the active cameras.

Fig. 8. Computational time (s) as a function of the total number of cells: (a) multi-bilinear and strictly competitive formulations (logarithmic time axis); (b) strictly competitive formulation.

We evaluated the effectiveness of the patrolling strategies computed with the strictly competitive formulation when varying the parameters L, u, and d. Default values u = 0.7, L = 2, and d = 3 are initially assigned. Fig. 9 reports the patroller’s optimal utility in an environment with 4 × 4 cells and 5 targets for different values of L, u, and d (results for other environments are similar). Fig. 9(a) shows how the utility varies with respect to L and the penetration time d. As can be seen, these two parameters have a significant impact on the utility. With large values of L the camera has a large fov, augmenting the possibility of detecting an intruder. On the other side, large values of d correspond to weak intruders, which need more time to complete an intrusion. As expected, the patroller’s utility is large for large values of L and d. In Fig. 9(b) the values of d and u are varied. Utility increases as u becomes larger. However, this parameter seems to have a less remarkable influence than d. The weak influence of u on the patroller’s utility is confirmed also by the experiments reported in Fig. 9(c), where the parameters L and u are concurrently varied.

Fig. 9. Patroller’s optimal utility for different values of the parameters: (a) L and d; (b) d and u; (c) L and u.
V. CONCLUSIONS
In this paper we presented a theoretical model for computing optimal patrolling strategies in adversarial settings, where the presence of a rational intruder is explicitly considered. The model is general and can be applied to different patrollers. Specifically, we defined an instance of the model for an active patrolling camera. Pan, tilt, and zoom camera movements are exploited to monitor an environment and protect it from intrusions by an observing adversary. One of the main limitations of our approach is the large amount of computational time needed to compute a solution in complex settings. Although significant improvements can be obtained in strictly competitive settings, further optimizations are needed to handle more complex situations. A promising approach is to eliminate dominated strategies of the agents. For example, in preliminary experiments we observed that a remarkable amount of time (about 90%) can be saved when discarding zoom-in states on non-target cells.
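For instance, the pruning mentioned above can be realized as a simple filter over the camera's state space before the mathematical program is built; this sketch reuses the ('in'/'out', fov) encoding assumed in the earlier examples and is only meant to illustrate the idea.

```python
def prune_dominated(states, target_cells):
    """Drop zoom-in states whose single cell is not a target: such a state can never
    detect an ongoing intrusion, so keeping it only enlarges the program."""
    return [s for s in states
            if not (s[0] == "in" and next(iter(s[1])) not in target_cells)]
```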
REFERENCES
[1] Y. Chevaleyre, F. Sempé, and G. Ramalho, “A theoretical analysis of multi-agent patrolling strategies,” in Proc. AAMAS, 2004, pp. 1524–1525.
[2] P. Paruchuri, J. Pearce, M. Tambe, F. Ordonez, and S. Kraus, “An efficient heuristic approach for security against multiple adversaries,” in Proc. AAMAS, 2007, pp. 311–318.
[3] J.-S. Marier, C. Besse, and B. Chaib-draa, “A Markov model for multiagent patrolling in continuous time,” in Proc. ICONIP, 2009, pp. 648–656.
[4] N. Basilico, N. Gatti, and F. Amigoni, “Leader-follower strategies for robotic patrolling in environments with arbitrary topologies,” in Proc. AAMAS, 2009, pp. 57–64.
[5] N. Agmon, V. Sadov, G. Kaminka, and S. Kraus, “The impact of adversarial knowledge on adversarial planning in perimeter patrol,” in Proc. AAMAS, 2008, pp. 55–62.
[6] V. Isler, S. Kannan, and S. Khanna, “Randomized pursuit-evasion in a polygonal environment,” IEEE Transactions on Robotics, vol. 21, no. 5, pp. 864–875, 2005.
[7] R. Vidal, O. Shakernia, J. Kim, D. Shim, and S. Sastry, “Probabilistic pursuit-evasion games: Theory, implementation and experimental results,” IEEE Transactions on Robotics and Automation, vol. 18, no. 5, pp. 662–669, 2002.
[8] B. von Stengel and S. Zamir, “Leadership with commitment to mixed strategies,” London School of Economics, CDAM Research Report LSE-CDAM-2004-01, 2004.
[9] E. Sommerlade and I. Reid, “Information-theoretic active scene exploration,” in Proc. CVPR, 2008, pp. 1–7.
[10] J. Filar and K. Vrieze, Competitive Markov Decision Processes. Springer-Verlag, 1996.
[11] N. Basilico, N. Gatti, and F. Amigoni, “Extending algorithms for mobile robot patrolling in the presence of adversaries to more realistic settings,” in Proc. IAT, 2009.
[12] D. Fudenberg and J. Tirole, Game Theory. The MIT Press, 1991.
[13] F. Amigoni, N. Basilico, and N. Gatti, “Finding the optimal strategies in robotic patrolling with adversaries in topologically-represented environments,” in Proc. ICRA, 2009, pp. 819–824.
[14] Stanford Business Software Inc., http://www.sbsi-sol-optimize.com/.