Toward Optimizing Static Target Search Path Planning

Nassirou Lo

Jean Berger

Martin Noel

T-OptLogic Ltd. Quebec City, Canada [email protected]

DRDC Valcartier Quebec City, Canada [email protected]

UQAM, TELUQ Quebec City, Canada [email protected]

Abstract—Discrete static open-loop target search path planning is known to be an NP-hard problem, and problem-solving methods proposed so far rely on heuristics with no way to properly assess solution quality for practical-size problems. Departing from traditional nonlinear model frameworks, a new integer linear programming (ILP) exact formulation and an approximate problem-solving method are proposed to near-optimally solve the discrete static search path planning problem involving a team of homogeneous agents. Applied to a search and rescue setting, the approach takes advantage of objective function separability to efficiently maximize the probability of success. A network representation is exploited to simplify modeling, reduce constraint specification and speed up problem-solving. The proposed ILP approach rapidly yields near-optimal solutions for realistic problems using parallel CPLEX technology, while providing for the first time a robust upper bound on solution quality through Lagrangean programming relaxation. Problems with large time horizons may be efficiently solved through multiple fast subproblem optimizations over receding horizons. Computational results clearly show the value of the approach over various problem instances while comparing performance to a myopic heuristic.

Keywords: search path planning, search and rescue, linear programming, static, open-loop

I. INTRODUCTION

Target search path planning is a pervasive problem occurring over a variety of civilian and military domains such as homeland security, emergency management and search and rescue/response. Target search and, in particular, search and rescue (SAR) problems may be characterized along multiple dimensions and attributes, including: one-sided search, in which targets are non-responsive to the searcher's actions, versus two-sided search, describing target behavior diversity (cooperative, non-cooperative or anti-cooperative); stationary vs. moving target search; discrete vs. continuous time and space search (effort indivisibility/divisibility); observation model; static/dynamic as well as open- and closed-loop decision models; pursued objectives; and target and searcher multiplicity and diversity. Early work on related search problems emerges from search theory [1], [2]. Search-theoretic approaches mostly relate to the effort (time spent per visit) allocation decision problem rather than path construction. Building on that mathematical framework, efforts have increasingly been devoted to algorithmic

contributions to handle more complex dynamic problem settings and variants [3], [4]-[6]. In contrast, many contributions on search path planning may be found in the robotics literature in the area of robot motion planning [7] and, namely, terrain acquisition [8], [9] and coverage path planning [10]-[12]. Robot motion planning has explored search path planning, primarily providing constrained shortest-path-type solutions for coverage problem instances [13], [14]. These studies typically examine uncertain search environments with limited prior domain knowledge, involving unknown, sparsely distributed static targets and obstacles. Recent taxonomies and comprehensive surveys on target search problems from the search theory and artificial intelligence/distributed robotic control perspectives may be found in [15], [4] and [16]-[18] respectively. However, despite a large body of work published on various problem models, the SAR problem even in its simplest form remains computationally hard [4]. The open-loop static (offline planning) problem in particular still presents strong interest and challenges in a variety of situations, such as major disaster management or urban/military combat search and rescue operations, where any potential gain in life saving and efficiency is worth the investment. Such circumstances include cases in which information gathered during search cannot be instantly exploited, or high-level organizational resource allocation decision-making processes aimed at exploring and assessing anticipated offline solution plans prior to costly resource deployment. Those situations may typically result from unavailable expertise/knowledge, insufficient or limited information processing technology at the searcher's disposal, or the prevalence of current organizational structures, processes, policies, constraints or security conditions.
In spite of the development of many heuristics and approximate problem-solving techniques to face the curse of dimensionality for the static SAR problem [19]-[21], [4], [18], published problem-solving heuristics mostly fail to provably estimate the real optimality gap for practical-size problems, calling their expected relative efficiency into question and ignoring feasible gains that might otherwise be pursued. In this paper, we propose a new exact integer linear programming formulation along with an approximate technique to near-optimally solve the discrete static search path planning problem involving a team of homogeneous agents. In that setting, a team of centrally controlled

homogeneous agents with imperfect sensing capabilities (without false alarm) searches an area (grid) to maximize the probability of target detection (probability of success - POS), given a prior cell occupancy probability distribution. The open-loop property of the problem confers objective function separability over cells, enabling efficient objective function pre-computation and leading to a new and convenient ILP formulation. A network flow representation significantly reduces modeling complexity (e.g. constraint specification) as well as implementation and computational costs. The new decision model relies on an abstract network representation that can be coupled to parallel computing (e.g. using the CPLEX solver [22]) to gain additional speed-up. The novelty lies in a new exact linear model lending itself to a fast approximate computable solution for practical-size problems, providing for the first time a tight upper bound on solution quality through Lagrangean programming relaxation. The computable upper bound defines an objective measure to fairly compare performance gaps over various techniques. Computational results show the proposed approach to be very efficient, significantly outperforming limited myopic search path planning over a random sample of problem instances. Large horizon problems are solved efficiently by considering multiple overlapping episodes over receding horizons. The remainder of the paper is structured as follows. Section II introduces the problem definition, describing the main characteristics of the static open-loop search path planning problem. The main solution concept for the problem is then presented in Section III, which describes a new linear programming network flow formulation combined with a network representation to efficiently compute a near-optimal solution. Details of the proposed approximate problem-solving technique are then provided in Section IV.
Section V reports and discusses computational results while comparing the value of the proposed method to an alternate myopic heuristic. Finally, a conclusion is given in Section VI.

II. PROBLEM DEFINITION

A. Description
The discrete centralized static search path planning problem (SPP) involves a team of homogeneous agents acting as stand-off sensors, searching for a stationary target in a bounded environment over a given time horizon. From a search and rescue mission perspective, the goal of the team consists in maximizing the probability of target detection within a given region. The static feature of the problem refers to the single-stage offline nature of problem-solving (system steady-state assumption). Modeled as a grid, the search region defines a two-dimensional cellular area composed of N cells, populated by a single stationary target assumed to occupy a single cell. The target location is unknown. Based on domain knowledge, a prior target location probability distribution defines individual cell occupancy, characterizing a grid cognitive map. Target occupancy probabilities over grid cells sum to one. The cognitive map or uncertainty grid is a knowledge base capturing the local environment state representation, reflecting the target occupancy belief distribution, agent positions and orientations. A typical cognitive map at a given point in time is illustrated in Figure 1.

Figure 1. Uncertainty grid/cognitive map at time step t. The 4-agent team beliefs are displayed through multi-level shaded cell areas. Projected agent plans are represented as possible paths.

Cell visit time, specifying an episode duration, is assumed to be constant. Vehicles are assumed to fly at slightly different altitudes to avoid colliding with each other. A search path solution consists in constructing agent path plans to maximize target detection.

B. Path Planning
A centralized decision-making process episodically makes an agent's search path planning decision based on the agent's position (cell location) and specific orientation {N,S,E,W,NE,SE,SW,NW}. Decisions are limited to three possible moving directions with respect to the agent's current heading, namely ahead, right or left, as depicted in Fig. 2. These limited moves account for the physical acceleration associated with agent motion.

Figure 2. Agent's region of interest displayed as forward move projection span (possible paths) over a 3-step time horizon.

The primary goal consists in planning base-level control action moves to maximize the probability of success (target detection) over the entire grid.

C. Probability of Success
In the static open-loop SAR model, the probability of successfully detecting the target resulting from n agent path executions on the grid is defined as the sum over cells of the product of the probability of detection resulting from cell visits and the target cell occupancy belief dictated by the cognitive map (grid). The probability of success (POS) can then be expressed as follows:

$POS = \sum_{c \in N} p_{c0}\left[1 - (1 - p_{cc})^{l_c}\right]$    (1)

where $p_{c0}$ refers to the current probability/belief of cell target occupancy, whereas the probability $p_{det}(c)$ for a sensing agent to detect the target in cell c after $l_c$ visits on c is defined by:

$p_{det}(c) = 1 - (1 - p_{cc})^{l_c}$    (2)

where $p_{cc}$ is the probability, on a specific visit, of correctly detecting the target in cell c given that the target is present in cell c; $p_{cc}$ depends on cell c. Agent sensors are assumed to be false-alarm free, meaning that a vacant cell visit always corresponds to a negative observation by the sensing agent. In the current setting, the agent sensor's range defining visibility or footprint (coverage of observable cells given the current sensor position) is limited to the cell being searched.
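For illustration, equations (1)-(2) can be evaluated directly from a per-cell visit-count plan. The sketch below is our own illustrative code (not the authors' implementation); it also precomputes the coefficients $p_{c0}(1-p_{cc})^l$ that the separable ILP objective introduced later relies on:

```python
# Probability of success (1) for a plan given as visit counts per cell,
# plus the precomputable coefficients p_c0 * (1 - p_cc)^l (residual
# non-detection mass after l visits) used by the separable objective.

def detection_prob(pcc, visits):
    """Equation (2): detection probability after `visits` glimpses of a cell."""
    return 1.0 - (1.0 - pcc) ** visits

def probability_of_success(p0, pcc, visits):
    """Equation (1): p0, pcc, visits are per-cell sequences of equal length."""
    return sum(p * detection_prob(q, l) for p, q, l in zip(p0, pcc, visits))

def objective_coefficients(p0, pcc, max_visits):
    """coeff[c][l] = p_c0 * (1 - p_cc)^l for l = 0..max_visits."""
    return [[p * (1.0 - q) ** l for l in range(max_visits + 1)]
            for p, q in zip(p0, pcc)]
```

Because the coefficients depend only on the visit count l and cell c, they can be tabulated once before optimization, which is what makes the linear reformulation possible.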

III. INTEGER LINEAR PROGRAMMING MODEL FORMULATION

A. Network Representation
A network representation is used to simplify modeling and constraint specification as well as problem-solving, as it eliminates the need to explicitly capture all constraints. These include maximum path length or deadline, admissible/legal moves, and disconnected subtour elimination, which may significantly impact run-time when handled explicitly. Let $G_k = (V_k, A_k)$ be the grid network, a directed acyclic graph associated with agent $k \in \Lambda = \{1,...,n\}$, where $V_k \subseteq V$ is the set of vertices associated with agent states (i.e. position and orientation state variables during a given episode $t \in T = \{1,2,...,T\}$), and $A_k$ is the set of arcs $(i,j)$, $i,j \in V_k$, reflecting possible agent state transitions between consecutive episodes over the grid, each corresponding to a legal move m selected from the action set A = {left, ahead, right}. $N_{kt} = N$ is the set of possible cell locations $\{1,...,|N|\}$ over the grid during episode t, whereas $O_{kt} = O$ refers to the set of possible agent orientations/headings {E,NE,N,NW,W,SW,S,SE} during episode t. As a result, $V_k = \bigcup_{t \in T} (N_{kt} \times O_{kt})$. The nodes o and d are additional fictitious origin and destination location vertices defining legal path ends in the graph. An excerpt from the abstracted representation for an agent network over two consecutive episodes is given in Fig. 3. An integer binary flow decision variable $x_{ijk}$ is associated with each arc $(i,j) \in A_k$. The agent k path solution includes the arcs $(i,j) \in A_k$ for which $x_{ijk} = 1$. These flow decision variables are coupled to alternate integer binary visit decision variables $v_{jl}$, reflecting that l visits on cell j are part of the physical agent path solution ($v_{jl} = 1$) in minimizing expected non-detection over the grid. Given an initial agent state $i_0(k)$, a path may be defined over the grid network by traveling along arcs connecting o to d, instantiating flow decision variables to build feasible paths and then, consequently, assigning the visit decision variables involved in the objective function. Agent state vertex duplication over the T episodes is aimed at eliminating disjoint solution subtours otherwise difficult to handle explicitly, and provides a directed acyclic graph that represents a legal solution through binary integer flow decision variables, including multi-cycle paths (possible occurrence of many visits on the same cell). Duplication implicitly satisfies the path length constraint as well. The significant gain in modeling obtained through duplication clearly exceeds the cost incurred by slightly degraded model readability due to the more complex notation. An agent network includes $|O||N|T$ nodes and $|O||N|T|A|$ arcs.

Figure 3. An agent grid network (directed acyclic graph) excerpt, over consecutive episodes t and t+1 for a 3x3-cell grid. Nodes depict agent state (position, orientation, episode) whereas arcs capture node transitions between episodes defined by possible legal moves. Squares refer to grid cells enclosing 8 possible agent orientations. A T-move path may be constructed by moving along arcs from stage 1 to stage T.
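The layered agent network can be generated directly from the move model. The sketch below is an illustrative reconstruction under stated assumptions (45-degree heading changes per left/right turn, unit cell steps along the new heading), not the authors' implementation; it enumerates nodes as (cell, orientation, episode) triples and arcs as the legal left/ahead/right transitions:

```python
# Sketch: build one agent's layered DAG for a width x height grid over T episodes.
# Nodes are (cell, orientation, t); arcs follow the {left, ahead, right} moves.
HEADINGS = ["E", "NE", "N", "NW", "W", "SW", "S", "SE"]   # counterclockwise order
STEP = {"E": (1, 0), "NE": (1, 1), "N": (0, 1), "NW": (-1, 1),
        "W": (-1, 0), "SW": (-1, -1), "S": (0, -1), "SE": (1, -1)}

def turn(heading, action):
    """New heading after a 45-degree left/right turn, or unchanged for 'ahead'
    (the turn granularity is an assumption for illustration)."""
    i = HEADINGS.index(heading)
    delta = {"left": 1, "ahead": 0, "right": -1}[action]
    return HEADINGS[(i + delta) % 8]

def build_agent_network(width, height, horizon):
    """Enumerate arcs between consecutive stages; out-of-grid moves are illegal."""
    arcs = []
    for t in range(1, horizon):                     # stage t -> stage t+1
        for x in range(width):
            for y in range(height):
                for h in HEADINGS:
                    for a in ("left", "ahead", "right"):
                        nh = turn(h, a)
                        nx, ny = x + STEP[nh][0], y + STEP[nh][1]
                        if 0 <= nx < width and 0 <= ny < height:
                            arcs.append((((x, y), h, t), ((nx, ny), nh, t + 1)))
    return arcs
```

Each interior node thus has at most three successors, matching the limited move set of Section II.B, and the stage index t makes the graph acyclic by construction.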

B. Mathematical Modeling
As the cell probability of target detection depends uniquely on the local visits conducted at the site, the problem formulation for the POS objective function expressed in (1) proves to be separable. Decision variables may then be partitioned into subsets with separate contributions to the objective function. Separability makes it feasible to precompute the cell target detection probability contribution values in advance, since these contributions essentially rely on the number l of local visits on cell c. Accordingly, the corresponding decision model may ultimately be formulated as follows:

$\max_{\{v_{cl}\}} \sum_{c \in N} p_{c0}\Big[1 - \sum_{l=0}^{V_c} (1-p_{cc})^{l} v_{cl}\Big] \;\equiv\; \min_{\{v_{cl}\}} \sum_{c \in N} \sum_{l=0}^{V_c} p_{c0} (1-p_{cc})^{l} v_{cl}$    (3)

subject to the linear constraint set:

Visit coupling:

$\sum_{k \in \Lambda} \sum_{(i,\, j(c)) \in A_k} x_{i\, j(c)\, k} - \sum_{l=0}^{V_c} l\, v_{cl} = 0 \quad \forall c \in N$    (4)

$\sum_{l=0}^{V_c} v_{cl} = 1 \quad \forall c \in N$    (5)

Initial agent position:

$x_{o\, i_0(k)\, k} = 1 \quad \forall k \in \Lambda,\; i_0(k) \in V_k$    (6)

Initial/final path conditions:

$\sum_{i \in V_k} x_{oik} = 1 \quad \forall k \in \Lambda$    (7)

$\sum_{i \in V_k} x_{idk} = 1 \quad \forall k \in \Lambda$    (8)

Flow conservation:

$\sum_{i \in V_k \cup \{o\}} x_{ijk} - \sum_{i \in V_k \cup \{d\}} x_{jik} = 0 \quad \forall k \in \Lambda,\; j \in V_k$    (9)

Maximum path length:

$\sum_{i \in V_k} \sum_{j \in V_k \setminus \{i\}} x_{ijk} = T \quad \forall k \in \Lambda$    (10)

Binary decision variables:

$v_{cl} \in \{0,1\} \quad \forall c \in N,\; l \in \{0,...,V_c\}$    (11)

$x_{ijk} \in \{0,1\} \quad \forall k \in \Lambda,\; (i,j) \in A_k$    (12)

The objective function shown in (3) refers to the probability of non-detection (1 − POS) over cell c, assuming at most l = V_c visits at the site. The bound V_c can be pre-computed or selected arbitrarily large such that (3) safely captures the optimal solution. Constraints are governed through equations (4)-(12). For a given path solution, the coupling constraints (4) map the number of visits and incoming arcs to a site c; an arc (i, j(c)) relates to any agent state transition terminating in position c. Constraints (5) simply represent the number of visits ultimately to be paid on site c. Constraints (6)-(8) ensure that path solution departure and end points are uniquely defined. Flow conservation dictated by constraints (9) balances the number of incoming and outgoing arcs for a given node. Constraints (10) guarantee a T-move path solution for an agent, but turn out to be unnecessary as they are implicitly satisfied by the agent graph construction. Finally, $v_{cl}$ and $x_{ijk}$ refer to the binary decision variables for the number of visits l on cell c and agent state transitions along arcs (i,j) respectively. The visit decision variable assignment $v_{cl} = 1$ corresponds to a path solution including l visits to cell c. The assignment $x_{ijk} = 1$ reflects agent k legally transiting from state i to state j.

C. Single Team Network Simplification
Given agent homogeneity, a single compact 'team' (n-agent) T-stage network G = (V, A) can be used to capture all agent paths at once. Using a single network instead of a multiple network-agent mapping provides additional speed-up, reduces the number of decision variables, and yields significant computer space and management savings (by a factor of n). No longer labelled by a particular agent index, the resulting team directed acyclic graph G = (V, A) captures agent multiplicity by substituting the $x_{ijk}$ integer flow decision variables with $x_{ij}$ and slightly modifying some key flow constraints:

$\sum_{(i,\, j(c)) \in A} x_{i\, j(c)} - \sum_{l=0}^{V_c} l\, v_{cl} = 0 \quad \forall c \in N$

$x_{oi} = \sum_{k \in \Lambda} \delta_{i\, i_0(k)} \quad \forall i \in V, \qquad \text{where } \delta_{ij} = 1 \text{ if } i = j \text{ and } 0 \text{ otherwise}$

$\sum_{i \in V} x_{oi} = n, \qquad \sum_{i \in V} x_{id} = n$

$\sum_{i \in V \cup \{o\}} x_{ij} - \sum_{i \in V \cup \{d\}} x_{ji} = 0 \quad \forall j \in V$

$x_{ij} \in \{0,1,...,n\} \quad \forall (i,j) \in A$

The expected computational gain comes at the low cost of reconstructing individual agent paths from the computed agent-free decision variables of the team network solution. The agent path reconstruction procedure is described next.

1) Agent Path Reconstruction
A particular agent path is reconstructed using the team network and its instantiated integer flow decision variables $x_{uv}$. A legal T-move agent k path is simply generated by moving along the computed team solution arcs from its departure state node $i_0(k)$ (combining initial cell and orientation) in stage 1, adding the related cell to the evolving path, up to stage T, before finally reaching the destination node d. Decision variables are progressively decremented as the path expands. The agent path reconstruction algorithm is straightforward and fast (O(nT)), as summarized below:

For k = 1..n do
    u ← i_0(k); path_k ← ∅
    While (u ≠ d)
        select (u, v) such that x_uv > 0
        path_k ← path_k ∪ {cell_u}
        x_uv ← x_uv − 1; u ← v
    end While    (T-move path construction)
end For    (agent k path solution)

The path solution path_k in the above procedure refers to an ordered multi-set, in which multiple occurrences of the same element are possible.

D. Time Horizon
Although large or medium-size time horizons should usually be captured through dynamic modeling to benefit from feedback information, there are nonetheless circumstances, driven by technological or organizational business contingencies, in which episodic problem-solving in static settings may still be suitable, as mentioned in Section I. Large time horizon problems are solved through repeated fast subproblem optimizations over receding horizons, as pictured in Fig. 4. The time horizon is divided into time intervals and the corresponding subproblems are sequentially solved over respective episodes of period ΔT. Accordingly, each subproblem solution periodically expands the overall current partial path solution, progressively incorporating a small fraction of its solution moves (subperiod δT), while updating the objective function with the new path contributions. Limited move insertions define overlapping episodes, mitigating the effects of myopic path planning. A new static subproblem is then periodically solved subject to the revisited objective function updated from the previous episode, accounting for the partial solution being progressively built. The process is reiterated until the time horizon has been covered. The strategy consists in taking advantage of the fast computation of reasonable time horizon subproblems over a limited number of episodes to quickly compute a near-optimal solution to the original problem.
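The receding-horizon scheme described above can be sketched as a simple driver loop. In the sketch below, `solve_subproblem` and `update_beliefs` are hypothetical stand-ins (assumed interfaces, not the paper's code) for one static ILP optimization over a window of ΔT episodes and for the objective update, respectively:

```python
# Sketch of the receding-horizon strategy: repeatedly solve a window of
# `window` (Delta_T) episodes, commit only the first `commit` (delta_T)
# moves of each agent plan, update beliefs, and re-solve from the new state.

def receding_horizon_search(solve_subproblem, update_beliefs, beliefs, starts,
                            horizon, window, commit):
    """horizon = T, window = Delta_T, commit = delta_T (commit <= window).
    solve_subproblem(beliefs, state, w) -> dict agent -> list of w states."""
    paths = {k: [] for k in starts}      # final committed path per agent
    state = dict(starts)                 # current agent states
    t = 0
    while t < horizon:
        plans = solve_subproblem(beliefs, state, min(window, horizon - t))
        for k, plan in plans.items():
            committed = plan[:commit]    # keep only the first delta_T moves
            paths[k].extend(committed)
            if committed:
                state[k] = committed[-1]
        beliefs = update_beliefs(beliefs, plans, commit)
        t += commit
    return paths
```

Because commit < window, consecutive subproblems overlap, which is what mitigates the myopia of solving each window in isolation.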

Figure 4. A large time horizon T is defined over T/δT receding horizons of period ΔT. Moves computed in subperiods δT form the final path solution to the original problem.

E. Discussion
The proposed formulation confers many advantages over alternate modeling procedures, as the linear model allows a bound on optimal solution quality to be efficiently computed through Lagrangean programming relaxation. This provides a comparative measure for carrying out performance gap analysis over alternate solutions, as well as the ability to trade off solution quality and run-time for heuristic methods operating under tight temporal constraints. Constraint handling effort may be further reduced through network construction and node duplication strategies whenever required. The proposed linearization approach can also easily be adapted to alternate separable objective functions. Problem-solving may be naturally achieved using well-known efficient techniques from the IBM CPLEX software package [22].

IV. ILP ALGORITHM - CPLEX SOLVER

The IBM ILOG CPLEX parallel optimizer version 12.2.0.0 [22] was used, exploiting various problem-solving techniques optimized for large-size problems.

The barrier method was preferred over the primal-dual simplex technique, as it generally explores regions closer to the optimal solution more efficiently. CPLEX provides classical (exact) mixed integer programming (MIP) solutions and, implicitly, Lagrangean programming relaxation (LP) solutions based on those techniques; since the MIP model may turn out to be time-consuming, we further introduce an approximate method (LP+MIP) combining Lagrangean programming relaxation (LP) and classical MIP. The rationale for the hybrid LP+MIP approach lies in its potential to further prune the solution space and significantly reduce computational run-time in solving practical-size problems. In LP+MIP, the LP problem is first optimized. Then a new, smaller and less constrained MIP optimization problem is solved. The approach consists in preserving in the revisited MIP problem, through additional constraint specifications, the instantiated null decision variables corresponding to undesirable moves resulting from the prior LP execution. The simplified MIP problem is then optimally solved to instantiate the remaining (open) integer decision variables leading to the highest payoffs among alternate contentious path segment candidates, building the final path solution. The proposed near-optimal LP+MIP procedure may be summarized as follows:
1. Compute solution S_LP from the Lagrangean programming relaxation (LP) version of the problem.
2. Compute a solution to the original problem subject to an additional constraint set: the constraints consist in fixing to zero the null decision variables emerging from S_LP, significantly reducing combinatorial complexity.
Many variants of this idea could be further explored to trade off solution quality and run-time. Depending on the time available and the desired acceptable optimality gap, solution quality for large horizon problems may also be arbitrarily improved through suitable user-defined subproblem time horizon selection.
Additional speed-up can be contemplated if a good feasible solution is initially provided as input.
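The variable-fixing step of LP+MIP is solver-independent and easy to express. The sketch below is an illustrative reconstruction (the paper itself works through CPLEX): given the relaxation values of the flow variables, every variable at zero is fixed to zero in the follow-up MIP by tightening its bounds:

```python
# Sketch of the LP+MIP variable-fixing step (illustrative, solver-independent):
# variables driven to zero by the relaxation are excluded from the restricted MIP.

def fix_null_variables(lp_solution, tol=1e-9):
    """Return the names of variables whose relaxation value is (numerically) zero."""
    return {name for name, value in lp_solution.items() if abs(value) <= tol}

def restrict_bounds(bounds, fixed):
    """Tighten (lower, upper) bounds: fixed variables get (0, 0), others keep theirs."""
    return {name: ((0, 0) if name in fixed else bounds[name]) for name in bounds}
```

The restricted bound set is then handed to the exact MIP solve, which only has to decide the variables left open by the relaxation.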

COMPUTATIONAL EXPERIMENT

A computational experiment has been conducted to test the approach for a team of n agents over a variety of scenarios. The value of the proposed ILP approach is assessed in terms of optimality gap and run-time, and its performance is compared to an alternate myopic heuristic. Computed solutions from the respective methods are reported against the relative target detection probability optimality gap at the end of horizon T:

$Opt_{gap} = \dfrac{pos^* - pos_{ILP}}{pos^*}$    (13)

where $pos^*$ is the optimal probability of success defined in (1), or a tight upper bound on it (the LP solution), and $pos_{ILP}$ is the performance of the algorithm for a given scenario. The smaller the optimality gap, the better the performance.
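The gap measure in (13) is straightforward to compute; the sketch below (our own illustration) uses the LP upper bound as $pos^*$ when the true optimum is unavailable, in which case the returned value overestimates the true gap:

```python
def optimality_gap(pos_star, pos_ilp):
    """Relative optimality gap (13): (pos* - pos_ILP) / pos*.
    pos_star may be the true optimum or a tight upper bound (LP relaxation)."""
    if pos_star <= 0:
        raise ValueError("pos* must be positive")
    return (pos_star - pos_ilp) / pos_star
```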

A. Myopic Algorithm
The limited look-ahead method consists in myopically planning moves one step ahead, visiting the closest admissible neighbour cell providing the highest gain (greedy steepest ascent). At each time step, the agent with the highest reward is selected first, and the objective function is updated accordingly. Should a dead end occur (e.g. a physical boundary impeding any legal move), the agent backtracks as much as necessary, exploring alternate directions. The process is then repeated for the other agents, and the procedure reiterated for each episode over the time horizon T. Run-time: O(nT).

B. Simulations
Computer simulations were conducted under the following conditions:
- Prior cell occupancy belief distribution: exponential, uniform; grid size N: 10x10, 15x15
- Homogeneous sensor agents: team size n: 1, 5, 10; actions: 3 possible moves (right, ahead, left); sensor parameters: pc = 0.8 for all cells
- Hardware platform: Intel(R) Xeon(R) CPU X5670; shared-memory multi-processing: 8 processors, 2.93 GHz; random access memory: 16 GB; 64-bit (double precision) representation

It should be mentioned that, as target cell occupancy probabilities sum to one, performance analysis for large grids becomes less appealing. The larger the grid, the smaller (ultimately negligible) the individual cell occupancy probabilities in general, inevitably resulting either in significant prospective visit payoffs for a limited number of cells sparsely distributed over a large grid, or alternatively in near-similar cell visit payoffs, for which any method would likely exhibit comparable performance. In both cases, this would translate into a substantial, unnecessary and costly (time-consuming) fraction of the total effort being devoted to lengthy subpath planning segments ultimately yielding low expected payoffs.
It is therefore argued that, for practical purposes and in order to support meaningful performance evaluation and analysis, grid instances finer than approximately 15x15 cells should be hierarchically reduced or aggregated. This is why this study limits its investigation to the proposed grid sizes.

C. Results
A sample of random simulation results is reported in Table I for a few N=15x15 -grid and nT
