Optimal Search in Structured Environments
Haye Lau
A thesis submitted in fulfilment of the requirements for the degree of Doctor of Philosophy
The University of Technology, Sydney 2007
CERTIFICATE OF AUTHORSHIP/ORIGINALITY

I certify that the work in this thesis has not previously been submitted for a degree nor has it been submitted as part of requirements for a degree except as fully acknowledged within the text. I also certify that the thesis has been written by me. Any help that I have received in my research work and the preparation of the thesis itself has been acknowledged. In addition, I certify that all information sources and literature used are indicated in the thesis.
Signature of Student
____________________________
Acknowledgements

I would like to thank all of the people who encouraged and supported me during the undertaking of this research. Firstly I would like to thank my supervisor, Professor Gamini Dissanayake, for his invaluable guidance, support, and for always keeping an eye on the long term view. I would also like to thank, for his appreciated advice, my co-supervisor Dr Shoudong Huang, who came upon a diversion from SLAM one day and literally made me go search. Thanks goes to Dr Dikai Liu for his help early in my candidature and to Mr Zenon Chaczko for putting me in touch with a certain professor in a new Centre of Excellence for Autonomous Systems.

In general, I wish to thank the great minds in all three nodes of CAS for providing a fantastic culture in which to research and learn, and I would like to thank Professor Hugh Durrant-Whyte, Dr Tomonari Furukawa and others in particular for organising workshops to expressly share some of that collective knowledge. Back at UTS, thanks goes to Dr Matthew Gaston for a high availability computing cluster and the librarians for retrieving articles from far and wide.

I wish to also thank those in the Red Corner: Cindy “Morning” Leung, Damith “Action” Herath, Alen “Ibis” Alempijevec, Matthew “Two dollar suspension” Rozyn, Ashod “Crimp tool” Donikian and Asela “Elephant” Kulatunga, for the fun, friendship and food in the years of my research candidature. Those who had never been in the corner, such as Dr Ngai Kwok, Zhan Wang, Weizhen Zhou and others in the Research Fellows' corner, are honorary members too. I must also thank Lily Chu for her critique of the diagrams, her companionship, and a timely supply of fruit juice. Angus the Labrador supervised the writing at nights.

Finally, I owe my gratitude to my parents, John and Helen, and my grandmother, Shuk Chun Yeung, for the years of education, encouragement and care.
Table of contents
Acknowledgements  iii
Table of contents  iv
List of Figures  viii
List of Tables  x
Abstract  xi
1 Introduction  1
  1.1 Elements of a Search Problem  2
  1.2 Search for Targets in Structured Environments  3
  1.3 Problems Addressed in the Thesis  5
  1.4 Principal Contributions  7
  1.5 Publications  8
  1.6 Thesis Structure  9
2 Literature Review  11
  2.1 Introduction  11
  2.2 Classical Search Problems  11
    2.2.1 Overview  11
    2.2.2 Stationary Target Search Problems  13
    2.2.3 Searching for a Moving Target  14
    2.2.4 Extensions to the Detection Search Problem  22
  2.3 Autonomous Searching and Related Work  23
    2.3.1 Single Searcher Problems  23
    2.3.2 Multiple Searcher Problems  24
    2.3.3 Searching in Structured Environments  27
  2.4 Summary  28
3 Search for a Stationary Target  30
  3.1 Introduction  30
  3.2 Problem Description  30
    3.2.1 Environment Structure  31
    3.2.2 Searcher Capability  31
    3.2.3 Target Information  32
    3.2.4 Search Efficiency  33
    3.2.5 Discrete Time Formulation of the Search Problem  33
  3.3 Approach  34
    3.3.1 Value Function  34
    3.3.2 Dynamic Programming Equation  35
    3.3.3 Dynamic Programming Algorithm  36
  3.4 Searching for Multiple Targets  38
  3.5 Examples  38
    3.5.1 Optimal Search Plan  39
    3.5.2 Evaluation of the Proposed Algorithm  41
  3.6 Related Work and Discussion  45
  3.7 Summary  47
4 Search for a Moving Target  48
  4.1 Introduction  48
  4.2 Problem Overview  49
  4.3 Motivation  49
  4.4 Optimal Searcher Path Problem with non-uniform Travel Times (OSPT)  51
  4.5 Branch and Bound Framework  54
    4.5.1 Bounds for the Probability of Detection  56
    4.5.2 The Generalised MEAN Bound  57
  4.6 The Discounted MEAN (DMEAN) Bound  60
    4.6.1 Motivation  60
    4.6.2 Method  60
    4.6.3 Proof of Guaranteed Upper Bound  62
    4.6.4 Computational Complexity  62
  4.7 Evaluation of the Use of the DMEAN Bound for OSP and OSPT Problems  63
    4.7.1 Uniform OSP Search Grid  63
    4.7.2 Comparison with Previous OSP Bounds  64
    4.7.3 OSPT Example  70
  4.8 Search Problems with Minimum Transit Time Constraints  71
    4.8.1 The Generalised Optimal Searcher Path Problem (GOSP)  72
    4.8.2 Branch and Bound Algorithm for the GOSP Problem  74
    4.8.3 DMEAN Bound for the GOSP Problem  76
    4.8.4 Example Search of an Office Environment  77
    4.8.5 Computational Complexity  78
  4.9 Discussion and Summary  79
    4.9.1 Potential Extensions  79
    4.9.2 Choice of Bounds for the OSP Problem  82
    4.9.3 Alternative Branch and Bound Approaches  82
    4.9.4 Application Issues and Other Related Work  83
    4.9.5 Summary  85
5 Multi-Agent Search with Interim Positive Information  86
  5.1 Introduction  86
  5.2 Searching with the Aid of Scouts  86
    5.2.1 Overview  86
    5.2.2 Problem Statement  88
  5.3 Optimal Policies for the Searcher/Scout Problem  91
    5.3.1 Obtaining Optimal Plans  91
    5.3.2 Solution Approach Details and Illustrative Example  93
    5.3.3 Notes on Computation  98
  5.4 Practical Heuristics for Searching with Scouts  99
    5.4.1 Heuristic Solution to the OSP Problem  100
    5.4.2 Complete Planning Heuristics (G1, G1d, G2, G2d)  103
    5.4.3 Replanning Heuristics (R1, R1d, R2, R2d)  104
  5.5 Results  105
    5.5.1 Optimal Solutions  105
    5.5.2 Heuristic Solutions  109
    5.5.3 Heuristics Evaluation  114
  5.6 Discussions and Summary  121
    5.6.1 Related Work  121
    5.6.2 Incorporation of Non-Uniform Searcher Travel Times  122
    5.6.3 Computational Issues  123
    5.6.4 Possible Extensions  125
    5.6.5 Summary  125
6 Conclusions and Future Work  127
  6.1 Summary of Contributions  127
  6.2 Directions for Future Work  128
Appendix A – Bounding Methods for the Optimal Searcher Path Problem  131
  A.1 Bounding Methods in Literature  131
  A.2 Obtaining Upper Bounds of Probability of Detection for the OSP Problem  132
  A.3 The PROP Bound  133
  A.4 The FABC Bound  133
Bibliography  136
List of Figures

Figure 1.1 Thesis motivation  2
Figure 1.2 Discretisation of different environments  5
Figure 1.3 Problems considered in thesis  6
Figure 2.1 Classical detection search problems in discrete space and time  12
Figure 2.2 Example OSP search grid and equivalent graph  18
Figure 2.3 Enumeration of possible searcher paths for map in Figure 2.2  20
Figure 2.4 Implicit enumeration of searcher paths for map in Figure 2.2  21
Figure 3.1 An environment decomposed into a set of regions  30
Figure 3.2 Computation times versus regions with non-zero target probability  37
Figure 3.3 Snapshots at two distinct times of the stationary target search sequence (a)  39
Figure 3.4 Snapshots at two distinct times of the stationary target search sequence (b)  40
Figure 3.5 Stationary target search sequence  40
Figure 3.6 Search sequence guided by additional knowledge  42
Figure 4.1 Goal: Find an optimal sequence to search the regions of interest  49
Figure 4.2 An OSPT search space depicted as a graph  52
Figure 4.3 All feasible search plans for the example in Figure 4.2 when T=7  54
Figure 4.4 Example implicit enumeration of searcher paths  55
Figure 4.5 Generalised MEAN bound calculation for the OSPT problem  58
Figure 4.6 DMEAN bound calculation  62
Figure 4.7 Example 5×5 OSP search grid  63
Figure 4.8 11×11 OSP search grid used in comparisons  64
Figure 4.9 Optimal path for 11×11 OSP search grid example with T = 15  66
Figure 4.10 Optimal path for 11×11 OSP search grid example with T = 17  66
Figure 4.11 Computation times versus time horizon for 11×11 OSP grid, g=0.6 and d=0.3  68
Figure 4.12 Computation times versus time horizon for 11×11 OSP grid, g=0.6 and d=0.6  68
Figure 4.13 Computation times versus time horizon for 11×11 OSP grid, g=0.6 and d=0.9  69
Figure 4.14 Computation times versus time horizon (C++ implementation) for a 15×15 OSP grid  69
Figure 4.15 Example OSPT search environment with non-uniform travel times  70
Figure 4.16 Example search area with contiguous regions  72
Figure 4.17 Tree of searcher actions up to T=5 for the example in Figure 4.16  75
Figure 4.18 ED network for the GOSP problem  76
Figure 4.19 Example GOSP search area  77
Figure 4.20 Two-step discounting for a two-cell problem  80
Figure 5.1 Searcher scouring an area with the aid of scouts  88
Figure 5.2 Solving the Searcher/Scout problem as a series of smaller OSP problems  93
Figure 5.3 Example search area with 3 cells  96
Figure 5.4 Example tree of options for the Searcher/Scout problem  97
Figure 5.5 Example DMEAN ED network  98
Figure 5.6 Operation of the G1d, G1, G2d and G2 heuristics  103
Figure 5.7 PD for 3-cell problem with different numbers of searchers and scouts  105
Figure 5.8 Computation times for 3-cell problem with different numbers of searchers and scouts  106
Figure 5.9 Initial optimal search paths for 1 searcher and 1 scout  107
Figure 5.10 Revised optimal search paths if the scout detects the target in cell 11 at time step 4  108
Figure 5.11 Optimal search path for 1 searcher working alone  108
Figure 5.12 Optimal search paths for 2 searchers  109
Figure 5.13 Initial paths obtained using G1d for example in Figure 5.9  110
Figure 5.14 Initial paths obtained using G1 for example in Figure 5.9  110
Figure 5.15 Initial paths obtained using G2d for example in Figure 5.9  111
Figure 5.16 Initial paths obtained using G2 for example in Figure 5.9  111
Figure 5.17 Initial paths obtained using R1d for example in Figure 5.9  112
Figure 5.18 Initial paths obtained using R1 for example in Figure 5.9  112
Figure 5.19 Initial paths obtained using R2d for example in Figure 5.9  113
Figure 5.20 Initial paths obtained using R2 for example in Figure 5.9  113
Figure 5.21 Heuristic computation times for the small problem set  118
Figure 5.22 Heuristic computation times for the medium problem set  119
Figure 5.23 Time to compute the initial plan using a replanning heuristic for the medium problem set  119
Figure 5.24 Distributed calculation of rewards  124
Figure 6.1 Integrated approach to search  130

List of Tables

Table 3.1 Stationary target search scenario properties  43
Table 3.2 Expected target detection times for stationary target scenarios  44
Table 4.1 Branch and bound computation for 11×11 OSP search grid with T=15  65
Table 4.2 Branch and bound computation for 11×11 OSP search grid with T=17  66
Table 4.3 Branch and bound computation for GOSP example  79
Table 5.1 Reward values for example with one searcher and scout  96
Table 5.2 Optimal agent plans for problem with 3 cells  97
Table 5.3 PD using different numbers of searchers and scouts in a 3-cell problem  98
Table 5.4 Small problem set for evaluating Searcher/Scout problem heuristics  114
Table 5.5 Medium problem set for evaluating Searcher/Scout problem heuristics  115
Table 5.6 Large problem set for evaluating Searcher/Scout problem heuristics  115
Table 5.7 PD of heuristic plans for the small problem set (a)  116
Table 5.8 PD of heuristic plans for the small problem set (b)  116
Table 5.9 PD of heuristic plans for the small problem set (c)  117
Table 5.10 PD of heuristic plans for the medium problem set  118
Table 5.11 Average PD of heuristic plans over 10000 runs for the large problem set  120
Table A.1 OSP bounds in literature  132
Abstract

Optimal Search in Structured Environments

This thesis is concerned with the development of optimal search techniques to find a target in a structured environment. It is necessary in many rescue and security applications for the responders to efficiently find and reach the phenomena of interest. Although the target location is by definition not precisely known, it is nevertheless imperative to make the best use of any available information. Given a known area described as a set of connected regions, a prior belief on the target location and a model of likely target motion, the underlying task addressed is to determine the best paths for one or more searchers to follow to maximise their effectiveness. As the most closely related problem discussed in the literature, the Optimal Searcher Path (OSP) problem, assumes that the searcher can instantly relocate between regions, one of the main contributions of this thesis is an extension of the OSP problem to the more realistic scenario where a searcher needs a finite time to physically travel from one region to another.

This work first considers the search of an indoor environment for a stationary target with the aim of minimising the expected time to detection. A dynamic programming approach is used to find an optimal ordering of regions to inspect. The proposed technique is also extended to the search for multiple targets.

Secondly, for maximising the probability of detecting a moving target in a structured environment, the more general Optimal Searcher Path problem with non-uniform Travel times (OSPT) is formulated. A key contribution is a branch and bound solution approach with a new bounding technique, the Discounted MEAN bound, which also provides much tighter bounds for the OSP problem than existing methods. As this improvement is gained with almost no increase in computational time, optimal search paths can feasibly be derived for longer time horizons.

Finally, a multi-agent problem is considered in which the searchers are aided by scouts that can help find, but not rescue, the target. Envisaged applications include firefighters (searchers) entering a building with a number of scouting robots. While the process terminates as soon as a searcher finds the target, successful scout detections can only improve the knowledge available to guide future searches. The solution framework must therefore plan not only to maximise the probability of the searchers directly finding the target, but also to put them in the best position to exploit any new information obtained from detections by scouts. It is shown that the problem can be partitioned into a series of modified OSP problems, through which the complete set of paths necessitated by each possible scout detection (and non-detection) can be obtained. Optimal and heuristic solutions to this problem are presented.
1 Introduction

This thesis is concerned with the development of optimal search techniques to find a target located somewhere in a structured environment. It is necessary in many rescue and security applications for mobile responders to efficiently find and reach a subject of interest, be it a victim, a possible intruder, or an otherwise unexplained phenomenon. In an Urban Search and Rescue (USAR) setting, the objective is to render aid to the victim as quickly as possible in a race against a diminishing survival window. In security scenarios, the target or phenomenon is to be found and investigated to minimise the potential for harm. Regardless of the specific application, a responder's knowledge about the target location can be categorised as:
- Perfect knowledge – the responder knows where the target is at all times, or
- Imprecise knowledge – the responder has some idea of where it could be, but cannot be sure, or
- No knowledge – the target could indeed be anywhere.

A responder can simply intercept the target via the most direct path if its location is known at all times. Conversely, in the absence of any knowledge, the best that can be done is to cover the entire area of interest as quickly as possible. In many cases, however, imprecise information can arrive in the form of distress signals, witness reports (“I last saw him in the back of the office”), informed guesses based on past behaviour patterns, or data gathered through a network of sensors, such as motion sensors, in the environment. The objective of this thesis is to develop techniques that make the best use of this partial information in planning the paths taken by one or more responders, such that the search of a structured environment can be conducted as efficiently as possible. While some heuristic solutions are also developed, the main focus is on taking advantage of the increasing capability of modern computers to obtain optimal solutions within realistic time frames.
Figure 1.1 Thesis motivation – given the available target information (a target probability distribution built from building sensors and witness reports, with darker shading indicating higher likelihood of target presence), determine the best path to search for the target.
1.1 Elements of a Search Problem

Search theory studies the problem of how best to allocate a limited amount of resources to find a target (or targets) whose location is not precisely known (Frost and Stone, 2001). Originating as a discipline from the work by Bernard Koopman and others at the Anti-Submarine Warfare Operations Research Group (ASWORG) during World War II, the decades since have seen a substantial body of work developed both in theory and towards application. With relevance to search and rescue, military and security operations, fault discovery, and resource exploration, the overarching goal is to arrive at an allocation of the search resources such that the probability of detecting the target, or alternatively a reward linked to this probability, is maximised. The elements of a basic optimal search problem can be defined as (Koopman, 1980):

- Target information: represented by a prior probability distribution of its location and other relevant states,
- Searcher capability: the amount of search effort available for use and the restrictions on the manner in which it can be deployed,
- Detection function: a function relating the amount of search effort applied at a location to the probability of detecting a target that is actually there, and
- Measure of search effectiveness: an objective that defines a reward in relation to the probability of finding the target when a given search plan is executed.

Additional considerations include knowledge about the target motion (for moving target scenarios), sensor characteristics, and the representation chosen to model the search environment (the search set). These can also be considered as part of the above. An optimal search problem is therefore concerned with finding an allocation of the limited search resources available such that the reward specified by the measure of effectiveness is maximised. The solution of this problem informs the search planner, or the searchers themselves, how much and where each component of the effort should be placed at each time (Frost and Stone, 2001).
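For concreteness, these four elements can be sketched as a small data structure. The field names and types below are assumptions chosen for illustration, not notation from the search theory literature:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class SearchProblem:
    """The four elements of a basic optimal search problem (after Koopman, 1980)."""
    prior: List[float]           # target information: P(target in cell i) before searching
    effort_per_step: float       # searcher capability: effort available at each time step
    detection: Callable[[int, float], float]  # detection function: (cell, effort) -> P(detect | target in cell)
    effectiveness: Callable[[float], float]   # measure of effectiveness: reward as a function of P(detection)
```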
1.2 Search for Targets in Structured Environments

When searching in structured environments, the likely sources of prior target information include witness statements, past activity logs, and information from networks of motion, heat, smoke, door or other sensors already instrumented in the area (Krishnamachari and Iyengar, 2004). Data may also come from additional sensor networks deployed for the express purpose of gathering more target information prior to risking the deployment of searchers (Batalin and Sukhatme, 2005). One of the natural ways to summarise the available target information, regardless of its source, is to represent it as the likelihood of the target being present in each part of the search environment. In the case of finding a ship lost at sea, Bourgault et al. (2001) employed a Bayesian approach where such a target probability distribution function (PDF), defined over a large uniform grid of cells representing the environment, is used as prior information. As the rescue air vehicles scour the ocean, the target PDF is updated using a model of the sensor and the expected target motion. Depending on the particular sensor used, its “footprint” may affect a large number of cells at once.

The likelihood of target presence in structured environments can also be discretised into a number of cells. For the purpose of searching a structured environment, however, a searcher is principally concerned with the probability of the target being in each of the regions (rooms, floors or buildings) comprising the area. Viewed more simply, a searcher must eventually inspect a region if it has a chance of containing the target, but otherwise can ignore a whole region altogether. As such, while a large number of uniform cells are often necessary to characterise a target PDF in an outdoor area, knowledge about a target in a structured environment can be summarised by the probability of the target being in a much smaller number of non-uniform regions.

This difference in scale impacts the techniques that can feasibly be applied, which in turn determines how far-sighted the generated search plan can be. When a target PDF is defined over a large uniform grid of cells, as is the case in many robotic applications (Bourgault et al., 2001; Moors and Schulz, 2006), it is usually only practical to either (1) select the optimal actions with respect to what could happen in the very short term, or (2) plan over a longer horizon, but via suboptimal means. Short term planning can provide adequate solutions when the target PDF contains a clear gradient that leads a searcher towards the areas with high target probability. When the task is to search a structured environment replete with internal walls and doorways, however, this is much less likely to be the case. A searcher that uses short term planning under such circumstances may be compelled to repeatedly cover a nearby area while remaining oblivious to the possibility of finding a target much further away.

Given a structured environment divided according to its constituent regions rather than uniformly small cells, there exists the potential to generate plans that are optimal over a much longer time period. Instead of determining which direction to move towards in the very next instant, a searcher could instead determine a sequence of regions to inspect that maximises the overall effectiveness in the time available. Although there already exist techniques in the Operations Research literature that generate long term plans for searching grids of uniform cells, the predominant assumption that the cells are essentially identical precludes their direct application to the search of the disparate regions here. A key motivation behind this thesis is therefore the development of optimal techniques that bridge the requirements of the search of structured environments with the optimal search problems already in existence (Figure 1.2).
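As a rough sketch of the Bayesian non-detection update described above (the grid size, sensor footprint and detection probability are invented for this example, not values from Bourgault et al.):

```python
import numpy as np

# Invented example: a 100 x 100 grid of cells with a uniform prior target PDF.
pdf = np.full((100, 100), 1.0 / (100 * 100))

# Sensor "footprint": the block of cells observed in one pass, with probability
# p_d of detecting a target that is actually inside the footprint.
footprint = (slice(40, 60), slice(40, 60))
p_d = 0.8

# Bayes update after a pass with no detection: scale the searched cells by the
# non-detection likelihood (1 - p_d), then renormalise over the whole grid.
pdf[footprint] *= 1.0 - p_d
pdf /= pdf.sum()
```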
Figure 1.2 Discretisation of different environments. (Diagram summary: optimal search problems use open outdoor areas of tens or hundreds of uniform cells, for which it is possible to find globally optimal search plans; search problems in the robotics literature use open outdoor areas of tens of thousands of uniform cells in a fine grid, for which it is only feasible to optimise over short time horizons; structured environments comprise at most hundreds of non-uniform regions, typically much fewer, with potential for hierarchical decomposition.)
1.3 Problems Addressed in the Thesis

The following lists the three problems examined in this thesis and the corresponding issues that they seek to consider.

Searching for a stationary target in a structured environment (Chapter 3)
- How best to generate long term plans, based on a given target probability distribution, to search for stationary targets in structured areas?

Searching for a moving target in structured environments using an imperfect searcher (Chapter 4)
- How to model the search of a separated group of regions (e.g. a cluster of buildings) or alternatively a set of contiguous regions (e.g. an office space with rooms and corridors)?
- How to efficiently generate plans for the above environments when the target is known to be moving and the searcher has a possibility of overlooking the target?

Searching for a moving target in structured environments with searchers and scouts (Chapter 5)
- How to make the most use of platforms that can only detect, but not rescue, the target?
- Since it is possible for positive target information to be received during plan execution, how does one plan to best take advantage of the possibility of replanning?
Figure 1.3 summarises the relationships between the optimal search problems addressed in this thesis.
Figure 1.3 Problems considered in thesis. Chapters 3, 4 and 5 will formulate and develop optimal solutions for different extensions of the Optimal Searcher Path (OSP) problem in the literature. (Diagram summary: starting from the OSP problem [Stewart '79, Eagle/Yee '90], which maximises the probability of detection, adding non-uniform travel times between cells and minimising the expected time to detection gives the stationary target search of Chapter 3; adding non-uniform travel times between cells gives the Optimal Searcher Path problem with non-uniform Travel times (OSPT) of Chapter 4; adding minimum transit times through cells gives the Generalised Optimal Searcher Path problem (GOSP) of Chapter 4; adding multiple agents with positive target information, and the need to adapt plans during the search, gives the Searcher/Scout problem of Chapter 5.)
1.4 Principal Contributions

The main contributions of this thesis are:

- The Optimal Searcher Path (OSP) problem in the Operations Research literature is extended to account for the time a searcher needs to move from one region to another. The ability to plan with the non-uniform travel times between regions in mind makes it possible to realistically model the discrete search of structured environments. Two complementary formulations are proposed: one is aimed at modelling the search of physically separated regions, while the other deals with the search of environments consisting of contiguous regions, such as office spaces.
- A branch and bound approach is presented to generate optimal plans for these new problems. A key contribution is a new bounding method that provides tighter bounds for the new problems, as well as the original OSP problem, at almost no additional computational cost.
- A problem of searching with multiple searchers and scouts is presented, in which the search team obtains not only negative target information from non-detection but also positive information concerning the target location from the scouts. Unlike most problems, which terminate as soon as the target is found, successful detections by scouts only serve to improve on the current knowledge so that the team can react to better engage the target in the future. The team must correspondingly plan not only to maximise the probability of the searchers directly finding the target, but also to give them the best chance of exploiting possible new information. It is shown that this need to plan for replanning can be addressed by equivalently solving a series of modified, simpler detection search problems that always do terminate on detection.
- Optimal and heuristic solution methods for the above Searcher/Scout problem are derived, such that the capabilities of all the sensing platforms in a search task are harnessed even when only a subset are capable of actually rescuing/engaging/servicing the target.
1.5 Publications

Some of the contributions of this thesis are documented in the following articles:

- Lau, H., Huang, S., Dissanayake, G., 2005, 'Optimal search for multiple targets in a built environment', in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Edmonton, Canada, pp. 3740-3745.
- Lau, H., Huang, S., Dissanayake, G., 2006, 'Probabilistic search for a moving target in an indoor environment', in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Beijing, China, pp. 3393-3398.
- Lau, H., Huang, S., Dissanayake, G., 2007, 'Discounted MEAN bound for the optimal searcher path problem with non-uniform travel times', European Journal of Operational Research, in press.
- Lau, H., Huang, S., Dissanayake, G., 2007, 'Multi-agent search with interim positive information', IEEE/RSJ International Conference on Intelligent Robots and Systems, San Diego, USA, to be presented.
1.6 Thesis Structure

The thesis is structured as follows: Chapter 2 discusses a number of search problems addressed in the robotics and Operations Research literature. The Optimal Searcher Path problem (OSP), a discrete search problem which forms the basis for the rest of this thesis, is discussed in detail. The literature on search problems from the area of Operations Research mainly focuses on techniques that generate long term solutions. On the other hand, typically suboptimal or shorter time horizon solutions are employed in robotic search applications due to practical constraints. It is argued that, due to the ability to naturally divide structured environments into a small number of constituent regions rather than a large grid of uniform cells, there is scope to optimally plan for the entire duration of the search at the level of the constituent regions inspected.
Chapter 3 considers the search for a single stationary target in a known structured environment that can be described by a set of connected regions. In contrast to most problems in the existing literature, the scenario allows the search times of individual regions and the travel times between the regions to be arbitrarily specified. The objective is to minimise the expected time for detecting the target, and a dynamic programming approach is proposed. The technique is also extended to the problem of searching for multiple targets.
Chapter 4 formulates the Optimal Searcher Path problem with non-uniform Travel times (OSPT), which extends the OSP problem to model the search for a moving target in a structured environment. The searcher is imperfect in the sense that there could be a non-zero probability of missing the target even when the region in which the target is present is searched. The objective is thus to maximise the probability of detecting the target within a limited amount of time. A complementary formulation, the Generalised Optimal Searcher Path problem (GOSP), is also provided to specifically model the search of open indoor environments. A key contribution is a branch and bound solution for both problems with a new bounding technique, the Discounted MEAN (DMEAN) bound. DMEAN also provides much tighter bounds than existing methods for the OSP problem itself. As this improvement is made with almost no additional computation, the bounding technique extends the time horizons for which plans for the OSPT and OSP problems can be feasibly obtained.
Chapter 5 considers a general multi-agent search problem where one or more searchers are aided by scouts that can help detect but not rescue or engage the target. Envisaged applications include a team of fire-fighters entering a building with a number of scouts. While the process terminates as soon as a searcher finds the target, successful scout detections only serve to improve on the knowledge available to guide future searches. The solution framework must therefore plan not only to maximise the probability of the searchers directly finding the target, but also put them in the best position to respond to and exploit any new information obtained from detections by scouts. It is shown that the problem can be partitioned into a series of modified multi-searcher OSP problems, through which the complete set of paths necessitated by each possible eventuality of scout detection (and non-detection) can be obtained. Leveraging the work developed in the earlier chapters, optimal and heuristic methods are presented to address this search problem with both negative and positive target information.
Chapter 6 summarises the main contributions of this thesis and suggests a number of future directions for research.
2 Literature Review

2.1 Introduction

This chapter discusses a number of related search problems addressed in the robotics and Operations Research literature. The Optimal Searcher Path problem (OSP), a discrete search problem which forms the basis of the rest of this thesis, is discussed in detail. The chapter is organised as follows: Section 2.2 discusses the literature on classical search problems stemming principally from the area of Operations Research, which focuses mainly on techniques that generate long-term optimal solutions. Section 2.3 canvasses search applications in robotics and related fields, where typically suboptimal or shorter time horizon solutions are employed due to practical constraints.
2.2 Classical Search Problems

2.2.1 Overview

Benkoski et al. (1991) provide a survey of the different types of problems explored in the search theory literature, which can be viewed in the broad categories of one-sided search problems and two-sided search games. The former assumes that the target is unwilling or unable to respond to the searcher's actions, such that once the search process has started, it is only the searcher's plan (based on the expected target motion) that affects the anticipated outcome. This assumption is used in most maritime search and rescue (SAR) operations, where the target remains unaware of being the subject of the search and cannot actively influence the chances of detection until actually coming into close range with the searchers (Frost and Stone, 2001). The same assumption is made for the search problems in structured environments considered in Chapters 3, 4 and 5 of this thesis. Envisaged scenarios include situations with a compliant target, such as the search for a distressed or lost victim in a burning building, and in general cases where the searcher is difficult to detect.

The second category, the two-sided search games, involves cases where the searcher and target cooperate to find each other, or where the target actively seeks to hide from the searcher. In the worst case scenario, a target may have perfect knowledge of the searcher's actions and can always move to minimise the chance of capture. Faced with such an omniscient (or even intermittently informed) adversary, the best that a searcher can then do is to maximise the probability of capturing the target under the assumption that it will always do its worst. This in spirit describes the operation of the search allocation game (SAG) (Hohzaki, 2006) as well as the pursuit-evasion game (Gerkey et al., 2006), which in many cases also assume an arbitrarily fast target. Other problem variations where the target can actively counteract the searcher, such as choosing a stationary hiding position (Nakai, 1988), selecting a route to avoid ambush by a stationary searcher (Hohzaki and Iida, 2001) and avoiding detection by a searcher on a pre-planned route (Hohzaki and Iida, 2000), are also often addressed through game-theoretic means. Conversely, a target in a rendezvous problem (Alpern, 1995; Anderson and Weber, 1990), possibly armed with some knowledge of the searcher's location, actually wishes to be found as quickly as possible. Two-sided problems, however, are outside the scope of this thesis.

Due to the large body of work on one-sided search problems, the following sections provide only an overview of the literature relevant to the problems addressed in this thesis. Figure 2.1 outlines the main one-sided detection search problems addressed in the search theory literature.
Figure 2.1 Classical detection search problems in discrete space and time. The problems gain complexity with additional features and constraints. (Diagram summary: stationary target search problems with continuous, infinitely divisible search effort, where effort is spread over any number of cells at each time step, e.g. 20% in cell 1, 32% in cell 6 and 48% in cell 7, are solved with Lagrangian multiplier methods; adding Markov target motion gives moving target search problems (Pollock, 1970), solved with the Forward and Backward (FAB) algorithm (Brown, 1980); adding a time penalty/cost for changing from one cell to another gives search problems with switch costs (Onaga, 1971; Lössner and Wegener, 1982), solved with dynamic programming and integer programming; with discrete search effort, all the effort is placed in a single cell at each time step; adding path constraints on where search effort can be placed gives the NP-Complete Optimal Searcher Path problem (OSP) (Stewart, 1979; Eagle/Yee, 1990), where a search plan is just a "path" of the cells visited at t = 1, 2, ..., T, solved with dynamic programming and branch and bound.)
2.2.2 Stationary Target Search Problems

Early search problems sought to find a stationary target located at a point on a plane, or alternatively in one of a number of discrete cells. Most cases assumed infinitely divisible search effort, such that multiple cells can be searched simultaneously at each time step. These continuous effort allocations might represent the amount of time that an aeroplane spends over each patch of the ocean, or the proportion of radar, sonar or other sensing resources that can be finely apportioned between the individual areas. Sensor effectiveness is represented by a detection function that maps the amount of effort invested in each cell to the probability that any target occupying it will be found; typically an exponential detection function was assumed. The task was therefore to arrive at an optimal effort allocation among the cells of the search environment, in terms of the amount of effort to be placed in each cell, such that the overall probability of detecting the target is maximised. Other objectives, including the minimisation of the expected detection time, were also addressed.

Stone (1989) discussed a number of techniques used for solving this and other related basic search problems. As the best continuous effort allocation under the assumption of exponential detection functions forms a convex optimisation problem, Lagrange multiplier methods were typically used. Further work extended the conditions for the optimality of such techniques to support a wider class of regular detection functions. A regular detection function is defined as one which has a continuous, positive and decreasing first derivative (Stone, 1989), such that the investment of effort to increase the probability of detection is subject to the law of diminishing returns.

Discrete effort stationary target search problems complement their continuous counterparts by considering cases where all the available search effort is restricted to a single cell at each time. Modelling situations where a search resource (such as a ship or a manned patrol) cannot be simultaneously deployed in multiple places, the optimisation task then becomes that of finding the best sequence of cells for the searcher to visit. Alternative solution methods are necessary in cases where the search effort available is not infinitely divisible. While Lagrange multiplier techniques can be readily used to solve the problem in its continuous form, it can be shown that the same methods do not always result in an optimal solution when the search effort to be allocated is discrete (Zahl, 1963). Fortunately, for a restricted set of problems, one can sequentially construct an optimal plan by always choosing to search the next cell that maximises the ratio of the next increment in probability of detection to the next increment in cost (Benkoski et al., 1991). This property, whereby a locally optimal plan is also globally optimal at any point in time, is termed "uniformly optimal" (Stone, 1989). For basic discrete search problems maximising the probability of detection, such a locally optimal strategy is also uniformly optimal whenever the detection function is concave and continuous (Stone, 1989). For example, a problem in which the ratio of the gain in probability of detection versus the increase in total cost (the marginal rate of return) does not grow with each search of a cell can be addressed with the locally optimal approach. This includes the typical problem where a fixed cost is incurred for the search of each cell and an exponential detection function is used. In the related case where the objective is to minimise the expected cost, Stone (1989) also showed a locally optimal plan to be uniformly optimal as long as the marginal rate of return does not increase. On the other hand, search problems in which a "switch" cost is incurred whenever the searcher chooses a different cell (Gilbert, 1959; Kisi, 1965; Onaga, 1971; Lössner and Wegener, 1982) do not enjoy the same uniform optimality property. As will be further discussed in Chapters 3 and 4, respectively, enumerative approaches such as dynamic programming and branch and bound are then required.
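A minimal sketch of this locally optimal (greedy) construction, assuming repeated independent glimpses with fixed per-look costs; the function and variable names are illustrative, not taken from the cited works:

```python
def greedy_plan(prior, glimpse, cost, budget):
    """Sequentially pick the cell whose next look maximises
    (increment in detection probability) / (increment in cost).
    Uniformly optimal for concave detection functions, e.g. repeated
    independent glimpses of a stationary target."""
    undetected = list(prior)   # P(target in cell i and not yet detected)
    plan, spent = [], 0.0
    while True:
        # One more look at cell i raises P(detection) by undetected[i] * glimpse[i].
        best = max(range(len(prior)),
                   key=lambda i: undetected[i] * glimpse[i] / cost[i])
        if spent + cost[best] > budget:
            return plan
        plan.append(best)
        spent += cost[best]
        undetected[best] *= 1.0 - glimpse[best]

# Example: three cells, identical sensors, budget of five unit-cost looks.
print(greedy_plan([0.5, 0.3, 0.2], [0.8, 0.8, 0.8], [1.0, 1.0, 1.0], 5.0))
```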
2.2.3 Searching for a Moving Target

2.2.3.1 Target Motion Models

In addition to the prior probability distribution of the possible target locations, a model of the target's likely motion over the time period of interest is central to determining the best course of action when searching for a moving target. Given that a target is assumed not to vary its actions during the search process, its motion may simply be described by its likely locations at each successive time step. For example, a submarine's possible movements through a set of cells can be characterised by a number of tracks listing the sequence of cells it will visit should that track hypothesis be correct (Hohzaki and Iida, 1997). Each track is typically defined as $\omega = (\omega_1, \ldots, \omega_T)$, where $\omega_i$ is the position of the target at time $i$ and $T$ is the total number of time steps available for search. Some researchers, such as Hohzaki and Iida (2001), also make use of target tracks unaccompanied by specific timing information.

It is typically the responsibility of the search planner to define the feasible tracks and assign the corresponding likelihood $p_\omega$ of each track being the actual path taken. Let $\nu(\omega, \psi)$ denote the probability of non-detection when the target uses track $\omega$ and the searcher follows a plan $\psi$. A basic optimisation task is then the choice of $\psi$ to minimise $\sum_\omega p_\omega \nu(\omega, \psi)$. Since a discrete problem with 9 cells and $T = 10$ time steps may contain up to $9^{10}$ possible target tracks, the number of tracks can become unmanageable even with modestly sized problems (Washburn, 1995). Not only is it difficult just to evaluate the objective function alone, but assigning specific probabilities for each and every track would also be a cumbersome process.

Many works therefore apply Markov assumptions to simplify the description of target motion. Instead of defining entire target tracks, Markov motion models assume that the likelihood of a target moving to another cell at a given time is independent of any prior action. In particular, such a motion model may be captured as a matrix $\Phi$ where element $\Phi(i, j, t, u)$ describes the probability that a target residing in cell $i$ at time $t$ will move to cell $j$ at a later time $u$, typically $u = t + 1$, irrespective of its history before time $t$. Chaining together the transition probabilities for $t = 1, \ldots, T-1$ then yields a distribution of the target location at each time step. For clarity of explanation, the rest of this thesis assumes the use of a time invariant motion model, and $\Phi$ is thus restated as an $N \times N$ matrix where $\Phi(i, j)$ holds the probability that the target in cell $i$ will move to cell $j$ at the next time step. The solution techniques to be developed in Chapters 4 and 5, however, can also be directly used with a target motion model that changes with time.

The assumption of independence from prior actions allows Markov motion models to avoid the need to define all feasible target paths, while preserving most of the ability to describe target motion. Further realism can be introduced by simply redefining each "cell" of the model to represent not just a single cell occupied by the target but also an associated velocity and acceleration profile (Washburn, 2002). Moors and Schulz (2006) followed a similar approach in expanding the above simple Markov model to also include an intended direction of target motion learnt offline using random particle models. At a small increase in computational cost, the resultant second order model led to a noticeably more human-like evolution of the target probability distribution over time through the search space. It is even possible in the limit to use each cell to represent an entire target track. This would, however, just recast the scenario as a stationary target search problem with an inordinately large number of cells (Washburn, 2002).
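A small sketch of such a Markov motion model and its propagation; the three-cell matrix values are invented for illustration:

```python
import numpy as np

# Hypothetical 3-cell environment. Phi[i, j] = P(target in cell i moves to
# cell j at the next time step); each row sums to 1.
Phi = np.array([[0.6, 0.3, 0.1],
                [0.2, 0.5, 0.3],
                [0.0, 0.4, 0.6]])

p = np.array([1.0, 0.0, 0.0])  # prior: target starts in cell 1 (index 0)

# With no searching, the location distribution evolves as p(t+1) = p(t) Phi.
# The N x N matrix replaces the enumeration of up to N^T candidate tracks.
for _ in range(5):
    p = p @ Phi
print(p)  # target location distribution after 5 time steps
```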
2.2.3.2 Moving Target Problems Described in Literature
It was not until the mid 1970s that a large number of researchers began to consider the one-sided optimal search for moving targets. The addition of an evolving target distribution greatly enlarges the state space and, as can be expected, simply maximising the detection probability at each individual time step does not guarantee a globally optimal allocation (Washburn, 1983). Consequently, only a very limited number (fewer than ten) of cells were considered in the early solutions to moving target search problems in discrete space and time.

In an important advance, Brown (1980) showed that a search plan for a continuous effort problem (using a regular detection function) is optimal only if the effort allocation at each time step t also maximises the detection probability for a linked stationary target problem. In particular, each cell's probability in the stationary problem is set to be equivalent to the joint probability of the target arriving at the cell at time t and not being detected at any other time. Practically, this observation allowed the optimal search plan for continuous effort moving target search problems to be found by solving a series of simpler stationary target sub-problems. This approach greatly simplified the solution process, in that techniques already developed for finding optimal continuous effort allocations for stationary target search problems could then be directly applied. This overall iterative approach is formalised in the FAB (forward and backward) algorithm (Brown, 1980), described in more detail in Appendix A, and has since become the basis of solutions to a number of other related works (Tierney and Kadane, 1983; Washburn, 1995; Hohzaki and Iida, 1997; Kunigami, 1997; Dambreville and Le Cadre, 2002; Dodin et al., 2007). Reflecting its more general use, Washburn (1983) extended the FAB algorithm to consider other payoff functions, including minimising the expected cost to find a target and maximising the reward in a multi-state survivor search (Discenza and Stone, 1981).

An interesting aspect of the FAB approach lies with the fact that it can quantify the maximum possible difference of any arbitrary plan's payoff from the truly optimal reward (Washburn, 1981); the iterative computation procedure may then be terminated early once a solution guaranteed to be sufficiently close to optimal is found. The same property also makes the technique amenable for estimating the usefulness of partially
enumerated plans in a branch and bound framework, as will be further outlined in Section 2.2.3.4.

Reflecting the similar difficulties with applying continuous effort methods to the discrete forms of the stationary target problem, the conditions under which the FAB algorithm generates an optimal solution are not always sufficient when the effort to be allocated is discrete. In particular, Washburn (1983) showed that, unlike for the continuous effort case, critical discrete search plans found by the algorithm are not necessarily optimal. Of particular relevance to the problems considered in this thesis, the discrete Optimal Searcher Path problem (OSP) (Stewart, 1979; Eagle, 1984; Eagle and Yee, 1990; Dell et al., 1996; Hohzaki and Iida, 1997; Washburn, 1998) in the literature further restricts the cells which can be searched at each time. Aimed at modelling scenarios where the search effort is constrained to follow a path, if one cell is searched at a time interval t, then the effort can only be redeployed to a neighbouring cell at time t + 1. Due to this search effort constraint, the problem is known to be NP-complete when maximising the probability of detection, and at least NP-hard when the objective is to minimise the expected detection time (Trummel and Weisinger, 1986). The following section describes the discrete effort form of the OSP problem in more detail.
2.2.3.3 The Optimal Searcher Path Problem (OSP)
The searcher and target move through an environment divided into a finite set of cells C = {1,..., N} (see Figure 2.2). The target occupies one cell at a time and moves according to a specified Markov model at each time step; a matrix Φ describes the probability that a target will move from any of the cells to another at the next time step. As an example, setting Φ(1,1) = 0.6, Φ(1,2) = 0.3 and Φ(1,3) = 0.1 indicates that a target known to be in cell 1 will move to cell 2 with a 30% probability, move to cell 3 with a 10% probability, and has a 60% probability of remaining in its current cell. An initial probability distribution p(·,1) = [p(1,1), p(2,1),..., p(N,1)] of where the target could be at time step 1 is supplied, where p(i,t) is the probability that the target is in cell i at time t without being detected by any searches before t. In the absence of searchers, the distribution evolves according to the formula p(·,t+1) = p(·,t)Φ.
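To make this evolution concrete, the following minimal Python sketch chains the transition matrix over a few time steps. The first row of Φ uses the example values quoted above; the remaining rows and the initial distribution are assumed purely for illustration.

```python
import numpy as np

# Markov motion model for a hypothetical 3-cell environment. Row i holds the
# probabilities of moving from cell i to each cell j; the first row uses the
# values quoted above, the other rows are assumed purely for illustration.
Phi = np.array([[0.6, 0.3, 0.1],
                [0.2, 0.7, 0.1],
                [0.1, 0.2, 0.7]])

# Initial target distribution p(., 1): target known to start in cell 1.
p = np.array([1.0, 0.0, 0.0])

# Chain the transitions: p(., t+1) = p(., t) @ Phi.
for t in range(1, 5):
    p = p @ Phi
    print(f"t = {t + 1}: p = {np.round(p, 4)}")
```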
Figure 2.2 Example OSP search grid and equivalent graph. Edges show valid searcher transitions
Target detection is modelled as follows: if both the searcher and target are in cell i during time t, detection occurs with a glimpse probability of g(i,t). This probability is assumed to be independent of past searches. As an example, if p(·,t) = [p(1,t),..., p(N,t)] and cell 1 is searched for one time step, the distribution at the next time step then becomes p(·,t+1) = [p(1,t)(1 − g(1,t)), p(2,t),..., p(N,t)]. This glimpse function may typically take the form of g(i,t) = 1 − e^(−α(i,t)) (Dell et al., 1996; Eagle and Yee, 1990), with α(i,t) ≥ 0 being a measure of search effectiveness for a given cell i. Any function 0 ≤ g(i,t) ≤ 1, however, can be used.

The searcher's path is constrained by the structure of the environment, with S(i), i ∈ C denoting the set of cells that a searcher can directly move to from cell i. In particular, if the searcher is in cell i at time t, it is only able to search a cell j ∈ S(i) at time t + 1. Given T time steps to find the target, let ξ be a valid search plan represented by a series of cells searched in one time unit increments, where ξ(t) denotes the cell inspected at time step t ∈ {1,...,T}. A searcher following plan ξ first moves to and searches cell ξ(1) for one time period, then travels to search cell ξ(2) for another time step, and continues for the remaining cells until T time periods in total have been expended. For convenience, ξ(0) denotes the searcher's given initial position prior to the first search. Taking into account the target motion and the effects of previous cell searches, the undetected target probability mass in each cell at each time period can accordingly be determined by:
p(·,t+1) = p(·,t) M_{ξ(t),t} Φ,  1 ≤ t ≤ T        (2.1)

where M_{ξ(t),t} is an N × N identity matrix with the ξ(t)-th diagonal element set to 1 − g(ξ(t),t), and Φ is the target motion model matrix. The objective for the Optimal Searcher Path problem (OSP) is to find the search plan ξ that maximises the cumulative probability of detection PD(ξ) within the T time steps, which can then be stated as:

max_ξ PD(ξ) = Σ_{t=1}^{T} p(ξ(t),t) g(ξ(t),t)        (2.2)

subject to:

ξ(t+1) ∈ S(ξ(t)),  t = 0,...,T−1        (2.3)

The glimpse functions g(ξ(t),t) are given and the undetected target probability p(ξ(t),t) can be obtained using equation (2.1). There exist alternative formulations in the literature for the OSP problem (Thomas and Eagle, 1995; Washburn, 1995), including a variant in which the objective is to maximise an expected reward accounting for both the utility of finding the target and the cost of sensing (Hohzaki and Iida, 1997). The formulation shown above, similar to that used in Dell et al. (1996), is chosen here to make clear a generalisation of the problem in Chapter 4. Trummel and Weisinger (1986) also defined a form of the problem that seeks instead to minimise the expected time to detection given an infinite time horizon. The work in Chapter 3 can be seen to be related to this form.
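As a sketch of how equations (2.1) and (2.2) combine, the Python fragment below evaluates PD(ξ) for a supplied plan. The motion model, glimpse probabilities and plan are assumed example values, and feasibility of the plan under constraint (2.3) is taken for granted.

```python
import numpy as np

def detection_probability(plan, p1, Phi, g):
    """Evaluate PD(xi) of equation (2.2) for a given search plan.

    plan : cells xi(1),...,xi(T) searched at each step (0-based indices)
    p1   : initial undetected target distribution p(., 1)
    Phi  : N x N Markov motion model
    g    : g[i, t], glimpse probability for cell i at time t
    """
    p, pd = p1.copy(), 0.0
    for t, cell in enumerate(plan):
        # Payoff collected at time t: undetected mass in the searched cell
        # times the glimpse probability.
        pd += p[cell] * g[cell, t]
        # Equation (2.1): remove the detected mass, then apply target motion.
        M = np.eye(len(p))
        M[cell, cell] = 1.0 - g[cell, t]
        p = p @ M @ Phi
    return pd

# Illustrative use with assumed numbers.
Phi = np.array([[0.6, 0.3, 0.1],
                [0.2, 0.7, 0.1],
                [0.1, 0.2, 0.7]])
p1 = np.array([0.5, 0.3, 0.2])
g = np.full((3, 4), 0.9)          # constant glimpse probability, T = 4
print(detection_probability([0, 1, 1, 2], p1, Phi, g))
```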
2.2.3.4 OSP Solution Methods
As discrete search problems are rendered non-convex by the implied integral search effort constraint, previous techniques that directly optimise the allocation of infinitely divisible effort, such as the FAB algorithm, do not necessarily converge to optimal discrete effort solutions. The need for the searcher to follow constrained paths in the case of the OSP problem also presents additional challenges, thus further favouring the use of enumerative methods.

Eagle (1984) formulated the OSP problem as a partially observable Markov decision process (POMDP) and employed dynamic programming to maximise the probability of detection. In order to manage the potential size of the solution space, the proposed technique relied heavily on a dominance checking method to first eliminate states that clearly do not belong in an optimal solution. Even with the use of this technique, only a small number of value iterations could be calculated for a problem
with nine cells before computer memory was exhausted. The use of dynamic programming for the OSP problem is therefore limited by computation speed and memory requirements to very small problem instances.
Figure 2.3 Enumeration of possible searcher paths for map in Figure 2.2. Each sequence of nodes describes a different feasible path.
Under an alternative approach, Stewart (1979) proposed a branch and bound framework that finds the best search path by implicitly enumerating all the feasible paths for the searcher. The key to the approach lies in explicitly examining only a small subset of the possible paths, safe in the guarantee that the remainder cannot possibly be optimal. Various forms of branch and bound have since featured prominently in the OSP literature (Eagle and Yee, 1990; Martins, 1993; Dell et al., 1996; Hohzaki and Iida, 1997; Washburn, 1998).

Such branch and bound approaches typically enumerate feasible paths in a depth-first manner, beginning with the starting cell occupied by the searcher (Figure 2.3). As the process branches out to consider each of the cells that the searcher can possibly search at the subsequent time step, an upper bound of the best payoff (in terms of the probability of detection or an expected reward) that can be achieved if the searcher does indeed choose to go to this cell next is estimated. Should the estimate not exceed the best known reward thus far, meaning that no further expansion of that branch can possibly yield a better solution than what is already known, then the entire subspace of related paths can be safely discarded (Figure 2.4). As a result, a branch and bound approach stands to find the optimal solution significantly more quickly than the worst case of exhaustive search, provided reasonably accurate bounds can be quickly computed (a sketch of this pruning scheme follows Figure 2.4).
Figure 2.4 Implicit enumeration of searcher paths for map in Figure 2.2. If searching cell 2 at time 1 cannot possibly lead to an optimal solution, then all subsequent path extensions beyond that point can be ignored.
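The pruning logic can be sketched as follows. The bound used here is a deliberately naive relaxation in which the searcher may inspect every cell simultaneously at each remaining step; this is valid (it can only detect more than any single path) but much weaker than the MEAN, PROP or FABC bounds discussed below, and the instance data are assumed for illustration only.

```python
import numpy as np

# Assumed toy instance: 3 cells in a line (0 - 1 - 2), T = 4 searches.
N, T = 3, 4
neighbours = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}
Phi = np.array([[0.6, 0.3, 0.1],
                [0.2, 0.7, 0.1],
                [0.1, 0.2, 0.7]])
g = np.full((N, T), 0.9)              # assumed constant glimpse probabilities
best_pd, best_plan = -1.0, None

def relaxed_bound(p, t):
    """Upper bound on the payoff still collectable after time t, obtained by
    letting a relaxed searcher inspect every cell simultaneously at each
    remaining step (a naive stand-in for the literature bounds below)."""
    bound, q = 0.0, p.copy()
    for s in range(t, T):
        gains = q * g[:, s]
        bound += gains.sum()
        q = (q - gains) @ Phi
    return bound

def branch_and_bound(cell, p, t, collected, plan):
    """Depth-first implicit enumeration of searcher paths (Figure 2.4)."""
    global best_pd, best_plan
    if t == T:
        if collected > best_pd:
            best_pd, best_plan = collected, plan
        return
    if collected + relaxed_bound(p, t) <= best_pd:
        return                        # prune: subtree cannot beat incumbent
    for nxt in neighbours[cell]:      # path constraint (2.3)
        gain = p[nxt] * g[nxt, t]
        M = np.eye(N)
        M[nxt, nxt] = 1.0 - g[nxt, t]
        branch_and_bound(nxt, p @ M @ Phi, t + 1, collected + gain, plan + [nxt])

branch_and_bound(0, np.array([0.5, 0.3, 0.2]), 0, 0.0, [])
print(best_pd, best_plan)
```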
The calculation of such bounds commonly involves solving a simplified problem in which one or more constraints of the original OSP problem are relaxed. Stewart (1979) simplified the path constraints such that a discrete searcher can visit any cell reachable from the starting location after the elapsed number of time steps, even if that particular cell is not directly connected to the last cell searched. Although a problem with this distribution of effort (DOE) relaxation can be promptly solved with a discrete version of the FAB algorithm (Brown, 1980), the resultant solutions are not guaranteed to be optimal for the relaxed problem and therefore only give rise to heuristic bounds. In contrast, Washburn (1995) removed the discrete search effort assumption from the problem, such that the searcher is allowed to be in multiple cells at the same time. The relaxed problem is then a convex continuous effort problem, for which the FAB algorithm is able to generate a true and tight bound; the FABC bound operates in this manner, using the payoff from just the first iteration of the FAB algorithm. FABC is, nevertheless, the most computationally intensive of the OSP bounds proposed in the literature. A similar approach is taken by Eagle and Yee (1990), who also solved a relaxed problem made convex by allowing continuous allocation of search effort.

Eschewing bound sharpness for calculation speed, the MEAN method of Martins (1993) made linear relaxations to the OSP problem through maximising the expected number of detections. In particular, the original search problem is transformed into a longest path problem in which both the searcher indivisibility and path constraints are still preserved. When these path constraints are relaxed to form a reward collection problem, the more easily evaluated bound PROP (Washburn, 1998) is obtained. Lastly, ERGO2 (Washburn, 1998) estimates bounds with even less computation by directly using a stationary target distribution to calculate the corresponding rewards, instead of computing the actual distributions at each time. Washburn (1995, 1998) reviews and compares a number of bounding techniques in the literature for the OSP problem, including the above, which will be further discussed in Section 4.5.1. This thesis provides an improved version of the MEAN method, Discounted MEAN (DMEAN), which produces much sharper bound values with almost no increase in computational cost and outperforms other bounding methods proposed in the literature. An overview of the OSP bound methods and a more detailed description of the PROP and FABC bounds can be found in Appendix A.
2.2.4 Extensions to the Detection Search Problem
Generalized Search Optimization (GSO) was defined by Stone (1984) to denote the techniques (Stromquist and Stone, 1981), including FAB, that address a range of continuous effort search problems. Modelling the idea that a victim's health may deteriorate with time, the target in the survivor search problem of Discenza and Stone (1981) undergoes not only changes in motion but additionally (irreversible) changes in state. The objective was then to maximise the probability of finding the target alive by a given time. Similarly, another multi-state problem, the defensive search, involved finding an attacker before its weapon is launched (Stone, 1984). Beyond maximising detection probability in T time periods, Kadane (1983) considered a whereabouts search problem in which a searcher can additionally guess the target's location after the final time step.

Surveillance and counter-smuggling scenarios also present particular challenges, since the first target detection in such applications might not necessitate the end of the search process. The Generalised Surveillance Search Problem (Tierney and Kadane, 1983), which can assign payoffs to additional detections after the first event, was subsequently proposed to incorporate the whereabouts, surveillance and standard detection search problems in a common formulation. The framework's ability to maximise rewards beyond the first target detection raises interesting possibilities and will be further discussed in Chapter 5. Moving beyond just searching for the target itself, Stewart (1985) outlined heuristics for a problem in which target trails can also be used as a source of positive information about where the target might be.

More recently, Dambreville and Le Cadre (2002) explored the management of mixed search resources, such as radar and sonar, which can be re-used after a number of time steps subject to renewal constraints. As the deployment of a resource at one time stands to affect its future availability at another, effort allocation takes place not only across space but also time. To this end, Brown's FAB algorithm was modified to first divide the global amount of resources into optimal search effort quotas for each time interval. Dodin et al. (2007) used the discrete moving target problem framework to model a radar acquisition task, and employed branch and bound to find the best pattern to acquire a ballistic target with a narrow-beam tracking radar.
2.3 Autonomous Searching and Related Work
This section discusses literature on search problems more directly set in the context of robotics. In general, the problems considered tend to incorporate a wider variety of application-specific concerns, reflecting the detailed needs of the individual scenarios addressed. In contrast to the solutions available in the classical search theory literature, the emphasis is often placed on quickly obtaining reasonable sub-optimal solutions.
2.3.1 Single Searcher Problems
In a variation of the stationary target search problem discussed in Section 2.2.2, DasGupta et al. (2006) investigated the search for a static “honey-pot” hidden in a bounded region with internal walls. The searcher is only able to use local sensory information, such that its circular sensing radius forms a “cookie-cutter” footprint in the target density distribution as it travels. Although the problem was defined in the continuous domain, the solution approach consisted of partitioning the area into smaller connected regions, solving a discrete problem similar to the OSP problem (Section 2.2.3.3) and then fitting the discrete plan back into a viable continuous path. The complexity of the discrete problem was managed by first aggregating the regions into a sufficiently manageable number. A polynomial time approximation to the NP-hard problem was also provided, along with bounds that gauge the loss in optimality due to both the use of this heuristic and the process of discretising and refining the path itself.
Similar to the problem to be addressed in Chapter 3, Sarmiento et al. (2003) minimised the expected time to find a target inside a polygon that may contain holes. Since a uniform target distribution was assumed, the problem considered could also be seen as an exploration task. Instead of continuously searching through the area, it was assumed that the searcher senses only when located at specific points in the map. Although the choice of such locations was not addressed, reasonable suggestions were provided, including the guard positions from the solution of a corresponding art gallery problem (Chvátal, 1975) or similarly points on a watchman path (Chin and Ntafos, 1986). A branch and bound approach was sketched for finding the best sequence of locations to visit. Due to the complexity of the NP-hard problem, a greedy algorithm was proposed such that each location to be visited next maximises the ratio of the increase in detection probability to the increase in cost. The authors subsequently extended the problem to incorporate an arbitrary target probability distribution (Sarmiento et al., 2004) and proposed an extended two-layered approach. While the top level determined an efficient ordering of regions as before, a new lower level joins them together using locally optimal (in terms of visibility) trajectories.

Bourgault et al. (2003a) presented a Bayesian approach to model the search for a stationary or drifting target at sea, principally with the objective of maximising the probability of detection within a given time. The search environment is discretised into a large grid of cells, over which a target probability density function is defined. This function is defined a priori with available information, and updated with a process model that accounts for wind, current and other factors. Similarly, a distance-based observation model maps the position of the airborne searcher, defined in the continuous space, to the likelihood of detecting the target in each of the cells. Updating the probability distribution with this model then provides a posterior accounting for the effects of search. Due to the large number of cells involved, the trajectories were calculated using one-step look-ahead.
2.3.2 Multiple Searcher Problems The search of an outdoor area with Unmanned Aerial Vehicles (UAVs) has provided a popular multiple searcher application. Beyond the aforementioned issues associated with directly finding the target, the efficient sharing and fusion of information between platforms under communication constraints is also the subject of significant interest. Multi-agent information sharing is, however, beyond the scope of
this thesis and will not be discussed in detail. Other issues introduced by the use of multiple searchers include the need to avoid collisions, stay in communication range, or for multiple vehicles to simultaneously respond to a target.

Polycarpou et al. (2001) outlined a framework for developing and evaluating strategies for coordinating the search and engagement in a dynamic target environment. A multi-objective cost function weighing the different competing needs of the searchers was solved using recursive q-step planning. Carrying the work forward, Flint et al. (2003) maximised the expected number of targets detected within a given time horizon in a risky environment with threats using a dynamic programming approach. The target information was presented using probabilistic maps, and the eventuality that searchers could also be destroyed was considered. More recently, Liao et al. (2005) considered a search and response task using platforms with limited communication, focusing on information sharing and information fusing policies. Jin et al. (2006) also addressed a search and response task with a heterogeneous team, whereby the trade-off between searching and target engagement is evaluated with respect to mission performance. Beard and McLain (2003) examined a multi-target scenario where the searchers additionally have to avoid colliding with each other and yet not stray beyond communication range. An optimal dynamic programming approach for the NP-hard problem was presented along with two heuristics: one approach myopically planned for one vehicle at a time, while the other planned with some consideration of the other vehicles.

A number of works dealt with problems closer in form to the classical discrete search problems discussed in Section 2.2.3. Dell et al. (1996) examined a multi-searcher version of the OSP problem and compared the use of branch and bound, rolling horizon branch and bound, genetic algorithms, simple hill climbing, as well as two heuristics based on maximising the expected number of detections. Due to the high complexity of the problem, optimal paths maximising the probability of detection could only be found for at most two searchers. Ogras et al. (2004) used the multi-searcher OSP problem as a basis for a hierarchical approach where the cell-level paths obtained from the problem are then translated into robot steering directions. Although the cited example searches for multiple stationary targets, the problem objective of maximising the probability of detection (of at least one target) is cast identically as that for the single-target OSP described in Section 2.2.3.3. Two heuristics maximising a rate of return (ROR) measure instead of the probability of detection are given, one of which first aggregates the
searchers into groups to simplify planning. In operation, the searchers communicate to maintain a common target probability distribution and replan after each goal cell is searched. Hollinger et al. (2007a, 2007b) examined the problem of locating a non-adversarial target with multiple searchers in indoor environments, with the aim of minimising the expected time to capture. The probability of not detecting the target, the inverse of the rate of return, as well as the resultant entropy from searching each region were used as one-step heuristic cost functions to guide the searchers' actions. A decentralised planning algorithm in which each searcher plans as if the states of the others are fixed was proposed. Sujit and Ghose (2004) considered the case where the environment is divided into a regular grid of cells and the UAVs, with endurance constraints, must return regularly to a base station. Routes were planned heuristically using a k-shortest path algorithm, under a simplifying assumption of not updating the target information for the effects of searching during each particular sortie. Mission performance under differing assumptions of information sharing between platforms was also compared. A subsequent work (Sujit and Ghose, 2006) similarly explored the effects of applying market-based techniques.

Bringing together a decentralised Bayesian data fusion (DDF) technique and a decentralised coordinated control scheme originally proposed in Grocholsky (2002), Bourgault et al. (2003b, 2004a) extended an earlier single-searcher framework (2003a) to coordinate the task of searching at sea with multiple vehicles. The vehicles are viewed as nodes in a decentralised Bayesian sensor network, where a channel filter (Bourgault and Durrant-Whyte, 2004) maintains a common picture of the target probability density function. Scalability is achieved through each vehicle planning only with respect to the locally available PDF, although it is also possible to improve on the common utility between vehicles through further negotiation (Bourgault et al., 2004b). Further work in Wong et al. (2005) extended the approach to search for multiple targets. Overcoming a limitation of earlier techniques, where the target probability around a vehicle might not have a sufficient gradient to guide its direction of motion, a dual-objective switching function was introduced such that the vehicles move towards the mode of the nearest target PDF. Mathews and Durrant-Whyte (2007) identified the different information requirements of a multi-vehicle information gathering system that would allow a common team objective to be decentrally optimised. A scalable cooperative control algorithm was proposed that could also facilitate the negotiated solution of the multi-vehicle maritime search problem
considered in Bourgault et al. (2004b). Moving beyond optimising for searching alone, Furukawa et al. (2006) used the Bayesian framework for scenarios where some of the vehicles need to track the targets even after they have been initially found.
2.3.3 Searching in Structured Environments
While much of the research described above is mainly aimed at addressing outdoor search scenarios, some of the work (Sarmiento et al., 2003, 2004; Ogras et al., 2004; DasGupta et al., 2006) is also amenable to the search of indoor or structured areas. The Bayesian search framework of Bourgault et al. (2003a), for instance, can be coupled with a process model that accounts for a target's probable motion in an environment divided by walls (Bourgault et al., 2004c). It is unclear, however, whether the same short-horizon planning methods would result in paths that are as effective as for the outdoor case. Recognising that the Brownian motion of traditional simple Markov models can produce unrealistic target motion in a closed environment, the approach of Moors and Schulz (2006) used a second-order model trained offline using random particles. Models were developed using a training process in which each particle randomly chooses a waypoint in the environment and moves along a planned path before choosing another waypoint. All the particle tracks are then summarised in an expanded second-order motion model that accounts for not only the previous location of the target but also its intended direction. This model therefore allows existing problems that assume Markov target motion to plan for a more realistic target indoors.

Beyond obtaining an effective target motion model, the nature and representation of the search space also play an important role in shaping the overall outcome. While the search spaces of discrete search problems all essentially consist of graphs with connected nodes, the choice of what the nodes physically represent can vary. As with other path planning problems, a primary consideration involves employing a graph that adequately models the environment, for the purposes of the application, while ensuring that the resultant complexity remains manageable. At one end of the scale, regular grids or points can finely represent spaces in open environments (Bourgault et al., 2003a), structured environments (Moors and Schulz, 2006), as well as unstructured environments cluttered by obstacles (Jung, 2005). However, as suggested in Chapter 1, the introduction of the many nodes also imposes a high complexity penalty on the path optimisations that can be applied. Region-based approaches (Jung and Sukhatme, 2004; Hollinger et al., 2007a) seek to overcome this limitation by exploiting the structure inherent in indoor environments and representing an environment as a set of connected topological regions, in a manner analogous to the use of navigation meshes in the general path planning literature. Representation of the minute elements in each region, which is unnecessary in tracking and searching applications where agents have a much larger sensing footprint, is relinquished in return for the ability to plan more sensibly over a longer time horizon. This thesis also similarly exploits the structure of such environments.

Jung (2005) examined the relationship between region-based search and environmental complexity using maps with different levels of obstruction. Open areas were found to be more conducive to greedy planning approaches than region-based planning, since the agents can in reality sense beyond the boundary of any artificially imposed regions. On the other hand, defining regions in more obstructed structured environments enabled region-based methods to better relocate the agents to improve target visibility. Of practical interest is the author's recommendation that the size of the regions should be chosen based on the complexity of the environment.

The work in this thesis and all the literature discussed thus far assume that a sufficiently accurate map of the environment is available to the searchers. This is however not always possible for Urban Search and Rescue applications that take place in heavily damaged areas. The RoboCupRescue competition (Tadokoro, 2002) provides one such simulated example, where the robots must search for victims through areas in various stages of collapse. The principal tasks addressed in these cases can therefore more often resemble map building and exploration.
2.4 Summary
This chapter introduced a number of search problems described in the literature and discussed the techniques used to address them. At one end of the spectrum, search plans for finding a stationary target under continuous effort assumptions can be found using Lagrange multiplier techniques. Discretely searching for a moving target under additional path constraints, on the other hand, constitutes an NP-hard problem. Existing robotic search problems are therefore typically solved using suboptimal or short-horizon methods. Due to the increasing availability of cheap computing power and the ability to divide structured environments naturally into a smaller number of constituent regions, there is scope to optimally plan for the entire duration of the search on the level of the constituent regions inspected. As discussed in this chapter, there exist techniques in the classical search literature that compute long-term search plans for uniform cells under the assumption that effort can be instantly relocated at each time step. The focus of this thesis is on extending these techniques to find longer-term optimal plans for the search of regions in a structured environment that are not necessarily uniform.

Chapter 3 considers the search for a stationary target in an environment where the searcher must additionally spend some time to move from one region to another. Chapter 4 extends the Optimal Searcher Path problem to model the search for a moving target, first in environments where the searcher must spend time moving between widely separated regions but cannot detect the target during travel, and secondly for open indoor areas where the searcher must spend some time transiting through an intervening third region. Building on the OSP problem, Chapter 5 considers searching by a heterogeneous team consisting of searchers and scouts. Detection of the target by a scout does not terminate the search process but instead improves on the target information available to the team. The optimisation task addressed involves balancing between maximising the probability of the searchers directly finding the target and ensuring that they can respond effectively to possible detections by scouts.
3 Search for a Stationary Target

3.1 Introduction
This chapter considers the search for a single stationary target in a known environment that can be described by a set of connected regions. A search strategy that minimises the expected time to detection based on available target information is presented. In contrast to much of the existing literature, the proposed algorithm allows the search times of individual regions and the travel times between them to be arbitrarily specified. The stationary target search problem is defined in detail in Section 3.2. A solution method based on Dynamic Programming is presented in Section 3.3. An extension to cater for the search of multiple targets is outlined in Section 3.4 and simulation results are shown in Section 3.5. Section 3.6 discusses the problem in the context of related stationary target search problems while Section 3.7 summarises the work in this chapter.
3.2 Problem Description
Searching looks for objects of interest – targets. A general problem where a single searcher is looking for a target in a known environment can be described as follows. Given knowledge of:
- the environment structure,
- the searcher capability, and
- a priori target information,
find a search strategy such that the search efficiency is maximised. The following defines the environment structure, searcher capability, target information and search efficiency for the problem addressed in this chapter.
Figure 3.1 An environment decomposed into a set of regions. The regions are shown with different undetected target probability (shading), search time (number) and travel times (arrow weight).
3.2.1 Environment Structure
Typical indoor environments are composed of connected regions (floors, rooms, halls or corridors). It is assumed that the environment can be decomposed into a set of simply connected regions, as shown in Figure 3.1, where two regions are only linked if a searcher can physically travel directly from one to another. The regions are defined as non-overlapping areas in which a searcher can guarantee a target's absence after it has been searched for a requisite amount of time; convex regions will be used in subsequent examples for simplicity but are not strictly required. The distances between each pair of connected regions are assumed to be known, along with the size (and structure) of each region. It is assumed that a map of the environment is available such that the searcher can self-localise and move between regions as required. Given the available information, the time necessary for a searcher to effectively search each region and the minimum time needed to move from a region to each of its immediate neighbours can also be easily computed in advance.

A topological map can be used to describe the environment (Figure 3.1). The complete search area is partitioned into a weighted undirected graph G(N, E), where each of the n = |N| nodes denotes a region i that requires T_i time steps to search. An edge from node i to j exists if a searcher can travel from region i directly to region j; the weight W_ij denotes the corresponding travel time required. The set of all the adjoining nodes to region i is denoted by S(i). Typical indoor environments are not fully connected and therefore S(i) would usually be a small subset of N. Figure 3.3 illustrates an example discretisation of an office area into one such map.
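As an illustration, such a topological map is naturally captured by a pair of lookup tables. The region identifiers, search times T_i and travel times W_ij below are assumed values, not those of Figure 3.1.

```python
# Topological map of an assumed five-region environment (illustrative values,
# not those of Figure 3.1). search_time[i] is T_i; travel_time[(i, j)] is W_ij.
search_time = {1: 2, 2: 1, 3: 2, 4: 5, 5: 3}
travel_time = {(1, 2): 2, (2, 3): 1, (2, 4): 2, (4, 5): 2}

# Store both directions of each undirected edge.
travel_time.update({(j, i): w for (i, j), w in list(travel_time.items())})

def S(i):
    """The set S(i) of regions adjoining region i."""
    return sorted({j for (a, j) in travel_time if a == i})

print(S(2))   # -> [1, 3, 4]
```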
3.2.2 Searcher Capability
On arrival in a region i, a searcher can take one of the following two actions:
1. search the region i, or
2. move from region i to an adjoining region j ∈ S(i) without searching.
Searching the region i requires T_i ≥ 1 time steps and will detect the target if it is actually present in the region. x(t) denotes the searcher's location at time t, which is updated for each action respectively as follows:
1. When searching region i:

x(t) = i ⟹ x(t + T_i) = i        (3.1)

2. When moving from region i to j ∈ S(i):

x(t) = i ⟹ x(t + W_ij) = j        (3.2)
3.2.3 Target Information
A discrete probability distribution p describes the probability that an undetected target resides in each region (this distribution is normalised to sum to one whilst the target remains undetected, and is different from the target distribution p for the Optimal Searcher Path (OSP) problem defined in Section 2.2.3.3). If there is no prior knowledge, a uniform distribution may be used to recognise that the target is just as likely to be in one region as in any other. Alternatively, the probability of the target being in each region may be assigned proportional to the size of the region concerned. At time t, the probability of an undetected target being in region i is given by:

p(i,t), i = 1,..., n.        (3.3)

Note that:

Σ_{i=1}^{n} p(i,t) = 1        (3.4)

except at the termination of the search when the target has been found. In this case:

p(i,t) = 0, i = 1,..., n.        (3.5)

Consider the actions described by (3.1) and (3.2). If the searcher at time t chooses to move from region x(t) ∈ {1,..., n} to a neighbouring region j, then the known probability distribution of the target stays unchanged until time t + W_{x(t)j}. On the other hand, if the searcher chooses to search region x(t) and does not find the target there, then the target probability mass can be redistributed amongst the other regions as follows:

p(x(t), t + T_{x(t)}) = 0;  p(i, t + T_{x(t)}) = p(i,t) / (1 − p(x(t),t)), i ≠ x(t)        (3.6)

Equation (3.6) updates p to reflect the fact that the target cannot then be in region x(t) and normalises the probability distribution such that the undetected target probability still sums to one over all the regions. Alternatively, should the searcher succeed in finding the target, the distribution is then set to (3.5) as previously noted and the process terminates.
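Update (3.6) transcribes directly into code. The sketch below assumes the distribution is held as a dictionary keyed by region.

```python
def update_after_unsuccessful_search(p, searched):
    """Equation (3.6): zero the searched region's probability mass and
    renormalise so the undetected-target distribution again sums to one.
    p is a dict {region: probability}; 'searched' is the region x(t)."""
    remaining = 1.0 - p[searched]
    assert remaining > 0.0, "the searched region held all the probability mass"
    return {i: (0.0 if i == searched else q / remaining) for i, q in p.items()}

p = {1: 0.4, 2: 0.35, 3: 0.25}
print(update_after_unsuccessful_search(p, 1))
# -> {1: 0.0, 2: 0.5833..., 3: 0.4166...}
```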
3.2.4 Search Efficiency
When looking for a single target, two typical measures can be used for search efficiency, namely (i) the expected time needed to detect the target, and (ii) the probability of detection within a given time window. Although the precise target location is not known, the searcher in this problem will always find the stationary target as long as it spends the time to inspect all the regions. Minimising the expected time to target detection is therefore chosen as the objective for the problem considered in this chapter.
3.2.5 Discrete Time Formulation of the Search Problem
Let u(1), u(2), u(3),... be a sequence of actions chosen by the searcher at time periods τ(1), τ(2), τ(3),..., respectively. In particular, an action u(k) ∈ {s} ∪ S(x(τ(k))) represents either a decision (denoted by 's') for the searcher at time τ(k) to search its current region x(τ(k)), or specifies a neighbouring region j ∈ S(x(τ(k))) for the searcher to move on to. The optimum search problem can now be written as: given a map of the environment (such as Figure 3.1), the initial searcher location x(1) and an initial target probability distribution p(1,1),..., p(n,1), decide u, a sequence of actions that minimises the expected time to find the target.

Because there can be an infinite number of control sequences, it is in general not possible to compute this minimal value directly. This search problem, however, is subject to a number of simplifying constraints due to the fact that:
- the searcher should not visit regions in which the probability of finding the target is zero,
- perfect detection implies that each region only needs to be searched once, and
- an optimal searcher should always travel from one region to another via the shortest possible route.
Computing the optimal actions u* is therefore equivalent to finding the order in which the regions are visited such that the expected time to detection is minimised.
Let π represent a particular sequence that describes the order in which the regions are to be visited, which differs from the more detailed action sequence u defined above. The expected target detection time for a given sequence π can be computed by:

T_π(1) p(π(1),1) + T_π(2) p(π(2),1) + ... + T_π(n) p(π(n),1)        (3.7)

where T_π(i) denotes the earliest possible time for region π(i) to be fully searched when the sequence π is followed. Formally, finding the sequence of actions to find a stationary target can be described as the following equivalent ordering problem:

min_π Σ_{i=1}^{n_0} T_π(i) p(π(i),1)        (3.8)

subject to:

π(i) ∈ N_0, i = 1,..., n_0        (3.9)
π(i) ≠ π(j), i, j = 1,..., n_0, i ≠ j        (3.10)

where N_0 = {j : p(j,1) > 0} is the subset of regions that can possibly contain the target and n_0 = |N_0| is the number of such regions.
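Objective (3.8) can be evaluated directly for any candidate ordering. The sketch below assumes a table R[i][j] of minimum travel-plus-search times (formally introduced in Section 3.3.3) and illustrative values for a two-region instance.

```python
def expected_detection_time(order, p1, R, start):
    """Equation (3.7): sum of T_pi(i) * p(pi(i), 1) over the visit order.
    R[i][j] is the minimum time for a searcher at region i to travel to
    and complete the search of region j (see Section 3.3.3)."""
    total, elapsed, here = 0.0, 0.0, start
    for region in order:
        elapsed += R[here][region]   # earliest completion time T_pi(i)
        total += elapsed * p1[region]
        here = region
    return total

# Assumed two-region illustration: searcher starts at a depot region 0.
R = {0: {1: 3, 2: 5}, 1: {1: 2, 2: 4}, 2: {1: 4, 2: 3}}
p1 = {1: 0.7, 2: 0.3}
print(expected_detection_time([1, 2], p1, R, start=0))   # 3*0.7 + 7*0.3 = 4.2
```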
3.3 Approach
In this section, a method for obtaining the sequence of regions π* that minimises the expected time to detection for a single searcher using dynamic programming is presented. While the dynamic programming algorithm provides a provably optimal solution, it tends to be computationally expensive in general. However, it will be shown that the structure of the specific problem lends itself to an efficient implementation of this algorithm.

3.3.1 Value Function
Following a similar argument as in Section 2 of Lössner and Wegener (1982), it can be shown that the next optimal action for the searcher to take depends only on the current searcher location and the current probability distribution of the target. For any searcher location x ∈ {1, 2,..., n} and any feasible target probability distribution p_1,..., p_n, one may then define a value function V(x, p_1,..., p_n) as the minimum expected time to find the target, starting from the current time. The optimal search plan from this time onwards minimises this value function.
Define A(k), k = 1,..., n_0, as the set of k-combinations of N_0 and set A(0) = {x(1)}. Let a ∈ A(k), k = 0,..., n_0 hold a combination of the regions that a searcher could be in when it is known that k regions have just been searched. (For example, consider the case when N_0 = {1,2,3}. Then A(1) = {{1},{2},{3}}, A(2) = {{1,2},{1,3},{2,3}} and A(3) = {{1,2,3}}. A searcher that has just searched two of the three regions could have searched them in combinations of a = {1,2}, {1,3}, or {2,3}, and in each case the searcher may now be waiting in either one of the two respective regions, depending on the actual search order taken.) For convenience, let p(a), a ∈ A(k) map a combination a to the corresponding target probability distribution when the k regions specified have all been searched without success, and let p_y(a) refer to the target probability in region y in that case. p(a) is obtained by applying the update equation (3.6) to the initial probability distribution p(1,1),..., p(n,1) for each region in a. Note that p(a) = [0,...,0] for a ∈ A(n_0).

3.3.2 Dynamic Programming Equation
In general, if the searcher has already searched regions a and is now currently in region x, the next region y ∈ N_0 \ a it should inspect is the one that leads to the lowest value of R_xy + (1 − p_y(a)) V(y, p(a ∪ {y})), where R_xy denotes the shortest time needed for a searcher in region x to move to and search region y. Using the principle of optimality, the following Dynamic Programming Equation (DPE) is obtained for any x ∈ a, a ∈ A(k), k = 1,..., n_0:

V(x, p(a)) = min_{y ∈ N_0 \ a} [ R_xy + (1 − p_y(a)) V(y, p(a ∪ {y})) ]        (3.11)

In other words, the minimal expected time to detect the target when the regions in set a have already been searched and the searcher is in region x is at least the sum of the time R_xy needed to reach and search the next region y ∈ N_0 \ a, and the minimal expected detection time V(y, p(a ∪ {y})) multiplied by the probability 1 − p_y(a) of not finding the target in region y either. The value functions V(x, 0,...,0) = 0, x ∈ N_0 represent the boundary condition when the target has been found.
3.3.3 Dynamic Programming Algorithm
The optimal order in which the regions need to be searched can thus be found as follows:

Algorithm for the stationary target search problem:
1. Calculate the shortest travel times from the regions of i ∈ {x(1)} ∪ N_0 to each region j ∈ N_0, using, for example, the Floyd-Warshall algorithm (Cormen, 1990).
2. Add to this the search times T_j, j ∈ N_0 to form a table R_ij, i ∈ {x(1)} ∪ N_0, j ∈ N_0, denoting the minimum time for a searcher starting at region i to reach and complete the search of region j.
3. Set V(x, p(N_0)) = 0 for x ∈ N_0.
4. For k = n_0 − 1 down to 0 and for each x ∈ a, a ∈ A(k), compute:
   V(x, p(a)) = min_{y ∈ N_0 \ a} [ R_xy + (1 − p_y(a)) V(y, p(a ∪ {y})) ]

The optimal search sequence is obtained concurrently with the calculations in step 4. Accordingly, the best region for the searcher to first search is found through minimising V(x(1), p(1,1),..., p(n,1)). It is possible for more than one region to lead to the same minimum value, in which case an arbitrary one may be chosen. Repeating this for each subsequent step yields the entire optimal plan (a sketch implementation is given after the state count below).

The number of states for which the value function needs to be defined is Σ_{k=1}^{n_0} k·C(n_0, k) + 1, where C(n_0, k) is the number of combinations of k-element subsets drawn from n_0 elements, as illustrated below:
1. When k = n_0: A(n_0) contains the single possible combination of all the regions with non-zero target probability, while the searcher could actually be in one of the n_0 regions. The number of states is therefore n_0 · C(n_0, n_0) = n_0.
2. When k = n_0 − 1 (after n_0 − 1 regions have been searched): there are n_0 choices from the set A(n_0 − 1) and the searcher could be in one of n_0 − 1 positions for each. The number of states is thus (n_0 − 1) · C(n_0, n_0 − 1).
...
3. When k = 1: there are C(n_0, 1) choices for the set A(1) and 1 corresponding searcher position for each choice. So the number of states is C(n_0, 1) = n_0.
4. When k = 0: no regions have yet been searched. There is then just the single initial state.
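A compact sketch of steps 3 and 4 follows, memoising the value function over pairs of (set of regions already searched, searcher location) and recovering the optimal order alongside the values. The R table and probabilities are the assumed example values used earlier, not data from the thesis experiments.

```python
from functools import lru_cache

def optimal_search_order(N0, p1, R, start):
    """Dynamic programming over (regions already searched, searcher location)
    implementing equation (3.11). Returns (expected detection time, order)."""

    def p_y(a, y):
        # Undetected-target probability in region y after the regions in 'a'
        # have been searched without success (repeated updates via (3.6)).
        return p1[y] / (1.0 - sum(p1[i] for i in a))

    @lru_cache(maxsize=None)
    def V(a, x):
        remaining = sorted(N0 - a)
        if not remaining:              # boundary condition: target found
            return 0.0, ()
        best = None
        for y in remaining:
            tail_time, tail_order = V(a | {y}, y)
            value = R[x][y] + (1.0 - p_y(a, y)) * tail_time
            if best is None or value < best[0]:
                best = (value, (y,) + tail_order)
        return best

    return V(frozenset(), start)

# Assumed two-region example (same data as the evaluator above).
R = {0: {1: 3, 2: 5}, 1: {1: 2, 2: 4}, 2: {1: 4, 2: 3}}
p1 = {1: 0.7, 2: 0.3}
print(optimal_search_order(frozenset({1, 2}), p1, R, start=0))
# -> (4.2, (1, 2)): search region 1 first, then region 2
```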
An unoptimised MATLAB implementation of the algorithm executes in under 2 seconds on a 2.6-GHz AMD Opteron 152 processor for the environment shown in Figure 3.4, where n_0 = 14 and n = 17. When all seventeen regions have non-zero target probability (n_0 = 17), paths are generated within 22 seconds (Figure 3.2). Despite the NP-hard nature of the problem (Trummel and Weisinger, 1986), the proposed algorithm is clearly viable. It should be noted that the optimal strategy π* is calculated only for the possible future states that lead from the initial distribution p(1,1),..., p(n,1). If new information becomes available, for example through an embedded sensor network, so that the target probability distribution can be updated, then the value function and optimal control action need to be recomputed. Replanning in this manner also limits the uncertainty of the prior knowledge and helps ensure the ongoing effectiveness of the searcher's actions. As long as the computation time for planning is kept small relative to the search and travel times, this remains acceptable in practice.
Figure 3.2 Computation times versus regions with non-zero target probability for the example in Figure 3.4
3.4 Searching for Multiple Targets
It is possible to adapt the problem framework to optimise the search for a known number of targets, whereby the expected proportion of undetected targets in each region is used instead of the target probability distribution p(·) as a convenient form of describing the available information. When only one target is present, the information measure then simply equals the probability of the target being in each region.

In contrast to a single target scenario, many more measures of search efficiency are possible when a searcher is looking for multiple targets. For example, minimising the time to find all the targets, minimising the time to find the very first target, or maximising the number of targets found in a fixed time frame are all reasonable objectives. While the overall goal in search and rescue scenarios is to find all the targets, it is also desirable to find most (if not all) of the targets as quickly as possible. For the purposes of adapting the discussed problem to cater for multiple targets, the objective of minimising the average time to find a target, provided all targets are found, is used in the following. Under this objective function, in a 10 target scenario, a search plan that finds the first 2 targets in 5 minutes and the remaining 8 targets in 30 minutes (average time of (2×5+8×30)/10 = 25 minutes) will be less desirable than a plan which finds 8 targets at the 7 minute mark and the last 2 targets after 35 minutes (average time of (8×7+2×35)/10 = 12.6 minutes). In practice, since the exact locations of the targets are not known, one can only minimise "the expected average time to find a target" (instead of the true average time), provided all the targets are found. Given the similarities to the original single target search problem, the DP algorithm of Section 3.3.3 can then also plan for multiple targets if the previous target probability distribution p is redefined to represent the expected proportion of targets in each region.
3.5 Examples
This section illustrates the use of the proposed algorithm using a simulated search of an office floor at the University of Technology, Sydney. The floor is divided into 17 regions, as shown in Figure 3.4. In this figure, the regions are shown to be linked by straight lines for clarity only. The actual travel times between regions (shown on the corresponding links in black) are estimated by planning the shortest-distance path between the nodes for a searcher with a speed of 0.5 m/s, giving due consideration to the presence of walls, doors etc. The time to fully search each region (shown below the corresponding region number in red) is set proportional to its area.
3.5.1 Optimal Search Plan
The following example considers the case when the searcher starts from region 9 and the prior target probability distribution is p(·,1) = (0.0654, 0.1307, 0.0654, 0, 0, 0, 0, 0, 0, 0.0196, 0.0654, 0.0654, 0.0654, 0.0654, 0.0654, 0.0654, 0.3268). The search problem was solved using the DP algorithm presented in Section 3.3.3, and the best search plan calls for the searching of regions 17, 16, 15, 14, 13, 12, 11, 2, 3, 1, 10 in sequence. Figure 3.3 and Figure 3.4 show the execution of the search plan at two points in time, while Figure 3.5 summarises the entire plan.
Figure 3.3 Snapshots at two distinct times of the stationary target search sequence (a). Darker shading indicates regions with higher undetected target probability. The search time required for each region is shown (in red) below the corresponding region number and the travel times between connected regions are shown (in black) on the corresponding links.
Figure 3.4 Snapshots at two distinct times of the stationary target search sequence (b).
Figure 3.5 Stationary target search sequence. Underlined numbers indicate the order in which each region is searched.
This plan is clearly not greedy with respect to either distance or target probability. For example, it calls for the searcher starting at region 9 to bypass the nearby regions 10-16 and proceed to search region 17. On the other hand, even though region 2 has the next highest probability of containing the target, immediately after the search of region 17 the searcher is instead assigned to inspect regions 16, 15, 14, 13, 12, and 11 in succession before searching region 2. As suggested by the conditions of the DPE (3.11), the solution algorithm in general derives plans which strike an optimal balance between anticipated costs (time) and potential rewards (probability of finding the target).
3.5.2 Evaluation of the Proposed Algorithm
This section compares the plans generated by the optimal algorithm to search an area against the shortest path through the regions and plans generated by a number of heuristics. The expected times to detect the target for the different plans are calculated using Equation (3.7).
3.5.2.1 Comparison with Shortest Path Plans
Despite apparent similarities, obtaining the optimal search plan does not always correspond with finding the shortest path through the nodes. If there is no knowledge at all of where the target could be, and if each region requires insignificant effort to investigate (e.g., the searcher only needs to go there and see), an optimal search plan is the same as the shortest coverage path through the set of nodes. For example, given equal target probability in regions 1, 2, 3, 6, 7, 10, 11, 12 and zero search time for each region, the proposed algorithm generates a route (start-10-11-12-3-2-1-6-7) that requires at most 78.34 s for the searcher to find the target, which is also the length of the shortest possible coverage path. However, if one is privy to information suggesting region 7 to be more likely to contain the target (Figure 3.6), the new optimal sequence (start-10-11-12-6-7-3-2-1) instead requires 80 s if the target is found in the very last region searched. Despite this, the proposed strategy results in a smaller expected time to detect than that achieved by the shortest coverage path, requiring an expected time of 42.50 s versus 47.85 s. The example illustrates how the algorithm is able to make the most of any additional data about likely target presence. Even in the absence of any such guidance, the plan generated is nevertheless the best that can be expected given the complete lack of knowledge.
Figure 3.6 Search sequence guided by additional knowledge. The initial target probability is 0.0909 in each of the regions 1, 2, 3, 6, 10, 11, and 12 but is 0.3636 in region 7. A searcher following a shortest path would search leftwards first (start-10-11-12-3-2-1-6-7). The optimal search sequence (start-10-11-12-6-7-3-2-1) instead first searches the regions to the right of the map. Despite being longer, the latter results in the minimal expected time to detect the target.
3.5.2.2 Comparison with Heuristics
This section compares the plans generated by the optimal algorithm to search an area against plans generated by a number of heuristics. The aim is to illustrate the likely benefit, in terms of the obtained expected detection times, of computing an optimal plan in different situations over simply using suboptimal methods, rather than to critique the particular heuristics compared. The methods used below stem from related problems in the literature and are characterised by the criteria employed to select the next region to search. Despite being locally optimal (optimal for just one time step), the techniques generate reasonable plans for comparison, particularly on small graphs. The three selection criteria are defined as follows:
1. Maximise the highest probability of detection: a straightforward choice when choosing the region to search next is to select a region with the highest target probability. The region with the lowest cost (in terms of the time required to travel to the region and then immediately search it) is chosen if multiple regions have the same probability.
2. Minimise the cost of travel and search (for regions with non-zero target probability): even if a region has a high probability of containing the target, it can be too far away or require too much time to search to justify ignoring other regions. An alternative selection policy therefore is to select a region with non-zero target probability that can be searched in the earliest possible time.
3. Maximise the ratio of detection probability and cost: this criterion attempts to weigh the probability of detecting the target in a region against the cost required to search it. The approach is known to be globally optimal for a restricted version of this problem (Stone, 1989) in which the searcher can switch instantly between cells. It is also equal to the one-step version of the utility greedy heuristic used by Sarmiento, Murrieta and Hutchinson (2003) for minimising the expected time to detection. (A sketch of this selection rule is given after the list.)
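The third selection rule, for instance, can be sketched as a short loop that repeatedly picks the region maximising the ratio of undetected target probability to the travel-plus-search time, renormalising with update (3.6) after each unsuccessful search. As elsewhere, the inputs are assumed to be in the dictionary form used earlier; this is a sketch, not the exact implementation used for the experiments.

```python
def greedy_ratio_plan(N0, p1, R, start):
    """Selection rule 3: repeatedly search the region maximising the ratio of
    undetected target probability to the time needed to reach and search it."""
    p, here, order = dict(p1), start, []
    unsearched = set(N0)
    while unsearched:
        nxt = max(unsearched, key=lambda y: p[y] / R[here][y])
        order.append(nxt)
        unsearched.discard(nxt)
        remaining = 1.0 - p[nxt]
        if remaining > 0.0:          # renormalise, as in equation (3.6)
            p = {i: (0.0 if i == nxt else q / remaining) for i, q in p.items()}
        here = nxt
    return order
```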
To illustrate the likely difference of the optimal method from the suboptimal approaches in a range of situations, scenarios where search times and travel times are comparable, where searches require no time, where search times are large compared to the travel cost, as well as where travel times are zero are considered (Table 3.1).

Scenario | Target Distribution | Search and Travel Times
1 | 11 regions with non-zero target probability: p(·,1) = (0.0493, 0.0985, 0.0493, 0, 0.2463, 0.0985, 0, 0, 0, 0.0148, 0.0493, 0.0493, 0.0493, 0, 0, 0.0493, 0.2463) | Search and travel times as outlined in Figure 3.3
2 | Same as 1 | Search times five times those of Figure 3.3; travel times as outlined in Figure 3.3 (search times significantly larger than travel times)
3 | Same as 1 | Search times set to zero; travel times as outlined in Figure 3.3
4 | Same as 1 | Search times as outlined in Figure 3.3; travel times set to zero

Table 3.1 Stationary target search scenario properties. The searcher begins in region 9 in all cases.
Scenario | DP algorithm | Maximise detection probability | Minimise travel and search costs | Maximise detection probability/cost
1 | 170.26 | 194.21 (14.06%) | 186.65 (9.63%) | 185.86 (9.16%)
2 | 562.74 | 619.02 (10.00%) | 689.68 (22.56%) | 567.93 (0.92%)
3 | 54.06 | 101.46 (87.69%) | 57.34 (6.09%) | 74.61 (38.02%)
4 | 94.02 | 106.20 (12.96%) | 123.25 (31.10%) | 94.02 (0.00%)

Table 3.2 Expected target detection times for the stationary target scenarios. All entries are mean times to detection in seconds; bracketed numbers show the percentage difference from the corresponding minimum time.
Table 3.2 collates the expected target detection times for plans obtained via the Dynamic Programming algorithm and the three heuristic methods in the same environment. It can be seen from the results that:
A shortest path heuristic is rendered competitive when stopping to search a region requires no time at all and only the travel time remains to be minimised (Scenario 3).
On the other hand, maximising the ratio of detection probability versus cost performed well when the search times dominated the travel times (Scenario 2). This is because a searcher‟s current position then has almost no bearing on the time required to cover any other cell. Combined with the fact that repeated searches in the same region would yield diminishing rewards (in fact, no additional reward), the non-increasing marginal rate of return then satisfies the condition identified in Stone (1989) for a locally optimal solution to be also globally optimal.
The result when there are no travel times and the searcher can consequently always “jump” instantly from one region to another (Scenario 4) confirms this claim of uniform optimality. However, since indoor areas typically do not support this greedy choice property (as they do not consist of fully connected regions with uniform travel distances), the difference between such a locally optimal plan and the global optimum is likely to grow with the size of the travel times required.
3.6 Related Work and Discussion

While the problem posed in this chapter has not been efficiently and optimally solved before, there is significant literature on a range of stationary target search problems in discrete time and space. Kadane and Simon (1977) presented a sequential search problem in which the inspection of each box (i.e., the search of a region) is certain to be successful. The same authors (1983) then extended this solution to deal with the situation where a box can be fully searched in a finite number of looks. Lössner and Wegener (1982) examined a problem with the same search and travel time structures as the problem in this chapter, but the focus was on the necessary and sufficient conditions for the existence of optimal ultimately periodic strategies, assuming that there is a probability of the target being overlooked with each search. The computation of an optimal strategy required the evaluation of all the ultimately periodic strategies. As opposed to the algorithm in this chapter, which directly finds the optimal actions, that work identified necessary conditions to reduce the number of strategies to be evaluated. Onaga (1971) also formulated a problem where a time penalty is incurred for changing between cells, and Sarmiento et al. (2003) minimised the mean time to find a uniformly distributed target from an arbitrary set of vantage points without search times. However, beyond these examples and earlier work by Gilbert (1959) and Kisi (1965), the vast majority of related search problems have concentrated only on cases without switch costs.

Chapter 4 of Stone (1989) identified that if the ratio of search effectiveness to the search cost of inspecting a region always decreases with each additional look, then a greedy strategy of always searching a region with the highest such ratio is also globally optimal. This condition is, however, usually not applicable in the presence of the non-uniform search and travel times between regions in structured environments. Except for the unlikely case when all the inter-region travel times are identical, a global ordering problem must nevertheless still be solved to optimally plan for the indoor search scenario considered here.

An alternative to minimising the expected cost to detect the target is to maximise the probability of detection subject to a cost (or time) constraint. Chew (1967), Nakai (1981) and Wegener (1982), for example, examined problems of this type. The more recent Optimal Searcher Path (OSP) problem, which typically aims to maximise the chance of finding a moving target within a limited time, will be further discussed and
extended in the next chapter. When the flexibility of supporting a more general detection function is not necessary, a solution method only needs to arrive at an order in which to search each of the non-empty regions once. The search problem considered here may then be posed as a Maximum Collection Problem with Time-Dependent Rewards (Erkut and Zhang, 1996). In extending the problem to search for multiple unknown targets following a Poisson distribution, Smith and Kimeldorf (1975) sought to minimise the expected cost of finding at least one object. Ogras et al. (2004), on the other hand, approximated the optimal paths for multiple searchers to find a number of stationary targets, but similarly with respect to maximising the probability of detecting at least one.

The computational cost of the approach presented in this chapter is related to the number of regions with non-zero target probability. Problems with up to 20 such regions can be solved in less than a minute under a MATLAB implementation of the algorithm, although an optimised implementation on an efficient platform is anticipated to accommodate many more regions within the same time period. It is nevertheless recognised that optimal planning is feasible only when the computation time is small with respect to the travel and search times of the problem. In the face of larger environments or stricter timing constraints, the use of planning heuristics or a suboptimal hierarchical decomposition of the regions is then needed. For instance, the heuristics outlined in Section 3.5.2.2 require significantly less than 1 second to compute for the examples in this chapter, and thus could be used, even in combination, to obtain the best solution amongst them for larger problems.

Decomposing a large area into constituent parts has been used as a natural way of managing complexity in search related problems. Jung and Sukhatme (2004) divide a structure into regions to coordinate the tracking of targets between cells, Murarka and Kuipers (2001) extract the topology in architectural plans for navigation, while Gerkey, Thrun and Gordon (2006) solve a Pursuit-Evasion problem by first partitioning visibility regions. DasGupta et al. (2006) also gradually aggregated the regions representing a search area into a computationally manageable number, before refining the discrete inspection sequence back into a continuous path. Of most direct interest, Murphy (2000) describes the USAR scenario where a disaster site is segmented into regions with different survivor probability, thereby facilitating more rational decision-making over where to focus resources. It was proposed that hierarchical decomposition could be achieved along structural lines (e.g. from urban block to buildings to floors), or by grouping the
target probabilities of regions remote from the searcher. The latter in particular allows for planning in graduated granularity.
3.7 Summary

This chapter considered the discrete search for a stationary target with the objective of minimising the expected time to detection, where the search area is a set of regions requiring non-uniform times for the searcher to traverse. While the focus is on a static target, the computation cost of the solution algorithm is such that re-planning to incorporate the effect of any new information gathered (reflected in a new target probability distribution) is feasible provided the number of regions is small. The next chapter examines scenarios where the target is capable of moving and the successful search of a region does not always guarantee the absence of targets. The increased number of states arising from the moving target and the imperfect searcher renders a dynamic programming approach impractical, and a branch and bound solution method is therefore proposed.
4 Search for a Moving Target

4.1 Introduction

This chapter deals with the problem of finding a moving target in a structured environment. In contrast to the problem examined in the previous chapter, the target is able to move between the regions in the environment according to a predefined motion model. Furthermore, the searcher is imperfect in the sense that there may be a non-zero probability of missing the target even when the region in which the target is present is searched. Although somewhat similar to the Optimal Searcher Path (OSP) problem formulation, originating from maritime Operations Research (OR) applications, solutions to this problem are not available in the literature.

Section 4.2 first provides an overview of the indoor search problem. Section 4.3 highlights its relationship to the OSP problem and motivates the specific issues that need addressing, for example environments that do not decompose easily into exactly uniform regions and the need to incorporate travel times between regions. Section 4.4 formulates the Optimal Searcher Path problem with non-uniform Travel times (OSPT) and Section 4.5 describes a branch and bound approach to solve this new problem, using a generalisation of the MEAN bound (Martins, 1993) developed for solving the OSP. Section 4.6 presents a further improvement, the Discounted MEAN (DMEAN) bound, which greatly tightens the MEAN bound for the OSPT and OSP problems alike with almost no additional computation. The new algorithm is evaluated in Section 4.7 using a range of examples to demonstrate that it leads to much faster solution times compared with other known OSP bounding methods, and extends the time horizons for which search plans for the OSPT and OSP problems can be feasibly obtained. Section 4.8 presents a problem formulated specifically for the search of open indoor environments, such as offices, where the searcher moves amongst a number of contiguous regions. The DMEAN relaxation is adapted to find plans for this Generalised Optimal Searcher Path (GOSP) problem. Section 4.9 suggests further avenues for reducing computation times for both the OSPT and GOSP problems before discussing their application and the DMEAN relaxation in a wider context.
4.2 Problem Overview

This chapter is concerned with optimising the search for a distressed victim moving within a structured area divided into independently searchable regions (such as rooms or individual buildings). In particular, the scenario to be considered can be summarised as follows:

- At each time step, the target can stay in its current region or shift to a different location. A known Markov model captures the target's anticipated behaviour and specifies the probability with which it may travel to another region.
- For each period that the searcher spends in the same region as the target, detection may occur with a given glimpse probability. There is thus a possibility for the searcher to “overlook” a target.
- After searching a region for one time period, a searcher can continue in the same place or move to search an adjacent region. Due to the structure of the environment, the searcher may be required to spend time travelling from one region to another before resuming search.

Unlike the problem described in Chapter 3, as the searcher is imperfect there is no guarantee that the target can always be found. As such, the goal is to find a search plan that maximises the probability of detection within a specified time window.
Figure 4.1 Goal: Find an optimal sequence to search the regions of interest
4.3 Motivation

Much work has been done in the Operations Research literature on a similar problem known as the path constrained or Optimal Searcher Path problem (Stewart, 1979; Eagle and Yee, 1990; Dell et al., 1996; Hohzaki and Iida, 1997; Washburn, 1998), for the special case where the environment can be divided into a grid of identically-sized cells and a searcher can immediately search one cell after another in successive time periods. In particular, such OSP problems can be viewed as the searcher being able to expend
some amount of time searching one region, and then being able to shift the effort to search a neighbouring one with no intervening “travel time” necessary. While adequate for modelling the search of the open sea with a sufficiently fast platform (e.g. a plane), this assumption is not appropriate for the problem of searching a cluster of buildings or an indoor area consisting of different-sized rooms. Not only is a searcher in such cases reasonably expected to devote time to moving from one region to another before resuming search, the time it takes to travel also naturally depends on the locations of the originating and destination regions. The effective total time available for searching, unlike in an OSP problem, therefore becomes linked to the choice of search actions themselves.

The need for planning to explicitly consider searcher travel arises when a searcher moves slowly to the target, cannot sense adequately during travel, or if movement causes a resource to be consumed. The first reason in particular is not unusual for indoor search scenarios, given typical requirements to constrain speed to maintain localisation, stay covert, ensure safety, or satisfy other operational concerns.

Although most related problems described in the literature factor in some incurred costs from a searcher looking in a region, few cases exist where the specific travel times of changing from one region or cell to another are directly taken into account. In one typical example, Hohzaki and Iida (1997) raised the idea of capturing the cost of travel between the successive cells in a search sequence. This, like the search costs of other problems, was however envisioned as a purely financial expense, with no impact on the effective time available for further searches. Switch cost in the sense of time was first considered by Gilbert (1959) and Kisi (1966) using two-cell examples. The decision to inspect a different cell in the general n-cell problems of Lössner and Wegener (1982), DasGupta et al. (2006) and in previous work (Chapter 3 and Lau et al., 2005) also incurs such a switch time, but these works all dealt only with stationary targets. Interestingly, Dambreville and Le Cadre (2002) raised a Markovian target search formulation that allows a search resource, such as a radar, to renew itself only after spending a certain amount of time moving. The formulation is nevertheless concerned only with allocating an infinitely divisible search resource whose future availability is not conditioned on the searcher's current position, and therefore still does not meet the physical travel requirements in the scenarios considered here.
This chapter proposes two complementary ways in which the search of structured and indoor environments may be discretely modelled under different sensing modalities, and describes solution methods that enable optimal sequences to be efficiently generated. It extends the idealised cell model of the OSP problem to incorporate non-uniform searcher travel times between cells, such that the inter-cell travel constraints enforced by the original problem are adapted to model the search of regions in more structured environments.
4.4 Optimal Searcher Path Problem with non-uniform Travel Times (OSPT)

Existing OSP problem formulations can indeed approximate the case of location-dependent, non-uniform travel times by first injecting additional identical cells to render travel times uniform. However, this model then assumes the use of an unusually myopic sensor, does not take advantage of the inherent environment structure, and would therefore unnecessarily escalate the overall complexity. Thus a more direct approach that augments the existing OSP formulation to define the Optimal Searcher Path problem with non-uniform Travel times (OSPT) is required.

Consider a searcher and a target moving through an environment divided into a finite set of regions (cells) $C = \{1, \ldots, N\}$. The target occupies one cell (representing a region) at a time and moves according to a specified Markov process at each time step. A prior distribution $p(\cdot,1) = [p(1,1), p(2,1), \ldots, p(N,1)]$ of the target at time 1 is initially supplied, where $p(i,t)$ is the probability that the target is in cell $i$ at time $t$ without being detected by the searcher before $t$. In the absence of searches, the distribution evolves according to the formula $p(\cdot,t+1) = p(\cdot,t)\,\Gamma$, where $\Gamma$ is the target's transition matrix. This distribution is not normalised and may sum to less than 1.

Target detection is modelled as follows: if both the searcher and the target are in region $i$ during time $t$, detection occurs with a glimpse probability of $g(i,t)$. As an example, if $p(\cdot,t) = [p(1,t), \ldots, p(N,t)]$ and cell 1 is searched for one time step, then $p(\cdot,t+1) = [p(1,t)(1 - g(1,t)),\ p(2,t), \ldots, p(N,t)]\,\Gamma$. The glimpse or detection function typically takes the form of $g(i,t) = 1 - e^{-\alpha(i,t)}$ (Dell et al., 1996; Eagle and Yee, 1990), with $\alpha(i,t) \ge 0$ being a measure of search effectiveness for a given cell. Any
function $0 \le g(i,t) \le 1$, however, can be used. The glimpse probability at each time is assumed to be independent of past searches.

The searcher's path is constrained by the structure of the environment. Let $S(i),\ i \in C$ be the set of cells that the searcher can directly move to from cell $i$. Unlike existing OSP problems described in the literature, it is assumed that the redeployment of search effort from one cell to another is not necessarily instantaneous. If the searcher is in cell $i$ at time $t$, it can only start searching the next cell $j \in S(i)$ at time $t + 1 + W_{ij}$. The integer value $W_{ij}$ represents the length of time needed for a searcher to travel between the two referenced cells, during which no detection can occur. Travel in different directions can be assigned dissimilar values to capture specific travel or terrain constraints. For example, a region transition in which a searcher moves up a steep incline may require significantly more time than movement in the opposite direction. Figure 4.2 illustrates an example representation of the search space for this new problem.
[Figure: a graph of four cells with weighted, directed edges.
S(1) = [1,4]; S(2) = [1,2,3]; S(3) = [2,3,4]; S(4) = [3,4]
W14 = 2; W21 = 2; W23 = 2; W32 = 1; W34 = 1; W43 = 2]
Figure 4.2 An OSPT search space depicted as a graph. The time required to move from building to building varies with distance.
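Before formalising plans over this space, note that the connectivity sets and travel times of Figure 4.2 can be encoded directly. A minimal illustrative sketch in Python (the dictionary layout is an assumption, not the thesis data structure):

    # Search space of Figure 4.2: adjacency sets S(i) and travel times Wij.
    # Staying in, or returning to, the current cell is taken to cost no
    # travel time (Wii = 0), consistent with the formulation.
    S = {1: [1, 4], 2: [1, 2, 3], 3: [2, 3, 4], 4: [3, 4]}
    W = {(1, 4): 2, (2, 1): 2, (2, 3): 2, (3, 2): 1, (3, 4): 1, (4, 3): 2}

    def travel_time(i, j):
        # Travel within the same cell is free; otherwise look up Wij.
        return 0 if i == j else W[(i, j)]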
Given $T$ time steps to find the target, let $\psi$ denote a valid search plan represented by a series of cells searched in one-time-unit increments, where $\psi(n)$ refers to the $n$-th cell searched. A searcher following plan $\psi$ first moves to and searches cell $\psi(1)$ for one time period, then travels to search cell $\psi(2)$ for another time step, and continues for the remaining cells until $T$ time periods in total have been expended. Note that, in contrast to the original OSP problem formulation in Section 2.2.3.3, the number of one-time-unit searches in the sequence, $|\psi|$, can now be less than $T$, since some time may be needed for travel between the cells. For example, given $T = 5$, both $\psi_a = [\psi_a(0), \psi_a(1), \psi_a(2)] = [3, 2, 1]$ and
$\psi_b = [\psi_b(0), \psi_b(1), \psi_b(2), \psi_b(3), \psi_b(4)] = [3, 3, 2, 2, 2]$ are valid plans for the problem in Figure 4.2. Accordingly, the target distribution $p$ now needs to be updated using:

$$p(\cdot,t) = \begin{cases} p(\cdot,1)\,\Gamma^{t-1}, & \text{if } t \le T_1 \\ p(\cdot,T_n)\,M_{\psi(n),T_n}\,\Gamma^{t-T_n}, & \text{if } T_n < t \le T_{n+1},\ 1 \le n \le |\psi| \end{cases} \qquad (4.1)$$

where $T_n$ denotes the time period when the $n$-th search occurs and $M_{\psi(n),T_n}$ is an $N \times N$ identity matrix whose $\psi(n)$-th diagonal element is replaced by $1 - g(\psi(n),T_n)$, the remainder of the diagonal being one.
Objective Function

The objective of the Optimal Searcher Path problem with non-uniform Travel times (OSPT) is to maximise the probability of detection within a specified time window and can be written as:

$$\max_{\psi}\ PD(\psi) = \sum_{n=1}^{|\psi|} p(\psi(n),T_n)\,g(\psi(n),T_n)$$

Subject to:

$$\psi(n+1) \in S(\psi(n)) \qquad (4.2)$$

$$T_{n+1} = T_n + W_{\psi(n)\psi(n+1)} + 1, \quad n = 0, \ldots, |\psi| - 1 \qquad (4.3)$$

$$T_{|\psi|} \le T \qquad (4.4)$$

The glimpse functions $g(\psi(n),T_n)$ are given and the undetected target probability $p(\psi(n),T_n)$ can be obtained using equation (4.1). $T_0 = 0$ is assumed for the purposes of constraint (4.3). Based on the approach used by Trummel and Weisinger (1986) to describe the OSP problem, the OSPT can also be formulated as the search of a connected graph in which each node represents a cell, edges define valid adjacent cell transitions, and edge weights $W_{ij},\ i \in C,\ j \in S(i)$ denote the required travel time.

In the following discussion, for clarity of presentation, the existing Optimal Searcher Path problem formulation in the literature will be referred to as the Optimal Searcher Path problem with no travel times, or simply the Optimal Searcher Path problem (OSP). Note that the OSPT reduces to the OSP problem (Eagle, 1984) when the travel times between cells are strictly zero. Given $W_{ij} = 0,\ \forall i, j \in S(i)$, which implies $|\psi| = T$ and $T_n = n$, the above formulation is then equivalent to the OSP formulation outlined in Section 2.2.3.3.
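The objective and constraints can be checked mechanically for any candidate plan. The sketch below evaluates $PD(\psi)$ under update (4.1); it is illustrative only, assumes numpy and the travel_time helper from the earlier sketch, and assumes the probability vector is indexed so that p1[i] is cell i's prior (e.g. padded with an unused element 0 for 1-indexed cell labels).

    import numpy as np

    def plan_pd(psi, p1, Gamma, g, travel_time, T):
        # Evaluate PD(psi) using update (4.1): psi[0] is the starting cell,
        # psi[1], psi[2], ... the cells searched for one time unit each.
        # g(j, t) is the glimpse probability of cell j at time t.
        p = np.asarray(p1, dtype=float).copy()
        pd, T_last, t_cur = 0.0, 0, 1       # T_0 = 0; p is valid at time 1
        for prev, cell in zip(psi, psi[1:]):
            T_n = T_last + travel_time(prev, cell) + 1   # constraint (4.3)
            if T_n > T:                                  # constraint (4.4)
                raise ValueError("plan exceeds the time horizon T")
            # Target motion while the searcher travels and then searches.
            p = p @ np.linalg.matrix_power(Gamma, T_n - t_cur)
            pd += p[cell] * g(cell, T_n)    # reward of the n-th search
            p[cell] *= 1.0 - g(cell, T_n)   # apply M_(psi(n),T_n)
            T_last = t_cur = T_n
        return pd

For the plans above, $\psi_b$ would be passed as-is, with psi[0] = 3 acting as the starting cell rather than a search.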
4.5 Branch and Bound Framework

The complexity of the OSPT problem (and its predecessor, the OSP problem) arises because the usefulness of searching an individual cell, with respect to maximising the probability of detection, is governed by the probability of the undetected target being present at that time, which is in turn a function of all previous search actions thus far. Additionally, the OSPT does not have a fixed number of control variables in $\psi$ (the effective time available for searching being dependent on both $T$ and the particular order of cells chosen). Branch and bound techniques, already popular for solving the NP-complete OSP problem (Trummel and Weisinger, 1986), are therefore especially suitable for the OSPT when non-uniform travel times are introduced. As the approach finds the optimal solution by implicitly enumerating feasible actions, the best plan for the searcher to follow can be found regardless of the variable sequence length.

Instead of examining every possible solution (Figure 4.3), branch and bound takes advantage of the fact that all extensions of a partial search plan (consisting of different sequences of regions to search in the remaining time) can be ruled out together if they cannot possibly lead to the optimal reward. Central to the approach is estimating the best reward achievable from any valid continuation of a given search plan: if the value does not exceed the best known solution, an entire “branch” of related plans (Figure 4.4) can then be safely abandoned.
Figure 4.3 All feasible search plans for the example in Figure 4.2 when T=7.
Figure 4.4 Example implicit enumeration of searcher paths. Fathoming branches removes many unpromising solutions (in dark shade) from explicit consideration.
The branch and bound algorithm below adapts the approach in Washburn (1995) to operate with non-uniform travel times. $K(s)$ is a set of 3-tuples $\{nextcell, time, upperbound\}$ representing path continuations yet to be explored after a particular sequence of $s$ cells is searched. The first field in the 3-tuple refers to the next cell to search for one time step, the second field is the total time expended once the specified cell is searched, and the third contains the upper cumulative probability of detection (PD) bound associated with this particular extension. $p^*$ holds the best detection probability hitherto found.

Algorithm for Branch and Bound (OSPT):
1. Let $\psi_0 = \psi(0)$. Set $s = 0$, $K(s) = \{\{\psi_s, 0, 0\}\}$ and $p^*$ to a value below 0.
2. If $K(s)$ is empty, let $s = s - 1$, else go to 4.
3. If $s < 0$, go to 9, else go to 2.
4. Selection: Remove from $K(s)$ a tuple $\{\psi_s, \tau_s, p_s\}$ chosen according to a selection criterion.
5. If $p_s \le p^*$, this extension can be fathomed. Go to 2.
6. Else Branch: For each cell $c \in S(\psi_s)$, if $\tau_s + W_{\psi_s c} < T$, obtain $p_c$, the upper PD bound for any plan beginning with the path $\{\psi_0, \ldots, \psi_s, c\}$. Add tuple $\{c, \tau_s + W_{\psi_s c} + 1, p_c\}$ to $K(s+1)$.
7. If no tuples were added to $K(s+1)$, the current extension is a leaf and no more searches can be done. Let $p^* = p_s$ and store $\{\psi_0, \ldots, \psi_s\}$ as the incumbent best path. Go to 2.
8. Else let $s = s + 1$, go to 4.
9. Stop; the last saved path is optimal with the maximum PD of $p^*$.
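The steps above compress naturally into a depth-first routine. The following sketch is a simplified illustration rather than the thesis implementation: bound(path, t) stands for any admissible upper bound (e.g. the DMEAN bound of Section 4.6), pd_of(path) for an exact PD evaluation such as the plan_pd sketch earlier, and, unlike step 7, the incumbent is recomputed exactly at leaves.

    def branch_and_bound_ospt(start, S, travel_time, T, bound, pd_of):
        # Depth-first branch and bound mirroring steps 1-9 of the OSPT
        # algorithm; path[0] is the starting cell psi(0).
        best = {"pd": -1.0, "path": None}

        def expand(path, t, p_upper):
            if p_upper <= best["pd"]:          # step 5: fathom this branch
                return
            children = []
            for c in S[path[-1]]:              # step 6: branch on next cells
                arrive = t + travel_time(path[-1], c) + 1
                if arrive <= T:
                    children.append((c, arrive, bound(path + [c], arrive)))
            if not children:                   # step 7: leaf reached
                pd = pd_of(path)
                if pd > best["pd"]:
                    best["pd"], best["path"] = pd, list(path)
                return
            # Selection criterion: expand the highest-bound child first.
            for c, arrive, ub in sorted(children, key=lambda x: -x[2]):
                expand(path + [c], arrive, ub)

        expand([start], 0, float("inf"))
        return best["path"], best["pd"]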
The algorithm employs a depth-first branch and bound approach as per Washburn (1995). For the purpose of this thesis, the selection criterion used for step 4 always chooses the tuple $\{\psi_s, \tau_s, p_s\}$ with the highest bound $p_s$ from $K(s)$, such that nodes at the same tree level are expanded in descending order of their upper bounds. This selection criterion is chosen for its simplicity and likely ability to quickly improve upon the incumbent, but the use of other criteria is also possible. Figure 4.4 shows an example expansion of search plans using the above algorithm for the problem in Figure 4.2 when the searcher begins in cell 3.
4.5.1 Bounds for the Probability of Detection

The purpose of the upper bound, used in step 6 of the Algorithm for Branch and Bound, is to quickly estimate the best achievable reward in a given solution subspace without exhaustively examining each full plan. The tighter a bound is in relation to the actual achievable PD, the more branches can be fathomed in advance. However, a tight bound that requires too much effort to calculate may actually slow down the solution process. Washburn (1995, 1998) provides a comprehensive overview of the known bounding techniques and examines the trade-off between bound tightness and calculation speed. Existing bounding approaches for the closely related OSP problem are outlined in Section 2.2.3.4, with further details given in Appendix A. (This thesis refers to the existing OSP bounds by their given names in Washburn (1995, 1998).)

As existing OSP bounding methods presume that travel times between cells are zero, these cannot directly help solve the OSPT. One possible solution is to transform the non-uniform regions into a series of suitably uniform cells, adding an artificial cell for each travel time unit between regions. This would clearly scale poorly if travel times are large. A new bounding method is therefore needed for efficiently solving the OSPT that does not redefine the search area or sacrifice the corresponding travel constraints that ensure bounds remain tight.

The Forward and Backward (FAB) algorithm (Brown, 1980), which can optimally allocate infinitely divisible search effort to find a moving target, has been used to find bounds for different versions of the OSP problem (Washburn, 1995; Hohzaki and Iida, 1997; Kunigami, 1997). However, the technique appears incompatible with the OSPT problem unless the map is enlarged in the manner mentioned above. One possible exception is the algorithm by Dambreville and Le Cadre (2002), which optimally
allocates a search resource that self-renews after a certain number of time steps. For the simple case when all the travel times are identical, this algorithm can be used in a similar way to the continuous FABC bound (Washburn, 1995; Appendix A describes the FAB algorithm and the FABC bounding method in detail) to obtain an upper bound of the probability of detection for a discrete path. Nevertheless, the inability to incorporate different travel times specific to the source and destination cells renders the method unsuitable for use with the OSPT.

Since multiple time steps may elapse while the searcher travels from one cell to another, it is also more likely for a first-order Markov target to have changed location between successive searches than if travel times were not considered at all. In light of their better performance with energetic targets, the PROP (Washburn, 1998; Appendix A describes the PROP bounding method in more detail) and MEAN (Martins, 1993) bounds thus appear to be the best candidates among the OSP bounds for the present problem. In particular, the MEAN bound's calculation can be extended to enforce both the path constraints of the original OSP problem and the additional consideration of searcher travel times. While the PROP bound is calculated through solving a simpler graph problem than that used for finding the MEAN bound, its method of relaxing the OSP problem cannot be extended directly in the same way. The next section first extends the MEAN technique to provide upper PD bounds for the OSPT problem, then introduces an improvement that significantly tightens the bound for both the OSP and OSPT problems.
4.5.2 The Generalised MEAN Bound

The MEAN bound functions by maximising the expected number of detections for the searcher in the time steps $k+1$ to $T$. This approximates the OSPT's aim of detecting the target for the first time within the given time window $T$. Following Martins (1993), define $P(D = d)$ as the probability that $d$ detections occur when the searcher follows a sequence $\psi$ from time 1 to $T$. Taking an expectation of the number of detections yields:

$$ED(\psi) = \sum_{d=1}^{T} d\,P(D = d) \ \ge\ \sum_{d=1}^{T} P(D = d) = PD(\psi)$$

Since the expected number of detections (ED) for a particular plan can be no smaller than the corresponding probability of detection (PD), the maximal ED across all search plans thus also provides an upper bound for PD itself.
Under this modified objective, the utility of searching a cell at a given time, in terms of its contribution to $ED(\psi)$, is independent of any previous searches that might have already occurred. Prior actions only play a role in limiting where the next searches could take place, in that two consecutively searched cells $i$ and $j$ must be connected to each other according to the set $S(i)$. Calculating the value of the MEAN bound can thus be modelled as finding a longest path in a directed acyclic graph (DAG). The technique below generalises Martins' (1993) method to operate with the OSPT problem defined above.

Consider the bound calculation illustrated in the following 1-dimensional example, where the searcher at each time can either stay in its current cell or move sideways towards an adjacent one. Let $\psi$ be a fixed sequence of $k$ one-time-unit searches such that the search of cell $\psi(k)$ takes place at time $T_k$. Note that $T_k = k$ for an OSP problem but may exceed that value given the possibly non-zero travel times in an OSPT formulation. The objective is to find the maximum ED possible for any extension of this partial plan from time $k+1$ up to time $T$. This can then be added to the known PD for searches during times $T_1, \ldots, T_k$ to give the upper PD bound for any sequence beginning with the partial plan $\psi$. In this OSPT example, assume that searching a cell $i$ and then moving to cell $i-1$ requires one time period for travel, while reaching cell $i+1$ incurs a delay of two time periods. Remaining in the same cell causes no delay.

[Figure: cells 1 to 5 against time periods $T_k$ to $T$, with the last fixed search $\psi(k) = 2$; dotted arcs span multiple time steps, carrying weights such as $R_{MEAN}(1, T_k+2)$ and $R_{MEAN}(3, T_k+3)$.]
Figure 4.5 Generalised MEAN bound calculation for the OSPT problem. Arcs can span multiple time steps (dotted lines) if corresponding travel incurs a time delay.
Figure 4.5 shows how the MEAN bound can be found through solving a longest path problem in a network, where each node represents the search of a cell at a particular time. Starting with the constraining node $\{\psi(k), T_k\}$ denoting the last fixed searcher position, let directed arcs indicate the valid searcher movements from one time to the next. To determine the maximum ED achievable in the time interval from $T_k + 1$ to $T$, one can associate a reward for searching a cell with the arc entering the corresponding node. Let $\beta(T_k)$ be the undetected target probability mass at time $T_k$, taking into effect the searches of $\psi(1)$ through to $\psi(k)$. By letting $P(\cdot,t) = \beta(T_k)\,\Gamma^{t-T_k},\ t \ge T_k$, $P(j,t)$ is the probability that the target was not detected by any searches up to time $T_k$ and is now in cell $j$ at time $t$. Accordingly, each arc heading into node $\{j,t\},\ j \in C,\ t \in \{T_k+1, \ldots, T\}$ can be given a weight of $R_{MEAN}(j,t) = P(j,t)\,g(j,t)$, reflecting the contribution towards the total ED if cell $j$ is searched at time $t$. A standard single-source DAG longest path algorithm, such as in Cormen et al. (1990), can then be applied to the network to maximise the total reward and so obtain an upper bound on the achievable PD. The calculation of the generalised MEAN bound for the OSPT is summarised below:

Algorithm for the generalised MEAN Bound
1. For each time step from $T_k$ to $T$, create a graph node per cell at that time. Mark node $\{\psi(k), T_k\}$ as valid.
2. Use $P(\cdot,t) = \beta(T_k)\,\Gamma^{t-T_k}$ to calculate $P(\cdot,t)$ for $T_k < t \le T$.
3. From each valid node $\{i,t\}$, extend arcs to all nodes $\{j,\tau\}$, $j \in S(i)$, $\tau = t + W_{ij} + 1 \le T$. Assign a weight of $P(j,\tau)\,g(j,\tau)$ to each new arc and mark the head nodes $\{j,\tau\}$ valid.
4. Repeat 3 until arcs have been extended from all valid nodes.
5. Apply a DAG longest path algorithm to find the maximum reward for paths leading from node $\{\psi(k), T_k\}$. Add the reward to the PD of following the sequence $\psi(1), \ldots, \psi(k)$ to form the upper bound of any continuation.

By using arcs that can span multiple time steps as required, the algorithm generalises the existing MEAN method (Martins, 1993) to accommodate the additional movement constraints of the OSPT formulation. The following section presents one of the main contributions of this thesis, a technique that further improves on the tightness of the generalised MEAN bound.
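Because every arc goes forward in time, the longest path can be computed with a single dynamic programming sweep rather than a general longest path routine. The sketch below is an illustrative rendering of steps 1 to 5, reusing the assumed helpers from the earlier sketches; beta_Tk is the undetected mass $\beta(T_k)$, and cells are assumed to index directly into the probability vectors.

    import numpy as np

    def mean_bound(last_cell, Tk, T, beta_Tk, Gamma, g, S, travel_time):
        # Generalised MEAN bound: best[t][j] is the maximum ED collectable
        # by a continuation whose latest search is of cell j at time t.
        P = {Tk: np.asarray(beta_Tk, dtype=float)}
        for t in range(Tk + 1, T + 1):      # step 2: P(.,t) = beta(Tk) Gamma^(t-Tk)
            P[t] = P[t - 1] @ Gamma
        best = {t: {} for t in range(Tk, T + 1)}
        best[Tk][last_cell] = 0.0           # step 1: only the root is valid at Tk
        for t in range(Tk, T + 1):          # steps 3-4: relax arcs in time order
            for i, acc in best[t].items():
                for j in S[i]:
                    tau = t + travel_time(i, j) + 1   # arc spans the travel time
                    if tau <= T:
                        r = acc + P[tau][j] * g(j, tau)   # add R_MEAN(j, tau)
                        if r > best[tau].get(j, -1.0):
                            best[tau][j] = r
        # Step 5: maximum extra ED over all reachable nodes.
        return max(v for row in best.values() for v in row.values())

The returned value is the maximum extra ED of any continuation; adding it to the PD of the fixed prefix $\psi(1), \ldots, \psi(k)$ yields the bound used in step 6 of the branch and bound algorithm.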
4.6 The Discounted MEAN (DMEAN) bound

4.6.1 Motivation

The efficiency of computing the MEAN bound comes from decoupling the reward attributed to an action from past history, by assuming that searching a cell does not change the target probability distribution. The utility (in the sense of maximising ED) of multiple plans can then be considered at once without having to expensively apply the target motion model for each case. This simplifying assumption, however, comes at a cost when the searcher already has a high chance of detecting the target each time it searches a cell. Consider the worst case where both a searcher with a 100% glimpse probability and all of a stationary target's probability mass begin in the same cell. While the maximum probability of detection is clearly 100%, the maximum expected number of detections would increase for each additional time period available for search. Moreover, instead of moving on from one searched cell to another, a path that maximises ED is also more likely to repeatedly inspect the same high probability cell. The difference between the MEAN bound and the actual probability of detection can therefore be even larger in the presence of travel times.

Viewed in the context of the longest path problem in Figure 4.5, the bound's looseness is linked to each arc weight's overestimation of the probability of detecting the target with the corresponding search. In the limit, one can assign accurate weights that fully account for all the cells previously inspected by the searcher, but not without regaining the computational complexity of maximising PD itself. A closer examination of the MEAN bounding method, however, shows that information already available can be leveraged to cheaply reduce the overestimation of arc weights.
4.6.2 Method

This section presents an improved bounding technique, Discounted MEAN (DMEAN), which greatly improves the MEAN bound at a small cost by retaining a limited memory of past actions. Although Martins (1993) calculated the MEAN bound with an ED network that assigns the reward for searching a cell to the weight of a node's outgoing arc, an equivalent graph where the value is accrued on the arc heading into the corresponding node has been used in Section 4.5.2. Each arc's weight is therefore now clearly linked to the paired actions of visiting one particular cell after another,
rather than describing the current cell in isolation. Given a graph that readily embeds this information, one then has the option of discounting from the reward (estimating the gain in PD) of searching the current cell an amount that is known to have already been claimed when the previous cell was searched.

More specifically, the DMEAN relaxation relates the weight of an arc from node $\{i,t\}$ to $\{j,\tau\}$ to the search of both the corresponding cells $i$ and $j$, while still ensuring the longest path through the network produces a valid upper bound. Let $M(i,j,t,\tau),\ \tau > t$ be the probability that a target in cell $i$ at time $t$ moves to cell $j$ at time $\tau$. Conditioned additionally on the searcher failing to find the target in cell $i$ at time $t$, the reward on the arc to node $\{j,\tau\}$ can be safely reduced to:

$$R_{DMEAN}(j,\tau,i,t) = \big(P(j,\tau) - P(i,t)\,g(i,t)\,M(i,j,t,\tau)\big)\,g(j,\tau) \qquad (4.5)$$

The DMEAN algorithm can now be stated as follows.

Algorithm for the DMEAN Bound
1. For each time step from $T_k$ to $T$, create a graph node per cell at that time. Mark node $\{\psi(k), T_k\}$ as valid.
2. Use $P(\cdot,t) = \beta(T_k)\,\Gamma^{t-T_k}$ to calculate $P(\cdot,t)$ for $T_k < t \le T$.
3. From each valid node $\{i,t\}$, extend arcs to all nodes $\{j,\tau\}$, $j \in S(i)$, $\tau = t + W_{ij} + 1 \le T$. Mark the head nodes $\{j,\tau\}$ valid. If $t = T_k$, assign a weight of $P(j,\tau)\,g(j,\tau)$ to each new arc; else use $R_{DMEAN}(j,\tau,i,t) = \big(P(j,\tau) - P(i,t)\,g(i,t)\,M(i,j,t,\tau)\big)\,g(j,\tau)$.
4. Repeat 3 until arcs have been extended from all valid nodes.
5. Apply a DAG longest path algorithm to find the maximum reward for paths leading from node $\{\psi(k), T_k\}$. Add the reward to the PD of following the sequence $\psi(1), \ldots, \psi(k)$ to form the upper bound of any continuation.

Whereas $P(j,\tau)$ is the target probability assuming no searches after $T_k$, the discounted term $P(j,\tau) - P(i,t)\,g(i,t)\,M(i,j,t,\tau)$ represents the probability of the target being in cell $j$ at time $\tau$, given that it has also survived an earlier search of cell $i$ at time $t$. The arc weights of such a discounted network clearly cannot exceed
those of the corresponding MEAN network, and are indeed likely to be much smaller, especially along the arcs of the original longest path. The DMEAN OSPT problem relaxation thus produces a consistently tighter bound than MEAN regardless of problem parameters.

[Figure: the network of Figure 4.5, with MEAN weights on the arcs leaving the root node at $T_k$ and discounted $R_{DMEAN}$ weights on all later arcs.]
Figure 4.6 DMEAN bound calculation. Arcs starting after time Tk are assigned discounted weights.
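In an implementation, DMEAN differs from the MEAN sketch above only in how each arc weight is computed. A hedged fragment, where Gamma_pows[k] is assumed to cache $\Gamma^k$ as discussed in Section 4.6.4:

    def dmean_arc_weight(P, g, Gamma_pows, i, t, j, tau, Tk):
        # Discounted arc weight per (4.5). Arcs leaving the root node at Tk
        # keep the plain MEAN weight; later arcs subtract the probability
        # mass already claimed by the preceding search of cell i at time t.
        if t == Tk:
            return P[tau][j] * g(j, tau)
        M_move = Gamma_pows[tau - t][i, j]   # target moves: i at t -> j at tau
        return (P[tau][j] - P[t][i] * g(i, t) * M_move) * g(j, tau)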
4.6.3 Proof of Guaranteed Upper Bound

Proof: Let $J$ indicate detection at any time after time step $k$ when following a given plan, and let $I_t$ be a random variable indicating detection on the $t$-th look by the searcher at time $T_t$, irrespective of any detections made before that time. Then:

$$E(J) = E\Big(\sum_{t>k} I_t \prod_{k<u<t} (1 - I_u)\Big) = E\Big(I_{k+1} + \sum_{t>k+1} I_t \prod_{k<u<t} (1 - I_u)\Big) \le E\Big(\sum_{t>k} I_t\Big) \qquad (4.6)$$

Equation (4.6) reiterates MEAN's rationale that the PD of a search plan from time $k$ onwards, $E\big(I_{k+1} + \sum_{t>k+1} I_t \prod_{k<u<t} (1 - I_u)\big)$, is bounded by the corresponding ED, $E\big(\sum_{t>k} I_t\big)$. The DMEAN relaxation on the other hand corresponds to $E\big(I_{k+1} + \sum_{t>k+1} I_t (1 - I_{t-1})\big)$, which therefore necessarily lies between PD and ED.
4.6.4 Computational Complexity

The new approach of evaluating $E(I_t(1 - I_{t-1})) = E(I_t) - E(I_t I_{t-1})$ instead of $E(I_t)$ does not require much extra computation. It can be easily seen that:
$$E(I_t I_{t-1}) = P(\psi(t-1),T_{t-1})\,g(\psi(t-1),T_{t-1})\,M(\psi(t-1),\psi(t),T_{t-1},T_t)\,g(\psi(t),T_t)$$

If the target transition matrix does not change with time, $M(i,j,t,\tau)$ is just the $(i,j)$ element in $\Gamma^{\tau-t}$, and the computation required for discounted weights can be rendered only marginally higher than that for undiscounted weights by caching $\Gamma^k,\ k \in \{1, \ldots, \max_{i \in C,\, j \in S(i)} W_{ij} + 1\}$ in advance. Since DMEAN and MEAN differ only in the above calculation of each arc's weight, both methods share a complexity that is linear in the network size of $O(NcT)$ when each cell is connected to at most $c$ others and adjacency list structures are used to represent the cell connectivity and the target motion model (Martins, 1993).

The terms of $E(I_t I_{t-1})$ suggest that the DMEAN relaxation stands to provide a much sharper bound than MEAN when the glimpse probabilities $g(\cdot,\cdot)$ are high. Naturally, the effectiveness of discounting also depends on the proportion of target probability $P(\psi(t-1),T_{t-1})$ that should have already been accounted for in the search of cell $\psi(t-1)$ at $T_{t-1}$, and whether a target known to be in that cell is likely to move to cell $\psi(t)$ at $T_t$.
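Precomputing those powers is a one-off cost. A small illustrative sketch under the same assumptions as the earlier fragments:

    import numpy as np

    def cache_gamma_powers(Gamma, S, travel_time):
        # Cache Gamma^k for k = 1 .. max over allowed moves of (Wij + 1), so
        # that M(i, j, t, tau) can be read off as Gamma_pows[tau - t][i, j].
        k_max = max(travel_time(i, j) + 1 for i in S for j in S[i])
        Gamma = np.asarray(Gamma, dtype=float)
        Gamma_pows = {1: Gamma}
        for k in range(2, k_max + 1):
            Gamma_pows[k] = Gamma_pows[k - 1] @ Gamma
        return Gamma_pows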
4.7 Evaluation of the Use of DMEAN Bound for OSP and OSPT Problems This section presents a number of example OSP and OSPT problems to illustrate the performance of the DMEAN bound.
4.7.1 Uniform OSP Search Grid

The following examples are based on the search grids proposed by Eagle and Yee (1984).

[Figure: a 5×5 grid of cells numbered 1 to 25.]
Figure 4.7 Example 5×5 OSP search grid.
Similar grids have been used extensively in the past to demonstrate OSP results (Eagle and Yee, 1990; Martins, 1993; Dell et al., 1997; Washburn, 1998; Ogras et al., 2004). As shown in Figure 4.7, the search area is divided into a square of $N = n \times n$ cells, where at each time step the searcher can only search its current cell or an adjacent (vertical or horizontal) neighbour. The target remains in its cell with a probability $d$ after each time step, and with the remaining probability moves to one of the adjacent cells at random. Detection occurs with a glimpse probability of $g$. The cell connectivity sets $S(i),\ i \in C$ are formed according to the depicted topology. For the purpose of the examples here, the searcher is set to start in cell 1 ($\psi(0) = 1$) while the target starts at the centre cell $\hat{s}$ (e.g., $\hat{s} = 13$ for $N = 25$) of an odd-numbered grid ($p(\hat{s},1) = 1$; $p(i,1) = 0,\ \forall i \in \{1, \ldots, N\} \setminus \{\hat{s}\}$). The values of $W_{ij},\ \forall i, j \in S(i)$ are uniformly set to zero, and grids with 11×11 and 15×15 cells are used (Figure 4.8).
Figure 4.8 11×11 OSP search grid used in comparisons – The searcher starts in a corner and the target begins in the centre.
4.7.2 Comparison with Previous OSP Bounds

Tables 4.1 and 4.2 show the branch and bound solution times when using DMEAN alongside those of the best bound methods reviewed in Washburn (1998), for a number of uniform search grids with different parameters. In particular, the DMEAN, MEAN, PROP and FABC algorithms are implemented and compared. FABC (Washburn, 1995) applies one iteration of the FAB algorithm to solve a version of the problem with distribution of effort and convexity relaxations as discussed in Section
2.2.3.4. The FABC bound is nevertheless by far the slowest of the tested bounds to individually compute. The PROP method (Washburn, 1998) can be seen as a simplified MEAN bound in which the most profitable node is chosen at each time step without regard to path constraints. While PROP, MEAN and DMEAN all belong to a class of easy-to-compute linear bounds, computing the PROP bound does not need a solution of a longest path problem, making it the fastest of the four bounding methods evaluated. ERGO2, another linear bound also reviewed in Washburn (1998), is omitted as it is inferior to PROP when the target is moving, while worse than FABC when the target is near-stationary.

The number of bound fathoming attempts (i.e., the number of times that step 5 of the Algorithm for Branch and Bound is executed) in solving the problems with the various bounds is also listed for comparison. As it reflects the number of possible solutions explicitly enumerated by the branch and bound process, a lower number implies a sharper bound. Results were obtained using a MATLAB implementation of the branch and bound framework described in Section 4.5 on a 2.6-GHz Opteron 152 processor; values for problem grid size and time horizon were chosen to allow for test completion within reasonable time. Note that the FABC implementation used was not fully optimised, although the results shown are nevertheless indicative of relative performance, as seen from the number of bounding attempts in the different cases. The reader is referred to Washburn (1995, 1998) for performance comparisons between the existing OSP bounding techniques with 1-dimensional problems.

           d = 0.3              d = 0.6              d = 0.9
g = 0.3
  DMEAN    2.54 (10216)         2.91 (11074)         14.52 (51322)
  MEAN     5.51 (34329)         8.90 (37054)         72.02 (341929)
  PROP     4.65 (34329)         6.12 (37054)         54.90 (341929)
  FABC     40.14 (96244)        18.69 (38354)        9.31 (19684)
g = 0.6
  DMEAN    3.03 (10594)         3.14 (10079)         68.21 (256794)
  MEAN     12.31 (57924)        12.27 (45457)        324.19 (1980086)
  PROP     9.04 (57924)         8.16 (45457)         281.48 (1980086)
  FABC     216.41 (403667)      62.64 (94034)        13.47 (26849)
g = 0.9
  DMEAN    3.13 (9744)          7.24 (17204)         206.81 (941615)
  MEAN     17.77 (79512)        13.59 (49037)        323.98 (1981951)
  PROP     12.96 (79512)        9.01 (49037)         281.05 (1981951)
  FABC     2353.84 (3645611)    835.99 (1011430)     65.28 (92687)
Table 4.1 Branch and bound computation for 11×11 OSP search grid with T=15. Total computation times in seconds are presented together with the number of bound fathoming attempts within brackets. The optimal path is 2, 3, 4, 15, 26, 37, 48, 49, 60, 61, 72, 73, 62, 61, 50 with a detection probability of 0.26491 for the case when glimpse probability g = 0.6 and target stay probability d = 0.6.
Figure 4.9 Optimal path for 11×11 OSP search grid example with T = 15, g = 0.6 and d = 0.6. PD=0.26491.

           d = 0.3              d = 0.6              d = 0.9
g = 0.3
  DMEAN    14.56 (58349)        14.53 (52394)        157.30 (380974)
  MEAN     47.62 (263314)       41.76 (159392)       708.66 (2526216)
  PROP     37.98 (263314)       27.43 (159392)       466.23 (2526216)
  FABC     416.95 (939229)      135.57 (244199)      29.26 (45974)
g = 0.6
  DMEAN    19.07 (49779)        23.76 (47489)        730.96 (2185136)
  MEAN     74.01 (313227)       71.57 (166645)       3114.58 (14733399)
  PROP     51.50 (313427)       37.20 (167350)       2380.86 (14738284)
  FABC     2184.70 (3694435)    352.96 (451797)      131.20 (209634)
g = 0.9
  DMEAN    21.55 (45029)        30.58 (59547)        2902.27 (11299324)
  MEAN     131.51 (446140)      113.60 (269479)      11912.70 (74459729)
  PROP     82.02 (446340)       60.23 (269479)       10438.66 (74469799)
  FABC     30297.75 (40372593)  7512.15 (7746490)    396.67 (429665)
Table 4.2 Branch and bound computation for 11×11 OSP search grid with T=17 (format as in Table 4.1).
Figure 4.10 Optimal path for 11×11 OSP search grid example with T = 17, g = 0.6 and d = 0.6. PD=0.29785.
The results confirm DMEAN's suitability as a superior direct replacement for MEAN in a branch and bound framework. Compared to the FABC bounds, DMEAN bounds also tended to be much tighter for cases where d = 0.3 and d = 0.6, despite being significantly easier to compute. However, as noted by Washburn (1995, 1998), FABC becomes quite effective when the target is less mobile (i.e. d has a large value). Nevertheless, even in the case of d = 0.9, DMEAN is still superior to the linear PROP and MEAN bounds, due to its ability to mitigate the redundant counting of already-searched target probability mass.

Figures 4.11, 4.12 and 4.13 show the computation times for example problems in an 11×11 grid with increasing time horizons, when the target stay probability d is set at 0.3, 0.6 and 0.9, respectively. In the case where the target is energetic (Figure 4.11), DMEAN can be seen to further outperform the other methods as the time horizon expands. While its performance gap against PROP in the problems where d = 0.6 varied occasionally, as seen in Figure 4.12, the advantage appears to hold over time. These examples are in fact already structured to favour the PROP method, since PROP bounds are known to approach the tightness of MEAN when the target roughly follows a random walk, but are very loose when a target, for example, zig-zags from side to side (Washburn, 1998). The former is confirmed by the similar numbers of MEAN and PROP bounding attempts in Tables 4.1 and 4.2. Assuming the other problem parameters stay unchanged, the difference between DMEAN and PROP may therefore be even more pronounced given another type of target motion. It needs to be mentioned that in similar comparisons by Washburn (1998) using a 1-dimensional example and a slower target, FABC computation times exhibited a much more favourable nonlinear growth and eventually fell below that of PROP. The results of Figure 4.13 confirm FABC's relative quality when the target is near stationary and time horizons are large.
Figure 4.11 Computation times versus time horizon for 11×11 OSP grid, g=0.6 and d=0.3.
Figure 4.12 Computation times versus time horizon for 11×11 OSP grid, g=0.6 and d=0.6.
Figure 4.13 Computation times versus time horizon for 11×11 OSP grid, g=0.6 and d=0.9.
To better illustrate the extent to which the branch and bound approach remains feasible on a midrange single-processor computer, Figure 4.14 plots the computation times required by an unoptimised C++ implementation of the same branch and bound framework using each of the linear DMEAN, MEAN and PROP bounds, for a range of problems with different glimpse and target stay probabilities.
Figure 4.14 Computation times versus time horizon (C++ implementation) for a 15×15 OSP grid and combinations of g=0.3, 0.6 and d=0.3, 0.6. Put in context with earlier figures, solving with DMEAN when T=20, g=0.6 and d=0.6 required 10.9 seconds while it required 170 seconds to compute in the MATLAB implementation.
4.7.3 OSPT Example

Figure 4.15 illustrates an example search environment that requires the use of the OSPT framework. Each node in the figure represents an individual region of interest, for example a specific building or a room in the search area, while the arcs describe their connectivity and the corresponding searcher travel times. For simplicity, the travel times are assumed to be symmetric in this case, such that switching from a given region
i to a region j would require as much time as moving from region j to region i.

[Figure: a graph of 18 regions (nodes) with non-uniform travel times on the connecting edges.]
Figure 4.15 Example OSPT search environment with non-uniform travel times. The problem can, for example, represent the search of a floor in a shopping centre or an 18-room motel.
Consider the case where the searcher begins in cell 1 and the target could begin in cells 2, 5, 8, 17 or 18 with equal probability. The target follows a random walk with d = 0.8 amongst the regions. The optimal plan when g(·,·) = 0.6 and T = 40 is to
follow the search sequence [3 3 3 3 3 3 3 3 3 3 7 12 12 12 18 13 13 14 14 14 15 17 17]. This optimal plan leads to a PD of 0.50823 and is computed in 29.47 seconds in a C++ branch and bound implementation using the DMEAN bound. In comparison, the same implementation with the MEAN bound requires 535.55 seconds to compute and exhaustive enumeration requires 33701 seconds. It is clear that the use of the DMEAN bound allows practically realistic OSPT problems (searching a facility with 18 buildings in this example, and generally in areas with 30 or more regions) to be optimally solved in a reasonable time frame.
For a given environment structure, the target motion model, glimpse probabilities and the values of the travel times along the likely optimal paths all influence the computational effort required. For example, different values of the travel times $W_{13}$ and $W_{14}$ can affect whether it is preferable to first search along the top of the map (going first from region 1 to region 3) or along the bottom (going from region 1 to region 4). This in turn impacts how easily a particular branch of plans leading from either choice can be fathomed. Clearly, problems for which certain paths are noticeably more favourable than others will require less time to compute than cases where almost all the paths must be explicitly enumerated for consideration.

For large scale problems, $\epsilon$-optimal solutions that provide an approximately optimal plan, with a payoff guaranteed to be within $\epsilon$ of the true optimum, can be computed in a fraction of the time required to find the true optimal solutions (Washburn, 1995). This can be achieved by changing the fathoming criterion in step 5 of the algorithm given in Section 4.5 to $p_s \le p^* + \epsilon$. For this particular OSPT example, the solution time falls below 5 seconds when $\epsilon = 0.1$, with a corresponding reduction of PD to 0.44392. Using an appropriate balance between guaranteed plan quality and computation time, large scale problems could be solved even in cases where the time available for planning is limited. It is also important to note that each plan can be used as part of a sub-optimal rolling horizon approach, whereby the effective time horizon is extended through re-planning if the target is not found.
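In the branch and bound sketch given earlier, this change amounts to a single comparison; setting epsilon to 0 recovers the exact algorithm:

    # Step 5 with epsilon-optimal fathoming: discard a branch unless it
    # could improve on the incumbent by more than epsilon.
    if p_upper <= best["pd"] + epsilon:
        return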
4.8 Search Problems with Minimum Transit Time Constraints

The OSPT problem described in Section 4.4, illustrated in Figure 4.2, deals with scenarios where no searching takes place while traversing the intervening space between regions. In contrast, some indoor environments, such as offices, may be better represented by contiguous regions of rooms and corridors, where travelling from one region to another may require passing through and searching neighbouring regions, as illustrated in Figure 4.16. This section presents the Generalised Optimal Searcher Path problem (GOSP), which retains the inter-region path constraints of the OSP but additionally stipulates the minimum number of time steps for which each region must be searched before the searcher can start searching a neighbouring region. This formulation allows a search of an indoor area to be intuitively modelled as the
successive investigation of individual regions while incorporating the natural travel constraints in a manner more realistic than the scenarios discussed so far. Section 4.8.2 outlines a modified branch and bound optimisation framework for the GOSP problem. The DMEAN relaxation is adapted to provide an upper PD bound for partially enumerated GOSP paths. It is shown that, although more complex than an equivalent OSPT problem, search paths for the GOSP problem can also be quickly generated.
4.8.1 The Generalised Optimal Searcher Path Problem (GOSP)

4.8.1.1 Outline

Assume that a search area is partitioned into a set of contiguous cells $C = \{1, \ldots, N\}$ as shown in Figure 4.16. Let $S(i),\ i \in C$ include cell $i$ and the set of adjacent cells immediately reachable from cell $i$ via a shared portal.

[Figure: three contiguous cells a, b and c with an entry E, annotated with the transit times $W_{Eab}$, $W_{Eac}$, $W_{abc}$ and $W_{cba}$.]
Figure 4.16 Example search area with contiguous regions
The minimum time for a fixed-speed searcher to move from one region to another in the environment depicted in Figure 4.16 is not always well defined by the source and destination cells alone. If a searcher visits cells a , b and c in sequence, it must clearly spend at least as much time in cell b as it would take to physically move between the portal joining cell a with b and the connecting point from b to c (see Figure 4.16). More precisely, whether a searcher can move to a different cell, at a given point in time, depends on where it entered its current cell from and the furthest it could have travelled across the cell since.
Let $W_{abc} \ge 1,\ a \in C,\ b \in S(a),\ c \in S(b)$ capture this minimum time required for the searcher to travel from cell $a$ through $b$ to cell $c$. Similarly, $W_{Eab} \ge 1,\ b \in S(a)$ accounts for the time needed to initially move from the entry $E$ via the starting cell $a$ before reaching any other cell $b \in S(a)$. By default, $W_{iji} = 1,\ i \in S(j),\ j \in S(i)$, as it is assumed that a searcher can always plan a trajectory such that it returns to its previous cell in one time step.

As is the case for the OSPT problem, the target is assumed to occupy one cell at a time and move according to a Markov process specified by the matrix $\Gamma$. An initial target probability distribution $p(\cdot,1)$ is supplied and detection occurs with a given independent glimpse probability $0 \le g(\cdot,\cdot) \le 1$. Given $T$ time periods to locate the target, the searcher chooses a series of cells to visit on its route and the amount of time to spend in each. The objective once more is to maximise the cumulative probability of detecting the target within the time horizon.

Let $\sigma$ now be a sequence of cells for the searcher to visit, where each specified cell in the sequence is different from the previous one ($\sigma(i+1) \ne \sigma(i)$). The time when the searcher arrives at each cell $\sigma(i),\ 1 \le i \le |\sigma|$ is denoted by $\tau(i)$. Using the map in Figure 4.16 as an example, a searcher following the plan $\{\sigma, \tau\} = \{\{a,b,c\}, \{1,5,7\}\}$ would search cell $a$ for time periods 1 to 4, cell $b$ from steps 5 to 6, and cell $c$ from step 7 onwards to $T$. For convenience, the cell $x(\sigma,\tau,t) \in C$ searched at each time step $t = 1, \ldots, T$ when following the plan specified by $\sigma$ and $\tau$ can be retrieved by:
$$x(\psi,\tau,t) = \begin{cases} \psi(i), & \text{if } \tau(i) \le t < \tau(i+1),\ 1 \le i < \ell(\psi) \\ \psi(\ell(\psi)), & \text{if } t \ge \tau(\ell(\psi)) \end{cases} \quad (4.7)$$
Following the update equation (4.1) of the OSPT, the undetected target probability is updated for target motion and the effects of searches using:

$$p(\cdot,t) = p(\cdot,t-1)\,M_{x(\psi,\tau,t-1),\,t-1}\,\Phi, \quad 2 \le t \le T \quad (4.8)$$
4.8.1.2 Objective
The objective for the GOSP problem can then be stated as:

$$\max_{\psi,\tau}\ PD(\psi,\tau) = \sum_{t=1}^{T} p\big(x(\psi,\tau,t),t\big)\, g\big(x(\psi,\tau,t),t\big),$$

Subject to:

$$\psi(i+1) \in S(\psi(i)) \setminus \{\psi(i)\} \quad (4.9)$$

$$\tau(i+1) \ge \tau(i) + W_{\psi(i-1)\psi(i)\psi(i+1)}, \quad 1 \le i < \ell(\psi) \quad (4.10)$$

$$\tau(\ell(\psi)) \le T \quad (4.11)$$
The undetected target probability $p(x(\psi,\tau,t),t)$ can be obtained using equation (4.8), while the searcher location $x(\psi,\tau,t)$ is resolved via (4.7). $\psi(0) = E$ represents the searcher's commencement at the entry, and $\tau(1) = 1$ is assumed for the purposes of constraint (4.10). The need to travel through a cell for a non-uniform minimum time before reaching the next cell differentiates this problem from both the original OSP and the OSPT problems. It reduces to an OSP problem when the values of $W_{abc}$ between connected cells $a$, $b$ and $c$ are uniformly set to 1. The GOSP problem is therefore also NP-Complete.
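To make the formulation concrete, the following minimal sketch evaluates the PD of a candidate plan $(\psi,\tau)$ by resolving $x(\psi,\tau,t)$ via (4.7) and applying the update (4.8). Python is used purely for illustration (the computational work in this thesis was done in MATLAB), and all names are assumptions: `Phi` is the $N \times N$ target transition matrix and `g[i, t-1]` the glimpse probability in cell `i` at time `t`.

```python
import numpy as np

def plan_pd(psi, tau, p0, Phi, g, T):
    """PD of a GOSP plan (psi, tau); cells are 0-indexed for convenience.

    psi -- sequence of cells to visit, psi[i+1] != psi[i]
    tau -- arrival times (tau[0] == 1) for each cell in psi
    p0  -- initial target distribution p(., 1), length N
    """
    p = np.array(p0, dtype=float)      # undetected target probability
    pd = 0.0
    for t in range(1, T + 1):
        # Resolve x(psi, tau, t), the cell searched at time t (eq. 4.7).
        i = max(k for k in range(len(psi)) if tau[k] <= t)
        cell = psi[i]
        # Accumulate detections, then condition on non-detection.
        pd += p[cell] * g[cell, t - 1]
        p[cell] *= 1.0 - g[cell, t - 1]
        if t < T:
            p = p @ Phi                # target motion (eq. 4.8)
    return pd
```

For the example in the text, `plan_pd` would be called with `psi = [a, b, c]` and `tau = [1, 5, 7]` once the cells are renamed to indices.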
4.8.2 Branch and Bound Algorithm for the GOSP Problem
The approach below further adapts the OSPT branch and bound algorithm described in Section 4.5 to expand only those options in the solution tree satisfying the new minimum transit time constraint (4.10). To facilitate early fathoming, each node in the tree represents the searching of a given cell for one added time step. The algorithm thus directly finds $x^*(\psi^*,\tau^*,t)$, the cell occupied at each time step by the searcher in a path that maximises the probability of detection, before retrieving $\psi^*$ and $\tau^*$ from the equivalent cell sequence. Let $K(s)$ now be a set of 5-tuples {nextcell, time, upperbound, lastcell, duration} representing GOSP path continuations yet to be explored after a given sequence of $s$ one-time-unit searches. The first three fields are identical to those used for the OSPT problem in Section 4.5. The additional fourth field stores the last location of the searcher different from nextcell, and the fifth counts the number of consecutive time steps for which cell nextcell would have been searched.
Algorithm for Branch and Bound (GOSP):
1. Set $s = 1$, $K(s) = \{\{\psi(1), 1, 0, E, 1\}\}$ and initialise $p^*$ to a value less than 0.
2. If $K(s)$ is empty, let $s = s - 1$, else go to 4.
3. If $s < 1$, go to 8, else go to 2.
4. Selection: Remove from $K(s)$ a tuple $\{\chi_s, t_s, p_s, \lambda_s, \delta_s\}$ chosen according to a selection criterion.
5. If $p_s \le p^*$, this extension can be fathomed, go to 2.
6. Else Branch: For each cell $c \in S(\chi_s)$, if $t_s + 1 \le T$ and either $c = \chi_s$ or $\delta_s \ge W_{\lambda_s \chi_s c}$, obtain $p_c$, the upper PD bound for any plans beginning with the path $\{\chi_1,\dots,\chi_s,c\}$. If $c = \chi_s$, add the tuple $\{c, t_s + 1, p_c, \lambda_s, \delta_s + 1\}$ to $K(s+1)$, else add $\{c, t_s + 1, p_c, \chi_s, 1\}$.
7. If no tuples were added to $K(s+1)$, the current extension is a leaf and no more searches can be done: let $p^* = p_s$, store $\{\chi_1,\dots,\chi_s\}$ as the incumbent best path and go to 2. Else let $s = s + 1$ and go to 4.
8. Stop, the last saved path is optimal and $p^*$ is the maximum detection probability.
Figure 4.17 Tree of searcher actions up to T=5 for the example in Figure 4.16, when all the minimum cell transit times to a different cell are set to 2. Brackets denote how long each cell has been successively searched.
The initialisation of the tree in step 1 differs from the branch and bound approach in Section 4.5, since the searcher is limited to searching a particular cell in the first time interval. Similarly, step 6 is modified to ensure that only cells reachable by the searcher with respect to the minimum transit times are considered. Step 4 always selects the tuple $\{\chi_s, t_s, p_s, \lambda_s, \delta_s\}$ in $K(s)$ with the highest bound $p_s$.
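A minimal sketch of this search loop follows, mirroring steps 1-8 above under stated assumptions: `upper_bound(path)` is a hypothetical bound such as the adapted DMEAN relaxation of Section 4.8.3, `path_pd(path)` returns the exact PD of a complete path, and `W` is keyed by (lastcell, cell, nextcell) with the entry marker `'E'` as the initial last cell.

```python
def gosp_branch_and_bound(S, W, T, start_cell, upper_bound, path_pd):
    """Depth-first GOSP branch and bound over the 5-tuple frames
    (path, time, upperbound, lastcell, duration)."""
    best_pd, best_path = -1.0, None
    stack = [([start_cell], 1, float('inf'), 'E', 1)]
    while stack:
        path, t, bound, last, dur = stack.pop()
        if bound <= best_pd:
            continue                              # fathomed (step 5)
        children = []
        if t + 1 <= T:
            cur = path[-1]
            for c in S[cur]:                      # branch (step 6)
                ext = path + [c]
                if c == cur:                      # stay: duration + 1
                    children.append((ext, t + 1, upper_bound(ext), last, dur + 1))
                elif dur >= W[(last, cur, c)]:    # transit time satisfied
                    children.append((ext, t + 1, upper_bound(ext), cur, 1))
        if children:
            children.sort(key=lambda f: f[2])     # highest bound expanded first
            stack.extend(children)
        else:                                     # leaf (step 7)
            pd = path_pd(path)
            if pd > best_pd:
                best_pd, best_path = pd, path
    return best_path, best_pd
```

Evaluating the exact PD at each leaf via `path_pd` is a simplification made for clarity; for a full-length path the stored bound already coincides with the achieved PD.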
4.8.3 DMEAN Bound for the GOSP Problem
A relaxation based on maximising ED can once more be employed to obtain an upper PD bound. Figure 4.18 shows how a network originally designed for the OSP problem can be augmented to provide bounds for the GOSP problem.
Figure 4.18 ED network for the GOSP problem.
Assume $W_{ijk} = 2$, $i \in C \cup \{E\}$, $j,k \in C$, $i \ne k$, for the map shown in Figure 4.16, such that the searcher must always search a given cell for two time steps before moving on to another, except when it is returning to the cell previously searched. In addition to nodes $a$, $b$ and $c$, specially-labelled nodes are needed to help retain the extra information required to enforce the new travel constraint. For instance, if the searcher is at cell $a$ at time $k$ and can search cell $b$ at time $k+1$, the corresponding node $\{a,k\}$ would be joined with an arc to node $\{ab1,k+1\}$. The label prefix $a$ retains the previous searcher location, and the appended number denotes for how long the current cell, $b$ in this case, has been repeatedly searched. These special nodes are only necessary up to the point where the values they contain play any role in constraining future movement; an original node $\{i,t\}$ is still used once a searcher has travelled through a cell for sufficient time that it can reach any other cell allowed by $S(i)$. An arc drawn from $\{ab1,k+1\}$ to $\{b,k+2\}$ would represent the searcher staying in cell $b$ again at time $k+2$. This
network is thus a generalisation of that used with the OSP DMEAN/MEAN techniques in Section 4.6. Reduced rewards, similar in manner to those described in Section 4.6, can once more be assigned such that the longest path in the DAG network yields a discounted expected number of detections. More specifically, each arc linking a node representing the search of cell $i$ at time $t-1$ with the search of cell $j$ at time $t$ is assigned the weight of:

$$R_{DMEAN}(j,t,i) = \big(P(j,t) - P(i,t-1)\,g(i,t-1)\,\Phi(i,j,t-1,t)\big)\,g(j,t) \quad (4.12)$$

The upper PD bound for a partial path defined until time $k$ can then be calculated by finding the longest path in a corresponding discounted ED network (Figure 4.18), where arcs heading into nodes at time $k+1$ retain the $R_{MEAN}(\cdot,k+1)$ rewards, but arcs heading to nodes $\{j,t\}$, $t > k+1$, are instead given the reduced weights of $R_{DMEAN}(j,t,i)$.
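Equation (4.12) translates directly into an arc-weight routine. In the sketch below the indexing is an assumption: `P[j][t]` and `g[j][t]` are the undetected probability and glimpse probability of cell `j` at time `t`, and `phi[i][j]` the one-step target transition probability.

```python
def r_dmean(P, g, phi, j, t, i):
    """Reduced reward (eq. 4.12) for an arc from searching cell i at
    time t-1 to searching cell j at time t: the expected gain in cell j
    is discounted by the mass that would already have been detected in
    cell i and then moved to j."""
    return (P[j][t] - P[i][t - 1] * g[i][t - 1] * phi[i][j]) * g[j][t]
```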
4.8.4 Example Search of an Office Environment
Figure 4.19 Example GOSP search area. Minimum transit times shown on dotted lines.
Figure 4.19 shows an example search of an environment modelled using the GOSP formulation. Each of the rooms and the corridor is represented by a separate cell, and the initial distribution $p(\cdot,1)$ is set with the assumption that the target could be in either cell 2 or cell 8 at time 1 with equal probability. At each time period, the target either stays in its current cell, with a probability of 0.8, or, with a probability of 0.2, moves randomly to a cell immediately on its right. Commencing at the entry on the left side of the corridor, a searcher with a detection probability of $g(\cdot,\cdot) = 0.6$ is given $T = 15$ time steps to find the target. The optimal searcher plan, computed via branch and bound and confirmed with exhaustive enumeration, is $\psi = \{1,8,1,2,3,5,7,1,12\}$ and $\tau = \{1,2,3,4,6,8,9,12,13\}$, which leads to a target detection probability of 0.9. More descriptively, the plan calls for the searcher to first search room 8 in the hope that the target actually does begin in the bottom row. If the target is not found there, then the more likely hypothesis is that the target started in cell 2, and the searcher should then try to catch up with the moving target across the top row. In the meantime, most of the probability mass belonging to the bottom-row target hypothesis will accumulate in room 12, making it a sensible place to search if the target is still undetected after the top row is swept.
4.8.5 Computational Complexity
As extra nodes are necessary to enforce the new transit constraints, the actual complexity of applying the DMEAN method to a GOSP problem is sensitive to $c$, the degree of cell connectivity, as well as the minimum number of time steps required to traverse a cell. For a worst-case problem where the searcher at each time step can always move on to $c+1$ cells (including its present cell) without restriction, but must spend $L$ more time periods transiting to another cell very far away, the requisite ED network for finding the bound contains $(c+1)LN$ extra nodes per time step (excluding nodes representing the entry) and creates up to $c(c+1)LN(T-1)$ additional arcs compared with the corresponding OSP case. The overall complexity of obtaining the necessary longest path thus becomes $O(NcT + Nc^2L(T-1))$. Calculating an equivalent MEAN bound has the same complexity, so the tighter DMEAN bound is clearly preferable for the GOSP problem. Table 4.3 illustrates the effect of bounding on reducing computation times when the branch and bound algorithm described above is used. For step 6 of the algorithm, the bounding methods respectively do nothing (set $p_c = 1$), use the MEAN bound, or apply the DMEAN method to obtain an upper PD bound. The first approach is equivalent to an exhaustive search. The second column of Table 4.3 shows the number of feasible plans fully expanded under each approach, and the third column shows the number of fathoming attempts (step 5) conducted. The last column shows the corresponding computation times in a MATLAB branch and bound implementation.
Bounding method      Plans fully examined   Bounding attempts   Solution time (s)
None (exhaustive)    13117491               5679078             300
MEAN                 82                     775020              64.9
DMEAN                59                     108470              15.2

Table 4.3 Branch and bound computation for GOSP example, T=15.
While there are more than 13 million feasible plans for this problem, the number of bounding attempts is significantly less when the MEAN bound is employed. The tighter DMEAN bound further aids in fathoming branches of related plans as early as possible and leads to an even quicker solution time.
4.9 Discussion and Summary
4.9.1 Potential Extensions
This section discusses two possible extensions for further reducing computation times. While not fully explored as part of the current work, their potential uses and limitations are briefly discussed.
4.9.1.1 Further Discounting of Rewards
The DMEAN bound presented in this chapter may be described as a MEAN bound with a single-step look-back, in that the projected gain for searching a cell according to the MEAN method is reduced with respect to the immediately preceding cell visited. Since this can be calculated by finding a longest path in the same $N \times T$ node network as the original MEAN method, a tighter bound is obtained at little extra computational cost. An even tighter bound can be obtained by looking back an additional number of steps. This, however, implies a significant increase in computation. As an illustration, while two nodes, $\{1,t\}$ and $\{2,t\}$, are sufficient to represent the searches at time $t$ when finding the DMEAN bound for a two-cell OSP problem, every pair of nodes in the ED network for calculating a two-step discounted bound will also need to uniquely identify a sequence of three cells searched. Figure 4.20 shows an instance of such a network consisting of nodes $\{1{:}1,t\}$, $\{1{:}2,t\}$, $\{2{:}1,t\}$ and $\{2{:}2,t\}$. Each node's label in this case signifies both the cell to be searched and the previous one visited; an arc joining the nodes $\{1{:}2,t\}$ and $\{2{:}1,t+1\}$ therefore
represents the consecutive searches of cell 2 at time $t+1$, cell 1 at time $t$, and cell 2 prior to that. After assigning each arc's reward to appropriately discount for the two previous cells searched, a two-step discounted MEAN bound can then be found via the longest path in the enlarged network. In general, discounting $d \ge 1$ steps for a map of $n$ meshed cells would require an ED network consisting of $n^d T$ nodes.
Figure 4.20 Two-step discounting for a two-cell problem. For example, the reward of the arc entering node {2:2,Tk+2} (black) is linked to the prior searches of cell 2 at time Tk+1 and cell 1 at Tk.
From a number of problems attempted, cases involving a small number of cells, a high glimpse probability, and/or a slow target (situations where the MEAN/DMEAN bounds perform less well compared with FABC) appear to benefit from two-step discounting. The use of a two-step discounted bound also halved the solution time for the OSPT scenario in Section 4.7.3. On the other hand, problems using OSP grids (Figure 2.2) larger than 13×13 cells actually required much more time to solve as a result. Since there is clearly a point at which the higher computation cost becomes prohibitive, the choice of the number of discounting steps depends on the individual trade-off between the extra complexity and the potential for tighter bounds.
4.9.1.2 Planning via Solving a k-Longest Path Problem
Section 4.5.2 described the use of the longest path in an ED network, such as that of Figure 4.5, to provide an upper PD bound for the equivalent search problem. In fact, the length of the $k$th longest path in the network also bounds from above the PD attainable by any path other than the $k-1$ longer ones; once this length no longer exceeds the highest PD corresponding to those $k-1$ paths, no further path can be optimal. This observation suggests that the desired PD-maximising path may also be found by examining the longest paths in an ED network in decreasing order of length until it is certain that the next longest path cannot be optimal. The following outlines one such approach.
OSPT Path Ranking Algorithm:
1. Construct an ED network where the weight of each arc estimates the utility of searching a cell at time steps $1,\dots,T$. For example, MEAN, DMEAN or further discounted rewards may be used.
2. Set $p^* = 0$, $\pi^* = \{\}$, and $k = 1$.
3. Find $\pi$, the $k$th longest path in the network.
4. If length($\pi$) $\le p^*$, go to 8.
5. Calculate PD($\pi$), the PD accrued if the searcher follows this path.
6. If PD($\pi$) $> p^*$, set $p^* = $ PD($\pi$) and $\pi^* = \pi$.
7. Set $k = k + 1$. Go to 3.
8. Stop, $\pi^*$ is the optimal search path with a PD of $p^*$.
The longest paths required by step 3 can be found with an existing k-best path ranking algorithm (Yen, 1970; Lawler, 1971; Eppstein, 1998; Martins and Pascoal, 2003). These algorithms are typically used where a solution satisfying more than one objective is sought. Since the ED network is already in the convenient form of a Directed Acyclic Graph (DAG), the complexity of finding the paths with any of these techniques is significantly smaller than the corresponding computation of each path's PD value (step 5). The potential appeal of this approach lies in the fact that it enumerates search paths in the approximate order of their usefulness, as estimated by the corresponding MEAN or DMEAN rewards. It is possible to stop computation at any time and simply use the best plan found thus far, in a similar manner to using branch and bound. However, the ordered path enumeration arguably makes it more likely for a "good" path to have been encountered at an early stage. This was found to be the case for many of the example problems tested using the algorithm, although whether it applies more generally remains to be investigated. As the approach only requires the construction of a single ED network, it is also more feasible to use a larger network with rewards that incorporate 2, 3 or more steps of discounting, as discussed in Section 4.9.1.1. Although the network increases in size with each extra step of discounting beyond the first, this has the advantage of reducing the number of longest paths that need to be examined before the terminating condition (step 4 of the algorithm) is met.
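The ranking loop itself is simple. The sketch below assumes a hypothetical helper `kth_longest_path(k)`, built on one of the cited path-ranking algorithms over the DAG, returning the $k$th longest path and its length, together with a `path_pd` routine for the exact PD of a candidate path.

```python
def rank_and_verify(kth_longest_path, path_pd, max_k=100000):
    """OSPT path ranking (steps 2-8 above): examine ED paths in
    decreasing length until the next length cannot beat the best PD."""
    best_pd, best_path = 0.0, None
    for k in range(1, max_k + 1):
        path, length = kth_longest_path(k)
        if length <= best_pd:          # no remaining path can be optimal
            break
        pd = path_pd(path)             # exact PD is the expensive step
        if pd > best_pd:
            best_pd, best_path = pd, path
    return best_path, best_pd
```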
Some limitations were identified through initial experiments with an implementation of this approach. In particular, while the optimal path is often found early in the execution of the algorithm, many more subsequent paths have to be tested before the process can safely terminate. Moreover, given that the length of each path diverges further from its actual PD with each additional time step, many of the longest paths found at the beginning will still be clearly suboptimal if the problem time horizon is large. This second disadvantage can be mitigated with a modified longest path ranking algorithm that skips groups of related paths if they cannot possibly improve on the incumbent, in a manner akin to the fathoming of solutions in branch and bound. In particular, the family of k-best path algorithms that find new paths via deviations from known ones (Martins and Pascoal, 2000) is amenable to this modification. This adaptation remains future work. In its current form, the approach of examining potential search paths in decreasing order of their length in an equivalent ED network therefore better serves as a potential alternative to expanding all the children (and all subsequent continuations) of a node in a branch and bound framework when only a limited number of time steps remain.
4.9.2 Choice of Bounds for the OSP Problem
The effectiveness of a branch and bound solution process depends primarily on the tightness of the chosen bound weighed against its ease of computation. The results in Section 4.7 show DMEAN to have superior performance to existing bounds for the OSP problem when the target is reasonably energetic. It is therefore suitable as a replacement for the MEAN, PROP and FABC bounds for most moving-target problems. FABC may still be the preferred bound for the OSP, however, when the target is known to be very slow and the time horizon is large. Although the choice of bounds always depends on the particular case at hand, the DMEAN bound's ease of computation and ability to better retain tightness across a range of problem parameter values render it competitive in a wider range of situations.
4.9.3 Alternative Branch and Bound Approaches
This chapter adopted an eager depth-first branch and bound approach in solving both the OSPT and GOSP problems. It is eager (Clausen and Perregaard, 1999) in the sense that the upper bound associated with each child node is calculated as soon as that node is branched from the parent, while the depth-first expansion of candidate paths keeps the amount of memory used low. A simple strategy of always selecting the open node with the highest PD bound for expansion is used. While all existing OSP branch and bound frameworks in the literature follow a similar approach, a number of more advanced node selection strategies exist more generally. For example, depth-first search is often coupled with a selection strategy that chooses the node whose children have the greatest difference in bound values. This difference is desirable in that branching from the child node with the larger bound quickly establishes a good incumbent solution value; once the process returns to examine the other child with the very small bound, that child is then even more likely to be immediately fathomed (Clausen, 1999). Given sufficient memory, breadth-first expansions that choose between open nodes in all branches may also be used to some effect. It remains to be seen whether the more advanced techniques developed for branch and bound in the general literature can significantly speed up computation for the related OSP, OSPT and GOSP search problems discussed in this chapter. Naturally, there are other established search techniques, including best-first search, A* (Hart et al., 1968), and variants such as Iterative Deepening A* (IDA*) (Korf, 1985), Memory-bounded A* (MA*) and D* (Stentz, 1994), which can explore a problem state space in an informed manner. While these methods all differ in their trade-offs of computational, space and implementation complexity, there are some commonalities with the methods presented in this chapter. For instance, A* can be seen to essentially perform a best-first branch and bound search in which the bound is a heuristic underestimate of the remaining cost to a goal. Viewed in this light, DMEAN could also be readily incorporated into the heuristic to guide an A* search towards a path minimising a cost representing the probability of non-detection.
4.9.4 Application Issues and Other Related Work
The OSPT problem described in this chapter captures situations where the searcher can more appropriately sense at distinct locations in the environment before moving on to the next, such as buildings in a cluster or rooms with specific vantage points. Under similar assumptions, but for a stationary target, the search problem investigated by DasGupta et al. (2006) aggregated a continuous search space into discrete tiles. An alternative problem is presented in Section 4.8 for the case where the searcher senses at every time step but must spend as much time in each cell as is necessary to physically travel to another, with respect to the previous cell visited. The region decomposition adopted allows for finer control of the searcher's path, at the cost of an increase in the number of states linked to the degree of region connectivity and the inter-region travel times. Targeted more towards the search of open indoor environments such as offices, the GOSP problem is also amenable to the use of a modified DMEAN bound. An even more generalised framework was provided by Kierstead and DelBalzo (2003), which uses a genetic algorithm to construct continuous paths through arbitrary search environments. One added advantage of their method lies in its ability to support targets that react to the searcher, as long as an appropriate motion model is supplied. Noting that simple Markov models can lead to seemingly nonsensical movements, even for a wandering person, when directly applied to structured environments, Moors and Schulz (2006) sampled target paths planned with random destinations before approximating the recorded target tracks in a second-order Markov model. Although time-consuming to learn offline in the first instance, the improved models evolve visually more convincing target distributions for indoor areas, while remaining principally compatible with existing OSP approaches. Given modifications to account for the multi-step passage of a target through the larger cells, the opportunity exists for a similar approach to be adapted for the OSPT. Sarmiento et al. (2003) formulated the visibility-based search of a polygonal region as a combinatorial optimisation of vantage points to visit. A set of sensing locations covering the search area is given as input, with branch and bound then used to determine the visit sequence that finds an object evenly distributed in the environment in the minimal expected time. This was later extended (Sarmiento et al., 2004) to seek optimal continuous search paths for a sensor with fixed speed. The emphasis on finding a stationary target via infinite-range visual coverage further differentiates these works from the OSPT/GOSP problems considered in this chapter. Bourgault et al. (2004) searched for an object at sea using multiple UAVs, with the area partitioned sufficiently finely that flight constraints are directly preserved. However, this, combined with the short look-ahead window of the corresponding solution method, renders the approach less suitable for the indoor search scenarios considered here. Notwithstanding the improved performance provided by the DMEAN bound, whether the time needed to compute the optimal solution is acceptable naturally depends on the individual application. From an implementation perspective, drastically quicker solution times can be obtained with minimal change by accepting suboptimal solutions in the branch and bound process (Washburn, 1995), or by implementing a rolling horizon technique and computing individually shorter plans (Dell et al., 1996). Additionally, the number of regions itself can be kept manageable by first hierarchically grouping related regions. For problems of even larger size, alternative heuristics such as genetic algorithms (Kierstead and DelBalzo, 2003; Dell et al., 1996) or greedier methods (Dell et al., 1996; Sarmiento et al., 2003; Moors and Schulz, 2006) may then be required.
4.9.5 Summary
This chapter formulated the Optimal Searcher Path problem with non-uniform Travel times (OSPT), which extends the Optimal Searcher Path problem (OSP) in the search literature to better account for travel between disparate discrete regions in the environment. Although primarily aimed at searching structured environments, the formulation can be applied more generally to situations where the target moves between the regions faster than the search effort can be redeployed. Complementing the OSPT problem, the Generalised Optimal Searcher Path problem (GOSP) in turn extended the OSP formulation to model a search respecting not only cell connectivity but also transit time constraints. The second formulation discretely models the path of a searcher at a finer scale, thereby facilitating the optimisation of indoor searches, as seen through the example in Section 4.8.4. An OSP branch and bound method was adapted to solve the problems proposed, with the main contribution being an improvement that successfully tightens the bound across a range of situations with little additional computation. Results show that the DMEAN relaxation also leads to faster overall computation times for OSP problems involving fast-moving targets, compared with other known bounding techniques. The next chapter considers a multi-agent extension of the OSP problem in which some agents can only help to detect, but not engage, the target. The more capable searchers must therefore plan not only to maximise their probability of directly detecting the target, but also to ensure they are in the best position to take advantage of any new information uncovered by the less capable scouts. Aimed at scenarios where a team of searchers is aided by a larger group of scouts, an optimal solution approach leveraging the work developed in this chapter is presented along with a number of heuristics.
5 Multi-Agent Search with Interim Positive Information
5.1 Introduction
This chapter considers searching by a heterogeneous team in which searchers are aided by mobile sensor scouts that are able to provide information but cannot engage the target. Envisaged scenarios include a team of fire-fighters entering a burning building with a number of scouting robots, which collectively search for a moving victim until a fire-fighter can arrive to render aid. Unlike the search problems discussed so far in this thesis, the search process does not necessarily terminate on detection of the target by the scouts. Moreover, the target can continue moving and leave the cell where it was found. As such, scout detections only serve to improve the information available to the searchers. The solution framework should therefore optimally balance maximising the searchers' own ability to directly find the target against maximising their ability to respond to and take advantage of possible future detections by scouts. Section 5.2 describes the multi-agent search problem addressed in this chapter and provides a detailed definition. A strategy for obtaining optimal policies under the assumption of uniform cells is presented in Section 5.3. Unlike the search problems described in Chapter 3 and Chapter 4, finding an optimal solution to the multi-agent search problem in a realistically sized environment, even with the uniform cell assumption, requires long computation times. As such, a number of heuristics developed to solve this problem are described in Section 5.4. Section 5.5 evaluates the optimal and heuristic approaches using a range of examples. Section 5.6 discusses the extension for when travel times are non-uniform and summarises the chapter.
5.2 Searching with the Aid of Scouts
5.2.1 Overview
Figure 5.1 depicts a typical scenario analysed in this chapter. Two types of agents, searchers and scouts, are responsible for locating a target moving through an environment divided into a set of cells $C = \{1,\dots,N\}$. Each agent may move from one cell to another at each time step, subject to the environment's structure. A searcher can detect the target with some finite probability if both it and the target occupy the same
cell. Once the searcher detects the target, the target is considered to have been serviced (rescued/engaged) and the search process terminates. A scout may similarly detect a target with which it shares a cell. However, it cannot service the target and may only update the team's target information to reflect its confirmed current location. The main role of a scout is therefore to gather information and trigger a change of searcher and scout paths, where necessary, to better intercept the target in the future. A search team consists of one or more searchers working in conjunction with zero or more scouts. The objective of the search team is to maximise the probability of detecting and engaging the target in T time steps. There are two approaches to addressing this problem: (1) plan the initial paths for the team to follow and appropriately replan whenever a scout detects the target; or, equivalently, (2) plan a complete set of paths, contingent on every possibility of scout detection along each path, and follow the appropriate paths depending on the actual occurrence of scout detections. Similar considerations apply when multiple search vehicles are available but only some have the payload to interact with the target (Rybski et al., 2000); the remaining platforms should then do what they can to share the workload of the more capable searchers. Although the scouts cannot directly service the target, they contribute by covering ground not reached by the searcher and by revisiting already inspected cells while the searchers move to more rewarding areas. Consider an example where the search team has two time steps to find a stationary target located with equal probability in one of two cells. A searcher on its own would naturally search each cell once, but has a finite chance of overlooking the target in each instance. However, the chance of not detecting a target that is actually in the first cell can be reduced if a scout searches there at the first time step as well. The searcher can then either re-examine the first cell if the scout detects the target, or proceed to the second cell as before. Clearly, capitalising on information from the scout results in a higher probability of servicing the target than using the searcher alone.
Figure 5.1 Searcher scouring an area with the aid of scouts
5.2.2 Problem Statement
Target
As was assumed for the problems in Chapter 4, the target occupies one cell of $C = \{1,\dots,N\}$ at a time and moves according to a specified Markov process at each time step. A prior probability distribution $p(\cdot,1) = [p(1,1), p(2,1), \dots, p(N,1)]$ of the target at time 1 is initially supplied. The distribution evolves according to the formula $p(\cdot,t+1) = p(\cdot,t)''\,\Phi$, where $p(\cdot,t)''$ denotes the updated undetected target probability at time $t$, after the effects of sensing by searchers and scouts in the same time step have been taken into account.
Search Team
A search team consists of $n \ge 1$ searchers and $m \ge 0$ scouts. The searchers are indexed before the scouts for convenience, i.e., searchers are numbered as agents $1,\dots,n$ while scouts are numbered as agents $n+1,\dots,n+m$. Each agent occupies one cell at each time step; let $S_k(i)$, $i \in C$, $k \in \{1,\dots,n+m\}$, denote the set of cells that agent $k$ can directly move to from cell $i$. For clarity of explanation, the cells are assumed to be uniform, such that an agent $s$ in cell $i$ at time $t$ is constrained to move to a cell $j \in S_s(i)$ at time $t+1$, then $l \in S_s(j)$ at $t+2$, and so on. Non-uniform travel times can be handled but require synchronising the agents' actions in the event of a scout detection. This extension is further discussed in Section 5.6.2. The search team has $T$ time steps in which to find the target. Let $X(t)$ be the set of possible cell combinations occupied by the agents (team states) at time step $t = 1,\dots,T$, and $c_s(x)$, $1 \le s \le n+m$, be the cell occupied by agent $s$ in a given state $x \in X(t)$. Membership of $X(t)$ is constrained by $\omega(0)$, the initial state of the team. Let $\omega = \omega(1),\dots,\omega(T)$, $\omega(t) \in X(t)$, be a fixed plan for the searchers and scouts to follow, represented by the series of cells respectively searched by each agent in one-time-unit increments. At the commencement of the search process, each agent $s$ first moves to and searches cell $c_s(\omega(1))$ for one time period, then travels to search $c_s(\omega(2))$ for another time step. This continues until the target is detected.
Target Detection
Target detection is modelled as follows: if both an agent $s$ and the target are in cell $c_s(x)$ during time $t$, detection occurs with a glimpse probability of $g_s(c_s(x),t)$. Concurrent searches are independent; that is, if the target and both agents $s$ and $l$ are in cell $i$ at time $t$, then the probability of detection is $1 - (1-g_s(i,t))(1-g_l(i,t))$. Assuming unsuccessful detection, the un-normalised target distribution is updated via:

$$p(\cdot,t+1) = p(\cdot,t)''\,\Phi \quad (5.1)$$

$$p(j,t)'' = p(j,t)' \prod_{s=n+1}^{n+m} \big(1 - I(s,j,t)\,g_s(j,t)\big), \quad j \in \{1,\dots,N\} \quad (5.2)$$

$$p(j,t)' = p(j,t) \prod_{s=1}^{n} \big(1 - I(s,j,t)\,g_s(j,t)\big), \quad j \in \{1,\dots,N\} \quad (5.3)$$
where $I(s,j,t)$ is an indicator function that equals 1 if agent $s$ searches cell $j$ at time $t$, with the product of an empty set interpreted as 1. $p(\cdot,t)'$ denotes the undetected target probability distribution at time $t$ after the effect of only the searchers' actions in that time step (and all agent actions prior to that) has been taken into account. A successful scout detection serves to concentrate the target probability in the cell in which the target was found. In particular, when a scout detects the target in cell $j$ at time $t$, the target distribution $p(\cdot,t)''$ is set to $Q_j$, a $1 \times N$ vector with element $j$ set to 1 and the remaining elements set to 0, reflecting the confirmed target location at that time.
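A minimal sketch of this update cycle follows, under illustrative assumptions: `cells[s]` is the cell of agent `s` (searchers listed first), `g[s](j, t)` the glimpse probability of agent `s` in cell `j` at time `t`, and `Phi` the target transition matrix.

```python
import numpy as np

def update_undetected(p, cells, g, n, Phi, t):
    """One time step of eqs. (5.1)-(5.3); agents 0..n-1 are searchers
    in this 0-indexed sketch, the remainder scouts."""
    p = np.array(p, dtype=float)
    for s in range(n):                  # searchers' searches (eq. 5.3)
        p[cells[s]] *= 1.0 - g[s](cells[s], t)
    for s in range(n, len(cells)):      # scouts' searches (eq. 5.2)
        p[cells[s]] *= 1.0 - g[s](cells[s], t)
    return p @ Phi                      # target motion (eq. 5.1)

def scout_detection(N, j):
    """On a scout detection in cell j the distribution collapses to Q_j,
    the unit vector with all mass in the confirmed cell."""
    q = np.zeros(N)
    q[j] = 1.0
    return q
```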
Objective: Probability of Detection (PD)
Assuming that the scouts never detect the target, the probability of detection of a fixed plan $\omega$ from $t$ to $T$, given an initial target distribution of $p$, is:

$$PD(\omega,t,p) = \sum_{\theta=t}^{T} \sum_{j=1}^{N} p(j,\theta)\,\Big(1 - \prod_{s=1}^{n} \big(1 - I(s,j,\theta)\,g_s(j,\theta)\big)\Big) \quad (5.4)$$
For the team to be as successful as possible in taking advantage of any new information, it should be able to adapt its path after any detection by the scouts rather than simply follow a single fixed plan. Thus, instead of finding a fixed sequence of regions for each agent to follow as in Chapter 4, a set of plans containing one initial plan and a set of new plans is needed. Suppose $\omega$ is now the initial plan for the team to follow. Let $\omega_{x,j,\theta}$ be a new plan that is selected after a scout detection in cell $j$ at time $\theta$, given that the agents are in the cells indicated by state $x$. This plan will then be followed from $\theta+1,\dots,T$ or until a scout detects the target once more. Given the different possible outcomes of scout actions (e.g., scout $s$ may or may not be successful at each of the cells it inspects) and the correspondingly different target distribution for each eventuality, the effective PD of each plan can be calculated recursively via:

$$\overline{PD}(\omega,t,p) = PD(\omega,t,p) + \sum_{\theta=t}^{T-1} \sum_{j=1}^{N} p(j,\theta)'\,\Big(1 - \prod_{l=n+1}^{n+m} \big(1 - I(l,j,\theta)\,g_l(j,\theta)\big)\Big)\,\overline{PD}\big(\omega_{\omega(\theta),j,\theta},\,\theta+1,\,Q_j\big) \quad (5.5)$$

The Searcher/Scout Problem
The goal of the problem considered in the remainder of this chapter can thus be summarised as:

$$\max_{\omega,\ \omega_{\omega(1),1,1},\ \dots,\ \omega_{\omega(T-1),N,T-1}} \overline{PD}(\omega,1,p(\cdot,1)) \quad (5.6)$$

Subject to:

$$c_s(\omega(t)) \in S_s\big(c_s(\omega(t-1))\big), \quad s \in \{1,\dots,n+m\},\ t \in \{1,\dots,T\} \quad (5.7)$$

$$c_s\big(\omega_{\omega(\theta),j,\theta}(\theta+1)\big) \in S_s\big(c_s(\omega(\theta))\big), \quad s \in \{1,\dots,n+m\},\ \theta \in \{1,\dots,T-1\} \quad (5.8)$$
The objective of maximising $\overline{PD}(\omega,1,p(\cdot,1))$ in the general Searcher/Scout problem reduces to that of a multi-searcher Optimal Searcher Path problem (OSP) if only searchers, but not scouts, are used.
5.3 Optimal Policies for the Searcher/Scout Problem
5.3.1 Obtaining Optimal Plans
Search plans for teams with scouts can be obtained by:
1. Planning the initial paths for the team to follow until the first detection, at which time another set of paths is planned by solving an updated problem that incorporates the current searcher and scout locations, the updated target information, and the reduced number of time steps available until T; or, equivalently,
2. Planning a complete set of paths, contingent on every possibility of scout detection along each path, and selecting the appropriate set of paths depending on the actual occurrence of scout detections.
Since the agents must additionally plan to take advantage of potential detections by scouts and adjust accordingly whenever they actually occur, the problem is much more complicated than the Optimal Searcher Path problem (OSP) discussed in earlier chapters. The Generalised Surveillance Search Problem (GSSP) (Tierney and Kadane, 1983) provides one example framework in the literature where target detection does not necessarily terminate the search. In one of the examples presented, surveillance against smuggling operations was modelled where the target is only captured if detection occurs within national boundaries. As such, detection outside a particular subset of situations only helps in improving the quality of target information and the potential for future gain. Following similar reasoning, the Searcher/Scout problem proposed in this chapter may actually be solved instead as a detection search problem (one that always terminates once the target is found, even if by a scout) in which the reward for an agent $s$ finding the target at time $t$ when the team is in state $x \in X(t)$ is $0 \le d^*(x,s,t) \le 1$. Naturally $d^*(\cdot,s,\cdot) = 1$ if agent $s$ is a searcher, since the process actually terminates on the very first searcher detection and no further reward in the form of detection probability can be gained. However, as the utility of a scout detection ultimately depends on how the team is able to make use of the new information in the future, each value of $d^*(x,s,t)$, $s > n$, can be seen as the optimal payoff of a smaller detection sub-problem in which the target is known to start from cell $c_s(x)$ at time $t$, the agents begin in state $x$, and the searches take place during time steps $t+1,\dots,T$. In particular, the utility can be computed by solving a modified OSP problem where the reward for detection in a given state and time is specified by $d^*(\cdot,\cdot,\cdot)$ rather than always being 1. Since each such problem depends only on "future" values of $d^*$, one can first assign $d^*(x,s,T) = 0$, $s > n$, and then solve a series of progressively larger OSP problems with specified rewards, in order to obtain $d^*(x,s,t)$ successively for $t = T-1,\dots,1$ (Figure 5.2). The algorithm for finding optimal plans for the multi-agent, discrete Searcher/Scout problem is as follows.
Algorithm for the Searcher/Scout Problem:
1. Let $d^*(x,s,t) = 1$ for each $x \in X(t)$, $s \in \{1,\dots,n\}$, $t \in \{1,\dots,T\}$, and $d^*(x,s,T) = 0$ for each $x \in X(T)$, $s \in \{n+1,\dots,n+m\}$. Set $t = T-1$.
2. For each $x \in X(t)$, $s \in \{n+1,\dots,n+m\}$: Find $\omega^*$ to maximise $L_{x,Q_{c_s(x)},t+1}(d^*,\omega^*)$. Set $\omega^*_{x,c_s(x),t} = \omega^*$ and $d^*(x,s,t) = L_{x,Q_{c_s(x)},t+1}(d^*,\omega^*)$.
3. Set $t = t-1$.
4. If $t > 0$, go to Step 2, else go to Step 5.
5. Find $\omega^*$ to maximise $L_{\omega(0),p(\cdot,1),1}(d^*,\omega^*)$. $\omega^*$ is now the optimal plan for the team to first follow. If a scout $s$ finds the target at time $t$ when the team is in state $x$, the plan $\omega^*_{x,c_s(x),t}$ should then be used as an alternative. Repeat for each subsequent scout detection until the search terminates.
Here $L_{x,p,t}(d^*,\omega^*)$ denotes the optimal payoff of an OSP problem over the period $t,\dots,T$ with rewards $d^*$, given an initial team state $x$ and an initial target probability distribution (at time $t$) of $p$.
Figure 5.2 Solving the Searcher/Scout problem as a series of smaller OSP problems
Apart from the use of multiple searchers, the only difference between $L_{x,p,t}(d^*,\omega^*)$ and the OSP problem described in equation (2.2) lies in the arbitrary specification of detection rewards instead of always using a value of 1 to calculate PD, such that:

$$L_{x,p,t}(d^*,\omega^*) = \max_{\omega} \sum_{\theta=t+1}^{T} \sum_{j=1}^{N} p(j,\theta)\,\Big(1 - \prod_{s=1}^{n+m} \big(1 - I(s,j,\theta)\,g_s(j,\theta)\,d^*(\omega(\theta),s,\theta)\big)\Big) \quad (5.9)$$

Subject to:

$$c_s(\omega(\theta)) \in S_s\big(c_s(\omega(\theta-1))\big), \quad s \in \{1,\dots,n+m\},\ \theta \in \{t+1,\dots,T\} \quad (5.10)$$

The OSPT branch and bound algorithm and DMEAN bounding method described in Chapter 4 can be directly adapted to solve this OSP problem with specified rewards. The next section describes such an approach in more detail before illustrating the process with the aid of an example.
5.3.2 Solution Approach Details and Illustrative Example
5.3.2.1 Branch and Bound Method for Solving the Multi-Searcher OSP Problem with Specified Rewards
The algorithm below adapts the OSPT branch and bound approach in Chapter 4 to cater for multiple searchers and arbitrarily specified rewards. In particular, it solves a multi-searcher OSP problem with the searchers initially in the cells indicated by state $x \in X(t-1)$ and able to act during the time steps $t,\dots,T$. Following the approach in Section 4.5, $K(a)$ is a set of 3-tuples {nextstate, time, upperbound} representing plan continuations yet to be explored in a given solution branch consisting of $a$ actions. The first field refers to the next combination of cells (team state) for the team to search for one time step, the second field is the total time expended once that search is completed, and the third contains the upper bound associated with this particular extension. $p^*$ holds the best achievable reward discovered thus far. Let $S(x)$ be the set of states that are directly reachable from state $x$, such that $c_s(y) \in S_s(c_s(x))$ for all $s$ and all $y \in S(x)$.

Algorithm for Branch and Bound (Multi-Searcher OSP with specified rewards):
1. Let $\omega_0 = x$. Set $a = 0$, $K(a) = \{\{\omega_0, t-1, 0\}\}$ and $p^*$ to a value below 0.
2. If $K(a)$ is empty, let $a = a - 1$, else go to 4.
3. If $a < 0$, go to 9, else go to 2.
4. Selection: Remove from $K(a)$ a tuple $\{\omega_a, \theta_a, p_a\}$ chosen according to a selection criterion.
5. If $p_a \le p^*$, this extension can be fathomed. Go to 2.
6. Else Branch: For each state $c \in S(\omega_a)$, if $\theta_a + 1 \le T$, obtain $p_c$, the upper reward bound for any plan beginning with the sequence $\{\omega_0,\dots,\omega_a,c\}$. Add tuple $\{c, \theta_a+1, p_c\}$ to $K(a+1)$.
7. If no tuples were added to $K(a+1)$, the current extension is a leaf and no more searches can be done. Let $p^* = p_a$ and store $\{\omega_0,\dots,\omega_a\}$ as the incumbent best path. Go to 2.
8. Else let $a = a + 1$, go to 4.
9. Stop, the last saved plan is optimal with the maximum reward of $p^*$.
Since $d^*(\cdot,s,T) = 0$, $s > n$, for the Searcher/Scout problem, scout positions are irrelevant in the last time step and only states with unique combinations of searcher positions in $c \in S(\omega_a)$ need to be expanded. An algorithm that can be used to obtain the upper reward bound for Step 6 is described in the next section.
5.3.2.2 Adapted DMEAN Bound for the OSP with Specified Rewards
The following algorithm obtains an upper reward bound, to be used in the branch and bound algorithm of the previous section, for any extension of a partial plan $\omega_0$ through to $\omega_k$ from time step $T_k + 1 = k + 1$ onwards to $T$. Let $I_x(s,j)$ be an indicator function that equals 1 if agent $s$ searches cell $j$ when the team is in state $x \in X$. The terms $\Phi$, $P$ and $M$ are as defined for the MEAN algorithm in Section 4.5.2.

Algorithm for the DMEAN Bound (for OSP with specified rewards $d^*$):
1. For each time step from $T_k$ to $T$, create a graph node per cell at that time. Mark node $\{\omega_k, T_k\}$ as valid.
2. Use $P(\cdot,t) = p(\cdot,T_k)\,\Phi^{t-T_k}$ to calculate $P(\cdot,t)$ for $T_k < t \le T$.
3. From each valid node $\{x,t\}$, extend arcs to all nodes $\{y,\theta\}$, $y \in S(x)$, $\theta = t + W_{xy} \le T$. Mark the head nodes $\{y,\theta\}$ valid.
4. Assign DMEAN_Reward$(y,\theta,x,t)$ as the weight for each such new arc to $\{y,\theta\}$.
5. Repeat 3 until arcs have been extended from all valid nodes.
6. Apply a DAG longest path algorithm to find the maximum reward for paths leading from node $\{\omega_k, T_k\}$. Add this to the reward of following the sequence $\omega_0,\dots,\omega_k$ to form the upper reward bound of any continuation.

Function DMEAN_Reward$(y,\theta,x,t)$
1. Set $U = P(\cdot,\theta)$ and $Reward = 0$.
2. If $t = T_k$, go to step 4.
3. For each cell $i = 1,\dots,N$ that is searched by one or more agents in state $y$:
Set $U(i) = U(i) - \sum_{l=1}^{N} P(l,t)\,\Big(1 - \prod_{s=1}^{n+m} \big(1 - g_s(l,t)\,I_x(s,l)\big)\Big)\,M(l,i,t,\theta)$.
$U(i)$ is now the probability that the undetected target is in cell $i$ at time $\theta$, given that the (multiple) cells indicated by $x$ were searched at time $t$ as well.
4. For each agent $s$, in decreasing order of $d^*(y,s,\theta)$: Set $p_d = U(c_s(y))\,g_s(c_s(y),\theta)$. Set $U(c_s(y)) = U(c_s(y)) - p_d$. Set $Reward = Reward + p_d\,d^*(y,s,\theta)$.
5. Return $Reward$.
Note that step 4 of DMEAN_Reward ensures that, if multiple agents are in the same cell at time $\theta$, searching is credited in an order that maximises the expected reward. This ordering preserves the preference for searcher detections over scout detections in the original Searcher/Scout problem.
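A sketch of DMEAN_Reward follows; the indexing conventions and callables are all assumptions standing in for the quantities defined above: `P(c, t)` and `M(l, i, t, theta)` as in Section 4.5.2, `g(s, c, t)` the glimpse probability of agent `s`, `d(s, theta)` the specified reward $d^*(y,s,\theta)$ for the fixed state `y`, and `Ix(s, l)` the indicator for agent `s` searching cell `l` in state `x`.

```python
from math import prod

def dmean_reward(P, M, g, d, Ix, y_cells, theta, x_cells, t, Tk, N):
    """Sketch of DMEAN_Reward(y, theta, x, t); y_cells/x_cells list the
    agents' cells in states y and x respectively."""
    U = [P(i, theta) for i in range(N)]
    reward = 0.0
    if t > Tk:  # step 3: discount for the searches performed in state x
        for i in set(y_cells):
            U[i] -= sum(
                P(l, t)
                * (1 - prod(1 - g(s, l, t) * Ix(s, l)
                            for s in range(len(x_cells))))
                * M(l, i, t, theta)
                for l in range(N))
    # Step 4: credit detections, highest-reward agent first, preserving
    # the preference for searcher detections over scout detections.
    for s in sorted(range(len(y_cells)), key=lambda a: -d(a, theta)):
        c = y_cells[s]
        pd = U[c] * g(s, c, theta)
        U[c] -= pd
        reward += pd * d(s, theta)
    return reward
```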
5.3.2.3 Illustrative Example
Consider a small example where one searcher and one scout have $T = 3$ time steps to search for a target moving amongst three cells connected in a line (Figure 5.3). At each time step the target stays in the same cell with 80% probability and otherwise moves to an adjacent cell; an initial uniform distribution $p(1,1) = p(2,1) = p(3,1) = 1/3$ is assumed. Glimpse probabilities are set to $g_s(\cdot,\cdot) = 0.6$, $s = 1,2$. The searcher and scout both begin in cell 1, and at each time step may move to an adjacent cell. Table 5.1 shows the values of $d^*$ after the Algorithm for the Searcher/Scout Problem in Section 5.3.1 has been applied.
Figure 5.3 Example search area with 3 cells

Agent 1 (searcher), $d^*(x,1,t)$:
x     1,1    1,2    1,3   2,1    2,2    2,3   3,1   3,2   3,3
t=1   1      1      1     1      1      1     1     1     1
t=2   1      1      1     1      1      1     1     1     1
t=3   1      1      1     1      1      1     1     1     1

Agent 2 (scout), $d^*(x,2,t)$:
x     1,1     1,2     1,3   2,1     2,2     2,3   3,1   3,2   3,3
t=1   0.696   0.6792  x     0.6960  0.6792  x     x     x     x
t=2   0.48    0.48    0.12  0.48    0.48    0.48  0.12  0.48  0.48
t=3   0       0       0     0       0       0     0     0     0

Table 5.1 Reward values for the example with one searcher and one scout. x denotes values not calculated for states unreachable at that time.
As can be seen, if a scout finds the target with one time step remaining (at t=2) when the searcher is in the same or an adjacent cell, then at the last time step there is an 80%×60% = 48% chance that the searcher will detect the target. However, if the searcher is at the opposite end of the map from the scout (e.g., when the team is in state {1,3}), then the best it can do is search a cell containing 20% of the undetected probability mass, resulting in a detection probability of 12%. The value of a scout detection tends to increase with the amount of time remaining, as the searcher is then usually better able to respond. Table 5.2 shows the corresponding optimal plans to be used if the scout detects the target at each time. The optimal PD when searching with one searcher and one scout is 0.64990, whereas using the searcher alone has a 0.56864 probability of finding the target. Table 5.3 shows the optimal PD when using different numbers of searchers and scouts for the same problem.
State when scout detects the target           Agent 1 (searcher) path   Agent 2 (scout) path
(time; x = {searcher cell, scout cell})
t=1, x={1,1}                                  1,1                       2,x
t=1, x={1,2}                                  2,2                       1,x
t=1, x={2,1}                                  1,1                       2,x
t=1, x={2,2}                                  2,2                       1,x
t=2, x={1,1}                                  1                         x
t=2, x={1,2}                                  2                         x
t=2, x={1,3}                                  2                         x
t=2, x={2,1}                                  1                         x
t=2, x={2,2}                                  2                         x
t=2, x={2,3}                                  3                         x
t=2, x={3,1}                                  2                         x
t=2, x={3,2}                                  2                         x
t=2, x={3,3}                                  3                         x
Initial plan in the absence of detections     1,2,3                     1,2,x

Table 5.2 Optimal agent plans for the problem with 3 cells. x represents a 'don't care' position for the scout at time t=T=3, which has no impact on the objective.
Figure 5.4 Example tree of options for the Searcher/Scout problem
Figure 5.4 shows a partial tree of options explored when solving the top level problem $L_{\omega(0),p(\cdot,1),1}(d^*,\omega^*)$ to compute the initial team plan. As noted in Section 5.3.2.1, only unique searcher actions need to be considered for the very last time step. Figure 5.5 shows the ED network constructed to calculate the DMEAN bound for any continuation of the plan from state {1,1} at time 1.
Figure 5.5 Example DMEAN ED network
Number of scouts    Number of searchers
                    1          2          3
0                   0.56864    0.82096    0.92490
1                   0.64990    0.87800    0.94814
2                   0.69824    0.89695    0.95806
3                   0.72934    0.90786    0.96570
4                   0.75273    0.91665    0.96967
5                   0.76737    0.92097    0.97204
6                   0.77201    0.92366    0.97328

Table 5.3 PD using different numbers of searchers and scouts in a 3-cell problem
5.3.3 Notes on Computation
There are a number of considerations which can be used to limit the actual amount of computation required.
First, when populating the reward table $d^*$, one need only consider states that are reachable by the agents at each time. Moreover, calculating a value $d^*(x,s,t)$ for a scout $s$ is only necessary if it can possibly detect the target at that point, i.e., $g_s(c_s(x),t) > 0$ and $[p(\cdot,1)\,\Phi^{t-1}](c_s(x)) > 0$.
One can directly assign $d^*(x,s,t) = 0$ if a target found in cell $c_s(x)$ at time $t$ cannot be intercepted by any of the searchers before all $T$ time steps have been expended.
Given that all scout detections are equal, the value of $d^*(x,s,t)$, $s > n$, is simply the value of $d^*(x,u,t)$ for any other scout $u$ with $c_s(x) = c_u(x)$, and thus the reward in such cases only needs to be calculated once.
When using more than one searcher or scout with equivalent capabilities (in terms of travel constraints and glimpse probabilities), simply defining valid state transitions as those in which all corresponding agent positions between a source state $x$ and a state $y$ are connected ($c_s(y) \in S_s(c_s(x))$ for all $s$) may result in multiple transitions to functionally equivalent states. Consider once more the 3-cell example in Section 5.3.2.3, this time using one searcher and two identical scouts. In this case, having scout 1 in cell 1 and scout 2 in cell 2 serves the same purpose for the team as the converse of having scout 1 in cell 2 and scout 2 in cell 1. As such, only one of {1,1,2} or {1,2,1}, for example, needs to be considered to be in the set $S(x = \{1,1,2\})$. Taking care to avoid redundant states and state transitions, where they might occur, can therefore significantly reduce unnecessary computation. Where a team can transit to a number of valid but equivalent combinations of agent positions, one desirable tie-breaking strategy may be to prefer state transitions that minimise the amount of agent movement.
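One simple way to realise this state de-duplication is to map every team state to a canonical representative, for example by sorting the block of identical scout positions; the sketch below assumes all scouts share travel constraints and glimpse probabilities.

```python
def canonical_state(cells, n):
    """Identical scouts are interchangeable, so states differing only by
    a permutation of scout positions are equivalent; sorting the scout
    block (agents n onwards) yields one representative per class."""
    return tuple(cells[:n]) + tuple(sorted(cells[n:]))

# With one searcher (n = 1) and two identical scouts, the two equivalent
# states in the text collapse to the same representative:
# canonical_state((1, 2, 1), 1) == canonical_state((1, 1, 2), 1) == (1, 1, 2)
```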
5.4 Practical Heuristics for Searching with Scouts
Due to the large state space of the Searcher/Scout problem, it is presently feasible (with a MATLAB implementation on a current desktop processor) to optimally solve problems up to 9×9 cells in size using 1 searcher, 1 scout, and $T = 10$ time steps. This section describes a number of applicable heuristics that may be necessary to obtain solutions for problems with a larger number of cells, more agents, or longer time horizons. Two general approaches are taken:
1. Solving the Searcher/Scout problem as a series of OSP problems with specified rewards (as per the optimal approach outlined), but employing an OSP heuristic instead of branch and bound to solve each sub-problem. This approach is elaborated in Section 5.4.2.
2. Using a run-time heuristic to generate a plan for the agents to follow, and replanning whenever a scout detects the target. This is explained in Section 5.4.3.
Dell et al. (1996) reviewed and compared a number of heuristics for the OSP problem with multiple searchers, including local search, genetic algorithms, and the use of branch and bound in a moving horizon approach. In particular, two expected detection heuristics, titled H1 and H2, were shown to be capable of obtaining good solutions on the problem sets attempted (around 2% from the optimal for one searcher and 3% from the best known results for two searchers, with a maximum difference of 7% in rare cases). The following sections describe in detail the H1 and H2 heuristics, originally proposed in Martins (1993), before adapting them for use in Searcher/Scout problem heuristics.
5.4.1 Heuristic Solution to the OSP Problem
5.4.1.1 Expected Detection Heuristic H1
The H1 heuristic functions by repeatedly taking the first step of a path maximising the expected number of detections (ED) as the next step of the heuristic plan, updating the probability distribution conditioned on non-detection, and then computing another maximum ED path for the remaining time steps, until all the steps of a plan have been obtained. In contrast to directly maximising PD, the heuristic takes advantage of the speed with which a path maximising ED can be computed.

Expected Detection Heuristic 1 (H1) (Dell et al., 1996):
1. $\omega(0) \leftarrow$ initial positions (state) of the search team.
2. For $t$ = 1 to $T$ Do
3. Let $\omega(t-1)$ be the team's position(s) at time $t-1$.
4. Find path $\bar\omega$ maximising ED for $t,\dots,T$.
5. Set $\omega(t) = \bar\omega(t)$.
6. Bayesian update the probability mass of the target up to time $t+1$ assuming non-detection.
7. Compute the PD obtained when the team follows $\omega$.
8. Return $\omega$ and PD.
5.4.1.2 Expected Detection Heuristic H2
Martins (1993) cited pathological problem cases for which the completely myopic approach of H1 results in undesirable plans. In response, the H2 algorithm was designed to augment H1 by basing the next step of the plan on more than a single path maximising ED. In particular, the set of maximum ED paths starting from each possible next state for the team is found. The candidate next state whose maximum ED path leads to the highest PD is then added as the next step of the plan. This additional discrimination comes at the cost of repeatedly planning many more paths; H2, as opposed to H1, was therefore not seen in Dell et al. (1996) as a reasonable heuristic for solving three-searcher problems, given the available computing power at the time.

Expected Detection Heuristic 2 (H2) (Dell et al., 1996):
1. $\omega(0) \leftarrow$ initial positions (state) of the search team.
2. For $t$ = 1 to $T$ Do
3. Let $\omega(t-1)$ be the team's position(s) at time $t-1$.
4. For all states $x \in S(\omega(t-1))$ Do
5. Find path $\bar\omega_x$ maximising ED for $t+1,\dots,T$, fixing $\bar\omega_x(t) = x$.
6. Compute PD($\bar\omega_x$).
7. Assign $\omega(t) = \bar\omega_k(t)$ such that PD($\bar\omega_k$) $= \max_x$ PD($\bar\omega_x$).
8. Bayesian update the probability mass of the target up to time $t+1$ assuming non-detection on $\bar\omega_k(t)$.
9. Compute the PD obtained when the team follows $\omega$.
10. Return $\omega$ and PD.
5.4.1.3 Adaptations of H1 and H2 for the OSP Problem with Specified Rewards
With respect to the new objective of solving the OSP with arbitrarily specified rewards for detection at different times, in different locations, and by different agents, the H1 and H2 algorithms need to be modified to operate on paths maximising expected rewards (ER).
Expected Detection Heuristic 1 with Specified Rewards (H1r):
1. $\omega(0) \leftarrow$ initial positions (state) of the search team.
2. For $t$ = 1 to $T$ Do
3. Let $\omega(t-1)$ be the team's position(s) at time $t-1$.
4. Find path $\bar\omega$ maximising expected reward for $t,\dots,T$.
5. Set $\omega(t) = \bar\omega(t)$.
6. Bayesian update the probability mass of the target up to time $t+1$ assuming non-detection.
7. Compute the reward $L_{\omega(0),p(\cdot,1),1}(d^*,\omega)$ obtained when the team follows $\omega$.
8. Return $\omega$ and the reward $L_{\omega(0),p(\cdot,1),1}(d^*,\omega)$.

Expected Detection Heuristic 2 with Specified Rewards (H2r):
1. $\omega(0) \leftarrow$ initial positions (state) of the search team. Let $q(\cdot,1) = p(\cdot,1)$.
2. For $t$ = 1 to $T$ Do
3. Let $\omega(t-1)$ be the team's position(s) at time $t-1$.
4. For all states $x \in S(\omega(t-1))$ Do
5. Find path $\bar\omega_x$ maximising expected reward for $t+1,\dots,T$, fixing $\bar\omega_x(t) = x$.
6. Compute the reward of $\bar\omega_x$.
7. Assign $\omega(t) = \bar\omega_k(t)$ such that $L_{\omega(t-1),q(\cdot,t),t}(d^*,\bar\omega_k) = \max_x L_{\omega(t-1),q(\cdot,t),t}(d^*,\bar\omega_x)$.
8. Update the target probability mass $q$ up to time $t+1$ assuming non-detection on $\bar\omega_k(t)$.
9. Compute the reward $L_{\omega(0),p(\cdot,1),1}(d^*,\omega)$ obtained when the team follows $\omega$.
10. Return $\omega$ and the reward $L_{\omega(0),p(\cdot,1),1}(d^*,\omega)$.
The paths maximising expected rewards (required in step 4 of H1r and step 5 of H2r) can be found using the strategy in Section 5.3.2.2, with individual rewards calculated by MEAN_Reward:

Function MEAN_Reward(y, τ)
1. Set U ← P(·,τ) and Reward ← 0.
2. For each agent s, in decreasing order of d*(y,s,τ):
   Set pd ← U(c_s(y)) · g_s(c_s(y), τ).
   Set U(c_s(y)) ← U(c_s(y)) − pd.
   Set Reward ← Reward + pd · d*(y,s,τ).
3. Return Reward.

Similarly, variants of the heuristics, H1rd and H2rd, can be defined by using paths maximising the discounted expected rewards in the sense of DMEAN_Reward. As suggested in Chapter 4, these variants require very similar computational effort to their non-discounted counterparts. While myopically picking the first step from individually "better" paths does not necessarily result in a more effective plan overall, the use of an alternate method may enable a variant to avoid the same local optimum as the original H1/H2 heuristics.
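The function translates directly into code; the sketch below is illustrative only, with cell_of and glimpse standing in for the state-to-cell mapping c_s(·) and the glimpse probabilities g_s.

def mean_reward(y, tau, P, agents, d_star, cell_of, glimpse):
    # P: target pmf at time tau; d_star[(y, s, tau)]: specified reward for
    # a detection by agent s; cell_of(y, s): the cell agent s occupies in
    # team state y; glimpse(s, c, tau): agent s's glimpse probability.
    U = dict(P)
    reward = 0.0
    # Credit overlapping looks to the agents with the largest rewards first.
    for s in sorted(agents, key=lambda s: d_star[(y, s, tau)], reverse=True):
        c = cell_of(y, s)
        pd = U[c] * glimpse(s, c, tau)
        U[c] -= pd                # this mass can no longer be detected anew
        reward += pd * d_star[(y, s, tau)]
    return reward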
5.4.2 Complete Planning Heuristics (G1, G1d, G2, G2d)
[Figure 5.6: Operation of the G1d, G1, G2d and G2 heuristics. The diagram shows the Searcher/Scout problem solved as a cascade: the sub-problems L_{x,Q_{c_s(x)},t+1}(d̂*, ω̂*) for x ∈ X(t) and s = n+1,...,n+m are heuristically solved from the latest time step backwards, each supplying rewards d̂*(x,s,t) to the next level, up to the top-level problem L_{ω(0),p(·,1),1}(d̂*, ω̂*).]
The optimal approach proposed in Section 5.3.1 obtains plans by solving a series of OSP problems with specified rewards, in a manner similar to that proposed for the GSSP problem by Tierney and Kadane (1983). Clearly, plans can be found much more quickly if one is content with only sub-optimally solving each constituent problem. To this end, let G1, G1d, G2 and G2d be heuristics for the Searcher/Scout problem, where the problem is solved as a series of sub-problems, each respectively computed using the H1r, H1rd, H2r and H2rd heuristics. As in the optimal approach, the payoff of each solved sub-problem is stored in d̂*(x,s,t), which is then used as the basis for finding the values d̂*(x,s,τ), τ < t. Figure 5.6 illustrates the operation of the G1d/G1/G2d/G2 Searcher/Scout problem heuristics; a code sketch of the same cascade follows below. Given the reliance on the OSP heuristics, sources of sub-optimality for these heuristics include the underestimation of the optimal rewards d* cascaded up the hierarchy of sub-problems, as well as the conservative solution of the final top-level problem L_{ω(0),p(·,1),1}(d̂*, ω̂*) itself.
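The backward cascade can be sketched as follows; solve_sub and solve_top are assumed callables wrapping the chosen OSP heuristic (H1r, H1rd, H2r or H2rd) for a sub-problem and for the top-level problem, respectively.

def g_heuristic(states, scouts, T, solve_sub, solve_top):
    # states(t): reachable team states at time t; scouts: scout indices.
    # solve_sub(x, s, t, d_hat): payoff of the sub-problem where scout s
    # has just found the target at time t with the team in state x, given
    # the already-collected rewards d_hat for later detections.
    d_hat = {}
    for t in range(T - 1, 0, -1):      # later payoffs feed earlier problems
        for x in states(t):
            for s in scouts:
                d_hat[(x, s, t)] = solve_sub(x, s, t, d_hat)
    return solve_top(d_hat)            # top-level plan with collected rewards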
5.4.3 Replanning Heuristics (R1, R1d, R2, R2d)
The heuristics discussed in Section 5.4.2 require solving the entire set of OSP sub-problems, which can still be computationally expensive when the number of states is large. An alternative approach is to apply an OSP heuristic just once at the beginning of the search process and only replan whenever a scout detects the target. The R1, R1d, R2 and R2d heuristics operate in this manner and are defined as follows:

Replanning Algorithm (R1d/R1/R2d/R2):
1. Solve L̂_{ω(0),p(·,1),1}(d̂*, ω̂*) with the H1rd/H1r/H2rd/H2r heuristic.
2. While T time steps have not been used and the target is not engaged,
3.   Follow plan ω̂*.
4.   If scout s detects the target at time t, solve L̂_{ω̂*(t),Q_{c_s(x)},t+1}(d̂*, ω̂*), where x = ω̂*(t), using the chosen OSP heuristic and follow this new plan instead.

This approach has the advantage of quickly obtaining a usable plan (each OSP heuristic, for example, executes in much less than a second for the problems in Table 5.4 in Section 5.5.3). However, optimality is further reduced since the utility of future scout detections (d̂*) is by necessity very roughly approximated in advance. For the purposes of this chapter, the reward d̂*(x,s,t) for a detection by scout s is set to 1 if at least one searcher can possibly intersect a target starting from cell c_s(x) in the time remaining, and zero otherwise. Scout detections are therefore optimistically treated as searcher detections as long as a searcher can respond to this new information in time.
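The execution loop of the replanning heuristics is simple; in the sketch below, execute_step, target_engaged and replan are assumed callbacks into the simulation or the real platforms.

def run_replanning(initial_plan, execute_step, target_engaged, replan, T):
    # initial_plan: plan from a single H1r/H1rd/H2r/H2rd call at time 1.
    # execute_step(plan, t): moves the team and returns the detecting
    # scout, or None if no scout found the target at this step.
    plan, t = initial_plan, 1
    while t <= T and not target_engaged():
        s = execute_step(plan, t)
        if s is not None:
            # A scout tip: re-solve from the point estimate at the
            # scout's cell and follow the new plan from time t+1 onwards.
            plan = replan(s, t)
        t += 1
    return plan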
In an actual implementation, as the search team follows its current plan through time steps t = 1,...,T, the quality of future plans can be improved by heuristically computing d̂*(·,s,τ) for the scouts s > n, working from τ = T−1 down towards t in the meantime, as shown in Figure 5.6. In the limit, the PD from using R1/R2 would therefore approach that obtained from the corresponding G1/G2 heuristics as well.
5.5 Results
This section presents the computational results for a number of problems using the optimal and the proposed heuristic methods. The optimal plans allow the team to react effectively to scout detections, but require significant time to compute. The heuristics are then evaluated with three sets of problems and are shown to lead to satisfactory plans.
5.5.1 Optimal Solutions
Figure 5.7 plots the optimal PD for the 3-cell map described in Section 5.3.2.3, using different combinations of searchers and scouts with T = 5. Clearly, coupling the searcher(s) with one or two scouts noticeably improves the expected probability of engaging the target. For this small example, however, adding further scouts yields poor returns compared with using an additional searcher instead.

[Figure 5.7: PD for the 3-cell problem with different numbers of searchers and scouts, T = 5. The plot shows PD (0.75 to 1) against the number of scouts (0 to 6) for teams of 1, 2 and 3 searchers.]
Figure 5.8 shows the computation times required to find the optimal plans for the above cases, using a MATLAB implementation of the algorithm on a 2.6-GHz AMD Opteron 252 processor. While the case involving one searcher and six scouts required only 46.02 seconds to compute, the addition of two more searchers to the team led to a computation time of almost 97 minutes. In terms of team composition, it was quicker to plan for the team consisting of one searcher and six scouts (solving 592 OSP problems in 46.02 seconds) than for two searchers and five scouts (840 problems in 279.24 seconds), which in turn was quicker than for three searchers with four scouts (932 problems in 682.48 seconds). While a higher proportion of scouts generally implies division into more sub-problems, the trend of a higher planning expense for searchers relative to scouts was also seen in the other, larger problems tested. One contributing factor is that the expectation-based DMEAN/MEAN bounds may be less tight when faced with the high rewards (d*(x,s,t) = 1) of the extra searchers. This situation is analogous to having a slow-moving target and a high glimpse probability in an unmodified OSP problem, which is known to cause the ED of a path to significantly overestimate the PD. Rewards for scout detections, however, are usually much less than 1, thereby keeping the bounds tight even when more scouts are added.
[Figure 5.8: Computation times for the 3-cell problem with different numbers of searchers and scouts. The plot shows computation time in seconds (0 to 6000) against the number of scouts (0 to 6) for teams of 1, 2 and 3 searchers.]
Figure 5.9 shows an example search of a 7×7 grid using one searcher and one scout. The target is known to be in the centre cell at time step 1 (p(25,1) = 1). At each time step it will either stay in the same cell (60% probability) or choose to move to a neighbouring cell (40% probability). The target probability mass (not drawn) thus spreads out gradually from the centre cell and, for example, first reaches cell 11 in time step 3. Both agents start in cell 1 and have a 60% glimpse probability. The search takes place over T = 10 time steps. The optimal initial plan (Figure 5.9) calls for the searcher to head towards the centre of the grid (favouring the left side), with the scout also travelling towards the central area but on an almost parallel track along the right. The plan thus corrals the bulk of the probability mass between the two agents. Should the scout find the target in cell 11 at time step 4, however, the plan changes accordingly to concentrate around that area (Figure 5.10). Figure 5.11 and Figure 5.12 show the corresponding paths when one and two searchers are used, respectively.
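The motion model of this example is easily reproduced; the sketch below propagates the target pmf on the numbered 7×7 grid, assuming (as an illustration, since the text does not specify) that the 40% move probability is split uniformly over the 4-connected neighbours.

import numpy as np

def neighbours(cell, w=7, h=7):
    # 4-connected neighbours of a cell numbered 1..w*h, row by row.
    r, c = divmod(cell - 1, w)
    out = []
    if r > 0:     out.append(cell - w)
    if r < h - 1: out.append(cell + w)
    if c > 0:     out.append(cell - 1)
    if c < w - 1: out.append(cell + 1)
    return out

def predict_step(p, stay=0.6):
    # The target stays with probability `stay`, otherwise moves to a
    # neighbour (assumed uniform over neighbours for this illustration).
    out = np.zeros_like(p)
    for cell in range(1, len(p)):
        nbrs = neighbours(cell)
        out[cell] += stay * p[cell]
        for n in nbrs:
            out[n] += (1.0 - stay) * p[cell] / len(nbrs)
    return out

p = np.zeros(50)     # index 0 unused; cells are numbered 1..49
p[25] = 1.0          # p(25,1) = 1: target starts in the centre cell
p = predict_step(predict_step(p))
print(p[11] > 0)     # mass first reaches cell 11 at time step 3: True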
[Figure 5.9: Initial optimal search paths for 1 searcher and 1 scout on the 7×7 grid (cells 1-49). Asterisks denote the starting location and diamonds the final cell searched. PD = 0.40630. Shading of the cells corresponds to the final target location probability, which could be used to replan the actions of the team if the target was not found after the execution of the plan.]
[Figure 5.10: Revised optimal search paths if the scout detects the target in cell 11 at time step 4. Note that at step 4, the searcher is in cell 17 and the scout is in cell 11.]
[Figure 5.11: Optimal search path for 1 searcher working alone on the 7×7 grid. PD = 0.33069.]
[Figure 5.12: Optimal search paths for 2 searchers on the 7×7 grid. PD = 0.51715.]
The optimal set of plans for the example using one searcher and one scout was computed in 6765 seconds with a MATLAB implementation. While much can be gained from a more efficient software implementation, for example in C++ (which was shown in Figure 4.14 to significantly reduce the time for solving OSP problems), faster heuristic methods would be necessary to address larger problems within a reasonable time.
5.5.2 Heuristic Solutions
To convey a qualitative sense of how the different heuristics perform on one particular problem, Figure 5.13 to Figure 5.20 show the respective initial paths heuristically obtained for the same 7×7 grid example with one searcher and one scout examined in Section 5.5.1. For comparison, the original optimal paths can be found in Figure 5.9.
[Figure 5.13: Initial paths obtained using G1d for the example in Figure 5.9. PD = 0.39120.]
[Figure 5.14: Initial paths obtained using G1 for the example in Figure 5.9. PD = 0.38079.]
[Figure 5.15: Initial paths obtained using G2d for the example in Figure 5.9. PD = 0.40079.]
[Figure 5.16: Initial paths obtained using G2 for the example in Figure 5.9. PD = 0.39419.]
[Figure 5.17: Initial paths obtained using R1d for the example in Figure 5.9. PD = 0.36487.]
[Figure 5.18: Initial paths obtained using R1 for the example in Figure 5.9. PD = 0.36413.]
[Figure 5.19: Initial paths obtained using R2d for the example in Figure 5.9. PD = 0.31598.]
[Figure 5.20: Initial paths obtained using R2 for the example in Figure 5.9. PD = 0.30758.]
While it is difficult to visually assess the performance of a set of plans through the initial plan alone, it can be seen that performance suffered if the scout's cell at time step T−1 could not be reached by the searcher at time T. The plans obtained with the G2d method (Figure 5.15), which led to the highest probability of detection for the above example, mainly separated the agents in the high target probability cells and always allowed the searcher to respond to scout detections within the available time. Nevertheless, none of the heuristics is completely superior to the others.
5.5.3 Heuristics Evaluation
This section evaluates the heuristics proposed in Section 5.4 with the aid of three sets of examples with different maps, search team compositions, and planning horizons. The small problem set (Table 5.4) allows for direct comparison against the optimal outcomes in the previous section, while the medium set (Table 5.5) compares the heuristics on larger two-dimensional grids, concentrating on cases with one searcher and different numbers of scouts. Lastly, the run-time heuristics R1d and R1 are employed to plan for problems over long time horizons (Table 5.6).
Problem   Map                    Number of Searchers   Number of Scouts   Time Horizon (T)
S1        3-cells (Figure 5.3)   1                     0                  5
S2        3-cells                1                     1                  5
S3        3-cells                1                     2                  5
S4        3-cells                1                     4                  5
S5        3-cells                2                     0                  5
S6        3-cells                2                     1                  5
S7        3-cells                2                     2                  5
S8        3-cells                2                     4                  5
S9        3-cells                3                     0                  5
S10       3-cells                3                     1                  5
S11       3-cells                3                     2                  5
S12       3-cells                3                     4                  5

Table 5.4 Small problem set for evaluating Searcher/Scout problem heuristics.
Problem   Map        Number of Searchers   Number of Scouts   Time Horizon (T)
M1        3×3 grid   1                     0                  10
M2        3×3 grid   1                     1                  10
M3        3×3 grid   1                     2                  8
M4        5×5 grid   1                     0                  7
M5        5×5 grid   1                     1                  7
M6        5×5 grid   1                     2                  7

Table 5.5 Medium problem set for evaluating Searcher/Scout problem heuristics.
Problem   Map        Number of Searchers   Number of Scouts   Time Horizon (T)
L1        5×5 grid   1                     0                  20
L2        5×5 grid   1                     1                  20
L3        5×5 grid   1                     2                  15
L4        7×7 grid   1                     1                  25
L5        7×7 grid   1                     2                  15
L6        9×9 grid   1                     1                  20
L7        9×9 grid   1                     1                  30

Table 5.6 Large problem set for evaluating Searcher/Scout problem heuristics.
In all of the examples, the agents begin in cell 1 and have a glimpse probability of 60%. In the small problem set, the target has a stay probability of 60% and an initially uniform probability distribution. In the medium and large problem sets, the target also has a stay probability of 60% but is known to be in the centre grid cell at time 1. Table 5.7, Table 5.8 and Table 5.9 show the PD from following the plans generated by each of the heuristics for the small problem set, for cases with one searcher, two searchers and three searchers, respectively. The PD of the replanning heuristics R1(d) and R2(d) was calculated analytically by chaining together OSP sub-problems, similarly to G1(d) and G2(d).
             S1, 0 scouts     S2, 1 scout      S3, 2 scouts     S4, 4 scouts
Optimal      0.76031          0.84817          0.89242          0.92719
G1d          0.75426 (0.8%)   0.84179 (0.8%)   0.88098 (1.3%)   0.90902 (2%)
G1           0.76031 (0%)     0.83154 (2%)     0.87132 (2.4%)   0.91660 (1.1%)
G2d          0.72865 (4.2%)   0.84503 (0.4%)   0.88958 (0.3%)   0.92327 (0.4%)
G2           0.71799 (5.6%)   0.80831 (4.7%)   0.88882 (0.4%)   0.92599 (0.1%)
R1d          same as G1d      0.79262 (6.5%)   0.78235 (12.3%)  0.83425 (10%)
R1           same as G1       0.80418 (5.2%)   0.83105 (6.9%)   0.79899 (13.8%)
R2d          same as G2d      0.71746 (15.4%)  0.78010 (12.6%)  0.81400 (12.2%)
R2           same as G2       0.75418 (11.1%)  0.73475 (17.7%)  0.81581 (12.0%)

Table 5.7 PD of heuristic plans for the small problem set (a). Bracketed figures show the percentage difference from the optimal PD.
             S5, 0 scouts     S6, 1 scout      S7, 2 scouts     S8, 4 scouts
Optimal      0.94868          0.97440          0.98226          0.98882
G1d          0.94119 (0.8%)   0.96653 (0.8%)   0.97480 (0.8%)   0.98677 (0.2%)
G1           0.94426 (0.5%)   0.96852 (0.6%)   0.97940 (0.3%)   0.98700 (0.2%)
G2d          0.94447 (0.4%)   0.96518 (0.9%)   0.97273 (1.0%)   0.98632 (0.3%)
G2           0.92813 (2.2%)   0.94032 (3.5%)   0.97274 (1.0%)   0.98033 (0.9%)
R1d          same as G1d      0.96308 (1.2%)   0.95335 (2.9%)   0.95140 (3.8%)
R1           same as G1       0.94771 (2.7%)   0.95699 (2.6%)   0.97242 (1.7%)
R2d          same as G2d      0.90606 (7.0%)   0.92340 (6.0%)   0.95030 (3.9%)
R2           same as G2       0.93178 (4.4%)   0.90582 (7.8%)   0.91562 (7.4%)

Table 5.8 PD of heuristic plans for the small problem set (b). Bracketed figures show the percentage difference from the optimal PD.
             S9, 0 scouts     S10, 1 scout     S11, 2 scouts    S12, 4 scouts
Optimal      0.98798          0.99387          0.99662          0.99812
G1d          0.98585 (0.2%)   0.99118 (0.3%)   0.99547 (0.1%)   0.99719 (0.1%)
G1           0.98736 (0.1%)   0.99311 (0.1%)   0.99606 (0.1%)   0.99784 (0%)
G2d          0.98618 (0.2%)   0.99154 (0.2%)   0.99244 (0.4%)   0.99496 (0.3%)
G2           0.98403 (0.4%)   0.98958 (0.4%)   0.99256 (0.4%)   0.99098 (0.7%)
R1d          same as G1d      0.98890 (0.5%)   0.99351 (0.3%)   0.98875 (0.9%)
R1           same as G1       0.99057 (0.3%)   0.99066 (0.6%)   0.98300 (1.5%)
R2d          same as G2d      0.97291 (2.1%)   0.95842 (3.8%)   0.98021 (1.8%)
R2           same as G2       0.98117 (1.3%)   0.97830 (1.8%)   0.99212 (0.6%)

Table 5.9 PD of heuristic plans for the small problem set (c). Bracketed figures show the percentage difference from the optimal PD.
H1 was seen to be an effective heuristic for the OSP examples considered in Dell et al. (1996). Possibly because of this, the G1 method, which uses H1r as a subroutine, also tended to produce more effective plans here. The replanning heuristics nevertheless performed relatively well in cases with more than one searcher, with R1 and R1d giving the better results. Figure 5.21 shows the computation times of G2 and G2d growing considerably faster than those of G1 and G1d for problems using more scouts.
[Figure 5.21: Heuristic computation times for the small problem set (S1-S12), in seconds (0 to 140), for G1d, G1, G2d and G2. Computation times to obtain the optimal solution for the same problems are shown in Figure 5.8.]
Table 5.10 shows the results obtained for the two-dimensional maps in the medium problem set. In this case, there is no clear winner among G1d, G1 and G2d. The computation times for G1/G1d/G2/G2d are shown in Figure 5.22, while Figure 5.23 lists the time to compute the first plan with R1/R1d/R2/R2d.

             M1, 0 scouts     M2*, 1 scout     M3*, 2 scouts    M4, 0 scouts     M5*, 1 scout     M6*, 2 scouts
G1d          0.72063 (0.6%)   0.78255 (1.9%)   0.79174 (0.9%)   0.57215 (0%)     0.64243 (0%)     0.69078 (0.4%)
G1           0.72063 (0.6%)   0.78179 (2.0%)   0.77163 (3.4%)   0.57215 (0%)     0.63865 (0.6%)   0.67748 (2.3%)
G2d          0.70230 (3.2%)   0.79801 (0%)     0.79866 (0.0%)   0.56489 (1.3%)   0.63350 (1.4%)   0.69370 (0%)
G2           0.70796 (2.4%)   0.78041 (2.2%)   0.79129 (0.9%)   0.56468 (1.3%)   0.61934 (3.6%)   0.66774 (3.7%)
R1d          same as G1d      0.77729 (2.6%)   0.76465 (4.3%)   same as G1d      0.63464 (1.2%)   0.67498 (2.7%)
R1           same as G1       0.76520 (4.1%)   0.57983 (27.4%)  same as G1       0.63865 (0.6%)   0.66425 (4.2%)
R2d          same as G2d      0.73785 (7.5%)   0.71521 (10.4%)  same as G2d      0.57349 (10.7%)  0.50652 (27.0%)
R2           same as G2       0.63398 (20.6%)  0.56733 (29.0%)  same as G2       0.59147 (7.9%)   0.64879 (6.5%)

Table 5.10 PD of heuristic plans for the medium problem set. Bracketed figures show the percentage difference from the best known PD. Asterisks denote cases where optimal payoffs had not been computed.
[Figure 5.22: Heuristic computation times for the medium problem set (M1-M6), in seconds (0 to 2500), for G1d, G1, G2d and G2.]
[Figure 5.23: Time to compute the initial plan using a replanning heuristic (R1d, R1, R2d, R2) for the medium problem set (M1-M6), in seconds (0 to 40).]
Table 5.11 lists the average number of times that the target is detected over 10,000 trials with random initial locations (generated according to the corresponding initial PDF) and random target motion (following the motion model). Only the R1d and R1 heuristics are used, owing to their speed and their relative performance against the other replanning heuristics on the medium problem set. While planning with R1d obtained better results than R1 for the later scenarios, the results suggest that neither has an absolute advantage over the other. As the same algorithms are likely to execute in no more than 5 seconds for test problems other than L5 when implemented in a compiled language such as C++ (estimated from the experience of porting the algorithm for DMEAN), using the two heuristics together could be a viable option to ensure the best plan is obtained.
Problem                            R1d                        R1
L1: 5×5 grid, 0 scouts, T = 20     0.7685 (0), 0.05 s         0.7715 (0), 0.03 s
L2: 5×5 grid, 1 scout,  T = 20     0.8548 (0.5772), 2.2 s     0.862 (0.5652), 2.0 s
L3: 5×5 grid, 2 scouts, T = 15     0.8635 (0.8936), 57 s      0.865 (0.9145), 56.3 s
L4: 7×7 grid, 1 scout,  T = 25     0.6002 (0.5334), 16.4 s    0.5931 (0.5442), 14.1 s
L5: 7×7 grid, 2 scouts, T = 15     0.5569 (0.7181), 378 s     0.5467 (0.7534), 375 s
L6: 9×9 grid, 1 scout,  T = 20     0.4772 (0.3816), 23.5 s    0.45553 (0.4644), 17.7 s
L7: 9×9 grid, 2 scouts, T = 30     0.5489 (0.5692), 73 s      0.5475 (0.5676), 58 s

Table 5.11 Average PD of heuristic plans over 10,000 runs for the large problem set. Each cell lists the average PD, the average number of scout detections per trial run (in brackets), and the time to compute the first plan.
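A Monte Carlo evaluation of this kind can be sketched as follows; the plan interface, sample_start, sample_move and replan are illustrative assumptions rather than the thesis implementation.

import random

def simulate_trial(plan, T, searchers, glimpse, sample_start, sample_move, replan):
    # One trial: draw a target trajectory, run the team, and report
    # whether a searcher engaged the target within T steps.
    target = sample_start()                      # draw from the initial pdf
    for t in range(1, T + 1):
        for agent, cell in plan.positions(t):    # assumed plan interface
            if cell == target and random.random() < glimpse[cell]:
                if agent in searchers:
                    return True                  # searcher detection: engaged
                plan = replan(agent, t, target)  # scout tip: replan, continue
        target = sample_move(target)             # target moves per its model
    return False

def average_pd(trials, **kwargs):
    # Average PD over repeated random trials, as in Table 5.11.
    return sum(simulate_trial(**kwargs) for _ in range(trials)) / trials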
5.6 Discussions and Summary
5.6.1 Related Work
Apart from cases with false target detections (Stone, 1979), a searcher in most optimal search problems considered in the literature only receives information from failed attempts to detect the target. The whereabouts and surveillance search problems (Stone and Kadane, 1981), the smuggling boat problem (where the searcher can only engage the target in a specified set of cells) and the cumulative reward problem (where the searcher is rewarded for every detection) (Tierney and Kadane, 1983), which are closely related to each other, provide the main examples in the literature where detecting a target does not necessarily terminate the process and the searcher may benefit from this new information in the future. Tierney and Kadane (1983) formulated the Generalised Surveillance Search Problem (GSSP) to incorporate these problems as well as the standard detection search problem. The problem was solved by using the FAB algorithm to address an equivalent series of continuous effort search problems. The approach in Tierney and Kadane (1983) differs from this chapter mainly in its use of a single searcher and its focus on infinitely divisible search effort. The work also did not address problems with path constraints on where the search effort could be placed.
Stewart (1985) proposed a different problem in which the target leaves behind evidence in each cell it visits. Cases where the target trail persists for just one time step or remains for longer time periods were considered. Instead of always directly searching for the target, the searcher must therefore also consider the information gain from finding target trails. A moving horizon heuristic was proposed in which the searcher maximises the detection probability for the remaining time periods but can only react to trail detections at the very next time step. The emphasis of the work was on exploiting clues left behind by the target in the environment using a single searcher, which differs from the aim of this chapter.
The need for the agents to keep searching beyond the first target detection also brings the problem closer to a combination of search and tracking. Furukawa et al. (2006) presented a case where rescue helicopters and UAVs cooperate to search for drifting lifeboats but only the former are capable of rescue. The vehicles first search for the targets under the Bayesian approach in Bourgault et al. (2003b). Once one or more targets have been found, the vehicles then optimise their actions with respect to tracking an assigned target. This work differs from the Searcher/Scout problem in its representation of the searchers' motion in the continuous domain, its consideration of multiple targets (although target assignment was not discussed) and its explicit switching between searching and tracking tasks. The solution approach taken was also not concerned with optimising actions with respect to all future possibilities for replanning.
5.6.2 Incorporation of Non-Uniform Searcher Travel Times
Provided that care is taken to synchronise the agents' actions, the Searcher/Scout problem can be extended to support the non-uniform agent travel times between regions considered by the OSPT problem in Chapter 4. A complicating factor is that, instead of all the agents being able to change course immediately after a scout finds the target, one or more agents may still be in transit between regions. It therefore becomes necessary to first define what is expected of such a travelling agent in the event of a scout detection. Depending on the particular application, valid options may include:
Committing the agent to search its intended destination cell before being allowed to replan, or
Forcing the agent to return to the cell it came from before being allowed to replan, or
Allowing the agent to decide which of the two cells to move to as part of the replanning process.
Having defined this behaviour, one can then solve each of the OSPT sub-problems leading from possible scout detections with the chosen restriction in mind. As the agents may all be in different stages of transit from one cell to another, however, there may be a very large number of sub-problems to solve. Assume, for instance, that an agent in transit is always committed to search its intended destination cell before being allowed to react. For each possibility of detection by a scout, a sub-problem is then necessary not only for each combination of cells to be searched (or being searched) by the other agents, but also for each possible number of time steps that the agents might be away from reaching these intended cells. Additionally, the asynchronous searching makes it more complicated to obtain bounds for a multi-agent OSPT problem with specified rewards, as well as to find the solution in a branch and bound framework. It is naturally possible to convert the problem into one without travel times by injecting a large number of artificial cells, and then simply use the solution methods outlined in this chapter. This approach is, however, undesirable, as suggested in Chapter 4. As such, the efficient implementation of a solution method for the multi-agent OSPT problem, for instance through adapting the branch and bound and bounding methods discussed in Chapter 4 to inherently account for asynchronous searches by agents, remains ongoing work for addressing a Searcher/Scout problem with non-uniform travel times.
5.6.3 Computational Issues
This chapter considered two strategies for heuristically solving the Searcher/Scout problem. The first approach partitions the problem into a series of OSP problems with specified rewards, as is done in the optimal method, before heuristically solving each OSP problem. Instead of computing solutions for a complete series of OSP problems, an alternate strategy is to heuristically plan the team's actions only until the next scout detection and then replan as necessary. The complete planning heuristics (G1/G1d/G2/G2d) that use the first approach can produce high quality plans, but they still require significant computation time when the team includes many agents. The more practical alternative of planning only as required was shown, via the R1d and R1 heuristics, to produce satisfactory plans in much less time. Naturally, other heuristics that strike a different balance between planning speed and quality are also possible.
As the number of states grows exponentially with each additional agent, the time and memory required to pre-process them prior to running the heuristics, for example to identify and store the valid transitions between states, can become significant. Sequentially planning the agents' actions might therefore be necessary for sufficiently large problem instances and teams. One can, for example, first plan a single searcher's path (using a suitable heuristic), then plan for another agent knowing which cells have already been visited by the first, and so on for all agents. The scouts are additionally constrained in that they can gain nothing from inspecting cells beyond the range of the searchers, whose paths by then are already known. To further speed up computation, a scout that detects the target may be limited to simply following a pre-calculated path designed to contain the target probability until a searcher arrives. The main shortcoming of this sequential approach lies with the greedy choice of paths by each agent. Provided the agents are not equivalent (in the sense of capability and starting positions), repeating the planning in different orders may improve on the overall plan (Beard and McLain, 2003). Given sufficient computation time, the sequential planning could also be done in subgroups of two, three or more agents to induce better cooperation. Alternatively, negotiation techniques such as auctioning methods (Gerkey and Matarić, 2002; Koenig et al., 2007) could be used to better coordinate the individual plans.
The division of the Searcher/Scout problem into a series of sub-problems (Figure 5.2) renders most of the solution process a predominantly parallel task. In particular, the solution of each problem L_{x,Q_{c_s(x)},t+1}(d*, ω*) to find the rewards d*(x,s,t), x ∈ X, s ∈ {n+1,...,n+m}, can just as easily be allocated to different processors, perhaps placed on the agents themselves (Figure 5.24); a sketch of this division follows the figure. Assuming the agents all know the main problem parameters (e.g. the number of cells, glimpse probabilities, travel constraints, etc.), only the values of d*(x,s,τ) for τ > t need to be communicated for a team to share the computational load. Once all the rewards d*(x,s,t) are collated, the next set of values d*(x,s,t−1) can then be found in parallel once more. The single top-level problem L_{ω(0),p(·,1),1}(d*, ω*) with a T-step horizon, however, requires a significant amount of computation on its own. A distributed solution method for the OSP problem itself, such as via parallel branch and bound (Clausen and Perregaard, 1999), would therefore still be necessary to fully utilise the available processing resources.
[Figure 5.24: Distributed calculation of rewards. Each agent is allocated a share of the sub-problems L_{x,Q_{c_s(x)},t}(d*, ω*) for the states x ∈ X(t) and scouts s = n+1,...,n+m, working backwards from t = T−1 to 1 and exchanging the computed rewards d*(x,s,t).]
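A minimal sketch of this division of labour, using a process pool in place of separate platforms (solve_sub and states are assumed callables):

from concurrent.futures import ProcessPoolExecutor

def distributed_rewards(states, scouts, T, solve_sub):
    # Sub-problems within one time step are independent given the rewards
    # already collected for later steps, so each backward sweep is an
    # embarrassingly parallel map; only the d*(x, s, t) values need to
    # be exchanged between processors.
    d_star = {}
    with ProcessPoolExecutor() as pool:
        for t in range(T - 1, 0, -1):
            jobs = [(x, s) for x in states(t) for s in scouts]
            payoffs = pool.map(solve_sub, jobs, [d_star] * len(jobs))
            for (x, s), payoff in zip(jobs, payoffs):
                d_star[(x, s, t)] = payoff
    return d_star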
This division of labour can also help increase the effectiveness of the replanning R1/R1d heuristics. While the current plan is being executed, the team can concurrently replace the loose estimates of d̂* (as described in Section 5.4.3) with more accurate values, by heuristically (or even optimally) solving the corresponding sub-problems. In the limit, future plans obtained from replanning would then approach those from the G1/G1d heuristics.
5.6.4 Possible Extensions
The ability to replan allows the problem to cater for two modes of target motion, such as when the target is either alert or unaware. For example, the initial search plan for an unaware target could be computed with a wandering motion model. Plans triggered by scout detections, on the other hand, may use another model that accounts for assumed panic (e.g. an increase in speed or fleeing on seeing the scout) or a compliant response (e.g. staying still to wait for a searcher or moving towards an indicated exit in the meantime).
A number of extensions follow logically from the problem considered here. For instance, the engagement criterion may be relaxed such that a scout detection terminates the search if there is also a searcher in the same cell. Similarly, the condition may be expanded so that simultaneous scout detections also "capture" the target, to reflect cases where multiple scouts together can overcome the limitations of one. Closer coordination of the scouts' paths would then be required. Taking this further, the synergy from concurrently searching a cell with specific combinations of agents could be recognised with additional detection probability. For example, multiple agents may be able to cooperatively sweep an area to minimise the chance of escape, or benefit from using a particular combination of sensors.
5.6.5 Summary
This chapter examined a problem of searching with heterogeneous agents, which augments the OSP problem explored in the literature to account for cases where some agents are capable of detecting the target but not of meaningfully engaging or giving aid to it. The fact that the search process does not necessarily terminate on detection renders it more complicated to plan for and evaluate than the single searcher problems considered in Chapter 4, or indeed their direct multiple searcher extensions. To this end, a solution approach was proposed that equivalently partitions the overall problem into a series of dependent, but importantly smaller, OSP problem instances. Optimal solutions were obtained; each solution consists of not only the paths for the search team to initially follow, but also the alternative changes of paths should the target be detected by a scout at any stage. Optimal solutions, however, are impractical for many realistic problem scenarios due to their computation cost. A range of heuristic solutions adapted from the literature was therefore developed and evaluated using examples. It was demonstrated that an approach of planning only until the next scout detection, while not leading to as high a detection probability as heuristically solving a complete set of OSP problems for all possibilities of scout detection, produces reasonable plans in significantly less time.
6 Conclusions and Future Work
The objective of this thesis was to develop optimal search techniques for finding a target in structured environments. This chapter summarises the principal contributions and provides suggestions for future research directions.
6.1 Summary of Contributions
The main contributions of this thesis include:

Generalisation of the Optimal Searcher Path problem for the Search of Structured Environments
The search of structured environments is modelled through extended forms of the Optimal Searcher Path (OSP) problem. Central to this is the addition of switch times to the OSP problem, such that the formulation now accounts for the time a searcher realistically needs to travel from one region in the environment to another. Chapter 3 addressed a version of this problem where the aim is to minimise the expected time to find a stationary target. For finding a moving target, Chapter 4 formulated the Optimal Searcher Path problem with non-uniform Travel times (OSPT) and the Generalised Optimal Searcher Path (GOSP) problem. These two problems respectively model the search of environments consisting of separated and contiguous regions.
The Discounted MEAN bound for Optimal Searcher Path problems
A branch and bound approach is presented to optimally plan for the new OSPT and GOSP problems. A major contribution is a new bounding method, the Discounted MEAN (DMEAN) bound, which also efficiently generates tight bounds for the original OSP problem. The DMEAN bound is shown to lead to much faster solution times for OSP problems with a fast moving target when compared with existing bounding techniques. Optimal search paths spanning longer time horizons, for both structured environments and maritime scenarios, can thus be feasibly derived.

Formulation of a Multi-Agent Search Problem with Positive Target Information
A multi-agent search problem is presented in which the agents receive not only negative target information from failing to detect the target, but can also make use of positive information uncovered during the search. Two types of agents, searchers and scouts, are employed. While the process terminates as soon as a searcher finds the target, successful scout detections only serve to update the available target information. The team can then replan to follow a better course of action in the remaining time. As such, instead of directly maximising the probability of searcher detection in a single fixed plan, the team must also be flexible in its ability to exploit any new information obtained. This Searcher/Scout problem captures search scenarios where only some of the platforms are capable of directly servicing the target. Instead of leaving the remaining platforms behind, it leverages whatever capability there is to reduce the workload of the searchers themselves.
Optimal and Heuristic Solution Methods for the Searcher/Scout Problem
It is shown that the Searcher/Scout problem can be addressed via solving a series of modified OSP problems, in which the reward for detection is not necessarily one. In particular, the true utility of a scout detection is obtained through solving a sub-problem where the target is known to be in that position at that time. The branch and bound framework and the DMEAN bound for the OSP problem are accordingly modified to accommodate arbitrary rewards for detection. Using the fact that the outcome of each sub-problem serves as the input for subsequently larger ones, a number of optimal and heuristic methods for the Searcher/Scout problem are described.
6.2 Directions for Future Work
This thesis modelled the search of structured environments using generalised forms of the OSP problem and developed centralised techniques to obtain optimal plans. Given the complexity of the problems addressed, the cost of computation nevertheless stands to be a main issue in large applications. This section identifies directions for future research, in particular the investigation of alternative branch and bound approaches, the development of heuristic and decentralised solution techniques, and the creation of an integrated implementation. Some of the more straightforward directions for future work have already been presented in Section 4.9.1 and Section 5.6.
Alternate Branch and Bound Approaches
While all the existing OSP branch and bound frameworks in the literature use a depth-first expansion approach, it may be possible to leverage alternate branch and bound techniques developed in the general literature to further reduce computation times. For instance, it remains to be investigated whether using a different node selection strategy or a best-first expansion approach could significantly speed up computation. Moreover, since the Searcher/Scout problem by definition assumes the use of multiple platforms, the computation of plans can be shared through parallel branch and bound.
It should also be possible to improve on the computational efficiency of the branch and bound framework proposed in its current form. The idea of hybrid bounding was raised in Washburn (1995), in which a fast bounding method is first applied in an attempt to quickly fathom each node prior to expensively calculating a tighter bound (such as FABC). While the same work showed that the theoretical improvement of hybrid bounding is small, this change can be introduced with little implementation effort. An alternative approach is a dynamic choice of bounding methods that trades off the cost of computing a bound against the likely time savings if the bound successfully fathoms a node. For instance, it may be sufficient to use a loose bound when a node represents a search near the end of the time horizon, since the penalty of not fathoming is just the full enumeration of a very small number of nodes. On the other hand, it may be worthwhile to invest the extra effort at higher levels of the solution tree to eliminate major branches. The MEAN bound, the DMEAN bound and the DMEAN bounds with additional look-back steps (Section 4.9.1.1) provide one such set of bounds with graduated complexity and tightness that can be selected dynamically based on need. As the relative expense of finding a bound against enumerating a node depends on both the problem parameters and the particular branch and bound implementation, the criteria for deciding which bound to use at which stage may need to be estimated online.
Heuristic and Decentralised Solution Approaches
Given the tight timing constraints of some applications, it would also be beneficial to develop fast suboptimal techniques, ideally with worst-case bounds, to address larger problem instances. To this end, the optimal solution techniques already developed provide a basis for evaluating the performance of any eventual heuristics. These may be based, for instance, on general vehicle routing problem heuristics or on those specific to the Travelling Salesman Problem with profits (Feilet et al., 2005). The main challenge in adapting the latter to the OSPT/GOSP problems would be the assignment of "profits" that appropriately change depending on the regions searched earlier, which is not currently supported. In addition, problems can be (optimally) solved for shorter time horizons and then combined in a rolling-horizon manner (Dell et al., 1996). Due to the much larger state space, the Searcher/Scout problem also stands to benefit significantly from a decentralised solution approach. The adoption of multi-agent coordination methods such as sequential auctioning (Koenig et al., 2007) could allow the agents to locally plan before negotiating with others.
Integrated Implementation
The planning methods presented in this thesis obtain high-level strategies that sensibly coordinate the searching of individual regions in a structured environment. Figure 6.1 places these methods in the context of an envisaged integrated approach. In particular, a fully implemented search application would additionally require the estimation of target information from sensor networks (Dantu et al., 2005), an approach to appropriately decompose the entire area into constituent regions (Kuipers et al., 2004), as well as the specification of techniques to locally search each region. Depending on the complexity of the problem and the particular local search technique used, it may also be desirable to first aggregate the regions into larger groups and then plan for their search in a hierarchical manner. With security applications in mind, dealing with reactive targets in similar structured environments also forms part of future work.

[Figure 6.1: Integrated approach to search. Environment information (e.g. floor plans) and other target information (e.g. motion/heat sensors) feed an updated target PDF, topological map and motion model into the high-level search algorithm, together with estimated search and travel times and newly acquired data; the resulting optimal search plan drives a local-level search (e.g. systematic or others) over the decomposed regions, with further hierarchical decomposition where required.]
Appendix A – Bounding Methods for the Optimal Searcher Path Problem
This Appendix outlines the known bounds for the Optimal Searcher Path (OSP) problem in the literature and describes in more detail the PROP and FABC bounding methods used for comparison in the results of Chapter 4.

A.1 Bounding Methods in Literature
The following lists the known bounds for the discrete-effort OSP problem in the literature, in approximately descending order of complexity:
FABC (Washburn, 1995): Uses the continuous effort version of the Forward and Backward (FAB) algorithm (Brown, 1980) to solve a relaxed version of the OSP problem in which the search effort is allowed to be infinitely divisible (convex relaxation) and can be placed in any cell that is reachable from the searcher's starting cell by a given time (Distribution of Effort relaxation). The payoff obtained from running one iteration of the FAB algorithm is combined with an upper bound for this relaxed problem to provide a bound for the original problem.

FAB (Washburn, 1995): Applies a discrete effort version of the FAB algorithm directly to the OSP problem without any relaxations. Shown to be significantly inferior to FABC in Washburn (1995).

Eagle and Yee (1990): Solves a convex relaxed problem that allows search effort to be infinitely divisible between cells.

DMEAN (Lau et al., 2006): Solves a similar longest path problem to MEAN (see next), but conditions the estimated reward of searching a cell at each time on an additional previous search by the searcher. Results in a tighter bound than MEAN. Discussed in Section 4.6.

MEAN (Martins, 1993): Maximises the expected number of detections in the remaining time steps instead of the probability of detection. Computed via finding the longest path in a graph in which each arc's weight is equal to the probability of detecting the target in a corresponding cell, assuming the searcher did not inspect any cells prior to that point. Discussed in Section 4.5.2.

PROP (Washburn, 1995): Applies the Distribution of Effort relaxation to the problem solved in MEAN, such that bound computation simply involves maximising a reward instead of finding the longest path.

ERGO2 (Washburn, 1998): As a simplification of PROP, estimates bounds by directly using a stationary target distribution instead of updating the target probability distribution with anticipated target motion at each time step. Assumes an ergodic target motion model.

Table A.1 OSP bounds in the literature.
A.2 Obtaining Upper Bounds of Probability of Detection for the OSP Problem
Let ω be a fixed sequence of k one-time-unit searches in an OSP problem, such that the search of cell ω(k) takes place at time k. The objective of a bound is to estimate the maximum PD possible for any extension of this partial plan from time k+1 up to time T. This can then be added to the known PD for the searches during times 1,...,k to give the upper PD bound for any sequence beginning with the partial plan ω.
Washburn (1995) identified the three most important OSP problem relaxations for obtaining bounds as:
Convex
Linear
Distribution of Effort (DOE)
The convex relaxation refers to changing the discrete search effort assumption of the problem such that the effort can be infinitely divided. This serves the purpose of rendering the original problem convex and thus much easier to solve. The linear relaxation can be interpreted as changing the objective from maximising the probability of detection to maximising the expected number of detections. Instead of requiring the searched cells to be connected in a path, the Distribution of Effort relaxation only restricts the search effort at a time step to lie in the set of cells that are reachable from the last constrained cell of the searcher after that amount of time has elapsed. Let S_t be the set of cells that are reachable from the last constraining cell ω(k) at time t ≥ k. This can be obtained through S_{t+1} = ∪_{x∈S_t} S(x), beginning with S_k = {ω(k)}. The following sections outline in more detail the operation of the PROP and FABC bounds, respectively.
A.3 The PROP Bound
The PROP bound combines the linear relaxation made by the MEAN bound with, additionally, the DOE relaxation. Instead of finding the longest path that maximises the expected number of detections, PROP solves the following reward collection problem:

Σ_{t=k+1}^{T} max_{x∈S_t} P(x,t) g(x,t)

where g(x,t) is the glimpse probability for a searcher in cell x at time t and P(x,t) is the probability that the target is in cell x at time t, assuming that no other searches have been conducted since time k.
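Both the reachable sets and the bound itself are inexpensive to compute; a minimal sketch follows, in which reachable, P and g are assumed inputs matching the definitions above.

def prop_bound(k, T, omega_k, reachable, P, g):
    # reachable[x]: cells reachable from x in one step (including x);
    # P(x, t): target mass propagated from time k with no further searches;
    # g(x, t): glimpse probability in cell x at time t.
    S = {omega_k}                                     # S_k = {omega(k)}
    bound = 0.0
    for t in range(k + 1, T + 1):
        S = set().union(*(reachable[x] for x in S))   # S_{t+1} = union of S(x)
        bound += max(P(x, t) * g(x, t) for x in S)    # DOE: best cell per step
    return bound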
A.4 The FABC Bound
The FABC (Forward and Backward, Convex) bound functions by applying one iteration of the FAB algorithm (Brown, 1980) to an OSP problem with the convex and DOE relaxations. As outlined in Section 2.2.3, Brown's FAB algorithm finds the optimal continuous effort allocation for a multiple time step moving target problem by repeatedly solving a series of single time step stationary target search problems, each associated with one time step of the T-step horizon. In particular, the stationary target problem at each time t = 1,...,T uses a target probability distribution in which the probability of each cell is the product of the probability of an undetected target arriving in the cell at that time, and the probability of the target remaining undetected until all T steps have been expended, assuming that the search plan is fixed for all other times τ ≠ t. The process iterates by using the effort allocation found for each stationary problem as the new allocation at the corresponding time step of the original moving target problem, before repeating with another time step until no further improvements can be made. The search plan obtained after executing one iteration of the algorithm (i.e. solving T stationary target search problems) is then used as the basis for the FABC bound.
Let φ represent the search effort allocation (search plan), where φ(x,t) denotes the amount of search effort to be placed in cell x at time t. Define reach(x,t,φ) as the probability that the target will arrive in cell x at time t without being detected by the searches at times 1 to t−1 when the effort allocation φ is used, and let survive(x,t,φ) be the probability that a target starting in cell x at time t will remain undetected for the remainder of the search. Both reach(·,t,φ) and survive(·,t,φ) can be calculated directly from the search plan and the target motion model, but neither depends on φ(·,t), the search effort allocation for time period t. Let PD(φ) be the probability of target detection from following this plan. Given that the search effort can now be divided over multiple cells, the original glimpse probability function g(x,t) is mapped to an exponential detection function 1 − e^{−α(x,t)φ(x,t)}, where α(x,t) is a measure of search effectiveness. Accordingly, the function q(x,t,ψ) = e^{−α(x,t)ψ(x)} denotes the probability of not detecting a target in x at time t if the amount of search effort specified by ψ(x) is invested. Brown (1980) and Washburn (1983) detail the use of the FAB algorithm to solve a number of moving target problems with an exponential detection function and the equivalent objective of minimising the probability of non-detection. A brief description of the FAB algorithm to maximise the probability of detection, in line with the formulation of the OSP problem given in Section 2.2.3.3, is given below.
Forward and Backward Algorithm (FAB) (Brown, 1980):
1. Initialise with an arbitrary search effort allocation φ′. Set φ ← φ′.
2. Calculate survive(·,·,φ) using the allocation φ.
3. Assign the initial target probability distribution to reach(·,1,φ).
4. Set t ← 1.
5. Find the search effort allocation ψ satisfying:
   max_ψ Σ_{x=1}^{N} survive(x,t,φ) · (1 − q(x,t,ψ)) · reach(x,t,φ).
6. Replace the effort allocation φ(·,t) with ψ.
7. If t < T, set t ← t+1; else go to step 9.
8. Calculate reach(·,t,φ) using the updated allocation. Go to step 5.
9. φ is the optimal effort allocation for the moving target search problem and the procedure terminates if PD(φ) = PD(φ′); else set φ′ ← φ and go to step 2.
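The single-period problem in step 5 has a closed-form solution for the exponential detector once a Lagrange multiplier is fixed; the following minimal sketch finds that multiplier by bisection, anticipating the approach referenced next. It is illustrative only: w[x] stands for the product reach(x,t,φ)·survive(x,t,φ) over the permitted cells S_t, and the routine is not the thesis implementation.

import math

def single_period_allocation(w, alpha, effort=1.0, iters=60):
    # Maximise sum_x w[x] * (1 - exp(-alpha[x] * phi[x])) subject to
    # sum_x phi[x] = effort and phi >= 0, by bisecting on the Lagrange
    # multiplier lam: phi[x] = max(0, log(w[x] * alpha[x] / lam) / alpha[x]).
    def phi_of(lam):
        return {x: max(0.0, math.log(w[x] * alpha[x] / lam) / alpha[x])
                for x in w if w[x] > 0}

    hi = max((w[x] * alpha[x] for x in w), default=0.0)
    if hi <= 0.0:
        return {x: 0.0 for x in w}       # nothing to gain anywhere
    lo = hi * 1e-12
    for _ in range(iters):               # total effort decreases in lam
        mid = math.sqrt(lo * hi)         # bisect on a log scale
        if sum(phi_of(mid).values()) > effort:
            lo = mid                     # too much effort allocated: raise lam
        else:
            hi = mid
    return phi_of(hi)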
The stationary target problem in step 5 can be efficiently solved using Lagrange multipliers, as described in Chapter 2 of Stone (1989). For the purposes of the FABC bound, the FAB algorithm is adapted to solve an OSP problem with the convex and DOE relaxations over the time steps t = k+1,...,T. In particular, P(·,k+1) is normalised to sum to one and used as the initial distribution in step 3, the value of t is set to k+1 instead of 1 in step 4, only one unit of search effort is available per time step, and the allocation at time t can only place search effort in the set of cells specified by S_t. Following Washburn (1983), define

M(x,t) = α(x,t) · reach(x,t) · q(x,t,φ(·,t)) · survive(x,t)

and

λ(t) = max_x M(x,t).

An upper bound on the difference between PD(φ) and the true optimal detection probability for the OSP problem (under the convex and DOE relaxations) is shown in Washburn (1983) to be obtainable as

D(φ) = Σ_{t=1}^{T} Σ_{x=1}^{N} φ(x,t) (λ(t) − M(x,t)).

Since the maximum detection probability of a problem with the convex and DOE relaxations can be no smaller than that of an equivalent problem with discrete effort constraints, the FABC bound PD(φ) + D(φ) (normalised with respect to the sum of the probability mass in the distribution P(·,k+1)) also provides a valid upper PD bound for the original OSP problem itself over the time steps k+1,...,T.
Bibliography Alpern, S., 1995, „The rendezvous search problem‟, SIAM Journal on Control and Optimization, vol. 33, no. 3, pp. 673-683. Anderson, E. J. and Weber, R. R., 1990, „The rendezvous problem on discrete locations‟, Journal of Applied Probability, vol. 27, no. 4, pp. 839-851. Batalin, M. A., and Sukhatme, G. S., 2005, „The analysis of an efficient algorithm for robot coverage and exploration based on sensor network deployment‟, In Proceedings of the IEEE International Conference on Robotics and Automation, Barcelona, Spain, pp. 3478-3485. Beard, R. W., and McLain, T. W., 2003, „Multiple UAV cooperative search under collision avoidance and limited range communication constraints', In Proceedings of the 42nd Conference on Decision and Control, Maui, Hawaii, USA, vol. 1, pp. 25-30. Benkoski, S. J., Monticino, M. G. and Weisinger, J. R., 1991, „A survey of the search theory literature‟, Naval Research Logistics, vol. 38, no. 4, pp. 469-494. Bourgault, F., Furukawa, T., and Durrant-Whyte, H. F., 2003a, „Optimal search for a lost target in a Bayesian world‟, In Proceedings of the 4th International Conference on Field and Service Robotics, Mt Fuji, Japan. Bourgault, F., Furukawa, T., and Durrant-Whyte, H. F., 2003b, „Coordinated decentralized search for a lost target in a Bayesian world‟, In IEEE/RSJ International Conference on Intelligent Robots and Systems, Las Vegas, Nevada, pp. 48-53. Bourgault, F., and Durrant-Whyte, H. F., 2004a, „Communication in general decentralized filters and the coordinated search strategy‟, In Proceedings of the 7th International Conference on Information Fusion, pp. 723-770. Bourgault, F., Furukawa, T., and Durrant-Whyte, H. F., 2004b, „Decentralized Bayesian negotiation for cooperative search‟, In Proceedings of the IEEE International Conference on Intelligent Robots and Systems, Sendai, Japan, pp. 2681-2686. Bourgault, F., Furukawa, T., and Durrant-Whyte, H. F., 2004c, „Process model, constraints, and the coordinated search strategy‟, In Proceedings of the IEEE International Conference on Robotics and Automation, New Orleans, USA, pp. 52565261. Brown S. S., 1980, „Optimal Search for a moving target in discrete time and space‟, Operations Research, vol. 28, no. 6, pp. 1275-1289. Chew, M. C., 1967, „A sequential search procedure‟, Annals of Mathematical Statistics, vol. 38, pp. 494-502. Chin, W., Ntafos, S., 1986, „Optimum watchman routes‟, In Proceedings of the Second Annual Symposium on Computational Geometry, Yorktown Heights, New York, USA, pp. 24-33. Chkhartishvili, L. G., and Shikin, E. V., 2002, „Geometry of dynamical search for objects‟, Journal of Mathematical Sciences, vol. 110, no. 2. Chvátal, V., 1975, „A combinatorial theorem in plane geometry‟, Journal of Combinatorial Theory, Series B, vol. 18, issue 1, pp. 39-41.
137
Clausen, J., 1999, „Branch and bound algorithms - principles and examples‟, Department of Computer Science, University of Copenhagen, Universitetsparken 1, DK-2100 Copenhagen, Denmark. Clausen, J. and Perregaard, M., 1999, „On the best search strategy in parallel branchand-bound: best first search versus lazy depth-first search‟, Annals of Operations Research, 90, pp. 1-17. Cormen, T. H., Leiserson, C. E., Rivest, R. L., 1990, Introduction to algorithms, The MIT Press: Cambridge, Massachusetts. Dambreville, F. and Le Cadre, J.-P., 2002, „Detection of a Markovian target with optimization of the search efforts under generalized linear constraints‟, Naval Research Logistics, vol. 49, no. 2, pp. 117-142. Dantu, K., Rahimi, M., Shah, H., Babel, S., Dhariwal, A., and Sukhatme, G. S., 2005, „Robomote: enabling mobility in sensor networks‟, In Fourth International Symposium on Information Processing in Sensor Networks, pp. 404-409. DasGupta, B., Hespanha, J. P., and Sontag, E., 2004, „Aggregation-based approaches to honey-pot searching with local sensory information‟, In Proceedings of the American Control Conference, Boston, USA, pp. 1207-1207. DasGupta, B., Hespanha, J. P., Riehl, J. and Sontag, E., 2006, „Honey-pot constrained searching with local sensory information‟, Nonlinear Analysis: Hybrid Systems and Applications, vol. 65, issue 9, pp. 1773-1793. Dell, R. F., Eagle, J. N., Martins, G. H. A. and Santos, A. G, 1996, „Using multiple searchers in constrained-path, moving-target search problems‟, Naval Research Logistics, vol. 43, 463-480. Discenza, J. H. and Stone, L. D., 1981, „Optimal survivor search with multiple states‟, Operations Research, vol. 29, no. 2, pp. 309-323. Dodin, P., Minvielle, P., Le Cadre, J.-P., 2007, „A branch-and-bound algorithm applied to optimal radar search pattern‟, Aerospace Science and Technology, vol. 11, issue 4, pp. 279-288. Eagle, J. N., 1984, „The optimal search for a moving target when the search path is constrained‟, Operations Research, vol. 32, no. 5, pp. 1107-1115. Eagle, J. N., and Yee, J. R., 1990, „An optimal branch and bound procedure for the constrained path, moving target search problem‟, Operations Research, vol. 38, no. 1, pp. 110-114. Erkut, E. and Zhang, J., 1996, „The maximum collection problem with time-dependent rewards‟, Naval Research Logistics, vol. 43, pp. 749-763. Eppstein, D., 1998, „Finding the k shortest paths‟, SIAM Journal on Computing, vol. 28, no. 2, pp. 652-673. Feilet, D., Dejax, P., and Gendreau, M., 2005, „Travelling salesman problem with profits‟, Transportation Science, vol. 39, no. 2, pp. 188-205. Flint, M., Fernandez-Gaucherand, E., and Polycarpou, M., 2003, „Cooperative control for UAV‟s searching risky environments for targets‟, In Proceedings of the 42nd IEEE Conference on Decision and Control, vol. 4, pp. 3567-3572.
Frost, J. R., and Stone, L. D., 2001, 'Review of search theory: advances and applications to search and rescue decision support', Technical Report No. CG-D-15-01, U.S. Coast Guard Research and Development Center, Groton, CT, USA.
Furukawa, T., Bourgault, F., Lavis, B., and Durrant-Whyte, H. F., 2006, 'Recursive Bayesian search-and-tracking using coordinated UAVs for lost targets', In Proceedings of the IEEE International Conference on Robotics and Automation, Orlando, Florida, pp. 2521-2526.
Gerkey, B. P. and Matarić, M. J., 2002, 'Sold! Auction methods for multi-robot coordination', IEEE Transactions on Robotics and Automation, Special Issue on Multi-robot Systems, vol. 18, no. 5, pp. 758-768.
Gerkey, B. P., Thrun, S., and Gordon, G., 2006, 'Visibility-based pursuit-evasion with limited field of view', International Journal of Robotics Research, vol. 25, issue 4, pp. 299-315.
Gilbert, E. N., 1959, 'Optimal search strategies', Journal of the Society for Industrial and Applied Mathematics, vol. 7, pp. 413-424.
Grocholsky, B., 2002, Information-theoretic control of multiple sensor platforms, PhD thesis, University of Sydney.
Hart, P., Nilsson, N., and Raphael, B., 1968, 'A formal basis for the heuristic determination of minimum cost paths', IEEE Transactions on Systems Science and Cybernetics, vol. SSC-4, no. 2, pp. 100-107.
Hohzaki, R., and Iida, K., 1997, 'Optimal strategy of route and look for the path constrained search problem with reward criterion', European Journal of Operational Research, vol. 100, pp. 236-249.
Hohzaki, R., and Iida, K., 2000, 'A search game when a search path is given', European Journal of Operational Research, vol. 124, pp. 114-124.
Hohzaki, R. and Iida, K., 2001, 'Optimal ambushing search for a moving target', European Journal of Operational Research, vol. 133, pp. 120-129.
Hohzaki, R., 2006, 'Discrete search allocation game with false contacts', Naval Research Logistics, vol. 54, issue 1, pp. 46-58.
Hollinger, G., Kehagias, A., and Singh, S., 2007, 'Probabilistic strategies for pursuit in cluttered environments with multiple robots', In Proceedings of the IEEE International Conference on Robotics and Automation, Rome, Italy, pp. 3870-3876.
Hollinger, G., Djugash, J., and Singh, S., 2007, 'Coordinated search in cluttered environments using range from multiple robots', In Proceedings of the International Conference on Field and Service Robotics, Chamonix, France.
Jin, Y., Liao, Y., Minai, A. A., and Polycarpou, M. M., 2006, 'Balancing search and target response in cooperative unmanned vehicle teams', IEEE Transactions on Systems, Man and Cybernetics – Part B: Cybernetics, vol. 36, no. 3, pp. 571-587.
Jung, B., 2005, Cooperative tracking using multiple robots, PhD thesis, University of Southern California, USA.
Jung, B. and Sukhatme, G. S., 2004, 'A generalized region-based approach for multi-target tracking in outdoor environments', In Proceedings of the IEEE International Conference on Robotics and Automation, vol. 3, pp. 2189-2195.
Kadane, J. B., and Simon, H. A., 1977, 'Optimal strategies for a class of constrained sequential problems', The Annals of Statistics, vol. 5, pp. 237-255.
Kadane, J. B., and Simon, H. A., 1983, 'Correction to optimal strategies for a class of constrained sequential problems', The Annals of Statistics, vol. 11, p. 346.
Kierstead, D. P. and DelBalzo, D. R., 2003, 'A genetic algorithm applied to planning search paths in complicated environments', Military Operations Research, vol. 8, no. 2, pp. 45-59.
Kisi, T., 1966, 'On an optimal searching schedule', Journal of the Operations Research Society of Japan, vol. 8, pp. 53-65.
Koenig, S., Tovey, C., Zheng, X., and Sungur, I., 2007, 'Sequential bundle-bid single-sale auction algorithms for decentralized control', In Proceedings of the International Joint Conference on Artificial Intelligence, Hyderabad, India, pp. 1359-1365.
Korf, R. E., 1985, 'Depth-first iterative-deepening: an admissible tree search', Artificial Intelligence, vol. 27, issue 1, pp. 97-109.
Krishnamachari, B. and Iyengar, S., 2004, 'Distributed Bayesian algorithms for fault-tolerant event region detection in wireless sensor networks', IEEE Transactions on Computers, vol. 53, no. 3, pp. 241-250.
Kuipers, B., Modayil, J., Beeson, P., MacMahon, M., and Savelli, F., 2004, 'Local metrical and global topological maps in the hybrid spatial semantic hierarchy', In Proceedings of the IEEE International Conference on Robotics and Automation, New Orleans, pp. 4845-4851.
Kunigami, M., 1997, Optimizing ASW search for HVU protection using the FAB algorithm, Master's thesis, Naval Postgraduate School, Monterey, CA.
Lau, H., 2003, 'Behavioural approach for multi-robot exploration', In Proceedings of the Australasian Conference on Robotics and Automation, Brisbane.
Lau, H., Huang, S., and Dissanayake, G., 2005, 'Optimal search for multiple targets in a built environment', In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Edmonton, Canada, pp. 3740-3745.
Lau, H., Huang, S., and Dissanayake, G., 2006, 'Probabilistic search for a moving target in an indoor environment', In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Beijing, China, pp. 3393-3398.
Lau, H., Huang, S., and Dissanayake, G., 2007a, 'Discounted MEAN bound for the optimal searcher path problem with non-uniform travel times', European Journal of Operational Research, in press.
Lau, H., Huang, S., and Dissanayake, G., 2007b, 'Multi-agent search with interim positive information', In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, San Diego, USA, to be presented.
Lawler, E. L., 1972, 'A procedure for computing the k best solutions to discrete optimization problems and its application to the shortest path problem', Management Science, vol. 18, no. 7, pp. 401-405.
Liao, Y., Jin, Y., Minai, A. A., and Polycarpou, M. M., 2005, 'Information sharing in cooperative unmanned aerial vehicle teams', In Proceedings of the 44th IEEE Conference on Decision and Control, and the European Control Conference, Seville, Spain, pp. 90-95.
Lössner, U., and Wegener, I., 1982, 'Discrete sequential search with positive switch cost', Mathematics of Operations Research, vol. 7, no. 3, pp. 426-440.
Martins, E. Q. V., and Pascoal, M. M. B., 2000, 'An algorithm for ranking optimal paths', Technical Report 01/002, CISUC. Viewed 14 March 2006.
Martins, E. Q. V., and Pascoal, M. M. B., 2003, 'A new implementation of Yen's ranking loopless paths algorithm', 4OR – Quarterly Journal of the Belgian, French and Italian Operations Research Societies, vol. 1, no. 2, pp. 121-134.
Martins, G., 1993, A new branch-and-bound procedure for computing optimal search paths, Master's thesis, Naval Postgraduate School.
Moors, M., and Schulz, D., 2006, 'Improved Markov models for indoor surveillance', In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Beijing, China, pp. 4072-4077.
Murphy, R. R., 2000, 'Biomimetic search for urban search and rescue', In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, vol. 3, pp. 2073-2078.
Nakai, T., 1981, 'Search problem with time-dependent detection probabilities', Mathematica Japonica, vol. 26, pp. 499-505.
Nakai, T., 1988, 'Search models with continuous effort under various criteria', Journal of the Operations Research Society of Japan, vol. 31, no. 3, pp. 335-351.
Ogras, U. Y., Dagci, O. H., and Ozguner, U., 2004, 'Cooperative control of mobile targets for target search', In Proceedings of the IEEE International Conference on Mechatronics, pp. 123-128.
Onaga, K., 1971, 'Optimal search for detecting a hidden target', SIAM Journal on Applied Mathematics, vol. 20, no. 2, pp. 298-318.
Pollock, S. M., 1970, 'A simple model for search for a moving target', Operations Research, vol. 18, no. 3, pp. 883-903.
Polycarpou, M. M., Yang, Y., and Passino, K. M., 2001, 'A cooperative search framework for distributed agents', In Proceedings of the IEEE International Symposium on Intelligent Control, Mexico City, Mexico.
Riehl, J. R., and Hespanha, J. P., 2007, 'Cooperative graph search using fractal decomposition', In Proceedings of the American Control Conference, New York City, USA.
Rybski, P. E., Papanikolopoulos, N. P., Stoeter, S. A., Krantz, D. G., Yesin, K. B., Gini, M., Voyles, R., Hougen, D. F., Nelson, B., and Erickson, M. D., 2000, 'Enlisting rangers and scouts for reconnaissance and surveillance', IEEE Robotics & Automation Magazine, vol. 7, issue 4, pp. 14-24.
Sarmiento, A., Murrieta, R., and Hutchinson, S. A., 2003, 'An efficient strategy for rapidly finding an object in a polygonal world', In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Las Vegas, USA, pp. 1153-1158.
Sarmiento, A., Murrieta-Cid, R. and Hutchinson, S., 2004, 'Planning expected-time optimal paths for searching known environments', In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Sendai, Japan, pp. 872-878.
Singh, S., and Krishnamurthy, V., 2003, 'The optimal search for a Markovian target when the search path is constrained: the infinite-horizon case', IEEE Transactions on Automatic Control, vol. 48, no. 3, pp. 493-497.
Smith, F. H., and Kimeldorf, G., 1975, 'Discrete sequential search for one of many objects', The Annals of Statistics, vol. 3, no. 4, pp. 906-915.
Stentz, A., 1994, 'Optimal and efficient path planning for partially-known environments', In Proceedings of the IEEE International Conference on Robotics and Automation, San Diego, USA, pp. 3310-3317.
Stewart, T. J., 1979, 'Search for a moving target when searcher motion is restricted', Computers and Operations Research, vol. 6, pp. 129-140.
Stewart, T. J., 1985, 'Optimizing search with positive information feedback', Naval Research Logistics, vol. 32, no. 2, pp. 263-274.
Stone, L. D., 1984, 'Generalized search optimization', in E. J. Wegman and J. G. Smith (eds), Statistical Signal Processing, Marcel Dekker Inc., New York.
Stone, L. D., 1989, Theory of optimal search, 2nd edn, Academic Press.
Stone, L. D., and Kadane, J. B., 1981, 'Optimal whereabouts search for a moving target', Operations Research, vol. 29, pp. 1154-1166.
Stromquist, W. R., and Stone, L. D., 1981, 'Constrained optimization of functionals with search theory applications', Mathematics of Operations Research, vol. 6, no. 4, pp. 518-529.
Sujit, P. B., and Ghose, D., 2004, 'Search using multiple UAVs with flight time constraints', IEEE Transactions on Aerospace and Electronic Systems, vol. 40, no. 2, pp. 491-509.
Sujit, P. B., and Ghose, D., 2006, 'Self assessment schemes for multi-agent cooperative search', In Proceedings of the American Control Conference, Minneapolis, Minnesota, USA, pp. 1388-1393.
Tadokoro, S., 2002, 'RoboCupRescue project', In Proceedings of the 41st SICE Annual Conference, vol. 1, pp. 334-337.
Tierney, L. and Kadane, J. B., 1983, 'Surveillance search for a moving target', Operations Research, vol. 31, no. 4, pp. 720-738.
Trummel, K. E., and Weisinger, J. R., 1986, 'The complexity of the optimal searcher path problem', Operations Research, vol. 34, no. 2, pp. 324-327.
Washburn, A. R., 1981, 'An upper bound useful in optimizing search for a moving target', Operations Research, vol. 29, no. 6, pp. 1227-1230.
Washburn, A. R., 1983, 'Search for a moving target: the FAB algorithm', Operations Research, vol. 31, no. 4, pp. 739-751.
Washburn, A. R., 1995, 'Branch and bound methods for search problems', Technical Report NPS-OR-95-003, Naval Postgraduate School, Monterey, CA, USA.
Washburn, A. R., 1998, 'Branch and bound methods for search problems', Naval Research Logistics, vol. 45, pp. 243-257.
Washburn, A. R., 2002, Search and detection, 4th edn, INFORMS.
Wegener, I., 1982, 'The discrete search problem and the construction of optimal allocations', Naval Research Logistics Quarterly, vol. 29, pp. 203-212.
Yen, J. Y., 1971, 'Finding the k shortest loopless paths in a network', Management Science, vol. 17, pp. 712-716.
Zahl, S., 1963, 'An allocation problem with applications to operations research and statistics', Operations Research, vol. 11, pp. 426-441.