A Proof of Knight-Swapping Puzzle Heuristic Properties. 106 ...... In this problem, we consider a number of airports located in a two-dimensional grid-world,. 27 ...
SAMPLING-BASED PLANNING FOR DISCRETE SPACES by STUART BRUCE MORGAN
Submitted in partial fulfillment of the requirements for the degree of Master of Science
Thesis Advisor: Dr. Michael S. Branicky
Department of Electrical Engineering and Computer Science CASE WESTERN RESERVE UNIVERSITY May, 2004
Contents
List of Figures
5
List of Tables
7
List of Algorithms
8
Abstract
9
1 Introduction
10
1.1
Problem Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . .
10
1.2
Thesis Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . .
10
1.3
Thesis Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
11
1.4
Experimental Data . . . . . . . . . . . . . . . . . . . . . . . . . . . .
12
2 Discrete Search Background 2.1
2.2
2.3
13
Discrete Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
13
2.1.1
Discrete Space Planning Problems . . . . . . . . . . . . . . . .
14
Existing Discrete Algorithms . . . . . . . . . . . . . . . . . . . . . . .
15
2.2.1
General Discrete Search Framework . . . . . . . . . . . . . . .
15
2.2.2
Uninformed Search . . . . . . . . . . . . . . . . . . . . . . . .
17
2.2.3
Informed Search . . . . . . . . . . . . . . . . . . . . . . . . . .
18
Sample Discrete Planning Problems . . . . . . . . . . . . . . . . . . .
24
1
2.3.1
Grid World . . . . . . . . . . . . . . . . . . . . . . . . . . . .
24
2.3.2
N -Puzzle . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
25
2.3.3
Knight-Swapping Puzzle . . . . . . . . . . . . . . . . . . . . .
25
2.3.4
Grid World Air Traffic Control . . . . . . . . . . . . . . . . .
27
3 Sampling-Based Planning Background 3.1
3.2
Rapidly-Exploring Random Trees . . . . . . . . . . . . . . . . . . . .
30
3.1.1
The RRT Algorithm . . . . . . . . . . . . . . . . . . . . . . .
31
3.1.2
Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
32
3.1.3
Path Planning Techniques . . . . . . . . . . . . . . . . . . . .
34
3.1.4
Hybrid RRTs . . . . . . . . . . . . . . . . . . . . . . . . . . .
36
Probabilistic Roadmaps . . . . . . . . . . . . . . . . . . . . . . . . .
36
3.2.1
The PRM Algorithm . . . . . . . . . . . . . . . . . . . . . . .
37
3.2.2
Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
37
3.2.3
Path Planning Techniques . . . . . . . . . . . . . . . . . . . .
38
4 Sampling-Based Planners for Discrete Spaces 4.1
4.2
4.3
4.4
30
39
Discretized RRT Algorithm . . . . . . . . . . . . . . . . . . . . . . .
39
4.1.1
Path Planning . . . . . . . . . . . . . . . . . . . . . . . . . . .
42
Rapidly-Exploring Random Leafy Trees . . . . . . . . . . . . . . . . .
42
4.2.1
Path Planning Optimizations . . . . . . . . . . . . . . . . . .
44
Discretized PRM Algorithm . . . . . . . . . . . . . . . . . . . . . . .
45
4.3.1
Path Planning . . . . . . . . . . . . . . . . . . . . . . . . . . .
46
Discrete Sampling-Based Planning Results . . . . . . . . . . . . . . .
46
4.4.1
RRT vs. RRLT Comparison . . . . . . . . . . . . . . . . . . .
46
4.4.2
N -Puzzle Results . . . . . . . . . . . . . . . . . . . . . . . . .
48
4.4.3
Knight-Swapping Puzzle Results . . . . . . . . . . . . . . . . .
49
4.4.4
Grid World Results . . . . . . . . . . . . . . . . . . . . . . . .
51
2
4.4.5
Discrete Air Traffic Control Results . . . . . . . . . . . . . . .
51
5 Discrete Sampling-Based Planners: Improvements and Variations
55
5.1
RRT and RRLT Nearest-Neighbor Search Improvements . . . . . . .
55
5.1.1
Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
60
RRTs With Local Planners . . . . . . . . . . . . . . . . . . . . . . . .
62
5.2.1
Meta-Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . .
64
5.2.2
RRLTs vs. RRTs as Global Planners . . . . . . . . . . . . . .
66
5.2.3
Best-First Search as a Local Planner . . . . . . . . . . . . . .
67
5.2.4
A∗ as a Local Planner . . . . . . . . . . . . . . . . . . . . . .
68
5.2.5
Generalized Heuristic Local Planners . . . . . . . . . . . . . .
68
5.2.6
Importance of Randomization in Local Planners . . . . . . . .
70
5.3
Cost-Optimized RRTs . . . . . . . . . . . . . . . . . . . . . . . . . .
73
5.4
k -Growth RRLTs . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
74
5.5
Semi-Leafy RRTs . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
76
5.2
6 Sampling-Based Planning Properties in Discrete Space
78
6.1
Overlapping Voronoi Regions and Tie-Breaking . . . . . . . . . . . .
78
6.2
Coverage and Heuristic—Dynamic Interactions . . . . . . . . . . . . .
79
6.3
Path Optimality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
91
7 Conclusion
97
7.1
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
97
7.2
Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
98
7.2.1
Refinements . . . . . . . . . . . . . . . . . . . . . . . . . . . .
98
7.2.2
Extensions of Discrete Sampling-Based Planners . . . . . . . .
99
7.2.3
Study of Discrete Properties . . . . . . . . . . . . . . . . . . . 103
7.2.4
Application to Continuous and Hybrid Systems . . . . . . . . 104
3
A Proof of Knight-Swapping Puzzle Heuristic Properties
106
A.1 Admissibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106 A.2 Domination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108 A.3 Symmetry Violation . . . . . . . . . . . . . . . . . . . . . . . . . . . 109 A.4 Triangle Inequality Violation . . . . . . . . . . . . . . . . . . . . . . . 110 B Implementation Details
111
Bibliography
113
4
List of Figures 2.1
A simple discrete motion-planning problem. . . . . . . . . . . . . . .
14
2.2
Optimal and feasible paths. . . . . . . . . . . . . . . . . . . . . . . .
15
2.3
Breadth-first and depth-first search. . . . . . . . . . . . . . . . . . . .
18
2.4
A∗ and best-first search. . . . . . . . . . . . . . . . . . . . . . . . . .
23
2.5
The N -Puzzle and Knight-Swapping Puzzle. . . . . . . . . . . . . . .
26
2.6
Constraints on the Knight-Swapping Puzzle. . . . . . . . . . . . . . .
27
2.7
Grid-world air traffic control. . . . . . . . . . . . . . . . . . . . . . .
29
3.1
Construction of an RRT in continuous space. . . . . . . . . . . . . . .
31
3.2
Voronoi regions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
33
3.3
Voronoi interpretation of a continuous RRT. . . . . . . . . . . . . . .
33
4.1
A non-optimal step in a discrete RRT. . . . . . . . . . . . . . . . . .
41
4.2
A failed step in a discrete RRT. . . . . . . . . . . . . . . . . . . . . .
42
4.3
8-Puzzle example of RRT and RRLT growth. . . . . . . . . . . . . . .
43
4.4
Simple grid-world air traffic control. . . . . . . . . . . . . . . . . . . .
53
4.5
More difficult grid-world air traffic control. . . . . . . . . . . . . . . .
54
5.1
Pruning in nearest-neighbor searches. . . . . . . . . . . . . . . . . . .
58
5.2
Nearest-neighbor pruning in a bi-directional 24-Puzzle RRLT. . . . .
61
5.3
Nearest-neighbor pruning in a single-directional 9×9 Knight-Swapping Puzzle RRLT. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5
61
5.4
Discrete analog of a continuous RRT’s . . . . . . . . . . . . . . . . .
63
5.5
Improved discrete analog of a continuous RRT’s . . . . . . . . . . . .
64
5.6
Meta vs. non-meta global planners in the Knight-Swapping Puzzle. .
65
5.7
Lack of stepwise optimality in the meta-RRLT. . . . . . . . . . . . .
66
5.8
Potential usefulness of leaves in the meta-RRLT. . . . . . . . . . . . .
67
5.9
Varied local heuristics in meta-RRTs and meta-RRLTs. . . . . . . . .
69
5.10 Varied local heuristics and sizes in meta-RRLTs. . . . . . . . . . . . .
70
5.11 Meta-tree randomization visualization. . . . . . . . . . . . . . . . . .
72
5.12 K-Growth RRLTs. . . . . . . . . . . . . . . . . . . . . . . . . . . . .
75
6.1
Discrete grid-world Voronoi regions using the L1 metric. . . . . . . .
83
6.2
Discrete grid-world Voronoi regions using the L2 metric. . . . . . . .
84
6.3
Discrete grid-world Voronoi regions using the L∞ metric. . . . . . . .
85
6.4
Discrete grid-world coverage. . . . . . . . . . . . . . . . . . . . . . . .
86
6.5
Discrete grid-world coverage with diagonal moves. . . . . . . . . . . .
87
6.6
Discrete three-dimensional grid-world coverage. . . . . . . . . . . . .
88
6.7
Voronoi regions in grid-world meta-trees and PRMs. . . . . . . . . . .
89
6.8
Narrow-corridor coverage in grid-world meta-trees and PRMs. . . . .
90
6.9
Discrete grid-world path optimality. . . . . . . . . . . . . . . . . . . .
94
6.10 Discrete grid-world path optimality with α weighting. . . . . . . . . .
95
6.11 Coverage and optimality of grid-world meta-trees and PRMs. . . . . .
96
A.1 Violation of symmetry in the Knight-Swapping Puzzle. . . . . . . . . 109 A.2 Violation of the triangle inequality in the Knight-Swapping Puzzle. . 110
6
List of Tables 4.1
Space-filling comparison of RRTs and RRLTs. . . . . . . . . . . . . .
47
4.2
Goal-biased search in the Knight-Swapping Puzzle and N -Puzzle. . .
48
4.3
Single-directional search comparison in the N -Puzzle. . . . . . . . . .
49
4.4
Bi-directional search comparison in the N -Puzzle. . . . . . . . . . . .
49
4.5
Single-directional search comparison in the Knight-Swapping Puzzle.
50
4.6
Bi-directional search comparison in the Knight-Swapping Puzzle. . . .
50
4.7
Single-directional search comparison in the Knight-Swapping Puzzle with added constraints. . . . . . . . . . . . . . . . . . . . . . . . . . .
4.8
50
Bi-directional search comparison in the Knight-Swapping Puzzle with added constraints. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
51
Results for a simple air traffic control problem. . . . . . . . . . . . . .
53
4.10 Results for a harder air traffic control problem. . . . . . . . . . . . . .
54
5.1
Comparison of pruning methods. . . . . . . . . . . . . . . . . . . . .
62
5.2
Cost-optimized N -Puzzle and Knight-Swapping Puzzle RRLTs. . . .
74
5.3
Comparison of the semi-leafy tree and RRLT. . . . . . . . . . . . . .
77
4.9
7
List of Algorithms 2.1
Generalized discrete reachability algorithm. . . . . . . . . . . . . . . .
15
2.2
Generalized discrete planning algorithm. . . . . . . . . . . . . . . . .
16
3.1
The RRT algorithm in continuous space. . . . . . . . . . . . . . . . .
31
3.2
A bi-directional RRT search algorithm. . . . . . . . . . . . . . . . . .
35
3.3
The PRM learning algorithm in continuous space. . . . . . . . . . . .
37
3.4
The PRM planning algorithm in continuous space. . . . . . . . . . . .
38
4.1
The RRT algorithm in discrete space. . . . . . . . . . . . . . . . . . .
40
4.2
The RRLT algorithm.
. . . . . . . . . . . . . . . . . . . . . . . . . .
43
4.3
The PRM learning algorithm in discrete space. . . . . . . . . . . . . .
45
4.4
The PRM planning algorithm in discrete space. . . . . . . . . . . . .
46
5.1
Nearest-neighbor search with leaf-based pruning. . . . . . . . . . . . .
57
5.2
Nearest-neighbor search with node-based pruning. . . . . . . . . . . .
59
8
Sampling-Based Planning for Discrete Spaces Abstract by STUART BRUCE MORGAN
In this thesis, we explore the use of sampling-based planning algorithms for searching discrete spaces. We adapt the rapidly-exploring random tree (RRT) and probabilistic roadmap (PRM) algorithms for use in solving discrete-space problems by substituting a heuristic for the distance metric in nearest-neighbor searches and by substituting local planners for straight-line connectivity. We provide experimental results for these discrete sampling-based planning algorithms on several example problems and compare them to existing discrete search methods, such as A∗ . In addition, we propose new algorithms inspired by these discrete sampling-based planners, most notably the rapidly-exploring random leafy tree (RRLT), which addresses issues in the discretized RRT, and meta-trees, which combine elements of the RRT and PRM algorithms. Lastly, we explore coverage and optimality properties of these algorithms in grid-world problems, and explain their underlying causes in terms of discrete Voronoi regions and agent dynamics.
9
Chapter 1 Introduction 1.1
Problem Overview
The purpose of this thesis is to explore a novel method of solving discrete-space planning and search problems using sampling-based techniques. A wide array of interesting problems, particularly in the field of artificial intelligence, can be expressed as discrete-space searches. These range from primarily physical planning problems, like the N -Puzzle [4, 10, 22, 34, 36], the Rubik’s Cube [21, 34], and Sokoban [17–19], to more abstract feasibility or optimization problems such as combinatorial auctions [9, 15, 32] and the traveling salesman problem [13, 29, 34]. Since discrete-space searching is fundamental to so many problems, new ways of searching have the potential to make improvements in understanding and solving a variety of current research topics.
1.2
Thesis Contributions
Our work demonstrates that rapidly-exploring random trees (RRTs) and probabilistic roadmaps (PRMs), two popular sampling-based algorithms for continuous planning in high-dimensional problems, can be applied successfully to 10
discrete-space planning. We also propose a variety of other algorithms based on the discretized versions of these two planners, and explore their behaviors and effectiveness in discrete search. In addition, we explore the properties of sampling-based planners in simple discretized two-dimensional space by providing visualizations of coverage and optimality characteristics. We explain the patterns found in these visualizations in terms of interactions between the algorithms and the underlying dynamics of the space, illustrated using discrete Voronoi regions. This thesis is a comprehensive treatment of work that has already been published in [5, 6, 31].
1.3
Thesis Outline
This thesis is organized into seven chapters: Chapter 1 lays out the focus of this thesis, the motivation for researching discrete problems, and the contributions made. Chapter 2 formally introduces discrete spaces and discrete planning problems, and highlights common approaches to solving these problems. In addition, we present several discrete planning problems that are used in experiments throughout this thesis. Chapter 3 introduces sampling-based planning, specifically the RRT and PRM algorithms. Chapter 4 explains the process of adapting the RRT and PRM algorithms for use in discrete space, and highlights the shortcomings of direct discrete-space mappings. We also introduce the rapidly-exploring random leafy tree (RRLT), which more closely parallels the properties and behavior of continuous RRTs. 11
Finally, we provide experimental results of the effectiveness of these algorithms in several example problems. Chapter 5 describes new discrete sampling-based planning algorithms inspired by our discrete RRT and PRM algorithms. Most notably, we describe meta-tree planners, which use sampling-based planning at a high level while abstracting away details of the discrete space by using local searches. Here we also discuss improvements and optimizations of our algorithms and explore their usefulness in solving our example problems through experimentation. Chapter 6 explores in depth several properties of the discrete sampling-based planning algorithms in relation to their continuous counterparts, highlighting notable differences and similarities. Specifically, we examine the idea of Voronoi regions in discrete space, and explore coverage and optimality properties of our algorithms in a discrete grid world. Chapter 7 reviews our conclusions and contributions, and suggests areas that may warrant further research.
1.4
Experimental Data
Throughout this thesis, the experimental data that we report was generated on a dual 2 GHz processor G5 with 2 GB RAM. All algorithms were implemented in C++, making extensive use of Standard Template Library (STL) data structures (see Appendix B). Visualizations of experiments were produced using OpenGL. Because our results are generated by randomized algorithms, all data we report is approximate. As a result, experimental values are reported to at most three significant figures. In each experiment, we averaged our results over many trials, and report the number of trials with the data.
12
Chapter 2 Discrete Search Background In order to develop new algorithms for discrete planning problems, we must first understand what discrete planning entails. In this chapter, we define discrete space and the discrete planning problem, survey existing discrete search methods, and introduce several examples of discrete planning problems.
2.1
Discrete Systems
We define a discrete system 1 as consisting of a countable set of states S, and a corresponding set of discrete dynamics that define, for each state q, a transition rule ∆ : S → 2S , where ∆(q) = Q0 ⊆ S is the (finite) set of possible successors to q. In discrete search, it is often useful to consider the transition rules for states in terms of a small set of universal operators, each of which map q → q 0 in a predictable way. For example, in planning an agent’s motion in a grid world (see Figure 2.1), where each state corresponds to an (x, y) pair, the operators are the four possible moves to adjacent squares: oup (x, y) → (x, y + 1), odown (x, y) → (x, y − 1), oleft (x, y) → (x − 1, y), and oright (x, y) → (x + 1, y) (with the constraint that an operator cannot be applied if the resultant state would collide with an obstacle). 1
For convenience, we will often refer to a discrete system simply as a discrete space.
13
y
g
Figure 2.1: A simple discrete motion-planning problem where the goal is to move the agent (represented by the black circle) to the goal location g. The agent may move up, down, left, or right from cell to cell.
2.1.1
Discrete Space Planning Problems
In general, a discrete planning problem (also known as a discrete search problem) gives us a a start state qstart and a set of goal states G in a discrete system, and asks us to find a path from qstart to some qgoal ∈ G. This solution path must consist of a o
o
on−1
o
0 1 2 sequence of states and state transitions qstart → q1 → q2 → · · · → qgoal with each
transition oi to a new state obeying the transition rule for the previous state (i.e., for each transition qn → qn+1 , it must be true that qn+1 ∈ ∆(qn )). Path Cost Often in planning problems there is a cost associated with each transition or operator. For example, we might associate a cost of 1 to each move of one cell by o
o
o
on−1
1 2 3 the agent. We can define the cost of a path q1 → q2 → q3 → · · · → qn as Pn−1 i=1 cost(oi ), where oi is the operator mapping qi to qi+1 and cost(o) > 0 for all
operators o. Cost considerations add a new element to planning problems: in some cases it is not sufficient merely to find a path between two states, but we must instead find an optimal path—a path such that there is no other path between the two states that has a lower cost. In cases where cost is not a primary consideration, we speak instead of finding a feasible path—one that connects two states regardless of cost (see Figure 2.2).
14
y
y
g
g
Figure 2.2: Optimal (left) and feasible (right) paths for a grid-world planning problem.
2.2
Existing Discrete Algorithms
Due to its importance in a wide variety of areas, there has been significant research into the subject of discrete search. Here, we explore general discrete search techniques and summarize several well-known search algorithms, including breadth-first, depth-first, best-first, and A∗ [33, 34].
2.2.1
General Discrete Search Framework
In broad terms, we can express discrete search algorithms in a generalized search framework, using the following algorithm: Algorithm 2.1: Generalized discrete reachability algorithm. 1 S e a r c h (S, G, qstart ) 2 T ree . i n i t () 3 Open . i n i t (qstart ) 4 while (Open . isNotEmpty()) { 5 q = Open . pop() 6 T ree . add(q) 7 f o r each qnew in S . g e n e r a t e S u c c e s s o r s (q) { 8 i f (G . c o n t a i n s (qnew )) 9 return s u c c e s s 10 i f (T ree . doesNotContain(qnew )) 11 Open . push(qnew ) 12 } 13 } 14 return f a i l u r e 15 }
15
The algorithm begins with an empty search tree and a list of open states (those that have not yet been explored) containing only a start state qstart . At each step, the first state in the open list is added to the tree, and each of its possible successors is considered: those successors that have not already been explored by the tree are added to the open list. This process continues until a goal state is encountered (in which case the search is successful) or the open list becomes empty (which indicates that the algorithm has explored all reachable states without finding a solution). This framework, although simple, encompasses a wide variety of search algorithms with only minor changes. For example, the properties of the search can be altered dramatically simply by changing the ordering of the Open list.
Path Planning The algorithm described above is known as a reachability algorithm. A reachability algorithm is useful for determining if a path between two nodes exists, but may tell us nothing about the path itself. Algorithm 2.1 can be expanded into a complete planning algorithm by storing extra information with each state during the search. In the modified algorithm below, states are replaced by nodes, which contain extra information in addition to the state. Here, nodes consist of tuples defined as hstate, cost, pathi ∈ S × R+ × P (where P is the set of all possible paths). Algorithm 2.2: Generalized discrete planning algorithm. 1 2 3 4 5 6 7 8 9 10 11 12
S e a r c h (S, G, qstart ) T ree . i n i t () Open . i n i t (hqstart , 0, ∅i) while (Open . isNotEmpty()) { hq, c, pi = Open . pop() T ree . add(hq, c, pi) f o r each hqnew , cnew i in S . g e n e r a t e S u c c e s s o r s (hq, ci) { i f (G . c o n t a i n s (qnew )) return s u c c e s s i f (T ree . doesNotContain(qnew ) Open . push(hqnew , cnew , p · qi) }
16
13 } 14 return f a i l u r e 15 }
Note that the tuple values for successors are generated with no additional work by the algorithm. In line 7 the cost of a successor is generated by the same operator that generates the new state, and in line 11 the successor’s path is generated by concatenating the parent’s state onto the parent’s path. For convenience, we will often use q to refer to nodes as well as states in our algorithms, since many of our algorithms will use both nodes and states in searching, and state is generally the primary component of a node (see, for example, nearest-neighbor queries in the discrete RRT of Section 4.1).
Bi-Directional Search In cases where there is only one goal state qgoal , it is often possible to use two trees to make planning significantly faster. In bi-directional search, two trees are grown simultaneously, one from qstart toward qgoal , and one from qgoal toward qstart . Each tree is grown according to Algorithm 2.1, except that the goal checking in line 8 is replaced with a check to determine if the two trees have intersected yet. Generally, the search proceeds by alternating trees, performing one iteration of the search algorithm of one tree before switching to the other, until the trees intersect. The usefulness of bi-directional search is largely dependent on the specific search algorithm, as we will discuss in the descriptions below.
2.2.2
Uninformed Search
The simplest searches are na¨ıve or uninformed searches, which order the open list without the use of any information that could be obtained or deduced about the space being searched or the “usefulness” of any given avenue of exploration. The
17
14
15
16
13
11
12
8
9
4
1
5
10
6
2
y
3
7
17
18
20
19
22
12
13
14
15
16
17
18
20
19
10
11
21
7
6
5
4
3
21
22
g23
8
9
y
1
2
g24
23
Figure 2.3: Breadth-first (left) and depth-first (right) search trees in a grid-world planning problem. Cells are numbered by the order in which they are explored, assuming that successors are generated in the order up, down, left, right.
canonical uninformed searches, breadth-first and depth-first, can be obtained from the generalized algorithm by treating the open list as a queue or stack respectively (see Figure 2.3). Breadth-first and depth-first search are at the opposite ends of the spectrum with regard to applicability of bi-directional search. In breadth-first search, a solution with a path length of m would require searching bm states (where b is the average branching factor, or number of operators that can be performed on each state), whereas a bi-directional breadth-first search would require searching only 2bm/2 states, reducing the search space exponentially. Since breadth-first search trees are full trees, exploring all possible branches equally, they are guaranteed to intersect at a half-way point between qstart and qgoal . On the other hand, bi-directional search using depth-first search is often less effective than a single-directional search; since its search trees are very sparse, it is often the case that the two trees reach their respective targets without ever intersecting, and thus twice as many states are explored as necessary [34].
2.2.3
Informed Search
Since uninformed search methods are exponential in the size of the solution path and the branching factor of the operators, another class of search algorithms, known as informed search, employ heuristic information about the space to attempt to 18
reduce the scope of the search. By using this additional information, a search algorithm can bias growth in directions that appear promising. In our discussions of specific informed search algorithms we denote the function used to evaluate node x by f (x), with the convention that a node with a lower f (x) value is a “better” node. In terms of the general search framework, this means that the first element of the open list at each step is defined to be the node with the lowest f (x) value. For example, in the grid world of Figures 2.1 and 2.2 we could define f (x) as the number of moves that would be required to reach the goal from node x if the wall did not exist (see Section 2.3.1).
Heuristics Heuristics allow an algorithm to make educated guesses about the distance or cost-to-go between two states in a discrete space; better heuristics improve the performance of informed search. Obtaining a perfectly accurate heuristic for a search problem is equivalent to solving the problem, since simple gradient descent guided by the heuristic would yield an optimal solution with no exploration—such a heuristic can be created for any problem simply by solving a search between the two states with an uninformed search (or an informed search using a less precise heuristic), however this is impractical since it is as hard as the original problem. In practice, the goal of heuristic design is to create the best possible heuristic that does not pass the point of diminishing return on increased complexity. One common method of obtaining such heuristics is known as operator relaxation, where constraints on the generation of new states (such as obstacles in the space) are ignored for the purposes of of computing the remaining cost until the relaxed problem becomes computationally feasible to solve [34]. Appendix A gives an example of a heuristic we obtained for a test problem using operator relaxation. In designing a heuristic for informed search, there are several properties that
19
are often desirable. Perhaps the most important of these is the property of admissibility. This property guarantees that for every pair of nodes x, y, their heuristic distance h(x, y) satisfies
0 ≤ h(x, y) ≤ d(x, y),
where d(x, y) is the actual (optimal) distance from x to y. By guaranteeing that the distance between two states is never underestimated, it is possible to put lower bounds on the optimality of many heuristic algorithms. Also useful is the idea of a metric heuristic. In order to be a metric, a heuristic must satisfy five properties [14]: • Non-negativity: h(x, y) ≥ 0, • Reflexivity: h(x, x) = 0, • Discernibility: h(x, y) > 0 if x 6= y, • Symmetry: h(x, y) = h(y, x), and • Triangle inequality: h(x, y) + h(y, z) ≥ h(x, z). The properties of non-negativity, reflexivity, and discernibility are trivial to guarantee in a heuristic (assuming a non-zero lower-bound on the cost to move between any two states in the space, which is generally true in a discretized space). Symmetry and the triangle inequality, however, can be more difficult to guarantee. Although they can be useful in analysis, as we will see later, they are not necessary. (Symmetry can of course be guaranteed by defining h0 (x, y) = max(h(x, y), h(y, x)), but the twofold increase in computational time is not always worthwhile.) Since a good heuristic can be critical to an algorithm’s usefulness, it is often useful to compare two heuristics: Heuristic hA is better than heuristic hB if hA (x, y) 20
generally gives values closer to d(x, y). When combined with admissibility constraints, we have that hA is the better heuristic if hB (x, y) ≤ hA (x, y) ≤ d(x, y) for all x, y. If this is true, then heuristic hA is said to dominate hB . It is not always possible to prove dominance of heuristics, however. In such cases, heuristics are generally evaluated through experimentation. A∗ Algorithm The A∗ algorithm is an optimal discrete-space search algorithm, in that (i) it guarantees that it will find the optimal solution to any discrete planning problem if a solution exists, and (ii) it has been shown that no other algorithm can guarantee optimal results with the same heuristic information while expanding fewer nodes than the A∗ algorithm [33]. The evaluation function f (x) for the A∗ algorithm estimates the total length of the optimal path from qstart to qgoal constrained to go through node x. It has two components, expressed as f (x) = g(x) + h(x), where g(x) represents the cost from the starting state qstart to the state x and h(x) is a heuristic estimate of the remaining cost from x to a goal state qgoal . In the ideal case, g(x) would be the optimal cost from qstart to x (denoted by g ∗ (x)), and h(x) would be the optimal cost from x to the nearest qgoal (denoted by h∗ (x)). Taken together, we could define f ∗ (x) = g ∗ (x) + h∗ (x), with f ∗ (x) giving the optimal cost of a solution path containing x. However, as explained above, the function h∗ (x) cannot be known in any but the most trivial of problems, and therefore must be estimated by h(x).2 Using an estimate h(x) leaves us with an evaluation function f (x) that selects the node that, to the extent that the heuristic h(x) is accurate, will have the lowest overall solution cost. The g ∗ (x) component of the function need not be estimated, since the properties of the A∗ search guarantee that if the minimum cost so far to every state q is stored and updated as the algorithm progresses, each node in the solution will have the lowest-cost path to that node’s state [33]. 2
21
The choice of h(x) has a profound impact on the effectiveness of the A∗ algorithm. In order to guarantee the optimality of the A∗ algorithm it must be the case that for every x, 0 ≤ h(x) ≤ h∗ (x); i.e., h(x) must never over -estimate the remaining cost. Within that constraint, the algorithm performs better and better as h(x) approaches h∗ (x), which leads to a trade-off in designing a heuristic for A∗ searches. As the accuracy of h(x) increases, the number of nodes expanded during a search decreases; however, there is generally an increase in the computational complexity of the calculation of a better h(x). The optimality of the A∗ algorithm comes at the cost of increased space and time requirements. Although a variety of A∗ variants have been proposed that sacrifice some amount of optimality in exchange for improved running time, such as A∗ , R∗δ , and R∗δ, [33], its focus on optimal or near-optimal paths generally makes it a poor choice when trying to find any solution—regardless of its optimality—as quickly as possible.
Best-First Search The best-first search algorithm is a greedy informed search designed to find solutions as quickly as possible, with no regard to the optimality of solutions. The evaluation function f (x) for best-first search consists of the heuristic distance from x to qgoal (h(x)), without considering the cost from qstart to x (g(x)). Where A∗ favors nodes with lowest expected total cost, best-first search follows paths that the heuristic indicates are closest to finding a solution, i.e., those that have the lowest expected cost-to-go (see Figure 2.4). While best-first search is useful for quickly finding solutions to straightforward problems, it performs poorly in spaces with “obstacles” or dead-ends that are not accounted for in the heuristic. For example, best-first search will often exhaustively explore an entire blocked subtree before beginning to explore another path.
22
11
12
13
14
9
15
16
11
17
9
10
7
3
5
6
18
9
4
y
1
2
g19
12
13
14
15
16 17
10
7
5
4
3
18
8
6
y
1
2
g19
Figure 2.4: A∗ (left) and best-first (right) search trees in a grid-world planning problem, using a heuristic h(x) that counts the number of moves between x and qgoal ignoring obstacles. Cells are numbered by the order in which they are explored, assuming that ties are broken in favor of older nodes, and that successors are generated in the order up, down, left, right.
Generalized Heuristic Search Both A∗ and best-first search can be generalized to a weighted nearness evaluation
f (x) = αg(x) + h(x),
where α ≥ 0.3 By varying α, this formulation allows finer control over the full spectrum between finding a solution quickly (α = 0), and finding an optimal solution (α = 1) [33, 34]. In the use of bi-directional solving with heuristic search, A∗ is similar to breadth-first search in fullness, whereas best-first search tends to be sparse, like depth-first search. Thus, bi-directional search is more useful for heuristic searches with larger values of α.
Limitations of Heuristic Search Although heuristic search is generally much more efficient than brute-force search, the feasibility of many informed searches is largely dependent on the quality of the heuristic information. It is easy to construct examples where most informed 3
Another common form for weighted heuristic evaluation is f (x) = wg(x)+(1−w)h(x). Our form sacrifices some generality in favor of simplicity, as we consider only the range 0 ≤ α ≤ 1 (equivalent to 0 ≤ w ≤ 0.5).
23
searches provide little or no improvement over na¨ıve, brute-force methods by incorporating substantial obstacles that are not predicted by the heuristic (see Figures 2.3 and 2.4).
2.3
Sample Discrete Planning Problems
In order to evaluate discrete planning algorithms, we need discrete planning problems to use as test-cases. Here, we present several discrete planning problems that we will use throughout this thesis.
2.3.1
Grid World
In the grid world, we consider an agent in a two-dimensional grid capable of moving to an adjacent cell at each step. Except where noted, we do not allow the agent to make diagonal moves. Each transition in the grid world has a cost of one. The grid world presents an opportunity to study the effects of using various simple distance metrics as heuristics, specifically the L1 -norm, L2 -norm, and L∞ -norm (also known as the sum norm or Manhattan distance, Euclidean norm or Euclidean distance, and max norm, respectively). These heuristics are defined as follows [14]: • L1 ((x1 , y1 ), (x2 , y2 )) = abs(x1 − x2 ) + abs(y1 − y2 ) • L2 ((x1 , y1 ), (x2 , y2 )) =
p
(x1 − x2 )2 + (y1 − y2 )2
• L∞ ((x1 , y1 ), (x2 , y2 )) = max(abs(x1 − x2 ), abs(y1 − y2 )) In Chapter 6 we present the results of our studies on the behavior of these heuristics in the grid world.
24
2.3.2
N -Puzzle
One of the simplest discrete planning problems is the N-Puzzle (where N is of the form N = k 2 − 1, for some integer k ≥ 2), which consists of a k × k grid containing tiles numbered from 1 to N and one blank space. The goal of the N -Puzzle is to begin with the tiles in a random configuration and to arrange them in numerical order (see Figure 2.5). At each step, it is possible to swap the positions of the empty space and one of the adjacent tiles. Since the 8-Puzzle version of this problem is easily solvable, with its very low branching factor and average optimal solution length of approximately 22,4 it provides a simple test-case for new discrete-space search algorithms. Since it scales to arbitrary sizes (15-Puzzles, 24-Puzzles, etc.), it is also useful as a test of performance on high-depth, low-branching-factor trees. In general, the number of states in the space for an N -Puzzle is (N + 1)!/2, which comes to approximately 105 in the 8-Puzzle, 1013 in the 15-Puzzle, and 1025 in the 24-Puzzle. For all of our experiments, we used the standard heuristic of the Manhattan distance from each tile’s position in the current board relative to the target board as a distance heuristic. This heuristic is both admissible, since it is obtained by the relaxed operator method, and a metric, making it well-suited to a variety of search algorithms and optimizations [34].
2.3.3
Knight-Swapping Puzzle
A more difficult puzzle is the Knight-Swapping Puzzle, which entails swapping the positions of knights on a k × k chess board (where k must be odd), using only valid knight moves. The set-up consists of filling all but the center square with knights, divided between black and white along a diagonal (see Figure 2.5). The 5 × 5 version of this problem is more difficult than the 8-Puzzle, having both a larger 4
based on the average of 1000 trials solved with A∗
25
N -Puzzle Randomized Start 5 4
Knight-Swapping Puzzle Goal
Start
2
6
1
2
3
8
7
4
5
6
3
1
7
8
v v v v v
v v v v f
v v f v f f f f f f f f f f
Goal f f f f f
f f f f v
f f v f v v v v v v v v v v
Figure 2.5: The N -Puzzle (N = 8) and Knight-Swapping Puzzle (5 × 5).
branching factor and an optimal solution length of 36,5 leading to much larger search trees. Scaled to 7 × 7 or 9 × 9, it provides a test of large spaces with higher branching factors than in the N -Puzzles. The number of states in a k × k Knight-Swapping Puzzle is given by the formula (k 2 )! , ((k 2 − 1)/2)!((k 2 − 1)/2)! which yields about 108 in the 5 × 5 version, 1015 in the 7 × 7 version, and 1025 in the 9 × 9 version. For this puzzle, all experiments were performed using a heuristic value obtained by summing the number of required knight moves (ignoring the presence of other knights) for each out-of-place knight to reach a destination position for its color that is not already occupied by another knight of the same color. Note that “out-of-place” and “destination position” are defined in terms of the target board, which allows the heuristic to find an estimated cost-to-go from a given board layout to any other layout, and not just to the goal configuration. This heuristic is admissible, and dominates the simplistic heuristic that counts the number of knights out-of-place (see Appendix A for proof). Although it is similar in concept to the idea of Manhattan distance for N -Puzzles, it is not a metric, violating both symmetry and, rarely, the triangle inequality. It could be made symmetrical by 5
based on solving with A∗
26
nnot move to theObstacles empty square ng With less there is a knight of the same
or adjacent to the empty square se 3 adds the further constraint that night Puzzle: night cannot move to the empty uare if doing so would leave nt thatknight a knight other without an adjacent pty square ight of the same color. of the same mpty square Figure 2.6: Added constraints on the Knight-Swapping Puzzle. The move on the left constraint that is not allowed since there is no white knight adjacent to the destination. The move the right is not allowed since it leaves the white knight marked with the red (grey) o the empty on ‘×’ without an adjacent white knight. ld leave an adjacent defining a new heuristic h (a, b) = h (b, a) = max(h(a, b), h(b, a)), however this would double the time required for the heuristic computation (already the most costly part or. 0
0
of informed search) without significantly improving the quality of the heuristic. In order to incorporate the idea of obstacles and dead-ends in our trials, we defined two additional constraints for the Knight-Swapping Puzzle (see Figure 2.6): • Constrained destination: A knight cannot move to the empty square if there is not already an ally (a knight of the same color) in one of the squares adjacent to the square. • Constrained source: A knight cannot move if it would leave behind a knight without an ally in one of the four squares adjacent to it. Usage of these additional constraints is noted in trials that incorporate them.
2.3.4
Grid World Air Traffic Control
In order to explore prioritized multi-agent planning, we discretized a version of the air traffic control problem described in [11] by placing it in a grid world. In this problem, we consider a number of airports located in a two-dimensional grid-world, 27
with each airplane taking off from one airport and traveling to its destination airport. We make the following restrictions on airplanes: • An airplane cannot occupy the same location as another airplane, unless that location is one of the four cells adjacent to an airport where one or both airplanes are taking off or landing (since different, special-purpose algorithms would be used for take-off and landing). • Airplanes cannot pass directly in front of or behind another airplane, where “in front of” and “behind” an airplane are defined by the locations of that airplane at times t + 1 and t − 1 respectively. • Airplanes cannot stop; they must move one cell every time step. The goal of the problem is to use prioritized planning to find paths subject to the problem constraints for a set of airplanes, each scheduled to take off at a set time and travel to its destination. In prioritized multi-agent planning, the agents’ paths are planned sequentially: once an agent’s path is planned, that path is immutable. The search tree used to plan the agent’s path is discarded, except for the solution path. That path then becomes an obstacle in the space-time (x, y, t) for all future agents [11]. In order to explore the added effects of static obstacles, we further allow the definition of “no-fly” zones, which act as obstacles for all agents, at all time steps.
28
Figure 2.7: Air traffic control in a grid world with six airports. The airports are denoted by empty squares, and each airplane is color-coded by its destination airport. The grey diamond-shaped regions indicate no-fly zones. In the lower panel, we show the remaining paths of all airplanes currently in the air.
29
Chapter 3 Sampling-Based Planning Background Before adapting sampling-based planning algorithms to discrete space, we review two algorithms—rapidly-exploring random trees and probabilistic roadmaps—in their original, continuous forms. In addition to presenting the algorithms, we explore properties that make them successful in high-dimensional motion planning, in order to better understand how to map the algorithms to discrete space.
3.1
Rapidly-Exploring Random Trees
Rapidly-exploring random trees (RRTs) are a probabilistic exploration method developed for searching the high-dimensional continuous spaces encountered in motion planning and control problems [23, 24, 26]. Later, they were adapted for use in hybrid-systems planning and control [5, 6, 11, 27].
30
sxstart xnear Q Q sQ Q s s QQsxnew s B B Bs
s
xrand
Figure 3.1: Construction of an RRT in continuous space. Adapted from [26].
3.1.1
The RRT Algorithm
The RRT construction algorithm begins with an initial configuration xstart as the tree. At each step, it selects a random configuration xrand from the configuration space, then finds the nearest configuration already in the tree, xnear , using some definition of nearness (often Euclidean distance). From xnear , it takes a step toward xrand , and adds that new configuration to the tree (see Figure 3.1). These steps are repeated until some xgoal (which may be a specific goal state or an element of a set of goal states) is reached, or until the tree has reached a certain size. In pseudo-code, the algorithm is as follows [24]: Algorithm 3.1: The RRT algorithm in continuous space. 1 GrowRRT(xstart ) { 2 T ree . i n i t (xstart ) 3 f o r n = 1 to N { 4 xrand = RandomConfiguration() 5 T ree . extend(xrand ) 6 } 7 }
1 RRT : : extend(xrand ) { 2 xnear = T ree . n e a r e s t N e i g h b o r (xrand ) 3 xnew = TakeStep(xnear , xrand , ) 4 i f (xnew ) { 5 T ree . a d d C o n f i g u r a t i o n (xnew ) 6 T ree . addEdge(xnear , xnew ) 7 } 8 }
31
The behavior of the TakeStep function depends on the type of planning problem. In holonomic planning, in which the agent has full freedom of motion, the step is generally a straight line of length (as in Figure 3.1). However, in the more difficult case of non-holonomic planning, in which the agent’s range of motion is limited, the TakeStep function must find an input to the agent’s equations of motion that yields an xnew as close as possible to xrand . In this case, is often defined as a ∆t for the equations of motion, rather than as a distance. For example, in finding an xnew in planning the path of a car with a maximum turning radius and acceleration, TakeStep would find a steering-wheel angle and gas-pedal force that would place the car as close to possible to xnew after some elapsed time ∆t= . In both cases, the TakeStep function is responsible for performing obstacle-avoidance checks to ensure that the path from xnear to xnew can be reached without collision. In some cases, there may be no obstacle-free path away from xnear (e.g., a car that has been boxed in), in which case no xnew is generated.
3.1.2
Properties
RRTs have been shown to be probabilistically complete, and have good space-filling properties in that their growth is biased toward the largest unexplored regions in the space [24, 30]. The reason for this bias toward unexplored areas is clear when the algorithm is examined in terms of Voronoi diagrams [30]. Figure 3.2 shows two sets of points, one generated pseudo-randomly, the other from a lattice, along with each point’s Voronoi region: the set of all points closer to that point than any other. Figure 3.3 shows an RRT in a two-dimensional space, as well as each node’s corresponding Voronoi region; the Voronoi regions of a tree give important indications of the degree to which it has explored the space. First, the size of the largest Voronoi region(s) gives us an indication of the dispersion of the points: how far, in the worst case, a 32
Figure 3.2: Voronoi regions of pseudo-random points (left) and lattice points (right). Reproduced from [25].
Figure 3.3: Voronoi regions of each node in an RRT at three stages of growth. Reproduced from [11].
point in the space can be from the tree. Second, the locations of large regions tell us where those worst-case points are, and thus which areas have been least thoroughly explored. Finally, the range of sizes in a Voronoi diagram tell us how evenly the space has been explored: if all of the Voronoi regions are roughly the same size then the coverage of the tree is essentially uniform, whereas large size disparities indicate that some regions have been explored much more thoroughly than others. The RRT algorithm leverages these properties of the Voronoi regions probabilistically: so long as the random points in the RRT algorithm are selected uniformly from the space, the largest Voronoi regions will be the most likely to
33
contain the target point [24, 30]. Since the selected region’s node is expanded, the tree will be biased toward expanding the edges of the tree that border on the largest unexplored regions—dramatically so in the early stages of growth (as seen in the first panel of Figure 3.3).
3.1.3
Path Planning Techniques
The RRT algorithm described above is presented as a general exploration of the space, but just as in the case of discrete search we can modify it to create a path-planning algorithm by performing goal-checking at each iteration, and storing nodes containing meta-data about each configuration in the tree. In the case of single-directional search, the goal must usually be defined as one or more regions, rather than a finite set of points, because equality testing is problematic in continuous space. If the goal is given as one or more points, rather than regions, the goal point(s) can be converted to regions by defining a tolerance that creates a small region around each point.
Biased Search Although the RRT algorithm as given is an undirected search, it can easily be adapted for directed search by introducing a bias in the random sampling. Periodically, rather than selecting qrand randomly, we can assign it some specific value. Usually this value is the goal configuration (or a point selected at random from a goal region), but it can also be a way-point configuration (if useful intermediate configurations are known in advance), or simply a point selected in some way other than a uniform random distribution (e.g., floor biasing in the stair-climber of [27, pp. 48-57]). In our implementations, we define a bias period p for our searches as follows: every p iterations of the RRT algorithm, we select a biased target configuration instead of generating it randomly. 34
Bi-Directional Search In problems with one unique goal configuration, an alternative to defining a goal region is to use bi-directional RRT search. Since bi-directional search requires a specific destination node from which to grow the second tree (or a small set of goal configurations if growing a small forest of goal trees is feasible), it is not suitable for all problems. Most notably, bi-directional search is impossible for optimization problems where knowing the goal configuration is equivalent to having a solution. Although planning problems with many goal configurations can be made to fit into a bi-directional framework by selecting one or more goals and discarding the others, this can have the effect of slowing planning by artificially restricting the possible solution paths (for example, see the Acrobot problem in [11, pp. 20-22]). There are a variety of ways to implement bi-directional RRT searches; here we consider one such method, from [11, 23, 26]. This algorithm alternates between growing the trees toward random points, and growing them toward configurations randomly selected from the opposite tree in order to encourage the trees to connect (RRT::extend is unchanged from Algorithm 3.1): Algorithm 3.2: A bi-directional RRT search algorithm. 1 GrowBiDirectionalRRTs(xstart ) { 2 T ree1 . i n i t (xstart ) 3 T ree2 . i n i t (xgoal ) 4 u n t i l I n t e r s e c t i o n (T ree1 , T ree2 ) { 5 xrand1 = RandomConfiguration() 6 T ree1 . extend(xrand1 ) 7 xrand2 = RandomConfiguration() 8 T ree2 . extend(xrand2 ) 9 xtree2 = T ree2 . randomNode() 10 T ree1 . extend(xtree2 ) 11 xtree1 = T ree1 . randomNode() 12 T ree2 . extend(xtree1 ) 13 } 14 }
35
Since RRTs are biased toward unexplored areas, they tend to create relatively full trees, so for those planning problems where bi-directional search is possible the benefits are often significant.
3.1.4
Hybrid RRTs
Although RRT research has focused primarily on continuous planning, recent research has shown that the RRT algorithm can be successfully applied to hybrid systems as well [5, 6, 11, 27]. A hybrid RRT algorithm was generated by extending the RRT’s state to include discrete state information as well as the continuous configuration, using a heuristic to account for differences in discrete state when finding the nearest neighbor, and adding the ability to change states as an action when the agent reaches a switching boundary. This hybrid RRT was then used to solve a variety of hybrid-systems planning and control problems. The promise of the hybrid RRT algorithm is encouraging for the prospects of a discretized RRT. The issues of discrete state switching and heuristic evaluation of state differences, which were introduced in the hybrid RRT, are central to our development of a fully discrete RRT (see Chapter 4).
3.2
Probabilistic Roadmaps
Probabilistic roadmaps (PRMs) constitute another sampling-based method for solving planning problems in high-dimensional spaces, especially multiple-query path planning, where we wish to plan many different paths in the same space [3, 16, 20, 25].
36
3.2.1
The PRM Algorithm
The PRM algorithm operates by repeatedly selecting a random configuration xrand , and adding it to a set of configurations. As each xrand is generated, the algorithm attempts to connect it to any other configurations in the set that are within some distance δ of xrand . The connections are made by a local planner, which in the simple case of holonomic planning is often simply a straight-line generator with obstacle checking. This process is repeated until the PRM reaches a fixed size, or until some level of graph connectivity is achieved. Algorithm 3.3: The PRM learning algorithm in continuous space. 1 BuildPRM() { 2 M ap . i n i t () 3 f o r n = 1 to N { 4 xrand = RandomConfiguration() 5 M ap . a d d C o n f i g u r a t i o n (xrand ) 6 M ap . c o n n e c t C o n f i g u r a t i o n (xrand ) 7 } 8 } 9 PRM: : c o n n e c t C o n f i g u r a t i o n (xrand ) { 10 f o r each xnear in M ap . f i n d N e i g h b o r s (xrand , δ) { 11 i f ( P l a n L o c a l l y (xnear , xrand )) 12 M ap . addEdge(xnear , xrand ) 13 } 14 }
3.2.2
Properties
Like RRTs, PRMs are probabilistically complete using uniform sampling [20]. Voronoi regions are useful in the analysis of PRMs as well, since PRMs implicitly leverage Voronoi regions probabilistically through the increased likelihood of selecting an xrand from a larger Voronoi region. Unlike RRTs, which expand out from one point, the nodes in a PRM follow the sampling distribution exactly, but generally do not create a connected graph until
37
some threshold point density is reached. Thus, whereas the area reachable from a node in an RRT grows steadily over time, the reachability from a node in a PRM changes very little during the beginning of the learning phase, but jumps suddenly, potentially in very large increments, as it is connected to other connected components in the graph.
3.2.3
Path Planning Techniques
Path planning with the PRM algorithm operates in two phases. In phase one, the learning phase, a roadmap is grown as described in Algorithm 3.3. Once the PRM has reached a sufficient level of coverage, the learning phase is complete. In phase two, the query phase, path planning queries are solved by using a local planner to connect qstart and qgoal to the PRM, then finding a path between the connection points through the pre-planned roadmap (using cost information stored for each edge, and a standard graph search such as Dijkstra’s shortest-path algorithm). Algorithm 3.4: The PRM planning algorithm in continuous space. 1 PRM: : planPath(xstart , xgoal ) { 2 xnear = M ap . n e a r e s t N e i g h b o r (xstart ) 3 P l a n L o c a l l y (xstart , xnear ) 4 xnear = M ap . n e a r e s t N e i g h b o r (xgoal ) 5 P l a n L o c a l l y (xnear , xgoal ) 6 }
As in the learning phase’s PRM::connectConfiguration, the connection to the PRM can be done with straight lines (in the holonomic case) or a more complex planner (if the agent is non-holonomic). One interesting possibility is the use of RRTs as local planners, which has shown promising results [3].
38
Chapter 4 Sampling-Based Planners for Discrete Spaces By applying the same sort of heuristic information used in general informed search methods, sampling-based planners can be adapted for use in discrete state spaces. Here, we describe a method for mapping sampling-based planning algorithms onto discrete-space search problems by substituting heuristics for metric functions, and invoking local planners instead of straight-line connectivity. We also explore difficulties encountered in the direct mapping of the RRT, and propose an alternate mapping that better preserves the intent of the original algorithm.
4.1
Discretized RRT Algorithm
As in the original RRT algorithm, the discrete RRT algorithm begins with an initial state qstart . At each step, it selects a random state qrand from the state space that is not already in the tree, then finds find the nearest state already in the tree1 , qnear , based on a heuristic estimate of distance between states. Considering each possible 1
In our implementation, we use a dedicated secondary data-structure (STL set) to allow efficient testing for whether or not a node is already in the tree.
39
operation from qnear , the algorithm selects the successor state qnew that is closest to qrand , and adds qnew to the tree with an edge to its parent, qnear . Algorithm 4.1: The RRT algorithm in discrete space. 1 GrowRRT(qstart ) { 2 T ree . i n i t (qstart ) 3 f o r n = 1 to N { 4 qrand = RandomUnexploredState() 5 T ree . extend(qrand ) 6 } 7 } 8 RRT : : extend(qrand ) { 9 qnear = T ree . n e a r e s t T r e e N o d e (qrand ) 10 i f (qnear . h a s U n s e e n S u c c e s s o r s ()) { 11 qnew = N e a r e s t S u c c e s s o r ( qrand , qnear ) 12 T ree . addChildNode ( qnew , qnear ) 13 } 14 }
The use of heuristic information as a replacement for the RRT's distance metric allows the discrete RRT to be applied to any problem that can be formulated as a discrete search. Since designing a heuristic and defining the operators that generate successors are already requirements of existing informed discrete searches, using the discrete RRT to solve a problem that has already been defined for popular search strategies like A∗ and its variants requires no work beyond the implementation of the simple RRT algorithm framework itself. In addition, this allows the discrete RRT algorithm to leverage the extensive research into informed search strategies, in order to benefit from improvements such as pattern databases and other domain-specific heuristic optimizations [12, 18, 19].

Besides using heuristic distance, the largest difference between construction of continuous and discrete RRTs is in the necessity of selecting from a finite set of possible operators when taking the step toward qrand . The fact that the number of possible successors to any state is finite leads to two difficulties: non-optimal steps, and failed steps.
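To make the preceding point concrete, the following is a minimal sketch, in C++ (our implementation uses the STL), of the problem interface such a discrete RRT can be written against; the names DiscreteProblem, successors, heuristic, and randomState are illustrative assumptions rather than details of our implementation.

    #include <vector>

    // A hypothetical problem interface: any domain that already supplies a
    // successor generator and a heuristic (as required by informed searches
    // such as A*) can be searched by the discrete RRT without further work.
    template <typename State>
    struct DiscreteProblem {
        virtual ~DiscreteProblem() = default;

        // All states reachable from s by one operator application.
        virtual std::vector<State> successors(const State& s) const = 0;

        // Heuristic estimate of the cost-to-go between two states; this plays
        // the role of the continuous RRT's distance metric.
        virtual double heuristic(const State& a, const State& b) const = 0;

        // A random state drawn from the state space.
        virtual State randomState() const = 0;
    };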
Figure 4.1: A non-optimal step in a grid-world RRT. The node labeled qnear is the closest to qrand (using the heuristic of L1 distance, ignoring obstacles). However, the successor of qp (qs ) is closer to qrand than the successor of qnear (qnew ).
Non-optimal steps occur when some state that is reachable in one step from the current RRT is closer to qrand than the qnew that is generated at that step. They occur because it is possible for the best successor of qnear to be no closer to qrand than qnear itself, or even to be further away, leaving the possibility that a node that is not the nearest neighbor could generate a better successor state (see Figure 4.1). The performance of the RRT in discrete space is degraded by this decreased bias toward unexplored areas when growing the tree randomly, and by the reduced ability of the tree to grow toward a goal in goal-biased exploration (as we demonstrate in Section 4.4.1).

Failed steps are those where no qnew is generated during the RRT::extend step, because qnear has no unexplored successors (see Figure 4.2). Although failed steps have no effect on the exploratory properties of the algorithm, they can significantly increase the running time of the algorithm if they occur frequently. In our implementation of the discrete RRT, we attempt to reduce the number of failed steps by storing a count of remaining successors at each node. This counter is updated whenever a node is selected as qnear (adding no extra work to the algorithm, since we must already generate the successors and test them against the list of states that have already been explored), so that when a node generates its last possible successor, or a node is selected as qnear and is found to have no unexplored successors, the counter is set to zero. Nodes that have a counter of zero are not considered when selecting the nearest neighbor.
Figure 4.2: A failed step in a grid-world RRT. Since qnear has no unexplored successors, no qnew is generated.
This method does not completely prevent failed steps, however, since in many cases (such as that illustrated in Figure 4.2) it is possible to generate the successors of a node as children of other nodes. Although these could be ruled out by searching for all possible parents of each new node (in order to maintain completely accurate counts), such searches would likely be more time consuming than the failed steps that they would prevent.
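As a rough illustration of this bookkeeping, the following C++ fragment sketches how such a remaining-successor count might be maintained; the names (Node, updateRemaining) and the use of an int stand-in for a state are assumptions made for the example, not our exact implementation.

    #include <unordered_set>
    #include <vector>

    struct Node {
        int state;                     // stand-in for a domain state
        Node* parent = nullptr;
        int remainingSuccessors = -1;  // -1: not yet counted, 0: exhausted
    };

    // Called whenever `node` is selected as qnear: the successors must be
    // generated anyway to choose qnew, so counting the unexplored ones adds
    // no extra work. A node whose count reaches zero is skipped by later
    // nearest-neighbor queries.
    int updateRemaining(Node& node,
                        const std::vector<int>& successors,
                        const std::unordered_set<int>& explored) {
        int remaining = 0;
        for (int s : successors)
            if (!explored.count(s)) ++remaining;
        node.remainingSuccessors = remaining;
        return remaining;
    }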
4.1.1
Path Planning
The discrete RRT algorithm can be converted to a path planning algorithm simply by storing the same sort of extra information, such as cost and path, at each node, as described in Section 2.2.1 (unlike the continuous RRT, however, the discrete RRT does not require the definition of goal regions). Often, planning can be done more easily by using biased trees or bi-directional search, just as in the continuous RRT algorithm (see Section 3.1.3).
4.2
Rapidly-Exploring Random Leafy Trees
A new algorithm that mitigates the issues we identified in the discrete RRT is the rapidly-exploring random leafy tree (RRLT). The RRLT algorithm keeps a list of all states reachable in one step from the current tree: the "leaves" of the tree nodes.
Figure 4.3: A step in growing an 8-Puzzle tree using the RRT and RRLT algorithms. The bold states are the nearest states to qrand in the tree, shown with their potential children (RRLT leaves). The numbers associated with each state are the heuristic distances to qrand . Notice that qnew−RRT is heuristically further from qrand than qnew−RRLT is (cf. Figure 4.1).
When qrand is selected, the nearest leaf is located directly and added to the tree, and all of its successors are added to the open list. The complete RRLT algorithm is as follows:

Algorithm 4.2: The RRLT algorithm.

    GrowRRLT(qstart) {
        Tree.init(qstart)
        for n = 1 to N {
            qrand = RandomUnexploredState()
            Tree.extend(qrand)
        }
    }

    RRLT::extend(qrand) {
        qnew = Tree.nearestTreeLeaf(qrand)
        Tree.changeLeafToNode(qnew)
        Tree.addNewLeaves(qnew)
    }
Since qnear = qnew is selected from the list of leaf states, and not from among the states already in the tree, the algorithm guarantees that at each step the tree grows as far as possible (heuristically) toward qrand , essentially providing a one-step look-ahead in the RRT growth (see Figure 4.3; note that the nearest neighbor in an RRLT is always a leaf, since a node already in the tree cannot be added again). Although this algorithm is more space-intensive than the simple discrete RRT algorithm, in our experience it is generally faster and finds better solutions (as shown in Section 4.4). The RRLT algorithm also fits much more cleanly into the generalized search algorithm framework discussed in Section 2.2.1 than does the RRT algorithm. In the RRLT algorithm, we simply place the leaves in the open list, and order the list by nearness to a randomly selected node before selecting the first node from the list to add to the tree. In the simple discrete RRT algorithm, on the other hand, we must consider the open list as containing the children of the current tree nodes even though they have not been explicitly generated by the RRT algorithm. Those children are ordered first by their parents' nearness to a randomly selected node, then by their own nearness when tie-breaking is necessary. Within this framework it is clear that the RRLT's update rule is more natural than that of the discrete RRT, and more closely parallels that of other discrete search algorithms. In both cases, however, the ordering is redefined at every step, which prevents the information reuse that makes algorithms such as A∗ significantly faster for simple problems. Note that some information reuse is possible in the RRLT algorithm; specifically, an A∗ -like ordering can be preserved in biased search (see Section 7.2.2).
4.2.1
Path Planning Optimizations
When using RRLTs for path planning (rather than simple exploration), using the leaves of the tree effectively can result in faster planning, as well as smaller trees. In single-directional search, we check the leaves as they are generated to see if any of
them is a goal configuration, and if so we add it as a tree node immediately, rather than waiting until the node is explored at a later iteration. This can significantly reduce the size of the tree as compared to an RRT search (in biased search with one goal, however, it can never reduce the number of additional RRLT::extend steps by more than the bias period). Similarly, the leaves can be used to speed the joining of trees in bi-directional search. Rather than waiting for the nodes of the tree to intersect, we check for leaf intersection, and if the leaves of the trees intersect (or if the leaves of one tree intersect the nodes of the other) we can immediately grow both trees to that node. This again reduces the depth to which the RRLTs must reach before they intersect, thus lowering their time and space requirements in solving planning problems.
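A minimal C++ sketch of such a leaf-intersection test is given below; TreeSets, leafIntersection, and the use of int states are assumptions made for the example rather than details of our implementation.

    #include <optional>
    #include <unordered_set>

    struct TreeSets {
        std::unordered_set<int> nodes;   // states already in the tree
        std::unordered_set<int> leaves;  // states reachable in one step
    };

    // Returns a state at which two trees can be joined immediately: either
    // both trees have it as a leaf, or a leaf of one tree is already a node
    // of the other. Returns nullopt if the trees do not yet touch.
    std::optional<int> leafIntersection(const TreeSets& a, const TreeSets& b) {
        for (int s : a.leaves)
            if (b.leaves.count(s) || b.nodes.count(s)) return s;
        for (int s : b.leaves)
            if (a.nodes.count(s)) return s;
        return std::nullopt;
    }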
4.3
Discretized PRM Algorithm
The PRM algorithm in discrete space is identical to its continuous counterpart, except that we again replace the distance metric used to define the connection radius δ with a heuristic estimate of the cost-to-go. We use a standard discrete search method such as A∗ or best-first search as a local planner in PlanLocally.

Algorithm 4.3: The PRM learning algorithm in discrete space.

    BuildPRM() {
        Map.init()
        for n = 1 to N {
            qrand = RandomState()
            Map.addState(qrand)
            Map.connectStates(qrand)
        }
    }

    PRM::connectStates(qrand) {
        for each qnear in Map.findNeighbors(qrand, δ) {
            if (PlanLocally(qnear, qrand))
                Map.addEdge(qnear, qrand)
        }
    }
4.3.1
Path Planning
Path planning in the discrete PRM is again unchanged from continuous space. In the learning phase we can store (as part of the edge) the full path returned by the local planner during a connection step, which allows the path to be reconstructed instantly. If memory is a concern, we can instead store only the edge costs, and reconstruct the paths between roadmap nodes as necessary using the same local planner [20].

Algorithm 4.4: The PRM planning algorithm in discrete space.

    PRM::PlanPath(qstart, qgoal) {
        qnear = Map.nearestNeighbor(qstart)
        PlanLocally(qstart, qnear)
        qnear = Map.nearestNeighbor(qgoal)
        PlanLocally(qnear, qgoal)
    }
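A small C++ sketch of the two edge-storage options just described; the RoadmapEdge name and its fields are illustrative assumptions, not our implementation.

    #include <vector>

    // Storing the local planner's full path makes query-time reconstruction
    // instantaneous; leaving cachedPath empty and keeping only the cost trades
    // that speed for memory, re-running the local planner when a path is needed.
    struct RoadmapEdge {
        int from;
        int to;
        double cost;                  // always stored
        std::vector<int> cachedPath;  // optional: full path between from and to
    };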
4.4
Discrete Sampling-Based Planning Results
In order to evaluate our new discrete-space planning algorithms, we explored their performance on various discrete planning problems from Section 2.3. Note that as we consider only single-query path planning, results for the discrete PRM algorithm are not reported; we consider the PRM in Chapter 6.
4.4.1
RRT vs. RRLT Comparison
Space-Filling Results

In order to quantify the degradation of exploration using the simple discrete RRT algorithm, we grew RRTs and RRLTs in the 8-Puzzle space to various percentages of the full space. We then measured the distance both heuristically and exactly from every state in the space to the nearest neighbor in the tree, and subsequently computed the percent improvement of the RRLT algorithm's average over that of the RRT (see Table 4.1).
    Percent    Heuristic Distance                 Actual Distance
    Filled     RRT     RRLT    Improvement        RRT     RRLT    Improvement
    5%         3.94    3.90    1.0%               5.97    5.95    0.4%
    10%        3.23    3.18    1.5%               4.45    4.42    0.7%
    15%        2.79    2.74    1.9%               3.57    3.51    1.6%
    25%        2.13    2.06    3.3%               2.48    2.39    3.7%
    50%        1.01    0.93    8.4%               1.05    0.95    9.5%
    75%        0.36    0.31    13.4%              0.36    0.31    13.9%

Table 4.1: Space-filling comparison of RRTs and RRLTs. Improvements are for the RRLT over the RRT.
Our results suggest that even in a small space with a very low branching factor, the RRLT algorithm gives an improvement over the simple discrete RRT algorithm in terms of rapid exploration. Although the differences are small, the average distances never varied more than 0.4% across our three trials (and most varied no more than 0.1%), giving us confidence that these results are not simply due to random variation.
Biased Search Comparison

We can also quantify the improvement offered by the RRLT algorithm in preserving the idea of "direction", by measuring the effectiveness of biased growth. Since the goal of biased growth is to shape the tree toward a particular node or set of nodes, its success will depend heavily on how well the search algorithm can define "toward". Table 4.2 shows the results of solving several versions of the N -Puzzle and Knight-Swapping Puzzle (without the additional constraints), each averaged over 100 trials. In all cases, the RRLT algorithm not only reached the goal state with a substantially smaller tree, it also found shorter solution paths, both of which indicate a better discrete representation of steps "toward" target states. Note that the tree size does not consider the added space complexity of storing the RRLT leaves; here, we are interested in how large the tree itself must be to find a solution.
    Bias Period        Nodes                            Solution Cost
                       RRT      RRLT    Improvement     RRT     RRLT    Improvement
    5 × 5 KSP:
      1                114      78.3    31.3%           50.7    48.9    3.55%
      2                212      145     31.6%           51.6    49.1    4.84%
      4                392      257     34.4%           52.1    49.6    4.80%
      10               858      547     36.2%           51.8    50.7    2.12%
    7 × 7 KSP:
      1                448      240     46.6%           145     136     6.21%
      2                751      430     42.7%           139     133     4.32%
      4                1410     761     46.0%           137     130     5.11%
      10               3230     1670    48.3%           137     131     4.38%
    8-Puzzle:
      1                388      259     33.2%           57.0    56.0    1.75%
      2                666      401     39.8%           53.0    49.2    7.17%
      4                880      581     34.0%           45.6    43.0    5.70%
      10               1740     1250    28.2%           40.8    38.9    4.66%
    15-Puzzle:
      1                3610     2010    44.3%           215     210     2.38%
      2                6120     3710    39.4%           208     204     1.92%
      4                13500    6410    52.5%           214     191     10.7%

Table 4.2: Goal-biased search in the Knight-Swapping Puzzle (KSP) and N -Puzzle.
4.4.2
N -Puzzle Results
Since the N -Puzzle graph is disconnected, with only half of the states being reachable from the start state, care should be taken in selecting random states for the sampling-based planners. Fortunately, the reachability of a state can be quickly and easily tested by checking for a board invariant [4, 35, 36]: in [4, 36], a method is described for testing the reachability of N -Puzzles with even side lengths using an invariant consisting of the number of inversions in a left-to-right, top-to-bottom reading of the tiles, plus the current row of the empty space, and it is easy to show, using the same method as [4], that the number of inversions alone forms an invariant for N -Puzzles with odd-length sides. We use this method to ensure that randomly selected nodes are drawn only from the reachable half of the space.
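The following C++ fragment sketches this invariant test under the assumptions stated above (0 denotes the blank, and the board is read left-to-right, top-to-bottom); the function name is illustrative.

    #include <cstddef>
    #include <vector>

    // Parity of the board invariant: the inversion count for odd side lengths,
    // plus the blank's row index for even side lengths. Two configurations of
    // the same puzzle are mutually reachable exactly when their parities match.
    int invariantParity(const std::vector<int>& tiles, int side) {
        int invariant = 0;
        for (std::size_t i = 0; i < tiles.size(); ++i)
            for (std::size_t j = i + 1; j < tiles.size(); ++j)
                if (tiles[i] != 0 && tiles[j] != 0 && tiles[i] > tiles[j])
                    ++invariant;   // count inversions, ignoring the blank
        if (side % 2 == 0)
            for (std::size_t i = 0; i < tiles.size(); ++i)
                if (tiles[i] == 0)
                    invariant += static_cast<int>(i) / side;   // blank's row
        return invariant % 2;
    }

    // A candidate qrand is accepted only if
    //   invariantParity(qrand, N) == invariantParity(qstart, N).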
In Tables 4.3 and 4.4, we compare the RRT and RRLT algorithms to existing heuristic search algorithms, in both single-directional and bi-directional forms (averaged over 500 trials each for the 8-Puzzle and 100 each for the 15-Puzzle). In each case, although the best-first search is significantly faster, the RRT and RRLT algorithms find much better solutions while exploring a comparable number of nodes. Although the A∗ algorithm is clearly better in simple problems, the 15-Puzzle results show that it very quickly becomes extremely space- and time-intensive.
    8-Puzzle:
    Algorithm              Nodes    Leaves   Solution Cost   Time (s)
    A∗                     1070     1090     21.7            0.100
    Best-first search      275      231      63.2            0.020
    RRT
      Bias Period 10       1930     —        41.2            1.40
      Bias Period 5        1110     —        43.8            0.460
      Bias Period 2        590      —        49.7            0.116
    RRLT
      Bias Period 10       1250     892      40.3            0.438
      Bias Period 5        724      270      42.4            0.182
      Bias Period 2        381      518      47.9            0.088

Table 4.3: Single-directional search comparison in the N -Puzzle.
    Algorithm              Nodes    Leaves   Solution Cost   Time (s)
    8-Puzzle:
      A∗                   600      667      24.0            0.015
      Best-first search    269      243      48.0            0.010
      RRT                  380      —        31.0            0.161
      RRLT                 299      217      30.2            0.117
    15-Puzzle:
      A∗                   341000   516000   56.6            31.8
      Best-first search    3960     4400     202             0.110
      RRT                  7820     —        106             26.0
      RRLT                 5700     6160     101             6.01

Table 4.4: Bi-directional search comparison in the N -Puzzle.
4.4.3
Knight-Swapping Puzzle Results
Tables 4.5 through 4.8 show similar experiments for the Knight-Swapping Puzzle (averaged over 500 trials each for the 5 × 5 case and 100 each for the 7 × 7 case). Again, our results show that the A∗ algorithm becomes impractical very quickly (the ∞ symbol indicates cases where the A∗ algorithm began continuous memory paging on our test machine, and ran for several hours without finding a solution). Although Tables 4.5 and 4.6 show that a biased RRLT is nearly competitive with best-first search in many cases, it is Tables 4.7 and 4.8 that highlight the real strength of the RRT and RRLT algorithms. While the best-first search suffers significantly from the reduction in the quality of the heuristic as constraints are added (see especially the 7 × 7 results in Table 4.8), the RRT and RRLT continue to provide relatively good solutions with substantially fewer nodes explored.
    5 × 5:
    Algorithm              Nodes     Leaves    Solution Cost   Time (s)
    A∗                     2100000   4790000   36.0            70.6
    Best-first search      88.9      230       51.6            0.004
    RRT
      Bias Period 10       858       —         52.3            1.49
      Bias Period 5        392       —         52.1            0.29
      Bias Period 2        212       —         51.6            0.09
    RRLT
      Bias Period 10       547       1160      50.7            0.41
      Bias Period 5        257       540       49.6            0.21
      Bias Period 2        144       301       49.1            0.03

Table 4.5: Single-directional search comparison in the Knight-Swapping Puzzle.
    Algorithm              Nodes    Leaves   Solution Cost   Time (s)
    5 × 5:
      A∗                   24200    66100    36.0            0.656
      Best-first search    153      410      48.2            0.002
      RRT                  406      —        51.6            0.176
      RRLT                 249      525      47.3            0.162
    7 × 7:
      A∗                   ∞        ∞        ∞               ∞
      Best-first search    458      1680     124             0.05
      RRT                  2320     —        145             15.5
      RRLT                 1170     3880     130             11.6

Table 4.6: Bi-directional search comparison in the Knight-Swapping Puzzle.
    5 × 5:
    Algorithm              Nodes     Leaves    Solution Cost   Time (s)
    A∗                     1450000   1160000   46.0            42.6
    Best-first search      290       281       25.3            0.10
    RRT
      Bias Period 5        1960      —         87.9            7.13
      Bias Period 2        457       —         17.9            1.54
    RRLT
      Bias Period 5        1310      1450      86.1            2.27
      Bias Period 2        274       292       17.7            0.655

Table 4.7: Single-directional search comparison in the Knight-Swapping Puzzle with added constraints.
    Algorithm              Nodes    Leaves   Solution Cost   Time (s)
    5 × 5:
      A∗                   14800    23000    46.0            0.55
      Best-first search    445      616      82.7            0.01
      RRT                  749      —        71.2            0.498
      RRLT                 510      579      66.6            0.376
    7 × 7:
      A∗                   ∞        ∞        ∞               ∞
      Best-first search    131000   194000   270             16.8
      RRT                  5910     —        198             156
      RRLT                 3210     7380     175             102

Table 4.8: Bi-directional search comparison in the Knight-Swapping Puzzle with added constraints.
4.4.4
Grid World Results
The grid world is an ideal test problem for comparing properties of discrete sampling-based planners with their continuous counterparts, since it is a direct discretization of two-dimensional continuous sampling-based planning, which has already been studied [20, 23–27, 30]. The grid world also presents an opportunity to study the effects of using various simple distance metrics as heuristics, such as the L1 -, L2 -, and L∞ -norms [14]. See Chapter 6 for the results of our studies on the properties of discrete sampling-based planners in grid world planning problems.
4.4.5
Discrete Air Traffic Control Results
For our air traffic control problems, we used six airports in two different grid-world setups. In our first example, we used a grid world where x and y ranged from −50 to 50, with the airports located near the four corners of the space and in the middle of the vertical sides, and three no-fly zones in the center of the space (see Figure 4.4). Our second example used a larger space, where x and y ranged from −100 to 100, with the airports in the same relative locations, but with more extensive no-fly zones (see Figure 4.5).
Because we are not expressly interested in the time at which the airplanes reach their destinations, we simplify our planning by growing search trees in a two-dimensional space, with the time component defined by the cost of the node (since cost is equivalent to the time to reach a node along the path to it in the search tree), rather than by growing the tree in a three-dimensional space that includes time explicitly. This avoids issues such as exploration backward through time, at the minor cost of preventing an airplane from doubling back on itself. In problems where the ability to enter a holding pattern is important, a more relaxed three-dimensional approach would be necessary (cf. [11, pp. 26-28]). Tables 4.9 and 4.10 show the results of planning for both problems using four different search algorithms, each averaged over 100 trials. In the easier problem (Table 4.9), the RRLT compares poorly with best-first search, since simple gradient descent is almost ideal for most paths. Although best-first search is still faster in the more difficult problem (Table 4.10), we see the same trend as above in the N -Puzzle and Knight-Swapping Puzzle: the increase in time for the standard heuristic search (almost eight times for best-first search, and four times for A∗ ) is much larger than that of the RRLTs (four and two times, for the 50% and 20% biases respectively). Thus we expect that, as in the other puzzles, the problem would at some point become large enough and complex enough that the RRLT algorithm would outperform both A∗ and best-first search.
Figure 4.4: A sequence from the results of planning 400 airplanes flying between 6 airports, with 4 airplanes taking off at each time-step. Top: snapshots of the airplanes’ positions at three different times. Bottom: corresponding remaining paths for all airplanes currently in the air, as planned by a 20% biased RRLT.
    Algorithm            Average Solution Cost   Time (s)
    A∗                   100                     19
    Best-first search    107                     3.4
    RRLT, 50% Bias       108                     11
    RRLT, 20% Bias       111                     26

Table 4.9: Solution time and average path length for solving the air traffic control problem of Figure 4.4. Paths for 300 airplanes were planned, with 4 airplanes taking off at each time-step.
Figure 4.5: A sequence from the results of planning 100 airplanes flying between 6 airports in a more difficult environment than that of Figure 4.4, with 4 airplanes taking off at each time-step. Top: snapshots of the airplanes' positions at three different times. Bottom: corresponding remaining paths for all airplanes currently in the air, as planned by a 20% biased RRLT.

    Algorithm            Average Solution Cost   Time (s)
    A∗                   226                     81
    Best-first search    238                     26
    RRLT, 50% Bias       271                     41
    RRLT, 20% Bias       274                     50

Table 4.10: Solution time and average path length for solving the air traffic control problem of Figure 4.5. Paths for 100 airplanes were planned, with 4 airplanes taking off at each time-step.
Chapter 5

Discrete Sampling-Based Planners: Improvements and Variations

Having established that the RRT and PRM algorithms can be modified and augmented for use in discrete space, we suggest further improvements and modifications to the discrete RRT algorithm. Most importantly, we provide nearest-neighbor search optimization techniques and introduce the meta-tree, a new sampling-based planning algorithm inspired by the discrete RRT and PRM algorithms. In addition, we explore RRT variations such as cost-optimized trees, k-growth trees, and semi-leafy trees.
5.1
RRT and RRLT Nearest-Neighbor Search Improvements
The feasibility of sampling-based planning algorithms in large discrete problems is strongly dependent on finding efficient means of locating nearest neighbors, as the repeated computation of heuristic distances is the most costly step in the algorithm. A direct search, which is linear in the size of the tree, is not efficient for even
moderately sized problems. Although structures exist to improve the efficiency of nearest-neighbor searches in continuous, n-dimensional spaces [1, 2, 37], it is not clear that these algorithms can be effectively applied to dynamically constructed discrete-space sampling-based planners using cost-to-go heuristics. Here, we propose optimizations for nearest-neighbor queries in discrete RRLTs.

The naïve nearest-neighbor search for RRLTs examines all leaves in the tree, requiring a heuristic evaluation of every leaf at each iteration of the RRLT algorithm (recall that we need only consider leaves in RRLT nearest-neighbor queries, since duplicate nodes cannot be added to the tree). In RRLTs that use admissible metric heuristics, a simple method for reducing the complexity of nearest-neighbor search is to prune using auxiliary information. First, we store each node's ∆leaf , the cost from that node to its maximally distant child leaf (i.e., unexplored successor). Then, in nearest-neighbor searches, the algorithm need only consider the leaves that are children of nodes whose heuristic distance to qrand minus ∆leaf is less than the heuristic distance of the nearest leaf found so far. These are the nodes whose children could potentially be better than the best possible leaf seen so far.

This one-step pruning can be extended to leverage the existing tree structure of the RRLT by also storing ∆subtree , the distance to the maximally distant leaf in each node's entire subtree (where distance is measured along the path to that leaf in the tree, and not by the heuristic). This allows us to compute the best-case cost of any leaf in a node's subtree as the estimated distance from the node to qrand minus the distance to its furthest leaf, ∆subtree . Therefore, any node for which the heuristic cost for this best-case leaf is greater than that of the best leaf seen so far during the search can be pruned, along with its entire subtree.

Using these best-case costs as heuristics, we can define the nearest-neighbor query as a discrete search where the space is the tree itself, and the goal set consists of all leaves in the tree. By searching this space for nearest neighbors using a
best-first search (here, best-first search is equivalent to A∗ search, since the tree cost has no bearing on nearness to qrand , and thus we can define g(x) to be 0 for all nodes), we can be guaranteed to find the nearest neighbor. Thus, the speed of the search can be increased considerably by pruning large sections of the tree, without sacrificing optimality. This leaf-based pruning method is summarized as Algorithm 5.1:

Algorithm 5.1: Nearest-neighbor search with leaf-based pruning.

     1  RRLT::nearestNeighbor(qrand) {
     2      qbest = null
     3      best_cost = ∞
     4      q = Tree.root
     5      Open.initPriorityQueue(⟨q, HeuristicDistance(q, qrand) − q.∆subtree⟩)
     6      while (Open.isNotEmpty()) {
     7          ⟨q, c⟩ = Open.pop()
     8          if (c > best_cost) break
     9          for each leaf in q.leaves {
    10              leaf_cost = HeuristicDistance(leaf, qrand)
    11              if (leaf_cost < best_cost) {
    12                  qbest = leaf
    13                  best_cost = leaf_cost
    14              }
    15          }
    16          for each child in q.children {
    17              ideal_subtree_cost = HeuristicDistance(child, qrand) − child.∆subtree
    18              if (ideal_subtree_cost < best_cost)
    19                  Open.push(⟨child, ideal_subtree_cost⟩)
    20          }
    21      }
    22      return qbest
    23  }
The Open list in this algorithm is defined as a priority queue ordered by the best-case subtree leaf cost, c, of each node, q, in order to search the nodes in a best-first order relative to their potential for yielding a nearest neighbor. This allows the algorithm to stop searching as soon as the front of Open is a node whose best-case leaf is worse than the nearest leaf found so far (see line 8). Assuming that the heuristic used is both admissible and a metric, we can prove that this algorithm is guaranteed not to prune the heuristically nearest leaf, as follows.
Figure 5.1: Pruning in nearest-neighbor searches. A step of the pruning algorithm, currently considering node q. If the heuristic is metric and admissible, then the estimated distances x, y, and z and the path-length ∆subtree must satisfy z − x ≤ y and x ≤ ∆subtree , so y ≥ z − ∆subtree .
Let x be the heuristic distance from q to its maximally distant descendant leaf, y be the heuristic distance from that leaf to qrand , z be the heuristic distance from q to qrand (see Figure 5.1), and best_cost be the heuristic distance to qrand from the best leaf seen so far during the search. We know that x ≥ z − y (by the triangle inequality), and that ∆subtree ≥ x (since the heuristic is admissible), so it must be true that ∆subtree ≥ z − y, and therefore y ≥ z − ∆subtree . Thus, if z − ∆subtree > best_cost (i.e., if the algorithm would prune the subtree), then it must also be true that y > best_cost. The same argument applies to every other leaf in the subtree, since each lies at a tree-path distance of at most ∆subtree from q; therefore no leaf in a pruned subtree could be a nearest neighbor. (Note that the nearest neighbor here is defined in terms of the heuristic; although the true nearest neighbor could potentially be pruned by this algorithm, it could only be pruned if it were not the heuristically nearest, and thus would not have been selected regardless of pruning.) Although no such guarantee can be made for non-metric heuristics, there is hope that this method, or a variation on it, could serve as an effective approximate nearest-neighbor algorithm in many problems with admissible but non-metric heuristics. This is reasonable because the pruning is guaranteed to be conservative in the case of an admissible heuristic, due to the fact that the actual path cost, ∆subtree , is likely to be larger than the heuristic distance x to the maximally distant leaf.

A more conservative pruning method than the above leaf-based method is to find every node which could possibly be the parent of the best leaf, then search those nodes' children with an uninformed search, rather than find the best leaf
directly. In this node-based pruning method, a subtree is only pruned if the parent's z − ∆subtree value is greater than the estimated distance to the nearest node encountered so far plus that node's ∆leaf (i.e., if the best case for the subtree under consideration is worse than the worst-case child leaf of the best node seen so far). This method is summarized as Algorithm 5.2:

Algorithm 5.2: Nearest-neighbor search with node-based pruning.

    RRLT::nearestNeighbor(qrand) {
        ParentList.init()
        best_worst_cost = ∞
        q = Tree.root
        Open.initPriorityQueue(⟨q, HeuristicDistance(q, qrand) − q.∆subtree⟩)
        while (Open.isNotEmpty()) {
            ⟨q, c⟩ = Open.pop()
            if (c > best_worst_cost) break
            best_case_leaf_cost = c + q.∆subtree − q.∆leaf
            worst_case_leaf_cost = c + q.∆subtree + q.∆leaf
            if (best_case_leaf_cost ≤ best_worst_cost)
                ParentList.push(q)
            if (worst_case_leaf_cost < best_worst_cost) {
                best_worst_cost = worst_case_leaf_cost
                ParentList.removeWorseThan(best_worst_cost)
            }
            for each child in q.children {
                ideal_subtree_cost = HeuristicDistance(child, qrand) − child.∆subtree
                if (ideal_subtree_cost < best_worst_cost)
                    Open.push(⟨child, ideal_subtree_cost⟩)
            }
        }
        qbest = null
        best_cost = ∞
        for each q in ParentList {
            for each leaf in q.leaves {
                leaf_cost = HeuristicDistance(leaf, qrand)
                if (leaf_cost < best_cost) {
                    qbest = leaf
                    best_cost = leaf_cost
                }
            }
        }
        return qbest
    }
At each step of the algorithm, ParentList contains all the nodes which could potentially have a successor that is at least as good as the best worst-case leaf that the algorithm has found so far. Since the best worst-case leaf puts an upper bound on the heuristic distance of the nearest neighbor, subtrees can be pruned if they could not possibly do better than that bound. This algorithm is more conservative than the leaf-based method, since the upper bound is more relaxed, and it therefore prunes fewer subtrees. However, the node-based method can potentially be faster in problems with large branching factors, since the number of leaves evaluated is limited by considering only those which have a possibility of being the nearest leaf. Note that while a similar pruning technique could be applied to the discrete RRT algorithm in order to locate the nearest node, it increases the chance of a failed step, as subtrees might be pruned in favor of a node that turns out not to have any successors.
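Both pruning methods rely on each node caching its ∆leaf and ∆subtree values. A minimal C++ sketch of how those caches might be maintained as the tree grows is given below; the structure and names are illustrative assumptions, and leaving a stale (over-estimated) ∆ after a leaf is promoted to a node is safe because it only makes the pruning more conservative.

    #include <algorithm>
    #include <vector>

    struct RrltNode {
        RrltNode* parent = nullptr;
        double costFromParent = 0.0;        // edge cost from parent to this node
        double deltaLeaf = 0.0;             // cost to the most distant child leaf
        double deltaSubtree = 0.0;          // tree-path cost to the most distant leaf in the subtree
        std::vector<double> leafEdgeCosts;  // edge costs to the node's unexplored successors
    };

    // Called after `node` has been added to the tree and its leaves generated:
    // refresh its own bounds, then propagate the subtree bound up to the root.
    void updateDeltas(RrltNode& node) {
        for (double c : node.leafEdgeCosts)
            node.deltaLeaf = std::max(node.deltaLeaf, c);
        node.deltaSubtree = std::max(node.deltaSubtree, node.deltaLeaf);

        double pathCost = node.costFromParent;  // tree-path cost from the current ancestor down to `node`
        for (RrltNode* a = node.parent; a != nullptr; a = a->parent) {
            a->deltaSubtree = std::max(a->deltaSubtree, pathCost + node.deltaSubtree);
            pathCost += a->costFromParent;
        }
    }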
5.1.1
Results
Using pruning during the nearest-neighbor queries, as described above, gave substantial improvements over the naïve linear nearest-neighbor search. Figures 5.2 and 5.3 show the percentage of nodes and leaves evaluated at each step of growth during nearest-neighbor search using leaf- and node-based pruning in RRLTs solving the 24-Puzzle and 9 × 9 Knight-Swapping Puzzle, respectively. Table 5.1 shows the effects of different pruning methods in solving several versions of the N -Puzzle and Knight-Swapping Puzzle without added constraints (averaged over 500 trials each). Both puzzles show a dramatic reduction in running time using nearest-neighbor pruning, even in the very simple variants. Interestingly, the N -Puzzle favors leaf-based pruning, whereas the Knight-Swapping Puzzle performed better using node-based pruning. This demonstrates that neither method is clearly better, and thus the type of pruning should be chosen based on the specific problem domain.
Figure 5.2: Leaf-based (left) and node-based (right) pruning in a bi-directional 24-Puzzle RRLT. Red (dark grey) points indicate the percentage of nodes evaluated, and green (light grey) points indicate the percentage of leaves evaluated. The two distinct bands in the node evaluations are due to the two different kinds of target nodes generated during bi-directional search; the upper band corresponds to random nodes, whereas the lower band corresponds to nodes selected from the opposite tree.

Figure 5.3: Leaf-based (left) and node-based (right) pruning in a single-directional 9 × 9 Knight-Swapping Puzzle RRLT. Red (dark grey) points indicate the percentage of nodes evaluated, and green (light grey) points indicate the percentage of leaves evaluated.
    Problem        Pruning       Nodes   Leaves   Time (s)
    8-Puzzle:      None          294     216      0.314
                   Node-Based    295     220      0.264
                   Leaf-Based    295     216      0.194
    15-Puzzle:     None          5730    6210     214
                   Node-Based    5690    6160     42.0
                   Leaf-Based    5400    5850     30.4
    5 × 5 KSP:     None          251     536      0.180
                   Node-Based    252     537      0.072
                   Leaf-Based    256     540      0.095
    7 × 7 KSP:     None          1120    3680     20.1
                   Node-Based    1110    3700     3.16
                   Leaf-Based    1110    3660     5.23

Table 5.1: Comparison of pruning methods in the N -Puzzle and Knight-Swapping Puzzle (KSP).
5.2
RRTs With Local Planners
In continuous space, the step size ε of the RRT algorithm can be varied to change the characteristics of the search. In general, a very small ε results in a tree that explores slowly, but fills the space more completely and gives shorter solution paths (see Figure 5.4). Although we are constrained in discrete space to take steps between adjacent states, it is possible to achieve a similar effect by using the RRT as a global planner, with a different planner used for local planning. In this variation, at each step, instead of picking the nearest leaf to the randomly selected node, we perform a search with a local planner (limited by depth, size, and/or time) from the nearest node (or leaf in an RRLT), qnear , toward qrand . When the limit of the local search is reached, the node closest to qrand is added to the tree, along with the nodes forming the path to reach it. Here, the degree to which the local search is limited plays a role similar to the ε parameter in the continuous RRT. The most significant difference is that the limit on the local planner has a significant impact on the running time of each step of the algorithm.
Figure 5.4: Left: Two continuous RRTs with different values of ε (ε2 = 2ε1 ) after the same sequence of random nodes. Right: Two grid-world RRTs with different local search restrictions (depth 1, top, and depth 2, bottom) after the same sequence of random nodes. Start nodes are indicated by a double circle.
Whereas a large ε has only a linear effect on the continuous RRT's iterative running time (assuming incremental collision-checking), relaxing the limit on a local A∗ search could cause the running time of a discrete RRT to increase exponentially.

Because the distance heuristic is based on discrete states, it is not uncommon to have multiple states with equal heuristic distance from qrand . Since the goal of using the local planner is to maximize exploration at each step, we choose the node from among the ties for nearest by selecting for maximal cost from qnear . Although this may result in higher solution-path lengths than selecting the minimal cost, or selecting randomly, intuition suggests that the maximal cost should correspond to the maximal distance from qnear , assuming a relatively optimal local search. Since the cost from qnear is an actual cost, whereas the heuristic distance to qrand is estimated, it seems reasonable to assume that considering the actual cost as well as the heuristic cost would give a more reliable estimate of distance toward qrand than the heuristic alone.
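A short C++ sketch of this tie-breaking rule follows; LocalNode, selectBest, and the flat vector of candidates are assumptions made for illustration.

    #include <limits>
    #include <vector>

    struct LocalNode {
        double hToRand;    // heuristic distance to qrand
        double gFromNear;  // actual cost accumulated from qnear in the local tree
    };

    // Pick the local-search node heuristically nearest to qrand, breaking ties
    // in favor of the largest actual cost from qnear (the node that has moved
    // farthest). Returns the index of the node to return to the global planner.
    int selectBest(const std::vector<LocalNode>& nodes) {
        int best = -1;
        double bestH = std::numeric_limits<double>::infinity();
        double bestG = -1.0;
        for (int i = 0; i < static_cast<int>(nodes.size()); ++i) {
            const LocalNode& n = nodes[i];
            if (n.hToRand < bestH || (n.hToRand == bestH && n.gFromNear > bestG)) {
                best = i;
                bestH = n.hToRand;
                bestG = n.gFromNear;
            }
        }
        return best;
    }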
Figure 5.5: Left: Two continuous RRTs with and without intermediate nodes after the same sequence of random nodes. Right: Two grid-world RRTs with a local search of depth 4, with and without intermediate nodes, after the same sequence of random nodes. Start nodes are indicated by a double circle.
5.2.1
Meta-Trees
We can further improve the parallel between the global planner and the continuous ε value by drawing on the techniques used in PRMs, and building a meta-tree from the local planner results that considers the path to the new node as a unit rather than as a sequence of nodes in the tree. Rather than storing the entire sequence of nodes returned by the local planner, we add the endpoint of the path to the tree directly as a child of qnear , and store any desired information about the path as additional information associated with the edge between them (see Figure 5.5). The advantage of this method lies in the fact that the intermediate nodes contribute very little to the overall exploration of the space, since they are, by definition, only one step away from at least two other states. This can be seen in the very small Voronoi regions of internal nodes in an RRT (as seen in Figures 3.3 and 6.1–6.3). Thus, these nodes increase the nearest-neighbor query time without adding significantly to the coverage of the tree, especially in cases where we are primarily interested in exploring quickly; by discarding them, we cause the tree to explore more rapidly (cf. RRT pruning in [28]).
Figure 5.6: Meta vs. non-meta global planners in the 9 × 9 Knight-Swapping Puzzle. The graphs show tree size, time to find a solution, and solution path length as functions of the size of the local search tree.
Figure 5.6 shows results of both meta and non-meta global planners (using A∗ as a local planner) in the 9 × 9 Knight-Swapping Puzzle with added constraints, averaged over 100 trials. Although the meta-planner returns more costly solutions, it is significantly faster and occupies substantially less memory.
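The bookkeeping this implies can be sketched in C++ as follows; MetaEdge, MetaTree, and their fields are hypothetical names for illustration, not our implementation.

    #include <utility>
    #include <vector>

    // Only the endpoint of each local plan becomes a meta-tree node; the
    // intermediate states live on the edge, where they can be replayed when a
    // solution path is reconstructed but never enter nearest-neighbor queries.
    struct MetaEdge {
        int parent;                  // index of qnear in the meta-tree
        double cost;                 // total cost of the local plan
        std::vector<int> localPath;  // intermediate states along the plan
    };

    struct MetaTree {
        std::vector<int> states;      // meta-nodes (endpoints of local plans)
        std::vector<MetaEdge> edges;  // edges[i] connects states[i] to its parent

        void addMetaNode(int endpoint, int parentIndex,
                         double cost, std::vector<int> path) {
            states.push_back(endpoint);
            edges.push_back({parentIndex, cost, std::move(path)});
        }
    };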
Figure 5.7: A piece of a grid world meta-RRLT during the expansion step. Since the leaves (empty circles) are not enough to detect the obstacle, a leaf from the left-hand node will be selected as qnear , despite the fact that a search starting at a leaf from the right-hand node would result in a significantly better qnew due to the obstacle.
5.2.2
RRLTs vs. RRTs as Global Planners
When using local planners, especially if using relatively large local searches, it is less clear whether using an RRLT as the global planner offers a significant advantage over an RRT. Whereas the regular RRLT algorithm offers an entire step of look-ahead in the tree growth as compared to the RRT, the global RRLT with a local planner offers only 1/d of a step of look-ahead, where d is the average depth of the local planner, while still requiring the same increase in memory and nearest-neighbor search. Since the full-step look-ahead is lost, the RRLT global planner no longer provides a guarantee that the tree grows as far as possible (heuristically) at each step (see Figure 5.7). In order to restore that guarantee in a tree with a local planner, we would have to consider the leaves of a node to be the states reachable by the entire local planning tree. While that would restore the one-step optimality property, it would require a memory increase on the order of m^d (where m is the average branching factor in the discrete space), and a corresponding drop in the performance of nearest-neighbor queries. While pruning, such as that described in Section 5.1, could mitigate the performance costs, the cost/benefit ratio of such a modification seems much lower than that of the original RRLT algorithm.
Figure 5.8: A piece of another grid world meta-RRLT during the expansion step. Here, the meta-RRLT algorithm will select a leaf from the node on the right, whereas a meta-RRT would select the node against the corner of the obstacle, which can only grow away from qrand .
However, despite its reduced effectiveness when used as a global planner, the global RRLT algorithm still has potential benefits over the simple RRT algorithm. First, since it starts the local search one node further in the direction of the target node, it essentially has the effect of increasing the local search depth without actually making the local search tree larger. Second, the leaves continue to act as “feelers”, allowing advance detection of goal states, which can be especially important in trees without goal biasing. Lastly, there are some cases, such as the one shown in Figure 5.8, where even a partial-step look-ahead is helpful. Since the RRLT algorithm eliminates nodes as candidate nearest-neighbors if they have no leaves, the RRLT algorithm can help to avoid selecting a node as nearest-neighbor if that node is against an obstacle lying between it and the target node. This can reduce the number of local searches that grow predominantly backward into regions already explored by the tree, which can be helpful in spaces with a high number of obstacles (or restrictive dynamics that effectively act as obstacles).
5.2.3
Best-First Search as a Local Planner
One interpretation of the straight-line step, in translating it to discrete space, is that a straight line is the simplest method of moving directly
toward qrand , which suggests a greedy local planning algorithm: a depth- and node-limited best-first search. In the extreme interpretation, one could have a local best-first search with no depth limit, in which case the local search would tend to have very few branches, thus approximating a growth that is (heuristically) “straight” toward the goal.
5.2.4
A∗ as a Local Planner
An alternate interpretation of the continuous straight-line step focuses instead on the fact that a straight line is the shortest path to the point ε units away in the direction of the randomly selected point. Under this interpretation, a depth- and/or node-limited A∗ search is a logical choice for use as a local planner, as it guarantees the shortest path in the minimum number of nodes. While this interpretation tends to give shorter overall problem solution paths than a greedy method, it yields less expansion at each step than many other algorithms. The primary advantage of the A∗ local search over the best-first method, besides reduced solution lengths, is better tolerance of obstacles. Whereas the best-first search is likely to fare poorly in spaces with many obstacles, the A∗ local search will find paths around them. Here "obstacles" can refer either to literal obstructions, as in the grid world, or simply to dynamics of the discrete space that are poorly predicted by the heuristic.
5.2.5
Generalized Heuristic Local Planners
As in traditional discrete search, the local search can be adjusted to a variety of balances of speed and solution length using the weighted nearness evaluation f(x) = αg(x) + h(x), where α ≥ 0, for the local search tree (see Section 2.2.3). Figure 5.9 shows the effect of different values of α on the solving time and solution path length in a 9 × 9 Knight-Swapping Puzzle with constrained source and destination, using bi-directional meta-RRTs and meta-RRLTs, demonstrating the speed–optimality trade-off.
Figure 5.9: Effects of varying the heuristic of local search trees limited to size 50 in bi-directional meta-RRT and meta-RRLT trees for the 9 × 9 Knight-Swapping Puzzle. The graphs show the changes in average time to find a solution (left) and average solution cost (right) as the weight of g(x) in the nearness heuristic is varied.

These graphs also show the trade-off between the RRT and RRLT algorithms as global planners in this problem domain. Although the RRT algorithm is slightly slower, especially for more optimal local searches, it finds slightly shorter solutions. Figure 5.10 considers the effects of varying the local search tree size for a variety of heuristic weightings, again in the 9 × 9 Knight-Swapping Puzzle with constrained source and destination. The trends as local tree size increases depend strongly on the cost weighting in the local search heuristic, with the general pattern being that the global trees resemble their local planners more and more as local tree size increases. For example, meta-trees with A∗ (α = 1) local searches become much more time consuming, but the paths they plan become more optimal, whereas meta-trees with best-first (α = 0) local searches have decreasing solution quality, but require very little extra time. Especially interesting to note is that a small weighting, such as 0.25, is enough to stabilize solution costs, but incurs essentially no penalty in solving times, which suggests that even in cases where rapid exploration is the primary focus, a small weight on the g(x) component of local search is likely to be beneficial.
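As a concrete illustration (a sketch only, with assumed names), the local search's open list can be ordered by this weighted nearness in C++ as follows:

    #include <queue>
    #include <vector>

    struct OpenEntry {
        int state;
        double g;  // cost so far from qnear
        double h;  // heuristic distance to qrand
    };

    // Orders the local open list by f(x) = alpha*g(x) + h(x): alpha = 0 gives
    // greedy best-first behavior toward qrand, alpha = 1 gives A*-like ordering,
    // and intermediate values trade solution quality against expansion speed.
    struct WeightedNearness {
        double alpha;
        bool operator()(const OpenEntry& a, const OpenEntry& b) const {
            return alpha * a.g + a.h > alpha * b.g + b.h;  // min-heap on f
        }
    };

    using LocalOpenList =
        std::priority_queue<OpenEntry, std::vector<OpenEntry>, WeightedNearness>;

    // Example: LocalOpenList open(WeightedNearness{0.25});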
Figure 5.10: Effects of varying the local heuristic and local search tree size in bi-directional meta-RRLT trees for the 9 × 9 Knight-Swapping Puzzle. The graphs show the changes in average time to find a solution (left) and average solution cost (right). Note that the line α = 0 is not visible in the solution time graph because it is covered by the line α = 0.25.
5.2.6
Importance of Randomization in Local Planners
Regardless of the algorithm used as a local planner, it is vital that ties are broken in a random way. Since heuristic nearest-neighbor queries in discrete space are highly likely to result in ties (see Section 6.1 for further discussion), any local planner will need to break ties somehow in order to pick a “best” node to return to the meta-planner, except in the case where the local planner finds a path all the way to the random target node. Many discrete search algorithms, such as best-first search and A∗ , have essentially similar behavior as stand-alone algorithms whether deterministic or randomized. Specifically, although their running time, memory usage, and exact solutions will depend on randomization, their fundamental properties (e.g., optimality for A∗ ) remain the same. However, the meta-tree often makes use of the local search tree in an unfinished state, at which point determinism in the algorithm will have notable impact on the “shape” of the local tree. This intermediate bias can significantly degrade the properties of the meta-tree. In the example of Figure 5.11, meta-RRLTs using a local A∗ planner with search depths of 0 and 1 (thus step sizes of 1 and 2, since the local tree search
begins at a leaf) are grown in an L1 -spherical grid world of radius 100, which mirrors the movement of the agent in the space. The pattern of tree exploration in the version with a randomized local A∗ search is symmetrical, and purely a function of the heuristic and the system dynamics (see Section 6.2 for details). However, the deterministic version clearly shows the effect of biasing on the growth of the RRLT, exploring a few small areas intently while leaving other regions virtually unexplored. The biasing is far worse when using a regular discrete RRT as a meta-planner; unlike the RRLT, which introduces an element of randomness by breaking ties randomly in the nearest-neighbor query, a meta-RRT with a deterministic planner is completely controlled by the deterministic bias. It is important to note that while in this example it is sufficient to break ties at the stage of deciding the best of the generated nodes to return to the meta-planner, that is not generally the case. Unless the stopping condition of the local tree is decided purely by depth, a bias during tree growth will affect the state of the tree at the stopping point, and thus influence which nodes have been generated as possible best nodes. In general, the local planner must break ties randomly at any point where more than one node is a candidate for expansion at the next step of the local search in order to avoid biasing the meta-planner.
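A minimal C++ sketch of such randomized tie-breaking at an expansion step is shown below; popRandomBest and the flat vector of f-values are illustrative assumptions.

    #include <limits>
    #include <random>
    #include <vector>

    // Among all open-list entries tied for the best evaluation, choose one
    // uniformly at random, so that determinism in the local planner cannot
    // bias the shape of a (possibly unfinished) local tree. Assumes fValues
    // is non-empty.
    int popRandomBest(const std::vector<double>& fValues, std::mt19937& rng) {
        std::vector<int> best;
        double bestF = std::numeric_limits<double>::infinity();
        for (int i = 0; i < static_cast<int>(fValues.size()); ++i) {
            if (fValues[i] < bestF) { bestF = fValues[i]; best.clear(); }
            if (fValues[i] == bestF) best.push_back(i);
        }
        std::uniform_int_distribution<int> pick(0, static_cast<int>(best.size()) - 1);
        return best[pick(rng)];  // index of the entry to expand next
    }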
Figure 5.11: Visualizations of the frequency of coverage in 1000 trials of meta-trees grown to size 1000 in an L1 -spherical grid world of radius 100, using A∗ as a local planner with the L1 -norm heuristic. The histograms compare randomized A∗ (left) and deterministic A∗ (right) with step sizes of 2 (top) and 3 (bottom).
5.3
Cost-Optimized RRTs
Although the RRT algorithm is primarily concerned with finding solutions quickly, we generally prefer low-cost solutions over high-cost solutions when all other factors are equal. In many cases, even a slight reduction in solution speed is acceptable if it results in noticeably improved solutions. Such a trade-off in the RRT algorithm is possible by modifying the nearness heuristic. Instead of considering exclusively the estimated cost-to-go, h(x), we use the weighted nearness evaluation described in Section 2.2.3, defining "nearness" in terms of the value f(x) = αg(x) + h(x). Although this departs from the parallel with the continuous tree, a non-zero value of α allows the search to be biased toward low-cost solutions by encouraging growth that moves out from the edges of the tree, rather than back toward its center. Although any positive value of α is possible in creating semi-optimal RRTs, we found that only small values are practical. For example, adopting α = 1 will create an algorithm that always generates optimal solutions, but does so in a vastly inefficient way when compared with A∗ or breadth-first search due to the need to perform nearest-neighbor queries at each iteration. Table 5.2 shows the results of varying α in several N -Puzzles and Knight-Swapping Puzzles, averaged over 500 trials each, with the exception of the 7 × 7 Knight-Swapping Puzzle with α = 0.5, which is averaged over only 25 trials (although we cannot be sure of the accuracy of such a small sample, the fact that even the best case of those 25 solutions took an order of magnitude longer than with α = 0.1, and most took two or three orders of magnitude longer, suggests that the trend evidenced by this data is correct). In each case, a small α improved solution costs noticeably, while having relatively little impact on solution time. Large values of α, however, can increase running time dramatically in harder problems.
    Problem        α Value   Nodes   Leaves   Solution Cost   Time (s)
    8-Puzzle:      0         290     212      29.4            0.024
                   0.1       305     220      26.4            0.024
                   0.25      320     228      24.4            0.028
                   0.5       354     256      23.2            0.034
    15-Puzzle:     0         5390    5820     101             5.32
                   0.1       6920    7410     82.6            7.56
                   0.25      8280    8750     70.0            11.5
                   0.5       15900   16200    59.6            59.1
    5 × 5 KSP:     0         244     525      48.3            0.0600
                   0.1       273     563      41.9            0.0740
                   0.25      325     665      40.7            0.104
                   0.5       363     734      37.3            0.150
    7 × 7 KSP:     0         1110    3660     136             3.05
                   0.1       1740    5660     113             6.04
                   0.25      2900    17500    108             15.1
                   0.5       60000   200000   96              9000

Table 5.2: Cost-optimized N -Puzzle and Knight-Swapping Puzzle (KSP) RRLTs.
5.4
k -Growth RRLTs
In problem spaces with many obstacles, or where the restrictions on state transitions cause branches of the tree that appear promising to end or "veer" unexpectedly, it is useful to have a fuller tree, and to approach any new region from several directions at once in case some of the paths prove unfruitful. The RRLT can easily be made to explore in this manner with little impact on the running time of each iteration. The nearest-neighbor query can be easily extended to find the k nearest leaves, or the k nearest leaves with different parents (if we wish to encourage exploration by diverse branches), with very little increase in iterative running time. Although it takes little extra work to find the k nearest neighbors, the overall running time of the algorithm will be greater, since this will increase the size of the tree at the ith iteration to ki, and therefore increase the running time of the nearest-neighbor query at the ith iteration relative to the standard RRLT. However, in problems where, due to obstacle constraints, a relatively full tree will need to be grown eventually, this algorithm will grow the tree with fewer overall nearest-neighbor queries.

Figure 5.12 shows the results of k-growth bi-directional RRLTs in the 7 × 7 and 9 × 9 Knight-Swapping Puzzles with added constraints (see Section 2.3.3), both using node-based nearest-neighbor pruning. The results (averaged over 1000 trials in the 7 × 7 version and over 250 trials in the 9 × 9 version) show that while increasing the value of k does increase the size of the tree, the increase is very small relative to k, which indicates that the added nodes are contributing to the exploration of the space. Both experiments also show a decrease in average solution cost, as we would expect from fuller trees. Although the time results are less clear, they both seem to indicate a decrease in running time for small values of k.
Figure 5.12: Effects of different values of k in the 7 × 7 (left) and 9 × 9 (right) Knight-Swapping Puzzles with both added constraints, in bi-directional RRLTs. The graphs show the effect on tree size (top), solving time (center), and solution length (bottom).
The usefulness of this technique depends to a large degree on the assumption that the nodes added will be spread about the space somewhat, giving an advantage in terms of approaching the target node from several directions in case one path dead-ends. In order to ensure that the nodes are not clustered in a small area (and thus give little increase in the coverage of the space), it may be helpful to use some measure of nearness among candidate nodes to filter their selection as nodes to grow. Assuming k is relatively small, the k(k − 1)/2 added pairwise comparisons should have little effect on the running time of the algorithm. Two possibilities suggest themselves as methods of filtering by nearness:

• Heuristic Distance: discard a candidate node if it is within some δ of a node already chosen as one of the k (a sketch of this filter follows the list).

• Common Ancestry: discard a candidate node if one of its n nearest ancestors is also one of the n nearest ancestors of one of the k.

Exploration of the effectiveness of these methods in various problem domains remains for future work.
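The heuristic-distance filter could be sketched in C++ as follows; selectKSpread, Candidate, and the function-pointer heuristic are assumptions made for the example.

    #include <algorithm>
    #include <vector>

    struct Candidate {
        int leaf;
        double distToRand;  // heuristic distance to qrand
    };

    // Consider candidate leaves in order of nearness to qrand, discarding any
    // that lie within delta (heuristically) of a leaf already chosen, so that
    // the k grown leaves are spread about the space.
    std::vector<int> selectKSpread(std::vector<Candidate> candidates, int k,
                                   double delta, double (*heuristic)(int, int)) {
        std::sort(candidates.begin(), candidates.end(),
                  [](const Candidate& a, const Candidate& b) {
                      return a.distToRand < b.distToRand;
                  });
        std::vector<int> chosen;
        for (const Candidate& c : candidates) {
            if (static_cast<int>(chosen.size()) == k) break;
            bool tooClose = false;
            for (int picked : chosen)
                if (heuristic(c.leaf, picked) < delta) { tooClose = true; break; }
            if (!tooClose) chosen.push_back(c.leaf);
        }
        return chosen;  // up to k leaves to add to the tree this iteration
    }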
5.5
Semi-Leafy RRTs
One method of reducing the added memory requirements of the RRLT algorithm is to create a "semi-leafy" tree, which creates leaves only on demand. Using the node-based pruning method described above, it is simple to modify the RRLT algorithm to add nodes to the tree without expanding their leaves, then to expand leaves only during the nearest-neighbor queries, and only from nodes that are candidates for having the nearest-neighbor leaf. In order to avoid duplicate work, these leaves are saved once generated, as creating successors is often a costly step. While this algorithm does reduce the leaf-to-node ratio of the tree, it suffers from several drawbacks. First, it creates the possibility of failed nearest-neighbor
queries, in the case that the candidate parent node or nodes turn out not to have any possible successors. Unlike the standard pruning, which relies on a predetermined subtree leaf count, nearest-neighbor pruning in a semi-leafy tree cannot prune subtrees that have no possibility of expansion unless they have been explored in a previous query. In this case, the nearest-neighbor query must be run again, potentially several times, before a neighbor is found. Second, both single- and bi-directional search suffer from the loss of exterior leaves to use as “feelers” to detect solutions before the tree itself reaches them. Thus, while there are fewer leaves in a final semi-leafy tree as compared to a full RRLT, there are more nodes in the tree, which again means an increase in the number of nearest-neighbor queries. The effects of these issues are visible in Table 5.3, which compares the semi-leafy tree to the RRLT in the N -Puzzle and Knight-Swapping Puzzle without additional constraints. Results are for leaf-based pruning in the N -Puzzle and node-based pruning in the Knight-Swapping Puzzle, and are averaged over 1000 trials each.
Problem        Nodes            Leaves           Solution Cost    Time (s)
               RRSLT   RRLT     RRSLT   RRLT     RRSLT   RRLT     RRSLT   RRLT
5 × 5 KSP      302     256      416     548      49.7    48.4     0.204   0.168
7 × 7 KSP      1270    1130     2600    3900     138     136      8.45    5.45
8-Puzzle       331     259      144     217      30.9    30.2     0.133   0.117
15-Puzzle      6050    5690     4082    6150     105     103      65.0    14.1
Table 5.3: Comparison of the semi-leafy tree (RRSLT) and RRLT in the N -Puzzle and Knight-Swapping Puzzle (KSP).
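For concreteness, the on-demand leaf creation at the heart of the semi-leafy variant amounts to a small caching wrapper around successor generation. The sketch below is a minimal illustration under assumed names (it is not the thesis code); the state type must be hashable and the successor generator is any callable.

#include <unordered_map>
#include <vector>

// Leaves are generated only the first time a node becomes a candidate parent
// during a nearest-neighbor query, then cached so that the (often costly)
// successor generation is never repeated for that node.
template <typename State, typename Successors>
class SemiLeafyCache {
public:
    explicit SemiLeafyCache(Successors succ) : succ_(succ) {}

    const std::vector<State>& leavesOf(const State& node) {
        auto it = cache_.find(node);
        if (it == cache_.end())
            it = cache_.emplace(node, succ_(node)).first;  // expand on demand
        return it->second;
    }

private:
    Successors succ_;
    std::unordered_map<State, std::vector<State>> cache_;
};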
Chapter 6
Sampling-Based Planning Properties in Discrete Space
In order to better understand the differences inherent in discrete RRTs relative to their continuous counterparts, we examine discrete RRTs in the grid world of Chapter 2 with respect to several fundamental properties of the tree growth, including coverage, optimality, and the behavior of Voronoi regions.
6.1 Overlapping Voronoi Regions and Tie-Breaking
The fundamental difference between Voronoi regions in discrete and continuous spaces is the introduction of overlap in discrete space. Although the definition of a Voronoi region guarantees non-overlapping regions in continuous space, discrete-space heuristics generally cannot guarantee that there will not be ties. It is of course possible to modify any heuristic to prevent ties by introducing arbitrary tie-breaking rules, but doing so will introduce an equally arbitrary bias in the exploration of the RRT algorithm, which is generally undesirable.
The extent to which Voronoi regions overlap depends strongly on the space, the heuristic used, and the degree to which the tree has explored the space, as we will show in Section 6.2. Since such overlap is highly likely to exist in some form, however, some decision must be made with regard to handling ties in the nearest-neighbor search. The simplest solution, and the one used in our experiments (except where noted), is to break ties by selecting one of the candidate neighbors at random. However, depending on the problem domain and the desired behavior of the algorithm, a variety of other solutions are possible. Using the cost-so-far of ties as a tie breaker can either emphasize optimality (by favoring low-cost nodes), or exploration (by favoring high-cost nodes, as in the case of our generalized heuristic local planners for meta-trees). Another possibility is a second-pass heuristic evaluation of tied nodes; if we assume that the number of ties for nearest-neighbor is generally a small number regardless of tree size, we can use a much more costly heuristic to break ties with minimal impact on the complexity of the algorithm.
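The tie-handling options above are easy to express as a small selection step after the nearest-neighbor scan. The sketch below is illustrative only; the Candidate fields and the random-number plumbing are assumed names, not the thesis implementation.

#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// A tied nearest-neighbor candidate: same heuristic value, possibly a
// different path cost from the root (cost-so-far).
struct Candidate {
    int id;            // index of the tree node
    double costSoFar;  // g(x) of that node
};

enum class TiePolicy { Random, PreferLowCost, PreferHighCost };

// Pick one node from a non-empty set of ties for nearest neighbor.
int breakTie(const std::vector<Candidate>& ties, TiePolicy policy, std::mt19937& rng) {
    switch (policy) {
    case TiePolicy::PreferLowCost:   // emphasizes optimality
        return std::min_element(ties.begin(), ties.end(),
            [](const Candidate& a, const Candidate& b) { return a.costSoFar < b.costSoFar; })->id;
    case TiePolicy::PreferHighCost:  // emphasizes exploration
        return std::max_element(ties.begin(), ties.end(),
            [](const Candidate& a, const Candidate& b) { return a.costSoFar < b.costSoFar; })->id;
    case TiePolicy::Random:
    default: {
        std::uniform_int_distribution<std::size_t> pick(0, ties.size() - 1);
        return ties[pick(rng)].id;
    }
    }
}

A second-pass evaluation with a more expensive heuristic would simply replace the comparison used inside the selection step.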
6.2 Coverage and Heuristic–Dynamic Interactions
In order to explore the effect of different heuristics on the behavior of RRTs, we conducted trials of RRT growth in a grid world using the L1, L2, and L∞ metrics as distance estimates. Recall that the agent in the grid world moves according to the L1 metric: at each step, the possible moves are up, down, left, and right. To begin, we created visualizations of the trees and their discrete Voronoi regions (see Figures 6.1, 6.2, and 6.3, which are highly typical of the patterns generated by each heuristic function). The tree is shown using thick blue (dark grey) lines, and the edges of Voronoi regions with thin red (light grey) lines. Cells with red (grey) shading denote areas of Voronoi-region overlap, with darker shading
indicating more ties (especially evident in Figure 6.3). The differences in Voronoi patterns are striking, and have a significant impact on the behavior of the tree, as we will show below. The first important point to note is that while both the L1 and L∞ heuristics tend to leave four large, regular regions (the four diagonal triangles and the four diamonds on the cardinal directions, respectively), the L2 heuristic breaks the space into irregular and less predictable regions. The more random placement of these regions, as well as the fact that the regions in the L2 tree are more uniform in size, suggests that from the standpoint of a Voronoi region interpretation, the L2 heuristic has significantly better properties for the grid-world RRT in terms of rapid, and more uniform, exploration of space. Examining the structure of the Voronoi regions in the trees with the L1 and L∞ heuristics allows us to predict specific weaknesses in those trees. In each case, the persistence of the four large regions results in a strong bias toward those regions. Since the regions remain essentially symmetrical as they shrink during tree growth, the main trunks of the trees remain fairly linear, resulting in generally X-shaped L1 trees, and +-shaped L∞ trees. Also, the presence of large Voronoi regions at the edge of the L1 tree as compared to the other two trees indicates less rapid coverage of the region. These effects can be clearly seen in Figure 6.4, which shows histograms from 1000 trials of RRT growth using each heuristic at several stages of growth. Cells with darker shading are those that appear as nodes in more of the 1000 trees. The X- and +-shaped biases are clearly evident, as is the slower exploration of the region’s frontier in the L1 tree. Other, more subtle effects are visible as well, such as the L2 tree’s better coverage at the edge of the region, as well as the most uniform coverage of the space in general. Also noticeable are the off-bias regions near the center of the tree, which are poorly sampled, especially by the L∞ metric.
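For reference, the three distance estimates over grid cells are straightforward to compute. The sketch below is a minimal illustration in the spirit of these experiments; the Cell struct is an assumed name.

#include <algorithm>
#include <cmath>
#include <cstdlib>

struct Cell { int x, y; };  // a grid-world state (illustrative name)

// L1 (Manhattan): matches the agent's up/down/left/right dynamics exactly.
double l1(Cell a, Cell b)   { return std::abs(a.x - b.x) + std::abs(a.y - b.y); }

// L2 (Euclidean): straight-line distance, ignoring the move set.
double l2(Cell a, Cell b)   { return std::hypot(a.x - b.x, a.y - b.y); }

// L-infinity (Chebyshev): the larger of the two coordinate differences.
double linf(Cell a, Cell b) { return std::max(std::abs(a.x - b.x), std::abs(a.y - b.y)); }

Only the L1 estimate agrees exactly with the agent dynamics; the coverage differences discussed here arise from how the other two estimates disagree with those dynamics.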
Also interesting to note in Figure 6.4 is the interaction between the heuristic biases and the dynamics of the agent itself. The slight +-shaped bias in the L2 tree is largely due to the fact that the agent moves in a +-shaped pattern, thereby seeding the tree in a slightly biased way during the initial moves. In the case of the L∞ tree, the agent dynamics and the Voronoi bias magnify each other, resulting in the sharp, well-defined lines along the bias axes. The L1 tree shows the opposite effect; growth in the bias directions is approximated by selecting one of several moves at a 45-degree angle at each step. The resulting randomized zigzagging gives noticeably fuzzier regions along the bias axes. The differences in Figure 6.5, which shows the coverage information for the same trees except for the addition of diagonal moves, demonstrate that all the effects just described are indeed results of the interaction between the Voronoi regions and the non-diagonal agent dynamics. Interactions between heuristics and agent dynamics are not restricted to the two-dimensional grid world. We conducted similar experiments in a three-dimensional grid world and observed the same trends in both coverage and optimality. For example, Figure 6.6 shows slices of coverage by RRTs using the L∞ metric grown to size 3000, and averaged over 500 trials (with only the positive octant shown, since the space is symmetrical). The same sorts of biases along axes and dead areas along diagonals that occurred in the two-dimensional L∞ experiments are clearly visible. The interaction between Voronoi regions and agent dynamics can be largely eliminated by using meta-trees1 or PRMs; since there is a much broader range of allowed movement in algorithms that use local planners, the relative positions of nodes are not as predictable, leading to less predictable Voronoi regions. Figure 6.7 shows the Voronoi regions for meta-trees and PRMs, with search depths (meta-tree) and connection radii (PRM) of 5 and 10. In all but the case of the radius 5 PRM,
1 Because our grid-world experiments contain no obstacles, meta-RRTs and meta-RRLTs behave identically in terms of coverage and optimality, and can therefore be used interchangeably.
there are no strong patterns in the Voronoi regions, despite the use of an L1 heuristic. The radius 5 PRM highlights a significant drawback of the PRM algorithm; until enough samples are taken that the point density passes some threshold, the PRM is largely disconnected, and thus the coverage of the space relative to some start state (here we use the center of the space, as in the meta-tree) expands very little. The coverage histograms of Figure 6.11, which shows the results of 1000 trials of a meta-tree (depth 5) and PRM (connection radius 5) grown to 3000 nodes each, demonstrate the coverage resulting from the unstructured Voronoi regions of Figure 6.7. The meta-tree shows only a slight pattern near the center from its early growth, and the PRM shows completely smooth coverage (with the exception of the start node). Figure 6.8 demonstrates another effect of the PRM’s dependence on connecting points. Narrow corridors present a well-known difficulty for the PRM algorithm. If the corridor extends farther than the connection radius of the PRM, the two halves of the PRM can only become connected if one or more points exactly inside the corridor are chosen during the sampling [25]. The meta-tree, on the other hand, is able to explore the corridor so long as points are selected whose nearest neighbors are near the opening on the meta-tree’s side of the corridor, defining a large possible sampling area. Note, however, that the meta-tree shows difficulty in the form of excessive exploration along the edge of the boundary. Whereas the PRM, if connected, shows uniform coverage of the entire space, the meta-tree will always explore the right-hand side—and especially the right-hand border of the dividing wall—more thoroughly than the left-hand side.
Figure 6.1: Discrete Voronoi regions using the L1 metric in an L1 -spherical grid world of radius 50, at tree sizes 50, 250, 500, and 1000.
Figure 6.2: Discrete Voronoi regions using the L2 metric in an L1 -spherical grid world of radius 50, at tree sizes 50, 250, 500, and 1000.
Figure 6.3: Discrete Voronoi regions using the L∞ metric in an L1 -spherical grid world of radius 50, at tree sizes 50, 250, 500, and 1000.
Figure 6.4: Frequency of cell coverage using (from left to right) the L1 , L2 , and L∞ metrics in an L1 -spherical grid world of radius 100, at tree sizes 100, 2500, and 5000.
Figure 6.5: Frequency of cell coverage using (from left to right) the L1 , L2 , and L∞ metrics in an L1 -spherical grid world of radius 100, at tree sizes 100, 2500, and 5000, with diagonal moves allowed.
Figure 6.6: Frequency of cell coverage using the L∞ metric in an L1 -spherical three-dimensional grid world of radius 50, at tree size 3000. Only the positive octant is shown, since the space is symmetrical. The panels show the slices z = 0 (top), z = y (middle), and z = (x + y)/2 (bottom). (Note: contrast has been enhanced—equally across all three images—to make the coverage more visible.)
Figure 6.7: L1 Voronoi regions in grid-world meta-trees with local search depths of 5 (top-left) and 10 (bottom-left), and PRMs with a connection radius of 5 (top-right) and 10 (bottom-right). In each case, the trees are grown to 250 nodes. The radius 5 PRM (top-right) shows a potential drawback of the PRM algorithm: until a critical density is reached where the graph is predominantly connected, adding nodes does not increase the actual coverage of the space, as demonstrated by the large Voronoi regions.
Figure 6.8: Coverage of a space with a narrow corridor using meta-trees with a local search depth of 5 (left) and PRMs with a connection radius of 5 (right) grown to 5000 nodes. Coverage is based on connection to a start node located in the center of the right-hand side (visible as a dark square). The narrow corridor highlights a weakness of the PRM algorithm; whereas the RRT grows along the obstacle and eventually through the corridor, the PRM can only explore the left-hand side once nodes within the corridor are sampled, allowing it to connect the two sides [25]. (Note: contrast has been enhanced—equally across both images—to make the coverage more visible.)
6.3 Path Optimality
The effects of growth biases introduced by interactions between heuristics and the dynamics of the discrete system are not limited to the smoothness of tree coverage. Figure 6.9 shows a visualization of the optimality of the paths to each point reached by the tree, averaged over 1000 trials, with darker values indicating paths closer to optimal. Specifically, the brightness of each cell on a scale of 0 to 1 is computed as $1 - d/p$, where $d$ is the optimal Manhattan distance from the tree root, and $p$ is the average tree path cost taken over every trial out of the 1000 where that cell was included in the tree. Immediately noticeable in the figures is that the L1 metric is significantly less uniform in the optimality of its paths. This can again be explained by the interaction between the bias in the tree and the dynamics of the space. First, we must notice that even without considering any bias in the tree, not every cell has an equal chance of being reached in an optimal way by any randomized algorithm. A cell n steps from the starting point and positioned on the line x = 0 or y = 0 can be reached by exactly one optimal path, whereas a cell the same distance out but positioned along one of the four diagonals can be reached by any one of $\binom{n}{n/2} = n!/\bigl((n/2)!\,(n/2)!\bigr)$ optimal paths.2 This creates an underlying bias toward an X-shaped optimality histogram. In the case of the L1 metric, that bias is magnified significantly by the bias in tree growth: the straight lines from the tree are predominantly along the X, with cells furthest from the X most likely to be reached by some amount of doubling-back in the tree in either the x or y direction. The L2 and L∞ metrics, on the other hand, give much more even results since the two biases serve to counteract each other—however, in both cases there is still a clearly visible line of lower optimality along the lines x = 0 and y = 0 due to the existence
2 Note that n is always even on a diagonal, since there must be exactly one horizontal move for every vertical move.
of only one optimal path. A more subtle effect is the lightening of the visualizations over time as the tree grows; this is most clearly visible in the L2 and L∞ graphs, but can be seen in the narrowing of the X-shaped dark region in the L1 graph. Even more important is the fact that the lightening effect is not uniform across the space, but instead favors the off-bias axes, close to the center of the trees (especially in the L2 metric results). This behavior is due to a fundamental pattern in the growth of the RRT algorithm, which is clearly visible in the Voronoi diagrams in Figures 6.1, 6.2, and 6.3: during the initial stages of growth, the exploration of the tree is almost exclusively outward from the edges of the tree; only as the region becomes more explored are the remaining areas inside the tree filled in. Growth from the edges of the tree into unexplored areas creates paths that are nearly optimal in most cases, since they are predominantly on the main trunks of the tree. Internal tree growth, however, is often tangential to the trunks of the tree, doubling back in x, y, or even both (in the case of a tangent to a tangent). Thus, the earlier in exploration a tree encounters a node, the more optimal the path to that node is likely to be. This explains the pattern of lightening that occurs in the places least likely to be explored early in each tree’s growth; in early stages of the trial, although relatively few trees have encountered those areas, they have all encountered them during outward growth. As the trees continue to grow, however, more and more of them encounter those areas during internal growth, thus lowering the average optimality. This decrease in optimality over time can be almost entirely prevented by using optimal-cost weighting in growing the RRTs. Assigning weight α to the g(x) component of the heuristic when finding nearest neighbors, as described in Section 5.3, gives preference to nodes near the center of the tree, eliminating most inward growth. As seen in Figure 6.10, the optimality of paths increases as the α weight increases; in fact, the L∞ tree with α = 0.25 shows almost 100% optimal
solutions, although the exploration of the space does suffer, as evidenced by the unexplored points along the edges that are not present with α = 0 and α = 0.1. Figure 6.11 shows the optimality characteristics of meta-trees and PRMs. These optimality histograms reflect a fundamental difference in the meta-tree and PRM algorithms: while increasing the search depth of a meta-tree lowers its optimality, increasing the connection radius of a PRM yields more optimal solutions. The reason for this difference (visible in Figure 6.7) is that large search depths in a meta-tree result in more inward growth and doubling back, since there are fewer potential nearest neighbors to choose from as compared to a traditional RRT or RRLT covering a comparable area. In the PRM algorithm, however, the fact that nodes are connected to all of their neighbors means that a larger connection radius results in more paths that take shortcuts past neighbors that are closer but off the optimal path.
Figure 6.9: Cost optimality of paths in 1000 trials using the L1 (left), L2 (center), and L∞ (right) metrics in an L1 -spherical grid world of radius 100, at tree sizes 2000 (top) and 8000 (bottom). Darker points have average costs closer to optimal.
Figure 6.10: Cost optimality of paths in 1000 trials using the L1 (left), L2 (center), and L∞ (right) metrics in an L1 -spherical grid world of radius 100, at tree size 8000, with α = 0.1 (top) and α = 0.25 (bottom). Darker points have average costs closer to optimal. The white points at the edges of the space are those that were not reached in any of the 1000 trials.
Figure 6.11: Discrete grid-world coverage (top) and optimality (bottom) for 1000 meta-trees (left) with local search depths of 5 and PRMs (right) with connection radii of 5, grown to 3000 nodes. Although both show nearly uniform coverage, the optimality of the PRM is significantly better than that of the meta-RRT.
Chapter 7
Conclusion
7.1 Summary
In this thesis, we applied sampling-based planning techniques to discrete search and planning by adapting existing sampling-based planners—RRTs and PRMs—from continuous and hybrid problems, as well as developing new algorithms inspired by those planners. In addition to giving experimental results for these algorithms in several discrete test problems, we explored their properties more directly through experiments in a discrete grid world. In Chapter 4, we introduced a method for mapping continuous sampling-based planners to discrete space by using heuristic cost-to-go evaluations as a substitute for distance metrics, and standard discrete search techniques as local planners to define connectivity. After applying this technique to RRTs and PRMs, we explored ways of improving the mapping to discrete space. As a result, we developed the rapidly-exploring random leafy tree (RRLT) algorithm, which addresses some of the issues of the discrete RRT. We provided experimental results for these algorithms in the N -Puzzle and Knight-Swapping Puzzle, as well as a test of multi-agent planning in a discretized, two-dimensional air-traffic control problem.
In Chapter 5, we proposed several promising variations and improvements based on our discrete RRTs and PRMs. Most notably, we introduced the meta-tree algorithm, which combines elements of the RRT and PRM algorithms, and suggested a method for speeding up nearest-neighbor search in the RRLT. We gave experimental results in our test problems that illustrate the differences among these algorithms. Finally, in Chapter 6, we introduced the idea of heuristic-based discrete Voronoi regions, and examined in detail the coverage and optimality properties of several of our algorithms in grid-world problems. We explored the interactions between heuristics and agent dynamics using discrete Voronoi regions, and provided visualizations of these interactions and their effects.
7.2 Future Work
7.2.1 Refinements
Since this work is a preliminary foray into discrete sampling-based planning, the experimental results presented here would benefit from further refinements, both in the implementation of the algorithms that generated the data and in the final analysis of that data.
Implementation Details
Since our algorithms were exploratory prototypes, work remains to be done in improving the efficiency, and therefore the running time, of our algorithms. For example, a hash-based structure for tree and map membership testing and intersection detection would decrease the complexity of checks for node duplication or intersection with another tree from logarithmic to constant time. Nearest-neighbor queries and heuristics would also benefit from further analysis and
optimization, since these generally constitute the most expensive part of the RRT and PRM algorithms.
Data Analysis
In our experimentation, we were primarily concerned with evaluating the overall potential of discrete sampling-based planners, as well as exploring general trends in our data. The analysis of our data, therefore, was not statistically rigorous. Because results vary to extremes in many of our trials—for example, solving a random configuration of the 15-Puzzle or 24-Puzzle can be very simple, if the random configuration is very near the solved configuration, but very difficult in other cases—it was not uncommon for the standard deviation of our results to be of the same order of magnitude as the results themselves. Although we believe the trends we report to be correct, since we observed them across many trials and different problem domains, solid conclusions about our algorithms and results would require larger trials and more rigorous statistical analysis.
7.2.2 Extensions of Discrete Sampling-Based Planners
The performance of our discrete sampling-based planners is promising for the application of these algorithms to harder discrete-space problems. Since they use the same type of heuristic information used in other informed search methods, they could be easily adapted to any other problem that can be solved with informed search. In addition, the discrete sampling-based planners could benefit directly from the use of more complex heuristic techniques, such as pattern databases [10, 12, 34], that have been developed for difficult discrete problems. Below, we survey other research that could improve the usefulness of the algorithms we have described.
Solution-Path Smoothing
One of the most significant issues in the use of RRTs, and to a lesser extent PRMs, is the decidedly sub-optimal solutions they generate. It would be helpful to explore techniques to smooth paths once they have been planned in order to increase the quality of solutions. One possibility is repeated re-planning of sections of the path with an optimal or near-optimal search algorithm, such as A∗.
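One way such re-planning might look in code: repeatedly pick two states on the current path, run an optimal planner between them, and splice in the result if it is shorter. This is only a sketch of the idea suggested above; the State type and the optimalPlan callable (standing in for A∗ or a similar planner) are assumptions, not part of the thesis implementation, and unit edge costs are assumed.

#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Repeatedly re-plan random subsections of a path with an (assumed) optimal
// planner and splice in any shorter replacement. optimalPlan(a, b) must
// return a state sequence from a to b inclusive; each edge costs 1 here.
template <typename State, typename Planner>
std::vector<State> smoothPath(std::vector<State> path, Planner optimalPlan,
                              int attempts, std::mt19937& rng) {
    for (int t = 0; t < attempts && path.size() > 2; ++t) {
        std::uniform_int_distribution<std::size_t> pick(0, path.size() - 1);
        std::size_t i = pick(rng), j = pick(rng);
        if (i > j) std::swap(i, j);
        if (j - i < 2) continue;  // nothing to gain between adjacent states
        std::vector<State> segment = optimalPlan(path[i], path[j]);
        if (!segment.empty() && segment.size() < j - i + 1) {
            // Replace path[i..j] with the shorter re-planned segment.
            path.erase(path.begin() + i, path.begin() + j + 1);
            path.insert(path.begin() + i, segment.begin(), segment.end());
        }
    }
    return path;
}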
Nearest-Neighbor Tie Handling
In Section 6.1, we discuss methods of breaking ties generated in nearest-neighbor searches. A very different approach, related to the idea of the k-growth RRT, is simply not to break ties at all, but instead to grow all ties simultaneously. Unlike the fixed k-growth RRT, we can continue to use nearest-neighbor search optimizations like those proposed in Section 5.1, and let the tree determine the value of k at each step. As in the basic k-growth RRT, some form of dispersion [25] checking on ties might be valuable to prevent growing a small cluster of nodes and to avoid the possibility of overlapping successor states. In the case of a meta-tree, multiple growth can be incorporated without modifying the algorithm by delaying the tie-breaking step until after the local planning. A local search tree could be grown from each of the ties for nearest neighbor, and then the new node (or nodes) could be selected from the trees generated by the local search using a method like one of those described in Section 6.1. By deferring tie-breaking, nodes that look promising but are blocked from growing toward the target point—either by explicit obstacles or the dynamics of the system—could be weeded out. Assuming a reasonable bound exists (or is imposed) on the number of possible ties, the algorithm is only a constant factor more complex than the basic meta-tree. The more poorly the heuristic predicts obstacles, the more likely this method is to produce better successor states.
Multiple-Query Planning
PRMs are designed specifically for multiple-query path planning, and the RRT algorithm is well suited to multiple-query single-source or single-destination path planning. Given the smooth and rapid coverage of both algorithms in our grid-world exploration, we expect that these algorithms would perform well in discrete multi-query path planning problems.
Discrete PRMs with Local RRTs
Recent results suggest that PRMs and RRTs can be combined to effectively solve difficult continuous planning problems by using bi-directional RRT search as the local planner for connecting PRM nodes [16]. Since our results suggest that algorithms such as A∗ are more efficient for small searches, they are better suited to local planning in our simple test problems. However, it is likely that this technique could be applied to solve more difficult discrete planning problems, where the local search trees would be large enough that a discrete RRT or RRLT would show advantages over A∗.
Biased Tree Improvements
In single-directional RRT search, and even in bi-directional search, it is often useful or necessary to bias search toward the goal state(s). In our implementation, we treated the nearest-neighbor lookup the same as the lookup for a random target node, performing an explicit nearest-neighbor search. In problems with a single goal state, the speed of biased search could likely be improved significantly by keeping a priority-queue ordering of states based on their nearness to the goal state (as in a best-first search), so that the biased steps could be performed without the need for time-consuming nearest-neighbor searches.
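A minimal sketch of that bookkeeping, assuming a fixed goal state and a heuristic already evaluated for each node as it is added; the class and member names are illustrative, not the thesis implementation. Each new tree node is pushed onto a heap keyed by its heuristic distance to the goal, so a goal-biased step can retrieve the best candidate in O(log n) rather than scanning the tree.

#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Min-heap of (heuristic-distance-to-goal, node id). Push every node as it is
// added to the tree; a goal-biased growth step pops the closest node directly.
class GoalBiasQueue {
public:
    void push(int nodeId, double hToGoal) { heap_.push({hToGoal, nodeId}); }

    // Returns the id of the tree node currently nearest the goal (by h).
    // Caller must ensure the queue is non-empty.
    int popNearestToGoal() {
        int id = heap_.top().second;
        heap_.pop();
        return id;
    }

    bool empty() const { return heap_.empty(); }

private:
    using Entry = std::pair<double, int>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap_;
};

Whether a popped node should be re-inserted after a failed extension attempt is a policy choice left open in this sketch.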
Semi-Leafy Tree Improvements
Although our experiments with a simple on-demand leaf-creation scheme were not promising, other more intelligent schemes could be beneficial in cases where memory is enough of a limiting factor that some sort of size optimization is necessary. Many of the drawbacks of the semi-leafy RRLT are due to the fact that (by the nature of the RRLT’s bias toward unexplored space) the leaves that are most likely to be useful are those on the outermost nodes of the tree. However, since the semi-leafy tree as described in Section 5.5 creates leaves on demand, the leaves most likely to have been expanded are those on the inside of the tree (because those are the leaves whose parents are the oldest, and thus most likely to have been considered by some previous nearest-neighbor search). One alternate method of creating a semi-leafy tree would involve creating leaves as their parent nodes are created (as in the original RRLT algorithm) but introducing time-based decay. That is, leaves of a node could “die off” and be removed from memory if that node had not been a candidate for the parent of a nearest neighbor in the last m iterations of RRLT growth. Here, m could be either a fixed value or based on a last-used ranking of the nodes in the tree. This would reduce or eliminate both of the major drawbacks, but at a cost of additional processing of the tree at each step.
k-Growth Improvements
In Section 5.4, we suggest methods for improving the k-growth RRLT by preventing growth of nodes which contribute little to the overall coverage of the tree, using heuristic distance to other nodes, or by checking for common recent ancestors of potential nodes. The extent to which this technique lowers the number of nearest-neighbor queries necessary to achieve the same level of coverage, especially in problems with many obstacles and dead-ends, remains to be seen.
7.2.3 Study of Discrete Properties
Our studies of the properties of discrete sampling-based planners in Chapter 6 gave us insight into their behaviors in discrete space. Other forms of analysis could further enhance our understanding.
Distance-to-Tree Coverage
Our experiments with grid-world coverage focused on analysis in terms of cells actually covered by the tree or roadmap. An alternate method of measuring coverage is in terms of the distance from each point in the space to the nearest point in the tree (see Section 4.4.1). This distance-to-tree approach to examining coverage could give further insight into the coverage properties of discrete sampling-based planners.
Voronoi Region Analysis
As mentioned in Section 3.1.2, Voronoi regions convey information about coverage of space directly. Although we considered only grid-world Voronoi regions, the method we used to compute them is general, and can be applied to sampling-based planning in any discrete space. Another way of exploring issues of coverage in problems where visualizing Voronoi regions is difficult would be to compute the sizes and locations of Voronoi regions during RRT and PRM growth, and to use these properties to track changes in the Voronoi regions over time.
Path Optimality Distributions
In [27], RRT optimality was explored through statistical analysis of repeated planning experiments. The resulting distributions of path optimality demonstrated interesting trends beyond those that can be captured through average optimality
results. It remains to be seen if similar trends exist in discrete planning, especially in direct grid-world discretizations of the same experiments.
Harder Problem Domains
In order to gain a broader understanding of the properties of our discrete algorithms, similar analysis should be done in problem domains other than small grid worlds. Specifically, direct study of properties in larger two- and three-dimensional grid worlds, grid worlds with more varied obstacles, and problems such as the N -Puzzle and Knight-Swapping Puzzle, would give us more complete information about discrete sampling-based planners.
7.2.4 Application to Continuous and Hybrid Systems
Several of the algorithms we present could prove useful in certain continuous or hybrid applications, despite being developed for discrete space. Specifically, we believe that the RRLT and the Cost-Optimized RRT/RRLT have potential for extension out of purely discrete problems.
Continuous RRLTs
Although the RRLT algorithm and its enhancements were designed to overcome problems in searching discrete spaces, it is possible that it would prove useful in some continuous and hybrid systems as well. Specifically, it could be applicable in non-holonomic motion planning problems that restrict motion to a small set of actions at each time-step. The RRLT algorithm could also be applied when restrictions are placed on the motion of an agent in continuous space to make searching more computationally feasible [7, 8].
Continuous Cost-Optimized Trees
Although nearness in continuous RRTs has been defined in terms of distance metrics, it would be possible instead to define it using a formula of the same form as that used in general heuristic search. By assigning a small weight to the path cost of each node and adding that to the distance metric when finding nearest neighbors, a continuous or hybrid RRT could likely be made to sacrifice a small amount of rapid exploration in favor of better solutions.
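In code, the change amounts to a one-line modification of the nearest-neighbor score. The sketch below is illustrative only; the node fields and distance function are assumed names, with a planar point used as the state for concreteness.

#include <cmath>
#include <cstddef>
#include <vector>

struct TreeNode {
    double x, y;          // state (here: a point in the plane, for illustration)
    double costFromRoot;  // accumulated path cost g(x)
};

double dist(const TreeNode& n, double tx, double ty) {
    return std::hypot(n.x - tx, n.y - ty);
}

// Cost-optimized nearest neighbor: distance to the sample plus a small
// alpha-weighted share of the node's path cost from the root.
// Assumes the tree is non-empty.
std::size_t costWeightedNearest(const std::vector<TreeNode>& tree,
                                double tx, double ty, double alpha) {
    std::size_t best = 0;
    double bestScore = dist(tree[0], tx, ty) + alpha * tree[0].costFromRoot;
    for (std::size_t i = 1; i < tree.size(); ++i) {
        double score = dist(tree[i], tx, ty) + alpha * tree[i].costFromRoot;
        if (score < bestScore) { bestScore = score; best = i; }
    }
    return best;
}

Setting alpha to zero recovers the ordinary nearest-neighbor rule; larger values trade exploration for straighter, lower-cost paths, as in the discrete experiments of Section 6.3.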
Appendix A
Proof of Knight-Swapping Puzzle Heuristic Properties
A.1 Admissibility
Throughout our proof, we will use the following notation:
\begin{align*}
S_b &= \text{the set of black starting locations}, \\
D_b &= \text{the set of black destination locations}, \\
S_w &= \text{the set of white starting locations}, \\
D_w &= \text{the set of white destination locations}.
\end{align*}
Let the simple distance $d(i, j)$ be the shortest distance (in knight moves) between locations $i$ and $j$, disregarding obstacles at any intermediate locations. For a set of locations $A$,
$$d(i, A) = \min_{j \in A} d(i, j).$$
Note that
$$d(i, A') \le d(i, A) \quad \text{if } A' \supseteq A. \tag{A.1}$$
Any solution $\pi$ to the Knight-Swapping Puzzle naturally induces bijective maps
$$\pi_b : S_b \to D_b, \qquad \pi_w : S_w \to D_w,$$
where, for example, $\pi_b(k)$ is the destination square in the target board of the black knight that starts at square $k$ in the initial board. Let the cost of any such solution be denoted by $J(\pi)$. Then the optimal solution $\pi^*$ satisfies
$$J(\pi^*) = \min_{\pi} J(\pi).$$
$J(\pi)$ is certainly bounded by
$$J(\pi) \ge \sum_{k \in S_b} d(k, \pi_b(k)) + \sum_{k \in S_w} d(k, \pi_w(k)).$$
Note that for any solution $\pi$,
\begin{align*}
J(\pi) &\ge \sum_{k \in S_b} d(k, D_b) + \sum_{k \in S_w} d(k, D_w) \\
       &= \sum_{k \in S_b - D_b} d(k, D_b) + \sum_{k \in S_w - D_w} d(k, D_w) \\
       &\triangleq h_A,
\end{align*}
so that the heuristic $h_A$, which computes the sum of the simple distance from every out-of-place knight to the nearest destination square, is admissible. We proposed the heuristic $h_B$, namely the sum of the simple distance from every out-of-place knight to the nearest destination square not occupied by a knight of the same color:
$$h_B = \sum_{k \in S_b - D_b} d(k, D_b - S_b) + \sum_{k \in S_w - D_w} d(k, D_w - S_w),$$
where $A - B$ denotes set difference.
We claim that, without loss of generality, any optimal solution satisfies both
$$\pi_b^*(k) = k, \quad k \in S_b \cap D_b, \qquad \text{and} \qquad \pi_w^*(k) = k, \quad k \in S_w \cap D_w.$$
That is, the optimal solution need not move any knight that is already at a correct destination.

PROOF: Suppose there were a black knight at location $m$ with $m \in S_b \cap D_b$ but $\pi_b^*(m) \ne m$. Then consider the black knight at location $l$ such that $\pi_b^*(l) = m$. (There must be such a knight because $m \in D_b$.) But in this case the optimal solution could swap the destinations of $m$ and $l$, because
\begin{align*}
d(l, \pi_b^*(l)) + d(m, \pi_b^*(m)) &= d(l, m) + d(m, \pi_b^*(m)) \\
&\ge d(l, \pi_b^*(m)) \quad \text{(by the triangle inequality)} \\
&= d(l, \pi_b^*(m)) + d(m, m) \\
&= d(l, \pi_b^*(m)) + d(m, \pi_b^*(l)).
\end{align*}
The same argument can be made for the white knights. Q.E.D.
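For concreteness, $h_B$ can be computed with a multi-source breadth-first search over knight moves from the unoccupied destination squares. The following is a hedged sketch under assumed names (board representation, function names); it is not the thesis implementation. Obstacles are ignored exactly as in the definition of the simple distance $d$, and the board is assumed large enough that every square is reachable by knight moves.

#include <algorithm>
#include <climits>
#include <iterator>
#include <queue>
#include <set>
#include <utility>
#include <vector>

// A board square; Sb, Db, Sw, Dw are sets of squares as in the proof above.
using Square = std::pair<int, int>;
using SquareSet = std::set<Square>;

// Multi-source BFS over knight moves on an n x n board, ignoring obstacles:
// dist[r][c] = simple distance d((r, c), targets).
static std::vector<std::vector<int>> knightDistances(int n, const SquareSet& targets) {
    static const int dr[] = {1, 1, -1, -1, 2, 2, -2, -2};
    static const int dc[] = {2, -2, 2, -2, 1, -1, 1, -1};
    std::vector<std::vector<int>> dist(n, std::vector<int>(n, INT_MAX));
    std::queue<Square> q;
    for (const Square& t : targets) { dist[t.first][t.second] = 0; q.push(t); }
    while (!q.empty()) {
        Square s = q.front(); q.pop();
        for (int k = 0; k < 8; ++k) {
            int r = s.first + dr[k], c = s.second + dc[k];
            if (r >= 0 && r < n && c >= 0 && c < n && dist[r][c] == INT_MAX) {
                dist[r][c] = dist[s.first][s.second] + 1;
                q.push({r, c});
            }
        }
    }
    return dist;
}

// h_B contribution for one color: sum over out-of-place knights (S - D) of
// the simple distance to the nearest destination not occupied by a knight of
// the same color (D - S).
static int hOneColor(int n, const SquareSet& S, const SquareSet& D) {
    SquareSet outOfPlace, freeDest;
    std::set_difference(S.begin(), S.end(), D.begin(), D.end(),
                        std::inserter(outOfPlace, outOfPlace.end()));
    std::set_difference(D.begin(), D.end(), S.begin(), S.end(),
                        std::inserter(freeDest, freeDest.end()));
    if (outOfPlace.empty()) return 0;
    std::vector<std::vector<int>> dist = knightDistances(n, freeDest);
    int total = 0;
    for (const Square& s : outOfPlace) total += dist[s.first][s.second];
    return total;
}

int hB(int n, const SquareSet& Sb, const SquareSet& Db,
       const SquareSet& Sw, const SquareSet& Dw) {
    return hOneColor(n, Sb, Db) + hOneColor(n, Sw, Dw);
}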
A.2 Domination
By Equation (A.1), it follows from the definition of $h_B$ that $h_B$ dominates $h_A$.
Figure A.1: Violation of symmetry in the Knight-Swapping Puzzle.
A.3 Symmetry Violation
Since the heuristic does not provide a direct one-to-one mapping between the knights and the possible destination squares, there is no guarantee that the heuristic evaluation will return the same value when evaluated in each direction (see Figure A.1). In order to gauge the extent of this issue, we generated 1,000,000 pairs of random boards for each of the 5 × 5, 7 × 7, and 9 × 9 Knight-Swapping Puzzles and compared the heuristic cost-to-go in both directions. We found that symmetry was violated in approximately half of the pairs at each size. However, the two evaluations rarely differed by more than one, so the average overall improvement in the heuristic that would result from defining a new, dominating, symmetrical heuristic $h'(q_1, q_2) = \max(h(q_1, q_2), h(q_2, q_1))$ ranges from about 6.6% (for the 5 × 5 board) to 2.2% (for the 9 × 9 board).1 Since this slight improvement would come at a cost of doubling the computation time for every cost-to-go evaluation—and thus the already-expensive nearest-neighbor calculation—we chose not to use the more accurate heuristic.
1 We define improvement as $(h' - h)/h$, where $h$ and $h'$ are the averages over the 1,000,000 trials of each heuristic.
Figure A.2: Violation of the triangle inequality in the Knight-Swapping Puzzle.
A.4 Triangle Inequality Violation
In general, an admissible heuristic need not satisfy the triangle inequality, since the amount of underestimation differs between evaluations. In the Knight-Swapping Puzzle, we found that although it is possible to violate the triangle inequality (see Figure A.2), it happens very rarely. In 1,000,000 trials of randomly generated boards of each of the three sizes, the triangle inequality was violated only 16 times in the 5 × 5 board, and never in the 7 × 7 or 9 × 9 versions. Thus, although the Knight-Swapping Puzzle is not guaranteed to satisfy the triangle inequality, techniques that require the triangle inequality (such as the nearest-neighbor optimizations described in Section 5.1) are likely to work without noticeable problems.
Appendix B
Implementation Details
All code for the experiments in this thesis was written in C++, and designed to allow maximum code re-use between problems, in order to ensure consistency between different experiments. Both the RRT family of algorithms and the PRM algorithms had an abstract base class, which defined a basic interface for all implementations. All functions that operated on the RRT and PRM classes used either this high-level interface or C++ templates to make them as general as possible. We also used inheritance to make the implementation of RRT variants as similar as possible by adding layers to the class hierarchy containing all common implementation details, with specific variants overriding only those functions necessary to modify the algorithm’s behavior. This isolated the effects of our changes to the algorithms, allowing us to be sure that the results we observed were not effects of incidental differences in implementation. Specific problem domains were implemented as the bottom layer of the class hierarchy. All RRT and PRM implementations left both the heuristic evaluation function and the method of generating successors abstract, so that a new problem domain could be tested simply by defining those two functions in a subclass of any implementation.
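The thesis code itself is not reproduced here; the sketch below merely illustrates the shape of such an interface, with hypothetical names for the base class and its two abstract functions (heuristic evaluation and successor generation).

#include <vector>

// Illustrative only: an abstract planner base class in the spirit described
// above, where a new problem domain supplies just two functions.
template <typename State>
class DiscretePlannerBase {
public:
    virtual ~DiscretePlannerBase() = default;

    // Heuristic cost-to-go estimate between two states (stands in for a metric).
    virtual double heuristic(const State& from, const State& to) const = 0;

    // All states reachable from s in one step of the problem's dynamics.
    virtual std::vector<State> successors(const State& s) const = 0;

    // Shared planning machinery (tree growth, nearest-neighbor queries, ...)
    // would be implemented here in terms of the two functions above.
};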
The RRT family of algorithms was implemented with an actual tree structure, where each node stores a pointer to its parent and each of its children, in order to allow pruning during nearest-neighbor queries. The tree was augmented with an STL set of all the nodes in the tree, to allow for more efficient checks that a node is not in the tree. In the case of the RRLT algorithms, an additional set kept track of leaves which had been generated, to prevent creation of duplicate leaves. An STL vector was used to keep track of nodes recently added to the tree, in order to make intersection testing of bi-directional trees more efficient; tree intersection was tested by using the STL set-intersection function to test for intersection of the new nodes in each tree with all the nodes in the opposite tree. The PRM algorithms were implemented using an STL vector to store the nodes in the map, with each node storing its adjacency list in the form of pointers to all the nodes to which it is connected. Other than path costs, details of the connections between nodes were not stored in our implementation. PRM connectedness was tested using a standard disjoint set algorithm: each node stores a pointer to a representative node for its component, which is updated whenever two components are connected to each other. Two nodes are connected if and only if they have the same representative. Traditional heuristic algorithms, such as A∗ and best-first search, were implemented by defining specific problem domains as subclasses—exactly as described for the discrete sampling-based planners—so that all algorithms used the same format of problem definition. The search algorithm itself was implemented using an STL priority queue to keep the Open list (of nodes to be explored) ordered according to the heuristic.
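As an illustration of the representative-pointer idea described above (and not a copy of the thesis code), a minimal union-find over node indices might look like the following; union by rank would be the usual further refinement.

#include <cstddef>
#include <numeric>
#include <vector>

// Minimal disjoint-set (union-find) over PRM node indices. find() returns a
// component representative; two nodes are connected iff their representatives
// are equal. unite() is called whenever an edge joins two components.
class DisjointSet {
public:
    explicit DisjointSet(std::size_t n) : parent_(n) {
        std::iota(parent_.begin(), parent_.end(), 0);  // each node starts alone
    }

    std::size_t find(std::size_t x) {
        while (parent_[x] != x) {
            parent_[x] = parent_[parent_[x]];  // simple path halving
            x = parent_[x];
        }
        return x;
    }

    void unite(std::size_t a, std::size_t b) { parent_[find(a)] = find(b); }

    bool connected(std::size_t a, std::size_t b) { return find(a) == find(b); }

private:
    std::vector<std::size_t> parent_;
};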
Bibliography
[1] Sunil Arya, David M. Mount, Nathan S. Netanyahu, Ruth Silverman, and Angela Y. Wu. An optimal algorithm for approximate nearest neighbor searching in fixed dimensions. Journal of the ACM, 45(6):891–923, 1998.
[2] Anna Atramentov and Steven M. LaValle. Efficient nearest neighbor search for motion planning. In IEEE International Conference on Robotics and Automation, pages 632–637, Taipei, Taiwan, September 14–19, 2002.
[3] Kostas E. Bekris, Brian Y. Chen, Andrew M. Ladd, Erion Plaku, and Lydia E. Kavraki. Multiple query probabilistic roadmap planning using single query planning primitives. In IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 656–661, Las Vegas, NV, October 27–31, 2003.
[4] Alexander Bogomolny. Sam Loyd's fifteen. http://www.cut-the-knot.org/pythagoras/fifteen.shtml.
[5] Michael S. Branicky, Michael M. Curtiss, Joshua Levine, and Stuart Morgan. RRTs for nonlinear, discrete, and hybrid planning and control. In IEEE Conference on Decision and Control, Lahaina, HI, December 9–12, 2003.
[6] Michael S. Branicky, Michael M. Curtiss, Joshua Levine, and Stuart Morgan. Sampling-based planning and control. In Twelfth Yale Workshop on Adaptive and Learning Systems, New Haven, CT, May 28–30, 2003.
[7] Michael S. Branicky, Tor A. Johansen, Idar Petersen, and Emilio Frazzoli. On-line techniques for behavioral programming. In IEEE Conference on Decision and Control, pages 1840–1845, Sydney, Australia, December 12–15, 2000.
[8] Michael S. Branicky and Gang Zhang. Solving hybrid control problems: Level sets and behavioral programming. In American Control Conference, pages 1175–1180, Chicago, IL, June 28–30, 2000.
[9] John Collins, Güleser Demir, and Maria Gini. Bidtree ordering in IDA* combinatorial auction winner-determination with side constraints. In J. Padget, O. Shehory, D. Parkes, N. Sadeh, and W. Walsh, editors, Agent Mediated Electronic Commerce IV, volume LNAI 2531, pages 17–33. Springer-Verlag, 2002.
[10] J. Culberson and J. Schaeffer. Efficiently searching the 15-puzzle. Technical Report 94-08, Department of Computer Science, Iowa State University, August 1994. http://citeseer.nj.nec.com/culberson94efficiently.html.
[11] Michael M. Curtiss. Motion planning and control using RRTs. Master's project report, Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH, May 2002. http://dora.cwru.edu/msb/pubs/mmcMS.pdf.
[12] Stefan Edelkamp. Planning with pattern databases. In European Conference on Planning, pages 13–24, Toledo, Spain, September 12–14, 2001.
[13] Keld Helsgaun. An effective implementation of the Lin-Kernighan traveling salesman heuristic. European Journal of Operational Research, 126(1):106–130, 2000.
[14] Morris W. Hirsch and Stephen Smale. Differential Equations, Dynamical Systems, and Linear Algebra. Academic Press, San Diego, California, 1974.
[15] Holger H. Hoos and Craig Boutilier. Solving combinatorial auctions using stochastic local search. In AAAI National Conference on Artificial Intelligence, pages 22–29, July 30–August 3, 2000.
[16] Pekka Isto. Constructing probabilistic roadmaps with powerful local planning and path optimization. In IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 2323–2328, Lausanne, Switzerland, September 30–October 4, 2002.
[17] Andreas Junghanns and Jonathan Schaeffer. Sokoban: A challenging single agent search problem. In International Joint Conferences on Artificial Intelligence, Nagoya, Japan, August 23–29, 1997.
[18] Andreas Junghanns and Jonathan Schaeffer. Domain-dependent single-agent search enhancements. In International Joint Conferences on Artificial Intelligence, Stockholm, Sweden, July 31–August 6, 1999.
[19] Andreas Junghanns and Jonathan Schaeffer. Sokoban: Improving the search with relevance cuts. Journal of Theoretical Computing Science, 252(1–2):151–175, 2001.
[20] Lydia E. Kavraki, Petr Švestka, Jean-Claude Latombe, and Mark H. Overmars. Probabilistic roadmaps for path planning in high-dimensional configuration spaces. IEEE Transactions on Robotics and Automation, 12(4):566–580, 1996.
[21] Richard E. Korf. Finding optimal solutions to Rubik's cube using pattern databases. In International Joint Conferences on Artificial Intelligence, pages 700–705, Providence, RI, July 27–31, 1997.
[22] Richard E. Korf and Larry A. Taylor. Finding optimal solutions to the twenty-four puzzle. In AAAI National Conference on Artificial Intelligence, pages 1202–1207, Portland, OR, August 4–8, 1996.
[23] James J. Kuffner and Steven M. LaValle. RRT-connect: An efficient approach to single-query path planning. In IEEE International Conference on Robotics and Automation, pages 995–1001, San Francisco, CA, 2000.
[24] Steven M. LaValle. Rapidly-exploring random trees: A new tool for path planning. Technical Report 98-11, Department of Computer Science, Iowa State University, October 1998. http://msl.cs.uiuc.edu/~lavalle/papers/Lav98c.ps.gz.
[25] Steven M. LaValle and Michael S. Branicky. On the relationship between classical grid search and probabilistic roadmaps. In Workshop on the Algorithmic Foundation of Robotics, pages 55–71, Nice, France, December 15–17, 2002.
[26] Steven M. LaValle and James J. Kuffner. Rapidly-exploring random trees: Progress and prospects. In B. R. Donald, K. M. Lynch, and D. Rus, editors, Algorithmic and Computational Robotics: New Directions, pages 293–308. A.K. Peters, Wellesley, Massachusetts, 2001.
[27] Joshua A. Levine. Sampling-based planning for hybrid systems. Master's thesis, Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH, January 2004. http://dora.cwru.edu/msb/pubs/jalMS.pdf.
[28] Tsai-Yen Li and Yang-Chuan Shie. An incremental learning approach to motion planning with roadmap management. In IEEE International Conference on Robotics and Automation, pages 3411–3416, Washington, DC, May 11–15, 2002.
[29] S. Lin and B. W. Kernighan. An effective heuristic algorithm for the traveling-salesman problem. Operations Research, 21:498–516, 1973.
[30] Stephen R. Lindemann and Steven M. LaValle. Incrementally reducing dispersion by increasing Voronoi bias in RRTs. In IEEE International Conference on Robotics and Automation, New Orleans, LA, 2004. Submitted. http://msl.cs.uiuc.edu/~lavalle/papers/LinLav04.ps.gz.
[31] Stuart Morgan and Michael S. Branicky. Sampling-based planning for discrete spaces. In IEEE/RSJ International Conference on Intelligent Robots and Systems, Sendai, Japan, 2004. Submitted.
[32] Noam Nisan. Bidding and allocation in combinatorial auctions. In ACM Conference on Electronic Commerce, pages 1–12, Minneapolis, MN, October 17–20, 2000.
[33] Judea Pearl. Heuristics: Intelligent Search Strategies for Computer Problem Solving. Addison-Wesley, Menlo Park, California, 1984.
[34] Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall, Upper Saddle River, New Jersey, 2nd edition, 2003.
[35] William E. Story. Notes on the “15” puzzle. American Journal of Mathematics, 2:397–404, December 1879.
[36] Eric W. Weisstein. 15 puzzle. From MathWorld—A Wolfram Web Resource. http://mathworld.wolfram.com/15Puzzle.html.
[37] Peter N. Yianilos. Data structures and algorithms for nearest neighbor search in general metric spaces. In SODA: ACM-SIAM Symposium on Discrete Algorithms, pages 311–321, Austin, TX, January 25–27, 1993.