University of Alberta, Department of Computing Science, Technical Report

November 13, 2006

State Abstraction in Real-time Heuristic Search: The Director's Cut

Vadim Bulitko (bulitko@ualberta.ca)
Nathan Sturtevant (nathanst@cs.ualberta.ca)
Jieshan Lu (jieshan@cs.ualberta.ca)
Timothy Yau (thyau@ualberta.ca)

Department of Computing Science, University of Alberta
Edmonton, Alberta, T6G 2E8, Canada

Abstract

Real-time heuristic search methods are used by situated agents in applications that require the amount of planning per move to be independent of the problem size. Such agents plan only a few actions at a time in a local search space and avoid getting trapped in local minima by improving their heuristic function over time. We extend a wide class of real-time search algorithms with automatically built state abstraction. We then prove completeness and convergence of the resulting novel family of algorithms. Finally, we analyze the effects of abstraction in an extensive empirical study situated in the goal-directed navigation problem. In particular, we demonstrate that the new algorithms are the most efficient at trading off pairs of antagonistic performance measures such as response time and overall convergence speed.

Keywords: learning real-time heuristic search, state abstraction, goal-directed navigation.

1. Introduction and Motivation

Imagine an autonomous surveillance aircraft in the context of disaster response (Kitano, Tadokoro, Noda, Matsubara, Takahashi, Shinjou, & Shimada, 1999). As a part of surveying the disaster site, locating victims, and assessing damage, the aircraft can be ordered to fly from its current position to a particular position whose coordinates are set by a human operator. Radio interference due to electrical system damage and fire makes remote control unreliable and requires a certain degree of autonomy from the aircraft.

This problem poses several challenges to an Artificial Intelligence (AI) controller of the aircraft. First, due to flight dynamics, the AI has to control the aircraft in real time, producing a certain number of actions per second. Second, the extent of damage to the site is unknown and, thus, the aircraft has to map out the site via exploration. Since human victims may be involved and the amount of fuel on board the aircraft is limited, the exploration needs to be conducted in an efficient and expedited manner. Third, the aircraft has a limited range on its onboard sensors, such as laser range finders and cameras, which can be further reduced by smoke and debris in the air. Thus, the AI of the aircraft has to maintain an internal map of the site and update it with the information it receives from its sensors.

In this paper we study a simplified version of this problem posed as agent-centered heuristic search (Koenig, 2001). In such settings, an agent is ordered to navigate from its current position to a goal state. We abstract away sensor errors, the stochastic nature of the actuators, and the dynamics of the environment. This allows us to focus solely on the three challenges motivated above. First, we require heuristic search algorithms to operate in real time. We do so by interleaving planning and execution and bounding the planning time for each action by a constant. Second, we assume that while


the goal state is known at all times, the search space itself is a priori unknown and the search agent maintains an internal model of the search problem. Third, the agent can sense only a limited part of the search problem around its current state and uses it to update its model.

2. Contributions

We make the following three contributions in this paper. First, we present a real-time heuristic search algorithm that uses automatically generated state abstraction.1 The new algorithm unifies and extends several widely known existing heuristic search algorithms. We prove its real-time operation, completeness, and convergence properties. This contribution can be viewed as a follow-up to previous unification and extension efforts (Bulitko & Lee, 2006).

In some existing literature on abstraction in search (e.g., Holte, Mkadmi, Zimmer, & MacDonald, 1996; Botea, Müller, & Schaeffer, 2004), novel algorithms are suggested as giving up "little" solution optimality for a "substantial" reduction in running time. Unfortunately, it is usually unclear how much optimality must be given up for a particular reduction in running time. We make our second contribution by studying this question empirically. Namely, we introduce the concept of dominance as out-performance along two antagonistic (i.e., negatively correlated) measures. Thus, one algorithm is said to dominate another if it both improves optimality and reduces running time. We consider a large set of algorithms with and without abstraction and show that algorithms with abstraction dominate most algorithms without abstraction.

Third, we analyze the effects of abstraction on search with respect to five commonly used performance measures (e.g., solution suboptimality, amount of planning for the first move, etc.). Knowing these effects deepens our understanding of real-time heuristic search methods as well as guides a practitioner in selecting the most appropriate search algorithm configuration for her application.

3. Problem Formulation: Heuristic Search

We use a common formulation of search problems found in the real-time heuristic search literature. The agent's task is to solve a search problem by finding a path in the search graph from the start state to the goal state. The search problem is formally defined as follows.

Definition 3.1 A search problem instance is defined as a tuple (G, c, s0, sg, h0) where G = (S, E) is a directed weighted graph. S is a finite set of states (or vertices) and E ⊂ S × S is a finite set of edges between them. The edge weights are defined by the cost function c : E → (0, ∞), with c(s1, s2) being the travel cost of the edge e = (s1, s2). s0 ∈ S is the start state, sg ∈ S is the goal state, and h0 : S → [0, ∞) is the initial heuristic function. We assume that h0(sg) = 0 and that sg is reachable from any state.

In the following, the out-edges of the agent's current state are called moves or actions. Their number (i.e., the out-degree of a state) is called the branching factor of the state.

Definition 3.2 A solution to a search problem instance is a path from the start state s0 to the goal state sg. The path is denoted by (s0, s1, . . . , sg) where each si is a valid state and each pair of states (si, si+1) is a valid edge.

We assume the standard settings of agent-centered search (Koenig, 2001):

1. An early version of our algorithm was first published in (Bulitko, Sturtevant, & Kazakevich, 2005). That account lacked theoretical analysis and presented only a pilot empirical study.
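To make Definition 3.1 concrete, a search problem instance can be represented as follows. This is a minimal Python sketch for illustration only; the class and field names are ours, not the paper's implementation:

    from dataclasses import dataclass
    from typing import Callable, Dict, Hashable

    State = Hashable

    @dataclass
    class SearchProblem:
        # edges[s] maps each neighbor s' to the positive travel cost c(s, s')
        edges: Dict[State, Dict[State, float]]
        start: State   # s0
        goal: State    # sg, assumed reachable from every state
        h0: Callable[[State], float]  # initial heuristic; h0(goal) == 0

        def moves(self, s: State) -> Dict[State, float]:
            """Out-edges of s: the agent's available actions and their costs."""
            return self.edges.get(s, {})

The branching factor of a state s is then simply len(problem.moves(s)).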


Definition 3.3 At all times, a search agent occupies a single search state sc ∈ S called the current state. The agent can change its current state only by executing actions, thus incurring travel cost. Initially, the current state coincides with the start state s0. The agent is said to succeed when it makes its current state coincide with the goal state sg.

To illustrate: in the domain of goal-directed navigation, a single search problem is defined by a terrain map, a start location (e.g., the agent's current position), and a goal location (e.g., a place on the map that a user is ordering the agent to go to). In this paper, we evaluate all algorithms on a set of search problems. Each problem in a set is generated by selecting a map and then randomly selecting a start state s0 and a goal state sg. This setup reflects the common applications where an agent can be tasked to go from any map location to any other map location.

Assumption 1 We assume that (i) for every action in every state there exists a reverse action (possibly with a different cost) and (ii) every applicable action leads to another state (i.e., there are no self-loops) (cf. Shue & Zamani, 1993; Shimbo & Ishida, 2003).

Assumption (i) is needed for the backtracking mechanism used in SLA* (Shue & Zamani, 1993). It holds in many combinatorial and pathfinding problems. Its applicability to domains where reversing an action may require a non-trivial sequence of actions is a subject of future research. Assumption (ii) is adopted only for the sake of proof simplicity; the results presented in this paper hold even if self-loops are present. In line with most literature in the field of real-time search, we assume that the environment is stationary and deterministic. Extension of our algorithms to dynamic and/or stochastic domains is a subject of current research.

Definition 3.4 The travel cost of traveling from state s1 to state s2, denoted by dist(s1, s2), is defined as the cost of the shortest path from s1 to s2. Throughout the paper, we assume that dist satisfies the standard triangle inequality: ∀s1, s2, s3 ∈ S [dist(s1, s3) ≤ dist(s1, s2) + dist(s2, s3)]. Then, for any state s, its perfect heuristic h*(s) is defined as the minimal travel cost to the goal: h*(s) = dist(s, sg). A heuristic approximation h to the true travel cost is called admissible if and only if ∀s ∈ S [h(s) ≤ h*(s)]. The value of h in state s is referred to as the heuristic value of state s. We assume that h(sg) = 0, which trivially holds for an admissible h.

Definition 3.5 A trial is defined as the agent's problem-solving experience while traveling from its start state to the goal state. Once the goal state is reached, the agent is reset to the start state and the next trial begins. On each trial the agent can update its heuristic function and its internal representation of the search graph as well as any additional data structures it chooses to maintain (e.g., an abstraction hierarchy). A convergence process is defined as the first sequence of trials after which the agent no longer updates its heuristic function or its internal representation of the search problem. The first trial to lack such updates is called the final trial and the learning process is said to have converged. Note that in the case of systematic or fixed tie-breaking (e.g., Furcy & Koenig, 2000), the final trial will be identical to all trials that follow it.
Intuitively, convergence means that the agent's solution has become as good as it will ever get for the given search problem instance. It does not mean that the entire graph has been explored or that the learned heuristic is perfect for all states. In our experiments we break all ties between moves in a systematic fashion, which entails that the agent's behavior is identical on all trials after the final trial.


Definition 3.6 Convergence travel is the cumulative cost of all edges traversed by the agent during convergence. Convergence planning is the amount of all planning effort expended by the agent during convergence. First-move lag is the amount of planning incurred by the agent on the first move of its final trial. Convergence memory is measured as the total number of heuristic values stored during the convergence process.2 Finally, suboptimality is defined in percentage points as the excess of the final-trial solution length relative to the shortest-path cost (i.e., h*(s0)).

Note that we measure the first-move lag at the end of a convergence process (i.e., on its final trial). This is done because we would like the first-move lag to be strongly correlated with the physical time it takes the agent to generate its first action. During the convergence process, the agent is repairing its internal search graph representation as well as updating its heuristic function. These processes strongly influence the delay time in an implementation- and platform-specific fashion. Once the convergence process is over, these transient influences are eliminated and the amount of planning effort for the first move is indicative of the agent's responsiveness on all subsequent trials on the search problem. This approach also makes our data consistent with previously published results (Bulitko et al., 2005; Bulitko & Lee, 2006).

In line with previous literature (e.g., Holte et al., 1996), we do not measure planning effort in terms of physical CPU time. This is done for several reasons. First, due to the massive scale of our experiments (over 1.7 CPU-years), we used a grid of diverse computers, which would have made CPU timings incompatible. Second, physical times are highly sensitive to the efficiency of the algorithm implementation. Not only were our implementations designed for flexibility rather than efficiency, but we also could not ensure the same level of code optimization across different algorithms. Instead of CPU times, we report convergence planning and first-move lag in terms of the number of states the algorithm "touched" (i.e., considered) during planning. This measure is called edges traversed in (Holte et al., 1996, p. 325) and is related to another common measure: the number of states expanded.

Solving a single search problem instance is of little interest in evaluating the scalability of an algorithm or its real-time properties. Instead, we evaluate search algorithms on a series of problems of increasing size. Consequently, we define a parameterized search problem as follows:

Definition 3.7 A heuristic search problem is defined as a series of search problem instances (Definition 3.1) where the number of states is the search problem parameter (n). Formally, it is {(G^n, c^n, s^n_0, s^n_g, h^n_0) | n ∈ N} where G^n = (S^n, E^n) and |S^n| = n. We assume that the maximum degree of any state is constant-bounded and independent of n. Consequently, |E^n| = O(n).

In the agent-centered search framework an agent interleaves planning and execution. It runs a search algorithm from its current state and then traverses one or more graph edges to change its current state. Thus, an important question is the amount of planning time/effort taken between edge traversals.

Definition 3.8 A search algorithm exhibits real-time performance on a heuristic search problem (Definition 3.7) if and only if its planning effort per move is constant-bounded and the constant is independent of the problem parameter n.
2. The standard practice in the real-time heuristic search literature (e.g., Korf, 1990; Shimbo & Ishida, 2003) is to store the heuristic values in a hash table. Hash table misses are handled by a procedurally specified initial heuristic h0 (e.g., the Manhattan distance). Then convergence memory is the size of the hash table after convergence.


4. Application: Goal-directed Navigation

One of the motivating applications of heuristic search is goal-directed navigation (also known as pathfinding). It is a special case of the heuristic search problem formalized in the previous section where the search graph (S, E) is defined by a terrain map. Thus, the states/vertices correspond to geographical positions on a map, the edges describe passability or blocking, and the cost function represents the difficulty/time of traversing the terrain. We call pathfinding real-time if the planning time per move is constant-bounded regardless of the terrain size (Definition 3.8).

Real-time pathfinding is motivated primarily by time-sensitive robotics (e.g., Koenig & Simmons, 1998; Koenig, 1999; Kitano et al., 1999; Koenig, Tovey, & Smirnov, 2003) and computer games. The latter include real-time strategy games (Blizzard Entertainment, 2002), first-person shooters (id Software, 1993), and role-playing games (BioWare Corp., 1998). In all of these, time plays a critical role since the pathfinding agent is situated in an asynchronous dynamic environment that changes while the agent is planning. Moreover, a large number of agents frequently perform pathfinding simultaneously. As a result, in games such as "Age of Empires II" (Ensemble Studios, 1999), as much as 60-70% of simulation time is spent in pathfinding (Pottinger, 2000).

In this paper, we follow the footsteps of Furcy and Koenig (2000), Shimbo and Ishida (2003), Koenig (2004), Botea et al. (2004), Hernández and Meseguer (2005a, 2005b), Sigmundarson and Björnsson (2006), and Koenig and Likhachev (2006) and situate our empirical study in navigation on two-dimensional rectangular grid-based maps. The cells are rectangular and each cell is connected to four cardinally and four diagonally neighboring cells. Each cell can be occupied by an agent (i.e., free) or by a wall (i.e., blocked). Each cell has an elevation value. The agent is unable to travel from a cell to its free neighbor if the difference in elevation exceeds a certain threshold. In computer games, this is done to simulate impassability due to cliffs.

Each grid cell that can be occupied by the agent constitutes a vertex/state in the search space S. For any two free neighboring cells s1 and s2, if the agent can travel from s1 to s2, the edge (s1, s2) is added to the set E. In this paper, we set the edge costs to 1 for cardinal moves (i.e., west, north, east, south) and to √2 for diagonal moves. The cell initially occupied by the agent is s0, the target cell is sg. An example of converting a grid-based map to a search problem defined by (G, c, s0, sg, h0) is shown in Figure 1. Note that we do not allow diagonal moves that "cut corners" and, thus, the state s6 is not connected to states s1, s5, s7, sg. This is done because a non-zero size agent would not be able to pass through the zero-width bottleneck formed by two diagonally adjacent blocked cells. In the case when there is only one corner (e.g., between states s5 and s6 in Figure 1), allowing the agent to cut it would lead to the actual travel distance exceeding √2, as a non-zero width agent would have to walk around the corner.
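The grid-to-graph conversion just described can be sketched as follows. This is an illustrative Python fragment under our own naming (the paper's actual implementation lives in the Hierarchical Open Graph framework); elevation thresholds are omitted for brevity:

    import math

    CARDINAL = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    DIAGONAL = [(-1, -1), (-1, 1), (1, -1), (1, 1)]

    def grid_to_edges(free):
        """free: set of (x, y) cells the agent may occupy.
        Returns {(cell, neighbor): cost} with cost 1 for cardinal moves and
        sqrt(2) for diagonal moves, disallowing corner-cutting diagonals."""
        edges = {}
        for (x, y) in free:
            for dx, dy in CARDINAL:
                n = (x + dx, y + dy)
                if n in free:
                    edges[((x, y), n)] = 1.0
            for dx, dy in DIAGONAL:
                n = (x + dx, y + dy)
                # both flanking cardinal cells must be free: no corner cutting
                if n in free and (x + dx, y) in free and (x, y + dy) in free:
                    edges[((x, y), n)] = math.sqrt(2)
        return edges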

5. Search Graph Discovery

An important attribute of a search problem (G, c, s0, sg, h0) is whether it is known to a search agent a priori. In this paper, we do not require a search agent to know the problem in its entirety. We assume that a portion of the search problem (e.g., the immediate neighbors of the current state sc) is sensed by the agent at each time step and that the agent can remember the parts of the problem sensed so far. More specifically, at all times the agent has a belief of what its search problem is like. This is formalized as follows:


Figure 1: A 3×4 grid-based map (left) converted to a 9-state search graph (right). Thinner cardinal-direction edges have a cost of 1; thicker diagonal edges have a cost of √2.

Definition 5.1 An agent has an internal representation or a model of the search problem it is solving. That is, at each time moment t it believes that the search problem is (G|t, c|t, s0, sg, h0), which may or may not be the same as the actual search problem (G, c, s0, sg, h0).

Definition 5.2 Graph discovery is the process of updating the agent's model of the search graph to make it consistent with the agent's observations as it explores its environment. An update at time t can be represented as:

S|t+1 = (S|t ∪ Sadd|t) \ Sdelete|t
E|t+1 = (E|t ∪ Eadd|t) \ Edelete|t

where S|t, S|t+1 are the sets of vertices before/after the update respectively, and Sadd|t, Sdelete|t are the sets of vertices to add/delete respectively. A similar notation is used for the set of edges E. The cost function is updated from c|t to c|t+1.

We will now illustrate search graph discovery in the goal-directed navigation domain. The terrain map is initially unknown to the agent. As the agent moves around the environment, grid cells whose coordinates are within a fixed visibility radius of the agent's current position are revealed. Formally, the agent situated in cell (x, y) can check the status (free/blocked) of any cell (x′, y′) if |x − x′| ≤ r and |y − y′| ≤ r, where r ∈ N is the visibility radius. Additionally, for any two visible cells (x′, y′) and (x″, y″), the agent can tell if there is an edge between them and what the edge's cost is. This is similar to the virtual sensors advocated by Thalmann, Noser, and Huang (1997). Such a discovery of the map in computer games is typical for the human player and is known as the "fog of war" (Figure 2). Likewise, in robotic navigation, it is usually assumed that the robot's sensor range is limited.

In software domains, theoretically the agent can have access to its entire environment. Several types of arguments have been made to justify restricting the agent's senses. First, omniscient virtual humans tend to behave unrealistically and, thus, are less suitable for virtual reality trainers (Dini, van Lent, Carpenter, & Iyer, 2006). In commercial games, revealing the entire map to an AI player when, at the same time, it is only partially revealed to a human player is viewed as cheating and can be exploited. Second, it can be computationally expensive to sense (Orkin, 2006) and reason about an entire environment (Thalmann et al., 1997; Aylett & Luck, 2000). Consequently, localized sensing is used in large-scale multi-unit systems (Reynolds, 1987).
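Under these definitions, one sensing step with visibility radius r reduces to the following update. This is a hedged Python sketch; the set-based map model and function name are our simplification, not the paper's code:

    def sense(model_blocked, true_blocked, agent, r):
        """Reveal all cells within visibility radius r of the agent (Chebyshev
        distance, matching |x - x'| <= r and |y - y'| <= r) and record any
        newly observed obstacles in the agent's model (Definition 5.2)."""
        x, y = agent
        discovered = set()
        for cx in range(x - r, x + r + 1):
            for cy in range(y - r, y + r + 1):
                if (cx, cy) in true_blocked and (cx, cy) not in model_blocked:
                    discovered.add((cx, cy))
        model_blocked |= discovered
        return discovered  # edges incident to these cells must now be removed

Under the free space assumption discussed below, any cell absent from model_blocked is treated as passable.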


Figure 2: Fog of war: unexplored areas of the map are shown in black (Westwood Studios, 1992).

An important issue is the assumptions an agent makes about the unknown part of the map. In this paper, we adopt the "free space assumption" (Koenig et al., 2003) and optimistically assume that unknown grid areas are free and passable. This is consistent with previous research that assumes initially unknown maps (Koenig, 2004; Bulitko et al., 2005; Bulitko & Lee, 2006; Koenig & Likhachev, 2006).

Another important question is whether goal-directed navigation lends itself to the trial-based learning described in Section 3. It clearly applies when a unit is shuttling (e.g., traveling repeatedly between a resource pile and a base to gather resources) or patrolling. Interestingly, even one-time navigation tasks provide learning opportunities, as on a single trial a real-time heuristic search agent usually improves heuristic values for many states. These values can be later reused in any other navigation task whose destination maps to the same region or, as Section 7.1 introduces, an abstract state.

It is clear that the map discovery process described above is a particular case of map learning. Namely, we assume that the map is static and the only updates to the agent's initial model are due to terrain impassabilities unknown a priori. We will now generalize this special case of graph discovery to arbitrary search graphs.

Definition 5.3 Obstacle discovery is a special case of graph discovery where the sets Sadd|t and Eadd|t are empty for all t. In other words, the only possible updates are edge and vertex removals. All cost function updates c|t are assumed to be for removed edges and vertices. For instance, the cost of a non-removed edge cannot change over time.

6. Related Research

Existing heuristic search methods for situated agents can be divided into two categories: complete search and real-time search. Algorithms from the former group form an entire solution given their current knowledge of the search graph. They use the initial heuristic h0 and learn only the search graph through exploration of the environment. On the contrary, real-time search algorithms plan only a small segment (frequently just the first move) of their solution and execute it as soon as it becomes available. It is generally true that, due


to the local nature of planning, they need to update the heuristic function in order to avoid getting stuck in a local minimum of the heuristic (Korf, 1990).

6.1 Agent-centered Complete Search

A widely used complete search algorithm is a version of A* (Hart, Nilsson, & Raphael, 1968) called Local Repair A* (Stout, 1996). In it, a complete search is conducted from the agent's current state to the goal state under the free space assumption. The agent then executes the computed path until either the destination is reached or the path becomes invalid (e.g., a previously unknown wall blocks the way). In the latter case, the agent replans from its current position to the goal.

Local Repair A* suffers from two problems. First, it searches for a shortest solution and, for a general search problem, may end up expanding a number of states exponential in the solution length due to inaccuracies in the heuristic function (Pearl, 1984). Second, re-planning episodes do not use any information from the previous search and run A* from scratch.

The first problem is addressed by suboptimal versions of A*, which are frequently implemented via weighting the heuristic function (Pohl, 1970, 1973). Such a weighted A* (WA*) usually finds a longer solution in less time. Once a suboptimal solution is found, it can be improved upon by conducting additional searches. This can be done by continuing to work from the open list of the weighted A* (Hansen, Zilberstein, & Danilchenko, 1997) or by re-running A* searches in a tunnel induced by the suboptimal path at hand (Furcy, 2006). The latter algorithm can also use beam search with backtracking instead of weighted A* (Furcy & Koenig, 2005). The second problem is addressed by incremental search methods such as D* (Stentz, 1995), D* Lite (Koenig & Likhachev, 2002), and LPA* (Koenig & Likhachev, 2001). These algorithms reuse some information from the previous search, thus speeding up subsequent replanning episodes. Finally, A* search can be run on a smaller abstract graph. The resulting abstract solution can be fully or partially refined into a ground path (Botea et al., 2004; Sturtevant & Buro, 2005).

Despite substantial improvements made in complete search methods since the original A*, a full (abstract) path has to be computed before the first move can be executed by the agent. Consequently, the delay before the first move is not constant-bounded and increases with the problem size. Thus, agent-centered complete search is not real-time in the sense of Definition 3.8.

6.2 Learning Real-time Search

Since the seminal work on LRTA* (Korf, 1990), research in the field of learning real-time heuristic search has flourished, resulting in over twenty algorithms with countless variations. Most of them can be described along the following four attributes:

1. The local search space is the set of states whose heuristic values are accessed in the planning stage. The two popular choices are full-width limited-depth lookahead (Korf, 1990; Shimbo & Ishida, 2003; Shue & Zamani, 1993; Shue, Li, & Zamani, 2001; Furcy & Koenig, 2000; Hernández & Meseguer, 2005a, 2005b; Sigmundarson & Björnsson, 2006; Rayner, Davison, Bulitko, Anderson, & Lu, 2007) and A*-shaped local search spaces (Koenig, 2004; Koenig & Likhachev, 2006). Additional choices are decision-theoretic shaping (Russell & Wefald, 1991) and dynamic lookahead depth-selection (Bulitko, 2004; Luštrek & Bulitko, 2006).

2. The local learning space consists of all states whose heuristic values will be updated upon planning. The choices here range from the current state only (Korf, 1990; Shimbo & Ishida,


2003; Shue & Zamani, 1993; Shue et al., 2001; Furcy & Koenig, 2000; Bulitko, 2004) to all states within the local search space (Koenig, 2004; Koenig & Likhachev, 2006) to some of the previously visited states and their neighbors (Hernández & Meseguer, 2005a, 2005b; Sigmundarson & Björnsson, 2006; Rayner et al., 2007).

3. A learning rule is used to update the heuristic values of the states in the learning space. The common choices are dynamic programming or mini-min (Korf, 1990; Shue & Zamani, 1993; Shue et al., 2001; Hernández & Meseguer, 2005a, 2005b; Sigmundarson & Björnsson, 2006; Rayner et al., 2007), their weighted versions (Shimbo & Ishida, 2003), max of mins (Bulitko, 2004), a modified Dijkstra's algorithm (Koenig, 2004), and updates with respect to the principal variation3 (Koenig & Likhachev, 2006). Additionally, several algorithms learn more than one heuristic function (Russell & Wefald, 1991; Furcy & Koenig, 2000; Shimbo & Ishida, 2003).

4. A control strategy is responsible for producing the move after the planning and learning phases are completed. Commonly used strategies include: the first move of the principal variation (Korf, 1990; Furcy & Koenig, 2000; Hernández & Meseguer, 2005a, 2005b), the entire principal variation (Bulitko, 2004), and backtracking moves (Shue & Zamani, 1993; Shue et al., 2001; Bulitko, 2004; Sigmundarson & Björnsson, 2006).

Given the wealth of existing algorithms, unification efforts have been undertaken. In particular, Bulitko and Lee (2006) suggested a unified framework, called Learning Real Time Search (LRTS), to combine and extend LRTA* (Korf, 1990), weighted LRTA* (Shimbo & Ishida, 2003), SLA* (Shue & Zamani, 1993), SLA*T (Shue et al., 2001), and, to a large extent, γ-Trap (Bulitko, 2004). In the dimensions described above, LRTS operates as follows. It uses a full-width fixed-depth local search space with transposition tables to prune duplicate states. LRTS uses a max-of-mins learning rule to update the heuristic value of the current state (its local learning space). The control strategy executes the entire principal variation if the cumulative volume of heuristic function updates on a trial is under a user-specified quota, or backtracks to the previous state otherwise. For the reader's convenience, we walk through the LRTS pseudocode in Appendix A.

Within LRTS, the unification of several algorithms was accomplished by implementing several methods for local search space selection, the learning rule, and the control strategy. Each of these methods can be engaged at run time via user-specified parameters. The resulting parameter space contains all the original algorithms plus numerous new combinations that enable tuning performance according to a specific problem and the objective function of a particular application. As a demonstration, Bulitko et al. (2005) tuned LRTS for ten maps from the computer game "Baldur's Gate" (BioWare Corp., 1998) and achieved convergence speed two orders of magnitude faster than LRTA* while finding paths within 3% of optimal. At the same time, LRTS was about five times faster on the first move than incremental A*, as shown in Table 1. Despite these improvements, LRTS and other real-time search algorithms converge more slowly than A* and, visually, may behave unintelligently by repeatedly revisiting dead-ends and corners.
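As a concrete instance of the four attributes above, the classic LRTA* step with lookahead 1 combines a one-ply local search space, a current-state learning space, the mini-min rule, and a greedy control strategy. An illustrative Python sketch, reusing the hypothetical SearchProblem interface from Section 3:

    def lrta_star_step(problem, h, s):
        """One planning/learning/acting cycle of LRTA* (Korf, 1990), d = 1.
        h is a dict of learned heuristic values; misses fall back to h0."""
        def hv(state):
            return h.get(state, problem.h0(state))
        # local search space: the immediate neighbors of s
        f = {sp: c + hv(sp) for sp, c in problem.moves(s).items()}
        best = min(f, key=f.get)
        # learning rule (mini-min): raise h(s) to the best f-value
        h[s] = max(hv(s), f[best])
        # control strategy: execute the first (and only) move of the variation
        return best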

7. Novel Algorithm

In the previous section, we reviewed two groups of existing search algorithms. Complete search methods learn a search graph and use highly efficient complete search methods to plan an entire

3. Defined as the shortest path from the current state to the best-looking state on the frontier of the local search space.


Table 1: Incremental A*, LRTA*, LRTS averaged over 50 problems on 10 maps. The average solution length is 59.5. Timings are taken on a 2.0GHz G5 with gcc 3.3 (Bulitko et al., 2005).

Algorithm                        1st move time   Conv. travel   Suboptimality
A*                               5.01 ms         186            0.0%
LRTA* (d = 1)                    0.02 ms         25,868         0.0%
LRTS (d = 10, γ = 0.5, T = 0)    0.93 ms         555            2.07%

solution. The planning time prior to the agent's first action increases with the problem size and is not constant-bounded for a heuristic search problem (Definitions 3.7 and 3.8). In practice, such agents converge quickly but respond slowly to user commands. Real-time search algorithms guarantee a constant bound on the time per move, independent of the problem size. To accomplish this, they plan only the first few moves of the solution and update their heuristic function to avoid getting trapped in a local minimum of the heuristic. As a result, real-time agents can be made very responsive at the cost of a long learning process.

Our novel algorithm is based on the following simple idea: we combine the best of both approaches by running a real-time search algorithm on an abstract search graph. The smaller size of the abstract graph speeds up convergence. The output of the real-time search algorithm running on the abstract problem is then refined by running A* restricted to a small subset of the ground-level graph. Because A* refines only a fixed number of abstract edges, its running time is constant-bounded. As a result, the entire algorithm satisfies the real-time property (Definition 3.8).

The resulting algorithm is called Path Refinement Learning Real Time Search (PR LRTS). It exhibits real-time performance in the sense of Definition 3.8. The algorithm is not limited to navigation problems and is complete for any search problem satisfying the formal description in Section 3. It does not assume prior knowledge of the search problem and learns both the search graph and the heuristic. The learning process converges after a finite number of trials. More importantly, the learning process can be made substantially faster than that of existing real-time search algorithms by running the learning component on a smaller abstract graph.

7.1 Automatic State Abstraction

A key feature of PR LRTS is automatic search graph abstraction. We use the term 'abstraction' to mean a graph homomorphism, in line with (Holte et al., 1996). Namely, an abstraction is a many-to-one function which maps (abstracts) one or more states to a single abstract state such that adjacent vertices are mapped to adjacent or identical vertices (Property 5 below). Given such a graph homomorphism and a model of a search problem (Definition 5.1), a PR LRTS agent builds N additional abstract search graphs, collectively called the abstraction hierarchy, as follows. It first applies the graph homomorphism to the search graph of its model. The result is the abstract search graph at level 1. The process is then repeated until the abstract search graph at level N is computed.

Any homomorphic abstraction can be used as long as the resulting hierarchy of abstract graphs satisfies the following properties. Here we introduce the properties informally and illustrate them with an example; in Appendix B we describe them formally.


Property 1 Each of the abstract graphs is a search graph in the sense of Section 3.

Property 2 Each state s in the search graph at level n < N has a unique "parent" state s′ in the level n + 1 abstract search graph: parent(s) = s′.

Property 3 Each state s in the search graph at level m, for 0 < m ≤ N, has at least one "child" state s′ at level m − 1. The notation children(s) represents the set of children of state s; thus, s′ ∈ children(s).

Because abstraction is a function defined over states, a parent state can be called an image while the corresponding set of children can be called a pre-image. The term 'children' is also used to mean successor or neighbor states within a search. In this paper, we use the terms 'children', 'child', and 'parent' strictly in the sense of the pre-image and image of the abstraction mapping.

Property 4 Given a heuristic search problem in the sense of Definition 3.7, the number of children of any abstract state is upper-bounded by a constant independent of the number of states in the ground-level graph.

Property 5 (Graph homomorphism) Every edge in the search graph at level i has either a corresponding abstract edge at level i + 1 or the edge's ends abstract to a single abstract state.

Property 6 If an abstract edge exists between states s1 and s2 then there is an edge between some child of s1 and some child of s2.

Property 7 Any two children of an abstract state are connected through a path whose states are all children of the abstract state.

Property 8 The abstraction hierarchy is consistent with the agent's model of its search problem at all times. Formally, for any time t, Properties 1 through 7 are satisfied with respect to the agent's model.

In this paper, we use the clique-based abstraction mechanism published in (Sturtevant & Buro, 2005). It operates by finding fully connected components (cliques) in the search graph and mapping each (together with any adjacent degree-one states) to a single abstract state. This way of building abstractions is favored by the analysis in (Holte et al., 1996, Section 5.2), wherein the reduction of search effort due to abstraction is shown to be maximized by minimizing the edge diameter of a set of children and maximizing its size. For any clique, its edge diameter (i.e., the maximum number of edges between any two elements) is one while the number of states merged is maximized.

We present the clique-based abstraction mechanism by developing several stages of a hand-traceable example. We then illustrate how each of the formal properties introduced above is satisfied in the example. The information contained in this and the following sections allows independent researchers to re-implement our algorithm and reproduce the experimental setup we use in this paper, up to minor implementation details. Clique-based abstraction for grid-based maps satisfies all aforementioned properties. The proof is, however, lengthy and beyond the scope of this paper. This is justified by the fact that the focus of this paper is the novel algorithm, which can operate with a broad class of abstraction mechanisms, not just the clique-based abstraction.
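A greatly simplified version of one abstraction pass is sketched below. This is illustrative Python only: it greedily grows mutually adjacent groups rather than reproducing the exact mechanism of Sturtevant and Buro (2005), and it omits the degree-one special case discussed next:

    def abstract_once(edges):
        """One abstraction level: map mutually adjacent groups of states to
        abstract states and lift the surviving edges (Properties 2 and 5)."""
        adj = {}
        for (u, v) in edges:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
        parent, next_id = {}, 0
        for s in adj:
            if s in parent:
                continue
            group = [s]
            for n in adj[s]:
                # n joins only if unassigned and adjacent to every member
                if n not in parent and all(g in adj[n] for g in group):
                    group.append(n)
            for g in group:
                parent[g] = next_id
            next_id += 1
        abstract_edges = {(parent[u], parent[v])
                          for (u, v) in edges if parent[u] != parent[v]}
        return parent, abstract_edges

Repeated calls on the returned abstract edge set build the hierarchy level by level.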


Figure 3: Clique abstraction: the original search graph from Figure 1 (left) is abstracted into the search graph on the right. Clique {s0, s1, s2, s4} becomes abstract state s′1. Clique {s3, s5} becomes s′2, clique {s7, s8} together with the degree-1 state sg becomes s′3, and clique {s6} becomes s′4. In the grid-based navigation domain, the costs of the abstract edges can be set as the Euclidean distance between the average coordinates of the children.

Figure 4: Three iterations of the clique abstraction procedure.

A single application of the abstraction procedure is illustrated in Figure 3. Note that degree-one states (e.g., sg in the figure) join their neighboring clique ({s7, s8} in the figure). This reduces the number of resulting abstract states but increases the edge diameter of the set of children (it becomes 2 for the set {s7, s8, sg}). This is a minor detail of our abstraction mechanism that happens to be effective in the pathfinding domain we use. One can use "pure" clique abstraction as well. The abstraction process can be applied successively until a single abstract state remains for each connected component of the original graph (Figure 4).

We can now illustrate the abstraction properties with Figure 4. Property 1 requires that each of the four abstraction levels in Figure 4 is a search graph in the sense of Definition 3.1 in Section 3. Property 2 requires each state at levels 0,


Figure 5: Coordinates and edge costs for all levels of the abstraction hierarchy. At the grid level (leftmost illustration) vertex coordinates are given through column/row labels (e.g., state s8 has coordinates (1, 0)). Ground edges have cost 1 (thin cardinal edges) and cost √2 (thick diagonal edges). At all abstract levels, coordinates are given as (x, y) next to a vertex and edges are labeled with their approximate cost.

1, and 2 to have a unique parent at the level above. Property 3 requires that each state at levels 1, 2, and 3 has a non-empty set of children at the level below. Property 4 places an upper bound on the number of children each abstract state can have; in this example the bound is 4. Properties 5 and 6 require that for any two adjacent states their parents, and some of their children, are connected. Consider, for instance, abstract states s′2 and s′3. They are connected at level 1 as there is an abstract path p = (s′2, s′1, s′4, s′3). Any child of s′2 is connected to any child of s′3 at level 0. For instance, s3 is connected to s7. Furthermore, Property 7 requires that at least one path between s3 and s7 lie within the corridor formed by the children of the states on path p. Path (s3, s4, s6, s8, s7) satisfies the requirement.

Note that the costs of abstract edges (e.g., edge (s′1, s′2) in Figure 3) can be defined in an arbitrary way as long as the resulting search graph satisfies the properties in Section 3. In order to achieve better performance, a low-cost abstract path should be an abstraction of a low-cost ground path. In this paper we experiment with grid-based navigation and, correspondingly, define the cost of edge (s′1, s′2) as the Euclidean distance between the average coordinates of children(s′1) and children(s′2). If the children of the abstract state s′1 happen to be abstract states themselves then their coordinates are the averages of their children's coordinates, and so on. The process is illustrated in Figure 5.

In practice, Property 8 is satisfied by repairing the agent's abstraction hierarchy upon updates to the agent's model. To illustrate, imagine the agent just discovered that discrepancies between the terrain elevation in states s4 and s6 (Figure 3) prevent it from traversing the edge (s4, s6). It will then update its model by removing the edge. Additionally, the degree-one state s6 will join the clique {s7, s8}. At this point, the agent's abstraction hierarchy needs to be repaired. This is accomplished by replacing abstract states s′4 and s′3 with a single abstract state s′5. The edges (s′1, s′4) and (s′4, s′3) will be removed. If more than one level of abstraction is used then the repair has to be propagated to the higher levels as well.


We will prove in Section 8 that the PR LRTS algorithm can operate with any abstraction mechanism that satisfies the properties listed above. In this paper, we do not analyze the time or space complexity of building an abstraction hierarchy, for the following reasons. First, a good choice of abstraction mechanism is likely to be domain-specific; the focus of the paper is the novel algorithm (PR LRTS) rather than clique-based abstraction. Second, applications such as pathfinding in computer games frequently have a "native" domain-tuned abstraction mechanism. For instance, a game map may be represented as a collection of areas, each area consisting of rooms, each room consisting of models, each model being a navigational mesh. In such a case, PR LRTS is likely to be used with such a "native" abstraction mechanism as opposed to a generic abstraction (e.g., clique merging) in order to reduce or eliminate the additional memory needed to build an abstraction. Finally, we will analyze the computational complexity of building and repairing abstractions in a companion paper focused on abstraction.

7.2 PR LRTS: Algorithm Structure

PR LRTS computes paths at several levels of abstraction. Intuitively, search in a higher, and thus smaller, abstract search space yields a path quickly. Such an abstract path defines a subset of the lower-level search space in which to search and refine the abstract path. This refinement proceeds incrementally until the lowest (i.e., ground) search space is reached and a ground path is produced.

Technically, an agent using PR LRTS operates at N + 1, for N ≥ 1, levels of abstraction. Each level is "populated" with either A*, LRTS, or is a "pass-through" level with no algorithm assigned to it. Note that a path computed at level i may be a complete path from the agent's current state to the goal state or may be only a partial path. The algorithm operates recursively as presented in Figure 6. At every decision-making point, an agent running the algorithm invokes the function PR LRTS and the path it returns is executed by the agent. The function takes two parameters: the current state sc and the destination state sd. In line 1 it checks if the states passed to it are above the top level of abstraction on which pathfinding is to occur. If so, an empty path is returned (line 2).

path PR LRTS(state sc, state sd)
1    if level(sc) > N then
2      return ∅
3    end if
4    p = PR LRTS(parent(sc), parent(sd))
5    if p ≠ ∅ then
6      sd = child(end(p))
7    end if
8    C = {s | parent(s) ∈ p}
9    switch (algorithm[level(sc)])
10     A* : return A*(sc, sd, C)
11     LRTS : return LRTS(sc, sd, C)
12     pass-through : return p
13   end switch

Figure 6: Path refinement learning real-time search (PR LRTS) [simplified].

Otherwise, the function calls itself recursively to path-plan between the parent state of sc (computed as parent(sc) in line 4) and the parent state of sd. The returned path (if non-empty as


checked in line 5) is used to derive the new destination in line 6. Specifically, the new destination sd is a child of the end of the abstract path p. In line 8, we compute the corridor C comprised of the children of the states on path p. The corridor C will be empty if the path p computed in line 4 was empty. Finally, we run the algorithm assigned to our current level of abstraction (i.e., the level of sc and sd) in lines 10 and 11. Note that A* or LRTS is tasked with finding either a complete (in the case of A*) or a partial (in the case of LRTS) path from sc to sd limited to the set of states C. By convention, an empty corridor (C = ∅) allows A*/LRTS to search its entire graph. Also, note that no processing happens at a pass-through level (line 12).

Our implementation of A* is standard (Hart et al., 1968) except that it is run (line 10) on the subgraph defined by the corridor C (line 8). Our implementation of LRTS is taken from (Bulitko & Lee, 2006) and is likewise restricted to the corridor. For the reader's convenience, we review LRTS in Appendix A. This is a simplified description of the PR LRTS algorithm demonstrating the main ideas. We present more detailed pseudocode and several implementation details (e.g., the selection of a specific child sd in line 6) in Appendix C.
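For lines 8-10 of Figure 6, the corridor-restricted A* can be sketched as follows. This is illustrative Python with our own helper names, assuming the SearchProblem interface from Section 3 and an admissible heuristic h; it is not the paper's implementation:

    import heapq

    def astar_in_corridor(problem, h, start, goal, corridor):
        """Standard A* (Hart et al., 1968) that never expands states outside
        `corridor`; an empty corridor lifts the restriction (line 10)."""
        allowed = (lambda s: True) if not corridor else corridor.__contains__
        g_cost = {start: 0.0}
        frontier = [(h(start), 0.0, start, None)]  # (f, g, state, predecessor)
        came_from = {}
        while frontier:
            f, g, s, prev = heapq.heappop(frontier)
            if g > g_cost.get(s, float('inf')):
                continue  # stale queue entry
            came_from[s] = prev
            if s == goal:  # reconstruct the path by walking predecessors
                path = [s]
                while came_from[path[-1]] is not None:
                    path.append(came_from[path[-1]])
                return path[::-1]
            for sp, c in problem.moves(s).items():
                if allowed(sp) and g + c < g_cost.get(sp, float('inf')):
                    g_cost[sp] = g + c
                    heapq.heappush(frontier, (g + c + h(sp), g + c, sp, s))
        return None  # no path inside the corridor

Because the corridor has a constant-bounded number of states (Property 4 plus the bounded abstract path length), each such call takes constant-bounded time.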

Figure 7: The path refinement process. The original graph (level 0) is shown at the bottom. The abstract graph (level 1) is shown at the top.

For illustration purposes, consider the micro-example in Figure 7. In the example, N = 1 and only one level of abstraction (shown at the top) is used in addition to the ground level (shown at the bottom). sc is the current state while sd is the destination state. LRTS is assigned to level 1 while A* is assigned to level 0. Subfigure (i) shows the ground state space below and one level of abstraction above. The agent must plan a path from sc to sd, both located at the ground level. First, the abstract parents of sc and sd, parent(sc) = s′c and parent(sd) = s′d, are located. Then LRTS with d = 3 plans three steps in the abstract space (ii). A corridor C at the ground level, comprised of the children of the abstract path, is then built (iii).4 A child representing the end of the abstract path is set as the new destination sd (iv). Finally, A* is run within the corridor to find a path from sc to the new destination sd (v). If the goal is not reached, this procedure is repeated with the current location as the new starting state.

4. For the sake of clarity, we show a simplified rectangle-shaped corridor in the figure. In reality, only ground states whose parents are on the abstract path would be included in the corridor.


7.3 Running PR LRTS: Procedure and Conjectures

Each trial starts by executing PR LRTS(s0, sg) where s0 is the start state and sg is the goal state. Upon executing the path returned by the function, a new current state sc is reached. Then PR LRTS(sc, sg) is called, and so on, until the goal state is reached (i.e., sc = sg). While executing a path, new areas of the search graph may be revealed. This causes updates to the abstraction hierarchy the agent maintains. PR LRTS clears and recomputes its abstract paths upon discovering new areas of the search graph, as they may have become invalid. Also, if a ground path proves invalid (e.g., runs into a newly discovered obstacle), the execution stops and PR LRTS replans from the current state using the updated abstraction hierarchy.

We conjecture that state abstraction delivers the following three benefits. First, algorithms running at the higher levels of abstraction reason over a much smaller (abstracted) search space. Consequently, LRTS convergence travel can be substantially reduced. While the exact amount of speed-up depends on the search space topology, the initial heuristic, the visibility radius, and the LRTS parameters, consider the following simple example. Imagine using LRTA* for navigation on a uniform rectangular grid. Korf (1990) showed that, in the worst case, convergence travel is O(L³) where L is the solution length (in the number of edges). On uniform grids, a single application of the clique-based abstraction reduces the number of states by approximately a factor of four. Consequently, LRTA* running at abstraction level 2 (i.e., in an abstract space 16 times smaller than the initial map) may converge 16³ times, or roughly three orders of magnitude, faster.

Second, when LRTS learns at a higher abstraction level, it maintains the heuristic at that level. Thus, a single update to the heuristic function effectively influences the agent's behavior on a whole set of ground-level states. Such generalization via state abstraction reduces the convergence time. Third, algorithms operating at the lower levels are restricted to a corridor. This focuses their operation on the more promising areas of the state space and speeds up search (in the case of A*) and convergence (in the case of LRTS).
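In symbols, the conjectured speed-up restates as follows, assuming (as the example above does) that the worst-case bound applies at the abstract level and that the abstract solution length shrinks in proportion to the state count:

    % Worst-case convergence travel (Korf, 1990) for solution length L:
    %   T(L) = O(L^3).
    % Two levels of clique abstraction shrink the space roughly 16-fold, so
    \[
      \frac{T(L)}{T(L/16)} = \frac{L^3}{(L/16)^3} = 16^3 = 4096,
    \]
    % i.e., roughly three orders of magnitude, matching the claim above.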

8. Theoretical Analysis

In order to describe this family of algorithms concisely, we henceforth use the following notation: [A0, A1, . . . , AN] denotes a configuration of PR LRTS with algorithm Ai assigned to level i. The levels range from 0 (the original search graph) to N (the top level of abstraction). Each algorithm Ai is one of the following: A*, LRTS with parameters d, γ, T, or a pass-through (its slot is left blank). For instance, PR LRTS denoted by [A*] is the original A* algorithm, [LRTS(d = 1, γ = 1, T = ∞)] is Korf's original LRTA* with a lookahead of 1, and [A*, , LRTS(5, 0.1, 10)] denotes LRTS with a lookahead of 5, optimality weight of 0.1, and backtracking control parameter of 10 running at abstraction level 2. Its solution is passed through abstraction level 1 and refined at the ground level by A* running in a corridor.

In the rest of the paper we will also use the subclass of PR LRTS algorithms with N levels of abstraction, A* at the ground level, LRTS at the top level, and N − 1 pass-through levels in the middle. This is denoted by [A*, ⋆, LRTS(d, γ, T)] where the wildcard ⋆ represents the N − 1 pass-through levels (N ≥ 1).


8.1 Subsumption

Since the assignment of algorithms to levels of abstraction is arbitrary, PR LRTS subsumes several known algorithms when no abstraction is used. Indeed, PR LRTS denoted by [A*] behaves identically to A* (Hart et al., 1968), while the configuration [LRTS(d, γ, T)] behaves identically to LRTS (Bulitko & Lee, 2006). The latter algorithm itself subsumes and generalizes several real-time search algorithms including LRTA* (Korf, 1990), weighted LRTA* (Shimbo & Ishida, 2003), γ-Trap (Bulitko, 2004), and SLA*/SLA*T (Shue & Zamani, 1993; Shue et al., 2001).

8.2 Real-time Operation

As mentioned earlier, PR LRTS with LRTS at an abstract level and A* at the ground level is a real-time search algorithm in the sense of Definition 3.8. More formally, in Appendix D we prove the following theorem:

Theorem 8.1 For any heuristic search problem (Definition 3.7), the amount of planning per action of [A*, ⋆, LRTS(d, γ, T)] is constant-bounded. The constant depends on the control parameters d, γ, T but is independent of the problem's number of states.

8.3 Completeness

An important property of a search algorithm is guaranteed arrival at the goal state on every trial. We prove this property for PR LRTS of the form [A*, ⋆, LRTS(d, γ, T)]. For this proof, we place an important assumption on the graph discovery process.

Recall that PR LRTS of the form [A*, ⋆, LRTS(d, γ, T)] uses the LRTS algorithm to build an abstract path at level N. It then uses a corridor-restricted A* at the ground level to refine the abstract path into a sequence of ground-level edges. Due to Property 7, A* will always be able to find a path between the ground-level states sc and sd that lie within the corridor C by the time execution gets to line 9 in Figure 6. Due to the exploration process, the agent's model of the search graph may differ from what the graph actually is in reality. Consequently, the path found by A* may contain a ground-level edge that the agent believes to exist but that in reality does not. The following lemma demonstrates that such an execution failure is possible only a finite number of times for a given search graph (all proofs are found in Appendix D).

Lemma 8.1 In the case of [A*, ⋆, LRTS(d, γ, T)] doing obstacle discovery, there will be only a finite number of path execution failures on each trial.

A direct corollary to this lemma is the fact that on any trial, there will be a moment of time after which no graph discoveries are made on that trial. Thereafter, executing A*'s path will indeed allow the agent to follow its abstract path on the actual map.

Lemma 8.2 LRTS is complete on an abstract graph.

The two lemmas lead directly to the following statement.

Theorem 8.2 [A*, ⋆, LRTS(d, γ, T)] doing obstacle discovery is complete. Thus, on every trial PR LRTS will find a path to the goal.


8.4 Convergence

Theorem 8.3 Assuming systematic or fixed tie-breaking, [A*, ⋆, LRTS(d, γ, T)] doing obstacle discovery converges to its final solution after a finite number of trials. It makes no updates to its graph or the heuristic on the final trial and follows the same path on all subsequent trials.

9. Empirical Study

The primary contribution of this paper is an evaluation of the effects that state abstraction has on learning real-time heuristic search. This empirical study is presented in four parts. In Section 9.1, we introduce the concept of trading off antagonistic measures and demonstrate that PR LRTS makes such trade-offs efficiently. This is due to the use of abstraction and, consequently, we investigate the effects of abstraction independently of the LRTS control parameters in Section 9.2. We then investigate how PR LRTS performance scales with problem size (Section 9.3). Finally, we study the interplay between the effects of abstraction and the LRTS control parameters. This is the most domain-specific part and, for that reason, we delegate it to Appendix H.

The experimental setup is as follows. We used 3000 problems randomly generated over three maps modeled after environments from a role-playing game (BioWare Corp., 1998) and three maps modeled after combat zones from a real-time strategy game (Blizzard Entertainment, 2002). The six maps had 5672, 5852, 6176, 7629, 9749, and 18841 states on grids from 139 × 148 to 193 × 193. The maps we used are found in Appendix E. The 3000 problems were uniformly distributed across five buckets by the length of the optimal path between the start and the goal states. Thus, the first 600 problems had an optimal path length in the [50, 60] range, the next 600 problems fell into the [60, 70] bucket,5 and so on until the last 600 problems, which were in the [90, 100] bucket.

We ran 128 primary configurations of the PR LRTS algorithm. The configurations were of the form [A*, ⋆, LRTS(d, γ, T)] where ⋆ represents N − 1 pass-through levels (N ≥ 1). We also include the no-abstraction case of [LRTS(d, γ, T)] in our study (denoted by N = 0). The control parameters for LRTS were: abstraction level N ∈ {0, 1, 2, 3, 4}, lookahead depth d ∈ {1, 3, 5, 9}, optimality weight γ ∈ {0.2, 0.4, 0.8, 1.0}, and backtracking quota T ∈ {100.0, ∞}. Two visibility radius values were used: 10 and 1000. In our analysis, we focus on the case of visibility radius 10, in line with previous publications in the area (Bulitko et al., 2005; Bulitko & Lee, 2006). Experiments with the visibility radius of 1000 yielded similar results and are not reported in this paper. Secondary experiments included different assignments of algorithms to abstraction levels, including A* at the top, LRTS at the bottom, A* throughout, and LRTS throughout, as well as experiments with no pass-through levels. Finally, we ran additional values of T for some configurations of the other parameters.

The algorithms were implemented within the Hierarchical Open Graph framework (Sturtevant, 2005) in C++ and run on a Westgrid cluster in Western Canada. The total running time was over 15 thousand hours of Intel Xeon 3.0GHz CPU time.

5. Note that the two ranges overlap as both include 60. In practice, we did not have any problems whose optimal solution length fell precisely on a range boundary.



Figure 8: Algorithm A dominates algorithm B (left). All non-dominated algorithms are shown as solid circles. Dominated algorithms are shown as hollow circles (right).

9.1 Antagonistic Performance Measure Trade-off

From a practitioner's viewpoint, this section can be viewed as a guide to selecting the best parameters. We start by finding sets of parameters that optimize the performance of PR LRTS along a single measure. We then consider optimizing pairs of antagonistic performance measures.

The research on optimizing single performance measures within LRTS was first published in (Bulitko & Lee, 2006). In Table 2 we extend the results to include A* and PR LRTS. In the table, '*' indicates that the value of that parameter had no impact on performance, and '_' denotes a pass-through level as in the figure labels. The best algorithms for a single performance measure are all special cases of PR LRTS (effectively A* and LRTA*) that do not use abstraction. The only exception is planning cost, which is an interplay between planning per move and convergence travel.

Table 2: Optimizing a single performance measure.

  Measure                 The best algorithm
  convergence travel      [A*]
  first-move lag          [LRTS(d = 1, *, *)]
  convergence planning    [A*,_,_,LRTS(d = 1, γ = 0.2 or 0.4, ∞)] or [A*,_,LRTS(d = 1, γ = 0.2 or 0.4, ∞)]
  convergence memory      [A*]
  suboptimality           [A*] or [LRTS(*, γ = 1.0, *)]

The power of abstraction comes when we attempt to optimize two negatively correlated (or antagonistic) performance measures simultaneously. Consider, for instance, convergence travel and first-move lag. In order to lower convergence travel, the agent needs to increase the amount of planning per move which, in turn, raises its first-move lag. As these measures are negatively correlated, any agent would effectively trade performance along one measure for performance along the other. Thus, we are interested in algorithms that make such a trade-off most efficiently.


In order to make our analysis more specific, we first introduce the concept of dominance. We consider a set of algorithms and their parameterizations. For instance, {[A*], [LRTS(1, 1.0, ∞)], [LRTS(5, 1.0, ∞)]} is a three-element set of algorithms consisting of A*, LRTS with d = 1, and LRTS with d = 5. For two representatives of such a set of parameterized algorithms, A and B, we say that A dominates B with respect to performance measures fx and fy if and only if A is better than B along both measures.

Definition 9.1 Performance of algorithm A with respect to performance measure fx is the average value of the performance measure A achieves over a fixed set of problem instances.

All our performance measures are defined as undesirable quantities (e.g., percent of suboptimality) and thus being better along a performance measure means minimizing the corresponding cost. Therefore, formally we say that:

Definition 9.2 Algorithm A dominates algorithm B with respect to performance measures fx and fy if and only if fx(A) < fx(B) & fy(A) < fy(B) (Figure 8, left). Given a set A of algorithms (each with a particular assignment of parameters), algorithm B ∈ A is called dominated if and only if ∃A ∈ A [A dominates B]. Otherwise, B is called non-dominated.

Intuitively, non-dominated algorithms make the trade-off between fx and fy most efficiently among all algorithms of A. They should be considered in practice when one wants to optimize performance in both fx and fy. Conversely, dominated algorithms can be safely excluded from consideration regardless of the relative importance of measures fx and fy in a particular application. If one plots the performance of a finite number of algorithms or parameter sets of one algorithm in the cost-based coordinate system (fx, fy), an ideal algorithm would be as close to the origin as possible (so that the costs are minimized). Since the performance measures fx and fy are antagonistic, there will be a frontier of non-dominated algorithms. All other algorithms are dominated and can be discarded, as shown on the right in Figure 8.

Note that dominance is defined with respect to performance measurements averaged over a benchmark set of problems. Thus, algorithm A dominates algorithm B with respect to performance measures fx and fy in the sense of average values. On some individual search problem instances, B can actually achieve better fx and fy than A. We study dominance at the level of individual problems in Appendix F.

In the rest of this section, we present actual dominance plots for two practically important antagonistic pairs of performance measures. The most prominent result is that all points on the frontier, except possibly its extremes, are due to configurations of the PR LRTS algorithm that use abstraction.

First-move lag vs. convergence travel is an important pair as practitioners may wish for low response times and fast convergence. Figure 9 shows the results.6 A* has the lowest convergence travel while weighted LRTA* has the best first-move lag. Bulitko and Lee (2006) tuned LRTS parameters to trade some of A*'s large first-move lag for some of LRTA*'s convergence travel. Remarkably, all such parameterizations of LRTS turned out to make the trade-off less efficiently than PR LRTS with abstraction. In the figure, the dominated LRTS algorithms are shown as hollow circles and are not labeled. Instances of PR LRTS with abstraction are labeled and take intermediate positions between A* and LRTA*.
6. For the sake of visual clarity, a subset of PR LRTS combinations is plotted in this and subsequent plots. This does not affect the claims we make in this section.



Note that due to the log scale of both axes in the plot, a proxy agent that runs LRTA* with probability ε and A* with probability 1 − ε would be dominated for any value of ε ∈ (0, 1). Standard errors would be smaller than the marker size if plotted. More informative cross-algorithm plots are found in Appendix F. They indicate that dominance occurs not only on average but also on most of the individual problems.
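A direct way to extract the non-dominated frontier of Definition 9.2 from averaged costs is sketched below; the code is ours and merely illustrates the definition.

#include <string>
#include <vector>

// Averaged costs of one algorithm configuration along two measures.
struct AlgPerf { std::string name; double fx; double fy; };

// Returns the non-dominated algorithms: those for which no other
// algorithm is strictly better along both fx and fy (Definition 9.2).
std::vector<AlgPerf> nonDominated(const std::vector<AlgPerf> &all)
{
    std::vector<AlgPerf> frontier;
    for (const AlgPerf &b : all) {
        bool dominated = false;
        for (const AlgPerf &a : all)
            if (a.fx < b.fx && a.fy < b.fy) { dominated = true; break; }
        if (!dominated) frontier.push_back(b);
    }
    return frontier;
}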


Figure 9: Non-dominated algorithms for first-move lag and convergence travel are labeled. Dominated algorithms that do not use abstraction are shown as hollow circles and are not labeled. All non-dominated algorithms except the extreme cases (A* and [LRTS(1,0.4,∞)], which is weighted LRTA*) use abstraction.

Suboptimality vs. convergence planning are antagonistic measures as finding shorter paths requires more learning and incurs a higher planning cost. As Figure 10 demonstrates, A* is the extreme case and all other algorithms that make the trade-off efficiently (i.e., are non-dominated) use abstraction.

The non-dominated algorithms in Figures 9 and 10 can be used to select the best algorithm for a particular application. For instance, if an application imposes an upper limit on the first-move lag (e.g., one thirtieth of a second) then one would select among the non-dominated algorithms below that limit.

9.2 Effects of Abstraction on Individual Performance Measures

Given the observed benefits of abstraction, in this section we study the effects of abstraction on individual performance measures. We plot each of these measures in Figure 11. In order to keep the plots readable, we focus on three particular LRTS parameter combinations (lookahead d, optimality weight γ, and backtracking control T): (1, 1.0, ∞), (3, 0.2, 100), (9, 0.4, ∞). In the following section we will investigate other combinations of LRTS parameters as well.



Figure 10: Non-dominated algorithms for suboptimality and convergence planning are labeled. Dominated algorithms that do not use abstraction are shown as hollow circles and are not labeled. All non-dominated algorithms except the extreme case (A*) use abstraction.

The abstraction level is plotted on the horizontal axis. For level 0, the algorithm used is LRTS running on the ground-level map. For abstraction levels 1 through 4, the agent runs LRTS on the abstract graph of that level and refines the produced path by running A* at the ground level. For instance, the point on the (d = 1, γ = 1, T = ∞) curve at abstraction level 4 represents the algorithm [A*,_,_,_,LRTS(1,1,∞)] in the notation introduced in Section 8. Table 3 provides a qualitative summary of the trends, followed by a detailed account.

Convergence travel decreases with abstraction as the bottom-level search is constrained to a narrow corridor induced by an abstract path. The decrease is most noticeable for shallow lookahead searches (d = 1 and d = 3). Algorithms using a lookahead of nine have a small convergence travel even without abstraction, and convergence travel is lower-bounded by double the optimal solution cost (one optimal solution for the first trial, on which the map is discovered, and one for the final trial with no map discovery or heuristic learning). Consequently, abstraction has diminished gains for deeper lookahead (d = 9).

Table 3: Qualitative effects of abstraction: general trends.

  measure / parameter     0 → abstraction N → ∞
  convergence travel      ↓
  first-move lag          ↑
  convergence planning    ↓
  convergence memory      ↓
  suboptimality           ↑



Figure 11: Effects of abstraction in PR LRTS. Error bars indicate the standard errors and are too small to see for most data points.

Convergence planning decreases with abstraction level. This is because the increase in planning per move at higher abstraction levels is more than compensated for by the decrease in convergence travel. The exact shape of the curves is due to the interplay between these two measures.

First-move lag increases with abstraction level. This is due to the fact that the ground-level corridor induced by the abstract path of length d computed by LRTS grows with the abstraction level. Two additional factors affect the shape of the curves. First, the average out-degree of abstract states varies with abstraction level. Second, the boundaries of abstract graphs can often be seen with a lookahead of nine at higher abstraction levels.

Convergence memory decreases with abstraction level as the learning algorithm (LRTS) operates on smaller abstract maps and incurs a smaller travel cost.


In practice, the amount of learning in LRTS tends to correlate tightly with its travel. For instance, for [A*,_,LRTS(3, 0.8, ∞)] the correlation between the number of values stored and the distance traveled is empirically measured at 0.9544, with the probability of the null hypothesis less than 0.01. While different parameters affect the exact values (e.g., LRTS with higher d tends to store fewer heuristic values, as studied by Bulitko & Lee, 2006), there is a nearly linear correlation between convergence travel and convergence memory for a fixed set of LRTS parameters.

Suboptimality increases with abstraction. The increase is due to the fact that each abstraction level progressively simplifies the ground graph topology and, while the abstract path is guaranteed to be refinable into a ground path, it may lead the agent away from the shortest solution. An illustrative example is given in Appendix G, Figure 19, where a refinement of a complete abstract path is 221% longer than the optimal ground path.7

A second mechanism explains why suboptimality rises faster with shallower LRTS searches. Specifically, A* at the ground level refines an abstract d-step path by finding a ground-level solution from the current state to a ground representative of the end of the abstract path. This solution is guaranteed to be optimal within the corridor and does not necessarily have to pass geographically close to intermediate states of the abstract path. Thus, giving A* a corridor from a longer abstract path liberates it from having to plot a path through possibly far-off intermediate states on the abstract path. This phenomenon is illustrated in Appendix G, Figure 20, where "feeding" ground-level A* the abstract path in small fragments results in more suboptimality than giving it the "big picture" – the abstract path in its entirety.

The final factor affecting the suboptimality curves in the figure is the optimality weight γ. Setting γ to lower values leads to higher suboptimality independently of abstraction, as the initial heuristic is made closer to the optimal weighted heuristic (Bulitko & Lee, 2006).

7. Derivation of a theoretical upper bound on suboptimality due to abstraction is non-trivial and depends on the exact abstraction mechanism used as well as specifics of the search problem at hand. Since PR LRTS can be used with a wide class of abstraction mechanisms and applied to a variety of heuristic problems, we do not offer the analysis here but will publish it in a separate journal paper on theoretical properties of clique-based abstraction.

9.3 Effects of Abstraction: Scaling up with Problem Size

In this section we report empirical data as the problems become larger. We measure the size of a problem as the length of a shortest path between the start and the goal position (henceforth referred to as the optimal solution length). This can be calculated by running A* on a fully revealed map. As a reminder, we bucket the 3000 problems into five buckets: the first bucket contains the 600 problems whose optimal solution length is in the [50, 60] range, the second bucket covers the [60, 70] range, and so on.

Figures 12 and 13 show the five performance measures plotted as bucket averages. For each average, we use the middle of the bucket (e.g., 55, 65, . . . ) as the horizontal coordinate. The error bars indicate standard errors.

Overall, the results demonstrate that abstraction enables PR LRTS to be applied to larger problems by significantly dampening the increase in convergence travel, convergence planning, and convergence memory. These advantages come at the price of suboptimality and first-move lag. The former clearly increases with abstraction when lookahead is small (Figure 12) and is virtually bucket-independent. The lookahead of nine (used for the plots in Figure 13) draws the curves together, as deeper lookahead diminishes the effects of abstraction on suboptimality (cf. Figure 25).
First-move lag is virtually bucket-independent except for the cases of d = 9 and abstraction levels 3 and 4, shown in Figure 13.




Figure 12: Scaling up: performance of PR LRTS versus optimal solution length. Each curve shows bucketed means. LRTS parameters are fixed at d = 1, γ = 1.0, T = ∞. Error bars indicate standard errors and are too small to see for most data points.

In these cases the first-move lag is capped for problems in the lower buckets, as the goal is seen from the start state at these higher levels of abstraction. Consequently, LRTS computes an abstract path of fewer than nine moves. This leads to a smaller corridor and less work for A* to refine the path; consequently, the first-move lag is reduced. As the problems become larger, LRTS has the room to compute a full nine-move abstract path and the first-move lag increases. For abstraction level 3 this phenomenon takes place up to bucket 85; beyond it, seeing the goal state from the start state is not frequent enough to make an impact. No such leveling off is observed at abstraction level 4, as the proximity of the abstract goal continues to cut the search short even for the largest problems.

Finally, we observe a minute decrease in first-move lag for larger problems. This appears to be due to the fact that problems in the higher buckets tend to have their start state located in a cluttered region of the map (so that the optimal solution length is necessarily higher). Walls reduce the number of states touched by the agent on its first move and thus reduce the first-move lag.



Figure 13: Scaling up: performance of PR LRTS versus optimal solution length. Each curve shows bucketed means. LRTS parameters are fixed at d = 9, γ = 0.4, T = ∞. Error bars indicate standard errors and are too small to see for some data points.

10. Future Work

The promising results presented in this paper open several directions for future research. First, PR LRTS is able to operate with a wide class of homomorphic state abstraction techniques. Thus, it would be of interest to investigate the extent to which the effects of state abstraction on real-time search are specific to the clique abstraction mechanism used in this paper. The sheer size of our experiments prohibited us from using more than one domain for this paper. We are planning to conduct a large-scale empirical study in a different domain for a follow-up paper.


Second, PR LRTS uses an abstract solution to restrict its search in the original ground-level graph. It would be of interest to combine this with the complementary approach of using the length of an optimal solution to an abstract problem as a heuristic estimate for the original search graph in the context of real-time search. In particular, we are looking at effective ways of propagating heuristic values from higher to lower levels of the abstraction hierarchy.

Third, state abstraction is one way of generalizing learning. Future research will consider combining it with function approximation of the heuristic function, as is commonly practiced in large-scale applications of reinforcement learning.

Fourth, we are presently investigating applications of PR LRTS to dynamic environments. In particular, we are studying the extent to which savings in memory gained by learning at a higher abstraction level will afford application of PR LRTS to moving target search. The previously suggested MTS algorithm (Ishida & Korf, 1991) requires learning a number of heuristic values quadratic in the size of the map. This is prohibitive for present-day commercial game maps. Additionally, we are planning to investigate how the A* component of PR LRTS compares with the incremental updates to the routing table in Trailblazer search (Chimura & Tokoro, 1994) and its hierarchical abstract map sequel (Sasaki, Chimura, & Tokoro, 1995).

Finally, we are studying the extent to which state abstraction and path refinement can be applied to planning with an approximate domain model in stochastic environments represented as Markov Decision Processes.

11. Conclusions

Situated agents in real-time environments are expected to act quickly while efficiently learning an initially unknown environment. Response time and learning speed are antagonistic performance measures, as more planning leads to faster convergence but slower action, and vice versa. Complete search algorithms, such as local repair A*, learn quickly but have an unbounded planning time per move. Real-time heuristic search algorithms have a constant-bounded planning time per move but learn slowly. In this paper we combined both approaches and suggested a hybrid algorithm that learns a heuristic function in a smaller abstract space and uses corridor-restricted A* to generate a partial ground-level path. The new algorithm finds the goal state on every trial and converges in a finite number of trials. In a large-scale empirical study it was found to make the trade-offs between antagonistic performance measures efficiently. The combination of learning and planning brings real-time performance to much larger search spaces. This is important in applications such as real-time goal-directed pathfinding in computer games.

Acknowledgments

Funding for this research was provided by the Natural Sciences and Engineering Research Council of Canada, the Alberta Ingenuity Centre for Machine Learning, the Informatics Circle of Research Excellence, and the University of Alberta. We appreciate consultation by David Thue, Rich Korf, Rob Holte, Csaba Szepesvári, Adi Botea, and Jonathan Schaeffer. This research has been enabled by the use of WestGrid computing resources, which are funded in part by the Canada Foundation for Innovation, Alberta Innovation and Science, BC Advanced Education, and the participating research institutions.


WestGrid equipment is provided by IBM, Hewlett Packard, and SGI. In particular, we would like to acknowledge help from Roman Baranowski, Masao Fujinaga, and Doug Phillips.



Appendix A. Learning Real-time Search (LRTS)

The PR LRTS algorithm uses the previously published LRTS (Bulitko & Lee, 2006) as its learning component. In order to make this paper accessible to a wider audience, we review LRTS here. LRTS combines three extensions of LRTA* via three control parameters: lookahead d, optimality weight γ, and learning quota T. It operates as follows. In the current state s, the agent running LRTS conducts a full-width d-ply lookahead search (line 3 in Figure 14). At each ply, it finds the most promising state (line 4). Assuming that the initial heuristic h0 is admissible, we can safely increase h(s) to the maximum among the f-values of the most promising states over all plies (line 5). If the total learning amount u exceeds the learning quota T, the agent backtracks to the previous state in which it did planning (lines 7, 10). Otherwise, it executes d moves forward towards the most promising frontier state (line 8).

Within PR LRTS, the LRTS algorithm operates on a search graph that is initially unknown. The free space assumption is used for unknown areas so that in line 3 of the algorithm all children up to depth d can be generated. Since the agent's model of the search graph used for lookahead and the actual graph may differ, it is possible that the execution in line 8 fails. In that case, the agent's representation of the search graph is updated, the rest of the failed path is cleared, and LRTS replans from its current state. It is important that heuristic values learned for all states affected by the graph discovery are discarded; they will subsequently be re-learned. Also, within PR LRTS, LRTS may be operated within a corridor induced by a path at the abstract level above. In this case, states outside of the corridor are never considered in the lookahead search (line 3).

LRTS(d, γ, T)
1   initialize: s ← sstart, u ← 0
2   while s ∉ Sg do
3       generate children i moves away, i = 1 . . . d
4       on level i, find the state si with the lowest f = γ · g + h
5       update h(s) to max_{1 ≤ i ≤ d} f(si) if it is greater
6       increase the amount of learning u by |Δh|
7       if u ≤ T then
8           execute d moves to get to the state sd
9       else
10          execute d moves to backtrack to the previous state, set u = T
11      end if
12  end while

Figure 14: The LRTS algorithm combines the three extensions of LRTA* (Bulitko & Lee, 2006).
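For readers who prefer code to pseudocode, here is a sketch of one planning-and-acting step of Figure 14 in C++. The primitives frontier, pathCost, moveTo, and h0 are our assumptions standing in for the environment interface; this is not the actual implementation used in our experiments.

#include <algorithm>
#include <cmath>
#include <unordered_map>
#include <vector>

using State = int;

// Assumed environment primitives (illustrative only):
std::vector<State> frontier(State s, int i);  // states exactly i moves from s
double pathCost(State s, State t, int i);     // cost of the best i-move path s -> t
State moveTo(State t);                        // executes the moves, returns the new state
double h0(State s);                           // initial admissible heuristic

// One iteration of the loop in Figure 14 (lines 3-11). h stores learned
// heuristic values; u accumulates the total learning amount.
State lrtsStep(State s, State prev, int d, double gamma, double T,
               std::unordered_map<State, double> &h, double &u)
{
    auto H = [&](State t) {                   // current heuristic with h0 fallback
        auto it = h.find(t);
        return it == h.end() ? h0(t) : it->second;
    };
    double best = -INFINITY;                  // max over plies of the minimal f
    State target = s;
    for (int i = 1; i <= d; ++i) {            // lines 3-4: full-width d-ply lookahead
        double fmin = INFINITY;
        State si = s;
        for (State t : frontier(s, i)) {
            double f = gamma * pathCost(s, t, i) + H(t);  // f = γ·g + h
            if (f < fmin) { fmin = f; si = t; }
        }
        if (i == d) target = si;              // most promising frontier state
        best = std::max(best, fmin);
    }
    if (best > H(s)) {                        // line 5: raise h(s) if greater
        u += best - H(s);                     // line 6: track the learning amount
        h[s] = best;
    }
    if (u <= T) return moveTo(target);        // line 8: d moves forward
    u = T;                                    // line 10: backtrack instead
    return moveTo(prev);
}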



Appendix B. Abstraction Properties

Abstraction properties were informally introduced and illustrated with an example in Section 7.1. This appendix makes the presentation mathematically precise. We will use the notation introduced for the proofs in Appendix D. In the following, the variables i, k, m range over the natural numbers including 0.

Property 1. An agent using abstraction maintains a hierarchy of N abstract search graphs in addition to its model of the environment (Definition 5.1). Each of the abstract graphs is a search graph in the sense of Section 3. In the following we denote the abstract search graph at level i, 1 ≤ i ≤ N, by (G(i), c(i), s0(i), sg(i), h0(i)). As before, G(i) = (S(i), E(i)).

Property 2. Each state s in the search graph at level k < N has a unique "parent" state s′ in the level k + 1 abstract search graph. More formally:

    ∀s ∈ S(k), k < N  ∃!s′ ∈ S(k + 1)  [parent(s) = s′].    (B.1)

Property 3. Each state s in the search graph at level k, for 0 < k ≤ N, has at least one "child" state s′ at level k − 1. The notation children(s) represents the set of children of state s; thus, s′ ∈ children(s):

    ∀s ∈ S(k), k > 0  ∃s′ ∈ S(k − 1)  [s′ ∈ children(s)].    (B.2)

Property 4. Given a heuristic search problem S in the sense of Definition 3.7, for any instance of that problem, the number of children of any abstract state is upper-bounded by a constant independent of the number of states of the instance:

    ∀S, S is a search problem  ∃m  ∀((S, E), c, s0, sg, h0) ∈ S  ∀i, 0 < i ≤ N  ∀s ∈ S(i)  [|children(s)| < m].    (B.3)

Property 5. (Graph homomorphism) Every edge (s1, s2) ∈ E(k), k < N, has either a corresponding abstract edge at level k + 1, or s1 and s2 abstract into the same state:

    ∀s1, s2 ∈ S(k), k < N  [(s1, s2) ∈ E(k) =⇒ (parent(s1), parent(s2)) ∈ E(k + 1) ∨ parent(s1) = parent(s2)].    (B.4)

Property 6. If an edge exists between abstract states s1 and s2 then there is an edge between some child of s1 and some child of s2:

    ∀s1, s2 ∈ S(k), k > 0  [(s1, s2) ∈ E(k) =⇒ ∃s′1 ∈ children(s1)  ∃s′2 ∈ children(s2)  [(s′1, s′2) ∈ E(k − 1)]].    (B.5)

For the last property, we need the following definition.

Definition B.1 A path p in the space (S(k), E(k)), 0 ≤ k ≤ N, is an ordered sequence of states from S(k) in which any two consecutive states constitute a valid edge in E(k). Formally, p is a path in (S(k), E(k)) if and only if:

    ∃s1, . . . , sm ∈ S(k)  [p = (s1, . . . , sm) & ∀i, 1 ≤ i ≤ m − 1  [(si, si+1) ∈ E(k)]].    (B.6)

We use the notation p ⊂ (S(k), E(k)) to indicate that both the vertices and the edges of the path p are in the sets S(k) and E(k), respectively. We use the notation s ∈ p to indicate that state s is on the path p.

Property 7. Any two children of an abstract state are connected through a path whose states are all children of the abstract state:

    ∀s ∈ S(k), 0 < k ≤ N  ∀s′1, s′2 ∈ children(s)  ∃p = (s′1, . . . , s′2) ⊂ (S(k − 1), E(k − 1)).    (B.7)



Appendix C. Details of PR LRTS Implementation

path PR LRTS(state sc, state sd)
1   if level(sc) > N then
2       return ∅
3   end if
4   p = PR LRTS(parent(sc), parent(sd))
5   if p ≠ ∅ then
6       sd = child′(end(p))
7   end if
8   C = {s | parent′(s) ∈ p}
9   switch (algorithm[level(sc)])
10      A*:           return A*(sc, sd, C)
11      LRTS:         return LRTS(sc, sd, C)
12      pass-through: return p
13  end switch

Figure 15: Path-refinement learning real-time search (PR LRTS).

We define the functions child′ and parent′ to handle pass-through levels. Specifically, in line 6, the state sd will be computed by child′ at the first non-pass-through level below the level at which path p is computed. Likewise, in line 8, the states s forming the corridor C are at the first non-pass-through level (level i) below the level of path p (level j). Thus, parent′(s) applies the parent operation j − i times so that parent′(s) and p are both at level j.

While any child can be used in line 6 of Figure 15, some choices may lead to better performance. Intuitively, the child chosen by child′(end(p)) should be the "closest representative" of the abstract state end(p) among children(end(p)). In the navigation domain, we implement child′(s) to return the element of children(s) that is geographically closest to the average coordinates of the states in children(s).
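A minimal sketch of this child-selection rule follows; the Point type and the function name are our own stand-ins for the framework's types.

#include <cmath>
#include <vector>

struct Point { double x, y; };

// Among the coordinates of children(s), return the child closest to the
// average (centroid) of all children. The input is non-empty by Property 3.
Point closestToCentroid(const std::vector<Point> &children)
{
    Point avg{0.0, 0.0};
    for (const Point &c : children) { avg.x += c.x; avg.y += c.y; }
    avg.x /= children.size();
    avg.y /= children.size();

    Point best = children.front();
    double bestDist = INFINITY;
    for (const Point &c : children) {
        double dist = std::hypot(c.x - avg.x, c.y - avg.y);
        if (dist < bestDist) { bestDist = dist; best = c; }
    }
    return best;
}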



Appendix D. Proofs

We first prove an auxiliary lemma not stated in the main body of the paper.

Lemma D.1 (Downward Refinement Property)8 For any abstract path p = (sa, . . . , sb), any two children of its ends are connected by a path lying entirely in the corridor induced by p. This means that any abstract path can be refined within the corridor formed by its children. Formally:

    ∀1 ≤ k ≤ N  ∀p = (sa, . . . , sb)  [p ⊂ (S(k), E(k)) =⇒
        ∀s′a ∈ children(sa)  ∀s′b ∈ children(sb)  ∃p′ = (s′a, . . . , s′b)
        [p′ ⊂ (S(k − 1), E(k − 1)) & ∀s′ ∈ p′ [s′ ∈ ∪s∈p children(s)]]].    (D.1)

8. The definition is from (Holte et al., 1996, p. 331).

Proof. The proof is by induction on the number of edges in the abstract path. The base case is 0: any two children of a single abstract state are connected by a path that lies entirely in the set of children of that abstract state, which holds due to Property 7.

Suppose the statement holds for all abstract paths of length j. We now show that it then holds for all abstract paths of length j + 1. Consider an arbitrary abstract path p ⊂ (S(k), E(k)), k > 0, that has j + 1 edges. We can represent p as (s1, . . . , sj+1, sj+2). Consider arbitrary children s′1 ∈ children(s1) and s′j+2 ∈ children(sj+2). We need to show that there is a path p′ ⊂ (S(k − 1), E(k − 1)) between them that lies entirely in the union of the children of all states on p (denote it by Cp). Let s′j+1 be an arbitrary child of state sj+1. Since s1 and sj+1 are only j edges apart, by the inductive supposition there is a path between s′1 and s′j+1 that lies entirely in Cp. All that is left to show is that s′j+1 is connected to s′j+2 within Cp. This holds due to Property 6. □

Theorem 8.1 For any heuristic search problem in the sense of Definition 3.7, [A*,?,LRTS(d, γ, T)] has a constant-bounded amount of planning per action. The constant depends on the control parameters d, γ, T but is independent of the number of states in the problem.

Proof. At the abstract level N, LRTS(d, γ, T) considers no more than b^d abstract states by the algorithm design (cf. Appendix A), where b is the maximum degree of any state. As per Definition 3.7, the maximum degree of any state does not depend on the number of states. The resulting abstract path of no more than d abstract edges induces a corridor at the ground level. The corridor consists of all ground-level states that abstract to abstract states on the path. The size of the corridor is upper-bounded by the number of edges in the path (at most d) multiplied by the maximum number of ground-level children of any abstract state at level N. The latter is upper-bounded by a constant independent of the number of ground-level states due to Property 4. Finally, A* running in a corridor of constant-bounded size takes time independent of the number of ground-level states. □

Lemma 8.1 In the case of [A*,?,LRTS(d, γ, T)] doing obstacle discovery there will be only a finite number of path execution failures on each trial.

Proof. By contradiction: suppose there were an infinite number of such failures. Each failure is due to the discovery of at least one new blocked edge or vertex in the ground-level graph. Then there would be infinitely many blocked edges or vertices in a finite graph. □

Lemma 8.2 LRTS is complete on a fixed abstract graph.



Proof. First, we show that any abstract graph satisfies the properties under which LRTS is shown to be complete (Theorem 7.5, Bulitko & Lee, 2006). That is, the abstract graph is finite, each action is reversible, there are no self-loops, all actions have a positive cost, and the goal state is reachable from every state. The graph also has to be stationary and deterministically traversable (p. 4, Bulitko & Lee, 2006). Due to the abstraction mechanism requirements in Section 7.1, the properties listed above are satisfied by the clique abstraction mechanism as long as the ground-level graph satisfies these properties as well (which we require in Section 3). Thus, LRTS running on an abstract graph as if it were a ground graph is complete.

In PR LRTS, however, LRTS on an abstract graph does not execute its own actions. Instead, its current (abstract) state is computed as the abstract parent of the agent's current ground-level state. Therefore, a critical question is whether the agent is able to find a ground-level path from its current state to the ground-level state corresponding to the end of its abstract path, as computed in line 6 of Figure 15. A failure to do so would mean that the corridor computed in line 8 of Figure 15 and used to refine the path does not contain a ground-level path from sc to sd. Due to the downward refinement property (Lemma D.1), this can only be due to graph discovery. According to Lemma 8.1, after a finite number of failures, the A* algorithm operating at the ground level is guaranteed to find a path to the end of the abstract path computed by LRTS. Thus, LRTS has the effective ability to "execute" its own abstract actions. Putting these results together, we conclude that for any valid d, γ, T parameters, LRTS reaches its abstract goal on every trial. □

Theorem 8.2 [A*,?,LRTS(d, γ, T)] doing obstacle discovery is complete. Thus, on every trial PR LRTS will find a path to the goal.

Proof. Follows directly from Lemma 8.1 and Lemma 8.2. □

Theorem 8.3 Assuming systematic or fixed tie-breaking, [A*,?,LRTS(d, γ, T)] doing obstacle discovery converges to its final solution after a finite number of trials. It makes no updates to its graph or the heuristic on the final trial and follows the same path on all subsequent trials.

Proof. Follows from Lemma 8.1 and Theorem 7.6 of (Bulitko & Lee, 2006) in the same way that Lemma 8.2 and Theorem 8.2 were proved above. □



Appendix E. Maps Used in the Empirical Study

Figure 16: The six maps used in the experiments.



Appendix F. Dominance on Individual Problems

In Section 9.1 we introduced the concept of dominance. Figures 9 and 10 demonstrated that PR LRTS with abstraction dominates all but the extreme cases of the search algorithms without abstraction. That analysis was done using cost values averaged over the 3000 problems. In this section we consider dominance on individual problems. Due to the high variance in the problems and their difficulty, we report percentages of problems on which dominance is achieved. For every pair of algorithms, we measure the percentage of problems on which the first algorithm dominates the second, and the percentage of problems on which the second algorithm dominates the first. We call the ratio between these two percentages the dominance ratio.
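A minimal sketch of this computation follows (our own illustrative code, not the experiment harness). Since both percentages share the same denominator, their ratio equals the ratio of the raw dominance counts.

#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// costsA and costsB hold (fx, fy) cost pairs for the same problems,
// in the same order, one pair per problem.
double dominanceRatio(const std::vector<std::pair<double, double>> &costsA,
                      const std::vector<std::pair<double, double>> &costsB)
{
    std::size_t aDom = 0, bDom = 0;
    for (std::size_t i = 0; i < costsA.size(); ++i) {
        const auto &[ax, ay] = costsA[i];
        const auto &[bx, by] = costsB[i];
        if (ax < bx && ay < by) ++aDom;        // A dominates B on problem i
        else if (bx < ax && by < ay) ++bDom;   // B dominates A on problem i
    }
    return bDom == 0 ? INFINITY : static_cast<double>(aDom) / bDom;
}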


Figure 17: Top: [A*,_,_,LRTS(1,1.0,∞)] is shown as a filled star; [LRTS(5,0.8,∞)] is shown as a hollow star. Bottom left: convergence travel. Bottom right: first-move lag.

At the top of Figure 17 we see a reproduction of Figure 9 where two particular algorithms are marked with stars. The filled star is [A*,_,_,LRTS(1,1.0,∞)], which uses three levels of abstraction. The hollow star is [LRTS(5,0.8,∞)], which operates entirely at the ground level. Statistics are reported in Table 4. The bottom left of the figure shows the advantages of the PR LRTS with respect to convergence travel.


Table 4: Statistics for the two algorithms from Figure 17.

  Algorithm                  Convergence travel   First-move lag   Dominance
  [A*,_,_,LRTS(1,1.0,∞)]     72.83%               97.27%           70.97%
  [LRTS(5,0.8,∞)]            27.17%               2.67%            0.87%

  Dominance ratio: 81.89

On approximately 73% of the 3000 problems, the PR LRTS travels less than the LRTS before convergence. With respect to the first-move lag, the PR LRTS is superior to the LRTS on 97% of the problems (bottom right in the figure). Finally, on 71% of the problems the PR LRTS dominates the LRTS (i.e., outperforms it with respect to both measures). On the other hand, the LRTS dominates the PR LRTS on approximately 1% of the problems. This leads to the dominance ratio of 81.89.


Figure 18: Top: [A*,_,_,LRTS(5,0.8,∞)] is shown as a filled star; [LRTS(3,0.2,∞)] is shown as a hollow star. Bottom left: convergence planning. Bottom right: suboptimality.



Table 5: Statistics for the two algorithms from Figure 18.

  Algorithm                  Convergence planning   Suboptimality   Dominance
  [A*,_,_,LRTS(5,0.8,∞)]     80.03%                 55.80%          48.97%
  [LRTS(3,0.2,∞)]            19.93%                 40.97%          12.2%

  Dominance ratio: 4.01

Similarly, Figure 18 compares two algorithms with respect to convergence planning and suboptimality of the final solution. At the top of the figure, we have the plot from Figure 10 with [A*,_,_,LRTS(5,0.8,∞)] shown as a filled star and [LRTS(3,0.2,∞)] shown as a hollow star. Percentages for domination on individual problems are found in Table 5. The plot at the bottom left of the figure shows that the PR LRTS has a lower convergence planning cost than the LRTS on 80% of the problems. The plot at the bottom right shows the suboptimality of the solutions these algorithms produced. The PR LRTS is more optimal than the LRTS 56% of the time. Finally, the PR LRTS dominates the LRTS on 49% of the problems, while domination the other way (i.e., the LRTS dominating the PR LRTS) happens on only 12.2% of the problems. This leads to the dominance ratio of 4.01.

This analysis provides an insight into dominance at the level of individual problems. There are several factors that influence the results. First, there is high variance in the difficulty of individual problems due to their distribution over five buckets by optimal path distance. Consequently, there is high variance in how the algorithms trade off antagonistic performance measures on these problems. In the case when there is a large difference between mean values, such as in Figure 17, dominance on average is supported by dominance on a majority of individual problems. Conversely, a small difference in mean values (e.g., 2.3% in suboptimality for the algorithms in Figure 18) does not lead to overwhelming dominance at the level of individual problems.

We extended this analysis to all pairs of algorithms displayed in Figures 17 and 18. For convergence travel and first-move lag, the dominance ratio varies between 5.48 and ∞, with the values below infinity averaging 534.32 with a standard error of 123.47. For convergence planning and suboptimality, the dominance ratio varies between 0.79 and ∞, with the values below infinity averaging 5.16 with a standard error of 1.51.

Finally, only a small discrete set of algorithm parameters was tested in our study. Therefore, the results presented in Figures 9, 10, 17, and 18 should be viewed as an approximation to the actual dominance relationship among the algorithms.



Appendix G. Final Solution Suboptimality Caused by Abstraction

Figure 19: Abstraction can cause suboptimality. Left: refining an abstract path. Solid arrows indicate the abstract path. Ground-level path is shown with thinner dashed arrows. Agent’s position is shown with “A”, goal’s position is “G”. The white cells form the corridor induced by the abstract path. Right: A*-generated optimal path.

Figure 20: Partial path refinement increases suboptimality. Left: refining the entire abstract path. Right: refining the first abstract action. Solid arrows indicate the abstract path. Groundlevel path is shown with thinner dashed arrows. Agent’s position is shown with “A”, goal’s position is “G”. The white cells form the corridor induced by the abstract path.



Appendix H. Interaction Between Abstraction and LRTS Parameters

Section 9.2 observed general trends in the influence of abstraction on the five performance measures. As the abstraction level adds another dimension to the parameter space of LRTS, previously defined by d, γ, T, the natural question is how the four parameters interact. In order to facilitate a comprehensible visualization in this paper, we reduce the LRTS parameter space from d, γ, T to d, γ by setting T = ∞ (i.e., disabling backtracking in LRTS). This is justified for two reasons. First, recent studies (Bulitko & Lee, 2006; Sigmundarson & Björnsson, 2006) have shown that the effects of backtracking are highly domain-specific. Second, we conducted an additional study and found that the effects of abstraction are virtually independent of the learning quota T while being noticeably dependent on the lookahead d and the weight γ.

Table 6 gives a qualitative overview of how the LRTS parameters modulate the impact of abstraction. A more detailed analysis for each of the five performance measures follows. It is important to note that these experiments were performed on a set of fixed-length paths and fixed-size maps. Consequently, map boundary effects are observed at higher levels of abstraction. We detail their contribution below.

Table 6: Influence of LRTS parameters on the impact of abstraction. Each cell in the table represents the impact of abstraction either amplified ("A") or diminished ("D") by an increase in d or γ. Lower-case "a" and "d" indicate a minor effect; "-" indicates no effect.

  measure / control parameter   increase in d   increase in γ
  convergence travel            d               A
  first-move lag                A               -
  convergence planning          a               A
  convergence memory            D               A
  suboptimality                 D               a

Convergence travel: increasing the abstraction level generally decreases convergence travel as LRTS learns on smaller abstract maps. Independently, increasing the lookahead depth in LRTS has a similar effect (Bulitko & Lee, 2006). Convergence travel is lower-bounded by the doubled optimal cost from the start to the goal. Therefore, decreasing convergence travel via either of the two mechanisms diminishes the gains from the other mechanism. This effect can be seen in Figure 21, where there is a noticeable gap in convergence travel between abstraction levels 0 and 2 but, with a lookahead of 9, only a small difference between abstraction levels 2 and 4. Thus, increasing the lookahead slightly diminishes the effect of abstraction (hence the "d" in the middle column of Table 6). Increasing γ increases the convergence travel; the higher the value of γ, the more there is to be gained from using abstraction. Thus, an increase in γ amplifies the advantage of abstraction.

First-move lag generally increases with both the abstraction level and the lookahead depth. As the lookahead depth increases, the size of the corridor used for the A* search increases as well. Thus, increasing d amplifies the first-move lag due to abstraction, because PR LRTS must plan once within the lookahead space (within LRTS) and once inside the corridor (within A*) (Figure 22). Deeper lookahead amplifies the impact of abstraction. In the simplified analysis below, we assume that the map is obstacle-free, which leads to all levels of abstraction being regular grids (ignoring boundary effects).



Figure 21: Convergence travel: impact of abstraction as a function of the d, γ LRTS parameters. Top two graphs: [LRTS(d, γ)] vs. [A*,_,LRTS(d, γ)]. Bottom two graphs: [A*,_,LRTS(d, γ)] vs. [A*,_,_,_,LRTS(d, γ)].

The length of a path between two points (expressed in the number of actions) is thus decreased by a factor of two with each abstraction level. Under these assumptions, the total number of states PR LRTS touches on the first move of the last trial is Ω(d²) at the abstract graph and Ω(d · 2^N) at the ground graph. The latter quantity is simply the number of edges in the ground path, computed as the number of edges in the abstract path (d) multiplied by the reduction factor of 2^N. Adding M more abstraction levels increases the first-move lag to Ω(d · 2^(N+M)). The increase is a linear function of the lookahead depth d. Thus, larger values of d amplify the effect of adding extra abstraction levels.

There are several points glossed over by our simplified analysis. First, the reduction in path length is not always a full factor of two as assumed above: in the presence of walls, higher levels of abstraction are less likely to locate and merge fully-fledged 4-element cliques. Second, the boundaries of the abstract graph can be reached by LRTS in fewer than d moves at the higher abstraction level. This effectively decreases the quantity d in the formula above, and the size of the corridor can be reduced from our generous estimate of d · 2^N. Finally, "feeding" A* a longer abstract path often improves its performance, as we analyzed in a previous section (cf. Figure 20). This explains why at abstraction level 4 deepening the lookahead has diminishing returns, as seen in Figure 22.
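The scaling argument above can be restated compactly; the following summary is ours, under the obstacle-free assumption made in the text:

\begin{align*}
\mathrm{lag}(d, N) &= \Omega(d^2) + \Omega\!\left(d \cdot 2^N\right),\\
\mathrm{lag}(d, N+M) - \mathrm{lag}(d, N) &= \Omega\!\left(d \left(2^{N+M} - 2^{N}\right)\right) = \Omega\!\left(d \cdot 2^N \left(2^M - 1\right)\right),
\end{align*}

so the extra first-move lag contributed by M additional abstraction levels grows linearly in the lookahead depth d, as claimed.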



Figure 22: First-move lag: impact of abstraction as a function of the d, γ LRTS parameters. Top two graphs: [LRTS(d, γ)] vs. [A*,_,LRTS(d, γ)]. Bottom two graphs: [A*,_,LRTS(d, γ)] vs. [A*,_,_,_,LRTS(d, γ)].

Optimality weight does not affect the number of states touched by LRTS at the abstract level. On the other hand, it can change the cost of the resulting A* search, as a different abstract path may be computed by LRTS. Overall, however, the effect of γ on the first-move lag and on the impact of abstraction is inconsequential (Figure 22).

Convergence planning: As the abstraction level increases, convergence planning generally decreases. The effect of d is more complex, because deeper lookahead increases the cost of each individual planning step but decreases overall planning costs as convergence is faster. The interplay of these two trends moderates the overall influence, as seen in Figure 23.

The effect of γ on convergence planning is non-trivial. In general, lower values of γ reduce the convergence planning cost. Note that the convergence planning cost is a product of the average planning time per unit of distance and the convergence travel. As we discussed above, the optimality weight amplifies the effects of abstraction on convergence travel. At the same time, it does not substantially affect the increase in planning per move as the abstraction level goes up. Combining these two influences, we conclude that the optimality weight amplifies the effects of abstraction on convergence planning. This is confirmed empirically in Figure 23.



Figure 23: Convergence planning: impact of abstraction as a function of the d, γ LRTS parameters. Top two graphs: [LRTS(d, γ)] vs. [A*,_,LRTS(d, γ)]. Bottom two graphs: [A*,_,LRTS(d, γ)] vs. [A*,_,_,_,LRTS(d, γ)].

Convergence memory: Abstraction decreases the amount of memory used at convergence because there are fewer states over which to learn. The effects of d and γ are the same as for convergence travel described above. This is because of the strong correlation between convergence travel and convergence memory that we have previously discussed. Visually, Figures 21 and 24 display very similar trends.

Suboptimality: Increasing the abstraction level increases suboptimality. For plain LRTS, lookahead depth has no effect on the suboptimality of the final solution. However, when we combine deeper lookahead with abstraction, the suboptimality arising from abstraction decreases. With deeper lookahead the abstract goal state is seen earlier, making PR LRTS effectively a corridor-constrained A*. Additionally, as we discussed in Section 9.2 and Figure 20, refining shorter paths (computed by LRTS with lower d) introduces additional suboptimality. As suboptimality is lower-bounded by 0%, increasing the lookahead diminishes the effects of abstraction on suboptimality (Figure 25) – hence the "D" in Table 6.

Increasing γ decreases the amount of suboptimality when no abstraction is used. When combined with abstraction, increasing γ has a minor amplification effect on the difference abstraction makes (Figure 25), for two reasons. First, at abstract levels the graphs are fairly small and γ makes less difference there. Second, the degree of suboptimality of an abstract path does not translate directly into the degree of suboptimality of the resulting ground path, as A* may still find a reasonable ground path. Thus, the influence of γ at the abstract level is overshadowed by the suboptimality introduced by the refinement process itself (cf. Figure 19).



Figure 24: Convergence memory: impact of abstraction as a function of the d, γ LRTS parameters. Top two graphs: [LRTS(d, γ)] vs. [A*,_,LRTS(d, γ)]. Bottom two graphs: [A*,_,LRTS(d, γ)] vs. [A*,_,_,_,LRTS(d, γ)].

Figure 25: Suboptimality: impact of abstraction as a function of the d, γ LRTS parameters. Top two graphs: [LRTS(d, γ)] vs. [A*,_,LRTS(d, γ)]. Bottom two graphs: [A*,_,LRTS(d, γ)] vs. [A*,_,_,_,LRTS(d, γ)].

References

Aylett, R., & Luck, M. (2000). Applying artificial intelligence to virtual reality: Intelligent virtual environments. Applied Artificial Intelligence, 14(1), 3–32.

BioWare Corp. (1998). Baldur's Gate. Published by Interplay, http://www.bioware.com/bgate/, November 30, 1998.

Blizzard Entertainment (2002). Warcraft III: Reign of Chaos. Published by Blizzard Entertainment, http://www.blizzard.com/war3, July 3, 2002.

Botea, A., Müller, M., & Schaeffer, J. (2004). Near optimal hierarchical path-finding. Journal of Game Development, 1(1), 7–28.



Bulitko, V. (2004). Learning for adaptive real-time search. Tech. rep., Computer Science Research Repository (CoRR), http://arxiv.org/abs/cs.AI/0407016.

Bulitko, V., & Lee, G. (2006). Learning in real time search: A unifying framework. Journal of Artificial Intelligence Research (JAIR), 25, 119–157.

Bulitko, V., Sturtevant, N., & Kazakevich, M. (2005). Speeding up learning in real-time search via automatic state abstraction. In Proceedings of the National Conference on Artificial Intelligence (AAAI), pp. 1349–1354, Pittsburgh, Pennsylvania.

Chimura, F., & Tokoro, M. (1994). The Trailblazer search: A new method for searching and capturing moving targets. In Proceedings of the National Conference on Artificial Intelligence (AAAI), pp. 1347–1352.

Dini, D. M., van Lent, M., Carpenter, P., & Iyer, K. (2006). Building robust planning and execution systems for virtual worlds. In Proceedings of the Artificial Intelligence and Interactive Digital Entertainment Conference (AIIDE), pp. 29–35, Marina del Rey, California.


Ensemble Studios (1999). Age of Empires II: Age of Kings. Published by Microsoft Game Studios, http://www.microsoft.com/games/age2, June 30, 1999.

Furcy, D. (2006). ITSA*: Iterative tunneling search with A*. In Proceedings of the National Conference on Artificial Intelligence (AAAI), Workshop on Heuristic Search, Memory-Based Heuristics and Their Applications, Boston, Massachusetts.

Furcy, D., & Koenig, S. (2000). Speeding up the convergence of real-time search. In Proceedings of the National Conference on Artificial Intelligence (AAAI), pp. 891–897.

Furcy, D., & Koenig, S. (2005). Limited discrepancy beam search. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pp. 125–131.

Hansen, E. A., Zilberstein, S., & Danilchenko, V. A. (1997). Anytime heuristic search: First results. Tech. rep. CMPSCI 97-50, Computer Science Department, University of Massachusetts.

Hart, P., Nilsson, N., & Raphael, B. (1968). A formal basis for the heuristic determination of minimum cost paths. IEEE Transactions on Systems Science and Cybernetics, 4(2), 100–107.

Hernández, C., & Meseguer, P. (2005a). Improving convergence of LRTA*(k). In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Workshop on Planning and Learning in A Priori Unknown or Dynamic Domains, Edinburgh, UK.

Hernández, C., & Meseguer, P. (2005b). LRTA*(k). In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Edinburgh, UK.

Holte, R., Mkadmi, T., Zimmer, R. M., & MacDonald, A. J. (1996). Speeding up problem solving by abstraction: A graph oriented approach. Artificial Intelligence, 85(1-2), 321–361.

id Software (1993). Doom. Published by id Software, http://en.wikipedia.org/wiki/Doom, December 10, 1993.

Ishida, T., & Korf, R. (1991). Moving target search. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pp. 204–210.

Kitano, H., Tadokoro, S., Noda, I., Matsubara, H., Takahashi, T., Shinjou, A., & Shimada, S. (1999). RoboCup Rescue: Search and rescue in large-scale disasters as a domain for autonomous agents research. In Proceedings of the IEEE Conference on Man, Systems, and Cybernetics.

Koenig, S. (1999). Exploring unknown environments with real-time search or reinforcement learning. In Proceedings of Neural Information Processing Systems, pp. 1003–1009.

Koenig, S. (2001). Agent-centered search. Artificial Intelligence Magazine, 22(4), 109–132.

Koenig, S. (2004). A comparison of fast search methods for real-time situated agents. In Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), pp. 864–871.

Koenig, S., & Likhachev, M. (2001). Incremental A*. Advances in Neural Information Processing Systems, 14.

Koenig, S., & Likhachev, M. (2002). D* Lite. In Proceedings of the National Conference on Artificial Intelligence (AAAI), pp. 476–483.

Koenig, S., Tovey, C., & Smirnov, Y. (2003). Performance bounds for planning in unknown terrain. Artificial Intelligence, 147, 253–279.
Koenig, S., & Likhachev, M. (2006). Real-time adaptive A*. In Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), pp. 281–288.

Koenig, S., & Simmons, R. (1998). Solving robot navigation problems with initial pose uncertainty using real-time heuristic search. In Proceedings of the International Conference on Artificial Intelligence Planning Systems, pp. 144–153.

Korf, R. (1990). Real-time heuristic search. Artificial Intelligence, 42(2-3), 189–211.

Luštrek, M., & Bulitko, V. (2006). Lookahead pathology in real-time path-finding. In Proceedings of the National Conference on Artificial Intelligence (AAAI), Workshop on Learning For Search, pp. 108–114, Boston, Massachusetts.

Orkin, J. (2006). 3 states & a plan: The AI of F.E.A.R. In Proceedings of the Game Developers Conference (GDC).

Pearl, J. (1984). Heuristics. Addison-Wesley.

Pohl, I. (1970). First results on the effect of error in heuristic search. In Meltzer, B., & Michie, D. (Eds.), Machine Intelligence, Vol. 5, pp. 219–236. American Elsevier, New York.

Pohl, I. (1973). The avoidance of (relative) catastrophe, heuristic competence, genuine dynamic weighting and computational issues in heuristic problem solving. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pp. 12–17.

Pottinger, D. C. (2000). Terrain analysis in realtime strategy games. In Proceedings of Computer Game Developers Conference.

Rayner, D. C., Davison, K., Bulitko, V., Anderson, K., & Lu, J. (2007). Real-time heuristic search with a priority queue. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Hyderabad, India.

Reynolds, C. W. (1987). Flocks, herds and schools: A distributed behavioral model. In SIGGRAPH '87: Proceedings of the 14th Annual Conference on Computer Graphics and Interactive Techniques, pp. 25–34, New York, NY, USA. ACM Press.

Russell, S., & Wefald, E. (1991). Do the right thing: Studies in limited rationality. MIT Press.

Sasaki, T., Chimura, F., & Tokoro, M. (1995). The Trailblazer search with a hierarchical abstract map. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pp. 259–265.

Shimbo, M., & Ishida, T. (2003). Controlling the learning process of real-time heuristic search. Artificial Intelligence, 146(1), 1–41.

Shue, L.-Y., Li, S.-T., & Zamani, R. (2001). An intelligent heuristic algorithm for project scheduling problems. In Proceedings of the 32nd Annual Meeting of the Decision Sciences Institute, San Francisco.

Shue, L.-Y., & Zamani, R. (1993). An admissible heuristic search algorithm. In Proceedings of the 7th International Symposium on Methodologies for Intelligent Systems (ISMIS-93), Vol. 689 of LNAI, pp. 69–75.

Sigmundarson, S., & Björnsson, Y. (2006). Value back-propagation vs. backtracking in real-time search. In Proceedings of the National Conference on Artificial Intelligence (AAAI), Workshop on Learning For Search, Boston, Massachusetts, USA. AAAI Press.
Stentz, A. (1995). The focussed D* algorithm for real-time replanning. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pp. 1652–1659.

Stout, W. (1996). Smart moves: Intelligent pathfinding. Game Developer Magazine.

Stout, W. (2000). The basics of A* for path planning. In Game Programming Gems. Charles River Media.

Sturtevant, N. (2005). HOG: Hierarchical Open Graph. http://www.cs.ualberta.ca/~nathanst/hog/.

Sturtevant, N., & Buro, M. (2005). Partial pathfinding using map abstraction and refinement. In Proceedings of the National Conference on Artificial Intelligence (AAAI), pp. 1392–1397, Pittsburgh, Pennsylvania.

Thalmann, D., Noser, H., & Huang, Z. (1997). Autonomous virtual actors based on virtual sensors. In Lecture Notes in Computer Science (LNCS), Creating Personalities for Synthetic Actors, Towards Autonomous Personality Agents, Vol. 1195, pp. 25–42. Springer-Verlag, London, UK.

Westwood Studios (1992). Dune II: Battle for Arrakis. Published by Virgin Interactive, http://en.wikipedia.org/wiki/Dune_II.