
Database-Driven Real-Time Heuristic Search in Video-Game Pathfinding

Ramon Lawrence, Member, IEEE, and Vadim Bulitko

Abstract—Real-time heuristic search algorithms satisfy a constant bound on the amount of planning per action, independent of the problem size. These algorithms are useful when the amount of time or memory resources is limited, or a rapid response time is required. An example of such a problem is pathfinding in video games, where numerous units may be simultaneously required to react promptly to a player's commands. Classic real-time heuristic search algorithms cannot be deployed due to their obvious state revisitation ("scrubbing"). Recent algorithms have improved performance by using a database of precomputed subgoals. However, a common issue is that the precomputation time can be large, and there is no guarantee that the precomputed data adequately cover the search space. In this paper, we present a new approach that guarantees coverage by abstracting the search space using the same algorithm that performs the real-time search. It reduces the precomputation time via the use of dynamic programming. The new approach eliminates the learning component and the resultant "scrubbing." Experimental results on maps of tens of millions of grid cells from Counter-Strike: Source and benchmark maps from Dragon Age: Origins show significantly faster execution times and improved optimality compared to previous real-time algorithms.

Index Terms—Database, game pathfinding, real-time search.

Manuscript received May 11, 2012; revised July 06, 2012; accepted November 25, 2012. Date of publication November 29, 2012; date of current version September 11, 2013. This work was supported by the National Science and Engineering Research Council. R. Lawrence is with the Department of Computer Science, University of British Columbia Okanagan, Kelowna, BC V1V 1V7 Canada (e-mail: ramon.lawrence@ubc.ca). V. Bulitko is with the Department of Computing Science, University of Alberta, Edmonton, AB T6G 2E8 Canada (e-mail: bulitko@ualberta.ca). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TCIAIG.2012.2230632

I. INTRODUCTION

As search problems become larger, the amount of memory and time needed to produce an optimal answer using standard search algorithms, such as A* [1], tends to increase substantially. This is an issue in resource-limited domains such as video-game pathfinding. In real-time search, the amount of planning time per move is bounded independently of the problem size.¹ This is useful when an agent does not have time to compute the entire plan before making a move.

¹More precisely, independently of the number of states.

Classic heuristic algorithms, such as learning real-time A* (LRTA*) [2], satisfy the real-time operation constraint via learning and consequently tend to repeatedly visit ("scrub") the same states. Not only does such behavior yield poor solution quality, but it also appears irrational, which has prevented deployment of such algorithms in video games. Extensions to LRTA*, such as LSS-LRTA* [3], improve solution quality but still allow an agent to revisit a state multiple times.

The performance of real-time search algorithms can be improved by using an offline precomputation stage that provides additional information to guide the search online. Algorithms such as D LRTA* [4] and kNN LRTA* [5] use precomputed subgoal databases to deal with inaccuracies in the heuristic function, thereby reducing the amount of learning and the resulting "scrubbing." However, as the search space grows, the database precomputation time becomes prohibitively long in practice. Additionally, the solution quality varies substantially as the database coverage and quality are uneven throughout the space. LRTA* with subgoals [6] computes subgoals to escape heuristic depressions and stores them as a tree of subgoals for each state. This provides more complete coverage, but still requires considerable space and time for the precomputation.

The contribution of this paper is a new real-time search algorithm called hill-climbing and dynamic programming search (HCDPS) that outperforms previous state-of-the-art algorithms by requiring less precomputation time, having faster execution times, and virtually eliminating state revisitation. This contribution is achieved via two ideas. First, instead of using a generic way of partitioning the map, such as into cliques [7] or sectors [8], we partition the map into reachability regions. The reachability is defined with respect to the underlying pathfinding algorithm, which guarantees that the agent can traverse such regions without any state revisitation or learning. This allows us to replace a learning algorithm (e.g., LRTA*) with simple greedy hill climbing. Doing so simplifies the algorithm, virtually eliminates scrubbing, and reduces the agent's online memory footprint.

The second idea is applying dynamic programming to database precomputation. Once we partition the map into regions of hill-climbing reachability, we use dynamic programming in the region space to approximate optimal paths between representatives of any two such regions. This is in contrast to computing optimal paths for all region pairs with A*, as done in D LRTA* [4], or to computing subgoals for every start/goal state combination, as done in LRTA* with subgoals [6]. In our experiments, the benefit of this approximation is a two-orders-of-magnitude speedup in the database precomputation time.

Together, these two ideas, applied in the domain of pathfinding on video-game maps of over ten million states, enable HCDPS to take less than 2 min of precomputation per map, have a path suboptimality of about 3%, a per-move planning time on the order of a microsecond, and an overall planning time two orders of

magnitude lower than A*. HCDPS has substantially improved precomputation time and suboptimality compared to D LRTA*, kNN LRTA*, and LRTA* with subgoals. When applied to benchmark maps from Dragon Age: Origins, HCDPS had similar runtime performance but with map precomputation of between 5 and 30 s.

The organization of this paper is as follows. In Section II, we formulate and define the pathfinding heuristic search problem studied in this work. Section III covers related work on real-time and non-real-time heuristic search algorithms. An overview of the approach is given in Section IV, and Section V covers extensive details on the implementation and optimization of the algorithm. Different algorithm variations are explained in Section VI, and a theoretical analysis follows in Section VII. Experimental results are in Section VIII, and the paper closes with future work and conclusions.

II. PROBLEM FORMULATION

We define a heuristic search problem as a directed graph containing a finite set of states and weighted edges, with two states designated as start and goal. At every time step, a search agent has a single current state, a vertex in the search graph, which it can change by taking an action (i.e., traversing an edge out of the current state). Each edge has a positive cost associated with it. The total cost of edges traversed by an agent from its start state until it arrives at the goal state is called the solution cost. We require algorithms to be complete (i.e., to produce a path from start to goal in a finite amount of time if such a path exists). Accordingly, we adopt the standard assumption of safe explorability of the search space and assume that the goal state can be reached from any state reachable from the start state.

An agent plans its next action by considering states in a local search space surrounding its current position. A heuristic function (or simply heuristic) estimates the optimal travel cost between a state and the goal. It is used by the agent to rank available actions and select the most promising one. We require that the heuristic be available for any pair of states. We consider only admissible and consistent heuristic functions, which do not overestimate the actual remaining cost to the goal and whose difference in values for any two states does not exceed the cost of an optimal path between those states. This is required for our complexity results that involve A*.

In real-time heuristic search, the amount of planning the agent does per action has an upper bound that does not depend on the total number of states in the problem space. We measure the move time as the mean planning time per action in terms of central processing unit (CPU) time. Hard cutoffs of planning time are enforced by ensuring that the maximum move time for any move is below a given limit. The second key performance measure of our study is suboptimality, defined as the ratio of the solution cost found by the agent to the optimal solution cost, minus one, times 100%. To illustrate, a suboptimality of 0% indicates an optimal path, and a suboptimality of 50% indicates a path 1.5 times as costly as the optimal path.

In principle, all algorithms in this paper are applicable to any heuristic search problem as defined above. However, algorithms based on precomputation are best suited for problems where the search space can be efficiently enumerated. Search algorithms that do not perform precomputation are also practically bounded by the need to store heuristic values for each state visited in the search space. Representing heuristic values for a large search space may not be practical. Finally, all algorithms are evaluated assuming that the state space is static. When a state space changes on the fly, precomputation algorithms are affected as the precomputation may be invalidated by the changes. In addition, algorithms based on LRTA* are negatively affected by state-space changes during their operation as learned heuristics may no longer be correct. Handling state-space changes is out of the scope of this paper.

The presentation and experimental focus of this paper is pathfinding on grid-based video-game maps. States are vacant square grid cells. Each cell is connected to four cardinally (i.e., N, E, W, S) and four diagonally neighboring cells. Edges out of a vertex are the moves available in the corresponding cell, and we will use the terms "action" and "move" interchangeably. Edges in the cardinal directions have a cost of 1, while diagonal edges cost 1.4. Throughout the paper, we use the octile distance as our heuristic. It is the de facto standard heuristic in grid maps with diagonal edges and is defined as h(s1, s2) = 1.4 · min(Δx, Δy) + |Δx − Δy|, where Δx and Δy are the absolute values of the differences in the x- and y-coordinates of the two states between which the heuristic distance is computed. In the absence of occupied grid cells, the octile distance gives the true shortest distance between any two states.

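For concreteness, the octile distance can be computed as follows (a minimal sketch in Python; the function name and argument convention are ours, not the paper's):

```python
def octile(x1, y1, x2, y2):
    """Octile distance between two grid cells, with cardinal cost 1
    and diagonal cost 1.4, as used throughout the paper."""
    dx = abs(x1 - x2)
    dy = abs(y1 - y2)
    # Take min(dx, dy) diagonal steps, then |dx - dy| cardinal steps.
    return 1.4 * min(dx, dy) + abs(dx - dy)
```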
III. RELATED WORK

Heuristic search algorithms can be classified as either real-time or non-real-time. A real-time heuristic search algorithm guarantees a constant bound on planning time per action. Algorithms such as A*, IDA* [9], and minimal-memory abstraction [8] are non-real-time, as they produce a complete (possibly abstract) solution before the first action is taken. As the problem size increases, the planning time and corresponding response time will exceed any set limit. The primary focus of this paper is on real-time search algorithms, but we also compare against some non-real-time heuristic search algorithms.

A. Real-Time Heuristic Search Algorithms

Real-time search algorithms repeatedly interleave planning (i.e., selecting the most promising action) and execution (i.e., performing the selected action). This allows actions to be taken without solving the entire problem, which improves response time at the potential cost of suboptimal solutions. As only a partial solution exists when it is time to act, the selected action may be suboptimal (e.g., lead the agent into a corner). To improve the solution over time, most real-time search algorithms update/learn their heuristic. The learning process frequently causes the agent to "scrub" (i.e., repeatedly revisit) states to fill in heuristic local minima or heuristic depressions [10]. This degrades solution quality and makes the algorithm unusable for video-game pathfinding.

Since the advent of the original LRTA* [2] with its dynamic-programming-style learning rule, researchers have attempted to speed up the learning process and make state revisitation less apparent. LSS-LRTA* [3] extends LRTA* by expanding the local search space with A* and updating the heuristics of all states in


the local search space in order to speed up learning. This significantly improves suboptimality and the number of state revisits but does not eliminate the scrubbing problem and can still result in highly suboptimal paths.

Significant performance improvement is possible by solving a number of problems offline and storing them in a database. Then, online, these solved problems can be used to guide the agent by directing it to a nearby subgoal instead of a distant goal. Most heuristic functions are more accurate around their goal state. This is because heuristic functions are usually derived by ignoring certain intricacies of the search space (e.g., the octile distance heuristic ignores obstacles), and, thus, the closer to the goal, the fewer intricacies/obstacles are ignored and the more accurate the heuristic becomes. Hence, by following a sequence of nearby subgoals, the agent benefits from a more accurate heuristic and, thus, has to do less learning.

There are several previously developed real-time heuristic search algorithms that use precomputed subgoals. D LRTA* abstracts the search problem using the clique abstraction of PRA* [7] and then builds a database of optimal paths between all pairs of ground-level representatives of distinct abstract states. The database does not store the entire path but only the ground-level state where the path enters the next region. Online, the agent repeatedly queries the database to identify its next subgoal and runs LRTA* to it. The issues with D LRTA* are the large amount of memory used and the lengthy precomputation time. Further, D LRTA* repeatedly applies the clique abstraction, thereby creating large irregular regions. As a result, the abstract regions can contain local heuristic depressions, thereby trapping the underlying LRTA* agent and causing learning and scrubbing.

kNN LRTA* attempts to address D LRTA*'s shortcomings by not using a state abstraction and instead precomputing a given number of optimal paths between randomly selected pairs of states. Each optimal path is compressed into a series of states, such that each of them can be hill climbed to from the previous one. Online, a kNN LRTA* agent uses its database in an attempt to find a similar precomputed path and then runs LRTA* to the associated subgoal. kNN LRTA*'s random database records do not guarantee that a suitable precomputed path will be found for a given problem. In such cases, kNN LRTA* runs LRTA* to the global goal, which subjects it to heuristic depressions and leads to learning and scrubbing. Additionally, precomputing D LRTA* and kNN LRTA* databases can be time consuming (e.g., over 100 h for a single video-game map).

LRTA* with subgoals [6] precomputes a subgoal tree from each goal state, where a subgoal is the next target state to exit a heuristic depression. Online, LRTA* is able to use these subgoals to escape a heuristic depression. However, the algorithm has no way of preventing scrubbing when it tries to reach the closest subgoal in the tree. Another issue is that the number of subgoals is large, as there is a subgoal tree for each goal state, and the precomputation time is long.

TBA* [11] is not based on LRTA* and runs a time-sliced version of A* instead. It does not use subgoals and always computes its heuristic with respect to the global goal. Like A*, it uses open and closed lists during the search. For a given amount of planning per move, TBA* tends to have worse suboptimality than D LRTA* or kNN LRTA*. Additionally, a TBA*-controlled agent can revisit states, run into corners, and, generally speaking, appear irrational.

Our new algorithm combines the best features of the previous algorithms. Like D LRTA* and kNN LRTA*, we run our real-time agent toward a nearby subgoal as opposed to a distant global goal. Unlike D LRTA*, our abstract regions do not contain heuristic depressions with respect to their representative state, which eliminates scrubbing. Unlike kNN LRTA*, we guarantee complete coverage of the search space, always providing our agent with a nearby subgoal. Also, unlike both D LRTA* and kNN LRTA*, we use simple hill climbing instead of LRTA*, thereby eliminating the need to store learned heuristic values (as in TBA*), but without open and closed lists. Precomputation time and space are reduced compared to LRTA* with subgoals [6], as abstract states and paths reduce the number of subgoals that need to be stored and computed.

B. Non-Real-Time Heuristic Search Algorithms

A* [1] produces an optimal solution to a search problem before the first action is taken. Its time and space complexity increases for large maps and/or multiple agents pathfinding simultaneously. Weighted versions of A* [12] trade some solution optimality for reduced complexity.

Partial refinement A* (PRA*) abstracts the space using the clique abstraction [7] and runs A* search in the resulting abstract space. It then refines a part of the produced abstract path into the original search space by running A* in a small area of the original space. PRA* is not a real-time algorithm, and its running time per move will increase with the map size. A refinement, implemented in Dragon Age: Origins, used sectors instead of cliques to partition the map [8]. This was done to improve the abstraction time and memory footprint. Our comparisons in this paper are to this improved version of PRA*, which will be referred to as "PRA*" in the plots.

BEAM [13], [14] uses breadth-first search to build a search tree. Each search tree level is constructed by generating all successor states at that level, sorting them, and keeping only a set number of the best states (determined by the beam width). Specifying a beam width bounds the memory requirements of the search by sacrificing completeness. To make the algorithm complete (i.e., able to find a solution to any solvable problem), the beam width is doubled whenever the search fails.

The search space can be represented as a grid of cells, as in Dragon Age [8], using waypoints, or using navigation meshes [15]. All of these techniques indicate the area on the map that can be traversed by an agent. The heuristic search algorithms in this paper can be applied to any of these search space representations, but this paper uses grid-based maps.

IV. OVERVIEW OF OUR APPROACH

The HCDPS algorithm operates in two stages: offline and online. The offline stage is performed once, before any searches, and precomputes information to speed up subsequent searches. The offline stage may take a considerable amount of time and is not real-time. The online stage takes a given search problem and uses the precomputed information to efficiently solve the problem in real time.


During the offline stage, the algorithm analyzes its search space and precomputes a database of subgoals. The database covers the space such that any pair of start and goal states will have a series of subgoals in the database. This is accomplished by abstracting the space. We partition the space into regions in such a way that any state in a region is mutually reachable via hill climbing with a designated state, called the representative of the region. Since the abstraction builds regions using hill climbing, which is also used in the online phase, we are guaranteed that, for any start state s, our agent can hill climb to the representative of some region R1. Likewise, for any goal state g, there is a region R2 that the goal falls into, which means that the agent will be able to hill climb from R2's representative to g. All we need now is a hill-climbable path between the representative of region R1 and the representative of region R2.

For every pair of close regions, we run A* in the ground-level space to compute an optimal path between the region representatives. We then use dynamic programming to assemble the computed optimal paths into paths between more distant regions, until we have an approximately optimal path between the representatives of any two regions. Once the paths are computed, they are compressed into a series of subgoals in the kNN LRTA* fashion. Specifically, each subgoal is selected to be reachable from the preceding one via hill climbing. Each such sequence of subgoals is stored as a record in the subgoal database. Finally, we build an index for the database that maps any state to its region representative in constant time.

Online, for a given pair of start and goal states, we use the index to find their region representatives. The subgoal path between the region representatives is retrieved from the database. The agent first hill climbs from its start state to the region representative. The agent then uses the record's subgoals one by one until the end of the record is reached. Finally, the agent hill climbs from the region representative to the goal state.

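To make this online flow concrete, here is a minimal sketch under stated assumptions — region_of, rep_of, records, and hill_climb are our placeholder names for the region index, the region representatives, the subgoal database, and the hill-climbing agent described in the next section:

```python
def hcdps_solve(start, goal, region_of, rep_of, records, hill_climb):
    """Basic online stage (no optimizations): hill climb to the start
    region's representative, follow the record's subgoals one by one,
    and finally hill climb from the goal region's representative."""
    r1, r2 = region_of(start), region_of(goal)
    # Subgoal record between the two representatives (a single
    # representative if both states fall into the same region).
    waypoints = records[(r1, r2)] if r1 != r2 else [rep_of[r1]]
    path = [start]
    for target in waypoints + [goal]:
        leg = hill_climb(path[-1], target)  # guaranteed by construction
        path.extend(leg[1:])                # drop the duplicated endpoint
    return path
```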
V. IMPLEMENTATION DETAILS

In this section, we present implementation details and optimizations, illustrated with a simple example.

A. Offline Stage

The hill-climbing agent used offline and online is a simple greedy search. In its current state s, such an agent considers the immediately neighboring states and selects the state s′ that minimizes f(s′) = c(s, s′) + h(s′, g), where c(s, s′) is the cost of traversing the edge between s and s′, and h(s′, g) is the heuristic estimate of the travel cost between s′ and the agent's goal g. Ties in f are broken toward higher c(s, s′). Remaining ties are broken arbitrarily but deterministically, based on the order of neighbor expansion of a given state. The agent then moves from s to s′, and the cycle repeats. Hill climbing is terminated when a plateau or a local minimum in h is reached, i.e., when no neighbor s′ satisfies h(s′, g) < h(s, g). If this happens before the agent reaches its goal, we say that the goal is not hill climbing reachable from the agent's position. The hill-climbing agent is effectively LRTA* without the heuristic update and does not store updated heuristic values.

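A minimal sketch of this agent follows (a neighbors(s) function yielding (successor, edge cost) pairs and the heuristic h are assumed; the tie-breaking details above are omitted for brevity):

```python
def hill_climb(start, goal, neighbors, h, max_steps):
    """Greedy hill climbing: repeatedly move to the neighbor minimizing
    c(s, s') + h(s', goal). Returns the path if the goal is reached
    within max_steps; returns None on a plateau/local minimum or when
    the step budget runs out (goal not hill-climbing reachable)."""
    path = [start]
    s = start
    for _ in range(max_steps):
        if s == goal:
            return path
        # Neighbor minimizing the one-step lookahead cost f = c + h.
        s_next = min(neighbors(s), key=lambda sc: sc[1] + h(sc[0], goal))[0]
        # Terminate on a plateau or local minimum in h.
        if h(s_next, goal) >= h(s, goal):
            return None
        path.append(s_next)
        s = s_next
    return path if s == goal else None
```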
1) Abstraction: The first step in the offline stage is abstracting and partitioning the search problem into regions. Each region starts with a representative state r selected among the yet unpartitioned states. Then, we form a queue of candidate states to be added, where each candidate state is an immediate neighbor of some existing state in the region. For each candidate state c, we check if c is mutually (i.e., bidirectionally) hill climbing reachable with r. If so, we add c to the region. Partitioning stops when every ground state is assigned to a region. As the online part of HCDPS starts by heading for the region representative of its start region, we keep the regions fairly small to reduce suboptimality by imposing a cutoff m such that any state assigned to a region is fewer than m steps from the region representative.

The base algorithm described above is then extended with a refinement: we allow a nonrepresentative state to change its region if it is closer to a new region's representative than to its current region's representative. The number of such region changes is limited by the total number of representatives, as the representative states themselves never move or change regions.

The partitioning of the search space depends on the search space characteristics and the initial placement of representatives. For pathfinding problems, we place representatives regularly along the grid axes. For general search problems, representatives can be selected randomly among the yet unpartitioned states. Seed state locations can clearly affect the abstraction and the resulting HCDPS performance. Optimal placement of representatives is an open research question.

For large search spaces, abstraction tends to be the most costly offline part of HCDPS. At minimum, every state must be visited at least once, which prevents the algorithm from being used in state spaces that are too large to be efficiently enumerated. Additionally, the abstraction procedure will often visit a state numerous times. This is because each state must be added to a region, which involves hill-climbing checks to and from the region representative. During those two hill-climbing checks, many states (up to the cutoff) can be visited. Further, a state may move regions if another representative ends up being closer, or we may have to run the hill-climbing checks several times before a suitable region is found.

There are two optimizations that improve the abstraction performance in practice. The first optimization is to terminate the hill-climbing check from a candidate to a representative prematurely if, while hill climbing, we reach a state already assigned to the region. Suppose the state a, already in the region, is k hill-climbing moves from the representative r. Then, if the hill-climbing check from the candidate c to a used up more than or equal to m − k steps, then clearly r cannot be reached from c under the m moves, and the hill-climbing check is declared to have failed. If, on the other hand, we reached a in fewer than m − k steps, then we also terminate hill climbing, but add c to the region.

The second optimization is to eliminate the hill-climbing check in the opposite direction (i.e., from a region representative to a candidate state). This is safe to do in undirected search spaces, since any path can be reversed. Specifically, when, online, the agent reaches the end of its record (which is a representative of a region), it first tries to plan a hill-climbing sequence from the representative to the agent's (global) goal. If that fails, it plans a hill-climbing sequence from the global goal to the representative. This step is guaranteed to succeed by region construction. The agent then executes the resulting plan in reverse, thereby traveling from the record's end to the global goal. The cost of this optimization is that it increases the move time, specifically for the last moves, as the algorithm may need to perform two hill-climbing checks of up to m steps each. This increases the average move time only when used.

On the four-region example, shown in Fig. 1, the two optimizations work as follows. For the first optimization, consider state (2, 8) before it is added to region 1. Suppose that, at this point, all states in the region within three or fewer steps from the representative (2, 4) have been added to the region. Without the optimization, we would hill climb from (2, 8) to (2, 4) and vice versa to determine whether we can add (2, 8) to region 1. With the first optimization, hill climbing is terminated as soon as state (2, 7) is reached. State (2, 8) is then added to the region since we reached (2, 7) in one step, (2, 7) is three steps away from the representative, and the sum of the two is less than the cutoff m. The second optimization avoids the hill-climbing check from (2, 4) to (2, 8) entirely. Then, online, the agent is guaranteed to be able to travel from (2, 4) to (2, 8) by planning a hill-climbing path from (2, 8) to (2, 4) and then executing it in reverse. An example of the regions built for an actual map is in Fig. 2.

Fig. 1. Region partitioning of an example grid map (73 states, …).

Fig. 2. Region partitioning of an actual game map (9 762 640 states, …).

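The partitioning step can be sketched as follows, assuming reachable(a, b, m) implements the move-capped hill-climbing check described above (the two performance optimizations are not shown):

```python
from collections import deque

def grow_region(rep, region_of, region_id, neighbors, reachable, m):
    """Grow one region around representative rep: a candidate joins if
    it is mutually hill-climbing reachable with rep within the cutoff m.
    `region_of` maps already partitioned states to their region id."""
    region_of[rep] = region_id
    queue = deque(s for s, _ in neighbors(rep))
    while queue:
        c = queue.popleft()
        if c in region_of:
            continue
        if reachable(c, rep, m) and reachable(rep, c, m):
            region_of[c] = region_id
            queue.extend(s for s, _ in neighbors(c) if s not in region_of)
```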

2) Abstraction Indexing: The mapping of base states to abstract regions must be stored to identify which region each base state is in. In the worst case, this can occupy the same amount of space as the search space. In practice, performance can be improved by compressing the mapping using run-length encoding (RLE) into an array sorted by a state identifier (a unique scalar). On a grid map, a cell (x, y) is assigned its identifier (id) as id(x, y) = x · W + y, where W is the map width. For the map in Fig. 1, the representatives are (2, 4), (2, 9), (7, 2), and (7, 7), which, with W = 11, translate to ids 26, 31, 79, and 84, respectively. The RLE compression works by scanning the map in row order and detecting changes in abstract region indices. The first record added to the table is (12, 1), since id(1, 1) = 12 and that state is assigned to region 1. The first change of region index happens at state (1, 7), which belongs to region 2. As id(1, 7) = 18, record (18, 2) is added to the table. The next region change is detected at (2, 1), adding (23, 1) to the table. Once the compressed records are produced, the original abstraction mapping is discarded.

The RLE table contains at least R records, where R is the number of abstract regions. As it is sorted by state id, the worst case time to map a state to its region by binary search is O(log R). Observe that R may increase with the total number of states N (which happens, for instance, in video-game pathfinding if the cutoff m is fixed). To get amortized constant time independent of N (as necessary for real-time performance of HCDPS²), we use a hash index on the RLE table. The index maps every p-th state id to its corresponding record in the RLE table. To look up the region assignment of a state s, we first compute the hash table index as ⌊id(s)/p⌋. We then run a linear search in the RLE table, starting with the record number stored in that index entry, until we find a record with a state id exceeding (or equal to) id(s). If it equals id(s), that record gives the region of s. Otherwise, the region of s is indicated in the immediately preceding RLE record. The query time is O(p), where p is independent of N.

²The standard practice in real-time heuristic search is to accept amortized time complexity in the definition of real-time heuristic search. This is due to the fact that even in basic algorithms such as LRTA* the heuristic function is often implemented as a hash table, hence the amortized access time.

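A sketch of the RLE table with its hash index, assuming integer state ids as defined above (the table is a list of (start_id, region) pairs sorted by start id):

```python
import bisect

def build_rle_index(rle_table, num_states, p):
    """For every p-th state id, record the RLE entry covering it."""
    starts = [sid for sid, _ in rle_table]
    return [max(bisect.bisect_right(starts, sid) - 1, 0)
            for sid in range(0, num_states, p)]

def region_of(state_id, rle_table, index, p):
    """Look up a state's region: jump via the hash index, then scan at
    most O(p) RLE entries forward. Amortized constant time for fixed p."""
    i = index[state_id // p]
    while i + 1 < len(rle_table) and rle_table[i + 1][0] <= state_id:
        i += 1
    return rle_table[i][1]
```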
3) Computing Paths Between Seeds With Dynamic Programming: At this point, we have partitioned all states into abstract regions and recorded the region representatives as hill climbing reachable from all states in their regions. We will now generate paths between any two distinct region representatives.

First, we compute optimal paths between representatives of neighboring regions, up to d regions away. This is done by running A* in the base search space. The costs of such base paths are used to populate an R × R cost matrix for paths between all region representatives. We then run the Floyd–Warshall algorithm [16], [17] on the matrix. The algorithm has a time complexity of O(R³) and computes all weights in the matrix. Note that, due to the abstraction, this problem does not exhibit optimal substructure. Specifically, an optimal path between ri, a representative of the region Ri, and rj, a representative of the region Rj, may be shorter than the sum of the optimal paths between ri and rk and between rk and rj, even if the path passes through the region Rk. Thus, the computed paths are approximations to optimal paths.

Once dynamic programming is complete for each region, we will know the next region to visit to get to any other region.


This is similar to a network routing table, which stores the next hop to get to any network address. With this table, it is possible to build a path between any pair of region representatives by starting at one region representative and continually going to the specified next neighbor. Since, as an initial step, we generated optimal paths between neighboring regions, we can assemble these into approximately optimal paths between more distant regions.

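A sketch of this precomputation step follows; base_costs is assumed to hold the A* costs for region pairs within the neighborhood depth d, and the next-hop table implements the routing-table idea just described:

```python
def region_routing(num_regions, base_costs):
    """Floyd-Warshall over region representatives. base_costs maps
    (i, j) -> A* path cost for regions within depth d. Returns
    (cost, nxt): cost[i][j] approximates the optimal cost and
    nxt[i][j] is the next region to visit en route to region j."""
    INF = float("inf")
    cost = [[INF] * num_regions for _ in range(num_regions)]
    nxt = [[None] * num_regions for _ in range(num_regions)]
    for i in range(num_regions):
        cost[i][i] = 0.0
        nxt[i][i] = i
    for (i, j), c in base_costs.items():
        cost[i][j] = c
        nxt[i][j] = j
    for k in range(num_regions):          # O(R^3) overall
        for i in range(num_regions):
            for j in range(num_regions):
                if cost[i][k] + cost[k][j] < cost[i][j]:
                    cost[i][j] = cost[i][k] + cost[k][j]
                    nxt[i][j] = nxt[i][k]
    return cost, nxt
```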
From an implementation perspective, there are several key issues. First, by increasing d, we can get more optimal solutions by directly computing more base paths. Two regions Ri and Rj are immediate neighbors if there exists an edge (si, sj) with si ∈ Ri and sj ∈ Rj. By scanning the search space, an abstraction space can be produced with the regions as nodes and edges connecting neighboring regions. Then, the neighbors d away from a given region are all the nodes reachable from the given region's node by traversing up to d edges. At the limit, d = ∞, in which case dynamic programming is not used at all, and we have exact paths between all region representatives. Clearly, the time increases with d. A second issue is that, if the number of regions is large, the actual implementation structure would be an adjacency list rather than a 2-D array, as the matrix is sparse. This saves space and allows the algorithm to be useful for search spaces with larger numbers of regions. Third, it is possible to have variations of the algorithm which either materialize all paths or produce them only on demand. It is also possible to have a non-real-time version of the algorithm that only computes the base paths and saves the dynamic programming step for the online stage. We discuss such variants in Section VI.

To illustrate, using the example in Fig. 1 with d = 1, we first compute optimal paths between immediately neighboring regions. The costs of the resulting paths are put in an R × R matrix (1). Costs of longer paths (e.g., between regions 1 and 3) are then computed with the Floyd–Warshall algorithm. The resulting cost matrix for all region pairs is (2).

4) Database Record Generation: Once we have approximately optimal paths between all pairs of regions, we compress them into subgoal records. This is done to reduce the space complexity of the database. Specifically, given an actual path p, we initialize a compressed path p′ by storing the beginning state of p in it. We then use binary search to find the farthest state on p that is hill climbing reachable from the most recently stored state — i.e., a state that is reachable while the immediately following state is not. We then add that state to p′ and repeat the process until we reach the end state of p, which we then add to p′ as well. Each compressed path becomes a record in our database. Online, each element of such a record is used as a subgoal. An example of a compressed path is shown in Fig. 3. In this example, the compressed path consists of only the start (S) and goal (G) states and one subgoal at (6, 8).

Fig. 3. Start and end optimizations.

Note that the subgoals are selected only to guarantee that each subgoal is hill climbing reachable from the previous one in the record. However, such reachability does not guarantee cost optimality. Consequently, online, an agent may have a nonzero suboptimality even if it is solving a problem whose compressed solution is already stored in the database.

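A minimal sketch of this compression under our assumptions — reachable(a, b) stands for the move-capped hypothetical hill-climbing check, and reachability is assumed to be monotone along the path so that binary search applies:

```python
def compress_path(path, reachable):
    """Compress a concrete path into subgoals: repeatedly keep the
    farthest state still hill-climbing reachable from the last kept
    state, located with binary search."""
    record = [path[0]]
    i = 0
    while i < len(path) - 1:
        lo, hi = i + 1, len(path) - 1      # candidate next subgoals
        while lo < hi:                     # binary search for the
            mid = (lo + hi + 1) // 2       # farthest reachable state
            if reachable(path[i], path[mid]):
                lo = mid
            else:
                hi = mid - 1
        record.append(path[lo])
        i = lo
    return record
```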
5) Database Indexing: The offline stage finishes with building an index over the database records to allow record retrieval in constant time. A simple index is an R × R matrix that stores the compressed paths between pairs of regions: entry (i, j) stores the database record from the representative of region i to the representative of region j.

B. Online Stage

Given a problem (s, g), HCDPS first seeks the regions in which its start and goal states are located. Suppose s is located in a region with representative r1 and g is located in a region with representative r2. By region construction, the agent can hill climb from s to r1 and travel from r2 to g without state revisitation. Furthermore, the agent's database contains a record with subgoals connecting r1 to r2. Thus, the agent hill climbs from s to r1, then between the subgoals all the way to r2, and then to g. Thus, the maximum move time is equal to the time to look up the record to use during the first move. After that point, the move time is the time for performing greedy hill climbing to the next state.

There are several enhancements to this basic process, designed to improve solution optimality at the cost of increased planning time per move. None of these optimizations affect the hard cap on move time, as the agent always plans its first step to the next subgoal from the database record before attempting the optimizations. This plan is used if the time runs out before any of the optimizations complete.

First, we check if the goal g is hill climbing reachable from the start s. This is done by planning a hill-climbing path from s to g in the "agent's head" (i.e., without moving the agent). To satisfy the real-time property of the search, the planning is limited to m moves. If the agent hits a plateau or a local minimum, or simply


runs out of the move quota before reaching g, then we go to the first subgoal in the database record as usual.

Second, when we do use a record, we check if its first subgoal (i.e., the state following r1 in the record) is hill climbing reachable from s. If so, then we direct the agent to go to the first subgoal instead of the record's start. The same move-capped hypothetical hill-climbing test is used here. Likewise, when the agent reaches the subgoal before the end of the record r2, it checks if it can hill climb directly to g.

A generalization of these start and end hill-climbing checks is as follows. Consider a case where it is not possible to hill climb from s to the first subgoal s1 in the database record with start r1. We know from construction that it is possible to hill climb from s to r1. Suboptimality can be improved if we can find the farthest state t on the path from r1 to s1 such that we can hill climb from s to t and from t to s1. The state t can be found by first reconstructing the path from r1 to s1 (using hill climbing) and then considering the states on this path from s1 toward r1. For each such state t, we try to plan a hill-climbing sequence from s to t. If we succeed in under m moves, then we keep that t and terminate the search. Once t is computed, the agent hill climbs from s to t. A similar optimization is applied at the end of the record, except the objective is to find the state closest to the second-last state of the record from which the agent can hill climb to the global goal g.

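A sketch of the generalized start optimization under the same assumptions (hill_climb plans a path without moving the agent; reachable is the move-capped check):

```python
def generalized_start(s, rec_start, first_subgoal, hill_climb, reachable):
    """Reconstruct the path from the record start to its first subgoal,
    then find the farthest state on it that the agent can hill climb to
    directly from s."""
    path = hill_climb(rec_start, first_subgoal)  # planned "in the head"
    for t in reversed(path):                     # farthest state first
        if reachable(s, t):
            return t
    return rec_start                             # reachable by construction
```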
The optimizations are illustrated in Figs. 3 and 4. The agent starts out in state S with the goal of arriving at G. The empty circles are states visited by an HCDPS agent without the online optimizations, and the filled circles are states visited with the optimizations. Light/yellow shaded cells are subgoals from the database record the agent is using. Dark/red shaded cells are reconstructed by hill climbing. For this example, assume that the database was computed with d = 1, but the value of d does not affect the example, as a record is computed between all regions either directly or using dynamic programming.

In Fig. 3, the problem is to navigate from the start (2, 7) to the goal (6, 4). The start is in region 2 and the goal is in region 3, so the database contains a record between the region representatives (2, 9) and (7, 2). The record contains three states: (2, 9), (6, 8), (7, 2). The online algorithm without optimizations navigates from (2, 7) to the start of the record (2, 9). The next subgoal is (6, 8), so the algorithm hill climbs from (2, 9) to (6, 8) and then on to the end of the record (7, 2). Finally, it hill climbs to the goal (6, 4).

The start optimization will try to hill climb from the problem start (2, 7) directly to the second state in the record (6, 8), which is successful. The end optimization will try to hill climb from the second-last state in the record [also (6, 8)] to the goal (6, 4), which is also successful. These two optimizations reduce suboptimality from 57% to 0%.

Fig. 4. Generalized optimizations.

In Fig. 4, an HCDPS agent is navigating from the start (4, 5) to the goal (9, 3). The start is in region 1 with the representative (2, 4). The goal is in region 3 with the representative (7, 2). The agent locates the database record [(2, 4), (6, 8), (7, 2)]. The start optimization attempts to hill climb from the start state (4, 5) to the second state in the record (6, 8), but fails, as hill climbing terminates at (4, 8) due to reaching a local minimum. The generalized version of the start optimization is then tried, where the algorithm tries to determine the state farthest along the reconstructed path from the record start (2, 4) to (6, 8) that is hill climbing reachable from the start (4, 5). The search finds the state (5, 9), to which the agent then hill climbs from its start state. Once at (5, 9), it continues with the record, hill climbing to (6, 8).

The end optimization check from (6, 8) to the goal (9, 3) terminates, as the goal is five states away and beyond the cutoff. Thus, the generalized version of the end optimization is performed to locate the state closest to (6, 8) from which the agent can hill climb to its global goal of (9, 3). The agent first reconstructs (via hill climbing) a path from (6, 8) to the end of the record (7, 2). It then tries states on the path and locates (7, 7), from which it can hill climb to (9, 3). As a result of these two optimizations, path suboptimality is reduced from 74% to 0%.

VI. ALGORITHM VARIATIONS

Some of the steps performed offline can be moved online or avoided completely to save offline computation time. We present four variations of HCDPS that make such tradeoffs.

A. HCDPS: Basic Version

The basic version of HCDPS described in Section IV performs the abstraction, dynamic programming, and complete database path generation during the offline phase. There are a few issues with the basic version. The abstraction may produce a large number of regions R. This reduces performance in two ways. First, during dynamic programming, the time complexity is O(R³), and a 2-D matrix implementation has a space complexity of O(R²). Even more costly, all paths are enumerated and stored in the database, which is expensive in time and space. In practice, once R becomes greater than several thousand, it is not possible to construct and store the entire database in memory.

B. dHCDPS: Dynamic HCDPS

dHCDPS introduces one major change to the basic HCDPS algorithm. Recall that the basic HCDPS uses dynamic programming to compute approximately optimal distances between representatives of all regions. It then uses such a distance table to assemble the actual paths using actual optimal paths computed

with A* for neighboring regions. dHCDPS forgoes this path assembly step offline and, instead, stores only the distance table and the optimal paths between the neighboring regions. Since paths between distant regions are not computed offline, there is no need to compress them. Only the paths between neighboring regions are compressed. Then, online, dHCDPS uses the distance table to find the next region to travel to and follows the correct optimal path to it. This is slightly more expensive compared to HCDPS, as the database lookups are more frequent, but the offline phase is accelerated.

C. nrtHCDPS: Non-Real-Time HCDPS

nrtHCDPS is a non-real-time version that performs the dynamic programming step online on demand. This is done by computing the optimal neighbor paths and storing their costs, as done in the other versions. Dynamic programming is not performed at all in the offline phase. Rather, the database consists of only the optimal paths between neighbors and their costs. Online, if the path required is not a neighbor path, it is constructed by running Dijkstra's algorithm over the abstract space. Specifically, the database contains a representation of the abstract space where each vertex is an abstract region and each edge is a computed cost between neighbors' representatives. Dijkstra's algorithm is run on this graph to compute the optimal path between a particular start and goal vertex (abstract regions). Given this solution path, the path in the base space is produced by taking each edge in the abstract solution (which maps to a compressed base path) and combining them. Our implementation uses Dijkstra's algorithm to search the abstract space, but A* and other algorithms are also possible.

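A sketch of this online assembly step, assuming abstract_edges[r] lists (neighbor region, cost) pairs from the stored neighbor paths and that the region graph is connected (safe explorability):

```python
import heapq

def abstract_dijkstra(src, dst, abstract_edges):
    """Dijkstra over the abstract region graph (nrtHCDPS online step).
    Returns the sequence of regions from src to dst; each abstract edge
    then maps to a stored compressed base path, and the base paths are
    concatenated to form the full solution."""
    dist, prev = {src: 0.0}, {}
    heap = [(0.0, src)]
    while heap:
        d, r = heapq.heappop(heap)
        if r == dst:
            break
        if d > dist.get(r, float("inf")):
            continue                      # stale heap entry
        for nbr, c in abstract_edges[r]:
            if d + c < dist.get(nbr, float("inf")):
                dist[nbr] = d + c
                prev[nbr] = r
                heapq.heappush(heap, (d + c, nbr))
    route, r = [dst], dst
    while r != src:
        r = prev[r]
        route.append(r)
    return route[::-1]
```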
is the number of algorithms are also possible.

edges in such a path. is the maximum branching factor in the Since the dynamic programming time, even when solving a search space (i.e., the maximum out-degree of any vertex). is single problem rather than computing the whole table, depends the maximum branching factor in the space of regions (i.e., the on

(and hence indirectly on ), the real-time constraint is maximum number of regions adjacent to any region). is the violated. The advantage is that the costly dynamic programming diameter of the space (i.e., the longest shortest path among any step is avoided offline and is performed only when necessary two vertices, measured in the number of edges). is the number online. We compare nrtHCDPS to other non-real-time search of base states represented by one index entry in the RLE array algorithms in the empirical evaluation.

(Section V-A2). All proofs apply to the basic version of HCDPS. Behavior of other versions is discussed in Section VI.

D. A DPS Lemma 1 (Guaranteed Suitable Record): For every pair of Instead of using hill climbing as the base algorithm, this ver- states

is reachable from ) there is a sion uses A with an a priori set limit on the number of states ex- record in the HCDPS database that the algorithm can hill climb panded per move [denoted by capped A (cA )]. In other words, to and hill climb from. both offline and online, we run A which expands up to a certain

and

(such that

Proof: All proofs are found in the Appendix. number of states from the current state and then travels toward

Lemma 2 (Guaranteed Hill Climbability Within a Record): the most promising state on the open list.

For each record (i.e., a compressed path), its first subgoal is hill

This is different from TBA in two ways. First, A DPS does climbing reachable from the path beginning. Each subgoal is not retain its open and closed lists between the moves, which hill climbing reachable from the previous one. The end of the makes it incomplete with respect to the global goal. Complete- path is hill climbing reachable from the last subgoal. ness is restored by using subgoals. The subgoals are selected

Lemma 3 (Completeness): For any solvable problem (i.e., such that each is reachable by cA from the previous subgoal.

a start and a goal states that are reachable from each other), Likewise, the regions are built using cA -reachability checks, HCDPS will find a path between the start and the goal in a fi- and, hence, each state in a region is cA -reachable to/from the nite amount of time without any state revisitation between its representative.

subgoals.

The rest of the HCDPS implementation and all of its opti- Theorem 1 (Offline Space Complexity): The total worst case mizations stand with one exception. Specifically, recall that the space complexity of the offline stage is

. regular version of HCDPS can terminate its hill-climbing check

Theorem 2 (Offline Time Complexity): The worst case of- from a candidate state to a region representative as soon as a fline time complexity is state already in the region is reached. A DPS cannot do so be-


HCDPS was run for neighborhood depths d ∈ {…}. The base version of HCDPS as well as dHCDPS and nrtHCDPS were tested. We also tested HCDPS and dHCDPS with d = ∞, which computes exact paths between all regions rather than using dynamic programming. D LRTA* was run with clique abstraction levels of {9, 10, 11, 12} for the Counter-Strike maps and {6, 7, 8, 9} for the Dragon Age maps. kNN LRTA* was run with database sizes of {10 000, 40 000, 60 000, 80 000} records.

Fig. 5. Two of the 14 maps used in our empirical evaluation.

We used m = … steps for hill climbing and p = … for RLE indexing for Counter-Strike, and m = … and p = … for Dragon Age. kNN LRTA* used reachability checks on the
